While working on deploying sites on Amazon EC2, we wanted to cut down the time it took to sync the sites from our seed servers to a new server. The logical way for us to do this was to put a snapshot of each site on Amazon S3 as a .tgz file. Our sites are around 2 GB each. That is no problem for S3, since its maximum file size is 5 GB, but to shorten each upload and reduce the chance of a file being corrupted during upload, we decided to split the files.
Splitting The Archive Into Pieces:
tar czPf - /dir_to_tar/ | split -b 200m -d - test_backup.tgz.
The command is very simple. First we tell tar to create an archive (c), compress it with gzip (z), keep absolute path names (P), and write to the file named by f, which here is - (standard output), followed by the path we want in the archive. That stream is piped to split, which is told to make files no larger than 200 MB (-b 200m) and to use numeric suffixes (-d). Since split has no input file, the - tells it to read the stream coming from tar, and test_backup.tgz. is the output prefix, to which split appends 00, 01, 02, and so on.
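To see the naming scheme without tarring a real 2 GB site, here is a small sketch of the same pipeline. It assumes GNU coreutils split (for -d), works in a throwaway directory from mktemp -d, fills a hypothetical site directory with about 50 KB of random data, and shrinks the piece size to 10K so several pieces appear. Because the sample path is relative, it leaves out the P flag.

```shell
#!/bin/sh
# Sketch: reproduce the split step on a tiny, hypothetical sample directory.
# Assumes GNU split (for the -d flag) and a writable temp directory.
set -e
work=$(mktemp -d)
cd "$work"
mkdir site
# ~50 KB of random bytes; random data barely compresses, so the .tgz stays big enough to split.
head -c 51200 /dev/urandom > site/data.bin
# Same pipeline as above, but with 10 KiB pieces and a relative path (so no P flag).
tar czf - site | split -b 10K -d - test_backup.tgz.
# Lists test_backup.tgz.00, test_backup.tgz.01, ... in suffix order.
ls test_backup.tgz.*
```

Every piece except the last is exactly 10,240 bytes; the last one holds whatever is left over.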
Once that finishes, we upload the pieces to S3 using the s3Sync utility.
Combining The Pieces Into One Archive:
What happens when you want to extract the archive you split? You can't extract each piece on its own the way you normally would, because only the joined pieces form a complete gzipped tar archive. This is how I put them back together, but I'm always open to better ideas:
cat test_backup.tgz.* > test_backup.tgz
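If disk space is tight, the joined test_backup.tgz doesn't have to be written at all: tar can read the concatenated stream straight from a pipe. The sketch below builds its own hypothetical sample pieces first (GNU tar and GNU split assumed), extracts the stream into a separate restore directory with -C, and compares the result byte for byte with cmp. For archives created with P, as in the command above, GNU tar also needs P on extraction (xzPf -) to keep the leading / and restore files to their original absolute paths.

```shell
#!/bin/sh
# Sketch: rebuild a split archive from its pieces without writing the joined .tgz to disk.
# Assumes GNU tar and GNU split; all names here are hypothetical sample data.
set -e
work=$(mktemp -d)
cd "$work"
mkdir site restore
head -c 51200 /dev/urandom > site/data.bin
tar czf - site | split -b 10K -d - test_backup.tgz.
# The glob expands in sorted order (00, 01, 02, ...), so cat emits the original byte stream.
cat test_backup.tgz.* | tar -xzf - -C restore
# Byte-for-byte comparison; cmp exits non-zero on a mismatch, and set -e stops the script.
cmp site/data.bin restore/site/data.bin
echo "restore OK"
```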
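I should mention that I've never had a split archive with more than ten pieces (00 through 09), so I haven't tested what happens beyond that. Since split gives every piece a two-digit suffix and the shell expands the glob in sorted order, pieces 10, 11, and so on should still be joined in the right order. Like I said before, if you have a better method, I'm open to new and better ways.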
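A quick way to check the ordering concern: the sketch below (GNU split assumed; piece. is a hypothetical prefix) splits the numbers 1 to 12 into one-line pieces, which produces twelve files from piece.00 to piece.11, and joins them again with the same kind of glob.

```shell
#!/bin/sh
# Sketch: confirm that a glob joins two-digit numbered pieces in numeric order past 09.
set -e
work=$(mktemp -d)
cd "$work"
# One line per piece: piece.00 holds "1", ..., piece.11 holds "12".
seq 1 12 | split -l 1 -d - piece.
# Fixed-width suffixes sort the same way lexically and numerically, so this prints 1 through 12.
cat piece.*
```

Two-digit suffixes cover 100 pieces, or about 20 GB at 200 MB each. For anything larger, split's -a option sets a longer suffix length (for example -a 3 gives 000 to 999), which keeps the names fixed-width so the glob still sorts them correctly.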

