About TT's Zip Archive Package
Version: 1.3.1 (22 March 2007)
This is a collection of REALbasic classes for creating and extracting ZIP archives (the format used by PKZIP and Info-ZIP).
Written March 2003 by Thomas Tempelmann
For updates see http://www.tempel.org/rb/#zip
For feedback and questions, write to: tt@tempel.org
This code is free for your own use, but you are encouraged to send me some money if you have a use for it (consider it shareware). See the end of this document for more information.
Current requirements: REALbasic 5.5.5 or later. The Einhugur e-CryptIt plugin is also required, which must be acquired separately.
It implements only a subset of the full ZIP archive specification, though. Here are the known restrictions:
- No encryption (and no decryption).
- Supports only the "stored" and "deflate" compression methods, which are the most common ones. This means it can't read every possible ZIP archive, because some archives use other compression methods. If you create archives with a Zip tool yourself, though, you can usually choose which methods it uses. And "deflate" is usually the most effective one anyway.
- No support for multi-segment archives.
- No provision against loss of file name information under older Mac OS versions (before 8.6): if an archive item has a name longer than 31 characters and is extracted using the provided function ZipEntry.MakeDestination(), the name may be cut off at the end without preserving its extension. Under Mac OS 8.6 and later, including Mac OS X, as well as under MS Windows, long file names are preserved.
- Can't deal well with corrupted archives. In theory, if parts of an archive are corrupted, redundant information in the archive may make it possible to recover the undamaged data, but that requires extended functionality that these classes do not provide. This software should, however, at least be able to detect such damaged archives, so that the user can turn to other options (more capable tools) to deal with this exceptional case.
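The compression-method restriction can be illustrated outside REALbasic. The following Python sketch uses only the standard `zipfile` module, not this package. It lists the entries of an archive whose method is neither 0 ("stored") nor 8 ("deflate"), which are the entries this package could not read. It also picks "stored" for empty data, mirroring the v1.1 rule that 0-length files are stored rather than deflated. The helper names are hypothetical.

```python
import io
import zipfile

# Method numbers from the PKWARE appnote: 0 = stored, 8 = deflate.
SUPPORTED_METHODS = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}


def unsupported_entries(archive):
    """Return the names of entries that use a method other than stored/deflate."""
    with zipfile.ZipFile(archive) as zf:
        return [info.filename for info in zf.infolist()
                if info.compress_type not in SUPPORTED_METHODS]


def choose_method(data):
    """Hypothetical policy helper: store empty files, deflate everything else."""
    return zipfile.ZIP_STORED if len(data) == 0 else zipfile.ZIP_DEFLATED


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in [("empty.txt", b""), ("notes.txt", b"hello " * 100)]:
            zf.writestr(name, data, compress_type=choose_method(data))
        # bzip2 (method 12) stands in for a method this package cannot read.
        zf.writestr("other.txt", b"x" * 50, compress_type=zipfile.ZIP_BZIP2)
    print(unsupported_entries(buf))
```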
What you can rely on is that archives created by these classes can be read by any modern ZIP tool, such as ZipIt (for Mac OS), the archive support built into OS X Tiger, WinZIP (for Windows), Stuffit Expander (Mac and Windows) and, of course, by this package itself.
A nice feature is that you can not only add items to an existing archive but also remove them, and then compact the archive to reclaim the space.
Another cool thing I've added is the class ZipSnapshots: It allows you to make incremental backups of the same folder, in which unchanged files will not be stored multiple times. See the notes in the class and the code in the demo app to learn how to use it.
For more information about the ZIP archive format, search on www.google.com for "ZIP format appnote". The "appnote.txt" describes the format.
This code was originally developed with RB 4.5.3 and has been tested to compile and run with RB 5.1. I (TT) have tested it on Mac OS 9.2.2 and on Windows 98, and a few other people have done more compatibility testing. In January 2007 it was updated to compile with RB 2006r4 and 2007r1 and to work on Intel-based Macs as well.
Programming information
Note: To be able to create and read compressed items in an archive, you also need a plugin that provides the so-called "ZLib" compression functions. The only known plugin that provides the necessary functionality is:
- e-CryptIt Engine (http://www.einhugur.com/, costs money)
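What such a plugin has to supply can be sketched with Python's `zlib` module. This illustrates the format only and is not the plugin's API. A Zip entry that uses "deflate" holds a raw deflate stream, without the zlib header and trailer (hence `wbits=-15`). Every entry also records a CRC-32 of its uncompressed data. The function names below are hypothetical.

```python
import zlib


def deflate_raw(data):
    """Compress to a raw deflate stream (wbits=-15: no zlib header or trailer)."""
    comp = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()


def inflate_raw(stream):
    """Decompress a raw deflate stream, as found in a "deflate" Zip entry."""
    return zlib.decompress(stream, -15)


def crc32(data):
    """CRC-32 of the uncompressed data, as an unsigned 32-bit value."""
    return zlib.crc32(data) & 0xFFFFFFFF


if __name__ == "__main__":
    original = b"The quick brown fox " * 20
    packed = deflate_raw(original)
    assert inflate_raw(packed) == original
    print(len(original), len(packed), hex(crc32(original)))
```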
Quick Demo
Make sure you have the Einhugur "e-CryptIt Engine" plugin (plus its accompanying plugin "#TypeLib.rbx") installed in the Plugins folder inside your RB application folder. You may also use the demo version if you do not want to buy the full version yet. Open the project called "TT's ZipArchiver.rb" and run it. The program opens a window onto which you can drop files to compress or decompress them instantly. It also lets you choose options such as the encoding for the file names. Unfortunately, different tools on the various computers use different encodings. This does not matter as long as your file names contain only ASCII characters. Once you use accented characters, symbols or even non-Latin scripts in your file names, though, you need to make sure that the encoding used for compression matches the one used to read the archive later.
Overview of the classes
The only classes you need to look at are:
- ZipArchive - create/open/close archives, add and remove files
- ZipEntry - get information about items in an archive and extract them
- ZipExtraField - get or create some optional additional data for archive items
- ZipConfig - here you may configure a few options on how you want the code to behave
or, if you only want to uncompress .zip files easily, look at:
- ZipExtractor - contains simple methods to unzip an archive
The files inside the ZipSupport folder are not of much interest to you, while the files in the StreamSupport folder may be of general use for you in other projects, as they provide generalized read and write functions for both data and resource forks.
There is also a folder called "AddWhenNotUsingEinhugurPlugin". As its name suggests, you need to add its contents to your project if you do not have the Einhugur plugins installed. In that case you also need to change the value of the constant "HaveEinhugurPlugin" to false and install the "TT's CRC-Plugin", which you can get from TT's web site. Note that this plugin no longer supports the latest RB versions.
Creating an archive and adding files to it
First, create a new ZipArchive instance. Then Open the archive by passing a FolderItem along with a Boolean that is TRUE to indicate that you want to write to the archive. If the FolderItem exists, it will be opened as an archive (provided it is a valid archive file); otherwise a new archive file will be created.
dim zar as ZipArchive
zar = new ZipArchive
if not zar.Open(theFile, true) then
  MsgBox "Error: " + zar.ErrorMessage
  return
end
Next, you can either add single files or entire folders to the archive:
dim result as Integer
// add a file:
result = zar.AddItemToRoot(aFile, zar.MacBinaryNever)
if result <= 0 then
  MsgBox "Error: " + zar.ErrorMessage
  return
end
// add a folder:
if not zar.AddFolderContents(aFolder, aFolder.Name, zar.MacBinaryNever, false) then
  MsgBox "Error: " + zar.ErrorMessage
  return
end
Finally, close the archive:
if not zar.Close() then
  MsgBox "Error: " + zar.ErrorMessage
  return
end
MacBinary information
The MacBinary option lets you preserve Macintosh-specific file information, mainly the resource fork. It also preserves some minor attributes such as the file creation date, which the standard Zip format does not store (it records only the modification date).
A Mac file's Type and Creator codes will always be preserved, even if no MacBinary is used. You may want to pass MacBinarySmart or MacBinaryAlways instead of MacBinaryNever in order to better preserve Macintosh specific information (when running on Windows, they make no difference).
The "Smart" variant encodes a file as MacBinary only if the file has a resource fork.
Note that many non-Macintosh Zip tools cannot handle MacBinary encoding. They extract such items as MacBinary files, with the data fork hidden inside them along with the other information. Some tools do decode these items properly, though, such as Stuffit Expander for Windows: it extracts the data fork only and ignores the resource fork.
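The "Smart" rule and the basic MacBinary layout can be sketched in Python as follows. The field offsets follow the 128-byte MacBinary II header. The data fork starts at offset 128, and the resource fork follows it after padding to a multiple of 128 bytes. The helper names and the "never"/"smart"/"always" strings are hypothetical stand-ins for this package's MacBinaryNever/MacBinarySmart/MacBinaryAlways options. The sketch also ignores the header CRC and the other fields described in the MacBinary specification in the "Technical docs" folder.

```python
import struct

HEADER_SIZE = 128


def use_macbinary(rsrc_fork_len, mode):
    """Decide per file whether to MacBinary-encode it ("never", "smart", "always")."""
    if mode == "always":
        return True
    if mode == "smart":
        return rsrc_fork_len > 0  # only files that actually have a resource fork
    return False


def padded(length):
    """Round a fork length up to the next multiple of 128 bytes."""
    return (length + HEADER_SIZE - 1) // HEADER_SIZE * HEADER_SIZE


def parse_macbinary(blob):
    """Split a MacBinary blob into name, type, creator, data fork and resource fork."""
    name_len = blob[1]
    name = blob[2:2 + name_len]
    file_type, creator = blob[65:69], blob[69:73]
    data_len, rsrc_len = struct.unpack(">II", blob[83:91])  # big-endian lengths
    data_start = HEADER_SIZE
    rsrc_start = data_start + padded(data_len)
    return {
        "name": name,
        "type": file_type,
        "creator": creator,
        "data": blob[data_start:data_start + data_len],
        "rsrc": blob[rsrc_start:rsrc_start + rsrc_len],
    }
```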
Extracting all items from an archive
Create a new ZipArchive instance, then call Open, passing a FolderItem that identifies the archive you want to extract from and FALSE to read from the archive (instead of TRUE to write to it).
dim zar as ZipArchive
zar = new ZipArchive
if not zar.Open(theFile, false) then
  MsgBox "Error: " + zar.ErrorMessage
  return
end
You can now loop over all entries in the archive, get their "ZipEntry" instances, and then extract their files to a folder hierarchy of which you can provide the starting folder:
dim f as FolderItem, e as ZipEntry, i as Integer
for i = 1 to zar.EntryCount
  e = zar.Entry(i)
  f = e.MakeDestination(destFolder, false)
  if f.Exists then
    // here you could ask the user whether to overwrite the
    // existing file (but it could be even a folder!) or
    // to skip the item or abort the entire process.
    f.Delete // this will not work if it's a non-empty folder, though!
  end
  if not e.Extract(f) then
    MsgBox "Extraction of """ + e.RawPath + """ failed: " + e.ErrorMessage
    return
  end
next
Finally, close the archive again:
if not zar.Close() then
  MsgBox "Error: " + zar.ErrorMessage
  return
end
More detailed programming information
There are many more options to store and retrieve items in a Zip archive:
- Read and write the global (archive-wide) comment
- Read and add comments about individual items
- Read and write the data and resource forks separately, which may be helpful if you want to access the resource fork on Windows where the file system does not support this implicitly.
- Create and extract additional Extra Field data to support extensions for other operating systems (see the PKZIP "appnote.txt" for more information on the format of these extensions)
- Learn about the size of the uncompressed data for both the items in an existing archive and for data you plan to add to an archive. Helpful for progress bars. See the demo project for an example.
- Verify the integrity of individual items in an archive without extracting them.
- Remove or exchange entries in an archive, and compact the archive after removing items.
- Use a simple undo technique to revert changes made to an archive (see Mark and Rollback)
- Specify the encoding used in the archive for file names and comments.
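For comparison, two of the listed capabilities look like this with Python's standard `zipfile` module: summing the uncompressed sizes (e.g. for a progress bar) and verifying entries without writing them to disk. This is not this package's API. The REALbasic equivalents are documented in the method comments of ZipArchive and ZipEntry, and the two function names here are hypothetical.

```python
import io
import zipfile


def total_uncompressed_size(archive):
    """Sum of the uncompressed sizes recorded in the central directory."""
    with zipfile.ZipFile(archive) as zf:
        return sum(info.file_size for info in zf.infolist())


def first_corrupt_entry(archive):
    """Read every entry in memory and check its CRC; return the first bad name, or None."""
    with zipfile.ZipFile(archive) as zf:
        return zf.testzip()


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("a.txt", b"a" * 1000)
        zf.writestr("b.txt", b"b" * 500)
    print(total_uncompressed_size(buf), first_corrupt_entry(buf))
```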
To learn about all of these, look up the methods in the classes (mostly ZipArchive and ZipEntry) and read the comments at the top of each one; those comments are their documentation.
Note that methods whose names start with "z_" are private; all others are intended for general use.
If you find bugs or make enhancements to this software, please contact me about them (send me your changed version), and I'll try to incorporate them into my next release so that others can benefit, too.
Known limitations and problems
- All Stuffit Expander versions up to 7.0.1 for Mac OS can not automatically decode MacBinary-encoded items if those items are in subfolders of the archive. Stuffit Expander 7.0.3 has fixed this problem. (Expander for Windows does not have this problem.)
- The "MacDefault" encoding works only on Mac OS, not on Windows (this is a RB limitation).
- The archive files are marked as created by PKZIP, which means that MS-DOS is the reference for file names and other operating-system dependencies. Consequently, file names should by default use MS-DOS encodings, and text files should use CR+LF (chr(13)+chr(10)) as end-of-line delimiters.
- Archive items can have the "text file" flag set. If it is set, the end-of-line (EOL) delimiters of such a file should be converted to the local convention. This software currently does not do that! You can, however, query this flag along with the entry's OS code to determine the source and destination EOL formats and do the conversion yourself. See the ZipEntry methods IsTextFile() and OSMadeBy().
- Be aware that the Zip directory can officially hold only up to 65535 entries. If you add more, many tools, including this one, may not be able to deal with the archive any more. The limitation comes from the fact that the number of entries is stored in a 16 bit field. There are work-arounds, but they're outside the specification. This Zip Package makes a best effort attempt at dealing with archives that contain more than 65535 entries: You can read them, but you cannot alter them.
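The EOL conversion left to the caller, and the reason for the 65535-entry limit, can both be sketched independently of this package. Python's `zipfile.ZipInfo` exposes the relevant fields as `internal_attr` (bit 0 is the text flag) and `create_system` (the host-system byte of "version made by": 0 = MS-DOS, 3 = Unix, 7 = Macintosh in the appnote). The end-of-central-directory record shows that the entry count is a 16-bit field. The function names and the EOL table are illustrative assumptions.

```python
import struct

# Host-system codes from the appnote's "version made by" field.
SOURCE_EOL = {0: b"\r\n", 3: b"\n", 7: b"\r"}  # MS-DOS, Unix, classic Mac OS


def convert_text_eol(data, internal_attr, os_made_by, local_eol=b"\n"):
    """Convert EOLs only for entries flagged as text; leave all others untouched."""
    if not internal_attr & 1:
        return data
    source = SOURCE_EOL.get(os_made_by)
    if source is None or source == local_eol:
        return data  # unknown origin or nothing to do
    return data.replace(source, local_eol)


# End-of-central-directory record: signature, number of this disk,
# disk where the directory starts, entries on this disk, total entries,
# directory size, directory offset, comment length (22 bytes in total).
EOCD = struct.Struct("<IHHHHIIH")
EOCD_SIGNATURE = 0x06054B50


def pack_eocd(entry_count, cd_size, cd_offset):
    """Build a single-disk EOCD record; counts above 0xFFFF do not fit the field."""
    if entry_count > 0xFFFF:
        raise ValueError("entry count does not fit in the 16-bit field")
    return EOCD.pack(EOCD_SIGNATURE, 0, 0, entry_count, entry_count,
                     cd_size, cd_offset, 0)
```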
Compatibility & interchangeability considerations
There are many different Zip archiver tools around, and many of them have interoperability problems in the following areas:
- Non-ASCII file names. The Zip "standard" does not define clear rules for encoding file names. If a file name uses extended letters such as ç or é, or even non-Roman scripts (Japanese etc.), the archive does not record how these characters are encoded. Originally, the DOS character sets were used, but nowadays Apple's OS X uses UTF-8, while most Windows archives still use local system encodings (i.e. they depend on the language of the Windows system). Here are some details:
- Apple's Zip tool in OS X 10.4 and later (as invoked by the Finder) always writes UTF-8 in the names in the directory. When reading an archive, it apparently also always assumes the names to be in UTF-8 format.
- ZipIt 2.2.2 is a bit more versatile: It has an option in its Preferences to choose between Unicode file names and the older encodings that Windows zip tools understand better. When the option is enabled, it encodes the name in UTF-8 and sets the lowest byte of the 32-bit "external permissions" value to 1. When reading an archive, it checks this byte: only if it is non-zero does it interpret the file name as UTF-8; otherwise it uses an older encoding (e.g. MacRoman). The problem is that Apple's zip sets this same byte to zero. Therefore, non-ASCII file names in archives created by Apple's zip are not decoded properly when read by ZipIt. The other direction works, however, since Apple always assumes UTF-8.
- From version 1.3 on, for best interchangeability, this Zip Package sets the lowest byte of the "external permissions" to 1 when UTF-8 encoding is chosen, in order to please ZipIt (earlier versions did not do this). When reading an archive, it also checks for the special signatures that Apple's Zip tool and ZipIt write to an entry, so that UTF-8 encoded file names are detected automatically.
- Resource forks, MacBinary format. Resource forks are still used on Mac OS systems. Early on, the semi-official pkzip spec was extended with descriptions of two ways to store resource forks in a Zip archive. One is used by Info-Zip (not supported by this Zip Package). The other is used by ZipIt and Stuffit, which store the fork in MacBinary format; this Zip Package supports it. Unfortunately, when Apple added Zip archive support to the Finder in OS X 10.4, it invented yet another, poorly documented alternative. It stores the resource forks and other Mac-specific information in AppleSingle/AppleDouble encoded files, which are kept as separate entries (in a directory structure called "__MACOSX"). The problem is that archives created this way are not understood by any other Zip tool as of now (AFAIK). From version 1.3 on, this Zip Package can at least decode these Apple-created archives properly, but it can't create them yet. On the other hand, Apple's zip tool can't handle MacBinary encoded files. That means that if you want to create Zip archives that contain resource forks and that can be unarchived by Apple's Finder (e.g. by a simple double click on the zip file), you have to archive them with Apple's zip tool! If you want this Zip Package to create the new format used by Apple, you have to add this code yourself or pay me (TT) at least a few hundred dollars, as I am otherwise not eager to do it.
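The two reading problems described above can be sketched in Python. The first helper applies only ZipIt's convention: any non-zero lowest byte of the external attributes means UTF-8. It assumes MacRoman as the fallback encoding and does not reproduce the additional Apple-specific checks this package performs. The second helper pairs the AppleDouble entries of a Finder-created archive with the files they belong to. It assumes the "__MACOSX/<folder>/._<name>" layout that such archives use. Both function names are hypothetical.

```python
import posixpath

APPLEDOUBLE_DIR = "__MACOSX/"


def decode_name(raw, external_attr, fallback="mac_roman"):
    """ZipIt convention: UTF-8 if the lowest byte is non-zero, else a legacy encoding."""
    if external_attr & 0xFF != 0:
        return raw.decode("utf-8")
    return raw.decode(fallback)


def pair_appledouble(names):
    """Map each data entry name to its AppleDouble companion entry, when present."""
    names = set(names)
    pairs = {}
    for name in names:
        if not name.startswith(APPLEDOUBLE_DIR):
            continue
        folder, leaf = posixpath.split(name[len(APPLEDOUBLE_DIR):])
        if not leaf.startswith("._"):
            continue
        target = posixpath.join(folder, leaf[2:])  # strip the "._" prefix
        if target in names:
            pairs[target] = name
    return pairs
```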
Further reading
The "zip archive" standard is quite a loose one. There appear to be lots of zip programs around that do not follow the PKWARE-specs very well, and while I have tried to program this software as close to the specs as I could, you may run into compatibility problems when trying to open archives from or create archives for other zip implementations.
A universal, open-source reference implementation that attempts to deal with all eventualities is the Info-ZIP code base, written in C. Apple also uses it in OS X (even though Apple appears to use quite an outdated version of it). Here is the Info-ZIP home page:
Info-Zip home
The original PKWARE specification for Zip archives is included with this package in the "Technical docs" folder, along with the MacBinary specs.
List of changes (Version History)
v1.0, 8 Apr 2003
v1.1, 25 Apr 2003
- Added methods to get and set the "OS Made By" property (ZipEntry: OSMadeBy, SetOSMadeBy; ZipArchive: DefaultOSMadeBy, SetDefaultOSMadeBy)
- Added methods to get and set the "Text File" flag (ZipEntry: IsTextFile, SetTextFileFlag)
- Fixed "ZipArchive.Compact": It damaged archives if duplicate ("fake") entries were used.
- Adding a snapshot won't include hidden items any more. A new "ZipSnapshots.IncludeHiddenItems" property can be set to have them added again.
- Added "ZipSnapshots.RemoveSnapshot".
- Internal fix: z_isFakeEntry does not test for "hdr.Long(20)<>0" any more since this test was not always correct.
- "ZipSnapshots.ExtractFromSnapshot" has a new parameter to specify how to deal with Alias files at the destination (note that snapshots do not contain alias files): They can be skipped (the item in the archive will not be extracted if an alias exists at its destination), followed (the file where the alias points to will be replaced) or overwritten (the alias file is replaced with the file from the archive).
- Fixed storing 0-length files: They are now using the compression mode "stored", not "deflated". This prevents Stuffit Expander and other zip tools from complaining.
- ZipEntry.ExtraField() does not return nil any more for empty fields, but returns a valid ZipExtraField object. nil is only returned in case of an error.
- Fixed a problem where extracting or verifying certain Unix-created and other archives would cause a "Header mismatch" error.
- The "protected" flag is now ignored when extracting an item, unless the new method "ZipEntry.EnableFileLocking" is called before extraction.
- The "FInfo.fdFlags" in MacBinary headers are now supported, too.
v1.1.1, 18 June 2003
- Fixed a bug in ZipArchive.z_readDirectory() that made reading a Zip file's comment fail.
v1.1.2, 17 July 2003
- Fixed a bug in ZipEntry.z_unzip() that flagged some compressed items as being corrupt even though they were just fine. A case of having added more safety checks than necessary (and appropriate).
v1.1.3, 29 July 2004
- This release also includes Windows versions of the plugins, so that one can compile the Zip Package with a RB IDE running on Windows, too.
- Added a new class called ZipExtractor to facilitate extraction of an entire archive and other common tasks.
- Improved detection of valid Zip archives by adding a check for a Zip header at the start of the file. This allows for quick checks whether any file is a Zip archive, even if it's not using the proper file type or extension.
- Increased the minimum and preferred memory amounts of the project files because the values were definitely too low for use in Mac OS 8 and 9.
v1.2, 2004-2006
- These were internal versions, not officially released
- Can now read zips with > 65535 entries, but will not modify them (they stay read-only)
- ZipEntry.Extract: now also sets the CreationDate, to keep Norton and other tools from complaining about a creation date later than the modification date
- ZipEntry.ApplyMacBinaryInformation: Fixed setting of creation date
- Added get/set for UnixPermissions
- Does not compile with RB 4.5 any more, it requires 5.5.5 now.
- Added support for reading Apple's zip files (as created by the Finder). They need special handling because resource fork information is stored in a new, incompatible way: resource fork and Finder Info data are stored in separate "__MACOSX" folders. Note that this Zip Package can read and uncompress such archives, but not create or alter them (yet). On the other hand, Apple's zip tool cannot read resource forks as written by all the other Zip tools such as Stuffit, ZipIt, this Zip Package and others. Shame on Apple's OS X team for not observing the long-existing de-facto standards!
v1.3.1, 22 March 2007
- Updated the classes to build with RB 2006 and later.
- Uses fewer plugins: only the e-CryptIt (and "#TypeLib.rbx") plugin is now needed, none of the old TT plugins.
- Tested to work on Intel-based Macs.
- Converted all external items to the modern naming convention with endings such as .rbo etc., to make them more usable in IDEs under Windows and Linux.
Copyrights and Acknowledgements
e-CryptIt Engine copyright Björn Eiríksson (www.einhugur.com)
e-CryptIt Engine uses zlib code, copyright © 1995-2002 Jean-Loup Gailly and Mark Adler.
Original zip format REALbasic code was written by Carsten Friehe for the Mieze program (http://carsten-friehe.de/).
RB code improved and reorganized by Thomas Tempelmann (http://www.tempel.org/rb/) for public release.
Some of the design and error messages were influenced by Java 1.1's ZipFile and related classes.
My thanks go to Leonard Rosenthol (author of Stuffit Zip support and maintainer of MacBinary format) and Tom Brown (author of ZipIt) for providing helpful information.
Terms of use
This RB code, written by Carsten Friehe and Thomas Tempelmann, is given to the Public Domain, which means you can do whatever you want with it. It is, however, appreciated if you "tip" me by sending me a few dollars for my work.
It took me more than two full weeks to develop this software for the public. I could have done with much less for my own needs, but I wanted to provide a clean and complete solution so that others won't have to deal with this non-trivial task. So, if you benefit from my work, please acknowledge it with a little financial support.
Please visit the following web address to find out how to tip me:
http://tip.tempel.org/
Enjoy!
8 April 2003
Thomas Tempelmann