Unzip NSData without temporary file - iphone

I've found a couple of libs (LiteZip and ZipArchive) that allow you to unzip files on the iPhone, but both of them require the input to be a file. Is there a library that can directly unzip an NSData containing zip-archived data without writing it to a temporary file?
I've tried to adapt the libs mentioned above for that, but with no success so far.

In my answer to a related question, I point out the CocoaDev wiki category on NSData which adds zip / unzip support to that class. This would let you do this entirely in memory.

From what I understand, the zip format stores files separately, and each stored file is compressed using a compression algorithm (generally the DEFLATE algorithm).
If you're only interested in uncompressing data that was compressed with the DEFLATE algorithm, you could use this zlib addition to NSData from Google Toolbox for Mac.
It doesn't need temporary files.
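The in-memory approach is easy to see in miniature with Python's zipfile module (Python here purely to illustrate the idea; the iOS categories above do the equivalent on NSData):

```python
import io
import zipfile

def unzip_in_memory(zip_bytes: bytes) -> dict:
    """Unpack a zip archive held entirely in memory; no temporary files."""
    archive = zipfile.ZipFile(io.BytesIO(zip_bytes))
    return {name: archive.read(name) for name in archive.namelist()}

# Build a small archive in memory, then unpack it again.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("hello.txt", "hello, world")

files = unzip_in_memory(buf.getvalue())
print(files["hello.txt"])  # b'hello, world'
```

The whole round trip touches only byte buffers, which is exactly what the NSData categories give you on iOS.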

Related

swift - Is there a simple way to keep symlinks when zipping files in swift?

Swift newbie here.
TL;DR:
The point is to zlib, tar-gzip (preferred), or zip a folder (or folders) that would (among regular files) contain symbolic links, while retaining those symlinks (not zipping the actual targets of those into the archive).
Problem:
Recursive zipping of folders with symlinks in Swift seems to be a tough task, as only single file/data compression functions seem to be implemented natively in Swift (https://developer.apple.com/documentation/accelerate/compressing_and_decompressing_files_with_swift_stream_compression), with no easy hands-on guide for using them on a complex folder structure.
There are some libraries that help handle that, like https://github.com/marmelroy/Zip for zip or https://github.com/1024jp/GzipSwift for gzip; the problem is that none I found handles symlinks as symlinks. They just follow the symlink and zip the actual file.
I love marmelroy's Zip syntax, where you can simply specify the NSURL of the file/folder to zip and not worry about Data structures, folder structure and content, buffering, and all that stuff:
do {
    let filePath = Bundle.main.url(forResource: "file", withExtension: "zip")!
    let unzipDirectory = try Zip.quickUnzipFile(filePath) // Unzip
    let zipFilePath = try Zip.quickZipFiles([filePath], fileName: "archive") // Zip
}
catch {
    print("Something went wrong")
}
The only issue is that it does not let you work with symlinks or provide custom handlers.
Question:
Is there a Swift package / hack / simple enough way to tar-gzip/zlib/zip a large folder that contains symlinks while keeping those as links? Could you please share a working snippet or point me in the right direction?
It would be best if it were compatible with macOS, iPadOS, and iOS.
Thank you!
Update:
Libraries working with Data structures (like https://github.com/tsolomko/SWCompression or GzipSwift) may get symlinks right, but both struggle to represent a folder (containing some files and symlinks) as a Data structure, and I doubt that writing a whole (e.g. 500 GB) folder into an in-memory byte buffer, rather than processing it in small chunks, is a good idea (= out-of-memory issues).
This should be easy to do, I guess, but I'm struggling to find any simple working Swift code that handles larger folders containing media and symlinks, compressing and decompressing them in any format (tar-gzip, zip, or anything else is okay).
Info-ZIP's zip has a -y option to store the symbolic link instead of what it references.
tar, by default, stores the symbolic links as links. You have to give it an option (--dereference) to get it to follow symbolic links.
zlib does not provide any file/directory archive functions. Only compression.
You can run commands like zip or tar from Swift using Process.
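The tar behaviour described above is easy to verify; here's a sketch using Python's tarfile module (just to illustrate the mechanism — the question itself targets Swift), which stores symlinks as links by default and only dereferences when asked:

```python
import os
import tarfile
import tempfile

# Build a folder containing a regular file and a symlink to it.
root = tempfile.mkdtemp()
with open(os.path.join(root, "data.txt"), "w") as f:
    f.write("payload")
os.symlink("data.txt", os.path.join(root, "link.txt"))

# tarfile stores the symlink itself (no dereferencing) by default.
out = tempfile.mkdtemp()
archive = os.path.join(out, "folder.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    tar.add(root, arcname="folder", recursive=True)

with tarfile.open(archive) as tar:
    member = tar.getmember("folder/link.txt")
    print(member.issym(), member.linkname)  # True data.txt
```

A Swift solution that shells out to tar via Process gets this behaviour for free, since it is the default for command-line tar as well.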

Get Maximum Compression from 7zip compression algorithm

I am trying to compress some of my large document files, but most of the files are getting compressed by only 10% at most. I am using 7zip terminal commands.
7z a filename.7z -m0=LZMA -mx=9 -mmt=on -aoa -mfb=64 filename.pptx
Any suggestions on changing the parameters? I need at least a 30% compression ratio.
.pptx files or .docx files are internally .zip archives. You cannot expect much compression on an already-compressed file.
The documentation states that LZMA2 handles incompressible data better, so you can try
7z a -m0=lzma2 -mx filename.7z filename.pptx
But the required 30% is almost unreachable.
If you really need that compression, you could use the fact that a pptx is just a fancy zip file:
Unzip the pptx, then compress it with 7zip. To recover an equivalent (but not identical) pptx, decompress with 7zip and recompress with zip.
There are probably some complications, for example with epub there is a certain file that must be stored uncompressed as first file in the archive at a certain offset from the start. I'm not familiar with pptx, but it might have similar requirements.
I think it's unlikely that the small reduction in file size is worth the trouble, but it's the only approach I can think of.
Depending on what's responsible for the size of the pptx you could also try to compress the contained files. For example by recompressing png files with a better compressor, stripping unnecessary data (e.g. meta-data or change histories) or applying lossy compression with lower quality settings for jpeg files.
Well, just an idea for maximum compression:
'recompress' these .zip archives (the .docx, .pptx, .jar, ...) using -m0 (store = no compression) and then
apply LZMA2 on them.
LZMA2 is pretty good - however, if the file contains many JPEGs, consider giving the open-source packer PeaZip, or more specifically paq8o, a try. PAQ8 has a built-in JPEG compressor and supports range compression, so it will also cope with JPEGs that are inside some other file. WinZip's zipx, in contrast, requires pure JPEG files and is useless in this case.
But again, to make PAQ compress your target file effectively, you'll need to 'null' the zip/deflate compression, i.e. turn it into an uncompressed zip.
Well, PAQ is probably a little exotic; however, in my eyes it's more honest and transparent than zipx. PAQ is unsupported, so, as always, it's a good idea to just google for what you don't have/know and you will find something.
Zipx, in contrast, may appear a little intriguing since it looks like a normal zip and files are listed properly in WinRAR or 7-Zip, but when you try to extract the JPEGs it will fail, so an inexperienced user may think the zip is corrupted. It's much harder to find out that it is a zipx, which so far only WinZip or The Unarchiver (unar.exe) can handle properly.
PPTX, XLSX, and DOCX files can indeed be compressed effectively if there are many of them. By unzipping each of them into their directories, an archiver can find commonalities between them, deduplicating the boilerplate XML as well as any common text between them.
If you must use the ZIP format, first create a zero-compression "store" archive containing all of them, then ZIP that. This is necessary because each file in a ZIP archive is compressed from scratch without taking advantage of redundancies across different files.
By taking advantage of boilerplate deduplication, 30% should be a piece of cake.
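The cross-file deduplication effect is easy to demonstrate; here's a sketch in Python (standing in for the 7z workflow, purely to show the mechanism): five files sharing a large incompressible boilerplate block are packed once as individually-deflated zip entries, and once as a stored zip that is then LZMA-compressed as a whole.

```python
import io
import lzma
import os
import zipfile

# Five "documents" sharing 64 KB of incompressible boilerplate.
boilerplate = os.urandom(64 * 1024)
docs = {f"doc{i}.bin": boilerplate + os.urandom(256) for i in range(5)}

def zip_bytes(compression):
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression) as zf:
        for name, data in docs.items():
            zf.writestr(name, data)
    return buf.getvalue()

# Each entry compressed alone: no redundancy to exploit within one file.
deflated = zip_bytes(zipfile.ZIP_DEFLATED)
# Whole archive compressed at once: LZMA sees the repeated boilerplate.
stored_then_lzma = lzma.compress(zip_bytes(zipfile.ZIP_STORED))

print(len(deflated), len(stored_then_lzma))
```

The store-then-compress archive comes out far smaller than the per-entry deflated one, because LZMA's large dictionary spans all five files while each DEFLATE stream starts from scratch.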

iPhone - reading .epub files

I am working on an application for reading .epub files on the iPhone. Where can I find references or sample applications for unzipping and parsing the files? Can anyone point me to a good link? Thank you in advance.
An .epub file is just a .zip file. It contains a few directory files in XML format and the actual book content is usually XHTML. You can use Objective-Zip to unzip the .epub file and then use NSXMLParser to parse the XML files.
More info: Epub Format Construction Guide
On top of Ole's answer (that's a pretty good how-to guide), it's definitely worth reading the specification for the Open Container Format (OCF) - sorry, it's a Word file. It's the formal specification for the zip structure used.
In brief you parse the file by
Checking that it's plausibly valid by looking for the text 'mimetype' starting at byte 30 and the text 'application/epub+zip' starting at byte 38.
Extracting the file META-INF/container.xml from the zip
Parsing that file and extracting the value of the full-path attribute of the first rootfile element in it.
Load the referenced file (the full-path attribute is a URL relative to the root of zip file)
Parse that file. It contains all the metadata required to reference all the other content (mostly XHTML/CSS/images). Particularly you want to read the contents of the spine element which will list all content files in reading order.
If you want to do it right, you should probably also handle DTBook content as well.
If you want to do this right, you need to read and understand the Open Packaging Format (OPF) and Open Publication Structure (OPS) specifications as well.
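The first steps above are straightforward to prototype; here's a sketch in Python (the question targets Objective-C, but the logic translates directly to NSXMLParser and an unzip library): check the mimetype signature at the fixed offsets, then pull the rootfile's full-path out of META-INF/container.xml.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

def epub_rootfile(epub_bytes: bytes) -> str:
    # Step 1: 'mimetype' must be the first, stored entry; its name starts
    # at byte 30 and its content 'application/epub+zip' at byte 38.
    if epub_bytes[30:38] != b"mimetype" or epub_bytes[38:58] != b"application/epub+zip":
        raise ValueError("not an epub")
    # Steps 2-3: extract META-INF/container.xml and take the first
    # rootfile element's full-path attribute.
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        xml = zf.read("META-INF/container.xml")
    ns = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
    return ET.fromstring(xml).find(".//c:rootfile", ns).attrib["full-path"]

# Build a minimal epub-like archive in memory to try it out.
container = (
    '<container xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
    '<rootfiles><rootfile full-path="OEBPS/content.opf" '
    'media-type="application/oebps-package+xml"/></rootfiles></container>'
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("mimetype", "application/epub+zip")
    zf.writestr("META-INF/container.xml", container)

print(epub_rootfile(buf.getvalue()))  # OEBPS/content.opf
```

From there, step 4 onward is loading the referenced .opf file (a URL relative to the zip root) and reading its spine, as described above.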

NSFileHandle for binary files?

How can we read executable files into memory and then manipulate them? Can NSFileHandle work with executable files, and how?
Thank you.
Sure, NSFileHandle can manipulate anything you can read and write with a file descriptor. It just gives you raw access to the data in the file, though, so to work with executables you would need to implement things like a Mach-O parser if you want to do anything that requires semantic understanding of the file.
On the other hand, if you just want to do something like checksum the file you don't need much more infrastructure than NSFileHandle.
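For the checksum case, the raw-access pattern is just "read chunks, feed a hash"; a sketch in Python (NSFileHandle's readDataOfLength: plays the same role as the chunked read here):

```python
import hashlib
import tempfile

def sha256_of_file(path: str, chunk_size: int = 64 * 1024) -> str:
    """Checksum any binary file -- executables included -- chunk by chunk."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:            # raw byte access, like NSFileHandle
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

# Any bytes work; no understanding of the executable format is needed.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x7fELF" + b"\x00" * 100)
    path = tmp.name

print(sha256_of_file(path))
```

No parsing, no semantic understanding: the file is just a byte stream.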

what is the difference between tar and gZ?

When I decompress the file "file.tar.gz" in the iPhone SDK, it gives file.tar, but both the tar and the tar.gz are the same size. Any help please?
*.tar means that multiple files are combined to one. (Tape Archive)
*.gz means that the files are compressed as well. (GZip compression)
Edit: the fact that the size is the same doesn't say much. Sometimes files can't be compressed.
As Rhapsody said, tar is an archive containing multiple files, and gz is a file that is compressed using gzip. The reason two formats are used is that gzip only supports compressing one file - perhaps due to the UNIX philosophy that a program should do one thing and do it well.
In any case, if you have the option you may want to use bzip2 compression, which is more efficient (i.e., compresses files to a smaller size) than gzip.
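The distinction is easy to see with Python's tarfile module (standing in for whatever library the iPhone project uses): mode "w" produces the plain archive, "w:gz" the gzip-compressed one, and for compressible data only the latter shrinks.

```python
import os
import tarfile
import tempfile

root = tempfile.mkdtemp()
src = os.path.join(root, "file.txt")
with open(src, "w") as f:
    f.write("hello " * 10_000)             # highly compressible text

plain = os.path.join(root, "file.tar")
gzipped = os.path.join(root, "file.tar.gz")
with tarfile.open(plain, "w") as tar:      # archive only: roughly input size
    tar.add(src, arcname="file.txt")
with tarfile.open(gzipped, "w:gz") as tar: # archive + gzip compression
    tar.add(src, arcname="file.txt")

print(os.path.getsize(plain), os.path.getsize(gzipped))
```

If the contents are already compressed (JPEGs, zips), the two sizes will be nearly identical, which is likely what the questioner observed.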