Get Maximum Compression from 7zip compression algorithm - command-line

I am trying to compress some of my large document files, but most of the files compress by only 10% at most. I am using 7-Zip terminal commands.
7z a filename.7z -m0=LZMA -mx=9 -mmt=on -aoa -mfb=64 filename.pptx
Any suggestions on changing the parameters? I need at least a 30% compression ratio.

.pptx and .docx files are internally .zip archives, so you cannot expect a lot of compression on an already-compressed file.
The documentation states that LZMA2 handles incompressible data better, so you can try:
7z a -m0=lzma2 -mx filename.7z filename.pptx
But the required 30% is almost unreachable.

If you really need that compression, you could use the fact that a pptx is just a fancy zip file:
Unzip the pptx, then compress it with 7-Zip. To recover an equivalent (but not identical) pptx, decompress with 7-Zip and recompress with zip.
There are probably some complications: with epub, for example, there is a certain file that must be stored uncompressed as the first file in the archive, at a certain offset from the start. I'm not familiar with pptx, but it might have similar requirements.
I think it's unlikely that the small reduction in file size is worth the trouble, but it's the only approach I can think of.
Depending on what's responsible for the size of the pptx, you could also try to compress the contained files: for example, by recompressing PNG files with a better compressor, stripping unnecessary data (e.g. metadata or change histories), or applying lossy compression with lower quality settings to JPEG files.
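A minimal Python sketch of that unzip-and-recompress round trip, using the stdlib's lzma in place of the 7z container (function and file names are mine, and a real tool would also want to preserve zip metadata such as timestamps):

```python
import io
import lzma
import zipfile

def pack(pptx_path: str, out_path: str) -> None:
    """Re-store the pptx's entries uncompressed, then LZMA-compress the result."""
    buf = io.BytesIO()
    with zipfile.ZipFile(pptx_path) as zin, \
         zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zout:
        for name in zin.namelist():
            zout.writestr(name, zin.read(name))  # raw data, no deflate
    with open(out_path, "wb") as f:
        f.write(lzma.compress(buf.getvalue(), preset=9))

def unpack(in_path: str, pptx_path: str) -> None:
    """Rebuild an equivalent (but not byte-identical) pptx with normal deflate."""
    with open(in_path, "rb") as f:
        raw = lzma.decompress(f.read())
    with zipfile.ZipFile(io.BytesIO(raw)) as zin, \
         zipfile.ZipFile(pptx_path, "w", zipfile.ZIP_DEFLATED) as zout:
        for name in zin.namelist():
            zout.writestr(name, zin.read(name))
```

Whether LZMA on the raw XML beats the pptx's built-in deflate enough to matter depends entirely on the content.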

Well, just an idea for maximum compression:
'recompress' these .zip archives (.docx, .pptx, .jar, ...) using -m0 (store = no compression), and then
apply LZMA2 to them.
LZMA2 is pretty good. However, if the file contains many JPEGs, consider giving the open-source packer PeaZip, or more specifically paq8o, a try. PAQ8 has a built-in JPEG compressor and supports range compression, so it also copes with JPEGs that are inside some other file. WinZip's zipx, in contrast, requires standalone JPEG files and is useless in this case.
But again, to make PAQ work effectively on your target file, you'll need to 'null' the zip/deflate compression, i.e. turn it into an uncompressed zip.
Well, PAQ is probably a little exotic, but in my eyes it's more honest and transparent than zipx. PAQ is unsupported, so, as always, it's a good idea to search for whatever you don't have or don't know, and you will find something.
Zipx, in contrast, may appear a little deceptive: it looks like a normal zip, and its files are listed properly in WinRAR or 7-Zip, but extracting the JPEGs will fail, so an inexperienced user may think the zip is corrupted. It is much harder to find out that it is a zipx, which so far only WinZip or The Unarchiver (unar.exe) can handle properly.
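The 'turn it into an uncompressed zip' step can be sketched with Python's stdlib (the function name is mine); the result can then be handed to LZMA2 or paq8o:

```python
import zipfile

def null_deflate(src: str, dst: str) -> None:
    """Rewrite a zip-based file (.docx, .pptx, .jar, ...) so every entry
    is stored uncompressed, giving PAQ/LZMA2 raw data to work on."""
    with zipfile.ZipFile(src) as zin, \
         zipfile.ZipFile(dst, "w", zipfile.ZIP_STORED) as zout:
        for name in zin.namelist():
            zout.writestr(name, zin.read(name))
```

Afterwards, something like 7z a -m0=lzma2 -mx=9 out.7z stored.zip (or a paq8o run) operates on raw data instead of already-deflated streams.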

PPTX, XLSX, and DOCX files can indeed be compressed effectively if there are many of them. By unzipping each of them into their directories, an archiver can find commonalities between them, deduplicating the boilerplate XML as well as any common text between them.
If you must use the ZIP format, first create a zero-compression "store" archive containing all of them, then ZIP that. This is necessary because each file in a ZIP archive is compressed from scratch without taking advantage of redundancies across different files.
By taking advantage of boilerplate deduplication, 30% should be a piece of cake.
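A toy demonstration of the principle, with synthetic data standing in for the shared boilerplate XML:

```python
import os
import zlib

# Shared "boilerplate" appearing in both documents, plus unique tails.
boiler = os.urandom(8000)  # incompressible on its own
doc_a = boiler + b"unique text A"
doc_b = boiler + b"unique text B"

# Like a ZIP: each entry is compressed from scratch, so the
# redundancy between the two documents is wasted.
separate = len(zlib.compress(doc_a, 9)) + len(zlib.compress(doc_b, 9))

# Like store-then-compress: one pass sees both copies of the
# boilerplate and collapses the second into a back-reference.
together = len(zlib.compress(doc_a + doc_b, 9))

assert together < separate
```

The same effect is why a solid 7z archive of many unpacked Office documents can easily beat compressing each one individually.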

Related

copy (multiple) files only if filesize is smaller

I'm trying to make my image reference library take up less space. I know how to make Photoshop batch-save directories of images with a particular amount of compression. BUT some of my images were originally saved with more compression than I would have used.
So I wind up with two directories of images: some of the newer files have a larger filesize, some smaller, and some the same. I want to copy the new images over into the old directory, excluding any files that have a larger filesize (or the same, though those probably aren't numerous enough for me to care about the extra processing time).
I obviously don't want to sit there and parse through each file, but other than that I'm not picky about how it gets tackled.
running Windows 10, btw.
We have similar situations. Instead of Photoshop, I use FFmpeg (with its qscale option) to batch re-encode multiple images into a subfolder, then use XXCOPY to overwrite only the larger original source images. In fact, I ended up creating a batch file that lets FFmpeg do the batch re-encoding (using its "best" qscale setting), then lets ExifTool batch-copy the metadata to the newly encoded images, then lets XXCOPY copy only the smaller newly created images. It's all automated, and the "new" folder, along with its leftover newly created but larger-sized images, is deleted too. This saves me considerable disk space, as I have many images categorized/grouped in many different folders. But you should make a test run first, or back up your images. I hope this works for you.
Here is my XXCOPY command line:
xxcopy "C:\SOURCE" "C:\DESTINATION" /s /bzs /y
The original post/forum where I learned this from is:
overwrite only files wich are smaller
https://groups.google.com/forum/#!topic/alt.msdos.batch.nt/Agooyf23kFw
Just to add: XXCOPY can also do it if the larger file size is wanted instead, which I think is /BZL. I think that's also mentioned in the original post/forum.
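For reference, the copy-only-if-smaller step can also be sketched portably in Python (the function name is mine; XXCOPY's /bzs switch is the Windows-native way to do it):

```python
import os
import shutil

def copy_if_smaller(src_dir: str, dst_dir: str) -> None:
    """Copy each file from src_dir into dst_dir only when it is missing
    there or strictly smaller than the existing copy."""
    for name in os.listdir(src_dir):
        src = os.path.join(src_dir, name)
        dst = os.path.join(dst_dir, name)
        if not os.path.isfile(src):
            continue  # skip subdirectories
        if not os.path.exists(dst) or os.path.getsize(src) < os.path.getsize(dst):
            shutil.copy2(src, dst)  # copy2 preserves timestamps
```

Flipping the comparison to > would give the keep-the-larger behavior mentioned above.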

Getting a list of all files inside a zip/rar/7z file with Scala

Is there a way to get a list of all the files inside a compressed file without decompressing it?
I don't mind using a Java library but all the solutions I found performed a decompression.
Also, if it is relevant: I know that the compressed file has subdirectories in it, and I want to get the files from those as well.
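For plain zip archives this is possible because the entry list lives in the central directory, which can be read without decompressing any entry; java.util.zip.ZipFile in Java/Scala works that way, and so does Python's stdlib, sketched here for illustration (for 7z you would need a library such as Apache Commons Compress):

```python
import zipfile

def list_entries(path: str) -> list[str]:
    """Return all entry names in a zip archive, including files inside
    subdirectories, without decompressing anything: only the central
    directory at the end of the file is read."""
    with zipfile.ZipFile(path) as z:
        return z.namelist()
```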

Generate PNG file with pre-known CRC

Is it possible to create a PNG file with a predefined CRC? (kind of a programming challenge..)
I have a python script to generate hex codes with the target CRC, but I'm not sure how to make a valid PNG out of it.
BTW, it may be that I'm talking nonsense, but it sounds possible in theory (right?)
You can use spoof.c to do that, either at the level of a PNG chunk or at the level of the entire file. (Note that a PNG file does not contain a CRC of the whole thing, only CRCs of the chunks.)
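For experimenting, it helps to know exactly what a PNG chunk CRC covers: the 4-byte chunk type plus the data, but not the length field. A stdlib sketch that assembles a minimal 1x1 grayscale PNG from hand-built chunks (layout per the PNG specification):

```python
import struct
import zlib

def png_chunk(ctype: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC-32 of type+data."""
    crc = zlib.crc32(ctype + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)

sig = b"\x89PNG\r\n\x1a\n"
# IHDR: width=1, height=1, bit depth 8, grayscale, default methods
ihdr = png_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
# IDAT: one scanline = filter byte + one pixel byte, zlib-compressed
idat = png_chunk(b"IDAT", zlib.compress(b"\x00\x00"))
iend = png_chunk(b"IEND", b"")
png = sig + ihdr + idat + iend
```

A tool like spoof.c then works by choosing which bits to flip (e.g. in trailing data) so that a chunk's CRC comes out to the target value.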

Method to decompress a PDF (non-Adobe) while retaining form fields?

I found a similar question that involves Acrobat, but in this case the PDF was made with a combination of MS Word and CenoPDF v3, with which I'm unfamiliar. Additionally the PDF is version 1.3. I'd like to decompress it, to see its low-level workings and make some changes. It's easy with GhostScript's -dCompressPages=false parameter, but that simultaneously strips all the fill-in form functionality. Is there a method for decompressing the file while leaving everything else intact? A quick search of the docs for tcpdf and fpdi (cited in the link) didn't reveal a compression option.
Ghostscript and pdfwrite aren't a good combination. The PDF file you get out is NOT the same as the one you put in. This is because of the way Ghostscript and pdfwrite work: the input is fully interpreted into a sequence of graphics primitives, which are sent to the Ghostscript graphics library. These are then sent to the requested device; most devices render the result to a bitmap, but the pdfwrite family reassembles those graphics primitives into a new PDF file.
Note that the contents of the new PDF file have no relationship to the original, other than the appearance when rendered. Ghostscript and pdfwrite do maintain much of the non-marking content of PDF files such as hyperlinks and so on (which obviously don't get turned into graphics primitives), by interpreting them into pdfmark operations (an extension to the PostScript language defined by Adobe). However, even if Ghostscript and pdfwrite maintained all this content, the resulting PDF file wouldn't be the same as the original one decompressed....
There are tools which will decompress PDF files, and I would recommend one of our other products, MuPDF. Part of MuPDF is mutool, and "mutool clean -d in.pdf out.pdf" will decompress pretty much everything in a PDF file.
QPDF can decompress PDF documents (among other things). I used this tool in the past and it preserved forms and data.
The tool has some issues with large PDFs (decompression can take too much time and memory), and it can produce incomplete output (with warnings in the console) for some partially broken or nonstandard PDFs.

What is the difference between tar and gz?

When I decompress the file "file.tar.gz" in the iPhone SDK, it gives me file.tar, but the tar and the tar.gz are the same size. Any help, please?
*.tar means that multiple files are combined to one. (Tape Archive)
*.gz means that the files are compressed as well. (GZip compression)
Edit: the fact that the size is the same doesn't say a lot. Sometimes files simply can't be compressed.
As Rhapsody said, tar is an archive containing multiple files, and gz is a file that is compressed using gzip. The reason why two formats are used is because gzip only supports compressing one file - perhaps due to the UNIX philosophy that a program should do one thing, and do it well.
In any case, if you have the option, you may want to use bzip2 compression, which is more efficient (i.e., compresses files to a smaller size) than gzip.
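Both points, the tar/gz split and the "same size" effect, can be demonstrated with Python's stdlib (synthetic data):

```python
import gzip
import io
import os
import tarfile

payload = b"the quick brown fox\n" * 500  # highly compressible text

def make_tar(mode: str) -> bytes:
    """Build an in-memory tar ("w") or tar.gz ("w:gz") holding one file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        info = tarfile.TarInfo("file.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()

plain = make_tar("w")       # .tar: files bundled, no compression
gzipped = make_tar("w:gz")  # .tar.gz: the same archive run through gzip
assert len(gzipped) < len(plain)

# Incompressible input is why a tar and its tar.gz can end up the same size:
noise = os.urandom(10_000)
assert len(gzip.compress(noise)) >= len(noise)  # gzip gains nothing here
```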