How to use Perl Archive::Zip to recursively walk archive files?

I have a small Perl script that I use to search archives for members matching a name. I'd like to enhance this so that if it finds any members in the archive that are also archives (zip, jar, etc.), it will recursively scan those as well, looking for the original desired pattern.
I've looked through the Archive::Zip documentation, and I thought I saw how to do this. I noticed the fh() and readFromFileHandle() methods. However, in my testing, it appears that calling fh() on an archive member returns the file handle for the containing archive, not the member. Perhaps I'm doing it wrong, but I would appreciate an example of how to do this.

You can't read the contents of any sort of archive member (whether it is text, a picture, or another archive) without extracting it from the archive file.
Once you have identified a member that you want to view, you must call extractMember (or, more likely, extractMemberWithoutPaths if the file is to be temporary) to extract it to a disk file. Then you can create a new Archive::Zip object and read the new file while keeping the old one open.
You will presumably want to unlink the extracted file once you have catalogued its contents.
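A rough sketch of that approach (untested here): the find_members routine, the extension test, and the file names are all illustrative; only Archive::Zip, File::Temp, and their documented methods are assumed.

use strict;
use warnings;
use Archive::Zip qw( :ERROR_CODES );
use File::Temp qw( tempfile );

# Search $zipfile for member names matching $pattern, recursing into
# members that look like archives themselves.
sub find_members {
    my ($zipfile, $pattern, $prefix) = @_;
    $prefix //= '';

    my $zip = Archive::Zip->new;
    return unless $zip->read($zipfile) == AZ_OK;

    for my $member ($zip->members) {
        my $name = $member->fileName;
        print "$prefix$name\n" if $name =~ $pattern;

        # Nested archive: extract it to a temporary file, recurse, then unlink.
        if ($name =~ /\.(?:zip|jar|war|ear)$/i) {
            my ($fh, $tmp) = tempfile(SUFFIX => '.zip');
            close $fh;
            if ($zip->extractMemberWithoutPaths($member, $tmp) == AZ_OK) {
                find_members($tmp, $pattern, "$prefix$name!");
            }
            unlink $tmp;
        }
    }
}

find_members('outer.zip', qr/desired_name/);

Each nested archive is extracted to a throwaway temporary file, scanned with a fresh Archive::Zip object, and unlinked again, which is the extract/read/unlink cycle described above.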
Edit
I hadn't come across the Archive::Zip::MemberRead module before. It appears you were on the right track with readFromFileHandle. I would guess that it should work like this, but it would be awkward for me to test it at present.
use Archive::Zip qw( :ERROR_CODES );
use Archive::Zip::MemberRead;

my $zip = Archive::Zip->new;
$zip->read('myfile.zip') == AZ_OK or die "read failed\n";

# Treat the nested member as a file handle and read it as its own archive
my $zipfh  = Archive::Zip::MemberRead->new($zip, 'archive/path/to/member.zip');
my $newzip = Archive::Zip->new;
$newzip->readFromFileHandle($zipfh) == AZ_OK or die "readFromFileHandle failed\n";
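If that does work, the nested archive can then be searched just like the outer one; a hypothetical continuation (pattern-to-find is a placeholder for whatever you were originally matching):

# Search the nested archive's member names for the original pattern
for my $name ($newzip->memberNames) {
    print "member.zip -> $name\n" if $name =~ /pattern-to-find/;
}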

Related

Can I package a CSV file as a module resource?

I have a custom PowerShell module with a corresponding module manifest. In one command in my module I have a hard-coded array of hash tables. This was fine at first but occasionally I have to go back and add new hash tables to this hard-coded array and the array is becoming quite long. It is becoming difficult to manage this data in this way. What I would really like to do is move this collection out into an external resource (e.g. a CSV file) and have the command read the data from the CSV file. Actually, this is what I preferred from the beginning but it has only just now become painful enough that I feel compelled to figure out how to do this.
My question is how would I go about doing this? Or can it even be done? I have read quite a bit about module manifests but I do not ever recall reading anything that describes a way to specify additional resources in the manifest file or how to load those resources in such a way as to be 'private' to a module. I suppose I could just drop the CSV file in the module's folder with all the other PowerShell files and then maybe I can find it using $PSScriptRoot but that does not seem very 'official' (and I am not 100% sure it would work). Plus, by doing it that way there is nothing in the manifest that would suggest to somebody else that there are other resources that are required for the module to function properly.
Is there a best practice for something like this or am I coming at this all wrong?
The manifest definition does have a key for this; it is called FileList and is essentially an array of files. Since the description generated by the New-ModuleManifest cmdlet says, "List of all files packaged with this module," that is what I specified when I used it. (I didn't have to list the .psm1 file since it is listed elsewhere in the manifest.)
# List of all files packaged with this module
FileList = @(
'script1.ps1',
'script2.ps1',
'Microsoft.Web.Publishing.Tasks.Dll',
'transform.proj',
'some_file.xml'
)
As for locating the files, I simply use $PSScriptRoot, just like you suggested.
To my knowledge, there isn't anything that automatically handles installation of the module. It's still up to you to get it into a folder in the PSModulePath environment variable.

Grouping two files into one custom file-type

I am currently working on a simple tower defense game for iOS (using objective-c), which contains several maps/levels. However, as it is now, each map consists of an image file and a .plist file with information. My question is: is there any way I could create a custom file type (for example, *.map) that contains both the image and the information from the plist?
If this is possible, how do I implement this?
Thanks in advance!
You have several good choices for that:
The simplest solution would be to group the related files in subfolders: rather than having an xyz.map file, you could have an xyz subfolder and reference the files from it. You would not need any additional libraries for this, and you could use the same name for all your image files and all your level files, because they would be in separate folders.
You can make a zip archive with the files that you would like to combine, and unzip it before use. Here is a link to an answer referencing a library to do it.
You can use the tar format - here is a link to an answer referencing a library that supports it. You would be able to use the tar utility on OS X to group images with plists on your workstation.
Finally, you can define a format of your own: store the length of the first file in the first four bytes, then store the content of the first file, and then the second. You would need to write a utility for combining the two files into one. This sounds like the hardest choice to implement.
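To make that last option concrete, here is a rough sketch of the length-prefixed layout, shown in Perl purely to illustrate the byte format; the file and subroutine names are made up, and an Objective-C version would follow the same structure (for example, reading the container into an NSData and splitting it with subdataWithRange:).

use strict;
use warnings;

# Pack image + plist into one length-prefixed .map container (names are placeholders).
sub pack_map {
    my ($image, $plist, $out) = @_;
    local $/;                                     # slurp whole files
    open my $img, '<:raw', $image or die $!;
    open my $pl,  '<:raw', $plist or die $!;
    my ($img_data, $plist_data) = (scalar <$img>, scalar <$pl>);
    open my $fh, '>:raw', $out or die $!;
    print $fh pack('N', length $img_data), $img_data, $plist_data;
}

# Unpack: the first four bytes give the image length; the rest is the plist.
sub unpack_map {
    my ($in) = @_;
    local $/;
    open my $fh, '<:raw', $in or die $!;
    my $data = <$fh>;
    my $img_len = unpack('N', substr($data, 0, 4));
    return (substr($data, 4, $img_len), substr($data, 4 + $img_len));
}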

Delete multiple files with names containing a substring efficiently

I would like to delete multiple files that contain a substring. Say, for example, I would like to delete all the files that have the substring my. Assume my directory contains four files: photo.jpg, myPhoto.jpg, beachMyPhoto.jpg, and anyPhoto.jpg. Since the search term is my, the files I want to delete are myPhoto.jpg and beachMyPhoto.jpg (case insensitive).
My proposed solution (which I know how to do) is to use the NSFileManager class, call contentsOfDirectoryAtPath:error: to read all the directory contents, and then loop over the results looking for a hit. If a hit is found, I delete that file.
What I don't like about this solution is that it is not very efficient, especially if the directory contains many files and only a few of them match. Is there a more efficient way to do this?
If you don't want a big array loaded into memory, you can try -[NSFileManager enumeratorAtURL:includingPropertiesForKeys:options:errorHandler:]. Since you only want the immediate contents of the directory, you would invoke -[NSDirectoryEnumerator skipDescendants] for each directory that it returns.
If your concern is iterating over all of the items in the directory, testing for your match pattern, well that's unavoidable. Any technique you would hope to use has to somehow iterate over all of the items in the directory and test for a match. The only question is whether that iteration is exposed to you or not. In Cocoa, it is. You could drop down to the glob() function if you want an alternative where it isn't.

Pipe multiple files into a zip file

I have several files in a GridFS document store, and what I'd like to do is pipe this data into a zip file via stdin in NodeJS, so that I end up with a zip file containing all of these files.
My question is: how can I give the files valid filenames inside the zip file? I think I need to emulate/fake a file header containing the filename?
Any help is appreciated!
Thanks
I had problems writing zip files with Node.js not long ago. I ended up doing something similar to what is described in "Zip archives in node.js".
I can't help you directly with your problem, but at least I hope I can point out some things:
Don't try to use node-archive. Even though the description says it allows you to create zip files, the moment I read the source code (since documentation is nonexistent) I realized that's simply not true: it only exposes methods for reading.
Using zip by spawning a process, as recommended in the linked answer, seems to be the best way. Something that would work is copying the files to a local folder with whatever names you desire, calling the zip command, and then deleting the files afterwards.
The other option, which seems OK, is to use zipper (https://github.com/rubenv/zipper, although it's better to just install it via npm). The reason I'm reluctant to use it is that there's not much flexibility; it seems to have been written in a day and hasn't been modified since the first commit, so I'm not sure it will receive maintenance (sure, you could just fork it...).
I swear, the day I have an entire free weekend with no work I will write a module that does this as completely as possible. It's silly that there isn't one, and it shouldn't be that much of a struggle. End of rant.
Edit:
Not sure if it was there before, but now I've been using the node-compress module (also using gzippo). It works fine.

How do you compare the content of two archive files programmatically?

I'm doing some testing to ensure that the all-in-one zip file I created with a script has the same content as a few zip files that I must manually create via a web interface. The zips will therefore have different folder structures.
Of course I could manually extract them and scan them with my powerful eyeball technique, or, lazier still, write a script to do that, but before I invest more time and get accused by my boss of stealing company time, I'm asking if there's a better way to do this.
I'm using a Perl LAMP stack, by the way.
Thanks.
You can use Perl's Archive::Zip or Python's zipfile to extract the filenames, sizes, and CRC checksums of the files in the archives. Create a file which contains the results sorted by file name (ignoring the path).
For your smaller ZIPs, merge the results of the script (cat list1 list2 list3 | sort).
Now, you can use diff to compare the results.
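On the Perl side, a small script along these lines could produce such a list with Archive::Zip (the script name and output format are only a suggestion):

#!/usr/bin/perl
# ziplist.pl - print "name size crc" for each member, with the path stripped
use strict;
use warnings;
use Archive::Zip qw( :ERROR_CODES );
use File::Basename qw( basename );

my $zip = Archive::Zip->new;
$zip->read($ARGV[0]) == AZ_OK or die "cannot read $ARGV[0]\n";

for my $member ($zip->members) {
    next if $member->isDirectory;
    printf "%s %d %s\n",
        basename($member->fileName),
        $member->uncompressedSize,
        $member->crc32String;
}

Run it once per archive (perl ziplist.pl big.zip | sort > big.txt), merge and sort the lists for the smaller archives as above, and diff the two results.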
I can wholeheartedly recommend Beyond Compare. Unless you're really getting underpaid, it's the biggest bang for your (boss's) buck.
[Edit] I seem to have overlooked the different folder structure, sorry about that. Beyond Compare can compare all files in folders with the same folder structure. It does not have (I believe) the intelligence to go searching for matching files in different folders.
Regards,
Lieven
Create a CRC checksum for your files.
If the checksum is the same for the original files and the unzipped files, you can be sure the files are the same. This even works for non-text data.
A checksum can easily be created with an external program such as "SFV Checker" or programmatically (.NET and Java, for example, include libraries to do this).
Taking a cue from Carra's answer: if A.zip is your single big archive and B.zip is the archive generated through the web interface, then use the following algorithm:
Extract all files from A.zip and recursively (with respect to folders) compute the checksum of the files in the folder where the contents were extracted (using cksum, md5sum, etc.); sort this information (pipe it through sort) and save it to a file (say A.txt).
Do the same for B.zip and generate B.txt.
Compare A.txt with B.txt; they should be exactly the same (a sketch of the checksum step follows below).
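A rough Perl sketch of that checksum step, assuming File::Find and Digest::MD5 (the script name and directory argument are placeholders):

#!/usr/bin/perl
# checksum_tree.pl - md5 every file under a directory, keyed by basename
# so that the folder structure doesn't matter
use strict;
use warnings;
use File::Find;
use Digest::MD5;

my @rows;
find(sub {
    return unless -f;
    open my $fh, '<:raw', $_ or die "$File::Find::name: $!";
    push @rows, "$_ " . Digest::MD5->new->addfile($fh)->hexdigest . "\n";
}, $ARGV[0]);

print sort @rows;

Running perl checksum_tree.pl extracted_A > A.txt, doing likewise for B, and then running diff A.txt B.txt completes the comparison.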
OR
Use unzip -l to get file/directory lists for both (zip) archives, then flatten the hierarchy of the user-generated zip file and compare it with the contents of your script-generated zip file using something like diff. By flattening the hierarchy I mean you may need to do some kind of pre-processing on one or both lists before you can do a meaningful comparison with diff.