Reduce relocatable win32 Perl to as few files and bytes as possible

I'm trying to use a perl program on a Windows HTCondor computing cluster. The way HTCondor on Windows works is that it copies all dependencies into a temporary directory (used as a chroot of sorts) and then deletes the directory after the specified outputs are moved to a designated place.
If I take only perl.exe and perl514.dll and make a job like this: perl -e "print qq/hello\n/" and tell the cluster to run it 200 times, then each replication winds up taking about 15 seconds, which is acceptable overhead. That's almost all time spent repeatedly copying the files over the network and then deleting them. echo_hello.bat run 200 times takes more like two seconds per replication.
The problem I have is that when I try to use my full-blown Perl distribution of 55 MB and 2,289 files, a single "hello" rep takes something like four minutes of copying and deleting, which is unacceptable. When I try to do many runs, the disks on the machines grind to a halt trying to concurrently handle all the file operations across all the reps, so it doesn't work at all. I don't know how long it might take to eventually finish, because I gave up after half an hour with no jobs finished.
I figured PAR::Packer might fix the issue, but nope. I tried print_hello.exe created like this: pp -o print_hello.exe -e "print qq/hello\n/". It still makes things grind to a halt, apparently by swamping the filesystem. I think a PAR::Packer executable makes a ton of temporary files as it pulls out files it needs from the archive. I think the windows file system totally chokes when there are a bunch of concurrent small file operations.
So how can I go about cutting down the perl I built to something like 6MB and a dozen files? I'm really only using a tiny number of core modules and don't need most of the crap in bin and lib, but I have no idea how to proceed ripping out stuff in a sane way.
Is there an automated way to strip away un-needed files and modules?
I know TCL has a bunch of facilities for packing files into a single uncompressed archive that can then be accessed through a "virtual filesystem" without expanding the file. Is there some way to do this with perl itself sort of like with PAR? The problem is PAR compresses everything and then has to extract to temporary files, rather than directly work through a virtual filesystem layer. (If I understand correctly.)
My usage of perl is actually as a scripting layer. It's embedded in a simulation, so I'm really running my_simulation.exe, which depends on perl514.dll, but you get the idea. I also cannot realistically do anything to the HTCondor cluster other than use it. So there's no need to think outside the box on what I should be using instead of perl or what I could administratively tweak in Windows and HTCondor, thanks.

You can use Module::ScanDeps to get a list of the actual dependencies of your Perl program. I found it terrible that PAR::Packer took a significant amount of time unpacking the whole application, so I decided to build the executable myself.
Here is my ready-to-use script, which gathers the Perl dependencies into a directory; it might be useful for reducing the number of Perl modules, e.g. by manually removing some dependencies after copying.
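Something along these lines should work (a minimal sketch, assuming Module::ScanDeps plus the core File::Copy, File::Path and File::Basename modules; my_script.pl and deps_dir are placeholder names):

#!/usr/bin/perl
# Sketch: collect everything a script depends on into one directory tree.
use strict;
use warnings;
use Module::ScanDeps;
use File::Copy qw(copy);
use File::Path qw(make_path);
use File::Basename qw(dirname);

my $script = shift // 'my_script.pl';   # script to scan (placeholder name)
my $target = shift // 'deps_dir';       # directory to collect modules into (placeholder name)

# recurse => 1 follows dependencies of dependencies
my $deps = scan_deps(files => [$script], recurse => 1);

for my $key (sort keys %$deps) {
    my $src = $deps->{$key}{file};    # absolute path of the dependency
    my $dst = "$target/lib/$key";     # keep the %INC-style relative layout under lib/
    make_path(dirname($dst));
    copy($src, $dst) or warn "Could not copy $src: $!";
}
print "Copied ", scalar(keys %$deps), " files into $target/lib\n";

Run it against your real entry script, prune anything you know you don't need from deps_dir/lib, and point the embedded perl at it via PERL5LIB (or -I).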
In theory (I have never tried it), your next step could be to merge all pure-Perl dependencies into a single file (like deps.pm), although that might be non-trivial due to Perl's autoload magic and some other tricks.

You can list the modules that are needed by your program using the very nice ListDependencies module
To my knowledge it isn't downloadable anywhere, but it is simple to copy and paste into your own ListDependencies.pm file
You should read the POD documentation within the module for usage instructions

Related

Is there a simpler way to check if multiple files have been modified?

I am working on a project with around 40 script files, and I am going to package the scripts to distribute them to my clients (kind of like a version release). I don't want my clients to change my scripts (or at least I want to make it hard for them to do so).
I have made certain files read-only by setting the execution policy, but the clients can simply set them back to writable, so I want to add a few lines of code (preferably fewer than 5) to check that the scripts have not been modified.
I am aware that using the LastWriteTime property would do it, but I would need to do this for each of the scripts (a hash table to keep track of the LastWriteTime of each file would be too long and not clean enough), which is not ideal.
I have also considered Get-FileHash, but I am concerned that the hash will change each time I run it.
As you have already realized, it is impossible to prevent clients from modifying scripts in a water-tight way. Bruce Schneier sums it up nicely: "Trying to make bits uncopyable is like trying to make water not wet."
To run a script, one needs to copy it at least into the system's memory - and at that point you've lost control. What's to prevent copying the script to an alternate location and editing it before running? Nothing, unless you have tight control over the client. Should you have tight control, setting the execution policy to signed prevents running unsigned scripts - until the client starts PowerShell from the command line with the -ExecutionPolicy Bypass switch. The execution policy isn't a security system that restricts user actions.
There are a few approaches that can hinder editing, but a determined hacker can overcome those. So the root question is: why? Why shouldn't the clients modify the scripts? Is it to protect some IP? Are they trying to achieve something the scripts are not designed to? Something else?
A simple solution is to use a tool like PS2EXE that converts a PowerShell script into an executable. The contents can be extracted and modified, but that requires at least a bit more effort than opening Notepad.
Another approach would be modules. Distribute the scripts as a PowerShell module that the clients will import. Editing a module requires a bit more effort than editing a simple script file, but is quite possible too.
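If all you need is a lightweight tamper check rather than real protection, one option is to ship a baseline of hashes and compare against it at startup. A minimal sketch (hashes.csv and the *.ps1 layout are just assumptions; Get-FileHash is deterministic for identical file contents, so the stored values won't drift between runs):

# Create the baseline once, at release time:
Get-ChildItem "$PSScriptRoot\*.ps1" |
    ForEach-Object { [pscustomobject]@{ Name = $_.Name; Hash = (Get-FileHash $_.FullName -Algorithm SHA256).Hash } } |
    Export-Csv "$PSScriptRoot\hashes.csv" -NoTypeInformation

# Check at startup (a few lines, as requested):
foreach ($entry in Import-Csv "$PSScriptRoot\hashes.csv") {
    $current = (Get-FileHash (Join-Path $PSScriptRoot $entry.Name) -Algorithm SHA256).Hash
    if ($current -ne $entry.Hash) { throw "$($entry.Name) has been modified." }
}

Of course, a client can regenerate hashes.csv or edit the check out, so this only raises the bar, as described above.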

Opening .py files with micropython on TI Nspire

I uploaded Fabian Vogt's micropython port to my TI Nspire CX CAS, together with a couple of *.py.tns files to try. I can't find a way to load/launch those files.
As micropython does not include the os module, I can't use os.chdir to change the current directory and load the *.py files from the Python shell. From the Python shell I tried open("documents/mydirectory/myfile") with different extensions, .py or .py.tns, without success.
I don't think the Nspire has anything like a terminal command line either.
Thanks for your help,
There are 2 ways that you could do this, one easy way and one tedious way.
1. Map .py to micropython in your ndless.cfg
(ndless.cfg should be at /documents/ndless/ndless.cfg)
Like so:
ext.xxx=program-name
ext.xxx=program-name
ext.txt=nTxt
ext.py=micropython
ext.xxx=program-name
ext.xxx=program-name
You can edit this file either by copying it back and forth from your computer using TiLP or the official software, or you can edit it on-calc using nTxt. (This requires a bit of fiddling: you make a copy of ndless.cfg as, say, ndless.txt, so that the existing mappings can still open the copied file for editing.)
Ndless should come with a standard ndless.cfg containing basic bindings for nTxt and a few popular emulators. If you don't have one, get the standard one here. It will scan all directories (at least /documents/*, AFAIK) for programs. I've found that removing lines related to programs not on your Nspire will decrease load time.
2. Proper way to run a file in Python
To run a file in Python, you should do something like this:
with open("/documents/helloworld.py.tns","r") as file:
    exec(file.read())
This will properly close the file after executing, which I've noticed is quite important on the Nspire, as leaving files open has given me trouble before. Of course, if you'd like, you can do exec(open("...","r").read()) and then handle closing the file yourself, but be warned: bad things can happen if you forget.
Also, you must remember to add the leading / and the .tns extension, or else strange things will happen, especially with writing to files.
That's about it! Feel free to ask more questions if needed, I'll be watching the ti-nspire tag.
(Just realized this question is quite old, but I guess it still might be helpful for others who end up on empty questions months later while trying to figure something out :P)

Can I assume an executable file as a snapshot image of an execution state?

I was reading a Unix manual page (http://pubs.opengroup.org/onlinepubs/009695399/functions/posix_spawn.html), and there was a mention of execution:
The new process image shall be constructed from a regular executable
file called the new process image file.
The expression process image caught my eyes.
I had thought that an executable file was just a kind of sequence of commands, just as the word "program" suggests. But actually, I don't know the concept or structure of executable files, and from that mention it sounded as if an executable file could be an image of an execution state.
Could you explain this to me - the concept and structure of regular executable files nowadays, in any OS?
Usually an executable file contains not only instructions but also global data, read-only data, and more. I suggest you take a brief look at, for example, the ELF format widely used in Unix-like operating systems, or the PE format used in Windows.
The OS may also need, for example, to replace some addresses of functions (jump targets) with the real addresses of those functions in memory, although this technique is probably not used any more in common OSes. In any case, there can be more work to do than just copying the file into memory and starting execution from the first byte.
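If you want to see that structure for yourself, you can dump the section layout of a binary (the file names here are only examples):

objdump -h /bin/ls        # on Linux: lists sections such as .text (code), .data, .rodata, .bss
readelf -S /bin/ls        # more detailed ELF section headers
dumpbin /headers app.exe  # on Windows, with the Visual Studio tools, for PE files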

Command line CSV viewer with column-alignment for LARGE files

I would like to view my CSV files in a column-aligned format from the command line, with something like less, but my CSV files are sometimes gigabytes big, and I'm using a little computer (Netbook, 1GB RAM, 8GB HD, 1GHz processor), so I don't want to waste a lot of memory or processing power viewing the file.
I mention that I'd like to use something like less because I would like to be able to navigate around within the file.
cat FILE | column -s, -t | less is one thought, but cat is still going to try to print the whole file and I'm not sure how much buffering the pipes will use (if any) or what sort of caching less employs.
This question is similar to this other question, but I'm specifically interested in viewing large files using minimal resources preferably already on the machine. I don't presently use VI or EMACS, and think they'd both be overkill here. VI, for instance, would be a 27MB install for a utility acting merely as a viewer.
First of all, less can open oversized files. Second, both vim (which I use with the Largefile plugin and with files over 8 GB) and emacs can do it.
But... most of the time, viewing a big file in an 80x40 (or a bit bigger) terminal is useless... so you should filter it with something like (f)grep or process it with awk. If you want only the start or end, then there are head and tail.
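For example, something along these lines stays cheap because only the slice you ask for is ever formatted (data.csv and the field-3 filter are just examples):

head -n 1000 data.csv | column -s, -t | less -S
awk -F, '$3 > 100' data.csv | column -s, -t | less -S

less -S chops long lines instead of wrapping them, which helps a lot with wide CSVs.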
HTH
Check the tail / head commands.
Or even better, download the Vim source and compile it. That should be easy enough. The version 5.8 source is 1 MB before decompressing (4 MB after). Enjoy.
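If you do go that route, the build is the usual configure-and-make sequence (a sketch; the install prefix is just an example, and the available configure options differ between Vim versions):

./configure --prefix=$HOME/vim
make
make install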

Hash of an .exe file

I'm wondering whether I will ever get a different result when producing a checksum on an .exe file before and then while or after running that file. I'm more concerned with common practice (such as producing a SHA hash of a popular app like firefox.exe) than with boundary cases, but both are interesting. Thanks.
The hash of a file should be constant for as long as the file is identical (i.e. contains only the same bytes, in the same order). It's very rare to find applications that rewrite their on-disk representation at runtime, so the hash should be constant. There are self-modifying programs, but they tend to operate on the in-memory loaded copy of their code, rather than the disk copy.
Edit: We should consider "self-updating" applications, but these tend to launch a little helper program to download and update the core application. It's difficult (especially on Windows) to update an executable whilst it's running. UNIX systems tend to use copy-on-write, so it's possible that a software update might change your executable under your feet - but again, this is a "corner case".
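It's easy enough to verify on Windows itself: hash the file before launching it and again while it is running, then compare (the Firefox path is only an example):

certutil -hashfile "C:\Program Files\Mozilla Firefox\firefox.exe" SHA256
Get-FileHash "C:\Program Files\Mozilla Firefox\firefox.exe" -Algorithm SHA256   # PowerShell equivalent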
The hash will only change if the exe changes. That will only happen if the app modifies itself, which isn't going to happen on windows without the app restarting. Firefox might update itself (including a restart), but apart from such cases, the hash will remain the same.
The hash will change if the file changes.
EXE files rarely change on their own. firefox.exe would change if the user updates to a new version.
You can check the "date modified" attribute of an EXE file (like firefox.exe) after running it to see whether it has changed, but you'll probably find it hasn't.
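For instance, from PowerShell (the path is again only an example):

(Get-Item "C:\Program Files\Mozilla Firefox\firefox.exe").LastWriteTime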
If you mean the modification of the last-access time: don't worry, that's stored at the filesystem level, not within the file, so the hash will remain the same.