Hash of an .exe file

I'm wondering whether I will ever get a different result when producing a checksum of an .exe file before, and then while or after, running that file. I'm more concerned with common practice (such as producing a SHA hash of a popular app like firefox.exe) than with boundary cases, but both are interesting. Thanks.

The hash of a file should be constant for as long as the file is identical (i.e. contains exactly the same bytes, in the same order). It's very rare to find applications that rewrite their on-disk representation at runtime, so the hash should be constant. There are self-modifying programs, but they tend to operate on the in-memory loaded copy of their code, rather than the disk copy.
Edit: We should consider "self-updating" applications, but these tend to launch a little helper program to download and update the core application. It's difficult (especially on Windows) to update an executable whilst it's running. UNIX systems tend to use copy-on-write semantics, so it's possible that a software update might change your executable under your feet - but again, this is a corner case.
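To check this yourself, you can hash the file, launch the program, and hash the file again while it's running; the two digests should match. A minimal sketch in Python, with a hypothetical install path that you would adjust:

```python
import hashlib

# Hypothetical path - point this at whatever executable you want to check.
EXE_PATH = r"C:\Program Files\Mozilla Firefox\firefox.exe"

def sha256_of(path, chunk_size=1024 * 1024):
    """Hash the file in chunks so large binaries don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

before = sha256_of(EXE_PATH)
input("Start the application now, then press Enter to hash it again...")
after = sha256_of(EXE_PATH)
print("unchanged" if before == after else "changed")
```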

The hash will only change if the exe changes. That will only happen if the app modifies itself, which isn't going to happen on Windows without the app restarting. Firefox might update itself (including a restart), but apart from such cases, the hash will remain the same.

The hash will change if the file changes.
EXE files rarely change on their own. firefox.exe would change if the user updates to a new version.
You can check the "date modified" attribute of an EXE file (like firefox.exe) after running it to see whether it has changed, but you'll probably find it hasn't.
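If you'd rather script that check than read the attribute by hand, a tiny sketch (Python, with a hypothetical path) that compares the modification time before and after a run might look like this:

```python
import os

# Hypothetical path - substitute the executable you want to watch.
EXE_PATH = r"C:\Program Files\Mozilla Firefox\firefox.exe"

mtime_before = os.path.getmtime(EXE_PATH)
input("Run the application, then press Enter...")
mtime_after = os.path.getmtime(EXE_PATH)
print("modified" if mtime_after != mtime_before else "not modified")
```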

If you're thinking of the last-access time, don't worry: it's stored at the filesystem level, not within the file, so the hash will remain the same.

Related

Is there a simpler way to check if multiple files have been modified?

I am working on a project with around 40 script files and I am going to package the scripts to distribute them to my clients (kind of like a version release). I don't want my clients to change my scripts (or at least I want to make it hard for them to do so).
I have made certain files read-only by setting the execution policy, but the clients can simply set them back to writable, so I want to add a few lines of code (preferably fewer than 5) to check that the scripts have not been modified.
I am aware that using the LastWriteTime property would work, but I would need to do this for each script (a hash table keeping track of the LastWriteTime of every file would be too long and not clean enough), which is not ideal.
I have also considered Get-FileHash, but I am concerned that the hash will change each time I run it.
As you have already realized, it is impossible to prevent clients from modifying scripts in a water-tight way. Bruce Schneier sums it up nicely: "Trying to make bits uncopyable is like trying to make water not wet."
To run a script, it needs to be copied at least into the system's memory - and at that point you've lost control. What's to prevent the client from copying the script to an alternate location and editing it before running it? Nothing, unless you have tight control over the client machine. If you do have that control, setting the execution policy to require signed scripts prevents running unsigned ones - until the client starts PowerShell from the command line with the -ExecutionPolicy Bypass switch. The execution policy isn't a security system that restricts user actions.
There are a few approaches that can hinder editing, but a determined hacker can overcome them. So the root question is: why? Why shouldn't the clients modify the scripts? Is it to protect some IP? Are they trying to achieve something the scripts are not designed to do? Something else?
A simple solution is to use a tool like PS2EXE that converts a PowerShell script into an executable. The contents can be extracted and modified, but it requires at least a bit more effort than opening Notepad.
Another approach would be modules. Distribute the scripts as a PowerShell module that the clients will import. Editing a module requires a bit more effort than editing a simple script file, but it is quite possible too.
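If the goal is only to detect casual tampering rather than prevent it, a baseline-hash check is enough: record each script's hash at packaging time and compare at startup. (Note that file hashes are deterministic - Get-FileHash returns the same value on every run as long as the file's bytes haven't changed.) Here is a rough sketch of the idea, written in Python for illustration; the manifest name and layout are invented, and the same approach translates directly to a few lines of PowerShell:

```python
import hashlib
import json
import sys
from pathlib import Path

MANIFEST = Path("release_hashes.json")  # hypothetical manifest shipped with the release

def file_sha256(path):
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def build_manifest(script_dir):
    """Run once at packaging time: record a baseline hash for every script."""
    hashes = {p.name: file_sha256(p) for p in Path(script_dir).glob("*.ps1")}
    MANIFEST.write_text(json.dumps(hashes, indent=2))

def verify(script_dir):
    """Run at startup: abort if any script's bytes no longer match the baseline."""
    baseline = json.loads(MANIFEST.read_text())
    for name, expected in baseline.items():
        if file_sha256(Path(script_dir) / name) != expected:
            sys.exit(f"{name} has been modified")

if __name__ == "__main__":
    verify(".")
```

Of course, a client who can edit the scripts can also edit or delete the manifest and the check itself, which is exactly the determined-hacker caveat above.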

Reduce relocatable win32 Perl to as few files and bytes as possible

I'm trying to use a Perl program on a Windows HTCondor computing cluster. The way HTCondor on Windows works is that it copies all dependencies into a temporary directory (used as a chroot of sorts) and then deletes the directory after the specified outputs are moved to a designated place.
If I take only perl.exe and perl514.dll and make a job like this: perl -e "print qq/hello\n/" and tell the cluster to run it 200 times, then each replication winds up taking about 15 seconds, which is acceptable overhead. That's almost all time spent repeatedly copying the files over the network and then deleting them. echo_hello.bat run 200 times takes more like two seconds per replication.
The problem I have is that when I try to use my full blown perl distribution of 55MB and 2,289 files, a single "hello" rep takes something like four minutes of copying and deleting, which is unacceptable. When I try to do many runs the disks on the machines grind to a halt trying to concurrently handle all the file operations across all the reps, so it doesn't work at all. I don't know how long it might take to eventually finish because I gave up after half an hour and no jobs had finished.
I figured PAR::Packer might fix the issue, but nope. I tried print_hello.exe created like this: pp -o print_hello.exe -e "print qq/hello\n/". It still makes things grind to a halt, apparently by swamping the filesystem. I think a PAR::Packer executable makes a ton of temporary files as it pulls out files it needs from the archive. I think the windows file system totally chokes when there are a bunch of concurrent small file operations.
So how can I go about cutting down the perl I built to something like 6MB and a dozen files? I'm really only using a tiny number of core modules and don't need most of the crap in bin and lib, but I have no idea how to proceed ripping out stuff in a sane way.
Is there an automated way to strip away un-needed files and modules?
I know TCL has a bunch of facilities for packing files into a single uncompressed archive that can then be accessed through a "virtual filesystem" without expanding the file. Is there some way to do this with perl itself sort of like with PAR? The problem is PAR compresses everything and then has to extract to temporary files, rather than directly work through a virtual filesystem layer. (If I understand correctly.)
My usage of perl is actually as a scripting layer. It's embedded in a simulation. So I'm really running my_simulation.exe which depends on perl514.dll, but you get the idea. I also cannot realistically do anything to the HTCondor cluster other than use it. So there's no need to think outside the box on what I should be using instead of perl and what I could administratively tweak in Windows and HTCondor, thanks.
You can use Module::ScanDeps to get a list of the actual dependencies of your Perl program. It was terrible that it took a significant amount of time whenever PAR::Packer unpacked the whole application, so I decided to build the executable myself.
Here is my ready-to-use script, which gathers the Perl dependencies into a directory; it might be useful for reducing the number of Perl modules, e.g. by manually removing some dependencies after copying.
In theory (I have never tried this), your next step could be to merge all pure-Perl dependencies into a single file (like deps.pm), although that might be non-trivial due to Perl's autoload magic and some other tricks.
You can list the modules that are needed by your program using the very nice ListDependencies module.
To my knowledge it isn't downloadable anywhere, but it is simple to copy and paste into your own ListDependencies.pm file.
You should read the POD documentation within the module for usage instructions.

Archiving successive beta versions: how to save hard disk space?

I archive successive versions of an in-progress work:
MySoftware-v1.01beta.rar [2 GB]
MySoftware-v1.02beta.rar [2 GB]
MySoftware-v1.03beta.rar [2 GB]
MySoftware-v1.04beta.rar [2 GB]
etc.
Lots of files are modified, so it's not possible to back up only modified files: most of the files are modified each time.
How can I make a .rar file that only saves the "difference"? (Should I use something like "patch" or "diff"? I've never used them.) There are lots of "difference" tools, okay, but the resulting file won't be a .rar, it will only be a "difference file": so each time I want to re-open such an archive, I'll have to "de-diff" it and only THEN will I have a .rar again.
I'm on Windows, and if possible, I'd like to use WinRAR or a command-line tool (it would be great if no third-party software were needed).
Thanks a lot in advance!
You say 90% of your product is .wav files. Since diff on two wav files that are different is likely to produce huge differences, this is not likely to save you any space. Nor are .wav files really compressible, so zip or rar likely doesn't help much, either.
However, if, like most of us programmers, you derive the next version of the product from the previous one, mostly retaining files unchanged (whether source or .wav files), then what you really want to do is simply store, for each version, only the files that changed. This is called "de-duplication" in the backup/compression world (a sketch of the idea follows this answer).
You can organize a complicated scheme yourself to do this (e.g., your self-suggested "do this with WinRAR"). But if you use a decent source control system (SVN or Git would be fine), this will happen automatically as you check in changed files (and don't re-check-in unchanged ones). These tools work by keeping track of "differences" between versions; you can tell them to track text ("diff")-style differences, or simply store the entire thing.
Also, since your individual versions occupy 2 GB, I'd go spend $100 on a 2 or 4 terabyte (external) drive. In the worst case that should last you through some 1000 iterations. (SVN/Git will likely extend this a lot further.)
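As a rough illustration of the de-duplication idea (not a replacement for a real version control system), here is a sketch that copies into each version's archive folder only the files whose content hash differs from the previous version; the directory names are invented for the example:

```python
import hashlib
import shutil
from pathlib import Path

def file_hash(path):
    """Content hash used to decide whether a file changed between versions."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def snapshot(src_dir, prev_dir, dest_dir):
    """Copy into dest_dir only the files that differ from the previous snapshot."""
    src, prev, dest = Path(src_dir), Path(prev_dir), Path(dest_dir)
    for f in src.rglob("*"):
        if not f.is_file():
            continue
        rel = f.relative_to(src)
        old = prev / rel
        if old.exists() and file_hash(old) == file_hash(f):
            continue  # unchanged since the last version - no need to store it again
        target = dest / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(f, target)

# e.g. snapshot("MySoftware", "archive/v1.03beta", "archive/v1.04beta")
```

Restoring a version then means layering the snapshots from oldest to newest, which is exactly the bookkeeping a source control system does for you.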
You should really be using a source control system. A popular one is called 'git'. There are many others, each with their own strengths and weaknesses and the debate about which is 'best' is long and tedious.
Source control systems take care of storing and managing revisions of your files. The actual methods vary, but as a programmer who uses version control you 'check in' files for storage and version control, 'tag' them with revision numbers and then 'check out' files for modifying.
If you've ever downloaded source off the Internet using 'svn' or 'cvs', that's the type of thing I mean.
The source control system usually uses some sort of difference system to only store differences between modified files. Its purpose is to save you from having to even think about copying and backing up files - all you have to do is ensure your 'repository' is backed up correctly.
Also, as an added advantage, you can make changes to source files and always have backups in case your changes need reverting. So suppose you want to try out a new file-handling system: you can use the source control system to create a testing 'branch' (or whatever you want to call it) and do all your changes in there without damaging a working copy of your software. If the changes are good you can then 'merge' them into the non-testing branch of your repository.

Can I think of an executable file as a snapshot image of an execution state?

I read a UNIX manual page (http://pubs.opengroup.org/onlinepubs/009695399/functions/posix_spawn.html), and there was this mention about execution:
The new process image shall be constructed from a regular executable
file called the new process image file.
The expression process image caught my eye.
I had thought an executable file was just a sequence of commands, just as the word program suggests. But actually, I don't know the concept and structure of an executable file, and from that mention I felt an executable file could look like an image of an execution state.
Could you explain something about this? About the concept and structure of regular executable files nowadays, in any OS.
Usually an executable file contains not only instructions but also global data, read-only data and much more. I suggest you take a brief look at, e.g., the ELF format widely used in UNIX-like operating systems, or the PE format used in Windows.
The OS may also need, for example, to replace some function addresses (jump targets) with the real addresses of those functions in memory, although this technique is probably not used anymore in common OSes. In any case, there can be more work to do than just copying the file into memory and starting execution from the first byte.
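To see that an executable is a structured file rather than a raw instruction stream, it is enough to look at its first bytes: ELF files start with the magic bytes 0x7F 'E' 'L' 'F', and Windows PE executables start with the 'MZ' DOS stub signature. A small sketch:

```python
def identify_executable(path):
    """Peek at the magic bytes that mark the start of an executable's header."""
    with open(path, "rb") as f:
        magic = f.read(4)
    if magic == b"\x7fELF":
        return "ELF (UNIX-like systems)"
    if magic[:2] == b"MZ":
        return "PE/MZ (Windows)"
    return "unknown format"

print(identify_executable("/bin/ls"))  # typically ELF on Linux
# print(identify_executable(r"C:\Windows\notepad.exe"))  # PE/MZ on Windows
```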

Sync (Diff) of the remote binary file

Usually both files are available for running some diff tool, but I need to find the differences between two binary files when one of them resides on the server and the other is on the mobile device. Then only the differing parts can be sent to the server and the file updated.
There is the bsdiff tool. Debian has a bsdiff package, too, and there are high-level programming language interfaces like python-bsdiff.
I think that a jailbroken iPhone, Android or similar mobile device can run bsdiff, but you may have to compile the software yourself.
But note! If you use the binary diff only to decide which part of the file to update, better use rsync. rsync has a built-in binary diff algorithm.
You're probably using the name generically, because diff expects its arguments to be text files.
If given binary files, it can only say they're different, not what the differences are.
But you need to update only the modified parts of binary files.
This is how the Open Source program called Rsync works, but I'm not aware of any version running on mobile devices.
To find the differences, you must compare. If you cannot compare, you cannot compute the minimal differences.
What kind of changes do you do to the local file?
Inserts?
Deletions?
Updates?
If only updates, i.e. the size and location of unchanged data stay constant, then a block-type checksum solution might work: you split the file into blocks, compute the checksum of each, and compare them with a list of previous checksums. Then you only have to send the modified blocks (see the sketch after this answer).
Also, if possible, you could store two versions of the file locally, the old and modified.
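A minimal sketch of that block-checksum idea, using fixed-size blocks (which is why it only tolerates in-place updates; rsync's rolling checksum is what copes with inserts and deletions that shift data around). The block size is an arbitrary choice for the example:

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # 64 KiB blocks - arbitrary for this sketch

def block_checksums(path):
    """Return one SHA-256 digest per fixed-size block of the file."""
    sums = []
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            sums.append(hashlib.sha256(block).hexdigest())
    return sums

def changed_blocks(old_sums, new_path):
    """Indices of blocks whose checksum differs from the stored list."""
    new_sums = block_checksums(new_path)
    return [i for i, s in enumerate(new_sums)
            if i >= len(old_sums) or s != old_sums[i]]

# The device keeps old_sums from the last sync and uploads only the blocks
# whose indices changed_blocks() reports.
```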
Sounds like a job for rsync. See also librsync and pyrsync.
A cool thing about the rsync algorithm is that you don't need both files to be accessible on the same machine.