Sync (Diff) of the remote binary file - diff

Usually both files are availble for running some diff tool but I need to find the differences in 2 binary files when one of them resides in the server and another is in the mobile device. Then only the different parts can be sent to the server and file updated.

There is the bsdiff tool. Debian has a bsdiff package, too, and there are high-level programming language interfaces like python-bsdiff.
I think that a jailbreaked iPhone, Android or similar mobile device can run bsdiff, but maybe you have to compile the software yourself.
But note! If you use the binary diff only to decide which part of the file to update, better use rsync. rsync has a built-in binary diff algorithm.

You're probably using the name generically, because diff expects its arguments to be text files.
If given binary files, it can only say they're different, not what the differences are.
But you need to update only the modified parts of binary files.
This is how the Open Source program called Rsync works, but I'm not aware of any version running on mobile devices.

To find the differences, you must compare. If you cannot compare, you cannot compute the minimal differences.
What kind of changes do you do to the local file?
Inserts?
Deletions?
Updates?
If only updates, ie. the size and location of unchanged data is constant, then a block-type checksum solution might work, where you split the file up into blocks, compute the checksum of each, and compare with a list of previous checksums. Then you only have to send the modified blocks.
Also, if possible, you could store two versions of the file locally, the old and modified.

Sounds like a job for rsync. See also librsync and pyrsync.
Cool thing about the rsync algorithm is that you don't need both files to be accessible on the same machine.

Related

current scctext replacement for textual representation of vfp binary files

What are people using in vfp 9 for a replacement for the built-in scctext.prg that translates binary files in vfp to a textual representation?
We’ve moving an existing project that’s in vfp 9 sp1 into tfs source control, but we need a way to make sure that the non-textual files are able to get the benefits of comparison that only non-binary text files allow. We plan to check both the textual representation and the binary file into source control (the binary is more for the “just in case” scenario)
According to the document at
http://www.ita-software.com/papers/Borup_Mercurial_Published.pdf
there are at least three options for converting .scx, .frx, .lbx, .prj and other non-prg dbf files in visual foxpro (vfp) to a textual representation. Only some of them allow for converting the textual information back to binary - not sure how often we’d really use that or not.
ALTERNATE SCCTEXT
This one seems older with latest version in 2009 - not sure if it’s still the preferred tool - and it seems to have no way to take the textual representation and convert it back to a binary file.
http://vfpx.codeplex.com/releases/view/12955
TWOFOX
This one seems similar to the foxbin2prg except it creates xml files - seems like only one dev is working on it unlike the others that are open to contributions from others so not sure how current it is and how much it’s being used by other developers - it does have two way conversion like fox2binprg has.
http://www.foxpert.com/downloads.htm
FOXBIN2PRG
This one is fairly recent - but not sure if it’s production ready enough to use for prod coding working - it does have two way conversion
http://vfpx.codeplex.com/releases/view/116407
TRIGGER INVOKE ONE OF THE ABOVE ON CHANGE OF BINARY FILES IN VFP IDE
What are people using to invoke these textual representation options?
I’ve seen this class that was created to run one of the programs listed above for all files in the project. Apparently it does it when the date time of the last generate is older that the date time on the textual version of the file. One detriment I’ve read is that it generates for foundation classes and other things that really are not items that a dev is working on (code that is referenced by but not included in your project).
http://codepaste.net/9yy1gm
Thanks for any advice from those that are using vfp 9 with source control out there!
You should check out the scX library written by Paul McNett which is published on Ed Leafe's web site. I haven't used it in a mission-critical software project yet, but I have tested it out. It seemed to catch all the potential problems I've encountered with other scctext replacements.
The reason I haven't used it in a big project for a couple of reasons.
It is a breaking change for source control history. So, comparing source code in your current SCA or VCA files with the new files generated by scX isn't going to be simple.
It isn't a drop in replacement for scctext. Instead of checking files into and out of source control directly from the IDE, you'll have an intermediary folder.
You'll check your files out of source control into one folder, convert them to FoxPro format, and then edit them in the FoxPro IDE.
Then, you'll save your changes in the FoxPro IDE, convert them to scX format, and then check them into source control.
I'm sure much of #2 can be automated; but combined with #1, making the change to scX wasn't worth it for me.
FoxBin2Prg is Production ready, and AFAIK, it's the only tool that allow Diff and Merge of the generated text (tx2) files, and can regenerate the binaries from them.
The generated files are PRG style, so developers can see them as modifying a PRG (with PROc/ENDPROC structures and such), but they aren't mean to compile. Primary use is for SCM tools, but can be used seperately.
I'm actually using on production code with a 10 member team using concurrent modifications on forms and classes.
Some documentation is available on VFPx in English and Spanish, Internal messages are vailable on both languages and from version v1.19.24 a new translation to German is available too.
More info on VFPx site,
Best regards!

Which source control uses a "s." prefix on its filenames?

I found what appears to be an old source repository for some source code that I need to resurrect. But I have no idea what source control tools were used to generate and manage this source repository. In the directory, all of the files have a "s." prefixed to the file name. Without knowing the format in these files, I cannot manually extract the source code with any degree of accuracy. And even if I did, manually extracting the source code would be very time consuming and error prone.
What source/version control system prefixes its source files with "s." when it stores the source file in its repository directory?
How can I effectively extract the latest source code from this repository directory?
The s. prefix is characteristic of SCCS, the Source Code Control System. The code for that is probably still proprietary, but GNU has the CSSC project which can manipulate SCCS files. It tracks changes per-file in revisions, known as 'deltas'.
SCCS is the official revision control system for POSIX; you can find the commands documented on the Open Group site (but the file format is not specified there, AFAICT):
admin
delta
get
prs
rmdel
sact
unget
val
what
The file format is not specified by POSIX. The manual page for get says:
The SCCS files shall be files of an unspecified format.
The original SCCS command set included some extras not recorded by POSIX:
cdc — change delta commentary (for changing the checkin comments for a delta)
comb — combine, effectively for merging deltas
help — no prefix; the wasn't any other help program at the time. Commands generate error codes such as cm3 and help interpreted them.
sccsdiff — difference between two deltas of a file
Most systems now have a single command, sccs, which takes the operation name and then options. Often, the files were placed into an ./SCCS/ subdirectory and extracted from that as required, and the sccs front-end would handle name expansion, adding s. or SCCS/s. to the start of the file names.
For extracting the latest version of the source code, use get.
get s.*
sccs get s.*
These will get the default version of each file, and the default default is the latest version of the file.
If you need to make changes, use:
get -e s.filename.c
...make changes...
delta -y'Why you made the changes' s.filename.c
get s.filename.c
Note that the files 'lose' the s. prefix for the working file names, rather like RCS (Revision Control System) files lose the ,v suffix for the working file names. If you've not come across that, accept that it was different when SCCS and RCS were created, back in the late 70s or early 80s.
SCCS uses an s. prefix. But it might not be the only one!
I never knew this knowledge would come in useful some day!

Archiving succesive beta versions : how to save harddisk space?

I archive successive versions of an in-progress work :
MySoftware-v1.01beta.rar [2 GB]
MySoftware-v1.02beta.rar [2 GB]
MySoftware-v1.03beta.rar [2 GB]
MySoftware-v1.04beta.rar [2 GB]
etc.
Lots of files are modified, so it's not possible to backup only modified files : most of the files are modified each time.
How can do a .rar file that only saves the "difference" (should I use something like "patch" or "diff" ? -> I never used them). There are lots of "difference" tool, okay, but the result file won't be a .rar, it will only be a "difference file" : so each time I would like to re-open such an archive, I'll have to "de-diff" it and only THEN I will have a .rar again.
I'm on Windows, and if possible, I'd like to use winrar or command-line tool (it would be great if no third party software is needed).
Thanks a lot in advance!
You say 90% of your product is .wav files. Since diff on two wav files that are different is likely to produce huge differences, this is not likely to save you any space. Nor are .wav files really compressible, so zip or rar likely doesn't help much, either.
However, if, like most of us programmers, you derive your next version of the product from the previous one, by mostly retaining files unchanged (whether that be source or be .wav files), then what you really want to do is simply store, for each version, the files that changed. This is called "de-duplication" in the backup/compression world.
You can organize a complicated scheme your self to do this. (e.g., your self-suggested "do this with winrar"). But if you use a decent "source control system" (SVN or GIT would be fine), this will happen automatically as you checkin changed (and don't re-checkin unchanged) files. These tools work by keeping track of "differences" between versions; you can tell the tools to track text ("diff") style differences, or simply store the entire thing.
Also, since your individual versions occupy 2GB, I'd go waste $100 on a 2 or 4 terabyte (external) drive. That should last you in worst case through some 1000 iterations. (SVN/GIT will likely extended this a lot further).
You should really be using a source control system. A popular one is called 'git'. There are many others, each with their own strengths and weaknesses and the debate about which is 'best' is long and tedious.
Source control systems take care of storing and managing revisions of your files. The actual methods vary, but as a programmer who uses version control you 'check in' files for storage and version control, 'tag' them with revision numbers and then 'check out' files for modifying.
If you've ever downloaded source off the Internet using 'svn' or 'cvs', that's the type of thing I mean.
The source control system usually uses some sort of difference system to only store differences between modified files. Its purpose is to save you from having to even think about copying and backing up files - all you have to do is ensure your 'repository' is backed up correctly.
Also, as an added advantage you can make changes to source files and always have backups in case your changes need reverting. So suppose you want to try out a new file handling system you can use the source control system to create a testing (or whatever you want to call it) 'branch' and do all your changes in there without damaging a working copy of your software. If the changes are good you can then 'merge' the changes into the non testing branch of your repository.

Can I assume an executable file as a snapshot image of an execution state?

I read some unix manual (http://pubs.opengroup.org/onlinepubs/009695399/functions/posix_spawn.html), and there was a mention about execution.
The new process image shall be constructed from a regular executable
file called the new process image file.
The expression process image caught my eyes.
I have been thought executable file is just a kind of sequence of command. Just as the word program means. But actually, I don't know the concept and structure of the executable file. And I felt executable file could be looks like an execution state image from the mention.
Could you explain me something about this? About the concept and structure of regular executable files in nowadays. In any OS.
Usually the executable file does not contain only instructions but also global data, readonly data and many more. I suggest you briefly look e.g. on the ELF format widely used in UNIX-like operating systems or PE format used in Windows.
The OS may also need for example to replace some addresses of functions (jump targets) with the real addresses of these functions in the memory, although this technique is probably not used anymore in common OSes. Anyway, there can be more work to do than just copy the file into memory and start executing from the first byte.

Hash of an .exe file

I'm wondering whether I will ever get a different result when producing a checksum on an .exe file before and then while or after running that file. I'm more concerned with common practice (such as producing a SHA hash of popular app like firefox.exe) than with boundary cases, but both are interesting. Thanks.
The hash of a file should be constant for as long as the file is identical (i.e. contains only the same bytes, in the same order). It's very rare to find applications that rewrite their on-disk representation at runtime, so the hash should be constant. There are self-modifying programs, but they tend to operate on the in-memory loaded copy of their code, rather than the disk copy.
Edit: We should consider "Self-updating" applications, but these tend to launch a little helper program to download and update the core application. It's difficult (especially on Windows) to update an execution whilst it's running. UNIX systems tend to operate Copy on Write systems, so it's possible that a software update might change your executable under your feet - but again, this is a "corner case".
The hash will only change if the exe changes. That will only happen if the app modifies itself, which isn't going to happen on windows without the app restarting. Firefox might update itself (including a restart), but apart from such cases, the hash will remain the same.
The hash will change if the file changes.
EXE files rarely change on their own. firefox.exe would change if the user updates to a new version.
You can check the "date modified" attribute of an EXE file (like firefox.exe) after running it to see whether it has changed, but you'll probably find it hasn't.
If you mean the modification of the last access time, don't worry, it's stored at the filesystem level, not within the file so the hash will remain the same.