Adding RCS Header in Binary files - version-control

I am using RCS source control and need to check in binary files (a GIF image and a JAR file). How do I add a $Header$ keyword so that the version information is substituted into the file during check-in and revealed when I issue the "ident" command?
For text files such as Java or XML we usually add the RCS header in comments or public strings, but I have no idea how to do this for binary files.

Basically, you don't.
Binary file formats don't typically have a way to have a variable-length chunk of arbitrary data. Even if there's a region of the file that can contain arbitrary data, the length of the expansion can vary from one checkout to another (e.g., if it goes from version 1.9 to 1.10), and that's likely to mess up the file.
For this to work, the binary format would have to tolerate a change in the size of the header string. For example, if the version number changes from 1.9 to 1.10, the RCS co command (which has no knowledge of the binary file format) will replace the string in-place, changing the offset of all data following the string. If the file format has a comment section, and that section's size is stored as a number, co isn't going to update that number.
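To make the offset problem concrete, here is a minimal Python sketch. The container layout (a 2-byte length field, a comment, then the payload) and all function names are hypothetical, invented only for illustration. The expansion is simulated with a plain byte replacement, which is roughly what a tool with no knowledge of the format does.

```python
import struct

def build_container(comment: bytes, payload: bytes) -> bytes:
    """Hypothetical format: 2-byte little-endian comment length, comment, payload."""
    return struct.pack("<H", len(comment)) + comment + payload

def read_payload(blob: bytes) -> bytes:
    """Parse the hypothetical format, trusting the stored comment length."""
    (comment_len,) = struct.unpack_from("<H", blob, 0)
    return blob[2 + comment_len:]

def naive_expand(blob: bytes, expanded: bytes) -> bytes:
    """Simulate keyword expansion by a tool that knows nothing about the format."""
    return blob.replace(b"$Header$", expanded, 1)

original = build_container(b"$Header$", b"PAYLOAD")
expanded = naive_expand(original, b"$Header: foo.bin,v 1.10 $")

print(read_payload(original))  # the payload is recovered intact
print(read_payload(expanded))  # the stored length is stale, so parsing starts inside the comment
```

The length field still says 8 after the expansion, so a reader that trusts it treats the tail of the expanded keyword as payload.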
Compiler-generated object and executable files often have RCS version information in them, but it's usually generated from the source file(s); objects and executables themselves typically aren't stored in a version control system.
Before the initial checkin of a binary file, you should run rcs -i -kb filename, so that the RCS co command doesn't attempt to do keyword replacement (just in case the file happens to accidentally contain something that looks like an RCS keyword).
If you have a binary file that you've checked out of an RCS system, and you want to know which version it is, you'll have to compare it to each of the versions in RCS. (My own get-versions might be useful for this.)
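The comparison against each stored version can be sketched in Python as below. This is only an illustration, not the get-versions tool mentioned above. The function name and the sample data are made up, and it assumes you have already extracted each revision's bytes (for example with something like `co -p -r<rev> filename`).

```python
import hashlib

def find_matching_revisions(working_copy: bytes, revisions: dict[str, bytes]) -> list[str]:
    """Return the revision numbers whose content is byte-identical to working_copy."""
    target = hashlib.sha256(working_copy).hexdigest()
    return [rev for rev, data in revisions.items()
            if hashlib.sha256(data).hexdigest() == target]

# Hypothetical stand-in data; in practice each value would be the bytes
# of one revision extracted from the RCS file.
revisions = {"1.1": b"\x89GIF-old", "1.2": b"\x89GIF-new"}
matches = find_matching_revisions(b"\x89GIF-new", revisions)
print(matches)
```

This only works reliably if the file was checked in with -kb, since otherwise keyword expansion could make the checked-out bytes differ from what you compare against.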
If you have a way of storing textual metadata in the file, you could also consider annotating your binary file with a timestamp. You can then correlate the timestamp with the revision by looking at the RCS log.
You mentioned Excel files. I just tried some experiments. The new .xlsx format is really a zip file; anything you put in the Comment section will be compressed, and not visible to ident. The older .xls format, at least for the small file I tried, does store the Comment section in readable text, so ident works -- but when I checked in a file, RCS expanded the Comment from "$Header:$" to "$Header: /home/kst/2012-12-06/RCS/foo.xls,v 1.1 2012-12-06 11:47:48-08 kst Exp kst $"; when I tried to open it with Excel, I got:
Excel found unreadable content in 'foo.xls'.
and it was unable to recover the contents.

In general you can't, but certain binary formats have an ASCII slot where an RCS header can be placed.
For example, ZIP files have an archive comment:
% zip -z archive.zip
$Header$
And then, after CVS handling:
% unzip -l archive.zip
$Header: /cygdrive/c/cvsroot/archive.zip,v 1.2 2020/10/14 13:46:06 omg Exp $
There are dozens of file extensions that are actually ZIP files where you can do this (odt, docx, jar, ...), but use it carefully and prefer short RCS keywords such as Revision or Date, because RCS doesn't know the slot size and may corrupt the file.
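Here is a small Python sketch of why the slot size matters, using the standard zipfile module instead of the zip command. It builds an archive in memory, sets the archive comment, and reads back the comment length that the ZIP end-of-central-directory record stores just before the comment text. The member name hello.txt is a placeholder.

```python
import io
import struct
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", "hello")
    archive.comment = b"$Header$"  # the archive comment, as set by `zip -z`

raw = buffer.getvalue()
comment = zipfile.ZipFile(io.BytesIO(raw)).comment

# The end-of-central-directory record ends with a 2-byte little-endian
# comment length, followed by the comment itself.
(stored_length,) = struct.unpack("<H", raw[-len(comment) - 2:-len(comment)])
print(comment, stored_length)
```

A keyword expansion that makes the comment longer leaves this stored length unchanged, which is the kind of mismatch that can corrupt the archive.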

2 files with same hash, but 1 is corrupted and 1 isn't

I found something very weird on a project.
I have 2 files:
One is the input file, a .bip file that you can open with GIS software such as QGIS. This file is provided by the CCSDS.
The other is the output after being compressed and decompressed by a lossless compression algorithm (CCSDS 123 by ESA).
These 2 files share the exact same sha256 and sha1 hashes, so they are identical.
3226009de97d66589fc58cdc9af377e6315ccc69a7095bec8dc04447bf3cea2e test_ptn_x100y36z17_16u.bip
3226009de97d66589fc58cdc9af377e6315ccc69a7095bec8dc04447bf3cea2e test_ptn_decomp.bip (sha256 shown here).
The thing is, while the input file is displayed by QGIS, the second one is refused with this message (translated: the file test_ptn_decomp.bip is not a recognized or valid data source).
Is there something I don't understand about hashes? I've tried moving the files to other directories and renaming them, but nothing changes on the QGIS side.
It is highly unlikely that you got different content with the same sha256 hash by chance, so I'll assume the files are identical. In any case, it is easy to confirm with any diff program.
So there should be some other differences, things that come to mind:
The file name might contain meaningful information needed by QGIS. Try renaming the decompressed file to e.g. decomp_ptn_x100y36z17_16u.bip; maybe the x100.. part is essential?
There may be additional files that must have matching names. Do you have a .hdr file, as explained in the QGIS tutorials?
https://www.qgistutorials.com/en/docs/open_bil_bip_bsq_files.html
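Both checks suggested above (byte-for-byte equality, and a sidecar file with a matching name) can be sketched in Python. The file names are throwaway placeholders created in a temporary directory, and the helper names are made up. Whether QGIS actually needs a .hdr file for your data is an assumption to verify.

```python
import filecmp
import tempfile
from pathlib import Path

def same_bytes(a: Path, b: Path) -> bool:
    """Byte-for-byte comparison (shallow=False forces reading the contents)."""
    return filecmp.cmp(a, b, shallow=False)

def has_header_sidecar(data_file: Path) -> bool:
    """Check for a .hdr file with the same base name next to the data file."""
    return data_file.with_suffix(".hdr").exists()

# Demo with throwaway files; the names are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp, "input.bip")
    decompressed = Path(tmp, "decomp.bip")
    original.write_bytes(b"\x00\x01\x02")
    decompressed.write_bytes(b"\x00\x01\x02")
    original.with_suffix(".hdr").write_text("placeholder header\n")

    identical = same_bytes(original, decompressed)
    sidecars = (has_header_sidecar(original), has_header_sidecar(decompressed))

print(identical, sidecars)
```

In this demo the two data files are identical, yet only the first has a header next to it, which is the situation the second suggestion describes.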

How to develop additional File Formats for BeyondCompare

I see that BeyondCompare can be extended to include additional file formats, as in Additional File Viewer Rules for Beyond Compare 2 and also Additional file format downloads for version 3, but after a quick initial search I don't see how users develop these special viewers. Is that documented anywhere?
I downloaded a few additional viewers which are handily imported via the BCFormats.bcpkg file
C:\Program Files (x86)\Beyond Compare 3\Helpers>dir /b /s
C:\Program Files (x86)\Beyond Compare 3\Helpers\HtmlTidy
C:\Program Files (x86)\Beyond Compare 3\Helpers\PdfToText.exe
C:\Program Files (x86)\Beyond Compare 3\Helpers\XLS_to_TAB_Single.vbs
C:\Program Files (x86)\Beyond Compare 3\Helpers\HtmlTidy\HtmlTidy.exe
C:\Program Files (x86)\Beyond Compare 3\Helpers\HtmlTidy\XML_tidied_sorted.bat
C:\Program Files (x86)\Beyond Compare 3\Helpers\HtmlTidy\XML_tidied_sorted_config.txt
What is the design of these things? Are they something like a command line tool that reads in a file as the first argument and writes the converted file to standard output?
They are command line tools that preprocess a file before it is loaded for comparison. The first argument is an input file and the second argument is an output filename. As an example, the pdftotext.exe tool extracts a .pdf file to a plain text .txt file, then displays the temp file in Beyond Compare's Text Compare.
See Beyond Compare's help file topic Text Format Conversion Settings for details.
In another question (Compare Json Files in Beyond Compare) I walk through a step-by-step example of JSON conversion for diffs, which lends a concrete example to this question. What Chris said above is spot on: it's basically a console application that uses fixed argument positions to take in the input file path as well as an output file path that the text representation will be written to.
$myConvertingConsoleApp $inputFilePath $outputFilePath
Beyond Compare supplies the actual arguments to the console application during the conversion process.
It's worth noting that the input file need not even be a text file so long as you can come up with some sane textual representation of the file format that makes sense for diff algorithms to operate upon.
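As a rough illustration of such a converter, here is a minimal Python sketch that normalizes a JSON file into sorted, indented text. The function names are made up, and the only thing taken from the answers above is the calling convention: input path as the first argument, output path as the second.

```python
import json
import sys

def render(json_text: str) -> str:
    """Return a stable, diff-friendly text rendering of a JSON document."""
    return json.dumps(json.loads(json_text), indent=2, sort_keys=True) + "\n"

def convert(input_path: str, output_path: str) -> None:
    """Read the input file and write its text rendering to the output file."""
    with open(input_path, encoding="utf-8") as source:
        text = render(source.read())
    with open(output_path, "w", encoding="utf-8") as target:
        target.write(text)

if __name__ == "__main__" and len(sys.argv) == 3:
    # argv[1] is the input file path, argv[2] is the output file path
    convert(sys.argv[1], sys.argv[2])
```

Sorting the keys means two files that differ only in key order produce the same text, so the diff shows only real changes.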

Which source control uses a "s." prefix on its filenames?

I found what appears to be an old source repository for some source code that I need to resurrect. But I have no idea what source control tools were used to generate and manage this source repository. In the directory, all of the files have a "s." prefixed to the file name. Without knowing the format in these files, I cannot manually extract the source code with any degree of accuracy. And even if I did, manually extracting the source code would be very time consuming and error prone.
What source/version control system prefixes its source files with "s." when it stores the source file in its repository directory?
How can I effectively extract the latest source code from this repository directory?
The s. prefix is characteristic of SCCS, the Source Code Control System. The code for that is probably still proprietary, but GNU has the CSSC project which can manipulate SCCS files. SCCS tracks changes per file in revisions, known as 'deltas'.
SCCS is the official revision control system for POSIX; you can find the commands documented on the Open Group site (but the file format is not specified there, AFAICT):
admin
delta
get
prs
rmdel
sact
unget
val
what
The file format is not specified by POSIX. The manual page for get says:
The SCCS files shall be files of an unspecified format.
The original SCCS command set included some extras not recorded by POSIX:
cdc — change delta commentary (for changing the checkin comments for a delta)
comb — combine, effectively for merging deltas
help — no prefix; there wasn't any other help program at the time. Commands generate error codes such as cm3 and help interpreted them.
sccsdiff — difference between two deltas of a file
Most systems now have a single command, sccs, which takes the operation name and then options. Often, the files were placed into an ./SCCS/ subdirectory and extracted from that as required, and the sccs front-end would handle name expansion, adding s. or SCCS/s. to the start of the file names.
For extracting the latest version of the source code, use get.
get s.*
sccs get s.*
These will get the default version of each file, and the default default is the latest version of the file.
If you need to make changes, use:
get -e s.filename.c
...make changes...
delta -y'Why you made the changes' s.filename.c
get s.filename.c
Note that the files 'lose' the s. prefix for the working file names, rather like RCS (Revision Control System) files lose the ,v suffix for the working file names. If you've not come across that, accept that it was different when SCCS and RCS were created, back in the late 70s or early 80s.
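The naming convention described above can be summed up in a small Python sketch. The helper names are hypothetical and this is only the file-name mapping, not a reimplementation of anything the sccs front-end does beyond that.

```python
from pathlib import PurePosixPath

def sccs_history_path(working_file: str, use_subdir: bool = True) -> str:
    """Map a working file name to its SCCS history file name."""
    path = PurePosixPath(working_file)
    parent = path.parent / "SCCS" if use_subdir else path.parent
    return str(parent / f"s.{path.name}")

def working_name(history_file: str) -> str:
    """Strip the SCCS 's.' prefix or the RCS ',v' suffix from a history file name."""
    name = PurePosixPath(history_file).name
    if name.startswith("s."):
        return name[2:]
    if name.endswith(",v"):
        return name[:-2]
    return name

print(sccs_history_path("filename.c"))         # SCCS/s.filename.c
print(sccs_history_path("filename.c", False))  # s.filename.c
print(working_name("s.filename.c"))            # filename.c
print(working_name("filename.c,v"))            # filename.c
```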
SCCS uses an s. prefix. But it might not be the only one!
I never knew this knowledge would come in useful some day!

How to load a matlab file (.mat) in windows if it was originally saved in UNIX

I'm trying to open a .mat file in the Windows environment but it is failing. It was created in a Unix environment. Also note that this file was first put in a .tar file and then transferred via FTP in binary mode. The file opens in Unix and I don't think it was corrupted in any way.
The *.mat file format is platform agnostic. The OS does not matter.
There are a number of variants of the *.mat file which have been used, and older versions cannot always read formats saved with newer versions. You can save to an older version using flags available in the save command. These formats have been updated as the Matlab feature set has demanded a more flexible file format, and as other technologies have advanced, most notably HDF5 in the recent version.
Finally, the save command supports an ASCII formatted option. I suspect this is your current problem, based on your comment regarding the error message received.
To address your current problem:
First, check to see if the file is an ASCII file. The easiest way is to simply open it in Notepad, WordPad, or even the Matlab editor. If the file is text, then this becomes a file parsing problem, and the appropriate use of fscanf should fix the problem.
If the file is actually a binary *.mat file then you probably have a Matlab version incompatibility. You can either go back to the source Unix environment and save to an older version (e.g. save .... -v7) or update the Matlab version on the reading computer.
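If you prefer not to eyeball the file in an editor, the same triage can be sketched in Python by looking at the first bytes. This assumes the usual descriptive text at the start of binary MAT-files ("MATLAB 5.0 MAT-file" for the -v6/-v7 family, "MATLAB 7.3 MAT-file" for the HDF5-based format). The function name and category labels are made up for illustration.

```python
def classify_mat_header(head: bytes) -> str:
    """Guess the flavor of a .mat file from its first bytes."""
    if not head:
        return "empty file"
    if head.startswith(b"MATLAB 7.3 MAT-file"):
        return "binary v7.3 (HDF5-based)"
    if head.startswith(b"MATLAB 5.0 MAT-file"):
        return "binary v5 family (-v6 / -v7)"
    # Printable characters and common whitespace only: likely saved with -ascii.
    if all(32 <= byte < 127 or byte in b"\t\r\n" for byte in head):
        return "probably ASCII text"
    return "binary, unrecognized header"

print(classify_mat_header(b"MATLAB 5.0 MAT-file, Platform: GLNXA64"))
print(classify_mat_header(b"  1.0000000e+00   2.0000000e+00\n"))
```

To use it on a real file, pass in something like the first 128 bytes read in binary mode.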

Concatenate content of TAGS files from different directories

I'm referring to TAGS file generated by ctags or etags in order to have some code navigation in Emacs with M-..
The typical project looks like this:
Large standard library (more than 100 files, but rarely updated).
Project-specific library (updated on the daily basis).
I would like the project to be able to use two (or maybe more) TAGS files, but regenerate only some of them: the ones for the code inside the particular project. How would I approach this problem?
etags --help:
-i FILE, --include=FILE
Include a note in tag file indicating that, when searching for
a tag, one should also consult the tags file FILE after
checking the current file.
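One way to use that option, sketched in Python: regenerate only the project's TAGS file and have it point at the rarely rebuilt library one. The paths and the function name are hypothetical. The sketch only builds the command line from the --include option quoted above plus etags' --output option, and does not run it.

```python
import shlex

def etags_command(output: str, sources: list[str], included: list[str]) -> list[str]:
    """Build an etags invocation that tags sources and references other TAGS files."""
    command = ["etags", f"--output={output}"]
    command += [f"--include={tags_file}" for tags_file in included]
    return command + sources

# Hypothetical layout: the library TAGS file is generated once and only referenced here.
command = etags_command("TAGS", ["src/main.c", "src/util.c"], ["/opt/stdlib/TAGS"])
print(shlex.join(command))
# To actually run it: subprocess.run(command, check=True)
```

Only the project sources are rescanned on each run. The library TAGS file is consulted through the include note.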