Mercurial: Pre and Post merge operations (per file) - version-control

We are using Mercurial as an SCM to handle the source script files of a program. Each project we manage has ~5000 files with each file containing a section with some product-specific informations about the file itself (version list, date, time etc.). This section is - due to the way it is structured - in 80% of the merges, the only section that has conflicts. They are easily resolved, but when merging around 300 files, it gets tiresome.
The problem is: I have no control over the way this section is written and I cannot change the format of the section itself, as it would make the file unusable by the program.
My question: is there a way in mercurial (hooks?), that allows me to
pre-process the file with a script
let mercurial do the merge
if merged correctly: post-process the file with a script. otherwise: "resolve-conflicts" as usual.

You could probably get away with it by creating a custom merge tool:
https://www.mercurial-scm.org/wiki/MergeToolConfiguration
A simple script that invokes 'diff' after removing the ever changing sections might be enough.
It sounds like those sections are the sort of nonsense that the (disrecommended) KeywordsExtension are built to handle, but I gather you don't have a lot of flexibility around them.

Related

Hash of a textual file for integrity purposes

I have a more general requirement to track changes in asset files that are committed into source code and deployed inside the binaries, but for now I am implementing it in unit testing context and facing a potential problem for the future. Before asking the TLDR question I will show a lot of contextual information.
Scenario
Some application assets are loaded from CSV files committed into Git repository via ClasspathResource[1] and they may sometime change. Change occurs across commits, but for a runtime application the change occurs across different versions of the application.
My test solution
I have implemented the following mechanism to alert me about changes in the resource:
#Before
public void setUp() throws Exception
{
assertEquals("Resource file has changed. Make sure the test reflects the changes in the file and update the checksum", MD5_OF_FILE,
DigestUtils.md5Hex(new ClassPathResource("META-INF/resources/assets.csv").getInputStream()));
}
Basically, I want my unit tests to fail until I explicitly code the checksum of the file. When I run md5sum assets.txt I hardcode the result into the code so tests know they are working with a fixed version of the file.
Problem
I ran the tests on my own Windows box and worked like a charm. Switching to Linux, I found that they failed. Immediately I realized that it may be due to line endings, which I totally forgot.
In the specific case, Git is configured to commit files LF but checkout (in Windows) CRLF. This configuration is reasonable for working with source code.
So I need to check if the asset file has changed in a smart way that allows a box to change/reinterpret the line endings. This is especially true for the runtime application which will store the file hash and will compare the actual assets file (which may have changed), performing corrective actions on differences ==> reloading the assets.
TL;DR
Given a textual file of which I can extract and store any hash (not just cryptographic, I used MD5), how can I tell that it has changed regardless of the environment the file is processed into, which may modify the line endings?
Note
I have requirement not to use a versioning system in the asset itself (e.g. first row has incremental version, since developers will fail to update correctly).
[1] Spring framework tool wrapping Class.getResourceAsStream
A solution could be normalizing the file to chosen line endings, i.e. always CRLF or always LF, then compute the cryptographic hash over that normalized content.
E.g. compute md5sum | dos2unix file and use a proper Stream in code that normalizes the file on the fly

Which source control uses a "s." prefix on its filenames?

I found what appears to be an old source repository for some source code that I need to resurrect. But I have no idea what source control tools were used to generate and manage this source repository. In the directory, all of the files have a "s." prefixed to the file name. Without knowing the format in these files, I cannot manually extract the source code with any degree of accuracy. And even if I did, manually extracting the source code would be very time consuming and error prone.
What source/version control system prefixes its source files with "s." when it stores the source file in its repository directory?
How can I effectively extract the latest source code from this repository directory?
The s. prefix is characteristic of SCCS, the Source Code Control System. The code for that is probably still proprietary, but GNU has the CSSC project which can manipulate SCCS files. It tracks changes per-file in revisions, known as 'deltas'.
SCCS is the official revision control system for POSIX; you can find the commands documented on the Open Group site (but the file format is not specified there, AFAICT):
admin
delta
get
prs
rmdel
sact
unget
val
what
The file format is not specified by POSIX. The manual page for get says:
The SCCS files shall be files of an unspecified format.
The original SCCS command set included some extras not recorded by POSIX:
cdc — change delta commentary (for changing the checkin comments for a delta)
comb — combine, effectively for merging deltas
help — no prefix; the wasn't any other help program at the time. Commands generate error codes such as cm3 and help interpreted them.
sccsdiff — difference between two deltas of a file
Most systems now have a single command, sccs, which takes the operation name and then options. Often, the files were placed into an ./SCCS/ subdirectory and extracted from that as required, and the sccs front-end would handle name expansion, adding s. or SCCS/s. to the start of the file names.
For extracting the latest version of the source code, use get.
get s.*
sccs get s.*
These will get the default version of each file, and the default default is the latest version of the file.
If you need to make changes, use:
get -e s.filename.c
...make changes...
delta -y'Why you made the changes' s.filename.c
get s.filename.c
Note that the files 'lose' the s. prefix for the working file names, rather like RCS (Revision Control System) files lose the ,v suffix for the working file names. If you've not come across that, accept that it was different when SCCS and RCS were created, back in the late 70s or early 80s.
SCCS uses an s. prefix. But it might not be the only one!
I never knew this knowledge would come in useful some day!

Using version control for RTF documents?

I have recently started a new job as a Business Systemd Analyst. The company has an in-house document management system that reads/parses RTF documents that have a BBCode-like syntax to do basic conditional, looping and inserting of data from a database; my role is to modify these RTF files with the code blocks to make them dynamic.
For my own personal use I would like to utilize a version control system to better handle revisions and so I don't have to have dozens of copies of a file during the various stages I'm working on them, probably Mercurial (I don't feel like dealing with Cygwin), but seeing as I'm more used to source code in an IDE than a rich text document template, I'm not quite sure if a VCS system is even the appropriate solution to use as I couldn't really use them to diff files, just as storage and tracking.
Any suggestions for this? Could I get by with a VCS system or am I applying programmer logic to a non-programming problem? :)
seeing as I'm more used to source code in an IDE than a rich text
document template
It is a look at a strange angle: you can version all, always, anytime. Just sometimes it's less usable, sometimes - more.
If your files are basically text - you can version/compare/rollback, if your files are readable by special viewers texts - you can also diff revisons, if your files are readable by eyes - you can also merge sources. If you have GUI, you have all power of SCM and usability of tools.
...And be glad that you did not have to work with something like this
{\rtf1\ansi\ansicpg1251\deff0\deflang1049{\fonttbl{\f0\fswiss\fcharset204{\*\fname Arial;}Arial CYR;}}
{\colortbl ;\red0\green128\blue0;}
{\*\generator Msftedit 5.41.15.1515;}\viewkind4\uc1\pard\f0\fs20\'dd\'f2\'ee \'ef\'e5\'f0\'e2\'e0\'ff \'f1\'f2\'f0\'ee\'ea\'e0\par
\'dd\'f2\'ee \b\'e2\'f2\'ee\'f0\'e0\'ff \cf1\b0\'f1\'f2\'f0\'ee\'ea\'e0\cf0\par
}
(ordinary pure-RTF with short russian text in it)

Archiving succesive beta versions : how to save harddisk space?

I archive successive versions of an in-progress work :
MySoftware-v1.01beta.rar [2 GB]
MySoftware-v1.02beta.rar [2 GB]
MySoftware-v1.03beta.rar [2 GB]
MySoftware-v1.04beta.rar [2 GB]
etc.
Lots of files are modified, so it's not possible to backup only modified files : most of the files are modified each time.
How can do a .rar file that only saves the "difference" (should I use something like "patch" or "diff" ? -> I never used them). There are lots of "difference" tool, okay, but the result file won't be a .rar, it will only be a "difference file" : so each time I would like to re-open such an archive, I'll have to "de-diff" it and only THEN I will have a .rar again.
I'm on Windows, and if possible, I'd like to use winrar or command-line tool (it would be great if no third party software is needed).
Thanks a lot in advance!
You say 90% of your product is .wav files. Since diff on two wav files that are different is likely to produce huge differences, this is not likely to save you any space. Nor are .wav files really compressible, so zip or rar likely doesn't help much, either.
However, if, like most of us programmers, you derive your next version of the product from the previous one, by mostly retaining files unchanged (whether that be source or be .wav files), then what you really want to do is simply store, for each version, the files that changed. This is called "de-duplication" in the backup/compression world.
You can organize a complicated scheme your self to do this. (e.g., your self-suggested "do this with winrar"). But if you use a decent "source control system" (SVN or GIT would be fine), this will happen automatically as you checkin changed (and don't re-checkin unchanged) files. These tools work by keeping track of "differences" between versions; you can tell the tools to track text ("diff") style differences, or simply store the entire thing.
Also, since your individual versions occupy 2GB, I'd go waste $100 on a 2 or 4 terabyte (external) drive. That should last you in worst case through some 1000 iterations. (SVN/GIT will likely extended this a lot further).
You should really be using a source control system. A popular one is called 'git'. There are many others, each with their own strengths and weaknesses and the debate about which is 'best' is long and tedious.
Source control systems take care of storing and managing revisions of your files. The actual methods vary, but as a programmer who uses version control you 'check in' files for storage and version control, 'tag' them with revision numbers and then 'check out' files for modifying.
If you've ever downloaded source off the Internet using 'svn' or 'cvs', that's the type of thing I mean.
The source control system usually uses some sort of difference system to only store differences between modified files. Its purpose is to save you from having to even think about copying and backing up files - all you have to do is ensure your 'repository' is backed up correctly.
Also, as an added advantage you can make changes to source files and always have backups in case your changes need reverting. So suppose you want to try out a new file handling system you can use the source control system to create a testing (or whatever you want to call it) 'branch' and do all your changes in there without damaging a working copy of your software. If the changes are good you can then 'merge' the changes into the non testing branch of your repository.

How to join two files in a version control system

I am doing a refactoring of my C++ project containing many source files.
The current refactoring step includes joining two files (say, x.cpp and y.cpp) into a bigger one (say, xy.cpp) with some code being thrown out, and some more code added to it.
I would like to tell my version control system (Perforce, in my case) that the resulting file is based on two previous files, so in future, when i look at the revision history of xy.cpp, i also see all the changes ever done to x.cpp and y.cpp.
Perforce supports renaming files, so if y.cpp didn't exist i would know exactly what to do. Perforce also supports merging, so if i had 2 different versions of xy.cpp it could create one version from it. From this, i figure out that joining two different files is possible (not sure about it); however, i searched through some documentation on Perforce and other source control systems and didn't find anything useful.
Is what i am trying to do possible at all?
Does it have a conventional name (searching the documentation on "merging" or "joining" was unsuccessful)?
You could try integrating with baseless merges (-i on the command line). If I understand the documentation correctly (and I've never used it myself), this will force the integration of two files. You would then need to resolve the integration however you choose, resulting in something close to the file you are envisioning.
After doing this, I assume the Perforce history would show the integration from the unrelated file in it's integration history, allowing you to track back to that file when desired.
I don't think it can be done in a classic VCS.
Those versioning systems come in two flavors (slide 50+ of Getting git by Scott Chacon):
delta-based history: you take one file, and record its delta. In this case, the unit being the file, you cannot associate its history with another file.
DAG-based history: you take one content and record its patches. In this case, the file itself can vary (it can be renamed/moved at will), and it can be the result of two other contents (so it is close of what you want)... but still within the history of one file (the contents coming from different branches of its DAG).
The easy part would be this:
p4 edit x.cpp y.cpp
p4 move x.cpp xy.cpp
p4 move y.cpp xy.cpp
Then the tricky part becomes resolving the move of y.cpp and doing your refactoring. But this will tell Perforce that the files are combined.