Version control for DOCX and PDF?

I've been playing around with git and hg lately, and it occurred to me that this kind of thing would be great for documents.
I have a document which I edit as DOCX and export as PDF. I tried version-controlling it with both git and hg. With hg you end up tracking only a binary, so diffing isn't meaningful. With git I can meaningfully diff the DOCX (I haven't tried the PDF yet), but I was wondering if there is a better way than what I'm doing right now. (Ideally, not having to leave Word to diff would be the best solution.)

There are two different concepts here - one is "can the version control system make some intelligent judgements about the contents of files?" - so that it can store just delta information between revisions (and do things like assign responsibility to individual parts of a file).
The other is 'do I have a file comparison tool which is useful for the types of files I have in the version control system'. Version control systems tend to come with file comparison tools which are inferior to dedicated alternatives. But they can pretty much always be linked to better diff programs - either for all file types or specific ones.
So it's common to use, for example, Beyond Compare as a general compare tool, with Word as a dedicated Word document comparer.
Different version control systems differ as to how good people perceive them to be at handling 'binaries', but that's often as much to do with handling huge files and providing exclusive locking as it is to do with file comparison.

http://tortoisehg.bitbucket.io/ includes a plugin called docdiff that integrates Word and Excel diff'ing.

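You can use Beyond Compare as an external diff tool for hg. Add the following to (or change it in) your user mercurial.ini; the extdiff extension has to be enabled for the [extdiff] section to take effect:
[extensions]
extdiff =
[extdiff]
cmd.vdiff = c:/path/to/BCompare.exe
Then get the Beyond Compare file viewer rule for docx.
Now you should be able to compare two versions of a docx in Beyond Compare by running hg vdiff.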

This article outlines a solution for DOCX using Pandoc, while this post outlines a solution for PDF using pdf2html.

For docx only, I compiled instructions from multiple places here: https://gist.github.com/nachocab/6429893
# download docx2txt by Sandeep Kumar
wget -O docx2txt.pl http://www.cs.indiana.edu/~kinzler/home/binp/docx2txt
# make a wrapper
echo '#!/bin/bash
docx2txt.pl "$1" -' > docx2txt
chmod +x docx2txt docx2txt.pl
# make sure docx2txt.pl and docx2txt are in your current PATH. Here's a guide:
# http://shapeshed.com/using_custom_shell_scripts_on_osx_or_linux/
mv docx2txt docx2txt.pl ~/bin/
# set the attributes from inside the repository (this is per project; git can also read a global attributes file configured via core.attributesFile)
echo "*.docx diff=word" > .git/info/attributes
# add the following to ~/.gitconfig
[diff "word"]
binary = true
textconv = docx2txt
# add a new alias
[alias]
wdiff = diff --color-words
# try it
git init
# create my_file.docx, add some content
git add my_file.docx
git commit -m "Initial commit"
# change something in my_file.docx
git wdiff my_file.docx
# awesome!
It works great on OSX
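The textconv step above does not have to be docx2txt. As a rough illustration only (not part of the gist), here is a minimal Python sketch of a converter git could call instead. It assumes the main text of a DOCX lives in word/document.xml inside the zip archive; the function name docx_to_text is made up for this example, and headers, footnotes and tracked changes are ignored.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(source):
    """Return the plain text of a DOCX, one line per paragraph.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    lines = []
    for paragraph in root.iter(W_NS + "p"):
        # Text runs are stored in <w:t> elements inside each paragraph.
        lines.append("".join(t.text or "" for t in paragraph.iter(W_NS + "t")))
    return "\n".join(lines)


if __name__ == "__main__" and len(sys.argv) == 2:
    # git appends the file to convert as the last argument of a textconv command.
    print(docx_to_text(sys.argv[1]))
```

Saved as a script, it could be wired in with something like textconv = python3 /path/to/docx_textconv.py, where the path is a placeholder.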

If you happen to use a Mac, I wrote a git merge driver that can use Microsoft Word and tracked changes to merge and show conflicts between any file types Word can read & write.
http://github.com/jasmas/wordMerge
I say 'if you happen to use a Mac' because the driver I wrote uses AppleScript, primarily to accomplish this task.
It'd be nice to add a vbscript version to the project, but at the moment I don't have a Windows environment for testing. Anyone with some basic scripting knowledge should be able to take a look at what I'm doing and duplicate it in vbscript, powershell or whatever on Windows.

I used SVN (yes, in 2020 :-)) with TortoiseSVN on Windows. It has a built-in function to compare DOCX files: it opens Microsoft Word in a mode where the screen is divided into four parts (the file after the changes, the file before the changes, the changes highlighted, and a list of changes). Screenshot below (sorry for the Polish version of MS Word). I also checked TortoiseGit, which has this functionality too. I've read that TortoiseHg has it as well.

Related

iOS Localization - Updating Localizable.strings with just new strings

I have searched Google and StackOverflow and still have no clear answer on an easy and automated way of doing this but here is the scenario:
I have an app with 1000 strings localized into en, fr, de, es, it.
I build a new feature that makes 10 distinctly new NSLocalizedString() keys.
I just want those 10 new strings appended onto the ends of the files:
en.lproj/Localizable.strings
fr.lproj/Localizable.strings
es.lproj/Localizable.strings
de.lproj/Localizable.strings
it.lproj/Localizable.strings
genstrings will retrieve all 1010 distinct strings. This is a pain since I'll need to "needle in a haystack" find those 10 strings every time I do an update.
UPDATE 19-SEP-2014 -- Xcode 6 - Apple has finally released support for XLIFF export and import of your .strings files
What's new in Xcode 6? Localisation
Linguan (v1.1.3): whilst it is a lovely tool most of the time, it is starting to be a tool in the other sense. It merges the changes, but some strings aren't matched correctly during the merge, so every time it does a Scan Sources it creates 100 new duplicate keys as well as the 10 strings I am after, which makes more work.
FileMerge: as suggested below, try doing a diff between the old and new versions of the genstrings output files. The genstrings output has the strings sorted alphabetically, so 10 strings scattered throughout 1000 means that there are 200 differences to review. It keeps matching the /*...*/ and the "..." = "..." and saying that the ... has been updated. It hasn't been updated, just shifted to a new location in the file. More and more it is looking like I am going to have to write a custom tool.
MacHG + FileMerge: on a side note, for some strange reason it doesn't like doing diffs out of the repository against the working copy of Localizable.strings. Both the left and right panes appear empty.
UPDATE: It turns out that some changesets being saved as UTF-16 and others as UTF-8 prevents it from doing a proper diff.
Bash Script + FileMerge: I have written the following script to help maintain my English reference file each time I add new NSLocalizedString entries:
#LOCALISATION UPDATE SCRIPT
#
#This will create a temporary copy of the current 'en' reference file then generate the
#latest reference file using the 'genstrings' tool. Finally forcing FileMerge to launch
#and diff the changes.
#
#Last Updated: 2014-JAN-06
#Author(s): Josh Wilson
clear
#assuming this script is run from $SRCROOT
#Backup Existing 'en' reference
cp "en.lproj/Localizable.strings" "en.lproj/Localizable-src.strings"
#Scan source files for 'NSLocalizedString' macros
genstrings -q -u -o en.lproj Classes/*.{m,mm}
genstrings -q -u -a -o en.lproj Classes/iPad/*.{m,mm}
genstrings -q -u -a -o en.lproj Classes/iPhone/*.{m,mm}
#Force FileMerge to launch and diff the update (NOTE: piping to cat forces GUI to open)
opendiff "en.lproj/Localizable-src.strings" "en.lproj/Localizable.strings" | cat
#Clean up temporary file
rm "en.lproj/Localizable-src.strings"
But this only updates the EN file, and I still lack a way of getting the other language files updated with the new keys. This one has been good for instances where I don't have an English word as the key and genstrings overwrites my
"welcome_message" = "Welcome!" with "welcome_message" = "welcome_message"
POEditor (http://poeditor.com/): this is an online tool, subscription based after 1000 strings. It seems to work well, but it would be good if there was a non-subscription tool.
Traducto Pro: seems to do an alright job of integrating with Xcode, extracting the strings and merging things together. But it is impossible to get anything back out of it until it is fully translated, so you are coerced into using their translation services.
Surely this functionality has been implemented before. How does Apple keep their apps localised?
Script junkies, I call upon thee! iOS development has been going on for some time now and localisation is kind of common, so surely there is a mature solution to this by now?
Python Script update_strings.py: Stack Overflow finally recommended a related question, and the Python script in this answer (Best practice using NSLocalizedString) looks promising...
Tested it, and in its current form (31-MAY-2013) it doesn't handle multiline comments if you have duplicate comment entries (it expects single-line comments).
Might just need to tweak the regexes a bit.
Check out BartyCrouch; it addresses exactly this problem. It is also open source, actively maintained, and can be easily installed and integrated into your project.
Install BartyCrouch via Homebrew:
brew install bartycrouch
Alternatively, install it via Mint:
mint install Flinesoft/BartyCrouch
Incrementally update your Localizable.strings files:
bartycrouch update
This will do exactly what you were looking for.
In order to keep your Storyboards/XIBs Strings files updated over time I highly recommend adding a build script (instructions on how to add a build script here):
if which bartycrouch > /dev/null; then
    bartycrouch update -x
    bartycrouch lint -x
else
    echo "warning: BartyCrouch not installed, download it from https://github.com/Flinesoft/BartyCrouch"
fi
In addition to incrementally updating your Storyboards/XIBs Strings files this will also make sure your Localizable.strings files stay updated with newly added keys in code using NSLocalizedString and show warnings for duplicate keys or empty values.
Make sure to check out BartyCrouch on GitHub for additional information.
If you have the genstrings output for the previous version, a simple "diff" between the new and old files could do the trick.
EDIT: it's best to use vimdiff to deal with UTF-16 files.
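A plain line diff struggles here because the new entries are scattered through the sorted genstrings output and the files may mix UTF-16 and UTF-8. As a sketch of the same idea at key level, assuming simple "key" = "value"; entries, the following Python compares keys instead of lines. The names decode_strings, parse_strings and missing_entries are hypothetical, and comments are not carried over.

```python
import codecs
import re

# Matches one entry of the form "key" = "value"; and allows escaped quotes inside either side.
ENTRY_RE = re.compile(r'"((?:[^"\\]|\\.)*)"\s*=\s*"((?:[^"\\]|\\.)*)"\s*;')


def decode_strings(data):
    """Decode .strings bytes: UTF-16 if a UTF-16 byte order mark is present, otherwise UTF-8."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return data.decode("utf-16")
    return data.decode("utf-8-sig")


def parse_strings(text):
    """Map each key to its value, in file order."""
    return {m.group(1): m.group(2) for m in ENTRY_RE.finditer(text)}


def missing_entries(reference_text, translation_text):
    """Return the entries found in the reference file whose keys the translation lacks."""
    known = parse_strings(translation_text)
    return [
        '"%s" = "%s";' % (key, value)
        for key, value in parse_strings(reference_text).items()
        if key not in known
    ]
```

The returned lines could then be appended to fr.lproj/Localizable.strings and the other translations, leaving the existing entries untouched.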
You can check out this Xcode plugin I built for OneSky; it aims to improve the localization workflow for iOS/Mac OSX developers.
The string generation feature of the plugin runs genstrings and ibtool --export-strings-file on the selected source/IB files. New files will be added to the project and target automatically, and new strings will be merged into existing files with comments.
It will only generate/update strings for the base language, but you can use other features of the plugin to automate translation export and import with the OneSky platform, which is free for crowdsourced projects.
You may want to check out my solution here: SwiftyLocalization
With few steps to setup, you will have a very flexible localization in Google Spreadsheet (comment, custom color, highlight, font, multiple sheets, and more).
In short, steps are: Google Spreadsheet --> CSV files --> Localizable.strings
Moreover, it also generates Localizables.swift, a struct that acts as an interface for key retrieval and decoding (you have to manually specify how to decode a String from a key, though).
Why is this great?
You no longer need to have keys as plain strings all over the place.
Wrong keys are detected at compile time.
Xcode can do autocomplete, so you can do something like this:
// It's defined as computed static var, so it's up-to-date every time you call.
// You can also have your custom retrieval method there.
button.setTitle(Localizables.login.button_title_login, forState: .Normal)
The project uses Google Apps Script to convert Sheets --> CSV, and a Python script to convert CSV files --> Localizable.strings
You can have a quick look at this example sheet to know what's possible.

Diff merge : view difference between two folders and ignore file version number

I need some help. I have to view the differences between two folders, but I need to ignore the file version number (project version number) in the header of each file, like this:
#version Release: $Revision: 9939 $
Also, which diff/merge software for Mac OS X does this best, and looks the nicest? I know DiffMerge and Kaleidoscope. I love Kaleidoscope, but it cannot compare two folders.
Many thanks in advance.
Try going into DiffMerge → Preferences → File Windows → Rulesets. You can modify the existing ruleset for your file type (or add a new one if no ruleset exists already).
Edit the ruleset you're interested in, and go to Lines to Omit. In there you can add a regex to match that line #version.
What I'm having trouble with is getting the folder diff to honor this. I find that files with no diffs according to my rules still end up in the folder diff as a non-match, but when I open the file diff window it says Files are identical or equivalent under the current RuleSet. Not sure if this is a bug or I still have something configured wrong. If I go into Folder Windows → Equivalence Mode and dig into the help there, I think I have all the folder diff properties set correctly to honor my rulesets, but still no luck.
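If the folder view keeps flagging such files, one fallback is to do the folder comparison outside DiffMerge. Below is a minimal Python sketch, assuming text files and a header line that starts with #version; the pattern and the function names are my own, not anything DiffMerge provides.

```python
import os
import re

# Assumed pattern for header lines such as "#version Release: $Revision: 9939 $"
IGNORED_LINE = re.compile(r"^#version\b")


def significant_lines(path, ignored=IGNORED_LINE):
    """Read a text file and drop the lines matching `ignored`."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return [line for line in handle if not ignored.search(line)]


def relative_files(root):
    """Collect every file below `root` as a path relative to `root`."""
    found = set()
    for folder, _dirs, names in os.walk(root):
        for name in names:
            found.add(os.path.relpath(os.path.join(folder, name), root))
    return found


def differing_files(left_dir, right_dir, ignored=IGNORED_LINE):
    """Return sorted relative paths that exist on one side only or differ after filtering."""
    left, right = relative_files(left_dir), relative_files(right_dir)
    changed = left ^ right  # files present on one side only
    for rel in left & right:
        left_lines = significant_lines(os.path.join(left_dir, rel), ignored)
        right_lines = significant_lines(os.path.join(right_dir, rel), ignored)
        if left_lines != right_lines:
            changed.add(rel)
    return sorted(changed)
```

Only the files it returns would then need to be opened in a visual diff tool.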
It's a pity you need macOS. On Windows there is WinMerge, which is readily configurable via Tools/Filters/Linefilters, where you simply enter a regular expression to be ignored.
http://manual.winmerge.org/Filters.html
For example, you might use line filters to ignore comments or certain type of generated code, like version control system timestamps
WinMerge 3 will be Qt based and should therefore run on macOS too, but the current 2.x does not.

Multiple repositories in one directory (same level) - is it possible?

My original problem is that I have a directory where I write various scripts. Each of them is independent of others, and usually one-file-long. I want to have some versioning applied to them, but I have the following problems/requirements:
1. I don't want to have to store each small script in a separate directory!
2. I don't want to store them all in one repository OTOH, as they are completely unrelated, and:
   - some of them may later grow to more files (and then they will need a separate dir),
   - I sometimes want to copy one of them to a different machine (and I want to clone the whole repo).
3. I want to benefit from (distributed) version control mechanisms -- at least:
   - "infinite" number of revisions,
   - ability to clone repositories on different computers,
   - ability to do "atomic" multi-file commits.
Is it possible?
I'd prefer to do it in some mainstream distributed VCS (a solution using Mercurial would be preferable, but I'm not fixed).
EDIT: the solution has to be free (at least "as in beer") and cross-platform (at least Win32 & Linux).
Related, but didn't help:
"two-git-repositories-in-one-directory" -- didn't find it helpful: the accepted answer looks like point 2. (above) to me; the current "community voted" answer sounds like 1.
"Version control of single files using Subversion" -- also too much of 2. or 1.
These requirements seem pretty "special" to me, so here is a solution on par with them ^^
You may use two completely different VCS, in the same directory. Even two "instances" of SVN might work: SVN stores its metadata in a directory called .SVN and has (for historical reasons regarding ASP) the option to use _SVN. The Directory listing should look like this
.SVN // Metadata for rep1
_SVN // Metadata for rep2
script1 // in rep1
script2 // in rep2
...
Of course, you will need to hide or ignore the foreign scripts or folders from each VCS...
Added:
This only covers two scripts in one folder and needs one additional VCS per script beyond that. So if you consider this route and need more repositories, rename each metadata directory and use a script to rename it back before updating:
MOVE .SVN-script1 .SVN
svn update
MOVE .SVN .SVN-script1
Why don't you simply create a separate branch (in the git sense) for each (group of) script(s)?
You can develop them individually as you please. Switching to a branch will show you only the scripts from that branch. It's sort of like directories, but managed by the version control system. If you later want to pluck a branch out into another repository, you can do that, and if you want to combine two scripts into a single project, you can do that as well. Copying them to a different machine might be a problem, but you can clone just the branch you're interested in and it should work for you.
Another proposition for my own consideration is "Using Convert to Decompose Your Repository" article on hgtip.com. It fails as a "standalone" solution, but could be helpful as an addition to the "mv .hgN .hg / MOVE .SVN-script1 .SVN" idea.
You can create multiple hidden repository directories and symlink .hg to whichever one you want to be active. So if you have two repositories, create directories for them:
.hg_production
.hg_staging
Then to activate either of them just do:
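ln -sfn .hg_production .hg
You could easily create a bash command to do this. So instead you could write something like activate-repo production, which would run ln -sfn .hg_production .hg.
Note: plain ln -sf follows an existing .hg symlink and creates the new link inside the directory it points to, which is why it can appear not to work (for example on a Mac). Use -n as above, or remove the old link first:
rm .hg; ln -s .hg_production .hg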
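The activate-repo helper mentioned above could look roughly like this in Python. It is only a sketch: it assumes the .hg_<name> naming from this answer, refuses to touch a real (non-symlink) .hg directory, and the function name activate_repo is hypothetical. Creating symlinks on Windows may need extra privileges.

```python
import os


def activate_repo(name, workdir="."):
    """Point the .hg symlink in `workdir` at the hidden repository directory .hg_<name>."""
    target = ".hg_" + name
    link = os.path.join(workdir, ".hg")
    if not os.path.isdir(os.path.join(workdir, target)):
        raise FileNotFoundError("no such repository directory: " + target)
    if os.path.islink(link):
        # Remove the old link itself; the directory it pointed to is left alone.
        os.remove(link)
    elif os.path.lexists(link):
        raise RuntimeError(".hg exists and is not a symlink; refusing to replace it")
    # A relative target keeps the link valid if the whole folder is moved.
    os.symlink(target, link, target_is_directory=True)
    return link
```

Calling activate_repo("staging") afterwards switches the same folder over to the .hg_staging repository.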
I can only think of these two lightweight versioning systems:
1) Using Dropbox with the Pack-Rat upgrade, to keep a full history of versions for each file automatically backed up and with the possibility to be shared with multiple Dropbox users: https://www.dropbox.com/help/113
If you have multiple machines managed by the same user (you), the synching would be automatic. Also if the machines are in the same LAN, Dropbox is smart enough to sync the files over the local network, so big files shouldn't be a worry.
2) Using a 'Versions' aware text editor for Mac OS X Lion. I'd expect TextMate, Coda and other popular Mac code editors to be updated to support this feature when Lion is released.
How about a compromise between 1 and 2? Instead of a folder+repo for each script, can you bundle them into loosely related groups, such as "database", "backup", etc. and then make one folder+repo for each group? Then if you clone a repo on another machine, you're only pulling down a smaller number of unrelated files. (Is the bandwidth/drivespace really a concern?) To me, this sounds WAAAY simpler than all of the other suggestions so far.
(Technically this approach meets your requirements because (1) each script isn't in its own directory, (2) not all scripts are in the same repository, and (3) you can easily do this with any popular DVCS. :D)
UPDATE (2016): Apparently, a guy named Cosmin Apreutesei created a tool named multigit, which seems to implement what I wished for in this question! If you ever read it, thanks a lot Cosmin! I've started using your tool this year and find it awesome.
I'm starting to think of some kind of an overlay over Mercurial/git/... which would keep a couple "disabled" repository meta-directories, let's say:
.hg1/
.hg2/
.hg3/
etc., and then on hg commit FILENAME would find the particular .hgN that is linked to FILENAME, and would then temporarily:
mv .hgN .hg
hg commit FILENAME
mv .hg .hgN
The main disadvantage is that it would require me to spend some time writing the tool. Or does anybody know of some ready-made one like this? If you do, please post as a full-featured answer (not a comment), I'm more than willing to accept it.
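For what it's worth, the swap itself is small. Here is a minimal Python sketch of the temporary rename, with the restore guaranteed even if the command fails. The lookup from FILENAME to the right .hgN is left out, and activated and run_in_repo are hypothetical names.

```python
import contextlib
import os
import subprocess


@contextlib.contextmanager
def activated(meta_dir, workdir=".", active=".hg"):
    """Temporarily rename `meta_dir` (e.g. ".hg2") to `active`, then rename it back."""
    parked = os.path.join(workdir, meta_dir)
    live = os.path.join(workdir, active)
    if os.path.lexists(live):
        raise RuntimeError("%s already exists; another repository is active" % live)
    os.rename(parked, live)
    try:
        yield live
    finally:
        # Runs even if the wrapped command raises, so the directory is always parked again.
        os.rename(live, parked)


def run_in_repo(meta_dir, command, workdir="."):
    """Run `command` (a list such as ["hg", "commit", "FILENAME"]) with `meta_dir` activated."""
    with activated(meta_dir, workdir=workdir):
        return subprocess.run(command, cwd=workdir).returncode
```

For example, run_in_repo(".hg2", ["hg", "commit", "FILENAME"]) would perform the three steps listed above in one call.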

Getting more out of *.diff -files

I wonder if there are tools to show *.diff files used in patching related to debian packaging. What I need from the tool is that it could just read the diff file and show the actual files changed with changed rows, like kdiff or meld would do when comparing directly 2 different files. Or maybe I have totally wrong kind of approach to this, maybe I should ask how can I get more out of diff-files?
Kompare is able to open a .diff, and it shows you the files changed at the top, a list of changes for the selected file, and a side-by-side diff (for the lines that it is able to extract from the .diff).
However, when I fed it a debdiff, it got confused. The diff did not have === file headers, only --- and +++ headers, and so it included the changes from /debian/changelog, /debian/copyright, and /debian/rules within the /debian/control file. YMMV.
Screenshot: http://imagebin.ca/view/fNWEzx.html
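If a viewer gets confused like this, the diff can be split per file beforehand. Here is a minimal Python sketch, assuming unified diff format with --- / +++ header pairs and @@ hunk headers; split_unified_diff is a hypothetical helper, not part of Kompare or debdiff. It counts hunk lines so that removed or added lines that merely look like headers are not mistaken for a new file.

```python
import re

# Hunk header, e.g. "@@ -1,2 +1,3 @@"; a missing count means 1.
HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def split_unified_diff(text):
    """Group the lines of a multi-file unified diff by the file named in each '+++' header."""
    files = {}
    current = None
    old_left = new_left = 0  # lines still expected in the current hunk
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if old_left > 0 or new_left > 0:
            # Inside a hunk: consume lines according to their prefix.
            if line.startswith("-"):
                old_left -= 1
            elif line.startswith("+"):
                new_left -= 1
            elif not line.startswith("\\"):  # skip "\ No newline at end of file"
                old_left -= 1
                new_left -= 1
        else:
            following = lines[index + 1] if index + 1 < len(lines) else ""
            hunk = HUNK_RE.match(line)
            if line.startswith("--- ") and following.startswith("+++ "):
                # Use the new-side name, without a trailing tab-separated timestamp.
                name = following[4:].split("\t")[0].strip()
                current = files.setdefault(name, [])
            elif hunk:
                old_left = int(hunk.group(1) or 1)
                new_left = int(hunk.group(2) or 1)
        if current is not None:
            current.append(line)
    return files
```

Each value can be written to its own .diff file and opened separately.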
The Debian diff format seems to be a special diff format. A short Google search didn't turn up a graphical tool that handles these files the way normal diff tools do, so I'm not sure whether such a tool exists. Perhaps you could try to convert these debdiff files to normal diff files (I didn't find a tool that does that, either).
There is a tool to visualize changes in Linux packages (Deb, RPM, TAR.GZ, etc.) - pkgdiff.
Usage:
pkgdiff -old OLD.deb -new NEW.deb
Sample reports:
http://lvc.github.com/pkgdiff/pkgdiff_reports/libqb/0.4.1_to_0.8.1/changes_report.html
http://lvc.github.com/pkgdiff/pkgdiff_reports/gstreamer/0.10.23-i486-1_to_0.10.32-i486-1/changes_report.html

Merge tools that ignore $Id lines

I need to merge a forked project.
Unfortunately, the CVS $Id lines are different, so the merge tools I tried report that all the files are different (and 95% of them differ only in this line).
Is there a merge tool that can be configured to ignore line comparison results based on a pattern?
[edit]
I discovered that WinMerge has line filters - setting them up correctly actually works.
Francesco
I use meld, which can use regex filters to ignore.
It has some preset ones you can select including CVS keywords.
The regex it uses for that BTW is:
\$\w+(:[^\n$]+)?\$
You can get meld on any Linux distro or download it from here: http://meld.sourceforge.net/
I'm not sure how well it's supported on Windows, but I do know kdiff3 supports Windows, so you could give it a try there: http://kdiff3.sourceforge.net/
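To see what that keyword regex does outside meld, here is a small Python sketch that collapses expanded keywords before comparing two texts. The helper names are hypothetical; the pattern is the one quoted above.

```python
import re

# The CVS keyword pattern quoted above, e.g. "$Id: main.c,v 1.4 ... $" or a bare "$Id$".
KEYWORD_RE = re.compile(r"\$\w+(:[^\n$]+)?\$")


def collapse_keywords(text):
    """Replace every expanded keyword with its bare form, e.g. '$Id: ... $' becomes '$Id$'."""
    def bare(match):
        name = match.group(0)[1:].split(":")[0].rstrip("$")
        return "$" + name + "$"
    return KEYWORD_RE.sub(bare, text)


def same_ignoring_keywords(left, right):
    """Compare two texts while treating differences inside keywords as irrelevant."""
    return collapse_keywords(left) == collapse_keywords(right)
```

Two files whose only difference is the expanded $Id line compare as equal this way, while any change outside a keyword still shows up.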
Well, you could use
cvs update -kk
which does not expand the $keywords.
Of course, this still leaves a problem with $Log$, which is expanded on commits and not on updates.
CompareIT allows regular expression matching. I used it to compare automatically generated code and it was very useful.