CVS has the keyword substitution feature: in a text file you write $Header$ and, when you commit the file, CVS substitutes $Header$ with something like $Header: /repo/src.cpp,v 1.6 2009/03/12 14:53:14 luser Exp $
Is it possible to get the same feature when dealing with a binary Microsoft Word file?
Thank you.
The basic problem you have with a Word file is that it is effectively a binary file (as opposed to a plain-text file), so you cannot be sure a key string like "$Header$" doesn't appear somewhere (VB macro code, for example) by accident. CVS would expand that key string, and suddenly something apparently unrelated (VB macro code, for example...) stops working.
Using CVS? Not likely. Even if $Header$ doesn't appear anywhere in your Word document (as DevSolar suggested it might), where do you place that string? Word stores text in its proprietary binary format, but CVS looks for plain text.
On the other hand, I'm sure you can achieve the effect by using either an XML Word format, or a Word macro.
Seems almost impossible with the traditional .doc format. Some creative work might allow you to create a process for making it happen with the newer XML format. I'm not sure CVS can do the job even then, but using a post-commit hook in subversion might make it more reasonable to pull off.
Related
I sometimes use VSCode for a Delphi 7 project because I like VSCode's git functionality and for a few other reasons (superior string search, diff, etc).
Delphi 7 is a pain, and to get it to consistently compile I need to convert the dfm files to their binary version (all 2300 of them). This of course makes them unviewable in the diff viewer, or to just open the file?
Is there a setting where if I open that file, it will first pass it through the convert.exe (that's its actual name) util so that it can be viewed as a text? I understand that this might be read-only, which would be sufficient to my needs (though if on save it could just pass it back through, that'd be great too).
I'm having trouble figuring out what exactly to to search for on Google (the keywords seem too generic), but I can imagine some generalized functionality that would work for other environments beyond just Delphi/pascal.
Are there any Perl methods or modules to get the SVN keyword of a given file without calling the system SVN installation?
Thanks!
If you do not involve SVN, then the file is just a text file. Whether or not it is version controlled is not a property of the file but is stored in the hidden .svn folder on the same level. That means, if you want to analyze the file without involving SVN, you have to treat it as a plain text file. However, expanded SVN keywords do follow a certain syntax (or SVN itself would be unable to find them after they have been expanded once).
In your case, any file that had the keyword substituted most likely has a line like
$Id: ActualIdHere $
and if the keyword has not yet been substituded, it will just be $Id$. And Perl is very able to extract data from a known syntax. Especially given that you should know what characters can make up an Id (hint: Avoid $ if any possible).
To answer your specific question: A regex like /^\$Id\: (.*) \$$/ should probably find the Id you are looking for and store it in $1. That is of course assuming there is nothing else in that line. But only you can know how and where you have what sort of keywords, so it is up to you to craft the regex or to provide more detailed information, if you struggle to.
you may try something like this, to get the svn id into a perl variable.
my ($SVN_ID) = ('$Id: ActualIdHere $' =~ /^\$Id\: (.*) \$$/);
print $SVN_ID;
But for SVN you also have to enable keyword substitution in SVN, see svn documentation
Note: the inserted id is the value at the time the file was last changed. The id may have changed later than that for other files.
First, let me explain what I am doing. I have a CVS repository that I store 5,000 Data Definition Language files in. These 5,000 files are generated from an external data modeling application, they are text and have windows CRLFs. During development, if I need to make a change, I re-generate the 5,000 files and then overwrite the contents of my local CVS workspace in eclipse. The full overwrite/replacement is to make sure that I don't miss any updates to files. After overwriting/replacing the files, I use eclipse to do a team < synchronize with repository. When I do this, the comparison flags every single file as an outgoing change because it looks to not be ignoring CRLFs in its comparison. I have "Ignore white space" checked off and the eclipse documentation states that it should be ignoring CRLFs:
Ignore whitespace option:
Causes the comparison to ignore differences which are whitespace characters
(spaces, tabs, etc.). Also causes differences in line terminators ( LF
versus CRLF) to be ignored.
When I open the files in text compare, it shows no diffs but there is an extra CRLF at top of one of the files. Is this a bug or is there an option I am missing in eclipse? It looks like the problem is that it doesn't ignore CRLFs that are on their own line.
The Eclipse compare dialog doesn't have a bug; you're just confused because you're seeing the output of several, independent problems.
The option "ignore whitespace" only reduces the amount of changes that the compare dialog shows; it has no effect whatsoever on the differences that CVS sees. So as long as the files have the wrong line ending, CVS will complain.
Some version control systems allow you to specify converters to solve this issue, CVS doesn't. So you really need to generate files with the correct line endings.
The "single file with extra CRLF" really has a an extra CRLF. Find out why and fix that to make the difference go away.
When generating files, you should never use PrintStream or PrintWriter. It is tempting but these two have many bugs (like close() doesn't flush(), violating their API contract) plus they use platform dependent line endings which is almost never what you want. Yes, it might work by accident but trust me on this, that's not what you want. You don't want you pay check filed on accident, either, right?
If you don't use PrintStream nor PrintWriter, then avoid the System property line.separator for the same reasons.
I suggest to wrote a helper class which has many of the methods of PrintStream / PrintWriter but none of the bugs. Plus it should allow you to set the line delimiter to whatever you need.
Note: If you use a Writer, make sure you also specify the charset / encoding or the "UTF-8 to bytes" conversion will be as random as the line endings.
I could not find this answer in the man or info pages, nor with a search here or on Google. I have a file which is, in essence, a text file, but it somehow got screwed up upon saving. (I think there are a few strange bytes at the front of the file accidentally.)
I am able to open the file, and it makes sense, using head or cat, but not using any sort of editor.
In the end, all I wish to do is open the file in emacs, delete the "messy" characters, and save it once cleaned up. The file, however, is huge, so I need something powerful like emacs to be able to open it.
Otherwise, I suppose I can try to create a script to read this in line by line, forcing the script to read it in text format, then write it. But I wanted something quick, since I won't be doing this over & over.
Thanks!
Mike
perl -i.bk -pe 's/[^[:ascii:]]//g;' file
Found this perl one liner here: http://www.perlmonks.org/?node_id=619792
Try M-xfind-file-literally in Emacs.
You could edit the file using hexl-mode, which lets you edit the file in hexadecimal. That would let you see precisely what those offending characters are, and remove them.
It sounds like you either got a different line ending in the file (eg: carriage returns on a *nix system) or it got saved in an unexpected encoding.
You could use strings to grab "printable characters in file". You might have to play with the --encoding though I have only ever used it to grab ascii strings from executable files.
I've been playing around with git and hg lately and then suddenly it occurred to me that this kind of thing will be great for documents.
I've a document which I edit in DOCX and export as PDF. I tried using both git and hg to version control it and turns out with hg you end up tracking only binary and diff-ing isn't meaningful. Although with git I can meaningfully diff DOCX (haven't tried on PDF yet) I was wondering if there is a better way to do it than I'm doing it right now. (Ideally, not having to leave Word to diff will be the best solution.)
There are two different concepts here - one is "can the version control system make some intelligent judgements about the contents of files?" - so that it can store just delta information between revisions (and do things like assign responsibility to individual parts of a file).
The other is 'do I have a file comparison tool which is useful for the types of files I have in the version control system'. Version control systems tend to come with file comparison tools which are inferior to dedicated alternatives. But they can pretty much always be linked to better diff programs - either for all file types or specific ones.
So it's common to use, for example, Beyond Compare as a general compare tool, with Word as a dedicated Word document comparer.
Different version control systems differ as to how good people perceive them to be at handling 'binaries', but that's often as much to do with handling huge files and providing exclusive locking as it is to do with file comparison.
http://tortoisehg.bitbucket.io/ includes a plugin called docdiff that integrates Word and Excel diff'ing.
You can use Beyond Compare as external diff tool for hg. Add to/change your user mercurial.ini as:
[extdiff]
cmd.vdiff = c:/path/to/BCompare.exe
Then get Beyond Compare file viewer rule for docx.
Now you should be able to compare two versions of docx in Beyond Compare.
This article outlines the solution for Docx using Pandoc
While this post outlines solution for PDF using pdf2html.
Only for docx, I compiled instructions for multiple places here: https://gist.github.com/nachocab/6429893
# download docx2txt by Sandeep Kumar
wget -O docx2txt.pl http://www.cs.indiana.edu/~kinzler/home/binp/docx2txt
# make a wrapper
echo '#!/bin/bash
docx2txt.pl $1 -' > docx2txt
chmod +x docx2txt
# make sure docx2txt.pl and docx2txt are your current PATH. Here's a guide
http://shapeshed.com/using_custom_shell_scripts_on_osx_or_linux/
mv docx2txt docx2txt.pl ~/bin/
# set .gitattributes (unfortunately I don't this can't be set by default, you have to create it for every project)
echo "*.docx diff=word" > .git/info/attributes
# add the following to ~/.gitconfig
[diff "word"]
binary = true
textconv = docx2txt
# add a new alias
[alias]
wdiff = diff --color-words
# try it
git init
# create my_file.docx, add some content
git add my_file.docx
git commit -m "Initial commit"
# change something in my_file.docx
git wdiff my_file.docx
# awesome!
It works great on OSX
If you happen to use a Mac, I wrote a git merge driver that can use Microsoft Word and tracked changes to merge and show conflicts between any file types Word can read & write.
http://github.com/jasmas/wordMerge
I say 'if you happen to use a Mac' because the driver I wrote uses AppleScript, primarily to accomplish this task.
It'd be nice to add a vbscript version to the project, but at the moment I don't have a Windows environment for testing. Anyone with some basic scripting knowledge should be able to take a look at what I'm doing and duplicate it in vbscript, powershell or whatever on Windows.
I used SVN (yes, in 2020 :-)) with TortoiseSVN on Windows. It has a built-in function to compare DOCX files (it opens Microsoft Word in a mode where your screen is divided into four parts: the file after the changes, before the changes, with changes highlighted and a list of changes). Screenshot below (sorry for the Polish version of MS Word). I also checked TortoiseGIT and it also has this functionality. I've read that TortoiseHG has it as well.