Using version control for RTF documents? - version-control

I have recently started a new job as a Business Systemd Analyst. The company has an in-house document management system that reads/parses RTF documents that have a BBCode-like syntax to do basic conditional, looping and inserting of data from a database; my role is to modify these RTF files with the code blocks to make them dynamic.
For my own personal use I would like to utilize a version control system to better handle revisions and so I don't have to have dozens of copies of a file during the various stages I'm working on them, probably Mercurial (I don't feel like dealing with Cygwin), but seeing as I'm more used to source code in an IDE than a rich text document template, I'm not quite sure if a VCS system is even the appropriate solution to use as I couldn't really use them to diff files, just as storage and tracking.
Any suggestions for this? Could I get by with a VCS system or am I applying programmer logic to a non-programming problem? :)

seeing as I'm more used to source code in an IDE than a rich text
document template
It is a look at a strange angle: you can version all, always, anytime. Just sometimes it's less usable, sometimes - more.
If your files are basically text - you can version/compare/rollback, if your files are readable by special viewers texts - you can also diff revisons, if your files are readable by eyes - you can also merge sources. If you have GUI, you have all power of SCM and usability of tools.
...And be glad that you did not have to work with something like this
{\rtf1\ansi\ansicpg1251\deff0\deflang1049{\fonttbl{\f0\fswiss\fcharset204{\*\fname Arial;}Arial CYR;}}
{\colortbl ;\red0\green128\blue0;}
{\*\generator Msftedit 5.41.15.1515;}\viewkind4\uc1\pard\f0\fs20\'dd\'f2\'ee \'ef\'e5\'f0\'e2\'e0\'ff \'f1\'f2\'f0\'ee\'ea\'e0\par
\'dd\'f2\'ee \b\'e2\'f2\'ee\'f0\'e0\'ff \cf1\b0\'f1\'f2\'f0\'ee\'ea\'e0\cf0\par
}
(ordinary pure-RTF with short russian text in it)

Related

Can VSCode display a binary file if there is an executable that will convert it to an equivalent text file?

I sometimes use VSCode for a Delphi 7 project because I like VSCode's git functionality and for a few other reasons (superior string search, diff, etc).
Delphi 7 is a pain, and to get it to consistently compile I need to convert the dfm files to their binary version (all 2300 of them). This of course makes them unviewable in the diff viewer, or to just open the file?
Is there a setting where if I open that file, it will first pass it through the convert.exe (that's its actual name) util so that it can be viewed as a text? I understand that this might be read-only, which would be sufficient to my needs (though if on save it could just pass it back through, that'd be great too).
I'm having trouble figuring out what exactly to to search for on Google (the keywords seem too generic), but I can imagine some generalized functionality that would work for other environments beyond just Delphi/pascal.

current scctext replacement for textual representation of vfp binary files

What are people using in vfp 9 for a replacement for the built-in scctext.prg that translates binary files in vfp to a textual representation?
We’ve moving an existing project that’s in vfp 9 sp1 into tfs source control, but we need a way to make sure that the non-textual files are able to get the benefits of comparison that only non-binary text files allow. We plan to check both the textual representation and the binary file into source control (the binary is more for the “just in case” scenario)
According to the document at
http://www.ita-software.com/papers/Borup_Mercurial_Published.pdf
there are at least three options for converting .scx, .frx, .lbx, .prj and other non-prg dbf files in visual foxpro (vfp) to a textual representation. Only some of them allow for converting the textual information back to binary - not sure how often we’d really use that or not.
ALTERNATE SCCTEXT
This one seems older with latest version in 2009 - not sure if it’s still the preferred tool - and it seems to have no way to take the textual representation and convert it back to a binary file.
http://vfpx.codeplex.com/releases/view/12955
TWOFOX
This one seems similar to the foxbin2prg except it creates xml files - seems like only one dev is working on it unlike the others that are open to contributions from others so not sure how current it is and how much it’s being used by other developers - it does have two way conversion like fox2binprg has.
http://www.foxpert.com/downloads.htm
FOXBIN2PRG
This one is fairly recent - but not sure if it’s production ready enough to use for prod coding working - it does have two way conversion
http://vfpx.codeplex.com/releases/view/116407
TRIGGER INVOKE ONE OF THE ABOVE ON CHANGE OF BINARY FILES IN VFP IDE
What are people using to invoke these textual representation options?
I’ve seen this class that was created to run one of the programs listed above for all files in the project. Apparently it does it when the date time of the last generate is older that the date time on the textual version of the file. One detriment I’ve read is that it generates for foundation classes and other things that really are not items that a dev is working on (code that is referenced by but not included in your project).
http://codepaste.net/9yy1gm
Thanks for any advice from those that are using vfp 9 with source control out there!
You should check out the scX library written by Paul McNett which is published on Ed Leafe's web site. I haven't used it in a mission-critical software project yet, but I have tested it out. It seemed to catch all the potential problems I've encountered with other scctext replacements.
The reason I haven't used it in a big project for a couple of reasons.
It is a breaking change for source control history. So, comparing source code in your current SCA or VCA files with the new files generated by scX isn't going to be simple.
It isn't a drop in replacement for scctext. Instead of checking files into and out of source control directly from the IDE, you'll have an intermediary folder.
You'll check your files out of source control into one folder, convert them to FoxPro format, and then edit them in the FoxPro IDE.
Then, you'll save your changes in the FoxPro IDE, convert them to scX format, and then check them into source control.
I'm sure much of #2 can be automated; but combined with #1, making the change to scX wasn't worth it for me.
FoxBin2Prg is Production ready, and AFAIK, it's the only tool that allow Diff and Merge of the generated text (tx2) files, and can regenerate the binaries from them.
The generated files are PRG style, so developers can see them as modifying a PRG (with PROc/ENDPROC structures and such), but they aren't mean to compile. Primary use is for SCM tools, but can be used seperately.
I'm actually using on production code with a 10 member team using concurrent modifications on forms and classes.
Some documentation is available on VFPx in English and Spanish, Internal messages are vailable on both languages and from version v1.19.24 a new translation to German is available too.
More info on VFPx site,
Best regards!

Mercurial: Pre and Post merge operations (per file)

We are using Mercurial as an SCM to handle the source script files of a program. Each project we manage has ~5000 files with each file containing a section with some product-specific informations about the file itself (version list, date, time etc.). This section is - due to the way it is structured - in 80% of the merges, the only section that has conflicts. They are easily resolved, but when merging around 300 files, it gets tiresome.
The problem is: I have no control over the way this section is written and I cannot change the format of the section itself, as it would make the file unusable by the program.
My question: is there a way in mercurial (hooks?), that allows me to
pre-process the file with a script
let mercurial do the merge
if merged correctly: post-process the file with a script. otherwise: "resolve-conflicts" as usual.
You could probably get away with it by creating a custom merge tool:
https://www.mercurial-scm.org/wiki/MergeToolConfiguration
A simple script that invokes 'diff' after removing the ever changing sections might be enough.
It sounds like those sections are the sort of nonsense that the (disrecommended) KeywordsExtension are built to handle, but I gather you don't have a lot of flexibility around them.

Conversion between docx / doc / rtf and lightweight markup

I am looking for a tool or set of tools to convert between file formats D and M where
D is a format handled by MSWord, in order of preference, docx, doc, rtf
M is a lightweight markup, such as markdown, textile, txt2tags, it can be an esoteric one
there is a way to generate html from M
conversion is two-way, it's done both from D to M, and from M to D
utf-8 encoding is handled properly
the content is simple, paragraphs, some simple formatting like bold and italics, maybe lists
the tools are platform-independent
What I've found so far
TeX, LaTeX -- too heavyweight
docx2txt -- too lightweight, it supports no formatting at all
html -- MSWord produces bloated html
a few one-way conversions, like doc to mediawiki,
UPDATE:
The use case is a document workflow between technical and non-technical people
I, the technical guy edit a document in plain text, put it into version control, etc.
I send it to my manager or other non-technical people
They add comments, make changes to it using their Word, then they send it back to me
I want to simply grok their changes, make my changes, put it into version control, without having to use Word
I think that Pandoc much more than meet all requirements.
http://pandoc.org
Adam, I've used docx4j to convert docx to html, edit the html in CKEditor, and then use docx4j to convert the html back to docx. My process made some assumptions about the css (ie it was designed to handle docx4j's clean html, and editing in CKEditor).
You don't say whether there is a way to generate M from HTML?
This is probably hard to do two-way, since you will have impedance mismatches between the various formats.
The best world I can think of would be a sort of Wiki / Word hybrid: Maybe you can get Google Wave to do that for you?
Another solution that might work is a CMS like Plone (did they ever add WYSIWIG capability? I stopped caring after version 1). Keep your documents there. Let the system handle changes, annotations etc. You can automate retrieval of the source (should be ReStructuredText) and commit that to your source control if you have to.
This script I wrote might help you in your workflow:
https://github.com/matb33/docx2md
It is a command-line PHP script that will only work with .docx files. It will extract the XML, run some XSL transformations, and provide you the result in Markdown format.
I encourage you to send me .docx files that don't convert accurately. I'd love to make this script as robust and reliable as possible.

Version control for DOCX and PDF?

I've been playing around with git and hg lately and then suddenly it occurred to me that this kind of thing will be great for documents.
I've a document which I edit in DOCX and export as PDF. I tried using both git and hg to version control it and turns out with hg you end up tracking only binary and diff-ing isn't meaningful. Although with git I can meaningfully diff DOCX (haven't tried on PDF yet) I was wondering if there is a better way to do it than I'm doing it right now. (Ideally, not having to leave Word to diff will be the best solution.)
There are two different concepts here - one is "can the version control system make some intelligent judgements about the contents of files?" - so that it can store just delta information between revisions (and do things like assign responsibility to individual parts of a file).
The other is 'do I have a file comparison tool which is useful for the types of files I have in the version control system'. Version control systems tend to come with file comparison tools which are inferior to dedicated alternatives. But they can pretty much always be linked to better diff programs - either for all file types or specific ones.
So it's common to use, for example, Beyond Compare as a general compare tool, with Word as a dedicated Word document comparer.
Different version control systems differ as to how good people perceive them to be at handling 'binaries', but that's often as much to do with handling huge files and providing exclusive locking as it is to do with file comparison.
http://tortoisehg.bitbucket.io/ includes a plugin called docdiff that integrates Word and Excel diff'ing.
You can use Beyond Compare as external diff tool for hg. Add to/change your user mercurial.ini as:
[extdiff]
cmd.vdiff = c:/path/to/BCompare.exe
Then get Beyond Compare file viewer rule for docx.
Now you should be able to compare two versions of docx in Beyond Compare.
This article outlines the solution for Docx using Pandoc
While this post outlines solution for PDF using pdf2html.
Only for docx, I compiled instructions for multiple places here: https://gist.github.com/nachocab/6429893
# download docx2txt by Sandeep Kumar
wget -O docx2txt.pl http://www.cs.indiana.edu/~kinzler/home/binp/docx2txt
# make a wrapper
echo '#!/bin/bash
docx2txt.pl $1 -' > docx2txt
chmod +x docx2txt
# make sure docx2txt.pl and docx2txt are your current PATH. Here's a guide
http://shapeshed.com/using_custom_shell_scripts_on_osx_or_linux/
mv docx2txt docx2txt.pl ~/bin/
# set .gitattributes (unfortunately I don't this can't be set by default, you have to create it for every project)
echo "*.docx diff=word" > .git/info/attributes
# add the following to ~/.gitconfig
[diff "word"]
binary = true
textconv = docx2txt
# add a new alias
[alias]
wdiff = diff --color-words
# try it
git init
# create my_file.docx, add some content
git add my_file.docx
git commit -m "Initial commit"
# change something in my_file.docx
git wdiff my_file.docx
# awesome!
It works great on OSX
If you happen to use a Mac, I wrote a git merge driver that can use Microsoft Word and tracked changes to merge and show conflicts between any file types Word can read & write.
http://github.com/jasmas/wordMerge
I say 'if you happen to use a Mac' because the driver I wrote uses AppleScript, primarily to accomplish this task.
It'd be nice to add a vbscript version to the project, but at the moment I don't have a Windows environment for testing. Anyone with some basic scripting knowledge should be able to take a look at what I'm doing and duplicate it in vbscript, powershell or whatever on Windows.
I used SVN (yes, in 2020 :-)) with TortoiseSVN on Windows. It has a built-in function to compare DOCX files (it opens Microsoft Word in a mode where your screen is divided into four parts: the file after the changes, before the changes, with changes highlighted and a list of changes). Screenshot below (sorry for the Polish version of MS Word). I also checked TortoiseGIT and it also has this functionality. I've read that TortoiseHG has it as well.