Added Word Doc to CVS - became corrupt - ms-word

I'm using CVSNT. I added a Microsoft 2007 docx file "as text" to the repository. After committing and before updating I tried to open the file again but was unable to. It said it was corrupt.
I tried using the office word doc recovery and that was unable to recover the document.
From what I understand I should've added the word doc as a binary file instead of text. My mistake.
Unfortunately my Word doc is still corrupt. Is there any way to get it back?

The file is lost. :( Looking at an older version of the document, the filesize is 40k. The file stored in cvs is 1k. Too much info loss to be recovered.
Note to self. Use git.

Unfortunately there is no real way. Adding the file as text tells CVS to manage its line endings, so CVS will have rewritten the \r and \n bytes in a way that cannot be reliably undone.

CVS probably attempted to change line endings. If you convert CR/LF back to LF inside the file, that may fix the problem, or at least get you close enough for the recovery tool to work.
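The conversion suggested above can be tried on a copy of the file and the result checked, since a .docx is a zip archive. Here is a minimal sketch in Python, assuming the damage was a plain LF to CR/LF expansion; the function names are made up for illustration. The reversal is lossy: any CR/LF pair that was genuinely present in the original binary data gets collapsed too, so this may well fail. It also cannot help when the stored file has shrunk from 40k to 1k, as in the case above.

```python
import io
import zipfile
import zlib


def undo_crlf(data: bytes) -> bytes:
    """Collapse every CR/LF pair to a bare LF (lossy for binary data)."""
    return data.replace(b"\r\n", b"\n")


def is_intact_zip(data: bytes) -> bool:
    """Return True if data opens as a zip and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            # testzip() returns the name of the first bad member, or None.
            return archive.testzip() is None
    except (zipfile.BadZipFile, zlib.error, EOFError, ValueError):
        return False
```

To use it, read the damaged file in binary mode, pass the bytes through undo_crlf, and only write the result out (under a new name) if is_intact_zip returns True.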

Related

Edmx update model add blank lines from autogeneration

I have an annoying problem and can't seem to figure out what's causing it. On my machine, when I use Update Model from Database... on an Edmx file (EF Database First approach), the autogenerated model has blank lines between properties. This doesn't seem to occur on other developers' machines even though we have the same versions of VS, extensions etc.
The problem is that even when I add, for example, one new table, the refresh automatically adds blank lines for all mapped tables. Later all of this shows up as conflicts during merge operations in Git.
I would really appreciate any help since I didn't find a single shred of information on this issue anywhere and it really disrupts work.
I checked the files (Model.tt on my machine and my friend's) using the Notepad++ compare plugin and it said there are no differences but the encoding is different. When I copied Model.tt over manually and did the update, the blank lines were gone.... Must be some kind of quirk.
Posting as an answer since I wasted a few hours on this and someone might have a similar problem.
What worked for me
💡Turns out it was how my OS was ending lines
I'm working on Windows. I had earlier disabled automatic CR+LF line-ending conversion in my global Git configuration, so I re-enabled it:
git config --global core.autocrlf true
FYI: 'nix/Mac end lines with LF only, Windows ends lines with CR + LF
Opened up *DataModel.tt and *DataModel.Context.tt in Notepad++
Edit > EOL Conversions > Windows (CR LF) > Save
Refresh EDMX
Looking for a better terminal-based solution; sounds like dos2unix will come into play at some point. Will amend this as soon as I've ironed it out.
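As one possible terminal-based alternative to the Notepad++ step, the templates could be normalized to CR+LF with a short script. This is a sketch in Python, assuming the files are in an ASCII-compatible encoding such as UTF-8 (a UTF-16 file would need decoding first); the function names are made up for illustration.

```python
from pathlib import Path


def to_crlf(data: bytes) -> bytes:
    """Normalize line endings to CR+LF without doubling existing ones."""
    # Collapse CR+LF to LF first so existing Windows endings
    # are not turned into CR+CR+LF by the second replace.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def convert_file(path: str) -> None:
    """Rewrite the file at path in place with CR+LF line endings."""
    target = Path(path)
    target.write_bytes(to_crlf(target.read_bytes()))
```

Calling convert_file on each .tt file before refreshing the EDMX would stand in for the Edit > EOL Conversions step.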

Huge docx filled with <w:p> tags

My girlfriend is writing a Word document for homework. She's using the old .doc format as required by her teacher ( :'( ).
At some point, the .doc file went from 150 kB to 2.6 MB with no noticeable change (seen in the Dropbox history; sadly, Word's comparison function fails because Word crashes). From that point on, she was unable to save her document without crashing Word...
I converted the .doc to .docx, unzipped it, and found an 18 MB document.xml file!
I can't even format the XML properly because it crashes Notepad++, but I can see that the file is filled with the same XML tag repeated over and over:
<w:p w:rsidR="002A70E5" w:rsidRDefault="002A70E5" w:rsidP="00565ED9"/>
Do you have any idea what could cause this?
EDIT: Here's the docx
EDIT2: The motivation for this question is more curiosity than looking for a fix. Thanks for your answers though.
If you're willing to edit the XML directly, you can just delete all the empty <w:p> tags and rezip.
If you're good with Python, you might give python-docx a try and use it to delete all empty paragraphs.
Hopefully that will at least recover the work she's done so far.
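The delete-and-rezip approach can be sketched with just the Python standard library. This is a hedged sketch rather than a tested recovery tool: the function names are made up, it assumes document.xml is UTF-8, and it is deliberately more conservative than deleting every empty paragraph, keeping the first tag of each run so that intentional blank lines (and table cells, which need at least one paragraph) survive.

```python
import re
import zipfile

# A self-closing paragraph such as <w:p/> or <w:p w:rsidR="..."/>.
# Requiring whitespace after "w:p" keeps tags like <w:pPr/> from matching.
_EMPTY_P = r"<w:p(?:\s[^<>]*)?/>"
_RUN_OF_EMPTY_P = re.compile(rf"({_EMPTY_P})(?:\s*{_EMPTY_P})+")


def collapse_empty_paragraphs(xml: str) -> str:
    """Replace each run of consecutive empty paragraphs with its first one."""
    return _RUN_OF_EMPTY_P.sub(r"\1", xml)


def clean_docx(src_path: str, dst_path: str) -> None:
    """Copy a .docx to dst_path, rewriting only word/document.xml."""
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                text = data.decode("utf-8")  # assumption: UTF-8 encoded
                data = collapse_empty_paragraphs(text).encode("utf-8")
            dst.writestr(item, data)
```

Always write to a new file (for example clean_docx("broken.docx", "cleaned.docx"), names hypothetical) so the original stays available if Word rejects the result.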
Not sure how this would happen, or whether it matters much. Only thing I can think of is a sticking Return key on the keyboard that would insert a huge number of carriage returns. Those each insert a new paragraph. I've actually had that happen occasionally on a Windows virtual machine running on my Mac. No clue why it does it though.
The tag you are talking about is part of the OpenXML format used for building Word documents. OpenXML stores the document as a zipped file, and what you are seeing is the unzipped document.xml file. If you want to keep working with the document, just convert the .doc file to .docx. Don't unzip it.

Prevent Word 2010 from saving o:gfxdata base64 or uuencoded VML?

I am working with .docx files containing several drawing canvases with images inserted and some lines and arrows drawn in Word 2010. I am using 2010 format with no compatibility mode.
Word inserts an o:gfxdata attribute into each v:shape and v:group element and fills it with some kind of ASCII-encoded data. From what I have read it may be a copy of the VML describing the v:shape or v:group. I don't know if I just don't know what to look for, but I cannot determine what this data is for, as its removal has no apparent effect on my ability to read or edit the document in Word 2003, 2007, or 2010.
It does swell the document.xml to almost twice the (apparent) necessary size. This considerably slows OpenTBS' processing so I would like to remove it, if possible. Does anyone know of a way to tell Word 2010 to quit saving this extra data? Or what it is for? I have really struggled to find any documentation on it beyond this post.
Edit:
Here is a sample .docx. The document.xml is ~141KB and OpenTBS takes an average of 10.35 seconds to create a file that includes this as a subtemplate 21 times. If I remove all of the o:gfxdata attributes, the file size is reduced to ~37KB and OpenTBS takes only 2.99 seconds to produce the same file.
Edit 2:
After further investigation, it appears that removing the o:gfxdata may cause Word 2003 with an older Compatibility Pack installed to object to the file with the following error:
"This is a pre-release version of the Compatibility Pack and can open
pre-release Office 2007 files only. Do you want to check for a newer
version of the Compatibility Pack?"
I have been able to open the file by installing a newer compatibility pack - though it prompts the user about the incompatibility and converts the file in order to open it. This does not damage my file, but it is something to look out for.
The o:gfxdata attribute is poorly documented on the web.
According to your investigations, it's some kind of compatibility extra information.
You can delete those attributes in your template using OpenTBS.
The cleaning can be done once on your template without any merging, and then save the cleaned template as a new template. Or you can perform the cleaning each time you open the template.
Cleaning the DOCX file:
while ($x = clsTbsXmlLoc::FindStartTagHavingAtt($TBS->Source, 'o:gfxdata', 0) ) {
    $x->ReplaceAtt('o:gfxdata', '');
    $TBS->Source = str_replace(' o:gfxdata=""', '', $TBS->Source);
}
Note that the class clsTbsXmlLoc is provided with OpenTBS and is undocumented.
The code should work with OpenTBS 1.8.0 and later (1.8.0 is currently in stable beta).
I've noticed that once the o:gfxdata attributes are deleted, they do not come back immediately when you edit the docx.
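Outside OpenTBS, the same cleanup can be sketched as a plain text substitution on the unzipped document.xml. This Python sketch is not part of OpenTBS; it assumes the attribute value is double-quoted, and the function name is made up for illustration.

```python
import re

# Matches the attribute together with the whitespace in front of it,
# so no stray space is left behind in the start tag.
_GFXDATA = re.compile(r'\s+o:gfxdata="[^"]*"')


def strip_gfxdata(xml: str) -> str:
    """Remove every o:gfxdata attribute from the given XML text."""
    return _GFXDATA.sub("", xml)
```

Comparing len(xml) before and after gives a quick check of how much of the file the attribute was taking up.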

CVS keeps adding code at the end of the file I want to commit

I have trouble with 4 files in my CVS project. Each time I commit one of those files, CVS keeps adding the same line of code at the end of it. This line of code is a repeated line of the current file (but not the last line of it).
I've tried several things: update, deleting lines and committing, deleting all lines and committing, adding lines and committing, adding a header and committing. But I always get the same line of code added to the end of my file. I could delete all the files and recreate them, but I would lose all my history data.
I find it odd that CVS modifies my file when I commit. Isn't that counterproductive, since it may introduce errors into otherwise valid code?
I should add that my file is a .strings file (text, Unicode). I'm working on a branch, but recently merged it into the trunk.
More Details:
I'm using TortoiseSVN on a virtual Windows machine, which has access to my Documents folder of Mac OS X via a Network Drive between those two.
It turns out that my colleague, who has the same project but in a real Windows folder, could commit without any problem.
And now that he has done that, the problem is solved for me too.
But I have no idea what happened. My only clue would be a hidden character from Mac OS X that breaks TortoiseSVN. Is that possible?
I haven't experienced this issue with CVS, but note that you mention that the file you are editing is Unicode text (you don't mention if this means UTF8 or UTF16, but either can cause issues).
Depending on how your CVS server was built, and how (and on what platform) it is being run, it is quite possible that the server is not Unicode-aware. This can cause a whole range of issues, including expanding RCS-style $ tags in places where one byte of an encoded character happens to equal ASCII '$'. (That can occur in UTF-16; in UTF-8, the bytes of a multi-byte character are never in the ASCII range.)
The workaround for this is to mark Unicode source files as binary objects. From the command line, this can be done using
cvs add -kb file-name
when adding a new file, or
cvs admin -kb file-name
for an existing file (replace file-name with the name of your file).
In the latter case, I'd recommend removing the (local copy of the) file and running 'cvs update' to get it back after changing the type.
Note that doing this is unlikely to help with changes you're already seeing in the file, so make sure to check the file, and fix any existing problem after making this change.
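To make the '$' byte issue concrete, here is a small Python sketch. It shows that a character with no dollar sign in it can still contain the byte 0x24 when encoded as UTF-16, which a tool that is not Unicode-aware would read as '$'; UTF-8 multi-byte sequences never contain that byte. The character U+0124 is just an illustrative choice, and the function name is made up.

```python
DOLLAR = ord("$")  # 0x24


def has_dollar_byte(text: str, encoding: str) -> bool:
    """Return True if the encoded form of text contains the byte 0x24 ('$')."""
    return DOLLAR in text.encode(encoding)


sample = "\u0124"  # LATIN CAPITAL LETTER H WITH CIRCUMFLEX, no literal '$'
print(has_dollar_byte(sample, "utf-16-le"))  # encoded as bytes 0x24 0x01
print(has_dollar_byte(sample, "utf-8"))      # encoded as bytes 0xC4 0xA4
```

A stray 0x24 byte alone is not a full keyword, but it illustrates why treating UTF-16 text as plain bytes is risky and why the -kb flag avoids the problem.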

CVS keyword substitution and Microsoft Word file

CVS has the keyword substitution feature: in a text file you write $Header$ and, when you commit the file, CVS substitutes $Header$ with something like $Header: /repo/src.cpp,v 1.6 2009/03/12 14:53:14 luser Exp $
Is it possible to get the same feature when dealing with a binary Microsoft Word file?
Thank you.
The basic problem you have with a Word file is that it is effectively a binary file (as opposed to a plain-text file), so you cannot be sure a key string like "$Header$" doesn't appear somewhere (VB macro code, for example) by accident. CVS would expand that key string, and suddenly something apparently unrelated (VB macro code, for example...) stops working.
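A rough imitation of keyword expansion shows the effect on binary data. This Python sketch is not CVS's actual code; the function name and the regular expression are assumptions made for illustration. It rewrites a key string wherever it appears in a byte stream, which changes the length of the data, something a binary format with internal offsets is unlikely to survive.

```python
import re


def expand_keyword(data: bytes, keyword: bytes, value: bytes) -> bytes:
    """Rewrite $Keyword$ or $Keyword: old $ as $Keyword: value $."""
    pattern = re.compile(rb"\$" + re.escape(keyword) + rb"(?::[^$\n]*)?\$")
    replacement = b"$" + keyword + b": " + value + b" $"
    # A function replacement avoids backslash processing in the value.
    return pattern.sub(lambda _match: replacement, data)


before = b"\x00\x01$Header$\x02"
after = expand_keyword(
    before, b"Header", b"/repo/src.cpp,v 1.6 2009/03/12 14:53:14 luser Exp"
)
print(len(before), len(after))  # the expanded data is longer than the input
```

In a plain-text source file the extra bytes are harmless; in a binary file they shift everything that follows.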
Using CVS? Not likely. Even if $Header$ doesn't appear anywhere in your Word document (as DevSolar suggested it might), where do you place that string? Word stores text in its proprietary binary format, but CVS looks for plain text.
On the other hand, I'm sure you can achieve the effect by using either an XML Word format, or a Word macro.
Seems almost impossible with the traditional .doc format. Some creative work might allow you to create a process for making it happen with the newer XML format. I'm not sure CVS can do the job even then, but using a post-commit hook in Subversion might make it more reasonable to pull off.