GNU diff reports "Differ" for files that look the same - windows-xp

I have installed diffutils-2.8.7-1.exe on a Windows XP system.
I created an MS Office Word document containing some text and an image.
Scenario 1:
Command: diff --report-identical-files "file1.doc" "file1.doc"
It gives the output as "Identical".
Action: I then made a copy of file1.doc by copy and paste.
Scenario 2:
Command: diff --report-identical-files "file1.doc" "Copy of file1.doc"
It gives the output as "Identical".
Action: I then opened file1.doc and used Save As to write file2.doc, without modifying the content.
Visually, both files look identical.
Scenario 3:
Command: diff --report-identical-files "file1.doc" "file2.doc"
It gives the output as "Differ".
Query: Could anyone please explain how this can happen?
Does the diff utility check something beyond the content of the document?

Two .doc files can differ even when their visible contents are identical because the file also stores additional metadata, and that metadata differs between the two saves.
Unless you use a comparison tool that understands the format, you are out of luck. diff knows nothing about the .doc file format, so it compares the files byte by byte and cannot ignore differences you consider insignificant.
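To see what a byte-level comparison notices, here is a minimal Python sketch that reports the offset of the first differing byte. The function name first_difference and the chunk size are illustrative assumptions, not part of diffutils; it only mimics the kind of check diff performs on files it treats as binary.

```python
def first_difference(path_a, path_b, chunk_size=65536):
    """Return the offset of the first differing byte, or None if identical."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                # Walk the mismatching chunk to find the exact position.
                for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
                    if x != y:
                        return offset + i
                # No mismatch in the shared part: one file is a prefix of the other.
                return offset + min(len(chunk_a), len(chunk_b))
            if not chunk_a:
                return None  # both files ended without a mismatch
            offset += len(chunk_a)
```

Calling first_difference("file1.doc", "file2.doc") on the files from Scenario 3 would point at the first byte that changed on Save As, even though the visible text is the same.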

Related

Command prompt for merging word documents

I have two Word documents and I need to merge them into one Word document using the command prompt. The command copy *.extension newfile.extension works with .txt or .csv files, but if I do that with .docx files the result is a corrupted Word file.
You won't be able to achieve what you want with the copy command, because merging two Word documents is more subtle than a byte-wise append (which is what the copy command does).
Google gives quite a few results when searching for 'merge word documents'. One of the results might point you to a tool that you can invoke from the command line.
Hope this helps.
I have need of something similar, and this was the best I could find: https://github.com/jamessantiago/DocxMerge
I haven't tried it yet, but from all I can tell there is no command line option for combining Word documents.
(Good to know: The .docx file is actually a zip file. Rename it to .zip and unzip it to view what's inside. Combining two documents is tricky, but obviously possible, from command line. But there is no built-in command to do it.)
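As a small illustration of the "it is a zip file" point, this Python sketch lists the parts stored inside a .docx using the standard zipfile module. The function name list_docx_parts is an assumption made for the example, and the sketch only inspects a document; it does not merge anything.

```python
import zipfile


def list_docx_parts(path):
    """Return the names of the files stored inside a .docx container."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()
```

Because each document is an archive with its own internal directory, appending the bytes of one .docx to another does not produce a valid combined document.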

How to develop additional File Formats for BeyondCompare

I see that BeyondCompare can be extended to include additional file formats, as in Additional File Viewer Rules for Beyond Compare 2 and Additional file format downloads for version 3, but after a quick initial search I don't see how users develop these special viewers. Is that documented anywhere?
I downloaded a few additional viewers, which are handily imported via the BCFormats.bcpkg file:
C:\Program Files (x86)\Beyond Compare 3\Helpers>dir /b /s
C:\Program Files (x86)\Beyond Compare 3\Helpers\HtmlTidy
C:\Program Files (x86)\Beyond Compare 3\Helpers\PdfToText.exe
C:\Program Files (x86)\Beyond Compare 3\Helpers\XLS_to_TAB_Single.vbs
C:\Program Files (x86)\Beyond Compare 3\Helpers\HtmlTidy\HtmlTidy.exe
C:\Program Files (x86)\Beyond Compare 3\Helpers\HtmlTidy\XML_tidied_sorted.bat
C:\Program Files (x86)\Beyond Compare 3\Helpers\HtmlTidy\XML_tidied_sorted_config.txt
What is the design of these things? Are they something like a command line tool that reads a file as the first argument and writes the converted file to standard output?
They are command line tools that preprocess a file before it is loaded for comparison. The first argument is an input file and the second argument is an output filename. As an example, the pdftotext.exe tool converts a .pdf file to a plain text .txt file, and Beyond Compare then displays that temp file in its Text Compare.
See Beyond Compare's help file topic Text Format Conversion Settings for details.
In another question (Compare Json Files in Beyond Compare) I walk through a step-by-step example of JSON conversion for diffs, which gives a concrete example for this question. What Chris said above is spot on: it's basically a console application that uses fixed argument positions to take the input file path and an output file path that the text representation will be written to.
$myConvertingConsoleApp $inputFilePath $outputFilePath
Beyond Compare supplies the actual arguments to the console application during the conversion process.
It's worth noting that the input file need not even be a text file so long as you can come up with some sane textual representation of the file format that makes sense for diff algorithms to operate upon.
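A converter of this kind can be sketched in a few lines of Python. This is only an illustration of the input-path/output-path convention described above, using JSON as the format; the function name convert and the choice of sorted keys are assumptions, and how the script is registered in Beyond Compare is covered by the help topic mentioned earlier.

```python
import json
import sys


def convert(input_path, output_path):
    """Write a stable, line-oriented text form of a JSON file."""
    with open(input_path, "r", encoding="utf-8") as src:
        data = json.load(src)
    with open(output_path, "w", encoding="utf-8") as dst:
        # Sorted keys and one value per line keep the diff small and stable.
        json.dump(data, dst, indent=2, sort_keys=True, ensure_ascii=False)
        dst.write("\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Argument order follows the convention above:
    # first the input file, then the output file.
    convert(sys.argv[1], sys.argv[2])
```

Two JSON files with the same data but different key order then convert to identical text, so a text diff shows only real changes.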

Using diff3 where filenames contain a dash (-)

I'm trying to use diff3 in this way
diff3 options... mine older yours
My problem is that I probably can't use it, since the names of all 3 of my files contain a dash.
The manual mentions:
At most one of these three file names may be `-', which tells diff3 to read the standard input for that file.
so I probably have to rename the files before running diff3.
If you know of a better solution or a workaround, please let me know. Thank you!
At most one of these three file names may be `-', which tells diff3 to read the standard input for that file.
It does not state that your filenames must not contain dash symbols. It simply says that, if you want to, you can put - in place of one of the names, in which case standard input is read instead of that file.
So, you can have as many dashes in your filenames as you like and diff3 should work just fine.
However, on Windows, putting filenames in "" to escape space characters does not work, and I failed to find a suitable workaround. You can, however, automate the renaming of the files (if the files are relatively small, this would not even be too inefficient):
@echo off
rem Copy the inputs to names without spaces so that diff3 can open them.
copy %1 tempfile_1.txt
copy %2 tempfile_2.txt
copy %3 tempfile_3.txt
"C:\Program Files (x86)\KDiff3\bin\diff3.exe" -E tempfile_1.txt tempfile_2.txt tempfile_3.txt
rem Remove the temporary copies.
del tempfile_1.txt tempfile_2.txt tempfile_3.txt
Put this in a file like diff3.cmd, then run diff3.cmd "first file.txt" "second file.txt" "third file.txt".
P.S. Moving the files would be more efficient (if they are on the same disk volume as the script, which they are not in your case). You could even move them back to where they were initially, but for some time they would be missing from their original folder.
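The same copy-to-safe-names idea can be sketched in Python, which avoids leaving temporary files next to the script. The function name stage_for_diff3 is hypothetical, and the default executable name "diff3" is an assumption; pass the full path to your own diff3.exe if it is not on PATH. The sketch only stages the files and builds the command; running it is left to the caller.

```python
import os
import shutil
import tempfile


def stage_for_diff3(mine, older, yours, diff3="diff3"):
    """Copy three files to space-free names and build a diff3 command."""
    workdir = tempfile.mkdtemp(prefix="diff3_")
    names = []
    for index, source in enumerate((mine, older, yours), start=1):
        name = "tempfile_%d.txt" % index
        shutil.copyfile(source, os.path.join(workdir, name))
        names.append(name)
    # Bare file names are returned so the command can be run with
    # cwd=workdir, keeping spaces out of every diff3 argument.
    return workdir, [diff3, "-E"] + names
```

A caller would then run subprocess.run(command, cwd=workdir) and remove the directory afterwards with shutil.rmtree(workdir).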

SAS- Reading multiple compressed data files

I hope you are all well.
My question is about the procedure for opening multiple raw data files that are compressed.
My file names are ordered, so I have for example: o_equities_20080528.tas.zip o_equities_20080529.tas.zip o_equities_20080530.tas.zip ...
Thank you all in advance.
How much work this will be depends on whether:
You have enough space to extract all the files simultaneously into one folder
You need to be able to keep track of which file each record has come from (i.e. you can't tell just from looking at a particular record).
If you have enough space to extract everything and you don't need to track which records came from which file, then the simplest option is to use a wildcard infile statement, allowing you to import the records from all of your files in one data step:
infile "c:\yourdir\o_equities_*.tas" <other infile options as per individual files>;
This syntax works regardless of OS - it's a SAS feature, not shell expansion.
If you have enough space to extract everything in advance but you need to keep track of which records came from each file, then please refer to this page for an example of how to do this using the filevar option on the infile statement:
http://www.ats.ucla.edu/stat/sas/faq/multi_file_read.htm
If you don't have enough space to extract everything in advance, but you have access to 7-zip or another archive utility, and you don't need to keep track of which records came from each file, you can use a pipe filename and extract to standard output. If you're on a Linux platform then this is very simple, as you can take advantage of shell expansion:
filename cmd pipe "nice -n 19 gunzip -c /yourdir/o_equities_*.tas.zip";
infile cmd <other infile options as per individual files>;
On Windows it's the same sort of idea, but as you can't use shell expansion, you have to construct a separate filename for each zip file, or use some of 7-Zip's more arcane command-line options, e.g.:
filename cmd pipe "7z.exe e -an -ai!C:\yourdir\o_equities_*.tas.zip -so -y";
This will extract all files from all of the matching archives to standard output. You can narrow this down further via the 7-zip command if necessary. You will have multiple header lines mixed in with the data - you can use findstr to filter these out in the pipe before SAS sees them, or you can just choose to tolerate the odd error message here and there.
Here, the -an tells 7-zip not to read the zip file name from the command line, and the -ai tells it to expand the wildcard.
If you need to keep track of what came from where and you can't extract everything at once, your best bet (as far as I know) is to write a macro that processes one file at a time using the above techniques, adding this information while you import each dataset.
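For comparison, the one-archive-at-a-time approach with source tracking can be sketched outside SAS. This Python example is not a SAS macro; it only illustrates the idea of streaming each archive without extracting it to disk and tagging every record with the archive it came from. The function name read_records and the UTF-8 encoding are assumptions.

```python
import glob
import io
import os
import zipfile


def read_records(pattern):
    """Yield (archive_name, line) for every line in every matching zip."""
    for archive_path in sorted(glob.glob(pattern)):
        source = os.path.basename(archive_path)
        with zipfile.ZipFile(archive_path) as archive:
            for member in archive.namelist():
                if member.endswith("/"):
                    continue  # skip directory entries
                # Stream the member instead of extracting it to disk.
                with archive.open(member) as raw:
                    for line in io.TextIOWrapper(raw, encoding="utf-8"):
                        yield source, line.rstrip("\n")
```

Iterating over read_records(r"c:\yourdir\o_equities_*.tas.zip") would give each record together with the name of its archive, in file name order.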

Getting more out of *.diff files

I wonder if there are tools to show the *.diff files used for patching in Debian packaging. What I need from the tool is that it reads the diff file and shows the actual files changed with the changed rows, like kdiff or meld would do when directly comparing 2 different files. Or maybe my approach is totally wrong, and I should instead ask: how can I get more out of diff files?
Kompare is able to open a .diff. It shows you the changed files at the top, a list of changes for the selected file, and a side-by-side diff (for the lines that it is able to extract from the .diff).
However, when I fed it a debdiff, it got confused. The diff did not have === file headers, only --- and +++ headers, and so it included the changes from /debian/changelog, /debian/copyright, and /debian/rules within the /debian/control file. YMMV.
Screenshot: http://imagebin.ca/view/fNWEzx.html
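Since such a diff still carries --- and +++ headers, the list of changed files can be recovered from those lines alone. The following Python sketch is a naive illustration, not how Kompare works: the function name files_in_diff is an assumption, and a removed line that itself starts with "-- " followed by an added line starting with "++ " would be mistaken for a header.

```python
def files_in_diff(diff_text):
    """Return (old_path, new_path) pairs from ---/+++ header lines."""
    pairs = []
    lines = diff_text.splitlines()
    for current, following in zip(lines, lines[1:]):
        # A file header is a '--- ' line immediately followed by a '+++ ' line.
        if current.startswith("--- ") and following.startswith("+++ "):
            # Drop the optional timestamp that follows a tab.
            old_path = current[4:].split("\t")[0]
            new_path = following[4:].split("\t")[0]
            pairs.append((old_path, new_path))
    return pairs
```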
The Debian diff format seems to be a special diff format. My short Google search didn't turn up a graphical tool that can handle these files the way normal diff tools do, so I'm not sure such a tool exists. Perhaps you could try to convert these debdiff files to normal diff files (I didn't find a tool that would do that, either).
There is a tool to visualize changes in Linux packages (Deb, RPM, TAR.GZ, etc.) - pkgdiff.
Usage:
pkgdiff -old OLD.deb -new NEW.deb
Sample reports:
http://lvc.github.com/pkgdiff/pkgdiff_reports/libqb/0.4.1_to_0.8.1/changes_report.html
http://lvc.github.com/pkgdiff/pkgdiff_reports/gstreamer/0.10.23-i486-1_to_0.10.32-i486-1/changes_report.html