How to develop additional File Formats for BeyondCompare - diff

I see that BeyondCompare can be extended to handle additional file formats, as in Additional File Viewer Rules for Beyond Compare 2 and Additional file format downloads for version 3, but after a quick initial search I don't see how users develop these special viewers. Is that documented anywhere?
I downloaded a few additional viewers, which are handily imported via the BCFormats.bcpkg file:
C:\Program Files (x86)\Beyond Compare 3\Helpers>dir /b /s
C:\Program Files (x86)\Beyond Compare 3\Helpers\HtmlTidy
C:\Program Files (x86)\Beyond Compare 3\Helpers\PdfToText.exe
C:\Program Files (x86)\Beyond Compare 3\Helpers\XLS_to_TAB_Single.vbs
C:\Program Files (x86)\Beyond Compare 3\Helpers\HtmlTidy\HtmlTidy.exe
C:\Program Files (x86)\Beyond Compare 3\Helpers\HtmlTidy\XML_tidied_sorted.bat
C:\Program Files (x86)\Beyond Compare 3\Helpers\HtmlTidy\XML_tidied_sorted_config.txt
What is the design behind these things? Are they something like a command line tool that reads in a text file as the first argument and writes the converted file to standard output?

They are command line tools that preprocess a file before it is loaded for comparison. The first argument is an input file and the second argument is an output filename. As an example, the pdftotext.exe tool extracts a .pdf file to a plain text .txt file, then displays the temp file in Beyond Compare's Text Compare.
See Beyond Compare's help file topic Text Format Conversion Settings for details.

In another question (Compare Json Files in Beyond Compare) I walk through a step-by-step example of JSON conversion for diffs, which gives a concrete example for this question. What Chris said above is spot on: it's basically a console application that uses fixed argument positions to take the input file path and an output file path that the text representation will be written to.
$myConvertingConsoleApp $inputFilePath $outputFilePath
Beyond Compare supplies the actual arguments to the console application during the conversion process.
It's worth noting that the input file need not even be a text file so long as you can come up with some sane textual representation of the file format that makes sense for diff algorithms to operate upon.
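To make the input-path/output-path convention concrete, here is a minimal sketch of such a converter in Python. The script name json_to_text.py and the choice of JSON normalization are my own assumptions for illustration, not something shipped with Beyond Compare; the only part taken from the answers above is the argument order (input first, output second).

```python
import json
import sys


def convert(input_path: str, output_path: str) -> None:
    """Write a stable, line-oriented text form of a JSON file."""
    with open(input_path, "r", encoding="utf-8") as src:
        data = json.load(src)
    with open(output_path, "w", encoding="utf-8") as dst:
        # Sorted keys and fixed indentation keep the diff stable
        # even when the two inputs order their keys differently.
        json.dump(data, dst, indent=2, sort_keys=True)
        dst.write("\n")


# The calling tool is expected to pass both paths:
# argv[1] is the file to read, argv[2] is the file to write.
if __name__ == "__main__" and len(sys.argv) == 3:
    convert(sys.argv[1], sys.argv[2])
```

It would be invoked as something like python json_to_text.py $inputFilePath $outputFilePath, with the two paths filled in by the comparison tool.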

Related

how can i distinguish with perl if i have specified an XLSX or a ZIP file

I have a script which takes two parameters on the command line. One should be the name of a ZIP archive, the other must be an Excel (XLSX) file. Both parameters must be either relative or fully qualified paths.
Usually I use File::Type to check if a file has the expected format, but for XLSX files the answer is "application/zip". I know this is right, because XLSX files are zipped, but how can I distinguish whether it's an Excel file or whether the user made a mistake and provided the ZIP archive as the Excel file?
I also found File::LibMagic, but I can't get it running on Windows.
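Since an XLSX file is a ZIP archive, one way to tell the two apart is to look at what the archive contains. The sketch below is in Python rather than Perl, and it rests on an assumption: that a workbook written by Excel contains the entries [Content_Types].xml and xl/workbook.xml. The function name classify_archive is made up for this example. The same idea should carry over to Perl with a ZIP-reading module such as Archive::Zip.

```python
import zipfile


def classify_archive(path: str) -> str:
    """Return 'xlsx', 'zip', or 'other' based on the archive's contents."""
    if not zipfile.is_zipfile(path):
        return "other"
    with zipfile.ZipFile(path) as zf:
        names = set(zf.namelist())
    # Assumption: an Excel workbook package carries both of these entries.
    if "[Content_Types].xml" in names and "xl/workbook.xml" in names:
        return "xlsx"
    return "zip"
```

A script could then reject the arguments when the supposed Excel file classifies as "zip" or the supposed archive classifies as "xlsx".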

What is the usage of blacklist.txt in pythonforandroid (p4a)?

In the documentation of pythonforandroid, at https://python-for-android.readthedocs.io/en/latest/buildoptions/, there is a build option described called blacklist.
--blacklist: The path to a file containing blacklisted patterns that will be excluded from the final APK. Defaults to ./blacklist.txt
However, not a word can be found anywhere about how to use this file and what exactly the patterns are supposed to represent. For instance, is this used to exclude libraries, files, or directories? Do the patterns match file names or contents? What is the syntax of the patterns, or an example of a valid blacklist.txt file?
This file should contain a list of glob patterns, i.e. as implemented by fnmatch, one per line. These patterns are compared against the full filepath of each file in your source dir, probably using a global filepath but I'm not certain about that (it might be relative to the source dir).
For instance, the file could contain the following lines:
*.txt
*/test.jpg
This would prevent all files ending with .txt from being included in the apk, and all files named test.jpg in any subfolder.
If using buildozer, the android.blacklist_src buildozer.spec option can be used to point to your choice of blacklist file.
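To see how those two example patterns behave, here is a small sketch using Python's fnmatch module directly. The helper is_blacklisted and the sample paths are invented for illustration, and this is not p4a's actual code; it only shows how fnmatch-style globs match against a full file path.

```python
from fnmatch import fnmatch
from typing import Iterable

# The two patterns from the example blacklist file above.
PATTERNS = ["*.txt", "*/test.jpg"]


def is_blacklisted(path: str, patterns: Iterable[str]) -> bool:
    """True if the path matches at least one glob pattern."""
    # In fnmatch, '*' also matches path separators, so "*.txt"
    # matches a .txt file at any depth.
    return any(fnmatch(path, pattern) for pattern in patterns)


# Hypothetical source paths.
for path in ["/home/user/app/notes.txt",
             "/home/user/app/assets/test.jpg",
             "/home/user/app/main.py"]:
    print(path, is_blacklisted(path, PATTERNS))
```

Note that a bare test.jpg with no directory part would not match */test.jpg, since that pattern requires a slash before the file name.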

GNU Diff Utility report as Differ where Files are Same

I have installed diffutils-2.8.7-1.exe on a Windows XP system.
I have created an MS Office Word document with some text and an image.
Scenario 1:
Command: diff --report-identical-files "file1.doc" "file1.doc"
It gives the output as "Identical".
Action: Now I have copied and pasted file1.doc.
Scenario 2:
Command: diff --report-identical-files "file1.doc" "Copy of file1.doc"
It gives the output as "Identical".
Action: Now I have opened file1.doc and used Save As to create file2.doc,
without making any modification to the content.
Visually both files look identical.
Scenario 3:
Command: diff --report-identical-files "file1.doc" "file2.doc"
It gives the output as "Differ".
Query: Could anyone please explain how this can happen?
Does the diff utility check something beyond the content of the document?
Two .doc files can differ even if their visible contents are identical because additional metadata is saved in the file, and that metadata differs.
Unless you use a more intelligent comparison tool, you are out of luck. diff does not understand the .doc file format at all, so it compares every byte individually and cannot ignore what you consider insignificant.
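The three scenarios can be mimicked with a toy file layout. In the sketch below the "document" is just a metadata line followed by the visible text; this layout is invented for illustration and has nothing to do with the real .doc format. It only shows that a byte-for-byte comparison reports a difference as soon as any byte changes, visible or not.

```python
import filecmp
import os
import shutil
import tempfile


def write_fake_doc(path, saved_at):
    # Hypothetical layout: one metadata line, then the visible text.
    with open(path, "wb") as f:
        f.write(b"SAVED=" + saved_at + b"\n" + b"Some text\n")


def same_bytes(path_a, path_b):
    # shallow=False compares file contents rather than os.stat() signatures.
    return filecmp.cmp(path_a, path_b, shallow=False)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        file1 = os.path.join(tmp, "file1.doc")
        copy1 = os.path.join(tmp, "Copy of file1.doc")
        file2 = os.path.join(tmp, "file2.doc")
        write_fake_doc(file1, b"10:00")
        shutil.copyfile(file1, copy1)    # scenario 2: plain copy
        write_fake_doc(file2, b"10:05")  # scenario 3: re-saved, new metadata
        return same_bytes(file1, copy1), same_bytes(file1, file2)


print(demo())
```

The copy compares as identical, while the re-saved file compares as different even though the visible text is unchanged.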

Using diff3 where filenames contain a dash (-)

I'm trying to use diff3 in this way
diff3 options... mine older yours
My problem is that I probably can't use it, since all three of my file names contain a dash.
The manual mentions:
At most one of these three file names may be `-', which tells diff3 to read the standard input for that file.
so I probably have to rename filenames before running diff3.
If you know of a better solution or a workaround, please let me know. Thank you!
At most one of these three file names may be `-', which tells diff3 to read the standard input for that file.
It does not state that your filenames may not contain dash symbols. It simply says that, if you want to, you can put - in place of one of the names, in which case standard input will be read instead of that file.
So, you can have as many dashes in your filenames as you like and diff3 should work just fine.
However, on Windows, putting filenames in "" to escape space characters does not work, and I failed to find a suitable workaround. You can, however, automate the process of renaming the files (if the files are relatively small, this is not even too inefficient):
@echo off
rem Copy the three inputs to names without spaces.
copy %1 tempfile_1.txt
copy %2 tempfile_2.txt
copy %3 tempfile_3.txt
"C:\Program Files (x86)\KDiff3\bin\diff3.exe" -E tempfile_1.txt tempfile_2.txt tempfile_3.txt
rem Remove the temporary copies.
del tempfile_1.txt tempfile_2.txt tempfile_3.txt
Put this in a file like diff3.cmd, then run diff3.cmd "first file.txt" "second file.txt" "third file.txt".
P.S. Moving files would be more efficient (if they are on the same disk volume as the script, which they are not in your case), you could even move them back to where they were initially, but for some time they would not be present at their original folder.
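The same copy, run, delete sequence can be sketched in Python. The function name run_on_temp_copies is made up for this example, and I have not tested it against diff3.exe on Windows. It copies the inputs into a temporary directory under space-free names, runs the given command there, and removes the copies afterwards.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_on_temp_copies(command, files):
    """Copy files to space-free temp names, run command on them, clean up."""
    with tempfile.TemporaryDirectory() as tmp:
        names = []
        for index, original in enumerate(files, start=1):
            name = f"tempfile_{index}.txt"
            shutil.copyfile(original, Path(tmp) / name)
            names.append(name)
        # Running inside the temp directory lets the command receive bare
        # names such as tempfile_1.txt, with no spaces anywhere in them.
        return subprocess.run(list(command) + names, cwd=tmp,
                              capture_output=True, text=True)
```

A call might look like run_on_temp_copies([r"C:\Program Files (x86)\KDiff3\bin\diff3.exe", "-E"], ["first file.txt", "second file.txt", "third file.txt"]), with the merged output available in the stdout attribute of the result.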

Adding RCS Header in Binary files

I am using RCS source control and need to check in binary files (a gif image and a jar file). How do I add a $Header$ keyword so that the version information is substituted into the file during check-in and revealed when I issue the "ident" command?
For text files like Java, XML etc. we usually add the RCS header in comments and public strings, but I have no idea how to do this for binary files.
Basically, you don't.
Binary file formats don't typically have a way to have a variable-length chunk of arbitrary data. Even if there's a region of the file that can contain arbitrary data, the length of the expansion can vary from one checkout to another (e.g., if it goes from version 1.9 to 1.10), and that's likely to mess up the file.
For this to work, the binary format would have to tolerate a change in the size of the header string. For example, if the version number changes from 1.9 to 1.10, the RCS co command (which has no knowledge of the binary file format) will replace the string in-place, changing the offset of all data following the string. If the file format has a comment section, and that section's size is stored as a number, co isn't going to update that number.
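Here is a small simulation of that failure mode. The "binary format" is a toy I made up for this sketch (a 2-byte length, a comment of that length, then the payload), and the expanded header string is hypothetical; the point is only that a text substitution which knows nothing about the format leaves the stored length stale.

```python
import struct

KEYWORD = b"$Header$"
# Hypothetical expansion, modelled on the kind of string co writes.
EXPANDED = b"$Header: /repo/RCS/image.bin,v 1.10 2012/12/06 11:47:48 kst Exp $"


def build_file(comment: bytes, payload: bytes) -> bytes:
    """Toy format: 2-byte big-endian comment length, comment, payload."""
    return struct.pack(">H", len(comment)) + comment + payload


def read_payload(blob: bytes) -> bytes:
    """Skip the comment using the stored length and return the rest."""
    (length,) = struct.unpack(">H", blob[:2])
    return blob[2 + length:]


original = build_file(KEYWORD, b"PIXELS")
# A format-unaware tool replaces the text but not the length field.
expanded = original.replace(KEYWORD, EXPANDED)

print(read_payload(original))  # the payload is found where the length says
print(read_payload(expanded))  # stale length: part of the header leaks in
```

After the substitution, a reader that trusts the length field starts reading the payload in the middle of the expanded header, which is the kind of damage described above.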
Compiler-generated object and executable files often have RCS version information in them, but it's usually generated from the source file(s); objects and executables themselves typically aren't stored in a version control system.
Before the initial checkin of a binary file, you should run rcs -i -kb filename, so that the RCS co command doesn't attempt to do keyword replacement (just in case the file happens to accidentally contain something that looks like an RCS keyword).
If you have a binary file that you've checked out of an RCS system, and you want to know which version it is, you'll have to compare it to each of the versions in RCS. (My own get-versions might be useful for this.)
If you have a way of storing textual metadata in the file, you could also consider annotating your binary file with a timestamp. You can then correlate the timestamp with the revision by looking at the RCS log.
You mentioned Excel files. I just tried some experiments. The new .xlsx format is really a zip file; anything you put in the Comment section will be compressed, and not visible to ident. The older .xls format, at least for the small file I tried, does store the Comment section in readable text, so ident works -- but when I checked in a file, RCS expanded the Comment from "$Header:$" to "$Header: /home/kst/2012-12-06/RCS/foo.xls,v 1.1 2012-12-06 11:47:48-08 kst Exp kst $"; when I tried to open it with Excel, I got:
Excel found unreadable content in 'foo.xls'.
and it was unable to recover the contents.
In general you can't, but certain binary formats have an ASCII slot where RCS headers can be placed.
For example ZIP files
% zip -z archive.zip
$Header$
And then, after CVS handling:
% unzip -l archive.zip
$Header: /cygdrive/c/cvsroot/archive.zip,v 1.2 2020/10/14 13:46:06 omg Exp $
There are dozens of file extensions that are actually zip files where you can do this (odt, docx, jar, ...), but use this carefully and prefer short RCS keywords like Revision or Date, because RCS doesn't know the slot size and may corrupt the file.
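For anyone who wants to script the zip -z step, the archive comment can also be set and read with Python's zipfile module. This is a minimal sketch with made-up helper names; it only writes the unexpanded keyword into the comment slot and does not involve RCS or CVS.

```python
import zipfile


def set_archive_comment(path: str, comment: bytes) -> None:
    # Mode "a" keeps the existing members; the end-of-archive record,
    # which holds the comment, is rewritten when the file is closed.
    with zipfile.ZipFile(path, "a") as zf:
        zf.comment = comment


def get_archive_comment(path: str) -> bytes:
    with zipfile.ZipFile(path) as zf:
        return zf.comment
```

For example, set_archive_comment("archive.zip", b"$Header$") plays the role of the zip -z call above, and get_archive_comment("archive.zip") reads the slot back.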