Two closely matching files: get corresponding lines? - diff

I'm in a situation where I'm programmatically generating LaTeX code, and I want my Synctex to point to the correct lines in the original file.
The generation is basically doing template expansion, so the original files are nearly identical to the generated ones, but with some snippets expanded.
I'm wondering, is there a diff tool or library that will easily give me the line number of the original file that corresponds to a given line in the generated one? Can this be extracted from a normal Unix diff somehow?
This is part of a build script, so ideally something easy to run, like bash or python, is preferred to something that needs to be compiled.

Google’s diff-match-patch lib is a neat solution to questions like these: https://github.com/google/diff-match-patch

Related

Merge 2 pdf files and preserve forms

I'd like to merge at least 2 PDF files into one while preserving all the form elements in the original PDFs. The form elements include text fields, radio buttons, check boxes, drop down menus and others. Please have a look at this sample PDF file with forms:
http://foersom.com/net/HowTo/data/OoPdfFormExample.pdf
Now try to merge it with any other arbitrary PDF file.
Can you do it?
EDIT: As for the implementation, I'd ideally prefer a command line solution on a linux plattform using open source tools such as 'ghostscript', or any other tool that you think is appropriate to solve this task.
Of course, everybody is welcome to supply any working solution to this problem, including a coded solution that involves writing a script which makes some API calls to a pdf-processing library. However, I'd suggest to take the path of least resistance first (CMD Solution).
Best Regards
EDIT #2: Well there are indeed several CMD tools that merge PDFs. However, these tools don't seem to, AFAIK, to preserve the forms in the original PDFs! These tools appear to simply just concatenate the printouts of all those PDFs into a single Printout, which is then presented as a single PDF.
Furthermore, If you printout a PDF file with forms into a file, you lose all the forms in it. This clearly not what I'm looking for.
I have found success using pdftk, which is an open-source software that runs on linux and can be called from your terminal.
To concatenate multiple pdfs into one (and preserve form-fillable elements), you can use the following command:
pdftk input1.pdf input2.pdf cat output output-file.pdf

Cond statement doxygen does not work

I am trying to separate out internal and external documentation using the doxygen constructs of cond; but i just cant seem to get get it working. I would essentially like to exclude some files completely and not conditionally. Regardless of where i add the tag (before include, before header guards etc) , the files and source both show up.
What i have tried in vain is to take the test file from doxegen repo for
conditional test and add it to the project.
Steps to reproduce [Linux]
create a new directory.
copy paste the above file (had to rename it to .h as .c was passed over?).
generate dummy config via doxygen -g.
update Doxyfile ENABLED_SECTION = COND_ENABLED.
Run doxygen.
check html/index.html
This however is still visible in the html documentation it generates for the project. I have set the ENABLED_SECTION variable with other values , but cond_enabled function still shows up. Running the testing directory of the project (doxygen) it passes. So i am lost.
Any suggestions?
Tried with latest version 1.8.14.
Thanks!
Regarding the \cond problems (not an answer directly to the real problem you face, I think, but to long for a comment).
The mentioned file is used in the, limited, testing doxygen can do / does and the first lines contain some instructions on what to do. Furthermore there is a default Doxyfile with the tests in use. It is hard to run a separate test outside the doxygen build tree.
Regarding the remark "Running the testing directory of the project (doxygen) it passes." This is correct, here, at the moment, only testing is done against the XML output and the generated output is compared to a once created version of the XML output. No tests are done, at the moment, in respect to HTML or PDF / LaTeX. Recently the test framework has been slightly extended so in the future this should be possible (compare the xhtml and tex output, but some work has still to be done here).
The version of the parser sees the \cond in the first line (normal C comment) as a doxygen command and skips everything till the first \endcond (your friend in these cases is always doxygen -d preprocessor). I think that removing / modifying the first line will result in an already better result. There is however another hiccup for e.g. HTML output. As the function cond_enabled is not documented and EXPAND_ALL is not set to YES the function will not appear in the documentation. So best is also to add a line of documentation with the function cond_enabled.
Regarding the seen HTML problems I modified the the relevant test in doxygen slightly and pushed a proposed patch to github (pull request 714, https://github.com/doxygen/doxygen/pull/714).
Note: the problem of skipping the \cond in normal C comment is quite a bit harder to implement (seen the logical complexity of the doxygen code in pre.l and commentcnv.l.
EDIT: 2018/06/10: The push request has been integrated in the master version on github.

diff ignore certain pattern in the file

I want to make diff between two files which contains lines beginning with "line_$NR". I want to make diff between the files making abstraction of the presence of "lines_$NR" but when the differences are printed I want lines_$NR to be displayed.
It is possible to do that?
Thanks.
I believe in this case, you have to preprocess your iput files to remove /^line_[0-9]*/, diff the resulting files, then recombine the diff output with the removed words according to line numbers in diff output.
Python's difflib should be very handy here, or same from perl. If you want to stick to shell, I suppose you could get by with awk.
If you don't need exact output, perhaps you can use diff's --line-format=... directive to inject actual line number in a diff, rather than the word you removed in preprocessing step.

Programmatically change text config files in Linux with minimal effort

I am looking for a tool that would ease the modification of text configuration files for tasks like:
Set ForwardAgent yes on /etc/ssh/ssh_config
Append HGUSER to AcceptEnv in /etc/ssh/sshd_config (that's more complex as it does accept several params, if yours is not alread there it should add it)
Most important:
running it several times should have no side effects.
if something looks weird, it should complain (for example if you find the same line several times in a file, or if the expected syntax does not match).
Is there any linux tool that can easily be used to automate things like this?
The whole point is to be able to write these config patches somewhere so you can deploy them on several machines or on a new machine when needed.
I would certainly do this with bash scripting. Here is a great tutorial.
http://linuxconfig.org/Bash_scripting_Tutorial
to change a line in a file you could do something like:
check the file exists
grep for the value you want to change - error if it appears multiple times or something
use sed to change that line
to append something to a file
check if file exists
grep to ensure it hasn't been appended to already
echo whatever >> file - the double greater than appends to a file
with each of these I would make a backup copy of the file first, just in case something goes wrong
You might want to have a look at the Unified Configuration Interface (UCI) used in Embedded Linux systems. If you have the flexibility to adapt the UCI format for your config files, this is pretty similar to what you are looking for.

LaTeX command for last modified

Is there a LaTeX command that prints the "last modified" date of the actual document? Since LaTeX projects consist of more than one file this command ideally prints the date of the actual file, not that of the project.
pdfTeX provides the primitive \pdffilemoddate to query this information for files. (LuaTeX uses its own Lua functions for the same thing.) Since pdfTeX is used by default in all LaTeX distributions in the last few years (at least), there's no harm in using the new functionality unless you're dealing with very old production systems. Here's an example:
\documentclass{article}
\begin{document}
\def\parsedate #1:20#2#3#4#5#6#7#8\empty{20#2#3/#4#5/#6#7}
\def\moddate#1{\expandafter\parsedate\pdffilemoddate{#1}\empty}
this is the moddate: \moddate{\jobname.tex}
\end{document}
(Assuming the file has been modified since year 2000.)
The package filemod seems to do exactly what you need. To get the last modified date of the file you just include the package in the usual way:
\usepackage{filemod}
and the modification time of the current document is printed by:
\filemodprintdate{\jobname}
you can also print the modification time, and there are many options to format the output.
Unfortunately, TeX does not provide commands for such information; the only way to get such information is
by running a non-TeX script to create a TeX file before running LaTeX and including this file in your main LaTeX document somehow, or
by running the external script from TeX (which only works if the so-called write18 or shellescape feature is enabled; you'd have to consult the manual of your TeX implementation for this, and not have a stubborn sysadmin).
It is possible that extended TeXs do support file info commands (luaTeX perhaps?), but it's not part of TeX proper.
If you are using an automated build system, you could ask it to generate a file (perhaps named today.sty) which depends on all the source files.
In make that might look like:
today.sty: $LATEX_SRCS
echo "\date{" > $#
date +D >> $#
echo "}" >> $#
and \usepackage{today.sty}.
The will use the date of the first build after a file changes, and won't update until either you delete today.sty or alter another source file.
thank dmckee
LATEX_SRCS = test.tex
define moddate
date +%Y%m%d%H%M%S
endef
today.sty: $(LATEX_SRCS)
#echo "\def\moddate{"$(shell $(moddate))"}"> $#
There is the getfiledate LaTeX package (it was part of my LaTeX distribution by default). It seems to be designed to automatically output a paragraph like:
The date of last modification of file misc-test1.tex was 2009-10-11  21:45:50.
with a bit of ability to tweak the output. You can definitely get just the date. However, I couldn't figure out how to get rid of newlines around the date and how to change the date format. To be honest I think the authors implemented it exactly for the single purpose they needed it, and it is rather cumbersome for general use.