How to convert XCF to PNG using GIMP from the command-line? - image-manipulation

As part of my build process I need to convert a number of XCF (GIMP's native format) images into PNG format. I'm sure this should be possible using GIMP's batch mode, but I have forgotten all of the script-fu I used to know.
My input images have multiple layers, so I need the batch mode equivalent of "merge visible layers" followed by "save as PNG". Also note that I can't install anything in ~/.gimp*/scripts/ — I need a self-contained command-line, or a way to install scripts in my source tree.
Note that while this is similar to this question, I have the additional constraint that I need this to be done using GIMP. I tried the current version of ImageMagick and it mangles my test images.

Before jsbueno posted his answer I had also tried asking on the #gimp IRC channel. I was directed to this thread on Gimptalk which contains the following code:
```sh
gimp -n -i -b - <<EOF
(let* ( (files (cadr (file-glob "*.xcf" 1))) (filename "") (image 0) (layer 0) )
  (while (pair? files)
    (set! image (car (gimp-file-load RUN-NONINTERACTIVE (car files) (car files))))
    (set! layer (car (gimp-image-merge-visible-layers image CLIP-TO-IMAGE)))
    (set! filename (string-append (substring (car files) 0 (- (string-length (car files)) 4)) ".png"))
    (gimp-file-save RUN-NONINTERACTIVE image layer filename filename)
    (gimp-image-delete image)
    (set! files (cdr files))
  )
  (gimp-quit 0)
)
EOF
```
This Script-Fu globs for XCF files and then, for each file, loads it, merges the visible layers, saves the result as a PNG, and "unloads" the image. Finally, it quits GIMP. The glob approach is used to avoid starting GIMP once per image. It also sidesteps the issue of getting parameters from the shell into GIMP.
I'm posting this answer just in case someone needs a way to do this without the use of GIMP-Python (perhaps because it isn't installed).
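If you drive this from a build script rather than a heredoc, one possible sketch is to build the GIMP argument list in Python and pass the glob pattern in as an escaped Scheme string. This is a sketch under assumptions: the Script-Fu body is the same loop as above, `-n`, `-i` and `-b` are GIMP's new-instance, no-interface and batch options, a second `-b '(gimp-quit 0)'` ends the run, and `script_fu_batch_args` is a hypothetical helper name.

```python
def script_fu_batch_args(pattern="*.xcf"):
    """Build a GIMP command line that converts every file matching `pattern` to PNG.

    Hypothetical helper: embeds the Script-Fu loop shown above and passes the
    glob pattern in as an escaped Scheme string literal.
    """
    # Escape backslashes and double quotes so the pattern is a valid Scheme string.
    scheme_pattern = '"' + pattern.replace("\\", "\\\\").replace('"', '\\"') + '"'
    script = (
        "(let* ((files (cadr (file-glob " + scheme_pattern + " 1)))"
        " (filename \"\") (image 0) (layer 0))"
        " (while (pair? files)"
        " (set! image (car (gimp-file-load RUN-NONINTERACTIVE (car files) (car files))))"
        " (set! layer (car (gimp-image-merge-visible-layers image CLIP-TO-IMAGE)))"
        " (set! filename (string-append (substring (car files) 0"
        " (- (string-length (car files)) 4)) \".png\"))"
        " (gimp-file-save RUN-NONINTERACTIVE image layer filename filename)"
        " (gimp-image-delete image)"
        " (set! files (cdr files))))"
    )
    # One GIMP process for all files; the second -b quits GIMP when the loop is done.
    return ["gimp", "-n", "-i", "-b", script, "-b", "(gimp-quit 0)"]
```

You would then run it with something like `subprocess.run(script_fu_batch_args("art/*.xcf"), check=True)`; passing a list avoids any shell quoting of the Scheme code.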

There are a few ways to go about this. My preferred method is always developing a GIMP-Python plug-in. Script-Fu uses GIMP's built-in Scheme implementation, which is extremely poor at handling I/O (such as listing files in a directory), a task that is absolutely trivial in Python.
You said you don't want to save new scripts. You could, though: under Edit > Preferences > Folders you can add new plug-in and script directories, so you don't need to write to ~/.gimp*/plugins/.
Anyway, you can open the Python console under Filters and paste this snippet at the interactive prompt:
```python
import gimpfu

def convert(filename):
    img = pdb.gimp_file_load(filename, filename)
    new_name = filename.rsplit(".",1)[0] + ".png"
    layer = pdb.gimp_image_merge_visible_layers(img, gimpfu.CLIP_TO_IMAGE)
    pdb.gimp_file_save(img, layer, new_name, new_name)
    pdb.gimp_image_delete(img)
```
This small function will open any image, passed as a file path, merge its visible layers, and save it as a PNG with the default settings, in the same directory.
After defining the function, type the following at the interactive prompt to convert all .xcf files in the current directory to .png:
```python
from glob import glob
for filename in glob("*.xcf"):
    convert(filename)
```
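If you later move `convert` into a script, note that `rsplit(".", 1)` looks at the whole path: for an extension-less file inside a dotted directory, such as `build.v2/logo`, it yields `build.png`. A small alternative (a suggestion, not the answer's method) is `os.path.splitext`, which only splits the last path component and is also available in the Python 2 interpreter that GIMP 2.x embeds. `png_name` is a hypothetical helper name.

```python
import os

def png_name(xcf_path):
    """Return `xcf_path` with its final extension replaced by .png."""
    # splitext only looks at the last path component, so dots in directory names are safe.
    root, _ext = os.path.splitext(xcf_path)
    return root + ".png"
```

Inside `convert`, `new_name = png_name(filename)` would replace the `rsplit` line.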
(If you don't have gimp-python installed, it may be a separate package on your Linux distribution; just install it using your favorite tool. For those on Windows with gimp-2.6, the instructions on this page have to be followed; it should become easier with gimp 2.8.)
Another way altogether applies if your images are sequentially numbered in their filenames, such as myimage_001.xcf, myimage_002.xcf and so on. If they are arranged like this, you could install GIMP-GAP (the GIMP Animation Package), which lets you apply any filter or action to such an image sequence.

Here is my solution utilizing GIMP, GIMP's python-fu and bash. Note that GIMP's python-fu can only be run under a gimp process, not under plain python.
```bash
#!/bin/bash
set -e
# Parse -d (debug tracing) and -f <xcfFile> from the script's arguments.
while getopts df: OPTION "$@"; do
    case $OPTION in
        d)
            set -x
            ;;
        f)
            XCFFILE=${OPTARG}
            ;;
    esac
done
if [[ -z "$XCFFILE" ]]; then
    echo "usage: $(basename "$0") [-d] -f <xcfFile>"
    exit 1
fi
# Start gimp with python-fu batch-interpreter
gimp -i --batch-interpreter=python-fu-eval -b - << EOF
import gimpfu
def convert(filename):
    img = pdb.gimp_file_load(filename, filename)
    new_name = filename.rsplit(".",1)[0] + ".png"
    layer = pdb.gimp_image_merge_visible_layers(img, 1)  # 1 == CLIP_TO_IMAGE
    pdb.gimp_file_save(img, layer, new_name, new_name)
    pdb.gimp_image_delete(img)
convert('${XCFFILE}')
pdb.gimp_quit(1)
EOF
```
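The heredoc above splices the file name into Python source as `'${XCFFILE}'`, which breaks if a name contains a quote, and it starts GIMP once per file. A hedged alternative, assuming a Python 3 build script and the same GIMP 2.x Python-Fu calls, is to generate the batch code with `repr()` and convert a whole list in one GIMP process, feeding the code on stdin just as `-b -` does above. The function names are hypothetical, and because the generated code runs inside GIMP's own (Python 2) interpreter, ASCII file names are the safe case.

```python
def python_fu_batch_code(xcf_files):
    """Return Python-Fu source that converts every XCF file in `xcf_files` to PNG."""
    # repr() yields a correctly quoted list literal, even for names containing quotes.
    files_literal = repr([str(f) for f in xcf_files])
    lines = [
        "files = " + files_literal,
        "for filename in files:",
        "    img = pdb.gimp_file_load(filename, filename)",
        "    new_name = filename.rsplit('.', 1)[0] + '.png'",
        "    layer = pdb.gimp_image_merge_visible_layers(img, 1)  # 1 == CLIP_TO_IMAGE",
        "    pdb.gimp_file_save(img, layer, new_name, new_name)",
        "    pdb.gimp_image_delete(img)",
        "pdb.gimp_quit(1)",
    ]
    return "\n".join(lines) + "\n"


def gimp_python_batch_args():
    """GIMP command line that reads Python-Fu code from stdin, as in the script above."""
    return ["gimp", "-i", "--batch-interpreter=python-fu-eval", "-b", "-"]
```

Usage would look like `subprocess.run(gimp_python_batch_args(), input=python_fu_batch_code(["a.xcf", "b.xcf"]), text=True, check=True)`.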

Use xcftools
It's possible that this tool wasn't available (or in any case widely known) at the time this question was originally asked, but the xcf2png terminal command from the xcftools package does exactly what you ask, and does it very well. I use it for my thesis scripts, and it would be my go-to choice for this question.
From the documentation:
This is a set of fast command-line tools for extracting information from the Gimp's native file format XCF.
The tools are designed to allow efficient use of layered XCF files as sources in a build system that use 'make' and similar tools to manage automatic processing of the graphics.
These tools work independently of the Gimp engine and do not require the Gimp to even be installed.
"xcf2png" converts XCF files to PNG format, flattening layers if necessary. Transparency information can be kept in the image, or a background color can be specified on the command line.
(PS. I spotted xcftools mentioned in the comments just after I decided to post an answer about it, but I'm posting anyway as that comment was easy to miss (I saw it because I specifically checked for it) and I think xcftools deserves a proper answer's slot, as it really is the most appropriate answer at this point in time.)
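Since xcftools is aimed at make-style builds, here is a minimal sketch of a build step that only reconverts XCF files whose PNG is missing or older. It assumes xcf2png's `-o` option for the output file (check `xcf2png --help` on your system); the function names are hypothetical.

```python
import subprocess
from pathlib import Path

def xcf2png_command(xcf_path):
    """Command line converting one XCF file to a PNG next to it (assumes xcf2png's -o)."""
    src = Path(xcf_path)
    return ["xcf2png", "-o", str(src.with_suffix(".png")), str(src)]

def is_stale(src, dst):
    """Make-style check: rebuild when dst is missing or older than src."""
    src, dst = Path(src), Path(dst)
    return not dst.exists() or dst.stat().st_mtime < src.stat().st_mtime

def convert_directory(directory="."):
    """Convert every out-of-date *.xcf in `directory`; raises CalledProcessError on failure."""
    converted = []
    for xcf in sorted(Path(directory).glob("*.xcf")):
        png = xcf.with_suffix(".png")
        if is_stale(xcf, png):
            subprocess.run(xcf2png_command(xcf), check=True)
            converted.append(png)
    return converted
```

The timestamp check mirrors what a make rule like `%.png: %.xcf` would do, so the same logic can live in a Makefile instead if you already have one.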

I know this is not strictly the correct answer, but I'm posting it since I got directed here while searching.
I modified the example code here to create a workable plugin
http://registry.gimp.org/node/28124
There is a bit of code from another answer on this page
```python
#!/usr/bin/env python
from gimpfu import *
import os

def batch_convert_xcf_to_png(img, layer, inputFolder, outputFolder):
    ''' Convert the xcf images in a directory to png.
    Parameters:
    img : image The current image (unused).
    layer : layer The layer of the image that is selected (unused).
    inputFolder : string The folder of the images that must be modified.
    outputFolder : string The folder in which save the modified images.
    '''
    # Iterate the folder
    for file in os.listdir(inputFolder):
        try:
            # Build the full file paths (os.path.join works on Windows and Unix alike).
            inputPath = os.path.join(inputFolder, file)
            new_name = file.rsplit(".", 1)[0] + ".png"
            outputPath = os.path.join(outputFolder, new_name)
            # Open the file if it is an XCF image.
            image = None
            if file.lower().endswith('.xcf'):
                image = pdb.gimp_xcf_load(1, inputPath, inputPath)
            # Verify if the file is an image.
            if image is not None:
                #layer = pdb.gimp_image_merge_visible_layers(image, gimpfu.CLIP_TO_IMAGE)
                layer = pdb.gimp_image_merge_visible_layers(image, 1)  # 1 == CLIP_TO_IMAGE
                # Save the merged layer as PNG (compression level 9).
                pdb.file_png_save(image, layer, outputPath, outputPath, 0, 9, 0, 0, 0, 0, 0)
                # Free the image so memory does not grow with every file.
                pdb.gimp_image_delete(image)
        except Exception as err:
            gimp.message("Unexpected error: " + str(err))

'''
img = pdb.gimp_file_load(filename, filename)
new_name = filename.rsplit(".",1)[0] + ".png"
layer = pdb.gimp_image_merge_visible_layers(img, gimpfu.CLIP_TO_IMAGE)
pdb.gimp_file_save(img, layer, new_name, new_name)
pdb.gimp_image_delete(img)
'''

register(
    "batch_convert_xcf_to_png",
    "convert chris",
    "convert ",
    "Chris O'Halloran",
    "Chris O'Halloran",
    "2014",
    "<Image>/Filters/Test/Batch batch_convert_xcf_to_png",
    "*",
    [
        (PF_DIRNAME, "inputFolder", "Input directory", ""),
        (PF_DIRNAME, "outputFolder", "Output directory", ""),
    ],
    [],
    batch_convert_xcf_to_png)

main()
```
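The plugin's path handling can be factored into a plain function and checked outside GIMP. This sketch (`plan_conversions` is a hypothetical name) builds the same input/output pairs with `os.path.join`, so it does not depend on `\` being the separator, and matches `.xcf` case-insensitively as the plugin does.

```python
import os

def plan_conversions(input_folder, output_folder):
    """Return (input_path, output_path) pairs for every .xcf file in input_folder."""
    pairs = []
    for name in sorted(os.listdir(input_folder)):
        root, ext = os.path.splitext(name)
        if ext.lower() == ".xcf":
            pairs.append((os.path.join(input_folder, name),
                          os.path.join(output_folder, root + ".png")))
    return pairs
```

The plugin loop could then iterate over `plan_conversions(inputFolder, outputFolder)` and keep only the GIMP calls inside it.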

Related

Latex rendering in README.md on Github

Is there any way to render LaTex in README.md in a GitHub repository? I've googled it and searched on stack overflow but none of the related answers seems feasible.
For short expressions and not-so-fancy math, you can use inline HTML to get your LaTeX rendered by CodeCogs and then embed the resulting image. Here is an example:
- <img src="https://latex.codecogs.com/gif.latex?O_t=\text { Onset event at time bin } t " />
- <img src="https://latex.codecogs.com/gif.latex?s=\text { sensor reading } " />
- <img src="https://latex.codecogs.com/gif.latex?P(s | O_t )=\text { Probability of a sensor reading value when sleep onset is observed at a time bin } t " />
This should render each equation as an image.
Update: This works great in Eclipse but unfortunately not on GitHub. The only workaround is the following:
Take your LaTeX equation and go to http://www.codecogs.com/latex/eqneditor.php. At the bottom of the area where your equation is displayed there is a tiny dropdown menu; pick URL encoded and then paste the result into your GitHub Markdown like this:
![equation](http://latex.codecogs.com/gif.latex?O_t%3D%5Ctext%20%7B%20Onset%20event%20at%20time%20bin%20%7D%20t)
![equation](http://latex.codecogs.com/gif.latex?s%3D%5Ctext%20%7B%20sensor%20reading%20%7D)
![equation](http://latex.codecogs.com/gif.latex?P%28s%20%7C%20O_t%20%29%3D%5Ctext%20%7B%20Probability%20of%20a%20sensor%20reading%20value%20when%20sleep%20onset%20is%20observed%20at%20a%20time%20bin%20%7D%20t)
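Rather than copying the URL-encoded form from the editor by hand, you can generate it. A minimal sketch, assuming Python 3's `urllib.parse.quote` and the `gif.latex?` URL pattern shown above (`codecogs_markdown` is a hypothetical helper):

```python
from urllib.parse import quote

def codecogs_markdown(latex, alt="equation", fmt="gif"):
    """Return a Markdown image that renders `latex` via the CodeCogs service."""
    # safe="" percent-encodes everything except letters, digits and _.-~
    url = "https://latex.codecogs.com/%s.latex?%s" % (fmt, quote(latex, safe=""))
    return "![%s](%s)" % (alt, url)
```

For example, `codecogs_markdown(r"s=\text { sensor reading }")` produces the same encoded query string as the second line above.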
I upload repositories with equations to Gitlab because it has native support for LaTeX in .md files:
```math
SE = \frac{\sigma}{\sqrt{n}}
```
The syntax for inline latex is $`\sqrt{2}`$.
Gitlab renders equations with JavaScript in the browser instead of showing images, which improves the quality of equations.
More info here.
Let's hope Github will implement this as well in the future.
My trick is to use the Jupyter Notebook.
GitHub has built-in support for rendering .ipynb files. You can write inline and display LaTeX code in the notebook and GitHub will render it for you.
Here's a sample notebook file: https://gist.github.com/cyhsutw/d5983d166fb70ff651f027b2aa56ee4e
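If you want to generate such a notebook from a script, a minimal sketch is below. It assumes the nbformat 4 JSON layout (a top-level `cells` list plus `metadata`, `nbformat` and `nbformat_minor`); `latex_notebook` is a hypothetical helper, and it is GitHub's notebook renderer, not this code, that displays the math.

```python
import json

def latex_notebook(markdown_text):
    """Return the JSON text of a one-cell Jupyter notebook (nbformat 4) holding `markdown_text`."""
    notebook = {
        "cells": [
            {
                "cell_type": "markdown",
                "metadata": {},
                # nbformat stores cell source as a list of lines, newlines included.
                "source": markdown_text.splitlines(keepends=True),
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }
    return json.dumps(notebook, indent=1)
```

Writing the result to e.g. `math.ipynb` and committing it should give a file GitHub renders as a notebook.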
Readme2Tex
I've been working on a script that automates most of the cruft out of getting LaTeX typeset nicely into Github-flavored markdown: https://github.com/leegao/readme2tex
There are a few challenges with rendering LaTeX for Github. First, Github-flavored markdown strips most tags and most attributes. This means no Javascript based libraries (like Mathjax) nor any CSS styling.
The natural solution then seems to be to embed images of precompiled equations. However, you'll soon realize that LaTeX does more than just turning dollar-sign enclosed formulas into images.
Simply embedding images from online compilers gives your document a really unnatural look. In fact, I would argue that your everyday x^2 mathematical slang is even more readable than jumpy, badly aligned equation images.
I believe that making sure that your documents are typeset in a natural and readable way is important. This is why I wrote a script that, beyond compiling formulas into images, also ensures that the resulting image is properly fitted and aligned to the rest of the text.
For example, here is an excerpt from a .md file regarding some enumerative properties of regular expressions typeset using readme2tex:
As you might expect, the set of equations at the top is specified by just starting the corresponding align* environment
**Theorem**: The translation $[\![e]\!]$ given by
\begin{align*}
...
\end{align*}
...
Notice that while inline equations ($...$) run with the text, display equations (those that are delimited by \begin{ENV}...\end{ENV} or $$...$$) are centered. This makes it easy for people who are already accustomed to LaTeX to keep being productive.
If this sounds like something that could help, make sure to check it out. https://github.com/leegao/readme2tex
Since May 2022, this has been officially supported:
Inline:
Where $x = 0$, evaluate $x + 1$
Blocks:
Where
$$x = 0$$
Evaluate
$$x + 1$$
One can also use this online editor: https://www.codecogs.com/latex/eqneditor.php which generates SVG files on the fly. You can put a link in your document like this:
![](https://latex.codecogs.com/svg.latex?y%3Dx%5E2), which renders as an SVG image of y = x^2.
I tested some of the solutions proposed by others, and I would like to recommend TeXify, created and proposed in a comment by agurodriguez and further described by Tom Hale. I would like to develop his answer and give some reasons why this is a very good solution:
TeXify is a wrapper around Readme2Tex (mentioned in Lee's answer). To use Readme2Tex you must install a lot of software on your local machine (Python, LaTeX, ...), but TeXify is a GitHub plugin, so you don't need to install anything locally. You only need to install the plugin in your GitHub account by pressing one button and choosing the repositories to which TeXify will have read/write access, so it can parse your TeX formulas and generate pictures.
When you create or update a *.tex.md file in your repository, TeXify detects the change and generates a *.md file in which the LaTeX formulas are replaced by pictures saved in the tex directory of your repo. So if you create README.tex.md, TeXify generates README.md with pictures instead of TeX formulas. Parsing the TeX formulas and generating the documentation is thus done automagically on each commit & push :)
Because all your formulas are turned into pictures in the tex directory and README.md uses links to those pictures, you can even uninstall TeXify and all your old documentation will still work :). The tex directory and *.tex.md files stay in the repository, so you keep access to your original LaTeX formulas and pictures (you can also safely store your other, hand-made documentation pictures in the tex directory; TeXify will not touch them).
You can use LaTeX equation syntax directly in README.tex.md (without losing Markdown syntax), which is very handy. Julii, in his answer, proposed using special links (containing the formulas) to an external service, e.g. http://latex.codecogs.com/gif.latex?s%3D%5Ctext%20%7B%20sensor%20reading%20%7D, which is good but has some drawbacks: the formulas in links are not easy to read and update, and if there is ever a problem with that third-party service, your old documentation will stop working. With TeXify your old documentation will always work, even if you uninstall the plugin, because all the pictures generated from your LaTeX formulas stay in the repo's tex directory.
Yuchao Jiang, in his answer, proposed using a Jupyter Notebook, which is also nice but has some drawbacks: you cannot use formulas directly in README.md; you need to link from there to another *.ipynb file in your repo that contains the LaTeX (MathJax) formulas. The *.ipynb format is JSON, which is not handy to maintain (e.g. Gist doesn't show a detailed error with a line number in an *.ipynb file when you forget a comma in the right place...).
Here is a link to one of my repos where I use TeXify; its documentation was generated from a README.tex.md file.
Update
As of today (2020-12-13), I realised that the TeXify plugin has stopped working, even after reinstallation :(
For automatic conversion upon push to GitHub, take a look at the TeXify app:
GitHub App that looks in your pushes for files with extension *.tex.md and renders its TeX expressions as SVG images
How it works (from the source repository):
Whenever you push, TeXify will run and search for *.tex.md files in your last commit. For each one of those it'll run readme2tex, which will take LaTeX expressions enclosed between dollar signs, convert them to plain SVG images, and then save the output into a .md extension file (that means that a file named README.tex.md will be processed and the output will be saved as README.md). After that, the output file and the new SVG images are committed and pushed back to your repo.
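The naming rule described above can be expressed in a few lines. This is an illustrative sketch with hypothetical function names, not TeXify's or readme2tex's code.

```python
from pathlib import Path

SUFFIX = ".tex.md"

def texify_output_name(path):
    """Map a *.tex.md source to the *.md file generated from it (README.tex.md -> README.md)."""
    p = Path(path)
    if not p.name.endswith(SUFFIX):
        raise ValueError("not a %s file: %s" % (SUFFIX, p))
    return p.with_name(p.name[: -len(SUFFIX)] + ".md")

def find_sources(root):
    """All *.tex.md files under `root`, sorted for a stable processing order."""
    return sorted(Path(root).rglob("*" + SUFFIX))
```

Because the generated README.md never ends in `.tex.md`, a scan like this does not pick up its own output.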
I just published a new version of xhub, a browser extension that renders LaTeX (and other things) in GitHub pages.
Cons:
You have to install the extension once.
Pros:
No need to set up anything.
Just write Markdown with math
Display math:
```math
e^{i\pi} + 1 = 0
```
and line math $`a^2 + b^2 = c^2`$.
(Syntax like on GitLab.)
Works on light and dark background. (Math has text-color)
You can copy-and-paste the math just like text
As an example, check out this GitHub README:
You can get a continuous integration service (e.g. Travis CI) to render LaTeX and commit the results to GitHub. CI will deploy a "cloud" worker after each new commit. The worker compiles your document into a PDF and either uses ImageMagick to convert it to an image or uses Pandoc to attempt LaTeX->HTML conversion, where success may vary depending on your document. The worker then commits the image or HTML to your repository, from where it can be shown in your README.
A sample Travis CI config that builds a PDF, converts it to a PNG and commits it to a static location in your repo is pasted below. You may need to add a line that installs the tool that converts the PDF to an image (the config below calls ImageMagick's convert).
```yaml
sudo: required
dist: trusty
os: linux
language: generic
services: docker
env:
  global:
    - GIT_NAME: Travis CI
    - GIT_EMAIL: builds@travis-ci.org
    - TRAVIS_REPO_SLUG: your-github-username/your-repo
    - GIT_BRANCH: master
    # I recommend storing your GitHub Access token as a secret key in a Travis CI environment variable, for example $GH_TOKEN.
    - secure: ${GH_TOKEN}
script:
  - wget https://raw.githubusercontent.com/blang/latex-docker/master/latexdockercmd.sh
  - chmod +x latexdockercmd.sh
  - "./latexdockercmd.sh latexmk -cd -f -interaction=batchmode -pdf yourdocument.tex -outdir=$TRAVIS_BUILD_DIR/"
  - cd $TRAVIS_BUILD_DIR
  - convert -density 300 -quality 90 yourdocument.pdf yourdocument.png
  - git checkout --orphan $TRAVIS_BRANCH-pdf
  - git rm -rf .
  - git add -f yourdoc*.png
  - git -c user.name='travis' -c user.email='travis' commit -m "updated PDF"
  # note we are again using GitHub access key stored in the CI environment variable
  - git push -q -f https://your-github-username:$GH_TOKEN@github.com/$TRAVIS_REPO_SLUG $TRAVIS_BRANCH-pdf
notifications:
  email: false
```
This Travis CI configuration launches an Ubuntu worker, downloads a LaTeX Docker image, compiles your document to PDF, converts it to PNG and commits the image to a branch called <branchname>-pdf.
For more examples see this github repo and its accompanying sx discussion, PanDoc example,
https://dfm.io/posts/travis-latex/, and this post on Medium.
I have been looking around and found that this answer in another question works best for me. i.e. use githubcontent math renderer, e.g. to display:
Use this link
Beware that the LaTeX needs to be URL-encoded, but otherwise it works quite well for me.
If you are having issues with https://www.codecogs.com/latex/eqneditor.php, I found that https://alexanderrodin.com/github-latex-markdown/ worked for me. It generates the Markdown code you need, so you just cut and paste it into your README.md document.
You may also take a look at my tool latexMarkdown2Markdown, which converts LaTeX to SVG and generates a table of contents with chapter numbering.
Good news!
According to this blogpost, now GitHub supports Mathjax in readme files.
You can use inline, LaTeX-inspired syntax with $ delimiters, or blocks with $$ delimiters.
Writing inline expressions:
This sentence uses $ delimiters to show math inline:
$\sqrt{3x-1}+(1+x)^2$
Writing expressions as blocks:
The Cauchy-Schwarz Inequality
$$\left( \sum_{k=1}^n a_k b_k \right)^2 \leq \left( \sum_{k=1}^n a_k^2
\right) \left( \sum_{k=1}^n b_k^2 \right)$$
Source: https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/writing-mathematical-expressions
You can use Markdown image links, e.g.
![equ](https://latex.codecogs.com/gif.latex?log(y)=\beta_0&space;&plus;&space;\beta_1&space;x&space;&plus;&space;u)
Code can be typed here: https://www.codecogs.com/latex/eqneditor.php.
Edit: As germanium pointed out, this does not work in README.md, only on other git pages, and no explanation is available.
My quick solution is this
step 1. Add latex to your .md file
$$x=\sqrt{2}$$
Note: math eqns must be in $$...$$ or \\(... \\).
step 2. Add the following to your scripts.html or theme file (append this code at the end):
```html
<script type="text/javascript" async
  src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-MML-AM_CHTML">
</script>
```
Done! Load the page to see your equation.

How do I make emacs save its backup files alongside symbolic links?

I have files in some location:
location_a/doc.tex
location_a/doc.cls
...
I want to work on them in another directory via symbolic links:
work_directory/doc.tex -> location_a/doc.tex
work_directory/doc.cls -> location_a/doc.cls
work_directory/doc.pdf
work_directory/doc.log
...
However, when I run emacs doc.tex in the work directory and do some editing, emacs creates a backup file at location_a/doc.tex~. I want the backup file to be stored in the work directory, though. I don't want any new files created in location_a.
How can I make emacs do that?
This is trickier than it seems it should be because backup-buffer insists on chasing the links of the buffer file name before calling any backup file name construction machinery, such as make-backup-file-name-function. The result is that Emacs allows no way to customize this behavior, short of redefining backup-buffer, which is a fairly complicated piece of code.
A compromise solution I came up with is to install an "advice" around backup-buffer that temporarily disables file-chase-links while backup-buffer is being evaluated. This allows the backup file to be in the directory where the symlink resides. However, it also causes Emacs to create the backup by renaming the original symlink, leaving one with work_directory/doc.tex~ being a symlink that points to location_a/doc.tex! Fortunately, this is easy to prevent by setting backup-by-copying to t.
Here is the code. A word of warning: while I have tried it to verify that it works, I cannot guarantee that it will not have an undesirable side effect, like the above interference with the backup mechanism that required backup-by-copying. However, it might also work just fine - just be careful when using it.
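```elisp
(require 'cl) ; for flet
;; While backup-buffer runs, make file-chase-links return its argument unchanged,
;; so the backup goes next to the symlink instead of next to its target.
(defadvice backup-buffer (around disable-chase-links)
  (flet ((file-chase-links (file &optional _limit) file))
    ad-do-it))
(ad-activate 'backup-buffer)
;; Back up by copying, so the backup is a real file rather than the renamed symlink.
(setq backup-by-copying t)
```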
For the fun of it, let me describe a radically different approach, based on directory variables.
In short, you would put in your work-directory/ a file named .dir-locals.el containing:
```elisp
((nil . ((eval . (set (make-local-variable 'backup-directory-alist)
                      (list (cons "."
                                  (file-relative-name
                                   (file-name-directory (buffer-file-name))
                                   (file-name-directory (file-truename
                                                         (buffer-file-name)))))))))))
```
What this does is somewhat abuse backup-directory-alist: it installs a local version of it for all your files in work-directory/, and that local version in turn makes sure that any backup file is kept within work-directory.
In order to achieve that, we need 2 things:
have something like '(("." . "path/to/work-directory/")) as the local value
make sure this path is relative to location_a/
The reason for the second point is that, as noted elsewhere, the starting point of backup-buffer is indeed the location of the actual file, once symlinks are resolved. And we can't simply use an absolute path without the backup file names changing shape: with an absolute backup directory, the backup file names encode the complete path of the original, so that there are no collisions.
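To see what the .dir-locals.el form computes, here is the same calculation in Python, as an illustration only: the directory holding the symlink, expressed relative to the directory of the file it resolves to. `os.path.realpath` stands in for file-truename and `os.path.relpath` for file-relative-name (Emacs keeps a trailing slash on directory names; Python does not). The helper names are hypothetical.

```python
import os

def backup_dir_for(link_path, target_path):
    """Directory of `link_path`, relative to the directory of `target_path`."""
    return os.path.relpath(os.path.dirname(link_path), os.path.dirname(target_path))

def backup_dir_for_symlink(link_path):
    """Same, resolving the symlink the way file-truename would."""
    return backup_dir_for(os.path.abspath(link_path), os.path.realpath(link_path))
```

For the layout in the question this gives `../work_directory`, so the local backup-directory-alist becomes roughly `(("." . "../work_directory/"))`, which backup-buffer then resolves from location_a/.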
Notes:
you'll need to make sure that specific local variable is recorded in the safe-local-variable-values. Since it's a generic form, it's a one time job though (just hit "!" the first time you're asked about it)
this assumes find-file-visit-truename is set to nil, but I guess you wouldn't ask that question if that was not the case :)
Pros of the approach:
no need for advice (which is always a good thing)
reasonably portable although it assumes your Emacs supports directory variables
you keep the flexibility to put that in place only where you need it
Cons of the approach:
well, obviously you might have to copy that .dir-locals.el in several places
Also note that if you wanted a one-shot approach, you could make it much simpler, such as:
```elisp
((nil . ((backup-directory-alist . (("." . "../path/to/work-directory"))))))
```
where you actually compute the relative name yourself, once and for all.

How to generate files from Liquid blocks in Jekyll?

I am writing a plugin that defines a new Jekyll block ditaa. Any content in the block should be converted from Ditaa markup to an image file and that image inserted into the post instead of the block. Generating the file works but when copying into or generating in the _site directory, the file is apparently deleted.
Is there a proper/better way to implement a block plugin that generates custom assets?
I've found the proper solution: use the Jekyll::StaticFile class.
When you add one object of this class to the site.static_files array, you are marking this file as pending for copy after the render process is completed. In fact, the copy of such files is done in the site.write process. Take a look at the site_process.rb file in your Jekyll installation.
The usage of this class is easy. When you need to mark a file for future copy, you simply execute a code like this:
```ruby
site.static_files << Jekyll::StaticFile.new(site, site.source, path, filename)
```
where path and filename depend on the location of your file in the src folder.
I had a similar issue developing a LaTeX -> PNG liquid tag. You can take a look at my code at GitHub: https://github.com/fgalindo/jekyll-liquid-latex-plugin
I haven't found the proper way to do it, but I have found one that works. The solution can be found on GitHub and uses Jekyll's ability to copy anything that is not prefixed with an underscore to the _site directory. However, this approach also has two drawbacks:
The "source" directory gets polluted with auto-generated files
Deploying without auto-regeneration is a bit awkward because the images are generated after Jekyll already copied all files. So a second Jekyll run is necessary.
I have found the answer.
Replace this
```ruby
site.static_files << Jekyll::StaticFile.new(site, site.source, path, filename)
```
with
```ruby
gnufile = GNUplotFile.new(site, site.source, "_site/media/", "#{@file}")
gnufile.givemethecommands commands
site.static_files << gnufile
```
and create a GNUplotFile class that inherits Jekyll::StaticFile
```ruby
class GNUplotFile < Jekyll::StaticFile
  def write(dest)
    puts "WRITE---->>>>>>>>>>>"
    #File.write('_site/media/BTTTTT.svg', DateTime.now)
    gnuplot(@commands)
    # do nothing
  end

  def gnuplot(commands)
    IO.popen("gnuplot", "w") { |io| io.puts commands }
  end

  def givemethecommands(commands)
    @commands = commands
  end
end
```
The write command runs after the cleanup process. I just have a Liquid block and the above code.

How do you trim the XMP XML contained within a jpg

Through the use of sanselan I've found that the root cause of iPhone photos imported to windows becoming uneditable is that there is content (white space?) after the actual XML (for more details and a linked example of the bad XMP XML see https://apple.stackexchange.com/questions/45326/why-can-i-not-edit-some-photos-imported-from-an-iphone-to-windows-vista).
I'd like to scan through my photo archive and 'trim' the XMP XML.
Is there an easy way to do this?
I have some java code that can recursively navigate my photo archive and DETECT the issue. I'm not sure how to trim and write the XML back though.
Obtain the existing XML using any means.
The following works if using the Apache Sanselan library:
String xmpXml = Sanselan.getXmpXml(new File("/path/to/jpeg"));
Then trim it...
xmpXml = xmpXml.trim();
Then write it back to the file using the solution to serializing Xmp XML to an existing jpeg.
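If you would rather not depend on a library for the write step, the trim can also be done directly on the JPEG bytes. This is a minimal Python 3 sketch, not Sanselan's API: it walks the JPEG marker segments up to Start of Scan, finds the APP1 segment whose payload starts with the standard XMP namespace header, strips trailing whitespace from the packet, and rewrites the segment length. It assumes a single (non-extended) XMP segment and that the unwanted trailing content is whitespace, as the question suspects; `trim_xmp_in_jpeg` is a hypothetical name, and it should be run on copies first.

```python
import struct

# APP1 payloads carrying XMP start with this namespace string plus a NUL byte.
XMP_HEADER = b"http://ns.adobe.com/xap/1.0/\x00"

def trim_xmp_in_jpeg(data):
    """Return a copy of JPEG bytes `data` with trailing whitespace removed from the XMP packet."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    out = bytearray(data[:2])
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker in (0xD9, 0xDA):  # EOI, or SOS: image data follows, copy the rest verbatim
            break
        # The 2-byte big-endian length counts itself but not the 2 marker bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segment_end = pos + 2 + length
        payload = data[pos + 4:segment_end]
        if marker == 0xE1 and payload.startswith(XMP_HEADER):
            payload = XMP_HEADER + payload[len(XMP_HEADER):].rstrip()
            out += b"\xff\xe1" + struct.pack(">H", len(payload) + 2) + payload
        else:
            out += data[pos:segment_end]
        pos = segment_end
    out += data[pos:]
    return bytes(out)
```

A file would be processed with something like `open(path, "wb").write(trim_xmp_in_jpeg(open(path, "rb").read()))`, and the existing Java detection code could decide which files to pass in.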
Try the following steps:
collect all of the photos in a single folder (e.g. folder xmlToConvert on your Desktop)
open a Terminal.app window
cd to the directory you put the files in (e.g. cd ~/Desktop/xmlToConvert)
run the following command from your command line prompt
mkdir converted ; for f in *.xml ; do head -n "$(wc -l < "$f")" "$f" > "converted/$f" ; done
the converted/ sub-directory should now contain all the files without the whitespace at the end.
(i.e. a folder called converted in the xmlToConvert you created on your Desktop)
hth

Uncompress OpenOffice files for better storage in version control

I've heard discussion about how OpenOffice (ODF) files are compressed zip files of XML and other data. So making a tiny change to the file can potentially totally change the data, so delta compression doesn't work well in version control systems.
I've done basic testing on an OpenOffice file, unzipping it and then rezipping it with zero compression. I used the Linux zip utility for my testing. OpenOffice will still happily open it.
So I'm wondering if it's worth developing a small utility to run on ODF files each time just before I commit to version control. Any thoughts on this idea? Possible better alternatives?
Secondly, what would be a good and robust way to implement this little utility? Bash shell that calls zip (probably Linux only)? Python? Any gotchas you can think of? Obviously I don't want to accidentally mangle a file, and there are several ways that could happen.
Possible gotchas I can think of:
Insufficient disk space
Some other permissions issue that prevents writing the file or temporary files
ODF document is encrypted (probably should just leave these alone; the encryption probably also causes large file changes and thus prevents efficient delta compression)
First, the version control system you want to use should support hooks that are invoked to transform files between the version in the repository and the one in the working area, like the clean/smudge filters in Git configured via gitattributes.
Second, you can find such a filter instead of writing one yourself, for example rezip from the "Management of opendocument (openoffice.org) files in git" thread on the git mailing list (but see the warning in "Followup: management of OO files - warning about "rezip" approach").
You can also browse answers in "Tracking OpenOffice files/other compressed files with Git" thread, or try to find the answer inside "[PATCH 2/2] Add keyword unexpansion support to convert.c" thread.
Hope That Helps
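As a rough sketch of such a clean filter (not the rezip script from the thread): Git runs the `filter.<driver>.clean` command with the file content on stdin and stores whatever it writes to stdout. The script below re-packs the archive with every member stored, keeps member order and each member's name, date and attributes so the same input should always give the same output, and passes anything that is not a valid ZIP through unchanged. The file name `odf_clean.py` and the `--clean` flag are hypothetical; you would wire it up with something like `git config filter.odf.clean "python3 /path/to/odf_clean.py --clean"` plus a `*.odt filter=odf` line in .gitattributes.

```python
import io
import sys
import zipfile

def rezip_stored(data):
    """Re-pack a ZIP/ODF archive (bytes) with every member stored uncompressed."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_STORED) as dst:
        for info in src.infolist():  # original order, so ODF's "mimetype" stays first
            member = src.read(info)
            info.compress_type = zipfile.ZIP_STORED  # reuse name, date_time and attributes
            dst.writestr(info, member)
    return out.getvalue()

def clean(data):
    """Clean-filter body: store uncompressed, or pass non-ZIP input through unchanged."""
    try:
        return rezip_stored(data)
    except zipfile.BadZipFile:
        return data

if __name__ == "__main__" and sys.argv[1:] == ["--clean"]:
    sys.stdout.buffer.write(clean(sys.stdin.buffer.read()))
```

Without a smudge filter, checkouts contain the stored (uncompressed) archive, which the question reports OpenOffice opens without complaint.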
You may consider storing documents in FODT format, a flat XML format.
This is a relatively new alternative solution.
The document is simply stored unzipped.
More info is available at https://wiki.documentfoundation.org/Libreoffice_and_subversion.
I've modified the python program in Craig McQueen's answer just a bit. Changes include:
Actually checking the return of testZip (according to the docs, it appears that the original program will happily proceed with a corrupt zip file past the checkzip step).
Rewrite the for-loop to check for already-uncompressed files to be a single if-statement.
Here is the new program:
```python
#!/usr/bin/python
# Note, written for Python 2.6
import sys
import shutil
import zipfile

# Get a single command-line argument containing filename
commandlineFileName = sys.argv[1]
backupFileName = commandlineFileName + ".bak"
inFileName = backupFileName
outFileName = commandlineFileName
checkFilename = commandlineFileName

# Check input file
# First, check it is valid (not corrupted)
checkZipFile = zipfile.ZipFile(checkFilename)
if checkZipFile.testzip() is not None:
    raise Exception("Zip file is corrupted")
# Second, check that it's not already uncompressed
if all(f.compress_type==zipfile.ZIP_STORED for f in checkZipFile.infolist()):
    raise Exception("File is already uncompressed")
checkZipFile.close()

# Copy to "backup" file and use that as the input
shutil.copy(commandlineFileName, backupFileName)
inputZipFile = zipfile.ZipFile(inFileName)
outputZipFile = zipfile.ZipFile(outFileName, "w", zipfile.ZIP_STORED)

# Copy each input file's data to output, making sure it's uncompressed
for fileObject in inputZipFile.infolist():
    fileData = inputZipFile.read(fileObject)
    outFileObject = fileObject
    outFileObject.compress_type = zipfile.ZIP_STORED
    outputZipFile.writestr(outFileObject, fileData)
outputZipFile.close()
```
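Building on the program above, here is a Python 3 sketch that also addresses some of the gotchas listed in the question. It writes to a temporary file in the same directory and only replaces the original after the new archive passes `testzip()`, so a full disk or a permissions error leaves the original untouched; it copies the original's permission bits; and it skips documents whose META-INF/manifest.xml mentions `encryption-data` (the element ODF uses to describe encrypted entries). `store_uncompressed` is a hypothetical name, and this is a sketch under those assumptions rather than a tested tool.

```python
import os
import shutil
import tempfile
import zipfile

def store_uncompressed(path):
    """Rewrite the ODF/ZIP file at `path` with every member stored uncompressed.

    Returns False (leaving the file untouched) if it is already uncompressed or
    looks encrypted; raises on corruption or I/O errors.
    """
    with zipfile.ZipFile(path) as src:
        bad = src.testzip()
        if bad is not None:
            raise zipfile.BadZipFile("corrupt member: %s" % bad)
        infos = src.infolist()
        if all(info.compress_type == zipfile.ZIP_STORED for info in infos):
            return False
        manifest = "META-INF/manifest.xml"
        if manifest in src.namelist() and b"encryption-data" in src.read(manifest):
            return False  # encrypted ODF document: leave it alone
        # Temporary file in the same directory, so os.replace() is a same-filesystem rename.
        fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=os.path.dirname(os.path.abspath(path)))
        os.close(fd)
        try:
            with zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_STORED) as dst:
                for info in infos:  # original order, so "mimetype" stays first
                    data = src.read(info)
                    info.compress_type = zipfile.ZIP_STORED
                    dst.writestr(info, data)
            with zipfile.ZipFile(tmp_path) as check:
                if check.testzip() is not None:
                    raise zipfile.BadZipFile("rewritten archive failed testzip()")
            shutil.copymode(path, tmp_path)  # mkstemp creates the file with mode 0600
        except BaseException:
            os.remove(tmp_path)
            raise
    os.replace(tmp_path, path)
    return True
```

Unlike the scripts above, this keeps no .bak copy; if you want one, `shutil.copy2(path, path + ".bak")` before the call would restore that behaviour.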
Here's another program I stumbled across: store_zippies_uncompressed by Mirko Friedenhagen.
The wiki also shows how to integrate it with Mercurial.
Here is a Python script that I've put together. It's had minimal testing so far. I've done basic testing in Python 2.6. But I prefer the idea of Python in general because it should abort with an exception if any error occurs, whereas a bash script may not.
This first checks that the input file is valid and not already uncompressed. Then it copies the input file to a "backup" file with ".bak" extension. Then it uncompresses the original file, overwriting it.
I'm sure there are things I've overlooked. Please feel free to give feedback.
```python
#!/usr/bin/python
# Note, written for Python 2.6
import sys
import shutil
import zipfile

# Get a single command-line argument containing filename
commandlineFileName = sys.argv[1]
backupFileName = commandlineFileName + ".bak"
inFileName = backupFileName
outFileName = commandlineFileName
checkFilename = commandlineFileName

# Check input file
# First, check it is valid (not corrupted)
checkZipFile = zipfile.ZipFile(checkFilename)
checkZipFile.testzip()
# Second, check that it's not already uncompressed
isCompressed = False
for fileObject in checkZipFile.infolist():
    if fileObject.compress_type != zipfile.ZIP_STORED:
        isCompressed = True
if isCompressed == False:
    raise Exception("File is already uncompressed")
checkZipFile.close()

# Copy to "backup" file and use that as the input
shutil.copy(commandlineFileName, backupFileName)
inputZipFile = zipfile.ZipFile(inFileName)
outputZipFile = zipfile.ZipFile(outFileName, "w", zipfile.ZIP_STORED)

# Copy each input file's data to output, making sure it's uncompressed
for fileObject in inputZipFile.infolist():
    fileData = inputZipFile.read(fileObject)
    outFileObject = fileObject
    outFileObject.compress_type = zipfile.ZIP_STORED
    outputZipFile.writestr(outFileObject, fileData)
outputZipFile.close()
```
This is in a Mercurial repository in BitBucket.
If you don't need the storage savings, but just want to be able to diff OpenOffice.org files stored in your version control system, you can use the instructions on the oodiff page, which tells how to make oodiff the default diff for OpenDocument formats under git and mercurial. (It also mentions SVN, but it's been so long since I used SVN regularly I'm not sure if those are instructions or limitations.)
(I found this using Mirko Friedenhagen's page (cited by Craig McQueen above))