exiftool not showing space character when using -t or -T option

I am using the following command to save a tag with a space character:
exiftool -config xmp.config -overwrite_original -PropertyID=' ' /Users/admin/Downloads/Files/09913/1KingWithSofaBed_rm521_1.tif
Using the -X option, I can see that the space character was saved successfully:
exiftool -X -filename -PropertyID /Users/admin/Downloads/Files/09913/1KingWithSofaBed_rm521_1.tif
<?xml version='1.0' encoding='UTF-8'?>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
<rdf:Description rdf:about='/Users/admin/Downloads/Files/09913/1KingWithSofaBed_rm521_1.tif'
xmlns:et='http://ns.exiftool.ca/1.0/' et:toolkit='Image::ExifTool 11.84'
xmlns:System='http://ns.exiftool.ca/File/System/1.0/'
xmlns:XMP-xmp='http://ns.exiftool.ca/XMP/XMP-xmp/1.0/'>
<System:FileName>1KingWithSofaBed_rm521_1.tif</System:FileName>
<XMP-xmp:PropertyID> </XMP-xmp:PropertyID>
</rdf:Description>
</rdf:RDF>
The problem is that -t or -T does not show the space:
exiftool -t -filename -PropertyID /Users/admin/Downloads/Files/09913/1KingWithSofaBed_rm521_1.tif
File Name 1KingWithSofaBed_rm521_1.tif
Property ID
exiftool -T -filename -PropertyID /Users/admin/Downloads/Files/09913/1KingWithSofaBed_rm521_1.tif
1KingWithSofaBed_rm521_1.tif
In both cases the space is missing from the PropertyID value (I checked the output with a hex editor).
Is this a limitation of exiftool, or is it possible to show it using the -t or -T option?

The answer from Phil Harvey, the author of exiftool:
You can use the (undocumented) -ec option (ExifTool 11.54 or later) to escape control characters using C-style escape sequences, which preserves nulls, trailing newlines, etc.
I tested it, and it appears to preserve trailing spaces as well.
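To illustrate what C-style escaping buys you in tab-delimited output, here is a minimal Python sketch. It is not exiftool's implementation, the exact set of characters that -ec escapes is an assumption, and the helper names escape_c and tab_line are hypothetical. Control characters become visible sequences such as \t or \x01, so a value containing a tab or newline can no longer break the field layout, while a plain space passes through unchanged and a value of " " stays a single field after splitting on tabs.

```python
# Hypothetical sketch of C-style escaping; not exiftool's actual code.
NAMED_ESCAPES = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"}


def escape_c(value: str) -> str:
    """Return value with backslash, tab, CR/LF and other control chars escaped."""
    out = []
    for ch in value:
        if ch in NAMED_ESCAPES:
            out.append(NAMED_ESCAPES[ch])
        elif ord(ch) < 0x20 or ord(ch) == 0x7F:
            # Remaining control characters (including NUL) as \xHH.
            out.append(f"\\x{ord(ch):02x}")
        else:
            out.append(ch)
    return "".join(out)


def tab_line(*values: str) -> str:
    """Join escaped values with tabs, one field per value (in the style of -T)."""
    return "\t".join(escape_c(v) for v in values)


if __name__ == "__main__":
    line = tab_line("1KingWithSofaBed_rm521_1.tif", " ")
    print(repr(line))
    print(repr(line.split("\t")[1]))  # the single-space value survives as its own field
```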

Related

How to perform UTF-8 encoding using xmlstarlet fo --encode option?

The synopsis for xmlstarlet fo says
XMLStarlet Toolkit: Format XML document
Usage: xmlstarlet fo [<options>] <xml-file>
where <options> are
-n or --noindent - do not indent
-t or --indent-tab - indent output with tabulation
-s or --indent-spaces <num> - indent output with <num> spaces
-o or --omit-decl - omit xml declaration <?xml version="1.0"?>
--net - allow network access
-R or --recover - try to recover what is parsable
-D or --dropdtd - remove the DOCTYPE of the input docs
-C or --nocdata - replace cdata section with text nodes
-N or --nsclean - remove redundant namespace declarations
-e or --encode <encoding> - output in the given encoding (utf-8, unicode...)
-H or --html - input is HTML
-h or --help - print help
When I run
cat unformatted.html | xmlstarlet fo -H -R --encode utf-8
I get the error message:
failed to load external entity "utf-8"
In my limited experience, xmlstarlet fo in particular needs the stdin dash (-) to read piped input reliably.
In your example, the 'unformatted.html' contents are piped to xmlstarlet.
But xmlstarlet fo doesn't 'see' the piped input, if you don't use a - (dash).
It assumes that the last argument (utf-8) is the filename ("external entity") whose contents you're trying to format. Obviously, there's no such file. Just to be on the safe side, I'd also enclose the encoding argument with double quotes, like so: "utf-8".
Altering your statement to
xmlstarlet fo -H -R --encode "utf-8" unformatted.html
should do the trick.
The cat is unnecessary, I'd think.
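The dash convention this answer relies on can be sketched in Python. This is a hypothetical helper (open_input), not xmlstarlet's own argument parser; it only shows why a tool that treats "-" as standard input and every other positional argument as a file name ends up trying to open a file literally called utf-8.

```python
import sys
from typing import TextIO


def open_input(arg: str) -> TextIO:
    """Return the stream a filter-style tool would read for one positional arg.

    By common convention "-" means standard input; anything else is taken
    as a path, so a stray "utf-8" is opened as a file name.
    """
    if arg == "-":
        return sys.stdin
    return open(arg, encoding="utf-8")


if __name__ == "__main__":
    try:
        open_input("utf-8")
    except FileNotFoundError as err:
        print(f"failed to load: {err.filename!r}")
```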

Setting file modification date from exif date

To set the file modification date of images to the exif date, I tried the following:
exiftool '-FileModifyDate<DateTimeOriginal' image.jpg
But this gives me an error about SetFileTime.
So maybe exiftool cannot do it in Linux.
Can I combine
exiftool -m -p '$FileName - $DateTimeOriginal' -if '$DateTimeOriginal' -DateTimeOriginal -s -S -ext jpg .
with "touch --date ..."?
See this Exiftool Forum post.
The command used there is (take note of the use of backticks, not single quotes):
touch -t `exiftool -s -s -s -d "%Y%m%d%H%M.%S" -DateTimeOriginal TEST.JPG` TEST.JPG
But I'm curious about your error. Exiftool should be able to set the FileModifyDate on Linux (though FileCreateDate is a different story). What version of Exiftool are you using (exiftool -ver to check)?
Another possibility is that the DateTimeOriginal tag is malformed or doesn't have the full date/time info in it.
FWIW, StarGeek's answer was a great pointer in the right direction, but it did not work for me: many of my photos were reported to have "Invalid EXIF text encoding" (no obvious difference compared to those that were "fine"), even though exiftool somefile.jpg would clearly output a valid "Modify Date".
So this is what I did:
for i in *.jpg ; do d=$(exiftool "$i" | grep Modify | sed 's/.*: //g') ; echo "$i : $d" ; done
...to produce output like this:
CAM00786.jpg : 2013:11:19 18:47:27
CAM00787.jpg : 2013:11:25 08:46:08
CAM00788.jpg : 2013:11:25 08:46:19
...
It was enough for me to output the timestamps next to the file names, but given a little bit of date-time formatting, it could easily be used to "touch" the files to modify their filesystem timestamps.
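As a minimal Python sketch of that "little bit of date-time formatting", assuming the dates look like the 2013:11:19 18:47:27 values above and should be interpreted as local time: exif_to_touch_stamp produces the same string that exiftool's -d "%Y%m%d%H%M.%S" gives for touch -t, and set_mtime_from_exif sets the timestamps directly with os.utime instead of calling touch. All three helper names are hypothetical.

```python
import os
from datetime import datetime

EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"  # e.g. "2013:11:19 18:47:27"


def parse_listing_line(line: str) -> tuple[str, str]:
    """Split a 'CAM00786.jpg : 2013:11:19 18:47:27' line into (name, date)."""
    name, date = line.split(" : ", 1)
    return name.strip(), date.strip()


def exif_to_touch_stamp(exif_date: str) -> str:
    """Convert an EXIF date to the CCYYMMDDhhmm.ss form accepted by touch -t."""
    return datetime.strptime(exif_date, EXIF_FORMAT).strftime("%Y%m%d%H%M.%S")


def set_mtime_from_exif(path: str, exif_date: str) -> None:
    """Set access and modification time of path, treating the date as local time."""
    stamp = datetime.strptime(exif_date, EXIF_FORMAT).timestamp()
    os.utime(path, (stamp, stamp))


if __name__ == "__main__":
    name, date = parse_listing_line("CAM00786.jpg : 2013:11:19 18:47:27")
    print(name, exif_to_touch_stamp(date))  # touch -t argument for this file
```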

Extracting the contents between two different strings using bash or perl

I have tried to scan through the other posts on Stack Overflow for this, but couldn't get my code to work, hence I am posting a new question.
Below is the content of the file temp:
<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"><env:Body><dp:response xmlns:dp="http://www.datapower.com/schemas/management"><dp:timestamp>2015-01-22T13:38:04Z</dp:timestamp><dp:file name="temporary://test.txt">XJzLXJlc3VsdHMtYWN0aW9uX18i</dp:file><dp:file name="temporary://test1.txt">lc3VsdHMtYWN0aW9uX18i</dp:file></dp:response></env:Body></env:Envelope>
This file contains the base64-encoded contents of two files named test.txt and test1.txt. I want to extract the base64-encoded content of each file into separate files, test.txt and test1.txt respectively.
To achieve this, I have to remove the XML tags around the base64 content. I am trying the commands below, but they are not working as expected.
sed -n '/test.txt"\>/,/\<\/dp:file\>/p' temp | perl -p -e 's#<dp:file name="temporary://test.txt">##g'|perl -p -e 's#</dp:file>##g' > test.txt
sed -n '/test1.txt"\>/,/\<\/dp:file\>/p' temp | perl -p -e 's#<dp:file name="temporary://test1.txt">##g'|perl -p -e 's#</dp:file></dp:response></env:Body></env:Envelope>##g' > test1.txt
The command below:
sed -n '/test.txt"\>/,/\<\/dp:file\>/p' temp | perl -p -e 's#<dp:file name="temporary://test.txt">##g'|perl -p -e 's#</dp:file>##g'
produces output:
XJzLXJlc3VsdHMtYWN0aW9uX18i
<dp:file name="temporary://test1.txt">lc3VsdHMtYWN0aW9uX18i</dp:response> </env:Body></env:Envelope>
However, I expect only the first line, XJzLXJlc3VsdHMtYWN0aW9uX18i, in the output. Where am I making a mistake?
When I run the command below, I get the expected output:
sed -n '/test1.txt"\>/,/\<\/dp:file\>/p' temp | perl -p -e 's#<dp:file name="temporary://test1.txt">##g'|perl -p -e 's#</dp:file></dp:response></env:Body></env:Envelope>##g'
It produces the following string:
lc3VsdHMtYWN0aW9uX18i
I can then easily route this to test1.txt file.
UPDATE
I have edited the question to update the source file content. The source file doesn't contain any newline character, and the current solution does not work in that case; I tried it and it failed. (wc -l temp outputs 1.)
OS: solaris 10
Shell: bash
sed -n 's_<dp:file name="\([^"]*\)">\([^<]*\).*_\1 -> \2_p' temp
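I added \1 -> to show the link from file name to content; for the content only, just remove that part.
This is the POSIX version, so on GNU sed use --posix.
This assumes that the base64-encoded content is on the same line as the surrounding tag (if it is spread over several lines, the command needs some modification). Note also that only the first <dp:file> element on a line is handled, and any text before it on that line is printed unchanged.
Thanks to JID for the full explanation below.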
How it works
sed -n
The -n option suppresses automatic printing, so sed outputs nothing unless explicitly told to print.
's_
Substitute the following regex, using _ as the delimiter that separates the regex from the replacement.
<dp:file name=
Literal text
"\([^"]*\)"
The parentheses form a capture group and must be escaped unless the -r option is used (-r is not available in POSIX sed). Everything inside them is captured. [^"]* means zero or more occurrences of any character that is not a quote, so this captures everything between the two quotes.
>\([^<]*\)
Again a capture group, this time capturing everything after the > up to (but not including) the next <.
.*
Everything else on the line
_\1 -> \2
This is the replacement: everything the regex matched is replaced with the first capture group, then ->, then the second capture group.
_p
Print the line (the p flag prints only when a substitution was made).
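For comparison, here is the same capture-group idea using Python's re module, as a minimal sketch (extract_dp_files and write_payloads are hypothetical names). Because findall returns every match, it picks up both <dp:file> elements on a single line, which the single sed substitution does not. Writing the base64 text unchanged matches what the question asks for; decode=True is an assumed optional extra that only works if the payload is complete base64 (the shortened sample strings in the question may not be).

```python
import base64
import re
from pathlib import Path

# Group 1: the name="..." value; group 2: the text up to the next "<".
DP_FILE = re.compile(r'<dp:file name="([^"]*)">([^<]*)</dp:file>')


def extract_dp_files(xml_text: str) -> dict[str, str]:
    """Map each dp:file name to its (still base64-encoded) content."""
    return dict(DP_FILE.findall(xml_text))


def write_payloads(files: dict[str, str], out_dir: Path, decode: bool = False) -> list[Path]:
    """Write each payload to out_dir, named after the last path component."""
    written = []
    for name, payload in files.items():
        target = out_dir / name.rsplit("/", 1)[-1]  # "temporary://test.txt" -> "test.txt"
        if decode:
            target.write_bytes(base64.b64decode(payload))
        else:
            target.write_text(payload, encoding="ascii")
        written.append(target)
    return written


if __name__ == "__main__":
    sample = ('<dp:response><dp:file name="temporary://test.txt">XJzLXJlc3VsdHMtYWN0aW9uX18i</dp:file>'
              '<dp:file name="temporary://test1.txt">lc3VsdHMtYWN0aW9uX18i</dp:file></dp:response>')
    for name, payload in extract_dp_files(sample).items():
        print(name, "->", payload)
```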
Resources
http://unixhelp.ed.ac.uk/CGI/man-cgi?sed
http://www.grymoire.com/Unix/Sed.html
/usr/xpg4/bin/sed works well here.
/usr/bin/sed does not work as expected if the file contains just one line.
The command below works for a file containing only a single line:
/usr/xpg4/bin/sed -n 's_<env:Envelope\(.*\)<dp:file name="temporary://BackUpDir/backupmanifest.xml">\([^>]*\)</dp:file>\(.*\)_\2_p' securebackup.xml 2>/dev/null
Without 2>/dev/null, this sed command outputs the warning sed: Missing newline at end of file.
The reason is as follows:
The default Solaris sed ignores an unterminated last line so as not to break existing scripts, because the original Unix implementation required every line to be terminated by a newline.
GNU sed is more relaxed, and the POSIX implementation accepts such a line but outputs a warning.
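The difference described above can be modelled in a few lines of Python. This is a hypothetical sketch of the two behaviours, not sed's code: strict=True drops any text after the final newline, as the answer says the default Solaris sed does, and strict=False keeps it, as GNU sed does. For a one-line file without a trailing newline, the strict reading yields no lines at all; appending a newline first (ensure_trailing_newline, also a hypothetical name) makes both readings agree.

```python
def read_lines(data: str, strict: bool) -> list[str]:
    """Split data into lines; strict mode ignores an unterminated last line."""
    lines = data.split("\n")
    last = lines.pop()  # text after the final "\n"; "" if data ends with one
    if last and not strict:
        lines.append(last)
    return lines


def ensure_trailing_newline(data: str) -> str:
    """Append a newline if non-empty data does not already end with one."""
    return data if not data or data.endswith("\n") else data + "\n"


if __name__ == "__main__":
    one_line = "<env:Envelope/>"  # no newline at end of file
    print(read_lines(one_line, strict=True))   # []
    print(read_lines(one_line, strict=False))  # ['<env:Envelope/>']
    print(read_lines(ensure_trailing_newline(one_line), strict=True))  # ['<env:Envelope/>']
```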

Xmlstarlet and sed to replace string in a file

I have a huge number of HTML files. I need to replace every , and " with the HTML entities &nsbquo and &quot respectively.
I need two steps for this:
1) Find all the text between tags; the replacement should apply only to this text.
2) Replace all required strings using sed.
My command for this is:
xmlstarlet sel -t -v "*//p" "index.html" | sed 's/,/\&nsbquo/'
This works, but now I don't know how to write the changes back to index.html.
sed has the -i option, but that requires passing the file name to sed, whereas in my case I have to use a pipe (|) to filter the required text out of the HTML file first.
Please help. I have been searching for this for two days with no luck.
Thank you,
Divya.
The main problem here is that in XML there is no difference between &quot; and ", so you can't use xmlstarlet to do this directly. You could replace " with a special string and then use sed to replace that with &quot;:
xmlstarlet ed -u "//p/text()" \
-x "str:replace(str:replace(., ',', '#NSBQUO#'), '\"', '#QUOT#')" \
quote.html | \
sed 's/#NSBQUO#/\&nsbquo\;/g; s/#QUOT#/\&quot\;/g' > quote-new.html
mv quote-new.html quote.html
NOTE: str:replace and other EXSLT functions were only added to xmlstarlet ed in version 1.3.0, so they were not available at the time this question was asked.
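The same placeholder trick can be sketched in Python with xml.etree.ElementTree, assuming the pages are well-formed XHTML without a default namespace (real-world HTML may need an HTML parser instead). escape_paragraph_text is a hypothetical name. As with the XPath //p/text(), only text nodes that are direct children of <p> are touched; text inside nested elements and outside paragraphs is left alone. The entity strings are copied from the answer; note that the standard HTML named entity for a single low-9 quotation mark is &sbquo;, so adjust comma_entity if that is what you need.

```python
import xml.etree.ElementTree as ET

COMMA_PH, QUOT_PH = "#NSBQUO#", "#QUOT#"  # placeholders, as in the answer


def _mark(text):
    """Swap , and " for placeholders; None or empty text is returned unchanged."""
    return text.replace(",", COMMA_PH).replace('"', QUOT_PH) if text else text


def escape_paragraph_text(xml_text: str, comma_entity: str = "&nsbquo;",
                          quot_entity: str = "&quot;") -> str:
    root = ET.fromstring(xml_text)
    for p in root.iter("p"):
        p.text = _mark(p.text)  # text before the first child element
        for child in p:
            child.tail = _mark(child.tail)  # text after each child, still inside <p>
    out = ET.tostring(root, encoding="unicode")
    # Placeholders pass through serialization untouched; swap in the entities last.
    return out.replace(COMMA_PH, comma_entity).replace(QUOT_PH, quot_entity)


if __name__ == "__main__":
    page = '<html><body><p>a, "b"<b>x,y</b> c,d</p><div>e,f</div></body></html>'
    print(escape_paragraph_text(page))
```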

How to search and replace in text files only?

I have a directory containing a bunch of files, some text some binary, with no consistent naming. I want to search and replace a string in text files only. So I went with:
perl -i -pne 's#/some/text/to/replace#/replacement/text#' *
Remove the -i option and you will see that binary files get caught. How do I modify this one-liner to skip binary files?
ack -n --text --sort -f . | xargs perl -i -pne 's…'
Abusing ack goes much quicker than writing your own solution with -T.
Well, this is all based on what your definition of a text file is. Perl 5 has the -T filetest operator that will tell you if a filename or filehandle is a text file (using Perl 5's definition):
perl -i -pne 'BEGIN{@ARGV=grep-T,@ARGV}s#regex#replacement#' *
The BEGIN block will filter out any files that don't pass the -T test, so they won't even be read (except for their first block because that is what -T uses to determine if they are text).
From perldoc -f -X:
The -T and -B switches work as follows. The first block or so of the file is examined for odd characters such as strange control codes or characters with the high bit set. If too many strange characters (>30%) are found, it's a -B file; otherwise it's a -T file. Also, any file containing a zero byte in the first block is considered a binary file. If -T or -B is used on a filehandle, the current IO buffer is examined rather than the first block. Both -T and -B return true on an empty file, or a file at EOF when testing a filehandle. Because you have to read a file to do the -T test, on most occasions you want to use a -f against the file first, as in next unless -f $file && -T $file .
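A rough Python approximation of the -T heuristic quoted above, plus a search-and-replace that skips binary files the way the BEGIN block does. This is a sketch under stated assumptions, not Perl's implementation: the 512-byte block size and the set of "ordinary" control characters are guesses, and newer Perls also treat valid UTF-8 specially, which this ignores. looks_like_text and replace_in_text_files are hypothetical names, and re.sub replaces every match (like s///g), whereas the one-liner's s### replaces only the first match on each line.

```python
import re
from pathlib import Path

BLOCK_SIZE = 512  # assumption: "the first block or so"
ORDINARY_CONTROLS = b"\t\n\r\f\b\x1b"  # assumption: control bytes not counted as odd


def looks_like_text(path: Path) -> bool:
    """Approximate Perl's -T: empty -> text, NUL -> binary, >30% odd bytes -> binary."""
    with open(path, "rb") as fh:
        block = fh.read(BLOCK_SIZE)
    if not block:
        return True
    if b"\0" in block:
        return False
    odd = sum(1 for b in block if (b < 32 and b not in ORDINARY_CONTROLS) or b > 127)
    return odd * 10 <= len(block) * 3  # binary only if odd bytes exceed 30%


def replace_in_text_files(paths, pattern: str, replacement: str) -> list[Path]:
    """Apply re.sub to each regular text file in paths; return the files changed."""
    changed = []
    for p in map(Path, paths):
        if not (p.is_file() and looks_like_text(p)):  # like: next unless -f && -T
            continue
        # surrogateescape keeps any non-UTF-8 bytes intact on the round trip.
        text = p.read_bytes().decode("utf-8", errors="surrogateescape")
        new = re.sub(pattern, replacement, text)
        if new != text:
            p.write_bytes(new.encode("utf-8", errors="surrogateescape"))
            changed.append(p)
    return changed
```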