Exiftool - modify metadata format - exiftool

Suppose I have 5000 images with following metadata in the LABEL field.
0001 ELEPHANT
0002 ELEPHANT
0003 ELEPHANT
...
4999 ELEPHANT
5000 ELEPHANT
I wish to change the format to:
ELEPHANT-0001
ELEPHANT-0002
ELEPHANT-0003
…
ELEPHANT-4999
ELEPHANT-5000
In other words, I want to do the following for a metadata field of multiple images:
#### NAME —> NAME-####
From what I can gather there could be two ways of doing this
Ignore the current metadata in the images, and reference a (plain text? csv?) file that I prepare separately; or
Read the file's metadata as a string, identify the space and the number preceding the space, save that number, and finally make a new string by concatenating the number and space, and adding a hyphen in between!
Any suggestions?

Expanding upon the answer I gave in the exiftool forums.
The basic command would be
exiftool "-LABEL<${LABEL;s/(\d{4}) (.*)/$2-$1/}" <FileOrDir>
You basically want to copy a tag into the same tag, with some modifications. The option to copy a tag is the less than (or greater than) symbol < or >. A common mistake is to use the equal sign = which is used to assign a static value to a tag.
To do the modification to the tag, it takes the Advance Formatting option, which is actually some in-line perl code. In this example, the tag is treated as a perl string and a regex substitution is used. It matches and captures the first four digits (\d{4}), matches the space (but doesn't capture it), then matches and captures the rest of the tag (.*). The two captures are assigned to the variables $1 and $2, respectively. In the replace half of the substitution $2-$1, the two captures are reversed with the hyphen between them.
To take full advantage of the advance formatting, some basic perl and regex knowledge is helpful.
Once you are sure of the command, you can add -overwrite_original to suppress the generation of backup files and -r to recurse into subdirectories.

Related

I like to cut my images' IPTC core Title to a certain number

I like to cut my images' IPTC core Title, Description, Headline characters at to a certain number.
So for example if an image has 300 characters, I would like to cut it at 195 characters to be 195 character long.
It would be great if it would cut it at the nearest word, resulting a title below 195 character
To cut to a specific length, you could use this command:
exiftool -if "length($Title)>195" "-Title<${Title;s/^(.{1,195}).*/$1/s}" /path/to/files/
Cutting on a word, based upon this StackOverflow answer, would be:
exiftool -if "length($Title)>195" "-Title<${Title;s/^(.{1,195}(?!\w)).*/$1/s}" /path/to/files/
This command creates backup files. Add -overwrite_original to suppress the creation of backup files. Add -r to recurse into subdirectories. If this command is run under Unix/Mac, reverse any double/single quotes to avoid bash interpretation.

officejs : Search Word document using regular expression

I want to search strings like "number 1" or "number 152" or "number 36985".
In all above strings "number " will be constant but digits will change and can have any length.
I tried Search option using wildcard but it doesn't seem to work.
basic regEx operators like + seem to not work.
I tried 'number*[1-9]*' and 'number*[1-9]+' but no luck.
This regular expression only selects upto one digit. e.g. If the string is 'number 12345' it only matches number 12345 (the part which is in bold).
Does anyone know how to do this?
Word doesn't use regular expressions in its search (Find) functionality. It has its own set of wildcard rules. These are very similar to RegEx, but not identical and not as powerful.
Using Word's wildcards, the search text below locates the examples given in the question. (Note that the semicolon separator in 1;100 may be soemthing else, depending on the list separator set in Windows (or on the Mac). My European locale uses a semicolon; the United States would use a comma, for example.
"number [0-9]{1;100}"
The 100 is an arbitrary number I chose for the maximum number of repeats of the search term just before it. Depending on how long you expect a number to be, this can be much smaller...
The logic of the search text is: number is a literal; the valid range of characters following the literal are 0 through 9; there may be one to one hundred of these characters - anything in that range is a match.
The only way RegEx can be used in Word is to extract a string and run the search on the string. But this dissociates the string from the document, meaning Word-specific content (formatting, fields, etc.) will be lost.
Try putting < and > on the ends of your search string to indicate the beginning and ending of the desired strings. This works for me: '<number [1-9]*>'. So does '<number [1-9]#>' which is probably what you want. Note that in Word wildcards the # is used where + is used in other RegEx systems.

How to fill a field with spaces until a length in Notepad++

I've prepared a macro in Notepad++ to transform a ldif file in a csv file with a few fields. Everything is OK but I have a final problem: I have to have 2 fields with a specific length and in this moment I cannot ensure that length because in the source file they are not coming so
For instance, I generate this line:
12345,namenamename,123456
And I have to ensure that the 2nd and 3rd fields have 30 (filling with spaces at right side) and 9 (filling with zeros at left) characters, so in this case I should generate:
12345,namenamename ,000123456
I haven't found how Notepad++ could match a pattern in order to add spaces/zeros, so I have though in to add 1 space/zero to the proper field and repeat this step so many times as needed to ensure the lengths (this is, 29 and 8, because they cannot come empty) and search with the length in the regex (for instance: \d{1,8} for the third field)
My question is: can I repeat only one step of the macro several times (and the rest of the macro only 1 repetition)?
I've read the wiki related to this point (http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Editing_Configuration_Files#.3CMacros.3E) and I don't found anything neither
If not possible, how could be a good solution? Create another 2 different macros and after execute the main one, execute this new 2 macros several times?
Thanks in advance!
A two pass solution with Notepad++ is possible. Find a pair of characters or two short sequence of characters that never occurs in your data file. I will use =#<= and =>#= here.
First pass, generate or convert the input text into the form 12345,=#<=namenamename______________________________,000000000123456=>#=. Ie add 30 spaces after the name and nine zeroes before the number (underscores used here just to make things clearer).
Second pass, do a regular expression search for =#<=(.{30})_*,0*(\d{9})=>#= and replace with \1,\2.
I have just suggested a similar solution in special timestamp format of csv

Sed for partial replacement?

Imagine I have a file that has the following type of line:
FIXED_DATA1 VARIABLE_DATA FIXED_DATA2
I want to change the fixed data and leave the variable data as is. For various reasons, using two sed operations to replace the fixed data will not work. For instance, the fixed fields might be double-quotes, and the line has other areas containing them, thus really the regex is written to match a pattern in the variable data and the fixed data.
If I'm bent on using sed, is there a way to change both fixed data fields at once while leaving the variable field unchanged?
Thanks.
You need to partition the line into the three pieces, replace the outer two and leave the middle alone:
sed 's/^FIX1 \(.*\) FIX2$/New \1 End/'
You can make the beginning and end matches more complex as needed.

Removing a trailing Space from Regex Matched group

I'm using regular expression lib icucore via RegKit on the iPhone to
replace a pattern in a large string.
The Pattern i'm looking for looks some thing like this
| hello world (P1)|
I'm matching this pattern with the following regular expression
\|((\w*|.| )+)\((\w\d+)\)\|
This transforms the input string into 3 groups when a match is found, of which group 1(string) and group 3(string in parentheses) are of interest to me.
I'm converting these formated strings into html links so the above would be transformed into
Hello world
My problem is the trailing space in the third group. Which when the link is highlighted and underlined, results with the line extending beyond the printed characters.
While i know i could extract all the matches and process them manually, using the search and replace feature of the icu lib is a much cleaner solution, and i would rather not do that as a result.
Many thanks as always
Would the following work as an alternate regular expression?
\|((\w*|.| )+)\s+\((\w\d+)\)\| Where inserting the extra \s+ pulls the space outside the 1st grouping.
Though, given your example & regex, I'm not sure why you don't just do:
\|(.+)\s+\((\w\d+)\)\|
Which will have the same effect. However, both your original regex and my simpler one would both fail, however on:
| hello world (P1)| and on the same line | howdy world (P1)|
where it would roll it up into 1 match.
\|\s*([\w ,.-]+)\s+\((\w\d+)\)\|
will put the trailing space(s) outside the capturing group. This will of course only work if there always is a space. Can you guarantee that?
If not, use
\|\s*([\w ,.-]+(?<!\s))\s*\((\w\d+)\)\|
This uses a lookbehind assertion to make sure the capturing group ends in a non-space character.