I like to cut my images' IPTC core Title to a certain number - exiftool

I like to cut my images' IPTC core Title, Description, Headline characters at to a certain number.
So for example if an image has 300 characters, I would like to cut it at 195 characters to be 195 character long.
It would be great if it would cut it at the nearest word, resulting a title below 195 character

To cut to a specific length, you could use this command:
exiftool -if "length($Title)>195" "-Title<${Title;s/^(.{1,195}).*/$1/s}" /path/to/files/
Cutting on a word, based upon this StackOverflow answer, would be:
exiftool -if "length($Title)>195" "-Title<${Title;s/^(.{1,195}(?!\w)).*/$1/s}" /path/to/files/
This command creates backup files. Add -overwrite_original to suppress the creation of backup files. Add -r to recurse into subdirectories. If this command is run under Unix/Mac, reverse any double/single quotes to avoid bash interpretation.

Related

How to replace the same character in multiple text files?

So I have over 100 text files, all of which are over the size required to be opened in a normal text editor (eg; notepad, notepad++). Meaning I cannot use those mentioned.
All text files contain the same format, they contain:
abc0001:00000009a
abc0054:000000809a
abc00888:054450000009a
and so on..
I was wondering, how do I replace the ":" in each of those text files to then be "\n" (regex for new line)
So then it would be:
abc0001
00000009a
abc0054
000000809a
abc00888
054450000009a
How would I do this to all of the 100 text files, without doing this manually and individually. (if there's any way?)
Any help is appreciated.
You can use sed. The following does something similar to what you want. The question concerns Unix, but a lot of Unix utilities have been ported to MS Windows (even sed): http://gnuwin32.sourceforge.net/packages/sed.htm
UNIX: Replace Newline w/ Colon, Preserving Newline Before EOF
Something like (where you provide your text file as input, and the output becomes your new text file):
sed 's/:/\n/g'

Exiftool - modify metadata format

Suppose I have 5000 images with following metadata in the LABEL field.
0001 ELEPHANT
0002 ELEPHANT
0003 ELEPHANT
...
4999 ELEPHANT
5000 ELEPHANT
I wish to change the format to:
ELEPHANT-0001
ELEPHANT-0002
ELEPHANT-0003
…
ELEPHANT-4999
ELEPHANT-5000
In other words, I want to do the following for a metadata field of multiple images:
#### NAME —> NAME-####
From what I can gather there could be two ways of doing this
Ignore the current metadata in the images, and reference a (plain text? csv?) file that I prepare separately; or
Read the file's metadata as a string, identify the space and the number preceding the space, save that number, and finally make a new string by concatenating the number and space, and adding a hyphen in between!
Any suggestions?
Expanding upon the answer I gave in the exiftool forums.
The basic command would be
exiftool "-LABEL<${LABEL;s/(\d{4}) (.*)/$2-$1/}" <FileOrDir>
You basically want to copy a tag into the same tag, with some modifications. The option to copy a tag is the less than (or greater than) symbol < or >. A common mistake is to use the equal sign = which is used to assign a static value to a tag.
To do the modification to the tag, it takes the Advance Formatting option, which is actually some in-line perl code. In this example, the tag is treated as a perl string and a regex substitution is used. It matches and captures the first four digits (\d{4}), matches the space (but doesn't capture it), then matches and captures the rest of the tag (.*). The two captures are assigned to the variables $1 and $2, respectively. In the replace half of the substitution $2-$1, the two captures are reversed with the hyphen between them.
To take full advantage of the advance formatting, some basic perl and regex knowledge is helpful.
Once you are sure of the command, you can add -overwrite_original to suppress the generation of backup files and -r to recurse into subdirectories.

How do I tell org-mode to disable headings in verbatim text?

How do I make org mode not interpret a line that begins with an asterisk as a headline? I have some verbatim text in my org mode document. Some of the lines begin with an asterisk. Org mode interprets these lines as headlines. I don't want that.
Here is the text with some context:
* 20160721 Headline for July 21, 2016
I created a git repository for rfc-tools. It's in
~/Documents/rfc-tools.
Renamed grep-rfc-index.sh to search-rfc-index.sh because it searches.
That it uses grep is irrelevant.
Wrote a README.md for the project. Here it is:
#+BEGIN_SRC text
----- BEGIN QUOTED TEXT -----
This is the README.md for rfc-tools, a collection of programs for
processing IETF RFCs.
* fetch-rfcs-by-title.sh downloads into the current directory the RFCs
whose titles contain the string given on the command line. Uses an
rfc-index file in the current directory. Prefers the PDF version of
RFCs but will obtain the text version if the PDF is not available.
* fetch-sip-rfcs.sh downloads RFCs that contain "Session Initiation"
in their titles into the current directory.
* search-rfc-index.sh searches an rfc-index file in the current
directory for the string given on the command line. The string can
contain spaces.
* join-titles.awk turns the contents of an rfc-index file into a
series of long lines. Each line begins with the RFC number, then a
space, then the rest of the entry from the rfc-index.
----- END QUOTED TEXT -----
#+END_SRC
I want the lines between "----- BEGIN QUOTED TEXT -----" and "----- END QUOTED TEXT -----" to be plain text and subordinate to the headline "20160721 Headline for July 21, 2016". Org mode interprets all lines that begin with an asterisk as top-level headlines.
By the way, the verbatim text is Markdown. I hope that doesn't matter.
worked for me:
#+BEGIN_SRC markdown
Try wrapping your text in one of the various special block tags. For example you could try putting your text inside these tags:
#+BEGIN_SRC text
...
#+END_SRC
Here is a screenshot of how the formatting turns out on my Emacs:
If that doesn't meet your needs, you could try:
#+BEGIN_EXAMPLE
...
#+END_EXAMPLE
Which will render everything inside the tags without markup and in a monospace font.
If that doesn't work either, you could try one of the other kinds of tags listed here.
Escape the * with a comma like this,*
Probably if you type C-c ' to enter a special edit and then exit, org will do that for you.
I think the answer is "You can't do that". I found a way to work around the problem using drawers. The org-mode manual explains that a drawer is a place to put text that you don't want to see all of the time.
A StackExchange user had a question about
getting a custom org drawer to open/close. It seems that for older versions of org-mode, you must tell org-mode the names of your drawers. E.g. If you have a drawer named "COMMANDS"
:COMMANDS:
ls
cat
grep
:END:
you must tell org-mode the name of the drawer using the +DRAWERS keyword:
#+DRAWERS COMMAND
and restart org-mode.
I found a solution:
Escape Character
You may sometimes want to write text that looks like Org syntax, but should really read as plain text. Org may use a specific escape character in some situations, i.e., a backslash in macros (see Macro Replacement) and links (see Link Format), or a comma in source and example blocks (see Literal Examples). In the general case, however, we suggest to use the zero width space. You can insert one with any of the following:
C-x 8 zero width space
C-x 8 200B
For example, in order to write ‘[[1,2]]’ as-is in your document, you may write instead
[X[1,2]]
where ‘X’ denotes the zero width space character.
How to remove zero width space:
sed -i "s/$(echo -ne '\u200b')//g" abc.txt

How can I strip out "file separator" characters from CSS/text files?

My CSS files have become contaminated with "file separator" characters (AKA "INFORMATION SEPARATOR FOUR" or ALT/028 characters). How can I get rid of them?
This is the character:
http://www.fileformat.info/info/unicode/char/1c/index.htm
Background
I manage a number of .CSS text files that are fairly similar. Unfortunately a number of these file have somehow got "file separator" characters pasted into them. Although they do still seem to work in browsers any file that has one of these characters anywhere within it can not be indexed by my desktop search utility (X1 Search). And this is making them extremely hard for me to manage because I need to compare CSS files contantly.
[Bizarrely X1 Search ignores the character if the filename extension is .TXT but files to index the entire file if the filename extension is .CSS]
Worse this "file separator" character is almost invisible within my text editor (TextPad 7.2). The only way I can detect it is to make spaces and carriage returns visible and then it appears as blank space. Worse still it appears to be impossible to search for using text search.
To make it clear what I mean an example that I have pasted into this page. The "file separator" character is on LineB below
LineA
LineB
LineC
LineD
Is there any way to remove this character from multiple text (in this case CSS) files at once?
NB I do NOT want to remove the whole line, just the one character(!)
Thanks
J
P.S. I am running on Windows7 (x64). I am using TextPad 7.3.
I have eventually managed to answer my own question.
Text Crawler and the use of a regular expression of "\x1c" appears to be the answer.
Fwiw, both Agent Ransack and FileLocator Pro filter out any characters in the ASCII range 0-31 (excluding 0x09 - tab) from the input field.

What is a good way to encode diffs in absence of the source text?

The situation is this like: there's some text
hello world!
It is processed by my tool and is converted to some symbolic form, e.g.
[hello#0, world#6]
(notice how the ! is discarded).
Now my tool wants to recommend adding there to the original source text. My tool can send textual data back, so it makes sense to encode the delta in some format and send it back. Here's an example with diff:
1c1
< hello world!
---
> hello there world!
But the problem is that I cannot use the classical diff format because I don't have the original text any longer, and I can't produce that text from my model accurately (for example, because the ! is missing).
My question is, is there some standard textual format that can encode modifications in the middle of the line without knowing the entire line? Something like:
insert 'there ' at 1:6
I know diff itself has a few other possible output formats, but I could not spot anyone which can add things to the middle of a line without needing the entire new line content.
One of the output formats of diff is an ed script with diff -e. Now, diff produces ed scripts that make line-oriented edits, like delete lines or insert lines.
But since you're not necessarily using diff, you can make your tool output a finer-grained ed script which performs insertions and substitutions within a line.
Ed does not support numeric addressing of the characters within a line, but it can be done with regular expression match/replace.
To replace an n-character sequence starting at column m (counting from 1) with the text rep, you can use this command:
s/\(.\{m-1\}\).\{n\}/\1rep/
Here m-1 and n are replaced by decimal numbers. If m happens to be 1, then just
s/.\{n\}/&rep/
Your program has be careful about escaping the characters of rep, of course.
The edits are then applied to a file like this:
$ cp file file.tmp # operate in-place on file.tmp
$ (cat diffs ; echo wq) | ed -q file.tmp # edits are in file "diffs"