How to insert a keyword after 3 CONSECUTIVE patterns using sed

The example below gives the required result, but it counts both consecutive and non-consecutive matches.
I need this logic to apply only to consecutive matches:
ORANGE should be inserted after every 3 consecutive occurrences of APPLE.
sed "/APPLE/{p;s/.*/1/;H;g;/^\(\n1\)\{3\}$/s//ORANGE/p;d}" < input.txt > output.txt
Input
APPLE
APPLE
APPLE
APPLE
APPLE
APPLE
APPLE
APPLE
MANGO
APPLE
APPLE
CURRENT OUTPUT
APPLE
APPLE
APPLE
ORANGE
APPLE
APPLE
APPLE
ORANGE
APPLE
APPLE
MANGO
APPLE
ORANGE -------->>> NOT NEEDED <<
APPLE

ORANGE should be inserted after every 3 continuous occurrences of APPLE
The following script:
#!/bin/bash
cat <<EOF |
APPLE
APPLE
APPLE
APPLE
APPLE
APPLE
APPLE
APPLE
MANGO
APPLE
APPLE
APPLE
EOF
sed '
# Add to hold space and inspect hold space
H
x
/^\(\nAPPLE\)\{3\}$/{
# 3 apples in hold space, means we add orange to pattern space
# and clear hold space
s///
x
s/$/\nORANGE/
x
}
# There are at least 3 lines in hold space
/^\(\n[^\n]*\)\{3\}/{
# Remove first line from hold space
s/\n[^\n]*//
}
x
'
outputs:
APPLE
APPLE
APPLE
ORANGE
APPLE
APPLE
APPLE
ORANGE
APPLE
APPLE
MANGO
APPLE
APPLE
APPLE
ORANGE

This might work for you (GNU sed):
sed '/APPLE/!b;n;//!b;n;//!b;a\ORANGE' file
This appends the line ORANGE after every 3 consecutive lines that contain the string APPLE.
To parametrize the above solution for n consecutive lines (e.g. 5), use:
sed '/APPLE/!b;'$(printf 'n;//!b;%.0s' {2..5})'a\ORANGE' file
Another alternative:
sed '/APPLE/!b;:a;N;/\n[^\n]*APPLE[^\n]*$/!b;s/[^\n]*/&/3;Ta;a\ORANGE' file
If the value to be appended is a variable, use:
sed '/APPLE/!b;:a;N;/\n[^\n]*APPLE[^\n]*$/!b;s/[^\n]*/&/3;Ta;a\'"$var" file
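For comparison, the counting logic that these sed commands implement can be sketched outside sed. The following is a minimal Python sketch, not one of the answers above; the function name insert_after_runs and its parameters are made up for illustration. It counts consecutive lines containing the pattern, emits the keyword after every n-th one, and resets the count on any non-matching line.

```python
def insert_after_runs(lines, pattern="APPLE", keyword="ORANGE", n=3):
    """Return a copy of lines with keyword added after every n consecutive matches.

    All names here are hypothetical; they are not part of any sed answer.
    """
    out = []
    run = 0  # length of the current run of matching lines
    for line in lines:
        out.append(line)
        if pattern in line:
            run += 1
            if run == n:
                out.append(keyword)
                run = 0  # start counting a fresh run
        else:
            run = 0  # a non-matching line breaks the run
    return out


# The input from the question: 8 APPLE lines, MANGO, then 2 APPLE lines.
sample = ["APPLE"] * 8 + ["MANGO", "APPLE", "APPLE"]
print("\n".join(insert_after_runs(sample)))
```

With the question's input, ORANGE appears after the 3rd and 6th APPLE only, because the MANGO line resets the count before a third consecutive APPLE is reached.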

Suppressing renumbering of ordered lists in export

I would like to refer to a few of Alan Perlis' Epigrams on Programming by their original numbers, but in an Org Mode ordered list.
When I export my document, the numbers I provide for the list items are discarded and replaced with new numbers, beginning with 1.
The raw source text:
#+begin_example
A few of Alan J. Perlis\rsquo{} [[http://www-pu.informatik.uni-tuebingen.de/users/klaeren/epigrams.html][Epigrams on Programming]]:
8. A programming language is low level when its programs require attention to the irrelevant.
15. Everything should be built top-down, except the first time.
31. Simplicity does not precede complexity, but follows it.
54. Beware of the Turing tar-pit in which everything is possible but nothing of interest is easy.
#+end_example
The text as rendered, and renumbered, by export:
#+begin_quote
A few of Alan J. Perlis\rsquo{} [[http://www-pu.informatik.uni-tuebingen.de/users/klaeren/epigrams.html][Epigrams on Programming]]:
8. A programming language is low level when its programs require attention to the irrelevant.
15. Everything should be built top-down, except the first time.
31. Simplicity does not precede complexity, but follows it.
54. Beware of the Turing tar-pit in which everything is possible but nothing of interest is easy.
#+end_quote
You can set the item number to whatever you want by beginning the text of the item with [@8] (for example). See the description of ordered lists in the Org manual.
A working example:
A list with custom ordering:
1. [@8] apple
1. [@77] orange
1. [@101] lime
When you export the document the list numbers will be 8, 77, and 101.

Sed's regex to eliminate a very specific string

Disclaimer:
I have found several examples in this site that address questions/problems similar to mine, though I was unfortunately not able to figure out the modifications that would need to be introduced to fit my needs.
The "Problem":
I have a list of servers (VMs) that have their UUID embedded as part of the name. I need to get rid of that in order to obtain the "pure/clean" server name. Now, the problem is precisely that: I need to get rid of the UUID (which has a very specific and constant format, more details on this below) and ONLY that, nothing else.
The UUID - as you might already know or have noticed - has a specific and constant format which consists of the following parts:
It starts with a dash (-).
Which is followed by a subset of 8 alphanumeric characters (letters are always lowercase).
Which is followed by a dash (-).
Which is followed by a subset of 4 alphanumeric characters (letters are always lowercase).
Which is followed by a dash (-).
Which is followed by a subset of 4 alphanumeric characters (letters are always lowercase).
Which is followed by a dash (-).
Which is followed by a subset of 4 alphanumeric characters (letters are always lowercase).
Which is followed by a dash (-).
Which is followed by a subset of 12 alphanumeric characters (letters are always lowercase).
Samples of results achieved using "my" """"code"""":
In this case the result is the expected one:
echo PRODSERVER0022-872151c8-1a75-43fb-9b63-e77652931d3f | sed 's/-[a-z0-9]*//g'
PRODSERVER0022
In this case the result is the expected one too:
echo PRODSERVER0022-872151c8-1a75-43fb-9b63-e77652931d3f_OLD | sed 's/-[a-z0-9]*//g'
PRODSERVER0022_OLD
Expected result: PRODSERVER0022-OLD
echo PRODSERVER0022-872151c8-1a75-43fb-9b63-e77652931d3f-OLD | sed 's/-[a-z0-9]*//g'
PRODSERVER0022
Expected result: PRODSERVER00-22
echo PRODSERVER00-22-872151c8-1a75-43fb-9b63-e77652931d3f-old | sed 's/-[a-z0-9]*//g'
PRODSERVER00
I know that, within the sed universe, a . means "any character", while a * means "any number of the preceding character". However, what I would need in this case, as I see it at least, is a way to tell sed to do the replacement only if this specific sequence is present (8 alphanumeric characters [any, but specifically 8, not more, not less]; followed by a dash, then followed by 4 alphanumeric characters [any, but specifically 4, not more, not less], etc..). So, the question would be: Is there a regex construction (or a combination [through piping I guess] of several of them, if it has to be the case) that can achieve the expected results in this case?
Note that even though servers may have additional dashes (-) as part of their names, the resulting substrings will never consist of exactly 8 or exactly 4 characters. They might, however, have 12 characters. Such a substring would match the last part of the UUID on its own, but it will not be at the end of the string, which gives us a way to discriminate between the two 12-character substrings (and it will not be a problem at all if there is a regex that can remove the UUID as a whole).
Try this to match the UUID.
-[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}
Embed it in the sed command line in the usual way. As Benjamin W. has said, we need to use extended regular expressions (the -E option).
sed -E 's/-[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}//g'
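To check the pattern against the four sample names from the question, here is a minimal Python sketch that uses the same expression. It is an illustration rather than part of the answer; the helper name strip_uuid is made up.

```python
import re

# Same pattern as the sed -E expression above:
# a dash followed by 8-4-4-4-12 lowercase hexadecimal digits.
UUID_RE = re.compile(
    r"-[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}"
)


def strip_uuid(name):
    """Remove every embedded UUID from name (hypothetical helper)."""
    return UUID_RE.sub("", name)


samples = [
    "PRODSERVER0022-872151c8-1a75-43fb-9b63-e77652931d3f",
    "PRODSERVER0022-872151c8-1a75-43fb-9b63-e77652931d3f_OLD",
    "PRODSERVER0022-872151c8-1a75-43fb-9b63-e77652931d3f-OLD",
    "PRODSERVER00-22-872151c8-1a75-43fb-9b63-e77652931d3f-old",
]
for s in samples:
    print(strip_uuid(s))
```

Because only the UUID is matched, the other dashes in the name survive. In the last sample the trailing -old is kept as well, since it is not part of the UUID.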

How can I convert text to title case?

I have a text file containing a list of titles that I need to change to title case (words should begin with a capital letter except for most articles, conjunctions, and prepositions).
For example, this list of book titles:
barbarians at the gate
hot, flat, and crowded
A DAY LATE AND A DOLLAR SHORT
THE HITCHHIKER'S GUIDE TO THE GALAXY
should be changed to:
Barbarians at the Gate
Hot, Flat, and Crowded
A Day Late and a Dollar Short
The Hitchhiker's Guide to the Galaxy
I wrote the following code:
while(<DATA>)
{
$_=~s/(\s+)([a-z])/$1.uc($2)/eg;
print $_;
}
But it capitalizes the first letter of every word, even words like "at," "the," and "a" in the middle of a title:
Barbarians At The Gate
Hot, Flat, And Crowded
A Day Late And A Dollar Short
The Hitchhiker's Guide To The Galaxy
How can I do this?
Thanks to Håkon Hægland, whose comment pointing to Lingua::EN::Titlecase gave me a way to get the output.
use Lingua::EN::Titlecase;
while(<DATA>)
{
    # Build a title-caser for the current line;
    # the object stringifies to the title-cased text when printed
    my $tc = Lingua::EN::Titlecase->new($_);
    print $tc;
}
You can also try using this regex: ^(.)(.*?)\b|\b(at|to|that|and|this|the|a|is|was)\b|\b(\w)([\w']*?(?:[^\w'-]|$)) and replace with \U$1\L$2$3\U$4\L$5. It works by matching the first letter of each word that is not in the list of connecting words, capitalizing it, and then matching the rest of the word. If the input may contain connecting words in uppercase, match case-insensitively. This seems to work in PHP; I don't know about Perl, but it will likely work.
^(.)(.*?)\b matches the first letter of the first word (group 1) and the rest of the word (group 2). This is done so that the first word is capitalized even when it is an article.
\b(word|multiple words|...)\b matches any connecting word (group 3) so that it is not capitalized.
(\w)([\w']*?(?:[^\w'-]|$)) matches the first letter of a word (group 4) and the rest of the word (group 5). Here I used [^\w'-] instead of \b so hyphens and apostrophes are counted as word characters too. This prevents 's from becoming 'S.
The \U in the replacement uppercases the following characters and \L lowercases them. If you want, you can add more articles or words to the regex to prevent capitalizing them.
UPDATE: I changed the regex so you can include connecting phrases too (multiple words). But that will still make a very long regex...
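The same idea (capitalize every word except a fixed list of connecting words, but always capitalize the first word) can also be written without a long regex. Below is a minimal sketch in Python rather than Perl, under the assumption that words are separated by whitespace; the names SMALL_WORDS and title_case are made up for illustration, and the word list is the one from the regex above.

```python
# Connecting words taken from the regex in the answer above.
SMALL_WORDS = {"at", "to", "that", "and", "this", "the", "a", "is", "was"}


def title_case(title):
    """Title-case a line, leaving connecting words lowercase (hypothetical helper).

    Assumption: words are separated by whitespace. A connecting word with
    punctuation attached (for example "and,") is treated as a normal word.
    """
    words = title.lower().split()
    result = []
    for index, word in enumerate(words):
        if index > 0 and word in SMALL_WORDS:
            result.append(word)  # keep connecting words lowercase
        else:
            result.append(word[0].upper() + word[1:])  # capitalize first letter only
    return " ".join(result)


for line in [
    "barbarians at the gate",
    "hot, flat, and crowded",
    "A DAY LATE AND A DOLLAR SHORT",
    "THE HITCHHIKER'S GUIDE TO THE GALAXY",
]:
    print(title_case(line))
```

Lowercasing the whole line first is what lets the all-caps titles come out right; only the first letter of each word is then raised, so 's stays lowercase.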

How will Perl 6 handle the new combining emoji length?

Some emoji now combine. For instance, U+1f441 (👁) U+200d (ZWJ) U+1f5e8 (🗨) combine to make 👁‍🗨 (I am a witness). Rakudo 2016.07.1 on MoarVM 2016.07 says there are two graphemes:
> "\x[1f441]\x[200d]\x[1f5e8]".chars
2
I think that should be 1. It seems to have a similar problem with
> "\x[1f441]\x[fe0f]\x[200d]\x[1f5e8]\x[fe0f]".chars
2
But at least it handles U+fe0f (VS-16, emoji representation) correctly.
Are there plans to fix this in a later version of Perl 6 or am I misunderstanding the intent of the chars method?
The ZWJ sequence you mentioned is only part of Unicode Emoji 4.0 which is still in draft status and planned for release in November 2016. Under this new version, U+1F5E8 has the Grapheme_Cluster_Break property E_Base_GAZ (EBG), so the sequence should indeed form a single grapheme cluster.
I'm sure that Perl 6 will catch up at some point.
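To make the distinction between code points and graphemes concrete, here is a small Python sketch. It is an illustration only: Python's len counts code points, not grapheme clusters, so it says nothing about how Rakudo segments the string. It just lists the three code points that the sequence from the question is built from.

```python
import unicodedata

# The ZWJ sequence from the question: EYE, ZERO WIDTH JOINER, LEFT SPEECH BUBBLE.
seq = "\U0001F441\u200D\U0001F5E8"

for ch in seq:
    # Print each code point with its Unicode character name.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# len() counts code points, so this prints 3, not a grapheme count.
print(len(seq))
```

A grapheme-aware count, which is what chars is meant to give, depends on the segmentation rules of the Unicode version the implementation follows.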

Using a newline character with Core Data and displaying it properly in the iPhone

I'm storing a list of things in one string stored in a Core Data database. For example, the stored string would look like @"apple \n pear \n orange". I'm using a UITextView to display the list and I want it to display like:
apple
pear
orange
..but it just displays:
apple \n pear \n orange
Anyone know how to get it to honor the newline characters?
I'm setting the UITextView's text like this, where list is equal to @"apple \n pear \n orange":
myTextView.text = fruits.list;
This is happening because Xcode takes "\n" as the literal string \n.
You have to have two backslashes, because the first is an escape character that tells Xcode the second \ is to be taken literally. It looks like:
NSString *list = @"apple \\n pear \\n orange";
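The difference between a real newline and the two characters backslash and n can be shown in a few lines. This is a minimal sketch in Python, not Objective-C, and it assumes the value that reaches the text view contains a backslash followed by n (the thread does not show how the string was entered). The helper name unescape_newlines is made up.

```python
# Two characters per "newline": a backslash followed by the letter n.
stored = "apple \\n pear \\n orange"

# One character per newline: an actual line feed.
real = "apple \n pear \n orange"


def unescape_newlines(text):
    """Turn each literal backslash-n pair into a line feed (hypothetical helper)."""
    return text.replace("\\n", "\n")


print(stored)                     # shows the backslashes on a single line
print(unescape_newlines(stored))  # shows three lines
```

If the stored text really holds backslash-n pairs, a replacement of this kind before assigning the text would produce separate lines; if it already holds line feeds, no conversion is needed.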