Multi-line search with sed - sed

I need to append a few lines to a configuration file. The format is something like the following:
[Topic1]
param=foo
param=bar
param=foobar
[Topic2]
param=one
param=two
etc...
I am trying to write a script using sed to append parameters to a specific topic. Since all the topics have param=, I can't just insert a line after the last occurrence of that string. I also can't count on the value of the last parameter being consistent, so I can't, for example, just insert a line after the string param=two.
Any help would be appreciated. I'm not too familiar with multiline sed-fu.
Thanks!

sed -i -r ':a; N; $!ba; s/\[Topic1\]\n(param=[a-zA-Z]*\n)*/&param=VALUE\n/g' FILE_NAME
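Here, :a defines a label, N appends the next input line (preceded by a newline) to the pattern space, and $!ba branches back to :a on every line except the last. The whole file therefore ends up in the pattern space, so the expression can match \n.
The substitution then matches [Topic1] followed by an arbitrary number of param=xxx lines and appends param=VALUE to the end of the match (&). Note that [a-zA-Z]* only matches alphabetic values; widen the bracket expression if values can contain digits or other characters.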
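The same append can also be sketched without loading the whole file into the pattern space. The following is a minimal awk-based alternative, not the answer's sed method; the function name append_param and its argument order are made up for illustration, and section headers are assumed to start in column one with [.

```shell
# Hypothetical helper: append_param TOPIC LINE < config
# Prints the config with LINE added as the last entry of [TOPIC].
append_param() {
  awk -v topic="[$1]" -v line="$2" '
    # Any section header closes the section before it.
    /^\[/ {
      if (in_topic) print line      # leaving the target section: add the line
      in_topic = ($0 == topic)      # is this header the target section?
    }
    { print }
    # The target section was the last one in the file.
    END { if (in_topic) print line }
  '
}

# Usage with the sample layout from the question:
printf '%s\n' '[Topic1]' 'param=foo' '[Topic2]' 'param=one' |
  append_param Topic1 'param=VALUE'
```

Because the new line is emitted when the next header (or end of input) is reached, it does not matter what the values of the existing parameters look like. Values containing backslashes would need extra care, since awk -v interprets escape sequences.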

Related

remove last delimiter in sed/awk/perl

An input file is given in which each line contains delimited data with an extra delimiter at the end, in the data and/or the header, with or without enclosures.
The extra delimiter at the end may or may not be followed by spaces.
Scenario 1 : Header & Data contain extra delimiter at the end
eno|ename|address|
A|B|C|
D|E|F|
Scenario 2 : Header doesn't contain extra delimiter at the end
eno|ename|address
A|B|C|
D|E|F|
Scenario 3 : With enclosures
eno|ename|address|
1|2|"A"|
Final output has to be like
Scenario 1 :
eno|ename|address
A|B|C
D|E|F
Scenario 2 :
eno|ename|address
A|B|C
D|E|F
Scenario 3 :
eno|ename|address
1|2|"A"
This is the solution I have tried so far, but it doesn't work for all three scenarios. Is there a single command in sed/awk/perl that supports all three?
perl -pne 's/(.*)\|/$1/' filename
Could you please try following.
awk '{gsub(/\|$|\| +$/,"")} 1' Input_file
Explanation:
gsub is an awk function that globally substitutes every match of a pattern with the given value.
Explanation of regex:
/\|$|\| +$/: There are 2 parts to this regex, separated by an unescaped |. The first is \|$ and the second is \| +$. The first removes a | at the end of the line, and the second removes a | followed by one or more spaces at the end of the line. So it takes care of both conditions.
perl -lpe 's/\|\s*$//' file
will do it. That only removes a pipe followed by optional whitespace at the end of each line. Note the $ line anchor.
I added -l because each line's newline would otherwise be removed by the s/// command (\s* matches it); -l strips the newline on input and puts it back on output.
All you need is this:
sed 's/|$//'
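As a hedged sketch, the one-liners above can be wrapped in a small shell function that also tolerates blanks after the final pipe. The function name strip_trailing_delim is made up for illustration, and the delimiter is assumed to be |.

```shell
# Hypothetical helper: strip_trailing_delim < input
# Removes one trailing | (plus any whitespace after it) from each line.
strip_trailing_delim() {
  sed 's/|[[:space:]]*$//'
}

# Usage covering the scenarios from the question:
# a header without the extra delimiter, data with "| ", and an enclosed value.
printf '%s\n' 'eno|ename|address' 'A|B|C| ' '1|2|"A"|' | strip_trailing_delim
```

Lines that do not end in a delimiter pass through unchanged, which is what makes a single command work for all three scenarios.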
A bit more generic. Let's assume you have the same problem, but with different field separators in different files. Some of these field separators are regular expressions (e.g. a sequence of blanks), others are just a single character c. With a tiny little awk program you can get far:
# remove_last_empty_field.awk
# 1. Get the correct `fs`
BEGIN { fs=FS; if(length(FS)==1) fs=(FS==" ") ? "[[:blank:]]+" : "["FS"]" }
# remove the empty field
{ sub(fs"$","") }
# Print the current record
1
Now you can run this on your various files (FS is set with -v so that it is already in place when the BEGIN block runs):
$ awk -f remove_last_empty_field.awk f1.txt
$ awk -v FS="|" -f remove_last_empty_field.awk f2.txt
$ awk -v FS="[|.*]" -f remove_last_empty_field.awk f3.txt
perl -pi -e 's/\|$//' Your_FIle

Extract filename from multiple lines in unix

I'm trying to extract the name of the file name that has been generated by a Java program. This Java program spits out multiple lines and I know exactly what the format of the file name is going to be. The information text that the Java program is spitting out is as follows:
ABCASJASLEKJASDFALDSF
Generated file YANNANI-0008876_17.xml.
TDSFALSFJLSDJF;
I'm capturing the output in a variable and then applying a sed operator in the following format:
sed -n 's/.*\(YANNANI.\([[:digit:]]\).\([xml]\)*\)/\1/p'
The result set is:
YANNANI-0008876_17.xml.
However, my problem is that I want the extraction of the filename to stop at .xml. The last dot should never be extracted.
Is there a way to do this using sed?
Let's look at what your capture group actually captures:
$ grep 'YANNANI.\([[:digit:]]\).\([xml]\)*' infile
Generated file YANNANI-0008876_17.xml.
That's probably not what you intended:
\([[:digit:]]\) captures just a single digit (and the capture group around it doesn't do anything)
\([xml]\)* is "any of x, m or l, 0 or more times", so it matches the empty string (as above – or the line wouldn't match at all!), x, xx, lll, mxxxxxmmmmlxlxmxlmxlm, xml, ...
There is no way the final period is removed because you don't match anything after the capture groups
What would make sense instead:
Match "digits or underscores, 0 or more": [[:digit:]_]*
Match .xml, literally (escape the period): \.xml
Make sure the rest of the line (just the period, in this case) is matched by adding .* after the capture group
So the regex for the string you'd like to extract becomes
$ grep 'YANNANI.[[:digit:]_]*\.xml' infile
Generated file YANNANI-0008876_17.xml.
and to remove everything else on the line using sed, we surround regex with .*\( ... \).*:
$ sed -n 's/.*\(YANNANI.[[:digit:]_]*\.xml\).*/\1/p' infile
YANNANI-0008876_17.xml
This assumes you really meant . after YANNANI (any character).
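Since the question captures the program output in a variable, the corrected expression can be sketched as a small function that takes that text as an argument. The name extract_name and the sample variable java_output are made up for illustration; the pattern is the one derived above.

```shell
# Hypothetical helper: extract_name "TEXT"
# Prints the YANNANI...xml filename found in TEXT, without the trailing period.
extract_name() {
  printf '%s\n' "$1" |
    sed -n 's/.*\(YANNANI.[[:digit:]_]*\.xml\).*/\1/p'
}

# Usage with the sample output from the question:
java_output='ABCASJASLEKJASDFALDSF
Generated file YANNANI-0008876_17.xml.
TDSFALSFJLSDJF;'
extract_name "$java_output"
```

Lines that do not contain a matching filename produce no output, because -n suppresses automatic printing and p prints only after a successful substitution.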
You can call sed twice: first in printing and then in replacement mode:
sed -n 's/.*\(YANNANI.\([[:digit:]]\).\([xml]\)*\)/\1/p' | sed 's/\.$//g'
The last sed removes the trailing . from the end of every line produced by the first sed.
Or you can go for an awk solution if you prefer:
awk '/.*YANNANI.[0-9]+.[0-9]+.xml/{print substr($NF,1,length($NF)-1)}'
This prints the last field, minus its final character (removed with substr), of every line that matches the regex. It relies on the filename being the last whitespace-separated field on the line.

Insert specific lines from file before first occurrence of pattern using Sed

I want to insert a range of lines from a file, say something like 210,221r before the first occurrence of a pattern in a bunch of other files.
As I am clearly not a GNU sed expert, I cannot figure out how to do this.
I tried
sed '0,/pattern/{210,221r file
}' bunch_of_files
But apparently file is read from line 210 to EOF.
Try this:
sed -r 's/(FIND_ME)/PUT_BEFORE\1/' test.text
-r enables extended regular expressions
the string you are looking for ("FIND_ME") is inside parentheses, which creates a capture group
\1 puts the captured text into the replacement.
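About your second question: You can read the replacement from a file like this*:
sed -r "s/(FIND_ME)/$(cat REPLACEMENT.TXT)\1/" test.text
If you escape the special characters inside REPLACEMENT.TXT beforehand (/, &, \ and newlines), you are golden.
*= this depends on your shell, not on your terminal emulator. It works in bash, but only inside double quotes; with single quotes the command substitution would reach sed unexpanded.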
In https://stackoverflow.com/a/11246712/4328188 CodeGnome gave some "sed black magic" :
In order to insert text before a pattern, you need to swap the pattern space into the hold space before reading in the file. For example:
sed '/pattern/ {
h
r file
g
N
}' in
However, to read specific lines from file, one may have to use a two-calls solution similar to dummy's answer. I'd enjoy knowing of a one-call solution if it is possible though.
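One way to do this in a single call per target file is to switch tools. The following is a hedged awk sketch, not a sed solution; the function name insert_lines and its argument order are made up for illustration, and the result goes to standard output rather than being edited in place.

```shell
# Hypothetical helper: insert_lines FIRST LAST SOURCE PATTERN TARGET
# Prints TARGET with lines FIRST..LAST of SOURCE inserted before the
# first line of TARGET that matches PATTERN (an awk regular expression).
insert_lines() {
  awk -v first="$1" -v last="$2" -v pat="$4" '
    # First file (SOURCE): remember only the requested line range.
    FNR == NR { if (FNR >= first && FNR <= last) buf = buf $0 ORS; next }
    # Second file (TARGET): emit the saved lines once, before the first match.
    !done && $0 ~ pat { printf "%s", buf; done = 1 }
    { print }
  ' "$3" "$5"
}

# Possible usage for several files (file names are placeholders):
# for f in target1 target2; do
#   insert_lines 210 221 file pattern "$f" > "$f.new" && mv "$f.new" "$f"
# done
```

The FNR == NR test is the usual awk idiom for "still reading the first file"; it assumes SOURCE is not empty.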

Printing all words that start with "#" using sed in BASH

I have a file with a lot of text, but I want to print only words that contain "#" at the beginning. Ex:
My name is #Laura and I live in #London. Name=#Laura. City=#London
How can I print all words that start with #? I did the following and it worked, but I want to do it using sed. I tried several patterns, but I cannot make it print anything.
grep -o -E "#\w+" file.txt
Thanks
Use this sed command:
sed 's/[^#]*\(#[^ .]*\)/\1\n/g' file.txt
Explanation: we invoke sed's substitution command, which has the structure sed 's/regex/replace/options'. The g option makes sure the substitution is applied to every match on the line.
We look for a run of non-# characters followed by a # and any number of characters that are neither a space nor a period: #[^ .]*. We put this last part in a group \(\) and reuse it in the replacement as \1.
Note that we add a newline after each match (\n in the replacement is understood by GNU sed); you can also get the output on a single line by omitting the \n. Text after the last # word on a line is left in place, and lines without any # are printed unchanged.
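For input where lines may have no # at all, or trailing text after the last # word, a slightly different pipeline may be easier to reason about. This is a hedged sketch that combines tr with sed rather than using sed alone; the function name hash_words is made up for illustration, and a "word" is assumed to consist of letters, digits and underscores.

```shell
# Hypothetical helper: hash_words < input
# Prints every word that starts with #, one per line.
hash_words() {
  # Turn every character that cannot be part of a #word into a newline,
  # squeezing runs of them, so each candidate word sits on its own line.
  tr -cs '[:alnum:]_#' '\n' |
    # Keep only the candidates that begin with # followed by a word character.
    sed -n '/^#[[:alnum:]_]/p'
}

# Usage with the sample line from the question:
printf '%s\n' 'My name is #Laura and I live in #London. Name=#Laura. City=#London' |
  hash_words
```

Unlike grep -o, this does not report a # that appears in the middle of a word (for example foo#bar), because only tokens that begin with # are kept.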

How to replace the nth occurrence of a string using sed

Is there any way to replace the nth occurrence of a string in a file using sed?
I'm using sed -i '0,/jack.*/ s//jill/' to replace the first occurrence.
How can I change it so that it replaces the nth occurrence?
My file contains the following lines:
first line
second line
third line
jack=1
fifth line
jack=
seventh line
I don't know the value after jack=, it can be anything or nothing.
I want to replace the 2nd occurrence of jack= and anything that follows it with jill.
First replace all the newlines with a unique character that does not occur anywhere else in your file (e.g. ^) using tr. You need to do this in order to create a single string for sed. Then pass it to sed and tell it to replace the nth occurrence of your string. Finally, pass the output back through tr to recreate the newlines.
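For n=2, the command is:
$ tr '\n' '^' < file | sed 's/jack/jill/2' | tr '^' '\n'
first line
second line
third line
jack=1
fifth line
jill=
seventh line
To also replace whatever follows jack on that line, as the question asks, match jack[^^]* instead of jack.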
Update:
It can also be done with sed, WITHOUT changing the newlines first, using the following command:
$ sed ':a;N;$!ba;s/jack/jill/2' file
Alternatively, use awk:
$ awk '/jack/{c+=1}{if(c==2){sub("jack","jill",$0)};print}' file
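Try this, replacing n with the occurrence number: sed ':a;N;$!ba;s/word1/word2/n' filename
Here, :a;N;$!ba reads the entire file into the pattern space, line by line, so that sed can apply the substitution to the whole file at once. The s/word1/word2/n substitution then replaces only the nth occurrence of word1 with word2 (in GNU sed, combining the number with the g flag, as in ng, replaces the nth and every later occurrence).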