Replace the "pattern" on second-to-last line of a file - sed

I have to replace a "pattern" with a "string" on the second-to-last line of a file, file.txt.
The three sed commands below print the second-to-last line, but I need to replace a "pattern" with a "string" on that line. Any help?
sed -e '$!{h;d;}' -e x file.txt
sed -n 'x;$p' file.txt
sed 'x;$!d' file.txt
$ cat file.txt
cabbage
spinach
collard greens
corn salad
Sweet pepper
kale
How to replace the second-to-last line of a file (Sweet pepper):
a. replace "Sweet" with "green" if second-to-last line contains "Sweet pepper"
b. replace the whole line with "carrots", no matter what it contains

To change Sweet to Green on the second to last line but only if that line contains Sweet pepper:
$ sed 'x; ${/Sweet pepper/s/Sweet/Green/;p;x}; 1d' file.txt
cabbage
spinach
collard greens
corn salad
Green pepper
kale
To replace the whole of the second-to-last line with carrots, regardless of what it contains:
$ sed 'x; ${s/.*/carrots/;p;x}; 1d' file.txt
cabbage
spinach
collard greens
corn salad
carrots
kale
How it works
Let's take this command and examine it one step at a time:
sed 'x; ${s/.*/carrots/;p;x}; 1d'
x
This exchanges the pattern space (which holds the most recently read line) and the hold space.
When this is done, the hold space will contain the most recently read line and the pattern space will contain the previous line.
(The exception is when we have just read the first line. In that case, the hold space will have the first line and the pattern space will be empty.)
${s/.*/carrots/;p;x}
When we are on the last line, indicated by the $, the pattern space holds the second to last line and we can perform whatever substitutions or other commands that we like. When we are done, we print the second to last line with p. Lastly, we swap pattern and hold space again with x so that the pattern space will again contain the last line. sed will print this because, by default, at the end of the commands, sed prints whatever is in the pattern space.
1d
When we are on the first line, indicated by the 1, the pattern space is empty (because there was no previous line) and we delete it (d).
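The walk-through above can be checked on a tiny input. This is only a minimal sketch, assuming a hypothetical three-line sample fed on stdin rather than file.txt; the extra `;` before the closing brace is there only so that non-GNU sed implementations accept the script as well.

```shell
# Hypothetical sample: three lines on stdin instead of file.txt.
# Line 1 is swapped into the hold space and the empty pattern space is deleted;
# on the last line the previous line is rewritten, printed, and swapped back.
result=$(printf '%s\n' one two three | sed 'x; ${s/.*/carrots/;p;x;}; 1d')
printf '%s\n' "$result"
```

The middle line (the second-to-last of three) should come out as carrots, while the first and last lines pass through unchanged.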
A still simpler method
This method is easy to understand at the cost of slower execution speed:
$ tac file.txt | sed '2 {/Sweet pepper/s/Sweet/Green/}' | tac
cabbage
spinach
collard greens
corn salad
Green pepper
kale
And, for carrots:
$ tac file.txt | sed '2 s/.*/carrots/' | tac
cabbage
spinach
collard greens
corn salad
carrots
kale
How it works: Here, we use tac to reverse the order of the lines. Observe:
$ tac file.txt
kale
Sweet pepper
corn salad
collard greens
spinach
cabbage
In this way, the second-to-last line becomes line number 2. Thus, we simply tell sed to operate on line number 2. Afterward, we use tac again to put the lines back in the correct order.
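Because the reversed file turns "N-th line from the end" into plain line N, the same idea could be wrapped in a small helper. This is a sketch, not part of the answer above: `replace_from_end` is a hypothetical name, and it assumes GNU tac is available and that the replacement text contains no `/`, `&` or backslashes (it is pasted straight into the sed script).

```shell
# replace_from_end N TEXT: replace the N-th line counted from the end of stdin.
# Hypothetical helper; TEXT must not contain /, & or backslashes.
replace_from_end() {
    tac | sed "$1 s/.*/$2/" | tac
}

# Hypothetical four-line sample: the second line from the end is "c".
result=$(printf '%s\n' a b c d | replace_from_end 2 carrots)
printf '%s\n' "$result"
```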

You might find an awk script easier to understand, maintain, port, etc.:
$ awk 'NR==FNR{tgt=NR-1;next} (FNR==tgt) && /Sweet pepper/ { $1="green" } 1' file file
cabbage
spinach
collard greens
corn salad
green pepper
kale
$ awk 'NR==FNR{tgt=NR-1;next} (FNR==tgt) { $0="carrots" } 1' file file
cabbage
spinach
collard greens
corn salad
carrots
kale
Want to change the line 3 before the end instead of the line 1 before the end? That's just the simple, obvious tweak to change -1 to -3:
$ awk 'NR==FNR{tgt=NR-3;next} (FNR==tgt) { $0="carrots" } 1' file file
cabbage
spinach
carrots
corn salad
Sweet pepper
kale
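The offset and the replacement text can also be passed in as awk variables instead of being edited into the script. The following is a sketch under assumptions: `back` and `repl` are hypothetical variable names, and the sample data is written to a temporary file because the two-pass approach has to read a real file twice and cannot work on a pipe.

```shell
# Hypothetical sample file; the two-pass approach needs a real file, not a pipe.
tmp=$(mktemp)
printf '%s\n' cabbage spinach kale > "$tmp"

# back = how many lines before the last one; repl = replacement text.
result=$(awk -v back=1 -v repl=carrots '
    NR == FNR  { tgt = NR - back; next }   # first pass: remember the target line number
    FNR == tgt { $0 = repl }               # second pass: rewrite the target line
    1                                      # print every line of the second pass
' "$tmp" "$tmp")
rm -f "$tmp"
printf '%s\n' "$result"
```

Note that awk interprets backslash escapes in values given with -v, so a replacement containing backslashes would need extra care.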

awk solution
$ cat 38649053
cabbage
spinach
collard greens
corn salad
Sweet pepper
kale
$ tac 38649053 | awk 'NR==2{if($0=="Sweet pepper") # is the record "Sweet pepper"?
{
$1="green"} # change Sweet to green; note that $1 is the first field
else{
$0="carrot" # $0 is the whole record, which is replaced by carrot
}}
{record[NR]=$0} # add each record to an array indexed by record number
END{ # print the records in reverse order at the end
i=NR;for(;i>=1;i--)print record[i]}
'
cabbage
spinach
collard greens
corn salad
green pepper
kale
Sed solution
$ cat 38649053
cabbage
spinach
collard greens
corn salad
Sweet pepper
kale
$ lines=$(wc -l <38649053) # lines contains the total # of lines
$ ((s2l=--lines)) # storing the second to last line number to s2l
$ sed -E "${s2l}"'{/^Sweet pepper$/!s/.*/carrot/;/^Sweet pepper$/s/Sweet/green/}' 38649053
# applying the sed to the required line
cabbage
spinach
collard greens
corn salad
green pepper
kale

Related

Reorder "interesting" pieces of text using sed

I have a file named file, whose content is
noise
noise
X noise STUFF1 noise STUFF2 noise
noise
Y noise STUFF3 noise
noise
and I assert that X and Y are distinct, that each occur once in file, and that X occurs first.
I'm able to issue a sed command to extract the first pieces of information, the like of
$ sed -n '/X/s/\(.*\)\(…\)\(.*\)\(…\)/\2 \4/p' < file
STUFF1 STUFF2
$
and a similar one to extract STUFF3 (¹), but what I'd really like to do is to find the right sed incantation so that
$ sed … < file
STUFF3 STUFF1 STUFF2
$
(and possibly learn, at last! how sed's hold buffer works).
(¹) This is not a question about regular expressions; I know how to isolate the pieces of text that I need. I need to save the info I've collected and output it at the right time.
Using sed
$ sed -n '/^X/{s/.[^[:upper:]]*\([[:alnum:]]*\)/\1 /g;h};/^Y/{s/.[^[:upper:]]*\([[:alnum:]]*\)/\1 /g;G;s/\n//p}' file
STUFF3 STUFF1 STUFF2
$ cat script.sed
/^X/{ #Match line beginning with X
s/.[^[:upper:]]*\([[:alnum:]]*\)/\1 /g #As you know how to extract what you need, this is just for your sample data to extract needed strings
h #Retain the output of the substitution in the hold buffer
}
/^Y/{ #Match line beginning with Y
s/.[^[:upper:]]*\([[:alnum:]]*\)/\1 /g #Same as above
G #Append the contents of the hold space
s/\n//p #Remove the newline
}
$ sed -nf script.sed file
STUFF3 STUFF1 STUFF2
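The hold-buffer part of this answer can be seen in isolation with a stripped-down sketch. It assumes a hypothetical four-line sample and skips the extraction step entirely: `h` saves the X line, and `G` later appends that saved copy (after a newline) to the Y line, so the later line is printed first.

```shell
# Hypothetical sample: the X line comes first, the Y line later.
result=$(printf '%s\n' noise 'X first' noise 'Y second' |
    sed -n '/^X/h; /^Y/{G; s/\n/ /; p;}')
# h: copy the X line into the hold space
# G: append a newline plus the hold space to the Y line
# s/\n/ /: join the two pieces with a space before printing
printf '%s\n' "$result"
```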
sed -n ' # Do not print by default
/X/{
# pattern space holds 'X noise STUFF1 noise STUFF2 noise'
s/.*\(STUFF1\).*\(STUFF2\).*/\1 \2/
# pattern space holds 'STUFF1 STUFF2'
# add stuff from pattern space to hold space with __leading newline__
H
# hold space holds '\nSTUFF1 STUFF2'
# use l to inspect
d
}
/Y/{
s/.*\(…\).*/\1/p
H
# hold space holds '\nSTUFF1 STUFF2\nSTUFF3'
d
}
${ # last line?
# switch hold space with pattern space
x
# we have '\nSTUFF1 STUFF2\nSTUFF3' in pattern space, let's make it nice with spaces
s/\n/ /g
s/  */ /g
s/^ *//g
s/ *$//g
# print it
p
}
'
This might work for you (GNU sed):
sed -En '/^X/h;/^Y/{G;s/\s+/ /g;s/.*/echo "&"|cut -d" " -f3,7,9/ep}' file
Make a copy of the line starting X in the hold space.
Append the copy to a line starting Y.
Replace one or more white spaces by a space globally on the above line(s).
Replace the contents of that line by required columns using the cut command.

sed or awk: delete/comment n lines following a pattern before 3 lines

To delete or comment out the 3 lines before a pattern (including the line with the pattern):
how can I achieve this with a sed command?
Ref:
sed or awk: delete n lines following a pattern
The question referenced above shows how to do this for the lines after a pattern match, but I need to do it for the lines before the match.
define host{
use xxx;
host_name pattern;
alias yyy;
address zzz;
}
The sed command below adds a '#' to the matching line and the lines after it, for example:
sed -e '/pattern/,+3 s/^/#/' file.cfg
define host{
use xxx;
#host_name pattern;
#alias yyy;
#address zzz;
#}
How can I do the same for the lines before the pattern?
Can anyone help me resolve this?
If tac is allowed :
tac|sed -e '/pattern/,+3 s/^/#/'|tac
If tac isn't allowed :
sed -e '1!G;h;$!d'|sed -e '/pattern/,+3 s/^/#/'|sed -e '1!G;h;$!d'
(source : http://sed.sourceforge.net/sed1line.txt)
Reverse the file, comment the 3 lines after, then re-reverse the file.
tac file | sed '/pattern/ {s/^/#/; N; N; s/\n/&#/g;}' | tac
#define host{
#use xxx;
#host_name pattern;
alias yyy;
address zzz;
}
Although I think awk is a little easier to read:
tac file | awk '/pattern/ {c=3} c-- > 0 {$0 = "#" $0} 1' | tac
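If reversing the file is not desirable, a two-pass awk can get a similar result. This is a sketch under assumptions rather than one of the answers here: it reads a real file twice (so it does not work on a pipe), `n` and `hit` are hypothetical names, and the sample data is the host block from the question written to a temporary file.

```shell
# Hypothetical sample file holding the host block from the question.
tmp=$(mktemp)
printf '%s\n' 'define host{' 'use xxx;' 'host_name pattern;' 'alias yyy;' 'address zzz;' '}' > "$tmp"

# n = number of lines to comment before the matching line (the match is commented too).
result=$(awk -v n=2 '
    NR == FNR { if (/pattern/) hit[NR] = 1; next }   # pass 1: record matching line numbers
    {
        for (i = FNR; i <= FNR + n; i++)             # pass 2: is a match at most n lines ahead?
            if (i in hit) { $0 = "#" $0; break }
    }
    1
' "$tmp" "$tmp")
rm -f "$tmp"
printf '%s\n' "$result"
```

With n=2 this should comment the same three lines as the tac pipeline above: the matching line and the two before it.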
This might work for you (GNU sed):
sed ':a;N;s/\n/&/3;Ta;/pattern[^\n]*$/s/^/#/mg;P;D' file
Gather up 4 lines in the pattern space and if the last line contains pattern insert # at the beginning of each line in the pattern space.
To delete those 4 lines, use:
sed ':a;N;s/\n/&/3;Ta;/pattern[^\n]*$/d;P;D' file
To delete the 3 lines before pattern but not the line containing pattern use:
sed ':a;N;s/\n/&/3;Ta;/pattern[^\n]*$/s/.*\n//;P;D'

xargs and sed to extract specific lines

I want to extract lines that have a particular pattern, in a certain column. For example, in my 'input.txt' file, I have many columns. I want to search the 25th column for 'foobar', and extract only those lines that have 'foobar' in the 25th column. I cannot do:
grep foobar input.txt
because other columns may also have 'foobar', and I don't want those lines. Also:
the 25th column will have 'foobar' as part of a string (i.e. it could be 'foobar ; muller' or 'max ; foobar ; john', or 'tom ; foobar35')
I would NOT want 'tom ; foobar35'
The word in column 25 must be an exact match for 'foobar', but the column also contains other words and ';' separators, so using awk $25=='foobar' is not an option.
In other words, if column 25 had the following lines:
foobar ; muller
max ; foobar ; john
tom ; foobar35
I would want only lines 1 & 2.
How do I use xargs and sed to extract these lines? I am stuck at:
cut -f25 input.txt | grep -nw foobar | xargs -I linenumbers sed ???
thanks!
Do not use xargs and sed, use the other tool common on so many machines and do this:
awk '{if($25=="foobar"){print NR" "$0}}' input.txt
print NR prints the line number of the current match so the first column of the output will be the line number.
print $0 prints the current line. Change it to print $25 if you only want the matching column. If you only want the output, use this:
awk '{if($25=="foobar"){print $0}}' input.txt
EDIT1 to match extended question:
Use what #shellter and #Jotne suggested but add string delimiters.
awk -vFPAT="([^ ]*)|('[^']*')" -vOFS=' ' '$25~/foobar/' input.txt
[^ ]* matches all characters that are not a space.
'[^']*' matches everything inside single quotes.
EDIT2 to exclude everything but foobar:
awk -vFPAT="([^ ]*)|('[^']*')" -vOFS=' ' "\$25~/[;' ]foobar[;' ]/" input.txt
[;' ] only allows ;, ' and a space immediately before and after foobar.
Tested with this file:
1 "1 ; 1" 4
2 'kom foobar' 33
3 "ll;3" 3
4 '1; foobar' asd
7 '5 ;foobar' 2
7 '5;foobar' 0
2 'kom foobar35' 33
2 'kom ; foobar' 33
2 'foobar ; john' 33
2 'foobar;paul' 33
2 'foobar1;paul' 33
2 'foobarli;paul' 33
2 'afoobar;paul' 33
and this command awk -vFPAT="([^ ]*)|('[^']*')" -vOFS=' ' "\$2~/[;' ]foobar[;' ]/" input.txt
To get the lines where the 25th field is exactly foobar:
awk '$25=="foobar"' input.txt
$25 25th field
== equal to
"foobar"
Since no action is specified, the complete line is printed, same as {print $0}
Or
awk '$25~/^foobar$/' input.txt
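Neither of the two commands above handles a column that holds several `;`-separated words, which is what the question describes. One possible sketch is to split the column on `;` and compare each trimmed piece exactly. It assumes tab-separated columns (the default for the `cut -f` call in the question), and it uses column 2 of a hypothetical three-line sample instead of column 25; `col` and `word` are illustrative variable names.

```shell
# Hypothetical tab-separated sample; column 2 plays the role of column 25.
result=$(printf '1\tfoobar ; muller\n2\tmax ; foobar ; john\n3\ttom ; foobar35\n' |
    awk -F'\t' -v col=2 -v word=foobar '{
        n = split($col, parts, ";")              # break the column into its pieces
        for (i = 1; i <= n; i++) {
            p = parts[i]
            gsub(/^[ \t]+|[ \t]+$/, "", p)       # trim blanks around the piece
            if (p == word) { print; next }       # exact match: print the line once
        }
    }')
printf '%s\n' "$result"
```

Lines 1 and 2 of the sample should be printed, and the `foobar35` line should be skipped because the comparison is on the whole piece.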
This might work for you (GNU sed):
sed -En 's/\S+/\n&\n/25;s/\n(.*foobar.*)\n/\1/p' file
Surround the 25th field by newlines and pattern match for foobar between newlines.
If you only want to match the word foobar use:
sed -En 's/\S+/\n&\n/25;s/\n(.*\<foobar\>.*)\n/\1/p' file

Find duplicate records in file

I have a text file with lines like below:
name1#domainx.com, name1
info#domainy.de, somename
name2#domainz.com, othername
name3#domainx.com, name3
How can I find duplicate domains like domainx.com with sed or awk?
With GNU awk you can do:
$ awk -F'[#,]' '{a[$2]++}END{for(k in a) print a[k],k}' file
1 domainz.com
2 domainx.com
1 domainy.de
You can use sort to order the output i.e. ascending numerical with -n:
$ awk -F'[#,]' '{a[$2]++}END{for(k in a) print a[k],k}' file | sort -n
1 domainy.de
1 domainz.com
2 domainx.com
Or just to print duplicate domains:
$ awk -F'[#,]' '{a[$2]++}END{for(k in a)if (a[k]>1) print k}' file
domainx.com
Here:
sed -n '/#domainx.com/ p' yourfile.txt
(Actually, grep is what you should use for that.)
Would you like to count them? Add |nl to the end.
With the short list you gave, the sed line with |nl outputs this:
1 name1#domainx.com, name1
2 name3#domainx.com, name3
What if you need to count how many times each domain is repeated? For that try this:
for line in $(sed -n 's/.*#\([^,]*\).*/\1/p' yourfile.txt|sort|uniq) ; do
echo "$line $(grep -c "$line" yourfile.txt)"
done
The output of that is:
domainx.com 2
domainy.de 1
domainz.com 1
Print only duplicate domains
awk -F"[#,]" 'a[$2]++==1 {print $2}'
domainx.com
Print a "* " in front of lines whose domain has already been seen.
awk -F"[#,]" '{a[$2]++;if (a[$2]>1) f="* ";print f$0;f=x}'
name1#domainx.com, name1
info#domainy.de, somename
name2#domainz.com, othername
* name3#domainx.com, name3
This version prints every line with a duplicate domain in red:
awk -F"[#,]" '{a[$2]++;b[NR]=$0;c[NR]=$2} END {for (i=1;i<=NR;i++) print ((a[c[i]]>1)?"\033[1;31m":"\033[0m") b[i] "\033[0m"}' file
name1#domainx.com, name1 <-- This line is red
info#domainy.de, somename
name2#domainz.com, othername
name3#domainx.com, name3 <-- This line is red
Improved version (reading the file twice):
awk -F"[#,]" 'NR==FNR{a[$2]++;next} a[$2]>1 {$0="\033[1;31m" $0 "\033[0m"}1' file file
name1#domainx.com, name1 <-- This line is red
info#domainy.de, somename
name2#domainz.com, othername
name3#domainx.com, name3 <-- This line is red
If you have GNU grep available, you can use the PCRE matcher to do a positive look-behind to extract the domain name. After that sort and uniq can find duplicate instances:
<infile grep -oP '(?<=#)[^,]*' | sort | uniq -d
Output:
domainx.com
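Where grep lacks the -P option, the extraction step could be done with a plain sed substitution instead. This is a sketch of that swap, assuming the four sample lines from the question are fed on stdin; the sort and uniq -d steps are unchanged.

```shell
# Sample lines from the question, fed on stdin.
result=$(printf '%s\n' \
    'name1#domainx.com, name1' \
    'info#domainy.de, somename' \
    'name2#domainz.com, othername' \
    'name3#domainx.com, name3' |
    sed 's/.*#\([^,]*\).*/\1/' |   # keep only the text between # and the comma
    sort | uniq -d)                # uniq -d prints one copy of each repeated domain
printf '%s\n' "$result"
```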

divide each line in equal part

I would be happy if anyone could suggest a command (a sed or awk one-liner) to divide each line of a file into an equal number of parts, for example into 4 parts.
Input:
ATGCATHLMNPHLNTPLML
Output:
ATGCA THLMN PHLNT PLML
This should work using GNU sed:
sed -r 's/(.{4})/\1 /g'
-r is needed to use extended regular expressions
.{4} matches each run of four characters, so this produces four-character chunks rather than four equal parts
\1 refers to the captured group which is surrounded by the parenthesis ( ) and adds a space behind this group
g makes sure that the replacement is done as many times as possible on each line
A test; this is the input and output in my terminal:
$ echo "ATGCATHLMNPHLNTPLML" | sed -r 's/(.{4})/\1 /g'
ATGC ATHL MNPH LNTP LML
I suspect awk is not the best tool for this, but:
gawk --posix '{ l = sprintf( "%d", 1 + (length()-1)/4);
gsub( ".{"l"}", "& " ) } 1' input-file
If you have a POSIX-compliant awk you can omit the --posix. It is necessary for GNU awk, though, and since that seems to be the most commonly used implementation, I've given the solution in terms of gawk.
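An alternative that avoids interval expressions altogether is to cut the line with substr in a loop. This is a sketch, not the answer's method: `parts` is a hypothetical variable, the chunk width is the line length divided by `parts` and rounded up, and for some lengths this can yield fewer than `parts` pieces (for example 9 characters with parts=4 gives three chunks of 3).

```shell
# Sample line from the question, fed on stdin.
result=$(printf '%s\n' ATGCATHLMNPHLNTPLML |
    awk -v parts=4 '{
        w = int((length($0) + parts - 1) / parts)    # chunk width, rounded up
        out = ""
        for (i = 1; i <= length($0); i += w)          # walk the line one chunk at a time
            out = out (i > 1 ? " " : "") substr($0, i, w)
        print out
    }')
printf '%s\n' "$result"
```

For the 19-character sample the width works out to 5, which gives the grouping asked for in the question.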
This might work for you (GNU sed):
sed 'h;s/./X/g;s/^\(.*\)\1\1\1/\1 \1 \1 \1/;G;s/\n/&&/;:a;/^\n/bb;/^ /s/ \(.*\n.*\)\n\(.\)/\1 \n\2/;ta;s/^.\(.*\n.*\)\n\(.\)/\1\2\n/;ta;:b;s/\n//g' file
Explanation:
h copy the pattern space (PS) to the hold space (HS)
s/./X/g replace every character in the PS with the same non-space character (in this case X)
s/^\(.*\)\1\1\1/\1 \1 \1 \1/ split the line into 4 parts (space separated)
G append a newline followed by the contents of the HS to the PS
s/\n/&&/ double the newline (to be later used as markers)
:a introduce a loop label
/^\n/bb if we reach a newline we are done and branch to the b label
/^ /s/ \(.*\n.*\)\n\(.\)/\1 \n\2/;ta; if the first character is a space add a space to the real line at this point and repeat
s/^.\(.*\n.*\)\n\(.\)/\1\2\n/;ta any other character just bump along and repeat
:b;s/\n//g all done just remove the markers and print out the result
This works for any length of line; however, if the line length is not exactly divisible by 4, the last portion will contain the remainder as well.
perl
perl might be a better choice here:
export cols=4
perl -ne 'chomp; $fw = 1 + int length()/$ENV{cols}; while(/(.{1,$fw})/gm) { print $1 . " " } print "\n"'
This re-calculates field-width for every line.
coreutils
A GNU coreutils alternative, field-width is chosen based on the first line of infile:
cols=4
len=$(( $(head -n1 infile | wc -c) - 1 ))
fw=$(echo "scale=0; 1 + $len / $cols" | bc)
cut_arg=$(paste -d- <(seq 1 $fw $len) <(seq $fw $fw $len) | head -c-1 | tr '\n' ',')
Value of cut_arg is in the above case:
1-5,6-10,11-15,16-
Now cut the line into appropriate chunks:
cut --output-delimiter=' ' -c $cut_arg infile