I have a file with text as follows:
###interest1 moreinterest1### sometext ###interest2###
not-interesting-line
sometext ###interest3###
sometext ###interest4### sometext othertext ###interest5### sometext ###interest6###
I want to extract all strings between ### .
My desired output would be something like this:
interest1 moreinterest1
interest2
interest3
interest4
interest5
interest6
I have tried the following:
grep '###' file.txt | sed -e 's/.*###\(.*\)###.*/\1/g'
This almost works but only seems to grab the first instance per line, so the first line in my output only grabs
interest1 moreinterest1
rather than
interest1 moreinterest1
interest2
Here is a single awk command to achieve this that makes ### field separator and prints each even numbered field:
awk -F '###' '{for (i=2; i<NF; i+=2) print $i}' file
interest1 moreinterest1
interest2
interest3
interest4
interest5
interest6
Here is an alternative grep + sed solution:
grep -oE '###[^#]*###' file | sed -E 's/^###|###$//g'
This assumes there are no # characters in between ### markers.
With GNU awk for multi-char RS:
$ awk -v RS='###' '!(NR%2)' file
interest1 moreinterest1
interest2
interest3
interest4
interest5
interest6
You can use pcregrep:
pcregrep -o1 '###(.*?)###' file
The regex - ###(.*?)### - matches ###, then captures into Group 1 any zero o more chars other than line break chars, as few as possible, and ### then matches ###.
o1 option will output Group 1 value only.
See the regex demo online.
sed 't x
s/###/\
/;D; :x
s//\
/;t y
D;:y
P;D' file
Replacing "###" with newline, D, then conditionally branching to P if a second replacement of "###" is successful.
This might work for you (GNU sed):
sed -n 's/###/\n/g;/[^\n]*\n/{s///;P;D}' file
Replace all occurrences of ###'s by newlines.
If a line contains a newline, remove any characters before and including the first newline, print the details up to and including the following newline, delete those details and repeat.
I have one command which is used to extract lines between two string patterns 'string1' and 'string2'. This is stored in variable called 'var1'.
var1=$(awk '/string1/{flag=1; next} /string2/{flag=0} flag' text.txt)
This command works well and the output is a set of lines.
Do you hear the people sing?
Singing a song of angry men?
It is the music of a people
Who will not be slaves again
I want the output of the above command to be inserted after a string pattern 'string3' in another file called stat.txt. I used sed as follows
sed '/string3/a'$var1'' stat.txt
I am having trouble getting the new output. Here, the $var1 seems to be working partially i.e. only one line -
string3
Do you hear the people sing?
Any other suggestions to solve this?
I would be tempted to use sed to extract the lines, and awk to insert them into the other text:
lines=$(sed -n '/string1/,/string2/ p' text.txt)
awk -v new="$lines" '{print} /string3/ {print new}' stat.txt
or perhaps both tasks in a single awk call
awk '
NR == FNR && /string1/ {flag = 1}
NR == FNR && /string2/ {flag = 0}
NR == FNR && flag {lines = lines $0 ORS}
NR == FNR {next}
{print}
/string3/ {printf "%s", lines} # it already ends with a newline
' text.txt stat.txt
It's a data format problem...
Appending a multi-line block of text with the sed append command requires that every line in the block to be appended ends with a \ -- except for the last line of that block. So if we take the two lines of code that didn't work in the question, and reformat the text as required by the append command, the original code should work as expected:
var1=$(awk '/string1/{flag=1; next} /string2/{flag=0} flag' text.txt)
var1="$(sed '$!s/$/\\/' <<< "$var1")"
sed '/string3/a'$var1'' stat.txt
Note that the 2nd line above contains a bashism. A more portable version would be:
var1="$(echo "$var1" | sed '$!s/$/\\/')"
Either variant would convert $var1 to:
Do you hear the people sing?\
Singing a song of angry men?\
It is the music of a people\
Who will not be slaves again
There is a command to replace bbb to ccc, if the line contains abc.
echo "abc yyy bbb xzy" | sed -e "/abc/ s/bbb/ccc/"
Does anyone know what the command would be, if I want to do the replacement, only if the line contains both abc and xyz?
Because it doesn't matter which one is matched first, you can look for abc first, then make the substitution if also xyz matches1:
sed '/abc/{/xyz/s/bbb/ccc/}'
or, considerably less elegant:
sed '/abc.*xyz\|xyz.*abc/s/bbb/ccc/'
but no nesting.
1BSD sed requires a semicolon before the closing brace.
Just use awk and you can code it as you'd write it, with && between the conditions:
awk '/abc/ && /xyz/ { sub(/bbb/,"ccc") } 1'
Try writing:
awk '(/abc/ && /xyz/) || (/def/ && (/ghi/ || /klm/)) { sub(/bbb/,"ccc") } 1'
or any other more interesting compound condition with sed. Awk is available everywhere sed is and the above is fully portable and will work as-is in every awk in every UNIX installation.
I am trying to Replace a character with #(hash symble) only in 5th & 6th field.
eg. I have to replace 'Z' with '#' only in 5th & 6th field (using perl or AWK script). And remaining fields containng 'Z' symbol should not be affected.
(just I'm updating the post to replace double quote(") instead of Z by #. Can I achive this? thanks for precious help)
eg: i/p file:
aa",bb,ccc,ddd,eee",ddd",fff
aa1",ba1,ccc1,"ddd1,eee"1,ddd1,fff1
z,aa2,bb2",ccc2,ddd2","eee2",ddd2,fff2"
Expected O/p file:
aa",bb,ccc,ddd,eee#,ddd#,fff
aa1",ba1,ccc1,#ddd1,eee#1,ddd1,fff1
aa2,bb2",ccc2,ddd2#,#eee2#,ddd2,fff2"
Thanks.
$ awk 'BEGIN{FS=OFS=","} {for (i=5;i<=6;i++) gsub(/Z/,"#",$i)} 1' file
x,aaZ,bb,ccc,ddd,eee#,dddZ,fff
y,aa1Z,ba1,ccc1,#ddd1,eee#1,ddd1,fff1
z,aa2,bb2Z,ccc2,ddd2#,#eee2,ddd2,fff2Z
Since its only two filed, loop can be omitted.
awk -F, -v OFS=, '{gsub(/Z/,"#",$5);gsub(/Z/,"#",$6)} 1' file
x,aaZ,bb,ccc,ddd,eee#,dddZ,fff
y,aa1Z,ba1,ccc1,#ddd1,eee#1,ddd1,fff1
z,aa2,bb2Z,ccc2,ddd2#,#eee2,ddd2,fff2Z
To replace " in fifth and sixth field:
awk -F, -v OFS=, '{gsub(/\"/,"#",$5);gsub(/\"/,"#",$6)} 1' file
aa",bb,ccc,ddd,eee#,ddd#,fff
aa1",ba1,ccc1,"ddd1,eee#1,ddd1,fff1
z,aa2,bb2",ccc2,ddd2#,#eee2#,ddd2,fff2"
Here is a Perl way to do the job:
perl -anF, -e '$"=","; s/Z/#/ for (#F)[4,5];print"#F";' < in1.txt
If you have mutiple Z in a field, you could use:
perl -anF, -e '$"=","; s/Z/#/g for (#F)[4,5];print"#F";' < in1.txt
Output:
aaZ,bb,ccc,ddd,eee#,ddd#,fff
aa1Z,ba1,ccc1,Zddd1,eee#1,ddd1,fff1
aa2,bb2Z,ccc2,ddd2Z,#eee2,ddd2,fff2Z
Edit according to comment:
in1.txt
aa",bb,ccc,ddd,eee",ddd",fff
aa1",ba1,ccc1,"ddd1,eee"1,ddd1,fff1
aa2,bb2",ccc2,ddd2","eee2,ddd2,fff2"
Command:
perl -anF'','' -e '$"=",";s/"/#/ for (#F)[4,5];print"#F";' < in1.txt
result:
aa",bb,ccc,ddd,eee#,ddd#,fff
aa1",ba1,ccc1,"ddd1,eee#1,ddd1,fff1
aa2,bb2",ccc2,ddd2",#eee2,ddd2,fff2"
I have a file split in blocks like the following:
AGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTGGGG
AGGTAGTTATTATTTTTTTGGTTTTTAGTATTTAATTGAGTGTTT
ATGTAGGTGTTTATGTATTAGTTTTTTTTAGGTTTAGGGTGTTGT
ATTTAGGTTTTGTGTTTTGTGTATTATTGAATTTAATTAAAGTTA
AGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTTTTT
AGTTTTTTTTTATTTGTCGGGATATTTTAGTTGATTTTAGATTGC
TATATTTTTAGTTTCGATTCGTCGTAAGTTTTATTTTTTTTTAAT
GGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTTTTTT
I've truncated/wrapped the lines for clarity's sake, but imagine very long lines. The point of my question is that I want a final file that looks like this:
AGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTGGGGAGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTTTTT
AGGTAGTTATTATTTTTTTGGTTTTTAGTATTTAATTGAGTGTTTAGTTTTTTTTTATTTGTCGGGATATTTTAGTTGATTTTAGATTGC
ATGTAGGTGTTTATGTATTAGTTTTTTTTAGGTTTAGGGTGTTGTTATATTTTTAGTTTCGATTCGTCGTAAGTTTTATTTTTTTTTAAT
ATTTAGGTTTTGTGTTTTGTGTATTATTGAATTTAATTAAAGTTAGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTTTTTT
Where this new block has:
the same number of lines as the initial blocks,
each of the lines of the resulting block is a concatenation of the lines with the same line-number in the initial blocks.
this concatenation should be in-order (i.e. "1st line of 1st block" + "1st line of 2nd block", etc
Is it possible to achieve this final block using sed and/or awk, could you show me how it could be done?
In bash with paste:
$ paste <(head -4 file) <(tail -4 file) | tr -d '\t'
AGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTGGGGAGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTTTTT
AGGTAGTTATTATTTTTTTGGTTTTTAGTATTTAATTGAGTGTTTAGTTTTTTTTTATTTGTCGGGATATTTTAGTTGATTTTAGATTGC
ATGTAGGTGTTTATGTATTAGTTTTTTTTAGGTTTAGGGTGTTGTTATATTTTTAGTTTCGATTCGTCGTAAGTTTTATTTTTTTTTAAT
ATTTAGGTTTTGTGTTTTGTGTATTATTGAATTTAATTAAAGTTAGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTTTTTT
try this:
awk -vOFS="" '$0{a[NR]=$0}END{for(i=1;i<=NR/2;i++)print a[i],a[i+5]}' file
test with your example:
kent$ cat tmp.txt
AGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTGGGG
AGGTAGTTATTATTTTTTTGGTTTTTAGTATTTAATTGAGTGTTT
ATGTAGGTGTTTATGTATTAGTTTTTTTTAGGTTTAGGGTGTTGT
ATTTAGGTTTTGTGTTTTGTGTATTATTGAATTTAATTAAAGTTA
AGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTTTTT
AGTTTTTTTTTATTTGTCGGGATATTTTAGTTGATTTTAGATTGC
TATATTTTTAGTTTCGATTCGTCGTAAGTTTTATTTTTTTTTAAT
GGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTTTTTT
kent$ awk -vOFS="" '$0{a[NR]=$0}END{for(i=1;i<=NR/2;i++)print a[i],a[i+5]}' tmp.txt
AGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTGGGGAGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTTTTT
AGGTAGTTATTATTTTTTTGGTTTTTAGTATTTAATTGAGTGTTTAGTTTTTTTTTATTTGTCGGGATATTTTAGTTGATTTTAGATTGC
ATGTAGGTGTTTATGTATTAGTTTTTTTTAGGTTTAGGGTGTTGTTATATTTTTAGTTTCGATTCGTCGTAAGTTTTATTTTTTTTTAAT
ATTTAGGTTTTGTGTTTTGTGTATTATTGAATTTAATTAAAGTTAGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTTTTTT
awk -F'\n' -v RS= '{for (i=1;i<=NF;i++) {str[i] = str[i] $i} END {for (i=1;i<=NF;i++) print str[i]}' file