sed + need to remove each line in file that begin with ### - sed

How to remove line that begins with three #
For example need to delete all the following lines: from file
1 ### bla bla bal
2 ###blablabla
3 ### blabla
.
.
.
THX
Yael

cat file | sed '/^###/d'

you can use awk as well
awk '!/^[ \t]*###/' file

Related

xargs and sed to extract specific lines

I want to extract lines that have a particular pattern, in a certain column. For example, in my 'input.txt' file, I have many columns. I want to search the 25th column for 'foobar', and extract only those lines that have 'foobar' in the 25th column. I cannot do:
grep foobar input.txt
because other columns may also have 'foobar', and I don't want those lines. Also:
the 25th column will have 'foobar' as part of a string (i.e. it could be 'foobar ; muller' or 'max ; foobar ; john', or 'tom ; foobar35')
I would NOT want 'tom ; foobar35'
The word in column 25 must be an exact match for 'foobar' (and ; so using awk $25=='foobar' is not an option.
In other words, if column 25 had the following lines:
foobar ; muller
max ; foobar ; john
tom ; foobar35
I would want only lines 1 & 2.
How do I use xargs and sed to extract these lines? I am stuck at:
cut -f25 input.txt | grep -nw foobar | xargs -I linenumbers sed ???
thanks!
Do not use xargs and sed, use the other tool common on so many machines and do this:
awk '{if($25=="foobar"){print NR" "$0}}' input.txt
print NR prints the line number of the current match so the first column of the output will be the line number.
print $0 prints the current line. Change it to print $25 if you only want the matching column. If you only want the output, use this:
awk '{if($25=="foobar"){print $0}}' input.txt
EDIT1 to match extended question:
Use what #shellter and #Jotne suggested but add string delimiters.
awk -vFPAT="([^ ]*)|('[^']*')" -vOFS=' ' '$25~/foobar/' input.txt
[^ ]* matches all characters that are not a space.
'[^']*' matches everything inside single quotes.
EDIT2 to exclude everything but foobar:
awk -vFPAT="([^ ]*)|('[^']*')" -vOFS=' ' "\$25~/[;' ]foobar[;' ]/" input.txt
[;' ] only allows ;, ' and in front and after foobar.
Tested with this file:
1 "1 ; 1" 4
2 'kom foobar' 33
3 "ll;3" 3
4 '1; foobar' asd
7 '5 ;foobar' 2
7 '5;foobar' 0
2 'kom foobar35' 33
2 'kom ; foobar' 33
2 'foobar ; john' 33
2 'foobar;paul' 33
2 'foobar1;paul' 33
2 'foobarli;paul' 33
2 'afoobar;paul' 33
and this command awk -vFPAT="([^ ]*)|('[^']*')" -vOFS=' ' "\$2~/[;' ]foobar[;' ]/" input.txt
To get the line with foobar as part of the 25 field.
awk '$25=="foobar"' input.txt
$25 25th filed
== equal to
"foobar"
Since no action spesified, print the complete line will be done, same as {print $0}
Or
awk '$25~/^foobar$/' input.txt
This might work for you (GNU sed):
sed -En 's/\S+/\n&\n/25;s/\n(.*foobar.*)\n/\1/p' file
Surround the 25th field by newlines and pattern match for foobar between newlines.
If you only want to match the word foobar use:
sed -En 's/\S+/\n&\n/25;s/\n(.*\<foobar\>.*)\n/\1/p' file

How to replace a block of code between two patterns with blank lines?

I am trying replace a block of code between two patterns with blank lines
Tried using below command
sed '/PATTERN-1/,/PATTERN-2/d' input.pl
But it only removes the lines between the patterns
PATTERN-1 : "=head"
PATTERN-2 : "=cut"
input.pl contains below text
=head
hello
hello world
world
morning
gud
=cut
Required output :
=head
=cut
Can anyone help me on this?
$ awk '/=cut/{f=0} {print (f ? "" : $0)} /=head/{f=1}' file
=head
=cut
To modify the given sed command, try
$ sed '/=head/,/=cut/{//! s/.*//}' ip.txt
=head
=cut
//! to match other than start/end ranges, might depend on sed implementation whether it dynamically matches both the ranges or statically only one of them. Works on GNU sed
s/.*// to clear these lines
awk '/=cut/{found=0}found{print "";next}/=head/{found=1}1' infile
# OR
# ^ to take care of line starts with regexp
awk '/^=cut/{found=0}found{print "";next}/^=head/{found=1}1' infile
Explanation:
awk '/=cut/{ # if line contains regexp
found=0 # set variable found = 0
}
found{ # if variable found is nonzero value
print ""; # print ""
next # go to next line
}
/=head/{ # if line contains regexp
found=1 # set variable found = 1
}1 # 1 at the end does default operation
# print current line/row/record
' infile
Test Results:
$ cat infile
=head
hello
hello world
world
morning
gud
=cut
$ awk '/=cut/{found=0}found{print "";next}/=head/{found=1}1' infile
=head
=cut
This might work for you (GNU sed):
sed '/=head/,/=cut/{//!z}' file
Zap the lines between =head and =cut.

Joining lines in order of different blocks in the same text file

I have a file split in blocks like the following:
AGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTGGGG
AGGTAGTTATTATTTTTTTGGTTTTTAGTATTTAATTGAGTGTTT
ATGTAGGTGTTTATGTATTAGTTTTTTTTAGGTTTAGGGTGTTGT
ATTTAGGTTTTGTGTTTTGTGTATTATTGAATTTAATTAAAGTTA
AGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTTTTT
AGTTTTTTTTTATTTGTCGGGATATTTTAGTTGATTTTAGATTGC
TATATTTTTAGTTTCGATTCGTCGTAAGTTTTATTTTTTTTTAAT
GGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTTTTTT
I've truncated/wrapped the lines for clarity's sake, but imagine very long lines. The point of my question is that I want a final file that looks like this:
AGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTGGGGAGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTTTTT
AGGTAGTTATTATTTTTTTGGTTTTTAGTATTTAATTGAGTGTTTAGTTTTTTTTTATTTGTCGGGATATTTTAGTTGATTTTAGATTGC
ATGTAGGTGTTTATGTATTAGTTTTTTTTAGGTTTAGGGTGTTGTTATATTTTTAGTTTCGATTCGTCGTAAGTTTTATTTTTTTTTAAT
ATTTAGGTTTTGTGTTTTGTGTATTATTGAATTTAATTAAAGTTAGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTTTTTT
Where this new block has:
the same number of lines as the initial blocks,
each of the lines of the resulting block is a concatenation of the lines with the same line-number in the initial blocks.
this concatenation should be in-order (i.e. "1st line of 1st block" + "1st line of 2nd block", etc
Is it possible to achieve this final block using sed and/or awk, could you show me how it could be done?
In bash with paste:
$ paste <(head -4 file) <(tail -4 file) | tr -d '\t'
AGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTGGGGAGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTTTTT
AGGTAGTTATTATTTTTTTGGTTTTTAGTATTTAATTGAGTGTTTAGTTTTTTTTTATTTGTCGGGATATTTTAGTTGATTTTAGATTGC
ATGTAGGTGTTTATGTATTAGTTTTTTTTAGGTTTAGGGTGTTGTTATATTTTTAGTTTCGATTCGTCGTAAGTTTTATTTTTTTTTAAT
ATTTAGGTTTTGTGTTTTGTGTATTATTGAATTTAATTAAAGTTAGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTTTTTT
try this:
awk -vOFS="" '$0{a[NR]=$0}END{for(i=1;i<=NR/2;i++)print a[i],a[i+5]}' file
test with your example:
kent$ cat tmp.txt
AGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTGGGG
AGGTAGTTATTATTTTTTTGGTTTTTAGTATTTAATTGAGTGTTT
ATGTAGGTGTTTATGTATTAGTTTTTTTTAGGTTTAGGGTGTTGT
ATTTAGGTTTTGTGTTTTGTGTATTATTGAATTTAATTAAAGTTA
AGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTTTTT
AGTTTTTTTTTATTTGTCGGGATATTTTAGTTGATTTTAGATTGC
TATATTTTTAGTTTCGATTCGTCGTAAGTTTTATTTTTTTTTAAT
GGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTTTTTT
kent$ awk -vOFS="" '$0{a[NR]=$0}END{for(i=1;i<=NR/2;i++)print a[i],a[i+5]}' tmp.txt
AGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTGGGGAGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTTTTT
AGGTAGTTATTATTTTTTTGGTTTTTAGTATTTAATTGAGTGTTTAGTTTTTTTTTATTTGTCGGGATATTTTAGTTGATTTTAGATTGC
ATGTAGGTGTTTATGTATTAGTTTTTTTTAGGTTTAGGGTGTTGTTATATTTTTAGTTTCGATTCGTCGTAAGTTTTATTTTTTTTTAAT
ATTTAGGTTTTGTGTTTTGTGTATTATTGAATTTAATTAAAGTTAGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTTTTTT
awk -F'\n' -v RS= '{for (i=1;i<=NF;i++) {str[i] = str[i] $i} END {for (i=1;i<=NF;i++) print str[i]}' file

Get multi-line text in between horizontal delimiter with sed / awk

I would like to get multi-line text in between horizontal delimiter and ignore anything else before and after the delimiter.
An example would be:-
Some text here before any delimiter
----------
Line 1
Line 2
Line 3
Line 4
----------
Line 1
Line 2
Line 3
Line 4
----------
Some text here after last delimiter
And I would like to get
Line 1
Line 2
Line 3
Line 4
Line 1
Line 2
Line 3
Line 4
How do I do this with awk / sed with regex? Thanks.
You can try this.
file: a.awk:
BEGIN { RS = "-+" }
{
if ( NR > 1 && RT != "" )
{
print $0
}
}
run: awk -f a.awk data_file
If you can comfortably fit the entire file into memory, and if Perl is acceptable instead of awk or sed,
perl -0777 -pe 's/\A.*?\n-{10}\n//s;
s/(.*\n)-{10}\n.*?\Z/\1/s;
s/\n-{10}\n/\n\n\n/g' file >newfile
The main FAQs here are the -0777 option (slurp mode) and the /s (dot matches newlines) regex flag.
This might work for you:
sed '1,/^--*$/d;:a;$!{/\(^\|\n\)--*$/!N;//!ba;s///p};d' file

Concatenate Lines in Bash

Most command-line programs just operate on one line at a time.
Can I use a common command-line utility (echo, sed, awk, etc) to concatenate every set of two lines, or would I need to write a script/program from scratch to do this?
$ cat myFile
line 1
line 2
line 3
line 4
$ cat myFile | __somecommand__
line 1line 2
line 3line 4
sed 'N;s/\n/ /;'
Grab next line, and substitute newline character with space.
seq 1 6 | sed 'N;s/\n/ /;'
1 2
3 4
5 6
$ awk 'ORS=(NR%2)?" ":"\n"' file
line 1 line 2
line 3 line 4
$ paste - - < file
line 1 line 2
line 3 line 4
Not a particular command, but this snippet of shell should do the trick:
cat myFile | while read line; do echo -n $line; [ "${i}" ] && echo && i= || i=1 ; done
You can also use Perl as:
$ perl -pe 'chomp;$i++;unless($i%2){$_.="\n"};' < file
line 1line 2
line 3line 4
Here's a shell script version that doesn't need to toggle a flag:
while read line1; do read line2; echo $line1$line2; done < inputfile