Remove consecutive lines containing the same pattern - sed

I'd like to create a sed (or equivalent) expression that removes consecutive lines containing a specific character. For instance, I have a list of IPs, each followed by a colon. If an IP has values, the line(s) following it do not contain a colon. When consecutive lines contain colons, every one except the last should be removed (those IPs are empty), like so:
+159.0.0.0:
+159.0.0.1:
+443/tcp open https
+159.0.0.2:
+159.0.0.3:
+159.0.0.4:
+159.0.0.5:
+80/tcp open http
+443/tcp open https
Desired Result:
+159.0.0.1:
+443/tcp open https
+159.0.0.5:
+80/tcp open http
+443/tcp open https

This might work for you (GNU sed):
sed 'N;/:.*\n.*:/!P;D' file
Keep a moving window of two lines and if both lines contain a : do not print the first.
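A minimal sketch of the same two-line window, run against the sample from the question. It uses $!N instead of a bare N, so it does not depend on GNU sed printing the pattern space when N fails on the last line. The function name drop_empty_ips is made up for illustration.

```shell
# Hypothetical helper: drop each colon line that is directly followed by another colon line.
drop_empty_ips() {
  # $!N  append the next input line, except when already on the last line
  # P    print the first line of the window, unless both lines contain ":"
  # D    delete the first line of the window and restart with what is left
  sed '$!N;/:.*\n.*:/!P;D'
}

result=$(printf '%s\n' \
  '+159.0.0.0:' '+159.0.0.1:' '+443/tcp open https' \
  '+159.0.0.2:' '+159.0.0.3:' '+159.0.0.4:' '+159.0.0.5:' \
  '+80/tcp open http' '+443/tcp open https' | drop_empty_ips)
printf '%s\n' "$result"
```

A colon line at the very end of the input has no following line to compare against, so this sketch keeps it.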

Another awk:
$ awk '/:/ { p = $0 } $0 !~ /:/ {if (p) {print p} print $0; p = ""} ' file
+159.0.0.1:
+443/tcp open https
+159.0.0.5:
+80/tcp open http
+443/tcp open https

EDIT: To also handle the final line (print it only if it does not contain a colon), I changed the code slightly:
awk '!/:/ && prev{print prev ORS $0;prev="";next} {prev=$0} END{if(prev && prev !~ /:/){print prev}}' Input_file
The original version, tested on your provided sample; please try it and let me know if it helps:
awk '!/:/ && prev{print prev ORS $0;prev="";next} {prev=$0} END{if(prev){print prev}}' Input_file
A non-one-liner form of the solution:
awk '
!/:/ && prev{
print prev ORS $0;
prev="";
next
}
{
prev=$0
}
END{
if(prev){
print prev}
}' Input_file
Explanation of the code above:
awk '
!/:/ && prev{ ##If the line does NOT contain a colon and prev is non-empty, do the following.
print prev ORS $0; ##Print prev, then ORS (a newline by default), then the current line ($0).
prev=""; ##Reset prev.
next ##Skip the remaining rules for this line.
}
{
prev=$0 ##Save the current line in prev.
}
END{ ##The END block runs after all of Input_file has been read.
if(prev){ ##If prev is non-empty,
print prev} ##print it.
}' Input_file ##Input_file is the name of the input file.

sed is for s/old/new, THAT IS ALL. This will work with any awk in any shell on any UNIX box:
$ awk '/:/{s=$0 ORS;next} {print s $0; s=""}' file
+159.0.0.1:
+443/tcp open https
+159.0.0.5:
+80/tcp open http
+443/tcp open https
and is trivial to enhance for anything else you might want to do, for example to handle the final line ending in a colon just add an END section to print the last saved colon-ending line, if any:
$ cat file
+159.0.0.0:
+159.0.0.1:
+443/tcp open https
+159.0.0.2:
+159.0.0.3:
+159.0.0.4:
+159.0.0.5:
+80/tcp open http
+443/tcp open https
+159.0.0.6:
$ awk '/:/{s=$0 ORS;next} {print s $0; s=""} END{printf "%s", s}' file
+159.0.0.1:
+443/tcp open https
+159.0.0.5:
+80/tcp open http
+443/tcp open https
+159.0.0.6:

Replace one matched pattern with another in multiline text with sed

I have file with this text:
mirrors:
docker.io:
endpoint:
- "http://registry:5000"
registry:5000:
endpoint:
- "http://registry:5000"
localhost:
endpoint:
- "http://registry:5000"
I need to replace it with this text in POSIX shell script (not bash):
mirrors:
docker.io:
endpoint:
- "http://docker.io"
registry:5000:
endpoint:
- "http://registry:5000"
localhost:
endpoint:
- "http://localhost"
The replacement should be done dynamically everywhere, without hard-coded names: take the substring from the first line of each block ("docker.io", "registry:5000", "localhost") and use it to replace the substring "registry:5000" in the third line.
I've figured out a regex that splits it into 5 groups: (^ )([^ ]*)(:[^"]*"http:\/\/)([^"]*)(")
Then I tried to use sed to print group 2 in place of group 4, but this didn't work: sed -n 's/\(^ \)\([^ ]*\)\(:[^"]*"http:\/\/\)\([^"]*\)\("\)/\1\2\3\2\5/p'
Please help!
This might work for you (GNU sed):
sed -E '1N;N;/\n.*endpoint:.*\n/s#((\S+):.*"http://)[^"]*#\1\2#;P;D' file
Open up a three line window into the file.
If the second line contains endpoint:, replace the last piece of text following http:// with the first piece of text before :
Print/Delete the first line of the window and then replenish the three line window by appending the next line.
Repeat until the end of the file.
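Since \S is a GNU extension and the question asks for a POSIX shell, the same idea can be sketched in POSIX awk. This is only a sketch under assumptions: the helper name fix_endpoints is made up, the indentation of the sample is guessed (the question's text lost it), and the key is assumed not to contain & or a backslash, which are special in an awk replacement string.

```shell
# Hypothetical helper: rewrite each endpoint URL so that it matches the key above it.
fix_endpoints() {
  awk '
    # A key line such as "  docker.io:" (ends in a colon and is not "endpoint:"):
    # remember the key without its trailing colon.
    /:[[:space:]]*$/ && $1 != "endpoint:" { key = $1; sub(/:$/, "", key) }
    # A URL line: replace whatever follows http:// with the remembered key.
    /"http:\/\/[^"]*"/ { sub(/"http:\/\/[^"]*"/, "\"http://" key "\"") }
    { print }
  ' "$@"
}

fixed=$(printf '%s\n' \
  'mirrors:' \
  '  docker.io:' \
  '    endpoint:' \
  '      - "http://registry:5000"' \
  '  localhost:' \
  '    endpoint:' \
  '      - "http://registry:5000"' | fix_endpoints)
printf '%s\n' "$fixed"
```

Unlike the three-line window, this remembers the most recent key line, so it does not require the URL to sit exactly two lines below the key.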
Awk would be a better candidate for this. Pass the replacement string as the variable str and the section to change (" docker.io", " localhost" or " registry:5000") as the variable findstr:
awk -v findstr=" docker.io" -v str="http://docker.io" '
$0 ~ findstr { dockfound=1 # We have found the section passed in findstr and so we set the dockfound marker
}
/endpoint/ && dockfound==1 { # We encounter endpoint after the dockfound marker is set and so we set the found marker
found=1;
print;
next
}
found==1 && dockfound==1 { # We know from the found and the dockfound markers being set that we need to process this line
match($0,/^[[:space:]]+-[[:space:]]"/); # Match the start of the line to the beginning quote
$0=substr($0,RSTART,RLENGTH)str"\""; # Print the matched section followed by the replacement string (str) and the closing quote
found=0; # Reset the markers
dockfound=0
}1' file
One liner:
awk -v findstr=" docker.io" -v str="http://docker.io" '$0 ~ findstr { dockfound=1 } /endpoint/ && dockfound==1 { found=1;print;next } found==1 && dockfound==1 { match($0,/^[[:space:]]+-[[:space:]]"/);$0=substr($0,RSTART,RLENGTH)str"\"";found=0;dockfound=0 }1' file

Perl script throws syntax error for awk command

I have a file which contains each user's userid and password. I need to fetch the userid and password from that file with awk, passing the userid as the search term.
user101,smith,smith#123
user102,jones,passj#007
user103,albert,albpass#01
I am using an awk command inside my Perl script like this:
...
...
my $userid = $ARGV[0];
my $user_report_file = "report_file.txt";
my $data = `awk -F, '$1 ~ /$userid/ {print $2, $3}' $user_report_file`;
my ($user,$pw) = split(" ",$data);
...
...
Here I am getting the error:
awk: ~ /user101/ {print , }
awk: ^ syntax error
But if I run the same command in a terminal window, it gives the expected result:
$] awk -F, '$1 ~ /user101/ {print $2, $3}' report_file.txt
smith smith#123
What could be the issue here?
The backticks are a double-quoted context, so you need to escape any literal $ that you want awk to interpret.
my $data = `awk -F, '\$1 ~ /$userid/ {print \$2, \$3}' $user_report_file`;
If you don't do that, you're interpolating the capture variables from the last successful Perl match.
When I have these sorts of problems, I try the command as a string first to see if it is what I expect:
my $data = "awk -F, '\$1 ~ /$userid/ {print \$2, \$3}' $user_report_file";
say $data;
Here's the Perl equivalent of that command:
$ perl -aF, -e '$F[0]=~/101/ && print "@F[1,2]"' report_file
But, this is something you probably want to do in Perl instead of creating another process:
Interpolating data into external commands can go wrong, such as a filename that is foo.txt; rm -rf /.
The awk you run is the first one in the path, so someone can make that a completely different program (so use the full path, like /usr/bin/awk).
Taint checking can tell you when you are passing unsanitized data to the shell.
Inside a program you don't get all the shortcuts, but if this is the part of your program that is slow, you probably want to rethink how you are accessing this data because scanning the entire file with any tool isn't going to be that fast:
open my $fh, '<', $user_report_file or die;
while( <$fh> ) {
chomp;
my @F = split /,/;
next unless $F[0] =~ /\Q$userid/;
print "@F[1,2]";
last; # if you only want the first one
}
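If awk is kept anyway, one way to sidestep the quoting problem is to stop interpolating the user id into the awk program and pass it with -v instead. A minimal shell sketch follows. The wrapper name lookup_user is made up, and the exact comparison ($1 == id) is an assumption; the question used a regex match.

```shell
# Hypothetical wrapper: look up one user in the comma-separated report file.
lookup_user() {
  # $1 = user id, $2 = report file. -v hands the id to awk as data, not as code,
  # so the awk program itself can stay in single quotes.
  awk -F, -v id="$1" '$1 == id { print $2, $3; exit }' "$2"
}

report=$(mktemp)
printf '%s\n' 'user101,smith,smith#123' 'user102,jones,passj#007' \
  'user103,albert,albpass#01' > "$report"
creds=$(lookup_user user101 "$report")
printf '%s\n' "$creds"
rm -f "$report"
```

Note that awk processes backslash escape sequences in a -v value, so an id containing backslashes would need extra care.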

AWK how to print more than one information and redirect all to a text file

Need help on this
I am creating a script that analyzes a file, and I want to use awk to print two pieces of information to an output text file.
I am able to print the first piece of information to my screen, but how can I print a second piece of information with the same awk command (for example, the number of lines in the analyzed file) and write both to a file called test.txt?
I tried this code and got the error "operator expected":
#!/usr/bin/perl
if ($#ARGV ==-1)
{
print "Enter the name of a file to analyze\n";
}
else
{
$fname = $ARGV[0];
open(FILE, $fname) || die ("cant open \n");
}
while($ligne=<FILE>)
{
chop ($ligne);
my ($elemnt1, $ellement2, $element3) = split (/ /, $ligne_);
}
system("awk '{print \$2 > "test.txt"}' $fname");
Try escaping the quotes on your last line, so test.txt is actually in the string passed to system.
system("awk '{print \$2 > \"test.txt\"}' $fname");
Edit: Adding the number of lines to the same file
The Awk variable NR ends up holding the number of lines in the input while the END rule is executing. Try this:
$outfile = '"test.txt"';
system("awk '{print \$2 > $outfile} END {print NR > $outfile}' $fname");
Notes:
Watch out that $outfile doesn't have any funny characters in the name.
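Unlike in shell, it's perfectly safe to use > both times: awk truncates the file only when it first opens it, and later output to the same file within the same run is appended.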
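A small stand-alone check of that behavior, run from the shell rather than from Perl. The sample data and the variable name out are invented for illustration; the file name is passed with -v so no nested quoting is needed.

```shell
infile=$(mktemp)
outfile=$(mktemp)
printf '%s\n' 'a one x' 'b two y' 'c three z' > "$infile"

# Both rules write to the same file with ">": the second field of every line,
# then the total line count (NR) once the END rule runs.
awk -v out="$outfile" '{ print $2 > out } END { print NR > out }' "$infile"

report=$(cat "$outfile")
printf '%s\n' "$report"
rm -f "$infile" "$outfile"
```

The count ends up on the last line of the file, after the second fields, rather than replacing them.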

join 2 lines only if field-1 are equals with sed or awk

input file:
$ cat t.txt
id1;value1_1
id1;value1_2
id2;value2_1
id3;value3_1
id4;value4_1
id4;value4_2
id5;value5_1
result would be:
id1;value1_1;id1;value1_2
id3;value3_1
id4;value4_1;id4;value4_2
id5;value5_1
using sed or awk. Please give your opinion.
Here's one way to do it:
awk -F';' 'BEGIN { getline; id=$1; line=$0 } { if ($1 != id) { print line; line = $0; } else { line = line ";" $0; } id=$1; } END { print line; }' t.txt
Explanation:
Set field separator to ;:
-F';'
Start by reading the first line of input (getline), save the first field ($1) as id, and the first line ($0) as line:
BEGIN { getline; id=$1; line=$0 }
For each line of input, check if the first field differs from the stored id:
if ($1 != id)
If it does, then print the saved line and store the new one ($0):
print line; line = $0;
Otherwise, append the new line to the stored line(s):
line = line ";" $0;
And save the new id:
id=$1
At the end, print whatever is left in line:
END { print line; }
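The steps above can also be written without getline in BEGIN. The sketch below is a variant, not the answer's exact code: it uses NR to detect the first line, and it prints nothing for empty input. The function name join_same_id is made up for illustration.

```shell
# Hypothetical helper: join consecutive lines that share the same first field.
join_same_id() {
  awk -F';' '
    # The id changed: flush the lines collected so far.
    NR > 1 && $1 != id { print line; line = "" }
    # Start a new group or append to the current one, then remember the id.
    { if (line == "") line = $0; else line = line ";" $0; id = $1 }
    # Flush the last group, if there was any input at all.
    END { if (NR > 0) print line }
  ' "$@"
}

joined=$(printf '%s\n' \
  'id1;value1_1' 'id1;value1_2' 'id2;value2_1' 'id3;value3_1' \
  'id4;value4_1' 'id4;value4_2' 'id5;value5_1' | join_same_id)
printf '%s\n' "$joined"
```

Like the original, this only joins lines that are adjacent, so the input is assumed to be grouped by id.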
I guess the id2 line is missing from your result example by mistake, right?
Anyway, you could try the awk line below:
awk -F';' '{a[$1]=($1 in a)?a[$1]";"$0:$0}END{for(x in a)print a[x]}' yourFile|sort
output would be:
id1;value1_1;id1;value1_2
id2;value2_1
id3;value3_1
id4;value4_1;id4;value4_2
id5;value5_1
This might work for you:
sed -e '1{h;d};H;${x;:a;s/\(\([^;]*;\)\([^\n]*\)\)\n\2/\1;\2/;ta;p};d' t.txt
Explanation:
Slurp file in to hold space (HS) then on end-of-file swap to the HS and using substitution concatenate lines with duplicate keys and print. N.B. lines normally printed are all deleted.
EDIT:
The above solution works (as far as I know) but for large volumes is not very fast (read incredibly slow). This solution is better:
# cat -A /tmp/t.txt
id1;value1_1$
id1;value1_2$
id2;value2_1$
id3;value3_1$
id4;value4_1$
id4;value4_2$
id5;value5_1$
# for x in {1..1000};do cat /tmp/t.txt;done |
> sed ':a;$!N;/^\([^;]*;\).*\n\1/s/\n/;/;ta;P;D'| sort | uniq
id1;value1_1;id1;value1_2
id2;value2_1
id3;value3_1
id4;value4_1;id4;value4_2
id5;value5_1

Perform action on line range in sed/awk

How can I extract certain variables from a specific range of lines in sed/awk?
Example: I want to extract the host and port from .tnsnames.ora,
from this section that starts at line 105.
DB_CONNECTION=
(description=
(address=
(protocol=tcp)
(host=127.0.0.1)
(port=1234)
)
(connect_data=
(sid=ABCD)
(sdu=4321)
)
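gawk accepts a regular expression as the field separator (FS).
The assignment $0=$2 is used as the final condition: it evaluates as true whenever $2 is non-empty (as it is on the host and port lines), so the script prints $2 automatically.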
$ gawk -F'[()]' 'NR>105&&NR<115&&(/host/||/port/)&&$0=$2' .tnsnames.ora
use:
sed '105,$< whatever sed code you want here >'
If you specifically want the host and the port you can do something like:
sed -n '105,115p' .tnsnames.ora | grep -e 'host=' -e 'port='
You can use address ranges to restrict the substitutions to one section of the file. To match through the end of the file, use $ as the end address (for example 105,$). You can also chain multiple expressions by passing -e several times. The following command prints just the port and host values to standard output; the back reference (\1) keeps only the captured part of each match.
sed -n -e '105,115s/(port=\([0-9].*\))/\1/p' -e '105,115s/(host=\([0-9\.].*\))/\1/p' .tnsnames.ora
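A self-contained sketch of the address-range approach, for trying it without a real .tnsnames.ora. The temporary file, its 104 filler lines and its indentation are invented so that the section starts at line 105 as in the question. The patterns here are a variant that also strips leading whitespace and accepts non-numeric values.

```shell
tns=$(mktemp)
# 104 filler lines, so that the section below starts at line 105.
awk 'BEGIN { for (i = 1; i <= 104; i++) print "# filler " i }' > "$tns"
cat >> "$tns" <<'EOF'
DB_CONNECTION=
  (description=
    (address=
      (protocol=tcp)
      (host=127.0.0.1)
      (port=1234)
    )
    (connect_data=
      (sid=ABCD)
      (sdu=4321)
    )
EOF

# Only lines 105-115 are considered; \1 keeps just the value inside the parentheses.
hostport=$(sed -n \
  -e '105,115s/.*(host=\([^)]*\)).*/\1/p' \
  -e '105,115s/.*(port=\([^)]*\)).*/\1/p' "$tns")
printf '%s\n' "$hostport"
rm -f "$tns"
```

The values come out in file order (host first here), because sed applies both expressions to each line before moving on to the next.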
@lk, to address the answer you posted:
You can write awk code like C, but it's more succinctly expressed as "pattern {action}" pairs.
If you have gawk or nawk, the field separator is an ERE as Hirofumi Saito said
gawk -F'[()=]' '
NR < 105 {next}
NR > 115 {exit}
$2 == "host" || $2 == "port" {
# do stuff with $2 and $3
print $2 "=" $3
}
' .tnsnames.ora