Compare two files ignoring a pattern, but display it - sed

I want to compare two files using diff or something else. Each line in the files starts with "line_x".
File:
line_1: This is line1
line_2: This is line2
....
I want to compare the files without the line_x prefix. Something like this:
diff <(sed '/line/,/:/g' diff1) <(sed '/line/,/:/g' diff2)
But when I print the differences I want to include the exact line_x that is different.
Is it possible to do that with awk or something else?
Thanks

You may try the following:
awk -f cmp.awk file1.txt file2.txt
where file1.txt and file2.txt are your input files and cmp.awk is
NR==FNR {
    $1 = ""
    b[FNR] = $0
    next
}
{
    $1 = ""
    if ($0 != b[FNR]) {
        printf "Line: %d\n", FNR
        printf " File 1: %s\n", b[FNR]
        printf " File 2: %s\n", $0
    }
}
If the lines in the two files are not in the same order, you could try keying on the line_x label instead:
NR==FNR {
    a = $1; $1 = ""
    b[a] = $0
    next
}
{
    a = $1; $1 = ""
    if ($0 != b[a]) {
        printf "%s\n", a
        printf " File 1: %s\n", b[a]
        printf " File 2: %s\n", $0
    }
}
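To see the first script in action end-to-end, here is a self-contained sketch (the file names and contents are made up for illustration); the awk program is the same same-order variant as cmp.awk above, inlined:

```shell
# Two small sample files whose second line differs
printf 'line_1: This is line1\nline_2: This is line2\n' > file1.txt
printf 'line_1: This is line1\nline_2: This is CHANGED\n' > file2.txt

# First pass (NR==FNR) caches file1's lines by line number with the
# line_x label blanked out; second pass compares against the cache
awk 'NR==FNR { $1=""; b[FNR]=$0; next }
     {
       $1 = ""
       if ($0 != b[FNR]) {
         printf "Line: %d\n", FNR
         printf " File 1:%s\n", b[FNR]
         printf " File 2:%s\n", $0
       }
     }' file1.txt file2.txt
```

This should report line 2 as the differing line, showing both versions.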


search for a key value pair and append the value to other keys in unix

I need to search for a key and append the value to every key:value pair in a Unix file
Input file data:
1A:trans_ref_id|10:account_no|20:cust_name|30:trans_amt|40:addr
1A:trans_ref_id|10A:ccard_no|20:cust_name|30:trans_amt|40:addr
My desired Output:
account_no|1A:trans_ref_id
account_no|10:account_no
account_no|20:cust_name
account_no|30:trans_amt
account_no|40:addr
ccard_no|1A:trans_ref_id
ccard_no|10A:ccard_no
ccard_no|20:cust_name
ccard_no|30:trans_amt
ccard_no|40:addr
Basically, I need the value of 10 or 10A appended to every key:value pair and split into new lines. To be clear, this won't always be the second field.
I am new to sed, awk and perl. I started with extracting the value using awk:
awk -v FS="|" -v key="59" '$2 == key {print $2}' target.txt
I need the value of 10 or 10A appended to every key:value pair
Going by these requirements, you may try this awk:
awk '
BEGIN { FS = OFS = "|" }
match($0, /\|10A?:[^|]+/) {
    s = substr($0, RSTART, RLENGTH)
    sub(/.*:/, "", s)
}
{
    for (i=1; i<=NF; ++i)
        print s, $i
}' file
account_no|1A:trans_ref_id
account_no|10:account_no
account_no|20:cust_name
account_no|30:trans_amt
account_no|40:addr
ccard_no|1A:trans_ref_id
ccard_no|10A:ccard_no
ccard_no|20:cust_name
ccard_no|30:trans_amt
ccard_no|40:addr
# Looks for 10 or 10A
perl -F'\|' -lane'my ($id) = map /^10A?:(.*)/s, @F; print "$id|$_" for @F'
# Looks for 10 or 10<non-digit><maybe more>
perl -F'\|' -lane'my ($id) = map /^10(?:\D[^:]*)?:(.*)/s, @F; print "$id|$_" for @F'
-n executes the program for each line of input.
-l removes LF on read and adds it on print.
-a splits the line on | (specified by -F) into @F.
The first statement extracts what follows : in the field with id 10 or 10-plus-something.
The second statement prints a line for each field.
If you are still stuck on where to get started: set the field separator and output field separator (FS and OFS) to '|', which splits each record into fields at every '|'. The fields are then available as $1, $2, ..., $NF. You want, e.g., account_no from field two ($2), so you split() field two on ':', saving the pieces in an array (a below). The second element, a[2], is what you use as the new field 1 in the output.
The rest is just looping over each field and printing a[2], a separator, and the current field. You can do that with:
awk 'BEGIN{FS=OFS="|"} {split ($2,a,":"); for(i=1;i<=NF;i++) print a[2],$i}' file
Example Use/Output
With your example input in file, the result would be:
account_no|1A:trans_ref_id
account_no|10:account_no
account_no|20:cust_name
account_no|30:trans_amt
account_no|40:addr
ccard_no|1A:trans_ref_id
ccard_no|10A:ccard_no
ccard_no|20:cust_name
ccard_no|30:trans_amt
ccard_no|40:addr
Which appears to be what you are after. Let me know if you have further questions.
"10" or "10A" at Unknown Field
You can handle the fields containing "10" and "10A" in any order. Just add a loop over the fields to determine which one holds "10" or "10A", and save the 2nd element of that field's split() result. The rest is the same, e.g.
awk '
BEGIN { FS = OFS = "|" }
{
    for (i=1; i<=NF; i++) {
        split($i, a, ":")
        if (a[1]=="10" || a[1]=="10A") {
            key = a[2]
            break
        }
    }
    for (i=1; i<=NF; i++)
        print key, $i
}
' file1
Example Input
1A:trans_ref_id|10:account_no|20:cust_name|30:trans_amt|40:addr
1A:trans_ref_id|20:cust_name|30:trans_amt|10A:ccard_no|40:addr
Example Use/Output
awk '
> BEGIN { FS=OFS="|" }
> { for (i=1;i<=NF;i++){
> split ($i,a,":")
> if (a[1]=="10"||a[1]=="10A"){
> key=a[2]
> break
> }
> }
> for (i=1;i<=NF;i++)
> print key, $i
> }
> ' file1
account_no|1A:trans_ref_id
account_no|10:account_no
account_no|20:cust_name
account_no|30:trans_amt
account_no|40:addr
ccard_no|1A:trans_ref_id
ccard_no|20:cust_name
ccard_no|30:trans_amt
ccard_no|10A:ccard_no
ccard_no|40:addr
This picks up the proper new field 1 for the output from the 4th field, which contains "10A" in the second line above.
Let me know if this is what you needed.
EDIT: To find the 10 or 10A value anywhere in the line and print accordingly, try the following:
awk '
BEGIN {
    FS = OFS = "|"
}
match($0, /(10|10A):[^|]*/) {
    split(substr($0, RSTART, RLENGTH), arr, ":")
}
{
    for (i=1; i<=NF; i++) {
        print arr[2], $i
    }
}' Input_file
Explanation: a detailed explanation of the above.
awk '                                       ##Starting awk program from here.
BEGIN{                                      ##Starting BEGIN section of this program.
  FS=OFS="|"                                ##Setting FS and OFS to | here.
}
match($0,/(10|10A):[^|]*/){                 ##Using match function to match either 10: till | OR 10A: till | here.
  split(substr($0,RSTART,RLENGTH),arr,":")  ##Splitting matched substring into array arr with delimiter of : here.
}
{
  for(i=1;i<=NF;i++){                       ##Running for loop for each field of each line.
    print arr[2],$i                         ##Printing 2nd element of arr, along with current field.
  }
}' Input_file                               ##Mentioning Input_file name here.
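A quick way to try this on two sample lines (Input_file is a throwaway name, and the second line is shortened but deliberately has 10A in its third field rather than the second):

```shell
printf '%s\n' \
  '1A:trans_ref_id|10:account_no|20:cust_name' \
  '1A:x|20:y|10A:ccard_no' > Input_file

# match() finds 10: or 10A: anywhere in the line; split() pulls out the
# value after the colon, which then prefixes every field of that line
awk 'BEGIN{FS=OFS="|"}
     match($0,/(10|10A):[^|]*/){ split(substr($0,RSTART,RLENGTH),arr,":") }
     { for(i=1;i<=NF;i++) print arr[2],$i }' Input_file
```

The first line's fields come out prefixed with account_no, the second line's with ccard_no, regardless of where the 10/10A field sits.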
With your shown samples, please try following.
awk '
BEGIN {
    FS = OFS = "|"
}
{
    split($2, arr, ":")
    print arr[2], $1
    for (i=2; i<=NF; i++) {
        print arr[2], $i
    }
}
' Input_file
Perl script implementation
use strict;
use warnings;
use feature 'say';

my $fname = shift || die "run as 'script.pl input_file key0 key1 ... key#'";
open my $fh, '<', $fname or die $!;
while ( <$fh> ) {
    chomp;
    my %data = split(/[:\|]/, $_);
    for my $key (@ARGV) {
        if ( $data{$key} ) {
            say "$data{$key}|$_" for split(/\|/, $_);
        }
    }
}
close $fh;
Run as script.pl input_file 10 10A
Output
account_no|1A:trans_ref_id
account_no|10:account_no
account_no|20:cust_name
account_no|30:trans_amt
account_no|40:addr
ccard_no|1A:trans_ref_id
ccard_no|10A:ccard_no
ccard_no|20:cust_name
ccard_no|30:trans_amt
ccard_no|40:addr
Here's an alternate perl solution:
perl -pe '($id) = /(?<![^|])10A?:([^|]+)/; s/([^|]+)[|\n]/$id|$1\n/g'
($id) = /(?<![^|])10A?:([^|]+)/ captures the string after 10: or 10A: and saves it in the $id variable. The first such match in the line is used; the lookbehind ensures the match starts at the beginning of a field.
s/([^|]+)[|\n]/$id|$1\n/g then prefixes every field with the value in $id and a | character, putting each field on its own line.

Find matches in a log file based on the time and ID

I have a radius log file which is comma separated.
"1/3/2013","00:52:23","NASK","Stop","15444111111","200","15444111111","15444111111","10.142.98.190","moen",,,,,"D89BA1F93E5DC400",,,"31026","216.155.166.8","310260010265999",,"10.184.81.145","780246","18","ATGGSN17","2","7",,,"1385772885",,
"1/3/2013","00:52:23","NASK","Start","15444111111","200","15444111111","15444111111","10.142.98.190","moen",,,,,"D89BA1F93E5DC500",,,"31026","216.155.166.8","310260010265999",,"10.184.81.145","780246","18","A","2","7",,,"1385772885",,
Is it possible through any Linux command line tool like awk to count the number of occurrences where the second column (the time) and the seventh column (the number) are the same, and a Start event follows a Stop event?
I want to find the occurrences where a Stop is followed by a Start at the same time for the same number.
There will be other entries as well with the same timestamp between these cases.
You don't say very clearly what kind of result you want, but you should use Perl with Text::CSV to process CSV files.
This program just prints the three relevant fields from all lines of the file where the event is Start or Stop and the time and the ID string are duplicated.
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new;
open my $fh, '<', 'text.csv' or die $!;

my %data;
while (my $row = $csv->getline($fh)) {
    my ($time, $event, $id) = @$row[1,3,6];
    next unless $event eq 'Start' or $event eq 'Stop';
    push @{ $data{"$time/$id"} }, $row;
}

for my $lines (values %data) {
    next unless @$lines > 1;
    print "@{$_}[1,3,6]\n" for @$lines;
    print "\n";
}
output
00:52:23 Stop 15444111111
00:52:23 Start 15444111111
I have tried the following using GNU sed & awk
sed -n '/Stop/,/Start/{/Stop/{h};/Start/{H;x;p}}' text.csv \
| awk -F, 'NR%2 != 0 {prev=$0;time=$2;num=$7} \
NR%2 == 0 {if($2==time && $7==num){print prev,"\n", $0}}'
The sed part selects pairing Stop and Start lines. There may (or may not) be other lines between the two, and if there are multiple Stop lines before a Start line, the last Stop line is selected (this may not be necessary in this case...).
The awk part compares the pairs selected by the sed part: if the second and seventh columns are identical, the pair is printed.
My test as below:
text.csv:
"1/3/2013","00:52:20","NASK","Stop","15444111111","200","15444111111","15444111111","10.142.98.190","moen",,,,,"D89BA1F93E5DC400",,,"31026","216.155.166.8","310260010265999",,"10.184.81.145","780246","18","ATGGSN17","2","7",,,"1385772885",,
"1/3/2013","00:52:23","NASK","XXXX","15444111111","200","15444111111","15444111111","10.142.98.190","moen",,,,,"D89BA1F93E5DC400",,,"31026","216.155.166.8","310260010265999",,"10.184.81.145","780246","18","ATGGSN17","2","7",,,"1385772885",,
"1/3/2013","00:52:23","NASK","Stop","15444111111","200","15444111111","15444111111","10.142.98.190","moen",,,,,"D89BA1F93E5DC400",,,"31026","216.155.166.8","310260010265999",,"10.184.81.145","780246","18","ATGGSN17","2","7",,,"1385772885",,
"1/3/2013","00:52:23","NASK","XXXX","15444111111","200","15444111111","15444111111","10.142.98.190","moen",,,,,"D89BA1F93E5DC400",,,"31026","216.155.166.8","310260010265999",,"10.184.81.145","780246","18","ATGGSN17","2","7",,,"1385772885",,
"1/3/2013","00:52:23","NASK","Start","15444111111","200","15444111111","15444111111","10.142.98.190","moen",,,,,"D89BA1F93E5DC500",,,"31026","216.155.166.8","310260010265999",,"10.184.81.145","780246","18","A","2","7",,,"1385772885",,
"1/3/2013","00:52:28","NASK","Stop","15444111111","200","15444111111","15444111111","10.142.98.190","moen",,,,,"D89BA1F93E5DC400",,,"31026","216.155.166.8","310260010265999",,"10.184.81.145","780246","18","ATGGSN17","2","7",,,"1385772885",,
"1/3/2013","00:52:29","NASK","Start","15444111111","200","15444111111","15444111111","10.142.98.190","moen",,,,,"D89BA1F93E5DC500",,,"31026","216.155.166.8","310260010265999",,"10.184.81.145","780246","18","A","2","7",,,"1385772885",,
The output:
"1/3/2013","00:52:23","NASK","Stop","15444111111","200","15444111111","15444111111","10.142.98.190","moen",,,,,"D89BA1F93E5DC400",,,"31026","216.155.166.8","310260010265999",,"10.184.81.145","780246","18","ATGGSN17","2","7",,,"1385772885",,
"1/3/2013","00:52:23","NASK","Start","15444111111","200","15444111111","15444111111","10.142.98.190","moen",,,,,"D89BA1F93E5DC500",,,"31026","216.155.166.8","310260010265999",,"10.184.81.145","780246","18","A","2","7",,,"1385772885",,
If the "stop" line is followed immediately by the "start" line, you could try the following:
awk -f cnt.awk input.txt
where cnt.awk is
BEGIN {
    FS = ","
}
$4=="\"Stop\"" {
    key = ($2 $5)
    stopl = $0
    getline
    if ($4=="\"Start\"") {
        if (key == ($2 $5)) {
            print stopl
            print $0
        }
    }
}
Update
If there can be other lines between a "Start" and "Stop" line, you could try:
BEGIN {
    FS = ","
}
$4=="\"Stop\"" {
    a[($2 $5)] = $0
    next
}
$4=="\"Start\"" {
    key = ($2 $5)
    if (key in a) {
        sl[++i] = a[key]
        el[i] = $0
    }
}
END {
    nn = i
    for (i=1; i<=nn; i++) {
        print sl[i]
        print el[i]
    }
}
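Here is a sketch of the same idea on a trimmed-down sample (only fields 2, 4 and 5 matter to the script; real log lines are much longer, and the inline variant below prints matched pairs immediately instead of collecting them for the END block):

```shell
cat > radius.csv <<'EOF'
"1/3/2013","00:52:23","NASK","Stop","15444111111","200"
"1/3/2013","00:52:23","NASK","Other","15444111111","200"
"1/3/2013","00:52:23","NASK","Start","15444111111","200"
"1/3/2013","00:53:00","NASK","Start","15444111111","200"
EOF

# Stop lines are cached by (time, number); a later Start line with the
# same key pairs up with its Stop, even with other lines in between
awk 'BEGIN{FS=","}
     $4=="\"Stop\""  { a[($2 $5)] = $0; next }
     $4=="\"Start\"" { if (($2 $5) in a) { print a[($2 $5)]; print $0 } }' radius.csv
```

Only the 00:52:23 Stop/Start pair is printed; the 00:53:00 Start has no matching Stop and is ignored.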

AWK: how to print more than one piece of information and redirect all to a text file

Need help on this.
I am creating a script that will analyse a file, and I want to use awk to print two pieces of information to an output txt file.
I am able to print the first piece of information to my screen, but how can I print another piece (for example, the number of lines of the analysed file) with the same awk command, and output both to a file called test.txt?
I tried this code and get the error: operator expected
#!/usr/bin/perl
if ($#ARGV ==-1)
{
print "Saisissez un nom de fichier a nalyser \n";
}
else
{
$fname = $ARGV[0];
open(FILE, $fname) || die ("cant open \n");
}
while($ligne=<FILE>)
{
chop ($ligne);
my ($elemnt1, $ellement2, $element3) = split (/ /, $ligne_);
}
system("awk '{print \$2 > "test.txt"}' $fname");
Try escaping the quotes on your last line, so test.txt is actually in the string passed to system.
system("awk '{print \$2 > \"test.txt\"}' $fname");
Edit: Adding the number of lines to the same file
The Awk variable NR ends up holding the number of lines in the input while the END rule is executing. Try this:
$outfile = '"test.txt"';
system("awk '{print \$2 > $outfile} END {print NR > $outfile}' $fname");
Notes:
Watch out that $outfile doesn't have any funny characters in the name.
Unlike in the shell, it's perfectly safe to use > both times: awk truncates the file only when it is first opened, and later prints to the same name append to the still-open file.
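A small demonstration of that redirection behaviour (input.txt and test.txt are throwaway names here):

```shell
printf 'a 1\nb 2\nc 3\n' > input.txt

# Print the second column of every line to test.txt, then the line
# count; awk truncates test.txt once and keeps it open afterwards,
# so the END print appends rather than overwriting
awk '{print $2 > "test.txt"} END {print NR > "test.txt"}' input.txt
cat test.txt
```

test.txt ends up with four lines: the three column values followed by the line count 3.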

split file into single lines via delimiter

Hi I have the following file:
>101
ADFGLALAL
GHJGKGL
>102
ASKDDJKJS
KAKAKKKPP
>103
AKNCPFIGJ
SKSK
etc etc;
and I need it in the following format:
>101
ADFGLALALGHJGKGL
>102
ASKDDJKJSKAKAKKKPP
>103
AKNCPFIGJSKSK
how can I do this? perhaps a perl one liner?
Thanks very much!
perl -npe 'chomp if ($.!=1 && !s/^>/\n>/)' input
Remove the trailing newline (chomp) if this is not the first line ($. != 1) and the line does not start with > (the substitution s/^>/\n>/ fails). When a line does start with >, the substitution itself inserts a newline before it, so the previous joined sequence line is terminated.
perl -lne '
if (/^>/) {print}
else{
if ($count) {
print $string . $_;
$count = 0;
} else {
$string = $_;
$count++;
}
}
' file.txt

Print Valid words with _ in between them

I have done my research, but not able to find the solution to my problem.
I am trying to extract all valid words(Starting with a letter) in a string and concatenate them with underscore("_"). I am looking for solution with awk, sed or grep, etc.
Something like:
echo "The string under consideration" | (awk/grep/sed) (pattern match)
Example 1
Input:
1.2.3::L2 Traffic-house seen during ABCD from 2.2.4/5.2.3a to 1.2.3.X11
Desired output:
L2_Traffic_house_seen_during_ABCD_from
Example 2
Input:
XYZ-2-VRECYY_FAIL: Verify failed - Client 0x880016, Reason: Object exi
Desired Output:
XYZ_VRECYY_FAIL_Verify_failed_Client_Reason_Object_exi
Example 3
Input:
ABCMGR-2-SERVICE_CRASHED: Service "abcmgr" (PID 7582) during UPGRADE
Desired Output:
ABCMGR_SERVICE_CRASHED_Service_abcmgr_PID_during_UPGRADE
This might work for you (GNU sed):
sed 's/[[:punct:]]/ /g;s/\<[[:alpha:]]/\n&/g;s/[^\n]*\n//;s/ [^\n]*//g;y/\n/_/' file
A perl one-liner. It searches for an alphabetic character followed by any number of word characters, enclosed in word boundaries. The /g flag finds every match on each line.
Content of infile:
1.2.3::L2 Traffic-house seen during ABCD from 2.2.4/5.2.3a to 1.2.3.X11
XYZ-2-VRECYY_FAIL: Verify failed - Client 0x880016, Reason: Object exi
ABCMGR-2-SERVICE_CRASHED: Service "abcmgr" (PID 7582) during UPGRADE
Perl command:
perl -ne 'printf qq|%s\n|, join qq|_|, (m/\b([[:alpha:]]\w*)\b/g)' infile
Output:
L2_Traffic_house_seen_during_ABCD_from_to_X11
XYZ_VRECYY_FAIL_Verify_failed_Client_Reason_Object_exi
ABCMGR_SERVICE_CRASHED_Service_abcmgr_PID_during_UPGRADE
One way using awk, with the contents of script.awk:
BEGIN {
    FS = "[^[:alnum:]_]"
}
{
    for (i=1; i<=NF; i++) {
        if ($i !~ /^[0-9]/ && $i != "") {
            if (i < NF) {
                printf "%s_", $i
            }
            else {
                print $i
            }
        }
    }
}
Run like:
awk -f script.awk file.txt
Alternatively, here is the one liner:
awk -F "[^[:alnum:]_]" '{ for (i=1; i<=NF; i++) { if ($i !~ /^[0-9]/ && $i != "") { if (i < NF) printf "%s_", $i; else print $i; } } }' file.txt
Results:
L2_Traffic_house_seen_during_ABCD_from_to_X11
XYZ_VRECYY_FAIL_Verify_failed_Client_Reason_Object_exi
ABCMGR_SERVICE_CRASHED_Service_abcmgr_PID_during_UPGRADE
This solution requires some tuning, and I think it needs gawk, which supports a regexp as the record separator:
http://www.gnu.org/software/gawk/manual/html_node/Records.html#Records
gawk -v ORS='_' -v RS='[-: \"()]' '/^[a-zA-Z]/' file.dat