Exclude e-mails whose domain name matches a global one - sed

Global domains are marked with "*#". When an e-mail's domain matches one of these global domains, I need to exclude it from the list.
Example:
WF,*#stackoverflow.com
WF,*#superuser.com
WF,*#stackexchange.com
WF,test#superuser.com
WF,test#stackapps.com
WF,test#stackexchange.com
Output:
WF,*#stackoverflow.com
WF,*#superuser.com
WF,*#stackexchange.com
WF,test#stackapps.com

You have two types of data in the same file, so the easiest way to process it is to split it in two first:
<infile tee >(grep '\*#' > global) >(grep -v '\*#' > addr) > /dev/null
Then use global to remove information from addr:
grep -vf <(cut -d# -f2 global) addr
Putting it together:
<infile tee >(grep '\*#' > global) >(grep -v '\*#' > addr) > /dev/null
cat global <(grep -vf <(cut -d# -f2 global) addr) > outfile
Contents of outfile:
WF,*#stackoverflow.com
WF,*#superuser.com
WF,*#stackexchange.com
WF,test#stackapps.com
Clean up temporary files with rm global addr.
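If you'd rather not hard-code the temporary file names, the same approach works with mktemp (a sketch, assuming bash; as above, run the commands one at a time so the process substitutions have finished before global is read back):
global=$(mktemp)
addr=$(mktemp)
<infile tee >(grep '\*#' > "$global") >(grep -v '\*#' > "$addr") > /dev/null
cat "$global" <(grep -vf <(cut -d# -f2 "$global") "$addr") > outfile
rm "$global" "$addr"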

$ awk -F, 'NR==FNR && /\*#/{a[substr($2,3)]=1;print;next}NR!=FNR && $2 !~ /^\*/{x=$2;sub(/.*#/,"",x); if (!(x in a))print;}' OFS=, file file
WF,*#stackoverflow.com
WF,*#superuser.com
WF,*#stackexchange.com
WF,test#stackapps.com
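The one-liner above, spelled out with comments (a sketch; file is the input, passed twice):
awk -F, '
NR==FNR {                      # first pass
    if (/\*#/) {               # global line: remember its domain, print it, move on
        a[substr($2,3)] = 1    # substr strips the leading "*#" from $2
        print
    }
    next
}
$2 !~ /^\*/ {                  # second pass, address lines only
    x = $2
    sub(/.*#/, "", x)          # keep just the domain part
    if (!(x in a)) print       # print addresses whose domain is not global
}' file file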

You could do:
grep -o "\*#.*" file.txt | sed -e 's/^/[^*]/' > global.txt
grep -vf global.txt file.txt
This starts by extracting the global domains and turning each into a pattern of the form [^*]#global.domain.com, saved into global.txt. The leading * is rewritten to [^*]: if it were left in place unescaped, it would act as a regex quantifier and the patterns would match the global lines themselves. Each line of global.txt is then used as a regex by grep, and the -v option tells grep to only print lines that don't match any of those patterns.
An analogous option, using sed for in-place editing, would be:
grep -o "\*#.*" file.txt | sed -e 's/^.*$/\/[^*]&\/d/' > global.sed
sed -i -f global.sed file.txt
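For the sample input, global.sed would contain one delete command per global domain:
/[^*]#stackoverflow.com/d
/[^*]#superuser.com/d
/[^*]#stackexchange.com/d
Each pattern requires a non-* character immediately before the #, so the global lines themselves are left untouched.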

Here's one way using GNU awk. Run like:
awk -f script.awk file.txt{,}
(file.txt{,} is Bash brace expansion, shorthand for passing file.txt twice.)
Contents of script.awk:
BEGIN {
    FS=","
}
FNR==NR {
    if (substr($NF,1,1) == "*") {
        array[substr($NF,2)]++
    }
    next
}
substr($NF,1,1) == "*" || !(substr($NF,index($NF,"#")) in array)
Results:
WF,*#stackoverflow.com
WF,*#superuser.com
WF,*#stackexchange.com
WF,test#stackapps.com
Alternatively, here's the one-liner:
awk -F, 'FNR==NR { if (substr($NF,1,1) == "*") array[substr($NF,2)]++; next } substr($NF,1,1) == "*" || !(substr($NF,index($NF,"#")) in array)' file.txt{,}

This might work for you (GNU sed):
sed '/.*\*\(#.*\)/!d;s||/[^*]\1/d|' file | sed -f - file
The first sed deletes the non-global lines and rewrites each global line into a delete command of the form /[^*]#domain/d (the empty s|| reuses the previous regex); the second sed reads that generated script from stdin via -f - and applies it to the file.

With one pass of the file and allowing for the global domains to be intermixed with the addresses:
$ cat file
WF,*#stackoverflow.com
WF,test#superuser.com
WF,*#superuser.com
WF,test#stackapps.com
WF,test#stackexchange.com
WF,*#stackexchange.com
WF,foo#stackapps.com
$
$ awk -F'[,#]' '
$2=="*" { glbl[$3]; print; next }
{ addrs[$3] = addrs[$3] $0 ORS }
END {
    for (dom in addrs)
        if (!(dom in glbl))
            printf "%s", addrs[dom]
}
' file
WF,*#stackoverflow.com
WF,*#superuser.com
WF,*#stackexchange.com
WF,test#stackapps.com
WF,foo#stackapps.com
or if you don't mind a 2-pass approach:
$ awk -F'[,#]' '(NR==FNR && $2=="*" && !glbl[$3]++) || (NR!=FNR && !($3 in glbl))' file file
WF,*#stackoverflow.com
WF,*#superuser.com
WF,*#stackexchange.com
WF,test#stackapps.com
WF,foo#stackapps.com
I know that second one's a bit cryptic, but it's easily translated to not use the default action, and it's a good exercise in awk idioms :-).
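A sketch of that translation, with the two passes and the default print action spelled out:
awk -F'[,#]' '
NR==FNR {                          # first pass: collect the global domains
    if ($2 == "*" && !glbl[$3]++)
        print                      # print each global line the first time it is seen
    next
}
!($3 in glbl) { print }            # second pass: keep addresses whose domain is not global
' file file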

Related

How to find a variable and replace it with another variable in Perl?

I have tried the Perl command below to find the $from_word variable and replace it with the $to_word variable in the file $bat_file_path.
system("perl -i -p -e 's/$from_word/$to_word/ee' $bat_file_path");
but I get this error:
Substitution replacement not terminated at -e line 4.
Also, it did not replace as expected.
Please help me out with this.
sub change_cg_name {
    if (!-e $output_running) {
        print("show running file not available. test case failed. [$output_running]");
        return 0;
    }
    if (!-e $bat_file_path) {
        print("bat file not available. test case failed. [$bat_file_path]");
        return 0;
    }
    my $from_word = `grep 'config-group type node IMPT_' $bat_file_path | awk '{print \$(4)}'`;
    my $to_word = `grep 'config-group type node IMPT_' $output_running | awk '{print \$(4)}'`;
    print("from WORD IS [$from_word]");
    print("TO WORD IS [$to_word]");
    if ($to_word ne "") {
        if (index($to_word, "IMPT_") != -1) {
            system("perl -i -p -e 's/"$from_word"/"$to_word"/ee' $bat_file_path");
            system("perl -p -i -e 's/\r\n$/\n/g' $bat_file_path");
            print("ARUL changed the impt name in the bat file [$to_word] and file [$bat_file_path]");
            return 0;
        }
    }
}
Change this line:
system("perl -i -p -e 's/"$from_word"/"$to_word"/ee' $bat_file_path");
to this:
system("perl -i -p -e 's/$from_word/$to_word/ee' $bat_file_path");
serenesat is right in removing the incorrectly nested quotation marks.
glenn jackman is right in pointing to the line endings in the $from_word and $to_word values. Instead of removing them, I suggest not to produce them in the first place by changing the awk command print \$(4) to printf "%s", \$(4).
Finally, in the command system("perl -p -i -e 's/\r\n$/\n/g' $bat_file_path") the \n and $ need \-escaping: system("perl -p -i -e 's/\r\\n\$/\\n/g' $bat_file_path").
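Putting the three fixes together, the relevant lines would look something like this (a sketch; I have also dropped the /ee modifier, which evaluates the replacement as Perl code twice and would fail on plain text values like these):
my $from_word = `grep 'config-group type node IMPT_' $bat_file_path | awk '{printf "%s", \$(4)}'`;
my $to_word = `grep 'config-group type node IMPT_' $output_running | awk '{printf "%s", \$(4)}'`;
system("perl -i -p -e 's/$from_word/$to_word/' $bat_file_path");
system("perl -p -i -e 's/\r\\n\$/\\n/g' $bat_file_path");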

Make some replacements on a bunch of files depending on the number of columns per line

I'm having a problem dealing with some files. I need to count the columns on every line of a file and, depending on the number of columns, append several ',' to the end of each line. All lines should end up with 36 columns separated by ','.
This line solves my problem, but how do I run it over several files in a folder in an automated way?
awk ' BEGIN { FS = "," } ;
{if (NF == 32) { print $0",,,," } else if (NF==31) { print $0",,,,," }
}' <SOURCE_FILE> > <DESTINATION_FILE>
The answer depends on your OS, which you haven't told us. On UNIX and assuming you want to modify each original file, it'd be:
for file in *
do
    awk '...' "$file" > tmp$$ && mv tmp$$ "$file"
done
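For example, with the awk program from the question dropped in (a sketch; note the added catch-all print, without which lines that already have 36 columns would be silently discarded):
for file in *
do
    awk 'BEGIN { FS = "," }
         NF == 32 { print $0",,,,"; next }
         NF == 31 { print $0",,,,,"; next }
         { print }' "$file" > tmp$$ && mv tmp$$ "$file"
done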
Also, in general to get all records in a file to have the same number of fields you can do this without needing to specify what that number of fields is (though you can if appropriate):
$ cat tst.awk
BEGIN { FS=OFS=","; ARGV[ARGC]=ARGV[ARGC-1]; ARGC++ }
NR==FNR { nf = (NF > nf ? NF : nf); next }
{
    tail = sprintf("%*s", nf-NF, "")
    gsub(/ /, OFS, tail)
    print $0 tail
}
$
$ cat file
a,b,c
a,b
a,b,c,d,e
$
$ awk -f tst.awk file
a,b,c,,
a,b,,,
a,b,c,d,e
$
$ awk -v nf=10 -f tst.awk file
a,b,c,,,,,,,
a,b,,,,,,,,
a,b,c,d,e,,,,,
It's a short one-liner with Perl:
perl -i.bak -F, -alpe '$_ .= "," x (36-@F)' *
If this is only a single folder without subfolders, use:
for oldfile in /path/to/files/*
do
    newfile="${oldfile}.new"
    awk '...' "${oldfile}" > "${newfile}"
done
If you also want to include subdirectories recursively, it's probably easiest to put the awk command and its redirection into a small shell script, like this:
#!/bin/bash
oldfile="$1"
newfile="${oldfile}.new"
awk '...' "${oldfile}" > "${newfile}"
and then run this script (let's call it runawk.sh) via find:
find /path/to/files/ -type f -not -name "*.new" -exec runawk.sh \{\} \;
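Note that find can only run runawk.sh if it is executable and on your PATH; if it sits in the current directory instead, invoke it with an explicit path:
chmod +x runawk.sh
find /path/to/files/ -type f -not -name "*.new" -exec ./runawk.sh \{\} \;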

Remove lines from AWK output

I would like to remove lines that have fewer than 2 columns from a file:
awk '{ if (NF < 2) print}' test
one two
Is there a way to store these lines in a variable and then remove them with xargs and sed, something like
awk '{ if (NF < 2) VARIABLE}' test | xargs sed -i /VARIABLE/d
GNU sed
I would like to remove lines that have fewer than 2 columns
fewer than 2 = remove lines with only one column
sed -r '/^\s*\S+\s+\S+/!d' file
If you would like to split the input into two files (named "pass" and "fail"), based on condition:
awk '{if (NF > 1 ) print > "pass"; else print > "fail"}' input
If you simply want to filter/remove lines with NF < 2:
awk '(NF > 1){print}' input
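If you want to modify the file itself rather than route the output through xargs and sed, GNU awk 4.1+ can edit in place (a sketch; gawk assumed):
gawk -i inplace 'NF > 1' test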

Unix - Removing everything after a pattern using sed

I have a file which looks like below:
memory=500G
brand=HP
color=black
battery=5 hours
For every line, I want to remove everything after the =, including the = itself.
Eventually, I want to get something like:
memory:brand:color:battery:
(All on one line with colons after every word)
Is there a one-line sed command that I can use?
sed -e ':a;N;$!ba;s/=.\+\n\?/:/mg' /my/file
Adapted from this fine answer.
To be frank, however, I'd find something like this more readable:
cut -d = -f 1 /my/file | tr \\n :
Here's one way using GNU awk:
awk -F= '{ printf "%s:", $1 } END { printf "\n" }' file.txt
Result:
memory:brand:color:battery:
If you don't want a colon after the last word, you can use GNU sed like this:
sed -n 's/=.*//; H; $ { g; s/\n//; s/\n/:/g; p }' file.txt
Result:
memory:brand:color:battery
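Spelled out, that command collects the words in sed's hold space and joins them at the end (a commented sketch of the same script):
sed -n '
    # strip the = and everything after it
    s/=.*//
    # append the word to the hold space
    H
    # on the last line: copy the hold space back, drop the leading
    # newline that H always adds, join the words with colons, print
    $ {
        g
        s/\n//
        s/\n/:/g
        p
    }' file.txt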
This might work for you (GNU sed):
sed -i ':a;$!N;s/=[^\n]*\n\?/:/;ta' file
perl -F= -ane '{print $F[0].":"}' your_file
tested below:
> cat temp
abc=def,100,200,dasdas
dasd=dsfsf,2312,123,
adasa=sdffs,1312,1231212,adsdasdasd
qeweqw=das,13123,13,asdadasds
dsadsaa=asdd,12312,123
> perl -F= -ane '{print $F[0].":"}' temp
abc:dasd:adasa:qeweqw:dsadsaa:
My command is:
First step:
sed -r 's/([a-z]+)(=.*)/\1:/g' Filename > a
cat a
memory:
brand:
color:
battery:
Second step:
sed -e 'N;s/\n//' a | sed -e 'N;s/\n//'
My output is
memory:brand:color:battery:

How can I add the current date or time to end of each line in file?

I have a file called data.txt.
I want to add the current date, or time, or both to the beginning or end of each line.
I have tried this:
awk -v v1=$var ' { printf("%s,%s\n", $0, v1) } ' data.txt > data.txt
I have tried this:
sed "s/$/,$var/" data.txt
Nothing works.
Can someone help me out here?
How about:
cat filename | sed "s/$/ `date`/"
The problem with this
awk -v v1=$var ' { printf("%s,%s\n", $0, v1) } ' data.txt > data.txt
is that the > redirection happens first, and the shell truncates the file. Only then does the shell exec awk, which then reads an empty file.
Choose one of these:
sed -i "s/\$/ $var/" data.txt
awk -v "date=$var" '{print $0, date}' data.txt > tmpfile && mv tmpfile data.txt
However, does your $var contain slashes (such as "10/04/2011 12:34") ? If yes, then choose a different delimiter for sed's s/// command: sed -i "s#\$# $var#" data.txt
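For example, to append the current date and time to every line (a sketch using GNU sed's -i, with # as the delimiter precisely because the date contains slashes):
var=$(date '+%m/%d/%Y %H:%M')
sed -i "s#\$# $var#" data.txt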