Remove duplicate numbers within parentheses using sed

I am trying to remove duplicate numbers within parentheses using sed.
So I have the following string:
Abdc 1234 1234 (5678) (5678) (9012) (9012) (3456)
I want to use sed to remove any repeated 4-digit numbers within parentheses, including the parentheses. So my string should look like this:
Abdc 1234 1234 (5678) (9012) (3456)
In this case the second "(5678)" and the second "(9012)" were removed because they were 4-digit numbers within parentheses that repeated. The "1234" numbers were not removed because they were not within parentheses. The "(3456)" was not removed because it was not repeated.

I do not know how to do this with sed but you could try the following with awk:
$ echo "Abdc 1234 1234 (5678) (5678) (9012) (9012) (3456)" | awk '
{
    for (i = 1; i <= NF; i++) {
        # awk strings are 1-indexed, so the first character is substr($i, 1, 1)
        if (substr($i, 1, 1) != "(" || seen[$i] != 1) {
            seen[$i] = 1
            printf "%s ", $i
        }
    }
    print ""
}'
Output:
Abdc 1234 1234 (5678) (9012) (3456)
This loops through the fields of the line and prints a field if it does not start with ( or if it has not been seen before.
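A variant of the same idea, as a minimal sketch rather than a definitive version: the function name dedupe_paren_fields is made up for illustration, fields are assumed to be separated by whitespace, and the seen array is cleared for every line so that duplicates are tracked per line. It also joins the kept fields itself, so no trailing space is printed.

```shell
# Hypothetical helper: drop repeated "(...)" fields on each input line.
dedupe_paren_fields() {
  awk '{
    split("", seen)                 # portable way to empty the array per line
    out = ""
    for (i = 1; i <= NF; i++) {
      # skip a parenthesized field that was already seen on this line
      if ($i ~ /^\(.*\)$/ && seen[$i]++) continue
      out = out (out == "" ? "" : " ") $i
    }
    print out
  }'
}

echo 'Abdc 1234 1234 (5678) (5678) (9012) (9012) (3456)' | dedupe_paren_fields
```

Because the array is reset on every line, the same parenthesized number may still appear on different lines of a multi-line input.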

This works for your input:
echo 'Abdc 1234 1234 (5678) (5678) (9012) (9012) (3456)' |
sed 's/\(([0-9][0-9]*)\) \1/\1/g'
It assumes duplicates follow each other. If that is not the case, use this version:
echo 'Abdc 1234 1234 (5678) (5678) (9012) (9012) (3456)' |
sed 's/\(([0-9][0-9]*)\) \(.*\)\1/\1\2/g'
Or a bit shorter with GNU sed extended expressions:
echo 'Abdc 1234 1234 (5678) (5678) (9012) (9012) (3456)' |
sed -r 's/(\([0-9]+\)) (.*)\1/\1\2/g'
Output in all cases:
Abdc 1234 1234 (5678) (9012) (3456)
Edit - handle situation where more than two identical items exist
This can be done by looping over the pattern until it no longer matches:
echo 'Abdc 1234 1234 (5678) (5678) (9012) (9012) (3456) (5678) (5678)' |
sed -r ':a; s/(\([0-9]+\))(.*)\1 ?/\1\2/g; ta'
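The looping idea can be tried on the longer input above. The sketch below is a slight variation and not the exact command from this answer: it uses sed -E, drops the g flag (the loop already repeats the substitution), and matches the space in front of the later copy so that no trailing blank is left behind. It assumes GNU sed (back-references in extended regular expressions, and ; after a label); dedupe_parens is an illustrative name.

```shell
# Hypothetical helper: repeatedly delete a later copy of a parenthesized
# number (together with the space before it) until nothing matches any more.
dedupe_parens() {
  sed -E ':a; s/(\([0-9]+\))(.*) \1/\1\2/; ta'
}

echo 'Abdc 1234 1234 (5678) (5678) (9012) (9012) (3456) (5678) (5678)' | dedupe_parens
# prints: Abdc 1234 1234 (5678) (9012) (3456)
```

Each pass removes one later copy; ta jumps back to :a as long as the last substitution changed the line.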

Using Perl:
$ echo "Abdc 1234 1234 (5678) (5678) (9012) (9012) (3456)" |
perl -ne '
    my (@arr, %hash);
    for (split) {
        if (/^\(.*\)/) {
            $hash{$_}++;
            push @arr, $_ if $hash{$_} == 1;
        }
        else {
            push @arr, $_;
        }
    }
    print join " ", @arr, "\n";
'
This works with multi-line input and with any number of repetitions of the parenthesized items.

This might work for you (GNU sed):
sed ':a;s/\(\(([0-9]\+) *\).*\)\2/\1/g;ta' file

awk -F"(" '{for(i in a)delete a[i];for(i=2;i<=NF;i++){if($i in a){$i="";}else{a[$i];$i="("$i}}print $0}' your_file
Tested below:
input:
> cat temp
Abdc 1234 1234 (5678) (5678) (9012) (9012) (3456)
1234 1234 (1234) (5678) (9012) (1234) (3456)
(5678) (6467) (6467) (9012) (5678)
Now the execution:
> awk -F"(" '{for(i in a)delete a[i];for(i=2;i<=NF;i++){if($i in a){$i="";}else{a[$i];$i="("$i}}print $0}' temp
Abdc 1234 1234 (5678) (9012) (3456)
1234 1234 (1234) (5678) (9012) (3456)
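(5678) (6467) (9012) (5678)
Note that the trailing (5678) on the third line survives: as the last field it has no trailing space, so it does not compare equal to the earlier "5678) " field.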

Related

filter data in text file and load into postgresql

I have a text file with the below format:
Text: htpps:/xxx
Expiry: ddmm/yyyy
object_id: 00
object: ABC
auth: 333
RequestID: 1234
Text: htpps:/yyy
Expiry: ddmm/yyyy
object_id: 01
object: NNN
auth: 222
RequestID: 3456
and so on
...
I want to delete all lines except those with the prefix "Expiry:", "object:" or "object_id:",
then load the result into a table in PostgreSQL.
Would really appreciate your help on the above two.
thanks
Nick

I'm sure there are other methods, but here is a pipeline approach that works if objects are separated by an empty line and every object has the same format of
Text: htpps:/xxx
Expiry: ddmm/yyyy
object_id: 00
object: ABC
auth: 333
RequestID: 1234
Then you can transform the above with
more test.txt | awk '{ printf "%s\n", $2 }' | tr '\n' ',' | sed 's/,,/\n/g' | sed '$ s/.$//'
and, for your example, it will generate the entries in CSV format:
htpps:/xxx,ddmm/yyyy,00,ABC,333,1234
htpps:/yyy,ddmm/yyyy,01,NNN,222,3456
The above code does the following:
awk '{ printf "%s\n", $2 }': prints only the second field of each row (an empty line stays empty)
tr '\n' ',': turns newlines into commas
sed 's/,,/\n/g': turns each double comma, left behind by an empty line between objects, into a newline (GNU sed)
sed '$ s/.$//': removes the trailing ,
Of course this is probably an oversimplified example, but you could use it as a basis. Once the file is in CSV format you can load it with psql.
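Since the question asks for only the "Expiry:", "object_id:" and "object:" lines, a smaller sketch in awk could pick out just those three values. It assumes each record lists them in that order (as in the sample), and the names records_to_csv, objects.csv and the objects table are hypothetical.

```shell
# Hypothetical helper: emit one CSV row (expiry,object_id,object) per record.
# Assumes the "object:" line comes after "Expiry:" and "object_id:" in each record.
records_to_csv() {
  awk -F': *' '
    $1 == "Expiry"    { expiry = $2 }
    $1 == "object_id" { id = $2 }
    $1 == "object"    { print expiry "," id "," $2 }
  ' "$@"
}

# Usage (file and table names are placeholders):
#   records_to_csv test.txt > objects.csv
#   psql -c "\copy objects(expiry, object_id, object) FROM 'objects.csv' WITH (FORMAT csv)"
```

The field separator is a colon followed by optional spaces, so the label ends up in $1 and the value in $2; all other lines are ignored.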

Remove row from a file perl

I have a file with |-delimited rows. I want to check the value in the 8th position: if it matches, I want to remove that row from the file, and if it does not match, I want to leave the row in the file.
Below is the file format. I want to remove all the rows that have the value U in the 8th position:
A|B|DADD|H|O| |123 A Street; Apt.2|U|M
A|B|DADD|H|O| |123 A Street; Apt.2|A|M
A|B|DADD|H|O| |123 A Street; Apt.2|B|M
A|B|DADD|H|O| |123 A Street; Apt.2|U|M
How can we do this in Perl, or is there a way to use awk or sed? After removing the rows I want to print them as well.
I have tried sed, but it matches anywhere in the line; I want to match at a specific position.
sed -i '' "/$pattern/d" $file

perl -F'\|' -wlane'print if $F[7] ne "U"' file > new
With the -a switch each line is split into words, available in the @F array. The separator to split on can be set with the -F option (default is whitespace), and here it's |. See switches in perlrun. Then we just check the 8th field and print.
To change the input file in place, add the -i switch
perl -i -F'\|' -wlane'print if $F[7] ne "U"' file
or use -i.bak to keep a (.bak) backup as well.
I see that a question popped up about logging those lines that aren't kept in the file.
One way is to hijack the STDERR stream for them
perl -i -F'\|' -wlane'$F[7] ne "U" ? print : print STDERR $_' file 2> excluded
where the file excluded gets the STDERR stream, redirected (in bash) using 2>. However, that can be outright dangerous, since any warnings are now hidden and also corrupt the file intended for excluded lines (as they go to that file too).
So it is better to collect those lines and print them at the end
perl -i -F'\|' -wlanE'
$F[7] ne "U" ? print : push @exclude, $_;
END { say for @exclude }
' input > excluded
where the file excluded gets all omitted (excluded) lines. (I switched -e to -E in order to have say.)

Sounds like this might be what you want:
$ cat file
A|B|DADD|H|O| |123 A Street; Apt.2|U|M
A|B|DADD|H|O| |123 A Street; Apt.2|A|M
A|B|DADD|H|O| |123 A Street; Apt.2|B|M
A|B|DADD|H|O| |123 A Street; Apt.2|U|M
$ awk -i inplace -F'[|]' '$8=="U"{print|"cat>&2"; next} 1' file
A|B|DADD|H|O| |123 A Street; Apt.2|U|M
A|B|DADD|H|O| |123 A Street; Apt.2|U|M
$ cat file
A|B|DADD|H|O| |123 A Street; Apt.2|A|M
A|B|DADD|H|O| |123 A Street; Apt.2|B|M
The above uses GNU awk for -i inplace. With other awks you'd just do:
awk -F'[|]' '$8=="U"{print|"cat>&2"; next} 1' file > tmp && mv tmp file
To log the deleted line to a file named log1:
awk -F'[|]' '$8=="U"{print >> "log1"; next} 1' file
To log it and print it to stderr:
awk -F'[|]' '$8=="U"{print|"tee -a log1 >&2"; next} 1' file
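If both the kept and the removed rows should end up in files of their own, the same test on the 8th field can drive two outputs. This is only a sketch: split_rows and its three arguments are illustrative names, and the original file is left untouched.

```shell
# Hypothetical helper: $1 = input file, $2 = file for kept rows,
# $3 = file for removed rows (8th |-separated field equal to "U").
split_rows() {
  awk -F'[|]' -v kept="$2" -v removed="$3" '
    $8 == "U" { print > removed; next }   # removed rows go to their own file
    { print > kept }                      # everything else is kept
  ' "$1"
}

# Usage (paths are placeholders):
#   split_rows file file.kept file.removed && mv file.kept file
```

Note that awk only creates an output file once something is printed to it, so the file for removed rows will not exist if no row has U in the 8th field.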

output of two command need to be added as columns awk

I have a file which contains package names and their releases. The "Release Change" line has both the old version and the new version:
grep -A 2 'Package list' pkglist
Package list: xorg-x11-drv-ati-firmware-7.6.1-3.el6_9.noarch
Repository: /Linux/6.9/rpms/xorg-x11-drv-ati-firmware-7.6.1-4.el6.noarch.rpm
Release Change: 3.el6_9 --> 4.el6
Package list: yum-plugin-fastestmirror-1.1.30-40.0.1.el6.noarch
Repository: /Linux/6.9/rpms/yum-plugin-fastestmirror-1.1.30-42.0.1.el6_10.noarch.rpm
Release Change: 40.0.1.el6 --> 42.0.1.el6_10
Package list: yum-utils-1.1.30-40.0.1.el6.noarch
Repository: /Linux/6.9/rpms/yum-utils-1.1.30-42.0.1.el6_10.noarch.rpm
Release Change: 40.0.1.el6 --> 42.0.1.el6_10
I need formatted output in three columns, with the package name in the 1st column, the old version in the 2nd and the new version in the 3rd:
xorg-x11-drv-ati-firmware-7.6.1-3.el6_9.noarch 3.el6_9 4.el6
yum-plugin-fastestmirror-1.1.30-40.0.1.el6.noarch 40.0.1.el6 42.0.1.el6_10
yum-utils-1.1.30-40.0.1.el6.noarch 40.0.1.el6 42.0.1.el6_10
What I am trying is:
grep -i 'Package list' pkglist | awk '{print $3}'
xorg-x11-drv-ati-firmware-7.6.1-3.el6_9.noarch
yum-plugin-fastestmirror-1.1.30-40.0.1.el6.noarch
yum-utils-1.1.30-40.0.1.el6.noarch
grep -A 2 'Package list' pkglist | grep -i 'Release' | awk '{print $3,$5}'
3.el6_9 4.el6
40.0.1.el6 42.0.1.el6_10
40.0.1.el6 42.0.1.el6_10
The output of the above two commands needs to be combined into three columns on each line.

awk '/Package list/{printf "%s%s", $3, OFS}/Release Change/{print $3, $5}' pkglist
Returns
xorg-x11-drv-ati-firmware-7.6.1-3.el6_9.noarch 3.el6_9 4.el6
yum-plugin-fastestmirror-1.1.30-40.0.1.el6.noarch 40.0.1.el6 42.0.1.el6_10
yum-utils-1.1.30-40.0.1.el6.noarch 40.0.1.el6 42.0.1.el6_10

Replace Nth linewise search result

I want to replace all matches except the first one.
I have a text file:
AAA
BBB
CCC
AAA
BBB
CCC
AAA
BBB
CCC
I want to get this:
AAA
BBB <-- stay same
CCC
AAA
XXX <-- 2nd find replaced
CCC
AAA
XXX <-- 3rd and nth find replaced
CCC
I am looking for something similar to this, but for whole lines, not for words within lines:
sed -i 's/AAA/XXX/2' ./test01

Use branching:
sed -e'/BBB/ {ba};b; :a {n;s/BBB/XXX/;ba}'
I.e. on the first BBB, we branch to :a, otherwise b without parameter starts processing of the next line.
Under :a, we read in a new line, replace all BBB by XXX and branch to a again.

The following awk may also help; it replaces every whole-line BBB after the first one.
awk '$0=="BBB" && ++c>1 {$0="XXX"} 1' Input_file

$ # replace all occurrences greater than s
$ # use gsub instead of sub to replace all occurrences in line
$ # whole line: awk -v s=1 '$0=="AAA" && ++c>s{$0="XXX"} 1' ip.txt
$ awk -v s=1 '/AAA/ && ++c>s{sub(/AAA/, "XXX")} 1' ip.txt
AAA
BBB
CCC
XXX
BBB
CCC
XXX
BBB
CCC
$ # replace exactly when occurrence == s
$ awk -v s=2 '/AAA/ && ++c==s{sub(/AAA/, "XXX")} 1' ip.txt
AAA
BBB
CCC
XXX
BBB
CCC
AAA
BBB
CCC
Further reading: Printing with sed or awk a line following a matching pattern

awk '/BBB/{c++;if(c >=2)sub(/BBB/,"XXX")}1' file
AAA
BBB
CCC
AAA
XXX
CCC
AAA
XXX
CCC

As long as your file does not contain null chars (\0), you can fool sed into treating the whole file as one big string by instructing it to separate records on the null char \0 instead of the default \n, using the GNU sed option -z:
$ sed -z 's/BBB/XXX/2g' file66
AAA
BBB
CCC
AAA
XXX
CCC
AAA
XXX
CCC
The 2g at the end means: start at the second match and replace every match from there on.
You can combine -i with -z without problems.

How to replace the date with "sed" by the words "today" or "tomorrow"

My goal is to retrieve tidal times from www.worldtides.info in a specific way.
I got an API key on the site and can successfully retrieve the information by issuing:
curl -s "http://www.worldtides.info/api?extremes&lat=my_latitude&lon=my_longitude&length=86400&key=my_api_key"| jq -r ".extremes[] | .date + .type"
I've installed jq on my Raspberry Pi to parse "date" and "type" from the JSON result.
The result in the terminal is:
2016-04-03T16:47+0000Low
2016-04-03T23:01+0000High
2016-04-04T05:18+0000Low
2016-04-04T11:29+0000High
To get a cleaner result, I use sed:
curl -s "http://www.worldtides.info/api?extremes&lat=my_latitude&lon=my_longitude&length=86400&key=my_api_key"| jq -r ".extremes[] | .date + .type" | sed 's/+0000/ /g' | sed 's/T/ /g'
The result is:
2016-04-03 16:47 Low
2016-04-03 23:01 High
2016-04-04 05:18 Low
2016-04-04 11:29 High
I don't know how to replace the date with the word "today" if it is today's date (2016-04-03 as I am writing this) and with the word "tomorrow" if it is tomorrow's date.
I've tried:
curl -s "http://www.worldtides.info/api?extremes&lat=my_latitude&lon=my_longitude&length=86400&key=my_api_key"| jq -r ".extremes[] | .date + .type" | sed 's/date +"%Y-%m-%d"/Today/g' | sed 's/+0000/ /g' | sed 's/T/ /g'
But no luck, nothing changes. Can you help me? Thanks.

Some lean Linux distributions do not ship GNU date out of the box and instead use a POSIX date without a "tomorrow" feature. So you might have to install GNU date first if you want to use sed with date. Alternatively, if GNU awk is available, you can also do
awk '$1 ~ strftime("%Y-%m-%d") {$1 = "today"} $1 ~ strftime("%Y-%m-%d",systime()+24*3600) {$1 = "tomorrow"} {print}'

You can do the substitution this way:
today=$(date +%Y-%m-%d)
tomorrow=$(date --date="tomorrow" +%Y-%m-%d)
echo "$today" "$tomorrow"
sed "s/$today/today/g; s/$tomorrow/tomorrow/g;" your_last_result
where your_last_result is the file containing the data from your question below "The result is:"
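The attempt in the question does not work because date +"%Y-%m-%d" sits inside single quotes, so sed looks for that literal text instead of the date. As a small sketch of the same substitution in reusable form, the hypothetical function label_dates below takes the two dates as arguments and reads the cleaned-up lines ("YYYY-MM-DD HH:MM Type") from standard input; the -d tomorrow form in the usage comment assumes GNU date.

```shell
# Hypothetical helper: $1 = today's date, $2 = tomorrow's date (YYYY-MM-DD).
# Double quotes let the shell expand $1 and $2 before sed sees the script.
label_dates() {
  sed -e "s/^$1 /today /" -e "s/^$2 /tomorrow /"
}

# Usage with GNU date, at the end of the curl | jq | sed pipeline:
#   ... | label_dates "$(date +%Y-%m-%d)" "$(date -d tomorrow +%Y-%m-%d)"
printf '%s\n' '2016-04-03 16:47 Low' '2016-04-04 05:18 Low' | label_dates 2016-04-03 2016-04-04
```

Anchoring the pattern at the start of the line keeps the substitution from touching anything other than the leading date column.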

We Keep Coding
