How do I get rid of prefixes from a csv using sed?

Consider this sample csv file:
potato.cat,potato.dog
3,5
7,1
Using sed, I'm trying to get:
cat,dog
3,5
7,1
I've tried: cat test.csv | sed '1 s/(\w+\.(\w+))\,?/$1\,/g' and variants thereof, but haven't quite been able to get it. I feel like this should be easy but I'm making a botch of it. Could someone point me in the right direction?

Try this:
$ sed -E '1 s/[a-z]+\.//g' test.csv
cat,dog
3,5
7,1
There is no need to pipe cat into sed; sed can read the file directly.
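If the header prefixes can contain more than lowercase letters (an assumption; the sample only shows potato.), a more general pattern can be sketched as follows. The column names Data_1.cat and X2.dog are hypothetical and only illustrate the idea.

```shell
# Hypothetical header whose prefixes contain uppercase letters, digits and underscores.
header_demo=$(printf 'Data_1.cat,X2.dog\n3,5\n7,1\n' |
  sed '1 s/[^,.]*\.//g')   # on line 1 only, drop each run of non-comma, non-dot characters that ends in a dot
printf '%s\n' "$header_demo"
```

Because the address 1 limits the substitution to the header, dots in the data rows are left alone.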

Related

Why is sed returning more characters than requested

In a part of my script I am trying to generate a list of the year and month that a file was submitted. Since the file contains the timestamp, I should be able to cut the filenames to the month position, and then do a sort+uniq filtering. However sed is generating an outlier for one of the files.
I am using this command sequence
ls -1 service*json | sed -e "s|\(.*201...\).*json$|\1|g" | sort |uniq
This works most of the time, but in some cases it outputs more of the timestamp:
$ ls
service-parent-20181119092630.json service-parent-20181123134132.json service-parent-20181202124532.json service-parent-20190121091830.json service-parent-20190125124209.json
service-parent-20181119101003.json service-parent-20181126104300.json service-parent-20181211095939.json service-parent-20190121092453.json service-parent-20190128163539.json
service-parent-20181120095850.json service-parent-20181127083441.json service-parent-20190107035508.json service-parent-20190122093608.json
service-parent-20181120104838.json service-parent-20181129155835.json service-parent-20190107042234.json service-parent-20190122115053.json
$ ls -1 service*json | sed -e "s|\(.*201...\).*json$|\1|g" | sort |uniq
service-parent-201811
service-parent-201811201048
service-parent-201812
service-parent-201901
I have also tried this variation but the second output line is still returned:
ls -1 service*json | sed -e "s|\(.*201.\{3\}\).*json$|\1|g" | sort |uniq
Can somebody explain why service-parent-201811201048 is returned past the requested 3 characters?
Thanks.
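The .* in your pattern is greedy, so it extends to the last position where 201 plus three more characters can still match. In service-parent-20181120104838.json that is the 201048 inside the date and time, which is why the capture ends at service-parent-201811201048.
You could try ls -1 service*json | sed -e "s|\(.*-201...\).*json$|\1|g" | sort |uniq, which requires a dash (-) immediately before 201.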
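The greedy match can be reproduced on the one problematic file name. This is a small sketch; the variable names f, greedy and anchored are only for illustration.

```shell
f='service-parent-20181120104838.json'

# Original pattern: greedy .* backs off only as far as the LAST "201" followed by three characters.
greedy=$(printf '%s\n' "$f" | sed -e 's|\(.*201...\).*json$|\1|')

# Requiring a dash before 201 leaves only one place where the group can end.
anchored=$(printf '%s\n' "$f" | sed -e 's|\(.*-201...\).*json$|\1|')

printf '%s\n%s\n' "$greedy" "$anchored"
```

The first line printed is service-parent-201811201048 and the second is service-parent-201811.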
It is not recommended to parse the output of ls. Please try instead:
for i in service*json; do
sed -e "s|^\(service-.*-201[0-9]\{3\}\).*json$|\1|g" <<< "$i"
done | sort | uniq
Your problem is explained at https://stackoverflow.com/a/54565973/1745001 (i.e. .* is greedy) but try this:
$ ls | sed -E 's/(-[0-9]{6}).*/\1/' | sort -u
service-parent-201811
service-parent-201812
service-parent-201901
The above requires a sed that supports EREs via -E, e.g. GNU sed and OSX/BSD sed.

manipulation of text by sed command

I have a file containing genome IDs like the following: NZ_FLAT01000030.1_173. I need to change those IDs to look like this: NZ_FLAT01000030.1
I tried a few commands, but none gave me exactly that:
sed 's/_/\t/' output : NZ FLAT01000030.1_173
sed -r 's/_//' output: NZFLAT01000030.1_173
sed -r 's/_//g' output: NZFLAT01000030.1173
How can I do that by using sed command?
Are you trying to remove the underscore and the digits following it?
echo 'NZ_FLAT01000030.1_173' | sed -E 's/_[0-9]+//g'
NZ_FLAT01000030.1
$ echo 'NZ_FLAT01000030.1_173' | sed 's/_[^_]*$//'
NZ_FLAT01000030.1
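The two commands above agree on the sample ID but can differ on other input. The sketch below uses a hypothetical ID (AB_123456.1_7, not from the question) in which the first underscore is also followed by digits.

```shell
id='AB_123456.1_7'   # hypothetical ID, chosen so that both underscores are followed by digits

# Removes every underscore-plus-digits run, anywhere in the line.
all_runs=$(printf '%s\n' "$id" | sed -E 's/_[0-9]+//g')

# Removes only the last underscore and whatever follows it.
last_only=$(printf '%s\n' "$id" | sed 's/_[^_]*$//')

printf '%s\n%s\n' "$all_runs" "$last_only"
```

This prints AB.1 and then AB_123456.1, so the second form is probably the safer choice if only the trailing suffix should go.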

sed: convert format of date

Fairly new to sed. I am trying to write a sed command that reverses the order of the fields in a date, but not if the date is part of another word.
So far I have:
sed 's/[0-9]\{1\}/[0-9]\{1\}/[0-9]\{4\}/SUBSTITUTE/g'
Trying to figure out the substitute part. Thank you!
You need to use word boundaries (\b in GNU sed) and capture groups for the three fields:
sed 's~\b\([0-9]\{2\}\)/\([0-9]\{2\}\)/\([0-9]\{4\}\)\b~\3/\2/\1~g' file
Example:
$ echo '04/13/1991hello' | sed 's~\b\([0-9]\{2\}\)/\([0-9]\{2\}\)/\([0-9]\{4\}\)\b~\3/\2/\1~g'
04/13/1991hello
$ echo '02/03/2001' | sed 's~\b\([0-9]\{2\}\)/\([0-9]\{2\}\)/\([0-9]\{4\}\)\b~\3/\2/\1~g'
2001/03/02
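Both cases can also be checked on a single line. The input below is hypothetical, and the sketch assumes GNU sed, since \b is a GNU extension.

```shell
line='due 02/03/2001, ref x02/03/2001y'   # hypothetical input line

# The standalone date is rewritten; the date glued to x...y has no word boundary and is left alone.
swapped=$(printf '%s\n' "$line" |
  sed 's~\b\([0-9]\{2\}\)/\([0-9]\{2\}\)/\([0-9]\{4\}\)\b~\3/\2/\1~g')
printf '%s\n' "$swapped"
```

With GNU sed this prints due 2001/03/02, ref x02/03/2001y.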

GNU sed: global substitution failing

I have a file called test.csv with the following content:
T1,T2,T3,T4
10,2,3,17
10,2,5,14
10,2,2,16
15,1,17,15
12,1,9,17
I want to replace all the values 17 on the fourth column by 25. So I tried the command:
cat test.csv | sed -r 's/(([1-9]+,){3})17/\125/g'
T1,T2,T3,T4
10,2,3,17
10,2,5,14
10,2,2,16
15,1,17,15
12,1,9,25
As you can see, only the last row was modified, but not the second.
However, if I do: cat test.csv | sed -r "s/([0-9]+,[0-9]+,[0-9]+,)17/\125/" I have the output I want. Why is that?
Your sed line did not work because it uses [1-9] rather than [0-9]:
cat test.csv | sed -r 's/(([1-9]+,){3})17/\125/g' (your sed line)
[1-9]+ cannot match the 0 in 10, so the rows that start with 10 never match. Change it to [0-9] and it should work.
Also, the cat is not required; you can run sed '...' file directly.
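A corrected command might look like the sketch below. The ^ and $ anchors are an addition not present in the question; they assume every row has exactly four numeric columns and that 17 must be the entire fourth field. The sample rows are hypothetical but follow the shape of test.csv.

```shell
# Hypothetical sample in the same shape as test.csv.
fixed=$(printf 'T1,T2,T3,T4\n10,2,3,17\n15,1,17,15\n12,1,9,17\n' |
  sed -E 's/^(([0-9]+,){3})17$/\125/')   # \1 is the first three fields; the 25 after it is literal text
printf '%s\n' "$fixed"
```

The 17 in the third column of 15,1,17,15 is untouched, because the pattern only accepts 17 after three complete fields and at the end of the line.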

Filter text based in a multiline match criteria

I have the following sed command, which I need to run as a single line:
cat File | sed -n '
/NetworkName/ {
N
/\n.*ims3/ p
}' | sed -n 1p | awk -F"=" '{print $2}'
Can anyone please help?
Assume that the contents of File are:
System.DomainName=shayam
System.Addresses=Fr6
System.Trusted=Yes
System.Infrastructure=No
System.NetworkName=AS
System.DomainName=ims5.com
System.DomainName=Ram
System.Addresses=Fr9
System.Trusted=Yes
System.Infrastructure=No
System.NetworkName=Peer
System.DomainName=ims7.com
System.DomainName=mani
System.Addresses=Hello
System.Trusted=Yes
System.Infrastructure=No
System.NetworkName=Peer
System.DomainName=ims3.com
After running the command, the only output is Peer. Can anyone please help me out?
You can use a single nawk command, and you can lose the useless cat:
nawk -F"=" '/NetworkName/{n=$2;getline;if($2~/ims3/){print n} }' file
You can use sed as well, as proposed by others, but I prefer less regex and less clutter.
The above saves the value of the network name in n, then reads the next line and checks its second field against ims3. If it matches, the value of n is printed.
Put that code in a separate .sh file, and run it as your single-line command.
cat File | sed -n '/NetworkName/ { N; /\n.*ims3/ p }' | sed -n 1p | awk -F"=" '{print $2}'
Assuming that you want the network name for the domain ims3, this command line works without sed:
grep -B 1 ims3 File | head -n 1 | awk -F"=" '{print $2}'
So, you want the network name where the domain name on the following line includes 'ims3', and not the one where the following line includes 'ims7' (even though the network names in the example are the same).
sed -n '/NetworkName/{N;/ims3/{s/.*NetworkName=\(.*\)\n.*/\1/p;};}' File
This avoids abuse of felines, too (not to mention reducing the number of commands executed).
Tested on MacOS X 10.6.4, but there's no reason to think it won't work elsewhere too.
However, empirical evidence shows that Solaris sed is different from MacOS sed. It can all be done in one sed command, but it needs three lines:
sed -n '/NetworkName/{N
/ims3/{s/.*NetworkName=\(.*\)\n.*/\1/p;}
}' File
Tested on Solaris 10.
You just need to put -e pretty much everywhere you'd break the command at a newline or have a semicolon. You don't need the extra call to sed or awk or cat.
sed -n -e '/NetworkName/ {' -e 'N' -e '/\n.*ims3/ s/[^\n]*=\(.*\).*/\1/P' -e '}' File
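For comparison, the same lookup can be sketched in awk without getline. This is an alternative to the answers above, not a replacement for them; the variable names network, name and after_name are illustrative, and the input is a shortened excerpt of the sample File.

```shell
network=$(printf '%s\n' \
  'System.NetworkName=AS' 'System.DomainName=ims5.com' \
  'System.NetworkName=Peer' 'System.DomainName=ims7.com' \
  'System.NetworkName=Peer' 'System.DomainName=ims3.com' |
  awk -F'=' '
    after_name && /ims3/ { print name; exit }            # line right after a NetworkName line mentions ims3
    { after_name = 0 }                                   # any other line clears the flag
    /NetworkName/        { name = $2; after_name = 1 }   # remember the value for the next line
  ')
printf '%s\n' "$network"
```

On this excerpt it prints Peer, and it stops at the first match, which plays the role of the sed -n 1p stage in the original pipeline.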