How to use sed for substituting 2nd column in shell - sed

I have a file that looks like this:
1,2,3,4
5,6,7,8
I want to replace the value in the 2nd column with 89 when it contains 6. The desired output is
1,2,3,4
5,89,7,8
But if I type
index=2
cat file | sed 's/[^,]*/89/'$index
I get
1,89,3,4
5,89,7,8
and if I type
index=2
cat file | sed 's/[^,]6/89/'$index
nothing changes.
Why is it like this? How can I fix this? Thank you.

Since you want to change the second column containing a 6 and you have a comma as field separator it is actually very easy with sed:
sed 's/^\([^,]*\),6,/\1,89,/'
Here we make use of back-referencing to remember the first column.
If you want to replace the 6 in the 5th column, you can do something like:
sed 's/^\(\([^,]*,\)\{4\}\)6,/\189,/'
It is, however, much more comfortable using awk:
awk 'BEGIN{FS=OFS=","}($2==6){$2=89}1'
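The awk approach can also be parameterized, which is roughly what the index variable in the question was aiming for. The following is only a sketch: the variable names col, old and new are made up for illustration, and the sample data is taken from the question.

```shell
# Sample data from the question.
input='1,2,3,4
5,6,7,8'

col=2    # column to inspect (hypothetical variable name)
old=6    # value to look for
new=89   # replacement value

# Pass the shell variables into awk with -v and compare only the chosen column.
result=$(printf '%s\n' "$input" |
  awk -v col="$col" -v old="$old" -v new="$new" \
      'BEGIN { FS = OFS = "," } $col == old { $col = new } 1')

printf '%s\n' "$result"
```

Only lines whose chosen column equals the old value are modified; every line is printed by the trailing 1.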

I solved this by using awk
awk 'BEGIN{FS=OFS=","} {if ($2==6) $2=89}1' file >file1

Related

sed with vertical bar?

I have a list
>ANARCI-HMM_human_167.7|pdb|7EPU|A
>ANARCI-HMM_alpaca_173.7|pdb|7EVY|E
>ANARCI-HMM_alpaca_172.8|pdb|7F2O|S
>ANARCI-HMM_alpaca_171.8|pdb|7F4F|S
>ANARCI-HMM_alpaca_173.6|pdb|7F8W|D
I want to remove from ANARCI to the first vertical bar |.
expecting
>pdb|7EPU|A
>pdb|7EVY|E
>pdb|7F2O|S
>pdb|7F4F|S
>pdb|7F8W|D
I tried
sed 's/ANARCI.*\|//g'
but didn't work.
Do you have any idea how to sed in this case?
Using sed
$ sed 's/[A-Z][^|]*|//' input_file
>pdb|7EPU|A
>pdb|7EVY|E
>pdb|7F2O|S
>pdb|7F4F|S
>pdb|7F8W|D
If you want to remove from ANARCI to the first vertical bar |, try this:
sed 's/ANARCI[^|]*|//'
or, with extended regular expressions, where the | has to be escaped:
sed -E 's/ANARCI[^|]*\|//'
1st solution: With your shown samples, please try following sed code.
sed -E 's/(.*)ANARCI[^|]*\|(.*)/\1\2/' Input_file
Explanation:
The -E option enables ERE (extended regular expressions).
The parts of the pattern wrapped in parentheses are capturing groups; sed stores what they match so it can be reused in the replacement.
There are 2 capturing groups: the 1st holds everything before the ANARCI string, and the 2nd holds everything after the first pipe (ANARCI[^|]*\| matches from ANARCI up to and including that pipe).
The substitution replaces the whole line with the 1st and 2nd capturing groups.
2nd solution: You could also use awk's match function for this. It matches only the part that is not wanted in the output, and the line is then printed without the matched part.
awk 'match($0,/ANARCI[^|]*/){print substr($0,1,RSTART-1) substr($0,RSTART+RLENGTH+1)}' Input_file
3rd solution: One more awk solution, which sets the field separator to everything from the string ANARCI up to the first pipe. The main program then prints the 1st and last fields, which gives the required output for the shown samples.
awk -v FS="ANARCI[^\\\\|]*\\\\|" '{print $1 $NF}' Input_file
Try:
sed 's/ANARCI[^|]*|//'
Here [^|]* stops at the first |, so only the text from ANARCI up to and including that first | is removed.
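The difference between an escaped and an unescaped pipe in the two regex flavours can be checked with a small sketch. It assumes a sed that supports -E (GNU and BSD sed both do); the variable names are only for illustration.

```shell
line='>ANARCI-HMM_human_167.7|pdb|7EPU|A'

# Basic regular expressions (the default): a bare | is a literal pipe.
bre=$(printf '%s\n' "$line" | sed 's/ANARCI[^|]*|//')

# Extended regular expressions (-E): | means alternation,
# so the literal pipe has to be escaped.
ere=$(printf '%s\n' "$line" | sed -E 's/ANARCI[^|]*\|//')

printf '%s\n%s\n' "$bre" "$ere"
```

Both commands should print >pdb|7EPU|A. In GNU sed's basic syntax \| is alternation, which is why the attempt in the question did not behave as expected.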

How to delete certain pattern in a record?

I have a file which has hundreds of records in the below format:
20150416110321|21,VPLA,91974737XXX5|91974737XXX5,404192086271201|404192086271201,SAI-IMEISV,gsn65.xxxxx.com,gsn65.xxxxx.com;1429148977;301814701;11276100,100.XX.199.250|100.XX.199.XXX|,1,SAIOLU-Location,SAIOLU-LG,2,internet|internet,,SAIOLU-SGSNIP,6,AL,AL_F_1_25G40K_2_25G20K_28|KL_BASIC,,UNKNOWN,SAIOLU-MK,UNKNOWN,SAIOLU-MBRUL,SAIOLU-MBRDL,,,,SAI-IMEI,,,,
I want to take only the first part of the pipe separated data in fields/columns 1-8. How can I do that with awk/sed ?
For example:
20150416110321,VPLA,91974737XXX5,404192086271201,SAI-IMEISV,gsn65.xxxxx.com;1429148977;301814701;11276100,100.XX.199.250,1,SAIOLU-Location,SAIOLU-LG,2,internet|internet,,SAIOLU-SGSNIP,6,AL,AL_F_1_25G40K_2_25G20K_28|KL_BASIC,,UNKNOWN,SAIOLU-MK,UNKNOWN,SAIOLU-MBRUL,SAIOLU-MBRDL,,,,SAI-IMEI,,,,
Thanks
You could use awk.
$ awk -F, -v OFS="," '{for(i=1;i<=8;i++)sub(/\|.*/,"",$i)}1' file
20150416110321,VPLA,91974737XXX5,404192086271201,SAI-IMEISV,gsn65.xxxxx.com,gsn65.xxxxx.com;1429148977;301814701;11276100,100.XX.199.250,1,SAIOLU-Location,SAIOLU-LG,2,internet|internet,,SAIOLU-SGSNIP,6,AL,AL_F_1_25G40K_2_25G20K_28|KL_BASIC,,UNKNOWN,SAIOLU-MK,UNKNOWN,SAIOLU-MBRUL,SAIOLU-MBRDL,,,,SAI-IMEI,,,,
sed ':cycle
s/^\(\([^,]*,\)\{0,7\}[^,|]*\)|[^,]*/\1/;t cycle' YourFile
This repeatedly removes everything from a | up to (but not including) the next comma, within the first 8 comma-separated fields.
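To see the field-limited trimming on something easier to read than the full record, here is a small sketch of the awk loop with the field count passed in as a variable. The record and the variable n are made up for illustration.

```shell
# Hypothetical short record: five comma-separated fields, some with a pipe.
record='a|b,c,d|e,f|g,h'
n=3   # trim only the first n fields

# For each of the first n fields, drop the first | and everything after it.
# The i <= NF guard avoids creating empty fields on short lines.
result=$(printf '%s\n' "$record" |
  awk -F, -v OFS=, -v n="$n" \
      '{ for (i = 1; i <= n && i <= NF; i++) sub(/\|.*/, "", $i) } 1')

printf '%s\n' "$result"
```

With n=3 the first three fields are trimmed to a, c and d, while f|g in the fourth field is left untouched.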

Select specific items from a file using sed

I'm very much a junior when it comes to the sed command, and my Bruce Barnett guide sits right next to me, but one thing has been troubling me. With a file, can you filter it using sed to select only specific items? For example, in the following file:
alpha|november
bravo|october
charlie|papa
alpha|quebec
bravo|romeo
charlie|sahara
Would it be possible to set a command to return only the bravos, like:
bravo|october
bravo|romeo
With sed:
sed '/^bravo|/!d' filename
Alternatively, with grep (because it's sort of made for this stuff):
grep '^bravo|' filename
or with awk, which works nicely for tabular data,
awk -F '|' '$1 == "bravo"' filename
The first two use a regular expression, selecting those lines that match it. In ^bravo|, ^ matches the beginning of the line and bravo| the literal string bravo|, so this selects all lines that begin with bravo|.
The awk way splits the line across the field separator | and selects those lines whose first field is bravo.
You could also use a regex with awk:
awk '/^bravo\|/' filename
...but I don't think this plays to awk's strengths in this case.
Another solution with sed:
sed -n '/^bravo|/p' filename
-n option => no printing by default.
If line begins with bravo|, print it (p)
There are (at least) 2 ways with sed.
Removing unwanted lines:
sed '/^bravo|/!d' YourFile
Printing only wanted lines:
sed -n '/^bravo|/p' YourFile
If no other constraint or action applies, both give the same result, and grep is the better tool.
If further commands follow, the two behave differently: d ends the current cycle and goes straight to the next line, whereas p prints the line and then continues with the following commands.
Note that the pipe must not be escaped here: in a basic regular expression a bare | is a literal pipe, while GNU sed treats \| as alternation (with -E it is the other way round).
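The approaches from this thread can be compared side by side on the sample data. This is just a sketch; the variable names are chosen for illustration.

```shell
data='alpha|november
bravo|october
charlie|papa
alpha|quebec
bravo|romeo
charlie|sahara'

# sed: suppress default output (-n) and print only matching lines.
with_sed=$(printf '%s\n' "$data" | sed -n '/^bravo|/p')

# grep: in its default basic regex syntax, | is a literal pipe.
with_grep=$(printf '%s\n' "$data" | grep '^bravo|')

# awk: split on | and compare the first field as a string.
with_awk=$(printf '%s\n' "$data" | awk -F '|' '$1 == "bravo"')

printf '%s\n' "$with_sed"
```

All three variables should end up holding the same two bravo lines.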

Limiting the sed search to 2nd column in a file

Below is the content of my file (sample.txt):
CQUAD4 5600000 560005 5602371 5602367 5602374 5602372 0. -1.75
CQUAD4 5600003 560005 5600000 5602367 5602374 5602372 0. -1.75
I am using the below command:
sed -i "s#\(\s*\w*\s*\)\(5600000\)\(\s*\)\([0-9]*\)\(.*\)#\1\2\36000 \5#g" sample.txt
I want to restrict the match on 5600000 to the second column only and then replace it with '6000 '.
Can somebody help me, please?
Here's a possible solution with GNU sed. Anchor the search to start of line with ^.
sed -i -r "s#^(\s*\S+\s+)5600000\s+#\16000 #" sample.txt
awk might be a little more natural for this:
awk '$2=="5600000"{$2="6000"} 1' sample.txt
That basically says "if the second field is 5600000, replace it with 6000; then print the line either way".
The one downside I see is that assigning to $2 makes awk rebuild the modified line with single spaces between fields, which may mess with the alignment of your columns. You'll have to decide if that's a problem or not...
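If the GNU-only \s and \S are a concern, the same anchored substitution can be sketched with POSIX character classes. This assumes a sed that supports -E, and unlike the command above it reads from a variable instead of editing sample.txt in place.

```shell
sample='CQUAD4 5600000 560005 5602371 5602367 5602374 5602372 0. -1.75
CQUAD4 5600003 560005 5600000 5602367 5602374 5602372 0. -1.75'

# Group 1 captures the first column plus the whitespace after it, so the
# 5600000 that follows can only be the second column. Group 2 puts back the
# whitespace character after the match, leaving the other spacing alone.
result=$(printf '%s\n' "$sample" |
  sed -E 's/^([[:space:]]*[^[:space:]]+[[:space:]]+)5600000([[:space:]])/\16000\2/')

printf '%s\n' "$result"
```

The first line gets 6000 in its second column, while the 5600000 in the fourth column of the second line is not touched.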

Using sed or awk, how can I alter the first field in a delimited line?

I have a delimited file whose first few fields look like this:
2774013300|184500|2012-01-04 23:00:00|
and I want to alter certain rows whose first field equals or exceeds 8 characters.
I want to truncate the value in the first column.
In the case of 2774013300 I want its value to become 27740133.
I would like to do this in sed, preferably, or awk.
Using sed, I can find any number that exceeds 8 digits at the beginning of the line, but am not quite sure how to truncate it, using, I would assume, substitute.
sed -n -e /'^[0-9]\{10,\}/p' infile
I am thinking I could use grouping for the first 8 characters and return those in a substitute command, but I'm not quite sure how to do that.
In awk, I can detect the first field, but am not quite sure how to use substr to alter the first field and then return the remaining fields, so a full line is preserved.
awk -F'|' '{ if (length($1) > 9) { print $1; print length($1);} }' infile
Depending on the subtleties of your situation, you can use
sed 's/^\([0-9]\{8\}\)[0-9]*/\1/' infile
or
sed 's/^\([0-9]\{8\}\)[0-9]\{1,\}/\1/' infile
which with GNU sed can be simplified to
sed -r 's/^([0-9]{8})[0-9]+/\1/' infile
or, if you need to, add -n and p.
Example:
$ sed 's/^\([0-9]\{8\}\)[0-9]*/\1/' <<<'2774013300|184500|2012-01-04 23:00:00|'
27740133|184500|2012-01-04 23:00:00|
Using awk:
awk -F'|' 'BEGIN{OFS=FS}length($1)>9{$1=substr($1,1,8)}{print}' infile
example:
$ echo "2774013300|184500|2012-01-04 23:00:00|" | awk -F'|' 'BEGIN{OFS=FS}length($1)>9{$1=substr($1,1,8)}{print}'
27740133|184500|2012-01-04 23:00:00|
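The truncation width can also be passed in from the shell rather than hard-coded. A sketch, where the variable names width and w are made up for illustration and any first field longer than the width is truncated:

```shell
lines='2774013300|184500|2012-01-04 23:00:00|
12345|184501|2012-01-04 23:05:00|'
width=8   # maximum length of the first field

# Truncate the first field only when it is longer than the width;
# setting OFS=FS keeps the | delimiters when awk rebuilds the line.
result=$(printf '%s\n' "$lines" |
  awk -F'|' -v w="$width" \
      'BEGIN { OFS = FS } length($1) > w { $1 = substr($1, 1, w) } 1')

printf '%s\n' "$result"
```

The first line comes out with 27740133 in the first field, and the second line, whose first field is already short enough, is printed unchanged.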