How to delete a certain pattern in a record? - perl

I have a file which has hundreds of records in the below format:
20150416110321|21,VPLA,91974737XXX5|91974737XXX5,404192086271201|404192086271201,SAI-IMEISV,gsn65.xxxxx.com,gsn65.xxxxx.com;1429148977;301814701;11276100,100.XX.199.250|100.XX.199.XXX|,1,SAIOLU-Location,SAIOLU-LG,2,internet|internet,,SAIOLU-SGSNIP,6,AL,AL_F_1_25G40K_2_25G20K_28|KL_BASIC,,UNKNOWN,SAIOLU-MK,UNKNOWN,SAIOLU-MBRUL,SAIOLU-MBRDL,,,,SAI-IMEI,,,,
I want to take only the first part of the pipe-separated data in fields/columns 1-8. How can I do that with awk/sed?
For example:
20150416110321,VPLA,91974737XXX5,404192086271201,SAI-IMEISV,gsn65.xxxxx.com;1429148977;301814701;11276100,100.XX.199.250,1,SAIOLU-Location,SAIOLU-LG,2,internet|internet,,SAIOLU-SGSNIP,6,AL,AL_F_1_25G40K_2_25G20K_28|KL_BASIC,,UNKNOWN,SAIOLU-MK,UNKNOWN,SAIOLU-MBRUL,SAIOLU-MBRDL,,,,SAI-IMEI,,,,
Thanks

You could use awk.
$ awk -F, -v OFS="," '{for(i=1;i<=8;i++)sub(/\|.*/,"",$i)}1' file
20150416110321,VPLA,91974737XXX5,404192086271201,SAI-IMEISV,gsn65.xxxxx.com,gsn65.xxxxx.com;1429148977;301814701;11276100,100.XX.199.250,1,SAIOLU-Location,SAIOLU-LG,2,internet|internet,,SAIOLU-SGSNIP,6,AL,AL_F_1_25G40K_2_25G20K_28|KL_BASIC,,UNKNOWN,SAIOLU-MK,UNKNOWN,SAIOLU-MBRUL,SAIOLU-MBRDL,,,,SAI-IMEI,,,,

sed ':cycle
s/^\(\([^,]*,\)\{0,7\}[^,|]*\)|[^,]*/\1/;t cycle' YourFile
This loops: each pass removes one | together with everything after it up to (but not including) the next ,, restricted to the first 8 comma-separated fields, and the t cycle repeats until no pipe remains in those fields.
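Since the question is tagged perl, here is a minimal Perl sketch of the same idea (same assumptions as the awk answer above; the -1 limit on split is what keeps the trailing empty fields intact):
perl -lne 'my @f = split /,/, $_, -1;    # limit -1 keeps the trailing empty fields
           s/\|.*// for @f[0..7];        # in fields 1-8, drop everything from the first | onward
           print join ",", @f' file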


sed - Replace comma after first regex match

I'm trying to perform the following substitution on lines of the general format:
BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC, .......
As you can see, the problem is that it's a comma-separated file, with a specific field containing a decimal comma. I would like to replace that comma with a dot.
I've tried the following to replace the first occurrence of a pattern after a match, but to no avail. Could someone help me?
sed -e '/,"/!b' -e "s/,/./"
sed -e '/"/!b' -e ':a' -e "s/,/\./"
Thanks in advance. An awk or perl solution would help me as well. Here's an awk effort:
gawk -F "," 'substr($10, 0, 3)==3 && length($10)==12 { gsub(/,/,".", $10); print}'
That yielded the same file unchanged.
CSV files should be parsed in awk with a proper FPAT variable that defines what constitutes a valid field in such a file. Once you do that, you can just iterate over the fields to do the substitution you need:
gawk 'BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")"; OFS="," }
{ for(i=1; i<=NF;i++) if ($i ~ /[,]/) gsub(/[,]/,".",$i);}1' file
See this answer of mine to understand how to define and parse CSV file content with the FPAT variable. Also see Save modifications in place with awk for doing in-place file modifications like sed -i''.
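For example, with GNU awk 4.1 or later the same program can edit the file in place via the inplace extension (a sketch, assuming a sufficiently recent gawk; drop -i inplace to preview the output first):
gawk -i inplace 'BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")"; OFS="," }
{ for(i=1; i<=NF; i++) if ($i ~ /,/) gsub(/,/, ".", $i) } 1' file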
The following sed will convert all decimal separators in quoted numeric fields:
sed 's/"\([-+]\?[0-9]*\),\([0-9]\+\([eE][-+]\?[0-9]\+\)\?\)"/"\1.\2"/g'
See: https://www.regular-expressions.info/floatingpoint.html
This might work for you (GNU sed):
sed -E ':a;s/^([^"]*("[^",]*"[^"]*)*"[^",]*),/\1./;ta' file
This regexp matches a , within a pair of double quotes and replaces it with a .. The regexp is anchored to the start of the line and therefore needs to be applied repeatedly until no further matches are found, hence the :a label and the ta command, which loop the substitution for as long as a substitution succeeds.
N.B. The solution expects that all double quotes are matched and that no double quotes are quoted i.e. \" does not appear in a line.
If your input always follows that format of only one quoted field containing 1 comma then all you need is:
$ sed 's/\([^"]*"[^"]*\),/\1./' file
BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109.07",DF,CCCCCCCCCCC, .......
If it's more complicated than that then see What's the most robust way to efficiently parse CSV using awk?.
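Since an awk or perl solution was also welcome, a minimal Perl sketch for the same simple case (a quoted all-digit field containing a single decimal comma; the quotes are kept):
perl -pe 's/"(\d+),(\d+)"/"$1.$2"/g' file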
Assuming you have this:
BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC
Try this:
awk -F',' '{print $1,$2,$3,$4"."$5,$6,$7}' filename | awk '$1=$1' FS=" " OFS=","
Output will be:
BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109.07",DF,CCCCCCCCCCC
You simply need to know the field numbers for replacing the field separator between them.
In order to use regexps as in Perl, you have to activate extended regular expressions with -r.
So if you want to convert the decimal comma in every quoted number and drop the " signs, you can use this:
echo 'BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC, .......'|sed -r 's/\"([0-9]+)\,([0-9]+)\"/\1\.\2/g'
If you want to replace the first occurrence only, you can use this:
echo 'BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC, .......'|sed -r 's/\"([0-9]+)\,([0-9]+)\"/\1\.\2/1'
https://www.gnu.org/software/sed/manual/sed.txt

How to use sed for substituting 2nd column in shell

I have file that looks like this :
1,2,3,4
5,6,7,8
I want to substitute the 2nd column containing 6 with 89. The desired output is
1,2,3,4
5,89,7,8
But if I type
index=2
cat file | sed 's/[^,]*/89/'$index
I get
1,89,3,4
5,89,7,8
and if I type
index=2
cat file | sed 's/[^,]6/89/'$index
nothing changes.
Why is it like this? How can I fix this? Thank you.
Since you want to change the second column when it contains a 6, and you have a comma as field separator, it is actually very easy with sed:
sed 's/^\([^,]*\),6,/\1,89,/'
Here we make use of back-referencing to remember the first column.
If you want to replace the 6 in the 5th column, you can do something like:
sed 's/^\(\([^,]*,\)\{4\}\)6,/\189,/'
It is, however, much more comfortable using awk:
awk 'BEGIN{FS=OFS=","}($2==6){$2=89}1'
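Since your own attempt parameterised the column with index=2, the column number can also be passed into awk with -v (a small sketch; col is just an assumed variable name):
index=2
awk -v col="$index" 'BEGIN{FS=OFS=","} $col==6{$col=89} 1' file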
I solved this by using awk
awk 'BEGIN{FS=OFS=","} {if ($2==6) $2=89}1' file >file1

Select specific items from a file using sed

I'm very much a junior when it comes to the sed command, and my Bruce Barnett guide sits right next to me, but one thing has been troubling me. With a file, can you filter it using sed to select only specific items? For example, in the following file:
alpha|november
bravo|october
charlie|papa
alpha|quebec
bravo|romeo
charlie|sahara
Would it be possible to set a command to return only the bravos, like:
bravo|october
bravo|romeo
With sed:
sed '/^bravo|/!d' filename
Alternatively, with grep (because it's sort of made for this stuff):
grep '^bravo|' filename
or with awk, which works nicely for tabular data,
awk -F '|' '$1 == "bravo"' filename
The first two use a regular expression, selecting those lines that match it. In ^bravo|, ^ matches the beginning of the line and bravo| the literal string bravo|, so this selects all lines that begin with bravo|.
The awk way splits the line across the field separator | and selects those lines whose first field is bravo.
You could also use a regex with awk (note that | is special in awk's extended regular expressions, so it has to be neutralised, e.g. with a bracket expression):
awk '/^bravo[|]/' filename
...but I don't think this plays to awk's strengths in this case.
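If you want the selector to be reusable, the field-comparison form shown above can take the key from a variable instead (a small sketch; key is an assumed variable name):
awk -F'|' -v key=bravo '$1 == key' filename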
Another solution with sed:
sed -n '/^bravo|/p' filename
The -n option suppresses printing by default.
If a line begins with bravo|, print it (p).
Two ways (at least) with sed.
Removing unwanted lines:
sed '/^bravo|/ !d' YourFile
Printing only wanted lines:
sed -n '/^bravo|/ p' YourFile
If no other constraint or action is involved, both are equivalent and grep is the better tool.
If further actions follow, the performance can differ: d cycles directly to the next line, whereas p prints and then continues with the remaining commands.
Note that the pipe must not be escaped here: an unescaped | is an ordinary character in basic regular expressions, and in GNU sed \| is the alternation operator (a GNU extension), which would break the match.

keep the first part and delete the rest on a specified line using sed

I know a line number in a file where I want to keep the first word and delete the rest of the line. How do I do this using sed?
So let's say I want to go to line 10 in the file, which looks like this:
goodword "blah blah"\
and what I want is
goodword
I have tried this - sed 's/([a-z])./\1/'
But this does it on all the lines in a file. I want it only on one specified line.
If by "first word" you mean "everything up to the first space", and if by "retain this change in the file itself" you mean that you don't mind creating a new file with the same name as the previous file, and if you have a sed that supports -i, you can probably just do:
sed -i '10s/ .*//' input-file
If you want to be more restrictive in the definition of a word, you can use '10s/\([a-z]*\).*/\1/'
Can you use grep or awk to grab just one line, and then pipe it into sed (if grep or awk couldn't do the entire job for you) to work on just one line? I think the key here is isolating that one line first, and then worrying about extracting something from it.
Using awk
awk 'NR==10 {print $1}' file
goodword
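If the rest of the file should be kept and only line 10 reduced to its first word, a sketch along the same lines (newfile is just an assumed output name; this does not edit in place):
awk 'NR==10 { $0 = $1 } 1' file > newfile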

Delete line from a text file that contains any string from another file using sed/awk/etc

I'm a total beginner when it comes to programming, and I appreciate all the help you are willing to provide.
Here's my problem...
I have a data.txt file with a lot of lines in it and a strings.txt that contains some strings (1 string per line).
I want to delete all lines from data.txt that contain any string from strings.txt, and save the result as proc_data.txt.
I know I could use sed to search for and delete one or more strings, but having to type 500+ strings on the command line makes it... well, you know.
What I've tried so far:
~$ for i in `cat strings.txt`; do sed '/${i}/d' data.txt -i.bak; done
but it just makes a backup of data.txt with the same size.
What am I doing wrong?
Use grep:
LC_ALL=C fgrep -v -f strings.txt data.txt >proc_data.txt
With the -f switch it reads all the strings from strings.txt and searches for them in data.txt; -v inverts the result, keeping only lines that contain none of them; the output is redirected to your desired file. As for why your loop changed nothing: the single quotes prevent the shell from expanding ${i}, so sed was deleting lines containing the literal text ${i}, of which there are none.
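If you prefer awk, a rough sketch of the same idea using fixed-string matching with index() (it reads strings.txt first, then filters data.txt; for 500+ strings this is slower than fgrep -f):
awk 'NR==FNR { pat[$0]; next }                  # first file: collect the strings
     { for (p in pat) if (index($0, p)) next }  # skip lines containing any of them
     1' strings.txt data.txt > proc_data.txt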