I have a CSV file exported from a spreadsheet which sometimes has a list of names in the last column. The file comes out like this:
ag,bd,cj,dy,"ss"
aa,bs,cs,fg,"name1
name2
name3
"
ff,ce,sd,de,
ag,bd,jj,ds,"ds"
fs,ee,sd,ee,"name4
name5
"
and so on.
I would like to remove the line feeds inside the quoted last column so that the output is:
ag,bd,cj,dy,ss
aa,bs,cs,fg,"name1 name2 name3"
ff,ce,sd,de,
ag,bd,jj,ds,"ds"
fs,ee,sd,ee,"name4 name5"
Thanks
This awk may be one solution for you, provided quotes appear only on the lines that open or close a multi-line field:
awk '/\"/ {s=!s} {printf "%s"(s?FS:RS),$0}' file
ag,bd,cj,dy,ss
aa,bs,cs,fg,"name1 name2 name3 "
ff,ce,sd,de,df
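New solution (it also handles fields quoted on a single line and keeps lines that contain no quotes):
awk -F\" 'NF==3 || (NF==1 && !s); NF==2 {s++} s==1 {printf "%s ",$0} s==2 {print;s=0}' file | awk '{sub(/ "$/,"\"")}1'
ag,bd,cj,dy,"ss"
aa,bs,cs,fg,"name1 name2 name3"
ff,ce,sd,de,
ag,bd,jj,ds,"ds"
fs,ee,sd,ee,"name4 name5"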
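Another way to look at the problem is that a record is complete once its double quotes balance. The following is only a sketch of that idea in a single awk pass; the function name join_quoted is made up for illustration, and it assumes every quoted field is eventually closed.

```shell
# join_quoted (hypothetical name): merge physical lines until the quotes balance.
join_quoted() {
    awk '
    {
        n += gsub(/"/, "&")                  # count the quotes; "&" puts each one back unchanged
        buf = (parts++ ? buf " " $0 : $0)    # append this line to the pending record
        if (n % 2 == 0) {                    # quotes balance, so the record is complete
            if (parts > 1) sub(/ "$/, "\"", buf)   # drop the space left before a closing quote
            print buf
            buf = ""; n = 0; parts = 0
        }
    }' "$@"
}

# Example: a multi-line field followed by an ordinary line
printf '%s\n' 'aa,bs,"name1' 'name2' '"' 'ff,ce,' | join_quoted
```

Because it only counts quotes, it does not care which column the multi-line field is in; it reads from the named files or from standard input.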
I need to generate a file.sql file from a file.csv, so I use this command :
cat file.csv |sed "s/\(.*\),\(.*\)/insert into table(value1, value2) values\('\1','\2'\);/g" > file.sql
It works perfectly, but when there are more than 9 values (for example \10, \11, etc.), sed only takes the first digit into account (\1 in this case) and treats the remaining digits as literal text.
I want to know if I missed something or if there is another way to do it.
Thank you !
EDIT :
The not working example :
My file.csv looks like
2013-04-01 04:00:52,2,37,74,40233964,3860,0,0,4878,174,3,0,0,3598,27.00,27
What I get
insert into table
val1,val2,val3,val4,val5,val6,val7,val8,val9,val10,val11,val12,val13,val14,val15,val16
values
('2013-04-01 07:39:43',
2,37,74,36526530,3877,0,0,6080,
2013-04-01 07:39:430,2013-04-01 07:39:431,
2013-04-01 07:39:432,2013-04-01 07:39:433,
2013-04-01 07:39:434,2013-04-01 07:39:435,
2013-04-01 07:39:436);
After the ninth element I get the first one instead of the 10th,11th etc...
As far as I know, sed supports only nine back-references (\1 through \9), which is all POSIX specifies, so \10 is read as \1 followed by a literal 0. You are better off using perl or awk for this.
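To see the limit in isolation, here is a small sketch with ten capture groups, one per letter of a made-up input string; the variable name pat is only for illustration.

```shell
# Ten capture groups, one per letter of "abcdefghij".
pat='\(a\)\(b\)\(c\)\(d\)\(e\)\(f\)\(g\)\(h\)\(i\)\(j\)'

# \10 is parsed as \1 followed by a literal 0, so this prints "a0", not "j".
printf 'abcdefghij\n' | sed "s/$pat/\\10/"

# awk addresses fields by number without that limit; this prints "j".
printf 'a,b,c,d,e,f,g,h,i,j\n' | awk -F, '{ print $10 }'
```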
Here is how you'd do it in awk:
$ cat csv
2013-04-01 04:00:52,2,37,74,40233964,3860,0,0,4878,174,3,0,0,3598,27.00,27
$ awk 'BEGIN{FS=OFS=","}{print "insert into table values (\x27"$1"\x27",$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16 ");"}' csv
insert into table values ('2013-04-01 04:00:52',2,37,74,40233964,3860,0,0,4878,174,3,0,0,3598,27.00,27);
This is how you can do it in perl:
$ perl -ple 's/([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+)/insert into table values (\x27$1\x27,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16);/' csv
insert into table values ('2013-04-01 04:00:52',2,37,74,40233964,3860,0,0,4878,174,3,0,0,3598,27.00,27);
Try an awk script (based on @JS웃's solution):
script.awk
#!/usr/bin/awk -f
# before reading the file
BEGIN{
    FS=","   # input separator
    OFS=FS   # output separator
    q="\047" # single quote as a variable
}
# on each line (no pattern)
{
    # printf adds no newline, so the whole statement stays on one line
    printf "insert into table values ("
    printf "%s%s%s%s", q, $1, q, OFS
    print $2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16 ");"
}
Run with
awk -f script.awk file.csv
One-liner
awk 'BEGIN{OFS=FS=","; q="\047" } { print "insert into table values (" q $1 q, $2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16 ");" }' file.csv
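If the number of columns may change, listing $2 through $16 by hand gets brittle. A possible variation, sketched here under the assumption that only the first field needs quoting, is to modify $1 and let awk rebuild the line. The function name csv_to_insert is made up, and doubling embedded single quotes is my own addition for SQL safety.

```shell
# csv_to_insert (hypothetical name): works for any number of comma-separated fields.
csv_to_insert() {
    awk -v q="'" 'BEGIN { FS = OFS = "," }
    {
        gsub(q, q q, $1)    # double any single quote inside the value (SQL escaping)
        $1 = q $1 q         # assigning to a field makes awk rebuild $0 using OFS
        print "insert into table values (" $0 ");"
    }' "$@"
}

printf '%s\n' '2013-04-01 04:00:52,2,37,74,40233964,3860,0,0,4878,174,3,0,0,3598,27.00,27' | csv_to_insert
```

It can be run as csv_to_insert file.csv > file.sql once the function is defined in the shell.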
I have a file in stanza format. An example of the file is below.
id_1:
id=241
pgrp=staff
groups=staff
home=/home/id_1
shell=/usr/bin/ks
id_2:
id=242
pgrp=staff
groups=staff
home=/home/id_2
shell=/usr/bin/ks
How do I use sed or awk to process it and return only the id name, id and groups in a single line and tab delimited format? e.g.:
id_1 241 staff
id_2 242 staff
with awk:
BEGIN { FS="="}
$1 ~ /id_/ { sub(/:.*/, "", $1); printf("%s", $1) }   # strip the trailing colon from the stanza name
$1 ~ /id/ && $1 !~ /_/ { printf("\t%s", $2) }
$1 ~ /groups/ { printf("\t%s\n", $2) }
Here is an awk solution:
translated.awk
#!/usr/bin/awk -f
{
if(match($1, /[^=]:[ ]*$/)){
id_=$1
sub(/:/,"",id_)
}
if(match($1,/id=/)){
split($1,p,"=")
id=p[2]
}
if(match($1,/groups=/)){
split($1,p,"=")
print id_ "\t" id "\t" p[2]
}
}
Execute it either by:
chmod +x translated.awk
./translated.awk data.txt
or
awk -f translated.awk data.txt
For completeness, here comes a shortened version:
#!/usr/bin/awk -f
BEGIN {FS="="}
$1 ~ /:[ ]*$/ {sub(/:.*/,"",$1);printf "%s\t",$1;next}
$1 ~ /^[ \t]*id$/ {printf "%s\t",$2}
$1 ~ /groups/ {print $2}
With sed joining each six-line stanza into one line and awk picking the fields:
sed 'N;N;N;N;N;s/://;y/=\n/  /' data.txt | awk -v OFS='\t' '{print $1,$3,$7}'
Here is the one-liner approach by setting RS:
awk 'NR>1{print "id_"++i,$3,$7}' RS='id_[0-9]+:' FS='[=\n]' OFS='\t' file
id_1 241 staff
id_2 242 staff
Requires GNU awk and assumes the IDs are in increasing order starting at 1.
If the ordering of the IDs is arbitrary (the groups value is field 7; field 5 is pgrp):
awk '!/shell/&&NR>1{gsub(/:/,"",$1);print "id_"$1,$3,$7}' RS='id_' FS='[=\n]' OFS='\t' file
id_1 241 staff
id_2 242 staff
awk -F"=" '/id_/{split($0,a,":");}/id=/{i=$2}/groups/{printf a[1]"\t"i"\t"$2"\n"}' your_file
tested below:
> cat temp
id_1:
id=241
pgrp=staff
groups=staff
home=/home/id_1
shell=/usr/bin/ks
id_2:
id=242
pgrp=staff
groups=staff
home=/home/id_2
shell=/usr/bin/ks
> awk -F"=" '/id_/{split($0,a,":");}/id=/{i=$2}/groups/{printf a[1]"\t"i"\t"$2"\n"}' temp
id_1 241 staff
id_2 242 staff
This might work for you (GNU sed):
sed -rn '/^[^ :]+:/{N;N;N;s/:.*id=(\S+).*groups=(\S+).*/\t\1\t\2/p}' file
Look for a line holding an id then get the next 3 lines and re-arrange the output.
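Most of the answers above depend on the position of each line within the stanza. As a rough alternative, the sketch below looks attributes up by name instead; the function name stanza_to_tsv is made up, and it assumes that id appears before groups in each stanza and that stanza headers are the only lines ending in a colon.

```shell
# stanza_to_tsv (hypothetical name): print "name<TAB>id<TAB>groups" per stanza.
stanza_to_tsv() {
    awk -F= '
    /:[ \t]*$/ {                      # stanza header such as "id_1:"
        name = $0
        gsub(/[ \t:]/, "", name)      # strip the colon and any padding
        next
    }
    {
        key = $1
        gsub(/[ \t]/, "", key)        # tolerate indented attribute lines
        if (key == "id")     id = $2
        if (key == "groups") print name "\t" id "\t" $2
    }' "$@"
}

printf '%s\n' 'id_1:' 'id=241' 'pgrp=staff' 'groups=staff' 'home=/home/id_1' 'shell=/usr/bin/ks' | stanza_to_tsv
```

Matching on the exact key avoids accidental hits such as the id_1 inside the home path.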
I want to find only the first occurrence of a string in a file (file.dat) and replace it with text read from another file (output). For example, the file "output" contains "AAA T 0001".
#!/bin/bash
procdir=`pwd`
cat output | while read lin1 lin2 lin3
do
srt2=$(echo $lin1 $lin2 $lin3 | awk '{print $1,$2,$3}')
grep -m 1 $lin1 $procdir/file.dat | xargs -r0 perl -pi -e 's/$lin1/$srt2/g'
done
Basically, what I want is this: when the string "AAA" is first found in "file.dat", replace the second and third columns next to "AAA" with "T 0001" while keeping the first column "AAA" as it is. The above script does not work: the variables $lin1 and $srt2 are not expanded inside 's/$lin1/$srt2/g'.
Example:
in my file.dat I have a row
AAA D ---- CITY COUNTRY
What I want is :
AAA T 0001 CITY COUNTRY
Any comments are much appreciated.
If you have output file like this:
$ cat output
AAA T 0001
Your file.dat file contains information like:
$ cat file.dat
AAA D ---- CITY COUNTRY
BBB C ---- CITY COUNTRY
AAA D ---- CITY COUNTRY
You can try something like this with awk:
$ awk '
NR==FNR {
a[$1]=$0
next
}
$1 in a {
printf "%s ", a[$1]
delete a[$1]
for (i=4;i<=NF;i++) {
printf "%s ", $i
}
print ""
next
}1' output file.dat
AAA T 0001 CITY COUNTRY
BBB C ---- CITY COUNTRY
AAA D ---- CITY COUNTRY
Say you place the string for which to search in $s and the string with which to replace in $r, wouldn't the following do?
perl -i -pe'
BEGIN { ($s,$r)=splice(@ARGV,0,2) }
$done ||= s/\Q$s/$r/;
' "$s" "$r" file.dat
(Replaces the first instance if present)
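As for why the variables in the question were "not understood": the shell does not expand anything inside single quotes, so the program receives the text $lin1 and $srt2 as typed. A small sketch of the difference, using sed and the sample row from the question (the variable row is only for illustration):

```shell
lin1='AAA'
srt2='AAA T 0001'
row='AAA D ---- CITY COUNTRY'

# Single quotes: the shell passes $lin1 and $srt2 through as literal text,
# nothing in the row matches, and the row is printed unchanged.
printf '%s\n' "$row" | sed 's/$lin1/$srt2/'

# Double quotes: the shell substitutes the values before sed runs,
# so sed sees s/AAA [^ ]* [^ ]*/AAA T 0001/ and rewrites columns 2 and 3.
printf '%s\n' "$row" | sed "s/$lin1 [^ ]* [^ ]*/$srt2/"
```

Interpolating into the script like this is only safe when the values contain no slashes or regex metacharacters, which is one reason to pass them as arguments instead.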
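This will only change the first match in the file (GNU sed; the result goes to stdout, add -i to edit the file in place). The inner address keeps lines before the first match from being rewritten as well:
#!/bin/bash
procdir=`pwd`
while read -r line; do
set -- $line
sed '0,/'"$1"'/{/'"$1"'/s/\([^ ]* \)\([^ ]* [^ ]*\)/\1'"$2 $3"'/}' "$procdir/file.dat"
done < output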
To change all matching lines:
sed '/'"$1"'/s/\([^ ]* \)\([^ ]* [^ ]*\)/\1'"$2 $3"'/' $procdir/file.dat