Need help removing time from a CSV file - sed

I'm trying to process a CSV file to make it easier to sort, and I need to remove the time and the dash before it. The file has entries like this:
James,07/20/2009-14:40:11
Steve,08/06/2006-02:34:37
John,11/03/2008-12:12:34
and parse them into this:
James,07/20/2009
Steve,08/06/2006
John,11/03/2008
I'm guessing sed is the right tool for this job?
Thanks for your help.

Python
import csv
import datetime

# Read all the rows into memory so they can be sorted.
with open("someFile.csv", "rb") as source:
    rows = list(csv.reader(source))

def byDateTime(aRow):
    # Parse the second column, e.g. "07/20/2009-14:40:11".
    return datetime.datetime.strptime(aRow[1], "%m/%d/%Y-%H:%M:%S")

rows.sort(key=byDateTime)

with open("sortedFile.csv", "wb") as target:
    csv.writer(target).writerows(rows)
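Note that this sorts on the full timestamp but leaves it in the output. If the time should also be stripped, as the question asks, trim the field before writing; a minimal sketch:
for row in rows:
    row[1] = row[1].split("-")[0]  # keep only the mm/dd/yyyy part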

cut -d '-' -f 1 file
Edit after comment:
sed 's/-[0-9][0-9]:[0-9][0-9]:[0-9][0-9]//g' file

Just use awk:
awk -F"," '{ split($2,_,"-"); print $1,_[1] }' OFS="," file

Yes, I think sed is the right tool for the job:
sed 's/-[:0-9]*$//' file
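For the sample lines above, the cut, awk, and sed answers all give the same result; a quick check of the sed version:
$ sed 's/-[:0-9]*$//' file
James,07/20/2009
Steve,08/06/2006
John,11/03/2008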

Related

I am trying to filter records based on a date field in "YYYYMMDD" format using an awk command. The source and target files are comma-separated, with a header.

awk 'BEGIN {FS = ","};(NR>=2){($2 > "20210331");}' test1.csv > test.csv
File test1.csv:
Col1,Col2,Col3
A,20210101,JohnA
B,20210101,JohnB
G,20210501,JohnG
C,20210108,JohnC
D,20210202,JohnD
E,20210331,JohnE
F,20210401,JohnF
H,20210715,JohnH
Expected output:
Col1,Col2,Col3
G,20210501,JohnG
F,20210401,JohnF
H,20210715,JohnH
You can simply treat the dates in your samples as integers and compare them. To print the header, you need a separate condition.
awk 'BEGIN{FS=OFS=","} FNR==1{print;next} 20210331<$2' Input_file
I prefer the shorter code below:
awk 'BEGIN{FS = OFS = ","}(FNR == 1) || ($2 > 20210331)' test1.csv
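Running either command against test1.csv reproduces the expected output:
$ awk 'BEGIN{FS = OFS = ","}(FNR == 1) || ($2 > 20210331)' test1.csv
Col1,Col2,Col3
G,20210501,JohnG
F,20210401,JohnF
H,20210715,JohnH
(The original attempt printed nothing because ($2 > "20210331"); sits inside the action block, where the comparison is evaluated and then discarded; the test belongs in the pattern position, where a true result triggers the default print.)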

Remove any text between two parameters not working properly

I need to remove any data between the , and the ( , including the , itself.
I'm currently using the below command.
sed -i '/,/,/(/{//!d;s/ ,$//}' test1.txt
cat test1.txt
CREATE SET TABLE EDW_EXTRC_TAB.AVER_MED_CLM_HDR_EXTRC
,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
EXTRC_RUN_ID INTEGER NOT NULL,
Current Output
CREATE SET TABLE EDW_EXTRC_TAB.AVER_MED_CLM_HDR_EXTRC
,NO FALLBACK (
EXTRC_RUN_ID INTEGER NOT NULL,
Expected Output:
CREATE SET TABLE EDW_EXTRC_TAB.AVER_MED_CLM_HDR_EXTRC
(
EXTRC_RUN_ID INTEGER NOT NULL,
What is wrong here?
Any suggestions?
Thanks in advance.
Two approaches:
-- GNU sed approach:
sed -z 's/,[^(]*//' test1.txt
-- GNU awk approach:
awk -v RS= '{ sub(/,[^(]+/,"",$0) }1' test1.txt
The output:
CREATE SET TABLE EDW_EXTRC_TAB.AVER_MED_CLM_HDR_EXTRC
( EXTRC_RUN_ID INTEGER NOT NULL,
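Both approaches work by reading the whole file as a single record so the pattern can match across newlines: -z is a GNU extension that makes sed split input on NUL bytes rather than newlines, and RS= puts awk in paragraph mode. Where neither GNU tool is available, perl's slurp mode does the same thing (assuming the file fits in memory):
perl -0777 -pe 's/,[^(]*//' test1.txt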

How to insert files above every line with a variable in sed?

I have a file with these values:
[mik#mikypc ~]$ cat file.txt
id=cat8760004
id=cat1350003
id=cat9020002
And I want to insert a New_id line with the same value above every line, so the result will be:
New_id=cat8760004
id=cat8760004
New_id=cat1350003
id=cat1350003
New_id=cat9020002
id=cat9020002
How could I do that? I have tried with sed, but I cannot get the matched value into the inserted text:
[mik#mikypc ~]$ cat file.txt | sed '/cat\([0-9][0-9]*\)/ i\New_id &'
New_id &
id=cat8760004
New_id &
id=cat1350003
New_id &
id=cat9020002
I suggest this with GNU sed:
sed 's/.*/New_&\n&/' file.txt
Output:
New_id=cat8760004
id=cat8760004
New_id=cat1350003
id=cat1350003
New_id=cat9020002
id=cat9020002
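For the record, the i\ attempt failed because text after i\ is inserted literally; & is only special in the replacement of the s command. Note also that \n in the replacement is a GNU sed extension; on a strictly POSIX sed, embed a literal newline with a backslash instead:
sed 's/.*/New_&\
&/' file.txt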

SED command Issue with values exceeding 9

I need to generate a file.sql file from a file.csv, so I use this command:
cat file.csv |sed "s/\(.*\),\(.*\)/insert into table(value1, value2)
values\('\1','\2'\);/g" > file.sql
It works perfectly, but when the backreferences exceed 9 (for example \10, \11, etc.), sed takes only the first digit into account (which is \1 in this case) and ignores the rest.
I want to know if I missed something or if there is another way to do it.
Thank you!
EDIT :
The non-working example:
My file.csv looks like
2013-04-01 04:00:52,2,37,74,40233964,3860,0,0,4878,174,3,0,0,3598,27.00,27
What I get
insert into table
val1,val2,val3,val4,val5,val6,val7,val8,val9,val10,val11,val12,val13,val14,val15,val16
values
('2013-04-01 07:39:43',
2,37,74,36526530,3877,0,0,6080,
2013-04-01 07:39:430,2013-04-01 07:39:431,
2013-04-01 07:39:432,2013-04-01 07:39:433,
2013-04-01 07:39:434,2013-04-01 07:39:435,
2013-04-01 07:39:436);
After the ninth element I get the first one (with an extra literal digit appended) instead of the 10th, 11th, etc.
As far as I know, sed only supports backreferences \1 through \9: in BRE syntax, \10 is parsed as \1 followed by a literal 0. You are better off using perl or awk for this.
Here is how you'd do it in awk:
$ cat csv
2013-04-01 04:00:52,2,37,74,40233964,3860,0,0,4878,174,3,0,0,3598,27.00,27
$ awk 'BEGIN{FS=OFS=","}{print "insert into table values (\x27"$1"\x27",$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16 ");"}' csv
insert into table values ('2013-04-01 04:00:52',2,37,74,40233964,3860,0,0,4878,174,3,0,0,3598,27.00,27);
This is how you can do it in perl:
$ perl -ple 's/([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+)/insert into table values (\x27$1\x27,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16);/' csv
insert into table values ('2013-04-01 04:00:52',2,37,74,40233964,3860,0,0,4878,174,3,0,0,3598,27.00,27);
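Since perl has no nine-group ceiling, there is also no need to enumerate all sixteen fields; capturing the first field and the remainder of the line is enough (a shorter sketch on the same sample):
$ perl -pe 's/^([^,]+)(.*)/insert into table values (\x27$1\x27$2);/' csv
insert into table values ('2013-04-01 04:00:52',2,37,74,40233964,3860,0,0,4878,174,3,0,0,3598,27.00,27);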
Try an awk script (based on @JS웃's solution):
script.awk
#!/usr/bin/awk -f
# before looping the file
BEGIN{
    FS=","    # input separator
    OFS=FS    # output separator
    q="\047"  # single quote as a variable
}
# on each line (no pattern); printf is used so the statement stays on one line
{
    printf "insert into table values ("
    printf "%s", q $1 q OFS
    printf "%s", $2 OFS $3 OFS $4 OFS $5 OFS $6 OFS $7 OFS $8 OFS $9 OFS $10 OFS $11 OFS $12 OFS $13 OFS $14 OFS $15 OFS $16
    print ");"    # print supplies the trailing newline
}
Run with
awk -f script.awk file.csv
One-liner
awk 'BEGIN{OFS=FS=","; q="\047" } { print "insert into table values (" q $1 q "," $2","$3","$4","$5","$6","$7","$8","$9","$10","$11","$12","$13","$14","$15","$16 ");" }' file.csv

Compare semicolon separated data in 2 files using shell script

I have some semicolon-separated data with close to 240 rows in a text file temp1.
temp2.txt stores 204 rows of data, also semicolon-separated.
I want to:
Sort the data in both files by field1, i.e. the first data field in every row.
Compare the data in both files and redirect the rows that are not equal in separate files.
Sample data:
temp1.txt
1000xyz400100xyzA00680xyz0;19722.83;19565.7;157.13;11;2.74;11.00
1000xyz400100xyzA00682xyz0;7210.68;4111.53;3099.15;216.95;1.21;216.94
1000xyz430200xyzA00651xyz0;146.70;0.00;0.00;0.00;0.00;0.00
temp2.txt
1000xyz400100xyzA00680xyz0;19722.83;19565.7;157.13;11;2.74;11.00
1000xyz400100xyzA00682xyz0;7210.68;4111.53;3099.15;216.95;1.21;216.94
The sort commands I'm using:
sort -k1,1 temp1 -o temp1.tmp
sort -k1,1 temp2 -o temp2.tmp
I'd appreciate it if someone could show me how to redirect only the missing/mismatching rows into two separate files for analysis.
Try
cat temp1 temp2 | sort -k1,1 -o tmp
# mis-matching/missing rows:
uniq -u tmp
# matching rows:
uniq -d tmp
You want the difference as described at http://www.pixelbeat.org/cmdline.html#sets
sort -t';' -k1,1 temp1 temp1 temp2 | uniq -u > only_in_temp2
sort -t';' -k1,1 temp1 temp2 temp2 | uniq -u > only_in_temp1
Notes:
Use join rather than uniq, as shown at the link above, if you want to compare only particular fields
If the first field is fixed width then you don't need the -t';' -k1,1 params above
Look at the comm command.
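A sketch with comm (it expects both inputs sorted the same way; its flags suppress columns, so -23 keeps lines unique to the first file and -13 keeps lines unique to the second):
comm -23 temp1.tmp temp2.tmp > only_in_temp1.txt
comm -13 temp1.tmp temp2.tmp > only_in_temp2.txt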
Using gawk, outputting lines in file1 that are not in file2:
awk -F";" 'FNR==NR{ a[$1]=$0;next }
( ! ( $1 in a) ) { print $0 > "afile.txt" }' file2 file1
Interchange the order of file2 and file1 to output lines in file2 that are not in file1, as sketched below.
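For instance (a sketch; bfile.txt is just a hypothetical output name):
awk -F";" 'FNR==NR{ a[$1]=$0;next }
( ! ( $1 in a) ) { print $0 > "bfile.txt" }' file1 file2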