mass rename with sed

I have to work with a huge XML file which was exported from Excel.
The file looks like this:
<Row>
<Data>some data..</Data>
<Data>some data..</Data>
<Data>some data..</Data>
<Data>some data..</Data>
<Data>some data..</Data>
<Data>some data..</Data>
<Data>some data..</Data>
</Row>
There are about 2000 Row-elements.
So there is always one Row tag with 7 Data subtags. Now I'd like to rename the first Data tag in each Row to 'one', the second to 'two', and so on.
What's the correct sed syntax to do this?

Consider using awk instead:
BEGIN {
    NUM[1]="one"
    NUM[2]="two"
    NUM[3]="three"
    NUM[4]="four"
    NUM[5]="five"
    NUM[6]="six"
    NUM[7]="seven"
}
/<Row/ {
    print
    for (i=1; i<8; i++) {
        getline
        # gsub renames both the opening and the closing tag on the line
        gsub(/Data/, NUM[i]); print
    }
}
/<\/Row/ { print }
Output:
$ awk -f r.awk input
<Row>
<one>some data..</one>
<two>some data..</two>
<three>some data..</three>
<four>some data..</four>
<five>some data..</five>
<six>some data..</six>
<seven>some data..</seven>
</Row>
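The same renaming can also be done as a counter-based one-liner; a minimal sketch, assuming each Data element sits on its own line and each Row holds at most seven of them (input.xml is a placeholder filename):

```shell
# reset the counter at each <Row>; gsub renames both opening and closing tags
awk 'BEGIN{split("one two three four five six seven",NUM," ")}
     /<Row>/{i=0} /<Data>/{gsub(/Data/, NUM[++i])} {print}' input.xml
```

Unlike the getline version, this does not assume exactly seven Data lines per Row.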

Related

Extracting value from a xml file using awk

I have a text stream like this
<device nid="05023CA70900" id="1" fblock="-1" type="switch" name="Appliance Home" brand="Google" active="false" energy_lo="427" />
<device nid="0501C1D82300" id="2" fblock="-1" type="switch" name="TELEVISION Home" brand="Google" active="pending" energy_lo="3272" />
from which I would like an output like
05023CA70900##1##-1##switch##Appliance Home##Google##false##427
0501C1D82300##2##-1##switch##TELEVISION Home##Google##pending##3272
Not every line in the input matches this format.
How can we achieve this using awk or sed?
Following awk should work:
awk -F '"' '$1 == "<device nid=" { printf("%s##%s##%s##%s##%s##%s##%s##%s\n",
$2, $4, $6, $8, $10, $12, $14, $16)}' file
PS: Parsing XML with awk/sed is not always the best approach.
It's very simple in perl, so why not use perl?
perl -lne 'push @a,/"([^"]*)"/g;print join "##",@a;undef @a' your_file
Sample tested:
> cat temp
<device nid="05023CA70900" id="1" fblock="-1" type="switch" name="Appliance Home" brand="Google" active="false" energy_lo="427" />
<device nid="0501C1D82300" id="2" fblock="-1" type="switch" name="TELEVISION Home" brand="Google" active="pending" energy_lo="3272" />
> perl -lne 'push @a,/"([^"]*)"/g;print join "##",@a;undef @a' temp
05023CA70900##1##-1##switch##Appliance Home##Google##false##427
0501C1D82300##2##-1##switch##TELEVISION Home##Google##pending##3272
>
awk -F\" -v OFS="##" '/^<device nid=/ { print $2, $4, $6, $8, $10, $12, $14, $16 }' file
or more generally:
awk -F\" '/^<device nid=/ {for (i=2;i<=NF;i+=2) printf "%s%s",(i==2?"":"##"),$i; print ""}' file
To address your question in your comment: If you could have a tab in front of <device nid:
awk -F\" '/^\t?<device nid=// ...'
If you meant something else, update your question and provide more representative input.
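For illustration, here is the general loop run against the first sample line; because the field separator is the double quote, multi-word values such as "Appliance Home" survive intact:

```shell
# even-numbered fields hold the attribute values when FS is the double quote
printf '%s\n' '<device nid="05023CA70900" id="1" fblock="-1" type="switch" name="Appliance Home" brand="Google" active="false" energy_lo="427" />' |
awk -F\" '/^<device nid=/ {for (i=2;i<=NF;i+=2) printf "%s%s",(i==2?"":"##"),$i; print ""}'
# 05023CA70900##1##-1##switch##Appliance Home##Google##false##427
```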

SED command Issue with values exceeding 9

I need to generate a file.sql from a file.csv, so I use this command:
cat file.csv |sed "s/\(.*\),\(.*\)/insert into table(value1, value2)
values\('\1','\2'\);/g" > file.sql
It works perfectly, but when the backreference number exceeds 9 (for example \10, \11, etc.) sed takes only the first digit into account (\1 in this case) and treats the rest as literal text.
I want to know if I missed something or if there is another way to do it.
Thank you !
EDIT:
The non-working example:
My file.csv looks like
2013-04-01 04:00:52,2,37,74,40233964,3860,0,0,4878,174,3,0,0,3598,27.00,27
What I get
insert into table
val1,val2,val3,val4,val5,val6,val7,val8,val9,val10,val11,val12,val13,val14,val15,val16
values
('2013-04-01 07:39:43',
2,37,74,36526530,3877,0,0,6080,
2013-04-01 07:39:430,2013-04-01 07:39:431,
2013-04-01 07:39:432,2013-04-01 07:39:433,
2013-04-01 07:39:434,2013-04-01 07:39:435,
2013-04-01 07:39:436);
After the ninth element I get the first field with a digit appended (\10 becomes \1 followed by a literal 0) instead of the 10th, 11th, etc.
As far as I know, sed supports at most nine backreferences (\1 through \9); this limit still applies in current GNU sed. You are better off using perl or awk for this.
Here is how you'd do it in awk:
$ cat csv
2013-04-01 04:00:52,2,37,74,40233964,3860,0,0,4878,174,3,0,0,3598,27.00,27
$ awk 'BEGIN{FS=OFS=","}{print "insert into table values (\x27"$1"\x27",$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16 ");"}' csv
insert into table values ('2013-04-01 04:00:52',2,37,74,40233964,3860,0,0,4878,174,3,0,0,3598,27.00,27);
This is how you can do it in perl:
$ perl -ple 's/([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+)/insert into table values (\x27$1\x27,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16);/' csv
insert into table values ('2013-04-01 04:00:52',2,37,74,40233964,3860,0,0,4878,174,3,0,0,3598,27.00,27);
Try an awk script (based on @JS웃's solution):
script.awk
#!/usr/bin/env awk
# before looping the file
BEGIN {
    FS=","       # input separator
    OFS=FS       # output separator
    q="\047"     # single quote as a variable
}
# on each line (no pattern)
{
    # printf (not print) keeps the whole statement on one line
    printf "insert into table values (%s%s%s,", q, $1, q
    for (i=2; i<NF; i++) printf "%s%s", $i, OFS
    print $NF ");"
}
Run with
awk -f script.awk file.csv
One-liner
awk 'BEGIN{OFS=FS=","; q="\047" } { print "insert into table values (" q $1 q "," $2","$3","$4","$5","$6","$7","$8","$9","$10","$11","$12","$13","$14","$15","$16 ");" }' file.csv
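The explicit field lists in these answers can be avoided entirely: assigning to $1 forces awk to rebuild $0 with OFS, so the same statement works for any number of columns. A sketch under the same assumptions (one record per line, fields free of single quotes), shown here on a shortened sample:

```shell
# assigning to $1 rebuilds the record, joining all fields with OFS
printf '%s\n' '2013-04-01 04:00:52,2,37,74,40233964' |
awk 'BEGIN{FS=OFS=","; q="\047"} {$1 = q $1 q; print "insert into table values (" $0 ");"}'
# insert into table values ('2013-04-01 04:00:52',2,37,74,40233964);
```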

Replacing using awk/sed

I have a SQL query:
update SCOTT.GLOBAL set DAY_LIGHT_SAVING_STARTS=TO_DATE('03/31/2013 02:00:00', 'MM/DD/YYYY HH24:MI:SS'), DAY_LIGHT_SAVING_ENDS=TO_DATE('10/27/2011 02:00:00', 'MM/DD/YYYY HH24:MI:SS') where zone='GMT';
I want to replace every occurrence of TO_DATE(...) with a random number/string, and I also want the corresponding TO_DATE clause and random number/string saved to a file.
For example:
update SCOTT.GLOBAL set DAY_LIGHT_SAVING_STARTS=abc, DAY_LIGHT_SAVING_ENDS=pqr where zone='GMT';
File:
TO_DATE('03/31/2013 02:00:00', 'MM/DD/YYYY HH24:MI:SS')~~~~abc
TO_DATE('10/27/2011 02:00:00', 'MM/DD/YYYY HH24:MI:SS')~~~~pqr
How can I achieve this with awk/sed/perl?
I have certainly tried something, though I did not share it here. Apologies. Here is what I have tried:
perl -p -i -e "s/TO_DATE(.*?)\)/abc/g" my.out
This replaces the occurrences of TO_DATE, but I cannot figure out how to generate separate random numbers on the same line for two different occurrences of TO_DATE, and how to save them to the file along with the corresponding TO_DATE clause.
If I understood your needs correctly, you can try something like this in bash:
while read -r x; do
    while [[ $x =~ TO_DATE\([^\)]+\) ]]; do
        rand=$(dd if=/dev/urandom bs=3 count=1 2>/dev/null|base64)
        x=${x/"$BASH_REMATCH"/$rand}
    done
    echo "$x"
done<<XXX
update SCOTT.GLOBAL set DAY_LIGHT_SAVING_STARTS=TO_DATE('03/31/2013 02:00:00', 'MM/DD/YYYY HH24:MI:SS'), DAY_LIGHT_SAVING_ENDS=TO_DATE('10/27/2011 02:00:00', 'MM/DD/YYYY HH24:MI:SS') where zone='GMT';
XXX
Output
update SCOTT.GLOBAL set DAY_LIGHT_SAVING_STARTS=YsuW, DAY_LIGHT_SAVING_ENDS=5Vve where zone='GMT';
This reads every line from a file (replaced here by a here-document). While a line matches the TO_DATE\([^\)]+\) pattern, it creates a semi-random string by reading /dev/urandom and replaces the matched portion with it. Because of base64, bs in dd should always be a multiple of 3 to avoid a trailing = character. This only works if + and / are acceptable characters in the random string.
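The loop prints the rewritten statement but does not yet save the mapping the question asks for. A hedged extension reading the SQL on stdin (map.txt is an assumed output filename):

```shell
# same replacement loop, but each match/token pair is also appended to map.txt
while read -r x; do
    while [[ $x =~ TO_DATE\([^\)]+\) ]]; do
        rand=$(dd if=/dev/urandom bs=3 count=1 2>/dev/null | base64)
        printf '%s~~~~%s\n' "$BASH_REMATCH" "$rand" >> map.txt   # record the mapping
        x=${x/"$BASH_REMATCH"/$rand}
    done
    echo "$x"
done
```

Run as bash script.sh < my.out > replaced.out; map.txt then holds lines in the TO_DATE(...)~~~~token format shown in the question.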

powershell parsing of cdata-section

I'm trying to read an RSS feed using PowerShell and I can't extract a CDATA section within the feed.
Here's a snippet of the feed (with a few items cut to save space):
<item rdf:about="http://philadelphia.craigslist.org/ctd/blahblah.html">
<title>
<![CDATA[2006 BMW 650I,BLACK/BLACK/SPORT/AUTO ]]>
</title>
...
<dc:title>
<![CDATA[2006 BMW 650I,BLACK/BLACK/SPORT/AUTO ]]>
</dc:title>
<dc:type>text</dc:type>
<dcterms:issued>2011-11-28T22:15:55-05:00</dcterms:issued>
</item>
And the Powershell script:
$rssFeed = [xml](New-Object System.Net.WebClient).DownloadString('http://philadelphia.craigslist.org/sss/index.rss')
foreach ($item in $rssFeed.rdf.item) { $item.title }
Which produces this:
#cdata-section
--------------
2006 BMW 650I,BLACK/BLACK/SPORT/AUTO
2006 BMW 650I,BLACK/BLACK/SPORT/AUTO
How do I extract the cdata-section?
I tried a few variants such as $item.title."#cdata-section" and $item.title.InnerText which return nothing. I tried $item.title | gm and I see the #cdata-section listed as a property. What am I missing?
Thanks.
Since you have multiple items, the title property itself is an array, so the following should work:
$rssFeed.rdf.item.title | select -expand "#cdata-section"
or
$rssFeed.rdf.item.title[0]."#cdata-section"
based on what you need.

need help removing time from a csv file

I'm trying to process a CSV and make it easier to sort, and I need to remove the time and the dash from it. The file has entries like this:
James,07/20/2009-14:40:11
Steve,08/06/2006-02:34:37
John,11/03/2008-12:12:34
and parse it into this:
James,07/20/2009
Steve,08/06/2006
John,11/03/2008
I'm guessing sed is the right tool for this job?
Thanks for your help.
Python
import csv
import datetime

# read all rows (Python 2 style: csv expects binary mode there)
with open("someFile.csv", "rb") as source:
    rows = list(csv.reader(source))

def byDateTime(aRow):
    return datetime.datetime.strptime(aRow[1], "%m/%d/%Y-%H:%M:%S")

# sort on the parsed timestamp, then write the result back out
rows.sort(key=byDateTime)
with open("sortedFile.csv", "wb") as target:
    csv.writer(target).writerows(rows)
cut -d '-' -f 1 file
Edit after comment:
sed 's/-[0-9][0-9]:[0-9][0-9]:[0-9][0-9]//g' file
Just use awk:
awk -F"," '{ split($2,_,"-"); print $1,_[1] }' OFS="," file
Yes, I think sed is the right tool for the job:
sed 's/-[:0-9]*$//' file
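A quick check of the sed approach against the sample lines (any POSIX sed should behave the same):

```shell
# strip the trailing -HH:MM:SS from each record
printf 'James,07/20/2009-14:40:11\nSteve,08/06/2006-02:34:37\n' |
  sed 's/-[0-9][0-9]:[0-9][0-9]:[0-9][0-9]//'
# James,07/20/2009
# Steve,08/06/2006
```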