Remove row from a file perl

I have a file with |-delimited rows. I want to check the value in the 8th position: if it matches, I want to remove that row from the file, and if it does not match, I want to leave the row in the file.
Below is the file format. I want to remove all the rows that have the value U in the 8th position:
A|B|DADD|H|O| |123 A Street; Apt.2|U|M
A|B|DADD|H|O| |123 A Street; Apt.2|A|M
A|B|DADD|H|O| |123 A Street; Apt.2|B|M
A|B|DADD|H|O| |123 A Street; Apt.2|U|M
How can we do this in Perl, or is there a way to use Awk or Sed? After removing the rows, I want to print them as well.
I have tried sed, but it matches throughout the file; I want to match only at a specific position.
sed -i '' "/$pattern/d" $file

perl -F'\|' -wlane'print if $F[7] ne "U"' file > new
With the -a switch each line is split into words, available in the @F array. The separator to split on can be set with the -F option (default is whitespace), and here it's |. See switches in perlrun. Then we just check the 8th field and print.
In order to change the input file in-place add -i switch
perl -i -F'\|' -wlane'print if $F[7] ne "U"' file
or use -i.bak to keep (.bak) backup as well.
I see that a question popped up about logging those lines that aren't kept in the file.
One way is to hijack the STDERR stream for them
perl -i -F'\|' -wlane'$F[7] ne "U" ? print : print STDERR $_' file 2> excluded
where the file excluded gets the STDERR stream, redirected (in bash) using 2>. However, that can be outright dangerous: any warnings are now hidden from the terminal, and since they also go to that file they corrupt the list of excluded lines.
So better collect those lines and print them at the end
perl -i -F'\|' -wlanE'
$F[7] ne "U" ? print : push @exclude, $_;
END { say for @exclude }
' input > excluded
where the file excluded gets all omitted (excluded) lines. (I switched -e to -E in order to have say.)
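For comparison, here is a minimal Python sketch of the same idea: split each row on |, test the 8th field, and collect the excluded rows separately instead of sending them to STDERR. The function name split_rows and its parameters are made up for this illustration; it works on a list of lines rather than editing a file in place.

```python
def split_rows(lines, field_index=7, value="U", sep="|"):
    """Return (kept, excluded) lists of lines.

    field_index is 0-based, so 7 means the 8th field (hypothetical helper).
    """
    kept, excluded = [], []
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        # Rows that are too short to have an 8th field are kept untouched.
        if len(fields) > field_index and fields[field_index] == value:
            excluded.append(line)
        else:
            kept.append(line)
    return kept, excluded


rows = [
    "A|B|DADD|H|O| |123 A Street; Apt.2|U|M",
    "A|B|DADD|H|O| |123 A Street; Apt.2|A|M",
    "A|B|DADD|H|O| |123 A Street; Apt.2|B|M",
    "A|B|DADD|H|O| |123 A Street; Apt.2|U|M",
]
kept, excluded = split_rows(rows)
```

Here kept holds the rows to write back to the file and excluded holds the rows to log or print.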

Sounds like this might be what you want:
$ cat file
A|B|DADD|H|O| |123 A Street; Apt.2|U|M
A|B|DADD|H|O| |123 A Street; Apt.2|A|M
A|B|DADD|H|O| |123 A Street; Apt.2|B|M
A|B|DADD|H|O| |123 A Street; Apt.2|U|M
$ awk -i inplace -F'[|]' '$8=="U"{print|"cat>&2"; next} 1' file
A|B|DADD|H|O| |123 A Street; Apt.2|U|M
A|B|DADD|H|O| |123 A Street; Apt.2|U|M
$ cat file
A|B|DADD|H|O| |123 A Street; Apt.2|A|M
A|B|DADD|H|O| |123 A Street; Apt.2|B|M
The above uses GNU awk for -i inplace. With other awks you'd just do:
awk -F'[|]' '$8=="U"{print|"cat>&2"; next} 1' file > tmp && mv tmp file
To log the deleted line to a file named log1:
awk -F'[|]' '$8=="U"{print >> "log1"; next} 1' file
To log it and print it to stderr:
awk -F'[|]' '$8=="U"{print|"tee -a log1 >&2"; next} 1' file

Related

filter data in text file and load into postgresql

I have a text file with the below format:
Text: htpps:/xxx
Expiry: ddmm/yyyy
object_id: 00
object: ABC
auth: 333
RequestID: 1234
Text: htpps:/yyy
Expiry: ddmm/yyyy
object_id: 01
object: NNN
auth: 222
RequestID: 3456
and so on
...
I want to delete all lines with the exception of lines with prefix "Expiry:" "object:" and "object_id:"
then load it into a table in postgresql
Would really appreciate your help on the above two.
thanks
Nick
I'm sure there will be other methods, but I found an iterative approach if every object has the same format of
Text: htpps:/xxx
Expiry: ddmm/yyyy
object_id: 00
object: ABC
auth: 333
RequestID: 1234
Then you can transform the above with
more test.txt | awk '{ printf "%s\n", $2 }' | tr '\n' ',' | sed 's/,,/\n/g' | sed '$ s/.$//'
and, for your example it will generate the entries in CSV format
htpps:/xxx,ddmm/yyyy,00,ABC,333,1234
htpps:/yyy,ddmm/yyyy,01,NNN,222,3456
The above code does:
awk '{ printf "%s\n", $2 }': prints only the second element of each row
tr '\n' ',': transforms newlines into ,
sed 's/,,/\n/g': turns each double comma left by an empty line between records into a newline (the g flag is needed when there are more than two records)
sed '$ s/.$//': removes the trailing ,
Of course this is probably an oversimplified example, but you could use it as basis. Once the file is in CSV you can load it with psql
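If only the Expiry, object_id and object values are wanted, as the question asks, the filtering can also be sketched in Python. This is an illustration under assumptions, not the method above: the function name records_to_csv is made up, and a record is assumed to be complete as soon as all three wanted keys have been seen.

```python
import csv
import io

WANTED = ("Expiry", "object_id", "object")


def records_to_csv(text):
    """Keep only the WANTED fields and emit one CSV row per record."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    record = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # lines without "key: value" shape are skipped
        key = key.strip()
        if key in WANTED:
            record[key] = value.strip()
        # Assumption: a record is complete once all wanted keys are present.
        if len(record) == len(WANTED):
            writer.writerow([record[k] for k in WANTED])
            record = {}
    return out.getvalue()


sample = """Text: htpps:/xxx
Expiry: ddmm/yyyy
object_id: 00
object: ABC
auth: 333
RequestID: 1234
Text: htpps:/yyy
Expiry: ddmm/yyyy
object_id: 01
object: NNN
auth: 222
RequestID: 3456
"""
csv_text = records_to_csv(sample)
```

The resulting CSV text can be written to a file and loaded from psql, for example with its \copy command and the csv format option, into a table whose columns match the three fields.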

output of two command need to be added as columns awk

I have a file which contains package names and their releases. The Release Change line has both the old version and the new version:
grep -A 2 'Package list' pkglist
Package list: xorg-x11-drv-ati-firmware-7.6.1-3.el6_9.noarch
Repository: /Linux/6.9/rpms/xorg-x11-drv-ati-firmware-7.6.1-4.el6.noarch.rpm
Release Change: 3.el6_9 --> 4.el6
Package list: yum-plugin-fastestmirror-1.1.30-40.0.1.el6.noarch
Repository: /Linux/6.9/rpms/yum-plugin-fastestmirror-1.1.30-42.0.1.el6_10.noarch.rpm
Release Change: 40.0.1.el6 --> 42.0.1.el6_10
Package list: yum-utils-1.1.30-40.0.1.el6.noarch
Repository: /Linux/6.9/rpms/yum-utils-1.1.30-42.0.1.el6_10.noarch.rpm
Release Change: 40.0.1.el6 --> 42.0.1.el6_10
I need formatted output in three columns, with the package name in the 1st column, the old version in the 2nd, and the new version in the 3rd:
xorg-x11-drv-ati-firmware-7.6.1-3.el6_9.noarch 3.el6_9 4.el6
yum-plugin-fastestmirror-1.1.30-40.0.1.el6.noarch 40.0.1.el6 42.0.1.el6_10
yum-utils-1.1.30-40.0.1.el6.noarch 40.0.1.el6 42.0.1.el6_10
What I am trying is:
grep -i 'Package list' pkglist | awk '{print $3}'
xorg-x11-drv-ati-firmware-7.6.1-3.el6_9.noarch
yum-plugin-fastestmirror-1.1.30-40.0.1.el6.noarch
yum-utils-1.1.30-40.0.1.el6.noarch
grep -A 2 'Package list' pkglist | grep -i 'Release' | awk '{print $3,$5}'
3.el6_9 4.el6
40.0.1.el6 42.0.1.el6_10
40.0.1.el6 42.0.1.el6_10
The output of the above two commands needs to be combined into three columns on each line.
awk '/Package list/{printf "%s%s", $3, OFS} /Release Change/{print $3, $5}' pkglist
Returns
xorg-x11-drv-ati-firmware-7.6.1-3.el6_9.noarch 3.el6_9 4.el6
yum-plugin-fastestmirror-1.1.30-40.0.1.el6.noarch 40.0.1.el6 42.0.1.el6_10
yum-utils-1.1.30-40.0.1.el6.noarch 40.0.1.el6 42.0.1.el6_10
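The same pairing logic, remembering the package name until its Release Change line arrives, can be sketched in Python. The function name package_changes is made up for this illustration, and the input is assumed to follow the three-line layout shown in the question.

```python
def package_changes(lines):
    """Return (package, old_release, new_release) tuples."""
    rows = []
    name = None
    for line in lines:
        parts = line.split()
        if line.startswith("Package list:"):
            name = parts[2]  # third whitespace-separated field, like awk's $3
        elif line.startswith("Release Change:") and name is not None:
            # Fields: Release, Change:, old, -->, new
            rows.append((name, parts[2], parts[4]))
            name = None
    return rows


pkglist = [
    "Package list: xorg-x11-drv-ati-firmware-7.6.1-3.el6_9.noarch",
    "Repository: /Linux/6.9/rpms/xorg-x11-drv-ati-firmware-7.6.1-4.el6.noarch.rpm",
    "Release Change: 3.el6_9 --> 4.el6",
    "Package list: yum-utils-1.1.30-40.0.1.el6.noarch",
    "Repository: /Linux/6.9/rpms/yum-utils-1.1.30-42.0.1.el6_10.noarch.rpm",
    "Release Change: 40.0.1.el6 --> 42.0.1.el6_10",
]
for row in package_changes(pkglist):
    print(" ".join(row))
```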

sed - Addressing using two strings

I am picking up sed. I am having trouble understanding how line addressing in sed works when a pattern is used to specify the line address.
I have a sample text file named emp.lst with the following contents:
2233|a.k. shukla |g.m. |sales |12/12/52|6000
9876|jai sharma |director |production|12/03/50|7000
5678|sumit chakrobarty|d.g.m. |marketing |19/04/43|6000
2365|barun sengupta |director |personnel |11/05/47|7800
5423|n.k. gupta |chairman |admin |30/08/56|5400
1006|chanchal singhvi |director |sales |03/09/38|6700
6213|karuna ganguly |g.m. |accounts |05/06/62|6300
1265|s.n. dasgupta |manager |sales |12/09/63|5600
4290|jayant Choudhury |executive|production|07/09/50|6000
2476|anil aggarwal |manager |sales |01/05/59|5000
6521|lalit chowdury |director |marketing |26/09/45|8200
3212|shyam saksena |d.g.m. |accounts |12/12/55|6000
3564|sudhir Agarwal |executive|personnel |06/07/47|7500
2345|j.b. saxena |g.m. |marketing |12/03/45|8000
0110|v.k. agrawal |g.m. |marketing |31/12/40|9000
As I understand it, a line address can be specified either as line number(s) or as a pattern to match, given as text or a regular expression.
I understand how sed -n '1p' emp.lst and sed -n '1,2p' emp.lst print line 1 and line 1 & 2 respectively without echoing all lines (-n).
I also understand and appreciate how sed -n '/director/p' emp.lst match all the lines containing the string director, and outputs:
9876|jai sharma |director |production|12/03/50|7000
2365|barun sengupta |director |personnel |11/05/47|7800
1006|chanchal singhvi |director |sales |03/09/38|6700
6521|lalit chowdury |director |marketing |26/09/45|8200
Now, when I specify multiple patterns as sed -n '/director/,/executive/p' emp.lst, the output shown is:
9876|jai sharma |director |production|12/03/50|7000
5678|sumit chakrobarty|d.g.m. |marketing |19/04/43|6000
2365|barun sengupta |director |personnel |11/05/47|7800
5423|n.k. gupta |chairman |admin |30/08/56|5400
1006|chanchal singhvi |director |sales |03/09/38|6700
6213|karuna ganguly |g.m. |accounts |05/06/62|6300
1265|s.n. dasgupta |manager |sales |12/09/63|5600
4290|jayant Choudhury |executive|production|07/09/50|6000
6521|lalit chowdury |director |marketing |26/09/45|8200
3212|shyam saksena |d.g.m. |accounts |12/12/55|6000
3564|sudhir Agarwal |executive|personnel |06/07/47|7500
What does this output represent?
Is it all lines containing the pattern director and executive? Clearly no, as there are some lines not containing either one of the patterns.
Is it all lines from the first one matching either of the patterns to the last one matching either of the patterns? No again: by that logic, one line (2476|anil aggarwal |manager |sales |01/05/59|5000) is missing from the output.
I have not been able to work out how the command sed -n '/director/,/executive/p' emp.lst operates. I have gone through the sed man page and still cannot deduce it.
How do I approach understanding the working?
For context, I am running sed command built into macOS High Sierra 10.13.6 running in Bash version 4.4.
Note: I am a sed newbie. Please edit any mistake or incorrect terminology that I may have used.
https://www.gnu.org/software/sed/manual/sed.html#Range-Addresses:
An address range can be specified by specifying two addresses separated by a comma (,). An address range matches lines starting from where the first address matches, and continues until the second address matches (inclusively):
$ seq 10 | sed -n '4,6p'
4
5
6
Thus 1,2p does not mean "print lines 1 and 2" but "print all lines between line 1 and line 2". The difference becomes more clear with e.g. 3,7p, which will not just print line 3 and 7, but lines 3, 4, 5, 6, 7.
/director/,/executive/p prints all lines between a starting line (matching director) and an ending line (matching executive).
In your case, you have two matching ranges (each starting with director and ending with executive):
9876|jai sharma |director |production|12/03/50|7000
5678|sumit chakrobarty|d.g.m. |marketing |19/04/43|6000
2365|barun sengupta |director |personnel |11/05/47|7800
5423|n.k. gupta |chairman |admin |30/08/56|5400
1006|chanchal singhvi |director |sales |03/09/38|6700
6213|karuna ganguly |g.m. |accounts |05/06/62|6300
1265|s.n. dasgupta |manager |sales |12/09/63|5600
4290|jayant Choudhury |executive|production|07/09/50|6000
6521|lalit chowdury |director |marketing |26/09/45|8200
3212|shyam saksena |d.g.m. |accounts |12/12/55|6000
3564|sudhir Agarwal |executive|personnel |06/07/47|7500
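The range behaviour can be modelled as a small state machine. The following Python sketch only illustrates the rule quoted above and is not sed's implementation; the function name sed_range is made up. A range opens on a line matching the first pattern, every following line is printed, and the range closes on the first later line matching the second pattern. After that, a new range can open again.

```python
import re


def sed_range(lines, start, end):
    """Mimic sed -n '/start/,/end/p' on a list of lines (illustrative only)."""
    out = []
    in_range = False
    for line in lines:
        if in_range:
            out.append(line)
            if re.search(end, line):
                in_range = False  # the end line is included, then the range closes
        elif re.search(start, line):
            out.append(line)
            in_range = True  # the end pattern is only tested on later lines
    return out


titles = ["g.m.", "director", "d.g.m.", "director", "executive",
          "manager", "director", "executive", "g.m."]
selected = sed_range(titles, "director", "executive")
```

In this reduced example, manager sits between two ranges and is dropped, just like the 2476|anil aggarwal line in the question. If the end pattern never matches again, the range runs to the end of the input.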
From man sed:
0,addr2
Start out in "matched first address" state, until addr2 is found.
This is similar to 1,addr2, except that if addr2 matches the very
first line of input the 0,addr2 form will be at the end of its range,
whereas the 1,addr2 form will still be at the beginning of its range.
This works only when addr2 is a regular expression.
Not 100% sure if this is the manual section that applies but it looks like you have 2 blocks from "director" to "executive" in your output above.
There happen to be some other "director" lines between the first "director" and first succeeding "executive".

How to replace the date with "sed" by the words "today" or "tomorrow"

My goal is to retrieve tidal times from www.worldtides.info in a specific way.
I got an API key on the site and can successfully retrieve the information by issuing:
curl -s "http://www.worldtides.info/api?extremes&lat=my_latitude&lon=my_longitude&length=86400&key=my_api_key"| jq -r ".extremes[] | .date + .type"
I've installed jq on my raspberry to parse "date" and "type" from the json result.
The result in the terminal is:
2016-04-03T16:47+0000Low
2016-04-03T23:01+0000High
2016-04-04T05:18+0000Low
2016-04-04T11:29+0000High
To get a cleaner result, I use sed:
curl -s "http://www.worldtides.info/api?extremes&lat=my_latitude&lon=my_longitude&length=86400&key=my_api_key"| jq -r ".extremes[] | .date + .type" | sed 's/+0000/ /g' | sed 's/T/ /g'
The result is:
2016-04-03 16:47 Low
2016-04-03 23:01 High
2016-04-04 05:18 Low
2016-04-04 11:29 High
I don't know how to replace the date with the word "today" if it is today's date (2016-04-03 as I write this) or with the word "tomorrow" if it is tomorrow's date.
I've tried:
curl -s "http://www.worldtides.info/api?extremes&lat=my_latitude&lon=my_longitude&length=86400&key=my_api_key"| jq -r ".extremes[] | .date + .type" | sed 's/date +"%Y-%m-%d"/Today/g' | sed 's/+0000/ /g' | sed 's/T/ /g'
But no luck: nothing changes. Can you help me? Thanks
Some lean Linux distributions do not ship GNU date out of the box; they use a POSIX date that has no "tomorrow" option. So you might have to install GNU date first if you want to use sed with date. Alternatively, if GNU awk is available, you can also do
awk '$1 ~ strftime("%Y-%m-%d") {$1 = "today"} $1 ~ strftime("%Y-%m-%d",systime()+24*3600) {$1 = "tomorrow"} {print}'
You can do the substitution this way:
today=$(date +%Y-%m-%d)
tomorrow=$(date --date="tomorrow" +%Y-%m-%d)
echo "$today" "$tomorrow"
sed "s/$today/today/g; s/$tomorrow/tomorrow/g;" your_last_result
where your_last_result is a file containing the output shown in your question under "The result is:"
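Where neither GNU date nor GNU awk is available, the same substitution can be sketched in Python. This is an illustration, not part of the answers above: the function name label_dates is made up, and it assumes each line starts with the date followed by a space, as in the cleaned-up output. Note that the API timestamps are in UTC (+0000), so the local date may differ near midnight.

```python
from datetime import date, timedelta


def label_dates(lines, today=None):
    """Replace a leading YYYY-MM-DD with 'today' or 'tomorrow' when it matches."""
    today = today or date.today()
    names = {
        today.isoformat(): "today",
        (today + timedelta(days=1)).isoformat(): "tomorrow",
    }
    out = []
    for line in lines:
        first, sep, rest = line.partition(" ")
        # Other dates are left unchanged.
        out.append(names.get(first, first) + sep + rest)
    return out


tides = [
    "2016-04-03 16:47 Low",
    "2016-04-03 23:01 High",
    "2016-04-04 05:18 Low",
    "2016-04-04 11:29 High",
]
labelled = label_dates(tides, today=date(2016, 4, 3))
```

Passing today explicitly, as in the example, makes the result reproducible; leaving it out uses the current local date.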

delete_node failure with osm2pgsql

I updated to OS X Mavericks and I am trying to set up again a workflow that converts personal OSM data (created using JOSM software) into Tilemill maps.
For this, I use osm2pgsql to populate a postgres/postgis database with my OSM files. Before the update, the same workflow worked well.
I use Postgresql.app version 9.3.0.0 and osm2pgsql version 0.84.0
When I launch the osm2pgsql command, I get this error :
osm2pgsql SVN version 0.84.0 (64bit id space)
Using projection SRS 900913 (Spherical Mercator)
Setting up table: coast_point
...
Reading in file: ../src/misc/00_Coast.osm
delete_node failed: ERROR: prepared statement "delete_node" does not exist
(7)
Arguments were: -476852,
Error occurred, cleaning up
So, there is a "delete_node" error, and I really don't know why.
I tried changing the negative 'id' values to positive ones, but I get the same error.
Here are the first lines of the OSM file that caused the error:
<?xml version='1.0' encoding='UTF-8'?>
<osm version='0.6' upload='true' generator='JOSM'>
<node id='-476852' action='modify' visible='true' lat='-4.660264310091712' lon='11.79785544887142' />
<node id='-476850' action='modify' visible='true' lat='-4.659760277426281' lon='11.78306037634432' />
...
I get the same error on all files that worked previously.
I opened a bug report on the osm2pgsql GitHub, but that forum is not very active, so I don't expect any help from there.
I've found in the osm2pgsql code that the delete_node part is in the osm2pgsql/middle-pgsql.c file:
"PREPARE get_node (" POSTGRES_OSMID_TYPE ") AS SELECT lat,lon,tags FROM %p_nodes WHERE id = $1 LIMIT 1;\n"
"PREPARE get_node_list(" POSTGRES_OSMID_TYPE "[]) AS SELECT id, lat, lon FROM %p_nodes WHERE id = ANY($1::" POSTGRES_OSMID_TYPE "[])",
"PREPARE delete_node (" POSTGRES_OSMID_TYPE ") AS DELETE FROM %p_nodes WHERE id = $1;\n",
.copy = "COPY %p_nodes FROM STDIN;\n",
.analyze = "ANALYZE %p_nodes;\n",
.stop = "COMMIT;\n"
(...)
pgsql_execPrepared(node_table->sql_conn, "delete_node", 1, paramValues, PGRES_COMMAND_OK );
If you have any idea, you're very welcome !
Thanks
Greg
With help from the osm2pgsql guys, I figured out that the problem was mainly due to feeding JOSM files into osm2pgsql.
In fact, JOSM files are not pure OSM files, as some key/values are missing: version, user and timestamp.
As I don't need those tags, I preprocessed the OSM files from JOSM with this script in order to pass the compatibility tests:
#!/bin/bash
SOURCE=$1
TARGET=$2
sed "s/node id='-/node id='/g" "$SOURCE" | sed "s/nd ref='-/nd ref='/g" \
| sed "s/ action='modify'//g" \
| sed "/node/ s/ timestamp='[^']*'//" \
| sed "/node/ s/ action='[^']*'//" \
| sed "/node/ s/ version='[^']*'//" \
| sed "/node/ s/ user='[^']*'//" \
| sed "/node/ s/ id/ version='1' user='iero' timestamp='1970-01-01T12:00:00Z' id/" \
| sed "/way/ s/ timestamp='[^']*'//" \
| sed "/way/ s/ action='[^']*'//" \
| sed "/way/ s/ version='[^']*'//" \
| sed "/way/ s/ user='[^']*'//" \
| sed "/way/ s/ id/ version='1' user='iero' timestamp='1970-01-01T12:00:00Z' id/" \
| sed "/relation/ s/ timestamp='[^']*'//" \
| sed "/relation/ s/ action='[^']*'//" \
| sed "/relation/ s/ version='[^']*'//" \
| sed "/relation/ s/ user='[^']*'//" \
| sed "/relation/ s/ id/ version='1' user='iero' timestamp='1970-01-01T12:00:00Z' id/" \
> "$TARGET"
It's not the most beautiful/optimal script we can make, but it seems to work well. I have my data in the pgsql database now.
With this script, I might be able to pass the Osmosis tests too!
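The same preprocessing can also be sketched with an XML parser instead of a chain of sed calls. This is a different technique from the script above, shown only as an illustration: the function name normalize_josm is made up, it uses Python's standard xml.etree.ElementTree, and it mirrors the script's choices (drop action, strip the minus sign from node ids and nd refs, set placeholder version, user and timestamp values).

```python
import xml.etree.ElementTree as ET


def normalize_josm(xml_text, user="iero", timestamp="1970-01-01T12:00:00Z"):
    """Return OSM XML with the attributes the sed script adds or removes."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag in ("node", "way", "relation"):
            elem.attrib.pop("action", None)   # JOSM-only attribute
            elem.set("version", "1")          # placeholder values, as in the script
            elem.set("user", user)
            elem.set("timestamp", timestamp)
        if elem.tag == "node" and "id" in elem.attrib:
            elem.set("id", elem.get("id").lstrip("-"))
        elif elem.tag == "nd" and "ref" in elem.attrib:
            elem.set("ref", elem.get("ref").lstrip("-"))
    return ET.tostring(root, encoding="unicode")


josm = (
    "<osm version='0.6' upload='true' generator='JOSM'>"
    "<node id='-476852' action='modify' visible='true' lat='-4.66' lon='11.79' />"
    "<way id='-5' action='modify'><nd ref='-476852' /></way>"
    "</osm>"
)
cleaned = normalize_josm(josm)
```

Like the sed script, this leaves way and relation ids untouched; whether they also need converting depends on the data.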
Thanks to you all
Greg