filter data in text file and load into postgresql - postgresql

I have a text file with the below format:
Text: htpps:/xxx
Expiry: ddmm/yyyy
object_id: 00
object: ABC
auth: 333
RequestID: 1234
Text: htpps:/yyy
Expiry: ddmm/yyyy
object_id: 01
object: NNN
auth: 222
RequestID: 3456
and so on
...
I want to delete all lines except those with the prefixes "Expiry:", "object:" and "object_id:",
then load the result into a table in PostgreSQL.
Would really appreciate your help on the above two.
thanks
Nick

I'm sure there will be other methods, but I found an iterative approach that works if every object has the same format, with a blank line between records:
Text: htpps:/xxx
Expiry: ddmm/yyyy
object_id: 00
object: ABC
auth: 333
RequestID: 1234
Then you can transform the above with
cat test.txt | awk '{ printf "%s\n", $2 }' | tr '\n' ',' | sed 's/,,/\n/g' | sed '$ s/.$//'
and, for your example, it will generate the entries in CSV format:
htpps:/xxx,ddmm/yyyy,00,ABC,333,1234
htpps:/yyy,ddmm/yyyy,01,NNN,222,3456
The above pipeline does:
awk '{ printf "%s\n", $2 }': prints only the second field of each line (the value after the prefix)
tr '\n' ',': replaces every newline with a comma, turning the whole file into one comma-separated line
sed 's/,,/\n/g': each blank line between records produced an empty value, i.e. ",,", which this turns back into a line break between records (the /g is needed once there are more than two records)
sed '$ s/.$//': removes the trailing comma on the last line
Of course this is probably an oversimplified example, but you could use it as basis. Once the file is in CSV you can load it with psql
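Since the question only asks to keep the "Expiry:", "object_id:" and "object:" lines, here is a sketch of a variant that filters to just those three fields before building the CSV. The file name test.txt, the column order, and the assumption that every record ends with a "RequestID:" line are mine, not from the question:

```shell
# Recreate the sample input (hypothetical file name test.txt)
printf '%s\n' \
  'Text: htpps:/xxx' 'Expiry: ddmm/yyyy' 'object_id: 00' \
  'object: ABC' 'auth: 333' 'RequestID: 1234' \
  'Text: htpps:/yyy' 'Expiry: ddmm/yyyy' 'object_id: 01' \
  'object: NNN' 'auth: 222' 'RequestID: 3456' > test.txt

# Keep only the three wanted prefixes; emit one CSV row per record,
# treating the "RequestID:" line as the end-of-record marker.
awk '/^(Expiry|object_id|object):/ { row = row ? row "," $2 : $2 }
     /^RequestID:/                 { print row; row = "" }' test.txt > test.csv
cat test.csv
# ddmm/yyyy,00,ABC
# ddmm/yyyy,01,NNN
```

Once test.csv exists, it can be loaded from psql with something like \copy mytable from 'test.csv' with (format csv), where mytable stands in for your target table.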

Related

PgAdmin Exporting text column to csv file

I have a table with 3 columns - type, name, and code.
The code column contains the procedure/function source code.
I have exported it to a CSV file using the Import/Export option in pgAdmin 4 v5, but the code column does not stay in a single cell in the CSV file; its data spreads across many rows and columns.
I checked Encoding as UTF8, which normally works fine when exporting other tables.
Other settings: Format: csv, Encoding: UTF8. I have not changed any other settings.
Can someone help me export it properly?
An explanation of what you are seeing:
CREATE TABLE public.csv_test (
fld_1 character varying,
fld_2 character varying,
fld_3 character varying,
fld_4 character varying
);
insert into csv_test values ('1', E'line with line end. \n New line', 'test', 'dog');
insert into csv_test values ('2', E'line with line end. \n New line', 'test', 'dog');
insert into csv_test values ('3', E'line with line end. \n New line \n Another line', 'test2', 'cat');
insert into csv_test values ('4', E'line with line end. \n New line \n \t Another line', 'test3', 'cat');
select * from csv_test ;
fld_1 | fld_2 | fld_3 | fld_4
-------+-----------------------+-------+-------
1 | line with line end. +| test | dog
| New line | |
2 | line with line end. +| test | dog
| New line | |
3 | line with line end. +| test2 | cat
| New line +| |
| Another line | |
4 | line with line end. +| test3 | cat
| New line +| |
| Another line | |
\copy csv_test to csv_test.csv with (format 'csv');
\copy csv_test to csv_test.txt;
--fld_2 has line ends and/or tabs so in CSV the data will wrap inside the quotes.
cat csv_test.csv
1,"line with line end.
New line",test,dog
2,"line with line end.
New line",test,dog
3,"line with line end.
New line
Another line",test2,cat
4,"line with line end.
New line
Another line",test3,cat
-- In text format the line ends and tabs are shown and not wrapped.
cat csv_test.txt
1 line with line end. \n New line test dog
2 line with line end. \n New line test dog
3 line with line end. \n New line \n Another line test2 cat
4 line with line end. \n New line \n \t Another line test3 cat
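The wrapped rows are therefore not corruption: any CSV-aware reader reassembles them from the quotes. A quick sanity check (a sketch recreating two of the exported rows; the record-counting grep works here only because fld_1 happens to be numeric):

```shell
# Recreate two rows of the exported file, with quoted embedded newlines
cat > csv_test.csv <<'EOF'
1,"line with line end.
 New line",test,dog
2,"line with line end.
 New line",test,dog
EOF

wc -l < csv_test.csv              # 4 physical lines ...
grep -c '^[0-9]*,"' csv_test.csv  # ... but only 2 logical CSV records
```

Loading the file back with \copy csv_test from csv_test.csv with (format 'csv') round-trips it, because COPY honors the same quoting rules.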

Replace newline (\n) except last of each line

My input is split across multiple lines; I want each record output on a single line.
For example, the input is (where "newline" stands for an empty line):
1|23|ABC
DEF
GHI
newline
newline
2|24|PQR
STU
LMN
XYZ
newline
Output:
1|23|ABC DEF GHI
2|24|PQR STU LMN XYZ
Well, here is one for awk:
$ awk -v RS="" -F"\n" '{$1=$1}1' file
Output:
1|23|ABC DEF GHI
2|24|PQR STU LMN XYZ
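Since the one-liner is terse, a brief unpacking: RS="" switches awk into paragraph mode (records are separated by blank lines), -F"\n" makes every line of a record a separate field, the no-op assignment $1=$1 forces awk to rebuild the record joined by the default output field separator (a single space), and the trailing 1 prints it:

```shell
# Paragraph mode: each blank-line-separated block becomes one record,
# each line of the block one field; $1=$1 rejoins the fields with spaces.
printf '1|23|ABC\nDEF\nGHI\n\n2|24|PQR\nSTU\nLMN\nXYZ\n' |
awk -v RS="" -F"\n" '{$1=$1}1'
# 1|23|ABC DEF GHI
# 2|24|PQR STU LMN XYZ
```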

Remove row from a file perl

I have a file with |-delimited rows, and I want to check the value at the 8th position: if it matches, remove that row from the file; if not, leave it in.
Below is the file format; I want to remove all rows that have the value U in the 8th position:
A|B|DADD|H|O| |123 A Street; Apt.2|U|M
A|B|DADD|H|O| |123 A Street; Apt.2|A|M
A|B|DADD|H|O| |123 A Street; Apt.2|B|M
A|B|DADD|H|O| |123 A Street; Apt.2|U|M
How can we do this in Perl, or is there a way with Awk or Sed? After removing the rows I also want to print them.
I have tried sed, but it matches anywhere in the line and I want to match at a specific position:
sed -i '' "/$pattern/d" $file
perl -F'\|' -wlane'print if $F[7] ne "U"' file > new
With the -a switch each line is split into fields, available in the @F array. The separator to split on can be set with the -F option (the default is whitespace), and here it's |. See switches in perlrun. Then we just check the 8th field and print.
In order to change the input file in-place add -i switch
perl -i -F'\|' -wlane'print if $F[7] ne "U"' file
or use -i.bak to keep (.bak) backup as well.
I see that a question popped up about logging those lines that aren't kept in the file.
One way is to hijack the STDERR stream for them
perl -i -F'\|' -wlane'$F[7] ne "U" ? print : print STDERR $_' file 2> excluded
where the file excluded gets the STDERR stream, redirected (in bash) using 2>. However, that can be outright dangerous since now possible warnings are hidden and corrupt the file intended for excluded lines (as they also go to that file).
So better collect those lines and print them at the end
perl -i -F'\|' -wlanE'
$F[7] ne "U" ? print : push @exclude, $_;
END { say for @exclude }
' input > excluded
where the file excluded gets all omitted (excluded) lines. (I switched -e to -E so as to have say.)
Sounds like this might be what you want:
$ cat file
A|B|DADD|H|O| |123 A Street; Apt.2|U|M
A|B|DADD|H|O| |123 A Street; Apt.2|A|M
A|B|DADD|H|O| |123 A Street; Apt.2|B|M
A|B|DADD|H|O| |123 A Street; Apt.2|U|M
$ awk -i inplace -F'[|]' '$8=="U"{print|"cat>&2"; next} 1' file
A|B|DADD|H|O| |123 A Street; Apt.2|U|M
A|B|DADD|H|O| |123 A Street; Apt.2|U|M
$ cat file
A|B|DADD|H|O| |123 A Street; Apt.2|A|M
A|B|DADD|H|O| |123 A Street; Apt.2|B|M
The above uses GNU awk for -i inplace. With other awks you'd just do:
awk -F'[|]' '$8=="U"{print|"cat>&2"; next} 1' file > tmp && mv tmp file
To log the deleted line to a file named log1:
awk -F'[|]' '$8=="U"{print >> "log1"; next} 1' file
To log it and print it to stderr:
awk -F'[|]' '$8=="U"{print|"tee -a log1 >&2"; next} 1' file
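For completeness, the logging variant end to end on the sample data (the names file and log1 are just examples):

```shell
printf '%s\n' \
  'A|B|DADD|H|O| |123 A Street; Apt.2|U|M' \
  'A|B|DADD|H|O| |123 A Street; Apt.2|A|M' \
  'A|B|DADD|H|O| |123 A Street; Apt.2|B|M' > file

# Kept rows go to stdout, deleted rows are appended to log1
awk -F'[|]' '$8=="U"{print >> "log1"; next} 1' file
cat log1
```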

How to replace the date with "sed" by the words "today" or "tomorrow"

My goal is to retrieve tidal times from www.worldtides.info in a specific way.
I got an API key on the site and can successfully retrieve the infos by issuing:
curl -s "http://www.worldtides.info/api?extremes&lat=my_latitude&lon=my_longitude&length=86400&key=my_api_key"| jq -r ".extremes[] | .date + .type"
I've installed jq on my Raspberry Pi to parse "date" and "type" from the JSON result.
The result in the terminal is:
2016-04-03T16:47+0000Low
2016-04-03T23:01+0000High
2016-04-04T05:18+0000Low
2016-04-04T11:29+0000High
To get a cleaner result, I use sed:
curl -s "http://www.worldtides.info/api?extremes&lat=my_latitude&lon=my_longitude&length=86400&key=my_api_key" | jq -r ".extremes[] | .date + .type" | sed 's/+0000/ /g' | sed 's/T/ /g'
The result is:
2016-04-03 16:47 Low
2016-04-03 23:01 High
2016-04-04 05:18 Low
2016-04-04 11:29 High
I don't know how to replace the date with the word "today" if it is today's date (2016-04-03 as I write this), and with the word "tomorrow" if it is tomorrow's date.
I've tried:
curl -s "http://www.worldtides.info/api?extremes&lat=my_latitude&lon=my_longitude&length=86400&key=my_api_key"| jq -r ".extremes[] | .date + .type" | sed 's/date +"%Y-%m-%d"/Today/g' | sed 's/+0000/ /g' | sed 's/T/ /g'|
But no luck, no change. Can you help me? Thanks
Some lean Linux distributions do not ship GNU date out of the box but only POSIX date, which has no notion of "tomorrow". So you might have to install GNU date first if you want to use sed together with date. Alternatively, if GNU awk is available, you can also do
awk '$1 ~ strftime("%Y-%m-%d") {$1 = "today"} $1 ~ strftime("%Y-%m-%d",systime()+24*3600) {$1 = "tomorrow"} {print}'
You can do the substitution this way:
today=`date +%Y-%m-%d`
tomorrow=`date --date="tomorrow" +%Y-%m-%d`
echo $today $tomorrow
sed "s/$today/today/g; s/$tomorrow/tomorrow/g;" your_last_result
where your_last_result is the file containing the data from your question under "The result is:".
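The whole substitution wired together on synthetic data (this relies on GNU date for --date=tomorrow; on a distribution with only POSIX date, use the awk/strftime approach from the other answer):

```shell
today=$(date +%Y-%m-%d)
tomorrow=$(date --date=tomorrow +%Y-%m-%d)   # GNU date only

# Build two sample lines carrying today's and tomorrow's dates,
# then replace the dates with the words.
printf '%s 16:47 Low\n%s 05:18 Low\n' "$today" "$tomorrow" |
sed "s/$today/today/g; s/$tomorrow/tomorrow/g"
# today 16:47 Low
# tomorrow 05:18 Low
```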

Remove duplicate numbers within parentheses using sed

I am trying to remove duplicate numbers within parentheses using sed.
So I have the following string:
Abdc 1234 1234 (5678) (5678) (9012) (9012) (3456)
I want to use sed to remove any 4-digit numbers within the parentheses, including the parentheses. So my string should look like this:
Abdc 1234 1234 (5678) (9012) (3456)
In this case "(5678)" and "(9012)" were removed because they were 4-digit numbers in parentheses that repeated. The "1234" numbers were not removed because they were not within parentheses. The "(3456)" was not removed because it was not repeated.
I do not know how to do this with sed but you could try the following with awk:
$ echo "Abdc 1234 1234 (5678) (5678) (9012) (9012) (3456)" | awk '
{
for(i=1;i<=NF;i++) {
if(substr($i,1,1) != "(" || (seen[$i] != 1)) {
seen[$i]=1;
printf "%s ",$i
}
};
print ""
}'
Output:
Abdc 1234 1234 (5678) (9012) (3456)
This loops through the line's fields and prints a field only if it does not start with "(" or has not been seen before.
This works for your input:
echo 'Abdc 1234 1234 (5678) (5678) (9012) (9012) (3456)' |
sed 's/\(([0-9][0-9]*)\) \1/\1/g'
It assumes duplicates follow each other, if that is not the case, use this version:
echo 'Abdc 1234 1234 (5678) (5678) (9012) (9012) (3456)' |
sed 's/\(([0-9][0-9]*)\) \(.*\)\1/\1\2/g'
Or a bit shorter with GNU sed extended expressions:
echo 'Abdc 1234 1234 (5678) (5678) (9012) (9012) (3456)' |
sed -r 's/(\([0-9]+\)) (.*)\1/\1\2/g'
Output in all cases:
Abdc 1234 1234 (5678) (9012) (3456)
Edit - handle situation where more than two identical items exist
This can be done by looping over the pattern until it no longer matches:
echo 'Abdc 1234 1234 (5678) (5678) (9012) (9012) (3456) (5678) (5678)' |
sed -r ':a; s/(\([0-9]+\))(.*)\1 ?/\1\2/g; ta'
Using Perl:
$ echo "Abdc 1234 1234 (5678) (5678) (9012) (9012) (3456)" |
perl -ne '
my (@arr, %hash);
for (split) {
if (/^\(.*\)/) {
$hash{$_}++;
push @arr, $_ if $hash{$_} == 1;
}
else {
push @arr, $_;
}
}
print join " ", @arr, "\n";
'
That works with multi-line input and any number of occurrences of repeated parenthesized items.
This might work for you (GNU sed):
sed ':a;s/\(\(([0-9]\+) *\).*\)\2/\1/g;ta' file
awk -F"(" '{for(i in a)delete a[i];for(i=2;i<=NF;i++){if($i in a){$i="";}else{a[$i];$i="("$i}}print $0}' your_file
Tested below:
input:
> cat temp
Abdc 1234 1234 (5678) (5678) (9012) (9012) (3456)
1234 1234 (1234) (5678) (9012) (1234) (3456)
(5678) (6467) (6467) (9012) (5678)
Now the execution:
> awk -F"(" '{for(i in a)delete a[i];for(i=2;i<=NF;i++){if($i in a){$i="";}else{a[$i];$i="("$i}}print $0}' temp
Abdc 1234 1234 (5678) (9012) (3456)
1234 1234 (1234) (5678) (9012) (3456)
(5678) (6467) (9012) (5678)
>