Unable to perform multiple sed operations at once - sed

Hi, I need to perform multiple sed operations at once and then flush the output to a file.
I have a .dat file with the following data:
indicator.dat:
Air_Ind - A.Air_Ind Air_Ind - 0000 - 00- 00
Rpting_Ind - Case When Dstbr_Id Is Null Then 'N' Else 'Y' End Rpting_Ind - 0000 - 00 - 00
Latitude,Longitude - A.Store_Latitude Latitude,A.Store_Longitude Longitude - 0000- 00- 00
coalesce(Pm_Cig_Direct_Ind,'') - Coalesce(Direct_Acct_Ind ,'') - 0004 - 01- 01
coalesce(Pm_Mst_Direct_Ind,'') - Coalesce(Direct_Acct_Ind ,'') - 0004 - 02 - 02
coalesce(Pm_Snus_Direct_Ind,'') - Coalesce(Direct_Acct_Ind ,'') - 0004 - 01 - 02
coalesce(Pm_Snuf_Direct_Ind,'') - Coalesce(Direct_Acct_Ind ,'') - 0004 - 04- 02
coalesce(Jmc_Cgr_Direct_Ind,'') - Coalesce(Direct_Acct_Ind ,'') - 2000 - 02 - 01
coalesce(Usst_Mst_Direct_Ind,'') - Coalesce(Direct_Acct_Ind ,'') - 1070- 02- 02
coalesce(Usst_Snus_Direct_Ind,'') - Coalesce(Direct_Acct_Ind ,'') - 1070 - 01 - 02
coalesce(Usst_Snuf_Direct_Ind,'') - Coalesce(Direct_Acct_Ind ,'') - 1070 - 04 - 02
Now I am trying to replace the parameters defined in a file called indicator.sql and flush the output to a file
indicator_s.sql:
Select A.Location_Id,
param1
From Edw.Location A
--1Left Outer Join
--1(
--1 Select
--1 Location_Id,
--1 Direct_Acct_Ind
--1 From Edw.Location_Bcbc D
--1 Where Company_Cd = 'param2'
--1 And Prod_Type_Cd = 'param3'
--1 And Prod_Catg_Cd = 'param4'
--1 ) A
--1 On L.Location_Id = A.Location_Id
Inner Join
Mim.Mdm_Xref_Distributor D
On D.Src_Dstbr_Id=A.Location_Id
Where Sdw_Exclude_Ind='N' And Dstrb_Cd='Us'
The else block is never entered at any point:
#!/bin/sh
rm ./Source_tmp.sql
touch ./Source_tmp.sql
while read line
do
    MIM=`echo $line | cut -d " " -f1 `
    EDW=`echo $line | cut -d "-" -f2 `
    Company_Cd=`echo $line | cut -d "-" -f3 `
    Prod_Type_Cd=`echo $line | cut -d "-" -f4 `
    Prod_Catg_Cd=`echo $line | cut -d "-" -f5 `
    echo "Select top 10 * from (" >> ./Source_tmp.sql ;
    sed "s/Param1/$MIM/g" indicator.sql >> Source_tmp.sql;
    echo "minus">> Source_tmp.sql;
    if [ "$MIM"="Air_Ind " ] || [ "$MIM"="Rpting_Ind " ] || [ "$MIM"="Latitude,Longitude " ]
    then
        sed "s/param1/$EDW/g" indicator_s.sql >> Source_tmp.sql
    else
        sed -e "s/--1/' '/g" -e "s/param1/$EDW/g" -e "s/param2/$Company_Cd/g" -e "s/param3/$Prod_Type_Cd/g" -e "s/param4/$Prod_Catg_Cd/g" ./indicator_s.sql >> ./Source_tmp.sql
    fi
done <indicator.dat
The output should be such that the param1, param2, etc. values I defined are replaced with the values from the indicator.dat file, and in the else branch the commented (--1) lines are un-commented as well.
Kindly help me.

As far as I can tell, the sed command "is working", but it does not produce the expected output, I guess. To give an output example:
[...]
' ' Where Company_Cd = ' 0000 '
' ' And Prod_Type_Cd = ' 00 '
' ' And Prod_Catg_Cd = ' 00'
[...]
Obviously you have extra quotes at the start of the line, and extra spaces around your values.
Will this fix your issue?
#!/bin/bash
[...]
MIM=`echo -n $line | cut -d " " -f1 `
EDW=`echo -n $line | cut -d "-" -f2 `
Company_Cd=`echo -n $line | cut -d "-" -f3 `
Prod_Type_Cd=`echo -n $line | cut -d "-" -f4 `
Prod_Catg_Cd=`echo -n $line | cut -d "-" -f5 `
# Trim spaces, since you "cut" on "-", which keeps extra spaces around the values
shopt -s extglob
EDW=${EDW%%+([[:space:]])}; EDW=${EDW##+([[:space:]])};
Company_Cd=${Company_Cd%%+([[:space:]])}; Company_Cd=${Company_Cd##+([[:space:]])};
Prod_Type_Cd=${Prod_Type_Cd%%+([[:space:]])}; Prod_Type_Cd=${Prod_Type_Cd##+([[:space:]])};
Prod_Catg_Cd=${Prod_Catg_Cd%%+([[:space:]])}; Prod_Catg_Cd=${Prod_Catg_Cd##+([[:space:]])};
[...]
# Fix "sed" in your "else" clause by removing extra single quotes
sed -e "s/--1/ /g" -e "s/param1/$EDW/g" -e "s/param2/$Company_Cd/g" -e "s/param3/$Prod_Type_Cd/g" -e "s/param4/$Prod_Catg_Cd/g" ./indicator_s.sql >> ./Source_tmp.sql
Producing now the much more valid SQL:
[...]
Where Company_Cd = '0000'
And Prod_Type_Cd = '00'
And Prod_Catg_Cd = '00'
[...]
That being said, these are mostly hacks to fix (some of?) the various issues you might have in your script. The whole thing seems a little contrived, and fragile. For example, it will break if any replacement string contains a &. Here Be Dragons.
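One more thing worth checking, since you say the else block is never entered: in [ "$MIM"="Air_Ind " ] there are no spaces around =, so the shell sees a single non-empty word and the test always succeeds, which means the else branch can never run. A minimal demonstration:
MIM="Rpting_Ind"
# No spaces around = : a one-argument test of a non-empty string, always true
[ "$MIM"="Air_Ind" ] && echo "always true"
# With spaces, it is a real string comparison
[ "$MIM" = "Air_Ind" ] || echo "not equal, as expected"
Also note the trailing spaces in "Air_Ind " etc. come from cut; trim them (as above) before comparing.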

It's impossible to tell what you want the script to do, given that so far you've only posted a script that DOESN'T produce whatever output you want, and you haven't posted the output you DO want. But let's start with this, and you can update your question to show the expected output and clarify your requirements:
$ cat tst.awk
BEGIN { FS="-" }
NR==FNR { template = (template ? template ORS : "") $0; next }
{
    split($0,arr,/ /)
    MIM = arr[1]
    EDW = $1
    Company_Cd = $3
    Prod_Type_Cd = $4
    Prod_Catg_Cd = $5
    $0 = "Select top 10 * from (\n" template
    gsub(/Param1/,MIM "\nminus")
    gsub(/param1/,EDW)
    if ( MIM !~ /^Air_Ind|Rpting_Ind|Latitude,Longitude/ ) {
        gsub(/--1/," ")
        gsub(/param2/,Company_Cd)
        gsub(/param3/,Prod_Type_Cd)
        gsub(/param4/,Prod_Catg_Cd)
    }
    print
}
$ awk -f tst.awk indicator.sql indicator.dat
Select top 10 * from (
Select A.Id,
Air_Ind
minus
From Location A
--1Left Outer Join
--1(
--1 Select
--1 Location_Id,
--1 Direct_Acct_Ind
--1 From Location_Bcbc D
--1 Where Company_Cd = 'param2'
--1 And Prod_Type_Cd = 'param3'
--1 And Prod_Catg_Cd = 'param4'
--1 ) A
Select top 10 * from (
Select A.Id,
Rpting_Ind
minus
From Location A
--1Left Outer Join
--1(
--1 Select
--1 Location_Id,
--1 Direct_Acct_Ind
--1 From Location_Bcbc D
--1 Where Company_Cd = 'param2'
--1 And Prod_Type_Cd = 'param3'
--1 And Prod_Catg_Cd = 'param4'
--1 ) A
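As an aside, the NR==FNR pattern above is the standard awk idiom for slurping the entire first file argument into memory before processing the second; a minimal illustration with placeholder file names:
# NR==FNR holds only while reading the first file on the command line:
# remember its keys, then print the lines of the second file whose
# first field was seen in the first.
awk 'NR==FNR { seen[$1]; next } $1 in seen' keys.txt data.txt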

Related

Postgresql : regexp_replace to remove special characters

I need to remove all special characters in PostgreSQL, like these:
' " , . / \ | ] [ { } & * - % ^ ! @ #
I tried this:
SELECT regexp_replace('Test.010. " # $ %. تجربه', '[^\w\s\u0600-\u06FF]', ' ', 'g');
and the result is:
Test 010
The Arabic characters were removed!
I need to remove only the special characters and keep (or replace) the Arabic, English, and numeric characters.
You can use translate to convert those specific characters to spaces:
select translate('Test.010. " # $ %. تجربه', '''",./\|][{}&*-%^!@#', ' ');
translate
--------------------------
Test 010 $ تجربه
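If you would rather stay with regexp_replace, a negated POSIX character class may also work; this is an untested sketch, and it assumes a UTF-8 database whose locale classifies Arabic letters as alphanumeric:
# Keep letters, digits and whitespace; blank out everything else
psql -c "SELECT regexp_replace('Test.010. \" # \$ %. تجربه', '[^[:alnum:][:space:]]', ' ', 'g');"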

\echo is not working in heredoc passed to psql

Here is my sh file.
SCRIPT_DIR=`dirname $0`
export DATA_DIR=${SCRIPT_DIR}/data
export SQL_DIR=${SCRIPT_DIR}/sql
FILE_NAME=${DATA_DIR}/master_exec_alert_mails.dat
if [ ! -d $DATA_DIR ]
then
    mkdir $DATA_DIR
fi
cd $SCRIPT_DIR
psql postgresql://xxxxx:xx@192.168.1.116:5432/xx -v ON_ERROR_STOP=1 << EOF > /dev/null
\o MASTER_EXECUTIVE_EFFORT_FILE_NAME
select to_char(LOCALTIMESTAMP-INTERVAL '8 DAY','Mon dd, yyyy') || ' - ' || to_char(LOCALTIMESTAMP-INTERVAL '2 DAY','Mon dd, yyyy')
\echo 'Master Exec effort list:'
\i master_exec_effort.sql
EOF
But it is not printing the message 'Master Exec effort list:' in the master_exec_alert_mails.dat output file.
Can anyone explain why it is not printing?
Answer from Abelisto:
Probably it's because "\o [FILE]: send all query results to file or |pipe". Instead of \echo, try "\qecho [STRING]: write string to query output stream".
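A minimal sketch of the corrected heredoc, assuming the \o target was meant to be the FILE_NAME variable defined earlier (the placeholder connection string is kept as in the question); note the added semicolon, which psql needs before it will execute the buffered query:
psql postgresql://xxxxx:xx@192.168.1.116:5432/xx -v ON_ERROR_STOP=1 << EOF > /dev/null
\o ${FILE_NAME}
select to_char(LOCALTIMESTAMP-INTERVAL '8 DAY','Mon dd, yyyy') || ' - ' || to_char(LOCALTIMESTAMP-INTERVAL '2 DAY','Mon dd, yyyy');
\qecho 'Master Exec effort list:'
\i master_exec_effort.sql
EOF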

Using sed / awk to process file with stanza format

I have a file in stanza format; an example of the file is below:
id_1:
id=241
pgrp=staff
groups=staff
home=/home/id_1
shell=/usr/bin/ks
id_2:
id=242
pgrp=staff
groups=staff
home=/home/id_2
shell=/usr/bin/ks
How do I use sed or awk to process it and return only the id name, id, and groups on a single line in tab-delimited format? e.g.:
id_1 241 staff
id_2 242 staff
with awk:
BEGIN { FS="="}
$1 ~ /id_/ { printf("%s", $1) }
$1 ~ /id/ && $1 !~ /_/ { printf("\t%s", $2) }
$1 ~ /groups/ { printf("\t%s\n", $2) }
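To run it, save the rules above to a file (the names here are placeholders) and invoke awk with -f:
awk -f stanza.awk data.txt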
Here is an awk solution:
translate.awk
#!/usr/bin/awk -f
{
    if (match($1, /[^=]:[ ]*$/)) {
        id_ = $1
        sub(/:/, "", id_)
    }
    if (match($1, /id=/)) {
        split($1, p, "=")
        id = p[2]
    }
    if (match($1, /groups=/)) {
        split($1, p, "=")
        print id_, " ", id, " ", p[2]
    }
}
Execute it either by:
chmod +x translate.awk
./translate.awk data.txt
or
awk -f translate.awk data.txt
For completeness, here comes a shortened version:
#!/usr/bin/awk -f
$1 ~ /[^=]:[ ]*$/ {sub(/:/,"",$1); printf "%s ", $1; FS="="}
$1 ~ /id/ {printf "%s ", $2}
$1 ~ /groups/ {print $2}
sed 'N;N;N;N;N;y/=\n/ /' data.txt | awk -v OFS='\t' '{sub(/:$/,"",$1); print $1,$3,$7}'
Here is the one-liner approach by setting RS:
awk 'NR>1{print "id_"++i,$3,$7}' RS='id_[0-9]+:' FS='[=\n]' OFS='\t' file
id_1 241 staff
id_2 242 staff
Requires GNU awk and assumes the IDs are in increasing order starting at 1.
If the ordering of the ID's is arbitrary:
awk '!/shell/&&NR>1{gsub(/:/,"",$1);print "id_"$1,$3,$5}' RS='id_' FS='[=\n]' OFS='\t' file
id_1 241 staff
id_2 242 staff
awk -F"=" '/id_/{split($0,a,":");}/id=/{i=$2}/groups/{printf a[1]"\t"i"\t"$2"\n"}' your_file
tested below:
> cat temp
id_1:
id=241
pgrp=staff
groups=staff
home=/home/id_1
shell=/usr/bin/ks
id_2:
id=242
pgrp=staff
groups=staff
home=/home/id_2
shell=/usr/bin/ks
> awk -F"=" '/id_/{split($0,a,":");}/id=/{i=$2}/groups/{printf a[1]"\t"i"\t"$2"\n"}' temp
id_1 241 staff
id_2 242 staff
This might work for you (GNU sed):
sed -rn '/^[^ :]+:/{N;N;N;s/:.*id=(\S+).*groups=(\S+).*/\t\1\t\2/p}' file
Look for a line holding an id then get the next 3 lines and re-arrange the output.
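With the sample data above, this should print:
id_1	241	staff
id_2	242	staff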

grep and replace

I want to grep a string at its first occurrence ONLY in a file (file.dat) and replace it by reading from another file (output). As an example, I have a file called "output" that contains "AAA T 0001".
#!/bin/bash
procdir=`pwd`
cat output | while read lin1 lin2 lin3
do
srt2=$(echo $lin1 $lin2 $lin3 | awk '{print $1,$2,$3}')
grep -m 1 $lin1 $procdir/file.dat | xargs -r0 perl -pi -e 's/$lin1/$srt2/g'
done
Basically what I want is: whenever the string "AAA" is grepped from file.dat, at its first instance I want to replace the second and third columns next to "AAA" with "T 0001" while keeping the first column "AAA" as it is. The above script does not work: the $lin1 and $srt2 variables are not expanded inside the single-quoted 's/$lin1/$srt2/g'.
Example:
in my file.dat I have a row
AAA D ---- CITY COUNTRY
What I want is :
AAA T 0001 CITY COUNTRY
Any comments are greatly appreciated.
If you have an output file like this:
$ cat output
AAA T 0001
and your file.dat contains information like:
$ cat file.dat
AAA D ---- CITY COUNTRY
BBB C ---- CITY COUNTRY
AAA D ---- CITY COUNTRY
You can try something like this with awk:
$ awk '
NR==FNR {
    a[$1] = $0
    next
}
$1 in a {
    printf "%s ", a[$1]
    delete a[$1]
    for (i=4; i<=NF; i++) {
        printf "%s ", $i
    }
    print ""
    next
}1' output file.dat
AAA T 0001 CITY COUNTRY
BBB C ---- CITY COUNTRY
AAA D ---- CITY COUNTRY
Say you place the string for which to search in $s and the string with which to replace in $r, wouldn't the following do?
perl -i -pe'
    BEGIN { ($s,$r)=splice(@ARGV,0,2) }
    $done ||= s/\Q$s/$r/;
' "$s" "$r" file.dat
(Replaces the first instance if present)
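For example, with hypothetical values that match the sample row (the search string has to include the columns being replaced):
s='AAA D ----'   # the text as it currently appears in file.dat
r='AAA T 0001'   # replacement that keeps the first column intact
perl -i -pe'BEGIN { ($s,$r)=splice(@ARGV,0,2) } $done ||= s/\Q$s/$r/;' "$s" "$r" file.dat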
This will only change the first match in the file:
#!/bin/bash
procdir=`pwd`
while read line; do
    set $line
    sed '0,/'"$1"'/s/\([^ ]* \)\([^ ]* [^ ]*\)/\1'"$2 $3"'/' $procdir/file.dat
done < output
To change all matching lines:
sed '/'"$1"'/s/\([^ ]* \)\([^ ]* [^ ]*\)/\1'"$2 $3"'/' $procdir/file.dat
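Note that the 0,/regexp/ address form is a GNU sed extension, and the commands above only print to stdout; to actually modify file.dat in place, add -i (GNU sed; BSD sed wants -i ''), e.g.:
# GNU sed: substitute only up to and including the first line matching $1,
# editing the file in place
sed -i '0,/'"$1"'/s/\([^ ]* \)\([^ ]* [^ ]*\)/\1'"$2 $3"'/' "$procdir/file.dat"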

Joining two files based on two fields

I posted a question a week ago, and the answer was simply to use join:
join <(sort file1) <(sort file2) >output
to join files on a common field, usually the first one.
I have the following two files:
genes.txt
ENSG001 ENSG002
ENSG002 ENSG001
ENSG003 ENSG004
features.txt
ENSG001 400
ENSG002 350
ENSG003 210
ENSG004 100
I need to join these two files to be like this:
output.txt
ENSG001 400 ENSG002 350
ENSG002 350 ENSG001 400
ENSG003 210 ENSG004 100
I know the answer involves the join command, but I can't figure out how to join based on two fields. I tried
join -j 1 <(sort genes.txt) <(sort features.txt) >attempt1.txt
but the result looks like this:
attempt1.txt
ENSG001 ENSG002 400
ENSG002 ENSG001 350
ENSG003 ENSG004 210
I then tried
join -j 2 <(sort -k 2 genes.txt) <(sort -k 2 features.txt) >attempt2.txt
attempt2.txt is empty
Does join have the ability to join two files based on two fields? If not, how can I do it?
my %features;
open my $fd, '<', 'features.txt' or die $!;
while (<$fd>) {
    my ($k, $v) = split;
    $features{$k} = $v;
}
close $fd or die $!;
open $fd, '<', 'genes.txt' or die $!;
while (<$fd>) {
    s/(\w+)/$1 $features{$1}/g;
    print;
}
close $fd or die $!;
Thank you all, guys. I managed to answer it by working around the problem.
First I joined the files normally; then I swapped the first and second fields, joined the modified output with features.txt a second time, and finally swapped the fields back again.
join <(sort genes.txt) <(sort features.txt) >tmp
cat tmp | awk '{ print $2, $1, $3 }' >tmp2
join <(sort tmp2) <(sort features.txt) >tmp3
cat tmp3 | awk '{ print $2, $3, $1, $4 }' >output.txt
To the best of my knowledge, join does NOT support this. See the join manpage.
However, you can accomplish this in 2 ways:
Turn the first space/tab in the file into some character you will never see in the file (here, #), then use join as before, which will treat the first 2 fields as 1 field:
perl -pi -e 's/^(\S+)\s+/$1#/' file1
perl -pi -e 's/^(\S+)\s+/$1#/' file2
join <(sort file1) <(sort file2) >output
tr "#" " " < output > output.final
Do it in Perl. You can do
the blunt approach (perreal's answer: slurp in 2 files at once); this takes a lot of memory if both files are large
The more memory conserving approach (cdtits's answer: slurp in a smaller file, store in a hash, then apply the lookups to line-by-line read of second file)
For really enormous files, use a linear approach:
sort both files, then read one line from each; if the keys match, print the match; if not, skip one line in the file whose ID is smaller (see the sketch below).
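A rough bash sketch of that linear merge, with placeholder file names; it assumes both inputs are already sorted under the same collation (e.g. via LC_ALL=C sort) and that the key is the first column:
#!/bin/bash
exec 3<sorted_a.txt 4<sorted_b.txt
read -r ka va <&3
read -r kb vb <&4
while [[ -n $ka && -n $kb ]]; do
    if [[ $ka == "$kb" ]]; then
        echo "$ka $va $vb"       # keys match: print the joined line
        read -r ka va <&3
    elif [[ $ka < "$kb" ]]; then
        read -r ka va <&3        # a's key is smaller: advance file a
    else
        read -r kb vb <&4        # b's key is smaller: advance file b
    fi
done
exec 3<&- 4<&-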
Assuming the "ENST" entries in features.txt should read "ENSG" (as in the sample above), here is an awk solution that works well on the given example:
awk 'BEGIN {while(getline <"features.txt") f[$1]=$2} {print $1,f[$1],$2,f[$2]}' < genes.txt
I can explain in detail if you need to.
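In short, the BEGIN loop preloads features.txt into the array f before genes.txt is read, so both IDs on each line can be annotated; a commented, spelled-out equivalent:
awk '
BEGIN {
    # read features.txt up front: f[id] = value
    while ((getline line < "features.txt") > 0) {
        split(line, t, " ")
        f[t[1]] = t[2]
    }
}
{ print $1, f[$1], $2, f[$2] }   # annotate both gene IDs on the line
' genes.txt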
Using perl:
use strict;
use warnings;
open GIN, "<genes.txt" or die("genes");
open FIN, "<features.txt" or die("features");
my %relations;
my %values;
while (<GIN>) {
    my ($r1, $r2) = split;
    $relations{$r1} = $r2;
}
while (<FIN>) {
    my ($k, $v) = split;
    $values{$k} = $v;
}
for my $r1 (sort keys %relations) {
    my $r2 = $relations{$r1};
    print "$r1 $values{$r1} $r2 $values{$r2}\n";
}
close FIN; close GIN;
Your approach is generally right. It should be achievable by something like
join -o '1.1 2.2 1.2 1.3' <(
join -o '1.1 1.2 2.2' -1 2 <(sort -k 2 genes.txt) <(sort features.txt) |
sort
) <(sort features.txt)
If I place ENSG004 instead of ENST004 in features.txt, I get exactly what you are looking for:
$ join -o '1.1 2.2 1.2 1.3' <(
join -o '1.1 1.2 2.2' -1 2 <(sort -k 2 genes.txt) <(sort features.txt) |
sort
) <(sort features.txt)
ENSG001 400 ENSG002 350
ENSG002 350 ENSG001 400
ENSG003 210 ENSG004 100
There is a less verbose version, but it is harder to keep track of the fields:
join -o '1.2 2.2 1.1 1.3' -1 2 <(
join -1 2 <(sort -k 2 genes.txt) <(sort features.txt) |
sort -k 2
) <(sort features.txt)
If you are going to process really big data, this should work effectively up to tens of GB (and should also beat most RDBMSs if features.txt and genes.txt are comparable in size):
TMP=`mktemp`
sort features.txt > "$TMP"
sort -k 2 genes.txt | join -o '1.1 1.2 2.2' -1 2 - "$TMP" | sort |
join -o '1.1 2.2 1.2 1.3' - "$TMP"
rm "$TMP"