Processing a text file with awk, sed and grep

My input file:
20110512075615 Constanta 1.0041 1013.41 9999.0 0 0.0 0
20110512075630 Constanta 1.0021 1013.45 9999.0 0 0.0 0
20110512075645 Constanta 1.0031 1013.47 9999.0 0 0.0 0
20110512075700 Constanta 1.0018 1013.47 9999.0 0 0.0 0
20110512075730 Constanta 1.0038 1013.48 9999.0 0 0.0 0
20110512075745 Constanta 1.0023 1013.48 9999.0 0 0.0 0
20110512075800 Constanta 9999.0000 1013.46 13.2 0 0.0 0
20110512075815 Constanta 1.0038 1013.45 13.2 0 0.0 0
20110512075830 Constanta 1.0040 1013.50 13.2 0 0.0 0
20110512075845 Constanta 1.0034 1013.50 13.2 0 0.0 0
20110512075900 Constanta 1.0050 1013.45 13.2 0 0.0 0
20110512075915 Constanta 1.0060 1013.48 13.2 0 0.0 0
20110512075930 Constanta 1.0056 1013.45 13.2 0 0.0 0
20110512080000 Constanta 1.0066 1013.50 13.2 0 0.0 0
20110512080015 Constanta 1.0067 1013.49 13.2 0 0.0 0
20110512080100 Constanta 1.0065 1013.48 13.2 0 0.0 0
20110512080115 Constanta 9999.0000 1013.51 13.2 0 0.0 0
20110512080130 Constanta 1.0065 1013.51 13.2 0 0.0 0
20110512080145 Constanta 1.0079 1013.49 13.2 0 0.0 0
20110512080200 Constanta 1.0072 1013.51 13.2 0 0.0 0
20110512080215 Constanta 1.0084 1013.51 13.2 0 0.0 0
My output file:
YY/MM/DD HH -Level- Atm.Prs -Tw-
201105120757 1.0018 1013.47 9999.0 0 0.0 0
201105120759 1.0050 1013.45 13.2 0 0.0 0
201105120800 9999.0000 1.0066 1013.50 13.2 0 0.0 0
201105120801 1.0065 1013.48 13.2 0 0.0 0
201105120802 9999.0000 1.0072 1013.51 13.2 0 0.0 0
My code:
#! /bin/bash
FILE="Constanta20110513.txt"
# 1) remove column two(='Constanta')
awk '{$2="";print}' $FILE | column -t > tmpfile
# 2) remove lines with '9999.0000'
cat tmpfile | sed -e '/9999.[0-9]/d' >> final.tmp
# 3) remove first three lines
awk 'NR>3' final.tmp >> myfile.tmp
# 4) count lines between '....00' and '....00':
#if >= 3, keep only the line with '...00' and delete the other lines
#if < 3, do the same, and put '9999' on column two
output=$(grep -n '00\s*$' myfile.tmp | sed 's/\s*$/ /')
array=($output $(cat myfile.tmp | wc -l))
for (( i=0; i<${#array[@]}-1; i++ )); do
    index1=$(echo "${array[$i]}" | grep -o '^[0-9]*')
    index2=$(echo "${array[$i+1]}" | grep -o '^[0-9]*')
    if [ $(( index2 - index1 )) -ge 3 ]; then
        echo $(echo "${array[$i]}" | grep -o '[0-9]*$') >> temp.tmp
    else
        echo $(echo "${array[$i]}" | grep -o '[0-9]*$') 9999.0000 >> temp.tmp
    fi
done
# 5) delete last two characters from first column(=00)
awk '{sub(/..$/,"",$1)} 1' temp.tmp >> output.tmp
# 6) insert header
echo 'YY/MM/DD HH -Level- Atm.Prs -Tw-' | cat - output.tmp >> output2.tmp
#save
mv output2.tmp $FILE
My problem is at step 4: it doesn't work, and the temporary file temp.tmp is not created.
I think the problem is here: grep -n '00\s*$' myfile.tmp | sed 's/\s*$/ /'.
Thank you very much in advance.

Here is #1 to #3 in one go:
awk '{$2="";sub(/  /," ")} !/9999.[0-9]/ && t++>2' $FILE
I'm not sure what you want to count in step #4; can you make it clearer?

I based #1-3 on Jotne's work and added a function to handle #4. The following should be put into an executable file (which I called awko) and run like awko Constanta20110513.txt:
#!/usr/bin/awk -f
BEGIN { print "YY/MM/DD HH -Level- Atm.Prs -Tw-" }
# absorb jotne's work for #1-3 more or less
{$2="";sub(/  /," ")}
/9999.0000/ || NR<=3 { next }
/^[0-9]{12}00/ { output_line() } # deal with the "00" lines
END { output_line() } # output the final "00" stored in last
function output_line() {
    if( last_nr != 0 ) {
        if( NR-last_nr < 3 ) {
            temp = $0 # save off the current line
            $0 = last # reset it to the last "00" line
            $2 = "9999.0000" # make $2 what you want
            print $0
            $0 = temp # restore $0 from temp
        }
        if( NR-last_nr >= 3 ) { print last }
    }
    $1 = substr( $1, 1, 12 ) # drop the "00" from $1
    last = $0; last_nr = NR; # store some variables
}
I get the following output from the input you specified:
YY/MM/DD HH -Level- Atm.Prs -Tw-
201105120757 1.0018 1013.47 9999.0 0 0.0 0
201105120759 1.0050 1013.45 13.2 0 0.0 0
201105120800 9999.0000 1013.50 13.2 0 0.0 0
201105120801 1.0065 1013.48 13.2 0 0.0 0
201105120802 9999.0000 1013.51 13.2 0 0.0 0

Related

Postgres suddenly says that "Relation does not exists" (it exists for sure)

I have a weird problem with Postgres on a production server (DigitalOcean).
Suddenly, my Django project started raising the error below. I haven't changed anything on the server for 2 months, so it is not caused by a code change.
relation "mainapp_price" does not exist
LINE 1: ...rom_3", "mainapp_price"."stelinka_12kg_pack" FROM "mainapp_p...
I've checked /var/log/postgres/... which says something similar:
2019-04-27 13:40:26 UTC [13288-11] postgres@brennholzdb ERROR: relation "mainapp_availability" does not exist at character 179
2019-04-27 13:40:26 UTC [13288-12] postgres@brennholzdb STATEMENT: SELECT "mainapp_availability"."id", "mainapp_availability"."dry_wood", "mainapp_availability"."wet_wood", "mainapp_availability"."briquettes", "mainapp_availability"."area" FROM "mainapp_availability" WHERE "mainapp_availability"."area" = 'ar' LIMIT 21
I don't have a clue where the problem is. As I said, everything is migrated and there are no new changes in the code.
Do you know what to do?
EDIT
The postgres process is using almost 100% CPU:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13277 postgres 20 0 385212 4916 2740 S 98.0 1.0 352:44.85 postgres
14096 django 20 0 40388 3524 2996 R 0.3 0.7 0:00.02 top
1 root 20 0 119992 5124 2996 S 0.0 1.0 0:03.54 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:02.45 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
6 root 20 0 0 0 0 S 0.0 0.0 0:01.25 kworker/u2:0
7 root 20 0 0 0 0 S 0.0 0.0 0:03.56 rcu_sched

replace character with increasing numbers per line

I have large matrix files consisting of only "0" and "a" in columns and I want to do what this does:
perl -pe 'BEGIN { our $i = 1; } s/a/($i++)/ge;' < FILE > NEW_FILE
but only increment once for each line instead of every instance on each line.
So if my first line in the file is:
0 0 a a a
The perl command gives me:
0 0 1 2 3
While I would want
0 0 1 1 1
and on the next line for instance 2 0 2 0 2 and so on...
This should be possible to do with awk, but using:
awk '{ i=1; gsub(/a/,(i+1));print}' tmp2
just gives me 0's and 2's for all lines...

Just increment before, not on every substitution:
awk '{i++; gsub(/a/,i)}1' file
This way, the variable is updated once per line (record), not once per substitution.
The same applies to the Perl script:
perl -pe 'BEGIN { our $i = 0; } $i++; s/a/$i/ge;' file
Test
$ cat a
0 0 a a a
2 3 a a a
$ awk '{i++; gsub(/a/,i)}1' a
0 0 1 1 1
2 3 2 2 2
$ perl -pe 'BEGIN { our $i = 0; } $i++; s/a/$i/ge;' a
0 0 1 1 1
2 3 2 2 2

You can simply replace every occurrence of a with the current line number
perl -pe 's/a/$./g' FILE > NEW_FILE

perl -pe'$i++;s/a/$i/g'
or, if you'd like to increment only on lines where a substitution occurs:
perl -pe'/a/&&$i++;s/a/$i/g'
In action:
$ cat a
0 0 a a a
1 2 0 0 0
2 3 a a a
$ perl -pe'$i++;s/a/$i/g' a
0 0 1 1 1
1 2 0 0 0
2 3 3 3 3
$ perl -pe'/a/&&$i++;s/a/$i/g' a
0 0 1 1 1
1 2 0 0 0
2 3 2 2 2

How to extract lines with a specific column using awk or sed

I have a file which has rows like this:
004662484 4 0 0 0 0
The second column is the number 4, and I want to use this number to expand the line into 4 lines like this:
004662484 0 0 0 0 0
004662484 1 0 0 0 0
004662484 2 0 0 0 0
004662484 3 0 0 0 0
How to do that using either awk or sed or both? Thanks!

{ reps = $2; for (i = 0; i < reps; i++) { $2 = i; print $0; } }
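For reference, the program above can be run as an awk one-liner. This is only a minimal sketch: the function name `expand_rows` is a hypothetical wrapper introduced here for illustration, and the sample row is the one from the question.

```shell
#!/bin/sh
# expand_rows (hypothetical helper name): reads rows from stdin and, for a
# row whose second column is N, prints N copies with column 2 set to 0..N-1.
expand_rows() {
    awk '{ reps = $2; for (i = 0; i < reps; i++) { $2 = i; print } }'
}

# Sample row taken from the question.
printf '004662484 4 0 0 0 0\n' | expand_rows
```

Assigning to `$2` makes awk rebuild the line with single spaces between fields, and a row whose second column is 0 produces no output at all.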

How to transpose a huge txt file with 1,743,680 columns and 2890 rows [duplicate]

I have a huge file of genetic markers for 2890 individuals. I would like to transpose this file. The format of my data is as follows: (I just showed 6 markers here)
ID rs4477212 kgp15297216 rs3131972 kgp6703048 kgp15557302 kgp12112772 .....
BV04976 0 0 1 0 0 0
BV76296 0 0 1 0 0 0
BV02803 0 0 0 0 0 0
BV09710 0 0 1 0 0 0
BV17599 0 0 0 0 0 0
BV29503 0 0 1 1 0 1
BV52203 0 0 0 0 0 0
BV61727 0 0 1 0 0 0
BV05952 0 0 0 0 0 0
In fact, I have 1,743,680 columns and 2890 rows in my text file. How can I transpose it?
I would like the output to look like this:
ID BV04976 BV76296 BV02803 BV09710 BV17599 BV29503 BV52203 BV61727 BV05952
rs4477212 0 0 0 0 0 0 0 0 0
kgp15297216 0 0 0 0 0 0 0 0 0
rs3131972 1 1 0 1 0 1 0 1 0
kgp6703048 0 0 0 0 0 1 0 0 0
kgp15557302 0 0 0 0 0 0 0 0 0
kgp12112772 0 0 0 0 0 1 0 0 0

I would make multiple passes over the file, perhaps 100, each pass getting 1743680/passes columns and writing them out (as rows) at the end of each pass.
Assemble the data into strings in an array, not an array of arrays, for lower memory usage and fewer passes.
Preallocating the space for each string at the beginning of each pass (e.g. $new_row[13] = ' ' x 6000; $new_row[13] = '';) might or might not help.
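The multi-pass idea can be sketched with awk instead of Perl. This is an illustration under stated assumptions, not the answerer's code: the function name `transpose_in_passes` and its two arguments (input file, columns per pass) are hypothetical, fields are assumed to be whitespace-separated, and every row is assumed to have the same number of columns.

```shell
#!/bin/sh
# transpose_in_passes (hypothetical name): transpose a whitespace-separated
# file by reading it several times, handling `chunk` columns per pass.
# Usage: transpose_in_passes INPUT CHUNK > OUTPUT
transpose_in_passes() {
    input=$1
    chunk=$2
    # Number of columns, taken from the first line only.
    ncols=$(awk 'NR == 1 { print NF; exit }' "$input")
    start=1
    while [ "$start" -le "$ncols" ]; do
        end=$((start + chunk - 1))
        [ "$end" -gt "$ncols" ] && end=$ncols
        # One pass: build one output row, as a single string, per column
        # in the range start..end, then print those rows at the end.
        awk -v s="$start" -v e="$end" '
            {
                for (i = s; i <= e; i++)
                    row[i] = (NR == 1 ? "" : row[i] " ") $i
            }
            END {
                for (i = s; i <= e; i++)
                    print row[i]
            }
        ' "$input"
        start=$((end + 1))
    done
}

# Small demonstration with a 3 x 4 file and 2 columns per pass.
demo=$(mktemp)
printf 'ID m1 m2 m3\nA 0 1 0\nB 1 1 0\n' > "$demo"
transpose_in_passes "$demo" 2
rm -f "$demo"
```

Each pass keeps only `chunk` strings in memory, so a smaller chunk lowers memory use at the cost of reading the input more times. With 1,743,680 columns and 100 passes, that is roughly 17,437 strings per pass.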

(See: An efficient way to transpose a file in Bash )
Have you tried
awk -f tr.awk input.txt > out.txt
where tr.awk is
{
    for (i=1; i<=NF; i++) a[NR,i]=$i
}
END {
    for (i=1; i<=NF; i++) {
        for (j=1; j<=NR; j++) {
            printf "%s", a[j,i]
            if (j<NR) printf "%s", OFS
        }
        printf "%s",ORS
    }
}
Probably your file is too big for the above procedure.
Then you could try splitting it up first. For example:
#! /bin/bash
numrows=2890
echo "Splitting file.."
split -d -a4 -l1 input.txt
arg=""
outfile="out.txt"
tempfile="temp.txt"
if [ -e $outfile ] ; then
    rm -i $outfile
fi
for (( i=0; i<$numrows; i++ )) ; do
    echo "Processing file: "$(expr $i + 1)"/"$numrows
    file=$(printf "x%04d\n" $i)
    tfile=${file}.tr
    cat $file | tr -s ' ' '\n' > $tfile
    rm $file
    if [ $i -gt 0 ] ; then
        paste -d' ' $outfile $tfile > $tempfile
        rm $outfile
        mv $tempfile $outfile
        rm $tfile
    else
        mv $tfile $outfile
    fi
done
Note that split will generate 2890 temporary files (!)

sed replace end of line

I'm trying to add a bunch of 0s at the end of a line. The line is identified by being followed by a line that starts with "expr1".
In Vim, what I do is:
s/\nexpr1/ 0 0 0 0 0 0\rexpr1/
and it works fine. I know that in Ubuntu \n is what is normally used to terminate the line, but whenever I use it in the replacement I get a ^@ symbol, so \r works fine for me. I thought I'd use the same expression with sed, but it hasn't worked. Here is what I wrote:
sed "s/\nexpr1/ 0 0 0 0 0 0\rexpr1/" infile > outfile

The end-of-line marker is $. Try this:
s/$/ 0 0 0 0 0 0/
Depending on your environment, you might need to escape the $.
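Note that `s/$/.../` appends to every line, and that sed normally holds one line at a time without its trailing newline, which is why `\n` in the question's pattern never matches. If only the lines directly followed by an "expr1" line should be padded, one option is to delay printing each line by one step. The following is a minimal sketch using awk rather than sed; the function name `pad_before_expr1` and the sample input are made up for illustration.

```shell
#!/bin/sh
# pad_before_expr1 (hypothetical name): append " 0 0 0 0 0 0" to every line
# that is immediately followed by a line starting with "expr1".
pad_before_expr1() {
    awk '
        NR > 1 {
            # The previous line is printed only now, once we know what follows it.
            if ($0 ~ /^expr1/) prev = prev " 0 0 0 0 0 0"
            print prev
        }
        { prev = $0 }
        END { if (NR > 0) print prev }
    '
}

# Made-up sample input: only the lines before "expr1 ..." get padded.
printf 'a 1 2\nexpr1 x\nb 3 4\nc 5 6\nexpr1 y\n' | pad_before_expr1
```

Typical use would be `pad_before_expr1 < infile > outfile`.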

awk '{$0=$0" 0 0 0 0 0 "}1' file > tmp && mv tmp file
ruby -i.bak -ne '$_=$_.chomp!+" 0 0 0 0 0\n";print' file

awk '$(NF + 1) = " 0 0 0 0 0 0"' infile > outfile
