How to remove lines which contain missing values

I have a file with 46 columns (4+42) and 52 million rows like:
chr1 rs423246 102 120543 0 2 2 1 1 0 . . . -1 2 2 0 0 . . . . . 2 1 1 -1 -1
chr1 rs245622 104 134506 2 2 2 1 0 0 0 2 2 2 -1 -1 . . . 2 2 1 1 1 1 1 1 . 2
chr1 rs267845 105 124564 . . . . . . . . . . . . . . . . . . . . . . . . . .
chr1 rs234579 106 125642 2 2 2 1 0 0 0 -1 -1 -1 1 0 0 2 1 0 . . . 2 . . 2 1 0
I would like to remove only the lines in which all 42 data columns are missing.
The missing value is "." (e.g. row 3 in the example above should be removed).
How can I remove these lines with Unix tools such as awk, sed or something else?
Thanks for any help and advice.

grep -Ev '\. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \.' yourfile

Not the most readable, but hey, it's Perl:
perl -ane 'print unless q|.| x 42 eq join q||, @F[4..$#F]' infile

sed '/( .){26}/d' filename
EDIT:
Correction:
sed '/\( \.\)\{42\}/d' filename
or for a variable number of columns after the first 4:
sed '/^\([^ ]* \)\{4\}\(\. \)*\.$/d' filename

This might work for you (GNU sed):
sed -r '/(\.\s*){42}$/d' file
or
sed 's/\./&/42;T;d' file
N.B. the most efficient is probably the first solution.

Some awk versions:
awk '{a=$0} gsub(/\./,x)!=42 {print a}' file
This prints all lines that do not contain exactly 42 dots, using gsub to count them.
awk -F\. 'NF!=43' file
This counts the number of fields using . as the separator (42 dots split a line into 43 fields, hence 43 and not 42).
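The dot-counting answers above rely on "." appearing only as the missing-value marker, with no decimal values and no dots in the first four columns. A minimal sketch that checks the data columns field by field instead; the function name drop_all_missing is made up for illustration, and it assumes whitespace-separated fields with exactly 4 leading annotation columns:

```shell
# Hypothetical helper: drop rows whose data columns (5..NF) are all ".".
# Assumes whitespace-separated fields and 4 leading annotation columns.
drop_all_missing() {
  awk '{
    for (i = 5; i <= NF; i++)
      if ($i != ".") { print; next }   # first real value found: keep the row
  }' "$@"
}
```

Usage would be something like drop_all_missing yourfile > filtered. Rows with fewer than five fields are dropped too, since they have no data columns at all.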

Related

How to print lines 3 through 6 using sed?

1 ajar 45000
2 Sunil 25000
3 varoom 50000
4 Amit 47000
5 tanru 15000
6 Deepak 23000
7 Sunil 13000
8 sattvic 80000
I did it using awk, but I want to do it with a sed command:
$ awk 'NR==3, NR==6 {print NR,$0}' employee.txt

sed -n '3,6p' employee.txt
-n tells sed to not print each line;
3,6 is an "address", it tells sed to only apply the following command to the given range of lines;
p tells sed to print the line.
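On a large file it can help to stop reading once the last wanted line has been printed. A small sketch of that idea; print_range is a hypothetical helper name, and the only change from the command above is the extra q command:

```shell
# Hypothetical helper: print lines FIRST..LAST, then quit instead of
# scanning the rest of the input.
print_range() {
  first=$1 last=$2
  shift 2
  sed -n "${first},${last}p;${last}q" "$@"
}
```

For the example this would be called as print_range 3 6 employee.txt.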

Why does sed (insert line) output spaces between each character?

I have split a larger data file into individual 2-column files for each field. This results in something like this:
0.00 3.02211e+07
1.00 3.02211e+07
2.00 3.02211e+07
3.00 3.02211e+07
4.00 3.02211e+07
5.00 3.01295e+07
6.00 3.00608e+07
7.00 2.99768e+07
When I try to add a row via sed,
sed -i '1i pressure-prof' myfile.txt
the output has a space character between each character (including existing spaces). If I look in Notepad++, the extra spaces appear as the ASCII "NULL". In the terminal it looks like this:
pressure-prof
0 . 0 0 3 . 0 2 2 1 1 e + 0 7
1 . 0 0 3 . 0 2 2 1 1 e + 0 7
2 . 0 0 3 . 0 2 2 1 1 e + 0 7
3 . 0 0 3 . 0 2 2 1 1 e + 0 7
4 . 0 0 3 . 0 2 2 1 1 e + 0 7
5 . 0 0 3 . 0 1 2 9 5 e + 0 7
6 . 0 0 3 . 0 0 6 0 8 e + 0 7
7 . 0 0 2 . 9 9 7 6 8 e + 0 7
This is on Windows, and I think sed is being provided by cygwin or msys2. I don't know if that has anything to do with the output format issues.
Yes, I can resort to opening up files in a text editor and just adding that way. I would like to be able to utilize sed in the future though.
Thanks for any thoughts and assistance.

cat myfile.txt | tr -d ' ' | sed 's/./0 /4' | sed '1s/0 //' > mf2 && mv mf2 myfile.txt
Run that after you've finished adding your rows. Using tr initially wipes all the spaces, and then sed counts to the fourth character and re-adds a space.
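A NUL byte after every character is what UTF-16 text looks like when it is read one byte at a time, so one plausible cause (an assumption, the question does not confirm it) is that the split files were written as UTF-16 and sed then inserted a plain ASCII line in front. Under that assumption, a sketch that converts to UTF-8 first and adds the header in the same step; add_header_utf16 is a made-up name, and it assumes iconv is installed and the file starts with a byte-order mark:

```shell
# Hypothetical helper: convert a UTF-16 file to UTF-8 and prepend a header line.
# Assumes iconv is available and the input carries a byte-order mark.
add_header_utf16() {
  header=$1 infile=$2 outfile=$3
  {
    printf '%s\n' "$header"
    iconv -f UTF-16 -t UTF-8 "$infile"
  } > "$outfile"
}
```

For example: add_header_utf16 pressure-prof myfile.txt myfile.utf8.txt. Writing to a separate output file keeps the original untouched in case the encoding guess is wrong.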

Data manipulation using Perl script

Input:
Col1 col2 col3 col4
aaa 15 23 A
bbb 7 5 B
ccc 43 10 C
Expected output
aaa 15 16
bbb 7 8
ccc 43 44
I know how to do this with awk, but I need to do it in Perl. I tried using an array in Perl like
push(@output_array, $temp_array[0] . "\t" . $temp_array[1] . "\n");
I don't know how to add 1 to col2 and output the result as col3. Can anybody help me out?

In a Perl one-liner:
perl -lane 'print join("\t", @F[0,1], $F[1] + 1)' file.txt
If you want to truncate a header row:
perl -lane 'print join("\t", @F[0,1], $. == 1 ? $F[2] : $F[1] + 1)' file.txt
If you want to completely remove a header row:
perl -lane 'print join("\t", @F[0,1], $F[1] + 1) if $. > 1' file.txt

push(@output_array, $temp_array[0] . "\t" , $temp_array[1] . "\t" , $temp_array[1] + 1 . "\n");

Convert Unix `cal` output to latex table code: one-liner solution?

I have been struggling to achieve the following:
Convert Unix cal output to latex table code, using a short and sweet one-liner (or few-liner).
E.g cal -h 02 2012 | $magicline should yield
Mo &Tu &We &Th &Fr \\
& & 1 & 2 & 3 \\
6 & 7 & 8 & 9 &10 \\
13 &14 &15 &16 &17 \\
20 &21 &22 &23 &24 \\
27 &28 & & & \\
The only reasonable solution I could come up with so far was
cal -h | sed -r -e '1d' -e \
's/^(..)?(...)?(...)?(...)?(...)?(...)?(...)?$/\2\t\&\3\t\&\4\t\&\5\t\&\6\t\\\\/'
... and I really tried hard. The nice thing about it is that it's uncomplicated and easy to understand; the bad thing is that it's inflexible (it couldn't cope with a week of 8 days) and a little verbose. I'm looking for alternative solutions to learn from ;-)
EDIT: Found another one that seems acceptable
cal -h | tail -n +2 |
perl -ne 'chomp;
$,="\t&";
$\="\t\\\\\n";
$line=$_;
print map {substr($line,$_*3,3)} (1..5)'
EDIT: Nice one:
cal -h | perl \
-F'(.{1,3})' -ane \
'BEGIN{$,="\t&";$\="\t\\\\\n"}
next if $.==1;
print @F[3,5,7,9,11]'

Tested on OS-X:
cal 02 2012 |grep . |tail +2 |perl -F'/(.{3})/' -ane \
'chomp(@F=grep $_,@F); $m=$#F if !$m; printf "%s"."\t&%s"x$m."\t\\\\\n", @F;'
Where cal output has 3-character columns; {3} could be changed to match your cal output.

Using the GNU version of awk:
This is my output of cal with an English LANG.
Command:
LANG=en_US cal
Output:
   February 2012
Su Mo Tu We Th Fr Sa
          1  2  3  4
 5  6  7  8  9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29
The awk one-line:
LANG=en_US cal | awk '
    BEGIN {
        FIELDWIDTHS = "3 3 3 3 3 3 3";
        OFS = "&";
    }
    FNR == 1 || $0 ~ /^\s*$/ { next }
    {
        for (i=2; i<=6; i++) {
            printf "%-3s%2s", $i, i < 6 ? OFS : "\\\\";
        }
        printf "\n";
    }'
Result:
Mo &Tu &We &Th &Fr \\
& & 1 & 2 & 3 \\
6 & 7 & 8 & 9 &10 \\
13 &14 &15 &16 &17 \\
20 &21 &22 &23 &24 \\
27 &28 &29 & & \\

cal 02 2012|perl -lnE'$.==1||eof||do{$,="\t&";$\="\t\\\\\n";$l=$_;print map{substr($l,$_*3,3)}(1..5)}'
my new favorite:
cal 02 2012|perl -F'(.{1,3})' -anE'BEGIN{$,="\t&";$\="\t\\\\\n"}$.==1||eof||do{$i//=@F;print@F[map{$_*2-1}(1..$i/2)]}'

This might work for you:
cal | sed '1d;2{h;s/./ /g;x};/^\s*$/b;G;s/\n/ /;s/^...\(.\{15\}\).*/\1/;s/.../ &\t\&/g;s/\&$/\\\\/'

This works for my implementation of cal, which uses four-character columns and has an initial title line showing the month and year
cal | perl -pe "next if $.==1;s/..../$&&/g;s/&$/\\\\/"
It looks as though yours may have eight-character columns and has no title line, in which case
cal | perl -pe "s/.{8}/$&&/g;s/&$/\\\\/"
should do the trick, but be prepared to tweak it.

cal -h 02 2012| cut -c4-17 | sed -r 's/(..)\s/\0\t\&/g' | sed 's/$/\t\\\\/' | head -n-1 | tail -n +2
This will produce:
Mo &Tu &We &Th &Fr \\
& & 1 & 2 & 3 \\
6 & 7 & 8 & 9 &10 \\
13 &14 &15 &16 &17 \\
20 &21 &22 &23 &24 \\
27 &28 &29 & & \\
You can easily replace \t with number of spaces you wish
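Several of the answers above depend on GNU extensions (FIELDWIDTHS, sed -r) or on Perl. As a rough alternative, here is a sketch in plain awk that slices the fixed-width columns with substr. It assumes a cal layout like the LANG=en_US output shown earlier: a title on the first line, Sunday first, and 3-character day columns. The name cal_to_latex is made up for illustration.

```shell
# Hypothetical helper: turn cal output on stdin into LaTeX table rows (Mo..Fr).
# Assumes a title line first, Sunday-first weeks, and 3-character day columns.
cal_to_latex() {
  awk '
    NR == 1 || /^[ \t]*$/ { next }           # skip the title and blank lines
    {
      row = ""
      for (i = 1; i <= 5; i++) {
        cell = substr($0, 3 * i + 1, 2)      # Mo..Fr start at columns 4, 7, 10, 13, 16
        row = row sprintf("%2s", cell) (i < 5 ? " & " : " \\\\")
      }
      print row
    }'
}
```

Used as cal 02 2012 | cal_to_latex, the weekday header comes out as Mo & Tu & We & Th & Fr \\ and short or empty cells are padded to two characters. If your cal prints wider columns, the 3 and the 2 in the substr call need adjusting.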

SED: How to remove every 10 lines in a file (thin or subsample the file)

I have this so far:
sed -n '0,10p' yourfile > newfile
But it is not working, just outputs a blank file :(

Your question is ambiguous, so here is every permutation I can think of:
Print only the first 10 lines
head -n10 yourfile > newfile
Skip the first 10 lines
tail -n +11 yourfile > newfile
Print every 10th line
awk '!(NR%10)' yourfile > newfile
Delete every 10th line
awk 'NR%10' yourfile > newfile

(Since an ambiguous question can only have an ambiguous answer...)
To print every tenth line (GNU sed):
$ seq 1 100 | sed -n '0~10p'
10
20
30
40
...
100
Alternatively (GNU sed):
$ seq 1 100 | sed '0~10!d'
10
20
30
40
...
100
To delete every tenth line (GNU sed):
$ seq 1 100 | sed '0~10d'
1
...
9
11
...
19
21
...
29
31
...
39
41
...
To print the first ten lines (POSIX):
$ seq 1 100 | sed '11,$d'
1
2
3
4
5
6
7
8
9
10
To delete the first ten lines (POSIX):
$ seq 1 100 | sed '1,10d'
11
12
13
14
...
100

python -c "import sys;sys.stdout.write(''.join(line for i, line in enumerate(open('yourfile')) if i%10 == 0 ))" >newfile
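It is longer, but it uses a single language rather than different syntax and parameters for each thing one tries to do. Note that it keeps lines 1, 11, 21, ... (the first line of every ten), because enumerate counts from 0.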

With non-GNU sed, to print every 10th line use
sed -n '10,${p;n;n;n;n;n;n;n;n;n;}'
(GNU: sed -n '0~10p')
and to delete every 10th line use
sed 'n;n;n;n;n;n;n;n;n;d;'
(GNU: sed '0~10d')
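As for why the original attempt produced a blank file: as far as I know, GNU sed only accepts line address 0 in the form 0,/regex/, so '0,10p' is rejected with an error while the redirect still creates an empty newfile. If the step should be adjustable, a small awk-based sketch works with any POSIX awk; keep_every_nth and drop_every_nth are hypothetical names:

```shell
# Hypothetical helpers: the first argument is the step n; any remaining
# arguments are files (standard input is read if none are given).
keep_every_nth() {   # print lines n, 2n, 3n, ...
  n=$1; shift
  awk -v n="$n" 'NR % n == 0' "$@"
}
drop_every_nth() {   # print everything except lines n, 2n, 3n, ...
  n=$1; shift
  awk -v n="$n" 'NR % n != 0' "$@"
}
```

For example, keep_every_nth 10 yourfile > newfile thins the file to every tenth line, and drop_every_nth 10 yourfile > newfile removes every tenth line.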
