Input:
Col1 col2 col3 col4
aaa 15 23 A
bbb 7 5 B
ccc 43 10 C
Expected output
aaa 15 16
bbb 7 8
ccc 43 44
I know how to get this using awk, but I need to do it in Perl. I tried using an array in Perl like
push(@output_array, $temp_array[0] . "\t" . $temp_array[1] . "\n");
I don't know how to add 1 to the col2 and make it as col3. Can anybody help me out?
In a Perl one-liner:
perl -lane 'print join("\t", @F[0,1], $F[1] + 1)' file.txt
If you want to truncate a header row:
perl -lane 'print join("\t", @F[0,1], $. == 1 ? $F[2] : $F[1] + 1)' file.txt
If you want to completely remove a header row:
perl -lane 'print join("\t", @F[0,1], $F[1] + 1) if $. > 1' file.txt
push(@output_array, $temp_array[0] . "\t" . $temp_array[1] . "\t" . ($temp_array[1] + 1) . "\n");
Related
I am generating 20 numbers and then shuffling them:
perl -e 'foreach(1..20){print ",$_ "}'
| perl -MList::Util=shuffle -F',' -lane 'print shuffle @F'
and the output is:
19 15 11 9 8 13 18 4 2 7 5 20 10 14 3 16 1 17 6 12
Now I want the output to look something like this:
19 15 11 9
8 13 18 4
2 7 5 20
...
Any help will be appreciated
Doing that in several steps on the command line is ... strange. You can just do it in one program.
use strict;
use warnings;
use List::Util 'shuffle';
my $count = 1;
foreach my $i ( shuffle 1 .. 20) {
print "$i ";
print "\n\n" unless $count++ % 4;
}
This shuffles the list of 1 to 20 directly and then prints each item, adding two line breaks after every fourth one. The % is the modulo operator, which returns the remainder of a division; whenever $count is divisible by 4, it returns 0 and the print kicks in. On the command line it would look like this:
$ perl -MList::Util=shuffle -e 'for (shuffle 1..20) { print "$_ "; print "\n\n" unless ++$c % 4 }'
Here's the output:
11 20 8 17
10 18 19 6
1 14 7 5
13 16 4 3
9 2 15 12
You could also use a splice call to chop the result of the shuffle list up as you want and print it that way if you didn't want to code an explicit counter. Something like this:
perl -MList::Util=shuffle -e '@list = shuffle(1..20); while (@ret_line = splice(@list, 0, 4)) { print "@ret_line\n\n" }'
I'd put the numbers into an array and use splice to remove them in blocks of four:
use strict;
use warnings 'all';
use List::Util 'shuffle';
my @nums = shuffle 1 .. 20;
print join(" ", splice @nums, 0, 4), "\n\n" while @nums;
File1.txt
123 321 231
234 432 342
345 543 453
file2.txt
abc bca cba
def efd fed
ghi hig ihg
jkl klj lkj
mno nom onm
pqr qrp rqp
I want output file like
Outfile.txt
123 321 231 abc bca cba
234 432 342 def efd fed
345 543 453 ghi hig ihg
jkl klj lkj
mno nom onm
pqr qrp rqp
Most simply:
sed 's/$/ /' file1 | paste -d '\0' - file2
This appends a space to the end of each line in file1 and pastes the result together with file2 without a delimiter ('\0' is the POSIX spelling for an empty delimiter list; it does not mean a NUL byte).
Alternatively, if you know that file2 is longer than file1,
awk 'NR == FNR { line1[NR] = $0 " "; next } { print line1[FNR] $0 }' file1 file2
or if you don't know it,
awk 'NR == FNR { n = NR; line1[n] = $0 " "; next } { print line1[FNR] $0 } END { for(i = FNR + 1; i <= n; ++i) print line1[i]; }' file1 file2
also works.
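To try the paste variant on the sample data (the two files are recreated inline; '\0' is the portable spelling for an empty paste delimiter):

```shell
# Recreate the sample files from the question.
printf '123 321 231\n234 432 342\n345 543 453\n' > file1
printf 'abc bca cba\ndef efd fed\nghi hig ihg\njkl klj lkj\nmno nom onm\npqr qrp rqp\n' > file2

# Pad file1 lines with a trailing space, then paste with an empty delimiter.
# Lines of file2 beyond the end of file1 come through unchanged.
sed 's/$/ /' file1 | paste -d '\0' - file2
```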
I have a file with the following content:
CLASS
1001
CATEGORY
11 12 13 15
16 17
CLASS
3101
CATEGORY
900 901 902 904 905 907
908 909
910 912 913
CLASS
8000
CATEGORY
400 401 402 403
and I'd like to reformat it using perl or awk to get the following result:
1001 11&12&13&15&16&17
3101 900&901&902&904&905&907&908&909&910&912&913
8000 400&401&402&403
Your help would be appreciated. I used to do this with Excel VBA, but this time I'd like to keep it simple using perl or awk. Thanks in advance. :)
perl -lne'
BEGIN{ $/ ="CLASS"; $" ="&" }
($x, @F) = /\d+/g or next;
print "$x @F"
' file
output
1001 11&12&13&15&16&17
3101 900&901&902&904&905&907&908&909&910&912&913
8000 400&401&402&403
Another awk version (the sep variable inserts "&" between continuation lines of the same CATEGORY block, which would otherwise run together):
awk '/CLASS/ {if (NR>1) print a; c=1; f=0; next} c {a=$0 " "; c=0} /CATEGORY/ {f=1; sep=""; next} f {gsub(/ /,"\\&"); a=a sep $0; sep="&"} END {print a}' file
1001 11&12&13&15&16&17
3101 900&901&902&904&905&907&908&909&910&912&913
8000 400&401&402&403
I have a file with 46 columns (4+42) and 52 million rows like:
chr1 rs423246 102 120543 0 2 2 1 1 0 . . . -1 2 2 0 0 . . . . . 2 1 1 -1 -1
chr1 rs245622 104 134506 2 2 2 1 0 0 0 2 2 2 -1 -1 . . . 2 2 1 1 1 1 1 1 . 2
chr1 rs267845 105 124564 . . . . . . . . . . . . . . . . . . . . . . . . . .
chr1 rs234579 106 125642 2 2 2 1 0 0 0 -1 -1 -1 1 0 0 2 1 0 . . . 2 . . 2 1 0
I would like to remove only the lines which have a missing value in all 42 data columns.
My missing value is "." (e.g. row 3 in the example above should be removed).
How can I remove these lines using Unix commands such as awk or sed?
Thanks for any help and advice.
grep -Ev '\. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \. \.' yourfile
Not the most readable, but hey, it's Perl:
perl -ane 'print unless q|.| x 42 eq join q||, @F[4..$#F]' infile
sed '/( .){26}/d' filename
EDIT:
Correction:
sed '/\( \.\)\{42\}/d' filename
or for a variable number of columns after the first 4:
sed '/^\([^ ]* \)\{4\}\(\. \)*\.$/d' filename
This might work for you (GNU sed):
sed -r '/(\.\s*){42}$/d' file
or
sed 's/\./&/42;T;d' file
N.B. the most efficient is probably the first solution.
Some awk versions
awk '{a=$0} gsub(/\./,x)!=42 {print a}' file
This prints all lines that do not contain exactly 42 dots, using gsub to count them.
awk -F\. NF!=43 file
This counts the number of fields using . as the separator (that's why it is 43 and not 42).
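Since the sample rows in the question look truncated, a quick way to check any of these solutions is to generate a file with a full set of 42 genotype columns; here the Perl variant is exercised (the column values and the file name geno.txt are demo assumptions):

```shell
# Build a 4+42-column sample: the middle row has "." in all 42 genotype
# columns and should be the only one removed.
perl -e '
    print "chr1 rs1 101 1000 ", join(" ", (1) x 42), "\n";
    print "chr1 rs2 102 1001 ", join(" ", (".") x 42), "\n";
    print "chr1 rs3 103 1002 ", join(" ", (0) x 42), "\n";
' > geno.txt

# Keep lines unless columns 5..46 are all dots.
perl -ane 'print unless q|.| x 42 eq join q||, @F[4..$#F]' geno.txt
```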
I have asked this question before (sorry for asking again; this time it is different and more difficult), but I have tried a lot and did not achieve the results.
I have 2 big files (tab delimited).
first file ->
Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8
101_#2 1 H F0 263 278 2 1.5
102_#1 1 6 F1 766 781 1 1.0
103_#1 2 15 V1 526 581 1 0.0
103_#1 2 9 V2 124 134 1 1.3
104_#1 1 12 V3 137 172 1 1.0
105_#1 1 17 F2 766 771 1 1.0
second file ->
Col1 Col2 Col3 Col4
97486 H 262 279
67486 9 118 119
87486 9 183 185
248233 9 124 134
If the col3 value/character (of file1) and the col2 value/character (of file2) are the same, then compare col5 and col6 of file1 (a range) with col3 and col4 of file2; if the range from file1 falls within the range in file2, print that row (from file1) and also append col1 from file2 to the output.
Expected output ->
Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9
101_#2 1 H F0 263 278 2 1.5 97486
103_#1 2 9 V2 124 134 1 1.3 248233
So far I have tried something with hashes->
@ARGV or die "No input file specified";
open my $first, '<',$ARGV[0] or die "Unable to open input file: $!";
open my $second,'<', $ARGV[1] or die "Unable to open input file: $!";
print scalar (<$first>);
while(<$second>){
chomp;
@line=split /\s+/;
$hash{$line[2]}=$line[3];
}
while (<$first>) {
@cols = split /\s+/;
$p1 = $cols[4];
$p2 = $cols[5];
foreach $key (sort keys %hash){
if ($p1>= "$key"){
if ($p2<=$hash{$key})
{
print join("\t",@cols),"\n";
}
}
else{ next; }
}
}
But there is no comparison of the col3 value/character (of file1) with the col2 value/character (of file2) in the above code.
It is also taking a lot of time and memory. Can anybody suggest how I can make it faster using hashes or hashes of hashes? Thanks a lot.
Hello everyone,
Thanks a lot for your help. I figured out an efficient way for my own question.
@ARGV or die "No input file specified";
open $first, '<',$ARGV[0] or die "Unable to open input file: $!";
open $second,'<', $ARGV[1] or die "Unable to open input file: $!";
print scalar (<$first>);
while(<$second>){
chomp;
@line=split /\s+/;
$hash{$line[1]}{$line[2]}{$line[3]}= $line[0];
}
while (<$first>) {
@cols = split /\s+/;
foreach $key1 (sort keys %hash) {
foreach $key2 (sort keys %{$hash{$key1}}) {
foreach $key3 (sort keys %{$hash{$key1}{$key2}}) {
if (($cols[2] eq $key1) && ($cols[4]>=$key2) && ($cols[5]<=$key3)){
print join("\t",@cols),"\t",$hash{$key1}{$key2}{$key3},"\n";
}
last;
}
}
}
}
Is it right?
You don't need two hash tables. You just need one hash table built from entries in the first file, and when you loop through the second file, check if there's a key in the first-file hash table using defined.
If there is a key, do your comparisons on the values of other columns (we store values from the first file in the hash table for the third column's key).
If there's no key, then either warn, die, or have the script just keep going without saying anything, if that's what you want:
#!/usr/bin/perl -w
use strict;
use warnings;
my $firstHashRef;
my ($firstFile, $secondFile) = @ARGV; # take the two file names from the command line
open FIRST, "< $firstFile" or die "could not open first file...\n";
while (<FIRST>) {
chomp $_;
my @elements = split "\t", $_;
my $col3Val = $elements[2]; # Perl arrays are zero-indexed
my $col5Val = $elements[4];
my $col6Val = $elements[5];
# keep the fifth and sixth column values on hand, for
# when we loop through the second file...
if (! defined $firstHashRef->{$col3Val}) {
$firstHashRef->{$col3Val}->{Col5} = $col5Val;
$firstHashRef->{$col3Val}->{Col6} = $col6Val;
}
}
close FIRST;
open SECOND, "< $secondFile" or die "could not open second file...\n";
while (<SECOND>) {
chomp $_;
my @elements = split "\t", $_;
my $col2ValFromSecondFile = $elements[1];
my $col3ValFromSecondFile = $elements[2];
my $col4ValFromSecondFile = $elements[3];
if (defined $firstHashRef->{$col2ValFromSecondFile}) {
# we found a matching key
# 1. Compare $firstHashRef->{$col2ValFromSecondFile}->{Col5} with $col3ValFromSecondFile
# 2. Compare $firstHashRef->{$col2ValFromSecondFile}->{Col6} with $col4ValFromSecondFile
# 3. Do something interesting, based on comparison results... (this is left to you to fill in)
}
else {
warn "We did not locate entry in hash table for second file's Col2 value...\n";
}
}
close SECOND;
How about using just awk for this?
awk '
NR==FNR && NR>1{a[$3]=$0;b[$3]=$5;c[$3]=$6;next}
($2 in a) && ($3<=b[$2] && $4>=c[$2]) {print a[$2],$1}' file1 file2
Input Data:
[jaypal:~/Temp] cat file1
Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8
101_#2 1 H F0 263 278 2 1.5
109_#2 1 H F0 263 278 2 1.5
102_#1 1 6 F1 766 781 1 1.0
103_#1 2 15 V1 526 581 1 0.0
103_#1 2 9 V2 124 134 1 1.3
104_#1 1 12 V3 137 172 1 1.0
105_#1 1 17 F2 766 771 1 1.0
[jaypal:~/Temp] cat file2
Col1 Col2 Col3 Col4
97486 H 262 279
67486 9 118 119
87486 9 183 185
248233 9 124 134
Test:
[jaypal:~/Temp] awk '
NR==FNR && NR>1{a[$3]=$0;b[$3]=$5;c[$3]=$6;next}
($2 in a) && ($3<=b[$2] && $4>=c[$2]) {print a[$2],$1}' file1 file2
101_#2 1 H F0 263 278 2 1.5 97486
103_#1 2 9 V2 124 134 1 1.3 248233