How can I merge some columns of two files using perl?

I want to merge the first column of input1.txt and the third column of input2.txt. How can I do it? My code doesn't do what I want.
Input1:
1 6
2 7
3 8
4 9
Input2:
a 4 8
b 6 7
c 3 4
d 2 6
Requested output:
1 8
2 7
3 4
4 6
My code:
#!/usr/bin/perl
use strict;
use warnings;
open my $input1, '<', "input1.txt" or die qq{Failed to open "input1.txt" for writing: $!};
open my $input2, '<', "input2.txt" or die qq{Failed to open "input2.txt" for writing: $!};
open my $outfile, '>', "outfile.txt" or die qq{Failed to open "outfile.txt" for writing: $!};
while(<$input1>)
{
my @columns1 = split;
print $outfile join("\t", $columns1[0], "\n");
}
while(<$input2>)
{
my @columns2 = split;
print $outfile join("\t", $columns2[2], "\n");
}
close($input1);
close($input2);
close($outfile);

Another way to get the requested output is to use one while loop instead of two:
mod.pl
#!/usr/bin/perl
use strict;
use warnings;
open my $input1, '<', "input1.txt" or die qq{Failed to open "input1.txt" for reading: $!};
open my $input2, '<', "input2.txt" or die qq{Failed to open "input2.txt" for reading: $!};
open my $outfile, '>', "outfile.txt" or die qq{Failed to open "outfile.txt" for writing: $!};
while(my $l1 = <$input1>){
my $l2 = <$input2>;
chomp $l1;
chomp $l2;
my @columns1 = split(/ /, $l1);
my @columns2 = split(/ /, $l2);
print $outfile join("\t", $columns1[1-1], $columns2[3-1]),"\n";
}
close($input1);
close($input2);
close($outfile);
$ perl mod.pl
$ cat outfile.txt
1 8
2 7
3 4
4 6

Do this:
$filename1 = $ARGV[0]; #for taking input1.txt as the first argument
$filename2 = $ARGV[1]; #for taking input2.txt as the second argument
@data1;
@column1;
open(INPUT_FILE, $filename1)
or die "Couldn't open $filename1!";
while (<INPUT_FILE>) {
my $currentLine = $_; #read the input file one line at a time, storing it to $currentLine
@data1 = split " ", $currentLine; #split your line by space
$firstcolumn = $data1[0]; #store the first column's data
push @column1, $firstcolumn ; #push the first column's data into an array
}
@data2;
@column3;
open(INPUT_FILE, $filename2)
or die "Couldn't open $filename2!";
while (<INPUT_FILE>) {
my $currentLine = $_;
@data2 = split " ", $currentLine;
$thirdcolumn = $data2[2]; #store the third column's data
push @column3, $thirdcolumn ;
}
$size = @column1;
open (OUTPUTFILE, '>>outfile.txt');
for($i = 0; $i < $size; $i++){
print OUTPUTFILE "$column1[$i] $column3[$i]\n"; #writing each entry into the outfile.txt
}
close(INPUT_FILE);
close (OUTPUTFILE);
When you run your Perl program on the command line, do:
yourprogram.pl input1.txt input2.txt outfile.txt
and it should work. (Note that the script hard-codes the output file as outfile.txt, so the third argument is not actually read.)
I tried the program, opened outfile.txt, and your requested output is in there.

Your code prints the two files serially, but you need to read them in parallel:
#!/usr/bin/perl
use strict;
use warnings;
open my $input1, '<', "input1.txt" or die qq{Failed to open "input1.txt" for reading: $!};
open my $input2, '<', "input2.txt" or die qq{Failed to open "input2.txt" for reading: $!};
open my $outfile, '>', "outfile.txt" or die qq{Failed to open "outfile.txt" for writing: $!};
my ($line1, $line2);
while(1)
{
$line1 = <$input1> || '';
$line2 = <$input2> || '';
last if !$line1 && !$line2;
my @columns1 = split ' ', $line1;
my @columns2 = split ' ', $line2;
print $outfile join("\t", $columns1[0], $columns2[2]), "\n";
}
close($input1);
close($input2);
close($outfile);

It doesn't have to be this complicated. Read the first file's first column into an array and print it along with the third field of the second file. Unless your files have different numbers of rows, this should work just fine.
perl -lane'
BEGIN { $x = pop; @col1 = map { (split)[0] } <>; @ARGV = $x }
print join " ", $col1[$.-1], $F[-1]
' input1 input2
1 8
2 7
3 4
4 6
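For readers less familiar with the -l, -a and -n switches, here is a rough plain-script equivalent of the same idea (a sketch only, assuming the same input1 and input2 file names):
#!/usr/bin/perl
use strict;
use warnings;
# Slurp the first column of input1 into an array.
open my $in1, '<', 'input1' or die "Can't open input1: $!";
my @col1 = map { (split)[0] } <$in1>;
close $in1;
# Walk input2 line by line and pair its last field with the saved column.
open my $in2, '<', 'input2' or die "Can't open input2: $!";
while (my $line = <$in2>) {
    my @fields = split ' ', $line;
    print join(' ', $col1[$. - 1], $fields[-1]), "\n";
}
close $in2;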

Related

I want to write multiple files from one file without using array to remove complexity

I want to write multiple files from one file (getting the latest data every time) without using an array, to reduce complexity. I already tried it using an array, but when the data is large it slows down the process.
Kindly give me some hint on how to reduce the complexity of the program.
Input: read a text file from a directory.
Output:
File1.pl - 1 2 3 4 5 6
File2.pl - 6 7 8 9 10
File3.pl - 11 12 13 14 15
File4.pl - 16 17 18 19 20
I did this using an array:
use feature 'state';
open (DATA,"<","e:/today.txt");
@array=<DATA>;
$sizeofarray=scalar @array;
print "Total no. of lines in file is :$sizeofarray";
$count=1;
while($count<=$sizeofarray)
{
open($fh,'>',"E:/e$count.txt");
print $fh "#array[$count-1..($count+3)]\n";
$count+=5;
}
Store lines in a small buffer, open a new file every fifth line, and write the buffer to it:
use warnings;
use strict;
use feature 'say';
my $infile = shift || 'e:/today.txt';
open my $fh_in, '<', $infile or die "Can't open $infile: $!";
my ($fh_out, @buf);
while (<$fh_in>) {
push @buf, $_;
if ($. % 5 == 0) {
my $file = 'e' . (int $./5) . '.txt';
open $fh_out, '>', $file or do {
warn "Can't open $file: $!";
next;
};
print $fh_out $_ for @buf;
@buf = ();
}
}
# Write what's left over, if any, after the last batch of five
if (@buf) {
my $file = 'e' . ( int($./5)+1 ) . '.txt';
open $fh_out, '>', $file or die "Can't open $file: $!";
print $fh_out $_ for @buf;
}
Based on what I observed in your code, you can try this:
use warnings;
use strict;
open (my $fh,"<","today.txt") or die "Error opening $!";
my $count = 1;
while(my $line = <$fh>)
{
open my $wh,'>',"e$count.txt" or die "Error creating $!";
print $wh $line;
for(1..4){
if(my $v = scalar <$fh>){
print $wh $v ;
}
else{
last ;
}
}
$count++;
}

How to reset $.?

I know $. shows the line number when $/ is set to "\n".
I wanted to emulate the Unix tail command in Perl and print the last 10 lines from a file, but $. didn't work as I expected: if the file contains 14 lines, $. starts from 15 in the next loop.
#!/usr/bin/perl
use strict;
use warnings;
my $i;
open my $fh, '<', $ARGV[0] or die "unable to open file $ARGV[0] :$! \n";
do { local $.; $i = $. } while (<$fh>);
seek $fh, 0, 0;
if ($i > 10) {
$i = $i - 10;
print "$i \n";
while (<$fh>) {
#local $.;# tried doesn't work
#undef $.; #tried doesn't work
print "$. $_" if ($. > $i);
}
}
else {
print "$_" while (<$fh>);
}
close($fh);
I want to reset $. so it can be used usefully in the next loop.
Using local with $. does something other than what you think. From the documentation:
Localizing $. will not localize the filehandle's line count. Instead, it will localize perl's notion of which filehandle $. is currently aliased to.
$. is not read-only; it can be assigned to normally.
1 while <$fh>;
my $i = $.;
seek $fh, $. = 0, 0;
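Putting that together for the tail-like task in the question, a minimal sketch (assuming, as in the original, the file name arrives as the first argument):
#!/usr/bin/perl
use strict;
use warnings;
open my $fh, '<', $ARGV[0] or die "unable to open file $ARGV[0]: $!\n";
1 while <$fh>;       # read to EOF just to count the lines
my $count = $.;      # total number of lines
seek $fh, 0, 0;      # rewind the data...
$. = 0;              # ...and reset the line counter by plain assignment
while (<$fh>) {
    print "$. $_" if $. > $count - 10;
}
close $fh;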
You must reopen the file handle. Otherwise, as you have found, the line number just continues to increment:
#!/usr/bin/perl
use strict;
use warnings;
my ($filename) = @ARGV;
my $num_lines;
open my $fh, '<', $filename or die qq{Unable to open file "$filename" for input: $!\n};
++$num_lines while <$fh>;
open $fh, '<', $filename or die qq{Unable to open file "$filename" for input: $!\n};
print "$num_lines lines\n";
while ( <$fh> ) {
print "$. $_" if $. > $num_lines - 10;
}
Here's a neater way
#!/usr/bin/perl
use strict;
use warnings;
my ($filename) = @ARGV;
my @lines;
open my $fh, '<', $filename or die qq{Unable to open file "$filename" for input: $!\n};
while ( <$fh> ) {
push @lines, $_;
shift @lines while @lines > 10;
}
print @lines;
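The same sliding-window idea fits in a one-liner, for comparison (a sketch; somefile is a placeholder for the file to tail):
perl -ne 'push @lines, $_; shift @lines if @lines > 10; END { print @lines }' somefile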

perl: match lines from one file to another file, then output the current line and the next line to a new file [closed]

Could any of you modify the code so that the sequence names in file 1 are searched for within file 2, and if there is a match, the matching line in file 2 and its next line are copied to an outfile? Right now the code only copies the matched titles to the outfile, but not the next line, which is the sequence. Thanks.
for example:
FILE 1 :
SEQUENCE 1 NAME
SEQUENCE 2 NAME
SEQUENCE 3 NAME
FILE 2:
SEQUENCE 1 NAME
AGTCAGTCAGTCAGTCAGTC
SEQUENCE 2 NAME
AAGGGTTTTCCCCCCAAAAA
SEQUENCE 3 NAME
GGGGTTTTTTTTTTAAAAAC
SEQUENCE 4 NAME
AAGTCCCCCCCCCCAAGGTT
etc.
OUTFILE:
SEQUENCE 1 NAME
AGTCAGTCAGTCAGTCAGTC
SEQUENCE 2 NAME
AAGGGTTTTCCCCCCAAAAA
SEQUENCE 3 NAME
GGGGTTTTTTTTTTAAAAAC
code:
use strict;
use warnings;
my $f1 = 'FILE1.fasta';
open FILE1, "$f1" or die "Could not open file \n";
my $f2= 'FILE2.fasta';
open FILE2, "$f2" or die "Could not open file \n";
my $outfile = $ARGV[1];
my @outlines;
my $n=0;
foreach (<FILE1>) {
my $y = 0;
my $outer_text = $_ ;
seek(FILE2,0,0);
foreach (<FILE2>) {
my $inner_text = $_;
if($outer_text eq $inner_text) {
print "$outer_text\n";
push(@outlines, $outer_text);
$n++;
}
}
}
open (OUTFILE, "sequences.fasta") or die "Cannot open $outfile\n";
print OUTFILE @outlines;
close OUTFILE;
For a very large FILE1, the %seen hash could be tied to some DBM storage (see the sketch after the code below):
use strict;
use warnings;
my $f1 = 'FILE1.fasta';
open FILE1, "<", $f1 or die $!;
my $f2 = 'FILE2.fasta';
open FILE2, "<", $f2 or die $!;
# my $outfile = $ARGV[1];
open OUTFILE, ">", "sequences.fasta" or die $!;
my %seen;
while (<FILE1>) {
$seen{$_} = 1;
}
while (<FILE2>) {
my $next_line = <FILE2>;
if ($seen{$_}) {
print OUTFILE $_, $next_line;
}
}
close OUTFILE;
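A minimal sketch of what that tie could look like, using the core SDBM_File module as one possible DBM backend (the seen_db file name is arbitrary):
use strict;
use warnings;
use Fcntl;       # supplies the O_RDWR and O_CREAT flags
use SDBM_File;   # simple DBM implementation shipped with Perl
# Keys now live on disk in the seen_db.* files instead of in memory.
tie my %seen, 'SDBM_File', 'seen_db', O_RDWR|O_CREAT, 0666
    or die "Cannot tie seen_db: $!";
open my $fh1, '<', 'FILE1.fasta' or die $!;
$seen{$_} = 1 while <$fh1>;
close $fh1;
# ... then use %seen exactly as in the code above ...
untie %seen;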
I would put the contents of file 2 into a hash, then check if each record from file 1 was in the hash:
#!perl
use strict;
use warnings;
my $f2= 'FILE2.fasta';
open FILE2, "$f2" or die "Could not open file \n";
my $k;
my $v;
my %hash;
while (defined($k = <FILE2>)) {
chomp $k;
$v = <FILE2>;
chomp $v;
$hash{$k} = $v;
}
my $f1 = 'FILE1.fasta';
open FILE1, "$f1" or die "Could not open file \n";
open (OUTFILE, ">sequences.fasta") or die "Cannot open seqeneces.fasta\n";
while (<FILE1>) {
chomp;
if (exists($hash{$_})) {
print OUTFILE "$_\n";
print OUTFILE "$hash{$_}\n";
}
}
close OUTFILE;

How can I print lines from a file to separate files

I have a file which has lines like this:
1 107275 447049 scaffold1443 465 341154 -
There are several lines that start with 1; after that, a blank line separates them from the lines that start with 2, and so on.
I want to separate these lines into different files based on their number.
I wrote this script, but it prints only the first line into every file.
#!/usr/bin/perl
#script for choosing chromosome
use strict;
my $filename= $ARGV[0];
open(FILE, $filename);
while (my $line = <FILE>) {
my @data = split('\t', $line);
my $length = @data;
#print $length;
my $num = $data[0];
if ($length == 6) {
open(my $fh, '>', $num);
print $fh $line;
}
$num = $num + 1;
}
Please, I need your help!
Use >> to open the file for appending to the end of it, as > always truncates the file to zero bytes:
use strict;
my $filename = $ARGV[0];
open(my $FILE, "<", $filename) or die $!;
while (my $line = <$FILE>) {
my @data = split('\t', $line);
my $length = @data;
#print $length;
my $num = $data[0];
if ($length == 6) {
open(my $fh, '>>', $num);
print $fh $line;
}
$num = $num + 1;
}
If I understand your question correctly, then paragraph mode might be useful. This breaks a record on two or more newlines, instead of just one:
@ARGV or die "Supply a filename\n";
my $filename= $ARGV[0];
local $/ = ""; # Set paragraph mode
open(my $file, $filename) or die "Unable to open '$filename' for read: $!";
while (my $lines = <$file>) {
my $num = (split("\t", $lines))[0];
open(my $fh, '>', $num) or die "Unable to open '$num' for write: $!";
print $fh $lines;
close $fh;
}
close $file;

Vertical index Perl

File 1 has ranges (3-9, 2-6, etc.):
3 9
2 6
12 20
File2 has values: column 1 indicates the position and column 2 has the value.
1 4
2 4
3 5
4 4
5 4
6 1
7 1
8 1
9 4
I would like to calculate the sum of the values (file2, column 2) for the ranges in file1. E.g.: if the range is 3-9, then the sum of values will be 5+4+4+1+1+1+4 = 20.
What I have tried is:
open (FILE1,"file1.txt");
open (FILE2,"file2.txt");
@file1 = <FILE1>;
@file2 = <FILE2>;
foreach (@file1)
{
@split_file1 = split("\\s",$_); //splitting the file by space
foreach (@file2)
{
@split_file2 = split("\\s",$_); //splitting the file by space
if (@split_file1[0] == @split_file2[0]) //if column0 of file1 matches with column0 of file2
{
$x += @split_file2[1]; //sum the column1 of file2
if ( @split_file2[0] == @split_file1[0] ) //until column1 of file1 = column0 of file2.
{
last;
}
}
}}
Always use use strict; use warnings;.
split /\s/ is easier to read. split ' ' is what you actually want (see the short demonstration after this list).
Don't use global variables (e.g. for file handles).
It's useful to check if open succeeds, if only by adding or die $!.
Use meaningful names, not file1 and file2.
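To see the difference the split ' ' form makes, here is a small standalone demonstration (the sample line is made up):
use strict;
use warnings;
my $line = "  3   9\n";        # leading spaces plus a run of spaces
my @a = split /\s/, $line;     # 6 fields: ("", "", "3", "", "", "9")
my @b = split ' ',  $line;     # 2 fields: ("3", "9")
print scalar(@a), " fields vs ", scalar(@b), " fields\n";
A full script applying the advice: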
use strict;
use warnings;
use feature qw( say );
use List::Util qw( sum );
my $file1 = 'file1.txt';
my $file2 = 'file2.txt';
my @file2;
{
open(my $fh, '<', $file2)
or die "Can't open $file2: $!\n";
while (<$fh>) {
my ($k, $v) = split;
$file2[$k] = $v;
}
}
{
open(my $fh, '<', $file1)
or die "Can't open $file1: $!\n";
while (<$fh>) {
my ($start, $end) = split;
say sum grep defined, @file2[$start .. $end];
}
}
Another solution:
#!/usr/bin/perl
use strict;
use warnings;
my $f1 = shift;
my $f2 = shift;
open FH1, "<", $f1 or die "$!\n";
open FH2, "<", $f2 or die "$!\n";
my %data;
while (<FH1>) {
$data{$1} = $2 if ($_ =~ m/^(\d+)\s+(\d+)$/);
}
while (<FH2>) {
if ($_ =~ m/^(\d+)\s+(\d+)$/) {
my $sum;
for ($1..$2) {
$sum += $data{$_} if defined($data{$_});
}
print "sum for $1-$2: $sum\n" if defined($sum);
}
}
close FH1;
close FH2;
Call: script.pl values.txt ranges.txt