Input a list from a file in Perl

I would like to input data from a list in Perl. So far, I have been pasting lists of data into the actual program for it to process. My original sorting program looked like this:
# /perl/bin
use strict; #force perl to code properly
use warnings; #find typing mistakes in program like missing semicolons, etc.
use Text::ParseWords; #parse text into an array of tokens or array of arrays
my @rows;
while (<DATA>) {
push @rows, [ parse_line(',', 0, $_) ];
}
@rows = sort { $a->[2] <=> $b->[2] } @rows;
open OUTPUT, ">OUTPUT.TXT";
foreach (@rows) {
print OUTPUT join ',', @$_;
}
__DATA__
SMITH,M,1
JONES,F,1
...
But I would like to have it input from a file that has this list instead. I'm not sure if I'm even on the right track but this is what I have so far:
# /perl/bin
use strict; #force perl to code properly
use warnings; #find typing mistakes in program like missing semicolons, etc.
use autodie; #replace functions with ones that succeed or die with lexical scope
use Text::ParseWords; #parse text into an array of tokens or array of arrays
open(MYINPUTFILE, "<inputfile.txt"); # open for input
my @rows = <MYINPUTFILE>; # read file into list
while (<MYINPUTFILE>) {
push @rows, [ parse_line(',', 0, $_) ];
}
@rows = sort { $a->[2] <=> $b->[2] } @rows;
open OUTPUT, ">OUTPUT.TXT";
foreach (@rows) {
print OUTPUT join ',', @$_;
}

Here's the crux of your problem:
open(MYINPUTFILE, "<inputfile.txt"); # open for input
my @rows = <MYINPUTFILE>; # read file into list
Since "@rows =" gives list context (what wantarray reports) to the right-hand side, <MYINPUTFILE> reads the entire file, returning every remaining line.
Then:
while (<MYINPUTFILE>) { # try to read file again,
# but you've already read it all
push @rows, [ parse_line(',', 0, $_) ];
}
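You're trying to read the file twice. The while loop therefore never runs, and @rows ends up holding plain lines rather than array references, so $a->[2] in the sort fails under strict.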
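A small sketch of the double read, assuming an in-memory filehandle (opened on a string) stands in for inputfile.txt so the example needs no file on disk; read_twice is a hypothetical name. The list-context read consumes every line, so the while loop that follows never runs:

```perl
use strict;
use warnings;

# Demonstrates the double read from the question. The in-memory handle
# stands in for inputfile.txt (an assumption made only for this demo).
sub read_twice {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Can't open in-memory handle: $!";
    my @rows = <$fh>;              # list context: reads ALL remaining lines
    my $loop_count = 0;
    while (my $line = <$fh>) {     # scalar context: one line per call
        $loop_count++;             # never reached: the handle is already at EOF
    }
    close $fh;
    return (scalar @rows, $loop_count);
}

my ($read, $looped) = read_twice("SMITH,M,1\nJONES,F,1\n");
print "list read got $read lines; while loop ran $looped times\n";
```

Run as is, it prints "list read got 2 lines; while loop ran 0 times".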
There are other issues with your code, but you probably meant to write only:
open(MYINPUTFILE, "<inputfile.txt"); # open for input
my @rows;
while (<MYINPUTFILE>) {
push @rows, [ parse_line(',', 0, $_) ];
}
… by analogy with your earlier version.
At the least, you might consider a couple of more Perlish changes:
open my $input_file, '<', 'inputfile.txt';
Using a lexical (my) variable instead of a bareword filehandle (a typeglob such as *MYINPUTFILE) is nicer in any situation more complex than this one. (Among other things, you can pass it to a subroutine much more easily.) Using the three-argument form of open also protects you against problems if you let others specify the filename for your program. You would then use <$input_file> in your while loop.
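Putting these suggestions together, a minimal sketch of the corrected program might look like the following. It assumes the input and output paths are passed on the command line (the question hard-codes inputfile.txt and OUTPUT.TXT), and sort_file is a hypothetical name. It chomps each line, skips blank lines, and adds the newline back on output, which is one choice among several:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use autodie;                  # open/close die with a useful message on failure
use Text::ParseWords;         # provides parse_line()

# Read comma-separated rows from $in_path, sort them numerically on the
# third field, and write them to $out_path. Returns the number of rows.
sub sort_file {
    my ($in_path, $out_path) = @_;

    open my $in, '<', $in_path;            # three-argument open, lexical handle
    my @rows;
    while (my $line = <$in>) {             # one line at a time
        chomp $line;
        next if $line eq '';               # skip blank lines
        push @rows, [ parse_line(',', 0, $line) ];
    }
    close $in;

    @rows = sort { $a->[2] <=> $b->[2] } @rows;

    open my $out, '>', $out_path;
    print {$out} join(',', @$_), "\n" for @rows;
    close $out;

    return scalar @rows;
}

# Usage (assumed): perl sortlist.pl inputfile.txt OUTPUT.TXT
if (@ARGV == 2) {
    sort_file(@ARGV);
}
else {
    warn "Usage: $0 INPUT_FILE OUTPUT_FILE\n";
}
```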

Related

Creating multiple hashes from multiple files in one go

I want to perform a VLOOKUP-like process across multiple files, where the contents of the first column from all files (sorted and deduplicated) serve as the reference values. I would like to store the key-value pairs from each file in a separate hash and then print them together. Something like this:
file1: while(){$hash1{$key}=$val}...file2: while(){$hash2{$key}=$val}...file3: while(){$hash3{$key}=$val}...so on
Then print them: print "$ref_val $hash1{$ref_val} $hash2{$ref_val} $hash3{$ref_val}..."
$i=1;
@FILES = @ARGV;
foreach $file(@FILES)
{
open($fh,$file);
$hname="hash".$i; ##trying to create unique hash by attaching a running number to hash name
while(<$fh>){@d=split("\t");$hname{$d[0]}=$d[7];}$i++;
}
$set=$i-1; ##store this number for recreating the hash names during printing
open(FH,"ref_list.txt");
while(<FH>)
{
chomp();print "$_\t";
## here i run the loop recreating the hash names and printing its corresponding value
for($i=1;$i<=$set;$i++){$hname="hash".$i; print "$hname{$_}\t";}
print "\n";
}
Now this is where I am stuck: Perl takes $hname as the hash name instead of $hash1, $hash2, ...
Thanks in advance for the help and opinions.
The code shown attempts to use symbolic references to construct variable names at runtime. Symbolic references can cause a lot of trouble and should not be used, except very occasionally in highly specialized code.
Here is a way to read multiple files, each into a hash, and store them for later processing.
use warnings;
use strict;
use feature 'say';
use Data::Dump qw(dd);
my @files = @ARGV;
my @data;
for my $file (@files) {
open my $fh, '<', $file or do {
warn "Skip $file, can't open it: $!";
next;
};
push @data, { map { (split /\t/, $_)[0,6] } <$fh> };
}
dd \@data;
For each line, the hash associates the first column with the seventh (index 6), as clarified in the comments. The anonymous-hash constructor { } creates a reference to that hash, and one such reference per file is added to the array.
Note that when you add a key-value pair to a hash that already has that key, the new value overwrites the old one. So if a string repeats in the first column of a file, the hash for that file ends up with the value (column 7) from the last such line. The OP doesn't discuss possible duplicates of this kind in the data files (only in the reference file); please clarify if needed.
Data::Dump is used only for printing; if you don't want to install it, use the core module Data::Dumper.
I am not sure that I understand the use of that "reference file", but you can now go through the array of hash references and fetch values from each file as needed. Perhaps like this:
open my $fh_ref, '<', $ref_file or die "Can't open $ref_file: $!";
while (my $line = <$fh_ref>) {
my $key = ... # retrieve the key from $line
print "$key: ";
foreach my $hr (@data) {
print "$hr->{$key} ";
}
say '';
}
This prints the key and a colon, followed by that key's value from each file.
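A sketch that combines both parts, assuming tab-separated data files named on the command line, a reference file ref_list.txt with one key per line (as in the question's code), and 'NA' as a placeholder for missing keys; load_tables and lookup_row are hypothetical names. Unlike the code above, it pushes an empty hash for a file it cannot open, so each output column still corresponds to the same input file:

```perl
use strict;
use warnings;

# Read each tab-separated file into its own hash (column 1 => column 7)
# and return a reference to an array holding one hash reference per file.
sub load_tables {
    my @files = @_;
    my @tables;
    for my $file (@files) {
        my %h;
        if (open my $fh, '<', $file) {
            while (my $line = <$fh>) {
                chomp $line;
                next if $line eq '';
                my @f = split /\t/, $line;
                $h{ $f[0] } = $f[6];    # a repeated key keeps the last value
            }
            close $fh;
        }
        else {
            warn "Skip $file, can't open it: $!\n";
        }
        push @tables, \%h;   # empty hash for unreadable files keeps columns aligned
    }
    return \@tables;
}

# One output line: the key, then its value from each file ('NA' if absent).
sub lookup_row {
    my ($key, $tables) = @_;
    return join "\t", $key, map { $_->{$key} // 'NA' } @$tables;
}

# Usage (assumed): perl lookup.pl file1.txt file2.txt ...  (keys from ref_list.txt)
if (@ARGV) {
    my $tables   = load_tables(@ARGV);
    my $ref_file = 'ref_list.txt';
    open my $fh_ref, '<', $ref_file or die "Can't open $ref_file: $!";
    while (my $key = <$fh_ref>) {
        chomp $key;
        print lookup_row($key, $tables), "\n";
    }
    close $fh_ref;
}
```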

Why does my sorting fail for double digits in Perl? [duplicate]

I have 1500 files in one directory and I need to get some information out of every one and write it into a new, single file. The file names consist of a word and a number (Temp1, Temp2, Temp3 and so on) and it is important that the files are read in the correct order according to the numbers.
I did this using
my @files = <Temp*.csv>;
for my $file (@files)
{
# this part appends the required data to a separate file and works fine
}
My problem now is that the files are not opened in the correct order: after file 1, file 100 gets opened.
Can anybody please give me a hint how I can make it read the files in the right order?
Thank you,
Ca
Sort the files naturally with Sort::Key::Natural natsort.
The following will automatically sort the files naturally, separating out alpha and numerical portions of the name for the appropriate sort logic.
use strict;
use warnings;
use Sort::Key::Natural qw(natsort);
for my $file ( natsort <Temp*.csv> ) {
# this part appends the required data to a separate file and works fine
}
The following fake data should demonstrate this module in action:
use strict;
use warnings;
use Sort::Key::Natural qw(natsort);
print natsort <DATA>;
__DATA__
Temp100.csv
Temp8.csv
Temp20.csv
Temp1.csv
Temp7.csv
Outputs:
Temp1.csv
Temp7.csv
Temp8.csv
Temp20.csv
Temp100.csv
You can use a Schwartzian transform to read and sort the file names in one step:
my @files =
map { $_->[0] }
sort { $a->[1] <=> $b->[1] }
map { [ $_, /(\d+)/ ] } <Temp*.csv>;
or, less efficiently but more straightforwardly, a plain sort that re-extracts the number on every comparison:
my @files = sort { ($a =~ /(\d+)/)[0] <=> ($b =~ /(\d+)/)[0] } <Temp*.csv>;
If the numbers really matter, you might instead build each file name from its number, with error reporting about missing files:
my @nums = 1 .. 1500; # or whatever the highest is
for my $num (@nums) {
my $file = "Temp$num.csv";
unless (-e $file) {
warn "Missing file: $file";
next;
}
...
# proceed as normal
}
If you need a file count, you can simply use your old glob:
my @files = <Temp*.csv>;
my $count = #files; # get the size of the array
my #nums = 1 .. $count;
On the other hand, if you control the process that writes the files, you might choose a zero-padded naming format that sorts correctly by itself, such as:
temp00001.csv
temp00002.csv
temp00003.csv
temp00004.csv
...
temp00101.csv
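If you do control the file names, sprintf with a zero-padded format such as %05d produces names like the ones above. A minimal sketch (padded_name is a hypothetical helper; five digits assumes fewer than 100000 files) showing that a plain string sort, which is the order glob returns by default, then matches numeric order:

```perl
use strict;
use warnings;

# Build a zero-padded file name (five digits, matching the list above).
sub padded_name {
    my ($n) = @_;
    return sprintf 'temp%05d.csv', $n;
}

# With fixed-width numbers, plain string order equals numeric order.
my @unsorted = map { padded_name($_) } 101, 4, 3, 2, 1;
my @names    = sort @unsorted;
print "$_\n" for @names;
```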

Merging two files based on the first column and returning multiple values for each key

I am fairly new to Perl so hopefully this has a quick solution.
I have been trying to combine two files based on a key. The problem is that a key can have multiple values in the second file, but my code returns only one of them. Is there a way to loop through the hash to get all of the values (there can be anywhere from 1 to 10 per key)?
Example:
File Input 1:
12345|AA|BB|CC
23456|DD|EE|FF
File Input2:
12345|A|B|C
12345|D|E|F
12345|G|H|I
23456|J|K|L
23456|M|N|O
32342|P|Q|R
I included that last line because the second file has a lot of values I don't want, whereas from file 1 I want all values. The result I want is something like this:
WANTED OUTPUT:
12345|AA|BB|CC|A|B|C
12345|AA|BB|CC|D|E|F
12345|AA|BB|CC|G|H|I
23456|DD|EE|FF|J|K|L
23456|DD|EE|FF|M|N|O
Attached is the code I am currently using. It gives an output like so:
OUTPUT I AM GETTING:
12345|AA|BB|CC|A|B|C
23456|DD|EE|FF|J|K|L
My code so far:
#use strict;
#use warnings;
open file1, "<FILE1.txt";
open file2, "<FILE2.txt";
while(<file2>){
my($line) = $_;
chomp $line;
my($key, $value1, $value2, $value3) = $line =~ /(.+)\|(.+)\|(.+)\|(.+)/;
$value4 = "$value1|$value2|$value3";
$file2Hash{$key} = $value4;
}
while(<file1>){
my ($line) = $_;
chomp $line;
my($key, $value1, $value2, $value3) = $line =~/(.+)\|(.+)\|(.+)\|(.+)/;
if (exists $file2Hash{$key}) {
print $line."|".$file2Hash{$key}."\n";
}
else {
print $line."\n";
}
}
Thank you for any help you may provide,
Your overall idea is sound. However, when reading file2, if you encounter a key you have already stored, you overwrite its value with the new one. To work around that, we store an array reference inside the hash.
So in your first loop, we do:
push @{$file2Hash{$key}}, $value4;
The @{...} is array dereferencing syntax; the first push for a key creates the array reference automatically (autovivification).
In your second loop, we do:
if (exists $file2Hash{$key}){
foreach my $second_value (@{$file2Hash{$key}}) {
print "$line|$second_value\n";
}
} else {
print $line."\n";
}
Beyond that, you might want to declare %file2Hash (and $value4) with my so you can re-enable strict.
Keys in a hash must be unique. If keys in file1 are unique, use file1 to create the hash. If keys are not unique in either file, you have to use a more complicated data structure: a hash of arrays, i.e. store several values under each unique key.
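Putting both changes together, a sketch of the full program with strict and warnings re-enabled might look like this. It uses split with a limit of 2 instead of the four-group regex (assuming only the first field is the key), returns the merged lines from a hypothetical merge_files sub, and takes the two file names from the command line:

```perl
use strict;
use warnings;

# Join each FILE1 line with every FILE2 line that shares its first
# pipe-separated field. Returns the merged lines, without newlines.
sub merge_files {
    my ($file1, $file2) = @_;

    # Hash of arrays: key => [ "A|B|C", "D|E|F", ... ]
    my %file2_hash;
    open my $in2, '<', $file2 or die "Can't open $file2: $!";
    while (my $line = <$in2>) {
        chomp $line;
        my ($key, $rest) = split /\|/, $line, 2;
        next unless defined $rest;             # skip blank or key-only lines
        push @{ $file2_hash{$key} }, $rest;    # autovivifies the array ref
    }
    close $in2;

    my @merged;
    open my $in1, '<', $file1 or die "Can't open $file1: $!";
    while (my $line = <$in1>) {
        chomp $line;
        next if $line eq '';
        my ($key) = split /\|/, $line, 2;
        if (exists $file2_hash{$key}) {
            push @merged, "$line|$_" for @{ $file2_hash{$key} };
        }
        else {
            push @merged, $line;               # no match: keep the line as-is
        }
    }
    close $in1;
    return @merged;
}

# Usage (assumed): perl merge.pl FILE1.txt FILE2.txt
if (@ARGV == 2) {
    print "$_\n" for merge_files(@ARGV);
}
```

With the sample files from the question, this produces the five lines of the wanted output.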
I assume that each key in FILE1.txt is unique and that each unique key has at least one corresponding line in FILE2.txt.
Your approach is then quite close to what you need; you just need to build the hash from FILE1.txt instead (as already mentioned here).
The following should work:
#!/usr/bin/perl
use strict;
use warnings;
my %file1hash;
open file1, "<", "FILE1.txt" or die "$!\n";
while (<file1>) {
my ($key, $rest) = split /\|/, $_, 2;
chomp $rest;
$file1hash{$key} = $rest;
}
close file1;
open file2, "<", "FILE2.txt" or die "$!\n";
while (<file2>) {
my ($key, $rest) = split /\|/, $_, 2;
if (exists $file1hash{$key}) {
chomp $rest;
printf "%s|%s|%s\n", $key, $file1hash{$key}, $rest;
}
}
close file2;
exit 0;

Perl: Extract each line from a txt file and store it in a different variable

I read in a txt file using a Perl script, but I'm wondering how to store each line from the file in a different variable using pattern matching. I can match a line using =~ /^>gi/, but it displays both lines from the file that start with >gi (i.e. lines 1 and 3). I also want to read the two separate DNA sequences into different variables. Consider my example below.
file.txt
>gi102939
GATCTATC
>gi123453
CATCGACA
The Perl script:
#!/usr/local/bin/perl
open (MYFILE, 'file.txt');
@array = <MYFILE>;
($first, $second, $third, $fourth, $fifth) = @array;
chomp $first, $second, $third, $fourth, $fifth;
print "Contents:\n @array";
if (@array =~ /^>gi/)
{
print "$first";
}
close (MYFILE);
Assuming that the >gi... header lines are unique in the input, populate a hash in which each header is associated with its sequence:
#!/usr/bin/perl
use warnings;
use strict;
my %hash;
my $last;
while (<DATA>) {
chomp;
if (/^>gi/) {
$last = $_;
} else {
$hash{$last} = $_;
}
}
foreach my $k (keys %hash) {
print "$k => $hash{$k}\n";
}
__DATA__
>gi102939
GATCTATC
>gi123453
CATCGACA
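Real FASTA files often wrap a long sequence over several lines, and the loop above would then keep only the last line of each sequence. If that applies to your data (an assumption; the sample in the question is not wrapped), a sketch that appends sequence lines instead of overwriting might look like this. read_fasta is a hypothetical name, and headers are still assumed to be unique:

```perl
use strict;
use warnings;

# Read FASTA-style records from a filehandle into a hash of
# header => sequence. Sequence lines are concatenated, so a sequence
# wrapped over several lines is kept whole (assumed input format).
sub read_fasta {
    my ($fh) = @_;
    my %seq;
    my $header;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;              # skip blank lines
        if ($line =~ /^>/) {
            $header = $line;                   # start a new record
            $seq{$header} = '';
        }
        elsif (defined $header) {
            $seq{$header} .= $line;            # append, don't overwrite
        }
        else {
            warn "Sequence line before any header, skipped: $line\n";
        }
    }
    return \%seq;
}

# Usage (assumed): perl fasta.pl file.txt
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Can't open $ARGV[0]: $!";
    my $seq = read_fasta($fh);
    close $fh;
    print "$_ => $seq->{$_}\n" for sort keys %$seq;
}
```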
Please always use strict and use warnings at the top of your program, and declare your variables using my at their first point of use. This applies especially when you are asking for help, as doing so can frequently reveal simple problems that could otherwise be overlooked.
As it stands, your program reads the file into @array and prints it out. The test if (@array =~ /^>gi/) { ... } forces scalar context on the array, so it matches the number of elements in the array (4 for the sample file shown) against the regex pattern, which fails.
What exactly are you trying to achieve? Reading a file into an array already puts each line into a different scalar variable: the variables are the elements of the array ($array[0], $array[1], and so on).
This one-liner reads the file into a hash (each header line becomes a key and the line after it becomes the value) and extracts one sequence:
perl < file.txt -e '@array=<>;chomp @array;%hash=@array;print $hash{">gi102939"}'
result:
GATCTATC

How to print all values of an array in Perl

I am trying to print all of the values of an array read from a CSV file. I am doing this sort of manually in the example below. Can someone show me code that does this for all of the fields of the array, no matter how many fields there are? I'm basically just trying to print each field on a new line.
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV_XS;
my $file = 'test.csv';
my $csv = Text::CSV_XS->new ({
quote_char => '"',
escape_char => '#',
binary => 1,
keep_meta_info => 0,
allow_loose_quotes => 1,
allow_whitespace => 1,
});
open (CSV, "<", $file) or die $!;
while (<CSV>) {
if ($csv->parse($_)) {
my @columns = $csv->fields();
print "$columns[0]\r\n";
print "$columns[1]\r\n";
print "$columns[2]\r\n";
print "$columns[3]\r\n";
print "$columns[4]\r\n";
print "$columns[5]\r\n";
print "$columns[6]\r\n";
print "$columns[7]\r\n";
}
else {
my $err = $csv->error_input;
print "Failed to parse line: $err";
}
}
close CSV;
foreach(@columns)
{
print "$_\r\n";
}
I would like to use a loop like the foreach above instead of all the $columns[number] lines.
For debugging purposes, Data::Dump is my weapon of choice. It basically pretty-prints data structures.
use strict;
use warnings;
use Data::Dump 'dump';
# Do some stuff....
dump @array;  # Unlike Data::Dumper, there's no need to backslash ('\@array')
dump %hash; # Same goes for hashes
dump $arrayref;
dump $hashref; # References handled just as well
There are many other ways to print arrays, of course:
say foreach @columns;             # If you have Perl 5.10+ (with use feature 'say')
print $_,"\n" foreach @columns;   # If you don't
print "@columns";                 # Prints all elements, space-separated by default
The 'best' answer depends on the situation. Why do you need it? What are you working with? And what do you want it for? Then season the code accordingly.
If you just want to print the elements separated by spaces, interpolate the array into a string:
print "@columns";
If you want to be a bit more fancy, you can use join:
print join("\n", @columns);
If you need to do something more, iterate over it:
foreach (@columns) {
# do stuff with $_
}
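A small sketch contrasting these idioms, printing into an in-memory string so the output can be inspected (an assumption of this sketch; render_all is a hypothetical name). print with a plain array uses the output field separator $, (empty by default), while interpolating the array into a string uses the list separator $" (a space by default):

```perl
use strict;
use warnings;

# Render an array several ways into an in-memory string so the
# differences between the printing idioms are visible.
sub render_all {
    my @columns = @_;
    my $buf = '';
    open my $fh, '>', \$buf or die "Can't open in-memory handle: $!";
    print {$fh} @columns, "\n";               # no separator: $, is empty by default
    print {$fh} "@columns\n";                 # interpolation joins with $" (a space)
    print {$fh} join("\n", @columns), "\n";   # explicit separator, one per line
    { local $" = '|'; print {$fh} "@columns\n"; }   # temporarily change $"
    close $fh;
    return $buf;
}

print render_all(qw(a b c));
```

For qw(a b c) this prints abc, then a b c, then a, b and c on separate lines, then a|b|c.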
If you're doing this for diagnostic purposes (as opposed to presentation) you might consider Data::Dumper. In any case it's a good tool to know about if you want a quick printout of more-or-less arbitrary data.
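Or set the list separator $" locally, so that interpolating the array puts each element on its own line:
{ local $" = "\n"; print $fh "@files"; }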