how to assign data into hash from an input file - perl

I am new to perl.
Inside my input file is :
james1
84012345
aaron5
2332111 42332
2345112 18238
wayne[2]
3505554
Question: I am not sure what the correct way is to read the input and set each name as a key and its numbers as the values. For example, "james1" is the key and "84012345" is the value.
This is my code:
#!/usr/bin/perl -w
use strict;
use warnings;
use Data::Dumper;
my $input= $ARGV[0];
my %hash;
open my $data , '<', $input or die " cannot open file : $_\n";
my @names = split ' ', $data;
my @values = split ' ', $data;
@hash{@names} = @values;
print Dumper \%hash;

I'mma go over your code real quick:
#!/usr/bin/perl -w
-w is not recommended. You should use warnings; instead (which you're already doing, so just remove -w).
use strict;
use warnings;
Very good.
use Data::Dumper;
my $input= $ARGV[0];
OK.
my %hash;
Don't declare variables before you need them. Declare them in the smallest scope possible, usually right before their first use.
open my $data , '<', $input or die " cannot open file : $_\n";
You have a spurious space at the beginning of your error message and $_ is unset at this point. You should include $input (the name of the file that failed to open) and $! (the error reason) instead.
my @names = split ' ', $data;
my @values = split ' ', $data;
Well, this doesn't make sense. $data is a filehandle, not a string. Even if it were a string, this code would assign the same list to both @names and @values.
@hash{@names} = @values;
print Dumper \%hash;
My version (untested):
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
@ARGV == 1
    or die "Usage: $0 FILE\n";
my $file = $ARGV[0];
my %hash;
{
    open my $fh, '<', $file or die "$0: can't open $file: $!\n";
    local $/ = '';
    while (my $paragraph = readline $fh) {
        my @words = split ' ', $paragraph;
        my $key = shift @words;
        $hash{$key} = \@words;
    }
}
print Dumper \%hash;
The idea is to set $/ (the input record separator) to "" for the duration of the input loop, which makes readline return whole paragraphs, not lines.
The first (whitespace separated) word of each paragraph is taken to be the key; the remaining words are the values.
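Paragraph mode only helps if the records in the file are separated by blank lines. If they are not (the sample in the question shows none), a line-by-line variant is one possible fallback. The sketch below is not the code above but a different technique, and it rests on an assumption of mine: a line that starts with a letter is a key, and every other non-blank line holds values for the most recent key. The helper name parse_records is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build a hash of arrays from a filehandle.
# Assumption: a line starting with a letter is a key; every other
# non-blank line holds whitespace-separated values for the latest key.
sub parse_records {
    my ($fh) = @_;
    my (%hash, $key);
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;          # skip blank lines
        if ($line =~ /^\s*[A-Za-z]/) {      # a name starts a new record
            ($key) = split ' ', $line;
            $hash{$key} = [];
        }
        elsif (defined $key) {              # numbers belong to the last key
            push @{ $hash{$key} }, split ' ', $line;
        }
    }
    return \%hash;
}

# Demo with an in-memory filehandle holding the sample input.
my $sample = "james1\n84012345\naaron5\n2332111 42332\n2345112 18238\nwayne[2]\n3505554\n";
open my $fh, '<', \$sample or die "can't open in-memory file: $!\n";
my $records = parse_records($fh);
print "$_ => @{ $records->{$_} }\n" for sort keys %$records;
```

If your real keys can start with a digit, the key test has to change; that regex is the part most likely to need adjusting.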

You have opened a file with open() and attached the file handle to $data. The regular way of reading data from a file is to loop over each line, like so:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my $input = $ARGV[0];
my %hash;
open my $data, '<', $input or die "cannot open $input: $!\n";
while (my $line = <$data>) {
    chomp $line; # Removes the trailing newline (\n)
    if (length $line) { # Skips empty lines
        # Assumes each line holds a key followed by its value
        my ($key, $value) = split ' ', $line;
        $hash{$key} = $value;
    }
}
print Dumper \%hash;

OK, +1 for using strict and warnings.
First, take a look at the $/ variable, which controls how a file is broken into records when it's read in.
$data is a file handle; you need to read the data from the file through it. If the file is not too big you can load it all into an array; if it's a large file you can loop over it one record at a time. See the <> operator in perlop.
Looking at your code, it appears that you want to end up with the following data structure from your input file:
%hash = (
    james1 => [
        84012345
    ],
    aaron5 => [
        2332111,
        42332,
        2345112,
        18238
    ],
    'wayne[2]' => [
        3505554,
    ]
)
See perldsc on how to do that.
All the documentation can be read using the perldoc command which comes with Perl. Running perldoc on its own will give you some tips on how to use it and running perldoc perldoc will give you possibly far more info than you need at the moment.

Related

perl array prints as GLOB(#x#########)

I have a file which contains a list of email addresses which are separated by a semi-colon which is configured much like this (but much larger) :
$ cat email_casper.txt
casper1@foo.com; casper2@foo.com; casper3@foo.com; casper.casper4@foo.com;
#these throw outlook error :
#casper101@foo.com ; casper100@foo.com
#cat /tmp/emailist.txt | tr '\n' '; '
#cat /tmp/emallist.txt | perl -nle 'print /\<(.*)\>/' | sort
I want to break it up on the semicolons, so I read the whole file into an array, with the contents supposedly split on the semicolon.
#!/usr/bin/perl
use strict;
use warnings;
my $filename = shift @ARGV ;
open(my $fh, '<', $filename) or die "Could not open file $filename $!";
my @values = split(';', $fh);
foreach my $val (@values) {
    print "$val\n";
}
exit 0 ;
But the file awards me with a glob. I just don't know what is going on.
$ ./spliton_semi.pl email_casper.txt
GLOB(0x80070b90)
If I use Data::Dumper I get
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper ;
my $filename = shift @ARGV ;
open(my $fh, '<', $filename) or die "Could not open file $filename $!";
my @values = split(';', $fh);
print Dumper \@values ;
This is what the Dumper returns :
$ ./spliton_semi.pl email_casper.txt
$VAR1 = [
'GLOB(0x80070b90)'
];
You do not "suck the whole file into an array". You don't even attempt to read from the file handle. Instead, you pass the file handle to split. Expecting a string, it stringifies the file handle into GLOB(0x80070b90).
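To see the difference for yourself, here is a small sketch that uses an in-memory file, so nothing on disk is needed. The sample text and variable names are mine, not from the question.

```perl
use strict;
use warnings;

# An in-memory "file" standing in for the real one (sample data is made up).
my $text = "a;b;c\n";
open my $fh, '<', \$text or die "can't open in-memory file: $!\n";

my @wrong = split /;/, $fh;    # splits the stringified handle, "GLOB(0x...)"

my $line = <$fh>;              # this actually reads a line from the handle
chomp $line;
my @right = split /;/, $line;

print "wrong: @wrong\n";
print "right: @right\n";
```

The first split never touches the file contents; only the readline operator <$fh> does.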
You could read the file into an array of lines as follows:
my @lines = <$fh>;
for my $line (@lines) {
    ...
}
But it's far simpler to read one line at a time.
while ( my $line = <$fh> ) {
...
}
In fact, there is no reason not to use ARGV here, simplifying your program to the following:
#!/usr/bin/perl
use strict;
use warnings;
use feature qw( say );
while (<>) {
chomp;
say for split /\s*;\s*/, $_;
}
This line
my @values = split(';', $fh);
is not reading from the filehandle like you think it is. You're actually calling split on the filehandle object itself.
You want this:
my $line = <$fh>;
my @values = split(';', $line);
Starting point:
#!/usr/bin/perl
use strict;
use warnings;
open(my $fh, '<', 'dummy.txt')
or die "$!";
my @values;
while (<$fh>) {
    chomp;
    # original
    # push(@values, split(';', $_));
    # handle white space
    push(@values, split(/\s*;\s*/, $_));
}
close($fh);
foreach my $val (@values) {
    print "$val\n";
}
exit 0;
Output for your example:
$ perl dummy.pl
casper1@foo.com
casper2@foo.com
casper3@foo.com
casper.casper4@foo.com

Print variable after closing the file in Perl

The code below works fine, but I want $ip to be printed after closing the file.
use strict;
use warnings;
use POSIX;
my $file = "/tmp/example";
open(FILE, "<$file") or die $!;
while ( <FILE> ) {
my $lines = $_;
if ( $lines =~ m/address/ ) {
my ($string, $ip) = (split ' ', $lines);
print "IP address is: $ip\n";
}
}
close(FILE);
sample data in /tmp/example file
$cat /tmp/example
country us
ip_address 192.168.1.1
server dell
This solution looks for the first line that contains ip_address followed by some space and a sequence of digits and dots.
Wrapping the search in a block makes the lexical variable $fh go out of scope at the end of the block. Because it is a file handle, the file is closed automatically at that point.
Note that I've used autodie to avoid the need to explicitly check the status of the open call.
This algorithm finds the first occurrence of ip_address and stops reading the file immediately.
use strict;
use warnings 'all';
use autodie;
my $file = '/tmp/example';
my $ip;
{
open my $fh, '<', $file;
while ( <$fh> ) {
if ( /ip_address\h+([\d.]+)/ ) {
$ip = $1;
last;
}
}
}
print $ip // 'undef', "\n";
output
192.168.1.1
Store all ips in an array and you'll then have it for later processing.
The shown code can also be simplified a lot. This assumes a four-number ip and data like that shown in the sample
use warnings;
use strict;
use feature 'say';
my $file = '/tmp/example';
open my $fh, '<', $file or die "Can't open $file: $!";
my @ips;
while (<$fh>) {
    if (my ($ip) = /ip_address\s*(\d+\.\d+\.\d+\.\d+)/) {
        push @ips, $ip;
    }
}
close $fh;
say for @ips;
Or, once you open the file, process all lines with a map
my @ips = map { /ip_address\s*(\d+\.\d+\.\d+\.\d+)/ } <$fh>;
The filehandle is here read in a list context, imposed by map, so all lines from the file are returned. The block in map applies to each in turn, and map returns a flattened list with results.
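The same list-context behaviour can be tried without a file. This is only a sketch: the sample lines and the helper name extract_ips are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for the file contents.
my @lines = (
    "country us\n",
    "ip_address 192.168.1.1\n",
    "server dell\n",
    "ip_address 10.0.0.7\n",
);

# Hypothetical helper. In list context a match with one capture group
# returns the captured text, or an empty list when it fails, so
# non-matching lines contribute nothing to the result.
sub extract_ips {
    return map { /ip_address\s*(\d+\.\d+\.\d+\.\d+)/ } @_;
}

my @ips = extract_ips(@lines);
print "$_\n" for @ips;
```

Since a failed match yields an empty list, no separate grep is needed to drop the lines without an address.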
Some notes
Use three-argument open, it is better
Don't assign $_ to a variable. To work with a lexical use while (my $line = <$fh>)
You can use split but here regex is more direct and it allows you to assign its match so that it is scoped. If there is no match the if fails and nothing goes onto the array
use warnings;
use strict;
my $file = "test";
my ($string, $ip);
open(my $FH, "<", $file) or die $!;
while (my $lines = <$FH>) {
    if ($lines =~ m/address/) {
        ($string, $ip) = (split ' ', $lines);
    }
}
close($FH);
print "IP address is: $ip\n";
This will give you the output you need. However, if several lines in the input file match, each match overwrites $ip, so only the last one is printed.

Loop through file in perl and remove strings with less than 4 characters

I am trying to read in a file, loop through it, remove any strings that have fewer than four characters, and then print the list. I come from the JavaScript world and Perl is brand new to me.
use strict;
use warnings;
sub lessThan4 {
    open( FILE, "<names.txt" );
    my @LINES = <FILE>;
    close( FILE );
    open( FILE, ">names.txt" );
    foreach my $LINE ( @LINES ) {
        print FILE $LINE unless ( $LINE.length() < 4 );
    }
    close( FILE );
}
use strict;
use warnings;
# automatically throw exception if open() fails
use autodie;
sub lessThan4 {
    my @LINES = do {
        # modern perl uses lexical, and three arg open
        open(my $FILE, "<", "names.txt");
        <$FILE>;
    };
    # remove newlines
    chomp(@LINES);
    open(my $FILE, ">", "names.txt");
    foreach my $LINE ( @LINES ) {
        print $FILE "$LINE\n" unless length($LINE) < 4;
        # possible alternative to 'unless'
        # print $FILE "$LINE\n" if length($LINE) >= 4;
    }
    close($FILE);
}
You're basically there. I hope you'll find some comments on your code useful.
# Well done for including these. So many new Perl users don't
use strict;
use warnings;
# Perl programs traditionally use all lower-case subroutine names
sub lessThan4 {
    # 1/ You should use lexical variables for filehandles
    # 2/ You should use the three-argument version of open()
    # 3/ You should always check the return value from open()
    open( FILE, "<names.txt" );
    # Upper-case variable names in Perl are assumed to be global variables.
    # This is a lexical variable, so name it using lower case.
    my @LINES = <FILE>;
    close( FILE );
    # Same problems with open() here.
    open( FILE, ">names.txt" );
    foreach my $LINE ( @LINES ) {
        # This is your biggest problem. Perl doesn't yet embrace the idea of
        # calling methods to get properties of a variable. You need to call
        # length() as a function.
        print FILE $LINE unless ( $LINE.length() < 4 );
    }
    close( FILE );
}
Rewriting to take all that into account, we get the following:
use strict;
use warnings;
sub less_than_4 {
    open( my $in_file_h, '<', 'names.txt' ) or die "Can't open file: $!";
    my @lines = <$in_file_h>;
    close( $in_file_h );
    open( my $out_file_h, '>', 'names.txt' ) or die "Can't open file: $!";
    foreach my $line ( @lines ) {
        # Note: $line will include the newline character, so you might need
        # to increase 4 to 5 here
        print $out_file_h $line unless length $line < 4;
    }
    close( $out_file_h );
}
I am trying to read in a file, loop through it, remove any strings that have fewer than four characters, and then print the list.
I suppose you need to remove strings from the file which are less than 4 chars in length.
#!/usr/bin/perl
use strict;
use warnings;
open(my $FH, "<", "names.txt") or die "Can't open names.txt: $!";
my @final_list;
while (my $line = <$FH>) {
    # keep only the words that are at least 4 characters long
    push @final_list, grep { length($_) >= 4 } split ' ', $line;
}
print "\nWords with at least 4 chars: @final_list\n";
#Please try this one:
use strict;
use warnings;
my @new;
while(<DATA>)
{
    #Push all the values less than 4 characters
    push(@new, $_) unless(length($_) > '4');
}
print @new;
__DATA__
Williams
John
Joe
Lee
Albert
Francis
Sun

Counting using Hashes in Perl

I am trying to determine whether a certain ID is present in my hash and, if it is, store the count in the hash. This is what I have:
#!/usr/bin/perl
open (INFILE, "parsedveronii.txt")
or die "cannot find infile";
while ($file=<INFILE>){
@array=split "\t", $file;
$hash{$array[1]}= ""; #the keys in my hash are subject IDS
}
open (INFILE1, "uniqueveroniiproteins.txt")
or die "cannot find infile";
while ($file1=<INFILE>){
@array = split "\n", $file1; #array[0] also contains subject IDs
if (exists ($hash{$array[0]})){ #if in my hash exists $array[0], keep count of it
$count++;
}
$hash{$array[1]{$count}}=$hash{$array[1]{$count}} +1;#put the count in hash
}
use Data::Dumper;
print Dumper (\%hash);
For some reason it's not executing the count. Any ideas? Any help is appreciated.
Always include use strict; and use warnings; at the top of each and EVERY script.
Your machinations in the second file loop seem a little contrived. If you're just trying to count the subject IDs, that can be done much more simply.
The following is a cleaned up version of your code, doing what I interpret as your intention.
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
open my $fh, '<', "parsedveronii.txt";
my %count;
while (my $line = <$fh>){
    chomp $line;
    my @array = split "\t", $line;
    $count{$array[1]} = 0; #the keys in my hash are subject IDS
}
open $fh, '<', "uniqueveroniiproteins.txt";
while (my $line = <$fh>){
    chomp $line;
    my @array = split "\t", $line; #array[0] also contains subject IDs
    if (exists $count{$array[0]}) { #if in my hash exists $array[0], keep count of it
        $count{$array[0]}++;
    }
}
use Data::Dumper;
print Dumper (\%count);
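If you want to check the counting logic without the two input files, the core of it can be sketched on plain lists. The helper name count_known_ids and the sample IDs are made up for illustration; this is a sketch of the same idea, not a drop-in replacement for the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: count how often each known ID appears in a list.
# Takes two array references: the known IDs and the IDs that were seen.
sub count_known_ids {
    my ($known, $seen) = @_;
    my %count = map { $_ => 0 } @$known;        # every known ID starts at 0
    for my $id (@$seen) {
        $count{$id}++ if exists $count{$id};    # ignore IDs we don't know
    }
    return \%count;
}

# Made-up sample IDs.
my $count = count_known_ids([qw(P1 P2 P3)], [qw(P2 P9 P2 P1)]);
print "$_: $count->{$_}\n" for sort keys %$count;
```

The exists test matters: without it, an unknown ID would be autovivified into the hash by the increment.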

How can I read the lines of a file into an array in Perl?

I have a file named test.txt that is like this:
Test
Foo
Bar
But I want to put each line in an array and print the lines like this:
line1 line2 line3
But how can I do this?
#!/usr/bin/env perl
use strict;
use warnings;
my @array;
open(my $fh, "<", "test.txt")
    or die "Failed to open file: $!\n";
while(<$fh>) {
    chomp;
    push @array, $_;
}
close $fh;
print join " ", @array;
Here is my single liner:
perl -e 'chomp(@a = <>); print join(" ", @a)' test.txt
Explanation:
read file by lines into @a array
chomp(..) - remove EOL symbols for each line
concatenate @a using space as separator
print result
pass file name as parameter
One more answer for you to choose from:
#!/usr/bin/env perl
open(FILE, "<", "test.txt") or die("Can't open file");
@lines = <FILE>;
close(FILE);
chomp(@lines);
print join(" ", @lines);
If you find yourself slurping files frequently, you could use the File::Slurp module from CPAN:
use strict;
use warnings;
use File::Slurp;
my @lines = read_file('test.txt');
chomp @lines;
print "@lines\n";
The most basic example looks like this:
#!/usr/bin/env perl
use strict;
use warnings;
open(F, "<", "test.txt") or die("Cannot open test.txt: $!\n"); # (1)
my @lines = ();
while(<F>) { chomp; push(@lines, $_); } # (2)
close(F);
print "@lines"; # (3) stringify
(1) is the place where the file is opened.
(2) File handles work nicely in list context (scalar or list context is determined by the left-hand side of the assignment), so if you assign the result of reading a file handle to an array, all the lines are slurped into the array. The lines are delimited (ended) by the value of $/, the input record separator. If you use English;, you can use $RS or $INPUT_RECORD_SEPARATOR. This value defaults to the newline character \n.
While this seemed to be a nice idea, I forgot that if you print all the lines, the trailing \n of each line is printed too. Bad me.
Originally the code was:
my @lines = <F>;
instead of the while loop. This is still a viable alternative, but you should swap (3) with chomping and then printing/stringifying all the elements:
for (@lines) { chomp; }
print "@lines";
(3) Stringifying means converting an array to a string and inserting the value $" between the array elements. This defaults to a space.
See: the perlvar page.
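As a small sketch of what $" does (the helper name joined and the sample data are made up for illustration), you can localize it to change the separator for a single interpolation:

```perl
use strict;
use warnings;

# Hypothetical helper: interpolate a list with a custom separator.
sub joined {
    my ($sep, @items) = @_;
    local $" = $sep;      # only affects interpolation inside this sub
    return "@items";
}

my @lines = ("Test", "Foo", "Bar");
print "@lines\n";                   # default separator, a single space
print joined(", ", @lines), "\n";   # same list, custom separator
```

Using local keeps the change from leaking into the rest of the program; for a one-off, join($sep, @items) does the same job more directly.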
So the actual 2nd try is:
#!/usr/bin/env perl
use strict;
use warnings;
open(F, "<", "test.txt") or die("Cannot open test.txt: $!\n"); # (1)
my @lines = <F>; # (2)
close(F);
chomp(@lines);
print "@lines"; # (3) stringify
This is the simplest version I could come up with:
perl -l040 -pe';' < test.txt
Which is roughly equivalent to:
perl -pe'
chomp; $\ = $/; # -l
$\ = "\040"; # -040
'
and:
perl -e'
LINE:
while (<>) {
chomp; $\ = $/; # -l
$\ = " "; # -040
} continue {
print or die "-p destination: $!\n";
}
'
This is the code that does this (assuming the code below is saved as script.pl):
use strict;
use warnings;
my @array = <> ;
chomp @array;
print "@array";
It is run by:
perl script.pl [your file]