I am trying to read values from an input file in Perl.
Input file looks like:
1-sampledata1 This is a sample test
and data for this continues
2-sampledata2 This is sample test 2
Data for this also is on second line
I want to read the above data so that data for 1-sampledata1 goes into #array1 and data for 2-sampledata2 goes in #array2 and so on.
I will have about 50 sections like this. like 50-sampledata50.
EDIT: The names wont always be X-sampledataX. I just did that for example. So names cant be in a loop. I think I'll have to type them manually
I so far have the following (which works). But I am looking for a more efficient way to do this..
foreach my $line(#body){
if ($line=~ /^1-sampledata1\s/){
$line=~ s/1-ENST0000//g;
$line=~ s/\s+//g;
push (#array1, $line);
#using splitarray because i want to store data as one character each
#for ex: i wana store 'This' as T H I S in different elements of array
#splitarray1= split ('',$line);
last if ($line=~ /2-sampledata2/);
}
}
foreach my $line(#body){
if ($line=~ /^2-sampledata2\s/){
$line=~ s/2-ENSBTAP0//g;
$line=~ s/\s+//g;
#splitarray2= split ('',$line);
last if ($line=~ /3-sampledata3/);
}
}
As you can see I have different arrays for each section and different for loops for each section. If I go with approach I have so far then I will end up with 50 for loops and 50 arrays.
Is there another better way to do this? In the end I do want to end up with 50 arrays but do not want to write 50 for loops. And since I will be looping through the 50 arrays later on in the program, maybe store them in an array? I am new to Perl so its kinda overwhelming ...
The first thing to notice is that you are trying to use variable names with integer suffixes: Don't. Use an array whenever you find your self wanting to do that. Second, you only need to read to go over the file contents once, not multiple times. Third, there is usually no good reason in Perl to treat a string as an array of characters.
Update: This version of the code uses existence of leading spaces to decide what to do. I am leaving the previous version up as well for reference.
#!/usr/bin/perl
use strict;
use warnings;
my #data;
while ( my $line = <DATA> ) {
chomp $line;
if ( $line =~ s/^ +/ / ) {
push #{ $data[-1] }, split //, $line;
}
else {
push #data, [ split //, $line ];
}
}
use Data::Dumper;
print Dumper \#data;
__DATA__
1-sampledata1 This is a sample test
and data for this continues
2-sampledata2 This is sample test 2
Data for this also is on second line
Previous version:
#!/usr/bin/perl
use strict;
use warnings;
my #data;
while ( my $line = <DATA> ) {
chomp $line;
$line =~ s/\s+/ /g;
if ( $line =~ /^[0-9]+-/ ) {
push #data, [ split //, $line ];
}
else {
push #{ $data[-1] }, split //, $line;
}
}
use Data::Dumper;
print Dumper \#data;
__DATA__
1-sampledata1 This is a sample test
and data for this continues
2-sampledata2 This is sample test 2
Data for this also is on second line
#! /usr/bin/env perl
use strict;
use warnings;
my %data;
{
my( $key, $rest );
while( my $line = <> ){
unless( ($rest) = $line =~ /^ \s+(.*)/x ){
($key, $rest) = $line =~ /^(.*?)\s+(.*)/;
}
push #{ $data{$key} }, $rest;
}
}
The code below is very similar to #Brad Gilbert's and #Sinan Unur's solutions:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my (%arrays, $label);
while (my $line = <DATA>)
{
($label, $line) = ($1, $2) if $line =~ /^(\S+)(.*)/; # new data block
$line =~ s/^\s+//; # strip whitespaces from the begining
# append data for corresponding label
push #{$arrays{$label}}, split('', $line) if defined $label;
}
print $arrays{'1-sampledata1'}[2], "\n"; # 'i'
print join '-', #{$arrays{'2-sampledata2'}}; # 'T-h-i-s- -i-s- -s-a-m-p-l
print Dumper \%arrays;
__DATA__
1-sampledata1 This is a sample test
and data for this continues
2-sampledata2 This is sample test 2
Data for this also is on second line
Output
i
T-h-i-s- -i-s- -s-a-m-p-l-e- -t-e-s-t- -2-D-a-t-a- -f-o-r- -t-h-i-s- -a-l-s-o- -i-s- -o-n- -s-e-c-o-n-d- -l-i-n-e-
$VAR1 = {
'2-sampledata2' => [
'T',
'h',
'i',
's',
' ',
'i',
's',
' ',
's',
'a',
'm',
'p',
'l',
'e',
' ',
't',
'e',
's',
't',
' ',
'2',
'D',
'a',
't',
'a',
' ',
'f',
'o',
'r',
' ',
't',
'h',
'i',
's',
' ',
'a',
'l',
's',
'o',
' ',
'i',
's',
' ',
'o',
'n',
' ',
's',
'e',
'c',
'o',
'n',
'd',
' ',
'l',
'i',
'n',
'e',
'
'
],
'1-sampledata1' => [
'T',
'h',
'i',
's',
' ',
'i',
's',
' ',
'a',
' ',
's',
'a',
'm',
'p',
'l',
'e',
' ',
't',
'e',
's',
't',
'a',
'n',
'd',
' ',
'd',
'a',
't',
'a',
' ',
'f',
'o',
'r',
' ',
't',
'h',
'i',
's',
' ',
'c',
'o',
'n',
't',
'i',
'n',
'u',
'e',
's',
'
'
]
};
You should, instead, use a hash map to arrays.
Use this regex pattern to get the index:
/^(\d+)-sampledata(\d+)/
And then, with my %arrays, do:
push($arrays{$index}), $line;
You can then access the arrays with $arrays{$index}.
Related
The answer is probably right in front of me...but I've pulled too much of my hair out on this one. I have a script that reads in a list of first names and last names, stores them in arrays, then randomly picks 10 first and last names to print. For example, it will print "John Doe." after grabbing the first and last names. I have everything working, but the program isn't printing them on the same line. It's automaticlly making a new line, like this:
John
Doe
Here is the script:
use strict;
#Open boy names
my $boyFile = "boyFirst.txt";
open (FH, "< $boyFile") or die "Can not open $boyFile for read: $!";
my #allNames;
while (<FH>) # While file is open, keep putting new lines into list
{
push (#allNames, $_);
}
close FH or die "Can not close $boyFile: $!";
#open girl names
my $girlFile = "girlFirst.txt";
open (FH, "< $girlFile") or die "Can not open $girlFile for read: $!";
#my #girlLines;
while (<FH>)
{
push (#allNames, $_); # While file is open, keep putting new lines into list
}
close FH or die "Can not close $girlFile: $!";
#open last names
my $lastFile = "lastName.txt";
open (FH, "< $lastFile") or die "Can not open $lastFile for read: $!";
my #lastLines;
while (<FH>)
{
push (#lastLines, $_); # While file is open, keep putting new lines into list
}
close FH or die "Can not close $lastFile: $!";
#Generate Alphabet
my #alpha = ('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z');
my $name;
my $lastName;
my $randomFirstName;
my $randomLastName;
for (1 .. 10)
{
$name = rand #allNames;
$lastName = rand #lastLines;
$randomFirstName = $allNames[$name];
$randomLastName = $lastLines[$lastName];
printf("$randomFirstName");
printf("$randomLastName");
}
Tips, and advice are also appreciated. New to scripting :)
Add chomp; before all your push statements. It will remove the line ending character from $_ before it is pushed to arrays. Then add a print "\n"; after the final print statement.
I would like to write an simple perl script to generate all possible words for given phone number.
I started with definition of an array:
my #nums = (
['0'],
['1'],
['2', 'a', 'b', 'c'],
['3', 'd', 'e', 'f'],
['4', 'g', 'h', 'i'],
['5', 'j', 'k', 'l'],
['6', 'm', 'n', 'o'],
['7', 'p', 'q', 'r', 's'],
['8', 't', 'u', 'v'],
['9', 'w', 'x', 'y', 'z']
);
The final script should generate following output:
$ num2word 12
12
1a
1b
1c
$ num2word 213
213
21d
21e
21f
a13
a1d
a1e
a1f
b13
b1d
b1e
b1f
c13
c1d
c1e
c1f
I am looking for any module which can do most part of the job (something like List::Permutor which does not seem to qualify for this task).
Any hints?
Thanks!
Our very own #brian d foy has solved this problem with his Set::CrossProduct module.
use Set::CrossProduct;
my $iterator = Set::CrossProduct->new(
[ [ qw(8 t u v) ], [ qw(0) ], [ qw(7 p q r s) ] ] );
print "#$_\n" for $iterator->combinations;
Output:
8 0 7
8 0 p
8 0 q
8 0 r
8 0 s
t 0 7
t 0 p
t 0 q
t 0 r
t 0 s
u 0 7
u 0 p
u 0 q
u 0 r
u 0 s
v 0 7
v 0 p
v 0 q
v 0 r
v 0 s
This does what you ask.
use strict;
use warnings;
my #nums = (
[ qw/ 0 / ],
[ qw/ 1 / ],
[ qw /2 a b c / ],
[ qw /3 d e f / ],
[ qw /4 g h i / ],
[ qw /5 j k l / ],
[ qw /6 m n o / ],
[ qw /7 p q r s / ],
[ qw /8 t u v / ],
[ qw /9 w x y z / ],
);
list_matching('12');
list_matching('213');
sub list_matching {
my ($num) = #_;
my #num = $num =~ /\d/g;
my #map = (0) x #num;
do {
print join('', map { $nums[$num[$_]][$map[$_]] } 0 .. $#num), "\n";
my $i = $#map;
while ($i >= 0) {
last if ++$map[$i] < #{ $nums[$num[$i]] };
$map[$i--] = 0;
}
} while grep $_, #map;
}
output
12
1a
1b
1c
213
21d
21e
21f
a13
a1d
a1e
a1f
b13
b1d
b1e
b1f
c13
c1d
c1e
c1f
See the functions in Algorithm::Combinatorics.
Actually, does work, too early for me...
use autodie;
use strict;
use warnings;
my #nums = (
['0'],
['1'],
['2', 'a', 'b', 'c'],
['3', 'd', 'e', 'f'],
['4', 'g', 'h', 'i'],
['5', 'j', 'k', 'l'],
['6', 'm', 'n', 'o'],
['7', 'p', 'q', 'r', 's'],
['8', 't', 'u', 'v'],
['9', 'w', 'x', 'y', 'z']
);
my $input = shift || die "Need a number!\n";
die "Input not numeric!\n" unless $input =~ m/^\d+$/;
my #digits = split //, $input;
my #rows;
push #rows, $nums[$_] for #digits;
print_row(0,'');
exit;
sub print_row {
my $i = shift;
my $word = shift;
my $row = $rows[$i];
for my $j (0..$#{$row}) {
my $word2 = $word . $row->[$j];
if ($i < $#rows) {
print_row($i+1, $word2);
}
else {
print "$word2\n";
}
}
}
No modules required:
my #nums = (
['0'],
['1'],
['2', 'a', 'b', 'c'],
['3', 'd', 'e', 'f'],
['4', 'g', 'h', 'i'],
['5', 'j', 'k', 'l'],
['6', 'm', 'n', 'o'],
['7', 'p', 'q', 'r', 's'],
['8', 't', 'u', 'v'],
['9', 'w', 'x', 'y', 'z']
);
print "$_\n" while glob join '', map sprintf('{%s}', join ',', #{$nums[$_]}), split //, $ARGV[0]
use strict;
use warnings;
my #nums = (
['0'], ['1'], ['2', 'a', 'b', 'c'],
['3', 'd', 'e', 'f'], ['4', 'g', 'h', 'i'],
['5', 'j', 'k', 'l'], ['6', 'm', 'n', 'o'],
['7', 'p', 'q', 'r', 's'], ['8', 't', 'u', 'v'],
['9', 'w', 'x', 'y', 'z']);
num2word(12);
num2word(213);
sub num2word {
my ($i, $n, $t) = ($_[0]=~/(.)(.*)/, $_[1]);
for (#{$nums[$i]}) {
print "$t$_\n" and next if !length($n);
num2word($n, defined $t ? $t.$_ : $_);
}
}
A basic recursive solution:
#!/usr/bin/perl
use strict;
use warnings;
my $phone_number = $ARGV[0] or die "No phone number";
my #nums = (
['0'],
['1'],
[ '2', 'a', 'b', 'c' ],
[ '3', 'd', 'e', 'f' ],
[ '4', 'g', 'h', 'i' ],
[ '5', 'j', 'k', 'l' ],
[ '6', 'm', 'n', 'o' ],
[ '7', 'p', 'q', 'r', 's' ],
[ '8', 't', 'u', 'v' ],
[ '9', 'w', 'x', 'y', 'z' ]
);
my %letters = map { shift #{$_} => $_ } #nums;
my #permutations;
sub recurse {
my $str = shift;
my $done = shift || '';
unless ($str) {
push #permutations, $done;
return;
}
my $next = substr( $str, 0, 1 );
$str = substr( $str, 1 );
recurse( $str, $done . $next );
if ( my #chars = #{ $letters{$next} } ) {
recurse( $str, $done . $_ ) foreach #chars;
}
}
recurse($phone_number);
print "$_\n" foreach #permutations;
and:
perl num2word 12
12
1a
1b
1c
perl num2word 213
213
21d
21e
21f
a13
a1d
a1e
a1f
b13
b1d
b1e
b1f
c13
c1d
c1e
c1f
I'm new in perl and have a question concerning the use of hashes of arrays to retrieve specific columns. My code is the following:
my %hash = ( name1 => ['A', 'A', 'B', 'A', 'A', 'B'],
name2 => ['A', 'A', 'D', 'A', 'A', 'B'],
name3 => ['A', 'A', 'B', 'A', 'A', 'C'],
);
#the values of %hash are returned as arrays not as string (as I want)
foreach my $name (sort keys %hash ) {
print "$name: ";
print "$hash{$name}[2]\n";
}
for (my $i=0; $i<$length; $i++) {
my $diff = "no";
my $letter = '';
foreach $name (sort keys %hash) {
if (defined $hash{$name}[$i]) {
if ($hash{$name}[$i] =~ /[ABCD]/) {
$letter = $hash{$name}[$i];
}
elsif ($hash{$name}[$i] ne $letter) {
$diff = "yes";
}
}
if ( $diff eq "yes" ) {
foreach $name (sort keys %hash) {
if (defined $hash{$name}[$i]) { $newhash{$name} .= $hash{$name}[$i]; }
}
}
}
}
foreach $name (sort keys %newhash ) {
print "$name: $newhash{$name} \n";
}
I want the output of this program to be something like a new hash with only the variable columns:
my %newhash = ( name1 => 'BB',
name2 => 'DB',
name3 => 'BC',
);
but is only given this message:
Use of uninitialized value $letter in string ne at test_hash.pl line 31.
Does anyone have ideas about this?
Cheers
EDIT:
Many thanks for your help in this question.
I edited my post to confirm with the suggestions of frezik, Dan1111, Jean. You're right, now there are no warnings but I can not also get any output from the print statement and I don't have any clue about this...
#TLP: ok I just generate a random set of columns without any order purpose. What I really want is about how the letters vary, which means that if for the same array index (stored in the hash) the letters are the same, discard those, but if the letters are different between keys, I want to store that index column in a new hash.
Cheers.
I assume that by this, you want to match any of the letters A,B,C, or D:
if ($hash{$name}[$i] =~ /ABCD/)
However, as written, it matches the exact string "ABCD". You need a character class for what you want:
if ($hash{$name}[$i] =~ /[ABCD]/)
However, you have other logic problems as well, that can lead you to compare to $letter before it has been set. Setting it to empty (as Jean suggested) is a simple option that may help.
Another problem is here:
print "$name: #{ $newhash{$name} }\n";
%newhash is not a hash of arrays, so you need to remove the array dereference:
print "$name: $newhash{$name} \n";
You may be interested in this alternative solution
use strict;
use warnings;
my %hash = (
name1 => ['A', 'A', 'B', 'A', 'A', 'B'],
name2 => ['A', 'A', 'D', 'A', 'A', 'B'],
name3 => ['A', 'A', 'B', 'A', 'A', 'C'],
);
my #columns;
for my $list (values %hash) {
$columns[$_]{$list->[$_]}++ for 0 .. $#$list;
}
my %newhash = %hash;
for my $list (values %newhash) {
$list = join '', map $list->[$_], grep keys %{$columns[$_]} > 1, 0 .. $#$list;
}
use Data::Dump;
dd \%newhash;
output
{ name1 => "BB", name2 => "DB", name3 => "BC" }
I think it's a mistake to check the letters one by one. It seems easier to just collect all the letters and check them at once. The List::MoreUtils module's uniq function can then quickly determine if the letters vary, and they can be transposed into the resulting hash easily.
use strict;
use warnings;
use Data::Dumper;
use List::MoreUtils qw(uniq);
my %hash = ( name1 => ['A', 'A', 'B', 'A', 'A', 'B'],
name2 => ['A', 'A', 'D', 'A', 'A', 'B'],
name3 => ['A', 'A', 'B', 'A', 'A', 'C'],
);
my #keys = keys %hash;
my $len = $#{ $hash{$keys[0]} }; # max index
my %new;
for my $i (0 .. $len) {
my #col;
for my $key (#keys) {
push #col, $hash{$key}[$i];
}
if (uniq(#col) != 1) { # check for variation
for (0 .. $#col) {
$new{$keys[$_]} .= $col[$_];
}
}
}
print Dumper \%new;
Output:
$VAR1 = {
'name2' => 'DB',
'name1' => 'BB',
'name3' => 'BC'
};
Your scalar $letter is not defined. Add this to get rid of the warning.
my $letter='';
if ($hash{$name}[$i] =~ /ABCD/) {
The regex above would match a string like __ABCD__ or ABCD1234, but never a lone A or B. You probably wanted to match any one of those letters, and it's a good idea to anchor the regex, too:
if ($hash{$name}[$i] =~ /\A [ABCD] \z/x) {
(The /x option means that whitespace is ignored, which helps make regexes a bit easier to read.)
You would still get the warning in the example above when $i == 2 and the inner loop happens to hit the keys name1 or name3 first. Since the regex doesn't match T, $letter will remain uninitialized.
Great. Many thanks for all your help in this question.
I tried a code based on the suggestion of TLP and worked just fine. Because I'm relatively new in perl I thought this code was more easier for me to understand than the code of Borodin. What I did was:
#!/usr/bin/perl
use strict;
use warnings;
use List::MoreUtils qw(uniq);
my %hash = ( name1 => ['A', 'A', 'T', 'A', 'A', 'T', 'N', 'd', 'd', 'D', 'C', 'T', 'T', 'T'],
name2 => ['A', 'A', 'D', 'A', 'A', 'T', 'A', 'd', 'a', 'd', 'd', 'T', 'T', 'C'],
name3 => ['A', 'A', 'T', 'A', 'A', 'C', 'A', 'd', 'd', 'D', 'C', 'T', 'C', 'T'],
);
my #keys = keys %hash;
my $len = $#{ $hash{$keys[0]} }; # max index
my %new;
for (my $i=0; $i<$length; $i++) {
my #col;
for my $key (#keys) {
if ($hash{$key}[$i] =~ /[ABCDT]/) { #added a pattern match
push #col, $hash{$key}[$i];
}
}
if (uniq(#col) != 1) { # check for variation
for (0 .. $#col) {
$new{$keys[$_]} .= $col[$_];
}
}
}
foreach my $key (sort keys %new ) {
print "$key: $new{$key}\n";
}
However, when playing with the uniq function (if (uniq(#col) == 1)), I noticed that the output was a little buggy:
name1: AAAAADCT
name2: AAAAADCT
name3: AAAAT
It seems that is not preserving the initial order of keys => values. Does anyone has a hint about this?
Cheers.
i am trying to scrape info from http://www.soccerbase.com/tournaments/tournament.sd?comp_id=1 from lines 1184 to 1325, basically the up coming games for the next 7 days. i have the code working for a single instance, but i can't figure out how to iterate the code so that it will scrape all the games info until it hits the end of the 7 day's worth of games. Is there some sort of loop i can create that will scrape until i hit a certain tag or something? Here is my code so far, thanks in advance!
my $page = WWW::Mechanize->new;
$page->get('http://www.soccerbase.com/tournaments/tournament.sd?comp_id=1');
my $stream = HTML::TokeParser->new(\$page->{content});
my #fixture;
my $tag = $stream->get_tag("td");
while($tag->[1]{class} ne "dateTime"){
$tag = $stream->get_tag("td");
}
if ($tag->[1]{class} eq "dateTime") {
push(#fixture, $stream->get_trimmed_text("/a"));
}
$stream->get_tag("a");
$stream->get_tag("a");
push(#fixture, $stream->get_trimmed_text("/a"));
$stream->get_tag("a");
push(#fixture, $stream->get_trimmed_text("/a"));
foreach $element (#fixture){
print $element, "\t";
}
print "\n";
Try Web::Query for parsing HTML, it is a much nicer to use than TokeParser. It works declarative instead of imperative and you select elements with CSS expressions.
If there is a score v, add the row to the result set, else discard the row.
use Web::Query 'wq';
my $football_matches = wq($mech->content)
->find('tr.match')
->map(sub {
my (undef, $e) = #_;
return 'v' eq $e->find('td.score')->text
? [
$e->attr('id'),
map { $e->find("td.$_")->text }
(qw(tournament dateTime homeTeam score awayTeam prices))
]
: ();
});
use Data::Dumper; print Dumper $football_matches;
$VAR1 = [
['tn7gc635476', '', ' Mo 12Mar 2012 ', 'Arsenal', 'v', 'Newcastle', ' '],
['tn7gc649937', '', ' Tu 13Mar 2012 ', 'Liverpool', 'v', 'Everton', ' '],
['tn7gc635681', '', ' Sa 17Mar 2012 ', 'Fulham', 'v', 'Swansea', ' '],
['tn7gc635661', '', ' Sa 17Mar 2012 ', 'Wigan', 'v', 'West Brom', ' '],
['tn7gc635749', '', ' Su 18Mar 2012 ', 'Wolves', 'v', 'Man Utd', ' '],
['tn7gc635556', '', ' Su 18Mar 2012 ', 'Newcastle', 'v', 'Norwich', ' ']
];
I have the following code.
Here I am matching the vowels characters words:
if ( /(a)+/ and /(e)+/ and /(i)+/ and /(o)+/ and /(u)+/ )
{
print "$1#$2#$3#$4#$5\n";
$number++;
}
I am trying to get the all matched patterns using grouping, but I am getting only the last expression pattern, which means the fifth expression of the if condition. Here I know that it is giving only one pattern because last pattern matching in if condition. I want to get all matched patterns, however. Can anyone help me out of this problem?
It is not quite clear what you want to do. Here are some thoughts.
Are you trying to count the number of vowels? In which case, tr will do the job:
my $count = tr/aeiou// ;
printf("string:%-20s count:%d\n" , $_ , $count ) ;
output :
string:book count:2
string:stackoverflow count:4
Or extract the vowels
my #array = / ( [aeiou] ) /xg ;
print Dumper \#array ;
Output from "stackoverflow question"
$VAR1 = [
'a',
'o',
'e',
'o',
'u',
'e',
'i',
'o'
];
Or extract sequences of vowels
my #array = / ( [aeiou]+ ) /xg ;
print Dumper \#array ;
Output from "stackoverflow question"
$VAR1 = [
'a',
'o',
'e',
'o',
'ue',
'io'
];
You could use
sub match_all {
my($s,#patterns) = #_;
my #matches = grep #$_ >= 1,
map [$s =~ /$_/g] => #patterns;
wantarray ? #matches : \#matches;
}
to create an array of non-empty matches.
For example:
my $string = "aaa e iiii oo uuuuu aa";
my #matches = match_all $string, map qr/$_+/ => qw/ a e i o u /;
if (#matches == 5) {
print "[", join("][", #$_), "]\n"
for #matches;
}
else {
my $es = #matches == 1 ? "" : "es";
print scalar(#matches), " match$es\n";
}
Output:
[aaa][aa]
[e]
[iiii]
[oo]
[uuuuu]
An input of, say, "aaa iiii oo uuuuu aa" produces
4 matches
You have 5 patterns with one matching group () each. Not 1 pattern with 5 groups.
(a)+ looks for a string containing a, aa, aaa, aaaa etc. The match will be multiple a's, not the word containing the group of a-s.
Your if( ...) is true if $_ contains one or more of 'a','e','i','o','u'.