Working with Types in Perl - perl

I worked with Fortran since years and with Types within Fortran of course.
Now, since I am employed at a new company, I have to work with Perl.
Since my main work is to operate with data within a text file and have to read through the file, I want to generate "objects" or "types" from this data.
In Fortran I would do that so the further work is much easier :)
Is it possible with Perl to generate types for a strict type schema like I can to it in Fortran?
For example ... this is what I would do in Fortran:
Type demoType
integer i
character(256) str
end Type
Then I can work with variables which are of this type. Type(demoType) demoVar e.g.
Is there any chance to work like this in Perl?
Sorry for my English ... I am native German speaking :)

In FORTRAN, much like C, variables mostly occupy a fixed space in memory. So a Type like your demoType describes how an item is laid out, and things like strings have to be preallocated a maximum size.
In Perl, a scalar value - mostly strings and numbers - occupies a lot more than just the space for the value itself, and so cannot be mapped in the same way.
To program in Perl you have to distance yourself from the implementation and allow the run time system to look after things for you. A string can vary in size indefinitely (subject to the memory of the computer) and the preferred way of associating sets of values is by using a hash (a type of array that is indexed by arbitrary strings instead of numbers).
For instance, a single demoType value might look like
my $demo_type = {
i => 99,
str => 'text'
}
and if you wanted to work with many such pairings then you should look at Perl objects.
If you have no experience of object-oriented software then you have a long and fascinating journey ahead of you!

There are some CPAN modules that can help you. This is an option: Class::Struct (which belongs to the standar perl distribution, no further installation is required). Example (taken from module documentation):
use Class::Struct;
struct( Rusage => {
ru_utime => 'Timeval', # user time used
ru_stime => 'Timeval', # system time used
});
struct( Timeval => [
tv_secs => '$', # seconds
tv_usecs => '$', # microseconds
]);
# create an object:
my $t = Rusage->new(ru_utime=>Timeval->new(), ru_stime=>Timeval->new());
# $t->ru_utime and $t->ru_stime are objects of type Timeval.
# set $t->ru_utime to 100.0 sec and $t->ru_stime to 5.0 sec.
$t->ru_utime->tv_secs(100);
$t->ru_utime->tv_usecs(0);
$t->ru_stime->tv_secs(5);
$t->ru_stime->tv_usecs(0);

I would advise you to look at Modern Perl.
If you want to work with types, then look at Moose to work in an object-oriented way and expose methods and attributes.
The core perl is dynamically evaluation a variable to assign a type to a variable.

I think you may be trying to store the integer value, and the stringified value of the same number.
{
i = 123;
str = "123";
}
If that is the case, you are trying to do something that Perl does automatically for you.
my $integer = 123;
my $string = $integer;
print $string;
Another possibility is that you need to associate a number with a string. In which case you can just use:
an array.
my #data = ( 123, 'string' );
print 'integer: ', $data[0], "\n";
print 'string: ', $data[1], "\n";
an array ref.
my $data = [ 123, 'string' ];
print 'integer: ', $data->[0], "\n";
print 'string: ', $data->[1], "\n";
a hash
my %data = (
123 => 'string',
);
print $data{123}, "\n";
a hash ref
my $data = {
123 => 'string',
};
print $data->{123}, "\n";
At any rate you should really post anonymized example data, so that we can actually help you to learn Perl.
I think you should read the documentation for Perl. It is quite extensive. It is actually how I learned to program in Perl.
At the very least read perlintro and perldata. If you need to build nested data structures you should also have a look at perldsc.
You could also look at learn.perl.org.

Related

Global symbol "%formsequence" requires explicit package name at line 37

I am trying to execute a Perl CGI script, but I am getting an error:
Global symbol "%formsequence" requires explicit package name at line 37.
I did some research and found that use strict forces me to declare the variables before I use them or store any data, but in my program I have declared them and that is why I don't understand the error. Here is my script:
#!/usr/bin/perl -w
use strict;
my %errors;
my %form;
my #formsequence;
my %fields = (
"lname" => "Last Name",
"phone" => "Phone",
"fname" => "Fist Name"
);
my %patterns = (
"lname" => '[A-Z][a-z]{2,50}',
"phone" => '\d{3}-\d{3}-\d{4}',
"fname" => '[A-Z][A-Za-z]{2,60}'
);
#formsequence = ("lname", "phone", "phone");
print "content-type/html\n\n";
if ($ENV{REQUEST_METHOD} eq "POST") {
&readformdata;
if (&checkrequiredfields) {
print "Form Data validated successfully!";
}
else {
foreach (keys (%fields)) {
if ($fields{$_} != $formsequence{$_}) { <-- line 37
$errors{$_}="Not in correct sequence\n";
}
}
}
I suspect you may be viewing the concept of an 'array' from the perspective of a PHP developer. In Perl a hash and an array are separate data structures.
Arrays are declared using the # prefix and you refer to elements using square brackets around an integer index:
my #names = ('Tom', 'Dick', 'Larry');
say $names[0]; # 'Tom'
say $names[-1]; # 'Larry'
my $count = #names; # $count now equals 3
foreach my $i (0..$#names) {
say $names[$i];
}
Hashes are declared using the % prefix and you refer to elements using curly braces around a string key:
my %rgb = (
red => '#ff0000',
white => '#ffffff',
blue => '#0000ff',
);
say $rgb{'red'}; # '#ff0000'
say $rgb{blue}; # '#0000ff' quotes optional around bareword keys
foreach my $k (keys %rgb) {
say $rgb{$k};
}
You wouldn't normally use the keys function on an array - in fact older versions of Perl don't even support it, newer versions will return a range of integers (e.g.: 0..2).
When you call keys on a hash the keys have no inherent order, and the order may change.
Other things worth knowing:
Using & to call a function is really old style (i.e. early 90s), these days we'd use readformdata() instead of &readformdata.
The != operator is a numeric comparison operator so only use it when the values you're comparing are actually numbers. If you want to check two strings are 'not equal' then use ne instead (e.g.: if($thing1 ne $thing2) { ... }).
This seems to be some rather old Perl.
You use -won the shebang line rather than use warnings (which has been available since Perl 5.6.0 was released in 2000.
You use a (presumably) custom readformdata() function instead of CGI,pm's params() method. CGI.pm was added to the Perl core in 1997 (it was recently removed - but that's not a reason to not use it).
You use ampersands on function calls. These haven't been necessary since Perl 5 was released in 1994.
Your problem is caused by declaring an array, #formsequence, which you then try to access as a hash - $formsequence{$_} means, "look up the key $_ in the hash %formsequence. In Perl, arrays and hashes are two completely different data types and it is possible (although not recommended for, hopefully, obvious reasons) to have an array and a hash with the same name.
You declare arrays like this - using #:
my #array = ('foo', 'bar', 'baz');
And access individual element like this - using [...]:
print $array[0]; # prints 'foo'
You declare hashes like this - using `%':
my %hash = (foo => 'Foo', bar => 'Bar', baz => 'Baz');
And access individual elements like ths - using {...}:
print $hash{foo}; # prints 'Foo'
Arrays are indexed using integers and are ordered. Hashes are indexed using strings and are unordered.
I can't really suggest a fix for your code as it's not really clear what you are trying to do. It appears that you want to check that parameters appear in a certain order, but this is doomed to failure as a) you can't guarantee the order in which CGI parameters are transmitted from the browser to your web server and b) you can't guarantee the order in which keys(%fields) will return the keys from your %fields hash.
If you explain in a little more detail what you are trying to do, then we might be able to help you more.

Need help understanding portion of script (globs and references)

I was reviewing this question, esp the response from Mr Eric Strom, and had a question regarding a portion of the more "magical" element within. Please review the linked question for the context as I'm only trying to understand the inner portion of this block:
for (qw($SCALAR #ARRAY %HASH)) {
my ($sigil, $type) = /(.)(.+)/;
if (my $ref = *$glob{$type}) {
$vars{$sigil.$name} = /\$/ ? $$ref : $ref
}
}
So, it loops over three words, breaking each into two vars, $sigil and $type. The if {} block is what I am not understanding. I suspect the portion inside the ( .. ) is getting a symbolic reference to the content within $glob{$type}... there must be some "magic" (some esoteric element of the underlying mechanism that I don't yet understand) relied upon there to determine the type of the "pointed-to" data?
The next line is also partly baffling. Appears to me that we are assigning to the vars hash, but what is the rhs doing? We did not assign to $_ in the last operation ($ref was assigned), so what is being compared to in the /\$/ block? My guess is that, if we are dealing with a scalar (though I fail to discern how we are), we deref the $ref var and store it directly in the hash, otherwise, we store the reference.
So, just looking for a little tale of what is going on in these three lines. Many thanks!
You have hit upon one of the most arcane parts of the Perl language, and I can best explain by referring you to Symbol Tables and Typeglobs from brian d foy's excellent Mastering Perl. Note also that there are further references to the relevant sections of Perl's own documentation at the bottom of the page, the most relevant of which is Typeglobs and Filehandles in perldata.
Essentially, the way perl symbol tables work is that every package has a "stash" -- a "symbol table hash" -- whose name is the same as the package but with a pair of trailing semicolons. So the stash for the default package main is called %main::. If you run this simple program
perl -E"say for keys %main::"
you will see all the familiar built-in identifiers.
The values for the stash elements are references to typeglobs, which again are hashes but have keys that correspond to the different data types, SCALAR, ARRAY, HASH, CODE etc. and values that are references to the data item with that type and identifier.
Suppose you define a scalar variable $xx, or more fully, $main:xx
our $xx = 99;
Now the stash for the main package is %main::, and the typeglob for all data items with the identifier xx is referenced by $main::{xx} so, because the sigil for typeglobs is a star * in the same way that scalar identifiers have a dollar $, we can dereference this as *{$main::{xx}}. To get the reference to the scalar variable that has the identifier xx, this typeglob can be indexed with the SCALAR string, giving *{$main::{xx}}{SCALAR}. Once more, this is a reference to the variable we're after, so to collect its value it needs dereferencing once again, and if you write
say ${*{$main::{xx}}{SCALAR}};
then you will see 99.
That may look a little complex when written in a single statement, but it is fairly stratighforward when split up. The code in your question has the variable $glob set to a reference to a typeglob, which corresponds to this with respect to $main::xx
my $type = 'SCALAR';
my $glob = $main::{xx};
my $ref = *$glob{$type};
now if we say $ref we get SCALAR(0x1d12d94) or similar, which is a reference to $main::xx as before, and printing $$ref will show 99 as expected.
The subsequent assignment to #vars is straightforward Perl, and I don't think you should have any problem understanding that once you get the principle that a packages symbol table is a stash of typglobs, or really just a hash of hashes.
The elements of the iteration are strings. Since we don't have a lexical variable at the top of the loop, the element variable is $_. And it retains that value throughout the loop. Only one of those strings has a literal dollar sign, so we're telling the difference between '$SCALAR' and the other cases.
So what it is doing is getting 3 slots out of a package-level typeglob (sometimes shortened, with a little ambiguity to "glob"). *g{SCALAR}, *g{ARRAY} and *g{HASH}. The glob stores a hash and an array as a reference, so we simply store the reference into the hash. But, the glob stores a scalar as a reference to a scalar, and so needs to be dereferenced, to be stored as just a scalar.
So if you had a glob *a and in your package you had:
our $a = 'boo';
our #a = ( 1, 2, 3 );
our %a = ( One => 1, Two => 2 );
The resulting hash would be:
{ '$a' => 'boo'
, '%a' => { One => 1, Two => 2 }
, '#a' => [ 1, 2, 3 ]
};
Meanwhile the glob can be thought to look like this:
a =>
{ SCALAR => \'boo'
, ARRAY => [ 1, 2, 3 ]
, HASH => { One => 1, Two => 2 }
, CODE => undef
, IO => undef
, GLOB => undef
};
So to specifically answer your question.
if (my $ref = *$glob{$type}) {
$vars{$sigil.$name} = /\$/ ? $$ref : $ref
}
If a slot is not used it is undef. Thus $ref is assigned either a reference or undef, which evaluates to true as a reference and false as undef. So if we have a reference, then store the value of that glob slot into the hash, taking the reference stored in the hash, if it is a "container type" but taking the value if it is a scalar. And it is stored with the key $sigil . $name in the %vars hash.

Data types for parameters of subroutines / functions?

In Perl, can one specifiy data types for the parameters of subroutines? E.g. when using a dualvar in a numeric context like exit:
use constant NOTIFY_DIE_MAIL_SEND_FAILED => dualvar 3, 'NOTIFY_DIE_MAIL_SEND_FAILED';
exit NOTIFY_DIE_MAIL_SEND_FAILED;
How does Perl in that case know, that exit expects a numeric parameter? I didn't see a way to define data types for the parameters of subroutines like you do it in Java? (where I could understand how the data type is known as it is explicitely defined)
The whole point of the dualvar is that it behaves as a number or text depending on what you want. In cases where that's not obvious (to you more importantly than to perl) then make it clear.
exit 0 + NOTIFY_DIE_MAIL_SEND_FAILED;
As for explicitly typing parameters, that's not something built in. Perl is a much more dynamic language than Java so it's not common to check/force the type of every parameter or variable. In particular, a perl sub can accept different numbers of parameters and even different structures.
If you want to validate parameters (for an external API for example) try something like Params::Validate
In addition, Moose and Moo allow a certain level of attribute typing and even coercion.
In Perl, scalars are both numeric and stringy at the same time. It is not the variables themselves that distinguish between strings and numbers, but the operators you work with. While the addition + only uses a number, the concatenation . only uses strings.
In more strongly typing languages, e.g. Java, the addition operator doubles as addition and concatenation operator, because it can access type information.
"1" + 2 + 3 is still sick in Java, whereas Perl can cleanly distinguish between "1" + 2 + 3 == 6 and "1" . 2 . 3 eq "123".
You can force numeric or stringy context of a variable by adding 0 or concatenating the empty string:
sub foo {
my ($var) = #_;
$var += 0; # $var is numeric
$var .= ""; # $var is stringy now
}
Perl is quite different from Java in that - Perl is dynamically typed language, because it does not requires its variables to be typed at compile time..
Whereas, Java is statically typed (as you know already)
Perl determines the type of the variable depending upon the context it is used..
There can be only two context: -
List Context
Scalar Context
And the context is defined by the operator or function that is used..
For EG:-
# Define a list
#arr = qw/rohit jain/;
# Define a scalar
$num = 2
# Here perl will evaluate #arr in scalar context and take its length..
# so, below code will evaluate to : - value = 2 / 2
$value = #arr / $num;
# Here since it is used with a foreach loop, #arr will be taken as in list context
foreach (#arr) {
say $_;
}
# Above foreach loop will output: - `rohit` \n `jain` to the console..
You can force the type by:
use Scalar::Util qw(dualvar);
use constant NOTIFY_DIE_MAIL_SEND_FAILED => dualvar 3, 'NOTIFY_DIE_MAIL_SEND_FAILED';
say NOTIFY_DIE_MAIL_SEND_FAILED;
say int(NOTIFY_DIE_MAIL_SEND_FAILED);
output:
NOTIFY_DIE_MAIL_SEND_FAILED
3
How does Perl in that case know, that exit expects a numeric parameter?
exit expect a number as is part of its specification and its behaviour is kind of undefined if you pass it a non-integer value (i.e. you should not do it.
Now, in this particular case, how does dualvar manages to return either value type depending of the context?
I don't know how Scalar::Util's dualvar is implemented but you can write something similar with overload instead.
You certainly can modify the behaviour for a blessed object:
#!/usr/bin/env perl
use strict;
use warnings;
{package Dualvar;
use overload
fallback => 1,
'0+' => sub { $_[0]->{INT_VAL} },
'""' => sub { $_[0]->{STR_VAL} };
sub new {
my $class = shift;
my $self = { INT_VAL => shift, STR_VAL => shift };
bless($self,$class);
}
1;
}
my $x = Dualvar->new(31,'Therty-One');
print $x . " + One = ",$x + 1,"\n"; # Therty-One + One = 32
From the docs, it seems that overload actually changes the behaviour within the declaration scope so you should be able to change the behaviour of some common operators locally for any operand.
If exit does use one of those overloadable operations to evaluate its parameter into a integer then this solution would do.
I didn't see a way to define data types for the parameters of subroutines like you do it in Java?
As already said by others... this is not the case in Perl, at least not at compilation time, except for subroutine prototypes but these don't offer much type granularity (like int vs strings or different object classes).
Richard has mentioned some run-time alternatives you may use. I personally would recommend Moose if you don't mind the performance penalty.
What Rohit Jain said is correct. A function that wants input to follow certain rules simply has to explicitly check that the input is valid.
For example
sub foo
{
my ($param1,$param2) = shift;
$param1 =~ /^\d+$/ or die "Parameter 1 must be a positive integer.";
$param2 =~ /^(bar|baz)$/ or die "Parameter 2 must be either 'bar' or 'baz'";
...
}
This may seem like a pain, but:
The extra flexibility gained generally outweighs the work involved in doing this.
Simply having the correct data type is often not enough to ensure that you valid input, so you end up doing a lot this anyway even in a language like Java.

How do I create a hash of hashes in Perl?

Based on my current understanding of hashes in Perl, I would expect this code to print "hello world." It instead prints nothing.
%a=();
%b=();
$b{str} = "hello";
$a{1}=%b;
$b=();
$b{str} = "world";
$a{2}=%b;
print "$a{1}{str} $a{2}{str}";
I assume that a hash is just like an array, so why can't I make a hash contain another?
You should always use "use strict;" in your program.
Use references and anonymous hashes.
use strict;use warnings;
my %a;
my %b;
$b{str} = "hello";
$a{1}={%b};
%b=();
$b{str} = "world";
$a{2}={%b};
print "$a{1}{str} $a{2}{str}";
{%b} creates reference to copy of hash %b. You need copy here because you empty it later.
Hashes of hashes are tricky to get right the first time. In this case
$a{1} = { %b };
...
$a{2} = { %b };
will get you where you want to go.
See perldoc perllol for the gory details about two-dimensional data structures in Perl.
Short answer: hash keys can only be associated with a scalar, not a hash. To do what you want, you need to use references.
Rather than re-hash (heh) how to create multi-level data structures, I suggest you read perlreftut. perlref is more complete, but it's a bit overwhelming at first.
Mike, Alexandr's is the right answer.
Also a tip. If you are just learning hashes perl has a module called Data::Dumper that can pretty-print your data structures for you, which is really handy when you'd like to check what values your data structures have.
use Data::Dumper;
print Dumper(\%a);
when you print this it shows
$VAR1 = {
'1' => {
'str' => 'hello'
},
'2' => {
'str' => 'world'
}
};
Perl likes to flatten your data structures. That's often a good thing...for example, (#options, "another option", "yet another") ends up as one list.
If you really mean to have one structure inside another, the inner structure needs to be a reference. Like so:
%a{1} = { %b };
The braces denote a hash, which you're filling with values from %b, and getting back as a reference rather than a straight hash.
You could also say
$a{1} = \%b;
but that makes changes to %b change $a{1} as well.
I needed to create 1000 employees records for testing a T&A system. The employee records were stored in a hash where the key was the employee's identity number, and the value was a hash of their name, date of birth, and date of hire etc. Here's how...
# declare an empty hash
%employees = ();
# add each employee to the hash
$employees{$identity} = {gender=>$gender, forename=>$forename, surname=>$surname, dob=>$dob, doh=>$doh};
# dump the hash as CSV
foreach $identity ( keys %employees ){
print "$identity,$employees{$identity}{forename},$employees{$identity}{surname}\n";
}

Dynamic array using perl code. Begineer [duplicate]

I want to create arrays dynamically based on the user input. For example, if the user gives input as 3 then three arrays should be created with the name #message1, #message2, and #message3.
How do I do it in Perl?
Don't. Instead, use an array of arrays:
my #message;
my $input = 3;
for my $index ( 0..$input-1 ) {
$message[$index][0] = "element 0";
$message[$index][1] = 42;
}
print "The second array has ", scalar( #{ $message[1] } ), " elements\n";
print "They are:\n";
for my $index ( 0..$#{ $message[1] } ) {
print "\t", $message[1][$index], "\n";
}
Some helpful rules are at http://perlmonks.org/?node=References+quick+reference
I have to ask why you want to do this, because it's not the right way to go. If you have three streams of input, each of which needs to be stored as a list, then store one list, which is a list of the lists (where the lists are stored as array references):
my #input = (
[ 'data', 'from', 'first', 'user' ],
[ qw(data from second user) ],
[ qw(etc etc etc) ],
);
If you have names associated with each user's data, you might want to use that
as a hash key, for indexing the data against:
my %input = (
senthil => [ 'data', 'from', 'first', 'user' ],
ether => [ qw(data from second user) ],
apu => [ qw(etc etc etc) ],
);
Please refer to the Perl Data Structures Cookbook (perldoc perldsc) for more
on selecting the right data structure for the situation, and how to define
them.
Creating new named arrays dynamically is almost never a good idea. Mark Dominus, author of the enlightening book Higher-Order Perl, has written a three-part series detailing the pitfalls.
You have names in mind for these arrays, so put them in a hash:
sub create_arrays {
my($where,$n) = #_;
for (1 .. $n) {
$where->{"message$_"} = [];
}
}
For a quick example that shows the structure, the code below
my $n = #ARGV ? shift : 3;
my %hash;
create_arrays \%hash, $n;
use Data::Dumper;
$Data::Dumper::Indent = $Data::Dumper::Terse = 1;
print Dumper \%hash;
outputs
$ ./prog.pl
{
'message2' => [],
'message3' => [],
'message1' => []
}
Specifying a different number of arrays, we get
$ ./prog.pl 7
{
'message2' => [],
'message6' => [],
'message5' => [],
'message4' => [],
'message3' => [],
'message1' => [],
'message7' => []
}
The order of the keys looks funny because they're inside a hash, an unordered data structure.
Recall that [] creates a reference to a new anonymous array, so, for example, to add values to message2, you'd write
push #{ $hash{"message2"} }, "Hello!";
To print it, you'd write
print $hash{"message2"}[0], "\n";
Maybe instead you want to know how long all the arrays are:
foreach my $i (1 .. $n) {
print "message$i: ", scalar #{ $hash{"message$i"} }, "\n";
}
For more details on how to use references in Perl, see the following documentation:
Mark's very short tutorial about references, or perldoc perlreftut
Perl references and nested data structures, or perldoc perlref
Perl Data Structures Cookbook, or perldoc perldsc
In compiled languages, variables don't have a name. The name you see in the code is a unique identifier associated with some numerical offset. In an identifier like message_2 the '2' only serves to make it a unique identifier. Anybody can tell that you could make your three variables: message_125, message_216, and message_343. As long as you can tell what you should put into what, they work just as well as message_1...
The "name" of the variable is only for you keeping them straight while you're writing the code.
Dynamic languages add capability by not purging the symbol table(s). But a symbol table is simply an association of a name with a value. Because Perl offers you lists and hashes so cheaply, there is no need to use the programming/logistical method of keeping track of variables to allow a flexible runtime access.
Chances are that if you see yourself naming lists #message1, #message2, ... -- where the items differ only by their reference order, that these names are just as good: $message[1], $message[2], ....
In addition, since symbol tables are usually mapping from name-to-offset (either on the stack or in the heap), it's really not a whole lot more than a key-value pair you find in a hash. So hashes work just as good for looking up more distinct names.
$h{messages} = [];
$h{replies} = [];
I mean really if you wanted to, you could store everything that you put into a lexical variable into a single hash for the scope, if you didn't mind writing: $h{variable_name} for everything. But you wouldn't get the benefit of Perl's implicit scope management, and across languages, programmers have preferred implicit scope management.
Perl allows symbolic manipulation, but over the years the dynamic languages have found that a mixed blessing. But in Perl you have both "perspectives", to give them a name. Because you can determine what code in a compiled language is likely to do better than a dynamic language, it has been determined more error free to use a "compiled perspective" for more things: So as you can see with the availability of offset-management and lookup compiled behavior given to you in core Perl, there is no reason to mess with the symbol table, if you don't have to.
Creating an array dynamically, is as simple as: []. Assigning it to a spot in memory, when we don't know how many we want to store, is as easy as:
push #message, [];
And creating a list of lists all at once is as easy as:
#message = map { [] } 1..$num_lists;
for some specified value in $num_lists.