Why do people say Perl is dynamically typed?

The question of static vs. dynamic typing has been asked repeatedly on Stack Overflow, for example here.
The consensus seems to be (quoting from the top answer of the above link):
A language is statically typed if the type of a variable is known at compile time.
And a dynamic language:
A language is dynamically typed if the type is associated with run-time values, and not named variables/fields/etc.
Perl seems to be statically typed by this (or other common definitions of static/dynamic typing). It has 3 types: scalar, array, hash (ignoring things like references for simplicity's sake). Types are declared along with variables:
my $x = 10; # declares a scalar variable named x
my @y = (1, 2, 3); # declares an array variable named y
my %z = (one => 1, two => 2); # declares a hash variable named z
The $, @ and % above tell Perl which type you want; I'd count this as a form of explicit typing.
Once x has been declared as a scalar, as above, it's impossible to store a non-scalar value in x:
$x = @y; # x is now 3
This converts @y to a scalar (in Perl, converting an array to a scalar yields the length of the array). I blame this on weak typing (Perl very liberally allows conversions between its 3 types), rather than dynamic typing.
Whereas in most statically typed languages, such an assignment would be an error, in Perl it is ok because of implicit conversions (similar to how bool x = 1; is fine in C/C++, but not in Java: both are statically typed, but Java is more strongly typed in this case). The only reason this conversion happened at all in Perl is because of the type of x, which again suggests Perl is statically typed.
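To make the point about conversions concrete: what ends up on the left-hand side depends on the context that the assignment imposes, not on anything recorded in @y itself. A minimal sketch (the variable names are illustrative):

```perl
use strict;
use warnings;

my @y = (1, 2, 3);

# Assigning to a scalar evaluates the right-hand side in scalar context:
# an array in scalar context yields its element count.
my $count = @y;

# Assigning to a list of scalars evaluates the right-hand side in list
# context, so the first element is copied instead.
my ($first) = @y;

# scalar() forces scalar context explicitly.
my $also_count = scalar(@y);

print "$count $first $also_count\n";    # prints "3 1 3"
```

$count and $first are read from the same array; only the shape of the assignment differs.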
Another argument people have against Perl being statically typed is that floats, ints, and strings are all stored in the same type of variable (scalars). But this really has nothing to do with static or dynamic typing. Within Perl's type system (which has only 3 types), there is no difference between floats, ints and strings. These all have type scalar. This is similar to saying C89 isn't statically typed because it used the int type to represent both ints and bools.
Obviously, this line of reasoning is ridiculous. Perl has very little in common with what most people think of as statically typed languages like C/C++, Java, OCaml, etc.
My question is, what's wrong with this line of reasoning?

I disagree on there being a consensus on the definitions you posted. But like your claim, that's opinion-based, and thus off-topic.
The posted definitions of "statically-typed language" and "dynamically-typed language" are useless. These are imaginary buckets into which very few languages fit.
According to the definition of statically-typed language you posted, Perl is a statically-typed language.
The type of $a is known to be a scalar at compile-time.
The type of @a is known to be an array at compile-time.
According to the definition of statically-typed language you posted, Perl isn't a statically-typed language.
$a could contain a signed integer (IV).
$a could contain a string (PV).
$a could contain a reference (RV) to an object of class Foo.
According to the definition of dynamically-typed language you posted, Perl is a dynamically-typed language.
$a could contain a signed integer (IV).
$a could contain a string (PV).
$a could contain a reference (RV) to an object of class Foo.
According to the definition of dynamically-typed language you posted, Perl isn't a dynamically-typed language.
The type of $a is known to be a scalar at compile-time.
The type of @a is known to be an array at compile-time.
Similarly, C++, C#, Java, BASIC and assembler languages are both/neither statically-typed and dynamically-typed. Even C doesn't fit the posted definition of statically-typed perfectly.
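The "$a could contain an IV, a PV or an RV to a Foo object" side of the argument can be reproduced by reassigning one scalar and asking Perl at run time what it holds. A sketch using the built-in ref and the core Scalar::Util functions blessed and looks_like_number; the describe helper and the Foo class are hypothetical, and the variable is not named $a because $a is special to sort:

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed looks_like_number);

package Foo { sub new { return bless {}, shift } }

# Hypothetical helper: classify the value a scalar holds right now.
sub describe {
    my ($v) = @_;
    return 'undef'                    unless defined $v;
    return 'object of ' . blessed($v) if blessed($v);
    return 'reference to ' . ref($v)  if ref($v);
    return 'number-like'              if looks_like_number($v);
    return 'string';
}

my @kinds;
my $value_holder;    # one scalar variable, reassigned several times
for my $value (42, 'hello', [1, 2], Foo->new) {
    $value_holder = $value;
    push @kinds, describe($value_holder);
}
print join("\n", @kinds), "\n";
```

The compile-time type of $value_holder never changes (it is always a scalar), while the kind of value it carries is only known at run time.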


How do I write a perl6 macro to enquote text?

I'm looking to create a macro in P6 which converts its argument to a string.
Here's my macro:
macro tfilter($expr) {
quasi {
my $str = Q ({{{$expr}}});
filter-sub $str;
};
}
And here is how I call it:
my @some = tfilter(age < 50);
However, when I run the program, I obtain the error:
Unable to parse expression in quote words; couldn't find final '>'
How do I fix this?
Your use case, converting some code to a string via a macro, is very reasonable. There isn't an established API for this yet (even in my head), although I have come across and thought about the same use case. It would be nice in cases such as:
assert a ** 2 + b ** 2 == c ** 2;
This assert statement macro could evaluate its expression, and if it fails, it could print it out. Printing it out requires stringifying it. (In fact, in this case, having file-and-line information would be a nice touch also.)
(Edit: 007 is a language laboratory to flesh out macros in Perl 6.)
Right now in 007 if you stringify a Q object (an AST), you get a condensed object representation of the AST itself, not the code it represents:
$ bin/007 -e='say(~quasi { 2 + 2 })'
Q::Infix::Addition {
identifier: Q::Identifier "infix:+",
lhs: Q::Literal::Int 2,
rhs: Q::Literal::Int 2
}
This is potentially more meaningful and immediate than outputting source code. Consider also the fact that it's possible to build ASTs that were never source code in the first place. (And people are expected to do this. And to mix such "synthetic Qtrees" with natural ones from programs.)
So maybe what we're looking at is a property on Q nodes called .source or something. Then we'd be able to do this:
$ bin/007 -e='say((quasi { 2 + 2 }).source)'
2 + 2
(Note: doesn't work yet.)
It's an interesting question what .source ought to output for synthetic Qtrees. Should it throw an exception? Or just output <black box source>? Or do a best-effort attempt to turn itself into stringified source?
Coming back to your original code, this line fascinates me:
my $str = Q ({{{$expr}}});
It's actually a really cogent attempt to express what you want to do (turn an AST into its string representation). But I doubt it'll ever work as-is. In the end, it's still kind of based on a source-code-as-strings kind of thinking à la C. The fundamental issue with it is that the place where you put your {{{$expr}}} (inside of a string quote environment) is not a place where an expression AST is able to go. From an AST node type perspective, it doesn't typecheck because expressions are not a subtype of quote environments.
Hope that helps!
(PS: Taking a step back, I think you're doing yourself a disservice by making filter-sub accept a string argument. What will you do with the string inside of this function? Parse it for information? In that case you'd be better off analyzing the AST, not the string.)
(PPS: Moritz++ on #perl6 points out that there's an unrelated syntax error in age < 50 that needs to be addressed. Perl 6 is picky about things being defined before they are used; macros do not change this equation much. Therefore, the Perl 6 parser is going to assume that age is a function you haven't declared yet. Then it's going to consider the < an opening quote character. Eventually it'll be disappointed that there's no >. Again, macros don't rescue you from needing to declare your variables up-front. (Though see #159 for further discussion.))

How are scalars stored 'under the hood' in perl?

The basic types in perl are different from those of most languages, with the types being scalar, array and hash (but apparently not subroutines, &, which I guess are really just scalar references with syntactic sugar). What is most odd about this is that the most common data types (int, boolean, char, string) all fall under the basic data type "scalar". It seems that perl instead decides to treat a scalar as a string, boolean or number based on the operator applied to it, implying the scalar itself is not actually defined as "int" or "String" when saved.
This makes me curious as to how these scalars are stored "under the hood", particularly in regard to its effect on efficiency (yes, I know scripting languages sacrifice efficiency for flexibility, but they still need to be as optimized as possible when flexibility is not affected). It's much easier for me to store the number 65535 (which takes two bytes) than the string "65535" (which takes 6 bytes), so recognizing that $val = 65535 is storing an int would allow me to use 1/3 of the memory; in large arrays this could mean fewer cache misses as well.
It's not just limited to saving memory, of course. There are times when I can offer more significant optimizations if I know what type of scalar to expect. For instance, if I have a hash using very large integers as keys, it would be far faster to look up a value if I recognized the keys as ints, allowing a simple modulo for creating my hash key, than if I had to run more complex hashing logic on a string that has 3 times the bytes.
So I'm wondering how perl handles these scalars under the hood. Does it store every value as a string, paying the extra memory and the CPU cost of constantly converting string to int when a scalar is always used as an int? Or does it have some logic for inferring the type of a scalar to determine how to save and manipulate it?
Edit:
TJD linked to perlguts, which answers half my question. A scalar is actually stored as a string, a number (signed integer, unsigned integer or double) or a pointer. I'm not too surprised; I had mostly expected this behavior under the hood, though it's interesting to see the exact types. I'm leaving this question open, though, because perlguts is actually too low level. Other than telling me that 5 data types exist, it doesn't specify how perl alternates between them, i.e. how perl decides which SV type to use when a scalar is saved and how it knows when/how to cast.
There are actually a number of types of scalars. A scalar of type SVt_IV can hold undef, a signed integer (IV) or an unsigned integer (UV). One of type SVt_PVIV can also hold a string[1]. Scalars are silently upgraded from one type to another as needed[2]. The TYPE field indicates the type of a scalar. In fact, arrays (SVt_PVAV) and hashes (SVt_PVHV) are really just types of scalars.
While the type of a scalar indicates what the scalar can contain, flags are used to indicate what a scalar does contain. This is stored in the FLAGS field. SVf_IOK signals that a scalar contains a signed integer, while SVf_POK indicates it contains a string[3].
Devel::Peek's Dump is a great tool for looking at the internals of scalars. (The constant prefixes SVt_ and SVf_ are omitted by Dump.)
$ perl -e'
use Devel::Peek qw( Dump );
my $x = 123;
Dump($x);
$x = "456";
Dump($x);
$x + 0;
Dump($x);
'
SV = IV(0x25f0d20) at 0x25f0d30 <-- SvTYPE(sv) == SVt_IV, so it can contain an IV.
REFCNT = 1
FLAGS = (IOK,pIOK) <-- IOK: Contains an IV.
IV = 123 <-- The contained signed integer (IV).
SV = PVIV(0x25f5ce0) at 0x25f0d30 <-- The SV has been upgraded to SVt_PVIV
REFCNT = 1 so it can also contain a string now.
FLAGS = (POK,IsCOW,pPOK) <-- POK: Contains a string (but no IV since !IOK).
IV = 123 <-- Meaningless without IOK.
PV = 0x25f9310 "456"\0 <-- The contained string.
CUR = 3 <-- Number of bytes used by PV (not incl \0).
LEN = 10 <-- Number of bytes allocated for PV.
COW_REFCNT = 1
SV = PVIV(0x25f5ce0) at 0x25f0d30
REFCNT = 1
FLAGS = (IOK,POK,IsCOW,pIOK,pPOK) <-- Now contains both a string (POK) and an IV (IOK).
IV = 456 <-- This will be used in numerical contexts.
PV = 0x25f9310 "456"\0 <-- This will be used in string contexts.
CUR = 3
LEN = 10
COW_REFCNT = 1
illguts documents the internal format of variables quite thoroughly, but perlguts might be a better place to start.
If you start writing XS code, keep in mind it's usually a bad idea to check what a scalar contains. Instead, you should request what should have been provided (e.g. using SvIV or SvPVutf8). Perl will automatically convert the value to the requested type (and warn if appropriate). API calls are documented in perlapi.
[1] In fact, it can hold a string and either a signed integer or an unsigned integer at the same time.
[2] All scalars (including arrays and hashes, excluding one type of scalar that can only hold undef) have two memory blocks at their base. Pointers to the scalar point to its head, which contains the TYPE field and a pointer to the body. Upgrading a scalar replaces the body of the scalar. That way, pointers to the scalar aren't invalidated by an upgrade.
[3] An undef variable is one without any uppercase OK flags set.
The formats used by Perl for data storage are documented in the perlguts perldoc.
In short, though, a Perl scalar is stored as an SV structure that can hold values of a number of different types, such as an int, a double, a char *, or a pointer to another scalar. The SV contains flags indicating which of those values are currently valid; as the Devel::Peek output in the previous answer shows, a scalar can hold more than one at once (for example, both an integer and a string).
(With regard to hash keys, there's an important gotcha to note there: hash keys are always strings, and are always stored as strings. They're stored in a different type from other scalars.)
The Perl API includes a number of functions which can be used to access the value of a scalar as a desired C type. For example, SvIV() can be used to return the integer value of a SV: if the SV contains an int, that value is returned directly; if the SV contains another type, it's coerced to an integer as appropriate. These functions are used throughout the Perl interpreter for type conversions. However, there is no automatic inference of types on output; functions which operate on strings will always return a PV (string) scalar, for instance, regardless of whether the string "looks like" a number or not.
If you're curious what a given scalar looks like internally, you can use the Devel::Peek module to dump its contents.
Others have addressed the "how are scalars stored" part of your question, so I'll skip that. With regard to how Perl decides which representation of a value to use and when to convert between them, the answer is it depends on which operators are applied to the scalar. For example, given this code:
my $score = 0;
The scalar $score will be initialised with an integer value. But then when this line of code is run:
say "Your score is $score";
The double quote operator means that Perl will need a string representation of the value. So the conversion from integer to string will take place as part of the process of assembling the string argument to the say function. Interestingly, after the stringification of $score, the underlying representation of the scalar will now include both an integer and a string representation, allowing subsequent operations to directly grab the relevant value without having to convert again. If a numeric operator is then applied to the string (e.g.: $score++) then the numeric part will be updated and the (now invalid) string part will be discarded.
This is the reason why Perl operators tend to come in two flavours. For example comparing values of numbers is done with <, ==, > while performing the same comparisons with strings would be done with lt, eq, gt. Perl will coerce the value of the scalar(s) to the type which matches the operator. This is why the + operator does numeric addition in Perl but a separate operator . is needed to do string concatenation: + will coerce its arguments to numeric values and . will coerce to strings.
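The caching of a string form after stringification (and its removal after $score++) can be observed from pure Perl with the core B module, which exposes an SV's flags. This is a sketch under assumptions: the reprs helper is hypothetical, and it checks the private flags (SVp_IOK, SVp_NOK, SVp_POK) because some newer perls set only the private string flag when a number is stringified. Flag handling is an internal detail and may differ between perl versions:

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: list the representations a scalar currently caches,
# based on the private OK flags exposed through the core B module.
sub reprs {
    my $flags = B::svref_2object(\$_[0])->FLAGS;
    my @r;
    push @r, 'IV' if $flags & B::SVp_IOK();
    push @r, 'NV' if $flags & B::SVp_NOK();
    push @r, 'PV' if $flags & B::SVp_POK();
    return @r ? join(',', @r) : 'none';
}

my $score = 0;
my $before = reprs($score);          # initialised from an integer literal

my $msg = "Your score is $score";    # interpolation needs a string form
my $after_string = reprs($score);    # the string form is now cached too

$score++;                            # numeric update
my $after_inc = reprs($score);       # the stale string form is dropped

print "$before | $after_string | $after_inc\n";
```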
There are some operators that will work with both numeric and string values but which perform a different operation depending on the type of value. For example:
$score = 0;
say ++$score; # 1
say ++$score; # 2
say ++$score; # 3
$score = 'aaa';
say ++$score; # 'aab'
say ++$score; # 'aac'
say ++$score; # 'aad'
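The string increment follows the rule documented in perlop: a value that has only been used in string contexts and matches /^[a-zA-Z]*[0-9]*\z/ is incremented as a string, each character staying within its range, with carry. A small sketch (the incr helper is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: pre-increment a private copy of the given value.
sub incr { my $v = shift; return ++$v }

# Letters stay letters and digits stay digits; overflow carries leftwards.
my @results = map { incr($_) } ('a0', 'Az', 'zz', 'a9', '99');

print "@results\n";    # a1 Ba aaa b0 100
```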
With regard to questions of efficiency (bearing in mind the standard disclaimers about premature optimisation), consider this code, which reads a file containing one integer per line, checks that each integer is exactly 8 digits long, and stores the valid ones in an array:
my @numbers;
while (<$fh>) {
    if (/^(\d{8})$/) {
        push @numbers, $1;
    }
}
Any data read from a file will initially come to us as a string. The regex used to validate the data will also require a string value in $_. So the result is that our array @numbers will contain a list of strings. However, if further uses of the values will be solely in a numeric context, we could use this micro-optimisation to ensure that the array contains only numeric values:
push @numbers, 0 + $1;
In my tests with a file of 10,000 lines, populating @numbers with strings used nearly three times as much memory as populating it with integer values. However, as with most benchmarks, this has little relevance to normal day-to-day coding in Perl. You'd only need to worry about it in situations where you a) had performance or memory issues and b) were working with a large number of values.
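One way to confirm that the 0 + $1 version really stores numbers rather than strings is to inspect the flags with the core B module. In this sketch the input lines are hardcoded stand-ins for <$fh>, and has_pv is a hypothetical helper; it checks the private string flag, which is an internal detail:

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: 1 if the scalar currently caches a string (PV) form.
sub has_pv { return (B::svref_2object(\$_[0])->FLAGS & B::SVp_POK()) ? 1 : 0 }

my (@as_strings, @as_numbers);
# Hardcoded stand-ins for lines read from a file.
for ("12345678\n", "1234\n", "87654321\n") {
    next unless /^(\d{8})$/;
    push @as_strings, $1;        # copies the captured string
    push @as_numbers, 0 + $1;    # stores the result of numeric addition
}

printf "strings: %d, first has PV: %d\n", scalar(@as_strings), has_pv($as_strings[0]);
printf "numbers: %d, first has PV: %d\n", scalar(@as_numbers), has_pv($as_numbers[0]);
```

Printing an element of @as_numbers in a string would itself cache a string form, which is why the sketch only prints counts and flags.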
It's worth pointing out that some of this behaviour is common to other dynamic languages (e.g. JavaScript will silently coerce numeric values to strings).

Perl variables defined with * vs $

What's the difference between defining a variable with a * vs a $? For example:
local $var;
local *var;
The initial character is known as a sigil, and says what sort of value the identifier represents. You will know most of them. Here's a list:
Dollar $ is a scalar value
At sign @ is an array value
Percent % is a hash value
Ampersand & is a code value
Asterisk * is a typeglob
You are less likely to have come across the last two recently, because & hasn't been necessary when calling subroutines since Perl 5.0 was released. And typeglobs are a special type that contains all of the other types, and are much more rarely used.
I'm considering how much deeper to go into all of this, but will leave my answer as it is for now. I may write more depending on the comments that arise.
$var is a scalar. *var is a typeglob. http://perldoc.perl.org/perldata.html#Typeglobs-and-Filehandles
It's not a variable in the strictest sense. You shouldn't generally be using it.
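To illustrate the difference asked about, here is a sketch contrasting local *var (which saves and empties every slot of the glob) with aliasing a single slot through glob assignment. It assumes package variables declared with our, since typeglobs exist only for package variables, not for my variables; show and demo are hypothetical helpers:

```perl
use strict;
use warnings;

our $var = 'scalar slot';
our @var = ('array', 'slot');

# Hypothetical helper: read the package variables $var and @var.
sub show { return ($var // '') . ' / ' . "@var" }

sub demo {
    # local *var saves the whole glob ($var, @var, %var, &var, ...)
    # and leaves *var empty until demo() returns.
    local *var;
    return show();
}

# Assigning a reference to a glob aliases just that slot.
our @alias;
*alias = \@var;
push @alias, 'extended';    # also extends @var

print show(), "\n";    # scalar slot / array slot extended
print demo(), "\n";    # " / " (both slots empty inside the local)
print show(), "\n";    # original values restored
```

By contrast, local $var would save and restore only the scalar slot.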

Why can't parameter names start with a number?

Variable names can start with a digit, so why can't parameter names do the same?
Because the first-parameter-char of the command-parameter syntax as specified in the PowerShell Language Specification does not allow for it.
2.3.4 Parameters
Syntax:
command-parameter:
dash first-parameter-char parameter-chars colon_opt
first-parameter-char:
A Unicode character of classes Lu, Ll, Lt, Lm, or Lo
_ (The underscore character U+005F)
?
You can find a list of Unicode character classes here.
It's definitely convention, and is not a purely technical limitation.
function f($1) {
$1
}
# Works positionally
f 1
# Works with splatting
$h = @{"1" = 2}
f @h
# Doesn't work by name
f -1 2
It's worth noting that the Language Specification was written after the language itself, so there are probably a few points where it's a little more specific than PowerShell itself is.

Why do the '<' and 'lt' operators return different results in Perl?

I am just learning Perl's comparison operators. I tried the code below:
$foo=291;
$bar=30;
if ($foo < $bar) {
print "$foo is less than $bar (first)\n";
}
if ($foo lt $bar) {
print "$foo is less than $bar (second)\n";
}
The output is "291 is less than 30 (second)". Does this mean the lt operator always converts the variables to strings and then compares them? What is the rationale for Perl making the lt operator behave differently from the < operator?
Your guess is right. The alphabetic operators like lt compare the variables as strings whereas the symbolic ones like < compare them as numbers. You can read the perlop man page for more details.
The rationale is that scalars in Perl are not typed, so without you telling it, Perl would not know how to compare two variables. If it did guess, it would sometimes get it wrong, which would lead to having to do things like ' ' + $a < ' ' + $b to force string comparison, which is probably worse than lt.
That said, this is a horrible gotcha which probably catches out everyone new to Perl, and it still catches me out when coming back to Perl after some time using a less post-modern language.
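The comparison from the question, together with the same split in sort (which compares stringwise unless given a comparator such as <=>), in a minimal sketch:

```perl
use strict;
use warnings;

my ($foo, $bar) = (291, 30);

# Numeric comparison: 291 is greater than 30.
my $numeric_less = $foo < $bar ? 1 : 0;

# String comparison: "291" sorts before "30" because '2' lt '3'.
my $string_less = $foo lt $bar ? 1 : 0;

# The default sort is stringwise; <=> gives a numeric sort.
my @nums       = (10, 9, 100);
my @stringwise = sort @nums;
my @numeric    = sort { $a <=> $b } @nums;

print "$numeric_less $string_less\n";    # 0 1
print "@stringwise | @numeric\n";        # 10 100 9 | 9 10 100
```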
Since Perl is loosely typed, and values can silently convert between strings and numbers at any moment, Perl needs two different types of comparison operators to distinguish between numeric comparison (<) and string comparison (lt). If you only had one operator, how would you tell the difference?
Rationale? It's a string operator. From "perldoc perlop":
Binary "lt" returns true if the left argument is stringwise less than the right argument.
If that's not what you want, don't use it.
lt compares values lexically (i.e. character by character in ASCII/Unicode order, or in locale order under use locale) and < compares values numerically. Perl has both operators for the same reason "10" + 5 is 15 rather than a type error: it is weakly typed. You must always tell the computer something unambiguous. Languages that are strongly typed tend to use casting to resolve ambiguity, whereas weakly typed languages tend to use lots of operators. The Python (a strongly typed language) equivalent to "10" + 5 is float("10") + 5.
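The "operator decides" point can be checked directly: the same operands give different results depending on whether a numeric or a string operator is applied. A minimal sketch:

```perl
use strict;
use warnings;

# The operator, not the operand, picks the interpretation.
my $sum    = "10" + 5;    # numeric addition
my $joined = "10" . 5;    # string concatenation

my $num_eq = ("1.0" == 1)   ? 1 : 0;    # numerically equal
my $str_eq = ("1.0" eq "1") ? 1 : 0;    # different strings

print "$sum $joined $num_eq $str_eq\n";    # 15 105 1 0
```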
Does this mean the 'lt' operator always converts the variables to string and then compare?
Yes, see perlop
What is the rationale for Perl making 'lt' operator behave differently from '<' operator?
Because having a numeric comparison operator and a string comparison operator makes a lot more sense than having a mumble mumble operator and another, identical mumble mumble operator.