$self->[UTF8] = $conf->{utf8};
Never seen such code before.
What does [] mean here?
In this case, the object $self is implemented as a blessed array reference rather than the far more common method of using a blessed hash reference. The syntax $foo->[42] accesses a single element from an array reference. Presumably, UTF8 is a constant that returns a numeric index into the array.
You see this idiom sometimes when people become convinced (usually incorrectly) that hash lookups on object attributes result in significant overhead and try to prematurely optimize their code.
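To make the idiom concrete, here is a minimal sketch of an array-based object. The package name Parser, the DEBUG slot, and the is_utf8 accessor are made up for illustration; only the $self->[UTF8] = $conf->{utf8} line comes from the question, and the real module may define its constants differently.

```perl
use strict;
use warnings;

package Parser;    # hypothetical class name

# Each constant returns a numeric index into the object's array.
use constant {
    UTF8  => 0,
    DEBUG => 1,    # hypothetical second slot
};

sub new {
    my ($class, $conf) = @_;
    my $self = bless [], $class;        # blessed array reference, not a hash
    $self->[UTF8]  = $conf->{utf8};     # same shape as the line in the question
    $self->[DEBUG] = $conf->{debug};
    return $self;
}

# Hypothetical accessor: reads slot 0 of the array.
sub is_utf8 { return $_[0]->[UTF8] }

package main;

my $parser = Parser->new({ utf8 => 1, debug => 0 });
print "utf8 is ", $parser->is_utf8, "\n";    # prints "utf8 is 1"
```

With a hash-based object the same line would read $self->{utf8} = $conf->{utf8}; the constant simply replaces the key with a fixed slot number.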
The [] implies that $self is a reference to a list/array (assuming the code works). This looks a bit odd, though, as list indexes should be numeric.
I can define an array as a generic list like this
$array = [Collections.Generic.List[String]]@()
And I can define an element in a hash table as an array like this
$hash = @{
    array = @()
}
But I can't define an element in a hash table as a Generic List, like this
$hash = @{
    array = [Collections.Generic.List[String]]@()
}
Instead I get this error
Cannot convert the "System.Object[]" value of type "System.Object[]"
to type "System.Collections.Generic.List`1[System.String]
I have been using Generic Lists to avoid the (minor in my case, to be sure) performance issue with regularly adding to a standard array. But this is the first time I have needed to create a hash table that contains a generic list (for a complex return value).
So, first question: is this even possible? And second question: what is the difference under the hood between simply setting a variable and setting a hash table element?
EDIT: This is interesting. I CAN use
[System.Collections.ArrayList]@()
and it works. So, now I am curious what exactly is the difference between
[System.Collections.ArrayList]
and
[Collections.Generic.List[String]]
I guess this is the down side of being self taught. I found reference to [Collections.Generic.List[String]] on a BLOG, and maybe [System.Collections.ArrayList] is a much better answer? What I think I understand from this is that the former is specifically typed as a list of strings, while the latter is a list of generic objects, which then must be cast in use, which has potential bug and performance issues. Still, I wonder why the typed generic doesn't work in a hash table.
Imagine I have a reference that points to an array that contains many anonymous arrays. Ex:
my @main_array = ( [1,2,3], [3,4,5], ['a','b','c'] );
my $reference = \@main_array;
If later on I'm done using the data from that array and I only have a reference to it, what is the best method to delete that array and free the memory?
I usually do the following to free the memory used by data in a simple array:
undef @array
but because I only have a reference to it I thought about doing this
undef @{$reference}
If I do that, wouldn't I just be deleting the references to the anonymous arrays stored in the array (main_array) and not the actual content of the anonymous arrays?
I guess my question can be simplified to this: does deleting a reference make Perl free the memory used by the array, hash or scalar it refers to?
Thank you
Yes, undef @{$reference} (or undef @$reference) will do what undef @array did. It frees almost all of the memory used by the array so that the program can reuse it.
But there is very rarely any good reason to do this. When your lexical $reference goes out of scope, the same thing will happen. Explicitly calling undef on it first will just make your code minutely slower.
If later on I'm done using the data from that array and I only have a reference to it, what is the best method to delete that array and free the memory?
Ideally, just let $reference go out of scope. Otherwise, you can use $reference = undef;.
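If you want to see for yourself that the inner anonymous arrays are freed too, here is a small sketch using a weak reference as a probe. The $inner_probe variable is only there for observation and is not part of the original question; Scalar::Util's weaken makes it a reference that does not keep its target alive, so it becomes undef once the target is freed.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my $reference;
my $inner_probe;    # weak reference, used only to observe when the data is freed
{
    my @main_array = ( [1,2,3], [3,4,5], ['a','b','c'] );
    $reference   = \@main_array;
    $inner_probe = $main_array[0];
    weaken($inner_probe);    # no longer counts towards the inner array's refcount
}

# @main_array is out of scope, but $reference still keeps everything alive.
print defined $inner_probe ? "inner array alive\n" : "inner array freed\n";

# Clear the array through the reference: nothing else points at the
# anonymous arrays, so their memory is released as well.
undef @{$reference};
print defined $inner_probe ? "inner array alive\n" : "inner array freed\n";
```

This prints "inner array alive" and then "inner array freed". Letting $reference go out of scope, or assigning undef to it, has the same effect as long as it was the last reference to the outer array.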
I'm not seeing a lot of info in the swift stdlib reference. For example, Dictionary says certain methods (like remove) will invalidate indices, but that's it.
For a language to call itself "safe", it needs a solution to the classic C++ footguns:
1. Get a pointer to an element in a vector, then add more elements (the pointer is now invalidated), now use the pointer: crash.
2. Start iterating through a collection. While iterating, remove some elements (either before or after the current iterator position). Continue iterating: crash.
(edit: in C++, you're lucky to crash; the worst case is memory corruption)
I believe 1 is solved by swift because if a collection stores classes, taking a reference (e.g. strong pointer) to an element will increase the refcount. However, I don't know the answer for 2.
It would be super useful if there was a comparison of footguns in c++ that are/are not solved by swift.
EDIT, due to Rob's answer:
It does appear that there's some undocumented snapshot-like behavior going on
with Dictionary and/or for loops. The iteration creates a snapshot / hidden
copy of it when it starts.
Which gives me both a big "WAT" and "cool, that's sort of safe, I guess", and "how expensive is this copy?".
I don't see this documented either in Generator or in for-loop.
The below code prints two logical snapshots of the dictionary. The first
snapshot is userInfo as it was at the start of the iteration loop, and does
not reflect any modifications made during the loop.
var userInfo: [String: String] = [
    "first_name" : "Andrei",
    "last_name" : "Puni",
    "job_title" : "Mad scientist"
]
userInfo["added_one"] = "1" // can modify because it's var
print("first snapshot:")
var hijacked = false
for (key, value) in userInfo {
    if !hijacked {
        userInfo["added_two"] = "2" // doesn't error
        userInfo.removeValueForKey("first_name") // doesn't error
        hijacked = true
    }
    print("- \(key): \(value)")
}
userInfo["added_three"] = "3" // modify again
print("final snapshot:")
for (key, value) in userInfo {
    print("- \(key): \(value)")
}
As you say, #1 is not an issue. You do not have a pointer to the object in Swift. You either have its value or a reference to it. If you have its value, then it's a copy. If you have a reference, then it's protected. So there's no issue here.
But let's consider the second and experiment, be surprised, and then stop being surprised.
var xs = [1,2,3,4]
for x in xs { // (1)
    if x == 2 {
        xs.removeAll() // (2)
    }
    print(x) // Prints "1\n2\n3\n4\n"
}
xs // [] (3)
Wait, how does it print all the values when we blow away the values at (2)? We are very surprised now.
But we shouldn't be. Swift arrays are values. The xs at (1) is a value. Nothing can ever change it. It's not "a pointer to memory that includes an array structure that contains 4 elements." It's the value [1,2,3,4]. At (2), we don't "remove all elements from the thing xs pointed to." We take the thing xs is, create an array that results if you remove all the elements (that would be [] in all cases), and then assign that new array to xs. Nothing bad happens.
So what does the documentation mean by "invalidates all indices?" It means exactly that. If we generated indices, they're no good anymore. Let's see:
var xs = [1,2,3,4]
for i in xs.indices {
    if i == 2 {
        xs.removeAll()
    }
    print(xs[i]) // Prints "1\n2\n" and then CRASH!!!
}
Once xs.removeAll() is called, there's no promise that the old result of xs.indices means anything anymore. You are not permitted to use those indices safely against the collection they came from.
"Invalidates indices" in Swift is not the same as C++'s "invalidates iterators." I'd call that pretty safe, except for the fact that using collection indices is always a bit dangerous, so you should avoid indexing collections when you can help it; iterate them instead. Even if you need the indexes for some reason, use enumerate to get them without creating any of the danger of indexing.
(Side note, dict["key"] is not indexing into dict. Dictionaries are a little confusing because their key is not their index. Accessing dictionaries by their DictionaryIndex index is just as dangerous as accessing arrays by their Int index.)
Note also that the above doesn't apply to NSArray. If you modify NSArray while iterating it, you'll get a "mutated collection while iterating" error. I'm only discussing Swift data types.
EDIT: for-in is very explicit in how it works:
The generate() method is called on the collection expression to obtain a value of a generator type—that is, a type that conforms to the GeneratorType protocol. The program begins executing a loop by calling the next() method on the stream. If the value returned is not None, it is assigned to the item pattern, the program executes the statements, and then continues execution at the beginning of the loop. Otherwise, the program does not perform assignment or execute the statements, and it is finished executing the for-in statement.
The returned Generator is a struct and contains a collection value. You would not expect any changes to some other value to modify its behavior. Remember: [1,2,3] is no different than 4. They're both values. When you assign them, they make copies. So when you create a Generator over a collection value, you're going to snapshot that value, just like if I created a Generator over the number 4. (This raises an interesting problem, because Generators aren't really values, and so really shouldn't be structs. They should be classes. Swift stdlib has been fixing that. See the new AnyGenerator for instance. But they still contain an array value, and you would never expect changes to some other array value to impact them.)
See also "Structures and Enumerations Are Value Types" which goes into more detail on the importance of value types in Swift. Arrays are just structs.
Yes, that means there's logically copying. Swift has many optimizations to minimize actual copying when it's not needed. In your case, when you mutate the dictionary while it's being iterated, that will force a copy to happen. Mutation is cheap if you're the only consumer of a particular value's backing storage. But it's O(n) if you're not. (This is determined by the Swift builtin isUniquelyReferenced().) Long story short: Swift Collections are Copy-on-Write, and simply passing an array does not cause real memory to be allocated or copied.
You don't get COW for free. Your own structs are not COW. It's something that Swift does in stdlib. (See Mike Ash's great discussion of how you would recreate it.) Passing your own custom structs causes real copies to happen. That said, the majority of the memory in most structs is stored in collections, and those collections are COW, so the cost of copying structs is usually pretty small.
The book doesn't spend a lot of time drilling into value types in Swift (it explains it all; it just doesn't keep saying "hey, and this is what that implies"). On the other hand, it was the constant topic at WWDC. You may be interested particularly in Building Better Apps with Value Types in Swift which is all about this topic. I believe Swift in Practice also discussed it.
EDIT2:
@KarlP raises an interesting point in the comments below, and it's worth addressing. None of the value-safety promises we're discussing are related to for-in. They're based on Array. for-in makes no promises at all about what would happen if you mutated a collection while it is being iterated. That wouldn't even be meaningful. for-in doesn't "iterate over collections," it calls next() on Generators. So if your Generator becomes undefined if the collection is changed, then for-in will blow up because the Generator blew up.
That means that the following might be unsafe, depending on how strictly you read the spec:
func nukeFromOrbit<C: RangeReplaceableCollectionType>(var xs: C) {
    var hijack = true
    for x in xs {
        if hijack {
            xs.removeAll()
            hijack = false
        }
        print(x)
    }
}
And the compiler won't help you here. It'll work fine for all of the Swift collections. But if calling next() after mutation for your collection is undefined behavior, then this is undefined behavior.
My opinion is that it would be poor Swift to make a collection that allows its Generator to become undefined in this case. You could even argue that you've broken the Generator spec if you do (it offers no UB "out" unless the generator has been copied or has returned nil). So you could argue that the above code is totally within spec and your generator is broken. Those arguments tend to be a bit messy with a "spec" like Swift's which doesn't dive into all the corner cases.
Does this mean you can write unsafe code in Swift without getting a clear warning? Absolutely. But in the many cases that commonly cause real-world bugs, Swift's built-in behavior does the right thing. And in that, it is safer than some other options.
I am new to Perl and reading about references.
I cannot understand how one knows whether the variable one is working on is a reference.
For instance if I understand correctly, this:
$b = $a could be assigning scalars or references. How do we know which is it?
In C or C++ we would know via the function signature (*a or &a or **a). But in Perl there is no signature for parameters.
So how do we know in code what is a reference and what is not? Or if it is a reference to scalar or array or hash or another reference?
Perl has a ref function that you can use for that:
Returns a non-empty string if EXPR is a reference, the empty string otherwise. [...]
The string returned (if non-empty) will tell you the type of object the reference references.
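A quick sketch of what ref returns for the common cases. The variable names and the kind_of helper are just for illustration:

```perl
use strict;
use warnings;

my @list  = (1, 2, 3);
my %table = (a => 1);
my $plain = 'just a string';

# Hypothetical helper: describes a value based on what ref() reports.
sub kind_of {
    my ($thing) = @_;
    my $type = ref $thing;
    return $type eq '' ? 'not a reference' : "reference to $type";
}

print kind_of($plain),     "\n";    # not a reference
print kind_of(\$plain),    "\n";    # reference to SCALAR
print kind_of(\@list),     "\n";    # reference to ARRAY
print kind_of(\%table),    "\n";    # reference to HASH
print kind_of(sub { 1 }),  "\n";    # reference to CODE
print kind_of(\\$plain),   "\n";    # reference to REF (a reference to a reference)
```

For a blessed object, ref returns the package name the object was blessed into rather than ARRAY or HASH.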
You're asking the wrong question.
While there is a function called ref and another called reftype, these are not functions you should ever need to use.
It's bad to check the type of variables, because there's no way to effectively know without actually using it as intended due to overloading and magic.
For example, say you designed a function that accepts a reference or a string. That would be a bad design because an object that overloads stringification is both.
A good interface would use context to differentiate the arguments. For example, it could differentiate based on the number of arguments,
foo($point_obj)
-vs-
foo(x => $x, y => $y)
based on the value of other arguments,
foo(fh => $fh)
-vs-
foo(str => $file_contents)
or based on the choice of function called
foo_from_fh($fh)
-vs-
foo($file_contents)
So the answer is: You know it's a reference because your documentation instructs the caller of your function to pass a reference. If you got passed something other than a reference and it's used as a reference, the caller will get a strict error for their error.
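To illustrate the point about overloading, here is a minimal sketch of an object that is a reference and also behaves as a string. The Path class and its path field are invented for the example:

```perl
use strict;
use warnings;

package Path;    # hypothetical class that overloads stringification

use overload '""' => sub { return $_[0]{path} };

sub new {
    my ($class, $path) = @_;
    return bless { path => $path }, $class;
}

package main;

my $p = Path->new('/tmp/data.txt');

# ref() reports a reference (the class name)...
print "ref says: ", ref($p), "\n";    # ref says: Path

# ...yet the same value works anywhere a string is expected.
print "as string: $p\n";              # as string: /tmp/data.txt
```

A function that branches on ref($arg) to decide "reference or string" would pick the wrong branch for a caller who passes $p intending it as a string.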
The ref function is what you're looking for. Documentation is available at http://perldoc.perl.org/functions/ref.html
ref EXPR
Returns a non-empty string if EXPR is a reference, the empty string otherwise. If EXPR is
not specified, $_ will be used. The value returned depends on the type of thing the
reference is a reference to...
I am mulling over a best practice for passing hash references for return data to/from functions.
On the one hand, it seems intuitive to pass only input values to a function and have only return output variables. However, passing hashes in Perl can only be done by reference, so it is a bit messy and would seem more of an opportunity to make a mistake.
The other way is to pass a reference in the input variables, but then it has to be dealt with in the function, and it may not be clear what is an input and what is a return variable.
What is a best practice regarding this?
1. Return references to an array and a hash, and then dereference them.
($ref_array,$ref_hash) = $this->getData('input');
@array = @{$ref_array};
%hash = %{$ref_hash};
2. Pass in references (@array, %hash) to the function that will hold the output data.
$this->getData('input', \@array, \%hash);
Just return the reference. There is no need to dereference the whole
hash like you are doing in your examples:
my $result = some_function_that_returns_a_hashref;
say "Foo is ", $result->{foo};
say $_, " => ", $result->{$_} for keys %$result;
etc.
I have never seen anyone pass in empty references to hold the result. This is Perl, not C.
Trying to create copies by saying
my %hash = %{$ref_hash};
is even more dangerous than using the hashref. This is because it only creates a shallow copy. This will lead you to thinking it is okay to modify the hash, but if it contains references they will modify the original data structure. I find it better to just pass references and be careful, but if you really want to make sure you have a copy of the reference passed in you can say:
use Storable qw/dclone/;
my %hash = %{dclone $ref_hash};
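Here is a short sketch of the difference, assuming a made-up structure with one nested array. Changes made through the shallow copy's nested reference show up in the original, while changes through the dclone copy do not:

```perl
use strict;
use warnings;
use Storable qw(dclone);

# Hypothetical data: one plain value and one nested array reference.
my $ref_hash = { name => 'original', tags => ['a', 'b'] };

# Shallow copy: top-level values are copied, nested references are shared.
my %shallow = %{$ref_hash};
$shallow{name} = 'changed';          # affects only the copy
push @{ $shallow{tags} }, 'c';       # affects the original too

# Deep copy: nested structures are duplicated as well.
my %deep = %{ dclone($ref_hash) };
push @{ $deep{tags} }, 'd';          # affects only the deep copy

print "original name: $ref_hash->{name}\n";         # original name: original
print "original tags: @{ $ref_hash->{tags} }\n";    # original tags: a b c
print "deep tags:     @{ $deep{tags} }\n";          # deep tags:     a b c d
```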
The first one is better:
my ($ref_array,$ref_hash) = $this->getData('input');
The reasons are:
in the second case, getData() needs to check the data structures to make sure they are empty
you have freedom to return undef as a special value
it looks more Perl-idiomatic.
Note: the lines
@array = @{$ref_array};
%hash = %{$ref_hash};
are questionable, since you shallow-copy the whole data structures here. You can use references everywhere where you need array/hash, using -> operator for convenience.
If it's getting complicated enough that both the callsite and the called function are paying for it (because you have to think/write more every time you use it), why not just use an object?
my $results = $this->getData('input');
$results->key_value_thingies;
$results->listy_thingies;
If making an object is "too complicated" then start using Moose so that it no longer is.
My personal preference for sub interfaces:
If the routine has 0-3 arguments, they may be passed in list form: foo( 'a', 12, [1,2,3] );
Otherwise pass a list of name value pairs. foo( one => 'a', two => 12, three => [1,2,3] );
If the routine has or may have more than one argument seriously consider using name/value pairs.
Passing in references increases the risk of inadvertent data modification.
On returns I generally prefer to return a list of results rather than an array or hash reference.
I return hash or array refs when it will make a noticeable improvement in speed or memory consumption (ie BIG structures), or when a complex data structure is involved.
Returning references when not needed deprives one of the ability to take advantage of Perl's nice list handling features and exposes one to the dangers of inadvertent modification of data.
In particular, I find it useful to assign a list of results into an array and return the array, which provides the contextual return behaviors of an array to my subs.
For the case of passing in two hashes I would do something like:
my $foo = foo( hash1 => \%hash1, hash2 => \%hash2 ); # gets number of items returned
my @foo = foo( hash1 => \%hash1, hash2 => \%hash2 ); # gets items returned
sub foo {
    my %arg = @_;
    # do stuff
    return @results;
}
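Here is a runnable version of that pattern, as a sketch. The body of foo (collecting the sorted keys of both hashes) is a placeholder I made up for "do stuff"; the point is only that returning an array gives the caller the item count in scalar context and the items in list context:

```perl
use strict;
use warnings;

sub foo {
    my %arg = @_;
    # Placeholder for "do stuff": gather the keys of both hashes.
    my @results = sort { $a cmp $b }
                  ( keys %{ $arg{hash1} }, keys %{ $arg{hash2} } );
    return @results;    # an array, so the caller's context decides what comes back
}

my %hash1 = ( a => 1, b => 2 );
my %hash2 = ( c => 3 );

my $count = foo( hash1 => \%hash1, hash2 => \%hash2 );    # scalar context
my @items = foo( hash1 => \%hash1, hash2 => \%hash2 );    # list context

print "count: $count\n";    # count: 3
print "items: @items\n";    # items: a b c
```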
I originally posted this to another question, and then someone pointed to this as a "related post", so I'll post it here too as my take on the subject, assuming people will encounter it in the future.
I'm going to contradict the Accepted Answer and say that I prefer to have my data returned as a plain hash (well, as an even-sized list which is likely to be interpreted as a hash). I work in an environment where we tend to do things like the following code snippet, and it's much easier to combine and sort and slice and dice when you don't have to dereference every other line. (It's also nice to know that someone can't damage your hashref because you passed the entire thing by value -- though someone pointed out that if your hash contains more than simple scalars it's not so simple.)
my %filtered_config_slice =
hashgrep { $a !~ /^apparent_/ && defined $b } (
map { $_->build_config_slice(%some_params, some_other => 'param') }
($self->partial_config_strategies, $other_config_strategy)
);
This approximates something that my code might do: building a configuration for an object based on various configuration strategy objects (some of which the object knows about inherently, plus some extra guy) and then filters out some of them as irrelevant.
(Yes, we have nice tools like hashgrep and hashmap and lkeys that do useful things to hashes. $a and $b get set to the key and the value of each item in the list, respectively). (Yes, we have people who can program at this level. Hiring is obnoxious, but we have a quality product.)
If you don't intend to do anything resembling functional programming like this, or if you need more performance (have you profiled?) then sure, use hashrefs.
Uh... "passing hashes can only be done by reference"?
sub foo(%) {
    my %hash = @_;
    do_stuff_with(%hash);
}
my %hash = (a => 1, b => 2);
foo(%hash);
What am I missing?
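One possible reading of the original claim: a single hash can be passed as a flat list, as above, but the sub receives a copy, and two hashes passed that way merge into one list with no boundary between them. A small sketch, with made-up sub names, that shows the difference:

```perl
use strict;
use warnings;

# Hypothetical sub taking hashes by value: everything arrives as one flat list.
sub count_flat {
    my %merged = @_;
    return scalar keys %merged;
}

# Hypothetical sub taking hash references: each hash stays separate.
sub count_each {
    my ($first, $second) = @_;
    return ( scalar keys %$first, scalar keys %$second );
}

my %h1 = ( a => 1, b => 2 );
my %h2 = ( c => 3 );

my $flat = count_flat( %h1, %h2 );          # boundaries between the hashes are lost
my ($n1, $n2) = count_each( \%h1, \%h2 );   # sizes of each hash are preserved

print "flat: $flat\n";            # flat: 3
print "separate: $n1 and $n2\n";  # separate: 2 and 1
```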
I would say that if the issue is that you need to have multiple outputs from a function, it's better as a general practice to output a data structure, probably a hash, that holds everything you need to send out rather than taking modifiable references as arguments.