How to implement Python's digest function in Raku?

In Python, the hashlib module provides hash objects with a digest method, which returns the digest of a byte string as raw bytes:
import hashlib
id = "65766"
song_id = bytearray(id, "u8")
m = hashlib.md5(song_id)
result = m.digest()
print(result)
# Output
# b"\xc9j\xa4/vy}+'\xe6\x8e\xe4\xcc\xd8\xa8\xc8"
I found a module on raku.land named Digest::MD5, but it doesn't provide a digest sub:
use Digest::MD5;
my $d = Digest::MD5.new;
my $id = "65766";
my Buf $md5-buf = $d.md5_buf($id);
# ???
I don't want to introduce Inline::Python or Inline::Perl5 into my project. Is it possible to implement a digest sub in Raku?
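For reference, here is what the Python side actually produces. A Python str has to be encoded to bytes before hashing (passing a str straight to hashlib.md5 raises TypeError). digest() returns the 16 raw bytes of the MD5 hash, and hexdigest() is the same bytes written as hex. So the Raku equivalent is a Buf holding those same 16 byte values. This minimal check uses only hashlib. The printed byte list is what you would compare against the Buf you get in Raku. If md5_buf returns the raw digest, which is an assumption here, the values should match.

```python
import hashlib

song_id = "65766".encode("utf-8")  # str -> bytes, like bytearray(id, "u8") in the question
m = hashlib.md5(song_id)

raw = m.digest()          # 16 raw bytes
hex_form = m.hexdigest()  # the same 16 bytes as 32 lowercase hex characters

print(len(raw))                        # 16
print(raw == bytes.fromhex(hex_form))  # True
print(list(raw))                       # the byte values a Raku Buf would need to hold
```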

TL;DR Try the md5 sub of the Digest package.
Digest
A glance at raku.land shows various options. The first one was last updated an hour ago, and while that doesn't prove anything about quality or functionality, it's at least "promising". (And it's by grondilu. I trust cosimo too, but the update history suggests grondilu is actively engaged and cosimo isn't.)
So I suggest you read Digest's README and/or install it and read its code. At a glance I would expect the md5 sub to work.
Digest::MD5
From its README:
An interface-compatible port of Perl 5 Digest::MD5
Raku's standard strings are dramatically different from Perl's or Python's. I won't comment further other than to say this seems likely to be a major source of friction that is pointless unless you really need to have interface compatibility with Perl.
Should work with latest (2012.01) release of Rakudo
Wow. It's had an update in 2017, but I see unresolved PRs and, well, to be as open-minded as possible, I'll say this package might work for folk who are:
Needing the same interface as the corresponding Perl package.
OK with presuming that everything is as it was more than 10 years ago (4 years before the first official version of the language and compiler was released) or updating the package if needed.
Willing to consider a package that seems to be no longer actively stewarded.
To be pragmatic / closed-minded, I'd initially presume this package should be considered only if all of the above applies and you've failed to meet your (relatively simple/basic) needs in some much more promising way.

Related

nytprofhtml seems to ignore a module named DB

I'm trying to profile a big application that contains a module creatively named DB. After running it under -d:NYTProf and calling nytprofhtml, both without any additional switches or related environment variables, I get the usual nytprof directory with HTML output. However, it seems that due to some internal logic, any output related to my DB module is severely mangled. Just to be clear, DB is pure Perl.
In the top and all-subroutines lists, links for other functions point to the relevant -pm-NN-line.html file, but links to subs from DB point to the entry script instead.
The "line" link in the "Source Code Files" section does point to DB-pm-NN-line.html, and that file exists, but unlike all the other files it has no "Statements" table, and its "Line" table contains no lines of code at all, only a summary of calls.
Actually, here's a small example:
# main.pl
use lib '.';
use DB;
for (1..10) {
    print DB::do_stuff();
}
# DB.pm
package DB;
sub do_stuff {
    my $a = 1;
    my $b = 2;
    my $c = $a + $b;
    return $c;
}
1;
Try running perl -d:NYTProf main.pl, then nytprofhtml and then inspect nytprof/DB-pm-8-line.html.
I don't know whether this happens because NYTProf itself has an internal module named DB, or because it handles modules whose names start with DB in some magical way; I've noticed that output for functions from DBI looks somewhat different too.
Is there a way to change/disable this behavior short of renaming my DB module?
That's hardly an option
You don't really have an option. The special DB package and the associated Devel:: namespace are coded into the perl compiler / interpreter. Unless you want to do without any debugging facilities altogether and live in fear of mysterious, unquantifiable side effects, you must refactor your code to rename your proprietary DB library.
Anyway, the name is very generic and that's exactly why it is expected to be encountered
On the contrary, Devel::NYTProf is bound to use the existing core package DB. It's precisely because it is a very generic identifier that an engineer should reject it as a choice for production code, which could be required to work with pre-existing third-party code at any point.
a module creatively named DB
This belies your own support of the choice. DB has been a core module since v5.6 in 2000, and anything already in CPAN should have been off the cards from the start. I feel certain that the clash of namespaces must have been discovered before now. Please don't be someone else who just sweeps the issue under the carpet.
Remember that, as it stands, any code you are now putting in the DB package is sharing space with everything that perl has already put there. It is astonishing that you are not experiencing strange and inexplicable symptoms already.
I don't see that it's too big a task to fix this as long as you have a proper test suite for your 10MB of Perl code. If you don't, then hopefully you will at least not make the same mistakes again.

What happens if I reference a package but don't use/require it?

As much as I can (mostly for clarity/documentation), I've been trying to say
use Some::Module;
use Another::Module qw( some namespaces );
in my Perl modules that use other modules.
I've been cleaning up some old code and see some places where I reference modules in my code without ever having used them:
my $example = Yet::Another::Module->AFunction($data); # EXAMPLE 1
my $demo = Whats::The::Difference::Here($data); # EXAMPLE 2
So my questions are:
Is there a performance impact (I'm thinking compile time) by not stating use x and simply referencing it in the code?
I assume that I shouldn't use modules that aren't utilized in the code - I'm telling the compiler to compile code that is unnecessary.
What's the difference between calling functions in example 1's style versus example 2's style?
I would say that this falls firmly into the category of premature optimisation, and if you're not sure, then leave it in. You would have to be including some vast unused libraries for removing them to help at all.
It is typical of Perl to hide a complex issue behind a simple mechanism that will generally do what you mean without too much thought.
The simple mechanisms are these:
use My::Module 'function' is the same as writing:
BEGIN {
    require My::Module;
    My::Module->import( 'function' );
}
The first time perl successfully executes a require for a given module, it adds an element to the global %INC hash, with the "pathified" module name (in this case, My/Module.pm) as the key and the location where it found the source file as the value.
If another require for the same module is encountered (that is, the module already exists in the %INC hash), then require does nothing.
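These rules can be modelled in a few lines of Python. This is a toy for illustration only and not how perl implements require. The dictionary INC stands in for %INC, and compile_count is a hypothetical counter of how often each file would be compiled. The default search directory is also made up. The model shows the "pathified" key and the fact that a second require costs only a lookup.

```python
# Toy model of require and %INC (illustration only, not perl's implementation).
INC = {}            # stands in for Perl's global %INC: pathified name -> location found
compile_count = {}  # hypothetical: how many times each file was "compiled"


def pathify(module_name):
    """'My::Module' -> 'My/Module.pm', the key format %INC uses."""
    return module_name.replace("::", "/") + ".pm"


def require(module_name, found_in="/hypothetical/lib"):
    key = pathify(module_name)
    if key in INC:
        return  # already loaded: the only overhead is this lookup
    compile_count[key] = compile_count.get(key, 0) + 1  # pretend to compile the source
    INC[key] = found_in + "/" + key                      # remember where it was found


require("My::Module")
require("My::Module")  # second call is a no-op
```

After both calls, the file has been "compiled" once and INC has a single entry for My/Module.pm.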
So your question
What happens if I reference a package but don't use/require it?
We're going to have a problem with use, utilise, include and reference here, so I'm code-quoting only use and require when I mean the Perl language words.
Keeping things simple, these are the three possibilities:
As above, if require is seen more than once for the same module source, then it is ignored after the first time. The only overhead is checking whether there is a corresponding element in %INC.
Clearly, if you use source files that aren't needed then you are doing unnecessary compilation. But Perl is damn fast, and you will be able to shave only fractions of a second from the build time unless you have a program that uses enormous libraries and looks like use Catalyst; print "Hello, world!\n";
We know what happens if you make method calls to a class library that has never been compiled. We get:
Can't locate object method "new" via package "My::Class" (perhaps you forgot to load "My::Class"?)
If you're using a function library, then what matters is the part of use that says
My::Module->import( 'function' );
because the first part is require and we already know that require never does anything twice. Calling import is usually a simple function call, and you would be saving nothing significant by avoiding it.
What is perhaps less obvious is that big modules pull in multiple subsidiary modules. For instance, if I write just
use LWP::UserAgent;
then it knows what it is likely to need, and these modules will also be compiled:
Carp
Config
Exporter
Exporter::Heavy
Fcntl
HTTP::Date
HTTP::Headers
HTTP::Message
HTTP::Request
HTTP::Response
HTTP::Status
LWP
LWP::MemberMixin
LWP::Protocol
LWP::UserAgent
Storable
Time::Local
URI
URI::Escape
and that's ignoring the pragmas!
Did you ever feel like you were kicking your heels, waiting for an LWP program to compile?
I would say that, in the interests of keeping your Perl code clear and tidy, it may be an idea to remove unnecessary modules from the compilation phase. But don't agonise over it, and benchmark your build times before doing any pre-handover tidy. No one will thank you for reducing the build time by 20ms and then causing them hours of work because you removed a non-obvious requirement.
You actually have a bunch of questions.
Is there a performance impact (thinking compile time) by not stating use x and simply referencing it in the code?
No, there is no performance impact, because you can't do that. Every namespace you are using in a working program gets defined somewhere. Either you used or required it before the point where it's called, or one of your dependencies did, or some other way (see footnote 1) was used to make Perl aware of it.
Perl keeps track of those things in symbol tables. They hold all the knowledge about namespaces and variable names. So if your Some::Module is not in the referenced symbol table, Perl will complain.
I assume that I shouldn't use modules that aren't utilized in the code - I'm telling the compiler to compile code that is unnecessary.
There is no question here. But yes, you should not do that.
It's hard to say whether this has a performance impact. If you have a large Catalyst application that just runs and runs for months, it doesn't really matter; startup cost is usually not relevant in that case. But if this is a cronjob that runs every minute and processes a huge pile of data, then an additional module might well have a noticeable performance impact.
That's actually also a reason why all use and require statements should be at the top: it makes them easy to find if you need to add or remove some.
What's the difference between calling functions in example 1's style versus example 2's style?
Those mostly serve different purposes.
my $example = Yet::Another::Module->AFunction($data); # EXAMPLE 1
This syntax is very similar to the following:
my $e = Yet::Another::Module::AFunction('Yet::Another::Module', $data);
It's used for class methods in OOP. The most well-known one would be new, as in Foo->new. It passes the thing on the left of the -> as the first argument to the function named AFunction in that thing's package (the package it was blessed into, if it's an object, or the named package, if it's a bare class name). But it does more. Because it's a method call, it also takes inheritance into account.
# Yet/Another/Module.pm
package Yet::Another::Module;
use parent 'A::First::Module';
1;
# A/First/Module.pm
package A::First::Module;
sub AFunction { ... }
1;
In this case, your example would also call AFunction because it's inherited from A::First::Module. In addition to the symbol table referenced above, it uses @ISA to keep track of who inherits from whom. See perlobj for more details.
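The difference between the two call styles can be sketched as a toy model in Python. The package names come from the question. The dictionaries symbol_tables and ISA are hypothetical stand-ins for Perl's symbol tables and @ISA. The search mimics Perl's default depth-first, left-to-right method resolution. A method call searches the inheritance chain and passes the invocant first. A plain Pkg::name(...) call looks only in that package and passes only the arguments you give it.

```python
# Toy model of Pkg->name(...) versus Pkg::name(...) (illustration only).
symbol_tables = {
    "A::First::Module": {"AFunction": lambda invocant, data: f"{invocant} / {data}"},
    "Yet::Another::Module": {},  # defines nothing itself
}
ISA = {
    "Yet::Another::Module": ["A::First::Module"],  # use parent 'A::First::Module';
    "A::First::Module": [],
}


def resolve_method(package, name):
    """Depth-first, left-to-right search through @ISA (Perl's default order)."""
    if name in symbol_tables.get(package, {}):
        return symbol_tables[package][name]
    for parent in ISA.get(package, []):
        found = resolve_method(parent, name)
        if found is not None:
            return found
    return None


def method_call(invocant, name, *args):
    """Pkg->name(@args): searches inheritance and passes the invocant first."""
    sub = resolve_method(invocant, name)
    if sub is None:
        raise AttributeError(f'Can\'t locate object method "{name}" via package "{invocant}"')
    return sub(invocant, *args)


def function_call(package, name, *args):
    """Pkg::name(@args): looks only in that package, no inheritance, no invocant."""
    sub = symbol_tables.get(package, {}).get(name)
    if sub is None:
        raise NameError(f"Undefined subroutine &{package}::{name} called")
    return sub(*args)
```

For example, method_call("Yet::Another::Module", "AFunction", "data") finds the inherited sub. function_call("Yet::Another::Module", "AFunction", "data") fails, because a plain function call does not consult @ISA.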
my $demo = Whats::The:Difference::Here($data); # EXAMPLE 2
This has a syntax error. There is a : missing after The.
my $demo = Whats::The::Difference::Here($data); # EXAMPLE 2
This is a function call. It calls the function Here in the package Whats::The::Difference and passes $data and nothing else.
Note that as Borodin points out in a comment, your function names are very atypical and confusing. Usually functions in Perl are written with all lowercase and with underscores _ instead of camel case. So AFunction should be a_function, and Here should be here.
1) for example, you can have multiple package definitions in one file, which you should not normally do, or you could assign stuff into a namespace directly with syntax like *Some::Namespace::frobnicate = sub {...}. There are other ways, but that's a bit out of scope for this answer.

BigQuery: Hashing a String Doesn't Match CityHash

I'm trying to get my external CityHash implementation to return the same value as BigQuery's Hash().
Here are the values I'm trying to match:
The only hashed string that matches is a blank string.
The BigQuery Query Reference mentions that it uses the CityHash library. I've tried using multiple external libraries for CityHash, and they're all consistent with each other, but not with BigQuery Hash().
Here is an example of CityHash in Go (Golang):
package main
import (
	"fmt"
	"bitbucket.org/creachadair/cityhash"
)
func main() {
	// Hash the same three inputs that appear in the output below.
	for _, s := range []string{"mystringtohash", "1234", ""} {
		bytesToHash := []byte(s)
		// Reinterpret the unsigned 64-bit hash as a signed int64 for printing.
		myHash := int64(cityhash.Hash64(bytesToHash))
		fmt.Printf("Hashed version of '%s': %d\n", bytesToHash, myHash)
	}
}
Here is the output from my program:
Hashed version of 'mystringtohash': -6615946700494525143
Hashed version of '1234': 882600748797058222
Hashed version of '': -7286425919675154353
Is BigQuery doing something special with the string before hashing it?
OK, I spent some time going through the code, and here is what I think happened.
BigQuery's implementation of CityHash is based on code in version 1.0.3 (can be still downloaded from here https://code.google.com/p/cityhash/downloads/detail?name=cityhash-1.0.3.tar.gz)
The golang implementation you used seems to be a port of version 1.1.1 (can be downloaded from here https://code.google.com/p/cityhash/downloads/detail?name=cityhash-1.1.1.tar.gz)
Unfortunately, these versions seem to be incompatible: the hash functions changed in version 1.1, as noted in the README (emphasis is mine):
CityHash v1.1, October 22, 2012
Add CityHash32(), intended for 32-bit platforms.
Change existing functions to improve their hash quality and/or speed. Most of the changes were minor, but CityHashCrc* was substantially reworked (and made perhaps 10% slower, unfortunately).
Improve README.
I am not sure what the right thing to do here is. Maybe BigQuery should update its implementation to match version 1.1.1, or maybe that would be a breaking change for existing users who rely on it. But at least we know what is going on now.
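One side note when comparing results across ports: some CityHash libraries return an unsigned 64-bit value. The Go code above, and the numbers it prints, use a signed int64 instead. The same bit pattern can therefore show up as two different decimal numbers. That is not what explains the mismatch here (the version change is), but it is worth ruling out first. This is a small Python sketch of the reinterpretation, using plain integer arithmetic.

```python
MASK64 = (1 << 64) - 1


def to_int64(u):
    """Reinterpret an unsigned 64-bit value as signed two's complement (what Go's int64(x) does)."""
    u &= MASK64
    return u - (1 << 64) if u >= (1 << 63) else u


def to_uint64(i):
    """Inverse: the unsigned bit pattern of a signed 64-bit value."""
    return i & MASK64
```

Converting both sides to the same signedness before comparing means that any remaining difference comes from the hash itself.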

Perl: CPAN - Module modifying and adding functionality

I found a module that I want to change.
My situation has these features:
I want to add functionality and flexibility to this module.
The module currently works, but the web service it was written for has changed its API.
I also want to reuse this module's code.
It is not my module.
I want to fix some bugs.
What should I do in this situation?
Inherit from this module, add functionality, and upload it to CPAN?
Ask the author about my modifications (and re-upload the module)?
Something else?
There are various ways to modify a module as you use it, and I cover most of them in Mastering Perl.
As Dave Cross mentions, send fixes upstream or become part of that project. It sounds like you have the ambition to be a significant contributor. :)
Create a subclass to replace methods
Override or overload subroutines or methods
Wrap subroutines to modify or adapt either inputs or outputs (e.g. Hook::LexWrap)
Create a locally patched version, and store it separately from the main code so it doesn't disappear in an upgrade
For example, this is something I often do directly in program code while I wait for an upstream fix:
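use Some::Module; # load the original first
BEGIN {
    package Some::Module;
    no warnings 'redefine';
    # Only patch the releases known to have the bug; fixed releases are left alone.
    if( $Some::Module::VERSION > 1.23 and $Some::Module::VERSION < 1.45 ) {
        *broken = sub { ... fixed version ... };
    }
}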
This way, I have the fix even if the target module is upgraded.
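The same version-gated patch can be written in Python. First load the original module. Then replace the broken function only when the installed version is in the known-bad range. A later release that fixes the bug is then left untouched. The module object, its __version__ and the function broken are hypothetical stand-ins for Some::Module. As in the Perl snippet, versions are compared as plain decimal numbers.

```python
import types

# Hypothetical stand-in for the third-party module (Some::Module above).
some_module = types.ModuleType("some_module")
some_module.__version__ = "1.30"
some_module.broken = lambda x: x - 1  # the bug: it should add 1


def apply_local_fix(mod):
    """Redefine mod.broken only for versions in the known-bad range 1.23 < v < 1.45."""
    version = float(mod.__version__)  # numeric comparison, like Perl's $VERSION check
    if 1.23 < version < 1.45:
        mod.broken = lambda x: x + 1  # fixed version
        return True
    return False  # outside the range: keep the upstream code


applied = apply_local_fix(some_module)
```

Keeping the patch in your own code means an upgrade of the module can't silently remove it. The version check means it stops applying once upstream ships a fix.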
I think that your a and b options are pretty much the best approach - although I'd probably do them the other way round.
Approach the module author with your suggestions. Read the module documentation to find out how the author likes to be contacted. Some like email, some like RT, others have more specific methods. Authors tend to like suggestions better if they come with tests for the new/changed code and patches that can be applied freely. Perhaps the code is on GitHub. Can you fork it, make your changes and send the author a pull request?
If the author is unresponsive, then consider either forking or subclassing their code. In this case, naming is important, as you want people to be able to find your module as well as the original one. You'll also want to carefully document the differences between your version and the original one so that people can choose which one they want.

Dependencies in Perl code

I've been assigned to pick up a web application written in old legacy Perl code, get it working on our server, and later extend it. The code was written 10 years ago by a solitary self-taught developer...
The code has weird stuff going on: they are not afraid to do lib-param.pl on line one, and later in the file do /lib-pl/lib-param.pl, which is of course a different file.
Including a.pl with subs b() and c() and later including d.pl with subs c() and e() seems to be quite popular too... Packages appear to be unknown, so you'll just find &c() somewhere later in the code.
Interesting questions:
Is there a tool that can draw relations between perl-files? Show a list of files used by each other file?
The same for MySQL databases and tables? Can it show which schemas/tables are used by which files?
Is there an IDE that knows which c() is called - the one in a.pl or the one in d.pl?
How would you start to try to understand the code?
I'm inclined to go through each file and refactor it, but am not allowed to do that - only the strict minimum to get the code working. (But since the code never uses strict, I don't know if I'm gonna...)
Not using strict is a mistake -- don't continue it. Move the stuff in d.pl to D.pm (or perhaps a better name altogether), and if the code is procedural, use Sub::Exporter to get those subs back into the calling package. strict is lexically scoped, so you can turn it on for just one package, such as your new package D. To find out which code is being called, use Devel::SimpleTrace.
perl -MDevel::SimpleTrace ./foo.pl
Now any warnings will be accompanied by a full stack trace -- sprinkle warnings around the code and run it.
I think the MySQL question should be removed from this. Schema/table mappings have nothing to do with Perl; it seems an out-of-place distraction on this question.
I would write a utility to scan a complete list of all subs and which file they live in; then I would write a utility to give me a list of all function calls and which file they come from.
By the way - it is not terribly hard to write a fairly mindless static analysis tool to generate a call graph.
For many cases, in well-written code, that will be enough to help me out...
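Here is a minimal sketch of such a mindless static analysis tool in Python. It indexes which files define each sub and which known subs each file appears to call, using both &c() and plain c() forms. It also records which files each file pulls in with require or do. Finally, it lists subs defined in more than one file, such as c() in both a.pl and d.pl. The regexes are crude assumptions. They ignore strings, comments, POD and heredocs, and they can't tell which of two same-named subs wins at run time. Expect noise, and treat the output as a starting map, not a call graph you can trust.

```python
import re
from collections import defaultdict
from pathlib import Path

# Crude patterns: no real Perl parsing, so strings/comments/POD/heredocs can cause noise.
SUB_DEF = re.compile(r"^\s*sub\s+([A-Za-z_]\w*)", re.MULTILINE)
SUB_CALL = re.compile(r"&([A-Za-z_]\w*)|\b([A-Za-z_]\w*)\s*\(")
FILE_LOAD = re.compile(r"""\b(?:require|do)\s*\(?\s*['"]([^'"]+)['"]""")


def analyze(root):
    """Return (definitions, calls, loads) for every .pl/.pm file under root."""
    root = Path(root)
    files = sorted(p for p in root.rglob("*") if p.is_file() and p.suffix in (".pl", ".pm"))
    texts = {str(p.relative_to(root)): p.read_text(errors="replace") for p in files}

    definitions = defaultdict(list)  # sub name -> files that define it
    for name, text in texts.items():
        for sub in SUB_DEF.findall(text):
            definitions[sub].append(name)

    calls = defaultdict(set)   # file -> known subs it appears to call
    loads = defaultdict(list)  # file -> files it pulls in with require/do
    for name, text in texts.items():
        for amp, bare in SUB_CALL.findall(text):
            sub = amp or bare
            if sub in definitions:
                calls[name].add(sub)
        loads[name].extend(FILE_LOAD.findall(text))
    return definitions, calls, loads


def ambiguous(definitions):
    """Subs defined in more than one file, e.g. c() in both a.pl and d.pl."""
    return {sub: files for sub, files in definitions.items() if len(files) > 1}
```

The loads mapping gives the file-to-file relations. The output of ambiguous() is the list of places where you need to work out which definition is actually in effect, for example with Devel::SimpleTrace.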