BigQuery: Hashing a String Doesn't Match CityHash

Trying to get my external CityHash to return the same value as BigQuery Hash().
Here are the values I'm trying to match:
The only hashed string that matches is a blank string.
The BigQuery Query Reference mentions that Hash() uses the CityHash library. I've tried multiple external CityHash libraries, and they're all consistent with each other, but not with BigQuery's Hash().
Here is an example of CityHash in Go (Golang):
package main

import (
	"fmt"

	"bitbucket.org/creachadair/cityhash"
)

func main() {
	// Hash each test string and print the result as a signed 64-bit integer.
	for _, s := range []string{"mystringtohash", "1234", ""} {
		myHash := int64(cityhash.Hash64([]byte(s)))
		fmt.Printf("Hashed version of '%s': %d\n", s, myHash)
	}
}
Here is the output from my program:
Hashed version of 'mystringtohash': -6615946700494525143
Hashed version of '1234': 882600748797058222
Hashed version of '': -7286425919675154353
Is BigQuery doing something special with the string before hashing it?

OK, I spent some time going through the code, and here is what I think happened.
BigQuery's implementation of CityHash is based on the code in version 1.0.3 (which can still be downloaded from https://code.google.com/p/cityhash/downloads/detail?name=cityhash-1.0.3.tar.gz).
The Go implementation you used seems to be a port of version 1.1.1 (which can be downloaded from https://code.google.com/p/cityhash/downloads/detail?name=cityhash-1.1.1.tar.gz).
Unfortunately, these versions seem to be incompatible, because the hash functions changed in version 1.1, as noted in the README (emphasis is mine):
CityHash v1.1, October 22, 2012
Add CityHash32(), intended for 32-bit platforms.
Change existing functions to improve their hash quality and/or speed. Most of the changes were minor, but CityHashCrc* was substantially reworked (and made perhaps 10% slower, unfortunately).
Improve README.
I am not sure what the right thing to do here is. Maybe BigQuery should update its implementation to match version 1.1.1, but that could be a breaking change for existing users who rely on the current values. At least we now know what is going on.
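When comparing hash values across implementations, it is also worth ruling out a plain signed/unsigned difference before blaming the algorithm. The sketch below is only an illustration in Python (the helper names are mine, not part of any CityHash library), and it assumes one side reports a signed 64-bit integer, as the int64 cast in the question's Go program does:

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # all 64 bits set


def to_unsigned64(value):
    """Reinterpret a (possibly negative) integer as an unsigned 64-bit value."""
    return value & MASK64


def to_signed64(value):
    """Reinterpret an unsigned 64-bit value as a two's-complement signed integer."""
    value &= MASK64
    return value - (1 << 64) if value >= (1 << 63) else value


# The signed value printed by the Go program in the question.
signed_hash = -6615946700494525143
unsigned_hash = to_unsigned64(signed_hash)
print(unsigned_hash)
print(to_signed64(unsigned_hash) == signed_hash)
```

If the two numbers still differ after normalising both to the same representation, the mismatch comes from the hash function itself, which is the case described above.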

How to implement Python's digest function in Raku?

In Python, hash objects from the hashlib module have a digest method, which returns the digest of a byte string:
import hashlib
id = "65766"
song_id = bytearray(id, "u8")
m = hashlib.md5(song_id)
result = m.digest()
print(result)
# Output
# b"\xc9j\xa4/vy}+'\xe6\x8e\xe4\xcc\xd8\xa8\xc8"
I found a module on raku.land named Digest::MD5, but it doesn't provide a digest sub:
my $d = Digest::MD5.new;
my $id = "65766";
my Buf $md5-buf = $d.md5_buf($id);
# ???
I don't want to introduce Inline::Python or Inline::Perl5 into my project. Is it possible to implement a digest sub in Raku?
TL;DR Try the md5 sub of the Digest package.
Digest
A glance at raku.land shows various options. The first one was last updated an hour ago, and while that doesn't prove anything about quality or functionality, it's at least "promising". (And it's grondilu. I trust cosimo too, but the updates suggest grondilu is engaged and cosimo isn't.)
So I suggest you read Digest's README and/or install it and read its code. At a glance I would expect the md5 sub to work.
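When checking whichever Raku sub you end up using, it may help to have the Python reference values in both forms. This is a minimal sketch using only hashlib; the helper name md5_forms is made up for illustration. digest() yields the 16 raw bytes, and hexdigest() yields the same bytes as a hex string, which is often easier to compare by eye:

```python
import hashlib


def md5_forms(text):
    """Return the MD5 of text (encoded as UTF-8) as raw bytes and as hex."""
    hasher = hashlib.md5(text.encode("utf-8"))
    return hasher.digest(), hasher.hexdigest()


raw, hex_form = md5_forms("65766")
print(len(raw))   # number of raw bytes in an MD5 digest
print(hex_form)   # the same digest written as hexadecimal
print(bytes.fromhex(hex_form) == raw)
```

Comparing the hex form on both sides avoids differences in how each language prints a byte buffer.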
Digest::MD5
From its README:
An interface-compatible port of Perl 5 Digest::MD5
Raku's standard strings are dramatically different from Perl's or Python's. I won't comment further other than to say this seems likely to be a major source of friction that is pointless unless you really need to have interface compatibility with Perl.
Should work with latest (2012.01) release of Rakudo
Wow. It's had an update in 2017, but I see unresolved PRs and, well, to be as open minded as possible, I'll say this package might work for folk who are:
Needing the same interface as the corresponding Perl package.
OK with presuming that everything is as it was more than 10 years ago (4 years before the first official version of the language and compiler were released) or updating the package if needed.
Willing to consider a package that seems to be no longer actively stewarded.
To be pragmatic / closed minded, I'd initially presume this package should only be considered if all of the above applies and you've failed to meet your (relatively simple/basic) needs some much more promising way.

music21 getElementsByClass not showing any output for class stream.Voice

I am struggling to understand why the code below throws an error when it ran seamlessly about a year ago. The code snippet is from a popular Coursera course. Has the music21 package had any recent changes around stream.Voice?
data_fn = 'data/original_metheny.mid'
midi_data = converter.parse(data_fn)
melody_stream = midi_data[5] # For Metheny piece, Melody is Part #5.
melody1, melody2 = melody_stream.getElementsByClass(stream.Voice)
The error thrown is ValueError: not enough values to unpack (expected 2, got 0), which means there is no output for stream.Voice class when previously there were outputs for the same data (midi file). melody_stream.getElementsByClass('Measure') does show outputs.
Can you suggest how to debug this?
Yes, one of the improvements in music21 v.7 is that files imported from MIDI now have a similar representation to files imported from MusicXML and other formats. Specifically, Parts now have Measures, which may or may not have Voices, rather than Parts directly containing Voices. Code should not depend on finding Voices directly contained in Parts, which is what this example was doing.
Instead, use this code to find all the measure-voices:
melody_stream.recurse().getElementsByClass(stream.Voice)
Or, equivalently, use the shortcut syntax in v.7:
melody_stream[stream.Voice]
Or, if you don't want the measures at all, call flatten() or chordify() depending on your use case.
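Whichever lookup you use, the original line unpacks into exactly two names, so it will still fail if the search returns any other number of voices. A small guard makes that easier to diagnose than a bare unpacking error. This is a generic Python sketch, not a music21 API; the helper name unpack_exactly is hypothetical:

```python
def unpack_exactly(items, expected):
    """Return items as a list, or raise a descriptive error if the count differs."""
    found = list(items)
    if len(found) != expected:
        raise ValueError(
            f"expected {expected} element(s), found {len(found)}: {found!r}"
        )
    return found


# Hypothetical usage with the stream from the question:
# melody1, melody2 = unpack_exactly(melody_stream[stream.Voice], 2)

first, second = unpack_exactly(["voice-a", "voice-b"], 2)
print(first, second)
```

The error message then reports how many elements were actually found and what they were, which is a quicker starting point for debugging.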
What worked for me was downgrading the music21 package to a version older than 7.x. If you already have a newer version of music21 installed, remove it with pip uninstall music21, then install version 6.7.0 with pip install music21==6.7.0.

Kaitai Struct Parameter Type

I am trying to pass a parameter to a ksy file. The parameter's type is defined in another ksy file, because I need to access all the fields of the type passed as the parameter.
Is that possible?
If yes, could you please provide a syntax snippet that I can mimic?
If not, what would be another solution?
Thank You.
Affiliate disclaimer: I'm a Kaitai Struct maintainer (see my GitHub profile).
First, I recommend always using the development version of the Kaitai Struct Web IDE (https://ide.kaitai.io/devel/), not the stable one. The stable IDE deployed at https://ide.kaitai.io/ has KS compiler of version 0.8, which is indeed the latest stable version, but already 2 years old at the moment. But the project is under active development, new bug fixes and improvements are coming every week, so the stable Web IDE is pretty much outdated. And thanks to the recent infrastructure enhancement, the devel Web IDE now gets rebuilt every time the compiler is updated, so you can use even the most recent features.
However, you won't be able to simulate the particular situation you describe in the Web IDE, because it can't currently handle top-level parametric types (there is no hook where you can pass your own values as arguments). But it should work in a local environment. You can compile the commontype.ksy and pty.ksy specs in the Web IDE to the target language you want to use (the manual shows how to do it). The code putting it together could look like this (Java):
Commontype ct = new Commontype(new ByteBufferKaitaiStream(new byte[] { 80, 75 }));
Pty r = new Pty(
    new ByteBufferKaitaiStream(new byte[] { 80 }), // IO stream
    ct // commonword
);
Note that the actual parameter order of the Pty constructor may differ between target languages. In Python, for example, the custom params (commonword) come first and the IO object follows. Check the generated code in your particular language.

`@babel/runtime` and `@babel/plugin-transform-runtime` versions

Are @babel/runtime and @babel/plugin-transform-runtime supposed to be on the same version (e.g. both 7.2.0 exactly)? Or can I (as a library author) specify the @babel/runtime dependency as ^7.0.0, whilst having the latest @babel/plugin-transform-runtime?
I'm aware that during the beta versions of Babel 7, there was a breaking change in beta.56 (see https://stackoverflow.com/a/51686837/2148762), but I'm guessing this should no longer be the case with the current stable version?
The reason I ask is that I'd ideally want the helpers from @babel/runtime to be shared across different packages, and to me leaving the version range open seems like a good idea. But at the same time, I'm not sure how low I should go (^7.0.0 or ^7.2.0), and whether there's an implicit contract between @babel/runtime and @babel/plugin-transform-runtime with regard to version numbers.
By default, @babel/plugin-transform-runtime is only allowed to output references to @babel/runtime that work on ^7.0.0, because it does not know what version you'd otherwise want to use, and doing anything else would cause lots of issues for users. This means that what you want to do is safe. The downside is that, if we add new helpers moving forward, your code would not be able to take advantage of the @babel/runtime version of them (because you might still be using a @babel/runtime version that doesn't have them).
Users can specify the version in the arguments for the transform, if you want to specifically make use of helpers that may have been added to Babel since 7.0.0, e.g.
{
  "plugins": [
    ["@babel/plugin-transform-runtime", { "version": "^7.2.0" }]
  ]
}
would then require you to have "@babel/runtime": "^7.2.0" in your package.json.
For instance, since support for the newer decorators proposal didn't land until Babel 7.1.5, if you use transform-runtime and non-legacy decorators, the decorators helper will still be inserted into every file where you use decorators, instead of importing it from @babel/runtime. To get the shared helper, you need to specify version: "^7.1.5" in your options for transform-runtime.
Can I (as a library author) specify @babel/runtime dependency as ^7.0.0, whilst having the latest @babel/plugin-transform-runtime?
Yes, this is safe.
I'm guessing this should no longer be the case with the current stable version?
Correct, that issue was because people failed to take the beta versioning into account.
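To make the trade-off between ^7.0.0 and ^7.2.0 concrete, here is a rough Python sketch of how a caret range restricts the installed runtime version. It is a simplification written for illustration, not npm's or Babel's actual resolver: it assumes a major version of 1 or higher and ignores pre-release tags, and the function names are mine.

```python
def parse_version(version):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def satisfies_caret(installed, caret_range):
    """Return True if installed satisfies a caret range such as '^7.2.0'.

    Simplified rule (major >= 1, no pre-release tags): same major version,
    and at least the minimum version named in the range.
    """
    minimum = parse_version(caret_range.lstrip("^"))
    candidate = parse_version(installed)
    return candidate[0] == minimum[0] and candidate >= minimum


for runtime in ("7.0.0", "7.1.5", "7.4.0"):
    print(runtime, satisfies_caret(runtime, "^7.0.0"), satisfies_caret(runtime, "^7.2.0"))
```

Under these assumptions, a wider range such as ^7.0.0 accepts more installed runtimes, which helps sharing across packages, while a narrower range such as ^7.2.0 excludes older runtimes that lack the newer helpers.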

latencySmoothFactor equivalent in mongo driver 2.12.0

The MongoEnvironmentSetter#setLatencySmoothFactor method updates the system property com.mongodb.latencySmoothFactor, which is no longer available as of 2.12.0 (related git hashes: 0375c984fccf9cb0868b406c145f8fd3e263348c, 1ae976fa2342cdddeade622f293dc3fccbb80a58). I found the following tickets related to that:
https://jira.mongodb.org/browse/JAVA-763
https://jira.mongodb.org/browse/JAVA-786
https://jira.mongodb.org/browse/JAVA-859
https://jira.mongodb.org/browse/JAVA-930
https://jira.mongodb.org/browse/JAVA-931
But I couldn't find an equivalent for this property. Any thoughts? What steps should a migration take if an application has configured this property?
There is no equivalent in 2.12, as the driver no longer performs any smoothing of round-trip times. Smoothing may be added in a future release, but most likely it won't be configurable. I'm not sure what the MongoEnvironmentSetter is, but assuming this is something you control, your best option is to ignore that property or remove it.