Cassandra CLI: RowKey with RandomPartitioner = Show Key in Cleartext?

Using the Cassandra CLI gives me the following output:
RowKey: 31307c32333239
=> (super_column=3f18d800-fed5-17cf-b91a-0016e6df0376,
(column=date, value=1312289229287, timestamp=1312289229647000)
I am using RandomPartitioner. Is it somehow possible to get the RowKey (from the CLI) in cleartext? Ideally in the CLI itself, but a helper class that converts it back into a String would also be fine.
I know the key is somehow hashed. If the key cannot be retrieved (which I have to assume), is there a helper class exposed in Cassandra that I can use to generate the key from my original String, so that I can compare the two?
My problem: I have stored records in Cassandra, but using a key like "user1|order1" I am not able to retrieve them; Cassandra does not find any records. I assume my keys are somehow wrong, and I need to compare them to find out where the problem is...
Thanks very much! Jens

This question is highly relevant: Cassandra cli: Convert hex values into a human-readable format
The only difference is that in Cassandra 0.8 there is now a "key_validation_class" attribute per column family, which defaults to BytesType. When the CLI sees that it's BytesType, it displays the key in hex.

Cassandra is treating your RowKey as hex bytes: 31 30 7c 32 33 32 39 in hex is 10|2329 in ASCII, which I guess is your key.
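No special helper class is needed for the conversion itself; any hex decoder recovers the string. For example, in Python:

# Decode the CLI's hex row key back into its original string form.
row_key_hex = "31307c32333239"
print(bytes.fromhex(row_key_hex).decode("ascii"))  # prints: 10|2329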
If you want to see your plaintext keys in the CLI, then you need to give the command assume MyColumnFamily keys as ascii; which will make the CLI automatically translate between ASCII and hex for you in the row keys.
Or, when you create the column family, use create column family MyColumnFamily with key_validation_class = AsciiType; which will set the preference permanently.
The RandomPartitioner hashing happens at a lower level (it determines which node stores a row, not how the CLI displays the key) and doesn't affect your problem.

Related

nvarchar(), MD5, and matching data in BigQuery

I am consolidating a couple of datasets in BQ, and in order to do that I need to run MD5 on some data.
The problem I'm having is that a chunk of the data arrives from Azure already MD5'ed, and the original field is nvarchar().
I'm not that familiar with Azure, but what I find is that:
HASHBYTES('MD5', CAST('6168a1f5-6d4c-40a6-a5a4-3521ba7b97f5' AS nvarchar(max)))
returns
0xCD571D6ADB918DC1AD009FFC3786C3BC (which is the expected value),
whereas
HASHBYTES('MD5', '6168a1f5-6d4c-40a6-a5a4-3521ba7b97f5')
returns
0x94B04255CD3F8FEC2B3058385F96822A, which is equivalent to what I get if I run
MD5(TO_HEX(MD5('6168a1f5-6d4c-40a6-a5a4-3521ba7b97f5'))) in BigQuery. Unfortunately, that is not what I need to match: I need to match the nvarchar version in BQ, but I cannot figure out how to do that.
Figured out the problem; posting here for posterity.
The field in Azure is stored as nvarchar(50), which is encoded as UTF-16LE, so HASHBYTES hashes the UTF-16LE bytes of the string, while BigQuery's MD5 hashes its UTF-8 bytes.
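You can confirm the encoding difference outside either database by hashing both encodings of the same string. A minimal Python sketch; the expected digests are the two values quoted above:

import hashlib

value = "6168a1f5-6d4c-40a6-a5a4-3521ba7b97f5"

# SQL Server's nvarchar is UTF-16LE, so hashing the UTF-16LE bytes should
# reproduce the nvarchar HASHBYTES value; the UTF-8 bytes give the varchar one.
print(hashlib.md5(value.encode("utf-16-le")).hexdigest())  # cd571d6adb918dc1ad009ffc3786c3bc
print(hashlib.md5(value.encode("utf-8")).hexdigest())      # 94b04255cd3f8fec2b3058385f96822a

To reproduce the nvarchar hash inside BigQuery itself, the string would have to be converted to UTF-16LE bytes before MD5 (for example in a JavaScript UDF), since BigQuery's MD5 operates on UTF-8 strings or raw bytes.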

What's a best practice for saving a unique, random, short string to db?

I have a table with a varchar column named key, which is supposed to hold a unique, 8-character random string that will be used as a unique identifier by users. This field should be generated and saved when an object is created, and I have a question about how to create it:
Most recommendations point to a UUID field, but that isn't applicable for me: a UUID is too long, and if I take just a subset of it there is no guarantee of uniqueness.
Currently I've implemented a loop in my backend (not the DB) that generates a random string, tries to insert it, and retries if the string turns out not to be unique. But I feel this is really bad practice.
What's the best way to do this?
I'm using PostgreSQL 9.6.
UPDATE:
My main concern is to remove the loop that retries until it finds a random short string (or number, doesn't matter) that is unique in the table. AFAIK the solution would be to generate the string in the DB itself. The only things I can find for PostgreSQL are uuid and uuid-ossp, which do something like this, but a UUID is way too long for my application, and I don't know of any way to get a shorter representation of a UUID without compromising its uniqueness (and I don't think it's theoretically possible).
So, how can I remove the loop and its back-and-forth with the DB?
Encryption is guaranteed unique; it has to be, otherwise decryption would not work. Provided you encrypt unique inputs, such as 0, 1, 2, 3, ..., you are guaranteed unique outputs.
You want 8 characters, and you have 62 to play with (A-Z, a-z, 0-9), so convert the binary output of the encryption to a base-62 number.
Because no cipher's block size lines up with 62^8 exactly, you may need the cycle-walking technique from format-preserving encryption to handle the few outputs that fall outside that range; see the sketch below.
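Here is a minimal Python sketch of the scheme, under stated assumptions: a toy 4-round Feistel permutation over 48-bit values (62^8 is just under 2^48) built from SHA-256 as the round function, with a made-up key. It illustrates the idea and is not vetted cryptography:

import hashlib

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
DOMAIN = 62 ** 8  # number of distinct 8-character base-62 strings, just under 2^48

def feistel48(n, key, rounds=4):
    # A balanced Feistel network is a permutation on 48-bit integers,
    # so distinct inputs always map to distinct outputs.
    left, right = n >> 24, n & 0xFFFFFF
    for r in range(rounds):
        digest = hashlib.sha256(key + bytes([r]) + right.to_bytes(3, "big")).digest()
        left, right = right, left ^ int.from_bytes(digest[:3], "big")
    return (left << 24) | right

def counter_to_key(counter, key=b"demo-secret"):
    # Cycle walking: 2^48 > 62^8, so re-encrypt until the value lands in
    # range. This keeps the mapping a bijection on [0, 62^8).
    n = feistel48(counter, key)
    while n >= DOMAIN:
        n = feistel48(n, key)
    # Base-62 encode, zero-padded to exactly 8 characters.
    chars = []
    for _ in range(8):
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))

print(counter_to_key(1), counter_to_key(2), counter_to_key(3))

Feed it successive values from a Postgres SEQUENCE and every counter maps to a distinct, random-looking 8-character key, with no retry loop and no uniqueness check against the table.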

Qlik Sense not getting longer strings from Hive

We have a few dashboards that source data from Apache Hive (CDH); the data comes in through the Cloudera Hive ODBC connection.
What we are observing is that longer string values from the source (e.g., longer than 275-280 characters) show up as null in Qlik Sense.
Wondering if anyone else has faced similar issues? How did you handle it?
Found it.
The Cloudera Hive driver sets the default length of strings to 255, due to limitations in Hive and its metadata. From the driver guide:
In the Default String Column Length field, type the default string column length to use. Note: Hive does not provide the length for String columns in its column metadata. This option allows you to tune the length of String columns.
In the driver configuration, go to Advanced settings, where you can edit the default length for string columns. When I set the length to 99999, all my relevant attributes flowed through.
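For scripted connections, the same option can usually be passed directly in the ODBC connection string. A hedged Python/pyodbc sketch; the driver name and host are placeholders, and DefaultStringColumnLength is the setting name used in the Cloudera/Simba driver documentation (verify it against your driver version):

import pyodbc

# Hypothetical connection details; DefaultStringColumnLength overrides the
# driver's 255-character default for Hive STRING columns.
conn = pyodbc.connect(
    "Driver=Cloudera ODBC Driver for Apache Hive;"
    "Host=hive-host.example.com;"
    "Port=10000;"
    "DefaultStringColumnLength=99999;"
)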
Hope it helps others.

MongoDB generate UUID with dashes

I need to add a random UUID with dashes to my column GUID.
How should I write my SQL script? I was using GUID = MD5(UUID()), but that generates a value without dashes. What would be the solution?
If you use MD5(UUID()), it is no longer a UUID, and the "guaranteed unique" property is lost as well. Don't do that.
That said, a UUID is a 128-bit number. It doesn't have dashes; some representations do, but that's purely a display issue.
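To illustrate that the dashes are only a property of the representation, here is the same 128-bit value rendered three ways, using Python's uuid module:

import uuid

u = uuid.uuid4()   # a random (version 4) UUID: one 128-bit number
print(str(u))      # canonical text form, with dashes
print(u.hex)       # the same 128 bits, without dashes
print(u.int)       # the same 128 bits as a plain integer

In MySQL, UUID() already returns the canonical dashed form, so it can be stored as-is without any hashing.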

deserialize cassandra row key

I'm trying to use Cassandra's sstablekeys utility to retrieve all the keys currently in the sstables of a Cassandra cluster. When I run sstablekeys on a single sstable, the keys come back in what appears to be a serialized format. Does anyone know how to deserialize the keys, or how to get them back into their original format? They were inserted into Cassandra using Astyanax, where the serialized type is a Scala tuple. The key_validation_class for the column family is CompositeType.
Thanks to the comment by Thomas Stets, I was able to figure out that the keys are actually just converted to hex and printed. See here for a way of getting them back into their original format.
For the specific problem of figuring out the format of a CompositeType row key and un-hexing it, see the Cassandra source, which describes the format of a CompositeType key as output by sstablekeys. With CompositeType(UTF8Type, UTF8Type, Int32Type), the UTF8Type components can be read byte-for-byte as ASCII characters (so the function linked above works for them), but the Int32Type component must be interpreted as a single 4-byte number.
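A minimal Python sketch of that decoding, assuming the standard CompositeType layout (per component: a 2-byte big-endian length, the value bytes, and a 1-byte end-of-component marker) and a made-up example key:

import struct

def parse_composite(hex_key, types):
    # Walk the components of a CompositeType row key as printed by sstablekeys.
    raw = bytes.fromhex(hex_key)
    parts, offset = [], 0
    for t in types:
        (length,) = struct.unpack_from(">H", raw, offset)  # 2-byte length prefix
        offset += 2
        value = raw[offset:offset + length]
        offset += length + 1  # skip the value and the end-of-component byte
        if t == "utf8":
            parts.append(value.decode("utf-8"))
        else:  # "int32"
            parts.append(struct.unpack(">i", value)[0])
    return tuple(parts)

# Hypothetical key encoding the components 'user1', 'order1', 42:
print(parse_composite("000575736572310000066f72646572310000040000002a00",
                      ["utf8", "utf8", "int32"]))  # ('user1', 'order1', 42)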