deserialize cassandra row key - scala

I'm trying to use the sstablekeys utility for Cassandra to retrieve all the keys currently in the sstables for a Cassandra cluster. When I run sstablekeys for a single sstable, the keys come back in what appears to be a serialized format. Does anyone know how to deserialize the keys or get them back into their original format? They were inserted into Cassandra using astyanax, where the serialized type is a tuple in Scala. The key_validation_class for the column family is a CompositeType.

Thanks to the comment by Thomas Stets, I was able to figure out that the keys are actually just converted to hex and printed out. See here for a way of getting them back to their original format.
For the specific problem of figuring out the format of a CompositeType row key and converting it back from hex, see the Cassandra source, which describes the format of a CompositeType key as output by sstablekeys. With CompositeType(UTF8Type, UTF8Type, Int32Type), the UTF8Type components can be read as ASCII characters (so the function in the link above works for them), but the Int32Type component must be interpreted as a single 4-byte integer.
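For illustration, here is a minimal Scala sketch of that decoding, assuming the key really was written as CompositeType(UTF8Type, UTF8Type, Int32Type). Each component of a CompositeType key is laid out as a 2-byte big-endian length, the component bytes, and one end-of-component byte; the object and method names are made up for the example:

import java.nio.ByteBuffer

object CompositeKeyDecoder {
  // Turn the hex string printed by sstablekeys back into raw bytes.
  def unhex(s: String): Array[Byte] =
    s.grouped(2).map(Integer.parseInt(_, 16).toByte).toArray

  // Split the key into its components:
  // 2-byte big-endian length | component bytes | 1 end-of-component byte
  def components(key: Array[Byte]): List[Array[Byte]] = {
    val buf = ByteBuffer.wrap(key)
    val parts = scala.collection.mutable.ListBuffer.empty[Array[Byte]]
    while (buf.hasRemaining) {
      val len = buf.getShort & 0xffff
      val part = new Array[Byte](len)
      buf.get(part)
      buf.get() // skip the end-of-component byte
      parts += part
    }
    parts.toList
  }

  // Decode a (UTF8Type, UTF8Type, Int32Type) key into a Scala tuple.
  def decode(hexKey: String): (String, String, Int) = {
    val List(a, b, c) = components(unhex(hexKey))
    (new String(a, "UTF-8"), new String(b, "UTF-8"), ByteBuffer.wrap(c).getInt)
  }
}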

Related

Decimal number from JSON truncated when read from Mapping Data Flow

When reading a decimal number from JSON data files using a Mapping Data Flow, the decimal digits are truncated.
[Source data]
{
"value": 1123456789.12345678912345678912
}
In Data Factory, the source dataset is configured with no schema. The Mapping Data Flow projection defines a decimal data type with sufficient precision and scale.
[Mapping Data Flow script]
source(output(
value as decimal(35,20)
),
...
However, when viewing the value in the 'Data preview' window, or reviewing pipeline output, the value is truncated.
[Output]
1123456789.12345670000000000000
This issue doesn't occur with other file formats, such as delimited text.
When previewing the data from the source dataset, the decimal digits are truncated in the same way. This occurs whether or not a schema is set. If a schema is set, the data type is number rather than decimal since it's JSON. Mozilla Developer Network documentation calls out the varied number of decimal digits supported by browsers, so I wonder if this is down to the JSON parser being used.
Is this expected behaviour? Can Data Factory be configured to support the full number of decimal places when working with JSON? Unfortunately this is calling into question whether it's viable to perform aggregate calculations in Data Factory.
I've created the same test as yours and got the same result.
Then I changed the source data, putting double quotes around the value.
After that I used toDecimal(Value,35,20) to convert the string type to the decimal type.
It seems to work well. So we can draw these conclusions:
Don't let ADF do the default data type conversion; it will truncate the value.
This issue doesn't occur with other file formats, such as delimited text, because their default data type is string.
It's a common issue with JSON parsers: the FloatParseHandling setting is available in the .NET library but not in ADF.
FloatParseHandling can be set to Decimal when parsing the file through .NET.
Until the setting is made available in ADF, the workaround is to wrap the value in quotes ["] at both ends to make it a string and convert it after loading.
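The underlying issue shows up in any JSON reader that parses numeric literals into a 64-bit float before the target type is known. A minimal Scala sketch of the effect (not ADF code, just an illustration of why the double-quote workaround preserves the digits):

object DecimalPrecision extends App {
  val literal = "1123456789.12345678912345678912"

  // Parsing the JSON number into a 64-bit float first loses everything beyond
  // roughly 17 significant digits, which matches the truncated ADF output.
  println(BigDecimal(literal.toDouble))   // 1123456789.1234567

  // Keeping the value as a string (the quoting workaround) and converting it
  // to a decimal directly preserves all the digits.
  println(BigDecimal(literal))            // 1123456789.12345678912345678912
}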

BLOB to String conversion - DB2

A table in DB2 contains BLOB data. I need to convert it into a String so that it can be viewed in a readable format. I tried options like:
getting blob object and converting to byte array
string buffer reader
sqoop import using --map-column-java and --map-column-hive options.
Even after these conversions, I am not able to view the data in a readable format. It's in an unreadable format like 1f8b0000..
Please suggest how to handle this scenario.
I think you need to look at the CAST function.
SELECT CAST(BLOB_VAR as VARCHAR(SIZE) CCSID UNICODE) as CHAR_FLD
Also, be advised that the max value of SIZE is 32K.
Let me know if you tried this.
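If you are reading the column from Scala over JDBC, a hedged sketch of running that CAST from the client (table, column, and connection details are placeholders, and the CCSID UNICODE clause is platform-specific, so adjust it for your DB2 flavour):

import java.sql.DriverManager

object BlobAsVarchar extends App {
  // Placeholder connection details; adjust for your environment.
  val conn = DriverManager.getConnection("jdbc:db2://host:50000/MYDB", "user", "pass")
  try {
    val sql = "SELECT CAST(BLOB_VAR AS VARCHAR(32000) CCSID UNICODE) AS CHAR_FLD FROM MY_TABLE"
    val rs  = conn.createStatement().executeQuery(sql)
    while (rs.next()) println(rs.getString("CHAR_FLD"))
  } finally conn.close()
}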
1f8b0000 indicates the data is in gzip form, and therefore you have to unzip it first.
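A minimal Scala sketch of that last step, assuming the BLOB bytes have already been fetched over JDBC (1f 8b is the gzip magic number, which matches the prefix you are seeing); the helper name is illustrative:

import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import java.util.zip.GZIPInputStream
import scala.io.Source

object BlobGunzip {
  // Decompress gzip-compressed BLOB bytes and decode them as UTF-8 text.
  // If the underlying text uses another encoding, swap the charset accordingly.
  def gunzipToString(blobBytes: Array[Byte]): String = {
    val in = new GZIPInputStream(new ByteArrayInputStream(blobBytes))
    try Source.fromInputStream(in, StandardCharsets.UTF_8.name).mkString
    finally in.close()
  }
}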

Scala Slick, Trouble with inserting Unicode into database

When I create a new row of data that contains several columns that may contain Unicode, the columns that do contain Unicode are being corrupted.
However, if I insert that data directly using mysql-cli, Slick will retrieve that Unicode data fine.
Is there anything I should add to my table class to tell Slick that this Column may be a Unicode string?
I found the problem: I have to set the character encoding for the connection.
db.default.url="jdbc:mysql://localhost/your_db_name?characterEncoding=UTF-8"
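If you are using Slick directly rather than Play's db.default.url, the same fix looks roughly like this (assuming Slick 3.x with the MySQL JDBC driver; host, database, and credentials are placeholders):

import slick.jdbc.MySQLProfile.api._

object Db {
  // The characterEncoding (and optionally useUnicode) URL parameters tell the
  // MySQL JDBC driver to use UTF-8 on the wire, so Unicode columns survive
  // the round trip.
  val db = Database.forURL(
    url      = "jdbc:mysql://localhost/your_db_name?useUnicode=true&characterEncoding=UTF-8",
    user     = "your_user",
    password = "your_password",
    driver   = "com.mysql.jdbc.Driver"
  )
}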
You probably need to configure that on the db schema side by setting the right collation.

Inserting string in FOR BIT DATA column in DB2

When I try inserting a string value into a column in DB2 that is defined as CHAR(15) FOR BIT DATA, I notice that it gets converted into some other format, probably hexadecimal.
On retrieving the data, I get a byte array, and on trying to convert it back to ASCII using System.Text.Encoding.ASCII.GetString, I get back a string of junk characters.
Anyone faced this issue?
Any resolution?
Thanks in advance.
FOR BIT DATA prevents the code page conversion between the client and the server. It is normally used to store binary data instead of strings.
Please, take a look at this forum, where many cases are proposed and solved: http://www.dbforums.com/db2/1663992-bit-data.html
Alternatively, you could cast it to the database code page (it depends on your platform):
CAST(c1 AS CHAR(n) FOR SBCS DATA)
CAST (<forbitdataexpression> AS [VARCHAR|CHAR][(<n>)] FOR [SBCS|DBCS] DATA)
References
http://bytes.com/topic/db2/answers/180874-problem-db2-field-type-char-n-bit-data
http://bytes.com/topic/db2/answers/182124-function-convert-bit-data-column-string
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.sql.ref.doc/doc/r0023459.html
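If you would rather decode on the client than cast on the server, here is a hedged Scala sketch over JDBC; the charset must match the code page the data was originally written in (IBM037 below is just an example EBCDIC code page, and the table, column, and connection names are placeholders):

import java.nio.charset.Charset
import java.sql.DriverManager

object ForBitDataRead extends App {
  // Placeholder connection details; adjust for your environment.
  val conn = DriverManager.getConnection("jdbc:db2://host:50000/MYDB", "user", "pass")
  try {
    val rs = conn.createStatement().executeQuery("SELECT BIT_COL FROM MY_TABLE")
    while (rs.next()) {
      // FOR BIT DATA columns come back as untranslated bytes.
      val raw = rs.getBytes("BIT_COL")
      // Decode with the code page the data was written in, e.g. EBCDIC IBM037
      // if it was loaded on the host, or ISO-8859-1/ASCII from an LUW client.
      println(new String(raw, Charset.forName("IBM037")))
    }
  } finally conn.close()
}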

Cassandra CLI: RowKey with RandomPartitioner = Show Key in Cleartext?

Using the Cassandra CLI gives me the following output:
RowKey: 31307c32333239
=> (super_column=3f18d800-fed5-17cf-b91a-0016e6df0376,
(column=date, value=1312289229287, timestamp=1312289229647000)
I am using RandomPartitioner. Is it somehow possible to get the RowKey (from the CLI) in cleartext? Ideally in the CLI, but if there is a helper class to convert it back into a String, that would also be OK.
I know the key is somehow hashed. If the key cannot be "retrieved" (which I have to assume), is there a helper class exposed in Cassandra that I can use to generate the key from my original String, so I can compare them?
My problem: I have stored records in Cassandra, but using a key like "user1|order1" I am not able to retrieve the records. Cassandra does not find any records. I assume that somehow my keys are wrong and I need to compare them to find out where the problem is...
Thanks very much !! Jens
This question is highly relevant: Cassandra cli: Convert hex values into a human-readable format
The only difference is that in Cassandra 0.8, there is now a "key_validation_class" attribute per column family, and it defaults to BytesType. When the CLI sees that it's BytesType, it represents the key in hex.
Cassandra is treating your RowKey as hex bytes: 31 30 7c 32 33 32 39 in hex is 10|2329 in ASCII, which I guess is your key.
If you want to see your plaintext keys in the CLI, then you need to give the command assume MyColumnFamily keys as ascii; which will make the CLI automatically translate between ASCII and hex for you in the row keys.
Or, when you create the column family, use create column family MyColumnFamily with key_validation_class = AsciiType; which will set the preference permanently.
The hashing occurs at a lower level and doesn't affect your problem.
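For comparing keys outside the CLI, here is a small Scala helper for converting between the hex form shown by the CLI and the original string; it is just a sketch and assumes the keys really are plain ASCII, as in "user1|order1":

object RowKeyHex {
  // "user1|order1" -> "75736572317c6f7264657231"
  def toHex(key: String): String =
    key.getBytes("US-ASCII").map("%02x".format(_)).mkString

  // "31307c32333239" -> "10|2329"
  def fromHex(hex: String): String =
    new String(hex.grouped(2).map(Integer.parseInt(_, 16).toByte).toArray, "US-ASCII")
}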