Is there a way in Debezium to stop data serialization? Trying to get values from the source as they are

I have seen many posts on Stack Overflow where people are trying to capture data from a source RDBMS and are using Debezium to do so. I am working with SQL Server. However, since the DECIMAL and TIMESTAMP values are encoded by default, it becomes an overhead to decode those values into their original form.
I was looking to avoid this extra decoding step, but to no avail. Can anyone please tell me how to import data via Debezium as is, i.e. without it being serialized?
I saw some YouTube videos where DECIMAL values were extracted in their original form.
For example: 800.0 from SQL Server is obtained as 800.0 via Debezium, and not as "ATiA" (encoded).
But I am not sure how to do this. Can anyone please tell me what configuration is required for this in Debezium? I am using Debezium Server for now, but can work with the Debezium connectors as well if that's needed.
Any help is appreciated.
Thanks.

It may be a matter of how the timestamp and decimal values are represented, as opposed to encoding.
For timestamps, try different values of time.precision.mode, and for decimals, use decimal.handling.mode.
For MySQL, these are documented in the connector reference; the SQL Server connector supports the same two properties.
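In case it helps, here is a minimal sketch of what that looks like in a Debezium Server application.properties for a SQL Server source. Connection, offset, and sink settings are omitted, and the chosen values are just one reasonable combination rather than the only one:

```properties
# Debezium Server application.properties (excerpt); source connector properties
# carry the "debezium.source." prefix. Connection and sink settings omitted.
debezium.source.connector.class=io.debezium.connector.sqlserver.SqlServerConnector

# Emit DECIMAL/NUMERIC columns as plain strings such as "800.0" instead of the
# default "precise" mode, which serializes the unscaled value as bytes ("ATiA").
# "double" is another option, but it can lose precision.
debezium.source.decimal.handling.mode=string

# Represent temporal columns with Kafka Connect's built-in Date/Time/Timestamp
# logical types instead of Debezium's epoch-based integer representations.
debezium.source.time.precision.mode=connect
```

With a plain Kafka Connect deployment, the same two properties go into the connector configuration without the debezium.source. prefix.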

Related

What's the fastest way to put RDF data (specifically DBPedia dumps) into Postgres?

I'm looking to put RDF data from DBPedia Turtle (.ttl) files into Postgres. I don't really care how the data is modelled in Postgres as long as it is a complete mapping (it would also be nice if there were sensible indexes); I just want to get the data into Postgres, and then I can transform it with SQL from there.
I tried using this StackOverflow solution that leverages Python and sqlalchemy, but it seems to be much too slow (it would take days, if not more, at the pace I observed on my machine).
I expected there might have been some kind of ODBC/JDBC-level tool for this type of connection. I did the same thing with Neo4j in less than an hour using a plugin Neo4j provides.
Thanks to anyone that can provide help.
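Not a definitive answer, but for scale: the usual reason the sqlalchemy approach crawls is one INSERT per triple. A rough sketch of the same idea using rdflib plus Postgres COPY follows; the file name, table name, and connection string are made-up placeholders, and for a full DBPedia dump you would still want to stream or chunk rather than hold everything in memory:

```python
# Hypothetical sketch: load DBPedia Turtle triples into Postgres via COPY.
# Assumes rdflib and psycopg2 are installed; all names below are placeholders.
import io
import rdflib
import psycopg2

graph = rdflib.Graph()
graph.parse("dbpedia_dump.ttl", format="turtle")  # parsing a 1.2 GB file is itself slow

def clean(term):
    # COPY text format: escape backslashes and squash tabs/newlines.
    return str(term).replace("\\", "\\\\").replace("\t", " ").replace("\n", " ")

conn = psycopg2.connect("dbname=rdf user=postgres")
with conn, conn.cursor() as cur:
    cur.execute("CREATE TABLE IF NOT EXISTS triples "
                "(subject text, predicate text, object text)")
    buf = io.StringIO()
    for s, p, o in graph:
        buf.write(f"{clean(s)}\t{clean(p)}\t{clean(o)}\n")
    buf.seek(0)
    # COPY is orders of magnitude faster than issuing one INSERT per triple.
    cur.copy_expert("COPY triples (subject, predicate, object) FROM STDIN", buf)
    cur.execute("CREATE INDEX IF NOT EXISTS triples_sp ON triples (subject, predicate)")
```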

MySQL Connector (Mac) returning Hex instead of String

I was screwing around trying to get Excel on the Mac to read MySQL data the other day, but just realized today that Tableau (my primary use of the MySQL Connector) is returning my blob columns as hex strings, not normal text strings as it usually does.
I cannot figure out how to set characterEncoding on the connector side. Nothing changed on the server side. Any help would be much appreciated. Using MySQL Connector 8.00.0020 and iODBC to facilitate use.
Reverting to 5.3 worked fine.
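If downgrading is not an option, one thing that may be worth testing is the driver's CHARSET connection option. The sketch below exercises it from Python with pyodbc purely as an isolated check; the driver name, credentials, and table are placeholders, and whether this actually changes the blob-as-hex behaviour that the 5.3 downgrade fixed is untested. The same option can be placed in the odbc.ini DSN that Tableau or Excel uses:

```python
# Illustrative check of the Connector/ODBC CHARSET connection option via pyodbc.
# Driver name, credentials, and table/column names are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={MySQL ODBC 8.0 Unicode Driver};"
    "SERVER=localhost;DATABASE=test;UID=user;PWD=secret;"
    "CHARSET=utf8mb4;"  # ask the driver to treat text columns as utf8mb4
)
cur = conn.cursor()
cur.execute("SELECT my_text_blob FROM my_table LIMIT 1")
print(cur.fetchone())
```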

Ingest a big local JSON file into Druid

It's my first Druid experience.
I have a local setup of Druid on my machine.
Now I'd like to run some query performance tests. My test data is a huge local JSON file of 1.2 GB.
The idea was to load it into Druid and run the required SQL queries. The file gets parsed and successfully processed (I'm using Druid's web-based UI to submit an ingestion task).
The problem I run into is the datasource size. It doesn't make sense that 1.2 GB of raw JSON data results in a 35 MB datasource. Is there any limitation in a locally running Druid setup? I think the test data is only partially processed. Unfortunately I didn't find any relevant config to change this. I would appreciate it if someone could shed some light on this.
Thanks in advance.
With Druid, 80-90 percent compression is expected. I have seen a 2 GB CSV file reduced to a 200 MB Druid datasource.
Can you query the count to make sure all the data was ingested? Also, please disable the approximate HyperLogLog algorithm to get an exact count: Druid SQL will switch to exact distinct counts if you set "useApproximateCountDistinct" to "false", either through the query context or through broker configuration (see http://druid.io/docs/latest/querying/sql.html). A sketch of such a query is shown below.
You can also check the logs for exceptions and error messages. If Druid has a problem ingesting a particular JSON record, it skips that record.
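A minimal sketch of that check against the Druid SQL HTTP endpoint, assuming the router/broker is reachable on localhost:8888; the datasource and column names are placeholders:

```python
# Run an exact (non-approximate) count through the Druid SQL API.
import json
import requests

query = {
    "query": (
        'SELECT COUNT(*) AS row_count, COUNT(DISTINCT "some_id") AS distinct_ids '
        'FROM "my_datasource"'
    ),
    # Force exact distinct counts instead of the HyperLogLog approximation.
    "context": {"useApproximateCountDistinct": False},
}

resp = requests.post("http://localhost:8888/druid/v2/sql", json=query)
resp.raise_for_status()
print(json.dumps(resp.json(), indent=2))
```

If the exact count matches the number of records in the source file, the small datasource size is just compression rather than lost data.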

When inserting Unicode data in an ODBC application, how to determine the encoding in which it should be sent

I have a generic ODBC application reading and writing data via ODBC to some database (it can be MS SQL, MySQL, or anything else). The data received and sent can be Unicode. I'm using SQL_C_WCHAR for my bindings in this case.
So I have two questions here:
Can I determine the encoding in which the data came from the ODBC data source?
In which encoding should I send data to the ODBC data source? I'm running parameterised insert statement for this purpose.
My research showed that some data sources have connection options to set the encoding, but I want to write a generic application that works with anything.
I couldn't find any ODBC option that tells me the encoding of the data source. Is there something like that? The ODBC docs just say to use SQL_C_WCHAR. Is SQL_C_WCHAR for UTF-16?
I did some more research, and both the Microsoft docs and the unixODBC docs seem to point out that ODBC only supports UCS-2. So I think all the data sent or received needs to be UCS-2 encoded.
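Not C/C++, but as a concrete illustration of the same driver-manager behaviour, pyodbc lets you state explicitly how SQL_C_WCHAR buffers are decoded and how parameters are encoded; the DSN, table, and column below are placeholders:

```python
# Sketch: explicit wide-character encodings for an ODBC connection via pyodbc.
import pyodbc

conn = pyodbc.connect("DSN=my_datasource;UID=user;PWD=secret")

# Decode wide (SQL_C_WCHAR) buffers coming back from the driver as UTF-16LE
# and narrow (SQL_C_CHAR) buffers as UTF-8.
conn.setdecoding(pyodbc.SQL_WCHAR, encoding="utf-16le")
conn.setdecoding(pyodbc.SQL_CHAR, encoding="utf-8")

# Encode parameters sent to the driver as UTF-16LE (wide binding).
conn.setencoding(encoding="utf-16le")

cur = conn.cursor()
cur.execute("INSERT INTO notes (body) VALUES (?)", "Unicode text: \u00e9 \u4e2d\u6587")
conn.commit()
```

Whether the wide encoding is strictly UCS-2 or full UTF-16 ultimately depends on the driver manager and driver build, which is why fixing it explicitly at the application boundary is the safest generic choice.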

MongoDB to DynamoDB

I have a database currently in Mongo running on an EC2 instance and would like to migrate the data to DynamoDB. Is this possible, and what is the most cost-effective way to achieve this?
When you ask for a "cost effective way" to migrate data, I assume you are looking for existing technologies that can ease your life. If so, you could do the following:
Export your MongoDB data to a text file, say in tsv format, using mongoexport.
Upload that file somewhere in S3.
Import this data, in S3, to DynamoDB using AWS Data Pipeline.
Of course, you should design & finalize your DynamoDB table schema before doing all this.
Whenever you are changing databases, you have to be very careful about the way you migrate data. Certain data formats maintain type consistency, while others do not.
Then there are data formats that simply cannot handle your schema. For example, CSV is great at handling data when it is one row per entry, but how do you render an embedded array in CSV? It really isn't possible; JSON is good at this, but JSON has its own problems.
The easiest example of this is JSON and DateTime. JSON does not have a specification for storing DateTime values; they can end up as ISO 8601 dates, or perhaps UNIX epoch timestamps, or really anything a developer can dream up. What about Longs, Doubles, and Ints? JSON doesn't discriminate between them; it has a single number type, which can cause loss of precision if values are not deserialized correctly.
This makes it very important that you choose the appropriate translation medium. That generally means you have to roll your own solution: loading up the drivers for both databases, reading an entry from one, translating, and writing to the other. This is the best way to be absolutely sure errors are handled properly for your environment, that types are kept consistent, and that the code properly translates the schema from source to destination (if necessary).
What does this all mean for you? It means a lot of legwork. It is possible somebody has already rolled something broad enough for your case, but I have found in the past that it is best to do it yourself; a rough sketch of what that looks like follows below.
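To make that concrete, here is a rough sketch of the read-translate-write loop in Python using pymongo and boto3. The collection name, table name, and the specific type mapping are illustrative assumptions, not a drop-in migration tool:

```python
# Rough sketch: read documents from MongoDB, coerce types, write to DynamoDB.
from decimal import Decimal
from datetime import datetime

import boto3
from pymongo import MongoClient

mongo = MongoClient("mongodb://localhost:27017")
source = mongo["mydb"]["mycollection"]

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
target = dynamodb.Table("mycollection")

def to_dynamo(value):
    """Translate Mongo/BSON values into types the DynamoDB SDK accepts."""
    if isinstance(value, float):
        return Decimal(str(value))      # DynamoDB numbers must be Decimal
    if isinstance(value, datetime):
        return value.isoformat()        # pick one DateTime convention and stick to it
    if isinstance(value, dict):
        return {k: to_dynamo(v) for k, v in value.items()}
    if isinstance(value, list):
        return [to_dynamo(v) for v in value]
    return value

# batch_writer() handles batching and retrying unprocessed items.
with target.batch_writer() as batch:
    for doc in source.find():
        item = {k: to_dynamo(v) for k, v in doc.items()}
        item["_id"] = str(doc["_id"])   # ObjectId is not a DynamoDB type
        batch.put_item(Item=item)
```

Whatever mapping you pick (ISO 8601 strings for dates, Decimal for numbers, and so on), the point is that it is explicit and lives in your code rather than in a lossy intermediate format.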
I know this post is old, but Amazon has since made this possible with AWS DMS; check this document:
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.MongoDB.html
Some relevant parts:
Using an Amazon DynamoDB Database as a Target for AWS Database Migration Service
You can use AWS DMS to migrate data to an Amazon DynamoDB table. Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability. AWS DMS supports using a relational database or MongoDB as a source.