Looking for an ERD (Viz) Tool for Avro .avdl Files

Does anyone have a recommendation for an ER visualization program that accepts .avdl files? I need it to accept .avdl files natively rather than the JSON (.avsc) format. I've looked at a few (Hackolade, Dataedo), but they all seem to accept only .json files. If anyone has a recommendation, it would be greatly appreciated.
Thanks

Dataedo supports Avro, Parquet, ORC and Delta Lake from version 9.3. You can use it to discover and document schemas and to visualize the data model with ERDs.
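If no tool turns up that reads .avdl natively, one possible workaround is to convert the IDL files to .avsc JSON schemas first with Apache Avro's avro-tools jar (its idl2schemata command) and feed those to a JSON-only tool. A minimal sketch, assuming the avro-tools jar has been downloaded and Java is on the PATH; file and directory names are placeholders:

```python
import pathlib
import subprocess

# Assumption: the Apache Avro tools jar has been downloaded from the Avro releases page.
AVRO_TOOLS_JAR = "avro-tools-1.11.3.jar"

def idl_to_avsc(avdl_dir: str, out_dir: str) -> None:
    """Run `avro-tools idl2schemata` on every .avdl file, emitting one .avsc per named type."""
    for avdl in pathlib.Path(avdl_dir).glob("*.avdl"):
        subprocess.run(
            ["java", "-jar", AVRO_TOOLS_JAR, "idl2schemata", str(avdl), out_dir],
            check=True,
        )

idl_to_avsc("schemas/idl", "schemas/avsc")   # hypothetical input and output directories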

Related

Convert OpenStreetMap POI Data to CSV

I am looking to extract some Point of Interest (POI) data from OpenStreetMap in a tabular format. I used the link below, navigated to the relevant country, and downloaded the file:
http://download.geofabrik.de/
What I get is a file with the .osm.pbf extension. However, I think it is possible to download the files in other formats, such as .shp.zip and .osm.bz2. Is there some way I can convert this data into a tabular format like a CSV file?
I came across a tool called Osmosis, which can be used to manipulate data in these formats, but I am not sure if it can be used for this purpose:
https://wiki.openstreetmap.org/wiki/Osmosis
I was able to install it successfully on my Windows machine, though.
To be frank, I am not even sure if this gives me what I want.
In essence, what I am looking for is Sri Lankan POI data that contains the following attributes:
https://data.humdata.org/dataset/hotosm_lka_points_of_interest
If the conversion of this file does not give me data in this format, then I am open to other approaches as well. What is the best way to go about acquiring this data?
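One possible approach, sketched below as an illustration rather than a definitive answer, is to read the .osm.pbf extract with the pyosmium library and write the tagged nodes to CSV. The tag filter (amenity) and file names are assumptions; POIs mapped as ways or areas (e.g. building outlines) would need extra handling.

```python
import csv
import osmium   # pyosmium: pip install osmium

class POIHandler(osmium.SimpleHandler):
    """Collect nodes that carry an 'amenity' tag (one common way to identify POIs)."""

    def __init__(self):
        super().__init__()
        self.rows = []

    def node(self, n):
        if "amenity" in n.tags:
            self.rows.append({
                "osm_id": n.id,
                "lat": n.location.lat,
                "lon": n.location.lon,
                "name": n.tags.get("name", ""),
                "amenity": n.tags["amenity"],
            })

handler = POIHandler()
handler.apply_file("sri-lanka-latest.osm.pbf")   # hypothetical name of the Geofabrik extract

with open("sri_lanka_pois.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["osm_id", "lat", "lon", "name", "amenity"])
    writer.writeheader()
    writer.writerows(handler.rows)
```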

Is there a way in Debezium to stop data serialization? Trying to get values from the source as they are

I have seen many posts on Stack Overflow where people are trying to capture data from a source RDBMS and are using Debezium for that. I am working with SQL Server. However, since DECIMAL and TIMESTAMP values are encoded by default, it becomes an overhead to decode those values back into their original form.
I was looking to avoid this extra decoding step, but to no avail. Can anyone please tell me how to import data via Debezium as it is, i.e. without serializing it?
I saw some YouTube videos where DECIMAL values were extracted in their original form.
For example, 800.0 from SQL Server is obtained as 800.0 via Debezium and not as "ATiA" (the encoded form).
But I am not sure how to do this. Can anyone please tell me what configuration is required for this on the Debezium side? I am using Debezium Server for now, but I can work with Debezium connectors as well if that's needed.
Any help is appreciated.
Thanks.
It may be a matter of the representation of timestamp and decimal values rather than of encoding.
For timestamps, try different values of time.precision.mode, and for decimals, use decimal.handling.mode.
For MySQL, the documentation is here.
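As an illustration, here is a minimal sketch of what that might look like when registering the Debezium SQL Server connector through the Kafka Connect REST API. The connector name, Connect URL and the omitted connection properties are placeholders that depend on your setup and Debezium version; with Debezium Server the same two options should be settable in application.properties using the debezium.source. prefix.

```python
import json
import requests   # assumes a Kafka Connect worker is reachable at localhost:8083

connector = {
    "name": "sqlserver-source",   # hypothetical connector name
    "config": {
        "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
        # ... the usual connection properties (hostname, port, user, password, topic
        # prefix, schema history topic, table include list) go here ...
        # Emit DECIMAL columns as plain strings such as "800.0" instead of encoded bytes
        # ("double" is another option if floating-point rounding is acceptable).
        "decimal.handling.mode": "string",
        # Represent temporal columns with Kafka Connect's built-in Date/Time/Timestamp types.
        "time.precision.mode": "connect",
    },
}

resp = requests.post(
    "http://localhost:8083/connectors",
    headers={"Content-Type": "application/json"},
    data=json.dumps(connector),
)
resp.raise_for_status()
```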

Save PostgreSQL data in Parquet format

I'm working on a project which needs to generate Parquet files from a huge PostgreSQL database. The data size can be gigantic (e.g. 10 TB). I'm very new to this topic and have done some research online, but did not find a direct way to convert the data to Parquet files. Here are my questions:
The only feasible solution I saw is to load the Postgres table into Apache Spark via JDBC and save it as a Parquet file. But I assume it will be very slow when transferring 10 TB of data.
Is it possible to generate a single huge Parquet file of 10 TB? Or is it better to create multiple Parquet files?
Hope my question is clear, and I really appreciate any helpful feedback. Thanks in advance!
Use the ORC format instead of the Parquet format for this volume.
I assume the data is partitioned, so I think it's a good idea to extract it in parallel, taking advantage of the data partitioning.
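As a rough illustration of the parallel-extraction idea, here is a minimal PySpark sketch that reads a Postgres table over JDBC with partitioned reads and writes the result out as Parquet. The connection details, table, partition column and bounds are placeholders, and the partition count would need tuning for a 10 TB table.

```python
from pyspark.sql import SparkSession

# Assumes the PostgreSQL JDBC driver jar is on the Spark classpath (e.g. via --jars).
spark = SparkSession.builder.appName("pg-to-parquet").getOrCreate()

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://pg-host:5432/mydb")   # hypothetical connection details
    .option("dbtable", "public.big_table")                  # hypothetical table
    .option("user", "reader")
    .option("password", "secret")
    # Partitioned read: Spark issues one query per partition, in parallel.
    .option("partitionColumn", "id")          # must be a numeric, date or timestamp column
    .option("lowerBound", "1")
    .option("upperBound", "1000000000")
    .option("numPartitions", "200")
    .load()
)

# Spark writes one Parquet part file per task, so the output is split across many files.
df.write.mode("overwrite").parquet("/data/parquet/big_table")
```

Note that Spark naturally produces many Parquet part files rather than a single 10 TB file, which also answers the second question.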

Importing and exporting KML into and from DashDB for Geospatial Analysis

I am a recent convert to IBM's DashDB, and I am considering proposing it for use at my work. My case would be greatly bolstered if I could show good, easy integration for geospatial analytics data, namely loading and performing SQL filtering on geodata currently in .shp or .kml format. If it were also possible to export the filtered data as KML, that would be amazing.
So, to give a practical example, say I have a .shp file with all the ZIP codes in the US; I want to select the shape for the 02138 ZIP code and send it to the query sender in KML format.
Does anyone have experience with that?
Loading a .kml file directly into dashDB is currently not possible, as answered in another post here. You may use ArcGIS for Desktop to connect to dashDB if you want to do this.
https://gis.stackexchange.com/questions/141194/importing-and-exporting-kml-into-and-from-dashdb-for-geospatial-analysis
On the other hand, uploading a .shp file is natively supported and is demonstrated in the tutorial below. Note that dashDB requires the uploaded file to be compressed in .zip, .tar or .gz format.
https://www-01.ibm.com/support/knowledgecenter/SS6NHC/com.ibm.swg.im.dashdb.doc/tutorial/analyze-geospatial-data.html
ArcGIS for Desktop should also let you use its export features.
There are two options for loading shape files: via the GUI ("Load geospatial data") or via the CLPPlus command IDA LOADGEOSPATIALDATA.
Once the data is loaded, it resides in the internal geometry format (e.g. as ST_Polygon) and can then be accessed in various formats through spatial functions embedded in SQL calls. Unfortunately, KML is not one of the formats dashDB supports as of now; the options are GML, WKT and WKB.
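As a small illustration of that second answer, here is a minimal sketch, assuming a hypothetical ZIPCODES table loaded from the shape file with a ZIP column and an ST_Polygon SHAPE column, that filters the 02138 ZIP code and returns its geometry as WKT through the ibm_db Python driver. KML would then have to be produced client-side, since dashDB does not emit it directly.

```python
import ibm_db  # IBM's Python driver for dashDB / Db2

# Hypothetical connection string and table; adjust to your dashDB instance.
conn = ibm_db.connect(
    "DATABASE=BLUDB;HOSTNAME=your-dashdb-host;PORT=50000;PROTOCOL=TCPIP;"
    "UID=dash_user;PWD=secret;", "", ""
)

# Filter one ZIP code and convert its geometry to well-known text (WKT).
sql = (
    "SELECT zip, db2gse.ST_AsText(shape) AS shape_wkt "
    "FROM zipcodes WHERE zip = '02138'"
)
stmt = ibm_db.exec_immediate(conn, sql)
row = ibm_db.fetch_assoc(stmt)
print(row["SHAPE_WKT"])   # e.g. 'POLYGON ((...))', to be converted to KML client-side
```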

Importing AccessDB and Oracle directly into MongoDB

I am receiving .dmp and .mdb files from a customer and need to get that data into MongoDB.
Is there any way to import these file types directly into Mongo?
The goal is to programmatically ingest these into Mongo in any way I can. The only rule is that the customer will not change their method of data delivery, meaning I'm stuck with the .dmp and .mdb files as the source.
Any assistance would be greatly appreciated.
Here are a few options/ideas:
Convert the .mdb files to CSV, then use mongoimport --type csv to import into MongoDB.
Use an ETL tool, e.g. Pentaho, Informatica, etc. This will give you much more flexibility for any necessary transformation/conversion of the data.
Write a custom ETL tool, using libraries that know how to read .mdb and .dmp files (a rough sketch of this is shown after the next paragraph).
You don't mention how you plan to use this data, how many tables are in the database, or how normalized the tables are. Depending on the specifics of your use case, it's very possible that loading the data from Access "as is" will not be a good choice, since normalized schemas are not a good fit for MongoDB and MongoDB does not natively support joins. This is where an ETL tool can help, by extracting the source data and transforming it into an appropriate JSON structure.
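For the custom-ETL route, here is a minimal sketch that reads an Access table through the Microsoft Access ODBC driver (Windows) and writes the rows to MongoDB with pymongo. The file path, table name and collection name are placeholders; the Oracle .dmp side would need a separate step, such as restoring the dump into an Oracle instance first and extracting from there.

```python
from decimal import Decimal

import pyodbc                      # requires the Microsoft Access ODBC driver (Windows)
from pymongo import MongoClient

MDB_PATH = r"C:\data\customer.mdb"   # hypothetical file, table and collection names
TABLE = "Customers"

access = pyodbc.connect(
    rf"Driver={{Microsoft Access Driver (*.mdb, *.accdb)}};DBQ={MDB_PATH};"
)
cursor = access.cursor()
cursor.execute(f"SELECT * FROM [{TABLE}]")
columns = [col[0] for col in cursor.description]

def to_document(row):
    # Convert types MongoDB's encoder rejects (e.g. Access Currency comes back as Decimal).
    return {
        col: float(val) if isinstance(val, Decimal) else val
        for col, val in zip(columns, row)
    }

mongo = MongoClient("mongodb://localhost:27017")
collection = mongo["customer_data"][TABLE.lower()]

# Stream the rows in batches so a large table does not have to fit in memory at once.
while True:
    rows = cursor.fetchmany(1000)
    if not rows:
        break
    collection.insert_many([to_document(row) for row in rows])
```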
MongoDB has released ODBC drivers (see the "MongoDB ODBC Drivers" page); you can connect MS Access directly to MongoDB through ODBC. Voila!