I have snowflake as a data warehouse and want to use Apache atlas as a data catalog and lineage tool.
I went through the details but not sure it can be used with snowflake.
Is it possible to connect with a snowflake?
Disclaimer: I work for Snowflake.
It is not possible, at this time, to use Atlas as the catalog. You can, however, use a Hive Metastore with Snowflake. I have not tried to use things like the Atlas Hook but perhaps that might be an option.
Related
I'm trying to set up a MSK connector for snowflake and i could hardly see any documentation on how to do it. Unfortunately AWS support person also referred me to use snowflake documentation page.
By following this i can create an EC2 instance and spinoff connector but i wanted to go on serverless mode and use MSK connectors
I'm having hard time with connector properties for snowflake and aws doesnt provide much information about it
As answered on the plugins page, you'd need to upload the Snowflake ZIP/JAR plugins to S3, where they'd be downloaded prior to the connector starting
https://docs.aws.amazon.com/msk/latest/developerguide/msk-connect-plugins.html
I am trying to migrate a Datawarehouse to Delta lake. One thing that I am struggling to figure out is how to connect to Delta Lake (silver and gold) tables outside a spark session. I want to able to connect to these tables using BI tools like Tableau. I am not using databricks and I was wondering if storing these tables in the hive metastore could help. If not this then could someone help me with an alternative approach or if this is feasible or not.
You can have a Hive metastore and a Thrift server with Spark open source and delta.io open source then connect Tableau desktop for instance.
I'm currently working in a Mainframe Technology where we store the data in IBM DB2.
We got a new requirement to use scalable process to migrate the data to a new messaging platform including new database. For that we have identified Kafka is a suitable solution with either KSQLDB or MONGODB.
Can someone able to tell or direct me on how can we connect to IBM DB2 from Kafka to import the data and place it in either KSQLDB or MONGODB?
Any help is much appreciated.
To import the data from IBM DB2 into Kafka, You need to use any connector like the Debezium connector for DB2.
The information regarding the connector can be found in the following.
https://debezium.io/documentation/reference/connectors/db2.html
Connector Configuration
You can also use JDBC Source Connector for the same functionality. The following links are helpful for the configuration.
https://www.confluent.io/blog/kafka-connect-deep-dive-jdbc-source-connector/
A Simple diagram for events flows from RDMS to Kafka topic.
After placing the data into Kafka, we need to transfer that data MongoDb. We need to use Mongo Db Connector to transfer the data from Kafka to mongo Db.
https://www.mongodb.com/blog/post/getting-started-with-the-mongodb-connector-for-apache-kafka-and-mongodb-atlas
https://www.confluent.io/hub/mongodb/kafka-connect-mongodb
The aim that I want to achieve:
is to be notified about DB data updates, for this reason, I want to build the following chain: PostgreSQL -> Kinesis -> Lambda.
But I am now sure how to notify Kisesis properly about DB changes?
I saw a few examples where peoples try to use Postgresql triggers to send data to Kinesis.
some people use wal2json concept.
So I have some doubts about which option to choose, that why I am looking for advice.
You can leverage Debezium to do the same.
Debezium connectors can also be intergrated within the code, using Debezium Engine and you can add transformation or filtering logic(if you need) before pushing the changes out to Kinesis.
Here's a link that explains about Debezium Postgres Connector.
Debezium Server( Internally I believe it makes use of Debezium Engine).
It supports Kinesis, Google PubSub, Apache Pulsar as of now for CDC from Databases that Debezium Supports.
Here is an article that you can refer to for step by step configuration of Debezium Server
[https://xyzcoder.github.io/2021/02/19/cdc-using-debezium-server-mysql-kinesis.html][1]
Our primary datastore is an RDS Postgres database. It would be nice if we could stream all changes to that happen in Postgres to some sink - whether that's kinesis, elasticsearch or any other data store.
We use Postgres 9.5 which has support for 'logical replication'. However, all the extensions that tap into this stream are blocked on RDS. There's a tutorial for streaming the MySQL RDS flavor to kinesis - the postgres equivalent would be ideal. Is this possible currently?
Have a look at https://github.com/disneystreaming/pg2k4j . It takes all changes made to your database and streams them to Kinesis. See the README for an example of how to set this up with RDS. We've been using it in production and have found it very useful for solving this exact problem. Disclaimer: I wrote https://github.com/disneystreaming/pg2k4j
Integrate a central Amazon Relational Database Service (Amazon RDS) for PostgreSQL database with other systems by streaming its modifications into Amazon Kinesis Data Streams. An earlier post, Streaming Changes in a Database with Amazon Kinesis, described how to integrate a central RDS for MySQL database with other systems by streaming modifications through Kinesis. In this post, I take it a step further and explain how to use an AWS Lambda function to capture the changes in Amazon RDS for PostgreSQL and stream those changes to Kinesis Data Streams.
https://aws.amazon.com/blogs/database/stream-changes-from-amazon-rds-for-postgresql-using-amazon-kinesis-data-streams-and-aws-lambda/