How to ETL my PostgreSQL data into a ClickHouse data warehouse? - postgresql

I have data stored in PostgreSQL as the data source, and I want to load the dimension and fact tables of a ClickHouse data warehouse. I am new to ClickHouse and am used to traditional integration tools like Talend and Microsoft SSIS for ETL.
(PS: I'm using Docker images for both ClickHouse and PostgreSQL.)

Have a look at the PostgreSQL table engine integration in the ClickHouse documentation, which lets you run SELECT and INSERT queries from ClickHouse against data stored in a remote PostgreSQL database.
You can also make use of the postgresql() table function.
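For illustration, a minimal sketch of both approaches; the host, database, table, columns, and credentials below are placeholders, not values from the question:

```sql
-- Table backed by the PostgreSQL engine: SELECTs and INSERTs on it go to the
-- remote Postgres table (host, database, table, user, password are placeholders).
CREATE TABLE pg_orders
(
    order_id    UInt64,
    customer_id UInt64,
    amount      Decimal(18, 2),
    created_at  DateTime
)
ENGINE = PostgreSQL('pg-host:5432', 'source_db', 'orders', 'pg_user', 'pg_password');

-- Ad-hoc alternative: the postgresql() table function, no table definition needed.
SELECT count()
FROM postgresql('pg-host:5432', 'source_db', 'orders', 'pg_user', 'pg_password');
```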

You can ingest data from PostgreSQL into ClickHouse using:
External ETL tools such as Airbyte, building a connector from Postgres to ClickHouse.
ClickHouse's PostgreSQL integration table engine, which exposes the Postgres data as a view inside ClickHouse; you then use an INSERT INTO ... SELECT query to copy the data from that view into the real ClickHouse table (see the sketch below).
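Building on the pg_orders table from the sketch above, a hedged example of that second option; the dwh_orders target table, its columns, and the ORDER BY key are assumptions for illustration:

```sql
-- Real ClickHouse target table (MergeTree) for the warehouse.
CREATE TABLE dwh_orders
(
    order_id    UInt64,
    customer_id UInt64,
    amount      Decimal(18, 2),
    created_at  DateTime
)
ENGINE = MergeTree
ORDER BY (created_at, order_id);

-- One-shot copy from the PostgreSQL-engine table into the MergeTree table.
INSERT INTO dwh_orders
SELECT order_id, customer_id, amount, created_at
FROM pg_orders;
```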

Related

o110.pyWriteDynamicFrame. null

I have created a visual job in AWS Glue where I extract data from Snowflake; my target is a PostgreSQL database in AWS.
I have been able to connect to both Snowflake and PostgreSQL, and I can preview data from both.
I have also been able to get data from Snowflake, write it to S3 as CSV, and then load that CSV into PostgreSQL.
However, when I try to get data from Snowflake and push it directly to PostgreSQL, I get the error below:
o110.pyWriteDynamicFrame. null
So the job can read the data from Snowflake into a DynamicFrame, but it fails while writing that frame to PostgreSQL.
You need to check the AWS Glue logs to get a better understanding of why the write to PostgreSQL is failing.
Also check that you have the right versions of the JDBC jars (needed by PostgreSQL) compatible with the Scala/Spark runtime on the AWS Glue side.

What is the most recommended way to transfer data from a PostgreSQL DB to another PostgreSQL DB in AWS?

We have a production PostgreSQL DB available only in the Glue catalog. What are the best practices for ETL-ing some tables from this database and loading the data into another PostgreSQL instance in the same AWS account?
This production database is our transactional DB and we don't need most of its tables. We already have some Glue ETL jobs creating tables in S3 (so accessible via Athena), but the goal here is to load into another PostgreSQL instance.
Thanks

Is it possible to connect to a Postgres DB from SageMaker Data Wrangler?

I set up a regular Postgres DB in AWS using the Amazon Relational Database Service (RDS). I would like to ingest this data using Data Wrangler for inspection and further processing.
Is this possible? I only see S3, Athena, Redshift and Snowflake as the data ingestion options. Does this mean I must move the data from PostgreSQL to one of these four options before I can use Data Wrangler?
If it's not possible through Data Wrangler, can I connect to my Postgres DB from a Jupyter notebook using a connection string or some similar option? I'm looking to use the data for the SageMaker Feature Store.
Using Amazon RDS with Data Wrangler is not possible (yet).
That said, you can connect to Postgres from a Jupyter notebook using Python and then use the Feature Store API to ingest the data and use it for subsequent tasks.
Source: AWS employee

Postgres to Oracle connection

How do we keep Postgres and Oracle data synchronized?
Currently we are trying to mimic an Oracle-to-Oracle database link, but between Oracle and Postgres. I know that we can add FDW wrappers to connect, but the latency is huge.
I have also checked the AWS DMS process, but it has a lot of data type issues when loading and synchronizing data from Oracle to Postgres.
Can someone suggest a better way?

I need to connect to a DB2 AS/400 database from Oracle APEX.

Is it feasible to establish a connection from Oracle to a DB2 database so that I can query the DB2 database and generate reports from Oracle APEX?
OR
Is it possible to create a view in Oracle over a remote DB2 database?
OR
What options do I have for developing reports in Oracle APEX from the data I have in a DB2 database?
(I know this is an old question and you've already found a workaround. Anyway,) the keyword you might be interested in is gateway; see the Oracle 10g Database Gateway for DB2/400 Installation and User's Guide. I don't know which database version you use, but if 10g is not the one, I hope you'll manage to find the right documentation.
In short: after installing the gateway between Oracle and DB2, you'd create a database link. Then, in your Oracle schema, create a view that selects data over that database link from the DB2 database. Finally, fetch data in APEX from the view.
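A minimal sketch of those steps, assuming the gateway is already configured; the TNS alias (db2_gw), credentials, and the DB2 table and columns are placeholders:

```sql
-- Database link from Oracle to DB2 through the gateway's TNS alias.
CREATE DATABASE LINK db2_link
  CONNECT TO db2user IDENTIFIED BY db2_password
  USING 'db2_gw';

-- View in the Oracle schema that selects over the link from the DB2 table.
CREATE OR REPLACE VIEW db2_orders_v AS
  SELECT order_id, customer_id, amount, created_at
  FROM   orders@db2_link;

-- APEX reports can then query the view like any local object.
SELECT * FROM db2_orders_v;
```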
As I didn't find a way to connect directly to DB2 from Oracle PL/SQL, I used a workaround. Since this is a reporting tool and we are OK with running it on data that is one day behind, we did the following:
1) Extract the required data from the DB2 database to CSV files. We used a DB2 command that can be run at the command line to export the data to CSV.
2) Import the data into Oracle tables using SQL*Loader (sqlldr).
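A rough sketch of both steps; the file path, schema, table, column list, and credentials are placeholders, and the exact export options and SQL*Loader control-file details depend on your data:

```sql
-- Step 1 (DB2 command line): export the required rows to a delimited CSV,
-- e.g. run via the DB2 CLP:  db2 "EXPORT TO /tmp/orders.csv OF DEL SELECT ..."
EXPORT TO /tmp/orders.csv OF DEL
  SELECT order_id, customer_id, amount, created_at
  FROM   myschema.orders;

-- Step 2 (Oracle side): load the CSV with SQL*Loader using a control file
-- such as orders.ctl (shown here as comments; adjust columns and formats):
--   LOAD DATA
--   INFILE '/tmp/orders.csv'
--   APPEND
--   INTO TABLE orders_stg
--   FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
--   (order_id, customer_id, amount, created_at DATE "YYYY-MM-DD")
--
-- Invocation:  sqlldr userid=ora_user/ora_password control=orders.ctl log=orders.log
```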