Synchronizing data between Hadoop and PostgreSQL using SymmetricDS - postgresql

I'm using Hadoop to store the data of our application. How can I synchronize data between PostgreSQL and Hadoop? I'm using SymmetricDS as the replication tool.

If Hadoop only copies data from PostgreSQL and no updates are done on the Hadoop side, try using Sqoop, a simple tool for importing databases into Hadoop.

If you want to continue using SymmetricDS, you can implement an IDatabaseWriter. Here is an example that writes to MongoDB: https://github.com/JumpMind/symmetric-mongo

Related

Using AWS Glue Python jobs to run ETL on Redshift

We have a setup to sync RDS Postgres changes into S3 using DMS. Now I want to run ETL on this S3 data (in Parquet) using Glue as the scheduler.
My plan is to build SQL queries to do the transformation, execute them on Redshift Spectrum, and unload the data back into S3 in Parquet format. I don't want to use Glue Spark, as my data loads do not require that kind of capacity.
However, I am facing some problems connecting to Redshift from Glue, primarily library version issues and finding the right whl files to use for pg8000/psycopg2. Wondering if anyone has experience with such an implementation and how you were able to manage the DB connections from a Glue Python Shell job.
I'm doing something similar in a Python Shell Job but with Postgres instead of Redshift.
This is the whl file I use:
psycopg2_binary-2.9.2-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
An updated version can be found here.
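Once the wheel is attached to the job, the connection code itself is plain DB-API. Here is a minimal sketch of the Spectrum-transform-and-unload pattern the question describes; the hostname, schema, bucket, and IAM role below are all made-up placeholders, and in a real job the credentials should come from Secrets Manager or job parameters:

```python
def build_unload(query, s3_prefix, iam_role):
    """Wrap a transformation query in an UNLOAD statement that writes
    Parquet back to S3. UNLOAD takes the query as a quoted string
    literal, so embedded single quotes must be doubled."""
    escaped = query.replace("'", "''")
    return (
        f"UNLOAD ('{escaped}') "
        f"TO '{s3_prefix}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS PARQUET"
    )


def run_on_redshift(sql, host, dbname, user, password, port=5439):
    """Execute one statement against Redshift from a Glue Python Shell job."""
    # psycopg2 comes from the wheel attached to the Glue job, so import
    # it lazily to keep this module importable without it.
    import psycopg2
    conn = psycopg2.connect(host=host, port=port, dbname=dbname,
                            user=user, password=password)
    conn.autocommit = True  # commit the UNLOAD immediately
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
    finally:
        conn.close()


# Example (placeholder names throughout):
sql = build_unload(
    "SELECT * FROM spectrum_schema.events WHERE event_date = '2021-01-01'",
    "s3://my-bucket/transformed/events_",
    "arn:aws:iam::123456789012:role/my-redshift-role",
)
```

Building the statement separately from executing it keeps the string escaping testable without a live cluster.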

Is it possible to connect to a Postgres DB from SageMaker Data Wrangler?

I set up a regular Postgres DB in AWS using the Amazon Relational Database Service (RDS). I would like to ingest this data using Data Wrangler for inspection and further processing.
Is this possible? I only see S3, Athena, Redshift, and Snowflake as the data ingestion options. Does this mean I must move the data from PostgreSQL to one of these four options before I can use Data Wrangler?
If it's not possible through Data Wrangler, can I connect to my Postgres through a Jupyter notebook, using a connection string or some option like this? I'm looking to use the data for the SageMaker Feature Store.
Using Amazon RDS with Data Wrangler is not possible (yet).
That said, you can connect to Postgres in Jupyter using Python and then use the Feature Store API to ingest data and use it for subsequent tasks.
Source: AWS employee
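For the notebook route, a sketch of pulling rows out of RDS Postgres and pushing them into a Feature Group might look like the following. The connection string, query, and feature-group name are placeholders, and the feature group is assumed to already exist with feature definitions matching the DataFrame columns:

```python
import time


def stamp_event_time(records, column="EventTime"):
    """Feature Store requires an event-time column on every record;
    add one to rows fetched from Postgres (plain dicts here)."""
    now = round(time.time(), 3)
    return [{**r, column: now} for r in records]


def ingest_from_postgres(conn_str, query, feature_group_name):
    """Read rows from Postgres into a DataFrame and ingest them
    into an existing SageMaker Feature Group."""
    # These come preinstalled (or pip-installable) in SageMaker notebooks.
    import pandas as pd
    import sagemaker
    import sqlalchemy
    from sagemaker.feature_store.feature_group import FeatureGroup

    # e.g. "postgresql+psycopg2://user:password@my-rds-host:5432/mydb"
    engine = sqlalchemy.create_engine(conn_str)
    df = pd.read_sql(query, engine)
    df["EventTime"] = round(time.time(), 3)

    fg = FeatureGroup(name=feature_group_name,
                      sagemaker_session=sagemaker.Session())
    fg.ingest(data_frame=df, max_workers=2, wait=True)
```

The notebook's execution role needs `sagemaker:PutRecord` on the feature group, and the RDS security group must allow inbound traffic from the notebook instance.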

Creating a Postgres database with JDBC Driver when starting a SpringBoot app?

When I want to use a MySQL database in a Spring Boot app, I am able to create it on start via a string in the properties similar to this:
spring.datasource.jdbc-url=jdbc:mysql://localhost:3306/testDb?createDatabaseIfNotExist=true&autoReconnect=true&useSSL=false&allowPublicKeyRetrieval=true
Nevertheless, PostgreSQL seems to ignore this:
spring.datasource.jdbc-url=jdbc:postgresql://localhost:5432/testdb?createDatabaseIfNotExist=true&autoReconnect=true&useSSL=false&allowPublicKeyRetrieval=true
Is there a way to create a PostgreSQL DB on start of a Spring Boot app, before Flyway attempts to create the tables?
Postgres doesn't support creating a database on demand via the JDBC URL; createDatabaseIfNotExist is a MySQL Connector/J parameter that the PostgreSQL driver doesn't implement. You can learn about what configuration is possible in the documentation.
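A common workaround is to create the database out of band before the application (and therefore Flyway) starts, e.g. from a deploy or init script. Sketched here in Python with psycopg2, though any Postgres client works; all connection details are placeholders:

```python
def create_database_sql(dbname):
    """Postgres has no CREATE DATABASE IF NOT EXISTS, so the existence
    check happens separately; quote the identifier so mixed-case names
    like testDb survive."""
    return 'CREATE DATABASE "{}"'.format(dbname)


def ensure_database(dbname, host="localhost", port=5432,
                    user="postgres", password="postgres"):
    """Connect to the built-in 'postgres' maintenance database and
    create the target database if it is missing."""
    import psycopg2
    conn = psycopg2.connect(host=host, port=port, dbname="postgres",
                            user=user, password=password)
    # CREATE DATABASE cannot run inside a transaction block.
    conn.autocommit = True
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT 1 FROM pg_database WHERE datname = %s",
                        (dbname,))
            if cur.fetchone() is None:
                cur.execute(create_database_sql(dbname))
    finally:
        conn.close()
```

After this has run once, the normal `jdbc:postgresql://...` URL (without the MySQL-only parameters) connects and Flyway can create its tables.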

How can I connect to Denodo using Python Pandas and SQLAlchemy?

I'm trying to connect to Denodo using Python SQLAlchemy, and create a DataFrame from a table. My environment is Cloudera Data Science Workbench. Can I use psycopg2 and the PostgreSQL connection string?
This won't work: Denodo diverges from PostgreSQL in how it handles schemas, so it is not compatible with the psycopg2 dialect (or any other stock dialect). I use engine = sqlalchemy.create_engine("mssql+pyodbc://DenodoODBC"). The DenodoODBC settings are taken from my ODBC Data Sources.
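Put together with pandas, that looks roughly like the sketch below. The DSN and view names are whatever you configured; mssql+pyodbc is only being used as a generic ODBC bridge here, which is good enough for plain SELECTs:

```python
def build_select(view, limit=None):
    """Quote the view name so mixed-case Denodo views survive, with an
    optional LIMIT to keep exploratory pulls small."""
    sql = 'SELECT * FROM "{}"'.format(view)
    if limit is not None:
        sql += " LIMIT {}".format(int(limit))
    return sql


def load_view(dsn, view, limit=None):
    """Read a Denodo view into a pandas DataFrame through an ODBC
    data source configured in the OS (e.g. the 'DenodoODBC' DSN)."""
    import pandas as pd
    import sqlalchemy
    engine = sqlalchemy.create_engine("mssql+pyodbc://{}".format(dsn))
    return pd.read_sql(build_select(view, limit), engine)


# Usage: df = load_view("DenodoODBC", "my_view", limit=1000)
```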
EDIT
In April, Denodo added a Python connection guide.

Connect icCube with Redshift

In icCube 5.1, Redshift is not in the list of supported JDBC connections.
How can I create a data source in icCube on Amazon Redshift?
A first solution is to use the Postgres JDBC driver. Redshift is based on Postgres, so it also works (for how long is a good question).
The second is a bit more complicated, as you need to add the Redshift JDBC driver to icCube. First download the JDBC driver from Amazon from here, then follow these instructions to add a library to icCube.
Once done, you have to configure a new data source: