Transfer data from Cassandra to PostgreSQL

I would like to know whether there is a solution for this.
Problem:
I have a Cassandra database that continuously stores large-scale data from other sources. Application data is saved in PostgreSQL. For my functionality, I want to query all data from PostgreSQL, so I would like to keep the Cassandra data consistently synced into the PostgreSQL database as it arrives in Cassandra.
Is it possible?
Please suggest.

I would like to save Cassandra data consistently to a PostgreSQL database based on data coming into Cassandra.
There are no special utilities for this. You need to create your own service to gather data from Cassandra, process it, and put the results into PostgreSQL.
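For the do-it-yourself route, a minimal sketch of such a service is below. It assumes a Cassandra table events(source_id, event_time, payload) and a matching PostgreSQL table; the hosts, keyspace, credentials, and column names are illustrative assumptions, not taken from the question.

```python
# Minimal polling sketch (not production code): periodically pull recent rows from an
# assumed Cassandra table events(source_id, event_time, payload) and upsert them into a
# matching PostgreSQL table. Hosts, credentials, and schema are illustrative assumptions.
from cassandra.cluster import Cluster   # pip install cassandra-driver
import psycopg2                          # pip install psycopg2-binary

def sync_once(last_seen_time):
    cass = Cluster(["cassandra-host"]).connect("mykeyspace")
    pg = psycopg2.connect(host="pg-host", dbname="appdb", user="app", password="secret")

    # Pull rows newer than the last watermark. In a real service you would design the
    # Cassandra table so this is a partition-key query rather than relying on ALLOW FILTERING.
    rows = cass.execute(
        "SELECT source_id, event_time, payload FROM events "
        "WHERE event_time > %s ALLOW FILTERING",
        (last_seen_time,),
    )

    newest = last_seen_time
    with pg, pg.cursor() as cur:                       # psycopg2 commits on successful exit
        for row in rows:
            cur.execute(
                """
                INSERT INTO events (source_id, event_time, payload)
                VALUES (%s, %s, %s)
                ON CONFLICT (source_id, event_time) DO UPDATE SET payload = EXCLUDED.payload
                """,
                (row.source_id, row.event_time, row.payload),
            )
            newest = max(newest, row.event_time)
    return newest                                      # new watermark for the next run
```

Note that polling like this cannot see deletes and re-reads overlapping windows, which is why the CDC approach in the next answer is usually the more robust option.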

Sure. You can use Change Data Capture (CDC) to copy the data from Cassandra to PostgreSQL as and when there is a change in the Cassandra data. One option is to use Kafka Connect with appropriate connectors.

Related

Can I use Grafana with a relational database not listed in the supported data source list?

I need to show metrics in real time, but my metrics are stored in a relational database not supported by the data sources listed here: https://grafana.com/docs/grafana/latest/http_api/data_source/
Can I somehow provide the JDBC (or other DB driver) to Grafana?
As @danielle clearly mentioned, "There is no direct support for JDBC or ODBC currently. You could get this data in time series form and into Grafana if you are prepared to do some programming.
The simple json data source is a generic backend that could make JDBC/ODBC calls to MapD and then transform the data into the right form for Grafana."
https://github.com/grafana/grafana/issues/8739#issuecomment-312118425
Though this comment is a bit old, I'm pretty sure there is still no out-of-the-box way to visualize data using JDBC/ODBC.
One possible approach makes use of two facts:
- Grafana can access PostgreSQL.
- PostgreSQL can transparently expose data in other databases as though it were a PostgreSQL table, through Foreign Data Wrappers.
Doing it this way, you'd use PostgreSQL as a gateway to the data. Depending on the table structure, you might also need to create a view in PostgreSQL to shape the data to match Grafana's requirements for the PostgreSQL data source.
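A minimal sketch of the gateway setup is below. It uses postgres_fdw purely as a stand-in (for the actual unsupported database you would install the matching wrapper, e.g. an ODBC-based FDW); every host, credential, role, table, and column name here is an assumption for illustration.

```python
# Sketch of the PostgreSQL-as-gateway setup: expose a remote table through an FDW and
# shape it for Grafana's PostgreSQL data source with a view. postgres_fdw stands in for
# whatever FDW matches the real metrics database; all names and credentials are assumptions.
import psycopg2

DDL = """
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER metrics_src
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'metrics-host', port '5432', dbname 'metricsdb');

CREATE USER MAPPING FOR grafana
    SERVER metrics_src
    OPTIONS (user 'remote_reader', password 'secret');

CREATE FOREIGN TABLE cpu_raw (
    recorded_at timestamptz,
    hostname    text,
    cpu_pct     double precision
) SERVER metrics_src OPTIONS (schema_name 'public', table_name 'cpu_raw');

-- Grafana's PostgreSQL data source expects a time column; reshape via a view.
CREATE OR REPLACE VIEW grafana_cpu AS
    SELECT recorded_at AS "time", hostname AS metric, cpu_pct AS value
    FROM cpu_raw;
"""

with psycopg2.connect(host="gateway-pg", dbname="gateway",
                      user="admin", password="secret") as conn:
    with conn.cursor() as cur:
        cur.execute(DDL)
```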

What value does a Postgres adapter for Spark/Hadoop add?

I am not an HDFS nerd, but coming from a traditional RDBMS background, I am only scratching the surface of newer technologies like Hadoop and Spark. Now, I was looking at my options when it comes to SQL querying on Spark data.
What I realized is that Spark inherently supports SQL querying. Then I came across this link:
https://www.enterprisedb.com/news/enterprisedb-announces-new-apache-spark-connecter-speed-postgres-big-data-processing
I am trying to make some sense of it. If I understand it correctly, the data is still stored in HDFS, but the Postgres connector is used as a query engine? If so, in the presence of an existing querying framework, what new value does this Postgres connector add?
Or am I misunderstanding what it actually does?
I think you are misunderstanding.
They allude to the concept of Foreign Data Wrapper.
"... They allow PostgreSQL queries to include structured or unstructured data, from multiple sources such as Postgres and NoSQL databases, as well as HDFS, as if they were in a single database. ...
"
This sounds to me like the Oracle Big Data Appliance approach. From Postgres you can look at the world of data processing it logically as though it is all Postgres, but underwater the HDFS data is accessed using Spark query engine invoked by the Postgres Query engine, but you need not concern yourself with that is the likely premise. We are in the domain of Virtualization. You can combine Big Data and Postgres data on the fly.
There is no such thing as Spark data as it is not a database as such barring some Spark fomatted data that is not compatible with Hive.
The value will be invariably be stated that you need not learn Big Data etc. Whether that is true remains to be seen.
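To make the "combine on the fly" point concrete, here is a small sketch. It assumes an HDFS-backed foreign table has already been defined through the vendor's HDFS FDW (or a similar wrapper); the table and column names and connection details are purely illustrative.

```python
# Sketch: once HDFS data is exposed as a foreign table, a single Postgres query can join
# it with ordinary local tables. The foreign table 'hdfs_clickstream' and the local table
# 'customers' are assumed/illustrative; creating the foreign table is done separately
# with the HDFS FDW's own options.
import psycopg2

QUERY = """
SELECT c.customer_name,
       count(*) AS clicks
FROM   hdfs_clickstream AS h        -- foreign table backed by HDFS (via the FDW)
JOIN   customers        AS c        -- ordinary local PostgreSQL table
       ON c.customer_id = h.customer_id
WHERE  h.clicked_at >= now() - interval '1 day'
GROUP  BY c.customer_name
ORDER  BY clicks DESC
LIMIT  10;
"""

with psycopg2.connect(host="edb-host", dbname="analytics",
                      user="reporter", password="secret") as conn:
    with conn.cursor() as cur:
        cur.execute(QUERY)
        for name, clicks in cur.fetchall():
            print(name, clicks)
```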

Extract data from MongoDB with Sqoop to write to HDFS?

I am concerned about extracting data from MongoDB, since my application transacts most of its data through MongoDB.
I have worked with Sqoop to extract data and found that RDBMSs integrate well with HDFS via Sqoop. However, I have found no clear direction on extracting data from a NoSQL database with Sqoop and dumping it onto HDFS for processing large chunks of data.
Please share your suggestions and investigations.
I have extracted static information and transaction data from MySQL: I simply used Sqoop to store the data in HDFS and processed it there. Now I have live transactions of about 1 million unique email IDs per day, modelled in MongoDB. I need to move this data from MongoDB to HDFS for processing/ETL. How can I achieve this goal using Sqoop? I know I can schedule the task, but what would be the best approach to get the data out of MongoDB via Sqoop?
Consider a 5-DataNode cluster with 2 TB of capacity. Data size varies from 1 GB to 2 GB in peak hours.
Sqoop can only import data from relational databases. There are other ways to get data from MongoDB into Hadoop,
e.g. the MongoDB Connector for Hadoop: https://docs.mongodb.com/ecosystem/tools/hadoop/
Alternatively, you can use a data flow management tool like NiFi or StreamSets to pull data from MongoDB in real time.
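If you prefer to roll it yourself rather than use the connector or a flow tool, a minimal sketch is below: it pulls documents with pymongo and lands them on HDFS as JSON lines over WebHDFS using the hdfs (HdfsCLI) package. Hostnames, database/collection names, the created_at field, and paths are all assumptions.

```python
# Sketch: incrementally copy MongoDB documents to HDFS as JSON lines via WebHDFS.
# The hosts, database/collection, timestamp field, and target path are assumptions.
import json
from datetime import datetime, timedelta

from pymongo import MongoClient     # pip install pymongo
from hdfs import InsecureClient     # pip install hdfs

mongo = MongoClient("mongodb://mongo-host:27017")
hdfs = InsecureClient("http://namenode:9870", user="etl")   # WebHDFS endpoint (assumed port)

since = datetime.utcnow() - timedelta(hours=1)               # incremental window
cursor = mongo["appdb"]["email_events"].find({"created_at": {"$gte": since}})

target = "/data/email_events/{:%Y%m%d%H}.jsonl".format(since)
with hdfs.write(target, encoding="utf-8", overwrite=True) as writer:
    for doc in cursor:
        doc["_id"] = str(doc["_id"])                         # ObjectId is not JSON-serializable
        writer.write(json.dumps(doc, default=str) + "\n")
```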

Forwarding data from one source to another in real time

I have a legacy system that is capable of inserting and updating data from its database into a remote RDBMS (using a JDBC driver) in real time. I cannot change its code since I don't have it.
We are thinking of moving this data to a NoSQL data store like Cassandra.
I am thinking of deploying Postgres in the middle and pushing the data on to Cassandra or writing it to a flat file. Since there are frequent updates, I will have to store the data in two databases. Is there any ETL process that can listen to SQL statements (insert, update, delete) and forward them to a different source?
One option would be to use Bottled Water to capture changes in PostgreSQL and create a consumer that applies those changes to, e.g., Cassandra.
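The consumer half might look roughly like the sketch below. It assumes the change events arrive on a Kafka topic as JSON objects like {"op": ..., "id": ..., "payload": ...}; Bottled Water's default encoding is actually Avro, so the topic name, event shape, and decoding here are assumptions, not its real wire format.

```python
# Hedged sketch of a CDC consumer: read change events from a Kafka topic and apply them
# to Cassandra. Topic name, event shape, hosts, and the target table are assumptions.
import json

from kafka import KafkaConsumer            # pip install kafka-python
from cassandra.cluster import Cluster      # pip install cassandra-driver

session = Cluster(["cassandra-host"]).connect("mykeyspace")
upsert = session.prepare("INSERT INTO records (id, payload) VALUES (?, ?)")  # INSERT acts as upsert in Cassandra
delete = session.prepare("DELETE FROM records WHERE id = ?")

consumer = KafkaConsumer(
    "pg_records_changes",                                  # assumed topic name
    bootstrap_servers="kafka-host:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    auto_offset_reset="earliest",
)

for message in consumer:
    event = message.value
    if event["op"] in ("insert", "update"):
        session.execute(upsert, (event["id"], event["payload"]))
    elif event["op"] == "delete":
        session.execute(delete, (event["id"],))
```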

HDFS to PostgreSQL

We need a process in place to pull data from the Hadoop Distributed File System (HDFS) into a relational database (PostgreSQL) on a regular basis. We will need to transfer several million records per hour, and I am looking for the best industry-standard way to move data out of HDFS. Does anyone have any suggestions?
The idea is for a web app to interact with PostgreSQL which will have aggregated data.
Sqoop is built for the purpose of moving data between relational data stores and Hadoop. Specifically, you want sqoop-export.
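A hedged sketch of a scheduled sqoop-export run is below, wrapped in Python so it can be driven from cron or a scheduler; the JDBC URL, credentials, table name, and HDFS directory are placeholders, and the target table must already exist in PostgreSQL.

```python
# Sketch of invoking sqoop-export from Python. All connection details, the table name,
# and the HDFS export directory below are placeholder assumptions.
import subprocess

sqoop_cmd = [
    "sqoop", "export",
    "--connect", "jdbc:postgresql://pg-host:5432/analytics",   # assumed target database
    "--username", "loader",
    "--password-file", "/user/loader/.pg_password",            # keeps the secret off the command line
    "--table", "hourly_aggregates",                            # must already exist in PostgreSQL
    "--export-dir", "/data/aggregates/2024-01-01-00",          # HDFS directory holding the files
    "--input-fields-terminated-by", ",",
    "--num-mappers", "4",
]

subprocess.run(sqoop_cmd, check=True)   # raises CalledProcessError if the export fails
```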