PostgreSQL Change Data Capture in Talend - postgresql

I need to capture PostgreSQL changes, and I would prefer it to be log based.
How does tPostgresqlCDC work? Is it log based or trigger based?
Is this component available in the Talend Open Studio for Data Integration suite? I know it is available in the Big Data version, but I can't seem to find it in Talend Open Studio for Data Integration.

Except for AS/400 and Oracle, all Talend CDC is trigger based.
Plus:
it is only available in the subscription version
at least in the MySQL implementation, I found some wrong logic
unlike for MySQL, log-based options are not so widely available, but you can try checking:
Bottledwater
wal2json
Debezium (main site)
CDC + Kafka gives real flexibility
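For the Debezium route (log based, via PostgreSQL logical decoding into Kafka), a rough sketch of a connector registration for Kafka Connect is below; host, credentials and names are placeholders, and the exact property names depend on the Debezium version (older releases use database.server.name instead of topic.prefix, and on pre-10 PostgreSQL the plugin would be wal2json or decoderbufs rather than pgoutput). The PostgreSQL server also needs wal_level = logical.
{
  "name": "pg-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "plugin.name": "pgoutput",
    "database.hostname": "pgserver",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "secret",
    "database.dbname": "mydb",
    "topic.prefix": "myserver"
  }
}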

Related

Integrating Confluence and PostgreSQL via REST APIs

I'm new to Confluence and would like to know whether there is any way I can connect part of an existing Confluence page with another PostgreSQL DB by making API calls, instead of creating any sockets from the Confluence infrastructure. I'm open to any and all options that can help me achieve this.
Requirement:
Have a Confluence page that updates its front end with data from the DB
No/minimal changes to the Confluence infrastructure backend
When I click "get data" on the front end, it should fetch data from the DB and populate it on the screen
I have tried googling all the similar solutions I can find, but I couldn't find any that suits my specific requirement. I tried looking at Atlassian's pages for connecting to a DB, and other DB connection guides from the sources mentioned below.
Source 1 - Atlassian
Source 2 - Atlassian
These two sources show how to connect the DB to Confluence using a JDBC connection and how to troubleshoot any issues arising out of it, which I want to keep as a last resort.
Source 3 - Agix - uses JDBC
This article also shows a way to connect a Confluence server to a DB via JDBC, hosted on a CentOS server.
Source 4
This shows a way to connect Jira to a DB, again utilising the Jira setup configuration.
Please note: I want to touch the existing Confluence infrastructure as little as possible.
Update: I have used the data source for the space to get the DB connected. Now the challenge is to get data from the user and feed it into the DB. Any leads on how I can do that? I'm using the SQL macro to fetch data from the DB, but I'm not sure how to feed user input from a form into the DB.
If you mean to use PostgreSQL as the core data DB for Confluence, then you just need to follow the guides you linked to, since Confluence supports most SQL databases. But if you mean to get data from some other PostgreSQL DB that is just a container for some other data or system, then it seems a better option to configure a separate DB for Confluence (as it is rather big) and use the Java API / REST API to integrate the two systems.
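If you go the REST API route, here is a minimal sketch of the idea. It assumes Confluence Server/Data Center with basic auth; the URL, page ID, credentials, query and the requests/psycopg2 packages are placeholders and assumptions, not details from your setup.
# Sketch: read a value from PostgreSQL and push it into a Confluence page via the REST API.
import requests
import psycopg2

CONFLUENCE = "https://confluence.example.com"
PAGE_ID = "123456"
AUTH = ("api_user", "api_password")

# 1. Fetch the data from PostgreSQL.
conn = psycopg2.connect(host="db.example.com", dbname="mydb", user="reader", password="secret")
with conn, conn.cursor() as cur:
    cur.execute("SELECT count(*) FROM orders")
    order_count = cur.fetchone()[0]
conn.close()

# 2. Read the current page to get its title and version number.
page = requests.get(
    CONFLUENCE + "/rest/api/content/" + PAGE_ID,
    params={"expand": "version"},
    auth=AUTH,
).json()

# 3. Update the page body (the version number must be incremented on every update).
requests.put(
    CONFLUENCE + "/rest/api/content/" + PAGE_ID,
    json={
        "id": PAGE_ID,
        "type": "page",
        "title": page["title"],
        "version": {"number": page["version"]["number"] + 1},
        "body": {"storage": {"value": "<p>Current order count: %d</p>" % order_count,
                             "representation": "storage"}},
    },
    auth=AUTH,
).raise_for_status()
A script like this can be scheduled entirely outside Confluence, which keeps changes to the Confluence infrastructure itself to a minimum.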

Implement Oracle external table-like functionality in Azure managed PostgreSQL

Currently we are using Oracle 19c external table functionality on-prem, whereby CSV files are loaded to a specific location on the DB server and get automatically loaded into an Oracle external table. The file location is specified as part of the table DDL.
We have a requirement to migrate to Azure managed PostgreSQL. From checking the PostgreSQL documentation, functionality similar to Oracle external tables can be achieved in standalone PostgreSQL using "foreign tables" with the help of the file_fdw extension. But in Azure managed PostgreSQL we cannot use this, since we do not have access to the DB file system.
One option I came across was to use Azure Data Factory, but that looks like an expensive option. The expected volume is about ~1 million record inserts per day.
Could anyone advise on possible alternatives? One option I was thinking of was a scheduled shell script running on an Azure VM which loads the files into PostgreSQL using psql commands like \copy. Would that be a good option for the volume to be supported?
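For reference, the kind of command the script would run for each file (the server, database, table and file path here are just placeholders) would be something like:
psql "host=myserver.postgres.database.azure.com dbname=mydb user=loader sslmode=require" -c "\copy staging_table FROM '/data/incoming/file.csv' WITH (FORMAT csv, HEADER true)"
Since \copy streams the file from the client over the normal connection, it does not need access to the DB file system.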
Regards
Jacob
There is one last option that could be simple to implement in the migration: use EnterpriseDB (EDB), which avoids vendor lock-in and is also free of cost.
Check the video link below for the migration procedure steps.
https://www.youtube.com/watch?v=V_AQs8Qelfc

Automate data loading to Google Sheet from PostgreSQL database

I would like to create an automated data pull from our PostgreSQL database into a Google Sheet. I've tried the JDBC service, but it doesn't work, maybe due to incorrect variables/config. Has anyone tried doing this already? I'd also like to schedule the extraction to run every hour.
According to the documentation, only Google Cloud SQL MySQL, MySQL, Microsoft SQL Server, and Oracle databases are supported by Apps Script's JDBC service. You may have to either move to a new database or develop your own API services to handle the connection.
As for scheduling by the hour, you can use Apps Script's installable triggers.
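To illustrate the "own API service" option, here is a minimal sketch of an intermediate API; Flask and psycopg2 are just one possible choice, and the host names, table and query are placeholders.
# Sketch of a tiny read-only API in front of PostgreSQL.
# Apps Script can fetch /report and write the returned rows into the sheet.
from flask import Flask, jsonify
import psycopg2

app = Flask(__name__)

@app.route("/report")
def report():
    conn = psycopg2.connect(host="db.example.com", dbname="mydb",
                            user="reader", password="secret")
    with conn, conn.cursor() as cur:
        cur.execute("SELECT region, sum(amount) FROM sales GROUP BY region")
        rows = cur.fetchall()
    conn.close()
    return jsonify(rows)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
On the Apps Script side, an hourly installable trigger would call UrlFetchApp.fetch() against that endpoint and write the returned values into the sheet.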

Postgres ODBC Bulk Loading Slow on IBM SPSS

I have the official Postgres ODBC drivers installed and am using IBM SPSS to try to load 4 million records from an MS SQL data source. I have the option set to bulk load via ODBC, but the performance is REALLY slow. When I go SQL-to-SQL the performance is good, and when I go Postgres-to-Postgres the performance is good, but when I try to go SQL-to-Postgres it takes about 2.5 hours to load the records.
It's almost as if it's not bulk loading at all. Looking at the output, it seems like it's reading the batched record count from the source very quickly (10,000 records), but the insert on the Postgres side is taking forever. When I look at the record count every few seconds, it jumps from 0 to 10,000 but takes minutes to get there, whereas it should take seconds.
Interestingly, I downloaded a third-party driver from DevArt and the load went from 2.5 hours to 9 minutes. Still not super quick, but much better. Either the Postgres ODBC driver does not support bulk loading (unlikely, since Postgres-to-Postgres loads so quickly) or there's some configuration option at play in either the ODBC driver config or the SPSS config.
Has anybody experienced this? I've been looking at options for the ODBC driver, but can't really see anything related to bulk loading.
IBM SPSS Statistics uses the IBM SPSS Data Access Pack (SDAP). These are third-party drivers from Progress/DataDirect. I can't speak to performance using other ODBC drivers, but if you are using the IBM SPSS Data Access Pack "IBM SPSS OEM 7.1 PostgreSQL Wire Protocol" ODBC driver, then there are resources for you.
The latest release of the IBM SPSS Data Access Pack (SDAP) is version 8.0. It is available from Passport Advantage (where you would have downloaded your IBM SPSS Statistics software) as "IBM SPSS Data Access Pack V8.0 Multiplatform English (CC0NQEN)".
Once installed, see the Help. On Windows it will be here:
C:\ProgramData\Microsoft\Windows\Start Menu\Programs\IBM SPSS OEM Connect and ConnectXE for ODBC 8.0\

Making PostgreSQL logs in JSON format

I am using PostgreSQL 9.5 on Ubuntu 16.04.
Is there any way in PostgreSQL to have its logs stored in JSON format?
I need to send them to Elasticsearch; that's why I need the PostgreSQL logs in JSON format.
I followed this tutorial, but did not quite understand what changes it was asking me to make in the conf file, or where.
PostgreSQL listens to its community, and your voice has been heard!
The PostgreSQL 15 beta was released on May 15th, 2022. PostgreSQL 15 supports the jsonlog logging format, with the final release due in the third quarter of 2022.
You have to make the below change in the postgresql.conf file
log_destination = 'jsonlog'
The log output will be written to a file, making it the third type of destination of this kind, after stderr and csvlog.
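Putting it together, a minimal sketch of the relevant postgresql.conf settings (the file name pattern is just an example); as with csvlog, jsonlog needs the logging collector enabled:
logging_collector = on
log_destination = 'jsonlog'
log_directory = 'log'
log_filename = 'postgresql-%a.log'
The collector then writes files with a .json extension into the log directory.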
You can send these generated JSON logs to Elasticsearch or any other application for further log aggregation.
Check here for more info.
Update: PostgreSQL v15 is out now. You can explore it here.
PostgreSQL itself doesn't support any formats other than plain text and CSV. When you need other formats, you have to get from somewhere (or write yourself) a special extension that hooks into the logging API, formats the PostgreSQL logs and pushes them out. One such extension was developed by Michael Paquier and is described in the mentioned link. Here is the link to the source code: https://github.com/michaelpq/pg_plugins/tree/master/jsonlog . You have to compile this extension like any other PostgreSQL extension (the code is in C), and then you can use it.
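Roughly, the steps are as follows (treat this as a sketch; paths and versions depend on your installation): build the module against your server with PGXS, then preload it and restart PostgreSQL.
# in the jsonlog source directory, with pg_config for your server on the PATH
make
make install
# then in postgresql.conf
shared_preload_libraries = 'jsonlog'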
As I understand it, your problem statement is that you want to push PostgreSQL logs to Elasticsearch.
For this I would recommend using Filebeat, where you can simply enable the PostgreSQL module and set the log path. Filebeat starts reading the log files and pushes them to Elasticsearch.
You can visualize your data in Kibana with the ready-made dashboard. It is simple plug and play.
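A rough sketch of the Filebeat side (the log path is a placeholder and must match where your postgresql.conf writes its logs):
filebeat modules enable postgresql
# then set the path in modules.d/postgresql.yml, e.g. var.paths: ["/var/log/postgresql/*.log"]
filebeat setup
service filebeat start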