Postgres Error Handling in Pentaho Transformation

I have a Pentaho mapping in which I am loading data from a flat file to a Postgres table. It's a simple one-to-one mapping.
I am trying to incorporate error handling in that mapping.
Without error handling, the data loads pretty fast (around 3,000 rows per second), but once error handling is added it loads very slowly, at about 1 row per second.
Has anyone faced a similar problem? Does the Postgres driver in PDI not support error handling?
Attaching an image of my mapping:
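A likely explanation for the slowdown, as far as I understand PDI's Table Output step, is that enabling error handling forces row-by-row inserts: a failed JDBC batch cannot be attributed to one specific row, so the batch inserts and larger commits used on the fast path have to be given up. The plain-JDBC sketch below only illustrates that difference; it is not PDI code, and the table and column names (target_table, col1, col2) are made up.

// Minimal JDBC sketch (not PDI code) contrasting the two insert strategies.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class InsertStrategies {

    // Fast path: many rows per JDBC batch, one commit per batch.
    static void batchedInsert(Connection conn, List<String[]> rows) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO target_table (col1, col2) VALUES (?, ?)")) {
            int n = 0;
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch();
                if (++n % 1000 == 0) {
                    ps.executeBatch();   // a failure here cannot be tied to one specific row
                    conn.commit();
                }
            }
            ps.executeBatch();
            conn.commit();
        }
    }

    // Slow path: one statement and one error check per row, so bad rows can be diverted.
    static void rowByRowInsert(Connection conn, List<String[]> rows) throws SQLException {
        conn.setAutoCommit(true);
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO target_table (col1, col2) VALUES (?, ?)")) {
            for (String[] row : rows) {
                try {
                    ps.setString(1, row[0]);
                    ps.setString(2, row[1]);
                    ps.executeUpdate();          // one round trip per row
                } catch (SQLException e) {
                    // Roughly where an error-handling hop would receive the rejected row.
                    System.err.println("Rejected row: " + e.getMessage());
                }
            }
        }
    }
}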

Related

Postgres to Elasticsearch Real-Time Data Replication Solution

I have a Spring Boot application with Postgres, but for search we want to use Elasticsearch, so we need to send the Postgres data to Elasticsearch. Is there a production-ready approach you can suggest: a solution we can implement that handles failures, real-time data syncing, and indexing? When a new create or update happens, it should go into the ES index as well. The Postgres tables also have joins, and we want to handle localization too.

How to process and insert millions of MongoDB records into Postgres using Talend Open Studio

I need to process millions of records coming from MongoDB and build an ETL pipeline to insert that data into a PostgreSQL database. However, with every method I've tried, I keep getting an out-of-memory heap space exception. Here's what I've already tried:
Tried connecting to MongoDB using tMongoDBInput and adding a tMap to process the records and output them over a connection to PostgreSQL. tMap could not handle it.
Tried loading the data into a JSON file and then reading from the file into PostgreSQL. The data got loaded into the JSON file, but from there on I got the same memory exception.
Tried increasing the RAM for the job in the settings and tried the above two methods again, still no change.
I specifically wanted to know if there's any way to stream this data or process it in batches to counter the memory issue.
Also, I know that there are some components dealing with BulkDataLoad. Could anyone please confirm whether they would be helpful here, since I want to process the records before inserting, and if so, point me to the right documentation to get that set up?
Thanks in advance!
Since you have already tried all the possibilities, the only way I can see to meet this requirement is to break the job down into multiple sub-jobs, or to go with an incremental load based on key columns or date columns, treating this as a one-time activity for now.
Please let me know if it helps.
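Outside of Talend, the streaming-plus-batching idea can also be illustrated with a small Java sketch: read the MongoDB collection through a cursor and write to PostgreSQL in fixed-size JDBC batches, so the full data set never sits in memory. This assumes the MongoDB Java driver and the PostgreSQL JDBC driver are on the classpath; the connection strings, collection, table, and field names (source_collection, target_table, name) are placeholders.

// Minimal sketch: stream MongoDB documents and insert into PostgreSQL in batches.
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;
import org.bson.Document;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class MongoToPostgresBatch {
    public static void main(String[] args) throws Exception {
        int batchSize = 1000;
        try (MongoClient mongo = MongoClients.create("mongodb://localhost:27017");
             Connection pg = DriverManager.getConnection(
                     "jdbc:postgresql://localhost:5432/target_db", "user", "password")) {

            MongoCollection<Document> source =
                    mongo.getDatabase("source_db").getCollection("source_collection");
            pg.setAutoCommit(false);

            String sql = "INSERT INTO target_table (id, name) VALUES (?, ?)";
            try (PreparedStatement ps = pg.prepareStatement(sql);
                 MongoCursor<Document> cursor = source.find().batchSize(batchSize).iterator()) {

                int count = 0;
                while (cursor.hasNext()) {
                    Document doc = cursor.next();
                    // Any per-record transformation would go here.
                    ps.setString(1, doc.getObjectId("_id").toHexString());
                    ps.setString(2, doc.getString("name"));
                    ps.addBatch();

                    if (++count % batchSize == 0) {
                        ps.executeBatch();  // flush a batch; nothing accumulates in memory
                        pg.commit();
                    }
                }
                ps.executeBatch();
                pg.commit();
            }
        }
    }
}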

Power BI - Data Load Error - OLE DB or ODBC error: [DataSource.Error] PostgreSQL: Exception while reading from stream

My question might look similar to some earlier posts, but none of the solutions has explained the root cause of this behavior.
Let me explain what I have done so far:
I am connecting to a Postgres DB (running in our company's AWS environment) via my Power BI Desktop client. The connection setup was pretty easy and I am able to see all the tables in the DB.
For 2 of my tables, which are extremely big, when I try to load the data I get the error message below:
Data Load Error- OLE DB or ODBC error: [DataSource.Error] PostgreSQL: Exception while reading from stream.
I tried changing the Command Timeout parameter in the initial M query -- didn't help.
I tried writing a native query with select * and a where clause (using a parameter) -- it worked.
Question:
When Power BI starts loading the data without any parameter, it does start extracting some thousands of records but gets interrupted and throws the mentioned error. Is a limit on the database server side being hit, or is it a limitation of Power BI?
What can I change on my database server side if I don't want to pull data using parameters (since in the end I need all the data for my reports)?

Ingest a big local JSON file into Druid

It's my first Druid experience.
I have a local setup of Druid on my machine.
Now I'd like to run some query performance tests. My test data is a huge local JSON file, 1.2 GB.
The idea was to load it into Druid and run the required SQL query. The file is parsed and successfully processed (I'm using the Druid web-based UI to submit an ingestion task).
The problem I run into is the datasource size. It doesn't make sense to me that 1.2 GB of raw JSON data results in a 35 MB datasource. Is there any limitation in a locally running Druid setup? I suspect the test data is only partially processed. Unfortunately, I didn't find any relevant config to change. I'd appreciate it if someone could shed light on this.
Thanks in advance
With Druid, 80-90 percent compression is expected. I have seen a 2 GB CSV file reduced to a 200 MB Druid datasource.
Can you query the count to make sure all data is ingested? Also, please disable the approximate HyperLogLog algorithm to get an exact count. Druid SQL will switch to exact distinct counts if you set "useApproximateCountDistinct" to "false", either through the query context or through broker configuration (refer to http://druid.io/docs/latest/querying/sql.html).
You can also check the logs for exceptions and error messages. If Druid has a problem ingesting a particular JSON record, it skips that record.
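As a concrete sketch of the count check suggested above: Druid's SQL API accepts an HTTP POST with a JSON body containing the query and a context map, so "useApproximateCountDistinct" can be passed per query. The URL assumes a quickstart-style router on localhost:8888, and the datasource name my_datasource is a placeholder.

// Minimal sketch using Java 11's HttpClient against Druid's SQL endpoint.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DruidExactCount {
    public static void main(String[] args) throws Exception {
        // COUNT(*) gives the total row count to compare with the source file;
        // the context flag matters when a query uses COUNT(DISTINCT ...).
        String body = "{"
                + "\"query\": \"SELECT COUNT(*) AS row_count FROM my_datasource\","
                + "\"context\": {\"useApproximateCountDistinct\": false}"
                + "}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8888/druid/v2/sql"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println(response.body());
    }
}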

How to copy a database from PostgreSQL to SAP HANA with Talend?

Well, my problem is: how can I copy a database with Talend from PostgreSQL to SAP HANA without needing to write a job for every table?
The reason is that it would take a long time to prepare all those jobs, considering there are at least 200 tables, each with at least 30 columns.
I tried the tTransferDatabase plugin, but I can't get it to transfer to SAP HANA; it gives me an error that it can't copy the schema (while it successfully worked copying to another PostgreSQL database), and I am sure the schema names are right.
Here is the error:
Exception in component tTransferDatabase_1
java.lang.NullPointerException
at org.apache.ddlutils.PlatformFactory.createNewPlatformInstance(PlatformFactory.java:86)
at org.apache.ddlutils.PlatformFactory.createNewPlatformInstance(PlatformFactory.java:124)
at com.devjpcb.transferdatabase.TransferDatabase.getPlatformDestine(TransferDatabase.java:179)
at com.devjpcb.transferdatabase.TransferDatabase.copySchemaToDatabase(TransferDatabase.java:249)
at local_project.aaasa_0_1.aaasa.tTransferDatabase_1Process(aaasa.java:836)
at local_project.aaasa_0_1.aaasa.runJobInTOS(aaasa.java:1130)
at local_project.aaasa_0_1.aaasa.main(aaasa.java:951)
Is there maybe a way to do something like: for each table in the connection, guess the table schema, copy the columns from the table to the other side of a tMap, and run?
Any advice would be helpful ;). Thank you!
With some work, you could use the example job created by rbaldwin on Talend Exchange; note that it starts with files, not a database. But you could easily create a job that loops through all your database tables and extracts each one to a file, which you could then use as the starting point.
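To sketch the "loop through all your database tables" idea outside of Talend: the table list can be read from PostgreSQL's information_schema and each name handed to a generic per-table extract/load step (in Talend, for example, feeding an iterate link). The connection details and the public schema name below are assumptions.

// Minimal JDBC sketch that lists all tables in a PostgreSQL schema,
// which could drive a per-table extract-and-load loop.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ListPostgresTables {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://localhost:5432/source_db";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT table_name FROM information_schema.tables "
                   + "WHERE table_schema = ? AND table_type = 'BASE TABLE' "
                   + "ORDER BY table_name")) {
            ps.setString(1, "public");
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    String table = rs.getString("table_name");
                    // In a real migration job, each table name would be handed to
                    // a generic extract step (e.g. COPY to file) and a matching load step.
                    System.out.println(table);
                }
            }
        }
    }
}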
Another option is Bekwam's solution.