This is the actual job.
I have a job that produces log output. At the moment I print it with tLogRow, but I want to write it into a database instead. Is that possible with Talend? I have already tried tMysqlRow and tMysqlOutput, but both throw errors. I have searched Google but could not find a clear answer.
This is the actual output when using tLogRow.
You can save the error information directly from tLogCatcher to a database table. Make sure that the table name and column names in the database match the tMysqlOutput schema. If you don't want to save the complete information, you can use a tXMLMap to filter for the columns you need.
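To illustrate the "names must match" point, here is a minimal sketch that builds a CREATE TABLE statement for just the columns you keep. The column names and types listed are my assumption of the default tLogCatcher schema, and the table name job_errors is hypothetical; check both against the schema shown in your own Studio before relying on them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LogTableDdl {
    // ASSUMPTION: default tLogCatcher columns mapped to MySQL types.
    // Verify against the component schema in your Studio version.
    static Map<String, String> logCatcherColumns() {
        Map<String, String> cols = new LinkedHashMap<>();
        cols.put("moment", "DATETIME");
        cols.put("pid", "VARCHAR(20)");
        cols.put("root_pid", "VARCHAR(20)");
        cols.put("father_pid", "VARCHAR(20)");
        cols.put("project", "VARCHAR(50)");
        cols.put("job", "VARCHAR(255)");
        cols.put("context", "VARCHAR(50)");
        cols.put("priority", "INT");
        cols.put("type", "VARCHAR(255)");
        cols.put("origin", "VARCHAR(255)");
        cols.put("message", "VARCHAR(255)");
        cols.put("code", "INT");
        return cols;
    }

    // Builds the DDL for the columns in 'keep', in the order given.
    static String createTable(String table, List<String> keep) {
        Map<String, String> known = logCatcherColumns();
        List<String> defs = new ArrayList<>();
        for (String name : keep) {
            String type = known.get(name);
            if (type == null) {
                throw new IllegalArgumentException("Unknown column: " + name);
            }
            defs.add(name + " " + type);
        }
        return "CREATE TABLE IF NOT EXISTS " + table + " (" + String.join(", ", defs) + ")";
    }
}
```

The same column list, in the same order, would then go into the output component's schema so that names line up on both sides.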
Related
I am new to Talend and need guidance on the scenario below:
We have a set of 10 JSON files with different structures/schemas that need to be loaded into 10 different tables in a Redshift database.
Is there a way to write a generic script/job that iterates through each file and loads it into the database?
For example:
File Name: abc_< date >.json
Table Name: t_abc
File Name: xyz< date >.json
Table Name: t_xyz
and so on..
Thanks in advance
With the Talend Enterprise version you can benefit from dynamic schema. However, in my experience JSON files are usually somewhat nested structures, so you would have to figure out how to flatten them; once that's done it becomes a 1:1 load. With Open Studio this will not work, because dynamic schema is missing.
Basically what you could do is: write some Java code that transforms your JSON into CSV. Then either run psql from the command line or, if your Talend ships a recent enough PostgreSQL JDBC driver, invoke the client-side \COPY through it to load the data. If the column order of your file matches that of the database table, it should work without you having to specify how many columns there are, so it's dynamic, but the data never "flows" through Talend.
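A rough sketch of the flattening step, assuming the JSON has already been parsed into nested java.util.Map objects by whatever JSON library you use (parsing is not shown, and arrays are not handled). The class and method names are made up for illustration. Nested keys are joined with dots and values are written as one CSV line.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FlattenToCsv {
    // Flattens nested maps into a single level, e.g. {"address":{"city":"x"}} -> "address.city".
    static Map<String, String> flatten(Map<String, Object> record) {
        Map<String, String> out = new LinkedHashMap<>();
        walk("", record, out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void walk(String prefix, Map<String, Object> node, Map<String, String> out) {
        for (Map.Entry<String, Object> e : node.entrySet()) {
            String key = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            Object value = e.getValue();
            if (value instanceof Map) {
                walk(key, (Map<String, Object>) value, out);
            } else {
                // Nulls become empty fields; everything else uses toString().
                out.put(key, value == null ? "" : value.toString());
            }
        }
    }

    // Quotes a field only when it contains a comma, a double quote, or a line break.
    static String quote(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    static String toCsvLine(Iterable<String> fields) {
        List<String> quoted = new ArrayList<>();
        for (String f : fields) {
            quoted.add(quote(f));
        }
        return String.join(",", quoted);
    }
}
```

Calling toCsvLine(flat.keySet()) gives a header and toCsvLine(flat.values()) gives the matching data row, which is the shape a CSV bulk load expects.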
A really not cool, but theoretically possible, solution: if Redshift supports JSON (Postgres does), you can create a staging table with 2 columns: filename and content. Once the whole content is in this staging table, you can write an INSERT ... SELECT statement that transforms the JSON into a tabular format and inserts it into the final table.
However, with your toolset you probably have no other choice than to load these files with 1 job per file, and I'd suggest 1 dedicated job for each file. Each job would look for its own files and be triggered/scheduled individually, or be part of a bigger job where you scan the folders and trigger the right job for the right file.
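For the "scan the folders and trigger the right job" variant, the routing can be as small as a filename-to-table lookup. This is only a sketch: I am assuming the date part is 8 digits (the question does not say) and that the underscore before it is optional, as in the two example names.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FileToTable {
    // ASSUMPTION: <name>, optional underscore, 8-digit date, ".json".
    private static final Pattern FILE_NAME =
            Pattern.compile("^([A-Za-z]+)_?(\\d{8})\\.json$");

    // Returns the target table for a file, or empty if the name does not match.
    static Optional<String> tableFor(String fileName) {
        Matcher m = FILE_NAME.matcher(fileName);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of("t_" + m.group(1).toLowerCase());
    }
}
```

Files that return an empty result can be moved to a reject folder instead of being loaded into the wrong table.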
I am not able to load multiple tables; I get this error:
Exception in component tMysqlInput_1 (MYSQL_DynamicLoading)
java.sql.SQLException: Bad format for Timestamp 'GUINESS' in column 3
One table works fine. Basically, after the first iteration, the second table tries to use the schema of the first table. Please help: how do I edit the component to make this work? I am trying to load the actor and country tables from the sakila MySQL database into another database on the same server. The image above shows the successful single-table dynamic load.
You should not use tMysqlInput if the output schemas differ. In that case there is no way around tJavaRow and custom code. However, I cannot guess what happens in your tMap, so you should provide more details about what you want to achieve.
If all you need is to load data from one table to another without any transformations, you can do one of the following:
If your tables reside in 2 different databases on the same server, you can use a tMysqlRow and execute a query "INSERT INTO catalog.table SELECT * from catalog2.table2..". You can do some simple transformations in SQL if needed.
If your tables live on different servers, check the generic solution I suggested for a similar question here. It may need some tweaking depending on your use case, but the general idea is to replicate the functionality of INSERT INTO SELECT when the tables are not on the same server.
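For the same-server case, the query text can be generated per table, so one iterate loop feeding a tMysqlRow would cover every table without a schema. A small sketch; the target database name sakila_copy is hypothetical, and the identifier check is a conservative assumption of mine, not a MySQL rule.

```java
class CopySql {
    // Conservative check so that table names coming from a loop cannot inject SQL.
    private static String ident(String name) {
        if (!name.matches("[A-Za-z0-9_]+")) {
            throw new IllegalArgumentException("Unexpected identifier: " + name);
        }
        return "`" + name + "`";
    }

    // Builds "INSERT INTO target.table SELECT * FROM source.table" for MySQL.
    static String insertSelect(String sourceDb, String targetDb, String table) {
        return "INSERT INTO " + ident(targetDb) + "." + ident(table)
                + " SELECT * FROM " + ident(sourceDb) + "." + ident(table);
    }
}
```

This only works when source and target tables have the same columns in the same order, which is the case for a plain copy.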
We have many Talend jobs that transfer data from Oracle (tOracleInput) to Redshift (tRedshiftOutputBulkExec). I would like to store the result information in a database table. For example:
Job name, start time, running time, rows loaded, successful or failed
I know that if I turn on log4j, most of that information can be derived from the log. However, saving it into a database table would make it easier to check and report the results.
I'm most interested in records loaded. I checked this link http://www.talendbyexample.com/talend-logs-and-errors-component-reference.html and manual of tRedshiftOutputBulkExec. None of them gives me such information.
Will Talend Administration Center provide such function? What is the best way to implement it?
Thanks,
After looking at the URL you provided, tLogCatcher should provide you with what you need (minus the rows loaded, which you can get with a lookup).
I started with Talend Studio Version 6.4.1. There you can set "Stats & Logs" for a job. It can log to console, files or database. When writing to a DB you set JDBC parameters and the name for three tables:
Stats Table: stores start and end timestamps of the job
Logs Table: stores error messages
Meter Table: stores the count of rows for each monitored flow
They correspond to the components tStatCatcher, tLogCatcher, tFlowMeterCatcher, where you can find the needed table schema.
To monitor a flow, select it, open the "Component" tab and tick the "Monitor this connection" checkbox.
To see the logged values you can use the "AMC" (Active Monitoring Console) in Studio or TAC.
My problem is: how can I copy a database from PostgreSQL to SAP HANA with Talend without having to write a job for every table?
The reason is that preparing all those jobs would take a long time, considering there are at least 200 tables with at least 30 columns each.
I tried the tTransferDatabase plugin, but I can't get it to transfer to SAP HANA. It gives me an error saying that it can't copy the schema (while copying to another PostgreSQL database worked), and I am sure that the schema names are right.
Here is the error:
Exception in component tTransferDatabase_1
java.lang.NullPointerException
at org.apache.ddlutils.PlatformFactory.createNewPlatformInstance(PlatformFactory.java:86)
at org.apache.ddlutils.PlatformFactory.createNewPlatformInstance(PlatformFactory.java:124)
at com.devjpcb.transferdatabase.TransferDatabase.getPlatformDestine(TransferDatabase.java:179)
at com.devjpcb.transferdatabase.TransferDatabase.copySchemaToDatabase(TransferDatabase.java:249)
at local_project.aaasa_0_1.aaasa.tTransferDatabase_1Process(aaasa.java:836)
at local_project.aaasa_0_1.aaasa.runJobInTOS(aaasa.java:1130)
at local_project.aaasa_0_1.aaasa.main(aaasa.java:951)
Is there maybe a way to do something like: for each table in the connection, guess the table schema, copy the columns across to the other side of a tMap, and run?
Any advice would be helpful ;) Thank you!
With some work, you could use the example job created by rbaldwin on Talend Exchange; note that it starts with files, not a database. But you could easily create a job that loops through all your database tables and does an extract to file, to then use as the starting point.
Another option is Bekwam's solution
I am faced with a situation where we get a lot of CSV files from different clients, but there is always some issue with the column count and column length that our target table is expecting.
What is the best way to handle frequently changing CSV files? My goal is to load these CSV files into a Postgres database.
I checked the \COPY command in Postgres, but it does not have an option to create a table.
You could try creating a pg_dump-compatible file that has the appropriate "create table" section and use that to load your data instead.
I recommend using an external ETL tool like CloverETL, Talend Studio, or Pentaho Kettle for data loading when you're having to massage different kinds of data.
\copy is really intended for importing well-formed data in a known structure.
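If you still want to use \copy for the files that happen to be well-formed, a cheap pre-check on column count can separate those from the ones that need massaging. A naive sketch: it splits on plain commas and does not understand quoted fields, so treat it as a first filter only.

```java
import java.util.ArrayList;
import java.util.List;

class CsvShapeCheck {
    // Returns the 1-based numbers of lines whose column count differs from the expected one.
    // NOTE: naive comma split; quoted fields containing commas will be miscounted.
    static List<Integer> badLines(List<String> lines, int expectedColumns) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            // The -1 limit keeps trailing empty fields, e.g. "a,b," counts as 3 columns.
            int columns = lines.get(i).split(",", -1).length;
            if (columns != expectedColumns) {
                bad.add(i + 1);
            }
        }
        return bad;
    }
}
```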