How to extract data from a Hive table to CSV using Talend

I want to transfer data from one Hadoop server to another Hadoop server with the help of Talend.
Through my research I have come to know that we can transfer data through flat files. Can anyone suggest how to transfer data from Hive to a flat file? If there is any other way to transfer the data using Talend, please suggest it.

You can use the tHiveInput component to read the data from the Hive table. Use a row link to connect it to a tFileOutputDelimited component, which writes the rows out as a delimited flat file (CSV).
If you want to transfer the data to another Hadoop system instead, you can use a tHiveOutput or a tHDFSOutput component in place of tFileOutputDelimited.
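For reference, the same Hive-to-flat-file extraction can also be scripted outside of Talend. Below is a minimal PySpark sketch, assuming Spark is configured against your Hive metastore; the database, table name, and output path are placeholders:
from pyspark.sql import SparkSession

# Assumes Spark is configured with Hive support and access to the metastore.
spark = (SparkSession.builder
         .appName("hive-to-csv")        # hypothetical job name
         .enableHiveSupport()
         .getOrCreate())

# "mydb.mytable" and the HDFS output path are placeholders for illustration.
df = spark.table("mydb.mytable")
(df.write
   .option("header", "true")
   .mode("overwrite")
   .csv("hdfs:///tmp/mytable_csv"))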

Related

Azure Data Factory: Implementing SCD2 on txt files

I have flat files in an ADLS source.
For the full load we are adding 2 columns, Insert and a datetimestamp.
For the change load we need to look up against the full data: records already present in the full data should be treated as Updates, and records not present should be treated as Inserts and copied.
Below is the approach I tried to work out, but I am unable to make it work.
Can anyone help me with this?
Thank you, and waiting for a quick response.
Currently, updating an existing flat file through an Azure Data Factory sink is not supported; you have to write a new flat file.
You can also use a Data Flow activity to read the full and incremental data and load the result to a new file in the sink transformation.

How can we handle data validations in Snowpipe in Snowflake

My scenario: I have data in flat files in AWS S3.
I am using SNS to trigger Snowpipe when a new file arrives in S3.
To load the data from the flat files in S3 into a Snowflake table I am using Snowpipe.
While loading the data from the flat files into the Snowflake table via Snowpipe, can I handle data validation and a couple of calculations on the source data?
Please help me if there is any way to do this.
Thanks in advance.
The VALIDATION_MODE copy option is not yet supported by Snowpipe. However, Snowpipe does support simple transformations in the COPY statement, such as column reordering and casts. The best way to perform calculations and transform your data is to load it into a staging table and process it downstream into the target tables.
Reference:
https://docs.snowflake.net/manuals/sql-reference/sql/create-pipe.html#usage-notes
https://docs.snowflake.net/manuals/user-guide/data-load-transform.html
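As an illustration of the transformations Snowpipe does allow, here is a minimal sketch using the snowflake-connector-python package. The account details, stage, pipe, table, columns, and casts below are all hypothetical and would need to match your own objects:
import snowflake.connector

# Placeholder connection details; replace with your own account settings.
conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
    database="my_db",
    schema="public",
)

# The COPY inside the pipe may reorder columns and cast types,
# but it cannot run VALIDATION_MODE or arbitrary calculations.
create_pipe_sql = """
CREATE OR REPLACE PIPE raw_events_pipe AUTO_INGEST = TRUE AS
  COPY INTO raw_events (event_id, event_ts, payload)
  FROM (
    SELECT $1::NUMBER, $2::TIMESTAMP_NTZ, $3::VARCHAR
    FROM @raw_events_stage
  )
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
"""

conn.cursor().execute(create_pipe_sql)
conn.close()
Any heavier validation or calculation would then live in SQL that moves rows from the staging table into the target tables, outside the pipe itself.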

Reuse data flow from tMysqlInput

I have a MySQL statement which retrieves a lot of data from a table via the tMysqlInput component. Now I want to process the data in different ways and then merge the streams back into a single output of the job.
I wanted to use the tReplicate component to avoid multiple requests to my SQL server, but if I use tReplicate I am not able to join the two data streams afterwards with tMap.
Is there any other way to reuse the result of a tMysqlInput component multiple times?
Thanks in advance,
Michael
You can use tHashOutput to cache the data coming from MySQL in memory, and then use one or more tHashInput components to read that data back and feed it into tMap.

How to load CSV data into a SnappyData table in rowstore mode using a JDBC connection

Hi, I am starting to learn the SnappyData rowstore (link). I tried all the examples there and they work, but I need to store CSV and JSON data in a SnappyData table. In the examples they connect manually through the snappy-shell, create the tables and insert the records. Another option is the JDBC client (link); I tried this way, but I don't know how to load a CSV file and store it in a Snappy table. I also tried direct query-based access to the SnappyData store (link). If anyone knows how to store CSV data in a SnappyData table using JDBC, please share it with me. Thank you.
You should go through the SnappyData docs: http://snappydatainc.github.io/snappydata/. The "How-to" section has examples of loading data from CSV: http://snappydatainc.github.io/snappydata/howto/#how-to-load-data-in-snappydata-tables
// Read the CSV file into a DataFrame through the SnappySession
val someCSV_DF = snSession.read.csv("Myfile.csv")
// Insert the DataFrame rows into the existing SnappyData table
someCSV_DF.write.insertInto("MyTable")
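Since the question specifically asks about JDBC, here is a rough sketch of a pure-SQL route over a JDBC connection, using the jaydebeapi Python package. The driver class, connection URL, jar path, file path, and the CREATE EXTERNAL TABLE ... USING csv syntax are assumptions to verify against the SnappyData docs linked above:
import jaydebeapi

# Assumed driver class, client port and jar location; adjust to your install.
conn = jaydebeapi.connect(
    "io.snappydata.jdbc.ClientDriver",
    "jdbc:snappydata://localhost:1527/",
    [],
    "/path/to/snappydata-jdbc.jar",
)
cur = conn.cursor()

# Stage the CSV as an external table, then copy its rows into the target table.
# Table names and the CSV path are placeholders.
cur.execute(
    "CREATE EXTERNAL TABLE staging_csv USING csv "
    "OPTIONS (path '/path/to/Myfile.csv', header 'true', inferSchema 'true')"
)
cur.execute("INSERT INTO MyTable SELECT * FROM staging_csv")

cur.close()
conn.close()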

Tableau Data store migration to Redshift

Currently we have a workbook developed in Tableau using an Oracle server as the data store, where we have all our tables and views. Now we are migrating to Redshift for better performance. We have the same table structure, table names and field names in Redshift as in Oracle. We already have the Tableau workbook developed and we now need to point it to the Redshift tables and views. How do we point the developed workbook to Redshift? Kindly help.
Also let me know of any other inputs in this regard.
Thanks,
Raj
Use the Replace Data Source functionality of Tableau Desktop.
You can bypass Replace Data Source and move the data directly from Oracle to Redshift using bulk loaders.
A simple combination of SQL*Plus + Python + boto + psycopg2 will do the job.
It should:
Open a read pipe from Oracle SQL*Plus
Compress the data stream
Upload the compressed stream to S3
Bulk-append the data from S3 to the Redshift table
You can check an example of how to extract table or query data from Oracle and then load it into Redshift with a COPY command from S3; a sketch of the same pipeline follows below.
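A minimal Python sketch of that pipeline, assuming the table has already been spooled out of SQL*Plus to a local CSV file; the file names, bucket, cluster endpoint, table, and IAM role are all placeholders:
import gzip
import shutil

import boto3
import psycopg2

# 1. Compress the CSV produced by the SQL*Plus spool.
with open("my_table.csv", "rb") as src, gzip.open("my_table.csv.gz", "wb") as dst:
    shutil.copyfileobj(src, dst)

# 2. Upload the compressed file to S3 (bucket and key are placeholders).
boto3.client("s3").upload_file("my_table.csv.gz", "my-bucket", "staging/my_table.csv.gz")

# 3. Bulk-append into Redshift with COPY (connection details and IAM role are placeholders).
conn = psycopg2.connect(
    host="my-cluster.xxxxxxxx.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="mydb",
    user="myuser",
    password="mypassword",
)
with conn, conn.cursor() as cur:
    cur.execute("""
        COPY my_schema.my_table
        FROM 's3://my-bucket/staging/my_table.csv.gz'
        IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
        GZIP CSV;
    """)
conn.close()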