How to load data (Excel file) into SQL Server using Talend

I am very new to using Talend. Could somebody tell me how to load an Excel sheet into SQL Server 2012?
I tried searching on Google but found no help.
I can easily do it using SSIS tasks, but I need to do it in Talend. I took Excel as the input and tried making a DB connection, but I don't know the connection string, and I don't understand what to put for Port, Database, Schema, or Additional parameters.
I tried following these links:
see here

To solve your problem you need 3 components:
- your Excel input
- a tMap
- a tDBOutput
Join the Excel input to the tMap, and then the tMap to the DB output.
Port, Database, Schema, and Additional parameters refer to your database information.
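To make those fields a bit more concrete, here is a minimal sketch of what they map to in a plain JDBC connection to SQL Server. The host, database name and credentials are placeholders, and the exact URL format depends on the JDBC driver your Talend version ships with:

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class SqlServerConnectionSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder values: substitute your own server, database and credentials.
        String host = "myserver";        // the "Server" / host field
        int port = 1433;                 // the "Port" field (SQL Server default)
        String database = "MyDatabase";  // the "Database" field
        String user = "user1";           // SQL Server authentication login
        String password = "secret";

        // Microsoft JDBC driver URL format; "Additional parameters" end up as
        // extra ;key=value pairs (for example encrypt=false).
        String url = "jdbc:sqlserver://" + host + ":" + port
                + ";databaseName=" + database + ";encrypt=false";

        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            System.out.println("Connected: " + !conn.isClosed());
        }
    }
}
```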

What is Talend Open Studio?
Talend Open Studio for Data Integration is an open source graphical development
environment for creating and deploying custom integrations between systems. It
comes with over 600 pre-built connectors that make it quick and easy to connect
databases, transform files, load data, move, copy and rename files, and connect
individual components in order to define complex integration processes.
And to answer the question:
A raw file (Excel, CSV, XML, etc.) goes into a tFileInputDelimited, which is then connected to a tMSSqlOutput; the connection looks as shown in the figure.
Then, to configure SQL Server, go to Metadata > Db Connections > right-click > Create connection > DB Type > Microsoft SQL Server. It works with SQL Server authentication, so you need to create a user, say user1, with some password, then fill in Login, Password, Server name, and Port (1433 by default for SQL Server); the other fields can be left empty.
The job then executes successfully.
This is something to be understood when trying Talend for the first time :D

Related

Loop through .csv files using Talend

Complete noob here to Talend/data integration in general. I have done simple things like loading a CSV into an Oracle table using Talend. Below is the requirement now; I'm looking for some ideas to get started, please.
Request:
I have a folder in a Unix environment where the source application pushes out .csv files daily at 5 AM. They are named as below:
Filename_20200301.csv
Filename_20200302.csv
Filename_20200303.csv
.
.
and so on till current day.
I have to create a Talend job to parse through these CSV files every morning and load them into an Oracle table where my BI/reporting team can consume the data. This table will be used as a lookup table, and the source is making sure not to send duplicate records in the CSVs.
The files usually have about 250-300 rows per day. The plan is to keep an eye on it, and if the row volume increases in the future, perhaps limit the data to a rolling 12-month window.
Currently I have files from March 1st, 2020 up to today.
The destination Oracle schema/table is always the same.
Tools: Talend Data Fabric 7.1
I can think of the steps below but have no idea how to get started on steps 1) and 2):
1) Connect to a Unix server/shared location. I have the server details/username/password, but what component should I use in Metadata?
2) Parse through the files in the above location. Should I use tFileList? Where does tFileInputDelimited come in?
3) Maybe use tMap for some cleanup/changing data types before using tDBOutput to push into Oracle. I have used these components in the past; I just have to figure out how to insert into the Oracle table instead of truncate/load.
Any thoughts/other cool ways of doing it, please? Am I going down the right path?
For Step 1, you can use tFTPGet, which will save your files from the Unix server/shared location to your local machine or job server.
Then for Step 2, as you mentioned, you can use a combination of tFileList and tFileInputDelimited:
Point tFileList to the directory where your files are now saved (based on Step 1).
tFileList will iterate through the files found in the directory.
Next, tFileInputDelimited will parse each CSV one by one.
After that you can flow the rows through a tMap to do whatever transformation you need and write into your Oracle DB. An optional additional step is a tUnite, so you write into your DB in one go.
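As a small sketch of how the iteration is usually wired up (assuming the components are named tFileList_1 and tFileInputDelimited_1), the File name/Stream field of tFileInputDelimited holds a Talend (Java) expression that reads the file currently being iterated from the globalMap:

```java
// In tFileInputDelimited_1 > Basic settings > "File name/Stream",
// point at the file currently being iterated by tFileList_1:
((String) globalMap.get("tFileList_1_CURRENT_FILEPATH"))

// If you only need the file name without the directory, tFileList also exposes:
((String) globalMap.get("tFileList_1_CURRENT_FILE"))
```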
Hope this helps.
Please use the below flow:
tFTPFileList --> tFileInputDelimited --> tMap --> tOracleOutput
If you are picking the files from the local server rather than over FTP, please use tFileList instead of tFTPFileList.

Working with CSVs in SSIS without SQL Database connection

So I am tasked with performing some calculations in SSIS on a CSV file using T-SQL. Once I am done, I need to save the project as an ISPAC file so that other users outside my business can run the program too; however, their SQL connection managers will obviously differ from mine. To that end, is it possible to use T-SQL to modify a CSV inside SSIS without actually using a connection manager to a SQL database? If so, what tasks would I use to achieve this? Many thanks.
EDIT: For more clarification on what I need to achieve: basically I am tasked with adding additional data to a CSV file, for example concatenating forename and surname, or creating new columns such as month numbers and names from a date string, then outputting the result to a new CSV location. I am only allowed to use T-SQL and SSIS to achieve this. Once complete, I need to send the SSIS project file to another organisation to use; therefore, I cannot hardcode any of my connections or assume they hold any of the databases I will use.
My initial question was whether all this can be done inside SSIS using T-SQL, but I think I have the answer to that now. I plan to parameterise the connection string, source file location, and output file, and use the master DB and temp DBs to run the SQL that adds the additional data, so they will have SQL on their end but it won't make any assumptions about what they are using.

Is there a function or tool for filling a database with a CSV file using Eclipse?

I have got a connection to the SAP HANA database. I've created a personal DB called simply "database" which I want to fill with two CSV files that I have on my laptop.
How can I do this?
Do I need to create the tables with all the columns beforehand?
One of the problems is that my CSV files consist of 130 columns.
It is impossible to create all those columns from scratch by hand.
Thank you in advance for your help.
You can simply use the data import function from the front end in SAP HANA Studio for that.
The wizard lets you define the target table structure based on the data found in the CSV files.
Check this out: Import CSV into SAP HANA, express edition using the SAP HANA Tools for Eclipse.

Talend Open Studio Big Data - Iterate and load multiple files in DB

I am new to Talend and need guidance on the below scenario:
We have a set of 10 JSON files with different structures/schemas which need to be loaded into 10 different tables in a Redshift DB.
Is there a way we can write a generic script/job which iterates through each file and loads it into the database?
For e.g.:
File Name: abc_< date >.json
Table Name: t_abc
File Name: xyz< date >.json
Table Name: t_xyz
and so on..
Thanks in advance
With the Talend Enterprise version one can benefit from the dynamic schema feature. However, based on my experience, JSON files are usually somewhat nested structures, so you'd have to figure out how to flatten them; once that's done it becomes a 1:1 load. With Open Studio, though, this will not work due to the missing dynamic schema.
Basically what you could do is write some Java code that transforms your JSON into CSV. Then use either psql from the command line, or, if your Talend contains a new enough PostgreSQL JDBC driver, invoke the client-side \COPY from it to load the data. If your file and the database table column order match, it should work without needing to specify how many columns you have, so it's dynamic, but the data never "flows" through Talend.
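For illustration only, a minimal sketch of that Java JSON-to-CSV step, assuming each input file is a flat (non-nested) JSON array of objects and that a JSON library such as Jackson is on the classpath; the file names are hypothetical:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.File;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class JsonToCsvSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical file names; adapt to your abc_<date>.json files.
        File input = new File("abc_20200301.json");
        File output = new File("abc_20200301.csv");

        ObjectMapper mapper = new ObjectMapper();
        JsonNode records = mapper.readTree(input); // expects a top-level JSON array

        try (PrintWriter out = new PrintWriter(output, "UTF-8")) {
            boolean headerWritten = false;
            for (JsonNode record : records) {
                List<String> names = new ArrayList<>();
                List<String> values = new ArrayList<>();
                Iterator<String> fields = record.fieldNames();
                while (fields.hasNext()) {
                    String field = fields.next();
                    names.add(field);
                    // Nested objects would need flattening first, as noted above;
                    // real CSV output would also need quoting/escaping.
                    values.add(record.get(field).asText());
                }
                if (!headerWritten) {
                    // Column order must match the target table for \COPY to work.
                    out.println(String.join(",", names));
                    headerWritten = true;
                }
                out.println(String.join(",", values));
            }
        }
    }
}
```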
A not-so-elegant but theoretically possible solution: if Redshift supports JSON (Postgres does), one can create a staging table with 2 columns: filename and content. Once the whole content is in this staging table, an INSERT-SELECT SQL statement could be written that transforms the JSON into a tabular format and inserts it into the final table.
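A hedged sketch of filling such a staging table over JDBC; the table and column names (staging_json, filename, content), directory, connection URL and credentials are all placeholders:

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class StagingLoadSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; Redshift speaks the PostgreSQL protocol.
        String url = "jdbc:postgresql://redshift-host:5439/mydb";

        File dir = new File("/data/json"); // placeholder folder with the JSON files
        File[] files = dir.listFiles((d, name) -> name.endsWith(".json"));
        if (files == null) {
            throw new IllegalStateException("Directory not found: " + dir);
        }

        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO staging_json (filename, content) VALUES (?, ?)")) {
            for (File file : files) {
                ps.setString(1, file.getName());
                ps.setString(2, new String(Files.readAllBytes(file.toPath()),
                        StandardCharsets.UTF_8));
                ps.executeUpdate();
            }
            // The JSON-to-tabular INSERT ... SELECT would then run against staging_json.
        }
    }
}
```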
However, with your toolset you probably have no choice but to load these files with 1 job per file, and I'd suggest 1 dedicated job for each file. They would each look for their own files and be triggered/scheduled individually, or be part of a bigger job where you scan the folders and trigger the right job for the right file.

Can (Open Studio) Talend be used to automate a data load from a folder to vertica?

I have been looking for a way to automate my data loads into Vertica instead of manually exporting flat files each time, and stumbled upon the ETL tool Talend.
I have been working with a test folder containing multiple CSV files, and am attempting to find a way to build a job so the files can be put into Vertica.
However, I see that in the Open Studio (free) version, if your files do not have the same schema, this becomes next to impossible without the dynamic schema option, which is in the Enterprise version.
I start with tFileList and attempt to iterate through the files with tFileInputDelimited, but the schemas are not uniform, so of course it stops the processing.
So, long story short, am I correct in assuming that there is no way to automate data loads in the free version of Talend if you have a folder consisting of files with different schemas?
If anyone has any suggestions for other open-source ETLs to look at, or another solution, that would be great.
You can access the CURRENT_FILE variable from a tFileList component and then send a file down a different route depending on the file name. You'd then create a tFileInputDelimited for each file. For example, if you had two files named file1.csv and file2.csv, right-click the tFileList, choose Trigger > Run If, and drag the link to the tFileInputDelimited set up to handle file1.csv. In the Run If condition, type ((String)globalMap.get("tFileList_1_CURRENT_FILE")).toLowerCase().matches("file1.csv"). Do the same for file2.csv, changing the filename in the Run If condition.
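Formatted out, the two Run If conditions would look like this (component and file names taken from the example above):

```java
// Run If condition on the link to the tFileInputDelimited handling file1.csv
((String) globalMap.get("tFileList_1_CURRENT_FILE")).toLowerCase().matches("file1.csv")

// Run If condition on the link to the tFileInputDelimited handling file2.csv
((String) globalMap.get("tFileList_1_CURRENT_FILE")).toLowerCase().matches("file2.csv")

// Note: matches() takes a regular expression, so the "." matches any character;
// an equals("file1.csv") comparison would be stricter.
```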