How to check if the file contains certain values before reading in Talend Studio - talend

Hello, I'm a beginner in Talend Studio and a first-time poster. I am using Talend 8.0 and have a text file to ingest into a database. The file contains the following:
H2||ID||portfolio||manager||name
D||5||8001-1101||48||John Doe
D||6||8001-1102||50||John Doe
D||7||8002-1101||20||Jane Doe
F3||||||||
where the delimiter is a double pipe (||).
ID, portfolio, manager and name, and their associated records, are the data I'd like to ingest. The values "H2", "D" and "F3" in the first column are the header, detail and footer indicators respectively. These indicators are not supposed to be ingested, but their presence needs to be checked when the file is read into Talend Studio.
I need to check that all three indicators are present in the file. If any of them is missing, the file should not be ingested and a message should be output. If all the indicators exist, the data is ingested, but only for the columns "ID", "portfolio", "manager" and "name".
I tried using the following components:
These read the table in its entirety, including the H2 column. I then use a tMap with the filter
row1.Header.contains("D")
which keeps the rows that have the "D" indicator. I'd appreciate it if there is a better way to do this.

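Use row1.Header.contains("D") || row1.Header.contains("H2") || row1.Header.contains("F3") to keep the rows whose header is in ("D","H2","F3"). Note the || (OR): with &&, a single value would have to contain all three indicators at once, so no row would pass.
If you want the rejected rows, add another output to the tMap and set its "Catch output reject" option to true.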
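The row filter above only selects rows; it does not by itself confirm that all three indicators are present in the file. A minimal plain-Java sketch of that check follows, of the kind that could be adapted into a custom routine. The class and method names (IndicatorCheck, hasAllIndicators, detailRows) are hypothetical and not part of any Talend API, and the sketch assumes the whole file has already been read into a list of lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper (not a Talend API): validates the indicators,
// then returns only the detail data.
class IndicatorCheck {
    // String.split takes a regex, and '|' is a regex metacharacter,
    // so the "||" delimiter has to be quoted.
    private static final String DELIMITER = Pattern.quote("||");
    private static final Set<String> REQUIRED =
            new HashSet<>(Arrays.asList("H2", "D", "F3"));

    // True only if each required indicator appears in the first column of some line.
    static boolean hasAllIndicators(List<String> lines) {
        Set<String> seen = new HashSet<>();
        for (String line : lines) {
            // Limit -1 keeps trailing empty fields, such as those on the footer line.
            seen.add(line.split(DELIMITER, -1)[0]);
        }
        return seen.containsAll(REQUIRED);
    }

    // Returns the "D" rows without the indicator column; rejects the file otherwise.
    static List<String[]> detailRows(List<String> lines) {
        if (!hasAllIndicators(lines)) {
            throw new IllegalArgumentException(
                    "File rejected: H2, D or F3 indicator is missing");
        }
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split(DELIMITER, -1);
            if (fields[0].equals("D")) {
                rows.add(Arrays.copyOfRange(fields, 1, fields.length));
            }
        }
        return rows;
    }
}
```

The sketch compares the indicator with equals rather than contains, so a value such as "AD" would not be mistaken for a detail row.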

Related

ADF Add Header to CSV Sink

Does anyone know how to add a header to a CSV sink? I have a data flow whose source is a database table. I then used a derived column to concatenate the columns into one column, with the data separated by commas (done in the source via a query). I have then selected the concatenated column to be exported to CSV.
Data example:
Matt,Smith,10
Therefore I technically only have one column, however, I want to add a header for each section of the data.
Desired output:
FirstName,LastName,Age
Matt,Smith,10
You can add headers to the CSV file.
Select the Data Flow activity.
Select the Source and use a Select transformation.
Add the column names as shown in the screenshot below.
Finally, add a Sink and run the pipeline.

Dataprep : Invalid array type after run job to excel file

I am trying to use an array-type column in Dataprep, and it looks good in the Dataprep display UI, as in the picture below.
But when I run a job with .csv output, there are invalid values in the array column.
Why is the .csv output different from the Dataprep display?
Array in Dataprep display
Array in csv output
It looks like these two columns each contain the complete record...? I also see some non-English characters in there. I suspect something to do with line breaks and/or encoding.
What do you see if you open the CSV file in a plaintext editor, instead of Excel?
What edition of Dataprep are you using (click Help => About Dataprep => see the Edition heading)?
What version of Excel are you using to open the CSV file?
Assuming that this is a straight-forward flow with a single dataset and recipe, could you post a few rows of data and the recipe itself (which you can download), for testing purposes?

Keep leading zeros when joining data sources in tableau

I am trying to create a data source in Tableau (10.0) where I am joining a table from SQL with an Excel file. The join happens on a site id, but when reading the id from the Excel source, Tableau strips the leading zeros (while SQL keeps them). I have seen an example that adds the leading zeros back as a new calculated field, but the join is still dropping rows because the id is not properly formatted when the join is made.
How do I get the Excel data source to read the column with the leading zeros so I can do the join?
Launch Excel and choose to open a new blank workbook.
Click the Data tab and select From Text.
Browse to the saved CSV file and select Import.
Ensure that Delimited is selected and click Next.
Leave Tab as the delimiter and click Next.
Select the column containing the data with leading zeros and click Text.
Repeat for each column which contains leading zeros.
Click Finish.
Click OK.
I have never heard of or used Tableau, but it sounds as though something (perhaps the Jet/ACE database driver being used to read the Excel file?) is determining the column to be numeric and parsing the data as numbers, losing the leading zeros.
If your attempts at putting them back are giving you grief, I'd recommend trying the other direction instead: get SQL Server to convert its strings to numbers. Number matching should be more reliable than string matching, so long as the two systems don't handle rounding differently :)
If your Excel file was read in from a CSV and the Site ID is showing "Number Stored as Text", I think you can solve your problem by telling Tableau on the Data Source entry that the field is actually a string. On the preview data source view, change the "#" (designating number) to string so that both the SQL source and the Excel source are both strings before doing the join.
This typically has to do with the way Excel stores values, as mentioned above. I would play around with the number formatting for the Site ID column in Excel itself, not Tableau, and change it to "Text" in Excel. You can verify whether Tableau will read it properly with the leading 0s by exporting your Excel file to CSV and checking in the CSV file that the leading 0s are still there.
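Whichever side is changed, the join only matches when both sources produce the same key text. As a language-neutral illustration, here is a small Java sketch of normalizing an id to a zero-padded string before comparing. The class name SiteIdKey and the width of 5 are assumptions for illustration only; the real width depends on how the site ids are stored in SQL.

```java
// Hypothetical helper: turns a site id into a fixed-width, zero-padded join key.
class SiteIdKey {
    // Left-pads the id with zeros up to the given width.
    // Ids that are already at least that long are returned unchanged.
    static String pad(String rawId, int width) {
        String trimmed = rawId.trim();
        StringBuilder key = new StringBuilder();
        for (int i = trimmed.length(); i < width; i++) {
            key.append('0');
        }
        return key.append(trimmed).toString();
    }
}
```

For example, SiteIdKey.pad("42", 5) and SiteIdKey.pad("00042", 5) both give the same key, so an id that lost its zeros in Excel would match the one kept in SQL.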

Space in column name Talend

I want to make a csv file that I can upload in my Google Calendar.
The mandatory headers for a file to upload are
Subject, Start date, Start time
But in Talend you can't make a column name with a space between the words, anybody know how I can fix this?
You could generate the first line with a tFixedFlowInput, and then write the rest of your CSV file without column titles by unchecking the "Include Header" parameter in your output component.
Don't forget to check the "Append" parameter when you insert your data afterwards.
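The same two-step idea (write the header line first, then append the data without a header) can be sketched in plain Java. This is only an illustration of the approach, not what Talend generates: the class name CalendarCsv and the target path are hypothetical, and the data lines are assumed to be comma-separated already.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: writes a header containing spaces, then appends the data rows.
class CalendarCsv {
    // Header taken from the question; the spaces are fine in a plain text line.
    static final String HEADER = "Subject,Start date,Start time";

    static void write(Path target, List<String> dataLines) throws IOException {
        // Step 1: create (or overwrite) the file with only the header line.
        Files.write(target, Collections.singletonList(HEADER), StandardCharsets.UTF_8);
        // Step 2: append the data rows, without writing any header again.
        Files.write(target, dataLines, StandardCharsets.UTF_8, StandardOpenOption.APPEND);
    }
}
```

Without the append option in step 2, the second write would replace the header instead of following it, which is why the "Append" parameter matters in the Talend job as well.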

How to combine multiple excel data into one excel with all sheets?

I have a list of customers from all countries in one sheet named "ALL".
Problem: I have to create separate sheets for groups of countries. For example, for the USA the sheet name will be USA, and for Australia, Germany and Switzerland the sheet name will be Central_Region. The output should look like the image below.
What I have tried so far: I used the tFilterRow component and got separate Excel files grouped by country. Now I am trying to combine them into one file.
For example: I have 5 Excel workbook files, each with one sheet: excel1.xls has the sheet "USA", excel2.xls has the sheet "Canada", and the other 3 follow the same pattern.
Now I want to generate a single Excel workbook that contains all the sheets: "USA", "Canada" and the sheets from the other files.
I tried using tUnite, but it did not help: it just appends the data from all the sheets into one sheet, as in the image below.
Download this add-in.
Open one of your Excel files and then open the add-in file (you can also install it).
When you open this file, select Enable Macro.
Go to the DATA tab in the Excel file and select the RDB Merge add-in.
Set the properties and push the Merge button.
With this, your Excel files will be merged into one sheet.
If you can't know in which order the rows will appear, you could store your data in a CSV file for each country.
Then you can add each CSV file to a separate sheet of the Excel file using Write after.
If the rows arrive in the right order, with all USA rows first, then Canada, and so on, you can use Write after directly in your ExcelOutput behind your tFilter, but I strongly suspect this is not the case.
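When the rows arrive in no particular order, the underlying step is to bucket each row by its target sheet before anything is written. A minimal Java sketch of that grouping follows. The class name SheetGrouping and the {name, country} row layout are assumptions for illustration; the country-to-sheet mapping uses only the examples given in the question, and any other country is assumed to get a sheet of its own.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: groups customer rows by the sheet they belong to.
class SheetGrouping {
    // Maps a country to its sheet name, using the examples from the question.
    static String sheetFor(String country) {
        switch (country) {
            case "Australia":
            case "Germany":
            case "Switzerland":
                return "Central_Region";
            default:
                // Assumption: every other country gets a sheet named after itself.
                return country;
        }
    }

    // Each customer row is assumed to be {name, country}.
    // LinkedHashMap keeps the sheets in the order they are first seen.
    static Map<String, List<String[]>> groupBySheet(List<String[]> customers) {
        Map<String, List<String[]>> sheets = new LinkedHashMap<>();
        for (String[] customer : customers) {
            sheets.computeIfAbsent(sheetFor(customer[1]), k -> new ArrayList<>())
                  .add(customer);
        }
        return sheets;
    }
}
```

Each entry of the resulting map then corresponds to one sheet, which can be written in a single pass regardless of the original row order.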
If you have Excel files with the same structure but different sheet names, you have to build a job like this.
tFileList---tFileInputExcel-----tMap---tFileOutputExcel
Set the source directory that contains all the files in the tFileList component.
Use the global variable that holds the "file with path" information and assign it to the file name text box of the tFileInputExcel component.
In the select Sheet box, assign an index instead of a sheet name.
If you check the Append property of the tFileOutputExcel component, you can merge all the files into a single one.
Note: in the tMap you can add transformations or change the column sequence of the output.