Upload multiple files to Pentaho - pentaho-spoon

In Pentaho Data Integration, how do I import a list of xlsx files that are in the same folder?
Note: the number of columns is always the same.

If your Excel column names and sheet name are always the same, then you can use THIS solution. Here I take every xlsx file from the source folder and convert the files one by one to CSV (a rough sketch follows below).
But if your Excel column names and sheet names are dynamic, or you need a more dynamic solution, then you can use my other Stack Overflow answer from Here.
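For readers who cannot see the linked screenshots, here is a sketch of one way to set this up in Spoon, assuming the stock Microsoft Excel Input and Text file output steps (exact field labels may vary by PDI version, and this is an illustration rather than the exact linked transformation):

    Microsoft Excel Input
        File or directory  : <your source folder>
        Wildcard (RegExp)  : .*\.xlsx
        Spread sheet type  : Excel 2007 XLSX (Apache POI)
    Text file output
        Separator : ,
        Extension : csv

Because the column layout is always the same, a single input step with a wildcard can read every workbook in the folder into one stream.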

Related

Data Factory - Can I use the date field in a CSV to determine the destination folder in Copy Activity

I have some CSV files that I want to copy to a specific folder in ADLS based on the date column within the file.
i.e. CSV file has a column named "date" that reads "2022-02-23" on all rows. I want to copy that file to a folder that has the corresponding year and month, such as "/curated/UK/ProjectABC/2022/02"
I've got a Lookup activity that's pointing to the source CSV file and populating a Set Variable activity with the month using this dynamic content - #substring(string(activity('Lookup1').output.firstrow.date),5,2)
Would this be the right approach, to use a variable?
I can't use variables in the Directory portion of the Sink Dataset, as far as I know.
Have you come across this situation before?
Sounds like you're on the right path. You absolutely can use Dataset parameters:
Then populate them in your pipeline using a variable (or parameter, or expression):
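For example (a hedged sketch; the parameter names below are illustrative, not taken from the original pipeline), if the sink dataset declares parameters Year and Month and uses them in its folder path, the Copy activity can pass them with dynamic content like this:

    Year  : @substring(string(activity('Lookup1').output.firstRow.date), 0, 4)
    Month : @substring(string(activity('Lookup1').output.firstRow.date), 5, 2)

    Directory on the sink dataset (dynamic content):
    @concat('curated/UK/ProjectABC/', dataset().Year, '/', dataset().Month)

Note that ADF exposes the Lookup result as output.firstRow (capital R), and dynamic content expressions start with @.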

How can we exclude unnecessary rows from an Excel file while doing a data load using the Copy activity in ADF

I have an Excel file which is semi-structured. There is data in a table, but there are dividers in certain rows that need to be ignored.
The processing of the data should start with the column headers (Col1, Col2, ...) and only process the rows with actual data.
Could anyone suggest a way to achieve this using the Copy activity in ADF?
My source is an xls file and my target is ADLA (Parquet file).
Any help appreciated. Thanks in advance.
The closest solution is that you need to manually choose the data range in the Excel file:
Ref: https://learn.microsoft.com/en-us/azure/data-factory/format-excel#dataset-properties
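As a hedged illustration based on that doc (the sheet name and cell range below are made up for this example), the Excel dataset's type properties would look roughly like:

    sheetName        : "Sheet1"
    range            : "A3:F500"    (start at the header row so anything above it is skipped)
    firstRowAsHeader : true

The range only trims rows above and below the block you select, so divider rows in the middle of the data would still come through.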
HTH.

Dataprep: Invalid array type after running a job to an Excel file

I am trying to use an array-type column in Dataprep and it looks good in the Dataprep display UI, as in the picture below.
But when I run the job output with a .csv file, there are invalid values in the array column.
Why is the .csv output different from the Dataprep display?
Array in Dataprep display
Array in csv output
It looks like these two columns each contain the complete record...? I also see some non-English characters in there. I suspect something to do with line breaks and/or encoding.
What do you see if you open the CSV file in a plaintext editor, instead of Excel?
What edition of Dataprep are you using (click Help => About Dataprep => see the Edition heading)?
What version of Excel are you using to open the CSV file?
Assuming that this is a straight-forward flow with a single dataset and recipe, could you post a few rows of data and the recipe itself (which you can download), for testing purposes?

Talend Open Studio DI: Replace content of one column of .xlsx file with another column of .csv file

I have two input files:
an .xlsx file that looks like this:
a .csv file that looks like this:
I already have a talend job that transforms the .xlsx file into an .xml file.
One node in the .xml file contains the stock location code:
<stockLocationCode>SL213</stockLocationCode>
The output .xml file looks like this:
Now I need to replace every occurrence of the stockLocationCode with the second column of the .csv file. In this case the result would be:
My talend job looks like this:
I use a tMap component to put the columns of the .xlsx file into the right node of the output xml file.
But I do not know how I can replace the stockLocationCode with the actual full stock location using the .csv file. I tried to also map the .csv file with the tMap component.
I would need to build in a method that looks at the current value of the node <stockLocationCode>, loops over the whole .csv file until it finds it in the first column of the .csv file, and then replaces the <stockLocationCode> content with the content of the second column of the .csv file.
Performance is not important ;)
First, you'll need a lookup in e.g. a tMap or tXMLMap component, where you map your keys and add a new column with the second column of the CSV file.
The resulting columns would look like this:
Product; Stock Location Code; CSV 2nd column data
Now in a second map you could just remove the stock location code and do the rest of your job.
Voila, you exchanged the columns.
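As a rough sketch of that first tMap (the flow and column names below are illustrative, not taken from the original job):

    Main input    : xlsx row   (Product, stockLocationCode, ...)
    Lookup input  : csv row    (code, fullStockLocation)
    Join          : xlsx.stockLocationCode == csv.code
    Lookup model  : Load once, Unique match, Inner join
    Output row    : Product, stockLocationCode, csv.fullStockLocation

Since performance is not a concern, the default lookup settings are fine; the lookup table is simply loaded into memory and matched by key instead of looping over the CSV for every row.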
You can use tXMLMap with a lookup.

How to combine multiple excel data into one excel with all sheets?

Actually I have a list of customers from all the countries in one sheet named "ALL".
Problem: I have to create separate sheets for groups of countries; for example, for USA the sheet name will be USA, and for Australia, Germany and Switzerland the sheet name will be Central_Region. The output will be like the image below.
What I have tried till now: I used the tFilterRow component and I have got all the separate Excel files grouped by country. Now I am trying to combine them into one file.
For example: I have 5 Excel workbook files, each with one sheet; excel1.xls has sheet "USA", excel2.xls has sheet "Canada", and the other 3 are the same way.
Now I want to generate a single Excel workbook which will have all the sheets like "USA", "Canada" and all the other sheets from the other Excel files.
I tried using tUnite but it did not help; it just appended all the sheets' data into one sheet, like the image below.
Download this add-in.
Open one of your Excel files and then open this add-in file (you can also install it).
When you open this file, select Enable Macros.
Go to the DATA tab in Excel and select the RDB Merge add-in.
Set the properties and push the Merge button.
With this, your Excel files will be merged into one sheet.
If you can't know in which order rows will appear, you could store your data in CSV files for each country.
Then you can add each CSV file into a separate sheet in the Excel file using Write after.
If rows are coming in the right order, like all USA then Canada etc., you can directly use Write after in your ExcelOutput behind your tFilter, but I strongly suspect this is not the case.
If you have Excel files with the same structure but different sheet names, then you have to make a job like this:
tFileList---tFileInputExcel---tMap---tFileOutputExcel
Set the source directory where you get all the files in the tFileList component.
Use the global variable which holds the "file with path" information and assign it to the File Name text box of the tFileInputExcel component (see the snippet below).
In the select Sheet box, assign the index instead of the sheet name.
Check the Append property of the tFileOutputExcel component so you can merge all files into a single one.
Note: in tMap you can add transformations or make changes to the column sequence of the output.
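A minimal sketch of the settings mentioned above, assuming the list component is named tFileList_1 and is connected to tFileInputExcel with an Iterate link (the _1 suffix depends on your job, so pick the variable from the Ctrl+Space completion list if in doubt):

    File name/Stream (tFileInputExcel):
        ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH"))

    Sheet list (tFileInputExcel):
        use the sheet index instead of a name, since the sheet names differ between files

With the Append box ticked on tFileOutputExcel, each iteration appends to the same output workbook instead of overwriting it.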