Output row for each input in a column Pentaho Data Integration Spoon

I am new to Pentaho Data Integration, and I want to take the column data below and output one row for each entry in the date columns. I am not sure how this can be done in Spoon.
I have looked into the Row Denormaliser step, but the examples I have found are not very good. I am also not sure it would give me an output for each column value.
[Image: starting data]
[Image: desired data]
A Pentaho Forum version of this question has the sample files attached.

For this, I found out that you need to use the Row Normaliser step.
[Image: step settings]
[Image: results]
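What the Row Normaliser does here is a wide-to-long pivot: each input row becomes one output row per date column. As a tool-independent illustration (not PDI code), a minimal Python sketch of that transformation follows. The column names id, Date1 and Date2, the output fields type and value, and the function name normalise are all hypothetical, because the original data screenshots are not available.

```python
from typing import Iterable


def normalise(rows: Iterable[dict], key_fields: list[str], value_fields: list[str]) -> list[dict]:
    """Turn each input row into one output row per value column (wide-to-long).

    key_fields are copied onto every output row; each column listed in
    value_fields becomes its own row, with the column name in "type" and
    the cell content in "value".
    """
    out = []
    for row in rows:
        for col in value_fields:
            value = row.get(col)
            if value in (None, ""):
                continue  # this sketch skips empty cells; remove this check to keep them
            new_row = {k: row[k] for k in key_fields}
            new_row["type"] = col
            new_row["value"] = value
            out.append(new_row)
    return out


# Hypothetical input: one row per id, with the dates spread across columns.
rows = [
    {"id": "A", "Date1": "2020-01-01", "Date2": "2020-02-01"},
    {"id": "B", "Date1": "2020-03-01", "Date2": ""},
]
for r in normalise(rows, ["id"], ["Date1", "Date2"]):
    print(r)
```

Whether empty cells should produce output rows depends on the data; the sketch drops them so that every output row carries a real date.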

Related

How to check if the file contains certain values before reading in Talend Studio

Hello, I am a beginner in Talend Studio and a first-time poster. I am using Talend 8.0 and have a text file to ingest into a database. It contains the following:
H2||ID||portfolio||manager||name
D||5||8001-1101||48||John Doe
D||6||8001-1102||50||John Doe
D||7||8002-1101||20||Jane Doe
F3||||||||
The delimiter is a double pipe (||).
ID, portfolio, manager and name, with their associated records, are the data I'd like to ingest. The first column, holding "H2", "D" and "F3", contains the header, detail and footer indicators respectively. These indicators are not supposed to be ingested, but their presence needs to be checked when the file is read into Talend Studio.
I need to check whether all three indicators are present in the file. If any of them is missing, the file should not be ingested and a message should be output. If all of them exist, the data should be ingested, but only for the columns "ID", "portfolio", "manager" and "name".
I tried using the following components:
[Image: Talend components]
This reads the file in its entirety, including the H2 column. I then use a tMap with the filter
row1.Header.contains("D")
which keeps the rows that have the "D" indicator. I would appreciate suggestions for a better way to do this.

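Use "D".equals(row1.Header) || "H2".equals(row1.Header) || "F3".equals(row1.Header) to filter for Header in ("D","H2","F3"). Join the conditions with ||, not &&: each row's Header holds only one indicator, so an && of the three can never be true. equals() also avoids the accidental substring matches that contains() allows.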
If you want the rejected rows, add another output and set its Catch output reject option to true.
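A row filter in tMap only sees one row at a time, so on its own it cannot tell whether all three indicators occur somewhere in the file. The whole-file check described in the question can be sketched as below. This is plain Python rather than a Talend job, the function name read_feed is hypothetical, and how you map the two phases (validate the file, then extract the detail rows) onto Talend components is left open.

```python
REQUIRED_INDICATORS = {"H2", "D", "F3"}  # header, detail, footer


def read_feed(text: str, delimiter: str = "||") -> list[dict]:
    """Validate the indicators, then return only the detail rows as dicts.

    Raises ValueError (i.e. the file is not ingested) if any indicator is missing.
    """
    records = [line.split(delimiter) for line in text.splitlines() if line.strip()]
    present = {rec[0] for rec in records}
    missing = REQUIRED_INDICATORS - present
    if missing:
        raise ValueError(f"File rejected, missing indicator(s): {', '.join(sorted(missing))}")
    # Column names come from the H2 line, minus the indicator column itself.
    header = next(rec for rec in records if rec[0] == "H2")
    columns = header[1:]
    # Keep only detail rows and drop their indicator column.
    return [dict(zip(columns, rec[1:])) for rec in records if rec[0] == "D"]


sample = """H2||ID||portfolio||manager||name
D||5||8001-1101||48||John Doe
D||6||8001-1102||50||John Doe
D||7||8002-1101||20||Jane Doe
F3||||||||"""

for row in read_feed(sample):
    print(row)
```

Checking the whole file before emitting any row is what makes "do not ingest the file at all" possible; a streaming filter would already have passed some rows on before discovering that the footer is missing.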

How to check date format in Azure Data Factory

I am creating a pipeline where the source is CSV files and the sink is SQL Server.
The date column in the CSV files may have values like:
12/31/2020
10162018
20201017
31/12/1982
1982/12/31
I cannot find a function that checks the format of a date. How do I check the format and convert the values above to yyyy-MM-dd format?

The solution was given by HimanshuSinha-msft.
I solved the issue using the expression builder in a Derived Column transformation in Mapping Data Flow:
coalesce(
    toDate(Somedate, 'MM/dd/yyyy'),
    toDate(Somedate, 'yyyy/MM/dd'),
    toDate(Somedate, 'dd/MM/yyyy'),
    toDate(Somedate, 'MMddyyyy'),
    toDate(Somedate, 'yyyyMMdd')
)
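The coalesce pattern works because each toDate call yields null when its format does not fit, so the first format that parses wins. The same first-match logic can be sketched in Python with datetime.strptime. This is an illustration outside ADF: the %-codes are Python's equivalents of the ADF patterns, and first_match is a hypothetical name.

```python
from datetime import datetime
from typing import Optional

# Tried in order; Python equivalents of MM/dd/yyyy, yyyy/MM/dd, dd/MM/yyyy,
# MMddyyyy and yyyyMMdd.
FORMATS = ["%m/%d/%Y", "%Y/%m/%d", "%d/%m/%Y", "%m%d%Y", "%Y%m%d"]


def first_match(value: str, formats: list[str] = FORMATS) -> Optional[str]:
    """Return value as yyyy-MM-dd using the first format that parses it, else None."""
    for fmt in formats:
        try:
            return datetime.strptime(value.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # this format does not fit; try the next one
    return None  # no format fits (the equivalent of coalesce returning null)


for raw in ["12/31/2020", "10162018", "20201017", "31/12/1982", "1982/12/31"]:
    print(raw, "->", first_match(raw))
```

The order of FORMATS matters: an ambiguous value such as 02/01/2020 is read with whichever format is tried first, here MM/dd/yyyy.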

This coalesce answer will not actually solve the problem; it only gets rid of the errors. Plenty of date strings are valid in more than one format. For example, "2/1/2020" is 1 February 2020 as MM/dd/yyyy but 2 January 2020 as dd/MM/yyyy, and coalesce silently takes whichever format is listed first. The errors disappear, but your downstream analyses will be badly wrong.
You need to do an aggregate analysis of which date format best fits the incoming stream, and then route the logic to separate pipeline branches for the respective formats.
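One way to do the aggregate analysis is to score every candidate format against a batch of values (for example, one file) and accept a format only when it fits strictly more values than any other. A minimal Python sketch follows, assuming each batch comes from a source that uses one consistent format; that assumption, the candidate list and the names best_format and score_formats are hypothetical, not part of the answer.

```python
from datetime import datetime
from typing import Optional

# Candidate formats: ADF-style names mapped to Python strptime codes.
CANDIDATES = {
    "MM/dd/yyyy": "%m/%d/%Y",
    "dd/MM/yyyy": "%d/%m/%Y",
    "yyyy/MM/dd": "%Y/%m/%d",
    "MMddyyyy": "%m%d%Y",
    "yyyyMMdd": "%Y%m%d",
}


def parses(value: str, fmt: str) -> bool:
    """True if value is a valid date in the given strptime format."""
    try:
        datetime.strptime(value.strip(), fmt)
        return True
    except ValueError:
        return False


def score_formats(values: list[str]) -> dict[str, int]:
    """Count, for each candidate format, how many values in the batch it can parse."""
    return {name: sum(parses(v, fmt) for v in values) for name, fmt in CANDIDATES.items()}


def best_format(values: list[str]) -> Optional[str]:
    """Return the single best-fitting format, or None if nothing fits or there is a tie."""
    scores = score_formats(values)
    top = max(scores.values())
    winners = [name for name, score in scores.items() if score == top]
    if top == 0 or len(winners) > 1:
        return None  # no fit, or ambiguous: route to a manual-review branch
    return winners[0]


print(best_format(["2/1/2020"]))                # ambiguous on its own
print(best_format(["2/1/2020", "13/01/2020"]))  # 13 cannot be a month
```

A single "2/1/2020" returns None because it fits both MM/dd/yyyy and dd/MM/yyyy; adding "13/01/2020" to the batch tips it to dd/MM/yyyy. In a pipeline, the returned name would select the branch and None would go to review.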

You can configure this in the Mapping tab of your Copy activity. A datetime format can be specified there, but only one format type is supported, so it will not work with a mix of formats like the one in your example.
One option would be to ingest the column into a staging table as nvarchar. Then, in another Copy activity, use a custom SELECT statement to detect the column format and cast the date as needed. You should be able to do this with a CASE expression in your SELECT from the staging table.
FYI: data type mapping
https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-schema-and-type-mapping#data-type-mapping

Dataprep: Invalid array type after running a job to an Excel file

I am trying to use an array-type column in Dataprep, and it looks fine in the Dataprep UI, as in the picture below.
But when I run a job that outputs a .csv file, there are invalid values in the array column.
Why does the .csv output differ from the Dataprep display?
[Image: Array in Dataprep display]
[Image: Array in CSV output]

It looks like these two columns each contain the complete record? I also see some non-English characters in there. I suspect the cause has something to do with line breaks and/or encoding.
What do you see if you open the CSV file in a plaintext editor, instead of Excel?
What edition of Dataprep are you using (click Help => About Dataprep => see the Edition heading)?
What version of Excel are you using to open the CSV file?
Assuming that this is a straightforward flow with a single dataset and recipe, could you post a few rows of data and the recipe itself (which you can download), for testing purposes?

How do I generate a line graph in Power BI for the following data?

My data looks like the image below:
[Image: source data]
I want to create a line graph with the months (January to June) on the x-axis and Tot_QtY on the y-axis.
Since the date information is not in a single column but is spread across multiple columns, how do I make a line chart?
[Image: the output I am getting]
[Image: the output I need]

Your input data is not shaped correctly; you need to model it so that the values end up in one column, with one row per data point. Click Edit Queries, which opens the Query Editor.
On the Transform tab, select your columns and click Unpivot Columns.
I expect you will still need to do a bit more work on your data, because the system cannot know from a column named MayDATE that it refers to May. The best base data you can have is a date column and a value column.
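The unpivot plus the "which month is this column?" step can be sketched outside Power BI in Python. The naming convention (a month name or three-letter abbreviation at the start of the column, as in MayDATE), the Item key column and all function names are hypothetical, since the original data image is not available. Note that a prefix check like this would also misread unrelated columns such as "Marketing", which is why the key column is excluded explicitly.

```python
from typing import Optional

# Hypothetical convention: month columns start with an English month
# abbreviation (or full name), e.g. "JanDATE", "MayDATE", "JuneDATE".
MONTH_ABBR = ["jan", "feb", "mar", "apr", "may", "jun",
              "jul", "aug", "sep", "oct", "nov", "dec"]


def month_from_column(column: str) -> Optional[int]:
    """Return the month number (1-12) encoded at the start of a column name, or None."""
    prefix = column[:3].lower()
    return MONTH_ABBR.index(prefix) + 1 if prefix in MONTH_ABBR else None


def unpivot_months(rows: list[dict], id_field: str) -> list[dict]:
    """Unpivot month columns into (id, month, Tot_QtY) rows, one per cell."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            month = month_from_column(column) if column != id_field else None
            if month is not None:
                long_rows.append({id_field: row[id_field], "month": month, "Tot_QtY": value})
    return long_rows


def monthly_totals(long_rows: list[dict]) -> dict[int, int]:
    """Sum Tot_QtY per month, in month order, ready for a line chart's x-axis."""
    totals: dict[int, int] = {}
    for r in sorted(long_rows, key=lambda r: r["month"]):
        totals[r["month"]] = totals.get(r["month"], 0) + r["Tot_QtY"]
    return totals


wide = [{"Item": "A", "JanDATE": 5, "FebDATE": 7},
        {"Item": "B", "JanDATE": 1, "FebDATE": 2}]
print(monthly_totals(unpivot_months(wide, "Item")))
```

Storing the month as a number rather than as text keeps January to June in calendar order on the axis instead of alphabetical order.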

Arrivals based on database

I am trying to retrieve arrival rates from an Excel spreadsheet in my model, but I don't have the option to select the specific row and column I want. How can I ensure that a specific cell is chosen? (For example, I want the value 5 corresponding to the "limeConveyor" row and the "red" column.)
[Image: sample spreadsheet]
[Image: properties window]
Thank you in advance for your help! :)
Edit 1:
I am currently unable to select "red" from the dropdown list of the value column. Is my program bugged?

To clarify Felipe's reply: I suspect your database table is not set up correctly and doesn't match your .xls screenshot.
First, load your spreadsheet into an AnyLogic database table using the wizard. It should then look like this (note the column properties: the column type is important):
Now, in your source, you can easily load the required column-row combination, and the value dropdown lets you select "red" if you want:
You can find the example model I used for these screenshots here; hope this helps.
