I am very new to SSIS. I was trying the Merge transformation with sorted inputs. The output was sorted for Id 1, 2 and 3, but from Id = 4 onwards it was not sorted, even though I applied a Sort to both source tables.
Why does this happen?
I have attached an image of the output table and the two source tables.
Thanks in advance.
Is there any way to do this? Basically, I am trying to do the equivalent of a SQL UPDATE ... SET when a matching record for one or more key fields exists in another dataset.
I tried using Joins and Merge. Joins seem to take more steps, and Merge appends records instead of updating the corresponding rows.
I have a table like this:
I need the "last" column (a value from InfluxDB) to be the first column.
This is InfluxDB version 1.7.
I have a lot of queries (A, B, C, D):
So I can't use the Organize fields transformation:
But if I apply a join transformation first (regardless of the field), my table looks like this:
Use Grafana's Organize fields transformation and drag and drop the fields into the desired order. Example:
If the query consists of several series (A, B, C, D), you will probably need to apply the Merge transformation before Organize fields:
I am trying to do upsert and delete in a mapping data flow.
There is a dimension table, DimCustomer.
It is being populated with data from a file.
If a Sha2 hash does not match then upsert.
If CustomerID is missing from the rawSource data, then delete (see image below for settings).
The upsert works, but the delete does not. It's likely because I have selected the CustomerID column as the key in the sink, which means it can never delete a record when the entire record, including the key, is missing from the source.
Is there a prescribed design pattern for this scenario?
The easiest solution I can think of is a second data flow, in which the only CustomerIDs sent to the sink are those with no matching CustomerID in the source (effectively a right outer join), but I want to see if this is indeed the best way to do it.
The best solution I can come up with is to add an additional column to the above data flow, with the formula coalesce(RawCustomerData#CustomerID,DimCustomer#CustomerID)
This ensures there is a CustID column that always has a value.
In the sink, I change the mapping so that this CustID maps to the sink CustomerID.
The delete now works as expected. Still unsure if this is the best solution but it works and doesn't appear to cause a major performance issue.
In my experience, that is the best solution: adding a new column solves the problem far more easily than the alternatives. The simplest approach that works is the best one. You don't need to create another data flow activity or redesign the Alter Row logic.
Your Solution:
Add an additional column CustID, the formula for which is: coalesce(RawCustomerData#CustomerID,DimCustomer#CustomerID)
This ensures there is a CustID column that always has a value. In the sink, you change the mapping so that this custID maps to the sink CustomerID.
The delete now works as expected.
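As a rough illustration of why the coalesced key matters, here is a minimal Python sketch of the same idea outside the data flow. The function names (`coalesce`, `plan_changes`) and the dictionaries of CustomerID to hash are hypothetical stand-ins, not part of the mapping data flow itself. It assumes a full outer join on CustomerID between the raw source and DimCustomer.

```python
def coalesce(*values):
    """Return the first value that is not None, mirroring coalesce() in the formula."""
    for value in values:
        if value is not None:
            return value
    return None


def plan_changes(raw_rows, dim_rows):
    """Full outer join on CustomerID, then pick an action per joined row.

    raw_rows and dim_rows are hypothetical dicts mapping CustomerID -> hash.
    """
    actions = []
    for key in sorted(set(raw_rows) | set(dim_rows)):
        # After a full outer join, one side's key is null when there is no match.
        raw_id = key if key in raw_rows else None
        dim_id = key if key in dim_rows else None

        # The extra column: always populated, so the sink has a key to act on.
        cust_id = coalesce(raw_id, dim_id)

        if raw_id is None:
            # Missing from the source. Using raw_id as the key here would be None.
            actions.append((cust_id, "delete"))
        elif dim_id is None or raw_rows[key] != dim_rows[key]:
            # New customer, or the hash does not match.
            actions.append((cust_id, "upsert"))
    return actions


raw = {1: "a", 2: "b2", 4: "d"}
dim = {1: "a", 2: "b", 3: "c"}
planned = plan_changes(raw, dim)
```

Customer 3 exists only in the dimension, so its source-side key is null; the coalesced CustID is what lets the delete find the row.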
I am asking for help on the following topic. I am trying to create an ETL process with two Excel data sources (S1 ~300 rows and S2 ~7000 rows). S1 contains project information and employee details and S2 contains the amount of hours, which each employee worked in which project at a timestamp.
I want to insert the number of hours that each employee worked on each project at a timestamp into the fact table, referencing the existing primary keys in the dimension tables. If an entry is not already present in the dimension tables, I want to add a new entry first and use the newly generated id. The destination table structure looks as follows (Data Warehouse, Star Schema):
Destination Table Structure
In SSIS, I first created three Data Flow tasks to fill the dimension tables (project, employee and time) with distinct values (using group by, as S1 and S2 contain a lot of duplicate rows), and then a fourth Data Flow task (see image below) to insert the fact table data. This is where I'm running into problems:
Data Flow Task FactTable
I am using three Lookup transformations to retrieve the foreign keys project_id, employee_id and time_id from the dimension tables (using project name, employee number and timestamp). If the id is found, it is passed on all the way to Merge Join 1; if not, a new dimension entry is created (let's say for project) and the generated project_id is passed on instead. The same goes for employee and time respectively.
There are two issues with this:
1) The "amount of hours" (passed by Multicast four, see image above) is not matched in the final result (No Match)
2) The number of rows being inserted keeps increasing forever (Endless Join, I believe due to the Merge Joins).
What I've tried:
I previously used one UNION instead of three Merge Joins, but this resulted in the foreign keys each ending up in separate rows instead of being merged together.
I used Merge (instead of Merge Join) and combined the join and sort conditions in what I feel are all possible ways.
I understand that this scenario might be confusing for everybody else, but thank you for taking the time to look at it! Any help is greatly appreciated.
Solved it
For anybody having similar issues:
Separating the Data Flows that fill the dimension tables from those that fill the fact table will do the trick.
It's a clean solution and easier to debug.
Also: don't run the Lookup transformations in parallel, but rather one after another, passing the attributes along. This saves unnecessary Merges as well.
To sum up:
Four Data Flow Tasks, three for filling dimension tables ONLY and one for filling fact tables ONLY.
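The shape of that solution can be sketched in plain Python. This is only an illustration of the ordering (dimensions first, then one chain of lookups per fact row), not SSIS code; the function names, the in-memory dictionaries standing in for dimension tables, and the sample rows are all hypothetical.

```python
def load_dimension(dim, natural_keys):
    """Fill a dimension: give each new distinct natural key the next surrogate id."""
    for key in natural_keys:
        if key not in dim:
            dim[key] = len(dim) + 1
    return dim


def load_facts(rows, dim_project, dim_employee, dim_time):
    """Fill the fact table: run the three lookups one after another on each row."""
    facts = []
    for row in rows:
        # Each lookup adds one foreign key to the same row, so no merge is needed.
        project_id = dim_project[row["project"]]
        employee_id = dim_employee[row["employee"]]
        time_id = dim_time[row["timestamp"]]
        facts.append((project_id, employee_id, time_id, row["hours"]))
    return facts


# Hypothetical sample of the hours data (S2).
rows = [
    {"project": "P1", "employee": "E1", "timestamp": "2020-01-01", "hours": 4},
    {"project": "P1", "employee": "E2", "timestamp": "2020-01-01", "hours": 3},
    {"project": "P2", "employee": "E1", "timestamp": "2020-01-02", "hours": 5},
]

# Steps 1 to 3: dimension tables ONLY.
dim_project = load_dimension({}, (r["project"] for r in rows))
dim_employee = load_dimension({}, (r["employee"] for r in rows))
dim_time = load_dimension({}, (r["timestamp"] for r in rows))

# Step 4: fact table ONLY. Every key already exists, so each lookup matches.
facts = load_facts(rows, dim_project, dim_employee, dim_time)
```

Because the dimensions are complete before the fact load starts, the fact load produces exactly one output row per input row and the hours value stays attached to its keys.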
Loading Multiple Tables using SSIS keeping foreign key relationships
The answer posted by onupdatecascade is basically it.
Good luck!
I have some data which I need to pivot in Talend. This is a sample:
Now I need this data to be pivoted on the metric column like this:
Currently I am using tPivotToColumnsDelimited to pivot the data to a file and then reading back from that file. However, having to store the data in an external file and read it back is messy and adds unnecessary overhead.
Is there a way to do this with Talend without writing to an external file? I tried to use tDenormalize but as far as I understand, it will return the rows as 1 column which is not what I need. I also looked for some 3rd party component in TalendExchange but couldn't find anything useful.
Thank you for your help.
Assuming that your metrics are fixed, you can use their names as the columns of the output. The pivot has two parts: first, a tMap that transposes the value of each input row (in) into the corresponding column of the output row (out), and second, a tAggregate that groups the map's output rows by brandname.
In the tMap you have to fill the columns conditionally like this (example for an output column named "abc"): = "abc".equals(in.metric)?in.value:null
In the tAggregate you'd have to group by out.brandname and aggregate each column as sum ignoring nulls.
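The two steps can be sketched in Python to show what each one contributes. This is not Talend code: `map_row` plays the role of the tMap and `aggregate` the role of the tAggregate. The metric name "xyz" and the sample rows are made up for illustration, and leaving an all-null column as None is an assumption of this sketch.

```python
METRICS = ["abc", "xyz"]  # fixed metric names; "xyz" is a hypothetical second metric


def map_row(row):
    """tMap step: put the value into the column named after the row's metric."""
    out = {"brandname": row["brandname"]}
    for metric in METRICS:
        # Same idea as: "abc".equals(in.metric) ? in.value : null
        out[metric] = row["value"] if metric == row["metric"] else None
    return out


def aggregate(mapped_rows):
    """tAggregate step: group by brandname and sum each column, ignoring nulls."""
    grouped = {}
    for out in mapped_rows:
        acc = grouped.setdefault(out["brandname"], {m: None for m in METRICS})
        for metric in METRICS:
            if out[metric] is not None:
                acc[metric] = (acc[metric] or 0) + out[metric]
    return grouped


rows = [
    {"brandname": "A", "metric": "abc", "value": 1},
    {"brandname": "A", "metric": "xyz", "value": 2},
    {"brandname": "B", "metric": "abc", "value": 3},
]
pivoted = aggregate(map_row(r) for r in rows)
```

After the map step each row has exactly one non-null metric column, so summing per brandname collapses them into a single row per brand without writing anything to a file.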