In a data flow task, how do I restrict rows flowing using a value from another source? - merge

I have an excel sheet with many tabs. Say one is called wsMain and the other is called wsDate.
In my data flow transformation I am able to successfully load the data from wsMain to my table.
Now I have to update this transformation so that it fetches the maximum date from the worksheet wsDate and only loads data from wsMain where the date is less than or equal to the maximum date in wsDate (that is the only column available).
So far I have figured out that I need to create a new Excel connection manager to read the data from wsDate, and I have used the Aggregate transformation to get the maximum date.
Now the question is how do I use this date to restrict the rows coming from wsMain?
I understand from the link below that you can store the value in a variable but what do I do next?:
SSIS set result set from data flow to variable
I have tried using a merge join but not sure if I am doing it right.
Here is what it looks like now:

I could not achieve the above but would be interested to know if that is possible. As a workaround I have created a separate data flow where I store the value in a variable, and I then use the variable in the Conditional Split to filter the required rows:
Here is a step by step guide I followed to write the variable:
https://www.proteanit.com/2008/12/11/ssis-writing-to-a-package-variable-in-a-dataflow/

You can obtain the maximum value of the wsDate column first, then use this as a filter to avoid introducing unnecessary records into the data flow which would only be discarded by the Conditional Split. An overview of this process is below. I'd also recommend confirming the data types for all columns involved.
Create an SSIS DateTime variable and name this something descriptive such as MaxDate.
Create a Data Flow Task before the current one with an Excel Source component. Use the SQL command option for the Data Access Mode and enter a SQL statement to return the max value of the wsDate column. In the following example ExcelSource is the name of the sheet that you're pulling from. I'd suggest confirming the query with the Preview button on the Excel Source as well.
Add a Script Component (not Task) after the Excel Source. Add the MaxDate variable in the ReadWriteVariables field on the main page of the Script Component. On the Inputs and Outputs pane add the output column from the Excel Source as an Input Column with the ReadOnly usage type. Example C# code for this is below. Note that variables can only be written to in the PostExecute method. The Input0_ProcessInputRow method is called once for each row that passes through; however, there will only be a single row in this case. In the following code MaxExcelDate is the name of the output column from the Excel Source.
On the Excel Source component in the Data Flow Task where the records are imported from Excel, change the Data Access Mode to SQL command and enter a SQL statement to return records that have a date less than or equal to the maximum wsDate value. This is the last example and the ? is a placeholder for the parameter. After entering this SQL, click the Parameters button and select Parameter0 for the Parameters field, the MaxDate variable for Variables field, and a direction of Input. The Conditional Split can then be removed since these records will now be filtered out.
Excel MAX wsDate SELECT:
SELECT MAX(wsDate) AS MaxExcelDate FROM ExcelSource
C# Script Component:
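// Holds the value read from the single input row until PostExecute runs.
DateTime maxDate;

public override void PostExecute()
{
    base.PostExecute();
    // SSIS variables can only be written to in PostExecute.
    Variables.MaxDate = maxDate;
}

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Called once per row; the MAX query returns exactly one row.
    maxDate = Row.MaxExcelDate;
}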
Excel Command with Date Filter:
SELECT
    Column1,
    Column2,
    Column3
FROM ExcelSheet
WHERE DateColumn <= ?
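The two-step pattern in this answer (read the maximum into a variable, then bind it to the ? placeholder) can be tried outside SSIS. The following is only a rough sketch in Python using an in-memory SQLite database as a stand-in for the Excel sheets; the table contents are invented for illustration and the dates are stored as ISO strings so that they compare correctly as text.

```python
import sqlite3

# Hypothetical stand-ins for the two worksheets; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wsDate (wsDate TEXT)")
conn.execute("CREATE TABLE wsMain (Column1 TEXT, DateColumn TEXT)")
conn.executemany(
    "INSERT INTO wsDate VALUES (?)",
    [("2022-03-31",), ("2022-04-30",)],
)
conn.executemany(
    "INSERT INTO wsMain VALUES (?, ?)",
    [("a", "2022-03-15"), ("b", "2022-04-30"), ("c", "2022-05-01")],
)

# Step 1: the first data flow returns a single MAX row, kept in a variable.
(max_date,) = conn.execute(
    "SELECT MAX(wsDate) AS MaxExcelDate FROM wsDate"
).fetchone()

# Step 2: the main source binds that variable to the ? placeholder,
# so rows after the maximum date never enter the data flow.
rows = conn.execute(
    "SELECT Column1, DateColumn FROM wsMain "
    "WHERE DateColumn <= ? ORDER BY DateColumn",
    (max_date,),
).fetchall()

print(max_date)
print(rows)
```

Filtering at the source like this is what makes the later Conditional Split unnecessary.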

Yes, it is possible. In the data flow, you will need to determine the max date, which you already have. Next, you will need to MERGE JOIN the two data flows on the date column. From there, you will feed it into a CONDITIONAL SPLIT and split where the date columns match [i.e., !ISNULL()] versus do not match [i.e., ISNULL()]. In your case, you only want the matches. The non-matches will be disregarded.
Note: if you use an INNER JOIN on the MERGE JOIN where there is only one date (i.e., MaxDate) to join on, then this will take care of the row filtering for you. You will not need a CONDITIONAL SPLIT.
Welcome to ETL.
Update
It is a real pain that SSIS's MERGE JOINs only perform joins on EQUAL operations as opposed to LESS THAN and GREATER THAN operations. You will need to separate the data flows.
Use a script component to scan the Excel file for the MAX date and assign that value to a package variable in SSIS. Alternatively, you can have a dates table in SQL Server and then use an Execute SQL Task in SSIS to retrieve the MAX date from the table and assign that value to a package variable.
Modify your existing data flow to remove the reading of the Excel date file completely. Then add a DERIVED COLUMN transformation and add a new column that is mapped to the package variable in SSIS that stores the MAX date. You can name the Derived Column Name 'MaxDate'.
Add a conditional split transformation with the following CONDITION logic: [AsOfDt] <= [MaxDate]
Set the Output Name to Insert Records
Note: The CONDITIONAL SPLIT creates a new output data flow with restricted/filtered rows. It does not create a new column within the existing data flow. Think of this as a shift from modifying columns to modifying rows. Only those rows that match the condition will be sent to the output that you desire. I assume you only want to insert these records, so I named it that. You can choose whatever naming convention you prefer.
Note 2: Sorry for not making the Update my original answer - I haven't used the AGGREGATE transformation before so I was not aware that it restricts row output as opposed to reading a value in the data flow and then assigning it to a variable. That would be a terrific transformation for Microsoft to add to SSIS. It appears that the ROWCOUNT and SCRIPT COMPONENT transformations are the only ones that have the ability to set a package variable value within the data flow.
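To make the difference between the two approaches in this answer concrete, here is a small Python sketch (not SSIS code) that mimics an equality-only join against the single max date, and then the Derived Column plus Conditional Split route. The rows, the id field and the dates are all invented for illustration; only the AsOfDt and MaxDate names come from the answer.

```python
from datetime import date

# Hypothetical rows from wsMain; AsOfDt mirrors the column in the split condition.
rows = [
    {"id": 1, "AsOfDt": date(2022, 3, 15)},
    {"id": 2, "AsOfDt": date(2022, 4, 30)},
    {"id": 3, "AsOfDt": date(2022, 5, 1)},
]
max_date = date(2022, 4, 30)  # the value held in the package variable

# Inner join on equality: keeps only rows whose date EQUALS the max date,
# which is why it cannot express "less than or equal to".
equi_joined = [r for r in rows if r["AsOfDt"] == max_date]

# Derived Column: attach MaxDate to every row.
with_max = [{**r, "MaxDate": max_date} for r in rows]

# Conditional Split: route rows by [AsOfDt] <= [MaxDate].
insert_records = [r for r in with_max if r["AsOfDt"] <= r["MaxDate"]]
default_output = [r for r in with_max if not r["AsOfDt"] <= r["MaxDate"]]

print([r["id"] for r in equi_joined])
print([r["id"] for r in insert_records])
print([r["id"] for r in default_output])
```

In this sketch the equality join drops the earlier row that should have been loaded, while the split keeps every row up to and including the max date.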

Related

Relative Date Item in Power Pivot GETPIVOT DATA excel function

I am using a GETPIVOTDATA function in Excel to source data from a pivot table generated by a Power BI query (everything was originally only in Excel, the file got too large, so I stored the main tables in PBI but kept the reports in Excel for management's sake).
=GETPIVOTDATA("[Measures].["&$A$100&"]",'PIVOT Table_test'!$A$126,"[Master].[field1]","[Master].[field1].&["&C$26&"]","[Master].[AsofDate]","[Master].[AsofDate].&[2022-04-30T00:00:00]")
I want to make the GETPIVOTDATA function as dynamic as possible to avoid having too many hardcoded fields/items for each table that feeds the charts we look at. However, when I reference the pivot table, the '[Asof]' field populates the static item as "...&[2022-04-30T00:00:00]")...
I have been trying to change that to reference a header row that contains a Short Date value (4/30/2022) like &["&$B&1"&"]")... but I keep getting #REF! errors. Every other field accepts the "&&" method, and when I leave the hardcoded timestamp in the formula, it populates.
So it has to be that reference, but I do not understand what I am doing wrong. I have also tried changing the format of both the header row in Excel and the field within PBI, but without success.
Found the answer on another site. The solution in the item brackets is to write the following:
["&TEXT($A22,"yyyy-mm-dd""T00:00:00""")&"]
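The reason this works is that the item key must match the timestamp text exactly, so a Short Date such as 4/30/2022 never lines up with it. As a hedged illustration outside Excel, the Python sketch below builds the same member string from a date; the function name is hypothetical, and the field name is copied from the formula in the question.

```python
from datetime import date


def as_of_member(d: date) -> str:
    """Build the item key the pivot expects: ISO date plus a fixed midnight time."""
    return f"[Master].[AsofDate].&[{d.strftime('%Y-%m-%d')}T00:00:00]"


header_date = date(2022, 4, 30)

# A short-date rendering (month/day/year) is what the failing reference produced.
short_date_key = (
    f"[Master].[AsofDate].&[{header_date.month}/{header_date.day}/{header_date.year}]"
)

print(as_of_member(header_date))
print(short_date_key)
```

Only the first string has the same shape as the hardcoded item that worked in the original formula.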

How to add a date range in Azure Data Factory data flow

Working info
I have two different source data sets, so I have created a data flow in Data Factory. For the first data set (A) I apply some transformations and load the result into a sink; for the second data set (B) I similarly apply some transformations and load the result into another sink.
Issue
Now I have a requirement in which a date column DT_COLUMN_A (11-04-2020 01:17:40) in the first data set (A) needs to be compared with a date column DT_COLUMN_B (01-01-2020 16:32:00) in the second data set (B), with the result of the comparison stored as a column in the second data set (B).
So I need the min and max (the date range) of the date column from data set A, compare it with the min and max of the date column in data set B, and store YES if they match and NO if they do not.
Code approach thought
Logic needed:
if (min(DT_COLUMN_A) == min(DT_COLUMN_B) and max(DT_COLUMN_A) == max(DT_COLUMN_B)) then YES else NO.
I am trying to achieve this in ADF data flow but unable to do it.
To get MIN and MAX of a dataset in ADF, you will need the Aggregate transformation. Create new columns called MinA, MinB, MaxA, MaxB from each of the relative streams in your data flow using Aggregate. Set the aggregate function to MIN and MAX appropriately for each. Then, you'll be able to set an iif() expression afterward, or use a Filter or Conditional Split transformation that uses those stored min & max values.
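As a rough sketch of that logic outside ADF (plain Python, not data flow expression syntax), the aggregate step reduces each stream to a min/max pair and the iif() step compares the pairs. The function names and sample timestamps are invented for illustration, and the sketch assumes "matching" means both the minimum and the maximum are equal.

```python
from datetime import datetime


def date_range(values):
    """Aggregate step: reduce a column of timestamps to (min, max)."""
    return min(values), max(values)


def ranges_match(a_dates, b_dates):
    """iif() step: YES when both streams span exactly the same range."""
    return "YES" if date_range(a_dates) == date_range(b_dates) else "NO"


# Hypothetical sample values for DT_COLUMN_A and DT_COLUMN_B.
dt_column_a = [datetime(2020, 1, 1, 16, 32), datetime(2020, 4, 11, 1, 17, 40)]
dt_column_b = [datetime(2020, 4, 11, 1, 17, 40), datetime(2020, 1, 1, 16, 32)]
dt_column_c = [datetime(2020, 1, 1, 16, 32), datetime(2020, 6, 1)]

print(ranges_match(dt_column_a, dt_column_b))
print(ranges_match(dt_column_a, dt_column_c))
```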
I managed to get something similar to work using a mapLoop() expression to first build an array of dates in a derived column transformation, followed by a flatten transformation:
https://stackoverflow.com/a/73453351/12592985

Jaspersoft Studio: Force input parameter of subreport to be entered manually

In my main report I get a (small) list of string values from the database. I then want to use this list for selecting records in a subreport, along with other input parameters:
The user shall be able to select records based on a range of begin and end date: this is easy using an input parameter of type java.util.Date with "Is For Prompting" set to true. Another criterion shall be one or more items from a list showing values from a database field. I could define the list in the report template, but then I'd have hard-coded strings (filled from the database, but at definition time only).
Now the dilemma is: If I define the input parameters in the main report, I cannot get the values for the list beforehand; if I define them in the subreport, I get no prompt at all, so there's no way to set any of them.
So the report requires values for start and end date, and a list of string values to select from (multiple items can be selected). This list shall be built from values from the database. In the subreport all these values shall be joined into a filter for the records. A user shall be able to define the dates and select items from the list manually before executing the report.
Is there a way to achieve this?
After some more hours of trial & error, and some more research, of course, I found that the keyword is "Query-based Input Controls". This documentation describes their creation on the JasperReports Server. Such input controls can be edited in Jaspersoft Studio as well, however, they actually work on the server only. Anyway, this is the solution to my problem.

Perform analysis on last three values of a FileMaker dataset

My end goal is to have a box change color when the last 3 records input into a field (based on the time of input) in FileMaker achieve a certain criteria (ex. variance < 2). I would like to know how to make this happen, or how a calculation/script can be written to only look at the last 3 records.
There are several ways you could approach this. A simple one would be to use a script to:
Show all records in the given table;
Unsort them (assuming they were entered in chronological order; otherwise sort them by creation timestamp);
Omit all records except the last three;
Get the value of a summary field defined as Standard Deviation of your value field;
Set a global variable/field to the square of the returned value.
Then use the global variable/field to conditionally format your "box".
If you don't want to use a script, you will have to define a relationship in order to get the last three values in the table, regardless of the current found set and/or sort order. Or you may use the ExecuteSQL() function for this.
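The scripted approach in this answer boils down to: order by creation time, keep the last three records, take the standard deviation, and square it to get the variance. Here is a minimal Python sketch of that calculation, not FileMaker script syntax. The record layout, the colour names and the threshold default are assumptions, and it uses the sample variance (the population form would give a different number).

```python
from statistics import variance


def last_three_variance(records):
    """records: list of (creation_timestamp, value) pairs in any order."""
    latest = sorted(records, key=lambda r: r[0])[-3:]  # last three by time
    # Sample variance, i.e. the square of the sample standard deviation.
    return variance([value for _, value in latest])


def box_colour(records, threshold=2.0):
    """Hypothetical conditional-formatting rule: green when variance < threshold."""
    return "green" if last_three_variance(records) < threshold else "red"


# Invented data: the early outlier (50.0) is ignored because it is not
# among the three most recent records.
records = [(1, 10.0), (2, 50.0), (5, 13.0), (3, 11.0), (4, 12.0)]

print(last_three_variance(records))
print(box_colour(records))
```

Note that the variance is undefined for fewer than two records, so a real script would need to handle a nearly empty table.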

Joining two datasets to create a single tablix in report builder 3

I am attempting to join two datasets in to one tablix for a report. The second dataset requires a personID from the first dataset as its parameter.
If I preview this report, only the first dataset is shown. What I would like for my final result is that for each student row there is a row grouping (?) of that one student's modules with their month-to-month attendance. Can this be done in Report Builder?
The best practice here is to do the join within one dataset (i.e. joining in SQL).
But in cases where you need data from two separate cubes (SSAS), the only way is the following:
Select the main dataset for the Tablix
Use the lookup function to lookup values from the second dataset like this:
=Lookup(Fields!ProductID.Value, Fields!ID.Value, Fields!Name.Value, "Product")
Note: The granularity of the second dataset must match the first one.
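For readers unfamiliar with Lookup, it behaves roughly like a first-match search by key in the second dataset. The Python sketch below imitates that behaviour; it is only an illustration of the semantics, not SSRS expression code, and the dataset contents are invented.

```python
def lookup(source_value, dataset, key_field, result_field):
    """Return result_field from the first row whose key_field matches, else None."""
    for row in dataset:
        if row[key_field] == source_value:
            return row[result_field]
    return None  # plays the role of Nothing when no row matches


# Hypothetical "Product" dataset (the second dataset in the expression above).
product = [
    {"ID": 1, "Name": "Widget"},
    {"ID": 2, "Name": "Gadget"},
]

# Hypothetical rows of the main dataset bound to the Tablix.
main_rows = [{"ProductID": 2}, {"ProductID": 1}, {"ProductID": 9}]

names = [lookup(r["ProductID"], product, "ID", "Name") for r in main_rows]
print(names)
```

Because only the first match is returned, this mirrors the granularity note: the second dataset should have one row per key.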
We had a similar issue, and it can be resolved this way.
First of all, ensure that the first dataset's query and the second dataset's query work correctly by executing them separately in a database client tool such as Datastudio.
Build two datasets in SSRS with the respective queries and make sure both datasets have the same key column (personID).
In the SSRS report design, create a table from the toolbox and add the required columns from the first dataset along with the matching key column (personID). Add a new column and use the Lookup function to get the required column from the other dataset against the same key column (personID).