Filter Copy-Data source using variable - azure-data-factory

Scenario: I have multiple views on an Azure SQL Database as the source for a Copy Data pipeline. The views contain data for multiple customers, so I need the pipeline to filter by a customer ID.
I can do this with the source query by hard-coding the customer ID, but I'd like to make it more generic: set a variable once and use it to filter all the views. At first glance it seems like it should be pretty straightforward.
Setting the variable is not a problem but I can't figure out the syntax to use in the Query. Or is there another mechanism I can use?
The basic pipeline (links as I can't embed yet):
Basic Pipeline
Filtering using this: Query
Update:
Went with a solution very similar to Jay Gong's below. I didn't use @concat; instead I assigned the pipeline parameter to a variable in the SQL code and used it in the WHERE clause. I'll look into @concat as I suspect it's slightly more efficient.
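A rough T-SQL sketch of that pattern, with hypothetical view and column names (the literal 42 stands in for the value the pipeline actually supplies):

-- Hypothetical source query: the pipeline supplies the customer ID value
DECLARE @CustomerId int = 42;   -- injected from the pipeline parameter/variable
SELECT *
FROM dbo.CustomerSalesView
WHERE CustomerId = @CustomerId;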

You could consider passing parameters into ADF to complete the query SQL in the source query box. The SQL could be dynamic content built with the @concat built-in function.
For example:
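A minimal sketch of such a dynamic-content expression, assuming a string pipeline parameter named CustomerId and a hypothetical view name:

@concat('SELECT * FROM dbo.CustomerSalesView WHERE CustomerId = ''', pipeline().parameters.CustomerId, '''')

The expression builds the full query text at run time, so only the parameter value changes between runs.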

Related

Azure Data Flow generic curation framework

I want to create a data curation framework using Data Flow, built on generic data flow pipelines.
I have multiple raw data feeds (between 10 and 100 raw tables) to validate and write to the sink as curated tables:
For each raw data feed, I need to validate the expected schema (based on a parameterized file name).
For each raw data feed, I need to provide the Data Flow Script with validation logic (some columns should not be null, some columns should have specific data types and value ranges, etc.).
Using the Python SDK, create the Data Factory and mapping data flow pipelines from the Data Flow Script prepared with the provided parameters (for schema validation).
Trigger the Python code that creates the pipelines for each feed, runs the validation, writes the issues to a Log Analytics workspace, and tears down the resources on specific schedules.
Has anyone done something like this? What is the best approach for the above please?
My overall goal is to reduce the time to validate/curate the data feeds, so I want to prepare the validation logic quickly for each feed and create Python classes or PowerShell scripts scheduled to run against the generic data pipelines at specific times of day.
many thanks
CK
To validate the schema, you can keep a reference dataset that has the same schema (first row) as your main dataset. Then use a “Get Metadata” activity for each dataset to get its structure.
You can then use an “If Condition” activity to compare the structures of the two datasets with the equals logical function. The expression will look something like this:
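A sketch of that expression, assuming the two Get Metadata activities are named Get Metadata1 and Get Metadata2 and both request the Structure field:

@equals(activity('Get Metadata1').output.structure, activity('Get Metadata2').output.structure)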
If the structures of both datasets match, your next activity (such as copying the dataset to another container) runs.
The script you want to run on the ingested dataset can be executed with a “Custom” activity. You again need to create a linked service and its corresponding dataset for the script that validates the raw data. Please refer to: https://learn.microsoft.com/en-us/azure/batch/tutorial-run-python-batch-azure-data-factory
Scheduling the pipeline is handled by triggers in Azure Data Factory. A schedule trigger covers the requirement to run your pipeline automatically at a specific time.

Oracle SQL Developer -- Is there a way to reload past filter parameters that I have specified

When examining a table, there is a filter field. Sometimes I put some lengthy filter parameters in there. Is it possible to see past parameters I have specified, and load them into the Filter field?
No, I don't think so. SQL Developer's filter offers just a single line for a filter.
Just as an illustration, TOAD lets you do that because its filter looks more like an "Editor" window, so you can put several filters in there and (un)comment them as you want.

How to implement conditional branches in Azure Data Factory pipelines

I am implementing a pipeline to insert data updates from CSV files into a SQL DB. The plan is to first insert the data into a temporary SQL table for validation and transformation, and then move the processed data to the actual SQL table. I would like to branch the pipeline execution depending on the validation result. If the data is OK, it will be inserted into the target SQL table. If there are fatal failures, the insert activity should be skipped.
I've tried to find instructions/guidance but with no success so far. Any ideas whether a pipeline activity supports conditional execution, e.g. based on some properties of the input dataset?
It is possible now with Azure Data Factory version 2.
After an activity executes, downstream activities can depend on four possible outcomes as standard:
- On success
- On failure
- On completion
- On skip
Also, custom If Condition activities are available for branching based on expressions.
Refer to the links below for more detail:
https://www.purplefrogsystems.com/paul/2017/09/whats-new-in-azure-data-factory-version-2-adfv2/
https://learn.microsoft.com/en-us/azure/data-factory/tutorial-control-flow
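As an illustration of such a branch (the activity and field names here are hypothetical), an If Condition activity's expression could check a validation result produced by a preceding Lookup activity configured for first row only:

@equals(activity('Lookup Validation Result').output.firstRow.ErrorCount, 0)

The true branch would then run the copy into the target table, while the false branch skips it or runs error handling.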
The short answer is no.
I think it's worth pointing out that ADF is just an orchestration tool for invoking other services. The current version can't do what you want because it does not have any compute of its own; it's not an SSIS data flow engine.
If you want this behaviour, you'll need to code it into the SQL DB stored procedures, with flags etc. on the processed datasets.
Then maybe have some boilerplate code with parameters passed from ADF to perform either the insert, the update, or the divert operation.
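A rough T-SQL sketch of that idea (table, column, and procedure names are hypothetical):

-- Hypothetical procedure called from ADF's stored procedure activity
CREATE PROCEDURE dbo.usp_ProcessStagedData
    @BatchId int
AS
BEGIN
    -- Flag staged rows that fail validation
    UPDATE dbo.StagingTable
    SET ValidationFailed = 1
    WHERE BatchId = @BatchId
      AND RequiredColumn IS NULL;

    -- Load only the rows that passed validation into the target table
    INSERT INTO dbo.TargetTable (Col1, Col2)
    SELECT Col1, Col2
    FROM dbo.StagingTable
    WHERE BatchId = @BatchId
      AND ValidationFailed = 0;
END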
Handy link for calling a stored procedure with parameters from ADF: https://learn.microsoft.com/en-us/azure/data-factory/data-factory-stored-proc-activity
Hope this helps.

Is it possible to prevent the SQL Producer from overwriting just one of the table's columns?

Scenario: A computed property needs to be available to RAW methods. The IsComputed property set in the model will not work, as its value will not be available to RAW methods.
Attempted Solution: Create a computed column directly on the SQL table, as opposed to setting the IsComputed property in the model, and specify that CodeFluent Entities should not overwrite the computed column. I would then expect the BOM to read the computed SQL field no differently than if it were a normal database field.
Problem: I can't figure out how to prevent CodeFluent Entities from overwriting the computed column. I attempted to use the production flags as well as setting produce="false" for the property in the .cfp. Neither worked.
Question: Is it possible to prevent CodeFluent Entities from overwriting my computed column and, if so, how?
The solution you're looking for is here.
You can execute whatever custom T-SQL scripts you like; the only requirement is to give the script a specific name so the Producer knows when to execute it.
For example, if you want your custom script to execute after the tables are generated, name your script
after_[ProjectName]_tables.
Save your custom T-SQL file alongside the CodeFluent-generated files and build the project.
In my specific case, I had to enable a full-text index on one of my table columns. I wrote the SQL script for that functionality and saved it as
`after_[ProjectName]_relations_add`
Here's how they look in my file directory:
file directory
Alternate Solution: An alternate solution is to execute the following T-SQL script after the SQL Producer finishes generating.
-- Drop the produced column and re-create it as a computed column
ALTER TABLE PunchCard DROP COLUMN PunchCard_CompanyCodeCalculated
GO
ALTER TABLE PunchCard
ADD PunchCard_CompanyCodeCalculated AS CASE
    WHEN PunchCard_CompanyCodeAdjusted IS NOT NULL THEN PunchCard_CompanyCodeAdjusted
    ELSE PunchCard_CompanyCode
END
GO
Additional Configuration Needed to Make the Solution Work: For this solution to work, you must also configure the BOM so that it does not attempt to save the data associated with the computed columns. This can be done in the Model using the advanced properties. In my case, I selected the CompanyCodeCalculated property, went to its advanced settings, and set the Save setting to False.
Question: Somewhere in the Knowledge Center there is a passing reference on how to automate the execution of SQL scripts after the SQL Producer finishes, but I cannot find it. Does anybody know how this is done?
Post Usage Comments: Just wanted to let people know I implemented this approach and am so far happy with the results.

iReport: Setting Parameter Values from Query MongoDB

I am fairly new to JasperReports and iReport and am struggling with something that seems like it should be basic.
If you use MongoDB then you know it does not support the concept of a JOIN. Therefore, from the iReport main dataset query I want to set a parameter/variable from the results. Then I want to use the collection values I just set as a query parameter/variable in a different dataset (NOT a table or a list, just a plain old simple dataset I create, which will also query MongoDB as its source).
It seems this would be a straightforward use case, but I don't see anything intuitive in iReport that would do this. Can this be done? If so, any clues you can give me would be wonderful and greatly appreciated.
Do you want to pass the values as a collection from one report to the other?
This can be done by writing the following in your filter expression: $P{parameter_name}.contains($F{field_name}). Additionally, you need to create a parameter with the same parameter_name and the class type java.util.Collection.
The report is then ready to receive parameters as collections. This works for MongoDB; I have tried it out. Since you have already been able to send the collection from the main report, the method above will work for receiving the parameter in the second report.