I have a sink table that I would like to populate with the ActivityRunID of the Copy Data iteration within the Until loop
I understand that I cannot map the ActivityRunID within the Copy Data task until that task has completed.
Is there an easy way to populate my sink with the RunID once the Copy Data task has finished? I was thinking of populating the sink with a dummy GUID and then using a Lookup task to fill in the real value in a subsequent task.
If you just want to pull your dummy GUID, a Lookup activity will do that.
Alternatively, you can generate a GUID using Data Factory dynamic expressions, store it in a variable with a Set Variable activity, and use that variable directly in later activities.
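For example, a minimal Set Variable sketch (the variable name NewGuid is an assumption, not from the original pipeline):

{
    "name": "Set dummy GUID",
    "type": "SetVariable",
    "typeProperties": {
        "variableName": "NewGuid",
        "value": {
            "value": "@guid()",
            "type": "Expression"
        }
    }
}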
If you want to use the Copy data activity's RunID, create a stored procedure that updates the value in the sink from an input parameter, and pass the activity RunID as that parameter from a Stored Procedure activity.
Parameter value: @activity('Copy data1').ActivityRunId
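A sketch of that Stored Procedure activity, assuming a procedure named usp_UpdateRunId with a RunId parameter (the procedure, parameter, and linked service names are all illustrative):

{
    "name": "Update RunId in sink",
    "type": "SqlServerStoredProcedure",
    "dependsOn": [
        { "activity": "Copy data1", "dependencyConditions": [ "Succeeded" ] }
    ],
    "linkedServiceName": {
        "referenceName": "AzureSqlDatabaseLS",
        "type": "LinkedServiceReference"
    },
    "typeProperties": {
        "storedProcedureName": "usp_UpdateRunId",
        "storedProcedureParameters": {
            "RunId": {
                "value": "@activity('Copy data1').ActivityRunId",
                "type": "String"
            }
        }
    }
}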
I want to do some activity in an ADF pipeline, but only if a field in a JSON output is present. What kind of ADF expression can I use to check that?
I set up two json files for testing, one with a firstName attribute and one without:
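For illustration, the two test files might look like this (the attribute values are assumed):

with-firstName.json:
{ "firstName": "John", "lastName": "Doe" }

without-firstName.json:
{ "lastName": "Doe" }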
I then created a Lookup activity to get the contents of the JSON file and a Set Variable activity for testing the expression. I often use this pair; it's a good way to test and view expression results iteratively:
I then created a Boolean variable (which is one of the datatypes supported by Azure Data Factory and Synapse pipelines) and the expression I am using to check the existence of the attribute is this:
@bool(contains(activity('Lookup1').output.firstRow, 'firstName'))
You can then use that boolean variable in an If activity, to execute subsequent activities conditionally based on the value of the variable.
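For example, assuming the Boolean variable is named firstNameExists (an illustrative name), the If Condition activity's expression would simply reference it:

"expression": {
    "value": "@variables('firstNameExists')",
    "type": "Expression"
}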
I am trying to read data using a simple select query and create a csv file with the resultset data.
As of now, I have the select query in the application.properties file and I am able to generate the CSV file.
Now, I want to move the query to a static table and fetch it as an initialization step before the batch job starts (something like a beforeJob).
Could you please let me know what the best strategy would be to do so, i.e. to read from a database before the actual batch job of fetching the data and creating a CSV file starts.
I am able to read the data and write it to a CSV file
application.properties
extract.sql.query=SELECT * FROM schema.table_name
I want it moved to the database and fetched before the actual job starts.
1) I created a job with one step (read and then write).
2) Implemented JobExecutionListener. In the beforeJob method, I used JdbcTemplate to fetch the relevant details (a query, in my case) from the DB.
3) Using jobExecution.getExecutionContext(), I set the query in the execution context.
4) Used a step-scoped reader to retrieve the value using late binding: @Value("#{jobExecutionContext['Query']}") String myQuery.
5) The key to success here is to pass a placeholder value of null when wiring the reader so that compilation succeeds; Spring injects the real value at runtime. A sketch of these steps follows.
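A minimal sketch of steps 2-5, assuming a batch_query_config table with job_name and query_text columns (all table, column, and bean names here are illustrative, not from the original setup):

import java.util.Map;
import javax.sql.DataSource;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobExecutionListener;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.batch.item.database.builder.JdbcCursorItemReaderBuilder;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.jdbc.core.ColumnMapRowMapper;
import org.springframework.jdbc.core.JdbcTemplate;

public class QueryFromDbListener implements JobExecutionListener {

    private final JdbcTemplate jdbcTemplate;

    public QueryFromDbListener(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    public void beforeJob(JobExecution jobExecution) {
        // Step 2: fetch the SELECT statement from the static table.
        String query = jdbcTemplate.queryForObject(
                "SELECT query_text FROM batch_query_config WHERE job_name = 'extractJob'",
                String.class);
        // Step 3: expose it to step components via the job execution context.
        jobExecution.getExecutionContext().putString("Query", query);
    }

    @Override
    public void afterJob(JobExecution jobExecution) {
        // Nothing to do after the job.
    }
}

// In the @Configuration class -- step 4, the late-bound reader:
@Bean
@StepScope
public JdbcCursorItemReader<Map<String, Object>> reader(
        @Value("#{jobExecutionContext['Query']}") String myQuery,
        DataSource dataSource) {
    return new JdbcCursorItemReaderBuilder<Map<String, Object>>()
            .name("extractReader")
            .dataSource(dataSource)
            .sql(myQuery)                         // query fetched in beforeJob
            .rowMapper(new ColumnMapRowMapper())  // generic column-to-map rows
            .build();
}

// Step 5: when wiring the step, call reader(null, dataSource) -- the null is
// only a placeholder so the code compiles; Spring injects the real query at runtime.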
I have an ADF pipeline with a copy activity; I'm copying data from a blob storage CSV file to a SQL database, and this works as expected. I need to map the name of the CSV file (which comes from pipeline parameters) and save it in the destination table. I'm wondering if there is a way to map parameters to destination columns.
Column names can't use parameters directly. But you can use a parameter for the whole structure property in the dataset and the columnMappings property in the copy activity.
This might be a little tedious, as you will need to write the whole structure array and columnMappings on your own and pass them as parameters into the pipeline; a rough sketch follows.
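A sketch of how the copy activity side could be parameterized, assuming a pipeline parameter named translator that holds the whole TabularTranslator object (e.g. {"type": "TabularTranslator", "columnMappings": "name: FileName, title: Title"}); the parameter name and mapping are assumptions:

"typeProperties": {
    "source": { "type": "BlobSource" },
    "sink": { "type": "SqlSink" },
    "translator": {
        "value": "@pipeline().parameters.translator",
        "type": "Expression"
    }
}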
In DF v2, in the Copy Data activity, it is possible to add a new column to the source with the value $$FILEPATH, and then each record will carry the name of its input file.
Azure DF v2, CopyData activity -> Source
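A sketch of that source configuration, assuming a delimited-text source and a new column named FileName (the column name is an assumption):

"source": {
    "type": "DelimitedTextSource",
    "additionalColumns": [
        {
            "name": "FileName",
            "value": "$$FILEPATH"
        }
    ]
}

The FileName column can then be mapped to the destination table column like any other source column.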
Whenever I execute a stored procedure in the ADFv2, it gives me an output as
{
"effectiveIntegrationRuntime": "DefaultIntegrationRuntime (Australia Southeast)",
"executionDuration": 34
}
even though I have set 2 variables as output in the procedure. Is there any way to map the output of a stored procedure in ADFv2? So far I can map the output of all the other activities, but not of stored procedures.
You could use a lookup activity to get the result.
Please reference this post. https://social.msdn.microsoft.com/Forums/azure/en-US/82e84ec4-fc40-4bd3-b6d5-b742f3cd1a33/adf-v2-how-to-check-if-stored-procedure-output-is-empty?forum=AzureDataFactory
Update by Gagan:
Instead of getting the output of the SP (which is not possible in ADFv2 right now), I stored the output in a table and then applied a Lookup-ForEach on the table to get the value.
A stored procedure call in Data Factory (v2) does not capture the result data set, so you cannot use the Stored Procedure activity to get the result data set and refer to it in the next activities.
The workaround is to use a Lookup activity to call the exact same stored procedure, as Lookup will get you the result data set from the stored procedure. Replace your Stored Procedure activity with a Lookup and it will work; a sketch follows.
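A sketch of such a Lookup activity, assuming a procedure named usp_GetOutput and an existing Azure SQL dataset (both names are illustrative). Note the procedure must SELECT its values as a result set; output parameters are not captured:

{
    "name": "LookupSP",
    "type": "Lookup",
    "typeProperties": {
        "source": {
            "type": "SqlSource",
            "sqlReaderStoredProcedureName": "usp_GetOutput"
        },
        "dataset": {
            "referenceName": "AzureSqlTableDataset",
            "type": "DatasetReference"
        },
        "firstRowOnly": true
    }
}

Subsequent activities can then reference the values as @activity('LookupSP').output.firstRow.YourColumn.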
Let's assume there are two filesets, A and B, on Azure Data Lake Store, organized like this:
/A/Year/Month/Day/A_Year_Month_Day_Hour
/B/Year/Month/Day/B_Year_Month_Day_Hour
I want to get some values (let's say the DateCreated of the A entities) and use those values to generate file paths for the B set.
How can I achieve that?
Some thoughts, but I'm not sure about this:
1. Select the values from A.
2. Store them on some storage (Azure Data Lake or Azure SQL Database).
3. Build one comma-separated string pStr.
4. Pass pStr via Data Factory to a stored procedure which generates the file paths from the pattern.
EDIT
According to @mabasile_MSFT's answer, here is what I have right now.
First, a U-SQL script that generates a JSON file, which looks like this:
{
    "FileSet": ["/Data/SomeEntity/2018/3/5/SomeEntity_2018_3_5__12",
        "/Data/SomeEntity/2018/3/5/SomeEntity_2018_3_5__13",
        "/Data/SomeEntity/2018/3/5/SomeEntity_2018_3_5__14",
        "/Data/SomeEntity/2018/3/5/SomeEntity_2018_3_5__15"]
}
Then an ADF pipeline which contains the Lookup and the second U-SQL script.
The Lookup reads the FileSet property from this JSON file, and as I understood it, I need to somehow pass this JSON array to the second script, right?
But the U-SQL compiler then generates a string variable like
DECLARE @fileSet string = "["/Data/SomeEntity/2018/3/5/SomeEntity_2018_3_5__12",
"/Data/SomeEntity/2018/3/5/SomeEntity_2018_3_5__13",
"/Data/SomeEntity/2018/3/5/SomeEntity_2018_3_5__14",
"/Data/SomeEntity/2018/3/5/SomeEntity_2018_3_5__15"]"
and the script doesn't even compile after that.
You will need two U-SQL jobs, but you can instead use an ADF Lookup activity to read the filesets.
Your first ADLA job should extract data from A, build the filesets, and output to a JSON file in Azure Storage.
Then use a Lookup activity in ADF to read the fileset names from your JSON file in Azure Storage.
Then define your second U-SQL activity in ADF. Set the fileset as a parameter (under Script > Advanced if you're using the online UI) in the U-SQL activity - the value will look something like @{activity('MyLookupActivity').output.firstRow.FileSet} (see the Lookup activity docs above).
ADF will write in the U-SQL parameter as a DECLARE statement at the top of your U-SQL script. If you want to have a default value encoded into your script as well, use DECLARE EXTERNAL - this will get overwritten by the DECLARE statements ADF writes in so it won't cause errors.
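For example, a sketch of the top of the second script (the default path is purely illustrative):

// Local/default value; the DECLARE @fileSet that ADF injects at runtime overrides it.
DECLARE EXTERNAL @fileSet string = "/Data/SomeEntity/2018/3/5/SomeEntity_2018_3_5__12";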
I hope this helps, and let me know if you have additional questions!
Try this root link, which can help you get started with all things U-SQL:
http://usql.io
A useful link for your question:
https://saveenr.gitbooks.io/usql-tutorial/content/filesets/filesets-with-dates.html