Compare Get Metadata activity with Lookup output

I was trying to do a delta load using ADF. I have a Get Metadata activity on my blob location that reads all the file names using childItems, and a Lookup activity that reads the already-loaded file names from a SQL table. Following these I have a Filter activity that should keep only the new files from the blob location.
Expression on Items:
@activity('GetRawFilenames').output.childItems
Expression on Condition:
@not(contains(activity('GetLoadedFilesList').output.value,item().name))
But it still isn't filtering the file list in the Filter output. Could the experts please help? Thanks in advance.

Have a look at this; it describes the same problem. The expression on Condition should be:
@not(contains(join(activity('GetLoadedFilesList').output.value,','),item().name))
contains() on the raw Lookup output checks for an exact array element (each element is a row object), so it never matches a bare file name; after joining the GetLoadedFilesList output into a single string, contains() does a substring match and your filter should work.
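For reference, a minimal sketch of how the Filter activity might look in the pipeline JSON (the activity names match the ones above; the surrounding structure is an assumption):

{
    "name": "FilterNewFiles",
    "type": "Filter",
    "dependsOn": [
        { "activity": "GetRawFilenames", "dependencyConditions": [ "Succeeded" ] },
        { "activity": "GetLoadedFilesList", "dependencyConditions": [ "Succeeded" ] }
    ],
    "typeProperties": {
        "items": {
            "value": "@activity('GetRawFilenames').output.childItems",
            "type": "Expression"
        },
        "condition": {
            "value": "@not(contains(join(activity('GetLoadedFilesList').output.value,','),item().name))",
            "type": "Expression"
        }
    }
}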

Related

Pipeline filter modifies date every time I run the pipeline, which prevents me from pulling only the last modified date to SQL

Once a week, a file is generated to an on-prem folder. My pipeline pulls that on-prem file to blob storage, then copies from blob to blob; during this part the pipeline filters my data, which then goes to SQL. The problem is that when the data gets filtered, the modified date changes, and all the files in blob storage are pulled rather than only the one that was originally pulled for that week. I have attached images of my pipeline, the on-prem files, and what I filter for.
Instead of trying to proceed with the last modified date of the file, you can proceed using the file name instead.
Since you have the date (yyyyddMM format) in the filename itself, you can dynamically create the filename and check whether this file is present in the filtered files list or not.
Look at the following demonstration. Let's say I have the following 2 files as my filtered files. I used a Get Metadata activity (child items) on the blob storage.
Since we know the format of the filename (SalesWeekly_yyyyddMM.csv), create the current week's filename dynamically using the following dynamic content in a Set variable activity (the variable name is file_name_required).
@concat('SalesWeekly_',formatDateTime(utcnow(),'yyyyddMM'),'.csv')
Now, create an array containing all the filenames returned by the Get Metadata activity. The ForEach activity's Items value is given as @activity('Get Metadata1').output.childItems.
Inside this, use an Append variable activity with the value @item().name.
Now you have the file name you actually need (dynamically built) and the array of filtered file names. You can check whether the required file name is present in that array and take the necessary actions. I used an If Condition activity with the following dynamic content; a consolidated sketch of these activities follows after the reference images below.
@contains(variables('files_names'),variables('file_name_required'))
The following are reference images of the output flow.
When current week file is not present in the filtered files.
When current week file is present in the filtered files.
I have used a Wait activity for the demo here. You can replace it with a Copy activity (from blob to SQL) in the True case. If you don't want to insert anything when the current week's file is missing, leave the False case empty.
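Putting the steps together, a minimal sketch of the three activity definitions in pipeline JSON (activity names are assumptions; the variable names come from the demonstration above, and the Append variable activity sits inside the ForEach over @activity('Get Metadata1').output.childItems):

{
    "name": "Set file_name_required",
    "type": "SetVariable",
    "typeProperties": {
        "variableName": "file_name_required",
        "value": {
            "value": "@concat('SalesWeekly_',formatDateTime(utcnow(),'yyyyddMM'),'.csv')",
            "type": "Expression"
        }
    }
},
{
    "name": "Append file name",
    "type": "AppendVariable",
    "typeProperties": {
        "variableName": "files_names",
        "value": {
            "value": "@item().name",
            "type": "Expression"
        }
    }
},
{
    "name": "If current week file present",
    "type": "IfCondition",
    "typeProperties": {
        "expression": {
            "value": "@contains(variables('files_names'),variables('file_name_required'))",
            "type": "Expression"
        }
    }
}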

Call item() from within query of a lookup on a dataset within a for each loop

I'm creating a pipeline in Azure Data Factory and I'm trying to use a Lookup activity to query an Azure SQL dataset, like so:
The Lookup activity is inside a ForEach loop. My hope is to get a value from the source dataset queried in the Lookup for every item in the ForEach loop. However, when I preview the data to test this, it does not work and I get this message:
Does anyone have any ideas on how to reference a ForEach loop item in a query on a dataset in a Lookup in ADF?
Thanks,
Carolina
EDIT:
I've changed tack and tried to use a stored procedure, but I'm still having the exact same issue. It seems like I can't reference the ForEach loop item from a query or a stored procedure. Does anyone know a way around this, or how to reference the item properly?
Data preview in ADF is used for checking whether the incoming data is correct. When the preview involves dynamic expressions, it does not take those expressions from the pipeline run; instead, it asks us to enter a value for them manually.
That is why, in the above, it asks for a sample item() expression value before it can show the data preview for the Lookup.
When I supplied my item() value there, I got the correct preview, as below:
Preview of the Lookup that I got for that particular item():
Your approach is fine; when you debug the pipeline with the required query in the Lookup, you will get the desired result.
Please go through my sample demonstration:
Array variable for the ForEach:
Lookup inside the ForEach, with query:
Result after debugging the pipeline:
You can see that I got the same result in the first iteration of the ForEach as in the Lookup preview above.
So, for the preview, we just have to give a sample value to check the result.
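For reference, a minimal sketch of a Lookup whose query references the current ForEach item (the table, column, and dataset names here are assumptions):

{
    "name": "Lookup1",
    "type": "Lookup",
    "typeProperties": {
        "source": {
            "type": "AzureSqlSource",
            "sqlReaderQuery": {
                "value": "SELECT SomeValue FROM dbo.SomeTable WHERE Id = '@{item().Id}'",
                "type": "Expression"
            }
        },
        "dataset": {
            "referenceName": "AzureSqlDataset",
            "type": "DatasetReference"
        },
        "firstRowOnly": true
    }
}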

Schema compliance in Azure data factory

I am trying to do schema compliance checking of an input file in ADF. I have tried the below:
Get Metadata activity
The schema validation that is available in the source activity
But the above only seems to check whether a particular field is present in the specified position. Also, Azure by default takes the data type of all these fields as string, since the input is a flat file.
I want to check the position and the data type as well. For example:
empid,name,salary
1,abc,10
2,def,20
3,ghi,50
xyz,jkl,10
The row with empid xyz needs to be rejected, as it is not of a number data type. Any help is appreciated.
You can use a Data Flow and create a Filter transformation to achieve this.
Below is my test:
1. Create a source.
2. Create a Filter transformation and use this expression: regexMatch(empid,'(\\d+)')
3. Output:
Hope this can help you.
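As a rough sketch, the equivalent data flow script might look like this (the source projection and stream names are assumptions; the filter expression is the one from step 2):

source(output(
        empid as string,
        name as string,
        salary as string
    ),
    allowSchemaDrift: true) ~> source1
source1 filter(regexMatch(empid, '(\\d+)')) ~> FilterNumericEmpid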

Need to read a CSV file using Azure Data Factory Activity

I have a CSV file.
I need to read each row and access the column values, and for each row I need to run a ForEach activity.
Which activity can I achieve this with?
Assuming that the CSV file is in cloud storage, you can use the Lookup activity. Please be aware that the Lookup activity has a limit of 5000 rows at this time. Once you have done that, you can use a ForEach loop and iterate through it.
Hope this helps
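A minimal sketch of the Lookup and ForEach wiring (activity and dataset names are assumptions; the CSV dataset should have "First row as header" enabled so columns can be accessed by name):

{
    "name": "LookupCsvRows",
    "type": "Lookup",
    "typeProperties": {
        "source": { "type": "DelimitedTextSource" },
        "dataset": { "referenceName": "CsvDataset", "type": "DatasetReference" },
        "firstRowOnly": false
    }
},
{
    "name": "ForEachRow",
    "type": "ForEach",
    "dependsOn": [ { "activity": "LookupCsvRows", "dependencyConditions": [ "Succeeded" ] } ],
    "typeProperties": {
        "items": {
            "value": "@activity('LookupCsvRows').output.value",
            "type": "Expression"
        },
        "activities": []
    }
}

Inside the ForEach, a column value for the current row is read as @item().ColumnName.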

Retrieve blob file name in Copy Data activity

I download JSON files from a web API and store them in blob storage using a Copy Data activity with binary copy. Next, I would like to use another Copy Data activity to extract a value from each JSON file in the blob container and store the value together with its ID in a database. The ID is part of the filename; is there some way to extract the filename?
You can use the following set of activities:
1) A Get Metadata activity: configure a dataset pointing to the blob folder, and add Child Items to the Field list.
2) A ForEach activity that takes every item from the Get Metadata activity and iterates over them. To do this, configure its Items property to be @activity('NameOfGetMetadataActivity').output.childItems
3) Inside the ForEach, you can extract the filename of each file using the following expression: item().name
After this, continue as you see fit, either adding functions to extract the ID or copying the entire name.
Hope this helped!
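For reference, a minimal sketch of the Get Metadata activity definition (the dataset name is an assumption):

{
    "name": "NameOfGetMetadataActivity",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": { "referenceName": "BlobFolderDataset", "type": "DatasetReference" },
        "fieldList": [ "childItems" ]
    }
}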
Alternatively, after setting up a dataset for the source file/file path with a wildcard, and the destination/sink table:
Add a Copy activity and set up its source and sink.
Add an Additional column.
Provide a name for the additional column and the value "$$FILEPATH".
Import the mapping and voila: your additional column should be in the list of source columns, marked "Additional".
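A minimal sketch of the corresponding source block in the Copy activity JSON (the additional column name and the wildcard pattern are assumptions; $$FILEPATH is the reserved value mentioned above):

"source": {
    "type": "DelimitedTextSource",
    "storeSettings": {
        "type": "AzureBlobStorageReadSettings",
        "wildcardFileName": "*.csv"
    },
    "additionalColumns": [
        { "name": "filepath", "value": "$$FILEPATH" }
    ]
}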