creating a metadata driven pipeline - parameterizing a source file - azure-data-factory

I have CSV files that are placed in various folders on a blob storage container.
These files will map to a table in a database, and we will use ADF to copy the data to the database.
The aim is to have the pipeline metadata-driven. We have a file that contains JSON with details of each source file and sink table.
[
  {
    "sourceContainer": "container1",
    "sourceFolder": "folder1",
    "sourceFile": "datafile.csv",
    "sinkTable": "staging1"
  },
  {
    "sourceContainer": "container1",
    "sourceFolder": "folder2",
    "sourceFile": "datafile2.csv",
    "sinkTable": "staging2"
  }
]
A ForEach activity will loop through these values, place them in variables, and use them to load the appropriate table from the appropriate CSV.
The issue is, for a CSV source dataset, I cannot parameterize the source dataset with user variables (fields marked with a red x in the below screenshot).
Would appreciate advice on how to tackle this.

The feature is definitely supported, so I'm not sure what you mean by "cannot parameterize". Here is an example of defining the parameters:
And here is an example of referencing them:
I recommend you use the "Add dynamic content" link and the expression builder to get the correct reference.
If you are having some other issue, please describe it in more detail.
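For illustration only, the loop described in the question can be sketched outside ADF. This is a minimal Python sketch, not ADF code: the function name `plan_copies` is hypothetical, and the metadata is the JSON from the question. Each iteration resolves the same values that a ForEach activity would pass to the dataset parameters.

```python
import json

# Metadata copied from the question; in ADF it would typically be read
# by an activity before the ForEach runs (assumption).
METADATA = """
[
  {"sourceContainer": "container1", "sourceFolder": "folder1",
   "sourceFile": "datafile.csv", "sinkTable": "staging1"},
  {"sourceContainer": "container1", "sourceFolder": "folder2",
   "sourceFile": "datafile2.csv", "sinkTable": "staging2"}
]
"""


def plan_copies(metadata_json):
    """Return one (source_path, sink_table) pair per metadata entry."""
    plans = []
    for item in json.loads(metadata_json):
        # Each field here plays the role of one dataset parameter.
        source_path = "/".join(
            [item["sourceContainer"], item["sourceFolder"], item["sourceFile"]]
        )
        plans.append((source_path, item["sinkTable"]))
    return plans


if __name__ == "__main__":
    for source_path, sink_table in plan_copies(METADATA):
        print(f"{source_path} -> {sink_table}")
```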

Related

An error occurs when using MLDataTable to load data

I tried to create a word tagger model in Swift according to this tutorial in the latest Xcode, but I cannot load data from a local file using MLDataTable. Here is my code.
let data = try MLDataTable(contentsOf:
URL(fileURLWithPath: "/path/to/data.json"))
The error is as follows.
error: Couldn't lookup symbols:
CreateML.MLDataTable.init(contentsOf: Foundation.URL, options:
CreateML.MLDataTable.ParsingOptions) throws -> CreateML.MLDataTable
I tried both an absolute path and a relative path, but neither worked (I am pretty sure that the data file is in the right location and the paths are correct). In addition, I can load the local file into a URL object, so the problem should lie in MLDataTable.
Could someone help?
I had the same error, although I used a .csv file. The problem was solved when I used the Core ML tool under Xcode's developer tools.
Here are some recommendations:
Your training data's class column should be labeled "label".
Your training data can be one file, but the testing data should contain subfolders named exactly after your labels. To illustrate, if you have "negative", "positive" and "neutral" as label names, then you should have three subfolders named "negative", "positive" and "neutral". Moreover, the testing data can't be a single JSON or CSV file containing all the testing rows. For example, if you have five rows of negative-labeled data, you can't put that CSV file under the "negative" subfolder. You have to create five .txt files, one for each row.
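As a rough illustration of the folder layout described above, the following Python sketch writes one .txt file per row into a subfolder named after the row's label. The function name `write_testing_folders`, the numeric file names, and the sample rows are assumptions made for the example; only the "one subfolder per label, one text file per row" layout comes from the answer.

```python
from pathlib import Path


def write_testing_folders(rows, root):
    """Write each (text, label) row to <root>/<label>/<n>.txt.

    Returns the number of files written per label.
    """
    counters = {}
    for text, label in rows:
        folder = Path(root) / label
        folder.mkdir(parents=True, exist_ok=True)
        counters[label] = counters.get(label, 0) + 1
        # File names are arbitrary; a running number per label is used here.
        (folder / f"{counters[label]}.txt").write_text(text, encoding="utf-8")
    return counters


if __name__ == "__main__":
    import tempfile

    sample_rows = [("bad", "negative"), ("great", "positive"), ("fine", "neutral")]
    with tempfile.TemporaryDirectory() as tmp:
        print(write_testing_folders(sample_rows, tmp))
```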

Azure Data Factory Cannot Read Metadata Folder

I hope you are all staying healthy and strong during the COVID-19 pandemic.
I have a question about Azure Data Factory. I have created a pipeline with a Get Metadata activity, detailed below:
I have files in a folder and a subfolder like this:
I have a Get Metadata activity followed by a ForEach; the first Get Metadata returns the child items of the folder, like this:
A second Get Metadata retrieves the last-modified date, like this (with this setting, it only reads the last-modified date of the subfolder):
After that, I add a variable that uses @item().Name to read each file in that folder, like this:
After running the pipeline against a folder that contains a subfolder, I get an error like this:
The error says that @item().Name cannot read the subfolder in that folder. Get Metadata succeeds for each file, but the activity fails when it tries to read the metadata of the subfolder.
Many thanks in advance for any answers.
If you need to access the folder:
Create a clone of the same dataset and set up a parameter as below, leaving the file field empty.
If you need to access a file inside the directory, use the condition @equals(item().type,'Folder') to identify directories, then inside that branch use a dataset with parameters for the directory and the file.
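The branching suggested above can be sketched in Python for clarity. This is not ADF code: it assumes the child items arrive as a list of objects with `name` and `type` fields, where `type` is either "File" or "Folder", and the sample names are hypothetical.

```python
def split_child_items(child_items):
    """Separate child items into folder names and file names by their type."""
    folders = [item["name"] for item in child_items if item["type"] == "Folder"]
    files = [item["name"] for item in child_items if item["type"] == "File"]
    return folders, files


if __name__ == "__main__":
    # Hypothetical listing of a folder that contains one subfolder.
    sample = [
        {"name": "report.csv", "type": "File"},
        {"name": "archive", "type": "Folder"},
    ]
    folders, files = split_child_items(sample)
    print("folders:", folders)
    print("files:", files)
```

Folders would then go to the dataset that takes only a directory parameter, and files to the dataset that takes both directory and file.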

ADF Copy only when a new CSV file is placed in the source and copy to the Container

I want to copy a file from the source to the target container, but only when the source file is new
(that is, a newer file has been placed in the source). I am not sure how to proceed, or what syntax to use to check whether the source file is newer than the target. Should I use two Get Metadata activities to read the source and target last-modified dates, and then an If Condition? I tried a few ways, but they didn't work.
Any help would be appreciated.
The syntax I used for the condition gives me an error:
@if(greaterOrEquals(ticks(activity('Get Metadata_File').output.lastModified),activity('Get Metadata_File2')),True,False)
error message
The function 'greaterOrEquals' expects all of its parameters to be either integer or decimal numbers. Found invalid parameter types: 'Object'
You can try one of the Pipeline Templates that ADF offers.
Use this template to copy new and changed files only by using
LastModifiedDate. This template first selects the new and changed
files only by their attributes "LastModifiedDate", and then copies
them from the data source store to the data destination store. You can
also go to "Copy Data Tool" to get the pipeline for the same scenario
with more connectors.
OR...
You can use Storage Event Triggers to trigger the pipeline with copy activity to copy when each new file is written to storage.
Follow the detailed example here: Create a trigger that runs a pipeline in response to a storage event
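Separately from the two options above, the error in the question arises because the second argument to greaterOrEquals is the whole activity object rather than a number; only the first timestamp is passed through ticks. The comparison needs both sides reduced to their last-modified value and converted to numbers. The Python sketch below mirrors that logic. It assumes ISO 8601 timestamps in UTC with no fractional seconds, and the helper names are hypothetical.

```python
from datetime import datetime, timezone


def ticks(iso_timestamp):
    """Convert an ISO 8601 timestamp to 100-nanosecond ticks since 0001-01-01."""
    moment = datetime.fromisoformat(iso_timestamp.replace("Z", "+00:00"))
    if moment.tzinfo is None:
        # Assumption: timestamps without an offset are treated as UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    delta = moment - datetime(1, 1, 1, tzinfo=timezone.utc)
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * 10_000_000 + delta.microseconds * 10


def source_is_newer(source_meta, target_meta):
    """Compare two metadata outputs by last-modified time, both as numbers."""
    return ticks(source_meta["lastModified"]) >= ticks(target_meta["lastModified"])


if __name__ == "__main__":
    source = {"lastModified": "2021-03-02T00:00:00Z"}
    target = {"lastModified": "2021-03-01T00:00:00Z"}
    print(source_is_newer(source, target))
```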

Azure Factory v2 Wildcard

I am trying to create a new dataset in ADF that looks for csv files that meet a certain naming convention. These files are located within a series of different folders in my Azure Blob Storage.
For instance, in the sample directory below, I am trying to pull out csv files that contain the word "cars".
Folder A
    fastcars.csv
    fasttrucks.csv
Folder B
    slowcars.csv
    slowtrucks.csv
Ideally, I would end up with the files "slowcars.csv" and "fastcars.csv". I've seen examples out there where people were able to wildcard the file name. I have been playing around with that, but have had no luck. (See image below for one example of what I have been doing).
Is what I am trying to do even possible? Would appreciate any advice you guys may have. Please let me know if I can provide further clarification.
According to the description of filename in this documentation,
The file name under the given fileSystem + folderPath. If you want to
use a wildcard to filter files, skip this setting and specify it in
activity source settings.
So you need to specify the wildcard in the activity source settings, not in the dataset file path.
A simple example in a copy activity:
Hope this can help you.
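To show which files such a wildcard would select for the sample directory in the question, here is a small Python sketch that uses shell-style patterns. It only illustrates the matching outcome and is not how ADF evaluates wildcards internally; the patterns "Folder *" and "*cars*.csv" and the function name `match_files` are assumptions.

```python
from fnmatch import fnmatchcase

# The sample directory from the question, written as folder/file paths.
BLOB_PATHS = [
    "Folder A/fastcars.csv",
    "Folder A/fasttrucks.csv",
    "Folder B/slowcars.csv",
    "Folder B/slowtrucks.csv",
]


def match_files(paths, folder_pattern, file_pattern):
    """Keep paths whose folder and file name both match their patterns."""
    matches = []
    for path in paths:
        folder, _, name = path.rpartition("/")
        if fnmatchcase(folder, folder_pattern) and fnmatchcase(name, file_pattern):
            matches.append(path)
    return matches


if __name__ == "__main__":
    for path in match_files(BLOB_PATHS, "Folder *", "*cars*.csv"):
        print(path)
```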

How to make a section optional when mapped to optional data in a Word OpenXml Part?

I'm using the OpenXml SDK to generate Word 2013 files. I'm running on a server (part of a server solution), so automation is not an option.
Basically I have an xml file that is output from a backend system. Here's a very simplified example:
<my:Data
    xmlns:my="https://schemas.mycorp.com">
  <my:Customer>
    <my:Details>
      <my:Name>Customer Template</my:Name>
    </my:Details>
    <my:Orders>
      <my:Count>2</my:Count>
      <my:OrderList>
        <my:Order>
          <my:Id>1</my:Id>
          <my:Date>19/04/2017 10:16:04</my:Date>
        </my:Order>
        <my:Order>
          <my:Id>2</my:Id>
          <my:Date>20/04/2017 10:16:04</my:Date>
        </my:Order>
      </my:OrderList>
    </my:Orders>
  </my:Customer>
</my:Data>
Then I use Word's Xml Mapping pane to map this data to content control:
I simply duplicate the word file, and write new Xml data when generating new files.
This is working as expected. When I update the xml part, it reflects the data from my backend.
However, there's a case that does not work. If a customer has no orders, the template content is kept in the document. The XML data is:
<my:Data
    xmlns:my="https://schemas.mycorp.com">
  <my:Customer>
    <my:Details>
      <my:Name>Some customer</my:Name>
    </my:Details>
    <my:Orders>
      <my:Count>0</my:Count>
      <my:OrderList>
      </my:OrderList>
    </my:Orders>
  </my:Customer>
</my:Data>
(see the empty order list).
In Word, the xml pane reflects the correct data (meaning no Order node):
But as you can see, the template content is still here.
Basically, I'd like to hide the order list when there's no order (or at least an empty table).
How can I do that?
PS: If it can help, I uploaded the word and xml files, and a small PowerShell script that injects the data : repro.zip
Thanks for sharing your files so we can better help you.
I had a difficult time trying to solve your problem with your existing Word Content Controls, XML files and the PowerShell script that added the XML to the Word document. I found what seemed to be Microsoft's VSTO example solution to your problem, but I couldn't get this to work cleanly.
I was however able to write a simple C# console application that generates a Word file based on your XML data. The OpenXML code to generate the Word file was generated code from the Open XML Productivity Tool. I then added some logic to read your XML file and generate the second table rows dynamically depending on how many orders there are in the data. I have uploaded the code for you to use if you are interested in this solution. Note: The xml data file should be in c:\temp and the generated word files will be in c:\temp also.
Another added bonus to this solution is if you were to add all of the customer data into one XML file, the application will create separate word files in your temp directory like so:
customer_<name1>.docx
customer_<name2>.docx
customer_<name3>.docx
etc.
Here is the document generated from the first xml file
Here is the document generated from the second xml file with the empty row
Hope this helps.
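The row-generation logic described above (read the XML, then emit one table row per order, or none) can be sketched as follows. This is a Python illustration using the standard library rather than the C# console application from the answer, and it stops short of writing a Word file. The function name `order_rows` is hypothetical; the namespace and element names come from the XML in the question.

```python
import xml.etree.ElementTree as ET

NS = {"my": "https://schemas.mycorp.com"}

SAMPLE_XML = """
<my:Data xmlns:my="https://schemas.mycorp.com">
  <my:Customer>
    <my:Details><my:Name>Customer Template</my:Name></my:Details>
    <my:Orders>
      <my:Count>2</my:Count>
      <my:OrderList>
        <my:Order><my:Id>1</my:Id><my:Date>19/04/2017 10:16:04</my:Date></my:Order>
        <my:Order><my:Id>2</my:Id><my:Date>20/04/2017 10:16:04</my:Date></my:Order>
      </my:OrderList>
    </my:Orders>
  </my:Customer>
</my:Data>
"""


def order_rows(xml_text):
    """Return one (id, date) tuple per order; an empty list means no rows."""
    root = ET.fromstring(xml_text)
    orders = root.findall("./my:Customer/my:Orders/my:OrderList/my:Order", NS)
    return [
        (order.findtext("my:Id", namespaces=NS),
         order.findtext("my:Date", namespaces=NS))
        for order in orders
    ]


if __name__ == "__main__":
    rows = order_rows(SAMPLE_XML)
    # An empty list would be the signal to omit the order table.
    print(rows if rows else "no orders: omit the order table")
```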