Data Flow Partition by Column Value Not Writing Unique Column Values to Each Folder - azure-data-factory

I am reading an SQL DB as source and it outputs the following table.
My intention is to use data flow to save each unique type into a data lake folder partition probably named as specific type.
I somehow manage to create individual folders but my data flow saves the entire table with all types into each of the folders.
my data flow
Source
Window
Sink
Any ideas?

I create a same csv source and it works well, please ref my example.
Windows settings:
Sink settings: choose the file name option like this
Note, please don't set optmize again in sink side.
The output folder schema we can get:
Just for now, Data Factory Data Flow doesn't support custom the output file name.
HTH.

You can also try "Name folder as column data" using the OpType column instead of using partitioning. This is a property in the Sink settings.

Related

How to Load files with the same name in data flow Azure data factory

I use data flow in Azure data factory And I set as source dataset files with the same name. The files have named “name_date1.csv” end “name_date2.csv”. I set path “name_*.csv”. I want that data flow load in sink db only data of “name_date1”. How is it possible?
I have reproduced the above and able to get the desired file to sink using Column to store file name option in source options.
These are my source files in storage.
I have given name_*.csv in wild card of source as same as you to read multiple files.
In source options, go to Column to store file name and give a name and this will store the file name of every row in new column.
Then use filter transformation to get the row only from a particular file.
notEquals(instr(filename,'name_date1'),0)
After this give your sink and you can get the rows from your desired file only.

Azure Data Factory DataFlow Source CSV File Header Keep Changing

I am trying load the CSV file from source blob storage and option selected for first row as a header but while doing multiple time debug trigger, the header keep changing, so that i could not able to insert the data to target SQL DB.
kindly suggest and how do we handle this scenario. i am expecting static header needs to configure from source or else existing column i would have to rename into adf side.
Thanks
In Source settings "Allow Schema drift" needs to be ticked.
Allow Schema Drift should be turned-on in the sink as well.

azure data flow converting boolean to string

I am trying to split large json files into smaller chunks using azure data flow. It splits the file but it changes column type boolean to string in output files. This same data flow will be used for different json files with different schemas therefore can't have any fixed schema mapping defined. I have to use auto mapping option. Please suggest how could I solve this issue of automatic datatype conversion? or Any other approach to split the file in the azure data factory?
Here, with my dataset I have tried to have a source as a json file and Sink as a json. If you have a fixed Schema and import it then the data flow works fine and could return a boolean value after running the pipeline.
But as you stated to have "same data flow will be used for different json files with different schemas therefore can't have any fixed schema mapping defined". Hence, you must have a Derived Column to explicitly covert all to boolean values.
Import Schema :
**In the sink you could inspect :
Data Preview :
In your ADF Data Flow Source transformation, click on the Projection Tab and click "Define default format". Set explicit values for Boolean True/False so that ADF can use that hint for proper data type inference for your data.

Copy json blob to ADX table

I have an ADF with a copy activity which copies a json blob to kusto.
I have did the following:
Created a json mapping in the kusto table.
In the "Sink" section of the copy activity: I set the Ingestion mapping name field the name of #1.
In the mapping section of the copy activity, I mapped all the fields.
When I run the copy activity, I get the following error:
"Failure happened on 'Sink' side. ErrorCode=UserErrorKustoWriteFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Failure status of the first blob that failed: Mapping reference wasn't found.,Source=Microsoft.DataTransfer.Runtime.KustoConnector,'"
I looked in kusto for ingestion failures and I see this:
Mapping reference 'mapping1' of type 'mappingReference' in database '' could not be found.
Why am I seeing those errors even though I have an ingestion mapping on the table and what do I need to do to correct it?
It might be that the ingestion format specified in the ADF is not json.
Well, After I removed the mapping name in the sink section, it works.
Looks like the docs are not updated because it states that you can define both:
"ingestionMappingName Name of a pre-created mapping on a Kusto table. To map the columns from source to Azure Data Explorer (which applies to all supported source stores and formats, including CSV/JSON/Avro formats), you can use the copy activity column mapping (implicitly by name or explicitly as configured) and/or Azure Data Explorer mappings."

read sqlserver database using mirth connect and convert it into xml format and vice versa

I have a requirement where I have to read data from sql server local database and first map it in XML file provided by another third party org. who have their own database. Then once I have proper mapping of fields I have to transform the data from sql server database to XML format and vice versa.
So far, I am able to connect sqlserver database in mirthconnect however I dont know what steps are required to create in channels and transformer to carry the task of reading data and mapping corresponding fields to XML format provided by third party and finally writing in XML file provided and vice versa.
In short if I can get details of creating such channel in mirth connect where I can read sql server database and map the fields in corresponding xml file....I guess I can write to it. Same way applies if I go from xml format to sqlserver database. Can someone tell me how to accomplish this?
For database field mapping whats the best way to map fields entirely on two different databases is there any tool which can help....
Also once the task of transforming the data from one end to another is accomplished is there any way of validation in mirth connect that verifies that data is correctly moved from one to another?
If you want to process one row at a time, the normal database reader will work fine; just set the data type under Summary to XML for all steps. Set a destination of channel writer to nowhere and run it once to see what it does in the Dashboard. You can copy and paste that as an example into your message template so you can map variables.
If you want to work an entire result at one time in the Transformer steps, I find it easier to create a custom reader and use "FOR XML RAW, ELEMENTS" on the end of my Microsoft SQL query.
Something like:
//build connection
var dbConn = DatabaseConnectionFactory.createDatabaseConnection('com.microsoft.sqlserver.jdbc.SQLServerDriver','jdbc:sqlserver://servername:1433;databaseName=dbname;integratedSecurity=true;','',''); //this uses the MS JDBC driver and auth dll
//query results with XML output from server 'FOR XML' statement at end
var result = dbConn.executeCachedQuery("SELECT col1 AS FirstColumn, col2 AS SecondColumn FROM [dbname].[dbo].[table1] WHERE [processed] = 'False' FOR XML RAW, ELEMENTS");
//Make sure we are at the top of results
result.beforeFirst();
//wrap XML. Namespace etc. not required
XMLresult = '<message>';
//XML broke up across several rows in one column. Re-combine
while (result.next()) {
XMLresult += result.getString(1);
}
XMLresult += '</message>';
dbConn.close();
return XMLresult;