ADF - How to disable mapping in Copy Activity - azure-data-factory

I want to backup data from CosmosDB to Storage.
I found DB's data is different from Storage's data when data has a value ended with .000Z .
Data in CosmosDB like this:
{
"start": "2021-09-12T15:00:00.000Z",
"end": "2022-10-30T15:00:00.000Z",
}
Data in Storage like this:
{
"start": "2021-09-12T15:00:00Z",
"end": "2022-10-30T15:00:00Z",
}
How can I let them be same?

.000 represents the fraction of seconds in the timestamp and Z represents the UTC timezone in ISO-8601 date format. And, 00Z corresponds to midnight in Greenwich ONLY.
The recommended format for DateTime strings in Azure Cosmos DB is yyyy-MM-ddTHH:mm:ss.fffffffZ which follows the ISO 8601 UTC standard. Where, .fffffff seven-digit fractional seconds
You may enable or disable this Detect datetime property to get this as a string instead. Also, if you choose sink as .json there is very less option (such as ability to choose column format if available for .csv sink).
further you can checkout Configure Azure Cosmos DB account with periodic backup

You can see the mapping first to check what data types are getting mapped for this field
I had one such use case where data types were different and I tried following steps to resolve:
Click on Import in Mapping section of Copy Activity
Check if all columns are correctly mapped
Extract the JSON from Copy activity for mapping
Check the data types being mapped in the JSON
If the data type for the field is not matching then you will get different data
To match the data you will have to dynamically pass the JSON which is explained in detail in this tutorial: https://www.youtube.com/watch?v=b27gmOufge4

Related

Azure Data Factory schema mapping not working with SQL sink

I have a simple pipeline that loads data from a csv file to an Azure SQL db.
I have added a data flow where I have ensured all schema matches the SQL table. I have a specific field which contains numbers with leading zeros. The data type in the source - projection is set to string. The field is mapped to the SQL sink showing as string data-type. The field in SQL has nvarchar(50) data-type.
Once the pipeline is run, all the leading zeros are lost and the field appears to be treated as decimal:
Original data: 0012345
Inserted data: 12345.0
The CSV data shown in the data preview is showing correctly, however for some reason it loses its formatting during insert.
Any ideas how I can get it to insert correctly?
I had repro’d in my lab and was able to load as expected. Please see the below repro details.
Source file (CSV file):
Sink table (SQL table):
ADF:
Connect the data flow source to the CSV source file. As my file is in text format, all the source columns in the projection are in a string.
Source data preview:
Connect sink to Azure SQL database to load the data to the destination table.
Data in Azure SQL database table.
Note: You can all add derived columns before sink to convert the value to string as the sink data type is a string.
Thank you very much for your response.
As per your post the DF dataflow appears to be working correctly. I have finally discovered an issue with the transformation - I have an Azure batch service which runs a python script, which does a basic transformation and saves the output to a csv file.
Interestingly, when I preview the data in the dataflow, it looks as expected. However, the values stored in SQL are not.
For the sake of others having a similar issue, my existing python script used to convert a 'float' datatype column to string-type. Upon conversion, it used to retain 1 decimal number but as all of my numbers are integers, they were ending up with .0.
The solution was to convert values to integer and then to string:
df['col_name'] = df['col_name'].astype('Int64').astype('str')

Mapping Data Flow Common Data Model source connector datetime/timestamp columns nullified?

We are using Azure Data Factory Mapping data flow to read from Common Data Model (model.json).
We use dynamic pattern – where Entity is parameterised and we do not project any columns and we have selected Allow schema drift.
Problem: We are having issue with “Source” in mapping data flow (Source Type is Common Data Model). All the datetime/timestamp columns are read as null in source activity.
We also tried in projection tab Infer drifted column types where we provide a format for timestamp columns, However, it satisfies only certain timestamp columns - since in the source each datetime column has different timestamp format.
11/20/2020 12:45:01 PM
2020-11-20T03:18:45Z
2018-01-03T07:24:20.0000000+00:00
Question: How to prevent datetime columns becoming null? Ideally, we do not want Mapping Data Flow to typecast any columns - is there a way to just read all columns as string?
Some screenshots
In Projection tab - we do not specify schema - to allow schema drift and to dynamically load more than 1 entities.
In Data Preview tab
ModifiedOn, SinkCreatedOn, SinkModifiedOn - all these are system columns and will definitely have values in it.
This is now resolved on a separate conversation with Azure Data Factory team.
Firstly there is no way to 'stringfy' all the columns in Source, because CDM connector gets its metadata from model.json (if needed this file can be manipulated, however not ideal for my scenario).
To solve datetime/timestamp columns becoming null - under Projection tab we need to select Infer drifted column types and then you can add "multiple" time formats that you expect to come from CDM. You can either select from dropdown - if your particular datetime format is not listed in the dropdown (which was my case) then you can edit the code behind the data flow (i.e. data flow script), to add your format (see second screenshot).

Decimal number from JSON truncated when read from Mapping Data Flow

When reading a decimal number from JSON data files using a Mapping Data Flow, the decimal digits are truncated.
[Source data]
{
"value": 1123456789.12345678912345678912
}
In Data Factory, the source dataset is configured with no schema. The Mapping Data Flow projection defines a decimal data type with sufficient precision and scale.
[Mapping Data Flow script]
source(output(
value as decimal(35,20)
),
...
However, when viewing the value in the 'Data preview' window, or reviewing pipeline output, the value is truncated.
[Output]
1123456789.12345670000000000000
This issue doesn't occur with other file formats, such as delimited text.
When previewing the data from the source dataset, the decimal digits are truncated in the same way. This occurs whether or not a schema is set. If a schema is set, the data type is number rather than decimal since it's JSON. Mozilla Developer Network documentation calls out the varied number of decimal digits supported by browsers, so I wonder if this is down to the JSON parser being used.
Is this expected behaviour? Can Data Factory be configured to support a the full number of decimal places when working with JSON? Unfortunately this is calling into question whether it's viable to perform aggregate calculations in Data Factory.
I've created a same test as yours and got the same result as follows:
After I changed the soure data, I put double quotes on the value:
Then I use a toDecimal(Value,35,20) to convert the string type to decimal type:
It seems work well. So we can get conclusion:
Don't let ADF do default data type conversion, it will truncate the length of the value.
This issue doesn't occur with other file formats, such as delimited text. Because the default value is the string type.
Its common issue with JSON parser, FloatParseHandling setting is available in .net library but not in ADF.
FloatParseHandling can set to Decimal while parsing the file through .net.
Until the setting is made available in ADF need to try the workaround - using quotes ["] at both end to make the value string and get it converted after loading.

Lift load Dateformat issue from csv file

we are migrating db2 data to db2 on cloud. We are using below lift cli operation for migration.
Extracting a database table to a CSV file using lift extract from source database.
Then loading the extracted CSV file to db2 on cloud using 'lift load'
ISSUE:
We have created some tables using ddl on the target db2oncloud which have some columns with DATA TYPE "TIMESTAMP"
while load operation(lift load), we are getting below error"
"MESSAGE": "The field in row \"2\", column \"8\" which begins with
\"\"2018-08-08-04.35.58.597660\"\" does not match the user specified
DATEFORMAT, TIMEFORMAT, or TIMESTAMPFORMAT. The row will be
rejected.", "SQLCODE": "SQL3191W"
If you use db2 as a source database, then use either:
the following property during export (to export dates, times, timestamps as usual for db2 utilities - without double quotes):
source-database-type=db2
try to use the following property during load, if you have already
exported timestamps surrounded by double quotes:
timestamp-format="YYYY-MM-DD-HH24.MI.SS.FFFFFF"
If the data was extracted using lift extract then for sure you should load the data with source-database-type=db2. Using this parameter will preconfigure all the necessary load details automatically.

TIMESTAMP type in sqlite DB

I am now building an iPhone app and it involves core data. One of the entities has an attribute with Date type, which effectively generates a column with TIMESTAMP type in the corresponding sqlite DB. The value looks something like 320928592.400471
My question is... how can I convert ordinary datetime into the TIMESTAMP type? I would like to preload some static data to the DB. Therefore, I need to know how to store the data directly to the DB.
Chances are that number is the same number returned by NSDate's timeIntervalSinceReferenceDate, i.e. seconds since 1 January 2001.
It might be easier to either populate the database on the first run of your program, or to generate the prefilled database and export it from your phone to include in the bundle.