I have a Lookup activity that queries Salesforce. I want to store all of its results in Blob Storage. Any idea how to do that?
I have one scenario to implement: I am reading files in ADLS using ADF and copying them to another location. While doing that, I need to capture the file names and store them in Cosmos DB. Has anybody implemented such a scenario? Any suggestions are welcome. Thanks in advance.
You can use the Get Metadata activity to get file metadata such as the file name. You can then push this value to a collection in your Cosmos DB database.
You can use the Get Metadata activity to get the file names, then create an Azure Function activity and pass the file names to it. Finally, you need to generate an id property and insert each document into the Cosmos DB container.
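As a rough sketch of the Azure Function step, the Python below turns a Get Metadata output into documents that each carry a generated id. It assumes Get Metadata is configured with the Child items field, which produces a childItems array of name/type entries. The fileName and sourceFolder property names are hypothetical, and the actual write to Cosmos DB (for example ContainerProxy.upsert_item in the azure-cosmos SDK) is only indicated in a comment so the sketch runs without Azure.

```python
import json
import uuid


def build_file_documents(get_metadata_output: dict, source_folder: str) -> list[dict]:
    """Turn a Get Metadata activity output into Cosmos DB documents.

    Assumes the Get Metadata activity uses the "Child items" field, so its
    output contains a "childItems" list of {"name": ..., "type": ...}.
    """
    documents = []
    for item in get_metadata_output.get("childItems", []):
        # Skip sub-folders; only files are recorded.
        if item.get("type") != "File":
            continue
        documents.append(
            {
                # Cosmos DB requires every item to have a string "id" property.
                "id": str(uuid.uuid4()),
                "fileName": item["name"],          # hypothetical property name
                "sourceFolder": source_folder,     # hypothetical extra property
            }
        )
    return documents


if __name__ == "__main__":
    sample = json.loads(
        '{"childItems": [{"name": "sales.csv", "type": "File"},'
        ' {"name": "archive", "type": "Folder"},'
        ' {"name": "orders.csv", "type": "File"}]}'
    )
    for doc in build_file_documents(sample, "raw/input"):
        # In a real Azure Function you would write the document here, e.g.
        # container.upsert_item(doc) with the azure-cosmos SDK.
        print(doc)
```

Generating the id with uuid4 keeps documents unique even when the same file name is copied twice; if you would rather overwrite on re-runs, deriving the id from the folder and file name would make the upsert idempotent.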
I'm using an ADF pipeline to copy data from the data lake to Blob Storage and then from Blob Storage to Table Storage.
As you can see below, these are the column types in the ADF Data Flow sink (Blob Storage): integer, string, and timestamp:
Here are the mapping settings in the Copy data activity:
When I check the output in Table Storage, all columns are of string type:
Why is Table Storage saving the data as string values? How can I make Table Storage store the columns with the correct types (integer, string, timestamp)? Please let me know. Thank you!
Usually, when you load data from Blob Storage in Data Factory, every column in the blob file defaults to the String type, and Data Factory converts the data types automatically for the sink.
However, this does not cover every case.
I tested copying data from Blob Storage to Table Storage and found that if you don't specify the data types manually in the source, all columns end up as String in the sink (Table Storage) after the pipeline runs.
For example, this is my source blob file:
If I don't change the source data types, everything seems fine in the sink table:
But after the pipeline runs, the data types in Table Storage are all String:
If we change the data types in the source blob dataset manually, it works!
On another note, I'm a little confused by your screenshot: it looks like the Mapping Data Flow sink UI, but Mapping Data Flow doesn't support Table Storage as a sink.
Hope this helps.
Finally figured out the issue: I was using the DelimitedText format for Blob Storage. After switching to the Parquet format, the data is written to Table Storage with the correct types.
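The underlying reason is the file format: a delimited-text file carries no column types, so whatever reads it sees only strings unless a schema is applied, whereas Parquet stores the schema inside the file. The Python below is only an analogy for that behaviour, not ADF's internal logic: csv.DictReader returns every value as str, and the hypothetical SCHEMA dict stands in for setting the column types manually on the source dataset.

```python
import csv
import io
from datetime import datetime

# A delimited-text file has no type information: every field is just text.
CSV_DATA = "id,name,created\n1,alpha,2021-03-01T10:00:00\n2,beta,2021-03-02T11:30:00\n"

# Hypothetical explicit schema, playing the role of setting column types on the source.
SCHEMA = {
    "id": int,
    "name": str,
    "created": datetime.fromisoformat,
}


def read_untyped(text: str) -> list[dict]:
    """Read rows as-is: every value comes back as a str."""
    return list(csv.DictReader(io.StringIO(text)))


def read_typed(text: str, schema: dict) -> list[dict]:
    """Read rows and cast each column with the converter from the schema."""
    return [
        {column: schema[column](value) for column, value in row.items()}
        for row in read_untyped(text)
    ]


if __name__ == "__main__":
    untyped = read_untyped(CSV_DATA)
    typed = read_typed(CSV_DATA, SCHEMA)
    print({k: type(v).__name__ for k, v in untyped[0].items()})
    # prints {'id': 'str', 'name': 'str', 'created': 'str'}
    print({k: type(v).__name__ for k, v in typed[0].items()})
    # prints {'id': 'int', 'name': 'str', 'created': 'datetime'}
```

Either fix from this thread amounts to the same thing: supply the types explicitly on the source, or use a format such as Parquet that carries them.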
Here is what my ADF pipeline looks like. In the Data Flow, I read some data from a source, apply a filter and a join, and write the data to a sink. My plan was to use Azure Table Storage as the sink. However, according to https://github.com/MicrosoftDocs/azure-docs/issues/34981, ADF Data Flow does not support Azure Table Storage as a sink. Is there an alternative way to use Azure Table Storage as the sink of a Data Flow?
No, it is not possible. Azure Table Storage cannot be used as a data flow sink.
Only these six dataset types are allowed:
And there are further limits: when used as a data flow sink, Azure Blob Storage and Azure Data Lake Storage Gen1 and Gen2 support only four formats: JSON, Avro, Text, and Parquet.
At least for now, your idea is not a viable solution.
For more information, have a look at this official doc:
https://learn.microsoft.com/en-us/azure/data-factory/data-flow-sink#supported-sink-connectors-in-mapping-data-flow
It still isn't possible today. One option (we are currently solving a similar case this way) is to use Blob Storage as a temporary destination.
The data flow stores its result in Blob Storage. The transformations in the data flow process the source data and prepare it for Table Storage, e.g. the PartitionKey, RowKey, and all other columns are already present.
A subsequent Copy activity then moves the data from Blob Storage into Table Storage.
The marked part of the pipeline does exactly this:
"Full Orders" runs the data flow.
"to Table Storage" is the Copy activity that moves the data into Table Storage.
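To illustrate what "prepared for Table Storage" means, here is a Python sketch of the row shape the data flow would write to the staging blob before the Copy activity picks it up: every row carries string PartitionKey and RowKey columns. The column names customerId, orderId, and amount, and the choice of keys, are hypothetical; in the real pipeline this mapping would be done with Derived Column transformations inside the data flow, and JSON is used here only because it is one of the sink formats listed above.

```python
import json

# Characters that Azure Table Storage does not allow in PartitionKey/RowKey values.
FORBIDDEN_KEY_CHARS = set("/\\#?")


def to_table_entity(order: dict) -> dict:
    """Map one order row to a Table Storage entity shape.

    Hypothetical key choice: partition by customer, one row per order.
    """
    partition_key = str(order["customerId"])
    row_key = str(order["orderId"])
    for key in (partition_key, row_key):
        if FORBIDDEN_KEY_CHARS & set(key):
            raise ValueError(f"invalid character in key: {key!r}")
    entity = {"PartitionKey": partition_key, "RowKey": row_key}
    # Copy the remaining columns through unchanged.
    entity.update(
        {k: v for k, v in order.items() if k not in ("customerId", "orderId")}
    )
    return entity


def to_json_lines(orders: list[dict]) -> str:
    """Serialize entities as JSON lines, one entity per line (the staging file)."""
    return "\n".join(json.dumps(to_table_entity(o)) for o in orders)


if __name__ == "__main__":
    orders = [
        {"customerId": "C001", "orderId": 1001, "amount": 25.5},
        {"customerId": "C002", "orderId": 1002, "amount": 12.0},
    ]
    print(to_json_lines(orders))
```

Doing the key derivation in the data flow keeps the Copy activity a plain column-to-column mapping, which is what makes the second step straightforward.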
I have a Copy data activity with an on-premises SQL Server as the source and ADLS Gen2 as the sink. A control table supplies the tableName, watermarkDateColumn, and watermarkDatetime used to pull incremental data from the source database.
After the data is pulled and loaded into the sink, I want to get the max of the watermarkDateColumn in my dataset. Can it be obtained from @activity('copyActivity1').output?
I'm not allowed to use an extra Lookup activity to query the source table for max(watermarkDateColumn) in the pipeline.
The Copy activity can only be used for data movement, not for aggregation, so @activity('copyActivity1').output won't help. Since you said you can't use a Lookup activity, I'm afraid your requirement can't be met at the moment.
If you prefer not to use additional activities, I suggest using a Data Flow activity instead, which is more flexible. It has a built-in aggregation feature (the Aggregate transformation).
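Conceptually, the Data Flow route filters the source on the stored watermark and aggregates max(watermarkDateColumn) over the same rows it writes, so the new watermark comes out of the same pass. The Python sketch below shows only that logic, assuming rows arrive as dicts; it is not ADF code. In a data flow the two steps would roughly correspond to a Filter and an Aggregate transformation, and the column name modified in the test data is hypothetical.

```python
from datetime import datetime


def incremental_batch(rows: list[dict], watermark_column: str, last_watermark: datetime):
    """Select rows newer than the stored watermark and compute the new watermark.

    Mirrors the control-table pattern: keep rows where
    watermarkDateColumn > watermarkDatetime, then take
    max(watermarkDateColumn) over exactly the rows that were loaded.
    """
    new_rows = [r for r in rows if r[watermark_column] > last_watermark]
    # If nothing new arrived, keep the old watermark instead of failing on max([]).
    new_watermark = max(
        (r[watermark_column] for r in new_rows), default=last_watermark
    )
    return new_rows, new_watermark


if __name__ == "__main__":
    source = [
        {"id": 1, "modified": datetime(2021, 1, 1)},
        {"id": 2, "modified": datetime(2021, 1, 5)},
        {"id": 3, "modified": datetime(2021, 1, 3)},
    ]
    batch, watermark = incremental_batch(source, "modified", datetime(2021, 1, 2))
    print([r["id"] for r in batch], watermark)
```

Computing the maximum over the loaded rows rather than re-querying the source avoids a race where rows arriving between the copy and a separate max() query would be skipped on the next run.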
I have a ForEach activity that uses a Copy activity with an HTTP source and a Blob Storage sink to download a JSON file for each item. The HTTP source is set to Binary Copy, whereas the Blob Storage sink is not, since I want to copy the complete JSON file to Blob Storage and also extract some data from each JSON file and store it in a database table in the following activity.
In the Blob Storage sink, I've added a column definition that extracts some data from the JSON file. The JSON files are stored in Blob Storage successfully, but how can I access the extracted data in the subsequent Stored Procedure activity?
You can try using a Lookup activity to read the data from the blob and use its output in a later activity. Refer to https://learn.microsoft.com/en-us/azure/data-factory/control-flow-lookup-activity to check whether the Lookup activity satisfies your requirements.
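For orientation, the Lookup activity returns {"firstRow": {...}} when "First row only" is enabled and {"count": n, "value": [...]} otherwise. The Python sketch below mimics picking stored procedure parameters out of either shape; the field names (orderId, total) and the activity name Lookup1 mentioned in the comments are hypothetical, and in the pipeline itself each parameter would be set with an expression rather than code.

```python
import json


def stored_procedure_params(lookup_output: dict, fields: list[str]) -> dict:
    """Pick the values a stored procedure needs out of a Lookup activity output.

    Handles both output shapes and uses the first row. The field names
    passed in are hypothetical.
    """
    if "firstRow" in lookup_output:
        row = lookup_output["firstRow"]
    else:
        rows = lookup_output.get("value", [])
        if not rows:
            raise ValueError("Lookup returned no rows")
        row = rows[0]
    # In the pipeline, each parameter would instead be an expression such as
    # @activity('Lookup1').output.firstRow.orderId (activity name is hypothetical).
    return {name: row[name] for name in fields}


if __name__ == "__main__":
    output = json.loads('{"firstRow": {"orderId": "A1", "total": 9.5, "status": "new"}}')
    print(stored_procedure_params(output, ["orderId", "total"]))
```

Inside a ForEach, the Lookup would point at the blob written in the same iteration, so each stored procedure call receives the values extracted from that item's JSON file.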