How to clone a data flow in Synapse? - azure-data-factory

How to clone a data flow in Synapse?
I see that in Azure Data Factory there is a possibility to clone a data flow by going to the Data Flows folder and right-clicking a data flow. But in Synapse I don't find a way to see the list of the data flows...

Posting this as an answer for other community members.
In Synapse, you can find the list of data flows in the Develop pane.
You can also create data flows here: go to Develop -> + -> Data flow.
You can create a clone of a data flow from this list. The list contains all data flows, including those created using the Data flow activity in a pipeline.
The clone will also be reflected in the Data flow activity of the pipeline.

Related

Azure IoT hub message routing push with blob index tags

I have a setup in which devices send data to Azure IoT Hub using message routing (to storage endpoints), which ends up as blobs in a container. The frequency of data pushes is high. On the other end, I want to be able to query my blob container to pull files based on specific dates.
I came across blob index tags, which look like a promising solution for querying and are supported by the Azure SDK for .NET.
I was thinking of adding tags to each blob, e.g. processedDate: <dd/mm/yyyy>, which would help me query on them later.
I found that it is possible to add the tags when uploading blobs manually, but I am not sure how or where to configure the same in the message routing flow, where blobs are created on the fly. So I am looking for a solution to add those tags in flight as the blobs are being pushed to the container.
Any help on this will be much appreciated.
Thanks much!
Presently, Azure IoT Hub doesn't have a feature to populate metadata such as headers, properties, or tags on a custom endpoint.
However, for a storage custom endpoint like yours, you can use an Event Grid-triggered Azure Function to tag the blobs based on your needs, as sketched below.
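For illustration only, here is a minimal sketch of such a function using the .NET in-process model with Azure.Messaging.EventGrid and Azure.Storage.Blobs. The function name, the RoutedStorageConnection app setting, and the processedDate tag value are assumptions; the function is expected to be subscribed to the storage account's "Blob Created" events.

```csharp
// Sketch: an Event Grid-triggered function that tags blobs created by
// IoT Hub message routing. Assumes a "Blob Created" event subscription on
// the routed-to storage account and a connection string app setting
// named RoutedStorageConnection (both are assumptions, not given in the post).
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Azure.Messaging.EventGrid;
using Azure.Messaging.EventGrid.SystemEvents;
using Azure.Storage.Blobs;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.EventGrid;
using Microsoft.Extensions.Logging;

public static class TagRoutedBlobs // hypothetical name
{
    [FunctionName("TagRoutedBlobs")]
    public static async Task Run([EventGridTrigger] EventGridEvent eventGridEvent, ILogger log)
    {
        // Only handle "blob created" events.
        if (!eventGridEvent.TryGetSystemEventData(out object systemEvent) ||
            systemEvent is not StorageBlobCreatedEventData blobCreated)
        {
            return;
        }

        // Parse container/blob names from the event's blob URL.
        var blobUri = new BlobUriBuilder(new Uri(blobCreated.Url));
        string connectionString = Environment.GetEnvironmentVariable("RoutedStorageConnection");
        var container = new BlobContainerClient(connectionString, blobUri.BlobContainerName);
        var blob = container.GetBlobClient(blobUri.BlobName);

        // Blob index tags that can later be queried with a tag filter.
        var tags = new Dictionary<string, string>
        {
            ["processedDate"] = DateTime.UtcNow.ToString("yyyy-MM-dd")
        };
        await blob.SetTagsAsync(tags);
        log.LogInformation("Tagged blob {Blob}", blobUri.BlobName);
    }
}
```

Note that blob index tag filters compare values as strings, so an ISO-style date (yyyy-MM-dd) will query more predictably than dd/mm/yyyy.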

What is the difference between the ADF UX and the ADF service?

While going through the documentation for 'source control in ADF', I came across the following information in the introduction paragraph.
By default, the Azure Data Factory user interface experience (UX) authors directly against the data factory service. This experience has the following limitations:
The Data Factory service doesn't include a repository for storing the JSON entities for your changes. The only way to save changes is via the Publish All button and all changes are published directly to the data factory service.
The Data Factory service isn't optimized for collaboration and version control.
The Azure Resource Manager template required to deploy Data Factory itself is not included.
From the highlighted phrase, what concerns me is how to understand the difference between the ADF service and the ADF UI. I couldn't find any relevant information through a Google search.
Would anyone please help me understand? I have attached the web link for the source of the document.
Thank you for your time and support
The ADF UI is the authoring tool. By default it isn't connected to Azure DevOps or GitHub for source control. This is called "Live Mode", and changes are saved directly to the ADF service (meaning "saved to the server").
The preferred way of working in the ADF UI is to connect it to an Azure DevOps or GitHub repo for source control. (The link you provided describes this.) This allows you to save intermediate progress, even if the pipeline doesn't yet validate, and it allows you to use source control features like branching, merging, and pull request approvals. Only when you merge to the main branch and then Publish do the changes get saved to the ADF service. Until then you can debug pipelines, including the changes you made in your branch. But the version of the pipeline that runs from a trigger (like a schedule trigger) is the published version, not what's in the main branch or in your branch.

Tool for Azure Cognitive Search similar to Logstash?

My company has lots of data (database: PostgreSQL), and now the requirement is to add a search feature to it. We have been asked to use Azure Cognitive Search.
I want to know how we can transform the data and send it to the Azure search engine.
There are a few cases we have to handle:
1. How will we transfer and upload the existing data into the search engine's index?
2. What will be the easy way to update the data in the search engine with new records from our production database? (For now we are using Java back-end code for transforming the data and updating the index, but it is very time consuming.)
3. What will be the best way to handle an update to the existing database structure? How will we update the indexer without doing lots of work by recreating the indexers every time?
4. Is there any way we can automatically update the index whenever there is a change in database records?
You can either write code to push data from your PostgreSQL database into the Azure Search index via the /docs/index API, or you can configure an Azure Search indexer to do the data ingestion. The upside of configuring an indexer is that you can also configure it to monitor the data source on a schedule for updates, and have those updates reflected in the search index automatically, for example via the SQL Integrated Change Tracking Policy.
PostgreSQL is a supported data source for Azure Search indexers, although the data source is in preview (not yet generally available). A minimal sketch of the push approach follows.
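As a rough sketch of the push approach (the /docs/index API) using the Azure.Search.Documents .NET SDK: the Product model, the index name, and the environment variable names below are assumptions, to be mapped onto your own PostgreSQL rows and index schema.

```csharp
// Sketch of pushing documents into an Azure Cognitive Search index.
// Index name, model shape, and env vars are assumptions for illustration.
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Azure;
using Azure.Search.Documents;
using Azure.Search.Documents.Models;

public class Product
{
    public string Id { get; set; }          // key field in the index
    public string Name { get; set; }
    public string Description { get; set; }
}

public static class SearchPush
{
    public static async Task UpsertAsync(IEnumerable<Product> changedRows)
    {
        var client = new SearchClient(
            new Uri(Environment.GetEnvironmentVariable("SEARCH_ENDPOINT")),
            "products-index", // assumed index name
            new AzureKeyCredential(Environment.GetEnvironmentVariable("SEARCH_ADMIN_KEY")));

        // MergeOrUpload inserts new documents and updates existing ones,
        // so the same call works for the initial load and incremental updates.
        IndexDocumentsBatch<Product> batch = IndexDocumentsBatch.MergeOrUpload(changedRows);
        IndexDocumentsResult result = await client.IndexDocumentsAsync(batch);

        foreach (IndexingResult r in result.Results)
        {
            if (!r.Succeeded)
            {
                Console.WriteLine($"Failed to index {r.Key}: {r.ErrorMessage}");
            }
        }
    }
}
```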
Besides the answer above, which involves coding on your end, there is a solution you may implement using the Azure Data Factory PostgreSQL connector with a custom query that tracks recent records, plus a pipeline activity that sinks to an Azure Blob Storage account.
Then, within Data Factory, you can link to a pipeline activity that copies to an Azure Cognitive Search index, and add a trigger to the pipeline to run at specified times.
Once the staged data is in the storage account in delimitedText format, you can also use the built-in Azure Blob indexer with change tracking enabled, as in the sketch below.
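For reference, here is a hedged sketch of creating that blob data source and indexer with the Azure.Search.Documents SDK; all of the names (data source, container, target index) and the hourly schedule are assumptions. Blob data sources track each blob's LastModified timestamp, so only new or changed blobs are re-ingested on each run.

```csharp
// Sketch: wiring up a blob data source and indexer for the CSV files that
// the Data Factory pipeline stages in Blob Storage. Names are assumptions.
using System;
using System.Threading.Tasks;
using Azure;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Indexes.Models;

public static class BlobIndexerSetup
{
    public static async Task CreateAsync()
    {
        var indexerClient = new SearchIndexerClient(
            new Uri(Environment.GetEnvironmentVariable("SEARCH_ENDPOINT")),
            new AzureKeyCredential(Environment.GetEnvironmentVariable("SEARCH_ADMIN_KEY")));

        // Data source pointing at the container the ADF pipeline writes to.
        var dataSource = new SearchIndexerDataSourceConnection(
            "staged-products-blobs",                      // data source name (assumed)
            SearchIndexerDataSourceType.AzureBlob,
            Environment.GetEnvironmentVariable("STORAGE_CONNECTION"),
            new SearchIndexerDataContainer("staged-products")); // container (assumed)
        await indexerClient.CreateOrUpdateDataSourceConnectionAsync(dataSource);

        // Indexer that parses the delimited-text blobs and runs on a schedule.
        var indexer = new SearchIndexer(
            "staged-products-indexer",                    // indexer name (assumed)
            dataSource.Name,
            "products-index")                             // target index (assumed)
        {
            Schedule = new IndexingSchedule(TimeSpan.FromHours(1)),
            Parameters = new IndexingParameters
            {
                IndexingParametersConfiguration = new IndexingParametersConfiguration
                {
                    ParsingMode = BlobIndexerParsingMode.DelimitedText,
                    FirstLineContainsHeaders = true
                }
            }
        };
        await indexerClient.CreateOrUpdateIndexerAsync(indexer);
    }
}
```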

Is there any way to call the Bing Ads API through a pipeline and load the data into BigQuery through Google Data Fusion?

I'm creating a pipeline in Google Data Fusion that allows me to export my Bing Ads data into BigQuery using my Bing Ads developer token. I couldn't find any data source in Data Fusion that I could add to my pipeline for this. Is fetching data from API calls even supported in Google Data Fusion, and if it is, how can it be done?
HTTP based sources for Cloud Data Fusion are currently in development and will be released by Q3. Could you elaborate on your use case a little more, so we can make sure that your requirements will be covered by those plugins? For example, are you looking to build a batch or real-time pipeline?
In the meantime, you have the following two more immediate options/workarounds:
If you are OK with storing the data in a staging area in GCS before loading it into BigQuery, you can use the HTTPToHDFS plugin that is available in the Hub. Use a path that starts with gs:///path/to/file
Alternatively, we also welcome contributions, so you can also build the plugin using the Cloud Data Fusion APIs. We are happy to guide you, and can point you to documentation and samples.

Use Azure to GET from RESTful API

I would like to use Azure to retrieve JSON data from a REST API and then store that data in a table. Data retrieval would occur daily, and a parameter would be passed to the API to restrict the results to the prior day's data.
Which Azure component/mechanism should I use for calling the API?
The data would be the foundation for a data warehouse. Should I use an Azure SQL table or Azure Table storage?
I have recently begun exploring Azure and am not sure how to do this.
I look forward to feedback.
Thank you.
Take a look at Azure Functions. You can create an Azure Function that is invoked periodically (for example, on a timer trigger); it has input bindings for different sources (or you can add some C# code to read from a URL), and it can then place the results into an Azure SQL database. A minimal sketch follows the link below.
Here is an example of an Azure Function that sends JSON to a stored procedure:
https://www.codeproject.com/Articles/1169531/Sending-events-from-Azure-Event-Hub-to-Azure-SQL-D
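As a minimal sketch (not the linked article's code), the following timer-triggered C# function calls a hypothetical REST API for the prior day's data and lands the raw JSON in an Azure SQL table; the URL, query parameter, table name, and app setting names are all assumptions.

```csharp
// Sketch: a daily timer-triggered function that GETs the prior day's data
// from a REST API and inserts the raw JSON into an Azure SQL table.
// api.example.com, dbo.ApiLanding, and SqlConnectionString are assumptions.
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Data.SqlClient;
using Microsoft.Extensions.Logging;

public static class DailyApiLoad
{
    private static readonly HttpClient Http = new HttpClient();

    // Runs every day at 02:00 UTC.
    [FunctionName("DailyApiLoad")]
    public static async Task Run([TimerTrigger("0 0 2 * * *")] TimerInfo timer, ILogger log)
    {
        string priorDay = DateTime.UtcNow.AddDays(-1).ToString("yyyy-MM-dd");
        string url = $"https://api.example.com/data?date={priorDay}"; // hypothetical API
        string json = await Http.GetStringAsync(url);

        string connectionString = Environment.GetEnvironmentVariable("SqlConnectionString");
        using var connection = new SqlConnection(connectionString);
        await connection.OpenAsync();

        // Land the payload as-is; downstream warehouse jobs can shred the JSON
        // (e.g. with OPENJSON) into properly typed tables.
        using var command = new SqlCommand(
            "INSERT INTO dbo.ApiLanding (LoadDate, Payload) VALUES (@loadDate, @payload)",
            connection);
        command.Parameters.AddWithValue("@loadDate", priorDay);
        command.Parameters.AddWithValue("@payload", json);
        await command.ExecuteNonQueryAsync();

        log.LogInformation("Loaded API data for {Date}", priorDay);
    }
}
```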