Is it possible to catalog data inside CSV files in Azure Blob Storage using Azure Data Catalog?

I want to catalog data stored in CSV files in Azure Blob Storage. I looked for a way to get metadata from Blob Storage and found that Data Catalog is an option. The problem is that a CSV file is handled as a blob type, so it cannot be profiled. I want CSV files in Blob Storage to act as tables.
Is this possible using Azure Data Catalog?

Yes, you can use Data Catalog, but for updated Data Catalog features, please use the new Azure Purview service, which offers unified data governance for your entire data estate. I would recommend using Azure Purview (although this is still possible through Data Catalog).
Registering assets from a data source copies the assets’ metadata to Azure, but the data remains in the existing data-source location.
Introduction to Azure Purview (preview) - Azure Purview
This article provides an overview of Azure Purview, including its features and the problems it addresses. Azure Purview enables any user to register, discover, understand, and consume data sources.
This article outlines how to register an Azure Blob Storage account in Purview and set up a scan.
For more information on blob index tags: index tags categorize data in your storage account using key-value tag attributes. These tags are automatically indexed and exposed as a searchable multi-dimensional index so you can easily find data. The article "Use blob index tags to manage and find data on Azure Blob Storage" shows you how to set, get, and find data using blob index tags.
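As a small illustration of those docs, here is a minimal sketch of setting blob index tags on a CSV blob and then finding blobs by tag with the azure-storage-blob Python SDK; the connection string, container and tag names are assumptions, not values from the question:

```python
# Sketch: set blob index tags on a CSV blob and then find blobs by tag,
# using the azure-storage-blob Python SDK. Connection string, container
# and tag names are illustrative assumptions.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<storage-connection-string>")
container = service.get_container_client("csv-data")  # assumed container

# Tag an existing CSV blob with key-value attributes.
blob = container.get_blob_client("sales/2023-01.csv")
blob.set_blob_tags({"dataset": "sales", "format": "csv"})

# Find blobs across the account whose index tags match a filter expression.
for item in service.find_blobs_by_tags("\"dataset\" = 'sales'"):
    print(item.name)
```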

Related

Azure IoT Hub message routing push with blob index tags

I have a setup in which devices send data to Azure IoT Hub, and message routing (to storage endpoints) lands the data as blobs in a container. The frequency of data pushes is high. On the other end, I want to be able to query my blob container to pull files based on specific dates.
I came across blob index tags, which look like a promising solution for querying and are supported by the Azure SDK for .NET.
I was thinking of adding tags to each blob, e.g. processedDate: <dd/mm/yyyy>, which would help me query on them later.
I found out that when uploading blobs manually it is possible to add tags, but I am not sure how or where to configure the same in the message-routing flow, where blobs are created on the fly. So I am looking for a solution to add those tags in flight as they are pushed to the container.
Any help on this will be much appreciated.
Thanks much!
Presently, Azure IoT Hub does not have a feature to populate custom endpoint attributes such as headers, properties, or tags.
However, for a storage custom endpoint such as yours, you can use an EventGridTrigger function to populate the blob's tags based on your needs.
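As an illustration of that approach, here is a hedged sketch of an Event Grid-triggered Azure Function (shown in Python for brevity) that tags blobs as they are created by the IoT Hub route; the app setting name and tag key are assumptions:

```python
# Sketch of an Event Grid-triggered Azure Function (Python) that reacts to
# Microsoft.Storage.BlobCreated events raised for the routing container and
# tags each new blob. The app setting name and tag key are assumptions.
import os
from datetime import datetime, timezone

import azure.functions as func
from azure.storage.blob import BlobClient

def main(event: func.EventGridEvent):
    # The BlobCreated event payload carries the URL of the blob IoT Hub just wrote.
    blob_url = event.get_json()["url"]

    blob = BlobClient.from_blob_url(
        blob_url,
        credential=os.environ["STORAGE_ACCOUNT_KEY"],  # assumed app setting
    )

    # Tag the blob "in flight" so it can be queried later by processed date.
    tag_value = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    blob.set_blob_tags({"processedDate": tag_value})
```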

Tool for Azure cognitive search similar to Logstash?

My company has lots of data (in a PostgreSQL database) and the requirement now is to add a search feature over it; we have been asked to use Azure Cognitive Search.
I want to know how we can transform the data and send it to the Azure search engine.
There are a few cases we have to handle:
1. How will we transfer and upload the existing data into the search engine index?
2. What is the easiest way to keep the search engine updated with new records in our production database? (For now we are using Java back-end code to transform the data and update the index, but it is very time consuming.)
3. What is the best way to handle changes to the existing database structure? How do we update the indexer without a lot of rework recreating the indexers every time?
4. Is there any way to automatically update the index whenever there is a change in database records?
You can either write code to push data from your PostgreSQL database into the Azure Search index via the /docs/index API, or you can configure an Azure Search indexer to do the data ingestion. The upside of configuring an indexer is that you can also have it monitor the data source on a schedule for updates and reflect those updates in the search index automatically, for example via the SQL Integrated Change Tracking Policy.
PostgreSQL is a supported data source for Azure Search indexers, although that data source is in preview (not yet generally available).
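As a minimal sketch of the push option, assuming an existing index and made-up field names, the azure-search-documents Python SDK can upload transformed rows like this:

```python
# Sketch of the push approach using the azure-search-documents Python SDK.
# The service endpoint, index name and document fields are assumptions; map
# the fields to whatever you extract from PostgreSQL.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

client = SearchClient(
    endpoint="https://<your-service>.search.windows.net",
    index_name="products",                       # assumed existing index
    credential=AzureKeyCredential("<admin-key>"),
)

# Rows read from PostgreSQL, transformed into documents matching the index schema.
docs = [
    {"id": "1", "name": "Widget", "description": "A sample product"},
    {"id": "2", "name": "Gadget", "description": "Another sample product"},
]

# merge_or_upload acts as an upsert, so re-running it for changed rows updates the index.
results = client.merge_or_upload_documents(documents=docs)
print([r.succeeded for r in results])
```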
Besides the answer above, which involves coding on your end, there is a solution you can implement using the Azure Data Factory PostgreSQL connector: a custom query that tracks recent records, with a pipeline activity that sinks the results to an Azure Blob Storage account.
Then, within Data Factory, you can link a pipeline activity that copies to an Azure Cognitive Search index, and add a trigger to run the pipeline at specified times.
Once the staged data is in the storage account in delimitedText format, you can also use the built-in Azure Blob indexer with change tracking enabled.
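If you go the staged-blob route, here is a hedged sketch of creating the blob data source and indexer programmatically with the azure-search-documents Python SDK; all names and connection strings are placeholders, and the same setup can be done in the portal:

```python
# Sketch of wiring up the built-in blob indexer programmatically with the
# azure-search-documents Python SDK. Names, connection strings and the target
# index are placeholders.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexerClient
from azure.search.documents.indexes.models import (
    SearchIndexer,
    SearchIndexerDataContainer,
    SearchIndexerDataSourceConnection,
)

client = SearchIndexerClient(
    endpoint="https://<your-service>.search.windows.net",
    credential=AzureKeyCredential("<admin-key>"),
)

# Data source pointing at the container where Data Factory stages the delimitedText files.
data_source = SearchIndexerDataSourceConnection(
    name="staged-postgres-ds",
    type="azureblob",
    connection_string="<storage-connection-string>",
    container=SearchIndexerDataContainer(name="staged-data"),
)
client.create_data_source_connection(data_source)

# Indexer that ingests the staged files into an existing index.
# (delimitedText parsing mode and a schedule would be set via the indexer's
# parameters/schedule properties; omitted here for brevity.)
indexer = SearchIndexer(
    name="staged-postgres-indexer",
    data_source_name="staged-postgres-ds",
    target_index_name="products",   # assumed existing index
)
client.create_indexer(indexer)
client.run_indexer("staged-postgres-indexer")
```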

Does Azure Purview have Federated Data Catalogue capability?

I am exploring options to create a data mesh using Azure-native services and need a federated data catalogue. Could the following be accomplished:
Create the Azure Data Platform 1 with Purview Data Catalogue 1 cataloging the related data assets
Create the Azure Data Platform 2 with Purview Data Catalogue 2 cataloging the related data assets
Create a Federated Azure Purview Data Catalogue 3, where Purview Data Catalogues 1 & 2 sync their contents automatically to Purview Data Catalogue 3.
Allow users to search all the company assets from Purview Data Catalogue 3 and request access to the data from Azure Data Platform 1 or Azure Data Platform 2 to consume the data.
Thanks
CK
There is currently no built-in feature for this in Azure Purview.
The recommended approach would be to use Collections and Data Access Policies to support access control.
If you must support scanning via one Purview instance and copying to another, you may subscribe to data catalog events using the Purview Kafka Endpoint and then use some compute to relay them to the other catalog.
You should also consider doing a one-time Purview migration if Purview Data Catalog 1 and 2 already have scanned assets.
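To illustrate the relay idea, here is a heavily hedged sketch that reads entity notifications from one Purview account's Event Hubs-compatible (Kafka) endpoint using the azure-eventhub Python SDK; the connection string, consumer group, and how you forward each event to the other catalog are all assumptions:

```python
# Sketch of the relay idea: read entity notifications from Purview Data
# Catalogue 1's Event Hubs-compatible endpoint with azure-eventhub.
# Connection string, consumer group and the forwarding step are assumptions.
from azure.eventhub import EventHubConsumerClient

consumer = EventHubConsumerClient.from_connection_string(
    conn_str="<purview-1-eventhubs-connection-string>",
    consumer_group="$Default",
    eventhub_name="atlas_entities",   # topic publishing entity change notifications
)

def on_event(partition_context, event):
    notification = event.body_as_str()
    # Placeholder: translate and push this notification to the federated catalog,
    # e.g. via Purview Data Catalogue 3's Atlas REST API (not shown here).
    print(partition_context.partition_id, notification)
    partition_context.update_checkpoint(event)

with consumer:
    consumer.receive(on_event=on_event, starting_position="-1")  # read from the beginning
```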

Moving image metadata from Azure Blob Storage into CosmosDB

I have several images in a container in Blob Storage that have metadata I want to store in Cosmos DB. The images are JPEGs. Would I have to pull the data into Data Factory, or is there a simpler method?
I think the simplest way would be to just write a small console app that loops through all your blob containers and items and inserts into Cosmos DB using its SDK.
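A minimal sketch of that console-app idea in Python (container, database and field names are assumptions):

```python
# Sketch: list blobs (including their user-defined metadata) and upsert one
# document per image into Cosmos DB. Container, database and field names
# are illustrative assumptions.
from azure.cosmos import CosmosClient, PartitionKey
from azure.storage.blob import BlobServiceClient

blob_service = BlobServiceClient.from_connection_string("<storage-connection-string>")
images = blob_service.get_container_client("images")   # assumed container

cosmos = CosmosClient("https://<account>.documents.azure.com:443/", credential="<cosmos-key>")
database = cosmos.create_database_if_not_exists("media")
metadata_container = database.create_container_if_not_exists(
    "image-metadata", partition_key=PartitionKey(path="/id")
)

# include=["metadata"] returns each blob's user-defined metadata with the listing.
for blob in images.list_blobs(include=["metadata"]):
    metadata_container.upsert_item({
        "id": blob.name.replace("/", "|"),   # Cosmos DB ids cannot contain '/'
        "blobName": blob.name,
        "contentLength": blob.size,
        "metadata": blob.metadata or {},
    })
```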
There are many ways to loop through images stored in Blob Storage and store their metadata in Cosmos DB. Please refer to the links below; with a little tweaking, the required functionality can be achieved.
Using ADF
Link
Use Azure Logic App
Link 1 Link 2
Write custom code and then populate the result into Cosmos DB
Link

Does Azure Data Factory use Data Catalog services?

I am planning to use Azure Data Factory for an ETL process. I would like to know if Azure Data Factory uses the metamodel that is captured in the Data Catalog. Please advise.
No, currently you can't reuse metadata stored in Azure Data Catalog in Azure Data Factory directly. You could try to reuse some of the metadata by retrieving data assets via the REST API (https://learn.microsoft.com/en-us/rest/api/datacatalog/data-catalog-data-asset), but I think it will be faster to do the setup in Azure Data Factory itself. Also be aware that the main focus of Data Factory is on data movement and orchestration: for big data transformations you would use e.g. Databricks activities, and for "classic" ETL you would integrate SSIS.
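If you do want to experiment with reusing that metadata, here is a hedged sketch of calling the search endpoint of that REST API from Python; the catalog name and search terms are assumptions, and acquiring the Azure AD bearer token (e.g. with MSAL) is left out:

```python
# Sketch of pulling asset metadata out of Azure Data Catalog via its REST
# search API, in case you want to reuse it when designing Data Factory
# pipelines. Catalog name and search terms are assumptions.
import requests

catalog_name = "DefaultCatalog"      # assumed catalog name
token = "<azure-ad-bearer-token>"    # acquire via Azure AD beforehand

url = f"https://api.azuredatacatalog.com/catalogs/{catalog_name}/search/search"
params = {"searchTerms": "tags:sales", "count": 10, "api-version": "2016-03-30"}
headers = {"Authorization": f"Bearer {token}"}

response = requests.get(url, params=params, headers=headers)
response.raise_for_status()

# Each result contains the registered asset's metadata (name, address, schema, ...).
for result in response.json().get("results", []):
    print(result)
```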