What's the difference between using Data Export Service and Export to Data Lake, regarding Dataverse replication? - azure-data-lake-gen2

I know Data Export Service has a SQL storage target, whereas Export to Data Lake targets Gen2, but since Dataverse (aka Common Data Service) is structured data, I can't see why you'd use the Export to Data Lake option in Power Apps, as Gen2 is for unstructured and semi-structured data!
Am I missing something here? Could they both be used, e.g. Gen2 to store image data?

Data Export Service is the v1 option, used to replicate Dynamics CRM Online data to Azure SQL Database or to SQL Server on an Azure IaaS VM in near real time.
Export to Data Lake is effectively v2, for the same replication purpose but with a new trick :) snapshots are the advantage here.
There is a v3 coming, much like v2 but with Azure Synapse linkage added on top.
These changes are happening very fast, and I'm not sure how the community is going to adapt.

Related

Is it possible to create Linked Service in Azure Data Factory to a Synapse Lake database

Hi, can someone let me know if it's possible to create a linked service to a lake database in Azure Data Factory?
I've googled it but there is no tangible information.
There is no direct way to connect to a lake database in Azure Synapse Analytics (the way you would connect to a dedicated SQL pool). Lake databases in Azure Synapse Analytics store their data inside an Azure Data Lake Storage account, and they do so through the linked service to that storage account. By default, the data lake account created at the time the Synapse workspace was created is used to store all of the lake database's data.
When you choose Lake Database -> <your_database> -> open, the storage settings show the details of the linked service and the input folder where the data is stored.
So you can simply create a linked service to the data lake storage account that holds the lake database's data in Azure Synapse. Refer to the official Microsoft documentation to understand lake databases in more detail.
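If you prefer to script it rather than click through the ADF UI, a minimal Python sketch with the azure-mgmt-datafactory SDK could look like the following. The subscription, resource group, factory and storage account names are placeholders, and authentication for the linked service (account key, service principal or managed identity) depends on your setup and is not shown here.

    # Minimal sketch: create an ADF linked service pointing at the ADLS Gen2
    # account that backs the Synapse lake database. All names are placeholders.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        AzureBlobFSLinkedService,
        LinkedServiceResource,
    )

    client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    # The URL is the one shown in the lake database's storage settings.
    adls_linked_service = LinkedServiceResource(
        properties=AzureBlobFSLinkedService(
            url="https://<storage-account>.dfs.core.windows.net"
        )
    )

    client.linked_services.create_or_update(
        "<resource-group>", "<factory-name>", "LS_LakeDatabaseStorage", adls_linked_service
    )

Once that linked service exists, your ADF datasets can point at the folder shown as the lake database's input folder.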

Loading a huge amount of data from ADLS to a PostgreSQL database

I need to copy one year of historical data from Azure Data Lake Storage to an Azure PostgreSQL database. One day of data = 65 GB. How can I load that much data in the least time?
You can try Azure Data Factory. ADF has connectors for both ADLS and Azure Database for PostgreSQL. Refer to the metrics below, based on network bandwidth and data size, for the Copy activity in ADF:
Copy activity performance and scalability guide
Below are some sample articles to use ADLS and Azure Postgres with ADF:
Copy and transform data in Azure Data Lake Storage Gen2 using Azure Data Factory or Azure Synapse Analytics
Copy and transform data in Azure Database for PostgreSQL using Azure Data Factory or Synapse Analytics
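As a rough illustration of what that Copy activity could look like when scripted rather than built in the ADF UI, here is a minimal Python sketch with the azure-mgmt-datafactory SDK. It assumes the linked services and the two datasets ("ds_adls_csv" and "ds_postgres_table", both hypothetical names) already exist, and all resource names are placeholders.

    # Sketch: a pipeline with one Copy activity moving delimited files from
    # ADLS Gen2 into Azure Database for PostgreSQL. Names are placeholders.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        AzurePostgreSqlSink,
        CopyActivity,
        DatasetReference,
        DelimitedTextSource,
        PipelineResource,
    )

    client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    copy_step = CopyActivity(
        name="CopyAdlsToPostgres",
        inputs=[DatasetReference(reference_name="ds_adls_csv")],
        outputs=[DatasetReference(reference_name="ds_postgres_table")],
        source=DelimitedTextSource(),
        sink=AzurePostgreSqlSink(write_batch_size=10000),
        # Raise parallelism for the ~65 GB/day volume; tune per the guide above.
        parallel_copies=8,
    )

    client.pipelines.create_or_update(
        "<resource-group>", "<factory-name>", "pl_adls_to_postgres",
        PipelineResource(activities=[copy_step]),
    )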

Data analytics (join mongoDB and SQL data) through Azure Data Lake and power BI

We have an app hosted on Azure using MongoDB (running on a VM) and Azure SQL databases. The idea is to build a basic data analysis pipeline to "join" the data between these two DBs and visually display it using Power BI.
For instance, we have a "user" table in SQL with a unique "id", a "data" collection in Mongo that references that "id", plus other tables in SQL that reference the same "id". So we wish to analyse the contents of "data" per user and possibly join that further with other tables as needed.
Is Azure Data Lake + Power BI enough to implement this case, or do we need Azure Data Lake Analytics or Azure Synapse for this?
Azure Data Lake (ADL) and Power BI on their own are not going to be able to build a pipeline: ADL is just a storage area, and Power BI is very much a lightweight ETL tool, limited in features and capacity.
It would be highly recommended to have some better compute power behind it using, as you mentioned, Azure Synapse. That will let you define a pipeline to orchestrate data movement into the data lake and then do the processing to transform the data.
Power BI on its own will not be able to do this, as you will still be limited by the 1 GB dataflow and dataset size if running Pro. Azure Synapse contains Azure Data Factory pipelines, Apache Spark and Azure SQL Data Warehouse, so you can choose between Spark and SQL for your data transformation steps, as both will connect to the data lake.
Note: Azure Data Lake Analytics (ADLA) (and U-SQL) is not a major focus for Microsoft and was never widely used. Azure Databricks and Azure Synapse with Spark have replaced ADLA in all of Microsoft's modern data pipeline and architecture examples.
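To make the Spark option concrete, a minimal sketch of the transform step in a Synapse Spark notebook might look like the following. It assumes a pipeline has already landed the SQL "user" table and the Mongo "data" collection in the lake; the paths, container names and the "userId" foreign-key column are assumptions, not your actual schema.

    # Sketch of the transform step in a Synapse Spark notebook.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()  # already provided in Synapse notebooks

    users = spark.read.parquet("abfss://raw@<storage-account>.dfs.core.windows.net/sql/user/")
    events = spark.read.json("abfss://raw@<storage-account>.dfs.core.windows.net/mongo/data/")

    # Join the Mongo documents to their owning user on the shared id.
    joined = events.join(users, events["userId"] == users["id"], "inner")

    # Write a curated folder that Power BI can read, directly or through a
    # serverless SQL view over the same location.
    (joined
     .write
     .mode("overwrite")
     .parquet("abfss://curated@<storage-account>.dfs.core.windows.net/user_events/"))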

How does one synchronise on premise MongoDB with an Azure DocumentDB?

I'm looking for the best way to synchronise an on-premise MongoDB with an Azure DocumentDB. The idea is that the synchronisation runs at a predetermined interval, for example every 2 hours.
I'm using .NET and C#.
I was thinking that I could create a Windows service that retrieves the documents from the Azure DocumentDB collections and inserts them into my on-premise MongoDB.
But I'm wondering if there is any better way.
Per my understanding, you could use the Azure Cosmos DB Data Migration Tool to export docs from your collections to a JSON file, then pick up the exported file(s) and insert/update them into your on-premise MongoDB. Moreover, there is a tutorial about using the Windows Task Scheduler to back up DocumentDB that you could follow.
When executing the export operation, you can export to a local file or to Azure Blob storage. For exporting to a local file, you could leverage the FileTrigger from the Azure WebJobs SDK Extensions to monitor file additions/changes under a particular directory, then pick up the newly written local file and insert its documents into your MongoDB. For exporting to Blob storage, you could also work with the WebJobs SDK and use the BlobTrigger to fire on the new blob and do the insertion. For the blob approach, you could follow How to use Azure blob storage with the WebJobs SDK.
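For the "pick up the exported file and write it into MongoDB" part, here is a minimal Python sketch with pymongo; the same logic would translate directly to the MongoDB .NET driver inside your C# Windows service. The file path, database and collection names, and the use of "id" as the upsert key are assumptions.

    # Minimal sketch of the import side, assuming the migration tool exported
    # the collection as a JSON array to export.json.
    import json
    from pymongo import MongoClient, ReplaceOne

    mongo = MongoClient("mongodb://localhost:27017")
    target = mongo["mydb"]["documents"]

    with open("export.json", encoding="utf-8") as f:
        docs = json.load(f)

    # Upsert on the document id so the scheduled run (e.g. every 2 hours) is idempotent.
    ops = [ReplaceOne({"id": d["id"]}, d, upsert=True) for d in docs]
    if ops:
        target.bulk_write(ops)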

Loading data from S3 to PostgreSQL RDS

We are planning to go with PostgreSQL RDS in an AWS environment. There are some files in S3 which we will need to load every week. I don't see any option in the AWS documentation for loading data from S3 into PostgreSQL RDS. I see it is possible for Aurora, but I cannot find anything for PostgreSQL.
Any help will be appreciated.
One option is to use AWS Data Pipeline. It's essentially a JSON-defined workflow that allows you to orchestrate the flow of data between sources on AWS.
There's a template offered by AWS that's set up to move data between S3 and MySQL. You can find it here. You can easily follow this and swap out the MySQL parameters for those associated with your Postgres instance. Data Pipeline simply looks for RDS as the type and does not distinguish between MySQL and Postgres instances.
Scheduling is also supported by Data Pipeline, so you can automate your weekly file transfers.
To start this:
Go to the Data Pipeline service in your AWS console
Select "Build from template" under source
Select the "Load S3 to MySQL table" template
Fill in the rest of the fields and create the pipeline
From there, you can monitor the progress of the pipeline in the console!
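If you'd rather do the same weekly load with a small script instead of Data Pipeline (for example from a cron job), a minimal Python sketch using boto3 and psycopg2 could look like this. The bucket, key, table name and connection settings are placeholders, and it assumes the files are CSVs whose columns line up with the target table; Data Pipeline remains the better fit once you want scheduling, retries and monitoring managed for you.

    # Plain-Python alternative sketch (not Data Pipeline itself): download a
    # weekly CSV from S3 and bulk-load it into PostgreSQL RDS with COPY.
    import boto3
    import psycopg2

    s3 = boto3.client("s3")
    s3.download_file("my-bucket", "weekly/export.csv", "/tmp/export.csv")

    conn = psycopg2.connect(
        host="mydb.xxxxxxxx.us-east-1.rds.amazonaws.com",
        dbname="mydb", user="loader", password="********",
    )
    with conn, conn.cursor() as cur, open("/tmp/export.csv") as f:
        cur.copy_expert("COPY my_table FROM STDIN WITH (FORMAT csv, HEADER)", f)
    conn.close()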