Data analytics (join MongoDB and SQL data) through Azure Data Lake and Power BI

We have an app hosted on Azure that uses MongoDB (running on a VM) and Azure SQL databases. The idea is to build a basic data analysis pipeline to "join" the data across both of these DBs and display the results visually in Power BI.
For instance, we have a "user" table in SQL with a unique "id", a "data" collection in Mongo that references that "id", and other SQL tables that also reference the "id". We want to analyse the contents of "data" per user, and possibly join further with other tables as needed.
Is Azure Data Lake + Power BI enough to implement this case? Or do we need Azure Data Lake Analytics or Azure Synapse for this?

Azure Data Lake (ADL) and Power BI on their own will not be enough to build a pipeline: ADL is just a storage area, and Power BI is a lightweight ETL tool, limited in both features and capacity.
It is highly recommended that you put some better compute power behind it, for example Azure Synapse, as you mentioned. That gives you a defined pipeline to orchestrate data movement into the data lake, and then do the processing to transform the data.
Power BI on its own will not be able to do this, as you will still be limited by the 1 GB Dataflow and Dataset size limit on a Pro licence. Azure Synapse includes Azure Data Factory pipelines, Apache Spark, and the former Azure SQL Data Warehouse, so you can choose between Spark and SQL for your data transformation steps, as both will connect to the Data Lake.
Note: Azure Data Lake Analytics (ADLA) (and U-SQL) is not a major focus for Microsoft, and was never widely used. Azure Databricks and Azure Synapse with Spark have replaced ADLA in all of Microsoft's modern data pipeline and architecture examples.
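Wherever the transformation ends up running (Synapse Spark, SQL, or a notebook), the core join the question describes is the same. Here is a minimal pandas sketch using in-memory stand-ins for the two sources; in a real pipeline you would read the SQL table via pyodbc/SQLAlchemy and the Mongo collection via pymongo, and all names here are hypothetical:

```python
import pandas as pd

# Stand-in for the SQL "user" table (would come from Azure SQL).
users = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["alice", "bob", "carol"],
})

# Stand-in for the Mongo "data" collection (would come from pymongo,
# e.g. pd.DataFrame(collection.find({}, {"_id": 0}))).
data = pd.DataFrame({
    "id": [1, 1, 2],
    "event": ["login", "purchase", "login"],
})

# Join Mongo documents to SQL users on the shared "id" key.
joined = data.merge(users, on="id", how="left")

# Aggregate per user - the kind of result you would land in the
# lake and point Power BI at.
summary = joined.groupby("name")["event"].count()
print(summary.to_dict())
```

The same two-step shape (extract both sources to the lake, then join on "id") carries over directly to Spark DataFrames in Synapse.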

Related

Create a Dataflow within the PBI reporting service that connects to a view/Table built-in Azure Databricks. Is it possible?

I need to create a Dataflow within the PBI reporting service that connects to a view/Table built-in Azure Databricks. Is it possible?
Can I create a data model or join tables with the Power BI Service? If yes, then with which license, Pro or Premium?

Is it possible to create a Linked Service in Azure Data Factory to a Synapse lake database?

Hi, can someone let me know if it's possible to create a linked service to a lake database in Azure Data Factory?
I've googled it but there is no tangible information.
There is no direct way to connect to a lake database in Azure Synapse Analytics (in the way you connect to a dedicated SQL pool). Lake databases in Azure Synapse Analytics store their data inside an Azure Data Lake Storage account, via a linked service to that storage account. By default, the data lake account created at the time the Synapse workspace was created is used to store all the data of a lake database.
When you choose Lake Database -> <your_database> -> Open, the storage settings show the details of the linked service and the input folder where the data is stored.
So you can simply create a linked service to the data lake storage account that is used to store the lake database's data in Azure Synapse. Refer to the official Microsoft documentation to understand lake databases.

Loading a huge amount of data from ADLS to a PostgreSQL database

I need to copy one year of historical data from Azure Data Lake Storage to an Azure PostgreSQL database. One day of data = 65 GB. How can I load that much data in the least time?
You can try Azure Data Factory. ADF has connectors for both ADLS and Azure Database for PostgreSQL. Refer to the metrics below, based on network bandwidth and data size, for Copy activity performance in ADF:
Copy activity performance and scalability guide
Below are some sample articles on using ADLS and Azure PostgreSQL with ADF:
Copy and transform data in Azure Data Lake Storage Gen2 using Azure Data Factory or Azure Synapse Analytics
Copy and transform data in Azure Database for PostgreSQL using Azure Data Factory or Synapse Analytics
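To set expectations for "in the least time", a back-of-envelope calculation helps: the 65 GB/day figure comes from the question, while the sustained throughput values below are hypothetical (real ADF copy throughput depends on DIUs, parallel copies, network bandwidth, and how fast PostgreSQL can ingest):

```python
# One year of history at 65 GB/day.
gb_per_day = 65
total_gb = gb_per_day * 365          # ~23,725 GB, roughly 23.7 TB

# Wall-clock time at a few hypothetical sustained copy throughputs.
for mb_per_s in (100, 500, 1000):
    hours = total_gb * 1024 / mb_per_s / 3600
    print(f"{mb_per_s} MB/s sustained -> {hours:.1f} hours")
```

Even at a sustained 1 GB/s this is several hours of copying, which is why the performance guide's advice on parallel copies and partitioned source reads matters for a load of this size.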

What's the difference between using Data Export Service and Export to Data Lake, regarding dataverse replication?

I know Data Export Service has a SQL storage target, whereas Export to Data Lake targets Gen2, but seeing that Dataverse (aka Common Data Service) is structured data, I can't see why you'd use the Export to Data Lake option in Power Apps, as Gen2 is for unstructured and semi-structured data!
Am I missing something here? Could they both be used e.g. Gen2 to store images data?
Data Export Service is the v1 option, used to replicate Dynamics CRM Online data to Azure SQL or an Azure IaaS SQL Server in near real time.
Export to Data Lake is effectively v2, serving the same replication purpose with a new trick :) snapshots are the advantage here.
There is a v3 coming, very similar to v2 but additionally with Azure Synapse linkage.
These changes are happening very fast, and it is not clear how the community is going to adapt.

Synapse Analytics vs SQL Server 2019 Big Data Cluster

Could someone explain the difference between SQL Server 2019 BDC and Azure Synapse Analytics, other than the OLAP vs OLTP differences? Why would one use Synapse Analytics over SQL Server 2019 BDC?
Azure Synapse Analytics is a cloud-based DWH with Data Lake, ADF, and Power BI designers tightly integrated. It is a PaaS offering and is not available on-prem. The DWH engine is MPP with limited PolyBase support (Data Lake).
It also allows you to provision Apache Spark if needed.
SQL Server 2019 Big Data Cluster is an IaaS platform based on Kubernetes. It can be implemented on-prem on VMs, or on OpenShift or AKS (any cloud, for that matter).
Its data virtualization support is very good, with support for ODBC data sources and a Data Pool to support data virtualization, implemented via PolyBase.
Apache Spark makes up the big data compute.
Though it is not an MPP system like Synapse, Kubernetes can create multiple pods on the fly through scalability features such as VMSS, etc.
If you want analytical capability on-prem, you will use SQL Server 2019 BDC; but if you want a cloud-based DWH with analytical capability, you will use Synapse.
explain the difference between SQL Server 2019 BDC vs Azure Synapse Analytics
SQL Server is OLTP and Synapse is OLAP. :D
other than OLAP & OLTP differences? Why would one use Analytics over SQL Server 2019 BDC?
Purely from a terminology point of view, their product management seem to have no clue what they are doing.
"SQL Server" is a DIY/on-prem/managed-by-you DB.
The fully Azure-managed SaaS version of SQL Server is known as Azure SQL Database.
They also have "Azure SQL Managed Instance" and "SQL Server on Azure VM".
Azure Synapse is the renamed Dedicated SQL Pools.
Azure Synapse On-demand was renamed to Serverless SQL Pools.
Azure Synapse Analytics = Dedicated + Serverless + a bunch of ML services.
I'm going to answer assuming your question is:
Why would one use "Azure Synapse Dedicated or Serverless" over SQL Server?
SQL Server is on-prem DIY; the other is SaaS, fully managed by Azure. With this come all the pros and cons of SaaS: no CAPEX, no management, elastic, very large scale, and so on.
Synapse's USP is its MPP engine, which SQL Server does not have. Though I do see things like PolyBase and EXTERNAL TABLES being supported by SQL Server.
Due to the MPP architecture, Synapse's transactional performance is the worst I have seen by far. E.g. executing INSERT INTO xxx VALUES(...) to add one row via JDBC takes about 1-2 seconds, as against 10-12 seconds for importing CSV files with tens of thousands of rows using the COPY command. And INSERT INTO does not scale with JDBC batching: it will take about 100 seconds to insert 100 rows in one batch.
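Those figures (anecdotal numbers from the paragraph above, not official benchmarks) imply a per-row throughput gap of several orders of magnitude between row-by-row INSERTs and bulk COPY:

```python
# Anecdotal figures from the answer above (lower/middle of the stated ranges).
insert_latency_s = 1.0                   # ~1-2 s per single-row INSERT via JDBC
copy_rows, copy_time_s = 50_000, 12.0    # "10s of thousands" of rows in ~10-12 s

insert_rows_per_s = 1 / insert_latency_s
copy_rows_per_s = copy_rows / copy_time_s

print(f"INSERT: {insert_rows_per_s:.0f} row/s, COPY: {copy_rows_per_s:.0f} rows/s")
print(f"COPY is roughly {copy_rows_per_s / insert_rows_per_s:.0f}x faster per row")
```

This is the usual MPP trade-off: the engine is optimized for scan-heavy analytical loads and bulk ingestion, not for chatty transactional writes.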
It is not your fault that you are confused. IMO, Azure product management for databases (SQL Server, DW, ADP, Synapse, Analytics, and the ten other flavours of all these) have no clue what they want to offer two years from today. Every product boasts of Big Data, massive this and that, ML and analytics, elastic this and that. Go figure.
PS: Check out Snowflake if you haven't.
I'm not affiliated with Microsoft or Snowflake.
I believe the user user3129206 is asking
SQL Server 2019 BDC vs Azure Synapse Analytics
not
SQL Server vs Azure Synapse Analytics
so the first answer is relevant.
The only thing I'd argue is that BDC is also MPP-like, similar to Synapse, because of pods in Kubernetes, if implemented right with many servers + HDFS.
I plan to test BDC on-premises and see how demanding the install and maintenance are.
The neat thing about BDC seems to be that it is easy to port, partially or fully, from on-premises to Azure or any cloud.
It seems that BDC is both OLTP and OLAP, trying to provide the best of both worlds.
As I am on the same comparison quest, I'll try to get back and share what I learn.