Create a Dataflow within the PBI reporting service that connects to a view/table built in Azure Databricks. Is it possible? - azure-data-flow

I need to create a Dataflow within the PBI reporting service that connects to a view/table built in Azure Databricks. Is it possible?
Can I create a data model or join tables with Power BI Service? If yes, with which license: Pro or Premium?

Related

Azure Durable Function app with Postgres data store

We need to host an existing Azure Durable Functions app outside of Azure. We can run the function app as a container, but we'll need to configure an alternate data store (it currently uses Azure Storage). I can see MS SQL is a supported alternative - see here - and this will work for us, but Postgres is more aligned with the direction we're headed, so it would be preferable. Has anyone used Postgres as the storage provider for Azure Durable Functions apps?
The language-specific steps to deploy outside of Azure are described at https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-overview?tabs=python
For the storage considerations, refer to https://learn.microsoft.com/en-us/azure/azure-functions/storage-considerations
However, there is no documented procedure for PostgreSQL yet; we are waiting for an update from Azure.
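Since MS SQL is the documented alternative, a minimal host.json sketch (assuming the MSSQL storage provider package is installed and an app setting named SQLDB_Connection holds the connection string) would look roughly like this:

```json
{
  "extensions": {
    "durableTask": {
      "storageProvider": {
        "type": "mssql",
        "connectionStringName": "SQLDB_Connection"
      }
    }
  }
}
```

As far as I know, there is no equivalent provider type for PostgreSQL at the moment, so the same approach cannot simply be pointed at a Postgres connection string.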

Synapse Analytics vs SQL Server 2019 Big Data Cluster

Could someone explain the difference between SQL Server 2019 BDC and Azure Synapse Analytics, other than the OLAP vs OLTP differences? Why would one use Synapse Analytics over SQL Server 2019 BDC?
Azure Synapse Analytics is a cloud-based DWH with Data Lake, ADF and Power BI designers tightly integrated. It is a PaaS offering and is not available on-prem. The DWH engine is MPP with limited PolyBase support (Data Lake).
It also allows you to provision Apache Spark if needed.
SQL Server 2019 Big Data Cluster is an IaaS platform based on Kubernetes. It can be implemented on-prem on VMs, on OpenShift, or on AKS (any cloud, for that matter).
Its data virtualization support is very good, with support for ODBC data sources and a Data Pool to support data virtualization, implemented via PolyBase.
Apache Spark makes up the big data compute.
Though it is not MPP like Synapse, because it runs as pods in Kubernetes, multiple pods can be created on the fly through scalability features such as VMSS, etc.
If you want analytical capability on-prem, you would use SQL Server 2019 BDC; if you want a cloud-based DWH with analytical capabilities, you would use Synapse.
explain the difference between SQL Server 2019 BDC vs Azure Synapse Analytics
SQL Server is OLTP and Synapse is OLAP. :D
other than OLAP & OLTP differences? Why would one use Analytics over SQL Server 2019 BDC?
Purely from a terminology point of view, their product management has no clue what they are doing.
"SQL Server" is a DIY/on-prem/managed-by-you DB.
The fully Azure-managed SaaS version of SQL Server is known as Azure SQL Database.
They also have "Azure SQL Managed Instance" and "SQL Server on Azure VM".
Azure Synapse was renamed to Dedicated SQL Pools.
Azure Synapse On-demand was renamed to Serverless SQL Pools.
Azure Synapse Analytics = Dedicated + Serverless + a bunch of ML services.
I'm going to answer assuming your question is:
Why would one use "Azure Synapse Dedicated or Serverless" over SQL Server?
SQL Server is on-prem DIY; the other is SaaS, fully managed by Azure. With that come all the pros and cons of SaaS: no CAPEX, no management, elastic, very large scale, and so on.
Synapse's USP is its MPP, which SQL Server does not have, though I do see things like PolyBase and EXTERNAL TABLEs being supported by SQL Server.
Due to the MPP architecture, Synapse's transactional performance is the worst I've seen by far. For example, executing INSERT INTO xxx VALUES(...) to add one row via JDBC takes about 1-2 seconds, compared with 10-12 seconds to import a CSV file with tens of thousands of rows using the COPY command. And INSERT INTO does not scale with JDBC batching: it still takes about 100 seconds to insert 100 rows in one batch.
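To make that comparison concrete, here is a rough Python/pyodbc sketch of the two loading styles against a Synapse dedicated SQL pool; the server, table and storage paths are made-up placeholders:

```python
# Hypothetical sketch: singleton INSERTs vs a bulk COPY INTO load on Synapse.
# Connection string, table name and storage URL are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myworkspace.sql.azuresynapse.net;DATABASE=mypool;"
    "UID=loader;PWD=<secret>"
)
cur = conn.cursor()

# Style 1: one row per statement -- each INSERT pays the full MPP coordination
# cost, which is why it feels like seconds per row and batching barely helps.
cur.execute("INSERT INTO dbo.events (id, payload) VALUES (?, ?)", 1, "hello")
conn.commit()

# Style 2: bulk load -- tens of thousands of rows in a single COPY INTO statement
# (credential options omitted for brevity).
cur.execute("""
    COPY INTO dbo.events
    FROM 'https://mystorage.blob.core.windows.net/staging/events.csv'
    WITH (FILE_TYPE = 'CSV', FIRSTROW = 2)
""")
conn.commit()
```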
It is not your fault that you are confused. IMO, Azure product management for databases (SQL Server, DW, ADP, Synapse, Analytics, and the 10 other flavors of all these) has no clue what it wants to offer 2 years from today. Every product boasts of Big Data, Massive this and that, ML and Analytics, Elastic this and that. Go figure.
PS: Check out Snowflake if you haven't.
I'm not affiliated with Microsoft or Snowflake.
I believe user3129206 is asking
SQL Server 2019 BDC vs Azure Synapse Analytics
not
SQL Server vs Azure Synapse Analytics
so the first answer is relevant.
The only thing I'd argue is that BDC is also MPP-like, similar to Synapse, because of pods in Kubernetes, if implemented right with many servers + HDS.
I plan to test BDC on-premises and see how demanding the install and maintenance are.
The neat thing about BDC seems to be that it is easy to port, partially or fully, from on-premises to Azure or any cloud.
It seems that BDC is both OLTP and OLAP, trying to provide the best of both worlds.
As I am on the same comparison quest, I'll try to get back and share what I learn.

Data analytics (join mongoDB and SQL data) through Azure Data Lake and power BI

We have an app hosted on Azure using MongoDB (running on a VM) and Azure SQL databases. The idea is to build a basic data analysis pipeline to "join" the data across both these DBs and visually display it using Power BI.
For instance, we have a "user" table in SQL with a unique "id", a "data" collection in Mongo that references that "id", plus other tables in SQL that also reference the "id". So we wish to analyse the contents of "data" by user and possibly join that further with other tables as needed.
Is Azure Data Lake + Power BI enough to implement this case, or do we need Azure Data Lake Analytics or Azure Synapse for this?
Azure Data Lake (ADL) and Power BI on their own are not going to be able to build a pipeline: ADL is just a storage area, and Power BI is very much a lightweight ETL tool, limited in features and capacity.
It is highly recommended that you put some better compute power behind it, using, as you mentioned, Azure Synapse. That will give you a defined pipeline to orchestrate data movement into the data lake and then do the processing to transform the data.
Power BI on its own will not be able to do this, as you will still be limited by the Dataflow and Dataset size limit of 1 GB if running Pro. Azure Synapse contains Azure Data Factory pipelines, Apache Spark and Azure SQL Data Warehouse, so you can choose between Spark and SQL for your data transformation steps, as both will connect to the data lake.
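For illustration, a minimal PySpark sketch of that kind of join step, runnable in a Synapse or Databricks notebook (where spark is the provided session); the lake paths and column names are hypothetical and assume both sources have already been landed in the lake as Parquet:

```python
# Hypothetical join of the SQL "user" table and the Mongo "data" collection,
# both assumed to have been copied into the data lake as Parquet by a pipeline.
users = spark.read.parquet("abfss://lake@myaccount.dfs.core.windows.net/raw/sql/user/")
data = spark.read.parquet("abfss://lake@myaccount.dfs.core.windows.net/raw/mongo/data/")

joined = (
    data.join(users, on="id", how="inner")   # join on the shared "id" key
        .select("id", "name", "payload")     # "name" and "payload" are made-up columns
)

# Write the curated result back to the lake, where Power BI can pick it up.
joined.write.mode("overwrite").parquet(
    "abfss://lake@myaccount.dfs.core.windows.net/curated/user_data/"
)
```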
Note: Azure Data Lake Analytics (ADLA) (and U-SQL) is not a major focus for Microsoft and was never widely used. Azure Databricks and Azure Synapse with Spark have replaced ADLA in all of Microsoft's modern data pipeline and architecture examples.

Azure databricks connect to on-prem databases through ADF

Is it possible to connect an Azure Databricks notebook to Azure Data Factory linked services (connections to on-prem DBs)?
In ADF, I have connections to on-prem gateways through linked services in order to connect to local DBs. I need to connect my Databricks notebook to those linked services in ADF. Is that possible?
Regards.
Yes, you can.
You need to ensure your Databricks resource is configured to be in a VNet which can talk to these databases:
https://learn.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/vnet-inject
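A minimal sketch of what the connection from the notebook side might look like once that network path exists; the hostname, database, table and secret scope/key names are all made up:

```python
# Hypothetical JDBC read from an on-prem SQL Server reachable from the
# VNet-injected Databricks workspace. Host, database, table and secret
# scope/key names are placeholders.
jdbc_url = "jdbc:sqlserver://onprem-sql.corp.local:1433;databaseName=Sales"

df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.Orders")
    .option("user", dbutils.secrets.get(scope="kv-scope", key="sql-user"))
    .option("password", dbutils.secrets.get(scope="kv-scope", key="sql-password"))
    .load()
)

display(df.limit(10))
```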

Azure Data Factory v2 and data processing in custom activity

I am migrating (extract-load) a large dataset to a LOB service and would like to use Azure Data Factory v2 (ADF v2). This would be the cloud version of the same kind of orchestration typically implemented in SSIS. My source database and dataset, as well as the target platform, are on Azure. That led me to ADF v2 with Azure Batch Service (ABS) and creating a custom activity.
https://learn.microsoft.com/en-us/azure/data-factory/transform-data-using-dotnet-custom-activity
However, I am unable to work out from the documentation or the samples provided by Microsoft how ADF v2 creates the job and tasks needed by the Batch service.
As an example, let's say I have a dataset with 10 million records and a Batch pool with 10 cores. How do I submit one tenth of the rows, or even row by row, to my command-line app running on each of the cores in the pool? How do I distribute the work? Following the default guide in the ADF v2 docs, I just get a datasets.json file, and it is the same for all my pool nodes, with no "slice" or subset information.
If ADF v2 were not involved, I would create a job in ABS and, for each row or each X rows, create a task. The nodes would then execute the tasks one by one. How do I achieve something similar with ADF v2?
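For reference, a hypothetical sketch of the command-line app side: ADF drops activity.json (along with linkedServices.json and datasets.json) into the Batch task's working directory, so one option would be to pass slicing hints through the custom activity's extendedProperties and read them there. The property names (partitionIndex, partitionCount) and the exact JSON nesting are assumptions to be checked against the files ADF actually produces:

```python
# Hypothetical: read per-task context written by an ADF v2 Custom Activity into
# the Azure Batch working directory. Property names and nesting are assumptions.
import json
import os

working_dir = os.environ.get("AZ_BATCH_TASK_WORKING_DIR", ".")

with open(os.path.join(working_dir, "activity.json")) as f:
    activity = json.load(f)

ext = activity.get("typeProperties", {}).get("extendedProperties", {})
partition_index = int(ext.get("partitionIndex", 0))   # made-up property
partition_count = int(ext.get("partitionCount", 1))   # made-up property

# The app would then process only its share of the rows,
# e.g. WHERE some_key % partition_count = partition_index.
print(f"Processing partition {partition_index} of {partition_count}")
```

One way to fan this out is to run the custom activity inside a ForEach over a list of partition numbers, so each iteration becomes its own Batch task with different extendedProperties.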