Synapse Analytics vs SQL Server 2019 Big Data Cluster

Could someone explain the difference between SQL Server 2019 BDC and Azure Synapse Analytics, other than the OLAP and OLTP differences? Why would one use Synapse Analytics over SQL Server 2019 BDC?

Azure Synapse Analytics is a cloud-based DWH with a data lake, ADF, and Power BI designers tightly integrated. It is a PaaS offering and is not available on-prem. The DWH engine is MPP, with limited PolyBase support (for the data lake).
It also lets you provision Apache Spark if needed.
SQL Server 2019 Big Data Clusters is an IaaS platform based on Kubernetes. It can be deployed on-prem on VMs, on OpenShift, or on AKS (any cloud, for that matter).
Its data virtualization support is very good, with support for ODBC data sources and a Data Pool backing the virtualization layer, all implemented via PolyBase.
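To make that concrete, here is a minimal T-SQL sketch of the generic ODBC flavor of PolyBase data virtualization; the driver, host, credential, and table names are all hypothetical:

```sql
-- Hypothetical names throughout; a sketch of PolyBase data virtualization
-- over a generic ODBC source (a SQL Server 2019 / BDC feature).
-- Assumes a database master key already exists for the scoped credential.
CREATE DATABASE SCOPED CREDENTIAL mysql_cred
    WITH IDENTITY = 'app_user', SECRET = '<password>';

CREATE EXTERNAL DATA SOURCE mysql_src
    WITH (LOCATION = 'odbc://mysql-host:3306',
          CONNECTION_OPTIONS = 'Driver={MySQL ODBC 8.0 Driver}',
          PUSHDOWN = ON,   -- let PolyBase push computation to the source
          CREDENTIAL = mysql_cred);

-- Queries against this table federate to the remote system via PolyBase.
CREATE EXTERNAL TABLE dbo.remote_orders (
    order_id INT,
    amount   DECIMAL(10, 2)
)
WITH (LOCATION = 'shop.orders', DATA_SOURCE = mysql_src);
```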
Apache Spark makes up the big data compute.
Though it is not an MPP engine like Synapse, Kubernetes lets it scale out: multiple pods can be created on the fly through scalability features such as VMSS, etc.
If you want analytical capability on-prem, you will use SQL Server 2019 BDC; but if you want a cloud-based DWH with analytical capabilities, you will use Synapse.

explain the difference between SQL Server 2019 BDC vs Azure Synapse Analytics
SQL Server is OLTP and Synapse is OLAP. :D
other than OLAP & OLTP differences? Why would one use Analytics over SQL Server 2019 BDC?
Purely from a terminology point of view, their product management has no clue what it is doing.
"SQL Server" is a DIY/on-prem/managed-by-you DB.
The fully Azure-managed SaaS version of SQL Server is known as Azure SQL Database.
They also have "Azure SQL Managed Instance" and "SQL Server on Azure VM".
Azure Synapse was renamed to Dedicated SQL Pools.
Azure Synapse On-demand was renamed to Serverless SQL Pools.
Azure Synapse Analytics = Dedicated + Serverless + a bunch of ML services.
I'm going to answer assuming your question is:
Why would one use "Azure Synapse Dedicated or Serverless" over SQL Server?
SQL Server is on-prem DIY; the other is SaaS, fully managed by Azure. With that come all the pros/cons of SaaS: no CAPEX, no management, elastic, very large scale, ...
Synapse's USP is its MPP architecture, which SQL Server does not have. Though I do see things like PolyBase and EXTERNAL TABLEs being supported by SQL Server.
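As a rough illustration of that support, a hedged sketch of a PolyBase external table over files in SQL Server 2019; the storage account, paths, and table names are assumptions:

```sql
-- Hypothetical names; the data stays in external storage and is read at query
-- time. A private container would additionally need a CREDENTIAL on the source.
CREATE EXTERNAL DATA SOURCE AzureStorage
    WITH (TYPE = HADOOP,
          LOCATION = 'wasbs://data@myaccount.blob.core.windows.net');

CREATE EXTERNAL FILE FORMAT CsvFormat
    WITH (FORMAT_TYPE = DELIMITEDTEXT,
          FORMAT_OPTIONS (FIELD_TERMINATOR = ',', FIRST_ROW = 2));

CREATE EXTERNAL TABLE dbo.SalesExternal (
    OrderId INT,
    Amount  DECIMAL(10, 2)
)
WITH (LOCATION = '/sales/',
      DATA_SOURCE = AzureStorage,
      FILE_FORMAT = CsvFormat);
```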
Due to the MPP architecture, Synapse's transactional performance is the worst I've seen by far. E.g., executing INSERT INTO xxx VALUES(...) to add one row via JDBC takes about 1-2 seconds, as against 10-12 seconds to import CSV files with tens of thousands of rows using the COPY command. And INSERT INTO does not scale with JDBC batching: it takes about 100 seconds to insert 100 rows in one batch.
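For context, the bulk path being compared here is the dedicated pool's COPY statement; a minimal sketch with an assumed storage path and table:

```sql
-- Hypothetical path and table; one COPY statement bulk-loads whole CSV files,
-- which is the fast path in a dedicated SQL pool (vs. row-by-row INSERTs).
COPY INTO dbo.Sales
FROM 'https://myaccount.blob.core.windows.net/staging/sales/*.csv'
WITH (
    FILE_TYPE = 'CSV',
    FIELDTERMINATOR = ',',
    FIRSTROW = 2   -- skip the header row
);
```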
It is not your fault that you are confused. IMO, Azure product management for databases (SQL Server, DW, ADP, Synapse, Analytics, and the 10 other flavors of all these) has no clue what it wants to offer 2 years from today. Every product boasts of Big Data, Massive this and that, ML and Analytics, Elastic this and that. Go figure.
PS: Check out Snowflake if you haven't.
I'm not affiliated with Microsoft or Snowflake.

I believe the user user3129206 is asking
SQL Server 2019 BDC vs Azure Synapse Analytics
not
SQL Server vs Azure Synapse Analytics
so the first answer is relevant.
The only thing I'd argue is that the BDC is also an MPP like Synapse, because of pods in Kubernetes, if implemented right with many servers + HDFS.
I plan to test BDC on-premises and see how demanding the install and maintenance are.
The neat thing about the BDC seems to be how easy it is to port, partially or fully, from on-premises to Azure or any cloud.
It seems that BDC is both OLTP and OLAP, trying to provide the best of both worlds.
As I am on the same comparison quest, I'll try to get back and share what I learn.

Related

How to sync data from Azure SQL Managed Instance to an on-premises SQL Server instance on a weekly basis

Please, what are the options for me in the following scenario?
We have a SQL Managed Instance on Azure, and our client has requested that their data (on the managed instance) sync on a weekly basis to their on-premises SQL Server. I suggested using Azure Data Sync, but that would be costly for us; transactional replication, based on what is described at https://learn.microsoft.com/en-us/azure/azure-sql/managed-instance/replication-transactional-overview?view=azuresql, is also costly, as it requires another managed instance to act as a broker.
Please, I'm very welcoming of any ideas you may have on how to go about this.
Thank you.

Azure Durable Function app with Postgres data store

We need to host an existing Azure Durable Functions app outside of Azure. We can run the function app as a container, but we'll need to configure an alternate data store (it currently uses Azure Storage). I can see MS SQL is a supported alternative - see here - and this will work for us, but Postgres is more aligned with the direction we're headed, so it would be preferable. Has anyone used Postgres as the storage provider for Azure Durable Functions apps?
The language-specific steps to deploy outside Azure are described in https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-overview?tabs=python
For storage considerations, refer to: https://learn.microsoft.com/en-us/azure/azure-functions/storage-considerations
But there is no specific, complete procedure for PostgreSQL yet; we are waiting for an update from Azure.

Azure Synapse SQL CI/CD with multiple environments

Might be a stupid question, but it seems very difficult to find information about Synapse with multiple environments.
We have a dev/test/prod environment setup and need to create partially automated CI/CD pipelines between those environments. The only problem now is that we cannot build dynamic SQL scripts that query the respective storage accounts - so the scripts could be identical no matter the environment: dev Synapse uses data from dev storage, and so on. A dedicated SQL pool can benefit from stored procedures, and I could pass parameters there if that works. But what about the serverless pool? What is the correct way?
I've tried looking at options with OPENROWSET's DATA_SOURCE argument as well as the EXTERNAL DATA SOURCE expression, without any luck. Also, no one seems to offer any information about this, so I'm beginning to wonder if this whole perspective is wrong.
This kind of "external" file reading is new to me; I may have been trying to fit it into a SQL Server context in my head.
Thank you for your time!
Okay, the serverless pool does support both stored procedures and dynamic SQL, yet you currently cannot call them straight from Synapse pipelines.
You have to either trigger those procedures via Spark notebooks or create separate Synapse Analytics linked services for each of your databases in the serverless pool and work from there.
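A minimal sketch of that pattern in a serverless pool, assuming a hypothetical storage account, data source name, and procedure:

```sql
-- Hypothetical external data source per environment (dev shown here).
CREATE EXTERNAL DATA SOURCE env_lake
    WITH (LOCATION = 'https://devstorage.dfs.core.windows.net/datalake');
GO

-- The procedure assembles the query dynamically, so the script body can be
-- identical across dev/test/prod; only the data source definition differs.
CREATE PROCEDURE dbo.usp_ReadFolder @folder NVARCHAR(400)
AS
BEGIN
    DECLARE @sql NVARCHAR(MAX) = N'
        SELECT *
        FROM OPENROWSET(
            BULK ''' + @folder + N'/*.parquet'',
            DATA_SOURCE = ''env_lake'',
            FORMAT = ''PARQUET''
        ) AS rows;';
    EXEC sp_executesql @sql;
END;
```

The procedure can then be invoked from a Spark notebook or via the per-environment linked service mentioned above.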

Easy way to create services on HDP cluster

We are a company with an HDP 2.6.4 cluster and an outsourced/offshore team that handles various ops tasks. Due to quite strict data access policies, that team cannot have access to quite a few datasets. However, we do need them to be able to monitor 24/7 and (ideally) execute different jobs.
So I'm in a position, as someone on the big data team, to enable them to do so, but without access to the data. While I'm not sure what HDP 2.6 has to offer, I do know that there are tools that enable devs to build all kinds of API endpoints, which could then be mapped to different ETL jobs, shell scripts, etc.
Would this be an optimal approach from an architectural standpoint?
I was thinking of getting us something like DreamFactory, but open source and something I can run on-premises. Any ideas?
Cheers!
DreamFactory can generate REST APIs for a multitude of databases, among them MySQL, Microsoft SQL Server, Oracle, PostgreSQL, and MongoDB. Along with the API, DreamFactory will also auto-generate an extensive set of interactive Swagger documentation for your API. Are you looking to connect databases?

Setting up a backup strategy for a PostgreSQL database on Cloud Foundry

We have set up a community PostgreSQL service on Cloud Foundry (IBM Bluemix). This is a free service, and no automated backup and recovery is supported out of the box.
Is there a way to set up a standby server or a regular backup in case there is any data corruption/failure?
IBM Compose and ElephantSQL can provide this service at a cost, but we are not ready for that yet.
PostgreSQL is an experimental service, and it lacks the dashboard and other advanced features (daily backups, for example) that you can find in the other services you mentioned. If you want a backup, you could write an ad hoc script that saves/exports all the tables you want, and run it every day.
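For instance, a minimal sketch using psql's client-side \copy; the table names are placeholders, and because \copy writes to the local machine, no server filesystem access is needed:

```sql
-- Run daily (e.g. from cron) while connected to the Bluemix service's URI.
-- Placeholder tables: customers/orders are whatever tables you need saved.
\copy customers TO 'backup/customers.csv' WITH (FORMAT csv, HEADER)
\copy orders    TO 'backup/orders.csv'    WITH (FORMAT csv, HEADER)
```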
If you need PostgreSQL, you can create a PostgreSQL by Compose service ($17.50/mo for the first GB and $12 per extra GB).
We used PostgreSQL Studio and deployed it on IBM Bluemix. The database service was connected to the pgstudio interface (this restricts access to only the connected databases). We also had to make minor changes to pgstudio so that we could use pg_dump from the interface.
The result: we could manually dump the data. This solution works well, as we can take regular dumps (though manually).
In the free tier, you are right in saying that you can't get backups. Those features are available only in the Compose for PostgreSQL service - but that's a paid service.