Azure Synapse SQL CICD with multiple environments - azure-devops

Might be a stupid question, but it seems very difficult to find information about Synapse with multiple environments.
We have a dev/test/prod environment setup and need to create partially automated CI/CD pipelines between them. The only problem right now is that we cannot build dynamic SQL scripts that query the respective storage accounts, so that the scripts could be identical regardless of environment - dev Synapse using data from dev storage, and so on. A dedicated SQL pool can benefit from stored procedures, and I could pass parameters there if that works. But what about the serverless pool? What is the correct way?
I've tried looking at OPENROWSET with the DATA_SOURCE argument as well as the EXTERNAL DATA SOURCE expression, without any luck. Also, no one seems to offer any information about this, so I'm beginning to wonder whether this whole approach is wrong.
This kind of "external" file reading is new to me; I may have been trying to frame it in a SQL Server context in my head.
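For reference, this is roughly the pattern I have been trying to get working - just a sketch, where the storage account, container, and paths are placeholders:

-- One external data source per environment's serverless database, pointing at
-- that environment's storage account (names here are placeholders):
CREATE EXTERNAL DATA SOURCE EnvDataLake
WITH (LOCATION = 'https://devstorageaccount.dfs.core.windows.net/datalake');
GO
-- The query itself would stay identical across dev/test/prod, because it only
-- references the logical data source name and a relative path:
SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'sales/2021/*.parquet',
    DATA_SOURCE = 'EnvDataLake',
    FORMAT = 'PARQUET'
) AS rows;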
Thank you for your time!

Okay, the serverless pool does support both stored procedures and dynamic SQL, but you currently cannot call them straight from Synapse Pipelines.
You have to either trigger those procedures via Spark notebooks or create separate Synapse Analytics linked services for each of your databases in the serverless SQL pool and work from there.
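As a rough sketch of the procedure plus dynamic SQL route (all object and data source names here are hypothetical), something along these lines works in a serverless database:

-- The environment-specific data source name is passed in as a parameter and
-- spliced into dynamic SQL, since OPENROWSET does not accept DATA_SOURCE as a variable.
CREATE PROCEDURE dbo.usp_GetSales
    @data_source sysname
AS
BEGIN
    DECLARE @sql nvarchar(max) = N'
        SELECT TOP 100 *
        FROM OPENROWSET(
            BULK ''sales/2021/*.parquet'',
            DATA_SOURCE = ''' + @data_source + N''',
            FORMAT = ''PARQUET''
        ) AS rows;';
    EXEC sp_executesql @sql;
END;
GO
-- Called from whatever triggers it (e.g. a Spark notebook or a per-environment linked service):
EXEC dbo.usp_GetSales @data_source = N'EnvDataLake';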

Related

How to sync data from an Azure SQL managed instance to an on-premises SQL Server instance on a weekly basis

Please, what are my options in the following scenario?
We have a SQL managed instance on Azure, and our client has requested that their data (on the managed instance) sync on a weekly basis to their on-premises SQL Server. I suggested using Azure Data Sync, but that would be costly for us, and transactional replication as described at https://learn.microsoft.com/en-us/azure/azure-sql/managed-instance/replication-transactional-overview?view=azuresql is also costly, as it requires another managed instance to act as a broker.
I'm very open to any ideas you may have on how to go about this.
Thank you.

Azure Durable Function app with Postgres data store

We need to host an existing Azure Durable Functions app outside of Azure. We can run the function app as a container, but we'll need to configure an alternate data store (it currently uses Azure Storage). I can see that MS SQL is a supported alternative - see here - and this would work for us, but Postgres is more aligned with the direction we're heading, so it would be preferable. Has anyone used Postgres as the storage provider for Azure Durable Functions apps?
The language-specific steps for deploying outside of Azure are covered in https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-overview?tabs=python
For storage considerations, refer to https://learn.microsoft.com/en-us/azure/azure-functions/storage-considerations
But there is currently no specific, complete procedure for PostgreSQL; we are waiting for an update from Azure.

Synapse Analytics vs SQL Server 2019 Big Data Cluster

Could someone explain the difference between SQL Server 2019 BDC and Azure Synapse Analytics, other than the OLAP vs. OLTP differences? Why would one use Synapse Analytics over SQL Server 2019 BDC?
Azure Synapse Analytics is a cloud-based DWH with Data Lake, ADF, and Power BI designers tightly integrated. It is a PaaS offering and is not available on-prem. The DWH engine is MPP with limited PolyBase support (Data Lake).
It also allows you to provision Apache Spark if needed.
SQL Server 2019 Big Data Clusters is an IaaS platform based on Kubernetes. It can be implemented on-prem on VMs, on OpenShift, or on AKS (any cloud, for that matter).
Its data virtualization support is very good, with support for ODBC data sources and a data pool, implemented via PolyBase.
Apache Spark makes up the big data compute.
Though it is not an MPP system like Synapse, because of pods in Kubernetes, multiple pods can be created on the fly through scalability features such as VMSS, etc.
If you want analytical capability on-prem, you would use SQL Server 2019 BDC; if you want a cloud-based DWH with analytical capabilities, you would use Synapse.
"explain the difference between SQL Server 2019 BDC vs Azure Synapse Analytics"
SQL Server is OLTP and Synapse is OLAP. :D
"other than OLAP & OLTP differences? Why would one use Analytics over SQL Server 2019 BDC?"
Purely from a terminology point of view, their product management has no clue what they are doing.
"SQL Server" is a DIY/on-prem/managed-by-you DB.
The fully Azure-managed SaaS version of SQL Server is known as Azure SQL Database.
They also have "Azure SQL Managed Instance" and "SQL Server on Azure VM".
Azure Synapse has been renamed to dedicated SQL pools.
Azure Synapse On-demand has been renamed to serverless SQL pools.
Azure Synapse Analytics = dedicated + serverless + a bunch of ML services.
I'm going to answer assuming your question is:
Why would one use "Azure Synapse Dedicated or Serverless" over SQL Server?
SQL Server is on-prem DIY; the other is SaaS, fully managed by Azure. With this come all the pros/cons of SaaS: no CAPEX, no management, elastic, very large scale, ...
Synapse's USP is its MPP architecture, which SQL Server does not have, though I do see things like PolyBase and EXTERNAL TABLES being supported by SQL Server.
Due to the MPP architecture, Synapse's transactional performance is the worst I have seen by far. E.g. executing INSERT INTO xxx VALUES(...) to add one row via JDBC takes about 1-2 seconds, as against 10-12 seconds for importing CSV files with tens of thousands of rows using the COPY command. And INSERT INTO does not scale with JDBC batching: it takes about 100 seconds to insert 100 rows in one batch.
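For reference, the bulk path mentioned above looks roughly like this - a minimal sketch where the table, storage account, and credential details are placeholders:

-- Hypothetical bulk load into a dedicated SQL pool table: one COPY statement
-- for a whole folder of CSV files instead of row-by-row INSERTs.
COPY INTO dbo.FactSales
FROM 'https://mystorageaccount.blob.core.windows.net/staging/sales/*.csv'
WITH (
    FILE_TYPE = 'CSV',
    FIRSTROW = 2,                                 -- skip the header row
    CREDENTIAL = (IDENTITY = 'Managed Identity')  -- assumes managed identity access to the storage account
);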
It is not your fault that you are confused. IMO Azure Product Management for Databases (SQL Server, DW, ADP, Synapse, Analytics and the 10 other flavors of all these) have no clue what they want to offer 2 years from today. Every product boasts of Big Data, Massive this and that, ML and Analytics, Elastic this and that. Go figure.
PS: Check out Snowflake if you haven't.
I'm not affiliated with Microsoft or Snowflake.
I believe the user user3129206 is asking about
"SQL Server 2019 BDC vs Azure Synapse Analytics"
not
"SQL Server vs Azure Synapse Analytics"
so the first answer is the relevant one.
The only thing I'd argue is that BDC is also MPP-like, as Synapse is, because of the pods in Kubernetes - if implemented right, with many servers + HDFS.
I plan to test BDC on-premises and see how demanding the install and maintenance are.
The neat thing about BDC seems to be that it is easy to port it, partially or fully, from on-premises to Azure or any cloud.
It seems that BDC is both OLTP and OLAP, trying to provide the best of both worlds.
As I am on the same comparison quest, I'll try to get back and share what I learn.

Best practice for running database schema migrations

Build servers are generally detached from the VPC running the instance, be it Cloud Build on GCP or one of the many CI tools out there (CircleCI, Codeship, etc.), so running DB schema updates is particularly challenging.
So it makes me wonder... where is the best place to run database schema migrations?
From my perspective, there are four opportunities to automatically run schema migrations or seeds within a CD pipeline:
1. Within the build phase
2. On instance startup
3. Via a warm-up script (synchronously or asynchronously)
4. Via an endpoint, either automatically or manually called post-deployment
The primary issue with option 1 is security. With Google Cloud SQL / Google Cloud Build, it has been possible for me to run (with much struggle) schema migrations/seeds via a build step and the SQL proxy. To be honest, it was a total ball-ache to set up... but it works.
My latest project is using MongoDB, for which I've wired in migrate-mongo in case I ever need to move some data around or seed some data. Unfortunately, there is no equivalent of the SQL proxy to securely connect MongoDB (Atlas) to Cloud Build (or any other CI tool), as the build doesn't run in the instance's VPC. Thus, it's a dead end in my eyes.
I'm therefore warming (no pun intended) to the warm-up script concept.
With App Engine, the warm-up script is called before traffic is served, on a host which would already have access via the VPC. The warm-up script is meant to be used for opening database connections to speed up connectivity, but assuming there are no outstanding migrations, it would be doing exactly that - a very lightweight select statement.
Can anyone think of any issues with this approach?
Option 4 is also suitable (it's essentially the same thing). There may be a bit more protection required on these endpoints though - especially if a "down" migration script exists(!)
It's hard to answer because this is an opinion-based question!
Here are my thoughts on your propositions:
1. This is the best solution for me. Of course, you have to take care to only add fields and not delete existing schema fields. That way, you can update your schema during the build phase and then deploy; the new deployment will use the new schema, and the obsolete fields will simply no longer be used. On the next schema update, you will be able to delete those obsolete fields and clean up your schema. (A minimal illustration follows this list.)
2. This solution will hurt your cold-start performance. It's not a suitable solution.
3. Same remark as before, with the addition of being tied to App Engine's infrastructure and way of working.
4. No real advantage compared to solution 1.
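To make the additive-only idea in point 1 concrete, a minimal illustration (table and column names are hypothetical, and the exact SQL depends on your database):

-- Release N: add the new, nullable field only; nothing existing is touched,
-- so both the old and the new application versions keep working.
ALTER TABLE orders ADD COLUMN delivery_notes TEXT NULL;

-- Release N+1 (or later), once no deployed version reads or writes the old field:
-- ALTER TABLE orders DROP COLUMN legacy_notes;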
Regarding security, Cloud Build will be able to work with worker pools soon. It's still in alpha, but I expect an alpha release of it in the next month.

Convert Terraform Templates to Cloudformation Templates

I want to convert existing Terraform templates (HCL) to AWS CloudFormation templates (JSON/YAML).
I basically want to find security issues in these templates through cfn-nag.
An approach I have already tried was converting the HCL to JSON and then passing the template to cfn-nag, but it failed since the two formats have different structures.
Can anyone please provide any suggestions here?
A rather convoluted way of achieving this is to use Terraform to stand up actual AWS environments, and then to use AWS's CloudFormer to extract CloudFormation templates (JSON or YAML) from what Terraform has built, at which point you can use cfn-nag.
CloudFormer has some limitations, in that not all AWS resources are currently supported (RDS security groups, for example), but it will get you all the basic AWS resources.
Don't forget to remove all the environments, including CloudFormer's, to minimise the cost.
You want to use static code analysis to find security issues in your Terraform setup.
Converting Terraform to CloudFormation and then using cfn-nag is one way. However, there are now tools that operate directly on the Terraform setup.
I would recommend taking a look at terrascan. It is built on terraform_validate.
https://github.com/bridgecrewio/checkov/ runs security scanning for both Terraform and CloudFormation.