I'm looking for pipeline setup to purge data from RDS PostgreSQL instance using AzureDevOps pipeline. If anyone can share the idea or steps it would be very helpful.
Related
How do I create a hyperscale PostgreSQL DB from the Azure CLI?
Please note, I am not asking about Arc-enabled Kubernetes clusters.
The current documentation is for Portal only.
Is there a way to do this from the Azure CLI?
Furkan is here from Microsoft (PostgreSQL Hyperscale Citus team),
We currently do not support the Azure CLI based provisioning. However, we have rest APIs in private preview. You can see the API description here https://github.com/Azure/azure-rest-api-specs/tree/main/specification/postgresqlhsc/resource-manager
Is there any possibility to take the backup of mongodb which hosted in Azure-vm(ubuntu18.04)from azure services?
I already have the script to take the backup of mongodb and send it to azure blob storage.
But I don't want to take the backup in azure-VM server and send it to blob.
Instead is there any other way to take backup from any azure services?
I analyzed and found that we can migrate the database to azure cosmos and take backup. I can't afford to pay cost for that. So i want to know any other way to take backup directly from azure
We have a requirement that while provisioning the Databricks service thru CI/CD pipeline in Azure DevOps we should able to mount a blob storage to DBFS without connecting to a cluster. Is it possible to mount object storage to DBFS cluster by using a bash script from Azure DevOps ?
I looked thru various forums but they all mention about doing this using dbutils.fs.mount but the problem is we cannot run this command in Azure DevOps CI/CD pipeline.
Will appreciate any help on this.
Thanks
What you're asking is possible but it requires a bit of extra work. In our organisation we've tried various approaches and I've been working with Databricks for a while. The solution that works best for us is to write a bash script that makes use of the databricks-cli in your Azure Devops pipeline. The approach we have is as follows:
Retrieve a Databricks token using the token API
Configure the Databricks CLI in the CI/CD pipeline
Use Databricks CLI to upload a mount script
Create a Databricks job using the Jobs API and set the mount script as file to execute
The steps above are all contained in a bash script that is part of our Azure Devops pipeline.
Setting up the CLI
Setting up the Databricks CLI without any manual steps is now possible since you can generate a temporary access token using the Token API. We use a Service Principal for authentication.
https://learn.microsoft.com/en-US/azure/databricks/dev-tools/api/latest/tokens
Create a mount script
We have a scala script that follows the mount instructions. This can be Python as well. See the following link for more information:
https://docs.databricks.com/data/data-sources/azure/azure-datalake-gen2.html#mount-azure-data-lake-storage-gen2-filesystem.
Upload the mount script
In the Azure Devops pipeline the databricks-cli is configured by creating a temporary token using the token API. Once this step is done, we're free to use the CLI to upload our mount script to DBFS or import it as a notebook using the Workspace API.
https://learn.microsoft.com/en-US/azure/databricks/dev-tools/api/latest/workspace#--import
Configure the job that actually mounts your storage
We have a JSON file that defines the job that executes the "mount storage" script. You can define a job to use the script/notebook that you've uploaded in the previous step. You can easily define a job using JSON, check out how it's done in the Jobs API documentation:
https://learn.microsoft.com/en-US/azure/databricks/dev-tools/api/latest/jobs#--
At this point, triggering the job should create a temporary cluster that mounts the storage for you. You should not need to use the web interface, or perform any manual steps.
You can apply this approach to different environments and resource groups, as do we. For this we make use of Jinja templating to fill out variables that are environment or project specific.
I hope this helps you out. Let me know if you have any questions!
We have a VM that runs a MongoDB and a web api to it.
Currently there is no backup solution enabled.
My goal is to protect data in MongoDB.
Is it a good idea to snapshot the whole Azure VM instance daily?
Or should I go for a database backup solution?
We are planning to go for PostgreSQL RDS in AWS environment. There are some files in S3 which we will need to load every week. I don't see any option in AWS documentation where we can load data from S3 to PostgreSQL RDS. I see it is possible for Aurora but cannot find anything for PostgreSQL.
Any help will be appreciated.
One option is to use AWS Data Pipeline. It's essentially a JSON script that allows you to orchestrate the flow of data between sources on AWS.
There's a template offered by AWS that's setup to move data between S3 and MySQL. You can find it here. You can easily follow this and swap out the MySQL parameters with those associated with your Postgres instance. Data Pipeline simply looks for RDS as the type and does not distinguish between MySQL and Postgres instances.
Scheduling is also supported by Data Pipeline, so you can automate your weekly file transfers.
To start this:
Go to the Data Pipeline service in your AWS console
Select "Build from template" under source
Select the "Load S3 to MySQL table" template
Fill in the rest of the fields and create the pipeline
From there, you can monitor the progress of the pipeline in the console!