I am attempting to use an AWS PostgreSQL RDS instance as my source for a data factory pipeline. I am unable to get this connection to work in ADF v1 or v2. I have tried everything from using a PostgreSQL connection to an Azure database for PostgreSQL. Essentially I am going cloud to cloud, and this connection doesn't seem to be supported yet. Has anyone else had any luck doing this?
Yes, this is horribly broken, as you have found out. Two main problems:
1) You must install the Npgsql 2.0.14.3 driver (choose the core installation option to ensure both the x86 and x64 versions are installed); this version won't validate the server certificate.
2) The PostgreSQL connector's connection information can only be entered by uploading the linked service definition via PowerShell; the current GUI does not support the full configuration of the data source.
Here is the example JSON:
{
"name": "PostgreSqlLinkedService",
"properties": {
"type": "PostgreSql",
"typeProperties": {
"server": "<server>",
"database": "<database>",
"username": "<username>",
"password": {
"type": "SecureString",
"value": "<password>"
}
},
"connectVia": {
"referenceName": "<name of Integration Runtime>",
"type": "IntegrationRuntimeReference"
}
}
}
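To actually copy data you also need a dataset that points at this linked service. This is not from the original answer, just a rough sketch of what it could look like in ADF v2; the generic RelationalTable type and the placeholder table name are assumptions (newer connector versions use a PostgreSqlTable dataset type instead):
{
  "name": "PostgreSqlDataset",
  "properties": {
    "type": "RelationalTable",
    "linkedServiceName": {
      "referenceName": "PostgreSqlLinkedService",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {
      "tableName": "<schema.table>"
    }
  }
}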
Alternatively, the ODBC driver can work around this issue, since you need to specify additional properties on the connection string that are not exposed by the PostgreSQL connector. You need to add the following values to the DSN:
sslmode=Require;Trust Server Certificate=true
and that should resolve the error.
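For reference, an ODBC linked service in ADF v2 would look roughly like this. This is a sketch rather than a verified configuration; the DSN name and the Basic authentication type are assumptions:
{
  "name": "PostgreSqlOdbcLinkedService",
  "properties": {
    "type": "Odbc",
    "typeProperties": {
      "connectionString": {
        "type": "SecureString",
        "value": "DSN=<your PostgreSQL DSN>;sslmode=Require;Trust Server Certificate=true"
      },
      "authenticationType": "Basic",
      "userName": "<username>",
      "password": {
        "type": "SecureString",
        "value": "<password>"
      }
    },
    "connectVia": {
      "referenceName": "<name of self-hosted Integration Runtime>",
      "type": "IntegrationRuntimeReference"
    }
  }
}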
Note: neither the ODBC nor the PostgreSQL connector currently works with the ADF v2 Lookup activity.
I am trying to restore a long-term retention (LTR) Azure Database for PostgreSQL backup using Azure Backup Vault, as described in these articles:
https://learn.microsoft.com/en-us/azure/backup/backup-azure-database-postgresql (backup configuration steps)
https://download.microsoft.com/download/9/1/9/91990314-33bd-4eaa-a084-d1f7e6175ee1/AzBkpPostgres_ManualPermissions.docx (indirectly linked by the above article)
The LTR backups are completing without issues; however, restoring them to the Azure Database for PostgreSQL resource leads to an "InvalidInputs" error:
With this in the Activity Log:
"properties": {
"statusMessage": "{\"status\":\"Failed\",\"error\":{\"code\":\"ResourceOperationFailure\",\"message\":\"The resource operation completed with terminal provisioning state 'Failed'.\",\"details\":[{\"code\":\"InvalidInputs\",\"message\":\"\",\"additionalInfo\":[{\"type\":\"UserFacingError\",\"info\":{\"message\":\"\",\"recommendedAction\":[\"\"],\"code\":\"InvalidInputs\",\"target\":\"\",\"isRetryable\":false,\"isUserError\":false,\"properties\":{\"ActivityId\":\"a7a2867e-8889-41c4-a5cf-37fd1394d3d6-Ibz\"}}}]}]}}",
"eventCategory": "Administrative",
"entity": "/subscriptions/XXXXXXXXXX/resourceGroups/poc-rg/providers/Microsoft.DataProtection/backupVaults/XXXXXXXXXXpoc-psql-bv-2/backupInstances/XXXXXXXXXXpoc-psql-2-backup_restore_test_2",
"message": "Microsoft.DataProtection/backupVaults/backupInstances/ValidateRestore/action",
"hierarchy": "30ff29b8-a165-42a0-a594-f726229a5954"
},
Restoring to an Azure Storage Account leads to this error:
"properties": {
"statusMessage": "{\"status\":\"Failed\",\"error\":{\"code\":\"ResourceOperationFailure\",\"message\":\"The resource operation completed with terminal provisioning state 'Failed'.\",\"details\":[{\"code\":\"SMAWTeeInternalError\",\"message\":\"Microsoft Azure Backup encountered an internal error.\",\"additionalInfo\":[{\"type\":\"UserFacingError\",\"info\":{\"message\":\"Microsoft Azure Backup encountered an internal error.\",\"recommendedAction\":[\"Wait for a few minutes and then try the operation again. If the issue persists, please contact Microsoft support.\"],\"code\":\"SMAWTeeInternalError\",\"target\":\"\",\"isRetryable\":false,\"isUserError\":false,\"properties\":{\"ActivityId\":\"a7a2867e-8889-41c4-a5cf-37fd1394d3dd-Ibz\"}}}]}]}}",
"eventCategory": "Administrative",
"entity": "/subscriptions/XXXXXXXXXX/resourceGroups/poc-rg/providers/Microsoft.DataProtection/backupVaults/XXXXXXXXXXpoc-psql-bv-2/backupInstances/XXXXXXXXXXpoc-psql-2-backup_restore_test_2",
"message": "Microsoft.DataProtection/backupVaults/backupInstances/ValidateRestore/action",
"hierarchy": "30ff29b8-a165-42a0-a594-f726229a5954"
},
I have tried with both Azure Database for PostgreSQL versions 10 and 11. The azure_backup role is granted the following permissions:
ALTER USER azure_backup WITH CREATEDB;
GRANT azure_pg_admin TO azure_backup;
Any insight or help is appreciated.
Thanks!
We have fixed this issue in the September 2021 update. Please retry with the latest version.
I am trying to deploy my FeathersJS app on Heroku with a MongoDB database. I used the mLab sandbox plan and did not configure anything on it. Since there is no documentation or any previous questions about it, here I am.
I have made a FeathersJS app that runs with MongoDB, but when I deploy it on Heroku I always get a 503 error, service unavailable (timeout).
Here is the relevant line in /config/default.json:
"mongodb": "mongodb://localhost:27017/api_feathers",
There may be something to change on this line.
And here is production.js:
{ "host": "api-feathers-app.feathersjs.com", "port": "PORT" }
As already pointed out, you can add the MongoDB URL to your production.json in the mongodb property. Heroku will also set MONGODB_URI for most MongoDB add-ons, which you can use by changing production.json to:
{
"host": "api-feathers-app.feathersjs.com",
"port": "PORT",
"mongodb": "MONGODB_URI"
}
If you log into mLab and go to your database, you should see a connection string. At the bottom of your production.json you should add a line to connect to the mLab database.
Something like:
{
"host": "api-feathers-app.feathersjs.com",
"port": "PORT",
"mongodb": "mongodb://USERNAME:PASSWORD#ADDRESS-AT-MLAB:PORT/DB_NAME"
}
That last line will be provided to you in the mLab settings. In addition, mLab may require a query string at the end of the MongoDB URL, such as ?ssl=true, but they will tell you what is necessary.
Log in to your mLab account and create a new environment if you have not created one already.
Then click on the database name. It will show you tabs containing Collections, Users, Stats, Backups and Tools.
Click on the Users tab and it will show you a list of users assigned to that db. If you haven't created a user for that, then you will have to create a user by clicking on the "Add database user" button.
Once the user has been created, add the following as the value of the mongodb key in your production.json file:
mongodb://<dbuser>:<dbpassword>@ds115749.mlab.com:15749/<dbname>
Replace <dbuser> with the database username you created earlier, <dbpassword> with the password associated with that user, and <dbname> with the name of the database you are trying to access.
Here's a full example of the production.json file:
{
"host": "https://your.domainname.com/",
"port": "PORT",
"public": "../public/",
"paginate": {
"default": 10,
"max": 1000
},
"mongodb": "mongodb://<dbuser>:<dbpassword>#ds115749.mlab.com:15749/<dbname>",
}
Please note that default.json is for your development environment, while production.json is for production and is the one used when you deploy; both files live in the same directory.
I need to get the execution plan of a query that I execute from Apache Drill using the PostgreSQL storage plugin (not the Drill execution plan, but the PostGIS one).
So I enabled the explain-plan logging with the following commands:
SET auto_explain.log_min_duration = 0;
SET auto_explain.log_analyze = true;
And if I execute a query from pgAdmin, it shows the statement and the plan. But if I execute the same query from Drill, it does not log anything.
Do you know why this happens and how this situation can be solved?
Note: I checked the connection and it's OK; the settings are the same in pgAdmin and Drill, and in Drill I can execute queries and get results, so I assume there are no connection problems.
I suspect you are executing the SET commands from a postgres command line, so those settings only apply to that session, not to the sessions Drill opens. To apply those settings to Drill's postgres sessions, try adding those properties to Drill's storage plugin configuration; the PostgreSQL JDBC driver lets you pass server settings through its options URL parameter. Here is an example configuration for those properties:
{
"type": "jdbc",
"driver": "org.postgresql.Driver",
"url": "jdbc:postgresql://localhost:5432;auto_explain.log_min_duration=0;auto_explain.log_analyze=true",
"username": "postgres",
"enabled": true
}
I created a ClearDB MySQL database on Bluemix. I got all the data to connect to it, but I can't do it.
I tried to use SQuirreL and I always got a connection timeout.
So I wanted to use the web interface, but it asks me for a login/password because the area seems restricted.
I got this message in a popup:
A username and password are being requested by https://bluemix-eu-gb.marketplace.ibmcloud.com. The site says: "Restricted Site"
I tried to fill in the fields with the credentials I found to connect to the DB: it failed. I also tried the login/password for Bluemix: failed too...
I tried many other things, but not the right one.
Maybe I missed something when I created the DB or... I don't know.
Could anyone give me the (obvious?) trick?
I used SQuirreL and it works fine for me to connect to the ClearDB MySQL database instance.
Here are the steps I followed:
1) Bind your ClearDB MySQL instance to an application (this step is only needed so you can find your database credentials)
2) In the Bluemix UI, select the application you created above; this will open the application dashboard
3) Locate the ClearDB MySQL instance tile and select "Show Credentials" at the bottom of the tile. You should see something similar to this (for privacy I changed the IDs and password below):
{
"cleardb": [
{
"name": "ClearDB MySQL Database-29",
"label": "cleardb",
"plan": "spark",
"credentials": {
"jdbcUrl": "jdbc:mysql://us-cdbr-iron-east-04.cleardb.net/DATABASENAME?user=USERNAME&password=PASSWORD",
"uri": "mysql://USERNAME:PASSWORD#us-cdbr-iron-east-04.cleardb.net:3306/DATABASENAME?reconnect=true",
"name": "DATABASENAME",
"hostname": "us-cdbr-iron-east-04.cleardb.net",
"port": "3306",
"username": "USERNAME",
"password": "PASSWORD"
}
}
]
}
4) Create the alias in SQuirreL and use the values above in the fields; please note that your server name could be different from mine:
Name: MyAliasName
Driver: MySQL Driver
URL: jdbc:mysql://us-cdbr-iron-east-04.cleardb.net:3306/DATABASENAME
User Name: USERNAME
Password: PASSWORD
5) Click the Test button to check your connection. Everything should be fine.
I'm using the AWS Data Pipeline service to pipe data from an RDS MySQL database to S3 and then on to Redshift, which works nicely.
However, I also have data living in an RDS Postgres instance which I would like to pipe the same way, but I'm having a hard time setting up the JDBC connection. If this is unsupported, is there a workaround?
"connectionString": "jdbc:postgresql://THE_RDS_INSTANCE:5432/THE_DB”
Nowadays you can define a copy activity to extract data from a Postgres RDS instance into S3. In the Data Pipeline interface (a rough sketch of the resulting pipeline definition follows after these steps):
Create a data node of the type SqlDataNode. Specify the table name and select query.
Set up the database connection by specifying the RDS instance ID (the instance ID is in your URL, e.g. your-instance-id.xxxxx.eu-west-1.rds.amazonaws.com) along with the username, password and database name.
Create a data node of the type S3DataNode.
Create a Copy activity and set the SqlDataNode as input and the S3DataNode as output.
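This is not from the original answer, but roughly what the resulting pipeline definition could look like when exported as JSON; the object IDs and placeholder values are assumptions, so double-check the field names against the Data Pipeline object reference:
{
  "objects": [
    {
      "id": "rds_postgres",
      "type": "RdsDatabase",
      "rdsInstanceId": "your-instance-id",
      "databaseName": "THE_DB",
      "username": "THE_USER",
      "*password": "THE_PASSWORD"
    },
    {
      "id": "source_table",
      "type": "SqlDataNode",
      "database": { "ref": "rds_postgres" },
      "table": "blahs",
      "selectQuery": "select * from blahs"
    },
    {
      "id": "s3_output",
      "type": "S3DataNode",
      "directoryPath": "s3://my-bucket/blahs/"
    },
    {
      "id": "copy_to_s3",
      "type": "CopyActivity",
      "runsOn": { "ref": "MyEC2Resource" },
      "input": { "ref": "source_table" },
      "output": { "ref": "s3_output" }
    }
  ]
}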
This doesn't work yet; AWS hasn't built/released the functionality to connect nicely to Postgres. You can do it in a ShellCommandActivity, though. You can write a little Ruby or Python code to do it and drop that in a script on S3 using scriptUri (a sketch of that variant follows after the example below). You could also just write a psql command to dump the table to a CSV and then pipe that to OUTPUT1_STAGING_DIR with staging enabled ("stage": "true") in that activity node.
Something like this:
{
"id": "DumpCommand",
"type": "ShellCommandActivity",
"runsOn": { "ref": "MyEC2Resource" },
"stage": "true",
"output": { "ref": "S3ForRedshiftDataNode" },
"command": "PGPASSWORD=password psql -h HOST -U USER -d DATABASE -p 5432 -t -A -F\",\" -c \"select blah_id from blahs\" > ${OUTPUT1_STAGING_DIR}/my_data.csv"
}
I didn't run this to verify because it's a pain to spin up a pipeline :( so double-check the escaping in the command.
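If you'd rather keep the dump logic in a script on S3 (the scriptUri route mentioned above), the activity could look roughly like this; the bucket path and script name are made up for illustration:
{
  "id": "DumpCommandFromScript",
  "type": "ShellCommandActivity",
  "runsOn": { "ref": "MyEC2Resource" },
  "stage": "true",
  "output": { "ref": "S3ForRedshiftDataNode" },
  "scriptUri": "s3://my-bucket/scripts/dump_blahs.sh"
}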
Pros of the inline command approach: it's super straightforward and requires no additional script files to upload to S3.
Cons: not exactly secure; your DB password will be transmitted over the wire without encryption.
Look into the new stuff AWS just launched on parameterized templating for Data Pipeline: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-custom-templates.html. It looks like it will allow encryption of arbitrary parameters.
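Going from memory of the linked docs (so treat the exact attribute names as assumptions and verify them there), a parameterized template would move the password out of the command into a parameters/values section, roughly like this, and reference it in an object as "#{*myDbPassword}"; the leading * marks the value as one to keep hidden:
{
  "parameters": [
    {
      "id": "*myDbPassword",
      "type": "String",
      "description": "Database password"
    }
  ],
  "values": {
    "*myDbPassword": "THE_PASSWORD"
  }
}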
AWS now allows partners to do near-real-time RDS -> Redshift inserts.
https://aws.amazon.com/blogs/aws/fast-easy-free-sync-rds-to-redshift/