Cloud PostgreSQL: clean large objects with vacuumlo

We are using GCP Cloud SQL for our PostgreSQL database. One of our applications uses large objects, and I was wondering how to perform a vacuumlo operation on such a platform (the question probably applies equally to AWS RDS or any other cloud PostgreSQL provider).
Is writing custom queries/procedures to perform the same task the only solution?

Since vacuumlo is a client tool, it should work just fine with hosted databases.
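vacuumlo takes the same connection options as the other client tools (host, port, user), so it can be run from any machine that is allowed to connect to the instance. If the client tool is not available, a query-based fallback is possible too; below is a minimal, hypothetical sketch of that approach using psycopg2, which assumes a single table documents whose oid column doc_lo holds every legitimate large object reference (vacuumlo itself scans all oid/lo columns in the database) and uses placeholder connection details:

```python
# Hypothetical sketch: unlink large objects that no application row references.
# Assumes a single table "documents" with an oid column "doc_lo"; real schemas may
# reference large objects from several columns, which vacuumlo handles automatically.
import psycopg2

conn = psycopg2.connect(
    host="your-cloud-sql-ip",  # placeholder connection details
    dbname="appdb",
    user="postgres",
    password="secret",
)
conn.autocommit = True
with conn.cursor() as cur:
    # pg_largeobject_metadata lists every large object; lo_unlink() deletes one.
    cur.execute("""
        SELECT lo_unlink(l.oid)
        FROM pg_largeobject_metadata l
        WHERE NOT EXISTS (
            SELECT 1 FROM documents d WHERE d.doc_lo = l.oid
        )
    """)
    print(f"Unlinked {cur.rowcount} orphaned large objects")
conn.close()
```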

Related

Best way to set up a Jupyter notebook project in AWS

My current project has the following structure:
It starts with a script in a Jupyter notebook that downloads data from a CRM API into a local PostgreSQL database I run with pgAdmin. It then runs a cluster analysis, returns some scoring values, creates a table in the database with the results, and updates those values in the CRM with another API call. This process takes between 10 and 20 hours (the API only allows 400 requests per minute).
The second notebook reads the database, detects the last update, runs API calls to bring the database up to date since that last run, runs a k-means analysis to cluster the data, compares the results with the previous run, and updates the new values in the database and in the CRM via the API. This second process takes less than 2 hours by my estimate, and I want this script to run every 24 hours.
After testing, this works fine. Now I'm evaluating how to put it into production on AWS. I understand that for the notebooks I need SageMaker, and from what I have seen that is not too complicated; my only doubt is whether I can call the API without writing additional code or whether some configuration is needed. My second problem is the database. I don't understand the difference between RDS, which is the one I think I have to use for this, and Aurora or S3. My goal is to write as little code as possible, but I have tried an RDS tutorial like this one: https://www.youtube.com/watch?v=6fDTre5gikg&t=10s, and I understand it connects my local PostgreSQL to AWS, yet I can't find the data in the AWS console, it only creates an instance, and I don't know how to connect to it to analyze the data from SageMaker. My final goal is to run the notebooks in the cloud and connect to my PostgreSQL in the cloud. Any orientation on how to use these tools would be appreciated.
I don't understand the difference between RDS, which is the one I think I have to use for this, and Aurora or S3
RDS and Aurora are relational database services fully managed by AWS. "Regular" RDS lets you launch the existing popular engines, such as MySQL, PostgreSQL, and others, the same databases you could run at home or at work.
Aurora is AWS's in-house, cloud-native database implementation, compatible with MySQL and PostgreSQL. It can store the same data as RDS MySQL or PostgreSQL, but provides a number of features not available in plain RDS, such as more read replicas, distributed storage, global databases, and more.
S3 is not a database but an object store, where you keep files such as images, CSVs, and Excel sheets, much as you would store them on your computer.
I understand this connects my local PostgreSQL to AWS but I can't find the data in the AWS console, it only creates an instance
You can migrate your data from your local PostgreSQL to RDS or Aurora if you wish, but neither RDS nor Aurora will connect to your existing local database, as they are databases themselves.
My final goal is to run the notebooks in the cloud and connect to my postgres in the cloud.
I don't see a reason why you wouldn't be able to connect to the database. Try to make it work, and if you run into difficulties you can ask a new question on SO with the details of your RDS/Aurora setup.
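For the "connect to my PostgreSQL in the cloud" part, a minimal, hypothetical sketch of what that could look like from a SageMaker notebook is below. The endpoint, database name, and credentials are placeholders, psycopg2 and pandas are assumed to be installed in the notebook environment, and the RDS security group has to allow traffic from the notebook:

```python
# Hypothetical sketch: query an RDS PostgreSQL instance from a SageMaker notebook.
# All connection values are placeholders; networking (VPC/security group) must
# already allow the notebook to reach the RDS endpoint.
import pandas as pd
import psycopg2

conn = psycopg2.connect(
    host="mydb.xxxxxxxx.eu-west-1.rds.amazonaws.com",  # RDS endpoint (placeholder)
    port=5432,
    dbname="crm",
    user="notebook_user",
    password="secret",
)

# Read the results table produced by the first notebook straight into a DataFrame.
scores = pd.read_sql("SELECT * FROM cluster_scores", conn)
conn.close()
print(scores.head())
```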

For Db2 on Cloud, are things like runstats and reorgchk/reorg done automatically?

I am seeing slow performance on a couple of queries that run against my Db2 on Cloud instance. When I had a local Db2, I would use these tools to see if I could improve performance. Now, with Db2 on Cloud, I believe I can run them using ADMIN_CMD; however, if they are already being run automatically on my database objects there is no point, and I am not sure how to tell.
Yes, Db2 on Cloud does run reorgs and runstats automatically. We do still recommend running them manually after heavy data loads, to improve performance.
As you stated, Db2 on Cloud is a managed (as-a-Service) database offering, but that covers the general operational side, not application-specific work. Backup/restore can be done without any application insight, whereas creating indexes, running runstats, or performing reorgs is application-specific.
Runstats can be invoked using admin_cmd. The same is true for running reorg on tables and indexes.
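As a rough, hypothetical sketch of what invoking them manually could look like from Python with the ibm_db driver (the DSN values and the table name are placeholders):

```python
# Hypothetical sketch: manually trigger RUNSTATS and REORG on a Db2 on Cloud table
# through SYSPROC.ADMIN_CMD. Requires the ibm_db driver; all values are placeholders.
import ibm_db

dsn = (
    "DATABASE=BLUDB;"
    "HOSTNAME=your-instance.db2.cloud.ibm.com;"
    "PORT=50001;"
    "SECURITY=SSL;"
    "UID=your_user;"
    "PWD=your_password;"
)
conn = ibm_db.connect(dsn, "", "")

# Refresh statistics so the optimizer has current information after big data loads.
ibm_db.exec_immediate(
    conn,
    "CALL SYSPROC.ADMIN_CMD('RUNSTATS ON TABLE MYSCHEMA.MYTABLE "
    "WITH DISTRIBUTION AND DETAILED INDEXES ALL')",
)

# Reorganize the table if REORGCHK suggests it is fragmented.
ibm_db.exec_immediate(conn, "CALL SYSPROC.ADMIN_CMD('REORG TABLE MYSCHEMA.MYTABLE')")

ibm_db.close(conn)
```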

Oracle FDW support in AWS

I have an OLTP DB in Oracle and a downstream OLAP System in PostgreSQL on-premises. The data from Oracle is pumped into PostgreSQL using Oracle_FDW.
I am exploring the possibility of moving the PostgreSQL side to AWS, but none of the RDS offerings has oracle_fdw capability. One way out is to install PostgreSQL on an EC2 instance, but that would mean giving up features that RDS provides natively, such as read replicas. Is there a better workaround?
Also, is there a way to fetch data that lives in Oracle RDS from a PostgreSQL RDS instance in AWS?
With PostgreSQL on Amazon RDS your choice of extensions is limited to the extensions they explicitly support. As far as I'm aware there's no way around this limitation.
Like you mentioned, the general option in this case would be to host PostgreSQL yourself on EC2 instead of RDS. You lose the automatic backup/replication/management features, but you get the power and flexibility you need. This will certainly work, but it will require some legwork to replace what you're losing by not using RDS.
The only alternative I can think of is that you may be able to host a separate (otherwise empty) PostgreSQL server with the oracle_fdw extension installed and use the postgres_fdw extension (which is supported by RDS) to proxy requests from your RDS-hosted database, through that intermediate PostgreSQL database, to your Oracle database and back. If the amount of data you're retrieving is substantial, or the number of queries per minute is high, this is probably a terrible idea, but it might be worth testing to see if it works for your use case.
I did a quick search and couldn't find any references to anyone actually layering foreign data wrappers like this, but I also couldn't find anything in the manual or online saying it isn't supported. In theory it should work; if you do try it, make sure you test it thoroughly before relying on it for anything important.
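To make the idea concrete, here is a rough, hypothetical sketch of the RDS-side setup, written as a psycopg2 script. Every host, credential, and the oracle_orders table are placeholders, and the EC2-hosted proxy server is assumed to already expose the Oracle table through oracle_fdw:

```python
# Hypothetical sketch: point the RDS database (postgres_fdw only) at a proxy
# PostgreSQL server that has oracle_fdw installed and already exposes the Oracle
# table as public.oracle_orders. All names and credentials are placeholders.
import psycopg2

rds = psycopg2.connect(
    host="mydb.xxxxxxxx.eu-west-1.rds.amazonaws.com",  # RDS endpoint (placeholder)
    dbname="olap",
    user="rds_admin",
    password="secret",
)
rds.autocommit = True
with rds.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS postgres_fdw")
    cur.execute("""
        CREATE SERVER oracle_proxy FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host '10.0.1.25', port '5432', dbname 'proxydb')
    """)
    cur.execute("""
        CREATE USER MAPPING FOR CURRENT_USER SERVER oracle_proxy
        OPTIONS (user 'proxy_user', password 'proxy_secret')
    """)
    # Mirror the proxy's oracle_fdw foreign table. Every query against this table is
    # forwarded RDS -> proxy -> Oracle, so the extra network hop will cost time.
    cur.execute("""
        CREATE FOREIGN TABLE oracle_orders (
            order_id integer,
            amount   numeric
        ) SERVER oracle_proxy
          OPTIONS (schema_name 'public', table_name 'oracle_orders')
    """)
rds.close()
```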
Update: oracle_fdw is now supported in recent RDS for PostgreSQL versions - https://aws.amazon.com/about-aws/whats-new/2021/07/amazon-rds-for-postgresql-supports-oracle-fdw-extension-for-accessing-data-in-oracle-databases/

Setting up backup strategy for backing up postgresql database on cloud foundry

We have set up the community PostgreSQL service on Cloud Foundry (IBM Bluemix). This is a free service, and no automated backup and recovery is supported out of the box.
Is there a way to set up a standby server or a regular backup in case there is any data corruption/failure?
IBM Compose and ElephantSQL can provide this service at a cost, but we are not ready for that yet.
The community PostgreSQL offering is an experimental service, and it does not have a dashboard or the other advanced features (daily backups, for example) that you can find in the services you mentioned. If you want a backup, you could write an ad-hoc script that saves/exports all the tables you need and run it every day.
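As a rough, hypothetical sketch of such an ad-hoc script: the connection values would come from the service's credentials (placeholders here), pg_dump has to be available wherever the script runs, and the resulting file still needs to be copied somewhere durable:

```python
# Hypothetical sketch: dump the Cloud Foundry PostgreSQL service once a day.
# Connection values are placeholders taken from the service credentials.
import datetime
import os
import subprocess

creds = {
    "host": "xx.xx.xx.xx",
    "port": "5432",
    "user": "service_user",
    "password": "service_password",
    "dbname": "service_db",
}

stamp = datetime.date.today().isoformat()
outfile = f"backup-{creds['dbname']}-{stamp}.dump"

subprocess.run(
    [
        "pg_dump",
        "-h", creds["host"],
        "-p", creds["port"],
        "-U", creds["user"],
        "-Fc",            # custom format, restorable with pg_restore
        "-f", outfile,
        creds["dbname"],
    ],
    env={**os.environ, "PGPASSWORD": creds["password"]},
    check=True,
)
print(f"Wrote {outfile}")
```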
If you need a managed PostgreSQL you can create a Compose for PostgreSQL service ($17.50/mo for the first GB and $12 for each extra GB).
We used PostgreSQL Studio and deployed it on IBM Bluemix. The database service was connected to the pgstudio interface (this restricts access to only the connected databases). We also had to make minor changes to pgstudio so that we could use pg_dump from the interface.
The result: we could manually dump the data. This solution works well, as we could take regular dumps (though manually).
In the free tier you are right in saying that you can't get backups. Those features are available only in the Compose for PostgreSQL service, but that's a paid service.

How to replicate MySQL database to Cloud SQL Database

I have read that you can replicate a Cloud SQL database to MySQL. Instead, I want to replicate from a MySQL database (which the business uses to keep inventory) to Cloud SQL, so the web site can show up-to-date inventory levels.
Is it possible to replicate from MySQL to Cloud SQL? If so, how do I configure that?
This is something that is not yet possible in CloudSQL.
I'm using DBSync to do it, and it's working fine.
http://dbconvert.com/mysql.php
The Sync version does what you want.
It works well with App Engine and Cloud SQL. You must authorize external connections first.
This is a rather old question, but it might be worth noting that this now seems possible by configuring an external master.
The high-level steps are:
Create a dump of the data from the master and upload the file to a storage bucket (see the sketch at the end of this answer)
Create a master instance in Cloud SQL
Set up a replica of that instance, using the external master IP, username, and password; also provide the dump file location
Set up additional replicas if needed
Voilà!
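As a rough, hypothetical sketch of the first step (dumping the external master and uploading the file to a bucket): all names are placeholders, mysqldump and gsutil are assumed to be available on a machine that can reach the on-premises master, and the exact dump flags Cloud SQL expects are listed in its external-replica documentation:

```python
# Hypothetical sketch: dump the external MySQL master and copy the file to the
# Cloud Storage bucket that the Cloud SQL replica will import from.
# All hosts, credentials, database and bucket names are placeholders.
import os
import subprocess

dump_file = "inventory-master.sql"

with open(dump_file, "w") as out:
    subprocess.run(
        [
            "mysqldump",
            "--host=onprem-mysql.example.com",
            "--user=repl_user",
            "--databases", "inventory",
            "--single-transaction",  # consistent snapshot without locking the tables
            "--hex-blob",
        ],
        env={**os.environ, "MYSQL_PWD": "secret"},
        stdout=out,
        check=True,
    )

# Upload the dump to the bucket referenced when configuring the replica.
subprocess.run(["gsutil", "cp", dump_file, "gs://my-replication-bucket/"], check=True)
```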