Monitoring Queries on Postgres - postgresql

I have a web application that runs queries against an RDS Postgres database. We use trunk-based development, and our developers can and should deploy anything on the master branch directly to production. During the day, when the workload is low, we don't see any performance degradation on the database, but at night (we operate a courier service), when we experience a huge workload, performance can degrade.
My question is: How should I monitor this kind of behaviour?
I don't want to require a stress test before every deployment to production.
I would like a tool that monitors our database and tells us something like: "Take care! You have a new query (or a slow query) on your database caused by Pull Request 1234".

If you are on RDS for PostgreSQL 10, or can upgrade to that version, then you can use Performance Insights to monitor your running instance, to see which queries are generating load on your instance, and what wait states those queries are in. You can find more info here: https://aws.amazon.com/rds/performance-insights/
Full disclosure: I am the Product Manager for Amazon Aurora PostgreSQL, which was the first db engine to support Performance Insights.

The simple solution is to use the pg_stat_statements extension. It can show you at a glance which queries consumed the most total run time.
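As a rough illustration of how pg_stat_statements could feed the kind of alert asked for in the question, here is a minimal Python sketch that compares two snapshots of the view, for example one taken before and one after a deployment. The `Stat` class, the `diff_snapshots` function, and the 100 ms threshold are hypothetical names and values chosen for this example, and the snapshot query assumes the `total_time` column (PostgreSQL 13 renamed it to `total_exec_time`). It does not attribute a query to a pull request; that mapping would have to come from your own deployment records.

```python
from dataclasses import dataclass

# Query one might run to take a snapshot. Assumption: a PostgreSQL version
# older than 13, where the column is total_time (13+ calls it total_exec_time).
SNAPSHOT_SQL = """
SELECT queryid, query, calls, total_time
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 50
"""


@dataclass(frozen=True)
class Stat:
    """One row of a pg_stat_statements snapshot (hypothetical helper type)."""
    queryid: int
    query: str
    calls: int
    total_ms: float  # cumulative execution time in milliseconds


def diff_snapshots(before, after, slow_ms=100.0):
    """Return (new, slow) lists of Stat rows.

    new:  queries present in `after` but not in `before`.
    slow: known queries whose mean time per call between the two
          snapshots exceeds `slow_ms` (an arbitrary example threshold).
    """
    known = {s.queryid: s for s in before}
    new, slow = [], []
    for s in after:
        prev = known.get(s.queryid)
        if prev is None:
            new.append(s)
            continue
        calls = s.calls - prev.calls
        if calls > 0 and (s.total_ms - prev.total_ms) / calls > slow_ms:
            slow.append(s)
    return new, slow
```

The rows returned by `SNAPSHOT_SQL` would be turned into `Stat` objects by whatever database driver you use; the comparison itself needs no database connection.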

Related

Best practice for running database schema migrations

Build servers are generally detached from the VPC running the instance, be it Cloud Build on GCP or one of the many CI tools out there (CircleCI, Codeship, etc.), so running DB schema updates is particularly challenging.
So, it makes me wonder: when is the best time to run database schema migrations?
From my perspective, there are four opportunities to automatically run schema migrations or seeds within a CD pipeline:
1. Within the build phase
2. On instance startup
3. Via a warm-up script (synchronously or asynchronously)
4. Via an endpoint, either automatically or manually called post deployment
The primary issue with option 1 is security. With Google Cloud SQL and Google Cloud Build, I've managed (with much struggle) to run schema migrations/seeds via a build step and a SQL proxy. To be honest, it was a total ball-ache to set up... but it works.
My latest project uses MongoDB, for which I've wired in migrate-mongo in case I ever need to move or seed some data. Unfortunately there is no equivalent of the SQL proxy to securely connect MongoDB (Atlas) to Cloud Build (or any other CI tool), as the build doesn't run in the instance's VPC. Thus, it's a dead end in my eyes.
I'm therefore warming (no pun intended) to the warm-up script concept.
With App Engine, the warm-up script is called prior to traffic being served, and on the host which would already have access via the VPC. The warmup script is meant to be used for opening up database connections to speed up connectivity, but assuming there are no outstanding migrations, it'd be doing exactly that - a very light-weight select statement.
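To make the warm-up idea concrete, here is a minimal Python sketch of a warm-up routine that applies any outstanding migrations and otherwise costs only one light-weight select. It uses the standard-library `sqlite3` module purely as a stand-in for the real database; the `MIGRATIONS` list, the `schema_migrations` table, and the `warm_up` function are hypothetical names for illustration, not part of App Engine or migrate-mongo.

```python
import sqlite3

# Hypothetical migrations, ordered by version number.
MIGRATIONS = [
    (1, "CREATE TABLE parcels (id INTEGER PRIMARY KEY)"),
    (2, "ALTER TABLE parcels ADD COLUMN status TEXT"),
]


def warm_up(conn):
    """Apply pending migrations and return how many were applied.

    When nothing is pending, the only work done is the version lookup.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (version INTEGER PRIMARY KEY)"
    )
    row = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM schema_migrations"
    ).fetchone()
    current = row[0]

    applied = 0
    for version, sql in MIGRATIONS:
        if version <= current:
            continue  # already applied on an earlier warm-up
        conn.execute(sql)
        conn.execute(
            "INSERT INTO schema_migrations (version) VALUES (?)", (version,)
        )
        conn.commit()
        applied += 1
    return applied
```

One thing this sketch does not handle: if several instances warm up at the same time, they could try to apply the same migration concurrently, so a real implementation would likely need some form of lock around the loop.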
Can anyone think of any issues with this approach?
Option 4 is also suitable (it's essentially the same thing). There may be a bit more protection required on these endpoints though - especially if a "down" migration script exists(!)
It's hard to answer because it's an opinion-based question!
Here are my thoughts on your proposals:
1. It's the best solution for me. Of course, you have to take care to only add fields and not to delete existing schema fields. That way, you can update your schema during the build phase and then deploy. The new deployment will use the new schema, and the obsolete fields will no longer be used. On the next schema update, you will be able to delete those obsolete fields and clean up your schema.
2. This solution will hurt your cold start performance. It's not a suitable solution.
3. Same remark as before, and in addition it ties you to the App Engine infrastructure and its way of working.
4. No real advantage compared to solution 1.
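For the first point above, the "only add, never delete" rule could be checked automatically before a migration runs in the build phase. The following is a minimal Python sketch under stated assumptions: the `destructive_statements` function is hypothetical, treating `DROP`, `RENAME`, and `TRUNCATE` as destructive is my own choice, and splitting on semicolons is naive (it would break on semicolons inside string literals).

```python
import re

# Assumption: any statement containing one of these keywords is destructive.
DESTRUCTIVE = re.compile(r"\b(DROP|RENAME|TRUNCATE)\b", re.IGNORECASE)


def destructive_statements(migration_sql):
    """Return the statements in a migration script that are not purely additive."""
    # Naive split: good enough for simple scripts, not a real SQL parser.
    statements = [s.strip() for s in migration_sql.split(";") if s.strip()]
    return [s for s in statements if DESTRUCTIVE.search(s)]
```

A build step could fail when the returned list is non-empty, leaving the clean-up of obsolete fields to a later, deliberately reviewed migration.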
About security: Cloud Build will soon be able to work with worker pools. The feature is still in alpha, but I expect an alpha release of it within the next month.

for db2 on cloud, are things like runstats and reorgchk/reorg done automatically?

I am seeing some slow performance on a couple of my queries that run against my db2 on cloud instance. When I had a local db2, I would try these tools to see if I could improve performance. Now, with db2 on cloud, I believe I can run them using admin_cmd, however, if they are already being run automatically on my db objects, there is no point, but I am not sure how to tell.
Yes, Db2 on Cloud runs reorgs and runstats automatically. We do recommend running them manually if you are doing a lot of data loads, to improve performance.
As you stated, Db2 on Cloud is a managed (as a Service) database offering. But that covers only the general administration, not application-specific work. Backup and restore can be done without any application insight, but creating indexes, running runstats or performing reorgs is application-specific.
Runstats can be invoked using admin_cmd. The same is true for running reorg on tables and indexes.
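As a small illustration of what those calls look like, here is a Python sketch that builds the `CALL SYSPROC.ADMIN_CMD(...)` statements for a given table. The helper names (`admin_cmd`, `runstats`, `reorg`) and the `MYSCHEMA.ORDERS` table in the usage note are hypothetical, and the RUNSTATS options shown are just one common choice; the resulting strings would be executed through whatever Db2 client or driver you already use.

```python
def admin_cmd(command):
    """Wrap a Db2 command in a CALL to the SYSPROC.ADMIN_CMD procedure."""
    escaped = command.replace("'", "''")  # escape single quotes for the SQL literal
    return f"CALL SYSPROC.ADMIN_CMD('{escaped}')"


def runstats(schema, table):
    """Build a RUNSTATS call; ADMIN_CMD expects a fully qualified table name."""
    return admin_cmd(
        f"RUNSTATS ON TABLE {schema}.{table} "
        "WITH DISTRIBUTION AND DETAILED INDEXES ALL"
    )


def reorg(schema, table):
    """Build a REORG TABLE call for the given table."""
    return admin_cmd(f"REORG TABLE {schema}.{table}")
```

For example, `runstats("MYSCHEMA", "ORDERS")` produces the statement `CALL SYSPROC.ADMIN_CMD('RUNSTATS ON TABLE MYSCHEMA.ORDERS WITH DISTRIBUTION AND DETAILED INDEXES ALL')`.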

MongoDB replication to hard drive

MongoDB's dynamic schema design is driving me towards it to replace MySQL in a production site. But this project runs on only 1 dedicated server (with 2 hard drives).
The docs about "MongoDB for production" recommend multiple servers. This makes me wonder if MongoDB is only suited for large commercial projects?
Anyway... I am wondering if the live database data can be replicated to the second hard drive for backup & recovery (to recover from data corruption caused by a hard stop).
Any arguments against the use of MongoDB in a single-server environment are also appreciated. In this project, the biggest database will be less than 7GB.
Thanks

how to monitor a Heroku postgres database

NewRelic gives nice database analyses, however it seems to track only the web app's transactions.
I have independently managed servers which query and load my Heroku postgresql database. Is there a way I can get diagnostics and analysis of the database activity so that it will include all connections to it?
New Relic application monitoring will only collect data on database queries that are part of a web transaction or background task that is being monitored. If you're using one of New Relic's supported languages to query your database, you may be able to track that code as a background task (see https://newrelic.com/docs/features/monitoring-background-processes). If you would like a general monitoring plugin for your postgresql database, you could check out the postgresql plugin for New Relic (created and supported by Boundless): http://newrelic.com/plugins/boundless/109.
You should also try Heroku PG Extras: https://github.com/heroku/heroku-pg-extras. That will give info about cache hit, indexes, long queries, etc.

MongoDB on Azure Cloud

Is MongoDB for Azure production-ready?
Can anyone share some experience with it?
I don't yet feel comfortable using it in production.
What do you think?
Edit: Since there was a misunderstanding of my question, I will try to restate it.
What I am looking for from the community is someone who is running Mongo on Windows Azure and can share their experience with it.
By experience I don't mean how to run it in the cloud (we already have the manual in 10gen's FAQ) nor how many bugs it has (we can see that in the mongo-azure Jira).
What I want to know is: how is the performance?
Are there any problems (side effects) from running MongoDB on Azure?
How does MongoDB handle VM recycling?
Has anyone tried sharding?
In the end, is the mongo-azure worker role from 10gen stable enough to use in production?
Hope this clears it up.
A bit of clarification here. MongoDB itself is production-ready. And MongoDB works just fine in Windows Azure, as long as you set up the scaffolding to get it to work in the environment. This typically entails setting up an Azure Drive, to give you durable storage. Alternatively, using a replicaset, you effectively have eventual consistency across the set members. Then, you could consider going with a standalone (or standalone with hot standby). Personally, I prefer a replicaset model, and that's typical guidance for production MongoDB systems.
As far as 10gen's support for Windows Azure: While the page @SyntaxC4 points to does clarify the wrapper is in a preview state, note that the wrapper is the scaffolding code that launches MongoDB. This scaffolding was initially released in December 2011, and has had a few tweaks since then. It uses the production MongoDB bits (and works just fine with version 2.0.5 which was published on May 9). One caveat is that the MongoDB replicaset roles are deployed alongside your application's roles, since the client app needs visibility to all replica set nodes (to properly build the set). To avoid this limitation, you'd need to run mongos as the entry point (and that's not part of 10gen's scaffolding solution).
Forgetting the preview scaffolding a moment: I have customers running MongoDB in production, with custom scaffolding. One of them is running a rather large deployment, with multiple shards, using a replicaset per shard.
So... does it work in Windows Azure? Yes. Should you take advantage of 10gen's supplied scaffolding? If you're just looking for a simple way to launch a replicaset, I think it's fine. If you want a standalone model, or a shard model, or if you need a separate deployment for MongoDB, you'd currently need to do this on your own (or modify the project 10gen published).
MongoLab is now offering Mongo as a service on Azure (see the MongoLab blog).
A free demo account with 0.5 GB of storage is available in the Windows Azure Store.
The warning message on their site says that it's a preview. This would mean that there would be no support for it at a product level in Windows Azure.
If you want to form your own opinion on a comfort level, you can take a look at their bug tracking system and get a feeling for what people are currently reporting as issues.