Big Latency using Azure Database for PostgreSQL flexible server - postgresql

I have a PostgreSQL database deployed on Azure using Azure Database for PostgreSQL flexible server.
I have the same database created locally and queries that takes 70ms locally, takes 800ms on the database deployed on Azure.
The latency happens if I query the database from my local machine or from an app deployed in Azure Service.
Any clue what may be the issue or how this can be improved?
The compute tier I am using in Azure is the following (I know it's the worst one but I still think it shouldn't take almost a second to query a table that has 50 records in it and two columns):
Burstable (1-2 vCores) - Best for workloads that don’t need the full
CPU continuously
Standard_B1ms (1 vCore, 2 GiB memory, 640 max iops)
This is what Azure is showing me in the overview (so that's why I am guessing the problem should be network related and not CPU/memory related):

Burstable sku's are for testing and small workloads changing to General purpose will improve.

Related

How to downsize an AWS RDS instance to free tier

I want to create a free tier clone of a production AWS RDS PostgreSQL. As per my understanding, following are different ways
create a snapshot of the production DB and restore it on t2.micro
create a read replica of the production DB using t2.micro and then detach it as independent database
create a free tier database and restore a database dump of the production db
Option 3 is my last preference.
The problem is while creating read replica or restoring from snapshot, AWS doesn't explicitly allow to choose the free tier template. I just want to know if restoring to t2.micro without any advanced features like autoscaling, performance monitoring etc. is equivalent to free tier or not? I read here that the key thing with AWS production DB is that AWS provisions a secondary database provisioned to fallback in event of failure of the primary database or the Availability Zone in which the database is running.
AWS Free Tier doesn't actually care about the kind of service you use. Per their website you just get 750 instance hours per month for a db.t2.micro.
You can use these in any service you see fit and the discount will be applied automatically for the first 12 months.
Looking at the pricing page for RDS Postgres I can see, that these instances aren't listed anymore, which seems weird. The t2 instance family is fairly old, so they're probably trying to phase it out, but typically you can provision older instance types using the API directly if they're not available in the Console.
So what you want to do is create your db.t2.micro instance using one of the SDKs or the AWS CLI and restore from a snapshot. Alternatively you can create a read replica from the CLI and set the class to db.t2.micro. Later detaching that from the main cluster should work.
The production ready stuff refers to the Multi-AZ deployment, which is good for production use, but for anything production related a t2.micro seems like a bad choice, so I'm going to assume you're not planing to do that.

How can I deploy Mongo database on AWS?

I am building my own webapp which requires a huge database. I want to build and manage my own Mongo database on AWS rather than using Mongo Atlas. Which will be more cost saving? And whether I should go for Mongo Atlas? What will be its advantage over my own database?
There are pros and cons for both approaches:
Running MongoDB on AWS
Pros:
Complete control over how you run the database and how resources are allocated on the server. This could even be together with an application server on the same EC2 instance depending on your traffic and load. This might help with cost saving if your database is huge but isn't likely to see much traffic.
Cons:
You will be responsible for ensuring database availability and applying security patches as and when they are available. You may also have to setup firewalls and protect the EC2 instance and database in other ways that would be trivial to do on a hosted service like Atlas.
Data sharding and clustering can be a real pain to manage by yourself.
Running on Atlas
Pros:
Completely managed service where you don't have to be concerned about performance optimization or scalability. You pay for the services and Mongodb takes care of the rest.
You can focus on building a great application instead of spending your time on administering the database and the EC2 instance on which the database runs.
Cons:
You will be constrained by the options offered by Atlas. For most use cases this should be fine, but if you really want a specific change, it would be difficult to implement it if Mongodb doesn't already support it as a part of Atlas.
Think running your application on EC2 vs buying a server on-premise and running your application on that.
Being a managed service, costs might also be higher if your database does not see much traffic.
HOSTING yourself: You can get one or more AWS ec2 instances(which are VMs) where you can install and run Mongo DB yourself and manage it like you wanted to, making sure that you spin up more instances when the workload becomes large and there are instances up and running at all times to enable high availability.
Cost (high) - Management responsibilities (lots) - Full MongoDB functionality
MongoDB Atlas is a managed service, you don't need to worry about management tasks like scaling of your database and high availability when a single/more instances die... You pay a very low cost for it - this is run by MongoDb themselves on AWS, Azure, Google cloud;
Cost (low) - Management responsibilities (some) - Full MongoDB functionality
Now AWS has its own Mongo compatible database called DocumentDB - this is also a managed database, so you don't need to worry about scalability, high availability etc. This is only available on AWS so super simple and convinient.
Cost (low) - Management responsibilities (minimal) - Limited MongoDB functionality

AWS RDS with Postgres : Is OOM killer configured

We are running load test against an application that hits a Postgres database.
During the test, we suddenly get an increase in error rate.
After analysing the platform and application behaviour, we notice that:
CPU of Postgres RDS is 100%
Freeable memory drops on this same server
And in the postgres logs, we see:
2018-08-21 08:19:48 UTC::#:[XXXXX]:LOG: server process (PID XXXX) was terminated by signal 9: Killed
After investigating and reading documentation, it appears one possibility is linux oomkiller running having killed the process.
But since we're on RDS, we cannot access system logs /var/log messages to confirm.
So can somebody:
confirm that oom killer really runs on AWS RDS for Postgres
give us a way to check this ?
give us a way to compute max memory used by Postgres based on number of connections ?
I didn't find the answer here:
http://postgresql.freeideas.cz/server-process-was-terminated-by-signal-9-killed/
https://www.postgresql.org/message-id/CAOR%3Dd%3D25iOzXpZFY%3DSjL%3DWD0noBL2Fio9LwpvO2%3DSTnjTW%3DMqQ%40mail.gmail.com
https://www.postgresql.org/message-id/04e301d1fee9%24537ab200%24fa701600%24%40JetBrains.com
AWS maintains a page with best practices for their RDS service: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_BestPractices.html
In terms of memory allocation, that's the recommendation:
An Amazon RDS performance best practice is to allocate enough RAM so
that your working set resides almost completely in memory. To tell if
your working set is almost all in memory, check the ReadIOPS metric
(using Amazon CloudWatch) while the DB instance is under load. The
value of ReadIOPS should be small and stable. If scaling up the DB
instance class—to a class with more RAM—results in a dramatic drop in
ReadIOPS, your working set was not almost completely in memory.
Continue to scale up until ReadIOPS no longer drops dramatically after
a scaling operation, or ReadIOPS is reduced to a very small amount.
For information on monitoring a DB instance's metrics, see Viewing DB Instance Metrics.
Also, that's their recommendation to troubleshoot possible OS issues:
Amazon RDS provides metrics in real time for the operating system (OS)
that your DB instance runs on. You can view the metrics for your DB
instance using the console, or consume the Enhanced Monitoring JSON
output from Amazon CloudWatch Logs in a monitoring system of your
choice. For more information about Enhanced Monitoring, see Enhanced
Monitoring
There's a lot of good recommendations there, including query tuning.
Note that, as a last resort, you could switch to Aurora, which is compatible with PostgreSQL:
Aurora features a distributed, fault-tolerant, self-healing storage
system that auto-scales up to 64TB per database instance. Aurora
delivers high performance and availability with up to 15 low-latency
read replicas, point-in-time recovery, continuous backup to Amazon S3,
and replication across three Availability Zones.
EDIT: talking specifically about your issue w/ PostgreSQL, check this Stack Exchange thread -- they had a long connection with auto commit set to false.
We had a long connection with auto commit set to false:
connection.setAutoCommit(false)
During that time we were doing a lot
of small queries and a few queries with a cursor:
statement.setFetchSize(SOME_FETCH_SIZE)
In JDBC you create a connection object, and from that connection you
create statements. When you execute the statments you get a result
set.
Now, every one of these objects needs to be closed, but if you close
statement, the entry set is closed, and if you close the connection
all the statements are closed and their result sets.
We were used to short living queries with connections of their own so
we never closed statements assuming the connection will handle the
things once it is closed.
The problem was now with this long transaction (~24 hours) which never
closed the connection. The statements were never closed. Apparently,
the statement object holds resources both on the server that runs the
code and on the PostgreSQL database.
My best guess to what resources are left in the DB is the things
related to the cursor. The statements that used the cursor were never
closed, so the result set they returned never closed as well. This
meant the database didn't free the relevant cursor resources in the
DB, and since it was over a huge table it took a lot of RAM.
Hope it helps!
TLDR: If you need PostgreSQL on AWS and you need rock solid stability, run PostgreSQL on EC2 (for now) and do some kernel tuning for overcommitting
I'll try to be concise, but you're not the only one who has seen this and it is a known (internal to Amazon) issue with RDS and Aurora PostgreSQL.
OOM Killer on RDS/Aurora
The OOM killer does run on RDS and Aurora instances because they are backed by linux VMs and OOM is an integral part of the kernel.
Root Cause
The root cause is that the default Linux kernel configuration assumes that you have virtual memory (swap file or partition), but EC2 instances (and the VMs that back RDS and Aurora) do not have virtual memory by default. There is a single partition and no swap file is defined. When linux thinks it has virtual memory, it uses a strategy called "overcommitting" which means that it allows processes to request and be granted a larger amount of memory than the amount of ram the system actually has. Two tunable parameters govern this behavior:
vm.overcommit_memory - governs whether the kernel allows overcommitting (0=yes=default)
vm.overcommit_ratio - what percent of system+swap the kernel can overcommit. If you have 8GB of ram and 8GB of swap, and your vm.overcommit_ratio = 75, the kernel will grant up to 12GB or memory to processes.
We set up an EC2 instance (where we could tune these parameters) and the following settings completely stopped PostgreSQL backends from getting killed:
vm.overcommit_memory = 2
vm.overcommit_ratio = 75
vm.overcommit_memory = 2 tells linux not to overcommit (work within the constraints of system memory) and vm.overcommit_ratio = 75 tells linux not to grant requests for more than 75% of memory (only allow user processes to get up to 75% of memory).
We have an open case with AWS and they have committed to coming up with a long-term fix (using kernel tuning params or cgroups, etc) but we don't have an ETA yet. If you are having this problem, I encourage you to open a case with AWS and reference case #5881116231 so they are aware that you are impacted by this issue, too.
In short, if you need stability in the near term, use PostgreSQL on EC2. If you must use RDS or Aurora PostgreSQL, you will need to oversize your instance (at additional cost to you) and hope for the best as oversizing doesn't guarantee you won't still have the problem.

High CPU Utilisation on AWS RDS - Postgres

Attempted to migrate my production environment from Native Postgres environment (hosted on AWS EC2) to RDS Postgres (9.4.4) but it failed miserably. The CPU utilisation of RDS Postgres instances shooted up drastically when compared to that of Native Postgres instances.
My environment details goes here
Master: db.m3.2xlarge instance
Slave1: db.m3.2xlarge instance
Slave2: db.m3.2xlarge instance
Slave3: db.m3.xlarge instance
Slave4: db.m3.xlarge instance
[Note: All the slaves were at Level 1 replication]
I had configured Master to receive only write request and this instance was all fine. The write count was 50 to 80 per second and they CPU utilisation was around 20 to 30%
But apart from this instance, all my slaves performed very bad. The Slaves were configured only to receive Read requests and I assume all writes that were happening was due to replication.
Provisioned IOPS on these boxes were 1000
And on an average there were 5 to 7 Read request hitting each slave and the CPU utilisation was 60%.
Where as in Native Postgres, we stay well with in 30% for this traffic.
Couldn't figure whats going wrong on RDS setup and AWS support is not able to provide good leads.
Did anyone face similar things with RDS Postgres?
There are lots of factors, that maximize the CPU utilization on PostgreSQL like:
Free disk space
CPU Usage
I/O usage etc.
I came across with the same issue few days ago. For me the reason was that some transactions was getting stuck and running since long time. Hence forth CPU utilization got inceased. I came to know about this, by running some postgreSql monitoring command:
SELECT max(now() - xact_start) FROM pg_stat_activity
WHERE state IN ('idle in transaction', 'active');
This command shows the time from which a transaction is running. This time should not be greater than one hour. So killing the transaction which was running from long time or that was stuck at any point, worked for me. I followed this post for monitoring and solving my issue. Post includes lots of useful commands to monitor this situation.
I would suggest increasing your work_mem value, as it might be too low, and doing normal query optimization research to see if you're using queries without proper indexes.

MongoDB on Amazon SSD-backed EC2

We have mongodb sharded cluster currently deployed on EC2 instances in Amazon. These shards are also replica sets. The instances used are using EBS with IOPS provisioned.
We have about 30 million documents in a collection. Our queries count the whole collection that matches the filters. We have indexes on almost all of the query-able fields. This results to the RAM reaching 100% usage. Our working set exceeds the size of the RAM. We think that the slow response of our queries are caused by EBS being slow so we are thinking of migrating to the new SSD-backed instances.
C3 is available
http://aws.typepad.com/aws/2013/11/a-generation-of-ec2-instances-for-compute-intensive-workloads.html
I2 is coming soon
http://aws.typepad.com/aws/2013/11/coming-soon-the-i2-instance-type-high-io-performance-via-ssd.html
Our only concern is that SSD is ephemeral, meaning the data will be gone once the instance stops, terminates, or fails. How can we address this? How do we automate backups. Is it a good idea to migrate to SSD to improve the performance of our queries? Do we still need to set-up a sharded cluster?
Working with the ephemeral disks is a risk but if you have your replication setup correctly it shouldn't be a huge concern. I'm assuming you've setup a three node replica set correct? Also you have three nodes for your config servers?
I can speak of this from experience as the company I'm at has been setup this way. To help mitigate risk I'm moving towards a backup strategy that involved a hidden replica. With this setup I can shutdown the hidden replica set and one of the config servers (first having stopped balancing) and take a complete copy of the data files (replica and config server) and have a valid backup. If AWS went down on my availability zone I'd still have a daily backup available on S3 to restore from.
Hope this helps.