Temporarily shut down Redshift to reduce bill

Amazon says the following about Redshift billing:
"Node usage hours are billed for each hour your data warehouse cluster is running in an Available state. If you no longer wish to be charged for your data warehouse cluster, you must terminate it to avoid being billed for additional node hours."
This means that if I just create a cluster, whether I use it or not, I'll be billed 24/7, because the cluster doesn't have any state like "Suspended". Is there a way to shut down the whole Redshift cluster when not in use, so that I'll be billed only for the hours when I actually want to use it?
Edit: From Tomasz's reply it sounds like, if I want to shut the cluster down over the weekend, it amounts to backing up the whole database on Friday evening and restoring it on Sunday evening. This doesn't sound good. What does Amazon really mean when they say "PAY ONLY FOR THE HOURS YOU USE"?
Can you tell me how much time it will take to back up/restore a data warehouse of around 100GB? Can I automatically associate security groups with the cluster after restoring, from Java code?

You can create a manual snapshot of the cluster when you have finished work and then delete the cluster.
You will pay for S3 storage, but that is much less than the cost of a running Redshift cluster.
The next day, just restore the cluster from the latest snapshot. You will have to add security groups to the new cluster, probably via the Java API:
The new cluster will be associated only with the default security and
parameter groups. If the original cluster was associated with any
other security or parameter group, you will need to manually associate
those groups with the new cluster.
The easiest way to create a snapshot is from the console, but you will probably want to do it automatically using the CLI or the Java SDK.
Creating a snapshot of a 3-node cluster filled up to 80% took me about 5 minutes (it's so quick because snapshots are incremental). 100GB is much less than my setup, so it should be even faster. The restore shouldn't take a long time either.
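For reference, a minimal sketch of that Friday/Sunday cycle with the AWS CLI (the cluster name, snapshot name and security group ID below are placeholders; the same operations are available in the Java SDK):

    # Friday evening: delete the cluster, taking a final snapshot as part of the
    # deletion, so no further node hours are billed. Names and IDs are placeholders.
    aws redshift delete-cluster \
        --cluster-identifier my-cluster \
        --final-cluster-snapshot-identifier friday-snapshot

    # Sunday evening: restore from that snapshot, wait for it to come up, then
    # re-attach the non-default security group(s).
    aws redshift restore-from-cluster-snapshot \
        --cluster-identifier my-cluster \
        --snapshot-identifier friday-snapshot
    aws redshift wait cluster-available --cluster-identifier my-cluster
    aws redshift modify-cluster \
        --cluster-identifier my-cluster \
        --vpc-security-group-ids sg-0123456789abcdef0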

UPDATE: A lot has changed in the intervening years, in particular restore from snapshot is now quite fast. Your cluster becomes available in a few minutes and you can run queries while the restore continues in the background. Total time for complete restore of 100GB would now be measured in minutes (varies based on node type & count).
What does Amazon really mean when they say "PAY ONLY FOR THE HOURS YOU USE"?
You pay for a full hour for any partial hour used.
Can you tell me how much time will it take to backup/restore a data warehouse of size around 100GB?
Snapshots are incremental, and this is what makes them fast (as Tomasz mentioned). Shutting down a cluster is fairly quick: about half an hour. However, restoring from a snapshot is very slow; I'd estimate around 3 hours to restore 100GB.
If you really want to be able to bring a database cluster up and down quickly, you might be better off using another analytic DB (e.g. the free editions of Greenplum or Vertica) with the data stored on EBS volumes. It'd be a lot more work to manage, though; that's the tradeoff.

We can now pause and resume a Redshift cluster (from both the Console and the CLI).
Check out this link:
https://aws.amazon.com/blogs/big-data/lower-your-costs-with-the-new-pause-and-resume-actions-on-amazon-redshift/
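For example, with the AWS CLI (the cluster identifier is a placeholder); while the cluster is paused you are billed for storage only:

    # Pause the cluster when you stop working...
    aws redshift pause-cluster --cluster-identifier my-cluster

    # ...and resume it when you need it again.
    aws redshift resume-cluster --cluster-identifier my-cluster
    aws redshift wait cluster-available --cluster-identifier my-cluster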

Related

Postgres insert slow after snapshot restore but not after restart

My setup
Postgres 11 running on an AWS EC2 t4g.xlarge instance (4 vCPU, 16GB) running Amazon Linux.
Set up to take a nightly disk snapshot (my workload doesn't require high reliability).
Database has table xtc_table_1 with ~6.3 million rows, about 3.2GB.
Scenario
To test some new data processing code, I created a new test AWS instance from the nightly snapshot of my production instance.
I create a new UNLOGGED table, and populate it with INSERT INTO holding_table_1 SELECT * FROM xtc_table_1;
It takes around 2 min 24 sec for the CREATE statement to execute.
I truncate holding_table_1 and run the CREATE statement again, and it completes in 30 sec. The ~30 second timing is consistent for successive truncates and creates of the table.
I think this may be because of some caching of data. I tried restarting the Postgres service, then rebooting the AWS instance (after stopping Postgres with sudo service postgresql stop), then stopping and starting the AWS instance. However, it still takes ~30 sec to create the table.
If I rebuild a new instance from the snapshot, the first time I run the CREATE statement it's back to the ~2m+ time.
Similar behavior for other tables xtc_table_2, xtc_table_3.
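For what it's worth, a minimal way to reproduce that comparison, assuming psql access to the instance (the database name is a placeholder, and I'm reading the create-and-populate step as a plain CREATE followed by INSERT ... SELECT):

    # Sketch: time the first and second population of the unlogged table.
    # "mydb" is a placeholder database name; table names are from the question.
    psql mydb <<'SQL'
    \timing on
    CREATE UNLOGGED TABLE holding_table_1 (LIKE xtc_table_1);
    INSERT INTO holding_table_1 SELECT * FROM xtc_table_1;  -- ~2m24s first time after a restore
    TRUNCATE holding_table_1;
    INSERT INTO holding_table_1 SELECT * FROM xtc_table_1;  -- ~30s on subsequent runs
    SQL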
Hypothesis
After researching and finding this answer, I wonder if what's happening is that the disk snapshot contains some WAL data that is being replayed the first time I do anything with xtc_table_n. And that subsequently, because Postgres was shut down "nicely" there is no WAL to playback.
Does this sound plausible?
I don't know enough about Postgres internals to be sure. I would have imagined that any WAL playback would happen on starting up postgres, but maybe it happens at the individual table level the first time a table is touched?
Knowing the reason is more than just theoretical; I'm using the test instance to do some tuning on some processing code, and need to be confident in having a consistent baseline to measure from.
Let me know if more information is needed about my setup or what I'm doing.
#jellycsc's suggestion was correct; adding more info here in case it's helpful to anyone else.
The problem I was encountering was not a Postgres issue at all, but was due to the way AWS handles volumes and snapshots.
From this page:
For volumes that were created from snapshots, the storage blocks must
be pulled down from Amazon S3 and written to the volume before you can
access them. This preliminary action takes time and can cause a
significant increase in the latency of I/O operations the first time
each block is accessed. Volume performance is achieved after all
blocks have been downloaded and written to the volume.
I used the fio utility as described in the linked AWS page to initialize the restored volume, and first-time performance was consistent with subsequent query times.
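The command was along these lines (the device name is a placeholder; check yours with lsblk, and see the AWS page for the exact recommended parameters):

    # Sketch: read every block of the restored volume once so later I/O hits the
    # volume, not S3. /dev/nvme1n1 is a placeholder device name.
    sudo fio --filename=/dev/nvme1n1 --rw=read --bs=1M --iodepth=32 \
             --ioengine=libaio --direct=1 --name=volume-initialize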

Kubernetes deployment strategy using CQRS with dotnet & MongoDb

I am re-designing a dotnet backend api using the CQRS approach. This question is about how to handle the Query side in the context of a Kubernetes deployment.
I am thinking of using MongoDb as the Query Database. The app is a dotnet webapi app. So what would be the best approach?
Create a sidecar Pod which containerizes the dotnet app AND the MongoDb together in one pod. Scale as needed.
Containerize the MongoDb in its own pod and deploy one MongoDb pod PER REGION. Then have the dotnet containers use the MongoDb pod within their own region. Scale MongoDb by region, and the dotnet pods as needed within and between regions.
Some other approach I haven't thought of
I would start with the simplest approach, which is to place the write and read sides together, because they belong to the same bounded context.
Then in the future, if needed, I would consider adding more read sides or scaling out to other regions.
To get started I would also consider putting the read side inside the same VM as the write side, just to keep it simple, as getting it all up and working in production is always a big task with a lot of pitfalls.
I would consider using a Kafka-like system to transport the data to the read sides. With plain queues, adding a new read side or rebuilding an existing read-side instance can be troublesome, because the sender needs to know which read sides exist. With a Kafka style of integration, each read side can consume the events at its own pace, you can more easily add read sides later on, and the sender does not need to be aware of the receivers.
Kafka allows you to decouple the producers of data from the consumers of the data: producers append events to the Kafka log, and one or more consumers then process that log of events independently.
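As a toy illustration of that decoupling, using the console tools that ship with Kafka (the broker address, topic and consumer-group names are made up):

    # A producer (the write side) appends events to the log:
    kafka-console-producer.sh --bootstrap-server localhost:9092 --topic order-events

    # Each read side consumes the same log independently, at its own pace, identified
    # only by its consumer group; the producer never needs to know it exists:
    kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic order-events \
        --group read-side-reporting --from-beginning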
It has been almost 2 years since I posted this question. Now with 20-20 hindsight I thought I would post my solution. I ended up simply provisioning an Azure Cosmos Db in the region where my cluster lives, and hitting the Cosmos Db for all my query-side requirements.
(My cluster already lives in the Azure Cloud)
I maintain one Postgres Db in my original cluster for my write-side requirements, and my app scales nicely in the cluster.
I have not yet needed to deploy clusters to new regions. When that happens, I will provision a replica of the Cosmos Db in that additional region or regions, but still keep just one Postgres db for write-side requirements. I'm not going to bother trying to maintain/sync replicas of the Postgres db.
Additional insight #1. By provisioning the Cosmos Db separately from my cluster (but in the same region), I take the load off my cluster nodes. In effect, the Cosmos Db has its own dedicated compute resources, and its own backups etc.
Additional insight #2. It is obvious now, but wasn't back then, that tightly coupling a document db (such as MongoDb) to a particular pod is... a bonkers bad idea. Imagine horizontally scaling your app: with each new instance of your app you would instantiate a new document db. You would quickly bloat your nodes and crash your cluster. One read-side document db per cluster is an efficient and easy way to roll.
Additional insight #3. The read side of any CQRS setup can get a nice jolt of adrenaline from an in-memory cache like Redis. You can first check whether some data is available in the cache before you hit the document db. I use this approach for data such as a checkout cart, where I leave data in the cache for 24 hours and then let it expire. You could conceivably use Redis for all your read-side requirements, but memory would quickly become bloated. So the idea here is to deploy an in-memory cache on your cluster (only one instance of the cache) and have all your apps hit it for low latency and high availability, but not to use the cache as a replacement for the document db.
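A minimal sketch of that cache-aside lookup, using redis-cli and mongosh as stand-ins for the app's client libraries (the key, connection string and collection names are made up):

    # Check the cache first; on a miss, read from the document db and cache the
    # result with a 24-hour TTL. All names below are made up for the example.
    CART_KEY="cart:12345"
    cart=$(redis-cli GET "$CART_KEY")
    if [ -z "$cart" ]; then
        cart=$(mongosh "mongodb://localhost/shop" --quiet \
               --eval 'JSON.stringify(db.carts.findOne({ cartId: "12345" }))')
        redis-cli SET "$CART_KEY" "$cart" EX 86400
    fi
    echo "$cart"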

Is Google Cloud SQL high availability really improving reliability?

I want to create a Google Cloud SQL instance but I am not sure about choosing high availability or not.
From what I understand, failover can take a few minutes (it is not instant), and the cost is roughly 2x that of a regular instance.
The failover is triggered only in case of a zone outage, not in case of database issues. Since the monthly uptime is at least 99.95%, that allows for at most about 21 minutes of downtime per month. A failover can take up to 5 minutes, and we can assume those 21 minutes of downtime won't all happen in a single event, so is there a real need to subscribe to high availability?
A full zone outage is probably quite rare, so if you don't care about it, an HA instance might indeed not be needed.
One advantage of HA is that failover can be faster than a restart. We've experienced cases where the primary instance got "stuck" and a restart would have taken up to 30 minutes (GCP ticket). In such cases it's faster to fail over to the HA standby.
(Before October 2019, HA failover instances could also be used for read queries, and thus avoid the need for an additional read replica. With the change from binlog-based replication to disk-based replication this is not the case anymore.)
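For reference, HA is just a flag at instance creation time, and you can trigger a failover manually to see how long the switch-over takes for your workload (the instance name, version, tier and region below are placeholders):

    # Create a Cloud SQL instance with high availability (regional) enabled.
    gcloud sql instances create my-instance \
        --database-version=POSTGRES_13 \
        --tier=db-custom-2-7680 \
        --region=us-central1 \
        --availability-type=REGIONAL

    # Manually trigger a failover to measure how long it takes.
    gcloud sql instances failover my-instance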
HA Failover is not just for a full zone outage. It kicks in whenever the primary instance stops responding for more than a minute.
The fact that it is quicker than a restart, more reliable than a restart, and automatic means it keeps your outages much shorter when mysql crashes.
Also, don't you need HA for the SLA to apply? Without HA you're not a multi-zone instance, and therefore you can't meet the definition of "Downtime":
"Downtime" means (ii) with respect to Cloud SQL Second Generation for
MySQL, Cloud SQL for SQL Server, and Cloud SQL for PostgreSQL: all
connection requests to a Multi-zone Instance fail.
https://cloud.google.com/sql/sla

Is MongoDB point in time recovery only possible with OpLogs?

For MongoDB version 3+
We are using MongoDB as a single stand-alone instance. I discovered that I can't do point-in-time recovery without building a single-node replica set, so that I can take advantage of the oplog to "replay transactions" for point-in-time recovery. I've done some testing with a single-node replica set, and I seem to be able to restore to a specific time.
Is there an alternative way to get point-in-time recovery with a stand-alone MongoDB, or do I need to convert it to a single-node replica set?
Thanks
By point-in-time I assume you mean arbitrary point-in-time. You can get a backup up to a particular point by taking a filesystem snapshot at that point, but you can't roll that snapshot forward arbitrarily (by say 10 seconds) to another point without taking another snapshot, or having an oplog.
Basically, yes, you will need an oplog (and hence a single node replica set at a minimum) to be able to recover your data to a specific point in time. The oplog needs to be big enough to cover all operations between your backup/snapshot to the point in time you wish to recover.
For example, if you take backups every 24 hours then you will need an oplog to cover at least 24 hours worth of operations in order to be able to restore to any point in that day. The more often you take backups, the smaller you can make your oplog while retaining the capability.
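A rough sketch of the workflow once the node runs as a (single-member) replica set; the hosts, paths and target timestamp below are placeholders:

    # Consistent base backup; --oplog also captures operations made during the dump.
    mongodump --host localhost --oplog --out /backups/base

    # Restore the base backup, replaying the oplog slice captured with it.
    mongorestore --host localhost --oplogReplay /backups/base

    # For an arbitrary point in time: dump the oplog collection covering the window
    # since the base backup, place it where --oplogReplay expects it, and replay it
    # up to a timestamp (<seconds-since-epoch>[:ordinal]).
    mongodump --host localhost --db local --collection oplog.rs --out /backups/oplogdump
    mkdir -p /backups/replay
    cp /backups/oplogdump/local/oplog.rs.bson /backups/replay/oplog.bson
    mongorestore --host localhost --oplogReplay --oplogLimit 1700000000:1 /backups/replay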

Rebalance data after adding nodes

I'm using Cassandra 2.0.4 (with vnodes) and two days ago I added two nodes (.210 and .195). I expected Cassandra to redistribute the existing data automatically, but today the nodetool status output still looks unchanged.
Issuing a nodetool repair on any of the nodes doesn't do anything either (the repair finishes within seconds). The logs state that the repair is being executed as expected, but after preparing the repair plan it pretty much instantly finishes executing said plan.
Was I wrong to assume the existing data would be redistributed at all, or is something wrong? And if that's the case, how do I manually 'rebalance' the data?
Worth noting: I seem to have lost some data after adding these new nodes. Issuing a select on certain keys only returns data from the last couple of days rather than weeks, which makes me think the data is saved on .92 while Cassandra queries for it on one of the new servers. But that's really just an uneducated guess; I may have simply broken something during all of my trial & error tests, meaning the data is actually gone (even though I never issue deletes).
Can anyone enlighten me?
There is currently no manual rebalance option for vnode-enabled clusters.
But your cluster doesn't look unbalanced based on the nodetool status output you show. I'm curious as to why node .88 has only 64 tokens compared to the others but that isn't a problem per se. When a cluster is smaller there will be a slight variance in the balance of data across the nodes.
As for the data issues, you can try running nodetool repair -pr on each node in the ring, then nodetool cleanup on each node, and see if that helps.
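Something along these lines, one node at a time (the addresses are placeholders based on the last octets mentioned in the question):

    # Primary-range repair on every node, one at a time, then clean up data that no
    # longer belongs to each node after the topology change. Addresses are placeholders.
    for NODE in 10.0.0.92 10.0.0.88 10.0.0.210 10.0.0.195; do
        nodetool -h "$NODE" repair -pr
    done

    for NODE in 10.0.0.92 10.0.0.88 10.0.0.210 10.0.0.195; do
        nodetool -h "$NODE" cleanup
    done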