GCS - Cost migrating data to Archive Bucket - google-cloud-storage

I would like to know how much I would save by transferring 1 TB of data from a standard regional bucket to an Archive bucket located in the same region (and within the same project).
I understand that the cost can be split into Data Storage, Network Usage and Operations Usage.
For the Data Storage:
The cost of storing 1 TB in a Standard bucket per month: 1024 * $0.020 = $20.48
The cost of storing 1 TB in an Archive bucket per month: 1024 * $0.0012 = $1.2288
This means I would save $19.2512 per month.
For the Network Usage:
I assume that the cost of the transfer will be 0 because the data moves within the same region.
For the Operations Usage:
Retrieval cost from the Standard bucket: $0.004
It should need fewer than 10,000 Class B operations to read all the files.
Insertion cost into the Archive bucket: $0.50
It should need around 1024 * 1024 / 128 = 8192 Class A operations (1 per directory, 1 per file, and 1 per additional 128 MB for each file larger than 128 MB).
So in total, I would have to pay $0.504 once to transfer all the files to the Archive bucket, and the bucket will then cost me $1.2288 per month instead of $20.48.
Is my calculation correct, or did I miss something?
Regards,

According to the documentation on Cloud Storage Pricing, your estimates seem to be correct. Moreover, the amount of data you would like to transfer is quite minimal, so the charges would be low as well.
Keep in mind that the Archive storage class implies that reads, early deletions and overwrites are charged accordingly, as shown here, so if you intend to access that data often or overwrite the files therein, it might be better to stay with the Standard storage class.
Lastly, there is also a pricing calculator for making this kind of estimate, which can be found here.
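As a quick sanity check, here is the same arithmetic as a small Python sketch. The per-unit prices are the ones quoted in the question and should be treated as assumptions to verify against the current price list for your region:

```python
# Reproduces the estimate above using the prices quoted in the question.
# All rates are assumptions -- verify them against the current Cloud
# Storage price list for your region.
STANDARD_PER_GB_MONTH = 0.020   # $ per GB per month, Standard (assumed)
ARCHIVE_PER_GB_MONTH = 0.0012   # $ per GB per month, Archive (assumed)
CLASS_A_PER_10K_OPS = 0.50      # $ per 10,000 Class A ops, Archive (assumed)
CLASS_B_PER_10K_OPS = 0.004     # $ per 10,000 Class B ops, Standard (assumed)

size_gb = 1024                     # 1 TB expressed in binary GB

monthly_saving = size_gb * (STANDARD_PER_GB_MONTH - ARCHIVE_PER_GB_MONTH)

class_a_ops = 1024 * 1024 // 128   # ~1 write op per 128 MB, as in the question
class_b_ops = 10_000               # upper bound assumed in the question

one_time_cost = (class_a_ops / 10_000) * CLASS_A_PER_10K_OPS \
              + (class_b_ops / 10_000) * CLASS_B_PER_10K_OPS

print(f"Monthly storage saving:   ${monthly_saving:.4f}")   # ~$19.25
# ~$0.41; the question rounds the Class A block up to a full $0.50
print(f"One-time operations cost: ${one_time_cost:.4f}")
```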

Related

Google Cloud cloud storage operation costs

I am looking into using Google Cloud Storage buckets as a cheaper alternative to Compute Engine snapshots for storing backups.
However, I am a bit confused about the costs per operation, specifically the insert operation. If I understand the documentation correctly, it doesn't matter how large the file you want to insert is; it always counts as 1 operation.
So if I upload a single 20 TB file using one insert to a standard storage class bucket, wait 14 days, then retrieve it again, and all this within the same region, I practically only pay for storing it for 14 days?
Doesn't that mean that even the standard storage class bucket is a more cost effective option for storing backups compared to snapshots, as long as you can get your whole thing into a single file?
It's not fully accurate, and it all depends on what counts as a cost for you.
First of all, the maximum size of an object in Cloud Storage is 5 TiB, so you can't store a single 20 TB file; you would need at least 4 objects, but in the end the principle is the same.
The persistent disk snapshot is a very powerful feature:
A snapshot doesn't consume CPU to be taken, unlike your solution.
A snapshot doesn't consume network bandwidth to be taken, unlike your solution.
A snapshot can be taken anytime, on the fly.
A snapshot can be restored to the current VM, or you can create a new VM from a snapshot to investigate it, for example.
You can perform incremental snapshots, which saves money (cheaper than a full image snapshot).
You don't need additional space on your persistent disk (compared to your solution, where you need to create an archive before sending it to Cloud Storage).
In your scenario, snapshots seem like the best solution in terms of time efficiency. Now, is using Cloud Storage a cheaper solution? Probably, as it is listed as the most affordable storage option, but in the end you will have to work out the cost-benefit on your own.
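If it helps, here is a rough back-of-the-envelope sketch of the storage-cost side of that comparison. Both rates below are illustrative placeholders, not quotes, so plug in the current prices for your region before drawing conclusions:

```python
# Back-of-the-envelope comparison for the 20 TB / 14 day backup scenario.
# The rates below are illustrative placeholders -- check the current GCP
# price list for your region before relying on them.
STANDARD_BUCKET_PER_GB_MONTH = 0.020   # assumed Standard regional rate
SNAPSHOT_PER_GB_MONTH = 0.026          # assumed snapshot storage rate

size_gb = 20 * 1024
days_stored = 14
month_fraction = days_stored / 30      # crude proration

bucket_cost = size_gb * STANDARD_BUCKET_PER_GB_MONTH * month_fraction
snapshot_cost = size_gb * SNAPSHOT_PER_GB_MONTH * month_fraction

print(f"Standard bucket for {days_stored} days: ~${bucket_cost:.2f}")
print(f"Full snapshot for {days_stored} days:   ~${snapshot_cost:.2f}")
# Note: incremental snapshots usually store far less than the full 20 TB,
# which is why the comparison is not as one-sided as the raw rates suggest.
```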

Meaning of "Minimum storage duration" of Google Cloud storage

I know that Google Cloud Storage has 4 storage options, and each option has a different "Minimum storage duration":
https://cloud.google.com/storage/docs/lifecycle?hl=vi
Standard Storage: None
Nearline Storage: 30 days
Coldline Storage: 90 days
Archive Storage: 365 days
What is the meaning of "Minimum storage duration"?
I guess that "Minimum storage duration" is the length of time that your data is kept in Google Cloud Storage.
Is it the period after which your data will automatically be deleted if not used?
Such as:
I use the Nearline Storage option (30 days) to store my data.
If I don't use this data within 30 days, it will be deleted.
If I use this data frequently, it will be stored until I delete my bucket.
Is my guess right?
If wrong: please tell me the right thing.
In order to understand the Minimum Storage Duration, it is necessary to know the concept of Storage classes first.
What is a storage class?
The storage class of an object or bucket affects the object's/bucket's
availability and pricing.
Depending on one's use case and how frequently one accesses the data in a bucket, one may choose one of the available Storage Classes:
Standard Storage is used for data that is frequently accessed and/or stored only for short periods of time.
Nearline Storage is a low-cost option for infrequently accessed data. It offers lower at-rest costs in exchange for lower availability, a 30-day minimum storage duration and a cost for data access. It is suggested for use cases where one accesses the data about once per month on average.
Coldline Storage is similar to Nearline, but offers even lower at-rest costs, again in exchange for lower availability, a 90-day minimum storage duration and a higher cost for data access.
Archive Storage is the lowest-cost, highly durable storage service for data archiving, online backup, and disaster recovery. It has no availability SLA, though its typical availability is comparable to Nearline Storage and Coldline Storage. Archive Storage also has higher costs for data access and operations, as well as a 365-day minimum storage duration.
You may find detailed information in the Storage Classes documentation.
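If you end up changing classes from code, a minimal sketch with the google-cloud-storage Python client looks roughly like this (the bucket and object names are made up):

```python
# Minimal sketch: move an existing object to a colder storage class with the
# google-cloud-storage client. Bucket and object names are placeholders.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-example-bucket")      # hypothetical bucket name
blob = bucket.blob("backups/2023-01.tar.gz")     # hypothetical object name

# Rewrites the object in place with the new storage class; from this point
# the Archive minimum storage duration (365 days) applies to the new copy.
blob.update_storage_class("ARCHIVE")
print(blob.storage_class)
```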
So what is the minimum storage duration?
A minimum storage duration applies to data stored using one of the above storage classes. You can delete the file before it has been stored for this duration, but at the time of deletion you are charged as if the file was stored for the minimum duration.
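As a worked example of that charge, assuming a Nearline rate of $0.01 per GB per month (an illustrative figure, not a quote):

```python
# Illustrative early-deletion arithmetic for a Nearline object (30-day
# minimum). The rate is an assumed example -- check the current price list.
NEARLINE_PER_GB_MONTH = 0.010   # assumed $/GB/month

size_gb = 100
days_kept = 10                   # object deleted after 10 days
min_days = 30                    # Nearline minimum storage duration

# When the object is deleted early, you are billed as if it had been
# stored for the full minimum duration.
billable_days = max(days_kept, min_days)
charge = size_gb * NEARLINE_PER_GB_MONTH * billable_days / 30
print(f"Charged as if stored {billable_days} days: ~${charge:.2f}")
```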
Please note that the minimum storage duration has nothing to do with automatic deletion of objects.
If you would like to delete objects based on conditions such as the Age of an object, you may set an Object Lifecycle policy on the bucket. You may find an example of how to delete live versions of objects older than 30 days here.
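For reference, a minimal sketch of such a rule with the google-cloud-storage Python client (the bucket name is made up):

```python
# Minimal sketch: add a lifecycle rule that deletes live objects older than
# 30 days, using the google-cloud-storage client. Bucket name is a placeholder.
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my-example-bucket")

# Appends a Delete rule with an age condition to the bucket's lifecycle
# configuration, then saves the change.
bucket.add_lifecycle_delete_rule(age=30, is_live=True)
bucket.patch()

for rule in bucket.lifecycle_rules:
    print(rule)
```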
It simply means that even if you delete your objects before that minimum duration, you will be charged as if you had stored them for the minimum duration.
If I don't use this data within 30 days, it will be deleted:
The objects will not be deleted unless you specify object lifecycle rules or delete them yourself.
If I use this data frequently, it will be stored until I delete my bucket.
If you use this data frequently, you will be charged at the rates specified for the storage class (Nearline Storage in your case).

Migrate an on-prem database to AWS Aurora

I have a postgres database running locally that I'd like to migrate to AWS Aurora (or AWS postgres).
I've pg_dump'd the database that I want, and it's ~30 GB compressed.
How should I upload this file and get the AWS RDS instance to pg_restore from it?
Requirements:
There's no one else using the DB so we're ok with a lot of downtime and an exclusive lock on the db. We want it to be as cheap as possible to migrate
What I've tried/looked at so far:
Running pg_restore on the local file with the remote target - unknown pricing total
I'd also like to do this as cheaply as possible, and I'm not sure I understand their pricing strategy.
Their pricing says:
Storage Rate $0.10 per GB-month
I/O Rate $0.20 per 1 million requests
Replicated Write I/Os $0.20 per million replicated write I/Os
Would pg_restore count as one request? The database has about 2.2 billion entries, and if each one is 1 request does that come out to $440 to just recreate the database?
AWS Database Migration Service - it looks like this would be the cheapest (as it's free?) but it only works by connecting to the local database. Uncompressed, the data is about 200 GB, and I'm not sure it makes sense to do a one-for-one copy using DMS
I've read this article but I'm still not clear on the best way of doing the migration.
We're ok with this taking a while, we'd just like to do it as cheap as possible.
Thanks in advance!
There are some points you should note when migrating
AWS Database Migration Service - it looks like this would be the cheapest (as it's free?)
The service they provide for free is a virtual machine (with software included) that provides the computing power and functionality to move databases to some of their RDS services.
Even though that service is free, you would be charged the normal fees for any RDS usage.
The I/O number they quote relates roughly to the EBS volumes (the underlying disks) used to serve your data. A very big and complex query can take up quite a few I/Os; a database request and an underlying I/O are not equal to each other.
The estimation for EBS usage can be seen here
As an example, a medium sized website database might be 100 GB in size and expect to average 100 I/Os per second over the course of a month. This would translate to $10 per month in storage costs (100 GB x $0.10/month), and approximately $26 per month in request costs (~2.6 million seconds/month x 100 I/O per second * $0.10 per million I/O).
My personal advice: make a clone of your DB with only part of the data set (5% maybe). Use DMS on that piece. You can see how the bills work out for you in a few minutes. Then you can estimate the price of a full DB migration.
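For instance, the extrapolation could be as simple as the sketch below. The observed sample numbers are hypothetical, and the rates are the ones quoted in the question:

```python
# Sketch of the extrapolation suggested above: restore ~5% of the data,
# read the consumed I/O and storage from the bill or monitoring, then scale.
IO_RATE_PER_MILLION = 0.20     # $ per 1 million I/O requests (from the question)
STORAGE_PER_GB_MONTH = 0.10    # $ per GB-month (from the question)

sample_fraction = 0.05
observed_ios = 25_000_000      # hypothetical I/O count for the 5% sample
observed_storage_gb = 10       # hypothetical storage used by the sample

full_ios = observed_ios / sample_fraction
full_storage_gb = observed_storage_gb / sample_fraction

io_cost = full_ios / 1_000_000 * IO_RATE_PER_MILLION
storage_cost = full_storage_gb * STORAGE_PER_GB_MONTH

print(f"Estimated one-time I/O cost: ~${io_cost:.2f}")
print(f"Estimated storage per month: ~${storage_cost:.2f}")
```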

Data Factory Copy Activity Blob -> ADLS

I have files that accumulate in Blob Storage on Azure and are moved each hour to ADLS with Data Factory... there are around 1000 files per hour, each 10 to 60 KB...
what is the best combination of:
"parallelCopies": ?
"cloudDataMovementUnits": ?
and also,
"concurrency": ?
to use?
Currently I have all of these set to 10, and each hourly slice takes around 5 minutes, which seems slow?
Could ADLS or Blob be getting throttled? How can I tell?
There won't be one solution that fits all scenarios when it comes to optimizing a copy activity. However, there are a few things you can check to find a balance. A lot of it depends on the pricing tiers / type of data being copied / type of source and sink.
I am pretty sure that you would have come across this article.
https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-performance
This is a reference performance sheet; the values will differ depending on the pricing tiers of your source and destination items.
Parallel Copy :
This happens at the file level, so it is beneficial if your source files are big, as it chunks the data (from the article):
Copy data between file-based stores: between 1 and 32. Depends on the size of the files and the number of cloud data movement units (DMUs) used to copy data between two cloud data stores, or the physical configuration of the Self-hosted Integration Runtime machine.
The default value is 4.
The copy behavior is also important: if it is set to mergeFiles, then parallel copy is not used.
Concurrency :
This is simply how many instances of the same activity you can run in parallel.
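For reference, this is roughly where those three settings sit in a (v1-style) copy activity definition. The dataset and activity names are invented and exact property names can vary between Data Factory versions, so treat it as an illustration of which knob lives where rather than a definitive schema:

```python
# Rough shape of a copy activity definition showing where the three settings
# from the question sit. Dataset/activity names are invented placeholders;
# property names follow the question and may differ between ADF versions.
copy_activity = {
    "name": "CopyBlobToAdls",
    "type": "Copy",
    "inputs": [{"name": "BlobInputDataset"}],
    "outputs": [{"name": "AdlsOutputDataset"}],
    "typeProperties": {
        "source": {"type": "BlobSource"},
        "sink": {"type": "AzureDataLakeStoreSink"},
        "parallelCopies": 16,            # parallel file copies within this activity
        "cloudDataMovementUnits": 8,     # DMUs powering the copy
    },
    "policy": {
        "concurrency": 4,                # simultaneous slices of this activity
    },
}
```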
Other considerations :
Compression :
Codec
Level
The bottom line is that you can pick and choose the compression: faster compression will increase network traffic, while slower compression will increase the time consumed.
Region :
The location or region of the data factory, source and destination might affect performance and especially the cost of the operation. Having them in the same region might not be feasible all the time depending on your business requirements, but it is definitely something you can explore.
Specific to Blobs
https://learn.microsoft.com/en-us/azure/storage/common/storage-performance-checklist#blobs
This article gives you a good number of metrics for improving performance; however, when using Data Factory, I don't think there is much you can do at this level. You can use application monitoring to check the throughput while your copy is going on.

Estimated pricing for CloudKit usage?

I have a series of questions all for CloudKit pricing, hence a single post with multiple questions as they are all interrelated.
The current CloudKit calculator (as of Feb 2017) shows the following pricing details for 100,000 users:
Imagine working on an application which has large assets and a large amount of data transfer (even after using compression and architecting it such that transfers are minimized).
Now, assume my actual numbers for an app such as the one I just described with 100,000 users is:
Asset Storage: 7.5 TB
Data Transfer: 375 TB (this looks pretty high, but assume it is true)
My Questions
Will the Data Transfer component of my usage bill then be (375 - 5) * 1000 GB * $0.10/GB = $37,000?
Also, are there pricing changes if one is within the 5 TB limit but exceeds the 50 MB per-user limit, or is that per-user limit just an average, so that even if the data transfer per user is higher than 50 MB, I won't be charged as long as I stay within the 5 TB limit?
What does Active Users really mean in this pricing context? The number of users who have downloaded the app or the number of users using the app in a given month?
How is the asset storage counted? Imagine for 2 successive months, this is the asset size uploaded: Month 1: 7.5 TB, Month 2: 7.5 TB. Then in the second month, will my asset storage be counted as 15 TB or 7.5 TB?
Is it correct that asset storage and other allocations increase for every user that is added (the screenshot does say that), or are the allocations increased in bulk only when you hit certain numbers such as 10K, 20K, ..., 100K? I read about bulk allocation but cannot find the source now; I am asking just to be sure, to avoid unpleasant surprises later.
Last but not least, is CloudKit usage billed monthly?
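For what it's worth, the arithmetic behind question 1, under the question's own assumptions (a pooled 5 TB free transfer allowance and $0.10 per GB of overage; these figures come from the question, not from Apple's documentation), looks like this:

```python
# Reproduces the data-transfer arithmetic from question 1, assuming the
# pooled free-tier interpretation described in the question. Rates and
# allowances are the question's figures, not confirmed CloudKit pricing.
FREE_TRANSFER_TB = 5
OVERAGE_PER_GB = 0.10

transfer_tb = 375
overage_gb = (transfer_tb - FREE_TRANSFER_TB) * 1000
overage_cost = overage_gb * OVERAGE_PER_GB
print(f"Estimated data transfer overage: ${overage_cost:,.0f}")   # $37,000
```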