I recently set up Google Cloud Storage access logs & storage data. The logs are being written, but I can see 4 logs for the same hour.
For example:
usage_2017_02_14_07_00_00_00b564_v0
usage_2017_02_14_07_00_00_01b564_v0
usage_2017_02_14_07_00_00_02b564_v0
usage_2017_02_14_07_00_00_03b564_v0
So there are 4 usage logs written for every hour. What's the difference between them?
I connected all the logs to BigQuery to query the table, and all 4 of them have different values.
Also, analysing the storage logs, I can see storage_byte_hours is 43423002260.
How do I calculate the cost from storage_byte_hours?
It is normal for GCS to sometimes produce more than one log file for the same hour. From Downloading logs (emphasis mine):
Note:
Any log processing of usage logs should take into account the possibility that they may be delivered later than 15 minutes after the end of an hour.
Usually, hourly usage log object(s) contain records for all usage that occurred during that hour. Occasionally, an hourly usage log object contains records for an earlier hour, but never for a later hour.
Cloud Storage may write multiple log objects for the same hour.
Occasionally, a single record may appear twice in the usage logs. While we make our best effort to remove duplicate records, your log processing should be able to remove them if it is critical to your log analysis. You can use the s_request_id field to detect duplicates.
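If duplicates matter for your analysis, here is a minimal sketch of what that deduplication could look like once you have parsed the usage-log rows, for example before loading them into BigQuery. The UsageRecord shape is an illustrative assumption; only the s_request_id field comes from the documented log format.

```typescript
// Drop duplicate usage-log records by s_request_id.
// The record shape is an assumption for illustration; only s_request_id
// is taken from the documented access/usage log format.
interface UsageRecord {
  s_request_id: string;
  [field: string]: string;
}

function dedupeBySRequestId(records: UsageRecord[]): UsageRecord[] {
  const seen = new Set<string>();
  return records.filter((record) => {
    if (seen.has(record.s_request_id)) return false; // already kept a copy
    seen.add(record.s_request_id);
    return true;
  });
}
```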
You calculate the bucket size from storage_byte_hours. From Access and storage log format:
Storage data fields:
Field: storage_byte_hours
Type: integer
Description: Average size in byte-hours over a 24-hour period of the bucket. To get the total size of the bucket, divide byte-hours by 24.
In your case: 43423002260 byte-hours / 24 hours ≈ 1809291760 bytes
You can use the bucket size to estimate the cost for the storage itself:
1809291760 bytes = 1809291760 / 2^30 GB ≈ 1.685 GB
Assuming Multi-Regional Storage at $0.026 per GB per month, your storage cost would be:
1.685 GB x $0.026 ≈ $0.04381 / month ≈ $0.00146 / day (assuming a 30-day month)
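The same arithmetic as a quick sketch (the $0.026/GB/month Multi-Regional rate is the one assumed above; check current GCS pricing):

```typescript
// Worked version of the arithmetic above. The $0.026/GB/month rate is the
// Multi-Regional price assumed in this answer; check current GCS pricing.
const storageByteHours = 43_423_002_260;   // from the storage log
const bucketBytes = storageByteHours / 24; // ≈ 1,809,291,760 bytes
const bucketGB = bucketBytes / 2 ** 30;    // ≈ 1.685 GB
const monthlyCost = bucketGB * 0.026;      // ≈ $0.04381 / month
const dailyCost = monthlyCost / 30;        // ≈ $0.00146 / day
console.log({ bucketBytes, bucketGB, monthlyCost, dailyCost });
```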
But a pile of other data (network, operations, etc.) is needed to compute additional related costs; see Google Cloud Storage Pricing.
Related
I’m using RDS PostgreSQL with the free tier configuration (t2.micro instance, 20 GB of general purpose database storage (SSD) and 20 GB of backups storage).
I’ve created a database with a few tables and the current usage is below 200 MB.
The issue is that the storage has been increasing day by day at a rate of 1 GB per day since I’ve created the database. As a workaround I’ve tried to stop this behavior by decreasing the backup retention period from 7 to 0 days. From yesterday on, no more backups were stored. However, the storage kept on growing at almost the same rate.
I have no clue what's going on and I don't know what to do to stop the consumption and avoid exceeding the free tier storage.
Apart from that, I also don't know which storage has been increasing (data or backup storage), because the AWS platform only says that the increase is in RDS Storage (it doesn't distinguish between the two types of storage). As I'm not storing a lot of data in the database, I suspect the problem is in the backup and snapshot storage and not in the data storage itself.
Thanks in advance!
Update 2022-09-13
The curious fact is that I have a budget threshold of 8 GB/month, and on the same day every month the threshold is exceeded (in my case, on the 13th of each month). But at the end of the month it kind of resets by itself (it returns to zero). I don't know what is happening. Does anyone know where that temporary GB usage comes from?
I will describe the data and case.
record {
customerId: "id", <---- indexed
binaryData: "data" <---- not indexed
}
Expectations:
customerId is a random 10-digit number
Average size of binary record data - 1-2 kilobytes
There may be up to 100 records per one customerId
Overall number of records - 500M
Write pattern #1: insert one record at a time
Write pattern #2: batch, maybe in parallel, at a rate of at least 20M records per hour
Search pattern #1: find all records by customerId
Search pattern #2: find all records for groups of customerIds, in parallel, at a rate of at least 10M customerIds per hour
Data is not too important, we can trade some aspects of reliability for speed
We expect to work in AWS / GCP; ideally the key-value store would be managed by the cloud provider.
We want to spend no more than 1K USD per month on cloud costs for this solution.
What we have tried:
We have this approach implemented in a relational database, AWS RDS MariaDB. The server has 32 GB RAM, a 2 TB GP2 SSD, and 8 CPUs. I found that IOPS usage was high and insert speed was not satisfactory. After investigation I concluded that, due to the random nature of customerId, there is a high rate of scattered writes to the index. After this I did the following:
Input data is sorted by customerId ASC.
An additional trade-off was made to reduce index size with little degradation of single-record read speed: I introduced buckets, where records 1111111185 and 1111111186 go to the same "bucket" 11111111 (see the sketch below). This way a bucket can't contain more than 100 customerIds, so read speed stays acceptable and write speed improves.
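For illustration, a sketch of that bucketing scheme, assuming customerId is a 10-digit string and the bucket key is simply its first 8 digits:

```typescript
// Sketch of the bucketing described above: a 10-digit customerId maps to a
// "bucket" key made of its first 8 digits, so one bucket can hold at most
// 100 distinct customerIds (the last two digits range over 00-99).
function bucketKey(customerId: string): string {
  if (!/^\d{10}$/.test(customerId)) {
    throw new Error("expected a 10-digit customerId");
  }
  return customerId.slice(0, 8);
}

// Both example records from the question land in bucket "11111111".
console.log(bucketKey("1111111185"), bucketKey("1111111186"));
```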
Even like this, I could not achieve more than 1-3M record writes per hour. Different write concurrencies were tested; the current value is 4 concurrent writers. After all these modifications it's not clear what else we can improve:
IOPS is not maxed out (~4K per second),
CPU use is not high,
Network is not fully utilized,
Write and read throughputs are not capped.
Apparently, ACID guarantees are holding us back. I am looking for a horizontally scalable key-value store and would be glad to hear any ideas and rough estimates.
So if I understand you...
2kb * 500m records ≈ 1 TB of data
20m writes/hr ≈ 5.5k writes/sec
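Spelled out as a quick back-of-the-envelope check (using the maximal 2 KB record size from the question):

```typescript
// Back-of-the-envelope sizing, using the maximal assumptions above.
const recordBytes = 2 * 1024;                            // ~2 KB per record
const recordCount = 500_000_000;
const totalTB = (recordBytes * recordCount) / 1024 ** 4; // ≈ 0.93 TB, call it 1 TB
const writesPerSec = 20_000_000 / 3600;                  // ≈ 5,556 writes/sec
console.log(totalTB.toFixed(2), "TB;", Math.round(writesPerSec), "writes/sec");
```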
That's quite doable in NoSQL.
The scale is not the issue. It's your cost.
$1k a month for 1 TB of data sounds like a reasonable goal. I just don't think that the public clouds are quite there yet.
Let me give an example with my recommendation: Scylla Cloud and Scylla Open Source. (Disclosure: I work for ScyllaDB.)
I will caution you that your $1k/month cap on costs might force you to consider some tradeoffs.
As is typical in high availability deployments, to ensure data redundancy in case of node failure, you could use 3x i3.2xlarge instances on AWS (can store 1.9 TB per instance).
You want the extra capacity to run compactions. We use incremental compaction, which saves on space amplification, but you don't want to go with the i3.xlarge (0.9 TB each), which is under your 1 TB of data, unless you're really pressed for costs. In that case you'll have to do some sort of data eviction (like a TTL) to keep your data to around <600 GB.
Even with annual reserved pricing for Scylla Cloud (see here: https://www.scylladb.com/product/scylla-cloud/#pricing) of $764.60/server, to run the three i3.2xlarge would be $2,293.80/month. More than twice your budget.
Now, if you eschew managed services and want to run self-service, you could go with Scylla Open Source and just look at the on-demand instance pricing (see here: https://aws.amazon.com/ec2/pricing/on-demand/). For 3x i3.2xlarge, you are running each at $0.624/hour. That's a raw on-demand cost of $449.28 per instance per month, which doesn't include incidentals like backups, data transfer, etc. But you could get three instances for $1,347.84. Open Source. Not managed.
Still over your budget, but closer. If you could get reserved pricing, that might just make it.
Edit: Found the reserved pricing:
3x i3.2xlarge is going to cost you:
At monthly pricing: $312.44 x 3 = $937.32/month, or
At 1-year up-front pricing: $3,482/year / 12 = $290.17/month per server, x 3 = $870.50/month.
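Putting the quoted prices side by side in a small sketch (all figures are the ones quoted in this answer and will drift as AWS and Scylla pricing changes):

```typescript
// Rough monthly server cost for 3x i3.2xlarge, using the prices quoted in
// this answer (AWS and Scylla pricing changes, so treat them as assumptions).
const nodes = 3;
const onDemand = 0.624 * 720 * nodes;         // ≈ $1,347.84/month ($0.624/hr, ~720 hrs)
const reservedMonthly = 312.44 * nodes;       // = $937.32/month
const reservedUpfront = (3_482 / 12) * nodes; // ≈ $870.50/month (1-year up-front)
const scyllaCloud = 764.6 * nodes;            // = $2,293.80/month (annual reserved)
console.log({ onDemand, reservedMonthly, reservedUpfront, scyllaCloud });
```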
So, again, backups, monitoring, and other costs come on top of that. But you should be able to bring the raw server cost under $1,000 and meet your needs using Scylla Open Source.
But the admin burden is on your team (and their time isn't exactly zero cost).
For example, if you want monitoring on your system, you'll need to set up something like Prometheus, Grafana or Datadog. That will be other servers or services, and they aren't free. (The cost of backups and monitoring by our team are covered with Scylla Cloud. Part of the premium for the service.)
Another way to save money is to do only 2x replication, which puts your data at real risk if you lose a server. It is not recommended.
All of this was based on maximal assumptions about your data: that your records are all around 2 KB (not 1 KB), and that you're not getting much utility out of data compression, which ScyllaDB has built in; see part one (https://www.scylladb.com/2019/10/04/compression-in-scylla-part-one/) and part two (https://www.scylladb.com/2019/10/07/compression-in-scylla-part-two/).
To my mind, you should be able to squeak through with your $1k/month budget if you go with reserved pricing and open source. Though adding on monitoring, backups, and other incidental costs (which I haven't calculated here) may put you back over that number again.
Otherwise, $2.3k/month in a fully-managed-cloud enterprise package and you can sleep easy at night.
I have an entry plan instance of DB2 Warehouse on Cloud that I'm looking to use for development of a streaming application.
If I keep the data to <= 1GB, it will cost me $50/month. I'm worried that I could easily fill the database up with 20GB and the cost jumps up to $1000/month.
Is there a way that I can limit the amount of data in my DB2 Warehouse on Cloud to < 1GB?
As per this link
Db2 Warehouse pricing plans
You will not be charged anything as long as your data usage does not exceed 1 GB. From 1 GB to 20 GB the price will vary based on the data used.
You should be able to see the current % of usage at any time in your console. Other than that I am not aware of any method to automatically restrict the usage to less than 1 GB at this time.
One complication is data compression, which determines the actual amount of data stored, and it can vary based on the type of data stored.
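If you want something more proactive than watching the console, one option is a small watchdog that polls table sizes and warns as you approach 1 GB. This is only a hypothetical sketch, not an official quota mechanism: it assumes the ibm_db Node.js driver and that the SYSIBMADM.ADMINTABINFO administrative view (with its *_P_SIZE columns) is queryable on your plan, which you should verify for your Db2 version.

```typescript
// Hypothetical watchdog, not an official quota mechanism. Assumes the ibm_db
// Node.js driver and that the SYSIBMADM.ADMINTABINFO administrative view
// (with *_P_SIZE columns in KB) is queryable on your plan -- verify this
// for your Db2 version before relying on it.
const ibmdb = require("ibm_db");

const connStr = process.env.DB2_CONN_STR!; // "DATABASE=...;HOSTNAME=...;UID=...;PWD=...;PORT=50000;PROTOCOL=TCPIP"
const LIMIT_BYTES = 1 * 1024 ** 3;         // the 1 GB tier boundary

const sizeSql = `
  SELECT SUM(DATA_OBJECT_P_SIZE + INDEX_OBJECT_P_SIZE + LONG_OBJECT_P_SIZE +
             LOB_OBJECT_P_SIZE + XML_OBJECT_P_SIZE) * 1024 AS TOTAL_BYTES
  FROM SYSIBMADM.ADMINTABINFO
  WHERE TABSCHEMA NOT LIKE 'SYS%'`;

ibmdb.open(connStr, (err: Error | null, conn: any) => {
  if (err) throw err;
  conn.query(sizeSql, (qErr: Error | null, rows: any[]) => {
    if (qErr) throw qErr;
    const usedBytes = Number(rows[0].TOTAL_BYTES || 0);
    console.log(`Using ${(usedBytes / 1024 ** 3).toFixed(2)} GB of the 1 GB tier`);
    if (usedBytes > 0.9 * LIMIT_BYTES) {
      console.warn("Close to 1 GB -- stop loading data or prune tables.");
    }
    conn.close(() => {});
  });
});
```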
Hope this helps.
Regards
Murali
Google cloud storage documentation (https://cloud.google.com/storage/docs/storage-classes) suggests that there is "minimum storage duration" for Nearline (30days) and Coldline (90 days) storage but there is no description about regional and multi-regional storage.
Does it mean that there is absolutely no minimum storage duration? Is the unit by microsecond, second, minute, hour, or a day?
For example, suppose (unrealistically) that I created a Google Cloud Storage bucket, copied 10 petabytes of data to the bucket, and removed each piece of data 1 minute later (or moved the data to another bucket I don't have to pay for). Would the cost of regional storage then be
$0.02 (per GB per month) * 10,000,000 (GB) * 1/(30*24*60) (months) = $4.62?
If the "unit" of GCS bucket usage time is an hour rather than a minute, then the cost will be $278. If it is 12 hours, the cost will be $3333, so there are huge differences in such an extreme case.
I want to create a "temporary bucket" that holds petabyte-scale data in a short period of time, and just wanted to know what the budget should be. The previous question (Minimum storage duration Google Cloud Storage buckets) did not help in answering to my question.
There is no minimum storage duration for regional or multi-regional buckets, but keep in mind that you will still have to pay operations costs to upload. At the time of this writing, that would be $0.05 per 10,000 files uploaded.
I believe the granularity of billing is seconds, so your initial calculation $4.62 would be correct. But note depending on the average size of your files, uploading a petabyte is likely to be more expensive than that; if you have a petabyte of files that are 100MB in size, the operations cost for the uploads would be ~$50.
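For reference, the same numbers worked through as a sketch (rates as quoted here; check current GCS pricing before budgeting):

```typescript
// The question's scenario, worked through with the rates quoted in this answer.
const pricePerGBMonth = 0.02;            // regional storage, per GB per month
const storedGB = 10_000_000;             // 10 PB expressed in GB
const monthsStored = 1 / (30 * 24 * 60); // one minute of a 30-day month
const storageCost = pricePerGBMonth * storedGB * monthsStored;
console.log(storageCost.toFixed(2), "USD of storage");              // ≈ 4.63

// Class A operations for the uploads at ~$0.05 per 10,000 files.
const filesPerPB = 1e15 / (100 * 1e6);   // a petabyte of 100 MB objects = 10M files
const uploadCostPerPB = (filesPerPB / 10_000) * 0.05;
console.log(uploadCostPerPB.toFixed(2), "USD per petabyte uploaded"); // = 50.00
```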
I am using a node.js application running on Google Compute Engine to create a GCS bucket for each user. Bucket creation is a one-time activity for each user. But when I try to run the program to create unique buckets for 20 users in parallel, I get the error below.
"Error code":429 and "Error message":"The project exceeded the rate limit for creating and deleting buckets."
Is there anyway I can increase this limit?
No.
There is a per-project rate limit to bucket creation and deletion of approximately 1 operation every 2 seconds, so plan on fewer buckets and more objects in most cases. If you're designing a system that adds many users per second, then design for many users in one bucket (with appropriate ACLs) so that the bucket creation rate limit doesn't become a bottleneck.
See https://cloud.google.com/storage/docs/quotas-limits.
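If you must create buckets programmatically anyway, the practical approach is to work within the limit rather than around it. A sketch using the @google-cloud/storage Node.js client, pacing creations to roughly one every 2 seconds and backing off on 429s (bucket names, delays, and the err.code check are illustrative assumptions):

```typescript
// Sketch of working within the limit rather than raising it: create buckets
// one at a time, pacing to roughly one request every 2 seconds and backing
// off when a 429 still comes back. Names, delays, and the err.code check
// are illustrative assumptions.
import { Storage } from "@google-cloud/storage";

const storage = new Storage();
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

async function createBucketsSlowly(names: string[]): Promise<void> {
  for (const name of names) {
    let backoffMs = 2_000;
    for (;;) {
      try {
        await storage.createBucket(name);
        break;
      } catch (err: any) {
        if (err.code !== 429) throw err; // only retry on rate-limit errors
        await sleep(backoffMs);
        backoffMs = Math.min(backoffMs * 2, 60_000);
      }
    }
    await sleep(2_000);                  // ~1 bucket creation per 2 seconds
  }
}

createBucketsSlowly(["user-bucket-1", "user-bucket-2"]).catch(console.error);
```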