I have a series of interrelated questions about CloudKit pricing, hence a single post with multiple questions.
The current CloudKit calculator (as of Feb 2017) shows the following pricing details for 100,000 users:
Imagine working on an application which has large assets and a large amount of data transfer (even after using compression and architecting it so that transfers are minimized).
Now, assume my actual numbers for an app such as the one I just described, with 100,000 users, are:
Asset Storage: 7.5 TB
Data Transfer: 375 TB (this looks pretty high, but assume it is true)
My Questions
Will the Data Transfer component of my usage bill then be (375 - 5) TB * 1000 GB/TB * 0.10 $/GB = 37,000 $? (A quick sanity check of this arithmetic appears after these questions.)
Also, is there any pricing change if one stays within the 5 TB limit but exceeds the 50 MB per-user limit, or is that per-user limit just an average? In other words, if data transfer per user is higher than 50 MB but I stay within the 5 TB limit, will I not be charged?
What does Active Users really mean in this pricing context? The number of users who have downloaded the app or the number of users using the app in a given month?
How is the asset storage counted? Imagine for 2 successive months, this is the asset size uploaded: Month 1: 7.5 TB, Month 2: 7.5 TB. Then in the second month, will my asset storage be counted as 15 TB or 7.5 TB?
Is it correct that asset storage and the other allocations increase for every user that is added (the screenshot does say that), or are the allocations increased in bulk only when you hit certain numbers such as 10K, 20K, ..., 100K? I read about bulk allocation but cannot find the source now; I am asking this question just to be sure, to avoid unpleasant surprises later.
Last but not least, is CloudKit usage billed monthly?
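Regarding the data-transfer question above, here is a minimal sanity check of the arithmetic, using only the figures quoted in this post (5 TB free allowance, 0.10 $/GB overage); these are this post's assumptions, not authoritative CloudKit prices:

```python
# Figures taken from the question above; not official CloudKit pricing.
free_transfer_tb = 5
used_transfer_tb = 375
overage_rate_per_gb = 0.10

# The calculator appears to use 1 TB = 1000 GB.
overage_gb = (used_transfer_tb - free_transfer_tb) * 1000
overage_cost = overage_gb * overage_rate_per_gb

print(f"Overage: {overage_gb:,} GB -> ${overage_cost:,.2f}")  # 370,000 GB -> $37,000.00
```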
Related
I have a critical application deployed in AWS RDS; the DB engine is PostgreSQL version 10.18. The architecture is unusual because we're dealing with medical data. This means that all the doctors connecting to the database (through PgBouncer) have their own schema; around 4000 doctors means around 4000 schemas, with the same structure but obviously different data. Around 2000 doctors actually connect every day.
The instance type is db.r5.4xlarge and there's a total buffer of around 100 GB. Still, there are a lot of hits on the disk: on the Performance Insights side I can see that the largest share of AAS comes from a wait event called "DataFileRead", which (as far as I know) means that the data couldn't be fetched from the buffer and the engine went to disk. There's an average of 60 AAS on DataFileRead.
That is not really the problem; I'm trying to apply some optimizations, creating the right indexes for example. The problem is that on the Top SQL tab I cannot see any data next to the query (like Calls/sec, Rows/sec, Blk hits/sec, etc.).
Does this mean that the 5000-row limit of pg_stat_statements is too low? Also, I can't find any information about the performance impact of having these statistics enabled. Does the impact increase significantly when raising the 5000-record limit? Can I go up to 50000, for example?
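A minimal sketch for checking whether that limit is actually being hit, assuming psycopg2 and that the pg_stat_statements extension is created in the target database; the connection string is a placeholder:

```python
import psycopg2  # assumed driver; swap in your own connection handling

# Placeholder DSN; point it at the RDS endpoint you normally use.
conn = psycopg2.connect("host=my-rds-endpoint dbname=postgres user=admin password=secret")

with conn, conn.cursor() as cur:
    # Current cap on tracked statements (defaults to 5000).
    cur.execute("SHOW pg_stat_statements.max;")
    print("pg_stat_statements.max =", cur.fetchone()[0])

    # Number of statements currently tracked; if this sits at the cap,
    # the least-executed entries are being evicted, which can leave
    # Performance Insights with no per-query counters to display.
    cur.execute("SELECT count(*) FROM pg_stat_statements;")
    print("tracked statements =", cur.fetchone()[0])
```

Note that pg_stat_statements.max is a static setting, so on RDS it is raised through the DB parameter group and only takes effect after a restart.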
I would like to know how much I would save by transferring 1 TB of data from a standard regional bucket to an Archive bucket located in the same region (and within the same project).
I understand that the cost can be split into Data Storage, Network Usage and Operations Usage.
For the Data Storage:
The cost of storing 1 TB in a Standard bucket per month: 1024 * 0.020 $ = 20.48 $
The cost of storing 1 TB in an Archive bucket per month: 1024 * 0.0012 $ = 1.2288 $
Which means that I would save 19.2512 $ per month.
For the Network Usage:
I assume that the cost of the transfer will be 0 because the data moves within the same region.
For the Operations Usage:
Retrieval cost from the Standard bucket: 0.004 $
It should need less than 10000 Class B operations to gather all the files.
Insertion cost into the Archive bucket: 0.50 $
It should need around 1024 * 1024 / 128 = 8192 Class A operations (1 per directory, 1 per file, and, for each file larger than 128 MB, 1 per additional 128 MB).
So in total, I would have to pay 0.504 $ once to transfer all the files to the Archive bucket, and the bucket will then cost me 1.2288 $ per month instead of 20.48 $.
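As a quick sanity check, here is the same arithmetic as a short sketch, hard-coding the prices quoted above (not current list prices):

```python
# All prices below are the ones quoted in this question, not current GCS list prices.
size_gib = 1024                                       # 1 TB expressed as 1024 GB

standard_monthly = size_gib * 0.020                   # 20.48 $/month in Standard
archive_monthly = size_gib * 0.0012                   # 1.2288 $/month in Archive
monthly_saving = standard_monthly - archive_monthly   # 19.2512 $/month

class_a_ops = size_gib * 1024 // 128                  # 8192 writes into Archive, per the estimate above
one_time_cost = 0.50 + 0.004                          # Class A block + Class B block, as estimated above

print(f"{class_a_ops} Class A ops, saving {monthly_saving:.4f} $/month, one-time {one_time_cost:.3f} $")
```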
Is my calculation correct or did I miss something?
Regards,
According to the documentation on Cloud Storage Pricing, your estimates seem to be correct. Moreover, the amount of data you would like to transfer is quite small, so the charges would be low as well.
Keep in mind that the Archive storage class implies that reads and early deletions are charged accordingly, as shown here, so if you intend to access that data often or overwrite the files therein, it might be better to stay with the Standard storage class.
Lastly, there is also a pricing calculator for this kind of estimate, which can be found here.
I'm using MongoDB with approximately 4 million documents and around 5-6GB database size. The machine has 10GB of RAM, and free only reports around 3.7GB in use. The database is used for a video game related ladder (rankings) website, separated by region.
It's a fairly write-heavy workload, but it still gets a significant number of reads as well. We use an updater which queries an outside source every hour or two. This updater then processes the records and updates documents in the database. The updater only processes one region at a time (see previous paragraph), so approximately 33% of the database is updated.
When the updater runs, and for the duration that it runs, the average flush time spikes up to around 35-40 seconds, and we experience general slowdowns with other queries. The updater is RUN on a SEPARATE MACHINE and only queries MongoDB at the end, when all the data has been retrieved and processed from the third party.
Some people have suggested slowing down the number of updates, or only updating players who have changed, but the problem comes down to rankings. Since we support ties between players, we need to pre-calculate the ranks - so even if only a few users have actually changed ranks, we still need to update the rest of the users' ranks accordingly. At least, that was the case with MySQL - I'm not sure if there is a good solution with MongoDB for ranking ~800K-1.2 million documents while supporting ties.
My question is: how can we improve the flush times and slowdowns we're experiencing? Why is the flush time spiking so high? Would disabling journaling (to take some load off the I/O) help? Data loss isn't something I'm worried about, since the database is updated frequently regardless.
Server status: http://pastebin.com/w1ETfPWs
You are using the wrong tool for the job. MongoDB isn't designed for ranking large ladders in real time, at least not quickly.
Use something like Redis. Redis has something called a "Sorted Set" designed for exactly this job; with it you can have 100 million entries and still fetch the 5,000,000th to 5,001,000th entries at sub-millisecond speed.
From the official site (Redis - Sorted sets):
Sorted sets
With sorted sets you can add, remove, or update elements in a very fast way (in a time proportional to the logarithm of the number of elements). Since elements are taken in order and not ordered afterwards, you can also get ranges by score or by rank (position) in a very fast way. Accessing the middle of a sorted set is also very fast, so you can use Sorted Sets as a smart list of non repeating elements where you can quickly access everything you need: elements in order, fast existence test, fast access to elements in the middle!
In short with sorted sets you can do a lot of tasks with great performance that are really hard to model in other kind of databases.
With Sorted Sets you can:
Take a leader board in a massive online game, where every time a new score is submitted you update it using ZADD. You can easily take the top users using ZRANGE, you can also, given an user name, return its rank in the listing using ZRANK. Using ZRANK and ZRANGE together you can show users with a score similar to a given user. All very quickly.
Sorted Sets are often used in order to index data that is stored inside Redis. For instance if you have many hashes representing users, you can use a sorted set with elements having the age of the user as the score and the ID of the user as the value. So using ZRANGEBYSCORE it will be trivial and fast to retrieve all the users with a given interval of ages.
Sorted Sets are probably the most advanced Redis data types, so take some time to check the full list of Sorted Set commands to discover what you can do with Redis!
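A minimal sketch of the leaderboard pattern described above, using the redis-py client; the key and member names are illustrative:

```python
import redis  # redis-py client

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Submit or refresh scores for one region's ladder (ZADD overwrites existing members).
r.zadd("ladder:na", {"player:1": 2450, "player:2": 2450, "player:3": 1990})

# Top 10 players, highest score first.
top10 = r.zrevrange("ladder:na", 0, 9, withscores=True)

# 0-based rank of a single player in descending-score order.
rank = r.zrevrank("ladder:na", "player:2")

# Tied players get distinct consecutive ranks (ordered lexicographically by member),
# so a tie-aware "shared" rank can be derived as the count of strictly higher scores.
shared_rank = r.zcount("ladder:na", "(2450", "+inf")

print(top10, rank, shared_rank)
```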
Without seeing any disk statistics, I am of the opinion that you are saturating your disks.
This can be checked with iostat -xmt 2, and checking the %util column.
Please don't disable journalling - you will only cause more issues later down the line when your machine crashes.
Separating collections will have no effect. Separating databases may, but if you're IO bound, this will do nothing to help you.
Options
If I am correct, and your disks are saturated, adding more disks in a RAID 10 configuration will vastly help performance and durability - more so if you separate the journal off to an SSD.
Assuming that this machine is a single server, you can set up a replica set and send your read queries there. This should help you a fair bit, but not as much as fixing the disks.
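If you do go the replica-set route, here is a minimal sketch of routing reads to secondaries with pymongo; the host names, replica-set name, and database/collection names are placeholders, and secondary reads can return slightly stale data:

```python
from pymongo import MongoClient

# Placeholder hosts and replica-set name.
client = MongoClient(
    "mongodb://db1.example.com,db2.example.com,db3.example.com",
    replicaSet="ladder0",
    readPreference="secondaryPreferred",  # serve reads from a secondary when one is available
)

# Writes still go to the primary; this read may be served by a secondary.
top100 = list(
    client.ladder.players.find({"region": "na"}).sort("rank", 1).limit(100)
)
```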
I have to implement an iOS application which will store data in the iPhone's local storage. For example, a user can generate around 3 million records per year on the device. Will the iPhone be able to handle that many records?
Thanks
If there is a limit, it will not be the number of rows, but the storage space you take. If one row of your database is, say, 100 bytes, the DB size will be about 300 MB. Since the maximum size for an app is currently 2 GB, it should work, unless your database rows are wider than roughly 700 bytes (2 GB / 3 million rows).
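The rough arithmetic behind that answer, as a sketch where the row width is an assumption to replace with your real schema:

```python
# Back-of-the-envelope storage estimate for the scenario in the question.
rows_per_year = 3_000_000
bytes_per_row = 100          # assumed average row width

db_size_mb = rows_per_year * bytes_per_row / 1_000_000
print(f"~{db_size_mb:.0f} MB per year")  # ~300 MB at 100 bytes per row
```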
MongoDB is fast, but only when your working set or index can fit into RAM. So if my server has 16 GB of RAM, does that mean the sizes of all my collections need to be less than or equal to 16 GB? How does one say "OK, this is my working set, the rest can be archived"?
"Working set" is basically the amount of data AND indexes that will be active/in use by your system.
So for example, suppose you have 1 year's worth of data. For simplicity, each month relates to 1GB of data giving 12GB in total, and to cover each month's worth of data you have 1GB worth of indexes again totalling 12GB for the year.
If you are always accessing the last 12 months' worth of data, then your working set is: 12 GB (data) + 12 GB (indexes) = 24 GB.
However, if you actually only access the last 3 months' worth of data, then your working set is: 3 GB (data) + 3 GB (indexes) = 6 GB. In this scenario, if you had 8 GB of RAM and then started regularly accessing the past 6 months' worth of data, your working set would begin to exceed your available RAM and performance would suffer.
But generally, if you have enough RAM to cover the amount of data/indexes you expect to be frequently accessing then you will be fine.
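A toy sketch of that back-of-the-envelope estimate, using the per-month figures assumed in this answer (1 GB of data plus 1 GB of indexes per month):

```python
# Figures match the example in this answer; adjust to your own data and index sizes.
data_per_month_gb = 1
index_per_month_gb = 1
ram_gb = 8

def working_set_gb(months_hot: int) -> int:
    """Working set = frequently accessed data plus the indexes that serve it."""
    return months_hot * (data_per_month_gb + index_per_month_gb)

for months in (3, 6, 12):
    ws = working_set_gb(months)
    verdict = "fits in" if ws <= ram_gb else "exceeds"
    print(f"{months} months hot -> {ws} GB working set, which {verdict} {ram_gb} GB of RAM")
```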
Edit: Response to question in comments
I'm not sure I quite follow, but I'll have a go at answering. Firstly, the working-set calculation is a ballpark figure. Secondly, if you have, for example, a 1 GB index on user_id, then only the portion of that index that is commonly accessed needs to be in RAM (e.g. if 50% of users are inactive, then only about 0.5 GB of the index will be frequently needed in RAM). In general, the more RAM you have the better, especially as the working set is likely to grow over time with increased usage. This is where sharding comes in - split the data over multiple nodes and you can cost-effectively scale out. Your working set is then divided over multiple machines, meaning more of it can be kept in RAM. Need more RAM? Add another machine to shard onto.
The working set is basically the stuff you are using most frequently. If you use index A on collection B to search for a subset of documents, then you could consider that your working set. As long as the most commonly used parts of those structures fit in memory, things will be exceedingly fast. As parts of it no longer fit - for example, many of the documents - things can slow down. Generally, things will become much slower if your indexes exceed your memory.
Yes, you can have lots of data where most of it is "archived" and rarely used, without affecting the performance of your application or impacting your working set (which doesn't include that archived data).