How much storage does Rundeck need in S3 for log storage for, say, 100 jobs with 100 executions a day?
There is no information in the docs about minimum storage requirements, hence the question.
You can work that out with a simple experiment. If one execution (in my example it's a simple "hello world!" print) takes 2.4 kB, then 2.4 kB x 100 = 240 kB (100 executions), and 240 kB x 100 = 24,000 kB = 24 MB per day (100 executions of 100 jobs). Of course, that depends on your job output; my example just prints a "hello world!" string.
Anyway, this is not a big deal for the Amazon S3 service (check the "How much data can I store in Amazon S3?" section of the Amazon S3 FAQ).
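To make the arithmetic explicit, here is a quick Python sketch; the 2.4 kB figure is just the size I measured for one "hello world!" execution, not a Rundeck constant:

# Scale one measured execution log up to a day's worth of executions.
log_kb_per_execution = 2.4   # measured size of one "hello world!" execution log
executions_per_job = 100
jobs = 100
daily_kb = log_kb_per_execution * executions_per_job * jobs
print(daily_kb / 1000, "MB per day")  # -> 24.0 MB per day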
Explanation: We are able to run 200 TPS on a 16-core CPU successfully, with around 40% CPU utilization and 50-60 concurrent locks. However, when we increase the load up to 300 TPS, the DB response gets slow after a 15-20 minute run. The system shows 2-4% dead tuples.
Observation: CPU and other resources remain stable, and if we run a vacuum while the system is slow, performance improves. However, after 15-20 minutes the system starts getting slow again.
CPU: 16 cores, RAM: 128 GB, DB size: 650 GB
I have run some tests with YCSB on MongoDB Enterprise, with and without encryption at rest. I was using the default workloads and found some weird results when running workload E.
Without encryption the runtime was about 13 minutes, but when I switched to an encrypted database the runtime jumped to a suspicious 17 hours!
There must be something wrong, but I can't figure out what it could be. All the tests are run with a 100K operation count and a 10M record count, and I'm rebooting the system after each run. I would appreciate some help figuring this one out.
YCSB does no encryption per se; it relies on the MongoDB Java driver. Have you checked the MongoDB documentation?
Which type of encryption are you using?
I don't find your result that surprising. According to your question, your workload file looks like:
recordcount=10000000
operationcount=100000
readproportion=0
updateproportion=0
scanproportion=0.95
insertproportion=0.05
requestdistribution=zipfian
maxscanlength=100
scanlengthdistribution=uniform
This is a very scan-intensive workload. First, scans are among the slowest operations for this kind of store. Second, assuming it takes 250 ms for encryption and 400 ms for decryption, and both the client and the server have to do it for each operation, it will take (0.25 + 0.4) * 100000 seconds, i.e. about 18 hours.
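As a rough sanity check of that estimate (the 250 ms / 400 ms per-operation costs are assumptions, not measured MongoDB figures):

# Assumed per-operation overhead: 250 ms to encrypt + 400 ms to decrypt.
ops = 100_000
per_op_seconds = 0.25 + 0.40
total_hours = ops * per_op_seconds / 3600
print(total_hours)  # -> ~18.1 hours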
EDIT
According to your comments, you are using AES256 and comparing Workloads A and E.
Workload A is about 50% reads and 50% writes. If you're using the standard row size of YCSB, each row represents 1 kB (10 fields, 100 B each).
So, for 100k operations, you are manipulating the following amounts of data:
Workload A: 100000*0.5*1kB + 100000*0.5*1kB = 100 MB
Workload E: 100000*0.95*100*1kB + 100000*0.05*1kB = 9505 MB because your scans represent 100 rows!
Since encryption cost scales linearly with the amount of data processed, you encrypt roughly 95 times more data with workload E, which explains the time difference.
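A small Python sketch of those volume figures, assuming the standard 1 kB YCSB row and 100-row scans:

# Data touched by 100k operations under workloads A and E.
ops = 100_000
row_kb = 1
workload_a_mb = (ops * 0.5 * row_kb + ops * 0.5 * row_kb) / 1000          # 50% reads + 50% updates
workload_e_mb = (ops * 0.95 * 100 * row_kb + ops * 0.05 * row_kb) / 1000  # 95% scans of 100 rows + 5% inserts
print(workload_a_mb, workload_e_mb, workload_e_mb / workload_a_mb)
# -> 100.0 MB, 9505.0 MB, ~95x more data for workload E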
I recently set up Google Cloud Storage access and storage logging, and the logs are being delivered, but I can see 4 log objects for the same hour.
For example:
usage_2017_02_14_07_00_00_00b564_v0
usage_2017_02_14_07_00_00_01b564_v0
usage_2017_02_14_07_00_00_02b564_v0
usage_2017_02_14_07_00_00_03b564_v0
So there are 4 usage logs written for every hour; what's the difference between them?
I connected all the logs to BigQuery to query the tables, and all 4 of them have different values.
Also, analysing the storage logs, I can see storage_byte_hours is 43423002260.
How do I calculate the cost from storage_byte_hours?
It is normal for GCS to sometimes produce more than one log file for the same hour. From Downloading logs (emphasis mine):
Note:
Any log processing of usage logs should take into account the possibility that they may be delivered later than 15 minutes after the end of an hour.
Usually, hourly usage log object(s) contain records for all usage that occurred during that hour. Occasionally, an hourly usage log object contains records for an earlier hour, but never for a later hour.
Cloud Storage may write multiple log objects for the same hour.
Occasionally, a single record may appear twice in the usage logs. While we make our best effort to remove duplicate records, your log processing should be able to remove them if it is critical to your log analysis. You can use the s_request_id field to detect duplicates.
You calculate the bucket size from storage_byte_hours. From Access and storage log format:
Storage data fields:
Field Type Description
storage_byte_hours integer Average size in byte-hours over a 24 hour period of the bucket.
To get the total size of the bucket, divide byte-hours by 24.
In your case 43423002260 byte-hours / 24 hours = 1809291760 bytes
You can use the bucket size to estimate the cost for the storage itself:
1809291760 bytes = 1809291760 / 2^30 GB ≈ 1.685 GB
Assuming Multi Regional Storage at $0.026 per GB per month, your storage cost would be:
1.685 GB x $0.026 = $0.04381 / month ≈ $0.00146 / day (with a 30-day month)
But a pile of other data (network, ops, etc.) is needed to compute additional related costs; see Google Cloud Storage Pricing.
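Putting the whole calculation together in a short Python sketch; the $0.026 per GB per month rate is an assumption for Multi-Regional Storage, so check the current pricing page:

# From storage_byte_hours to an estimated monthly/daily storage cost.
storage_byte_hours = 43423002260
bucket_bytes = storage_byte_hours / 24        # average bucket size over the 24 h period
bucket_gb = bucket_bytes / 2**30
monthly_usd = bucket_gb * 0.026               # assumed Multi-Regional rate per GB-month
print(round(bucket_gb, 3), "GB,", round(monthly_usd, 5), "USD/month,",
      round(monthly_usd / 30, 5), "USD/day")
# -> 1.685 GB, 0.04381 USD/month, 0.00146 USD/day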
I am about to set up MongoDB on AWS EC2 (Amazon Linux HVM 64-bit) and implement RAID 10.
I am expecting a couple of million records for a video-on-demand system.
I could not find any good advice on how much disk space I should use for that instance.
The dilemma is that I can't spend too much on EBS volumes right now, but if I have to add a new, bigger volume in less than a year and take the DB down to move the data to that new volume, that is a problem.
For the initial stage, I was thinking 16 GB (available after the RAID 10 implementation) on a t2.medium, with a plan to upgrade to m4.medium and add replica sets later.
Any thoughts on this?
The math is pretty simple:
Space required = bytes per record x number of records
If you have an average of 145 bytes per record with an expectation of 5 million records, you can work with 1 GB of storage.
EBS storage is pretty cheap. 1 GB of SSD is $0.10 per month in us-east-1. You could allocate 5 GB for only $0.50 per month.
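As a quick Python sketch using the example numbers above (the record size, record count and $0.10 per GB-month rate are just the assumptions from this answer):

# Rough EBS sizing and cost for the estimate above.
bytes_per_record = 145
records = 5_000_000
space_gb = bytes_per_record * records / 10**9   # ~0.7 GB of raw data
provisioned_gb = 5                              # leave headroom for growth and indexes
monthly_usd = provisioned_gb * 0.10             # ~$0.10 per GB-month for SSD in us-east-1
print(round(space_gb, 2), "GB of data,", monthly_usd, "USD/month for", provisioned_gb, "GB")
# -> 0.72 GB of data, 0.5 USD/month for 5 GB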
Also, RAID 10 is RAID 0 and RAID 1 combined. Read over this Server Fault question regarding RAID 0 and RAID 1 configurations with EBS.
https://serverfault.com/questions/253908/is-raid-1-overkill-on-amazon-ebs-drives-in-terms-of-reliability
We are running MongoDB on a virtual machine (A3) on Azure. We are trying to simulate the running cost of using MongoDB for the following scenario:
The scenario is to insert/update around 2K of data (time series data) every 5 minutes for 100,000 customers. We are using MongoDB on an A3 instance (4 cores) of Windows Server on Azure (which restricts us to 4 TB per shard).
When we estimated the running cost, it came out to approximately $34,000 per month, which includes MongoDB licensing, our MongoDB virtual machines, storage, backup storage and a worker role.
This is way too costly. We have some ideas to bring the cost down, but need some advice on them, as some of you may have already done this.
Two questions:
1- As of today, we estimate we need 28 MongoDB instances (with the 4 TB limit). I have read that we can increase the disk size from 4 TB to 64 TB on a Linux VM or Windows Server 2012. This may reduce the number of shards we need. Is running MongoDB on a shard with 64 TB of disk possible in Azure?
You may ask why 28 instances...
2- We are calculating the number of shards required based on "number of inserts per core", which itself depends on the number of values inserted into MongoDB per message; each value is 82 bytes. We did some load testing, and it turns out we can only run 8,000 inserts per second, and each core can handle approximately 193 inserts per second, resulting in a need for 41 cores (which is way too high). Dividing 41 cores by 4 gives 11 A3 instances, which is another cost...
Looking for help to see if our calculation is wrong or if the way we have set things up is wrong.
Any help will be appreciated.
Question nr. 1:
1- As of today, we estimate we need 28 MongoDB instances (with the 4 TB limit). I have read that we can increase the disk size from 4 TB to 64 TB on a Linux VM or Windows Server 2012. This may reduce the number of shards we need. Is running MongoDB on a shard with 64 TB of disk possible in Azure?
According to the documentation here, the maximum you can achieve is 16 TB, i.e. 16 data disks attached at a maximum of 1 TB each. So, technically, the largest single disk you can attach is 1 TB, but you can build a RAID 0 stripe across the 16 attached disks to get 16 TB of storage. This (16 TB) is the maximum amount of storage you can officially get.
According to the Azure documentation, an A3 size can have a maximum of 8 data disks, so a maximum of 8 TB. An A4 can handle 16 disks. I would assume your bottleneck here is disk and not the number of cores, so I'm not convinced you need such a big cluster.
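As a hypothetical back-of-the-envelope sketch, assuming the data volume implied by the question (28 shards at the 4 TB limit, roughly 112 TB), here is how the per-shard storage ceiling changes the shard count:

import math

# Shard count needed for ~112 TB at different per-shard storage ceilings.
total_tb = 28 * 4                    # implied by 28 instances with a 4 TB limit
for per_shard_tb in (4, 8, 16):      # 4 TB limit, A3 (8 x 1 TB disks), 16 x 1 TB in RAID 0
    shards = math.ceil(total_tb / per_shard_tb)
    print(per_shard_tb, "TB per shard ->", shards, "shards")
# -> 4 TB: 28 shards, 8 TB: 14 shards, 16 TB: 7 shards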