Batch jobs and reduced SSD lifetime? - spring-batch

I am working on a batch job which imports data from a legacy database, transforms the data into 3NF, and inserts the resulting data into another database (the target database). The batch job is written with Spring Batch.
While I was developing the steps of the job, I wrote unit tests to cover the functionality of each step. Now that development of the steps is finished, I want to test the system in a kind of testing environment before rolling the batch job out to production. Therefore, I imported the legacy database locally onto a MySQL server and also created a local version of the target database. These MySQL servers run on my MacBook Pro with a 256 GB SSD. I have already run the job a few times while making small bug fixes, but it has occurred to me that SSDs are more sensitive to write cycles than a standard HDD. So I checked the mysqld process in Activity Monitor and noticed that 424.64 GB have been written to my SSD in the last three days.
How much of an impact (lifetime, write cycles) will this amount of written data have on my SSD? Would you recommend deploying the databases on a normal HDD instead of my SSD, or do you think I am worrying over nothing?

I would recommend deploying the database to a normal HDD, because the NAND flash in your SSD does have a maximum erase count. In other words, you are wearing down your SSD. Although SSDs have wear-levelling features to ensure the NAND flash wears down evenly, you are definitely wearing it down much faster than normal usage would.
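For a rough sense of scale, here is a back-of-the-envelope estimate (a sketch only; the 150 TBW endurance rating is an assumed typical figure for a 256 GB consumer drive, not the actual specification of your SSD):

```python
# Back-of-the-envelope SSD wear estimate.
# The endurance rating is an ASSUMED typical value for a 256 GB consumer
# drive, not the actual specification of the drive in question.
written_gb = 424.64                      # observed writes over the test period
days = 3
assumed_endurance_tbw = 150              # assumption: terabytes-written rating

gb_per_day = written_gb / days           # ~141.5 GB/day
endurance_gb = assumed_endurance_tbw * 1000
years_to_rated_limit = endurance_gb / gb_per_day / 365

print(f"~{gb_per_day:.0f} GB/day -> rated endurance reached in ~{years_to_rated_limit:.1f} years")
```

Even at that pace it would take a few years of continuous test runs to reach such a rated limit, but it is still far more wear than ordinary development use.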

Related

MongoDB Performance in Docker

I did an experiment by running a Python app that writes 2,000 records into MongoDB.
The details of my experiment setup are as follows:
Test 1: Local PC - Python app running on the local PC with MongoDB on the local PC (baseline)
Test 2: Docker - Python app in a Linux container with MongoDB in a Linux container, with a persistent volume
Test 3: Docker - Python app in a Linux container with MongoDB in a Linux container, without a persistent volume
I've plotted the results in a chart: on average, writing the data on the local PC takes about 30 seconds, whereas on Docker it takes 80-plus seconds. So it seems that writing on Docker is almost three times slower than writing on the local PC itself.
If I want to improve the write speed or performance of MongoDB in a Docker container, what is the recommended practice? Or should I run MongoDB outside Docker on an external volume?
Thank you!
Your system is not consistent in many ways: dynamic storage and CPU performance, other processes, dynamic system settings, etc. There are a LOT of underlying variables in the storage layer alone.
60-second tests are not enough for anything.
Simple operations are not good enough for baseline comparisons.
There is ZERO performance impact on storage and CPU in the case of containers; there is an impact on networking, but I assume this is not applicable here.
Databases and database management systems must be tuned in specific ways; there is no "install and run" approach. We sysadmins/DB admins usually need days to get one running smoothly. Also, performance changes over time.
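To illustrate the point about longer, repeated tests, a benchmark along these lines is more trustworthy than a single short run (a minimal pymongo sketch; the connection string, database/collection names, document shape, and pass counts are all assumptions):

```python
import time
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")     # assumed connection string
coll = client["benchmark_db"]["writes"]               # assumed database/collection

def run_pass(n_docs=100_000, batch_size=1_000):
    """Insert n_docs documents in batches and return the elapsed seconds."""
    coll.drop()
    start = time.perf_counter()
    for i in range(0, n_docs, batch_size):
        coll.insert_many(
            [{"seq": i + j, "payload": "x" * 100} for j in range(batch_size)]
        )
    return time.perf_counter() - start

run_pass(n_docs=10_000)                                # warm-up pass, result discarded
timings = [run_pass() for _ in range(5)]
print("per-pass seconds:", [round(t, 2) for t in timings])
```

Running identical, longer passes inside and outside the container and repeating them several times averages out the transient effects described above.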
After a couple of weeks of testing and troubleshooting, I finally got the answer, and I shall share my findings with the rest of the DevOps community or anyone facing the same issue as me.
Correct this statement if needed: Docker containers started out on Linux, and Microsoft joined the container bandwagon late. For containers to work (with Linux) on Windows, the DevOps team needs to install WSL2, and that costs extra overhead, which shows up in the processing speed.
So to improve performance with containers, the setup should be on a Linux OS instead of Windows. (And yes, the write time dropped drastically.)

Cassandra + Redis Project Server setup

I currently have a dedicated VPS server with 4 GB RAM and a 50 GB hard disk, and I have a SaaS solution running on the server with more than 1,500 customers. Now I'm going to upgrade the project's business plan: there will be 25,000 customers, with about 500-1,000 customers using the project in real time. At the moment it takes 5 seconds to fetch Cassandra database records from the server to the application. Then I came across Redis, and it is said that keeping a copy of the data in Redis helps fetch it much faster and lowers server overhead.
Am I right about this?
If I need to improve the overall performance, can anybody tell me what I need to upgrade?
Can a server with the configuration above handle Cassandra and Redis together?
Thanks in advance.
A machine with 4 GB of RAM will probably only have a single core, so it's too small for any production workload and only suitable for dev usage where you run 1 or 2 transactions per second, mostly for functional testing.
We generally recommend deploying Cassandra on machines with at least 2 cores + 8 GB allocated to the heap (so you need at least 16 GB of RAM) for low production loads. For moderate loads, 4 cores + 32 GB RAM is ideal so you can allocate 16 GB to the heap.
If you're just at the proof-of-concept stage, there's a tier on DataStax Astra that's free forever and doesn't require a credit card to create an account. I recommend it to most people because you can launch a cluster in a few clicks and you can quickly focus on developing your app. Cheers!
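On the Redis part of the question: what is being described is the cache-aside pattern. A minimal sketch of it looks roughly like this (the keyspace, table, column names, and TTL are assumptions for illustration only):

```python
import json
import redis
from cassandra.cluster import Cluster

cache = redis.Redis(host="localhost", port=6379)
session = Cluster(["localhost"]).connect("saas_keyspace")   # assumed keyspace name

def get_customer(customer_id, ttl_seconds=300):
    """Cache-aside read: try Redis first, fall back to Cassandra, then populate the cache."""
    key = f"customer:{customer_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)

    # Assumed table and columns for illustration only.
    row = session.execute(
        "SELECT name, plan FROM customers WHERE id = %s", (customer_id,)
    ).one()
    if row is None:
        return None

    data = {"id": customer_id, "name": row.name, "plan": row.plan}
    cache.setex(key, ttl_seconds, json.dumps(data))          # expire so stale entries age out
    return data
```

Note that a cache only helps with repeated reads of the same data; it does not fix an undersized Cassandra node, so the sizing advice above still applies.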

What are the Server Requirements for BluePrism tool with 5 Bots?

The company that I work for is planning to purchase the BluePrism tool, and we need the server requirements (RAM size, HD size, etc.) for running 5 bots.
Thanks in advance for your help.
The automation platform required depends on the nature of your automations.
You can run everything on a single node, and 8 GB of RAM and 4 cores will be enough, provided you are not automating applications that require many resources and you do not need to run anything concurrently.
On the other hand, for production purposes:
1 Blue Prism Application Server 2012 R2 (this is used to orchestrate your platform). I regularly allocate at least 8 GB RAM and 4 cores; the HDD space used by BP is insignificant, so 10 GB free after OS installation should be good.
1 SQL Server (where the structure of your automations and the execution logs are stored): at least 8 GB RAM, with HDD size depending on your transactions, logging, and the other configuration choices you make when you automate.
1 Interactive Client (where you will monitor your platform): 4 GB RAM, 50 GB HDD.
1 Runtime Resource (where your automations will run): here you have to take into consideration all the resources your automation will use, in particular HDD space and processor needs. A runtime resource can only run one automation at a time, so if you need to run 5 different automations (which I think is what you call robots) concurrently, you need 5 runtime resources, each with the applications required by its automation installed.

MongoDB Response Slower on Amazon EC2 Server Than Localhost

I'm loading the same amount of data (~100kb) on both my local server and a test Amazon EC2 server, but the response is 2x slower on EC2. Both are running Apache 2 and MongoDB on the same machine. On my local server, the response is about 209ms versus approximately 455ms on EC2.
I've set up a simple query and AJAX call that grabs point data to display on the map based on the current viewport of the device.
How can I debug this issue? How can I make it as fast as my local server? I even tried experimenting with different instance types to make sure the specs are the same, but no luck. I also realize it could be due to network latency.
Local computer specs:
Intel Core i5 @ 3.30 GHz
8GB RAM
64-bit Windows 8
Amazon EC2 specs (m4.large):
2.4 GHz Intel Xeon Haswell (2 vCPUs)
8GB RAM
Amazon Linux
A remote query to EC2 is unlikely to return the result of the AJAX query as fast as your local server because it has network latency, while your local server does not. Measure the time in your AJAX handler from the start of the query to the point where it is ready to return data to get a meaningful baseline for comparison.
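For example, the handler can time just the MongoDB query and result building, excluding the network round trip (a sketch in Python with pymongo for illustration; the actual handler may be in another language, and the collection name and geo query are assumptions):

```python
import time
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")    # assumed connection string
points = client["mapdb"]["points"]                   # assumed database/collection names

def handle_viewport_request(sw, ne):
    """Return point data inside the viewport, logging only the query+fetch time."""
    start = time.perf_counter()
    docs = list(points.find(
        {"loc": {"$geoWithin": {"$box": [sw, ne]}}}   # assumed 2d-indexed "loc" field
    ))
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"query+fetch took {elapsed_ms:.1f} ms for {len(docs)} docs")
    return docs
```

Comparing that number on the local machine versus on EC2 removes network latency from the comparison.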
MongoDB is very sensitive to data being in RAM vs on disk. Depending on how you configured your EC2 instance, and on your local hardware, chances are pretty good that your local hardware is faster. EC2 instances can be configured to use SSD storage, and you can configure a guaranteed IOPS figure.
Is the 100KB the size of the result set, or the amount of data needed to form the result set? If you process 4GB of data down to get a 100KB result set, there's a good chance that disk IO is involved. If the amount of data you need to pull is small, repeat the test a few times to ensure data is entirely in RAM.
Finally, if both local and EC2 are pulling data from RAM, there's a good chance that your local CPU core is just faster than the EC2 CPU core, and that your RAM access is faster as well. EC2 is designed to provide low-cost commodity hardware. Developer setups are often much faster.
If you cannot account for the speed differences given the factors above, update your question with time measurements that exclude network latency and provide more detailed specifications about your hardware. Update the question to indicate whether the data you are retrieving from MongoDB should be entirely in RAM, given its size and the amount of RAM on your instance.

mongodb flushing mmap takes around 20 secs with no updates being required

Hi, one of our customers is running MongoDB v2.2.3 on 64-bit Windows Server 2008 R2 Enterprise.
We're currently seeing mmap flush times of over 20 seconds every minute.
What is confusing me is that it isn't doing any writes to the disk (disk write bytes is next to 0).
Our program, which accesses the data, has been temporarily turned off,
so all that is connected is a mongo shell.
mongostat and mongotop aren't showing anything.
The database has 130 million records. There are 356 files for mmap.
Any suggestions on what could be causing this?
Thanks
If your working set is significantly larger than memory, and MongoDB is constantly going to disk for reads (and not just the normal spikes when syncing writes to disk), then you really should be sharding to spread the data across multiple machines/instances.
Given the behaviour you have described and that you have a large number of files for mmap, I suspect the underlying performance issue is SERVER-12401 in the MongoDB Jira issue tracker:
On Windows, Memory Mapped File flushes are synchronous operations. When the OS Virtual Memory Manager is asked to flush a memory mapped file, it makes a synchronous write request to the file cache manager in the OS. This causes large I/O stalls on Windows systems with high Disk IO latency, while on Linux the same writes are asynchronous.
There are a few possible ways to improve the flush performance on Windows, including code changes in both the MongoDB server and the Windows O/S. There is some ongoing work to address these issues, now that the synchronous flushing behaviour on Windows has been confirmed.
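To see what the flushes are actually costing, the server reports its own flush statistics, which can be sampled along these lines (a sketch using pymongo; the backgroundFlushing section is what MMAPv1-era servers such as 2.2 report, the connection string is an assumption, and a legacy driver version may be needed to talk to a server that old):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # assumed connection string
status = client.admin.command("serverStatus")

# MMAPv1-era servers (such as 2.2) report flush timings in this section.
flushing = status.get("backgroundFlushing", {})
print("flushes so far:  ", flushing.get("flushes"))
print("average flush ms:", flushing.get("average_ms"))
print("last flush ms:   ", flushing.get("last_ms"))
```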
If you are using higher latency local storage (for example, spinning disks) you may be able to mitigate the issue by upgrading to SSD or better spec'd drives.
I would suggest upvoting/watching SERVER-12401 and the related Jira issues for updates.
It would also be worth upgrading from MongoDB 2.2 to a newer version as 2.2 is now past end-of-life for updates. There have been two major production release branches since then, including significant improvements in general performance/features as well as Windows support.