Can Geode flush data to a database?

Geode's homepage says: 'Super fast write-ahead-logging (WAL) persistence with a shared-nothing architecture that is optimized for fast parallel recovery of nodes or an entire cluster.'
Does this mean Geode can flush data to a database? If so, what is the mechanism?

Yes, you can use Geode's AsyncEventListener to flush data to the database.
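For illustration, here is a rough write-behind sketch using Geode's Java API: an AsyncEventListener that drains batches of region events into a relational database over JDBC, attached to a region through an async event queue. The JDBC URL, credentials, table schema, and region name are placeholder assumptions, and error handling is minimal; treat it as a sketch rather than a reference implementation.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.List;

import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.cache.RegionShortcut;
import org.apache.geode.cache.asyncqueue.AsyncEvent;
import org.apache.geode.cache.asyncqueue.AsyncEventListener;

public class DbWriteBehindListener implements AsyncEventListener {

    @Override
    public boolean processEvents(List<AsyncEvent> events) {
        // Flush one batch of region events to the database.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://db-host:5432/appdb", "app", "secret");
             PreparedStatement ps = conn.prepareStatement(
                 "INSERT INTO events (key, value) VALUES (?, ?) "
                 + "ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value")) {
            for (AsyncEvent event : events) {
                ps.setString(1, String.valueOf(event.getKey()));
                ps.setString(2, String.valueOf(event.getDeserializedValue()));
                ps.addBatch();
            }
            ps.executeBatch();
            return true;   // batch handled; Geode drops it from the queue
        } catch (Exception e) {
            return false;  // Geode keeps the batch and retries it later
        }
    }

    @Override
    public void close() { }

    public static void main(String[] args) {
        // Wire the listener to a region through an async event queue.
        Cache cache = new CacheFactory().create();
        cache.createAsyncEventQueueFactory()
             .setBatchSize(100)            // flush up to 100 events per batch
             .setBatchTimeInterval(1000)   // ...or at least once per second
             .setPersistent(true)          // queued events survive restarts
             .create("dbQueue", new DbWriteBehindListener());
        cache.createRegionFactory(RegionShortcut.PARTITION)
             .addAsyncEventQueueId("dbQueue")
             .create("sensorData");
    }
}
```

Returning false from processEvents makes Geode retry the batch, and setPersistent(true) keeps queued events on disk across restarts, so flushing to the database is asynchronous but not lossy.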

Related

What is the difference between MongoDB's WiredTiger cache and an in-memory DB?

As I understand it, MongoDB caches the working set in RAM.
If I increase wiredTigerCacheSizeGB enough to hold all of the data on disk, will it work as fast as an in-memory DB?
If not, what is the difference?
See In-Memory Storage Engine and WiredTiger Storage Engine
(In-memory) By avoiding disk I/O, the in-memory storage engine allows for more predictable latency of database operations.
Keep in mind that you are limited to 10000 GB when setting wiredTigerCacheSizeGB. You should also disable journaling and set storage.syncPeriodSecs to 0 in order to increase the performance of WiredTiger. But even then, WiredTiger still has to create at least WiredTiger.wt and WiredTiger.turtle...
PS. I think this link might answer your question
I cannot answer all your questions.
A cache reads data from disk and keeps it in the RAM. When you access such data again then you read it from RAM instead of reading it again from disk - which would be much slower.
So, a cache is useless if you have to read the data only once. Some applications anticipate the data you may read in future and put it into the cache in advance.
The MongoDB in-memory DB puts all data into RAM only; it does not read or write anything from disk, apart from some logging data. When you stop an in-memory MongoDB process, all data is lost.
The WiredTiger storage engine is what MongoDB uses to store data persistently on disk.
If you set wiredTigerCacheSizeGB high enough to hold all of your data, then all of your reads will be satisfied from the cache. Writes will update the cache and also be written to storage.
If you use the in-memory configuration then all of your reads will be satisfied from memory. Writes will only go to memory and will not be stored on disk.
So if your workload is mostly reads, then the large cache will behave similarly to an in-memory DB. If your workload has a lot of writes, then the large cache configuration may be slower because it needs to write to disk.
Also, the in-memory DB will not preserve your data in the event of a crash, since it only holds data in memory.

Apache Ignite with PostgreSQL

Objective: to scale an existing application where PostgreSQL is used as the data store.
How can Apache Ignite help: We have an application with many modules, and all the modules use some shared tables, so we have only one PostgreSQL master database, and it's already on large AWS SSD machines. We already have Redis for caching, but the limitation of Redis, as we know it, is that partial updates and querying on secondary indexes are not easy.
Our use case:
We have two big tables, member and subscription. It's a many-to-many relation: one member subscribes to multiple groups, and we maintain the subscriptions in the subscription table.
The member table has around 40 million rows, and its size is roughly 40M x 1.5 KB + overhead ≈ 60 GB.
Challenge
The challenge is that we can't archive this data, since every member is active and there are frequent updates and reads on this table.
My thought:
From what I read in the documentation, Apache Ignite can provide a caching layer on top of the PostgreSQL tables.
Now, I have a couple of questions from an implementation point of view:
- Will Apache Ignite fit our use case? If yes, then:
- Will Apache Ignite keep all 60 GB of data in RAM, or can we distribute the RAM load across multiple machines?
- For updating the PostgreSQL tables we are using Python and SQLAlchemy (ORM). Will a separate call be needed for Apache Ignite to update the same record in memory, or is there any way for Apache Ignite to sync it immediately from the database?
- Is there enough support for Python?
- Is there REST API support for interacting with Apache Ignite? I would like to avoid an ODBC connection.
- What if this load doubles in the next year?
A quick answer is much appreciated. Thanks in advance.
Yes, it should fit your case.
Apache Ignite has persistence, meaning it can optionally store data on disk, but if you employ it for caching only it will happily keep everything in RAM.
There are two approaches. You can do your updates on Apache Ignite (which will propagate them to PostgreSQL), or you can do your updates on PostgreSQL and have Apache Ignite fetch them on first use (pull from PostgreSQL). The latter only works for new records, as you can imagine. There is no support for propagating data from PostgreSQL to Apache Ignite; I guess you could do something like that using triggers, but it is untested. (A minimal write-through sketch follows this answer.)
There is a third-party Python client; I haven't tried it. Apache Ignite only has built-in native clients for C++/C#/Java for now; other platforms can only connect through JDBC/ODBC/REST and use a fraction of the functionality.
There is a REST API, and it has improved recently.
120 GB doesn't sound like anything scary as far as Apache Ignite is concerned.
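To make the first approach concrete, below is a rough read-through/write-through sketch against the PostgreSQL member table, using Ignite's Java API. The JDBC URL, credentials, table and column names, and the plain String value type are all placeholder assumptions; a real store would map the ORM entities and pool its connections.

```java
import javax.cache.Cache;
import javax.cache.configuration.FactoryBuilder;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.store.CacheStoreAdapter;
import org.apache.ignite.configuration.CacheConfiguration;

/** Read-through/write-through store backed by the PostgreSQL member table. */
public class MemberStore extends CacheStoreAdapter<Long, String> {
    private static final String URL = "jdbc:postgresql://db-host:5432/appdb";

    @Override
    public String load(Long key) {
        // Read-through: called by Ignite on a cache miss.
        try (Connection c = DriverManager.getConnection(URL, "app", "secret");
             PreparedStatement ps = c.prepareStatement(
                 "SELECT payload FROM member WHERE id = ?")) {
            ps.setLong(1, key);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void write(Cache.Entry<? extends Long, ? extends String> entry) {
        // Write-through: every cache update is propagated to PostgreSQL.
        try (Connection c = DriverManager.getConnection(URL, "app", "secret");
             PreparedStatement ps = c.prepareStatement(
                 "INSERT INTO member (id, payload) VALUES (?, ?) "
                 + "ON CONFLICT (id) DO UPDATE SET payload = EXCLUDED.payload")) {
            ps.setLong(1, entry.getKey());
            ps.setString(2, entry.getValue());
            ps.executeUpdate();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void delete(Object key) { /* omitted for brevity */ }

    public static void main(String[] args) {
        CacheConfiguration<Long, String> ccfg = new CacheConfiguration<>("members");
        ccfg.setCacheStoreFactory(FactoryBuilder.factoryOf(MemberStore.class));
        ccfg.setReadThrough(true);   // cache misses load rows from PostgreSQL
        ccfg.setWriteThrough(true);  // cache puts are written to PostgreSQL

        Ignite ignite = Ignition.start();
        IgniteCache<Long, String> members = ignite.getOrCreateCache(ccfg);
        members.put(1L, "{\"name\": \"alice\"}");  // also written to PostgreSQL
    }
}
```

With setWriteThrough(true) every put goes to the database through the store, and with setReadThrough(true) a miss calls load and pulls the row in, which is the "fetch on first use" behaviour described above.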
In addition to alamar's answer:
You can store your data in memory on many machines, since Ignite supports partitioned caches, which are divided into partitions distributed between machines. You can set data collocation and the number of backups. (See the configuration sketch after this answer.)
There is an interesting memory model in Apache Ignite that allows you to persist data on disk quickly. As the Ignite developers have said, a database behind the cluster will be slower than Ignite native persistence, because communication goes through external protocols.
In our company we have a huge Ignite cluster that keeps much more data than that in RAM.
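As a companion sketch (assuming a recent Ignite 2.x), this shows a partitioned cache with one backup copy per partition, plus optional native persistence; the cache name and key/value types are placeholders.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cluster.ClusterState;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class PartitionedCacheExample {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Optional: Ignite native persistence, so RAM holds the hot set
        // and the full data set survives restarts on local disk.
        DataStorageConfiguration storage = new DataStorageConfiguration();
        storage.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);
        cfg.setDataStorageConfiguration(storage);

        // Partitioned cache: the data is split across all server nodes,
        // with one backup copy of every partition on another node.
        CacheConfiguration<Long, String> ccfg = new CacheConfiguration<>("members");
        ccfg.setCacheMode(CacheMode.PARTITIONED);
        ccfg.setBackups(1);
        cfg.setCacheConfiguration(ccfg);

        Ignite ignite = Ignition.start(cfg);
        ignite.cluster().state(ClusterState.ACTIVE); // needed when persistence is on
        ignite.getOrCreateCache("members").put(1L, "{\"name\": \"alice\"}");
    }
}
```

Each additional server node started with the same configuration takes ownership of a share of the partitions, so the 60 GB (or 120 GB) does not have to fit into a single machine's RAM.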

How to modify Kafka's configuration to process large amounts of data

I am using kafka_2.10-0.10.0.1. I have two questions:
- I want to know how I can modify the default configuration of Kafka to process large amounts of data with good performance.
- Is it possible to configure Kafka to process the records in memory without storing in disk?
Thank you.
Is it possible to configure Kafka to process the records in memory without storing in disk?
No. Kafka is all about storing records reliably on disk, and then reading them back quickly off of disk. In fact, its documentation says:
As a result of taking storage seriously and allowing the clients to control their read position, you can think of Kafka as a kind of special purpose distributed filesystem dedicated to high-performance, low-latency commit log storage, replication, and propagation.
You can read more about its design here: https://kafka.apache.org/documentation/#design. The implementation section is also quite interesting: https://kafka.apache.org/documentation/#implementation.
That said, Kafka is also all about processing large amounts of data with good performance. In 2014 it could handle 2 million writes per second on three cheap instances: https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines. More links about performance (a producer-tuning sketch follows these links):
https://docs.confluent.io/current/kafka/deployment.html
https://www.confluent.io/blog/optimizing-apache-kafka-deployment/
https://community.hortonworks.com/articles/80813/kafka-best-practices-1.html
https://www.cloudera.com/documentation/kafka/latest/topics/kafka_performance.html
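As a concrete, hedged example of the client-side knobs those guides discuss, here is a throughput-oriented Java producer for the 0.10.x client. The broker addresses, topic name, and specific values are placeholders to be tuned against your own workload, and acks=1 deliberately trades some durability for latency.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ThroughputTunedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // Throughput-oriented settings: batch more records per request,
        // wait a few milliseconds to fill batches, and compress them.
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 64 * 1024);   // bytes per partition batch
        props.put(ProducerConfig.LINGER_MS_CONFIG, 10);           // wait up to 10 ms to batch
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");
        props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 64L * 1024 * 1024);
        props.put(ProducerConfig.ACKS_CONFIG, "1");  // leader-only acks: faster, less durable

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("sensor-data", "sensor-1", "42.0"));
        }
    }
}
```

Broker-side settings such as partition counts, retention, and replication factors matter at least as much as producer settings; the linked guides cover those.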

Data loss during MongoDB replica set failover

I have a Go client that continually inserts sensor data into a MongoDB replica set 30 times per sec. I ran a failover test on my 3-member set using majority write concern (while still firing the inserts). The failover process took only 2 seconds, but afterward I realized some data samples were missing. They were not in the database or in the old primary's rollback file--completely missing.
In general, how can I assure no data loss (or minimal loss) in MongoDB during failover? I'm new to MongoDB. Are there commercially available MongoDB management modules for this (I have the free edition)? E.g. can MongoDB store incoming data temporarily during failovers and later persist the data to the database? I don't want to resort to handling failovers on the client--I want failovers to be transparent to the client.
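For reference, here is a minimal sketch of the setup the question describes: majority write concern with a timeout, plus driver-level retryable writes, which can make a brief failover transparent to the application (retryable writes require MongoDB 3.6+). It is written with a recent Java driver rather than the question's Go client, and the hostnames, database, and collection names are placeholders.

```java
import com.mongodb.ConnectionString;
import com.mongodb.MongoClientSettings;
import com.mongodb.WriteConcern;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;

import java.util.concurrent.TimeUnit;

import org.bson.Document;

public class SensorWriter {
    public static void main(String[] args) {
        MongoClientSettings settings = MongoClientSettings.builder()
            .applyConnectionString(new ConnectionString(
                "mongodb://host1:27017,host2:27017,host3:27017/?replicaSet=rs0"))
            // Majority write concern with a timeout: a write either reaches a
            // majority of members or fails visibly to the application.
            .writeConcern(WriteConcern.MAJORITY.withWTimeout(5, TimeUnit.SECONDS))
            // Retryable writes let the driver transparently retry a single
            // insert once after a failover, instead of silently dropping it.
            .retryWrites(true)
            .build();

        try (MongoClient client = MongoClients.create(settings)) {
            MongoCollection<Document> samples =
                client.getDatabase("sensors").getCollection("samples");
            samples.insertOne(new Document("sensorId", 1).append("value", 42.0));
            // A write that still fails (e.g. the failover outlasts the retry)
            // throws an exception, so the client can buffer and resend it
            // rather than losing the sample silently.
        }
    }
}
```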

How can I do Memcached HA?

I use memcached to store about 5 MB of data. About forty percent of the data is updated every second, which causes about 280 QPS between the memcached client and server, with gets and sets each accounting for half of the queries. Besides handling this volume of traffic, I also face an HA problem.
Before choosing memcached, I also looked at Redis. But it seems to be single-threaded and not likely to perform well with data persistence. Also, the Redis client does not seem as easy or as mature as memcached's.
But how can I do HA with memcached? How should I keep data replicated between the master and slave memcached servers? And when a memcached server crashes, there is a data-consistency problem. Are there already good tools for memcached HA, or is there a better NoSQL database to use instead of memcached?
Maybe you can use repcached (repcached.lab.klab.org) or magent (https://code.google.com/p/memagent/).
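If running repcached or magent is not possible, a different approach is client-side replication: the application writes each value to two independent memcached servers and falls back on reads. Below is a rough sketch with the spymemcached Java client; the server addresses are placeholders, and unlike repcached this gives no consistency guarantee if one of the two writes fails.

```java
import net.spy.memcached.AddrUtil;
import net.spy.memcached.MemcachedClient;

public class ReplicatedMemcached {
    private final MemcachedClient primary;
    private final MemcachedClient replica;

    public ReplicatedMemcached() throws Exception {
        // Two independent memcached servers (hostnames are placeholders).
        primary = new MemcachedClient(AddrUtil.getAddresses("cache1:11211"));
        replica = new MemcachedClient(AddrUtil.getAddresses("cache2:11211"));
    }

    /** Write to both servers, so losing one server loses no data. */
    public void set(String key, int expirySeconds, Object value) {
        primary.set(key, expirySeconds, value);
        replica.set(key, expirySeconds, value);
    }

    /** Read from the primary, falling back to the replica. */
    public Object get(String key) {
        Object value = null;
        try {
            value = primary.get(key);
        } catch (Exception ignored) {
            // Primary unreachable or timed out; fall through to the replica.
        }
        return value != null ? value : replica.get(key);
    }

    public void shutdown() {
        primary.shutdown();
        replica.shutdown();
    }
}
```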