How to fine-tune ELK - elastic-stack

We are running out of heap memory and are also seeing instability issues in our ELK stack; the configuration from the screenshot is below.
-Version: 6.2.4
-Number of nodes: 5
-Data nodes: 3
-Indices: 6,138
-Documents: 3,840,550,046
-Primary shards: 14,934
-Replica shards: 14,934
-Disk available: 25.98% (1 TB / 5 TB)
-JVM heap: 62.045% (46 GB / 74 GB)
I know that I have to reduce the number of shards. The data we are holding goes back to Jan 2019, although the 2019 data is in a closed state.
I need help understanding how I can do the following:
1- Re-index to reduce the number of shards of the old indices.
2- Download old indices, keep them in an archive, and later re-use them if and when required.
3- We currently rotate indices daily; how do we change to weekly/monthly indices, and how will it help?
Looking forward to some guidance, as ELK is new to me and I am stuck on this.
Thanks,
Abhishek

1- Re-index to reduce the number of shards of the old indices.
A reindex is rather expensive; if you have more than one primary shard per index (on 6.x the default was 5), I'd start with a _shrink, which is much cheaper.
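A minimal sketch of a shrink using Python and requests; the index name, target name, node name, and cluster address are assumptions, and on 6.x the index first has to be made read-only and relocated onto a single node:

import requests

ES = "http://localhost:9200"  # assumed cluster address

# 1) Block writes and relocate every shard of the old index onto one node.
requests.put(ES + "/logstash-2019.01.01/_settings", json={
    "settings": {
        "index.routing.allocation.require._name": "data-node-1",
        "index.blocks.write": True
    }
})

# 2) Shrink into a new index with a single primary shard.
requests.post(ES + "/logstash-2019.01.01/_shrink/logstash-2019.01.01-shrunk", json={
    "settings": {"index.number_of_shards": 1, "index.number_of_replicas": 1}
})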
2- Download old indices, keep them in an archive, and later re-use them if and when required.
Sounds like snapshot and restore, but that will be a slow and tedious approach.
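If you do go that route, a rough sketch with Python and requests; the repository type, path, and index pattern are assumptions, and the location has to be whitelisted via path.repo on every node (indices must be open while they are being snapshotted):

import requests

ES = "http://localhost:9200"  # assumed cluster address

# Register a shared-filesystem snapshot repository.
requests.put(ES + "/_snapshot/archive_repo", json={
    "type": "fs",
    "settings": {"location": "/mnt/es_backups"}
})

# Snapshot the old 2019 indices; once the snapshot succeeds they can be
# deleted from the cluster to free heap and disk.
requests.put(ES + "/_snapshot/archive_repo/snapshot_2019?wait_for_completion=true",
             json={"indices": "logstash-2019.*"})

# Later, restore on demand.
requests.post(ES + "/_snapshot/archive_repo/snapshot_2019/_restore")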
3- We currently rotate indices daily; how do we change to weekly/monthly indices, and how will it help?
The better approach would be a rollover, which offers more flexibility and also allows you to create evenly sized indices / shards. Our default for the Beats in 7.x is 50GB for a single-shard index.
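A rough sketch of the rollover pattern with Python and requests; the alias name, index name, and exact conditions are assumptions:

import requests

ES = "http://localhost:9200"  # assumed cluster address

# Bootstrap the first index with a write alias.
requests.put(ES + "/logs-000001", json={"aliases": {"logs_write": {}}})

# Call this periodically (e.g. from cron); a new index is only created once a
# condition is met, so indices end up evenly sized instead of strictly daily.
requests.post(ES + "/logs_write/_rollover", json={
    "conditions": {"max_age": "7d", "max_size": "50gb"}
})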
Generally, your Elasticsearch version is very old. There are a lot of performance and stability improvements in the current 7.10 release, as well as features like Index Lifecycle Management, which would be the solution for this kind of problem.
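On 7.x an ILM policy can drive both the rollover and the eventual cleanup; a hedged example via Python and requests, with the policy name and thresholds as assumptions:

import requests

ES = "http://localhost:9200"  # assumed cluster address

# Roll over at 50 GB or 7 days, and delete indices 90 days after rollover.
requests.put(ES + "/_ilm/policy/logs_policy", json={
    "policy": {
        "phases": {
            "hot": {"actions": {"rollover": {"max_size": "50gb", "max_age": "7d"}}},
            "delete": {"min_age": "90d", "actions": {"delete": {}}}
        }
    }
})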
Some additional notes:
Why are only 3 of 5 nodes data nodes?
As you said, the number of indices and shards is way too high.
The heap size should only go up to ~30GB, so that Java can use compressed pointers.

Related

What are the settings to look out for with Citus PostgreSQL?

We are looking into using CitusDB. After reading all the documentation we are not clear on some fundamentals. Hoping somebody can give some directions.
In Citus you specify a shard_count and a shard_max_size; according to the docs these settings are set on the coordinator (but weirdly they can also be set on a node).
What happens when you specify 1000 shards and distribute 10 tables with 100 clients?
Does it create a shard for every table (users_1, users_2, shops_1, etc.), effectively using all 1000 shards?
If we then grow by another 100 clients, we have already hit the 1000 limit; how are these tables partitioned?
The shard_max_size defaults to 1 GB. If a shard grows beyond 1 GB a new shard is created, but what happens when the shard_count is already hit?
Lastly, is it advisable to go for 3000 shards? We read in the docs that 128 is advised for a SaaS, but this seems low if you have 100 clients * 10 tables. (I know it depends.. but..)
Former Citus/current Microsoft employee here, chiming in with some advice.
Citus shards are based on integer hash ranges of the distribution key. When a row is inserted, the value of the distribution key is hashed, the planner looks up what shard was assigned the range of hash values that that key falls into, then looks up what worker the shard lives on, and then runs the insert on that worker. This means that the customers are divided up across shards in a roughly even fashion, and when you add a new customer it'll just go into an existing shard.
It is critically important that all distributed tables that you wish to join to each other have the same number of shards and that their distribution columns have the same type. This lets us perform joins entirely on workers, which is awesome for performance.
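For illustration, a sketch of how the tables from the question could be distributed so that they stay co-located; the table names, the tenant_id column, and the connection string are assumptions, and the statements run against the coordinator via psycopg2:

import psycopg2

conn = psycopg2.connect("dbname=app host=coordinator.example.com")  # assumed DSN
cur = conn.cursor()

# Same shard count and same distribution column type for every table that
# will be joined, so joins can run entirely on the workers.
# (Assumes the users and shops tables already exist with a tenant_id column.)
cur.execute("SET citus.shard_count = 128;")
cur.execute("SELECT create_distributed_table('users', 'tenant_id');")
cur.execute("SELECT create_distributed_table('shops', 'tenant_id');")
conn.commit()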
If you've got a super big customer (100x as much data as your average customer is a decent heuristic), I'd use the tenant isolation features in advance to give them their own shard. This will make moving them to dedicated hardware much easier if you decide to do so down the road.
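A sketch of isolating such a tenant, assuming a hypothetical tenant id 42 on the users table from the previous snippet; the cascade option pulls co-located tables along:

import psycopg2

conn = psycopg2.connect("dbname=app host=coordinator.example.com")  # assumed DSN
cur = conn.cursor()

# Give tenant 42 its own shard, so it can later be moved to dedicated
# hardware on its own.
cur.execute("SELECT isolate_tenant_to_new_shard('users', 42, 'CASCADE');")
conn.commit()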
The shard_max_size setting has no effect on hash distributed tables. Shards will grow without limit as you keep inserting data, and hash-distributed tables will never increase their shard count under normal operations. This setting only applies to append distribution, which is pretty rarely used these days (I can think of one or two companies using it, but that's about it).
I'd strongly advise against changing the citus.shard_count to 3000 for the use case you've described. 64 or 128 is probably correct, and I'd consider 256 if you're looking at >100TB of data. It's perfectly fine if you end up having thousands of distributed tables and each one has 128 shards, but it's better to keep the number of shards per table reasonable.

Sharding key, chunkSize and pre-splitting

I have set up a sharded cluster on a single machine, following the steps mentioned here:
http://www.mongodb.org/display/DOCS/A+Sample+Configuration+Session
But I don't understand the '--chunkSize' option:
$ ./mongos --configdb localhost:20000 --chunkSize 1 > /tmp/mongos.log &
With N shards, each shard is supposed to hold 1/N of the documents, dividing the shard key's range into N almost equal parts, right? This automatically fixes the chunk size / shard size. Which chunk is the above command then dealing with?
Also, there is provision to split a collection manually at a specific value of the key and then migrate a chunk to any other shard you want. This can be done manually and is even handled automatically by a 'balancer'. Doesn't this clash with the sharding settings and confuse the config servers, or are they informed about any such movement immediately?
Thanks for any help.
You might be confusing a few things. The --chunkSize parameter sets the chunk size used when doing splits. Have a look at the document with _id "chunksize" in the "settings" collection of the "config" database to see the current value, if set. The --chunkSize option will only set this value, or make changes to the system, if there is no value set already; otherwise it will be ignored.
The chunk size is the size in megabytes that the system tries to keep each chunk under. This is enforced in two places: 1) when writes pass through the mongos instances and 2) prior to moving chunks to another shard during balancing. As such it does not follow from a "data size / shard count" formula. Your example of 1 MB per chunk is almost always a bad idea.
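For example, you could check the current value from the config database; a minimal pymongo sketch, assuming a mongos is reachable on localhost:27017:

from pymongo import MongoClient

client = MongoClient("localhost", 27017)  # connect to a mongos router

# The chunk size (in MB) lives in config.settings; if no document exists,
# the default (usually 64 MB) applies.
print(client.config.settings.find_one({"_id": "chunksize"}))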
You can indeed split and move chunks manually and although that might result in a less than ideal chunk distribution it will never confuse or break the config meta data and the balancer. The reason is relatively straightforward; the balancer uses the same commands and follows the same code paths. From MongoDB's perspective there is no significant difference between a balancer process splitting and moving chunks and you doing it.
There are a few valid use cases for manually splitting and moving chunks, though. For example, you might want to do it to prepare a cluster for very high peak loads from a cold start -- pre-splitting. Typically you will write a script to do this, or load splits from a performance test which already worked well. Also, you may watch for hot chunks and split/move those chunks to distribute them more evenly based on "load" as monitored from your application.
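A rough pymongo sketch of what such a pre-splitting script might look like; the namespace mydb.users, the user_id shard key, the split points, and the shard name are all assumptions for illustration:

from pymongo import MongoClient

client = MongoClient("localhost", 27017)  # connect to a mongos router

# Split an (ideally still empty) collection at chosen shard-key values...
split_points = [{"user_id": v} for v in (250000, 500000, 750000)]
for point in split_points:
    client.admin.command("split", "mydb.users", middle=point)

# ...then spread the resulting chunks across shards before peak load arrives.
client.admin.command("moveChunk", "mydb.users",
                     find={"user_id": 500000}, to="shard0001")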
Hope that helps.
Great, thanks! I think I get it now. Correct me if I'm wrong: I was thinking that if there are N servers, then the first 1/Nth part of the collection (= chunk 1) will go to shard 1, the second 1/Nth (= chunk 2) will go to shard 2, and so on. When you said that there is no such "formula", I searched a bit more and found these links: "MongoDB sharding, how does it rebalance when adding new nodes?" and "How to define sharding range for each shard in Mongo?".
From the definition of "chunk" in the documentation, I think it is to be thought of as merely a unit of data migration. When we shard a collection among N servers, the total number of chunks is not necessarily N, and they need not be of equal size either. The maximum size of one chunk is either already set as a default (usually 64MB) in the settings collection of the config database, or can be set manually by specifying a value using the --chunkSize parameter as shown in the above code. Depending on the values of the shard key, one shard may have more chunks than another. But MongoDB uses a balancer process that tries to distribute these chunks evenly among the shards. By even distribution, I mean it tends to split chunks and migrate them to other shards if they grow bigger than their limit or if one particular shard is getting heavily loaded. This can be done manually as well, by following the same set of commands that the balancer process uses.
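If you want to see that distribution for yourself, a small pymongo sketch that counts chunks per shard from the config database (host and port are assumptions):

from pymongo import MongoClient

client = MongoClient("localhost", 27017)  # connect to a mongos router

# Count how many chunks of each sharded collection live on each shard.
pipeline = [{"$group": {"_id": {"ns": "$ns", "shard": "$shard"},
                        "chunks": {"$sum": 1}}}]
for row in client.config.chunks.aggregate(pipeline):
    print(row)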

MongoDB Insert performance - Huge table with a couple of Indexes

I am testing Mongo DB to be used in a database with a huge table of about 30 billion records of about 200 bytes each. I understand that sharding is needed for that kind of volume, so I am trying to get 1 to 2 billion records on one machine. I have reached 1 billion records on a machine with 2 CPUs (6 cores each) and 64 GB of RAM. I imported the data with mongoimport without indexes, and speed was okay (an average of 14k records/s). I added indexes, which took a very long time, but that is okay as it is a one-time thing. Now inserting new records into the database is taking a very long time. As far as I can tell, the machine is not loaded while inserting records (CPU, RAM, and I/O are in good shape). How is it possible to speed up inserting new records?
I would recommend adding this host to MMS (http://mms.10gen.com/help/overview.html#installation) - make sure you install with munin-node support and that will give you the most information. This will allow you to track what might be slowing you down. Sorry I can't be more specific in the answer, but there are many, many possible explanations here. Some general points:
Adding indexes means that the indexes as well as your working data set will need to be in RAM now; this may have strained your resources (look for page faults)
Now that you have indexes, they must be updated when you are inserting - if everything fits in RAM this should be OK, see first point
You should also check your disk I/O to see how that is performing - how does your background flush average look? (See the snippet after these points for one way to check both.)
Are you running the correct filesystem (XFS, ext4) and a kernel version later than 2.6.25? (earlier versions have issues with fallocate())
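A hedged pymongo sketch for keeping an eye on those two numbers while the import runs; the field names follow the serverStatus output of MMAPv1-era servers, so adjust if your build differs:

from pymongo import MongoClient

client = MongoClient("localhost", 27017)

status = client.admin.command("serverStatus")

# Page faults climbing quickly suggests the working set no longer fits in RAM.
print("page faults:", status.get("extra_info", {}).get("page_faults"))

# Background flush average (MMAPv1) shows how long disk syncs are taking.
if "backgroundFlushing" in status:
    print("avg flush ms:", status["backgroundFlushing"]["average_ms"])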
Some good general information for follow up can be found here:
http://www.mongodb.org/display/DOCS/Production+Notes

What is the largest known Neo4j cluster?

What is the largest known Neo4j cluster (in db size, graph stats, or # of machines)?
The # of nodes and relationships was recently (with the 1.3 release) expanded to 32 billion each and another 64 billion for properties. If you look at the mailing list, there have been recent inquiries for quite large datastores.
As an approach to an answer you might want to check out this interview with Emil Eifrem (neo's founder): http://www.infoq.com/interviews/eifrem-graphdbs. In particular, check out the part on "From a data complexity perspective, how does Neo4j help remove some of the implementation complexity in storing your data?": "hundreds of millions is probably a large one. And billions that's definitely a large one."
I was in conversation with neo technologies recently, in which they shared that the largest installations they know of machine-wise do not have more than 3-5 machines.
Also, they said that the size of the graph neo4j can efficiently handle is dependent on the number of nodes and edges in the graph. If they can all be kept in memory, most queries will be fast. You find the sizes for nodes and edges in memory at http://wiki.neo4j.org/content/Configuration_Settings (it's 9 bytes per node and 33 bytes per relationship).

Cassandra random read speed

We're still evaluating Cassandra for our data store. As a very simple test, I inserted a value for 4 columns into the Keyspace1/Standard1 column family on my local machine amounting to about 100 bytes of data. Then I read it back as fast as I could by row key. I can read it back at 160,000/second. Great.
Then I put in a million similar records all with keys in the form of X.Y where X in (1..10) and Y in (1..100,000) and I queried for a random record. Performance fell to 26,000 queries per second. This is still well above the number of queries we need to support (about 1,500/sec)
Finally I put ten million records in from 1.1 up through 10.1000000 and randomly queried for one of the 10 million records. Performance is abysmal at 60 queries per second and my disk is thrashing around like crazy.
I also verified that if I ask for a subset of the data, say the 1,000 records between 3,000,000 and 3,001,000, it returns slowly at first and then as they cache, it speeds right up to 20,000 queries per second and my disk stops going crazy.
I've read all over that people are storing billions of records in Cassandra and fetching them at 5-6k per second, but I can't get anywhere near that with only 10mil records. Any idea what I'm doing wrong? Is there some setting I need to change from the defaults? I'm on an overclocked Core i7 box with 6gigs of ram so I don't think it's the machine.
Here's my code to fetch records; I'm running it in 8 threads, each asking for one value from one column via row key (the Thrift client, UTF-8 encoding, and random generator are initialized elsewhere):
// Read a single column ("site") from the Standard1 column family by row key.
ColumnPath cp = new ColumnPath();
cp.Column_family = "Standard1";
cp.Column = utf8Encoding.GetBytes("site");
// Pick a random key of the form X.Y with X in 1..10 and Y in 1..1,000,000.
string key = (1 + sRand.Next(10)) + "." + (1 + sRand.Next(1000000));
// Single-column read at consistency level ONE via the Thrift API.
ColumnOrSuperColumn logline = client.get("Keyspace1", key, cp, ConsistencyLevel.ONE);
Thanks for any insights
Purely random reads are about the worst-case behavior for the caching that your OS (and Cassandra, if you set up the key or row cache) tries to do.
If you look at contrib/py_stress in the Cassandra source distribution, it has a configurable stdev to perform random reads with some keys hotter than others. This will be more representative of most real-world workloads.
Add more Cassandra nodes and give them lots of memory (-Xms / -Xmx). The more Cassandra instances you have, the more the data will be partitioned across the nodes, making it much more likely to be in memory or more easily accessed from disk. You'll be very limited trying to scale with a single workstation-class CPU. Also, check the default -Xms/-Xmx settings; I think the default is 1GB.
It looks like you haven't got enough RAM to store all the records in memory.
If you swap to disk then you are in trouble, and performance is expected to drop significantly, especially if you are random reading.
You could also try benchmarking some other popular alternatives, like Redis or VoltDB.
VoltDB can certainly handle this level of read performance as well as writes and operates using a cluster of servers. As an in-memory solution you need to build a large enough cluster to hold all of your data in RAM.