Is there a practical limit to an nftables named IP set size? - nftables

Is there a practical limit to the size of sets in nftables? Would an ipv4_addr range set with 100,000 entries work? How would it perform?
Thanks
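Not an authoritative answer, but one practical way to find out for your own kernel and nftables version is to generate a large set and time loading it. Below is a minimal Java sketch (the file name, table name and set name are made up) that writes a ruleset with 100,000 ipv4_addr elements; you could then run something like nft -f bigset.nft and observe load time and memory usage.

```java
import java.io.IOException;
import java.io.PrintWriter;

// Writes bigset.nft containing a named set with 100,000 IPv4 addresses,
// so the load time of a large set can be measured with: nft -f bigset.nft
public class NftSetGenerator {
    public static void main(String[] args) throws IOException {
        int n = 100_000;
        try (PrintWriter out = new PrintWriter("bigset.nft")) {
            out.println("table inet benchmark {");
            out.println("  set bigset {");
            out.println("    type ipv4_addr");
            // For interval/range sets, add a "flags interval" line here and
            // emit prefixes or ranges instead of single addresses.
            out.println("    elements = {");
            for (int i = 0; i < n; i++) {
                // Spread the entries across 10.0.0.0/8.
                String ip = "10." + ((i >> 16) & 0xff) + "." + ((i >> 8) & 0xff) + "." + (i & 0xff);
                out.println("      " + ip + (i < n - 1 ? "," : ""));
            }
            out.println("    }");
            out.println("  }");
            out.println("}");
        }
    }
}
```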

Related

caffeine cache - Limit size to x MB

Is it possible to limit the Caffeine cache size to x MB?
If this is not possible, is there any performance reason for not doing it? [Just for my understanding.]
I checked the weigher option. As per my understanding, weight is calculated based on the number of times a key was accessed. If I understand weight wrongly, please guide me to the right link.
My use case is to set the cache size to x MB and evict once we reach x MB.
References I used:
https://github.com/ben-manes/caffeine/wiki/Eviction
https://www.tabnine.com/code/java/methods/com.github.benmanes.caffeine.cache.Caffeine/maximumWeight
weight-based eviction: https://www.baeldung.com/java-caching-caffeine
Thanks
Jk
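In case it helps: as far as I can tell from the eviction wiki linked above, weight is not an access count and Caffeine does not measure actual JVM memory. What you can do is combine maximumWeight with a custom Weigher that returns your own estimate of each entry's size in bytes, which approximates an x-MB budget. A minimal sketch, assuming String keys and byte[] values:

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

public class ByteBudgetCache {
    public static void main(String[] args) {
        // Budget of roughly 100 MB, expressed as a total weight in bytes.
        long maxBytes = 100L * 1024 * 1024;

        Cache<String, byte[]> cache = Caffeine.newBuilder()
                .maximumWeight(maxBytes)
                // Weigh each entry by an estimate of its size in bytes.
                .weigher((String key, byte[] value) -> key.length() + value.length)
                .build();

        cache.put("example-key", new byte[1024]);
        cache.cleanUp();
        System.out.println("entries currently cached: " + cache.estimatedSize());
    }
}
```

Eviction kicks in once the summed weights approach the budget, so the result is only as accurate as the size estimate your weigher produces.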

How to calculate the minimum cluster size for FAT10 with a 1 GB disk?

Let's say we have a FAT10 file system and a disk size of 1 GB.
I'd like to know how I can calculate the minimum size of a cluster.
My current approach looks like this: FAT10 means we have 2^10 clusters. Since the disk size is 1 GB, which equals 2^30 bytes, we have 2^(30-10) = 2^20 bytes for each cluster.
Does that mean the minimum cluster size is 2^20 bytes?
I hope this is the correct place to ask, otherwise tell me and I will delete this question! :c
It really depends on what your goals are.
Technically, the minimum cluster size is going to be 1 sector. However, this means that the vast majority of the 1 GB will likely not be accessible by the FAT10 system.
If you want to be able to access almost the whole 1 GB disk with FAT10, then your calculation serves as a reasonable approximation. In fact, due to practical constraints, you're probably not going to do much better unless you decide to start making some more unorthodox decisions (I would argue that using a FAT10 system on a 1 GB drive is already unorthodox).
Here are some of the things you will need to know; a rough worked calculation follows this list.
How many of the theoretical 1024 FAT values are usable? Remember, some have special meaning such as "cluster available", "end of cluster chain", "bad block (if applicable)" or "reserved value (if applicable)"
Does your on-disk FAT10 table reserve its space by count of sectors or count of clusters?
What is your sector size?
Are there any extra reserved sectors or clusters?
Is your data section going to be sector or cluster aligned?
Are you going to limit your cluster size to a power of two?
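As a rough illustration of how those questions interact, here is a small sketch; the 512-byte sector size and the number of reserved FAT values are assumptions, not FAT10 facts. It computes the smallest cluster size that lets the usable clusters cover the whole disk, then rounds it up to a power-of-two number of sectors, touching the reserved-value, sector-size, and power-of-two questions from the list.

```java
public class Fat10ClusterSize {
    public static void main(String[] args) {
        long diskBytes      = 1L << 30;   // 1 GB
        int  totalFatValues = 1 << 10;    // FAT10 -> 1024 possible FAT values
        int  reservedValues = 8;          // assumed count of special values (free, end of chain, bad, ...)
        int  usableClusters = totalFatValues - reservedValues;
        int  sectorSize     = 512;        // assumed sector size

        // Smallest cluster size that still lets the usable clusters cover the whole disk.
        long rawClusterSize = (diskBytes + usableClusters - 1) / usableClusters; // ceiling division

        // Round up to a power-of-two multiple of the sector size.
        long clusterSize = sectorSize;
        while (clusterSize < rawClusterSize) {
            clusterSize *= 2;
        }

        System.out.println("raw minimum:  " + rawClusterSize + " bytes");
        System.out.println("rounded up:   " + clusterSize + " bytes ("
                + clusterSize / sectorSize + " sectors)");
    }
}
```

With these assumed numbers the power-of-two rounding jumps past the 2^20-byte estimate to 2^21 bytes; if you instead accept that the last few megabytes of the disk stay unreachable, 2^20 bytes remains a perfectly reasonable answer.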

maximum number of classes for ColumnDataClassifier

Is there a limit on the maximum number of classes I can have when using ColumnDataClassifier? I have a set of addresses that I want to assign to 10k orgs, but I kept running into memory issues even after I set the -Xmx value to the maximum.
There isn't an explicit limit on the size of the label set, but 10k is an extremely large set, and I am not surprised you are having memory issues. You should try some experiments with substantially smaller label sets (~100 labels) and see if your issues go away. I don't know how many labels will practically work, but I doubt it's anywhere near 10,000. I would try much smaller sets just to understand how the memory usage grows as the label set size grows.
You may have to have a hierarchy of labels and different classifiers. You could imagine the first label being "California-organization", and then having a second classifier to select among the various California organizations, etc.
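If you go the hierarchical route, a rough sketch of the two-stage idea could look like the following. It assumes the programmatic usage pattern from the ColumnDataClassifier javadoc; the .prop and .tsv file names and the label names are placeholders.

```java
import edu.stanford.nlp.classify.Classifier;
import edu.stanford.nlp.classify.ColumnDataClassifier;
import edu.stanford.nlp.ling.Datum;

public class TwoStageClassifier {
    public static void main(String[] args) {
        // Stage 1: coarse classifier over a small label set such as regions.
        ColumnDataClassifier coarseCdc = new ColumnDataClassifier("region.prop");
        Classifier<String, String> coarse =
                coarseCdc.makeClassifier(coarseCdc.readTrainingExamples("region.train.tsv"));

        // Stage 2: one fine-grained classifier per region, each with far fewer labels.
        ColumnDataClassifier caCdc = new ColumnDataClassifier("california-orgs.prop");
        Classifier<String, String> californiaOrgs =
                caCdc.makeClassifier(caCdc.readTrainingExamples("california-orgs.train.tsv"));

        // Classify one tab-separated test line (empty gold-class column, then the address).
        String line = "\t123 Main St, San Jose, CA";
        String region = coarse.classOf(coarseCdc.makeDatumFromLine(line));
        if ("California-organization".equals(region)) {
            System.out.println(californiaOrgs.classOf(caCdc.makeDatumFromLine(line)));
        }
    }
}
```

Each second-stage classifier only has to hold weights for its own region's organizations, so no single model carries all 10k labels at once.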

What is the max size of a collection in MongoDB?

I would like to know the max size of a collection in MongoDB.
The MongoDB limitations documentation mentions that a single MMAPv1 database has a maximum size of 32TB.
Does this mean the max size of a collection is 32TB?
If I want to store more than 32TB in one collection, what is the solution?
There are theoretical limits, as I will show below, but even the lower bound is pretty high. It is not easy to calculate the limits correctly, but the order of magnitude should be sufficient.
MMAPv1
The actual limit depends on a few things like the length of shard names and the like (that adds up if you have a couple of hundred thousand of them), but here is a rough calculation with real-life data.
Each shard needs some space in the config db, which, like any other database, is limited to 32TB on a single machine or in a replica set. On the servers I administrate, the average size of an entry in config.shards is 112 bytes. Furthermore, each chunk needs about 250 bytes of metadata information. Let us assume optimal chunk sizes of close to 64MB.
We can have at most 500,000 chunks per shard (32TB / 64MB). 500,000 * 250 bytes equals 125MB for the chunk information per shard. Adding the 112-byte shard entry, we have about 125.000112 MB per shard if we max everything out. Dividing 32TB by that value shows that we can have a maximum of slightly under 256,000 shards in a cluster.
Each shard in turn can hold 32TB worth of data. 256,000 * 32TB is about 8.192 exabytes, or 8,192,000 terabytes. That would be the limit for our example.
Let's call it 8 exabytes. As of now, this easily translates to "enough for all practical purposes". To give you an impression: all the data held by the Library of Congress (arguably one of the biggest libraries in the world in terms of collection size) is estimated at around 20TB, including audio, video, and digital materials. You could fit that into our theoretical MongoDB cluster some 400,000 times. Note that this is the lower bound of the maximum size, using conservative values.
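If you want to redo the arithmetic above with your own numbers, here is a small sketch using the same values as this answer (112-byte config.shards entries, 250 bytes of chunk metadata, 64MB chunks, and the 32TB limit, all in decimal units); these are observed or assumed values from this answer, not official MongoDB limits.

```java
public class ShardedCapacityEstimate {
    public static void main(String[] args) {
        // Values taken from the answer above (decimal units, not official limits).
        long shardEntryBytes = 112L;                // avg size of a config.shards entry
        long chunkMetaBytes  = 250L;                // metadata per chunk
        long chunkSizeBytes  = 64_000_000L;         // ~64 MB chunks
        long maxShardBytes   = 32_000_000_000_000L; // 32 TB per shard / per config db

        long chunksPerShard = maxShardBytes / chunkSizeBytes;                    // 500,000
        long metaPerShard   = chunksPerShard * chunkMetaBytes + shardEntryBytes; // ~125 MB
        long maxShards      = maxShardBytes / metaPerShard;                      // just under 256,000
        double maxDataEB    = maxShards * 32.0 / 1_000_000;                      // in exabytes

        System.out.printf("chunks per shard: %,d%n", chunksPerShard);
        System.out.printf("config metadata per shard: %,d bytes%n", metaPerShard);
        System.out.printf("max shards: %,d  ->  max data: ~%.2f EB%n", maxShards, maxDataEB);
    }
}
```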
WiredTiger
Now for the good part: the WiredTiger storage engine does not have this limitation. The database size is not limited (since there is no limit on how many data files can be used), so we can have an unlimited number of shards. Even when those shards run on MMAPv1 and only our config servers on WiredTiger, the size of a cluster becomes nearly unlimited – the limitation of 16.8M TB of RAM on a 64-bit system might cause problems somewhere and cause the indices of the config.shards collection to be swapped to disk, stalling the system. I can only guess, since my calculator refuses to work with numbers in that area (and I am too lazy to do it by hand), but I estimate the limit here to be in the two-digit yottabyte area (and the space needed to host it somewhere the size of Texas).
Conclusion
Do not worry about the maximum data size in a sharded environment. No matter what, it is more than enough, even with the most conservative approach. Use sharding, and you are done. Btw: even 32TB is a hell of a lot of data: most clusters I know hold less data than that and shard because IOPS and RAM utilization exceeded a single node's capacity.

How much will used disk space increase when doing Mongo indexing?

Let's say I have 1GB of data in the format {_id: ObjectId(), expiration: ..., value: ...}, around 1,000,000 records.
If I create an index on a non-existing field, e.g. "key" ( db.mytest.ensureIndex({key:1}) ), then in general, how much disk space will it add?
If I index on the expiration date, how much will it add?
The size of the index depends upon the number of documents being indexed and the key size. You should be able to estimate the approximate size of the index on expiration date by multiplying the number of documents by the key size (dates in Mongo are 8 bytes). But as Corey points out, testing is the best way to find out.
If you create an index on a non-existing field, you should not see the index size increase since there is nothing to add to the B-tree.
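As a back-of-the-envelope check on the expiration-date case from the question (1,000,000 documents, 8-byte date keys), the keys alone come to about 8 MB. The real index will be larger, since each entry also carries a pointer back to the document plus B-tree overhead, so treat this as a lower bound and verify with db.collection.stats() on a test copy, as suggested above.

```java
public class IndexSizeEstimate {
    public static void main(String[] args) {
        long documents    = 1_000_000L;
        long dateKeyBytes = 8L;  // BSON dates are 8 bytes
        // Keys only; the actual index adds per-entry record pointers and B-tree overhead.
        long lowerBoundBytes = documents * dateKeyBytes;
        System.out.printf("lower bound for the expiration index: ~%.1f MB%n",
                lowerBoundBytes / 1_000_000.0);
    }
}
```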