Caffeine cache - limit size to x MB

Is it possible to limit the Caffeine cache size to x MB?
If this is not possible, is there a performance reason for not doing it? (Just for my understanding.)
I checked the weigher. As per my understanding, the weight is calculated based on the number of times a key was accessed. If I understand weight wrongly, please guide me to the right link.
My use case is to cap the cache at x MB and evict entries once we reach x MB.
References I used:
https://github.com/ben-manes/caffeine/wiki/Eviction
https://www.tabnine.com/code/java/methods/com.github.benmanes.caffeine.cache.Caffeine/maximumWeight
Weight-based eviction: https://www.baeldung.com/java-caching-caffeine
Thanks
Jk

Related

yolov4.cfg: increasing subdivisions parameter consequences

I'm trying to train a custom dataset using the Darknet framework and YOLOv4. I built my own dataset, but I get an out-of-memory message in Google Colab. It also said "try to change subdivisions to 64" or something like that.
I've searched around for the meaning of the main .cfg parameters such as batch, subdivisions, etc., and I understand that increasing the subdivisions number means splitting the batch into smaller "pieces" before processing, thus avoiding the fatal "CUDA out of memory". And indeed switching to 64 worked well. Now I couldn't find anywhere the answer to the ultimate question: are the final weights file and accuracy "crippled" by doing this? More specifically, what are the consequences for the final result? If we put aside the training time (which would surely increase since there are more subdivisions to train), what happens to the accuracy?
In other words: if we use exactly the same dataset and train using 8 subdivisions, then do the same using 64 subdivisions, will the best_weight file be the same? And will the object detection success % be the same or worse?
Thank you.
First, read the comments.
Suppose you have 100 batches.
batch size = 64
subdivisions = 8
Darknet divides each batch into 64/8 = 8 mini-batches of 8 images each.
It then loads and processes those 8 parts one by one in RAM; because of low RAM capacity, you can change the parameter according to your RAM capacity.
You can also reduce the batch size, so it takes less space in RAM.
It does nothing to the dataset images.
It just splits a batch that is too large to load into RAM into smaller pieces; the weights are still updated once per full batch.
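As a rough sketch of that arithmetic, using the same illustrative numbers as above:

# Sketch of the batch/subdivisions arithmetic (illustrative numbers from the answer above).
batch = 64                            # images contributing to one weight update
subdivisions = 8                      # how many pieces each batch is split into
mini_batch = batch // subdivisions    # images loaded per forward/backward pass
print(mini_batch)                     # 8 -> only 8 images must fit in memory at a time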

How to calculate the minimum cluster size for FAT10 with a 1 GB disk?

Let's say we have a file system with FAT10 and a disk size of 1 GB.
I'd like to know how I can calculate the minimum size of a cluster.
My current approach looks like this: FAT10 means we have 2^10 clusters. Since the disk size is 1 GB, which equals 2^30 bytes, we have 2^(30-10) = 2^20 bytes for each cluster.
Does that mean the minimum cluster size is 2^20 bytes?
I hope this is the correct place to ask, otherwise tell me and I will delete this question! :c
It really depends on what your goals are.
Technically, the minimum cluster size is going to be 1 sector. However, this means that the vast majority of the 1 GB will not likely be accessible by the FAT10 system.
If you want to be able to access almost the whole 1 GB disk with the FAT10, then your calculation serves as a reasonable approximation. In fact due to practical constraints, you're probably not going to get much better unless you decide to start making some more unorthodox decisions (I would argue using a FAT10 system on a 1 GB drive is already unorthodox).
Here are some of the things you will need to know.
How many of the theoretical 1024 FAT values are usable? Remember, some have special meaning, such as "cluster available", "end of cluster chain", "bad block (if applicable)", or "reserved value (if applicable)".
Does your on-disk FAT10 table reserve its space by count of sectors or count of clusters?
What is your sector size?
Are there any extra reserved sectors or clusters?
Is your data section going to be sector or cluster aligned?
Are you going to limit your cluster size to a power of two?
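A small worked sketch of both readings of "minimum" (the 512-byte sector size is an assumption; the other numbers come from the question):

# Worked sketch of the estimate; the 512-byte sector size is assumed, not given.
disk_size = 2**30          # 1 GB
fat_entries = 2**10        # FAT10: at most 1024 cluster numbers, fewer usable in practice
sector_size = 512          # assumption; depends on the medium

smallest_legal_cluster = sector_size                  # one sector, but most of the disk is then unreachable
cluster_to_cover_disk = disk_size // fat_entries      # 2**20 bytes, the question's estimate
print(smallest_legal_cluster, cluster_to_cover_disk)  # 512 1048576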

scikit-learn: Hierarchical Agglomerative Clustering performance with increasing dataset

scikit-learn==0.21.2
The Hierarchical Agglomerative Clustering response time increases exponentially as the dataset grows.
My dataset is textual; each document is 7-10 words long.
I am using the following code to perform the clustering:
from sklearn.cluster import AgglomerativeClustering

hac_model = AgglomerativeClustering(affinity='cosine',
                                    linkage='complete',
                                    compute_full_tree=True,
                                    connectivity=None, memory=None,
                                    n_clusters=None,
                                    distance_threshold=0.7)
cluster_matrix = hac_model.fit_predict(matrix)
where matrices of the following sizes take:
5000 x 1500: 17 seconds
10000 x 2000: 113 seconds
13000 x 2418: 228 seconds
I can't control 5000, 10000, 15000, as those are the input sizes, nor the feature-set sizes (i.e. 1500, 2000, 2418), since I am using a BOW model (TF-IDF).
I end up using all the unique words (after removing stopwords) as my feature list. This list grows as the input size increases.
So, two questions:
How do I avoid an increase in feature-set size irrespective of the growth of the input dataset?
Is there a way I can improve the performance of the algorithm without compromising on quality?
Standard AGNES hierarchical clustering is O(n³+n²d) in complexity. So the number of instances is much more a problem than the number of features.
There are approaches that typically run in O(n²d), although the worst case remains the same, so they will be much faster than this. With these you'll usually run into memory limits first... Unfortunately, this isn't implemented in sklearn as far as I know, so you'll have to use other clustering tools - or write the algorithm yourself.
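If you want to stay in Python, one possible sketch (not the asker's code, and still quadratic in memory because of the pairwise distance matrix) is to build the dendrogram with scipy and cut it at the same 0.7 cosine-distance threshold; whether it is actually faster depends on the implementation, so treat it only as a way to swap tools while keeping the same threshold-based cut:

# Alternative sketch using scipy instead of sklearn's AgglomerativeClustering.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_documents(matrix, threshold=0.7):
    dense = matrix.toarray() if hasattr(matrix, "toarray") else np.asarray(matrix)
    dists = pdist(dense, metric='cosine')                   # condensed pairwise cosine distances
    Z = linkage(dists, method='complete')                   # complete-linkage dendrogram
    return fcluster(Z, t=threshold, criterion='distance')   # flat labels, like fit_predict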

What is the max size of a collection in MongoDB?

I would like to know the maximum size of a collection in MongoDB.
The MongoDB limitations documentation mentions that a single MMAPv1 database has a maximum size of 32 TB.
Does this mean the maximum size of a collection is 32 TB?
If I want to store more than 32 TB in one collection, what is the solution?
There are theoretical limits, as I will show below, but even the lower bound is pretty high. It is not easy to calculate the limits correctly, but the order of magnitude should be sufficient.
MMAPv1
The actual limit depends on a few things, like the length of shard names and the like (which adds up if you have a couple of hundred thousand of them), but here is a rough calculation with real-life data.
Each shard needs some space in the config db, which, like any other database, is limited to 32 TB on a single machine or in a replica set. On the servers I administrate, the average size of an entry in config.shards is 112 bytes. Furthermore, each chunk needs about 250 bytes of metadata information. Let us assume optimal chunk sizes of close to 64 MB.
We can have at maximum 500,000 chunks per server. 500,000 * 250 bytes equals 125 MB of chunk information per shard. So, per shard, we have 125.000112 MB (adding the 112-byte shard entry) if we max everything out. Dividing 32 TB by that value shows that we can have a maximum of slightly under 256,000 shards in a cluster.
Each shard in turn can hold 32 TB worth of data. 256,000 * 32 TB is 8,192,000 terabytes, or about 8.192 exabytes. That would be the limit for our example.
Let's say it's 8 exabytes. As of now, this easily translates to "enough for all practical purposes". To give you an impression: all the data held by the Library of Congress (arguably one of the biggest libraries in the world in terms of collection size) is estimated at around 20 TB, including audio, video, and digital materials. You could fit that into our theoretical MongoDB cluster some 400,000 times. Note that this is the lower bound of the maximum size, using conservative values.
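As a quick sanity check on the arithmetic (decimal units, same rough figures as above):

# Back-of-the-envelope check of the numbers above (decimal units, illustrative only).
config_db_limit = 32e12          # 32 TB limit for the config database under MMAPv1
chunk_metadata = 250             # approx. bytes of metadata per chunk
shard_entry = 112                # approx. bytes per entry in config.shards
chunks_per_shard = 500_000       # chunks per shard at ~64 MB per chunk

bytes_per_shard = chunks_per_shard * chunk_metadata + shard_entry   # ~125.000112 MB
max_shards = config_db_limit / bytes_per_shard                      # ~256,000 shards
total_data = max_shards * 32e12                                     # ~8.192 exabytes
print(int(max_shards), round(total_data / 1e18, 3))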
WiredTiger
Now for the good part: the WiredTiger storage engine does not have this limitation: the database size is not limited (since there is no limit on how many data files can be used), so we can have an unlimited number of shards. Even when we have those shards running on MMAPv1 and only our config servers on WiredTiger, the size of a cluster becomes nearly unlimited – the limitation to 16.8M TB of RAM on a 64-bit system might cause problems somewhere and cause the indices of the config.shards collection to be swapped to disk, stalling the system. I can only guess, since my calculator refuses to work with numbers in that area (and I am too lazy to do it by hand), but I estimate the limit here to be in the two-digit yottabyte area (and the space needed to host it somewhere the size of Texas).
Conclusion
Do not worry about the maximum data size in a sharded environment. No matter what, it is by far enough, even with the most conservative approach. Use sharding, and you are done. By the way: even 32 TB is a hell of a lot of data; most clusters I know hold less data and shard because IOPS and RAM utilization exceeded a single node's capacity.

Virtual Memory Page Replacement Algorithms

I have a project where I am asked to develop an application to simulate how different page replacement algorithms perform (with varying working set size and stability period). My results:
Vertical axis: page faults
Horizontal axis: working set size
Depth axis: stable period
Are my results reasonable? I expected LRU to have better results than FIFO. Here, they are approximately the same.
For random, stability period and working set size don't seem to affect the performance at all. I expected graphs similar to FIFO & LRU, just with worse performance. If the reference string is highly stable (few branches) and has a small working set size, shouldn't it still have fewer page faults than an application with many branches and a big working set size?
More Info
My Python Code | The Project Question
Length of reference string (RS): 200,000
Size of virtual memory (P): 1000
Size of main memory (F): 100
Number of times a page is referenced (m): 100
Size of working set (e): 2 - 100
Stability (t): 0 - 1
Working set size (e) & stable period (t) affect how reference strings are generated.
|-----------|--------|------------------------------------|
0           p        p+e                                P-1
So assume the above is the virtual memory of size P. To generate reference strings, the following algorithm is used:
Repeat until the reference string is generated:
    pick m numbers in [p, p+e]   (m simulates the number of times a page is referenced)
    pick a random number r, 0 <= r < 1
    if r < t: generate a new p
    else: p = (p + 1) % P
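For reference, a minimal runnable sketch of that generator (the function and variable names are illustrative, not taken from the linked code; the % P wrap for the window is an assumption):

import random

def generate_reference_string(num_refs, P, e, m, t):
    refs, p = [], random.randrange(P)
    while len(refs) < num_refs:
        # reference m pages from the current locality window [p, p+e]
        refs.extend(random.randint(p, p + e) % P for _ in range(m))
        # with probability t jump to a new locality, otherwise slide the window
        if random.random() < t:
            p = random.randrange(P)
        else:
            p = (p + 1) % P
    return refs[:num_refs]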
UPDATE (in response to MrGomez's answer)
However, recall how you seeded your input data: using random.random, thus giving you a uniform distribution of data with your controllable level of entropy. Because of this, all values are equally likely to occur, and because you've constructed this in floating point space, recurrences are highly improbable.
I am using random, but it is not totally random either; references are generated with some locality through the use of the working set size and number-of-pages-referenced parameters.
I tried increasing numPageReferenced relative to numFrames in the hope that it would reference pages currently in memory more often, thus showing the performance benefit of LRU over FIFO, but that didn't give me a clear result. Just FYI, I tried the same app with the following parameters (the pages/frames ratio is kept the same; I reduced the size of the data to make things faster).
--numReferences 1000 --numPages 100 --numFrames 10 --numPageReferenced 20
The result is still not such a big difference. Am I right to say that if I increase numPageReferenced relative to numFrames, LRU should perform better, since it references pages already in memory more often? Or perhaps I am misunderstanding something?
For random, I am thinking along the lines of:
Suppose there's high stability and a small working set. It means that the pages referenced are very likely to be in memory, so the need for the page replacement algorithm to run is lower?
Hmm, maybe I need to think about this more :)
UPDATE: Thrashing less obvious at lower stability
Here, I am trying to show the thrashing as the working set size exceeds the number of frames (100) in memory. However, notice that thrashing appears less obvious with lower stability (high t); why might that be? Is the explanation that as stability becomes low, page faults approach the maximum, so it does not matter as much what the working set size is?
These results are reasonable given your current implementation. The rationale behind that, however, bears some discussion.
When considering algorithms in general, it's most important to consider the properties of the algorithms currently under inspection. Specifically, note their corner cases and best and worst case conditions. You're probably already familiar with this terse method of evaluation, so this is mostly for the benefit of those reading here who may not have an algorithmic background.
Let's break your question down by algorithm and explore their component properties in context:
FIFO shows an increase in page faults as the size of your working set (length axis) increases.
This is correct behavior, consistent with Bélády's anomaly for FIFO replacement. As the size of your working page set increases, the number of page faults should also increase.
FIFO shows an increase in page faults as system stability (1 - depth axis) decreases.
Noting your algorithm for seeding stability (if random.random() < stability), your results become less stable as stability (S) approaches 1. As you sharply increase the entropy in your data, the number of page faults, too, sharply increases and propagates Bélády's anomaly.
So far, so good.
LRU shows consistency with FIFO. Why?
Note your seeding algorithm. Standard LRU is most optimal when you have paging requests that are structured to smaller operational frames. For ordered, predictable lookups, it improves upon FIFO by aging off results that no longer exist in the current execution frame, which is a very useful property for staged execution and encapsulated, modal operation. Again, so far, so good.
However, recall how you seeded your input data: using random.random, thus giving you a uniform distribution of data with your controllable level of entropy. Because of this, all values are equally likely to occur, and because you've constructed this in floating point space, recurrences are highly improbable.
As a result, your LRU is perceiving each element to occur a small number of times, then to be completely discarded when the next value is calculated. It thus correctly pages each value as it falls out of the window, giving you performance exactly comparable to FIFO. If your system properly accounted for recurrence or a compressed character space, you would see markedly different results.
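To make that concrete, here is a toy sketch (not the asker's simulator) that replays a uniformly random reference string against simple FIFO and LRU models; with no recurrence structure, the fault counts come out essentially equal:

# Toy sketch: under uniformly random references, FIFO and LRU fault at about the same rate.
import random
from collections import OrderedDict, deque

def fifo_faults(refs, frames):
    resident, order, faults = set(), deque(), 0
    for page in refs:
        if page not in resident:
            faults += 1
            if len(resident) == frames:
                resident.discard(order.popleft())   # evict the oldest page
            resident.add(page)
            order.append(page)
    return faults

def lru_faults(refs, frames):
    resident, faults = OrderedDict(), 0
    for page in refs:
        if page in resident:
            resident.move_to_end(page)              # mark as most recently used
        else:
            faults += 1
            if len(resident) == frames:
                resident.popitem(last=False)        # evict the least recently used
            resident[page] = True
    return faults

refs = [random.randrange(1000) for _ in range(200_000)]   # uniform, no recurrence structure
print(fifo_faults(refs, 100), lru_faults(refs, 100))      # nearly equal fault counts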
For random, stability period and working set size don't seem to affect the performance at all. Why are we seeing this scribble all over the graph instead of giving us a relatively smooth manifold?
In the case of a random paging scheme, you age off each entry stochastically. Purportedly, this should give us some form of a manifold bound to the entropy and size of our working set... right?
Or should it? For each set of entries, you randomly assign a subset to page out as a function of time. This should give relatively even paging performance, regardless of stability and regardless of your working set, as long as your access profile is again uniformly random.
So, based on the conditions you are checking, this is entirely correct behavior consistent with what we'd expect. You get an even paging performance that doesn't degrade with other factors (but, conversely, isn't improved by them) that's suitable for high load, efficient operation. Not bad, just not what you might intuitively expect.
So, in a nutshell, that's the breakdown as your project is currently implemented.
As an exercise in further exploring the properties of these algorithms in the context of different dispositions and distributions of input data, I highly recommend digging into scipy.stats to see what, for example, a Gaussian or logistic distribution might do to each graph. Then, I would come back to the documented expectations of each algorithm and draft cases where each is uniquely most and least appropriate.
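For instance, a minimal sketch (names are illustrative, not from the asker's code) of drawing page references from a Gaussian centred on the current locality instead of a uniform window:

# Illustrative only: replace the uniform page picker with a scipy.stats distribution.
import numpy as np
from scipy import stats

def pick_pages_gaussian(p, e, m, P):
    draws = stats.norm(loc=p, scale=max(e, 1) / 2).rvs(size=m)   # Gaussian around the locality p
    return np.clip(np.rint(draws), 0, P - 1).astype(int)         # clamp to valid page numbers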
All in all, I think your teacher will be proud. :)