lucene search performance is very slow - lucene.net

I am using Lucene.NET. I need to index and store a very large volume of documents, around 500 GB, which creates an index folder of about 500 GB. When I search this index it takes far too long, around 10 minutes. I use analyzed and not-analyzed index types on different fields, and I have 90 such fields.
Please help me. Pravin Thokal

Related

is there a maximum number of records for MongoDB?

I am looking to use MongoDB to store a huge number of records: between 12 and 15 billion. Is it possible to store this many documents in MongoDB?
I saw on the net that there are limits on document size, index size, and the number of elements in a collection.
But is there a limit on the number of records?
There is no limit on the number of documents in one collection. However, you will probably run into issues with disk, RAM, and lookup/update performance. If your data has some kind of logical grouping, I would suggest splitting it between multiple collections, or even separate instances (on different servers, of course).
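For example, a rough sketch in the mongo shell, assuming the records have a natural grouping such as a year field (the collection names and the field are made up for illustration):

// Hypothetical: route writes into per-year collections instead of one huge one.
var doc = { year: 2014, value: 42 };
db.getCollection("records_" + doc.year).insert(doc);

// Queries then only touch the collection for the year in question.
db.records_2014.find({ value: 42 });

Each collection (and its indexes) then stays at a manageable size instead of one 15-billion-document monolith.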

Meteor increase size of mongodb

I have a running Meteor project and wanted to test whether it scales.
In fact I have to save data in a collection where each document is roughly 500 KB. At 682 entries (documents) in this collection, Meteor exits with code 3.
I heard about a maximum of 32 MB of data; why is that? How can I get more space? Will more space have an impact on efficiency?
I need about 10,000 entries of 500 KB each. Is that even possible?
I would appreciate answers and also solutions (like maybe outsourcing MongoDB?).

Lucene.net limitations on the index files with large data

We are planning to use Lucene.NET in order to have a Google-like search. We will have a huge amount of data, loaded from the database, which needs to be indexed. Are there any limitations on the size of the indexed documents, and what can the maximum size of a single index document be? How should we distribute the data given the average size of each document?
There are no limitations at the data sizes you're hinting at. I'm running a Lucene.NET app that works with hundreds of GB of indexed data without splitting the index (beyond the way Lucene naturally splits it without you asking).
Just add your data to your index and forget about any limitations; you're way below any potential issues. (But do read all the performance guidelines, including the Lucene ones, since Lucene.NET is a direct port, so all those tips apply.)

Are there any tools to estimate index size in MongoDB?

I'm looking for a tool to get a decent estimate of how large a MongoDB index will be based on a few signals like:
How many documents in my collection
The size of the indexed field(s)
The size of the _id I'm using if not ObjectId
Geo/Non-geo
Has anyone stumbled across something like this? I can imagine it would be extremely useful given Mongo's performance degradation once it hits the memory wall and documents start getting paged out to disk. If I have a functioning database and want to add another index, the only way I'll know if it will be too big is to actually add it.
It wouldn't need to be accurate down to the bit, but with some assumptions about B-Trees and the index implementation I'm sure it could be reasonable enough to be helpful.
If this doesn't exist already I'd like to build and open source it, so if I've missed any required parameters for this calculation please include in your answer.
I just spoke with some of the 10gen engineers, and there isn't a tool, but you can do a back-of-the-envelope calculation based on this formula:
2 * [ n * ( 18 bytes overhead + avg size of indexed field + 5 or so bytes of conversion fudge factor ) ]
Where n is the number of documents you have.
The overhead and conversion padding are mongo specific but the 2x comes from the b-tree data structure being roughly half full (but having allocated 100% of the space a full tree would require) in the worst case.
I'd explain more but I'm learning about it myself at the moment. This presentation will have more details: http://www.10gen.com/presentations/mongosp-2011/mongodb-internals
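As a rough worked example of that formula in the mongo shell (all the numbers here are assumptions, not measurements):

var n = 1000000;          // number of documents
var avgIndexedField = 10; // average size of the indexed field, in bytes
var overhead = 18;        // per-entry overhead from the formula above
var fudge = 5;            // conversion fudge factor
var estimatedBytes = 2 * (n * (overhead + avgIndexedField + fudge));
print((estimatedBytes / (1024 * 1024)) + " MB"); // roughly 63 MB for these inputs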
You can check the sizes of the indexes on a collection by using the command:
db.collection.stats()
More details here: http://docs.mongodb.org/manual/reference/method/db.collection.stats/#db.collection.stats
Another way to calculate is to ingest ~1000 or so documents into every collection; in other words, build a small-scale model of what you're going to end up with in production, create the indexes (or whatever else you need), and extrapolate the final numbers from the db.collection.stats() averages, as sketched below.
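A minimal sketch of that small-scale-model approach in the mongo shell (the collection, field names, and target count are hypothetical):

// Insert ~1000 representative sample documents into a throwaway collection.
for (var i = 0; i < 1000; i++) {
  db.sample.insert({ userId: i, email: "user" + i + "@example.com" });
}
db.sample.createIndex({ email: 1 });

// Take the per-document index cost and extrapolate to the projected collection size.
var s = db.sample.stats();
var perDoc = s.totalIndexSize / s.count;
print("Estimated index size for 10 million docs: " + (perDoc * 10000000 / (1024 * 1024)) + " MB");

The extrapolation is only linear, so it ignores effects like the prefix compression mentioned in the edit below, but it gets you in the right ballpark.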
Edit (from a comment):
Tyler's answer describes the original MMAP storage engine circa MongoDB 2.0, but this formula definitely isn't applicable to modern versions of MongoDB. WiredTiger, the default storage engine in MongoDB 3.2+, uses index prefix compression, so index sizes will vary based on the distribution of key values. There are also a variety of index types and options which might affect sizing. The best approach for a reasonable estimate would be empirical estimation with representative test data for your projected growth.
The best option is to test in a non-prod deployment!
Insert 1,000 documents and check the index sizes, insert 100,000 documents and check the index sizes, and so on.
An easy way to loop over all collections and check their total index sizes:
var y = 0;
db.adminCommand("listDatabases").databases.forEach(function(d) {
  var mdb = db.getSiblingDB(d.name);
  mdb.getCollectionNames().forEach(function(c) {
    var s = mdb[c].stats(1024 * 1024).totalIndexSize;
    y = y + s;
    print("db.Collection:" + d.name + "." + c + " totalIndexSize: " + s + " MB");
  });
});
print("============================");
print("Instance totalIndexSize: " + y + " MB");

MongoDB is slow on an 8 million row collection

I have a collection of 8 million rows, and I am new to MongoDB.
It loads 20 rows really slowly...
What should I do to speed it up?
You probably need to add an index.
Optimization Guidelines
More information is needed, such as the server's RAM.
Optimization steps:
1. Index your query field (a quick sketch follows below).
2. Check the index's storage size. If the index is larger than RAM, MongoDB needs to read data via disk I/O, which is slower.
8 million documents is not large. It is not a big deal.
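For example, a minimal sketch in the mongo shell, assuming the slow query filters on a field called status (both the collection and the field name are hypothetical):

// Create an index on the field the slow query filters on.
db.mycollection.createIndex({ status: 1 });

// Check the index sizes (in MB) and compare them against the server's RAM.
db.mycollection.stats(1024 * 1024).indexSizes;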