Is it possible to use MongoDB geospatial indexes with GridFS?

I have a large GeoJSON feature collection which is over 16 MB. I am hoping to insert the data into MongoDB so that I can utilize the geospatial functionality that MongoDB offers ($geoIntersects, $geoWithin, etc.). Due to the large size of the file, I cannot store the data in one MongoDB document.
I have used GridFS to break the file up into several chunks within MongoDB, but I am now unsure whether I can still utilize the geospatial features that I would like to.
Does anyone know if this is possible and, if so, what's the best way to do something like this?

One way you should be able to achieve what you are describing is to extract the data to be indexed into a separate collection, and add indexes on that collection.
GridFS essentially takes the data, splits it into small chunks (255 kB each by default, 256 kB in older versions) and stores each chunk as a separate document. Since those chunks are opaque binary pieces, I don't see how they could be covered by a geospatial index.
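For example, a minimal sketch of that separate-collection approach with pymongo might look like the following; the database, collection, and file names are made up, and it assumes each individual GeoJSON feature stays well under the 16 MB limit:

import json
from pymongo import MongoClient, GEOSPHERE

client = MongoClient()
coll = client["gisdata"]["features"]

# Load the oversized FeatureCollection and store one document per feature,
# so no single document approaches the 16 MB cap.
with open("features.geojson") as f:
    feature_collection = json.load(f)
coll.insert_many(feature_collection["features"])

# Index each feature's GeoJSON geometry for $geoIntersects / $geoWithin.
coll.create_index([("geometry", GEOSPHERE)])

# Example query: all features intersecting a point.
cursor = coll.find({
    "geometry": {
        "$geoIntersects": {
            "$geometry": {"type": "Point", "coordinates": [-73.97, 40.77]}
        }
    }
})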


How to store lookup values in MongoDB?

I have a collection in the database which represents media files.
Among other info, I should store the format name. I wonder whether there are best practices for storing info like that. Is it better to create a new collection for file formats and link to that collection, or to store the format name right in the file documents as plain text? What about performance and compression? There are supposed to be more than a billion documents in the database. What would MongoDB experts suggest in this situation?
Embedded documents are the preferred approach.
In your case, it means it is better to store the file format in the same collection.
Putting the file format into a separate collection means creating another collection (and its files) on disk and performing an extra lookup to resolve the reference.
It is the slower option and should only be used if any of your documents would otherwise exceed 16 MB in size.
See these links for more information: "6 Rules of Thumb for MongoDB Schema Design" and "How to Program with MongoDB Using the .NET Driver".
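As a rough illustration of the two options discussed above, here is a minimal pymongo sketch; the database, collection, and field names are invented:

from pymongo import MongoClient

db = MongoClient()["media"]

# Option 1 (preferred here): embed the format name directly in each file document.
db.files.insert_one({"name": "clip.mkv", "format": "Matroska", "size_bytes": 104857600})

# Option 2: keep formats in their own collection and reference them by _id,
# which costs an extra lookup every time the format is needed.
fmt_id = db.formats.insert_one({"name": "Matroska"}).inserted_id
db.files.insert_one({"name": "clip2.mkv", "format_id": fmt_id, "size_bytes": 52428800})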
I've done some benchmarks and found that in my case storing the lookup values as plain text is more efficient in terms of disk space than either an embedded document or a reference to a separate collection. Sorry for the poor terminology.

Mapping datasets to NoSql (MongoDB) collection

What I have:
I have data for 'n' departments.
Each department has more than 1,000 datasets.
Each dataset has more than 10,000 CSV files (each larger than 10 MB), each with a different schema.
This data will grow even more in the future.
What I want to do:
I want to map this data into MongoDB.
What approaches I have tried:
I can't map each dataset to a single document, since documents are limited to 16 MB (4 MB in older versions).
I cannot create a collection for each dataset, as the maximum number of collections is also limited (fewer than 24,000 with the default namespace file size).
So finally I thought of creating a collection for each department, with one document in that collection for each record of the CSV files belonging to that department.
I want to know from you:
Will there be a performance issue if we map each record to a document?
Is there any maximum limit on the number of documents?
Is there any other design I can use?
Will there be a performance issue if we map each record to a document?
Mapping each record to a document in MongoDB is not a bad design. You can have a look at the FAQ on the MongoDB site:
http://docs.mongodb.org/manual/faq/fundamentals/#do-mongodb-databases-have-tables
It says:
...Instead of tables, a MongoDB database stores its data in collections, which are the rough equivalent of RDBMS tables. A collection holds one or more documents, which corresponds to a record or a row in a relational database table....
Along with the limitation on BSON document size (16 MB), there is also a maximum of 100 levels of document nesting:
http://docs.mongodb.org/manual/reference/limits/#BSON Document Size
...Nested Depth for BSON Documents. Changed in version 2.2. MongoDB supports no more than 100 levels of nesting for BSON documents...
So it's better to go with one document for each record.
Is there any maximum limit on the number of documents?
No. It's mentioned in the MongoDB reference manual:
...Maximum Number of Documents in a Capped Collection. Changed in version 2.4. If you specify a maximum number of documents for a capped collection using the max parameter to create, the limit must be less than 2^32 documents. If you do not specify a maximum number of documents when creating a capped collection, there is no limit on the number of documents...
Is there any other design I can use?
If your documents are too large, you can consider partitioning them at the application level, but that comes with a high computation cost in the application layer.
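A minimal pymongo sketch of that per-department, one-document-per-record mapping (the file, department, and dataset names below are placeholders):

import csv
from pymongo import MongoClient

db = MongoClient()["departments"]

# Read one CSV file and turn every row into a document, tagged with the
# dataset it came from so a whole dataset can be queried or dropped later.
with open("dataset_0001.csv", newline="") as f:
    docs = [dict(row, dataset="dataset_0001") for row in csv.DictReader(f)]

# One collection per department, one document per CSV record.
db["department_a"].insert_many(docs)
db["department_a"].create_index("dataset")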
Will there be a performance issue if we map each record to a document?
That depends entirely on how you search them. When you use a lot of queries which affect only one document, it is likely even faster that way. When a finer document granularity results in a lot of queries that span many documents, it will get slower, because MongoDB can't combine those documents for you on the server.
Is there any maximum limit on the number of documents?
No.
Is there any other design I can use?
Maybe, but that depends on how you want to query your data. When you are content with treating files as BLOBs which are retrieved as a whole but not searched or analyzed on the database level, you could consider storing them in GridFS. It's a way to store files larger than 16 MB in MongoDB.
In general, MongoDB database design doesn't depend so much on what and how much data you have, but rather on how you want to work with it.
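If you do end up treating some files as opaque blobs as described above, a minimal GridFS sketch (with placeholder file and database names) could look like this:

import gridfs
from pymongo import MongoClient

db = MongoClient()["departments"]
fs = gridfs.GridFS(db)

# Store the whole CSV file as one GridFS file; extra keyword arguments are
# kept as metadata on the files document.
with open("dataset_0001.csv", "rb") as f:
    file_id = fs.put(f, filename="dataset_0001.csv", department="department_a")

# Later: fetch the whole file back as bytes.
data = fs.get(file_id).read()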

How should I use MongoDB GridFS to store my big-size data?

After reading the official MongoDB GridFS documentation, I know that GridFS is used by MongoDB to store large files (size > 16 MB); the file can be a video, a movie, or anything else. But what I am dealing with now is large structured data, not a simple physical file, and its size exceeds the limit. To be more specific, I am working with thousands of gene sequences, and many of them exceed the BSON document size limit. You can think of each gene sequence as a simple string, and some of those strings are so large that they exceed the MongoDB BSON size limit. So, what can I do to solve such a problem? Is GridFS still suitable for my case?
GridFS will split up the data in chunks of smaller size, that's how it overcomes the size limit. It's particularly useful for streaming data, because you can quickly access data at any given offset since the chunks are indexed.
Storing 'structured' data in the tens of megabytes sounds a bit odd: either you need to access parts of the data based on some criteria, in which case you need a different data structure that allows access to smaller parts of the data.
Or you really need to process the entire data set based on some criteria. In that case, you'll want an efficiently indexed collection that you can query based on your criteria and that contains the id of the file that must then be processed.
Without a concrete example of the problem, i.e. what the query and the data structure look like, it's hard to give you a more detailed answer.
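As a rough sketch of that split (queryable attributes in an indexed collection, the raw sequence in GridFS), using pymongo with invented field and file names:

import gridfs
from pymongo import MongoClient

db = MongoClient()["genomics"]
fs = gridfs.GridFS(db)

sequence = "ACGT" * 10_000_000          # a sequence far too large for one document

# The raw string goes to GridFS, which splits it into chunks automatically.
seq_id = fs.put(sequence.encode("ascii"), filename="BRCA1.seq")

# The attributes you actually query live in a normal, indexable collection
# that points at the GridFS file by its _id.
db.sequences.insert_one({"gene": "BRCA1", "organism": "human",
                         "length": len(sequence), "file_id": seq_id})
db.sequences.create_index("gene")

# Query by criteria, then stream the matching sequence out of GridFS.
meta = db.sequences.find_one({"gene": "BRCA1"})
raw = fs.get(meta["file_id"]).read()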

MongoDB GridFS Size Limit

I am using MongoDB as a convenient way of storing a dataset as a series of columns, where there is a document that stores the values for a given column and another document that stores the details of the dataset, plus a mapping to the other documents with the associated column values. The issue I'm now facing as things get bigger is that I can no longer store an entire column in a single document.
I'm aware that there is also the GridFS option. The only downside is that I believe it stores the files as blobs, meaning I would lose random access to a chunk of the column, or to the value at a specified index, something that was incredibly useful with the document store; however, I may not have any other option.
So my question is: does GridFS also impose an upper limit on the size of documents, and if so, does anyone know what it is? I've looked in the docs and haven't found anything, but it may be that I'm not looking in the correct place or that there is a limit but it's not well documented.
Thanks,
Vackar
GridFS
Per the GridFS documentation:
Instead of storing a file in a single document, GridFS divides a file into parts, or chunks, and stores each of those chunks as a separate document. By default GridFS limits chunk size to 256k. GridFS uses two collections to store files. One collection stores the file chunks, and the other stores file metadata.
GridFS will allow you to store arbitrarily large files; however, this really won't help your use case. A file in GridFS will effectively be a large binary blob, and you will not get any of the benefits of structured documents and indexing.
Schema Design
The fundamental challenge here is your approach to schema design. If you are creating documents that are likely to grow beyond the 16 MB document limit, they will also have a significant impact on your database storage and fragmentation as they grow in size.
The appropriate solution would be to rethink your schema approach so that you do not have unbounded document growth. This probably means flattening the array of "columns" that you are growing so it is represented by a collection of documents rather than an array.
A better (and separate) question to ask would be how to refactor your schema given the expected data growth patterns.
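As a rough pymongo sketch of such a flattened schema (all names invented), each column value becomes its own small document, which keeps random access to "the value at index i of a column" a cheap indexed lookup:

from pymongo import MongoClient

db = MongoClient()["columnstore"]

# One document per (dataset, column, index) value instead of one ever-growing
# column document.
values = [3.1, 4.1, 5.9, 2.6]            # one column of a hypothetical dataset
db.cells.insert_many([
    {"dataset": "ds42", "column": "temperature", "i": i, "value": v}
    for i, v in enumerate(values)
])

db.cells.create_index([("dataset", 1), ("column", 1), ("i", 1)])
cell = db.cells.find_one({"dataset": "ds42", "column": "temperature", "i": 2})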

Anyone can explain about MongoDB GridFS feature?

I am a MongoDB user. I have heard about the MongoDB GridFS feature, but I am not sure when and where to use it, or whether it is necessary to use it at all. If you have used the feature, I hope you can explain it precisely.
Thanks,
You'd look into using GridFS if you needed to store large files in MongoDB.
A quote from the MongoDB documentation which I think sums it up well:
The database supports native storage of binary data within BSON objects. However, BSON objects in MongoDB are limited to 4MB in size. The GridFS spec provides a mechanism for transparently dividing a large file among multiple documents. This allows us to efficiently store large objects, and in the case of especially large files, such as videos, permits range operations (e.g., fetching only the first N bytes of a file).
There's also the GridFS specification here
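For illustration, a minimal pymongo sketch of that kind of range access (database and file names are made up):

import gridfs
from pymongo import MongoClient

db = MongoClient()["videos"]
fs = gridfs.GridFS(db)

# Store a large file; GridFS splits it into chunks behind the scenes.
with open("movie.mp4", "rb") as f:
    file_id = fs.put(f, filename="movie.mp4")

# Read only part of the file: just the needed chunks are fetched from the server.
grid_out = fs.get(file_id)
header = grid_out.read(1024)        # first 1 KB
grid_out.seek(5 * 1024 * 1024)      # jump to a 5 MB offset
middle = grid_out.read(64 * 1024)   # 64 KB from that offset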