CosmosDB Mongo Database Request Saving Issue

I have quite a large request to save, and it is really necessary to save it. From what I have read across the web and in the documentation, the request size limit should be between 2 and 4 MB, but when I try to save it I get the error below:
"Mongo Error: Request size is too large"
The document contains lots of text and images that the user is required to upload, so it gets really big. How can I save large request data in Cosmos DB?

Based on the official documentation of Cosmos DB limits:
There are no restrictions on the item payloads like number of
properties and nesting depth, except for the length restrictions on
partition key and id values, and the overall size restriction of 2 MB
The maximum request size is 2 MB and the maximum response size is 4 MB (link).
If your data exceeds 2 MB, you could follow the strategy described in this blog: the Cosmos DB document size is limited to 2 MB, and Cosmos DB is not meant to be used for content storage. For larger payloads, use Azure Blob Storage instead (a rough sketch follows below).
Or you could consider using MongoDB Atlas on Azure if you'd like full MongoDB feature support.
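A minimal sketch of that "metadata in Cosmos DB, payload in Blob Storage" pattern, assuming a Node.js app using the official mongodb and @azure/storage-blob packages; the connection strings, container name, database and collection names below are placeholders, not anything from the original question.

const { BlobServiceClient } = require("@azure/storage-blob");
const { MongoClient } = require("mongodb");

async function saveLargeRequest(payloadBuffer, metadata) {
  // 1. Put the big content (text + images) into Blob Storage.
  const blobService = BlobServiceClient.fromConnectionString(process.env.BLOB_CONN);
  const container = blobService.getContainerClient("requests");
  const blobName = `request-${Date.now()}.bin`;
  await container.getBlockBlobClient(blobName).uploadData(payloadBuffer);

  // 2. Save only a small reference document in Cosmos DB (Mongo API).
  const mongo = await MongoClient.connect(process.env.COSMOS_CONN);
  try {
    await mongo.db("appdb").collection("requests").insertOne({
      ...metadata,             // user id, title, timestamps, etc.
      blobName,                // pointer to the real payload in Blob Storage
      sizeBytes: payloadBuffer.length
    });
  } finally {
    await mongo.close();
  }
}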

Related

Is there a maximum payload size that we can GET through MongoDB Atlas?

I have a document store on MongoDB Atlas where each document is roughly 1 MB. I am using a REST API with GET requests to access data from this database. The GET request filters the database and can return a bunch of documents based on the filter.
My question is: is there a recommended payload size limit in terms of the amount of data that we can transmit? For example, if my filter results in 100 documents, I would be transmitting approximately 100 MB of data.
Are there better ways to do this?
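There is no answer to this related question in the thread, but a common pattern for the "filter matches 100 documents" case is to page or stream the cursor instead of materializing ~100 MB in a single response. A rough sketch with the Node.js driver; the database, collection and handling code are invented for illustration.

const { MongoClient } = require("mongodb");

async function streamMatches(uri, filter) {
  const client = await MongoClient.connect(uri);
  try {
    const cursor = client.db("mydb").collection("docs")
      .find(filter)
      .batchSize(10);              // roughly 10 x 1 MB per round trip
    for await (const doc of cursor) {
      console.log(doc._id);        // hand each document off to your own handler here
    }
  } finally {
    await client.close();
  }
}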

Using nested document structure in mongodb

I am planning to use a nested document structure for my MongoDB schema design, as I don't want to go for a flat schema; in my case I need to fetch my result in a single query.
However, MongoDB has a size limit for a document.
MongoDB Limits and Thresholds
A MongoDB document has a size limit of 16 MB. If your subcollection can grow without limits, go flat.
I don't need to fetch my nested data; I only need it for filtering and querying purposes.
I want to know whether I will still be bound by MongoDB's size limit even if I use my embedded data only for querying and filtering and never fetch the nested data, because as per my understanding, in this case, MongoDB won't load the complete document into memory but only the selected fields.
Nested schema design example
{
  clinicName: "XYZ Hospital",
  clinicAddress: "ABC place.",
  doctorsWorking: {
    "doctorId1": {
      doctorJoined: ISODate("2017-03-15T10:47:47.647Z")
    },
    "doctorId2": {
      doctorJoined: ISODate("2017-04-15T10:47:47.647Z")
    },
    "doctorId3": {
      doctorJoined: ISODate("2017-05-15T10:47:47.647Z")
    },
    ...
    // up to 30,000-40,000 more entries, say
  }
}
I don't think your understanding is correct when you say "MongoDB won't load the complete document into memory but only the selected fields".
If we look at the MongoDB docs, they read:
The maximum BSON document size is 16 megabytes. The maximum document size helps ensure that a single document cannot use excessive amount of RAM or, during transmission, excessive amount of bandwidth. To store documents larger than the maximum size, MongoDB provides the GridFS API.
So there is a clear 16 MB limit on document size, and MongoDB will stop you from saving a document larger than that.
Suppose, for the sake of argument, that your understanding were correct: documents of any size could be saved, but loading more than 16 MB into RAM would not be allowed. While storing the data, MongoDB cannot know what queries will later be run on it, so you would end up inserting big documents that could never be used (we don't declare a query pattern at insert time, and we could even try to fetch the full document in a single shot later).
If the limit were only on transmission (hypothetically), developers could always bring the data into RAM in chunks and never cross the 16 MB limit (that is how I/O on large files is done anyway), which would render the limit useless. I assume MongoDB's creators knew this and did not want it to happen.
Also, if the limit were on transmission, there would be no need for separate collections: we could put everything into a single collection, write smart queries, and, if the fetched data crossed 16 MB, simply fetch it in parts and forget about the limit. But it doesn't work that way.
So the limit must be on document size, otherwise it would create all of these issues.
In my opinion, if you just need the "doctorsWorking" data for filtering or querying purposes (and if you also think that "doctorsWorking" will cause the document to cross the 16 MB limit), then it's better to keep it in a separate collection.
Ultimately everything depends on your query and data patterns. If a doctor can serve in multiple hospitals in shifts, it is better to keep doctors in a separate collection, as sketched below.
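A quick mongo shell sketch of that separate-collection layout; the collection names, _id values and index are illustrative, not from the original question.

// Clinics stay small; each doctor membership is its own document.
db.clinics.insertOne({ _id: "clinic1", clinicName: "XYZ Hospital", clinicAddress: "ABC place." })

db.doctorsWorking.insertMany([
  { clinicId: "clinic1", doctorId: "doctorId1", doctorJoined: ISODate("2017-03-15T10:47:47.647Z") },
  { clinicId: "clinic1", doctorId: "doctorId2", doctorJoined: ISODate("2017-04-15T10:47:47.647Z") }
])

// Index for the filtering/querying use case, then query without ever
// loading (or even building) a near-16 MB clinic document.
db.doctorsWorking.createIndex({ clinicId: 1, doctorJoined: 1 })
db.doctorsWorking.find({ clinicId: "clinic1", doctorJoined: { $gte: ISODate("2017-04-01T00:00:00Z") } })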

GridFS and Cloning to another server

I have a local MongoDB database where I am starting to put some files into GridFS for caching purposes. What I want to know is:
Can I use db.cloneCollection() on another server to clone my fs.* collections? If I do that, will the GridFS system on that server work properly? Essentially I have to "pull" data from another machine that has the files in GridFS; I can't directly add them easily to the production box.
Edit: I was able to get on my destination server and use the following commands from the mongo shell to pull the GridFS system over from another mongo system on our network.
use DBName
db.cloneCollection("otherserver:someport","fs.files")
db.cloneCollection("otherserver:someport","fs.chunks")
For future reference.
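Note for anyone reading this later: db.cloneCollection() was removed in MongoDB 4.2, so on newer servers the same "pull from another machine" has to be done differently, e.g. with mongodump/mongorestore on fs.files and fs.chunks, or, for a quick shell-only copy, by opening a second connection from the destination's shell. A sketch of the latter, reusing the same "otherserver:someport" and DBName placeholders as above:

// On the destination server's shell: connect back to the source
// and copy the GridFS reference and chunk documents verbatim.
var remote = connect("otherserver:someport/DBName")
remote.fs.files.find().forEach(function (d) { db.fs.files.insertOne(d) })
remote.fs.chunks.find().forEach(function (d) { db.fs.chunks.insertOne(d) })

// GridFS drivers expect this unique index on the chunks collection;
// create it if it is missing after the copy.
db.fs.chunks.createIndex({ files_id: 1, n: 1 }, { unique: true })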
The short answer is: of course you can; they are only collections and there is nothing special about them at all. The longer answer requires explaining what GridFS actually is.
So the very first sentence on the manual page:
GridFS is a specification for storing and retrieving files that exceed the BSON-document size limit of 16MB.
GridFS is not something that "MongoDB does"; internally to the server it is basically just two collections, one for the reference information and one for the "chunks" that are used to break up the content so that no individual document exceeds the 16MB limit. But most important here is the word "specification".
So the server itself does no magic at all. The implementation to store reference data and chunks is all done at the "driver" level, where in fact you can name the collections you wish to use rather than just accept the defaults. So when reading and writing data, it is the "driver" that does the work by pulling the "chunks" contained in the reference document or creating new "chunks" as data is sent to the server.
The other common misconception is that GridFS is the only method for dealing with "files" when sending content to MongoDB. Again in that first sentence, it actually exists as a way to store content that exceeds the 16MB limit for BSON documents.
MongoDB has no problem directly storing binary data in a document as long as the total document does not exceed the 16MB limit. So in most use cases ( small image files used on websites ) the data would be better stored in ordinary documents and thus avoid the overhead of needing to read and write with multiple collections.
So there is no internal server "magic". These are just ordinary collections that you can query, aggregate, mapReduce and even copy or clone.
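To make the "driver does the work" point concrete, here is a small Node.js sketch using the driver's GridFSBucket, including the custom bucket (collection prefix) name mentioned above. The URI, database name, bucket name and file path are placeholders.

const fs = require("fs");
const { MongoClient, GridFSBucket } = require("mongodb");

async function storeFile(uri, path) {
  const client = await MongoClient.connect(uri);
  try {
    const db = client.db("mydb");
    // "bucketName" controls the collection prefix: cache.files / cache.chunks
    // instead of the default fs.files / fs.chunks.
    const bucket = new GridFSBucket(db, { bucketName: "cache" });

    // The driver splits the stream into chunk documents; the server just
    // stores ordinary documents in two ordinary collections.
    await new Promise((resolve, reject) => {
      fs.createReadStream(path)
        .pipe(bucket.openUploadStream("report.pdf"))
        .on("finish", resolve)
        .on("error", reject);
    });
  } finally {
    await client.close();
  }
}

For a small image that comfortably fits inside a single document, you could instead just insertOne({ name: "logo.png", data: imageBuffer }) into an ordinary collection, as the answer suggests.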

How should I use MongoDB GridFS to store my big-size data?

After reading the official MongoDB GridFS documentation, I know that GridFS is used by MongoDB to store large files (size > 16 MB); the file can be a video, a movie or anything else. But what I have is large structured data, not a simple physical file, and its size exceeds the limit. To be more specific, I am dealing with thousands of gene sequences, and many of them exceed the BSON document size limit. You can think of each gene sequence as a simple string, and some of these strings are so large that they exceed MongoDB's BSON size limit. So what can I do to solve such a problem? Is GridFS still suitable for my problem?
GridFS will split up the data into smaller chunks; that's how it overcomes the size limit. It's particularly useful for streaming data, because you can quickly access data at any given offset since the chunks are indexed.
Storing 'structured' data in the tens of megabytes sounds a bit odd: either you need to access parts of the data based on some criteria, in which case you need a different data structure that allows access to smaller parts of the data.
Or you really need to process the entire data set based on some criteria. In that case, you'll want an efficiently indexed collection that you can query based on your criteria and that contains the id of the file that must then be processed.
Without a concrete example of the problem, i.e. what the query and the data structure look like, it's hard to give you a more detailed answer.
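As a rough sketch of that second option (a small, indexed metadata collection plus the oversized sequence itself in GridFS), using the Node.js driver; the bucket name, collection name and field names such as geneName are invented for illustration.

const { Readable } = require("stream");
const { GridFSBucket } = require("mongodb");

async function saveSequence(db, geneName, sequence) {
  const bucket = new GridFSBucket(db, { bucketName: "sequences" });

  // Store the huge string itself in GridFS, split into chunk documents.
  const fileId = await new Promise((resolve, reject) => {
    const upload = bucket.openUploadStream(geneName);
    Readable.from([sequence])            // sequence is a plain (very long) string
      .pipe(upload)
      .on("finish", () => resolve(upload.id))
      .on("error", reject);
  });

  // Keep only the queryable attributes in a small, indexed document.
  await db.collection("genes").insertOne({
    geneName,
    length: sequence.length,
    fileId                               // points back to the GridFS file
  });
}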

Exceded maximum insert size of 16,000,000 bytes + mongoDB + ruby

I have an application where I'm using MongoDB as the database for storing records; the Ruby wrapper for MongoDB I'm using is Mongoid.
Everything was working fine until I hit the error below:
Exceded maximum insert size of 16,000,000 bytes
Can anyone pinpoint how to get rid of this error?
I'm running a mongodb server which does not have a configuration file (no configuration file was provided with the MongoDB source files).
Can anyone help?
You have hit the maximum size limit of a single document in MongoDB.
If you save large data files in MongoDB, use GridFS instead.
If your document has too many subdocuments, consider splitting it and use relations instead of nesting.
The limit of 16MB data per document is a very well known limitation.
Use GridFS for storing arbitrary binary data of arbitrary size + metadata.
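If it's unclear which documents are approaching that limit, the legacy mongo shell can report a document's BSON size directly; a quick check (the collection name is an example):

// Size in bytes of one candidate document (Object.bsonsize is a
// legacy mongo shell helper).
var doc = db.records.findOne()
Object.bsonsize(doc)

// The largest documents in the collection, via the $bsonSize
// aggregation operator (available in MongoDB 4.4+).
db.records.aggregate([
  { $project: { size: { $bsonSize: "$$ROOT" } } },
  { $sort: { size: -1 } },
  { $limit: 5 }
])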