Use Mongoexport to export a collection in multiple files - mongodb

I'm trying to export all the data from one of my collections, but the collection exceeds 16MB.
So when I try to re-import it, Mongo fails, since the import limit is 16MB.
Is there a way to split the export into multiple files? I can't find this information in the docs.
Thank you.

Depending on the data in your collection, one possible solution might be to use the --query <JSON>, -q <JSON> flag to create several files. (Documentation here.) For example, if your collection stores college student documents, e.g.:
{
  _id: ObjectId("5237258211f41a0c647c47b1"),
  name: "Jane Doe",
  age: 19,
  grade: "sophomore"
},
{
  _id: ObjectId("5237258211f41a0c647c47b2"),
  name: "John Smith",
  age: 20,
  grade: "junior"
},
...
You might, for example, decide to query on grade, running mongoexport four times to create four files (freshman, sophomore, junior, senior). If each file came in under 16MB, this would solve your problem.
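For instance, the four exports (and a matching re-import) might look like the following; the database and collection names here (school, students) are placeholders for your own:
mongoexport --db school --collection students --query '{ "grade": "freshman" }' --out freshman.json
mongoexport --db school --collection students --query '{ "grade": "sophomore" }' --out sophomore.json
mongoexport --db school --collection students --query '{ "grade": "junior" }' --out junior.json
mongoexport --db school --collection students --query '{ "grade": "senior" }' --out senior.json
mongoimport --db school --collection students --file freshman.json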
If this doesn't answer your question, please provide the commands you're using to import and export. :)

Related

List documents in Meteor collection with duplicate first names

My 'Programs' collection would look like this (as an array):
[{ FullName: "Jane Doe", CampYear: "mays15", ... }, { FullName: "Jane Doe", CampYear: "mays16", ... }, ...]
Some people in the collection are newbies and have just one document in the collection. Others have multiple documents and are returnees. We'd like the ability to somehow mark or flag the newbies: iterate through the collection and single out those who have just one document in there. The trouble is that with a list of, say, 150 names, I'd have to run a separate find operation on the collection for each name, which is too intensive.
I tried using aggregation via the meteorhacks:aggregate package but couldn't get it to work: after loading the package, my IDE wouldn't recognize the .aggregate method at all, even on the server.
Underscore might be a worthwhile way of doing it, but I couldn't find a method that might be of assistance.
Any ideas how we could do this?
Based on your comment, I'd probably denormalize your data. I'd have a new collection called CampAttendance or something like that. Then you'd have the structure:
{
  "name": "The camper's name",
  "years": ["mays2015", ...]
}
You can then use upsert to either insert a new record or $push another camp year onto the years array as you're importing data.
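A minimal sketch of that import step, assuming camperName and campYear hold the values for the row being imported ($addToSet is used here rather than $push so re-importing the same year doesn't duplicate it):
CampAttendance.update(
  { name: camperName },                // match the camper by name
  { $addToSet: { years: campYear } },  // add the camp year if it isn't there yet
  { upsert: true }                     // create the document the first time we see this camper
);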
To get the camper names who are 'newbies' then, you do:
CampAttendance.find({ years: { $size: 1 } });

MongoDB - Manipulating multi-level arrays in a document

I am currently building an app with Meteor and MongoDB. I have a 3-level document structure with arrays nested inside arrays:
{
  _id: "shtZFiTeHrPKyJ8vR",
  description: "Some title",
  categories: [{
    id: "shtZFiTeHrPKyJ8vR",
    name: "Foo",
    options: [{
      id: "shtZFiTeHrPKyJ8vR",
      name: "bar",
      likes: ["abc", "bce"]
    }]
  }]
}
Now, the document could be manipulated at any level, meaning:
1. description could be changed
2. categories can be added / removed / renamed
3. options can be added / removed / renamed
4. users can like options, so likes must be added or removed
1 and 2 are quite easy. It is also relatively easy to add or remove an option:
MyCollection.update({ _id: id, "categories.id": categoryId }, {
  $push: {
    "categories.$.options": {
      id: Random.id(),
      name: optionName
    }
  }
});
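Removing an option is symmetrical; a sketch, assuming optionId holds the id of the embedded option to remove:
MyCollection.update({ _id: id, "categories.id": categoryId }, {
  $pull: {
    "categories.$.options": { id: optionId }
  }
});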
But manipulating the options hash requires doing that on plain JavaScript objects. That means I first need to find my document, iterate over the options, and then write them back.
At least that's what I am doing right now, but I don't like that approach.
What I was thinking about is splitting the collection, at least putting the likes into their own collection referencing the origin document.
Or is there another way? I don't really like either of my possible solutions.
For this kind of query one would normally use the MongoDB positional operator ($). However, from the docs:
Nested Arrays
The positional $ operator cannot be used for queries
which traverse more than one array, such as queries that traverse
arrays nested within other arrays, because the replacement for the $
placeholder is a single value
Thus the only way to natively do what you want is by using explicit array indexes:
db.test.update({}, { $pull: { "categories.0.options.0.likes": "abc" } })
Unfortunately, Mongo does not provide an easy way to get the index of a matched nested document.
I would normally say that once your queries become that difficult it's probably a good idea to revisit the way you store data. Also with that many arrays to which you will be pushing data, Mongo will probably be relocating a lot of documents. This is definitely something that you want to minimize.
So at this point you will need to separate your data out into different documents and even collections.
Your first documents would look like this:
{
  _id: "shtZFiTeHrPKyJ8vR",
  description: "Some title",
  categories: [{
    id: "shtZFiTeHrPKyJ8vR",
    name: "Foo",
    options: ["shtZFiTeHrPKyJ8vR"]
  }]
}
This way you can easily add/remove options as you mentioned in your question. You would then need a second collection with documents that represent each option.
{
  _id: "shtZFiTeHrPKyJ8vR",
  name: "bar",
  likes: ["abc", "bce"]
}
You can learn more about references here. This is similar to what you mentioned in your comment. The benefit of this is that you are already reducing the potential amount of relocation. Depending on how you use your data you may even be reducing network usage.
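Reading the full structure back then means resolving those references; a Meteor-style sketch, assuming the option documents live in a collection called Options (a name chosen here for illustration):
var doc = MyCollection.findOne({ _id: id });
// gather the option ids from every category, then fetch them all in a single query
var optionIds = doc.categories.reduce(function (ids, c) { return ids.concat(c.options); }, []);
var options = Options.find({ _id: { $in: optionIds } }).fetch();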
Now doing updates on the likes is easy.
MyCollection.update({ _id: id }, {  // here id is the _id of the option document in its own collection
  $push: { likes: "value" }
});
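Removing a like is the mirror image; a sketch:
MyCollection.update({ _id: id }, {
  $pull: { likes: "value" }
});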
This does, however, require you to make two queries to the db. On the flip side, you do a lot less work on the client side and use a lot less bandwidth.
Another question you need to ask yourself is whether that depth of nesting is really needed. There might be an easier way to achieve your goal that doesn't require things to become so complicated.

Link pdf with collection/database in Mongodb

I am new to MongoDB. I have the following data: Empid, Name, Salary, Resume (the resume is in PDF format).
Now I am able to insert empid, name and salary using the mongo shell as follows:
db.test.insert({empid:100,Name:'Gaurav',Salary:1000});
I am using the mongofiles command to upload the resume to the database:
mongofiles -d test put "C:\resume.pdf"
So I am able to insert the data as well as the PDF into the database.
My question is how to relate/map empid 100 to the resume.
As you're using the mongofiles utility to insert into GridFS, files will be put in the collection fs.files (chunks will be stored in fs.chunks). The files have to live in different collections because GridFS uses a different storage mechanism.
Mongofiles works only with file names, so you can either store the file name and query for it as below, or parse the utility's response after you call it.
After executing your mongofiles command you'll have:
db.fs.files.find();
{
  _id: ObjectId('530191f8fc518a0ecdfd45a6'),
  filename: "file.pdf",
  chunkSize: 262144,
  uploadDate: new Date(1392611832917),
  md5: "b1ee25bcd665e2d6d7c4f4d6f08f44a3",
  length: 40098
}
To link with your employee entry:
> db.test.insert({empid:100,Name:'Gaurav',Salary:1000, file: ObjectId("530191f8fc518a0ecdfd45a6")});
> db.test.find();
{ "_id" : ObjectId("530195bd58f2d10f8b6703a4"), "empid" : 100, "Name" : "Gaurav", "Salary" : 1000, "file" : ObjectId("530191f8fc518a0ecdfd45a6") }
In case you need to also specify the db and collection, use a DBRef (http://docs.mongodb.org/manual/reference/database-references/).
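A sketch of what that might look like; the resume field name here is just an example:
db.test.insert({
  empid: 100,
  Name: 'Gaurav',
  Salary: 1000,
  resume: { "$ref": "fs.files", "$id": ObjectId("530191f8fc518a0ecdfd45a6"), "$db": "test" }  // DBRef: collection, _id, database
});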
You have a couple of options:
1. Add a field to your employee document that references the files collection by _id:
db.test.insert({empid: 100, Name: 'Gaurav', Salary: 1000, fileId: ObjectId("53019397a8f26bc570896972")});
I prefer this option because it lets you use files for different purposes and doesn't pollute the files collection with fields created for specific needs like employee info. When you use mongofiles to put the file, it returns the ID of the newly created document; use that as the value for fileId. The same works if you use a mongo driver and get back an id.
2. Add an empid field to the files collection. GridFS stores files in 2 collections: chunks and files (the list of fields is in this doc). Not perfect for the same reason as option 3.
3. Move all fields from your employee doc to the files collection. Not the best practice if you plan to use files for anything other than resume storage.

Mongodb architecture required [closed]

I am creating a project that requires data storage, and I am considering using MongoDB, but I am having trouble finding the logical / optimal way of organising the data.
My simplified data needs to store Place information like so:
{ place_city: "London", place_owner: "Tim", place_name: "Big Ben" }
{ place_city: "Paris", place_owner: "Tim", place_name: "Eiffel Tower" }
{ place_city: "Paris", place_owner: "Ben", place_name: "The Louvre" }
And here are the main operations I need
Retrieve all my places
Retrieve all my friends places
Retrieve all my friends cities
If I use MongoDB, a collection has a maximum size of 16MB, right? If that is correct, then I can't store all the information in a PLACES collection similar to my example above, right?
I would probably need to create an "OWNER" collection, like so:
{
  owner: "Tim",
  cities: [
    {
      name: "London",
      places: [ { name: "Big Ben" } ]
    },
    {
      name: "Paris",
      places: [ { name: "Eiffel Tower" }, { name: "The Louvre" } ]
    }
  ]
}
but the problem now is that retrieving my friends' places becomes cumbersome, and my friends' cities even more so...
Any advice from a cunning DB architect would be much appreciated.
The size limit of 16MB is per document, not per collection.
{place_city : "London", place_owner: "Tim", place_name: "Big Ben"}
is a very small document, so don't worry. The design of your collections depends heavily on how you query your data.
The data size limitation is per document and not per collection. Collections can easily become several GB (or even TB) large.
I would suggest you keep your data as simple as you have it, like:
{ place_city: "London", place_owner: "Tim", place_name: "Big Ben" }
{ place_city: "Paris", place_owner: "Tim", place_name: "Eiffel Tower" }
{ place_city: "Paris", place_owner: "Ben", place_name: "The Louvre" }
I am thinking that friends are stored like this:
{
  username: "Ben",
  friends: [ "Tim", "Bob" ]
}
Then your three queries can be done as:
All your places: db.places.find( { place_owner: "Ben" } );
All your friends' places with two queries (pseudo code):
friends = db.friends.find( { username: "Ben" } );
// friends = [ "Tim", "Bob" ]; you'll need a bit of code to turn the result into this array
db.places.find( { place_owner: { $in: [ "Tim", "Bob" ] } } );
All your friends' cities with two queries (pseudo code):
friends = db.friends.find( { username: "Ben" } );
db.places.distinct( 'place_city', { place_owner: { $in: [ "Tim", "Bob" ] } } );
Even with millions of documents, this should work fine, providing you have an index on the fields that you query for: { place_owner: 1 } and { username: 1 }.
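For example, those indexes could be created like so, using the collection names from the pseudo code above:
db.places.createIndex( { place_owner: 1 } );
db.friends.createIndex( { username: 1 } );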
I love MongoDB, but this data is not a good candidate for MongoDB. MongoDB does not support relationships, and that is basically all you are tracking here. Use a relational database to store the relationships.
Think of it like this: under the skin of the DBMS, MongoDB or SQL, an index is an index and a table is a table (basically). You get more performance from MongoDB not because it does the same things faster, but because you are able to use it to make your DB server do less (e.g. pull an entire document containing nested arrays and subdocuments instead of joining a bunch of tables together). There are some fundamental differences in the way MongoDB handles updates, but for querying simple data sets most systems are going to be relatively similar. One big difference, rooted in the way MongoDB works, is that it cannot use data in one collection as parameters for another query, which is basically the whole point of a relational database. Since 2 of your use cases require "joins" (to "all my friends"), you need two queries.
So what you're doing with two queries is the same thing as a join, except relational databases are optimized to do this extremely efficiently; I promise you it will be slower to do this join manually, plus you're sending all the data (friends' IDs) over the wire and making an extra DB round trip. Now, if you could store all your friends' cities and places in a single document, MongoDB would probably be (slightly) faster than joining, but now you've got a new problem, because you have to manage all this yourself: any time anyone adds a city or place, all their friends' documents have to be modified, which is unrealistic.
But there is even more to the story than that, because SQL DBMSes are extremely mature applications with lots of features to improve query performance. They let you do things like create "materialized views" that store all your friends' cities and places in memory and update themselves automatically any time one of their source tables is updated, so you don't have to do anything special; you just query and get your data without actually executing any joins. (A materialized view is not the right fit here, but as an example, it is possible if you needed it.)
ALSO, when you are using MongoDB, a guideline I've found helpful is this: any time you are asking yourself whether your document will be large enough to store all the data you will EVENTUALLY have, you probably have a design problem on your hands. If a document's growth is unbounded, its items should probably be enumerated in a collection of their own instead. Put another way, your collections should grow as your application is used, not your documents (much).
If breaking apart your schema like this means that for primary operations you are doing a lot of manual joins, it is worth considering whether you should be using a relational database instead.

Referential integrity in MongoDB. Which is a better practice?

Let's say I have a documents and an authors collection. I could design it in two ways:
1st way:
documents
{_id:1, title:"document 1", author:"John", age: 34}
{_id:2, title: "document 2", author: "Maria", age:42 }
{_id:3, title: "document 3", author: "John", age: 34}
authors
{_id:1, name:"John", age:34}
{_id:2, name:"Maria", age:42}
2nd way:
documents
{_id:1, title:"document 1", id_author:1}
{_id:2, title: "document 2", id_author: 2}
{_id:3, title: "document 3", id_author: 1}
authors
{_id:1, name:"John", age:34}
{_id:2, name:"Maria", age:42}
The 1st way is good because I don't have to simulate a join when I retrieve a document; I have all the data in the documents collection. But, on the other hand, if I have to change Maria's age, I have to do it in both collections.
The 2nd way is the opposite: if I need a document and the age of its author, I need to query documents first and then authors. But the good thing is that when I have to change Maria's age, I only have to do it in the authors collection.
So, which solution is better? I guess that the more fields you need in the authors collection, the more likely you are to use the second way. But, if I am using the 1st way, is there a single query I can use to update Maria's age in both collections?
Which is the most used solution?
An update across more than one collection would be a transaction, and MongoDB does not support transactions.
Both ways have their own disadvantages.
The first way, which embeds the author data, may be more appropriate in logging situations where the contents won't be subject to change.
The second way is better when you expect the author's details to change or grow over time (most cases).
As already mentioned, embedding the documents in their respective author's document would be a way to combine the benefits of the two suggestions, but it may lead to problems in the long run.
The problem with the first method is updates:
{_id:1, title:"document 1", author:"John", age: 34}
I can imagine that you will actually want an author id in there as well, along with whichever details you need for querying (schema redundancy).
This could pose a problem, as you notice:
But, on the other hand, if I have to change Maria's age, I have to do it in both collections.
Age changes at least once every year, and more often if you have the age wrong. The name can change as well, especially if you later discover that this "John" has a last name, or that his name is actually "Johnny".
So the problem with creating redundancy here is that the author document could change dramatically, forcing you to run extremely unperformant updates which could massively increase your working set at times. How often this would happen I cannot say with the information provided; that will be up to you to decide.
Redundancy normally works well when the attributes you duplicate into your current document are extremely rarely updated. That does not seem to be the case here.
The second way is normally the default for this kind of randomly read and updated relationship; however, there is a possible third method: embedding.
You could embed the documents into the author. This depends on how many documents you are looking to store, though, since MongoDB has a max document size of 16MB.
That being said a possibility is:
{
  _id: {},
  name: 'John',
  age: 43,
  documents: [
    { id: 1, title: "New Document" }
  ]
}
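Appending to an embedded array like that would look something like this sketch (authorId is a placeholder for the author's _id):
db.authors.update(
  { _id: authorId },
  { $push: { documents: { id: 2, title: "Another Document" } } }  // push a new embedded document
);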
The one downside of this is the use of in-memory operations such as $pull or $push, and not only that: if your document is consistently and vastly growing, you could see fragmentation.
But again, these are just notes for you to take in; the reality depends upon information not provided.
I would suggest a mix of both approaches: the "static" information is saved along with the documents collection, and the variable data is centralized in the authors collection. Only when the variable data needs to be retrieved do I use the author id to fetch the age. Something like this:
documents
{_id:"1", title:"document 1", author:"John", authorId: "1"}
{_id:"2", title: "document 2", author: "Maria", authorId: "2"}
{_id:"3", title: "document 3", author: "John", authorId: "1"}
authors
{_id:"1", name:"John", age:34}
{_id:"2", name:"Maria", age:42}
Age is something you won't require too often but could be updated frequently, so this handles both situations better.
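Retrieving the variable data on demand is then a small second query; a sketch using a projection to pull back only the age:
var author = db.authors.findOne({ _id: "1" }, { age: 1 });  // only run when the age is actually needed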
As someone else mentioned, Mongo is not transactional, and you could have problems if you create the author and the document in one shot.