Convert a MongoDB database with two collections into a Neo4j graph

I have finished creating my MongoDB database. It consists of two collections:
1. team
2. coach
Here is an example of the documents contained in these collections.
This is a team document:
{
"_id" : "Mil.74",
"official_name" : "Associazione Calcio Milan S.p.A",
"common_name" : "Milan",
"country" : "Italy",
"started_by" : {
"day" : 16,
"month" : 12,
"year" : 1899
},
"stadium" : {
"name" : "Giuseppe Meazza",
"capacity" : 81277
},
"palmarès" : {
"Serie A" : 18,
"Serie B" : 2,
"Coppa Italia" : 5,
"Supercoppa Italiana" : 6,
"UEFA Champions League" : 7,
"UEFA Super Cup" : 5,
"Cup Winners cup" : 2,
"UEFA Intercontinental cup" : 4
},
"uniform" : "black and red"
}
This is a coach document:
{
"_id" : ObjectId("556cec3b9262ab4f14165fcd"),
"name" : "Carlo",
"surname" : "Ancelotti",
"age" : 55,
"date_Of_birth" : {
"day" : 10,
"month" : 6,
"year" : 1959
},
"place_Of_birth" : "Reggiolo",
"nationality" : "Italian",
"preferred_formation" : "4-2-3-1",
"coached_Team" : [
{
"team_id" : "RMa.103",
"in_charge" : {
"from" : "26/june/2013",
"to" : "25/may/2015"
},
"matches" : 119
},
{
"team_id" : "PSG.00",
"in_charge" : {
"from" : "30/dec/2011",
"to" : "24/june/2013"
},
"matches" : 77
},
{
"team_id" : "Che.11",
"in_charge" : {
"from" : "01/july/2009",
"to" : "22/may/2011"
},
"matches" : 109
},
{
"team_id" : "Mil.74",
"in_charge" : {
"from" : "07/nov/2001",
"to" : "31/may/2009"
},
"matches" : 420
}
]
}
As you can see, I used a normalized model: every coach has an array of coached teams.
I want to convert this MongoDB database into a graph database, specifically Neo4j. My goal is to show that in highly connected domains like this one, Neo4j performs better than MongoDB. For example, the query "Find the palmarès of all teams coached by Carlo Ancelotti" requires two queries in MongoDB, whereas in Neo4j it is enough to follow relationships.
I found a guide on the forum that uses Gremlin to convert a MongoDB collection of documents into a Neo4j graph automatically. The problem is that the guide covers only one collection.
So, is it possible to generate the Neo4j graph automatically from my MongoDB database (with two collections), or must I create the graph "by hand"?

Gremlin is a domain-specific language for working with graphs, but it is based on Groovy, so you have all the flexibility you need. In other words, what you can do with one MongoDB collection you can just as easily do with two (or however many collections you have). That was the point of the blog post referenced in one of the other answers:
http://thinkaurelius.com/2013/02/04/polyglot-persistence-and-query-with-gremlin/
Gremlin is a great language for transforming data into graph form, whatever its source format is. I would think that you would first load all of your teams as vertices then iterate through your coaches, creating coach vertices and edges to their related teams as you go.
I would also add that nothing is "automatic" about Gremlin. It's not as though you tell Gremlin that you have data in MongoDB and it turns it into a graph. You have to write Gremlin to tell it how you want your MongoDB data turned into a graph.
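The two-pass approach described above (teams first, then coaches with edges to their teams) can be sketched as follows. This is plain JavaScript over in-memory arrays, not Gremlin, and it only illustrates the shape of the transformation. The function name buildGraph, the COACHED edge label, and the placeholder coach ids are assumptions made for this sketch; the sample data is a trimmed-down version of the documents in the question.

```javascript
// Trimmed sample data shaped like the question's two collections.
const teams = [
  { _id: "Mil.74", common_name: "Milan" },
  { _id: "RMa.103", common_name: "Real Madrid" },
];
const coaches = [
  {
    name: "Carlo",
    surname: "Ancelotti",
    coached_Team: [
      { team_id: "RMa.103", matches: 119 },
      { team_id: "Mil.74", matches: 420 },
    ],
  },
];

// Hypothetical helper: builds an in-memory graph from both collections.
function buildGraph(teams, coaches) {
  const vertices = [];
  const edges = [];
  const teamIds = new Set();

  // Pass 1: every team document becomes a Team vertex keyed by its _id.
  for (const team of teams) {
    vertices.push({ label: "Team", id: team._id, props: team });
    teamIds.add(team._id);
  }

  // Pass 2: every coach becomes a Coach vertex, and each coached_Team
  // entry becomes a COACHED edge pointing at the matching Team vertex.
  coaches.forEach((coach, i) => {
    const coachId = `coach-${i}`; // placeholder id, assumed for this sketch
    const { coached_Team = [], ...props } = coach;
    vertices.push({ label: "Coach", id: coachId, props });
    for (const stint of coached_Team) {
      if (!teamIds.has(stint.team_id)) continue; // skip dangling references
      edges.push({
        label: "COACHED",
        from: coachId,
        to: stint.team_id,
        props: { matches: stint.matches },
      });
    }
  });

  return { vertices, edges };
}

const graph = buildGraph(teams, coaches);
console.log(graph.vertices.length, graph.edges.length);
```

In a real migration the two loops would read from MongoDB cursors and write vertices and edges to the graph database instead of pushing onto arrays, but the order of the passes would stay the same.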

Related

Get a count of documents in an aggregation under specific requirement

In my collection, each document represents a user-generated quiz and includes an array field for tags, e.g. History, Science, Math. I am trying to get a count of the documents associated with each tag.
The aggregation below produces a unique tag list that looks like this: {tags:["History", "Science", "Math"]}
db.quizzes.aggregate([
{$unwind: "$tags"},
{$group: {_id:null, tgs: {$addToSet: "$tags"}}},
{$project: {_id:0, tags: "$tgs"}},
])
However, can the above aggregation also return the number of documents that contain each tag? For example, if there were 3 History quizzes, 2 Science quizzes, and 5 Math quizzes, the result would look like this: {tags:[{tag: "History", count: 3}, {tag: "Science", count: 2}, {tag: "Math", count:5}]}
Thanks in advance for any tips.
Edited to include collection documents:
{
"_id" : ObjectId("57d8ccd573099cb013b462b5"),
"title" : "Presidential Trivia",
"quiz" : "[{\"question\":\"How many presidents were members of the Whig party?\",\"choices\":[\"Two\",\"Three\",\"Four\"],\"correct\":\"2\"},{\"question\":\"Who was the first president to be impeached?\",\"choices\":[\"Warren Harding\",\"Andrew Johnson\",\"Andrew Jackson\"],\"correct\":\"1\"},{\"question\":\"How many presidents died during their presidency?\",\"choices\":[\"Four\",\"Six\",\"Eight\"],\"correct\":\"2\"},{\"question\":\"How many presidents had no party affiliation?\",\"choices\":[\"One\",\"Two\",\"Three\"],\"correct\":\"0\"},{\"question\":\"Who was the only president to serve two non-consecutive terms, making him both the 22nd and 24th president?\",\"choices\":[\"John Quincy Adams\",\"Grover Cleveland\",\"Theodore Roosevelt\"],\"correct\":\"1\"}]",
"correctArray" : "[\"2\",\"1\",\"2\",\"0\",\"1\"]",
"author" : "jake2",
"createTime" : ISODate("2016-09-14T04:06:45.118Z"),
"likes" : 0,
"avgScore" : 0,
"plays" : 3,
"private" : "0",
"tags" : [
"US Presidents",
"American History",
"History"
]
}
{
"_id" : ObjectId("57d8d08973099cb013b462b6"),
"title" : "Finance Quiz",
"quiz" : "[{\"question\":\"Which of these involves the analysis of of a business's financial statements, often used in stock valuation?\",\"choices\":[\"Fundamental Analysis\",\"Technical Analysis\",\"P/E ratio\"],\"correct\":\"0\"},{\"question\":\"What was the name of the bond purchasing program started by the U.S. Federal Reserve in response to the 2008 financial crisis?\",\"choices\":[\"Stimulus Package\",\"Quantitative Easing\",\"Mercantilism\"],\"correct\":\"1\"},{\"question\":\"Which term describes a debt security issued by a government, company, or other entity?\",\"choices\":[\"Bond\",\"Stock\",\"Mutual fund\"],\"correct\":\"0\"},{\"question\":\"Which of these companies has the largest market capitalization (as of October 2015)?\",\"choices\":[\"Ford Motors\",\"Apple\",\"Bank of America\"],\"correct\":\"1\"},{\"question\":\"Which of these is a measure of the size of an economy?\",\"choices\":[\"Purchasing Power Index\",\"Unemployment Rate\",\"Gross Domestic Product\"],\"correct\":\"2\"}]",
"correctArray" : "[\"0\",\"1\",\"0\",\"1\",\"2\"]",
"author" : "jake2",
"createTime" : ISODate("2016-09-14T04:22:33.756Z"),
"tags" : [
"Finance"
],
"likes" : 0,
"avgScore" : 0,
"plays" : 10,
"private" : "0"
}
{
"_id" : ObjectId("57d8d24073099cb013b462b8"),
"title" : "Astronomy Pop Quiz",
"quiz" : "[{\"question\":\"Which of the following are currently (as of November 2015) used by scientists as observational evidence of the existence of dark matter?\",\"choices\":[\"Gravitational Lensing\",\"Specimens of dark matter collected by NASA\",\"Anomalies in planetary orbits\"],\"correct\":\"0\"},{\"question\":\"Which of these emits the most energy?\",\"choices\":[\"Stars\",\"Quasars\",\"Black Holes\"],\"correct\":\"1\"},{\"question\":\"What is it called when light or electromagnetic radiation from an object is increased in wavelength?\",\"choices\":[\"The Jupiter Effect\",\"Redshift\",\"The Observer's Differential\"],\"correct\":\"1\"},{\"question\":\"Who was the first human in space?\",\"choices\":[\"Yuri Gagarin\",\"Alan Shepard\",\"John Glenn\"],\"correct\":\"0\"},{\"question\":\"Which of these is the most dense?\",\"choices\":[\"The Sun\",\"A neutron star\",\"Earth\"],\"correct\":\"1\"}]",
"correctArray" : "[\"0\",\"1\",\"1\",\"0\",\"1\"]",
"author" : "Bertram",
"createTime" : ISODate("2016-09-14T04:29:52.636Z"),
"tags" : [
"Astronomy"
],
"likes" : 1,
"avgScore" : 0,
"plays" : 5,
"private" : "0"
}
{
"_id" : ObjectId("57d8d3c173099cb013b462ba"),
"title" : "Film Trivia",
"quiz" : "[{\"question\":\"Who directed The Godfather trilogy?\",\"choices\":[\"John Huston\",\"Francis Ford Coppola\",\"Martin Scorsese\"],\"correct\":\"1\"},{\"question\":\"What year was the first Ocscar awarded?\",\"choices\":[\"1923\",\"1927\",\"1932\"],\"correct\":\"1\"},{\"question\":\"As of 2010, this and Schindler's List (1993) are the only films to win Best Picture, Director and Screenplay at the Golden Globes, BAFTAs and the Oscars.\",\"choices\":[\"Rain Man\",\"Slumdog Millionaire\",\"Titanic\"],\"correct\":\"1\"},{\"question\":\"In Casablanca, why can't Rick return to America?\",\"choices\":[\"He is indebted to the mob.\",\"He was deported.\",\"No reason is given.\"],\"correct\":\"2\"},{\"question\":\"What was the highest-grossing Western of all time?\",\"choices\":[\"Django Unchained\",\"True Grit\",\"Dances with Wolves\"],\"correct\":\"2\"}]",
"correctArray" : "[\"1\",\"1\",\"1\",\"2\",\"2\"]",
"author" : "Pappy2",
"createTime" : ISODate("2016-09-14T04:36:17.950Z"),
"tags" : [
"Movies"
],
"likes" : 1,
"avgScore" : 0,
"plays" : 8,
"private" : "0"
}
{
"_id" : ObjectId("57ea7f67a58303f01a585e55"),
"title" : "US History Concepts",
"quiz" : "[{\"question\":\"\",\"choices\":[\"\",\"\",\"\"]}]",
"correctArray" : "[]",
"author" : "martha",
"createTime" : ISODate("2016-09-27T14:17:11.627Z"),
"tags" : [
"US History",
"History"
],
"likes" : 0,
"avgScore" : 0,
"plays" : 1,
"private" : "0"
}

You can try the following aggregation pipeline.
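db.quizzes.aggregate([
// Emit one document per (quiz, tag) pair
{"$unwind": "$tags"},
// Count the documents for each distinct tag
{"$group": {"_id": "$tags", "count": {"$sum": 1}}},
// Reshape each group into {tag, count}
{"$project": {"_id": 0, "tags": {"tag": "$_id", "count": "$count"}}},
// Collect all {tag, count} pairs into a single array
{"$group": {"_id": null, "tags": {"$push": "$tags"}}},
{"$project": {"_id": 0, "tags": 1}}
])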
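To make the stages of the pipeline easier to follow, here is a rough in-memory equivalent in plain JavaScript. It is only a sketch of the logic, not something to run against the database. The function name countTags is made up for this example, and the sample documents keep only the title and tags fields of three quizzes from the question.

```javascript
// Sample documents reduced to the fields the pipeline actually reads.
const quizzes = [
  { title: "Presidential Trivia", tags: ["US Presidents", "American History", "History"] },
  { title: "Finance Quiz", tags: ["Finance"] },
  { title: "US History Concepts", tags: ["US History", "History"] },
];

// Hypothetical helper mirroring the unwind, group, and push stages.
function countTags(docs) {
  const counts = new Map();
  for (const doc of docs) {
    // Like $unwind: visit every tag of every document.
    for (const tag of doc.tags || []) {
      // Like $group with $sum: 1, keyed by the tag.
      counts.set(tag, (counts.get(tag) || 0) + 1);
    }
  }
  // Like the final $group with $push: one result holding an array.
  return { tags: Array.from(counts, ([tag, count]) => ({ tag, count })) };
}

const tagCounts = countTags(quizzes);
console.log(JSON.stringify(tagCounts));
```

The Map keeps tags in first-seen order here; the aggregation pipeline makes no such promise about the order of the groups unless a $sort stage is added.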

mongodb filtering the first subarray in several documents

I have several MongoDB documents like these:
{
"_id" : 22,
"stock" : [
{
"id" : "41u",
"qty" : 10,
"price":12
},
{
"id" : "65u",
"qty" : 14,
"price":37
}
]
}
{
"_id" : 52,
"stock" : [
{
"id" : "34u",
"qty" : 10,
"price":33
},
{
"id" : "89u",
"qty" : 14,
"price":96
}
]
}
For each document I need to find the minimum element of the stock array, so the result should be:
{
"_id" : 22,
"stock" : [
{
"id" : "41u",
"qty" : 10,
"price":12
}
]
}
{
"_id" : 52,
"stock" : [
{
"id" : "34u",
"qty" : 10,
"price":33
}
]
}
I am working with MongoDB again.
In the MongoDB documentation I found examples of mapReduce.
I welcome your comments.

First of all, map-reduce jobs are intended for distributed systems and large datasets. According to MongoDB's documentation:
For most aggregation operations, the Aggregation Pipeline provides
better performance and more coherent interface. However, map-reduce
operations provide some flexibility that is not presently available in
the aggregation pipeline.
Since you are not applying complex functions, and I am assuming you are not running a cluster (in other words, a distributed system), you can use the Aggregation Framework (correct me if my assumption is wrong).
Now coming to your question, please check this similar question.
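As a rough illustration of what the result should look like, here is a plain JavaScript sketch that keeps only the minimum stock entry per document. It assumes that "minimum element" means the entry with the lowest price, which the question does not state explicitly. The function name keepCheapest is made up for this example. The pipeline at the end is an untested suggestion of aggregation stages that should do roughly the same on the server; the collection name is not given in the question.

```javascript
// Sample documents copied from the question.
const docs = [
  { _id: 22, stock: [{ id: "41u", qty: 10, price: 12 }, { id: "65u", qty: 14, price: 37 }] },
  { _id: 52, stock: [{ id: "34u", qty: 10, price: 33 }, { id: "89u", qty: 14, price: 96 }] },
];

// Assumption: "minimum element" means the stock entry with the lowest price.
function keepCheapest(doc) {
  const stock = doc.stock || [];
  if (stock.length === 0) return { ...doc, stock: [] };
  const cheapest = stock.reduce((best, item) => (item.price < best.price ? item : best));
  return { ...doc, stock: [cheapest] };
}

const cheapestPerDoc = docs.map(keepCheapest);
console.log(JSON.stringify(cheapestPerDoc));

// Roughly equivalent aggregation stages. With these stages "stock" comes
// back as a single sub-document rather than a one-element array.
const pipeline = [
  { $unwind: "$stock" },
  { $sort: { "stock.price": 1 } },
  { $group: { _id: "$_id", stock: { $first: "$stock" } } },
];
```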

Update a document (quickly) - MongoDB

Consider this MongoDB document:
{
"_id" : "RMa.103",
"official_name" : "Real Madrid Club de Fùtbol",
"country" : "Spain",
"started_by" : {
"day" : 6,
"month" : 3,
"year" : 1902
},
"stadium" : {
"name" : "Santiago Bernabeu",
"capacity" : 85454
},
"palmarès" : {
"La Liga" : 32,
"Copa del Rey" : 19,
"Supercopa de Espana" : 9,
"UEFA Champions League" : 10,
"UEFA Europa League" : 2,
"UEFA Super Cup" : 2,
"FIFA Club World cup" : 4
},
"uniform" : "white"
}
I forgot to include an important piece of information about the team: its common name.
So, I updated the document:
[1] db.team.update({_id:"RMa.103"}, {$set:{common_name:"Real Madrid"}})
This way, the new field is added at the end of the document, but I want it right after official_name:
{
"_id" : "RMa.103",
"official_name" : "Real Madrid Club de Fùtbol",
"common_name" : "Real Madrid"
.......
.......
.......
}
I know that if I update the document in the following way, the common name of the team ends up in the right place:
db.team.update(
{_id:"RMa.103"}, {$set:{ "_id" : "RMa.103",
"official_name": "Real Madrid Club de Fùtbol",
common_name:"Real Madrid", "country" : "Spain",
"started_by" : { "day" : 6, "month" : 3, "year" : 1902 },
"stadium" : { "name" : "Santiago Bernabeu", "capacity" : 85454 },
"palmarès" : { "La Liga" : 32, "Copa del Rey" : 19, "Supercopa de Espana" : 9,
"UEFA Champions League" : 10, "UEFA Europa League" : 2, "UEFA Super Cup" : 2,
"FIFA Club World cup" : 4 }, "uniform" : "white", "common_name" : "Real Madrid" }})
I have to update a lot of documents, and this kind of operation is very tedious to do from the shell. Is there a faster way to do this update? For example, is it possible to modify the update in [1]?

From the JSON documentation:
An object is an unordered set of name/value pairs.
So JSON offers no way to enforce any kind of key order, and the same goes for MongoDB documents.
It would also be pointless: any MongoDB driver will give you the value for a given key, regardless of key order.

No, not really. You should not need the fields in a specific order. When querying from an application, it won't matter what order your values are in. For example:
db.collection("team").find({official_name:"Real Madrid Club de Fùtbol"}).toArray(function(err,results)
{
if (err) throw err;
/*Gets the first result (should only be one)
and prints the common name*/
console.log(results[0].common_name);
});
If you are wondering why there is no way to order it for human eyes, remember that databases are not designed to be interacted with directly. They are supposed to be used in conjunction with an application that presents the data in a human-friendly manner.
I hope I answered your question sufficiently, if not please comment below and I will explain more.

Want to merge two collections in MongoDB using map-reduce

I have two collections, shown below; each product has a reference to a user. I search for products by name, and in the result I want the combined output of the product and its user, using the map-reduce method.
user collection
{
"_id" : ObjectId("52ac5dd1fb670c2007000000"),
"company" : {
"about" : "This is textile machinery dealer",
"contactAddress" : [{
"address" : "abcd",
"city" : "52ac4bc6fb670c1007000000",
"zipcode" : "39as46as80"
},{
"address" : "abcd",
"city" : "52ac4bc6fb670c1007000000",
"zipcode" : "39as46as80"
}],
"fax" : "58784868",
"mainProducts" : "ads,asd,asd",
"mobileNumber" : "9537236588",
"name" : "krishna steels"
},
"user" : ObjectId("52ac4eb7fb670c0c07000000")
}
product collection
{
"_id" : ObjectId("52ac5722fb670cf806000002"),
"category" : "52a2a9cc48a508b80e00001d",
"deliveryTime" : "10 days after received the ",
"price" : {
"minPrice" : "2000",
"maxPrice" : "3000",
"perUnit" : "5288ac6f7c104203e0976851",
"currency" : "INR"
},
"productName" : "New Mobile Solar Charger with Carabiner",
"rejectReason" : "",
"status" : 1,
"user" : ObjectId("52ac4eb7fb670c0c07000000")
}

This cannot be done. MongoDB supports map-reduce on only one collection at a time. You could instead fetch the data and merge it in a Java collection; a couple of days ago I solved a similar problem that way.
Click to see a similar response about joins and multi-collection operations not being supported in MongoDB.

This can be done using two map reduces.
You run your first MR and then you reduce out the second MR onto the results of the first.
You shouldn't do this, though. Map-reduce is not designed for JOINs; in fact, it sounds like you are trying to run this MR with inline output, which is itself a very bad idea.
MRs are not designed to run inline with the application.
You would be better off doing the JOIN elsewhere.
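Doing the JOIN elsewhere usually means doing it in the application. The following is a minimal sketch in plain JavaScript over in-memory arrays, assuming the two collections are linked by their shared "user" field as in the sample documents. The function name joinProductsWithUsers and the "seller" output field are made up for this example, and the ObjectId values are written as plain strings.

```javascript
// Sample data trimmed from the question; ObjectIds shown as strings.
const users = [
  { user: "52ac4eb7fb670c0c07000000", company: { name: "krishna steels", fax: "58784868" } },
];
const products = [
  {
    productName: "New Mobile Solar Charger with Carabiner",
    status: 1,
    user: "52ac4eb7fb670c0c07000000",
  },
];

// Hypothetical helper: search products by name, then attach the user
// document that shares the same "user" reference.
function joinProductsWithUsers(products, users, nameFragment) {
  // Index the user documents once so each lookup is O(1).
  const byUser = new Map(users.map((u) => [String(u.user), u]));
  const needle = nameFragment.toLowerCase();
  return products
    .filter((p) => p.productName.toLowerCase().includes(needle))
    .map((p) => ({ ...p, seller: byUser.get(String(p.user)) || null }));
}

const merged = joinProductsWithUsers(products, users, "solar");
console.log(JSON.stringify(merged));
```

Against a real database this would be two queries: one on the product collection by name, then one on the user collection for the referenced users, followed by the same merge step.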

Get nested fields with MongoDB shell

I have a "users" collection with a "watchlists" field, which has many inner fields; one of them is "arrangeable_values" (the second field within "watchlists").
For each user in the "users" collection, I need to find every "arrangeable_values" within "watchlists".
How can I do that with the MongoDB shell?
Here is an example of the data model:
> db.users.findOne({'nickname': 'superj'})
{
"_id" : ObjectId("4f6c42f6018a590001000001"),
"nickname" : "superj",
"provider" : "github",
"user_hash" : null,
"watchlists" : [
{
"_id" : ObjectId("4f6c42f7018a590001000002"),
"arrangeable_values" : {
"description" : "My introduction presentation to node.js along with sample code at various stages of building a simple RESTful web service with journey, cradle, winston, optimist, and http-console.",
"tag" : "",
"html_url" : "https://github.com/indexzero/nodejs-intro"
},
"avatar_url" : "https://secure.gravatar.com/avatar/d43e8ea63b61e7669ded5b9d3c2e980f?d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-140.png",
"created_at" : ISODate("2011-02-01T10:20:29Z"),
"description" : "My introduction presentation to node.js along with sample code at various stages of building a simple RESTful web service with journey, cradle, winston, optimist, and http-console.",
"fork_" : false,
"forks" : 13,
"html_url" : "https://github.com/indexzero/nodejs-intro",
"pushed_at" : ISODate("2011-09-12T17:54:58Z"),
"searchable_values" : [
"description:my",
"description:introduction",
"description:presentation",
"html_url:indexzero",
"html_url:nodejs",
"html_url:intro"
],
"tags_array" : [ ],
"watchers" : 75
},
{
"_id" : ObjectId("4f6c42f7018a590001000003"),
"arrangeable_values" : {
"description" : "A Backbone alternative idea",
"tag" : "",
"html_url" : "https://github.com/maccman/spine.todos"
},
"avatar_url" : "https://secure.gravatar.com/avatar/baf018e2cc4616e4776d323215c7136c?d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-140.png",
"created_at" : ISODate("2011-03-18T11:03:42Z"),
"description" : "A Backbone alternative idea",
"fork_" : false,
"forks" : 31,
"html_url" : "https://github.com/maccman/spine.todos",
"pushed_at" : ISODate("2011-11-20T22:59:45Z"),
"searchable_values" : [
"description:a",
"description:backbone",
"description:alternative",
"description:idea",
"html_url:https",
"html_url:github",
"html_url:com",
"html_url:maccman",
"html_url:spine",
"html_url:todos"
],
"tags_array" : [ ],
"watchers" : 139
}
]
}

For the document above, the following find() query would extract both the "nickname" of the document, and its associated "arrangeable_values" (where the document is in the users collection):
db.users.find({}, { "nickname" : 1, "watchlists.arrangeable_values" : 1 })
The result you get for your single document example would be:
{ "_id" : ObjectId("4f6c42f6018a590001000001"), "nickname" : "superj",
"watchlists" : [
{ "arrangeable_values" : { "description" : "My introduction presentation to node.js along with sample code at various stages of building a simple RESTful web service with journey, cradle, winston, optimist, and http-console.", "tag" : "", "html_url" : "https://github.com/indexzero/nodejs-intro" } },
{ "arrangeable_values" : { "description" : "A Backbone alternative idea", "tag" : "", "html_url" : "https://github.com/maccman/spine.todos" } }
] }

MongoDB queries return entire documents. You are looking for a field inside an array inside the document, and this will break the find().
The problem here is that any basic find() query will return all matching documents. find() does have the option to return only specific fields, but that will not work with your array of sub-objects. You could return watchlists, but not only the watchlist entries that match.
As it stands, you have two options:
1. Write some client-side code that loops through the documents and does the filtering. Remember that the shell is effectively a JavaScript driver, so you can write code in there.
2. Use the new aggregation framework. This will have a learning curve, but it can effectively extract the sub-items you're looking for.
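The first option, client-side filtering, might look like the sketch below. It is plain JavaScript over an in-memory array standing in for the query results; in the shell the same loop would run over a cursor. The function name arrangeableValuesByUser is made up for this example, and the sample document is a shortened version of the one in the question.

```javascript
// Shortened version of the sample user document from the question.
const users = [
  {
    nickname: "superj",
    watchlists: [
      {
        arrangeable_values: {
          description: "A Backbone alternative idea",
          tag: "",
          html_url: "https://github.com/maccman/spine.todos",
        },
        forks: 31,
        watchers: 139,
      },
      {
        arrangeable_values: {
          description: "Intro to node.js",
          tag: "",
          html_url: "https://github.com/indexzero/nodejs-intro",
        },
        forks: 13,
        watchers: 75,
      },
    ],
  },
];

// Hypothetical helper: for each user, keep the nickname and only the
// arrangeable_values of every watchlist entry.
function arrangeableValuesByUser(users) {
  return users.map((user) => ({
    nickname: user.nickname,
    arrangeable_values: (user.watchlists || []).map((w) => w.arrangeable_values),
  }));
}

const extracted = arrangeableValuesByUser(users);
console.log(JSON.stringify(extracted, null, 2));
```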