I have a graph with approximately nine million nodes, and twelve million relationships. For each of the nodes within the graph there is a subset of properties for each respective node which forms a unique identity for the node, by Label. The graph is being updated by various data sources which augment existing nodes within the graph, or create new nodes if the nodes don't exist. I don't want to create duplicates according to the unique set of properties within the graph during the update.
For example, I have People in the graph, and their uniqueness is determined by their first name and last name. The following code is to create two distinct people:
CREATE (p:Person{first:"barry",last:"smith",height:187});
CREATE (p:Person{first:"fred",last:"jones",language:"welsh"});
Later, from one of the data sources I receive the following data records (one per line):
first: "fred", last: "lake", height: 201
first: "barry", last: "smith", language: "english"
first: "fred", last: "jones", language: "welsh", height: 188
first: "fred", last: "jones", eyes: "brown"
first: "barry", last: "smith"
After updating the graph I want to have the following nodes:
(:Person{first:"fred",last:"jones",language:"welsh",height:"188,eyes:"brown"})
(:Person{first:"barry",last:"smith",language"english",height:187})
(:Person{first:"fred",last"lake",height:201})
I'm trying to formulate a MERGE query which can do this kind of update. I have come up with the following approach:
Start with a MERGE that uses the uniqueness properties (first and last from the example) to find or create the initial node;
Then do a SET containing each property defined in the incoming record.
So, for the three examples records given above:
MERGE (p:Person{first:"fred",last:"lake"}) SET p.height = 201;
MERGE (p:Person{first:"barry",last:"smith"}) SET p.language = "english";
MERGE (p:Person{first:"fred",last:"jones"}) SET p.language = "welsh", p.height = 188;
MERGE (p:Person{first:"fred",last:"jones"}) SET p.eyes = "brown";
MERGE (p:Person{first:"barry",last:"smith"});
I've tried this out and it works, but I'm curious to know whether this is the best way (most efficient...) to ensure uniqueness in the nodes based on a set of properties, and allow additional information to be added (or not) as updates come in over time?
Just a naive approach: what if you run a MERGE and just create or update it?
Given your list of records, consider each record as a map:
{ first: "fred", last: "lake", height: 201 }
{ first: "barry", last: "smith", language: "english" }
{ first: "fred", last: "jones", language: "welsh", height: 188 }
{ first: "fred", last: "jones", eyes: "brown" }
{ first: "barry", last: "smith" }
Then write your query in a parametric way:
MERGE (p:Person { first: { map }.first, last: { map }.last })
ON CREATE SET p = { map }
ON MATCH SET p += { map }
Description of the query:
In case of creation it should create a new node using all the properties passed in the {map}
In case of matching it should add new properties to the node without deleting any
I've run some queries in console of the page linked above with a MERGE ON MATCH and it seems to update existing properties to new values.
The queries I've run are the following:
MATCH (peter { name: 'Peter' }) RETURN peter
MERGE (peter { name: 'Peter' }) ON MATCH SET peter += { hungry: TRUE , position: 'Entrepreneur' }
MATCH (peter { name: 'Peter' }) RETURN peter
// added two new properties here
MERGE (peter { name: 'Peter' }) ON MATCH SET peter += { hungry: FALSE , position: 'Entrepreneur' }
MATCH (peter { name: 'Peter' }) RETURN peter
// hungry is now false in here
I'd say that this is the best way. Depending on the Neo4j interface you are using, you could write a single query that would handle everything without custom SET commands, but I'm guessing that you were just simplifying the question and have that covered.
Related
I'm new to MongoDb so I'm not sure what's the best approach regarding the following:
I have a MongoDb document which contains multiple fields, including a map/dictionary.
e.g. -> priceHistogram:
rents {
_id:"1234",
city:"London",
currentPrice:"500",
priceHistogram: {"14-02-2021" : "500"}
}
I would like to update the currentPrice field with the latest price but also add to the price histogram taday's date and the price'; e.g. if today's price would be 600, I would like to obtain the following:
rents {
_id:"1234",
city:"London",
currentPrice:"600",
priceHistogram: {"14-02-2021" : "500", "20-02-2021" ": "600"}
}
What would be the most efficient MongoDb function/approach allowing me to achieve this (everything else remains the same - _id/city)?
Thank you
Not sure how your schema looks like, I will assume that the schema looks similar to:
const rentsSchema = mongoose.Schema(
{
city: { type: String, required: true },
currentPrice: {type: String},
priceHistogram: {type: Map, of:String}
}
)
const rents = mongoose.model('Rents', histSchema);
And the update:
rents.updateOne({city:"London"},{
currentPrice:"600",
"priceHistogram.24-02-2021": "600"
})
Since as I have understood Map is another way to add arbitrary properties.
I am trying to implement a collection in meteor/mongo which is of following nature:
FIRST_NAME-------LAST_NAME-------------CLASSES----------PROFESSORS
----------A-----------------------B------------------------------a---------------------b
-------------------------------------------------------------------c---------------------d
-------------------------------------------------------------------e---------------------f
-------------------------------------------------------------------g---------------------h
-------------M-------------------------N------------------------c---------------------d
-------------------------------------------------------------------p---------------------q
-------------------------------------------------------------------x---------------------q
-------------------------------------------------------------------m---------------------n
-------------------------------------------------------------------r---------------------d
So as above, a person can take multiple classes and a class can have multiple people. Now, I want to make this collection searchable and sortable by all possible fields. (Also that one professor can teach multiple classes.)
Searching by FIRST_NAME and LAST_NAME is easy in above shown model. But, I should be able to see all student depending on the class I select. I would also want to see list of classes sorted in alphabetical order and also the people enrolled in corresponding classes?
Can you please let me know how to approach this in a meteor/mongo style? I would also be glad if you could lead me to any resources available on this?
You are describing one of the typical data structures which are better suited for a relational database. But don't worry. For reasonably sized data sets it is quite workable in MongoDB too.
When modelling this type of structure in a document database you use embedding, which does lead to data duplication, but this data duplication is typically not a problem.
Pseudo-code for your model:
Collection schoolClass: { // Avoid the reserved word "class"
_id: string,
name: string,
students: [ { _id: string, firstName: string, lastName: string } ],
professor: { _id: string, firstName: string, lastName: string }
}
Collection student: {
_id: string,
firstName: string,
lastName: string,
classes: [ { _id: string, name: string } ]
}
Collection professor: {
_id: string,
firstName: string,
lastName: string,
classes: [ { _id: string, name: string } ]
}
This gives you easily searchable/sortable entry points to all objects. You only follow the "relation" _id to the next collection if you need some special data from an object. All data needed for all documents in the common queries should be present in the Collection the query is run on.
You just need to make sure you update all the relevant collections when an object changes.
A good read is https://docs.mongodb.com/manual/core/data-modeling-introduction/
I am very new to meteor.js and try to build an application with it. This time I wanted to try it over MEAN stack but at this point I am struggled to understand how to join two collection on server side...
I want very identical behaviour like mongodb populate to fetch some properties of inner document.
Let me tell you about my collection it is something like this
{
name: 'Name',
lastName: 'LastName',
anotherObject: '_id of another object'
}
and another object has some fields
{
neededField1: 'asd',
neededField2: 'zxc',
notNeededField: 'qwe'
}
So whenever I made a REST call to retrieve the first object I want it contains only neededFields of inner object so I need join them at backend but I cannot find a proper way to do it.
So far while searching it I saw some packages here is the list
Meteor Collections Helper
Publish with Relations
Reactive joins in Meteor (article)
Joins in Meteor.js (article)
Meteor Publish Composite
You will find the reywood:publish-composite useful for "joining" related collections even though SQL-like joins are not really practical in Mongo and Meteor. What you'll end up with is the appropriate documents and fields from each collection.
Using myCollection and otherCollection as pseudonyms for your two collections:
Meteor.publishComposite('pseudoJoin', {
find: function() {
return myCollection.find();
},
children: [
{
find: function(doc) {
return otherCollection.find(
{ _id: post.anotherObject },
{ fields: { neededField1: 1, neededField2: 1 } });
}
}
]
});
Note that the _id field of the otherCollection will be included automatically even though it isn't in the list of fields.
Update based on comments
Since you're only looking to return data to a REST call you don't have to worry about cursors or reactivity.
var myArray = myCollection.find().fetch();
var myOtherObject = {};
var joinedArray = myArray.map(function(el){
myOtherObject = otherCollection.findOne({ _id: el.anotherObject });
return {
_id: el._id,
name: el.name,
lastName: el.lastName,
neededField1: myOtherObject.neededField1,
neededField2: myOtherObject.neededField2
}
});
console.log(joinedArray); // These should be the droids you're looking for
This is based on a 1:1 relation. If there are many related objects then you have to repeat the parent object to the number of children.
If I have an incoming JSON of following structure
[
{
"personId" : 12,
"name": "John Doe",
"age": 48,
"birthdate": "12/1/1954",
"relationships": [
{
"relationType":"parentOf",
"value" : "Johnny walker"
},
{
"relationType":"sonOf",
"value" : "Charles Smith"
}
]
},
{
"personId" : 13,
"name": "Merry Christmas",
"age": 28,
"birthdate": "12/1/1985",
"relationships": [
{
"relationType":"sisteOf",
"value" : "Will Smith"
},
{
"relationType":"cousinOf",
"value" : "Brad Pitt"
}
]
}
]
And requirement is that for each Person record controller will have to carve out relationships array and store each record from it in a separate relationship table with personId association while persisting this incoming JSON.
And subsequently when querying these persons records system will have to lookup relationships for each person from relationships table and inject them to form the same above looking JSON to give back to UI for rendering.
What's the best efficient way to perform this "carve out" and later "inject" operations using Play framework in Scala? (using Slick in persistent layer APIs) I have looked at this JSON transformation link and json.pickBranch in there but not quite sure if that'll be fully applicable here for "carve out" and "inject" use cases for preparing JSON shown in the example. are there any elegant ideas?
One way, which is pretty straightforward, is to use case classes along with Play Json inceptions
import play.api.libs.json.Json
case class Relationship(relationType: String, value: String)
object Relationship {
implicit val RelationshipFormatter = Json.format[Relationship]
}
case class Person(personId: String, name: String, age: Int, birthdate: String, relationships: Seq[Relationship]) {
def withRelationships(relationship: Seq[Relationship]) = copy(relationships = relationships ++ relationship)
}
object Person {
implicit val PersonFormatter = Json.format[Person]
}
Now you can convert a json value to Person by using the following code, provided that jsValue is a json value of type JsValue (which in play controllers you can get by request.body.asJson):
Json.fromJson[Person](jsValue)
In Play controllers, you can
For converting a Person to json you can use the following code provided that person is a value of type Person in your context:
Json.toJson(person)
The only remaining thing is your Slick schemas which is pretty straight forward.
One option is to use a simple schema for Person, without relations and one schema for Relation with a foreign key to Person table. Then you need to find all relations associated with a specific Person, namely person and then append them to that person by calling the withRelationships method which gives you a new Person which you can serve as json:
val person = .... // find person
val relations = .... // find relationships associated with this person
val json = Json.toJson(person.withRelationships(relations))
var FamilySchema = new Schema({
members: [String],
indexedOn: {
type: Date,
default: Date.now
},
updatedOn: {
type: Date,
default: Date.now
}
});
As a crude example, I have a Family that has many members, so I use a schema like the one shown above. But there can be THOUSANDS of members in one family and a member can be in ONLY one family. So every time I come across a new member, I have to search to see if he belongs to any Families and if he does, add him. If he doesn't, I have to create a new family and add him.
This seems like an extremely inefficient way to do things. Is there a better design for this sort of use case?
You could use an array and index the field of members.
Or, here's a very common MongoDB modeling technique that avoids using an array (and means that you can have richer structures for a given family member). Create a Family and a FamilyMember. As you said that each family member may only be in one family, you would add a field to the FamilyMemberSchema as a reference to the Family (using ref as shown below).
var FamilySchema = new Schema({
name: String,
indexedOn: {
type: Date,
default: Date.now
},
updatedOn: {
type: Date,
default: Date.now
}
});
var FamilyMemberSchema = new Schema({
name: String,
family_id: { type: Schema.Types.ObjectId, ref: 'Family' }
});
// you might want an index on these fields
FamilyMemberSchema.index({ family_id: 1, name: 1});
var Family = mongoose.Model('Family', FamilySchema);
var FamilyMember = mongoose.Model('FamilyMember', FamilyMemberSchema);
You could then use a query to fetch all Family Members for a particular family:
FamilyMember.find().where('family_id', 'AFAMILYID').exec(/* callback */);
You wouldn't need to use the ref much as using the populate functionality wouldn't be particularly useful in your situation (http://mongoosejs.com/docs/populate.html), but it documents the schema definition better, so I'd use it.
You can use two collections, one for families and other for members. You can use a field in members collection in order to link them with one family (by "_id" for instance) of the other collection.
When you have to add new element you can search on "members" collections if the element already exists. An index could help to speed up the query.