Spring Data Mongo - apply unique combination fields in embedded document - mongodb

I'm working with Spring Boot v2.1.3.RELEASE & Spring Data Mongo. In this example, I want to apply uniqueness on email & deptName. The combination of email & deptName must be unique. Also, is there any way to factor email out, since it is repeated in each array object?
I tried the below, but it's not working!
@CompoundIndexes({
    @CompoundIndex(name = "email_deptName_idx", def = "{'email' : 1, 'technologyEmployeeRef.technologyCd' : 1}")
})
Sample Data
{
"_id" : ObjectId("5ec507c72d8c2136245d35ce"),
....
....
"firstName" : "John",
"lastName" : "Doe",
"email" : "john.doe#gmail.com",
.....
.....
.....
"technologyEmployeeRef" : [
{
"technologyCd" : "john.doe#gmail.com",
"technologyName" : "Advisory",
....
.....
"Status" : "A"
},
{
"technologyCd" : "john.doe#gmail.com",
"technologyName" : "Tax",
.....
.....
"Status" : "A"
}
],
"phoneCodes" : [
"+352"
],
....
....
}
Technology.java
@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
@Document
public class Technology {
    @Indexed(name = "technologyCd", unique = true, sparse = true)
    private String technologyCd;
    @Indexed(name = "technologyName", unique = true, sparse = true)
    private String technologyName;
    private String status;
}
EmployeeTechnologyRef.java
@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
public class EmployeeTechnologyRef {
    private String technologyCd;
    private String primaryTechnology;
    private String status;
}
Employee.java
@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
@Document
@CompoundIndexes({
    @CompoundIndex(name = "emp_tech_indx", def = "{'employeeTechnologyRefs.primaryTechnology' : 1, 'employeeTechnologyRefs.technologyCd' : 1}", unique = true, sparse = true)
})
public class Employee {
    private String firstName;
    private String lastName;
    private String email;
    private List<EmployeeTechnologyRef> employeeTechnologyRefs;
}
I used the below code, but it's not giving me any duplicate error. How can we do this?
Technology java8 = Technology.builder().technologyCd("Java").technologyName("Java8").status("A").build();
Technology spring = Technology.builder().technologyCd("Spring").technologyName("Spring Boot2").status("A").build();
List<Technology> technologies = new ArrayList<>();
technologies.add(java8);
technologies.add(spring);
technologyRepository.saveAll(technologies);

EmployeeTechnologyRef t1 = EmployeeTechnologyRef.builder().technologyCd("Java").primaryTechnology("Y")
        .status("A")
        .build();
EmployeeTechnologyRef t2 = EmployeeTechnologyRef.builder().technologyCd("Spring").primaryTechnology("Y")
        .status("A")
        .build();
List<EmployeeTechnologyRef> employeeTechnologyRefs = new ArrayList<>();
employeeTechnologyRefs.add(t1);
employeeTechnologyRefs.add(t2);
employeeTechnologyRefs.add(t1);
Employee employee = Employee.builder().firstName("John").lastName("Kerr").email("john.kerr@gmail.com")
        .employeeTechnologyRefs(employeeTechnologyRefs).build();
employeeRepository.save(employee);

In MongoDB, a unique index ensures that a particular value in a field is not present in more than one document. It does not guarantee that a value is unique across an array within a single document. This is explained in the MongoDB Manual where it discusses unique multikey indexes.
Thus, a unique index will not satisfy your requirement. It will prevent separate documents from containing duplicate combinations, but it will still allow a single document to contain duplicate values across an array.
The best option you have is to change your data model so as to split the array of technologyEmployeeRef objects into separate documents. Splitting it up into separate documents will allow you to use a unique index to enforce uniqueness.
The particular implementation that should be taken for this data model change would depend upon your access pattern (which is out of the scope of this question).
One such way this could be done is to create a TechnologyEmployee collection that has all of the fields that currently exist in the technologyEmployeeRef array. Additionally, this TechnologyEmployee collection would have a field, such as email, which would allow you to associate it with a document in the Employee collection.
Sample Employee Document
{
....
....
"firstName" : "John",
"lastName" : "Doe",
"email" : "john.doe#gmail.com",
.....
.....
.....
}
Sample EmployeeTechnology Document
{
"email" : "john.doe#gmail.com",
"technologyCd" : "Java",
"technologyName" : "Java8",
....
.....
"status" : "A"
}
Index in EmployeeTechnology collection
{'email' : 1, 'technologyCd' : 1}, {unique: true}
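In Spring Data terms, a minimal sketch of that separate collection might look like the following (the class name EmployeeTechnology, the index name, and the Lombok usage mirror the question's style and are assumptions, not a drop-in implementation):
import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.springframework.data.mongodb.core.index.CompoundIndex;
import org.springframework.data.mongodb.core.index.CompoundIndexes;
import org.springframework.data.mongodb.core.mapping.Document;

@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
@Document
@CompoundIndexes({
    // Enforces uniqueness of the email + technologyCd combination, now that
    // each combination lives in its own document rather than in an array
    @CompoundIndex(name = "email_technologyCd_idx",
                   def = "{'email' : 1, 'technologyCd' : 1}", unique = true)
})
public class EmployeeTechnology {
    private String email;          // associates the document with an Employee
    private String technologyCd;
    private String technologyName;
    private String status;
}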
The disadvantage of this approach is that you would need to read from two collections to have all of the data. This drawback may not be a big deal if you rarely need to retrieve the data from both collections at the same time. If you do need all the data, the reads can be sped up through the use of indexes, and further sped up through the use of covered queries.
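For instance, a covered query here would filter and project only fields that are in the index, letting MongoDB answer the query from the index alone. A hedged sketch with the MongoDB Java driver (the db handle and collection name are assumptions):
import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Projections.excludeId;
import static com.mongodb.client.model.Projections.fields;
import static com.mongodb.client.model.Projections.include;

// 'db' is assumed to be a MongoDatabase holding the EmployeeTechnology
// collection. Because the filter and projection use only indexed fields
// (and _id is excluded), the {email: 1, technologyCd: 1} index can
// satisfy this query without touching the documents themselves.
db.getCollection("employeeTechnology")
  .find(eq("email", "john.doe@gmail.com"))
  .projection(fields(include("email", "technologyCd"), excludeId()));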
Another option is to denormalize the data. You would do this by duplicating the Employee data that you need to access at the same time as the Technology data.
Sample Documents
[
{
....
"firstName" : "John",
"lastName" : "Doe",
"email" : "john.doe#gmail.com",
.....
"technologyCd" : "Java",
"technologyName" : "Java8",
....
"status" : "A"
},
{
....
"firstName" : "John",
"lastName" : "Doe",
"email" : "john.doe#gmail.com",
.....
"technologyCd" : "Spring",
"technologyName" : "Spring Boot2",
....
"status" : "A"
}
]
In this MongoDB blog post, they say that:
You’d do this only for fields that are frequently read, get read much more often than they get updated, and where you don’t require strong consistency, since updating a denormalized value is slower, more expensive, and is not atomic.
Or, as you've already mentioned, it may make sense to leave the data model as it is and to perform the check for uniqueness on the application side. This would likely give you the best read performance, but it does come with some disadvantages. First, it will slow down write operations because the application will need to run some checks before it can update the database.
It may be unlikely, but there is also a possibility that you could still end up with duplicates. If there are two back-to-back requests to insert the same EmployeeTechnology object into the array, then the validation of the second request may finish (and pass) before the first request has written to the database. I have seen a similar scenario myself with an application I worked on. Even though the application was checking for uniqueness, if a user double-clicked a submit button there would end up being duplicate entries in the database. In this case, disabling the button on the first click drastically reduced the risk. This small risk may be tolerable, depending on your requirements and the impact of having duplicate entries.
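As a rough sketch of such an application-side check (the service method below is hypothetical; it reuses the Employee model and employeeRepository from the question, and it remains subject to the race condition just described):
// Hypothetical service method: reject a duplicate technologyCd before saving.
// Note: between the check and the save, a concurrent request could still
// insert the same entry, as discussed above.
public void addTechnologyRef(Employee employee, EmployeeTechnologyRef newRef) {
    boolean alreadyPresent = employee.getEmployeeTechnologyRefs().stream()
            .anyMatch(ref -> ref.getTechnologyCd().equals(newRef.getTechnologyCd()));
    if (alreadyPresent) {
        throw new IllegalStateException("Duplicate technology "
                + newRef.getTechnologyCd() + " for employee " + employee.getEmail());
    }
    employee.getEmployeeTechnologyRefs().add(newRef);
    employeeRepository.save(employee);
}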
Which approach makes the most sense largely depends on your access pattern and requirements. Hope this helps.

Related

How to update document in mongo to get performance?

I am new to Spring Data Mongo. I have a scenario where I want to create a Study if it is not already present in MongoDB. If it is already present, then I have to update it with the new values.
I tried the following way, which works fine in my case, but I'm not sure it is the correct/best/advisable way to update as far as performance is concerned.
Could anyone please guide me on this?
public void saveStudy(List<Study> studies) {
    for (Study study : studies) {
        String id = study.getId();
        Study presentInDBStudy = studyRepository.findOne(id);
        // find the document, modify it, and update it with the save() method
        if (presentInDBStudy != null) {
            presentInDBStudy.setTitle(study.getTitle());
            presentInDBStudy.setDescription(study.getDescription());
            presentInDBStudy.setStart(study.getStart());
            presentInDBStudy.setEnd(study.getEnd());
            studyRepository.save(presentInDBStudy);
        } else {
            studyRepository.save(study);
        }
    }
}
You will have to use MongoTemplate.upsert() to achieve this.
You will need to add two more types: an interface StudyRepositoryCustom, and a class that implements it, say StudyRepositoryImpl:
interface StudyRepositoryCustom {
    public WriteResult updateStudy(Study study);
}
Update your current StudyRepository to extend this interface
@Repository
public interface StudyRepository extends MongoRepository<Study, String>, StudyRepositoryCustom {
    // ... Your code as before
}
And add a class that implements StudyRepositoryCustom. This is where we @Autowire our MongoTemplate and provide the implementation for updating a Study, or saving it if it does not exist, using the MongoTemplate.upsert() method.
class StudyRepositoryImpl implements StudyRepositoryCustom {

    @Autowired
    MongoTemplate mongoTemplate;

    public WriteResult updateStudy(Study study) {
        Query searchQuery = new Query(Criteria.where("id").is(study.getId()));
        Update update = Update.update("title", study.getTitle())
                .set("description", study.getDescription())
                .set("start", study.getStart())
                .set("end", study.getEnd());
        return mongoTemplate.upsert(searchQuery, update, Study.class);
    }
}
Kindly note that StudyRepositoryImpl will automatically be picked up by the Spring Data infrastructure, as we've followed the naming convention of suffixing the core repository interface's name with Impl.
Check this example on GitHub for @Autowire-ing a MongoTemplate and using a custom repository as above.
I have not tested the code but it will guide you :-)
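A hypothetical rewrite of the original saveStudy() loop on top of this custom method would then be:
// Sketch: the upsert replaces the find-then-save round trip for each Study
public void saveStudy(List<Study> studies) {
    for (Study study : studies) {
        studyRepository.updateStudy(study); // updates if present, inserts otherwise
    }
}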
You can use the upsert functionality for this, as described in the MongoDB documentation:
https://docs.mongodb.com/v3.2/reference/method/db.collection.update/
You can update your code to use <S extends T> List<S> save(Iterable<S> entities); to save all the entities. Spring's MongoRepository will take care of all possible cases based on the presence of the _id field and its value.
More information here: https://docs.mongodb.com/manual/reference/method/db.collection.save/
This will work just fine for basic save operations. You don't have to load the document for the update. Just set the id and make sure to include all the fields, because the save updates by replacing the existing document.
Simplified Domain Object:
@Document(collection = "study")
public class Study {
    @Id
    private String id;
    private String name;
    private String value;
}
Repository:
public interface StudyRepository extends MongoRepository<Study, String> {}
Imagine you have an existing record with _id = "1".
Collection state before:
{
"_id" : 1,
"_class" : "com.mongo.Study",
"name" : "saveType",
"value" : "insert"
}
Run all the possible cases:
public void saveStudies() {
    List<Study> studies = new ArrayList<Study>();

    // Updates the existing record by replacing it with the below values.
    Study update = new Study();
    update.setId("1");
    update.setName("saveType");
    update.setValue("update");
    studies.add(update);

    // Inserts a new record.
    Study insert = new Study();
    insert.setName("saveType");
    insert.setValue("insert");
    studies.add(insert);

    // Upserts a record.
    Study upsert = new Study();
    upsert.setId("2");
    upsert.setName("saveType");
    upsert.setValue("upsert");
    studies.add(upsert);

    studyRepository.save(studies);
}
Collection state after:
{
"_id" : 1,
"_class" : "com.mongo.Study",
"name" : "saveType",
"value" : "update"
}
{
"_id" : 3,
"_class" : "com.mongo.Study",
"name" : "saveType",
"value" : "insert"
}
{
"_id" : 2,
"_class" : "com.mongo.Study",
"name" : "saveType",
"value" : "upsert"
}

Mongo sub-document, good practice or not?

After a few months working with Mongo, I am trying to understand whether using sub-documents for nested data is good or not, especially in this example.
Assume a users collection where each document in it has the following:
{
"_id" : ObjectId("some valid Object ID"),
"userName" : "xxxxx",
"email" : "xxxx#xxxx.xx"
}
Now, in my system there are also rooms (another collection), and I want to save for each user the scores per room.
In my mind, to do that I have 2 major options: (1) create a new collection called userScores that will hold userId, roomId and scores fields, like I did previously in MySQL and other relational DBs, or (2) create a sub-document in the above user document:
{
"_id" : ObjectId("sdfdfdfdfdf"),
"userName" : "xxxxx",
"email" : "xxxx#xxxx.xx",
"scores": {
"roomIdX": 50,
"roomIdY": 50,
"roomIdZ": 50
}
}
Which do you think is the better way, so that later I can handle searches, aggregations and other data queries via the code (Mongoose in my case)?
Thanks.

Complex Grid Implementation in meteor(blaze)

First let me explain the schema of my collections.
I have 3 collections:
company, deal, price
I want to use information from all three collections to make a single reactive, responsive table. Here is the image:
Now the schema for the price collection is like this:
{
"_id" : "kSqH7QydFnPFHQmQH",
"timestamp" : ISODate("2015-10-11T11:49:50.241Z"),
"dealId" : "X5zTJ2y675PjmaLMx",
"deal" : "Games",
"price" : [{
"type" : "worth",
"value" : "Bat"
}, {
"type" : "Persons",
"value" : 4
}, {
"type" : "Cost",
"value" : 5
}],
"company" : "Company1"
}
Schema for company collection is
{
"_id" : "da2da"
"name" : "Company1"
}
Schema for deal collection is
{
"_id" : "X5zTJ2y675PjmaLMx",
"name" : "Games"
}
For each company there will be 3 columns added to the table (worth, persons, cost).
For each deal there will be a new row in the table.
The information comes from 3 collections into a single table. So first I want to ask: is it wise to make a table from 3 different collections? If yes, how could I do that in Blaze?
If not, then I will have to make the table from the price collection only. What would be the best schema for that collection?
P.S. In both cases I want the table to be reactive.
Firstly, I recommend reywood:publish-composite for publishing related collections.
Secondly, there is no intrinsic problem in setting up a table like this. You'll first figure out which collection to loop over with your {{#each}} in Spacebars, and then you'll define helpers that return the values from the related collections to your templates.
As far as your schema design, the choice as to whether to use nesting within a collection vs. using an entirely separate collection is typically driven by size. If the related object is overall "small", then nesting can work well: you automatically get the nested object when you publish and query that collection. If, on the other hand, it's going to be "large", and/or you want to avoid having to update every document when something in the related object changes, then a separate collection can be better.
If you do separate your collections then you'll want to refer to objects from the other collection by _id and not by name since names can easily change. For example in your price collection you'd want to use companyId: "da2da" instead of company: "Company1"

The correct way of storing document reference in one-to-one relationship in MongoDB

I have two MongoDB collections, user and customer, which are in a one-to-one relationship. I'm new to MongoDB and I'm trying to insert documents manually, although I have Mongoose installed. I'm not sure which is the correct way of storing a document reference in MongoDB.
I'm using a normalized data model, and here is my Mongoose schema snapshot for customer:
/** Parent user object */
user: {
type: Schema.Types.ObjectId,
ref: "User",
required: true
}
user
{
"_id" : ObjectId("547d5c1b1e42bd0423a75781"),
"name" : "john",
"email" : "test#localhost.com",
"phone" : "01022223333",
}
I want to make a reference to this user document from the customer document. Which of the following is correct - (A) or (B)?
customer (A)
{
"_id" : ObjectId("547d916a660729dd531f145d"),
"birthday" : "1983-06-28",
"zipcode" : "12345",
"address" : "1, Main Street",
"user" : ObjectId("547d5c1b1e42bd0423a75781")
}
customer (B)
{
"_id" : ObjectId("547d916a660729dd531f145d"),
"birthday" : "1983-06-28",
"zipcode" : "12345",
"address" : "1, Main Street",
"user" : {
"_id" : ObjectId("547d5c1b1e42bd0423a75781")
}
}
Remember these things
Embedding is better for...
Small subdocuments
Data that does not change regularly
When eventual consistency is acceptable
Documents that grow by a small amount
Data that you’ll often need to perform a second query to fetch
Fast reads
References are better for...
Large subdocuments
Volatile data
When immediate consistency is necessary
Documents that grow a large amount
Data that you’ll often exclude from the results
Fast writes
Variant A is better.
You can also use populate with Mongoose.
Use variant A. As long as you don't want to denormalize any other data (like the user's name), there's no need to create a child object.
This also avoids unexpected complexities with the index, because indexing an object might not behave like you expect.
Even if you were to embed an object, _id would be a weird name - _id is only a reserved name for a first-class database document.
One-to-one relations
1:1 relations are relations where each item corresponds to exactly one other item, e.g.:
an employee has a resume and vice versa
a building has a floor plan and vice versa
a patient has a medical history and vice versa
//employee
{
_id : '25',
name: 'john doe',
resume: '30'
}
//resume
{
_id : '30',
jobs: [....],
education: [...],
employee: '25'
}
We can model the employee-resume relation by having a collection of employees and a collection of resumes, and having the employee point to the resume through linking, where we have an ID that corresponds to an ID in the resume collection. Or, if we prefer, we can link in the other direction, with an employee key inside the resume collection that points back to the employee. Or, if we want, we can embed: we could take the entire resume document and embed it right inside the employee collection, or vice versa.
This embedding depends upon how the data is being accessed by the application and how frequently the data is being accessed. We need to consider:
frequency of access
the size of the items - what is growing all the time and what is not growing. Every time we add something to a document, there is a point beyond which the document needs to be moved in the collection, and a document cannot exceed the 16 MB size limit (which is mostly unlikely to be reached).
atomicity of data - there are no transactions in MongoDB, but there are atomic operations on individual documents. So if we knew that we couldn't withstand any inconsistency and that we wanted to be able to update the entire employee plus the resume all the time, we may decide to put them into the same document and embed them one way or the other so that we can update it all at once (see the sketch below).
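As a hedged illustration of that last point with the MongoDB Java driver (the collection variable and field values are hypothetical): because the resume is embedded in the employee document, a name change and a resume change land in one atomic, single-document operation.
import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Updates.combine;
import static com.mongodb.client.model.Updates.set;

// 'employees' is assumed to be a MongoCollection<Document> whose documents
// embed the resume. updateOne() touches a single document, so both fields
// below are updated atomically together.
employees.updateOne(
        eq("_id", "25"),
        combine(set("name", "john doe"),
                set("resume.education", java.util.Arrays.asList("BSc", "MSc"))));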
In MongoDB it is highly recommended to embed documents as much as you can, especially in a case like yours with 1-to-1 relations.
Why? There are no atomic join operations for your queries (though that is not the main reason). The best reason is that each join (theoretically) needs a hard seek, which takes about 20 ms, whereas embedding your sub-document needs just one hard seek.
I believe the best DB schema for you is to use just one document, with one id, for all of your entities:
{
_id : ObjectId("547d5c1b1e42bd0423a75781"),
userInfo :
{
"name" : "john",
"email" : "test#localhost.com",
"phone" : "01022223333",
},
customerInfo :
{
"birthday" : "1983-06-28",
"zipcode" : "12345",
"address" : "1, Main Street",
},
staffInfo :
{
........
}
}
Now if you just want the userInfo you can use
db.users.findOne({_id : ObjectId("547d5c1b1e42bd0423a75781")},{userInfo : 1}).userInfo;
it will give you just the userInfo:
/* 0 */
{
"name" : "john",
"email" : "test#localhost.com",
"phone" : "01022223333"
}
And if you just want the customerInfo you can use
db.users.findOne({_id : ObjectId("547d5c1b1e42bd0423a75781")},{customerInfo : 1}).customerInfo;
it will give you just the customerInfo :
/* 0 */
{
"birthday" : "1983-06-28",
"zipcode" : "12345",
"address" : "1, Main Street"
}
and so on.
This schema requires the minimum number of hard round-trips, and you are actually using MongoDB's document-based features with the best performance you can achieve.

MongoDB schema design - finding the last X comments across all blog posts filtered by user

I am trying to reproduce the classic blog schema of one Post to many Comments using Morphia and the Play Framework.
My schema in Mongo is:
{ "_id" : ObjectId("4d941c960c68c4e20d6a9abf"),
"className" : "models.Post",
"title" : "An amazing blog post",
"comments" : [
{
"commentDate" : NumberLong("1301552278491"),
"commenter" : {
"$ref" : "SiteUser",
"$id" : ObjectId("4d941c960c68c4e20c6a9abf")
},
"comment" : "What a blog post!"
},
{
"commentDate" : NumberLong("1301552278492"),
"commenter" : {
"$ref" : "SiteUser",
"$id" : ObjectId("4d941c960c68c4e20c6a9abf")
},
"comment" : "This is another comment"
}
]}
I am trying to introduce a social networking aspect to the blog, so I would like to be able to provide on a SiteUser's homepage the last X comments by that SiteUser's friends, across all posts.
My models are as follows:
@Entity
public class Post extends Model {
    public String title;
    @Embedded
    public List<Comment> comments;
}

@Embedded
public class Comment extends Model {
    public long commentDate;
    public String comment;
    @Reference
    public SiteUser commenter;
}
From what I have read elsewhere, I think I need to run the following against the database (where [a, b, c] represents the SiteUsers) :
db.posts.find( { "comments.commenter" : {$in: [a, b, c]}} )
I have a List<SiteUser> to pass to Morphia for the filtering, but I don't know how to:
set up an index on Post for comments.commenter from within Morphia
actually build the above query
Either put @Indexes(@Index("comments.commenter")) on the Post class, or @Indexed on the commenter field of the Comment class (Morphia's Datastore.ensureIndexes() will recurse into the classes and correctly create the comments.commenter index on the Post collection).
I think ds.find(Post.class, "comments.commenter in", users) would work, ds being a Datastore and users your List<SiteUser> (I don't use @Reference though, so I can't confirm; you might have to first extract the list of their Keys).
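Putting those two pieces together, an untested sketch against the classic Morphia API (reusing the models above; treat the names as assumptions) might look like:
// Index declared on the entity; Datastore.ensureIndexes() creates the
// comments.commenter index on the Post collection.
@Entity
@Indexes(@Index("comments.commenter"))
public class Post extends Model {
    public String title;
    @Embedded
    public List<Comment> comments;
}

// Query for posts containing comments by any of the given friends.
// 'ds' is a Morphia Datastore, 'friends' the List<SiteUser> to filter on;
// as noted above, you may need to pass their Keys instead of the entities.
List<Post> posts = ds.find(Post.class, "comments.commenter in", friends).asList();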