How to create one to many relation that will be updated? - mongodb

I have the following entities in a one-to-many relation: one folder can have many files.
folder
{
string id;
string name;
string owner;
}
file
{
string id;
string name;
}
How can I create this relation by taking into consideration the following rules?
The folder is a separate entity that can be updated or displayed on its own, so it can't be part of the file document; it must remain in a separate collection.
When I query all the files, I want them sorted by folder.name. This query must support pagination.
I want to display only files that are in folders owned by the caller.
As I don't want to use join in document database, and I want to get all the data in one query, I was thinking about copying the folder document in the file:
file
{
string id;
string title;
Folder folder;
}
This way, I can run a query that satisfies all the listed rules. I think this would be a perfect solution if the folder name were static, but folder.name is something the user can update. If folder.name is updated, the copy embedded in the file will keep the old, invalid data.
So, what is the correct approach in this situation? Should I implement a mechanism for updating the embedded copy once folder.name is updated?
Update:
I am expecting a lot of queries that list files, and far fewer updates of folder.name. The most important thing for me is a design that allows me to query, sort, and paginate the files collection. So, at this point, I am researching the options and their pros and cons.
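One option the question itself raises is to copy only the folder fields needed for filtering and sorting into each file, and to propagate a rename to those copies. The following is a minimal in-memory sketch of that idea in plain Java, not a MongoDB implementation: FolderFileStore, FileDoc, and all method names are hypothetical, and the lists stand in for the two collections. Against a real database, the rename step would be a multi-document update on the files collection filtered by folder id.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-memory stand-in for the folder and file collections.
final class FolderFileStore {
    record Folder(String id, String name, String owner) {}

    // Each file carries a copy of the folder fields it is filtered and sorted by.
    record FileDoc(String id, String name, String folderId, String folderName, String folderOwner) {}

    private final Map<String, Folder> folders = new HashMap<>();
    private final List<FileDoc> files = new ArrayList<>();

    void addFolder(Folder folder) {
        folders.put(folder.id(), folder);
    }

    void addFile(String id, String name, String folderId) {
        Folder folder = folders.get(folderId);
        files.add(new FileDoc(id, name, folder.id(), folder.name(), folder.owner()));
    }

    // Rename the folder, then fan the new name out to every file that copied it.
    void renameFolder(String folderId, String newName) {
        Folder old = folders.get(folderId);
        folders.put(folderId, new Folder(old.id(), newName, old.owner()));
        files.replaceAll(f -> f.folderId().equals(folderId)
                ? new FileDoc(f.id(), f.name(), f.folderId(), newName, f.folderOwner())
                : f);
    }

    // Touches the files only: filter by owner, sort by folder name, then page.
    List<FileDoc> listFiles(String owner, int page, int pageSize) {
        return files.stream()
                .filter(f -> f.folderOwner().equals(owner))
                .sorted(Comparator.comparing(FileDoc::folderName).thenComparing(FileDoc::id))
                .skip((long) page * pageSize)
                .limit(pageSize)
                .collect(Collectors.toList());
    }
}
```

With many listings and few renames, as described in the update, this shape keeps the frequent read path to a single collection and puts the extra work on the rare rename. The cost is that a rename is no longer a single-document write.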

Related

Simple Tagging Implementation with Spring Data JPA/Rest

I am trying to come up with a way of implementing tags for my entity that works well for me and need some help in the process. Let me write down some requirements I have in mind:
Firstly, I would like tags to show in entities as a list of strings like this:
{
"tags": ["foo", "bar"]
}
Secondly, I need to be able to retrieve a set of available tags across all entities so that users can easily choose from existing tags.
The 2nd requirement could be achieved by creating a Tag entity with the value of the Tag as the @Id. But that would make the tags property in my entity a relation that requires an extra GET operation to fetch. I could work with a getter method that resolves all the Tags and returns only a list of strings, but I see two disadvantages in that: 1. The representation as a list of strings suggests you could store tags by POSTing them in that way, which is not the case. 2. Creating an entity requires creating all the Tags via a /tags endpoint first. That seems rather complicated for such a simple thing.
Also, I think I read somewhere that you shouldn't create a repository for an entity that isn't standalone. Would I create a Tag and only a Tag at any point in time? Nope.
I could store the tags as an @ElementCollection in my entity. In this case I don't know how to fulfill the 2nd requirement, though.
@ElementCollection
private Set<String> tags;
I made a simple test via EntityManager, but it looks like I cannot query things that are not an @Entity in a result set.
@RestController
@RequestMapping("/tagList")
@RequiredArgsConstructor(onConstructor = @__(@Autowired))
public class TagListController implements RepresentationModelProcessor<RepositoryLinksResource> {
    @PersistenceContext
    private final @NonNull EntityManager entityManager;

    @RequestMapping(method = RequestMethod.GET)
    public ResponseEntity<EntityModel<TagList>> get() {
        System.out.println(entityManager.createQuery("SELECT t.tags FROM Training t").getFirstResult());
        EntityModel<TagList> model = EntityModel.of(new TagList(Set.of("foo", "bar")));
        model.add(linkTo(methodOn(TagListController.class).get()).withSelfRel());
        return ResponseEntity.ok(model);
    }
}
org.hibernate.QueryException: not an entity
Does anyone know a smart way?
The representation as a list of strings suggests you could store tags by POSTing them in that way which is not the case
This is precisely the issue with using entities as REST resource representations. They work fine until it turns out the internal representation (entity) does not match the external representation (the missing DTO).
However, it would probably make most sense performance-wise to simply use an @ElementCollection like you mentioned, because you then avoid the double join through a join table that a many-to-many association requires. (You could also use a one-to-many association where the parent entity and the tag value are both part of the @Id to avoid a join table, but I'm not sure it's convenient to work with. It is probably better to just put a UNIQUE(parent_id, TAG) constraint on the collection table, if you need it.) Regarding the "not an entity" error, you would need to use a native query. Assuming that you have @ElementCollection @CollectionTable(name = "TAGS") @Column(name = "TAG") on tags, then SELECT DISTINCT(TAG) FROM TAGS should do the job.
(as a side note, the DISTINCT part of the query will surely introduce some performance penalty, but I would assume the result of that query is a good candidate for caching)

Avoid multiple transactions within single execution when updating entities in MongoDB

I am working with two MongoDB collections, Recipe and Menu. A single Menu is a combination of Recipes. Refer to the code segment below for more information.
@Document
public class Recipe {
    private String id;
    private String name;
    private String description;
    // getter and setter
}

@Document
public class Menu {
    private String id;
    private String name;
    private List<RecipeItem> recipeItem;
    // getter and setter
}

public class RecipeItem {
    private String id;
    private String name;
    private String description;
    // getter and setter
}
RecipeItem is just a copy of the Recipe object that is embedded in the Menu collection.
When a Menu is saved, you can add recipes to it, so a list of Recipe objects is also saved within the Menu collection as RecipeItem entries. When any Recipe is updated, the corresponding RecipeItem in each Menu must be updated as well; otherwise the recipe within the Menu becomes outdated compared to the current Recipe. So I have to find the Menu documents that contain the updated Recipe by ID and update the recipe information inside each of them.
So the update function initiates multiple transactions within a single execution, and therefore we also need a rollback mechanism. I am not very fond of this approach.
I am new to MongoDB and I want to verify whether the current database design of Menu and Recipe is correct. If it is not, what would be the optimal way of doing it? I know that a DBRef between collections can be used, but it has a performance impact.
The Menu document should store a list of Recipe IDs rather than the recipes themselves. Then you can dispense with RecipeItem and use Recipe directly.
It would seem more sensible that a Recipe consists of RecipeItems (Apple tart consists of flour, sugar, eggs, apples etc.).
In any case a reference would remove the need to keep two lists in sync.
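To illustrate the reference-based design, here is a minimal in-memory sketch in plain Java. It is an assumption-laden stand-in, not Spring Data code: MenuByReference, saveRecipe, and recipesOf are hypothetical names, and the map plays the role of the Recipe collection. The point it shows is that an updated Recipe is visible through every Menu without touching any Menu document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the Recipe collection.
final class MenuByReference {
    record Recipe(String id, String name, String description) {}

    // The menu stores recipe IDs only; there is no RecipeItem copy to keep in sync.
    record Menu(String id, String name, List<String> recipeIds) {}

    private final Map<String, Recipe> recipes = new HashMap<>();

    // Insert or update a recipe; menus are not modified.
    void saveRecipe(Recipe recipe) {
        recipes.put(recipe.id(), recipe);
    }

    // Resolve the menu's references at read time, skipping IDs that no longer exist.
    List<Recipe> recipesOf(Menu menu) {
        List<Recipe> resolved = new ArrayList<>();
        for (String recipeId : menu.recipeIds()) {
            Recipe recipe = recipes.get(recipeId);
            if (recipe != null) {
                resolved.add(recipe);
            }
        }
        return resolved;
    }
}
```

The trade-off is one extra lookup when a menu is displayed, in exchange for a recipe update that stays a single-document write.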

EF, Repositories and crossing aggregate boundaries

I have two aggregate roots in my domain, and therefore two repositories. We'll call them BookRepository and AuthorRepository, for the sake of example.
I'm designing an MVC application, and one page has to display a table containing a list of authors, with each row showing the author's personal details. At the end of each row is a small button that can be clicked to expand the row and show a child table detailing the author's published books.
When the page loads, some ajax is executed to retrieve the Author details from an API controller and display the data in the table. Each property in an Author object maps almost directly to a column, with one exception, and this is where I'm having my problem. I want the button at the end of each row to be disabled if and only if the author has no published books. This means that a boolean has to be returned with each Author record, indicating whether they have any published books.
My book repository has a couple of methods like this:
public IEnumerable<Book> GetBooksForAuthor(int authorId);
public bool AnyBooksForAuthor(int authorId);
and my Book class has a property called AuthorId, so I can retrieve a book's author by calling
authorRepository.GetById(book.AuthorId);
My problem is that in order to create a row for my aforementioned table, I need to create it like this:
IEnumerable<Author> authors = authorRepository.GetAll();
foreach (Author author in authors)
{
yield return new AuthorTableRow
{
Name = author.Name,
Age = author.Age,
Location = author.PlaceOfResidence.Name,
HasBooks = this.bookRepository.AnyBooksForAuthor(author.Id)
};
}
The above code seems correct, but there's a fairly hefty performance penalty in calling this.bookRepository.AnyBooksForAuthor(author.Id) for every single author, because it performs a database call each time.
Ideally, I suppose I would want an AuthorTableRowRepository which could perform something like the following:
public IEnumerable<AuthorTableRow> GetAll()
{
    return from a in this.dbContext.Authors
           select new AuthorTableRow
           {
               Name = a.Name,
               Age = a.Age,
               Location = a.PlaceOfResidence.Name,
               HasBooks = a.Books.Any()
           };
}
I'm hesitant to put this in place for these reasons :
AuthorTableRowRepository is a repository of AuthorTableRows, but AuthorTableRow is not a domain object, nor an aggregate root, and therefore should not have its own repository.
As Author and Book are both aggregate roots, I removed the "Books" property from the Author entity, because I wanted the only way to retrieve books to be via the BookRepository. This makes HasBooks = a.Books.Any() impossible. I am unsure whether I am imposing my own misguided best practice here though. It seems wrong to obtain Books by obtaining an Author via the AuthorRepository and then going through its Books property, and vice versa in obtaining an Author via a property on a Book object. Crossing aggregate root boundaries would be the way I'd term it, I suppose?
How would other people solve this? Are my concerns unfounded? I am mostly concerned about the (what should be a) performance hit in the first method, but I want to adhere to best practice with the Repository pattern and DDD.
I would stick to the first approach, but try to optimize things in the BookRepository method. For instance, you can load this information all at once and use an in-memory lookup to speed things up. That way you would need 2 queries in total rather than one per author.
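The two-query idea can be sketched as follows. The sketch is written in Java rather than the question's C#, and every name in it (AuthorRows, build, the records) is hypothetical. It assumes the first query has already returned all authors and the second has returned the AuthorId value of every book, possibly with duplicates.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class AuthorRows {
    record Author(int id, String name) {}

    record AuthorTableRow(String name, boolean hasBooks) {}

    // authors: result of query 1. bookAuthorIds: result of query 2.
    static List<AuthorTableRow> build(List<Author> authors, Collection<Integer> bookAuthorIds) {
        // Build the lookup once so each author costs a hash probe, not a database call.
        Set<Integer> authorsWithBooks = new HashSet<>(bookAuthorIds);
        List<AuthorTableRow> rows = new ArrayList<>();
        for (Author author : authors) {
            rows.add(new AuthorTableRow(author.name(), authorsWithBooks.contains(author.id())));
        }
        return rows;
    }
}
```

This keeps both repositories in place and does not require a Books property on Author; the only new repository method would be one that returns the author IDs in bulk.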
The way I solved this in the end was to create an Entity from a view in the database. I named the entity 'AuthorSummary', and made an AuthorSummaryRepository that didn't contain any Add() methods, just retrieval methods.

Create index in correct collection

I annotate a field of a document with @Indexed(unique = true) like so:
public class ADocumentWithUniqueIndex {
    private static final long serialVersionUID = 1L;

    @Indexed(unique = true)
    private String iAmUnique;

    public String getiAmUnique() {
        return iAmUnique;
    }

    public void setiAmUnique(String iAmUnique) {
        this.iAmUnique = iAmUnique;
    }
}
When saving the object, I specify a custom collection:
MongoOperations mongoDb = ...
mongoDb.save(document, "MyCollection");
As a result I get:
A new document in "MyCollection"
An index in the collection "ADocumentWithUniqueIndex"
How can I create the index in "MyCollection" instead without having to explicitly specify it in the annotation?
BACKGROUND:
The default collection name is too ambiguous in our use case. We cannot guarantee that there won't be two documents with the same name in different packages, so we added the package name to the collection name.
Mapping a document to a collection is dealt with in an infrastructure component.
The implementation details like collection name etc. shouldn't leak into the individual documents.
I understand this is a bit of an "abstraction on top of an abstraction" smell but required since we had to support MongoDb and Windows Azure blob storage. Not anymore though...
This seemed like a fairly standard approach to hide the persistence details in an infrastructure component. Any comments on the approach are appreciated as well.
It's kind of unusual to define the collection for an object to be stored and then expect the index annotations to work. There are a few options you have here:
Use @Document on ADocumentWithUniqueIndex and configure the collection name manually. This will cause all objects of that class to be persisted into that collection, of course.
Manually create indexes via MongoOperations.indexOps() in the collections you'd like to use. This would be more consistent with your approach of manually determining the collection name during persistence operations.

Content tagging with MongoDB

I want to implement content tagging using MongoDB. In a relational database, the best approach would be to have a many-to-many relation between the content (say, "products") and tags tables. But what is best approach with NoSQL databases?
Would it be better to put every tag in a tags array of the "content" document, or put references to tags in a string?
In most cases where you have an n:m relation in MongoDB, you should use embedding instead of referencing. So I would recommend having an array "tags" in each product with the tag names. I assume that looking at a single product will be the most frequent use case in your system. This design will allow you to show the user a product with a list of tag names with a single database query.
When you need some additional metadata about the tags which you don't want to bind to a product (like a long-text description of a tag), you could create an additional tags collection, where the name field gets a unique index for fast lookup and for avoiding duplicates. When the user clicks on or hovers over a tag name, you can use an additional query to get the tag details.
A problematic case in this design is the situation when you want to delete or rename a tag. Then you have to edit every product which includes the tag. But because MongoDB has no foreign keys with ON DELETE CASCADE like SQL databases, you will always have that problem when you have documents referencing one another.
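The trade-off described so far can be made concrete with a small in-memory sketch in plain Java. It is not MongoDB code: EmbeddedTags, Product, allTags, and renameTag are hypothetical names, and the list stands in for the products collection. It shows that displaying a product needs no second lookup, while a rename has to rewrite every product that embeds the tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for a products collection with embedded tag names.
final class EmbeddedTags {
    record Product(String id, List<String> tags) {}

    private final List<Product> products = new ArrayList<>();

    void save(Product product) {
        products.add(product);
    }

    // Listing every tag in use means scanning all products.
    Set<String> allTags() {
        Set<String> tags = new TreeSet<>();
        for (Product product : products) {
            tags.addAll(product.tags());
        }
        return tags;
    }

    // Renaming rewrites each product that embeds the tag; returns how many were touched.
    int renameTag(String oldName, String newName) {
        int touched = 0;
        for (int i = 0; i < products.size(); i++) {
            Product product = products.get(i);
            if (!product.tags().contains(oldName)) {
                continue;
            }
            List<String> renamed = new ArrayList<>(product.tags());
            renamed.replaceAll(tag -> tag.equals(oldName) ? newName : tag);
            products.set(i, new Product(product.id(), renamed));
            touched++;
        }
        return touched;
    }
}
```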
Renaming tags could be made easier by storing objectIDs instead of names in the tag array of the product. But IDs have the disadvantage that they are useless for the user. You need to get the names of the tags to show a product page. That means that you have to request every single one from the tags collection, which requires an additional database query.