merge - upsert/delete in google cloud datastore - nosql

I am working on a POC to move part of our functionality from a relational DB to Cloud Datastore. I have a few questions:
I would need to refresh a few kinds every night, as the data comes from a different data source (via flat files). From what I have read, there is no TRUNCATE-like functionality in Datastore. I believe the only option is to retrieve the keys of the kind in a loop, delete entity by entity, and then use the import functionality to load the new set of data. Is there a better option?
Assume I have a kind called department and a kind called store. Now I need a kind called dept-store, whose parent nodes would be department and store. Is there a way to enforce this kind of relationship? From the documentation I see that there can be only one parent.
If I have a child entity in kind1 whose parent is in kind2, and they are linked together, is there a way to query all the properties of kind1 and kind2 together? From a relational DB perspective, this is like an equi-join with SELECT *. I am looking for equivalent functionality in Datastore.

In order to answer your questions:
There are two ways to delete multiple entities. First, you can use Cloud Dataflow to delete entities in bulk [1]. Second, once the keys are retrieved you can issue a batch delete operation by passing the keys to the Datastore delete function; there is a usage example here [2]. To retrieve the keys you can run a keys-only query [3].
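For illustration, a minimal sketch of the keys-only-query-plus-batch-delete approach with the Python client library (the kind name "department" and the chunk size of 500 are assumptions):

from google.cloud import datastore

client = datastore.Client()

# Keys-only query: returns just the keys of the kind, which is cheap.
query = client.query(kind="department")  # hypothetical kind name
query.keys_only()
keys = [entity.key for entity in query.fetch()]

# Delete in chunks; Datastore caps how many mutations fit in one commit,
# so 500 keys per call is a conservative batch size.
for i in range(0, len(keys), 500):
    client.delete_multi(keys[i:i + 500])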
In Datastore an entity can have only one parent but can have multiple children. For your use case, though, you could create a third kind, dept-store, and store the keys of the corresponding department and store entities as its properties. This solution needs a good understanding of your requirements, as Datastore is by nature a non-relational database.
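As a rough sketch of that idea with the Python client (the kind names follow the question; the key identifiers are made up):

from google.cloud import datastore

client = datastore.Client()

# Keys of the two related entities (identifiers are hypothetical).
dept_key = client.key("department", "D001")
store_key = client.key("store", "S042")

# A third kind whose properties hold the keys of both related entities;
# the "relationship" is only enforced by your own application code.
dept_store = datastore.Entity(client.key("dept-store"))
dept_store.update({"department_key": dept_key, "store_key": store_key})
client.put(dept_store)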
You can look up multiple entities in a single call by passing the keys retrieved from kind1 and kind2 to a batch get operation [2]; combining their properties (the equi-join part) then happens in your application code, since Datastore has no joins.
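Sticking with the hypothetical keys from the sketch above, a batch lookup could look like this:

from google.cloud import datastore

client = datastore.Client()

# Keys pointing at entities in two different kinds (hypothetical identifiers).
keys = [client.key("department", "D001"), client.key("store", "S042")]

# One batch call fetches both entities.
entities = client.get_multi(keys)

# Merge their properties into a single dict, SELECT-*-style.
combined = {}
for entity in entities:
    combined.update(entity)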

Related

Lagom persistent read side and model evolution

To learn Lagom I created a simple application with some simple persistent entities and a persistent read side (as per the official documentation, using Cassandra).
The official doc contains a section about model evolution, describing how to change the model. However, there is no mention of evolution when it comes to the read side.
Assuming I have an entity called Item, with an ID and a name, and the read side creates a table like CREATE TABLE IF NOT EXISTS items (id TEXT, name TEXT, PRIMARY KEY (id))
I now want to change the item to include a description. This is trivial for the persistent entity, but the read side has to be changed as well.
I can see several approaches to (maybe) achieve that:
use a model evolution tool like Liquibase or Play evolutions to change the read side tables.
somehow include update table statements in createTables that migrate the model
create additional tables containing the additional information, and keep the old tables without modifications
Which approach would be the most fitting? Is there something better?
Creating a new table and dropping the old table is an option too, IMHO.
It is as simple as modifying your "create table" command ("create table mytable_v2 ..." and "drop table mytable ..."), changing the offset name, and modifying your event handlers.
override def buildHandler(): ReadSideProcessor.ReadSideHandler[MyEvent] = {
  readSide.builder[MyEvent]("myOffset") // change it to "myOffset_v2"
  ...
}
This results in all events being replayed and your read side table being reconstructed from scratch. This may not be an option if the current table is really huge, as the reconstruction may take a very long time.
Regarding what #erip says, I see it as perfectly normal to add a new column to your read side table. Suppose there are lots of records in this table listing all entities and you want to retrieve a list of entities based on some criteria, so you need some columns to be included in the WHERE clause. Retrieving the list of all entities and asking each of them whether it complies with the criteria is not an option at all: it would be very inefficient, as it needs more time, memory and network usage.
The point of a read-side is to materialize views from the entity state changes in your service's event stream. In this respect, you, as the controller of the service, can decide what is important for your subscribers to know about. This is handled by creating read-sides with an anti-corruption layer (ACL).
Typically your subscribers will subscribe to API events, which should experience no evolution. Your internal events (or impl events) will likely need to evolve; because of this, there should be a transformation from the impl events to the API events.
This is why it's very important to consider your domain very carefully before design: you really need to nail down what subscribers will need to know about. In the case of a description, it strikes me as unlikely that subscribers will need (or want!) to know about that.

When to use an owned entity types vs just creating a foreign key or adding the columns directly to the table?

I was reading about owned entity types here https://learn.microsoft.com/en-us/ef/core/modeling/owned-entities#feedback and I was wondering when I would use them, especially when using .ToTable(), although I am not sure whether ToTable creates a relationship with keys.
I read the entire article, so I understand that it essentially forces you to access the data via nav properties and prevents the owned table from being treated as an entity. They also say Include() is not needed and the data comes down with every query for the parent table, so it's not like you are reducing the amount of data that comes back.
So what's the point exactly? Also, what's the point of "table splitting"?
It takes the place of Complex types, with the option to set it up like a 1-1 relationship with ToTable, while being automatically eager-loaded. This would use the same PK in both tables, the same as a 1-1.
The point of table splitting would be that you want an object model that is normalized where the table structure is not. This fits scenarios where you have an existing table structure and want to split related pieces of that data off into sub-entities associated with the main entity. With the ToTable option, it would be similar to a 1-1 relationship, but automatically eager-loaded. However, when considering the reasons to use a 1-1 relationship, I would consider this option a bad choice.
The common reasons for using it in normal 1-1 relationships would include:
Splitting off expensive-to-load, rarely used data (images, binary, memo).
Encapsulating data particular to a single application off of a common entity. i.e. if I have a "Customer" which is used by a billing system vs. a CRM I might have "CustomerBillingData" and "CustomerCRMData" owned by "Customer" rather than an inherited BillingCustomer / CRMCustomer. As there is a "single" customer that may serve one or both systems. Billing doesn't care about CRM data, CRM doesn't care about Billing. If all data is in "Customer" then both systems potentially need to be updated, and I cannot rely on constraints when the data is optional to the other system. By using composition I can enforce required data for a particular system.
In neither of these cases would I want to use table splitting or anything that automatically eager-loads, so Owned Types with ToTable would not replace 1-1 relationships by any stretch. It's essentially a stricter version of complex types; I'd say it's strictly used for entity organization, and not something I'd admit to wanting to use very often.

In a nosql database like MongoDB or Couchbase how to model many to many relationship?

Consider an application where I have users and projects, and the requirement is that users shall be assigned to projects. One user can be assigned to multiple projects. This is a many-to-many relationship. So what is the best way to model such a requirement?
I would like to discuss a few approaches to modelling such a requirement:
- Embedded data model
In this approach I would embed the user documents inside the project document.
Advantages: you get all the required data in one API call, i.e. by fetching one single document.
Disadvantages: data duplication, which is OK.
The real problem is that if you update user information (e.g. a user's mobile number or name) from the users screen, then this update should also be reflected in all embedded user documents. For this, some bulk update query has to be fired, along the lines of the sketch below.
But is this the right way?
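For example, propagating a changed phone number into every project that embeds the user could look roughly like this (a pymongo sketch; the database, collection and field names are assumptions):

from pymongo import MongoClient

db = MongoClient()["appdb"]  # hypothetical database name

user_id = "u123"  # hypothetical user id

# Update the embedded copy of this user in every project that contains it.
db.projects.update_many(
    {"users.user_id": user_id},
    {"$set": {"users.$[u].mobile": "+1-555-0100"}},
    array_filters=[{"u.user_id": user_id}],
)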
- Embedding object references instead of objects (which is normalised)
In this case, if we embed user IDs instead of user objects, the problem mentioned above won't be there, but then we will have to make multiple network calls to get the required data, or maintain a separate relation kind of document as we do in SQL.
Is this the best way?
We have the same scenario, so I embed the ObjectId, and to fill in the data for clients, I populate the user data in the find function:
contract.find({}).populate('user').then(function (contracts) { /* contracts now contain the populated user documents */ });
There are few hard and fast rules, but usually with many-to-many relationships you would prefer references over embedding. This doesn't mean your data is totally flat/normalized.
For example, you could have a user document with an array of project ids. You could have the reverse for projects.
Think about your queries and how you will structure them. That can give you other hints about how to structure your documents.
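A minimal sketch of that reference-based shape (pymongo; all collection names, fields and IDs are made up):

from pymongo import MongoClient

db = MongoClient()["appdb"]  # hypothetical database name

# Each side keeps an array of references to the other.
db.users.insert_one({"_id": "u1", "name": "Alice", "project_ids": ["p1", "p2"]})
db.projects.insert_one({"_id": "p1", "title": "POC", "user_ids": ["u1"]})

# "Which projects is this user assigned to?" resolves the references
# with one extra round trip instead of duplicating the project data.
user = db.users.find_one({"_id": "u1"})
projects = list(db.projects.find({"_id": {"$in": user["project_ids"]}}))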

Need some advice concerning MVVM + Lightweight objects + EF

We are developing a back office application with quite a large DB.
It's not reasonable to load everything from the DB into memory, so when the model's properties are requested we read from the DB (via EF).
But many of our UIs are just simple lists of entities, with only some (!) of their properties presented to the user.
For example, we just want to show Id, Title and Name.
And later, when the user selects an item and wants to perform some actions, the whole object is needed. Right now we have the list of items stored in memory.
Some properties contain large texts, images or other data.
EF works with entities, and reading a bunch of large objects degrades performance notably.
As far as I understand, the problem can be solved by creating lightweight entities and using them in the appropriate context.
First, I'm afraid that each view will make us create a new LightweightEntity and we will eventually end up with a bloated object context.
Second, as the Model wraps EF, we need to provide methods for the various entities.
Third, ViewModels communicate and pass entities to each other.
So I'm stuck with all these considerations and need good architectural design advice.
Any ideas?
For images and large texts you may consider table splitting, which is commonly used to split a table into a lightweight entity and a "heavy" entity.
But I think what you call lightweight "entities" are data transfer objects (DTOs). These are not supplied by the context (so it won't get bloated) but by projection from entities, which is done in a repository or service.
For projection you can use AutoMapper, especially its newer feature that I describe here. This allows you to reduce the number of methods you need to provide "for various entities" (DTOs), because the type to project to can be given as a generic type parameter.

How do I use entity framework with hierarchical data?

I'm working with a large hierarchical data set in sql server - modelled using the standard "EntityID, ParentID" kind of approach. There are about 25,000 nodes in the whole tree.
I often need to access subtrees of the tree, and then access related data that hangs off the nodes of the subtree. I built a data access layer a few years ago based on table-valued functions, using recursive queries to fetch an arbitrary subtree, given the root node of the subtree.
I'm thinking of using Entity Framework, but I can't see how to query hierarchical data like this. AFAIK there is no recursive querying in LINQ, and I can't expose a TVF in my entity data model.
Is the only solution to keep using stored procs? Has anyone else solved this?
Clarification: By 25,000 nodes in the tree I'm referring to the size of the hierarchical dataset, not to anything to do with objects or the Entity Framework.
It may be best to use a pattern called "Nested Set", which allows you to get an arbitrary subtree with a single query. This is especially useful if the nodes aren't manipulated very often: Managing hierarchical data in MySQL.
In a perfect world the Entity Framework would provide the means to save and query data using this pattern.
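To make the pattern concrete, here is a tiny illustration of the nested-set idea (hypothetical data, written in Python rather than SQL just to show the interval query; in the database the same filter becomes a WHERE lft BETWEEN ... AND ... clause):

# Each node stores a left/right interval; a node's whole subtree is every
# row whose interval lies inside it, so no recursion is needed.
nodes = [
    {"id": 1, "name": "root",  "lft": 1, "rgt": 8},
    {"id": 2, "name": "child", "lft": 2, "rgt": 5},
    {"id": 3, "name": "leaf",  "lft": 3, "rgt": 4},
    {"id": 4, "name": "other", "lft": 6, "rgt": 7},
]

def subtree(root, nodes):
    return [n for n in nodes if root["lft"] <= n["lft"] <= root["rgt"]]

print([n["name"] for n in subtree(nodes[1], nodes)])  # ['child', 'leaf']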
Everything IS possible with Entity Framework, but you have to hack and slash your way into it. The database I am currently working against has too many "holder tables", since Points, for instance, is shared by both teams and users. Both users and teams can also have a blog.
When you say 25,000 nodes, do you mean navigational properties? If so, I think it could be tricky to get the data access in place. It's not hard to navigate, search, etc. with Entity Framework, but I tend to model on paper and then create the database based on how I want to navigate with Entity Framework. It sounds like you don't have that option.
Thanks for these suggestions.
I'm beginning to realise that the answer is to remodel the data in the database - either along the lines of nested sets as Georg suggests, or maybe a transitive closure table, which I've just come across.
That way, I'm hoping to get two key benefits:
a) faster querying against arbitrary subtrees
b) a data model which no longer requires recursive querying - so perhaps bringing it within easy reach of the Entity Framework!
It's always amazing how so often the right answer to a difficult problem is not to answer it, but to do something else instead!