I'm using Redis as a NoSQL database (and really enjoying it), but there are some cases where it's not the most convenient (I know that not every data model fits NoSQL, but still...).
I'd like to know if there is a proper way to perform the following in Redis, or if I definitely need to go for an RDBMS instead.
Let's say I have two objects:
- User
- Asset
An Asset can be either a "flat", a "building" or a "house".
I use a Redis list named "assets" that gathers the assets, like:
assets = [asset:1, asset:2, ....]
Each asset is a hash, like:
asset:1 = {type: flat, value: 150000$}
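In redis-cli terms, that model looks like this (values from the example above):
RPUSH assets asset:1 asset:2
HSET asset:1 type flat
HSET asset:1 value 150000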
How can I get the list of all the assets of type "flat"?
The easiest way would be to iterate through the list of assets and check the type of each one; that's linear, so it seems reasonable.
Is there a better solution?
It could be useful to have a new list, "flats", that would contain the references of the assets of type "flat", but this new list would cause overhead, as we would need to make sure it stays synchronized:
flats = [asset:1, asset:5, ...]
But this leads to data replication... which is bad... Is this still the way to go with NoSQL?
Any other suggestions?
I usually build a second key that contains an index. So, when you write a new asset, you write it to your set of assets and to your set of specific asset types. It's overhead, but it's the same overhead you'd experience in other databases, it's just up front. Indexing always requires an additional write, Redis just makes it really obvious that this is happening.
Also, I would just store the id in your "assets" set, rather than asset:1, just store 1
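A minimal redis-cli sketch of that approach (the assets:type:flat key name is just an illustration, not something Redis prescribes):
# write path: store the asset and index it by type
SADD assets 1
HSET asset:1 type flat
HSET asset:1 value 150000
SADD assets:type:flat 1
# read path: ids of all assets of type "flat"
SMEMBERS assets:type:flat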
I have a list that contains widgets, List<ListCard> listsList = [];, and inside each widget I have a list of objects of a class called Product, List<Product> products = [];. Through the app you can add products to a list, but I want to persist these two lists (and the inner objects). How can I do it? I thought about a database, but maybe there is something more straightforward than designing a whole schema.
You can use this package: https://pub.dev/packages/localstorage . It is pretty straightforward.
There are two ways to do this:
1) Use this package; it's pretty simple to use and doesn't require you to write anything super complex. One downside: data is saved on the device, so if the app is deleted, all information is lost.
2) Attach Firebase Storage, using the official packages. Installing it is not complex at all, though you will need to convert your Product to a Map.
Both methods are valid and pretty fast to implement without groundbreaking changes.
If any questions come up - feel free to ask :3
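A minimal sketch of option 1 (the localstorage package, 4.x API), assuming your ListCard and Product classes have toJson/fromJson methods (those methods are not in the question):
import 'package:localstorage/localstorage.dart';

final LocalStorage storage = LocalStorage('lists.json');

Future<void> saveLists(List<ListCard> lists) async {
  await storage.ready; // wait until the backing file is loaded
  // Serialize each ListCard (and its inner Products) to plain JSON maps
  await storage.setItem('lists', lists.map((l) => l.toJson()).toList());
}

Future<List<ListCard>> loadLists() async {
  await storage.ready;
  final dynamic raw = storage.getItem('lists') ?? [];
  return (raw as List).map((m) => ListCard.fromJson(m)).toList();
}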
I've got a question about modeling wishlists using MongoDB and Mongoose. The idea is that a user needs to be able to have many different wishlists, each containing many wishes, and each wish referencing a single article.
I was thinking about it, and because a wishlist only belongs to a single user, I thought of using an embedded document for that.
Same for the wishes being embedded in a wishlist.
So I got something like this:
// Sub-schemas must be defined before the schemas that embed them.
var WishlistSchema = new Schema({
    ...
    wishes: [WishSchema]
    ...
})

var UserSchema = new Schema({
    ...
    wishlists: [WishlistSchema]
    ...
})
But my question is: what should I do with the article? Should I use a reference, or should I copy the article's data into an embedded document?
If I use an embedded document, I get an update problem: when the article's price changes, updating every wish referencing this article becomes a struggle. But accessing those wishes' articles is a piece of cake.
If I use a reference, the update is not a problem anymore, but I get a problem when I filter wishes on their article's criteria (when I filter the wishes by price, category, etc.).
I think the second way is probably the best, but I don't know if it's possible to build a query to filter the wishes depending on the article's fields. I tried a lot of things using population, but nothing works very well when you need to populate depending on a nested object's field (for example, getting wishes whose article meets certain conditions).
Is this kind of query doable?
Sorry for the loooong question and for my bad English :/ but any advice would be great!
In my experience dealing with NoSQL databases (Mongo, mainly), when designing a collection, do not think of the relations. Instead, think of how you would display, page, and retrieve the documents.
I would prefer embedding, and updating multiple documents when there's a change, as opposed to using a ref, for several reasons:
Gets would be fast and easy, and filtering is not a problem (like you've said).
Retrieve operations usually happen a lot more often than updates, and with proper indexing you wouldn't really have to worry about performance.
It leverages NoSQL's schema-less nature, and you'll be less prone to restructuring due to requirement changes (new sorting, new filters, etc.).
Paging would be a lot less of a hassle, and the UI would not be restricted in its design by paging and limits.
Joining could become expensive. Redundant data might be a hassle to update, but that's always better than not being able to display data in a particular way because your schema is normalized and joining is difficult.
I'd say the rule of thumb is: only split them when you do not need to display them together. It is not impossible to join them back if you do, but it is definitely more troublesome.
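With everything embedded, the filter you asked about becomes a plain query on nested fields. A rough sketch against the schemas above (the article fields are illustrative):
// Wishes whose embedded article matches some criteria, unwound so you
// get the matching wishes themselves rather than whole user documents
User.aggregate([
  { $unwind: '$wishlists' },
  { $unwind: '$wishlists.wishes' },
  { $match: { 'wishlists.wishes.article.category': 'books',
              'wishlists.wishes.article.price': { $lt: 20 } } }
])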
Since Mongo doesn't have transactions that can be used to ensure that nothing is committed to the database unless it's consistent (non-corrupt) data, if my application dies between making a write to one document and making a related write to another document, what techniques can I use to remove the corrupt data and/or recover in some way?
The greater idea behind NoSQL was to use a carefully modeled data structure for a specific problem, instead of hitting every problem with a hammer. That is also true for transactions, which should be referred to as 'short-lived transactions', because the typical RDBMS transaction hardly helps with 'real', long-lived transactions.
The kind of transaction supported by RDBMSs is often required only because the limited data model forces you to store the data across several tables, instead of using embedded arrays (think of the typical invoice / invoice items examples).
In MongoDB, try to use write-heavy, de-normalized data structures and keep data in a single document, which improves read speed and data locality, and ensures consistency. Such a data model is also easier to scale, because a single read only hits a single server, instead of having to collect data from multiple sources.
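For the invoice / invoice-items example above, a single embedded document keeps everything in one atomic write. A sketch in the modern mongo shell (collection and field names are illustrative):
// One insert covers the invoice and all of its items atomically
db.invoices.insertOne({
    _id: 1042,
    customer: "ACME",
    items: [
        { sku: "A-7", qty: 2, price: 9.99 },
        { sku: "B-3", qty: 1, price: 24.50 }
    ],
    total: 44.48
})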
However, there are cases where the data must be read in a variety of contexts and de-normalization becomes unfeasible. In that case, you might want to take a look at Two-Phase Commits or choose a completely different concurrency approach, such as MVCC (in a sentence, that's what the likes of svn, git, etc. do). The latter, however, is hardly a drop-in replacement for RDBMs, but exposes a completely different kind of concurrency to a higher level of the application, if not the user.
Thinking about this myself, I want to identify some categories of effects:
1. Your operation has only one database save (saving data into one document)
2. Your operation has two database saves (updates, inserts, or deletions), A and B:
2.1. They are independent
2.2. B is required for A to be valid
2.3. They are interdependent (A is required for B to be valid, and B is required for A to be valid)
3. Your operation has more than two database saves
I think this is a full list of the general possibilities. In case 1, you have no problem - one database save is atomic. In case 2.1, same thing, if they're independent, they might as well be two separate operations.
For case 2.2, if you do A first then B, at worst you will have some extra data (B data) that will take up space in your system, but otherwise be harmless. In case 2.3, you'll likely have some corrupt data in the event of a catastrophic failure. And case 3 is just a composition of case 2s.
Some examples for the different cases:
1.0. You change a car document's color to 'blue'
2.1. You change the car document's color to 'red' and the driver's hair color to 'red'
2.2. You create a new engine document and add its ID to the car document
2.3.a. You change your car's 'gasType' to 'diesel', which requires changing your engine to a 'diesel' type engine.
2.3.b. Another example of 2.3: You hitch car document A to another car document B, A getting the "towedBy" property set to B's ID, and B getting the "towing" property set to A's ID
3.0. I'll leave examples of this to your imagination
In many cases, it's possible to turn a 2.3 scenario into a 2.2 scenario. In the 2.3.a example, the car document and engine are separate documents. Let's ignore the possibility of putting the engine inside the car document for this example. It's invalid both to have a diesel engine with non-diesel gas, and to have a non-diesel engine with diesel gas. So they both have to change. But it may be valid to have no engine at all and have diesel gas. So you could add a step that keeps the whole thing valid at all points: first remove the engine, then replace the gas, then change the type of the engine, and lastly add the engine back onto the car.
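A sketch of that reordering in the modern shell; each step leaves the data in a valid state on its own (collection and field names are illustrative):
db.cars.updateOne({ _id: carId }, { $unset: { engineId: "" } })        // no engine + old gas: valid
db.cars.updateOne({ _id: carId }, { $set: { gasType: "diesel" } })     // no engine + diesel gas: valid
db.engines.updateOne({ _id: engineId }, { $set: { type: "diesel" } })  // a detached engine can change freely
db.cars.updateOne({ _id: carId }, { $set: { engineId: engineId } })    // reattach: diesel engine + diesel gas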
If you will get corrupt data from a 2.3 scenario, you'll want a way to detect the corruption. In example 2.3.b, things might break if one document has the "towing" property but the other document doesn't have a corresponding "towedBy" property. So this might be something to check after a catastrophic failure: find all documents that have "towing" where the document with the ID in that property doesn't have its "towedBy" set to the right ID. The choices there would be to delete the "towing" property or to set the appropriate "towedBy" property. Both seem equally valid, but it might depend on your application.
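A sketch of that check in the shell (assuming "towing" and "towedBy" hold document IDs, as in example 2.3.b):
// Find cars that point at another car via "towing" whose counterpart
// does not point back via "towedBy"
db.cars.find({ towing: { $exists: true } }).forEach(function (car) {
    var towed = db.cars.findOne({ _id: car.towing });
    if (!towed || String(towed.towedBy) !== String(car._id)) {
        print("inconsistent towing link on car " + car._id);
    }
});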
In some situations, you might be able to find corrupt data like this, but you won't know what the data was before those things were set. In those cases, setting a default is probably better than nothing. Some types of corruption are better than others (particularly the kind that will cause errors in your application rather than simply incorrect display data).
If the above kind of code analysis or corruption repair becomes unfeasible, or if you want to avoid any data corruption at all, your last resort would be to take mnemosyn's suggestion and implement Two-Phase Commits, MVCC, or something similar that allows you to identify and roll back changes in an indeterminate state.
I need to eager-load a hierarchy structure so that I can recursively iterate through it. The eager loading is necessary to prevent multiple DB queries while traversing the tree. It seems the consensus is that you can't eager-load infinite levels of the tree, so I did something like
var item = db.ItemHierarchies
    .Include("Children.Children.Children.Children.Children")
    .Where(x => x.condition == condition);
to load 5 levels of children. This seems to get the job done. I'm wondering: what is the drawback to doing this? If there is none, then theoretically could I add 50 levels of Includes here without slowing things down?
I recommend taking a look at the SQL that is generated as you add eager loading to your query.
var item = db.ItemHierarchies
    .Include("Children")
    .Include("Children.Children")
    .Include("Children.Children.Children")
    .Include("Children.Children.Children.Children")
    .Include("Children.Children.Children.Children.Children");

var sql = ((System.Data.Objects.ObjectQuery)item).ToTraceString();
// http://visualstudiomagazine.com/blogs/tool-tracker/2011/11/seeing-the-sql.aspx
You'll see that the SQL quickly gets very big and complicated and can potentially have serious performance implications. You'd do well to limit your eager loading to data that you are certain you will need and to consider using explicit loading for some of the related entities - especially if you're working with connected entities in which case you can explicitly load collection properties when they're needed.
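For the explicit-loading option, a sketch with the DbContext API (assuming db is a DbContext, and using the question's entity names):
// Load the root without any Includes, then pull in one level on demand
var item = db.ItemHierarchies.First(x => x.condition == condition);
db.Entry(item).Collection(i => i.Children).Load(); // one SELECT, only when needed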
Also note that you may not need multiple separate Includes. For example, the following needs separate Includes because they're addressing separate properties (Widgets and Spanners) of the root.
var item = db.ItemHierarchies
    .Include("Widgets")
    .Include("Spanners.Flanges");
But the following isn't necessary:
var item = db.ItemHierarchies
    .Include("Widgets")         // This isn't necessary.
    .Include("Widgets.Flanges") // This loads both Widgets and Flanges.
Well honestly.. It's an extremely bad practice.
Let's assume you had 50 objects in your root.. and 50 per level.
You may end up retrieving 312,500,000 "capsules" of information.
Now, one might ask: "So what is wrong with that?!" I mean, if that is what is required, then why not do it?
Rule #1: we develop software that should be used by human beings.
And the fact is that no human is capable of taking in 312,500,000 items of information at once and learning or concluding something beneficial from it (except, perhaps, that it does not help to look at it).
Rule #2: UI should be based on what is needed and not what is possible.
And since we already established that showing 312,500,000 capsules of data is not needed, there is no reason to bring it all at once.
And now you might come forward and say: "But I don't care about the UI, really! All I need is to iterate over that data in order to process some information!"
In that case you would probably want to save your results somewhere for future reference, but that means it's a batch job.. so why not apply batch-job rules to it, like processing it item by item, which may also give you the benefit of splitting the work between even more machines if needed.
So you see.. no matter which path you choose, there should be no reason to do it.
(Which is the definition of a bad practice.)
Update:
After reading interesting concerns in the comments, I would like to update this answer with more analysis:
Deciding what is a bad practice must always be done in reference to what is to be achieved, or to the role of each part in the system. In the current situation (after reading the comments) it has been stated or implied that the data storage is actually a persistence medium for objects, as opposed to a different concept where the data is the 'heart' of the application.
We can define two types of data:
1) Data-centric, used in data-centric applications such as banks, CRM, ERP, websites, or other service-based solutions.
vs.
2) A data-persistence medium, used for data that is saved for when the application is not active, for example: any simple app save file or game save file, etc.
The main difference is that a data-persistence medium is accessed by only a single instance of the app at a single point in time, meaning the data is not designed to be shared by many instances. If the data is to be shared, we are dealing with a data-centric application.
If your app just needs a data-persistence medium, loading all the information cannot be considered a bad practice, but you still need to make sure you are not blowing up the memory. And in that frame of work, SQL Server might not be what you need or the best tool to use.
In the other case, that of a data-centric application, my original answer remains: it would be a bad practice to bring all the information into each instance of the application.
For my app I thought of two different data models, but I cannot see which one would be best in both performance and file size. In my app I have to store Recipes, which will consist of an array of ingredients, an array of instructions, an array of tips, and some properties to select recipes by (e.g. a rating, type of dish).
I thought of two different models. The first would be to convert the arrays to NSData and store them all in the Core Data model. As the arrays are localized, that means there will be multiple arrays of the same kind in there (e.g. instructionsEN, instructionsFR, instructionsNL). As it is not necessary to query the arrays, I'm happy with the fact that I have to convert the arrays to NSData.
The other model would be a Core Data model that only contains the properties to filter a recipe, plus an identifier for a .plist file stored in the main bundle or the documents directory (as some of these files will be created by us, and some by the user). This .plist file will contain all the instructions, ingredients, etc. Again, there are multiple arrays of the same kind for different localizations.
I hope you can help me decide which of these options would be best in terms of performance and disk space. I would also appreciate it if you could think of a different solution.
If you're going to Core Data, you should generally go all the way. In that case, you would have an NSManagedObject Ingredient. I would probably put a method on Ingredient like stringValueForLocale: that would take care of returning me the best value. This means that a given ingredient can be translated once and is reusable for all recipes.
You would then have a Component entity that would have an Ingredient, a quantity value and a unit. A Recipe would have a 1:M property components that would point to these. Component should likely have an englishDescription as well, which would return a printable value like "1/4c sugar" while frenchDescription might print "50g de sucre" (note the volume/mass conversion there; Component is probably where you'd manage this.)
Instructions are a bit different, since they are less likely to be reusable. I guess you might get lucky and "Beat eggs to hard peaks." might show up in several recipes, but unless you're going to actively look for those kinds of reuse, it's probably more trouble than it's worth. Instructions are also the natural place to address cultural differences. In France, eggs are often stored at room temperature. In America, they are always refrigerated. To correctly translate a French recipe to American English, you sometimes have to include an extra step like "bring eggs to room temperature." (But it depends on the recipe, since it doesn't always matter.) It generally makes sense to do this in the instructions rather than in the Ingredients.
I'd probably create an Instructions entity with stringValuesForLocale: (which would return an array of strings). Then you could do some profiling and decide whether to break this up into separate LocalizedInstructions entities so that you didn't have to fault all of the localizations. The advantage of this design is that you can change your mind later about the internal database layout without impacting the higher levels. In either case, however, I'd probably store the actual instructions as an NSData encoding an NSArray. It's probably not worth the trouble and cost of creating a bunch of individual LocalizedInstruction entities.
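A sketch of the stringValueForLocale: idea on Ingredient, assuming a to-many localizations relationship and an englishValue fallback attribute (both names are illustrative, not from the question):
- (NSString *)stringValueForLocale:(NSLocale *)locale {
    // Look for a translation matching the locale's language code
    NSString *code = [locale objectForKey:NSLocaleLanguageCode];
    for (LocalizedString *localized in self.localizations) {
        if ([localized.languageCode isEqualToString:code]) {
            return localized.value;
        }
    }
    // Fall back to English when no translation exists
    return self.englishValue;
}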