Couchbase: Synchronizing views with a bucket

I've got a question for you Couchbase pros: is it possible to synchronize a subset of documents (e.g. the documents within a view) with another bucket,
so that the other bucket's documents are always a direct subset of the "master" bucket?
If so, isn't that too expensive in terms of performance? Or does Couchbase have any functionality to only create deep links (references) to the documents instead of copying them?
Alternatively: is it possible to write views on views?
Thank you in advance!
--- EDIT ---
Let's say I want to have two sets (buckets) of documents, S1 and S2, where S2 is a subset of S1. Each set contains the same views V1, V2 and V3, since I want to be able to query any of them with the same logic/interface. In my case a set S2 is built per user/company/store/whatever; in production there would be around 1,000 such subsets. To stay abstract, let's call them S2a, S2b and S2c.
The selection of the documents to be contained in a given subset is done by a filtering instance (for example a view). Let's call these filtering instances F1a, F1b and F1c, since they filter S1 down to S2a, S2b and S2c respectively.
So with my current knowledge of Couchbase this results in the following design/view architecture: I've got the three "base" views V1, V2 and V3, and to realize S2a, S2b and S2c I must create the additional views S2aV1, S2aV2, S2aV3, S2bV1, S2bV2, etc. (9 views).
One could say "Well, choose your keys wisely and you can avoid the sub-views", but in my opinion this isn't that easy, for the following reason: in the worst case the filter parameters change every minute and contain many WHERE IN constraints, which (from my current point of view) cannot be handled efficiently by querying key/value lists.
This leads to the following thoughts and the question I initially asked: if I use the same views in every subset (each defined by a filter), shouldn't it be possible to build an entity that helps me handle complex filtering, for example a function which is called at runtime while generating the view output? This could look like /design/view?filter=F1 or something like that.
Or do you have any other ideas to solve this problem? Or should I use SQL, since it's more capable of handling frequently changing filters?

Generally speaking, for most models you don't really need to have bucket "subsets". Is there a particular reason you are trying to do this, and why you would want that data broken out? You can also query your views, or instead of a view on a view, you can just make a separate view that maps/filters further based on your needs (i.e. it does the same job as a view on a view), as sketched below.
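For example, a single view keyed by a subset discriminator can often replace the whole S2aV1/S2bV1/... family. A minimal sketch follows; Couchbase view map functions are written in JavaScript, so here one is simply held in a Python string, and the document fields type, store_id and created_at are hypothetical:

    # Hypothetical map function; doc fields "type", "store_id" and
    # "created_at" are assumptions, not from the question.
    MAP_FN = """
    function (doc, meta) {
        // Compound key: [subset discriminator, secondary sort key].
        if (doc.type === "order") {
            emit([doc.store_id, doc.created_at], null);
        }
    }
    """
    # Querying the view with startkey=["S2a"] and endkey=["S2a", {}] then
    # returns only the S2a slice of the single shared view, instead of
    # needing a dedicated S2aV1, S2bV1, ... view per subset.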

We are working on Elasticsearch integration. Maybe that would be better for your use case.

I think what you want to do is write a view on your original bucket, and then copy the key/values from that view to be documents in a new bucket.
It shouldn't be hard to write an automated framework for managing this so that you can keep the derived data up to date in near real time.
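A rough sketch of such a loop, assuming the Couchbase Python SDK 3.x surface (bucket.view_query, collection.get/upsert; exact import paths differ slightly between SDK versions) and made-up bucket, design-document and view names:

    # Sketch only: bucket names "master"/"subset_s2a" and the design doc
    # "filters" with view "f1a" are hypothetical.
    from couchbase.cluster import Cluster, ClusterOptions
    from couchbase.auth import PasswordAuthenticator

    cluster = Cluster("couchbase://localhost",
                      ClusterOptions(PasswordAuthenticator("user", "password")))
    master = cluster.bucket("master")
    subset = cluster.bucket("subset_s2a").default_collection()

    # F1a is an ordinary view on the master bucket that emits exactly the
    # documents belonging to subset S2a. Mirror each of them across.
    for row in master.view_query("filters", "f1a"):
        doc = master.default_collection().get(row.id)
        subset.upsert(row.id, doc.content_as[dict])
    # Run this periodically (or drive it from a change feed) to keep the
    # subset bucket near real time; deletions need separate handling.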

Related

TYPO3 backend workflow when avoiding the storage of data in the intermediate table

I have a situation as described in the ExtbaseFluid book:
I would like to store information in the intermediate table which is not recommended at all.
Here is a quote from the warning box of the book chapter linked above:
Do not store data in the Intermediate Table that concern the Domain. Though TYPO3 supports this (especially in combination with Inline Relational Record Editing (IRRE)), it is always a sign that further improvements can be made to your Domain Model. Intermediate Tables are and should always be tools for storing relationships and nothing else.
Let’s say you want to store a CD with its containing music tracks: CD -- m:n (Intermediate Table) -- Song. The track number may be stored in a field of the Intermediate Table. However, the track should be stored as a separate domain object, and the connection be realized as CD -- 1:n -- Track -- n:1 -- Song.
So I want to avoid doing what is not recommended. But thinking about the editor workflow that results from the recommended solution raises a few questions for me.
To stay with this example I would need the following tables:
tx_extname_domain_model_cd
tx_extname_domain_model_cd_track_mm
tx_extname_domain_model_track (which holds the track number)
tx_extname_domain_model_track_song_mm
tx_extname_domain_model_song
From what I know, this would put the editor in the situation of needing to create the following records:
one record for the cd
one record for the song
one record for the track, where the track number is added and where both the cd record and the song record are assigned.
So here are my questions:
I guess this workflow cannot be improved with some (unknown to me) TCA setup?
An editor cannot directly reach the song when the cd record is opened?
Instead, she/he first has to open the track record and can navigate from there to the song?
Is it really that bad to store data in the intermediate table? The TYPO3 table sys_file_reference does the same!? But I wonder how that data could be shown, because IRRE is not possible, as it shall only be used for 1:n relations (source).
The question you have to ask yourself is: Do I want to do coding by the book, or do I want to create a pragmatic approach to solve a customer's problem?
In this specific case the additional problem is that the people who originally invented Extbase had a quite sophisticated and academic approach, but when it came to pragmatic use and performance, they were blocked by their own rules and stuck with coding by the book.
This example and its warning message in particular show a way of thinking that was one of the reasons why I never actually used Extbase, but went for Core API methods to create performant and pragmatic queries to get the desired result sets. Now that we've got Doctrine under the hood, this works like a charm even with quite exotic DB flavors.
Of course intermediate tables are a good idea, and of course those intermediate tables can and should be enriched with additional data fields that do not require a 3rd, 4th or nth table to store, e.g. a simple set of dropdown options, since this can easily be handled with attributes configured in TCA, as shown here: https://docs.typo3.org/m/typo3/reference-tca/master/en-us/ColumnsConfig/Type/Inline/Examples.html
sys_file_reference is the most prominent example, since it provides exactly that kind of additional information that should not be pumped into additional tables - and guess what, the TYPO3 core does not use a single line of Extbase code to deal with that data, or with almost any other data of the core tables.
To answer your last question: Take a look at the good old IRRE Tutorial to get a clue how to do m:n connections with intermediate inline tables.
https://docs.typo3.org/typo3cms/extensions/irre_tutorial/0.4.0/Manual/Index.html#intermediate-tables-for-m-n-relations
It depends on the issue; sometimes the intermediate table is an entity, sometimes not. In this example the intermediate table is the track, which would contain: [uid, cd, song, track_no, ... (whatever else is needed to define the track)].
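A minimal sketch of that shape in generic SQL (run through Python's sqlite3 purely for illustration; these are not real TYPO3 table definitions, which would also carry pid, tstamp and other management fields):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE cd   (uid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE song (uid INTEGER PRIMARY KEY, title TEXT);
    -- The former intermediate table becomes the Track entity itself:
    CREATE TABLE track (
        uid      INTEGER PRIMARY KEY,
        cd       INTEGER REFERENCES cd(uid),
        song     INTEGER REFERENCES song(uid),
        track_no INTEGER
    );
    """)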
Be careful when you define your data that you do not make it too complex.

Database schema for a Tinder-like app

I have a database of millions of objects (simply put, a lot of objects). Every day I will present 3 selected objects to my users, and like with Tinder they can swipe left to say they don't like an object or swipe right to say they like it.
I select each object based on its location (those closest to the user are selected first) and also based on a few user settings.
I'm using MongoDB.
Now the problem: how do I design the database so that it can quickly provide a daily selection of objects to show to the end user, skipping all the objects he has already swiped?
Well, considering you have made your choice of using MongoDB, you will have to maintain multiple collections. One is your main collection, and you will have to maintain user-specific collections which hold user data, say the document ids the user has swiped. Then, when you want to fetch data, you might want to do a $setDifference aggregation. $setDifference does this:
Takes two sets and returns an array containing the elements that only exist in the first set; i.e. performs a relative complement of the second set relative to the first.
Now how performant this is would depend on the size of your sets and the overall scale.
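A rough pymongo sketch, with made-up names: each users document is assumed to carry a swiped_ids array, and candidate_ids is today's pre-selected batch (already chosen by location and settings):

    from pymongo import MongoClient

    db = MongoClient()["app"]          # connection details are placeholders
    user_id = "user-42"
    candidate_ids = [101, 102, 103, 104]

    pipeline = [
        {"$match": {"_id": user_id}},
        {"$project": {
            # candidate ids that are NOT in this user's swiped set;
            # $ifNull guards users who haven't swiped anything yet.
            "unseen": {"$setDifference": [
                candidate_ids,
                {"$ifNull": ["$swiped_ids", []]},
            ]},
        }},
    ]
    result = list(db.users.aggregate(pipeline))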
EDIT
I agree with your comment that this is not a scalable solution.
Solution 2:
One solution I could think of is to use a graph-based solution, like Neo4j. You could represent all your 1M objects and all your user objects as nodes, and have relationships between users and the objects they have swiped. Your query would be to return a list of all objects the user is not connected to.
You cannot shard a graph, which brings up scaling challenges. Graph-based solutions require that the entire graph be in memory. So the feasibility of this solution depends on you.
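In Cypher that query could look roughly like this (the labels, relationship type and property names are made up; the string can be passed to the official neo4j Python driver's session.run):

    # Hypothetical graph model: (:User)-[:SWIPED]->(:Object)
    UNSEEN_QUERY = """
    MATCH (u:User {id: $uid})
    MATCH (o:Object)
    WHERE NOT (u)-[:SWIPED]->(o)
    RETURN o
    LIMIT 3
    """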
Solution 3:
Use MySQL. Have 2 tables, one being the objects table and the other being a (uid, viewed_object) mapping. A join would solve your problem: keep only the objects that have no matching row in the mapping table for this user (see the sketch below). Joins work well for a long time, until you hit scale, so I don't think it's a bad starting point.
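Concretely, the join here is an anti-join. A sketch with made-up table and column names, held as a query string for whichever MySQL client you use:

    # Anti-join: objects this user has never swiped.
    UNSEEN_SQL = """
    SELECT o.id
    FROM objects AS o
    LEFT JOIN viewed AS v
           ON v.object_id = o.id
          AND v.uid = %s
    WHERE v.object_id IS NULL
    LIMIT 3
    """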
Solution 4:
Use Bloom filters. Your problem eventually boils down to a set-membership problem: given a set of ids, check whether an id is part of another set. A Bloom filter is a probabilistic data structure which answers set membership. They are very small and very efficient. But yes, it's probabilistic: false negatives will never happen, but false positives can, so that's the trade-off. Check out this post for how it's used: http://blog.vawter.com/2016/03/17/Using-Bloomfilters-to-Avoid-Repetition/
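A toy pure-Python version to make the idea concrete (a real deployment would use a tuned library; the size and hash count here are arbitrary):

    import hashlib

    class BloomFilter:
        def __init__(self, size_bits=8 * 1024 * 1024, num_hashes=7):
            self.size = size_bits
            self.num_hashes = num_hashes
            self.bits = bytearray(size_bits // 8)

        def _positions(self, item):
            # Derive several bit positions from one hash function.
            for i in range(self.num_hashes):
                digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
                yield int.from_bytes(digest[:8], "big") % self.size

        def add(self, item):
            for pos in self._positions(item):
                self.bits[pos // 8] |= 1 << (pos % 8)

        def __contains__(self, item):
            return all(self.bits[pos // 8] & (1 << (pos % 8))
                       for pos in self._positions(item))

    seen = BloomFilter()
    seen.add("user42:object1001")
    print("user42:object1001" in seen)  # True -- no false negatives
    print("user42:object9999" in seen)  # almost certainly False, but a
                                        # false positive is possible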
I'll update the answer if I can think of something else.

EF - multiple includes to eager load hierarchical data. Bad practice?

I need to eager-load a hierarchy structure so that I can recursively iterate through it. The eager loading is necessary to prevent multiple DB queries while traversing the tree. It seems the consensus is that you can't eager-load infinite levels of the tree, so I did something like
var item = db.ItemHierarchies
    .Include("Children.Children.Children.Children.Children")
    .Where(x => x.condition == condition)
to load 5 levels of children. This seems to get the job done. I'm wondering: what is the drawback of doing this? If there is none, then could I theoretically add 50 levels of includes here without slowing things down?
I recommend taking a look at the SQL that is generated as you add eager loading to your query.
var item = db.ItemHierarchies
    .Include("Children")
    .Include("Children.Children")
    .Include("Children.Children.Children")
    .Include("Children.Children.Children.Children")
    .Include("Children.Children.Children.Children.Children");

var sql = ((System.Data.Objects.ObjectQuery)item).ToTraceString();
// http://visualstudiomagazine.com/blogs/tool-tracker/2011/11/seeing-the-sql.aspx
You'll see that the SQL quickly gets very big and complicated and can potentially have serious performance implications. You'd do well to limit your eager loading to data that you are certain you will need and to consider using explicit loading for some of the related entities - especially if you're working with connected entities in which case you can explicitly load collection properties when they're needed.
Also note that you may not need multiple separate Includes. For example, the following needs to be separate Includes because they're addressing separate properties (Widgets and Spanners) of the root.
var item = db.ItemHierarchies
    .Include("Widgets")
    .Include("Spanners.Flanges");
But the following isn't necessary:
var item = db.ItemHierarchies
    .Include("Widgets")          // This isn't necessary.
    .Include("Widgets.Flanges"); // This loads both Widgets and Flanges.
Well, honestly... it's an extremely bad practice.
Let's assume you had 50 objects in your root, and 50 per level.
You may end up retrieving 312,500,000 "capsules" of information.
Now, one might ask: "So what is wrong with that?!"
I mean, if that is what is required, then why not do it?
Rule #1: we develop software that should be used by human beings.
And the fact is that no human is capable of glancing at 312,500,000 items of information at once and learning or concluding something beneficial from them (except that it does not help him or her to look at them).
Rule #2: UI should be based on what is needed, not on what is possible.
And since we already established that showing 312,500,000 capsules of data is not needed, there is no reason to bring it all in at once.
And now you might come forward and say: "But I don't care about the UI, really! All I need is to iterate over that data in order to process some information!"
In that case you would probably want to save your results somewhere for future reference, but that means it's a batch job, so why not apply batch-job rules to it, like processing it item by item, which may also give you the benefit of splitting the work between even more machines if needed.
So you see, no matter which path you choose, there should be no reason to do it
(which is the definition of a bad practice).
Update:
After reading interesting concerns in the comments, I would like to update this answer with more analysis:
Deciding what is a bad practice must always be done in reference to what is to be achieved, or to the role of each part in the system. In the current situation (after reading the comments) it has been stated or implied that the data storage is actually a persistence medium for objects, as opposed to a different concept where the data is the 'heart' of the application.
We can define two data types:
1) A data center, which is used in data-centric applications such as banks, CRM, ERP, websites or other service-based solutions.
vs.
2) A data-persistence medium, which is used as data to be saved for when the application is not active, for example any simple app save file or game save file, etc.
The main difference is that a data-persistence medium is accessed only by a single instance of the app at a single point in time, meaning the data is not designed to be shared by many instances. If the data is to be shared, we are dealing with a data-centric application.
If your app just needs a data-persistence medium, loading all the information cannot be considered a bad practice, but you still need to make sure you are not blowing up the memory. In that frame of work, SQL Server might not be what you need or the best tool to use.
In the other case, of a data-centric application, my original answer remains: it will be a bad practice to bring in all the information per instance of the application.

Displaying results of perform find in a portal

I have some global variables $$A, $$B, $$C and want to search within a table for these terms in fieldA, fieldB and fieldC (using Perform Find). How can I use the result of this Perform Find to display the results in a portal?
The implementation by my predecessor replaces a field fieldSEARCH with 1 if a record is in the Perform Find results and 0 otherwise, and then uses a portal filtered by this field. This seems a very dodgy way of doing it, not least because it means that multiple users will not be able to search at the same time!
Can you enhance the portal filter to filter against the variables themselves? Or you can perform the find, grab IDs of the found set, put them into a global field, and then use the field to construct the relationship. Global fields are multi-user safe.
The best way is not to do this at all, but use list views to perform searches. List views are naturally searchable and much more flexible than portals (you can easily sort them, omit arbitrary records, and so on). It's possible to repeat this functionality in portals, but it's way more complex. I mean, if there's some serious gain from using a portal, then it's doable, but if not, then the native way is obviously better.
List views are easier to search, as FileMaker still hasn't transitioned to the 21st century and insists on this model... Most users, however, want a master-detail view, like a mail app, and understandably so, as it's more intuitive (i.e. a list view on one side, where clicking an item updates the detail fields in the middle).
If this is what you want, you may want to cast an eye at Modular FM, where someone has already done the hard work for you:
http://www.modularfilemaker.org/module/masterdetail-2-0/
HTH
Stam

How to store tree-based data in a database?

I am developing an iPhone application which is navigation-based, and I am showing items on the first view, then sub-items on the second view, and so on. So my question is: what would be a good approach for saving this in a database (SQLite)?
Keep this simple.
Each object/view has its own ID and at least one parent ID.
This will ensure your data can represent trees of any depth and any complexity.
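A minimal sketch of that adjacency-list idea with Python's built-in sqlite3 (table and column names are made up):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""
    CREATE TABLE item (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES item(id),  -- NULL marks a root item
        title     TEXT NOT NULL
    )
    """)
    conn.executemany(
        "INSERT INTO item (id, parent_id, title) VALUES (?, ?, ?)",
        [(1, None, "Items"), (2, 1, "Sub item A"), (3, 1, "Sub item B")],
    )

    # One navigation level at a time: the children of the tapped item.
    children = conn.execute(
        "SELECT id, title FROM item WHERE parent_id = ?", (1,)
    ).fetchall()

    # Or a whole subtree in one go (SQLite supports recursive CTEs
    # since version 3.8.3):
    subtree = conn.execute("""
    WITH RECURSIVE sub(id, title) AS (
        SELECT id, title FROM item WHERE id = ?
        UNION ALL
        SELECT i.id, i.title FROM item i JOIN sub s ON i.parent_id = s.id
    )
    SELECT id, title FROM sub
    """, (1,)).fetchall()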
I am not an expert in this field, but since you said all your data can be represented as something like a tree, you can do it like this:
Find all the objects that will be the leaves of your tree and make those objects a table in the DB (please keep in mind that you create a table only for objects that have a different structure, not because they have different values).
Repeat the above step one level up, until you reach the top.
Eventually you will find that you've just got your DB.
To be more precise, you would need to study database design (DBMS).