OrientDB query for hierarchical data - orientdb

OrientDB Server v2.0.10 ,
I am trying to come up with a query for the following scenario.
I have 2 hierarchies: A->B->C and D->E->F
The number of nodes in the hierarchy can change.
The node in 1st hierarchy can be connected to the other hierarchy using some relation say 'Assigned'.
What I want is the parent node of the 2nd hierarchy if there is any incoming edge to any of the node in that 2nd hierarchy from the 1st.
For example, say we have Car-Child->Engine-Child->Piston and Country-Child->State-Child->City
And a relationship Made_In which relates Car or Engine or Piston to either Country or State or City
So if there is a relation with either of Country or State or City, the Country should be returned. Example, Engine1-Made_In->Berlin, this would return Germany.
Sorry for such a toyish example. I hope it is clear.
Thanks.

You should consider reading the chapter about "traversing" - that should be the missing link to answer your question. You can find it here: http://orientdb.com/docs/last/SQL-Traverse.html
Basically, if you think of your graph as a family tree, you want to achieve 3 things:
Find all children, grand-children, grand-grand-children (and so on) from tree 1 for a given family member (=Hierarchy1)
Find those who have relations to members of another family tree (=ASSIGNED)
Show me who's on top of this tree (=Hierarchy2)
One of the possible solutions should look a little something like this:
Since you want to end up on top of hierarchy2, you have to start on the other side, i.e. hierarchy1.
Get hierarchy1 (top-to-bottom)
TRAVERSE out("CHILD") FROM Car
Choose all relations
SELECT out("MADE_IN) FROM ([1])
and from those, go bottom-to-top
TRAVERSE in("CHILD") FROM ([2])
Who's on top?
SELECT FROM ([3]) WHERE #class="Country"
Combined into one sql, it looks as ugly as this:
SELECT FROM (
TRAVERSE in("CHILD") FROM (
SELECT out("MADE_IN") FROM (
TRAVERSE out("CHILD") FROM Car
)
)
) WHERE #class="Country"
You could replace Car with any #rid in hierarchy1 to get a list of countries it or any part of it was made in.
There might be better solutions for sure. But at least this one should work, so I hope it will help.

Related

How to find most connected child nodes in OrientDB

So I'm new to OrientDB, and while I'm pretty good at SQL the syntax to get what I want in OrientDB is escaping me.
I know I can do something like select *, in().size() as size from Users order by size desc to find the most connected node of a certain class (Users in this case), but how do I find the most connected children a couple levels down?
I.e., let's say I have Organizations --> PROMOTES (edge) --> Platform --> MANAGES (edge) --> Suggestion
How do I find the most connected Suggestions at the Organization level? I.e., I know I can easily find the most connected suggestions one level out using the query I shared, but what about the most connected another level beyond that?
I'd ultimately like a result which lists each Suggestion along with how indirectly connected (number of edges) it is to Organizations.
Thank you!
Use TRAVERSE
You should use the command TRAVERSE to do that
SELECT out(PROMOTES).size() AS connectedOrg,*
FROM (
TRAVERSE out(MANAGES)
FROM Suggestion WHILE $depth < 2
)
WHERE $depth > 0
Result will be the Platforms linked to a Suggestion. Records can be duplicated.
Along with each Platform, you get the number of Organisation called connectedOrg.
About traverse :
Traverse follow the record ids present in a record and aggregates them in the results. With WHILE $depth < 2 you can limit search to only one level and with WHERE $depth > 0 you can remove the original record from the result. More info here.
Use OrientDB Functions
If you need to know how many Organization is linked to each Suggestion (through Plateform), use this syntax.
SELECT *, set(out(MANAGES).out(PROMOTES), null).size() FROM Suggestion
Note : , null allows to switch from AGGREGATE to INLINE. It prevents from having all sizes aggregated in a single record. See the doc for more info.

OrientDB traverse (children) while connected to a vertex and get an other vertex

I'm not sure the title is the best way to phrase it, here's the structure:
Structure
Here's the db json backup if you want to import it to test it: http://pastebin.com/iw2d3uuy
I'd like to get the Dishes eaten by the Humans living in Continent 1 until a _Parent Human moved to Continent 2.
Which means the target is Dish 1 & 2.
If a parent moved to another Continent, I don't want their dish nor the dishes of their children, even if they move back to Continent 1.
I don't know if it matters, but a Human can have multiple children.
If there wasn't the condition about the children of a Human who has moved from the Continent, this query would have worked:
SELECT expand(in('_Is_in').in('_Lives').in('_Eaten_by'))
FROM Continent WHERE continent_id = 1
But I guess here we're forced to use (among other things)
TRAVERSE out('_Parent') FROM Human WHILE
I've tried to use the while of traverse with a subquery to get all the Humans I'm interested in, before to try to get the Dishes, but I'm not even sure we can use while with a subquery.
I hope the structure will help other users to quickly find out if this query is useful to them. If anyone is wondering, I used the Graph tab of OrientDB Studio to make it, along with GIMP.
As a bonus, if anyone knows the Gremlin syntax, it would also be useful to learn it.
Please feel free to edit this post as you see fit and contribute your thoughts :)
SELECT expand(in('_Eaten_by'))
FROM (TRAVERSE out('_Parent')
FROM (SELECT from Human WHERE in('_Parent').size() = 0)
WHILE out('_Lives').out('_Is_in').continent_id = 1)
Explanation:
TRAVERSE out('_Parent')
FROM (SELECT FROM Human WHERE in('_Parent').size() = 0)
WHILE out('_Lives').out('_Is_in').continent_id = 1
returns Human 1 and 2.
That query traverses Human, starting from Human 1 while the Human is connected to Continent 1.
It starts from in('_Parent').size() = 0 which are the Humans without any _Parent (there's only Human 1 in this case) (size() is the size of the collection of vertices coming in from _Parent).
And SELECT expand(in('_Eaten_by')) FROM
gets the Dishes, starting from the Humans we got from the traversal and going through the edge _Eaten_by.
Note: be sure to always use ' around the vertices and edges names, otherwise the names don't seem to be taken in account.

REST API structure for multiple countries

I'm designing a REST API where you can search for data in different countries, but since you can search for the same thing, at the same time, in different countries (max 4), am I unsure of the best/correct way to do it.
This would work to start with to get data (I'm using cars as an example):
/api/uk,us,nl/car/123
That request could return different ids for the different countries (uk=1,us=2,nl=3), so what do I do when data is requested for those 3 countries?
For a nice structure I could get the data one at the time:
/api/uk/car/1
/api/us/car/2
/api/nl/car/3
But that is not very efficient since it hits the backend 3 times.
I could do this:
/api/car/?uk=1&us=2&nl=3
But that doesn't work very well if I want to add to that path:
/api/uk/car/1/owner
Because that would then turn into:
/api/car/owner/?uk=1&us=2&nl=3
Which doesn't look good.
Anyone got suggestions on how to structure this in a good way?
I answered a similar question before, so I will stick to that idea:
You have a set of elements -cars- and you want to filter it in some way. My advice is add any filter as a field. If the field is not present, then choose one country based on the locale of the client:
mydomain.com/api/v1/car?countries=uk,us,nl
This field should dissapear when you look for a specific car or its owner
mydomain.com/api/v1/car/1/owner
because the country is not needed (unless the car ID 1 is reused for each country)
Update:
I really did not expect the id of the car can be shared by several cars, an ID should be unique (like a primary key in a database). Then, it makes sense to keep the country parameter with the owner's search:
mydomain.com/api/v1/car/1/owner?countries=uk,us
This should return a list of people who own a car with the id 1... but for me this makes little sense as a functionality, in this search I'll only allow one country:
mydomain.com/api/v1/car/1/owner?country=uk

Need a good database design for this situation

I am making an application for a restaurant.
For some food items, there are some add-ons available - e.g. Toppings for Pizza.
My current design for Order Table-
FoodId || AddOnId
If a customer opts for multiple addons for a single food item (say Topping and Cheese Dip for a Pizza), how am I gonna manage?
Solutions I thought of -
Ids separated by commas in AddOnId column (Bad idea i guess)
Saving Combinations of all addon as a different addon in Addon Master Table.
Making another Trans table for only Addon for ordered food item.
Please suggest.
PS - I searched a lot for a similar question but cudnt find one.
Your relationship works like this:
(1 Order) has (1 or more Food Items) which have (0 or more toppings).
The most detailed structure for this will be 3 tables (in addition to Food Item and Topping):
Order
Order to Food Item
Order to Food Item to Topping
Now, for some additional details. Let's start flushing out the tables with some fields...
Order
OrderId
Cashier
Server
OrderTime
Order to Food Item
OrderToFoodItemId
OrderId
FoodItemId
Size
BaseCost
Order to Food Item to Topping
OrderToFoodItemId
ToppingId
LeftRightOrWhole
Notice how much information you can now store about an order that is not dependent on anything except that particular order?
While it may appear to be more work to maintain more tables, the truth is that it structures your data, allowing you many added advantages... not the least of which is being able to more easily compose sophisticated reports.
You want to model two many-to-many realtionships by the sound of it.
i.e. Many products (food items) can belong to many orders, and many addons can belong to many products:
Orders
Id
Products
Id
OrderLines
Id
OrderId
ProductId
Addons
Id
ProductAddons
Id
ProductId
AddonId
Option 1 is certainly a bad idea as it breaks even first normal form.
why dont you go for many-to-many relationship.
situation: one food can have many toppings, and one toppings can be in many food.
you have a food table and a toppings table and another FoodToppings bridge table.
this is just a brief idea. expand the database with your requirement
You're right, first one is a bad idea, because it is not compliant with normal form of tables and it would be hard to maintain it (e.g. if you remove some addon you would need to parse strings to remove ids from each row - really slow).
Having table you have already there is nothing wrong, but the primary key of that table will be (foodId, addonId) and not foodId itself.
Alternatively you can add another "id" not to use compound primary key.

Most efficient way to store nested categories (or hierarchical data) in Mongo?

We have nested categories for several products (e.g., Sports -> Basketball -> Men's, Sports -> Tennis -> Women's ) and are using Mongo instead of MySQL.
We know how to store nested categories in a SQL database like MySQL, but would appreciate any advice on what to do for Mongo. The operation we need to optimize for is quickly finding all products in one category or subcategory, which could be nested several layers below a root category (e.g., all products in the Men's Basketball category or all products in the Women's Tennis category).
This Mongo doc suggests one approach, but it says it doesn't work well when operations are needed for subtrees, which we need (since categories can reach multiple levels).
Any suggestions on the best way to efficiently store and search nested categories of arbitrary depth?
The first thing you want to decide is exactly what kind of tree you will use.
The big thing to consider is your data and access patterns. You have already stated that 90% of all your work will be querying and by the sounds of it (e-commerce) updates will only be run by administrators, most likely rarely.
So you want a schema that gives you the power of querying quickly on child through a path, i.e.: Sports -> Basketball -> Men's, Sports -> Tennis -> Women's, and doesn't really need to truly scale to updates.
As you so rightly pointed out MongoDB does have a good documentation page for this: https://docs.mongodb.com/manual/applications/data-models-tree-structures/ whereby 10gen actually state different models and schema methods for trees and describes the main ups and downs of them.
The one that should catch the eye if you are looking to query easily is materialised paths: https://docs.mongodb.com/manual/tutorial/model-tree-structures-with-materialized-paths/
This is a very interesting method to build up trees since to query on the example you gave above into "Womens" in "Tennis" you could simply do a pre-fixed regex (which can use the index: http://docs.mongodb.org/manual/reference/operator/regex/ ) like so:
db.products.find({category: /^Sports,Tennis,Womens[,]/})
to find all products listed under a certain path of your tree.
Unfortunately this model is really bad at updating, if you move a category or change its name you have to update all products and there could be thousands of products under one category.
A better method would be to house a cat_id on the product and then separate the categories into a separate collection with the schema:
{
_id: ObjectId(),
name: 'Women\'s',
path: 'Sports,Tennis,Womens',
normed_name: 'all_special_chars_and_spaces_and_case_senstive_letters_taken_out_like_this'
}
So now your queries only involve the categories collection which should make them much smaller and more performant. The exception to this is when you delete a category, the products will still need touching.
So an example of changing "Tennis" to "Badmin":
db.categories.update({path:/^Sports,Tennis[,]/}).forEach(function(doc){
doc.path = doc.path.replace(/,Tennis/, ",Badmin");
db.categories.save(doc);
});
Unfortunately MongoDB provides no in-query document reflection at the moment so you do have to pull them out client side which is a little annoying, however hopefully it shouldn't result in too many categories being brought back.
And this is basically how it works really. It is a bit of a pain to update but the power of being able to query instantly on any path using an index is more fitting for your scenario I believe.
Of course the added benefit is that this schema is compatible with nested set models: http://en.wikipedia.org/wiki/Nested_set_model which I have found time and time again are just awesome for e-commerce sites, for example, Tennis might be under both "Sports" and "Leisure" and you want multiple paths depending on where the user came from.
The schema for materialised paths easily supports this by just adding another path, that simple.
Hope it makes sense, quite a long one there.
If all categories are distinct then think of them as tags. The hierarchy isn't necessary to encode in the items because you don't need them when you query for items. The hierarchy is a presentational thing. Tag each item with all the categories in it's path, so "Sport > Baseball > Shoes" could be saved as {..., categories: ["sport", "baseball", "shoes"], ...}. If you want all items in the "Sport" category, search for {categories: "sport"}, if you want just the shoes, search for {tags: "shoes"}.
This doesn't capture the hierarchy, but if you think about it that doesn't matter. If the categories are distinct, the hierarchy doesn't help you when you query for items. There will be no other "baseball", so when you search for that you will only get things below the "baseball" level in the hierarchy.
My suggestion relies on categories being distinct, and I guess they aren't in your current model. However, there's no reason why you can't make them distinct. You've probably chosen to use the strings you display on the page as category names in the database. If you instead use symbolic names like "sport" or "womens_shoes" and use a lookup table to find the string to display on the page (this will also save you hours of work if the name of a category ever changes -- and it will make translating the site easier, if you would ever need to do that) you can easily make sure that they are distinct because they don't have anything to do with what is displayed on the page. So if you have two "Shoes" in the hierarchy (for example "Tennis > Women's > Shoes" and "Tennis > Men's > Shoes") you can just add a qualifier to make them distinct (for example "womens_shoes" and "mens_shoes", or "tennis_womens_shoes") The symbolic names are arbitrary and can be anything, you could even use numbers and just use the next number in the sequence every time you add a category.