Find most common shared vertices in OrientDB

I'm currently evaluating OrientDB (2.1.16) as a possible solution to building a similarity recommender. To that end, I'd love some help writing an initial query that accomplishes the following:
Vertex:Maker -(Edge:Produced)-> Vertex:Item -(Edge:TaggedBy)-> Vertex:Tag
I'd like to select a particular Item (V1) and get back a list of other Items (Vn) ordered by the number of Tags they share with V1.
By extension, I'd like to take a selected Maker (V2) and traverse through its Items to get an ordered list of other Makers (and the traversed Items, if possible) that share Tags.
There isn't an awful lot of detailed documentation on the application of intersect in this way. No unusual constraints in particular. There would be thousands of Items and Makers and probably 10x that many Tags.

I tried this on a small example graph, using the following query:
select item.name, count(tag) from (
  select from (
    MATCH {
      CLASS:Item, AS:item, WHERE: (name<>'v1')
    }
    .out("TaggedBy"){AS:tag}
    return item, tag
  ) where tag in (
    select expand(tag) from (
      MATCH {
        CLASS:Item, AS:item, WHERE: (name='v1')
      }.out("TaggedBy"){AS:tag}
      return tag
    )
  )
) group by item order by count desc
and I got this result
Hope it helps.
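The ranking that the query computes can be sketched outside the database as well. The following is a minimal Python illustration of the same idea (count the tags each other item shares with the selected item, then sort by that count). The sample data and the function name `rank_by_shared_tags` are assumptions made up for this sketch, not part of the original graph.

```python
from collections import Counter

# Hypothetical sample data: item name -> set of tag names.
item_tags = {
    "v1": {"red", "metal", "small"},
    "v2": {"red", "metal"},
    "v3": {"small"},
    "v4": {"wood"},
}

def rank_by_shared_tags(item_tags, target):
    """Return (item, shared_count) pairs, most shared tags first."""
    target_tags = item_tags[target]
    counts = Counter()
    for item, tags in item_tags.items():
        if item == target:
            continue  # mirrors WHERE name <> 'v1'
        shared = len(tags & target_tags)
        if shared:  # items sharing no tag are dropped, as in the query
            counts[item] = shared
    # Sort by count descending, then by name for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

ranking = rank_by_shared_tags(item_tags, "v1")
print(ranking)
```

The same function could be reused for the Maker case by first merging the tag sets of all Items produced by each Maker.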

Related

How to find most connected child nodes in OrientDB

So I'm new to OrientDB, and while I'm pretty good at SQL the syntax to get what I want in OrientDB is escaping me.
I know I can do something like select *, in().size() as size from Users order by size desc to find the most connected node of a certain class (Users in this case), but how do I find the most connected children a couple levels down?
I.e., let's say I have Organizations --> PROMOTES (edge) --> Platform --> MANAGES (edge) --> Suggestion
How do I find the most connected Suggestions at the Organization level? I.e., I know I can easily find the most connected suggestions one level out using the query I shared, but what about the most connected another level beyond that?
I'd ultimately like a result which lists each Suggestion along with how indirectly connected (number of edges) it is to Organizations.
Thank you!
Use TRAVERSE
You should use the TRAVERSE command for this:
SELECT in("PROMOTES").size() AS connectedOrg, *
FROM (
  TRAVERSE in("MANAGES")
  FROM Suggestion WHILE $depth < 2
)
WHERE $depth > 0
The result is the Platforms linked to each Suggestion. Records can be duplicated.
Along with each Platform, you get the number of linked Organizations, returned as connectedOrg.
About TRAVERSE:
TRAVERSE follows the record IDs present in a record and aggregates the records it reaches into the result. With WHILE $depth < 2 you limit the search to a single level, and with WHERE $depth > 0 you remove the starting record from the result. The TRAVERSE documentation has more details.
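To make the role of the two depth conditions concrete, here is a small Python sketch of a depth-limited traversal. It is only an illustration under stated assumptions: the `links` map and record names are hypothetical, and the sketch walks breadth-first and visits each record once, which may differ from how the database engine orders or repeats records.

```python
from collections import deque

# Hypothetical adjacency: record -> records reachable by the followed edge.
links = {
    "suggestion1": ["platformA", "platformB"],
    "platformA": ["org1"],
    "platformB": ["org1", "org2"],
}

def traverse(start, links, max_depth):
    """Yield (record, depth) pairs while depth < max_depth."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        record, depth = queue.popleft()
        yield record, depth
        for nxt in links.get(record, []):
            # Equivalent of WHILE $depth < max_depth.
            if nxt not in seen and depth + 1 < max_depth:
                seen.add(nxt)
                queue.append((nxt, depth + 1))

# Equivalent of WHERE $depth > 0: drop the starting record.
one_level = [r for r, d in traverse("suggestion1", links, max_depth=2) if d > 0]
two_levels = [r for r, d in traverse("suggestion1", links, max_depth=3) if d > 0]
print(one_level)
```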
Use OrientDB Functions
If you need to know how many Organizations are linked to each Suggestion (through a Platform), use this syntax:
SELECT *, set(in("MANAGES").in("PROMOTES"), null).size() FROM Suggestion
Note: the extra null argument switches set() from aggregate mode to inline mode. This prevents all the sizes from being aggregated into a single record. See the documentation for more information.
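The effect of wrapping the two-hop result in a set before taking its size is that an Organization reached through several Platforms is counted once. A rough Python sketch of that counting follows. The edge data and the function name `orgs_per_suggestion` are hypothetical, and the edge direction follows the question (Organization PROMOTES Platform, Platform MANAGES Suggestion).

```python
# Hypothetical edges: Organization -PROMOTES-> Platform -MANAGES-> Suggestion.
promotes = {"org1": ["platformA", "platformB"], "org2": ["platformB", "platformC"]}
manages = {"platformA": ["s1"], "platformB": ["s1", "s2"], "platformC": ["s3"]}

def orgs_per_suggestion(promotes, manages):
    """Count distinct organizations that reach each suggestion via any platform."""
    reached = {}
    for org, platforms in promotes.items():
        for platform in platforms:
            for suggestion in manages.get(platform, []):
                # A set removes duplicates when an org arrives via two platforms.
                reached.setdefault(suggestion, set()).add(org)
    return {s: len(orgs) for s, orgs in reached.items()}

counts = orgs_per_suggestion(promotes, manages)
print(counts)
```

Here org1 reaches s1 through both platformA and platformB but is still counted once.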

OrientDB - Find related items of a vertex

We are building a system that will audit searches of a user (as well as other actions by the user). We will be tracking data such as User1 searched for Term1. There is an edge called 'searched' between a User and a Term vertex. What we are trying to find is some related information like users that searched for Term1 also searched for these terms (Term2, Term3, etc) and possibly some other related information between users and terms like "you may know these users". I am guessing a traversal is needed, however what I'm wondering is if depth of a traversal matters and will tell us the data we want. If we go the traversal route how deep do we set before we lose actual relevance?
So far this is what we pieced together but we aren't entirely sure if it is the correct approach.
SELECT $depth, * FROM (TRAVERSE * FROM (SELECT FROM Searched where q = 'al') STRATEGY BREADTH_FIRST ) WHERE @class in ['Term']
Update: what I ended up going with so far was the following query. It will tell me what other users searched for and a count of how many times. So I sort by how close a vertex is and how many times it was searched for. I feel this should hopefully provide a fairly good sample of what other users are searching for that also searched for that term.
SELECT $depth, q, in().size() AS count FROM (TRAVERSE * FROM (select from Term where q.toLowerCase() = 'aluminum') STRATEGY BREADTH_FIRST) WHERE @class = 'Term' AND $depth <> 0 ORDER BY $depth ASC, count DESC
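The ordering used in that query (closest terms first, then most searched) can be sketched in plain Python. This is only an illustration: the search log and the function name `related_terms` are made up, and depth here counts hops between vertices, whereas the database may also count edge records as a level.

```python
from collections import deque

# Hypothetical search log: one (user, term) pair per "searched" edge.
searches = [
    ("u1", "aluminum"), ("u1", "steel"),
    ("u2", "aluminum"), ("u2", "steel"), ("u2", "copper"),
    ("u3", "copper"), ("u3", "zinc"),
    ("u4", "steel"),
]

def related_terms(searches, start_term):
    """Return (depth, term, search_count) rows, nearest and most searched first."""
    neighbours, term_count = {}, {}
    for user, term in searches:
        neighbours.setdefault(("user", user), set()).add(("term", term))
        neighbours.setdefault(("term", term), set()).add(("user", user))
        term_count[term] = term_count.get(term, 0) + 1
    start = ("term", start_term)
    depth = {start: 0}
    queue = deque([start])
    while queue:  # breadth-first, so the first depth found is the shortest
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    rows = [
        (d, name, term_count[name])
        for (kind, name), d in depth.items()
        if kind == "term" and d != 0
    ]
    return sorted(rows, key=lambda r: (r[0], -r[2], r[1]))

rows = related_terms(searches, "aluminum")
print(rows)
```

In this toy data, terms at depth 2 were searched by a user who also searched for the starting term, while deeper terms are only connected through a chain of other users, which is one way to think about where relevance starts to fade.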

How to design a database schema for Meteor/MongoDB for this situation?

I don't know how the collections for a Meteor app should be structured.
In MySQL I would have 3 tables:
table_1: id, column_a, column_b
table_2: id, table_1_id, column_c, column_d
table_3: id, column_e, column_f
table_2 and table_3 could contain identical rows. Some information can be in both tables, only in table_2, or only in table_3.
I know that in meteor/mongodb when you design database schema, you need to know how you will access/display the information. I want to display something like this:
table_1.column_a
show all rows from table_2 where table_2.table_1_id = table_1.id; I also want to check whether table_2.column_c = table_3.column_e and, if it is true, display that row from table_3.
I hope this is clear. Any suggestions about subscriptions/publications would also be much appreciated.
P.S. I am sorry for the title of this topic, but I couldn't find a more specific title.
UPDATE:
Having explained it above, I understand the problem better.
What I want is a list of products (list A), where every product has a list of specifications. I would also like to have another list (list B) containing specifications with more details.
I want to display the product details, including its list of specifications, and when the specifications of a product are displayed, I want to search list B for a similar item so that its full description can be shown.
I want to make that search when it's displaying because I want to be able to add the specification details(list B) later and this list will be updated periodically.
List A (title plus another 3-4 columns) would have tens of thousands of products, the specification list (title only) of each product in list A would have 10-20 items, and list B (title, description, status) would have a few hundred entries.
I have an idea to create a collection of list A and for every product in there add an array with the specifications, and another collection for list B. I would subscribe/publish the whole collection of list B, and when I display the list A, I would search for every specification in list B. I don't know how good this idea is.
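The idea described above (products that embed their specification titles, plus a separate details list that is looked up at display time) can be sketched as follows. This is a plain Python illustration rather than Meteor code. All field names and sample documents are hypothetical, and "similar item" is simplified to an exact title match.

```python
# Hypothetical documents; field names are illustrative only.
products = [
    {"title": "Laptop X", "specifications": ["SSD", "Backlit keyboard"]},
    {"title": "Laptop Y", "specifications": ["SSD", "Touchscreen"]},
]
spec_details = [
    {"title": "SSD", "description": "Solid-state drive", "status": "active"},
    {"title": "Touchscreen", "description": "Touch-sensitive display", "status": "active"},
]

def render_product(product, details_by_title):
    """Build display lines, adding the description when list B has a match."""
    lines = [product["title"]]
    for spec in product["specifications"]:
        detail = details_by_title.get(spec)
        if detail:
            lines.append(f"- {spec}: {detail['description']}")
        else:
            lines.append(f"- {spec}")  # no details yet; they can be added later
    return lines

# Index list B once; with a few hundred entries this stays small.
details_by_title = {d["title"]: d for d in spec_details}
rendered = render_product(products[0], details_by_title)
print(rendered)
```

Because the lookup happens at display time, entries added to list B later show up without touching the product documents.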

OrientDB query for hierarchical data

OrientDB Server v2.0.10.
I am trying to come up with a query for the following scenario.
I have 2 hierarchies: A->B->C and D->E->F
The number of nodes in the hierarchy can change.
A node in the first hierarchy can be connected to a node in the other hierarchy through some relation, say 'Assigned'.
What I want is the top-level parent node of the second hierarchy if any node in that hierarchy has an incoming edge from the first.
For example, say we have Car-Child->Engine-Child->Piston and Country-Child->State-Child->City
And a relationship Made_In which relates a Car, Engine or Piston to a Country, State or City.
So if there is a relation to any of Country, State or City, the Country should be returned. For example, Engine1-Made_In->Berlin would return Germany.
Sorry for such a toyish example. I hope it is clear.
Thanks.
You should consider reading the chapter about "traversing" - that should be the missing link to answer your question. You can find it here: http://orientdb.com/docs/last/SQL-Traverse.html
Basically, if you think of your graph as a family tree, you want to achieve 3 things:
Find all children, grand-children, grand-grand-children (and so on) from tree 1 for a given family member (=Hierarchy1)
Find those who have relations to members of another family tree (=ASSIGNED)
Show me who's on top of this tree (=Hierarchy2)
One of the possible solutions should look a little something like this:
Since you want to end up on top of hierarchy2, you have to start on the other side, i.e. hierarchy1.
[1] Get hierarchy1 (top-to-bottom):
TRAVERSE out("CHILD") FROM Car
[2] Choose all relations:
SELECT out("MADE_IN") FROM ([1])
[3] From those, go bottom-to-top:
TRAVERSE in("CHILD") FROM ([2])
[4] Who's on top?
SELECT FROM ([3]) WHERE @class="Country"
Combined into one SQL statement, it looks as ugly as this:
SELECT FROM (
  TRAVERSE in("CHILD") FROM (
    SELECT out("MADE_IN") FROM (
      TRAVERSE out("CHILD") FROM Car
    )
  )
) WHERE @class="Country"
You could replace Car with any @rid in hierarchy1 to get a list of the countries in which it, or any part of it, was made.
There might be better solutions for sure. But at least this one should work, so I hope it will help.
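The three steps (walk down the first hierarchy, follow the cross-hierarchy relation, walk up the second hierarchy) can be sketched in Python as a sanity check of the logic. The sample data and function names are hypothetical. The sketch also assumes that the top of the second hierarchy is always a Country, instead of filtering by class as the query does.

```python
# Hypothetical data: parent -> children for both hierarchies, plus Made_In edges.
children = {
    "Car1": ["Engine1"], "Engine1": ["Piston1"],
    "Germany": ["Berlin"], "Berlin": ["Mitte"],
    "France": ["Normandy"],
}
made_in = {"Engine1": ["Berlin"]}

def descendants(node, children):
    """The node plus everything below it (like TRAVERSE out("CHILD"))."""
    found, stack = [], [node]
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(children.get(current, []))
    return found

def root_of(node, parent):
    """Climb parent links until the top (like TRAVERSE in("CHILD"))."""
    while node in parent:
        node = parent[node]
    return node

def countries_for(start, children, made_in):
    parent = {c: p for p, cs in children.items() for c in cs}
    places = [
        place
        for part in descendants(start, children)
        for place in made_in.get(part, [])
    ]
    return {root_of(place, parent) for place in places}

countries = countries_for("Car1", children, made_in)
print(countries)
```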

Sphinx Filtering based on categories using OR

I have the following text fields I search with Sphinx: Title, Description, keywords.
However, sometimes things are narrowed down using categories. We have 3 category fields: CatID1, CatID2 and CatID3.
So, for example, I need to see if the word "Kittens" is in the Title, Description, or Keywords, but I also want to filter so that only items with the category Animals (ID 8), Pets (ID 9) or Felines (ID 10) in any of those CatID fields are returned.
To clarify: only show items that have an 8, 9 or 10 in CatID1, CatID2 or CatID3.
Any ideas on how I would accomplish this using sphinx filtering or searching the CatID1 fields as keywords?
Note: I am able to filter and it works great only using one category, i.e:
if(!empty($cat_str)) {
$cl->SetFilter( 'catid1', array( $cat_str ));
}
Thanks!
Craig
SetFilter takes an array. In your example you are putting $cat_str into an array of one item.
So you just need to build an array with all the IDs:
$cl->SetFilter( 'catid', array( $cat1, $cat2, $cat3 ));
But that's not very flexible, so you would probably build the array dynamically rather than hard-coding it like that. How you build the array is up to your application.
Also, storing the IDs in three separate attributes makes them hard to search. Notice that the example above uses an attribute called catid. This would be a single multi-value attribute (MVA) that contains the IDs from all three cat fields. That way it's easy to search for IDs in ANY of the columns at once.
http://sphinxsearch.com/docs/current.html#mva
If you are using an SQL source, you could do it with something like:
sql_query = SELECT id, title ... , CONCAT_WS(',', CatID1, CatID2, CatID3) as catid FROM ...
sql_attr_multi = uint catid from field;
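The "match ANY of the columns" behaviour of a merged multi-value attribute can be illustrated with a short Python sketch. It does not call Sphinx; it only mimics the filter semantics on hypothetical rows. The helper names `to_mva` and `filter_any` are invented for this example, and a value of 0 is assumed to mean "no category".

```python
# Hypothetical rows; column names follow the post, values are made up.
rows = [
    {"id": 1, "title": "Kittens for sale", "CatID1": 8, "CatID2": 3, "CatID3": 0},
    {"id": 2, "title": "Kittens wallpaper", "CatID1": 4, "CatID2": 5, "CatID3": 6},
    {"id": 3, "title": "Kittens and cats", "CatID1": 2, "CatID2": 10, "CatID3": 9},
]

def to_mva(row):
    """Merge the three category columns into one multi-value list."""
    return [row[c] for c in ("CatID1", "CatID2", "CatID3") if row[c]]

def filter_any(rows, wanted):
    """Keep rows whose merged category list contains any wanted ID."""
    wanted = set(wanted)
    return [row["id"] for row in rows if wanted & set(to_mva(row))]

# The wanted list can be built dynamically, e.g. from user-selected categories.
matching_ids = filter_any(rows, [8, 9, 10])
print(matching_ids)
```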