I have an OrientDB database with a few million vertices and a few hundred million edges. Some vertices have hundreds of thousands of edges associated with them.
I want to execute random walks on this graph. I'd be content, for now, to get simple random walks working.
To achieve this, my goal is to be able to pick a random edge attached to a specific vertex. What is the best way to do this?
Say I have a highly connected vertex of class "metadata" at #17:0.
I have many lightly connected vertices of class "documents".
I have an edge class "metadata_of".
The metadata #17:0 has 200,000 "metadata_of" edges connecting it to 200,000 different document vertices.
I want to go from a metadata object, through a randomly selected metadata_of object, to the corresponding document object.
I had hoped to be able to run a random order sort to be able to get back a single random edge, but random functionality appears to be a pending enhancement filed back in January of 2014 - https://github.com/orientechnologies/orientdb/issues/1946 and there has been no apparent activity on it since June of 2015.
It seems like a potential way to go about this would be to retrieve the size of inE (in my case), then generate a random integer i between 0 and len(inE) - 1. From there, I want to retrieve edge[i] from the set of inE for the given vertex. I thought I had something like this working conveniently in Gremlin, but on re-evaluation it doesn't appear to run at speed; rather, it appears to traverse the inE list until it reaches index i. That is usually better than retrieving all 200k edges, but not ideal for performance.
gremlin> g = new OrientGraph("remote:localhost/mydb");
Oct 06, 2015 11:03:54 PM com.orientechnologies.common.log.OLogManager log
==>orientgraph[remote:localhost/activeint]
gremlin> v1 = g.v("#17:0")
==>v(concept)[#17:0]
gremlin> v1.inE[554] (this took about 4 seconds)
==>e[#18:8628863][#13:305536-metadata_of->#17:0]
What is the most sensible way to select a random edge attached to a given vertex?
I've created the following JavaScript function, which takes the #rid as a parameter:
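var g = orient.getGraph();
// Count the vertices reachable through 'metadata_of' edges from the given rid.
var c = g.command("sql", "select out('metadata_of').size() as num from " + rid);
// Uniform random index in [0, num - 1]: scale first, then floor.
var rand = Math.floor(Math.random() * c[0].getProperty('num'));
var pick = g.command("sql", "select expand(out('metadata_of')['" + rand + "']) from " + rid);
return pick;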
You can call the function in studio in this way:
select expand(getRandomEdge(12:0)) from (select getRandomEdge(12:0))
P.S. Make sure to pass a valid #rid.
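The count-then-index idea discussed above can be tried outside the database. The following is only a minimal in-memory sketch in Python, assuming the neighbors of each vertex are already available as a list; the record ids and the names random_neighbor and random_walk are made up for illustration and are not part of any OrientDB API.

```python
import random


def random_neighbor(adjacency, vertex, rng=random):
    """Pick one neighbor of `vertex` uniformly at random, or None if it has none."""
    neighbors = adjacency.get(vertex, [])
    if not neighbors:
        return None
    # randrange(n) returns an integer in [0, n - 1], so every index is valid
    # and equally likely.
    return neighbors[rng.randrange(len(neighbors))]


def random_walk(adjacency, start, steps, rng=random):
    """Simple random walk: follow up to `steps` uniformly random edges from `start`."""
    path = [start]
    for _ in range(steps):
        nxt = random_neighbor(adjacency, path[-1], rng)
        if nxt is None:  # dead end, stop early
            break
        path.append(nxt)
    return path


# Hypothetical toy graph; the record ids are invented for this example.
graph = {
    "#17:0": ["#13:1", "#13:2", "#13:3"],
    "#13:1": ["#17:0"],
    "#13:2": ["#17:0"],
    "#13:3": [],
}
walk = random_walk(graph, "#17:0", steps=4, rng=random.Random(42))
```

Whether the database can serve the indexed lookup in constant time is a separate question; the sketch only shows the sampling logic.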
I'm new to ODB, but not SQL, and can't seem to find a tutorial to learn from, or even a similar Q&A; but that's likely my inability to ask the question correctly.
I'm looking for a way to find all vertices of a particular class (e.g. Claim) that have at least one outgoing edge of a specific class (e.g. available_to_role), where those edges all have a particular property value (e.g. role="adj1") and each edge leads to a vertex with a particular property value (e.g. date).
I've tried searching from both sides of the edge and from the edge in particular but it doesn't work as expected. I always get all Claims vertices connected with any one date of a list of more than one date, and not only the Claims that match with all dates.
This is the closest I've gotten, but it results in Claims returned that are only available on one of the two dates, I'm not sure how to force an && result to only get claims available on all dates provided.
SELECT
*,
out.label as claim_label,
in.date as date,
count(in.date)
FROM
available_to_role
WHERE
role='adj1'
AND
in.date in ['2018-06-02 00:00:00','2018-06-03 00:00:00']
GROUP BY
in.date
I'm looking for a way to get all Claims that are available to a particular role on all the dates supplied, not just one of the dates supplied. And I need those results as individual responses, not aggregated (e.g. ["role","role"] ["date","date"]). In all cases, I know the dates and roles, but not the Claims.
Any help would be greatly appreciated.
I'm not sure I've understood correctly, but try this:
select from Claims where outE("available_to_role").role contains "adj1" and out("available_to_role").date in ['2018-06-02 00:00:00','2018-06-03 00:00:00']
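If you want to make sure that a vertex in the Claims class matches both dates, you have to test each date separately and combine the tests with AND, because a single IN condition is already satisfied by a match on any one date.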
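To make the "any date" versus "all dates" distinction concrete, here is a small in-memory sketch in Python. It assumes each available_to_role edge has been flattened to a (claim, role, date) tuple; the claim names and the function claims_available_on_all_dates are hypothetical and only illustrate the logic, not an OrientDB query.

```python
def claims_available_on_all_dates(edges, role, required_dates):
    """Return claims that have a `role` edge to every date in `required_dates`.

    edges: iterable of (claim, role, date) tuples, one per available_to_role edge.
    """
    required = set(required_dates)
    dates_by_claim = {}
    for claim, edge_role, date in edges:
        if edge_role == role and date in required:
            dates_by_claim.setdefault(claim, set()).add(date)
    # Only required dates are collected, so equality means full coverage.
    return sorted(c for c, dates in dates_by_claim.items() if dates == required)


# Hypothetical sample edges.
edges = [
    ("claim_a", "adj1", "2018-06-02 00:00:00"),
    ("claim_a", "adj1", "2018-06-03 00:00:00"),
    ("claim_b", "adj1", "2018-06-02 00:00:00"),  # only one of the two dates
    ("claim_c", "adj2", "2018-06-02 00:00:00"),  # wrong role
    ("claim_c", "adj2", "2018-06-03 00:00:00"),
]
wanted = ["2018-06-02 00:00:00", "2018-06-03 00:00:00"]
result = claims_available_on_all_dates(edges, "adj1", wanted)
```

A plain IN filter would also return claim_b here; requiring the matched set to equal the requested set is what drops it.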
Hope it helps
Regards
I'm not sure the title is the best way to phrase it, here's the structure:
Structure
Here's the db json backup if you want to import it to test it: http://pastebin.com/iw2d3uuy
I'd like to get the Dishes eaten by the Humans living in Continent 1 until a _Parent Human moved to Continent 2.
Which means the target is Dish 1 & 2.
If a parent moved to another Continent, I don't want their dish nor the dishes of their children, even if they move back to Continent 1.
I don't know if it matters, but a Human can have multiple children.
If there wasn't the condition about the children of a Human who has moved from the Continent, this query would have worked:
SELECT expand(in('_Is_in').in('_Lives').in('_Eaten_by'))
FROM Continent WHERE continent_id = 1
But I guess here we're forced to use (among other things)
TRAVERSE out('_Parent') FROM Human WHILE
I've tried to use the WHILE of TRAVERSE with a subquery to get all the Humans I'm interested in, before trying to get the Dishes, but I'm not even sure we can use WHILE with a subquery.
I hope the structure will help other users to quickly find out if this query is useful to them. If anyone is wondering, I used the Graph tab of OrientDB Studio to make it, along with GIMP.
As a bonus, if anyone knows the Gremlin syntax, it would also be useful to learn it.
Please feel free to edit this post as you see fit and contribute your thoughts :)
SELECT expand(in('_Eaten_by'))
FROM (TRAVERSE out('_Parent')
FROM (SELECT from Human WHERE in('_Parent').size() = 0)
WHILE out('_Lives').out('_Is_in').continent_id = 1)
Explanation:
TRAVERSE out('_Parent')
FROM (SELECT FROM Human WHERE in('_Parent').size() = 0)
WHILE out('_Lives').out('_Is_in').continent_id = 1
returns Human 1 and 2.
That query traverses Human, starting from Human 1 while the Human is connected to Continent 1.
It starts from in('_Parent').size() = 0 which are the Humans without any _Parent (there's only Human 1 in this case) (size() is the size of the collection of vertices coming in from _Parent).
And SELECT expand(in('_Eaten_by')) FROM
gets the Dishes, starting from the Humans we got from the traversal and going through the edge _Eaten_by.
Note: be sure to always put single quotes (') around vertex and edge class names; otherwise the names don't seem to be taken into account.
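The pruning behaviour of TRAVERSE ... WHILE described above can be illustrated with a small Python sketch. It is only an in-memory model, assuming _Parent edges point from parent to child as in the query; the four-human chain and the dish assignments are hypothetical sample data, and traverse_while is an illustrative helper name.

```python
def traverse_while(children, roots, keep):
    """Depth-first traversal that descends only through vertices satisfying `keep`."""
    visited = []
    seen = set()
    stack = list(reversed(roots))
    while stack:
        human = stack.pop()
        if human in seen or not keep(human):
            # Pruned: this vertex is skipped and its children are not reached through it.
            continue
        seen.add(human)
        visited.append(human)
        stack.extend(reversed(children.get(human, [])))
    return visited


# Hypothetical data: Human 3 moved to Continent 2, Human 4 moved back to Continent 1.
children = {"Human 1": ["Human 2"], "Human 2": ["Human 3"], "Human 3": ["Human 4"]}
continent = {"Human 1": 1, "Human 2": 1, "Human 3": 2, "Human 4": 1}
eaten_by = {"Human 1": ["Dish 1"], "Human 2": ["Dish 2"],
            "Human 3": ["Dish 3"], "Human 4": ["Dish 4"]}

# Roots are humans that are nobody's child, like in('_Parent').size() = 0.
all_children = {c for kids in children.values() for c in kids}
roots = [h for h in continent if h not in all_children]

humans = traverse_while(children, roots, lambda h: continent[h] == 1)
dishes = [dish for h in humans for dish in eaten_by.get(h, [])]
```

Human 4 lives in Continent 1 again but is never visited, because the traversal stops at Human 3.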
I'm working with OrientDB (2.2.10) and occasionally I would like to visually inspect my dataset to make sure I'm doing things correctly. On this page of OrientDB http://orientdb.com/orientdb/ you see a nice visualization of a large graph with the following query:
select * from V limit -1;
So I tried the same query on my dataset but the result is so extremely sluggish that I can't work with it. My dataset is not extremely large (few hundred vertices, couple thousand edges) but still the result is unworkable. I tried all major browsers but with all I have the same result. Also my computer is not underpowered, I have a quad-core i7 with 16GB RAM.
As a very simple example I have the following graph:
BAR --WITHIN---> CITY --LOCATED_IN--> COUNTRY
From the question "Find 'friends of friends' with OrientDB SQL" I was at least able to get an example of how to do this type of query on a graph. For example, I managed to get a subset of my graph as follows:
select expand(
bothE('WITHIN').bothV()
) from Bar where barName='Foo' limit -1
This gets me the graph of 1 Bar vertex, the edge WITHIN and the City vertex. But if I now want to go one step further by also fetching the country in which the city is located, I cannot get this style of query to work for me. I tried this:
select expand(
bothE('WITHIN').bothV()
.bothE('LOCATED_IN').bothV()
) from Bar where barName='Foo' limit -1
This results in the same subset being shown. However, if I first run the first query and then, without clearing the canvas, run the second query, I do get the 3 vertices. So it seems I'm close, but I would like to get all 3 vertices and their edges in one query, not having to run first the one and then the other. Could someone point me in the right direction?
If you want to get all three vertices, it would be much easier to start from the middle (the city) and then go in and out to get the bar and the country. I've tried with a similar little structure:
To get city, bar name and country you can try a query like this:
select name, in("WITHIN").name as barName,out("LOCATED_IN").name as barCountry from (select from City where name='Milan') unwind barName, barCountry
And the output will be:
Hope it helps.
If it is not suitable for your case, let me know.
You could use
traverse * from (select from bar where barName='Foo') while $depth <= 4
Example: I tried with this little graph
and I got
Hope it helps.
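A possible reading of the depth limit of 4, assuming regular (non-lightweight) edges are stored as their own records so that each hop from vertex to vertex costs two depth levels: Bar is at depth 0, the WITHIN edge at 1, City at 2, the LOCATED_IN edge at 3 and Country at 4. The Python sketch below models this with a breadth-first walk over a hypothetical record graph; it is not how OrientDB implements TRAVERSE, and the record names are invented.

```python
from collections import deque


def traverse_depth_limited(links, start, max_depth):
    """Breadth-first walk returning (record, depth) pairs up to `max_depth`."""
    seen = {start}
    order = [(start, 0)]
    queue = deque([(start, 0)])
    while queue:
        record, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand records sitting at the depth limit
        for nxt in links.get(record, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append((nxt, depth + 1))
                queue.append((nxt, depth + 1))
    return order


# Hypothetical records: vertices and edges are both nodes of the walk.
links = {
    "Bar:Foo": ["WITHIN#1"],
    "WITHIN#1": ["Bar:Foo", "City:Milan"],
    "City:Milan": ["WITHIN#1", "LOCATED_IN#1"],
    "LOCATED_IN#1": ["City:Milan", "Country:Italy"],
    "Country:Italy": ["LOCATED_IN#1"],
}
full = traverse_depth_limited(links, "Bar:Foo", 4)
short = traverse_depth_limited(links, "Bar:Foo", 2)
```

Under that assumption a limit of 2 would stop at the city, which matches the symptom in the question of only seeing the bar and the city.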
I am working on a project where we are going to be finding the shortest/fastest route from point A to point B. I've been looking at the tables generated by osm2pgsql, and I'm wondering how to represent a road that becomes obstructed after the OSM data has been loaded into our database. Our project will rely on OSM to map out all of the roads, and we will also have an operator looking at live video footage of roads. If the operator sees that a road is obstructed, say by a downed tree, we want to update the database to reflect this.
I've been looking at all of the columns, and the only one that stands out is barrier. I have been unable to find any documentation on what each column represents and how pgRouting takes each into consideration when creating a route. What I'm looking for is a column that, when pgRouting sees a road in the database, tells it: that road is blocked, skip it.
This is a good question for gis.se...
First, pgRouting can't route over data generated by osm2pgsql, because this data is not a network. You need data generated by osm2po or osm2pgrouting, and this data is quite different.
Second, there is no such column. Every pgRouting function takes an SQL query that selects the data for the route search, so you decide which edges are in this dataset and which are not. It's not a problem to add an extra column to the edges table.
Here is a link to the pgRouting workshop; it will guide you through the whole process, from importing data to the first generated route. It uses osm2pgrouting to import data, but I suggest you use osm2po instead.
So as Jendrusk mentioned, when you generate a route you pass the function an SQL query that selects the edges for the graph you want to solve, e.g. 'select * from edges where the_geom && <bbox>'. You can model blockages as a point and radius, lines, or polygons that you want the route to avoid by adding avoidance zones to the query above, like:
'select * from edges where the_geom && <bbox> and not st_dwithin(the_geom, point, radius) and not st_dwithin(the_geom, line_or_polygon, 0.0)'
If you have lots of these avoidances, put them in a table and do a join to eliminate those edges from the set used to build the graph. If the edges are not there, the route is forced to find a way around the avoidance.
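The effect of leaving blocked edges out of the edge query can be sketched in plain Python. This is not pgRouting; it is a small Dijkstra implementation over a hypothetical four-road network, where shortest_path and the blocked_ids parameter are illustrative names standing in for "edges excluded by the SQL filter".

```python
import heapq


def shortest_path(edges, source, target, blocked_ids=frozenset()):
    """Dijkstra over undirected edges, skipping any edge id in `blocked_ids`.

    edges: iterable of (edge_id, node_a, node_b, cost).
    Returns (total_cost, [nodes]) or None if the target is unreachable.
    """
    graph = {}
    for edge_id, a, b, cost in edges:
        if edge_id in blocked_ids:
            continue  # blocked edges never enter the graph
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    best = {source: 0.0}
    previous = {}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            path = [node]
            while node in previous:
                node = previous[node]
                path.append(node)
            return dist, path[::-1]
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, cost in graph.get(node, []):
            candidate = dist + cost
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    return None


# Hypothetical road network: two ways from A to C.
roads = [(1, "A", "B", 1.0), (2, "B", "C", 1.0), (3, "A", "D", 2.0), (4, "D", "C", 2.0)]
normal = shortest_path(roads, "A", "C")
detour = shortest_path(roads, "A", "C", blocked_ids={2})  # road 2 has a downed tree
```

Marking road 2 as blocked changes nothing in the routing code itself; only the input edge set shrinks, which is the same idea as filtering in the SQL passed to the routing function.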
Background Info
C#
MS MVC 4
Sql Azure
Linq - Identities
Problem at hand:
Selecting records in an Items table where zip code is within a certain range of miles.
Items Table
id (PK)
Title
Body
ZipCode (Int)
Summary of Progress:
I have a class which uses the 2013 US Gazetteer zip code tabulation areas file to gather zip codes and assess distances between zip codes. It is basically a .csv/.txt file that I open into a stream and convert to POCOs in order to process distances. That much of the equation is working fine; however, selecting a list of Items from an Items table based on this list of zip codes is where I'm not sure what to do.
Scenario
User A wants to search for items within a 25-mile radius of zip code 46324.
User A hits search and in the background my class returns a list of 124 zip codes within a 25 mile radius.
Question: What is the best way (performance wise) to retrieve items in my Item table using this list of zipcodes?
Possible Solutions
I thought about creating a dynamic query using the T-SQL IN keyword within my WHERE clause and simply supplying this list as the parameters. This does not seem to be a very performance-oriented way of doing this; however, considering my current architecture, I do not see any other way.
I also thought about incorporating a sort of paging functionality that will only take the first 5 zip codes to return results followed by the next 5 and so on and so on. This will involve more work but it definitely would seem to be a better performance choice.
Any ideas?
I stumbled across your question purely by chance while searching for something else, and I see it's quite old, but I thought I'd give you a comment nonetheless:
What I would do in this case is let the database do the search and the C# do the calculations. You have a class in C# which calculates the distances? Then why not save the distance from each zip code to each zip code in a "lookup table" in SQL?
Doing it this way makes sure that the data is calculated once but you let the sql find the right data for you.
ie:
Create a table with from_zip, to_zip, distance fields
Calculate and populate table once at the beginning
Query by saying "select * from zip_lookup where from_zip = bla and distance between 0 and 100" or something like that
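The steps above can be sketched with an in-memory SQLite database from Python. This is only an illustration of the lookup-table idea, not SQL Azure or LINQ code; the table layout follows the from_zip, to_zip, distance fields named above, while the items schema, the distances and the helper items_within are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE zip_lookup (
        from_zip INTEGER, to_zip INTEGER, distance REAL,
        PRIMARY KEY (from_zip, to_zip)
    );
    CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, zip_code INTEGER);
""")

# Populated once up front; these distances are invented, not real measurements.
conn.executemany(
    "INSERT INTO zip_lookup VALUES (?, ?, ?)",
    [(46324, 46324, 0.0), (46324, 46320, 3.0), (46324, 60601, 24.0), (46324, 46201, 150.0)],
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(1, "Bike", 46324), (2, "Desk", 46320), (3, "Lamp", 60601), (4, "Sofa", 46201)],
)


def items_within(conn, zip_code, miles):
    """Return (id, title, distance) for items within `miles` of `zip_code`."""
    return conn.execute(
        """SELECT i.id, i.title, z.distance
           FROM zip_lookup AS z
           JOIN items AS i ON i.zip_code = z.to_zip
           WHERE z.from_zip = ? AND z.distance <= ?
           ORDER BY z.distance, i.id""",
        (zip_code, miles),
    ).fetchall()


nearby = items_within(conn, 46324, 25)
```

With the join done in the database, the application never has to build a long IN list; the primary key on (from_zip, to_zip) is one plausible way to keep the lookup fast.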