I am porting a DB instance from Oracle to OrientDB. In Oracle, we're using OVER (PARTITION BY ...) to filter the data, but I haven't found a usable way to do this in OrientDB.
We have a table of Vehicles and a table of Locations. In our model, a vehicle can have many locations and one or more current locations. In Oracle, we had set up a view that managed the current locations by doing the following in the view definition:
SELECT LOCATION_ID, VEHICLE_ID, TIME
FROM (SELECT LOCATION_ID, VEHICLE_ID, TIME,
MAX(TIME) OVER (PARTITION BY VEHICLE_ID) AS MAX_TIME FROM LOCATION)
WHERE TIME = MAX_TIME ORDER BY VEHICLE_ID
In OrientDB, I've created two vertex types, Vehicle and Location, and an edge type hasLocation, with indexes on VEHICLE_ID, LOCATION_ID, LOCATION_TIME, hasLocation.in and hasLocation.out. I've been able to get a correct result, but it doesn't cover everything we need.
SELECT in.LOCATION_ID as LOCATION_ID from hasLocation where out.VEHICLE_ID = 'Vehicle-12'
AND in.TIME in (
SELECT max(OUT('hasLocation').TIME) as MAX_TIME FROM VEHICLE WHERE
uuid = 'Vehicle-12')
I'm still very early in development, so I'm trying to figure out the best way to lay out the data and still get what we need. Our current model has around 50k vehicles with 10M locations and is growing quickly.
I’m able to get the data for one vehicle with the above implementation, but another common request would be something like “What are the current locations for 100 vehicles?” (and page for 101-200) or “Given 100 vehicle IDs, what are their current locations?”
Any help on how to better lay out the vertices and edges, and on what a better SQL query might look like to get the data I need, would be greatly appreciated. I also played around with storing the locations as a LinkSet/LinkList; I just don’t know what the best “graph DB” layout might be.
Thank you.
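For reference, here is a minimal sketch of what the Oracle view computes, extended to the "given a batch of vehicle IDs" request. It uses Python's sqlite3 module (SQLite 3.25 or later, for window functions) purely as a stand-in for Oracle; the sample rows and the current_locations helper are hypothetical, and this is not an OrientDB query.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Oracle here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE location (location_id TEXT, vehicle_id TEXT, time INTEGER)"
)
conn.executemany(
    "INSERT INTO location VALUES (?, ?, ?)",
    [
        ("L1", "Vehicle-1", 10),
        ("L2", "Vehicle-1", 20),
        ("L3", "Vehicle-1", 20),  # tie: this vehicle has two current locations
        ("L4", "Vehicle-2", 5),
        ("L5", "Vehicle-2", 7),
        ("L6", "Vehicle-3", 1),
    ],
)


def current_locations(conn, vehicle_ids):
    """Return (vehicle_id, location_id) rows at each vehicle's latest time."""
    placeholders = ", ".join("?" for _ in vehicle_ids)
    sql = f"""
        SELECT vehicle_id, location_id
        FROM (SELECT location_id, vehicle_id, time,
                     MAX(time) OVER (PARTITION BY vehicle_id) AS max_time
              FROM location
              WHERE vehicle_id IN ({placeholders}))
        WHERE time = max_time
        ORDER BY vehicle_id, location_id
    """
    return conn.execute(sql, list(vehicle_ids)).fetchall()


rows = current_locations(conn, ["Vehicle-1", "Vehicle-2"])
print(rows)
```

Filtering on the vehicle IDs inside the subquery keeps the window computation limited to the requested batch, which is the shape a paged "100 vehicles at a time" request would need.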
How can I make a join query with a condition, sorted on a related model, in Sails?
Example: I have 4 collections in MongoDB and 4 related models:
User: user_id, name
Follow: user_id, following_id (user id is being followed)
Point: user_id, point
Post: name, content, user_id, created_at
From the post table, I want to find the posts of the users I'm following and sort them by the users' points. Like this raw SQL:
SELECT post.* FROM post
LEFT JOIN user_point up ON up.user_id = post.user_id
WHERE post.user_id IN (1,2,3,4) -- assume my following_user_ids result is 1,2,3,4 in this case, so there is no need to join the follow table
ORDER BY up.point DESC -- highest points returned first
I don't know how to do this with Sails models. I have read many guides but found no help. Almost everyone points to Sails associations, but those only return the related records; they don't apply a where or order by to the original model's results (in this case: post).
I have worked with Yii2, a PHP framework, where I can do this easily:
Post::find()->leftJoin('user_point up', 'up.user_id = post.user_id')->where(['post.user_id' => [1,2,3,4]])->orderBy(['up.point' => SORT_DESC])->all();
I'm stuck in Sails. Many thanks if someone can help me!
Because you're using Mongo, and because you need the full power of normal JOINs, you will probably be forced to use something other than the Sails ORM (e.g. the mongodb package on npm) for queries like that.
Why? See the API documentation for sendNativeQuery(), which states that native query features are only available for SQL-like DBMSs.
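One possible workaround, when a database-side join is unavailable, is to join in application code: fetch the posts and the points separately, then filter and sort in memory. Below is a minimal sketch of that idea in Python. The in-memory lists stand in for the results of two separate queries, the field names follow the question, and posts_by_author_points is a hypothetical helper, not a Sails or MongoDB API.

```python
# Hypothetical rows standing in for the results of two separate queries
# (one per collection); field names follow the question.
posts = [
    {"name": "a", "user_id": 1},
    {"name": "b", "user_id": 3},
    {"name": "c", "user_id": 2},
    {"name": "d", "user_id": 5},  # author is not followed, so filtered out
]
points = [
    {"user_id": 1, "point": 10},
    {"user_id": 2, "point": 50},
    # user 3 has no point row, like an unmatched row in a LEFT JOIN
]


def posts_by_author_points(posts, points, following_ids):
    """Keep posts by followed users, ordered by the author's points (desc)."""
    point_of = {p["user_id"]: p["point"] for p in points}
    wanted = set(following_ids)
    selected = [p for p in posts if p["user_id"] in wanted]
    # Design choice: authors without a point row are treated as lowest.
    return sorted(
        selected,
        key=lambda p: point_of.get(p["user_id"], float("-inf")),
        reverse=True,
    )


result = posts_by_author_points(posts, points, [1, 2, 3, 4])
print([p["name"] for p in result])
```

This only scales while the filtered post set fits comfortably in memory; for larger result sets a database-side aggregation would be the better fit.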
I use PostgreSQL 10 and I want to build queries that have multiple optional parameters.
A user must input an area name, but it is then optional to pick none or any combination of the following: event, event date, category, category date, style.
So a full query could be "all the banks (category), constructed in 1990 (category date) with modern architecture (style), that got renovated in 1992 (event and event date) in the area of NYC (area) ".
My problem is that all those are in different tables, connected by many-to-many tables, so I cannot do something like
SELECT * FROM mytable
WHERE (Event IS NULL OR Event = event)
I don't know if any good will come of simply joining four tables.
I can easily find the area id, since it is required, but beyond that I don't know what the user chose.
Any suggestions on how to approach this with PostgreSQL?
Thanks
It might be optimal to build the entire query dynamically and only join in tables that you know you're going to need in order to apply the user's filters, but it's impractical. You're better off creating a view on the full set of tables. Use LEFT OUTER JOINs to ensure that you don't accidentally filter out valid combinations and index your tables to ensure that the query planner can navigate the table graph quickly. Then query the view with a WHERE clause reflecting only the filters you want to apply.
If performance becomes a concern and you don't mind having non-realtime data, you could use a materialized view to cache the results. Materialized views can be indexed directly, but this is a pretty radical change so don't do this unless you have to.
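As a rough illustration of "query the view with a WHERE clause reflecting only the filters you want to apply", here is a minimal sketch. It uses Python's sqlite3 module as a stand-in for PostgreSQL, and the two-table schema, the building_facts view, the ALLOWED whitelist and the search helper are all hypothetical simplifications of the many-to-many layout described in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE building (id INTEGER PRIMARY KEY, area TEXT, category TEXT,
                           category_year INTEGER, style TEXT);
    CREATE TABLE event (building_id INTEGER, event TEXT, event_year INTEGER);
    INSERT INTO building VALUES
        (1, 'NYC', 'bank', 1990, 'modern'),
        (2, 'NYC', 'bank', 1985, 'gothic'),
        (3, 'NYC', 'school', 1990, 'modern');
    INSERT INTO event VALUES (1, 'renovated', 1992);
    -- LEFT JOIN keeps buildings that have no event at all.
    CREATE VIEW building_facts AS
        SELECT b.id, b.area, b.category, b.category_year, b.style,
               e.event, e.event_year
        FROM building b LEFT JOIN event e ON e.building_id = b.id;
""")

# Only these column names may appear in the generated WHERE clause.
ALLOWED = {"category", "category_year", "style", "event", "event_year"}


def search(conn, area, **filters):
    """Query the view with the required area plus any optional filters."""
    clauses, params = ["area = ?"], [area]
    for column, value in filters.items():
        if column not in ALLOWED:
            raise ValueError(f"unknown filter: {column}")
        if value is not None:  # None means the user left this filter empty
            clauses.append(f"{column} = ?")
            params.append(value)
    sql = (
        "SELECT DISTINCT id FROM building_facts WHERE "
        + " AND ".join(clauses)
        + " ORDER BY id"
    )
    return [row[0] for row in conn.execute(sql, params)]


print(search(conn, "NYC"))
print(search(conn, "NYC", category="bank", event="renovated", event_year=1992))
```

Values are always passed as bound parameters; only whitelisted column names are interpolated into the SQL text.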
I'm not sure the title is the best way to phrase it, here's the structure:
Structure
Here's the db json backup if you want to import it to test it: http://pastebin.com/iw2d3uuy
I'd like to get the Dishes eaten by the Humans living in Continent 1 until a _Parent Human moved to Continent 2.
Which means the target is Dish 1 & 2.
If a parent moved to another Continent, I don't want their dish nor the dishes of their children, even if they move back to Continent 1.
I don't know if it matters, but a Human can have multiple children.
If there wasn't the condition about the children of a Human who has moved from the Continent, this query would have worked:
SELECT expand(in('_Is_in').in('_Lives').in('_Eaten_by'))
FROM Continent WHERE continent_id = 1
But I guess here we're forced to use (among other things)
TRAVERSE out('_Parent') FROM Human WHILE
I've tried to use the WHILE of TRAVERSE with a subquery to get all the Humans I'm interested in before trying to get the Dishes, but I'm not even sure we can use WHILE with a subquery.
I hope the structure will help other users to quickly find out if this query is useful to them. If anyone is wondering, I used the Graph tab of OrientDB Studio to make it, along with GIMP.
As a bonus, if anyone knows the Gremlin syntax, it would also be useful to learn it.
Please feel free to edit this post as you see fit and contribute your thoughts :)
SELECT expand(in('_Eaten_by'))
FROM (TRAVERSE out('_Parent')
FROM (SELECT from Human WHERE in('_Parent').size() = 0)
WHILE out('_Lives').out('_Is_in').continent_id = 1)
Explanation:
TRAVERSE out('_Parent')
FROM (SELECT FROM Human WHERE in('_Parent').size() = 0)
WHILE out('_Lives').out('_Is_in').continent_id = 1
returns Human 1 and 2.
That query traverses Human, starting from Human 1 while the Human is connected to Continent 1.
It starts from in('_Parent').size() = 0 which are the Humans without any _Parent (there's only Human 1 in this case) (size() is the size of the collection of vertices coming in from _Parent).
And SELECT expand(in('_Eaten_by')) FROM
gets the Dishes, starting from the Humans we got from the traversal and going through the edge _Eaten_by.
Note: be sure to always put ' around vertex and edge names, otherwise the names don't seem to be taken into account.
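To make the semantics of the TRAVERSE ... WHILE step concrete, here is a minimal in-memory sketch in Python. The sample graph is hypothetical (the original structure image is not reproduced here) but follows the question's description: a chain of Humans where Human 3 moved to Continent 2 and Human 4 moved back to Continent 1. It illustrates the logic only and is not OrientDB code.

```python
# Hypothetical data consistent with the question's description.
children = {1: [2], 2: [3], 3: [4], 4: []}  # _Parent edges, parent -> child
continent = {1: 1, 2: 1, 3: 2, 4: 1}         # human -> continent_id
dishes = {1: ["Dish 1"], 2: ["Dish 2"], 3: ["Dish 3"], 4: ["Dish 4"]}


def humans_until_move(children, continent, target):
    """Walk down from parentless humans, stopping at anyone outside target."""
    has_parent = {child for kids in children.values() for child in kids}
    roots = [h for h in children if h not in has_parent]
    kept, stack, seen = [], list(roots), set()
    while stack:
        human = stack.pop()
        if human in seen or continent[human] != target:
            continue  # WHILE condition fails: skip and do not descend
        seen.add(human)
        kept.append(human)
        stack.extend(children[human])
    return sorted(kept)


humans = humans_until_move(children, continent, target=1)
eaten = [dish for h in humans for dish in dishes[h]]
print(humans, eaten)
```

Human 4 lives in Continent 1 again but is never reached, because the walk stops at Human 3.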
I'm working with OrientDB (2.2.10) and occasionally I would like to visually inspect my dataset to make sure I'm doing things correctly. On this page of the OrientDB site, http://orientdb.com/orientdb/, you see a nice visualization of a large graph with the following query:
select * from V limit -1;
So I tried the same query on my dataset but the result is so extremely sluggish that I can't work with it. My dataset is not extremely large (few hundred vertices, couple thousand edges) but still the result is unworkable. I tried all major browsers but with all I have the same result. Also my computer is not underpowered, I have a quad-core i7 with 16GB RAM.
As a very simple example I have the following graph:
BAR --WITHIN---> CITY --LOCATED_IN--> COUNTRY
Here: Find "friends of friends" with OrientDB SQL I was able to get at least an example of how to do this type of query on a graph. I managed to get a subset of my graph for example as follows:
select expand(
bothE('WITHIN').bothV()
) from Bar where barName='Foo' limit -1
This gets me the graph of 1 Bar vertex, the WITHIN edge and the City vertex. But if I now want to go one step further and also fetch the country in which the city is located, I cannot get this style of query to work. I tried this:
select expand(
bothE('WITHIN').bothV()
.bothE('LOCATED_IN').bothV()
) from Bar where barName='Foo' limit -1
This results in the same subset being shown. However, if I first run the first query and then, without clearing the canvas, run the second query, I do get the 3 vertices. So it seems I'm close, but I would like to get all 3 vertices and their edges in one query rather than running one and then the other. Could someone point me in the right direction?
If you want to get all three vertices, it would be much easier to start from the middle (city) and then go in and out to get the bar and the country. I've tried with a similar little structure:
To get city, bar name and country you can try a query like this:
select name, in("WITHIN").name as barName,out("LOCATED_IN").name as barCountry from (select from City where name='Milan') unwind barName, barCountry
And the output will be:
Hope it helps.
If it is not suitable for your case, let me know.
You could use
traverse * from (select from bar where barName='Foo') while $depth <= 4
Example: I tried with this little graph
and I got
Hope it helps.
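A likely reason the depth limit is 4 rather than 2: assuming edges are stored as regular (non-lightweight) records, a traverse * visits edge records as well as vertices, so each hop costs two levels. The sketch below mimics that with a plain breadth-first walk in Python; the record names (including 'Italy') are hypothetical, and OrientDB's own traversal strategy may differ, although the depths along a simple chain come out the same.

```python
from collections import deque

# Each record links to its neighbouring records; edges are records too.
links = {
    "Bar:Foo": ["WITHIN#1"],
    "WITHIN#1": ["Bar:Foo", "City:Milan"],
    "City:Milan": ["WITHIN#1", "LOCATED_IN#1"],
    "LOCATED_IN#1": ["City:Milan", "Country:Italy"],
    "Country:Italy": ["LOCATED_IN#1"],
}


def traverse(links, start, max_depth):
    """Return {record: depth} for every record within max_depth of start."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        record = queue.popleft()
        if depth[record] == max_depth:
            continue  # depth limit reached: do not expand further
        for neighbour in links[record]:
            if neighbour not in depth:
                depth[neighbour] = depth[record] + 1
                queue.append(neighbour)
    return depth


print(traverse(links, "Bar:Foo", 4))
```

With a limit of 2 the walk stops at the City record; the Country only appears at depth 4.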
I am setting up a location aware application, as mentioned here. I have since learned a lot more about GIS apps, and have decided to change a few things about the setup I had originally proposed -- I'm now going to use a postgresql database using the postgis extension to allow for geometric fields, and use TIGER/Line data to fill it. The TIGER/Line data seems to offer different data sets in different resolutions (layers) -- there is data for states, counties, zips, blocks, etc. I need a way to associate a post to an address using the finest grain resolution possible.
For instance, if possible, I would like to associate a post with a particular street (finest resolution). If not a street, then a particular zip code (less specific). If not a zip code, then a particular county (less specific), and so on. Sidenote: I want to eventually show these all on a map.
This is what I propose:
Locations
id -- int
street_name -- varchar -- NULL
postal_code_id -- int -- NULL
county_id -- int -- NULL
state_id -- int
Postal Codes
id -- int
code -- varchar
geom -- geometry
Counties
id -- int
name -- varchar
geom -- geometry
The states table is similar, and so on...
As you can see, the locations table would decide the level of specificity by whatever fields are set. The postal codes, counties, and states table are not tied together by foreign key (too complex to determine a proper hierarchy that is valid everywhere), however, I believe that there is a way to determine their relationship using the geometry field (e.g., query what state a certain zip code is contained in or what zip codes belong to a certain state).
I think this is a good setup because if the database grows (let's say I decide to include data for districts or blocks in the database), I can add another table for that data and then add another foreign key to the locations table (e.g., block_id).
Does anybody know of a better way to do this?
Is it possible for a street to belong to two different counties, or to two postal codes? In my country this is possible, especially in cities. If it is possible, then your schema won't work.
That said, I would add the geometry of the streets (e.g. from OpenStreetMap) without linking it to a postal code, county, or even state. Then a simple query that intersects the street geometries with the other tables would give you that information, and you could use it to fill another table that holds those relationships.
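To show why the derived relationship table ends up many-to-many, here is a toy sketch in Python. Axis-aligned boxes stand in for real geometries, and the intersects helper plays the role that a spatial predicate such as PostGIS's ST_Intersects would play in the database; the county and street names and coordinates are made up.

```python
# Axis-aligned boxes (xmin, ymin, xmax, ymax) stand in for real geometries.
counties = {"County A": (0, 0, 10, 10), "County B": (10, 0, 20, 10)}
streets = {"Main St": (8, 4, 12, 5), "Oak Ave": (2, 2, 3, 8)}


def intersects(a, b):
    """True when two boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def street_county_pairs(streets, counties):
    """Build the (street, county) relationship rows from geometry alone."""
    return sorted(
        (street, county)
        for street, street_box in streets.items()
        for county, county_box in counties.items()
        if intersects(street_box, county_box)
    )


pairs = street_county_pairs(streets, counties)
print(pairs)
```

Main St crosses the county boundary and so appears twice, which a single nullable county_id column on the locations table could not represent.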