I want to select all vertices that are connected to another vertex. I am currently using the traverse function in OrientDB. Consider the following example:
> create class professor extends V
> create class course extends V
> insert into professor set name='Smith'
Inserted record 'professor#14:0{name:Smith} v1'
> insert into course set name='Calculus'
Inserted record 'course#15:0{name:Calculus} v1'
> create class teaches extends E
> create edge teaches from #14:0 to #15:0
Created edge '[teaches#16:0{out:#14:0,in:#15:0} v3]'
Now when I try to traverse to find the course(s) that professor Smith teaches I use the following command:
> traverse out_teaches from #14:0
----+-----+---------+-----+-----------+-----+-----
# |#RID |#CLASS |name |out_teaches|out |in
----+-----+---------+-----+-----------+-----+-----
0 |#14:0|professor|Smith|[size=1] |null |null
1 |#16:0|teaches |null |null |#14:0|#15:0
----+-----+---------+-----+-----------+-----+-----
Why does this return to me the edge and not the vertex (course) that I am looking for? What is the appropriate command to return to me the vertex? I want the record for 'Calculus' to be returned.
I expanded your graph a bit to try your query.
If you only want the vertices connected to a starting vertex by the edge 'teaches', use SELECT EXPAND with OUT, IN or BOTH. TRAVERSE is more useful when you want to explore the graph at several depths. In my case "Smith" has the RID #11:0:
select expand(out('teaches')) from (select from Professor where name='Smith')
----+-----+------+------------+----------+----------
# |#RID |#CLASS|name |in_teaches|in_follows
----+-----+------+------------+----------+----------
0 |#12:0|course|Calculus |[size=1] |[size=1]
1 |#12:1|course|Astrophysics|[size=1] |[size=1]
2 |#12:2|course|Law |[size=2] |[size=1]
----+-----+------+------------+----------+----------
or with select expand(out('teaches')) from #11:0 you will obtain the same result:
----+-----+------+------------+----------+----------
# |#RID |#CLASS|name |in_teaches|in_follows
----+-----+------+------------+----------+----------
0 |#12:0|course|Calculus |[size=1] |[size=1]
1 |#12:1|course|Astrophysics|[size=1] |[size=1]
2 |#12:2|course|Law |[size=2] |[size=1]
----+-----+------+------------+----------+----------
or you can obtain all the vertices connected to the professor "Smith", whatever the edge class:
select expand(out()) from professor where name="Smith"
----+-----+----------+------------+----------+----------+------------+----------
# |#RID |#CLASS |name |in_teaches|in_follows|in_studiesAt|in_worksAt
----+-----+----------+------------+----------+----------+------------+----------
0 |#12:0|course |Calculus |[size=1] |[size=1] |null |null
1 |#12:1|course |Astrophysics|[size=1] |[size=1] |null |null
2 |#12:2|course |Law |[size=2] |[size=1] |null |null
3 |#16:0|university|Cambridge |null |null |[size=1] |[size=1]
----+-----+----------+------------+----------+----------+------------+----------
Your query traverse out_teaches from #11:0 seems to list the starting vertex and all of the connected edges with relative IN and OUT vertices:
----+-----+---------+-----+-----------+-----------+-----+-----
# |#RID |#CLASS |name |out_teaches|out_worksAt|out |in
----+-----+---------+-----+-----------+-----------+-----+-----
0 |#11:0|professor|Smith|[size=3] |[size=1] |null |null
1 |#13:0|teaches |null |null |null |#11:0|#12:0
2 |#13:1|teaches |null |null |null |#11:0|#12:1
3 |#13:2|teaches |null |null |null |#11:0|#12:2
----+-----+---------+-----+-----------+-----------+-----+-----
I also tried traverse out_teaches from professor, and the result is similar to the previous query:
----+-----+---------+-----+-----------+-----------+-----+-----
# |#RID |#CLASS |name |out_teaches|out_worksAt|out |in
----+-----+---------+-----+-----------+-----------+-----+-----
0 |#11:0|professor|Smith|[size=3] |[size=1] |null |null
1 |#13:0|teaches |null |null |null |#11:0|#12:0
2 |#13:1|teaches |null |null |null |#11:0|#12:1
3 |#13:2|teaches |null |null |null |#11:0|#12:2
4 |#11:1|professor|Green|[size=1] |[size=1] |null |null
5 |#13:3|teaches |null |null |null |#11:1|#12:2
----+-----+---------+-----+-----------+-----------+-----+-----
The correct syntax for selecting the courses (at least in OrientDB 2.1) would be based on out('teaches'). For example:
> select expand(out('teaches')) from (select from Professor where name='Smith')
----+-----+------+--------+----------
# |#RID |#CLASS|name |in_teaches
----+-----+------+--------+----------
0 |#12:0|Course|Calculus|[size=1]
----+-----+------+--------+----------
That is, there's just one vertex, as expected.
Please note that 'traverse' is used for a different purpose. It involves an iterative procedure for traversing graphs.
out_teaches
"out_teaches" is a reference to an edge. Using OrientDB 2.1.7, the response I obtained for your "out_teaches" query is as follows:
> select expand(out_teaches) from (select from Professor where name='Smith')
----+-----+-------+-----+-----
# |#RID |#CLASS |out |in
----+-----+-------+-----+-----
0 |#13:0|teaches|#11:0|#12:0
----+-----+-------+-----+-----
Again, this is what one would expect - an edge.
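To make the difference concrete, here is a small in-memory model in plain Python. It is only a sketch of the idea, not OrientDB's API: the dictionary layout and the helper names `expand_out_field` and `out` are assumptions made for illustration. Vertices keep a list of edge RIDs in their out_teaches field, and each edge record stores its two endpoints, so following the raw field lands on edges, while out('teaches') takes one more hop to the vertices.

```python
# Toy model of the question's graph: a vertex keeps edge RIDs in out_<class>,
# and each edge record stores the RIDs of its two endpoints.
records = {
    "#14:0": {"class": "professor", "name": "Smith", "out_teaches": ["#16:0"]},
    "#15:0": {"class": "course", "name": "Calculus", "in_teaches": ["#16:0"]},
    "#16:0": {"class": "teaches", "out": "#14:0", "in": "#15:0"},
}


def expand_out_field(rid, edge_class):
    """Follow the raw out_<class> field: one hop, which lands on edge records."""
    return list(records[rid].get("out_" + edge_class, []))


def out(rid, edge_class):
    """Mimic out('<class>'): hop over each edge to the vertex on its 'in' side."""
    return [records[edge]["in"] for edge in expand_out_field(rid, edge_class)]


edges = expand_out_field("#14:0", "teaches")
courses = out("#14:0", "teaches")
print(edges)    # RIDs of 'teaches' edge records
print(courses)  # RIDs of course vertices
print([records[c]["name"] for c in courses])
```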
Your query works fine for me.
In my case the RIDs are #11:0 for the professor, #12:0 for the course and #13:0 for the teaches edge.
Rerun your query, or try the following:
traverse both('teaches') from #12:0
I have a dataframe looking like this (just some example values):
| id | timestamp | mode | trip | journey | value |
1 2021-09-12 23:59:19.717000 walking 1 1 1.21
1 2021-09-12 23:59:38.617000 walking 1 1 1.36
1 2021-09-12 23:59:38.617000 driving 2 1 1.65
2 2021-09-11 23:52:09.315000 walking 4 6 1.04
I want to create new columns which I fill with the previous and next mode. Something like this:
| id | timestamp | mode | trip | journey | value | prev | next
1 2021-09-12 23:59:19.717000 walking 1 1 1.21 bus driving
1 2021-09-12 23:59:38.617000 walking 1 1 1.36 bus driving
1 2021-09-12 23:59:38.617000 driving 2 1 1.65 walking walking
2 2021-09-11 23:52:09.315000 walking 4 6 1.0 walking driving
I have tried to partition by id, trip, journey and mode and ordered by timestamp. Then I tried to use lag() and lead() but I am not sure these work on other partitions. I came across the Window.unboundedPreceding and Window.unboundedFollowing, however I am not sure I completely understand how these work. In my mind I think that if I partition the data as explained above I will always just need the last value of mode from the previous partition and to fill the next I could reorder the partition from ascending to descending on the timestamp and then do the same to fill the next column. However, I am unsure how I get the last value of the previous partition.
I have tried this:
w = Window.partitionBy("id", "journey", "trip").orderBy(col("timestamp").asc())
w_prev = w.rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)
df = df.withColumn("prev", first("mode").over(w_prev))
Code examples and explanations using PySpark would be much appreciated!
Based on what I understand, you could do something like this.
Partition by id and journey. Each journey contains several trips, so order by trip and then by timestamp, and use lag and lead to get the previous and next mode:
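from pyspark.sql import Window, functions as F

# One window per (id, journey), ordered by trip and then by timestamp
w = Window().partitionBy('id', 'journey').orderBy('trip', 'timestamp')
df.withColumn('prev', F.lag('mode', 1).over(w)) \
    .withColumn('next', F.lead('mode', 1).over(w)) \
    .show(truncate=False)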
Output:
+---+--------------------------+-------+----+-------+-----+-------+-------+
|id |timestamp |mode |trip|journey|value|prev |next |
+---+--------------------------+-------+----+-------+-----+-------+-------+
|1 |2021-09-12 23:59:19.717000|walking|1 |1 |1.21 |null |walking|
|1 |2021-09-12 23:59:38.617000|walking|1 |1 |1.36 |walking|driving|
|1 |2021-09-12 23:59:38.617000|driving|2 |1 |1.65 |walking|null |
|2 |2021-09-11 23:52:09.315000|walking|4 |6 |1.04 |null |null |
+---+--------------------------+-------+----+-------+-----+-------+-------+
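For readers unsure what lag and lead do inside a partition, the same logic can be sketched in plain Python without Spark. This is only an illustration of the window semantics, assuming the four sample rows from the question (the value column is left out). Within each (id, journey) group the rows are sorted by (trip, timestamp), and each row looks one position back and one position forward.

```python
from itertools import groupby
from operator import itemgetter

rows = [
    {"id": 1, "timestamp": "2021-09-12 23:59:19.717000", "mode": "walking", "trip": 1, "journey": 1},
    {"id": 1, "timestamp": "2021-09-12 23:59:38.617000", "mode": "walking", "trip": 1, "journey": 1},
    {"id": 1, "timestamp": "2021-09-12 23:59:38.617000", "mode": "driving", "trip": 2, "journey": 1},
    {"id": 2, "timestamp": "2021-09-11 23:52:09.315000", "mode": "walking", "trip": 4, "journey": 6},
]

partition_key = itemgetter("id", "journey")   # partitionBy('id', 'journey')
order_key = itemgetter("trip", "timestamp")   # orderBy('trip', 'timestamp')

result = []
ordered = sorted(rows, key=lambda r: (partition_key(r), order_key(r)))
for _, group in groupby(ordered, key=partition_key):
    part = list(group)
    for i, row in enumerate(part):
        # lag(mode, 1): previous row in the partition, None at the start
        prev_mode = part[i - 1]["mode"] if i > 0 else None
        # lead(mode, 1): next row in the partition, None at the end
        next_mode = part[i + 1]["mode"] if i + 1 < len(part) else None
        result.append({**row, "prev": prev_mode, "next": next_mode})

for r in result:
    print(r["id"], r["trip"], r["mode"], r["prev"], r["next"])
```

Neither function ever looks across a partition boundary, which is why the first and last row of each journey get null.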
EDIT:
As the OP asked, to get the previous and next mode per trip (the same values on every row of a trip), you can do this:
# Used for taking the latest record from same id, trip, journey
w = Window().partitionBy('id', 'trip', 'journey').orderBy(F.col('timestamp').desc())
# Used to calculate prev and next mode
w1 = Window().partitionBy('id', 'journey').orderBy('trip')
# First take only the latest rows for a particular combination of id, trip, journey
# Second, use the filtered rows to get prev and next modes
df2 = df.withColumn('rn', F.row_number().over(w)) \
    .filter(F.col('rn') == 1) \
    .withColumn('prev', F.lag('mode', 1).over(w1)) \
    .withColumn('next', F.lead('mode', 1).over(w1)) \
    .drop('rn')
df2.show(truncate=False)
Output:
+---+--------------------------+-------+----+-------+-----+-------+-------+
|id |timestamp |mode |trip|journey|value|prev |next |
+---+--------------------------+-------+----+-------+-----+-------+-------+
|1 |2021-09-12 23:59:38.617000|walking|1 |1 |1.36 |null |driving|
|1 |2021-09-12 23:59:38.617000|driving|2 |1 |1.65 |walking|null |
|2 |2021-09-11 23:52:09.315000|walking|4 |6 |1.04 |null |null |
+---+--------------------------+-------+----+-------+-----+-------+-------+
# Finally, join the calculated DF with the original DF to get prev and next mode
final_df = df.alias('a').join(df2.alias('b'), ['id', 'trip', 'journey'], how='left') \
    .select('a.*', 'b.prev', 'b.next')
final_df.show(truncate=False)
Output:
+---+----+-------+--------------------------+-------+-----+-------+-------+
|id |trip|journey|timestamp |mode |value|prev |next |
+---+----+-------+--------------------------+-------+-----+-------+-------+
|1 |1 |1 |2021-09-12 23:59:19.717000|walking|1.21 |null |driving|
|1 |1 |1 |2021-09-12 23:59:38.617000|walking|1.36 |null |driving|
|1 |2 |1 |2021-09-12 23:59:38.617000|driving|1.65 |walking|null |
|2 |4 |6 |2021-09-11 23:52:09.315000|walking|1.04 |null |null |
+---+----+-------+--------------------------+-------+-----+-------+-------+
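The three steps above (keep the latest row per trip, compute prev and next across trips, join back) can also be followed in plain Python. This is a sketch of the logic only, not a replacement for the Spark code. It assumes the sample rows from the question and that timestamps share one fixed format, so they can be compared as strings.

```python
from collections import defaultdict

rows = [
    {"id": 1, "timestamp": "2021-09-12 23:59:19.717000", "mode": "walking", "trip": 1, "journey": 1},
    {"id": 1, "timestamp": "2021-09-12 23:59:38.617000", "mode": "walking", "trip": 1, "journey": 1},
    {"id": 1, "timestamp": "2021-09-12 23:59:38.617000", "mode": "driving", "trip": 2, "journey": 1},
    {"id": 2, "timestamp": "2021-09-11 23:52:09.315000", "mode": "walking", "trip": 4, "journey": 6},
]

# Step 1: keep the latest row per (id, journey, trip), like row_number() == 1
latest = {}
for r in rows:
    key = (r["id"], r["journey"], r["trip"])
    if key not in latest or r["timestamp"] > latest[key]["timestamp"]:
        latest[key] = r

# Step 2: lag/lead over the trips of each (id, journey), ordered by trip
trips_by_journey = defaultdict(list)
for id_, journey, trip in sorted(latest):
    trips_by_journey[(id_, journey)].append(trip)

neighbours = {}
for (id_, journey), trips in trips_by_journey.items():
    for i, trip in enumerate(trips):
        prev_mode = latest[(id_, journey, trips[i - 1])]["mode"] if i > 0 else None
        next_mode = latest[(id_, journey, trips[i + 1])]["mode"] if i + 1 < len(trips) else None
        neighbours[(id_, journey, trip)] = (prev_mode, next_mode)

# Step 3: join the per-trip result back onto every original row
final = []
for r in rows:
    prev_mode, next_mode = neighbours[(r["id"], r["journey"], r["trip"])]
    final.append({**r, "prev": prev_mode, "next": next_mode})

for r in final:
    print(r["id"], r["trip"], r["mode"], r["prev"], r["next"])
```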
I use OrientDB 2.2.35. I insert documents into it until a conflict occurs.
When I check the record version after the conflict, it has not changed during the insertion. In the example below you can see the version of #18:0 before and after I insert an edge (create edge mye from #18:0 to #19:0).
Error:
com.orientechnologies.orient.core.exception.OConcurrentModificationException:
Cannot UPDATE the record #18:0 because the version is not the latest.
Probably you are updating an old record or it has been modified by
another user (db=v2 your=v1)
orientdb {db=TestDB}> select * from #18:0
+----+-----+------+----+------------------------------------------------------------------------+
|# |#RID |#CLASS|id |out_MyE |
+----+-----+------+----+------------------------------------------------------------------------+
|0 |#18:0|MyV |1 |[#22:0,#22:1,#22:2,#22:3,#22:4,#22:5,#22:6,#22:7,#22:8,#22:9(size=5000)]|
+----+-----+------+----+------------------------------------------------------------------------+
1 item(s) found. Query executed in 0.002 sec(s).
orientdb {db=TestDB}> load record #18:0
DOCUMENT #class:MyV #rid:#18:0 #version:2
+----+-------+------------------------------------------------------------------------+
|# |NAME |VALUE |
+----+-------+------------------------------------------------------------------------+
|0 |id |1 |
|1 |out_MyE|[#22:0,#22:1,#22:2,#22:3,#22:4,#22:5,#22:6,#22:7,#22:8,#22:9(size=5000)]|
+----+-------+------------------------------------------------------------------------+
OK
orientdb {db=TestDB}> create edge mye from #18:0 to #19:0
+----+--------+------+-----+-----+
|# |#RID |#CLASS|out |in |
+----+--------+------+-----+-----+
|0 |#22:5250|MyE |#18:0|#19:0|
+----+--------+------+-----+-----+
Created '1' edges in 0.017000 sec(s).
orientdb {db=TestDB}> select * from #18:0
+----+-----+------+----+------------------------------------------------------------------------+
|# |#RID |#CLASS|id |out_MyE |
+----+-----+------+----+------------------------------------------------------------------------+
|0 |#18:0|MyV |1 |[#22:0,#22:1,#22:2,#22:3,#22:4,#22:5,#22:6,#22:7,#22:8,#22:9(size=5001)]|
+----+-----+------+----+------------------------------------------------------------------------+
1 item(s) found. Query executed in 0.001 sec(s).
orientdb {db=TestDB}> load record #18:0
DOCUMENT #class:MyV #rid:#18:0 #version:2
+----+-------+------------------------------------------------------------------------+
|# |NAME |VALUE |
+----+-------+------------------------------------------------------------------------+
|0 |id |1 |
|1 |out_MyE|[#22:0,#22:1,#22:2,#22:3,#22:4,#22:5,#22:6,#22:7,#22:8,#22:9(size=5001)]|
+----+-------+------------------------------------------------------------------------+
OK
This is a common issue, caused by a wrong approach to concurrency or transactions.
You will need to troubleshoot the cause and either write fail-safe code or change your graph consistency level. See:
OrientDB | Troubleshooting OConcurrentModificationException
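As one possible reading of "fail-safe code", here is a minimal sketch of optimistic version checking with a retry loop, in plain Python. `VersionedStore`, `ConcurrentModificationError` and `update_with_retry` are hypothetical names made up for this illustration and are not part of the OrientDB API. The idea is simply to reload the record and reapply the change whenever the saved version turns out to be stale.

```python
class ConcurrentModificationError(Exception):
    """Raised when a write is based on a stale record version."""


class VersionedStore:
    """Tiny in-memory stand-in for a store that checks record versions."""

    def __init__(self):
        self._data = {}  # rid -> (version, value)

    def load(self, rid):
        return self._data.get(rid, (0, None))

    def save(self, rid, value, expected_version):
        current_version, _ = self.load(rid)
        if current_version != expected_version:
            raise ConcurrentModificationError(
                f"db=v{current_version} your=v{expected_version}"
            )
        self._data[rid] = (current_version + 1, value)


def update_with_retry(store, rid, change, max_retries=5):
    """Reload, reapply the change and save again whenever the version is stale."""
    for _ in range(max_retries):
        version, value = store.load(rid)
        try:
            store.save(rid, change(value), expected_version=version)
            return
        except ConcurrentModificationError:
            continue  # another writer got there first; retry on fresh data
    raise ConcurrentModificationError(f"gave up on {rid} after {max_retries} attempts")


store = VersionedStore()
store.save("#18:0", ["#22:0"], expected_version=0)
update_with_retry(store, "#18:0", lambda edges: edges + ["#22:1"])
print(store.load("#18:0"))
```

A write that still carries an old version number (for example `expected_version=1` after the update above) is rejected, which mirrors the "db=v2 your=v1" message in the question.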
How can the result of %ROWCOUNT be displayed in an SQL statement?
Example
Select top 10 * from myTable.
I would like the results to include a row number for each row returned in the result set, for example:
+----------+--------+---------+
|rowNumber |Column1 |Column2 |
+----------+--------+---------+
|1 |A |B |
|2 |C |D |
+----------+--------+---------+
There is no simple way to do it. You can add an SQL procedure with this functionality and use it in your SQL statements.
For example, with this class:
Class Sample.Utils Extends %RegisteredObject
{
ClassMethod RowNumber(Args...) As %Integer [ SqlProc, SqlName = "ROW_NUMBER" ]
{
quit $increment(%rownumber)
}
}
and then, you can use it in this way:
SELECT TOP 10 Sample.ROW_NUMBER(id) rowNumber, id,name,dob
FROM sample.person
ORDER BY ID desc
You will get something like below
+-----------+-------+-------------------+-----------+
|rowNumber |ID |Name |DOB |
+-----------+-------+-------------------+-----------+
|1 |200 |Quigley,Neil I. |12/25/1999 |
|2 |199 |Zevon,Imelda U. |04/22/1955 |
|3 |198 |O'Brien,Frances I. |12/03/1944 |
|4 |197 |Avery,Bart K. |08/20/1933 |
|5 |196 |Ingleman,Angelo F. |04/14/1958 |
|6 |195 |Quilty,Frances O. |09/12/2012 |
|7 |194 |Avery,Susan N. |05/09/1935 |
|8 |193 |Hanson,Violet L. |05/01/1973 |
|9 |192 |Zemaitis,Andrew H. |03/07/1924 |
|10 |191 |Presley,Liza N. |12/27/1978 |
+-----------+-------+-------------------+-----------+
If you are willing to rewrite your query, you can use a view counter to do what you are looking for; it is described in the docs.
The short version: move your query into a FROM clause subquery and use the special field %vid.
SELECT v.%vid AS Row_Counter, Name
FROM (SELECT TOP 10 Name FROM Sample.Person ORDER BY Name) v
Row_Counter Name
1 Adam,Thelma P.
2 Adam,Usha J.
3 Adams,Milhouse A.
4 Allen,Xavier O.
5 Avery,James R.
6 Avery,Kyra G.
7 Bach,Ted J.
8 Bachman,Brian R.
9 Basile,Angelo T.
10 Basile,Chad L.
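If the numbering can happen in the client rather than in SQL, the same idea applies: let the query fix the order, then number the rows as they come back. This is a different technique from %vid, sketched here in Python with the standard sqlite3 module as a stand-in database (not Caché). The table and its three names are sample data for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT)")
conn.executemany(
    "INSERT INTO person (name) VALUES (?)",
    [("Basile,Chad L.",), ("Adam,Thelma P.",), ("Avery,Kyra G.",)],
)

# The query decides the order; enumerate() supplies the running row number.
cursor = conn.execute("SELECT name FROM person ORDER BY name LIMIT 10")
numbered = [(row_number, name) for row_number, (name,) in enumerate(cursor, start=1)]

for row_number, name in numbered:
    print(row_number, name)
```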
I have a structure that looks something like this:
How can I traverse from my Page and get back flat records, so that each row holds all of the data from the root node and its edges? My use case is producing a CSV file.
So, from the example above, I would like to create a row for each post. Each record should contain all fields from the post, plus the language name, the page name and the network name.
From what I can tell, when you do any kind of traversal, it only gives you the result of the final vertex and not any data from the vertices in between.
Try this query:
select *,
out('posted_to').name as page,
out('posted_to').out('is_language').name as language,
out('posted_to').out('is_network').name as network
from <class Post>
unwind page, language, network
If there are many posts per page, then anchoring the query on the Pages may be more efficient than starting with the Posts.
Ergo:
select focus.in() as post,
focus.name as page,
focus.out("is_language").name as language,
focus.out("is_network").name as network
from (select #this as focus from Page)
unwind post, language, network, page
----+------+-----+----+--------+-------
# |#CLASS|post |page|language|network
----+------+-----+----+--------+-------
0 |null |#11:0|1 |Welsh |1
1 |null |#11:1|1 |Welsh |1
2 |null |#11:2|1 |Welsh |1
3 |null |#11:3|1 |Welsh |1
4 |null |#11:4|1 |Welsh |1
5 |null |#11:5|1 |Welsh |1
6 |null |#11:6|1 |Welsh |1
----+------+-----+----+--------+-------
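For the CSV use case, the flattening itself can be pictured in plain Python: walk from each post to its page, then from the page to its language and network, and collect one flat row per post. Everything below is hypothetical sample data (the names "Page one" and "Twitter", the `title` field and the dictionary layout are assumptions); only the edge names posted_to, is_language and is_network come from the queries above.

```python
import csv
import io

# Hypothetical sample data: each post points at a page, and each page points
# at one language and one network.
languages = {"L1": {"name": "Welsh"}}
networks = {"N1": {"name": "Twitter"}}
pages = {"P1": {"name": "Page one", "is_language": "L1", "is_network": "N1"}}
posts = [
    {"id": "#11:0", "title": "Hello", "posted_to": "P1"},
    {"id": "#11:1", "title": "World", "posted_to": "P1"},
]


def flatten(post):
    """Walk post -> page -> language/network and collect one flat row."""
    page = pages[post["posted_to"]]
    return {
        "post_id": post["id"],
        "title": post["title"],
        "page": page["name"],
        "language": languages[page["is_language"]]["name"],
        "network": networks[page["is_network"]]["name"],
    }


flat_rows = [flatten(p) for p in posts]

# Write the flat rows as CSV (to an in-memory buffer here).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(flat_rows[0]))
writer.writeheader()
writer.writerows(flat_rows)
print(buffer.getvalue())
```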
I am struggling with what may be the simplest problem ever; my limited SQL knowledge keeps me from solving it. I am trying to build an SQL query that shows JobTitle, Note and NoteType. The catch: the first job doesn't have any notes, but it should still appear in the results, and system notes should never be displayed. The expected result looks like this:
Result:
--------------------------------------------
|ID |Title |Note |NoteType |
--------------------------------------------
|1 |FirstJob |NULL |NULL |
|2 |SecondJob |CustomNot1|1 |
|2 |SecondJob |CustomNot2|1 |
|3 |ThirdJob |NULL |NULL |
--------------------------------------------
My query (doesn't work, doesn't display third job)
SELECT J.ID, J.Title, N.Note, N.NoteType
FROM JOB J
LEFT OUTER JOIN NOTE N ON N.JobId = J.ID
WHERE N.NoteType IS NULL OR N.NoteType = 1
My Tables:
My JOB Table
----------------------
|ID |Title |
----------------------
|1 |FirstJob |
|2 |SecondJob |
|3 |ThirdJob |
----------------------
My NOTE Table
--------------------------------------------
|ID |JobId |Note |NoteType |
--------------------------------------------
|1 |2 |CustomNot1|1 |
|2 |2 |CustomNot2|1 |
|3 |2 |SystemNot1|2 |
|4 |2 |SystemNot3|2 |
|5 |3 |SystemNot1|2 |
--------------------------------------------
This can't be true together (NoteType can't be NULL as well as 1 at the same time):
WHERE N.NoteType IS NULL AND N.NoteType = 1
You may want to use OR instead to check if NoteType is either NULL or 1.
WHERE N.NoteType IS NULL OR N.NoteType = 1
EDIT: With the corrected query, your third job will still not be retrieved: the join matches on the job ID, but the joined row is then filtered out by the WHERE condition.
Try the following as a workaround to get the third job with null values:
SELECT J.ID, J.Title, N.Note, N.NoteType
FROM JOB J
LEFT OUTER JOIN
( SELECT JobId, Note, NoteType FROM NOTE
WHERE NoteType IS NULL OR NoteType = 1) N
ON N.JobId = J.ID
Just exclude the system notes by using a sub-select:
select * from job j
left outer join (
select * from note where notetype!=2
) n
on j.id=n.jobid;
If you filter on columns of the joined table in the WHERE clause, the left outer join may behave like an inner join.
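The effect is easy to reproduce with the JOB and NOTE tables from the question. The sketch below uses Python's standard sqlite3 module as a stand-in database (the question does not say which engine is used, so behaviour on other engines is assumed to match for this standard SQL). It runs the WHERE version and the sub-select version side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE JOB (ID INTEGER, Title TEXT);
    CREATE TABLE NOTE (ID INTEGER, JobId INTEGER, Note TEXT, NoteType INTEGER);
    INSERT INTO JOB VALUES (1, 'FirstJob'), (2, 'SecondJob'), (3, 'ThirdJob');
    INSERT INTO NOTE VALUES
        (1, 2, 'CustomNot1', 1), (2, 2, 'CustomNot2', 1),
        (3, 2, 'SystemNot1', 2), (4, 2, 'SystemNot3', 2),
        (5, 3, 'SystemNot1', 2);
""")

# Filter in WHERE: ThirdJob joins to its system note first, then that row is dropped.
filter_in_where = conn.execute("""
    SELECT J.ID, J.Title, N.Note, N.NoteType
    FROM JOB J
    LEFT OUTER JOIN NOTE N ON N.JobId = J.ID
    WHERE N.NoteType IS NULL OR N.NoteType = 1
    ORDER BY J.ID, N.ID
""").fetchall()

# Filter in a sub-select: system notes are removed before the join, so ThirdJob
# finds no match and is kept with NULLs.
filter_in_subselect = conn.execute("""
    SELECT J.ID, J.Title, N.Note, N.NoteType
    FROM JOB J
    LEFT OUTER JOIN (SELECT * FROM NOTE WHERE NoteType = 1) N ON N.JobId = J.ID
    ORDER BY J.ID, N.ID
""").fetchall()

print(filter_in_where)
print(filter_in_subselect)
```

Only the second result contains ThirdJob, with NULL in the Note and NoteType columns.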