OrientDB: Traversal not producing correct output

Can anybody tell me how to get all nodes and edges in a traversal?
For example, if I run the following query:
select from (TRAVERSE in(), inE() FROM (SELECT FROM Example_Class WHERE @rid = #13:187))
the result changes every time.
Requirement: Get all unordered nodes and edges from a specific node (#13:187 in the example above).

One way to retrieve the edges and nodes encountered while traversing a graph from a particular node, say NODE, is using the query:
> traverse outE(), inV() from NODE
Here is an example. First, let's run just out(), which retrieves only the nodes encountered (here the start, #11:11, and the end, #11:15); the edges show up only as the in_E3 and out_E3 fields of each row:
> traverse out() from #11:11
----+------+------+-----+-------+-------
#   |@RID  |@CLASS|label|in_E3  |out_E3
----+------+------+-----+-------+-------
0   |#11:11|Circle|4    |[#15:2]|[#15:4]
1   |#11:15|Circle|8    |[#15:4]|null
----+------+------+-----+-------+-------
Here the picture is: (#11:11) -[#15:4]> (#11:15)
Now let's formulate the query so that the rows of the result-set include both nodes and edges:
> traverse outE(), inV() from #11:11
----+------+------+-----+-------+-------+------+------
#   |@RID  |@CLASS|label|in_E3  |out_E3 |in    |out
----+------+------+-----+-------+-------+------+------
0   |#11:11|Circle|4    |[#15:2]|[#15:4]|null  |null
1   |#15:4 |E3    |4>8  |null   |null   |#11:15|#11:11
2   |#11:15|Circle|8    |[#15:4]|null   |null  |null
----+------+------+-----+-------+-------+------+------
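To make the outE(), inV() idea concrete, here is a rough plain-Python sketch (not OrientDB code) of a depth-first walk that alternates between vertices and edges, so both kinds of record end up in the result. Edge #15:4 is taken from the example; the source vertex of #15:2 is not shown there, so "#11:10" is a made-up placeholder, and the function name traverse_out is my own.

```python
# Edge RID -> (out vertex, in vertex). #15:4 comes from the example above;
# the source vertex of #15:2 is not shown there, so "#11:10" is a made-up placeholder.
EDGES = {
    "#15:2": ("#11:10", "#11:11"),
    "#15:4": ("#11:11", "#11:15"),
}


def traverse_out(start, edges):
    """Depth-first walk returning every vertex and edge reached via outE() and inV()."""
    # Index the outgoing edges of each vertex once, up front.
    out_edges = {}
    for edge_rid, (source, _target) in edges.items():
        out_edges.setdefault(source, []).append(edge_rid)

    visited, stack, result = set(), [start], []
    while stack:
        rid = stack.pop()
        if rid in visited:
            continue
        visited.add(rid)
        result.append(rid)
        if rid in edges:
            # An edge record: inV() leads to the vertex it points at.
            stack.append(edges[rid][1])
        else:
            # A vertex record: outE() leads to its outgoing edges.
            stack.extend(reversed(out_edges.get(rid, [])))
    return result


print(traverse_out("#11:11", EDGES))  # ['#11:11', '#15:4', '#11:15']
```

The incoming edge #15:2 is never visited from #11:11, which matches the three rows in the result above.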

Executing multiple spark queries and storing as dataframe

I have 3 Spark queries saved in a List, sqlQueries. The first 2 create global temporary views, and the third one runs against those temporary views and fetches some output.
I am able to run a single query using this -
val resultDF = spark.sql(sql)
Then I add partition information on this dataframe object and save it.
In case of multiple queries, I tried executing
sqlQueries.foreach(query => spark.sql(query))
How do I save the output of the third query while still running the other 2 queries?
The 3 queries are just an example; there can be any number of them.
You can write the last query as an INSERT statement to save the results into a table. The queries you run through foreach are executed sequentially.
I am taking the query from your other question as a reference; it needs some modification, as explained in the global-temporary-view part of the SQL section.
After the modification, your query file should look like this:
CREATE GLOBAL TEMPORARY VIEW VIEW_1 AS select a,b from abc
CREATE GLOBAL TEMPORARY VIEW VIEW_2 AS select a,b from global_temp.VIEW_1
select * from global_temp.VIEW_2
To answer this question: you can again use foldLeft so that every query is executed and the result of the last one is kept.
Let's say you have a dataframe
+----+---+---+
|a   |b  |c  |
+----+---+---+
|a   |b  |1  |
|adfs|df |2  |
+----+---+---+
Given the multi-line query file above, you can do the following:
df.createOrReplaceTempView("abc")
val sqlFile = "path to test.sql"
val queryList = scala.io.Source.fromFile(sqlFile).getLines().filterNot(_.isEmpty).toList
val finalresult = queryList.foldLeft(df)((tempdf, query) => sqlContext.sql(query))
finalresult.show(false)
which should give you
+----+---+
|a   |b  |
+----+---+
|a   |b  |
|adfs|df |
+----+---+
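The fold pattern itself can be tried without a Spark cluster. The sketch below is only an analogy: it uses Python's built-in sqlite3 module in place of Spark, plain TEMP views in place of global temporary views, and functools.reduce in place of foldLeft. The helper name run_all is my own, and the ORDER BY is added only to make the printed order predictable.

```python
import sqlite3
from functools import reduce


def run_all(conn, queries):
    """Execute the statements in order and return the rows of the last one."""
    # Like foldLeft in the answer, the accumulator is simply replaced by each
    # new result, so only the final statement's rows survive.
    return reduce(lambda _previous, query: conn.execute(query).fetchall(), queries, [])


# Stand-in for the "abc" dataframe shown above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE abc (a TEXT, b TEXT, c INTEGER)")
conn.executemany("INSERT INTO abc VALUES (?, ?, ?)", [("a", "b", 1), ("adfs", "df", 2)])

queries = [
    "CREATE TEMP VIEW view_1 AS SELECT a, b FROM abc",
    "CREATE TEMP VIEW view_2 AS SELECT a, b FROM view_1",
    "SELECT * FROM view_2 ORDER BY a",
]

result = run_all(conn, queries)
print(result)  # [('a', 'b'), ('adfs', 'df')]
```

The first two statements return no rows; they are run only for their side effect of creating the views, which is why the list can hold any number of setup queries before the final SELECT.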

Spark Structured Streaming operations on rows of a single dataframe

In my problem, there is a data stream of information about package delivery coming in. The data consists of "NumberOfPackages", "Action" (which can be "Loaded", "Delivered" or "In Transit"), "TrackingId", and "Driver".
val streamingData = <filtered data frame based on "Loaded" and "Delivered" Action types only>
The goal is to compare the number of packages at the moment of loading with the number at the moment of delivery and, if they differ, execute a function that calls a REST service with the "TrackingId" as its parameter.
The data looks like this:
+----------------+---------+----------+------+
|NumberOfPackages|Action   |TrackingId|Driver|
+----------------+---------+----------+------+
|5               |Loaded   |a         |Alex  |
|5               |Delivered|a         |Alex  |
|8               |Loaded   |b         |James |
|8               |Delivered|b         |James |
|7               |Loaded   |c         |Mark  |
|3               |Delivered|c         |Mark  |
+----------------+---------+----------+------+
<...more rows in this streaming data frame...>
In this case, we see that for the "TrackingId" equal to "c" the number of packages loaded and delivered isn't the same, so this is where we'd need to call the REST API with the "TrackingId".
I would like to combine rows based on "TrackingId", which will always be unique for each trip. If we get the rows combined based on this tracking id, we could have two columns for number of packages, something like "PackagesAtLoadTime" and "PackagesAtDeliveryTime". Then we could compare these two values for each row and filter the dataframe by those which are not equal.
So far I have tried the groupByKey method with the "TrackingId", but I couldn't find a similar example and my experimental attempts weren't successful.
After I figure out how to "merge" the two rows with the same tracking id together and have a column for each corresponding count of packages, I could define a UDF:
import org.apache.spark.sql.functions.udf
def notEqualPackages = udf((packagesLoaded: Int, packagesDelivered: Int) => packagesLoaded != packagesDelivered)
And use it to filter the rows of the dataframe to contain only those with not matching numbers:
streamingData.where(notEqualPackages(streamingData("packagesLoaded"), streamingData("packagesDelivered")))
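The merge step described above (one row per TrackingId with a column for each count) can be prototyped on a small batch before worrying about streaming. This is a plain-Python sketch, not Structured Streaming code; the function name mismatched_trips is my own, and it assumes each TrackingId has at most one "Loaded" and one "Delivered" row.

```python
from collections import defaultdict

# The sample rows from the table above.
rows = [
    {"NumberOfPackages": 5, "Action": "Loaded", "TrackingId": "a", "Driver": "Alex"},
    {"NumberOfPackages": 5, "Action": "Delivered", "TrackingId": "a", "Driver": "Alex"},
    {"NumberOfPackages": 8, "Action": "Loaded", "TrackingId": "b", "Driver": "James"},
    {"NumberOfPackages": 8, "Action": "Delivered", "TrackingId": "b", "Driver": "James"},
    {"NumberOfPackages": 7, "Action": "Loaded", "TrackingId": "c", "Driver": "Mark"},
    {"NumberOfPackages": 3, "Action": "Delivered", "TrackingId": "c", "Driver": "Mark"},
]

COLUMN_FOR_ACTION = {
    "Loaded": "PackagesAtLoadTime",
    "Delivered": "PackagesAtDeliveryTime",
}


def mismatched_trips(rows):
    """Merge rows per TrackingId and return the ids whose two counts differ."""
    merged = defaultdict(dict)
    for row in rows:
        column = COLUMN_FOR_ACTION.get(row["Action"])
        if column is not None:  # skip any other action, e.g. "In Transit"
            merged[row["TrackingId"]][column] = row["NumberOfPackages"]
    return sorted(
        tracking_id
        for tracking_id, counts in merged.items()
        # Compare only trips for which both events have been seen.
        if len(counts) == 2
        and counts["PackagesAtLoadTime"] != counts["PackagesAtDeliveryTime"]
    )


print(mismatched_trips(rows))  # ['c']
```

In a real stream the "Delivered" row can arrive long after the "Loaded" one, so the per-TrackingId dictionary here stands in for whatever state the streaming job would have to keep between events.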

Using Traverse from to project the records in OrientDB

I'm using the Vehicle History database with OrientDB Studio 2.2.8, and I want to project all of the records of the Automobile class that are made by Kia.
The schema for the database looks like this:
(Automobile) --isModel--> (Model) --isMake--> (Make)
where Automobile, Model, and Make are vertices and isModel and isMake are edge types.
I want to use a traverse statement to return the same result set that I get from this command:
Select expand(in('isMake').in('isModel')) from Make where name = "Kia"
whose result is...
+----+--------+---------+---------+---------+---------+---------+---------+---------+---------+---------+
|#   |@RID    |@CLASS   |color    |convertib|out_isMod|trailerHi|emissions|safety   |out_Purch|VIN      |
+----+--------+---------+---------+---------+---------+---------+---------+---------+---------+---------+
|0   |#17:1441|Automo...|White    |true     |[#24:1...|false    |2016-0...|2014-0...|[#23:5...|840CDC...|
|1   |#17:1576|Automo...|Maroon   |true     |[#24:1...|false    |2010-0...|2004-0...|[#23:5...|E71761...|
|2   |#17:1503|Automo...|Dark Gray|true     |[#24:1...|false    |2009-0...|2016-1...|[#23:5...|FAEB6F...|
+----+--------+---------+---------+---------+---------+---------+---------+---------+---------+---------+
I tried running this:
Select
from (traverse in from Make while $depth <= 2)
where name = "Kia"
I just get one record returned, and it's not of the Automobile class like I expected it to be. It is from Make.
+----+-------+------+----+-------------------------------------+----------------------------+
|#   |@RID   |@CLASS|name|in_isMake                            |out_Sold                    |
+----+-------+------+----+-------------------------------------+----------------------------+
|0   |#15:612|Make  |Kia |[#25:1767,#25:2036,#25:2067,#25:2131]|[#22:5153,#22:5383,#22:5655]|
+----+-------+------+----+-------------------------------------+----------------------------+
Basically, I want to use a Traverse starting from Make to project the three Kia automobiles in the database.
Can you try this?
SELECT FROM (TRAVERSE in()
FROM (SELECT FROM Make WHERE name='Kia'))
WHERE @class='Automobile'
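My reading of why the original attempt returned only the Make record: the name filter was applied to the traversal output, and only the Make has a name of "Kia". Picking the start record by name and filtering the output by class avoids that. The plain-Python sketch below mimics the two filters; the Make and Automobile RIDs come from the question, while the two Model records and the links between records are hypothetical.

```python
# Only the Make and Automobile RIDs come from the question;
# the two Model records and the wiring between records are hypothetical.
RECORDS = {
    "#15:612": {"class": "Make", "name": "Kia"},
    "#16:1": {"class": "Model"},
    "#16:2": {"class": "Model"},
    "#17:1441": {"class": "Automobile"},
    "#17:1576": {"class": "Automobile"},
    "#17:1503": {"class": "Automobile"},
}

# Vertex -> vertices that point at it, i.e. what in() would return.
INCOMING = {
    "#15:612": ["#16:1", "#16:2"],
    "#16:1": ["#17:1441", "#17:1576"],
    "#16:2": ["#17:1503"],
}


def traverse_in(start):
    """Return the start vertex plus everything reachable by following in()."""
    visited, stack = [], [start]
    while stack:
        rid = stack.pop()
        if rid not in visited:
            visited.append(rid)
            stack.extend(INCOMING.get(rid, []))
    return visited


reached = traverse_in("#15:612")

# Filtering the traversal output by name keeps only the Make itself.
by_name = [rid for rid in reached if RECORDS[rid].get("name") == "Kia"]

# Filtering by class keeps the automobiles reached through the models.
by_class = sorted(rid for rid in reached if RECORDS[rid]["class"] == "Automobile")

print(by_name)   # ['#15:612']
print(by_class)  # ['#17:1441', '#17:1503', '#17:1576']
```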

OrientDB: How to flatten nested hierarchy into a single record

I have a structure that looks something like this:
How can I traverse my Page and get back a flat record, so that each row contains all of the data from the root node and its edges? My use case is that I'm producing a CSV file.
So, from the example above, I would like to create a row for each post. Each record should contain all fields from the post, the language name, the page name, and the network name.
From what I can tell, when you do any kind of traversal, it only gives you the result of the final vertex and not any data from the vertices in between.
Try this query:
select *,out('posted_to').name as page,out('posted_to').out('is_language').name as language,out('posted_to').out('is_network').name as network from <class Post> unwind page,language,network
If there are many posts per page, then anchoring the query on the Pages may be more efficient than starting with the Posts.
Ergo:
select focus.in() as post,
focus.name as page,
focus.out("is_language").name as language,
focus.out("is_network").name as network
from (select @this as focus from Page)
unwind post, language, network, page
----+------+-----+----+--------+-------
#   |@CLASS|post |page|language|network
----+------+-----+----+--------+-------
0   |null  |#11:0|1   |Welsh   |1
1   |null  |#11:1|1   |Welsh   |1
2   |null  |#11:2|1   |Welsh   |1
3   |null  |#11:3|1   |Welsh   |1
4   |null  |#11:4|1   |Welsh   |1
5   |null  |#11:5|1   |Welsh   |1
6   |null  |#11:6|1   |Welsh   |1
----+------+-----+----+--------+-------
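Since the stated goal is a CSV file, here is a small plain-Python sketch of the same "anchor on the page, emit one row per post" flattening, written with the standard csv module. The nested pages structure is an assumed in-memory shape (it is not something OrientDB returns as-is); the values are the first three rows of the result above.

```python
import csv
import io

# Assumed in-memory shape: one dict per page, holding its posts.
pages = [
    {
        "name": "1",
        "language": "Welsh",
        "network": "1",
        "posts": ["#11:0", "#11:1", "#11:2"],
    },
]


def flatten(pages):
    """Yield one flat row per post, repeating the fields of its page."""
    for page in pages:
        for post in page["posts"]:
            yield {
                "post": post,
                "page": page["name"],
                "language": page["language"],
                "network": page["network"],
            }


buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=["post", "page", "language", "network"],
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(flatten(pages))
csv_text = buffer.getvalue()
print(csv_text)
```

The page-level fields are repeated on every row, which is the same effect the unwind clause has in the query.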

OrientDB - Update with SUBSELECT

I want to update some rows of my table based on other rows of the same table.
I tried this:
UPDATE MyTable set myField =
(SELECT T1.myField
FROM MyTable T1
WHERE T1.id.substring(start,stop) = MyTable.id.substring(start,stop))
But OrientDB throws an error like this:
com.orientechnologies.orient.core.sql.OCommandSQLParsingException: Error on parsing command at position #XXX: Invalid keyword 'T1' Command:
First of all, in OrientDB you can't use aliases on classes.
In this case you could use $parent.$current in a subquery, something like:
> update MyTable set myField = (
> select myField
> from MyTable
> where myField is null
> and id.substring(8,13) = $parent.$current.id.substring(8,13) and something else...
> ) where myField is null and something else...
Be careful with the length of the id...
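To show what the correlated lookup is meant to do, here is a plain-Python sketch of the same idea: every row with an empty myField takes the value of another row whose id shares the same substring. It rests on two assumptions of mine: that substring(8,13) behaves like the Python slice [8:13], and that the donor rows are the ones whose myField is already set. The ids and the function name fill_missing are made up for illustration.

```python
# Made-up records; characters 8..12 of each id act as the matching key.
records = [
    {"id": "AAAAAAAA12345-x", "myField": "red"},
    {"id": "BBBBBBBB12345-y", "myField": None},
    {"id": "CCCCCCCC99999-z", "myField": None},
    {"id": "short", "myField": None},
]


def fill_missing(records, start=8, stop=13):
    """Copy myField from a row with the same id[start:stop] into rows lacking it."""
    # Index the donor rows by key; the first donor found for a key wins.
    donors = {}
    for record in records:
        if record["myField"] is not None and len(record["id"]) >= stop:
            donors.setdefault(record["id"][start:stop], record["myField"])

    for record in records:
        # Ids shorter than `stop` are skipped, which is the length caveat above.
        if record["myField"] is None and len(record["id"]) >= stop:
            record["myField"] = donors.get(record["id"][start:stop])
    return records


fill_missing(records)
print([record["myField"] for record in records])  # ['red', 'red', None, None]
```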
Best Regards
M.
This is not a string update, but an integer update in place. Using the bundled GratefulDeadConcerts database, you can do:
CONNECT remote:localhost/GratefulDeadConcerts;
SELECT performances FROM v;
----+------+------------
#   |@CLASS|performances
----+------+------------
0   |null  |null
1   |null  |5
2   |null  |1
3   |null  |531
4   |null  |394
----+------+------------
UPDATE v SET performances = eval('performances + 2') WHERE performances IS NOT NULL;
SELECT performances FROM v;
----+------+------------
#   |@CLASS|performances
----+------+------------
0   |null  |null
1   |null  |7
2   |null  |3
3   |null  |533
4   |null  |396
----+------+------------
So the update works on the data in place. I'm fairly new to OrientDB so maybe an expert can tell me if I just did something horribly horribly wrong.
UPDATE
Notice that in your example you are updating the table with values from the same table. That is, from MyTable into MyTable (unless I misunderstood your query) and even within the same row. You can use criteria on the WHERE clause to only update rows of interest. In my example, that was
WHERE performances IS NOT NULL
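For comparison, the effect of that UPDATE with its WHERE clause can be mimicked in a few lines of plain Python (an analogy only, using the five values shown in the console output above):

```python
# The five values from the first SELECT; None stands in for null.
performances = [None, 5, 1, 531, 394]

# Add 2 to every value that is not null; leave nulls untouched,
# which is what WHERE performances IS NOT NULL does.
updated = [value + 2 if value is not None else value for value in performances]

print(updated)  # [None, 7, 3, 533, 396]
```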