Unable to create an edge from one node to another - OrientDB

How can I create an edge between two class instances in OrientDB?
I have a class Xyz in the database with the following records:
# | RID   | name
--+-------+-----
0 | #99:6 | abc
1 | #99:7 | xyz
Now when I try to create an edge between these two records, I get the following error:
orientdb> create edge E1 from #99:6 to #99:7
Error: com.orientechnologies.orient.core.exception.OCommandExecutionException:
Error on execution of command: OCommandSQL [text=create edge E1 from #99:6 to #99:7]
Error: com.orientechnologies.orient.core.exception.OValidationException: The
field 'OGraphEdge.out' has been declared as LINK of type 'OGraphVertex' but the
value is the document #99:6 of class 'Xyz'
orientdb>
Can I attach a set of labels/tags/properties to an edge?
How can I create edges in Java? Is there a Java API instead of calling SQL?

It seems that record #99:6 belongs to a class that doesn't extend the vertex base class V (OGraphVertex). Make Xyz a subclass of the vertex class (for example with ALTER CLASS Xyz SUPERCLASS V, or OGraphVertex on older releases) and the CREATE EDGE command should then succeed.

How do I create nested dictionaries with upsert in KDB?

Why does the d[`c] assignment not work here?
d: `a`b!(1;2)
d
a| 1
b| 2
d[`c]: d
'type
[0] d[`c]: d
(PS it doesn't work with any dictionary, not just the recursive example shown here)
Your attempted assignment fails because you're trying to add to a "typed" dictionary (the type being long, in this case). You'll encounter the same error when trying to add a key-value pair with, for example, a symbol as the value:
q)d[`c]:`s
'type
[0] d[`c]:`s
You can get around this by using a dictionary without a specified type for the values:
q)d:enlist[`]!enlist(::)
q)d[`a]:12.5
q)d[`b]:d
q)d
| ::
a| 12.5
b| ``a!(::;12.5)

Left Join errors out: org.apache.spark.sql.AnalysisException: Detected implicit cartesian product [duplicate]

This question already has answers here:
spark 2.4.0 gives "Detected implicit cartesian product" exception for left join with empty right DF
"left join" requires either "spark.sql.crossJoin.enabled=true" or calling "persist()" on one dataframe.
SELECT * FROM LHS left join RHS on LHS.R = RHS.R
How do I make the "left join" work without setting "spark.sql.crossJoin.enabled=true" and without persisting a dataframe?
The exception below occurs in both Spark 2.3.3 and 2.4.4.
Exception in thread "main" org.apache.spark.sql.AnalysisException: Detected implicit cartesian product for LEFT OUTER join between logical plans
OneRowRelation
and ...
Join condition is missing or trivial.
Either: use the CROSS JOIN syntax to allow cartesian products between these
relations, or: enable implicit cartesian products by setting the configuration
variable spark.sql.crossJoin.enabled=true;
Spark 2.4.3 using the DataFrame API:
scala> var lhs = spark.createDataFrame(Seq((1,"sda"),(2,"abc"))).toDF("id","value")
scala> var rhs = spark.createDataFrame(Seq((2,"abc"),(3,"xyz"))).toDF("id1","value1")
scala> lhs.join(rhs,col("id")===col("id1"),"left_outer")
scala> lhs.join(rhs,col("id")===col("id1"),"left_outer").show
+---+-----+----+------+
| id|value| id1|value1|
+---+-----+----+------+
| 1| sda|null| null|
| 2| abc| 2| abc|
+---+-----+----+------+
Not facing any issue.
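When the exception does show up for the SQL query (typically when one side of the join is empty or reduces to a trivial plan such as OneRowRelation), the two workarounds named in the question look roughly like this. This is only a sketch against the lhs/rhs DataFrames defined above, with col imported from org.apache.spark.sql.functions:
scala> // Option 1: allow implicit cartesian products for this session
scala> spark.conf.set("spark.sql.crossJoin.enabled", "true")

scala> // Option 2: persist one side so the optimizer no longer sees a trivial plan
scala> rhs.persist()
scala> lhs.join(rhs, col("id") === col("id1"), "left_outer").show()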

How to parse a URL in Spark SQL (Scala)

I am using the following code to parse a URL, but it throws an error:
val b = Seq(("http://spark.apache.org/path?query=1"),("https://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/api/sql/#negative")).toDF("url_col")
.withColumn("host",parse_url($"url_col","HOST"))
.withColumn("query",parse_url($"url_col","QUERY"))
.show(false)
Error:
<console>:285: error: not found: value parse_url
.withColumn("host",parse_url($"url_col","HOST"))
^
<console>:286: error: not found: value parse_url
.withColumn("query",parse_url($"url_col","QUERY"))
^
Kindly guide me on how to parse a URL into its different parts.
The answer by @Ramesh is correct, but you might also want a hacky way to use this function without SQL queries :)
The hack lies in the fact that the callUDF function calls not only UDFs but any available function.
So you can write:
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
b.withColumn("host", callUDF("parse_url", $"url_col", lit("HOST"))).
withColumn("query", callUDF("parse_url", $"url_col", lit("QUERY"))).
show(false)
Edit: after this pull request is merged, you can just use parse_url like a normal function. The PR was made after this question :)
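For completeness, once you are on a Spark version where that PR has landed and parse_url is exposed as a regular function, the call looks roughly like this (a sketch; the exact version and the Column-based signature are assumptions on my part):
import org.apache.spark.sql.functions.{parse_url, lit}

// parse_url taken straight from the functions object, no callUDF needed
b.withColumn("host", parse_url($"url_col", lit("HOST")))
 .withColumn("query", parse_url($"url_col", lit("QUERY")))
 .show(false)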
parse_url is available only in SQL and not in the DataFrame API; refer to the parse_url documentation.
So you should use it in a SQL query rather than as an API function call.
Register the dataframe as a temporary view and query it as below:
val b = Seq(("http://spark.apache.org/path?query=1"),("https://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/api/sql/#negative")).toDF("url_col")
b.createOrReplaceTempView("temp")
spark.sql("SELECT url_col, parse_url(`url_col`, 'HOST') as HOST, parse_url(`url_col`,'QUERY') as QUERY from temp").show(false)
which should give you output as
+--------------------------------------------------------------------------------------------+-----------------+-------+
|url_col |HOST |QUERY |
+--------------------------------------------------------------------------------------------+-----------------+-------+
|http://spark.apache.org/path?query=1 |spark.apache.org |query=1|
|https://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/api/sql/#negative|people.apache.org|null |
+--------------------------------------------------------------------------------------------+-----------------+-------+
I hope the answer is helpful.
As mentioned before, when you register a UDF you don't get a Java function; rather, you introduce it to Spark, so you must call it the "Spark way".
I want to suggest another method that I find convenient, especially when there are several columns you want to add: using selectExpr.
val b = Seq(("http://spark.apache.org/path?query=1"),("https://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/api/sql/#negative")).toDF("url_col")
val c = b.selectExpr("*", "parse_url(url_col, 'HOST') as host", "parse_url(url_col, 'QUERY') as query")
c.show(false)
I created a library called bebe that exposes the parse_url functionality via the Scala API.
Suppose you have the following DataFrame:
+------------------------------------+---------------+
|some_string |part_to_extract|
+------------------------------------+---------------+
|http://spark.apache.org/path?query=1|HOST |
|http://spark.apache.org/path?query=1|QUERY |
|null |null |
+------------------------------------+---------------+
Calculate the different parts of the URL:
df.withColumn("actual", bebe_parse_url(col("some_string"), col("part_to_extract")))
+------------------------------------+---------------+----------------+
|some_string |part_to_extract|actual |
+------------------------------------+---------------+----------------+
|http://spark.apache.org/path?query=1|HOST |spark.apache.org|
|http://spark.apache.org/path?query=1|QUERY |query=1 |
|null |null |null |
+------------------------------------+---------------+----------------+

Getting a type error while adding 2 dictionaries in KDB

I am able to add and assign the second dictionary (keys s and i) to the one with keys d and t:
d1:`d`t!(.z.d ;.z.t)
d1,:`s`i!`VOD`L
d1
However, the other way round does not work; I am getting a type error:
d2:`s`i!`VOD`L
d2,:`d`t!(.z.d ;.z.t)
d2
When dictionary d2 was created, all of the values were symbols. When you try to update it using d2,: with non-symbol types, kdb throws an error due to the mismatched types. One way to prevent this is to add a null key to your dictionary, which ensures the values can be of mixed types:
q)d2:enlist[`]!enlist(::) / add null key
q)d2,:`s`i!`VOD`L
q)d2
| ::
s| `VOD
i| `L
q)d2,:`d`t!(.z.d ;.z.t)
q)d2
| ::
s| `VOD
i| `L
d| 2018.03.25
t| 09:42:52.754
If you investigate a namespace, for example .q, or create your own, you will see that this null key exists, ensuring namespaces can contain mixed types.
In the first case, `d`t!(.z.d;.z.t) creates a dictionary whose values are heterogeneous:
q)d1:`d`t!(.z.d ;.z.t)
q)type value d1
0h
Now if you add and assign any homogeneous or heterogeneous dictionary to it, it will work.
In the other case, the first dictionary created is homogeneous, and it throws a type error when you add and assign a heterogeneous dictionary (or a homogeneous dictionary of another type, for that matter):
q)d2:`s`i!`VOD`L
q)type value d2
11h
q)type value `d`t!(.z.d ;.z.t)
0h
To solve this issue, join the dictionaries and then assign the result, rather than using the in-place ,: amend:
q)d2:`s`i!`VOD`L
q)d2:d2, `d`t!(.z.d ;.z.t)
q)d2
s| `VOD
i| `L
d| 2018.03.25
t| 09:59:17.109

Spark Scala filter DataFrame where value not in another DataFrame

I have two DataFrames: a and b. This is what they look like:
a
-------
v1 string
v2 string
roughly hundreds of millions of rows
b
-------
v2 string
roughly tens of millions of rows
I would like to keep rows from DataFrame a where v2 is not in b("v2").
I know I could use a left join and filter where the right side is null, or Spark SQL with a "not in" construction. I bet there is a better approach though.
You can achieve that using the except method of Dataset, which "Returns a new Dataset containing rows in this Dataset but not in another Dataset".
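Since a and b do not have the same schema, except has to be applied to the shared column and the surviving keys joined back. A minimal sketch, assuming a has string columns v1 and v2 and b has a string column v2:
val keysToKeep = a.select("v2").except(b.select("v2")) // v2 values of a that never appear in b
val result     = a.join(keysToKeep, Seq("v2"))         // keep only rows of a with those v2 values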
Use PairRDDFunctions.subtractByKey:
def subtractByKey[W](other: RDD[(K, W)])(implicit arg0: ClassTag[W]): RDD[(K, V)]
Return an RDD with the pairs from this whose keys are not in other.
(There are variants that offer control over the partitioning. See the docs.)
So, after keying both sides by v2, you would map a's rows to (v2, v1) pairs, map b's rows to (v2, unit) pairs, call subtractByKey, and convert the result back to a DataFrame, as in the sketch below.
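Spelled out, and assuming a has string columns (v1, v2), b has a string column (v2), and spark.implicits._ is in scope (as it is in spark-shell):
import org.apache.spark.sql.Row

val result = a.rdd
  .map { case Row(v1: String, v2: String) => (v2, v1) }           // key a's rows by v2
  .subtractByKey(b.rdd.map { case Row(v2: String) => (v2, ()) })  // drop keys that appear in b
  .map { case (v2, v1) => (v1, v2) }                              // restore the original column order
  .toDF("v1", "v2")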
Suppose your dataframe a looks like the one below.
+----+
|col1|
+----+
| v1|
| v2|
+----+
Suppose your dataframe b looks like the one below.
+----+
|col1|
+----+
| v2|
+----+
APPROACH 1:
-------------------
You can use the dataframe's join method with the join type left_anti to find the values that are in dataframe a but not in dataframe b. The code is given below:
a.as('a).join(b.as('b),$"a.col1" === $"b.col1","left_anti").show()
Please find the result below:
+----+
|col1|
+----+
|  v1|
+----+
APPROACH 2:
-------------------
You can use SQL, similar to SQL Server/Oracle etc., to do this. First register your dataframes as temp tables (which reside in Spark's memory) and then write the SQL on top of those tables.
a.registerTempTable("table_a")
b.registerTempTable("table_b")
spark.sql("select * from table_a a where not exists(select 1 from table_b b where a.col1=b.col1)").show()
Please find the result below:
+----+
|col1|
+----+
|  v1|
+----+