How to make a logical OR between vertex indexes in Titan 1.0 / TP3 3.0.1 using Text predicates - titan

While migrating from TinkerPop 2 (Titan 0.5.4) to TinkerPop 3.0.1 (Titan 1.0),
I'm trying to build a Gremlin query that performs a "logical OR", using Text predicates, between properties covered by different vertex indexes.
Something like:
------------------- PRE-DEFINED ES INDEXES: ------------------
tg = TitanFactory.open('../conf/titan-cassandra-es.properties')
tm = tg.openManagement();
g=tg.traversal();
PropertyKey pNodeType = createPropertyKey(tm, "nodeType", String.class, Cardinality.SINGLE);
PropertyKey storyContent = createPropertyKey(tm, "storyContent", String.class, Cardinality.SINGLE);
PropertyKey userContent = createPropertyKey(tm, "userContent", String.class, Cardinality.SINGLE);
//"storyContent" : is elasticsearch backend index - mixed
tm.buildIndex(indexName, Vertex.class).addKey(storyContent, Mapping.TEXTSTRING.asParameter()).addKey(pNodeType, Mapping.TEXTSTRING.asParameter()).buildMixedIndex("search");
//"userContent" : is elasticsearch backend index - mixed
tm.buildIndex(indexName, Vertex.class).addKey(userContent, Mapping.TEXTSTRING.asParameter()).addKey(pNodeType, Mapping.TEXTSTRING.asParameter()).buildMixedIndex("search");
// vertices are added through the graph (tg), not the traversal source (g)
v1 = tg.addVertex()
v1.property("nodeType", "USER")
v1.property("userContent", "dccsdsadas")
v2 = tg.addVertex()
v2.property("nodeType", "STORY")
v2.property("storyContent", "abdsds")
v3 = tg.addVertex()
v3.property("nodeType", "STORY")
v3.property("storyContent", "xxxx")
v4 = tg.addVertex()
v4.property("nodeType", "STORY")
v4.property("storyContent", "abdsds")
// etc.
------------------- EXPECTED RESULT: -----------
I want to return all vertices whose "storyContent" property matches a text prefix, OR all vertices whose "userContent" property matches its own prefix.
In this case the query should return v1 and v2: v3 doesn't match, and v4 is a duplicate, so it must be dropped by a dedup step.
g.V().has("storyContent", textContainsPrefix("ab")) "OR" has("userContent", textContainsPrefix("dc"))
or maybe :
g.V().or(_().has('storyContent', textContainsPrefix("abc")), _().has('userContent', textContainsPrefix("dcc")))
PS:
I thought of using the TP3 or() step with dedup, but Gremlin throws an error...
Thanks for any help
Vitaly

How about something along these lines:
g.V().or(
    has('storyContent', textContainsPrefix("abc")),
    has('userContent', textContainsPrefix("dcc"))
)
Edit - as mentioned in the comments, this query won't use any index. It must be split into two separate queries.
See the TinkerPop v3.0.1 Or Step documentation and the Titan v1.0.0 Ch. 20 - Index Parameters and Full-Text Search documentation.
With Titan, you might have to import text predicates before:
import static com.thinkaurelius.titan.core.attribute.Text.*
_() is TinkerPop 2 syntax and is no longer used in TinkerPop 3. You now pass anonymous traversals as arguments to steps such as or(); they sometimes have to start with __. when a step name clashes with a reserved keyword in Groovy (for example __.in()).
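Since the combined or() traversal does not use any index, the "two separate queries" route mentioned in the edit comes down to running each indexed has() query on its own and merging the results by vertex id. Below is a minimal sketch of that merge step in plain Python, not Gremlin. The vertex records and the merge_dedup helper are hypothetical stand-ins for whatever the two queries return, and vertex 5 is an invented vertex that matches both queries.

```python
def merge_dedup(*result_sets):
    """Union several query results, keeping the first occurrence of each vertex id."""
    seen = set()
    merged = []
    for results in result_sets:
        for vertex in results:
            if vertex["id"] not in seen:
                seen.add(vertex["id"])
                merged.append(vertex)
    return merged


# Hypothetical results of the two indexed queries, e.g.
#   g.V().has('storyContent', textContainsPrefix('ab'))
#   g.V().has('userContent', textContainsPrefix('dc'))
story_hits = [{"id": 2}, {"id": 4}, {"id": 5}]
user_hits = [{"id": 1}, {"id": 5}]  # vertex 5 matched both queries

combined = merge_dedup(story_hits, user_hits)
combined_ids = [v["id"] for v in combined]
```

Deduplicating on the vertex id only removes a vertex that was returned by both queries. Two distinct vertices that happen to hold the same text (v2 and v4 in the question) would need a different key, such as the property value itself.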


Adding edge attribute causes TypeError: 'AtlasView' object does not support item assignment

Using networkx 2.0 I try to dynamically add an additional edge attribute by looping through all the edges. The graph is a MultiDiGraph.
According to the tutorial it seems to be possible to add edge attributes the way I do in the code below:
g = nx.read_gpickle("../pickles/" + gname)
yearmonth = gname[:7]
g.name = yearmonth # works
for source, target in g.edges():
    g[source][target]['yearmonth'] = yearmonth
This code throws the following error:
TypeError: 'AtlasView' object does not support item assignment
What am I doing wrong?
That happens when your graph is a nx.MultiGraph (or a nx.MultiDiGraph, as here). In that case you need an extra index, the edge key, which by default runs from 0 to n-1, where n is the number of edges between the two nodes.
Try:
for source, target in g.edges():
    g[source][target][0]['yearmonth'] = yearmonth
The tutorial example is intended for a nx.Graph.
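Hard-coding key 0 only touches the first parallel edge between two nodes, and keys are not integers at all if they were set explicitly. Here is a sketch that asks networkx for the keys instead. The small graph and the yearmonth value are made up for illustration.

```python
import networkx as nx

g = nx.MultiDiGraph()
g.add_edge("a", "b")            # first edge a->b gets the default key 0
g.add_edge("a", "b")            # parallel edge a->b gets the default key 1
g.add_edge("b", "c", key="x")   # edge with an explicit, non-integer key

yearmonth = "2017-10"

# edges(keys=True) yields (source, target, key) once per parallel edge,
# so every edge's attribute dict is reached, whatever its key is.
for source, target, key in g.edges(keys=True):
    g[source][target][key]["yearmonth"] = yearmonth
```

g[source][target] is the read-only AtlasView from the error message. Adding the key reaches the ordinary attribute dict of one specific edge, which does support item assignment.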

How does Scala Slick determine which rows to update in this query

I was asked how Scala Slick determines which rows need updating, given this code:
def updateFromLegacy(criteria: CertificateGenerationState, fieldA: CertificateGenerationState, fieldB: Option[CertificateNotification]) = {
  val a: Query[CertificateStatuses, CertificateStatus, Seq] = CertificateStatuses.table.filter(status => status.certificateState === criteria)
  val b: Query[(Column[CertificateGenerationState], Column[Option[CertificateNotification]]), (CertificateGenerationState, Option[CertificateNotification]), Seq] = a.map(statusToUpdate => (statusToUpdate.certificateState, statusToUpdate.notification))
  val c: (CertificateGenerationState, Option[CertificateNotification]) = (fieldA, fieldB)
  b.update(c)
}
As I see it, the code above:
a) looks for all rows that have "criteria" as their "certificateState"
b) creates a query for the columns to update
c) creates a tuple with the values I want to write
The query is then used to find the rows to which the tuple needs to be applied.
Background
I wonder where Slick keeps track of the ids of the rows to update.
What I would like to find out
What is happening behind the covers?
What is Seq in "val a: Query[CertificateStatuses, CertificateStatus, Seq]"
Can someone maybe point out the slick source where the moving parts are located?
OK - I reformatted your code a little to make it easier to read here and divided it into chunks. Let's go through them one by one:
val a: Query[CertificateStatuses, CertificateStatus, Seq] =
  CertificateStatuses.table
    .filter(status => status.certificateState === criteria)
The above is a query that translates roughly to something along these lines:
SELECT * -- Slick would list all your columns here, but it's essentially the same thing
FROM certificate_statuses
WHERE certificate_state = $criteria
Next, this query is mapped, that is, a SQL projection is applied to it:
val b: Query[
(Column[CertificateGenerationState], Column[Option[CertificateNotification]]),
(CertificateGenerationState, Option[CertificateNotification]),
Seq] = a.map(statusToUpdate =>
(statusToUpdate.certificateState, statusToUpdate.notification))
So instead of * you will have this:
SELECT certificate_state, notification
FROM certificate_statuses
WHERE certificate_state = $criteria
The last part reuses this constructed query to perform the update:
val c: (CertificateGenerationState, Option[CertificateNotification]) =
(fieldA, fieldB)
b.update(c)
This translates to:
UPDATE certificate_statuses
SET certificate_state = $fieldA, notification = $fieldB
WHERE certificate_state = $criteria
I understand that the last step may be a little less straightforward than the others, but that is essentially how you do updates with Slick (here, although it's the monadic version).
As for your questions:
What is happening behind the covers?
This is actually outside my area of expertise. That being said, it's a relatively straightforward piece of code, and I guess the update transformation may be of some interest. I have included a link to the relevant piece of the Slick sources at the end of this answer.
What is Seq in "val a:Query[CertificateStatuses, CertificateStatus, Seq]"
It's the collection type. Query takes 3 type parameters:
mixed type - the Slick representation of a table (or of a column - Rep)
unpacked type - the type you get after executing the query
collection type - the collection in which the unpacked values above are placed for you as the result of the query.
So, as an example:
CertificateStatuses - this is your Slick table definition
CertificateStatus - this is your case class
Seq - this is how your results would be retrieved (basically as Seq[CertificateStatus])
I have it explained here: http://slides.com/pdolega/slick-101#/47 (and 3 next slides or so)
Can someone maybe point out the slick source where the moving parts are located?
I think this part may be of interest - it shows how the query is converted into an update statement: https://github.com/slick/slick/blob/51e14f2756ed29b8c92a24b0ae24f2acd0b85c6f/slick/src/main/scala/slick/jdbc/JdbcActionComponent.scala#L320
It is also worth emphasizing this:
I wonder where Slick keeps track of the ids of the rows to update.
It doesn't. Look at the generated SQL: the WHERE clause alone decides which rows are updated. You can see the statements by adding the following configuration to your logging (but you also have them in this answer):
<logger name="slick.jdbc.JdbcBackend.statement" level="DEBUG" />
(I assumed logback above).
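To see that no row ids are involved, the UPDATE statement can be run by hand. This is a sketch using Python's built-in sqlite3 module rather than Slick. The table layout and the state values ('PENDING', 'DONE', 'SENT') are assumptions made up for the demo, not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE certificate_statuses ("
    "id INTEGER PRIMARY KEY, certificate_state TEXT, notification TEXT)"
)
conn.executemany(
    "INSERT INTO certificate_statuses (certificate_state, notification) "
    "VALUES (?, ?)",
    [("PENDING", None), ("PENDING", None), ("DONE", "SENT")],
)

criteria, field_a, field_b = "PENDING", "DONE", "SENT"

# Same shape as the statement derived from b.update(c): the WHERE clause
# alone selects the rows; the application never reads or stores their ids.
cursor = conn.execute(
    "UPDATE certificate_statuses "
    "SET certificate_state = ?, notification = ? "
    "WHERE certificate_state = ?",
    (field_a, field_b, criteria),
)
updated = cursor.rowcount

states = [
    row[0]
    for row in conn.execute(
        "SELECT certificate_state FROM certificate_statuses ORDER BY id"
    )
]
```

The database matches the rows itself while executing the single statement, which is why there is nothing for Slick to track between the filter and the update.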

Is it possible to return a map of key values using gremlin scala

Currently I have two Gremlin queries that fetch two different values, and I use them to populate a map.
Scenario : A->B , A->C , A->D
My queries are below:
graph.V().has(ID,A).out().label().toList()
Fetches the labels of the vertices reachable from A over outgoing edges.
Result : List(B,C,D)
graph.traversal().V().has("ID",A).outE("interference").as("x").otherV().has("ID",B).select("x").values("value").headOption()
Given A and B, gets the value of the edge property (A->B).
Return : 10
Is it possible to combine both of these queries and get the result back as Map[(B,10)(C,11)(D,12)]?
I am facing a performance issue with two queries: they take more time.
There is probably a better way to do this but I managed to get something with the following traversal:
gremlin> graph.traversal().V().has("ID","A").outE("interference").as("x").otherV().has("ID").label().as("y").select("x").by("value").as("z").select("y", "z").select(values);
==>[B,1]
==>[C,2]
I would wait for more answers though as I suspect there is a better traversal out there.
The following works in Scala:
val b = StepLabel[Edge]()
val y = StepLabel[Label]()
val z = StepLabel[Integer]()
graph.traversal().V().has("ID",A).outE("interference").as(b)
.otherV().label().as(y)
.select(b).values("value").as(z)
.select((y,z)).toMap[String,Integer]
This will return Map[String,Int]

Can't delete/remove multiple property keys on Vertex Titan 1.0 Tinkerpop 3

Very basic question.
I just upgraded from Titan 0.5.4 to Titan 1.0 Hadoop 1 / TP3 version 3.0.1.
I have run into a problem deleting the values of a property key with Cardinality.LIST/SET.
Maybe it is due to the upgrade process, or I am just misunderstanding TP3.
// ----- CODE ------:
tg = TitanFactory.open(c);
TitanManagement mg = tg.openManagement();
//create KEY (Cardinality.LIST) and commit changes
mg.makePropertyKey("myList").dataType(String.class).cardinality(Cardinality.LIST).make();
mg.commit();
//add vertex with multi properties
Vertex v = tg.addVertex();
v.property("myList", "role1");
v.property("myList", "role2");
v.property("myList", "role3");
v.property("myList", "role4");
v.property("myList", "role4");
Now, I want to delete all the values "role1,role2...."
// iterate over all values and try to remove the values
List<String> values = IteratorUtils.toList(v.values("myList"));
for (String val : values) {
    v.property("myList", val).remove();
}
tg.tx().commit();
//---------------- THE EXPECTED RESULT ----------:
Empty vertex properties
But unfortunately the result isn't empty:
System.out.println("Values After Delete" + IteratorUtils.toList(v.values("myList")));
//------------------- OUTPUT --------------:
After a delete, values are still apparent!
15:19:59,780 INFO ThriftKeyspaceImpl:745 - Detected partitioner org.apache.cassandra.dht.Murmur3Partitioner for keyspace titan
15:19:59,784 INFO Values After Delete [role1, role2, role3, role4, role4]
Any ideas?
You're not executing graph traversals with the higher level Gremlin API, but you're currently mutating the graph with the lower level graph API. Doing for loops in Gremlin is often an antipattern.
According to the TinkerPop 3.0.1 Drop Step documentation, you should be able to do the following from the Gremlin console:
v = g.addV().next()
g.V(v).property("myList", "role1")
g.V(v).property("myList", "role2")
// ...
g.V(v).properties('myList').drop()
property(key, value) will set the value of the property on the vertex (javadoc). What you should do is get the VertexProperties (javadoc).
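// properties() returns an Iterator, so copy it to a list before removing
for (VertexProperty<?> vp : IteratorUtils.toList(v.properties("myList"))) {
    vp.remove();
}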
@jbmusso offered a solid solution using the GraphTraversal instead.

Entity Framework - TOP using a dynamic query

I'm having issues implementing TOP or SKIP functionality when building a new object query.
I can't use eSQL because I need an "IN" clause, which could get quite complex if I loop over the IN values and add them all as "OR" parameters.
Code is below :
Using dbcontext As New DB
    Dim r As New ObjectQuery(Of recipient)("recipients", dbcontext)
    r.Include("jobs")
    r.Include("applications")
    r = r.Where(Function(w) searchAppIds.Contains(w.job.application_id))
    If Not statuses.Count = 0 Then
        r = r.Where(Function(w) statuses.Contains(w.status))
    End If
    If Not dtFrom.DbSelectedDate Is Nothing Then
        r = r.Where(Function(w) w.job.create_time >= dtDocFrom.DbSelectedDate)
    End If
    If Not dtTo.DbSelectedDate Is Nothing Then
        r = r.Where(Function(w) w.job.create_time <= dtDocTo.DbSelectedDate)
    End If
    'a lot more IF conditions to add in additional predicates
    grdResults.DataSource = r
    grdResults.DataBind()
If I use any form of .Top or .Skip, it throws an error: Query builder methods are not supported for LINQ to Entities queries
Is there any way to specify TOP or a limit with this approach? I'd like to avoid a query returning thousands of records if possible (it's for a user search screen).
Rather than
r = new ObjectQuery<recipient>("recipients", dbContext)
try
r = dbContext.recipients
Put .Skip() and .Take() last, after all the .Where() calls. Both return IQueryable<T>, like .Where(), but in LINQ to Entities .Skip() only works on sorted input, that is, on the IOrderedQueryable<T> returned by .OrderBy().
Also change grdResults.DataSource = r to grdResults.DataSource = r.ToList() to execute the query now. That'll also allow you to temporarily wrap this line in try/catch, which may expose a better message about why it's erroring.
Mark this one down to confusion: I should have been using .Take instead of .Top, .Limit or anything else.
My final part is below, and it works:
grdResults.DataSource = r.Take(100)