Titan: comparison between two query approaches

What is the performance difference between
g.query().has("city","mumbai").vertices().iterator().next();
here each vertex has a property city with the city name mumbai
and
v.query().direction(Direction.IN).labels("belongTo").vertices();
here v is the vertex for the city mumbai, and all other vertices are connected to it through edges labeled belongTo.
I want to run a query like "all vertices with city mumbai". Which approach is better?
The problem is that a user can enter anything as the city name, e.g. mumbai, mummbai or mubai, so it's not possible to verify it. For mumbai I would then create mumbai, mummbai and mubai vertices, which is very inefficient.
How would you handle this kind of situation?

Titan's Elasticsearch integration is great for these kinds of fuzzy searches. Here's an example:
g = TitanFactory.open("conf/titan-cassandra-es.properties")
g.makeKey("city").dataType(String.class).indexed("search", Vertex.class).make()
g.makeKey("info").dataType(String.class).make()
g.makeLabel("belongsTo").make()
g.commit()
cities = ["washington", "mumbai", "phoenix", "uruguay", "pompeji"]
cities.each({ city ->
info = "belongs to ${city}"
g.addVertex(["info":info]).addEdge("belongsTo", g.addVertex(["city":city]))
}); g.commit()
info = { it.getElement().in("belongsTo").info.toList() }
userQueries = ["mumbai", "mummbai", "mubai", "phönix"]
userQueries.collectEntries({ userQuery ->
q = "v.city:${userQuery}~"
v = g.indexQuery("search", q).limit(1).vertices().collect(info).flatten()
[userQuery, v]
})
The last query will give you the following result:
==>mumbai=[belongs to mumbai]
==>mummbai=[belongs to mumbai]
==>mubai=[belongs to mumbai]
==>phönix=[belongs to phoenix]
Cheers,
Daniel
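To get a feel for the fuzzy lookup without a running Titan and Elasticsearch setup, here is a rough Python sketch. It does not reproduce how Elasticsearch scores the `~` operator; it only uses the standard library's difflib to map a misspelled city name onto the closest known one, so that a single canonical city vertex is enough. The helper name resolve_city and the 0.6 cutoff are assumptions made for illustration.

```python
import difflib

# Canonical city names, taken from the example above.
CITIES = ["washington", "mumbai", "phoenix", "uruguay", "pompeji"]


def resolve_city(user_query, cities=CITIES, cutoff=0.6):
    """Return the known city closest to user_query, or None if nothing is similar enough.

    Hypothetical helper: the cutoff of 0.6 is an arbitrary illustrative threshold.
    """
    matches = difflib.get_close_matches(user_query.lower(), cities, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for query in ["mumbai", "mummbai", "mubai", "phönix"]:
        print(query, "->", resolve_city(query))
```

With a lookup like this, every spelling variant resolves to one stored city name, which can then be used for an exact index query.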

Related

Query from OpenStreetMap

At the moment I'm using the Overpass API to query from OpenStreetMap using https://overpass-turbo.eu/ but when I use the following code, not all the schools in the area appear on the map (e.g. Holy Cross College doesn't appear).
area[name = "Council of the City of Ryde"];
node(area)[amenity = school];
out;
Anyone know why this might be the case?
Thanks for any help!
OpenStreetMap data consists of three basic elements: nodes, ways and relations. Your query searches only for nodes. Some schools will be mapped as ways and a few others as relations.
You have to change your query in order to search for all three elements:
area[name = "Council of the City of Ryde"];
(
node(area)[amenity = school];
way(area)[amenity = school];
relation(area)[amenity = school];
);
out;
Alternatively just use the keyword nwr to search for all three elements:
area[name = "Council of the City of Ryde"];
nwr(area)[amenity = school];
out;
If schools are still missing, they are either mapped with a different tag or absent from OSM. In the latter case, feel free to add them yourself.
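If you want to run the same query outside overpass turbo, a minimal Python sketch could look like the following. The function names are made up for illustration, and the public endpoint overpass-api.de is an assumption (other Overpass instances exist). The sketch uses `out center;` instead of plain `out;` so that ways and relations come back with one coordinate each; treat that as a suggestion rather than part of the answer above.

```python
import json
import urllib.parse
import urllib.request

# Public Overpass endpoint; other instances exist, this one is assumed here.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_school_query(area_name):
    """Build an Overpass QL query for schools mapped as nodes, ways or relations."""
    return (
        "[out:json];\n"
        f'area[name = "{area_name}"];\n'
        "nwr(area)[amenity = school];\n"
        # "out center" adds one coordinate per way/relation so each result can be placed on a map.
        "out center;"
    )


def fetch_schools(area_name):
    """Send the query to the Overpass API and return the decoded elements (needs network access)."""
    data = urllib.parse.urlencode({"data": build_school_query(area_name)}).encode()
    with urllib.request.urlopen(OVERPASS_URL, data=data, timeout=60) as response:
        return json.load(response)["elements"]
```

Calling fetch_schools("Council of the City of Ryde") should then return a list of elements whose "type" field is node, way or relation.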

Graphite: merge series containing multiple values

Is there a way to merge two "dictionaries" of values in Graphite? That is to say, I want to start with a series:
AnimalsByCountry
  England
    Cats
    Dogs
  France
    Cats
    Dogs
    Birds
And combine them into series:
AnimalsInWorld
  Cats // = AnimalsByCountry.England.Cats + AnimalsByCountry.France.Cats
  Dogs // = AnimalsByCountry.England.Dogs + AnimalsByCountry.France.Dogs
  Birds // = AnimalsByCountry.France.Birds
Sorry if this is an obvious question; I'm new to Graphite and this seems like a simple operation but I can't find any functions to do it in the documentation.
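Use groupByNodes (https://graphite.readthedocs.io/en/latest/functions.html#graphite.render.functions.groupByNodes) and group on node 2, the animal name (nodes are counted from 0). Metric names are case-sensitive, so match the case of your series:
groupByNodes(AnimalsByCountry.*.*, 'sum', 2)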
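To make it clearer what grouping on node 2 with 'sum' does, here is a small Python sketch of the same idea. It is not Graphite's implementation; the function name group_by_node_sum and the sample values are made up, and it assumes all series have the same number of points and no gaps.

```python
from collections import defaultdict


def group_by_node_sum(series, node):
    """Sum series point by point, grouped by one dot-separated path node (0-based)."""
    groups = defaultdict(list)
    for path, values in series.items():
        groups[path.split(".")[node]].append(values)
    # zip(*rows) lines up the values that share a timestamp across the grouped series.
    return {key: [sum(points) for points in zip(*rows)] for key, rows in groups.items()}


# Made-up sample values, three data points per series.
series = {
    "AnimalsByCountry.England.Cats": [1, 2, 3],
    "AnimalsByCountry.England.Dogs": [4, 5, 6],
    "AnimalsByCountry.France.Cats": [10, 20, 30],
    "AnimalsByCountry.France.Dogs": [40, 50, 60],
    "AnimalsByCountry.France.Birds": [7, 8, 9],
}

world = group_by_node_sum(series, 2)
print(world)
```

Birds only exists for France, so its group contains a single series and is passed through unchanged.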

Dimensional Modelling: OLAP Operation on Data Cube

Assume that we have a data cube as follows:
DairyFarms = { <Name, Time , Product> , <Sales> , <Sum> }
Name = {Farm1, Farm2, Farm3, Farm4}
Time = {Jan, Feb, Mar , ..... , Dec}
Product = {Milk, Butter, Cheese, Yogurt}
Suppose I want to retrieve the sales of Cheese across all the farms during January. Which of the following two queries is correct?
i) DairyFarms[Name*][Jan][Cheese]
ii) DairyFarms[][Jan][Cheese]
Do both of them mean the same or is there any difference between them w.r.t. correctness and/or efficiency?

How can I access an attribute of a selected parameter in Tableau?

I have a list of Companies with a ROIC measure. Each Company belongs to a Segment.
I created a parameter to select a Company: [SelectedCompany], and I want to create a SET which includes all the companies except the [SelectedCompany], which are in the same [Segment] as [SelectedCompany].
My set is currently defined by this formula:
[Company] != [SelectedCompany]
I should add something like:
[Company] != [SelectedCompany]
AND
[Segment] = [SelectedCompany].[Segment]
But I don't know how to access the [Segment] attribute of the [SelectedCompany].
Just for clarification, I'm making this because I want to compare the [SelectedCompany] ROIC against the average ROIC of the other Companies in the same Segment.
I would appreciate any help on this.
Thanks a lot!
Here's a bit of a hacky way to get what you're looking for. Keep your original definition for the set:
[Company] != [SelectedCompany]
Create a calculated field:
{ FIXED [Segment] : MAX( IIF([Company] = [Parameters].[SelectedCompany], 1, 0) ) }
Then drag that field into the Filters card and filter to allow only 1's. This will filter out all Segments except for the selected company's Segment.
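If the combination of the set, the FIXED expression and the filter is hard to picture, the following Python sketch mirrors the logic on a few rows. The company names and ROIC values are hypothetical, and peer_average_roic is only an illustrative helper, not something Tableau provides.

```python
from statistics import mean

# Hypothetical rows: (company, segment, roic).
rows = [
    ("Acme", "Retail", 0.12),
    ("Bolt", "Retail", 0.08),
    ("Crux", "Retail", 0.10),
    ("Dyne", "Energy", 0.20),
    ("Eave", "Energy", 0.05),
]


def peer_average_roic(rows, selected_company):
    """Average ROIC of the other companies in the selected company's segment."""
    # Counterpart of { FIXED [Segment] : MAX(IIF([Company] = selected, 1, 0)) }:
    # a segment is flagged 1 if any of its rows is the selected company.
    flag = {}
    for company, segment, _ in rows:
        flag[segment] = max(flag.get(segment, 0), int(company == selected_company))
    # Keep flagged segments only (the filter) and drop the selected company itself (the set).
    peers = [roic for company, segment, roic in rows
             if flag[segment] == 1 and company != selected_company]
    return mean(peers) if peers else None


print(peer_average_roic(rows, "Acme"))
```

Selecting Acme keeps only Bolt and Crux, the other Retail companies, and averages their ROIC.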

How to get multiple vertices/edges in a single gremlin query?

I am in a situation where I need to get two different types of vertices using a single query. For example, assume that the graph has the following structure :
Node("User")--Edge("is_member")-->Node("Membership")--Edge("is_member")-->Node("Group")
Assume that the nodes have the following properties:
Membership
  status
  date
Group
  name
  date
  type
Now, I need to get all the Membership nodes that a user is_member of, along with the corresponding Group's name. How do I write a Gremlin query for this?
I am using the Bulbs framework. How do I store the result in a python object?
For user u1, the following query builds a map whose keys are the Membership vertices and whose values are the lists of group names reachable from each of them:
m=[:];u1.out('is_member').groupBy(m){it}{it.out('is_member').name}
Output is:
gremlin> m
==>v[m1]=[group1]
==>v[m2]=[group2, group3]
Here is the sample graph used:
g = new TinkerGraph()
u1 = g.addVertex('u1')
u2 = g.addVertex('u2')
m1 = g.addVertex('m1')
m2 = g.addVertex('m2')
g1 = g.addVertex('g1')
g2 = g.addVertex('g2')
g3 = g.addVertex('g3')
g.addEdge(u1, m1, 'is_member')
g.addEdge(u1, m2, 'is_member')
g.addEdge(u2, m2, 'is_member')
g.addEdge(m1, g1, 'is_member')
g.addEdge(m2, g2, 'is_member')
g.addEdge(m2, g3, 'is_member')
g1.name = 'group1'
g2.name = 'group2'
g3.name = 'group3'
See also: How do I write a sub-query?
(tested with gremlin2)
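As for holding the result in a Python object: the map returned by groupBy corresponds naturally to a dict of lists. The sketch below does not use Bulbs or talk to a graph server; it rebuilds the sample graph's is_member edges as plain dicts (vertex ids as strings, an assumption for illustration) and produces the same shape of result.

```python
# Outgoing 'is_member' edges of the sample graph above, as an adjacency map.
is_member = {
    "u1": ["m1", "m2"],
    "u2": ["m2"],
    "m1": ["g1"],
    "m2": ["g2", "g3"],
}
names = {"g1": "group1", "g2": "group2", "g3": "group3"}


def memberships_with_group_names(user):
    """Map each Membership vertex of user to the names of its Groups."""
    return {
        membership: [names[group] for group in is_member.get(membership, [])]
        for membership in is_member.get(user, [])
    }


print(memberships_with_group_names("u1"))
```

The returned dict has one key per Membership vertex, matching the m1 and m2 entries in the Gremlin output above.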