Group by middle node in gremlin

All,
I am working on the Rank of Neighbors problem using Gremlin. Here are the details of the problem:
Rank of neighbors (SQ: Ranking)
Given a start node v and two edge types Ei and Ej, return the k neighbors of v reached over edges of type Ei that have the highest number of outgoing edges of type Ej.
For instance, list k friends of a given person ranked by the number of sports each friend plays.
Here is my query in Gremlin:
g.V().has('unique1',1220).out('relation1').as('a').out('relation2').as('b').select('a','b')
I am trying to group the results by the nodes marked 'a', and then return some properties and the count of each group as the result.
For example:
If the graph has the relationships:
a->b->c
a->b->d
a->e->f
a->e->g
Ideally, the result I want should be:
b:2
e:2
How can I revise my query to get this result?
And, if possible, how can I get the result in descending order of the counts?
Thanks for your help!

g.V().has('unique1',1220).out('relation1').as('a').out('relation2').as('b').select('a').values('unique1').groupCount().order(local).by(valueDecr)
This is the answer I wanted; I have figured it out.
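For reference, on newer TinkerPop versions (where valueDecr has been replaced by Order.desc) the same ranking can also be written with an explicit top-k limit. This is only a sketch reusing the property key and edge labels from the question; the limit of 3 stands in for k, 'neighbor' and 'outCount' are just projection labels, and outE('relation2').count() is the number of outgoing 'relation2' edges per neighbor:
g.V().has('unique1', 1220).
  out('relation1').
  project('neighbor', 'outCount').
    by('unique1').
    by(outE('relation2').count()).
  order().by(select('outCount'), desc).
  limit(3)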

Related

Searching nearest neighbour in the same table (GIS, PostgreSQL)

We have a database of individual trees with geolocation; in the DB we have a geometry point, combined from longitude and latitude, named estimated_geometric_location. We get a periodic update of these trees, let's say every month. I would like to get a list of trees that satisfies two properties. I am looking to identify the most likely update of a specific tree, i.e. when a new set of trees from one tracking event comes in, we need to run a routine suggesting that data entry x.2 is an update of data point x.1. Ideally this routine then updates the new data point (child), adding the older mother data point, which then hopefully represents that same tree.
So far I have something like this, but the DB is not responding (or maybe I am not waiting long enough... I have waited about 10 minutes so far):
SELECT
    i.id,
    ST_Distance(i.estimated_geometric_location, i.b_estimated_geometric_location) AS dist
FROM (
    SELECT
        a.id,
        b.id AS b_id,
        a.estimated_geometric_location,
        b.estimated_geometric_location AS b_estimated_geometric_location,
        rank() OVER (PARTITION BY a.id ORDER BY ST_Distance(a.estimated_geometric_location, b.estimated_geometric_location)) AS pos
    FROM trees a, trees b
    WHERE a.id <> b.id
) i
WHERE pos = 1
It would be great to get some ideas on this. I got this from a post on here somewhere and have adapted it, but so far no luck.
There are a couple of things to mention. If the data comes from a tracking event, why compare existing trees to each other? I'd expect to have something like
SELECT id
FROM trees
ORDER BY st_distance(estimated_geometric_location, st_makepoint(15, 30))
LIMIT 1
which returns the tree closest to the point with longitude 15 and latitude 30. Have a look at whether you need to do that join at all.
Supposing that you do, the problem with a query like this is complexity. If you have, say, 1,000 trees in your database, then you're actually calculating the distances between 1,000 trees and all of their 999 counterparts, i.e. 999,000 distances! And since the distance between A and B is the same as between B and A, you should be able to shave off half of them by saying a.id < b.id.
Furthermore, think about what you're doing. You want to find the minimal distance between any two trees and the ids of the trees that correspond to that distance, right? There is no need to calculate any distances as soon as you know they're not the minimal one.
SELECT a.id, b.id, ST_Distance(a.estimated_geometric_location, b.estimated_geometric_location) AS distance
FROM trees a, trees b
WHERE a.id < b.id
ORDER BY distance
LIMIT 1
is a much simpler way of getting there, and for me it's a lot faster as well.
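If the goal is really to match every tree of the newest tracking event to its nearest older counterpart (rather than to find the single closest pair), a per-row nearest-neighbour lookup with a LATERAL join is the usual pattern. This is only a sketch: the tracking_event column and the value 2 are assumptions standing in for however your schema distinguishes the new batch from the existing trees.
SELECT n.id AS new_id,
       m.id AS mother_id,
       m.dist
FROM trees n
CROSS JOIN LATERAL (
    SELECT o.id,
           ST_Distance(o.estimated_geometric_location, n.estimated_geometric_location) AS dist
    FROM trees o
    WHERE o.tracking_event < n.tracking_event   -- only older data points
    ORDER BY o.estimated_geometric_location <-> n.estimated_geometric_location
    LIMIT 1
) m
WHERE n.tracking_event = 2;                      -- the batch you are trying to match
With a GiST index on estimated_geometric_location, the <-> ordering lets PostGIS walk the index for each new tree instead of computing every pairwise distance.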

how to count occurrences of a value in Tableau?

I have a dataset with dog ids, dog names, and neighborhoods. I would like to display the n most popular dog names for each neighborhood. How can I do that?
I figured out how to display counts for each name in each neighborhood by simply dragging 'Neighborhood' and 'Animal Name' into 'Rows' and 'CNT(Animal Name)' into 'Columns'. But I don't know how to select the top 3 or 4 names for each neighborhood.
I'm going to use Tableau's Sample Superstore dataset to walk through one way you can show the top N number of Products by Category. This example will easily transfer to Dog Names by Neighborhood.
I'll start by creating a calculated field, Product Popularity, which we'll use as a filter later:
RANK(COUNT([Product Name]))
I'll then put the Category and Product Name dimensions on the Rows shelf and the count of Product Name on the Text marks card.
We'll then place our Product Popularity table calculation on the Filters shelf.
When the dialog box appears, just click OK.
Now we need to edit how our Table calculation runs to get the Top N Product Names within each Category.
Select Specific Dimensions and then uncheck Category (or Neighborhood, in your case).
Now we need to go edit our filter.
For this example, I'll set the upper limit to 3.
Click OK and you should see the top 3 Product Names by Category.
Of course, you'll want to adjust this example to fit your data.
Hope this was helpful. Happy vizzing!

Centralities in networkx weighted graph

I am not able to compute centralities for a simple weighted NetworkX graph.
Is this normal, or am I doing something wrong?
I add edges with a simple add_edge(c[0], c[1], weight=my_values), where c[0], c[1] are strings (the names of the nodes) and my_values are integers, within a for loop. This is an example of the resulting edges:
('first node label', 'second node label', {'weight': 14})
(the number of nodes doesn't really matter; for now I keep it to only 20)
The edge list of my graph is a list of tuples of the form (string_node1, string_node2, weight_dictionary). Everything looks fine, as I am also able to draw/save/read the graph...
Why do nx.degree_centrality and nx.closeness_centrality both give me all 1s?
example:
{'first node name': 1.0,
...
'last node name': 1.0}
Thanks for your help.
It was easy:
instead of using nx.degree_centrality() I use
my_graph.degree(weight='weight')
Still, I think this is a basic gap in the module...
...but the issue remains open for nx.closeness_centrality
To make closeness_centrality take weight into account, you have to add a distance attribute equal to 1 / weight to the graph edges, as suggested in this issue.
Here's code to do it (graph is g):
g_distance_dict = {(e1, e2): 1 / weight for e1, e2, weight in g.edges(data='weight')}
nx.set_edge_attributes(g, g_distance_dict, 'distance')
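Putting the two answers together, here is a minimal end-to-end sketch (the node names and weights are made up for illustration), including the closeness_centrality call that actually uses the new attribute:
import networkx as nx

# Toy weighted graph; node names and weights are made up for illustration.
g = nx.Graph()
g.add_edge('a', 'b', weight=14)
g.add_edge('b', 'c', weight=3)
g.add_edge('a', 'c', weight=7)

# Weighted degree (strength), instead of the normalized nx.degree_centrality.
print(dict(g.degree(weight='weight')))      # {'a': 21, 'b': 17, 'c': 10}

# Derive a 'distance' attribute as 1 / weight, then hand it to closeness_centrality.
g_distance_dict = {(e1, e2): 1 / weight for e1, e2, weight in g.edges(data='weight')}
nx.set_edge_attributes(g, g_distance_dict, 'distance')
print(nx.closeness_centrality(g, distance='distance'))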
I know this is a pretty old question, but just wanted to point out that the reason why your degree centrality values are all 1 is probably because your graph is complete (i.e., all nodes are connected to every other node), and degree centrality refers to the proportion of nodes in the graph to which a node is connected.
Per networkx's documentation:
The degree centrality for a node v is the fraction of nodes it is connected to.
The degree centrality values are normalized by dividing by the maximum possible degree in a simple graph n-1 where n is the number of nodes in G.
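You can check the complete-graph explanation on a small example:
import networkx as nx

# In a complete graph every node is connected to all n-1 others,
# so the normalized degree centrality is 1.0 for every node.
print(nx.degree_centrality(nx.complete_graph(5)))
# {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0}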

Calculating total area of polygons that intersects with other polygons in Postgis

I want to calculate in PostGIS the total area of the 'a' polygons that intersect with the 'b' polygons.
SELECT DISTINCT a.fk_sites,
SUM(ST_Area(a.the_geom)/100) as area
FROM parcelles a, sites b
WHERE st_intersects(a.the_geom,b.the_geom)
GROUP BY a.fk_sites
I need to do a SELECT DISTINCT because an 'a' polygon may intersect with several 'b' polygons, so the same 'a' polygon would otherwise be returned several times.
This works fine; I just have the problem that not all areas are calculated correctly. A few seem to ignore the DISTINCT, so that the calculated area reflects the SUM over all rows, including the duplicated 'a' records (even though they should be eliminated).
When I run the query without the SUM function, I get the correct number of 'a' polygons, and when I add up their areas I get the right value.
SELECT DISTINCT a.fk_sites,
ST_Area(a.the_geom)/100 as area
FROM parcelles a, sites b
WHERE st_intersects(a.the_geom,b.the_geom)
ORDER BY a.fk_sites
Is the combination of SELECT DISTINCT and the SUM / GROUP BY not correct?
This may have something to do with your fk_sites column, because the query is at least syntactically fine, although doing a DISTINCT on a double precision value is never a good thing. Keep in mind that DISTINCT is applied to the result rows after GROUP BY and SUM, so it cannot remove the duplicate rows the join produces before they are summed.
You can solve this by identifying the distinct rows from a in a sub-query, then sum() in the main query:
SELECT fk_sites, sum(ST_Area(the_geom)/100) AS area
FROM (
    SELECT DISTINCT a.fk_sites, a.the_geom
    FROM parcelles a
    JOIN sites b ON ST_Intersects(a.the_geom, b.the_geom)
) sub
GROUP BY fk_sites
ORDER BY fk_sites;
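If the DISTINCT makes you uneasy (it has to compare the geometry column too), an equivalent sketch is to test for intersection with EXISTS, so each parcel row is read at most once:
SELECT a.fk_sites, sum(ST_Area(a.the_geom) / 100) AS area
FROM parcelles a
WHERE EXISTS (
    SELECT 1
    FROM sites b
    WHERE ST_Intersects(a.the_geom, b.the_geom)
)
GROUP BY a.fk_sites
ORDER BY a.fk_sites;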

Associating different breeds through turtle attributes

Say I have two classes of turtles, cars and insurers. There are 5000 cars and 100 insurers. Initially, cars are assigned a random insurer 1 through 100. Cars and insurers have several attributes:
cars-own [make model age insurance capacity]
insurers-own [number-of-customers minimum-premium maximum-premium average-premium]
What I want to do is count the number of cars with insurance = x and assign that value to number-of-customers for insurer x. For example, if there are fourteen cars with insurer 24, I want number-of-customers for insurer 24 to take the value 14.
This seems like it should be straightforward, but since I'm operating across two agentsets I'm having difficulty implementing it. Help would be greatly appreciated. Thank you!
EDIT: Additionally, is there a way to generalize this to a link breed? For example, a road network consists of directed links between nodes, and I want to count the number of cars on any given link:
breed [cars car]
breed [insurers insurer]
breed [road_nodes road_node]
directed-link-breed [road_segments road_segment]
cars-own [make model age insurance capacity current-road-segment]
insurers-own [number-of-customers minimum-premium maximum-premium average-premium]
road_segments-own [number-cars-here]
As in the cars/insurers case, I'd like the value of number-cars-here for road_segment x y to be the number of cars with current-road-segment = "road_segment x y".
There are many ways to do this, but directed links seem the obvious one. Unless you would otherwise compute the same number over and over, don't keep a number-of-customers attribute at all. Just make one directed link from each customer to its insurer, and then count the insurer's in-links whenever you want number-of-customers.
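A minimal sketch of that link-based approach, assuming a directed link breed called policies (the name is made up for illustration); the counts of 100 insurers and 5000 cars come from the question:
breed [cars car]
breed [insurers insurer]
directed-link-breed [policies policy]     ;; hypothetical link breed: car -> insurer

to setup
  clear-all
  create-insurers 100
  create-cars 5000 [
    create-policy-to one-of insurers      ;; each car links to a random insurer
  ]
  reset-ticks
end

;; an insurer's number of customers is just its count of incoming policy links
to-report customers-of [an-insurer]
  report [count my-in-policies] of an-insurer
end
The same idea covers the edit: if current-road-segment stores the link agent itself rather than a string, a road segment can report count cars with [current-road-segment = myself] whenever it needs number-cars-here.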