Merge node-list and edge-list in Gephi

I have a question about Gephi.
I'm doing a project for my school in which I analyze a website using Python. I decided to look at The Movie Database website. From https://www.themoviedb.org I extracted the most popular movies and used them as the nodes. For the edges I treated a shared genre as a relationship (e.g. two films that share a genre, such as horror, are linked by an edge). Note: for simplicity I considered only the first page of the most popular movies, so I have 20 nodes. The node list has 20 nodes and 0 edges. The edge list has 15 nodes and 150 edges.
The problem is that when I upload the node list first and then the edge list, Gephi doesn't match the nodes in the edge list to those in the node list.
I have for example this:
and in the data laboratory I have this:
It is as if the node list is separate from the edge list. Why is this happening?
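One possible cause, assuming both files are CSVs loaded through Gephi's spreadsheet importer with an `Id` column in the node list and `Source`/`Target` columns in the edge list: if the edge list refers to films by a different value than the node list's `Id` (for example by title instead of by numeric id), Gephi has nothing to match on and creates new nodes. The minimal Python sketch below checks for that before importing. The sample rows are hypothetical, not taken from the question's data.

```python
import csv
import io


def unmatched_endpoints(nodes_csv, edges_csv):
    """Return edge endpoints that do not appear in the node list's Id column."""
    node_ids = {row["Id"].strip() for row in csv.DictReader(io.StringIO(nodes_csv))}
    missing = set()
    for row in csv.DictReader(io.StringIO(edges_csv)):
        for column in ("Source", "Target"):
            endpoint = row[column].strip()
            if endpoint not in node_ids:
                missing.add(endpoint)
    return missing


# Hypothetical sample data: the second edge uses a title instead of an id.
NODES = "Id,Label\n1,Film A\n2,Film B\n3,Film C\n"
EDGES = "Source,Target,Type\n1,2,Undirected\nFilm A,3,Undirected\n"

print(unmatched_endpoints(NODES, EDGES))
```

If the returned set is empty and the lists still look separate in the Data Laboratory, it may be worth checking whether the edge list was appended to the existing workspace or imported into a new one.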

Related

Tableau Map Report

I am working on a map sales report that shows the sales by product for various territories. The territories are based on zip codes and are custom territories that overflow into multiple states or cover only part of a state. I have gotten everything set up and it looks good for the most part, EXCEPT in 2 areas.
1.) One of the sales numbers shows up in Alaska, which is not visible if a user is zoomed in on the USA (we are US-based, so that is the only area relevant to show anyway). Is there a way to force a sales number to show up at a user-defined location? For instance, can I show it on the State of Washington instead of Alaska, or can it only default to the largest (by area) part of a user-created territory?
2.) Since we are US-based, is there a way to move the states of Alaska and Hawaii closer to the rest of the US? I know that using the dashboard is a workaround, but it does not look good.
I'm not sure this could be a complete answer, but I think this question has more than one take.
That being said, if your worksheet uses zip codes to create the map, I don't think you can force Tableau to draw data away from the position defined by the specific geographic role.
The only thing that comes to my mind is switching your approach from a geographic role (country, state, city, zip, etc.) to more generic lat/long coordinates.
Doing so, you can manually map your Alaska zip codes to lat/long coordinates in more "continental" areas.
However, this would require a lot of data manipulation prior to Tableau.
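As a rough illustration of that data manipulation, here is a minimal Python sketch that swaps in display coordinates for selected zip codes before the data reaches Tableau. The column names, the override table and all coordinates are assumptions made up for the example; nothing here is a Tableau feature.

```python
# Hypothetical display overrides: zip code -> (lat, lon) where the mark should be drawn.
DISPLAY_OVERRIDES = {"99501": (47.5, -120.5)}


def display_coordinates(rows, overrides):
    """Return copies of the rows with lat/lon replaced for overridden zip codes."""
    result = []
    for row in rows:
        lat, lon = overrides.get(row["zip"], (row["lat"], row["lon"]))
        result.append({**row, "lat": lat, "lon": lon})
    return result


# Hypothetical input rows with approximate coordinates.
rows = [
    {"zip": "99501", "lat": 61.2, "lon": -149.9, "sales": 1200},
    {"zip": "98101", "lat": 47.6, "lon": -122.3, "sales": 3400},
]
remapped = display_coordinates(rows, DISPLAY_OVERRIDES)
for row in remapped:
    print(row)
```

The output can then be loaded into Tableau with the lat and lon columns used as plain coordinates instead of the zip code geographic role.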
An alternative way to accomplish something similar to what you describe in your second point is to use 3 separate worksheets in a single dashboard: continental, Alaska, Hawaii.
I did something similar with US data and faced the same problem for Hawaii, so I used a floating worksheet placed in the bottom left corner of the continental map.

Gremlin- find and connect sub-graphs

My graph contains undirected topological network data, and my goal is to build a query that finds all subnetworks matching specific networking rules, creates a vertex for each subnetwork, and connects those that have a path between them. The intention is to shrink the big graph by replacing each subnetwork subgraph with a single vertex.
To find all subnetworks I took the 'connected components' query from the Gremlin recipes
and added my networking rules to the stopping conditions. But right now I'm having a hard time connecting these subnetworks to each other.
I'm providing a sample graph script here (using a different networking domain) that contains PCs, routers and other equipment nodes. The query should find all LANs by grouping connected PCs and, for each LAN, return the ids of the other LANs that have a path to it.
Direction has no meaning in this graph, and a path between subgraphs may contain many types of nodes (routers, equipment, etc.).
My GraphDB is OrientDB.
Networking Graph Image
Result should look like this:
==>LAN 1: {pcs: [1, 2, 3], connected LANs: [LAN 2, LAN 3]}
==>LAN 2: {pcs: [4, 5, 6], connected LANs: [LAN 1]}
==>LAN 3: {pcs: [8, 7], connected LANs: [LAN 1]}
This is the query's first part (finding all subnetworks):
g.V().hasLabel('PC').emit(cyclicPath().or().not(both())).
repeat(__.where(without('a')).store('a').both()).until(or(cyclicPath(), hasLabel('Router'))).
group().by(path().unfold().limit(1)).
by(path().local(unfold().filter(hasLabel('PC')).values('id')).unfold().dedup().fold()).unfold()
My questions are:
I can identify connectivity between subnetworks by traversing from some arbitrary node in every subnetwork until I reach a node that exists in another subnetwork. How do I write that in Gremlin?
How can I create a new graph out of this query's results?
What is the performance of this type of query in a big graph, say 30M nodes?
Create graph script:
g = TinkerGraph.open().traversal()
g.addV("PC").property("id","1").as("pc1").
addV("PC").property("id","2").as("pc2").
addV("PC").property("id","3").as("pc3").
addV("PC").property("id","4").as("pc4").
addV("PC").property("id","5").as("pc5").
addV("PC").property("id","6").as("pc6").
addV("PC").property("id","7").as("pc7").
addV("PC").property("id","8").as("pc8").
addV("Router").property("id","9").as("router1").
addV("Router").property("id","10").as("router2").
addV("Equipment").property("id","11").as("eq1").
addV("Equipment").property("id","12").as("eq2").
addV("Equipment").property("id","13").as("eq3").
addV("Equipment").property("id","14").as("eq4").
addE("Line").from("pc1").to("pc2").
addE("Line").from("pc1").to("eq3").
addE("Line").from("pc2").to("pc3").
addE("Line").from("pc3").to("eq1").
addE("Line").from("pc3").to("eq3").
addE("Line").from("pc4").to("pc5").
addE("Line").from("pc4").to("pc6").
addE("Line").from("pc5").to("pc6").
addE("Line").from("pc7").to("pc8").
addE("Line").from("router1").to("pc7").
addE("Line").from("router1").to("pc8").
addE("Line").from("router1").to("eq2").
addE("Line").from("router2").to("eq4").
addE("Line").from("eq1").to("router1").
addE("Line").from("eq3").to("router2").
addE("Line").from("eq4").to("pc4").
iterate()
This isn't a great answer because I think that I have to jump to your last question and ignore the first two of the three:
What is the performance of this type of query in a big graph, say 30M nodes?
If you modified the "Connected Component" recipe found here then I assume you read further down about the general expense of this sort of query for both OLTP and OLAP. I'd imagine that for 30M vertices you should be looking at OLAP-based processing (as opposed to that script you presented above). I suppose you might be able to do it with TinkerGraph/GraphComputer on a large enough machine with a lot of memory, but this might just be a job for SparkGraphComputer as suggested toward the end of the recipe.
I think that your first two questions seem to depend on your approach to and success around the third question and that those initial questions might get more focused or even change a bit once you get that far. Perhaps it would be best to try to get your OLAP approach to "connected components" settled and then come back with some more specific questions.
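For the small sample graph, the grouping and linking logic behind the first question can at least be sketched outside Gremlin. The following is plain Python, not a Gremlin traversal, and it assumes two simplifications that the question does not confirm: a LAN is a set of PCs connected through PC-to-PC lines only, and two LANs are connected when a path between them passes through non-PC vertices only. It holds the whole graph in memory, so it says nothing about how a 30M-vertex graph would behave.

```python
from collections import defaultdict, deque

# Sample graph from the question: vertex id -> label, plus undirected edges.
LABELS = {**{i: "PC" for i in range(1, 9)}, 9: "Router", 10: "Router",
          11: "Equipment", 12: "Equipment", 13: "Equipment", 14: "Equipment"}
EDGES = [(1, 2), (1, 13), (2, 3), (3, 11), (3, 13), (4, 5), (4, 6), (5, 6),
         (7, 8), (9, 7), (9, 8), (9, 12), (10, 14), (11, 9), (13, 10), (14, 4)]


def build_adjacency(edges):
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def find_lans(labels, adjacency):
    """Group PCs into LANs: connected components over PC-to-PC edges only."""
    lan_of, lans = {}, []
    for start in sorted(v for v, label in labels.items() if label == "PC"):
        if start in lan_of:
            continue
        lan_of[start] = len(lans)
        members, queue = [], deque([start])
        while queue:
            vertex = queue.popleft()
            members.append(vertex)
            for neighbour in adjacency[vertex]:
                if labels[neighbour] == "PC" and neighbour not in lan_of:
                    lan_of[neighbour] = len(lans)
                    queue.append(neighbour)
        lans.append(sorted(members))
    return lans, lan_of


def connected_lans(members, labels, adjacency, lan_of):
    """Indexes of LANs reachable from this LAN through non-PC vertices only."""
    seen, queue, found = set(members), deque(members), set()
    while queue:
        vertex = queue.popleft()
        for neighbour in adjacency[vertex]:
            if neighbour in seen:
                continue
            seen.add(neighbour)
            if labels[neighbour] == "PC":
                found.add(lan_of[neighbour])  # reached another LAN, stop here
            else:
                queue.append(neighbour)       # keep walking through equipment
    return sorted(found)


adjacency = build_adjacency(EDGES)
lans, lan_of = find_lans(LABELS, adjacency)
result = {
    f"LAN {i + 1}": {
        "pcs": members,
        "connected LANs": [f"LAN {j + 1}"
                           for j in connected_lans(members, LABELS, adjacency, lan_of)],
    }
    for i, members in enumerate(lans)
}
for name, info in result.items():
    print(name, info)
```

Each key of `result` corresponds to one vertex of the reduced graph and each "connected LANs" entry to one of its edges, which is one way to think about the second question.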

Why does an .OSM file contain untagged nodes?

The .osm file I have:
// Weird part of the file
<node id="104511" lat="52.1696253" lon="0.131889"
version="3" timestamp="2013-03-05T18:51:38Z"
changeset="15262147" uid="103253" user="gormur"
/>
contains nodes that, apart from latitude and longitude, have
no other meaningful info (in my opinion). I have no idea whether such a node is a building, a bus stop, or an intersection of two streets.
1) Why do people add such nodes to the file?
2) What's a simple way to remove such nodes from the .OSM file and keep only tagged nodes, such as:
<node id="104520" lat="52.1951248" lon="0.1312155" ...>
<tag k="highway" v="traffic_signals"/>
</node>
3) Could an untagged node indicate the intersection of 2 streets? Can I find out which streets meet at the node by looking at which ways it belongs to? How can I know that such a node is the intersection of 2 streets rather than a building on the corner of 2 streets?
Not all nodes have/need tags.
Nodes that are part of a way (a street, a building, a forest etc.) just exist to define the geometry of that way. In that case all the necessary tags (highway, building, landuse and so on) are on the corresponding way and not on the node. A way keeps a reference to all nodes it consists of. So in order to know if a node is part of a building you have to take a look at the way it belongs to (if it belongs to a way). Also note that a node can belong to multiple ways, or to no way at all.
It depends on the specific feature whether the tags are primarily used on nodes, ways or relations. Buildings, for example, are mostly mapped as ways, sometimes as relations and rarely as nodes. In contrast, bus stops are mostly just nodes.
For more information take a look at the OSM XML file format, OSM elements and OSM tags.
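To make the node/way relationship concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It lists the tagged nodes and, for every node, the ways that reference it. The two node ids come from the question; the third node, both ways and their tags are hypothetical and only there to give the untagged node something to belong to.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Trimmed-down OSM XML; the ways and node 104530 are made up for illustration.
SAMPLE = """<osm version="0.6">
  <node id="104511" lat="52.1696253" lon="0.131889"/>
  <node id="104520" lat="52.1951248" lon="0.1312155">
    <tag k="highway" v="traffic_signals"/>
  </node>
  <node id="104530" lat="52.1700000" lon="0.1320000"/>
  <way id="1">
    <nd ref="104511"/>
    <nd ref="104520"/>
    <tag k="highway" v="residential"/>
  </way>
  <way id="2">
    <nd ref="104511"/>
    <nd ref="104530"/>
    <tag k="highway" v="service"/>
  </way>
</osm>"""


def tagged_nodes(root):
    """Ids of nodes that carry at least one <tag> child."""
    return [node.get("id") for node in root.findall("node")
            if node.find("tag") is not None]


def ways_by_node(root):
    """Map each node id to the ids of the ways that reference it via <nd ref>."""
    membership = defaultdict(list)
    for way in root.findall("way"):
        for nd in way.findall("nd"):
            membership[nd.get("ref")].append(way.get("id"))
    return membership


root = ET.fromstring(SAMPLE)
print(tagged_nodes(root))
print(ways_by_node(root)["104511"])
```

In this sample the untagged node 104511 is referenced by two highway ways, which is the kind of evidence that would suggest an intersection. Deleting untagged nodes from a real extract would remove the geometry of every way that refers to them, so filtering on way membership is usually safer than filtering on tags alone.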

Is it possible to connect two separate nodes by two links?

I am wondering if it is possible to do this, as I am trying to build a traffic simulation model and may need to use this feature, should it exist, in my model.
There are two, and only two, conditions under which a pair of turtles may be connected by more than one link:
If the links are directed, you can have two links, going in opposite directions.
If the links are different breeds.
You might consider alternatives like having a single link but adding a links-own variable to the links containing a weight, count, or other information.

Group annotations with same name

I'm currently developing an iOS app which shows realtime data. I receive 33,265 timingpoints from the API, all of which are stops. Stops on opposite sides of the street are counted as 2, and bus stations with multiple platforms are counted as many times as there are platforms.
Now, this is confusing on a map. You'd want all the data for, say, a bus station on one screen rather than browsing through 10 platforms to find the bus you'd like to take. So how can I group these annotations, which have the same name and are often near or overlapping each other?
You can find an example of the JSON results from the API here: http://pastebin.com/RiKS4G0Q
Just make a new entity Location and give each stop a to-one relationship to it (the reverse is to-many, of course). Now several stops can share a location and you can present the data in an appropriate way. During import, you could decide to create one shared location for stops whose coordinates are close enough to each other (and whose names maybe correspond).
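The import-time grouping can be sketched as follows. This is language-neutral logic written in Python rather than Core Data code, and the field names, the 150 m threshold and the sample stops are all assumptions for illustration; the real API fields are not shown in the question.

```python
import math


def distance_m(a, b):
    """Approximate great-circle distance in metres between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(h))


def group_stops(stops, max_distance_m=150):
    """Attach each stop to an existing location with the same name nearby,
    or start a new location when none qualifies."""
    locations = []
    for stop in stops:
        for location in locations:
            same_name = location["name"] == stop["name"]
            if same_name and distance_m(location["coord"], stop["coord"]) <= max_distance_m:
                location["stops"].append(stop)
                break
        else:
            locations.append({"name": stop["name"], "coord": stop["coord"], "stops": [stop]})
    return locations


# Hypothetical stops: two platforms of one station, a same-named stop in
# another town, and an unrelated stop.
stops = [
    {"name": "Central Station", "coord": (52.0900, 5.1100), "platform": "A"},
    {"name": "Central Station", "coord": (52.0903, 5.1102), "platform": "B"},
    {"name": "Central Station", "coord": (51.9200, 4.4700), "platform": "A"},
    {"name": "Market Square", "coord": (52.0950, 5.1200), "platform": "A"},
]
locations = group_stops(stops)
for location in locations:
    print(location["name"], len(location["stops"]))
```

Each resulting location would then become one map annotation, with its stops shown on the detail screen. Comparing every stop against every location is fine for a sketch, but for tens of thousands of stops a lookup keyed by name would avoid most of the comparisons.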