I'm using SNAP facebook dataset for social network analysis. SNAP uses simple edge list as a data format "node1 node2" . How can I read SNAP dataset in Apache Giraph? I am reading the file with BufferedReader line per line but do not know how to save it in BSP model with adjacency lists. Can someone help me with a code example in java?
I would also like to add information about the nodes (characteristics each user/node has) how can I do that in Giraph?
You can use SNAP facebook dataset directly. In your command instead of using -vif ... use -eif org.apache.giraph.io.formats.IntNullTextEdgeInputFormat. This format reads each line as (source_vertex destination_vertex) just like SNAP dataset.
Related
I'm planning a climbing trip and I wanted to retrieve the climbing locations from a leaflet map.
I thought I could use chromedriver and selenium to get the information I wanted but I'm having difficulty scanning through all markers since I can't understand where all the informations are stored.
Could someone guide me through how I could get the information? (also without using selenium)
The map in question is: https://www.climbingsardinia.com/topos/maps/
Thank you in advance.
in that page you will see a global variable called cttm_markers, it contains all markers informations and relative coordinates.
For example, cttm_markers[0][[0].additionalData.title evaluates "M.te Arci – Trebina Longa".Further, cttm_markers[0][[0]._latlng is an object {lat,lng} that contains coordinates.
Try to open console and paste this: JSON.stringify(cttm_markers[0].map(c=>({title:c.additionalData.title,latlng:c._latlng}))), it will print a json.
One way to do it is to get some information on one of the markers and use this to search the request reponses made by the page (you can do this in the debug tools, usually opened with F12). One of the markers for example reveals the location "Grighini". The base request redirects to (in my case) https://www.climbingsardinia.com/topos/maps/?doing_wp_cron=1651158093.0027918815612792968750
Searching the response, reveals that in line 1908 there's the string "Grighini". This line contains a serialized JSON array, containing the markers.
Is there a place to find sample CFD data in this format please?
I am trying to create scripts for Paraview to automate files opening and want to use the track function.
Data from this article can be used to test out the file format:
I want to make a network graph which shows the distribution of our documents in our folder structure.
I have the nodefile, edgefile and gephi graph file in this location:
https://1drv.ms/f/s!AuVfRBdVHkO7hgs5K9r9f7jBBAUH
What I do is:
Run the algorithm ForceAtlas2 with scaling 10-20, dissuade hub marked and prevent overlap marked, all other standard setting.
What I get is a graph with groups radial/spherical distributed. However, what I want is a tree directed network graph.
Anyone know how I can adjust Gephi to make this?
Thanks!
I just found a solution.
I tested the file format as shown on the Yed site "import excel file" page
http://yed.yworks.com/support/manual/import_excel.html
This gave me the Yed import dialog (took a life time to figure out that it's a pop up menu and not selectable through the standard menu)
Anyway, it worked and I've adjusted the test files with the data prepared for the Gehpi. This was pretty easy, I could used the source target ID's etc. Just copy paste.
I load it into Yed and used some directed and radial clustering algorithms on it. Works fine!
Below you can find the excel node/edge file used to import in Yed and the graph file you can open with Yed to see the final radial result.
https://1drv.ms/f/s!AuVfRBdVHkO7hg6DExK_eVkm5_mR
Only thing to figure out is how to combine the weight (which represents the number of documents) with the node size.
Unfortunately, as of version 0.9.0, Gephi no longer supports hierarchical graphs. Maybe try using a previous version?
Other alternatives involve more complex software, such as Graphviz, but you need a .dot file instead of your .csv. I looked all over, but could not find an easy-to-use csv to dot converter.
You could try looking at d3-hierarchy, a node.js program, but then again you need to use the not-so-user-friendly npm. If you look at the link, it looks like it can produce the kind of diagram you're looking for.
*.pbf("Protocolbuffer Binary Format") is primarily intended as an alternative to the XML format.
There are two formats of *.osm.pbf and *.vector.pbf. What tools can I use to open these files? (I know JOSM can open *.osm.pbf files, but it can't open *.vector.pbf files.)
If I want to write own *.vector.pbf files in Mapbox, how do I work for that?
Thanks!
Regarding question #2, extracting PBF data
Using GDAL's ogr2ogr is the easiest method (I found).
Given a file named 1583.vector.pbf decode it to a, for example, shapefile (folder) named output:
# cmd show prog. output format output name input name
ogr2ogr -progress -f "ESRI Shapefile" output 1583.vector.pbf
Regarding question #3, creating PBF data
Use the same command as above but swap the input/outputs and output format:
# example source: https://gdal.org/drivers/vector/mvt.html
ogr2ogr -f MVT mytileset source.gpkg -dsco MAXZOOM=10
The Vector tiles used by Mapbox are serialized as Protocol Buffers.
Protocol Buffers allow you to efficiently compress the vector data inside the tile.
The Mapbox Tile Specification is available on github.
Esri has also adopted the same specification for their products.
You can find a list of parsers, renderers & CLI utilities here: https://github.com/mapbox/awesome-vector-tiles
In the common scenario, you can use mapbox-gl-js to render the vector tiles on the client. To generate vector tiles, you can use Mapbox Studio. This will require uploading your data online in the Studio. You can also use Mapbox Studio Classic (the older version) to generate the tiles locally.
Internally, Mapbox Studio uses the tilelive API, so you can programatically generate the tiles. In the list above there are other good alternatives as well.
I have a text file called "test.txt" which contains data in libsvm format.
Data in this file is represented as follows:
165475 0:246870 1124384:2 342593:7 1141651:1 297582:1 1186846:1 17725:1 656602:1
463304:1 766612:1 573309:1 290046:1 748198:1 216665:1 950594:2 909004:1 29008:1
105623:1 5018:5 806027:1 1125729:1 757846:1 1023921:2 612980:1 120767:1 51340:1
108172:5 674420:2
where 1st term represents the label and remaining represents the feature and its weight(separated by : ).This is a very huge file(with every label having lots of features and weights).
I am using scikit with ipython notebook and want to load this data in notebook to start processing it.
Can someone tell how to do that.Thanks in advance.
Use load_svmlight_file from sklearn.datasets.