Loading a large part of the graph in a single traversal/query - Titan

I would like to load a larger part of a graph with one query/traversal (in order to save network requests).
So what I would like to do:
One traversal
Retrieve a large part of the graph.
Start from a given vertex (normally by id)
In tree format (for processing afterwards)
Include edges (for processing afterwards)
Process the data afterwards to put it into a data structure.
For example, in the graph below I would like to get everything starting from "hercules", but I do not want the "lives" edge or the data beyond it.
So far I got this:
GraphTraversal traversal = titanGraph.traversal()
        .V().has("name", "hercules").as("v")
        .outE("battled").as("e")
        .inV()
        .tree();
traversal.next();
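If the goal is the whole subtree rather than a single hop, one option (a sketch, assuming Titan 1.x with TinkerPop 3; titanGraph is the graph instance from the snippet above) is to combine repeat()/emit() with tree(), so vertices and the edges between them come back in one round trip:

import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.process.traversal.step.util.Tree;
import static org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.outE;

GraphTraversalSource g = titanGraph.traversal();

// One traversal: start at "hercules", follow only "battled" edges to any
// depth, and collect the full paths (vertices and edges) into a Tree.
Tree tree = g.V().has("name", "hercules")
        .repeat(outE("battled").inV())  // "lives" edges are never followed
        .emit()                         // keep intermediate levels, not just leaves
        .tree()
        .next();

The returned Tree is a nested map keyed by the vertices and edges on each path, which you can then walk to build your own data structure.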


Should I use neo4j when I want to save a d3 force graph for later rendering?

It seems that if I want to render a d3 force-directed graph with nodes and links, the links passed to the simulation must be a list of 2-tuples.
This means I need to export the neo4j "connections" back to a list of 2-tuples if I want to render the graph again next time.
In that case, I couldn't get any benefit from the "native" (fast) graph database.
Should I then just store the 2-tuple list in MongoDB instead? That would save the time spent exporting the neo4j connections to a list of 2-tuples.
The main reason for using a native graph database is hardly ever pure visualization. But yes, d3 or any other library you use will expect a list of nodes and edges. There are many tools, including ours (Graphileon), that do that conversion of a set of nodes/edges or entire paths for you.
disclosure: I work for Graphileon
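As a rough illustration of that conversion (a sketch in plain Java, not any particular driver or Graphileon API; the Edge record and sample data are made up), turning a set of returned edges into the node list and 2-tuple link list d3 expects:

import java.util.*;

public class D3Export {

    // Hypothetical minimal edge, standing in for whatever your graph query returns.
    record Edge(String source, String target) {}

    public static void main(String[] args) {
        List<Edge> edges = List.of(
                new Edge("alice", "bob"),
                new Edge("bob", "carol"));

        // Deduplicate the endpoints into the node list...
        Set<String> nodes = new LinkedHashSet<>();
        // ...and keep each connection as the (source, target) 2-tuple
        // that d3-force expects for its links.
        List<String[]> links = new ArrayList<>();
        for (Edge e : edges) {
            nodes.add(e.source());
            nodes.add(e.target());
            links.add(new String[] { e.source(), e.target() });
        }

        System.out.println("nodes: " + nodes);
        links.forEach(l -> System.out.println("link: " + Arrays.toString(l)));
    }
}

Whether you run this conversion at query time or persist its output somewhere is then just a caching decision.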

Features of GStreamer

Does GStreamer have the following functionalities/features, or is it possible to implement them on top of GStreamer:
Time windows: set up the graph such that a sink pad of one element does not just receive the current frame, but also the n previous frames and the m future frames, including when seeking to a new position.
No data copies when passing data between elements; instead, the same buffer is reused.
Having shared data between multiple elements on different branches that changes with time, but is buffered in such a way that all elements get the same value for it for the same frame index.
Q1) Time windows
You need to write your plugin using GstAdapter.
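For the windowing logic itself (a language-agnostic sketch in Java, not the GstAdapter C API; all names here are made up), the idea is to queue incoming frames and only push a frame downstream once its n predecessors and m successors have arrived:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

// Sketch of the windowing an adapter-based element performs: buffer
// incoming frames and hand downstream a window of n past frames, the
// current frame, and m future frames.
class FrameWindower<F> {
    private final int n, m;
    private final Deque<F> buffer = new ArrayDeque<>();
    private final Consumer<List<F>> downstream;

    FrameWindower(int n, int m, Consumer<List<F>> downstream) {
        this.n = n; this.m = m; this.downstream = downstream;
    }

    void push(F frame) {
        buffer.addLast(frame);
        // A full window is n past + 1 current + m future frames.
        if (buffer.size() == n + 1 + m) {
            downstream.accept(List.copyOf(buffer));
            buffer.removeFirst();  // slide the window by one frame
        }
    }

    // After a seek, the buffered history is no longer contiguous: drop it.
    void flushOnSeek() {
        buffer.clear();
    }
}

A real plugin would do the same bookkeeping with gst_adapter_push() in its chain function and clear the adapter on flush/seek events.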
Q2) No data copies when passing data between elements
It's done by default. No data is copied from element to element unless required; only a pointer to a GstBuffer instance is passed along. If an element such as an encoder or filter needs to work on a buffer to produce new data, a new GstBuffer instance is created with the newly generated data in GstMemory, obviously.
Q3) Having shared data between multiple elements
Not sure exactly what you mean. It may be possible to achieve what you want by using GstMemory sharing; take a look at gst_memory_share(), gst_buffer_copy_region(), or gst_adapter_get_buffer().

Separate files to load or long line of code

http://rich.littlebigfoot.org.uk/test7.html
I am creating a map and will be loading 20 or so walks onto it. Each walk will have upwards of 50 points, which will make for a very long file. Is it better to create a separate file for each walk, to aid any editing needs, or just to load one very long one, please?
If I create separate walk files, do I simply call them normally?
Thanks
Rich
20 paths with ~50 points each (i.e. 20 × 50 = 1,000 pairs of coordinates) is not that long/big, actually. See for example this GeoJSON file with the shapes of all the countries in the world: it has 10k+ points.
A good practice is indeed to separate your data and application in different files, so that you can update them separately.
Whether to split your data further into separate paths is up to you, depending on how often you will update them, whether you have an automated process to generate them, and whether your visitors are limited in bandwidth. Just weigh the benefit of caching what does not change in the user's browser against the number of network requests needed to download separate files.
By the way, note that you can build a Polyline by passing an array of arrays of coordinates; you do not have to build actual L.latLng points:
var polyline = L.polyline(
    [
        [50.2184, -5.4793],
        [50.2166, -5.4850],
        [50.2168, -5.4884] // etc.
    ],
    polylineOptions);

How should I store my large MATLAB data files during analysis?

I am having issues with 'data overload' while processing point cloud data in MATLAB. This is what I am currently doing:
I begin with my raw data files, each on the order of ~30 MB.
I then do initial processing on them to extract n individual objects and remove outlying points; the objects are all combined into a 1 x n structure, testset, saved into testset.mat (~100 MB).
So far so good. Now things become complicated:
For each point in each object in testset, I compute one of a number of features, which ends up being a matrix of some size (one matrix per point). The size of the matrix, and some other properties of the computation, are parameters of the calculation. I save these computed features in a 1 x n cell array, each cell of which contains an array of the matrices for each point.
I then save this cell array in a .mat file, whose name specifies the parameters, the name of the test data used, and the types of features extracted. For example:
testset_feature_type_A_5x5_0.2x0.2_alpha_3_beta_4.mat
For each of these files I then do some further processing (using a classification algorithm). Again, there are more parameters to set.
So now I am in a tricky situation: each final piece of data has come through some path, but the path taken (and the parameters set along it) is not intrinsically held with the data itself.
So my question is:
Is there a better way to do this? Can anyone with experience of working with large datasets in MATLAB suggest a way to store the data and the parameter settings more efficiently and more integrally?
Ideally, I would be able to look up a certain piece of data without having to run regexes over the file names; but there is also an incentive to keep individually processed files separate, to save system memory when loading them in (and to help prevent corruption).
The time taken for each calculation (some ~2 hours) prohibits computing data 'on the fly'.
For a similar problem, I have created a class structure that does the following:
Each object is linked to a raw data file
For each processing step, there is a property
The set method of each property saves the data to file (in a directory with the same name as the raw data file), stores the file name, and updates a "status" property to indicate that the step is done.
The get method of each property loads the data if the file name has been stored and the status indicates "done".
Finally, the objects can be saved/loaded, so that I can do some processing now, save the object, load it later, and immediately know how far along the particular data set is in the processing pipeline.
Thus, the only data in memory is the data that is currently being worked on, and you can easily know which data set is at which processing stage. Furthermore, if you set up your methods to accept arrays of objects, you can do very convenient batch processing.
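A minimal sketch of that pattern (written in Java rather than MATLAB purely to illustrate the idea; all names are made up):

import java.io.*;

// Sketch of the lazy save/load pattern described above: each processing
// step persists its result next to the raw file and records a status
// flag, so only the data currently being worked on is held in memory.
class PipelineItem implements Serializable {
    private final File rawFile;            // the raw data file this object wraps
    private transient double[] features;   // step result; reloaded on demand
    private boolean featuresDone = false;  // "status" property for this step
    private File featuresFile;

    PipelineItem(File rawFile) { this.rawFile = rawFile; }

    // "Set method": write the result to disk, remember where, mark the step done.
    void setFeatures(double[] f) throws IOException {
        featuresFile = new File(rawFile.getPath() + ".features");
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(featuresFile))) {
            out.writeObject(f);
        }
        features = f;
        featuresDone = true;
    }

    // "Get method": load the result from disk only when it is asked for.
    double[] getFeatures() throws IOException, ClassNotFoundException {
        if (features == null && featuresDone) {
            try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(featuresFile))) {
                features = (double[]) in.readObject();
            }
        }
        return features;
    }

    boolean isFeaturesDone() { return featuresDone; }
}

Because the features field is transient, serializing the object itself stays cheap: only the file references and status flags travel with it.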
I'm not completely sure whether this is what you need, but the save command allows you to store multiple variables inside a single .mat file. If your parameter settings are stored in, for example, an array, then you can save them together with the data set in a single .mat file. Upon loading the file, both the dataset and the parameter array are restored.
Or do you want to be able to load the parameters without loading the data? Then I would personally opt for the cheap solution of having a second set of files with just the parameters (but similar filenames).

How might Union/Find data structures be applied to Kruskal's algorithm?

http://en.wikipedia.org/wiki/Disjoint_sets
http://en.wikipedia.org/wiki/Kruskal's_algorithm
The Union/Find data structure is used for disjoint sets...
As stated in the entry for Kruskal's algorithm, you can use the union/find structure to test (via FIND) whether an edge connects two different trees or would form a cycle when added.
The same structure is updated (via UNION) when an edge does not form a cycle and is added to the spanning tree.
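A minimal sketch of both operations inside Kruskal's loop, in Java (the Edge record and the sample graph are made up; union by rank is omitted for brevity):

import java.util.*;

public class Kruskal {
    // Union/Find with path compression.
    static int[] parent;

    static int find(int x) {                  // FIND: which tree is x in?
        if (parent[x] != x) parent[x] = find(parent[x]);
        return parent[x];
    }

    static boolean union(int a, int b) {      // UNION: merge the two trees
        int ra = find(a), rb = find(b);
        if (ra == rb) return false;           // same tree: edge would form a cycle
        parent[ra] = rb;
        return true;
    }

    record Edge(int u, int v, int weight) {}

    public static void main(String[] args) {
        int vertices = 4;
        List<Edge> edges = new ArrayList<>(List.of(
                new Edge(0, 1, 1), new Edge(1, 2, 2),
                new Edge(0, 2, 3), new Edge(2, 3, 4)));

        parent = new int[vertices];
        for (int i = 0; i < vertices; i++) parent[i] = i;  // each vertex is its own tree

        edges.sort(Comparator.comparingInt(Edge::weight)); // Kruskal: cheapest edges first
        List<Edge> mst = new ArrayList<>();
        for (Edge e : edges) {
            if (union(e.u(), e.v())) mst.add(e);  // keep the edge only if it joins two trees
        }
        System.out.println("MST: " + mst);
    }
}

Here the FIND inside union() performs the cycle test, and the parent update is the UNION that records the two trees as merged.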