What mechanism is used to determine the absolute order of vertices in TinkerPop/Titan?

When performing the following traversals:
graph.addVertex("a")
graph.addVertex("b")
graph.addVertex("c")
graph.traversal().V().range(0,2)
graph.traversal().V().range(2,3)
What determines the order in which I get these vertices back when using the range functionality? Am I guaranteed to get all three vertices a, b and c back?

Without an explicit order().by() you shouldn't expect a guaranteed order.
From the TinkerPop docs:
A Traversal’s result are never ordered unless explicitly by means of
order()-step. Thus, never rely on the iteration order between
TinkerPop3 releases and even within a release (as traversal
optimizations may alter the flow).

Related

How can an unbounded PCollection be immutable?

I am getting started in dataflow/apache beam, and I'm struggling to understand a concept. According to the documentation :
A PCollection is an immutable collection of values of type T. A PCollection can contain either a bounded or unbounded number of elements.
It is easy to understand that bounded PCollections are immutable. You get a file, you put it in a PCollection, you can't change it: Immutable.
What about unbounded PCollections? By definition they have no limit on the number of elements, so elements keep getting added to them indefinitely. How can something change perpetually and also be immutable?
An explanation would be great.
That's a good question! I believe the Programming Guide explains PCollection's immutability better than the JavaDoc. The immutability has to do with individual elements:
A PCollection is immutable. Once created, you cannot add, remove, or change individual elements. A Beam Transform might process each element of a PCollection and generate new pipeline data (as a new PCollection), but it does not consume or modify the original input collection.
Note: Beam SDKs avoid unnecessary copying of elements, so PCollection contents are logically immutable, not physically immutable. Changes to input elements may be visible to other DoFns executing within the same bundle, and may cause correctness issues. As a rule, it’s not safe to modify values provided to a DoFn.
Another way to look at it is that the set is logically immutable; it's just your view into it that changes over time (because you cannot see into the future). For example, ReadFromPubSub returns the (immutable, unbounded) set of all messages that will ever be published to this topic. From the Beam API you can't modify this set as a PCollection, but you can create other immutable, unbounded PCollections that are derived from it.
This is similar to the lazy, infinite structures that exist in functional languages like Haskell: you can only ever observe a portion of such a structure, but that doesn't mean the whole thing doesn't exist as an immutable object.
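The analogy with lazy, infinite structures can be illustrated outside of Beam. The following is only a sketch using Scala's standard LazyList (assuming Scala 2.13 or later); it is not Beam code, and the names LazyDemo, naturals and evens are made up for the illustration.

```scala
// Assumes Scala 2.13+, where LazyList is part of the standard library.
object LazyDemo {
  // Conceptually the whole infinite sequence 0, 1, 2, ... is one immutable value;
  // elements are only computed when they are observed.
  val naturals: LazyList[Int] = LazyList.from(0)

  // Deriving a new unbounded collection does not modify the original one.
  val evens: LazyList[Int] = naturals.map(_ * 2)
}
```

Taking a finite prefix, for example LazyDemo.evens.take(3).toList, observes part of the structure without changing it, much like a pipeline only ever sees the part of an unbounded PCollection that has arrived so far.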

How to ensure that only one item is added to janusgraph

Is there a way that I can ensure that any creation of a vertex in janusgraph with a given set of properties only results in one such vertex being created?
Right now, what I do is I traverse the graph and ensure that the number of vertices I find with particular properties is only one. For example:
val g = graph.traversal
val vertices = g.V().has("type", givenType).has("name", givenName).toList
if (vertices.size > 1) {
// the vertex is not unique, cannot add vertex
}
This can be done with the so-called get-or-create traversal, which is described in TinkerPop's Element Existence recipe and in the section "Using coalesce to only add a vertex if it does not exist" of the Practical Gremlin book.
For your example, this traversal would look like this:
g.V().has("type", givenType).has("name", givenName).
fold().
coalesce(unfold(),
addV("yourVertexLabel").
property("type", givenType).
property("name", givenName))
Note, however, that it depends on the graph provider whether this is an atomic operation. In JanusGraph, the existence check and the conditional vertex addition are executed as two separate operations. This can lead to a race condition when two threads execute the traversal at the same time, in which case you can still end up with two vertices with these properties. So you currently need to ensure that two threads can't execute this traversal for the same properties in parallel, e.g., with locks in your application.
If you want more information about this topic in general, I just published a blog post about exactly this: How to Avoid Doppelgängers in a Graph Database. It also describes distributed locking as a way to implement locks for distributed systems and discusses possible improvements to better support upserts in JanusGraph in the future.
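As a rough illustration of the application-level locking mentioned above, here is a minimal sketch of a per-key lock built only from java.util.concurrent. The object KeyedLock and its withLock method are hypothetical names, not part of JanusGraph or TinkerPop, and the sketch only protects threads inside a single JVM; several application instances would need a distributed lock instead.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.locks.ReentrantLock

// Hypothetical helper: serializes all callers that use the same key.
object KeyedLock {
  // One lock per key. Entries are never evicted in this sketch,
  // so the map grows with the number of distinct keys.
  private val locks = new ConcurrentHashMap[String, ReentrantLock]()

  def withLock[A](key: String)(body: => A): A = {
    val lock = locks.computeIfAbsent(key, (_: String) => new ReentrantLock())
    lock.lock()
    try body finally lock.unlock()
  }
}

// Intended usage (not executed here), with the get-or-create traversal as the body:
// KeyedLock.withLock(s"$givenType/$givenName") {
//   ... run the fold().coalesce(...) traversal shown above ...
// }
```

The transaction commit presumably also has to happen inside the locked block; otherwise a second thread could run its existence check before the first vertex becomes visible.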

Scala PriorityQueue conflict resolution?

I'm working on a project that uses a PriorityQueue and A*. After a lot of digging, I think part of the problem my search runs into lies in the PriorityQueue. I'm guessing that when it holds nodes with equal scores (for example, one generated earlier and one later), it will choose the earlier one rather than the one that was most recently generated.
Does anyone know if a PriorityQueue prioritizes the newest node if the scores are the same? If not, how can I make it do this?
Thanks!
PriorityQueue uses a heap to select the next element. Beyond that it makes no guarantees about how the elements are ordered. If it is important to you that nodes are ordered by addition order, you should keep a count of the number of items added and prioritize by the tuple (priority, -order).
If you do anything else, even if it happens to work now, it may break at any arbitrary time since the API makes no guarantees about how it chooses from among equal elements.
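A minimal sketch of the counter-based tie-break, under two assumptions that are not in the question: lower scores are better (as is usual for A*), and among equal scores the most recently added node should come out first. The signs in the ordering depend on those choices, so they may differ from the (priority, -order) tuple above. TieBreak, Entry and NewestFirstQueue are hypothetical names.

```scala
import scala.collection.mutable

object TieBreak {
  // Hypothetical entry type: score, insertion sequence number, payload.
  final case class Entry[A](score: Int, seq: Long, value: A)

  final class NewestFirstQueue[A] {
    private var counter = 0L

    // mutable.PriorityQueue dequeues the maximum element under its Ordering.
    // Negating the score makes the lowest score the maximum; among equal
    // scores the larger sequence number (the newer entry) wins.
    // Note: negation overflows for Int.MinValue, which this sketch ignores.
    private val queue = mutable.PriorityQueue.empty[Entry[A]](
      Ordering.by((e: Entry[A]) => (-e.score, e.seq))
    )

    def enqueue(score: Int, value: A): Unit = {
      queue.enqueue(Entry(score, counter, value))
      counter += 1
    }

    def dequeue(): A = queue.dequeue().value

    def isEmpty: Boolean = queue.isEmpty
  }
}
```

Because the sequence number is part of the ordering key, no two entries ever compare as equal, so the result no longer depends on how the heap happens to treat ties.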

Why no immutable double linked list in Scala collections?

Looking at this question, where the questioner is interested in the first and last instances of some element in a List, it seems a more efficient solution would be to use a DoubleLinkedList that could search backwards from the end of the list. However there is only one implementation in the collections API and it's mutable.
Why is there no immutable version?
Because you would have to copy the whole list each time you want to make a change. With a normal linked list, you can at least prepend to the list without having to copy everything. And if you do want to copy everything on every change, you don't need a linked list for that. You can just use an immutable array.
There are many impediments to such a structure, but one is very pressing: a doubly linked list cannot be persistent.
The logic behind this is pretty simple: from any node on the list, you can reach any other node. So, if I added an element X to this list DL, and tried to use a part of DL, I'd face this contradiction: from the node pointing to X one can reach every element in part(DL), but, by the properties of the doubly linked list, that means from any element of part(DL) I can reach the node pointing to X. Since part(DL) is supposed to be immutable and part of DL, and since DL did not include the node pointing to X, that just cannot be.
Non-persistent immutable data structures might have some uses, but they are generally bad for most operations, since they need to be recreated whenever a derivative is produced.
Now, there's the minor matter of creating mutually referencing strict objects, but this is surmountable. One can use by-name parameters and lazy vals, or one can do as Scala's List does: actually create a mutable collection, and then "freeze" it in an immutable state (see ListBuffer and its toList method).
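The by-name/lazy val approach can be sketched as follows. This is only an illustration with hypothetical names (LazyNodes, Node, first, second), not an existing collection class.

```scala
object LazyNodes {
  // Hypothetical node type: the neighbours are by-name parameters cached in
  // lazy vals, so a node can be constructed before its neighbours exist.
  final class Node[A](val value: A, p: => Option[Node[A]], n: => Option[Node[A]]) {
    lazy val prev: Option[Node[A]] = p
    lazy val next: Option[Node[A]] = n
  }

  // Two nodes that reference each other; the links are only evaluated on first access.
  lazy val first: Node[Int] = new Node(1, None, Some(second))
  lazy val second: Node[Int] = new Node(2, Some(first), None)
}
```

The result is immutable from the outside, but it is still not persistent: adding a third node would mean rebuilding every node, since each existing node would need a new neighbour reference.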
Because it is logically impossible to create a mutually (circular) referential data-structure with strict immutability.
You cannot create two nodes that point to each other due to simple existential ordering priority, in that at least one of the nodes will not exist when the other is created.
It is possible to get this circularity with tricks involving laziness (which is implemented with mutation), but the real question then becomes why you would want this thing in the first place?
As others have noted, there is no persistent implementation of a double-linked list. You will need some kind of tree to get close to the characteristics you want.
In particular, you may want to look at finger trees, which provide O(1) access to the front and back, amortized O(1) insertion to the front and back, and O(log n) insertion elsewhere. (That's in contrast to most other commonly-used trees which have O(log n) access and insertion everywhere.)
See also:
video explanation of finger trees (by the implementor of finger trees in clojure.contrib)
finger tree implementation in Scala (I haven't used it personally, but it's the top google hit)
As a supplement to the answer of @KimStebel, I'd like to add:
If you are looking for a data structure suitable for the problem that motivated this question, you might have a look at Extreme Cleverness: Functional Data Structures in Scala by @DanielSpiewak.

Scala SeqLike distinct preserves order?

The apidoc of distinct in SeqLike says:
Builds a new sequence from this sequence without any duplicate elements.
Returns: A new sequence which contains the first occurrence of every element of this sequence.
Am I correct that no ordering guarantee is provided? More generally, do methods of SeqLike provide any process-in-order (and return-in-order) guarantee?
On the contrary: operations on Seqs guarantee the output order (unless the API says otherwise). This is one of the basic properties of sequences, where the order matters, versus sets, where only containment matters.
It depends on the collection you were using in the first place. If you had a list you'll get your order. If on the other hand you had a set, then probably not.
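A small sketch of the difference, using only the standard library (DistinctDemo and its members are names made up for the example):

```scala
object DistinctDemo {
  val xs: List[Int] = List(3, 1, 3, 2, 1)

  // distinct on a Seq keeps the first occurrence of each element, in order.
  val deduped: List[Int] = xs.distinct // List(3, 1, 2)

  // Going through a Set also removes duplicates, but a Set makes no
  // promise about iteration order.
  val viaSet: Set[Int] = xs.toSet
}
```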