B+ tree print elements is order - traversal

I am trying to implement a B+ tree myself, but I want to create a method that prints what elements does the B+ tree have. If I use a traversal(in or post order) I will get also the elements in the parent nodes, therefore I will have duplicate values. Is there a way to solve this thing?
Thanks

Mark the nodes as you traverse them. Once the node is marked, it can't be traversed.

Related

Copy e paste multiple rectangular nodes with the right order

In this example I select two rectangular nodes in order to copy and paste it.When i do it, the order of the nodes is not correct (see figure).How do i put them sequentially and automatically(node1,node2,node3,mode4)? thanks
If you want to have a sequential list of any object, you put them into a LinkedList.
In your case, have a LinkedList<Node> or use an AnyLogic "Collection" element, set it to LinkedList and the object type to Node.
Then, check how they work, it is standard Java stuff (independent of AnyLogic). QUite useful data type :)

Apache Spark - Implementing a distributed QuadTree

I am really, really, new to Apache Spark.
I am working on implementing Approximate LOCI (or ALOCI), an anomaly detection algorithm, on a distributed way over Spark. This algorithm is based on storing points in a QuadTree that is used to find a point's number of neighbors.
I know exactly how QuadTrees work. In fact, I have implemented such a structure in Java recently. But I am completely lost as far as it concerns the way that such a structure can work in a distributed way over Spark.
Something similar to what I need can be found in Geospark.
https://github.com/DataSystemsLab/GeoSpark/tree/b2b6f1d7f0015d5c9d663a7b28d5e1bb1043c413/core/src/main/java/org/datasyslab/geospark/spatialPartitioning/quadtree
GeoSpark uses in many cases a PointRDD class, that extends a SpatialRDD class which I can see that uses the QuadTree that can be found in the link above to partition the Spatial objects. That is what I understood, at least, in theory.
In practice, I still cannot figure this out. Let's say for example that I have millions of records in a csv and I want to read and load them in a QuadTree.
I could read a csv to an RDD, but then what? How does this RDD logically connects to the QuadTree I am trying to build?
Of course, I don't expect a working solution here. I just need the logic here to fill the gap in my mind. How do I implement a distributed QuadTree and how do I use it?
Ok, sadly there are no answers to this, but here I am two weeks later with a working solution. Not 100% sure if it is the right approach here, though.
I created a class named Element and turned each line of my csv to an RDD[Element]. I then created a serializable class named QuadNode which has a List[Elements] and an Array[String] of size 4. On adding elements to a node, these elements are added in the node's List. If the list get more than X elements (20 in my case), the node breaks into 4 children and the elements are sent to the children. Finally, I created a class QuadTree which has an RDD[QuadNodes] among its rest properties. Every time a node breaks to children then these children-nodes are added in the tree's RDD.
In a non-functional language, each node would have 4 pointers, one for each child. Since, we are in a distributed environment this approach could not work. So, I gave each node a unique Id. Root node has an id = "0". Root's nodes have ids "00", "01", "02" and "03". Node-"00" children have ids "000","001","002","003". In this way if we want to find all the descendants of a node, we filter our tree's RDD[QuadNode] by checking if nodes' ids startWith out node id. Reversing this logic helps us to find a node's parent node.
This is how I implemented my QuadTree, at least for now. If someone knows a better way of implementing this I would love to hear his/her opinion.

scala queue sort method

I am comparing a number of different methods for organizing the nodes at the "frontier" in dijkstra's single source shortest path algorithm. One of the implementations that I am playing around with is using q: scala.collection.mutable.Queue.
Essentially, each time I add a node to q, I sort q. This method, as expected, takes significantly longer than using scala.collection.mutable.PriorityQueue and a MinHeap that I implemented. My question is, what kind of sort is Queue using when I call q.sorted? I am specifically interested in the time complexity of the sorted implementation.
I have tried looking at the API (http://www.scala-lang.org/api/2.10.2/index.html#scala.collection.mutable.Queue) and code (https://github.com/scala/scala/blob/v2.10.2/src/library/scala/collection/mutable/Queue.scala#L1) but haven't been able to track this down.
Thank you in advance for your help.
Queue inherits sorted method from SeqLike. And you can see, that it creates new array of same elements, sorts array via java.util.Arrays.sort and then creates new structure of original type.

Tree collections in Scala

I want to implement a tree in Scala. My particular tree uses Swing Split panes to give multiple views of a geographical map. Any pane within a split pane can itself be further divided to give an extra view. Am I right in saying that neither TreeMap nor TreeSet provide Tree functionality? Please excuse me if I've misunderstood this. It strikes me there should be standard Tree collections and it is bad practice to keep reinventing the wheel. Are there any Tree implementation out there that might be the future Scala standard?
All Trees have three types of elements: a Root, Nodes and Leaves. Leaves and Nodes must have a single reference to a parent. Root and Nodes can have multiple references to child nodes and leaves. Leaves have zero children. Nodes and Root can not be deleted without their children being deleted. there's probably other rules / constraints that I've missed.
This seems like enough common specification to justify a standard collection. I would also suggest that there should be a standard subclass collection for the case where Root and Nodes can only have 2 children or a single leaf child. Which is what I want in my particular case.
Actually, a tree by itself is both pretty useless and pretty difficult to specify.
Starting with the latter, speaking strictly about the data structure, how many children can a tree have? Do nodes store values or not? Do nodes store metadata? Do children have pointers to their parents? Do you store the tree as nodes with pointers, or as positional elements on an array?
These are all questions to which the answer is "it depends". In fact, you stated that children have pointers to their parents, but that is not true for any immutable tree! You also seem to assume trees are always stored as node objects with references, when some trees are actually stored as nodes on a single array (such as a Heap).
Furthermore, not all these requirements can be accommodated -- some are mutually exclusive. Even if you ignore those, you are still left with a data structure which is not optimized for anything and clumsy to use because you have to deal with lots of details not relevant to you.
And, then, there's the second problem, which is that a tree, by itself, is useless. TreeSet and TreeMap take advantage of specific trees whose insertion/deletion/lookup algorithm makes them good data structures for sorted data. That, however, is not at all the only use for trees. Trees can be used to search for spatial algorithms, to represent tree-like real world information, to make up filesystems, etc. Sometimes the task is finding a tree inside a graph. Each of these uses require different representations and different algorithms -- the algorithms being what make them at all useful.
And, to top it off, writing a tree class is trivial. The problem is writing the algorithms to manipulate it.
There is a bit of mismatch between the notion of "tree" as a GUI widget—which you seem to be referring to—and tree as an ordered data structure. In the former case it is just a nested sequence, in the latter the aim is to provide for instance fast search algorithms, and you don't arbitrary manipulate the internal structure, where often the branching factor is constant and the tree height is kept balanced, etc. An example of the latter is collection.immutable.TreeMap which uses a self-balancing binary tree structure called Red-Black-Tree.
So these data structures are rather useless for bridging to javax.swing.TreeModel. There is little that can be done about this interface, so you'll probably stick with the default implementation DefaultTreeModel, a mutable non-generic structure (which is all that single threaded Swing needs).
For a discussion about having a scala-swing JTree wrapper, see this question. It also has a link to a Scala library for JTree.
Since you can use java classes with Scala, take a look at the javax.swing.tree package: http://docs.oracle.com/javase/6/docs/api/javax/swing/tree/package-summary.html, especially TreeModel and TreeNode, MutableTreeNode and DefaultMutableTreeNode. They were designed to be used with Swing, but are pretty much a standard tree implementation.
Other than that, implementing a (generic) tree should be pretty straightforward.
Since gui application imposes low performance demands on the tree collection you may use general graph library constrained to represent only tree structured-graph: http://scala-graph.org/
TreeSet and TreeMap are both based on RedBlack:
Red-black trees are a form of balanced binary trees where some nodes
are designated “red” and others designated “black.” Like any balanced
binary tree, operations on them reliably complete in time logarithmic
to the size of the tree.
(quote from Scala 2.8 Collections)
RedBlack is not documented very well but if you look at the source of TreeSet and TreeMap it's pretty easy to figure out how to use it, though it doesn't fill all (most?) of your requirements (nodes don't have references to the parent, etc).

Why no immutable double linked list in Scala collections?

Looking at this question, where the questioner is interested in the first and last instances of some element in a List, it seems a more efficient solution would be to use a DoubleLinkedList that could search backwards from the end of the list. However there is only one implementation in the collections API and it's mutable.
Why is there no immutable version?
Because you would have to copy the whole list each time you want to make a change. With a normal linked list, you can at least prepend to the list without having to copy everything. And if you do want to copy everything on every change, you don't need a linked list for that. You can just use an immutable array.
There are many impediments to such a structure, but one is very pressing: a doubly linked list cannot be persistent.
The logic behind this is pretty simple: from any node on the list, you can reach any other node. So, if I added an element X to this list DL, and tried to use a part of DL, I'd face this contradiction: from the node pointing to X one can reach every element in part(DL), but, by the properties of the doubly linked list, that means from any element of part(DL) I can reach the node pointing to X. Since part(DL) is supposed to be immutable and part of DL, and since DL did not include the node pointing to X, that just cannot be.
Non-persistent immutable data structures might have some uses, but they are generally bad for most operations, since they need to be recreated whenever a derivative is produced.
Now, there's the minor matter of creating mutually referencing strict objects, but this is surmountable. One can use by-name parameters and lazy vals, or one can do like Scala's List: actually create a mutable collection, and then "freeze" it in immutable state (see ListBuffer and it's toList method).
Because it is logically impossible to create a mutually (circular) referential data-structure with strict immutability.
You cannot create two nodes that point to each other due to simple existential ordering priority, in that at least one of the nodes will not exist when the other is created.
It is possible to get this circularity with tricks involving laziness (which is implemented with mutation), but the real question then becomes why you would want this thing in the first place?
As others have noted, there is no persistent implementation of a double-linked list. You will need some kind of tree to get close to the characteristics you want.
In particular, you may want to look at finger trees, which provide O(1) access to the front and back, amortized O(1) insertion to the front and back, and O(log n) insertion elsewhere. (That's in contrast to most other commonly-used trees which have O(log n) access and insertion everywhere.)
See also:
video explanation of finger trees (by the implementor of finger trees in clojure.contrib)
finger tree implementation in Scala (I haven't used it personally, but it's the top google hit)
As a supplemental to the answer of #KimStebel I like to add:
If you are searching for a data structure suitable for the question that motivated you to ask this question, then you might have a look at Extreme Cleverness: Functional Data Structures in Scala by #DanielSpiewak.