Assume the list below holds directory paths. How can I find the lowest common ancestor of these paths?
List(
("A/A1/A11/A111/a111.txt", "(M)"),
("A/A1/A11/A112/a112.txt", "(M)"),
("A/A1/A12/A121/", "(D)")
)
The function below helped:
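// Longest common parent folder of two paths.
def longestCommonParent(s1: String, s2: String): String = {
  val maxSize = scala.math.min(s1.length, s2.length)
  var i = 0
  // Advance past the common character prefix.
  while (i < maxSize && s1(i) == s2(i)) i += 1
  // The prefix may end in the middle of a name, so cut back to the last complete folder.
  parentFolder(s1.take(i))
}

// Everything before the last "/", or "" when the path contains no "/".
def parentFolder(path: String): String = {
  val idx = path.lastIndexOf("/")
  if (idx < 0) "" else path.substring(0, idx)
}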
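The function above compares two paths only, and folding it pairwise over the list can end up one level too high, because an intermediate result such as "A/A1/A11" is then treated like a file path. Here is a minimal sketch of a segment-based variant for the whole list. The names CommonAncestor, dirSegments, commonPrefix and lowestCommonAncestor are made up for illustration, and the sketch assumes "/"-separated paths in which everything after the last "/" is a file name.

```scala
object CommonAncestor {
  // Folder segments of a path: everything before the last '/'.
  // Assumption: a trailing '/' marks a directory, anything after the last '/' is a file name.
  def dirSegments(path: String): List[String] = {
    val idx = path.lastIndexOf('/')
    if (idx < 0) Nil else path.substring(0, idx).split('/').toList
  }

  // Longest common prefix of two segment lists.
  def commonPrefix(a: List[String], b: List[String]): List[String] =
    a.zip(b).takeWhile { case (x, y) => x == y }.map(_._1)

  // Lowest common ancestor folder of all paths ("" if there is none).
  def lowestCommonAncestor(paths: Seq[String]): String =
    if (paths.isEmpty) ""
    else paths.map(dirSegments).reduce(commonPrefix).mkString("/")
}
```

For the list in the question, CommonAncestor.lowestCommonAncestor(list.map(_._1)) should give "A/A1".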
I am trying to understand the "add" and "extract" methods of the FPTree class:
(https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala).
What is the purpose of the 'summaries' variable?
Where is the Group list?
I assume it is the following; am I correct?
val numParts = if (numPartitions > 0) numPartitions else data.partitions.length
val partitioner = new HashPartitioner(numParts)
What will 'summaries' contain for the 3 transactions {a,b,c}, {a,b}, {b,c}, where all items are frequent?
def add(t: Iterable[T], count: Long = 1L): FPTree[T] = {
require(count > 0)
var curr = root
curr.count += count
t.foreach { item =>
val summary = summaries.getOrElseUpdate(item, new Summary)
summary.count += count
val child = curr.children.getOrElseUpdate(item, {
val newNode = new Node(curr)
newNode.item = item
summary.nodes += newNode
newNode
})
child.count += count
curr = child
}
this
}
def extract(
minCount: Long,
validateSuffix: T => Boolean = _ => true): Iterator[(List[T], Long)] = {
summaries.iterator.flatMap { case (item, summary) =>
if (validateSuffix(item) && summary.count >= minCount) {
Iterator.single((item :: Nil, summary.count)) ++
project(item).extract(minCount).map { case (t, c) =>
(item :: t, c)
}
} else {
Iterator.empty
}
}
}
After a bit of experimenting, it turns out to be pretty straightforward:
1+2) The partition is indeed the Group representative.
It is also how the conditional transactions are calculated:
private def genCondTransactions[Item: ClassTag](
transaction: Array[Item],
itemToRank: Map[Item, Int],
partitioner: Partitioner): mutable.Map[Int, Array[Int]] = {
val output = mutable.Map.empty[Int, Array[Int]]
// Filter the basket by frequent items pattern and sort their ranks.
val filtered = transaction.flatMap(itemToRank.get)
ju.Arrays.sort(filtered)
val n = filtered.length
var i = n - 1
while (i >= 0) {
val item = filtered(i)
val part = partitioner.getPartition(item)
if (!output.contains(part)) {
output(part) = filtered.slice(0, i + 1)
}
i -= 1
}
output
}
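To see how the partition plays the role of the group, here is a self-contained sketch of the same loop. It is not Spark's code: the getPartition function parameter stands in for Spark's Partitioner, the object name CondTransactions is made up, and the rank-modulo-2 partitioning used in the note below is an assumption for illustration.

```scala
import scala.collection.mutable

object CondTransactions {
  // Same idea as the method above, with a plain function instead of a Partitioner.
  def genCondTransactions(
      transaction: Array[String],
      itemToRank: Map[String, Int],
      getPartition: Int => Int): mutable.Map[Int, Array[Int]] = {
    val output = mutable.Map.empty[Int, Array[Int]]
    // Keep only frequent items, replace them by their rank, and sort the ranks.
    val filtered = transaction.flatMap(itemToRank.get).sorted
    var i = filtered.length - 1
    // Walk from the highest rank down; each partition receives the longest prefix ending in one of its items.
    while (i >= 0) {
      val part = getPartition(filtered(i))
      if (!output.contains(part)) output(part) = filtered.slice(0, i + 1)
      i -= 1
    }
    output
  }
}
```

With ranks b -> 0, a -> 1, c -> 2 and getPartition = _ % 2, the transaction {a,b,c} is sent to partition 0 as [0, 1, 2] and to partition 1 as [0, 1].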
The summaries map is just a helper that stores the count of each item in the transactions.
extract/project generate the frequent itemsets (FIS) by recursing up and down through dependent FP-trees (project), checking summaries to decide whether a path needs to be traversed.
The summaries of node 'a' will hold {b:2, c:1}, and the children of node 'a' are 'b' and 'c'.
I have implemented a calculation to obtain the score of each node.
The formula to obtain the value is:
The children list cannot be empty, or a flag must be true.
The iterative way works pretty well:
class TreeManager {
def scoreNo(nodes:List[Node]): List[(String, Double)] = {
nodes.headOption.map(node => {
val ranking = node.key.toString -> scoreNode(Some(node)) :: scoreNo(nodes.tail)
ranking ::: scoreNo(node.children)
}).getOrElse(Nil)
}
def scoreNode(node:Option[Node], score:Double = 0, depth:Int = 0):Double = {
node.map(n => {
var nodeScore = score
for(child <- n.children){
if(!child.children.isEmpty || child.hasInvitedSomeone == Some(true)){
nodeScore = scoreNode(Some(child), (nodeScore + scala.math.pow(0.5, depth)), depth+1)
}
}
nodeScore
}).getOrElse(score)
}
}
But after I refactored this piece of code to use recursion, the results are totally wrong:
class TreeManager {
def scoreRecursive(nodes:List[Node]): List[(Int, Double)] = {
def scoreRec(nodes:List[Node], score:Double = 0, depth:Int = 0): Double = nodes match {
case Nil => score
case n =>
if(!n.head.children.isEmpty || n.head.hasInvitedSomeone == Some(true)){
score + scoreRec(n.tail, score + scala.math.pow(0.5, depth), depth + 1)
} else {
score
}
}
nodes.headOption.map(node => {
val ranking = node.key -> scoreRec(node.children) :: scoreRecursive(nodes.tail)
ranking ::: scoreRecursive(node.children)
}).getOrElse(Nil).sortWith(_._2 > _._2)
}
}
Node is a node of the tree, and it is represented by the following class:
case class Node(key:Int,
children:List[Node] = Nil,
hasInvitedSomeone:Option[Boolean] = Some(false))
And here is the part that I'm running to check the results:
object Main {
def main(bla:Array[String]) = {
val xx = new TreeManager
val values = List(
Node(10, List(Node(11, List(Node(13))),
Node(12,
List(
Node(14, List(
Node(15, List(Node(18))), Node(17, hasInvitedSomeone = Some(true)),
Node(16, List(Node(19, List(Node(20)))),
hasInvitedSomeone = Some(true))),
hasInvitedSomeone = Some(true))),
hasInvitedSomeone = Some(true))),
hasInvitedSomeone = Some(true)))
val resIterative = xx.scoreNo(values)
//val resRecursive = xx.scoreRec(values)
println("a")
}
}
The iterative way works (I've checked it), but I don't understand why the recursive one returns wrong values.
Any idea?
Thanks in advance.
The recursive version never recurses on the children of the nodes, only on the tail, whereas the iterative version correctly both recurses on the children and iterates over the tail.
You'll notice your "iterative" version is also recursive, by the way.
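A minimal sketch of a version without a mutable accumulator that does recurse on the children might look like this. It reflects my reading of scoreNode (each qualifying child adds 0.5^depth plus its own subtree score), so treat it as an assumption rather than a verified drop-in replacement. The names TreeScore, counts, score and scoreAll are made up, and scoreAll lists the nodes depth-first, which is a different order from scoreNo.

```scala
object TreeScore {
  case class Node(key: Int,
                  children: List[Node] = Nil,
                  hasInvitedSomeone: Option[Boolean] = Some(false))

  // A child counts if it has children of its own or its flag is true.
  def counts(n: Node): Boolean =
    n.children.nonEmpty || n.hasInvitedSomeone.contains(true)

  // Score of a node: every qualifying child adds 0.5^depth plus the score of its own subtree.
  def score(node: Node, depth: Int = 0): Double =
    node.children
      .filter(counts)
      .map(c => math.pow(0.5, depth) + score(c, depth + 1))
      .sum

  // Scores of all nodes, listed depth-first.
  def scoreAll(nodes: List[Node]): List[(Int, Double)] =
    nodes.flatMap(n => (n.key -> score(n)) :: scoreAll(n.children))
}
```

The important difference from scoreRec is that the recursive call goes into c's children (one level deeper), while the map over node.children handles the siblings.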
I want to write a for loop in Scala, but the counter should be incremented by more than one (the amount is variable) in some special cases.
You can do this with a combination of a filter and an external var. Here is an example:
var nextValidVal = 0
for (i <- 0 to 99; if i >= nextValidVal) {
var amountToSkip = 0
// Whatever this loop is for
nextValidVal = if (amountToSkip > 0) i + amountToSkip + 1 else nextValidVal
}
So in the main body of your loop, you can set amountToSkip to n according to your conditions. The next n values of i's sequence will then be skipped.
If your values come from some other kind of sequence, you could do it like this:
var skip = 0
for (o <- someCollection if { val res = skip == 0; skip = if (!res) skip - 1 else 0; res } ) {
// Do stuff
}
If you set skip to a positive value n in the body of the loop, the next n elements of the sequence will be skipped.
Of course, this is terribly imperative and side-effecty. I would look for other ways to do this wherever possible, by mapping, filtering or folding the original sequence.
You could implement your own stream with a variable step, for example:
import scala.collection.immutable.Stream
import ForStream._
object Test {
def main(args: Array[String]): Unit = {
val range = 0 to 20 by 1 withVariableStep; // in case you like definition through range
//val range = ForStream(0,20,1) // direct definition
for (i<- range) {
println(s"i=$i")
range.step = range.step + 1
}
}
}
object ForStream{
implicit def toForStream(range: Range): ForStream = new ForStreamMaster(range.start, range.end,range.step)
def apply(head:Int, end:Int, step:Int) = new ForStreamMaster(head, end,step)
}
abstract class ForStream(override val head: Int, val end: Int, var step: Int) extends Stream[Int] {
override val tailDefined = false
override val isEmpty = head > end
def withVariableStep = this
}
class ForStreamMaster(_head: Int, _end: Int, _Step: Int) extends ForStream(_head, _end,_Step){
override def tail = if (isEmpty) Stream.Empty else new ForStreamSlave(head + step, end, step, this)
}
class ForStreamSlave(_head: Int, _end: Int, _step: Int, val master: ForStream) extends ForStream(_head, _end,_step){
override def tail = if (isEmpty) Stream.Empty else new ForStreamSlave(head + master.step, end, master.step, master)
}
This prints:
i=0
i=2
i=5
i=9
i=14
i=20
You can define a ForStream from a Range via the implicit conversion, or define it directly. But be careful:
You are not iterating over a Range anymore!
A Stream should be immutable, but step is mutable!
Also, as @om-nom-nom noted, this might be better implemented with recursion.
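As a rough sketch of the recursive alternative: the loop index is passed as a parameter, so each iteration can choose its own increment without any mutable state. The names VariableStep and visit are hypothetical, and the step function is assumed to return the increment to apply after visiting i (values below 1 are treated as 1 so that the loop terminates).

```scala
import scala.annotation.tailrec

object VariableStep {
  // Visits indices from start (inclusive) to end (exclusive).
  // step(i) is the increment applied after visiting i.
  def visit(start: Int, end: Int)(step: Int => Int): List[Int] = {
    @tailrec
    def loop(i: Int, visited: List[Int]): List[Int] =
      if (i >= end) visited.reverse
      else loop(i + math.max(1, step(i)), i :: visited)
    loop(start, Nil)
  }
}
```

For example, VariableStep.visit(0, 6)(i => if (i == 2) 3 else 1) visits 0, 1, 2 and 5, jumping over 3 and 4.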
Why not use a do-while loop?
var x = 0;
do{
...something
if(condition){change x to something else}
else{something else}
x+=1
}while(some condition for x)
I've got the following Scala code (ported from Java):
import scala.util.control.Breaks._
object Main {
def pascal(col: Int, row: Int): Int = {
if(col > row) throw new Exception("Coloumn out of bound");
else if (col == 0 || col == row) 1;
else pascal(col - 1, row - 1) + pascal(col, row - 1);
}
def balance(chars: List[Char]): Boolean = {
val string: String = chars.toString()
if (string.length() == 0) true;
else if(stringContains(")", string) == false && stringContains("(", string) == false) true;
else if(stringContains(")", string) ^ stringContains("(", string)) false;
else if(getFirstPosition("(", string) > getFirstPosition(")", string)) false;
else if(getLastPosition("(", string) > getLastPosition(")", string)) false;
else if(getCount("(", string) != getCount(")", string)) false;
var positionOfFirstOpeningBracket = getFirstPosition("(", string);
var openingBracketOccurences = 1; //we already know that at the first position there is an opening bracket so we are incrementing it right away with 1 and skipping the firstPosition variable in the loop
var closingBracketOccurrences = 0;
var positionOfClosingBracket = 0;
breakable {
for(i <- positionOfFirstOpeningBracket + 1 until string.length()) {
if (string.charAt(i) == ("(".toCharArray())(0)) {
openingBracketOccurences += 1;
}
else if(string.charAt(i) == (")".toCharArray())(0) ) {
closingBracketOccurrences += 1;
}
if(openingBracketOccurences - closingBracketOccurrences == 0) { //this is an important part of the algorithm. if the string is balanced and at the current iteration opening=closing that means we know the bounds of our current brackets.
positionOfClosingBracket = i; // this is the position of the closing bracket
break;
}
}
}
val insideBrackets: String = string.substring(positionOfFirstOpeningBracket + 1, positionOfClosingBracket)
balance(insideBrackets.toList) && balance( string.substring(positionOfClosingBracket + 1, string.length()).toList)
def getFirstPosition(character: String, pool: String): Int =
{
for(i <- 0 until pool.length()) {
if (pool.charAt(i) == (character.toCharArray())(0)) {
i;
}
}
-1;
}
def getLastPosition(character: String, pool: String): Int =
{
for(i <- pool.length() - 1 to 0 by -1) {
if (pool.charAt(i) == (character.toCharArray())(0)) {
i;
}
}
-1;
}
//checks if a string contains a specific character
def stringContains(needle: String, pool: String): Boolean = {
for(i <- 0 until pool.length()) {
if(pool.charAt(i) == (needle.toCharArray())(0)) true;
}
false;
}
//gets the count of occurrences of a character in a string
def getCount(character: String, pool: String) = {
var count = 0;
for ( i <- 0 until pool.length()) {
if(pool.charAt(i) == (character.toCharArray())(0)) count += 1;
}
count;
}
}
}
The problem is that the Scala IDE (latest version for Scala 2.10.1) gives the following error at line 78 (on which there is a closing brace): "Type mismatch; Found Unit, expected Boolean". I really can't understand what the actual problem is. The message doesn't give any information about where the error might be.
In Scala (and most other functional languages) the result of a function is the value of the last expression in the block. The last expression of your balance function is the definition of function getCount, which is of type Unit (the Scala equivalent of void), and your function is declared as returning a Boolean, thus the error.
In practice, you've just screwed up your brackets, which would be obvious if you used proper indentation (Ctrl+A, Ctrl+Shift+F in scala-ide).
To make it compile, you can put the following two lines at the end of the balance method instead of in the middle:
val insideBrackets: String = string.substring(positionOfFirstOpeningBracket + 1, positionOfClosingBracket)
balance(insideBrackets.toList) && balance(string.substring(positionOfClosingBracket + 1, string.length()).toList)
I think you would also have to move the inner functions, such as getCount, to the top of balance.
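For comparison, here is a minimal sketch of a more idiomatic balance in which every branch ends in a Boolean expression, so the method's last expression has the declared type. This is not a port of the code above; it uses a simple counter of currently open brackets, and the names Parens and loop are made up.

```scala
import scala.annotation.tailrec

object Parens {
  def balance(chars: List[Char]): Boolean = {
    // open = number of '(' seen so far that have not been closed yet
    @tailrec
    def loop(rest: List[Char], open: Int): Boolean = rest match {
      case Nil         => open == 0
      case '(' :: tail => loop(tail, open + 1)
      case ')' :: tail => if (open > 0) loop(tail, open - 1) else false
      case _ :: tail   => loop(tail, open)
    }
    // The last expression of the method is its result, here a Boolean.
    loop(chars, 0)
  }
}
```

Note that the helper is defined before the final expression, so the value of loop(chars, 0) is what balance returns.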
I would like to be able to grow an Array-like structure up to a maximum size, after which the oldest (1st) element would be dropped off the structure every time a new element is added. I don't know what the best way to do this is, but one way would be to extend the ArrayBuffer class, and override the += operator so that if the maximum size has been reached, the first element is dropped every time a new one is added. I haven't figured out how to properly extend collections yet. What I have so far is:
class FiniteGrowableArray[A](maxLength:Int) extends scala.collection.mutable.ArrayBuffer {
override def +=(elem:A): <insert some return type here> = {
// append element
if(length > maxLength) remove(0)
<returned collection>
}
}
Can someone suggest a better path and/or help me along this one? NOTE: I will need to arbitrarily access elements within the structure multiple times in between the += operations.
Thanks
As others have discussed, you want a ring buffer. However, you also have to decide if you actually want all of the collections methods or not, and if so, what happens when you filter a ring buffer of maximum size N--does it keep its maximum size, or what?
If you're okay with merely being able to view your ring buffer as part of the collections hierarchy (but don't want to use collections efficiently to generate new ring buffers) then you can just:
class RingBuffer[T: scala.reflect.ClassTag](maxsize: Int) { // ClassTag replaces the deprecated ClassManifest
private[this] val buffer = new Array[T](maxsize+1)
private[this] var i0,i1 = 0
private[this] def i0up = { i0 += 1; if (i0>=buffer.length) i0 -= buffer.length }
private[this] def i0dn = { i0 -= 1; if (i0<0) i0 += buffer.length }
private[this] def i1up = { i1 += 1; if (i1>=buffer.length) i1 -= buffer.length }
private[this] def i1dn = { i1 -= 1; if (i1<0) i1 += buffer.length }
private[this] def me = this
def apply(i: Int) = {
val j = i+i0
if (j >= buffer.length) buffer(j-buffer.length) else buffer(j)
}
def size = if (i1<i0) buffer.length+i1-i0 else i1-i0
def :+(t: T) = {
buffer(i1) = t
i1up; if (i1==i0) i0up
this
}
def +:(t: T) = {
i0dn; if (i0==i1) i1dn
buffer(i0) = t
this
}
def popt = {
if (i1==i0) throw new java.util.NoSuchElementException
i1dn; buffer(i1)
}
def poph = {
if (i1==i0) throw new java.util.NoSuchElementException
val ans = buffer(i0); i0up; ans
}
def seqView = new IndexedSeq[T] {
def apply(i: Int) = me(i)
def length = me.size
}
}
Now you can use this easily directly, and you can jump out to IndexedSeq when needed:
val r = new RingBuffer[Int](4)
r :+ 7 :+ 9 :+ 2
r.seqView.mkString(" ") // Prints 7 9 2
r.popt // Returns 2
r.poph // Returns 7
r :+ 6 :+ 5 :+ 4 :+ 3
r.seqView.mkString(" ") // Prints 6 5 4 3 -- 9 fell off the front (7 was already popped)
0 +: 1 +: 2 +: r
r.seqView.mkString(" ") // Prints 0 1 2 6 -- added to front; 3,4,5 fell off
r.seqView.filter(_>1) // Vector(2,6)
and if you want to put things back into a ring buffer, you can
class RingBufferImplicit[T: scala.reflect.ClassTag](ts: Traversable[T]) {
def ring(maxsize: Int) = {
val rb = new RingBuffer[T](maxsize)
ts.foreach(rb :+ _)
rb
}
}
implicit def traversable2ringbuffer[T: scala.reflect.ClassTag](ts: Traversable[T]) = {
new RingBufferImplicit(ts)
}
and then you can do things like
val rr = List(1,2,3,4,5).ring(4)
rr.seqView.mkString(" ") // Prints 2 3 4 5
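If constant-time eviction does not matter, the behaviour asked for in the question can also be sketched by wrapping an ArrayBuffer instead of extending it. This is only a sketch under that assumption: BoundedBuffer is a made-up name, it exposes just the few methods shown, and remove(0) shifts all remaining elements, so each eviction is linear in the buffer size (unlike the ring buffer above).

```scala
import scala.collection.mutable.ArrayBuffer

class BoundedBuffer[A](maxLength: Int) {
  require(maxLength > 0)
  private val underlying = ArrayBuffer.empty[A]

  // Appends elem; once maxLength is exceeded, the oldest (first) element is dropped.
  def +=(elem: A): this.type = {
    underlying += elem
    if (underlying.length > maxLength) underlying.remove(0)
    this
  }

  // Random access between appends, as required in the question.
  def apply(i: Int): A = underlying(i)
  def length: Int = underlying.length
  def toList: List[A] = underlying.toList
}
```

After appending 1 to 5 to a new BoundedBuffer[Int](3), toList gives List(3, 4, 5).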