Assert two streams are the same - reactive-programming

I am trying to build a test case that checks if two streams are the same. Zip can be used to check the value elements are the same, but it doesn't help if one stream is the wrong length. Any ideas on how to approach this?

There's an operator for that: sequenceEqual.
Returns
(Observable): An observable sequence that contains a single element which indicates whether both sequences are of equal length and their corresponding elements are equal according to the specified equality comparer.
Here's a simple example showing the length equality check.
var log = console.log.bind(console);
Rx.Observable.of(1, 2, 3)
  .sequenceEqual(Rx.Observable.of(1, 2, 3))
  .subscribe(log); // logs true
Rx.Observable.of(1, 2, 3)
  .sequenceEqual(Rx.Observable.of(1, 2))
  .subscribe(log); // logs false
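The same distinction can be sketched outside Rx with plain Scala iterators. This is only an analogy, not the Rx implementation: the object and method names are made up for illustration. It shows why a zip-only comparison misses a length mismatch while a length-aware comparison catches it.

```scala
object StreamEquality {
  // Zip-only comparison: zip stops at the shorter side, so extra elements go unnoticed.
  def zipOnlyEqual[A](left: Iterator[A], right: Iterator[A]): Boolean =
    left.zip(right).forall { case (a, b) => a == b }

  // Length-aware comparison: elements must match and both sides must end together.
  def sameSequence[A](left: Iterator[A], right: Iterator[A]): Boolean =
    left.sameElements(right)

  def main(args: Array[String]): Unit = {
    println(zipOnlyEqual(Iterator(1, 2, 3), Iterator(1, 2)))    // true: the mismatch in length is missed
    println(sameSequence(Iterator(1, 2, 3), Iterator(1, 2)))    // false
    println(sameSequence(Iterator(1, 2, 3), Iterator(1, 2, 3))) // true
  }
}
```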

Related

hasDefiniteSize and knownSize

I am going through the List methods in Scala.
val mylist = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 3, 10)
I am quite confused by hasDefiniteSize and knownSize.
For List, hasDefiniteSize returns true and knownSize returns -1.
What is the exact theory behind these methods?
Both methods are defined in a supertype that List shares with potentially infinite collections (such as Stream, LazyList and Iterator).
For more details, I believe the documentation puts it best.
Here is the one for hasDefiniteSize in version 2.13.1:
Tests whether this collection is known to have a finite size. All
strict collections are known to have finite size. For a non-strict
collection such as Stream, the predicate returns true if all elements
have been computed. It returns false if the stream is not yet
evaluated to the end. Non-empty Iterators usually return false even if
they were created from a collection with a known finite size.
Note: many collection methods will not work on collections of infinite
sizes. The typical failure mode is an infinite loop. These methods
always attempt a traversal without checking first that hasDefiniteSize
returns true. However, checking hasDefiniteSize can provide an
assurance that size is well-defined and non-termination is not a
concern.
Note that hasDefiniteSize is deprecated with the following message:
(Since version 2.13.0) Check .knownSize instead of .hasDefiniteSize
for more actionable information (see scaladoc for details)
The documentation for knownSize further states:
The number of elements in this collection, if it can be cheaply
computed, -1 otherwise. Cheaply usually means: Not requiring a
collection traversal.
List is an implementation of a linked list, which is why List(1, 2, 3).hasDefiniteSize returns true (the collection is not boundless) but List(1, 2, 3).knownSize returns -1 (computing the collection size requires traversing the whole list).
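A minimal sketch of how knownSize behaves across a few Scala 2.13 collections. The describe helper is hypothetical and exists only for this illustration; it never traverses its argument, so it is safe to call on an infinite LazyList.

```scala
object KnownSizeDemo {
  // Reports the size only when the collection can give it without a traversal.
  def describe[A](xs: Iterable[A]): String =
    if (xs.knownSize >= 0) s"size ${xs.knownSize} known up front"
    else "size unknown without traversal"

  def main(args: Array[String]): Unit = {
    println(describe(Vector(1, 2, 3)))  // indexed: size is stored
    println(describe(1 to 10))          // ranges are indexed too
    println(describe(List(1, 2, 3)))    // linked list: would need a walk
    println(describe(LazyList.from(1))) // infinite and not yet evaluated
  }
}
```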
Some collections know their size
Vector(1,2,3).knownSize // 3
and some do not
List(1,2,3).knownSize // -1
If a collection knows its size, then some operations can be optimised. For example, consider how Iterable#sizeCompare uses knownSize to possibly return early:
def sizeCompare(that: Iterable[_]): Int = {
  val thatKnownSize = that.knownSize
  if (thatKnownSize >= 0) this sizeCompare thatKnownSize
  else {
    val thisKnownSize = this.knownSize
    if (thisKnownSize >= 0) {
      val res = that sizeCompare thisKnownSize
      // can't just invert the result, because `-Int.MinValue == Int.MinValue`
      if (res == Int.MinValue) 1 else -res
    } else {
      val thisIt = this.iterator
      val thatIt = that.iterator
      while (thisIt.hasNext && thatIt.hasNext) {
        thisIt.next()
        thatIt.next()
      }
      java.lang.Boolean.compare(thisIt.hasNext, thatIt.hasNext)
    }
  }
}
See related question Difference between size and sizeIs
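As a rough illustration of why this matters in practice, the sketch below (Scala 2.13, with a made-up hasMoreThan helper) uses sizeIs and sizeCompare to answer size questions without computing the full size, which also makes the check usable on an infinite LazyList.

```scala
object SizeCompareDemo {
  // True when xs has more than n elements; stops as soon as the answer is clear.
  def hasMoreThan[A](xs: Iterable[A], n: Int): Boolean = xs.sizeIs > n

  def main(args: Array[String]): Unit = {
    println(hasMoreThan(List(1, 2, 3), 2))     // true
    println(hasMoreThan(List(1, 2, 3), 3))     // false
    println(hasMoreThan(LazyList.from(1), 5))  // true, and it terminates
    // Negative result: the left collection is shorter than the right one.
    println(List(1, 2).sizeCompare(Vector(1, 2, 3)) < 0) // true
  }
}
```

Calling `.size > 5` on the same infinite LazyList would never return, which is the failure mode the documentation quoted above warns about.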

Scala: For loop that matches ints in a List

New to Scala. I'm iterating a for loop 100 times. 10 times I want condition 'a' to be met and 90 times condition 'b'. However I want the 10 a's to occur at random.
The best way I can think is to create a val of 10 random integers, then loop through 1 to 100 ints.
For example:
val z = List.fill(10)(100).map(scala.util.Random.nextInt)
z: List[Int] = List(71, 5, 2, 9, 26, 96, 69, 26, 92, 4)
Then something like:
for (i <- 1 to 100) {
  whenever i == to a number in z: 'Condition a met: do something'
  else {
    'condition b met: do something else'
  }
}
I tried using contains and == and =! but nothing seemed to work. How else can I do this?
Your generation of random numbers could yield duplicates... is that OK? Here's how you can easily generate 10 unique numbers 1-100 (by generating a randomly shuffled sequence of 1-100 and taking first ten):
val r = scala.util.Random.shuffle(1 to 100).toList.take(10)
Now you can simply partition the range 1-100 into the numbers that are contained in your randomly generated list and those that are not:
val (listOfA, listOfB) = (1 to 100).partition(r.contains(_))
Now do whatever you want with those two lists, e.g.:
println(listOfA.mkString(","))
println(listOfB.mkString(","))
Of course, you can always simply go through the list one by one:
(1 to 100).map {
  case i if (r.contains(i)) => println("yes: " + i) // or whatever
  case i => println("no: " + i)
}
What you consider to be a simple for-loop actually isn't one. It's a for-comprehension, which is syntactic sugar that desugars into chained calls to map, flatMap and withFilter (or to foreach when there is no yield). Yes, it can be used in the same way as you would use the classical for-loop, but this is only because List is in fact a monad. Without going into too much detail, if you want to do things the idiomatic Scala way (the "functional" way), you should avoid trying to write classical iterative for loops and prefer getting a collection of your data and then mapping over its elements to perform whatever it is that you need. Note that collections have a really rich library behind them which allows you to invoke cool methods such as partition.
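To make the desugaring concrete, here is a small sketch. The values xs and ys are made up for illustration, and the method chain is approximately what the compiler produces rather than its literal output.

```scala
object ForDesugarDemo {
  val xs = List(1, 2, 3)
  val ys = List(10, 20)

  // A for-comprehension with a guard and a yield ...
  val viaFor: List[Int] =
    for {
      x <- xs if x % 2 == 1
      y <- ys
    } yield x * y

  // ... corresponds roughly to this chain of method calls.
  val viaMethods: List[Int] =
    xs.withFilter(x => x % 2 == 1).flatMap(x => ys.map(y => x * y))

  def main(args: Array[String]): Unit = {
    println(viaFor)     // List(10, 20, 30, 60)
    println(viaMethods) // List(10, 20, 30, 60)
    // Without yield, the loop body becomes a foreach call:
    for (x <- xs) println(x)
    xs.foreach(x => println(x))
  }
}
```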
EDIT (for completeness):
Also, you should avoid side effects, or at least push them as far down the road as possible. I'm talking about the second example from my answer. Let's say you really need to log that stuff (you would be using a logger, but println is good enough for this example). Doing it inside the mapping function is bad practice. By the way, note that you could use foreach instead of map in that case, because you're not collecting results, just performing the side effects.
A better way is to compute what you need by transforming each element into an appropriate string. So, calculate the needed strings and accumulate them into results:
val results = (1 to 100).map {
  case i if (r.contains(i)) => ("yes: " + i) // or whatever
  case i => ("no: " + i)
}
// do whatever with results, e.g. print them
Now results contains a list of a hundred "yes x" and "no x" strings, but you didn't do the ugly thing and perform logging as a side effect in the mapping process. Instead, you mapped each element of the collection into a corresponding string (note that original collection remains intact, so if (1 to 100) was stored in some value, it's still there; mapping creates a new collection) and now you can do whatever you want with it, e.g. pass it on to the logger. Yes, at some point you need to do "the ugly side effect thing" and log the stuff, but at least you will have a special part of code for doing that and you will not be mixing it into your mapping logic which checks if number is contained in the random sequence.
(1 to 100).foreach { x =>
  if (z.contains(x)) {
    // do something
  } else {
    // do something else
  }
}
or you can use a partial function, like so:
(1 to 100).foreach {
  case x if (z.contains(x)) => // do something
  case _ => // do something else
}
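Putting the answers above together, a self-contained sketch might look like the following. The names pickDistinct and label are hypothetical, the chosen numbers are stored in a Set so that each contains check is a cheap lookup, and the fixed seed is an assumption made only so that runs are repeatable.

```scala
import scala.util.Random

object RandomSplitDemo {
  // Pick `count` distinct numbers from 1..upTo by shuffling and taking a prefix.
  def pickDistinct(count: Int, upTo: Int, rng: Random): Set[Int] =
    rng.shuffle((1 to upTo).toList).take(count).toSet

  // Pure step: label every number without printing anything.
  def label(upTo: Int, chosen: Set[Int]): Seq[String] =
    (1 to upTo).map(i => if (chosen.contains(i)) s"a: $i" else s"b: $i")

  def main(args: Array[String]): Unit = {
    val chosen = pickDistinct(10, 100, new Random(42)) // seed fixed for repeatability only
    label(100, chosen).foreach(println)                // the side effect happens at the edge
  }
}
```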

How to do equality check of two DataFrames?

I have the following scenario:
I have 2 DataFrames, each containing only 1 column.
Let's say
DF1=(1,2,3,4,5)
DF2=(3,6,7,8,9,10)
Basically those values are keys, and I am creating a parquet file of DF1 if the keys in DF1 are not in DF2 (in the current example it should return false). My current way of achieving this is:
val df1count= DF1.count
val df2count=DF2.count
val diffDF=DF2.except(DF1)
val diffCount=diffDF.count
if(diffCount==(df2count-df1count)) true
else false
The problem with this approach is that I am calling actions 4 times, which is surely not the best way. Can someone suggest a more efficient way of achieving this?
You can use intersect to get the values common to both DataFrames, and then check if it's empty:
DF1.intersect(DF2).take(1).isEmpty
That will use only one action (take(1)) and a fairly quick one.
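Here is a check for whether Dataset first is equal to Dataset second:
val areEqual = first.except(second).union(second.except(first)).count() == 0
// areEqual is true when neither Dataset has a row that the other lacks.
// Note: except is a set difference, so duplicate rows are not compared.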
Try an intersection combined with a count. This checks that the contents are the same and that both DataFrames hold the same number of values, and evaluates to true in that case:
val intersectcount= DF1.intersect(DF2).count()
val check =(intersectcount == DF1.count()) && (intersectcount==DF2.count())
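The answers above test different properties, which is easy to miss. The sketch below models single-column DataFrames of distinct keys as plain Scala Sets. That is an assumption made purely for illustration, and none of this is Spark code. The function names are hypothetical.

```scala
object KeyChecksDemo {
  // Mirrors DF1.intersect(DF2).take(1).isEmpty: true when no key is shared.
  def noCommonKeys(a: Set[Int], b: Set[Int]): Boolean =
    a.intersect(b).isEmpty

  // Mirrors the except/union check: true when both sides hold the same keys.
  def sameKeys(a: Set[Int], b: Set[Int]): Boolean =
    a.diff(b).union(b.diff(a)).isEmpty

  // Mirrors the count arithmetic from the question, with a as DF1 and b as DF2.
  def countCheck(a: Set[Int], b: Set[Int]): Boolean =
    b.diff(a).size == b.size - a.size

  def main(args: Array[String]): Unit = {
    val df1 = Set(1, 2, 3, 4, 5)
    val df2 = Set(3, 6, 7, 8, 9, 10)
    println(noCommonKeys(df1, df2)) // false: key 3 is shared
    println(sameKeys(df1, df2))     // false
    println(countCheck(df1, df2))   // false: 5 != 6 - 5
  }
}
```

On distinct keys, the count arithmetic holds exactly when every key of the first set also appears in the second, so it is worth deciding which of these three properties is actually required before choosing a check.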

How to take one particular number or a range of particular number from a set of number?

I am looking to take one particular number, or a range of numbers, from a set of numbers.
Example
A = [-10,-2,-3,-8, 0 ,1, 2, 3, 4 ,5,7, 8, 9, 10, -100];
How can I take just the number 5 from the set of numbers above?
How can I take a range of numbers, for example from -3 to 4, from A?
Please help.
Thanks
I don't know what you are trying to accomplish by this. But you could check each entry of the set and test whether it's in the specified range of numbers. The test for a single number could be accomplished by testing each number explicitly or as a special case of the range check where the lower and the upper bound are the same number.
The approach is looping and testing, no matter what the programming language is, although most programming languages have builtin methods for accomplishing this type of task (so you may want to specify which language you are supposed to use for your homework):
procfun get_element:
    index = 0
    for element in set:
        if element is 5 then return (element, index)
        increment index
    return None
Your "5" is then in element, at set[index].
Getting a range:
procfun getrange:
    subset = []
    index = 0
    for element in set:
        if element is -3:
            push element in subset
            while index < length(set)-1:
                increment index
                push set[index] in subset
                if set[index] is 4:
                    return subset
            # we met "-3" but never met "4", so there is no such range
            return None
        # keep searching for a "-3"
        increment index
    return None
If run against A, subset would be [-3,-8, 0 ,1, 2, 3, 4]. This is a "first matched, first grabbed" poor man's algorithm; on sorted sets the algorithms can get smarter and faster.
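Since the question does not name a language, here is one possible rendering in Scala using builtin collection methods. The function names are made up for this sketch. It covers both readings of "range": a positional slice from the first -3 to the next 4, as in the pseudocode above, and a filter on values between -3 and 4.

```scala
object PickFromSequence {
  val a: Vector[Int] = Vector(-10, -2, -3, -8, 0, 1, 2, 3, 4, 5, 7, 8, 9, 10, -100)

  // Index of the first occurrence of `value`, or None when it is absent.
  def indexOfValue(xs: Vector[Int], value: Int): Option[Int] = {
    val i = xs.indexOf(value)
    if (i >= 0) Some(i) else None
  }

  // Positional range: from the first `from` to the next `to` at or after it, inclusive.
  def positionalRange(xs: Vector[Int], from: Int, to: Int): Option[Vector[Int]] = {
    val start = xs.indexOf(from)
    val end = if (start >= 0) xs.indexOf(to, start) else -1
    if (end >= 0) Some(xs.slice(start, end + 1)) else None
  }

  // Value range: every element whose value lies in [low, high], wherever it sits.
  def valueRange(xs: Vector[Int], low: Int, high: Int): Vector[Int] =
    xs.filter(x => x >= low && x <= high)

  def main(args: Array[String]): Unit = {
    println(indexOfValue(a, 5))        // Some(9)
    println(positionalRange(a, -3, 4)) // Some(Vector(-3, -8, 0, 1, 2, 3, 4))
    println(valueRange(a, -3, 4))      // Vector(-2, -3, 0, 1, 2, 3, 4)
  }
}
```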

What does "rows[0]" mean?

Hi! I am looking for a document that defines what "rows[0]" means. This is for BIRT in the Eclipse framework. Perhaps this is a JavaScript term? I don't know... I've been searching like mad and have found nothing yet. Any ideas?
rows is a shortcut to dataSet.rows. Returns the current data rows (of type DataRow[]) for the data set associated with this report item instance. If this report element has no data set, this property is undefined.
Source: http://www.eclipse.org/birt/phoenix/ref/ROM_Scripting_SPEC.pdf
Typically code like rows[x] is accessing an element inside an array. Any intro to programming book should be able to define that for you.
rows[0] would be accessing the first element in the array.
That operation has several names depending on the language, but the concept is generally the same. In Java, it's an array access expression; in C#, it's an indexer or array access operator. As with just about anything, C++ is more complicated, but basically the [] operator takes a collection or an array and pulls out (or assigns to) a specific numbered element in it (generally starting at 0). So in C# ...
// create a list of integers
List<int> lst = new List<int>() { 1, 2, 3, 4, 5 };
// access list
int x = lst[0]; // get the first element of the list, x = 1 afterwards
x = lst[2]; // get the third element of the list, x = 3 afterwards
x = lst[4]; // get the fifth element of the list, x = 5 afterwards
x = lst[5]; // throws ArgumentOutOfRangeException (index is past the end)
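For comparison, here is a sketch of the same zero-based access in Scala, where indexing uses parentheses instead of square brackets. The rows value is hypothetical sample data, not BIRT's rows object.

```scala
import scala.util.Try

object IndexingDemo {
  val rows: Vector[String] = Vector("first", "second", "third") // hypothetical data

  def main(args: Array[String]): Unit = {
    println(rows(0))                 // first: indexing starts at 0
    println(rows(2))                 // third
    println(rows.lift(5))            // None: safe lookup past the end
    println(Try(rows(5)).isFailure)  // true: direct access past the end throws
  }
}
```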