take(n) has no effect after groupBy in RxJava 2

I am trying to group several model instances by name and then use take(n) to keep only the first n items of each group, but take seems to have no effect on the GroupedObservable. Here is the code.
Assume items emits 10 items, 5 named "apple" and 5 named "pear":
Observable<Item> items....
Observable<Item> groupedItems = items.groupBy(Item::name)
.flatMap(it -> it.take(2));
I expected groupedItems to emit 2 "apples" and 2 "pears", but it emits all of them instead.
What am I getting wrong? Do I need to do it differently?

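When take(2) has received its two items, it cancels the GroupedObservable. Cancelled groups are recreated when the same key is encountered again, so the next "apple" starts a new group, flatMap subscribes to it, and take(2) lets two more items through. You need to keep the group subscribed and ignore the remaining items in some other way: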
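Observable<Item> groupedItems = items.groupBy(Item::name)
    .flatMap(group ->
        // publish() shares one subscription to the group: take(2) passes on the
        // first two items, while ignoreElements() keeps the group subscribed and
        // swallows the rest, so groupBy never has to recreate the group
        group.publish(p -> p.take(2).mergeWith(p.ignoreElements()))
    );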
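The effect can be reproduced without RxJava. The plain-Java model below is only an illustration: the class and method names are hypothetical, it simulates the group bookkeeping with a map instead of real subscriptions, and it assumes n >= 1 and that a group is forgotten as soon as its take(n) cancels it, as described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class GroupTakeModel {
    // Models groupBy + flatMap(g -> g.take(n)) where a group is cancelled as
    // soon as its take(n) completes: the key is forgotten, so the next item
    // with that key opens a new group with a fresh budget of n. Assumes n >= 1.
    static <T, K> List<T> cancelledGroupsRecreated(List<T> items, Function<T, K> key, int n) {
        Map<K, Integer> liveGroups = new HashMap<>(); // key -> items taken so far
        List<T> out = new ArrayList<>();
        for (T item : items) {
            K k = key.apply(item);
            int taken = liveGroups.getOrDefault(k, 0) + 1; // absent key = new group
            out.add(item);                                 // take(n) forwards it
            if (taken == n) {
                liveGroups.remove(k);                      // take(n) done: group cancelled
            } else {
                liveGroups.put(k, taken);
            }
        }
        return out;
    }

    // Models publish(p -> p.take(n).mergeWith(p.ignoreElements())): the group
    // stays subscribed, so items beyond the first n per key are dropped.
    static <T, K> List<T> groupsKeptAlive(List<T> items, Function<T, K> key, int n) {
        Map<K, Integer> seen = new HashMap<>();
        List<T> out = new ArrayList<>();
        for (T item : items) {
            int count = seen.merge(key.apply(item), 1, Integer::sum);
            if (count <= n) {
                out.add(item);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> fruit = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            fruit.add("apple");
            fruit.add("pear");
        }
        System.out.println(cancelledGroupsRecreated(fruit, s -> s, 2).size()); // 10
        System.out.println(groupsKeptAlive(fruit, s -> s, 2));                 // [apple, pear, apple, pear]
    }
}
```

Fed 5 apples and 5 pears, the first model lets all 10 items through, which matches what the question observes, while the second keeps only 2 of each.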

Related

Randomising number of repeats for different users in Gatling

I'm currently trying to write a scenario in Gatling where an action is repeated between 1 and 8 times. The randomness should be per user, so for example one user may get 3 repeats and another 7.
I want the scenario to work this way to simulate the fact that I don't know for certain how many times a user will repeat an action.
I tried the following:
class MySimulation extends Simulation {
private val myScenario = scenario("Scenario")
.repeat(Random.nextInt(8) + 1) {
// some stuff
}
setUp(myScenario.inject(rampUsers(100) during (60 seconds)))
}
However, this ends up evaluating to a single random number, which is then used for every user. So if the generated number is 5, every user repeats the action 5 times, which is not what I want.
Is there a way in Gatling to give each user a different random number for repeat? Or does it only work with constant numbers?
Your attempt didn't work because the scenario, as defined, is a builder that is executed once at startup, so Random.nextInt is called only once.
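The difference between a value computed once while the scenario is built and a value computed per user can be modelled in plain Java. This is only an analogy, not Gatling's API; RepeatCountModel and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.IntSupplier;

class RepeatCountModel {
    // Eager: like repeat(Random.nextInt(8) + 1), the value is computed once
    // while the scenario is being built and then shared by every user.
    static List<Integer> eager(int users, Random rnd) {
        int times = rnd.nextInt(8) + 1;
        return new ArrayList<>(Collections.nCopies(users, times));
    }

    // Deferred: the scenario keeps a function and calls it once per user,
    // so each user gets its own value in 1..8.
    static List<Integer> perUser(int users, Random rnd) {
        IntSupplier times = () -> rnd.nextInt(8) + 1;
        List<Integer> result = new ArrayList<>();
        for (int u = 0; u < users; u++) {
            result.add(times.getAsInt());
        }
        return result;
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        System.out.println("eager:    " + eager(10, rnd));
        System.out.println("per user: " + perUser(10, rnd));
    }
}
```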
But there are a few ways you could achieve what you want.
The easiest way (since you just want a random number) is to use the Gatling EL to pick a random element of a sequence.
First, define a Scala val with the range of numbers you want:
private val times = 1 to 8
Then put the range into the session and use the EL to get a random value from the collection:
.exec(_.set("times", times))
.repeat("${times.random()}") {
// some stuff
}
Alternatively, you could define a custom feeder; this approach also lets you generate other kinds of values, such as random strings:
private val times = Iterator.continually(Map("times" -> (Random.nextInt(8) + 1)))
Then feed it and use the "times" value:
.feed(times)
.repeat("${times}") {
// some stuff
}
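A feeder built with Iterator.continually is, in effect, an endless iterator that produces a fresh record on each next() call, and each virtual user pulls its own record. The plain-Java sketch below mimics that shape with Stream.generate; FeederModel is a hypothetical name and none of this is Gatling code.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.Random;
import java.util.stream.Stream;

class FeederModel {
    // Like Iterator.continually(Map("times" -> (Random.nextInt(8) + 1))):
    // an endless iterator that builds a new record on every next() call.
    static Iterator<Map<String, Integer>> timesFeeder(Random rnd) {
        return Stream.generate(() -> Map.of("times", rnd.nextInt(8) + 1)).iterator();
    }

    public static void main(String[] args) {
        Iterator<Map<String, Integer>> feeder = timesFeeder(new Random());
        // Each simulated user pulls one record, as .feed(times) does per user
        for (int user = 1; user <= 3; user++) {
            int times = feeder.next().get("times");
            System.out.println("user " + user + " repeats " + times + " time(s)");
        }
    }
}
```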

Breaking from inside a collect in Drools

I'm new to Drools, so I apologize if this is basic. How do I break out in the middle of a collect? For example, in the following code:
c : Customer()
items : List( size == c.items.size )
from collect( Item( price > 10 ) from c.items )
This code checks whether all items have a price > 10. But what if I want to check whether any item has a price > 10? I could change the condition to size > 0 instead of size == c.items.size, but the collect would still iterate through all the items. Is it possible to break out of the collect as soon as one item matches?
If you just want to check for existence, you can use the exists conditional element:
rule "Sample"
when
    c : Customer()
    exists Item( price > 10 ) from c.items
then
    //...
end
In this case, you don't even need to use a collect. The from keyword will "loop" over all of the items in the collection.
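The two rules differ in the same way as "every item matches" and "at least one item matches". As a plain-Java analogy only (it says nothing about how the Drools engine itself evaluates from), Stream.allMatch and Stream.anyMatch express the two checks, and anyMatch stops at the first matching item. The Item class here is a minimal stand-in for the question's fact type.

```java
import java.util.List;

class PriceCheckModel {
    static final class Item {
        final int price;
        Item(int price) { this.price = price; }
    }

    // Like collect(...) with size == c.items.size: every item must match.
    static boolean allAbove(List<Item> items, int threshold) {
        return items.stream().allMatch(i -> i.price > threshold);
    }

    // Like exists Item( price > 10 ): one matching item is enough.
    static boolean anyAbove(List<Item> items, int threshold) {
        return items.stream().anyMatch(i -> i.price > threshold);
    }

    // Counts how many items anyMatch inspects before it can answer.
    static int inspectedByAny(List<Item> items, int threshold) {
        int[] count = {0};
        items.stream().anyMatch(i -> {
            count[0]++;
            return i.price > threshold;
        });
        return count[0];
    }

    public static void main(String[] args) {
        List<Item> items = List.of(new Item(5), new Item(20), new Item(30), new Item(40));
        System.out.println(allAbove(items, 10));       // false
        System.out.println(anyAbove(items, 10));       // true
        System.out.println(inspectedByAny(items, 10)); // 2
    }
}
```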
See the Drools manual for more information about this conditional element.
Hope it helps,

Group by with paging (take skip)

I am trying to implement paging, but it has to work on a grouped result: every time I fetch a page, all data for a given group must be fetched.
Here is the code:
var erere = dbCtx.StatusViewList
.GroupBy(p => p.TurbineNumber)
.OrderBy(p => p.FirstOrDefault().TurbineNumber)
.Skip(0)
.Take(10)
.ToList();
I have 200k items, and the statement above is so slow that the connection times out. My best guess is that the OrderBy is what slows it down. Any suggestions on how to do this, or how to speed up the statement above?
In your case, grouping on the server side is not needed at all: you will fetch all the data anyway, and grouping only adds overhead on the server. So try another approach: first page over the distinct turbine numbers, then load the rows for those numbers and group them in memory:
var groupPage = dbCtx.StatusViewList.Select(x => x.TurbineNumber)
    .Distinct().OrderBy(turbineNumber => turbineNumber).Skip(40).Take(20).ToList();
// Load only the rows for this page of turbines, then group them in memory
var data = dbCtx.StatusViewList.Where(x => groupPage.Contains(x.TurbineNumber))
    .ToList().GroupBy(x => x.TurbineNumber).ToList();
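The same two-step idea (page over the distinct keys, then load and group the rows for those keys) can be sketched in memory with Java streams. This is an illustration rather than an Entity Framework query: StatusView is a minimal stand-in for the real entity, and against a database both steps would run as SQL.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;

class GroupPaging {
    static final class StatusView {
        final int turbineNumber;
        final String status;
        StatusView(int turbineNumber, String status) {
            this.turbineNumber = turbineNumber;
            this.status = status;
        }
    }

    // Step 1: page over the distinct turbine numbers only.
    // Step 2: keep every row for those numbers, grouped and ordered by key.
    static Map<Integer, List<StatusView>> page(List<StatusView> rows, int skip, int take) {
        Set<Integer> pageKeys = rows.stream()
                .map(r -> r.turbineNumber)
                .distinct()
                .sorted()
                .skip(skip)
                .limit(take)
                .collect(Collectors.toSet());
        return rows.stream()
                .filter(r -> pageKeys.contains(r.turbineNumber))
                .collect(Collectors.groupingBy(r -> r.turbineNumber, TreeMap::new, Collectors.toList()));
    }
}
```

However many rows a turbine has, each page contains exactly `take` turbines (fewer on the last page), and every turbine's rows arrive together.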
The GroupBy needs to visit all elements to group all StatusViews into groups of StatusViews that have equal TurbineNumber.
After that, you take every group, take the first element of each group, and ask for its TurbineNumber in order to sort by TurbineNumber.
Apparently you take into account that a group of StatusViews might be empty (FirstOrDefault, instead of First), but then again, you assume that FirstOrDefault never returns null.
One thing that could speed up your query is using the Key of your groups. The Key is the value you grouped on, in your case the TurbineNumber: all elements in a group have the same TurbineNumber.
var result = dbCtx.StatusViewList
.GroupBy(statusView => statusView.TurbineNumber)
.OrderBy(group => group.Key)
...
I think that will be a first step to improve performance.
However, you return a fixed number of groups. Some groups might be huge (thousands of elements), others might be small (only one element). So one page could be 10 groups of 1000 elements each, 10000 elements in total, or 10 groups of 1 element each, only 10 elements in total. I'm not sure that is the result you want from paging.
Wouldn't you prefer a page that always has the same number of elements, preferably with the same TurbineNumber? If there are too few StatusViews with that TurbineNumber, fill the rest of the page with the next TurbineNumber; if there are too many, split them across several pages.
Something like:
TurbineNumber StatusView
4 A
4 B
4 F
5 D
5 K
6 C
6 Z
6 Q
6 W
7 E
To do this, don't use GroupBy; use OrderBy, then Skip and Take:
IEnumerable<StatusView> GetPage(int pageNr, int pageSize)
{
    // pageNr is zero-based: page 0 holds the first pageSize rows
    return dbCtx.StatusViewList
        .OrderBy(statusView => statusView.TurbineNumber)
        .Skip(pageNr * pageSize)
        .Take(pageSize);
}
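Fixed-size paging over rows ordered by TurbineNumber can be sketched in memory with Java streams. StatusView is a minimal stand-in for the entity, and with a real database the ordering, skipping and limiting would be done in SQL. With the sample table above and a page size of 4, page 1 holds 5 K, 6 C, 6 Z and 6 Q.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class FlatPaging {
    static final class StatusView {
        final int turbineNumber;
        final String status;
        StatusView(int turbineNumber, String status) {
            this.turbineNumber = turbineNumber;
            this.status = status;
        }
        @Override
        public String toString() { return turbineNumber + " " + status; }
    }

    // Fixed-size pages: order by turbine number, then skip whole pages.
    // A turbine's rows may be split across pages or share a page with the next turbine.
    static List<StatusView> getPage(List<StatusView> rows, int pageNr, int pageSize) {
        return rows.stream()
                .sorted(Comparator.comparingInt((StatusView r) -> r.turbineNumber)) // stable for equal keys
                .skip((long) pageNr * pageSize)
                .limit(pageSize)
                .collect(Collectors.toList());
    }
}
```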
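If you add an index on TurbineNumber, this should be fast, because the database can then read the rows in TurbineNumber order from the index instead of sorting all of them first: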
In your DbContext.OnModelCreating(DbModelBuilder modelBuilder):
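// Requires: using System.ComponentModel.DataAnnotations.Schema; (IndexAttribute)
//           using System.Data.Entity.Infrastructure.Annotations; (IndexAnnotation)
// Add an extra, non-unique index on TurbineNumber:
var indexAttribute = new IndexAttribute("TurbineIndex", 0) { IsUnique = false };
var indexAnnotation = new IndexAnnotation(indexAttribute);
modelBuilder.Entity<StatusView>()
    .Property(statusView => statusView.TurbineNumber)
    .HasColumnAnnotation(IndexAnnotation.AnnotationName, indexAnnotation);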

Count unique values in list of sub-lists

I have an RDD with the following structure (RDD[(String,Map[String,List[Product with Serializable]])]).
Here is some sample data:
(600,Map(base_data -> List((10:00 01-08-2016,600,111,1,1), (10:15 01-08-2016,615,111,1,5)), additional_data -> List((1,2)))
(601,Map(base_data -> List((10:01 01-08-2016,600,111,1,2), (10:02 01-08-2016,619,111,1,2), (10:01 01-08-2016,600,111,1,4)), additional_data -> List((5,6)))
I want to count the unique values of the field at index 4 (the last field) of the tuples in each base_data list.
For instance, take the first entry. Its list is List((10:00 01-08-2016,600,111,1,1), (10:15 01-08-2016,615,111,1,5)), which contains 2 unique values (1 and 5) in that field.
The second entry also contains 2 unique values (2 and 4), because 2 appears twice.
The resulting RDD should be of the format RDD[Map[String,Any]].
I tried to solve this task as follows:
val result = myRDD.map({
line => Map(("id",line._1),
("unique_count",line._2.get("base_data").groupBy(l => l).count(_))))
})
However, this code does not do what I need. In fact, I don't know how to indicate that I want to group by that field...
You are quite close to the solution. There is no need to call groupBy: access the element of each tuple by index with productElement, turn the resulting List into a Set, and return the size of the Set, which is the number of unique elements:
("unique_count", line._2("base_data").map(bd => bd.productElement(4)).toSet.size)
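Outside Spark, the "project one position, collect into a set, take the size" step can be checked with plain Java collections. Here each tuple is modelled as a List<Object>, an assumption made only for this sketch, and index 4 is the same zero-based position that productElement(4) reads.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class UniqueFieldCount {
    // Counts distinct values at position `index` across the records,
    // mirroring .map(bd => bd.productElement(4)).toSet.size
    static int uniqueAt(List<List<Object>> records, int index) {
        Set<Object> values = records.stream()
                .map(r -> r.get(index))
                .collect(Collectors.toSet());
        return values.size();
    }

    public static void main(String[] args) {
        List<List<Object>> baseData = List.of(
                List.of("10:01 01-08-2016", 600, 111, 1, 2),
                List.of("10:02 01-08-2016", 619, 111, 1, 2),
                List.of("10:01 01-08-2016", 600, 111, 1, 4));
        System.out.println(uniqueAt(baseData, 4)); // 2
    }
}
```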

Scala: How to create a map over a collection from a set of keys?

Say I have a set of people, Set[People]. Each person has an age. I want to write a function that creates a Map[Int, Seq[People]] where each age from, say, 0 to 100 maps to the sequence of people of that age, or to an empty sequence if there are no people of that age in the original collection.
I.e. I'm doing something along the lines of
Set[People].groupBy(_.age)
and I want the output to be
Map[Int, Seq[People]](0 -> Seq[John,Mary], 1-> Seq[People](), 2 -> Seq[People](Bill)...
groupBy of course omits all those ages for which there are no people. How should I implement this?
Configure a default value for your map:
val grouped = people.groupBy(_.age).withDefaultValue(Set.empty[People])
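If you need the values to be sequences, convert them (with map rather than mapValues, which returns a lazy view and is deprecated in Scala 2.13):
val grouped = people.groupBy(_.age)
  .map { case (age, ps) => age -> ps.toSeq }.withDefaultValue(Seq())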
Remember that, as the documentation puts it:
Note: `get`, `contains`, `iterator`, `keys`, etc are not affected by `withDefault`.
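Java maps have a similar split between lookups with a fallback and the map's actual contents. In this sketch (Person is a minimal stand-in for the question's People type), getOrDefault plays the role of withDefaultValue: lookups for missing ages return an empty list, but keySet and containsKey still see only the ages that occur.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class AgeDefaults {
    static final class Person {
        final String name;
        final int age;
        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    static Map<Integer, List<Person>> byAge(Collection<Person> people) {
        return people.stream().collect(Collectors.groupingBy(p -> p.age));
    }

    // Rough analogue of withDefaultValue: a missing age yields an empty list,
    // but nothing is added to the map itself.
    static List<Person> lookup(Map<Integer, List<Person>> grouped, int age) {
        return grouped.getOrDefault(age, Collections.emptyList());
    }

    public static void main(String[] args) {
        Map<Integer, List<Person>> grouped = byAge(List.of(
                new Person("John", 0), new Person("Mary", 0), new Person("Bill", 2)));
        System.out.println(lookup(grouped, 1).isEmpty()); // true
        System.out.println(grouped.containsKey(1));       // false
    }
}
```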
Since you already have a map whose values are the non-empty collections for the ages that occur, you can fill in the remaining ages with empty collections:
val byAge = people.groupBy(_.age)
val fullMap = (0 to 100).map(age => age -> byAge.getOrElse(age, Set.empty[People]).toSeq).toMap
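The fill approach can be sketched in Java as well. Person is a minimal stand-in for the People type, and, as in the Scala version, ages outside 0 to 100 are simply left out.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class AgeFill {
    static final class Person {
        final String name;
        final int age;
        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Every age in 0..100 gets an entry; ages with nobody map to an empty list.
    static Map<Integer, List<Person>> fullMap(Collection<Person> people) {
        Map<Integer, List<Person>> grouped = people.stream()
                .collect(Collectors.groupingBy(p -> p.age));
        Map<Integer, List<Person>> full = new TreeMap<>();
        for (int age = 0; age <= 100; age++) {
            full.put(age, grouped.getOrDefault(age, Collections.emptyList()));
        }
        return full;
    }

    public static void main(String[] args) {
        Map<Integer, List<Person>> full = fullMap(List.of(
                new Person("John", 0), new Person("Mary", 0), new Person("Bill", 2)));
        System.out.println(full.size());           // 101
        System.out.println(full.get(1).isEmpty()); // true
    }
}
```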