How to groupBy groupBy? - scala

I need to map through a List[(A,B,C)] to produce an HTML report. Specifically, a
List[(Schedule,GameResult,Team)].
Schedule contains a gameDate property that I need to group by to get a
Map[JodaTime, List[(Schedule,GameResult,Team)]],
which I use to display gameDate table row headers. Easy enough:
val data = repo.games.findAllByDate(fooDate).groupBy(_._1.gameDate)
Now the tricky bit (for me) is how to further refine the grouping so that I can map through the game results as pairs. To clarify, each GameResult is one team's "version" of the game (i.e. score, location, etc.) and shares a common Schedule gameID with the opponent team's GameResult.
Basically, I need to display a game result outcome on one row as:
3 London Dragons vs. Paris Frogs 2
Grouping on gameDate lets me do something like:
data.map{case(date,games) =>
// game date row headers
<tr><td>{date.toString("MMMM dd, yyyy")}</td></tr>
// print out game result data rows
games.map{case(schedule,result, team)=>
...
// BUT (result,team) slice is ungrouped, need grouped by Schedule gameID
}
}
In the old (PHP) version of the application I used:
for($x = 0; $x < $this->gameCnt; $x = $x + 2) {...}
but I'd prefer to refer to variable names and not the come-back-later-wtf-is-that-inducing:
games._._2(rowCnt).total games._._3(rowCnt).name games._._1(rowCnt).location games._._2(rowCnt+1).total games._._3(rowCnt+1).name
Maybe zip, or a doubled-up for(t1 <- data; t2 <- data) yield(?), or something else entirely will do the trick. Either way, I'm sure there's a concise solution; it's just not coming to me right now...

Maybe I'm misunderstanding your requirements, but it seems to me that all you need is an additional groupBy:
repo.games.findAllByDate(fooDate).groupBy(_._1.gameDate).mapValues(_.groupBy(_._1.gameID))
The result will be of type:
Map[JodaTime, Map[GameId, List[(Schedule,GameResult,Team)]]]
(where GameId is the return type of Schedule.gameID)
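A minimal, self-contained sketch of the nested groupBy, assuming hypothetical, simplified versions of Schedule, GameResult and Team (the real classes aren't shown) and using java.time.LocalDate in place of the Joda-Time type. It maps over the outer Map instead of calling mapValues, which is deprecated in Scala 2.13 and returns a lazy view there.

```scala
import java.time.LocalDate

// Hypothetical, simplified stand-ins for the question's classes.
final case class Schedule(gameID: Int, gameDate: LocalDate)
final case class GameResult(total: Int, location: String)
final case class Team(name: String)

// Hypothetical sample data: two games on the same date, two rows per game.
val d = LocalDate.of(2013, 3, 1)
val rows: List[(Schedule, GameResult, Team)] = List(
  (Schedule(1, d), GameResult(3, "home"), Team("London Dragons")),
  (Schedule(1, d), GameResult(2, "away"), Team("Paris Frogs")),
  (Schedule(2, d), GameResult(1, "home"), Team("Berlin Bears")),
  (Schedule(2, d), GameResult(4, "away"), Team("Rome Wolves"))
)

// Outer grouping by date, inner grouping by game ID.
val nested: Map[LocalDate, Map[Int, List[(Schedule, GameResult, Team)]]] =
  rows.groupBy(_._1.gameDate).map { case (date, games) =>
    date -> games.groupBy(_._1.gameID)
  }
```

groupBy keeps the original order of elements inside each group, so each inner List lists the two sides of a game in the order the query returned them.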
Update: if you want the results as pairs, then pattern matching is your friend, as shown by Arjan. This would give us:
val byDate = repo.games.findAllByDate(fooDate).groupBy(_._1.gameDate)
val data = byDate.mapValues(_.groupBy(_._1.gameID).values.map { case List((sa, ra, ta), (_, rb, tb)) => (sa, (ta, ra), (tb, rb)) })
This time the result is of type:
Map[JodaTime, Iterable[ (Schedule,(Team,GameResult),(Team,GameResult))]]
Note that this will throw a MatchError if there are not exactly 2 entries with the same gameId. In real code you will definitely want to check for this case.
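One way to avoid the MatchError is to use collect instead of map: groups that don't contain exactly two entries are then skipped rather than throwing. A sketch with hypothetical, simplified case classes; whether silently dropping incomplete games is acceptable (rather than logging them or failing loudly) depends on your data.

```scala
import java.time.LocalDate

// Hypothetical, simplified stand-ins for the question's classes.
final case class Schedule(gameID: Int, gameDate: LocalDate)
final case class GameResult(total: Int)
final case class Team(name: String)

val d = LocalDate.of(2013, 3, 1)
val rows = List(
  (Schedule(1, d), GameResult(3), Team("London Dragons")),
  (Schedule(1, d), GameResult(2), Team("Paris Frogs")),
  (Schedule(2, d), GameResult(5), Team("Lonely Team")) // incomplete game: only one side
)

// collect only applies the case where it matches, so a group that is
// not exactly a pair is left out instead of raising a MatchError.
val paired: Map[LocalDate, List[(Schedule, (Team, GameResult), (Team, GameResult))]] =
  rows.groupBy(_._1.gameDate).map { case (date, games) =>
    date -> games.groupBy(_._1.gameID).values.toList.collect {
      case List((sa, ra, ta), (_, rb, tb)) => (sa, (ta, ra), (tb, rb))
    }
  }
```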

OK, here's a solution from Régis Jean-Gilles:
val data = repo.games.findAllByDate(fooDate).groupBy(_._1.gameDate).mapValues(_.groupBy(_._1.gameID))
You said it was not correct; maybe you just didn't use it the right way?
Every List in the result is a pair of games with the same gameID.
You could produce HTML like this:
data.map{case(date,games) =>
// game date row headers
<tr><td>{date.toString("MMMM dd, yyyy")}</td></tr>
// print out game result data rows
games.map{case (gameId, List((schedule, resultA, teamA), (_, resultB, teamB))) =>
...
}
}
And since you don't need the gameId, you can return just the paired games:
val data = repo.games.findAllByDate(fooDate).groupBy(_._1.gameDate).mapValues(_.groupBy(_._1.gameID).values)
The type of the result is now:
Map[JodaTime, Iterable[List[(Schedule,GameResult,Team)]]]
Each list is again a pair of two games with the same gameID.
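To get from one of those pairs to the row format in the question ("3 London Dragons vs. Paris Frogs 2"), a small formatting function is enough to show the idea; in the XML template the same expressions would go inside the td elements. The case classes, the total field and the row helper are assumptions based on the question's pseudocode.

```scala
// Hypothetical, simplified stand-ins for the question's classes.
final case class Schedule(gameID: Int)
final case class GameResult(total: Int)
final case class Team(name: String)

// Hypothetical helper: formats one pair of game "versions" as a report row,
// or returns None if the group is not exactly a pair.
def row(pair: List[(Schedule, GameResult, Team)]): Option[String] = pair match {
  case List((_, ra, ta), (_, rb, tb)) => Some(s"${ra.total} ${ta.name} vs. ${tb.name} ${rb.total}")
  case _                              => None
}

val game = List(
  (Schedule(1), GameResult(3), Team("London Dragons")),
  (Schedule(1), GameResult(2), Team("Paris Frogs"))
)
```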

Related

Processing of two dimension list in scala

I want to process a two-dimensional list in Scala, e.g.:
Input:
List(
List("John", "Will", "Steven"),
List(25, 28, 34),
List('M', 'M', 'M')
)
Output:
John|25|M
Will|28|M
Steven|34|M
You need to work with indexes, and when working with indexes, List is probably not the wisest choice (indexed access on a List takes linear time). Also, your input is not a well-structured two-dimensional list, since each row holds a different element type, so I would suggest a dedicated data structure for this:
// you can name this better
case class Input(names: Array[String], ages: Array[Int], genders: Array[Char])
val input = Input(Array("John", "Will", "Steven"), Array(25, 28, 34), Array('M', 'M', 'M'))
Since this problem is essentially about indexes, this would be a solution using indexes:
input.names.zipWithIndex.map {
  case (name, index) =>
    (name, input.ages(index), input.genders(index))
}
// or use a for instead
But always keep an eye out for IndexOutOfBoundsException when working with array indexes. A safer and more "Scala-ish" approach is zipping, which also handles arrays of unequal sizes (zip stops at the end of the shorter one):
input.names.zip(input.ages).zip(input.genders).map { // flattening the nested tuples
  case ((name, age), gender) => (name, age, gender)
}
But this is a little slower: each zip is one pass that builds an intermediate array, and the map is one more pass (I don't know if there's an optimization for this).
And if you don't want to go with this kind of modeling, the algorithm stays the same, except for the explicit, unsafe conversions you will need to make from Any to String, Int and Char.
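One way to get a single pass is to zip iterators instead of arrays: Iterator.zip is lazy, so no intermediate arrays of pairs are built, and each element flows through the map once before the result is materialized at the end. (Scala 2.13 also offers lazyZip for the same purpose.) A sketch reusing the Input class from above, redeclared so the block stands alone:

```scala
// Same shape as the Input class above, repeated so this block is self-contained.
case class Input(names: Array[String], ages: Array[Int], genders: Array[Char])

val input = Input(Array("John", "Will", "Steven"), Array(25, 28, 34), Array('M', 'M', 'M'))

// Iterator.zip is lazy: nothing is combined until toList pulls the elements,
// so the three arrays are traversed together in a single pass.
val rows: List[(String, Int, Char)] =
  input.names.iterator
    .zip(input.ages.iterator)
    .zip(input.genders.iterator)
    .map { case ((name, age), gender) => (name, age, gender) }
    .toList

// Format each row the way the question asks for: John|25|M
val lines: List[String] = rows.map { case (name, age, gender) => s"$name|$age|$gender" }
```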
I got the expected output with a for loop:
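val twodlist = List(List("John", "Will", "Steven"), List(25, 28, 34), List('M', 'M', 'M'))
for (i <- 0 to 2) {
  for (j <- 0 to 2) {
    print(twodlist(j)(i))
    if (j < 2) print("|") // separator between columns, none after the last one
  }
  println()
}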
Note: in the loops we can use the list lengths instead of the hard-coded 2 to make this more generic.
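For the original list-of-rows input, transpose does the index bookkeeping for you: it turns the three row lists into one list per person, and mkString("|") formats each of them without any casts from Any. This assumes all inner lists have the same length; transpose throws an IllegalArgumentException otherwise.

```scala
val twodlist: List[List[Any]] = List(
  List("John", "Will", "Steven"),
  List(25, 28, 34),
  List('M', 'M', 'M')
)

// transpose swaps rows and columns: List(List(John, 25, M), List(Will, 28, M), ...)
val lines: List[String] = twodlist.transpose.map(_.mkString("|"))

lines.foreach(println)
```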

Scala: For loop that matches ints in a List

I'm new to Scala. I'm iterating a for loop 100 times: 10 times I want condition 'a' to be met and 90 times condition 'b'. However, I want the 10 a's to occur at random.
The best way I can think is to create a val of 10 random integers, then loop through 1 to 100 ints.
For example:
val z = List.fill(10)(100).map(scala.util.Random.nextInt)
z: List[Int] = List(71, 5, 2, 9, 26, 96, 69, 26, 92, 4)
Then something like:
for (i <- 1 to 100) {
whenever i == to a number in z: 'Condition a met: do something'
else {
'condition b met: do something else'
}
}
I tried using contains and == and != but nothing seemed to work. How else can I do this?
Your generation of random numbers could yield duplicates... is that OK? Here's how you can easily generate 10 unique numbers from 1 to 100 (by generating a randomly shuffled sequence of 1 to 100 and taking the first ten):
val r = scala.util.Random.shuffle(1 to 100).toList.take(10)
Now you can simply partition the range 1 to 100 into the numbers that are contained in your randomly generated list and those that are not:
val (listOfA, listOfB) = (1 to 100).partition(r.contains(_))
Now do whatever you want with those two lists, e.g.:
println(listOfA.mkString(","))
println(listOfB.mkString(","))
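Putting those pieces together as a small sketch: it uses a seeded scala.util.Random instance (the seed 42 is arbitrary, only there to make runs repeatable) and turns the ten picks into a Set, since contains on a List scans the whole list while a Set lookup is effectively constant time. The Set is a swapped-in optimization, not something the question requires.

```scala
import scala.util.Random

val rng = new Random(42) // arbitrary seed, only for repeatable runs

// Ten distinct numbers from 1 to 100: shuffle the range and take the first ten.
val picks: Set[Int] = rng.shuffle((1 to 100).toList).take(10).toSet

// Split 1..100 into the numbers meeting condition 'a' (picked) and 'b' (the rest).
val (listOfA, listOfB) = (1 to 100).partition(picks.contains)
```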
Of course, you can always simply go through the list one by one:
(1 to 100).map {
case i if (r.contains(i)) => println("yes: " + i) // or whatever
case i => println("no: " + i)
}
What you consider to be a simple for loop actually isn't one. It's a for-comprehension: syntactic sugar that desugars into chained calls of foreach, map, flatMap and withFilter. Yes, it can be used the same way you would use a classical for loop, but only because the collection provides those methods (List is in fact a monad). Without going into too much detail, if you want to do things the idiomatic Scala way (the "functional" way), you should avoid writing classical iterative for loops and instead take a collection of your data and map over its elements to do whatever you need. Note that collections have a really rich library behind them, with handy methods such as partition.
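To make the desugaring concrete, here is roughly what the compiler does with a for loop and with a for-comprehension that uses yield and a guard (a simplified view; the full rewrite rules also cover multiple generators and value definitions):

```scala
import scala.collection.mutable.ListBuffer

// A for loop without yield desugars to foreach...
val buf = ListBuffer.empty[Int]
for (i <- 1 to 3) buf += i
(1 to 3).foreach(i => buf += i) // roughly what the line above becomes

// ...and a for-comprehension with a guard and yield desugars to withFilter + map.
val evensDoubled = for (i <- 1 to 10 if i % 2 == 0) yield i * 2
val desugared    = (1 to 10).withFilter(i => i % 2 == 0).map(i => i * 2)
```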
EDIT (for completeness):
Also, you should avoid side effects, or at least push them as far down the road as possible. I'm talking about the second example in my answer. Let's say you really need to log that stuff (you would be using a logger, but println is good enough for this example). Doing it like this is bad. By the way, note that you could use foreach instead of map in that case, because you're not collecting results, just performing side effects.
A better way is to compute the needed values by mapping each element to an appropriate string. So, calculate the needed strings and accumulate them into results:
val results = (1 to 100).map {
case i if (r.contains(i)) => ("yes: " + i) // or whatever
case i => ("no: " + i)
}
// do whatever with results, e.g. print them
Now results contains a hundred "yes: x" and "no: x" strings, but you didn't do the ugly thing of logging as a side effect in the mapping process. Instead, you mapped each element of the collection to a corresponding string (note that the original collection remains intact, so if (1 to 100) was stored in some value, it's still there; mapping creates a new collection), and now you can do whatever you want with the results, e.g. pass them on to the logger. Yes, at some point you need to do "the ugly side-effect thing" and log the stuff, but at least you will have a dedicated part of the code for that, and you won't be mixing it into your mapping logic, which checks whether a number is contained in the random sequence.
(1 to 100).foreach { x =>
if(z.contains(x)) {
// do something
} else {
// do something else
}
}
or you can use a partial function, like so:
(1 to 100).foreach {
case x if(z.contains(x)) => // do something
case _ => // do something else
}

magento2 - How to get a product's stock status enabled/disabled?

I'm trying to find out whether the product's stock status is in stock or out of stock (integers representing each state are fine; I don't necessarily need the "in stock"/"out of stock" strings per se).
I've tried various things to no avail.
1)
$inStock = $obj->get('Magento\CatalogInventory\Api\Data\StockItemInterface')->getIsInStock();
// Magento\CatalogInventory\Api\Data\StockItemInterface::getIsInStock returns true no matter what, even for 0 qty products
// summary: not useful. How do you get the real one?
2)
$inStock = $obj->get('\Magento\CatalogInventory\Api\StockStateInterface')->verifyStock($_product->getId());
// test results for "verifyStock":
// a 0 qty product is in stock
// a 0 qty product is out of stock
// summary: fail. find correct method, with tests.
3)
$stockItemRepository = $obj->get('Magento\CatalogInventory\Model\Stock\StockItemRepository');
$stockItem = $stockItemRepository->get($_product->getId());
$inStock = $stockItem->getIsInStock();
// Uncaught Magento\Framework\Exception\NoSuchEntityException: Stock Item with id "214"
// summary: is the stock item not 1-to-1 with the product ID?
The weird thing is, getting stock quantities works just fine.
$availability = (String)$obj->get('\Magento\CatalogInventory\Api\StockStateInterface')->getStockQty($_product->getId(), $_product->getStore()->getWebsiteId());
So why isn't getIsInStock working?
This was one way I did it.
$_productId = $_product->getId(); // assumed: the ID of the product being checked
$stockItemResource = $obj->create('Magento\CatalogInventory\Model\ResourceModel\Stock\Item');
// grab ALL stock items (i.e. rows that contain stock information)
$stockItemSelect = $stockItemResource->getConnection()->select()->from($stockItemResource->getMainTable());
$stockItems = $stockItemResource->getConnection()->fetchAll($stockItemSelect);
$inStock = null;
foreach ($stockItems as $k => $item) {
    if ($item['product_id'] == $_productId) {
        $inStock = $item['is_in_stock'];
        break; // stop at the first matching row
    }
}
Notes on efficiency:
I'm sure there are other ways to target the single item specifically instead of getting them all, either through a method or by adjusting the query somehow.
But this method is probably more efficient for large n, since it avoids the n+1 query problem.
You still end up iterating through a lot, but Θ(n) iteration over a cached PHP array is probably cheaper than n+1 database queries. I haven't tested this; it's just a hypothesis.
The returned structure is an array of arrays, where each sub-array (one stock item row) holds the product ID and the stock status value. Because the product ID and the stock status value are at the same level of nesting, we have no choice but to iterate through the sub-arrays, check each one's product_id, pick the matching sub-array, and grab the stock value. In short, we can't just use a hashmap lookup, since the keys of the outer array are not product IDs.
Ultimately, the efficiency of this depends on your use case. You will rarely grab all stock items unless you're doing mass exports, so the real goal is just to stay within the time limit configured for a request.

(Spark/Scala) What would be the most effective way to compare specific data in one RDD to a line of another?

Basically, I have two sets of data in two text files. One set of data is in the format:
a,DataString1,DataString2 (One line) (The first character is in every entry but not relevant)
.... (and so on)
The second set of data is in format:
Data, Data Data Data, Data Data, Data, Data Data Data (One line)(separated by either commas or spaces, but I'm able to use a regular expression to handle this, so that's not the problem)
.... (And so on)
So what I need to do is check if DataString1 AND DataString2 are both present on any single line of the second set of data.
Currently I'm doing this like so:
// spark context (sc) is defined above; java.util.regex.Pattern is imported above as well
case class Test(data_one: String, data_two: String)
// the case class just makes the fields of data_one simpler to work with
val data_one = sc.textFile("path")
val data_two = sc.textFile("path")
val rdd_one = data_one.map(_.split(",")).map(c => Test(c(1), c(2)))
val rdd_two = data_two.map(_.split("[,\\s]+")) // one or more commas/whitespace
val data_two_array = rdd_two.collect()
// this makes data_two_array an array of arrays of strings
rdd_one.foreach { line =>
  for (array <- data_two_array) {
    for (string <- array) {
      // comparison logic here, which checks whether DataString1 and DataString2
      // both appear on the same line
    }
  }
}
How could I make this process more efficient? At the moment it works correctly, but as the data grows this becomes very inefficient.
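The nested for loops compare every entry of the first set against every line of the second, i.e. m*n comparisons (each of which scans a whole line), where m and n are the sizes of the two sets. You can start with a join to eliminate rows that cannot match. Since you have two columns to verify, make sure the join takes both of them into account.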
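A sketch of the idea in plain Scala collections, so it runs without a Spark cluster (the sample data and the Entry class are made up): index the second data set once by word, then look up both strings of each entry and check whether their line sets overlap. This is a hash-join style lookup instead of a nested scan. With pair RDDs, the analogous steps would be a flatMap that emits (word, lineId) records from the second file followed by join on the keys; treat that mapping as an outline rather than tested Spark code.

```scala
// Hypothetical record for one line of the first data set.
final case class Entry(first: String, second: String)

// Made-up sample data standing in for the two text files.
val entries = Seq(Entry("foo", "bar"), Entry("foo", "qux"), Entry("baz", "bar"))
val lines   = Seq("foo, x bar", "baz qux", "bar y")

// Inverted index: word -> ids of the lines it appears on (built in one pass over set two).
val wordToLines: Map[String, Set[Int]] =
  lines.zipWithIndex
    .flatMap { case (line, id) => line.split("[,\\s]+").toSeq.filter(_.nonEmpty).map(w => (w, id)) }
    .groupBy(_._1)
    .map { case (w, pairs) => w -> pairs.map(_._2).toSet }

// An entry matches if both of its strings occur on at least one common line.
val matches: Seq[Entry] = entries.filter { e =>
  val a = wordToLines.getOrElse(e.first, Set.empty[Int])
  val b = wordToLines.getOrElse(e.second, Set.empty[Int])
  (a intersect b).nonEmpty
}
```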

Use forall instead of filter on List[A]

I'm trying to determine whether or not to display an overtime flag in the weekly game results report.
The database's game results table has 3 columns (p4, p5, p6) that represent the potential overtime period score totals (for OT, Double OT and Triple OT respectively). These columns are mapped to Option[Int] in the application layer.
Currently I am filtering through game result (teamA, teamB) pairs, but really I just want to know whether an OT game of any kind exists (vs. stepping through the collection).
def overtimeDisplay(a: GameResult, b: GameResult) = {
val isOT = !(List(a,b).filter(_.p4.isDefined).filter(_.p5.isDefined).filter(_.p6.isDefined).isEmpty)
if(isOT) {
<b class="b red">
{List( ((a.p4,a.p5,a.p6),(b.p4,b.p5,b.p6)) ).zipWithIndex.map{
case( ((Some(_),None,None), (Some(_),None,None)), i)=> "OT"
case( ((Some(_),Some(_),None), (Some(_),Some(_),None )), i)=> "Double OT"
case( ((Some(_),Some(_),Some(_)), (Some(_),Some(_),Some(_) )), i)=> "Triple OT"
}}
</b>
}
else scala.xml.NodeSeq.Empty
}
Secondarily, the determination of which type of overtime to display, currently that busy pattern match (which, looking at it now, does not appear to cover all the scoring scenarios), could probably be done in a more functional/concise manner.
Feel free to lay it down if you have a better way.
Thanks
Not sure if I understand the initial code correctly, but here is an idea:
val results = List(a, b).map(r => Seq(r.p4, r.p5, r.p6).flatten)
val isOT = results.exists(_.nonEmpty)
val labels = IndexedSeq("", "Double ", "Triple ")
results.filter(_.nonEmpty).map(p => labels(p.size - 1) + "OT") // skip results with no OT scores, which would index labels(-1)
Turning the score columns into a flat list in the first line is crucial here. You have GameResult(p4: Option[Int], p5: Option[Int], p6: Option[Int]), which you can map to a Seq[Option[Int]] with r => Seq(r.p4, r.p5, r.p6), and then flatten to turn each Some[Int] into an Int and drop the Nones. This turns Some(42), None, None into Seq(42), so the size of the flattened Seq is the number of overtime periods played.
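A runnable version of this idea, assuming GameResult has just the three Option[Int] columns (the real class has more fields) and a hypothetical overtimeLabel helper. It returns None when neither result has overtime scores, instead of indexing the labels with -1, and takes the larger count of the two sides in case they ever disagree.

```scala
// Hypothetical, simplified GameResult with only the overtime columns.
final case class GameResult(p4: Option[Int], p5: Option[Int], p6: Option[Int])

val labels = IndexedSeq("OT", "Double OT", "Triple OT")

// Count the defined overtime periods per side; flatten drops the Nones.
def overtimeLabel(a: GameResult, b: GameResult): Option[String] = {
  val periods = List(a, b).map(r => Seq(r.p4, r.p5, r.p6).flatten.size).max
  if (periods == 0) None else Some(labels(periods - 1))
}
```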
Looking at this:
val isOT = !(List(a,b).filter(_.p4.isDefined).filter(_.p5.isDefined).filter(_.p6.isDefined).isEmpty)
This can be rewritten using exists instead of filter. I would rewrite it as follows:
List(a, b).exists(x => x.p4.isDefined && x.p5.isDefined && x.p6.isDefined)
In addition to using exists, I am combining the three conditions you passed to the filters into a single anonymous function.
In addition, I don't know why you're using zipWithIndex when it doesn't seem as though you're using the index in the map function afterwards. It could be removed entirely.
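Note that the && version, like the original filter chain, is only true when all three columns are defined, i.e. triple overtime. If the goal is "any overtime at all", the condition needs || (or a check of p4 alone, assuming p5 and p6 are never set without p4). A small sketch comparing the two, with a hypothetical three-column GameResult:

```scala
// Hypothetical, simplified GameResult with only the overtime columns.
final case class GameResult(p4: Option[Int], p5: Option[Int], p6: Option[Int])

// True only when some result has all three OT columns filled (triple OT).
def isTripleOT(a: GameResult, b: GameResult): Boolean =
  List(a, b).exists(x => x.p4.isDefined && x.p5.isDefined && x.p6.isDefined)

// True when some result has any OT column filled.
def isAnyOT(a: GameResult, b: GameResult): Boolean =
  List(a, b).exists(x => x.p4.isDefined || x.p5.isDefined || x.p6.isDefined)

val regular  = GameResult(None, None, None)
val singleOT = GameResult(Some(1), None, None)
```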