Group by with paging (take skip) - entity-framework

I am trying to implement paging, but I need to do it on a grouped result: every time I fetch a page, all data for a given group must be fetched.
Here is my code:
var erere = dbCtx.StatusViewList
.GroupBy(p => p.TurbineNumber)
.OrderBy(p => p.FirstOrDefault().TurbineNumber)
.Skip(0)
.Take(10)
.ToList();
I have 200k items, and the statement above is so slow that the connection times out. My best guess is that the OrderBy slows it down. Any suggestions on how to do this, or how to speed up the statement above?

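In your case, grouping on the server side is not needed at all, because you will fetch all the data anyway, only with additional overhead on the server. So try another approach: first fetch one page of distinct turbine numbers, then fetch all rows for those numbers and group them in memory:
// One page of distinct turbine numbers (the Select projects each row to its number)
var groupPage = dbCtx.StatusViewList.Select(x => x.TurbineNumber)
.Distinct().OrderBy(x => x).Skip(40).Take(20).ToList();
// All rows for those turbines; the grouping runs client-side after ToList()
var data = dbCtx.StatusViewList.Where(x => groupPage.Contains(x.TurbineNumber))
.ToList().GroupBy(x => x.TurbineNumber).ToList();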
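The same two-step idea can be sketched with plain Scala collections standing in for the database. `StatusRow` and `KeyPaging` are hypothetical names; step 1 mirrors the `Distinct`/`OrderBy`/`Skip`/`Take` query and step 2 the `Contains` filter followed by in-memory grouping:

```scala
object KeyPaging {
  // Hypothetical row type standing in for a StatusView entity.
  final case class StatusRow(turbineNumber: Int, status: String)

  // Step 1: one page of the sorted, distinct turbine numbers.
  def keyPage(rows: Seq[StatusRow], skip: Int, take: Int): Seq[Int] =
    rows.map(_.turbineNumber).distinct.sorted.slice(skip, skip + take)

  // Step 2: every row whose turbine number is on that page, grouped in memory.
  def groupedPage(rows: Seq[StatusRow], skip: Int, take: Int): Map[Int, Seq[StatusRow]] = {
    val keys = keyPage(rows, skip, take).toSet
    rows.filter(r => keys.contains(r.turbineNumber)).groupBy(_.turbineNumber)
  }
}
```

Each page then holds a fixed number of turbines, however many rows each turbine has.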

The GroupBy needs to visit all elements to group all StatusViews into groups of StatusViews that have equal TurbineNumber.
After that, you take every group, take the first element of each group, and ask for its TurbineNumber in order to sort by TurbineNumber.
Apparently you take into account that a group of StatusViews might be empty (FirstOrDefault, instead of First), but then again, you assume that FirstOrDefault never returns null.
One of the things that could speed up your query is using the Key of your groups. The Key is the value on which you grouped, in your case the TurbineNumber: all elements in a group have the same TurbineNumber.
var result = dbCtx.StatusViewList
.GroupBy(statusView => statusView.TurbineNumber)
.OrderBy(group => group.Key)
...
I think that will be a first step to improve performance.
However, you return a fixed number of groups. Some groups might be huge (thousands of elements), and some might be small (only one element). So one page could be 10 groups of 1000 elements each, 10000 elements in total; it could also be 10 groups of 1 element each, 10 elements in total. I'm not sure that is the result you want from paging.
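A small sketch of that effect, paging over whole groups ordered by key and reporting how many rows land on each page. `GroupPageSizes` is a hypothetical helper, and its input is simply a map from group key to group size:

```scala
object GroupPageSizes {
  // Pages hold `groupsPerPage` whole groups, so the row count per page
  // depends entirely on how large those groups happen to be.
  def rowsPerPage(groupSizes: Map[Int, Int], groupsPerPage: Int): List[Int] =
    groupSizes.toSeq
      .sortBy(_._1)                     // order groups by key, like OrderBy(g => g.Key)
      .grouped(groupsPerPage)           // split the ordered groups into pages
      .map(page => page.map(_._2).sum)  // total rows on each page
      .toList
}
```

For example, with group sizes 1000, 1000, 1 and 1 and two groups per page, the first page holds 2000 rows and the second only 2.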
Wouldn't you prefer a page that always has the same number of elements, preferably with the same TurbineNumber? If there are not enough StatusViews with one TurbineNumber, fill the rest of the page with the next TurbineNumber; if there are too many StatusViews with one TurbineNumber, divide them over several pages.
Something like:
TurbineNumber  StatusView
4              A
4              B
4              F
5              D
5              K
6              C
6              Z
6              Q
6              W
7              E
To do this, don't use GroupBy; use OrderBy, then Skip and Take:
// pageNr is zero-based: page 0 holds the first pageSize rows
IEnumerable<StatusView> GetPage(int pageNr, int pageSize)
{
    return dbCtx.StatusViewList
        .OrderBy(statusView => statusView.TurbineNumber)
        .Skip(pageNr * pageSize)
        .Take(pageSize);
}
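If you create an extra index on TurbineNumber, this should be fast, because the database can use the index to read rows in TurbineNumber order instead of sorting all of them: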
In your DbContext.OnModelCreating(DbModelBuilder modelBuilder):
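// Requires: using System.ComponentModel.DataAnnotations.Schema;        (IndexAttribute)
//           using System.Data.Entity.Infrastructure.Annotations;       (IndexAnnotation)
// Add an extra, non-unique index named "TurbineIndex" on TurbineNumber:
var indexAttribute = new IndexAttribute("TurbineIndex", 0) { IsUnique = false };
var indexAnnotation = new IndexAnnotation(indexAttribute);
modelBuilder.Entity<StatusView>()
.Property(statusView => statusView.TurbineNumber)
.HasColumnAnnotation(IndexAnnotation.AnnotationName, indexAnnotation);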
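The GetPage logic can be modelled in Scala as a sketch (hypothetical `RowPaging` and `StatusView` types). One caveat: Scala's `sortBy` is stable, so rows with the same turbine number keep their input order, whereas a database gives no such guarantee between two queries. In SQL you would likely add a unique column as a secondary sort key so that consecutive Skip/Take pages never overlap or miss rows.

```scala
object RowPaging {
  // Hypothetical stand-in for the StatusView entity.
  final case class StatusView(turbineNumber: Int, status: String)

  // Fixed-size pages: order by turbine number, skip whole pages, take one page.
  // pageNr is zero-based, like GetPage(pageNr, pageSize).
  def getPage(rows: Seq[StatusView], pageNr: Int, pageSize: Int): Seq[StatusView] =
    rows.sortBy(_.turbineNumber) // stable: ties keep their input order
      .drop(pageNr * pageSize)
      .take(pageSize)
}
```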

Related

Understanding the usage of map/reduce functions in CouchDB

My map function produces the following key/value pairs:
["hello"] => 12
["hello"] => 1
["world"] => 23
["world"] => 4
["canada"] => 18
When I use _count as the reduce function, I get the result 5, as shown below; the system counts every row:
{
"rows": [
{
"key": null,
"value": 5
}
]
}
I use the same map function with _count again, but this time I add group=true to the query and get the result below. It seems that the reduce function runs for every grouped key and counts the rows within it.
["hello"] => 2
["world"] => 2
["canada"] => 1
I can't understand the mechanism here. Why does the system work like this with and without grouping? If the reduce function works on every unique key, shouldn't the result without grouping look like this?
["hello"] => 1
["hello"] => 1
["world"] => 1
["world"] => 1
["canada"] => 1
With reduce=true&group=false and a _count reduce function you're asking the system to count the total number of entries in the index. Hence, you see the expected result of 5 in your case.
The group=true is a request to apply the reduce function at a per-key level only, and not do the final summation across all entries. As you can see, if you sum the values you get from the group=true case, you end up with the value you get for the group=false case: 2+2+1 = 5.
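A plain-Scala model of what the two queries compute; it mirrors the results, not CouchDB's internal evaluation. The `["hello"]` keys are simplified to strings, and `CountReduce` is a hypothetical name:

```scala
object CountReduce {
  // The (key, value) rows emitted by the map function in the question.
  val rows: Seq[(String, Int)] =
    Seq("hello" -> 12, "hello" -> 1, "world" -> 23, "world" -> 4, "canada" -> 18)

  // group=false: a single _count over every row in the index.
  def countAll(rs: Seq[(String, Int)]): Int = rs.size

  // group=true: one _count per distinct key.
  def countByKey(rs: Seq[(String, Int)]): Map[String, Int] =
    rs.groupBy(_._1).map { case (key, group) => key -> group.size }
}
```

Summing the per-key counts (2 + 2 + 1) gives back the ungrouped total of 5.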
It gets even more complicated if you emit a vector-valued (array) key, for example where your map says something along the lines of:
emit([doc.field1, doc.field2, doc.field3], 1)
Then you can group on a prefix of the key, choosing exactly how many of the key's elements to group on with group_level=X. This is often used with time-series data, to group per year, per month or per day. This is explained in depth in the following blog post:
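To illustrate `group_level`, here is a sketch that truncates each array key to its first `level` elements and counts per prefix. The `[year, month, day]` keys and the `GroupLevel` name are assumptions for illustration, not CouchDB code:

```scala
object GroupLevel {
  // Emulates _count with group_level=level: keys are cut down to their
  // first `level` elements, and rows sharing that prefix are counted together.
  def countAtLevel(keys: Seq[Seq[Int]], level: Int): Map[Seq[Int], Int] =
    keys.groupBy(_.take(level)).map { case (prefix, ks) => prefix -> ks.size }
}
```

With keys [2017,1,5], [2017,1,9], [2017,2,1] and [2018,3,3], level 1 gives a count per year and level 2 a count per month.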
https://console.bluemix.net/docs/services/Cloudant/blog/mapreduce.html

Getting the total number of records in PagedList

The datagrid that I use on the client is based on SQL row number; it also requires a total number of pages for its paging. I also use the PagedList on the server.
SQL Profiler shows that PagedList makes 2 db calls: the first to get the total number of records and the second to get the current page. The thing is that I can't find a way to extract that total number of records from the PagedList. Therefore, I currently have to make an extra call to get the total, which makes 3 calls in total for each request, 2 of which are absolutely identical. I understand that I probably won't be able to get rid of the call to get the total, but I hate to make it twice. Here is an extract from my code; I'd really appreciate any help with this:
var t = from c in myDb.MyTypes.Filter<MyType>(filterXml) select c;
response.Total = t.Count(); // my first call to get the total
double d = uiRowNumber / uiRecordsPerPage;
int page = (int)Math.Ceiling(d) + 1;
var q = from c in myDb.MyTypes.Filter<MyType>(filterXml).OrderBy(someOrderString)
select new ReturnType
{
Something = c.Something
};
response.Items = q.ToPagedList(page, uiRecordsPerPage);
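PagedList has a TotalItemCount property, which reflects the total number of records in the whole set (not the number on a particular page). Thus response.Items.TotalItemCount should do the trick, and your separate t.Count() call can be dropped. If your grid needs the number of pages instead, PagedList also exposes a PageCount property.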
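If you only have the total and need the page arithmetic yourself, the usual integer formulas are sketched below in Scala (hypothetical `PageMath` helper). `pageForRow` assumes `uiRowNumber` is a zero-based row offset, which the question does not state:

```scala
object PageMath {
  // Pages needed for totalItems rows at pageSize rows per page:
  // integer ceiling division, no floating point involved.
  def pageCount(totalItems: Int, pageSize: Int): Int =
    (totalItems + pageSize - 1) / pageSize

  // 1-based page number containing a zero-based row offset.
  def pageForRow(rowOffset: Int, pageSize: Int): Int =
    rowOffset / pageSize + 1
}
```

Integer division avoids the pitfall of dividing two integers first and only then calling `Math.Ceiling` on the already-truncated result.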

How to groupBy groupBy?

I need to map through a List[(A,B,C)] to produce an HTML report. Specifically, a
List[(Schedule,GameResult,Team)]
Schedule contains a gameDate property that I need to group on to get a
Map[JodaTime, List[(Schedule,GameResult,Team)]]
which I use to display gameDate table row headers. Easy enough:
val data = repo.games.findAllByDate(fooDate).groupBy(_._1.gameDate)
Now the tricky bit (for me) is how to further refine the grouping so that I can map through the game results as pairs. To clarify, each GameResult consists of a team's "version" of the game (i.e. score, location, etc.), sharing a common Schedule gameID with the opposing team.
Basically, I need to display a game result outcome on one row as:
3 London Dragons vs. Paris Frogs 2
Grouping on gameDate lets me do something like:
data.map{case(date,games) =>
// game date row headers
<tr><td>{date.toString("MMMM dd, yyyy")}</td></tr>
// print out game result data rows
games.map{case(schedule,result, team)=>
...
// BUT (result,team) slice is ungrouped, need grouped by Schedule gameID
}
}
In the old version of the existing application (PHP) I used to
for($x = 0; $x < $this->gameCnt; $x = $x + 2) {...}
but I'd prefer to refer to variable names and not the come-back-later-wtf-is-that-inducing:
games._._2(rowCnt).total games._._3(rowCnt).name games._._1(rowCnt).location games._._2(rowCnt+1).total games._._3(rowCnt+1).name
Maybe zip, or doubling up with for(t1 <- data; t2 <- data) yield(?), or something else entirely will do the trick. Regardless, there's a concise solution; it's just not coming to me right now...
Maybe I'm misunderstanding your requirements, but it seems to me that all you need is an additional groupBy:
repo.games.findAllByDate(fooDate).groupBy(_._1.gameDate).mapValues(_.groupBy(_._1.gameID))
The result will be of type:
Map[JodaTime, Map[GameId, List[(Schedule,GameResult,Team)]]]
(where GameId is the return type of Schedule.gameID)
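A self-contained sketch of the nested grouping, with minimal hypothetical versions of `Schedule`, `GameResult` and `Team`, and a `String` standing in for the Joda date. It uses `map` on the outer `Map` rather than `mapValues`, because `mapValues` returns a lazy view that is recomputed on every access and is deprecated in Scala 2.13:

```scala
object NestedGrouping {
  // Hypothetical stand-ins for the question's types.
  final case class Schedule(gameID: Int, gameDate: String)
  final case class GameResult(total: Int)
  final case class Team(name: String)
  type Row = (Schedule, GameResult, Team)

  // Outer groupBy on the date, inner groupBy on the game id.
  def byDateThenGame(rows: List[Row]): Map[String, Map[Int, List[Row]]] =
    rows.groupBy(_._1.gameDate).map { case (date, games) =>
      date -> games.groupBy(_._1.gameID)
    }
}
```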
Update: if you want the results as pairs, then pattern matching is your friend, as shown by Arjan. This would give us:
val byDate = repo.games.findAllByDate(fooDate).groupBy(_._1.gameDate)
val data = byDate.mapValues(_.groupBy(_._1.gameID).values.map { case List((sa, ra, ta), (sb, rb, tb)) => (sa, (ta, ra), (tb, rb)) })
This time the result is of type:
Map[JodaTime, Iterable[ (Schedule,(Team,GameResult),(Team,GameResult))]]
Note that this will throw a MatchError if there are not exactly 2 entries with the same gameId. In real code you will definitely want to check for this case.
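One way to handle that case without a MatchError is `collect`, which simply skips groups that are not exactly a pair. A sketch with the same kind of hypothetical stand-in types (`PairedGames`, `line` and the `String` date are assumptions), producing the "3 London Dragons vs. Paris Frogs 2" row format from the question:

```scala
object PairedGames {
  final case class Schedule(gameID: Int, gameDate: String)
  final case class GameResult(total: Int)
  final case class Team(name: String)
  type Row = (Schedule, GameResult, Team)
  type Pair = (Schedule, (Team, GameResult), (Team, GameResult))

  // Group by date, then by game id; `collect` keeps only ids with exactly
  // two rows instead of throwing a MatchError on any other shape.
  def pairsByDate(rows: List[Row]): Map[String, List[Pair]] =
    rows.groupBy(_._1.gameDate).map { case (date, games) =>
      date -> games.groupBy(_._1.gameID).values.toList.collect {
        case List((sa, ra, ta), (_, rb, tb)) => (sa, (ta, ra), (tb, rb))
      }
    }

  // Renders one report line such as "3 London Dragons vs. Paris Frogs 2".
  def line(p: Pair): String = p match {
    case (_, (ta, ra), (tb, rb)) => s"${ra.total} ${ta.name} vs. ${tb.name} ${rb.total}"
  }
}
```

Whether to drop incomplete games silently, as here, or report them is a design choice the question leaves open.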
OK, a solution from Régis Jean-Gilles:
val data = repo.games.findAllByDate(fooDate).groupBy(_._1.gameDate).mapValues(_.groupBy(_._1.gameID))
You said it was not correct; maybe you just didn't use it the right way?
Every List in the result is a pair of games with the same GameId.
You could produce HTML like this:
data.map{case(date,games) =>
// game date row headers
<tr><td>{date.toString("MMMM dd, yyyy")}</td></tr>
// print out game result data rows
games.map{case (gameId, List((scheduleA, resultA, teamA), (scheduleB, resultB, teamB))) =>
...
}
}
And since you don't need the gameId, you can return just the paired games:
val data = repo.games.findAllByDate(fooDate).groupBy(_._1.gameDate).mapValues(_.groupBy(_._1.gameID).values)
The type of the result is now:
Map[JodaTime, Iterable[List[(Schedule,GameResult,Team)]]]
Every list is again a pair of two games with the same GameId.

In Linq to EF 4.0, I want to return rows matching a list or all rows if the list is empty. How do I do this in an elegant way?

This sort of thing:
Dim MatchingValues() As Integer = {5, 6, 7}
Return From e in context.entity
Where MatchingValues.Contains(e.Id)
...works great. However, in my case, the values in MatchingValues are provided by the user. If none are provided, all rows ought to be returned. It would be wonderful if I could do this:
Return From e in context.entity
Where (MatchingValues.Length = 0) OrElse (MatchingValues.Contains(e.Id))
Alas, the array length test cannot be converted to SQL. I could, of course, code this:
If MatchingValues.Length = 0 Then
Return From e in context.entity
Else
Return From e in context.entity
Where MatchingValues.Contains(e.Id)
End If
This solution doesn't scale well. My application needs to work with 5 such lists, which means I'd need to code 32 queries, one for every situation.
I could also fill MatchingValues with every existing value when the user doesn't want to use the filter. However, there could be thousands of values in each of the five lists. Again, that's not optimal.
There must be a better way. Ideas?
Give this a try: (Sorry for the C# code, but you get the idea)
IQueryable<T> query = context.Entity;
// Only add the filter when the user supplied values; an empty list means "all rows"
if (matchingValues.Length > 0) {
query = query.Where(e => matchingValues.Contains(e.Id));
}
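You can do the same with the other lists: each non-empty list adds one more Where clause to the same query, so five lists need five if statements instead of 32 separate queries.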
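The logic of composing optional filters can be sketched in Scala with in-memory collections; `Entity`, `categoryId` and `filterIfAny` are hypothetical. With EF, the conditional `Where` calls on the `IQueryable` are instead composed into a single SQL query when it executes:

```scala
object OptionalFilters {
  // Hypothetical entity with two filterable columns.
  final case class Entity(id: Int, categoryId: Int)

  // Narrow `rows` by `values` only when the user supplied some values;
  // an empty set means "no filter", like the Length > 0 check.
  def filterIfAny[A, K](rows: Seq[A], values: Set[K])(key: A => K): Seq[A] =
    if (values.isEmpty) rows else rows.filter(a => values.contains(key(a)))

  // Each optional list adds at most one filter step.
  def query(rows: Seq[Entity], ids: Set[Int], categories: Set[Int]): Seq[Entity] = {
    val byId = filterIfAny(rows, ids)(_.id)
    filterIfAny(byId, categories)(_.categoryId)
  }
}
```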

Scala vals vs vars

I'm pretty new to Scala, but I'd like to know the preferred way of solving this problem. Say I have a list of items and I want to know the total amount of the items that are checks. I could do something like this:
val total = items.filter(_.itemType == CHECK).map(_.amount).sum
That would give me what I need, the sum of all checks in an immutable value. But it does so with what seems like 3 iterations: once to filter the checks, again to map the amounts, and then the sum. Another way would be to do something like:
var total = BigDecimal(0)
for (
item <- items
if item.itemType == CHECK
) total += item.amount
This gives me the same result with 1 iteration and a mutable variable, which seems fine too. But if I wanted to extract more information, say the total number of checks, that would require more counters or mutable variables, although I wouldn't have to iterate over the list again. It doesn't seem like the "functional" way of achieving what I need.
var numOfChecks = 0
var total = BigDecimal(0)
items.foreach { item =>
if (item.itemType == CHECK) {
numOfChecks += 1
total += item.amount
}
}
So if you find yourself needing a bunch of counters or totals on a list, is it preferred to keep mutable variables, or to not worry about it and do something along the lines of:
val checks = items.filter(_.itemType == CHECK)
val total = checks.map(_.amount).sum
(checks.size, total) // the last expression is the result; no return needed
which seems easier to read and only uses vals?
Another way of solving your problem in one iteration would be to use views or iterators:
items.iterator.filter(_.itemType == CHECK).map(_.amount).sum
or
items.view.filter(_.itemType == CHECK).map(_.amount).sum
This way the evaluation of the expression is delayed until the call of sum.
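A sketch of the difference, assuming a hypothetical `Item(amount: BigDecimal, itemType: String)` and a string `CHECK` constant. The counter in the last method exists only to show that the view does no work before `sum` forces it:

```scala
object LazySum {
  final case class Item(amount: BigDecimal, itemType: String)
  val CHECK = "CHECK"

  // Strict: filter and map each build an intermediate list.
  def strictTotal(items: List[Item]): BigDecimal =
    items.filter(_.itemType == CHECK).map(_.amount).sum

  // Lazy: the iterator fuses filter and map into the single pass driven by sum.
  def iteratorTotal(items: List[Item]): BigDecimal =
    items.iterator.filter(_.itemType == CHECK).map(_.amount).sum

  // Returns (predicate calls before sum, predicate calls after sum).
  def predicateCallsBeforeAndAfterSum(items: List[Item]): (Int, Int) = {
    var calls = 0
    val lazyAmounts = items.view.filter { i => calls += 1; i.itemType == CHECK }.map(_.amount)
    val before = calls
    lazyAmounts.sum // forces the view: one pass over items
    (before, calls)
  }
}
```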
If your items are case classes you could also write it like this:
items.iterator.collect { case Item(amount, CHECK) => amount }.sum
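A runnable sketch of the `collect` version, assuming hypothetical `Item` and `ItemType` definitions in which `CHECK` is a case object, so it can appear directly in the pattern:

```scala
object CollectSum {
  sealed trait ItemType
  case object CHECK extends ItemType
  case object CASH extends ItemType
  final case class Item(amount: BigDecimal, itemType: ItemType)

  // collect filters and maps in one step: only items whose type is CHECK
  // contribute their amount, and the iterator keeps it to a single pass.
  def checkTotal(items: List[Item]): BigDecimal =
    items.iterator.collect { case Item(amount, CHECK) => amount }.sum
}
```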
I find that speaking of doing "three iterations" is a bit misleading: after all, each iteration does less work than a single iteration that does everything. So it doesn't automatically follow that iterating three times will take longer than iterating once.
Creating temporary objects, now that is a concern, because you'll be hitting memory (even if cached), which isn't the case with the single iteration. In those cases, view will help, even though it adds more method calls to do the same work; hopefully the JVM will optimize that away. See Moritz's answer for more information on views.
You may use foldLeft for that:
items.foldLeft(BigDecimal(0))((total, item) =>
if (item.itemType == CHECK)
total + item.amount
else
total
)
The following code will return a tuple (number of checks -> sum of amounts):
items.foldLeft((0, BigDecimal(0)))((total, item) =>
if (item.itemType == CHECK)
(total._1 + 1, total._2 + item.amount)
else
total
)
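If `total._1` and `total._2` read poorly, the same fold can carry a small named accumulator instead of a tuple. A sketch assuming a hypothetical `Item(amount: BigDecimal, itemType: String)`; `Totals` is also a hypothetical name:

```scala
object FoldTotals {
  final case class Item(amount: BigDecimal, itemType: String)
  // Named accumulator: fields read as count and total instead of _1 and _2.
  final case class Totals(count: Int, total: BigDecimal)

  // One pass, no vars: count the matching items and sum their amounts.
  def checkTotals(items: List[Item], checkType: String): Totals =
    items.foldLeft(Totals(0, BigDecimal(0))) { (acc, item) =>
      if (item.itemType == checkType) Totals(acc.count + 1, acc.total + item.amount)
      else acc
    }
}
```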