I have this piece of code.
for {
country <- getCountryList
city <- getCityListForCountry(country)
person <- getPersonListForCity(city)
} {...}
When we run this code, we need to have a counter inside the body of the loop which increments every time the loop executes. This counter needs to show the number of people processed per country. So it has to reset itself to 0 every time we start executing the loop for a new country.
I tried
for {
country <- getCountryList
counterPerCountry = 0
city <- getCityListForCountry(country)
person <- getPersonListForCity(city)
} {counterPerCountry = counterPerCountry + 1; ...}
but the compiler says that I am trying to reassign a value to a val.
So I tried
var counterPerCountry = 0
for {
country <- getCountryList
counterPerCountry = 0
city <- getCityListForCountry(country)
person <- getPersonListForCity(city)
} {counterPerCountry = counterPerCountry + 1; ...}
I also tried
for {
country <- getCountryList
var counterPerCountry = 0
city <- getCityListForCountry(country)
person <- getPersonListForCity(city)
} {counterPerCountry = counterPerCountry + 1; ...}
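If you're just trying to figure out how to assign a value to a var within a for-comprehension for science, here's a solution. Note that in Scala 2 the "_ = ..." step is compiled into a map over the first collection, so on a strict collection such as a List every reset would run before the loop body starts; iterating lazily (here via .iterator) keeps each reset next to its own iteration:
var counter = 0
for {
a <- getList1.iterator
_ = {counter = 0}
b <- getList2(a)
c <- getList3(b)
} {
counter = counter + 1
...
}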
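To see why laziness matters for the reset trick, here is a minimal sketch that spells the translation out with explicit map and foreach calls. The countries, cities and people are made-up placeholders, not data from the question; the only point is to compare a strict List with an iterator.

```scala
import scala.collection.mutable.ListBuffer

object CounterResetDemo {
  // Made-up sample data standing in for the real lookups.
  val countries: List[String] = List("FR", "DE")
  val cities: Map[String, List[String]] =
    Map("FR" -> List("Paris", "Lyon"), "DE" -> List("Berlin"))
  val people: Map[String, List[String]] =
    Map("Paris" -> List("Ann", "Bob"), "Lyon" -> List("Cy"), "Berlin" -> List("Dee", "Eve"))

  // Strict: List.map performs every reset before foreach visits the first
  // country, so the counter is never reset between countries.
  def strictCounts: List[Int] = {
    val seen = ListBuffer.empty[Int]
    var counter = 0
    countries
      .map { country => counter = 0; country }
      .foreach { country =>
        for (city <- cities(country); _ <- people(city)) {
          counter += 1
          seen += counter
        }
      }
    seen.toList // List(1, 2, 3, 4, 5)
  }

  // Lazy: Iterator.map performs each reset only when foreach asks for that
  // country, so the counter restarts per country.
  def lazyCounts: List[Int] = {
    val seen = ListBuffer.empty[Int]
    var counter = 0
    countries.iterator
      .map { country => counter = 0; country }
      .foreach { country =>
        for (city <- cities(country); _ <- people(city)) {
          counter += 1
          seen += counter
        }
      }
    seen.toList // List(1, 2, 3, 1, 2)
  }
}
```

With the strict list the counter keeps climbing across countries; with the iterator it restarts at 1 for the second country.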
If you're actually trying to count the number of people in a country, and you say it's the number of people in a city times the number of cities in a country, then it comes down to simple arithmetic:
for {
country <- getCountryList
cities = getCityListForCountry(country)
city <- cities
persons = getPersonListForCity(city)
personsPerCountry = cities.length * persons.length
person <- persons
} {...}
I agree with @pamu that a for-comprehension does not seem like a natural choice here. But if you turn the for-comprehension into the underlying operations, I think you can get a solution that, while not as readable as a for-comprehension, works with Scala's functional style and avoids mutable variables. I'm thinking of something along these lines:
getCountryList flatMap (country =>
(getCityListForCountry(country) flatMap (city =>
getPersonListForCity(city))
).zipWithIndex
)
That should yield a list of (person, index) tuples where the index starts at zero for each country.
The inner part could be turned back into a for comprehension, but I'm not sure whether that would improve readability.
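A small runnable sketch of the zipWithIndex idea, assuming the three lookups return plain Lists. The lookup bodies and all the names in them are invented here for illustration; the thread does not show the real ones.

```scala
object ZipWithIndexDemo {
  // Made-up lookups; the real implementations are not shown in the thread.
  def getCountryList: List[String] = List("FR", "DE")

  def getCityListForCountry(country: String): List[String] =
    Map("FR" -> List("Paris", "Lyon"), "DE" -> List("Berlin")).getOrElse(country, Nil)

  def getPersonListForCity(city: String): List[String] =
    Map("Paris" -> List("Ann", "Bob"), "Lyon" -> List("Cy"), "Berlin" -> List("Dee", "Eve"))
      .getOrElse(city, Nil)

  // zipWithIndex is applied to each country's flattened person list,
  // so the index restarts at 0 whenever a new country begins.
  val indexed: List[(String, Int)] =
    getCountryList.flatMap { country =>
      getCityListForCountry(country).flatMap(getPersonListForCity).zipWithIndex
    }
  // List((Ann,0), (Bob,1), (Cy,2), (Dee,0), (Eve,1))
}
```

If a 1-based counter is wanted, index + 1 gives the same numbers the mutable counter would have produced.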
I don't think a for-comprehension allows this naturally, so you have to do it in a slightly hacky way. Here is one way to do it. The reset has to run lazily, right before each country is processed, so iterate over an iterator; on a strict List, map would perform every reset before the loop body runs.
var counter = 0
for {
country <- getCountryList.iterator.map { elem => counter = 0; elem }
city <- getCityForCountry(country)
person <- getPersonForCity(city)
} {
counter += 1
//do something else here
}
Or use a function to keep it modular:
var counter = 0
def reset(): Unit = counter = 0
for {
country <- getCountryList.iterator
_ = reset()
city <- getCityForCountry(country)
person <- getPersonForCity(city)
} {
counter += 1
//do something else here
}
People per country
val peoplePerCountry =
for {
country <- getCountryList
cities = getCityForCountry(country)
} yield country -> cities.map(city => getPersonForCity(city).length).sum
The code returns a list of (country, number of people in that country) pairs.
This for-comprehension is the answer; you do not need the counter approach. It is functional and clean, with no mutable state.
One more approach, if all you need is the actual sum, would be something compact and functional such as:
getCountryList.map( country => //-- for each country
(country, //-- return country, and ...
getCityListForCountry(country).map ( city => //-- the sum across cities
getPersonListForCity(city).length //-- of the number of people in that city
).sum
)
)
which is a list of tuples of countries with the number of people in each country. I like to think of map as the "default" loop where I would have used a for in the past. I've found the index value is very seldom needed. The index value is available with the zipWithIndex method as mentioned in another answer.
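To check the map-and-sum version against hand-counted numbers, here is a sketch on invented data (the maps below are placeholders for the real lookups). It also shows that the multiplication shortcut mentioned earlier in the thread only agrees with the sum when every city has the same number of people.

```scala
object PeoplePerCountryDemo {
  // Made-up sample data standing in for the real lookups.
  val countries: List[String] = List("FR", "DE")
  val cities: Map[String, List[String]] =
    Map("FR" -> List("Paris", "Lyon"), "DE" -> List("Berlin"))
  val people: Map[String, List[String]] =
    Map("Paris" -> List("Ann", "Bob"), "Lyon" -> List("Cy"), "Berlin" -> List("Dee", "Eve"))

  // For each country, sum the number of people over its cities.
  val totals: List[(String, Int)] =
    countries.map { country =>
      (country, cities(country).map(city => people(city).length).sum)
    }
  // List((FR,3), (DE,2))

  // Number of cities times the population of one city (Paris) for FR:
  // 2 * 2 = 4, which overstates the real total of 3.
  val productShortcutForFR: Int = cities("FR").length * people("Paris").length
}
```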
Data format of one row:
id: 123456
Topiclist: ABCDE:1_8;5_10#BCDEF:1_3;7_11
One id can have many rows:
id: 123456
Topiclist:ABCDE:1_1;7_2;#BCDEF:1_2;7_11#
Target: (123456, (ABCDE,9,2),(BCDEF,5,2))
Records in topic list are split by #, so ABCDE:1_8;5_10 is one record.
A record is in the format <topicid>:<topictype>_<topicvalue>
E.g. ABCDE:1_8 has
topicid = ABCDE
topictype = 1
topicvalue = 8
Target: sum the total value of topic type 1 and count how often topic type 1 occurs,
so the result should be (id, (topicid, value, frequency)), e.g. (123456, (ABCDE,9,2),(BCDEF,5,2))
Assume that your data are "123456!ABCDE:1_8;5_10#BCDEF:1_3;7_11" and "123456!ABCDE:1_1;7_2#BCDEF:1_2;7_11", so we use "!" to get your userID "123456"
rdd.map{f=>
val userID = f.split("!")(0)
val items = f.split("!")(1).split("#")
var result = List[Array[String]]()
for (item <- items){
val topicID = item.split(":")(0)
for (topicTypeValue <- item.split(":")(1).split(";") ){
println(topicTypeValue);
if (topicTypeValue.split("_")(0)=="1"){result = result:+Array(topicID,topicTypeValue.split("_")(1),"1") }
}
}
(userID,result)
}
.flatMapValues(x=>x).filter(f=>f._2.length==3)
.map{f=>( (f._1,f._2(0)),(f._2(1).toInt,f._2(2).toInt) )}
.reduceByKey{case(x,y)=> (x._1+y._1,x._2+y._2) }
.map(f=>(f._1._1,(f._1._2,f._2._1,f._2._2))) // (userID, (topicID, valueSum, frequency))
The output is ("123456",("ABCDE",9,2)), ("123456",("BCDEF",5,2)), a little different from your target; you can group this result by user ID if you really need ("123456",("ABCDE",9,2),("BCDEF",5,2)).
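For trying the parsing and aggregation logic without a Spark cluster, here is a sketch on plain Scala collections. It assumes the same "id!topiclist" row shape as the answer above; the object and method names are made up for this example, and groupBy plays the role that reduceByKey plays on an RDD.

```scala
object TopicSumDemo {
  // Sample rows in the "id!topiclist" shape assumed by the answer above.
  val rows: List[String] = List(
    "123456!ABCDE:1_8;5_10#BCDEF:1_3;7_11",
    "123456!ABCDE:1_1;7_2#BCDEF:1_2;7_11"
  )

  // Turn one row into ((userId, topicId), value) pairs, keeping only
  // entries whose topic type is "1". Malformed pieces are skipped.
  def parse(row: String): List[((String, String), Int)] = {
    val parts = row.split("!", 2)
    val userId = parts(0)
    for {
      record <- parts(1).split("#").toList
      fields = record.split(":", 2)
      if fields.length == 2
      entry <- fields(1).split(";").toList
      kv = entry.split("_", 2)
      if kv.length == 2 && kv(0) == "1"
    } yield ((userId, fields(0)), kv(1).toInt)
  }

  // Group by (userId, topicId), then sum the values and count the hits.
  val result: List[(String, (String, Int, Int))] =
    rows
      .flatMap(parse)
      .groupBy(_._1)
      .toList
      .map { case ((user, topic), hits) =>
        (user, (topic, hits.map(_._2).sum, hits.size))
      }
      .sortBy(_._2._1) // sort by topic id for a stable order
  // List((123456,(ABCDE,9,2)), (123456,(BCDEF,5,2)))
}
```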
In Scala I need to evaluate an expression like this:
Some(((for { **** SOME CONDITION ****} yield ps.price.get * ps.quantity.get ).sum).toString)
The problem I get is that the values for price or quantity can be null (not existent in the database) and therefore I get the error:
[NoSuchElementException: None.get]
If price is null then I need a way to obtain 0 from ps.price.get and the same for ps.quantity.get so I can use sum. Price and quantity are
Option[scala.math.BigDecimal]
How can I do this?
Note: I tried
yield ps.price.getOrElse(0) * ps.quantity.getOrElse(0)
but in this case I get the error:
value * is not a member of Any
I think you can use something like this (for comprehension):
for {
// some conditions
} yield {
// now ps.price and ps.quantity are options
(ps.price, ps.quantity) match {
case (Some(p), Some(q)) => p * q
case _ => BigDecimal(0)
}
}
Try this:
val prices = for {
ps <- listOfPs
price <- ps.price
quantity <- ps.quantity
if // put condition here
} yield price * quantity
prices.sum.toString
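The default you pass to getOrElse needs to have the same type as the option's value (BigDecimal) for both price and quantity; otherwise the result is widened to Any:
yield ps.price.getOrElse(BigDecimal(0)) * ps.quantity.getOrElse(BigDecimal(0))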
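A self-contained sketch comparing two of the suggestions above. The Ps case class and the sample rows are assumptions made for this example, since the real row type is not shown in the question.

```scala
object OrderTotalDemo {
  // Hypothetical row type; the original class behind `ps` is not shown.
  final case class Ps(price: Option[BigDecimal], quantity: Option[BigDecimal])

  val rows: List[Ps] = List(
    Ps(Some(BigDecimal("2.50")), Some(BigDecimal(4))), // contributes 10.00
    Ps(None, Some(BigDecimal(3))),                     // missing price
    Ps(Some(BigDecimal("1.25")), None)                 // missing quantity
  )

  private val zero = BigDecimal(0)

  // Variant 1: a BigDecimal default keeps the result typed as BigDecimal.
  val viaGetOrElse: BigDecimal =
    rows.map(ps => ps.price.getOrElse(zero) * ps.quantity.getOrElse(zero)).sum

  // Variant 2: Option generators drop rows where either value is missing,
  // which gives the same sum as treating them as zero.
  val viaGenerators: BigDecimal =
    (for {
      ps <- rows
      price <- ps.price
      quantity <- ps.quantity
    } yield price * quantity).sum
}
```

Both variants give 10 for these rows; they differ only in whether incomplete rows are multiplied out to zero or skipped.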
I am looking for the best solution to this simple problem.
Run the following SQL in C#/Entity Framework:
select user.name, userstat.point from user, userstat where userstat.user_id = user.id order by userstat.point desc
There is a User table [Id, Name, ...] and a Statistic table [Id, UserId, Point, ...], which is connected to User by Statistic.UserId. It's a 1:1 relation, so there is (at most) one Statistic record for each User record.
I want a list of User+Point, ordered by Point descending, and to select a range from it, let's say 1000-1100.
Currently I have this:
public List<PointItem> Get(int startPos, int count)
{
using (DB.Database db = new DB.Database())
{
var dbList = db.Users.Where(user => .... ).ToList();
List<PointItem> temp = new List<PointItem>(count);
foreach (DB.User user in db.Users)
{
//should be always 1 stat for the user, but just to be sure check it...
if (user.Stats != null && user.Stats.Count > 0)
temp.Add(new PointItem { Name = user.Name, Point = user.Stats.First().Point });
} // <--- this foreach takes forever
return temp.OrderByDescending(item => item.Point).Skip(startPos).Take(count).ToList();
}
}
It works, but with 10,000 User rows (and 10,000 UserStat rows) it runs for 100 seconds, which is about 1000x slower than I want it to be.
Is there a more efficient solution than this?
If I run the SQL directly, it takes basically 0 seconds for 10K records.
EDIT
I made it faster, now 100sec -> 1 sec, but still I want it faster (if possible).
var userPoint = db.Users
.Where(u => u.UserStats.Count > 0 && ....)
.Select(up => new
{
User = up,
Point = up.UserStats.FirstOrDefault().Point
})
.OrderByDescending(up => up.Point)
.ToList();
var region = userPoint.Skip(0).Take(100);
OK, I found the solution: the following code runs in 0.05 sec. You just need to go from child to parent:
using (DB.Database db = new DB.Database())
{
var userPoint = db.UserStats
.Where(s => s.User.xxx .....)
.Select(userpoint => new
{
User = userpoint.User.Name,
Point = userpoint.Point
})
.OrderByDescending(userpoint => userpoint.Point)
.ToList().Skip(startPos).Take(count);
}
I am applying a map-reduce function but I am facing an issue: when a key has only one record, it returns the id instead of count = 1.
map_func = """function () {
emit(this.school_id, this.student_id);
}"""
reduce_func = """
function (k, values) {
values.length;
}
"""
If school 100 has only one student, it should return school id 100 with value = 1, but in this scenario it returns
schoolid = 100, value = 12 (12 is the student's id in the db). For other records it works fine.
map_func = """function () {
emit({school_id: this.school_id, student_id: this.student_id}, {count: 1});
}"""
reduce_func = """
function (k, values) {
var count =0 ;
values.forEach(function(v)
{
count += v['count'];
});
return {count:count};
}
"""
map_func2 = """
function() {
emit(this['_id']['school_id'], {count: 1});
}
"""
http://cookbook.mongodb.org/patterns/unique_items_map_reduce/
I used this example, but it uses two map-reduce passes, so it took much more time.
It looks like you may be misunderstanding some of the mechanics of mapReduce.
The emit will get called on every document, but reduce will only be called on keys which have more than one value emitted (because the purpose of the reduce function is to merge or reduce an array of results into one).
Your map function is wrong: it needs to emit a key and then the value you want, which in this case is a count.
Your reduce function needs to reduce these counts (add them) but it has to work correctly even if it gets called multiple times (to re-reduce previously reduced results).
I recommend reading here for more details.
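The two rules above (reduce is skipped for single-value keys, and reduce must tolerate being fed its own output) can be illustrated with a plain Scala simulation. This is not MongoDB's API; the run function below is a made-up stand-in that only mimics the behaviour described in this answer, and the documents are invented.

```scala
object MapReduceSketch {
  // Invented documents: (school_id, student_id).
  val docs: List[(Int, Int)] = List((100, 12), (200, 7), (200, 8), (200, 9))

  // Map step as recommended: emit (school_id, 1) for every document.
  def emitCounts(ds: List[(Int, Int)]): List[(Int, Int)] =
    ds.map { case (school, _) => (school, 1) }

  // Reduce step: summing counts is safe to apply to already reduced values.
  def sumCounts(values: List[Int]): Int = values.sum

  // Simulated engine: groups emitted pairs by key and calls the reduce
  // function only when a key has more than one value.
  def run(emitted: List[(Int, Int)], reduceFn: List[Int] => Int): Map[Int, Int] =
    emitted.groupBy(_._1).map { case (key, pairs) =>
      val values = pairs.map(_._2)
      (key, if (values.size == 1) values.head else reduceFn(values))
    }

  // Emitting 1 and summing gives the expected counts.
  val counts: Map[Int, Int] = run(emitCounts(docs), sumCounts)

  // Emitting student_id and returning the length reproduces the symptom:
  // school 100 has one value, so its student id 12 passes through untouched.
  val broken: Map[Int, Int] = run(docs, _.length)

  // Re-reduce: feeding a partial result back in still sums to 3,
  // while a length-based reduce would report 2.
  val reReducedSum: Int = sumCounts(List(sumCounts(List(1, 1)), 1))
  val reReducedLength: Int = List(List(1, 1).length, 1).length
}
```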
If you are trying to count the number of students per school:
map = """function () { emit(this.school_id, 1); }"""
reduce = """function (key, values) {var total = 0; for (var i = 0; i < values.length; i++) { total += values[i]; } return total;} """
I have a couple of tables where there are one to many relationships. Let's say I have a Country table, a State table with a FK to Country, and a City table with a FK to State.
I'd like to be able to create a count of all Cities for a given country, and also a count of cities based on a filter - something like:
foreach( var country in Model.Country ) {
total = country.State.All().City.All().Count() ;
filtered = country.State.All().City.Any(c=>c.field == value).Count();
}
Obviously, this doesn't work - is there any way to do this?
Update:
I can iterate thru the objects:
foreach (var item in Model ) {
... other stuff in here ...
int tot = 0;
int filtered = 0;
foreach (var state in item.State)
{
foreach (var city in state.City)
{
tot++;
if (city.Field == somevalue)
filtered ++;
}
}
... other stuff in here ...
}
but that doesn't seem very elegant.
Update: @AD has a couple of suggestions, but what worked to solve the problem was:
int tot = item.States.Sum(s=>s.City.Count);
int filtered = item.States.Sum(s=>s.City.Where(c=>c.Field == somevalue).Count());
You can try, assuming you already have the givenCountry and value variable populated:
int total = EntityModel.CitySet.Where( it => it.State.Country.ID == givenCountry.ID ).Count();
Above, you take your entire set of cities (EntityModel.CitySet). This set contains all the cities in all the states in all the countries. The problem becomes: which subset of those cities is in the country 'givenCountry'? To figure that out, you apply Where() to the entire set and compare the country IDs to see whether they are the same. However, since a city only knows which state it is in (and not the country), you first have to reference its state (it.State). it.State references the state object, and that object has a Country property that references the country. So it.State.Country references the country that 'it' (the city) is in, creating the link between the city and the country.
Note that you could have done this in reverse as well with
int total = givenCountry.States.Sum( s => s.Cities.Count() );
However, here you will have to make sure that givenCountry has its States collection loaded in memory and also that each State has its Cities collection loaded. That is because you are querying an already loaded object and not the Entity Framework context, as was the case in the first example. There is a way to craft the last query to use the Entity Framework context, however:
int total = EntityModel.CountrySet.Where( c => c.ID == givenCountry.ID ).Sum( c => c.States.Sum( s => s.Cities.Count() ) );
As for the number of cities in that country with a specific field value, you take a similar approach and add the condition to the Where() call:
int filtered = EntityModel.CitySet.Where( it => it.State.Country.ID == givenCountry.ID && it.field == value ).Count();
Why don't you reverse it?
foreach( var country in Model.Country ) {
// assumes State has a Country navigation property with an ID
var cities = Model.State.Where(x => x.Country.ID == country.ID).SelectMany(x => x.City);
total = cities.Count();
filtered = cities.Count(c => c.field == value);
}
You have to explicitly load children in the Entity Framework. If you load all the children then you can get counts just fine.
List<Country> countries = Model.Country.Include("State").ToList();
total = countries[i].State.Count();
Assuming of course that the iteration through all countries is important. Otherwise why not just query against City filtered by State and Country?
In your state foreach you should just be able to do
filtered += state.City.Where(x=> x.Field == value).Count();