From null to 0 in Scala

In Scala I need to evaluate an expression like this:
Some(((for { **** SOME CONDITION ****} yield ps.price.get * ps.quantity.get ).sum).toString)
The problem is that the values for price or quantity can be null (nonexistent in the database), and therefore I get the error:
[NoSuchElementException: None.get]
If price is null then I need a way to obtain 0 from ps.price.get and the same for ps.quantity.get so I can use sum. Price and quantity are
Option[scala.math.BigDecimal]
How can I do this?
Note: I tried
yield ps.price.getOrElse(0) * ps.quantity.getOrElse(0)
but in this case I get the error:
value * is not a member of Any

I think you can use something like this (for comprehension):
for {
// some conditions
} yield {
// now ps.price and ps.quantity are options
(ps.price, ps.quantity) match {
case (Some(p), Some(q)) => p * q
case _ => BigDecimal(0)
}
}

Try this:
val prices = for {
ps <- listOfPs
price <- ps.price
quantity <- ps.quantity
if // put condition here
} yield price * quantity
prices.sum.toString

The default you pass to getOrElse needs to be a BigDecimal for both price and quantity; with a plain 0 (or 0d) the result type widens to Any:
yield ps.price.getOrElse(BigDecimal(0)) * ps.quantity.getOrElse(BigDecimal(0))
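As a minimal sketch of the getOrElse fix on made-up data (the `PS` case class, the `NullToZero` object and the sample values are assumptions for illustration, not taken from the question):

```scala
object NullToZero {
  // Hypothetical row type: both fields may be missing in the database.
  case class PS(price: Option[BigDecimal], quantity: Option[BigDecimal])

  val items: List[PS] = List(
    PS(Some(BigDecimal("2.50")), Some(BigDecimal(4))), // contributes 2.50 * 4
    PS(None, Some(BigDecimal(3))),                     // missing price counts as 0
    PS(Some(BigDecimal(1)), None)                      // missing quantity counts as 0
  )

  // A BigDecimal default keeps the result type BigDecimal, so * and sum compile.
  val total: BigDecimal =
    items.map(ps => ps.price.getOrElse(BigDecimal(0)) * ps.quantity.getOrElse(BigDecimal(0))).sum
}
```

Rows with a missing value contribute zero to the sum instead of throwing NoSuchElementException.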

Related

Slick - Update query with multiple tables

I'm currently facing an issue with my update query in my scala-slick3 project. I have a Report-Class, which contains multiple Products and each Product contains multiple Parts. I want to implement a function that marks every Part of every Product within this Report as assessed.
I thought about doing something like this:
def markProductPartsForReportAsAssessed(reportId: Int) = {
val query = for {
(products, parts) <- (report_product_query filter(_.reportId === reportId)
join (part_query filter(_.isAssessed === false))
on (_.productId === _.productId))
} yield parts.isAssessed
db.run(query.update(true))
}
Now, when I run this code slick throws this exception:
SlickException: A query for an UPDATE statement must resolve to a comprehension with a single table.
I already looked at similar problems, but their solutions (like this or this) weren't really satisfying to me.
Why does Slick throw this exception, and why is it a problem to begin with? I was under the impression that my yield already takes care of not "updating multiple tables".
Thanks in advance!
I guess it's because an UPDATE statement can target only one table. Written as SQL, the above query could be
UPDATE parts a SET isAssessed = 'true'
WHERE a.isAssessed = 'false' and
exists(select 'x' from products b
where a.productId = b.productId and b.reportId = reportId)
Therefore, you can move the conditions on the 'Product' table into the filter as an exists subquery, as follows (column names as in the question).
val reportId = 123 // some variable
// keep only unassessed parts whose product belongs to the given report
val query = part_query.filter(p => p.isAssessed === false &&
report_product_query.filter(r => r.reportId === reportId && r.productId === p.productId).exists
).map(_.isAssessed)
db.run(query.update(true))

Counter inside of scala for comprehension

I have this piece of code.
for {
country <- getCountryList
city <- getCityListForCountry(country)
person <- getPersonListForCity(city)
} {...}
When we run this code, we need to have a counter inside the body of the loop which increments every time the loop executes. This counter needs to show the number of people processed per country. So it has to reset itself to 0 every time we start executing the loop for a new country.
I tried
for {
country <- getCountryList
counterPerCountry = 0
city <- getCityListForCountry(country)
person <- getPersonListForCity(city)
} {counterPerCountry = counterPerCountry + 1; ...}
but this says that I am trying to reassign a value to val.
so I tried
var counterPerCountry = 0
for {
country <- getCountryList
counterPerCountry = 0
city <- getCityListForCountry(country)
person <- getPersonListForCity(city)
} {counterPerCountry = counterPerCountry + 1; ...}
also tried
for {
country <- getCountryList
var counterPerCountry = 0
city <- getCityListForCountry(country)
person <- getPersonListForCity(city)
} {counterPerCountry = counterPerCountry + 1; ...}
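If you're just trying to figure out how to assign a value to a var within a for-comprehension for science, here's a solution. The first generator is made lazy with .iterator because, on a strict collection such as List, the `_ = ...` definition would be evaluated for every element before the loop body runs:
var counter = 0
for {
a <- getList1.iterator
_ = {counter = 0}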
b <- getList2(a)
c <- getList3(b)
} {
counter = counter + 1
...
}
If you're actually trying to count the number of people in a country, and you say it's the number of people in a city times the number of cities in a country, then it comes down to simple arithmetic:
for {
country <- getCountryList
cities = getCityListForCountry(country)
city <- cities
persons = getPersonListForCity(city)
personsPerCountry = cities.length * persons.length
person <- persons
} {...}
I agree with @pamu that a for-comprehension does not seem like a natural choice here. But if you turn the for-comprehension into the underlying operations, I think you can get a solution that, while not as readable as a for-comprehension, works with Scala's functional style and avoids mutable variables. I'm thinking of something along these lines:
getCountryList flatMap (country =>
(getCityListForCountry(country) flatMap (city =>
getPersonListForCity(city))
).zipWithIndex
)
That should yield a list of (person, index) tuples where the index starts at zero for each country.
The inner part could be turned back into a for comprehension, but I'm not sure whether that would improve readability.
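To make the zipWithIndex idea concrete, here is a small sketch. The in-memory maps and names are hypothetical stand-ins for the question's lookup functions, which are assumed to return plain lists:

```scala
object PerCountryCounter {
  // Hypothetical sample data; the real functions presumably query some data source.
  val citiesByCountry: Map[String, List[String]] =
    Map("FR" -> List("Paris", "Lyon"), "DE" -> List("Berlin"))
  val peopleByCity: Map[String, List[String]] =
    Map("Paris" -> List("Ann", "Bob"), "Lyon" -> List("Cy"), "Berlin" -> List("Di", "Ed"))

  def getCountryList: List[String] = List("FR", "DE")
  def getCityListForCountry(country: String): List[String] = citiesByCountry.getOrElse(country, Nil)
  def getPersonListForCity(city: String): List[String] = peopleByCity.getOrElse(city, Nil)

  // zipWithIndex is applied inside the per-country function,
  // so the index restarts at 0 for every country.
  val indexed: List[(String, Int)] =
    getCountryList.flatMap { country =>
      getCityListForCountry(country).flatMap(getPersonListForCity).zipWithIndex
    }
}
```

With this data the index runs 0, 1, 2 for the three people in "FR" and starts again at 0 for "DE".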
I don't think a for-comprehension allows this naturally. You have to do it in a slightly hacky way. Here is one way to do it.
var counter = 0
for {
// iterate lazily: on a strict List the map would run for every country before the loop body
country <- getCountryList.iterator.map { elem => counter = 0; elem }
city <- getCityForCountry(country)
person <- getPersonForCity(city)
} {
counter += 1
//do something else here
}
Or use a function to keep it modular:
var counter = 0
def reset(): Unit = counter = 0
for {
country <- getCountryList.iterator // lazy, so reset() runs once per country, right before its cities
_ = reset()
city <- getCityForCountry(country)
person <- getPersonForCity(city)
} {
counter += 1
//do something else here
}
People per country
val peoplePerCountry =
for {
country <- getCountryList
// total the people over every city in the country
count = getCityForCountry(country).map(city => getPersonForCity(city).length).sum
} yield (country -> count)
The code returns a list of (country, number of people in that country) pairs.
The above for-comprehension is the answer; you do not have to use the counter approach. This is functional and clean, with no mutable state.
One more approach, if all you need is the actual sum, would be something compact and functional such as:
getCountryList.map( country => //-- for each country
(country, //-- return country, and ...
getCityListForCountry(country).map ( city => //-- the sum across cities
getPersonListForCity(city).length //-- of the number of people in that city
).sum
)
)
which is a list of tuples of countries with the number of people in each country. I like to think of map as the "default" loop where I would have used a for in the past. I've found the index value is very seldom needed. The index value is available with the zipWithIndex method as mentioned in another answer.

Can I optimize this: Programmatically prepare two DataFrames for a Union

This is based on my understanding that withColumn can only add one column at a time (if I'm wrong there I'm going to be embarrassed). I'm worried about the memory performance because the DataFrames are likely to be very large in production. Essentially, the idea is to take the union of the two column-name arrays (Array[String]), make it distinct, and foldLeft over that set, updating the accumulated DataFrames as I go. I'm looking for a programmatic way to match the columns of the two DataFrames so I can perform a union afterwards.
val (newLowerCaseDF, newMasterDF): (DataFrame,DataFrame) = lowerCaseDFColumns.union(masterDFColumns).distinct
.foldLeft[(DataFrame,DataFrame)]((lowerCaseDF, masterDF))((acc: (DataFrame, DataFrame), value: String) =>
if(!lowerCaseDFColumns.contains(value)) {
(acc._1.withColumn(value,lit(None)), acc._2)
}
else if(!masterDFColumns.contains(value)) {
(acc._1, acc._2.withColumn(value, lit(None)))
}
else{
acc
}
)
Found out that it's possible to select hardcoded null columns, so my new solution is:
val masterExprs = lowerCaseDFColumns.union(lowerCaseMasterDFColumns).distinct.map(field =>
//if the field already exists in master schema, we add the name to our select statement
if (lowerCaseMasterDFColumns.contains(field)) {
col(field.toLowerCase)
}
//else, we hardcode a null column in for that name
else {
lit(null).alias(field.toLowerCase)
}
)
val inputExprs = lowerCaseDFColumns.union(lowerCaseMasterDFColumns).distinct.map(field =>
//if the field already exists in the input schema, we add the name to our select statement
if (lowerCaseDFColumns.contains(field)) {
col(field.toLowerCase)
}
//else, we hardcode a null column in for that name
else {
lit(null).alias(field.toLowerCase)
}
)
And then you're able to do a union like so:
masterDF.select(masterExprs: _*).union(lowerCaseDF.select(inputExprs: _*))
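The column bookkeeping behind those two select lists can be checked without Spark. The sketch below uses plain collections only; the column names and the `plan` helper are hypothetical. Both plans walk the same distinct union in the same order, which matters because union matches columns by position:

```scala
object ColumnAlignment {
  // Hypothetical column names standing in for lowerCaseDFColumns and lowerCaseMasterDFColumns.
  val inputColumns: Seq[String]  = Seq("id", "name")
  val masterColumns: Seq[String] = Seq("id", "price")

  // One shared ordering for both sides.
  val allColumns: Seq[String] = (inputColumns ++ masterColumns).distinct

  // true  -> the frame already has the column (select it as-is)
  // false -> the frame lacks it (select a null literal aliased to that name)
  def plan(own: Seq[String]): Seq[(String, Boolean)] =
    allColumns.map(column => column -> own.contains(column))

  val inputPlan: Seq[(String, Boolean)]  = plan(inputColumns)
  val masterPlan: Seq[(String, Boolean)] = plan(masterColumns)
}
```

Each `false` entry corresponds to a `lit(null).alias(...)` expression in the answer above, and each `true` entry to `col(...)`.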

Get Counts using Linq to SQL

I have the following SQL which works fine
SELECT
f.ForumId,
f.Name,
COUNT(ft.TopicId) AS TotalTopics,
COUNT(fm.MessageId) AS TotalMessages
FROM
tblForumMessages fm INNER JOIN
tblForumTopics ft ON fm.TopicId = ft.TopicId RIGHT OUTER JOIN
tblForums f ON ft.ForumId = f.ForumId
GROUP BY f.ForumId, f.Name
That I'm trying to convert to Linq.
Here's what I have
var forums = (from f in Forums
join ft in ForumTopics on f.ForumId equals ft.ForumId into topics
from y in topics.DefaultIfEmpty()
join fm in ForumMessages on y.TopicId equals fm.TopicId into messages
from x in messages.DefaultIfEmpty()
select new { f.ForumId, f.Name, y.TopicId, x.MessageId } into x
group x by new { x.ForumId, x.Name } into g
select new
{
ForumId = g.Key.ForumId,
ForumName = g.Key.Name,
TopicCount = g.Count(i => i.TopicId),
MessageCount = g.Count(i => i.MessageId)
}
).ToList();
I'm getting an error on TopicCount = g.Count(i => i.TopicId) saying Cannot convert expression type 'System.Guid' to return type 'bool'.
What am I missing to make this work?
Thanks
* EDIT *
Thanks to Rob I got it to work but the counts were always returning 1 for Topic Count and Message Count even though there were no records. It should have been returning 0 for both.
I've modified the query by changing
select new { f.ForumId, f.Name, y.TopicId, x.MessageId } into x
to
select new
{
f.ForumId, f.Name,
TopicId = y != null ? y.TopicId : (Guid?)null,
MessageId = z != null ? z.MessageId : (Guid?)null
} into x
And for the actual counts, I changed the query to
select new
{
g.Key.ForumId,
g.Key.Name,
TopicCount = g.Count(t => t.TopicId != null),
MessageCount = g.Count(t => t.MessageId != null)
}
The offending expression is TopicCount = g.Count(i => i.TopicId). The Count method takes a Func<T, bool> (it gives the number of items in the collection that satisfy the predicate).
It looks like you want the number of distinct TopicIds in your group. Try replacing TopicCount = g.Count(i => i.TopicId) with TopicCount = g.GroupBy(i => i.TopicId).Count().
You can also try TopicCount = g.Select(i => i.TopicId).Distinct().Count()

map reduce function return id instead of count

I'm applying a map-reduce function but facing an issue: when a key has only one record, it returns the id instead of count = 1.
map_func = """function () {
emit(this.school_id, this.student_id);
}"""
reduce_func = """
function (k, values) {
values.length;
}
"""
If school 100 has only one student, it should return school id 100 with value = 1, but in this scenario it returns
school id = 100, value = 12 (12 is the student's id in the db). For other records it works fine.
map_func = """function () {
emit({school_id: this.school_id, student_id: this.student_id}, {count: 1});
}"""
reduce_func = """
function (k, values) {
var count =0 ;
values.forEach(function(v)
{
count += v['count'];
});
return {count:count};
}
"""
map_func2 = """
function() {
emit(this['_id']['school_id'], {count: 1});
}
"""
http://cookbook.mongodb.org/patterns/unique_items_map_reduce/
I used this example, but it uses two map-reduce passes, so it takes much more time.
It looks like you may be misunderstanding some of the mechanics of mapReduce.
The emit will get called on every document, but reduce will only be called on keys which have more than one value emitted (because the purpose of the reduce function is to merge or reduce an array of results into one).
Your map function is wrong: it needs to emit a key and then the value you want, in this case a count.
Your reduce function needs to reduce these counts (add them) but it has to work correctly even if it gets called multiple times (to re-reduce previously reduced results).
I recommend reading here for more details.
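The behaviour described above can be imitated in a few lines of Scala. This is only a toy simulation under stated assumptions, not MongoDB's implementation: the `mapReduce` helper and the sample documents are made up, and re-reduce is not modelled. It does show why a key with a single emitted value comes back unreduced:

```scala
object MapReduceSketch {
  // Hypothetical documents as (school_id, student_id) pairs.
  val docs: List[(Int, Int)] = List((100, 12), (200, 7), (200, 8), (200, 9))

  // Toy model: emit runs on every document; reduce is skipped
  // when a key has exactly one emitted value.
  def mapReduce[V](emit: ((Int, Int)) => (Int, V))(reduce: (Int, List[V]) => V): Map[Int, V] =
    docs.map(emit).groupBy(_._1).map { case (key, pairs) =>
      val values = pairs.map(_._2)
      key -> (if (values.size == 1) values.head else reduce(key, values))
    }

  // Like the question: emit the student id and reduce to values.length.
  // School 100 has one student, so its emitted id (12) is returned as-is.
  val broken: Map[Int, Int] = mapReduce(d => (d._1, d._2))((_, vs) => vs.length)

  // Emit 1 per student and sum: correct for single-value keys too.
  val fixed: Map[Int, Int] = mapReduce(d => (d._1, 1))((_, vs) => vs.sum)
}
```

Summing emitted 1s also stays correct if partial results are reduced again, whereas counting `values.length` would count partial results instead of students.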
If you are trying to count the number of students per school:
map = """function () { emit(this.school_id, 1); }"""
reduce = """function (key, values) {var total = 0; for (var i = 0; i < values.length; i++) { total += values[i]; } return total;} """