I started learning Scala in IntelliJ a few days ago and I am teaching myself, so please bear with my rookie mistakes. I have a CSV file with more than 10,000 rows and 13 columns.
The column headings are:
Category | Rating | Reviews | Size | Installs | Type | Price | Content Rating | Genres | Last updated | Current Version | Android Version
I managed to read and display the CSV file with the following code:
import scala.io.Source
object task {
def main(args: Array[String]): Unit = {
for(line <- Source.fromFile("D:/data.csv"))
{
println(line)
}
}
}
The problem is that this code prints one character (a letter or digit) per line, instead of printing each row on its own line.
I want to find the best app in each category (ART_AND_DESIGN, AUTO_AND_VEHICLES, BEAUTY, …) based on weighted priorities for reviews and ratings: 60% for the "reviews" column and 40% for the "rating" column. A value is calculated for each app within its category using these weights, and that value is then used to find the best app in each category. The priority formula is:
Priority = ( (((rating/max_rating) * 100) * 0.4) + (((reviews/max_reviews) * 100) * 0.6) )
Here max_rating is the maximum rating within the same category (for "ART_AND_DESIGN" the maximum rating is 4.7), and max_reviews is the maximum number of reviews of any app within the same category (for "ART_AND_DESIGN" it is 295221). So for the first data record of category "ART_AND_DESIGN", the priority is computed from:
Rating= 4.1, reviews= 159,
max_rating= 4.7, max_reviews= 295221
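Plugging these numbers into the formula gives a priority of roughly 34.93 for that record. A minimal Scala sketch of the calculation (the object and method names are illustrative, not part of the assignment):

```scala
object PriorityExample {
  // Priority = 40% of the normalized rating + 60% of the normalized review count
  def priority(rating: Double, reviews: Int, maxRating: Double, maxReviews: Int): Double =
    ((rating / maxRating) * 100 * 0.4) + ((reviews.toDouble / maxReviews) * 100 * 0.6)

  // First ART_AND_DESIGN record from the question
  val first: Double = priority(4.1, 159, 4.7, 295221)
}
```

Note the reviews.toDouble: dividing two Int values would truncate to 0 for every app except the one with the most reviews.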
My question is: how can I store every column in an array? That is how I plan to compute the data. If there is any other way to solve the problem above, I am open to suggestions.
I can upload a small chunk of the data if anyone wants it.
Source gives you a character iterator (Iterator[Char]) by default, which is why you see one character per line. To iterate through lines, use .getLines:
Source.fromFile(fileName)
.getLines
.foreach(println)
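To see the difference without a file, the sketch below uses Source.fromString on a made-up two-line sample (the sample text and the object name are illustrative). Iterating the Source directly yields characters, while getLines yields whole rows:

```scala
import scala.io.Source

object SourceDemo {
  val sample = "ART_AND_DESIGN|4.1|159\nBEAUTY|4.7|295"

  // A Source is an Iterator[Char]: iterating it directly yields single characters
  val chars: List[Char] = Source.fromString(sample).toList

  // getLines() yields one String per row instead
  val lines: List[String] = Source.fromString(sample).getLines().toList
}
```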
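To split lines into arrays, use split (assuming the column values do not include the separator). Note that split with a String argument treats it as a regular expression, in which | means alternation, so pass the separator as a Char (or escape it as "\\|"):
val arrays = Source.fromFile(fileName).getLines.map(_.split('|'))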
It is better to avoid raw arrays though. A case class makes for much more readable code:
case class AppData(
category: String,
rating: Int,
reviews: Int,
size: Int,
installs: Int,
`type`: String,
price: Double,
contentRating: Int,
genres: Seq[String],
lastUpdated: Long,
version: String,
androidVersion: String
) {
  def priority(maxRating: Int, maxReview: Int): Double =
    if (maxRating == 0 || maxReview == 0) 0.0 else
      (rating * 0.4 / maxRating + reviews * 0.6 / maxReview) * 100
}
object AppData {
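// The explicit result type is needed because this overload calls the generated apply
def apply(str: String): AppData = {
// split("|") would treat | as a regex alternation, so split on the Char instead
val fields = str.split('|')
assert(fields.length == 12)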
AppData(
fields(0),
fields(1).toInt,
fields(2).toInt,
fields(3).toInt,
fields(4).toInt,
fields(5),
fields(6).toDouble,
fields(7).toInt,
fields(8).split(",").toSeq,
fields(9).toLong,
fields(10),
fields(11)
)
}
}
Now you can do what you want pretty neatly:
// Read the data, parse it and group by category
// This gives you a map of categories to a seq of apps
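val byCategory = Source.fromFile(fileName)
  .getLines()
  .drop(1) // skip the header row (assuming the file starts with one)
  .map(AppData(_))
  .toList // an Iterator has no groupBy, so materialize it first
  .groupBy(_.category)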
// Now, find out max ratings and reviews for each category
// This could be done even nicer with another case class and
// a monoid, but tuple/fold will do too
// It is tempting to use `.mapValues` here, but that's not a good idea
// because .mapValues is LAZY, it will recompute the max every time
// the value is accessed!
val maxes = byCategory.map { case (cat, data) =>
cat ->
data.foldLeft(0 -> 0) { case ((maxRatings, maxReviews), in) =>
(maxRatings max in.rating, maxReviews max in.reviews)
}
}.withDefault( _ => (0,0))
// And finally go through your categories, and find best for each,
// that's it!
val bestByCategory = byCategory.map { case (cat, apps) =>
  val (maxRating, maxReviews) = maxes(cat)
  cat -> apps.maxBy(_.priority(maxRating, maxReviews))
}
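As a self-contained sketch of the same grouping idea, the snippet below uses a hypothetical, trimmed-down App record with only the three fields the formula needs (rating as a Double, since the sample data contains values like 4.1) and in-memory data instead of a file:

```scala
// Hypothetical, simplified record: only the fields the priority formula uses
final case class App(category: String, rating: Double, reviews: Int)

object BestApps {
  // Priority = 40% of the normalized rating + 60% of the normalized review count
  def priority(app: App, maxRating: Double, maxReviews: Int): Double =
    if (maxRating == 0 || maxReviews == 0) 0.0
    else (app.rating / maxRating * 0.4 + app.reviews.toDouble / maxReviews * 0.6) * 100

  def bestByCategory(apps: Seq[App]): Map[String, App] =
    apps.groupBy(_.category).map { case (category, group) =>
      val maxRating = group.map(_.rating).reduce(_ max _)
      val maxReviews = group.map(_.reviews).max
      def score(app: App): Double = priority(app, maxRating, maxReviews)
      // Keep the app with the highest priority within the category
      category -> group.reduce((a, b) => if (score(b) > score(a)) b else a)
    }
}
```

Because reviews carry 60% of the weight, an app with a modest rating but by far the most reviews can still come out on top of its category.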
I have a case class that I am trying to test via ScalaCheck. The case class contains other classes.
Here are the classes:
case class Shop(name: String = "", colors: Seq[Color] = Nil)
case class Color(colorName: String = "", shades: Seq[Shade] = Nil)
case class Shade(shadeName: String, value: Int)
I have generators for each one
implicit def shopGen: Gen[Shop] =
for {
name <- Gen.alphaStr.suchThat(_.length > 0)
colors <- Gen.listOf(colorsGen)
} yield Shop(name, colors)
implicit def colorsGen: Gen[Color] =
for {
colorName <- Gen.alphaStr.suchThat(_.length > 0)
shades <- Gen.listOf(shadesGen)
} yield Color(colorName, shades)
implicit def shadesGen: Gen[Shade] =
for {
shadeName <- Gen.alphaStr.suchThat(_.length > 0) //**Note this**
value <- Gen.choose(1, Int.MaxValue)
} yield Shade(shadeName, value)
When I write my test and simply do the below:
property("Shops must encode/decode to/from JSON") {
"test" mustBe "test"
}
The test hangs and then gives up after 51 tries with this error: Gave up after 1 successful property evaluation. 51 evaluations were discarded.
If I remove Gen.alphaStr.suchThat(_.length > 0) from shadesGen and just replace it with Gen.alphaStr then it works.
Question
Why does having Gen.alphaStr work for shadesGen, however, Gen.alphaStr.suchThat(_.length > 0) does not?
Also, when I run the test multiple times (with Gen.alphaStr), some runs pass while others don't. Why is this?
You probably see this behavior because of the way listOf is implemented. Internally it is based on buildableOf, which is in turn based on buildableOfN, which has the following comment:
... If the given generator fails generating a value, the
complete container generator will also fail.
Your data structure is essentially a list of lists, so even one failed generation causes the whole data structure to be discarded. Most of the failures happen at the bottom level, simply because that is where most of the values are generated; that's why removing the filter for shadeName helps. To make it work, you need a generator that produces valid strings directly instead of filtering out invalid ones. You may replace Gen.alphaStr with a custom generator based on nonEmptyListOf, such as:
def nonemptyAlphaStr: Gen[String] = Gen.nonEmptyListOf(Gen.alphaChar).map(_.mkString)
Another simple way to work around this is to use retryUntil instead of suchThat, as in:
implicit def shadesGen: Gen[Shade] =
for {
//shadeName <- Gen.alphaStr.suchThat(_.length > 0) //**Note this**
shadeName <- Gen.alphaStr.retryUntil(_.length > 0)
value <- Gen.choose(1, Int.MaxValue)
} yield Shade(shadeName, value)
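To get a feel for why one filtered generator at the bottom hurts so much, here is a back-of-the-envelope model in plain Scala (no ScalaCheck involved). It assumes, purely for illustration, that each leaf value is discarded independently with probability p; the real discard rate of Gen.alphaStr.suchThat(_.length > 0) depends on the size parameter and is not measured here:

```scala
object DiscardModel {
  // Chance that a container of n independently generated leaves contains no discarded leaf
  def containerSucceeds(p: Double, n: Int): Double = math.pow(1 - p, n)
}
```

With a hypothetical 5% discard rate per string, a shop with 10 colors of 10 shades each needs 110 non-empty strings (10 color names plus 100 shade names), and DiscardModel.containerSucceeds(0.05, 110) is well under 1%.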
I want to check whether a specific id is contained in an Enumeration,
so I wrote the following contains function:
object Enum extends Enumeration {
type Enum = Value
val A = Value(2, "A")
def contains(value: Int): Boolean = {
Enum.values.map(_.id).contains(value)
}
}
But the time cost is unexpectedly high when the id is a big number, for example one with more than eight digits:
val A = Value(222222222, "A")
Then the contains function takes over 1000 ms per call.
I also noticed that the first call always takes hundreds of milliseconds, whether the id is big or small.
I can't figure out why.
First, let's talk about the cost of Enum.values. It is implemented here:
https://github.com/scala/scala/blob/0b47dc2f28c997aed86d6f615da00f48913dd46c/src/library/scala/Enumeration.scala#L83
The implementation essentially sets up a mutable map. Once it is set up, it is reused.
The cost for big numbers in your Value arises because the Scala library internally uses a BitSet:
https://github.com/scala/scala/blob/0b47dc2f28c997aed86d6f615da00f48913dd46c/src/library/scala/Enumeration.scala#L245
So, for larger numbers, the BitSet will be bigger. That only happens when you call Enum.values.
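The effect of a large id on a BitSet can be seen in isolation. This sketch only inspects scala.collection.immutable.BitSet directly (it does not reproduce Enumeration's internals; the object and method names are illustrative). A bit set keeps one bit per possible value up to its largest element, so a single large element forces millions of 64-bit words:

```scala
import scala.collection.immutable.BitSet

object BitSetDemo {
  // Number of 64-bit words backing a set that holds just one element
  def wordCount(element: Int): Int = BitSet(element).toBitMask.length

  val small: Int = wordCount(2)         // a single word is enough for this
  val large: Int = wordCount(222222222) // needs at least 222222222 / 64 + 1 words
}
```

Anything that walks the whole set then has to scan all of those words, which is consistent with the slowdown described in the question.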
Depending on your specific use case, you can choose between Enumeration and case objects:
Case objects vs Enumerations in Scala
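A minimal sketch of the case-object alternative, with hypothetical names (Code, Codes). The ids live in an ordinary Set[Int], so a lookup does not depend on how large the id is:

```scala
sealed abstract class Code(val id: Int)

object Codes {
  case object A extends Code(2)
  case object B extends Code(222222222)

  // Listed by hand: a sealed hierarchy does not enumerate its members for you
  val values: Set[Code] = Set(A, B)

  // Computed once, then reused by every lookup
  private val ids: Set[Int] = values.map(_.id)

  def contains(id: Int): Boolean = ids.contains(id)
}
```

The trade-off is that the list of values has to be maintained manually, which Enumeration does for you.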
It sure looks like the mechanics of Enumeration don't handle large ints well in that position. The Scaladocs for the class don't say anything about this, but they don't advertise using Enumeration.Value the way you do either. They say, e.g., val A = Value, where you say val A = Value(2000, "A").
If you want to keep your contains method as you have it, why don't you cache the Enum.values.map(_.id)? Much faster.
object mult extends App {
object Enum extends Enumeration {
type Enum = Value
val A1 = Value(1, "A")
val A2 = Value(2, "A")
val A222 = Enum.Value(222222222, "A")
def contains(value: Int): Boolean = {
Enum.values.map(_.id).contains(value)
}
val cache = Enum.values.map(_.id)
def contains2(value: Int): Boolean = {
cache.contains(value)
}
}
def clockit(desc: String, f: => Unit) = {
val start = System.currentTimeMillis
f
val end = System.currentTimeMillis
println(s"$desc ${end - start}")
}
clockit("initialize Enum ", Enum.A1)
clockit("contains 2 ", Enum.contains(2))
clockit("contains 222222222 ", Enum.contains(222222222))
clockit("contains 222222222 ", Enum.contains(222222222))
clockit("contains2 2 ", Enum.contains2(2))
clockit("contains2 222222222", Enum.contains2(222222222))
}
I am trying to use the should matchers on a case class
case class ListOfByteArrayCaseConfig(
@BeanProperty
permissions: java.util.List[Array[Byte]]
)
With the following test case
val orig = ListOfByteArrayCaseConfig(List(Array[Byte](10, 20, 30)))
val orig2 = ListOfByteArrayCaseConfig(List(Array[Byte](10, 20, 30)))
orig2 should be === orig
Obviously this fails because the two byte arrays are compared by reference, not by content. What I want is to somehow make this work without changing the test case code, while still keeping the case class.
Is it even possible? (like adding a custom equals method to the case class?)
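The underlying behaviour is easy to reproduce in isolation: == on arrays compares references, while sameElements and java.util.Arrays.equals compare contents. A small sketch (the object and value names are illustrative):

```scala
object ArrayEqualityDemo {
  val a: Array[Byte] = Array[Byte](10, 20, 30)
  val b: Array[Byte] = Array[Byte](10, 20, 30)

  val byReference: Boolean = a == b                       // false: two distinct array objects
  val byContent: Boolean = a.sameElements(b)              // true: same bytes in the same order
  val byJavaUtil: Boolean = java.util.Arrays.equals(a, b) // true as well
}
```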
I found the solution. Apparently I can override the equals method in a case class:
Scala: Ignore case class field for equals/hascode?
It does somewhat defeat the reason for using case classes in the first place, which is to simplify data objects.
case class ListOfByteArrayCaseConfig(
@BeanProperty
permissions: java.util.List[Array[Byte]]
) {
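  override def equals(arg: Any): Boolean = arg match {
    case that: ListOfByteArrayCaseConfig =>
      // Same number of arrays, and each pair of arrays has the same content
      permissions.size() == that.permissions.size() &&
        (0 until permissions.size()).forall { i =>
          java.util.Arrays.equals(permissions.get(i), that.permissions.get(i))
        }
    // Anything that is not a ListOfByteArrayCaseConfig is never equal
    case _ => false
  }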
}
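One caveat with overriding equals alone: hash-based collections also rely on hashCode, and objects that are equal should produce the same hash. A sketch of a content-based pair, using a hypothetical class name (ByteArrayListConfig) and omitting the bean annotation:

```scala
import java.util.{Arrays => JArrays, List => JList}

final case class ByteArrayListConfig(permissions: JList[Array[Byte]]) {
  // Equal when both lists hold the same number of arrays with the same bytes
  override def equals(other: Any): Boolean = other match {
    case that: ByteArrayListConfig =>
      permissions.size() == that.permissions.size() &&
        (0 until permissions.size()).forall { i =>
          JArrays.equals(permissions.get(i), that.permissions.get(i))
        }
    case _ => false
  }

  // Combine the content hashes of the arrays so that equal configs hash alike
  override def hashCode(): Int =
    (0 until permissions.size()).foldLeft(1) { (acc, i) =>
      31 * acc + JArrays.hashCode(permissions.get(i))
    }
}
```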
For those who don't know what a 5-card Poker Straight is: http://en.wikipedia.org/wiki/List_of_poker_hands#Straight
I'm writing a small Poker simulator in Scala to help me learn the language, and I've created a Hand class with 5 ordered Cards in it. Each Card has a Rank and Suit, both defined as Enumerations. The Hand class has methods to evaluate the hand rank, and one of them checks whether the hand contains a Straight (we can ignore Straight Flushes for the moment). I know there are a few nice algorithms for determining a Straight, but I wanted to see whether I could design something with Scala's pattern matching, so I came up with the following:
def isStraight() = {
def matchesStraight(ranks: List[Rank.Value]): Boolean = ranks match {
case head :: Nil => true
case head :: tail if (Rank(head.id + 1) == tail.head) => matchesStraight(tail)
case _ => false
}
matchesStraight(cards.map(_.rank).toList)
}
That works fine and is fairly readable, but I was wondering if there is any way to get rid of that if. I'd imagine something like the following, though I can't get it to work:
private def isStraight() = {
def matchesStraight(ranks: List[Rank.Value]): Boolean = ranks match {
case head :: Nil => true
case head :: next(head.id + 1) :: tail => matchesStraight(next :: tail)
case _ => false
}
matchesStraight(cards.map(_.rank).toList)
}
Any ideas? Also, as a side question, what is the general opinion on the inner matchesStraight definition? Should this rather be private or perhaps done in a different way?
You can't pass information to an extractor, and you can't use one extracted value inside another pattern. The only place to relate them is the if guard, which is there to cover exactly these cases.
What you can do is create your own extractors to test these things, but it won't gain you much if there isn't any reuse.
For example:
class SeqExtractor[A](f: A => Int) {
  // Matches when the mapped values are consecutive (each one exceeds the previous by 1)
  def unapply(s: Seq[A]): Option[Seq[A]] =
    if (s.map(f).sliding(2).forall { case Seq(a, b) => b == a + 1; case _ => true }) Some(s)
    else None
}
val Straight = new SeqExtractor((_: Card).rank.id)
Then you can use it like this:
listOfCards match {
  case Straight(cards) => true
  case _ => false
}
But, of course, all that you really want is that if expression inside SeqExtractor. So don't fall too much in love with a solution, as you may miss simpler ways of doing things.
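For comparison, a guard-based version can at least avoid tail.head by binding the second element in the pattern itself. A sketch on plain Int ranks (the object and method names are illustrative; an empty or single-element list counts as consecutive here):

```scala
object StraightCheck {
  @annotation.tailrec
  def isConsecutive(ranks: List[Int]): Boolean = ranks match {
    case Nil | _ :: Nil => true
    // Bind the first two elements; the guard relates them, the pattern alone cannot
    case a :: (rest @ (b :: _)) if b == a + 1 => isConsecutive(rest)
    case _ => false
  }
}
```

The guard is still there, which matches the point above: relating two bound values needs an if.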
You could do something like:
val ids = ranks.map(_.id)
ids.max - ids.min == 4 && ids.distinct.length == 5
Handling aces correctly requires a bit of work, though.
Update: Here's a much better solution:
(ids zip ids.tail).forall{case (p,q) => q%13==(p+1)%13}
The % 13 in the comparison handles aces being both rank 1 and rank 14.
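A quick check of the modulo comparison on plain Int ids. This sketch assumes the ids are already ordered so that neighbouring cards sit next to each other, and that the list is non-empty (ids.tail fails on an empty list); the object name is illustrative:

```scala
object ModuloStraight {
  // Each id must follow its predecessor; comparing modulo 13 lets a king (13)
  // be followed by an ace written either as 14 or as 1
  def isStraight(ids: List[Int]): Boolean =
    (ids zip ids.tail).forall { case (p, q) => q % 13 == (p + 1) % 13 }
}
```

Note that the same wrap-around also accepts a sequence such as 12, 13, 1, 2, 3, which poker does not treat as a straight, so a real evaluator needs an extra check for that case.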
How about something like:
def isStraight(cards:List[Card]) = (cards zip cards.tail) forall { case (c1,c2) => c1.rank+1 == c2.rank}
val cards = List(Card(1),Card(2),Card(3),Card(4))
scala> isStraight(cards)
res2: Boolean = true
This is a completely different approach, but it does use pattern matching. It produces warnings in the match clause which seem to indicate that it shouldn't work, but it actually produces the correct results:
Straight !!! 34567
Straight !!! 34567
Sorry no straight this time
I ignored the suits for now, and I also ignored the possibility of an ace below a 2.
abstract class Rank {
def value : Int
}
case class Next[A <: Rank](a : A) extends Rank {
def value = a.value + 1
}
case class Two() extends Rank {
def value = 2
}
class Hand(a : Rank, b : Rank, c : Rank, d : Rank, e : Rank) {
val cards = List(a, b, c, d, e).sortWith(_.value < _.value)
}
object Hand{
def unapply(h : Hand) : Option[(Rank, Rank, Rank, Rank, Rank)] = Some((h.cards(0), h.cards(1), h.cards(2), h.cards(3), h.cards(4)))
}
object Poker {
val two = Two()
val three = Next(two)
val four = Next(three)
val five = Next(four)
val six = Next(five)
val seven = Next(six)
val eight = Next(seven)
val nine = Next(eight)
val ten = Next(nine)
val jack = Next(ten)
val queen = Next(jack)
val king = Next(queen)
val ace = Next(king)
def main(args : Array[String]): Unit = {
val simpleStraight = new Hand(three, four, five, six, seven)
val unsortedStraight = new Hand(four, seven, three, six, five)
val notStraight = new Hand (two, two, five, five, ace)
printIfStraight(simpleStraight)
printIfStraight(unsortedStraight)
printIfStraight(notStraight)
}
def printIfStraight[A](h : Hand): Unit = {
h match {
case Hand(a: A , b : Next[A], c : Next[Next[A]], d : Next[Next[Next[A]]], e : Next[Next[Next[Next[A]]]]) => println("Straight !!! " + a.value + b.value + c.value + d.value + e.value)
case Hand(a,b,c,d,e) => println("Sorry no straight this time")
}
}
}
If you are interested in more stuff like this, search for 'church numerals scala type system'.
How about something like this?
def isStraight = {
cards.map(_.rank).toList match {
case first :: second :: third :: fourth :: fifth :: Nil if
first.id == second.id - 1 &&
second.id == third.id - 1 &&
third.id == fourth.id - 1 &&
fourth.id == fifth.id - 1 => true
case _ => false
}
}
You're still stuck with the if (which is in fact larger), but there's no recursion and there are no custom extractors (I believe you're using next incorrectly as an extractor, which is why your second attempt doesn't work).
If you're writing a poker program, you are already checking for n-of-a-kind. A hand is a straight when it has no n-of-a-kind (n > 1) and the difference between the minimum and the maximum denomination is exactly four.
I was doing something like this a few days ago, for Project Euler problem 54. Like you, I had Rank and Suit as enumerations.
My Card class looks like this:
case class Card(rank: Rank.Value, suit: Suit.Value) extends Ordered[Card] {
def compare(that: Card) = that.rank compare this.rank
}
Note I gave it the Ordered trait so that we can easily compare cards later. Also, when parsing the hands, I sorted them from high to low using sorted, which makes assessing values much easier.
Here is my straight test which returns an Option value depending on whether it's a straight or not. The actual return value (a list of Ints) is used to determine the strength of the hand, the first representing the hand type from 0 (no pair) to 9 (straight flush), and the others being the ranks of any other cards in the hand that count towards its value. For straights, we're only worried about the highest ranking card.
Also, note that you can make a straight with Ace as low, the "wheel", or A2345.
case class Hand(cards: Array[Card]) {
...
def straight: Option[List[Int]] = {
// Cards are sorted high to low, so each rank id must be exactly one above the next
if( cards.sliding(2).forall { case Array(x, y) => x.rank.id - y.rank.id == 1 } )
Some(5 :: cards(0).rank.id :: 0 :: 0 :: 0 :: 0 :: Nil)
else if ( cards.map(_.rank.id).toList == List(12, 3, 2, 1, 0) )
Some(5 :: cards(1).rank.id :: 0 :: 0 :: 0 :: 0 :: Nil)
else None
}
}
Here is a complete idiomatic Scala hand classifier for all hands (handles 5-high straights):
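case class Card(rank: Int, suit: Int) { override def toString = s"${Card.ranks(rank)}${Card.suits(suit)}" }
object Card {
  val ranks = "23456789TJQKA"
  val suits = "♣♠♦♥"
  // Assumption: cards are ordered by rank, which hand.max and sorted below rely on
  implicit val ordering: Ordering[Card] = Ordering.by(_.rank)
}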
object HandType extends Enumeration {
val HighCard, OnePair, TwoPair, ThreeOfAKind, Straight, Flush, FullHouse, FourOfAKind, StraightFlush = Value
}
case class Hand(hand: Set[Card]) {
val (handType, sorted) = {
def rankMatches(card: Card) = hand count (_.rank == card.rank)
val groups = hand groupBy rankMatches mapValues {_.toList.sorted}
val isFlush = (hand groupBy {_.suit}).size == 1
val isWheel = "A2345" forall {r => hand exists (_.rank == Card.ranks.indexOf(r))} // A,2,3,4,5 straight
val isStraight = groups.size == 1 && (hand.max.rank - hand.min.rank) == 4 || isWheel
val (isThreeOfAKind, isOnePair) = (groups contains 3, groups contains 2)
val handType = if (isStraight && isFlush) HandType.StraightFlush
else if (groups contains 4) HandType.FourOfAKind
else if (isThreeOfAKind && isOnePair) HandType.FullHouse
else if (isFlush) HandType.Flush
else if (isStraight) HandType.Straight
else if (isThreeOfAKind) HandType.ThreeOfAKind
else if (isOnePair && groups(2).size == 4) HandType.TwoPair
else if (isOnePair) HandType.OnePair
else HandType.HighCard
val kickers = ((1 until 5) flatMap groups.get).flatten.reverse
require(hand.size == 5 && kickers.size == 5)
(handType, if (isWheel) (kickers takeRight 4) :+ kickers.head else kickers)
}
}
object Hand {
import scala.math.Ordering.Implicits._
implicit val rankOrdering = Ordering by {hand: Hand => (hand.handType, hand.sorted)}
}