Scala: Splitting a line and counting the number of words - scala

I am new to Scala and still learning it.
val pair=("99","ABC",88)
pair.toString().split(",").foreach { x => println(x)}
This prints the split pieces. But how do I count the number of pieces?
I am trying the following:
pair.toString().split(",").count { x => ??? }
I am not sure how to get the count of the split pieces, i.e. 3.
Any help appreciated.

Tuples are equipped with product functions such as productElement, productPrefix, productArity and productIterator for processing their elements.
Note that
pair.productArity
res0: Int = 3
and that
pair.productIterator foreach println
99
ABC
88

pair.toString().split(",").size will give you the number of elements. OTOH, you have a Tuple3, so its size will only ever be three. Asking for a size function on a tuple is rather redundant, since a tuple's size is fixed by its type.
Plus, if any of the elements contain a comma, your function will break.
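A minimal sketch contrasting the two approaches. The tuple `tricky` is a hypothetical value added here only to illustrate the comma problem; it is not from the question.

```scala
object TupleSize {
  val pair = ("99", "ABC", 88)

  // The arity is fixed by the type Tuple3, so no string handling is needed.
  val arity: Int = pair.productArity

  // toString yields "(99,ABC,88)", so splitting also gives three pieces here,
  // but the first and last pieces keep the surrounding parentheses.
  val pieces: Array[String] = pair.toString.split(",")

  // Hypothetical tuple whose first element contains a comma:
  // the split now reports four pieces for a three-element tuple.
  val tricky = ("9,9", "ABC", 88)
  val trickyPieces: Int = tricky.toString.split(",").length
}
```

Here `productArity` stays at 3 for both tuples, while the split-based count depends on the contents of the elements.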

Related

Scala - Not enough arguments for method count

I am fairly new to Scala and Spark RDD programming. The dataset I am working with is a CSV file containing a list of movies (one row per movie) and their associated user ratings (a comma-delimited list of ratings). Each column in the CSV represents a distinct user and the rating he/she gave the movie. Thus, user 1's ratings for each movie are in the 2nd column from the left:
Sample Input:
Spiderman,1,2,,3,3
Dr.Sleep, 4,4,,,1
I am getting the following error:
Task4.scala:18: error: not enough arguments for method count: (p: ((Int, Int)) => Boolean)Int.
Unspecified value parameter p.
var moviePairCounts = movieRatings.reduce((movieRating1, movieRating2) => (movieRating1, movieRating2, movieRating1._2.intersect(movieRating2._2).count()
when I execute the few lines below. For the program below, the second line of code splits all values delimited by "," and produces this:
( Spiderman, [[1,0],[2,1],[-1,2],[3,3],[3,4]] )
( Dr.Sleep, [[4,0],[4,1],[-1,2],[-1,3],[1,4]] )
On the third line, taking the count() throws an error. For each movie (row), I am trying to get the number of common elements. In the above example, [-1, 2] is clearly a common element shared by both Spiderman and Dr.Sleep.
val textFile = sc.textFile(args(0))
var movieRatings = textFile.map(line => line.split(","))
.map(movingRatingList => (movingRatingList(0), movingRatingList.drop(1)
.map(ranking => if (ranking.isEmpty) -1 else ranking.toInt).zipWithIndex));
var moviePairCounts = movieRatings.reduce((movieRating1, movieRating2) => (movieRating1, movieRating2, movieRating1._2.intersect(movieRating2._2).count() )).saveAsTextFile(args(1));
My target output of line 3 is as follows:
( Spiderman, Dr.Sleep, 1 ) --> Between these 2 movies, there is 1 common entry.
Can somebody please advise ?
To get the number of elements in a collection, use length or size. count takes a predicate and returns the number of elements that satisfy it.
Or you could avoid building the complete intersection by using count to count the elements of the first collection which the second contains:
movieRating1._2.count(movieRating2._2.contains(_))
The error message seems pretty clear: count takes one argument, but in your call, you are passing an empty argument list, i.e. zero arguments. You need to pass one argument to count.
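As a rough illustration of the difference, here is a sketch on plain Scala arrays rather than Spark RDDs. The two arrays reproduce the sample rows from the question; the object and value names are made up for this example.

```scala
object CommonRatings {
  // Ratings paired with their column index, as in the question; -1 marks a missing rating.
  val spiderman: Array[(Int, Int)] = Array((1, 0), (2, 1), (-1, 2), (3, 3), (3, 4))
  val drSleep: Array[(Int, Int)]   = Array((4, 0), (4, 1), (-1, 2), (-1, 3), (1, 4))

  // size (or length) takes no argument and counts every element of the intersection.
  val viaIntersect: Int = spiderman.intersect(drSleep).size

  // count requires a predicate and counts only the elements that satisfy it.
  val viaCount: Int = spiderman.count(drSleep.contains(_))
}
```

Both values come out as 1 for this sample, since (-1, 2) is the only shared entry. The two approaches can differ when the first collection contains duplicates, because count visits every element of the first collection.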

apply a function in printf on scala

The method printFields receives a Vector[String], which I print as follows:
def printFields(fields: Vector[String]): Unit =
{
printf(fields.map(_ => "%s").mkString("",",","\n"),fields: _*)
println(fields)
}
This gives me the following output:
39,39,35,30
Vector(39, 39, 35,30)
28,28,35,30
Vector(28, 28, 35,30)
Each number corresponds to an Id. I need to apply a function to each number that appears here in order to print the element it corresponds to. In other words, I want something like:
printf(fields.map(_ => "%s").mkString("",",","\n"),con.convI2N((fields: _*).toInt))
I tried converting the fields to an Iterator, but that gives me Strings like
39
39
35,30
The last String cannot be converted with toInt, so this is not an option.
Can someone help me?
Thank you so much.
What about converting the Vector[String] to a Vector[Int] as preliminary operation?
fields.map(_.split(',')).flatten.map(_.toInt)
This is just a hint and not the safest way: you should check that every String in your Vector is actually an Int or a sequence of comma-separated Ints.
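One possible way to add that check is sketched below. It assumes that entries which do not parse as an Int can simply be skipped; the names `SafeIds` and `toIds` are made up for this example.

```scala
import scala.util.Try

object SafeIds {
  // Split every entry on commas, trim whitespace,
  // and keep only the pieces that parse as an Int.
  def toIds(fields: Vector[String]): Vector[Int] =
    fields
      .flatMap(_.split(','))
      .map(_.trim)
      .flatMap(s => Try(s.toInt).toOption)
}
```

For the first sample row, `SafeIds.toIds(Vector("39", "39", "35,30"))` yields `Vector(39, 39, 35, 30)`, and each Int can then be passed to the lookup function from the question.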

get one random letter from each tuple then return them all as a string

I have 3 tuples in a list:
val l = List(("a","b"),("c","d"),("e","f"))
I want to choose one element from each tuple and return the resulting 3-letter word each time,
for example: fca or afd or cbf ...
How can I do this?
It should behave the same as:
echo {a,b}{c,d}{e,f}|xargs -n1|shuf -n1|sed 's/\B/\n/g'|shuf|paste -sd ''
Working with tuples can be a bit of a pain. You can't easily index them and tuples of different sizes are considered different types in the type system.
val ts = List(("a","b"),("c","d"),("e","f"))
val str = ts.map{t =>
t.productElement(util.Random.nextInt(t.productArity))
}.mkString("")
Every time I run this I get a different result: bde, acf, bdf, etc.
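Since every tuple here is a pair of Strings, a pattern match is another option that keeps the String type (productElement returns Any). The sketch below also shuffles the chosen letters, on the assumption that the order should be random too, as the question's examples such as fca suggest. The name `randomWord` and the `rng` parameter are made up for this example.

```scala
import scala.util.Random

object RandomPick {
  val ts = List(("a", "b"), ("c", "d"), ("e", "f"))

  // Pick one side of each pair, then shuffle the chosen letters.
  def randomWord(rng: Random): String = {
    val picked = ts.map { case (x, y) => if (rng.nextBoolean()) x else y }
    rng.shuffle(picked).mkString
  }
}
```

Calling `RandomPick.randomWord(new Random())` returns a three-letter word containing exactly one letter from each pair.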

Scala read multidimensional array line by line

I am trying to read a multidimensional array line by line, as shown below:
var a = Array(MAX_N)(MAX_M)
for(i <- 1 to m) {
a(i) = readLine.split(" ").map(_.toInt)
}
However, I am getting the error:
error: value update is not a member of Int
So, how can I read the array line by line?
The main problem here is actually in your first line of code.
Array(MAX_N)(MAX_M) doesn't mean what you think it means.
The first part, Array(MAX_N), means "make an array of size 1 containing MAX_N", and then (MAX_M) means "return the MAX_M'th element of that array". So for example:
scala> Array(9)(0)
res1: Int = 9
To make a two-dimensional array, use Array.ofDim. See How to create and use a multi-dimensional array in Scala?
(There are more problems in your code after the first line. Perhaps someone else will point them out.)
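A minimal sketch of both options: allocating with Array.ofDim, or building the rows directly from the input. The `lines` iterator is a hypothetical stand-in for repeated readLine calls, so the example does not depend on standard input. Note also that array indices start at 0, so a loop over 1 to m would skip the first row.

```scala
object ReadGrid {
  // Array.ofDim[Int](rows, cols) allocates a rows x cols array filled with zeros.
  val empty: Array[Array[Int]] = Array.ofDim[Int](2, 3)

  // Build the rows directly from the input lines instead of pre-allocating.
  // Each line is split on whitespace and every piece is parsed as an Int.
  def readGrid(lines: Iterator[String], rows: Int): Array[Array[Int]] =
    lines.take(rows).map(_.trim.split("\\s+").map(_.toInt)).toArray
}
```

For example, `ReadGrid.readGrid(Iterator("1 2 3", "4 5 6"), 2)` produces a 2 x 3 array of Ints.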

Map word ngrams to counts in scala

I'm trying to create a map which goes through all the ngrams in a document and counts how often they appear. Ngrams are sets of n consecutive words in a sentence (so in the last sentence, (Ngrams, are) is a 2-gram, (are, sets) is the next 2-gram, and so on). I already have code that creates a document from a file and parses it into sentences. I also have a function to count the ngrams in a sentence, ngramsInSentence, which returns Seq[Ngram].
I'm getting stuck syntactically on how to create my counts map. I am iterating through all the ngrams in the document in the for loop, but don't know how to map the ngrams to the count of how often they occur. I'm fairly new to Scala and the syntax is evading me, although I'm clear conceptually on what I need!
def getNGramCounts(document: Document, n: Int): Counts = {
for (sentence <- document.sentences; ngram <- nGramsInSentence(sentence,n))
//I need code here to map ngram -> count how many times ngram appears in document
}
The type Counts above, as well as Ngram, are defined as:
type Counts = Map[NGram, Double]
type NGram = Seq[String]
Does anyone know the syntax to map the ngrams from the for loop to a count of how often they occur? Please let me know if you'd like more details on the problem.
If I'm correctly interpreting your code, this is a fairly common task.
def getNGramCounts(document: Document, n: Int): Counts = {
val allNGrams: Seq[NGram] = for {
sentence <- document.sentences
ngram <- nGramsInSentence(sentence, n)
} yield ngram
allNGrams.groupBy(identity).mapValues(_.size.toDouble)
}
The allNGrams variable collects a list of all the NGrams appearing in the document.
You should eventually turn to Streams if the document is big and you can't hold the whole sequence in memory.
The groupBy call that follows creates a Map[NGram, List[NGram]]: it groups the values by identity (the argument to the method defines the criterion for grouping) and collects the equal values in a list.
You then only need to map each value (the List[NGram]) to its size to get how many times each NGram occurred.
I took for granted that:
NGram has the expected correct implementation of equals and hashCode
document.sentences returns a Seq[...]. If not you should expect allNGrams to be of the corresponding collection type.
UPDATED based on the comments
I wrongly assumed that the groupBy(_) would shortcut the input value. Use the identity function instead.
I converted the count to a Double
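The grouping step can be tried in isolation with the sketch below. Because Document is not shown in the question, sentences are modelled here as Seq[Seq[String]], and nGramsInSentence is a hypothetical stand-in based on a sliding window. The sketch uses map instead of mapValues, since mapValues returns a lazy view in Scala 2.13.

```scala
object NGramCounts {
  type NGram = Seq[String]
  type Counts = Map[NGram, Double]

  // Hypothetical stand-in for the question's nGramsInSentence:
  // a sliding window over the words, dropping windows shorter than n.
  def nGramsInSentence(sentence: Seq[String], n: Int): Seq[NGram] =
    sentence.sliding(n).filter(_.size == n).toSeq

  // Assumption: a document is just a sequence of tokenized sentences.
  def getNGramCounts(sentences: Seq[Seq[String]], n: Int): Counts = {
    val allNGrams: Seq[NGram] = for {
      sentence <- sentences
      ngram <- nGramsInSentence(sentence, n)
    } yield ngram
    // Group equal n-grams together, then replace each group by its size.
    allNGrams.groupBy(identity).map { case (ngram, group) => ngram -> group.size.toDouble }
  }
}
```

With the sentences `Seq("a", "b", "a", "b")` and `Seq("a", "b")` and n = 2, the 2-gram (a, b) is counted three times and (b, a) once.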
Appreciate the help - I have the correct code now using the suggestions above. The following returns the desired result:
def getNGramCounts(document: Document, n: Int): Counts = {
val allNGrams: Seq[NGram] = (for(sentence <- document.sentences;
ngram <- ngramsInSentence(sentence,n))
yield ngram)
allNGrams.groupBy(l => l).map(t => (t._1, t._2.length.toDouble))
}