How can I perform the Scala operation below, which finds the most frequent character in a string, in Java 8?
val tst = "Scala is awesomestttttts"
val op = tst.foldLeft(Map[Char,Int]())((a,b) => {
a+(b -> ((a.getOrElse(b, 0))+1))
}).maxBy(f => f._2)
Here the output is
(Char, Int) = (t,6)
I was able to get a stream of characters in Java 8 like this:
Stream<Character> sch = tst.chars().mapToObj(i -> (char)i);
but I can't figure out what the alternative to fold/foldLeft/foldRight is in Java 8.
Can someone please help?
Something like this seems to match with the Scala code you provided (if I understand it correctly):
String tst = "Java is awesomestttttts";
Optional<Map.Entry<Character, Long>> max =
tst.chars()
.mapToObj(i -> (char) i)
.collect(Collectors.groupingBy(Function.identity(),
Collectors.counting()))
.entrySet()
.stream()
.max(Comparator.comparing(Map.Entry::getValue));
System.out.println(max.orElse(null));
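If you want something closer in spirit to the foldLeft in the question, the three-argument collect (a mutable reduction) is probably the nearest thing in Java 8. Here is a minimal sketch of that idea; the class name MostFrequentChar and the method name mostFrequent are just names I made up for illustration:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class MostFrequentChar {

    // Hypothetical helper: counts each character with a fold-like mutable
    // reduction, then picks the entry with the highest count.
    static Optional<Map.Entry<Character, Integer>> mostFrequent(String s) {
        HashMap<Character, Integer> counts = s.chars()
            .mapToObj(i -> (char) i)
            .collect(
                HashMap<Character, Integer>::new,           // the empty accumulator, like Map[Char,Int]()
                (m, c) -> m.merge(c, 1, Integer::sum),       // the fold step: bump the count for c
                (m1, m2) -> m2.forEach(                      // combiner, only used for parallel streams
                    (k, v) -> m1.merge(k, v, Integer::sum)));

        // Equivalent of maxBy(f => f._2); empty when the input string is empty.
        return counts.entrySet().stream().max(Map.Entry.comparingByValue());
    }

    public static void main(String[] args) {
        mostFrequent("Scala is awesomestttttts").ifPresent(System.out::println);
    }
}
```

Unlike the Scala fold, this mutates one map instead of building a new immutable map at each step, which is the usual trade-off with collect. If two characters tie for the highest count, which one is returned is not specified here.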
If you don't mind using a third-party library, Eclipse Collections has a Bag type that can keep track of the character counts. I've provided two examples below that use Bags. Unfortunately, there is no maxByOccurrences available today on Bag, but the same result can be achieved by using topOccurrences(1), which is available. You can also use forEachWithOccurrences to find the max, but it will be a little more code.
The following example uses a CharAdapter, which is also included in Eclipse Collections.
MutableBag<Character> characters =
CharAdapter.adapt("Scala is awesomestttttts")
.collect(Character::toLowerCase)
.toBag();
MutableList<ObjectIntPair<Character>> charIntPairs = characters.topOccurrences(2);
Assert.assertEquals(
PrimitiveTuples.pair(Character.valueOf('t'), 6), charIntPairs.get(0));
Assert.assertEquals(
PrimitiveTuples.pair(Character.valueOf('s'), 5), charIntPairs.get(1));
The second example uses the chars() method available on String which returns an IntStream. It feels a bit awkward that something called chars() does not return a CharStream, but this is because CharStream is not available in JDK 8.
MutableBag<Character> characters =
"Scala is awesomestttttts"
.toLowerCase()
.chars()
.mapToObj(i -> (char) i)
.collect(Collectors.toCollection(Bags.mutable::empty));
MutableList<ObjectIntPair<Character>> charIntPairs = characters.topOccurrences(2);
Assert.assertEquals(
PrimitiveTuples.pair(Character.valueOf('t'), 6), charIntPairs.get(0));
Assert.assertEquals(
PrimitiveTuples.pair(Character.valueOf('s'), 5), charIntPairs.get(1));
In both examples, I converted the characters to lowercase first, so there are 5 occurrences of 's'. If you want uppercase and lowercase letters to be distinct then just drop the lowercase code in both examples.
Note: I am a committer for Eclipse Collections.
Here is an example using the Stream API in abacus-common:
String str = "Scala is awesomestttttts";
CharStream.from(str).boxed().groupBy(t -> t, Collectors.counting())
.max(Comparator.comparing(Map.Entry::getValue)).get();
But I think the simplest way is to use a Multiset:
CharStream.from(str).toMultiset().maxOccurrences().get();
Related
I am new to Scala so feel free to point me in the direction of documentation but I was not able to find an answer to this question in my research.
I am using Scala 2.11.8 with Spark 2.2 and am trying to create a dynamic string containing dateString1_dateString2 (with an underscore) using interpolation, but I am having some issues.
val startDt = "20180405"
val endDt = "20180505"
This seems to work:
s"$startDt$endDt"
res62: String = 2018040520180505
But this fails:
s"$startDt_$endDt"
<console>:27: error: not found: value startDt_
s"$startDt_$endDt"
^
I expected this simple workaround with escapes to work, but it does not produce the desired result:
s"$startDt\\_$endDt"
res2: String = 20180405\_20180505
Note that this question differs from Why can't _ be used inside of string interpolation? in that this question is looking to find a workable string interpolation solution while the previous question is much more internals-of-scala focused.
You can be explicit using curly braces:
# s"${startDt}_${endDt}"
res11: String = "20180405_20180505"
Your code:
s"$startDt_$endDt"
fails because startDt_ is a valid identifier, so Scala tries to interpolate that nonexistent variable.
Imagine that I wanted to take the characters from a string in Scala but have the toInt conversion behave as it would on a string instead of as it does on a character.
To illustrate, the following code behaves like so:
"0".toInt // results in 0
"000".charAt(0).toInt // results in 48
I'd like a version of the second line that would also result in 0. I have a solution like the following:
"000".charAt(0).toString.toInt // results in 0
But I wonder if there is a more direct or better way?
You can use asDigit:
val i: Int = "000".charAt(0).asDigit
You can do:
"000".substring(0, 1).toInt
But I'm not sure it's more "direct" than "000".charAt(0).toString.toInt
Hi guys, I am parsing an unstructured file for some keywords, but I can't seem to easily find the line numbers of the results I am getting.
val filePath:String = "myfile"
val myfile = sc.textFile(filePath);
var ora_temp = myfile.filter(line => line.contains("MyPattern")).collect
ora_temp.length
However, I not only want to find the lines that contain MyPattern; I want something more like a tuple (MyPattern line, line number).
Thanks in advance,
You can use zipWithIndex, as eliasah pointed out in a comment (probably the most succinct way to do this is with the direct tuple accessor syntax), or use pattern matching in the filter, like so:
val matchingLineAndLineNumberTuples = sc.textFile("myfile").zipWithIndex().filter({
case (line, lineNumber) => line.contains("MyPattern")
}).collect
This is driving me nuts... there must be a way to strip out all non-digit characters (or perform other simple filtering) in a String.
Example: I want to turn a phone number ("+72 (93) 2342-7772" or "+1 310-777-2341") into a simple numeric String (not an Int), such as "729323427772" or "13107772341".
I tried "[\\d]+".r.findAllIn(phoneNumber), which returns an iterator (a MatchIterator), and then I would have to recombine the matches into a String somehow... seems horribly wasteful.
I also came up with: phoneNumber.filter("0123456789".contains(_)) but that becomes tedious for other situations. For instance, removing all punctuation... I'm really after something that works with a regular expression so it has wider application than just filtering out digits.
Anyone have a fancy Scala one-liner for this that is more direct?
You can use filter, treating the string as a character sequence and testing the character with isDigit:
"+72 (93) 2342-7772".filter(_.isDigit) // res0: String = 729323427772
You can use replaceAll with a regular expression.
"+72 (93) 2342-7772".replaceAll("[^0-9]", "") // res1: String = 729323427772
Another approach is to define the collection of valid characters, in this case
val d = '0' to '9'
and then, for val a = "+72 (93) 2342-7772", filter on collection inclusion, for instance with any of these:
for (c <- a if d.contains(c)) yield c
a.filter(d.contains)
a.collect{ case c if d.contains(c) => c }
I have the following code:
object testLines extends App {
val items = Array("""a-b-c d-e-f""","""a-b-c th-i-t""")
val lines = items.map(_.replaceAll("-", "")split("\t"))
print(lines.map(_.mkString(",")).mkString("\n"))
}
By mistake I did not put a dot between replaceAll and split, but it worked.
By contrast, when I put a dot between replaceAll and split, I got an error:
identifier expected but ';' found.
Implicit conversions found: items =>
What is going on?
Why does it work without a dot but not with a dot?
Update:
It also works with a dot. The error message is a bug in the Scala IDE. The first part of the question is still valid.
Thanks,
David
You have just discovered that operators are methods. x.split(y) can also be written x split y in cases where the method is operator-like and it looks nicer. However, there is nothing stopping you from putting either side in parentheses, like x split (y), (x) split y, or even (x) split (y). That may be necessary (and is a good idea for readability even if not strictly necessary) if you are passing in a more complex expression than a simple variable or constant and need parentheses to override the precedence.
With the example code you've written, it's not a bad idea to do the whole thing in operator style for clarity, using parentheses only where the syntax requires and/or they make groupings more obvious. I'd probably have written it more like this:
object testLines extends App {
val items = Array("a-b-c d-e-f", "a-b-c th-i-t")
val lines = items map (_ replaceAll ("-", "") split "\t")
print(lines map (_ mkString ",") mkString "\n")
}