I am trying to figure out a way to find out longest prefix of a string which matches a certain condition.
It is trivial to write a imperative method which does exactly that, but I am new to scala and wondering if there is an idiomatic scala (or functional) way to do that.
Keep iterating through longest prefixes (mystring.take(mystring.size),..mystring.take(1)
Apply a predicate on each of those substrings
If a prefix matches a predicate, break the loop and return the longest prefix.
For example, this:
def predicate(str: String): Boolean = ???
val longest_matching: Option[String] =
Iterator(mystring.size, 0, -1) // iterator of lengths
.map(mystring.take) // take string prefix
.find(predicate) // find first matching entry
longest_matching.fold {
println("No matching prefix")
} { prefix =>
println("Longest matching prefix: " + prefix)
}
You can take full advantage of the Scala standard library by using inits:
val longest_matching = mystring.inits.find(predicate)
Related
I'm trying to improve the algorithm.
Now it works for O(n) and iterates through all the elements of the set. Always. My attempts to achieve incomplete O(n) lead to the introduction of var variables. It would be great to do without var.
Task:
We need a class that implements a list of company names by substring - from the list of all available names, output a certain number of companies
that start with the entered line.
It is assumed that the class will be called when filling out a form on a website/mobile application with a high RPS (Requests per second).
My solution:
class SuggestService(companyNames : Seq[String]) {
def suggest(input: String, numberOfSuggest : Int) : Seq[String] = {
val resultCompanyNames =
for {
name <- companyNames if input.equals(name.take(input.length))
} yield name
resultCompanyNames.take(numberOfSuggest)
} //TODO: My code
}
Scastie: https://scastie.scala-lang.org/mIC5ZTGwRyKnuJAbhM1VlA
The solution proposed by #jwvh in the comments:
companyNames.view.filter(_.startsWith(input)).take(numberOfSuggest).toSeq
is good when you have several dozen company names, but in the worst case you'll have to check every single company name. If you have thousands of company names and many requests per second, this has a chance to become a serious bottleneck.
A better approach might be to sort the company names and use binary search to find the first potential result in O(L log N), where L is the average length of a company name:
import scala.collection.imuutable.ArraySeq // in Scala 2.13
class SuggestService(companyNames: Seq[String]) {
// in Scala 2.12 use companyNames.toIndexedSeq.sorted
private val sortedNames = companyNames.to(ArraySeq).sorted
#annotation.tailrec
private def binarySearch(input: String, from: Int = 0, to: Int = sortedNames.size): Int = {
if (from == to) from
else {
val cur = (from + to) / 2
if (sortedNames(cur) < input) binarySearch(input, cur + 1, to)
else binarySearch(input, from, cur)
}
}
def suggest(input: String, numberOfSuggest: Int): Seq[String] = {
val start = binarySearch(input)
sortedNames.view
.slice(start, start + numberOfSuggest)
.takeWhile(_.startsWith(input))
.toSeq
}
}
Note that the built-in binary search in scala (sortedNames.search) returns any result, not necessarily the first one. And the built-in binary search from Java works either on Arrays (Arrays.binarySearch) or on Java collections (Collections.binarySearch). So in the code above I've provided an explicit implementation of the lower bound binary search.
If you need still better performance, you can use a trie data structure. After traversing the trie and finding the node corresponding to input in O(L) (may depend on trie implementation), you can then continue down from this node to retrieve the first numberOfSuggest results. The query time doesn't depend on N at all, so you can use this method with millions company names.
In language like python and ruby to ask the language what index-related methods its string class supports (which methods’ names contain the word “index”) you can do
“”.methods.sort.grep /index/i
And in java
List results = new ArrayList();
Method[] methods = String.class.getMethods();
for (int i = 0; i < methods.length; i++) {
Method m = methods[i];
if (m.getName().toLowerCase().indexOf(“index”) != -1) {
results.add(m.getName());
}
}
String[] names = (String[]) results.toArray();
Arrays.sort(names);
return names;
How would you do the same thing in Scala?
Curious that no one tried a more direct translation:
""
.getClass.getMethods.map(_.getName) // methods
.sorted // sort
.filter(_ matches "(?i).*index.*") // grep /index/i
So, some random thoughts.
The difference between "methods" and the hoops above is striking, but no one ever said reflection was Java's strength.
I'm hiding something about sorted above: it actually takes an implicit parameter of type Ordering. If I wanted to sort the methods themselves instead of their names, I'd have to provide it.
A grep is actually a combination of filter and matches. It's made a bit more complex because of Java's decision to match whole strings even when ^ and $ are not specified. I think it would some sense to have a grep method on Regex, which took Traversable as parameters, but...
So, here's what we could do about it:
implicit def toMethods(obj: AnyRef) = new {
def methods = obj.getClass.getMethods.map(_.getName)
}
implicit def toGrep[T <% Traversable[String]](coll: T) = new {
def grep(pattern: String) = coll filter (pattern.r.findFirstIn(_) != None)
def grep(pattern: String, flags: String) = {
val regex = ("(?"+flags+")"+pattern).r
coll filter (regex.findFirstIn(_) != None)
}
}
And now this is possible:
"".methods.sorted grep ("index", "i")
You can use the scala REPL prompt. To find list the member methods of a string object, for instance, type "". and then press the TAB key (that's an empty string - or even a non-empty one, if you like, followed by a dot and then press TAB). The REPL will list for you all member methods.
This applies to other variable types as well.
More or less the same way:
val names = classOf[String].getMethods.toSeq.
filter(_.getName.toLowerCase().indexOf(“index”) != -1).
map(_.getName).
sort(((e1, e2) => (e1 compareTo e2) < 0))
But all on one line.
To make it more readable,
val names = for(val method <- classOf[String].getMethods.toSeq
if(method.getName.toLowerCase().indexOf("index") != -1))
yield { method.getName }
val sorted = names.sort(((e1, e2) => (e1 compareTo e2) < 0))
This is as far as I got:
"".getClass.getMethods.map(_.getName).filter( _.indexOf("in")>=0)
It's strange Scala array doesn't have sort method.
edit
It would end up like.
"".getClass.getMethods.map(_.getName).toList.sort(_<_).filter(_.indexOf("index")>=0)
Now, wait a minute.
I concede Java is verbose compared to Ruby for instance.
But that piece of code shouldn't have been so verbose in first place.
Here's the equivalent :
Collection<String> mds = new TreeSet<String>();
for( Method m : "".getClass().getMethods()) {
if( m.getName().matches(".*index.*")){ mds.add( m.getName() ); }
}
Which has almost the same number of characters as the marked as correct, Scala version
Just using the Java code direct will get you most of the way there, as Scala classes are still JVM ones. You could port the code to Scala pretty easily as well, though, for fun/practice/ease of use in REPL.
I written the code below for finding even numbers and the number just before it in a RDD object. In this I first converted that to a List and tried to use my own function to find the even numbers and the numbers just before them. The following is my code. In this I have made an empty list in which I am trying to append the numbers one by one.
object EvenandOdd
{
def mydef(nums:Iterator[Int]):Iterator[Int]=
{
val mylist=nums.toList
val len= mylist.size
var elist=List()
var i:Int=0
var flag=0
while(flag!=1)
{
if(mylist(i)%2==0)
{
elist.++=List(mylist(i))
elist.++=List(mylist(i-1))
}
if(i==len-1)
{
flag=1
}
i=i+1
}
}
def main(args:Array[String])
{
val myrdd=sc.parallelize(List(1,2,3,4,5,6,7,8,9,10),2)
val myx=myrdd.mapPartitions(mydef)
myx.collect
}
}
I am not able to execute this command in Scala shell as well as in Eclipse and not able to figure out the error as I am just a beginner to Scala.
The following are the errors I got in Scala Shell.
<console>:35: error: value ++= is not a member of List[Nothing]
elist.++=List(mylist(i))
^
<console>:36: error: value ++= is not a member of List[Nothing]
elist.++=List(mylist(i-1))
^
<console>:31: error: type mismatch;
found : Unit
required: Iterator[Int]
while(flag!=1)
^
Your code looks too complicated and not functional. Also, it introduce potential problems with memory: you take Iterator as param and return Iterator as output. So, knowing that Iterator itself could be lazy and has under the hood huge amount of data, materializing it inside method with list could cause OOM. So your task is to get as much data from initial iterator as it it enough to answer two methods for new Iterator: hasNext and next
For example (based on your implementation, which outputs duplicates in case of sequence of even numbers) it could be:
def mydef(nums:Iterator[Int]): Iterator[Int] = {
var before: Option[Int] = None
val helperIterator = new Iterator[(Option[Int], Int)] {
override def hasNext: Boolean = nums.hasNext
override def next(): (Option[Int], Int) = {
val result = (before, nums.next())
before = Some(result._2)
result
}
}
helperIterator.withFilter(_._2 % 2 == 0).flatMap{
case (None, next) => Iterator(next)
case (Some(prev), next) => Iterator(prev, next)
}
}
Here you have two iterators. One helper, which just prepare data, providing previous element for each next. And next on - resulting, based on helper, which filter only even for sequence elements (second in pair), and output both when required (or just one, if first element in sequence is even)
For initial code
Additionally to answer of #pedrorijo91, in initial code you do did not also return anything (suppose you wanted to convert elist to Iterator)
It will be easier if you use a functional coding style rather than an iterative coding style. In functional style the basic operation is straightforward.
Given a list of numbers, the following code will find all the even numbers and the values that precede them:
nums.sliding(2,1).filter(_(1) % 2 == 0)
The sliding operation creates a list containing all possible pairs of adjacent values in the original list.
The filter operation takes only those pairs where the second value is even.
The result is an Iterator[List[Int]] where each List[Int] has two elements. You should be able to use this in your RDD framework.
It's marked part of the developer API, so there's no guarantee it'll stick around, but the RDDFunctions object actually defines sliding for RDDs. You will have to make sure it sees elements in the order you want.
But this becomes something like
rdd.sliding(2).filter(x => x(1) % 2 == 0) # pairs of (preceding number, even number)
for the first 2 errors:
there's no ++= operator on Lists. You will have to do list = list ++ element
In some book I've got a code similar to this:
object ValVarsSamples extends App {
val pattern = "([ 0-9] +) ([ A-Za-z] +)". r // RegEx
val pattern( count, fruit) = "100 Bananas"
}
This is supposed to be a trick, it should like defining same names for two vals, but it is not.
So, this fails with an exception.
The question: what this might be about? (what's that supposed to be?) and why it does not work?
--
As I understand first: val pattern - refers to RegEx constructor function.. And in second val we are trying to pass the params using such a syntax? just putting a string
This is an extractor:
val pattern( count, fruit) = "100 Bananas"
This code is equivalent
val res = pattern.unapplySeq("100 Bananas")
count = res.get(0)
fruit = res.get(1)
The problem is your regex doesn't match, you should change it to:
val pattern = "([ 0-9]+) ([ A-Za-z]+)". r
The space before + in [ A-Za-z] + means you are matching a single character in the class [ A-Za-z] and then at least one space character. You have the same issue with [ 0-9] +.
Scala regexes define an extractor, which returns a sequence of matching groups in the regular expression. Your regex defines two groups so if the match succeeds the sequence will contain two elements.
Scala has a language feature to support disjunctions in pattern matching ('Pattern Alternatives'):
x match {
case _: String | _: Int =>
case _ =>
}
However, I often need to trigger an action if the scrutiny satisfies PatternA and PatternB (conjunction.)
I created a pattern combinator '&&' that adds this capability. Three little lines that remind me why I love Scala!
// Splitter to apply two pattern matches on the same scrutiny.
object && {
def unapply[A](a: A) = Some((a, a))
}
// Extractor object matching first character.
object StartsWith {
def unapply(s: String) = s.headOption
}
// Extractor object matching last character.
object EndsWith {
def unapply(s: String) = s.reverse.headOption
}
// Extractor object matching length.
object Length {
def unapply(s: String) = Some(s.length)
}
"foo" match {
case StartsWith('f') && EndsWith('f') => "f.*f"
case StartsWith('f') && EndsWith(e) && Length(3) if "aeiou".contains(e) => "f..[aeiou]"
case _ => "_"
}
Points for discussion
Is there an existing way to do this?
Are there problems with this approach?
Can this approach create any other useful combinators? (for example, Not)
Should such a combinator be added to the standard library?
UPDATE
I've just been asked how the compiler interprets case A && B && C. These are infix operator patterns (Section 8.1.9 of the Scala Reference). You could also express this with standard extract patterns (8.1.7) as &&(&&(A, B), C).' Notice how the expressions are associated left to right, as per normal infix operator method calls likeBoolean#&&inval b = true && false && true`.
I really like this trick. I do not know of any existing way to do this, and I don't foresee any problem with it -- which doesn't mean much, though. I can't think of any way to create a Not.
As for adding it to the standard library... perhaps. But I think it's a bit hard. On the other hand, how about talking Scalaz people into including it? It looks much more like their own bailiwick.
A possible problem with this is the bloated translation that the pattern matcher generates.
Here is the translation of the sample program, generated with scalac -print. Even -optimise fails to simplify the if (true) "_" else throw new MatchError() expressions.
Large pattern matches already generate more bytecode than is legal for a single method, and use of this combinator may amplify that problem.
If && was built into the language, perhaps the translation could be smarter. Alternatively, small improvements to -optimise could help.