Extract multiple substrings from same string efficiently - scala

I have a large dataset of URL strings containing key-value pairs, and I want to capture a list of values from that string. One example of a string is below:
"GET /no_cache/bi_page?Log=1&pg_inst=600474500174606089&pg=mdot_fyc_pnt&platform=mdot&ver=10.c110&pid=157876860906745096&rid=157876731027276387&srch_id=-2&row=7&seq=1&tot=1&tsp=1&test_name=m_control&logDomain=http%3A%2F%2Fwww.xyz.com&ref_url=http%3A%2F%2Fm.xyz.com%2F&z=44134 HTTP/1.1"
So if the values to return come from the keys "pg", "test_name", "some_other_key", ... I'd want the function to return ("mdot_fyc_pnt", "m_control", "NA") for this row.
I could just write three separate regex lines to capture each value. But some of these strings are long and I could have dozens of these values to extract instead of just three.
What's the most efficient way to extract multiple values from the same string?

Here is a simple 1-pass solution. Let me know if it's fast enough.
I'm no expert in URLs, so it might need tuning. Basically it assumes that there are no unescaped spaces, '?', '&' or '=' characters.
It can be tuned further with low-level optimizations.
def extractParams(params: List[String], from: String): Map[String, String] = {
val a = from.toCharArray
val len = a.length
import scala.annotation.tailrec
@tailrec
def extract(p: Set[String], start: Int, results: Map[String, String]): Map[String, String] = {
var paramStart = start
var nextEquals = -1
var nextAmpersand = -1
if (start == 0) { // find start of params
var i = 0
while (i < len && a(i) != '?') {
i += 1
}
if (i == len) return results
paramStart = i
}
{ // find equals
var i = paramStart
while (i < len && a(i) != '=') {
i += 1
}
if (i == len) return results
nextEquals = i
}
{ // find nextAmpersand or end
var i = nextEquals
while (i < len && !(a(i) == '&' || a(i) == ' ')) {
i += 1
}
nextAmpersand = i
}
val paramNameArr = new Array[Char](nextEquals - paramStart - 1)
System.arraycopy(a, paramStart + 1, paramNameArr, 0, nextEquals - paramStart - 1)
val paramName = new String(paramNameArr)
var newResults = results
var newP = p
if (p.contains(paramName)) { // find param value
val paramValueArr = new Array[Char](nextAmpersand - nextEquals - 1)
System.arraycopy(a, nextEquals + 1, paramValueArr, 0, nextAmpersand - nextEquals - 1)
val paramValue = new String(paramValueArr)
newResults = newResults + (paramName -> paramValue)
newP = p - (paramName)
}
if (nextAmpersand == len || a(nextAmpersand) == ' ') { // check for end
return newResults
} else {
return extract(newP, nextAmpersand, newResults)
}
}
// start at 0 so that the scan first skips ahead to the '?' that opens the query string
extract(params.toSet, 0, Map.empty)
}
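For comparison, here is a minimal sketch of a simpler (not necessarily faster) approach: split the query string once into a Map, then look each key up, falling back to "NA" as in the question. The names QueryParams and extractValues are made up for illustration, and the sketch assumes the same "GET <path>?<query> HTTP/1.1" shape with no unescaped '&', '=' or spaces inside the values.

```scala
object QueryParams {
  // Hypothetical helper: returns the values for `keys` in order, "NA" when a key is absent.
  def extractValues(keys: Seq[String], request: String): Seq[String] = {
    val qStart = request.indexOf('?')
    if (qStart < 0) keys.map(_ => "NA")
    else {
      // the query string ends at the next space (before "HTTP/1.1") or at the end of the line
      val space = request.indexOf(' ', qStart)
      val qEnd = if (space < 0) request.length else space
      // one pass over the pairs; split("=", 2) keeps any further '=' inside the value
      val pairs: Map[String, String] = request
        .substring(qStart + 1, qEnd)
        .split('&')
        .iterator
        .map(_.split("=", 2))
        .collect { case Array(k, v) => k -> v }
        .toMap
      keys.map(k => pairs.getOrElse(k, "NA"))
    }
  }
}
```

Whether this is fast enough depends on the data; it allocates more than the hand-written scan above, but the cost is still one pass over the string plus one lookup per key.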

Related

better way to write scala function with conditions

I have this function which I'm using as a UDF in spark.
def convertRecipeTimeToMinutes: String => Int =
(time: String) => {
val size = time.size
val res =
if (size == 2)
0
else {
var recipeTime = 0
val builder = new StringBuilder
val slice = time.slice(2, size)
for (i <- slice) {
if (i.isDigit) {
builder.append(i)
} else {
if (i == 'H')
recipeTime += builder.toInt * 60
else if (i == 'M')
recipeTime += builder.toInt
builder.clear
}
}
recipeTime
}
res
}
It converts data into time in minutes.
Sample Input Data
xx25M
xx1H
xx1H30M
xx
Sample Output Data
25
60
90
0
It does the required job, but I'd like to learn whether there is a better way to write this: pattern matching, a partial function, or anything else?
You can use a regular expression to extract the hours and minutes from the string:
def convertRecipeTimeToMinutes: String => Int = { time =>
val Time = """\D*(?:(\d+)H)?(?:(\d+)M)?""".r
time match {
case Time(hours, minutes) =>
Option(hours).fold(0)(_.toInt * 60) + Option(minutes).fold(0)(_.toInt)
}
}
Check https://regex101.com/r/vFkY9G/1 to see how this regular expression works.
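One thing to keep in mind: a Scala Regex extractor has to match the whole input, so a string that does not fit the pattern would throw a MatchError. A minimal sketch of the same idea with a fallback case follows; treating unparseable input as 0 minutes is an assumption made here, not something the question specifies, and the names RecipeTime and toMinutes are illustrative.

```scala
object RecipeTime {
  // same pattern as above: optional prefix, optional hours, optional minutes
  private val Time = """\D*(?:(\d+)H)?(?:(\d+)M)?""".r

  def toMinutes(time: String): Int = time match {
    case Time(hours, minutes) =>
      // groups that did not take part in the match come back as null
      Option(hours).fold(0)(_.toInt * 60) + Option(minutes).fold(0)(_.toInt)
    case _ => 0 // assumption: anything that does not fit the pattern counts as 0 minutes
  }
}
```

With the sample data from the question, RecipeTime.toMinutes gives 25, 60, 90 and 0 for "xx25M", "xx1H", "xx1H30M" and "xx".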

Why isn't the return as expected?

Could you please run println(func("ctnkh"))? I got 4, but isn't it supposed to be 5?
def func(s: String): Int = {
if(s == "")
return 0
val len = s.length
var max = Int.MinValue
for(i <- 0 until len)
for(j <- i+1 to len) {
val ss = s.substring(i, j)
if(ss.mkString == ss.toSet.mkString) {
if(ss.length > max)
max = ss.length
}
}
max
}
I'd appreciate any hints.
Because ss.toSet.mkString will be in a different order than ss.mkString.
For example, try the following:
val str = "ctnkh"
println(str.mkString)
println(str.toSet.mkString)
The output is, for example:
ctnkh
ntchk
so the result will never be 5
edit: as mentioned in the comments you can't rely on the order of the characters in the Set as this order is not predictable.
You probably meant: ss.distinct. It drops all duplicate characters from the string, and preserves the order of the remaining characters.
def func(s: String): Int = {
val len = s.length
var max = 0
for(i <- 0 until len) {
for(j <- i+1 to len) {
val ss = s.substring(i, j)
if(ss == ss.distinct) {
max = max.max(ss.length)
}
}
}
max
}
println(func("ctnkh"))
gives 5, as you expected.
For your test, it's sufficient to check size:
if (ss.length == ss.toSet.size) max = max.max(ss.length)
since the iteration order of the Set does not affect its size.
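Put together, the size-based check could look like the sketch below. It keeps the same brute-force enumeration of substrings as the question; the names DistinctRun and longestDistinctRun are made up for illustration.

```scala
object DistinctRun {
  // Length of the longest substring in which no character repeats (brute force).
  def longestDistinctRun(s: String): Int = {
    val lengths = for {
      i <- 0 until s.length
      j <- i + 1 to s.length
      ss = s.substring(i, j)
      if ss.length == ss.toSet.size // no duplicates: the set keeps every character
    } yield ss.length
    if (lengths.isEmpty) 0 else lengths.max // the empty string has no substrings
  }
}
```

DistinctRun.longestDistinctRun("ctnkh") returns 5, because all five characters are distinct.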

scala tictactoe avoid mutable variable

I wrote a simple implementation of the tic tac toe game in scala.
package tiktaktoe
/**
* Created by donbeo on 28/12/15.
*/
class Board {
val size = 3
// size of the board
val goal = 3
// number in line. These values are assumed to be always 3 and 3
val board = Array.fill(size, size)(0) // the board
def isValidPosition(pos: Pos): Boolean = {
if (pos._1 >= 0 && pos._2 >= 0 && pos._1 < size && pos._2 < size && board(pos._1)(pos._2) == 0) true
else false
}
def setPosition(pos: Pos, number: Int): Unit = {
if (isValidPosition(pos)) {
board(pos._1)(pos._2) = number
println(this)
val winner = winningPlayer
if (winner != 0) println("The winner is: Player " + winner)
}
else throw new Exception("Not Valid Position")
}
def winningPlayer: Int = {
// chain the checks with else, otherwise only the last if/else decides the result
if (isLine(0) != 0) isLine(0)
else if (isLine(1) != 0) isLine(1)
else if (isLine(2) != 0) isLine(2)
else if (isColumn(0) != 0) isColumn(0)
else if (isColumn(1) != 0) isColumn(1)
else if (isColumn(2) != 0) isColumn(2)
else if (isDiagonal != 0) isDiagonal
else 0
}
private def isLine(line: Int): Int = {
// this works only if goal = size
val number = board(line)(0)
if (number == 0) 0
else if (board(line).forall(_ == number)) number // == on arrays compares references, so compare the elements instead
else 0
}
private def isColumn(column: Int): Int = {
val number = board(0)(column)
if (number == 0) 0
else if (board(1)(column) == number && board(2)(column) == number) number
else 0
}
private def isDiagonal: Int = {
val number = board(1)(1)
if (board(0)(0) == number && board(2)(2) == number) number // main diagonal
else if (board(0)(2) == number && board(2)(0) == number) number
else 0
}
override def toString = "Board=\n" + board.deep.mkString("\n") + "\n"
}
class Player(number: Int) {
private def computeNextMove(board: Board): Pos = {
(2, 3)
}
def nextMove(board: Board, pos: Pos): Unit = {
board.setPosition(pos, number)
}
}
object PlayGame {
def main(args: Array[String]) {
val board = new Board
val players = List(new Player(1), new Player(2))
println("Start the game \n\n" + board)
players(0).nextMove(board, (1, 0))
players(1).nextMove(board, (1, 1))
players(0).nextMove(board, (1, 2))
players(1).nextMove(board, (0, 0))
players(0).nextMove(board, (2, 1))
players(1).nextMove(board, (2, 2))
}
}
The game is based on the function Player.nextMove that modifies the status of the board. As far as I know it is better to avoid mutable variables in scala.
I am wondering how I can implement the game without using the mutable board.
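One possible direction, as a minimal sketch rather than a definitive design: make the board an immutable value and let each move return a new board. Everything below (ImmutableTicTacToe, place, winner, play, and the choice of Option to signal an invalid move) is hypothetical and not part of the code above; it also assumes a fixed 3x3 board with players numbered 1 and 2 taking turns.

```scala
object ImmutableTicTacToe {
  type Pos = (Int, Int)

  // Hypothetical immutable board: 0 = empty, 1 and 2 = players
  final case class Board(cells: Vector[Vector[Int]] = Vector.fill(3, 3)(0)) {
    def isValidPosition(pos: Pos): Boolean = {
      val (r, c) = pos
      r >= 0 && c >= 0 && r < 3 && c < 3 && cells(r)(c) == 0
    }

    // Returns a new board instead of mutating this one; None signals an invalid move
    def place(pos: Pos, player: Int): Option[Board] =
      if (isValidPosition(pos))
        Some(Board(cells.updated(pos._1, cells(pos._1).updated(pos._2, player))))
      else None

    // 0 if nobody has three in a line yet, otherwise the winning player's number
    def winner: Int = {
      val diagonals = Vector(
        Vector(cells(0)(0), cells(1)(1), cells(2)(2)),
        Vector(cells(0)(2), cells(1)(1), cells(2)(0)))
      (cells ++ cells.transpose ++ diagonals)
        .collectFirst { case line if line.head != 0 && line.forall(_ == line.head) => line.head }
        .getOrElse(0)
    }
  }

  // Fold the moves over successive boards, alternating players 1 and 2;
  // an invalid move is skipped here by keeping the previous board
  def play(moves: List[Pos]): Board =
    moves.zipWithIndex.foldLeft(Board()) { case (board, (pos, i)) =>
      board.place(pos, i % 2 + 1).getOrElse(board)
    }
}
```

Replaying the six moves from main with ImmutableTicTacToe.play leaves player 2 with the main diagonal, so winner returns 2. Printing and error reporting are left to the caller, which keeps the board itself free of side effects.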

Why do I have "Type mismatch; Found Unit, expected Boolean" in Scala-IDE?

I've got the following Scala code (ported from Java):
import scala.util.control.Breaks._
object Main {
def pascal(col: Int, row: Int): Int = {
if(col > row) throw new Exception("Coloumn out of bound");
else if (col == 0 || col == row) 1;
else pascal(col - 1, row - 1) + pascal(col, row - 1);
}
def balance(chars: List[Char]): Boolean = {
val string: String = chars.toString()
if (string.length() == 0) true;
else if(stringContains(")", string) == false && stringContains("(", string) == false) true;
else if(stringContains(")", string) ^ stringContains("(", string)) false;
else if(getFirstPosition("(", string) > getFirstPosition(")", string)) false;
else if(getLastPosition("(", string) > getLastPosition(")", string)) false;
else if(getCount("(", string) != getCount(")", string)) false;
var positionOfFirstOpeningBracket = getFirstPosition("(", string);
var openingBracketOccurences = 1; //we already know that at the first position there is an opening bracket so we are incrementing it right away with 1 and skipping the firstPosition variable in the loop
var closingBracketOccurrences = 0;
var positionOfClosingBracket = 0;
breakable {
for(i <- positionOfFirstOpeningBracket + 1 until string.length()) {
if (string.charAt(i) == ("(".toCharArray())(0)) {
openingBracketOccurences += 1;
}
else if(string.charAt(i) == (")".toCharArray())(0) ) {
closingBracketOccurrences += 1;
}
if(openingBracketOccurences - closingBracketOccurrences == 0) { //this is an important part of the algorithm. if the string is balanced and at the current iteration opening=closing that means we know the bounds of our current brackets.
positionOfClosingBracket = i; // this is the position of the closing bracket
break;
}
}
}
val insideBrackets: String = string.substring(positionOfFirstOpeningBracket + 1, positionOfClosingBracket)
balance(insideBrackets.toList) && balance( string.substring(positionOfClosingBracket + 1, string.length()).toList)
def getFirstPosition(character: String, pool: String): Int =
{
for(i <- 0 until pool.length()) {
if (pool.charAt(i) == (character.toCharArray())(0)) {
i;
}
}
-1;
}
def getLastPosition(character: String, pool: String): Int =
{
for(i <- pool.length() - 1 to 0 by -1) {
if (pool.charAt(i) == (character.toCharArray())(0)) {
i;
}
}
-1;
}
//checks if a string contains a specific character
def stringContains(needle: String, pool: String): Boolean = {
for(i <- 0 until pool.length()) {
if(pool.charAt(i) == (needle.toCharArray())(0)) true;
}
false;
}
//gets the count of occurrences of a character in a string
def getCount(character: String, pool: String) = {
var count = 0;
for ( i <- 0 until pool.length()) {
if(pool.charAt(i) == (character.toCharArray())(0)) count += 1;
}
count;
}
}
}
The problem is that the Scala IDE (latest version for Scala 2.10.1) gives the following error at line 78 (on which there is a closing brace): "Type mismatch; Found Unit, expected Boolean". I really can't understand what the actual problem is. The message doesn't give any information about where the error might be.
In Scala (and most other functional languages) the result of a function is the value of the last expression in the block. The last expression of your balance function is the definition of function getCount, which is of type Unit (the Scala equivalent of void), and your function is declared as returning a Boolean, thus the error.
In practice, you've just screwed up your brackets, which would be obvious if you used proper indentation (Ctrl+A, Ctrl+Shift+F in scala-ide).
To make it compile, you can put the following two lines at the end of the balance method instead of in the middle:
val insideBrackets: String = string.substring(positionOfFirstOpeningBracket + 1, positionOfClosingBracket)
balance(insideBrackets.toList) && balance(string.substring(positionOfClosingBracket + 1, string.length()).toList)
I think you would also have to move the inner functions, such as getCount, to the top of balance.
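To illustrate the rule that a block's value is its last expression, here is a minimal sketch in which the helper is defined first and the method ends with a Boolean expression. Note that it uses a different, simpler counting algorithm than the code in the question, and the names BalanceSketch and loop are made up for this example.

```scala
import scala.annotation.tailrec

object BalanceSketch {
  def balance(chars: List[Char]): Boolean = {
    // helper defined at the top, so the definition is not the last statement of the block
    @tailrec
    def loop(rest: List[Char], open: Int): Boolean = rest match {
      case Nil         => open == 0 // balanced only if every '(' was closed
      case '(' :: tail => loop(tail, open + 1)
      case ')' :: tail => if (open > 0) loop(tail, open - 1) else false
      case _ :: tail   => loop(tail, open)
    }
    loop(chars, 0) // last expression: this Boolean is the result of balance
  }
}
```

If the call to loop were moved above the definition of loop, the block would end with a definition and the compiler would again report that it found Unit where Boolean was expected.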

Palindromes using Scala

I came across this problem from CodeChef. The problem states the following:
A positive integer is called a palindrome if its representation in the
decimal system is the same when read from left to right and from right
to left. For a given positive integer K of not more than 1000000
digits, write the value of the smallest palindrome larger than K to
output.
I can define a isPalindrome method as follows:
def isPalindrome(someNumber:String):Boolean = someNumber.reverse.mkString == someNumber
The problem I am facing is how to loop from the given number and return the first integer that satisfies isPalindrome. Also, is there a better (more efficient) way to write the isPalindrome method?
It would be great to get some guidance here.
If you have a number like 123xxx, you know that either xxx is below 321, in which case the next palindrome is 123321,
or xxx is 321 or above, in which case the 3 can't be kept and 124421 has to be the next one.
Here is some code without guarantees. It is not very elegant, but the case of (multiple) nines in the middle is a bit hairy (19992):
object Palindrome extends App {
def nextPalindrome (inNumber: String): String = {
val len = inNumber.length ()
if (len == 1 && inNumber (0) != '9')
"" + (inNumber.toInt + 1) else {
val head = inNumber.substring (0, len/2)
val tail = inNumber.reverse.substring (0, len/2)
val h = if (head.length > 0) BigInt (head) else BigInt (0)
val t = if (tail.length > 0) BigInt (tail) else BigInt (0)
if (t < h) {
if (len % 2 == 0) head + (head.reverse)
else inNumber.substring (0, len/2 + 1) + (head.reverse)
} else {
if (len % 2 == 1) {
val s2 = inNumber.substring (0, len/2 + 1) // 4=> 4
val h2 = BigInt (s2) + 1 // 5
nextPalindrome (h2 + (List.fill (len/2) ('0').mkString)) // 5 + ""
} else {
val h = BigInt (head) + 1
h.toString + (h.toString.reverse)
}
}
}
}
def check (in: String, expected: String) = {
if (nextPalindrome (in) == expected)
println ("ok: " + in) else
println (" - fail: " + nextPalindrome (in) + " != " + expected + " for: " + in)
}
//
val nums = List (("12345", "12421"), // f
("123456", "124421"),
("54321", "54345"),
("654321", "654456"),
("19992", "20002"),
("29991", "29992"),
("999", "1001"),
("31", "33"),
("13", "22"),
("9", "11"),
("99", "101"),
("131", "141"),
("3", "4")
)
nums.foreach (n => check (n._1, n._2))
println (nextPalindrome ("123456678901234564579898989891254392051039410809512345667890123456457989898989125439205103941080951234566789012345645798989898912543920510394108095"))
}
I guess it will handle the case of a one-million-digit-Int too.
Doing reverse is not the greatest idea. It's better to start at the beginning and end of the string and iterate and compare element by element. You're wasting time copying the entire String and reversing it even in cases where the first and last element don't match. On something with a million digits, that's going to be a huge waste.
This is a few orders of magnitude faster than reverse for bigger numbers:
def isPalindrome2(someNumber:String):Boolean = {
val len = someNumber.length;
for(i <- 0 until len/2) {
if(someNumber(i) != someNumber(len-i-1)) return false;
}
return true;
}
There's probably even a faster method, based on mirroring the first half of the string. I'll see if I can get that now...
Update: this should find the next palindrome directly, without looping over candidate numbers. I just sort of scratched it out, so I'm sure it can be cleaned up.
def nextPalindrome(someNumber:String):String = {
val len = someNumber.length;
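// all nines (including "9") roll over to 10...01
if(someNumber.forall(_ == '9')) return "1" + "0" * (len - 1) + "1";
// any other single digit: the next digit is the next palindrome
if(len==1) return (someNumber.toInt + 1).toString;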
val half = scala.math.floor(len/2).toInt;
var firstHalf = someNumber.substring(0,half);
var secondHalf = if(len % 2 == 1) {
someNumber.substring(half+1,len);
} else {
someNumber.substring(half,len);
}
if(BigInt(secondHalf) >= BigInt(firstHalf.reverse)) { // >= so that a palindromic input also moves on to the next palindrome
if(len % 2 == 1) {
firstHalf += someNumber.substring(half, half+1);
firstHalf = (BigInt(firstHalf)+1).toString;
firstHalf + firstHalf.substring(0,firstHalf.length-1).reverse
} else {
firstHalf = (BigInt(firstHalf)+1).toString;
firstHalf + firstHalf.reverse;
}
} else {
if(len % 2 == 1) {
firstHalf + someNumber.substring(half,half+1) + firstHalf.reverse;
} else {
firstHalf + firstHalf.reverse;
}
}
}
This is the most general and clear solution that I could come up with:
Edit: got rid of the BigInts; now it takes less than a second to process a million-digit number.
def incStr(num: String) = { // helper method to increment number as String
val idx = num.lastIndexWhere('9'!=, num.length-1)
num.take(idx) + (num.charAt(idx)+1).toChar + "0"*(num.length-idx-1)
}
def palindromeAfter(num: String) = {
val lengthIsOdd = num.length % 2
val halfLength = num.length / 2 + lengthIsOdd
val leftHalf = num.take(halfLength) // first half of number (including central digit)
val rightHalf = num.drop(halfLength - lengthIsOdd) // second half of number (also including central digit)
val (newLeftHalf, newLengthIsOdd) = // we need to calculate the first half of the new palindrome and whether its length is odd or even
if (rightHalf.compareTo(leftHalf.reverse) < 0) // simplest case - input number is like 123xxx and xxx < 321
(leftHalf, lengthIsOdd)
else if (leftHalf forall ('9'==)) // special case - if number is like '999...', then next palindrome will be like '10...01' and one digit longer
("1" + "0" * (halfLength - lengthIsOdd), 1 - lengthIsOdd)
else // other cases - increment first half of input number before making palindrome
(incStr(leftHalf), lengthIsOdd)
// now we can create palindrome itself
newLeftHalf + newLeftHalf.dropRight(newLengthIsOdd).reverse
}
Following your range-less proposal, here is the same thing using Stream:
def isPalindrome(n:Int):Boolean = n.toString.reverse == n.toString
def ints(n: Int): Stream[Int] = Stream.cons(n, ints(n+1))
val result = ints(100).find(isPalindrome)
And with an Iterator (called in a different way; you can actually do the same thing with Stream):
val result = Iterator.from(100).find(isPalindrome)
But as @user unknown stated, it is direct brute force and not practical with large numbers.
To check whether a List of any type is a palindrome using slice, without any loops:
def palindrome[T](list: List[T]): Boolean = {
if(list.length==1 || list.length==0 ){
true // empty and single-element lists read the same in both directions
}else {
val leftSlice: List[T] = list.slice(0, list.length / 2)
var rightSlice :List[T]=Nil
if (list.length % 2 != 0) {
rightSlice = list.slice(list.length / 2 + 1, list.length).reverse
} else {
rightSlice = list.slice(list.length / 2, list.length).reverse
}
leftSlice ==rightSlice
}
}
Though the simplest solution would be
def palindrome[T](list: List[T]): Boolean = {
list == list.reverse
}
You can simply use the find method on collections to find the first element matching a given predicate:
def isPalindrome(n:Int):Boolean = n.toString.reverse == n.toString
val (start, end) = (100, 1000)
val result: Option[Int] = (start to end).find(isPalindrome)
result foreach println
// prints 101 (result itself is Some(101))
Solution to verify if a String is a palindrome
This solution doesn't reverse the String. However I am not sure that it will be faster.
def isPalindrome(s:String):Boolean = {
s.isEmpty ||
((s.last == s.head) && isPalindrome(s.tail.dropRight(1)))
}
Solution to find next palindrome given a String
This solution is not the most idiomatic Scala (it is much the same as a Java solution), but it deals only with Strings and is suitable for large numbers.
You just have to mirror the first half of the number and check whether the result is at least the original number; otherwise, increment the first half by one and mirror it again.
First, a function to increment the string representation of an int by 1:
def incrementString(s:String):String = {
if(s.nonEmpty){
if (s.last == '9')
incrementString(s.dropRight(1))+'0'
else
s.dropRight(1) + (s.last.toInt +1).toChar
}else
"1"
}
Then, a function to compare two string representations of ints (the built-in 'compare' on its own doesn't work here, because it orders strings lexicographically):
/* is less than 0 if x<y, is more than 0 if x>y, is equal to 0 if x==y */
def compareInts(x:String, y:String):Int = {
if (x.length !=y.length)
(x.length).compare(y.length)
else
x.compare(y)
}
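A small sketch of why the length check matters, assuming non-negative numbers written without leading zeros. The object name CompareDemo and its vals are made up for illustration; compareInts is repeated so the block stands on its own.

```scala
object CompareDemo {
  // same logic as compareInts above
  def compareInts(x: String, y: String): Int =
    if (x.length != y.length) x.length.compare(y.length)
    else x.compare(y)

  // plain String comparison is lexicographic: '9' sorts after '1', so "9" looks bigger than "10"
  val lexicographic: Int = "9".compare("10")

  // comparing lengths first gives the numeric order: 9 < 10
  val numeric: Int = compareInts("9", "10")

  // for equal lengths the result is a character difference, here '2' - '4'
  val difference: Int = compareInts("21", "45")
}
```

Because the result can be any negative or positive number, not just -1 or 1, callers should test its sign (for example `< 0` or `>= 0`) rather than compare it with -1.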
Now the function to compute the next palindrome:
def nextPalindrome(origin_ :String):String = {
/*Comment if you want to have a strictly bigger number, even if you already have a palindrome as input */
val origin = origin_
/* Uncomment if you want to have a strictly bigger number, even if you already have a palindrome as input */
//val origin = incrementString(origin_)
val length = origin.length
if(length % 2 == 0){
val (first, last) = origin.splitAt(length/2);
val reversed = first.reverse
if (compareInts(reversed,last) > -1)
first ++ reversed
else{
val firstIncr = incrementString(first)
firstIncr ++ firstIncr.reverse
}
} else {
val (first,last) = origin.splitAt((length+1)/2)
val reversed = first.dropRight(1).reverse
if (compareInts(reversed,last) > -1) // compare can return any negative number, not only -1
first ++ reversed
else{
val firstIncr = incrementString(first)
firstIncr ++ firstIncr.dropRight(1).reverse
}
}
}
You could try something like this, using basic recursion:
object Palindromo {
def main(args: Array[String]): Unit = {
var s: String = "arara"
println(verificaPalindromo(s))
}
def verificaPalindromo(s: String): String = {
if (s.length == 0 || s.length == 1)
"true"
else
if (s.charAt(0).toLower == s.charAt(s.length - 1).toLower)
verificaPalindromo(s.substring(1, s.length - 1))
else
"false"
}
}
import scala.annotation.tailrec
@tailrec
def palindrome(str: String, start: Int, end: Int): Boolean = {
if (start >= end) // the indices have met or crossed: every pair matched
true
else if (str(start) != str(end))
false
else
palindrome(str, start + 1, end - 1)
}
val str = "arora"
println(palindrome(str, 0, str.length - 1))