Convert stringLiteral into a string - scala

In Scala's parser combinators (JavaTokenParsers in particular) there is a definition, stringLiteral, that matches a Java-like string. Is there a way to convert a stringLiteral into a String? For example, if I parse "Run \" run \\ run", I would want to convert the entered string literal into Run " run \ run.
Also, is there a definition for stringLiterals that also supports """?

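I have a hunch you are asking a more complicated question, but just in case, the simple answer is to write your own parser and trim the quotes in the function applied with ^^. Note that this only strips the surrounding quotes; escape sequences inside the literal are left as they are.
In the REPL you can test it like this: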
import scala.util.parsing.combinator.JavaTokenParsers
object testParsers extends JavaTokenParsers {
  val aString : Parser[String] = stringLiteral ^^ {
    case s => s.substring( 1, s.length-1 )
  }
}
testParsers.parseAll(testParsers.stringLiteral,""""Run \" run \\ run"""")
testParsers.parseAll(testParsers.aString,""""Run \" run \\ run"""")
I am not aware of any built in triple-quote parsers, so I guess you will have to roll your own.
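Trimming the quotes leaves escape sequences such as \" and \\ untouched, so the parsed example would still contain the backslashes. As a minimal sketch of the missing step, assuming only the common Java escapes need to be handled, the hypothetical helper Unescape.unescape below strips the quotes and resolves the escapes by hand. It is an illustration, not something JavaTokenParsers provides:

```scala
object Unescape {
  /** Hypothetical helper: strips the surrounding quotes and resolves common Java escapes. */
  def unescape(literal: String): String = {
    val body = literal.substring(1, literal.length - 1) // drop the opening and closing quote
    val sb = new StringBuilder
    var i = 0
    while (i < body.length) {
      val c = body.charAt(i)
      if (c == '\\' && i + 1 < body.length) {
        body.charAt(i + 1) match {
          case 'n' => sb.append('\n'); i += 2
          case 't' => sb.append('\t'); i += 2
          case 'r' => sb.append('\r'); i += 2
          case 'b' => sb.append('\b'); i += 2
          case 'f' => sb.append('\f'); i += 2
          case 'u' if i + 6 <= body.length =>
            // four hex digits after backslash-u denote one UTF-16 code unit
            sb.append(Integer.parseInt(body.substring(i + 2, i + 6), 16).toChar); i += 6
          case other => sb.append(other); i += 2 // covers the escaped quote and the escaped backslash
        }
      } else {
        sb.append(c); i += 1
      }
    }
    sb.toString
  }
}
```

Inside a parser this could presumably be applied as stringLiteral ^^ Unescape.unescape in place of the substring call.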

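Apache Commons provides a useful method: StringEscapeUtils.unescapeJava, which turns the escape sequences of a Java string literal back into the characters they represent.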

Calling methods with parameters without using parentheses

I am using the following implicit class in order to call a function which would otherwise take parameters without having to write the parameters in brackets:
scala> implicit class Print(string: String) {
     |   def echo: Unit = Console.println(string)
     | }
defined class Print
scala> "Hello world" echo
Hello world
However while this works, I don't really like how it looks and my goal is to get the method call in front of the input variable as I think it reads better.
Is there any simple way, without relying on external libraries, to be able to call a method before supplying the parameters and without needing brackets? Implicit classes are what I've been using so far but that doesn't have to be the final solution.
What I would like to type instead of "Hello world" echo:
scala> echo "Hello world"
Alternatives I have tried:
Object with apply method: requires parentheses.
object echo {
  def apply(string: String): Unit = Console.println(string)
}
echo "Hello world" // Error: ';' or newline expected
Extending Dynamic: doesn't seem to work in my version of Scala.
Special characters: looks ugly and is not what I am looking for.
Scalaz: looks to do basically what my implicit class solution does, and I don't want any external dependencies.
EDIT
This answer has been pointed to as a potential solution, but again it doesn't address my issue as it relies on Dynamic to achieve a solution. As previously mentioned, Dynamic does not solve my problem for a couple of reasons:
It behaves funnily
If you define a val and try to println that val, it gives you back the val's name and not its value (as pointed out by @metaphori):
object println extends Dynamic {
  def typed[T] = asInstanceOf[T]
  def selectDynamic(name: String) = Console.println(name)
}
val x = "hello"
println x // prints: x
The specific example linked to did not work when I tried to recreate it - it still gave the ';' or newline expected error
If I just misunderstood how to implement it then I would appreciate a scalafiddle demonstrating that this solution solves my problem and will happily concede that this question is a duplicate of the previously mentioned answer, but until then I do contest it.
AFAIK, the only way to do something similar to what you want is extending Dynamic like this:
object println extends Dynamic {
  def typed[T] = asInstanceOf[T]
  def selectDynamic(name: String) = Console.println(name)
}
and using it with:
println `Hello World`
Edit: of course, you need to enable the related language features, either by passing the compiler options -language:postfixOps and -language:dynamics or by importing scala.language.dynamics and scala.language.postfixOps.
You cannot achieve
echo "str"
since Scala, unlike Ruby for example, syntactically requires function applications to enclose their arguments in parentheses or in a block.
This is not a matter of semantics, of how things are implemented, or of which techniques are used: it is the parser that complains.
The point is that x y is actually interpreted as x.y, which means that y must be a method.
Refer to the Scala Language Specification, section 6.6 Function Applications:
SimpleExpr ::= SimpleExpr1 ArgumentExprs
ArgumentExprs ::= ‘(’ [Exprs] ‘)’
| ‘(’ [Exprs ‘,’] PostfixExpr ‘:’ ‘_’ ‘*’ ‘)’
| [nl] BlockExpr
Exprs ::= Expr {‘,’ Expr}
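The last alternative of ArgumentExprs above, [nl] BlockExpr, means an argument may also be passed in braces instead of parentheses. The sketch below reuses the echo object from the question, with one assumption made only for illustration: apply returns the string it prints, so the two call styles can be compared. It does not give the bare echo "Hello world" form, but it does show the one parenthesis-free application the grammar allows:

```scala
object BlockArgument {
  object echo {
    // Assumption for the example: return the printed string so the result can be inspected.
    def apply(string: String): String = { Console.println(string); string }
  }

  // Ordinary application with parentheses.
  val viaParens: String = echo("Hello world")

  // Application with a block expression as the argument, as permitted by the grammar.
  val viaBlock: String = echo { "Hello world" }
}
```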
I do not like the trick of @hüseyin-zengin since it leverages dynamic method invocations, and also does not work as expected:
val x = "hello"
println x // prints: x
To partially achieve what you like you need to use infix operator notation
object run { def echo(s: String) = println(s) }
run echo "hello" // OK
run.echo "hello" // error: ';' expected but string literal found.
You may also use a symbol to reduce the "typing" overhead (though it may look odd to readers):
object > { def echo(s: String) = println(s) }
> echo "hello" // OK

Using regex parser within a JavaTokenParsers subclass

I am trying out scala parser combinators with the following object:
object LogParser extends JavaTokenParsers with PackratParsers {
Some of the parsers are working. But the following one is getting tripped up:
def time = """([\d]{2}:[\d]{2}:[\d]{2}\.[\d]+)"""
Following is the input not working:
09:58:24.608891
On reaching that line we get:
[2.22] failure: `([\d]{2}:[\d]{2}:[\d]{2}\.[\d]+)' expected but `:' found
09:58:24.608891
Note: I did verify correct behavior of that regex within the scala repl on the same input pattern.
val r = """([\d]{2}):([\d]{2}):([\d]{2}\.[\d]+)""".r
val s = """09:58:24.608891"""
val r(t,t2,t3) = s
t: String = 09
t2: String = 58
t3: String = 24.608891
So, as far as the parser combinators go: is there an issue with the ":" token itself, i.e. do I need to create my own custom lexer and add ":" to lexical.delimiters?
Update: an answer suggested adding ".r". I had already tried that, but to be explicit: the following has the same behavior (it does not work):
def time = """([\d]{2}:[\d]{2}:[\d]{2}.[\d]+)""".r
I think you're just missing a .r at the end here to actually have a Regex as opposed to a string literal.
def time = """([\d]{2}:[\d]{2}:[\d]{2}\.[\d]+)"""
it should be
def time = """([\d]{2}:[\d]{2}:[\d]{2}\.[\d]+)""".r
The first one expects the text to be exactly like the regex string literal (which obviously isn't present), the second one expects text that actually matches the regex. Both create a Parser[String], so it's not immediately obvious that something is missing.
There's an implicit conversion from java.lang.String to Parser[String], so that string literals can be used as parser combinators.
There's an implicit conversion from scala.util.matching.Regex to Parser[String], so that regular expressions can be used as parser combinators.
http://www.scala-lang.org/files/archive/api/2.11.2/scala-parser-combinators/#scala.util.parsing.combinator.RegexParsers
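The difference between the two conversions can be illustrated with the standard library alone. This is only a rough model of the parser behavior, not the RegexParsers implementation: the plain string is compared verbatim against the start of the input, while the .r version is matched as a regular expression. The object name LiteralVsRegex is made up for the example:

```scala
object LiteralVsRegex {
  val pattern = """\d{2}:\d{2}:\d{2}\.\d+"""
  val input = "09:58:24.608891"

  // Treated as plain text, the pattern itself would have to appear at the start of the input.
  val asLiteral: Boolean = input.startsWith(pattern)

  // Converted with .r, the pattern is matched as a regular expression against the input prefix.
  val asRegex: Option[String] = pattern.r.findPrefixOf(input)
}
```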

Make parser include surrounding whitespace in string literals

I wrote a Scala parser for an in-house expression language that has double quote-delimited string literals:
object MyParser extends JavaTokenParsers {
  lazy val strLiteral = "\"" ~> """[^"]*""".r <~ "\"" ^^ {
    case x ⇒ StringLiteral(x)
  }
  // ...
}
(The actual code is a bit different since I support "" as an escape sequence for a literal double quote. While this is not relevant for the discussion, it's the reason why I cannot just use JavaTokenParsers's stringLiteral).
I noticed that the parser fails to include whitespace at the beginning and at the end of a string:
"a" parsed as StringLiteral("a")
" a" parsed as StringLiteral("a")
"a " parsed as StringLiteral("a")
" a " parsed as StringLiteral("a")
I tried matching whitespace in the regex:
"\"" ~> """\s*[^"]*\s*""".r <~ "\""
and also using the explicit whiteSpace parser:
"\"" ~> whiteSpace.? ~ """[^"]*""".r ~ whiteSpace.? <~ "\""
but in both cases the ~> operator has already consumed and ignored the spaces before there's a chance to read and handle them.
I know that I can set skipWhitespace = false, but I prefer not to, since in general I want to allow arbitrary whitespace around tokens in this language.
What's a simple and clean strategy to include surrounding whitespace in string literals?
One option you have is to use a single regexp for your string literal:
val stringLiteral:Parser[String] = """"([^"]*("")?)*"""".r
and then strip the matched quotes afterwards.
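Because the opening quote is part of the match, whitespace skipping can only happen before the quote, and everything between the quotes is captured as-is. The sketch below shows the "strip afterwards" step with a plain Regex so that it stays self-contained. The object QuotedLiteral and its parse method are hypothetical names, and it assumes, as the question describes, that "" inside a literal stands for a single double quote:

```scala
object QuotedLiteral {
  // Regex from the answer: opening quote, runs of non-quote characters or doubled quotes, closing quote.
  val stringLiteral = """"([^"]*("")?)*"""".r

  /** Returns the literal's content with outer quotes removed and doubled quotes collapsed. */
  def parse(in: String): Option[String] =
    stringLiteral.findPrefixOf(in).map { m =>
      m.substring(1, m.length - 1).replace("\"\"", "\"")
    }
}
```

Within the parser, the same post-processing could presumably be attached to the regex parser with ^^.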

Parsing sentences using Scala parser combinator

I just started playing with parser combinators in Scala, but got stuck on a parser to parse sentences such as "I like Scala." (words end on a whitespace or a period (.)).
I started with the following implementation:
package example
import scala.util.parsing.combinator._
object Example extends RegexParsers {
  override def skipWhitespace = false
  def character: Parser[String] = """\w""".r
  def word: Parser[String] =
    rep(character) <~ (whiteSpace | guard(literal("."))) ^^ (_.mkString(""))
  def sentence: Parser[List[String]] = rep(word) <~ "."
}
object Test extends App {
  val result = Example.parseAll(Example.sentence, "I like Scala.")
  println(result)
}
The idea behind using guard() is to have a period demarcate word endings, but not consume it so that sentences can. However, the parser gets stuck (adding log() reveals that it is repeatedly trying the word and character parser).
If I change the word and sentence definitions as follows, it parses the sentence, but the grammar description doesn't look right and will not work if I try to add parser for paragraph (rep(sentence)) etc.
def word: Parser[String] =
  rep(character) <~ (whiteSpace | literal(".")) ^^ (_.mkString(""))
def sentence: Parser[List[String]] = rep(word) <~ opt(".")
Any ideas what may be going on here?
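You wrote: "However, the parser gets stuck (adding log() reveals that it is repeatedly trying the word and character parser)."
The rep combinator corresponds to * in Perl-style regex notation, which means it matches zero or more repetitions. Once the input reaches the final period, rep(character) matches zero characters and guard(literal(".")) succeeds without consuming anything, so word succeeds on empty input and rep(word) can apply it again and again without making progress. I think you want it to match one or more characters. Changing that to rep1 (corresponding to + in Perl-style regex notation) should fix the problem.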
However, your definition still seems a little verbose to me. Why are you parsing individual characters instead of just using \w+ as the pattern for a word? Here's how I'd write it:
object Example extends RegexParsers {
  override def skipWhitespace = false
  def word: Parser[String] = """\w+""".r
  def sentence: Parser[List[String]] = rep1sep(word, whiteSpace) <~ "."
}
Notice that I use rep1sep to parse a non-empty list of words separated by whitespace. There's a repsep combinator as well, but I think you'd want at least one word per sentence.
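The rep versus rep1 distinction mentioned above can be seen with the regex analogy alone. In this small sketch (plain standard library, no parser combinators, with the made-up object name StarVsPlus), the * form succeeds at the period with an empty match, while the + form fails there:

```scala
object StarVsPlus {
  val zeroOrMore = """\w*""".r // analogous to rep(character)
  val oneOrMore = """\w+""".r  // analogous to rep1(character)

  // The remaining input once every word of the sentence has been consumed.
  val atPeriod = "."

  val star: Option[String] = zeroOrMore.findPrefixOf(atPeriod)
  val plus: Option[String] = oneOrMore.findPrefixOf(atPeriod)
}
```

A match that succeeds without consuming input is what allows an enclosing repetition to spin in place.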

Can't convert unicode symbols to cyrillic

I have a bunch of documents persisted in Apache Lucene with some names in Russian, and when I try to print them out they look like this: "\u0410\u0441\u043f\u0430\u0440", rather than Cyrillic characters. The project is in Scala. I've tried to fix this with the Apache Commons unescapeJava method, but it didn't help. Are there any other options?
Updated:
The project is written with the Spray framework and returns JSON like this:
{
"id" : 0,
"name" : "\u0410\u0441\u043f\u0430\u0440"
}
I'm going to try to infer exactly what you are doing.
You are using Spray, so I gather that you are using its JSON library, spray-json.
So I suppose that you have some instance of spray.json.JsObject, and that what you posted in your question is what you get as the output when printing this instance.
Your JSON object is correct: the value of the name field has no embedded escaping. It is actually the conversion to a string that escapes some Unicode characters.
See the definition of printString here:
https://github.com/spray/spray-json/blob/master/src/main/scala/spray/json/JsonPrinter.scala
I will also assume that when you tried to use unescapeJava, you applied it to the value of the name field, creating a new spray.json.JsObject instance that you then printed as before. Given that your JSON object does not actually have any escaping, this did absolutely nothing, and then when printing it the printer does the escaping as before, and you're back to square one.
As a side note, it's worth mentioning that the JSON spec does not mandate how characters are encoded: they can either be stored as their literal value, or as a Unicode escape. For example, the string "abc" could be described as just "abc", or as "\u0061\u0062\u0063". Either form is correct. It just happens that the author of spray-json decided to use the latter form for all non-ASCII characters.
So now you ask, what can I do to work around this? You could ask the spray-json author to add an option that lets you specify that you don't want any Unicode escaping.
But I imagine that you want a solution right now.
The simplest thing to do is to just convert your object to a string (via JsValue.toString or JsValue.compactPrint or JsValue.prettyPrint), and then pass the result to unescapeJava. At least this will give you back your original Cyrillic characters.
But this is a bit gross, and actually quite dangerous, as some characters are not safe to unescape inside a string literal. For example, \n will be unescaped to an actual newline, and \u0022 will be unescaped to ". You can easily see how this will break your JSON document.
But at the very least it will allow to confirm my theory (remember that I have been making assumptions about what exactly you are doing).
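To check the theory without the risk described above, one option is to unescape only the Unicode escapes that denote non-ASCII characters and leave every other escape alone. The following is a minimal sketch under that assumption. UnicodeUnescape and unescapeNonAscii are hypothetical names, not part of spray-json or Apache Commons, and the input is assumed to be the already printed JSON string:

```scala
import scala.util.matching.Regex

object UnicodeUnescape {
  // Matches any backslash escape; group 1 is set only for the four-hex-digit unicode form.
  private val Escape = """\\(?:u([0-9a-fA-F]{4})|.)""".r

  /** Replaces unicode escapes for non-ASCII code units by the character; keeps all other escapes. */
  def unescapeNonAscii(json: String): String =
    Escape.replaceAllIn(json, m => {
      val hex = m.group(1) // null when the escape is not a unicode escape
      val out =
        if (hex != null && Integer.parseInt(hex, 16) >= 0x80) Integer.parseInt(hex, 16).toChar.toString
        else m.matched // ASCII escapes such as the quote or newline escape stay untouched
      Regex.quoteReplacement(out)
    })
}
```

Matching every escape sequence, rather than only the Unicode ones, is what keeps an escaped backslash followed by a literal u from being misread as a Unicode escape.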
Now for a proper fix: you could simply extend JsonPrinter and override its printString method to remove the Unicode escaping. Something like this (untested):
import java.lang.StringBuilder // assumption: JsonPrinter.printString takes a java.lang.StringBuilder
import scala.annotation.tailrec
import spray.json._

trait NoUnicodeEscJsonPrinter extends JsonPrinter {
  override protected def printString(s: String, sb: StringBuilder): Unit = {
    @tailrec
    def printEscaped(s: String, ix: Int): Unit = {
      if (ix < s.length) {
        s.charAt(ix) match {
          case '"' => sb.append("\\\"")
          case '\\' => sb.append("\\\\")
          case x if 0x20 <= x && x < 0x7F => sb.append(x)
          case '\b' => sb.append("\\b")
          case '\f' => sb.append("\\f")
          case '\n' => sb.append("\\n")
          case '\r' => sb.append("\\r")
          case '\t' => sb.append("\\t")
          case x => sb.append(x) // everything else, including non-ASCII characters, is written as-is
        }
        printEscaped(s, ix + 1)
      }
    }
    sb.append('"')
    printEscaped(s, 0)
    sb.append('"')
  }
}
trait NoUnicodeEscPrettyPrinter extends PrettyPrinter with NoUnicodeEscJsonPrinter
object NoUnicodeEscPrettyPrinter extends NoUnicodeEscPrettyPrinter
trait NoUnicodeEscCompactPrinter extends CompactPrinter with NoUnicodeEscJsonPrinter
object NoUnicodeEscCompactPrinter extends NoUnicodeEscCompactPrinter
Then you can do:
val json: JsValue = ...
val jsonString: String = NoUnicodeEscPrettyPrinter( json )
jsonString will contain your JSON document in pretty-print format and without any Unicode escaping.
This problem appears to be corrected in spray-json 1.3.2: https://github.com/spray/spray-json/issues/46
I ran into a similar problem with Arabic characters using Akka HTTP 1.0, which depends on spray-json 1.3.1. Upgrading to 1.3.2 resolved my problem.