OCaml: How to decode unicode-escape string?

Given a string like the following:
let str = "#include \\u003Cunordered_map\\u003E\\u000D\\u000A"
How do I decode a unicode-escape string into a Unicode string or, in my case, an ASCII string in OCaml?
In Python I could easily do
str.decode("unicode-escape")

If your embedded escape sequences are always going to encode ASCII characters, as you say, you can find them and replace them with the decoded equivalent:
(* Uses the Str library, which must be linked in separately. *)
let decode s =
  let re = Str.regexp "\\\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]" in
  (* One-character string for code n; Char.chr fails if n > 255. *)
  let s1 n = String.make 1 (Char.chr n) in
  let subst = function
    | Str.Delim u -> s1 (int_of_string ("0x" ^ String.sub u 2 4))
    | Str.Text t -> t
  in
  String.concat "" (List.map subst (Str.full_split re s))
This works for your example:
val decode : string -> string = <fun>
# decode "#include \\u003Cunordered_map\\u003E\\u000D\\u000A";;
- : string = "#include <unordered_map>\r\n"
Indeed, Python has built-in support to decode these sequences.
Update
To support all four-digit hex escape sequences "\uXXXX" by converting to UTF-8 you can use this code:
let utf8encode s =
  (* Lead-byte prefixes for 1-, 2- and 3-byte UTF-8 sequences. *)
  let prefs = [| 0x0; 0xc0; 0xe0 |] in
  let s1 n = String.make 1 (Char.chr n) in
  let rec ienc k sofar resid =
    (* Payload bits available in the lead byte after k continuation bytes. *)
    let bct = if k = 0 then 7 else 6 - k in
    if resid < 1 lsl bct then
      (s1 (prefs.(k) + resid)) ^ sofar
    else
      ienc (k + 1) (s1 (0x80 + resid mod 64) ^ sofar) (resid / 64)
  in
  ienc 0 "" (int_of_string ("0x" ^ s))

let decode2 s =
  let re = Str.regexp "\\\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]" in
  let subst = function
    | Str.Delim u -> utf8encode (String.sub u 2 4)
    | Str.Text t -> t
  in
  String.concat "" (List.map subst (Str.full_split re s))
It also works for your example, and some other examples:
val utf8encode : string -> string = <fun>
val decode2 : string -> string = <fun>
# decode2 "#include \\u003Cunordered_map\\u003E\\u000D\\u000A";;
- : string = "#include <unordered_map>\r\n"
# print_endline (decode2 "\\u00A2");;
¢
- : unit = ()
# print_endline (decode2 "\\u20AC");;
€
- : unit = ()

Related

How to use string.split() without foreach()?

Write a program in Scala that reads a String from the keyboard and counts the occurrences of each character, ignoring case.
ex: Avocado
R: A = 2; v = 1; o = 2; c = 1; d = 1;
So I tried to do it with two for loops iterating over the string, with a conditional that converts the character at position x to upper case and compares it with the upper-cased character at position y, incrementing a counter on each match. ex: Ava -> A = 2; v = 1;
But with this logic, when I print the result I get:
ex: Avocado
R: A = 2; v = 1; o = 2; c = 1; a = 2; d = 1; o = 2;
It repeats a character that has already been counted, whether it is upper or lower case...
My teacher asked us to solve this using Scala's split method and yield, but I don't know how to use split without foreach(), which he doesn't allow us to use.
Sorry for the bad English.
object ex8 {
  def main(args: Array[String]): Unit = {
    println("Write a string")
    var string = readLine()
    var cont = 0
    for (x <- 0 to string.length - 1) {
      for (y <- 0 to string.length - 1) {
        if (string.charAt(x).toUpper == string.charAt(y).toUpper)
          cont += 1
      }
      print(string.charAt(x) + " = " + cont + "; ")
      cont = 0
    }
  }
}
Scala 2.13 has added a very handy method to cover this sort of thing.
inputStr.groupMapReduce(_.toUpper)(_ => 1)(_+_)
.foreach{case (k,v) => println(s"$k = $v")}
//A = 2
//V = 1
//C = 1
//O = 2
//D = 1
It might be easier to group the individual elements of the String (i.e. a collection of Chars, made case-insensitive with toLower) to aggregate their corresponding size using groupBy/mapValues:
"Avocado".groupBy(_.toLower).mapValues(_.size)
// res1: scala.collection.immutable.Map[Char,Int] =
// Map(a -> 2, v -> 1, c -> 1, o -> 2, d -> 1)
Scala 2.11
Tried with classic word count approach of map => group => reduce
val exampleStr = "Avocado R"
exampleStr.
toLowerCase.
trim.
replaceAll(" +","").
toCharArray.map(x => (x,1)).groupBy(_._1).
map(x => (x._1,x._2.length))
Answer :
exampleStr: String = Avocado R
res3: scala.collection.immutable.Map[Char,Int] =
Map(a -> 2, v -> 1, c -> 1, r -> 1, o -> 2, d -> 1)
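Since the question asks specifically for split and yield without foreach, here is a minimal sketch of that route. The names CharCount and countChars are made up for illustration, and treating split("") as "one string per character" is an assumption about what the teacher intended.

```scala
object CharCount {
  // Counts characters case-insensitively, keeping first-appearance order.
  def countChars(input: String): Seq[(Char, Int)] = {
    // split("") yields one single-character string per character;
    // the filter guards against an empty leading element on old JVMs.
    val letters = input.toUpperCase.split("").filter(_.nonEmpty).map(_.head)
    // for/yield builds the result directly, no foreach needed.
    for (c <- letters.distinct.toSeq) yield (c, letters.count(_ == c))
  }

  def main(args: Array[String]): Unit = {
    val result = countChars("Avocado")
    println(result.map { case (c, n) => s"$c = $n" }.mkString("; "))
    // prints: A = 2; V = 1; O = 2; C = 1; D = 1
  }
}
```

Taking distinct before counting is what removes the repeated entries seen in the nested-loop version.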

How to ceil the result for UInt division in Chisel

As the title states, how can I do that?
val a = 3.U
val result = a / 2.U
result would be 1.U
However, I want to apply ceil to the division:
val result = ceil(a / 2.U )
so that I get 2.U as the result.
When dividing a by b, if you know that a is not too big (namely that a <= UInt.MaxValue - (b - 1)), then you can do
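import chisel3._

def ceilUIntDiv(a: UInt, b: UInt): UInt =
  (a + b - 1.U) / b
If a is potentially too big, then the above can overflow, and you'll need to adapt the result after the fact instead:
def ceilUIntDiv(a: UInt, b: UInt): UInt = {
  val c = a / b
  // Hardware values are compared with === and selected with Mux;
  // a Scala if/== would be evaluated at elaboration time, not in the circuit.
  Mux(b * c === a, c, c + 1.U)
}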
The problem is the expression a / 2.U is indeed 1.U: if you apply ceil to 1.U you'll get 1.U.
Recall that this happens to Ints as well, as they use integer division:
scala> val result = Math.ceil(3 / 2)
result: Double = 1.0
What you should do is force one of the division operands to be a Double, like this:
scala> val result = Math.ceil(3 / (2: Double))
result: Double = 2.0
And then just convert it back to UInt.
def ceilUIntDiv(a: UInt, b: UInt): UInt = {
  // === and Mux build the comparison and the selection in hardware
  (a / b) + Mux(a % b === 0.U, 0.U, 1.U)
}
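The two ceiling-division formulas discussed in these answers can be checked on plain Scala Ints, without any hardware. This is only a software sketch of the arithmetic for non-negative a and positive b, not Chisel code; CeilDivCheck, ceilDivAdd and ceilDivRem are illustrative names.

```scala
object CeilDivCheck {
  // (a + b - 1) / b: correct as long as a + b - 1 does not overflow.
  def ceilDivAdd(a: Int, b: Int): Int = (a + b - 1) / b

  // Quotient plus one when there is a remainder: no intermediate overflow.
  def ceilDivRem(a: Int, b: Int): Int = a / b + (if (a % b == 0) 0 else 1)

  def main(args: Array[String]): Unit = {
    println(ceilDivAdd(3, 2))            // 2
    println(ceilDivRem(3, 2))            // 2
    // Near the top of the range the additive form wraps around.
    println(ceilDivAdd(Int.MaxValue, 2)) // -1073741824
    println(ceilDivRem(Int.MaxValue, 2)) // 1073741824
  }
}
```

The wrapped result in the third line is the overflow case the first answer warns about.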

OperatorPrecedenceParser throw exception about negative priority which I don't have

I am creating a parser for a programming language based on the lambda-calculus. I added infix operators and their precedences, but the parser crashed with an error about negative priority. I am able to parse the operators by hand, but it seems that I cannot get the priority right. So I thought that I might as well learn to use the OperatorPrecedenceParser.
I will show the code because I have no idea why it crashes, since I don't have any negative priority.
The language AST
module MiniML
type Exp =
| C of Cst
| Id of Id
| Lam of Id * Exp
| App of Exp * Exp
| Let of Id * Exp * Exp
| Pair of Exp * Exp
| If of Exp * Exp * Exp
and Cst = I of int | B of bool | Unit | Nil
and Id = string;;
let op = ["+";
          "-";
          "*";
          "/";
          "=";
          "<";
          ">";
          "#";
          "and";
          "or";
          ",";
          "::"
         ]
Here is the parser itself. It's my first time with parser combinators (and parsing), so if there is something terribly wrong, I'd like to know. Otherwise, just knowing why it crashes would be enough.
open MiniML
open FParsec
let ws = spaces
let operator : Parser<MiniML.Id,unit> = op |> List.map pstring |> choice
let keyword : Parser<string,unit> = ["false";"true";"let";"end";"in";"if";"then";"else";"lam"] |> List.map pstring |> choice
let fstId = asciiLetter <|> pchar '_'
let restId = fstId <|> digit <|> pchar '''
let betweenPar p = between (pchar '(' .>> ws) (pchar ')' .>> ws) p
let cstB = (stringReturn "true" (B true)) <|> (stringReturn "false" (B false))
let cstI = puint32 |>> (int >> I)
let cstU = stringReturn "()" Unit
let cstN = stringReturn "[]" Nil
let expC : Parser<Exp,unit> = cstB <|> cstI <|> cstU <|> cstN |>> C
let expIdStr = notFollowedByL keyword "Cannot use keyword as variable" >>.
               notFollowedByL operator "Cannot use operator as variable" >>.
               many1CharsTill2 fstId restId (notFollowedBy restId)
let expId : Parser<Exp,unit> = expIdStr |>> (MiniML.Exp.Id)
let exp, expRef = createParserForwardedToRef<Exp, unit>()
let expApp, expAppRef = createParserForwardedToRef<Exp, unit>()
let expLam : Parser<Exp,unit> = (pstring "lam" >>. ws >>. expIdStr .>> ws .>> pchar '.') .>> ws .>>. exp |>> Lam
let expLet = tuple3 (pstring "let" >>. ws >>. expIdStr .>> ws .>> pchar '=' .>> ws) (exp .>> ws .>> pstring "in" .>> ws) (exp .>> ws .>> pstring "end") |>> Let
let expIf = tuple3 (pstring "if" >>. ws >>. exp .>> ws) (pstring "then" >>. ws >>. exp .>> ws) (pstring "else" >>. ws >>. exp) |>> If
let closeEXP, closeEXPRef = createParserForwardedToRef<Exp, unit>()
let expBang = (pstring "!" >>% MiniML.Id "!") .>>. closeEXP |>> App
let buildList (el,ef) =
    let rec go l =
        match l with
        | (e::es) -> App(MiniML.Id "cons", Pair(e,go es))
        | [] -> C Nil
    go (el @ [ef])
let expList = between (pchar '[' .>> ws) (pchar ']') (many (exp .>>? (ws .>> pchar ';' .>> ws)) .>>. exp .>> ws
                                                      |>> buildList )
do closeEXPRef := choice [expC ; expId ; expBang ; betweenPar exp ; expList] .>> ws
do expAppRef := many1 closeEXP |>> (function (x::xs) -> List.fold (fun x y -> App(x,y)) x xs | [] -> failwith "Impossible")
let opOpp : InfixOperator<Exp,unit,unit> list =
    [
        InfixOperator("*", ws, 6, Associativity.Left, fun x y -> App(MiniML.Id "*",Pair(x,y)));
        InfixOperator("/", ws, 6, Associativity.Left, fun x y -> App(MiniML.Id "/",Pair(x,y)));
        InfixOperator("+", ws, 5, Associativity.Left, fun x y -> App(MiniML.Id "+",Pair(x,y)));
        InfixOperator("-", ws, 5, Associativity.Left, fun x y -> App(MiniML.Id "-",Pair(x,y)));
        InfixOperator("::", ws,4, Associativity.Right, fun x y -> App(MiniML.Id "cons",Pair(x,y)));
        InfixOperator("=", ws, 3, Associativity.Left, fun x y -> App(MiniML.Id "=",Pair(x,y)));
        InfixOperator("<", ws, 3, Associativity.Left, fun x y -> App(MiniML.Id "<",Pair(x,y)));
        InfixOperator(">", ws, 3, Associativity.Left, fun x y -> App(MiniML.Id ">",Pair(x,y)));
        InfixOperator("and", ws, 2, Associativity.Right, fun x y -> App(MiniML.Id "and",Pair(x,y)));
        InfixOperator("or", ws, 1, Associativity.Right, fun x y -> App(MiniML.Id "or",Pair(x,y)));
        InfixOperator(",", ws,0, Associativity.None, fun x y -> Pair(x,y) )
    ]
let opp = new OperatorPrecedenceParser<Exp,unit,unit>()
let expr = opp.ExpressionParser
let term = exp <|> betweenPar expr
opp.TermParser <- term
List.iter (fun x -> opp.AddOperator(x)) opOpp
do expRef := [expLam;expIf;expLet;expApp] |> choice |> (fun p -> p .>>. opt (expOp operator) |>> binOp )
let mainExp = expr .>> eof
Your sample code doesn't seem to be complete, since expOp and binOp are not included. When I run your code without the last two lines, the OPP throws an ArgumentOutOfRangeException with the message "The operator precedence must be greater than 0." when the comma operator is added. The problem is that you specified 0 as the precedence for the comma operator.
Such problems are easier to diagnose when you use an IDE with a fully integrated debugger like Visual Studio.

Extract coefficients from binomial expression entered as a string in Scala

I am trying to write a program that can find the roots of a quadratic equation using Scala. The input should be a quadratic equation in the form ax^2+bx+c (e.g: 5x^2+2x+3) as a string.
I managed to code the calculation of the roots but am having trouble extracting the coefficients from the input. Here's the code I wrote for extracting the coefficients so far:
def getCoef(poly: String) = {
var aT: String = ""
var bT: String = ""
var cT: String = ""
var x: Int = 2
for (i <- poly.length - 1 to 0) {
val t: String = poly(i).toString
if (x == 2) {
if (t forall Character.isDigit) aT = aT + t(i)
else if (t == "^") {i = i + 1; x = 1}
}
else if (x == 1) {
if (t forall Character.isDigit) bT = bT + t(i)
else if (t == "+" || t == "-") x = 0
}
else if (x == 0) {
if (t forall Character.isDigit) cT = cT + t(i)
}
val a: Int = aT.toInt
val b: Int = bT.toInt
val c: Int = cT.toInt
(a, b, c)
}
}
Simple solution with regex:
def getCoef(poly: String) = {
  val polyPattern = """(\d+)x\^2\+(\d+)x\+(\d+)""".r
  val matcher = polyPattern.findFirstMatchIn(poly).get
  (matcher.group(1).toInt, matcher.group(2).toInt, matcher.group(3).toInt)
}
This does not handle all cases (e.g. minus signs) and simply throws an exception if the input does not match the pattern, but it should get you going.
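One way to cover the minus-sign case is to capture each sign together with its digits and return an Option instead of throwing. This is a sketch only: it still assumes all three coefficients are written out explicitly (so "x^2+2x+3" is rejected), and QuadraticCoefficients, Poly and toInt are illustrative names.

```scala
object QuadraticCoefficients {
  // Each group captures an optional or required sign plus the digits;
  // whitespace around the signs is tolerated.
  private val Poly =
    """\s*([+-]?\d+)x\^2\s*([+-]\s*\d+)x\s*([+-]\s*\d+)\s*""".r

  // Drops inner whitespace and a leading '+' before parsing.
  private def toInt(s: String): Int =
    s.filterNot(_.isWhitespace).stripPrefix("+").toInt

  // Returns None when the input is not of the form ax^2+bx+c.
  def getCoef(poly: String): Option[(Int, Int, Int)] = poly match {
    case Poly(a, b, c) => Some((toInt(a), toInt(b), toInt(c)))
    case _             => None
  }

  def main(args: Array[String]): Unit = {
    println(getCoef("5x^2+2x+3"))  // Some((5,2,3))
    println(getCoef("5x^2-2x+3"))  // Some((5,-2,3))
    println(getCoef("not a poly")) // None
  }
}
```

Matching with the regex as an extractor requires the whole input to fit the pattern, which is why malformed strings fall through to None.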

Implementing sequences of sequences in F#

I am trying to expose a 2-dimensional array as a sequence of sequences on an object (to be able to do Seq.fold (fun x -> Seq.fold (fun ->..) [] x) [] mytype stuff specifically).
Below is a toy program that exposes the identical functionality.
From what I understand there is a lot going on here. First off, IEnumerable has an ambiguous overload and requires a type annotation to explicitly isolate which IEnumerable you are talking about.
But then there can be issues with unit as well, requiring additional help:
type blah =
class
interface int seq seq with
member self.GetEnumerator () : System.Collections.Generic.IEnumerable<System.Collections.Generic.IEnumerable<(int*int)>> =
seq{ for i = 0 to 10 do
yield seq { for j=0 to 10 do
yield (i,j)} }
end
Is there some way of getting the above code to work as intended (return a seq<seq<int>>), or am I missing something fundamental?
Well for one thing, GetEnumerator() is supposed to return IEnumerator<T> not IEnumerable<T>...
This will get your sample code to compile.
type blah =
    interface seq<seq<(int * int)>> with
        member self.GetEnumerator () =
            (seq { for i = 0 to 10 do
                       yield seq { for j = 0 to 10 do
                                       yield (i,j)} }).GetEnumerator()
    interface System.Collections.IEnumerable with
        member self.GetEnumerator () =
            (self :> seq<seq<(int * int)>>).GetEnumerator() :> System.Collections.IEnumerator
How about:
let toSeqOfSeq (array:array<array<_>>) = array |> Seq.map (fun x -> x :> seq<_>)
But this works with an array of arrays, not a two-dimensional array. Which do you want?
What are you really out to do? A seq of seqs is rarely useful. All collections are seqs, so you can just use an array of arrays, a la
let myArrayOfArrays = [|
    for i = 0 to 9 do
        yield [|
            for j = 0 to 9 do
                yield (i,j)
        |]
|]
let sumAllProds = myArrayOfArrays |> Seq.fold (fun st a ->
    st + (a |> Seq.fold (fun st (x,y) -> st + x*y) 0) ) 0
printfn "%d" sumAllProds
if that helps...
module Array2D =
    // Converts 2D array 'T[,] into seq<seq<'T>>
    let toSeq (arr : 'T [,]) =
        let f1,f2 = Array2D.base1 arr , Array2D.base2 arr
        // Last valid index in each dimension is base + length - 1
        let t1,t2 = f1 + Array2D.length1 arr - 1 , f2 + Array2D.length2 arr - 1
        seq {
            for i in f1 .. t1 do
                yield seq {
                    for j in f2 .. t2 do
                        yield Array2D.get arr i j }}
let myArray2D : string[,] = array2D [["a1"; "b1"; "c1"]; ["a2"; "b2"; "c2"]]
printf "%A" (Array2D.toSeq myArray2D)