Haskell IO with non-English characters: Unicode

Look at this. I am trying:
appendFile "out" $ show 'д'
'д' is a character from the Russian alphabet.
After that, the "out" file contains:
'\1076'
As I understand it, this is the Unicode numeric code of the character 'д'. Why does this happen? And how can I get the normal representation of my character?
For what it's worth, this works fine:
appendFile "out" "д"
Thanks.

show escapes all characters outside the ASCII range (and some inside it, such as control characters and the backslash), because it produces Haskell source syntax: '\1076' is the literal for the character with decimal code point 1076, which is д (U+0434). So don't use show here.
Since "д" works fine, just use that. If you can't because the д is actually inside a variable, you can use [c] (where c is the variable containing the character). If you need to surround it with single quotes (like show does), you can use ['\'', c, '\''].
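A minimal sketch of the difference, using only the Prelude. The names escapedLiteral and quoted are illustrative, and the file name "out" comes from the question. Note that appendFile still encodes the text using the system locale, so the bytes on disk depend on that setting.

```haskell
-- show renders a Haskell character literal, so д becomes the escape '\1076'.
escapedLiteral :: String
escapedLiteral = show 'д'

-- Building the string yourself keeps the character as-is, in single quotes.
quoted :: Char -> String
quoted c = ['\'', c, '\'']

main :: IO ()
main = do
  let c = 'д'                 -- the character arrives in a variable
  appendFile "out" [c]        -- writes д itself
  appendFile "out" (quoted c) -- writes 'д': quoted like show, but unescaped
  putStrLn escapedLiteral     -- prints '\1076'
```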

After reading your reply to my comment, I think your situation is that you have some data structure, maybe with type [(String,String)], and you'd like to output it for debugging purposes. Using show would be convenient, but it escapes non-ASCII characters.
The problem here isn't Unicode; you need a function that properly formats your data for display. I don't think show is the right choice, partly because it escapes some characters. What you need is a type class like Show, but one that formats data for human reading instead of escaping characters. In other words, you need a pretty-printer: a library that provides functions to format data for display. There are several pretty-printers on Hackage; I'd start with uulib or wl-pprint. I think either would be suitable without too much work.
Here's an example with the uulib tools. The Pretty type class is used instead of Show; the library comes with many useful instances.
import UU.PPrint
-- | Write each item to StdOut
logger :: Pretty a => a -> IO ()
logger x = putDoc $ pretty x <+> line
Running this in GHCi:
Prelude UU.PPrint> logger 'Д'
Д
Prelude UU.PPrint> logger ('Д', "other text", 54)
(Д,other text,54)
Prelude UU.PPrint>
If you want to output to a file instead of the console, you can use the hPutDoc function to output to a handle. You could also call renderSimple to produce a SimpleDoc, then pattern match on the constructors to process output, but that's probably more trouble. Whatever you do, avoid show:
Prelude UU.PPrint> show $ pretty 'Д'
"\1044"
You could also write your own type class similar to show but formatted as you like it. The Text.Printf module can be helpful if you go this route.
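A minimal sketch of such a home-made class, using only base. Display, display and displayList are hypothetical names. It borrows the trick the Prelude's Show class uses for strings: the showList method is what lets a String print differently from other lists, so here displayList gives String its own rendering without needing overlapping instances.

```haskell
import Data.List (intercalate)

-- Hypothetical Show-like class that leaves non-ASCII text readable.
class Display a where
  display :: a -> String
  -- How a list of this type is rendered; overridden for Char so that
  -- a String displays as plain text instead of a bracketed list.
  displayList :: [a] -> String
  displayList xs = "[" ++ intercalate "," (map display xs) ++ "]"

instance Display Char where
  display c = [c]     -- the character itself, never an escape code
  displayList s = s   -- a String is rendered as its text

instance Display a => Display [a] where
  display = displayList

instance Display Int where
  display = show

instance (Display a, Display b) => Display (a, b) where
  display (a, b) = "(" ++ display a ++ "," ++ display b ++ ")"

instance (Display a, Display b, Display c) => Display (a, b, c) where
  display (a, b, c) =
    "(" ++ display a ++ "," ++ display b ++ "," ++ display c ++ ")"
```

For example, display ('Д', "other text", 54 :: Int) gives (Д,other text,54), matching the pretty-printer output above, and appendFile "out" (display x) writes it to a file.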

Use Data.Text. It provides IO with locale-awareness and encoding support.

A quick web search for "UTF Haskell" should give you good links. Probably the most recommended package is the text package.
import qualified Data.Text.IO as UTF
import qualified Data.Text as T
main :: IO ()
main = UTF.appendFile "out" (T.pack "д")

To make show display national (non-ASCII) characters, put this in your code:
{-# LANGUAGE FlexibleInstances #-}
instance {-# OVERLAPPING #-} Show String where
  show = id
Then you can try:
*Main> show "ł"
ł
*Main> show "ą"
ą
*Main> show "ę"
ę
*Main> show ['ę']
ę
*Main> show ["chleb", "masło"]
[chleb,masło]
*Main> data T = T String deriving (Show)
*Main> t = T "Chleb z masłem"
*Main> t
T Chleb z masłem
*Main> show t
T Chleb z masłem

My previous solution dropped the quotes. This version keeps them, and the code now lives in a module that must be imported into your program.
{-# LANGUAGE FlexibleInstances #-}
module M where
instance {-# OVERLAPPING #-} Show String where
  show x = ['"'] ++ x ++ ['"']
Information for beginners: remember that show does not display anything. It converts data to a string, adding formatting characters such as quotes.
We can try it in WinGHCi.
Automatically, letting WinGHCi print the result:
*M> "ł"
"ł"
*M> "ą"
"ą"
*M> "ę"
"ę"
*M> ['ę']
"ę"
*M> ["chleb", "masło"]
["chleb","masło"]
*M> data T = T String deriving (Show)
*M> t = T "Chleb z masłem"
Or manually:
*M> (putStrLn . show) "ł"
"ł"
*M> (putStrLn . show) "ą"
"ą"
*M> (putStrLn . show) "ę"
"ę"
*M> (putStrLn . show) ['ę']
"ę"
*M> (putStrLn . show) ["chleb", "masło"]
["chleb","masło"]
*M> data T = T String deriving (Show)
*M> t = T "Chleb z masłem"
*M> (putStrLn . show) t
T "Chleb z masłem"
In a compiled program:
import M  -- the module defining the Show String instance above
data T = T String deriving (Show)
t :: T
t = T "Chleb z masłem"
main :: IO ()
main = do
  putStrLn "ł"
  putStrLn "ą"
  putStrLn "ę"
  putStrLn "masło"
  (putStrLn . show) ['ę']
  (putStrLn . show) ["chleb", "masło"]
  (putStrLn . show) t
I'm adding the tag "polskie znaki haskell" (Polish for "Polish characters Haskell") for Google.


Strip margin of indented triple-quote string in Purescript?

When I use triple quotes in an indented position, the indentation ends up in the output JS string too.
Comparing these two in a nested let:
let input1 = "T1\nX55.555Y-44.444\nX52.324Y-40.386"
let input2 = """T1
X66.324Y-40.386
X52.324Y-40.386"""
gives:
// single quotes with \n
"T1\x0aX55.555Y-44.444\x0aX52.324Y-40.386"
// triple quoted
"T1\x0a X66.324Y-40.386\x0a X52.324Y-40.386"
Is there an agreed-upon equivalent of Scala's stripMargin, so that I can use triple-quoted strings without having to unindent them to the top level?
Update, just to clarify what I mean, I'm currently doing:
describe "header" do
  it "should parse example header" do
    let input = """M48
;DRILL file {KiCad 4.0.7} date Wednesday, 31 January 2018 'AMt' 11:08:53
;FORMAT={-:-/ absolute / metric / decimal}
FMAT,2
METRIC,TZ
T1C0.300
T2C0.400
T3C0.600
T4C0.800
T5C1.000
T6C1.016
T7C3.400
%
"""
    doesParse input header
describe "hole" do
  it "should parse a simple hole" do
    doesParse "X52.324Y-40.386" hole
Update:
I was asked to clarify stripMargin from Scala. It's used like so:
val speech = """T1
|X66.324Y-40.386
|X52.324Y-40.386""".stripMargin
which removes the leading whitespace together with the |. stripMargin can take any margin character, but defaults to |.
More examples:
Rust has https://docs.rs/trim-margin/0.1.0/trim_margin/
Kotlin has in stdlib: https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.text/trim-margin.html
I guess it might sound like asking for left-pad ( :) ) but if there's something there already I'd rather not brew it myself…
I'm sorry you didn't get a prompt response to this one, but I have implemented this function here. In case the pull request isn't merged, here's an implementation that just depends on purescript-strings:
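import Prelude
import Data.Maybe (fromMaybe)
import Data.String (joinWith, split) as String
import Data.String.CodeUnits (dropWhile, stripPrefix) as String
import Data.String.Pattern (Pattern(..))
-- Like Scala's stripMargin: on each line, remove leading spaces/tabs followed by '|'.
-- Lines without such a margin (e.g. a first line "T1") are kept unchanged.
stripMargin :: String -> String
stripMargin =
  let
    lines = String.split (Pattern "\n")
    unlines = String.joinWith "\n"
    mapLines f = unlines <<< map f <<< lines
    stripLine line =
      fromMaybe line
        (String.stripPrefix (Pattern "|") (String.dropWhile (\c -> c == ' ' || c == '\t') line))
  in
    mapLines stripLine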
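For comparison, the same behavior as a small Haskell function using only the Prelude and Data.List. It is a sketch under the simplifying assumption that a margin is made of spaces and tabs followed by |; lines without such a margin are returned unchanged, which is how Scala's stripMargin treats them. The names splitLines and stripMargin are illustrative.

```haskell
import Data.List (intercalate)

-- Split on '\n' only, keeping a trailing empty piece (unlike Prelude.lines),
-- so joining the pieces with "\n" reproduces the original line structure.
splitLines :: String -> [String]
splitLines s = case break (== '\n') s of
  (l, _ : rest) -> l : splitLines rest
  (l, [])       -> [l]

-- Scala-style stripMargin with '|' as the margin character.
stripMargin :: String -> String
stripMargin = intercalate "\n" . map stripLine . splitLines
  where
    stripLine line =
      case dropWhile (`elem` " \t") line of
        '|' : rest -> rest  -- margin found: drop the blanks and the bar
        _          -> line  -- no margin: leave the line untouched
```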

Couldn't match expected type 'Control.Monad.Trans.Reader.ReaderT MongoContext IO a0' with actual type 'IO ()'

I want to print my dot graph taken from MongoDB and then convert it into an image.
run = do
  docs <- timeFilter -- function to fetch [Document] from mongoDB
  let dot = onlyDot docs -- exclude extra fields from the documents
  let dotObject = getObjId dot -- convert into an object
  -- convert the dot graph to a string and then to text to pass it to parseDotGraph
  let xDotGraph = parseDotGraph (B.pack (show dotObject)) :: G.DotGraph String
  Prelude.putStrLn $ B.unpack $ renderDot $ toDot xDotGraph -- this is not working, want to print
  -- addExtension (runGraphviz xDotGraph) Png "graph" -- this is not working, want to draw as an image
  printDocs dot
You need liftIO $ to the left of Prelude.putStrLn. Next time, please paste the complete error message, with line numbers and so on. Your do block runs in the ReaderT MongoContext IO monad, which wraps IO, so you can perform IO actions in it, but you have to lift them first.
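A minimal, self-contained sketch of where liftIO goes, assuming only the transformers package that ships with GHC. Ctx is a hypothetical stand-in for MongoContext, and the hard-coded string stands in for the rendered DOT graph; the point is only the lifting.

```haskell
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Reader (ReaderT, asks, runReaderT)

-- Hypothetical stand-in for MongoContext; any environment record works the same way.
newtype Ctx = Ctx { ctxDatabase :: String }

-- Runs in ReaderT Ctx IO, like the question's do block.
run :: ReaderT Ctx IO String
run = do
  db <- asks ctxDatabase               -- a ReaderT action: no lifting needed
  let rendered = "digraph { a -> b }"  -- stand-in for B.unpack (renderDot (toDot g))
  -- A bare putStrLn here is an IO () and triggers the question's type error;
  -- liftIO turns it into a ReaderT Ctx IO () action.
  liftIO $ putStrLn ("database: " ++ db)
  liftIO $ putStrLn rendered
  return rendered

main :: IO ()
main = do
  out <- runReaderT run (Ctx "graphs")
  print (length out)
```

The commented-out runGraphviz line in the question is also an IO action, so it would need the same liftIO.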

Implement Scala-style String Interpolation In Scala

I want to implement Scala-style string interpolation in Scala. Here is an example:
val str = "hello ${var1} world ${var2}"
At runtime I want to replace "${var1}" and "${var2}" with some runtime strings. However, when trying to use Regex.replaceAllIn(target: CharSequence, replacer: (Match) ⇒ String), I ran into the following problem:
import scala.util.matching.Regex
val placeholder = new Regex("""(\$\{\w+\})""")
placeholder.replaceAllIn(str, m => s"A${m.matched}B")
java.lang.IllegalArgumentException: No group with name {var1}
at java.util.regex.Matcher.appendReplacement(Matcher.java:800)
at scala.util.matching.Regex$Replacement$class.replace(Regex.scala:722)
at scala.util.matching.Regex$MatchIterator$$anon$1.replace(Regex.scala:700)
at scala.util.matching.Regex$$anonfun$replaceAllIn$1.apply(Regex.scala:410)
at scala.util.matching.Regex$$anonfun$replaceAllIn$1.apply(Regex.scala:410)
at scala.collection.Iterator$class.foreach(Iterator.scala:743)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1174)
at scala.util.matching.Regex.replaceAllIn(Regex.scala:410)
... 32 elided
However, when I removed '$' from the regular expression, it worked:
val placeholder = new Regex("""(\{\w+\})""")
placeholder.replaceAllIn(str, m => s"A${m.matched}B")
res2: String = hello $A{var1}B world $A{var2}B
So my question is whether this is a bug in Scala's Regex, and if so, whether there are other elegant ways to achieve the same goal (other than a brute-force replaceAllLiterally on every placeholder).
$ is treated specially in the replacement string. This is described in the documentation of replaceAllIn:
In the replacement String, a dollar sign ($) followed by a number will be interpreted as a reference to a group in the matched pattern, with numbers 1 through 9 corresponding to the first nine groups, and 0 standing for the whole match. Any other character is an error. The backslash (\) character will be interpreted as an escape character and can be used to escape the dollar sign. Use Regex.quoteReplacement to escape these characters.
(Actually, that doesn't mention named group references, so I guess it's only sort of documented.)
Anyway, the takeaway here is that you need to escape the $ characters in the replacement string if you don't want them to be treated as references.
new scala.util.matching.Regex("""(\$\{\w+\})""")
.replaceAllIn("hello ${var1} world ${var2}", m => s"A\\${m.matched}B")
// "hello A${var1}B world A${var2}B"
It's hard to tell what behavior you're expecting. The issue is that s"${m.matched}" turns into "${var1}" (and "${var2}"), and in a replacement string '$' is a special character meaning "place the group named {var1} here instead".
For example:
scala> placeholder.replaceAllIn(str, m => "$1")
res0: String = hello ${var1} world ${var2}
It replaces each match with the first capturing group, which here is the entire match, so the string comes back unchanged.
It's hard to tell exactly what you're doing, but you could escape any $ like so:
scala> placeholder.replaceAllIn(str, m => s"${m.matched.replace("$","\\$")}")
res1: String = hello ${var1} world ${var2}
If what you really want is to evaluate var1/var2 as variables in the method's local scope, that's not possible at runtime. In fact, the s"Hello, $name" pattern is converted into new StringContext("Hello, ", "").s(name) at compile time.
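The question also asks for alternatives to regex-based replacement. One way to avoid the replacement-string rules altogether is to scan the template directly and look each name up, so substituted values are never reinterpreted. A minimal sketch in Haskell (the language used elsewhere on this page), assuming placeholders of the form ${name} and values supplied in a Data.Map from the containers package that ships with GHC; interpolate is a hypothetical name, and unknown or unterminated placeholders are copied through unchanged.

```haskell
import qualified Data.Map as Map

-- Replace every ${name} in the template with its value from the map.
interpolate :: Map.Map String String -> String -> String
interpolate env = go
  where
    go ('$' : '{' : rest) =
      case break (== '}') rest of
        -- Known name: insert its value verbatim (a '$' in it stays literal).
        -- Unknown name: put the placeholder back as it was.
        (name, '}' : after) ->
          Map.findWithDefault ("${" ++ name ++ "}") name env ++ go after
        -- No closing brace: keep the text literally.
        _ -> "${" ++ go rest
    go (c : rest) = c : go rest
    go [] = []
```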

How should characters be escaped in GitHub Markdown code blocks

It appears that a GitHub fenced code block doesn't accept arbitrary character strings. A block like this:
```haskell
sinusoid1 = plot_lines_values .~ [[ (x,(am x)) | x <- [0,(0.5)..400]]]
$ plot_lines_style . line_color .~ opaque blue
$ plot_lines_title .~ "am"
$ def
```
ends up displaying the code truncated at the first < symbol.
A real example of this is here:
https://github.com/timbod7/haskell-chart/wiki/example-1
Do I need to escape certain character strings in GitHub Markdown code blocks? How do I do this?
If you put a backslash before the <, the Markdown parser treats it as a literal character rather than the start of an HTML tag.
So to show this:
#interface ViewController ()<MPC_MaxCharacterDelimitedTextFieldDelegate>
escape the < with a backslash:
#interface ViewController ()\<MPC_MaxCharacterDelimitedTextFieldDelegate>

How to convert Unicode characters to escape codes

So, I have a bunch of strings like this: {\b\cf12 よろてそ }. I'm thinking I could iterate over each character and replace any Unicode character (Edit: anything where AscW(char) > 127 or < 0) with a Unicode escape code (\u###). However, I'm not sure how to do this programmatically. Any suggestions?
Clarification:
I have a string like {\b\cf12 よろてそ } and I want a string like {\b\cf12 [STUFF]}, where [STUFF] will display as よろてそ when I view the rtf text.
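You can simply use the AscW() function to get the correct value:
sRTF = "\u" & CStr(AscW(char)) & "?"  ' "?" is the fallback character that Unicode-aware RTF readers skip after \uN
Note that, unlike other Unicode escapes, RTF uses the signed 16-bit (short int) decimal value of each character. VB6's AscW returns exactly that, since it returns an Integer (characters above &H7FFF come out negative), which makes the conversion in VB6 really quite easy.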
Edit
As MarkJ points out in a comment, you would only do this for characters outside the range 0-127, but then you would also need to give some characters inside the 0-127 range special handling, such as RTF's own special characters \, { and } when they appear in plain text.
Another, more roundabout way would be to add MSScript.OCX to the project and call VBScript's escape function. For example:
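The AscW approach can be sketched in Haskell to make its two details explicit: RTF wants a signed 16-bit decimal value, and each \uN is followed by one fallback character (this assumes the default \uc1). The sketch also splits characters above U+FFFF into UTF-16 surrogate pairs. ASCII is passed through untouched because the question's strings already contain RTF control words such as \b; plain text would additionally need the special characters \, { and } escaped. rtfEscape and utf16Units are hypothetical names.

```haskell
import Data.Char (ord)
import Data.Int (Int16)

-- UTF-16 code units for one character (a surrogate pair above U+FFFF).
utf16Units :: Char -> [Int]
utf16Units c
  | n < 0x10000 = [n]
  | otherwise   = [0xD800 + m `div` 0x400, 0xDC00 + m `mod` 0x400]
  where
    n = ord c
    m = n - 0x10000

-- RTF's \uN takes a signed 16-bit decimal, so units above 0x7FFF wrap negative.
toSigned16 :: Int -> Int
toSigned16 u = fromIntegral (fromIntegral u :: Int16)

-- ASCII is kept as-is; every other character becomes \uN? per UTF-16 unit.
rtfEscape :: String -> String
rtfEscape = concatMap escapeChar
  where
    escapeChar c
      | ord c < 128 = [c]
      | otherwise   = concatMap unit (utf16Units c)
    unit u = "\\u" ++ show (toSigned16 u) ++ "?"
```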
Sub main()
    Dim s As String
    s = ChrW$(&H3088) & ChrW$(&H308D) & ChrW$(&H3066) & ChrW$(&H305D)
    Debug.Print MyEscape(s)
End Sub
Function MyEscape(s As String) As String
    Dim scr As Object
    Set scr = CreateObject("MSScriptControl.ScriptControl")
    scr.Language = "VBScript"
    scr.Reset
    MyEscape = scr.eval("escape(" & dq(s) & ")")
End Function
Function dq(s)
    dq = Chr$(34) & s & Chr$(34)
End Function
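The main routine passes in the original Japanese characters, and the debug output is:
%u3088%u308D%u3066%u305D
These are hexadecimal %uXXXX escapes, so they would still need converting to RTF's decimal \uN form (for example, %u3088 becomes \u12424?).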
HTH