Does PureScript support “format strings” like C / Java etc.?

I need to output a number padded with leading zeros to six digits. In C or Java I would use "%06d" as a format string to do this. Does PureScript support format strings? Or how would I achieve this?

I don't know of any module that supports printf-style functionality in PureScript. It would be very nice to have a type-safe way to format numbers.
In the meantime, I would write something like this:
import Data.String (length, fromCharArray)
import Data.Array (replicate)
-- | Pad a string with the given character up to a maximum length.
padLeft :: Char -> Int -> String -> String
padLeft c len str = prefix <> str
  where
    prefix = fromCharArray (replicate (len - length str) c)

-- | Pad a number with leading zeros up to the given length.
padZeros :: Int -> Int -> String
padZeros len num
  | num >= 0  = padLeft '0' len (show num)
  | otherwise = "-" <> padLeft '0' len (show (-num))
Which produces the following results:
> padZeros 6 8
"000008"
> padZeros 6 678
"000678"
> padZeros 6 345678
"345678"
> padZeros 6 12345678
"12345678"
> padZeros 6 (-678)
"-000678"
Edit: In the meantime, I've written a small module that can format numbers in this way:
https://github.com/sharkdp/purescript-format
For your particular example, you would need to do the following:
If you want to format Integers:
> format (width 6 <> zeroFill) 123
"000123"
If you want to format Numbers:
> format (width 6 <> zeroFill <> precision 1) 12.345
"0012.3"

Related

Ocaml: unicode string length

In OCaml, how can I compute the length of a string that may contain Unicode-encoded characters? To give an example, here is my problem:
utop # "\u{02227}";;
- : string = "∧"
utop # Caml.String.length "\u{02227}";;
- : int = 3
utop # Base.String.length "\u{02227}";;
- : int = 3
and I would like to obtain the obvious answer: 1.
If you want to count the number of extended grapheme clusters (i.e. what displays as a single graphical character), you can use uuseg. For instance,
let len = Uuseg_string.fold_utf_8 `Grapheme_cluster (fun x _ -> x + 1) 0
let n = len "∧";;
returns
val n : int = 1
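Building on the len above, here is a usage sketch (assuming the uuseg package and its Uuseg_string module are installed) showing why grapheme clusters, not bytes or code points, give the "obvious" answer:
let len = Uuseg_string.fold_utf_8 `Grapheme_cluster (fun x _ -> x + 1) 0

(* "é" written as 'e' plus U+0301 (combining acute): 3 bytes and 2 code
   points, but a single grapheme cluster. *)
let () =
  Printf.printf "%d\n" (len "e\u{0301}");  (* 1 *)
  Printf.printf "%d\n" (len "a∧b")         (* 3 *)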

PowerShell formatting numbers by variables

I have integer values (3, 60, 150, 1500) and float values (1.23354, 1.234, 1.234567, ...).
I calculate the number of digits of the biggest integer:
$nInt = [System.Math]::Ceiling([math]::log10($maxInt))
# nInt = 4
and, separately, the largest number of decimal places among the float variables: $nDec = 6.
How can I format the output so that all integers have the same string length, padded with leading spaces?
|1500
| 500
| 60
| 3
And all floats with the same string length as well?
1.234567|
1.23354 |
1.234 |
The | is just there to mark my 'point of measure'.
Of course I have to choose a font in which all characters have the same pixel width.
I am thinking of formatting with "{0:n}" or $int.ToString(""), but I can't see how to use this.
Try PadLeft or PadRight. For example, for your integers:
$int.ToString().PadLeft($nInt, ' ')
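A fuller sketch, under the assumptions that $nInt = 4 and $nDec = 6 as computed above, and that every float has a single digit before the decimal point:
$nInt = 4   # digits of the biggest integer, as computed above
$nDec = 6   # most decimal places among the floats
$ints = 3, 60, 150, 1500
$floats = 1.23354, 1.234, 1.234567

# Left-pad integers to the width of the widest one.
$ints | ForEach-Object { $_.ToString().PadLeft($nInt, ' ') }
#    3
#   60
#  150
# 1500

# Right-pad floats to "1." plus $nDec decimals = $nDec + 2 characters.
$floats | ForEach-Object { $_.ToString([cultureinfo]::InvariantCulture).PadRight($nDec + 2, ' ') }
# 1.23354
# 1.234
# 1.234567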

How to write a unicode symbol in lua

How can I write a Unicode symbol in Lua? For example, I have to write the symbol with code point 9658.
When I write
string.char(9658)
I get an error. So how is it possible to write such a symbol?
Lua does not look inside strings. So, you can just write
mychar = "►"
(added in 2015)
Lua 5.3 introduced support for UTF-8 escape sequences:
The UTF-8 encoding of a Unicode character can be inserted in a literal string with the escape sequence \u{XXX} (note the mandatory enclosing brackets), where XXX is a sequence of one or more hexadecimal digits representing the character code point.
You can also use utf8.char(9658).
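A quick sketch of both options under Lua 5.3 or later (9658 is 0x25BA; this assumes your terminal displays UTF-8):
-- Hex escape in a literal, or utf8.char with the code point;
-- both produce the same UTF-8 bytes.
print("\u{25BA}")      --> ►
print(utf8.char(9658)) --> ►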
Here is an encoder for Lua that takes a Unicode code point and produces a UTF-8 string for the corresponding character:
do
  local bytemarkers = { {0x7FF, 192}, {0xFFFF, 224}, {0x1FFFFF, 240} }
  function utf8(decimal)
    if decimal < 128 then return string.char(decimal) end
    local charbytes = {}
    for bytes, vals in ipairs(bytemarkers) do
      if decimal <= vals[1] then
        for b = bytes + 1, 2, -1 do
          local mod = decimal % 64
          decimal = (decimal - mod) / 64
          charbytes[b] = string.char(128 + mod)
        end
        charbytes[1] = string.char(vals[2] + decimal)
        break
      end
    end
    return table.concat(charbytes)
  end
end

c = utf8(0x24)    print(c.." is "..#c.." bytes.") --> $ is 1 bytes.
c = utf8(0xA2)    print(c.." is "..#c.." bytes.") --> ¢ is 2 bytes.
c = utf8(0x20AC)  print(c.." is "..#c.." bytes.") --> € is 3 bytes.
c = utf8(0x24B62) print(c.." is "..#c.." bytes.") --> 𤭢 is 4 bytes.
Maybe this can help you:
function FromUTF8(pos)
  -- Decodes one UTF-8 sequence starting at byte `pos`, reading bytes
  -- from SciTE's editor pane via editor.CharAt; returns the code point,
  -- the position of its last byte, and the sequence length in bytes.
  local mod = math.mod
  local function charat(p)
    local v = editor.CharAt[p]; if v < 0 then v = v + 256 end; return v
  end
  local v, c, n = 0, charat(pos), 1
  if c < 128 then v = c
  elseif c < 192 then
    error("Byte values between 0x80 to 0xBF cannot start a multibyte sequence")
  elseif c < 224 then v = mod(c, 32); n = 2
  elseif c < 240 then v = mod(c, 16); n = 3
  elseif c < 248 then v = mod(c, 8);  n = 4
  elseif c < 252 then v = mod(c, 4);  n = 5
  elseif c < 254 then v = mod(c, 2);  n = 6
  else
    error("Byte values between 0xFE and 0xFF cannot start a multibyte sequence")
  end
  for i = 2, n do
    pos = pos + 1; c = charat(pos)
    if c < 128 or c > 191 then
      error("Following bytes must have values between 0x80 and 0xBF")
    end
    v = v * 64 + mod(c, 64)
  end
  return v, pos, n
end
To get broader support for Unicode string content, one approach is slnunicode which was developed as part of the Selene database library. It will give you a module that supports most of what the standard string library does, but with Unicode characters and UTF-8 encoding.

preventing overlong forms when parsing UTF-8

I have been working on another UTF-8 parser as a personal exercise. My implementation works quite well and rejects most malformed sequences (replacing them with U+FFFD), but I can't seem to figure out how to implement rejection of overlong forms. Could anyone tell me how to do so?
Pseudocode:
let w = 0,  // the number of continuation bytes pending
    c = 0,  // the code point currently being constructed
    b,      // the current byte from the source stream
    valid(c) = (
      (c < 0x110000) &&
      ((c & 0xFFFFF800) != 0xD800) &&
      ((c < 0xFDD0) || (c > 0xFDEF)) &&
      ((c & 0xFFFE) != 0xFFFE))

for each b:
  if b < 0x80:
    if w > 0:  // premature ending to multi-byte sequence
      append U+FFFD to output string
      w = 0
    append U+b to output string
  else if b < 0xc0:
    if w == 0:  // unwanted continuation byte
      append U+FFFD to output string
    else:
      c |= (b & 0x3f) << (--w * 6)
      if w == 0:  // done
        if valid(c):
          append U+c to output string
  else if b < 0xfe:
    if w > 0:  // premature ending to multi-byte sequence
      append U+FFFD to output string
    w = (b < 0xe0) ? 1 :
        (b < 0xf0) ? 2 :
        (b < 0xf8) ? 3 :
        (b < 0xfc) ? 4 : 5;
    c = (b & ((1 << (6 - w)) - 1)) << (w * 6);  // ugly monstrosity
  else:
    append U+FFFD to output string

if w > 0:  // end of stream and we're still waiting for continuation bytes
  append U+FFFD to output string
If you save the number of bytes you'll need (i.e. keep a second copy of the initial value of w), you can compare the UTF-32 value of the code point (I think you are calling it c) with the number of bytes that were used to encode it. You know that:
U+0000 - U+007F 1 byte
U+0080 - U+07FF 2 bytes
U+0800 - U+FFFF 3 bytes
U+10000 - U+1FFFFF 4 bytes
U+200000 - U+3FFFFFF 5 bytes
U+4000000 - U+7FFFFFFF 6 bytes
(and I hope I have done the right math on the left column! Hex math isn't my strong point :-) )
Just as a side note: I think there are some logic/formatting errors. In the if b < 0x80 branch you test if w > 0, but what happens when w == 0 (for example, when you are decoding a plain 'A')? And shouldn't you reset c when you find an illegal code point?
Once you have the decoded character, you can tell how many bytes it should have had if properly encoded just by looking at the highest bit set.
If the highest set bit's position is <= 7, the UTF-8 encoding requires 1 octet.
If the highest set bit's position is <= 11, the UTF-8 encoding requires 2 octets.
If the highest set bit's position is <= 16, the UTF-8 encoding requires 3 octets.
etc.
If you save the original w and compare it to these values, you'll be able to tell if the encoding was proper or overlong.
I had initially thought that if at any point in time after decoding a byte, w > 0 && c == 0, you have an overlong form. However, it's more complicated than that as Jan pointed out. The simplest answer is probably to have a table like xanatos has, only rejecting anything longer than 4 bytes:
if c < 0x80 && len > 1 ||
   c < 0x800 && len > 2 ||
   c < 0x10000 && len > 3 ||
   len > 4:
  append U+FFFD to output string
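Here is a minimal runnable sketch of that table check in C++. The names utf8Length and isOverlong are illustrative, not from the answers above; len is the byte count of the decoded sequence, i.e. the initial value of w plus one:
#include <cstdint>
#include <iostream>

// Minimal number of bytes a correct UTF-8 encoding of `c` uses.
int utf8Length(uint32_t c) {
    if (c < 0x80)    return 1;
    if (c < 0x800)   return 2;
    if (c < 0x10000) return 3;
    return 4;  // valid code points end at U+10FFFF
}

// True if a code point `c` decoded from `len` bytes was overlong
// (or longer than any valid sequence).
bool isOverlong(uint32_t c, int len) {
    return len > utf8Length(c) || len > 4;
}

int main() {
    // 0xC0 0x80 is the classic 2-byte overlong encoding of U+0000.
    std::cout << std::boolalpha
              << isOverlong(0x0000, 2) << '\n'   // true: overlong NUL
              << isOverlong(0x20AC, 3) << '\n';  // false: € really needs 3 bytes
}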

Three boolean values saved in one tinyint

probably a simple question but I seem to be suffering from programmer's block. :)
I have three boolean values: A, B, and C. I would like to save the state combination as an unsigned tinyint (max 255) into a database and be able to derive the states from the saved integer.
Even though there are only a limited number of combinations, I would like to avoid hard-coding each state combination to a specific value (something like if A=true and B=true has the value 1).
I tried assigning values to the variables (A=1, B=2, C=3) and then adding them, but then I can't differentiate between A and B being true (1 + 2 = 3) and only C being true (3).
I am stumped but pretty sure that it is possible.
Thanks
Binary maths, I think. Give each flag a value that's a power of 2 (1, 2, 4, 8, etc.); then you can use the bitwise AND operator & to test each one.
Say A = 1, B = 2, C = 4
00000111 => A, B and C => 7
00000101 => A and C => 5
00000100 => C => 4
then to determine them :
if( val & 4 ) // same as if (C)
if( val & 2 ) // same as if (B)
if( val & 1 ) // same as if (A)
if((val & 4) && (val & 2) ) // same as if (C and B)
No need for a state table.
Edit, to reflect a comment: if the tinyint has a maximum value of 255, you have 8 bits to play with and can store 8 boolean values in there.
Binary math, as others have said.
Encoding:
myTinyInt = A*1 + B*2 + C*4  (assuming you convert A, B, C to 0 or 1 beforehand)
Decoding:
bool A = (myTinyInt & 1) != 0  (& is the bitwise AND operator in many languages)
bool B = (myTinyInt & 2) != 0
bool C = (myTinyInt & 4) != 0
I'll add that you should find a way to not use magic numbers. You can build masks into constants using the Left Logical/Bit Shift with a constant bit position that is the position of the flag of interest in the bit field. (Wow... that makes almost no sense.) An example in C++ would be:
enum Flags {
  kBitMask_A = (1 << 0),
  kBitMask_B = (1 << 1),
  kBitMask_C = (1 << 2),
};

uint8_t byte = 0;    // byte = 0b00000000
byte |= kBitMask_A;  // Set A, byte = 0b00000001
byte |= kBitMask_C;  // Set C, byte = 0b00000101
if (byte & kBitMask_A) {  // Test A, (0b00000101 & 0b00000001) = T
  byte &= ~kBitMask_A;    // Clear A, byte = 0b00000100
}
In any case, I would recommend looking for Bitset support in your favorite programming language. Many languages will abstract the logical operations away behind normal arithmetic or "test/set" operations.
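As a sketch of that abstraction, C++'s std::bitset hides the masks and shifts entirely (the Flag enum below is illustrative, not part of the answer above):
#include <bitset>
#include <cstdint>
#include <iostream>

enum Flag { A = 0, B = 1, C = 2 };  // bit positions, not masks

int main() {
    std::bitset<8> flags;  // 8 bits, matching a tinyint; all start false
    flags.set(A);          // A = true
    flags.set(C);          // C = true
    uint8_t stored = static_cast<uint8_t>(flags.to_ulong());  // 5, ready for the DB
    std::bitset<8> restored(stored);  // decode on the way back out
    std::cout << restored.test(A) << restored.test(B) << restored.test(C) << '\n';  // 101
    return 0;
}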
You need to use binary:
A = 1,
B = 2,
C = 4,
D = 8,
E = 16,
F = 32,
G = 64,
H = 128
This means A + B = 3 while C alone is 4; no two combinations ever add up to the same value. I've listed the maximum you can fit in a single byte: 8 values (bits).