What is the correct use of the select statement in Go?

Good day, everyone.
I have been learning the fundamentals of Go and how to use its channel-based concurrency paradigm.
However, while playing with some code I wrote to exercise the select statement, I found some strange behavior:
func main() {
even := make(chan int)
odd := make(chan int)
quit := make(chan bool)
//send
go send(even, odd, quit)
//receive
receive(even, odd, quit)
fmt.Println("Exiting")
}
func send(e, o chan<- int, q chan<- bool) {
for i := 0; i < 100; i++ {
if i%2 == 0 {
e <- i
} else {
o <- i
}
}
close(e)
close(o)
q <- true
close(q)
}
func receive(e, o <-chan int, q <-chan bool) {
for cont, i := true, 0; cont; i++ {
fmt.Println("value of i", i, cont)
select {
case v := <-e:
fmt.Println("From even channel:", v)
case v := <-o:
fmt.Println("from odd channel:", v)
case v := <-q:
fmt.Println("Got exit message", v)
// return // have also tried this instead
cont = false
}
}
}
When I run this simple program, the i counter sometimes goes past 100. Instead of finishing with "from odd channel: 99", the for loop keeps going and prints one or more zeroes from the even/odd channels, seemingly at random. It is as if the quit channel's message were delayed in reaching its case while the even/odd channels kept delivering integers, so the loop does not exit right after the even/odd channels have been closed.
value of i 97 true
from odd channel: 97
value of i 98 true
From even channel: 98
value of i 99 true
from odd channel: 99
value of i 100 true
From even channel: 0
value of i 101 true
From even channel: 0
value of i 102 true
from odd channel: 0
value of i 103 true
From even channel: 0
value of i 104 true
Got exit message true
Exiting
I have tried to search for the correct use of the select statement and its cases, but I haven't been able to find the problem with my code.
The same behavior can be reproduced on the Go Playground.
Thanks for your attention.

The program is printing 0 because receive on a closed channel returns the zero value. Here's one way to accomplish your goal.
First, eliminate the q channel. Closing the o and e channels is sufficient to indicate that the sender is done.
func send(e, o chan<- int) {
for i := 0; i < 100; i++ {
if i%2 == 0 {
e <- i
} else {
o <- i
}
}
close(e)
close(o)
}
When receiving values, use the two-value receive to detect when a zero value is returned because the channel is closed. Set the channel to nil once it is closed: a receive on a nil channel blocks forever, so select never chooses that case again. Loop until both channels are nil.
func receive(e, o <-chan int) {
for e != nil || o != nil {
select {
case v, ok := <-e:
if !ok {
e = nil
continue
}
fmt.Println("From even channel:", v)
case v, ok := <-o:
if !ok {
o = nil
continue
}
fmt.Println("From odd channel:", v)
}
}
}


How to use pgcrypto with go-pg for column encryption?

Could you please help me with my problem? I'm trying to use column encryption in Postgres, and I have one small question: how can I transform a column value (e.g. "test_value") into PGP_SYM_ENCRYPT('test_value', 'KEY') in an "insert" SQL query?
As I understand it, custom types could be the solution for me, but some things aren't clear. Maybe someone has an example for my case?
(I have seen these AWS docs about using pgcrypto: https://docs.aws.amazon.com/dms/latest/sql-server-to-aurora-postgresql-migration-playbook/chap-sql-server-aurora-pg.security.columnencryption.html)
What I did:
type sstring struct {
string
}
var _ types.ValueAppender = (*sstring)(nil)
func (tm sstring) AppendValue(b []byte, flags int) ([]byte, error) {
if flags == 1 {
b = append(b, '\'')
}
b = []byte("PGP_SYM_ENCRYPT('123456', 'AES_KEY')")
if flags == 1 {
b = append(b, '\'')
}
return b, nil
}
var _ types.ValueScanner = (*sstring)(nil)
func (tm *sstring) ScanValue(rd types.Reader, n int) error {
if n <= 0 {
tm.string = ""
return nil
}
tmp, err := rd.ReadFullTemp()
if err != nil {
return err
}
tm.string = string(tmp)
return nil
}
type model struct {
ID uint `pg:"id"`
Name string `pg:"name"`
Crypto sstring `pg:"crypto,type:sstring"`
tableName struct{} `pg:"models"`
}
----------
_, err := r.h.ModelContext(ctx, model).Insert()
And... the process just does nothing. It does not respond, does not fail, and does not create a row in the SQL table. Nothing.
Anyway, my question is how to wrap a column in an SQL function using the go-pg ORM.
I tried to use https://github.com/go-pg/pg/blob/v10/example_custom_test.go#L13-L49 as a custom type handler example, but something went wrong. =(

Dafny prove lemmas in a high-order polymorphic function

I have been working on an algorithm (see "Dafny cannot prove function-method equivalence, with High-Order-Polymorphic Recursive vs Linear Iterative") to count the number of subsequences of a sequence that satisfy a property P. For instance, how many subsequences satisfy 'the number of positives in its left part is greater than in its right part'.
But this is just to offer some context.
The important function is the CountAux function below. It takes a predicate P (like "x is positive"), a sequence sequ (in my use case, a sequence of sequences), an index i to move through the sequence, and an upper bound j:
function CountAux<T>(P: T -> bool, sequ: seq<T>, i: int, j:int): int
requires 0 <= i <= j <= |sequ|
decreases j - i //necessary to prove termination
ensures CountAux(P,sequ,i,j)>=0;
{
if i == j then 0
else (if P(sequ[i]) then 1 else 0) + CountAux(P, sequ, i+1,j)
}
To finish, it turns out I need a couple of lemmas, which I strongly believe are true, but I have no idea how to prove them. Could anyone help or provide the proofs? They do not seem difficult, but I am not used to proving things in Dafny (surely they can be done by structural induction).
These are the lemmas I would like to prove:
lemma countLemma1<T>(P: T -> bool, sequ: seq<T>,i:int,j:int,k:int)
requires 0<=i<=k<=j<=|sequ|
ensures CountAux(P,sequ,i,j)==CountAux(P,sequ,i,k)+CountAux(P,sequ,k,j)
//i.e. [i,k) [k,j)
{
if sequ == [] {
assert CountAux(P,sequ,i,j)==CountAux(P,sequ,i,k)+CountAux(P,sequ,k,j);
}
else{
assert CountAux(P,sequ,i,j)==CountAux(P,sequ,i,k)+CountAux(P,sequ,k,j);
}
}
lemma countLemma2<T>(P: T -> bool, sequ: seq<T>,i:int,j:int,k:int,l:int)
requires 0<=i<=j<=|sequ| && 0<=k<=l<=j-i
ensures CountAux(P,sequ[i..j],k,l)==CountAux(P,sequ,i+k,i+l)
//that is, counting the subsequence is the same as counting the original sequence with certain displacements
{
if sequ == [] {
assert CountAux(P,sequ[i..j],k,l)==CountAux(P,sequ,i+k,i+l);
}
else{
assert CountAux(P,sequ[i..j],k,l)==CountAux(P,sequ,i+k,i+l);
}
}
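Before attempting the proofs, it can help to spot-check both properties on concrete data. The following is a rough Go analogue of CountAux; the name countAux, the sample slice, and the chosen indices are assumptions for illustration. It tests single instances only and proves nothing.

```go
package main

import "fmt"

// countAux mirrors the Dafny CountAux: it counts the indices n in [i, j)
// for which p(s[n]) holds. The caller must ensure 0 <= i <= j <= len(s).
func countAux[T any](p func(T) bool, s []T, i, j int) int {
	if i == j {
		return 0
	}
	c := 0
	if p(s[i]) {
		c = 1
	}
	return c + countAux(p, s, i+1, j)
}

func main() {
	s := []int{3, -1, 4, -1, 5, -9, 2, 6}
	pos := func(x int) bool { return x > 0 }

	// Shape of countLemma1: [i,j) splits into [i,k) and [k,j).
	i, k, j := 1, 4, 7
	fmt.Println(countAux(pos, s, i, j) == countAux(pos, s, i, k)+countAux(pos, s, k, j))

	// Shape of countLemma2: counting inside s[i:j] over [k,l) equals
	// counting in s over [i+k, i+l).
	i, j, k, l := 2, 7, 1, 4
	fmt.Println(countAux(pos, s[i:j], k, l) == countAux(pos, s, i+k, i+l))
}
```

Both comparisons hold for this input, which is consistent with the lemmas but says nothing about the general case; that still needs the inductive argument.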
EDIT:
I have been trying, but it seems I am misunderstanding structural induction. I identified three base cases. Beyond them, I expect that if i==0 the induction should go through (it fails), and so if i>0 I try to reach the i==0 case by induction:
lemma countID<T>(P: T -> bool, sequ: seq<T>,i:int,j:int,k:int)//[i,k) [k,j)
requires 0<=i<=k<=j<=|sequ|
ensures CountAux(P,sequ,i,j)==CountAux(P,sequ,i,k)+CountAux(P,sequ,k,j)
{
if sequ == [] || (j==0) || (k==0) {
assert CountAux(P,sequ,i,j)==CountAux(P,sequ,i,k)+CountAux(P,sequ,k,j);
}
else {
if (i==0) {
countID(P,sequ[1..],i,j-1,k-1);
assert CountAux(P,sequ[1..],i,j-1)
==CountAux(P,sequ[1..],i,k-1)+CountAux(P,sequ[1..],k-1,j-1);
assert CountAux(P,sequ,i,j)==CountAux(P,sequ,i,k)+CountAux(P,sequ,k,j);
}
else{
//countID(P,sequ[i..],0,j-i,k-i);
//assert CountAux(P,sequ[i..],0,j-i)
// ==CountAux(P,sequ[i..],0,k-i)+CountAux(P,sequ[i..],k-i,j-i);
countID(P,sequ[1..],i-1,j-1,k-1);
assert CountAux(P,sequ[1..],i-1,j-1)
==CountAux(P,sequ[1..],i-1,k-1)+CountAux(P,sequ[1..],k-1,j-1);
}
//assert CountAux(P,sequ,i,j)==CountAux(P,sequ,i,k)+CountAux(P,sequ,k,j);
}
}
You can prove your lemmas recursively.
See https://www.rise4fun.com/Dafny/tutorialcontent/Lemmas#h25 for a detailed explanation. It also has an example that happens to be very similar to your problem.
lemma countLemma1<T>(P: T -> bool, sequ: seq<T>,i:int,j:int,k:int)
requires 0<=i<=k<=j<=|sequ|
ensures CountAux(P,sequ,i,j)==CountAux(P,sequ,i,k)+CountAux(P,sequ,k,j)
decreases j - i
//i.e. [i,k) [k,j)
{
if i == j {
assert CountAux(P,sequ,i,j)==CountAux(P,sequ,i,k)+CountAux(P,sequ,k,j);
}
else{
if i == k {
assert CountAux(P,sequ,i,j)==CountAux(P,sequ,i,k)+CountAux(P,sequ,k,j);
}
else {
countLemma1(P, sequ, i+1, j, k);
assert CountAux(P,sequ,i,j)==CountAux(P,sequ,i,k)+CountAux(P,sequ,k,j);
}
}
}
lemma countLemma2<T>(P: T -> bool, sequ: seq<T>,i:int,j:int,k:int,l:int)
requires 0<=i<=j<=|sequ| && 0<=k<=l<=j-i
ensures CountAux(P,sequ[i..j],k,l)==CountAux(P,sequ,i+k,i+l)
decreases j - i
//that is, counting the subsequence is the same as counting the original sequence with certain displacements
{
if i == j {
assert CountAux(P,sequ[i..j],k,l) == CountAux(P,sequ,i+k,i+l);
}
else{
if k == l {
assert CountAux(P,sequ[i..j],k,l) == CountAux(P,sequ,i+k,i+l);
}
else {
countLemma1(P, sequ[i..j], k, l, l-1);
assert CountAux(P,sequ[i..j],k,l) == CountAux(P,sequ[i..j],k,l-1) + CountAux(P,sequ[i..j],l-1,l);
countLemma1(P, sequ, i+k, i+l, i+l-1);
assert CountAux(P,sequ,i+k,i+l) == CountAux(P,sequ,i+k,i+l-1) + CountAux(P,sequ,i+l-1,i+l);
countLemma2(P, sequ, i, j-1, k ,l-1);
assert CountAux(P,sequ[i..(j-1)],k,l-1) == CountAux(P,sequ,i+k,i+l-1);
lastIndexDoesntMatter(P, sequ, i,j,k,l);
assert CountAux(P,sequ[i..j],k,l-1) == CountAux(P,sequ[i..(j-1)],k,l-1); // this part is what requires two additional lemmas
assert CountAux(P,sequ[i..j],l-1,l) == CountAux(P,sequ,i+l-1,i+l);
assert CountAux(P,sequ[i..j],k,l) == CountAux(P,sequ,i+k,i+l);
}
}
}
lemma lastIndexDoesntMatter<T>(P: T -> bool, sequ: seq<T>,i:int,j:int,k:int,l:int)
requires i != j
requires k != l
requires 0<=i<=j<=|sequ| && 0<=k<=l<=j-i
ensures CountAux(P,sequ[i..j],k,l-1) == CountAux(P,sequ[i..(j-1)],k,l-1)
{
assert l-1 < j;
if j == i + 1 {
}
else {
unusedLastIndex(P, sequ[i..j], k, l-1);
assert sequ[i..(j-1)] == sequ[i..j][0..(|sequ[i..j]|-1)];
assert CountAux(P,sequ[i..j],k,l-1) == CountAux(P,sequ[i..(j-1)],k,l-1);
}
}
lemma unusedLastIndex<T>(P: T -> bool, sequ: seq<T>, i: int, j:int)
requires 1 < |sequ|
requires 0 <= i <= j < |sequ|
ensures CountAux(P,sequ,i,j) == CountAux(P,sequ[0..(|sequ|-1)],i,j)
decreases j-i
{
if i == j{
}
else {
unusedLastIndex(P, sequ, i+1, j);
}
}

Writing a Swift function that returns itself

I have this piece of code in Python:
def f(x, y):
# do something...
return f
I'm trying to write this in Swift but can't figure out whether it's possible. The return type would get infinitely long.
Here's part of the game I'm trying to recreate, written in Python. It's a dice game with multiple commentary functions that get invoked on each round. After every round finishes, the commentary function can return itself, possibly with some changes (such as different variables in the enclosing scope):
def say_scores(score0, score1):
"""A commentary function that announces the score for each player."""
print("Player 0 now has", score0, "and Player 1 now has", score1)
return say_scores
def announce_lead_changes(previous_leader=None):
"""Return a commentary function that announces lead changes."""
def say(score0, score1):
if score0 > score1:
leader = 0
elif score1 > score0:
leader = 1
else:
leader = None
if leader != None and leader != previous_leader:
print('Player', leader, 'takes the lead by', abs(score0 - score1))
return announce_lead_changes(leader)
return say
def both(f, g):
"""Return a commentary function that says what f says, then what g says."""
def say(score0, score1):
return both(f(score0, score1), g(score0, score1))
return say
def announce_highest(who, previous_high=0, previous_score=0):
"""Return a commentary function that announces when WHO's score
increases by more than ever before in the game.
assert who == 0 or who == 1, 'The who argument should indicate a player.'"""
# BEGIN PROBLEM 7
"*** YOUR CODE HERE ***"
def say(score0,score1):
scores = [score0,score1]
score_diff = scores[who]-previous_score
if score_diff > previous_high:
print(score_diff,"point(s)! That's the biggest gain yet for Player",who)
return announce_highest(who,score_diff,scores[who])
return announce_highest(who,previous_high,scores[who])
return say
# END PROBLEM 7
The play function that repeats until some player reaches some score:
def play(strategy0, strategy1, score0=0, score1=0, dice=six_sided,
goal=GOAL_SCORE, say=silence):
"""Simulate a game and return the final scores of both players, with Player
0's score first, and Player 1's score second.
A strategy is a function that takes two total scores as arguments (the
current player's score, and the opponent's score), and returns a number of
dice that the current player will roll this turn.
strategy0: The strategy function for Player 0, who plays first.
strategy1: The strategy function for Player 1, who plays second.
score0: Starting score for Player 0
score1: Starting score for Player 1
dice: A function of zero arguments that simulates a dice roll.
goal: The game ends and someone wins when this score is reached.
say: The commentary function to call at the end of the first turn.
"""
player = 0 # Which player is about to take a turn, 0 (first) or 1 (second)
# BEGIN PROBLEM 5
"*** YOUR CODE HERE ***"
scores = [score0,score1]
strategies = [strategy0,strategy1]
while score0 < goal and score1 < goal:
scores[player] += take_turn(strategies[player](scores[player], scores[other(player)]),
scores[other(player)], dice)
swap = is_swap(scores[player], scores[other(player)])
player = other(player)
if swap:
scores[0],scores[1] = scores[1], scores[0]
score0,score1 = scores[0],scores[1]
# END PROBLEM 5
# BEGIN PROBLEM 6
"*** YOUR CODE HERE ***"
say = say(score0,score1)
# END PROBLEM 6
return score0, score1
Let's try to write such a thing.
func f() {
return f
}
Now the compiler complains because f is not declared to return anything when it does return something.
Okay, let's try to add a return type, i.e. a closure that accepts no parameters and returns nothing.
func f() -> (() -> ()) {
return f
}
Now the compiler complains that f is () -> (() -> ()), and so cannot be converted to () -> ().
We should edit the declaration to return a () -> (() -> ()), right?
func f() -> (() -> (() -> ())) {
return f
}
Now f becomes a () -> (() -> (() -> ())), which cannot be converted to a () -> (() -> ())!
See the pattern now? This will continue forever.
Therefore, you can only do this in a type-unsafe way, returning Any:
func f() -> Any { return f }
Usage:
func f() -> Any {
print("Hello")
return f
}
(f() as! (() -> Any))()
The reason this is possible in Python is that Python is dynamically typed, so you never have to spell out the return type.
Note that I do not encourage you to write this kind of code in Swift. When you code in Swift, try to solve the problem with a Swift mindset. In other words, you should think of another way of solving the problem that does not involve a function like this.
Not exactly what you want, perhaps, but you can do something similar with a closure:
typealias Closure = (Int) -> Int
func doStuff(action: @escaping Closure, value: Int) -> Closure {
let x = action(value)
//do something
return action
}
Well, actually, you can do something like that in Swift, but you have to separate the linear part of the code from the recursive part and wrap the recursive part in a struct:
// Recursive code goes here:
struct Rec<T> {
let call: (T) -> Rec<T> // when outside code calls it, it runs the linear part and returns the recursive wrapper
init(closure: @escaping (T) -> Void) { // create a new loop around the linear `closure`
self.call = {
closure($0) // execute linear code
return Rec(closure: closure) // return recursive wrapper
}
}
subscript(input: T) -> Rec<T> { // this exists just to simulate `f(x)` calls, using square-bracket notation
return self.call(input)
}
}
// Linear code goes here
let sayScores = Rec { (score0: Int, score1: Int) in
print("Player 0 now has", score0, "and Player 1 now has", score1)
}
Usage:
let temp = sayScores.call((1, 2)) // will print: Player 0 now has 1 and Player 1 now has 2
temp[(0, 0)][(10, 42)] // temp is `Rec<(Int, Int)>`
// will print:
// Player 0 now has 0 and Player 1 now has 0
// Player 0 now has 10 and Player 1 now has 42
So you may make it work, but I don't know whether you should use it in Swift.
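For comparison only, the underlying trick (giving the recursive type a name so the return type stays finite) can be sketched in a language where a named function type may mention itself. The following is Go rather than Swift; the type name Say and the use of -1 for "no leader yet" are assumptions of this sketch, and the two functions loosely follow say_scores and announce_lead_changes from the question.

```go
package main

import "fmt"

// Say is a commentary function that returns the commentary function to
// use on the next round. The named type refers to itself, so there is
// no infinitely long return type to write down.
type Say func(score0, score1 int) Say

// sayScores announces both scores and returns itself.
func sayScores(score0, score1 int) Say {
	fmt.Println("Player 0 now has", score0, "and Player 1 now has", score1)
	return sayScores
}

// announceLeadChanges returns a Say that reports when the leader changes.
// prev is the previous leader, or -1 if nobody has led yet.
func announceLeadChanges(prev int) Say {
	return func(score0, score1 int) Say {
		leader, diff := -1, 0
		switch {
		case score0 > score1:
			leader, diff = 0, score0-score1
		case score1 > score0:
			leader, diff = 1, score1-score0
		}
		if leader != -1 && leader != prev {
			fmt.Println("Player", leader, "takes the lead by", diff)
		}
		return announceLeadChanges(leader)
	}
}

func main() {
	var say Say = sayScores
	say = say(1, 2)
	say = say(10, 42)

	lead := announceLeadChanges(-1)
	lead = lead(5, 3) // Player 0 takes the lead by 2
	lead = lead(6, 3) // silent: the leader did not change
	lead = lead(6, 9) // Player 1 takes the lead by 3
}
```

The Rec struct above plays the same role in Swift: a nominal type gives the compiler something finite to name.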

Why does this multiple recursion fail over a certain number of recursions?

This code adds up all the integers from 1 up to a given number, but it crashes with "Segmentation fault: 11" (or a bad memory access) for inputs of 104829 or larger. Why?
import Foundation
func sigma(_ m: Int64) -> Int64 {
if (m <= 0 ) {
return 0
} else {
return m + sigma(m - 1)
}
}
let number: Int64 = 104829
let answer = sigma(number)
N.B.: sigma(104828) = 5494507206
Running in Terminal on macOS 10.11 on a Core 2 Duo MacBook Pro with 8 GB RAM (in case that's relevant!)
You're getting a Stack Overflow. You can get/set the stack size of your current process using getrlimit(2)/setrlimit(2). Here's an example usage:
import Darwin // Unnecessary if you already have Foundation imported
func getStackByteLimit() -> rlimit? {
var limits = rlimit()
guard getrlimit(RLIMIT_STACK, &limits) != -1 else {
perror("Error with getrlimit")
return nil
}
return limits
}
func setStackLimit(bytes: UInt64) -> Bool {
guard let max = getStackByteLimit()?.rlim_max else { return false }
var limits = rlimit(rlim_cur: bytes, rlim_max: max)
guard setrlimit(RLIMIT_STACK, &limits) != -1 else {
perror("Error with setrlimit")
return false
}
return true
}
By default, it's 8,388,608 bytes (2,048 pages of 4,096 bytes). With the crash appearing at a depth of about 104,829 calls, that works out to roughly 80 bytes of stack per call.
Yours is a textbook example of an algorithm that cannot be tail-call optimized. The result of the recursive call isn't returned directly but is used as an operand for addition. Because of this, the compiler can't generate code that eliminates stack frames during recursion. They must stay in order to keep track of the addition that still needs to be done. This algorithm can be improved by using an accumulator parameter:
func sigma(_ m: Int64, acc: Int64 = 0) -> Int64 {
if (m <= 0 ) {
return acc
} else {
return sigma(m - 1, acc: acc + m)
}
}
In this code, the result of the recursive call is returned directly. Because of this, the compiler can emit code that reuses the current stack frame instead of adding a new one. This should prevent stack overflows, although Swift does not guarantee tail-call optimization, so do not rely on it in unoptimized builds.
But really, you can just do this in constant time, without any recursive nonsense :p
func sum(from start: Int64 = 0, to end: Int64) -> Int64 {
let count = end - start + 1
return (start * count + end * count) / 2
}
print(sum(to: 50))
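The accumulator version is, in effect, a loop, and writing it as one removes any dependence on the optimizer. A minimal sketch in Go (not Swift; the function names sigmaLoop and sigmaClosed are made up for this example) that also cross-checks the closed form against the value quoted in the question:

```go
package main

import "fmt"

// sigmaLoop adds 1..m iteratively, using constant stack space.
func sigmaLoop(m int64) int64 {
	var acc int64
	for ; m > 0; m-- {
		acc += m
	}
	return acc
}

// sigmaClosed uses the closed form m*(m+1)/2 for m > 0.
func sigmaClosed(m int64) int64 {
	if m <= 0 {
		return 0
	}
	return m * (m + 1) / 2
}

func main() {
	// Both print 5494507206, the sigma(104828) value from the question.
	fmt.Println(sigmaLoop(104828))
	fmt.Println(sigmaClosed(104828))
}
```

Neither function grows the stack with m, so the input size that crashed the recursive version is no longer a limit.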

How to read utf16 text file to string in golang?

I can read the file into a byte array,
but when I convert it to a string,
it treats the UTF-16 bytes as ASCII.
How do I convert it correctly?
package main
import ("fmt"
"os"
"bufio"
)
func main(){
// read whole the file
f, err := os.Open("test.txt")
if err != nil {
fmt.Printf("error opening file: %v\n",err)
os.Exit(1)
}
r := bufio.NewReader(f)
var s,b,e = r.ReadLine()
if e==nil{
fmt.Println(b)
fmt.Println(s)
fmt.Println(string(s))
}
}
output:
false
[255 254 91 0 83 0 99 0 114 0 105 0 112 0 116 0 32 0 73 0 110 0 102 0 111 0 93 0
13 0]
S c r i p t I n f o ]
Update:
After testing the two examples, I now understand what the exact problem is.
On Windows, if I add a line break (CR+LF) at the end of the line, the CR is read as part of the line, because ReadLine cannot handle UTF-16 correctly ([0D 0A] = ok, [0D 00 0A 00] = not ok).
If ReadLine could recognize UTF-16, it would understand [0D 00 0A 00] and return []uint16 rather than []byte.
So I think I should not use bufio.NewReader, as it is not able to read UTF-16. I don't see a way to pass ReadLine a flag indicating whether the text is UTF-8, UTF-16LE/BE, or UTF-32. Is there any line-reading function for Unicode text in the Go library?
The latest version of golang.org/x/text/encoding/unicode makes it easier to do this because it includes unicode.BOMOverride, which will intelligently interpret the BOM.
Here is ReadFileUTF16(), which is like os.ReadFile() but decodes UTF-16.
package main
import (
"bytes"
"fmt"
"io/ioutil"
"log"
"strings"
"golang.org/x/text/encoding/unicode"
"golang.org/x/text/transform"
)
// Similar to ioutil.ReadFile() but decodes UTF-16. Useful when
// reading data from MS-Windows systems that generate UTF-16BE files,
// but will do the right thing if other BOMs are found.
func ReadFileUTF16(filename string) ([]byte, error) {
// Read the file into a []byte:
raw, err := ioutil.ReadFile(filename)
if err != nil {
return nil, err
}
// Make a transformer that converts MS-Win default to UTF8:
win16be := unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM)
// Make a transformer that is like win16be, but abides by BOM:
utf16bom := unicode.BOMOverride(win16be.NewDecoder())
// Make a Reader that uses utf16bom:
unicodeReader := transform.NewReader(bytes.NewReader(raw), utf16bom)
// Decode and return the result:
decoded, err := ioutil.ReadAll(unicodeReader)
return decoded, err
}
func main() {
data, err := ReadFileUTF16("inputfile.txt")
if err != nil {
log.Fatal(err)
}
final := strings.Replace(string(data), "\r\n", "\n", -1)
fmt.Println(final)
}
Here is NewScannerUTF16, which is like os.Open() but returns a reader that decodes UTF-16 and can be handed to bufio.NewScanner.
package main
import (
"bufio"
"fmt"
"log"
"os"
"golang.org/x/text/encoding/unicode"
"golang.org/x/text/transform"
)
type utfScanner interface {
Read(p []byte) (n int, err error)
}
// Creates a scanner similar to os.Open() but decodes the file as UTF-16.
// Useful when reading data from MS-Windows systems that generate UTF-16BE
// files, but will do the right thing if other BOMs are found.
func NewScannerUTF16(filename string) (utfScanner, error) {
// Open the file:
file, err := os.Open(filename)
if err != nil {
return nil, err
}
// Make a transformer that converts MS-Win default to UTF8:
win16be := unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM)
// Make a transformer that is like win16be, but abides by BOM:
utf16bom := unicode.BOMOverride(win16be.NewDecoder())
// Make a Reader that uses utf16bom:
unicodeReader := transform.NewReader(file, utf16bom)
return unicodeReader, nil
}
func main() {
s, err := NewScannerUTF16("inputfile.txt")
if err != nil {
log.Fatal(err)
}
scanner := bufio.NewScanner(s)
for scanner.Scan() {
fmt.Println(scanner.Text()) // Println will add back the final '\n'
}
if err := scanner.Err(); err != nil {
fmt.Fprintln(os.Stderr, "reading inputfile:", err)
}
}
FYI: I have put these functions into an open source module and have made further improvements. See https://github.com/TomOnTime/utfutil/
UTF16, UTF8, and Byte Order Marks are defined by the Unicode Consortium: UTF-16 FAQ, UTF-8 FAQ, and Byte Order Mark (BOM) FAQ.
Issue 4802: bufio: reading lines is too cumbersome
Reading lines from a file is too cumbersome in Go.
People are often drawn to bufio.Reader.ReadLine because of its name,
but it has a weird signature, returning (line []byte, isPrefix bool,
err error), and requires a lot of work.
ReadSlice and ReadString require a delimiter byte, which is almost
always the obvious and unsightly '\n', and also can return both a line
and an EOF
Revision: f685026a2d38
bufio: new Scanner interface
Add a new, simple interface for scanning (probably textual) data,
based on a new type called Scanner. It does its own internal
buffering, so should be plausibly efficient even without injecting a
bufio.Reader. The format of the input is defined by a "split
function", by default splitting into lines.
go1.1beta1 released
You can download binary and source distributions from the usual place:
https://code.google.com/p/go/downloads/list?q=go1.1beta1
Here's a program which uses the Unicode rules to convert UTF16 text file lines to Go UTF8 encoded strings. The code has been revised to take advantage of the new bufio.Scanner interface in Go 1.1.
package main
import (
"bufio"
"bytes"
"encoding/binary"
"fmt"
"os"
"runtime"
"unicode/utf16"
"unicode/utf8"
)
// UTF16BytesToString converts UTF-16 encoded bytes, in big or little endian byte order,
// to a UTF-8 encoded string.
func UTF16BytesToString(b []byte, o binary.ByteOrder) string {
utf := make([]uint16, (len(b)+(2-1))/2)
for i := 0; i+(2-1) < len(b); i += 2 {
utf[i/2] = o.Uint16(b[i:])
}
if len(b)/2 < len(utf) {
utf[len(utf)-1] = utf8.RuneError
}
return string(utf16.Decode(utf))
}
// UTF-16 endian byte order
const (
unknownEndian = iota
bigEndian
littleEndian
)
// dropCREndian drops a terminal \r from the endian data.
func dropCREndian(data []byte, t1, t2 byte) []byte {
if len(data) > 1 {
if data[len(data)-2] == t1 && data[len(data)-1] == t2 {
return data[0 : len(data)-2]
}
}
return data
}
// dropCRBE drops a terminal \r from the big endian data.
func dropCRBE(data []byte) []byte {
return dropCREndian(data, '\x00', '\r')
}
// dropCRLE drops a terminal \r from the little endian data.
func dropCRLE(data []byte) []byte {
return dropCREndian(data, '\r', '\x00')
}
// dropCR drops a terminal \r from the data and reports the byte order it implies.
func dropCR(data []byte) ([]byte, int) {
if d := dropCRLE(data); len(d) != len(data) {
return d, littleEndian
}
if d := dropCRBE(data); len(d) != len(data) {
return d, bigEndian
}
return data, unknownEndian
}
// ScanUTF16LinesFunc returns a split function for a Scanner that returns each line of
// text, stripped of any trailing end-of-line marker. The returned line may
// be empty. The end-of-line marker is one optional carriage return followed
// by one mandatory newline. In regular expression notation, it is `\r?\n`.
// The last non-empty line of input will be returned even if it has no
// newline.
func ScanUTF16LinesFunc(byteOrder binary.ByteOrder) (bufio.SplitFunc, func() binary.ByteOrder) {
// Function closure variables
var endian = unknownEndian
switch byteOrder {
case binary.BigEndian:
endian = bigEndian
case binary.LittleEndian:
endian = littleEndian
}
const bom = 0xFEFF
var checkBOM bool = endian == unknownEndian
// Scanner split function
splitFunc := func(data []byte, atEOF bool) (advance int, token []byte, err error) {
if atEOF && len(data) == 0 {
return 0, nil, nil
}
if checkBOM {
checkBOM = false
if len(data) > 1 {
switch uint16(bom) {
case uint16(data[0])<<8 | uint16(data[1]):
endian = bigEndian
return 2, nil, nil
case uint16(data[1])<<8 | uint16(data[0]):
endian = littleEndian
return 2, nil, nil
}
}
}
// Scan for newline-terminated lines.
i := 0
for {
j := bytes.IndexByte(data[i:], '\n')
if j < 0 {
break
}
i += j
switch e := i % 2; e {
case 1: // UTF-16BE
if endian != littleEndian {
if i > 1 {
if data[i-1] == '\x00' {
endian = bigEndian
// We have a full newline-terminated line.
return i + 1, dropCRBE(data[0 : i-1]), nil
}
}
}
case 0: // UTF-16LE
if endian != bigEndian {
if i+1 < len(data) {
i++
if data[i] == '\x00' {
endian = littleEndian
// We have a full newline-terminated line.
return i + 1, dropCRLE(data[0 : i-1]), nil
}
}
}
}
i++
}
// If we're at EOF, we have a final, non-terminated line. Return it.
if atEOF {
// drop CR.
advance = len(data)
switch endian {
case bigEndian:
data = dropCRBE(data)
case littleEndian:
data = dropCRLE(data)
default:
data, endian = dropCR(data)
}
if endian == unknownEndian {
if runtime.GOOS == "windows" {
endian = littleEndian
} else {
endian = bigEndian
}
}
return advance, data, nil
}
// Request more data.
return 0, nil, nil
}
// Endian byte order function
orderFunc := func() (byteOrder binary.ByteOrder) {
switch endian {
case bigEndian:
byteOrder = binary.BigEndian
case littleEndian:
byteOrder = binary.LittleEndian
}
return byteOrder
}
return splitFunc, orderFunc
}
func main() {
file, err := os.Open("utf16.le.txt")
if err != nil {
fmt.Println(err)
os.Exit(1)
}
defer file.Close()
fmt.Println(file.Name())
rdr := bufio.NewReader(file)
scanner := bufio.NewScanner(rdr)
var bo binary.ByteOrder // unknown, infer from data
// bo = binary.LittleEndian // windows
splitFunc, orderFunc := ScanUTF16LinesFunc(bo)
scanner.Split(splitFunc)
for scanner.Scan() {
b := scanner.Bytes()
s := UTF16BytesToString(b, orderFunc())
fmt.Println(len(s), s)
fmt.Println(len(b), b)
}
fmt.Println(orderFunc())
if err := scanner.Err(); err != nil {
fmt.Println(err)
}
}
Output:
utf16.le.txt
15 "Hello, 世界"
22 [34 0 72 0 101 0 108 0 108 0 111 0 44 0 32 0 22 78 76 117 34 0]
0
0 []
15 "Hello, 世界"
22 [34 0 72 0 101 0 108 0 108 0 111 0 44 0 32 0 22 78 76 117 34 0]
LittleEndian
utf16.be.txt
15 "Hello, 世界"
22 [0 34 0 72 0 101 0 108 0 108 0 111 0 44 0 32 78 22 117 76 0 34]
0
0 []
15 "Hello, 世界"
22 [0 34 0 72 0 101 0 108 0 108 0 111 0 44 0 32 78 22 117 76 0 34]
BigEndian
Here is the simplest way to read it:
package main
import (
"bufio"
"fmt"
"log"
"os"
"golang.org/x/text/encoding/unicode"
"golang.org/x/text/transform"
)
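func main() {
file, err := os.Open("./text.txt")
if err != nil {
log.Fatal(err)
}
defer file.Close()
// UseBOM honors a byte order mark if present; LittleEndian is the fallback.
decoder := unicode.UTF16(unicode.LittleEndian, unicode.UseBOM).NewDecoder()
scanner := bufio.NewScanner(transform.NewReader(file, decoder))
for scanner.Scan() {
fmt.Println(scanner.Text()) // Println, not Printf: the text is data, not a format string
}
if err := scanner.Err(); err != nil {
log.Fatal(err)
}
}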
Since Windows uses little-endian order by default, we use the unicode.UseBOM policy to pick up the BOM from the text, with unicode.LittleEndian as a fallback.
For example:
package main
import (
"errors"
"fmt"
"log"
"unicode/utf16"
)
func utf16toString(b []uint8) (string, error) {
if len(b)&1 != 0 {
return "", errors.New("len(b) must be even")
}
// Check BOM
var bom int
if len(b) >= 2 {
switch n := int(b[0])<<8 | int(b[1]); n {
case 0xfffe:
bom = 1
fallthrough
case 0xfeff:
b = b[2:]
}
}
w := make([]uint16, len(b)/2)
for i := range w {
w[i] = uint16(b[2*i+bom&1])<<8 | uint16(b[2*i+(bom+1)&1])
}
return string(utf16.Decode(w)), nil
}
func main() {
// Simulated data from e.g. a file
b := []byte{255, 254, 91, 0, 83, 0, 99, 0, 114, 0, 105, 0, 112, 0, 116, 0, 32, 0, 73, 0, 110, 0, 102, 0, 111, 0, 93, 0, 13, 0}
s, err := utf16toString(b)
if err != nil {
log.Fatal(err)
}
fmt.Printf("%q", s)
}
Output:
"[Script Info]\r"
If you want anything to print as a string, you could use fmt.Sprint:
package main
import (
"bufio"
"fmt"
"os"
)
func main() {
// read whole the file
f, err := os.Open("test.txt")
if err != nil {
fmt.Printf("error opening file: %v\n", err)
return
}
r := bufio.NewReader(f)
var s, _, e = r.ReadLine()
if e != nil {
fmt.Println(e)
return
}
fmt.Println(fmt.Sprint(string(s)))
}