I'm a Go beginner and stuck on a problem.
I want to encode a string as UTF-16 little endian and then hash it with MD5 (as hexadecimal). I found a piece of Python code that does exactly what I want, but I can't manage to port it to Go.
import hashlib
md5 = hashlib.md5()
md5.update(challenge.encode('utf-16le'))
response = md5.hexdigest()
The challenge is a variable containing a string.
You can do it with less work (or at least more understandability, IMO) by using golang.org/x/text/encoding and golang.org/x/text/transform to create a Writer chain that will do the encoding and hashing without so much manual byte slice handling. The equivalent function:
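import (
	"crypto/md5"
	"golang.org/x/text/encoding/unicode"
	"golang.org/x/text/transform"
)
// utf16leMd5 hashes s after encoding it as UTF-16LE without a BOM.
func utf16leMd5(s string) []byte {
	enc := unicode.UTF16(unicode.LittleEndian, unicode.IgnoreBOM).NewEncoder()
	hasher := md5.New()
	t := transform.NewWriter(hasher, enc) // encodes, then writes into hasher
	t.Write([]byte(s))
	t.Close() // flush anything still buffered in the transformer
	return hasher.Sum(nil)
}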
You can use the unicode/utf16 package for UTF-16 encoding. utf16.Encode() returns the UTF-16 encoding of the Unicode code point sequence (slice of runes: []rune). You can simply convert a string to a slice of runes, e.g. []rune("some string"), and you can easily produce the byte sequence of the little-endian encoding by ranging over the uint16 codes and sending/appending first the low byte then the high byte to the output (this is what Little Endian means).
For Little Endian encoding, alternatively you can use the encoding/binary package: it has an exported LittleEndian variable and it has a PutUint16() method.
As for the MD5 checksum, the crypto/md5 package has what you want: md5.Sum() simply returns the MD5 checksum of the byte slice passed to it.
Here's a little function that captures what you want to do:
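import (
	"crypto/md5"
	"unicode/utf16"
)
// utf16leMd5 returns the MD5 checksum of s encoded as UTF-16LE.
func utf16leMd5(s string) [16]byte {
	codes := utf16.Encode([]rune(s))
	b := make([]byte, len(codes)*2)
	for i, r := range codes {
		b[i*2] = byte(r)        // low byte first...
		b[i*2+1] = byte(r >> 8) // ...then the high byte
	}
	return md5.Sum(b)
}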
Using it:
s := "Hello, playground"
fmt.Printf("%x\n", utf16leMd5(s))
s = "エヌガミ"
fmt.Printf("%x\n", utf16leMd5(s))
Output:
8f4a54c6ac7b88936e990256cc9d335b
5f0db9e9859fd27f750eb1a212ad6212
Try it on the Go Playground.
The variant that uses encoding/binary would look like this:
for i, r := range codes {
binary.LittleEndian.PutUint16(b[i*2:], r)
}
(This may be slightly slower, since each call re-slices b before writing two bytes.)
So, for reference, I used this complete Python program:
import hashlib
import codecs
md5 = hashlib.md5()
md5.update(codecs.encode('Hello, playground', 'utf-16le'))
response = md5.hexdigest()
print(response)
It prints 8f4a54c6ac7b88936e990256cc9d335b
Here is the Go equivalent: https://play.golang.org/p/Nbzz1dCSGI
package main
import (
"crypto/md5"
"encoding/binary"
"encoding/hex"
"fmt"
"unicode/utf16"
)
func main() {
s := "Hello, playground"
fmt.Println(md5Utf16le(s))
}
func md5Utf16le(s string) string {
encoded := utf16.Encode([]rune(s))
b := convertUTF16ToLittleEndianBytes(encoded)
return md5Hexadecimal(b)
}
func md5Hexadecimal(b []byte) string {
h := md5.New()
h.Write(b)
return hex.EncodeToString(h.Sum(nil))
}
func convertUTF16ToLittleEndianBytes(u []uint16) []byte {
b := make([]byte, 2*len(u))
for index, value := range u {
binary.LittleEndian.PutUint16(b[index*2:], value)
}
return b
}
I'm hashing a password with SHA-256. Creating the digest works and I can print it. The problem begins when I try to convert digest.bytes into a string and append it to the URL.
import 'dart:convert';
import 'package:crypto/crypto.dart';
var url = "http://example_api.php?";
url += '&hash=';
// hash the password
var bytes = utf8.encode(password);
var digest = sha256.convert(bytes);
print("Digest as hex string: $digest");
url += String.fromCharCodes(digest.bytes);
This is printed: Digest as hex string: 03ac674216f3e15c761ee1a5e255f067953623c8b388b4459e13f978d7c846f4
This is appended to url: ¬gBóá\vá¥âUðg6#ȳ´Eùx×ÈFô
What am I doing wrong? I also tried utf8.decode method but using it gives me an error.
When you print digest, the print method will call digest.toString(), which is implemented to return a string of the digest bytes using a hexadecimal representation. If you want the same thing you have several options:
Call digest.toString() explicitly (or implicitly)
final digestHex = digest.toString(); // explicitly
final digestHex = '$digest'; // implicitly
Map the byte array to its hexadecimal equivalent
final digestHex = digest.bytes.map((b) => b.toRadixString(16).padLeft(2, '0')).join();
Use the convert package (this is what the crypto package does)
import 'package:convert/convert.dart';
...
final digestHex = hex.encode(digest.bytes);
The reason you get an error with utf8.decode is that your digest isn't an encoded UTF-8 string but a list of bytes that, for all intents and purposes, are random. Decoding bytes directly into a string only works if they already represent validly encoded text, and you cannot safely make that assumption about the output of a hashing algorithm.
However, if for some reason you still want to go this route, pass the optional named parameter allowMalformed to utf8.decode to force it to decode the bytes anyway:
final digestString = utf8.decode(digest.bytes, allowMalformed: true);
For reference, a byte list of [1, 255, 47, 143, 6, 80, 33, 202] results in "�/�P!�", where each "�" (U+FFFD) replaces a malformed byte sequence and the control characters for bytes 1 and 6 are invisible. You do not want to use this option, especially when the string will become part of a URL (the resulting string is virtually guaranteed not to be URL-safe).
For the hexadecimal representation of a Digest object, call Digest.toString() explicitly (in string interpolation, e.g. "url${digest}", this is done for you implicitly).
I'm frankly not familiar with String.fromCharCodes, but I think it treats each value as a UTF-16 code unit rather than as UTF-8 bytes, so every digest byte becomes one (often unprintable) character. I wrote a terminal example to show this and how the outputs differ.
import 'dart:core';
import 'dart:convert';
import 'package:crypto/crypto.dart';
void main() {
const String password = "mypassword";
// hash the password
var bytes = utf8.encode(password);
var digest = sha256.convert(bytes);
// different formats
var bytesDigest = digest.bytes;
var hexDigest = digest.toString();
String url = "http://example_api.php?hash=";
print(url + hexDigest);
print(url + String.fromCharCodes(bytesDigest));
}
Output:
> dart test.dart
http://example_api.php?hash=89e01536ac207279409d4de1e5253e01f4a1769e696db0d6062ca9b8f56767c8
http://example_api.php?hash=à6¬ ry#Ö,©¸õggÈ
I have a struct referencing a *big.Int. When storing this struct naively into MongoDB (using the official driver) the field turns to be nil when fetching the struct back. What is the proper/best way to store a big.Int into MongoDB?
type MyStruct struct {
Number *big.Int
}
nb := MyStruct{Number: big.NewInt(42)}
r, _ := db.Collection("test").InsertOne(context.TODO(), nb)
result := &MyStruct{}
db.Collection("test").FindOne(context.TODO(), bson.D{{"_id", r.InsertedID}}).Decode(result)
fmt.Println(result) // <== Number will be 0 here
My best idea so far would be to create a wrapper around big.Int that implements MarshalBSON and UnmarshalBSON (which I am not even sure how to do properly to be honest). But that'd be quite inconvenient.
Here's a possible implementation I came up with that stores the big.Int as plain text in MongoDB. You could also store it as a byte array by using the Bytes and SetBytes methods of big.Int instead of MarshalText/UnmarshalText, but note that Bytes returns only the absolute value, so the sign would have to be stored separately.
package common

import (
	"fmt"
	"math/big"

	"go.mongodb.org/mongo-driver/bson"
)

// BigInt wraps *big.Int so it can be stored in MongoDB as {"i": "<decimal text>"}.
type BigInt struct {
	i *big.Int
}

func NewBigInt(bigint *big.Int) *BigInt {
	return &BigInt{i: bigint}
}

func (bi *BigInt) Int() *big.Int {
	return bi.i
}

// MarshalBSON stores the value as its decimal text representation.
func (bi *BigInt) MarshalBSON() ([]byte, error) {
	txt, err := bi.i.MarshalText()
	if err != nil {
		return nil, err
	}
	return bson.Marshal(map[string]string{"i": string(txt)})
}

// UnmarshalBSON parses the decimal text back into a new big.Int.
func (bi *BigInt) UnmarshalBSON(data []byte) error {
	var doc bson.M
	if err := bson.Unmarshal(data, &doc); err != nil {
		return err
	}
	s, ok := doc["i"].(string) // checked assertion: no panic on bad data
	if !ok {
		return fmt.Errorf("key 'i' missing or not a string")
	}
	bi.i = new(big.Int)
	return bi.i.UnmarshalText([]byte(s))
}
the field turns to be nil when fetching the struct back
It returns 0 because the driver has no BSON mapping for big.Int: big.Int has only unexported fields, so the default struct encoder writes it as an empty document. If you check the document inserted into the MongoDB collection, you should see something similar to this:
{
"_id": ObjectId("..."),
"number": {}
}
The number field holds an empty document, so no value is stored.
What is the proper/best way to store a big.Int into MongoDB?
Let's first understand what BigInt is. In SQL, the BIGINT data type is intended for integer values that might exceed the range supported by the INT data type. Its range is -2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807), with a storage size of 8 bytes.
In Go you can use fixed-size integer types: the built-in int8, int16, int32 and int64 (and their unsigned counterparts). The Go counterpart of SQL's BIGINT is int64, with the same range (-9,223,372,036,854,775,808 through 9,223,372,036,854,775,807) and storage size of 8 bytes. Note that math/big.Int is arbitrary-precision, so int64 is only a substitute if your values are guaranteed to fit in that range.
With mongo-go-driver you can simply use int64, which is stored as a BSON 64-bit integer (readable via bson.RawValue.Int64). For example:
type MyStruct struct {
Number int64
}
collection := db.Collection("tests")
nb := MyStruct{Number: int64(42)}
r, _ := collection.InsertOne(context.TODO(), nb)
var result MyStruct
collection.FindOne(context.TODO(), bson.D{{"_id", r.InsertedID}}).Decode(&result)
This is a simple question, but I still can't figure out how to do it.
Say I have this string:
x := "this string"
The whitespace between 'this' and 'string' is the regular Unicode space character, 32/U+0020. How would I convert it into the non-breaking space character U+00A0 in Go?
Use the documentation to identify the standard strings package as a likely candidate, and then search it (or read through all of it; you should know what's available in the standard library of any language you use) to find strings.Map.
Then the obvious short simple solution to convert any white space would be:
package main
import (
"fmt"
"strings"
"unicode"
)
func main() {
const nbsp = '\u00A0'
result := strings.Map(func(r rune) rune {
if unicode.IsSpace(r) {
return nbsp
}
return r
}, "this string")
fmt.Printf("%s → %[1]q\n", result)
}
If you really only want to replace the plain space " " (U+0020), strings.Replace (or strings.ReplaceAll) is simpler.
I think a basic way to do it is by creating a simple function:
http://play.golang.org/p/YT8Cf917il
package main
import "fmt"
func ReplaceSpace(s string) string {
var result []rune
const badSpace = '\u0020'
for _, r := range s {
if r == badSpace {
result = append(result, '\u00A0')
continue
}
result = append(result, r)
}
return string(result)
}
func main() {
fmt.Println(ReplaceSpace("this string"))
}
If you need more advanced manipulations you could create something with
"golang.org/x/text/transform"
"golang.org/x/text/unicode/norm"
Read http://blog.golang.org/normalization for more information on how to use it.
I'm using goyaml as a YAML beautifier. By loading and dumping a YAML file, I can source-format it. I unmarshal the data from a YAML source file into a struct, marshal it back to bytes, and write the bytes to an output file. But the process turns my Unicode strings into escaped, quoted strings, and I don't know how to reverse it.
Example input subtitle.yaml:
line: 你好
I've stripped everything down to the smallest reproducible problem. Here's the code; errors are discarded with _ because none are returned:
package main
import (
"io/ioutil"
//"unicode/utf8"
//"fmt"
goyaml "gopkg.in/yaml.v1" // the package name is yaml, so alias it to match the goyaml.* calls
)
type Subtitle struct {
Line string
}
func main() {
filename := "subtitle.yaml"
in, _ := ioutil.ReadFile(filename)
var subtitle Subtitle
_ = goyaml.Unmarshal(in, &subtitle)
out, _ := goyaml.Marshal(&subtitle)
//for len(out) > 0 { // For debugging, see what the runes are
// r, size := utf8.DecodeRune(out)
// fmt.Printf("%c ", r)
// out = out[size:]
//}
_ = ioutil.WriteFile(filename, out, 0644)
}
Actual output subtitle.yaml:
line: "\u4F60\u597D"
I want to undo this escaping after I get the variable out.
The commented-out rune-printing code block, which adds spaces between runes for clarity, outputs the following. It shows that Unicode runes like 你 aren't being decoded, but treated literally:
l i n e : " \ u 4 F 6 0 \ u 5 9 7 D "
How can I unquote out, before writing it to the output file, so that the output looks like the input (albeit beautified)?
Desired output subtitle.yaml:
line: "你好"
Temporary Solution
I've filed https://github.com/go-yaml/yaml/issues/11. In the meantime, @bobince's tip on yaml_emitter_set_unicode was helpful in uncovering the problem. It was defined as a C binding but never called (nor was there an option to set it)! I changed encode.go, added yaml_emitter_set_unicode(&e.emitter, true) at line 20, and everything works as expected. It would be better to make it optional, but that would require a change in the Marshal API.
I had a similar issue and used this to work around the bug in goyaml.Marshal(). (*Regexp).ReplaceAllFunc is your friend: you can use it to expand the escaped Unicode runes in the byte slice. It's maybe a bit too dirty for production, but it works for the example ;-)
package main
import (
"io/ioutil"
"unicode/utf8"
"regexp"
"strconv"
"launchpad.net/goyaml"
)
type Subtitle struct {
Line string
}
var reFind = regexp.MustCompile(`^\s*[^\s\:]+\:\s*".*\\u.*"\s*$`)
var reFindU = regexp.MustCompile(`\\u[0-9a-fA-F]{4}`)
func expandUnicodeInYamlLine(line []byte) []byte {
// TODO: restrict this to the quoted string value
return reFindU.ReplaceAllFunc(line, expandUnicodeRune)
}
func expandUnicodeRune(esc []byte) []byte {
ri, _:= strconv.ParseInt(string(esc[2:]), 16, 32)
r := rune(ri)
repr := make([]byte, utf8.RuneLen(r))
utf8.EncodeRune(repr, r)
return repr
}
func main() {
filename := "subtitle.yaml"
filenameOut := "subtitleout.yaml"
in, _ := ioutil.ReadFile(filename)
var subtitle Subtitle
_ = goyaml.Unmarshal(in, &subtitle)
out, _ := goyaml.Marshal(&subtitle)
out = reFind.ReplaceAllFunc(out, expandUnicodeInYamlLine)
_ = ioutil.WriteFile(filenameOut, out, 0644)
}
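A variation on the same idea, sketched under the assumption that only \uXXXX and \UXXXXXXXX escapes need expanding: instead of parsing the hex by hand, each matched escape is handed to strconv.Unquote, which also rejects invalid code points such as lone surrogate halves (those are left as they are). Like the code above, it does not recognize an escaped backslash in front of u, so a literal \\u sequence would be mis-expanded. The names reEsc and expandEscapes are hypothetical:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// reEsc matches \uXXXX and \UXXXXXXXX escapes.
var reEsc = regexp.MustCompile(`\\u[0-9a-fA-F]{4}|\\U[0-9a-fA-F]{8}`)

// expandEscapes replaces each matched escape with its UTF-8 encoding,
// delegating the decoding to strconv.Unquote. Escapes that Go rejects
// (for example lone surrogate halves) are left unchanged.
func expandEscapes(b []byte) []byte {
	return reEsc.ReplaceAllFunc(b, func(esc []byte) []byte {
		s, err := strconv.Unquote(`"` + string(esc) + `"`)
		if err != nil {
			return esc
		}
		return []byte(s)
	})
}

func main() {
	fmt.Println(string(expandEscapes([]byte(`line: "\u4F60\u597D"`))))
}
```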
If you run fmt.Println("\u554a"), it prints 啊.
But how do I get the Unicode escape string \u554a from the rune '啊'?
package main
import "fmt"
import "strconv"
func main() {
quoted := strconv.QuoteRuneToASCII('啊') // quoted = "'\u554a'"
unquoted := quoted[1:len(quoted)-1] // unquoted = "\u554a"
fmt.Println(unquoted)
}
This outputs:
\u554a
IMHO, this is better (it leaves ASCII characters as they are):
func RuneToAscii(r rune) string {
if r < 128 {
return string(r)
} else {
return "\\u" + strconv.FormatInt(int64(r), 16)
}
}
You can use fmt.Sprintf along with %U to get the hexadecimal value:
test := fmt.Sprintf("%U", '啊')
fmt.Println("\\u" + test[2:]) // Print \u554A
For example,
package main
import "fmt"
func main() {
r := rune('啊')
u := fmt.Sprintf("%U", r)
fmt.Println(string(r), u)
}
Output:
啊 U+554A
fmt.Printf("\\u%X", '啊')
http://play.golang.org/p/Jh9ns8Qh15
(Upper or lowercase 'x' will control the case of the hex characters)
As hinted at by package fmt's documentation:
%U Unicode format: U+1234; same as "U+%04X"
package main
import "fmt"
func main() {
fmt.Printf("%+q", '啊')
}
I'd like to add to hardPass's answer.
When the hex representation of the code point is shorter than 4 characters (ü, for example), strconv.FormatInt produces \ufc, which is a syntax error in a Go string literal, as opposed to the full \u00fc that Go understands.
Padding the hex with zeros using fmt.Sprintf with hex formatting will fix this:
func RuneToAscii(r rune) string {
if r < 128 {
return string(r)
} else {
return fmt.Sprintf("\\u%04x", r)
}
}
https://play.golang.org/p/80w29oeBec1
This would do the job:
package main
import (
"fmt"
)
func main() {
str := fmt.Sprintf("%s", []byte{0x80})
fmt.Println(str)
}