I am trying to read and write data from a net.Conn, but since I only have the Read([]byte) and Write([]byte) functions, I am finding it quite hard to find helper functions to do this job.
I need to read and write the following types:
uint64
byte
uint32
UTF-8 encoded string (first a uint32 length, then the string data)
In short:
Is there anything like Java's DataInputStream and DataOutputStream in Go's packages?
Thanks and regards
You need to decide on a format to marshal to and from. Your choices are to either roll your own format or to use one that was already made. I highly recommend the latter.
I have previously posted about many of the formats supported in the go standard library here: https://stackoverflow.com/a/13575325/727643
If you decide to roll your own, then uints can be encoded and decoded from []byte using encoding/binary. It gives you the option of both little and big endian. Strings can be converted directly to []byte using []byte(str). Finally, bytes can just be sent as bytes. No magic needed.
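As a rough sketch of the roll-your-own route for the four types in the question, assuming big-endian byte order and a bytes.Buffer standing in for the net.Conn. The helper names writeString, readString and roundTrip are made up for illustration and are not part of any package:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// writeString writes a uint32 length (big endian) followed by the string's bytes.
func writeString(w io.Writer, s string) error {
	if err := binary.Write(w, binary.BigEndian, uint32(len(s))); err != nil {
		return err
	}
	_, err := io.WriteString(w, s)
	return err
}

// readString reads a uint32 length and then exactly that many bytes.
// Real code should cap n before allocating when the peer is untrusted.
func readString(r io.Reader) (string, error) {
	var n uint32
	if err := binary.Read(r, binary.BigEndian, &n); err != nil {
		return "", err
	}
	buf := make([]byte, n)
	if _, err := io.ReadFull(r, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

// roundTrip writes one value of each type and reads them back in the same order.
func roundTrip(id uint64, flag byte, count uint32, name string) (uint64, byte, uint32, string, error) {
	var conn bytes.Buffer // stands in for a net.Conn (any io.ReadWriter works)

	for _, v := range []interface{}{id, flag, count} {
		if err := binary.Write(&conn, binary.BigEndian, v); err != nil {
			return 0, 0, 0, "", err
		}
	}
	if err := writeString(&conn, name); err != nil {
		return 0, 0, 0, "", err
	}

	var gotID uint64
	var gotFlag byte
	var gotCount uint32
	for _, p := range []interface{}{&gotID, &gotFlag, &gotCount} {
		if err := binary.Read(&conn, binary.BigEndian, p); err != nil {
			return 0, 0, 0, "", err
		}
	}
	gotName, err := readString(&conn)
	return gotID, gotFlag, gotCount, gotName, err
}

func main() {
	fmt.Println(roundTrip(1<<40, 7, 42, "héllo"))
}
```

Because the helpers take io.Reader and io.Writer, the same code should work unchanged when handed a real net.Conn.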
I will stress that making up your own format is normally a bad idea. I tend to use JSON by default and use others only when I can get a significant performance increase and I believe it is worth the time to do it.
One little secret of binary encoding is that you can write and read entire data structures:
From the Playground
buf := new(bytes.Buffer)
err := binary.Write(buf, binary.LittleEndian, &MyMessage{
	First:   100,
	Second:  0,
	Third:   100,
	Message: MyString{0, [10]byte{'H', 'e', 'l', 'l', 'o', '\n'}},
})
if err != nil {
	fmt.Println("binary.Write failed:", err)
	return
}
// <<--- CONN -->>
msg := new(MyMessage)
err2 := binary.Read(buf, binary.LittleEndian, msg)
if err2 != nil {
	fmt.Println("binary.Read failed:", err2)
	return
}
Pay attention to the kinds of types that you can use.
From the encoding/binary docs:
A fixed-size value is either a fixed-size arithmetic type (int8, uint8, int16, float32, complex64, ...) or an array or struct containing only fixed-size values.
Notice that a struct field therefore has to be an array such as [10]byte and can't be a slice such as []byte.
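A self-contained sketch of the same idea, since the snippet above does not show its type definitions. The header type and its field names are assumptions made for illustration, not the types from the Playground link:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// header is a hypothetical message layout; every field is fixed-size.
type header struct {
	ID    uint64
	Flag  byte
	Count uint32
	Name  [10]byte // an array, not a []byte slice
}

// encodeDecode writes the struct into a buffer, reads it back,
// and also reports how many bytes the encoding took.
func encodeDecode(in header) (header, int, error) {
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.LittleEndian, &in); err != nil {
		return header{}, 0, err
	}
	size := buf.Len()

	var out header
	err := binary.Read(&buf, binary.LittleEndian, &out)
	return out, size, err
}

func main() {
	in := header{ID: 100, Flag: 0, Count: 100}
	copy(in.Name[:], "Hello\n")

	out, size, err := encodeDecode(in)
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	fmt.Println(out == in, size)

	// binary.Size returns -1 for values that are not fixed-size,
	// such as a struct with a slice field.
	fmt.Println(binary.Size(struct{ Name []byte }{}))
}
```

The fields are written back to back with no padding, so the encoded size is the sum of the field sizes (8 + 1 + 4 + 10 bytes here).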
Fabrizio's answer is good and I would like to add that you should probably wrap your socket with a buffered reader and buffered writer from the bufio package:
http://golang.org/pkg/bufio/
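A minimal sketch of that wrapping, assuming net.Pipe as an in-memory stand-in for a real socket. The function name sendAndReceive is invented for the example:

```go
package main

import (
	"bufio"
	"encoding/binary"
	"fmt"
	"net"
)

// sendAndReceive writes two integers through a buffered writer on one end of
// an in-memory connection and reads them through a buffered reader on the other.
func sendAndReceive(a uint64, b uint32) (uint64, uint32, error) {
	client, server := net.Pipe()
	defer server.Close()

	errc := make(chan error, 1)
	go func() {
		defer client.Close()
		w := bufio.NewWriter(client)
		if err := binary.Write(w, binary.BigEndian, a); err != nil {
			errc <- err
			return
		}
		if err := binary.Write(w, binary.BigEndian, b); err != nil {
			errc <- err
			return
		}
		// Nothing reaches the connection until the buffer is flushed.
		errc <- w.Flush()
	}()

	r := bufio.NewReader(server)
	var gotA uint64
	var gotB uint32
	if err := binary.Read(r, binary.BigEndian, &gotA); err != nil {
		return 0, 0, err
	}
	if err := binary.Read(r, binary.BigEndian, &gotB); err != nil {
		return 0, 0, err
	}
	return gotA, gotB, <-errc
}

func main() {
	fmt.Println(sendAndReceive(1<<40, 42))
}
```

The buffered writer turns the two small writes into a single write on the connection, at the cost of having to remember to call Flush at message boundaries.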
From the very beginning Swift strings were tricky since they work properly with Unicode, and there is a standard example from Apple:
let cafe1 = "Cafe\u{301}"
let cafe2 = "Café"
print(cafe1 == cafe2)
// Prints "true"
It means that comparison has some implicit logic and is not a simple check that two memory areas are the same. I have seen recommendations to flatten strings into [Character], since when you do this all Unicode-related conversions take place once and all subsequent operations are faster. Additionally, strings do not necessarily use a contiguous memory area, which makes them more expensive to compare than character arrays.
Long story short, I solved this problem on leetcode: https://leetcode.com/problems/implement-strstr/ and tried different approaches: KMP, character arrays and strings. To my surprise strings are the fastest.
How is that so? KMP has some prework and is less efficient in general, but why are strings faster than [Character]? Is this new in some recent Swift version, or am I missing something conceptually?
Code that I used for reference:
[Character], 8ms, 15mb memory
func strStr(_ haystack: String, _ needle: String) -> Int {
    guard !needle.isEmpty else { return 0 }
    guard haystack.count >= needle.count else { return -1 }
    var result: Int = -1
    let str = Array(haystack)
    let pattern = Array(needle)
    for i in 0...(str.count - pattern.count) {
        if str[i] == pattern[0] && Array(str[i...(i + pattern.count - 1)]) == pattern {
            result = i
            break
        }
    }
    return result
}
Strings, 4ms(!!!), 14.5mb memory
func strStr(_ haystack: String, _ needle: String) -> Int {
    guard !needle.isEmpty else { return 0 }
    guard haystack.count >= needle.count else { return -1 }
    var result: Int = -1
    for i in 0...(haystack.count - needle.count) {
        let hIdx = haystack.index(haystack.startIndex, offsetBy: i)
        if haystack[hIdx] == needle[needle.startIndex] {
            let hEndIdx = haystack.index(hIdx, offsetBy: needle.count - 1)
            if haystack[hIdx...hEndIdx] == needle {
                result = i
                break
            }
        }
    }
    return result
}
First, I think there may be some misunderstandings on your part:
flatten strings into [Character], since when you do this all Unicode-related conversions take place once and all subsequent operations are faster
This doesn't make a lot of sense. Character has exactly the same issues as String. It still may be made of composed or decomposed UnicodeScalars that need special handling for equality.
Additionally, strings do not necessarily use a contiguous memory area
This is equally true of Array. Nothing in Array promises that memory is contiguous. That's why ContiguousArray exists.
As to why String is faster than hand-coded abstractions, that should be obvious. If you could easily out-perform String with no major tradeoffs, then stdlib would implement String to do that.
To the mechanics of it, String does not promise any particular internal representation, so it heavily depends on how you're creating your strings. Small strings, for example, can be reduced all the way to a tagged pointer that requires zero memory (it can live in a register). Strings can be stored in UTF-8, but they can also be stored in UTF-16 (which is extremely fast to work with).
When Strings are compared with other Strings that know they have the same internal representations, then they can apply various optimizations. And this really points to one part of your problem:
Array(str[i...(i + pattern.count - 1)])
This forces a memory allocation and a copy to create a new Array out of str. You would probably do much better if you used an ArraySlice for this work rather than making full Array copies. You'd almost certainly find in that case that you're exactly matching String's implementation (using Substring).
But the real lesson here is that you're unlikely to beat String at its own game in the general case. If you happen to have very specialized knowledge about your Strings, then I can see where you'd be able to beat the general-purpose String algorithms. But if you think you're beating stdlib for arbitrary strings, why would stdlib not just implement what you're doing (and beat you using knowledge of the internal details of String)?
TL;DR: Does the MongoDB driver provide a function to marshal and unmarshal a single field of a document?
This is a pretty straightforward question, but here's some context:
I have a worker responsible for synchronizing data between two separate databases. When it receives an event message signaling that some document must be synced, it selects the document in the primary database and replicates it in the other one (it's a completely different database, not a replica set).
The thing is: I don't know the full structure of that document, so to preserve the data I must unmarshal the document into a map[string]interface{}, or a bson.M, which works in the same fashion. But this seems like a lot of overhead: unmarshalling all this data I'm not even using, only to marshal it back for the other database.
So I thought about creating a structure that would just store the binary value of that document, without performing any marshal or unmarshal in order to reduce the overhead, like this:
type Document = map[string]Field

type Field struct {
	Type  bsontype.Type
	Value []byte
}

func (f Field) MarshalBSONValue() (bsontype.Type, []byte, error) {
	return f.Type, f.Value, nil
}

func (f *Field) UnmarshalBSONValue(btype bsontype.Type, value []byte) error {
	f.Type = btype
	f.Value = value
	return nil
}
With this structure I can indeed reduce how much of the data will be parsed, but now, I need to manually unmarshal the one value in this document I'll need to use.
So I'm wondering if the MongoDB driver would have some function such as:
// Hypothetical function to get the value of a BSON
var status string
if err := decodeBSON(doc["status"].Type, doc["status"].Value, &status); err != nil {
	return err
}
And
// Hypothetical function to set the value of a BSON
createdAt, err := encodeBSON(bsontype.Date, time.Now())
if err != nil {
	return err
}
doc["createdAt"] = Field{Type: bsontype.Date, Value: createdAt}
How can I achieve this?
The Field type in your code is equivalent to the driver's bson.RawValue type. By switching to RawValue, you can decode individual fields using the RawValue.Unmarshal method and encode fields using bson.MarshalValue, which returns the two components (type and data) that you need to construct a new RawValue.
An example of how you can change a field depending on its original value without unmarshalling all of the original document's fields: https://gist.github.com/divjotarora/06c5188138456070cee26024f223b3ee
I find myself reading large CSV files and collecting the numerical elements into a Vec<&str>. Thereafter I have to convert them to numeric types and the simplest way I've found to do that is to implement a function like this:
fn to_u32(str: &str) -> u32 {
    // In current Rust, str::parse returns a Result rather than an Option.
    let str_num: Result<u32, std::num::ParseIntError> = str.parse();
    match str_num {
        Ok(num) => num,
        Err(_) => panic!("Failed to read number"),
    }
}
This seems like a fairly common operation so I've sifted through the reference docs but haven't found anything that matches it. Is there a cleaner way to do this?
The Option type (like Result, which str::parse returns in current Rust) has a large variety of adapter methods which can be used to munge the data around more nicely than repeated matches.
For example, there's unwrap and expect for just extracting the data out of a Some, panicking if the Option is None. The expect method is actually the closest to your written code: str.parse().expect("Failed to read number.").
However, it often makes sense to use the other functions listed there to propagate errors, avoiding the hard "crash" of a panic and allowing users (or yourself) to handle errors more centrally and with more control. It also often makes sense to use Result for this, which gives you the chance to pass along more information in the error case and also allows one to use the try! macro (superseded by the ? operator in current Rust). That said, one can easily define an equivalent of try! for Option:
macro_rules! option_try {
    ($e: expr) => {
        match $e {
            Some(x) => x,
            None => return None
        }
    }
}
Well, you can use unwrap() to avoid pattern matching, but you should do it sparingly - with unwrap() you can't handle the actual parse error, so if the string does not represent a number, it'll panic:
let str_num: u32 = str.parse().unwrap();
if let is also an option:
if let Ok(str_num) = str.parse::<u32>() {
    // ...
}
You can also use unwrap_or() if you want to specify some default value:
let str_num: u32 = str.parse().unwrap_or(42);
Or you can use unwrap_or_default(), which uses the Default implementation for u32 (that is, 0):
let str_num: u32 = str.parse().unwrap_or_default();
The sqlx package has a MapScan function that's quite handy in that it returns a row as a map (map[string]interface{}) but all string columns come out as runes (if I'm not mistaken). Is there a way to have it just return as strings instead?
sqlx - github.com/jmoiron/sqlx
I have encountered a similar issue when dealing with SQL in Go. Some googling led me to the Go database/sql/driver docs. Here is what they have for a Value type returned from a query.
Value is a value that drivers must be able to handle. It is either nil or an instance of one of these types:
int64
float64
bool
[]byte
string [*] everywhere except from Rows.Next.
time.Time
Strings are returned as byte slices (when you attempt to encode a []byte into JSON, it gets base64-encoded). I have not found a way within the framework of sqlx or database/sql to return strings instead of []byte slices, but I did come up with a quick and dirty conversion for slices in a map. There is limited type checking, but it is a good start.
func convertStrings(in map[string]interface{}) {
	for k, v := range in {
		t := reflect.TypeOf(v)
		if t != nil {
			switch t.Kind() {
			case reflect.Slice:
				in[k] = fmt.Sprintf("%s", v)
			default:
				// do nothing
			}
		}
	}
}
http://play.golang.org/p/SKtaPFtnKO
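A possible variant with stricter type checking, sketched under the assumption that only []byte values should be converted and every other value (including other slice types) should be left alone. The name bytesToStrings is made up for this example:

```go
package main

import "fmt"

// bytesToStrings replaces every []byte value in the row with a string,
// leaving all other values (including nil and other slice types) untouched.
func bytesToStrings(row map[string]interface{}) {
	for k, v := range row {
		if b, ok := v.([]byte); ok {
			row[k] = string(b)
		}
	}
}

func main() {
	// A hand-built row that mimics what MapScan might fill in.
	row := map[string]interface{}{
		"name": []byte("gopher"),
		"age":  int64(7),
		"tags": []int{1, 2},
		"note": nil,
	}
	bytesToStrings(row)
	fmt.Printf("%T %T %T\n", row["name"], row["age"], row["tags"])
}
```

A type assertion avoids both the reflect import and the fmt formatting pass, and it does not turn a non-byte slice into its printed representation.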
func md(str string) []byte {
	h := md5.New()
	io.WriteString(h, str)
	fmt.Printf("%x", h.Sum(nil))
	// base 16, with lower-case letters for a-f
	return h.Sum(nil)
}
All I need is a hash-key string that is converted from an input string. I was able to get it in bytes format using h.Sum(nil) and able to print out the hash key in %x format. But I want to return the %x format from this function so that I can use it to convert an email address to a hash key and use it to access Gravatar.com.
How do I get %x format Hash-key using md5 function in Go?
Thanks,
If I understood correctly, you want to return the %x format.
You can import "encoding/hex" and use the EncodeToString function:
str := hex.EncodeToString(h.Sum(nil))
or just Sprintf the value:
func md(str string) string {
	h := md5.New()
	io.WriteString(h, str)
	return fmt.Sprintf("%x", h.Sum(nil))
}
Note that Sprintf is slower because it needs to parse the format string and then use reflection based on the type found.
http://play.golang.org/p/vsFariAvKo
You should avoid using the fmt package for this. The fmt package uses reflection, and it is expensive for anything other than debugging. You know what you have, and what you want to convert to, so you should be using the proper conversion package.
For converting from binary to hex, and back, use the encoding/hex package.
To Hex string:
str := hex.EncodeToString(h.Sum(nil))
From Hex string:
b, err := hex.DecodeString(str)
There are also Encode / Decode functions for []byte.
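Put together for the MD5 case from the question, a minimal sketch could look like this. The function name hexMD5 is invented here, and md5.Sum is used as a shortcut for the md5.New / Write / Sum sequence:

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// hexMD5 returns the MD5 digest of s as a lower-case hex string.
func hexMD5(s string) string {
	sum := md5.Sum([]byte(s)) // sum is a [16]byte array
	return hex.EncodeToString(sum[:])
}

func main() {
	digest := hexMD5("hello")
	fmt.Println(digest) // 5d41402abc4b2a76b9719d911017c592

	// Decoding the hex string gives back the 16 raw digest bytes.
	raw, err := hex.DecodeString(digest)
	fmt.Println(len(raw), err)
}
```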
When you need to convert to / from a decimal use the strconv package.
From int to string:
str := strconv.Itoa(100)
From string to int:
num, err := strconv.Atoi(str)
There are several other functions in this package that do other conversions (base, etc.).
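For instance, the base conversions mentioned above might be done with strconv.FormatInt and strconv.ParseInt. This is only a sketch; the wrapper names toHex and fromHex are invented for the example:

```go
package main

import (
	"fmt"
	"strconv"
)

// toHex formats n in base 16 using lower-case digits.
func toHex(n int64) string {
	return strconv.FormatInt(n, 16)
}

// fromHex parses a base-16 string into a 64-bit signed integer.
func fromHex(s string) (int64, error) {
	return strconv.ParseInt(s, 16, 64)
}

func main() {
	fmt.Println(toHex(255)) // ff

	n, err := fromHex("ff")
	fmt.Println(n, err) // 255 <nil>

	// Invalid input is reported through the error value rather than a panic.
	_, err = fromHex("zz")
	fmt.Println(err != nil) // true
}
```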
So unless you're debugging or formatting an error message, use the proper conversions. Please.