BSON Decoder exception


Since MongoDB uses BSON, I am using the BasicBSONDecoder from the Java driver to decode the BSON document from a MongoDB query and print its string form. In the code below, byte[] array holds the bytes of the MongoDB document (when I print the hex values, they match what Wireshark shows):
byte[] array = byteBuffer.array();
BasicBSONDecoder decoder = new BasicBSONDecoder();
BSONObject bsonObject = decoder.readObject(array);
System.out.println(bsonObject.toString());
I get the following error:
org.bson.BSONException: should be impossible
Caused by: java.io.IOException: unexpected EOF
at org.bson.BasicBSONDecoder$BSONInput._need(BasicBSONDecoder.java:327)
at org.bson.BasicBSONDecoder$BSONInput.read(BasicBSONDecoder.java:364)
at org.bson.BasicBSONDecoder.decodeElement(BasicBSONDecoder.java:118)
at org.bson.BasicBSONDecoder._decode(BasicBSONDecoder.java:79)
at org.bson.BasicBSONDecoder.decode(BasicBSONDecoder.java:57)
at org.bson.BasicBSONDecoder.readObject(BasicBSONDecoder.java:42)
at org.bson.BasicBSONDecoder.readObject(BasicBSONDecoder.java:32)
... 4 more
Looking at the implementation at
https://github.com/mongodb/mongo-java-driver/blob/master/src/main/org/bson/LazyBSONDecoder.java, it seems the IOException is caught and rethrown here:
throw new BSONException( "should be impossible" , ioe );
This happens for a query sent to the database (by "query" I mean that byte[] array contains all the bytes after the document length). The query itself contains the string "ismaster"; in hex it is "x10 ismaster x00 x01 x00 x00 x00 x00". I suspect this is the BSON encoding of {isMaster: 1}, but I still do not understand why decoding fails.

You say:
byte[] array contains all the bytes after the document length
If you are stripping off the first part of the BSON that's returned, you are not passing a valid BSON document to the parser/decoder.
See the BSON spec for details, but in a nutshell, the first four bytes are the total size of the binary document (including those four bytes) as a little-endian int32.
You are getting an exception in code that is basically trying to read an expected number of bytes. The decoder reads the first int32 as the length and then tries to parse the rest as BSON elements; with the real length stripped off, the bytes no longer line up with that structure, and it fails while reading them. Pass it everything you get back from the query, including the document size, and it will work correctly.
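To catch this before calling the decoder, one can read the first four bytes as a little-endian int32 and compare them with the array length. A minimal Java sketch; BsonLengthCheck and its methods are hypothetical helpers, not part of the driver API:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helpers for inspecting the int32 size prefix of a BSON document.
class BsonLengthCheck {

    // Reads the first four bytes as a little-endian int32 (the declared document size).
    static int declaredLength(byte[] doc) {
        if (doc.length < 4) {
            throw new IllegalArgumentException("too short to contain a BSON size prefix");
        }
        return ByteBuffer.wrap(doc, 0, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    // A complete document declares its own total size and ends with a 0x00 terminator.
    // The smallest possible document (an empty one) is 5 bytes long.
    static boolean looksComplete(byte[] doc) {
        return doc.length >= 5
                && declaredLength(doc) == doc.length
                && doc[doc.length - 1] == 0;
    }

    // Re-prepends the size if only the body (everything after the prefix) was kept.
    static byte[] withLengthPrefix(byte[] body) {
        ByteBuffer buf = ByteBuffer.allocate(body.length + 4).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(body.length + 4); // the declared size counts the prefix itself
        buf.put(body);
        return buf.array();
    }
}
```

If looksComplete returns false for the array passed to readObject, the decoder is not being given a whole document. Note also that ByteBuffer.array() returns the entire backing array regardless of the buffer's position and limit, which is another way to end up with extra or missing bytes.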

This works just fine:
byte[] array = new BigInteger("130000001069734d6173746572000100000000", 16).toByteArray();
BasicBSONDecoder decoder = new BasicBSONDecoder();
BSONObject bsonObject = decoder.readObject(array);
System.out.println(bsonObject.toString());
And produces this output:
{ "isMaster" : 1}
There is something wrong with the bytes in your byteBuffer. Note that you must include the whole document (including the first 4 bytes which are the size).
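The hex string above can be reproduced by hand, which makes the layout visible: the size prefix, element type 0x10 (int32), the key as a NUL-terminated string, the value in little-endian order, and a closing 0x00. A sketch covering only this single-int32-field case (TinyBsonEncoder is a hypothetical name; real code should rely on the driver's own encoder):

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hand-encodes a one-field BSON document { <name>: <int32 value> }.
class TinyBsonEncoder {

    static void writeInt32LE(ByteArrayOutputStream out, int v) {
        out.write(v & 0xFF);
        out.write((v >>> 8) & 0xFF);
        out.write((v >>> 16) & 0xFF);
        out.write((v >>> 24) & 0xFF);
    }

    static byte[] encodeInt32Field(String name, int value) {
        ByteArrayOutputStream elements = new ByteArrayOutputStream();
        elements.write(0x10);                            // element type 0x10 = int32
        byte[] key = name.getBytes(StandardCharsets.UTF_8);
        elements.write(key, 0, key.length);             // key name...
        elements.write(0x00);                            // ...terminated by NUL (cstring)
        writeInt32LE(elements, value);                   // value, little-endian

        ByteArrayOutputStream doc = new ByteArrayOutputStream();
        writeInt32LE(doc, 4 + elements.size() + 1);     // size: prefix + elements + terminator
        doc.write(elements.toByteArray(), 0, elements.size());
        doc.write(0x00);                                 // document terminator
        return doc.toByteArray();
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex(encodeInt32Field("isMaster", 1)));
    }
}
```

Running main prints 130000001069734d6173746572000100000000, the same 19 bytes (0x13) that were passed to BigInteger above.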

Related

How to convert String to UTF-8 to Integer in Swift

I'm trying to take each character (individual number, letter, or symbol) from a file name string without the extension and put each one into an array index as an integer UTF-8 code (i.e. if the file name is "A1" without the extension, I would want "A" as the int 0x41 (65) in the first index, and "1" as the int 0x31 (49) in the second index).
Here is the code I have, but I'm getting the error "No exact matches in call to instance method 'append'"; my guess is that .utf8 still keeps it as a string type:
for i in allNoteFiles {
var CharacterArray : [Int] = []
for character in i {
var utf8Character = String(character).utf8
CharacterArray.append(utf8Character) //error is here
}
// ... more code down here within the for-in loop using CharacterArray indexes
I'm sure the answer is probably simple, but I'm very new to Swift.
I've tried appending var number instead with:
var number = Int(utf8Character)
and
var number = (utf8Character).IntegerValue
but I get errors "No exact matches in call to initializer" and "Value of type 'String.UTF8View' has no member 'IntegerValue'"
Any help at all would be greatly appreciated. Thanks!

The reason
var utf8Character = String(character).utf8
CharacterArray.append(utf8Character)
doesn't work for you is because utf8Character is not a single integer but a UTF8View: a lightweight view for iterating over the UTF-8 code units (bytes) of a string. Every Character in a String can be made up of any number of UTF-8 bytes (individual integers). ASCII characters like "A" and "1" map to a single UTF-8 byte, but most characters do not: UTF-8 encodes every Unicode code point as 1 to 4 bytes, and a single Character (an extended grapheme cluster) may itself consist of several code points. The Encoding section of the UTF-8 article on Wikipedia has a few very illustrative examples of how this works.
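To see the 1-to-4-byte range concretely, here is a small sketch in Java rather than Swift (the byte counts come from UTF-8 itself, so they are the same in any language). Utf8Widths is a hypothetical name:

```java
import java.nio.charset.StandardCharsets;

// Counts how many UTF-8 bytes (code units) various characters need.
class Utf8Widths {

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // "A", "1", e-acute, euro sign, grinning-face emoji (a surrogate pair in Java)
        String[] samples = {"A", "1", "\u00E9", "\u20AC", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.printf("U+%04X -> %d byte(s)%n", s.codePointAt(0), utf8Length(s));
        }
        // "e" followed by a combining acute accent: one user-perceived character,
        // but two code points and three UTF-8 bytes.
        String decomposed = "e\u0301";
        System.out.println(decomposed.codePointCount(0, decomposed.length()) + " code points, "
                + utf8Length(decomposed) + " bytes");
    }
}
```

The samples need 1, 1, 2, 3 and 4 bytes respectively, and the decomposed "e" with its accent shows why one Character can map to more bytes than one code point does.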
Now, assuming that you do want to split a string into individual UTF-8 bytes (either because you can guarantee your original string is ASCII-only, so the assumption that "character = byte" holds, or because you actually care about the bytes [though this is rarely the case]), there's a short and idiomatic solution to what you're looking for.
String.UTF8View is a Sequence of UInt8 values (individual bytes), and as such, you can use the Array initializer which takes a Sequence:
let characterArray: [UInt8] = Array(i.utf8)
If you need an array of Int values instead of UInt8, you can map the individual bytes ahead of time:
let characterArray: [Int] = Array(i.utf8.lazy.map { Int($0) })
(The .lazy avoids creating and storing an array of values in the middle of the operation.)
However, do note that if you aren't careful (e.g., your original string is not ASCII), you're bound to get very unexpected results from this operation, so keep that in mind.
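For comparison, the same conversion sketched in Java (utf8Ints is a hypothetical helper). It also shows that the codes of "A" and "1" are 65 and 49 as decimal integers, and 0x41 and 0x31 in hex:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Java counterpart of Swift's Array(i.utf8.lazy.map { Int($0) }).
class Utf8Ints {

    static int[] utf8Ints(String fileName) {
        byte[] bytes = fileName.getBytes(StandardCharsets.UTF_8);
        int[] result = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            // Java bytes are signed; masking gives 0...255, like Swift's UInt8.
            result[i] = bytes[i] & 0xFF;
        }
        return result;
    }

    public static void main(String[] args) {
        int[] codes = utf8Ints("A1");
        System.out.println(Arrays.toString(codes)); // [65, 49]
        System.out.println(Integer.toHexString(codes[0]) + " " + Integer.toHexString(codes[1])); // 41 31
    }
}
```

The same caveat applies here: for a non-ASCII name such as "\u00E9", this returns two values (195, 169) for a single character.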

convert ByteArray to String to ByteArray

I want to convert a ByteArray to a String and then convert the String back to a ByteArray, but the values change during the conversion. Can someone help me solve this problem?
person.proto:
syntax = "proto3";
message Person{
string name = 1;
int32 age = 2;
}
After running sbt compile, a case class Person is generated (by the protobuf compiler during compilation).
My MainClass:
val newPerson = Person(
name = "John Cena",
age = 44 //output
)
println(newPerson.toByteArray) //[B@50da041d
val l = newPerson.toByteArray.toString
println(l) //[B@7709e969
val l1 = l.getBytes
println(l1) //[B@f44b405
Why did the values change? How do I convert correctly?

[B@... is the format that a JVM byte array's .toString returns: [B (which means "byte array"), an @, and a hex string. That hex string is the array's identity hash code, which is loosely analogous to the memory address at which the array resides (I'm deliberately not calling it a pointer; how, if at all, it relates to an actual address is JVM-dependent and can be affected by things like which garbage collector is in use). The important thing is that two different arrays with the same bytes in them will (almost always) have different .toStrings. Note that in some places (e.g. the REPL), Scala will print something like Array(-127, 0, 0, 1) rather than calling .toString, which may cause confusion.
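A short Java sketch of this behavior (ArrayToStringDemo and defaultToString are illustrative names; the bytes are arbitrary, not a real Person encoding):

```java
import java.util.Arrays;

// Why printing a byte array shows "[B@..." rather than its contents.
class ArrayToStringDemo {

    // Object.toString() is getClass().getName() + "@" + Integer.toHexString(hashCode());
    // arrays do not override hashCode(), so that is the identity hash code.
    static String defaultToString(Object o) {
        return o.getClass().getName() + "@" + Integer.toHexString(System.identityHashCode(o));
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        byte[] b = a.clone(); // same contents, different array object

        System.out.println(a);                   // "[B@" followed by a hex hash code
        System.out.println(b);                   // almost certainly a different suffix
        System.out.println(Arrays.equals(a, b)); // true: the contents match
        System.out.println(Arrays.toString(a));  // [1, 2, 3]
    }
}
```

Comparing arrays by content therefore needs Arrays.equals (or sameElements in Scala), not their printed form.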
It appears that toByteArray emits a new array each time it's called. So the first time you call newPerson.toByteArray, you get an array at a location corresponding to 50da041d. The second time you call it, you get a byte array with the same contents at a location corresponding to 7709e969, and you save the string [B@7709e969 into the variable l. When you then call getBytes on that string (saving it in l1), you get a byte array which is an encoding of the string "[B@7709e969" at the location corresponding to f44b405.
So at the locations corresponding to 50da041d and 7709e969 you have two different byte arrays which happen to contain the same elements (those elements being the bytes in the proto representation of newPerson). At the location corresponding to f44b405 you have a byte array whose bytes encode the string [B@7709e969 in the platform's default charset (getBytes with no argument uses the default charset, typically UTF-8).
Because a proto isn't really a string, there's no general way to get a useful string (depending on what definition of useful you're dealing with). You could try interpreting a byte array from toByteArray as a string with a given character encoding, but there's no guarantee that any given proto will be valid in an arbitrary character encoding.
An encoding that is purely 8-bit, like ISO-8859-1, maps every byte to a character, so any byte array can be decoded (and re-encoded to the same bytes). However, the result may contain non-printable or control characters, so it's not likely to be that useful:
val iso88591Representation = new String(newPerson.toByteArray, java.nio.charset.StandardCharsets.ISO_8859_1)
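In Java terms (the same classes are available from Scala), a sketch comparing a bytes -> String -> bytes round trip through ISO-8859-1 and through UTF-8. CharsetRoundTrip is a hypothetical name:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Which charsets survive a bytes -> String -> bytes round trip.
class CharsetRoundTrip {

    static boolean roundTripsIso(byte[] data) {
        // ISO-8859-1 maps each byte to exactly one char in U+0000..U+00FF.
        String s = new String(data, StandardCharsets.ISO_8859_1);
        return Arrays.equals(data, s.getBytes(StandardCharsets.ISO_8859_1));
    }

    static boolean roundTripsUtf8(byte[] data) {
        // Malformed UTF-8 sequences are replaced with U+FFFD, so information is lost.
        String s = new String(data, StandardCharsets.UTF_8);
        return Arrays.equals(data, s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i; // every possible byte value
        }
        System.out.println(roundTripsIso(all));  // true
        System.out.println(roundTripsUtf8(all)); // false
    }
}
```

So if a String really is required, decoding and re-encoding with ISO_8859_1 gives back the original proto bytes, even though the string itself is not human-readable.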
Alternatively, you might want a representation like how the Scala REPL will (sometimes) render it:
"Array(" + newPerson.toByteArray.mkString(", ") + ")"

MongoDB should report error when negative integer is used in dot notation?

MongoDB lets you use dot notation to query fields of embedded documents or elements of arrays (see ref1 or ref2). For instance, if a is an array in the documents, the following query:
db.c.find({"a.1": "foo"})
returns all documents in which the 2nd element of the a array is the string "foo".
So far, so good.
What is a bit surprising is that MongoDB accepts negative values for the index, e.g.:
db.c.find({"a.-1": "foo"})
That doesn't return anything (which makes sense if it is unsupported syntax), but I wonder why MongoDB doesn't return an error for this operation, or whether it has some meaning after all. The documentation (as far as I've checked) doesn't provide any clue.
Any information on this is welcome!

That is not an error. The BSON spec defines a key name as
Zero or more modified UTF-8 encoded characters followed by '\x00'. The (byte*) MUST NOT contain '\x00', hence it is not full UTF-8.
Since "-1" is a valid string by that definition, it is a valid key name.
Quick demo:
> db.test.find({"a.-1":{$exists:true}})
{ "_id" : 0, "a" : { "-1" : 3 } }
Also note how that spec defines array:
Array - The document for an array is a normal BSON document with integer values for the keys, starting with 0 and continuing sequentially. For example, the array ['red', 'blue'] would be encoded as the document {'0': 'red', '1': 'blue'}. The keys must be in ascending numerical order.
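A simplified Java model of the spec text above (not MongoDB's actual query matcher): an array is stored as a document keyed "0", "1", ..., so no element ever lives under "-1", while "-1" is still a legal key for an embedded document. ArrayAsDocument and its methods are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of how BSON stores arrays; path segments are plain key names.
class ArrayAsDocument {

    // An array becomes a document with keys "0", "1", ... in ascending order.
    static Map<String, Object> asBsonArrayDocument(List<?> values) {
        Map<String, Object> doc = new LinkedHashMap<>();
        for (int i = 0; i < values.size(); i++) {
            doc.put(Integer.toString(i), values.get(i));
        }
        return doc;
    }

    // BSON key names are cstrings: any string without an embedded NUL character.
    static boolean isValidKeyName(String key) {
        return key.indexOf('\0') < 0;
    }

    public static void main(String[] args) {
        Map<String, Object> colors = asBsonArrayDocument(List.of("red", "blue"));
        System.out.println(colors);               // {0=red, 1=blue}
        System.out.println(colors.get("-1"));     // null: no element is stored under "-1"
        System.out.println(isValidKeyName("-1")); // true: "-1" is a legal key name

        // Like { "a" : { "-1" : 3 } } from the demo above.
        Map<String, Integer> embedded = Map.of("-1", 3);
        System.out.println(embedded.get("-1"));   // 3
    }
}
```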

Decoded Snappy compressed byte arrays have trailing zeros

I am trying to write Snappy-compressed byte arrays (serialized protobufs) to a Hadoop SequenceFile and read them back.
The arrays read back from Hadoop have trailing zeros. If a byte array is small and simple, removing the trailing zeros is enough to parse the protobuf back; however, for more complex objects and big sequence files, parsing fails.
Byte array example:
val data = Array(1,2,6,4,2,1).map(_.toByte)
val distData = sparkContext.parallelize(Array.fill(5)(data))
.map(j => (NullWritable.get(), new BytesWritable(j)))
distData
.saveAsSequenceFile(file, Some(classOf[SnappyCodec]))
val original = distData.map(kv=> kv._2.getBytes).collect()
val decoded = sparkContext
.sequenceFile[NullWritable, BytesWritable](file)
.map( kv => kv._2.getBytes.mkString).collect().foreach(println(_))
Output:
original := 126421
decoded := 126421000

This problem stems from BytesWritable.getBytes, which returns the backing array, and that array may be longer than your data. Instead, call copyBytes (as in Write and read raw byte arrays in Spark - using Sequence File SequenceFile), or use only the first getLength() bytes of getBytes().
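A self-contained Java sketch of the difference between the backing array and the valid bytes. PaddedBuffer is a hypothetical stand-in for BytesWritable, and its 1.5x growth is an assumption, chosen because it would account for the three extra zeros in the output above (six bytes of data, nine bytes of capacity):

```java
import java.util.Arrays;

// Hypothetical stand-in for a reusable buffer such as BytesWritable:
// the backing array (capacity) can be longer than the valid data (length).
class PaddedBuffer {
    private byte[] bytes = new byte[0];
    private int length = 0;

    void set(byte[] data) {
        if (data.length > bytes.length) {
            // Assumed growth policy of 1.5x the requested size; with 6 bytes of
            // data this gives a 9-byte backing array, i.e. 3 trailing zeros.
            bytes = Arrays.copyOf(bytes, data.length * 3 / 2);
        }
        System.arraycopy(data, 0, bytes, 0, data.length);
        length = data.length;
    }

    // Whole backing array, padding included (the source of the trailing zeros).
    byte[] getBytes() {
        return bytes;
    }

    // Number of bytes that actually belong to the record.
    int getLength() {
        return length;
    }

    // Fresh array holding only the valid bytes.
    byte[] copyBytes() {
        return Arrays.copyOf(bytes, length);
    }

    public static void main(String[] args) {
        PaddedBuffer w = new PaddedBuffer();
        w.set(new byte[] {1, 2, 6, 4, 2, 1});
        System.out.println(Arrays.toString(w.getBytes()));  // [1, 2, 6, 4, 2, 1, 0, 0, 0]
        System.out.println(Arrays.toString(w.copyBytes())); // [1, 2, 6, 4, 2, 1]
    }
}
```

With the real class, the equivalent calls would be w.copyBytes() or java.util.Arrays.copyOf(w.getBytes(), w.getLength()).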
See HADOOP-6298: BytesWritable#getBytes is a bad name that leads to programming mistakes for more details.

How to convert byte[] back to Barcode in ZXing

ZXing.Result obtained from ZXing.BarcodeReader provides a property of RawBytes of byte[]. However, I have tried but been unable to find a function in ZXing.BarcodeWriter which accepts byte[] as its argument.
I want to have a barcode from ZXing.BarcodeWriter which is exactly the same as that ZXing.BarcodeReader reads.
For example, say the barcode is known to be Code 128.
BarcodeReader gives RawBytes starting with the 3 bytes 105, 102, 42,
which means [Start Code C], [FNC 1], [42] in Code 128,
which means the barcode starts with the 2 digits 4 and 2.
The main reason I'm looking for such a function is that metadata is lost if a barcode is converted to a string and back:
the string only represents 4 and 2, and [Start Code C] and [FNC 1] are lost.
Is there a function for that? You may assume the barcode format is known.
I am using ZXing.NET, but I suppose the functions are similar across different platforms.
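To make the codeword interpretation above concrete, a small Java sketch that labels Code 128 codeword values, assuming the symbol starts in and stays in code set C (Code128Codewords is a hypothetical helper, not a ZXing API):

```java
// Labels Code 128 codeword values such as the RawBytes 105, 102, 42.
// Assumes code set C throughout; not a full decoder.
class Code128Codewords {

    static String describe(int[] codewords) {
        StringBuilder out = new StringBuilder();
        for (int cw : codewords) {
            if (cw == 105) {
                out.append("[Start Code C]");
            } else if (cw == 102) {
                out.append("[FNC 1]");
            } else if (cw == 106) {
                out.append("[Stop]");
            } else if (cw >= 0 && cw <= 99) {
                out.append(String.format("%02d", cw)); // in code set C one value encodes two digits
            } else {
                out.append("[").append(cw).append("]"); // anything else is left unlabeled
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(new int[] {105, 102, 42})); // [Start Code C][FNC 1]42
    }
}
```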

Whether an encode method with a byte array parameter is available depends on the barcode type. For Aztec codes you can use the class com.google.zxing.aztec.encoder.Encoder, which provides the method:
public static AztecCode encode(byte[] data)
The Encoder class for QR codes, for example, doesn't provide an encode method that takes a byte[].

No, there is no BarcodeWriter method that accepts a byte array.
But you can set an option that makes the reader interpret the GS1 symbology:
var reader = new BarcodeReader { Options = new DecodingOptions { AssumeGS1 = true } };
The barcode reader will now convert the FNC1 codeword to a string representation in the result, in your case "]C142". The FNC1 representation "]C1" isn't very intuitive, but it is defined in the GS1 spec for Code 128 (section 5.4.6.4). Every subsequent FNC1 codeword is translated to the group separator character (GS / 0x1D / (char)29).
The barcode writer, on the other hand, uses only the group separator character.
That means if you want to generate the same barcode from the result string, you have to replace the leading "]C1" with the group separator character, (char)29.
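The replacement step, sketched in Java (readerResultToWriterInput is a hypothetical helper; in ZXing.NET the same string operation would be written in C#):

```java
// Turns a GS1 reader result like "]C142" into writer input,
// where the leading FNC1 must be given as the group separator character.
class Gs1Text {
    static final char GROUP_SEPARATOR = (char) 29; // GS, 0x1D

    static String readerResultToWriterInput(String result) {
        if (result.startsWith("]C1")) {
            // Replace the leading symbology identifier with GS.
            return GROUP_SEPARATOR + result.substring(3);
        }
        return result; // no leading "]C1": nothing to replace
    }

    public static void main(String[] args) {
        String input = readerResultToWriterInput("]C142");
        System.out.println(input.length());        // 3
        System.out.println((int) input.charAt(0)); // 29
    }
}
```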
