convert ByteArray to String to ByteArray - scala

I want to convert ByteArray to string and then convert the string to ByteArray,But while converting values changed. someone help to solve this problem.
person.proto:
syntax = "proto3";
message Person{
string name = 1;
int32 age = 2;
}
After sbt compile it gives case class Person (created by google protobuf while compiling)
My MainClass:
val newPerson = Person(
name = "John Cena",
age = 44 //output
)
println(newPerson.toByteArray) //[B#50da041d
val l = newPerson.toByteArray.toString
println(l) //[B#7709e969
val l1 = l.getBytes
println(l1) //[B#f44b405
why the values changed?? how to convert correctly??

[B#... is the format that a JVM byte array's .toString returns, and is just [B (which means "byte array") and a hex-string which is analogous to the memory address at which the array resides (I'm deliberately not calling it a pointer but it's similar; the precise mapping of that hex-string to a memory address is JVM-dependent and could be affected by things like which garbage collector is in use). The important thing is that two different arrays with the same bytes in them will have different .toStrings. Note that in some places (e.g. the REPL), Scala will instead print something like Array(-127, 0, 0, 1) instead of calling .toString: this may cause confusion.
It appears that toByteArray emits a new array each time it's called. So the first time you call newPerson.toByteArray, you get an array at a location corresponding to 50da041d. The second time you call it you get a byte array with the same contents at a location corresponding to 7709e969 and you save the string [B#7709e969 into the variable l. When you then call getBytes on that string (saving it in l1), you get a byte array which is an encoding of the string "[B#7709e969" at the location corresponding to f44b405.
So at the locations corresponding to 50da041d and 7709e969 you have two different byte arrays which happen to contain the same elements (those elements being the bytes in the proto representation of newPerson). At the location corresponding to f44b405 you have a byte array where the bytes encode (in some character set, probably UTF-16?) [B#7709e969.
Because a proto isn't really a string, there's no general way to get a useful string (depending on what definition of useful you're dealing with). You could try interpreting a byte array from toByteArray as a string with a given character encoding, but there's no guarantee that any given proto will be valid in an arbitrary character encoding.
An encoding which is purely 8-bit, like ISO-8859-1 is guaranteed to at least be decodable from a byte array, but there could be non-printable or control characters, so it's not likely to that useful:
val iso88591Representation = new String(newPerson.toByteArray, java.nio.charset.StandardCharsets.ISO_8859_1)
Alternatively, you might want a representation like how the Scala REPL will (sometimes) render it:
"Array(" + newPerson.toByteArray.mkString(", ") + ")"

Related

How to convert String to UTF-8 to Integer in Swift

I'm trying to take each character (individual number, letter, or symbol) from a string file name without the extension and put each one into an array index as an integer of the utf-8 code (i.e. if the file name is "A1" without the extension, I would want "A" as an int "41" in first index, and "1" as int "31" in second index)
Here is the code I have but I'm getting this error "No exact matches in call to instance method 'append'", my guess is because .utf8 still keeps it as a string type:
for i in allNoteFiles {
var CharacterArray : [Int] = []
for character in i {
var utf8Character = String(character).utf8
CharacterArray.append(utf8Character) //error is here
}
....`//more code down here within the for in loop using CharacterArray indexes`
I'm sure the answer is probably simple, but I'm very new to Swift.
I've tried appending var number instead with:
var number = Int(utf8Character)
and
var number = (utf8Character).IntegerValue
but I get errors "No exact matches in call to initializer" and "Value of type 'String.UTF8View' has no member 'IntegerValue'"
Any help at all would be greatly appreciated. Thanks!
The reason
var utf8Character = String(character).utf8
CharacterArray.append(utf8Character)
doesn't work for you is because utf8Character is not a single integer, but a UTF8View: a lightweight way to iterate over the UTF-8 codepoints in a string. Every Character in a String can be made up of any number of UTF-8 bytes (individual integers) — while ASCII characters like "A" and "1" map to a single UTF-8 byte, the vast majority of characters do not: every UTF-8 code point maps to between 1 and 4 individual bytes. The Encoding section of UTF-8 on Wikipedia has a few very illustrative examples of how this works.
Now, assuming that you do want to split a string into individual UTF-8 bytes (either because you can guarantee your original string is ASCII-only, so the assumption that "character = byte" holds, or because you actually care about the bytes [though this is rarely the case]), there's a short and idiomatic solution to what you're looking for.
String.UTF8View is a Sequence of UInt8 values (individual bytes), and as such, you can use the Array initializer which takes a Sequence:
let characterArray: [UInt8] = Array(i.utf8)
If you need an array of Int values instead of UInt8, you can map the individual bytes ahead of time:
let characterArray: [Int] = Array(i.utf8.lazy.map { Int($0) })
(The .lazy avoids creating and storing an array of values in the middle of the operation.)
However, do note that if you aren't careful (e.g., your original string is not ASCII), you're bound to get very unexpected results from this operation, so keep that in mind.

Get UTF-16 code unit at a given index in ABAP

I want to get the UTF-16 code unit at a given index in ABAP.
Same can be done in JavaScript with charCodeAt().
For example "d".charCodeAt(); will give back 100.
Is there a similar functionality in ABAP?
This can be done with class CL_ABAP_CONV_OUT_CE
DATA(lo_converter) = cl_abap_conv_out_ce=>create( encoding = '4103' ). "Litte Endian
TRY.
CALL METHOD lo_converter->convert
EXPORTING
data = 'a'
n = 1
IMPORTING
buffer = DATA(lv_buffer). "lv_buffer will 0061
CATCH ...
ENDTRY.
Codepage 4102 is for UTF-16 Big endian.
It is possible to encode not just a single character, but a string as well:
EXPORTING
data = 'abc'
n = 3
"n" always stands for the length of the string you want to be encoded. It could be less, than the actual length of the string.
When you say you "want to get the UTF-16 code unit",
either you mean the Unicode code point, e.g. the character d is always U+0064 (official "name" of Unicode character, the two bytes 0x0064 being the hexadecimal representation of decimal 100),
or you mean you want to encode d to UTF-16 little endian (SAP code page 4103) or big endian (SAP code page 4102) which gives respectively 2 bytes 0x4400 or 2 bytes 0x0044.
For the second case, see József answer.
For the first case, you may get it using the method UCCP (UniCode Code Point) or UCCPI (UniCode Code Point Integer) of class CL_ABAP_CONV_OUT_CE:
DATA: l_unicode_point_hex TYPE x LENGTH 2,
l_unicode_point_int TYPE i.
l_unicode_point_hex = cl_abap_conv_out_ce=>UCCP( 'd' ).
ASSERT l_unicode_point_hex = '0064'.
l_unicode_point_int = cl_abap_conv_out_ce=>UCCPI( 'd' ).
ASSERT l_unicode_point_int = 100.
EDIT: Note that the two methods return always the same values whatever the SAP system code page is (4102, 4103 or whatever).

How do I parse out a number from this returned XML string in python?

I have the following string:
{\"Id\":\"135\",\"Type\":0}
The number in the Id field will vary, but will always be an integer with no comma separator. I'm not sure how to get just that value from that string given that it's string data type and not real "XML". I was toying with the replace() function, but the special characters are making it more complex than it seems it needs to be.
is there a way to convert that to XML or something that I can reference the Id value directly?
Maybe use a regular expression, e.g.
import re
txt = "{\"Id\":\"135\",\"Type\":0}"
x = re.search('"Id":"([0-9]+)"', txt)
if x:
print(x.group(1))
gives
135
It is assumed here that the ids are numeric and consist of at least one digit.
Non-regex answer as you asked
\" is an escape sequence in python.
So if {\"Id\":\"135\",\"Type\":0} is a raw string and if you put it into a python variable like
a = '{\"Id\":\"135\",\"Type\":0}'
gives
>>> a
'{"Id":"135","Type":0}'
OR
If the above string is python string which has \" which is already escaped, then do a.replace("\\","") which will give you the string without \.
Now just load this string into a dict and access element Id like below.
import json
d = json.loads(a)
d['Id']
Output :
135

Decoded Snappy compressed byte arrays have trailing zeros

I am trying to write and read Snappy compressed byte array created from a protobuf from a Hadoop Sequence File.
The array read back from hadoop has trailing zeros. If a byte array is a small and simple removing trailing zeros is enough to parse the protobuf back, however for more complex objects and big sequence files parsing fails.
Byte array example:
val data = Array(1,2,6,4,2,1).map(_.toByte)
val distData = sparkContext.parallelize(Array.fill(5)(data))
.map(j => (NullWritable.get(), new BytesWritable(j)))
distData
.saveAsSequenceFile(file, Some(classOf[SnappyCodec]))
val original = distData.map(kv=> kv._2.getBytes).collect()
val decoded = sparkContext
.sequenceFile[NullWritable, BytesWritable](file)
.map( kv => kv._2.getBytes.mkString).collect().foreach(println(_))
Output:
original := 126421
decoded := 126421000
This problem stems from BytesWritable.getBytes, which returns a backing array that may be longer than your data. Instead, call copyBytes (as in Write and read raw byte arrays in Spark - using Sequence File SequenceFile).
See HADOOP-6298: BytesWritable#getBytes is a bad name that leads to programming mistakes for more details.

Breaking Hexadecimal Value to multiple lines in CoffeeScript

How do I break a long Hexadecimal Value in Coffeescript so that it spans multiple lines?
authKey = 0xe6b86ae8bdf696009c90e0e650a92c63d52a4b3232cca36e0ff2f5911e93bd0067df904dc21ba87d29c32bf17dc88da3cc20ba65c6c63f21eaab5bdb29036b83
to something like
authKey = 0xe6b86ae8bdf696009c90e0e650a92c63d52a4b323\
2cca36e0ff2f5911e93bd0067df904dc21ba87d29c3\
2bf17dc88da3cc20ba65c6c63f21eaab5bdb29036b83
Using \ results in an Unexpected 'NUMBER' Error,
using line break in an Unexpected 'INDENT' Error
There's actually no point in doing this in CoffeeScript because numbers are stored as 64-bit IEEE 754 values and you have too many bits of precision for the value to be stored as a number.
If you write
authKey = 0xe6b86ae8bdf696009c90e0e650a92c63d52a4b3232cca36e0ff2f5911e93bd0067df904dc21ba87d29c32bf17dc88da3cc20ba65c6c63f21eaab5bdb29036b83
console.log(authKey)
then the value logged is
1.2083806867379407e+154
You want to store your authKey as a string or byte array, both of which are trivial to write across multiple lines.
Like others have said, this doesn't really make a whole lot of sense to be stored in a number, as opposed to a string; however, I decided to throw something together to allow it anyway:
stringToNumber = ( str ) -> parseInt( str.replace( /\n/g, '' ) )
authKey = stringToNumber """
0xe6b86ae8bdf696009c90e0e650a92c63d52a4b323
2cca36e0ff2f5911e93bd0067df904dc21ba87d29c3
2bf17dc88da3cc20ba65c6c63f21eaab5bdb29036b83
"""
Like Ray said, this will just result in:
1.2083806867379407e+154