How does protobuf parse a message from a large serialized string? - protobuf

I have a message with a repeated field. Assume the repeated field is very large, and I then save the proto object to a string (SerializeToString) whose size is 200 MB.
message ChunkBody {
  repeated SingleMessage messages = 1;
}
Now, if I want to parse this object (ParseFromString), do I have to read the whole 200 MB file before I can parse it?
What I actually want is to iteratively read the repeated messages from the file.
The environment I want to use:
lib: protobuf
language: Python
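ParseFromString works on a single bytes object, so parsing one big ChunkBody does mean holding the whole serialized blob in memory. A common workaround is to write each SingleMessage with its own length prefix so the file can be read back one message at a time. This is only an illustrative sketch, not part of the original thread; it assumes chunk_pb2 is the module generated by protoc from the .proto above. (The Java protobuf API has writeDelimitedTo/parseDelimitedFrom for this pattern; here the prefix is written by hand.)

import struct
import chunk_pb2  # hypothetical module generated by protoc from the .proto above

def write_messages(path, messages):
    # Write each SingleMessage preceded by a 4-byte little-endian length.
    with open(path, "wb") as f:
        for msg in messages:
            data = msg.SerializeToString()
            f.write(struct.pack("<I", len(data)))
            f.write(data)

def read_messages(path):
    # Yield SingleMessage objects one at a time without loading the whole file.
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if not header:
                break
            (size,) = struct.unpack("<I", header)
            msg = chunk_pb2.SingleMessage()
            msg.ParseFromString(f.read(size))
            yield msg

for msg in read_messages("chunks.bin"):
    pass  # process each message here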

Related

Unable to pass dynamic and unique date values in JMeter

I have a request payload (JSON format) which has an array with 1000 objects. Each object has 6 key-value pairs, of which 5 are read from a CSV file using parameterization, and the 6th key has to be a unique future date value for each object in the array.
I tried this with the time-shift function, which works for 1 iteration, but I want to execute it for n iterations.
I looked for Groovy code for this, but I have no knowledge of Groovy and have just started learning it.
How can I achieve this in JMeter?
Also, when reading the time-shift function from HTTP Request Defaults - Parameters or from the Test Plan - User Defined Variables, it does not read a different date for each object; it duplicates the same date of the first variable in each object.
{
  "deviceNumber": "XX",
  "array": [
    {
      "keyValue1": "${value1_ReadFromCSV}",
      "keyValue2": "${value2_ReadFromCSV}",
      "keyValue3": "${value3_ReadFromCSV}",
      "keyValue4": "${value4_ReadFromCSV}",
      "keyValue5": "${value5_ReadFromCSV}",
      "keyValue6": "2020-05-23" (Should be dynamically generated)
    },
    {
      "keyValue7": "value7_ReadFromCSV",
      "keyValue8": "value8_ReadFromCSV",
      "keyValue9": "value9_ReadFromCSV",
      "keyValue10": "value10_ReadFromCSV",
      "keyValue11": "value11_ReadFromCSV",
      "keyValue12": "2020-05-24" (Should be dynamically generated)
    },
    ...
    {
      "keyValue995": "value995_ReadFromCSV",
      "keyValue996": "value996_ReadFromCSV",
      "keyValue997": "value997_ReadFromCSV",
      "keyValue998": "value998_ReadFromCSV",
      "keyValue999": "value999_ReadFromCSV",
      "keyValue1000": "2025-12-31" (Should be dynamically generated)
    }
  ]
}
I have got a partial solution to this by reading the CSV file line by line and storing each line in a variable using Groovy. However, I don't want to store the line directly in the variable, but to create a JSON object like the one above from each line of the CSV file, with a unique future date for each object in the array.
The CSV file is: (Note: I have removed the date column from the CSV as I no longer need it.)
deviceNumber,keyValue1,keyValue2,keyValue3,keyValue4,keyValue5,keyValue7,keyValue8,keyValue9,keyValue10,keyValue11,keyValue12,keyValue13,keyValue15,keyValue15,keyValue16
01,somestring,somestring,somestring,somestring,somestring,somestring,somestring,somestring,somestring,somestring,somestring,somestring,somestring,somestring,somestring
02,somestring,somestring,somestring,somestring,somestring,somestring,somestring,somestring,somestring,somestring,somestring,somestring,somestring,somestring,somestring
03,somestring,somestring,somestring,somestring,somestring,somestring,somestring,somestring,somestring,somestring,somestring,somestring,somestring,somestring,somestring
.
.
.
1000,somestring,somestring,somestring,somestring,somestring,somestring,somestring,somestring,somestring,somestring,somestring,somestring,somestring,somestring,somestring
Kindly suggest any reference/example to do this.
I can provide only generic instructions:
You can dynamically construct request body using JSR223 PreProcessor
You can read CSV file into memory using File.readLines() function
You can build JSON out of the values from the CSV file using JsonBuilder class
More information:
Apache Groovy - Parsing and producing JSON
Apache Groovy - Why and How You Should Use It
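As a rough illustration of the approach described above (sketched in Python rather than the Groovy JSR223 code the steps refer to, with a hypothetical generatedDate field name and data.csv path): read the CSV rows, attach an incrementing future date to each object, and serialize the whole payload as JSON.

import csv
import json
from datetime import date, timedelta

def build_payload(csv_path):
    array = []
    with open(csv_path, newline="") as f:
        for i, row in enumerate(csv.DictReader(f)):
            obj = dict(row)                     # values read from the CSV columns
            obj.pop("deviceNumber", None)       # drop the per-row device number for this sketch
            # unique future date per object: today + (row index + 1) days
            obj["generatedDate"] = (date.today() + timedelta(days=i + 1)).isoformat()
            array.append(obj)
    return json.dumps({"deviceNumber": "XX", "array": array}, indent=2)

print(build_payload("data.csv"))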

How could I get the offset of a field in a FlatBuffers binary file?

I am using a library which requires me to provide the offset of the desired data in the file, so it can use mmap to read the data (I can't edit the library's code, only provide the offset).
So I want to use FlatBuffers to serialize my whole data, because there isn't any packing and unpacking in FlatBuffers (I think), which should make it easy to get the offset of the desired part in the binary file.
But I don't know how to get the offset. I have tried loading the binary file and calculating the offset of the pointer to the desired field: for example, if the address of the root is 1111 and the address of the desired field is 1222, then the offset of the field in the binary file is 1222 - 1111 = 111 (because there is no unpacking step). But in fact, the offset of the pointer is a huge negative number.
Could someone help me with this problem? Thanks in advance!
FlatBuffers is indeed very suitable for mmap. There are no offsets to be computed, since the generated code does that all for you. You should simply mmap the whole FlatBuffers file, and then use the field accessors as normal, starting from auto root = GetRoot<MyRootType>(my_mmapped_buffer). If you want to get a direct pointer to the data in a larger field such as a string or a vector, again simply use the provided API: root->my_string_field()->c_str() for example (which will point to inside your mmapped buffer).
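The accessor example above is C++. As a rough sketch of the same idea in Python, assuming a hypothetical schema whose root type is MyRootType with a string field my_string_field, compiled with flatc --python (module and accessor names below follow the usual generated naming, but are assumptions):

import mmap
from MyRootType import MyRootType  # hypothetical flatc-generated module

with open("data.bin", "rb") as f:
    buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    root = MyRootType.GetRootAsMyRootType(buf, 0)  # no manual offsets needed
    print(root.MyStringField())                    # bytes, read straight out of the mapped file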

Mirth Connect Database Reader automatic column mapping

Please could somebody confirm the following.
I am using Mirth Connect 3.5.08232.
My Source Connector is a Database Reader.
Say I am using a query that returns multiple rows and return the result (via JavaScript), as the documentation suggests, so that Mirth treats each row as a separate message. I also use a couple of Mapper steps as source transformers and save the mapped fields in my channel map (which ends up containing only those fields that I define in the transformers).
In the destination, and specifically, in destination response transformer (or destination body, if it is a JavaScript writer), how do I access the source fields?
The only way I found, by trial and error, is:
var rawMsg = connectorMessage.getRawData();
var xmlMsg = new XML(rawMsg);
logger.info(xmlMsg.some_field); // ignore the root element of rawMsg
Is this the right way to do this? I thought that maybe the fields that were nicely automatically detected would be put in some kind of a map, like sourceMap - but that doesn't seem to be the case, right?
Thank you
If you are using Mapper steps in your transformer to extract the data and put it into a variable map (like the channel map), then you can use any of the following methods to retrieve it from a subsequent JavaScript context (including a JavaScript Writer, and your response transformer):
var value = channelMap.get('key');
var value = $c('key');
var value = $('key');
Look at the Variable Maps section of the User Guide for more information.
So to recap, say you're selecting a column "mycolumn" with a Database Reader. The XML sent to the channel will be something like this:
<result>
<mycolumn>value</mycolumn>
</result>
Then you can choose to extract pieces of that message into specific variables for later use. The transformer allows you to easily drag-and-drop pieces of the sample inbound message.
Finally, in your JavaScript Writer (or in any subsequent filter, transformer, or response transformer), just drag the value into the field you want, and the corresponding JavaScript code will automatically be inserted.
One last note, if you are selecting a lot of variables and don't want to make Mapper steps for each one individually, you can use a JavaScript Step to iterate through the message and extract each column into a separate map variable:
for each (child in msg.children()) {
    channelMap.put(child.localName(), child.toString());
}
Or, you can just reference the columns directly from within the JavaScript Writer:
var msg = new XML(connectorMessage.getEncodedData());
var column1 = msg.column1.toString();
var column2 = msg.column2.toString();
...

General purpose Tuple in Hadoop

I'm new to Hadoop, so please don't judge my seemingly simple question too strictly.
The short version: what tuple data type can I use in Hadoop to store 2 longs as a single value in a sequence file?
Moreover, I want to be able to read and process this file with Apache Pig, like A = LOAD '/my/file' AS (a:long, (b:long, c:long)), and with Scala & Spark, like val a = sc.sequenceFile[LongWritable, DesiredTuple]("/my/file", 1).
The full story:
I'm writing a Hadoop job in Java, and I need to output a sequence file which contains 3 long values in each record. I use the first value as the key and group the two other values together as the value in my Reducer.
I tried several variants:
Using org.apache.hadoop.mapreduce.lib.join.TupleWritable
public class MyReducer extends Reducer<...> {
    public void reduce(Context context) {
        long a, b, c;
        // ...
        context.write(a, new TupleWritable(
            new LongWritable[]{new LongWritable(b), new LongWritable(c)}));
    }
}
But the javadoc of the TupleWritable class says "This is not a general-purpose tuple type." It seemed to be OK for a first attempt, but I can't get my tuples back. Look at a simple script in Apache Pig:
A = LOAD '/my/file' USING org.apache.pig.piggybank.storage.SequenceFileLoader()
AS (a:long, (b:long, t:long));
DUMP A;
I got something like this:
(2220,)
(5640,)
(6240,)
...
So what is the Apache Pig way of reading Hadoop's TupleWritable from a sequence file?
Furthermore, I tried changing the output format from sequence to text: job.setOutputFormatClass(TextOutputFormat.class);
This time I just looked at one of the output files:
> hdfs dfs -cat /my/file/part-r-00000 | head
2220 [,]
5640 [,]
6240 [,]
...
So the next question is: why is there nothing in my TupleWritable value?
After that, I tried org.apache.mahout.cf.taste.hadoop.EntityEntityWritable.
For a sequence file I got the same result as before:
grunt> A = LOAD '/my/file' USING org.apache.pig.piggybank.storage.SequenceFileLoader() AS (a:long, (b:long, c:long));
(2220,)
(5640,)
(6240,)
...
For a text file I got the desired result:
2220 2 15
5640 1 9
6240 0 1
...
And the next question is: how can I read such tuples (EntityEntityWritable), and maybe other custom objects, back from a Hadoop-written sequence file?

Sending a corrupted CSV line over a socket in Python 3

I am building a sensor unit where data is gathered on one Raspberry Pi and sent to other Pis over the network.
My first Pi creates a line with multiple readings from different sensors. It is supposed to create a server and send the line to clients. The client Pis need to receive the line and do further processing or visualisation.
To test my solution I want to read data from a txt file which was built in an experiment. The problem is that sometimes the data is corrupted, the format differs depending on the sensor, and rows can differ between set-ups.
I have built a function which is supposed to convert the input line to bytes. (I tried different methods, but only this clunky function got close to any result.) But it does not transfer correctly over the network.
import struct

message = ['First sensor', 'second data', 'third', 1, '19.04.2016', 0.1]

def packerForNet(message):
    # Build a struct format string and a parallel list of pack-ready values.
    pattern = ''
    newMessage = []
    for cell in message:
        if isinstance(cell, int):
            pattern += 'I'
            newMessage.append(cell)
        elif isinstance(cell, float):
            pattern += 'd'
            newMessage.append(cell)
        elif isinstance(cell, str):
            pattern += str(len(cell))
            pattern += 's'
            newMessage.append(cell.encode('UTF-8'))
        else:
            cell = str(cell)
            pattern += str(len(cell))  # len() must be converted to str before concatenating
            pattern += 's'
            newMessage.append(cell.encode('UTF-8'))
    return (newMessage, pattern)

newMessage, pattern = packerForNet(message)
patternStruct = struct.Struct(pattern)
packedM = patternStruct.pack(*newMessage)
The output from the function does not unpack correctly:
packedM = b'First sensorsecond datathird\x01\x00\x00\x0019.04.2016\x00\x00\x00\x00\x00\x00\x9a\x99\x99\x99\x99\x99\xb9?'
56
print('unpacked = %s' % patternStruct.unpack(packedM))
TypeError: not all arguments converted during string formatting
In addition, I would need to know the pattern to unpack it on the client side, so in general this approach doesn't make sense.
In the final version the server needs to work so that, after a client connects, it sends the client a line from the sensors every millisecond. Sensor parsing is implemented in C and at the moment it creates a txt file for off-line processing. I can't change the way the sensors' lines are made.
I don't know how to pack the list of different types and constantly send such lists to the client.
Actually it does unpack correctly. The problem is not with the unpacking, but with the print.
Try:
unpackedM = patternStruct.unpack(packedM)
print(unpackedM)
unpackedM is a tuple of multiple values. The % operator treats a tuple as a set of separate arguments for the format string, so formatting a single '%s' with the whole tuple is what failed.
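If you prefer to keep the original % formatting, wrapping the result in another tuple also works:

print('unpacked = %s' % (unpackedM,))  # the outer tuple makes % see a single argument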
EDIT:
To convert entire objects you can use Python msgpack. That is what we use for Python-to-Python and PHP-to-Python communication.
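A minimal sketch of that suggestion (assumes the msgpack package is installed, e.g. pip install msgpack): pack the mixed-type list into a self-describing byte string, send it over the socket, and unpack it on the other side without knowing any struct pattern in advance.

import msgpack

message = ['First sensor', 'second data', 'third', 1, '19.04.2016', 0.1]

packed = msgpack.packb(message)     # bytes, ready for socket.sendall()
unpacked = msgpack.unpackb(packed)  # back to a list on the receiving side
print(unpacked)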