Using a pre-parsed protocol definition in a script and keeping it up-to-date - perl

For my work, I sometimes have to deal with logfiles from a binary protocol (the logfiles contain hexdumps of the messages). I want to write a Perl script that can interpret the binary data for me and print the contents in a more friendly format.
I have a (machine-readable) description of the protocol messages in a proprietary format, and I have (mostly) figured out how to parse that format (the parts I can't fully understand are not relevant to my goal, so I can simply ignore them). This means I can convert the description into a data structure for use in my script.
Because the protocol description changes only rarely, it seems wasteful to re-parse it every time I want to analyse a logfile. On the other hand, if the description does change, or if I accidentally throw away my pre-parsed form of it, I would like my script to trigger a re-parse automatically.
What is the best way to realise this?

Assuming that the protocol description lives in a file accessible to the script, write a function that returns the parsed data and caches the result in an intermediate file. The logic is very simple, although the steps below look verbose because I have written out the full spec; in reality it should take fewer than ten lines of Perl (a sketch follows the steps).
1. Check whether the intermediate cache file exists. If it does not (or cannot be read), skip to the parsing step (#4).
2. If you can read the intermediate cache file, read its "protocol description timestamp" field (described below). Then get the modification time of the protocol description file via stat() and compare the two. If the protocol description file is newer than the stored timestamp, it has changed since the cache was written, so skip to the parsing step (#4).
3. Otherwise (the description has not changed since the cache was written), read the cached data via Data::Dumper or Storable. Done.
4. If you got here because of #1 or #2, read the protocol description file and parse it into your data structure. Then build a hash with two keys: "protocol_description_timestamp" (the modification time of the protocol description file, from stat()) and "data" (a reference to the data structure you just produced). Finally, save that hash into the intermediate cache file using Storable, Data::Dumper, or any other method of your choice for serialising Perl data structures.
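Here is a minimal sketch of the whole thing, assuming the description lives in proto.desc, the cache in proto.cache, and that you already have a parse_description() routine for the proprietary format (all three names are just placeholders):

use Storable qw(store retrieve);

sub load_protocol {
    my ($desc_file, $cache_file) = @_;
    my $desc_mtime = (stat $desc_file)[9];    # modification time of the description

    # Use the cache only if it is readable and not older than the description.
    if (-r $cache_file) {
        my $cached = eval { retrieve($cache_file) };
        if ($cached && $cached->{protocol_description_timestamp} >= $desc_mtime) {
            return $cached->{data};
        }
    }

    # Otherwise (re-)parse the description and refresh the cache.
    my $data = parse_description($desc_file);    # your proprietary-format parser
    store({ protocol_description_timestamp => $desc_mtime, data => $data }, $cache_file);
    return $data;
}

my $protocol = load_protocol('proto.desc', 'proto.cache');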

You can use a Makefile for this. Make the file holding your pre-parsed data structure a Makefile target that depends on the protocol description.
When make notices that the protocol description was modified more recently than the generated file, it will run the commands you specify to recreate your data.
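A tiny sketch of such a Makefile (the file and script names are made up; parse_protocol.pl would write the pre-parsed structure, e.g. with Storable, and the recipe line must start with a tab):

protocol.cache: protocol.desc
	perl parse_protocol.pl protocol.desc protocol.cache

Your analysis script then simply reads protocol.cache, and running make first guarantees it is up to date.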

Related

Building an OID->MIB index from a PySMI script using JSON?

I have successfully compiled several MIBs into JSON using PySMI with JsonCodeGen and CallbackWriter (which uploads the parsed JSON to cloud storage). Now I am trying to build an index using freshly compiled JSON MIBs in combination with already-compiled JSON files.
From the documentation, it looks like I need to pass all of these files to the mibCompiler.compile() function, even though most of them have already been compiled, so that I can run mibCompiler.buildIndex() afterwards.
From what I understand, I need a searcher to exclude the already-compiled JSON MIBs... is this the case? All I see in the current code are PyFileSearcher, StubSearcher, and AnyFileSearcher. I'm not sure what to do from here to ignore my JSON files.
I'm also not sure buildIndex() will even accept JSON files as input, so I'm hoping this is the right approach.
Thanks in advance!
I'm also not sure buildIndex() will even accept JSON files as input, so I'm hoping this is the right approach.
Actually, no! The present-day PySMI compiler can only parse ASN.1 MIBs; it will fail on JSON input.
Probably the simplest solution would be to just load the JSON MIBs and the existing JSON index into Python as dicts and walk them, updating one another. Here is the code that builds the JSON index dict out of some internal objects (which carry pieces of MIB data).
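A rough sketch of that dict-walking approach (the file names, and the assumption that each compiled MIB object carries an "oid" key, are mine; adapt the merge to however your index is keyed):

import glob
import json

# Start from the existing index if there is one, otherwise from scratch.
try:
    with open('index.json') as f:
        index = json.load(f)
except FileNotFoundError:
    index = {}

# Merge every compiled JSON MIB into the index, e.g. mapping OID -> object name.
for path in glob.glob('compiled-mibs/*.json'):
    with open(path) as f:
        mib = json.load(f)
    for name, obj in mib.items():
        if isinstance(obj, dict) and 'oid' in obj:
            index[obj['oid']] = name

with open('index.json', 'w') as f:
    json.dump(index, f, indent=2, sort_keys=True)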
From the PySMI perspective, the best course of action would probably be to introduce a JSON MIB compiler which would turn a JSON MIB into the abstract syntax tree from which a JSON MIB index could be built...

Can ItemReaders just pass in the record read and not need a LineMapper to convert it to an object

I'm asking if I can pass into the ItemProcessors the entire delimited record read in the ItemReader as one long string.
I have situations with unpredictable data. The file is pipe-delimited, but even so, a single stray double-quote causes a parse error in Spring Batch's ItemReader.
In a standalone Java application I wrote code using Spring's StringUtils class. I read in the full delimited record as a String (via BufferedReader), then call Spring's StringUtils.delimitedListToStringArray(..., ...). This gets all the characters, valid or not, and then I can do a search/replace to handle things like stray double-quotes or commas in the fields.
My standalone Java program is a down-and-dirty solution; I'm turning it into a Spring Batch job for the long term. It's a monthly process, and it's impractical, if not impossible, to get SAP users to keep trash out of the data fields (i.e. fat-finger city).
I see where it appears I have to have a domain object for the input record to be mapped into. Is this correct, or can I do a pass-through scenario and handle the parsing myself using StringUtils?
The pipe-delimited records turn into comma-delimited records. There's really no need to create a domain object and do all the field set mapping.
I'm happy for ideas if I'm approaching this the wrong way.
Thanks in advance,
Michael
EDIT:
This is the error, and the record. The lone double-quote in column 6 is the problem. I can't control the input, so I'm scrubbing each field (all Strings) for unwanted characters. My solution, as mentioned earlier, was to skip the line mapping and use StringUtils to do the parsing myself.
Caused by: org.springframework.batch.item.file.FlatFileParseException: Parsing error at line: 33526 in resource=[URL [file:/temp/comptroller/myfile.txt]], input=[xxx|xxx|xxx|xxx|xxx|xxx x xxx xxxxxxx xxxx xxxx "x|xxx|xxx|xxxxx|xx|xxxxxxxxxxxxx|xxxxxxx|xxx|xx |xxx ]
at org.springframework.batch.item.file.FlatFileItemReader.doRead(FlatFileItemReader.java:182)
at org.springframework.batch.item.support.AbstractItemCountingItemStreamItemReader.read(AbstractItemCountingItemStreamItemReader.java:85)
at org.springframework.batch.core.step.item.SimpleChunkProvider.doRead(SimpleChunkProvider.java:90)
at org.springframework.batch.core.step.item.FaultTolerantChunkProvider.read(FaultTolerantChunkProvider.java:87)
... 27 more
Caused by: org.springframework.batch.item.file.transform.IncorrectTokenCountException: Incorrect number of tokens found in record: expected 15 actual 6
Since the domain objects you read from ItemReaders, write to ItemWriters, and optionally process with ItemProcessors can be any Object, they can be Strings.
So the short answer is yes: you should be able to use a FlatFileItemReader to read one line at a time, pass it to some ItemProcessor<String, String> which replaces your pipes with commas (and handles existing commas) with whatever code you want, and send the converted lines to a FlatFileItemWriter. Spring Batch includes common implementations of the LineTokenizer and LineAggregator interfaces which could help.
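For example, a pass-through processor might look roughly like this (the class name and the scrubbing rules are placeholders); the reader would use a PassThroughLineMapper so no FieldSetMapper or domain object is involved, and the writer a PassThroughLineAggregator:

import org.springframework.batch.item.ItemProcessor;
import org.springframework.util.StringUtils;

// Turns one raw pipe-delimited line into a comma-delimited line,
// scrubbing stray characters field by field along the way.
public class PipeToCsvProcessor implements ItemProcessor<String, String> {

    @Override
    public String process(String line) {
        String[] fields = StringUtils.delimitedListToStringArray(line, "|");
        for (int i = 0; i < fields.length; i++) {
            // Example scrubbing only: drop stray double-quotes, neutralise embedded commas.
            fields[i] = fields[i].replace("\"", "").replace(",", " ");
        }
        return StringUtils.arrayToCommaDelimitedString(fields);
    }
}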
In this scenario, Spring Batch would be acting like a glorified search replace tool, with saner failure handling. To answer the bigger question of whether you should be using domain objects, or at least beans, think about whether you want to perform other tasks in the conversion process, like validation.
P.S. I'm not aware that FlatFileItemReader blows up on a single double-quote; you might want to file that as a bug.

using protobuf as a textual configuration file

I recently encountered a very large mission-critical project where all the configuration files were defined using textual protobuf definitions. The configuration files are meant to be human readable and editable.
For example
message ServerSettings {
required int32 port = 3022;
optional string name = "mywebserver";
}
Personally I found this humorous.
But is it in fact a reasonable keep-it-simple technique, or clearly moronic?!
In other words, are there REAL, ACTUAL problems with this?
If that is the text proto format, then... whatever, I guess. If it works, then it is as reasonable as any other serialization format.
If that is meant to be proto schema, then it is illegal (the value after the = is meant to be the field number).
JSON or XML might be more typical, but as long as it works it isn't "moronic". So the ultimate question is: does it work?
I think it's quite clever. I am guessing they pass it through protoc --encode to generate a binary which is what is actually parsed.
Pros:
1. Code is generated to parse configuration
2. Type validation
3. More robust configuration file compared to a key/value as it supports structs, unions, maps and arrays
4. The configuration data is now serializable meaning it can be easily exposed to an RPC or IPC interface.
Cons:
1. The syntax can be a little verbose for maps/arrays.
2. It requires protoc to be installed on the target, as well as libprotobuf.so, which can be a problem on a system with tight memory limits.
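To make the --encode workflow guessed at above concrete, here is a minimal sketch (the message, field, and file names are made up). The schema carries field numbers, not the values shown in the question:

syntax = "proto2";
message ServerSettings {
  required int32 port = 1;                             // field number, not a value
  optional string name = 2 [default = "mywebserver"];
}

The human-editable configuration file then uses protobuf text format:

port: 3022
name: "myserver"

and would be converted to the binary form the program actually loads with something like:

protoc --encode=ServerSettings server.proto < server.cfg > server.bin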

Spreadsheet::ParseExcel module in perl

I always get confused when I deal with classes and objects. As I am trying to understand the Spreadsheet::ParseExcel module, I have some doubts about its classes and objects.
My doubt is:
With $parser = Spreadsheet::ParseExcel->new(); we create a Spreadsheet::ParseExcel object, and only after that do we get a Spreadsheet::ParseExcel::Workbook object.
Why can't we create a Spreadsheet::ParseExcel::Workbook object directly and start parsing?
Thanks
Why can't we create a Spreadsheet::ParseExcel::Workbook object directly and start parsing?
That is a reasonable question and in older versions of Spreadsheet::ParseExcel there was a Spreadsheet::ParseExcel::Workbook->Parse() method that did just that. (*)
Users tend to see an Excel file only as a workbook. However, the file format also contains metadata (author, creation date, etc.) and VBA macros that are separate from the workbook data.
As such the logical division of the parser from the workbook probably occurred due to the physical division of the data in the file.
Or it may have been to allow reporting of file parsing errors rather than just returning an undefined workbook object.
Either way, other people may have chosen to model the interface differently but that is what the original author chose. It is not completely intuitive but it works.
(*) This method is now deprecated since it doesn't allow error checking on the file.
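The current two-step usage looks like this (the file name is a placeholder), and the separate parser object is exactly where the error reporting lives:

use Spreadsheet::ParseExcel;

my $parser   = Spreadsheet::ParseExcel->new();
my $workbook = $parser->parse('Book1.xls');

# The parser, not the workbook, reports what went wrong.
die $parser->error(), "\n" if !defined $workbook;

for my $worksheet ( $workbook->worksheets() ) {
    my ( $row_min, $row_max ) = $worksheet->row_range();
    my ( $col_min, $col_max ) = $worksheet->col_range();
    # ... walk the cells with $worksheet->get_cell( $row, $col ) ...
}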
Think about Spreadsheet::ParseExcel and Spreadsheet::ParseExcel::Workbook as being of different types, like integer and string: both are scalars, but you cannot, say, multiply them, although they can interact in some cases. For example, length() applied to a string gives you the integer length of that string. In the same way, Spreadsheet::ParseExcel's parse() gives you a Spreadsheet::ParseExcel::Workbook. They are bound by a common namespace but they are completely different things: Spreadsheet::ParseExcel is a parser and Spreadsheet::ParseExcel::Workbook is a workbook.

loading parameter files for different data sets

I need to analyse several sets of data which are associated with different parameter sets (one single set of parameters for each set of data). I'm currently struggling to find a good way to store these parameters such that they are readily available when analysing a specific dataset.
The first thing I tried was saving them in a script file parameters.m in the data directory and loading them with run([path_to_data,'/parameters.m']). I understand, however, that this is not good coding practice, and it also gave me scoping problems (I think), as changes in parameters.m were not always reflected in my workspace variables. (Workspace variables only changed after clear all and rerunning the code.)
A clean solution would be to define a function parameters() in each data directory, but then again I would need to add the directory to the search path. Also I fear I might run into namespace collisions if I don't give the functions unique names. Using unique names is not very practical on the other hand...
Is there a better solution?
So define a struct or cell array called parameters and store it in the data directory it belongs in. I don't know what your parameters look like, but ours might look like this:
parameters.relative_tolerance = 10e-6
parameters.absolute_tolerance = 10e-6
parameters.solver_type = 3
.
.
.
and I can write
save('parameter_file', 'parameters')
or even
save('parameter_file', '-struct', 'parameters', 'field1', ..., 'fieldN')
The online help reveals how to use -struct to store fields from a structure as individual variables should that be useful to you.
Once you've got the parameters saved you can load them with the load command.
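For example (assuming the file created above):

S = load('parameter_file');    % S.parameters is the struct saved earlier
parameters = S.parameters;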
To sum up: create a variable (most likely a struct or cell array) called parameters and save it in the data directory for the experiment it refers to. You then have all the usual Matlab tools for reading, writing and investigating the parameters as well as the data. I don't see a need for a solution more complicated than this (though your parameters may be complicated themselves).