Does a description or specification exist for Scala pickled signatures?

I'm looking to pull some data out of the pickled signatures of classes stored on disk but not loaded into the JVM.
Getting hold of the byte array stored in the ScalaSignature is easy enough via ASM; it is, however, much less clear how those bytes should be interpreted.
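For reference, a minimal sketch of that extraction step with ASM might look like the following (the object name and file path are illustrative, and the scala.reflect.internal.pickling.ByteCodecs call is my assumption about how the annotation string is unpacked, since it mirrors what the compiler uses):

import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
import org.objectweb.asm.{AnnotationVisitor, ClassReader, ClassVisitor, Opcodes}

object PickleDump {
  def main(args: Array[String]): Unit = {
    // Read the class file from disk without loading it into the JVM
    val reader = new ClassReader(Files.readAllBytes(Paths.get(args(0))))

    var signature: Option[String] = None
    reader.accept(new ClassVisitor(Opcodes.ASM9) {
      override def visitAnnotation(desc: String, visible: Boolean): AnnotationVisitor =
        if (desc == "Lscala/reflect/ScalaSignature;")
          new AnnotationVisitor(Opcodes.ASM9) {
            // The annotation carries a single String element named "bytes"
            override def visit(name: String, value: AnyRef): Unit =
              if (name == "bytes") signature = Some(value.asInstanceOf[String])
          }
        else null
    }, 0)

    signature.foreach { sig =>
      // Unpack the string into the raw pickle bytes (assumption: the same codec scalac uses)
      val bytes  = sig.getBytes(StandardCharsets.UTF_8)
      val length = scala.reflect.internal.pickling.ByteCodecs.decode(bytes)
      println(s"pickle is $length bytes long")
    }
  }
}

Note that larger signatures are split into scala.reflect.ScalaLongSignature, which carries an array of strings rather than a single one, so a complete tool would need to handle both annotations.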
The closest thing I can find to a description of the format is
http://www.scala-lang.org/old/sid/10
which doesn't really describe it at all.
Does a better resource exist, or is my only option to delve into the source?

In answer to my own question, I've now found this, dated 2008, which looks to provide a good overview:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.214.5115&rep=rep1&type=pdf
I would be interested in any more recent documents.

Related

Looking for the layout of the first record of z/OS runnables starting with "IEWPLMH "

This feels something like an archaeology expedition, but I have been unable to find the record format of the first record of seemingly all executable load modules on z/OS systems. The record always starts with IEWPLMH, even when producing a GOFF-format runnable (which is what I have). Does anyone have any information on this, or a link to it?
The format of load modules is documented in the Load Module Formats section of the z/OS MVS Program Management: Advanced Facilities manual.
But I suspect you are looking for the format of a program object, which is not documented and which, last I knew, IBM had stated they would not document (at least not publicly, for the likes of us).
There are decades of history behind this. IBM found themselves painted into a corner because customers had written code that depended on the format of load modules not changing. As of 2011, there were 8 different formats/subformats of program object and that number has no doubt grown. By not documenting (for customers) the format of a program object, IBM felt they had freed themselves to make format changes (adding features customers wanted) as they saw fit.
You may be able to get the information you want using the Binder's API or AMBLIST.
The use of the IEWBINDD facility is definitely the way to go. For USS programs, the -Wc,DLL option is required when compiling the source, and -Wl,DYNAM=DLL does the trick when linking. The example program in the appendix of the z/OS MVS Program Management: Advanced Facilities manual was very helpful.

What's the best way to store a huge Map object populated at runtime to be reused by another tool?

I'm writing a Scala tool that encodes ~300 JSON Schema files into files of a different format and saves them to disk. I later need these schemas again for instantiating JSON data files; or rather, I don't need the whole of each schema but only a few fields of each.
I was thinking that the best solution could be to populate a Map object (while the tool encodes the schemas) containing only the info that I need, and later reuse the Map object (in another run of the tool) as an already populated map.
I've got two questions:
1. Is this really the most performant solution? and
2. How can I save the Map object, created at runtime, on disk as a file that can be later built/executed with the rest of my code?
I've read several posts about serialization and storing objects, but I'm not entirely sure whether those address what I need. Also, I'm not sure this is the best solution, and I would like to hear an opinion from people with more experience than me.
What I would like to achieve is an elegant solution that allows me to lookup values from a map generated by another tool.
The whole process of compiling/building/executing sometimes is still confusing to me, so apologies if the question is trivial.
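For reference, the most direct version of this idea, writing the populated Map to a file with plain Java serialization in one run and reading it back in the next, might look like the sketch below (the file path and the Map's type parameters are illustrative, not from the actual tool):

import java.io.{FileInputStream, FileOutputStream, ObjectInputStream, ObjectOutputStream}

object SchemaCache {
  // Write the populated map to disk; Scala's immutable Map is Serializable
  def save(map: Map[String, String], path: String): Unit = {
    val out = new ObjectOutputStream(new FileOutputStream(path))
    try out.writeObject(map) finally out.close()
  }

  // Read it back in a later run of the tool
  def load(path: String): Map[String, String] = {
    val in = new ObjectInputStream(new FileInputStream(path))
    try in.readObject().asInstanceOf[Map[String, String]] finally in.close()
  }
}

This keeps everything in one file, but the whole map has to be read into memory before any lookup can happen, which is where the answer below takes a different route.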
To answer your question:
I think using an embedded KV store would be more efficient, considering the number of files and the amount of traversal.
Here is a small wiki on how to use RocksJava; you can consider it as a KV store: https://github.com/facebook/rocksdb/wiki/RocksJava-Basics
You can use the reference below to serialize and deserialize an object in Scala and put it as a key/value pair into RocksDB, as I mentioned in the comment.
Convert Any type in scala to Array[Byte] and back
As for how to use RocksDB, the below dependency in your build will suffice:
"org.rocksdb" % "rocksdbjni" % "5.17.2"
Thanks.

Need some help understanding the Vocabulary of Interlinked Datasets (VoID) in Linked Open Data

I have been trying to understand VoID in Linked Open Data. It would be great if anyone could help clarify some of my confusions.
Does it need to be stored in a separate file, or can it be included in the RDF dataset itself? If so, how do I query it? (A sample query would be really helpful.)
How is the information in VoID used in real life?
Does it need to be stored in a separate file, or can it be included in the RDF dataset itself? If so, how do I query it? (A sample query would be really helpful.)
In theory no, but for practical purposes yes. In the end the information is encoded in triples, so it doesn't really matter which file you put them in, and you could argue that it's best to actually put the VoID info into the data files and serve these triples with your data as meta-information. It's queryable like any other RDF: either load it into some SPARQL endpoint or use a library that can directly load RDF files. This, however, also shows the reason why a separate file makes sense: instead of having to load potentially large data files just to get some dataset meta-information, it makes sense to offer the metadata in its own file.
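Since a sample query was asked for: a minimal sketch using Apache Jena to load a file that contains VoID triples and pull out a couple of common properties might look like this (the file name and the chosen properties are just examples, not anything prescribed by VoID):

import org.apache.jena.query.QueryExecutionFactory
import org.apache.jena.riot.RDFDataMgr

object VoidQuery {
  def main(args: Array[String]): Unit = {
    // Works the same whether the VoID triples sit in a separate void.ttl
    // or inside the data file itself
    val model = RDFDataMgr.loadModel("void.ttl")

    val query =
      """PREFIX void: <http://rdfs.org/ns/void#>
        |SELECT ?dataset ?triples ?endpoint WHERE {
        |  ?dataset a void:Dataset .
        |  OPTIONAL { ?dataset void:triples ?triples }
        |  OPTIONAL { ?dataset void:sparqlEndpoint ?endpoint }
        |}""".stripMargin

    val exec = QueryExecutionFactory.create(query, model)
    try {
      val results = exec.execSelect()
      while (results.hasNext) {
        val row = results.next()
        println(s"${row.get("dataset")}  triples=${row.get("triples")}  endpoint=${row.get("endpoint")}")
      }
    } finally exec.close()
  }
}

The same SELECT query can also be run as-is against a SPARQL endpoint that serves the VoID triples.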
How is the information in VoID used in real life?
VoID is actually already used in several scenarios, but it is mostly a recommendation and a good idea. The most prominent use case I know of is getting your dataset shown in the LOD Cloud. You currently have to register it with datahub.io and add a VoID file (example from my associations dataset).
Other examples (sadly, many of them defunct nowadays) can be found here: http://semanticweb.org/wiki/VoID.html

Documentation or specification for .step and .stp files

I am looking for some kind of specification, documentation, explanation, etc. for .stp/.step files.
It's more about what information each line contains, rather than general information about the format.
I can't seem to figure out what each value means all by myself.
Does anyone know some good readings about STEP files?
I already searched Google, but all I got was information about the general structure instead of each particular value.
The structure of a STEP-File, i.e. the grammar and the logic behind how the file is organized, is described in the standard ISO 10303-21.
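As a rough illustration of what each line in the DATA section carries, the instances have the form "#id = ENTITY_NAME(parameter, parameter, ...);", and a small sketch that picks such lines apart could look like this (deliberately simplified: it ignores instances that span several lines and does not follow the full ISO 10303-21 grammar):

import scala.io.Source

object StepLines {
  // Matches single-line instances such as #10=CARTESIAN_POINT('',(0.,0.,0.));
  private val Instance = """#(\d+)\s*=\s*([A-Z0-9_]+)\s*\((.*)\);""".r

  def main(args: Array[String]): Unit = {
    val source = Source.fromFile(args(0))
    try {
      source.getLines().map(_.trim).foreach {
        case Instance(id, entity, params) =>
          println(s"#$id is a $entity with parameters ($params)")
        case _ => () // header lines, section markers, multi-line instances
      }
    } finally source.close()
  }
}

What each parameter position means for a given entity is defined by that entity's EXPRESS schema, which is what the AP documentation below covers.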
ISO 10303, or STEP, is divided into Application Protocols (APs). Each AP defines a schema written in EXPRESS. The schemas are available on the Internet: the CAx-IF provides some, and STEPtools has some good HTML documentation.
The reference of the AP schemas is hosted on stepmod.

How can I build a generic dataset-handling Perl library?

I want to build a generic Perl module for handling and analysing biomedical character-separated datasets, one which can, most certainly, be used on any kind of dataset that contains a mixture of categorical (A, B, C, ...), continuous (1.2, 3, 881, ...), and identifier (XXX1, XXX2, ...) variables. The plan is to have people initialize the module and then use some arguments to point to the data file(s), the place where the analysis reports should be placed, and the structure of the data.
By structure of the data I mean which variable is in which place and its name/type. And this is where I need some enlightenment: I am baffled as to how to do this in a clean way. Obviously, having people create a simple schema file, be it XML or some other format, would be the cleanest, but maybe not all people enjoy doing something like this.
The solutions I can think of are:
Create a configuration file in XML or similar and with a prespecified format.
Pass the information during initialization of the module.
Use the first row of the data as headers and try to guess types (ouch)
Surely there must be a "canonical" way of doing this that is also usable and efficient.
This doesn't answer your question directly, but have you checked CPAN? It might have the module you need already. If not, it might have similar modules -- related either to biomedical data or simply to delimited data handling -- that you can mine for good ideas, both concerning formats for metadata and your module's API.
Any of the approaches you've listed could make sense. It all depends on how complex the data structures and their definitions are. What will make something like this useful to people is whether it saves them time and effort. So, your decision will have to be based on which approach best satisfies the need to make:
use of the module easy
reuse of data definitions easy
the data definition language sufficiently expressive to describe all known use cases
the data definition language sufficiently simple that an infrequent user can spend minimal time with the docs before getting real work done.
For example, if I just need to enter the names of the columns and their types (and there are only 4 well defined types), doing this each time in a script isn't too bad. Unless I have 350 columns to deal with in every file.
However, if large, complicated structure definitions are common, then a more modular reuse oriented approach is better.
If your data description language is difficult to work with, you can mitigate the issue a bit by providing a configuration tool that allows one to create and edit data schemes.
rx might be worth looking at, as well as the Data::Rx module on the CPAN. It provides schema checking for JSON, but there is nothing inherent in the model that makes it JSON-only.