Does metadata exist at the machine code level in LLVM?

I added my new custom attribute and I can see it on the memory operands in LLVM IR. Is there any way to do the same at the machine code level as well?

No, there isn't.
Metadata is an LLVM IR concept. It can be consumed by passes and then used to generate something at the machine code level, but you have to do that yourself (or use metadata that is already consumed by an existing pass, such as debug information).
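For illustration, here is a minimal sketch of an IR-level pass that consumes a custom metadata kind; the pass name and the "my.custom.attr" kind are assumptions standing in for your own attribute. Generic metadata kinds are not carried across instruction selection automatically, so a pass like this (or a target hook) is where you would translate the annotation into whatever the backend actually needs:

```cpp
#include "llvm/IR/Function.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Metadata.h"
#include "llvm/Pass.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

namespace {
// Legacy-pass-manager sketch: walk every instruction and look for our
// hypothetical "my.custom.attr" metadata kind.
struct CustomAttrLowering : public FunctionPass {
  static char ID;
  CustomAttrLowering() : FunctionPass(ID) {}

  bool runOnFunction(Function &F) override {
    for (BasicBlock &BB : F)
      for (Instruction &I : BB)
        if (MDNode *MD = I.getMetadata("my.custom.attr")) {
          (void)MD;
          // This is the last convenient point to turn the annotation into
          // something the machine level understands (a target flag, a
          // pseudo instruction, a symbol, ...). Here we only report it.
          errs() << "found my.custom.attr on: " << I << "\n";
        }
    return false; // nothing is modified in this sketch
  }
};
} // namespace

char CustomAttrLowering::ID = 0;
static RegisterPass<CustomAttrLowering>
    X("custom-attr-lowering", "Consume my.custom.attr metadata", false, false);
```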

Related

How do you access the storage map for a cache class programmatically?

I'm trying to create a method that will automatically create something I can save and use later, which will let me automate data migration from one version of a class to the next.
When I compile a normal persistent class, a "storage" definition gets created, and I would like to be able to archive some code that would let me recreate this "old storage" so I could access an "older version" from a global and map its values into the current mapping of the class. I have objectgenerator methods that save a version with every record and detect whether the "current version" of the class code differs from the version saved in the data record itself, but I'm not sure how to retain what the "old" version actually looked like so I can auto-migrate the data from the old to the new.
To do this, I'm thinking that if I can read what the current storage is at compile time, I should be able to save that value elsewhere and create a durable "version reader" so I can migrate the data forward without the programmer having to do the work by hand.
Does this seem like a reasonable approach? And if so, can anyone point me to where the storage values live at compile time, so I can save them?
(Or should I be looking at cloning some part of the generated %Save() method chain at compile time into a version-specific save attached to "something else"? Not sure what, but 'something'.)
This question and answer originally appeared in the InterSystems Developer Community https://community.intersystems.com/post/how-do-you-access-storage-map-cache-class-programmatically
In a database, your most important asset is your data -- business logic and code come a close second. There should at least be someone paying attention to the data slots for sanity, even though in most cases there is nothing to do. At the very least, any developer adding or modifying properties should "diff" the storage definition when submitting code to the source repository. The class compiler by default will handle everything correctly.
1. The simplest and best way to move from one version of a class to the next is to keep the same storage definition. The class compiler will create slots for any new properties and is completely non-destructive and conservative. When I retire properties, I generally manually rename the storage slot and give it a prefix such as zzz. This allows me to explicitly clean up and reuse these slots later if I so choose.
2. For the type of before/after compile triggers you are looking for, I would use the RemoveProjection/CreateProjection methods of a projection class which provide "a way to customize what happens when a class is compiled or removed".
3. You can use %Dictionary.CompiledStorage and its related classes (Data) to get full access to the compiled storage definition of a class.
3a. An alternate approach is to use XSLT or XML parsing to read the storage definition from the exported class definition. I would only use this alternative if you need to capture details for separate source control purposes.
The simplest storage definitions use $ListBuild slots and a single global node. When subclasses and collection properties are used, the default storage definition gets more complicated -- that is where you will really want to stick to the simple approach (item 1).

What is the correct failure mode for Modbus (0x10) Write Multiple Registers (if any)?

How should a Modbus device fail if an error is encountered after a Write Multiple Registers (0x10) request has been validated but before all writes have been completed? The specification (section 6.12) seems vague on this point, and web searches have not proven fruitful. I see three possibilities:
1. Attempt to write each register in turn. If an error occurs, immediately give up and send an exception.
2. Treat the request as an atomic transaction where either all data is written or none of it.
3. Attempt to write all data, sending an exception at the end if any failure was observed.
Is there a conventional or proper way to fail here or is the way a device fails implementation specific requiring only that it be documented in the manual?
According to Figure 22 (Write Multiple Registers state diagram) on page 31 of the specification, exception code 4 should be returned, but the protocol itself does not specify how the application layer should behave.
The way your device fails must take its constraints and possible side effects into account. For example, if you accept control commands (a set-point value, or setting multiple outputs assigned to different registers), it is good practice never to assign such values directly from the registers, but to validate them first before copying them into the internal program variables that are used to drive I/Os or influence any control actions. Such an approach corresponds to option 2 on your list.
You can implement whatever behavior works best, provided it is documented. If your slave is under the control of a master capable of implementing part of your application layer, you may also consider supporting diagnostic or error-information requests from the slave in case of failure, but such functionality is not specified for exception 0x04 to function code 0x10.
Since the master can't know what was successfully written and what wasn't in such a partially failed write operation, the most appropriate approach is "all-or-nothing", just as you mention in #2. It's the hardest path to implement, but it's the only approach where the master won't end up in an ambiguous state on such an error.
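As a rough illustration of that all-or-nothing behavior, here is a sketch in C++. The validate_register()/commit_register() hooks are hypothetical placeholders for your device's application layer, not anything from the Modbus specification:

```cpp
#include <cstdint>
#include <vector>

// Modbus exception codes relevant to this sketch.
enum class ModbusStatus : uint8_t {
  Ok                 = 0x00,
  IllegalDataValue   = 0x03,
  SlaveDeviceFailure = 0x04,
};

// Hypothetical application-layer hooks (placeholders).
static bool validate_register(uint16_t /*addr*/, uint16_t value) {
  return value <= 1000;  // e.g. a range check, no side effects
}
static bool commit_register(uint16_t /*addr*/, uint16_t /*value*/) {
  return true;           // e.g. write to the output image / hardware
}

ModbusStatus write_multiple_registers(uint16_t start,
                                      const std::vector<uint16_t> &values) {
  // Phase 1: validate every value before touching any output, so a bad
  // request cannot leave the device half-updated.
  for (size_t i = 0; i < values.size(); ++i)
    if (!validate_register(static_cast<uint16_t>(start + i), values[i]))
      return ModbusStatus::IllegalDataValue;

  // Phase 2: commit. If a commit itself fails (hardware error), report
  // exception 0x04; phase 1 should already have caught everything that
  // can be caught without side effects.
  for (size_t i = 0; i < values.size(); ++i)
    if (!commit_register(static_cast<uint16_t>(start + i), values[i]))
      return ModbusStatus::SlaveDeviceFailure;

  return ModbusStatus::Ok;
}
```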

major and minor device numbers

I am reading the Linux Device Drivers book by Rubini, Corbet and Kroah-Hartman. I have a doubt regarding dynamic allocation of major and minor device numbers. They say:
The disadvantage of dynamic assignment is that you can't create the device nodes in advance, because the major number assigned to your module will vary. For normal use of the driver, this is hardly a problem, because once the number has been assigned, you can read it from /proc/devices.
1) What does "in advance" mean here?
2) Why must the major and minor numbers be read from /proc/devices, when the alloc_chrdev_region function provides them in the argument passed to it? Can't that argument be used directly?
Thanks in advance
1) Dynamic assignment would mean that you cannot create device nodes before the driver is loaded, for example having them as a static part of the filesystem when the system boots. Instead you could only create them once you discover what their major/minor numbers are this time.
2) The driver may know what its major and minor numbers are, but the device nodes should be created by something in user space. They are suggesting that if this information cannot be given in advance to both the kernel driver and user space, then user space will have to discover it at runtime from something such as /proc/devices.
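For illustration, here is a small user-space sketch of that discovery step; the driver name "mydrv" and the node path /dev/mydrv0 are placeholders for your own driver. It reads the dynamically assigned major number from /proc/devices and then creates the node, which is what you would otherwise do by hand with mknod:

```cpp
#include <cstdio>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <sys/stat.h>
#include <sys/sysmacros.h>

int main() {
  std::ifstream devices("/proc/devices");
  std::string line;
  int major = -1;

  // /proc/devices lists "major name" pairs; header lines such as
  // "Character devices:" simply fail the integer extraction below.
  while (std::getline(devices, line)) {
    std::istringstream iss(line);
    int m;
    std::string name;
    if (iss >> m >> name && name == "mydrv") { major = m; break; }
  }
  if (major < 0) {
    std::cerr << "mydrv not found in /proc/devices (driver not loaded?)\n";
    return 1;
  }

  // Create the character-device node for minor 0 (needs root).
  if (mknod("/dev/mydrv0", S_IFCHR | 0600, makedev(major, 0)) != 0) {
    std::perror("mknod");
    return 1;
  }
  return 0;
}
```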
When we assign a major number dynamically to a device driver, we do not know the major number until the alloc_chrdev_region function finishes execution; in other words, you don't get to know the major number before you insert the module into the kernel (with insmod). So you cannot create a node for your driver (with mknod) until you have loaded the driver; this is what the author means by "in advance".
We read /proc/devices to look up a driver's major number when a different program needs it.

How to tag a scientific data processing tool to ensure repeatability

We develop a data processing tool to extract scientific results out of a given set of raw data. In data science it is very important that you can re-obtain your results and repeat the calculations that led to a result set.
Since the tool is evolving, we need a way to find out which revision/build of our tool generated a given result set, and how to find the corresponding source from which the tool was built.
The tool is written in C++ and Python, gluing the C++ parts together using Boost::Python. We use CMake as the build system, generating Makefiles for Linux. Currently the project is stored in a Subversion repo, but some of us already use git or hg, and we are planning to migrate the whole project to one of them in the very near future.
What are the best practices in a scenario like this to get a unique mapping between source code, binary and result set?
Ideas we are already discussing:
Somehow injecting the global revision number
Using a build number generator
Storing the whole source code inside the executable itself
This is a problem I spend a fair amount of time working on. Let me add a few thoughts to what @VonC has already written.
I think that the topic of software configuration management is well understood and often carefully practiced in commercial environments. However, this general approach is often lacking in scientific data processing environments, many of which either remain in, or have grown out of, academia. If you are in such a working environment, there are readily available sources of information and advice, and lots of tools to help. I won't expand on this further.
I don't think that your suggestion of including the whole source code in an executable is necessary, even if it is feasible. Indeed, if you get SCM right, then one of the essential tests that you have done so (and continue to do so) is your ability to rebuild 'old' executables on demand. You should also be able to determine which revision of the sources was used in each executable and version. These ought to make including the source code in an executable unnecessary.
The topic of tying result sets in to computations is also, as you say, essential. Here are some of the components of the solution that we are building:
We are moving away from the traditional unstructured text file that is characteristic of the output of a lot of scientific programs, towards structured files; in our case we're looking at HDF5 and XML, in which both the data of interest and the metadata are stored. The metadata includes the identification of the program (and version) which was used to produce the results, the identification of the input data sets, job parameters and a bunch of other stuff.
We looked at using a DBMS to store our results; we'd like to go this way but we don't have the resources to do it this year, probably not next either. But businesses use DBMSs for a variety of reasons, and one of the reasons is their ability to roll-back, to provide an audit trail, that sort of thing.
We're also looking closely at which result sets need to be stored. A nice approach would be only ever to store original data sets captured from our field sensors. Unfortunately some of our computations take 1000s of CPU-hours to produce so it is infeasible to reproduce them ab-initio on demand. However, we will be storing far fewer intermediate data sets in future than we have in the past.
We are also making it much harder (I'd like to think impossible but am not sure we are there yet) for users to edit result sets directly. Once someone does that all the provenance information in the world is wrong and useless.
Finally, if you want to read more about the topic, try Googling for 'scientific workflow', 'data provenance' and similar topics.
EDIT: It's not clear from what I wrote above, but we have modified our programs so that they contain their own identification (we use Subversion's keyword capabilities for this with an extension or two of our own) and write this into any output that they produce.
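A tiny sketch of what that can look like on the output side; TOOL_REVISION is an assumed macro that the build system would inject (for example from a Subversion keyword or a configured header), and the field names are only illustrative:

```cpp
#include <fstream>
#include <string>

// Normally injected at build time, e.g. -DTOOL_REVISION="\"r1234\"".
#ifndef TOOL_REVISION
#define TOOL_REVISION "unknown"
#endif

// Write a small provenance block at the top of every result file:
// which build produced it and from which input data set.
void write_result_header(std::ofstream &out, const std::string &input_id) {
  out << "# tool-revision: " << TOOL_REVISION << "\n";
  out << "# input-dataset: " << input_id << "\n";
}
```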
You need to consider git submodules or hg subrepos.
The best practice in this scenario is to have a parent repo which will reference:
the sources of the tool
the result set generated from that tool
ideally the C++ compiler (it won't evolve every day)
ideally the Python distribution (it won't evolve every day)
Each of those is a component, that is, an independent repository (Git or Mercurial).
One precise revision of each component will be referenced by the parent repository.
The whole process is representative of a component-based approach, and is key to using SCM (here, Software Configuration Management) to its fullest.

In salesforce.com can you have multivalued attributes?

I am developing a Novell Identity Manager driver for Salesforce.com, and am trying to understand the Salesforce.com platform better.
I have had really good success to date. I can read pretty much arbitrary object classes out of SFDC, and create eDirectory objects for them, and what not. This is all done and working nicely. (Publisher Channel). Once I got Query events mapped out, most everything started working in the Publisher Channel.
I am now working on sending events back to SFDC (Subscriber channel) when changes occur in eDirectory.
I am using the upsert() function in the SOAP API, and with Novell Identity Manager you basically build the SOAP doc and can see the results as you build it. (You can do it in XSLT or you can use the various allowed tokens to build the document in DirXML Script. I am using DirXML Script, which has been working well so far.)
The upshot of that is that I can build the SOAP document and see it, to be sure I get it right, which is usually different from the Java/C++ approach that the sample code usually provides. Much more visual this way.
There are several things about upsert() that I do not entirely understand. I know how to blank a value, should I get that sort of event. Inside the <urn:sObjects> node, add a node like (assuming you get your namespaces declared already):
<urn1:fieldsToNull>FieldName</urn1:fieldsToNull>
I know how to add a value (AttrValue) to the attribute (FieldName), add a node like:
<FieldName>AttrValue</FieldName>
All this works and is pretty straight forward.
The question I have is: can a value in SFDC be multi-valued? In eDirectory, a change to a multi-valued attribute can happen in two ways:
All values can be removed, and the new set re-added.
The single value removed can be sent as that sort of event (remove-value) or many values can be removed in one operation.
Looking at SFDC, I only ever see multi-picklist attributes, which seem to be stored in a single entry, delimited by ':' or ';'. Is there another kind of multi-valued attribute managed differently in SFDC? And if so, how would one manipulate it via the SOAP API?
I still have to decide if I want to map those multi-picklists to a single string, or a multi valued attribute of strings. First way is easier, second way is more useful... Hmmm... Choices...
Some references:
I have been using the page Sample SOAP messages to understand what the docs should look like.
Apex Explorer is a kicking tool for browsing the database and testing queries. Much like DBVisualizer does for JDBC connected databases. This would have been so much harder without it!
SoapUi is also required, and a lovely tool!
As far as I know there's no multi-valued field other than multi-select picklists (and they map to a semicolon-separated string). Generally the platform encourages you to create a proper relationship with another (possibly new, custom) table if you need to have multiple values associated with your data.
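If you do decide to flatten a multi-valued eDirectory attribute into a single multi-select picklist field, the mapping is just a join/split on the semicolon. A minimal sketch (the helper names are mine, not part of any SFDC API):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Flatten multiple values into the semicolon-separated form a
// multi-select picklist stores ("A;B;C").
std::string join_picklist(const std::vector<std::string> &values) {
  std::string out;
  for (const auto &v : values) {
    if (!out.empty()) out += ';';
    out += v;
  }
  return out;
}

// Split a picklist field back into individual values.
std::vector<std::string> split_picklist(const std::string &field) {
  std::vector<std::string> values;
  std::stringstream ss(field);
  std::string item;
  while (std::getline(ss, item, ';'))
    if (!item.empty()) values.push_back(item);
  return values;
}
```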
Only other "unusual" thing I can think of is how the OwnerId field on certain objects (Case, Lead, maybe something else) can be used to point to User or Queue record. Looks weird when you are used to foreign key relationships from traditional databases. But this is not identical with what you're asking as there will be only one value at a time.
Of course you might be surpised sometimes with values you'll see in the database depending on the viewing user's locale (stuff like System Administrator profile becoming Systeembeheerder in Dutch). But this will be still a single value, translated on the fly just before the query results are sent back to you.
When I had to perform SOAP integration with SFDC, I've always used WSDL files and most of the time was fine with Java code generated out of them with Apache Axis. Hand-crafting the SOAP message yourself seems... wow, hardcore a bit. Are you sure you prefer visualisation of XML over the creation of classes, exceptions and all this stuff ready for use with one of several out-of-the-box integration methods? If they'll ever change the WSDL I need just to regenerate the classes from it; whereas changes to your SOAP message creation library might be painful...