Want to know what all the terms in the WinDbg output of "!analyze -v" indicate? - windbg

What does each key value indicate? And which terms help me understand how WinDbg buckets the crashes, i.e. how does it broadly classify crashes?

Help me to understand the WinDbg bucket.
IMHO, the idea of buckets was introduced for WER (Windows Error Reporting). WER was used by Microsoft but was also available to other companies. WER included a service where you could log in to a Microsoft website and get an overview of your application's crashes.
Of course, people were not interested in a flat list of crashes; they wanted to know how many crashes of the same type occurred. That way Microsoft and other companies could focus first on fixing the bugs which affected the most users.
The bucket, as the name suggests, is a container in which similar problems are grouped. The bucket ID is generated in two phases: a labeling process done on the client side and a classifying process done on the server side.
What you get from !analyze is the classification, so via WinDbg you basically have access to the functionality that Microsoft used on the server side to provide the WER services.
These WER services are not available any more. They have been replaced by something else, but I have forgotten the name.
how does it broadly classify the crashes?
An ideal bucketing algorithm would create a new bucket for each bug. So the number of buckets is limited only by the number of bugs you can code into your application.
The !analyze command implements more than 500 different heuristics. Combinations of these can create more than 25,000,000 different buckets.
Buckets can differ because of
stack
modules
function name
function offset
corruptions (heap corruption, image corruption)
detected malware
known outdated programs or libraries
known defective hardware
exception codes
exception subcodes
...
The result of that bucketing process is this line of output:
FAILURE_BUCKET_ID: BREAKPOINT_80000003_ntdll.dll!LdrpDoDebuggerBreak
which is probably somehow equivalent to this hash:
FAILURE_ID_HASH: {06f54d4d-201f-7f5c-0224-0b1f2e1e15a5}
I have read some of your previous questions in the windbg tag and I get the impression that you want to use the bucket ID to display some meaningful information to humans.
Actually, the WER system provided such a feature. It worked like this: a developer analyzes the crashes in a bucket and finds out what to do (e.g. update a driver, install a newer version of the application, etc.). He then assigns a text to that bucket ID. Customers who experienced the same crash again were redirected to a website at Microsoft containing the text written by the developer.
However, note that there is no magic involved that would translate a crash into something human readable. It's a developer doing hard work and then creating a mapping from the bucket ID to some text that is displayed.
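As a minimal sketch of that last idea in Python: a lookup from bucket ID to human-readable advice can be as simple as a dictionary. The second bucket ID and both advice texts below are hypothetical, not from any real WER database.

```python
# Hypothetical mapping from WER/!analyze bucket IDs to developer-written advice.
# The second ID and both texts are made up for illustration only.
BUCKET_ADVICE = {
    "BREAKPOINT_80000003_ntdll.dll!LdrpDoDebuggerBreak":
        "A debugger breakpoint was hit; this is expected when a debugger attaches.",
    "HEAP_CORRUPTION_app.exe!ProcessOrder":
        "Known heap corruption bug; please update to version 2.3 or later.",
}

def advice_for(bucket_id: str) -> str:
    """Return the developer-written text for a bucket, if any."""
    return BUCKET_ADVICE.get(bucket_id, "No analysis available yet for this crash type.")

print(advice_for("BREAKPOINT_80000003_ntdll.dll!LdrpDoDebuggerBreak"))
```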
IMHO, the latter can easily be achieved. However, any new bug will require an analysis first. But, who knows, maybe we can train an AI that does better at this.
For more on buckets etc., please read the Microsoft paper Debugging in the (Very) Large: Ten Years of Implementation and Experience.

Related

What happens to performance.mark entries when the resource buffer gets full

I am building a large private (i.e. used behind a firewall) PWA and wondering how to improve the diagnostics if/when my users hit issues. I already have an error manager which uses navigator.sendBeacon to log the error on the server, but that lacks detailed info about what led up to that point.
A thought I had was to liberally mark the code with performance.mark() statements and, on an error, dump the performance buffer to the server. It would give me an ordered list of recent activity.
However, it only makes sense to do this if the browser throws away the oldest entry to make way for the new one when the internal buffer is full, and none of the documentation I found with a Google search mentions this. I am aware I can get an event when the buffer is full and could use that to copy and clear it, but I can find no words on what happens if I ignore the event. Neither can I find a typical size. I also don't want entries to keep accumulating and fill up the entire computer's memory.
Can anyone give me a definitive answer?
Edit: The more I look into this, the more confused I become. It appears that you can control the size of the resourceTimingBuffer, but "resource" performance entries relate to fetch, not to Performance.mark(). I can't find any statement on limitations.
There are no meaningful limitations I could find. I ran a test that generated more than 4,000 marks; they were all there, and memory usage did not increase in any measurable way.

Looking for first record layout of z/OS runnables starting with "IEWPLMH "

This feels something like an archeology expedition, but I have been unable to find the record format of the first record of seemingly all executable load modules on z/OS systems. The record always starts with IEWPLMH, even when producing a GOFF-format runnable (which I have). Does anyone have any information on this or a link to it?
The format of load modules is documented in the Load Module Formats section of the z/OS MVS Program Management: Advanced Facilities manual.
But I suspect you are looking for the format of a program object, which is not documented, and, last I knew, IBM had stated they would not document (at least publicly for the likes of us).
There are decades of history behind this. IBM found themselves painted into a corner because customers had written code that depended on the format of load modules not changing. As of 2011, there were 8 different formats/subformats of program object and that number has no doubt grown. By not documenting (for customers) the format of a program object, IBM felt they had freed themselves to make format changes (adding features customers wanted) as they saw fit.
You may be able to get the information you want using the Binder's API or AMBLIST.
The use of the IEWBINDD facility is definitely the way to go. For USS programs, the -Wc,DLL option is required when compiling the source, and -Wl,DYNAM=DLL does the trick when linking. The example program in the appendix of the z/OS MVS Program Management: Advanced Facilities manual was very helpful.

syslog - log line classifications

A very generic question, asked from a programmer's point of view, with the operational aspects of the process (program) in mind.
Is there any sort of best practice / guide for classifying messages, particularly in the context of a SaaS / multi-tenant (server) software environment which generates errors and warnings due to user actions or misconfiguration? Due to the nature of the software, most modules that I am having to deal with are stateless; i.e. when an error happens due to a user error, it is quite hard to distinguish it from an operational error (like a network misconfiguration, etc.).
What I want to know from some of you experienced folks is: what is the sensible logic to employ here in order to make it easy for the operations people to classify these messages and identify problems?
Just three aspects from an admin and log analysis/classification perspective:
Make the tag field/program name configurable. Then one can configure multiple instances to use log tags like app/user_1, app/user_2 etc., allowing for fast and simple filters on the syslog level.
Structure your messages from left to right, so one can filter different categories of log lines with simple search patterns or regular expressions, e.g. config error - cannot parse line 123 or runtime warning - lost connection to DB xyz (see the sketch after this list).
For very structured logs you might also take a look at the 'structured data' field in the syslog protocol (RFC 5424). So far it is rarely used and has little tool support, but it allows for application log messages with namespaces and very clear key-value attributes.
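A minimal sketch of the first two points in Python, assuming a local syslog daemon listening on /dev/log (Linux); the tag app/user_1 and the message wording are purely illustrative:

```python
import logging
import logging.handlers

# Tag is configurable per instance, e.g. "app/user_1", "app/user_2" (illustrative names).
TAG = "app/user_1"

handler = logging.handlers.SysLogHandler(address="/dev/log")  # local syslog daemon (Linux)
# Prefix every line with the tag so syslog-level filters stay simple.
handler.setFormatter(logging.Formatter(f"{TAG}: %(message)s"))

log = logging.getLogger("app")
log.addHandler(handler)
log.setLevel(logging.INFO)

# Messages structured left to right: category first, then detail,
# so a simple prefix filter ("config error", "runtime warning") is enough.
log.error("config error - cannot parse line 123")
log.warning("runtime warning - lost connection to DB xyz")
```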
Identify the servers and server types (name, IP address, etc.).
Classify by severity, and make sure all the clocks are in sync so that messages are ordered correctly.
Include a message/error code so you can filter and create rules in your monitoring tool.
Include the module name (useful if several modules run on one server).
Include a category for general services like networking, etc.
I guess you will gather the logs from the different machines, via their syslog daemons, on a central machine in charge of supervision/monitoring.
Most *nix processes log to syslog (or at least should) using a semi-standard format: "Month Day 24H-Time host process_name[pid]: message". Syslog incorporates ways to indicate the message's severity; use them (though keep in mind that the severity is from the system's perspective, not the application's).
If the message is for a debugging problem, it's usually "Function_Name File_Name Line_No Error_Code Error_Desc"; otherwise the format of the message is entirely program dependent.
For multi-tenant systems it's pretty common for the "message" part to start with some form of tenant identification, followed by the actual log message.
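A rough sketch of those conventions with Python's logging module; the tenant name, error code and format details are illustrative, not a standard:

```python
import logging

# "Function_Name File_Name Line_No" style detail, host and pid in the prefix.
logging.basicConfig(
    format="%(asctime)s myhost %(name)s[%(process)d]: "
           "%(funcName)s %(filename)s %(lineno)d %(message)s",
    level=logging.INFO,
)
log = logging.getLogger("billing_service")

def charge(tenant: str, amount: float) -> None:
    # Tenant identification first, then the actual log message.
    log.error("[tenant=%s] E1042 charge failed: amount=%.2f", tenant, amount)

charge("acme_corp", 19.99)
```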

How to tag a scientific data processing tool to ensure repeatability

We develop a data processing tool to extract scientific results out of a given set of raw data. In data science it is very important that you can re-obtain your results and repeat the calculations that led to a result set.
Since the tool is evolving, we need a way to find out which revision/build of our tool generated a given result set and how to find the corresponding source from which the tool was built.
The tool is written in C++ and Python, gluing together the C++ parts using Boost::Python. We use CMake as a build system, generating Makefiles for Linux. Currently the project is stored in a Subversion repo, but some of us already use Git or Mercurial, and we are planning to migrate the whole project to one of them in the very near future.
What are the best practices in a scenario like this to get a unique mapping between source code, binary and result set?
Ideas we are already discussing:
Somehow injecting the global revision number
Using a build number generator
Storing the whole source code inside the executable itself
This is a problem I spend a fair amount of time working on. To what VonC has already written, let me add a few thoughts.
I think that the topic of software configuration management is well understood and often carefully practiced in commercial environments. This general approach is often lacking in scientific data processing environments, many of which either remain in, or have grown out of, academia. But even if you are in such a working environment, there are readily available sources of information and advice, and plenty of tools to help. I won't expand on this further.
I don't think that your suggestion of including the whole source code in an executable is necessary, even if feasible. Indeed, if you get SCM right, then one of the essential tests that you have done so, and continue to do so, is your ability to rebuild 'old' executables on demand. You should also be able to determine which revision of the sources was used in each executable and version. These ought to make including the source code in an executable unnecessary.
The topic of tying result sets in to computations is also, as you say, essential. Here are some of the components of the solution that we are building:
We are moving away from the traditional unstructured text file that is characteristic of the output of a lot of scientific programs, towards structured files; in our case we're looking at HDF5 and XML, in which both the data of interest and the meta-data are stored. The meta-data includes the identification of the program (and version) which was used to produce the results, the identification of the input data sets, job parameters and a bunch of other stuff (a rough sketch follows this list).
We looked at using a DBMS to store our results; we'd like to go this way but we don't have the resources to do it this year, probably not next either. But businesses use DBMSs for a variety of reasons, and one of the reasons is their ability to roll-back, to provide an audit trail, that sort of thing.
We're also looking closely at which result sets need to be stored. A nice approach would be only ever to store the original data sets captured from our field sensors. Unfortunately some of our results take thousands of CPU-hours to produce, so it is infeasible to reproduce them ab initio on demand. However, we will be storing far fewer intermediate data sets in future than we have in the past.
We are also making it much harder (I'd like to think impossible but am not sure we are there yet) for users to edit result sets directly. Once someone does that all the provenance information in the world is wrong and useless.
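As a rough sketch of the self-describing HDF5 output mentioned above, using h5py and NumPy; the attribute names and values are made up for illustration, not our actual schema:

```python
import h5py
import numpy as np

results = np.random.rand(1000)  # stand-in for the real result set

with h5py.File("result_set.h5", "w") as f:
    f.create_dataset("results", data=results)
    # Provenance meta-data stored alongside the data of interest.
    # Names and values are illustrative only.
    f.attrs["program"] = "our_tool"
    f.attrs["program_version"] = "r1234"              # e.g. injected revision number
    f.attrs["input_data_sets"] = "survey_2010_site_A.raw"
    f.attrs["job_parameters"] = "threshold=0.7;iterations=50"
```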
Finally, if you want to read more about the topic, try Googling for 'scientific workflow', 'data provenance' and similar topics.
EDIT: It's not clear from what I wrote above, but we have modified our programs so that they contain their own identification (we use Subversion's keyword capabilities for this with an extension or two of our own) and write this into any output that they produce.
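A minimal sketch of what that can look like in Python, assuming the file has the svn:keywords property set so Subversion expands $Revision$ on checkout; the surrounding names are illustrative, not the answerer's actual code:

```python
# Subversion expands this keyword when the file has the svn:keywords property set,
# e.g. "svn propset svn:keywords Revision version.py".
__revision__ = "$Revision$"   # becomes e.g. "$Revision: 1234 $" after expansion

def version_banner() -> str:
    """Identification string written into every output file the program produces."""
    return f"produced by our_tool, source revision {__revision__}"

if __name__ == "__main__":
    print(version_banner())
```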
You need to consider Git submodules or hg subrepos.
The best practice in this scenario is to have a parent repo which will reference:
the sources of the tool
the result set generated from that tool
ideally the C++ compiler (won't evolve every day)
ideally the Python distribution (won't evolve every day)
Each of those is a component, that is, an independent repository (Git or Mercurial).
One precise revision of each component will be referenced by the parent repository.
The whole process is representative of a component-based approach, and is key to using SCM (here Software Configuration Management) to its fullest.

machine learning and code generator from strings

The problem: Given a set of hand-categorized strings (or a set of ordered vectors of strings), generate a categorization function to categorize more input. In my case, that data (or most of it) is not natural language.
The question: are there any tools out there that will do that? I'm thinking of some kind of reasonably polished, download-install-and-go kind of thing, as opposed to some library or a brittle academic program.
(Please don't get stuck on details as the real details would restrict answers to less generally useful responses AND are under NDA.)
As an example of what I'm looking at: the input I want to filter is computer-generated status strings pulled from logs. Error messages (as an example) would be filtered based on who needs to be informed or what action needs to be taken.
Doing Things Manually
If the error messages are being generated automatically and the list of exceptions behind the messages is not terribly large, you might just want to have a table that directly maps each error message type to the people who need to be notified.
This should make it easy to keep track of exactly who/which-groups will be getting what types of messages and to update the routing of messages should you decide that some of the messages are being misdirected.
Typically, a small fraction of the types of errors make up a large fraction of error reports. For example, Microsoft noticed that 80% of crashes were caused by 20% of the bugs in their software. So, to get something useful, you wouldn't even need to start with a complete table covering every type of error message. Instead, you could start with just a list that maps the most common errors to the right person and routes everything else to a person for manual routing. Each time an error is routed manually, you could then add an entry to the routing table so that errors of that type are handled automatically in the future.
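A minimal sketch of such a routing table in Python; the message prefixes and addresses are hypothetical:

```python
# Direct mapping from error-message type to the people to notify.
# Prefixes and addresses are made up for illustration.
ROUTING = {
    "disk quota exceeded": ["storage-team@example.com"],
    "login failed":        ["security-team@example.com"],
    "payment declined":    ["billing-team@example.com"],
}
FALLBACK = ["triage@example.com"]   # manual routing; add new table entries from here

def route(message: str) -> list[str]:
    for prefix, recipients in ROUTING.items():
        if message.lower().startswith(prefix):
            return recipients
    return FALLBACK

print(route("Disk quota exceeded on /srv/data"))   # -> storage team
print(route("Unrecognized widget error 42"))       # -> manual triage
```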
Document Classification
Unless the error messages are being editorialized by people who submit them and you want to use this information when routing them, I wouldn't recommend treating this as a document classification task. However, if this is what you want to do, here's a list of reasonably good packages for document classification, organized by programming language:
Python - To do this using the Python-based Natural Language Toolkit (NLTK), see the Document Classification section in the freely available NLTK book (a minimal sketch follows this list).
Ruby - If Ruby is more of your thing, you can use the Classifier gem. Here's sample code that detects whether Family Guy quotes are funny or not-funny.
C# - C# programmers can use nBayes. The project's home page has sample code for a simple spam/not-spam classifier.
Java - Java folks have Classifier4J, Weka, Lucene Mahout, and, as adi92 mentioned, Mallet.
Learning Rules with Weka - If rules are what you want, Weka might be of particular interest, since it includes a rule set based learner. You'll find a tutorial on using Weka for text categorization here.
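For the Python/NLTK route, here is a minimal sketch of the kind of classifier described in that chapter, trained on a handful of made-up log lines (requires the nltk package; the routing labels are hypothetical):

```python
import nltk

# Tiny, made-up training set: (log line, who should handle it).
TRAINING = [
    ("disk quota exceeded on volume 3", "ops"),
    ("login failed for user alice", "security"),
    ("payment gateway timeout", "billing"),
    ("disk failure predicted on node 7", "ops"),
    ("invalid password attempt limit reached", "security"),
]

def features(line: str) -> dict:
    # Simple bag-of-words presence features, as in the NLTK book's examples.
    return {f"contains({word})": True for word in line.lower().split()}

train_set = [(features(line), label) for line, label in TRAINING]
classifier = nltk.NaiveBayesClassifier.train(train_set)

print(classifier.classify(features("disk quota warning on volume 9")))  # likely "ops"
```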
Mallet has a bunch of classifiers which you can train and deploy entirely from the command line.
Weka is nice too because it has a huge number of classifiers and preprocessors for you to play with.
Have you tried spam or email filters? By using text files that have been marked with appropriate categories, you should be able to categorize further text input. That's what those programs do, anyway, but instead of labeling your outputs as 'spam' and 'not spam', you could use other categories.
You could also try something involving AdaBoost for a more hands-on approach to rolling your own. This library from Google looks promising, but probably doesn't meet your ready-to-deploy requirements.
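If you'd rather roll your own with off-the-shelf parts, here's a rough sketch using scikit-learn's AdaBoost implementation rather than the Google library mentioned above (scikit-learn is my substitution; the sample strings and labels are made up):

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Hand-categorized strings (made up) and their categories.
texts  = ["disk quota exceeded", "login failed for admin",
          "payment declined", "disk failure on node 2"]
labels = ["ops", "security", "billing", "ops"]

# Bag-of-words features feeding a boosted ensemble of shallow decision trees.
model = make_pipeline(CountVectorizer(), AdaBoostClassifier(n_estimators=50))
model.fit(texts, labels)

print(model.predict(["disk quota warning"]))   # likely ['ops']
```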