I need an example to understand Implicit Tagging in ASN.1 - tags

I have been going through the following tutorial
http://www.obj-sys.com/asn1tutorial/node12.html
Can you help me understand implicit tagging with an example?

In ASN.1, tagging in fact serves two purposes: typing and naming. Typing means it tells an en-/decoder what kind of data type the value is (a string, an integer, a boolean, a set, etc.). Naming means that if there are multiple fields of the same type and some (or all) of them are optional, it tells the en-/decoder which field that value belongs to.
If you compare ASN.1 to, let's say, JSON, and you look at the following JSON data:
"Image": {
"Width": 800,
"Height": 600,
"Title": "View from 15th Floor"
}
You'll notice that in JSON every field is always explicitly named ("Image", "Width", "Height", "Title") and either explicitly or implicitly typed ("Title" is a string, because its value is surrounded by quotes; "Width" is an integer, because it has no quotes, only digits, it isn't "null", "true" or "false", and it has no decimal point).
In ASN.1 this piece of data would be:
Image ::= SEQUENCE {
Width INTEGER,
Height INTEGER,
Title UTF8String
}
This will work without any special tagging; only the universal tags are required here. Universal tags don't name data, they just type it, so the en-/decoder knows that the first two values are integers and the last one is a string. That the first integer is Width and the second one is Height doesn't need to be encoded in the byte stream; it is defined by their order (sequences have a fixed order, sets don't; the page you referred to uses sets).
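To make the "typing by universal tags, naming by order" point concrete, here is a minimal sketch in Python that hand-rolls the DER bytes for this schema (the helper names are mine, and only short-form lengths are handled):

def der_tlv(tag: int, content: bytes) -> bytes:
    assert len(content) < 128               # short-form length only, for brevity
    return bytes([tag, len(content)]) + content

def der_int(value: int) -> bytes:           # non-negative values only, for brevity
    body = value.to_bytes(value.bit_length() // 8 + 1, "big")
    return der_tlv(0x02, body)              # UNIVERSAL 2 = INTEGER

def der_utf8(text: str) -> bytes:
    return der_tlv(0x0C, text.encode("utf-8"))  # UNIVERSAL 12 = UTF8String

image = der_tlv(0x30,                       # UNIVERSAL 16, constructed = SEQUENCE
                der_int(800) + der_int(600) + der_utf8("View from 15th Floor"))
print(image.hex(" "))
# A decoder knows the first INTEGER is Width and the second is Height only
# because the schema fixes their order inside the SEQUENCE.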
Now change the schema as follows:
Image ::= SEQUENCE {
Width INTEGER OPTIONAL,
Height INTEGER OPTIONAL,
Title UTF8String
}
Okay, now we have a problem. Assume that the following data is received:
INTEGER(750), UTF8String("A funny kitten")
What is 750, Width or Height? It could be Width (with Height missing) or it could be Height (with Width missing); both would look the same as a binary stream. In JSON that would be clear, as every piece of data is named; in ASN.1 it isn't. Now a type alone isn't enough, we also need a name. That's where the non-universal tags enter the game. Change it to:
Image ::= SEQUENCE {
Width [0] INTEGER OPTIONAL,
Height [1] INTEGER OPTIONAL,
Title UTF8String
}
And if you receive the following data:
[1]INTEGER(750), UTF8String("A funny kitten")
You know that 750 is the Height and not the Width (there simply is no Width). Here you declare a new tag (in this case a context-specific one) that serves two purposes: it tells the en-/decoder that this is an integer value (typing) and it tells it which integer value it is (naming).
But what is the difference between implicit and explicit tagging? The difference is that implicit tagging just names the data; the en-/decoder must already know the type implied by that name. Explicit tagging both names and explicitly types the data.
If tagging is explicit, the data will be sent as:
[1]INTEGER(xxx), UTF8String(yyy)
so even if a decoder has no idea that [1] means Height, it knows that the bytes "xxx" are to be parsed/interpreted as an integer value. Another important advantage of explicit tagging is that the type can be changed in the future without changing the tag. E.g.
Length ::= [0] INTEGER
can be changed to
Length ::= [0] CHOICE {
integer INTEGER,
real REAL
}
Tag [0] still means length, but now length can either be an integer or a floating point value. Since the type is encoded explicitly, decoders will always know how to correctly decode the value and this change is thus forward and backward compatible (at least at decoder level, not necessarily backward compatible at application level).
If tagging is implicit, the data will be sent as:
[1](xxx), UTF8String(yyy)
A decoder that doesn't know what [1] is, will not know the type of "xxx" and thus cannot parse/interpret that data correctly. Unlike JSON, values in ASN.1 are just bytes. So "xxx" may be one, two, three or maybe four bytes and how to decode those bytes depends on their data type, which is not provided in the data stream itself. Also changing the type of [1] will break existing decoders for sure.
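For illustration, a small Python sketch of the two encodings of the Height value from above (750 = 0x02EE; a context-class tag 1 is 0x81 primitive, 0xA1 constructed):

def tlv(tag: int, content: bytes) -> bytes:
    return bytes([tag, len(content)]) + content

universal_int = tlv(0x02, (750).to_bytes(2, "big"))  # 02 02 02 ee

explicit = tlv(0xA1, universal_int)             # [1] wraps the whole INTEGER TLV
implicit = tlv(0x81, (750).to_bytes(2, "big"))  # [1] replaces the INTEGER tag

print(explicit.hex(" "))  # a1 04 02 02 02 ee -> the type survives in the stream
print(implicit.hex(" "))  # 81 02 02 ee       -> two bytes saved, type implied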
Okay, but why would anyone use implicit tagging? Isn't it better to always use explicit tagging? With explicit tagging, the type must also be encoded in the data stream, and this requires two additional bytes per tag. For data transmissions containing several thousand (maybe even millions of) tags, where every single byte may count (very slow connection, tiny packets, high packet loss, very weak processing devices) and where both sides know all custom tags anyway, why waste bandwidth, memory, storage and/or processing time encoding, transmitting and decoding unnecessary type information?
Keep in mind that ASN.1 is a rather old standard and it was intended to achieve a highly compact representation of data at a time where network bandwidth was very expensive and processors several hundred times slower than today. If you look at all the XML and JSON data transfers of today, it seems ridiculous to even think about saving two bytes per tag.

I find this thread to be clear enough; it also contains (small) examples, even though they are quite 'extreme' examples at that. More 'realistic' examples using IMPLICIT tags can be found on this page.

Using the accepted answer as an example of encoding:
Image ::= SEQUENCE {
Width INTEGER,
Height INTEGER,
Title UTF8String
}
An example of encoding (using Width = 800, Height = 600, Title = "A funny kitten") would be:
SEQUENCE 30 18 02 02 03 20 02 02 02 58 0C 0E 41 20 66 75 6E 6E 79 20 6B 69 74 74 65 6E (24 content bytes)
The internal sequence breaks down into:
INTEGER: 02 02 03 20 = 800 (2 value bytes)
INTEGER: 02 02 02 58 = 600 (2 value bytes)
UTF8STRING: 0C 0E 41 20 66 75 6E 6E 79 20 6B 69 74 74 65 6E = "A funny kitten" (14 value bytes)
Explicit Optional
If you then have EXPLICIT OPTIONAL values:
Image ::= SEQUENCE {
Width [0] EXPLICIT INTEGER OPTIONAL,
Height [1] EXPLICIT INTEGER OPTIONAL,
Title UTF8String
}
The encoded sequence might be:
SEQUENCE 30 16 A1 04 02 02 02 EE 0C 0E 41 20 66 75 6E 6E 79 20 6B 69 74 74 65 6E (22 content bytes)
And the internal sequence breaks down into:
CONTEXT[1] EXPLICIT INTEGER: A1 04 02 02 02 EE = 750 (the explicit [1] wrapper adds 2 bytes around the 2 value bytes)
UTF8STRING: 0C 0E 41 20 66 75 6E 6E 79 20 6B 69 74 74 65 6E = "A funny kitten" (14 value bytes)
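A quick way to check such a breakdown is to walk the TLV structure; here is a small Python sketch (short-form lengths assumed) over the explicitly tagged bytes above:

encoded = bytes.fromhex(
    "30 16"                                            # SEQUENCE, 22 content bytes
    "a1 04"                                            # CONTEXT [1], constructed (explicit wrapper)
    "02 02 02 ee"                                      # INTEGER 750 (Height)
    "0c 0e 41 20 66 75 6e 6e 79 20 6b 69 74 74 65 6e"  # UTF8String "A funny kitten"
)

def walk(buf: bytes, depth: int = 0) -> None:
    i = 0
    while i < len(buf):
        tag, length = buf[i], buf[i + 1]   # short-form length assumed
        value = buf[i + 2 : i + 2 + length]
        print("  " * depth + f"tag 0x{tag:02x}, len {length}: {value.hex()}")
        if tag & 0x20:                     # constructed -> recurse into content
            walk(value, depth + 1)
        i += 2 + length

walk(encoded)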

Related

How can I return true in COBOL

So I am making a program that checks if a number is divisible by another number or not. If it is, it's supposed to return true, otherwise false. Here's what I have so far.
P.S.: I'm using the IBM dialect (GnuCOBOL v2.2, -std=ibm-strict -O2) to run this.
IDENTIFICATION DIVISION.
PROGRAM-ID. CHECKER.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 BASE PIC 9(5).
01 FACTOR PIC 9(2).
01 RESULT PIC 9(5).
   88 TRU VALUE 0.
   88 FAL VALUE 1 THRU 99.
PROCEDURE DIVISION.
CHECK-FOR-FACTOR SECTION.
    IF FUNCTION MOD(BASE, FACTOR) = 0 THEN
        SET TRU TO TRUE
    ELSE
        SET FAL TO TRUE
    END-IF.
END PROGRAM CHECKER.
It gives me an error saying "invalid use of level 88". I'm sure I'm making a mistake, and I've searched for a couple of days and I can't seem to find anything that can help me with it. Any ideas if this is possible in COBOL, or does COBOL handle all the boolean stuff some other way?
(Kindly do not reply with look up level 88 or some other stuff like that, I have already looked them up and they haven't been helping)
To return TRUE from a program you'd need an implementation that has a boolean USAGE: define that in LINKAGE, specify it in PROCEDURE DIVISION RETURNING true-item, and also use CALL 'yourprog' RETURNING true-item.
Your specified environment, GnuCOBOL, doesn't have a boolean USAGE (as of 2021) and can't handle the RETURNING phrase of PROCEDURE DIVISION in programs.
But you can use a very common extension to COBOL which is available in both IBM and GnuCOBOL:
Before the program ends MOVE RESULT TO RETURN-CODE (which is a global register) and in the calling program check its value (and reset it to zero).
Then it is only up to you what value means "true" (in your program it is 0).
As an alternative you could create a user-defined function (FUNCTION-ID instead of PROGRAM-ID, using the RETURNING phrase to pass your result) - but that would mean you need to use IF FUNCTION instead of CALL + IF RETURN-CODE in each caller.
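To sketch the RETURN-CODE route from the caller's side: if the program is built as an executable (e.g. cobc -x checker.cbl with GnuCOBOL), its RETURN-CODE becomes the process exit status, so any caller can test it. A hypothetical Python caller (the program name and its input handling are assumptions):

import subprocess

# Run the (hypothetically compiled) COBOL program; its RETURN-CODE
# surfaces as the process exit status.
result = subprocess.run(["./checker"])
is_divisible = (result.returncode == 0)  # the program defines 0 as "true"
print(is_divisible)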

Is it valid that two FIX messages are sent together in one go?

My QuickFIX client is complaining that the body length is not as expected.
After checking, I found that it received a transmission which actually contains two messages (two different MsgType <35> fields, and also two BeginString <8> fields).
Is it a valid message?
The error is reported by QuickFIX, not by my own code.
Hence, it looks like an invalid message to me, although I cannot find any official doc saying that it is not allowed.
I would expect QuickFIX to parse the messages as long as the body length of the first message is correct.
You could check if the body length is correct by using the following:
counting the number of characters in the message following the BodyLength (9) field, up to and including the delimiter immediately preceding the CheckSum (10) field. ALWAYS SECOND FIELD IN MESSAGE. (Always unencrypted.) For example, for the message 8=FIX.4.4^9=5^35=0^10=10^, the BodyLength is 5 (for 35=0^).
Source: https://btobits.com/fixopaedia/fixdic44/index.html?tag_9_BodyLength.html
It depends entirely on your FIX engine whether multiple messages in one go are supported or not.
Validate using the BodyLength <9> and CheckSum <10> fields.
BodyLength is calculated over the bytes starting after the BodyLength field and ending just before the CheckSum field.
CheckSum is calculated from the '8=' at the start of the message up to the SOH immediately before the CheckSum field: the binary values of all those characters are summed, and the least significant byte of that sum (i.e. the sum modulo 256) is the checksum value.
If the checksum has been calculated to be 274, then the modulo-256 value is 18 (256 + 18 = 274). This value would be transmitted as 10=018, where
"10=" is the tag for the CheckSum field.

Encoding of a CHOICE type when the CHOICE itself is used with an implicit tag (using the specific example: CRLDistPoints)

The Botan crypto library has only very limited support for the X.509 extension CRLDistributionPoint and actually throws an exception if any of the "advanced" attributes of the extension are set that Botan does not expect.
Hence, I am trying to patch the decoding of this extension, but I have a problem correctly determining the type of the encoded objects based on the tags. Either this is an oversight in the specification for this extension (I doubt it) or I am subject to a fundamental misunderstanding of the encoding/decoding rules.
Here are the relevant parts of the specification:
CertificateExtensions {joint-iso-itu-t ds(5) module(1)
certificateExtensions(26) 5} DEFINITIONS IMPLICIT TAGS
CRLDistPointsSyntax ::= SEQUENCE SIZE (1..MAX) OF DistributionPoint
DistributionPoint ::= SEQUENCE {
distributionPoint [0] DistributionPointName OPTIONAL,
reasons [1] ReasonFlags OPTIONAL,
cRLIssuer [2] GeneralNames OPTIONAL
}
DistributionPointName ::= CHOICE {
fullName [0] GeneralNames,
nameRelativeToCRLIssuer [1] RelativeDistinguishedName
}
The module uses implicit tagging by default; in the following this will be important. A DistributionPoint is a SEQUENCE in which all attributes are optional. The first optional attribute, distributionPoint, has the tag 0 and is of type DistributionPointName. In turn, DistributionPointName is a CHOICE whose alternatives carry either tag 0 (if GeneralNames is chosen) or tag 1 (if RelativeDistinguishedName is chosen).
According to my understanding, in case of implicit tagging a CHOICE type is encoded using the tag of the chosen type. In other words, a CHOICE type is not "nested" somehow but encoded on the same level on which the CHOICE type is used. But DistributionPointName has already been given the tag 0.
The specific question is: How is a DistributionPoint encoded, if nameRelativeToCRLIssuer (tag 1) is chosen as the choice for DistributionPointName without triggering a clash with tag 1 of the reasons attribute?
Here is an illustration of my problem:
30 // Type tag for outer SEQUENCE, DistributionPoint starts here
ll // Length of SEQUENCE, omitted here for editorial purposes
+--> 00 vs 01 // Type tag for distributionPoint
| // At first glance, 00 according to SEQUENCE definition for OPTIONAL DistributionPointName,
| // but maybe 01 if RelativeDistinguishedName is the selected CHOICE
| kk // Length of RelativeDistinguishedName, omitted here for editorial purposes
| vv // Encoding of RelativeDistinguishedName begins
| vv
| vv // Encoding of RelativeDistinguishedName ends, according to length kk
+--> 01 // Type tag for OPTIONAL ReasonFlags
jj // Length of ReasonFlags
ww // Encoding of ReasonFlags begins
ww
ww // Encoding of ReasonFlags ends, according to length jj
// Encoding of DistributionPoint ends, too, according to length ll
In line three, the type tag should be 00 to indicate that the OPTIONAL DistributionPointName exists. This also avoids a clash with the type tag 01 in line 8 for the OPTIONAL ReasonFlags.
However, in line three, the type tag should also indicate which type has been chosen for DistributionPointName. :-(
According to my understanding, in case of implicit tagging a CHOICE type is encoded using the tag of the chosen type. In other words, a CHOICE type is not "nested" somehow but encoded on the same level on which the CHOICE type is used. But DistributionPointName has already been given the tag 0.
I'm afraid it is the opposite: CHOICE tagging is always explicit, whatever the module's default tagging ...
In the X.680 document, there is the following note:
The tagging construction specifies explicit tagging if any of the following holds:
c) the "Tag Type" alternative is used and the value of "TagDefault" for
the module is IMPLICIT TAGS or AUTOMATIC TAGS, but the type defined by
"Type" is an untagged choice type, an untagged open type, or an
untagged "DummyReference" (see Rec. ITU-T X.683 | ISO/IEC 8824-4,
8.3).
So, if RelativeDistinguishedName is chosen, the distributionPoint component's tagging will be 0 (distributionPoint) and then, nested inside it, 1 (RelativeDistinguishedName).
The reason for this is that CHOICE does not have a UNIVERSAL tag.
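A byte-level sketch of that answer in Python (tlv is my helper; the empty SET is only a placeholder for a real encoded RelativeDistinguishedName):

def tlv(tag: int, content: bytes) -> bytes:
    return bytes([tag, len(content)]) + content

rdn = bytes.fromhex("3100")  # placeholder: an (empty) SET, universal tag 0x31

# nameRelativeToCRLIssuer [1] is IMPLICIT: it replaces the SET's tag.
chosen = tlv(0xA1, rdn[2:])  # context [1], constructed

# distributionPoint [0] stays EXPLICIT (its type is an untagged CHOICE),
# so it wraps the chosen alternative instead of replacing its tag.
dp_name = tlv(0xA0, chosen)

print(dp_name.hex(" "))      # a0 02 a1 00
# Outer [0] says "this is distributionPoint, not reasons [1]";
# inner [1] says which CHOICE alternative was taken. No clash.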

COBOL add 0 to a Variable in COMPUTE

I ran into a strange statement when working on a COBOL program at $WORK.
We have a paragraph that opens a cursor (from DB2) and then loops over it until it hits an EOT (in pseudo-code):
... working storage ...
01 I PIC S9(9) COMP VALUE ZEROS.
01 WS-SUB PIC S9(4) COMP VALUE 0.
... code area ...
PARA-ONE.
PERFORM OPEN-CURSOR
PERFORM FETCH-CURSOR
PERFORM VARYING I FROM 1 BY 1 UNTIL SQLCODE = DB2EOT
do stuff here...
END-PERFORM
COMPUTE WS-SUB = I + 0
PERFORM CLOSE-CURSOR
... do another loop using WS-SUB ...
I'm wondering why that COMPUTE WS-SUB = I + 0 line is there. My understanding is that I will always be at least 1, because of the PERFORM block above it (i.e., even if there is an EOT to start with, I will be set to one on that initial iteration).
Is that COMPUTE line even needed? Is it doing some implicit casting that I'm not aware of? Why would it be there? Why wouldn't you just MOVE I TO WS-SUB?
Call it stupid, but with some compilers (with the correct options in effect), given
01 SIGNED-NUMBER PIC S99 COMP-5 VALUE -1.
01 UNSIGNED-NUMBER PIC 99 COMP-5.
...
MOVE SIGNED-NUMBER TO UNSIGNED-NUMBER
DISPLAY UNSIGNED-NUMBER
results in: 255. But...
COMPUTE UNSIGNED-NUMBER = SIGNED-NUMBER + ZERO
results in: 1 (unsigned)
So to answer your question, this could be classified as a technique used to cast signed numbers into unsigned numbers. However, in the code example you gave it makes no sense at all.
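A language-neutral sketch of the two behaviours in Python (the one-byte and two-digit field sizes are assumptions chosen to mirror the displayed results):

value = -1

bit_copy = value & 0xFF        # MOVE-like: the two's-complement bit pattern
print(bit_copy)                # 255     reinterpreted as an unsigned byte

arithmetic = abs(value) % 100  # COMPUTE-like: storing the arithmetic result
print(arithmetic)              # 1       into an unsigned PICture drops the sign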
Note that the definition of "I" was (likely) coded by one programmer and that of WS-SUB by another (the naming is different, and the VALUE clause is different for the same purpose).
Programmer 2 looks like "old school": PIC S9(4), signed and taking up all the digits which "fit" in a halfword. The S9(9) is probably "far over the top" as per the range of possible values, but such things concern Programmer 1 not at all.
Probably Programmer 2 had concerns about using an S9(9) COMP for something requiring (perhaps many) fewer than 9999 "things". "I'll be 'efficient' without changing the existing code". It seems to me unlikely that the field was ever defined as unsigned.
A COMP/COMP-4 with nine digits does have a performance penalty when used for calculations. Try "ADD 1" to a 9(9), a 9(8) and a 9(10) and compare the generated code. If you can have nine digits, define the field as 9(10); otherwise use 9(8) if you need a fullword.
Programmer 2 knows something of this.
The COMPUTE with + 0 is probably deliberate. Why did Programmer 2 use the COMPUTE like that (the original question)?
Now it is going to get complicated.
There are two "types" of "binary" fields on the Mainframe: those which will contain values limited by the PICture clause (USAGE BINARY, COMP and COMP-4); those which contain values limited by the field size (USAGE COMP-5).
With BINARY/COMP/COMP-4, the size of the field is determined from the PICture, and so are the values that can be held. PIC 9(4) is a halfword, with a maximum value of 9999. PIC S9(4) is a halfword with values -9999 through +9999.
With COMP-5 (Native Binary), the PICture just determines the size of the field, and all the bits of the field are relevant for its value. PIC 9(1) to 9(4) define halfwords, PIC 9(5) to 9(9) define fullwords, and PIC 9(10) to 9(18) define doublewords. PIC 9(1) can hold a maximum of 65535, S9(1) -32,768 through +32,767.
All well and good. Then there is the compiler option TRUNC, which has three settings: STD (the default), BIN and OPT.
BIN can be considered to have the most far-reaching effect. BIN makes BINARY/COMP/COMP-4 behave like COMP-5: everything becomes, in effect, COMP-5. PICtures for binary fields are ignored, except in determining the size of the field (and, curiously, with ON SIZE ERROR, which "errors" when the maxima according to the PICture are exceeded). Native binary, in IBM Enterprise COBOL, generates, in the main though not exclusively, the "slowest" code. Truncation is to field size (halfword, fullword, doubleword).
STD, the default, is "standard" truncation. This truncates to "PICture". It is therefore a "decimal" truncation.
OPT is for "performance". With OPT, the compiler truncates in whatever way is the most "performant" for a particular "code sequence". This can mean intermediate values and final values may have "bits set" which are "outside of the range" of the PICture. However, when used as a source, a binary field will always only reflect the value specified by the PICture, even if there are "excess" bits set.
It is important when using OPT that all binary fields "conform to PICture" meaning that code must never rely on bits which are set outside the PICture definition.
Note: Even though OPT has been used, the OPTimizer (OPT(STD) or OPT(FULL)) can still provide further optimisations.
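The difference between the two truncation rules can be sketched in one line each; for a halfword PIC 9(4) that somehow received 123456 (an assumed value, purely for illustration):

value = 123456
print(value % 10 ** 4)  # 3456  -> TRUNC(STD): truncate to the PICture (decimal)
print(value % 2 ** 16)  # 57920 -> TRUNC(BIN): truncate to the field (binary)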
This is all well and good.
However, a "pickle" can readily ensue if you "mix" TRUNC options, or if the binary definition in a CALLing program is not the same as in the CALLed program. The "mix" can occur if modules within the same run-unit are compiled with different TRUNC options, or if a binary field on a file is written with one TRUNC option and later read with another.
Now, I suspect Programmer 2 encountered something like this: Either, with TRUNC(OPT) they noticed "excess bits" in a field and thought there was a need to deal with them, or, through the "mix" of options in a run-unit or "across file usage" they noticed "excess bits" where there would be a need to do something about it (which was to "remove the mix").
Programmer 2 developed the COMPUTE A = B + 0 to "deal" with a particular problem (perceived or actual) and then applied it generally to their work.
This is a "guess", or, better, a "rationalisation" which works with the known information.
It is a "fake" fix. There was either no problem (the normal way that TRUNC(OPT) works) or the correct resolution was "normalisation" of the TRUNC option across modules/file use.
I do not want loads of people now rushing off and putting COMPUTE A = B + 0 in their code. For a start, they don't know why they are doing it. For a continuation it is the wrong thing to do.
Of course, do not just remove the "+ 0" from any of these that you find. If there is a "mix" of TRUNCs, a program may stop "working".
There is one situation in which I have used "ADD ZERO" for a BINARY/COMP/COMP-4. This is in a "Mickey Mouse" program, a program with no purpose but to try something out. Here I've used it as a method to "trick" the optimizer, as otherwise the optimizer could see unchanging values and so would generate code using literal results, as all values were known at compile time. (A perhaps "neater" and more flexible way to do this, which I picked up from PhilinOxford, is to use ACCEPT for the field.) This is not the case, for certain, with the code in question.
I wonder if a testing version of the sources ever had
COMPUTE WS-SUB = I + 0
ON SIZE ERROR
DISPLAY "WS-SUB overflow"
STOP RUN
END-COMPUTE
with the range test discarded when the developer was satisfied and cleaning up? MOVE doesn't allow an ON SIZE ERROR phrase. That's as much of a reason as I can see. Or perhaps it was developer habit to use COMPUTE to move, as a subtle reminder to question the need for defensive code at every step? And perhaps, as Joe pointed out, not knowing that the SIZE clause would be just as effective without the + 0? Or a maintainer struggled with off-by-one errors and there was a corrective change from 1 to 0 after testing?

Sending hex characters over a socket

I am trying to send a hex character to a socket to indicate a new message. This code works:
$socket->send("\x{0B}");
$socket->send($contents);
$socket->send("\x{1C}");
$socket->send("\x{0D}");
However, since this happens in a loop, I need variable hex characters, and I have not figured out how to get it to work. This is what I have tried:
my $start_char = get(); # returns, for example 0B
my $end_char = get(); # 1C
my $end_seg = get(); #0D
$socket->send("\x{$start_char}");
$socket->send($contents);
$socket->send("\x{$end_char}");
$socket->send("\x{$end_seg}");
I can verify that the variables returned by the function are correct on the Perl side, but the server does not accept them as valid characters. Any input on how to do this?
Try ...send( chr($start_char) );, etc. (guessing that get() is actually returning integers).
If it really gives you strings like "0B", then ...send( chr(hex($start_char)) );
If you have a small amount of data, ysth's answer makes sense.
If you have a larger amount of data, you may want to look at pack.
pack ("H*", "0B") and pack ("C*", 0x0B) both give "\x0B".