Encoding and decoding with implicit tagging

I have a question about explicit and implicit tagging. In the following example:
X ::= [APPLICATION 5] IMPLICIT INTEGER
the implicit tag replaces the existing tag on INTEGER with [APPLICATION 5], so the BER encoding of the value 5 would be, in hex, 45 01 05. How does the decoder know the type from 45 01 05?

I suspect your real question is, "How can a BER decoder know what to do when implicit tags are used and these tags replace the tags that would otherwise signal the ASN.1 type that needs to be decoded?"
Whether the decoder can handle IMPLICIT tags depends on whether the decoder is informed by the ASN.1 specification, which provides the necessary context. There are requirements imposed on the components of SEQUENCE, SET, and CHOICE to ensure that a decoder can read a tag and know which component needs to be decoded and, therefore, what the type is. This requires knowledge of the ASN.1 specification.
By contrast, a generic BER decoder that is not informed by the ASN.1 specification will have a problem with implicit tags, because it lacks the necessary context to interpret them.
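To make this concrete, here is a minimal sketch using pyasn1 (one ASN.1 library among many; the library choice is mine, for illustration). Informed by the spec, the decoder recovers the INTEGER; without it, the tag is unrecognizable:
from pyasn1.type import univ, tag
from pyasn1.codec.ber import decoder, encoder

# X ::= [APPLICATION 5] IMPLICIT INTEGER
X = univ.Integer().subtype(
    implicitTag=tag.Tag(tag.tagClassApplication, tag.tagFormatSimple, 5))

substrate = encoder.encode(X.clone(5))
print(substrate.hex())  # 450105, matching the hand encoding above

# Informed by the spec, the decoder maps tag 0x45 back to INTEGER:
value, _ = decoder.decode(substrate, asn1Spec=X)
print(int(value))  # 5

# Without the spec, [APPLICATION 5] matches no known universal type:
# decoder.decode(substrate)  # raises an error (PyAsn1Error), tag unknown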

The only way for the decoder to recover the original type from the octet stream is to know in advance that it is coming. AFAIK, your decoder should be given a hint about what type to expect in the given circumstances and, most importantly, about what base ASN.1 type that implicitly tagged type maps to.
Consider checking out this book.

Usually, the BER decoder is generated by an ASN.1 compiler from the given specification (schema). Then, during decoding, besides the input encoded data, the user also specifies the type that they want to decode. Using that type information, the decoder knows what to decode.

First, I checked the book "ASN.1: Communication between Heterogeneous Systems" that Ilya Etingof sent me; the following gives more details:
"The IMPLICIT marker proceeds as follows: all the following tags, explicitly mentioned or indirectly reached through a type reference are ignored until the next occurrence (included) of the UNIVERSAL class tag (except if the EXPLICIT marker is encountered before). So, for the type T below:
T ::= [1] IMPLICIT T1
T1 ::= [5] IMPLICIT T2*
T2 ::= [APPLICATION 0] IMPLICIT INTEGER
only the tag [1] should be encoded. Another way of explaining the concept
of implicit tagging is to say that a tag marked IMPLICIT overwrites
the tag that follows it (recursively); hence, for the example above, tag[1] overwrites tag [5], which in turn overwrites tag [APPLICATION 0] which
fnally overwrites the default tag [UNIVERSAL 2] of the INTEGER type.
A type tagged in implicit mode can be decoded only if the receiving
application `knows' the abstract syntax, i.e. the decoder has been
generated from the same ASN.1 module as the encoder was (and such
is the case most of the time)."
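To make the chain concrete, here is a minimal pyasn1 sketch (the library and the example value 2 are my choices for illustration). Each implicitTag subtype replaces the outermost tag, so only [1] reaches the wire:
from pyasn1.type import univ, tag
from pyasn1.codec.ber import encoder

# T2 ::= [APPLICATION 0] IMPLICIT INTEGER
T2 = univ.Integer().subtype(
    implicitTag=tag.Tag(tag.tagClassApplication, tag.tagFormatSimple, 0))
# T1 ::= [5] IMPLICIT T2
T1 = T2.subtype(
    implicitTag=tag.Tag(tag.tagClassContext, tag.tagFormatSimple, 5))
# T ::= [1] IMPLICIT T1
T = T1.subtype(
    implicitTag=tag.Tag(tag.tagClassContext, tag.tagFormatSimple, 1))

print(encoder.encode(T.clone(2)).hex())  # 810102: only tag [1] survives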
So I guess that a negotiation of the ASN.1 specification should be performed in the presentation layer at the beginning of the data transfer.

Related

ASN.1 sequence with missing tag/length field

I'm implementing a specification that, as the outermost data type, specifies a sequence
LogMessage ::= SEQUENCE {
version INTEGER (4),
...
}
When encoded, I would expect the messages to always start with 30, but this is not the case. Indeed, what I see when I look at the messages is that the inner part of the SEQUENCE is encoded, but the outer header (the 30 and the length field of the payload) is omitted.
I cannot find out why this is happening (the ASN.1 looks just "normal"). Is there a special mode in which this behavior can be seen? Of course I can "manually" truncate that leading data, but I would like to build a robust solution, and if this is somehow defined in ASN.1, I'm sure my library (pyasn1) would have an option to set it.
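For reference, a minimal pyasn1 sketch of the behavior the asker expects (the components behind the question's "..." are omitted here):
from pyasn1.type import univ, namedtype
from pyasn1.codec.ber import encoder

class LogMessage(univ.Sequence):
    componentType = namedtype.NamedTypes(
        namedtype.NamedType('version', univ.Integer()))

msg = LogMessage()
msg['version'] = 4
print(encoder.encode(msg).hex())  # 3003020104: starts with the 30 header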

Swift: Compiler's conversion from type to optional type

It looks like the compiler automatically converts a type into an optional type when needed, even though there is no inheritance relationship here.
Where in the documentation is this behavior specified?
func test(value: String?) {
// String passed in is now an optional String instead.
print(value ?? "")
}
// Pass an actual string
test(value: "test")
This behaviour is actually explicitly documented in a well-hidden corner of the docs folder of the Swift github repo.
Citing swift/docs/archive/LangRef.html [changed some formatting; emphasis mine]:
Types
type ::= attribute-list type-function
type ::= attribute-list type-array
...
type-simple ::= type-optional
Swift has a small collection of core datatypes that are built into the compiler. Most user-facing datatypes are defined by the standard library or declared as user-defined types.
...
Optional Types
Similar constructs exist in Haskell (Maybe), the Boost library (Optional), and C++14 (optional).
type-optional ::= type-simple '?'-postfix
An optional type is syntactic sugar for the library type Optional<T>. This is an enum with two cases: None and Some, used to represent a value that may or may not be present.
Swift provides a number of special, builtin behaviors involving this library type:
There is an implicit conversion from any type T to the corresponding optional type T?.
...
See the htmlpreview.github.io rendering of the HTML for an easier overview than the raw .html source:
http://htmlpreview.github.io/?https://github.com/apple/swift/blob/master/docs/archive/LangRef.html
(htmlpreview of LangRef.html as of July 25, 2017, the state from which the information above has been cited)
Now, this is me speculating, but the reason this is not very publicly available (and not entirely up to date: it is placed in the archive sub-folder and still uses the old None and Some cases rather than none and some) is probably that the Swift team no longer sees a reason for general Swift users to know the details of the compiler "magic" associated with the very special type Optional, and instead focuses on the use cases and grammar of Optional in the context of the Swift language rather than its compiler.

ASN.1 BER Sequence with undefined tags

I need to decode some BER messages that are quite long, and I have two different situations. One has a couple of mandatory parameters with no specific tags and a lot of optional parameters with implicit tags. The other has only optional implicitly tagged parameters. For instance:
Case 1:
MySeq ::= SEQUENCE
{
a TYPE1,
b TYPE1,
c TYPE1,
-- first 3 elements have the same type
d IMPLICIT [1] TYPEd OPTIONAL,
e IMPLICIT [2] TYPEe OPTIONAL,
...
}
and so on, with many more parameters, around 40, some of them constructed, with constructed parameters inside as well.
Case 2:
MySeq ::= SEQUENCE
{
a IMPLICIT [1] TYPEa OPTIONAL,
b IMPLICIT [2] TYPEb OPTIONAL,
c IMPLICIT [3] TYPEc OPTIONAL,
d IMPLICIT [4] TYPEd OPTIONAL,
e IMPLICIT [5] TYPEe OPTIONAL,
...
}
The point is, I really need just 3 or 4 parameters from those messages; I do not care about the rest. I don't want my decoder to spend so much processing time decoding the full message if I don't need it. Is there any standard way to do this?
In the second case I came up with an idea: change the ASN.1 definition from SEQUENCE to SET, like:
MySeq ::= SET
{
a IMPLICIT [1] TYPEa OPTIONAL,
a20 IMPLICIT [20] TYPEa OPTIONAL,
a40 IMPLICIT [40] TYPEa OPTIONAL,
...
}
I mean, the parser would just decode those 3 parameters as a SET. Of course I would have to modify the binary message on reception to convert it from SEQUENCE to SET (just one bit). But I can't do that with the first SEQUENCE.
Is there any way to tell the decoder to "ignore unknown tags"?
I have read about "EXTENSIBILITY IMPLIED", but I can't tell whether that is what I need or whether it just implies extensibility at the end of the SEQUENCE, as if I were using the extensibility marker "...".
Thanks in advance,
Luis
Trying to fiddle with the SEQUENCE tag to change it to a SET tag is dangerous since a sequence can contain the same tag multiple times as long as there is a non-optional component between them. A SET cannot handle this. Also, decoding a SET is inherently more complicated than decoding a SEQUENCE in a robust manner since the decoder must be able to handle the components in any order.
Regarding EXTENSIBILITY IMPLIED, you are correct that it is the equivalent of adding the ... extension marker to the end of each SEQUENCE, SET, and CHOICE type; if it is not only at the end of each SEQUENCE that you need an extension marker, it may not be useful to you.
One alternative is to try using the "partial decoding" feature of the OSS ASN.1 Tools (http://www.oss.com) which allows you to select particular components of a message that you are interested in and skip past the others.
Disclosure: I work for OSS Nokalva, Inc.
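If a commercial toolkit is not an option, the TLV structure of BER itself makes cheap skipping possible. Below is a minimal hand-rolled Python sketch (my own illustration, not OSS's partial-decoding API) that walks the top-level components of a definite-length SEQUENCE and surfaces only the tags of interest:
def read_tlv(buf, pos):
    # Parse one definite-length BER TLV; return (tag_number, value, next_pos).
    first = buf[pos]
    pos += 1
    tagno = first & 0x1F
    if tagno == 0x1F:  # high-tag-number form
        tagno = 0
        while True:
            b = buf[pos]
            pos += 1
            tagno = (tagno << 7) | (b & 0x7F)
            if not b & 0x80:
                break
    length = buf[pos]
    pos += 1
    if length & 0x80:  # long-form length
        n = length & 0x7F
        length = int.from_bytes(buf[pos:pos + n], 'big')
        pos += n
    return tagno, buf[pos:pos + length], pos + length

def pick_components(message, wanted):
    # Yield (tag_number, value) for top-level components with a wanted tag;
    # everything else is skipped cheaply via the length field.
    _, body, _ = read_tlv(message, 0)  # strip the outer SEQUENCE header
    pos = 0
    while pos < len(body):
        tagno, value, pos = read_tlv(body, pos)
        if tagno in wanted:
            yield tagno, value  # hand just these to a real decoder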

Argonaut CodecJson and decoding subtypes

In the Argonaut DecodeJson trait there is a method ||| for chaining together decoders, so that the first succeeding decoder is chosen. There is also a similar method in DecodeResult which has the same effect. It looks at first glance as though one of these would be what we want for decoding multiple subtypes of a common trait. However, how do we actually do this?
The first problem is that the argument to ||| has to be a DecodeJson decoding a supertype of the type that the callee is supposed to be decoding (and similarly for DecodeResult). I would expect that such a decoder would be able to decode all of the subtypes of the common supertype, so this seems like a recipe for infinite recursion!
We can get around this using the following ugly asInstanceOf hack while defining the CodecJson for the supertype:
c => c.as[A] ||| c.as[Foo](implicitly[DecodeJson[B]].asInstanceOf[DecodeJson[Foo]])
However, then there is still a problem when decoding more than two subtypes. Assume there are subtypes A, B and C of Foo. Now what? How do we add yet another alternative to this decoding expression? .asInstanceOf[DecodeJson[AnyRef]] is going to destroy the type-safety of the parsed result (as if we hadn't already discarded type-safety at this point!). And then we are quickly going to run out of options with 4, 5, or 6 alternatives.
EDIT: I will gladly accept as an answer any alternative approach to decoding more-than-2-wide subtype hierarchies using Argonaut.

How do purely functional compilers annotate the AST with type info?

In the syntax analysis phase, an imperative compiler can build an AST out of nodes that already contain a type field that is set to null during construction, and then later, in the semantic analysis phase, fill in the types by assigning the declared/inferred types into the type fields.
How do purely functional languages handle this, where you do not have the luxury of assignment? Is the type-less AST mapped to a different kind of type-enriched AST? Does that mean I need to define two types per AST node, one for the syntax phase, and one for the semantic phase?
Are there purely functional programming tricks that help the compiler writer with this problem?
I usually rewrite the source AST (or one already lowered by several steps) into a new form, replacing each expression node with a pair (tag, expression).
Tags are unique numbers or symbols that are then used by the next pass, which derives type equations from the AST. E.g., a + b will yield something like { numeric(Tag_a). numeric(Tag_b). equals(Tag_a, Tag_b). equals(Tag_e, Tag_a). }.
Then the type equations are solved (e.g., by simply running them as a Prolog program) and, if successful, all the tags (which are variables in this program) become bound to concrete types; if not, they are left as type parameters.
In the next step, the previous AST is rewritten again, this time replacing tags with all the inferred type information.
The whole process is a sequence of pure rewrites, no need to replace anything in your AST destructively. A typical compilation pipeline may take a couple of dozens of rewrites, some of them changing the AST datatype.
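As a toy illustration of that pipeline in Python (the node shapes, the tiny expression language, and the naive solver are assumptions of this sketch, standing in for the Prolog run):
import itertools

_fresh = itertools.count()

def tag_ast(node):
    # Pure rewrite: (op, *children) becomes (tag, op, *tagged_children).
    op, *kids = node
    return (next(_fresh), op,
            *[tag_ast(k) if isinstance(k, tuple) else k for k in kids])

def equations(node, eqs):
    # Derive type equations; tags play the role of type variables.
    t, op, *kids = node
    if op == 'lit':
        eqs.append((t, 'int'))       # numeric(Tag)
    elif op == 'add':
        a, b = kids
        eqs.append((t, a[0]))        # equals(Tag_e, Tag_a)
        eqs.append((a[0], b[0]))     # equals(Tag_a, Tag_b)
        equations(a, eqs)
        equations(b, eqs)
    return eqs

def solve(eqs):
    # Naive substitution solver standing in for the Prolog program.
    env = {}
    def find(x):
        while x in env:
            x = env[x]
        return x
    for lhs, rhs in eqs:
        a, b = find(lhs), find(rhs)
        if a == b:
            continue
        if isinstance(a, int):       # tags are ints, concrete types are strings
            env[a] = b
        elif isinstance(b, int):
            env[b] = a
        else:
            raise TypeError(f'cannot unify {a} with {b}')
    return find

def annotate(node, find):
    # Final pure rewrite: replace each tag with its inferred type.
    t, op, *kids = node
    return (find(t), op,
            *[annotate(k, find) if isinstance(k, tuple) else k for k in kids])

tree = tag_ast(('add', ('lit', 1), ('lit', 2)))   # 1 + 2
find = solve(equations(tree, []))
print(annotate(tree, find))  # ('int', 'add', ('int', 'lit', 1), ('int', 'lit', 2))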
There are several options to model this. You may use the same kind of nullable data fields as in your imperative case:
data Exp = Var Name (Maybe Type) | ...
parse :: String -> Maybe Exp -- types are Nothings here
typeCheck :: Exp -> Maybe Exp -- turns Nothings into Justs
or even, using a more precise type
data Exp ty = Var Name ty | ...
parse :: String -> Maybe (Exp ())
typeCheck :: Exp () -> Maybe (Exp Type)
I can't speak for how it is supposed to be done, but I did do this in F# for a C# compiler here.
The approach was basically: build an AST from the source, leaving things like type information unconstrained. So AST.fs is basically the AST, with strings for the type names, function names, etc.
As the AST starts to be compiled to (in this case) .NET IL, we end up with more type information (we create the types in the source; let's call these type stubs). This then gives us the information needed to create method stubs (the code may have signatures that include type stubs as well as built-in types). From here we now have enough type information to resolve any of the type names or method signatures in the code.
I store that in the file TypedAST.fs. I do this in a single pass, although the approach may be naive.
Now that we have a fully typed AST, you could compile it, fully analyze it, or do whatever you like with it.
So, in answer to the question "Does that mean I need to define two types per AST node, one for the syntax phase, and one for the semantic phase?", I can't say definitively that this is the case, but it is certainly what I did, and it appears to be what MS have done with Roslyn (although they have essentially decorated the original tree with type info, IIRC).
"Are there purely functional programming tricks that help the compiler writer with this problem?"
Given that the ASTs are essentially mirrored in my case, it would be possible to make the node type generic and transform the tree, but the code might end up (even more) horrendous.
i.e.:
type 'ty AST =
| MethodInvoke of 'ty * Name * 'ty list
| ...
Like in the case when dealing with relational databases, in functional programming it is often a good idea not to put everything in a single data structure.
In particular, there may not be a data structure that is "the AST".
Most probably, there will be data structures that represent parsed expressions. One possible way to deal with type information is to assign a unique identifier (like an integer) to each node of the tree already during parsing and have some suitable data structure (like a hash map) that associates those node-ids with types. The job of the type inference pass, then, would be just to create this map.
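A minimal Python sketch of that idea (the node constructors and the toy inference are illustrative assumptions):
import itertools

_ids = itertools.count()

def lit(value):
    # The parser assigns each node a unique id as it builds the tree.
    return {'id': next(_ids), 'kind': 'lit', 'value': value}

def add(left, right):
    return {'id': next(_ids), 'kind': 'add', 'left': left, 'right': right}

def infer(node, types):
    # The inference pass only fills the id -> type map; the tree is untouched.
    if node['kind'] == 'lit':
        types[node['id']] = 'int'
    else:
        infer(node['left'], types)
        infer(node['right'], types)
        types[node['id']] = types[node['left']['id']]
    return types

expr = add(lit(1), lit(2))
print(infer(expr, {}))  # {0: 'int', 1: 'int', 2: 'int'}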