Specification and correct use of (boolean) URI matrix parameters (and making them optional when using CXF/JAXB)?

I was wondering if the "proper" use of URI/URL matrix parameters was ever defined in a specification, such as an RFC or a W3 recommendation?
In particular, I just joined a project where we use matrix parameters and a Java framework to implement a REST service. One of the matrix parameters we have for our REST service is a boolean one, much like ;sortByDate=true
What bugged me about this one is that the Java framework we use apparently insists that boolean parameters are always passed in (i.e. you cannot make them optional/omit them; probably because they are converted to Java's primitive boolean type). I think that's a bit odd...
I have to double-check what framework we use tomorrow (I think it is JAXB), but in the meantime I wondered if matrix parameters were defined in an official specification somewhere, and if such a specification made any mention of boolean parameters.
So far I found a hint (though no mention of boolean matrix parameters) in appendix B.2.2 of the W3C's "HTML 4.01 Specification":
We recommend that HTTP server implementors, and in particular, CGI implementors support the use of ";" in place of "&" to save authors the trouble of escaping "&" characters in this manner.
And the "Web Application Description Language" specification specifies:
Boolean matrix parameters are represented as: ';' name when value is true and are omitted from identifier when value is false
What I haven't found is "the" specification for matrix parameters. Is there any? Does it mention how boolean matrix parameters should be used? If not, is there an established best practice?
And, as a bonus question: can you omit boolean URL matrix parameters when using CXF (JAXB), or do you always have to specify them?
Cheers! :)
Update: We're using CXF (which apparently uses JAXB under the hood...)

RFC 3986 describes matrix parameters without explicitly naming them. Quoting https://www.rfc-editor.org/rfc/rfc3986#section-3.3:
For example, the semicolon (";") and equals ("=") reserved characters are often used to delimit parameters and parameter values applicable to that segment. The comma (",") reserved character is often used for similar purposes. For example, one URI producer might use a segment such as "name;v=1.1" to indicate a reference to version 1.1 of "name", whereas another might use a segment such as "name,1.1" to indicate the same.
I hope this helps.

I think that this answer does a good job of explaining the purpose of matrix parameters:
https://stackoverflow.com/a/5602678
You can use the Boolean wrapper class to support an optional boolean value. The values true and false will be mapped to the correct boolean values.
@MatrixParam("sortByDate") Boolean sortByDate
It will be null if the param is not present. Note that JAXB doesn't apply when dealing with JAX-RS parameters.
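To illustrate (this is not from the original answer): a minimal CXF/JAX-RS resource sketch, written in Scala here with made-up names; the same annotations work on a Java resource class. Because java.lang.Boolean is nullable, the matrix parameter can simply be left out of the URI:

import javax.ws.rs.{GET, MatrixParam, Path, Produces}

@Path("/articles")
@Produces(Array("text/plain"))
class ArticlesResource {

  // GET /articles;sortByDate=true  -> sortByDate == java.lang.Boolean.TRUE
  // GET /articles                  -> sortByDate == null (parameter omitted)
  @GET
  def list(@MatrixParam("sortByDate") sortByDate: java.lang.Boolean): String = {
    val sorted = sortByDate != null && sortByDate.booleanValue
    if (sorted) "articles sorted by date" else "articles in default order"
  }
}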

What's the correct way to convert from StringBuilder to String?

From what I've seen online, people seem to suggest that the toString() method is the one to use; however, the documentation states:
Creates a String representation of this object. The default representation is platform dependent. On the java platform it is the concatenation of the class name, "#", and the object's hashcode in hexadecimal.
So it seems like using this method might cause some problems down the line?
There are also mkString and result(), the latter of which seems to make the most sense. But I'm not sure what the differences between these three methods are, or whether that's how result() is supposed to be used.
The toString implementation currently just redirects to the result method anyway, so those two methods will behave in the same way. However, they express slightly different intent:
toString requests a textual representation of the StringBuilder's current state that is "concise but informative (and) that is easy for a person to read". So, theoretically, the (vague) specification of this method does not forbid abbreviating the result, or enhancing conciseness and readability in any other way.
result requests the actual constructed string. No different readings seem possible here.
Therefore, if you want to obtain the resulting string, use result to express your intent as clearly as possible.
In this way, the reader of your code won't have to wonder whether StringBuilder.toString might shorten something for the sake of "conciseness" when the string gets over 9000 kB long, or something like that.
mkString is for something else entirely: it's mostly used for interspersing separators, as in "hello".mkString(",") == "h,e,l,l,o".
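A quick illustration (expected output shown in comments, assuming the current behaviour where toString delegates to result):

val sb = new StringBuilder
sb ++= "hello"
sb.result()       // "hello" -- the constructed string; the clearest way to state that intent
sb.toString       // "hello" -- currently just delegates to result
sb.mkString(",")  // "h,e,l,l,o" -- treats the builder as a collection of Chars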
Some further notes:
The paragraph with "hashcode in hexadecimal" describes the default. It is just documentation inherited from AnyRef, because the creator of StringBuilder didn't bother to provide more detailed documentation.
If you look into the code, you'll see that toString is actually just delegating to result.
The documentation of StringBuilder also mentions result() in the introductory overview paragraph.
Just use result().
TL;DR: use result, as stated in the docs.
toString MUST never be called for any purpose other than a quick debug.
mkString is inherited from the collections hierarchy and will basically create another StringBuilder internally, so it is very inefficient.

Subresource and path variable conflicts in REST?

Is it considered bad practice to design a REST API that may have an ambiguity in the path resolution? For example:
GET /animals/{id} // Returns the animal with the given ID
GET /animals/dogs // Returns all animals of type dog
OK, that's contrived, because you would actually just do GET /dogs, but hopefully it illustrates what I mean. From a path resolution standpoint, it seems like you wouldn't know whether you were looking for an animal with id="dogs" or just all the dogs.
Specifically, I'm interested in whether Jersey would have any trouble resolving this. What if you knew the id to be an integer?
"Specifically, I'm interested in whether Jersey would have any trouble resolving this"
No, this would not be a problem. If you look at the JAX-RS spec § 3.7.2, you'll see the algorithm for matching requests to resource methods.
[E is the set of matching methods]...
Sort E using the number of literal characters in each member as the primary key (descending order), the number of capturing groups as a secondary key (descending order) and the number of capturing groups with non-default regular expressions (i.e. not ‘([^ /]+?)’) as the tertiary key (descending order)
So basically it's saying that the number of literal characters is the primary sort key (note that it is short-circuiting: if you win on the primary key, you win). For example, if a request goes to /animals/cat, @Path("/animals/dogs") would obviously not be in the set, so we don't need to worry about it. But if the request is to /animals/dogs, then both methods would be in the set. The set is then sorted by the number of literal characters. Since @Path("/animals/dogs") has more literal characters than @Path("/animals/{id}"), the former wins; the capturing group {id} doesn't count towards literal characters.
"What if you knew the id to be an integer?"
The capture group allows for a regex, so you can use @Path("/animals/{id: \\d+}"). Anything that is not all digits will not match and will lead to a 404, unless of course it is /animals/dogs.
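To make that concrete, a small resource sketch (not from the original answer; written in Scala here with made-up names, and the annotations are the same in Java):

import javax.ws.rs.{GET, Path, PathParam, Produces}

@Path("/animals")
@Produces(Array("text/plain"))
class AnimalsResource {

  // Matches /animals/dogs: the literal segment wins the sort on literal characters
  @GET
  @Path("dogs")
  def getDogs(): String = "all the dogs"

  // Matches /animals/42 but not /animals/cat, thanks to the \d+ regex (a 404 instead)
  @GET
  @Path("{id: \\d+}")
  def getAnimal(@PathParam("id") id: Long): String = s"animal $id"
}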

How do purely functional compilers annotate the AST with type info?

In the syntax analysis phase, an imperative compiler can build an AST out of nodes that already contain a type field that is set to null during construction, and then later, in the semantic analysis phase, fill in the types by assigning the declared/inferred types into the type fields.
How do purely functional languages handle this, where you do not have the luxury of assignment? Is the type-less AST mapped to a different kind of type-enriched AST? Does that mean I need to define two types per AST node, one for the syntax phase, and one for the semantic phase?
Are there purely functional programming tricks that help the compiler writer with this problem?
I usually rewrite the source AST (or an AST that has already been lowered several steps) into a new form, replacing each expression node with a pair (tag, expression).
Tags are unique numbers or symbols, which are then used by the next pass, which derives type equations from the AST. E.g., a + b will yield something like { numeric(Tag_a). numeric(Tag_b). equals(Tag_a, Tag_b). equals(Tag_e, Tag_a). }, where Tag_e is the tag of the whole a + b expression.
Then the type equations are solved (e.g., by simply running them as a Prolog program); if successful, all the tags (which are variables in this program) are now bound to concrete types, and if not, they're left as type parameters.
In the next step, our previous AST is rewritten again, this time replacing tags with all the inferred type information.
The whole process is a sequence of pure rewrites, no need to replace anything in your AST destructively. A typical compilation pipeline may take a couple of dozens of rewrites, some of them changing the AST datatype.
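A rough Scala sketch of this idea (invented names, not the answerer's actual code): the parser's tree is rewritten into a tagged tree by a pure pass that threads a counter instead of mutating nodes.

sealed trait Expr                                       // untyped tree from the parser
case class Var(name: String)     extends Expr
case class Add(l: Expr, r: Expr) extends Expr

sealed trait TaggedExpr                                 // same shape, but every node carries a tag
case class TVar(tag: Int, name: String)                 extends TaggedExpr
case class TAdd(tag: Int, l: TaggedExpr, r: TaggedExpr) extends TaggedExpr

// Pure rewrite: returns the tagged node together with the next unused tag
def tagExpr(e: Expr, next: Int): (TaggedExpr, Int) = e match {
  case Var(n) => (TVar(next, n), next + 1)
  case Add(l, r) =>
    val (tl, n1) = tagExpr(l, next + 1)
    val (tr, n2) = tagExpr(r, n1)
    (TAdd(next, tl, tr), n2)
}

// A later pass would emit equations such as numeric(tag) and equals(tag1, tag2),
// solve them, and rewrite the tagged tree once more with the inferred types.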
There are several options to model this. You may use the same kind of nullable data fields as in your imperative case:
data Exp = Var Name (Maybe Type) | ...
parse :: String -> Maybe Exp -- types are Nothings here
typeCheck :: Exp -> Maybe Exp -- turns Nothings into Justs
or even, using a more precise type
data Exp ty = Var Name ty | ...
parse :: String -> Maybe (Exp ())
typeCheck :: Exp () -> Maybe (Exp Type)
I can't speak for how it is supposed to be done, but I did do this in F# for a C# compiler here.
The approach was basically: build an AST from the source, leaving things like type information unconstrained. So AST.fs is basically the AST, with strings for the type names, function names, etc.
As the AST starts to be compiled to (in this case) .NET IL, we end up with more type information (we create the types in the source; let's call these type stubs). This then gives us the information needed to create method stubs (the code may have signatures that include type stubs as well as built-in types). From here we now have enough type information to resolve any of the type names or method signatures in the code.
I store that in the file TypedAST.fs. I do this in a single pass, however the approach may be naive.
Now that we have a fully typed AST, you could then do things like compile it, fully analyze it, or whatever you like with it.
So in answer to the question "Does that mean I need to define two types per AST node, one for the syntax phase, and one for the semantic phase?", I can't say definitively that this is the case, but it is certainly what I did, and it appears to be what MS have done with Roslyn (although they have essentially decorated the original tree with type info, IIRC).
"Are there purely functional programming tricks that help the compiler writer with this problem?"
Given that the ASTs are essentially mirrored in my case, it would be possible to make it generic and transform the tree, but the code may end up (even more) horrendous.
i.e.
type 'ty AST =
    | MethodInvoke of 'ty * Name * 'ty list
    | ...
As when dealing with relational databases, in functional programming it is often a good idea not to put everything into a single data structure.
In particular, there may not be a data structure that is "the AST".
Most probably, there will be data structures that represent parsed expressions. One possible way to deal with type information is to assign a unique identifier (such as an integer) to each node of the tree during parsing, and to have some suitable data structure (such as a hash map) that associates those node ids with types. The job of the type inference pass, then, would just be to create this map.
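A minimal sketch of that variant (invented names): the tree itself stays untyped, and the inference pass just builds a map from node ids to types on the side.

case class NodeId(value: Int)
case class Node(id: NodeId, label: String, children: List[Node])  // the tree stays untyped

// The type-inference pass produces this map instead of mutating the tree
def typeOf(node: Node, types: Map[NodeId, String]): Option[String] =
  types.get(node.id)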

Method naming convention in Scala -- mutable and non-mutable version?

This example is trivial just to show the point.
Let's say I use a matrix library, but it lacks some feature; say doubling every element in a matrix is so crucial for me that I decide to write a method doubleIt. However, I could write two versions of this method:
mutable -- doubleItInPlace
non-mutable -- doubleItByCreatingNewOne
This is a bit lengthy, so one could think of a naming convention, e.g. adding a ! suffix to the mutable version, or prefixing it with the word "mut".
Is there an established naming convention for making such a distinction?
The convention is to name the mutable (in general, side-effecting) version with a verb in imperative form. Additionally, and more importantly, use the empty parameter list () at the end:
def double()
def doubleIt()
The immutable version, i.e. the one producing a new object, should be named with a verb in the passive form. More importantly, do not use the empty parameter list () at the end:
def doubled
def doubledMatrix
Note that naming the non-side-effecting method in the passive form is not always adhered to (e.g. in the standard collections library), but it is a good idea unless it makes the name overly verbose.
Source: Scala styleguide.
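To make the convention concrete, a small sketch (the Matrix class here is made up):

class Matrix(private var cells: Array[Double]) {

  // side-effecting version: imperative verb, empty parameter list
  def double(): Unit =
    cells = cells.map(_ * 2)

  // pure version: passive form, no parameter list, returns a new Matrix
  def doubled: Matrix =
    new Matrix(cells.map(_ * 2))
}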

Purpose of Scala's Symbol? [duplicate]

Possible Duplicate:
What are some example use cases for symbol literals in Scala?
What's the purpose of Symbol and why does it deserve some special literal syntax, e.g. 'FooSymbol?
Symbols are used where you have a closed set of identifiers that you want to be able to compare quickly. When you have two String instances they are not guaranteed to be interned[1], so to compare them you must often check their contents by comparing lengths and even checking character-by-character whether they are the same. With Symbol instances, comparisons are a simple eq check (i.e. == in Java), so they are constant time (i.e. O(1)) to look up.
This sort of structure tends to be used more in dynamic languages (notably Ruby and Lisp code tends to make a lot of use of symbols) since in statically-typed languages one usually wants to restrict the set of items by type.
Having said that, if you have a key/value store where there are a restricted set of keys, where it is going to be unwieldy to use a static typed object, a Map[Symbol, Data]-style structure might well be good for you.
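For example, a made-up key/value store keyed by symbols (using the Scala 2 symbol-literal syntax):

case class Data(value: String)

val store: Map[Symbol, Data] = Map(
  'title  -> Data("Purpose of Scala's Symbol?"),
  'author -> Data("someone")
)

// Equal symbol literals are the same interned instance, so comparisons and
// key lookups boil down to cheap reference checks:
assert('title eq 'title)
println(store('title).value)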
A note about String interning on Java (and hence Scala): Java Strings are interned in some cases anyway; in particular string literals are automatically interned, and you can call the intern() method on a String instance to return an interned copy. Not all Strings are interned, though, which means that the runtime still has to do the full check unless they are the same instance; interning makes comparing two equal interned strings faster, but does not improve the runtime of comparing different strings. Symbols benefit from being guaranteed to be interned, so in this case a single reference equality check is both sufficient to prove equality or inequality.
[1] Interning is a process whereby when you create an object, you check whether an equal one already exists, and use that one if it does. It means that if you have two objects which are equal, they are precisely the same object (i.e. they are reference equal). The downsides to this are that it can be costly to look up which object you need to be using, and allowing objects to be garbage collected can require complex implementation.
Symbols are interned.
The purpose is that Symbols are more efficient than Strings, and Symbols with the same name refer to the same Symbol object instance.
Have a look at this read about Ruby symbols: http://glu.ttono.us/articles/2005/08/19/understanding-ruby-symbols
You can only get the name of a Symbol:
scala> val aSymbol = 'thisIsASymbol
aSymbol: Symbol = 'thisIsASymbol
scala> assert("thisIsASymbol" == aSymbol.name)
It's not very useful in Scala and thus not widely used. In general, you can use a symbol where you'd like to designate an identifier.
For example, the reflection invocation feature which was planned for 2.8.0 used the syntax obj o 'method(arg1, arg2), where o was a method added to Any and Symbol had the method apply(Any*) added to it (both via "pimp my library").
Another example: if you want an easier way to create HTML documents, then instead of using "div" to designate an element you'd write 'div. One can then imagine adding operators to Symbol as syntactic sugar for creating elements.