Kubernetes client-go: watch.Interface vs. cache.NewInformer vs. cache.NewSharedIndexInformer?

I need my Go app to monitor some resources in a Kubernetes cluster and react to their changes. Based on numerous articles and examples, I seem to have found a few ways to do it; however, I'm relatively new to Kubernetes, and they're described in terms much too complex for me, such that I'm still unable to grasp the differences between them — and thus to know which one to use, so that I don't get some unexpected behaviors... Specifically:
watch.Interface.ResultChan() — (acquired through e.g. rest.Request.Watch()) — this already seems to let me react to changes happening to a resource, by providing Added/Modified/Deleted events;
cache.NewInformer() — when I implement a cache.ResourceEventHandler, I can pass it as the last argument in:
cache.NewInformer(
    cache.NewListWatchFromClient(clientset.Batch().RESTClient(), "jobs", ...),
    &batchv1.Job{},
    0,
    myHandler)
— then, the myHandler object will receive OnAdd()/OnUpdate()/OnDelete() calls.
To me, this seems more or less equivalent to the ResultChan I got in (1.) above; one difference is that apparently now I get the "before" state of the resource as a bonus, whereas with ResultChan I would only get its "after" state.
Also, IIUC, this is actually somehow built on the watch.Interface mentioned above (through NewListWatchFromClient) — so I guess it brings some value over it, and/or fixes some (what?) deficiencies of a raw watch.Interface?
cache.NewSharedInformer() and cache.NewSharedIndexInformer() — (uh wow, now those are a mouthful...) I tried to dig through the godocs, but I feel completely overloaded with terminology I don't understand, such that I don't seem to be able to grasp the subtle (?) differences between a "regular" NewInformer vs. NewSharedInformer vs. NewSharedIndexInformer... 😞
Could someone please help me understand the differences between above APIs in the Kubernetes client-go package?

These methods differ in their level of abstraction. If a higher-level abstraction fits your need, you should use it, as many lower-level problems are solved for you.
Informers are a higher level of abstraction than a plain watch and also include listers. In most use cases you should use some kind of Informer instead of the lower-level abstractions. Internally, an Informer consists of a watcher, a lister and an in-memory cache.
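For contrast, here is a minimal, hedged sketch of the raw watch.Interface approach from point 1 of the question. It assumes a recent (context-aware) client-go and an already constructed *kubernetes.Clientset; the comment at the end marks exactly the work an informer does for you.

import (
    "context"
    "fmt"

    batchv1 "k8s.io/api/batch/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// watchJobs reacts to Job events using a raw watch: no cache, no resync, no retry.
func watchJobs(clientset *kubernetes.Clientset) error {
    w, err := clientset.BatchV1().Jobs("default").Watch(context.TODO(), metav1.ListOptions{})
    if err != nil {
        return err
    }
    defer w.Stop()

    for event := range w.ResultChan() {
        job, ok := event.Object.(*batchv1.Job)
        if !ok {
            continue // e.g. a *metav1.Status error event
        }
        fmt.Println(event.Type, job.Name) // ADDED / MODIFIED / DELETED
    }
    // The channel eventually closes (API server timeout, network hiccup, ...);
    // with a raw watch you have to list and re-watch yourself.
    return nil
}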
SharedInformers share the connection to the API server and other resources among your informers.
SharedIndexInformers additionally add an index to the data cache, in case you work with a larger dataset.
It is recommended to use SharedInformers instead of the lower-level abstractions. Instantiate new SharedInformers from the same SharedInformerFactory; there is an example in the Kubernetes Handbook:
// One factory with a 30s resync period; all informers created from it share caches and connections.
informerFactory := informers.NewSharedInformerFactory(clientset, time.Second*30)
podInformer := informerFactory.Core().V1().Pods()
serviceInformer := informerFactory.Core().V1().Services()

podInformer.Informer().AddEventHandler(
    // add your event handling
)
// add event handling for serviceInformer

// Start all requested informers and wait until their caches are filled.
informerFactory.Start(wait.NeverStop)
informerFactory.WaitForCacheSync(wait.NeverStop)
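One way (among others) to fill in the event-handling placeholder above is cache.ResourceEventHandlerFuncs; this assumes the usual imports (fmt, corev1 "k8s.io/api/core/v1", "k8s.io/client-go/tools/cache"):

podInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
    AddFunc: func(obj interface{}) {
        fmt.Println("pod added:", obj.(*corev1.Pod).Name)
    },
    UpdateFunc: func(oldObj, newObj interface{}) {
        fmt.Println("pod updated:", newObj.(*corev1.Pod).Name)
    },
    DeleteFunc: func(obj interface{}) {
        // obj may be a cache.DeletedFinalStateUnknown tombstone rather than a *corev1.Pod.
        fmt.Println("pod deleted")
    },
})

The informer delivers the same Added/Modified/Deleted stream you would get from watch.Interface, but it also hands you the old object on updates and re-lists/re-watches automatically when the underlying watch drops.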


What alternatives are there to dynamic patching (to deal with variables passed at creation time)?

I have heard people describe dynamic patching as a bit of a hack or at risk of breaking in future releases of Pd. This is reasonable enough, but it seems to imply that there are alternatives when building abstractions.
Dynamic patching seems to be useful for both instantiating a variable number of objects and connecting up to a variable number (a number defined at creation time - I personally don't need it to change after the fact, at this stage) of inlets and outlets within an abstraction.
Now I understand that the [clone] object can solve the problem of creating objects. I can see too that looping through send and receive objects would solve much of the connection issues with careful planning but what I do not understand is how objects like [trigger], [route] and [select] can be adjusted or replaced in some way? I fail to see how you would avoid using dynamic patching to, for example, create a [trigger f f] when the creation arg to your abstraction is 2, and a [trigger f f f] when the creation arg is 3. Again, the same with [route] and [select] and similar objects.
EDIT: The original question was perceived as too vague. I later posed a follow-up question in the comments which should really be here instead. As it happens, the answer to the follow-up provided a good answer to the original question, in my opinion. So to summarise and hopefully clarify, I was after a few "tools" to use when building abstractions so that I could limit my use of dynamic patching, if possible. These tools turned out to be:
using send and receive instead of inlets and outlets (although [initbang] can be used for creating inlets and outlets at instantiation).
using [clone]
chaining trigger, route and select objects using send and receive - for example, using [t b b] - [t b b] instead of [t b b b]. This means that the number of arguments in these objects can be defined at creation time with the help of [clone] for example. This is discussed in the Pd mailing list.
using [initbang] as indicated in the answer below.
After having attempted to build a drum machine with presets and an arbitrary number of tracks with my limited knowledge of dynamic patching techniques, I realised that there must be many ways of avoiding the problems I had when doing this, which were several! Of course, some things have to be done with dynamic patching and that's fine. It's just about creating manageable code.
This is really an answer to the "follow-up question" in the comments¹, rather than to the original question (which I consider too broad to be answered):
Is there a way to define an abstraction that has an argument that defines how many outlets the abstraction exposes?
Sure, just use $1 for that.
E.g. [gates 10] could create 10 outlets...
Presumably it could dynamically patch itself, but that doesn't seem like a good idea.
well, if you want an abstraction to have a dynamic API (that is: a variable number of inlets/outlets), then there is no way around dynamic patching.
Is this a good case for building your own external?
depends on what you actually want the external to do.
the iemguts library (disclaimer: of which I am the author) has everything in place to allow you to dynamically patch what you need.
Most importantly, there is [initbang], to create iolets before Pd tries to connect them (if you use [loadbang], the iolets will be created only after Pd has already failed to connect them).
It also includes a [canvasargs] object which allows you to get all the arguments to the abstraction (e.g. which simplifies the task of having the number of outlets equal the number of arguments - like [trigger] or [pack])
if instead you want to wrap the entire functionality of your abstraction into an external, that's of course also possible (and pretty simple in the realm of C).
Also keep in mind that others might have already coded what you need.
¹ please don't abuse the comment field for follow-up questions. either update your original question (if the follow-up is a mere clarification of the original question) or post a new one.

Why do lambdas resolve faster?

The docs recommend registering frequently-used components via lambdas as ...
This can yield an improvement of up to 10x faster Resolve() calls
Now there are obviously a few questions:
why? (EDIT: to clarify: I would understand if the registration time went up, because you now have to use reflection to find the right constructor and such, but why the heck the resolve time?)
In which scenarios does this apply / what aspects of the registered class make this number go up/which make it go down?
What resolve times are we generally talking about anyhow? Like "yeah now it takes 100 instead of 10 cpu cycles" or actually measurable numbers in "normal" use cases (web service with per-request lifetimes)?
As noted in the comments, the short version is that the concrete implementation is going to be faster than the reflection manner of resolving.
Diving deeper, think about the steps involved in each.
Lambda:
Execute the method.
There is no step two.
Reflection:
Enumerate all the constructors for the type to be instantiated. This list could be cached, but is already fairly well cached by the .NET framework.
Of all of the constructors that are available, figure out which one to execute based on the number of constructor parameters available and the types registered in the container. Note the types registered in the container may change based on registration sources, lifetime scope registrations, etc.
Resolve the constructor parameters. If there are reflection-based registrations that make up the constructor parameters, run them through this process recursively.
Invoke the selected constructor using the resolved parameters.
As you can see, there's actually a lot more work than just Activator.CreateInstance in the reflection manner of resolving, which is why it takes longer.
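To make the difference concrete, here is a hedged, language-agnostic sketch of the two resolution paths, written in Go rather than C# (this is not Autofac's actual code, and all names are invented): the factory path is a single call, while the reflective path has to inspect the constructor, resolve its parameters recursively, and invoke it dynamically.

package main

import (
    "fmt"
    "reflect"
)

// Invented example types.
type Logger struct{}
type Service struct{ Log *Logger }

func NewLogger() *Logger            { return &Logger{} }
func NewService(l *Logger) *Service { return &Service{Log: l} }

type container struct {
    factories map[reflect.Type]func(*container) interface{} // "lambda" registrations
    ctors     map[reflect.Type]interface{}                   // constructor funcs for the reflection path
}

// The lambda path: resolving is just calling the registered function.
func (c *container) resolveFactory(t reflect.Type) interface{} {
    return c.factories[t](c)
}

// The reflection path: inspect the constructor, recursively resolve every
// parameter, then invoke the constructor dynamically.
func (c *container) resolveReflect(t reflect.Type) interface{} {
    ctor := reflect.ValueOf(c.ctors[t])
    args := make([]reflect.Value, ctor.Type().NumIn())
    for i := range args {
        args[i] = reflect.ValueOf(c.resolveReflect(ctor.Type().In(i)))
    }
    return ctor.Call(args)[0].Interface()
}

func main() {
    c := &container{
        factories: map[reflect.Type]func(*container) interface{}{
            reflect.TypeOf(&Service{}): func(c *container) interface{} { return NewService(NewLogger()) },
        },
        ctors: map[reflect.Type]interface{}{
            reflect.TypeOf(&Service{}): NewService,
            reflect.TypeOf(&Logger{}):  NewLogger,
        },
    }
    fmt.Println(c.resolveFactory(reflect.TypeOf(&Service{}))) // one direct call
    fmt.Println(c.resolveReflect(reflect.TypeOf(&Service{}))) // reflection at every step
}

This is only a toy, but it shows where the extra per-resolve work in the reflective path comes from.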
But, as also noted in the comments, don't worry about premature optimization. This all happens pretty darn quickly so wait to optimize until you can actually locate a bottleneck using a profiler or some similar tool.

REST: The same resource accessed from two different URLs

I've just started my journey with REST, so please be patient while answering my questions.
I have some doubts about how to cope with some situations. For instance, I read that every resource/entity should have its own unique URL. But how do I deal with examples like this:
There is a list of objects.
Each object is described by some basic information and its state.
Basic info for an object is, for example, its shape and color.
The state of an object contains information like its position in space, velocity, etc.
Both pieces of information (basic + state) are rather big objects (only example properties were introduced above).
Now, we have the following queries to support:
Get all basic info about objects.
Get all object states.
Get basic info of object with ID=2.
Get state of object ID=5.
Get all info of object ID=7.
I tried to solve it in this way:
/rest/objects
/rest/states
/rest/objects/2/basic
/rest/objects/5/state (/rest/states/5)
/rest/objects/7
However, I have some doubts about point 4 - it doesn't look correct: there are two ways to access the same information/resource/entity.
How to deal with it?
Isn't the "basic" just part of the resource? It seems not from your description, but you should consider it. In fact, all of "basic" and "state" might just be part of the resource. It's difficult to be authoritative without more information. For instance, how large is "rather big"?
It also doesn't seem like /state should be an independent endpoint, since a state is an integral part of an object, not something that lives on its own.
I might start with something like:
1. GET /objects
2. GET /objects?expand=state
3. GET /objects/2?expand=basic -- or -- GET /objects/2/basic
4. GET /objects/5?expand=state -- or -- GET /objects/5/state
5. GET /objects/7?expand=basic,state
If you need /state to be top level, then you have two options:
2. GET /states -- or -- GET /objects/states
Neither of those is really ideal. It is acceptable to have multiple paths to get to the same resource, as long as there's only one canonical URI for the resource. The issue is that it's harder for the end user to learn APIs that lean heavily on that style. The second style is also confusing: /objects/XXX is a specific object, unless XXX == states? That's a special case for your end user to have to remember.
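As a small, hedged sketch of the "alias path plus one canonical URI" idea, using plain net/http with Go 1.22+ pattern routing (the handler, the paths and the Content-Location choice are illustrative, not a standard):

package main

import (
    "fmt"
    "net/http"
)

// stateHandler serves the "state" sub-resource of an object.
func stateHandler(w http.ResponseWriter, r *http.Request) {
    id := r.PathValue("id")
    // Advertise the canonical URI even when the alias path was used.
    w.Header().Set("Content-Location", "/objects/"+id+"/state")
    fmt.Fprintf(w, `{"id": %q, "position": [0, 0, 0], "velocity": [0, 0, 0]}`, id)
}

func main() {
    mux := http.NewServeMux()
    // Canonical path and an alias, both backed by the same handler.
    mux.HandleFunc("GET /objects/{id}/state", stateHandler)
    mux.HandleFunc("GET /states/{id}", stateHandler)
    http.ListenAndServe(":8080", mux)
}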
From a caching point of view it makes sense to have "state" and "basic" in separate resources: "state" resources change frequently and are thus non-cacheable, while "basic" changes seldom and can thus be cached for a long time. If this is your goal, then go ahead; it sure makes sense (and is an often-mentioned pattern). But usually, to keep complexity low, both sets of values are combined into one single resource (your "Object" in this case).
Your paths are pretty good, and Eric's version is also fine - here is my suggestion:
/rest/objects (all complete objects info)
/rest/objects/basic (all basic info)
/rest/objects/state (all state info)
/rest/objects/7 (single complete object)
/rest/objects/2/basic (single basic info)
/rest/objects/5/state (single state info)
In the end it all boils down to a matter of personal preference as the exact URL doesn't matter much.
See for instance http://soabits.blogspot.dk/2013/10/url-structures-and-hyper-media-for-web.html for a longer discussion on this issue (disclaimer: that's my blog).
resource/entity
Don't confuse these concepts. REST has nothing to do with how you store your data, and resources map to entities only in CRUD applications.
I read that evenry resource/entity should have its own, unique URL.
You should read the IRI standard. IRIs are for identifying resources. Without them you cannot tell the server what you want to manipulate.
However, I have some doubts about point 4 - it doesn't look correct: there are two ways to access the same information/resource/entity.
How to deal with it?
It is up to you. You can choose either of them or both. There is no restriction about how many identifiers a resource can have or what IRI structure to choose.

Elegant AST model

I am in the process of writing a toy compiler in Scala. The target language itself looks like Scala but is an open field for experiment.
After several large refactorings I can't find a good way to model my abstract syntax tree. I would like to use the facilities of Scala's pattern matching; the problem is that the tree carries changing information (like types and symbols) along the compilation process.
I can see a couple of solutions, none of which I like:
case classes with mutable fields (I believe the Scala compiler does this): the problem is that those fields are not present at each stage of the compilation and thus have to be nulled (or Option'd), and it becomes really heavy to debug/write code. Moreover, if, for example, I find a node with a null type after the typing phase, I have a really hard time finding the cause of the bug.
huge trait/case class hierarchy: something like Node, NodeWithSymbol, NodeWithType, ... Seems like a pain to write AND work with
something completely hand-crafted with extractors
I'm also not sure if it is good practice to go with a fully immutable AST, especially in Scala where there is no implicit sharing (because the compiler is not aware of immutability) and it could hurt performance to copy the tree all the time.
Can you think of an elegant pattern to model my tree using Scala's powerful type system?
TL;DR I prefer to keep the AST immutable and carry things like type information in a separate structure, e.g. a Map, that can be referenced by IDs stored in the AST. But there is no perfect answer.
You're by no means the first to struggle with this question. Let me list some options:
1) Mutable structures that get updated at each phase. All the upsides and downsides you mention.
2) Traits/cake pattern. Feasible, but expensive (there's no sharing) and kinda ugly.
3) A new tree type at each phase. In some ways this is the theoretically cleanest. Each phase can deal only with a structure produced for it by the previous phase. Plus the same approach carries all the way from front end to back end. For instance, you may "desugar" at some point and having a new tree type means that downstream phase(s) don't have to even consider the possibility of node types that are eliminated by desugaring. Also, low level optimizations usually need IRs that are significantly lower level than the original AST. But this is also a lot of code since almost everything has to be recreated at each step. This approach can also be slow since there can be almost no data sharing between phases.
4) Label every node in the AST with an ID and use that ID to reference information in other data structures (maps and vectors and such) that hold information computed for each phase. In many ways this is my favorite. It retains immutability, maximizes sharing and minimizes the "excess" code you have to write. But you still have to deal with the potential for "missing" information that can be tricky to debug. It's also not as fast as the mutable option, though faster than any option that requires producing a new tree at each phase.
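A minimal, hedged sketch of option 4, written in Go only to show the shape of the idea (in a Scala compiler the tree would be case classes and the side table something like a Map[NodeId, Type]); all names here are invented:

package main

import "fmt"

// A deliberately tiny, immutable AST: every node carries a stable ID, and
// per-phase results (here, inferred types) live in a side table keyed by that ID.
type NodeID int

type Expr interface{ ID() NodeID }

type IntLit struct {
    id    NodeID
    Value int
}

type Add struct {
    id          NodeID
    Left, Right Expr
}

func (n IntLit) ID() NodeID { return n.id }
func (n Add) ID() NodeID    { return n.id }

// TypeInfo is produced by the type-checking phase and never stored in the tree.
type TypeInfo map[NodeID]string

func typeCheck(e Expr, types TypeInfo) {
    switch n := e.(type) {
    case IntLit:
        types[n.ID()] = "Int"
    case Add:
        typeCheck(n.Left, types)
        typeCheck(n.Right, types)
        types[n.ID()] = "Int"
    }
}

func main() {
    // (1 + 2), with IDs assigned up front (e.g. by the parser).
    tree := Add{id: 3, Left: IntLit{id: 1, Value: 1}, Right: IntLit{id: 2, Value: 2}}
    types := TypeInfo{}
    typeCheck(tree, types)
    fmt.Println(types) // map[1:Int 2:Int 3:Int]
}

Missing entries in the side table are still possible, which is the debugging downside mentioned above, but the tree itself stays immutable and can be shared freely between phases.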
I recently started writing a toy verifier for a small language, and I am using the Kiama library for the parser, resolver and type checker phases.
Kiama is a Scala library for language processing. It enables convenient analysis and transformation of structured data. The programming styles supported by the library are based on well-known formal language processing paradigms, including attribute grammars, tree rewriting, abstract state machines, and pretty printing.
I'll try to summarise my (fairly limited) experience:
[+] Kiama comes with several examples, and the main contributor usually responds quickly to questions asked on the mailing list
[+] The attribute grammar paradigm allows for a nice separation into "immutable components" of the nodes, e.g., names and subnodes, and "mutable components", e.g., type information
[+] The library comes with a versatile rewriting system which - so far - covered all my use cases
[+] The library, e.g., the pretty printer, make nice examples of DSLs and of various functional patterns/approaches/ideas
[-] The learning curve is definitely steep, even with examples and the mailing list at hand
[-] Implementing the resolving phase in a "purely functional" style (cf. my question) seems tricky, but a hybrid approach (which I haven't tried yet) seems to be possible
[-] The attribute grammar paradigm and the resulting separation of concerns doesn't make it obvious how to document the properties nodes have in the end (cf. my question)
[-] Rumour has it that the attribute grammar paradigm does not yield the fastest implementations
Summarising my summary, I enjoy using Kiama a lot and I strongly recommend that you give it a try, or at least have a look at the examples.
(PS. I am not affiliated with Kiama)

Why use a post compiler?

I am battling to understand why a post-compiler, like PostSharp, should ever be needed.
My understanding is that it just inserts code where attributed in the original code, so why doesn't the developer just do that code writing themselves?
I expect that someone will say it's easier to write, since you can use attributes on methods and then not clutter them up with boilerplate code, but that can be done using DI or reflection and a touch of forethought, without a post-compiler. I know that since I have said reflection, the performance elephant will now enter - but I do not care about the relative performance here, when the absolute performance for most scenarios is trivial (sub-millisecond to millisecond).
Let's try to take an architectural point of view on the issue. Say you are an architect (everyone wants to be an architect ;)
You need to deliver the architecture to your team:
a selected set of libraries, architectural patterns, and design patterns. As a part of your design, you say: "we will implement caching using the following design pattern:"
string key = string.Format("[{0}].MyMethod({1},{2})", this, param1, param2);
T value;
if (!cache.TryGetValue(key, out value))
{
    using (cache.Lock(key))
    {
        if (!cache.TryGetValue(key, out value))
        {
            // Do the real job here and store the value into variable 'value'.
            cache.Add(key, value);
        }
    }
}
This is a correct way to do caching. Developers are going to implement this pattern thousands of times, so you write a nice Word document telling them how you want the pattern to be implemented. Yeah, a Word document. Do you have a better solution? I'm afraid you don't. Classic code generators won't help. Functional programming (delegates)? It works fairly well for some aspects, but not here: you need to pass method parameters to the pattern. So what's left? Describe the pattern in natural language and trust that developers will implement it.
What will happen?
First, some junior developer will look at the code and say, "Hm. Two cache lookups. Kinda useless. One is enough." (That's not a joke -- ask the DNN team about this issue.) And your pattern ceases to be thread-safe.
As an architect, how do you ensure that the pattern is properly applied? Unit testing? Fair enough, but you will hardly detect threading issues this way. Code review? That's maybe the solution.
Now, what if you decide to change the pattern? For instance, you detect a bug in the cache component and decide to use your own. Are you going to edit thousands of methods? It's not just refactoring: what if the new component has different semantics?
What if you decide that a method is not going to be cached any more? How difficult will it be to remove caching code?
The AOP solution (whatever the framework is) has the following advantages over plain code:
It reduces the number of lines of code.
It reduces the coupling between components, so you don't have to change many things when you decide to change the logging component (just update the aspect); therefore it improves the capacity of your source code to cope with new requirements over time.
Because there is less code, the probability of bugs is lower for a given set of features, therefore AOP improves the quality of your code.
So if you put it all together:
Aspects reduce both development costs and maintenance costs of software.
I have a 90 min talk on this topic and you can watch it at http://vimeo.com/2116491.
Again, the architectural advantages of AOP are independent of the framework you choose. The differences between frameworks (also discussed in this video) influence principally the extent to which you can apply AOP to your code, which was not the point of this question.
Suppose you already have a class which is well-designed, well-tested etc. You want to easily add some timing on some of the methods. Yes, you could use dependency injection, create a decorator class which proxies to the original but with timing for each method - but even that class is going to be a mess of repetition...
... or you can add reflection to the mix and use a dynamic proxy of some description, which lets you write the timing code once, but requires you to get that reflection code just right - which isn't as easy as it might be, especially if generics are involved.
... or you can add an attribute to each method that you want timed, write the timing code once, and apply it as a post-compile step.
I know which seems more elegant to me - and more obvious when reading the code. It can be applied even in situations where DI isn't appropriate (and it really isn't appropriate for every single class in a system) and with no other changes elsewhere.
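To make the repetition concrete, here is a hedged sketch of the first option above, a hand-written timing decorator, written in Go rather than C# since the boilerplate problem is language-independent; the interface and types are invented for illustration.

package main

import (
    "fmt"
    "time"
)

// OrderService is a hypothetical interface we want to add timing to.
type OrderService interface {
    PlaceOrder(id string) error
    CancelOrder(id string) error
}

// timedOrderService is the hand-written decorator: every method repeats the
// same timing boilerplate, which is exactly what a proxy generator or a
// post-compiler would emit for you from a single aspect.
type timedOrderService struct {
    inner OrderService
}

func (t timedOrderService) PlaceOrder(id string) error {
    start := time.Now()
    defer func() { fmt.Println("PlaceOrder took", time.Since(start)) }()
    return t.inner.PlaceOrder(id)
}

func (t timedOrderService) CancelOrder(id string) error {
    start := time.Now()
    defer func() { fmt.Println("CancelOrder took", time.Since(start)) }()
    return t.inner.CancelOrder(id)
}

type fakeOrders struct{}

func (fakeOrders) PlaceOrder(id string) error  { return nil }
func (fakeOrders) CancelOrder(id string) error { return nil }

func main() {
    var svc OrderService = timedOrderService{inner: fakeOrders{}}
    svc.PlaceOrder("42")
    svc.CancelOrder("42")
}

Every new method on the interface means copying the same three timing lines again, which is the tedium an attribute plus a post-compile step removes.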
AOP (PostSharp) is for attaching code to all sorts of points in your application, from one location, so you don't have to place it there.
You cannot achieve what PostSharp can do with Reflection.
I personally don't see a big use for it, in a production system, as most things can be done in other, better, ways (logging, etc).
You may like to review the other threads on this matter:
Anyone with Postsharp experience in production?
Other than logging, and transaction management what are some practical applications of AOP?
Aspect Oriented Programming: What do you use PostSharp for?
etc (search)
Aspects take away all the copy-and-paste code and make adding new features faster.
I hate nothing more than, for example, having to write the same piece of code over and over again. Gael has a very nice example regarding INotifyPropertyChanged on his website (www.postsharp.net).
This is exactly what AOP is for. Forget about the technical details, just implement what you are being asked for.
In the long run, I think we all should say goodbye to the way we are writing software now. It's tedious and plainly stupid to write boilerplate code and iterate manually.
The future belongs to a declarative, functional style held together by an object-oriented framework - with the cross-cutting concerns handled by aspects.
I guess the only people who will not get it soon are the guys who are still paid for lines of code.