What is the correct operation of a CANopen inhibit timer? - canopen

I understand that the operation of a CANopen inhibit timer is to ensure a minimum time between successive transmissions of the same message, but the specification does not make it clear what to do if the data changes during the inhibit time (and the transmission is on change-of-state). Should I buffer the data and transmit it when the inhibit timer expires, or discard it and wait for a change after the timer has expired?
My assumption would be, since it is not clearly defined, I can choose whichever approach I want, but I'd appreciate the input of any experienced architects / developers on this.
Thanks.

You're correct that the inhibit time is simply the minimum time between consecutive CAN frames with the same CAN-ID. The standard does not specify the behavior for multiple events during the inhibit time window, because it depends on the situation.
For services like NMT, EMCY and perhaps LSS, you'd want to buffer the messages and send them later. In this case the inhibit time is simply a means to help slow (or badly programmed) devices to handle short bursts of messages. I've seen devices that could only handle 3 CAN frames at once, so it's often necessary, but you would not want them to miss messages.
For event-driven Transmit-PDOs, it depends on what the PDO represents. If you use it to track state, it might make sense to drop events during the inhibit window. They're invalidated by subsequent events anyway. To ensure you always emit the latest state, you can store the most recent event and transmit it once the inhibit time has elapsed, or use the event-timer to ensure you're never too far behind. I've used this strategy in the past for analog inputs where line noise would sometimes cause event bursts.
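As an illustration, here is a minimal sketch of that "keep only the latest value" policy (plain Java, not real CANopen stack code; sendPdo() and the timer scheduling are stand-ins for whatever your stack or RTOS provides):

// Sketch of the "latest state" policy: while the inhibit timer runs,
// only the most recent value is remembered and sent when it expires.
class InhibitedTpdo {
    private final long inhibitTimeMs;
    private boolean inhibitActive = false;
    private byte[] pendingData = null;   // latest value seen while inhibited

    InhibitedTpdo(long inhibitTimeMs) {
        this.inhibitTimeMs = inhibitTimeMs;
    }

    // Called on every change-of-state event from the application.
    synchronized void onEvent(byte[] data) {
        if (!inhibitActive) {
            sendPdo(data);
            startInhibitTimer();          // arm the inhibit window
        } else {
            pendingData = data;           // older pending values are simply overwritten
        }
    }

    // Called by the timer when the inhibit time has elapsed.
    synchronized void onInhibitExpired() {
        inhibitActive = false;
        if (pendingData != null) {
            byte[] data = pendingData;
            pendingData = null;
            sendPdo(data);
            startInhibitTimer();          // transmitting re-arms the inhibit window
        }
    }

    private void startInhibitTimer() {
        inhibitActive = true;
        // schedule onInhibitExpired() to run after inhibitTimeMs (stack/RTOS specific)
    }

    private void sendPdo(byte[] data) {
        // hand the frame to the CAN driver (stack specific)
    }
}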
If you use PDOs to track events (or state changes), you'd be better off buffering them so no events get lost. However, this can introduce potentially unbounded delays if the event period is shorter than the inhibit time.
For the products we're working on at Lely (dairy farm robots), we actually prefer to use SYNC-driven PDOs instead. It results in a much more predictable CAN bus load. And we don't have to track state at the receiver side because we receive a full update on every SYNC. However, the receiver is always one SYNC period behind the transmitter, so this may not be appropriate for your use case.

Related

Is there a way to specify infinite allowed lateness in Apache Beam?

I'm using fixed windows to batch data by event time in order to send it to an external API efficiently (batches of 60 seconds). Accumulation mode is set to DISCARDING because it doesn't matter if late data is sent to the external API without the previous data.
Is it possible to specify an infinite allowed lateness, so late data is never discarded?
It is definitely possible: you can set the allowed lateness to a very high Duration (for instance, Duration.standardDays(36500)). On the other hand, doing so would result in your state growing indefinitely, which might not be what you want. Every window ever seen will have at least one timer, the GC timer, set for the end of the window + allowed lateness. Every timer has to be kept in state, and therefore the size of your state will grow over time.
If you do not need batching based on event-time, it might be a better option to use GroupIntoBatches, which should not suffer from this problem (you don't need to set allowed lateness and the size of your state will not grow).
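For reference, here is a rough sketch of both options, assuming the input is a PCollection<KV<String, String>> (the element types, the 60-second window, and the batch size of 100 are placeholders, not values from the question):

import org.apache.beam.sdk.transforms.GroupIntoBatches;
import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

class LatenessExamples {

    // "Infinite" lateness via a very large Duration; note that every window keeps
    // a GC timer in state until end-of-window + allowed lateness, so state grows.
    static PCollection<KV<String, String>> hugeAllowedLateness(
            PCollection<KV<String, String>> input) {
        return input.apply(
            Window.<KV<String, String>>into(FixedWindows.of(Duration.standardSeconds(60)))
                .triggering(AfterWatermark.pastEndOfWindow())   // the default trigger, stated explicitly
                .withAllowedLateness(Duration.standardDays(36500))
                .discardingFiredPanes());
    }

    // Batching without event-time windows: GroupIntoBatches emits a batch once a key
    // has collected batchSize elements, and needs no allowed-lateness setting.
    static PCollection<KV<String, Iterable<String>>> batchWithoutWindows(
            PCollection<KV<String, String>> input) {
        return input.apply(GroupIntoBatches.<String, String>ofSize(100));
    }
}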

Delay fixed window from triggering for several minutes

Using Fixed Windows in Apache Beam. The watermark is set by the event time.
Some data may arrive out of order, causing the window to close before all of the data for it has been seen.
How can a trigger be defined in Java to occur say 2 minutes after the last data was seen?
It's not entirely clear what behavior you expect. One question is what you expect to happen if more data arrives within the two minutes: should the two-minute interval restart or not, and should the data be re-emitted or not?
Looks like the trigger you are trying to describe is something along these lines:
wait until the watermark passed the end of window, in event time;
wait for additional 2 minutes in processing time;
emit the data;
If in step 2 it was event time, i.e. you wanted to re-emit the window if a late element arrives that fits within window + 2min, then you could use withAllowedLateness(). Though it sounds different from what you want, because it can keep re-emitting the window contents every time a matching late element arrives.
With processing time in step 2 this is not possible in general with the basic triggers that are available in Beam. You can probably achieve the behavior you want if you manually manage state and timers in your own ParDo, e.g. you can watch the incoming elements, keep track of them in state, and then emit what you want when the timer fires. This can become very complicated and might still not be flexible enough for your specific use case.
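For example, a stateful DoFn along those lines might look roughly like this (a sketch only, not a drop-in solution: the key/value types and the two-minute constant are assumptions, and a real pipeline would still need a coder for the output):

import java.util.ArrayList;
import java.util.List;
import org.apache.beam.sdk.state.BagState;
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.state.TimeDomain;
import org.apache.beam.sdk.state.Timer;
import org.apache.beam.sdk.state.TimerSpec;
import org.apache.beam.sdk.state.TimerSpecs;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;
import org.joda.time.Duration;

// Buffers elements per key and emits them once no new element has arrived
// for two minutes of processing time.
class EmitAfterQuietPeriodFn extends DoFn<KV<String, String>, List<String>> {

    @StateId("buffer")
    private final StateSpec<BagState<String>> bufferSpec = StateSpecs.bag();

    @TimerId("quietPeriod")
    private final TimerSpec quietPeriodSpec = TimerSpecs.timer(TimeDomain.PROCESSING_TIME);

    @ProcessElement
    public void processElement(
            @Element KV<String, String> element,
            @StateId("buffer") BagState<String> buffer,
            @TimerId("quietPeriod") Timer quietPeriod) {
        buffer.add(element.getValue());
        // Re-arming the timer on every element pushes the deadline back, so it
        // only fires after two minutes with no new input for this key.
        quietPeriod.offset(Duration.standardMinutes(2)).setRelative();
    }

    @OnTimer("quietPeriod")
    public void onQuietPeriod(
            @StateId("buffer") BagState<String> buffer,
            OutputReceiver<List<String>> out) {
        List<String> batch = new ArrayList<>();
        buffer.read().forEach(batch::add);
        out.output(batch);
        buffer.clear();
    }
}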
One of the major problems is that there is no good way to define processing time triggers in Beam in general. It would be complicated to define a general mechanism of working with timers in this manner. For example, when you want to express "wait for 2 minutes", the framework needs to understand in relation to what these two minutes are, when to start the timer, so you need a mechanism to express that as well. And with composition, continuation and other complications this doesn't seem easy to reason about. So it's not in the framework in this general form.
In order to implement only the "wait for 2 minutes after the last element was seen in the window" part, the framework would have to watch for that condition and set the timer itself. Technically it is possible to do something like this, but it doesn't seem like anyone has done it yet.
There seems to be only one meaningful processing-time trigger available in Beam (AfterProcessingTime), but it's not generic enough and doesn't do what you want. You can look at composite triggers like AfterFirst or AfterAll, but they likely won't help you without a better general processing-time trigger.
I decided against using Beam and implemented the solution in Kafka Streams.
I basically grouped by key, then used fixed windows and then aggregated the result.
The "grace" on the window allows data to arrive late.
KGroupedStream<Long, OxyStreamItem> grouped = input.groupByKey();
TimeWindowedKStream<Long, OxyStreamItem> windowed =
    grouped.windowedBy(
        TimeWindows.of(WIN_SIZE)
            .advanceBy(WIN_SIZE)
            // grace() lets records arrive up to 5 seconds late before the window is finalized
            .grace(Duration.ofSeconds(5L)));
return windowed
    .aggregate(
        makeInitializer(),
        makeAggregator(),
        Materialized
            .<Long, Aggregate, WindowStore<Bytes, byte[]>>as("tmp")
            .withValueSerde(new AggregateSerde()))
    // suppress() holds back intermediate results and emits only once the window closes
    .suppress(
        Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
    .toStream()
    .map(calculateAvg());

Why is response time important in CPU scheduling?

I'm looking for an example of a job for which response time is important.
One definition of response time is:
The time taken in an interactive program from the issuance of a command to the commencement of a response to that command.
I've read that response time is important for interactivity, but I can't understand why. If the job isn't fully completed, what output could be produced that would be of interest to a user?
Wouldn't the user only care about how soon a job finishes, as that's the first time any output is produced?
For example, consider these two possible schedulings of two jobs:
Case 1: |---B---|---A---|
Case 2: |-A-|---B---|-A-|
Suppose that job A and B are issued at the same time, A being a command typed in by the user and B being some background process.
The response time for job A, as I understand it, would be shorter in case 2. As job A finishes (and produces output) at the same time in the two cases, I don't understand how the user benefits from (or even notices) the better response time in case 2.
When writing an operating system, one has to take into consideration what the intended audience will be. In some cases it matters most to finish jobs as quickly as possible (supercomputer systems), in some cases it matters most to be as responsive as possible (regular desktop systems), and in some cases it matters most to be as predictable as possible (real-time systems).
To finish jobs as fast as possible, tasks should be interrupted as rarely as possible (so long intervals between task switches are the best option). Here response time doesn't really matter much. It should be noted that a task switch usually takes some time (typically thousands of CPU cycles) because the state of the old task (including registers and paging structures) has to be saved to memory and the state of the new task (including registers and paging structures) restored from memory. It also causes cache and TLB misses, since the cached information usually doesn't belong to the incoming process.
To be as responsive as possible, tasks should be interrupted as often as possible so the user doesn't experience lag. This is where response time is important. Note, however, that on interrupt-driven architectures (like x86) an interrupt from the keyboard or the mouse automatically pauses execution of the current task and calls the interrupt handler, which processes the input and sends it to the appropriate program.
To be as predictable as possible, input should be processed neither too fast nor too slow. This means that response time is constrained from both sides, making it much more important than in "as responsive as possible" designs. A missed timing prediction can even be a fatal failure in mission-critical systems.
In a nutshell, importance of response time varies from design to design and can range from nearly unimportant to critical.
I think I have an answer to my own question. The problem was that I was only thinking about simple processes like ls that, once issued, run for some amount of time and then, when finished, deliver their first and only output.
However, suppose job A in the example from the question is a program with multiple print statements. Output will in that case be produced before the process is complete (and some of the printouts may well occur during the first scheduled burst). For interactivity, it thus makes sense to begin running such a process as soon as possible.

How can running event handlers on production be done?

In production environments event numbers scale massively. In cases of emergency, how can you re-run all the handlers when it can take days if there are too many of them?
It depends on which sort of emergency you are describing.
If the nature of your emergency is that your event handlers have fallen massively behind the writers (e.g. your message consumers blocked, and you now have 48 hours of backlog waiting for you) -- not much. If your consumer is parallelizable, you may be able to speed things up by using a data structure like the LMAX Disruptor to support parallel recovery.
(An analogous case: you decide to introduce a new read model, which requires processing a huge backlog of events to achieve the correct state. There isn't any "answer" except chewing through them all. In some cases you may be able to create an approximation based on some manageable number of events while waiting for the real answer to complete, but there's no shortcut to processing all the events.)
On the other hand, in cases where the history is large but the backlog is manageable (i.e. the write model wasn't producing new events), you can usually avoid needing a full replay.
In the write model: most event-sourced solutions leverage an event store that supports multiple event streams - each aggregate in the write model has a dedicated stream. Massive event numbers usually mean massive numbers of manageable streams. Where that's true, you can just leave the write model alone -- load the entire history on demand.
In cases where that assumption doesn't hold -- a part of the write model that has an extremely large stream, or pieces of the read model that compose events from multiple streams -- the usual answer is snapshotting.
Which is to say, in the healthy system the handlers persist their state on some schedule, and include in the metadata an identifier that tracks where in the history that snapshot was taken.
To recover, you reload the snapshot, and the identifier. You then start the replay from that point (this assumes you've got an event store that allows you to start the replay from an arbitrary point in the history).
So managing recovery time is simply a matter of tuning the snapshotting interval so that you are never more than your recovery SLA behind "latest". The creation of the snapshots can happen in a completely separate process. (In truth, your persistent snapshot store looks a lot like a persisted read model.)
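As a rough sketch of that recovery path (every interface below is a placeholder for whatever your event store and projection framework actually provide; none of this is a real API):

// Recovery = restore the latest snapshot, then replay only the events
// recorded after the position stored in the snapshot's metadata.
class ReadModelRecovery {

    interface Snapshot {
        long position();   // where in the history the snapshot was taken
        byte[] state();    // the persisted handler state
    }

    interface SnapshotStore {
        Snapshot loadLatest(String handlerId);
    }

    interface Event {
        long position();
    }

    interface EventStore {
        // Replay every event recorded after the given position, in order.
        Iterable<Event> readFrom(long position);
    }

    interface ReadModel {
        void restore(byte[] state);   // hydrate from the snapshot
        void apply(Event event);      // normal projection logic
    }

    void recover(String handlerId, SnapshotStore snapshots, EventStore events, ReadModel model) {
        Snapshot snapshot = snapshots.loadLatest(handlerId);
        model.restore(snapshot.state());
        // Only events after the snapshot position need replaying, so recovery time
        // is bounded by the snapshotting interval rather than by the full history.
        for (Event event : events.readFrom(snapshot.position())) {
            model.apply(event);
        }
    }
}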

NoSQL as storage for publish-subscribe/multi-reader queue?

Looking for a storage solution for the following problem, preferably with some NoSQL-like speed and scalability:
Events. Lots of them, little data per event. This is what we need to store.
It is not necessary to keep the exact order in which the events arrive.
It would be nice not to store multiple copies of each event (as in separate storage for each observer).
Observers. A few of them (< 50). They need to read the events:
At their own pace (pull model)
Preferably with a "get me the next chunk of unread events" API
Each observer needs to read every event (eventually)
No guarantees on how often they will pull the changes. It might be necessary to store lots of events before they are read.
In an RDBMS you'd probably just number the events sequentially and remember the "last read no" for every observer. Is it possible to implement something similar while trading some of the ACID for speed & scalability?
So far Redis with its lists looks good - anything better I should look at?
I think Redis lists are a good choice. I'd go with a list for each observer though - that way you have O(1) read and write with RPUSH/LPOP, and events automatically disappear from the system when all observers have received them.
You can reduce the storage required for each observer by just storing an event id in each list, though then you will need to keep a counter for each event to determine when it can be removed from the system.
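Here is a minimal sketch of the list-per-observer variant using the Jedis client (the key naming scheme and the surrounding class are made up for illustration):

import java.util.ArrayList;
import java.util.List;
import redis.clients.jedis.Jedis;

// Fan-out to one Redis list per observer: RPUSH on publish, LPOP on read.
// An event's copy disappears as soon as its observer has popped it.
public class EventFanOut {
    private final Jedis jedis;
    private final List<String> observers;

    public EventFanOut(Jedis jedis, List<String> observers) {
        this.jedis = jedis;
        this.observers = observers;
    }

    // Publish: append the event to every observer's list (O(1) per observer).
    public void publish(String event) {
        for (String observer : observers) {
            jedis.rpush("events:" + observer, event);
        }
    }

    // Pull model: an observer drains its own list at its own pace.
    public List<String> nextChunk(String observer, int maxEvents) {
        List<String> chunk = new ArrayList<>();
        for (int i = 0; i < maxEvents; i++) {
            String event = jedis.lpop("events:" + observer);
            if (event == null) {
                break;   // no unread events left
            }
            chunk.add(event);
        }
        return chunk;
    }
}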
To implement with a single list, set up a counter that is incremented every time an event is added to the head of list. Also set up a counter for each client indicating how many events they have received. The difference between those is the number of items you need to get from the list.
The disadvantage of this approach is that new items can be added to the list after you check the counters. You can get around this by counting from the tail of the list, but that is O(N) rather than O(1). You can reduce N by trimming received events from the list and maintaining a counter for the tail position as well - how well that works will depend on how many events can accumulate while an observer is offline.
You could take a look at how it's done in Tarantool, with a Lua procedure to keep a ring buffer for events:
https://github.com/mailru/tntlua/blob/master/notifications.lua