How to Deal with Inaccurate MSMQ Performance Counters - msmq

I have a private queue. I am using the following code to efficiently return the number of messages in the queue.
var queuePath = _queue.Path; //_queue is a System.Messaging.MessageQueue object
if (queuePath.StartsWith("."))
{
queuePath = System.Environment.GetEnvironmentVariable("COMPUTERNAME") + queuePath.Substring(1);
}
var queueCounter = new PerformanceCounter("MSMQ Queue","Messages in Queue",queuePath.ToLower(),true);
This code usually works but it appears to be sometimes wrong (i.e. the value lags the actual length). How can I force an MSMQ Queue counter to be up-to-date?
My test data inserts two items and then checks the length. If I step through it the value is correct but if I run it at full speed, the count comes back as zero.
Is there a better way than a Performance Counter for this purpose?

Kafka Streams / How to get the partition an iterator is iterating over?

In my Kafka Streams application, I have a task that sets up a punctuator scheduled by wall-clock time. The punctuator iterates over the entries of a store and does something with them, like this:
var store = context().getStateStore("MyStore");
var iter = store.all();
while (iter.hasNext()) {
var entry = iter.next();
// ... do something with the entry
}
// Print a summary (now): N entries processed
// Print a summary (wish): N entries processed in partition P
Since I'm working with a single store here (which might be partitioned), I assume that every single execution of the punctuator is bound to a single partition of that store.
Is it possible to find out which partition the punctuator operates on? The Javadoc for ProcessorContext.partition() states that this method returns -1 within punctuators.
I've read Kafka Streams: Punctuate vs Process and the answers there. I can understand that a task is, in general, not tied to a particular partition. But an iterator should be tied IMO.
How can I find out the partition?
Or is my assumption that a particular instance of a store iterator is tied to a partition wrong?
What I need it for: I'd like to include the partition number in some log messages. For now, I have several nearly identical log messages stating that the punctuator does this and that. In order to make those messages "unique" I'd like to include the partition number into them.
Just to post here the answer that was provided in https://issues.apache.org/jira/browse/KAFKA-12328:
I just used context.taskId(). It contains the partition number at the end of the value, after the underscore. This was sufficient for me.
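As a minimal sketch of that approach: the helper below extracts the number after the last underscore of the task id's string form. The class and method names are hypothetical, and the literal "0_3" only stands in for what context.taskId().toString() would return in a real punctuator.

```java
// Hypothetical helper; the class and method names are illustrative only.
final class TaskIdPartition {
    /** Returns the number after the last underscore, or -1 if there is no numeric suffix. */
    static int partitionOf(String taskId) {
        int idx = taskId.lastIndexOf('_');
        if (idx < 0 || idx == taskId.length() - 1) {
            return -1;
        }
        try {
            return Integer.parseInt(taskId.substring(idx + 1));
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        // In a punctuator this string would come from context.taskId().toString().
        String taskId = "0_3";
        System.out.println("punctuator running for partition " + partitionOf(taskId));
    }
}
```

Returning -1 for an unexpected format keeps the log statement from throwing if the task id ever looks different from what is assumed here.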

Partial batch sizes

I'm trying to simulate pallet behavior by using batch and move to. This works fine except towards the end where the number of elements left is smaller than the batch size, and these never get picked up. Any way out of this situation?
Have tried messing with custom queues, pickup/dropoff pairs.
To elaborate, the batch object has a queue size of 15. However once the entire set has been processed a number of elements less than 15 remain which don't get picked up by the subsequent moveTo block. I need to send the agents to the subsequent block once the queue size falls below 15.
You can dynamically change the batch size of your Batch object towards "the end" (whatever you mean by that :-) ). You need to figure out when to change the batch size (as this depends on your model). But once it is time to adjust, you can call myBatchItem.set_batchSize(1) and it will now batch things together individually.
However, a better model design might be to have a cool-down period before the model end, i.e. stop taking model measurements before your batch objects run out of agents to batch.
You need to know somehow which element is the last one, for example by using a boolean variable called isLast in your agent that is true only for the last agent.
Then, in the batch, you have to change the batch size programmatically, for example in the On enter action of your batch:
if(agent.isLast)
self.set_batchSize(self.size());
To determine if the "end" or any lack of supply is reached, I suggest a timeout. I would save a timestamp in a variable lastBatchDate in the OnExit code of the batch block:
lastBatchDate = date();
A cyclically activated event checkForLeftovers will check every once in a while whether there are objects waiting to be batched and whether the timeout (here: 10 minutes) has been reached. In this case, the batch size will be reduced to exactly the number of waiting objects, so that they can continue in a smaller batch:
if( lastBatchDate!=null //prevent a NullPointerException when the date is not yet set
&& ((date().getTime()-lastBatchDate.getTime())/1000)>600 //more than 600 seconds since last batch
&& batch.size()>0 //something is waiting
&& batch.size()<BATCH_SIZE //fewer objects than a normal batch are waiting
){
batch.set_batchSize(batch.size()); // set the batch size to exactly the amount waiting
}
else{
batch.set_batchSize(BATCH_SIZE); // reset the batch size to the default value BATCH_SIZE
}
The model will look something like this:
However, as Benjamin already noted, you should be careful if this is what you really need to model. Take care for example on these aspects:
Is the timeout long enough to not accidentally push smaller batches during normal operations?
Is the timeout short enough to have any effect?
Is it ok to have a batch of a smaller size downstream in your process?
etc.
You might just want to make sure upstream that the number of objects reaching the batching station always fills full batches, or you might just stop your simulation before the line "runs dry".
You can see the model and download the source code here.

Nearby API - queuing of payloads

I have a use case where I am sending control data from an Android Things device to an Android mobile phone. It is a periodic voltage reading taken 10 times a second, so every 100 milliseconds.
But the Nearby API documentation says the following about sending payloads:
Senders use the sendPayload() method to send a Payload. This method can be invoked multiple times, but since we guarantee in-order delivery, the second Payload onwards will be queued for sending until the first Payload is done.
What happens in reality in my case is that, because the transmission speed varies, the readings arrive on the phone with an ever-increasing delay; the queue simply gets bigger and bigger.
Any ideas how to overcome this ? Basically I don't need in-order delivery.
My first idea is to have some sort of confirmation of payload delivery and only after the confirmation of receipt the second payload should be sent to recipient.
Thanks for ideas
UPDATE:
The STREAM type of payload is a perfect solution. If the InputStream transfers more than one set of readings (a reading includes voltage, maxvoltage etc., altogether 32 bytes of data), then I use the skip method to skip to the very last readings.
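A minimal sketch of that skip-to-the-latest idea, assuming fixed 32-byte readings as described above. The class and method names are hypothetical, and it relies on InputStream.available(), which is only an estimate of how many bytes are buffered, so treat it as an illustration rather than a robust receiver.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper; names are illustrative only.
final class LatestReading {
    static final int READING_SIZE = 32; // one set of readings, per the post

    /** Skips stale buffered readings and returns the newest complete one, or null if none is buffered. */
    static byte[] readLatest(InputStream in) throws IOException {
        int complete = in.available() / READING_SIZE; // number of whole readings buffered
        if (complete == 0) {
            return null;
        }
        long toSkip = (long) (complete - 1) * READING_SIZE;
        while (toSkip > 0) {
            long skipped = in.skip(toSkip);
            if (skipped <= 0) {
                throw new IOException("could not skip stale readings");
            }
            toSkip -= skipped;
        }
        byte[] reading = new byte[READING_SIZE];
        new DataInputStream(in).readFully(reading); // read exactly one reading
        return reading;
    }

    public static void main(String[] args) throws IOException {
        // Three buffered readings, filled with 1s, 2s and 3s respectively.
        byte[] buffered = new byte[3 * READING_SIZE];
        for (int i = 0; i < buffered.length; i++) {
            buffered[i] = (byte) (i / READING_SIZE + 1);
        }
        byte[] latest = readLatest(new ByteArrayInputStream(buffered));
        System.out.println("first byte of newest reading: " + latest[0]); // prints 3
    }
}
```

Skipping whole multiples of the reading size keeps the stream aligned on reading boundaries, so a partially received reading is left in the stream for the next call.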
For your use-case, I would recommend using a STREAM Payload, and then you can keep streaming your control data over that single Payload -- this is exactly one of the use-cases for which we created STREAM. :)
Yup, your first idea sounds like the right thing to do. There's no way to turn off in-order delivery of payloads inside Nearby Connections, so you'll have to handle dropping payloads yourself.
I would build a class that caches the 'most recent voltage'. As you get new readings, this value will overwrite itself each time.
private volatile Long mostRecentVoltage;
public void updateVoltage(long voltage) {
if (mostRecentVoltage != null) {
Log.d(TAG, String.format("Dropping voltage %d due to poor network latency", mostRecentVoltage));
}
mostRecentVoltage = voltage;
}
Then add another piece of logic that'll grab the cached value every time the previous payload was successfully sent.
@Nullable
public Long getVoltage() {
try {
return mostRecentVoltage;
} finally {
mostRecentVoltage = null;
}
}
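One caveat with the getter above: it reads the volatile field and clears it in two separate steps, so a reading written by another thread in between can be lost without ever being sent or logged. A possible variant, sketched here with the standard AtomicReference class (the LatestVoltage class and its method names are hypothetical), swaps the value in a single atomic step.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical variant of the cache; names are illustrative only.
final class LatestVoltage {
    private final AtomicReference<Long> latest = new AtomicReference<>();

    /** Stores a new reading; returns the unsent reading it replaced, or null if there was none. */
    Long update(long voltage) {
        return latest.getAndSet(voltage);
    }

    /** Atomically takes the cached reading and leaves the cache empty. */
    Long take() {
        return latest.getAndSet(null);
    }

    public static void main(String[] args) {
        LatestVoltage cache = new LatestVoltage();
        cache.update(11L);
        Long dropped = cache.update(12L); // 11 was never sent, so it is dropped
        System.out.println("dropped=" + dropped + " sent=" + cache.take() + " next=" + cache.take());
        // prints: dropped=11 sent=12 next=null
    }
}
```

The sender would call take() each time the previous payload has been delivered, and the value returned by update() can be logged as the dropped reading.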

KafkaConsumer: consume all available messages once, and exit

I want to create a KafkaConsumer, with Kafka 2.0.0, that consumes all available messages once, and exits immediately. This differs slightly from the standard console consumer utility because that utility waits for a specified timeout for new messages, and only exits once that timeout has expired.
This seemingly simple task seems to be surprisingly hard using the KafkaConsumer. My gut reaction was the following pseudo-code:
consumer.assign(all partitions)
consumer.seekToBeginning(all partitions)
do
result = consumer.poll(Duration.ofMillis(0))
// onResult(result)
while result is not empty
However this does not work, as poll always returns an empty collection even though there are many messages on the topic.
Researching this, it looks like one reason may have been that assign/subscribe are considered lazy, and partitions are not assigned until a poll loop has completed (although I cannot find any support for this assertion in the docs). However, the following pseudo-code also returns an empty collection on each call to poll:
consumer.assign(all partitions)
consumer.seekToBeginning(all partitions)
// returns nothing
result = consumer.poll(Duration.ofMillis(0))
// returns nothing
result = consumer.poll(Duration.ofMillis(0))
// returns nothing
result = consumer.poll(Duration.ofMillis(0))
// deprecated poll also returns nothing
result = consumer.poll(0)
// returns nothing
result = consumer.poll(0)
// returns nothing
result = consumer.poll(0)
...
So clearly "laziness" is not the issue.
The javadoc states:
This method returns immediately if there are records available.
which seems to imply that the first pseudo-code above should work. However, it does not.
The only thing that seems to work is to specify a non-zero timeout on poll, and not just any non-zero value, for example 1 doesn't work. Which indicates that there is some non-deterministic behavior happening inside poll which assumes that poll will always be carried out in an infinite loop and it doesn't matter that it occasionally returns an empty collection despite the availability of messages. The code seems to confirm this with various calls to check if the timeout is expired sprinkled throughout the poll implementation and its callees.
So with the naive approach, a longer timeout is obviously required (and ideally Long.MAX_VALUE to avoid the non-deterministic behavior of a shorter poll interval), but unfortunately this will cause the consumer to block on the last poll, which isn't desired in this situation. With the naive approach, we now have a trade-off between how deterministic we want the behavior to be, vs how long we have to wait for no reason on the last poll. How do we avoid this?
The only way to accomplish this seems to be with some additional logic that self-manages the offsets. Here is the pseudo-code:
consumer.assign(all partitions)
consumer.seekToBeginning(all partitions)
// record the current ending offsets and poll until we get there
endOffsets = consumer.endOffsets(all partitions)
do
result = consumer.poll(NONTRIVIAL_TIMEOUT)
// onResult(result)
while given any partition p, consumer.position(p) < endOffsets[p]
and an implementation in Kotlin:
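import java.time.Duration
import org.apache.kafka.common.TopicPartition
// consumer, topic and onResult are assumed to be defined elsewhere
val topicPartitions = consumer.partitionsFor(topic).map { TopicPartition(it.topic(), it.partition()) }
consumer.assign(topicPartitions)
consumer.seekToBeginning(consumer.assignment())
// Snapshot the end offsets once; the loop stops when every partition has reached them
val endOffsets = consumer.endOffsets(consumer.assignment())
fun pendingMessages() = endOffsets.any { consumer.position(it.key) < it.value }
do {
    val records = consumer.poll(Duration.ofMillis(1000))
    onResult(records)
} while (pendingMessages())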
The poll duration can now be set to a reasonable value (such as 1s) without concern of missing messages, since the loop continues until the consumer reaches the end offsets identified at the beginning of the loop.
There is one other corner case this deals with correctly: if the end offsets have changed, but there are actually no messages between the current offset and the end offset, then the poll will block and time out. So it is important that the timeout not be set too low (otherwise the consumer will time out before retrieving messages that are available) and it must also not be set too high (otherwise the consumer will take too long to time out when retrieving messages that are not available). The latter situation can happen if those messages were deleted, or if the topic was deleted and recreated.
If no one is producing concurrently, you can also use endOffsets to get the position of the last message, and consume until you reach it.
So, in pseudocode:
long currentOffset = -1
// endOffsets reports the offset of the NEXT record to be written, i.e. last offset + 1
long lastOffset = consumer.endOffsets([partition])[partition] - 1
while (currentOffset < lastOffset) {
records = consumer.poll(NONTRIVIAL_TIMEOUT) // discussed in your answer
if records is not empty: currentOffset = records.offsets().max()
}
This way we avoid the final hang on a non-zero timeout, as we are always sure there is something to receive.
You might need to add safeguards if your consumer's position is equal to end offset (as you'd get no messages there).
Also, you might want to set max.poll.records to 1, so you don't consume messages positioned after end offset, if someone is producing in parallel.

Akka synchronizing timestamped messages from several actors

Imagine the following architecture. There is an actor in Akka that receives push messages via a websocket. The messages carry a timestamp, and the interval between timestamps is 1 minute, although messages with the same timestamp can arrive multiple times via the websocket. These messages are then broadcast to, for example, three further actors (ma). They calculate metrics and push the messages on to a single actor (c).
For ma I defined a TimeSeriesBuffer that allows writing to the buffer only if entities have consecutive timestamps. After a successful push to the buffer, the ma actors emit metrics that go to c. c can only change its state when it has all three metrics. Therefore I defined a trait Synchronizable and then a SynchronizableTimeSeriesBuffer with a "master-slave" architecture.
On each push to any buffer, a check is triggered to see whether the buffers of all three SynchronizableTimeSeriesBuffer instances contain new elements with the same timestamp that can be emitted to c as a single message.
So here are the questions:
1) Is it too complicated of a solution?
2) Is there a better way to do it in terms of scala and akka?
3) Why is it neither fast nor parallel when the messages, instead of being received "one by one", are loaded from the db in a big batch and fed to the system in order to backtest the metrics? (One of the buffers fills much faster than the others, while another stays at length 0.) I assume it has something to do with Akka's dispatcher/mailbox settings.
I created a gist with the relevant code:
https://gist.github.com/ifif14/18b5f85cd638af7023462227cd595a2f
I would much appreciate the community's help in solving this nontrivial case.
Thanks in advance
Igor
Simplification
It seems like much of your architecture is designed to ensure that your messages are sequentially ordered in time. Why not just add a simple Actor at the beginning that filters out duplicated messages? Then the rest of your system could be relatively simple.
As an example; given a message with timestamp
type Payload = ???
case class Message(timestamp : Long, payload : Payload)
You can write the filter Actor:
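import akka.actor.{Actor, ActorRef}
class FilterActor(ma : Iterable[ActorRef]) extends Actor {
  var currentMaxTime = 0L
  override def receive = {
    case m : Message if m.timestamp > currentMaxTime =>
      // remember the newest timestamp so later duplicates are dropped
      currentMaxTime = m.timestamp
      ma foreach (_ ! m)
    case _ => // duplicate or older message: ignore
  }
}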
Now you can eliminate all of the "TimeSeriesBuffer" and "Synchronizable" logic since you know that ma, and c, will only receive time-ordered messages.
Batch Processing
The likely reason why batch processing is not so concurrent is because the mailbox for your ma Actor is being filled up by the database query and whatever processing it is doing is slower than the processing for c. Therefore ma's mailbox continues to accumulate messages while c's mailbox remains relatively empty.
Thanks so much for your answer. The part with cutting off is what I also implemented in Synchronizable Trait.
//clean up slaves. if their queue is behind masters latest element
master_last_timestamp match {
case Some(ts) => {
slaves.foreach { s =>
while ( s.queue.length > 0 && s.getElementTimestamp(s.queue.front) < ts ) {
s.dequeue()
}
// val els = s.dequeueAll { queue_el => s.getElementTimestamp(queue_el) < ts }
}
}
case _ => ()
}
The reason why I started to implement the buffer is that I feel I will be using it a lot in the system, and I don't want to write this part for each actor I will be using. It seems easier to have a blueprint that does it.
But a more important reason is that one buffer is being filled either much more slowly than the other two or not at all, even though they are all filled by the same kind of actor (just different instances, and the computation time should be pretty much the same). Only after the other two actors have emitted all the messages that were "passed" from the database does the third one start receiving them. It feels to me that this one actor is just not getting processor time, so I think a dispatcher setting may affect this. Are you familiar with this?
Also, I would expect the dispatcher to work more like round-robin, giving each actor a little execution time, but it ends up serving only a limited number of actors and then jumping to the next ones, even though they should all receive their initial messages at the same time since there is a broadcaster.
I read akka documentation on dispatchers and mailboxes, but I still don't understand how to do it.
Thank you
Igor