JMeter JSR223 PreProcessor and Sampler - apache-kafka

I have a JSR223 PreProcessor in a Concurrency Thread Group that creates the data to send to a Kafka producer, and a JSR223 Sampler that uses Kafka client 2.7.0 to send the messages to Kafka.
The message sent to Kafka should be different each time; e.g. it contains device information, which should vary, and events with a timestamp (the current time). These are generated without any issues, as I tested with a few (50) threads. The problem arises when I want to send more messages, e.g. 6000 messages per second. How can I resolve this?
Below is my setup:

You're showing us a screenshot of the Concurrency Thread Group configured to start 6000 threads (virtual users) and hold them for 20 seconds.
This will result in 6000 messages per second only if the cumulative response time of your JSR223 PreProcessor and Sampler is exactly 1 second. If it is lower, you will generate more messages per second, and vice versa.
For example:
if the PreProcessor and Sampler execution time is 500 ms, you will end up with 12000 messages per second
if the PreProcessor and Sampler execution time is 2000 ms, you will send 3000 messages per second
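The arithmetic behind these numbers can be sketched as follows. This is only a rough estimate, assuming every thread loops continuously with no timers or pacing; the class and method names are made up for illustration.

```java
// Rough estimate only: assumes each thread starts its next iteration immediately,
// so one thread completes 1000 / iterationMillis iterations per second.
final class ThroughputEstimate {

    /** Messages per second produced by `threads` users whose iteration takes `iterationMillis`. */
    static double messagesPerSecond(int threads, double iterationMillis) {
        if (threads < 0 || iterationMillis <= 0) {
            throw new IllegalArgumentException("threads must be >= 0 and iterationMillis > 0");
        }
        return threads * 1000.0 / iterationMillis;
    }

    public static void main(String[] args) {
        // Same thread count as in the question, with three possible iteration times.
        for (double millis : new double[] {500, 1000, 2000}) {
            System.out.printf("%.0f ms per iteration -> %.0f messages/s%n",
                    millis, messagesPerSecond(6000, millis));
        }
    }
}
```

In a real test the iteration time is rarely constant under load, so the measured throughput is what counts.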
If you're sending fewer messages than you need, consider following JMeter Best Practices: at the very least, disable all Listeners and run your test in non-GUI mode. Still not enough? Increase the concurrency. If you have increased the concurrency, are running out of resources, and it is still not enough, go for Distributed Testing.
If you're sending more than 6000 messages per second, you can limit the JMeter sampler execution rate to the desired throughput using the Throughput Shaping Timer.
You can see your current throughput using, for example, the Transactions per Second listener.

Process Pub/Sub messages at a constant rate, using streaming and serverless

The scenario:
I have thousands of requests that I need to issue each day.
I know the number at the beginning of the day, and ideally I want to send all the data about the requests to Pub/Sub, one message per request.
I want to make the requests at a constant rate. For example, if I have 172800 requests, I want to process 2 per second.
The ideal solution would involve Pub/Sub push and Cloud Run.
Using pull with long-running instances is also an option.
Any other options are also welcome.
I want to avoid running in a loop and fetching records from a database with a limit, which is how I am doing it today.
You can use batch and flow control settings for fine-tuning Pub/Sub performance which will help in processing messages at a constant rate.
Batching
A batch, within the context of Cloud Pub/Sub, refers to a group of one or more messages published to a topic by a publisher in a single publish request. Batching is done by default in the client library or explicitly by the user. The purpose for this feature is to allow for a higher throughput of messages while also providing a more efficient way for messages to travel through the various layers of the service(s). Adjusting the batch size (i.e. how many messages or bytes are sent in a publish request) can be used to achieve the desired level of throughput.
Features specific to batching on the publisher side include setElementCountThreshold(), setRequestByteThreshold(), and setDelayThreshold() as part of setBatchSettings() on a publisher client (the naming varies slightly in the different client libraries). These features can be used to finely tune the behavior of batching to find a better balance among cost, latency, and throughput.
Note: The maximum number of messages that can be published in a single batch is 1000 messages or 10 MB.
An example of these batching properties can be found in the Publish with batching settings documentation.
Flow Control
Flow control features on the subscriber side can help control the unhealthy behavior of tasks on the pipeline by allowing the subscriber to regulate the rate at which messages are ingested. These features provide the added functionality to adjust how sensitive the service is to sudden spikes or drops of published throughput.
Some features that are helpful for adjusting flow control and other settings on the subscriber are setMaxOutstandingElementCount(), setMaxOutstandingRequestBytes(), and setMaxAckExtensionPeriod().
Examples of these settings being used can be found in the Subscribe with flow control documentation.
For more information refer to this link.
If you use long-running instances as subscribers, you will need to set the relevant FlowControl settings, for example .setMaxOutstandingElementCount(1000L).
Once you have set it to the desired number (for example 1000), it caps the number of messages the subscriber receives before pausing the message stream, as explained in the code below from this documentation:
import com.google.api.gax.batching.FlowControlSettings;

// The subscriber will pause the message stream and stop receiving more messages from the
// server if any one of the conditions is met.
FlowControlSettings flowControlSettings =
    FlowControlSettings.newBuilder()
        // 1,000 outstanding messages. Must be >0. It controls the maximum number of messages
        // the subscriber receives before pausing the message stream.
        .setMaxOutstandingElementCount(1000L)
        // 100 MiB. Must be >0. It controls the maximum size of messages the subscriber
        // receives before pausing the message stream.
        .setMaxOutstandingRequestBytes(100L * 1024L * 1024L)
        .build();

Kafka Connect fetch.max.wait.ms & fetch.min.bytes combined not honored?

I'm creating a custom SinkConnector using Kafka Connect (2.3.0) that needs to be optimized for throughput rather than latency. Ideally, what I want is:
Batches of ~20 megabytes or 100k records, whichever comes first; but if the message rate is low, process at least every minute (avoid small batches, but MySinkTask.put() should be called at least once a minute).
This is what I set for consumer settings in an attempt to accomplish it:
consumer.max.poll.records=100000
consumer.fetch.max.bytes=20971520
consumer.fetch.max.wait.ms=60000
consumer.max.poll.interval.ms=120000
consumer.fetch.min.bytes=1048576
I need this fetch.min.bytes setting, or else MySinkTask.put() is called multiple times per second despite the other settings...?
Now, what I observe in a low-rate situation is that MySinkTask.put() is called with 0 records multiple times and several minutes pass by, until fetch.min.bytes is reached, and then I get them all at once.
I fail to understand so far:
Why is fetch.max.wait.ms=60000 not propagated from the consumer down to the put() call of my connector? Shouldn't it take precedence over fetch.min.bytes?
What setting controls the ~2x per second calls to MySinkTask.put() if fetch.min.bytes=1 (the default)? I don't understand why it does that; even the verbose output of the Connect runtime settings doesn't show any sub-second interval.
I've double-checked the log output, and the lines INFO o.a.k.c.consumer.ConsumerConfig - ConsumerConfig values: printed by the Connect runtime show the expected values, i.e. the ones I pass with the consumer. prefix.
The "process at least every interval" part does not seem possible, as the fetch.min.bytes consumer setting takes precedence and Connect does not allow you to dynamically adjust the ConsumerConfig while the Task is running. :-(
The workaround for now is to batch in the Task manually: set fetch.min.bytes to 1 (yikes), buffer records in the Task on put() calls, and flush when necessary. This is not ideal, as it incurs some overhead for the Connector that I had hoped to avoid.
How Connect arrives at ~2x per second batching between its consumer's poll and SinkTask.put() remains a mystery to me, but it's better than being called for every message.
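The manual batching workaround could look roughly like the sketch below. BatchBuffer is a hypothetical helper, not part of Kafka Connect; it is written without Kafka types so that it stays self-contained, and the record size is assumed to be estimated by the caller. Since put() is called a couple of times per second even with zero records, the age check can be driven from put() itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Kafka Connect: collects records handed to
// SinkTask.put() and reports when a batch is due by count, size, or age.
final class BatchBuffer<T> {
    private final int maxRecords;
    private final long maxBytes;
    private final long maxAgeMillis;
    private final List<T> records = new ArrayList<>();
    private long bufferedBytes = 0;
    private long oldestMillis = -1; // arrival time of the first buffered record

    BatchBuffer(int maxRecords, long maxBytes, long maxAgeMillis) {
        this.maxRecords = maxRecords;
        this.maxBytes = maxBytes;
        this.maxAgeMillis = maxAgeMillis;
    }

    /** Adds one record; sizeBytes is the caller's estimate of its size. */
    void add(T record, long sizeBytes, long nowMillis) {
        if (records.isEmpty()) {
            oldestMillis = nowMillis;
        }
        records.add(record);
        bufferedBytes += sizeBytes;
    }

    /** True when any threshold is reached; meant to be checked on every put(), even an empty one. */
    boolean shouldFlush(long nowMillis) {
        if (records.isEmpty()) {
            return false;
        }
        return records.size() >= maxRecords
                || bufferedBytes >= maxBytes
                || nowMillis - oldestMillis >= maxAgeMillis;
    }

    /** Returns the buffered records and resets the buffer. */
    List<T> drain() {
        List<T> batch = new ArrayList<>(records);
        records.clear();
        bufferedBytes = 0;
        oldestMillis = -1;
        return batch;
    }
}
```

With the targets from the question this would be constructed as new BatchBuffer<>(100_000, 20L * 1024 * 1024, 60_000). A real task would also have to make sure that offsets are not committed for records still sitting in the buffer, for instance by overriding SinkTask.preCommit().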

Distributed timer service

I am looking for a distributed timer service. Multiple remote client services should be able to register for callbacks (via REST apis) after specified intervals. The length of an interval can be 1 minute. I can live with an error margin of around 1 minute. The number of such callbacks can go up to 100,000 for now but I would need to scale up later. I have been looking at schedulers like Quartz but I am not sure if they are a fit for the problem. With Quartz, I will probably have to save the callback requests in a DB and poll every minute for overdue requests on 100,000 rows. I am not sure that will scale. Are there any out of the box solutions around? Else, how do I go about building one?
Posting as an answer since I can't comment.
One more option to consider is a message queue, where you publish a message with a scheduled delay so that consumers can consume it only after that delay.
Amazon SQS Delay Queues
Delay queues let you postpone the delivery of new messages in a queue for the specified number of seconds. If you create a delay queue, any message that you send to that queue is invisible to consumers for the duration of the delay period. You can use the CreateQueue action to create a delay queue by setting the DelaySeconds attribute to any value between 0 and 900 (15 minutes). You can also change an existing queue into a delay queue using the SetQueueAttributes action to set the queue's DelaySeconds attribute.
Scheduling Messages with RabbitMQ
https://github.com/rabbitmq/rabbitmq-delayed-message-exchange/
A user can declare an exchange with the type x-delayed-message and then publish messages with the custom header x-delay expressing in milliseconds a delay time for the message. The message will be delivered to the respective queues after x-delay milliseconds.
Out of the box solution
RocketMQ meets your requirements, since it supports scheduled messages:
Scheduled messages differ from normal messages in that they won’t be delivered until a provided time later.
You can register your callbacks by sending such messages:
Message message = new Message("TestTopic", "");
message.setDelayTimeLevel(3);
producer.send(message);
And then, listen to this topic to deal with your callbacks:
consumer.subscribe("TestTopic", "*");
consumer.registerMessageListener(new MessageListenerConcurrently() {...})
It does well in almost every way, except that the DelayTimeLevel options can only be defined before the RocketMQ server starts. This means that if your MQ server is configured with messageDelayLevel=1s 5s 10s, you cannot register a callback with a delay of 3s.
DIY
Quartz plus storage can be used to build such a callback service, as you mentioned. However, I don't recommend storing the callback data in a relational DB: you want high TPS, and a distributed service built on it will struggle to avoid locks and transactions, which add complexity to the DB code.
I suggest storing the callback data in Redis instead, because it performs better than a relational DB and its ZSET data structure suits this scenario well.
I once developed a timed callback service based on Redis and Dubbo. It provides some more useful features. Maybe you can get some ideas from it: https://github.com/joooohnli/delay-callback
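To illustrate why a sorted set fits here, the sketch below uses an in-memory TreeMap as a stand-in for a Redis ZSET scored by due time. The class name and the use of string callback ids are assumptions for illustration; with Redis, schedule() would roughly correspond to ZADD and pollDue() to ZRANGEBYSCORE followed by ZREM. The point is that each poll only touches the entries that are already due, not all 100,000 registrations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for a Redis sorted set keyed by due time (hypothetical class).
final class DueTimeIndex {
    private final TreeMap<Long, List<String>> byDueTime = new TreeMap<>();

    /** Registers a callback id to fire at dueMillis (epoch milliseconds). */
    synchronized void schedule(String callbackId, long dueMillis) {
        byDueTime.computeIfAbsent(dueMillis, k -> new ArrayList<>()).add(callbackId);
    }

    /** Removes and returns every callback whose due time is <= nowMillis, earliest first. */
    synchronized List<String> pollDue(long nowMillis) {
        Map<Long, List<String>> due = byDueTime.headMap(nowMillis, true);
        List<String> result = new ArrayList<>();
        for (List<String> ids : due.values()) {
            result.addAll(ids);
        }
        due.clear(); // the view is backed by the map, so this removes the due entries
        return result;
    }

    /** Number of callbacks that have not fired yet. */
    synchronized int pending() {
        int count = 0;
        for (List<String> ids : byDueTime.values()) {
            count += ids.size();
        }
        return count;
    }
}
```

A worker would call pollDue(System.currentTimeMillis()) on a fixed schedule, for example once per second, and issue the REST callbacks for the returned ids. Unlike a real ZSET, this stand-in does not deduplicate members and is not shared between processes.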

How to scale an Elastic Beanstalk worker application based on messages from SQS?

I have a Scala application that does some heavy calculation based on customer_id. I'm putting customer IDs in SQS (Amazon Simple Queue Service), and the application is configured on Elastic Beanstalk to consume messages from SQS.
I would like to scale my application based on the messages coming from SQS. The problem is that my application runs as an HTTP server and returns a 200 code only after finishing the calculation, which takes at least 15 minutes.
The SQS max timeout is 60 seconds, so after that all messages end up in the dead-letter queue. I tried sending a 200 response code before finishing the calculation, but then it receives another message from the queue and starts another process.
Is there any solution, please?
EDIT: example of my worker configuration:
Thank you in advance !
The max VisibilityTimeout is 12 hours, so you could set it to 30 minutes, and that should cover your case.
Well, the issue comes neither from the Akka HTTP server nor from SQS; it comes from the default nginx configuration. The issue was resolved by changing proxy_read_timeout from its default of 60s to the desired value.

Oracle Service Bus Proxy Service Scheduler

I need to create a proxy service scheduler that receives messages from the queue after 5 minutes. That is, the queue may produce a single message or multiple messages, but the proxy should receive those messages only at 5-minute intervals. How can I achieve this using only Oracle Service Bus?
Kindly help me with this.
OSB does not provide scheduler capabilities out of the box. You can do either of the following:
For a JMS queue, allow infinite retries by not setting a retry limit, and set the retry interval to 5 minutes.
Create a scheduler. See this post: http://blogs.oracle.com/jamesbayer/entry/weblogic_scheduling_a_polling
This answer is left for reference only; messages shouldn't be subject to complex computed selections in this way, only to simple value comparison and pattern matching.
To fetch only old-enough messages from the queue, without
modifying the queue or the messages,
introducing any new brokers between the queue and the consumer, or
prematurely consuming messages,
use the Message Selector field of the OSB Proxy on the JMS Transport tab to set a boolean expression (SQL 92) that checks that the message's JMSTimestamp header is at least 5 minutes older than the current time.
... and I didn't manage to quickly produce a valid message selector from either the timestamp or the JMSMessageID (it contains the time in millis: 'ID:<465788.1372152510324.0>').
I guess somebody could still use it in some specific case.
You can use Quartz scheduler APIs to create schedulers across domains.
Regards,
Sajeev
I don't know whether this will work for you, but it is working well for me. Maybe you can use it for your needs.
Go to Transport Details of your Proxy Service and, under the Advanced Options tab, set the following fields:
Polling Frequency (enter your frequency: 300 sec, i.e. 5 min)
Physical Directory (here you may need to give your queue path)