public void setReadTimeout(int timeoutMillis)
Sets the maximum time to wait for an input stream read to complete before giving up. Reading will fail with a SocketTimeoutException if the timeout elapses before data becomes available. The default value of 0 disables read timeouts; read attempts will block indefinitely.
Parameters
timeoutMillis - the read timeout in milliseconds. Non-negative.
What does the information in bold mean? Is it good to set this on a network connection?
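It is the longest a single read from the connection's input stream will block while waiting for data. In practice, the first read measures how long the server takes to start responding after the request has been sent. It is not a deadline for the whole response, though: the timeout applies to each read separately, so a server that keeps sending data slowly can take longer in total without triggering it.
The default value (0) is good enough unless you need the call to return within a certain time and you are sure the server normally responds well within that time.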
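A minimal JDK-only sketch of the behavior described in the doc: it points a URLConnection at a hypothetical local server (a ServerSocket that completes the TCP handshake but never sends a byte) and shows that, with setReadTimeout set, the read gives up with SocketTimeoutException instead of blocking. The class name, method name, and 200 ms value are illustrative assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.net.URLConnection;

class ReadTimeoutDemo {

    /**
     * Connects to a local server that never answers and reports whether the
     * first read was aborted by the read timeout.
     */
    static boolean readTimesOut(int timeoutMillis) {
        // We never call accept(); the OS still queues the connection, so the
        // client connects successfully and then waits for a response.
        try (ServerSocket silent = new ServerSocket(0, 50, InetAddress.getByName("127.0.0.1"))) {
            URLConnection conn = URI.create("http://127.0.0.1:" + silent.getLocalPort() + "/")
                    .toURL()
                    .openConnection(Proxy.NO_PROXY);
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis); // with the default of 0 this read would never return
            try (InputStream in = conn.getInputStream()) {
                in.read();
                return false; // data (or EOF) arrived before the timeout
            } catch (SocketTimeoutException e) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

Because the server never writes anything, the only way this call can return is through the timeout, which is exactly the case the default of 0 leaves unbounded.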
What is the difference between eventTimeTimeout and processingTimeTimeout in mapGroupsWithState?
Also, is it possible to make a state expire every 10 minutes, so that if data for that particular key arrives after 10 minutes, the state starts again from the beginning?
In short:
processing-based timeouts rely on the clock of the machine your job is running on. They are independent of any timestamps given in your data/events.
event-based timeouts rely on a timestamp column within your data that serves as the event time. In that case you need to declare a watermark on this timestamp column.
More details are available in the Scaladoc of the relevant class, GroupState:
With ProcessingTimeTimeout, the timeout duration can be set by calling GroupState.setTimeoutDuration. The timeout will occur when the clock has advanced by the set duration. Guarantees provided by this timeout with a duration of D ms are as follows:
Timeout will never occur before the clock time has advanced by D ms.
Timeout will occur eventually when there is a trigger in the query (i.e. after D ms). So there is no strict upper bound on when the timeout would occur. For example, the trigger interval of the query will affect when the timeout actually occurs. If there is no data in the stream (for any group) for a while, then there will not be any trigger, and the timeout function call will not occur until there is data.
Since the processing time timeout is based on the clock time, it is affected by the variations in the system clock (i.e. time zone changes, clock skew, etc.).
With EventTimeTimeout, the user also has to specify the event time watermark in the query using Dataset.withWatermark(). With this setting, data that is older than the watermark is filtered out. The timeout can be set for a group by setting a timeout timestamp using GroupState.setTimeoutTimestamp(), and the timeout would occur when the watermark advances beyond the set timestamp. You can control the timeout delay with two parameters: (i) the watermark delay and (ii) an additional duration beyond the timestamp in the event (which is guaranteed to be newer than the watermark due to the filtering). Guarantees provided by this timeout are as follows:
Timeout will never occur before the watermark has exceeded the set timeout.
Similar to processing-time timeouts, there is no strict upper bound on the delay before the timeout actually occurs. The watermark can advance only when there is data in the stream and the event time of the data has actually advanced.
"Also, is it possible to make a state expire every 10 minutes, so that if data for that particular key arrives after 10 minutes, the state starts again from the beginning?"
This happens automatically when using mapGroupsWithState. You just need to make sure you actually remove the state once the 10-minute timeout fires (i.e. call state.remove() when state.hasTimedOut() is true).
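To make the pattern concrete, here is a plain-Java model of it, not Spark code: each simulated trigger processes the keys that have data and then drops the state of keys that have been idle for 10 minutes or more, so a key that comes back later starts from an empty state. The class, the count-per-key state, and the explicit nowMs clock are assumptions for illustration; in an actual mapGroupsWithState job the corresponding GroupState calls would be setTimeoutDuration(...), hasTimedOut() and remove(). The model also resets a key whose data arrives after a long gap with no trigger in between since, as noted above, processing-time timeouts have no strict upper bound.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model (hypothetical names, not Spark's API) of per-key state with a
 * 10-minute processing-time timeout that removes the state when it fires.
 */
class ExpiringCounts {
    static final long TIMEOUT_MS = 10 * 60 * 1000L; // 10 minutes

    private static final class Entry {
        long count;
        long lastSeenMs;
    }

    private final Map<String, Entry> state = new HashMap<>();

    /**
     * Simulates one trigger (micro-batch) at processing time nowMs. Timeouts are
     * only evaluated here, never between triggers.
     */
    void trigger(long nowMs, String... keysWithData) {
        for (String key : keysWithData) {
            Entry e = state.get(key);
            // If no trigger ran while the key was idle, its timeout never fired,
            // so reset here too instead of relying on removal alone.
            if (e == null || nowMs - e.lastSeenMs >= TIMEOUT_MS) {
                e = new Entry();
                state.put(key, e);
            }
            e.count++;
            e.lastSeenMs = nowMs; // pushes this key's timeout forward
        }
        // Timed-out keys: drop their state (in Spark, state.remove() once
        // state.hasTimedOut() is true), so later data starts from scratch.
        state.values().removeIf(e -> nowMs - e.lastSeenMs >= TIMEOUT_MS);
    }

    /** Current count for the key, or 0 if it has no state. */
    long count(String key) {
        Entry e = state.get(key);
        return e == null ? 0 : e.count;
    }

    boolean hasState(String key) {
        return state.containsKey(key);
    }
}
```

For example, key "a" seen at minute 0 is still present at a trigger at minute 5, is removed at the first trigger at or after minute 10, and counts from 1 again when it reappears.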
Looking at the documentation, I'm not sure if I understand the difference between using close() and flush().
This is the doc for flush():
Invoking this method makes all buffered records immediately available to send (even if linger.ms is greater than 0) and blocks on the completion of the requests associated with these records. The post-condition of flush() is that any previously sent record will have completed (e.g. Future.isDone() == true). A request is considered completed when it is successfully acknowledged according to the acks configuration you have specified or else it results in an error.
And the doc for close():
This method waits up to timeout for the producer to complete the sending of all incomplete requests. If the producer is unable to complete all requests before the timeout expires, this method will fail any unsent and unacknowledged records immediately.
Does this mean that:
1. If I use close() and there are some records pending in the in-memory buffer, they won't even be attempted (unlike flush(), which would attempt to send them)?
2. If I use flush(), it might block "forever" if retries is set high, whereas with close() I have an upper bound on how long this is going to take?
I suppose that if I'm right about 1., a producer with acks=0 would get a confirmation for a record that was never even attempted, if the record is "unlucky" enough to be placed in the in-memory queue just before close() is called.
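If you want to wait for the messages to be sent but keep using the producer afterwards, use flush(); otherwise use close(). close() without a timeout blocks until all previously sent requests complete, while close(timeout) waits up to the timeout for the messages to be sent and acknowledged (according to your acks configuration), fails any records that are still incomplete, and then closes the producer.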
In my application, when I send messages I use the metadata passed to the callback to save the offset of each record for later use. However, metadata.offset() sometimes returns -1, which makes things hard later.
Why does this happen, and is there a way to get the offset without consuming the topic to find it?
Edit: I am currently using acks=0. When I switch to acks=1 I don't get these errors anymore, but my performance drops drastically: sending 100k messages goes from 10 seconds to 1 minute.
acks=0: If set to zero then the producer will not wait for any acknowledgment from the server at all. The record will be immediately added to the socket buffer and considered sent. No guarantee can be made that the server has received the record in this case, and the retries configuration will not take effect (as the client won't generally know of any failures). The offset given back for each record will always be set to -1.
This is not exactly true, as out of 100k messages I got 95k with offsets, but I guess that's normal.
I will still need to find another way to get the offset with acks=0.
I am acting as a server that receives multiple requests from a client over a socket and handles them in a thread.
Should I set any TCP-level parameter for the maximum number of requests a connection can handle simultaneously?
I ask because on my server side, if processing a request is slow, I observe that other requests are queued up (the client says the request has been sent, but I receive it late).
Kindly guide me.
If it takes a long time to do the work and you want to handle multiple connections simultaneously, you have to change how you do things.
If you are actively using a lot of CPU during processing a long request, you'll need multiple threads. That's the only way to actually get more CPU time / second -- assuming you have multiple cores available.
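A minimal JDK-only sketch of the multi-threaded approach: one thread only accepts connections and hands each one to a fixed pool of worker threads, so a slow request no longer holds up the ones behind it. The line-based protocol, the "slow" command, and the pool size of 8 are hypothetical choices for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

/** Toy line-based server: one accept thread, a pool of worker threads. */
class PooledServer implements AutoCloseable {
    private final ServerSocket server;
    private final ExecutorService workers = Executors.newFixedThreadPool(8);

    PooledServer() {
        try {
            server = new ServerSocket(0, 50, InetAddress.getByName("127.0.0.1"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Thread acceptor = new Thread(this::acceptLoop, "acceptor");
        acceptor.setDaemon(true);
        acceptor.start();
    }

    int port() {
        return server.getLocalPort();
    }

    private void acceptLoop() {
        while (!server.isClosed()) {
            try {
                Socket client = server.accept();
                // Hand the connection off instead of processing it here, so the
                // acceptor is immediately free for the next client.
                workers.submit(() -> handle(client));
            } catch (IOException | RejectedExecutionException e) {
                return; // server is shutting down
            }
        }
    }

    private void handle(Socket client) {
        try (Socket s = client;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
            String request = in.readLine();
            if ("slow".equals(request)) {
                Thread.sleep(1000); // stands in for expensive processing
            }
            out.println("done:" + request);
        } catch (IOException e) {
            // sketch: just drop the connection
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // pool is shutting down
        }
    }

    /** Client helper: sends one line and returns the server's one-line reply. */
    static String request(int port, String line) {
        try (Socket s = new Socket("127.0.0.1", port);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
            out.println(line);
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            server.close();
        } catch (IOException ignored) {
            // nothing useful to do in a sketch
        }
        workers.shutdownNow();
    }
}
```

With this structure, a "fast" request sent while a "slow" one is still being processed is answered right away instead of waiting its turn. The pool size caps how many requests are processed at once; extra connections wait in the executor's queue rather than being rejected.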
If you are waiting on things like file IO, then you can instead use asynchronous processing to handle the requests on a single thread, handling just a little piece of each request at a time.
Setting a maximum number of TCP connections won't help you handle more requests more quickly. It will just reject connections, and it won't even give you first-come, first-served behavior: whether a specific client ever gets through will be essentially random.
I am in a gaming scenario. Both the server and the clients exchange messages over non-blocking UDP at a high rate.
(Maybe odd here...) The legacy code also uses select() with the timeout set to 0, which means select() does not block. select() is inside an infinite while loop. When select() returns a number greater than 0, the following code receives the message through recvfrom(). If it returns 0, the code does not try to receive.
From the printed output, I saw that select() sometimes returns 1 (greater than 0). I am confused: since the timeout is set to 0, how does select() have time to check whether a message is ready to read on any of the readfds? Thanks.
According to the spec:
Upon successful completion, the pselect() or select() function shall modify the objects pointed to by the readfds, writefds, and errorfds arguments to indicate which file descriptors are ready for reading, ready for writing, or have an error condition pending, respectively, and shall return the total number of ready descriptors in all the output sets. For each file descriptor less than nfds, the corresponding bit shall be set upon successful completion if it was set on input and the associated condition is true for that file descriptor.
If none of the selected descriptors are ready for the requested operation, the pselect() or select() function shall block until at least one of the requested operations becomes ready, until the timeout occurs, or until interrupted by a signal. The timeout parameter controls how long the pselect() or select() function shall take before timing out. If the timeout parameter is not a null pointer, it specifies a maximum interval to wait for the selection to complete. If the specified time interval expires without any requested operation becoming ready, the function shall return. If the timeout parameter is a null pointer, then the call to pselect() or select() shall block indefinitely until at least one descriptor meets the specified criteria. To effect a poll, the timeout parameter should not be a null pointer, and should point to a zero-valued timespec structure.
In short, checking the descriptors happens before checking the timeout. If the socket already has data ready when select() is called, the timeout is ignored and select() exits immediately.
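The same behavior can be seen from Java, whose NIO Selector.selectNow() is documented as a non-blocking selection, the analogue of calling select() with a zero timeout (this is a Java illustration, not the C code from the question). The sketch binds a hypothetical UDP receiver on localhost, polls once before anything is sent and once after a datagram has arrived; neither poll blocks, yet the second one reports the ready channel.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;

class ZeroTimeoutPoll {

    /** Returns {readyBeforeSend, readyAfterSend} from two non-blocking polls. */
    static int[] pollBeforeAndAfterSend() {
        try (Selector selector = Selector.open();
             DatagramChannel receiver = DatagramChannel.open();
             DatagramChannel sender = DatagramChannel.open()) {
            receiver.bind(new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 0));
            receiver.configureBlocking(false);
            receiver.register(selector, SelectionKey.OP_READ);

            int before = selector.selectNow(); // nothing queued yet: returns 0 immediately

            sender.send(ByteBuffer.wrap("ping".getBytes(StandardCharsets.UTF_8)),
                    receiver.getLocalAddress());
            Thread.sleep(100); // give the loopback datagram time to reach the receive buffer

            // Data is already queued, so readiness is reported even though
            // this call does not wait at all.
            int after = selector.selectNow();
            return new int[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = pollBeforeAndAfterSend();
        System.out.println("before send: " + r[0] + ", after send: " + r[1]);
    }
}
```

In C, the equivalent poll is passing a timeout that points to a zeroed struct, as the spec describes; a nonzero return simply means data was already waiting when the call was made.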