How to modify default expired time of continue token in Kubernetes? - kubernetes

On this page https://kubernetes.io/docs/reference/using-api/api-concepts/#retrieving-large-results-sets-in-chunks, there is a continue token that will expire after a short amount of time (by default 5 minutes).
I find that when the Kubernetes controller manager runs the cronjob syncall() function in my cluster, this token always expires, which stops the cronjob controller from creating jobs on schedule.
The following is the log in kubernetes-controller-manager:
E0826 11:26:45.441592 1 cronjob_controller.go:146] Failed to extract cronJobs list: The provided continue parameter is too old to display a consistent list result. You can start a new list without the continue parameter, or use the continue token in this response to retrieve the remainder of the results. Continuing with the provided token results in an inconsistent list - objects that were created, modified, or deleted between the time the first chunk was returned and now may show up in the list.
So I want to know whether I can modify the default expiration time of the continue token in Kubernetes, and how to do it.
Thanks.

This is an etcd default. Any auth request to etcd runs into that 5-minute expiry interval, which is due to the compaction interval. The good news is that you can change it with the --etcd-compaction-interval option of the kube-apiserver.
Also, it looks like doing a simple GET within the 5 minutes would actually extend the token timeout.
✌️
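The error message in the question also suggests a client-side option: start a new list without the continue parameter once the token is too old. A minimal sketch of that restart logic, assuming a hypothetical list_page(limit, continue_token) callable and a hypothetical ContinueTokenExpired exception standing in for the API server's "too old" response (neither is part of any Kubernetes client):

```python
class ContinueTokenExpired(Exception):
    """Hypothetical stand-in for the API server's 'continue parameter is too old' error."""


def list_all(list_page, limit=500, max_restarts=3):
    """Collect every item from a chunked list call, restarting when the token expires.

    list_page(limit, continue_token) is a hypothetical callable that returns
    (items, next_token) and raises ContinueTokenExpired for a stale token.
    """
    for _ in range(max_restarts + 1):
        items, token = [], None
        try:
            while True:
                page, token = list_page(limit, token)
                items.extend(page)
                if not token:
                    return items  # last chunk reached
        except ContinueTokenExpired:
            continue  # discard the partial result and start a fresh list
    raise RuntimeError("list kept expiring; try smaller pages or a longer compaction interval")
```

Restarting from scratch keeps the result consistent, at the cost of re-reading the chunks that were already fetched.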

Related

How does APNs determine a provider token's age?

The documentation says:
The claims payload of the token must include:
The issued at (iat) registered claim key, whose value indicates
the time at which the token was generated, in terms of the number of
seconds since Epoch, in UTC
To ensure security, APNs requires new tokens to be generated periodically. A new token has an
updated issued at claim key, whose value indicates the time the token was generated. If the timestamp for token issue is not within the last hour, APNs rejects subsequent push
messages, returning an ExpiredProviderToken (403) error.
Source: https://developer.apple.com/library/archive/documentation/NetworkingInternet/Conceptual/RemoteNotificationsPG/CommunicatingwithAPNs.html
At another section:
iat | The “issued at” time, whose value indicates the time at which this JSON token was generated. Specify the value as the number of seconds since Epoch, in UTC. The value must be no more than one hour from the current time.
Those rules are so fragmented and repetitive at the same time, so please correct me if I'm wrong:
iat must be a numeric date between 1h ago and 1h from now. Let's say it's 8:30 now, and I set iat to 8 o'clock: does it mean my token is going to be valid for another half hour, since that's what iat is telling APNs, or does it start counting from the time APNs receives my push request? What if I set iat to 1h from now... does it mean my token is going to be valid for 2h?
Another question. Given that:
Refresh Your Token Regularly
For security, APNs requires you to refresh your token regularly. Refresh your token no more than once every 20 minutes and no less than once every 60 minutes. APNs rejects any request whose token contains a timestamp that is more than one hour old. Similarly, APNs reports an error if you recreate your tokens more than once every 20 minutes.
Source: https://developer.apple.com/documentation/usernotifications/setting_up_a_remote_notification_server/establishing_a_token-based_connection_to_apns
Every time I sign a token (using a Node module for JWT), it generates a different string, even though I use the same iat. Does it count as a "recreation", causing a TooManyProviderTokenUpdate error if I use it before that 20-minute threshold?
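One way to stay inside both quoted limits regardless of how APNs counts a "recreation" is to sign once and reuse the exact same token string until it approaches the one-hour mark. A minimal sketch, assuming a hypothetical sign(iat) callable that returns the signed JWT and a refresh age of 50 minutes (both are illustrative choices, not values from the Apple documentation):

```python
import time


class ProviderTokenCache:
    """Reuse one signed token until it is older than max_age_seconds."""

    def __init__(self, sign, max_age_seconds=50 * 60, clock=time.time):
        self._sign = sign              # hypothetical: sign(iat) -> JWT string
        self._max_age = max_age_seconds
        self._clock = clock            # injectable for testing
        self._token = None
        self._issued_at = None

    def get(self):
        now = int(self._clock())
        if self._token is None or now - self._issued_at >= self._max_age:
            # Sign a new token with a fresh iat; otherwise return the cached string.
            self._issued_at = now
            self._token = self._sign(now)
        return self._token
```

Any max_age_seconds between 20 and 60 minutes satisfies "no more than once every 20 minutes and no less than once every 60 minutes".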

Why do I keep getting "Quota exceeded for quota group 'AnalyticsDefaultGroup' and limit 'USER-100s'" Errors?

I am currently managing two Google Analytics Management Accounts with many clients and view_ids on each one. The task is to request client data via the Google Analytics Reporting API (v4) and store them to a SQL Backend on a daily basis via an Airflow DAG-structure.
For the first account everything works fine.
Just recently I added the second account to the data request routine.
The problem is that even though both accounts are set to the same "USER-100s" quota limits, I keep getting this error for the newly added account:
googleapiclient.errors.HttpError: <HttpError 429 when requesting https://analyticsreporting.googleapis.com/v4/reports:batchGet?alt=json returned "Quota exceeded for quota group 'AnalyticsDefaultGroup' and limit 'USER-100s' of service 'analyticsreporting.googleapis.com' for consumer 'project_number:XXXXXXXXXXXX'.">
I already set the quota limit "User-100s" from 100 to the maximum of 1000, as recommended in the official Google guidelines (https://developers.google.com/analytics/devguides/config/mgmt/v3/limits-quotas)
Also I checked the Google API Console and the number of requests for my project number, but I never exceeded the 1000 requests per 100 seconds so far (see request history account 2), while the first account always works(see request history account 1). Still the above error appeared.
Also I could rule out the possibility that the 2nd account's clients simply have more data.
[Image: request history, account 1]
[Image: request history, account 2]
I am now down to a try-except loop that keeps on requesting until the data is eventually queried successfully, like
success = False
data = None
while not success:
    try:
        data = query_data()  # trying to receive data from the API
        if data:
            success = True
    except HttpError as e:
        print(e)
This is not elegant at all and hard to maintain (for integration tests, for example). In addition, it is very time and resource intensive, because the loop might sometimes run indefinitely. It can only be a short-term workaround.
This is especially frustrating, because the same implementation works with the first account, that makes more requests, but fails with the second account.
If you know any solution to this, I would be very happy to know.
Cheers Tobi
I know this question is here for a while, but let me try to help you. :)
There are 3 standard request limits:
50k per day per project
2k per 100 seconds per project
100 per 100 seconds per user
As you showed in your image (https://i.stack.imgur.com/Tp76P.png)
The quota group "AnalyticsDefaultGroup" refers to your API project and the user quota is included in this limit.
Per your description, you are hitting the user quota and that usually happens when you don't provide the userIP or quotaUser in your requests.
So there are two main points you have to handle to prevent those errors:
Include the quotaUser with a unique string in every request;
Keep 1 request per second
From your code, I presume that you are using the default Google API Client for Python (https://github.com/googleapis/google-api-python-client), which doesn't have a global way to define the quotaUser.
To include the quotaUser
analytics.reports().batchGet(
    body={
        'reportRequests': [{
            'viewId': 'your_view_id',
            'dateRanges': [{'startDate': '2020-01-01', 'endDate': 'today'}],
            'pageSize': '1000',
            'pageToken': pageToken,
            'metrics': [],
            'dimensions': []
        }]
    },
    quotaUser='my-user-1'
).execute()
That makes the Google API register your request for that user, counting it against that user's limit of 100 rather than against one shared user for your whole project.
Limit 1 request per second
If you plan to make a lot of requests, I suggest including a delay between every request using:
time.sleep(1)
right after a request on the API. That way you can keep under 100 requests per 100 seconds.
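If a request still hits the quota occasionally, a bounded retry with exponential backoff is a gentler alternative to an unbounded retry loop. A minimal sketch, where call_with_backoff, request, and is_retryable are hypothetical names introduced here for illustration and are not part of the Google client:

```python
import random
import time


def call_with_backoff(request, is_retryable, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request() and retry retryable failures with exponential backoff plus jitter.

    request and is_retryable are caller-supplied callables (hypothetical names).
    The last failure is re-raised once max_attempts is reached.
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except Exception as exc:
            if not is_retryable(exc) or attempt == max_attempts - 1:
                raise
            # Wait 1s, 2s, 4s, ... plus up to one second of random jitter.
            sleep(base_delay * 2 ** attempt + random.random())
```

With the Google client, request could wrap the batchGet(...).execute() call and is_retryable could check whether the HttpError carries status 429. That wiring is an assumption and should be verified against the client version in use.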
I hope this helped. :)

How can I get down time of a specific deployment in kubernetes?

I have a use case where I need to collect the downtime of each deployment (that is, when all the replicas (pods) are down at the same point in time).
My goal is to maintain the total down time for each deployment since it was created.
I tried getting it from deployment status, but the problem is that I need to make frequent calls to get the deployment and check for any down time.
Also the deployment status stores only the latest change. So, I will end up missing out the changes that occurred in between each call if there is more than one change(i.e., down time). Also I will end up making multiple calls for multiple deployments frequently which will consume more compute resource.
Is there any reliable method to collect the downtime data of a deployment?
Thanks in advance.
A monitoring tool like prometheus would be a better solution to handle this.
As an example, below is a graph from one of our deployments for last 2 days
If you look at the blue line for unavailable replicas, we had one replica unavailable from about 17:00 to 10:30 (ideally unavailable count should be zero)
This seems pretty close to what you are looking for.
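Once the replica counts are recorded as a time series, total downtime is a sum over the intervals in which no replica was available. A minimal sketch, assuming the samples have already been exported from the monitoring system as time-sorted (timestamp_seconds, available_replicas) pairs; the function name and data shape are illustrative:

```python
def total_downtime(samples, end_time):
    """Sum the seconds during which a deployment had zero available replicas.

    samples: time-sorted list of (timestamp_seconds, available_replicas).
    Each sample is assumed to hold until the next one; the last holds until end_time.
    """
    downtime = 0
    following = samples[1:] + [(end_time, None)]
    for (ts, available), (next_ts, _) in zip(samples, following):
        if available == 0:
            downtime += next_ts - ts
    return downtime
```

The accuracy is bounded by the scrape interval: an outage shorter than the gap between two samples can still be missed.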

kafka Streams session windows

Hello, I am working with a Kafka Streams session window with an inactivity gap of 5 minutes. I want some kind of feedback when the inactivity gap is reached and the session is dropped for the key.
Let's assume I have the record
(A,1)
where 'A' is the key. If I don't get any record with key 'A' within 5 minutes, the session is dropped.
I want to do some operation at the end of the session, let's say (value)*2 for that session. Is there any way I can achieve this using the Kafka Streams API?
Kafka Streams does not drop a session after the gap time has passed. Instead, it will create a new session if another record with the same key arrives after the gap time has passed, and maintain both sessions in parallel. This makes it possible to handle out-of-order data. It can even happen that two sessions get merged if an out-of-order record falls into the gap and "connects" both sessions with each other.
Sessions are maintained for 1 day by default. You can change this via the SessionWindows#until() method. If a session expires, it will be dropped silently. There is no notification. You also need to consider the config parameter windowstore.changelog.additional.retention.ms:
The default retention setting is Windows#maintainMs() + 1 day. You can override this setting by specifying StreamsConfig.WINDOW_STORE_CHANGE_LOG_ADDITIONAL_RETENTION_MS_CONFIG in the StreamsConfig.
Thus, if you want to react when a certain amount of time has passed, you should look into punctuations, which allow you to register regular callbacks (a kind of timer) based either on event-time progress or on wall-clock time. This allows you to react if a session has not been updated for a certain period of time and you consider it "completed".
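The timer idea can be illustrated independently of Kafka. The following is a plain Python simulation, not the Kafka Streams API: SessionTracker, process, and punctuate are hypothetical names that mimic a processor which remembers the last record time per key and, on each periodic callback, closes the sessions that have been idle for at least the gap.

```python
class SessionTracker:
    """Track the last activity per key and report sessions idle for at least gap_seconds."""

    def __init__(self, gap_seconds, on_session_end):
        self._gap = gap_seconds
        self._on_end = on_session_end   # called as on_session_end(key, values)
        self._sessions = {}             # key -> (last_seen, values)

    def process(self, key, value, timestamp):
        # Record the value and move the key's last-seen time forward.
        _, values = self._sessions.get(key, (None, []))
        values.append(value)
        self._sessions[key] = (timestamp, values)

    def punctuate(self, now):
        # Periodic callback: close every session whose idle time reached the gap.
        idle = [k for k, (last, _) in self._sessions.items() if now - last >= self._gap]
        for key in idle:
            _, values = self._sessions.pop(key)
            self._on_end(key, values)
```

In this simulation a late record for a key that was already closed simply starts a new session, so the callback is a "probably completed" signal rather than a guarantee.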

Memcache maximum key expiration time

What's memcached's maximum key expiration time?
If I don't provide an expiration time and the cache gets full, what happens?
You can set key expiration to a date by supplying a Unix timestamp instead of a number of seconds. This date can be more than 30 days in the future:
Expiration times are specified in unsigned integer seconds. They can be set from 0, meaning "never expire", to 30 days (60*60*24*30). Any time higher than 30 days is interpreted as a unix timestamp date. If you want to expire an object on january 1st of next year, this is how you do that.
https://github.com/memcached/memcached/wiki/Programming#expiration
But, as you say, if you’re setting key expiration to an amount of time rather than a date, the maximum is 2,592,000 seconds, or 30 days.
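The rule quoted above can be written down as a small function. This is a sketch of the documented interpretation only, with resolve_expiration as a hypothetical helper name; it is not memcached's own code:

```python
THIRTY_DAYS = 60 * 60 * 24 * 30  # 2,592,000 seconds


def resolve_expiration(exptime, now):
    """Return the absolute Unix time at which an item expires, or None for 'never'.

    exptime == 0            -> never expires
    0 < exptime <= 30 days  -> relative offset from now, in seconds
    exptime > 30 days       -> already an absolute Unix timestamp
    """
    if exptime == 0:
        return None
    if exptime <= THIRTY_DAYS:
        return now + exptime
    return exptime
```

Note the consequence of the last branch: a relative TTL just above 30 days is read as a date in early 1970, so the item counts as already expired.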
If you don't provide expiration and cache gets full then the oldest key-values are expired first:
Memory is also reclaimed when it's time to store a new item. If there are no free chunks, and no free pages in the appropriate slab class, memcached will look at the end of the LRU for an item to "reclaim". It will search the last few items in the tail for one which has already been expired, and is thus free for reuse. If it cannot find an expired item however, it will "evict" one which has not yet expired. This is then noted in several statistical counters
https://github.com/memcached/memcached/wiki/UserInternals#when-are-items-evicted
No there is no limit. The 30 days limit is if you give the amount of seconds it should stay there, but if you give a timestamp, there is only the max long or int value on the machine which can be a limit.
->set('key', 'value', time() + 24*60*60*365) will make the key stay there for a year for example, but yeah if the cache gets full or restarted in between, this value can be deleted.
An expiration time, in seconds. Can be up to 30 days. After 30 days,
is treated as a unix timestamp of an exact date.
https://code.google.com/p/memcached/wiki/NewCommands#Standard_Protocol
OK, I found out that the number of seconds may not exceed 2592000 (30 days). So the maximum expiration time is 30 days.
Looks like some answers are not valid anymore.
I found out a key does not get set at all when the TTL is too high. For example 2992553564.
Tested with the following PHP code:
var_dump($memcached->set($id, "hello", 2992553564)); // true
var_dump($memcached->get($id)); // empty!
var_dump($memcached->set($id, "hello", 500)); // true
var_dump($memcached->get($id)); // "hello"
Version is memcached 1.4.14-0ubuntu9.
In Laravel, if the config.session.lifetime setting is set to the equivalent of 30 days or more, it will be treated as a timestamp (this gives a token mismatch error every time, assuming that memcached is used).
To answer the question: the memcached expiration can be set to any time. (Laravel's default setting on v5.0 will give you an already expired timestamp.) If you do not set it, the default will be used.
If I don't provide an expiration time and the cache gets full, what happens?
If the expiration is not provided (or TTL is set to 0) and the cache gets full then your item may or may not get evicted based on the LRU algorithm.
Memcached provides no guarantee that any item will persist forever. It may be deleted when the overall cache gets full and space has to be allocated for newer items. Also in case of a hard reboot all the items will be lost.
From user internals doc
Items are evicted if they have not expired (an expiration time of 0 or
some time in the future), the slab class is completely out of free
chunks, and there are no free pages to assign to a slab class.
Below is how you can reduce the chances of your item getting cleaned up by the LRU job.
Create an item that you want to expire in a
week? Don't always fetch the item but want it to remain near the top
of the LRU for some reason? add will actually bump a value to the
front of memcached's LRU if it already exists. If the add call
succeeds, it means it's time to recache the value anyway.
source on "touch"
It is also good to monitor the overall memory usage of memcached for resource planning, and to track the eviction statistics counter to know how often items are being evicted due to lack of memory.