Adding an exception to Kubernetes garbage collection

In our project we have multiple cron jobs that use very large images and are configured to run quite often.
Whenever the garbage collection threshold is met, the images associated with those cron jobs are removed because they are not currently in use. Pulling those images from the repository whenever they are needed causes problems because of their size.
My question is: can I make garbage collection omit the images associated with cron jobs? Is there a way to add an exception?
So far the only thing I have come up with is creating another deployment that uses the same image 24/7, with some changes so that its execution never finishes normally. That way the image is in use whenever garbage collection is triggered.

I don't know of a way to specify a list of image name exceptions for the image garbage collection policy, but you may be able to work around it by overriding the default value (2 minutes) of
Minimum age for an unused image before it is garbage collected.
through the following kubelet flag:
--minimum-image-ttl-duration=12h (the default is 2m, i.e. 2 minutes)
The other user-controlled flags are documented here.
I found the flag above in the kubelet source code on GitHub.
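To make the effect of a minimum age concrete, here is a rough Python sketch of how an age floor changes which unused images are picked for removal. It is a simplified model, not the kubelet's code: the Image fields, the select_for_removal helper, the "least recently used first" ordering and the idea that age is counted from when the node first saw the image are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Image:
    name: str
    size: int          # arbitrary size units
    in_use: bool       # True if a running container references the image
    age_minutes: int   # assumed: minutes since the node first saw the image
    idle_minutes: int  # assumed: minutes since the image was last used


def select_for_removal(images, to_free, min_age_minutes):
    """Pick unused images, longest idle first, until `to_free` units are reclaimed.

    Images younger than `min_age_minutes` are skipped, which is the role the
    minimum age setting plays in this toy model.
    """
    victims = []
    freed = 0
    candidates = sorted(
        (img for img in images if not img.in_use),
        key=lambda img: img.idle_minutes,
        reverse=True,
    )
    for img in candidates:
        if freed >= to_free:
            break
        if img.age_minutes < min_age_minutes:
            continue  # too young: protected by the minimum age
        victims.append(img.name)
        freed += img.size
    return victims
```

In this model a larger minimum age only protects images that are younger than that age, so it is worth checking on a real node whether it actually keeps the cron-job images around between runs.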

Related

Can you calculate active users using time series

My Atomist client exposes metrics on the commands that are run. Each command is a metric with a Username label as well as a status label.
I've been scraping this data for months without resetting the counts.
My requirement is to show the number of active users over a time period (i.e. 1h, 1d, 7d and 30d) in Grafana.
The original query was:
count(count({Username=~".+"}) by (Username))
This is a problem because I don't clear the metrics, so it is always a count since inception.
I then tried this:
count(
  max_over_time(help_command{job="Application Name",Username=~".+"}[1w])
  -
  max_over_time(help_command{job="Application Name",Username=~".+"}[1w] offset 1w)
  > 0
)
which works, but only for one command. I have about 50 other commands that need to be added to that count.
I tried the:
"{__name__=~".+_command",job="app name"}[1w] offset 1w"
but this is obviously very expensive (timeout in browser) and has issues with integrating max_over_time which doesn't support it.
Any help is appreciated. Am I using the metric in the wrong way? Is there a better way to query this? My only option at the moment is to repeat the count query that works above for each command.
Thanks in advance.
To start, I will point out a number of issues with your approach.
First, the Prometheus documentation recommends against using arbitrarily large sets of values for labels (as your usernames are). As you can see (based on your experience with the query timing out) they're not entirely wrong to advise against it.
Second, Prometheus may not be the right tool for analytics (such as active users). Partly due to the above, partly because it is inherently limited by the fact that it samples the metrics (which does not appear to be an issue in your case, but may turn out to be).
Third, you collect separate metrics per command (i.e. help_command, foo_command) instead of a single metric with the command name as a label (i.e. command_usage{command="help"}, command_usage{command="foo"}).
To get back to your question though, you don't need the max_over_time, you can simply write your query as:
count by(__name__)(
(
{__name__=~".+_command",job="Application Name"}
-
{__name__=~".+_command",job="Application Name"} offset 1w
) > 0
)
This only works, though, because you say that whatever exports the counts never resets them. If that is only because the exporter has never restarted, and the counts will drop to zero when it does, then you'd need to use increase instead of subtraction, and you'd run into the exact same performance issues as with max_over_time:
count by(__name__)(
increase({__name__=~".+_command",job="Application Name"}[1w]) > 0
)
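To illustrate why plain subtraction breaks once a counter resets, here is a small Python sketch that compares a naive "last minus first" delta with a reset-aware sum over the same samples. It is a simplified model of the idea behind increase, not Prometheus's implementation (it leaves out the extrapolation Prometheus applies at the window edges), and both function names are made up for this example.

```python
def naive_delta(samples):
    """Last sample minus first sample, as the subtraction query effectively does."""
    return samples[-1] - samples[0]


def reset_aware_increase(samples):
    """Sum the growth between consecutive samples.

    A drop between two samples is treated as a counter reset, so the new
    value is counted as growth from zero.
    """
    total = 0
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            total += cur  # counter restarted from zero
    return total


# A counter that climbs to 15, restarts, then climbs to 4.
samples = [10, 12, 15, 2, 4]
print(naive_delta(samples))           # -6
print(reset_aware_increase(samples))  # 9
```

With the subtraction form, a user whose counter restarted during the week yields a negative difference and is filtered out by "> 0", even though that user was active.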

Kubernetes: can different images run at the same time, and would that cause an issue?

Let's say I have a pod that uses the image myImage:latest (on docker.hub), whose imagePullPolicy is Always, and that is running on worker node 1. The image ID is myImage:latest#sha-1111.
Now say I push a new image to docker.hub as myImage:latest#sha-2222. After that I scale my pod up, and the new pod is scheduled on worker node 2, which pulls the new image to start it. In this case, I suppose the new pod will be using image sha-2222? Would this be an issue? How could we solve it?
PS:
Note that you should avoid using :latest tag, see Best Practices for Configuration for more information.
I understand that using the latest tag is bad practice, but I believe this will also happen if we tag with a specific number.
If you tag to a specific number, then that specific number will get pulled onto node2 -> no issue.
If you don't tag a specific number, but use latest (as you noted, not recommended) -> outcome is determined by whether the container's behavior is backwards compatible. E.g., if the first container is v1.1.0 and the second container is v1.2.0 and your versioning is based on semantic versioning, you still should have no practical issue.
To solve, use specific image versions and perform an upgrade at the time of doing the scaling. Existing instances will be updated to the new version, and new instances (to match scaling needs) will be pulled from the new version.

radosgw: shadow files remain after I delete objects in a pool

I deployed rgw in my cluster, and during testing I frequently uploaded and deleted objects. Afterwards I found that a lot of shadow files remained in .rgw.buckets. I tried to run the command radosgw-admin temp remove, but it gave me an error saying that the argument remove cannot be recognized. I also tried to configure gc, but gc list always gives me an empty list.
Could someone tell how to deal with shadow file or how to delete them?
Thanks so much
The gc triggers after some time, but it can take a few hours before it gets through all the shadow objects. What does radosgw-admin gc list --include-all show? (In general, --include-all may show the objects pending deletion.) Does the list shrink after a few hours?
Another option is to try finding orphaned objects using radosgw-admin orphans find on the pool; these can be deleted manually later via a rados client of your choice.

Can watchman send why a file changed?

Can Watchman tell the configured command why it is sending a file to that command?
For example:
a file is new to a folder would possibly be a FILE_CREATE flag;
a file that is deleted would send to the command the FILE_DELETE flag;
a file that's modified would send a FILE_MOD flag etc.
Perhaps even when a folder gets deleted (and therefore the files thereunder) would send a FOLDER_DELETE parameter naming the folder, as well as a FILE_DELETE to the files thereunder / FOLDER_DELETE to the folders thereunder
Is there such a thing?
No, it can't do that. The reasons why are pretty fundamental to its design.
The TL;DR is that it is a lot more complicated than you might think for a client to correctly process those individual events and in almost all cases you don't really want them.
Most file watching systems are abstractions that simply translate the system-specific notification information into some common form. They deal poorly, or not at all, with the notification queue overflowing, and they don't give their clients a way to respond reliably to that situation.
In addition to this, the filesystem can be subject to many and varied changes in a very short amount of time, and from multiple concurrent threads or processes. This makes this area extremely prone to TOCTOU issues that are difficult to manage. For example, creating and writing to a file typically results in a series of notifications about the file and its containing directory. If the file is removed immediately after this sequence (perhaps it was an intermediate file in a build step), by the time you see the notifications about the file creation there is a good chance that it has already been deleted.
Watchman takes the input stream of notifications and feeds it into its internal model of the filesystem: an ordered list of observed files. Each time a notification is received watchman treats it as a signal that it should go and look at the file that was reported as changed and then move the entry for that file to the most recent end of the ordered list.
When you ask Watchman for information about the filesystem it is possible or even likely that there may be pending notifications still due from the kernel. To minimize TOCTOU and ensure that its state is current, watchman generates a synchronization cookie and waits for that notification to be visible before it responds to your query.
The combination of the two things above means that watchman result data has two important properties:
You are guaranteed to have observed all notifications that happened before your query
You receive the most recent information for any given file only once in your query results (the change results are coalesced together)
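The coalescing behaviour described above can be sketched as a toy model in Python. This is only an illustration of the idea, not how Watchman is implemented: the ChangeLog class, its integer clock and the exists flag are hypothetical simplifications.

```python
class ChangeLog:
    """Toy model: one entry per path, ordered by most recent observation."""

    def __init__(self):
        self._tick = 0
        self._entries = {}  # path -> (tick, exists); dict order tracks recency

    def notify(self, path, exists=True):
        """Record that `path` was looked at again after a notification."""
        self._tick += 1
        # Remove and re-insert so the entry moves to the most recent end.
        self._entries.pop(path, None)
        self._entries[path] = (self._tick, exists)

    def clock(self):
        """Return an opaque position that a client can pass to `since` later."""
        return self._tick

    def since(self, clock):
        """Return each path changed after `clock` exactly once, with its latest state."""
        return [
            (path, exists)
            for path, (tick, exists) in self._entries.items()
            if tick > clock
        ]


log = ChangeLog()
start = log.clock()
log.notify("build/tmp.o")                # created
log.notify("src/a.c")                    # modified
log.notify("build/tmp.o")                # written again
log.notify("build/tmp.o", exists=False)  # removed by the build step
print(log.since(start))  # [('src/a.c', True), ('build/tmp.o', False)]
```

The three notifications for build/tmp.o collapse into a single entry that reports only its latest state, so the client never learns that the file was created and written before it was removed.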
Let's talk about the overflow case. If your system is unable to keep up with the rate at which files are changing (e.g. you have a big project, you are creating and deleting files very quickly, and the system is heavily loaded), the OS can't fit all of the pending notifications in the buffer resources allocated to the watches. When that happens, it blows those buffers and sends an overflow signal. What that means is that the client of the watching API has missed some number of events and is no longer in synchronization with the state of the filesystem. If that client maintains state about the filesystem, that state is no longer valid.
Watchman addresses this situation by re-examining the watched tree and synthetically marking all of the files as being changed. This causes the next query from the client to see everything in the tree. We call this a fresh instance result set because it is the same view you'd get when you are querying for the first time. We set a flag in the result so that the client knows that this has happened and can take appropriate steps to repair its own state. You can configure this behavior through query parameters.
In these fresh instance result sets, we don't know whether any given file really changed or not (it's possible that it changed in such a way that we can't detect via lstat) and even if we can see that its metadata changed, we don't know the cause of that change.
There can be multiple events that contribute to why a given file appears in the results delivered by watchman. We don't record them individually because we can't track them with unbounded history; imagine a file that is incrementally written once every second, all day long. Do we keep 86400 change entries per day on hand for it and deliver those to our clients? What if there are hundreds of thousands of files like this? We'd have to truncate that data, and at that point the loss of data reduces how well you can reason about it.
At the end of all of this, it is very rare for a client to do much more than try to read a file or look at its metadata, and generally speaking, they want to do that only when the file has stopped changing. For this use case, watchman-wait, watchman-make and trigger all have the concept of a settle period that causes the change notifications to be delayed in delivery until after the filesystem has stopped changing.
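The idea of a settle period can be sketched in a few lines of Python. This is a generic debounce model, not Watchman's code: the settled_batches function, the millisecond timestamps and the batch layout are assumptions chosen for illustration.

```python
def settled_batches(event_times_ms, settle_ms):
    """Group change events into batches separated by quiet gaps.

    A batch is delivered `settle_ms` after its last event, i.e. only once
    nothing has changed for a full settle period. Returns a list of
    (delivery_time_ms, events_in_batch) tuples.
    """
    batches = []
    current = []
    for t in sorted(event_times_ms):
        if current and t - current[-1] >= settle_ms:
            # The quiet gap was long enough: the previous batch was delivered.
            batches.append((current[-1] + settle_ms, current))
            current = []
        current.append(t)
    if current:
        batches.append((current[-1] + settle_ms, current))
    return batches


# Three rapid writes, a long pause, then one more write.
print(settled_batches([0, 100, 200, 5000], settle_ms=1000))
# [(1200, [0, 100, 200]), (6000, [5000])]
```

The client is woken twice rather than four times, and each time only after the files have stopped changing for the whole settle period.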

Swiftstack - Containers not getting removed

Even after containers and objects have been deleted directly from the file system, Swift still lists the containers when a GET is executed on the account. However, if we try to delete a container with a DELETE, a 404: Not Found error is returned. Is something wrong, or is there some kind of cache?
I think the problem came from deleting the containers and/or objects directly from the file system.
Swift's methods for handling write requests for objects and containers have to be very careful to ensure that all the distributed index information remains eventually consistent. Direct modification of the file system is not sufficient. It sounds like the container databases were removed before they had a chance to update the account database listings; perhaps they were manually unlinked before all of the object index information was removed?
Normally, after a delete request the containers have to hang around for a while as "tombstones" to ensure that the account database gets updated correctly.
As a workaround you could recreate them (with a PUT, which is the request Swift uses to create a container) and then re-issue the DELETE. The DELETE of the new, empty containers should then succeed and update the account database listing directly.
(Note: the container databases themselves, although empty, will still exist on disk as tombstones until the reclaim_age passes.)