Sensu Dashboard contains stale events - sensu

I am trying to clear all the stale, old, renamed clients from my Uchiwa dashboard, but I am unable to clear them.
Basically, I renamed the clients, but the old clients keep showing up on the dashboard.
I restarted sensu-server, the Uchiwa dashboard, and redis-server, but I still keep seeing the stale entries.

I had to get into the redis-cli prompt and run flushall. This cleared all the old events. Ideally a redis-server restart should have cleared it. Well, now it's all neat.
I also restarted rabbitmq-server.

I see you found a solution, but maybe not the only one.
What's happening is that clients do not automatically deregister, which is intended behaviour, since otherwise you would not get any "keepalive" alerts.
Moreover, you might at some point want to check whether a check has not been running for a while, like we do, but those alerts will also hang around if, for example, you only tested a check once and never executed it again.
You can therefore just go to the corresponding client in Uchiwa and delete it from there.
You can perform the same action with a curl request against the sensu-api, as sketched below.
The advantage over hitting flushall is that you only delete the client(s) you want to delete, whereas flushall will also remove all your stashes, which can be very annoying at a larger scale.
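A minimal sketch of that API call (assuming a Sensu 0.x/1.x API on its default port 4567 and a stale client named old-client-name, both placeholders here):

```python
# Minimal sketch: remove a single stale client via the Sensu (0.x/1.x) API.
# The API address and client name are placeholders -- adjust for your setup.
import requests

SENSU_API = "http://localhost:4567"  # hypothetical API address
client_name = "old-client-name"      # hypothetical stale client

resp = requests.delete(f"{SENSU_API}/clients/{client_name}", timeout=10)

# A 2xx response means the deletion was accepted; 404 means no such client.
print(resp.status_code)
```

This removes only that one client's data, so your stashes and other clients are untouched.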

In Uchiwa, under Clients, you can select a client and delete it. The client will then disappear.


How do you dynamically insert console logs on a development server?

When you're developing on localhost, you have full access to a terminal and can log anywhere you want. But in a project I work on (and I am new to team collaboration as a whole), they use something called weavescope to view the logs that developers have created at the time of coding.
The difference between this and logging locally is that every time you make a change in the code, you have to send a pull request; they approve it, merge it, and deploy it, and only then do we finally see it in the log. Sometimes the state of local and deployed code doesn't match, and it really makes us want to log dynamically on the development server without going through all these cycles again. Is there any solution already out there that lets us insert some quick log statements without going through the routine PR, merge, deploy cycle?
EDIT: From the discussions I had below, I think the tool I am looking for is more or less a logging-statement code-injection tool: a tool that would keep track of the log statements I insert into the production code and toggle them on and off with a single command.
This seems like something that logging levels can help with (unless I'm misunderstanding). Something I typically do is leave debug-level log messages on commonly problematic or complex functions, but change the logging level to something higher when I move out of local. Sometimes, depending on the app and your access, these can be configured in the environment rather than in the build.
For example, there are Spring libraries that let you import a static logger and set the level of each message you log. Locally you can keep the level at DEBUG, in UAT the level can be INFO, and if you only want ERROR or WARN messages in prod you can separate that too. At deployment time you can set which environment it is and keep a separate app.properties or YAML file for each environment that stores the desired level.
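The same idea transposed to Python's standard logging module as a sketch (the LOG_LEVEL variable name is just an illustrative convention; in Spring Boot the analogous knob is the logging.level.* properties in each environment's properties/yml file):

```python
# Sketch of environment-driven log levels: DEBUG locally, INFO in UAT,
# WARNING/ERROR in prod -- controlled by an environment variable instead
# of a code change. LOG_LEVEL is an illustrative name, not a standard.
import logging
import os

level_name = os.environ.get("LOG_LEVEL", "INFO")  # e.g. DEBUG / INFO / WARNING
logging.basicConfig(
    level=getattr(logging, level_name.upper(), logging.INFO),
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)

log = logging.getLogger("payments")

log.debug("request payload: %s", {"id": 42})  # visible only when LOG_LEVEL=DEBUG
log.info("payment processed")                 # visible at INFO and below
log.warning("retrying flaky upstream call")   # visible in almost every environment
```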
Of course there is a solution for fast-paced code changes.
Maybe this kind of hot reloading is what you're looking for. This way you can insert new calls to a logger or console.log quickly.
It does come with a disclaimer from the author, though:
I honestly haven't looked into whether this method of hot reloading would provide stable production zero-downtime deploys, however my "gut feel" says don't do it. And production deployments are probably one area where we should stick to known, trusted procedures unless we have good reason.

Why is my MongoDB collection deleted automatically?

I have MongoDB running on three EC2 instances and I have created a replica set. Last time I had a space-constraint problem that stopped my mongod process, thereby halting the application. Then, in an incident a couple of days back, some of my collections were gone from the database, so I set up logging on my database just to catch anything like that happening again. In a fresh incident this morning I was unable to log in to my system, and that's when I found out the whole database was empty. I checked other SO questions like this one, which suggest setting up a TTL, which I haven't done at all.
Now how do I debug this situation and do a proper root-cause analysis? I can't find anything in my debug logs either. The collections just vanished. How do I set up a proper logging mechanism, and how do I ensure that my collections are never deleted again?
Today I got an email from Amazon saying that I was probably running an unsecured MongoDB instance and that this may have caused the issue. So whoever is facing this issue, please go through the Security Checklist provided by MongoDB. Some of the points in there are absolutely necessary:
1. Enable Access Control and Enforce Authentication
2. Encrypt Communication
3. Limit Network Exposure
These three are the core, and depending on how many people access your database you can also configure Role-Based Access Control.
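As a rough sketch of point 1 using pymongo (all hostnames, user names, and passwords below are placeholders, and the exact steps depend on your MongoDB version):

```python
# Sketch of point 1 (access control) with pymongo. Hostnames, user names,
# and passwords are placeholders -- do not use them as-is.
from pymongo import MongoClient

# Step 1: while authorization is still disabled, create an administrative user.
client = MongoClient("mongodb://localhost:27017/")
client.admin.command(
    "createUser",
    "siteAdmin",
    pwd="use-a-strong-password-here",
    roles=[{"role": "userAdminAnyDatabase", "db": "admin"}],
)

# Step 2: restart mongod with authorization enabled
# (security.authorization: enabled in mongod.conf), then reconnect with
# credentials. Unauthenticated clients can no longer read or drop data.
secured = MongoClient(
    "mongodb://siteAdmin:use-a-strong-password-here@localhost:27017/",
    authSource="admin",
)
print(secured.admin.command("ping"))
```

Points 2 and 3 are handled outside the driver: TLS for encrypted communication, and binding mongod to private interfaces plus security-group rules so it is not reachable from the open Internet.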
These are all things I have done. Before this incident I had not taken security that seriously, but after I was hit by it I made sure I had all the necessary precautions in place.
Hope this helps someone.

Distributed Recovery - can this be done without a timeout?

We have a mail-sender application that receives a batch of mails in one blob and then puts all of those mails into the database. This can take up to ten minutes. During this process the state of the mailing is BUILDING.
When it is finished the state gets changed to READY.
When the server crashes (which shouldn't happen, of course) and restarts, it looks for all mailings with status BUILDING and marks them as ERROR. This happens because we never want to send incomplete mailings.
Now we'd like to scale up using a second server. The recovery strategy above doesn't work here.
For example, server 1 is BUILDING a mailing, and server 2 crashes and restarts. Now server 2 sees the BUILDING mailing and doesn't know whether it has been aborted or whether it's still running on another server.
So what's the best recovery strategy for distributed services?
(We thought about a timeout mechanism, where the BUILDING server updates a timestamp every few seconds, and when a server reboots it checks whether there is a BUILDING mailing that hasn't been updated for x minutes. In that case it's highly likely that this mailing has been aborted.)
EDIT:
What I'd like to achieve: If some server restarts (after a crash or just because we added a new mailing server to the cluster), it should not mark mailings as ERROR if this particular mailing is actually being built (by another server).
Nice to have: it would work without having to store server IDs, because then it's easy to add and/or remove servers. Otherwise it would not be possible to completely remove a server, because there might still be a BUILDING mailing with that particular server ID, but the server has been removed and will never start again, so the only server that could set the mailing to ERROR is gone.
Add two things to your state tracking: a timestamp and the server working on it.
If a server starts up and sees anything in a BUILDING state for itself, it knows that build failed. Conversely, if it starts up and sees something in a BUILDING state for another server, it now has information it will need to check later to see whether there's a problem to address. You also need to worry about multiple servers restarting at the same time, so you can't just have one server grab all old BUILDING mailings for all servers at startup.
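A rough sketch of that timestamp-plus-owner idea (SQLite and every name in it -- the mailings table, the 5-minute threshold, the server IDs -- are purely illustrative):

```python
# Sketch of the "timestamp + owning server" idea from the answer above.
# SQLite and all names (table, columns, threshold) are illustrative only.
import sqlite3
import time

STALE_AFTER_SECONDS = 5 * 60  # assumed heartbeat timeout

SCHEMA = """CREATE TABLE IF NOT EXISTS mailings (
    id INTEGER PRIMARY KEY,
    state TEXT NOT NULL,          -- BUILDING / READY / ERROR
    server_id TEXT,               -- which server is building it
    heartbeat_at REAL             -- unix timestamp of the last heartbeat
)"""

def heartbeat(db, mailing_id, server_id):
    """Called every few seconds by the server that is BUILDING the mailing."""
    db.execute(
        "UPDATE mailings SET server_id = ?, heartbeat_at = ? "
        "WHERE id = ? AND state = 'BUILDING'",
        (server_id, time.time(), mailing_id),
    )
    db.commit()

def recover_on_startup(db, my_server_id):
    """Run once when a server (re)starts."""
    now = time.time()
    # Anything this server itself left in BUILDING died with it.
    db.execute(
        "UPDATE mailings SET state = 'ERROR' "
        "WHERE state = 'BUILDING' AND server_id = ?",
        (my_server_id,),
    )
    # Mailings owned by other servers are failed only once their heartbeat
    # has gone stale; a live build keeps refreshing it.
    db.execute(
        "UPDATE mailings SET state = 'ERROR' "
        "WHERE state = 'BUILDING' AND server_id != ? AND heartbeat_at < ?",
        (my_server_id, now - STALE_AFTER_SECONDS),
    )
    db.commit()

db = sqlite3.connect("mailings.db")
db.execute(SCHEMA)
recover_on_startup(db, my_server_id="server-2")
```

If you want the nice-to-have of not tracking server IDs at all, you could drop the first rule and rely only on the stale-heartbeat check; the trade-off is that even a server's own crashed builds sit in BUILDING until the timeout expires.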
Or you can just use a clustering service for your OS.

How should I determine what is issuing a flush_all command

We have a memcached server that is shared by about two dozen apps. One of the web apps (or perhaps one of our utility apps) is issuing a flush_all command periodically. The frequency seems random, or at least we haven't seen a pattern yet. It happens about 10 times an hour.
Here's the rub: I can't find a good way to figure out which app is doing this. The memcached logs are not helpful at all. Here's what I've done so far:
* grep all source code - Other than memcached libraries, I can't see anywhere that we issue this command.
* Enable verbose logging (-vv) in memcached - I see the commands get issued, but the log doesn't show any information about where the command is being issued from.
* Research how to administratively disable this; without an unapproved source patch to memcached I can't figure out a good way to do it.
Has anyone else had this problem? I'm assuming that this is coming from one of our web apps, but it's possible it's from somewhere else too. Any suggestions?
My next step is to set up a second memcached server and move applications over one by one (which will be slow and time-consuming). There must be a better way.
A little late, but in case anyone else hits this...
I'd suggest you set up multiple memcache proxies and configure each application to use a different one. The first proxy I found was twemproxy; I have no idea how good it is.
After that you can use each proxy's logs to identify which application is issuing the commands.
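If you just want a quick way to see which client connection is sending the command before committing to a real proxy, a tiny logging relay in front of memcached can do it. This is only a toy sketch (the ports are illustrative, and the per-packet substring check is naive), not a substitute for twemproxy:

```python
# Toy TCP relay that sits in front of memcached and logs which client
# connection sends flush_all. Ports are illustrative: point one suspect app
# at 11212 and leave the real memcached on 11211.
import asyncio

MEMCACHED = ("127.0.0.1", 11211)  # real memcached
LISTEN_PORT = 11212               # where the app under suspicion connects

async def pipe(reader, writer, peer=None, inspect=False):
    try:
        while data := await reader.read(4096):
            if inspect and b"flush_all" in data:  # naive: assumes the command fits in one read
                print(f"flush_all issued by {peer}")
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()

async def handle(client_reader, client_writer):
    peer = client_writer.get_extra_info("peername")
    backend_reader, backend_writer = await asyncio.open_connection(*MEMCACHED)
    await asyncio.gather(
        pipe(client_reader, backend_writer, peer=peer, inspect=True),  # app -> memcached
        pipe(backend_reader, client_writer),                           # memcached -> app
    )

async def main():
    server = await asyncio.start_server(handle, "0.0.0.0", LISTEN_PORT)
    async with server:
        await server.serve_forever()

asyncio.run(main())
```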

What's the best way to update code remotely?

For example, I have a website with various types of information. If it goes down, the users have a copy of the same website on a local web server, like Apache or IIS, on the client. They use this local version until the Internet version returns. In other words, they can have no downtime.
The problem is that over time the Internet version will change while the client versions will remain the same unless I touch each client's machine to make the updates. I don't want to do that.
Is there a good way to keep my clients up to date, so that when I make a change on the server each client gets a copy and can run it locally if need be?
Thank you.
EDIT: Do you think using SVN, with the clients running the update on a schedule, would work?
EDIT: They'll never submit anything. It's just so I don't have to update each client by hand by going to the machine manually. They're web pages that run in case the main server is down.
I would go for Git over SVN because of its distributed nature; it gives you multiple copies of the code. Use it along with the solution in:
Making git auto-commit
to auto-commit on the server.
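The client side then just pulls on a schedule. A minimal sketch (the repository path and interval are placeholders; the server side is whatever the auto-commit setup above produces):

```python
# Client-side updater sketch: periodically pull the auto-committed repo so
# the local copy of the site stays current. Path and interval are placeholders.
import subprocess
import time

REPO_DIR = "/var/www/local-mirror"   # hypothetical local checkout of the site
PULL_EVERY_SECONDS = 15 * 60         # how often to refresh

while True:
    result = subprocess.run(
        ["git", "-C", REPO_DIR, "pull", "--ff-only"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Leave the last good copy in place if the pull fails (e.g. offline).
        print("pull failed:", result.stderr.strip())
    time.sleep(PULL_EVERY_SECONDS)
```

A cron entry running the same git pull would work equally well; the point is just that the clients fetch on a schedule instead of being updated by hand.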
Why not use something like HTTrack to make local copies of your actual Internet site on each machine, rather than trying to do a separate deployment? That way you'll automatically stay in sync.
This has the advantage that if, at some point, part of your website is updated dynamically from a database, the user will still be able to have a static copy of the resulting site that is up-to-date.
There are tools like rsync which you can use periodically to sync the changes.