Sensu transport handler define max record count - sensu

I have a Sensu metric check with a transport handler defined on it. My transport is Redis, and I can see that a new record is added to Redis on every metric cycle. However, I would like to define a maximum record count or a record TTL so I don't store endless metrics data in Redis. How can I do this from the handler JSON declaration? Here is my handler definition:
{
  "handlers": {
    "redis_handler": {
      "type": "transport",
      "mutator": "only_check_output",
      "pipe": {
        "type": "direct",
        "name": "example_handler_queue"
      }
    }
  }
}
Thanks

You can use the occurrences filter extension for Sensu.
The occurrences filter determines whether an event's occurrence count
meets the user-defined requirements in the event's check definition.
You can specify a minimum number of occurrences before an event will
be passed to a handler. You can also specify a refresh time, in
seconds, to reset the point from which occurrences are counted.


When doing an upsert to MongoDb is it possible to set a field with a timestamp only if other data in the record has changed?

We need to cache records for a service with a terrible API.
This service provides an API to query data about our employees, but it does not tell us whether employees are new or have been updated, nor can we filter our queries for this information.
Our proposed solution to the problems this creates for us is to periodically (e.g. every 15 minutes) query all our employee data and upsert it into a Mongo database. Then, when we write to the MongoDb, we would like to include an additional property which indicates whether the record is new or whether the record has any changes since the last time it was upserted (obviously not including the field we are using for the timestamp).
The idea is, instead of querying the source directly, which we can't filter by such timestamps, we would instead query our cache which would include said timestamp and use it for a filter.
(Ideally, we'd like to write this in C# using the MongoDb driver, but more important right now is whether we can do this in an upsert call or whether we'd need to load all the records into memory, do comparisons, and then add the timestamps before upserting them.)
There might be a way of doing that, but how efficient it is remains to be seen. The update command in MongoDB can take an aggregation pipeline to perform the update operation. We can use the $addFields stage to add a new field denoting the update status, and $function to compute its value. A short example:
db.collection.update(
  { key: 1 },
  [
    {
      "$addFields": {
        changed: {
          "$function": {
            lang: "js",
            "args": [
              "$$ROOT",
              { "key": 1, data: "somedata" }
            ],
            "body": "function(originalDoc, newDoc) { return JSON.stringify(originalDoc) !== JSON.stringify(newDoc) }"
          }
        }
      }
    }
  ],
  { upsert: true }
)
Here's the playground link.
Some points to consider here:
If the order of fields in the old and new versions of the doc is not the same, the JSON.stringify comparison will report a change even when the data is identical.
The function specified in $function runs on the server side, so ideally it should be lightweight. If a large number of documents get upserted, it may or may not become a bottleneck.
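The question mentions the C# driver; for illustration, here is roughly the same pipeline-style upsert written with Python's pymongo driver. This is only a sketch: the connection string, database, and collection names are placeholders, pipeline updates need MongoDB 4.2+, and $function needs 4.4+.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
coll = client["hr_cache"]["employees"]             # placeholder database/collection names

new_doc = {"key": 1, "data": "somedata"}

# Same idea as the shell example: let the server compare the stored document
# ($$ROOT) against the incoming one and record the result in a "changed" field.
coll.update_one(
    {"key": new_doc["key"]},
    [
        {
            "$addFields": {
                "changed": {
                    "$function": {
                        "lang": "js",
                        "args": ["$$ROOT", new_doc],
                        "body": "function(originalDoc, newDoc) {"
                                " return JSON.stringify(originalDoc) !== JSON.stringify(newDoc); }",
                    }
                }
            }
        }
    ],
    upsert=True,
)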

Inserting multiple key value pair data under single _id in cloudant db at various timings?

My requirement is to collect JSON pairs from an MQTT subscriber at different times under a single _id in Cloudant, but when I try to insert a new JSON pair into an existing _id it simply replaces the old one. I need at least 10 JSON pairs under one _id, injected at different times.
First, you should question the architectural decision to update a particular document multiple times. In general this is discouraged, though it depends on your application. Instead, you could insert each new piece of information as a separate document and then use a map-reduce view to reflect the state of your application.
For example (I'm going to assume that you have multiple "devices", each with some kind of unique identifier, that need to add data to a cloudant DB)
PUT
{
  "info_a": "data a",
  "device_id": 123
}
{
  "info_b": "data b",
  "device_id": 123
}
{
  "info_a": "message a",
  "device_id": 1234
}
Then you'll need a map function like
_design/device/_view/state
function (doc) {
  emit(doc.device_id, 1);
}
Then you can GET the results of that view to see all of the "info_X" data that is associated with the particular device.
GET account.cloudant.com/databasename/_design/device/_view/state
{"total_rows":3,"offset":0,"rows":[
{"id":"28324b34907981ba972937f53113ac3f","key":123,"value":1},
{"id":"d50553d206d722b960fb176f11841974","key":123,"value":1},
{"id":"eaa710a5fa1ff4ba6156c997ddf6099b","key":1234,"value":1}
]}
Then you can use the query parameters to control the output, for example
GET account.cloudant.com/databasename/_design/device/_view/state?key=123&include_docs=true
{"total_rows":3,"offset":0,"rows":[
{"id":"28324b34907981ba972937f53113ac3f","key":123,"value":1,"doc":
{"_id":"28324b34907981ba972937f53113ac3f",
"_rev":"1-bac5dd92a502cb984ea4db65eb41feec",
"info_b":"data b",
"device_id":123}
},
{"id":"d50553d206d722b960fb176f11841974","key":123,"value":1,"doc":
{"_id":"d50553d206d722b960fb176f11841974",
"_rev":"1-a2a6fea8704dfc0a0d26c3a7500ccc10",
"info_a":"data a",
"device_id":123}}
]}
And now you have the complete state for device_id:123.
Timing
Another issue is the rate at which you're updating your documents.
Bottom line recommendation is that if you are only updating the document once per ~minute or less frequently, then it could be reasonable for your application to update a single document. That is, you'd add new key-value pairs to the same document with the same _id value. In order to do that, however, you'll need to GET the full doc, add the new key-value pair, and then PUT that document back to the database. You must make sure that you are providing the most recent _rev of that document, and you should also check for conflicts that could occur if the document is being updated by multiple devices.
If you are acquiring new data for a particular device at a high rate, you'll likely run into conflicts very frequently -- because cloudant is a distributed document store. In this case, you should follow something like the example I gave above.
Example flow for the second approach outlined by @gadamcox for use cases where document updates are not required very frequently:
[...] you'd add new key-value pairs to the same document with the same _id value. In order to do that, however, you'll need to GET the full doc, add the new key-value pair, and then PUT that document back to the database.
Your application first fetches the existing document by id: (https://docs.cloudant.com/document.html#read)
GET /$DATABASE/100
{
"_id": "100",
"_rev": "1-2902191555...",
"No": ["1"]
}
Then your application updates the document in memory
{
"_id": "100",
"_rev": "1-2902191555...",
"No": ["1","2"]
}
and saves it in the database by specifying the _id and _rev (https://docs.cloudant.com/document.html#update)
PUT /$DATABASE/100
{
"_id": "100",
"_rev": "1-2902191555...",
"No":["1","2"]
}

What's the pattern to request for all records using RESTful app?

I would like to know what's the best approach for my RESTful app when requesting all records.
For example, I limit my responses to 10 if no $top value is provided to avoid overload. However, how can I format my request? Which is better, $top=allrows or $top=all? Is there any pattern I should check for?
If $top value is not provided I only return up to 10 rows.
GET /products?$top=
I just want to avoid this:
GET /products/all
There's no official pattern and any choice would depend on the size of your data.
Whatever you do, always put a maximum limit to the number of items you'll return regardless of parameters the client provides in the request.
Also, create a default count to return when no information is provided by parameters.
If you don't have tons of items to return, you could set your default count to the max limit; that would be enough to always return everything, and a URL without any count details could simply return all items.
GET /products (no count/provided)
If you have hundreds or thousands of items and a default count of, say, 100, allow an explicit count to extend that limit (up to the max, of course; if the request asks for a count greater than the max, return a 400 Bad Request with a message indicating that count can't be higher than the max).
GET /products?count=1000000
However, this could be horrible for your server(s) and/or the client if you keep pushing the max limit higher and higher.
Typically, if you have a lot of records, you chunk it up and use a count and offset to pull it down in bite-sized chunks. Also add metadata to the response object letting the requester know the current position, total records, and the offset supplied.
A little pseudo-code:
$count = 1000
$offset = 0
while $offset < $total_records:
    GET /products?count=$count&offset=$offset
    $offset = $offset + $count
Assuming one of the requests looks like:
GET /products?count=1000&offset=1000
Then in the response body you'd expect something like:
{
  "result": [
    {
      "id": "123",
      "name": "some product",
      "link": "/product/123"
    },
    ... many more products ...
    {
      "id": "465",
      "name": "another product",
      "link": "/product/465"
    }
  ],
  "meta": {
    "count": 1000,
    "offset": 1000,
    "total_count": 3000,
    "next_link": "/products?count=1000&offset=2000",
    "prev_link": "/products?count=1000&offset=0"
  },
  "status": 200
}
If you really want a gold star, you can make your resources adhere to HATEOAS (https://en.wikipedia.org/wiki/HATEOAS) and include links to the individual resources in the list, and maybe in the meta include links to the next and prior chunks of the list if you're walking a large list of items. I've put some example links in the JSON sample above.
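To show how a client might walk that paginated shape, here is a rough Python sketch using the requests library; the host name is made up, and it assumes the server mirrors the example payload above and omits (or nulls) meta.next_link on the last page:
import requests

BASE = "https://api.example.com"  # made-up host; paths mirror the example above

def fetch_all_products():
    """Collect every product by following meta.next_link page by page."""
    products = []
    url = f"{BASE}/products?count=1000&offset=0"
    while url:
        payload = requests.get(url).json()
        products.extend(payload["result"])
        next_link = payload["meta"].get("next_link")  # absent/None on the last page
        url = f"{BASE}{next_link}" if next_link else None
    return products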
I would do it like that:
For all products GET /products
For certain product GET /products/{id}
For all products with filtering or sorting,
use GET /products, but let the client send you a filter object in the request body.
There it can specify whether it wants a certain page of the pagination, filter some records, etc.
Filter Object could look like:
{"pageNumber":1,"pageSize":12,"freeText":"","conditions":[],"sortings":{}}
In your service, map it to your internal service filter and return the requested records.

Is it possible to process objects in a Google Cloud Storage bucket in FIFO order?

In my web app, I need to pull objects from gcs one by one and process them.
So the question is,
"How do I send a request to gcs to get the next unprocessed object?"
What I’d like to do is to simply rely on the sort order provided by gcs and then just process the objects in this sorted list one by one.
That way, I only need to keep track of the last processed item in my app.
I’d like to rely on the sort order provided by the timeCreated timestamp on each individual object in the bucket.
When I query my bucket via the JSON API, I notice that the objects are returned sorted by timeCreated from oldest to newest.
For example, this query ...
returns this list ...
{
  "items": [
    { "name": "cars_train/00001.jpg", "timeCreated": "2016-03-23T19:19:47.506Z" },
    { "name": "cars_train/00002.jpg", "timeCreated": "2016-03-23T19:19:49.320Z" },
    { "name": "cars_train/00003.jpg", "timeCreated": "2016-03-23T19:19:50.228Z" },
    { "name": "cars_train/00004.jpg", "timeCreated": "2016-03-23T19:19:51.377Z" },
    { "name": "cars_train/00005.jpg", "timeCreated": "2016-03-23T19:19:51.778Z" },
    { "name": "cars_train/00006.jpg", "timeCreated": "2016-03-23T19:19:52.817Z" },
    { "name": "cars_train/00007.jpg", "timeCreated": "2016-03-23T19:19:53.868Z" },
    { "name": "cars_train/00008.jpg", "timeCreated": "2016-03-23T19:19:54.925Z" },
    { "name": "cars_train/00009.jpg", "timeCreated": "2016-03-23T19:19:58.426Z" },
    { "name": "cars_train/00010.jpg", "timeCreated": "2016-03-23T19:19:59.323Z" }
  ]
}
This sort order by timeCreated is exactly what I need, though I’m not certain if I can rely on this always being true?
So, I could code my app to process this list by simply searching for the first timeCreated value greater than the last object that processed.
The problem is this list can be very large and searching through a huge list every single time the user presses the NEXT button is too computationally expensive.
I would like to be able to specify in my query to gcs to filter the list so that I return only the single item that I need.
The API does allow me to set the maxResults returned to a value of 1.
However, I do not see an option that would allow me to return only objects whose timeCreated value is greater than the value I specified.
I think what I am trying to do is probably fairly common, so I’m guessing that a solution may exist for this problem.
One workaround for this problem is to physically move an object to another bucket once it has been processed.
That way the first item in the list would always be the next unprocessed one, and I could simply send the request with maxResults=1.
But this adds complexity because it forces me to have 2 separate buckets for every project instead of 1.
Is there a way to filter this list of objects to only include ones whose timeCreated date is above a specified value?
In MySQL, it might be something like ...
SELECT name
FROM bucket
WHERE timeCreated > X
ORDER BY timeCreated
LIMIT 1
You can configure object change notifications on the bucket, and get a notification each time a new object arrives. That would allow you to process new objects without scanning a long listing each time. It also avoids the problem that listing a bucket is only eventually consistent (so, recently uploaded objects may not show up immediately when you list objects; I don't know if that's a problem for your app).
Details about object change notification are documented at https://cloud.google.com/storage/docs/object-change-notification.
Object listing in GCS is not sorted by timeCreated; results are always in alphabetical order by object name. In your example, the two orderings merely happen to coincide.
If you want to get a list of objects in the order they were uploaded, you must ensure that each object has a name alphabetically later than the name of any object uploaded before it. Even then, however, you must take care, as object listing is eventually consistent, which means that objects you upload may not immediately show up in a listing.
If some ordering of objects is critically important, it would be a good idea to maintain a separate index of the objects and their timestamps in a separate data structure, perhaps populated via object change notifications as Mike suggested.
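As a sketch of that naming approach, assuming Python and the google-cloud-storage client library (the bucket name, the queue/ prefix, and the timestamp-prefixed naming scheme are all illustrative); recent versions of the client also expose list_blobs(start_offset=...), which maps to the JSON API's startOffset and lets you resume the listing just after the last processed name instead of scanning the whole list:
from datetime import datetime, timezone
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-processing-bucket")  # illustrative bucket name

def upload_in_order(local_path):
    """Give each object a name that sorts in upload order (UTC timestamp prefix)."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S.%fZ")
    blob = bucket.blob(f"queue/{stamp}_{local_path}")
    blob.upload_from_filename(local_path)
    return blob.name

def next_unprocessed(last_processed_name):
    """Return the next object whose name sorts after the last processed one, or None."""
    blobs = client.list_blobs(
        bucket,
        prefix="queue/",
        start_offset=last_processed_name,  # inclusive lexical lower bound on the name
        max_results=2,
    )
    for blob in blobs:
        if blob.name != last_processed_name:  # skip the inclusive match itself
            return blob
    return None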

Given a list of unique discount codes, how to send each one only once

I'm wondering the best way to solve this problem: given a pre-generated list of 500 unique discount codes for an e-commerce site, how do I ensure that each of the first 500 users who receive a discount code receives a unique one? The e-commerce site would be making an asynchronous request to a separate server with the list of discount codes stored in its database. It's this server's job to make sure that it sends back each discount code only once, in chronological order as requests are received.
As this seems like a rather primitive problem, I wonder if there is a clever and elegant way to do this with a relatively low level of effort.
A simple way is to have a collection of your codes and remove items as you select them. Here is a simple example with .findAndModify().
A basic collection example:
db.codes.insert([
{ "a": 1 },
{ "a": 2 },
{ "a": 3 }
])
Issue a .findAndModify():
db.codes.findAndModify({
"query": {},
"remove": true,
"new": false
})
Returns:
{ "_id" : ObjectId("550caf3f7d9c3dc0eab83334"), "a" : 1 }
And the new state of the collection is:
{ "a": 2 }
{ "a": 3 }
So as the document is retrieved it is removed from the collection, preventing further selection. Since .findAndModify() is an atomic operation, no other request can see the same document, and every request will get its own unique response.
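If the code-dispensing server happens to be written in Python, the same atomic take-and-remove step might look like this with pymongo (the connection string and database name are placeholders; the collection and field follow the example above):
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
codes = client["shop"]["codes"]                    # placeholder database name

def claim_code():
    """Atomically take one unused code, or return None once all codes are gone."""
    doc = codes.find_one_and_delete({})  # same semantics as findAndModify with remove: true
    return doc["a"] if doc else None     # "a" is the field used in the example collection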
If your DB has atomic transactions, this is no problem. Just make a table discount with 2 fields, code (varchar wide enough to hold the code) and used (boolean), indexed by used and then by code. Initially INSERT 500 rows, each with used = false of course. Whenever a request comes in, just SELECT min(code) FROM discount WHERE NOT used FOR UPDATE, and then UPDATE discount SET used = true WHERE NOT used AND code = <that code>, all inside a single DB transaction. (The NOT used part of the update is not necessary for correctness, but may speed things up by enabling the index to be used.)
If contention is a problem (and I don't see how it could be for 500 requests, but maybe it somehow could be), then add an integer id field containing a unique integer between 1 and 500 to the table. Then on each request, pick a random number r between 1 and 500, and SELECT min(code) FROM discount WHERE NOT used AND (id >= <r> OR id + 500 >= <r>) FOR UPDATE. The condition in parentheses ensures that the search will "wrap around" to lower-numbered discounts if (and only if) all discounts >= r have already been taken.
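For illustration, here is roughly what that transaction could look like from application code, assuming Python with psycopg2 against PostgreSQL and the discount table described above. PostgreSQL does not allow FOR UPDATE together with an aggregate like min(), so this sketch locks the single lowest unused row instead, and adds SKIP LOCKED (available since PostgreSQL 9.5) as a refinement so concurrent requests don't queue on the same row:
import psycopg2

conn = psycopg2.connect("dbname=shop")  # illustrative connection string

def claim_code():
    """Hand out the lowest unused code, or None once all codes are taken."""
    with conn:  # one transaction: commits on success, rolls back on exception
        with conn.cursor() as cur:
            # Lock one unused row so no concurrent request can hand out the same code.
            cur.execute(
                "SELECT code FROM discount"
                " WHERE NOT used"
                " ORDER BY code"
                " LIMIT 1"
                " FOR UPDATE SKIP LOCKED"
            )
            row = cur.fetchone()
            if row is None:
                return None
            cur.execute("UPDATE discount SET used = true WHERE code = %s", (row[0],))
            return row[0]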