Register a Lua script to be triggered when a specified (redis-key, op) event occurs - triggers

Can I make a Lua script act as a TRIGGER, like those used in relational databases?
E.g., after every APPEND or RPUSH to the list with key 'TIMELIST', a Lua script is triggered to do some work on the list?
I had a quick look at the Lua scripting section of the Redis documentation, and it seems that scripts can only be evaluated explicitly. Is my idea impossible?
EXT:
If the above idea is impossible, what approach should I take to address this problem? I have a sorted set to store daily quotas. I use
> ZADD 'TIMELIST' <TS> <QUOTA>
to add a new quota with its date as the score. I want the sorted set to act as a bounded list that only keeps the last 7 days' quotas. I don't use LTRIM on a normal Redis list because the quotas may have gaps in the timeline. A hash isn't suitable either, since for some reason the truncation must be invisible to the applications that access Redis.
I'm a newbie to Redis. Any suggestions? Thanks!

No, that isn't doable. What you can easily do, however, is use a script to perform both the operation (LPUSH, ZADD, etc.) and any additional trigger-like logic.
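Here is a minimal sketch of that idea (using redis-py; the key and member names are just placeholders and scores are assumed to be Unix timestamps): a single Lua script performs the ZADD and then trims everything older than 7 days, so the "trigger" runs atomically with the write.

import time
import redis

r = redis.Redis()

# Registering the script returns a callable that EVALSHAs it on the server.
add_and_trim = r.register_script("""
    redis.call('ZADD', KEYS[1], ARGV[1], ARGV[2])
    -- the trigger-like part: drop members older than 7 days (scores are Unix timestamps)
    redis.call('ZREMRANGEBYSCORE', KEYS[1], '-inf', tonumber(ARGV[1]) - 7 * 86400)
    return redis.call('ZCARD', KEYS[1])
""")

add_and_trim(keys=['TIMELIST'], args=[int(time.time()), 'quota-for-today'])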

You can use ZREMRANGEBYRANK to remove the old elements.
redis> ZADD myzset 1 "one"
(integer) 1
redis> ZADD myzset 2 "two"
(integer) 1
redis> ZADD myzset 3 "three"
(integer) 1
redis> ZREMRANGEBYRANK myzset 0 1
(integer) 2
redis> ZRANGE myzset 0 -1 WITHSCORES
1) "three"
2) "3"
redis>
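If you prefer to bound the set by size rather than by age, a tiny redis-py sketch (key and member names are assumptions) would be:

import redis

r = redis.Redis()
r.zadd('TIMELIST', {'quota-for-today': 1_700_000_000})
# Keep only the 7 highest-ranked (most recent) members; negative ranks count from the end.
r.zremrangebyrank('TIMELIST', 0, -8)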

There has been an interesting development in Redis regarding this kind of "trigger". You could accomplish what you want with Redis keyspace notifications (get notified when an event happens, for example adding a member to your set/zset) combined with a Redis module (available since Redis 4.0) that performs a ZREM on the set under some conditions, effectively giving you a Redis "trigger".
You could technically use only the keyspace notifications, which are sent as pub/sub messages (within Redis), but you might miss events if you have no subscriber for them at that moment (pub/sub messages in Redis are not durable).
The downside is that you will have to write that module in C/C++ and validate that it works properly.
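For completeness, a hedged sketch of the notification approach with redis-py (channel pattern, key name and database 0 are assumptions): a separate worker subscribes to keyspace events on TIMELIST and trims the sorted set whenever a zadd event arrives. Because pub/sub is not durable, a missed event simply means the trim happens on the next one.

import time
import redis

r = redis.Redis()
# K = publish keyspace events, z = sorted-set commands
r.config_set('notify-keyspace-events', 'Kz')

p = r.pubsub()
p.psubscribe('__keyspace@0__:TIMELIST')

for message in p.listen():
    if message['type'] == 'pmessage' and message['data'] == b'zadd':
        cutoff = int(time.time()) - 7 * 86400
        r.zremrangebyscore('TIMELIST', '-inf', cutoff)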

Related

Race condition in amplify datastore

When updating an object, how can I handle a race condition?
final objects = await Amplify.DataStore.query(Object.classType, where: Object.ID.eq('aa'));
await Amplify.DataStore.save(objects.first.copyWith(count: objects.first.count + 1));
user A: executes the first statement
user B: executes the first statement
user A: executes the second statement
user B: executes the second statement
=> the count is only incremented by 1 instead of 2
Apparently the way to resolve this is to either:
1 - Use conflict resolution, available from DataStore 0.5.0.
One of your users (whichever is slowest) gets sent back the rejected version plus the latest version from the server; you get both objects back so you can resolve the discrepancies locally and retry the update.
2 - Use a custom resolver
here..
and check ADD expressions.
You save versions locally, and your VTL is configured to provide additive values to the pipeline instead of set values.
This nice article might also help to understand that.
Neither really worked for me: one of my devices could be offline for days at a time, and I would need multiple updates to objects to be performed in order, not just the latest version of the local object.
What really confuses me is that there is no immediate way to just increment values and keep all of the incremented updates in the outbox, instead of just the latest object, then apply them in order when a connection is made.
I basically wrote to a separate table to do just that to solve my problem (a sketch of the idea is at the end of this answer), but of course with more tables and rows come more reads and writes, and therefore more expense.
Have a look at my attempts here if you want the full code, and let me know.
And then, I guess, hope for an update to Amplify that includes increment logic to update values atomically out of the box, to avoid these common race conditions.
Here is some more context
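Purely as an illustration of that "separate table of increments" idea (this is generic Python, not Amplify/Dart, and all names here are made up): each increment is stored as its own outbox row, and the rows are applied in order once a connection is available, so concurrent or offline updates are never collapsed into a single "latest" value.

import time

outbox = []  # pending increments; in a real app this would be persisted locally

def increment(object_id, delta):
    outbox.append({'object_id': object_id, 'delta': delta, 'ts': time.time()})

def flush(apply_remote_increment):
    # Apply pending increments oldest-first; each applied row is removed from the outbox.
    for row in sorted(outbox, key=lambda r: r['ts']):
        apply_remote_increment(row['object_id'], row['delta'])  # e.g. an atomic ADD on the server
        outbox.remove(row)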

OpenSearch return results based on field in multiple indices

I have an index containing vulnerable dependencies and their status in repositories.
I don't want to remove the alerts when they are resolved, as I also want to log that the vulnerability has been patched.
However, this means that I end up with some data that I'm not sure how best to deal with.
Here is a simplified example of what my data looks like:
_id  alert_id  repository       alert_name  action
1    1         car_repository   jwt         created
2    2         car_repository   express     created
3    2         car_repository   express     resolved
4    5         boat_repository  express     created
5    3         car_repository   log4j       resolved
6    3         car_repository   log4j       created
7    4         boat_repository  log4j       created
In total, 5 vulnerability warnings have been created, and 2 of them have been resolved.
Now, what I want to do is show the current status: we still have 3 active vulnerabilities. How would I go about showing only the 3 relevant rows (1, 4 and 7)?
Keep in mind that I am still pretty new to using ELK/OpenSearch, so I don't know whether this is best solved using queries or filters, or whether it would help to divide the data into multiple indices.
I'd say the easiest way would be to maintain 2 indices: one for actions, with what you have in the table above, and one for vulnerabilities with their current status. Whenever you create a "created" action you would also create a vulnerability doc with status == "created", and when you create an action that is not "created" you run an update_by_query on that doc to set status = "resolved". Your query then becomes super simple.
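A rough sketch of that two-index idea with opensearch-py (the index names, field names and localhost connection are all assumptions, and alert_id/status are assumed to be mapped as keyword or numeric fields so the term queries match exactly):

from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{'host': 'localhost', 'port': 9200}])

def record_action(alert_id, repository, alert_name, action):
    # Every action is appended to the actions index, keeping the full history.
    client.index(index='vuln-actions', body={
        'alert_id': alert_id, 'repository': repository,
        'alert_name': alert_name, 'action': action,
    })
    if action == 'created':
        # One status document per alert.
        client.index(index='vuln-status', body={
            'alert_id': alert_id, 'repository': repository,
            'alert_name': alert_name, 'status': 'created',
        })
    else:
        # Flip the status document to resolved.
        client.update_by_query(index='vuln-status', body={
            'query': {'term': {'alert_id': alert_id}},
            'script': {'source': "ctx._source.status = 'resolved'"},
        })

# The "current status" query is then a simple term filter:
active = client.search(index='vuln-status',
                       body={'query': {'term': {'status': 'created'}}})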
An alternative would be to use collapse, but in my experience its behavior is quite confusing when you try to paginate or aggregate the results.

Can you calculate active users using time series

My Atomist client exposes metrics on the commands that are run. Each command is a metric with a username element as well as a status element.
I've been scraping this data for months without resetting the counts.
My requirement is to show the number of active users over a time period, i.e. 1h, 1d, 7d and 30d, in Grafana.
The original query was:
count(count({Username=~".+"}) by (Username))
This is an issue because I don't clear the metrics, so it's always a count since inception.
I then tried this:
count(
  max_over_time(help_command{job="Application Name",Username=~".+"}[1w])
  -
  max_over_time(help_command{job="Application Name",Username=~".+"}[1w] offset 1w)
  > 0
)
which works, but only for one command; I have about 50 other commands that need to be added to that count.
I also tried:
{__name__=~".+_command",job="app name"}[1w] offset 1w
but this is obviously very expensive (it times out in the browser) and has issues integrating with max_over_time, which doesn't support it.
Any help? Am I using the metrics in the wrong way? Is there a better way to query? My only option at the moment is to repeat the working count format above for each command.
Thanks in advance.
To start, I will point out a number of issues with your approach.
First, the Prometheus documentation recommends against using arbitrarily large sets of values for labels (as your usernames are). As you can see (based on your experience with the query timing out) they're not entirely wrong to advise against it.
Second, Prometheus may not be the right tool for analytics (such as counting active users), partly due to the above, and partly because it is inherently limited by the fact that it samples the metrics (which does not appear to be an issue in your case, but may turn out to be).
Third, you collect separate metrics per command (e.g. help_command, foo_command) instead of a single metric with the command name as a label (e.g. command_usage{command="help"}, command_usage{command="foo"}).
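As a hedged illustration of that third point (the metric and label names are made up, and note that the username label still carries the cardinality caveat from the first point), a single counter in the official Python client would look something like this:

from prometheus_client import Counter, start_http_server

COMMAND_USAGE = Counter(
    'command_usage_total',
    'Number of times a command was run',
    ['command', 'username', 'status'],
)

def record(command, username, status):
    COMMAND_USAGE.labels(command=command, username=username, status=status).inc()

start_http_server(8000)           # expose /metrics for Prometheus to scrape
record('help', 'alice', 'success')

With that shape, the whole question reduces to something like count(count by(username)(increase(command_usage_total[1w]) > 0)) instead of regex-matching 50 metric names.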
To get back to your question though: you don't need max_over_time, you can simply write your query as:
count by(__name__)(
  (
    {__name__=~".+_command",job="Application Name"}
    -
    {__name__=~".+_command",job="Application Name"} offset 1w
  ) > 0
)
This only works, though, because you say that whatever exports the counts never resets them. If that is just because the exporter has never restarted, and the counts will drop to zero when it does, then you'd need to use increase instead of subtraction, and you'd run into the exact same performance issues as with max_over_time.
count by(__name__)(
  increase({__name__=~".+_command",job="Application Name"}[1w]) > 0
)

How to segregate large real time data in MongoDB

Let me explain the problem.
We get real-time data, as much as 0.2 million records per day.
Some of these records are of special significance. The attributes that mark them as significant are pushed into a reference collection. Let us say each row in the Master Database has the following attributes:
a. ID b. Type c. Event 1 d. Event 2 e. Event 3 f. Event 4
For the special markers, we identify them as
Marker1 -- Event 1 -- Value1
Marker2 -- Event 3 -- Value1
Marker3 -- Event 1 -- Value2
and so on. We can add 10000 such markers.
Further, the attribute Type can be Image, Video, Text or Others. Hence the idea is to segregate the data based on Type, which means we create 4 collections out of the Master Collection. This is because we have to run searches on collections based on Type and also run some processing. The marker data should show in a different tab on the search screen.
We shall also be running a wildcard search on the Master Collection.
We are running cron jobs to do these processes:
I. Dumping data into the Master Collection - Cron 1
II. Assigning markers - Cron 2
III. Segregating data based on Type - Cron 3
These run as a module, in sequence: Cron 1 - Cron 2 - Cron 3.
But assigning the markers and segregating the data take a very long time. We are using Python as the scripting language.
In fact, the cron jobs don't seem to work at all. The scripts work from the command prompt, but scheduling them in crontab does not. We are giving absolute paths to the files. The crons are scheduled 3 minutes apart.
Can someone help?
Yes, I also faced this problem, and I got around it by moving the data in small chunks. In my experience, sharding is not the better way to handle this kind of problem, and the same goes for a replica set.
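To make the "small chunks" idea concrete, here is a rough pymongo sketch (the database, collection and field names are assumptions based on the question): markers are applied with bulk updates, and documents are copied into per-Type collections in batches keyed by _id, so no single pass has to hold a whole day's data.

from pymongo import MongoClient, UpdateMany

client = MongoClient()
db = client['mydb']

# Apply markers in bulk instead of one update per document.
marker_ops = [
    UpdateMany({'Event 1': 'Value1'}, {'$set': {'marker': 'Marker1'}}),
    UpdateMany({'Event 3': 'Value1'}, {'$set': {'marker': 'Marker2'}}),
    UpdateMany({'Event 1': 'Value2'}, {'$set': {'marker': 'Marker3'}}),
]
db.master.bulk_write(marker_ops, ordered=False)

# Segregate by Type in chunks, resuming from the last _id processed.
last_id = None
while True:
    query = {'_id': {'$gt': last_id}} if last_id else {}
    batch = list(db.master.find(query).sort('_id', 1).limit(5000))
    if not batch:
        break
    for doc in batch:
        db[doc['Type'].lower()].replace_one({'_id': doc['_id']}, doc, upsert=True)
    last_id = batch[-1]['_id']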

JOliver EventStore: difference and usage of StreamRevision and CommitSequence?

When looking at JOliver's EventStore, I see that StreamRevision and CommitSequence are the same if you only ever commit 1 event at a time, and it is the StreamRevision that is used to select events.
Suppose I first created an aggregate which committed 1 event, and after that committed 10 more events, which would make my SQL database table look like this (simplified):
Revision  Items  Sequence
1         1      1
11        10     2
I have 2 questions that derive from this:
Is this the difference between StreamRevision and CommitSequence?
The store exposes a "GetFrom" method that takes a "minRevision" and a "maxRevision". With the data from above, how does this work if I request minRevision=4 and maxRevision=8? Shouldn't it have been "minSequence" and "maxSequence" instead?
Thanks.
Werner
Commits are a storage concept to prevent duplicates and to facilitate optimistic concurrency in storage engines that don't have transactional support, such as CouchDB and MongoDB. StreamRevision, on the other hand, represents the number of events committed to the stream.
When you're working with a stream and you call GetFrom() with a min/max revision of 4-8, that means (according to your example) that you want all events from v4 through v8, which are encapsulated by commit #2.
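To make that mapping explicit, here is a tiny illustrative Python sketch (not the actual EventStore implementation): a commit is returned by GetFrom when its own revision range overlaps the requested min/max revision, which is why asking for revisions 4-8 yields commit #2.

commits = [
    {'sequence': 1, 'first_revision': 1, 'last_revision': 1},   # 1 event
    {'sequence': 2, 'first_revision': 2, 'last_revision': 11},  # 10 events
]

def get_from(commits, min_revision, max_revision):
    return [c for c in commits
            if c['last_revision'] >= min_revision and c['first_revision'] <= max_revision]

print(get_from(commits, 4, 8))  # -> only commit #2, which contains events v4..v8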