Atomic GETSET on a hash in Redis

I'm going to be storing a hit counter for a number of URLs in Redis. I'm planning on using a hash because that seems to make sense. It also has an atomic increment function which is critical to my use case.
Every so often, I'm going to aggregate the hit count per URL into another data store. For this purpose, I'd like to get the hit count and reset it back to zero. I can't seem to find an operation like GETSET that works on hashes. If I record a hit between getting the hit count and resetting it to zero, it will get lost without some kind of atomic operation.
Am I missing something? One alternative that occurred to me would be to hash the URL in my client (python) code and use the string commands, but that seems like a bit of a hack when Redis provides a hash itself.

Take a look at the Redis transactions docs, specifically the combination of the WATCH and MULTI commands:
WATCHed keys are monitored in order to detect changes against them. If
at least one watched key is modified before the EXEC command, the
whole transaction aborts, and EXEC returns a Null multi-bulk reply to
notify that the transaction failed.
...
So what is WATCH really about?
It is a command that will make the EXEC conditional: we are asking
Redis to perform the transaction only if no other client modified any
of the WATCHed keys. Otherwise the transaction is not entered at all.
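For the hit-counter case in the question, a minimal sketch with redis-py might look like the following (the hash key name "hits" and the helper name are my own); it retries whenever another client touches the hash between WATCH and EXEC:

import redis

r = redis.Redis()

def pop_hit_count(url):
    with r.pipeline() as pipe:
        while True:
            try:
                pipe.watch("hits")              # abort the EXEC if "hits" changes
                count = pipe.hget("hits", url)  # read runs immediately while WATCHing
                pipe.multi()
                pipe.hset("hits", url, 0)       # reset, conditional on no change since WATCH
                pipe.execute()
                return int(count or 0)
            except redis.WatchError:
                continue                        # a hit landed in between; read and reset again

Any hit recorded between the HGET and the reset invalidates the transaction, so nothing is lost; the loop simply starts over.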

Related

MongoDB change stream, update operation: how to get the previous value

I know that this feature is not implemented by MongoDB, so I am thinking about the best way to achieve it.
Using a caching service? That approach will work, but there is one problem: when the query you watch is too big (like a whole collection), you will never have a "before" value for the first change, because you only start caching once the first change appears on the watch (a code sketch of this idea follows the steps below):
Service started watching.
Received object id 1, no cache of a previous value, so cache the value.
Received object id 1, a cached previous value exists, so the comparison can be done; cache the new value.
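A rough sketch of that in-process cache with pymongo (the database and collection names are placeholders, and a replica set is assumed since change streams require one):

from pymongo import MongoClient

client = MongoClient()  # must point at a replica set for change streams
coll = client["mydb"]["mycoll"]
cache = {}  # _id -> last document seen for that id

with coll.watch(full_document="updateLookup") as stream:
    for change in stream:
        doc_id = change["documentKey"]["_id"]
        current = change.get("fullDocument")
        previous = cache.get(doc_id)
        if previous is None:
            pass  # first event for this _id: no "before" value is available yet
        else:
            pass  # compare previous vs current here
        cache[doc_id] = current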
I see another problem here: if I have two watchers that could potentially receive information about the same object, this will cause sync problems, as one process may update the cache while the second one is already working with stale data. I mean the second process could be in a situation where the cached previous value is already the same as the one in the MongoDB change stream.
I was also thinking about MongoDB replicas, but I am not sure the problem can be solved with them.
Best,
Igor

Firestore Increment - Cloud Function Invoked Twice

With Firestore Increment, what happens if you're using it in a Cloud Function and the Cloud Function is accidentally invoked twice?
To make sure that your function behaves correctly on retried execution attempts, you should make it idempotent by implementing it so that an event results in the desired results (and side effects) even if it is delivered multiple times.
E.g. the function is trying to increment a document field by 1:
document("post/Post_ID_1")
    .updateData(["likes" : FieldValue.increment(1)])
So while Increment may be atomic, it's not idempotent? If we want to make our counters idempotent, do we still need to use a transaction and keep track of who was the last person to like the post?
It will increment once for each invocation of the function. If that's not acceptable, you will need to write some code to figure out if any subsequent invocations are valid for your case.
There are many strategies to implement this, and it's up to you to choose one that suits your needs. The usual strategy is to use the event ID in the context object passed to your function to determine if that event has been successfully processed in the past. Maybe this involves storing that record in another document, in Redis, or somewhere that persists long enough for duplicates to be prevented (an hour should be OK).
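For the Redis option mentioned above, a hypothetical dedup helper might look like this (the key prefix, TTL, and function name are my own, not a Firebase API; the event ID comes from the context object passed to your function):

import redis

r = redis.Redis()

def already_processed(event_id, ttl_seconds=3600):
    # SET ... NX EX: only the first invocation for this event ID manages to set
    # the key; redis-py returns a falsy value on the retries, so they should
    # skip the FieldValue.increment() write.
    claimed = r.set(f"processed:{event_id}", 1, nx=True, ex=ttl_seconds)
    return not claimed

A retried invocation would call already_processed with the event ID from the context (the exact attribute name depends on your runtime) and return early if it comes back True.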

What are these key-value store semantics called?

Imagine a simple key-value server that allows the following verbs:
PUT key value - Sets the value of key to value
GET key - Gets the value of the key if it set, or indicates it is missing
WAIT key timeout - If the value of the key is set, get it immediately. Otherwise, block/wait until somebody else PUTs the key, returning as quickly as possible. If the timeout is reached, indicate failure.
These semantics are somewhat similar to Futures and Promises in various local execution environments, but in a distributed environment, I'm imagining it is typically accomplished with some combination of a messaging protocol and a key-value store.
I am wondering if anybody is either:
Aware of a good name for these semantics so I can start googling
Aware of a tool that offers this out of the box
Still not sure what the semantics are called, but this can be accomplished using Redis blocking operations.
Using blocking pops/pushes with one-element lists, we can implement the GET/WAIT semantics as follows:
BRPOPLPUSH q q 0
If the list already exists, it will return the value immediately and then just add it back to the list. If it doesn't, it'll block until a value is added (or you can set a timeout using the last argument).
To set a value, you can just push to the list:
LPUSH q 1
If you want to ensure true SET semantics, you might prefer a transaction:
MULTI
DEL q
LPUSH q 1
EXEC
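A minimal sketch of the same pattern with redis-py (the function names are mine; any key works in place of q):

import redis

r = redis.Redis()

def put(key, value):
    # DEL + LPUSH inside MULTI/EXEC so the list never holds more than one value
    pipe = r.pipeline(transaction=True)
    pipe.delete(key)
    pipe.lpush(key, value)
    pipe.execute()

def wait(key, timeout=0):
    # Blocks until a value is present, then pushes it straight back so later
    # readers can still see it; returns None if the timeout expires.
    return r.brpoplpush(key, key, timeout=timeout)

A non-blocking GET would simply be LINDEX key 0.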

How cursor.observe works and how to avoid multiple instances running?

Observe
I was trying to figure out how cursor.observe runs inside Meteor, but found nothing about it.
Docs says
Establishes a live query that notifies callbacks on any change to the query result.
I would like to understand better what live query means.
Where will my observer function be executed? By Meteor or by Mongo?
Multiple runs
When we have more than one user subscribing to an observer, one instance runs for each client, leading to performance and race-condition issues.
How can I implement my observer so that it behaves like a singleton? Just one instance running for everyone.
Edit: There was a third question here, but it is now a separate question: How to avoid race conditions on cursor.observe?
Server side, as of right now, observe works as follows:
1. Construct the set of documents that match the query.
2. Regularly poll the database with the query and take a diff of the changes, emitting the relevant events to the callbacks.
3. When matching data is changed/inserted into Mongo by Meteor itself, emit the relevant events directly, short-circuiting step 2 above.
There are plans (possibly in the next release) to automatically ensure that calls to subscribe with the same arguments are shared, basically taking care of the singleton part for you automatically.
Certainly you could achieve something like this yourself, but I believe it's a high priority for the meteor team, so it's probably not worth the effort at this point.

How to fetch the continuous list with PostgreSQL in web

I am making an API over HTTP that fetches many rows from PostgreSQL with pagination. In ordinary cases, I usually implement such pagination through a naive OFFSET/LIMIT clause. However, there are some special requirements in this case:
There are so many rows that I believe users cannot reach the end (imagine a Twitter timeline).
Pages do not have to be randomly accessible, only sequentially.
The API would return a URL containing a cursor token that points to the next chunk of the continuous list.
Cursor tokens do not have to exist permanently, only for some time.
The ordering fluctuates frequently (like Reddit rankings); however, continuous cursors should keep their consistent ordering.
How can I achieve this? I am ready to change my whole database schema for it!
Assuming it's only the ordering of the results that fluctuates and not the data in the rows, Fredrik's answer makes sense. However, I'd suggest the following additions:
Store the id list in a PostgreSQL table using the array type rather than in memory. Doing it in memory, unless you carefully use something like Redis with auto expiry and memory limits, is setting yourself up for a DoS memory consumption attack. I imagine it would look something like this:
create table foo_paging_cursor (
    cursor_token ...,        -- probably a uuid is best or timestamp (see below)
    result_ids integer[],    -- or text[] if you have non-integer ids
    expiry_time TIMESTAMP
);
You need to decide if the cursor_token and result_ids can be shared between users to reduce your storage needs and the time needed to run the initial query per user. If they can be shared, choose a cache window, say 1 or 5 minutes, and then upon a new request create the cursor_token for that time period and check whether the result ids have already been calculated for that token. If not, add a new row for that token. You should probably add a lock around the check/insert code to handle concurrent requests for a new token.
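A hypothetical sketch of that check/insert with psycopg2, using a Postgres advisory lock to serialize concurrent requests for the same token (the lock key, column types, and the way the token maps to a time window are all assumptions):

import psycopg2

# conn = psycopg2.connect("dbname=mydb")  # connection details are up to you

def get_or_create_cursor(conn, token, run_query, ttl_minutes=5):
    with conn, conn.cursor() as cur:
        # serialize concurrent requests for the same token
        cur.execute("SELECT pg_advisory_xact_lock(hashtext(%s))", (token,))
        cur.execute(
            "SELECT result_ids FROM foo_paging_cursor WHERE cursor_token = %s",
            (token,),
        )
        row = cur.fetchone()
        if row:
            return row[0]              # another request already ran the query
        ids = run_query()              # the expensive ordered query, once per window
        cur.execute(
            "INSERT INTO foo_paging_cursor (cursor_token, result_ids, expiry_time) "
            "VALUES (%s, %s, now() + %s * interval '1 minute')",
            (token, ids, ttl_minutes),
        )
        return ids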
Have a scheduled background job that purges old tokens/results and make sure your client code can handle any errors related to expired/invalid tokens.
Don't even consider using real db cursors for this.
Keeping the result ids in Redis lists is another way to handle this (see the LRANGE command), but be careful with expiry and memory usage if you go down that path. Your Redis key would be the cursor_token and the ids would be the members of the list.
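A quick sketch of that Redis variant with redis-py (key prefix, TTL, and page size are arbitrary):

import redis

r = redis.Redis()

def cache_result_ids(cursor_token, ids, ttl_seconds=300):
    key = f"cursor:{cursor_token}"
    pipe = r.pipeline(transaction=True)
    pipe.delete(key)
    pipe.rpush(key, *ids)            # the ids become the members of the list
    pipe.expire(key, ttl_seconds)    # the cursor dies with the cache window
    pipe.execute()

def fetch_page_ids(cursor_token, page, page_size=50):
    key = f"cursor:{cursor_token}"
    start = (page - 1) * page_size
    # LRANGE returns [] for expired or unknown tokens, which the API can map
    # to an "invalid cursor" error
    return r.lrange(key, start, start + page_size - 1)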
I know absolutely nothing about PostgreSQL, but I'm a pretty decent SQL Server developer, so I'd like to take a shot at this anyway :)
How many rows/pages do you expect a user would maximally browse through per session? For instance, if you expect a user to page through a maximum of 10 pages per session [each page containing 50 rows], you could take that max and set up the web service so that when the user requests the first page, you cache 10*50 rows (or just the ids for the rows, depending on how much memory and how many simultaneous users you've got).
This would certainly help speed up your web service, in more ways than one. And it's quite easy to implement, too. So:
When a user requests data from page #1, run the query (complete with order by, join checks, etc.), store all the ids into an array (but a maximum of 500 ids). Return the data rows that correspond to the ids in the array at positions 0-49.
When the user requests pages #2-10, return the data rows that correspond to the ids in the array at positions (page-1)*50 through page*50 - 1.
You could also bump up the numbers; an array of 500 ints would only occupy 2 KB of memory, but it also depends on how fast you want your initial query/response to be.
I've used a similar technique on a live website, and when the user continued past page 10, I just switched to queries. I guess another solution would be to continue to expand/fill the array (running the query again, but excluding already included ids).
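For illustration, a rough sketch of the id-array caching and the slice arithmetic in Python (the query and the session cache are placeholders):

PAGE_SIZE = 50
MAX_IDS = 500

def first_page(session_cache, run_query):
    ids = run_query(limit=MAX_IDS)      # ordered ids from the initial query
    session_cache["ids"] = ids
    return ids[:PAGE_SIZE]              # page 1 -> positions 0..49

def later_page(session_cache, page):
    start = (page - 1) * PAGE_SIZE      # page 2 -> 50, page 3 -> 100, ...
    return session_cache["ids"][start:start + PAGE_SIZE]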
Anyway, hope this helps!