Algolia: can I set expiry time for records?

I'm using Algolia to index my docs. I don't want the number of records to grow without bound. Is there a way to set an expiry time for each record so they're automatically deleted?

The best way to achieve that would be to add an attribute to your records with the expiry date and run a script regularly to delete the expired data. For instance, I would use an int so you can use numeric filters (see: https://www.algolia.com/doc/api-reference/api-parameters/filters/#numeric-filters):
{
  name: "test",
  expire: 20171201
}
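The cleanup itself can then be a single deleteBy call filtered on that attribute. Below is a rough sketch assuming the v4 JavaScript API client and the integer expire attribute from the record above; the credentials and index name are placeholders:
// Hedged sketch of the cleanup script, run on a schedule (e.g. a daily cron job)
const algoliasearch = require('algoliasearch');
const client = algoliasearch('APP_ID', 'ADMIN_API_KEY'); // placeholders
const index = client.initIndex('docs');                  // hypothetical index name
// Today as a YYYYMMDD integer, matching the format of "expire"
const today = Number(new Date().toISOString().slice(0, 10).replace(/-/g, ''));
index.deleteBy({ filters: `expire < ${today}` })
  .then(() => console.log('expired records deleted'));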

Related

Auto-update field after specific amount of time only when specific conditions are met

MongoDB has an option to set a TTL on documents.
I would like to know if there is a similar feature that allows updating a specific field after a specific amount of time.
Basically what I want to achieve is to update the field STATUS of a document from PENDING to EXPIRED after a specific amount of time (only if it was PENDING).
I know I could use cron jobs, but first I want to check whether it's possible natively with MongoDB.
Furthermore, is it possible to set the TTL with a condition? Like deleting a document after X days only if STATUS is EXPIRED?
what I want to achieve is to update the field STATUS of a document from PENDING to EXPIRED after a specific amount of time
Not achievable today; you have to create your own script.
Furthermore, is it possible to set the TTL with a condition? Like deleting a document after X days only if STATUS is EXPIRED?
I recommend that the script that sets the status also sets a readyToBeDeletedSince: Date field, and that you put your TTL index on that field.
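For example, the periodic script (run from cron or similar) could look like the sketch below; the collection name mymodel, the createdAt field, and the 24-hour cutoff are just assumptions for the illustration:
// Hedged sketch: flip PENDING documents older than 24h to EXPIRED and stamp them
// so a TTL index on readyToBeDeletedSince can remove them later
var cutoff = new Date(Date.now() - 24 * 60 * 60 * 1000);
db.mymodel.updateMany(
  { status: 'PENDING', createdAt: { $lt: cutoff } },
  { $set: { status: 'EXPIRED', readyToBeDeletedSince: new Date() } }
);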
Regarding the TTL with a condition, I found that there is an option called "partialFilterExpression" that you can add to the index.
For example, the following will delete a document after 5 minutes, but only if its status is "PENDING":
db.mymodel.createIndex({ createdAt: 1 }, { expireAfterSeconds: 300, partialFilterExpression: { status: 'PENDING' } })
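To illustrate, a document inserted like the first one below would be removed roughly five minutes after its createdAt, while the second is never touched because it doesn't match the partial filter (collection and field names as in the index above):
// Matches partialFilterExpression { status: 'PENDING' }, so the TTL applies
db.mymodel.insertOne({ status: 'PENDING', createdAt: new Date() });
// Different status: this document is never expired by that index
db.mymodel.insertOne({ status: 'DONE', createdAt: new Date() });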

Remove document if timestamp is too old

I know it's possible to remove documents whose saved date has passed with this command:
db.course.deleteMany({date: {"$lt": ISODate()}})
I'm trying to do the same thing, but checking whether a timestamp has passed.
All saved timestamps look like this one:
1492466400000
Is it possible to write a command with a condition that deletes all documents whose timestamp is too old?
EDIT
I use millisecond timestamps.
In MongoDB 3.2 you can also use TTL indexes, which let you tell MongoDB "please remove all documents whose {fieldDate} is older than 3600 seconds". This is pretty useful for logging (e.g. remove all logs older than 3 months).
Maybe it's not your use case, but I think it's good to know.
TTL indexes are special single-field indexes that MongoDB can use to automatically remove documents from a collection after a certain amount of time or at a specific clock time. Data expiration is useful for certain types of information like machine generated event data, logs, and session information that only need to persist in a database for a finite amount of time.
db.eventlog.createIndex( { "lastModifiedDate": 1 }, { expireAfterSeconds: 3600 } )
https://docs.mongodb.com/v3.2/core/index-ttl/
I found a way to do this:
db.course.deleteMany({date: {"$lt": ISODate()*1}})
The ISODate()*1 expression converts the Date into a timestamp, and new Date(timestamp) converts a timestamp back into a Date.
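If you want a cutoff other than "right now", you can compute the millisecond value yourself; a small sketch assuming a 30-day retention on the date field:
// Delete every course whose millisecond timestamp is more than 30 days old
var cutoff = Date.now() - 30 * 24 * 60 * 60 * 1000;
db.course.deleteMany({ date: { "$lt": cutoff } });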

mongodb upsert with conditional field update

I have a script that populates a mongo db from daily server log files. Log files come from a number of servers so the chronological order of the data is not guaranteed. To make this simple, let's say that the document schema is this:
{
  _id: <username>,
  first_seen: <date>,
  last_seen: <date>,
  most_recent_ip: <string>
}
that is, documents are indexed by the name of the user who accessed the server. For each user, we keep track of the first time the user was seen and the ip from the last visit.
Right now I handle this very inefficiently: first try an insert. If it fails, retrieve the record by _id, then calculate updated values (e.g. first_seen and most_recent_ip), and finally update the record. That is 3 db calls per log entry, which makes the script's running time prohibitively long given the very high volume of data.
I'm wondering if I can replace this with an upsert instead. I can see how to handle first/last_seen: probably something like {$min: {'first_seen': <log_entry_date>}} (hope this works correctly when inserting a new doc). But how do I set most_recent_ip to the new value only when <log_entry_date> > $last_seen?
Is there generally a preferred pattern for my use case?
You can just use $set to set the most_recent_ip, e.g.
db.logs.update(
  { _id: "user1" },
  { $set: { most_recent_ip: "2.2.2.2" }, $min: { first_seen: new Date() }, $max: { last_seen: new Date() } },
  { upsert: true }
)

Algolia: Best way to query slave index to get sort by date ranking functionality

I have a data set where I want to dynamically sort by date (both ascending and descending) on the fly. I read through the docs and as instructed I've created a slave index of my master index, where the top ranking value is my 'date' ordered by DESC. The date is in the correct integer and unix timestamp format.
My question is how do I query this new index on the fly using the front end Javascript Algolia API?
Right now, my code looks like the following:
this.client = algoliasearch("xxxx", "xxxxx");
this.index = this.client.initIndex('master_index');
this.index.search(
  this.query, {
    hitsPerPage: 10,
    page: this.pagination,
    facets: '*',
    facetFilters: facetArray
  },
  function(error, results) {
    // do stuff
  }.bind(this));
What I've tried doing is to just change the initIndex to use my slave index instead and this does work...but I'm thinking that this is slow and inefficient if I need to reinitialize the index every time the user just wants to sort by date. Isn't there a parameter instead that I can insert in the query to sort by date?
Also, my second question is that even when I change the index to the slave index, it only sorts by descending. How can I have it sort by ascending as well?
I really do not want to create ANOTHER slave index just to sort by ascending date since I have many thousands of rows and am already close to exceeding my record limit. Surely there must be another way here?
Thanks!
What I've tried doing is to just change the initIndex to use my slave index instead and this does work...but I'm thinking that this is slow and inefficient if I need to reinitialize the index every time the user just wants to sort by date. Isn't there a parameter instead that I can insert in the query to sort by date?
You should store all the indices you want to sort with as different properties on the this object:
this.indices = {
  mostRelevant: this.client.initIndex('master_index'),
  desc: this.client.initIndex('slave_desc')
};
Then you can use this.indices.mostRelevant.search() or this.indices.desc.search().
Doing so is not a performance issue.
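For example, picking the right index per request could look like the sketch below (the sortOrder variable is just an assumption standing in for whatever UI control drives the sort):
// Hedged sketch: reuse the pre-initialized indices instead of calling initIndex on every query
var index = sortOrder === 'date_desc' ? this.indices.desc : this.indices.mostRelevant;
index.search(this.query, { hitsPerPage: 10, page: this.pagination }, function(error, results) {
  // do stuff
}.bind(this));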
Also see the dedicated library to create instant-search experiences: https://community.algolia.com/instantsearch.js/
Also, my second question is that even when I change the index to the slave index, it only sorts by descending. How can I have it sort by ascending as well?
I really do not want to create ANOTHER slave index just to sort by ascending date since I have many thousands of rows and am already close to exceeding my record limit. Surely there must be another way here?
Creating a separate index per sort order is currently the only way to do sorting in Algolia; this design is part of what makes Algolia so fast.

What's the benefit of MongoDB's TTL collection vs. purging data from a housekeeper?

I have been thinking about using the built-in TTL feature, but it's not easy to dynamically change the expiration date.
Since MongoDB uses a background task to purge the data, is there any downside to just coding my own purging function based on "> certain_date" and running it, say, once a day?
This way, I can dynamically change the TTL value, and this date field won't need its own single-field index. I can reuse the field as part of a compound index to minimize the number of indexes.
There are 2 ways to set the expiration date on a TTL collection:
at a global level, when creating the index
per document, as a field in the document
Those two modes are mutually exclusive.
Global expiry
If you want all your documents to expire 3 months after creation, use the first mode by creating the index like the following:
db.events.ensureIndex({ "createdAt": 1 }, { expireAfterSeconds: 7776000 })
If you later decide to change the expiry to "4 months", you just need to update the expireAfterSeconds value using the collMod command:
db.runCommand({"collMod" : "events" , "index" : { "keyPattern" : {"createdAt" : 1 } , "expireAfterSeconds" : 10368000 } })
Per-document expiry
If you want every document to have its own expiration date, save that date in a field like "expiresAt", then index your collection with:
db.events.ensureIndex({ "expiresAt": 1 }, { expireAfterSeconds: 0 })
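With expireAfterSeconds: 0, each document is removed once its own expiresAt date passes; for example (a sketch, field and collection names as above):
// This document will expire roughly one hour after insertion
db.events.insert({ name: "session-123", expiresAt: new Date(Date.now() + 60 * 60 * 1000) })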
I have been thinking about using the built-in TTL feature, but it's not easy to dynamically change the expiration date
That's odd. Why would that be a problem? If your document has a field Expires, you can update that field at any time to dynamically prolong or shorten the life of the document.
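For example, prolonging a document's life is a single update of that field; a sketch assuming a TTL index with expireAfterSeconds: 0 on a field named Expires (the sessions collection and sessionId variable are hypothetical):
// Push the expiry 30 minutes into the future; the TTL monitor will honour the new value
db.sessions.update(
  { _id: sessionId },
  { $set: { Expires: new Date(Date.now() + 30 * 60 * 1000) } }
)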
Is there any downside to just coding my own purging function based on "> certain_date" and running it, say, once a day?
You have to code, document and maintain it.
Deleting a whole lot of documents can be expensive and lead to a lot of re-ordering, so it's probably helpful to run the purging more often.
Minimizing the number of indexes is a good thing, but the question is whether it's really worth the effort. Only you can answer that. My advice is: start with something that's already there if at all possible, and come up with something better if and only if you really have to.