Algolia Real-time Webhooks

Is there any way to know whenever Algolia has successfully processed a queued job, or if Algolia has indexed new documents compared to the last re-index?
We'd like to build a system where, whenever a new document is indexed, users browsing the website are notified of the update in real time and invited to check out the new content.
Is something like this possible?
If not, is there any workaround to make this possible?

You're completely right, webhooks could definitely make sense here. Unfortunately, Algolia doesn't provide such a feature, so you'll have to rely on polling.
Jobs you send to Algolia are executed sequentially. A fairly easy solution would be to store each job in a queue along with its associated action: a dedicated process whose only role is to wait on the first taskID in the queue can then execute the associated action as soon as that task completes.
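A rough sketch of that worker in Python, assuming the official API client (wait_task polls the task's status until Algolia reports it as published; the queue wiring and the notify_users callback are purely illustrative):

    # Illustrative worker: block on each Algolia task, then run its action.
    import queue

    from algoliasearch.search_client import SearchClient

    client = SearchClient.create("YOUR_APP_ID", "YOUR_ADMIN_API_KEY")
    index = client.init_index("documents")

    jobs = queue.Queue()  # filled elsewhere with (task_id, action) pairs

    def notify_users(task_id):
        # Placeholder: push a real-time notification to browsing users here.
        print(f"task {task_id} published")

    while True:
        task_id, action = jobs.get()
        index.wait_task(task_id)  # polls until Algolia reports the task as published
        action(task_id)           # e.g. notify_users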

Related

API to access Google Doc "Email Notification settings"

We have hundreds of business documents, and when a user makes a suggested edit or comment it's the manager's responsibility to review and approve/reject.
Google offers a feature to receive an email notification when a comment or suggested edit is made (otherwise it's easy for managers to lose track of, or not know about, suggested edits and comments), and we'd like to turn this on for managers, but doing so manually for hundreds of documents is a maintenance nightmare. Is there an API that would allow us to programmatically set this field, or even read it?
If there are no APIs, is there some other recommended workflow such that employees can suggest improvements and managers will be proactively notified so they can approve/reject the suggestion (ISO 9001 Control of Documents/Records)?
PS: I wrote some scripts to poll documents for open comments/suggestions, but we'd prefer to be proactively notified.
You could create a program to watch for changes in files by using SuggestionsViewMode. You need to fetch the entire document content and then look through it for suggestions.
result = service.documents().get(documentId=DOCUMENT_ID, suggestionsViewMode=SUGGEST_MODE).execute()
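The response is a plain document tree; a hypothetical helper that collects the IDs of pending suggestions might look like this (the body/paragraph/textRun field names follow the Docs API document structure, everything else is invented):

    def find_suggestion_ids(document):
        # Walk body.content and collect suggestion IDs attached to text runs.
        suggestion_ids = set()
        for element in document.get('body', {}).get('content', []):
            paragraph = element.get('paragraph')
            if not paragraph:
                continue
            for item in paragraph.get('elements', []):
                text_run = item.get('textRun', {})
                suggestion_ids.update(text_run.get('suggestedInsertionIds', []))
                suggestion_ids.update(text_run.get('suggestedDeletionIds', []))
        return suggestion_ids

    pending = find_suggestion_ids(result)  # non-empty means open suggestions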
I assume from the previous response that there is no way to change notification settings for a doc via the API?
I actually have sort of the opposite problem from the OP: I am generating lots of documents for a shared drive via automations. These docs are used by our team but aren't generally relevant to me. Because my account is the one generating the docs, the notification setting defaults to "All comments and tasks".
It would be great if I could update my automation to change the notification setting to "Comments and tasks for you" after creating the doc. I'd appreciate any suggestions.

Getting updates in every blockchain append

I am working on an art project which displays Bitcoin transactions graphically.
Thus I need some way of getting an update whenever a transaction is recorded in the blockchain.
Is there any way to accomplish this without copying the whole blockchain, since i do not need any precise information about the transaction?
If you really want to get all the transactions that are happening, then you have to parse each new block as it comes in. Look into the RPC calls to fetch each individual block.
If you just want to watch certain addresses for transactions, you can look into the walletnotify option of the bitcoind node.
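For example, a minimal polling sketch in Python against a local node's JSON-RPC interface (the URL and credentials are placeholders; getblock at its default verbosity returns just the header plus the list of txids, so you never copy more than the new blocks):

    import time
    import requests

    RPC_URL = "http://127.0.0.1:8332"
    RPC_AUTH = ("rpcuser", "rpcpassword")  # from your bitcoin.conf

    def rpc(method, *params):
        payload = {"jsonrpc": "1.0", "id": "watcher", "method": method, "params": list(params)}
        return requests.post(RPC_URL, json=payload, auth=RPC_AUTH).json()["result"]

    last_height = rpc("getblockcount")
    while True:
        height = rpc("getblockcount")
        while last_height < height:
            last_height += 1
            block_hash = rpc("getblockhash", last_height)
            block = rpc("getblock", block_hash)  # header + txids only
            for txid in block["tx"]:
                print(txid)  # feed each transaction id to the visualization
        time.sleep(10)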

How to restrict Collection.find() to certain select patterns in Meteor

I am experimenting with a simple chat app and Meteor 0.8.0.
For a list of messages, where each message references a user through user_id, I want to display the username together with the message.
Is it possible to restrict the select patterns for a find()-call, so that e.g. Meteor.users.find({_id: msg.userId}) is allowed but not Meteor.users.find({})?
Unfortunately this is not covered by Collection.allow/.deny, which I think would be the natural place for it. If this were possible, I could simply use Meteor.publish("usersWithName", function() { return Meteor.users.find({}, {fields: {username: 1}}); }); without having to worry that an attacker could fetch the complete user list on the client.
Currently, I am using the smart-publish package to publish only the users referenced by messages, but I would prefer a simpler solution.
No, there is no way to restrict find queries that are run client-side, since the server is never contacted: the query simply runs against the client's local collection. It works the same way as an insert, update, or remove, which first happens client-side and is then validated by the server (i.e. someone can remove a document on their client, but the server will reject the change).
The best way to handle this is to publish only the documents you specifically need. As you mentioned, if you only publish the documents that the client should have, then you are secure. Even if there were a way to force a restriction on client-side queries, it still would not really make sense to send down more documents than you need.

New/Read Flags in CQRS

I am currently drafting a concept for a (mostly) HTML-based collaboration suite which I plan to implement using CQRS. This software will contain messages that can be sent to the user (which can either be read or unread, obviously) and other elements which shall be marked "new" if they were created after the last user login.
Hardly something new, but I am not quite sure how that would be correctly implemented using CQRS. As I understand it, change of any kind should, without exception, only happen via commands. But creating a command for every single (new) element that is accessed seems a bit too much, not to mention the overhead.
I don't know if I need it, but what would be the best way to implement a last-accessed timestamp on elements? It is basically the same problem as above, with the difference that the change happens every time the element is accessed, not only the first time for each user.
CQRS seems to be an awesome concept but it really needs more learning material. Can't wait till a book is released :)
Regards
[Edit] No one? Wouldn't have thought that this is such a complicated issue..
I assume you're using event sourcing, in which case, once you allow your query service/event handlers to raise appropriate events, this becomes fairly easy to solve.
For your messages/elements: when handling the specific creation events of your elements, either extend existing event handlers or create additional ones that store the element to a messages read-model with a status of "new" and appropriate information about the element.
As part of your user login, I don't see why you can't raise a user-logged-in event (from the security/query service, depending on how you're implementing authentication) to say the user has logged in. An event handler could capture this and write the last-login timestamp to a specific user-last-login read-model.
In addition, the user-logged-in event handler would need to update all the "new" messages (for that user) to an "unread" status. Seeing as we're changing the status of the messages as the user logs in, do you still need to store the last-login timestamp?
For your last-accessed timestamp, perhaps you could just work this into your query service as queries for your different elements complete: raise a query-completed event with element id/type information.
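A bare-bones sketch of those handlers in Python (every event and field name here is invented; the real shape depends on your messaging infrastructure):

    from datetime import datetime, timezone

    class MessageReadModel:
        # Denormalized store the UI queries; tracks per-message status.
        def __init__(self):
            self.messages = {}    # message_id -> {"user_id": ..., "status": ...}
            self.last_login = {}  # user_id -> timestamp

        def on_message_created(self, event):
            # Creation events project a "new" entry into the read model.
            self.messages[event["message_id"]] = {
                "user_id": event["recipient_id"],
                "status": "new",
            }

        def on_user_logged_in(self, event):
            # Record the login time, then demote this user's "new" messages.
            self.last_login[event["user_id"]] = datetime.now(timezone.utc)
            for message in self.messages.values():
                if message["user_id"] == event["user_id"] and message["status"] == "new":
                    message["status"] = "unread"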

How to Implement Queue Based Workflow System?

I'm working on a document management system. An example workflow would be something like this:
A document is emailed to the system
The system does a number of preparatory actions to the document
Document is presented to a user for further processing
Afterwards, document is sent to Quality Assurance
Afterwards, the system does a number of post-processing actions to the document
Document is considered completely processed and disseminated (e.g. emailed back to whoever emailed the document to the system, etc.)
Since the volume of my input will vary (but will usually be high volume), I am very concerned about scalability.
For example, say the system has already downloaded the email attachments. If the attachments are PDF documents, the system needs to split the PDF into individual pages, then convert each page into thumbnails of multiple sizes, and so on. I plan to have a cron job check (say, every minute) to see if there are any PDF documents that need to be processed. Using a flagging system (e.g. "PDF Document Ready to be Processed"), I can check the database for all PDF documents that are flagged to be processed. Once the PDF processing is done, the flag can be updated to say "PDF Processing Done."
However, since the processing of each PDF document is very time-consuming, I am concerned that when the next cron job runs, it will also try to process the PDFs that the previous cron job is still processing.
A possible solution is to immediately flag the PDF documents with "PDF Document Currently Being Processed." That way, when the next cron job is executed, it will exclude the ones already being processed.
Thus, each step in the workflow will probably have 3 flags:
PDF Document Ready to be Processed
PDF Document Currently Being Processed
PDF Processing Done
Same for QA:
Document Ready for QA
Document Currently Being QAd
Document QA Done
Is this a good approach? Is there a better approach? Would I have these flags as a single column of the "PDF Document" table in the database? Or should the flags be their own table (especially if a document can have multiple flags set)?
I'd like to solicit suggestions on how to implement such a system.
To solve your concern about concurrent processing of the same document, there are many scheduler packages that can help you manage this aspect. http://www.quartz-scheduler.org/ is one I've used with great success.
To address your problem, I'd have the 3 states: received, queued, processed (similar to what you suggest).
I'd have a scheduled recurring job which polls the database looking for received PDFs and, for each one, queues a job to process it and marks the PDF as queued. If you ensure this happens in the same transaction and use optimistic locking, there is no risk that another job could come along and re-read the document as received.
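That claim step can be as simple as a conditional UPDATE, so two pollers can never both win the same document. A sketch with Python's sqlite3 module (the table and column names are invented, and enqueue_processing_job stands in for whatever hands the job to Quartz or your worker pool):

    import sqlite3

    def try_claim(conn, doc_id):
        # Compare-and-swap: succeeds only if the document is still 'received'.
        with conn:  # runs as a single transaction
            cur = conn.execute(
                "UPDATE documents SET status = 'queued' "
                "WHERE id = ? AND status = 'received'",
                (doc_id,),
            )
        return cur.rowcount == 1  # 0 means another poller claimed it first

    def poll(conn):
        rows = conn.execute(
            "SELECT id FROM documents WHERE status = 'received'"
        ).fetchall()
        for (doc_id,) in rows:
            if try_claim(conn, doc_id):
                enqueue_processing_job(doc_id)  # defined elsewhere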
Quartz uses a thread pool, with many configuration options, and is great for deferred, resource-intensive processing (I use it for image thumbnailing in a server setting).
To take a step back, there are some great workflow packages in the Java world which can handle most of what you want to do, including the deferred PDF processing. Take a look at jBPM or Drools Flow; these are two great, if complex, packages.
UPDATE: Drools Flow has been merged into jBPM. For this particular problem it may be a bit of a "killing a mosquito with a bazooka" situation, but it's a great workflow package.
The solution kind of depends on what technologies you are using to implement this system: is the pre/post-processing done by the same software/language as the emailing software? Additionally, are they running in separate processes?
If you have distributed components, you could do much worse than investigating an AMQP solution like RabbitMQ, as this takes care of putting each job into a queue and making sure that only one of your consumers takes each job (we'd model each thumbnailing job as an individual task).
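For instance, with the pika client for Python (the queue name and job payload are made up, and in practice the consumer half would live in a separate worker process):

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="thumbnail_jobs", durable=True)

    # Producer side: one message per page to thumbnail.
    channel.basic_publish(
        exchange="",
        routing_key="thumbnail_jobs",
        body='{"document_id": 42, "page": 7}',
        properties=pika.BasicProperties(delivery_mode=2),  # persist the message
    )

    # Consumer side: RabbitMQ delivers each job to exactly one worker.
    def handle_job(ch, method, properties, body):
        print("processing", body)
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_qos(prefetch_count=1)  # at most one unacked job per worker
    channel.basic_consume(queue="thumbnail_jobs", on_message_callback=handle_job)
    channel.start_consuming()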
If, however, the entire system is implemented in one language and inside a single process, there are some simpler systems you can use (a sketch of the idea follows the list):
Resque is a good solution for Ruby
In Java, a LinkedBlockingQueue would work well
Uh, I'm sure C# will have some way of creating a queue of jobs (disclaimer: I know nothing of C#)
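To illustrate the single-process idea in a neutral language, here is the same pattern with Python's standard-library queue module (the thumbnailing call is a placeholder):

    import queue
    import threading

    jobs = queue.Queue()

    def worker():
        while True:
            document_id = jobs.get()
            try:
                print("thumbnailing document", document_id)  # placeholder for real work
            finally:
                jobs.task_done()

    # A small pool of consumers; Queue hands each job to exactly one of them.
    for _ in range(4):
        threading.Thread(target=worker, daemon=True).start()

    for document_id in (1, 2, 3):
        jobs.put(document_id)
    jobs.join()  # block until every queued job has been processed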