Heroku mongodb scheduled query

I need to run a scheduled task for a MongoDB query once a month. Can anyone point me in the right direction? I didn't write the app, but I need to make some improvements. I'm hoping to do this with an add-on, or with something else that doesn't require modifying the original application's source code.
I haven't tried too much, because I'm not sure where to start.
This is the command the web interface would run today if triggered manually; I need to automate this process. I captured it using Papertrail on Heroku.
Delete records with query: { find: { created_at: { '$lte': '2019-06-05' } }, count: 10 }
The expected result is that I can schedule a query to run once a month and have it succeed.
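One approach that avoids touching the app, as a hedged sketch: the Heroku Scheduler add-on can run a one-off command on a schedule, but daily is its coarsest granularity, so a monthly job can be a daily script that exits unless it is the first of the month. The sketch below assumes a Node environment, the official mongodb driver, and a MONGODB_URI config var; the records collection name and the 30-day cutoff are placeholders, so adjust the query (and any batching, like the count: 10 above) to match what the web interface actually runs.

// delete-old-records.js -- run daily via the Heroku Scheduler add-on.
const { MongoClient } = require('mongodb');

async function main() {
  // Scheduler's coarsest interval is daily, so only act on the 1st of the month.
  if (new Date().getUTCDate() !== 1) return;

  const client = await MongoClient.connect(process.env.MONGODB_URI);
  try {
    // Placeholder collection name and cutoff; mirror the app's actual query.
    const cutoff = new Date(Date.now() - 30 * 24 * 60 * 60 * 1000);
    const result = await client.db().collection('records')
      .deleteMany({ created_at: { $lte: cutoff } });
    console.log('Deleted ' + result.deletedCount + ' records');
  } finally {
    await client.close();
  }
}

main().catch((err) => { console.error(err); process.exit(1); });

The Scheduler job command would then be something like node delete-old-records.js, scheduled daily.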

Related

List all the query shapes in mongo database or collection

I was going through the documentation on the official site, where I came across the term query shape while browsing the indexes section.
The details look interesting, and a list of these could quite possibly give me all the queries being raised against a cluster as I plan to onboard an existing deployed application.
But the question I now have: is there a way to get this list on the command line for a collection (or for a complete database)?
As a side note, I use both Compass Community and Robo 3T as tools to access the datastore, and I'm also comfortable running commands directly in the mongo shell.
With some more time and effort, I found PlanCache.listQueryShapes, with a slight variation for the more recent version of MongoDB I was using.
The $planCacheStats aggregation stage introduced in 4.2 was what I was looking for. The following query lists all the query shapes for a collection, as mentioned in the List Query Shapes section:
db.user_collections.aggregate([
  { $planCacheStats: {} },
  { $project: { createdFromQuery: 1, queryHash: 1 } }
])
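On versions before 4.2, where $planCacheStats is not available, the deprecated shell helper mentioned above should give similar output (a sketch against the same collection):

db.user_collections.getPlanCache().listQueryShapes()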

IPython parallel: how to recover job IDs from IPcontroller

I have a server running an IPython controller and 12 IPython engines. I connect to the controller from my laptop using SSH. I submitted some jobs to the controller using the load-balanced view interface (in non-blocking mode) and stored the message IDs from the AsyncResult object returned by the apply_async() method.
I accidentally lost the message IDs for the jobs and wanted to know if there's a way to retrieve the job IDs (or the results) from the Hub database. I use a SQLite database for the Hub, and I can get the rc.db_query() method to work, but I don't know what to look for.
Does anyone know how to query the Hub database only for message IDs of the jobs I submitted? What's the easiest way of retrieving the job results from the Hub, if I don't have access to the AsyncHubResult object (or their message IDs)?
Thanks!
Without the message IDs, you might have a pretty hard time finding the right tasks, unless only a few tasks have been submitted.
The querying is based on MongoDB syntax (it's a passthrough when you use MongoDB, and a subset of simple operators is implemented for SQLite).
Quick summary: a query is a dict. If you use literal values, they are equality tests, but you can use dict values for comparison operators.
You can search by date for any of the timestamps:
submitted: arrived at the controller
started: arrived on an engine
completed: finished on the engine
For instance, to find tasks submitted yesterday:
from datetime import date, time, timedelta, datetime

# round down to midnight
today = datetime.combine(date.today(), time())
yesterday = today - timedelta(days=1)

rc.db_query({'submitted': {
    '$lt': today,      # less than midnight last night
    '$gt': yesterday,  # greater than midnight the night before
}})
or all tasks submitted 1-4 hours ago:
found = rc.db_query({'submitted': {
    '$lt': datetime.now() - timedelta(hours=1),
    '$gt': datetime.now() - timedelta(hours=4),
}})
With the results of that, you can look at keys like client_uuid to retrieve all messages submitted by a given client instance (e.g. a single notebook or script):
client_id = found[0]['client_uuid']
all_from_client = rc.db_query({'client_uuid': client_id})
Since you are only interested in results at this point, you can specify keys=['msg_id'] to retrieve only the message IDs. You can then use these msg_ids to get all the results produced by a single client session:
# construct the list of msg_ids
msg_ids = [ r['msg_id'] for r in rc.db_query({'client_uuid': client_id}, keys=['msg_id']) ]
# use client.get_result to retrieve the actual results:
results = rc.get_result(msg_ids)
At this point, you have all of the results, but you have lost the association of which results came from which execution. There isn't a lot of information to help you there, but you might be able to tell by type or timestamps, or perhaps by selecting the 9 final items from a given session.

Best approach for scheduling tasks on interval basis from database

I have a MongoDB collection of tasks. Each task has an interval in seconds, a task identifier, and a payload that should be sent via HTTP POST to gather results and store them in another collection.
There may be thousands of tasks with different intervals, and I cannot figure out how to schedule them.
Currently I'm polling by last execution time every 10 ms, but it puts a heavy load on the DB.
It looks like this:
mongo.MongoClient.connect(MONGO_URL, (err, db) ->
  handle_error(err)

  schedule = (collection) ->
    collection.find({isEnabled: true, '$where': '((new Date()).getTime() - this.timestamp) > (this.checkInterval * 60 * 1000)'}).toArray((err, docs) ->
      handle_error(err)
      for i, doc of docs
        collection.update({_id: doc._id}, {'$set': {timestamp: (new Date()).getTime()}}, {w: 1})
        task = prepare(doc)
        request.post({url: url, formData: {task: JSON.stringify(prepare(doc))}}, (err, httpResponse, body) ->
          result = JSON.parse(body)
          console.log(result)
          db.collection(MONGO_COLLECTION_RESULTS).save({
            task: result.id,
            type: result.type,
            data: result
          })
        )
      setTimeout((() -> schedule(collection)), 10)
    )

  setTimeout((() -> schedule(db.collection(MONGO_COLLECTION_TASKS))), 10)
)
Tasks can be added, updated, and deleted, and I have to handle that too.
What about using Redis? But I have no clue how to sync the data from Mongo to Redis when some tasks are waiting for results, an interval has changed, and so on.
Please advise on the best strategy for this.
I don't think this is the right way to solve your use case.
I would suggest not storing the tasks in a database at all, but scheduling them directly when they come in and saving the result, with or without the original task information.
Why not use Quartz to schedule the task?
If you know the tasks to be run in advance, you can schedule them with the Unix crontab, which runs a script that connects to the DB or sends HTTP requests, as in the hypothetical entry below.
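A sketch of such a crontab entry (the schedule and script path are placeholders):

# min hour day month weekday  command -- e.g. run a task script every 5 minutes
*/5 * * * * /usr/bin/node /path/to/run-tasks.js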
If each task is unique and you cannot pre-schedule them that way, perhaps you can keep your current DB collections but poll the DB far less often.
If it is not critical that the tasks execute at exactly the right time, I would do a DB lookup maybe once every 10 seconds to see which tasks should have been executed since the last lookup.
One way to reduce the DB load is to make a query that fetches tasks ordered by when they should be executed, covering all tasks due within the next minute or so. Then you have (hopefully) a small number of tasks in memory and can set a JavaScript timeout for when each should run. If too many tasks are due at the same time, fetching them all from the DB at once could still be problematic.
The essence is to batch several tasks from the DB into memory and handle some of the scheduling there, as sketched below.
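A minimal sketch of that batching idea in plain Node, reusing the question's MONGO_URL and MONGO_COLLECTION_TASKS names; the nextRun field and the executeTask() helper are hypothetical (nextRun would be set from each task's interval when it is inserted or updated):

// Batch tasks due within the next minute into memory, then schedule each
// with setTimeout: one DB query per minute instead of one every 10 ms.
const { MongoClient } = require('mongodb');

async function startScheduler() {
  const db = (await MongoClient.connect(MONGO_URL)).db();
  const tasks = db.collection(MONGO_COLLECTION_TASKS);

  async function scheduleBatch() {
    const horizon = new Date(Date.now() + 60 * 1000);
    const due = await tasks
      .find({ isEnabled: true, nextRun: { $lte: horizon } })
      .sort({ nextRun: 1 })
      .toArray();

    for (const doc of due) {
      const delay = Math.max(0, doc.nextRun.getTime() - Date.now());
      setTimeout(async () => {
        await executeTask(doc); // hypothetical: POST the payload, save the result
        // Advance nextRun so the next batch query picks this task up again.
        await tasks.updateOne(
          { _id: doc._id },
          { $set: { nextRun: new Date(Date.now() + doc.interval * 1000) } }
        );
      }, delay);
    }
    setTimeout(scheduleBatch, 60 * 1000);
  }

  await scheduleBatch();
}

startScheduler();

One caveat with this sketch: a task whose nextRun lands near a batch boundary could be fetched twice before it is advanced, so in practice you would also mark tasks as claimed when scheduling them (much as the original code updates timestamp).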

Meteor Collections behaving differently when uploaded to a live environment

I'm having an issue where my Mongo collection behaves differently after I've uploaded it to Meteor's servers. Locally everything works perfectly, and I'm not seeing any issues when running meteor logs either.
What I'm trying to do is this:
In my collection called RaceList I have several entries. Each has a unique ID, an array of users, and a 'live' variable, which is a boolean.
Every hour I update this collection by removing the live race, setting the next race's live variable to true, and adding another race to the end of the collection.
All this works for me locally, but after uploading to my Meteor server something strange happens. If I don't add anybody to the array of users in the next race to go live, it seems to work OK, but as soon as I join the race, for some reason, the race immediately after the one I have just joined becomes the live race and the race I joined is skipped...
Here is the code from my server that is executed every hour:
updateRaces: ->
  # Remove the finished race
  Meteor.call 'removeLiveRace'
  # Set the next race to live
  Meteor.call 'updateLiveRace'
  # Add another race to the collection
  Meteor.call 'insertNewRace'
And here is the code from my Meteor.methods
removeLiveRace: ->
  id = RaceList.findOne( { live: true } )?._id
  if id
    RaceList.remove _id: id

updateLiveRace: ->
  id = _.first( RaceList.find().fetch() )._id
  RaceList.update id, $set: live: true

insertNewRace: ->
  RaceList.insert
    live : false
    users : []
Any help is greatly appreciated. I'm still just getting started with Meteor so any advice to make this code more efficient/safe would be great!
Thanks : )
While it doesn't look like you're using Cron, the standard warning message still applies to you:
Caveats
Beware, SyncedCron probably won't work as expected on certain shared hosting providers that shutdown app instances when they aren't receiving requests (like Heroku's free dyno tier or Meteor free galaxy).
In other words, because you are using free services, any moments of app shutdown will mess with any Cron/time-based functions.
Figured it out. It was because an entry was being moved to the bottom of the collection when I added more than one user to it. I have no idea why this would happen. I just set the collection to sort by createdAt wherever I query it, and that fixed the problem.
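A likely explanation, as an assumption on my part rather than something from the post: queries without an explicit sort return documents in natural order, which MongoDB does not guarantee to match insertion order; on the old MMAPv1 storage engine, an update that grows a document (such as pushing users into its array) can relocate it on disk and change that order. A minimal sketch of a sorted updateLiveRace in plain JavaScript, assuming insertNewRace also stamps a createdAt field (not shown in the original snippet):

// Pick the next race by explicit creation order instead of natural order.
Meteor.methods({
  updateLiveRace() {
    // Assumes documents carry a createdAt field set on insert (hypothetical).
    const race = RaceList.findOne({ live: false }, { sort: { createdAt: 1 } });
    if (race) {
      RaceList.update(race._id, { $set: { live: true } });
    }
  }
});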

In Meteor is there way to age out data from a cursor?

I'm thinking about building a log follower similar to the Console on OSX.
Log entries get inserted in the database on the server and are displayed in the client browser.
Meteor seems well-suited for this with its ability to follow a cursor, but my question is:
Is there way to age out older data in the client-side Mongo/Collection and the DOM? (While keeping it all in the server-side Mongo?) Otherwise, the longer you run the more memory you'll use and it's just not sustainable.
An easy way to do this is just to publish the N most recent logs. For example:
Meteor.publish('recentLogs', function () {
return Logs.find({owner: this.userId}, {sort: {createdAt: -1}, limit: 100});
});
In this example, the client would only have the 100 most recent logs that he or she owned.
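On the client, subscribing to that publication keeps the local minimongo cache (and therefore the DOM) capped at those 100 documents, because documents that fall out of the published window are removed from the client automatically. A minimal sketch, where the logViewer template name is hypothetical:

// Client side: subscribe, then render from the bounded local collection.
Meteor.subscribe('recentLogs');

Template.logViewer.helpers({
  logs() {
    // Sort again on the client; publish order is not preserved in minimongo.
    return Logs.find({}, { sort: { createdAt: -1 } });
  }
});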