IPython parallel: how to recover job IDs from IPcontroller

I have a server running an IPController and 12 IPEngines. I connect to the controller from my laptop using SSH. I submitted some jobs to the controller using the load-balanced view interface (in non-blocking mode) and stored the message IDs in the AsyncResult object returned by the apply_async() method.
I accidentally lost the message IDs for the jobs and wanted to know if there's a way to retrieve the job IDs (or the results) from the Hub database. I use a SQLite database for the Hub, and I can get the rc.db_query() method to work, but I don't know what to look for.
Does anyone know how to query the Hub database for only the message IDs of the jobs I submitted? And what's the easiest way of retrieving the job results from the Hub if I don't have access to the AsyncHubResult objects (or their message IDs)?
Thanks!

Without the message IDs, you might have a pretty hard time finding the right tasks, unless there haven't been many tasks submitted.
The query syntax is based on MongoDB's (it's a passthrough when you use MongoDB, and a subset of simple operators is implemented for SQLite).
Quick summary: a query is a dict. If you use literal values, they are equality tests, but you can use dict values for comparison operators.
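For instance (msg_id and some_datetime are placeholders, just to show the two forms):
# a literal value is an equality test
rc.db_query({'msg_id': msg_id})
# a dict value applies comparison operators
rc.db_query({'completed': {'$gt': some_datetime}})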
You can search by date for any of the timestamps:
submitted: arrived at the controller
started: arrived on an engine
completed: finished on the engine
For instance, to find tasks submitted yesterday:
from datetime import date, time, timedelta, datetime
# round to midnight
today = datetime.combine(date.today(), time())
yesterday = today - timedelta(days=1)
rc.db_query({'submitted': {
    '$lt': today,      # less than midnight last night
    '$gt': yesterday,  # greater than midnight the night before
}})
or all tasks submitted 1-4 hours ago:
found = rc.db_query({'submitted': {
    '$lt': datetime.now() - timedelta(hours=1),
    '$gt': datetime.now() - timedelta(hours=4),
}})
With the results of that, you can look at keys like client_uuid to retrieve all messages submitted by a given client instance (e.g. a single notebook or script):
client_uuid = found[0]['client_uuid']
all_from_client = rc.db_query({'client_uuid': client_uuid})
Since you are only interested in results at this point, you can specify keys=['msg_id'] to only retrieve the message IDs. We can then use these msg_ids to get all the results produced by a single client session:
# construct list of msg_ids
msg_ids = [r['msg_id'] for r in rc.db_query({'client_uuid': client_uuid}, keys=['msg_id'])]
# use client.get_result to retrieve the actual results:
results = rc.get_result(msg_ids)
At this point you have all of the results, but you have lost the association of which result came from which execution. There isn't a lot of info to help you out there, but you might be able to tell by type, by the timestamps, or perhaps by selecting the 9 final items from a given session, as in the sketch below.
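A minimal sketch of that last idea, using the timestamps described above:
# fetch msg_ids together with their completion times for one client session
records = rc.db_query({'client_uuid': client_uuid}, keys=['msg_id', 'completed'])
# sort by completion time and keep the most recent ones, e.g. the 9 final tasks
records.sort(key=lambda r: r['completed'])
latest = rc.get_result([r['msg_id'] for r in records[-9:]])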

Related

Atomically query for all collection documents + watching for further changes

Our Java app saves its configurations in a MongoDB collection. When the app starts, it reads all the configurations from MongoDB and caches them in Maps. We would like to use the change stream API to also be able to watch for updates to the configurations collection.
So, upon app startup, first we would like to get all configurations, and from now on - watch for any further change.
Is there an easy way to execute the following atomically:
1. A find() that retrieves all configurations (documents)
2. Start a watch() that will send all further updates
By atomically I mean: without potentially missing any update (between 1 and 2, someone could update the collection with a new configuration).
To make sure I lose no update notifications, I found that I can use watch().startAtOperationTime(serverTime) (for MongoDB 4.0 or later), as follows:
1. Query the MongoDB server for its current time, using a command such as: Document hostInfoDoc = mongoTemplate.executeCommand(new Document("hostInfo", 1))
2. Query for all interesting documents: List<C> configList = mongoTemplate.findAll(clazz);
3. Extract the server time from hostInfoDoc: BsonTimestamp serverTime = (BsonTimestamp) hostInfoDoc.get("operationTime");
4. Start the change stream configured with the saved server time: ChangeStreamIterable<Document> changes = eventCollection.watch().startAtOperationTime(serverTime);
Since 1 ends before 2 starts, we know that the documents returned by 2 are at least as fresh as those at that server time, and any updates that happen on or after that server time will be sent to us by the change stream. (I don't mind re-running redundant updates: I use a Map as a cache, so an extra add/remove makes no difference, as long as the last action arrives.)
I think I could also use watch().resumeAfter(_idOfLastAddedDoc) (I didn't try it). I did not use that approach because of the following scenario: the collection is empty, and the first document is added after getting all (zero) documents but before starting the watch(). In that scenario I have no previous document _id to use as a resume token.
Update
Instead of using "hostInfo" to get the server time, which we couldn't use in our production environment, I ended up using "dbStats", like this:
Document dbStats = mongoOperations.executeCommand(new Document("dbStats", 1));
BsonTimestamp serverTime = (BsonTimestamp) dbStats.get("operationTime");
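For what it's worth, here is the same read-then-watch pattern as a minimal sketch in Python with PyMongo, since the steps are easier to see in one place (illustrative only: the URI, database, and collection names are made up, and change streams require a replica set):
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")  # hypothetical URI
db = client["appdb"]  # hypothetical database name

# 1. Ask the server for its current operation time (dbStats, as in the update above)
server_time = db.command("dbStats")["operationTime"]  # a BSON Timestamp

# 2. Read the whole collection into the cache
configs = {doc["_id"]: doc for doc in db.configurations.find()}

# 3. Watch from that timestamp on; updates that raced with step 2 are replayed,
#    which is harmless for an idempotent cache
with db.configurations.watch(start_at_operation_time=server_time) as stream:
    for change in stream:
        print(change["operationType"], change["documentKey"])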

MeteorJS - How to send two separate queries of the same collection from server to client

I am trying to send two separate sets of data from the same collection from server to client. Data is being inserted in the collection on a set interval of 30 seconds. One set of data sent to the client must return all documents over the course of the current day on an hourly basis, while the other set of data simply sends the most recent entry in the collection. I have a graph that needs to display hourly data, as well as fields that need to display the most recent record every 30 seconds, however, I cannot seem to decouple these two data sets. The query for the most recent entry seems to always overwrite the query for the hourly data when attempting to access the data on the client. So my question summed up is: How does one send two separate sets of data of the same collection from server to client, and then access these two separate sets independently on the client?
The answer is simple, you cannot!
The server always answers the client with the result set that the client asked for. So if the client needs two separate (different) result sets, then the client must fire two different queries: one requesting the hourly data and one requesting the last (newest) entry.
Use added, changed, removed to modify the results from the two queries so that they are "transformed" into different fields. https://docs.meteor.com/api/pubsub.html#Subscription-added
However, this is probably not your issue. You are almost certainly using the same string as the name argument of your Meteor.publish call, or you are accidentally Meteor.subscribe-ing to the same Meteor.publish twice.
Make two separate Meteor.publish names, one for the most recent and one for the hourly data. Subscribe to each of them separately. The commenter is incorrect.

IBM Cloudant DB - get historical data - best way?

I'm pretty confused by this hip thing called NoSQL, especially Cloudant DB on Bluemix. As you know, this DB doesn't store the values chronologically. It's the programmer's task to sort the entries in case he wants the data to... well... be sorted.
What I'm trying to achieve is to simply get, let's say, the last 100 values a sensor has sent to Watson IoT (which saves everything in the connected Cloudant DB) in an ORDERED way. In the end it would be nice to show them in a D3.js-style kind of graph, but that's another task. I first need the values in an ordered array.
What I tried so far: I used curl to get the data via PHP from https://averylongID-bluemix.cloudant.com/iotp_orgID_iotdb_2018-01-25/_all_docs?limit=20&include_docs=true
What I get is an unsorted array of 20 row entries with random timestamps: the last 20 entries in the DB, but not in terms of timestamps.
My question is now: Do you know of a way to get the "last" 20 entries? Sorted by timestamp? I did a POST request with a JSON string where I wanted the data to be sorted by the timestamp, but that doesn't work, maybe because of the ISO timestamp string.
Do I really have to write a JavaScript or PHP script to get ALL the database entries, find the last 20 or 100 of them by parsing the timestamps, sort the array again, and only then get the (now really) last entries? I can't believe that.
Many thanks in advance!
I finally found out how to get the data in a nicely ordered way. The key is to use the _design API together with the _view API.
So a curl request with the following URL / attributes and a query string did the job:
https://alphanumerical_something-bluemix.cloudant.com/iotp_orgID_iotdb_2018-01-25/_design/iotp/_view/by-date?limit=120&q=name:%27timestamp%27
The curl result gets me the first (in terms of time) 120 entries. I just have to find out how to get the last entries, but that's already a pretty good result. I can now pass the data on to a nice JS chart and display it.
One option may be to include the timestamp as part of the document ID, since the _all_docs query returns documents in order by ID.
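A small sketch of that idea over Cloudant's plain HTTP API, in Python (the URL, credentials, and document fields are made up for illustration):
import requests

DB = "https://account.cloudant.com/iotdb"  # hypothetical database URL
AUTH = ("apikey", "secret")                # hypothetical credentials

# Write each reading with an ISO 8601 timestamp prefix as its _id,
# so that lexicographic _id order is also chronological order.
requests.post(DB, auth=AUTH, json={"_id": "2018-01-25T14:30:00Z:sensor42", "temp": 21.7})

# _all_docs returns rows sorted by _id; descending=true gives the newest
# first, so the "last 20" is just limit=20.
resp = requests.get(DB + "/_all_docs", auth=AUTH,
                    params={"descending": "true", "limit": 20, "include_docs": "true"})
print([row["id"] for row in resp.json()["rows"]])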
If that approach does not work for you, you could look at creating a secondary index based on the timestamp field. One type of index is Cloudant Query:
https://console.bluemix.net/docs/services/Cloudant/api/cloudant_query.html#query
Cloudant Query allows you to specify a sort argument:
https://console.bluemix.net/docs/services/Cloudant/api/cloudant_query.html#sort-syntax
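For example, a sketch of such a query in Python over the HTTP API (the database URL, credentials, and the timestamp field name are assumptions):
import requests

DB = "https://account.cloudant.com/iotdb"  # hypothetical database URL
AUTH = ("apikey", "secret")                # hypothetical credentials

# Cloudant Query needs a JSON index on the field before it can sort by it.
requests.post(DB + "/_index", auth=AUTH,
              json={"index": {"fields": ["timestamp"]}, "type": "json"})

# Then _find can select on the field and sort descending: the newest 20 entries.
resp = requests.post(DB + "/_find", auth=AUTH, json={
    "selector": {"timestamp": {"$gt": None}},
    "sort": [{"timestamp": "desc"}],
    "limit": 20,
})
docs = resp.json()["docs"]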
Another approach that may be useful for you is the _changes api:
https://console.bluemix.net/docs/services/Cloudant/api/database.html#get-changes
The changes API allows you to receive a continuous feed of changes in your database. You could feed these changes into a D3 chart for example.
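A sketch of consuming the feed by long-polling, again in Python over the HTTP API (same hypothetical URL and credentials as above):
import requests

DB = "https://account.cloudant.com/iotdb"  # hypothetical database URL
AUTH = ("apikey", "secret")                # hypothetical credentials

since = "now"  # start from the present; use 0 to replay history
while True:
    resp = requests.get(DB + "/_changes", auth=AUTH,
                        params={"feed": "longpoll", "since": since, "include_docs": "true"})
    body = resp.json()
    for change in body["results"]:
        print(change["id"], change.get("doc"))  # e.g. push the doc into the chart
    since = body["last_seq"]  # resume from the last sequence we saw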

In Couchbase, expired documents are included in view query results with NULL contents

Recently we started to use Couchbase. We are using Java spring-data-couchbase with Jersey to access it. Using the low-level Java SDK API, we set an expiry time (TTL) on a particular document by its key (ID). It works fine. The code is as follows.
// define couchbaseTemplate for lower-level access to the Java SDK
@Autowired
CouchbaseTemplate couchbaseTemplate;

// setExpiry updates the expiry of the document with the given ID
@Override
public void setExpiry(String key, int expN) throws RepositoryException {
    couchbaseTemplate.getCouchbaseClient().touch(key, expN);
}
The problem we face is that when we fetch a list of documents with a query, the list contains the expired documents, and when we try to access those documents from the list we find them to be null. But if we execute the query after a while, the expired documents are no longer included in the list.
Example: when expN = 10 seconds and we execute the query around 10 seconds after setting the TTL, the expired documents are included.
If we execute the query around 20 seconds after setting the TTL, the expired documents are no longer included.
For the stale option we set:
Query.setStale(Stale.FALSE)
We have also tried manipulating:
Query.setIncludeDocs
But no luck. Any help?
Couchbase Server does expiries lazily. There are three ways an item can be expired:
When a document is accessed (get operation) the expiration value is checked
When the expiry pager runs
When the disk compaction process runs (only in Couchbase Server 3.0 onwards)
As a result, views will not be updated until one of these three processes has happened.
For this use case you could simply do a range query against the view using the current time, so it only returns documents that have not expired. This assumes the time is the same on the cluster as well as the client, and that the view being used is this one:
function (doc, meta) {
    emit(meta.expiration, null);
}
The meta.expiration is an epoch timestamp, so the following query could be used:
String currentEpoch = String.valueOf((System.currentTimeMillis()/1000));
bucket.query(ViewQuery.from("designdoc", "myview").startkey(currentEpoch));
Please note that this will return all alive documents that have an expiration set.
If you want to do something more interesting with date formats have a look at the Date and time selection half way down in the View and query examples chapter in the Couchbase Server manual.
As the official Couchbase documentation says:
Detecting Expired Documents in Result Sets : If you are using views for indexing items from Couchbase Server, items that have not yet been removed as part of the expiry pager maintenance process will be part of a result set returned by querying the view. To exclude these items from a result set you should use query parameter include_doc set to true.
For expired documents, if you set include_doc=true, Couchbase Server returns a result set indicating the document does not exist anymore. Specifically, a key that has expired but has not yet been removed by the cleanup process will appear in the result set as a row where "doc": null.
So, this is how Couchbase works with expired documents.
For your case, just filter out the rows whose doc is null; the rest will be your expected result.
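A small sketch of that filtering step over the view REST API described in the quoted documentation (Python; the bucket, design document, and view names are placeholders):
import requests

# Query the view with include_docs=true: expired-but-not-yet-purged keys
# come back as rows whose "doc" is null, exactly as the docs describe.
resp = requests.get("http://localhost:8092/mybucket/_design/designdoc/_view/myview",
                    params={"include_docs": "true", "stale": "false"})
rows = resp.json()["rows"]

# Keep only the rows that still have a live document.
alive = [row for row in rows if row.get("doc") is not None]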

Meteor Collections behaving differently when uploaded to a live environment

I'm having an issue where my Mongo collection is behaving differently after I've uploaded it to Meteor's servers. Locally everything works perfectly, and I'm not seeing any issues when running meteor logs either.
What I'm trying to do is this:
In my collection called RaceList I have several entries. Each has a unique ID, an array of users, and a 'live' variable which is a boolean.
Every hour I update this collection by removing the live race, setting the next race's live variable to true and adding another race to the end of the collection.
All this works for me locally, but after uploading to my Meteor server something strange is happening. If I don't add anybody to the array of users in the next race to turn live, it seems to work OK, but as soon as I join a race, for some reason the race immediately after the one I have just joined becomes the live race, and the race I joined is skipped...
Here is the code from my server that is executed every hour:
updateRaces: ->
    # Remove the finished race
    Meteor.call 'removeLiveRace'
    # Set the next race to live
    Meteor.call 'updateLiveRace'
    # Add another race to the collection
    Meteor.call 'insertNewRace'
And here is the code from my Meteor.methods
removeLiveRace: ->
    id = RaceList.findOne( { live: true } )?._id
    if id
        RaceList.remove _id: id

updateLiveRace: ->
    id = _.first( RaceList.find().fetch() )._id
    RaceList.update id, $set: live: true

insertNewRace: ->
    RaceList.insert
        live : false
        users : []
Any help is greatly appreciated. I'm still just getting started with Meteor so any advice to make this code more efficient/safe would be great!
Thanks : )
While it doesn't look like you're using Cron, the standard warning message still applies to you:
Caveats
Beware, SyncedCron probably won't work as expected on certain shared hosting providers that shutdown app instances when they aren't receiving requests (like Heroku's free dyno tier or Meteor free galaxy).
In other words, because you are using free services, any moments of app shutdown will mess with any Cron/time-based functions.
Figured it out. It was because an entry was being sent to the bottom of the collection when I added more than one user to it. I have no idea why this would happen (presumably because, without an explicit sort, the query returns documents in natural order, and an update that grows a document can relocate it). I just set the collection to sort by createdAt wherever I query it, and that fixed the problem.