I'm coding an IM system.
I'm using Redis and JSON to store the data. I have a Redis Set with the conversation IDs. When I retrieve them, I would like to get the list sorted by the timestamp of the messages:
conversation 9 -> last message timestamp: 1390300000
conversation 12 -> last message timestamp: 1390200000
conversation 7 -> last message timestamp: 1390100000
I have a Set with the conversations where each user participates (user1337:conversations) and a List with the JSON-encoded messages of each conversation (conversation1234:messages).
I assume there is no need for tricks and this can be done natively in Redis. How would you achieve it?
Sounds like a Sorted Set is exactly what you need.
You would set the timestamp of each conversation as its score (see ZADD) and then you can retrieve them ordered, using commands like ZRANGE, ZRANGEBYSCORE, ZREVRANGE and ZREVRANGEBYSCORE.
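Since a live Redis server is needed to run the real commands, here is a minimal pure-Python sketch of the sorted-set logic (the comments show the corresponding Redis commands); the key name user1337:conversations and the timestamps are taken from the question:

```python
# Pure-Python stand-in for a Redis sorted set: conversation ID -> score.
# In Redis you would run, each time a message arrives:
#   ZADD user1337:conversations 1390300000 9
# and to read the conversations newest-first:
#   ZREVRANGE user1337:conversations 0 -1
conversations = {}

def touch(conv_id, timestamp):
    """ZADD: (re)score a conversation with its latest message timestamp."""
    conversations[conv_id] = timestamp

def newest_first():
    """ZREVRANGE 0 -1: all conversation IDs, highest score first."""
    return sorted(conversations, key=conversations.get, reverse=True)

# Timestamps from the question:
touch(9, 1390300000)
touch(12, 1390200000)
touch(7, 1390100000)
print(newest_first())  # [9, 12, 7]
```

Note that ZADD updates the score of an existing member, so re-adding a conversation on every new message keeps the ordering current with no extra bookkeeping.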
Related
Hi, I'm building a chat messaging system and am reading from and writing to a database for the first time. I'm creating the method calls to retrieve the data I need for this chat. As with most chat systems, I want a general list of message threads showing the name, the most recent message, and the date/time of that most recent message. Once I click on a thread, the corresponding messages show up. I'm trying to query the DB but am having trouble finding the correct command to get the most recent message.
This is what one message contains:
{
  "_id": "134a8cba-2195-4ada-bae2-bc1b47d9925a",
  "clinic_id": 1,
  "created": 1531157560,
  "direction": "o",
  "number": "14383411234",
  "read": true,
  "text": "hey hows it going"
}
Every single message that is sent or received gets POSTed like this. I'm having trouble coming up with the correct commands to get the most recent message for each distinct "number", so that for number x I get its most recent message, and for number y I get its most recent message. "created" is the UNIX time at which the message was created.
This is what I have started with:
Retrieve all thread numbers:
r.db('d2').table('sms_msg').between([1, r.minval], [1, r.maxval], {index:'clinic_number'}).pluck(['number']).distinct()
Retrieve all messages in specific thread:
r.db('d2').table('sms_msg').getAll([1, "14383411234"], {index:'clinic_number'})
Retrieve recent message for all distinct threads:
r.db('d2').table('sms_msg').filter()....???
Some help would really be appreciated!
That's a very tricky query for any database, and it usually involves a multitude of sub-queries. You might want to consider denormalizing: keep a reference to the last entry for each number in another table.
But basically, with your current approach, this might work (untested), though it could be highly inefficient:
r.table('sms_msg').orderBy(r.desc('created')).group('number').nth(0)
Getting just the extreme value of a field is usually fast, but when you want the whole document at the top of a sorted group like this, it is very inefficient in my experience.
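To make the intended result concrete, here is a pure-Python equivalent of that grouped query: sort by created descending, group by number, and take the first document of each group. The sample data mimics the question's documents (the second number and extra messages are made up):

```python
# Pure-Python equivalent of:
#   r.table('sms_msg').orderBy(r.desc('created')).group('number').nth(0)
# i.e. the most recent message per distinct "number".
messages = [
    {"number": "14383411234", "created": 1531157560, "text": "hey hows it going"},
    {"number": "14383411234", "created": 1531157900, "text": "pretty good"},
    {"number": "15145550000", "created": 1531150000, "text": "see you soon"},
]

latest = {}
for msg in sorted(messages, key=lambda m: m["created"], reverse=True):
    latest.setdefault(msg["number"], msg)  # first one seen = most recent

print(latest["14383411234"]["text"])  # pretty good
```

This is also a picture of why denormalizing helps: maintaining a `latest` table incrementally on each write avoids the full sort-and-group at read time.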
I'm pretty confused by this hip thing called NoSQL, especially CloudantDB on Bluemix. As you know, this DB doesn't store values chronologically; it's the programmer's task to sort the entries if they want the data to... well... be sorted.
What I'm trying to achieve is simply to get the last, say, 100 values a sensor has sent to Watson IoT (which saves everything in the connected CloudantDB) in an ORDERED way. In the end it would be nice to show them in a D3.js-style graph, but that's another task. First I need the values in an ordered array.
What I tried so far: I used curl from PHP to get the data via https://averylongID-bluemix.cloudant.com/iotp_orgID_iotdb_2018-01-25/_all_docs?limit=20&include_docs=true
What I get is an unsorted array of 20 row entries with random timestamps: the last 20 entries in the DB, but not the last in terms of timestamps.
My question is: do you know of a way to get the "last" 20 entries, sorted by timestamp? I did a POST request with a JSON body asking for the data sorted by timestamp, but that doesn't work, maybe because of the ISO timestamp string.
Do I really have to write a JavaScript or PHP script that fetches ALL the database entries, parses the timestamps, sorts the array, and then takes the (now really) last 20 or 100 entries? I can't believe that.
Many thanks in advance!
I finally found out how to get the data in a nice ordered way. The key is to use the _design API together with the _view API.
So a curl request with the following URL / attributes and a query string did the job:
https://alphanumerical_something-bluemix.cloudant.com/iotp_orgID_iotdb_2018-01-25/_design/iotp/_view/by-date?limit=120&q=name:%27timestamp%27
The curl result gets me the first (in terms of time) 120 entries. I just have to find out how to get the last entries, but that's already a pretty good result. I can now pass the data on to a nice JS chart and display it.
One option may be to include the timestamp as part of the ID. The _all_docs query returns documents in order by id.
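Since _all_docs sorts by ID lexicographically, a fixed-width timestamp prefix makes that order chronological. A small sketch of the idea (the ID scheme and the timestamps are made up for illustration):

```python
import uuid

def make_id(unix_ts):
    # Zero-pad the timestamp so lexicographic order equals numeric order,
    # then append a random suffix to keep IDs unique.
    return f"{unix_ts:013d}-{uuid.uuid4()}"

ids = [make_id(t) for t in (1516838400, 1516838460, 1516838520)]

# _all_docs would return these already sorted by _id, i.e. oldest first;
# reverse the slice (or pass descending=true to _all_docs) for the newest.
assert sorted(ids) == ids
```

With IDs like this, "the last 20 entries" is simply `_all_docs?descending=true&limit=20`.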
If that approach does not work for you, you could look at creating a secondary index based on the timestamp field. One type of index is Cloudant Query:
https://console.bluemix.net/docs/services/Cloudant/api/cloudant_query.html#query
Cloudant query allows you to specify a sort argument:
https://console.bluemix.net/docs/services/Cloudant/api/cloudant_query.html#sort-syntax
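For instance, a request body POSTed to the database's _find endpoint along these lines would return the 20 newest documents; the field name timestamp is an assumption, and a JSON index on that field must exist for the sort to work:

```json
{
  "selector": {"timestamp": {"$gt": null}},
  "sort": [{"timestamp": "desc"}],
  "limit": 20
}
```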
Another approach that may be useful for you is the _changes api:
https://console.bluemix.net/docs/services/Cloudant/api/database.html#get-changes
The changes API allows you to receive a continuous feed of changes in your database. You could feed these changes into a D3 chart for example.
I have been structuring my Firebase database so that messages end up being stored sequentially. Unfortunately, Firebase must have made a change, because numbers no longer sort sequentially, so I have to store the data a different way and still get the messages to come out in the correct order. The mainstream way to do this is to store the message data under an auto ID. I do not understand how to retrieve the data so that it comes back in the correct order, though. I have read a lot of code and docs, and still that question lingers.
So, to clarify: This is what a message in my Database would look like:
id123
NAME: John
MESSAGE: Hi guys!
DATE: 11/11/16
TIME: 6:04 PM
I stored the id in that format ("id" + numberOfPost). It is considered good practice to store by auto ID, but I do not know how to retrieve the data in its proper order. Here is an example of my database structure with an auto ID:
Kr7r2kupqerhrepqbuixd
NAME: John
MESSAGE: Hi guys!
DATE: 11/11/16
TIME: 6:04 PM
This would sort into the "K" section, and the post after it could start with a "B", putting the database out of order. How do I structure the database to use auto IDs and still be able to get messages in the proper order?
My code for posting a message looks like this:
self.firebase.child("Chats").child(chatID!).child("id\(counter)").child("MESSAGE").setValue(message!)
self.firebase.child("Chats").child(chatID!).child("id\(counter)").child("FIRST_NAME").setValue(fn!)
self.firebase.child("Chats").child(chatID!).child("id\(counter)").child("LAST_NAME").setValue(ln!)
self.firebase.child("Chats").child(chatID!).child("id\(counter)").child("ID").setValue(id!)
self.firebase.child("Chats").child(chatID!).child("id\(counter)").child("DATE").setValue(date!)
self.firebase.child("Chats").child(chatID!).child("id\(counter)").child("TIME").setValue(time!)
Thank you for your help!
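No answer appears in this thread, but one common approach can be sketched: make the ordering independent of the key by sorting on a timestamp stored with each message (Firebase's queryOrdered(byChild:) does this on the server if you store a numeric timestamp child). A pure-Python sketch of that client-side ordering, using the DATE/TIME format from the example above (the second message and its key are made up):

```python
from datetime import datetime

# Messages keyed by auto ID: lexicographic key order is NOT chronological.
messages = {
    "Kr7r2kupqerhrepqbuixd": {"NAME": "John", "MESSAGE": "Hi guys!",
                              "DATE": "11/11/16", "TIME": "6:04 PM"},
    "Babc123hypothetical00": {"NAME": "Jane", "MESSAGE": "Hello!",
                              "DATE": "11/11/16", "TIME": "6:10 PM"},
}

def sent_at(msg):
    # Parse the question's DATE/TIME format into a sortable datetime.
    return datetime.strptime(msg["DATE"] + " " + msg["TIME"], "%m/%d/%y %I:%M %p")

ordered = sorted(messages.values(), key=sent_at)
print([m["MESSAGE"] for m in ordered])  # ['Hi guys!', 'Hello!']
```

Storing a single numeric timestamp child instead of separate DATE/TIME strings would avoid the parsing step entirely.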
I have a server running an IPython controller and 12 IPython engines. I connect to the controller from my laptop over SSH. I submitted some jobs to the controller using the load-balanced view interface (in non-blocking mode) and stored the message IDs from the AsyncResult object returned by the apply_async() method.
I accidentally lost the message IDs for the jobs and wanted to know if there's a way to retrieve the job IDs (or the results) from the Hub database. I use a SQLite database for the Hub, and I can get the rc.db_query() method to work, but I don't know what to look for.
Does anyone know how to query the Hub database only for message IDs of the jobs I submitted? What's the easiest way of retrieving the job results from the Hub, if I don't have access to the AsyncHubResult object (or their message IDs)?
Thanks!
Without the message IDs, you might have a pretty hard time finding the right tasks, unless not many tasks have been submitted.
The query syntax is based on MongoDB (it's a passthrough when you use MongoDB, and a subset of simple operators is implemented for SQLite).
Quick summary: a query is a dict. If you use literal values, they are equality tests, but you can use dict values for comparison operators.
You can search by date for any of the timestamps:
submitted: arrived at the controller
started: arrived on an engine
completed: finished on the engine
For instance, to find tasks submitted yesterday:
from datetime import date, time, timedelta, datetime
# round to midnight
today = datetime.combine(date.today(), time())
yesterday = today - timedelta(days=1)
rc.db_query({'submitted': {
    '$lt': today,      # less than midnight last night
    '$gt': yesterday,  # greater than midnight the night before
}})
or all tasks submitted 1-4 hours ago:
found = rc.db_query({'submitted': {
    '$lt': datetime.now() - timedelta(hours=1),
    '$gt': datetime.now() - timedelta(hours=4),
}})
With the results of that, you can look at keys like client_uuid to retrieve all messages submitted by a given client instance (e.g. a single notebook or script):
client_uuid = found[0]['client_uuid']
all_from_client = rc.db_query({'client_uuid': client_uuid})
Since you are only interested in the message IDs at this point, you can pass keys=['msg_id'] to retrieve just those. We can then use these msg_ids to get all the results produced by a single client session:
# construct list of msg_ids
msg_ids = [ r['msg_id'] for r in rc.db_query({'client_uuid': client_uuid}, keys=['msg_id']) ]
# use client.get_result to retrieve the actual results:
results = rc.get_result(msg_ids)
At this point, you have all of the results, but you have lost the association of which results came from which execution. There isn't a lot of info to help you out there, but you might be able to tell by type, timestamps, or perhaps select the 9 final items from a given session.
I've searched and didn't find an exact answer to this common problem.
I would like to show users new/unread posts, for example as a list of topics that contain unread posts.
If the user opens any of those topics, it is automatically marked as read and will no longer show in that list when he clicks on unread posts. Plus the possibility to mark all as read.
I was thinking about maybe showing unread posts only from last 30 days, so data would not be so big.
The obvious and, at first glance, best solution would be embedding objects inside arrays: each array element would hold a userid and the timestamp of that user's last view of the topic, and I would just compare the timestamp of the last post in the thread with the timestamp of the user's last view of that thread.
Only 2-3 queries would then be needed to show the results to the user.
So for example it would look like this:
{
_id: uniqueObjectid,
id: topic_id,
topic: topic of the thread,
last_update: timestamp of last reply to that topic,
reads: [
{id: userid, last_view: timestamp of last view on this topic by user},
...
]
}
I would delete from this collection all threads whose last_update field is older than 30 days.
Showing unread posts to users would then be very easy: just compare last_update with last_view for a certain userid.
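That comparison, sketched in pure Python over the proposed document shape (the user IDs and timestamps are made up; in MongoDB itself this per-document cross-field comparison would typically need an aggregation pipeline):

```python
# Documents shaped like the schema above: reads[] holds per-user last views.
topics = [
    {"id": 1, "topic": "first thread", "last_update": 2000,
     "reads": [{"id": "user1", "last_view": 1500}]},
    {"id": 2, "topic": "second thread", "last_update": 1000,
     "reads": [{"id": "user1", "last_view": 1200}]},
    {"id": 3, "topic": "never opened", "last_update": 900, "reads": []},
]

def unread_topics(userid):
    unread = []
    for t in topics:
        views = [r["last_view"] for r in t["reads"] if r["id"] == userid]
        # Unread if the user never viewed it, or a reply arrived since the last view.
        if not views or views[0] < t["last_update"]:
            unread.append(t["id"])
    return unread

print(unread_topics("user1"))  # [1, 3]
```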
But it's not a good solution: from what I've read, the way arrays are implemented in MongoDB makes that approach very slow. Imagine holding the last view of some topic for 1000 users; that means 1000 indexed array elements.
So it can't be done with arrays.
Here Asya from MongoDB describes why big embedded arrays should not be used link
I am having difficulty thinking of any other efficient way to solve this issue.