I have an unreliable data source. I ask for 20 data items from it and it
might return:
20 items
1-20 items
0 items (there will be no more items from that data source ever)
I want to create a data provider that guarantees that a request for 20 items always yields 20 items, making multiple requests to the data source if needed.
So if the source returns 14 and then 11 items, I need to cache the remaining 5 items for the next request, if there is one.
A class implementing this interface should handle all the logic mentioned above:
interface GuaranteedDataProvider {
    fun getData(itemCount: Int): Single<List<Data>>
}
How should one go about doing that?
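A minimal sketch of the buffering logic, in plain synchronous Python rather than Kotlin/RxJava (the `fetch` callable is a stand-in for the unreliable source): cache any surplus items and keep calling the source until the request is filled or the source signals exhaustion by returning zero items.

```python
class GuaranteedDataProvider:
    """Returns exactly `item_count` items per request, making extra
    calls to the unreliable source and caching any surplus."""

    def __init__(self, fetch, batch_size=20):
        self._fetch = fetch          # fetch(n) -> list of 0..n items
        self._batch_size = batch_size
        self._cache = []             # surplus items from earlier fetches
        self._exhausted = False      # source returned 0 items: no more, ever

    def get_data(self, item_count):
        while len(self._cache) < item_count and not self._exhausted:
            batch = self._fetch(self._batch_size)
            if not batch:
                self._exhausted = True
            self._cache.extend(batch)
        result, self._cache = self._cache[:item_count], self._cache[item_count:]
        return result
```

With a source that returns 14 and then 11 items, the first get_data(20) triggers two fetches, returns 20 items, and leaves 5 cached for the next call. In the RxJava version, the same state would live behind the Single returned by getData, with access serialized so concurrent subscriptions cannot corrupt the cache.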
I have a group chat feature in my app that has its messages node structured like this. Currently, it doesn't use the fan-out technique. It just lists all of the messages under the group name e.g. "group1"
groups: {
  group1: {
    -MEt4K5xhsYL33anhXpP: {
      fromUid: "diidssm......."
      userImage: "https://firebasestorage..."
      text: "hello"
      date: 1617919946
      emojis: {
        "heart": 2
        "like": 1
      }
    }
    -MEt8BLP2yMEUMPbG2zV: {
      ...
    }
    -MF-Grpl8Jchxpbn2mxH: {
      ...
    }
    -MF-OUjWXsFh7lBPosMf: {
      ...
    }
  }
}
I first observe the most recent 40 messages and watch for newly added children, like this:
ref = Database.database().reference().child("groups").child("group1")
ref.queryLimited(toLast: 40).observe(.childAdded, with: { (snapshot) in
    ...
    // add to messages array to load collection view
    // for each message observe emojis and update emojis to reflect changes e.g. +1 like
    snapshot.ref.child("emojis").observe(.value, with: { (snapshot) in
        ...
    })
})
Every time the user scrolls up, I load another 40 messages (and observe the emojis child under each of those message nodes), using the last date (with an index on date in security rules), like this:
ref.queryOrdered(byChild: "date").queryEnding(beforeValue: prevdate, childKey: messageId).queryLimited(toLast: 40).observeSingleEvent(of: .value, with: { (snapshot) in
I understand the fan-out technique is used to get less information per synchronization. If I attach a listener to the groups/groupname/ to get a list of all messages for that group, I will also ask for all the info of each and every message under that node. With the fan out approach I can also just ask for the message information of the 40 most recent messages and the next 40 per scroll up using the keys of the messages from another node like this.
allGroups: {
  group1: {
    -MEt4K5xhsYL33anhXpP: 1
    -MEt8BLP2yMEUMPbG2zV: 1
    -MF-Grpl8Jchxpbn2mxH: 1
    -MF-OUjWXsFh7lBPosMf: 1
  }
}
However, if I am using queryLimited(toLast: 40) is the fan-out approach beneficial or even necessary? Wouldn't this fix the problem of "I will also ask for all the info of each and every message under that node"?
In terms of checking for new messages, I just check using .childAdded in the first code above (ref.queryLimited(toLast: 40).observe(.childAdded)). According to the post below, queryLimited(toLast: 40) will sync the last 40 child nodes, and keep synchronizing those (removing previous ones as new ones are added).
Some questions about keepSynced(true) on a limited query reference
I'm assuming that if group1 had 1000 messages, with this approach I am reading only the 40 most recent messages I need, plus the next 40 per scroll, thus ignoring the other several hundred. Why would I use the fan-out technique then? Maybe I'm not understanding something fundamental about limited queries.
Side question: Should I be including references to profile images under each message node? Is it bad to do this in terms of Cloud Storage and Realtime Database storage? Ideally there would be hundreds of group chats.
There are a lot of comments on the question, so I thought I would condense all of that into an answer.
The intention of the 'fan out technique' in the question was to maximize query performance.
In this use case the query only returns the last 40 results
ref.queryLimited(toLast: 40)
The assumption in the question was that Firebase had to 'go through' all of the nodes before those 40 to get to the 40, therefore affecting performance. That's not the case with Firebase so whether it be the first 40 or the last 40, the performance is 'the same'.
Because of that, no 'fan-out' is really needed in this situation. For clarity:
Fan-out is the process of duplicating data in the database. When data is duplicated, it eliminates slow joins and increases read performance.
I am going to steal a fan-out example from an old Firebase blog. Here's a fan-out that updates multiple nodes at once; since it's an atomic operation, it either all passes or all fails.
let updatedUser = ["name": "Shannon", "username": "shannonrules"]
let ref = Firebase(url: "https://<YOUR-FIREBASE-APP>.firebaseio.com")
let fanoutObject = ["/users/1": updatedUser,
                    "/usersWhoAreCool/1": updatedUser,
                    "/usersToGiveFreeStuffTo/1": updatedUser]
ref.updateChildValues(fanoutObject) // atomic updating goodness
I will also include a link to Introducing multi-location updates and more as well as suggesting a read on the topic of denormalization.
In the question, there isn't really any data to 'fan out' so it would not be applicable as there isn't an attempt to join (pull data from multiple nodes) or to update multiple nodes.
The one change I would suggest is to remove the emojis node from the message node.
As is, every message has its own emoji observer, which results in thousands of observers that can be difficult to manage. I would create a separate top-level node just for those emojis:
emojis
  -MEt4K5xhsYL33anhXpP:  // the message id
    "heart": 2           // or however you want to store them
    "like": 1
Then add a single observer (much easier to manage!) to the emoji node. When an emoji changes, that one observer will notify the app of which message it was for, and what the change was. It will also cut down on reads and overall cost.
I need to serve elements sorted on a particular field (score) to a client in a paginated fashion.
The elements are stored in MongoDB as part of a collection. One document looks like this:
{
  "id": ObjectId("<>"),
  "score": 10
}
To serve the elements, I reverse sort the documents on the score field, and serve 10 elements to the client.
Also, the value in the score field is continuously getting updates from another consumer in an async fashion.
How can I perform pagination of such documents? I was thinking about the following approaches that I usually use, but cannot find a way to fit them in the above design:
Return the last served score as offset and in the next request to fetch elements use the offset.
Issue: This would return some duplicates with the same score (as many elements can have same scores).
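One common way around those duplicates is keyset (cursor) pagination with _id as a tie-breaker: the cursor is the (score, _id) pair of the last served document, and the next page asks for strictly lower scores, or the same score with a higher _id. A sketch of the filter as a pymongo-style query dict (the `matches` helper is only an in-memory stand-in to demonstrate the behavior; integer `_id`s stand in for ObjectIds, which are also totally ordered):

```python
def next_page_filter(last_score, last_id):
    """Compound (score, _id) cursor: strictly lower scores, or the
    same score with a higher _id, so ties are never re-served."""
    return {"$or": [
        {"score": {"$lt": last_score}},
        {"score": last_score, "_id": {"$gt": last_id}},
    ]}

def matches(doc, f):
    """Tiny in-memory evaluator for the filter above (demo only)."""
    lt, eq = f["$or"][0]["score"]["$lt"], f["$or"][1]
    return doc["score"] < lt or (doc["score"] == eq["score"]
                                 and doc["_id"] > eq["_id"]["$gt"])

docs = [{"_id": i, "score": s} for i, s in enumerate([10, 10, 10, 9, 8])]
# Serve order: score descending, _id ascending as the tie-breaker.
docs.sort(key=lambda d: (-d["score"], d["_id"]))
page1 = docs[:2]                                  # first page of 2
cursor = (page1[-1]["score"], page1[-1]["_id"])   # returned to the client
page2 = [d for d in docs if matches(d, next_page_filter(*cursor))][:2]
```

Note that the score being updated asynchronously can still move documents between pages mid-pagination; the cursor only guarantees no duplicates relative to the last served (score, _id), not a frozen snapshot.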
I have a more general question about how to implement lazy loading for lists, grids, etc. in my projects. My question is not about a specific framework or language, because I develop different kinds of projects with C#, Java, Android, WinForms, etc.
My question is: how do I implement this lazy loading pattern in a RESTful environment?
For example, I have a database with about 2 million records. The user selects some filters on the client side and the server responds with about 100,000 records. That's too much to show at once; on top of that, the time to load and render the items in the list control can run to several minutes (on bad days). The better way is to show the user the first 200 items and load the next blocks as needed.
Another example is searching for images on a mobile device. If the search returns about 10,000 results, the mobile traffic for image loading explodes. So it's better to show up to 20 entries and, if the user scrolls to the bottom for example, load the next 20.
So, how can I achieve that on the backend / DB / client side? Unfortunately I can't query "select 200 to 400 from table where ...".
Some external APIs let me send a "page token" to get the next block of items. But how do I know which "page" the user is asking for?
A bad way I've tried is to load the whole collection with every request (the 100,000 records in the first example) and serve only the part that is needed. But this is very wasteful of resources once the number of clients grows and 10,000 clients do this several times every minute.
You can use pagination for this. For example, the client asks for the first page of x entries, so the server will only fetch that from the DB (see LINQ's skip/take):
DB.Skip(pageNumber * pageSize).Take(pageSize);
The client can alter the filter, pageNumber ("next page") or pageSize without any issues.
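Contrary to the "select 200 to 400" worry in the question, SQL can express exactly that with LIMIT/OFFSET, which is what Skip/Take translates to. A minimal sketch using an in-memory SQLite table (the `items` table and its column are made up for illustration):

```python
import sqlite3

def fetch_page(conn, page_number, page_size):
    """Offset pagination: skip page_number * page_size rows, take page_size.
    A stable ORDER BY is required, or pages may overlap between requests."""
    return conn.execute(
        "SELECT id FROM items ORDER BY id LIMIT ? OFFSET ?",
        (page_size, page_number * page_size),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items (id) VALUES (?)",
                 [(i,) for i in range(1, 1001)])  # 1000 rows

page = fetch_page(conn, page_number=2, page_size=200)  # rows 401..600
```

The client simply passes page_number and page_size as query parameters on each request, so the server stays stateless and never materializes the full result set.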
I have a REST web-service which is expected to expose a paginated GET call.
For example: I have a list of students ("Name", "Age", "Class") in my SQL table, and I have to expose a paginated API to get all students in a given class. So far so good; a typical REST API does the job, and pagination can be achieved by the SQL query.
Now suppose we have the same requirement just that we need to send back students who are from particular state. This information is hosted by a web-service, S2. S2 has an API which given a list of student names and a state "X" returns the students that belong to state X.
Here is where I'm finding it difficult to support pagination.
E.g.: I get a request with page_size 10, a class C, and a state X, which yields 10 students from class C from my DB. I then call S2 with these 10 students and state X; in return, the result may include 0 students, all 10, or any number of students between 0 and 10 from state X.
How do I support pagination in this case?
Brute force would be to keep making DB calls and S2 calls until the page size is met, and only then reply. I don't like this approach.
Is there a common practice followed for this, a general rule of thumb, or is this architecture a bad service design?
(EDIT): Also, please address managing the offset value: if we go with some such approach and get the result set, how can I manage the offset for the next page request?
Thanks for reading :)
Your service should handle the pagination and not hand it off to SQL. Follow these steps:
Get all students from S1 (SQL database) where class = C.
Using the result, get all students from S2 that are in the result and where state = X.
Sort the second result in a stable way.
Get the requested page you want from the sorted result.
All this is done in the code that calls both S1 and S2. Only it has the knowledge to build the pages.
Not doing the pagination with SQL can lead to performance problems in case of large databases.
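The four steps above can be sketched as follows (fetch_class_from_s1 and filter_by_state_s2 are hypothetical stubs standing in for the SQL query and the external web service):

```python
def fetch_class_from_s1(clazz):
    """Stub for step 1, the SQL query: all students in the given class."""
    data = [("Alice", "NY"), ("Bob", "CA"), ("Carol", "NY"),
            ("Dave", "TX"), ("Erin", "NY")]
    return [name for name, _ in data]   # the stub ignores `clazz`

def filter_by_state_s2(names, state):
    """Stub for step 2, the S2 call: keep students from `state`."""
    states = {"Alice": "NY", "Bob": "CA", "Carol": "NY",
              "Dave": "TX", "Erin": "NY"}
    return [n for n in names if states.get(n) == state]

def get_page(clazz, state, page_number, page_size):
    students = fetch_class_from_s1(clazz)           # step 1: S1
    in_state = filter_by_state_s2(students, state)  # step 2: S2
    in_state.sort()                                 # step 3: stable order
    start = page_number * page_size                 # step 4: slice the page
    return in_state[start:start + page_size]

page = get_page("C", "NY", page_number=0, page_size=2)
```

Because the full filtered result is rebuilt per request, the offset is just page_number * page_size into the sorted list; the stable sort in step 3 is what makes that offset mean the same thing on every request.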
A solution in between can be applied. I assume the pagination parameters (offset, page size) are configurable for both services, yours and the external one.
You can implement prefetch logic for both services; let's say the prefetch chunk size is 100.
The frontend can be served with the required page size of 10.
If the prefetched chunks do not yield a frontend page of 10, the backend should prefetch another chunk until the frontend can be served with 10 students.
This approach requires more logic in the backend to calculate the next offsets for prefetching, but if you want both performance and pagination solved, you must invest some effort.
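A sketch of that prefetch loop, including the offset bookkeeping the question's edit asks about: the returned offset counts consumed source rows, so the next request resumes exactly where this one stopped, with no duplicates and no gaps (fetch_chunk and filter_remote are hypothetical stand-ins for the DB query and the S2 call):

```python
def prefetch_page(fetch_chunk, filter_remote, offset=0,
                  page_size=10, chunk_size=100):
    """Pull chunks from the source and filter them remotely until a full
    page is collected or the source runs dry. `offset` counts consumed
    source rows; callers resume by passing back the returned value."""
    page = []
    while len(page) < page_size:
        chunk = fetch_chunk(offset, chunk_size)
        if not chunk:                        # source exhausted
            break
        kept = set(filter_remote(chunk))
        for item in chunk:
            offset += 1                      # this source row is consumed
            if item in kept:
                page.append(item)
                if len(page) == page_size:   # stop mid-chunk once full
                    return page, offset
    return page, offset

# Stub source: 250 students; the "remote" service keeps every 3rd one.
students = [f"s{i}" for i in range(250)]
fetch = lambda off, n: students[off:off + n]
keep = lambda chunk: [s for s in chunk if int(s[1:]) % 3 == 0]

page1, next_off = prefetch_page(fetch, keep)
page2, _ = prefetch_page(fetch, keep, offset=next_off)
```

The key design choice is counting the offset in source rows rather than filtered rows: items already filtered but not yet served are not lost, because the loop stops immediately after the row that completes the page.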
I have an events module. My users favorite some random events from the listing, and I store them in a favorites table in my database:
id  module
1   events
5   events
9   events
2   business
Now, my question is: how can I make a query that fetches 1, 5, and 9 in a single request for events? Is there any way to do that?
Yes, you can filter by id to get multiple events in one request; look at the selected answer to this question to learn how to do that. It should work out of the box.
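In SQL terms this is a single IN (...) query. A minimal sketch with an in-memory SQLite table mirroring the favorites table above (the placeholder string is generated to match the number of ids, so values stay parameterized rather than interpolated):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE favorites (id INTEGER, module TEXT)")
conn.executemany(
    "INSERT INTO favorites VALUES (?, ?)",
    [(1, "events"), (5, "events"), (9, "events"), (2, "business")])

# One request for several ids: build a matching number of placeholders.
ids = [1, 5, 9]
placeholders = ", ".join("?" for _ in ids)
rows = conn.execute(
    f"SELECT id FROM favorites "
    f"WHERE module = 'events' AND id IN ({placeholders}) ORDER BY id",
    ids,
).fetchall()
```

Only the placeholder list is built with string formatting; the id values themselves go through parameter binding, which keeps the query safe from injection.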