Wicket: Paginate a huge result query

I'm getting over 100 MB of results from an API, which I need to paginate because it takes over 10 minutes to sort. All of the results are regular JSON objects.
How could I paginate this amount of data using Wicket?

100 MB is an amount you'd better not keep in memory! Better to store it (temporarily) in some document NoSQL database (like Couchbase, MongoDB, or similar), then use the database's query language to read one page at a time.
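For example, here is a minimal sketch of that idea with the MongoDB Java driver; the database name, the staging collection name (apiResults), and the sort field (name) are assumptions:

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Sorts;
import org.bson.Document;

import java.util.ArrayList;
import java.util.List;

public class StagedResultPager {

    private final MongoCollection<Document> results;

    public StagedResultPager(MongoClient client) {
        // Temporary staging collection holding the JSON objects parsed from the API response.
        this.results = client.getDatabase("staging").getCollection("apiResults");
    }

    // Reads a single page; the database does the sorting and the slicing.
    public List<Document> page(int pageIndex, int pageSize) {
        List<Document> page = new ArrayList<>();
        results.find()
               .sort(Sorts.ascending("name"))    // assumed sort field; give it an index
               .skip(pageIndex * pageSize)
               .limit(pageSize)
               .into(page);
        return page;
    }
}

Note that skip/limit paging gets slower for deep pages; a range query on the sort field ("seek" paging) scales better if that becomes a problem.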

Wicket offers components working with an IDataProvider, an interface that supports data paging.
https://ci.apache.org/projects/wicket/guide/8.x/single.html#_pageable_repeaters
You'll probably have to cache your 100 MB result somewhere, since you don't want to reload the data on every page change. You should not store it inside a component though, or it will be serialized along with the containing page.
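A minimal sketch of such a provider and a DataView using it, assuming the cached results are reachable through a hypothetical ResultStore facade (any external store will do); only the facade, not the data, lives in the component tree:

import org.apache.wicket.markup.html.WebPage;
import org.apache.wicket.markup.html.basic.Label;
import org.apache.wicket.markup.html.navigation.paging.PagingNavigator;
import org.apache.wicket.markup.repeater.Item;
import org.apache.wicket.markup.repeater.data.DataView;
import org.apache.wicket.markup.repeater.data.IDataProvider;
import org.apache.wicket.model.IModel;
import org.apache.wicket.model.Model;

import java.io.Serializable;
import java.util.Iterator;
import java.util.List;

// Hypothetical lightweight facade over the external store; not part of Wicket.
interface ResultStore extends Serializable {
    List<String> fetch(long first, long count);
    long count();
}

class ResultDataProvider implements IDataProvider<String> {

    // In a real app you would resolve this lazily (e.g. via @SpringBean) instead of
    // holding a reference, so nothing heavy ends up serialized with the page.
    private final ResultStore store;

    ResultDataProvider(ResultStore store) {
        this.store = store;
    }

    @Override
    public Iterator<String> iterator(long first, long count) {
        // Only the requested window is loaded; the store does the sorting and slicing.
        return store.fetch(first, count).iterator();
    }

    @Override
    public long size() {
        return store.count();
    }

    @Override
    public IModel<String> model(String object) {
        return Model.of(object);
    }

    @Override
    public void detach() {
        // Nothing cached here.
    }
}

class ResultsPage extends WebPage {

    ResultsPage(ResultStore store) {
        DataView<String> rows = new DataView<String>("rows", new ResultDataProvider(store), 50) {
            @Override
            protected void populateItem(Item<String> item) {
                item.add(new Label("value", item.getModel()));
            }
        };
        add(rows);
        add(new PagingNavigator("navigator", rows));  // renders the paging links
    }
}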

Related

What are the practical size limits of streaming a Firestore collection (in Flutter)

I have a collection that is about 2000 docs in size. I want to stream a List of just their ids.
As follows...
Stream<QuerySnapshot> stream = FirebaseFirestore.instance.collection('someCollection').snapshots();
return stream.map((querySnapshot) => querySnapshot.docs.map((doc) => doc.id).toList());
Will this be a significant performance issue on the client side? Is this an unrealistic approach?
When you query for documents in Firestore using the client SDKs, you are pulling down the entire document every time. There is no way to reduce the amount of data - all of the matching documents are sent across the wire in their entirety.
As such, your use of map() to extract only the document ID has no real effect on performance here, since it runs after the query is complete and you have all of that data in a snapshot on the client. All you are doing is trimming the entire document down to a string, but you are not saving on the cost of transferring that entire document.
If you want to make this faster, you should make the query on a backend (such as Cloud Functions), ideally in the same region as your Firestore instance, and trim the documents in your backend code before you send the data to the frontend. That will save you the cost of unnecessarily transferring the contents of the documents you don't need.
Read: Should I query my database directly or use Cloud Functions?
Performance implications will mostly come from:
The bandwidth consumed for transferring the documents.
The memory used for keeping the DocumentSnapshot objects.
Since you're throwing the document.data() of each snapshot away, that is quite directly where the quickest gains will be. If your documents have few fields, the gains will be small. If they have many fields, the gains will be larger.
If you have a lot to gain by not transferring the fields and keeping them in memory, the main options you have are:
Use a server-side SDK to get only the document IDs, and transfer only that back to the client.
Create secondary/proxy documents that contain only the ID, and no data.
While the first approach is tempting because it reduces data duplication, the second one is typically a lot simpler to implement (as you're only going to be impacting the code that handles data writes).
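A minimal server-side sketch of the first option, assuming the Firestore Java server client library running on a backend with admin credentials (class and collection names are placeholders); listing document references returns only the IDs, so the field data never leaves the server:

import com.google.cloud.firestore.CollectionReference;
import com.google.cloud.firestore.DocumentReference;
import com.google.cloud.firestore.Firestore;
import com.google.cloud.firestore.FirestoreOptions;

import java.util.ArrayList;
import java.util.List;

public class IdLister {

    public static List<String> listIds() {
        Firestore db = FirestoreOptions.getDefaultInstance().getService();
        CollectionReference collection = db.collection("someCollection");

        List<String> ids = new ArrayList<>();
        // listDocuments() yields references only, without downloading the documents' fields.
        for (DocumentReference ref : collection.listDocuments()) {
            ids.add(ref.getId());
        }
        // Return this small list to the client instead of the full documents.
        return ids;
    }
}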

Small collection with document average size of 30kb - mongodb query very slow

I am building a chat service, and I designed the chat schema so that it nests all the users that belong to the chat. I only nested the essential user data, such as name, avatarUrl, and userId.
I believe that, compared to relational databases, this nesting feature is the power of MongoDB: you basically store the "JOIN"ed data, so querying a nested document should generally be faster than querying a chat row and joining it to multiple user rows.
Now, even though I only store essential data, some of the chat documents became quite large (30 kB), because some chat rooms had more than 100 users. There will be a maximum user limit, so the chat document will not grow indefinitely, and nesting about 100 users, leading to a document size of about 30 kB, looks reasonable to me.
But then I realized that loading the chat page for rooms with many users became significantly slow. I measured the time taken for the query to execute from the backend (Node.js in a local environment, so there is some latency; my laptop is in Korea and the database server is in the US).
For small documents the query time was within 230 ms, but for the 30 kB document a simple Chat.findOne({ _id: chatId }) took about 500 ms, and that's just way too long. The collection only has about 100 documents, so an index would not improve performance.
Now two things come to mind.
First, why is the document so big? Maybe it's best practice to remove keys and store everything in array (matrix) format? This would be terrible to work with in the backend... but maybe it is necessary for performance optimization later. Is this common practice?
Second, my real question: why does it take so long to load 20 kB of data? If 20 kB already takes 500 ms, I am guessing that larger documents would be pretty much unusable.
I am a huge fan of MongoDB and its nesting style. But if the document needs to stay small for a reasonable response time, then this is really upsetting, because it means I can only use nesting sparingly; I would then have to use $lookup or Mongoose populate to make joins, which are terrible in terms of performance.
I am currently using MongoDB 4.4 with MongoDB Atlas M10. The current app has a very small user pool, which is why I thought M10 is sufficient.
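Since this was measured across the Pacific, one way to see how much of the 500 ms is actually server-side work is to ask the server for executionStats. A small sketch with the MongoDB Java driver (the collection name "chats" is an assumption; the same explain command works from the mongo shell or Mongoose):

import com.mongodb.client.MongoDatabase;
import org.bson.Document;
import org.bson.types.ObjectId;

public class ExplainChatQuery {

    // Returns the time the server itself spent on the findOne; the rest of the observed
    // latency is the round trip and transfer between Korea and the US cluster.
    static int serverTimeMillis(MongoDatabase db, ObjectId chatId) {
        Document find = new Document("find", "chats")
                .append("filter", new Document("_id", chatId))
                .append("limit", 1);

        Document result = db.runCommand(
                new Document("explain", find).append("verbosity", "executionStats"));

        return result.get("executionStats", Document.class)
                     .getInteger("executionTimeMillis");
    }
}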

Can I store data that won't affect query performance in MongoDB?

We have an application which requires saving data that should be in documents, for querying and sorting purposes. The data should be schema-less, as some of the fields will only be known through usage. For this, MongoDB is a great solution and it works great for us.
Part of the data in each document is for display purposes, meaning the data can be objects (say, JSON) that the client side uses in order to plot diagrams.
I tried to save this data using GridFS, but the use case makes it not responsive enough. Also, the documents won't exceed the 16 MB limit even with the diagram data inside them. And in fact, while trying to save this data directly within the documents, we got better results.
This data is used only for client-side responses, meaning we should never query it. So my question is, can I insert this data into MongoDB and mark it as 'not for query' data? Meaning, can I insert this data without affecting Mongo's performance? The data is strict, and once a document is inserted there will only be updates to existing fields, not new fields being added.
I've noticed there is a Binary Data type in Mongo, and I am wondering if I should use this type for objects that are not binary. Can this give me what I'm looking for?
Also, I would love to know what is the advantage in using this type inside my documents. Can it save me disk space?
As of MongoDB 3.4, read and write operations are atomic at the level of a single document from the storage/memory point of view. If the MongoDB server needs to fetch a document from memory or disk (even when projecting a subset of fields to return), the full document generally has to be loaded into memory on a mongod. The only exception is if you can take advantage of covered queries, where all of the fields returned are also included in the index used.
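A small sketch of a covered query with the MongoDB Java driver, assuming a hypothetical users collection: because the filter and the projection only touch fields of the compound index, and _id is excluded, the server can answer from the index alone without loading documents.

import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Indexes;
import com.mongodb.client.model.Projections;
import org.bson.Document;

public class CoveredQueryExample {

    static void run(MongoCollection<Document> users) {
        // Compound index containing every field the query filters on or returns.
        users.createIndex(Indexes.ascending("country", "city"));

        // Covered query: indexed filter, indexed projection, _id excluded.
        for (Document d : users.find(Filters.eq("country", "KR"))
                               .projection(Projections.fields(
                                       Projections.include("country", "city"),
                                       Projections.excludeId()))) {
            System.out.println(d);
        }
    }
}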
This data is used only for client side responses, meaning we should never query it.
Data fields which aren't queried directly do not need to be in any indexes. However, there is currently no concept like "not for query" fields in MongoDB. You can query or project any field (with or without an index).
Meaning, can I insert this data without affecting Mongo's performance?
Data with very different access or growth patterns (such as your infrequently requested client data) is a recommended candidate for storing separately from a parent document with frequently accessed data. This will improve the efficiency of memory usage for mongod by avoiding unnecessary retrieval of data when working with documents in the parent collection.
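A minimal sketch of that separation with the MongoDB Java driver, using assumed collection names: the queryable fields live in "records", the display-only payload lives in "displayData" under the same _id, and it is fetched only when a client actually needs it.

import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.Filters;
import org.bson.Document;
import org.bson.types.ObjectId;

public class DisplayDataStore {

    private final MongoCollection<Document> records;
    private final MongoCollection<Document> displayData;

    public DisplayDataStore(MongoDatabase db) {
        this.records = db.getCollection("records");
        this.displayData = db.getCollection("displayData");
    }

    public void insert(ObjectId id, Document queryableFields, Document diagramPayload) {
        records.insertOne(queryableFields.append("_id", id));
        displayData.insertOne(new Document("_id", id).append("payload", diagramPayload));
    }

    // Queries and sorting only ever touch the small documents in "records".
    public Document findRecord(ObjectId id) {
        return records.find(Filters.eq("_id", id)).first();
    }

    // The heavy payload is loaded only when building a display response.
    public Document findDisplayData(ObjectId id) {
        return displayData.find(Filters.eq("_id", id)).first();
    }
}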
I've noticed there is a Binary Data type in Mongo, and I am wondering if I should use this type for objects that are not binary. Can this give me what I'm looking for? Also, I would love to know what is the advantage in using this type inside my documents. Can it save me disk space?
You should use a type that is most appropriate for the data that you are storing. Storing text data as binary will not gain you any obvious efficiencies in server storage. However, storing a complex object as a single value (for example, a JSON document serialized as a string) could save some serialization overhead if that object will only be interpreted via your client-side code. Binary data stored in MongoDB will be an opaque blob as far as indexing or querying, which sounds fine for your purposes.

Using nested document structure in mongodb

I am planning to use a nested document structure for my MongoDB schema design, as I don't want to go for a flat schema design; in my case I will need to fetch my result in one query only.
Since MongoDB has a size limit for a document (see MongoDB Limits and Thresholds):
A MongoDB document has a size limit of 16 MB. If your subcollection can grow without limits, go flat.
I don't need to fetch my nested data; I will only need it for filtering and querying purposes.
I want to know whether I will still be bound by MongoDB's size limit even if I use my embedded data only for querying and filtering and never for fetching the nested data, because, as per my understanding, in this case MongoDB won't load the complete document into memory but only the selected fields?
Nested schema design example
{
  clinicName: "XYZ Hospital",
  clinicAddress: "ABC place.",
  doctorsWorking: {
    doctorId1: {
      doctorJoined: ISODate("2017-03-15T10:47:47.647Z")
    },
    doctorId2: {
      doctorJoined: ISODate("2017-04-15T10:47:47.647Z")
    },
    doctorId3: {
      doctorJoined: ISODate("2017-05-15T10:47:47.647Z")
    },
    ...
    ...
    // up to 30000-40000 more records, suppose
  }
}
I don't think your understanding is correct when you say "as per my understanding, in this case MongoDB won't load the complete document into memory but only the selected fields".
If we look at the MongoDB docs, they read:
The maximum BSON document size is 16 megabytes. The maximum document size helps ensure that a single document cannot use excessive amount of RAM or, during transmission, excessive amount of bandwidth. To store documents larger than the maximum size, MongoDB provides the GridFS API.
So the clear limit is 16 MB on document size. MongoDB will stop you from saving a document that is greater than this size.
Suppose for a moment that your understanding were correct and MongoDB allowed you to save a document of any size, only refusing to hold more than 16 MB of it in RAM. While storing the data, the server cannot know what queries will later be run against it, so you would end up inserting big documents that cannot be used later (when inserting, we don't declare the query pattern, and we might even try to fetch the full document in a single shot later).
If the limit were on transmission (hypothetically), there are lots of ways developers could bring data into RAM in chunks without ever crossing the 16 MB limit (that's how I/O is done on large files); they would simply work around the limit and render it useless. I assume MongoDB's creators knew this and didn't want it to happen.
Also, if the limit were on transmission, there would be no need for separate collections: we could put everything in a single collection, write smart queries, and if the fetched data crossed 16 MB just fetch it in parts and forget the limit. But it doesn't work this way.
So the limit must be on document size, otherwise it would create many issues.
In my opinion, if you just need the "doctorsWorking" data for filtering or querying purposes (and if you also think that "doctorsWorking" will cause the document to cross the 16 MB limit), then it's better to keep it in a separate collection.
Ultimately everything depends on your query and data patterns. If a doctor can serve in multiple hospitals in shifts, then it will be much better to keep doctors in a separate collection.
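A minimal sketch of that flat design with the MongoDB Java driver, with assumed collection and field names: each doctor document carries a clinicId, and an index on clinicId keeps the "doctors of this clinic" filter fast while the clinic document itself stays far below the 16 MB limit.

import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Indexes;
import org.bson.Document;
import org.bson.types.ObjectId;

import java.util.Date;

public class ClinicDoctors {

    static void example(MongoDatabase db) {
        MongoCollection<Document> doctors = db.getCollection("doctors");
        doctors.createIndex(Indexes.ascending("clinicId"));

        ObjectId clinicId = new ObjectId();  // _id of a document in a "clinics" collection
        doctors.insertOne(new Document("clinicId", clinicId)
                .append("doctorId", "doctorId1")
                .append("doctorJoined", new Date()));

        // Filtering and querying run against the indexed doctors collection.
        long doctorsAtClinic = doctors.countDocuments(Filters.eq("clinicId", clinicId));
        System.out.println(doctorsAtClinic);
    }
}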

Is it possible to run queries on 200GB data on mongodb with 16GB RAM?

I am trying to run a simple query to find the number of all records with a particular value, using:
db.ColName.find({id_c:1201}).count()
I have 200 GB of data. When I run this query, MongoDB takes up all the RAM and my system starts lagging. After an hour of futile waiting, I gave up without getting any results.
What can be the issue here and how can I solve it?
I believe the right approach in the NoSQL world isn't trying to perform a full query like that, but to accumulate stats over time.
For example, you could have a stats collection with arbitrary objects, each owning a kind or id property that takes a value like "totalUserCount". Whenever you add a user, you also update this count.
This way you'll get instant results: it's just reading a property value from a small stats collection.
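A small sketch of that counter pattern with the MongoDB Java driver, assuming a hypothetical stats collection and counter key:

import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.UpdateOptions;
import com.mongodb.client.model.Updates;
import org.bson.Document;

public class StatsCounter {

    // Call this from the same code path that inserts a matching record.
    static void bumpCount(MongoCollection<Document> stats) {
        stats.updateOne(Filters.eq("_id", "id_c_1201_count"),
                        Updates.inc("value", 1),
                        new UpdateOptions().upsert(true));
    }

    // Instant read: one tiny document instead of scanning 200 GB.
    static int readCount(MongoCollection<Document> stats) {
        Document doc = stats.find(Filters.eq("_id", "id_c_1201_count")).first();
        return doc == null ? 0 : doc.getInteger("value", 0);
    }
}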
BTW, this slowness is most likely caused by querying objects on a non-indexed property in your collection. Try indexing id_c and you'll probably get quicker results.
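And a sketch of the index suggestion, with the collection name taken from the question and the database name/connection string assumed: once id_c is indexed, the count is answered from the index instead of a full collection scan.

import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Indexes;
import org.bson.Document;

public class CountWithIndex {

    public static void main(String[] args) {
        MongoCollection<Document> col = MongoClients.create("mongodb://localhost:27017")
                .getDatabase("test").getCollection("ColName");

        col.createIndex(Indexes.ascending("id_c"));             // build once; may take a while on 200 GB
        long n = col.countDocuments(Filters.eq("id_c", 1201));  // now served by the index
        System.out.println(n);
    }
}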
That amount of data can easily be managed by MySQL, MSSQL, or Oracle with the given hardware specification. You don't need a NoSQL database for that; NoSQL databases are made for much larger storage needs, which actually require lots of hardware (RAM, disks) to be efficient.
You need to define an index to read that id, and use a normal SQL database.