Select records for each page in DataObjects.Net

I have lots of records in the database, and I have a control that pages those records. How do I select the records for each page? For example, I need to select records from the 51st to the 100th. I can't use LINQ expressions. I am using DataObjects.Net 3.9.
So I start with:
Query q = new Query("select SomeClass objects");

Use this query:
Query q = new Query("select top 100 SomeClass objects");
As far as I remember, there is no way to specify a .Skip-like condition in DO 3.9, so you have to do this manually (e.g. by applying .Skip to the enumerable you get back).
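The manual skip described above can be sketched in Python (the thread itself is about C#, where the answer suggests calling .Skip on the enumerable). This is only a minimal sketch, assuming the query has already returned the first 100 rows as an iterable; `page_of` and the stand-in `rows` are hypothetical names and not part of DataObjects.Net.

```python
from itertools import islice


def page_of(rows, page_number, page_size):
    """Return one 1-based page from an iterable, skipping earlier pages client-side."""
    start = (page_number - 1) * page_size
    return list(islice(rows, start, start + page_size))


# Stand-in for the rows returned by "select top 100 SomeClass objects".
rows = range(1, 101)

# Page 2 with 50 rows per page covers the 51st to the 100th record.
second_page = page_of(rows, page_number=2, page_size=50)
```

The "top N" in the query has to be at least page_number * page_size for the requested page to be complete.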
There is an obvious performance impact in this case, but it isn't significant in terms of computational complexity. The only effect is that SQL Server sends more rows to the client; all the other work it must do remains the same.
An example illustrating this:
If you ask Google to show you the 1000th page of results, it will still
find all the documents related to your query, compute a match rank for
each of them, sort them to get at least the first 1000 pages with the best
match ranks, and only after all this work will it be able to give you
the 1000th page.
So if there are 1,000,000,000,000 documents, the computational
cost of sending 10K rows to the client is tiny in comparison
with all the other work done.
Also note that the whole idea of paging is to show a tiny fraction of the whole data set. So if your user needs to page to e.g. the 1000th page, there is something wrong with the design. There are just two cases:
User must get a tiny fraction of data (i.e. perform some search)
User must get all the data (e.g. to make a backup)
There are no intermediate cases.


Suggested way to structure Firestore database for deeply nested set of spreadsheet like objects

My application is used for creating production budgets for complex projects (construction, media productions etc.)
The structure of the budget is as follows:
The budget contains "sections",
the sections contain "accounts",
the accounts contain "subaccounts",
the subaccounts contain line items.
Line items have a number of fields (units, rate, currency, tax etc.) and a calculated total.
Or perhaps using Firestore to do these cascading calculations is the wrong approach? Should I just load a single complex budget document into my app, do all the calculations and updates on the client, and then write back the entire budget as a single document when the user presses "save budget"?
Certain fields in line items may hold alphanumeric codes that represent numeric values, which a user can enter instead of a hard-coded number. For example, a user can enter "=build-weeks" and define it with a formula that evaluates to, say, 7, which is then used in the calculation of a total.
Line items bubble up their totals, so a subaccount's total equals the sum of its line items,
an account's total equals the sum of its subaccounts' totals,
a section's total equals the sum of its accounts' totals,
and the budget total is the sum of the section totals.
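The bubbling-up of totals described above can be sketched independently of any storage choice. The following is a minimal in-memory sketch, assuming a line item's total is simply units times rate (currency and tax are left out), and that a field starting with "=" names an entry in a `named_values` table. The dictionary layout and the names `resolve` and `roll_up` are hypothetical.

```python
def resolve(value, named_values):
    """Turn a field into a number; strings like "=build-weeks" are looked up by name."""
    if isinstance(value, str) and value.startswith("="):
        return named_values[value[1:]]
    return value


def roll_up(node, named_values):
    """Compute and store totals bottom-up; returns the total of `node`."""
    if "children" in node:
        node["total"] = sum(roll_up(child, named_values) for child in node["children"])
    else:
        # Leaf node: a line item (simplified to units * rate for this sketch).
        node["total"] = resolve(node["units"], named_values) * resolve(node["rate"], named_values)
    return node["total"]


named_values = {"build-weeks": 7}
budget = {"children": [              # sections
    {"children": [                   # accounts
        {"children": [               # subaccounts
            {"children": [           # line items
                {"units": "=build-weeks", "rate": 1000},
                {"units": 2, "rate": 250},
            ]},
        ]},
    ]},
]}
roll_up(budget, named_values)
```

Changing `named_values["build-weeks"]` and calling `roll_up` again recomputes every level, which is the client-side variant of the cascade.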
The question is how to aggregate this data into the documents comprising the budget.
Budgets may be fairly long, say 5,000 line items or more in total. Single accounts may have hundreds of line items.
Users will most likely look at all of the line items for a given account, so it occurred to me
to make individual documents for sections, accounts and subaccounts, and make line items a map within a subaccount.
The main concern I have with this approach is that when the user changes, say, the exchange rate of a line item's currency, or changes the calculated value of a named value like "build-weeks", I will have to retrieve all the individual line items containing that currency or named value, recalculate their totals, and then bubble up the changes through the hierarchy.
This does not seem that complicated if each line item is its own document: I can just search the collection for the presence of the code in question, recalculate the line item, and perhaps use a cloud function to bubble up the changes.
But if all the line items are contained in an array of maps within each subaccount document,
it seems like it will be quite tedious to find and change them when necessary.
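To make the "tedious" part concrete: with embedded line items, finding the affected ones means scanning the line items of every subaccount document. The sketch below is plain in-memory Python with no Firestore calls. It assumes each subaccount is a dictionary with a `line_items` list, and `subaccounts_using` is a hypothetical helper name.

```python
def subaccounts_using(subaccounts, code):
    """Return ids of subaccounts with at least one line item that references `code`."""
    marker = "=" + code
    return [
        sub_id
        for sub_id, sub in subaccounts.items()
        if any(marker in item.values() for item in sub["line_items"])
    ]


subaccounts = {
    "sub-1": {"line_items": [{"units": "=build-weeks", "rate": 1000}]},
    "sub-2": {"line_items": [{"units": 2, "rate": 250}]},
    "sub-3": {"line_items": [{"units": 1, "rate": 90}, {"units": "=build-weeks", "rate": 40}]},
}
affected = subaccounts_using(subaccounts, "build-weeks")
```

Each subaccount in `affected` would then need its totals recalculated and the whole document written back.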
On the other hand, keeping these documents so small seems to imply a lot of document reads when somebody is reviewing a budget or, say, printing it. If somebody just clicks on a bunch of accounts, it might take hundreds of reads per click to retrieve all the line items, and hundreds or a thousand writes when somebody changes the value of an often-used named value like "build-weeks".
Does anybody have any thoughts on the obvious "right" answer to this? Or does it just depend on what I want to optimize for: Firestore costs, responsiveness of the app, or complexity of the code?
From my standpoint, there is no obvious answer to your problem, and indeed it does depend on what you want to optimize for.
However, there are a few points that you need to consider in your decision:
1. Documents in Firestore have a size limit of 1 MiB per document;
2. Documents in Firestore have a limit of 20,000 fields;
3. Queries are shallow, so you don't get data from subcollections in the same query.
Considerations 1 and 2 matter if you choose to design your database as one big document containing everything. Even though you said that your app will have lots of data, I doubt that it will exceed those limits; still, do consider them. Also ask how necessary it is to get all the data at once, since this could cause performance and battery/data usage issues for users (if you are making a mobile app).
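A quick back-of-the-envelope check can show how close a single-document budget would come to the size limit. The sketch below is a rough estimate only: it uses the length of the JSON encoding as a stand-in for the stored size, whereas Firestore computes document size by its own rules, so treat the result as a ballpark figure. The line item shape is hypothetical.

```python
import json

FIRESTORE_DOC_LIMIT_BYTES = 1_048_576  # 1 MiB


def rough_size_bytes(document):
    """Approximate a document's size as the length of its UTF-8 JSON encoding."""
    return len(json.dumps(document).encode("utf-8"))


# Hypothetical line item; real items may carry more or longer fields.
line_item = {
    "description": "Camera rental",
    "units": "=build-weeks",
    "rate": 1250.0,
    "currency": "USD",
    "tax": 0.2,
    "total": 8750.0,
}
budget = {"line_items": [line_item] * 5000}

estimated = rough_size_bytes(budget)
fits = estimated < FIRESTORE_DOC_LIMIT_BYTES
print(f"~{estimated} bytes, fits in one document: {fits}")
```

If the estimate lands within the same order of magnitude as the limit, longer descriptions or extra fields could push a large budget over it.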
Consideration 3 means that you would have to make many reads if you choose to split the data for your sections into subdocuments; this will mean more cost to you but better performance for users.
To make the right call on this problem, I suggest that you talk to prospective users of your solution to understand the problem you are trying to fix and what they expect of the app. It might also be worth taking a look at the "How to Structure Your Data" and "Maps, Arrays and Subcollections" videos, as they explain in a more visual way how Firestore behaves, which could help you anticipate problems that your chosen approach might cause.
Hope I was able to help with these considerations.

MongoDB: what is faster: single find() query or many find_one()?

I have the following problem connected to the MongoDB database design. Here is my situation:
I have a collection with about 50k documents (15kB each),
every document has a dictionary storing data samples,
my query always gets all the data from the document,
every query uses an index,
the collection has only one index (based on a single datetime field),
in most cases, I need to get data from many documents (typically 25 < N < 100),
it is easier for me to perform many SELECT queries over a single one,
I have a lot of updates in my database, though far fewer than SELECT queries,
I use the WiredTiger engine (the newest version of MongoDB),
server instance and web application are on the same machine.
I have two possibilities for making a SELECT query:
perform a single query retrieving all documents I am interested in,
perform N queries, each of which gets a single document, where typically 25 < N < 100 (what about a different scenario where 100 < N < 1k or 1k < N < 10k?)
So the question is whether there is any additional overhead when I perform many small queries instead of a single one. In relational databases, making many queries is a very bad practice, but what about NoSQL? I am asking about general practice: should I avoid that many queries?
In the documentation, I read that what matters is not the number of queries but the number of documents searched. Is that true?
Thanks for help ;)
There is a question similar to the one you asked: Is it ok to query mongodb multiple times
IMO, for your use case, i.e. 25 < N < 100, one should definitely go with batching.
In the case of single queries:
Looping in a single thread will not suffice; you'll have to make parallel requests, which creates additional overhead.
Every request incurs TCP/IP overhead.
There is a certain amount of setup and teardown for each query (creating and exhausting cursors), which creates unnecessary overhead.
As explained in the answer above, there appears to be a sweet spot between how many values to batch up and the number of round trips, and it depends on your document type as well.
In broader terms, anything with 10 < N < 1000 should go into one batch and any remaining records should form part of other batches, but querying a single document at a time would definitely create unnecessary overhead.
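The batching advice above can be sketched as follows. This is a minimal sketch, not the answerer's code: it assumes a collection object exposing a PyMongo-style `find(filter)` method and an indexed field called `timestamp` (an assumed name). A tiny in-memory stand-in replaces the real collection so the example runs without a server, and integers stand in for datetimes.

```python
def chunked(values, size):
    """Yield consecutive slices of `values` with at most `size` items each."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def fetch_in_batches(collection, timestamps, batch_size=100):
    """Fetch documents with one find() per batch instead of one find_one() per document."""
    documents = []
    for batch in chunked(timestamps, batch_size):
        # One round trip per batch; "timestamp" is an assumed name for the indexed field.
        documents.extend(collection.find({"timestamp": {"$in": batch}}))
    return documents


class InMemoryCollection:
    """Stand-in for a driver collection; it only understands the $in filter used above."""

    def __init__(self, documents):
        self.documents = documents
        self.find_calls = 0

    def find(self, query):
        self.find_calls += 1
        wanted = set(query["timestamp"]["$in"])
        return [doc for doc in self.documents if doc["timestamp"] in wanted]


collection = InMemoryCollection([{"timestamp": t, "samples": {}} for t in range(1000)])
documents = fetch_in_batches(collection, list(range(250)), batch_size=100)
```

Here 250 documents are fetched in three round trips rather than 250; the best `batch_size` depends on document size and would need measuring.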
The problem with performing many small queries instead of one query is network overhead, i.e. the latency of each network round trip.
For a single request in batch processing it may not be much, but if you make multiple requests like these, or use this technique on the frontend, it will decrease performance.
Also, you may need to preprocess the data manually, e.g. sorting or aggregating it.

Paginating results in MongoDB without relying on .skip()

I'm building an app that calls data from MongoDB. For purposes of this question, pretend that the user searches my app for a certain query, and MongoDB has 4,000 results to spit out that match the query.
After reading around a bit, I see that it's possible to paginate using the .skip() method, but MongoDB themselves advise against using this, as it requires the cursor to iterate through all the records up to the one you're skipping to, which gets more and more expensive the higher in the list you go.
I've seen a few tutorials that rely on the _id property of the results to be sequential, but this doesn't apply here - my database has tens of thousands of records, and each has a unique id, and the 4000 results that apply to the user's query are definitely not going to be sequential.
Can anyone think of a way to do this, or is skip() the only option here?
Other considerations:
The pagination will work based on the position on the page. For instance, the first query should spit out 20 records to my app. When the user scrolls to the bottom of the page, I could potentially get the _id of the 20th element on the page and pass that to my query, find it in the list of 4,000 results, find the subsequent result and start the next set of 20 from there. Is that sort of thing possible, and would it be less CPU intensive than skip()?
Your trick in "other considerations" works only if you add a sort on _id; otherwise you can't guarantee the order for follow-up queries. If you want to sort on a different field, you need to index that field. I would also suggest you query for 21 elements so that you don't have to go back and find the next one after the 20th element (of course, you can still show only the first 20 elements).
MongoDB ranged pagination has a good example as well.
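The range-based approach can be sketched as two small helpers: one builds the filter for the next page, the other splits the 21 fetched results into the 20 to show plus a "has more" flag. This is a hedged sketch with hypothetical helper names. With a driver such as PyMongo, the fetch itself would look roughly like `collection.find(query).sort("_id", 1).limit(21)`; here a sorted in-memory list stands in for the 4,000 matches so the example runs on its own.

```python
def next_page_filter(base_filter, last_id=None):
    """Build the next page's filter: the user's query plus `_id` greater than the last one seen."""
    query = dict(base_filter)
    if last_id is not None:
        query["_id"] = {"$gt": last_id}
    return query


def split_page(results, page_size=20):
    """Split up to page_size + 1 sorted results into (page, has_more, last_id)."""
    page = results[:page_size]
    has_more = len(results) > page_size
    last_id = page[-1]["_id"] if page else None
    return page, has_more, last_id


# 4,000 matching documents with non-sequential ids, already sorted by _id.
matches = [{"_id": i} for i in range(3, 12003, 3)]


def fetch(query, limit):
    """In-memory stand-in for find(query).sort("_id", 1).limit(limit)."""
    after = query.get("_id", {}).get("$gt")
    rows = [doc for doc in matches if after is None or doc["_id"] > after]
    return rows[:limit]


first, has_more, last_id = split_page(fetch(next_page_filter({}), 21))
second, _, _ = split_page(fetch(next_page_filter({}, last_id), 21))
```

Because each page starts from an indexed `_id` comparison rather than an offset, the cost does not grow with the page number, and the ids do not need to be sequential, only sortable.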

Create aggregated user stats with MongoDB

I am building a MongoDB database that will work with an Android app. I have a user collection and a records collection. The record documents describe GPS tracks, with fields such as start and end coordinates, total time, top speed and distance. The user document has a user id, first name, last name and so forth.
I want to have aggregate stats for each user that summarizes total distance, total time, total average speed and top speed to date.
I am unsure whether I should do a map reduce and create an aggregate collection for users, or add these stats to the user document with some kind of cron-job-type solution. I have read many guides about map reduce and aggregation for MongoDB but can't figure this out.
Thanks!
It sounds like your aggregate indicator values are per-user, in which case I would simply calculate them and push them directly into the user object at the same time as you update the current coordinates, speed etc. They would be nice and easy (and fast) to query, and you could aggregate them further if you wished.
When I say pre-calculate, I don't mean MapReduce, which you would use as a batch process, I simply mean calculate on update of the user object.
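Calculating on update can be sketched as a small pure function that folds one new record into the stored per-user aggregates. This is a minimal sketch with hypothetical field names; it assumes distance and time use consistent units, so that average speed is total distance divided by total time.

```python
def apply_record(stats, record):
    """Return the updated per-user aggregates after one new GPS record."""
    total_distance = stats["total_distance"] + record["distance"]
    total_time = stats["total_time"] + record["time"]
    return {
        "total_distance": total_distance,
        "total_time": total_time,
        "top_speed": max(stats["top_speed"], record["top_speed"]),
        # Overall average speed, not the mean of per-track averages.
        "average_speed": total_distance / total_time if total_time else 0.0,
    }


stats = {"total_distance": 0.0, "total_time": 0.0, "top_speed": 0.0, "average_speed": 0.0}
for record in [
    {"distance": 10.0, "time": 0.5, "top_speed": 30.0},
    {"distance": 20.0, "time": 1.0, "top_speed": 25.0},
]:
    stats = apply_record(stats, record)
```

The resulting dictionary would be stored on the user document whenever a record is saved, so reading a user's stats needs no aggregation at query time.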
If your aggregate stats are compiled across users, then you could still pre-calculate them on update, but if you also need to be able to query those aggregate stats against some other condition or filter, such as, "tell me what the total distance travelled for all users within x region", then depending on the number of combinations you may not be able to cover all those with pre-calculation.
So, if your aggregate stats ARE across users, AND need some sort of filter applied, then they'll need to be calculated from some snapshot of the data. The two approaches here are:
the aggregation framework in 2.2
MapReduce
You would need to use MapReduce say, if you've a LOT of historical data that you want to crunch and you can pre-calculate the results for fast reading later. By my definition, that data isn't changing frequently, but even if it did, you can also use incremental MR to add new results to an existing calculation.
The aggregation framework in 2.2 will allow you to do a lot of this on demand. It won't be as quick as pre-calculated values, of course, but it is way quicker than MR when executed on demand. It can't cope with the high-volume result sets that MR can handle, but it's better suited to queries where you don't know the parameter values in advance.
By way of example, if you wanted to calculate the aggregate sums of users stats within a particular lat/long, you couldn't use MR because there are just too many combinations of that filter, so you'd need to do that on the fly.
If however, you wanted it by city, well you could conceivably use MR there because you could stick to a finite set of cities and just pre-calculate them all.
But to wrap up, if your aggregate indicator values are per-user alone, then I'd start by calculating and storing the values inside the user object when I update the user object as I said in the first paragraph. Yes, you're storing the value as well as the inputs, but that's the model that saves you having to calculate on the fly.

Query for set complement in CouchDB

I'm not sure that there is a good way to do this with the facilities CouchDB provides, but I'd like to somehow extract the relative complement of the sets of two different document types over a particular key.
For example, let's say that I have documents representing users and posts, both of which have a (unique) username field. There's a validation in place ensuring that a user document exists for the username in every post, but there may be any number of post documents with a given username, including none. It's trivial to create a view which counts the number of posts per username. The view can even include zero counts by emitting zero post-counts for the user documents in the view map function. What I want to do, though, is retrieve just the list of users who have zero associated posts.
It's possible to build the view I described above and filter client-side for zero-value results, but in my actual situation the number of results could be very, very large, and the interesting results a relatively small proportion of the total. Is there a way to do this server-side and retrieve just the interesting results?
I would write a map function to iterate through the documents and emit the users (or just usernames) with 0 posts.
Then I would write a list function to iterate through the map function results and format them however you want (JSON, csv, etc).
(I would NOT use a reduce function to format the results, even if a reduce function appears to work OK in development. That is just my own experience from lessons learned the hard way.)
Personally I would filter on the client-side until I had performance issues. Next I would probably use Teddy's _filter technique—all pretty standard CouchDB stuff.
However, I stumbled across (IMO) an elegant way to find set complements. I described it when exploring how to find documents missing a field.
The basic idea
Finding non-members of your view obviously can't be done with a simple query (and a straightforward index scan). However, it can be done in constant memory and linear time by iterating through two query results simultaneously.
One query is for all possible document ids. The other query is for matching documents (those you don't want). Importantly, CouchDB sorts query results, therefore you can calculate the complement efficiently.
See my details in the previous question. The basic idea is that you iterate through both (sorted) lists simultaneously, and when you can say "hey, this document id is listed in the full set but it's missing in the sub-set", that is a hit.
(You don't have to query _all_docs, you just need two queries to CouchDB: one returning all possible values, and the other returning values not to be counted.)
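The simultaneous iteration can be sketched client-side as a generator over two sorted sequences. This is a minimal sketch with hypothetical names; it assumes both query results arrive sorted under the same ordering, which in practice means comparing keys consistently with CouchDB's collation rather than relying blindly on Python's default string comparison.

```python
def sorted_complement(all_keys, excluded_keys):
    """Yield keys from all_keys that are absent from excluded_keys (both sorted alike)."""
    excluded = iter(excluded_keys)
    current = next(excluded, None)
    for key in all_keys:
        # Advance the excluded side until it catches up with the current key.
        while current is not None and current < key:
            current = next(excluded, None)
        if current is None or current != key:
            yield key


users = ["alice", "bob", "carol", "dave"]    # one row per user
posters = ["alice", "alice", "carol"]        # one row per post, keyed by username
users_without_posts = list(sorted_complement(users, posters))
```

Only one row from each result is held at a time, so memory stays constant and the work is linear in the combined length of the two results.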