Which is more efficient: sorting with a JPA order by clause or sorting with the Collections sort() method? - spring-data-jpa

I have a table called 'userProfileEmployment'. It has a start_date column, which stores values in yyyy-mm-dd format. From this table I have to fetch the list of employments in ascending order of start date.
I have two approaches for this:
1. Fetch the rows already sorted from the DB through a JPA query.
2. Fetch the rows as they are from the DB, store them in a List, and then sort it using the Collections sort method.
Here is the code for approach 1:
List<UserProfileEmployment> employmentList = userProfileEmploymentRepository.findAllByProfileIdSorted(userProfileId);
Snippet from UserProfileEmploymentRepository.java class:
@Query("select upe from UserProfileEmployment upe where upe.profileId = :profileId and (upe.deleted = 0 or upe.deleted is null) order by upe.startDate")
List<UserProfileEmployment> findAllByProfileIdSorted(@Param("profileId") Long profileId);
Both approaches produce the same sorted output, so my question is which of the two is better. Is sorting with the order by clause more costly, or is sorting with the Collections sort method better?

Sorting in the DB is preferable to sorting in memory (i.e. with Collections) in most cases.
Firstly, databases are built to sort large data sets efficiently, and you can optimize the sort further with an index on the sort column.
Secondly, if you later want to return paginated data (i.e. a slice of the whole data set), you can load only those slices from the DB by using Pageable and save heap memory.

Related

Does Firestore require a separate index for EVERY combination of queried fields? (9 FAILED_PRECONDITION: The query requires an index.)

I am trying to implement logic that queries a people collection to suggest names, based on a combination of up to 3 possible arguments (name, surname, nick), e.g.:
people.where('name', '==', nameArg).where('surname', '>=', surnameArg).where('surname', '<=', surnameArg + '\uf8ff')
or
people.where('nick', '==', nickArg).where('name', '>=', nameArg).where('name', '<=', nameArg+ '\uf8ff')
The idea is that there might be any 1 or 2 fields using == operator and one with >= / <=. I got the error 9 FAILED_PRECONDITION: The query requires an index., so I created an index using all 3 of the fields. But when my query uses only 2 of the fields I still get the same error.
Am I expected to create 4 indices for every combination of query parameters? This feels so wrong.
Does the order of fields in the index matter too?
While you'll often end up with many indexes on your Firestore project, you may not need a composite index for each combination of fields. The key here is that Firestore can use the single-field indexes for equality checks.
From the documentation on taking advantage of index merging:
Although Cloud Firestore uses an index for every query, it does not necessarily require one index per query. For queries with multiple equality (==) clauses and, optionally, an orderBy clause, Cloud Firestore can re-use existing indexes. Cloud Firestore can merge the indexes for simple equality filters to build the composite indexes needed for larger equality queries.
I must admit I always struggle a bit with how to apply this on my own projects, but for example the FriendlyEats sample project uses surprisingly few indexes for the amount of sorting and filtering it allows.
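As an aside on the range half of the queries in the question: the >= / <= pair with '\uf8ff' is a prefix match. Below is a minimal in-memory sketch of that comparison in plain JavaScript. It does not call Firestore, and prefixBounds, matchesPrefix, and the sample surnames are made up for illustration.

```javascript
// Hypothetical helper: lower and upper bounds for a "starts with" range query.
// '\uf8ff' is a very high code point, so prefix + '\uf8ff' sorts after
// ordinary strings that begin with the prefix.
function prefixBounds(prefix) {
  return { lower: prefix, upper: prefix + '\uf8ff' };
}

// In-memory stand-in for .where(field, '>=', lower).where(field, '<=', upper).
function matchesPrefix(value, prefix) {
  const { lower, upper } = prefixBounds(prefix);
  return value >= lower && value <= upper;
}

const surnames = ['Jones', 'Smith', 'Smithson', 'Smyth']; // made-up sample data
const suggestions = surnames.filter((s) => matchesPrefix(s, 'Smi'));
console.log(suggestions);
```

Only the field carrying this range needs to be ordered; the other fields in the query are plain equality checks, which is the part the answer says Firestore can serve from existing indexes.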

What does the distinct on clause mean in Cloud Datastore and how does it affect the reads?

This is what the cloud datastore doc says but I'm having a hard time understanding what exactly this means:
A projection query that does not use the distinct on clause is a small operation and counts as only a single entity read for the query itself.
Grouping
Projection queries can use the distinct on clause to ensure that only the first result for each distinct combination of values for the specified properties will be returned. This will return only the first result for entities which have the same values for the properties that are being projected.
Let's say I have a table for questions and I only want to get the question text sorted by the created date. Would this be counted as a single read, with the rest as small operations?
If your goal is just to project the date and text fields, you can create a composite index on those two fields. You are not trying to de-duplicate (so no distinct on clause), so the query is a small operation, and all the results count as a single read.

Which MongoDB indexes should be created for different sorting and filtering conditions to improve performance?

I have a MongoDB collection with ~100,000,000 records.
On the website, users search for these records with "Refinement search" functionality, where they can filter by multiple criteria:
by country, state, region;
by price range;
by industry;
Also, they can review search results sorted:
by title (asc/desc),
by price (asc/desc),
by bestMatch field.
I need to create indexes to avoid a full scan for any of the combinations above (because users use most of them). Following the Equality-Sort-Range rule for creating indexes, I have to create a lot of indexes:
All filter combination × All sortings × All range filters, like the following:
country_title
state_title
region_title
title_price
industry_title
country_title_price
country_industry_title
state_industry_title
...
country_price
state_price
region_price
...
country_bestMatch
state_bestMatch
region_bestMatch
...
In reality, I have more criteria (including equality and range) and more sort options. For example, I have multiple price fields and users can sort by any of those prices, so I have to create all the filtering indexes for each price field in case the user sorts by that price.
We use MongoDB 4.0.9, on a single server so far.
Until I had sorting, it was easier, at least I could have one compound index like country_state_region and always include country & state in the query when one searches for a region. But with sorting field at the end, I cannot do it anymore - I have to create all different indexes even for location (country/state/region) with all sorting combinations.
Also, not all products have a price, so I cannot just sort by price field. Instead, I have to create two indexes: {hasPrice: -1, price: 1}, and {hasPrice: -1, price: -1} (here, hasPrice is -1, to have records with hasPrice=true always first, no matter price sort direction).
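To make the ordering that these two indexes are meant to support concrete, here is a small in-memory sketch in plain JavaScript. It does not touch MongoDB; sortByPrice and the sample products are hypothetical and only mimic sorting by {hasPrice: -1, price: 1} or {hasPrice: -1, price: -1}.

```javascript
// Mimics sort({ hasPrice: -1, price: direction }) on an in-memory array.
// direction is 1 for ascending price and -1 for descending price.
function sortByPrice(products, direction) {
  return [...products].sort((a, b) => {
    // hasPrice descending: priced products always come first.
    if (a.hasPrice !== b.hasPrice) return a.hasPrice ? -1 : 1;
    // Products without a price keep their relative order.
    if (!a.hasPrice) return 0;
    return direction * (a.price - b.price);
  });
}

const products = [
  { title: 'A', hasPrice: true, price: 30 },
  { title: 'B', hasPrice: false },
  { title: 'C', hasPrice: true, price: 10 },
];

const ascTitles = sortByPrice(products, 1).map((p) => p.title);
const descTitles = sortByPrice(products, -1).map((p) => p.title);
console.log(ascTitles, descTitles);
```

In both directions the product without a price ends up last, which is why hasPrice stays at -1 while only the price direction flips.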
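Currently, I use NodeJS code similar to the following to generate the indexes (this is a simplified example):
const _ = require('lodash');

// getAllCombinationsOf() and collection are defined elsewhere (omitted in this simplified example).
for (const filterFields of getAllCombinationsOf(['country', 'state', 'region', 'industry', 'price'])) {
  for (const sortingField of ['name', 'price', 'bestMatch']) {
    // Filter fields first, then the sort field as the last key.
    const index = {
      ...(_.fromPairs(filterFields.map(x => [x, 1]))),
      [sortingField]: 1
    };
    // createIndex() replaces the deprecated ensureIndex().
    await collection.createIndex(index);
  }
}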
So, the code above generates more than 90 indexes. And in my real task, this number is even higher.
Is it possible somehow to decrease the number of indexes without reducing the query performance?
Thanks!
Firstly, in MongoDB (see https://docs.mongodb.com/manual/reference/limits/), a single collection can have no more than 64 indexes. Also, you should not create anywhere near 64 indexes unless the collection receives no writes or very few, because every index has to be updated on each write.
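To see how the generator from the question compares with that limit, the sketch below counts the index specifications it would produce, without connecting to MongoDB. The question does not show getAllCombinationsOf, so this version assumes it returns every non-empty subset of the fields; MAX_INDEXES_PER_COLLECTION is just a local name for the documented limit of 64.

```javascript
// Assumed behaviour of the question's getAllCombinationsOf(): all non-empty
// subsets of the given fields, each in the original field order.
function getAllCombinationsOf(fields) {
  const combos = [];
  for (let mask = 1; mask < (1 << fields.length); mask++) {
    combos.push(fields.filter((_, i) => mask & (1 << i)));
  }
  return combos;
}

const MAX_INDEXES_PER_COLLECTION = 64;
const filterable = ['country', 'state', 'region', 'industry', 'price'];
const sortable = ['name', 'price', 'bestMatch'];

const specs = [];
for (const filterFields of getAllCombinationsOf(filterable)) {
  for (const sortingField of sortable) {
    // Same shape as in the question: filter keys first, sort key last.
    const spec = {
      ...Object.fromEntries(filterFields.map((f) => [f, 1])),
      [sortingField]: 1,
    };
    specs.push(spec);
  }
}

// Some specs coincide, e.g. when 'price' is both a filter field and the sort field.
const distinctSpecs = new Set(specs.map((s) => JSON.stringify(s)));

console.log(specs.length, distinctSpecs.size, MAX_INDEXES_PER_COLLECTION);
```

Under that assumption the loop produces 93 specifications, 78 of them distinct, so even the simplified example is already past the limit.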
Is it possible somehow to decrease the number of indexes without reducing the query performance?
Not without sacrificing either functionality or query performance.
A few things you can do (assuming you are using pagination to show results):
1. Create a separate (not compound) index on each column and let the MongoDB query planner choose an index based on the meta-information (cardinality, number, etc.) it has. Of course, there will be a performance hit.
2. Based on your judgment and some analytics, create compound indexes only for the combinations that will be used most frequently.
3. Most important: when creating compound indexes, you can leave out the sort column. Say you are filtering by industry and sorting by price. If you have a compound index (industry, price), everything will work fine. But if you have an index only on industry (assuming paginated results), the query will be quite fast for the first few pages but will keep degrading as you move on to later pages. Generally, users don't navigate past 5-6 pages. Also keep in mind that for larger skip values the query will start to fail because of the 32 MB memory limit for in-memory sorting. This can be overcome by using an aggregation (instead of a find query) with allowDiskUse enabled.
4. Check whether keyset pagination (also called the seek method) can be used in your use case.
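As a rough illustration of the keyset (seek) idea in the last point, the sketch below pages through an in-memory array instead of a MongoDB collection. The nextPage helper, the field names, and the sample rows are hypothetical; the point is that each page is found by comparing against the last row already seen rather than by skipping a growing number of rows.

```javascript
// Rows must be sorted by (price, id); id breaks ties so the order is total.
function comparePriceThenId(a, b) {
  return a.price - b.price || a.id - b.id;
}

// Returns up to `limit` rows that come strictly after `last` in sort order.
// Pass last = null for the first page.
// A MongoDB find() filter with the same meaning would be roughly:
//   { $or: [ { price: { $gt: last.price } },
//            { price: last.price, _id: { $gt: last.id } } ] }
// combined with sort({ price: 1, _id: 1 }).limit(limit).
function nextPage(sortedRows, last, limit) {
  const remaining = last === null
    ? sortedRows
    : sortedRows.filter((row) => comparePriceThenId(row, last) > 0);
  return remaining.slice(0, limit);
}

const rows = [
  { id: 1, price: 10 }, { id: 2, price: 10 }, { id: 3, price: 20 },
  { id: 4, price: 20 }, { id: 5, price: 30 },
].sort(comparePriceThenId);

const page1 = nextPage(rows, null, 2);
const page2 = nextPage(rows, page1[page1.length - 1], 2);
const page3 = nextPage(rows, page2[page2.length - 1], 2);
console.log(page1, page2, page3);
```

The in-memory filter here still walks the whole array, so it only shows the logic. In a database, an index on the same (price, _id) keys is what would let the query start at the cursor position instead of scanning the skipped rows.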

Can I insert a document in sorted order?

I know I can find documents in sorted order, but can I insert them in sorted order? The data in the database is ordered like the set datatype in Redis.
Generally, the order of records/documents in most (all?) databases is undefined. If you want them returned in a specific order, specify the order when querying. MongoDB is no exception.
You can't control the physical location a new document goes to, but you can query by insertion order (see $natural).

Retrieve all values of an indexed variable in Mongodb without iterating over all records

I have a large MongoDB database consisting of millions of records. I want to retrieve all values for a variable which has an index associated with it. Is there a method to retrieve all values for this indexed variable which is faster than iterating over all records?
When you write a query in MongoDB, its query planner chooses the optimal index for the query.
To see which index the planner will use, add .explain() at the end of the query.