Micronaut Data - Pageable or Flowable.skip().limit()? - rx-java2

I was wondering if someone could explain the differences between the two approaches below and the pros and cons of each, i.e. the underlying queries sent to the DB, performance, etc.
RxJava approach
RxJavaCrudRepository.findAll().skip(offset).limit(max)
Pageable approach
CrudRepository.findAll(Pageable.from(offset, max))

After some digging around and debugging the resulting SQL, I have come to the following conclusion:
Approach 1 doesn't do any under-the-hood magic. It issues a SELECT without LIMIT, fetches all rows from the DB, and only then applies the skip/limit on the client side. This means it should not be used for paging, and approach 2 is the way to go.
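To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than Micronaut Data itself. The `book` table and both helper functions are hypothetical stand-ins: the first mimics a SELECT without LIMIT followed by client-side skip/limit, the second puts LIMIT/OFFSET into the query. It is only an illustration of the two query shapes, not the SQL that Micronaut Data actually generates.

```python
import sqlite3

# Hypothetical table standing in for the entity behind the repository.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO book (id, title) VALUES (?, ?)",
    [(i, f"title-{i}") for i in range(1, 1001)],
)

def page_client_side(conn, offset, limit):
    """Analogue of findAll().skip(offset).limit(max): no LIMIT in the SQL."""
    rows = conn.execute("SELECT id, title FROM book ORDER BY id").fetchall()
    # All rows have been fetched already; the slicing happens in the client.
    return rows[offset:offset + limit], len(rows)

def page_in_database(conn, offset, limit):
    """Analogue of findAll(Pageable): LIMIT/OFFSET are part of the SQL."""
    rows = conn.execute(
        "SELECT id, title FROM book ORDER BY id LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()
    return rows, len(rows)

client_page, client_fetched = page_client_side(conn, offset=20, limit=10)
db_page, db_fetched = page_in_database(conn, offset=20, limit=10)
```

Both calls return the same ten rows, but the first one reads the whole table to get them while the second reads only the requested page.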

Related

How do I efficiently page a large collection of query results with Sails.js / Waterline?

I'm working with a large dataset behind the Waterline ORM. In several use cases I need to do some processing on many or most of the records (tens of thousands).
So far I've been working with .find(), but that executes and returns the entire result set. Is there a Sails/Waterline approach to iterating over a query result that preserves the storage-agnostic aspect of the ORM?
You can use paginate, for example: Model.find().paginate({page: xx, limit: xx});
More info here: http://sailsjs.org/documentation/concepts/models-and-orm/query-language
Search for pagination :)
If you want to keep Waterline's storage-agnostic trait, you will still have to take a look at your actual schema implementation (even if you're coding storage agnostic).
You can:
Use pagination as in @holzanic's answer; however, this might cause critical performance issues with some storage technologies.
Use streams.
If you will be listing whole objects from a Model, you can paginate by id: take the first n elements in a query, then fetch the next page by asking for records whose id attribute is greater than the last id received in the previous page.
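The paginate-by-id idea from the last option can be sketched as follows. This uses Python's built-in sqlite3 instead of Waterline, so the `record` table and the `iter_pages_by_id` helper are purely illustrative, and it assumes ids are positive, unique and sortable.

```python
import sqlite3

# Hypothetical table with a monotonically increasing integer id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO record (id, payload) VALUES (?, ?)",
    [(i, f"payload-{i}") for i in range(1, 26)],
)

def iter_pages_by_id(conn, page_size):
    """Yield pages of rows, using the last seen id as the cursor."""
    last_id = 0  # assumption: all ids are positive integers
    while True:
        page = conn.execute(
            "SELECT id, payload FROM record WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, page_size),
        ).fetchall()
        if not page:
            return
        yield page
        # The next page starts right after the last id we received.
        last_id = page[-1][0]

pages = list(iter_pages_by_id(conn, page_size=10))
```

Because each query filters on the id instead of skipping rows, the store can seek straight to the start of the page, which is why this tends to hold up better than offset-based paging on large tables.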

MongoDB Using Map Reduce against Aggregation

I have seen this asked a couple of years ago. Since then MongoDB 2.4 has multi-threaded Map Reduce available (after the switch to the V8 Javascript engine) and has become faster than what it was in previous versions and so the argument of being slow is not an issue.
However, I am looking for a scenario where a Map Reduce approach might work better than the Aggregation Framework. In fact, possibly a scenario where the Aggregation Framework cannot work at all but Map Reduce can get the required results.
Thanks,
John
Take a look at this.
The Aggregation Framework results are stored in a single document, so they are limited to 16 MB; this might not be suitable for some scenarios. With MapReduce there are several output types available, including an entire new collection, so it doesn't have that space limit.
Generally, MapReduce is better when you have to work with large data sets (maybe the entire collection). Furthermore, it gives much more flexibility (you write your own aggregation logic) instead of being restricted to some pipeline commands.
Currently the Aggregation Framework results can't exceed 16MB. But, I think more importantly, you'll find that the AF is better suited to "here and now" type queries that are dynamic in nature (like filters are provided at run-time by the user for example).
A MapReduce is preplanned and can be far more complex and produce very large outputs (as they just output to a new collection). It has no run-time inputs that you can control. You can add complex object manipulation that simply is not possible (or efficient) with the AF. It's simple to manipulate child arrays (or things that are array like) for example in MapReduce as you're just writing JavaScript, whereas in the AF, things can become very unwieldy and unmanageable.
The biggest issue is that MapReduce results aren't automatically kept up to date, and it's difficult to predict when a job will complete. You'll need to implement your own solution for keeping them up to date (unlike some other NoSQL options). Usually, that's just a timestamp of some sort and an incremental MapReduce update (as shown here). You'll possibly need to accept that the data may be somewhat stale and that the jobs will take an unknown length of time to complete.
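The timestamp-plus-incremental-update idea can be sketched in plain Python. Nothing here talks to MongoDB: the documents, the `ts` field and all function names are hypothetical, and the dictionary stands in for the output collection. In MongoDB itself the same shape would roughly correspond to a query filter on the timestamp combined with the reduce output action, but treat that mapping as an assumption to check against the docs.

```python
from collections import defaultdict

def map_doc(doc):
    """Map step: emit (key, value) pairs for one document."""
    yield doc["user"], doc["amount"]

def reduce_values(values):
    """Reduce step: must be associative so partial results can be merged."""
    return sum(values)

def incremental_map_reduce(docs, output, last_run):
    """Fold documents newer than last_run into the existing output dict."""
    emitted = defaultdict(list)
    newest = last_run
    for doc in docs:
        if doc["ts"] <= last_run:
            continue  # already folded in by a previous run
        for key, value in map_doc(doc):
            emitted[key].append(value)
        newest = max(newest, doc["ts"])
    for key, values in emitted.items():
        # Re-reduce the stored result together with the new values.
        previous = [output[key]] if key in output else []
        output[key] = reduce_values(previous + values)
    return newest  # checkpoint to pass in as last_run next time

docs = [
    {"user": "a", "amount": 5, "ts": 1},
    {"user": "b", "amount": 3, "ts": 2},
]
totals = {}
checkpoint = incremental_map_reduce(docs, totals, last_run=0)

# A new document arrives; only it is processed on the second run.
docs.append({"user": "a", "amount": 7, "ts": 3})
checkpoint = incremental_map_reduce(docs, totals, last_run=checkpoint)
```

Between runs the totals are stale by design, which is the trade-off described above.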
If you hunt around on StackOverflow, you'll find lots of very creative solutions to problems with MongoDB, and many of them use the Aggregation Framework because they're working around limitations of the general query engine in MongoDB and can produce "live/immediate" results. (Some AF pipelines are extremely complex, though, which may be a concern depending on the developers/team/product.)

Iterate JPA query result

Is there a way to iterate over query results through the JPA API, similar to Hibernate's Criteria.scroll() method? It is a big performance improvement with large result sets, because rows can be processed as they are read.
No, there is no such construct in JPA, not even in JPA 2.1.
JPA 2.2 provides TypedQuery.getResultStream(), but the default implementation does not have the expected effect (it simply calls getResultList()). Provider-specific implementations do not always lead to performance improvements either, as can be seen in this article.
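The difference between materializing a result list and consuming rows as they are read is not specific to JPA. As a rough, language-neutral illustration, here is a sketch with Python's built-in sqlite3; the `event` table and both helper functions are hypothetical and say nothing about how any particular JPA provider behaves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany(
    "INSERT INTO event (id, amount) VALUES (?, ?)",
    [(i, i * 2) for i in range(1, 10001)],
)

def total_materialized(conn):
    """Analogue of getResultList(): the whole result is built in memory first."""
    rows = conn.execute("SELECT amount FROM event").fetchall()
    return sum(amount for (amount,) in rows)

def total_streamed(conn, batch_size=500):
    """Analogue of a scrolling cursor: rows are consumed in small batches."""
    cursor = conn.execute("SELECT amount FROM event")
    total = 0
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            return total
        total += sum(amount for (amount,) in batch)

materialized = total_materialized(conn)
streamed = total_streamed(conn)
```

Both functions compute the same total; the streamed version never holds more than one batch of rows at a time.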

Is mongoDB efficient in doing multi-key lookups?

I'm evaluating MongoDB, coming from Membase/memcached, because I want more flexibility.
Of course Membase is excellent at doing fast (multi-)key lookups.
I like the additional options that MongoDB gives me, but is it also fast at doing multi-key lookups? I've seen the $or and $in operators and I'm sure I can model it with those. I just want to know if it's performant (in the same league as Membase).
Use case: e.g., Lucene/Solr returns 20 product IDs. Look up these product IDs in Couchdb to return the docs or the appropriate fields.
Thanks,
Geert-Jan
For your use case, I'd say it is, from my experience: I hacked some analytics into a database of mine that made a lot of $in queries with thousands of ids and it worked fine (it was a hack). To my surprise, it worked rather well, in the lower millisecond area.
Of course, it's hard to compare this, and -as usual- theory is a bad companion when it comes to performance. I guess the best way to figure it out is to migrate some test data and send some queries to the system.
Use MongoDB's excellent built-in profiler, use $explain, keep the one-index-per-query rule in mind, take a look at the logs, keep an eye on mongostat, and do some benchmarks. This shouldn't take too long and should give you a definite answer. If your queries turn out slow, people here and on the news group probably have some ideas on how to improve the exact query or the indexing.
One index per query. It's sometimes thought that queries on multiple
keys can use multiple indexes; this is not the case with MongoDB. If
you have a query that selects on multiple keys, and you want that
query to use an index efficiently, then a compound-key index is
necessary.
http://www.mongodb.org/display/DOCS/Indexing+Advice+and+FAQ#IndexingAdviceandFAQ-Oneindexperquery.
There's more information on that page as well with regard to Indexes.
The bottom line is Mongo will be great if your indexes are in memory and you are indexing on the columns you want to query using composite keys. If you have poor indexing then your performance will suffer as a result. This is pretty much in line with most systems.

Can I search across collections in MongoDB?

I am inserting my data into MongoDB and had 240 such files. Instead of inserting everything into one big collection, I was thinking of inserting the files as a collection by themselves. Is this a good idea if I do a lot of queries on a commonly indexed column?
If so, how can I initiate a query to query all the collections in my database?
Using an application server such as Solr can help you achieve what you want, also with the addition of fuzzy matching, synonyms, phonetic matching, misspellings, etc.
Solr is built on top of Lucene. Its docs are here:
http://lucene.apache.org/solr/
The learning curve is a little bit steep, but you can get pretty good searchability using much of its defaults, leaving you to build a schema and index your data to get started.
I think the answer you're looking for is really here on your other question: Is there any multicore exploiting NoSQL system?
There is no way to query across all collections in Mongo. It wouldn't make a lot of sense to do so. MongoDB's strength is focused on tactically denormalizing data into collections. Providing operations to query across all collections runs exactly counter to the concept of tactical denormalization.
In theory, you could just run 240 queries. But more practically you'll probably end up "partitioning" your data so that you only need to query some of the collections. At this point you end up back at the link I provided above, which suggests that sharding your data is probably the answer here.