How to set h2 to stream resultset? - streaming

Mysql driver has options to set so that resultset will not be read completely in memory as in here http://dev.mysql.com/doc/connector-j/en/connector-j-reference-implementation-notes.html#ResultSet.
Is there an equivalent option for H2?
Thanks,

H2 currently does not support server side cursors. However, it buffers large result sets to disk (as a separate file, or as a temporary table). The disadvantage is speed, but it should not be a memory usage problems.
You can set the size of the when H2 will buffer to disk using set max_memory_rows. You can append that to the database URL: jdbc:h2:~/test;max_memory_rows=100000.
A workaround is usually to use "keyset paging" as described in the presentation "Pagination Done the Right Way". That would mean running multiple queries instead of one.
My plan is to implement server side cursors in H2 in the next months.

Related

Why postgres database using double buffer method?

Postgres using double buffer for performing operation. But why? What is the need for it.
please explain .
I guess you are referring to shared buffers and the kernel cache, both of which cache PostgreSQL data since PostgreSQL uses buffered I/O rather than direct I/O.
The reason is probably that PostgreSQL aims to be very portable, so using buffered I/O with its higher level of abstraction from the system was the easier choice.
There is definitely a desire to change that, but that is a big project, if it ever comes to pass.

What is the correct way to enable query cache?

Based on the documentation, the super privilege is not supported, which means that the following query:
SET GLOBAL query_cache_size = 1000000;
results in an error message
Access denied; you need (at least one of) the SUPER privilege(s) for this operation
and does not allow us to set the query cache size.
What's the correct way to accomplish the task?
Unfortunately, Cloud SQL does not support query caching and query_cache_size cannot be set.
If you are experiencing performance issues, you can try changing your instance tier to give your instance access to more resources. Also, it is preferable to use InnoDB over MyISAM tables. The reason for this is because when a Cloud SQL instance is started, it gives most of the available memory to the InnoDB buffer pool.
As mhalt hints at, there is a good reason not to use the query cache:
You should be using InnoDB rather than MyISAM, as MyISAM is not robust enough for the cloud environment.
InnoDB has built in caching as part of it's buffer pool. This caches individual pages of data, rather than entire result sets.
The buffer pool generally provides superior caching to the query cache: 1) it does not get flushed after writes 2) multiple different queries can be served using the same cache entries 3) it supports partial caching if the active set is larger than available ram.
The only workload where the query cache is superior is if you have a very low write rate and almost all your queries are exactly the same.
For this reason Cloud SQL is optimized by maximizing RAM allocated to the buffer pool instead of having a query cache.
CloudSQL now support query_cache flags.
https://cloud.google.com/sql/docs/mysql/flags
But these options may break the SLA coverage.

Multiple connections to PostgreSQL with huge number of INSERTs

This question is involved by this one: How to speed up insertion performance in PostgreSQL
So, I have java application which is doing a lot of (aprox. billion) INSERTs into PostgreSQL database. It opens several JDBC connections to the same DB for doing these inserts in parallel. As I read in mentioned question-answer:
INSERT or COPY in parallel from several connections. How many depends
on your hardware's disk subsystem; as a rule of thumb, you want one
connection per physical hard drive if using direct attached storage.
But in my case I have only one disk storage for my DB.
So, my question is: does it really have sence to open several connections in this case? Could it reduce perfomance instead of desired increasing due to I/O operations competitions?
For clarifying, here is the picture with actual postgresql processes load:
Since you mentioned INSERT in Java application, I'd assume (utilizing plain JDBC) COPY is not what you're looking for. Without using API like JPA or framework such as Spring-data, may I introduce addBatch() and executeBatch() in case you haven't heard of these:
/*
the whole nine yards
*/
Connection c = ...;
PreparedStatement ps = c.prepareStatement("INSERT INTO table1(columnInt2,columnVarchar)VALUES(?,?)");
Then read data in a loop:
ps.setShort(1, someShortValue);
ps.setString(2, someStringValue);
ps.addBatch(); // one row at a time from human's perspective
When data of all rows are prepared:
ps.executeBatch();
May I also recommend:
Connection pooling which saves you a lot of resources; check out Commons DBCP, c3p0 and BoneCP.
When doing multiple CUD (create, update, delete) operations, one should think about transaction (so you can rollback in case any row goes wrong).

using postgres server-side cursor for caching

To speed page generation for pages based on large postgres collections, we cache query results in memcache. However, for immutable collections that are very large, or that are rarely accessed, I'm wondering if saving server side cursors in postgres would be a viable alternate caching strategy.
The idea is that after having served a page in the middle of a collection "next" and "prev" links are much more likely to be used than a random query somewhere else in the collection. Could I have a cursor "WITH HOLD" in the neighborhood to avoid the (seemingly unavoidable) large startup costs of the query?
I wonder about resource consumption on the server. If the collection is immutable, saving the cursor shouldn't need very many resources, but I wonder how optimized postgres is in this respect. Any links to further documentation would be appreciated.
You're going to run into a lot of issues.
You 'd have to ensure the same user gets the same sql connection
You have to create a cleanup strategy
The cursors will be holding up vacuum operations.
You have to convince the connection pool to not clear the cursors
Probably other issues I have not mentioned.
In short: don't do it. How about precalculating the next/previous page in background, and storing it in memcached?
A good answer to this has previously been made Best way to fetch the continuous list with PostgreSQL in web
The questions are similar, essentially you store a list of PKs on the server with a pagination-token and an expiration.

Using memcache infront of a mongodb server

I am trying to understand how mongo's internal cache works and if it does eliminate using memcache. Our database size is around 200G and index fits in the memory but after the index not much free memory left on the server.
One of my colleague says mongo's internal cache will be as fast as memcache so no need to introduce another level of complexity by using memcache.
The scenario in my head is when we read the data from db, it's saved in memcache and next time it's directly read from the cache instead of going back to db server. If the data is changed and needs to be saved/updated, it's done on both memcache server and database server.
I have been reading about this but couldn't convince myself yet. So I'd really appreciate if someone could shed some light on this.
First thing is that a cache storage is different to a database. So MongoDB and SQL are different in purpose and usage when compared to Memcache.
Memcache is really good at lowering working set sizes for queries. For example: imagine a huge aggregated query with subselects and CASE statements and what not in SQL (think of the most complex query you can), doing this query in realtime all the time could cause the computer(s) to "thrash" (not to mention the problems client side).
However as everyone knows you need only summarise this query to another collection/table for it to be instantly faster. The real speed of memcache comes from the fact that it is a in memory key value store. This is where MongoDB could fail in speed because it is not memory stored, it is memory mapped but not stored.
MongoDB does no self caching, providing the query is "hot" and in LRU (this is where your working set comes in) you shouldn't notice much of a difference in response times. A good way to ensure a query is "hot" is to run it. Some people have a script of their biggest queries that they run to warm up the cache.
As I said memcache is a cache layer this is why:
If the data is changed and needs to be saved/updated, it's done on both memcache server and database server.
Makes me die a little inside. Many do blur the line between the DB and the cache layer.