I have some expensive queries (each takes around 90 seconds). The good news is that my queries don't change much, so most of them are duplicates. I am looking for a way to cache query results in PostgreSQL. I have searched for an answer but could not find one (some answers are outdated and some are unclear).
My application connects to Postgres directly.
The query is a simple SQL query that returns thousands of rows:
SELECT * FROM Foo WHERE field_a<100
Is there any way to cache a query result for at least a couple of hours?
It is possible to cache expensive queries in PostgreSQL with a "materialized view", which stores the query result physically and is brought up to date with REFRESH MATERIALIZED VIEW. However, given how simple your query is, I'm not sure this will gain you much.
You may be better off caching this information directly in your application, in memory, or, if possible, caching a further-processed set of data rather than the raw rows.
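A minimal sketch of the in-application cache suggested above, assuming a single process and a two-hour lifetime to match the "couple of hours" in the question. `TTLCache` and `run_expensive_query` are hypothetical names; in a real application `run_expensive_query` would call your Postgres driver instead of returning fixed rows. The clock is injectable only so expiry can be demonstrated without sleeping.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Minimal in-process cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh entry: skip the expensive query
        value = compute()  # miss or stale entry: run the query again
        self._store[key] = (now, value)
        return value


# Hypothetical stand-in for the 90-second query; it records how often it runs.
calls = []


def run_expensive_query(sql: str, params: tuple) -> list:
    calls.append((sql, params))
    return [("row", 1), ("row", 2)]


fake_now = [0.0]
cache = TTLCache(ttl=2 * 60 * 60, clock=lambda: fake_now[0])  # two hours
sql, params = "SELECT * FROM Foo WHERE field_a < %s", (100,)

first = cache.get_or_compute((sql, params), lambda: run_expensive_query(sql, params))
second = cache.get_or_compute((sql, params), lambda: run_expensive_query(sql, params))
fake_now[0] += 3 * 60 * 60  # three hours later the entry has expired
third = cache.get_or_compute((sql, params), lambda: run_expensive_query(sql, params))
print(len(calls))  # 2
```

Keying on the SQL text plus its parameters means identical queries share one cached result, while a changed parameter triggers a fresh query.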
ref:
https://www.postgresql.org/docs/current/rules-materializedviews.html
Depending on what your application looks like, a TEMPORARY TABLE might work for you. It is only visible to the connection that created it and it is automatically dropped when the database session is closed.
CREATE TEMPORARY TABLE tempfoo AS
SELECT * FROM Foo WHERE field_a<100;
The downside to this approach is that you get a snapshot of Foo when you create tempfoo. You will not see any new data that gets added to Foo when you look at tempfoo.
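The snapshot behaviour described above can be seen in a small sketch. It uses SQLite's in-memory database as a stand-in for Postgres (both accept `CREATE TEMPORARY TABLE ... AS SELECT`); the sample values in `Foo` are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Foo (id INTEGER PRIMARY KEY, field_a INTEGER)")
conn.executemany("INSERT INTO Foo (field_a) VALUES (?)", [(10,), (50,), (150,)])

# Materialize the expensive result once, visible only to this connection.
conn.execute("CREATE TEMPORARY TABLE tempfoo AS SELECT * FROM Foo WHERE field_a < 100")

# New data added to Foo afterwards...
conn.execute("INSERT INTO Foo (field_a) VALUES (20)")

# ...is not visible through the snapshot, only through the live query.
snapshot_count = conn.execute("SELECT COUNT(*) FROM tempfoo").fetchone()[0]
live_count = conn.execute("SELECT COUNT(*) FROM Foo WHERE field_a < 100").fetchone()[0]
print(snapshot_count, live_count)  # 2 3
```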
Another approach: if you have access to the database, you may be able to speed up your queries significantly by adding an index on field_a.
I am using a PostgreSQL database.
My application manages many objects of the same type.
For each object, my application performs write-intensive work: each object has a row inserted into the database at least once every 30 seconds. I also need to retrieve the data by object ID.
My question is how best to design the database: should I use one huge table for all objects (slower inserts) or one table per object (more complicated retrievals)?
Tables are meant to hold a huge number of objects of the same type, so your second option, one table per object, doesn't look right. But of course, more information is needed.
My tip: start with one table. If you run into problems - mainly performance - try to split it up. It's not that hard.
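A sketch of the "one table" design, with SQLite standing in for PostgreSQL. The table and column names (`readings`, `object_id`, `recorded_at`, `payload`) are hypothetical. A composite index on `(object_id, recorded_at)` is one common way to make retrieval by object ID cheap without splitting the data into per-object tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table for all objects; object_id says which object a row belongs to.
conn.execute("""
    CREATE TABLE readings (
        object_id   INTEGER NOT NULL,
        recorded_at REAL    NOT NULL,
        payload     TEXT
    )
""")
# Composite index: "rows for object X, newest first" becomes an index range scan.
conn.execute("CREATE INDEX idx_readings_object_time ON readings (object_id, recorded_at)")


def insert_reading(object_id: int, recorded_at: float, payload: str) -> None:
    conn.execute(
        "INSERT INTO readings (object_id, recorded_at, payload) VALUES (?, ?, ?)",
        (object_id, recorded_at, payload),
    )


def readings_for(object_id: int, limit: int = 10) -> list:
    return conn.execute(
        "SELECT recorded_at, payload FROM readings "
        "WHERE object_id = ? ORDER BY recorded_at DESC LIMIT ?",
        (object_id, limit),
    ).fetchall()


for obj in range(3):
    for t in range(5):
        insert_reading(obj, float(t * 30), f"obj{obj}-t{t}")  # one row every 30 s
conn.commit()

print(readings_for(1, limit=2))  # [(120.0, 'obj1-t4'), (90.0, 'obj1-t3')]
```

Each extra index adds work to every insert, so it is worth keeping only the indexes your retrieval queries actually need.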
Logically, you should use one table.
However, the so-called "write amplification" problem exhibited by PostgreSQL seems to have been one of the main reasons Uber switched from PostgreSQL to MySQL. Quote:
"For tables with a large number of secondary indexes, these superfluous steps can cause enormous inefficiencies. For instance, if we have a table with a dozen indexes defined on it, an update to a field that is only covered by a single index must be propagated into all 12 indexes to reflect the ctid for the new row."
Whether this is a problem for your workload, only measurement can tell - I'd recommend starting with one table, measuring performance, and then switching to multi-table (or partitioning, or perhaps switching the DBMS altogether) only if the measurements justify it.
A single table is probably the best solution if you are certain that all objects will continue to have the same attributes.
INSERT does not get significantly slower as the table grows – it is the number of indexes that slows down data modification.
I'd rather be worried about data growth. Do you have a design for getting rid of old data? Big DELETEs can be painful; sometimes partitioning helps.
For some reason I'm having a hard time getting across to some people that using a view in Postgres as you would use a table is a bad idea.
As background, there are a number of tables containing completely static data that is updated every few months via a batch import into a new table per date, such as table_201603 or table_201607. A view called 'table', which is just a 'SELECT * FROM' of the current batch table, has been created for clients to use. When an updated batch of data is loaded into a new table, the view is updated to point at the new table. This avoids an in-place table rename, which might cause downtime. To clarify, this is on a version of Postgres older than 9.3, the release that introduced materialized views. These tables generally have about 100 million rows.
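A sketch of the view-repointing step described above, run against SQLite for convenience; the table names come from the question and the sample rows are made up. Both SQLite and PostgreSQL treat DDL as transactional, so dropping and recreating the view inside one transaction means readers see either the old definition or the new one. In PostgreSQL, `CREATE OR REPLACE VIEW` can also repoint the view, provided the new query produces the same columns.

```python
import sqlite3

# isolation_level=None lets us issue BEGIN/COMMIT ourselves so the DDL is grouped.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE table_201603 (id INTEGER, label TEXT)")
conn.execute("INSERT INTO table_201603 VALUES (1, 'march batch')")
conn.execute('CREATE VIEW "table" AS SELECT * FROM table_201603')

# A new batch arrives in its own table.
conn.execute("CREATE TABLE table_201607 (id INTEGER, label TEXT)")
conn.execute("INSERT INTO table_201607 VALUES (1, 'july batch')")


def repoint_view(new_table: str) -> None:
    """Swap the view to a new batch table in one transaction.

    new_table is interpolated into the SQL, so it must come from trusted code.
    """
    conn.execute("BEGIN")
    try:
        conn.execute('DROP VIEW IF EXISTS "table"')
        conn.execute(f'CREATE VIEW "table" AS SELECT * FROM {new_table}')
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


repoint_view("table_201607")
result = conn.execute('SELECT label FROM "table"').fetchall()
print(result)  # [('july batch',)]
```

This only illustrates the swap mechanism; it does not change how the planner treats queries against the view.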
Understandably, this leads to confusion: people querying these views see very inconsistent query times. Sometimes queries take seconds, other times 20 or 30 milliseconds.
Additionally, this is geospatial data, so they are running geospatial queries against a view.
I know many of the pitfalls here: a view is expanded on the fly like a subquery, so you are very much at the mercy of the query planner as to which predicates get pushed down, and because the results aren't physically stored as a table, you have no control over how long they stay cached. But can anyone see anything else, and suggest a better way of doing this? I imagine this is a reasonably common scenario, so it might help others.
In general, this reminds me of a use case for synonyms. However, Postgres has no synonyms, and the recommendation is to use views and/or separation by schema:
https://www.postgresql.org/message-id/kon2r2$mo6$1#ger.gmane.org
I have an analytics table that contains 10 million records, and to produce charts I have to fetch records from it. Several other tables are also joined to this table. Currently this takes around 10 minutes, even though I have indexed the joined columns and I use materialized views in Postgres. Performance is still very poor: the SELECT on the materialized view alone takes 5 minutes.
Please suggest a technique to get the result within 5 seconds. I don't want to change the DB storage structure, because supporting that would require many code changes. I would like to know whether there are built-in methods for improving query speed.
In general, you can address this issue by creating a better data structure (most engines do this for you to an extent with keys).
If you add a sorted column (an index) organized as a tree-like structure, each lookup costs O(log N) comparisons instead of the O(N) of a full scan. For selective searches, this can give a large speedup.
This refers to structures such as binary search trees, red-black trees and so on; database indexes typically use B-trees.
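To make the O(log N) versus O(N) difference concrete, here is a small sketch that counts comparisons for a full scan and for binary search over one million sorted keys. A plain sorted Python list stands in for an index; real database indexes are B-trees, but they rely on the same halving idea.

```python
from typing import List, Tuple


def linear_search(keys: List[int], target: int) -> Tuple[bool, int]:
    """Full scan: look at every key until a match (what an unindexed query does)."""
    steps = 0
    for k in keys:
        steps += 1
        if k == target:
            return True, steps
    return False, steps


def binary_search(sorted_keys: List[int], target: int) -> Tuple[bool, int]:
    """Halve the search range each step, as a sorted or tree index allows."""
    lo, hi, steps = 0, len(sorted_keys) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_keys[mid] == target:
            return True, steps
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return False, steps


keys = list(range(1_000_000))  # already sorted, like an indexed column
found_lin, lin_steps = linear_search(keys, 999_999)
found_bin, bin_steps = binary_search(keys, 999_999)
print(lin_steps, bin_steps)  # one million steps versus at most 20
```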
Another way to speed things up may be to use a caching layer along the lines of Redis.
For analytics workloads, I have in the past also used Hadoop-related technologies, though that may be a larger migration than you want at this point.
I am new to DB2. I want to select around 2 million rows with a single query that
selects and displays the first 5,000 rows, then selects the next 5,000 in a background process, and continues like that until all the data has been read. Please help me with how to write the query, or whether I should use a function.
Sounds like you want what's known as blocking. However, this isn't actually handled (not the way you're thinking of) at the database level - it's handled at the application level. You'd need to specify your platform and programming language for us to help there. Although if you're expecting somebody to actually read 2 million rows, it's going to take a while... At one row a second, that's 23 straight days.
The reason that SQL doesn't really perform this 'natively' is that it's (sort of) less efficient. Also, SQL is (by design) set up to operate over the entire set of data, both conceptually and syntactically.
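A sketch of the application-level blocking described above: run the query once and consume its result in blocks of 5,000 rows with the DB-API `fetchmany` method. SQLite and a made-up `big` table stand in for DB2 here. DB-API drivers generally provide `fetchmany`, but check your DB2 driver's documentation. Displaying one block while the next is fetched in the background would need threading on top of this, which the sketch leaves out.

```python
import sqlite3
from typing import Iterator, List, Tuple

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO big (val) VALUES (?)", ((f"v{i}",) for i in range(12_000)))
conn.commit()


def iter_blocks(sql: str, block_size: int = 5000) -> Iterator[List[Tuple]]:
    """Run one query and yield its rows block by block (application-level blocking)."""
    cur = conn.execute(sql)
    while True:
        block = cur.fetchmany(block_size)
        if not block:
            break
        yield block


sizes = [len(b) for b in iter_blocks("SELECT id, val FROM big ORDER BY id")]
print(sizes)  # [5000, 5000, 2000]
```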
You can use one of the newer features, which brings Oracle- and MySQL-style paging (LIMIT/OFFSET) to DB2: https://www.ibm.com/developerworks/mydeveloperworks/blogs/SQLTips4DB2LUW/entry/limit_offset?lang=en
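A sketch of LIMIT/OFFSET paging, run against SQLite because it accepts the same `LIMIT ? OFFSET ?` form; the `big` table and `fetch_page` helper are hypothetical. Each page is a separate query, so a stable `ORDER BY` is needed for the pages to line up. With large offsets the database still has to skip all the earlier rows, so keyset paging (a different technique: `WHERE id > last_seen_id ... LIMIT n`) is often preferred for very deep pages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO big (val) VALUES (?)", ((f"v{i}",) for i in range(12_000)))
conn.commit()


def fetch_page(page: int, page_size: int = 5000) -> list:
    """Fetch one page; ORDER BY keeps page boundaries consistent between queries."""
    return conn.execute(
        "SELECT id, val FROM big ORDER BY id LIMIT ? OFFSET ?",
        (page_size, page * page_size),
    ).fetchall()


page0, page2 = fetch_page(0), fetch_page(2)
print(len(page0), page0[0], len(page2), page2[0])  # 5000 (1, 'v0') 2000 (10001, 'v10000')
```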
At the same time, you can influence the optimizer with OPTIMIZE FOR n ROWS and FETCH FIRST n ROWS ONLY. If you are only going to read the data, add FOR READ ONLY to the query; this increases concurrency, and the cursor will not be updatable. Also choose an appropriate isolation level; for this case you could even use uncommitted read (WITH UR). Locking the table beforehand (LOCK TABLE) may also help.
Don't forget common practices such as indexes or a clustering index, and retrieving only the necessary columns. Always analyze the access plan with the Explain facility.
Data retrieval from an SQLite database on my iPhone is quite slow, and perhaps someone has another idea to explain this. From what I have tracked down so far, sqlite3_step(statement) is sometimes unusually slow. When retrieving, for example, 50 rows from the database, this step normally takes a few milliseconds, but sometimes it takes several seconds.
My database is not small (80 MB), and my theory is that the cause is paging. But can anyone think of another reason for this?
Do you have a proper index on that table? Queries can be very slow if a full table scan is required to perform your query. See this page, for example, for guidance on how to optimize your SQLite queries.
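One way to check whether a query needs a full table scan is SQLite's `EXPLAIN QUERY PLAN`. The sketch below runs it from Python before and after adding an index. The `items` table, its columns and the index name are hypothetical, and the exact wording of the plan text varies between SQLite versions. On the iPhone, the same `EXPLAIN QUERY PLAN ...` statement can be prepared and stepped through the C API like any other query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, object_id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO items (object_id, payload) VALUES (?, ?)",
    ((i % 100, f"p{i}") for i in range(10_000)),
)
conn.commit()

QUERY = "SELECT * FROM items WHERE object_id = 42"


def plan(sql: str) -> str:
    """Return SQLite's plan description; the detail text is the last column of each row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(str(r[-1]) for r in rows)


before = plan(QUERY)  # a full scan, e.g. "SCAN items"
conn.execute("CREATE INDEX idx_items_object ON items (object_id)")
after = plan(QUERY)   # an index lookup mentioning idx_items_object
print(before)
print(after)
```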
You may simply need to add an index to your table. CREATE INDEX populates the new index immediately; REINDEX is only needed to rebuild existing indexes. A tool such as the free Firefox add-on "SQLite Manager" does a pretty decent job of managing indexes.