perf cost of API calls - database-performance

Sorry if this has already been asked, I didn't find any obvious match.
I am an SQL beginer learning with sqlite.
Is there a document that specify the per consequence of an API call or an internal documentation.
I'd like to know what is a cost of dropping things, say dropping a table, does it degenerate into freeing each table element rowcolelement or is it just a matter of placing the table on freelist and all space it use are now free.
Thanks in advance
Cheers,
Phi

Related

Why does paginating with offset using PSQL make sense?

I've been looking into pagination (paginate by timestamp) with a PSQL dbms. My approach currently is to build a b+ index to greatly reduce the cost of finding the start of the next chunk. But everywhere I look in tutorials and on NPM modules like express-paginate (https://www.npmjs.com/package/express-paginate), people seem to get chunks using offset one way or the other or fetching all the data anyways but simply sending them in chunks which to me doesn't seem to be a complete optimization that pagination is for.
I can see that they're still making an optimization by lazy loading and streaming the chunks (thus saving bandwidth and any download/processing time on the client-side), but since offset on psql still requires scanning previous rows. In the worst case where a user wants to view all the data, doesn't this approach have a very high server cost since if you have per say n chunks, you're accessing the first chunk n times, the second chunk n-1 times, the third chunk n-2 times, etc. I understand that this is really in terms of IOs so it's not that expensive but it still bothers me?
Am I missing something very obvious here? I feel like I am because there seems to be a lot more established and experienced engineers who seem to be using this approach. I'm guessing there is some part of the equation or mechanism that I'm just missing from my understanding.
No, you understand this quite well.
The reason why so many people and tools still advocate pagination with OFFSET and LIMIT (or FETCH FIRST n ROWS ONLY, to use the standard's language) is that they don't know a lot about databases. It is easy to understand LIMIT and OFFSET even if you the word “index” to you has no other meaning than ”the last pages in a book”.
There is another reason: to implement key set pagination, you must have an ORDER BY clause in your query, that ORDER BY clause has to contain a unique column, and you have to create an index that supports that ordering.
Moreover, your database has to be able to handle conditions like
... WHERE (name, id) > ('last_found', 42)
and support a multi-column index scan for them.
Since many tools strive to support several database systems, they are likely to go for the simple but inefficient method that works with every query on most database systems.

Should I create an index on a column with repetitive values and where in lookup

I have a materialized view (which is very much a table) where I need to make where in kind of queries.
The column I want to query (say view_id) definitely has repetitions (15-20).
The where in queries would also be very large i.e - it would contain a lot of view_id to query.
Should I go ahead and create an index on this column?
Will it give me some performance improvements?
I have another column which would help help me have a multi column index(unique). Should this be a better option?
With questions such as these on performance, there is no substitute for testing it with your exact case. There's little harm in trying it out (even on a production system, but utilize a test system if you can!), other than perhaps slowing performance until you undo what you did. Postgres makes this kind of tinkering safe.
#tim-biegeleisen's first comment is spot on: with your setup, your cardinality is reduced, but that doesn't mean it's not a win.
In short, try it and see. There is no better answer you will get than what your own dataset and access patterns will give you.

Is it practical to use one table for reading purpose only in a relational database?

I know this question would not be ideal in a real database world, however, I am building a web REST api to server a result that potentially need to join almost every table(i use normalization for sure).
So is it OK to do have one single table to hold the meta data used for reading API, but the table get updated as well when data updated in other tables? I am using PostgreSQL by the way.
This is not very clear so I will state my understanding of the question and give you what I see are the tradeoffs.
First.... It sounds to me like you want to effectively materialize a metadata table and have it live-updated when other tables update. This is not really what the MATERIALIED VIEW support in PostgreSQL is for.
You can use a trigger to update the data whenever something changes. Because of the way PostgreSQL handles things, this leads to more disk and CPU activity, but will probably add more on the latter than the former. So if you hare heavily CPU-bound that will pose more problems than if you are I/O bound.
Using triggers in this way adds a fair bit of complexity to your database and may reduce write scaling a bit but if the data is seldom written but read frequently it may be a clear win.
So in answer to your question, yes it is practical in at least some cases. Whether it is practical in your case, that will be for you to decide.

Is it a good idea to store attributes in an integer column and perform bitwise operations to retrieve them?

In a recent CODE Magazine article, John Petersen shows how to use bitwise operators in TSQL in order to store a list of attributes in one column of a db table.
Article here.
In his example he's using one integer column to hold how a customer wants to be contacted (email,phone,fax,mail). The query for pulling out customers that want to be contacted by email would look like this:
SELECT C.*
FROM dbo.Customers C
,(SELECT 1 AS donotcontact
,2 AS email
,4 AS phone
,8 AS fax
,16 AS mail) AS contacttypes
WHERE ( C.contactmethods & contacttypes.email <> 0 )
AND ( C.contactmethods & contacttypes.donotcontact = 0 )
Afterwards he shows how to encapsulate this in to a table function.
My questions are these:
1. Is this a good idea? Any drawbacks? What problems might I run in to using this approach of storing attributes versus storing them in two extra tables (Customer_ContactType, ContactType) and doing a join with the Customer table? I guess one problem might be if my attribute list gets too long. If the column is an integer then my attribute list could only be at most 32.
2. What is the performance of doing these bitwise operations in queries as you move in to the tens of thousands of records? I'm guessing that it would not be any more expensive than any other comparison operation.
If you wish to filter your query based on the value of any of those bit values, then yes this is a very bad idea, and is likely to cause performance problems.
Besides, there simply isn't any need - just use the bit data type.
The reason why using bitwise operators in this way is a bad idea is that SQL server maintains statistics on various columns in order to improve query performance - for example if you have an email column, SQL server can tell you roughly what percentage of values that email column are true and select an appropriate execution plan based on that knowledge.
If however you have flags column, SQL server will have absolutely no idea how many records in a table match flags & 2 (email) - it doesn't maintain these sorts of indexes. Without this sort of information available to it SQL server is far more likely to choose a poor execution plan.
And don't forget the maintenance problems using this technique would cause. As it is not standard, all new devs will probably be confused by the code and not know how to adjust it properly. Errors will abound and be hard to find. It is also hard to do reporting type queries from. This sort of trick stuff is almost never a good idea from a maintenance perspective. It might look cool and elegant, but all it really is - is clunky and hard to work with over time.
One major performance implication is that there will not be a lookup operator for indexes that works in this way. If you said WHERE contact_email=1 there might be an index on that column and the query would use it; if you said WHERE (contact_flags & 1)=1 then it wouldn't.
** One column stores one piece of information only - it's the database way. **
(Didnt see - Kragen's answer also states this point, way before mine)
In opposite order: The best way to know what your performance is going to be is to profile.
This is, most definately, an "It Depends" question. I personally would never store such things as integers. For one thing, as you mention, there's the conversion factor. For another, at some point you or some other DBA, or someone is going to have to type:
Select CustomerName, CustomerAddress, ContactMethods, [etc]
From Customer
Where CustomerId = xxxxx
because some data has become corrupt, or because someone entered the wrong data, or something. Having to do a join and/or a function call just to get at that basic information is way more trouble than it's worth, IMO.
Others, however, will probably point to the diversity of your options, or the ability to store multiple value types (email, vs phone, vs fax, whatever) all in the same column, or some other advantage to this approach. So you would really need to look at the problem you're attempting to solve and determine which approach is the best fit.

Time Travel functions in postgresql

Can anyone recommend for/against the time-travel functions in postgresql's contrib/spi module? Is there an example available anywhere?
Tnx
The argument for time-travel would be being able to look at tables that are updated often at an earlier insertion/deletion point. Say a table of stock prices for a firms investment portfolio.
The argument against would be the extra storage space it eats up.
Here is an Example of use.
See This discussion for an alternative approach to historical reporting.