Doubt about clustered and non-clustered indexes - windows-xp

I have a doubt: if my table doesn't have any constraint (primary key, foreign key, unique key, etc.), can I still create a clustered index on it, and can a clustered index contain duplicate records?
My 2nd question: where exactly should we use a non-clustered index, and when is it useful and beneficial to create one on a table?
My 3rd question: how can we create 249 non-clustered indexes on a table? Does that mean creating a non-clustered index on 249 columns?
Can anyone help me clear up my confusion about this?

First, the definition of a clustered index is that it is the physical ordering of data on the disk. Every time you do an insert into that table, the new record will be placed on the physical disk in its order based on its value in the clustered index column. Because it is the physical location on the disk, it is (A) the most rapidly accessible column in the table but (B) only possible to define a single clustered index per table. Which column (or columns) you use as the clustered index depends on the data itself and its use. Primary keys are typically the clustered index, especially if the primary key is sequential (e.g. an integer that increments automatically with each insert), because new rows are then simply appended at the end. This usually gives the fastest insert/update/delete behavior. If you are more interested in performing reads (select * from table), you may want to cluster on a date column, as many queries have a date in the where clause, the group by clause, or both.
Second, clustered indexes (at least in the databases I know) need not be unique (they CAN have duplicates). Constraining the column to be unique is a separate matter. If the clustered index is a primary key, its uniqueness is a function of being a primary key.
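To make the first two points concrete, here is a minimal Python sketch (not how any real engine stores pages). It assumes a hypothetical in-memory table, ClusteredTable, whose rows are kept sorted by the clustering key. It only illustrates that inserts land in key order, that duplicate keys are accepted, and that a range read is one contiguous slice.

```python
import bisect

# Hypothetical in-memory "table" kept physically ordered by its clustering key.
# Rows are (order_date, payload) tuples; order_date is the clustering column.
class ClusteredTable:
    def __init__(self):
        self.rows = []  # always kept sorted by the clustering key

    def insert(self, key, payload):
        # Find the slot that keeps the storage sorted; duplicates are allowed
        # and land right after the rows that share the same key.
        keys = [r[0] for r in self.rows]
        pos = bisect.bisect_right(keys, key)
        self.rows.insert(pos, (key, payload))

    def range_scan(self, low, high):
        # Rows with low <= key <= high sit next to each other, so a range
        # read is a single contiguous slice.
        keys = [r[0] for r in self.rows]
        start = bisect.bisect_left(keys, low)
        end = bisect.bisect_right(keys, high)
        return self.rows[start:end]


table = ClusteredTable()
for key, payload in [("2024-03-02", "c"), ("2024-03-01", "a"),
                     ("2024-03-02", "d"), ("2024-03-01", "b")]:
    table.insert(key, payload)
```

Note that the key column here is not unique: uniqueness would have to be enforced separately, as described above.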
Third, I can't follow your question concerning 249 columns. A non-clustered index is basically a tool for accelerating queries at the expense of extra disk space. It's hard to think of a case where creating an index on each column is necessary. If you want a quick rule of thumb...
Write a query using your table.
If a column is required to do a join, index it.
If a column is used in a where clause, index it.
Remember all the indexes are doing for you is speeding up your queries. If queries run fast, don't worry about them.
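As a rough illustration of what a non-clustered index buys you, here is a small Python sketch. The table, column names, and helper functions are all made up for the example; the index is modeled as a separate mapping from column value to row positions, which costs extra space but avoids inspecting every row.

```python
from collections import defaultdict

# Hypothetical heap table: rows stored in arrival order, no particular sorting.
rows = [
    {"id": 1, "customer": "bob", "total": 30},
    {"id": 2, "customer": "alice", "total": 12},
    {"id": 3, "customer": "bob", "total": 7},
    {"id": 4, "customer": "carol", "total": 55},
]

def build_index(table, column):
    # The index is a separate structure: column value -> positions of matching rows.
    index = defaultdict(list)
    for position, row in enumerate(table):
        index[row[column]].append(position)
    return index

def full_scan(table, column, value):
    # Without an index, every row has to be inspected.
    return [row for row in table if row[column] == value]

def index_lookup(table, index, value):
    # With an index, only the matching positions are visited.
    return [table[pos] for pos in index.get(value, [])]

customer_index = build_index(rows, "customer")
```

Both full_scan(rows, "customer", "bob") and index_lookup(rows, customer_index, "bob") return the same rows; the difference is only in how much of the table gets touched.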
This is just a thumbnail sketch of a large topic. There are tons of more informative/comprehensive resources on this matter, and some depend on the database system ... just google it.

Related

Difference between BRIN index and table partitioning in PostgreSQL

What is the difference between a BRIN index and a table partition in PostgreSQL? When should I use one instead of the other? It seems that they provide very similar benefits and also have similar use cases.
Example
Suppose we have the following table structure
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
store_id INT,
client_id INT,
created_at timestamp,
information jsonb
);
that has the following characteristics:
orders can only be inserted, deletions are not allowed and updates are very rare and they don't involve the created_at column
the created_at column contains the timestamp of the insertion of the row in the database thus the values in the column are strictly increasing
almost every query uses the created_at column in a condition and some of them may use the store_id and client_id columns
the most accessed rows are the most recent ones in terms of the created_at column
some queries may return a few records (example: analyzing a single record or the records created in a small time interval) while others may scan a vast amount of records (example: aggregate functions for a dashboard functionality)
I have chosen this example because it's very common and both approaches could be used (in my opinion). In this case, which should I choose: a BRIN index on the whole table, a partitioned table with maybe a btree index, or just a simple btree index without partitioning? Does the table size influence the choice?
I have used both features (although I'll caveat that my experience with partitioning is from back when you had to use inheritance + constraints, before the introduction of CREATE TABLE ... PARTITION BY). You are correct that they seem similar-ish on a surface level, but they function by completely different mechanisms.
Table partitioning basically works as follows: replace all references to table with (select * from table_partition1 union all select * from table_partition2 /* repeat for all partitions */). The partitions will have a constraint on the partition columns, so that if those columns appear in a WHERE, the constraints can be applied up-front to prune which partitions are actually scanned. IOW, if table_partition1 has CHECK(client_id=1), and your WHERE has client_id=2, table_partition1 will be skipped since the table constraint automatically excludes all rows from this partition from passing that WHERE.
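A toy Python model of that pruning step, purely as an illustration: the partition names follow the example above, while the dictionary layout, the amount field, and the query function are assumptions made up for the sketch. Each partition carries its constraint, and the query skips any partition whose constraint contradicts the WHERE condition before looking at a single row.

```python
# Hypothetical partitions, each with a constraint on client_id (the partition key).
partitions = {
    "table_partition1": {"client_id": 1,
                         "rows": [{"client_id": 1, "amount": 10}]},
    "table_partition2": {"client_id": 2,
                         "rows": [{"client_id": 2, "amount": 20},
                                  {"client_id": 2, "amount": 5}]},
}

def query(client_id):
    scanned = []   # names of partitions that were actually read
    result = []    # rows matching WHERE client_id = <client_id>
    for name, part in partitions.items():
        # Constraint exclusion: skip partitions whose constraint contradicts the WHERE.
        if part["client_id"] != client_id:
            continue
        scanned.append(name)
        result.extend(r for r in part["rows"] if r["client_id"] == client_id)
    return scanned, result

scanned, result = query(2)
```

Here only table_partition2 ends up in scanned; table_partition1 is never read.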
BRIN indexes, in contrast, choose a block size for the table and then, for each block, record the min/max bounds of the indexed column. This allows WHERE conditions to skip entire blocks when we can see, say, that the maximum created_at in a particular block of rows is below a created_at>={some_value} clause in your WHERE.
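The min/max idea can be sketched in a few lines of Python. This is a simplified model, not PostgreSQL's actual implementation: integers stand in for created_at timestamps, and build_brin and scan_greater_equal are hypothetical helper names.

```python
def build_brin(values, block_size):
    # One (min, max) summary per block of consecutive rows.
    summaries = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        summaries.append((min(block), max(block)))
    return summaries

def scan_greater_equal(values, summaries, block_size, threshold):
    # Returns the matching values plus the number of blocks actually read.
    matches, blocks_read = [], 0
    for block_no, (_, block_max) in enumerate(summaries):
        if block_max < threshold:
            continue  # the whole block is below the threshold: skip it
        blocks_read += 1
        start = block_no * block_size
        matches.extend(v for v in values[start:start + block_size] if v >= threshold)
    return matches, blocks_read

created_at = list(range(100))  # stand-in for strictly increasing timestamps
summaries = build_brin(created_at, block_size=10)
matches, blocks_read = scan_greater_equal(created_at, summaries, 10, 85)
```

With ten blocks of ten rows each, the condition created_at >= 85 only needs to read the last two blocks; the other eight are ruled out by their stored maximum alone.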
I can't tell you a definitive answer for your case as to which would work better. Well, that's not true, actually: the definitive answer is, "benchmark it for your own data" ;)
This is kind of fuzzy, but my general feeling is that BRIN is lightweight, and table partitioning is not. BRIN is something that can be added on to an existing table without much trouble, the indexes themselves are very small, and the impact on writes is not major (at least, not without inordinately many indices). Table partitioning, on the other hand, is a different way of representing the data on-disk; you are actually determining into which data files particular rows will be written. This requires a much more involved migration process when introducing it to an existing dataset.
However, the set of query optimizations available for table partitioning is much greater. Not only is there the constraint exclusion I described above, but you can also have indices (even BRIN ones!) on each individual partition. Of course, you can also have BRIN + other indices on a single-big-table, but I'm not sure that is particularly helpful IRL.
A few other thoughts: BRIN is good for monotonic data (timestamps, incrementing IDs, etc); the more correlated the on-disk ordering is to the indexed value, the more effective a BRIN index can be at pruning blocks to be scanned. Things like customer IDs, however, are unlikely to work well with BRIN; any given block of rows is likely to have at least one relatively low and one relatively high ID. However, fields like that work quite well for partitioning: a partition-per-client, or partitioning on the modulus of a customer ID (which would more commonly be called sharding), is a good way of scaling horizontally, almost without bound.
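The correlation point is easy to see in a small experiment. The following Python sketch uses made-up data (1000 integer IDs, blocks of 50 rows) and a hypothetical helper, blocks_to_read, that counts how many block summaries overlap a searched range. The same values are stored once in order and once shuffled, as a stand-in for customer IDs arriving in arbitrary order.

```python
import random

def blocks_to_read(values, block_size, low, high):
    # Count blocks whose [min, max] summary overlaps the searched range [low, high].
    count = 0
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        if min(block) <= high and max(block) >= low:
            count += 1
    return count

ordered_ids = list(range(1000))          # monotonic, like timestamps or serial IDs
shuffled_ids = ordered_ids[:]
random.Random(0).shuffle(shuffled_ids)   # stand-in for customer IDs in arrival order

ordered_blocks = blocks_to_read(ordered_ids, 50, 400, 449)
shuffled_blocks = blocks_to_read(shuffled_ids, 50, 400, 449)
```

On the ordered data exactly one block overlaps the range. On the shuffled data nearly every block contains both low and high IDs, so the summaries rule out almost nothing.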
Any update, even if it does not change the indexed column, will make a BRIN index pretty useless (unless it is a HOT update). Even without that, there are differences, for example:
partitioning allows you to get rid of lots of data efficiently, a BRIN index won't
a partitioned table allows one autovacuum worker per partition, which improves autovacuum performance
But if your only concern is to efficiently select all rows for a certain value of the index or partitioning key, both may offer about the same benefit.

Are these indexes doing the same thing in respect to customer_id?

I'm pretty new to PostgreSQL so apologies if I'm asking the obvious.
I've got a table called customer_products. It contains the following two indexes:
CREATE INDEX customer_products_customer_id
ON public.customer_products USING btree (customer_id);
CREATE UNIQUE INDEX customer_products_customer_id_product_id
ON public.customer_products USING btree (customer_id, product_id);
Are they both doing the same thing in respect to customer_id or do they function in a different way? I'm not sure if I should leave them or remove customer_products_customer_id.
There is nothing that the first index can do that the second cannot, so you should drop the first index.
The only advantage of the first index over the second when it comes to queries whose WHERE (or ORDER BY) clause involves customer_id only is that the index is smaller. That makes a range scan over many index entries somewhat faster.
The price for an extra index in terms of size and data modification speed usually outweighs that advantage. In a read-only data warehouse where I have a query that profits significantly I may be tempted to keep both indexes, otherwise I wouldn't.
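To see why the two-column index can also serve lookups on customer_id alone, here is a small Python sketch. It models the composite index as a sorted list of (customer_id, product_id) tuples, which is only an approximation of a btree, and it assumes integer customer IDs; the function names and sample data are made up.

```python
import bisect

# Hypothetical composite index on (customer_id, product_id): a sorted list of tuples.
composite_index = sorted([(2, 10), (1, 7), (3, 1), (2, 3), (1, 9), (2, 8)])

def lookup_by_customer(index, customer_id):
    # All entries for one customer are adjacent because customer_id is the
    # leading column, so a prefix search finds them in one range scan.
    # (customer_id,) sorts before every (customer_id, x) tuple.
    start = bisect.bisect_left(index, (customer_id,))
    end = bisect.bisect_left(index, (customer_id + 1,))  # assumes integer IDs
    return index[start:end]

def lookup_exact(index, customer_id, product_id):
    # A full-key lookup, which is also what a uniqueness check needs.
    pos = bisect.bisect_left(index, (customer_id, product_id))
    return pos < len(index) and index[pos] == (customer_id, product_id)
```

For example, lookup_by_customer(composite_index, 2) returns the three entries for customer 2 without touching the others. The same trick would not work for a lookup on product_id alone, since product_id is not the leading column.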
You should definitely not drop the UNIQUE index, because it has a valuable use that has nothing to do with performance: it prevents the table from containing two rows that have the same values for the indexed columns. If that is what you want to guarantee, a UNIQUE index will make sure that your data stays in good shape.
Side remark: even though the effect is the same, it is better if the table has a unique constraint (which is backed by a unique index) than just having the index. If nothing else, it documents the purpose better.

Which index is used to answer aggregates when we have several indexes?

I have a table which is partitioned on a daily basis; each partition has a primary key and several other indexes on columns which are not null. If I get the query plan for the following:
SELECT COUNT(*) FROM parent_table;
I can see that different indexes are used: sometimes the primary key index, and sometimes others. How is Postgres able to decide which index to use? Note that my table is not clustered and has never been clustered before. Also, the primary key is serial.
What are the catalog / statistics tables which are used to make this decision?

Postgres - unique index on primary key

On Postgres, a unique index is automatically created for primary key columns. From the docs,
When an index is declared unique, multiple table rows with equal
indexed values are not allowed. Null values are not considered equal.
A multicolumn unique index will only reject cases where all indexed
columns are equal in multiple rows.
From my understanding, it seems like this index only checks uniqueness and isn't actually present for faster access when querying by primary key IDs. Does this mean that this index structure doesn't consist of a sorted table (or a tree) for the primary key column? Is this correct?
In theory a unique or primary key constraint could be enforced without the presence of an index, but it would be a painful process. The index is mainly there for performance purposes.
However, some databases (e.g. Oracle) allow a unique or primary key constraint to be supported by a non-unique index. Primarily this allows the enforcement of the constraint to be deferred until the end of a transaction, so lack of uniqueness can be permitted temporarily during a transaction, but it also allows indexes to be built in parallel, with the constraint then defined as a secondary step.
Also, I'm not sure how the internals work on a PostgreSQL btree index, but all Oracle btrees are internally declared to be unique either:
on the key column(s), for an index that is intended to be UNIQUE, or
on the key column(s) plus the indexed row's ROWID, for a non-unique index.
Quite the contrary, the index is created in order to allow faster access: mainly to check for duplicates when a new record is inserted, but it can also be used by other queries against the PK columns. The best structure for unique key indexes is a btree because the index is maintained during the insert: if the RDBMS detects a collision in the leaf, it will raise a unique constraint violation.
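A minimal Python sketch of that last point, with a sorted list standing in for the btree leaf entries. UniqueIndex and UniqueViolation are hypothetical names for the example, not any database's API. The search that locates the insert position is the same search that reveals a duplicate, and the same structure answers lookups by key.

```python
import bisect

class UniqueViolation(Exception):
    """Raised when a key already exists in the unique index."""

class UniqueIndex:
    def __init__(self):
        self.keys = []  # sorted keys, standing in for B-tree leaf entries

    def insert(self, key):
        pos = bisect.bisect_left(self.keys, key)
        # The search that finds the insert position also reveals a duplicate.
        if pos < len(self.keys) and self.keys[pos] == key:
            raise UniqueViolation(f"duplicate key {key!r}")
        self.keys.insert(pos, key)

    def contains(self, key):
        # The same sorted structure serves ordinary lookups by key.
        pos = bisect.bisect_left(self.keys, key)
        return pos < len(self.keys) and self.keys[pos] == key

pk_index = UniqueIndex()
for key in (42, 7, 19):
    pk_index.insert(key)
```

Calling pk_index.insert(19) again raises UniqueViolation and leaves the index unchanged.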

Confused between clustered and nonclustered index. Contains 5 doubts

Do clustered and non-clustered indexes both work on a B-Tree? I read that clustered indexes affect how the data is physically stored in the table, whereas with non-clustered indexes a separate copy of the column is created and stored in sorted order. Also, SQL Server creates a clustered index on the primary key by default.
Does that mean:
1) Non-clustered indexes occupy more space than clustered indexes, since a separate copy of the column is stored in a non-clustered index?
2) How do clustered and non-clustered indexes work when we have a primary key based on two columns, say (StudentName, Marks)?
3) Are there only 2 types of indexes? If so, then what are bitmap indexes? I can't seem to find any such index type in SQL Server Management Studio, but in my data warehousing book all these types are mentioned.
4) Is creating a clustered or non-clustered index on the primary key efficient?
5) Suppose we create a clustered index on name, i.e. data is physically stored in sorted order by name, and then a new record is created. How will the new record find its place in the table?
Thanks in advance :)
Indexes are structures stored separately from the actual datapages and simply contain pointers to the datapages. In SQL Server indexes are B-Trees.
Clustered indexes sort and store the data pages in the table according to the columns defined for the index. In SQL Server 2005 you can add additional columns to an index, so it should not be a problem when you have composite primary keys. You can think of a clustered index like a set of filing cabinets with folders. In the first drawer you have documents starting with A, and in the first folder of that drawer you may have documents starting from AA to AC, and so on. To search for "Spider", then, you can jump straight to the S drawer, look for the folder containing "SP", and quickly find what you are looking for. But it is obvious that if you sort all documents physically by one index, then you cannot physically sort the same set of documents by another index. Hence, only one clustered index per table.
A non-clustered index is a separate structure, much like the table of contents or the index at the back of a book. I think I have only answered some of your questions so far, so more specifically:
Yes the index does occupy space but not as much as the original table. That is why you must choose your indexes carefully. There is also a small performance hit for update operations since the index has to be maintained.
Your book will mention all the theoretical types of indexes. Bitmap indexes are useful in data warehousing applications or for data that has a few distinct values like days of the week etc. So they are not generally used in your basic RDBMS. I know that Oracle has some implementations but I don't know much about that.
I think that the efficiency of an index is determined by how the field is used. If it is expected that the majority of the data scanning in your table will be done on the primary key, then an index on the primary key makes sense. You usually add indexes to columns that appear in the where clause or the join condition of your queries.
On insert the index has to be maintained, so there is a little extra work that has to be done by the system to rearrange things a bit.
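Since bitmap indexes came up above, here is a rough Python sketch of the idea, under the assumption of two made-up low-cardinality columns (day_of_week and is_paid). Each distinct value gets one bitmap, stored here as a Python integer, where bit i is set when row i holds that value. Combining conditions then becomes a bitwise operation.

```python
# Hypothetical low-cardinality columns: one value per row position.
day_of_week = ["mon", "tue", "mon", "wed", "tue", "mon"]
is_paid = [True, False, True, True, True, False]

def build_bitmaps(column):
    # One integer per distinct value; bit i is set when row i holds that value.
    bitmaps = {}
    for row_no, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row_no)
    return bitmaps

def rows_from_bitmap(bitmap):
    # Decode the set bits back into row numbers.
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

day_bitmaps = build_bitmaps(day_of_week)
paid_bitmaps = build_bitmaps(is_paid)

# WHERE day_of_week = 'mon' AND is_paid: a single bitwise AND of two bitmaps.
monday_and_paid = day_bitmaps["mon"] & paid_bitmaps[True]
```

This is why the technique suits columns with few distinct values: the number of bitmaps stays small, and each one is cheap to combine with others.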