Order of output values when selecting one field from a table (SQL Server)

In what order will the field values be returned for "select field from table"?
The table has a clustered index.
Will the order be arbitrary, or the same as in the table?
It looks like it is the table order, but I would like to know for sure.
Does it matter in this case whether the table has a clustered index or not?

Without an explicit ORDER BY there is no defined or reliable ordering.
Relational databases are inherently unordered, so you cannot rely on any ordering - regardless of whether the table has a clustered index or not - unless you explicitly specify the ordering you need with ORDER BY.
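As a minimal sketch (the table and column names are hypothetical), only the second query has a guaranteed output order:
-- the engine is free to return rows in any order it finds convenient:
SELECT field FROM dbo.MyTable;
-- only an explicit ORDER BY guarantees the order:
SELECT field FROM dbo.MyTable ORDER BY field;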

Related

SQL Top Function with a clustered index

The order that the records come in is not always guaranteed unless I use an order by clause.
If I throw a clustered index on a table and then do a select top 100, for example, would the 100 rows returned always be the same?
I am asking this because a clustered index sorts the data physically on the key value.
I am led to believe so from my observations, but wanted to see what others thought.
No. The rule is simple: SQL tables and result sets represent unordered sets. The only exception is a result set associated with a query that has an ORDER BY in the outermost SELECT.
A clustered index affects how data is stored on each page. However, it does not guarantee that a result set built on that table will even use the clustered index.
Consider a table that has a primary, clustered key on id and a query that returns:
select top (100) othercol
from t;
This query could use an index on othercol -- avoiding the clustered index altogether.
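If you need the same 100 rows back every time, say so explicitly. A minimal sketch, assuming id is the unique clustered key from the example:
select top (100) othercol
from t
order by id; -- deterministic, because id is unique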

Sort & dist key selection for a dimension table in a Redshift database

In Redshift database, I want to decide a sort key for a dimension table between surrogatekey and natural primary key. The definition says "Sort keys should be selected based on the most commonly used columns when filtering, ordering or grouping the data".
My question is:
I have an Employee table with (Emp_key, Emp_id, Emp_name), and this table is joined to the fact table on Emp_key. Here Emp_key is the surrogate key and Emp_id is the natural primary key. I filter queries on Emp_id, but Emp_key in the fact table is defined as the dist key, and I have read that for a large dimension, defining the sort and dist keys on the join keys gives better performance. So which one should I choose between Emp_key and Emp_id for the sort key on the dimension table?
Also, another point of confusion: for the date dimension table, should I choose date_key as the sort key or not define a sort key at all?
I would appreciate your suggestions in this regard.
Thank you!
Your employee table likely doesn't contain many rows, so you can choose the ALL distribution style, which puts a copy of the table on every node of your cluster. This way you avoid the dilemma at a very low cost.
UPD: with this design, I would have emp_key as dist key (so that data that is joined sits on the same nodes) and emp_id as sort key (to filter efficiently). I'm pretty sure the query planner would prioritize filtering over joining, so first it will filter the rows from the dimension table and only then it will join the corresponding rows from the fact table. But it's better to try all options and benchmark a few queries to see what works best.
If you can change the design I would just add emp_id to the fact table (because it seems like the keys map 1 to 1) as a part of ELT and avoid the dilemma again.
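A minimal DDL sketch of the two options discussed above (the column types are assumptions based on the question; the options are mutually exclusive, so pick one):
-- Option 1: small dimension, replicate a copy to every node
CREATE TABLE employee (
    emp_key  INTEGER NOT NULL,
    emp_id   INTEGER NOT NULL,
    emp_name VARCHAR(100)
)
DISTSTYLE ALL
SORTKEY (emp_id);
-- Option 2: co-locate joins on emp_key, keep fast filters on emp_id
CREATE TABLE employee (
    emp_key  INTEGER NOT NULL,
    emp_id   INTEGER NOT NULL,
    emp_name VARCHAR(100)
)
DISTKEY (emp_key)
SORTKEY (emp_id);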

Understanding indexes and performance as they relate to indexed column and non-indexed column data in the same row

I have some tables that are around 100 columns wide. I haven't normalized them because putting them back together would require almost three dozen joins, and I am not sure it would perform any better... I haven't tested it yet (I will), so I can't say for sure.
Anyway, that really isn't the question. I have been indexing columns in these tables that I know will be pulled frequently, so something like 50 indexes per table.
I got to thinking, though. These columns will never be pulled by themselves and are meaningless without the primary key (basically an item number). The PK will always be used for the join, and even in simple SELECT queries it will have to be a selected column so the data makes sense.
That got me thinking further about indexes and how they work. As I understand them, the locations of the values in that column are recorded so that they can be found quickly in a query.
For example, if you have:
SELECT itemnumber, expdate
FROM items;
And both itemnumber and expdate are indexed, is that excessive, or is it really adding any benefit? Is it sufficient to just index itemnumber, and will the index know that expdate, or anything else that is queried for that item, is on the same row?
Secondly, if multiple columns constitute a primary key, should the index include them grouped together, or are individual indexes sufficient?
For example,
CREATE INDEX test_index ON table (pk_col1, pk_col2, pk_col3);
vs.
CREATE INDEX test_index1 ON table (pk_col1);
CREATE INDEX test_index2 ON table (pk_col2);
CREATE INDEX test_index3 ON table (pk_col3);
Thanks for clearing that up in advance!
Uh oh, there is a mountain of basics that you still have to learn.
I'd recommend that you read the PostgreSQL documentation and the excellent book “SQL Performance Explained”.
I'll give you a few pointers to get you started:
Whenever you create a PRIMARY KEY or UNIQUE constraint, PostgreSQL automatically creates a unique index over all the columns of that constraint. So you don't have to create that index explicitly (but if it is a multicolumn index, it sometimes is useful to create another index on any but the first column).
Indexes are relevant to conditions in the WHERE clause and the GROUP BY clause and to some extent for table joins. They are irrelevant for entries in the SELECT list. An index provides an efficient way to get the part of a table that satisfies a certain condition; an (unsorted) access to all rows of a table will never benefit from an index.
Don't sprinkle your schema with indexes randomly, since indexes use space and make all data modification slow.
Use them where you know that they will do good: on columns on which a foreign key is defined, on columns that appear in WHERE clauses and contain many different values, on columns where your examination of the execution plan (with EXPLAIN) suggests that you can expect a performance benefit.
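A minimal sketch tying these pointers together (names taken from the question; actual plans depend on your data and statistics):
CREATE TABLE items (
    itemnumber integer PRIMARY KEY, -- PostgreSQL creates a unique index on this automatically
    expdate    date
);
-- No separate index on expdate is needed just to SELECT it: the index on
-- itemnumber locates the row, and expdate is read from the same row.
EXPLAIN SELECT itemnumber, expdate FROM items WHERE itemnumber = 42;
-- Reading all rows cannot benefit from an index at all:
EXPLAIN SELECT itemnumber, expdate FROM items;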

How multiple indexes in postgres work on the same column

I'm not really sure how multiple indexes on the same column would work.
So let's say I have an id column and a country column. On those I have an index on id, and another index on id and country. When I look at my query plan, it looks like it's using both of those indexes. I was wondering how that works? Can I force it to use just the id and country index?
Also is it bad practice to do that? When is it a good idea to index the same column multiple times?
It is common to have indexes on both (id) and (country, id), or alternatively (country) and (country, id), if you have queries that benefit from each of them. You might also have (id) and (id, country) if you want the "covering" index on (id, country) to support index-only scans but still need the standalone index on (id) to enforce a unique constraint.
In theory you could just have (id,country) and still use it to enforce uniqueness of id, but PostgreSQL does not support that at this time.
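Since PostgreSQL 11, the INCLUDE clause gets close to this. A minimal sketch, assuming a table t with columns id and country:
CREATE UNIQUE INDEX t_id_inc_idx ON t (id) INCLUDE (country);
-- uniqueness is enforced on id alone; country is carried in the
-- index leaf pages, which enables index-only scans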
You could also sensibly have different indexes on the same column if you need to support different collations or operator classes.
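For example, a sketch of two indexes on the same text column serving different purposes (table and column names are hypothetical):
-- supports equality and ordering under the default collation:
CREATE INDEX t_name_idx ON t (name);
-- supports left-anchored pattern matching such as name LIKE 'abc%':
CREATE INDEX t_name_pattern_idx ON t (name text_pattern_ops);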
If you want to force PostgreSQL not to use a particular index, to see what happens with it gone, you can drop it in a transaction and then roll it back when done:
BEGIN;
drop index table_id_country_idx;
explain analyze select * from ....;
ROLLBACK;

Setting constraint for two unique fields in PostgreSQL

I'm new to Postgres. I wonder, what is the PostgreSQL way to set a constraint over a pair of fields (so that each pair of values would be unique)? Should I create an INDEX on the bar and baz fields?
CREATE UNIQUE INDEX foo ON table_name(bar, baz);
If not, what is a right way to do that? Thanks in advance.
If each field needs to be unique unto itself, then create unique indexes on each field. If they need to be unique in combination only, then create a single unique index across both fields.
Don't forget to set each field NOT NULL if it should be. NULLs never compare equal, so something like this can happen:
create table test (a int, b int);
create unique index test_a_b_unq on test (a,b);
insert into test values (NULL,1);
insert into test values (NULL,1);
and get no error, because the two NULLs do not count as duplicates.
You can do what you are already thinking of: create a unique constraint on both fields. This way, a unique index is created behind the scenes, and you get the behavior you need. Plus, that information can be picked up from information_schema for metadata purposes, making the fact that the pair must be unique discoverable. I would recommend this option. You could also use triggers for this, but a unique constraint is far better for this specific requirement.
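A minimal sketch of the recommended approach, using the names from the question:
ALTER TABLE table_name
    ADD CONSTRAINT table_name_bar_baz_key UNIQUE (bar, baz);
-- behind the scenes this creates the same unique index as
-- CREATE UNIQUE INDEX foo ON table_name (bar, baz);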