T-SQL: How to retrieve half of the total row count (based on some criteria) or the first 50 rows?

I'm wondering how in T-SQL I can retrieve, let's say, half of the total row count from a table (based on some criteria), or even just the first 50 rows (also based on some criteria)?

To select the top 50 rows:
SELECT TOP 50 *
FROM table1
WHERE ...
ORDER BY ...
To select the first half of the result set use PERCENT:
SELECT TOP 50 PERCENT *
FROM table1
WHERE ...
ORDER BY ...
Remember to add an ORDER BY if you want the results to be consistent.
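TOP and TOP ... PERCENT are SQL Server syntax. As a rough, runnable sketch of the same idea, here is the equivalent using SQLite's LIMIT from Python (the table name, column names, and data are invented for illustration; SQLite has neither TOP nor PERCENT, so the half-count is computed explicitly):

```python
import sqlite3

# In-memory toy table standing in for "table1".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO table1 (val) VALUES (?)",
                 [(f"row{i}",) for i in range(1, 101)])  # 100 rows

# LIMIT plays the role of SELECT TOP 50 ... (note the ORDER BY).
top_50 = conn.execute(
    "SELECT * FROM table1 ORDER BY id LIMIT 50").fetchall()

# No PERCENT in SQLite, so compute half of the row count first
# (COUNT(*) / 2 is integer division here).
(half,) = conn.execute("SELECT COUNT(*) / 2 FROM table1").fetchone()
top_half = conn.execute(
    "SELECT * FROM table1 ORDER BY id LIMIT ?", (half,)).fetchall()

print(len(top_50), len(top_half))  # 50 50
```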

First:
SELECT TOP 50 PERCENT a, b, c FROM table
Second:
SELECT TOP 50 a, b, c FROM table
As a rule, it's not advisable to do this unless you also provide an ORDER BY. Even in cases where it works perfectly well without one, your code will be more understandable and more robust to changes in the underlying tables if you include it.
Paging (e.g. returning the xth block of y rows) is more cumbersome in SQL Server than in many other relational databases (more cumbersome than just about all of them, to be honest), but it can be done with ROW_NUMBER:
WITH OrderedTable AS
(
SELECT a, b, c, ROW_NUMBER() OVER (ORDER BY d) as rowNumber
FROM table
)
SELECT a, b, c FROM OrderedTable
WHERE rowNumber between 31 and 40
This selects the fourth set of ten rows, ordered by column d.
This latter method is also useful when the limit comes from a variable: TOP originally did not accept anything like TOP @number, although SQL Server 2005 and later do allow a variable in parentheses, as in TOP (@number).
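The ROW_NUMBER paging pattern above can be sketched in runnable form with SQLite (version 3.25+ supports window functions), using an invented table and columns; here we pull the fourth page of ten rows ordered by column d:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, d INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, 100 - i) for i in range(1, 51)])  # 50 rows

# Rows 31..40 of the window ordered by d: the fourth page of ten.
page = conn.execute("""
    WITH OrderedTable AS (
        SELECT a, d, ROW_NUMBER() OVER (ORDER BY d) AS rowNumber
        FROM t
    )
    SELECT a, d FROM OrderedTable
    WHERE rowNumber BETWEEN 31 AND 40
""").fetchall()

print(len(page))  # 10
```

Because the page bounds are plain values in the WHERE clause, they can come from ordinary query parameters, which is exactly why this pattern works where a bare TOP with a variable historically did not.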

Using the SQL TOP clause.
SQL Server syntax:
SELECT TOP number|percent column_name(s)
FROM table_name
Example:
SELECT TOP 50 PERCENT * FROM tablename
MySQL Syntax
SELECT column_name(s)
FROM table_name
LIMIT number
Oracle Syntax
SELECT column_name(s)
FROM table_name
WHERE ROWNUM <= number

The following works in MS SQL 2005 and 2008:
select top 50 PERCENT * from x

Related

PG SQL UNION With ORDER BY, LIMIT Performance Optimization

I am trying to execute a query with an ORDER BY clause and a LIMIT clause for performance. Consider the following schema.
ONE
(id, name)
(1 , a)
(2 , b)
(5 , c)
TWO
(id, name)
(3 , d)
(4 , e)
(5 , f)
I want to be able to get a list of people from tables one and two ordered by ID.
The current query I have is as follows.
WITH combined AS (
(SELECT * FROM one ORDER BY id DESC)
UNION ALL
(SELECT * FROM two ORDER BY id DESC)
)
SELECT * FROM combined ORDER BY id LIMIT 5
The output will be:
(id, name)
(1 , a)
(2 , b)
(3 , d)
(4 , e)
(5 , c)
You'll notice that last row "c" or "f" will change based on the order of the UNION (one UNION two versus two UNION one). That's not important as I only care about the order for ID.
Unfortunately, this query does a full scan of both tables as per the ORDER BY on "combined". My table one and two are both billions of rows.
I am looking for a query that can search both tables simultaneously, if possible. Rather than scanning all of "one" for the entries I need, it would first sort both tables by ID and then repeatedly take the minimum from the two: if the ID in one table is lower than the ID in the other, the query reads from that table until its ID catches up to or passes the other table's, then switches back to the first table.
The correct order of reading the table, given one UNION two would be a, b, d, e, c/f.
Do you just mean this?
WITH combined AS (
(SELECT * FROM one ORDER BY id LIMIT 5)
UNION ALL
(SELECT * FROM two ORDER BY id LIMIT 5)
)
SELECT * FROM combined ORDER BY id LIMIT 5
That will select the 5 "lowest id" rows from each table (which is the minimum you need to guarantee 5 output rows) and then find the lowest of those.
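As a runnable sketch of that answer with the sample data from the question (using SQLite from Python; unlike PostgreSQL, SQLite does not allow a parenthesized ORDER BY/LIMIT directly on each UNION branch, so each branch is wrapped in a subselect):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE one (id INTEGER, name TEXT);
    CREATE TABLE two (id INTEGER, name TEXT);
    INSERT INTO one VALUES (1,'a'), (2,'b'), (5,'c');
    INSERT INTO two VALUES (3,'d'), (4,'e'), (5,'f');
""")

rows = conn.execute("""
    WITH combined AS (
        SELECT * FROM (SELECT * FROM one ORDER BY id LIMIT 5)
        UNION ALL
        SELECT * FROM (SELECT * FROM two ORDER BY id LIMIT 5)
    )
    SELECT * FROM combined ORDER BY id LIMIT 5
""").fetchall()

print([r[0] for r in rows])  # [1, 2, 3, 4, 5]
```

The id-5 row may be 'c' or 'f' depending on which branch's row wins the tie, just as the question describes.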
Thanks to a_horse_with_no_name's comment on Richard Huxton's answer regarding adding an index, the query runs considerably faster, from indeterminate to under one minute.
In my case, the query was still too slow, and I came across the following solution.
Consider using results from one table to limit results from another table. The following solution, in combination with indexing by id, worked for my tables with billions of rows, but operates on the assumption that table "one" is faster than table "two" to finish the query.
WITH first as (SELECT * FROM one ORDER BY id LIMIT 5),
filter as (SELECT min(id) AS id FROM first),
second as (SELECT * FROM two
WHERE id < (SELECT filter.id FROM filter)
ORDER BY id LIMIT 5),
combined AS (
(SELECT * FROM first ORDER BY id LIMIT 5)
UNION ALL
(SELECT * FROM second ORDER BY id LIMIT 5)
)
SELECT * FROM combined ORDER BY id LIMIT 5
By using the minimum ID from the first complete query, I can limit the scope that the database scans for completion of the second query.

Selecting all rows that belong to a group with some properties

I use PostgreSQL for a web application, and I've run into a type of query I can't think of a way to write efficiently.
What I'm trying to do is select all rows from a table which, when grouped a certain way, the group meets some criteria. For example, the naive way to structure this query might be something like this:
SELECT *
FROM table T
JOIN (
SELECT iT.a, iT.b, SUM(iT.c) AS sum
FROM table iT
GROUP BY iT.a, iT.b
) TG ON (TG.a = T.a AND TG.b = T.b)
WHERE TG.sum > 100;
The problem I'm having is that this effectively doubles the time it takes the query to execute, since it's essentially selecting the rows from that table twice.
How can I structure queries of this type efficiently?
You can try a window function, although I don't know if it is more efficient. I would guess it is, as it avoids the join. Test both this and your original query with EXPLAIN:
select *
from (
select
a, b,
sum(c) over(partition by a, b) as sum
from t
) s
where "sum" > 100
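That window-function rewrite can be demonstrated end to end with SQLite from Python (the table and the numbers are invented; groups whose total c exceeds 100 keep all their rows, the rest are filtered out):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, 1, 60), (1, 1, 70),   # group (1,1): sum 130 -> kept
    (1, 2, 30), (1, 2, 40),   # group (1,2): sum 70  -> dropped
    (2, 1, 200),              # group (2,1): sum 200 -> kept
])

rows = conn.execute("""
    SELECT a, b, c FROM (
        SELECT a, b, c,
               SUM(c) OVER (PARTITION BY a, b) AS group_sum
        FROM t
    )
    WHERE group_sum > 100
    ORDER BY a, b, c
""").fetchall()

print(rows)  # [(1, 1, 60), (1, 1, 70), (2, 1, 200)]
```

Note the filter on the window result has to live in an outer query: window functions are evaluated after WHERE, so they cannot be referenced in the same level's WHERE clause.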

T-SQL random record extraction returns duplicates

I have to extract two random records from a table.
I have implemented something like this inside a stored procedure:
with tmpTable as(
SELECT top 500 [columns I need]
, row_number() over(order by [myColumn]) as rown
FROM SourceTable
JOIN [myJoin]
WHERE [myCondition]
)
-- here I extract with an interval of 10 records: 10, 20, 30, ..., 400, 410, ...
select * from tmpTable where rown = (1 + FLOOR(50*RAND()))*10
It works great: it extracts a random record from among the first 500 records of my source.
But when the SP is called from the presentation layer (ASP.NET 4.0, SqlClient ADO.NET), occasionally the same record is returned twice, even though the two calls are independent of each other.
I guess this happens because the SP is called twice within a few milliseconds and the random generator produces the same number. No duplication occurs while debugging, presumably because manually stepping with F10 takes more than a few milliseconds.
How may I obtain two different records?
EDIT
Lamak's answer calls for some more detail. The source table is made up of product records. Groups of about 10 records differ from each other only in certain characteristics (e.g. color). The records are distributed like this:
1 to 10: product 1
11 to 20: product 2
... and so on
So if I simply took the first two random records, it is highly likely that both would concern the same product. This is why I'm using an interval of 10 records in the random extraction.
If you are using SQL Server you can just do the following:
SELECT TOP 1 [columns I need]
FROM SourceTable
JOIN [myJoin]
ON [Something]
WHERE [MyCondition]
ORDER BY NEWID()
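NEWID() is SQL Server specific; a rough, runnable sketch of the same idea uses SQLite's ORDER BY RANDOM() (table and data invented for illustration). It also shows one simple way to guarantee the two picks differ, which addresses the duplicate problem in the question: exclude the first row's id from the second draw.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SourceTable (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO SourceTable VALUES (?, ?)",
                 [(i, f"product{i}") for i in range(1, 21)])

# Each pick is uniform over the table, but two independent picks can
# collide; excluding the first id guarantees two distinct rows.
first = conn.execute(
    "SELECT * FROM SourceTable ORDER BY RANDOM() LIMIT 1").fetchone()
second = conn.execute(
    "SELECT * FROM SourceTable WHERE id <> ? ORDER BY RANDOM() LIMIT 1",
    (first[0],)).fetchone()

print(first[0] != second[0])  # True
```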
If you still want to first isolate the records to each product, you can try this:
SELECT TOP 1 *
FROM ( SELECT [columns I need], ROW_NUMBER() OVER(PARTITION BY Product ORDER BY NEWID()) Corr
FROM SourceTable
JOIN [myJoin]
ON [Something]
WHERE [MyCondition]) A
WHERE Corr = 1
ORDER BY NEWID()
Though you could still get a record with the same product as the first one.

Equivalent of LIMIT for DB2

How do you do LIMIT in DB2 for iSeries?
I have a table with more than 50,000 records and I want to return records 0 to 10,000, and records 10,000 to 20,000.
I know that in MySQL you write LIMIT 0,10000 at the end of the query for rows 0 to 10,000, and LIMIT 10000,10000 for rows 10,000 to 20,000.
So, how is this done in DB2? What's the code and syntax?
(full query example is appreciated)
Using FETCH FIRST [n] ROWS ONLY:
http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=/com.ibm.db29.doc.perf/db2z_fetchfirstnrows.htm
SELECT LASTNAME, FIRSTNAME, EMPNO, SALARY
FROM EMP
ORDER BY SALARY DESC
FETCH FIRST 20 ROWS ONLY;
To get ranges, you'd have to use ROW_NUMBER() (available since v5r4) and use it within the WHERE clause (taken from here: http://www.justskins.com/forums/db2-select-how-to-123209.html):
SELECT code, name, address
FROM (
SELECT row_number() OVER ( ORDER BY code ) AS rid, code, name, address
FROM contacts
WHERE name LIKE '%Bob%'
) AS t
WHERE t.rid BETWEEN 20 AND 25;
I developed this method:
You need a table with a unique value that can be ordered.
If you want rows 10,000 to 25,000 and your table has 40,000 rows, first you need to get the starting point and the total rows:
int start = 40000 - 10000;
int total = 25000 - 10000;
And then pass these by code to the query:
SELECT * FROM
(SELECT * FROM schema.mytable
ORDER BY userId DESC fetch first {start} rows only ) AS mini
ORDER BY mini.userId ASC fetch first {total} rows only
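That reverse-order trick can be checked with small numbers in SQLite from Python (LIMIT standing in for FETCH FIRST n ROWS ONLY; the table is invented): with 40 rows and the range after row 10 up to row 25, start = 40 - 10 = 30 and total = 25 - 10 = 15.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (userId INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO mytable VALUES (?)",
                 [(i,) for i in range(1, 41)])  # 40 rows

total_rows, first_row, last_row = 40, 10, 25
start = total_rows - first_row   # 30: keep the last 30 rows, reversed
total = last_row - first_row     # 15: then the first 15 of those

rows = conn.execute(f"""
    SELECT * FROM
        (SELECT * FROM mytable
         ORDER BY userId DESC LIMIT {start}) AS mini
    ORDER BY mini.userId ASC LIMIT {total}
""").fetchall()

ids = [r[0] for r in rows]
print(ids[0], ids[-1])  # 11 25
```

The inner query keeps ids 40 down to 11; re-sorting ascending and taking 15 yields ids 11 through 25, i.e. the rows just after row 10.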
Support for OFFSET and LIMIT was recently added to DB2 for i 7.1 and 7.2. You need the following DB PTF group levels to get this support:
SF99702 level 9 for IBM i 7.2
SF99701 level 38 for IBM i 7.1
See here for more information: OFFSET and LIMIT documentation, DB2 for i Enhancement Wiki
Here's the solution I came up with:
select FIELD from TABLE where FIELD > LASTVAL order by FIELD fetch first N rows only;
By initializing LASTVAL to 0 (or '' for a text field), then setting it to the last value in the most recent set of records, this will step through the table in chunks of N records.
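This is keyset pagination, and the stepping loop is easy to demonstrate with SQLite from Python (table and values invented; 25 rows in chunks of 10 yield chunk sizes 10, 10, 5):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (field INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1, 26)])

N = 10
lastval = 0          # initialise below the smallest value
chunks = []
while True:
    chunk = conn.execute(
        "SELECT field FROM t WHERE field > ? ORDER BY field LIMIT ?",
        (lastval, N)).fetchall()
    if not chunk:
        break
    chunks.append([r[0] for r in chunk])
    lastval = chunk[-1][0]   # last value of this chunk seeds the next

print([len(c) for c in chunks])  # [10, 10, 5]
```

Each query only needs an index seek past LASTVAL, which is why this approach scales better than numbering every row.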
#elcool's solution is a smart idea, but you need to know the total number of rows (which can even change while you are executing the query!). So I propose a modified version, which unfortunately needs 3 subqueries instead of 2:
select * from (
select * from (
select * from MYLIB.MYTABLE
order by MYID asc
fetch first {last} rows only
) I
order by MYID desc
fetch first {length} rows only
) II
order by MYID asc
where {last} should be replaced with row number of the last record I need and {length} should be replaced with the number of rows I need, calculated as last row - first row + 1.
E.g. if I want rows from 10 to 25 (totally 16 rows), {last} will be 25 and {length} will be 25-10+1=16.
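That worked example can be verified with SQLite from Python (LIMIT standing in for FETCH FIRST n ROWS ONLY; a 40-row invented table, {last} = 25 and {length} = 16):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MYTABLE (MYID INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO MYTABLE VALUES (?)",
                 [(i,) for i in range(1, 41)])

last, length = 25, 16        # rows 10..25 inclusive: 25 - 10 + 1 = 16

rows = conn.execute(f"""
    SELECT * FROM (
        SELECT * FROM (
            SELECT * FROM MYTABLE
            ORDER BY MYID ASC LIMIT {last}
        ) AS I
        ORDER BY MYID DESC LIMIT {length}
    ) AS II
    ORDER BY MYID ASC
""").fetchall()

ids = [r[0] for r in rows]
print(ids[0], ids[-1])  # 10 25
```

The innermost query keeps ids 1..25, the middle one reverses and keeps 25..10, and the outer sort restores ascending order, yielding exactly rows 10 through 25.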
Try this
SELECT * FROM
(
SELECT T.*, ROW_NUMBER() OVER() R FROM TABLE T
) AS X
WHERE R BETWEEN 10000 AND 20000
The LIMIT clause allows you to limit the number of rows returned by the query. The LIMIT clause is an extension of the SELECT statement that has the following syntax:
SELECT select_list
FROM table_name
ORDER BY sort_expression
LIMIT n [OFFSET m];
In this syntax:
n is the number of rows to be returned.
m is the number of rows to skip before returning the n rows.
Another shorter version of LIMIT clause is as follows:
LIMIT m, n;
This syntax means skipping m rows and returning the next n rows from the result set.
A table may store rows in an unspecified order. If you don’t use the ORDER BY clause with the LIMIT clause, the returned rows are also unspecified. Therefore, it is a good practice to always use the ORDER BY clause with the LIMIT clause.
See Db2 LIMIT for more details.
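The LIMIT/OFFSET form described above is directly runnable in SQLite from Python as well (invented 50-row table; skip m = 20 rows, return the next n = 10):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1, 51)])

# LIMIT n OFFSET m: skip 20 rows, return the next 10.
rows = conn.execute(
    "SELECT id FROM t ORDER BY id LIMIT 10 OFFSET 20").fetchall()

ids = [r[0] for r in rows]
print(ids[0], ids[-1])  # 21 30
```

As the answer notes, the ORDER BY is essential: without it the skipped and returned rows are unspecified.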
You should also consider the OPTIMIZE FOR n ROWS clause. More details on all of this in the DB2 LUW documentation in the Guidelines for restricting SELECT statements topic:
The OPTIMIZE FOR clause declares the intent to retrieve only a subset of the result or to give priority to retrieving only the first few rows. The optimizer can then choose access plans that minimize the response time for retrieving the first few rows.
There are 2 solutions to paginate efficiently on a DB2 table:
1 - the technique using the ROW_NUMBER() function and the OVER clause, presented in another post ("SELECT row_number() OVER ( ORDER BY ... )"). On some big tables I have sometimes noticed a degradation of performance.
2 - the technique using a scrollable cursor. The implementation depends on the language used. This technique seems more robust on big tables.
I presented the two techniques, implemented in PHP, during a seminar last year. The slides are available at this link:
http://gregphplab.com/serendipity/uploads/slides/DB2_PHP_Best_practices.pdf
Sorry, but this document is only in French.
There are these options available; DB2 has several strategies to cope with this problem:
You can use the "scrollable cursor" feature. In this case you open a cursor and, instead of re-issuing the query, you FETCH forward and backward. This works great if your application can hold state, since it doesn't require DB2 to rerun the query every time.
You can use the ROW_NUMBER() OLAP function to number the rows and then return the subset you want. This is ANSI SQL.
You can use the ROWNUM pseudo-column, which does the same as ROW_NUMBER() but is convenient if you have Oracle skills.
You can use LIMIT and OFFSET if you lean more toward a MySQL or PostgreSQL dialect.

Select only half the records

I am trying to figure out how to select half the records where an ID is null. I want half because I am going to use that result set to update another ID field, and then I am going to update the rest with a different value for that field.
So essentially I want to update half the records' someFieldID with one number and the rest with another number, splitting the update between two values for someFieldID, the field I want to update.
In Oracle you can use the ROWNUM pseudocolumn. I believe in SQL Server you can use TOP.
Example:
select TOP 50 PERCENT * from table
You can select by percent:
SELECT TOP 50 PERCENT *fields* FROM YourTable WHERE ...
update x set id=#value from (select top 50 percent * from table where id is null) x
The following SQL will return the col_ids of the first half of the table.
SELECT col_id FROM table
WHERE rownum <= (SELECT count(col_id)/2 FROM table);
If the total number of col_ids is odd, you will get one row less than half. For instance, with 51 total records, count(col_id)/2 returns 25.5; since no rownum equals this value, we get everything at 25 and below, and the remaining 26 rows are not returned.
However, the reverse statement does not work (ROWNUM is assigned to a row only as it passes the predicate, so a condition like ROWNUM > n for positive n can never be satisfied):
SELECT col_id FROM table
WHERE rownum > (SELECT count(col_id)/2 FROM table);
So if you want the other half of the table, you can store the first result set in a temp table, let's call it TABLE_A, and then subtract it from the original table with MINUS:
SELECT col_id FROM table
MINUS
SELECT col_id FROM table_a
Hopefully this helps someone.
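The split-into-halves idea can be sketched in runnable form with SQLite from Python (an invented 51-row table; SQLite has no ROWNUM, so ROW_NUMBER() plays that role, and EXCEPT is the standard-SQL spelling of Oracle's MINUS):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col_id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t VALUES (?)",
                 [(i,) for i in range(1, 52)])  # 51 rows

# First half: row numbers up to count/2 (integer division here).
first_half = conn.execute("""
    SELECT col_id FROM (
        SELECT col_id, ROW_NUMBER() OVER (ORDER BY col_id) AS rn FROM t
    )
    WHERE rn <= (SELECT COUNT(col_id) / 2 FROM t)
""").fetchall()

# Second half: everything EXCEPT the first half (Oracle: MINUS).
second_half = conn.execute("""
    SELECT col_id FROM t
    EXCEPT
    SELECT col_id FROM (
        SELECT col_id, ROW_NUMBER() OVER (ORDER BY col_id) AS rn FROM t
    )
    WHERE rn <= (SELECT COUNT(col_id) / 2 FROM t)
""").fetchall()

print(len(first_half), len(second_half))  # 25 26
```

The set difference neatly avoids the ROWNUM > n pitfall, since the second half is defined by exclusion rather than by a row-number predicate.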