T-SQL random records extraction returns duplicates

I have to extract two random records from a table.
I have implemented something like this inside a stored procedure:
with tmpTable as (
    SELECT top 500 [columns I need]
        , row_number() over (order by [myColumn]) as rown
    FROM SourceTable
    JOIN [myJoin]
    WHERE [myCondition]
)
-- here I extract with an interval of 10 records: 10, 20, 30, ..., 400, 410, ...
select * from tmpTable where rown = (1 + FLOOR(50*RAND()))*10
It works great: it extracts a random record from among the first 500 records of my source.
But when the sp is called from the presentation layer (ASP.NET 4.0, SqlClient ADO.NET), the same record is sometimes returned by both calls. Note that the two calls are independent of each other.
I suspect this is because the sp is called twice within a few milliseconds and the random generator produces the same number. No duplication occurs while debugging, presumably because stepping manually with F10 takes more than a few milliseconds.
How may I obtain two different records?
EDIT
Lamak's answer calls for some more detail. The source table consists of product records. Groups of about 10 records differ from each other only in some characteristics (e.g. color). The records are distributed like this:
1 to 10: product 1
11 to 20: product 2
... and so on
So if I simply take two random records, it is very likely that both will concern the same product. This is why I'm using an interval of 10 records in the random extraction.

If you are using SQL Server you can just do the following:
SELECT TOP 1 [columns I need]
FROM SourceTable
JOIN [myJoin]
ON [Something]
WHERE [MyCondition]
ORDER BY NEWID()
If you still want to first narrow the records down to one per product, you can try this:
SELECT TOP 1 *
FROM ( SELECT [columns I need],
              ROW_NUMBER() OVER(PARTITION BY Product ORDER BY NEWID()) Corr
       FROM SourceTable
       JOIN [myJoin]
           ON [Something]
       WHERE [MyCondition]) A
WHERE Corr = 1
ORDER BY NEWID()
Though you can still end up choosing a record with the same product as the first one.
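Building on the second query, one way to guarantee two different products is to fetch both rows in a single call: since Corr = 1 keeps one random row per product, a TOP 2 necessarily returns two distinct products. A sketch, reusing the placeholders above:
SELECT TOP 2 *
FROM ( SELECT [columns I need],
              ROW_NUMBER() OVER(PARTITION BY Product ORDER BY NEWID()) Corr
       FROM SourceTable
       JOIN [myJoin]
           ON [Something]
       WHERE [MyCondition]) A
WHERE Corr = 1
ORDER BY NEWID()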

Related

PG SQL UNION With ORDER BY, LIMIT Performance Optimization

I am trying to execute a query with an ORDER BY clause and a LIMIT clause for performance. Consider the following schema.
ONE
(id, name)
(1, a)
(2, b)
(5, c)
TWO
(id, name)
(3, d)
(4, e)
(5, f)
I want to be able to get a list of people from tables one and two ordered by ID.
The current query I have is as follows.
WITH combined AS (
(SELECT * FROM one ORDER BY id DESC)
UNION ALL
(SELECT * FROM two ORDER BY id DESC)
)
SELECT * FROM combined ORDER BY id LIMIT 5
The output will be:
(id, name)
(1, a)
(2, b)
(3, d)
(4, e)
(5, c)
You'll notice that last row "c" or "f" will change based on the order of the UNION (one UNION two versus two UNION one). That's not important as I only care about the order for ID.
Unfortunately, this query does a full scan of both tables because of the ORDER BY on "combined". Tables one and two each have billions of rows.
I am looking for a query that can search both tables simultaneously, if possible. That is, rather than scanning all of "one" for the entries I need, it would sort both tables by ID and repeatedly take the minimum of the two: whenever the current ID in one table is lower than the current ID in the other, the query would read from that table until its ID catches up to or passes the other's, then switch back.
The correct reading order, given one UNION two, would be a, b, d, e, c/f.
Do you just mean this?
WITH combined AS (
(SELECT * FROM one ORDER BY id LIMIT 5)
UNION ALL
(SELECT * FROM two ORDER BY id LIMIT 5)
)
SELECT * FROM combined ORDER BY id LIMIT 5
That will select the 5 "lowest id" rows from each table (which is the minimum you need to guarantee 5 output rows) and then find the lowest of those.
Thanks to a_horse_with_no_name's comment on Richard Huxton's answer regarding adding an index, the query runs considerably faster: from an indeterminate runtime to under one minute.
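For reference, such indexes would presumably look like this (plain b-tree indexes on the id columns; the index names are illustrative):
CREATE INDEX one_id_idx ON one (id);
CREATE INDEX two_id_idx ON two (id);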
In my case, the query was still too slow, and I came across the following solution.
Consider using results from one table to limit results from another table. The following solution, in combination with indexing by id, worked for my tables with billions of rows, but operates on the assumption that table "one" is faster than table "two" to finish the query.
WITH first AS (SELECT * FROM one ORDER BY id LIMIT 5),
filter AS (SELECT min(id) AS id FROM first),
second AS (SELECT * FROM two
           WHERE id < (SELECT filter.id FROM filter)
           ORDER BY id LIMIT 5),
combined AS (
    (SELECT * FROM first ORDER BY id LIMIT 5)
    UNION ALL
    (SELECT * FROM second ORDER BY id LIMIT 5)
)
SELECT * FROM combined ORDER BY id LIMIT 5
By using the minimum ID from the first complete query, I can limit the scope that the database scans for completion of the second query.

How can I combine two PIVOTs that use different aggregate elements and the same spreading/grouping elements into a single row per ID?

Couldn't find an exact duplicate question so please push one to me if you know of one.
https://i.stack.imgur.com/Xjmca.jpg
See the screenshot (sorry for link, not enough rep). In the table I have ID, Cat, Awd, and Xmit.
I want a resultset where each row is a distinct ID plus the aggregate Awd and Xmit amounts for each Cat (so four add'l columns per ID).
Currently I'm using two CTEs, one to aggregate each of Awd and Xmit. Both make use of the PIVOT operator, using Cat to spread and ID to group. After each CTE does its thing, I'm INNER JOINing them on ID.
WITH CTE1 (ID, P_Awd, G_Awd) AS (
    SELECT ...
    FROM Table
    PIVOT (SUM(Awd) FOR Cat IN ([P], [G])) p
),
CTE2 ([same as CTE1 but replace "Awd" with "Xmit"])
SELECT ID, P_Awd, P_Xmit, G_Awd, G_Xmit
FROM CTE1 INNER JOIN CTE2 ON CTE1.ID = CTE2.ID
The output of this (greatly simplified) is two rows per ID, with each row holding the resultset of one CTE or the other.
What am I overlooking? Am I overcomplicating this?
Here is one method, via CROSS APPLY.
Also, this assumes you don't need dynamic SQL.
Example
Select *
From (
        Select ID
              ,B.*
        From  YourTable A
        Cross Apply ( values (cat+'_Awd' ,Awd)
                            ,(cat+'_Xmit',Xmit)
                    ) B(Item,Value)
     ) src
Pivot (sum(Value) for Item in ([P_Awd],[P_XMit],[G_Awd],[G_XMit]) ) pvt
Returns (limited set; best if you don't use images for sample data):
ID  P_Awd  P_XMit  G_Awd  G_XMit
1   1000   500     1000   0
2   2000   1500    500    500

How to get a ROWNUM-like column in SQLite (iPhone)

I have an SQLite database table like this (without any ordering).
But I need to retrieve the table in ascending order by name, and when I sort it ascending the rowIds come out in jumbled order, as follows.
I need to retrieve a limited number of contacts, 5 at a time, in ascending order every time, like Aaaa - Eeee and then Ffff - Jjjjj, and so on.
But I am not able to set limits like 0-5, 5-10, ... using rowids, since they are in jumbled order.
So I need another column (like ROWNUM in Oracle) which is in order 1, 2, 3, 4, 5, 6, 7, ... every time, as follows.
How do I retrieve that column along with the existing columns?
Note: we don't have a ROWNUM-like column in SQLite.
The fake rownum solution is clever, but I am afraid it doesn't scale well (for a complex query you have to join and, for each row, count the number of rows preceding it).
I would consider using create table tmp as select /*your query*/, because for a create-as-select operation the rowid assigned while inserting the rows is exactly what the rownum would be (a counter). This is specified by the SQLite docs.
Once the initial query has been inserted, you only need to query the tmp table:
select rowid, /* your columns */ from tmp
order by rowid
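A minimal sketch of that approach, assuming the contactinfo table used in the next answer (names are illustrative):
create table tmp as
select name from contactinfo order by name asc;
-- rowid now counts 1, 2, 3, ... in name order, so pages can use it directly:
select rowid, name from tmp where rowid between 6 and 10 order by rowid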
You can use offset/limit.
Get the first, 2nd, and 3rd groups of five rows:
select rowid, name from contactinfo order by name limit 0, 5
select rowid, name from contactinfo order by name limit 5, 5
select rowid, name from contactinfo order by name limit 10, 5
Warning: with this syntax SQLite must read through all prior records in sorted order, so for statement number 3 above it has to read the first 10 records before returning any rows. If you have a large number of records this can be problematic from a performance standpoint.
More info on limit/ offset:
Sqlite Query Optimization (using Limit and Offset)
Sqlite LIMIT / OFFSET query
This is a way of faking a RowNum, hope it helps:
SELECT
(SELECT COUNT(*)
FROM Names AS t2
WHERE t2.name < t1.name
) + (
SELECT COUNT(*)
FROM Names AS t3
WHERE t3.name = t1.name AND t3.id < t1.id
) AS rowNum,
id,
name
FROM Names t1
ORDER BY t1.name ASC
SQL Fiddle example
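Worth noting: SQLite 3.25 and later support window functions, so on current versions the counting subqueries can be replaced directly (same Names table as above):
SELECT ROW_NUMBER() OVER (ORDER BY name, id) AS rowNum,
       id,
       name
FROM Names
ORDER BY name, id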

Equivalent of LIMIT for DB2

How do you do LIMIT in DB2 for iSeries?
I have a table with more than 50,000 records and I want to return records 0 to 10,000, and records 10,000 to 20,000.
I know that in MySQL you write LIMIT 0,10000 at the end of the query for 0 to 10,000, and LIMIT 10000,10000 at the end of the query for 10,000 to 20,000.
So, how is this done in DB2? What's the code and syntax?
(a full query example is appreciated)
Using FETCH FIRST [n] ROWS ONLY:
http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=/com.ibm.db29.doc.perf/db2z_fetchfirstnrows.htm
SELECT LASTNAME, FIRSTNAME, EMPNO, SALARY
FROM EMP
ORDER BY SALARY DESC
FETCH FIRST 20 ROWS ONLY;
To get ranges, you'd have to use ROW_NUMBER() (since v5r4) and use that within the WHERE clause: (stolen from here: http://www.justskins.com/forums/db2-select-how-to-123209.html)
SELECT code, name, address
FROM (
SELECT row_number() OVER ( ORDER BY code ) AS rid, code, name, address
FROM contacts
WHERE name LIKE '%Bob%'
) AS t
WHERE t.rid BETWEEN 20 AND 25;
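Applied to the question's ranges, the same pattern would look something like this (table and column names are placeholders):
SELECT *
FROM (
    SELECT row_number() OVER ( ORDER BY id ) AS rid, t.*
    FROM mytable t
) AS numbered
WHERE numbered.rid BETWEEN 1 AND 10000;
-- and WHERE numbered.rid BETWEEN 10001 AND 20000 for the next chunk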
I developed this method:
You NEED a table that has a unique value that can be ordered.
If you want rows 10,000 to 25,000 and your Table has 40,000 rows, first you need to get the starting point and total rows:
int start = 40000 - 10000;
int total = 25000 - 10000;
And then pass these by code to the query:
SELECT * FROM
(SELECT * FROM schema.mytable
ORDER BY userId DESC fetch first {start} rows only ) AS mini
ORDER BY mini.userId ASC fetch first {total} rows only
Support for OFFSET and LIMIT was recently added to DB2 for i 7.1 and 7.2. You need the following DB PTF group levels to get this support:
SF99702 level 9 for IBM i 7.2
SF99701 level 38 for IBM i 7.1
See here for more information: OFFSET and LIMIT documentation, DB2 for i Enhancement Wiki
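With those levels installed, the question's second page can be written directly. A sketch, with placeholder table and column names:
SELECT *
FROM schema.mytable
ORDER BY id
LIMIT 10000 OFFSET 10000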
Here's the solution I came up with:
select FIELD from TABLE where FIELD > LASTVAL order by FIELD fetch first N rows only;
By initializing LASTVAL to 0 (or '' for a text field), then setting it to the last value in the most recent set of records, this will step through the table in chunks of N records.
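A sketch of that stepping, reusing the EMP/EMPNO names from the FETCH FIRST example above (the chunk size and carried-over value are illustrative):
-- first chunk: LASTVAL initialized to 0
SELECT EMPNO, LASTNAME FROM EMP WHERE EMPNO > 0 ORDER BY EMPNO FETCH FIRST 1000 ROWS ONLY;
-- next chunk: LASTVAL set to the largest EMPNO just returned, e.g. 1042
SELECT EMPNO, LASTNAME FROM EMP WHERE EMPNO > 1042 ORDER BY EMPNO FETCH FIRST 1000 ROWS ONLY;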
#elcool's solution is a smart idea, but you need to know the total number of rows (which can even change while you are executing the query!). So I propose a modified version, which unfortunately needs 3 subqueries instead of 2:
select * from (
    select * from (
        select * from MYLIB.MYTABLE
        order by MYID asc
        fetch first {last} rows only
    ) I
    order by MYID desc
    fetch first {length} rows only
) II
order by MYID asc
where {last} should be replaced with row number of the last record I need and {length} should be replaced with the number of rows I need, calculated as last row - first row + 1.
E.g. if I want rows from 10 to 25 (totally 16 rows), {last} will be 25 and {length} will be 25-10+1=16.
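Instantiated for that example (rows 10 to 25, so {last} = 25 and {length} = 16):
select * from (
    select * from (
        select * from MYLIB.MYTABLE
        order by MYID asc
        fetch first 25 rows only
    ) I
    order by MYID desc
    fetch first 16 rows only
) II
order by MYID asc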
Try this:
SELECT * FROM
(
    SELECT T.*, ROW_NUMBER() OVER() AS R FROM TABLE T
) AS X
WHERE R BETWEEN 10000 AND 20000
The LIMIT clause allows you to limit the number of rows returned by the query. The LIMIT clause is an extension of the SELECT statement that has the following syntax:
SELECT select_list
FROM table_name
ORDER BY sort_expression
LIMIT n [OFFSET m];
In this syntax:
n is the number of rows to be returned.
m is the number of rows to skip before returning the n rows.
Another, shorter version of the LIMIT clause is as follows:
LIMIT m, n;
This syntax means skipping m rows and returning the next n rows from the result set.
A table may store rows in an unspecified order. If you don't use the ORDER BY clause with the LIMIT clause, the order of the returned rows is likewise unspecified. Therefore, it is good practice to always use the ORDER BY clause with the LIMIT clause.
See Db2 LIMIT for more details.
You should also consider the OPTIMIZE FOR n ROWS clause. More details on all of this in the DB2 LUW documentation in the Guidelines for restricting SELECT statements topic:
The OPTIMIZE FOR clause declares the intent to retrieve only a subset of the result or to give priority to retrieving only the first few rows. The optimizer can then choose access plans that minimize the response time for retrieving the first few rows.
There are 2 solutions to paginate efficiently on a DB2 table:
1 - the technique using the row_number() function and the OVER clause, presented in another answer ("SELECT row_number() OVER ( ORDER BY ... )"). On some big tables I have sometimes noticed a degradation in performance.
2 - the technique using a scrollable cursor. The implementation depends on the language used. This technique seems more robust on big tables.
I presented the 2 techniques, implemented in PHP, during a seminar last year. The slides are available at this link:
http://gregphplab.com/serendipity/uploads/slides/DB2_PHP_Best_practices.pdf
Sorry, but this document is only in French.
These options are available:
DB2 has several strategies to cope with this problem.
You can use the "scrollable cursor" feature: open a cursor and, instead of re-issuing the query, FETCH forward and backward. This works great if your application can hold state, since it doesn't require DB2 to rerun the query every time (see the sketch after this list).
You can use the ROW_NUMBER() OLAP function to number rows and then return the subset you want. This is ANSI SQL.
You can use the ROWNUM pseudo-column, which does the same as ROW_NUMBER() but is convenient if you have Oracle skills.
You can use LIMIT and OFFSET if you lean more toward a MySQL or PostgreSQL dialect.
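A sketch of the scrollable-cursor option in embedded SQL, reusing the contacts table from the ROW_NUMBER() answer above (host variable names are illustrative):
DECLARE C1 SCROLL CURSOR FOR
    SELECT code, name FROM contacts ORDER BY code;
OPEN C1;
FETCH ABSOLUTE 10001 FROM C1 INTO :code, :name; -- jump straight to row 10,001
FETCH NEXT FROM C1 INTO :code, :name;           -- then step forward row by row
CLOSE C1;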

Select only half the records

I am trying to figure out how to select half the records where an ID is null. I want half because I am going to use that result set to update another ID field; then I am going to update the rest with another value for that ID field.
So essentially I want to update half the records' someFieldID with one number and the rest with another number, splitting the update between two values of someFieldID, the field I want to update.
In Oracle you can use the ROWNUM pseudocolumn. I believe in SQL Server you can use TOP.
Example:
select TOP 50 PERCENT * from table
You can select by percent:
SELECT TOP 50 PERCENT *fields* FROM YourTable WHERE ...
update x set id=#value from (select top 50 percent * from table where id is null) x
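A sketch of the full two-step split this enables, using the question's someFieldID and two placeholder values (the table name is illustrative):
-- first half: tag 50 percent of the NULL rows with the first value
UPDATE x SET someFieldID = 1
FROM (SELECT TOP 50 PERCENT * FROM myTable WHERE someFieldID IS NULL) x;
-- second half: every row still NULL gets the other value
UPDATE myTable SET someFieldID = 2 WHERE someFieldID IS NULL;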
The following SQL will return the col_ids of the first half of the table.
SELECT col_id FROM table
WHERE rownum <= (SELECT count(col_id)/2 FROM table);
If the total number of col_ids is odd, you will get slightly less than half. For instance, if we have 51 total records, count(col_id)/2 returns 25.5, and since no rownum is equal to this result we get everything at 25 and below; the other 26 rows are not returned.
However, I have not seen the reverse statement work:
SELECT col_id FROM table
WHERE rownum > (SELECT count(col_id)/2 FROM table);
(That is expected: Oracle assigns ROWNUM to a candidate row only as it is returned, so the first candidate always gets ROWNUM 1; a filter like rownum > 25 rejects it, the next candidate again gets ROWNUM 1, and the predicate can never become true.) So if you want the other half of the table, you could just store the first result into a temp table, let's call it TABLE_A, and then subtract it from the original table with MINUS:
SELECT col_id FROM table
MINUS
SELECT col_id FROM table_a
Hopefully this helps someone.