I have a query which takes a parameter and one which doesn't.
I was trying to compare their performance.
Currently, for Query 1 alone I see two results; in total I get three cost % values in my execution plan.
Is there a way to get just the two performance cost % values instead of three?
----Query 1
DECLARE @p2 DATETIME;
SET @p2 = (
SELECT max(date_created)
FROM log_table
WHERE user_id = 1
)
SELECT max(date_created) AS last_login
FROM log_table
WHERE date_created <= @p2
----Query 2
SELECT max(date_created) AS last_login
FROM log_table
WHERE user_id = 1
There is no way to do what you're looking for.
SQL Server produces a plan for each query in the batch.
And you have exactly 3 queries in your batch, because
SELECT max(date_created)
FROM log_table
WHERE user_id = 1
is also a query.
A comparison of query costs is not always the best way to compare performance.
I suggest you compare the number of reads or average execution time.
Or simply execute them separately, that'll do the trick as well.
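For example, a minimal sketch (using the same tables and queries as above) that compares logical reads and timings instead of cost %:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- Query 1 (variable version)
DECLARE @p2 DATETIME;
SET @p2 = (SELECT max(date_created) FROM log_table WHERE user_id = 1);
SELECT max(date_created) AS last_login FROM log_table WHERE date_created <= @p2;
-- Query 2
SELECT max(date_created) AS last_login FROM log_table WHERE user_id = 1;
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
The reads and CPU/elapsed times reported in the Messages tab give a more direct comparison than the plan's cost percentages.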
Related
I would like to get a random selection of records from my table, but I wonder if it would be possible to give a better chance to newly created items. I also have pagination, which is why I'm using setseed.
Currently I'm only retrieving items randomly and it works quite well, but I need to give a certain "preference" to newly created items.
Here is what I'm doing for now:
SELECT SETSEED(0.16111981), RANDOM();
I don't know what to do, and I can't figure out what would be a good solution without it being an absolute performance disaster.
Firstly, I want to explain how we can select random records from a table. In PostgreSQL, we can use the random() function in the ORDER BY clause. Example:
select * from test_table
order by random()
limit 1;
I am using limit 1 to select only one record. But with this method, query performance will be very bad for large tables (over 100 million rows).
The second way: you can select records manually using random(), provided the table has an id column. This way is very high performance.
Let's first write our own randomizing function so that it is easy to use in our queries.
CREATE OR REPLACE FUNCTION random_between(low integer, high integer)
RETURNS integer
LANGUAGE plpgsql
STRICT
AS $function$
BEGIN
RETURN floor(random()* (high-low + 1) + low);
END;
$function$;
This function returns a random integer within the range given by its input arguments. Then we can write a query using our random function. Example:
select * from test_table
where id = (select random_between(min(id), max(id)) from test_table);
I tested this query on a table with 150 million rows and it gets the best performance: duration 12 ms. In this query, if you need many rows rather than one, you can write where id > instead of where id =.
Now, for your preference for newer items: I don't know your detailed business logic or the conditions you want to apply to the randomization, but I can write some sample queries to illustrate the mechanism. PostgreSQL has no built-in function for randomizing data with preferences, so we must write this logic manually. I created a sample table for testing our queries.
CREATE TABLE test_table (
id serial4 NOT NULL,
is_created bool NULL,
action_date date NULL,
CONSTRAINT test_table_pkey PRIMARY KEY (id)
);
CREATE INDEX test_table_id_idx ON test_table USING btree (id);
For example, I want to give preference only to rows whose action_date is closest to today. Sample query:
select
id,
is_created,
action_date,
(extract(day from (now()-action_date))) as dif_days
from
test.test_table
where
id > (select random_between(min(id), max(id)) from test.test_table)
and
(extract(day from (now()-action_date))) = random_between(0, 6)
limit 1;
In this query, (extract(day from (now()-action_date))) as dif_days returns the difference in days between action_date and today. In the WHERE clause, I first select rows whose id is greater than the resulting random value. Then, with (extract(day from (now()-action_date))) = random_between(0, 6), I keep from that result only rows whose action_date is at most 6 days ago (it might be 4 days ago or 2 days ago, but at most 6 days ago).
You can write many such logic queries (for example, giving more preference using boolean fields: closed vs. opened, and so on).
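For instance, a rough, purely illustrative sketch reusing the sample table above, which also favours rows with is_created = true:
-- Favour rows flagged as created, otherwise fall back to recent rows.
select id, is_created, action_date
from test_table
where id > (select random_between(min(id), max(id)) from test_table)
and (is_created = true
or (extract(day from (now() - action_date))) <= random_between(0, 6))
limit 1;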
I am a beginner in PostgreSQL and, having understood the very basics, I want to find out how I can get better performance on a query by using an index (one or more). I have read some documentation, but I would like a specific example so as to "catch" it.
MY EXAMPLE: Let's say I have just a table (MyTable) with three columns (Customer(text), Time(timestamp), Consumption(integer)) and I want to find the customer(s) with the maximum consumption on '2014-07-01 01:00:00'. MY SOLUTION (without index usage):
SELECT Customer FROM MyTable WHERE Time='2013-07-01 02:00:00'
AND Consumption=(SELECT MAX(consumption) FROM MyTable);
----> What would be the exact full code, using at least one index, for the query example above?
The correct query (using a correlated subquery) would be:
SELECT Customer
FROM MyTable
WHERE Time = '2013-07-01 02:00:00' AND
Consumption = (SELECT MAX(t2.consumption) FROM MyTable t2 WHERE t2.Time = '2013-07-01 02:00:00');
The above is very reasonable. An alternative approach if you want exactly one row returned is:
SELECT Customer
FROM MyTable
WHERE Time = '2013-07-01 02:00:00'
ORDER BY Consumption DESC
LIMIT 1;
And the best index is MyTable(Time, Consumption, Customer).
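A sketch of the corresponding index DDL (the index name is arbitrary):
-- Composite index covering the filter, the aggregate, and the returned column.
CREATE INDEX mytable_time_consumption_customer_idx
ON MyTable (Time, Consumption, Customer);
With this index in place, both the correlated subquery and the ORDER BY ... LIMIT 1 variant can typically be satisfied largely from the index.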
I'm having an issue with limiting a SQL query. I'm using SQL Server 2000, so I can't use any of the features like ROW_NUMBER(), CTEs, or OFFSET ... FETCH.
I have tried the SELECT TOP limit * FROM approach, excluding the already-shown results, but this way the query is very slow because sometimes my result set fetches more than 10,000 records.
Also I have tried the following approach:
SELECT * FROM (
SELECT DISTINCT TOP 100 PERCENT im.name, im.location, im.image ,
( SELECT count(DISTINCT i.id) FROM image AS i WHERE i.id <= im.id ) AS recordnum
FROM images AS im
order by im.location asc, im.name asc) as tmp
WHERE recordnum between 5 AND 15
The same problem occurs here, plus I couldn't add an ORDER BY option in the subquery for recordnum. I have placed both solutions in a stored procedure, but the query execution is still very slow.
So my question is:
Is there an efficient way to limit the query to pull 20 records per page in SQL 2000 for large amounts of data, i.e. more than 10,000 rows?
Thanks.
With this approach the subquery is only run once.
The where im2.id is null condition skips the first 40 rows:
SELECT top 25 im1.*
FROM images im1
left join ( select top 40 id from images order by id ) im2
on im1.id = im2.id
where im2.id is null
order by im1.id
Query-wise, there is no great-performing way. If performance is critical and the data will always be grouped/ordered the same way, you could add an int column and set its value by a trigger based on that grouping/ordering. Index it and it should be extremely fast for reads; writes will be a bit slower.
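A rough sketch of that idea (all names are hypothetical, and the trigger is deliberately simplified: it renumbers every row on each insert, so a real implementation would be more selective and also handle updates/deletes):
ALTER TABLE images ADD ordinal int NULL
GO
CREATE TRIGGER trg_images_ordinal ON images
AFTER INSERT
AS
-- Renumber rows according to the fixed ordering (location, name).
UPDATE im
SET ordinal = (SELECT COUNT(*) FROM images i
WHERE i.location < im.location
OR (i.location = im.location AND i.name <= im.name))
FROM images im
GO
CREATE INDEX ix_images_ordinal ON images (ordinal)
GO
-- Rows 41-60 (page 3 at 20 rows per page):
SELECT name, location, image FROM images WHERE ordinal BETWEEN 41 AND 60 ORDER BY ordinal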
Also, make sure you have indexes on the Id columns on image and images.
I thought I understood how I can do a SELECT from the results of another SELECT statement, but there seems to be some sort of blurring of scope that I don't understand. I am using SQL Server 2008R2.
It is easiest to explain with an example.
Create a table with a single nvarchar column - load the table with a single text value and a couple of numbers:
CREATE TABLE #temptable( a nvarchar(30) );
INSERT INTO #temptable( a )
VALUES('apple');
INSERT INTO #temptable( a )
VALUES(1);
INSERT INTO #temptable( a )
VALUES(2);
select * from #temptable;
This will return: apple, 1, 2
Use IsNumeric to get only the rows of the table that can be cast to numeric - this will leave the text value apple behind. This works fine.
select cast(a as int) as NumA
from #temptable
where IsNumeric(a) = 1 ;
This returns: 1, 2
However, if I use that exact same query as an inner select, and try to do a numeric WHERE clause, it fails saying cannot convert nvarchar value 'apple' to data type int. How has it got the value 'apple' back??
select
x.NumA
from
(
select cast(a as int) as NumA
from #temptable
where IsNumeric(a) = 1
) x
where x.NumA > 1
;
Note that the failing query works just fine without the WHERE clause:
select
x.NumA
from
(
select cast(a as int) as NumA
from #temptable
where IsNumeric(a) = 1
) x
;
I find this very surprising. What am I not getting? TIA
If you take a look at the estimated execution plan you'll find that it has optimized the inner query into the outer and combined the WHERE clauses.
Using a CTE to isolate the operations works (in SQL Server 2008 R2):
declare @temptable as table ( a nvarchar(30) );
INSERT INTO @temptable( a )
VALUES ('apple'), ('1'), ('2');
with Numbers as (
select cast(a as int) as NumA
from @temptable
where IsNumeric(a) = 1
)
select * from Numbers
The reason you are getting this is fairly simple. When a query is executed, several steps are followed: parse, algebrize, optimize and compile.
The algebrize step in this case will gather all the objects you need for this query. The optimizer then uses these objects to create the best query plan, which is compiled and executed...
So, when you look into that part, you will see it does a table scan on #temptable. And #temptable is defined exactly the way you created your table; that you compute something on it afterwards is a different matter... The column still has the nvarchar datatype.
To understand how this works, you have to know how a query is read. First all the objects are retrieved (FROM table, INNER JOIN table), then the predicates (WHERE, ON), then the grouping and so on, then the SELECT of the columns (with the cast), and then the ORDER BY.
So with that in mind, when you have a combination of selects, the optimizer will still process it that way; since your SELECT is subordinate to the FROM and JOIN parts of the query, that is why you get the error.
I hope that makes it a little clearer.
The optimizer is free to move expressions around in the query plan in order to produce the most cost-efficient plan for retrieving the data (the evaluation order of the predicates is not guaranteed). I think using the case expression like below produces a NULL in the absence of the ELSE clause and thus filters 'apple' out:
select a from #temptable where case when isnumeric(a) = 1 then a end > 1
How do you do LIMIT in DB2 for iSeries?
I have a table with more than 50,000 records and I want to return records 0 to 10,000, and records 10,000 to 20,000.
I know that in MySQL you write LIMIT 0,10000 at the end of the query for rows 0 to 10,000, and LIMIT 10000,10000 for rows 10,000 to 20,000.
So, how is this done in DB2? What's the code and syntax?
(full query example is appreciated)
Using FETCH FIRST [n] ROWS ONLY:
http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=/com.ibm.db29.doc.perf/db2z_fetchfirstnrows.htm
SELECT LASTNAME, FIRSTNAME, EMPNO, SALARY
FROM EMP
ORDER BY SALARY DESC
FETCH FIRST 20 ROWS ONLY;
To get ranges, you'd have to use ROW_NUMBER() (since v5r4) and use that within the WHERE clause: (stolen from here: http://www.justskins.com/forums/db2-select-how-to-123209.html)
SELECT code, name, address
FROM (
SELECT row_number() OVER ( ORDER BY code ) AS rid, code, name, address
FROM contacts
WHERE name LIKE '%Bob%'
) AS t
WHERE t.rid BETWEEN 20 AND 25;
Developed this method:
You NEED a table that has a unique value that can be ordered.
If you want rows 10,000 to 25,000 and your Table has 40,000 rows, first you need to get the starting point and total rows:
int start = 40000 - 10000;
int total = 25000 - 10000;
And then pass these by code to the query:
SELECT * FROM
(SELECT * FROM schema.mytable
ORDER BY userId DESC fetch first {start} rows only ) AS mini
ORDER BY mini.userId ASC fetch first {total} rows only
Support for OFFSET and LIMIT was recently added to DB2 for i 7.1 and 7.2. You need the following DB PTF group levels to get this support:
SF99702 level 9 for IBM i 7.2
SF99701 level 38 for IBM i 7.1
See here for more information: OFFSET and LIMIT documentation, DB2 for i Enhancement Wiki
Here's the solution I came up with:
select FIELD from TABLE where FIELD > LASTVAL order by FIELD fetch first N rows only;
By initializing LASTVAL to 0 (or '' for a text field), then setting it to the last value in the most recent set of records, this will step through the table in chunks of N records.
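For example (reusing the EMP table from the earlier answer, with EMPNO as the ordered unique field), stepping through in chunks of 20:
-- First chunk, with LASTVAL initialized to 0:
SELECT EMPNO FROM EMP WHERE EMPNO > 0 ORDER BY EMPNO FETCH FIRST 20 ROWS ONLY;
-- If the last EMPNO returned was 1060 (just an example value), the next chunk is:
SELECT EMPNO FROM EMP WHERE EMPNO > 1060 ORDER BY EMPNO FETCH FIRST 20 ROWS ONLY;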
#elcool's solution is a smart idea, but you need to know the total number of rows (which can even change while you are executing the query!). So I propose a modified version, which unfortunately needs 3 subqueries instead of 2:
select * from (
select * from (
select * from MYLIB.MYTABLE
order by MYID asc
fetch first {last} rows only
) I
order by MYID desc
fetch first {length} rows only
) II
order by MYID asc
where {last} should be replaced with row number of the last record I need and {length} should be replaced with the number of rows I need, calculated as last row - first row + 1.
E.g. if I want rows from 10 to 25 (totally 16 rows), {last} will be 25 and {length} will be 25-10+1=16.
Try this
SELECT * FROM
(
SELECT T.*, ROW_NUMBER() OVER() AS R FROM MYTABLE T
) AS X
WHERE R BETWEEN 10000 AND 20000
The LIMIT clause allows you to limit the number of rows returned by the query. The LIMIT clause is an extension of the SELECT statement that has the following syntax:
SELECT select_list
FROM table_name
ORDER BY sort_expression
LIMIT n [OFFSET m];
In this syntax:
n is the number of rows to be returned.
m is the number of rows to skip before returning the n rows.
Another shorter version of LIMIT clause is as follows:
LIMIT m, n;
This syntax means skipping m rows and returning the next n rows from the result set.
A table may store rows in an unspecified order. If you don’t use the ORDER BY clause with the LIMIT clause, the returned rows are also unspecified. Therefore, it is a good practice to always use the ORDER BY clause with the LIMIT clause.
See Db2 LIMIT for more details.
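For example (assuming a hypothetical products table with a price column), the following skips the first 10 rows ordered by price and returns the next 10:
-- Rows 11-20 of the result set ordered by price (table and columns are illustrative).
SELECT product_name, price
FROM products
ORDER BY price DESC
LIMIT 10 OFFSET 10;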
You should also consider the OPTIMIZE FOR n ROWS clause. More details on all of this in the DB2 LUW documentation in the Guidelines for restricting SELECT statements topic:
The OPTIMIZE FOR clause declares the intent to retrieve only a subset of the result or to give priority to retrieving only the first few rows. The optimizer can then choose access plans that minimize the response time for retrieving the first few rows.
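A minimal sketch, reusing the EMP example from above, of combining FETCH FIRST with OPTIMIZE FOR:
-- Tell the optimizer we only care about getting the first 20 rows back quickly.
SELECT LASTNAME, FIRSTNAME, EMPNO, SALARY
FROM EMP
ORDER BY SALARY DESC
FETCH FIRST 20 ROWS ONLY
OPTIMIZE FOR 20 ROWS;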
There are 2 solutions to paginate efficiently on a DB2 table:
1 - the technique using the row_number() function and the OVER clause, which has been presented in another post ("SELECT row_number() OVER ( ORDER BY ... )"). On some big tables, I have sometimes noticed degraded performance.
2 - the technique using a scrollable cursor. The implementation depends on the language used. That technique seems more robust on big tables.
I presented the 2 techniques, implemented in PHP, during a seminar last year. The slides are available at this link:
http://gregphplab.com/serendipity/uploads/slides/DB2_PHP_Best_practices.pdf
Sorry, but this document is only in French.
There are several available options:
DB2 has several strategies to cope with this problem.
You can use the "scrollable cursor" feature.
In this case you can open a cursor and, instead of re-issuing the query, you can FETCH forward and backward (see the sketch after these options).
This works great if your application can hold state, since it doesn't require DB2 to rerun the query every time.
You can use the ROW_NUMBER() OLAP function to number rows and then return the subset you want.
This is ANSI SQL
You can use the ROWNUM pseudo columns which does the same as ROW_NUMBER() but is suitable if you have Oracle skills.
You can use LIMIT and OFFSET if you are more leaning to a mySQL or PostgreSQL dialect.
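As a rough sketch of the scrollable-cursor option (embedded-SQL style; the table comes from the earlier row_number() example and the host variables are illustrative):
-- Declare a scrollable cursor over the ordered result set.
DECLARE C1 INSENSITIVE SCROLL CURSOR FOR
SELECT code, name, address FROM contacts ORDER BY code;
OPEN C1;
-- Position directly on row 10001, then keep fetching forward.
FETCH ABSOLUTE 10001 FROM C1 INTO :code, :name, :address;
FETCH NEXT FROM C1 INTO :code, :name, :address;
CLOSE C1;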