I need to insert a certain number of rows into a table, with values taken from variables. I could certainly loop, inserting a single row at a time, but that feels too straightforward; I am looking for a more elegant solution. My current thoughts revolve around an INSERT INTO ... SELECT ... statement, but then I need a query that will generate the number of rows I want. I tried to write a recursive CTE to do it:
CREATE FUNCTION ufGenerateRows(@numRows INT = 1)
RETURNS @RtnValue TABLE
(
    RowID INT NOT NULL
)
AS
BEGIN
    WITH numbers AS
    (
        SELECT 1 AS N
        UNION ALL
        SELECT N + 1
        FROM numbers
        WHERE N + 1 <= @numRows
    )
    INSERT INTO @RtnValue
    SELECT N
    FROM numbers;

    RETURN;
END
GO
It works, but it is subject to the default recursion depth limit of 100, which is too low for my purposes. Can you suggest alternatives?
Always use the dbo. schema prefix when creating or referencing objects, especially functions.
You should also strive to create inline table-valued functions, as opposed to multi-statement table-valued functions, whenever possible.
Recursive CTEs are about the least efficient way to generate a set (see this three-part series for much better examples):
http://www.sqlperformance.com/2013/01/t-sql-queries/generate-a-set-1
http://www.sqlperformance.com/2013/01/t-sql-queries/generate-a-set-2
http://www.sqlperformance.com/2013/01/t-sql-queries/generate-a-set-3
Here is one example:
CREATE FUNCTION dbo.GenerateRows(@numRows INT = 1)
RETURNS TABLE
AS
RETURN
(
    SELECT TOP (@numRows) RowID = ROW_NUMBER() OVER (ORDER BY s1.[number])
    FROM master.dbo.spt_values AS s1
    -- CROSS JOIN master.dbo.spt_values AS s2
    ORDER BY s1.[number]
);
If you need more than ~2,500 rows, you can cross join spt_values with itself, or with another table.
Even better would be to create your own numbers table (again, see the links above for examples).
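For instance, a numbers table could be built once like this (a sketch; the name dbo.Numbers and the one-million row count are illustrative, not from the question):

CREATE TABLE dbo.Numbers (n INT NOT NULL PRIMARY KEY);

INSERT INTO dbo.Numbers (n)
SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY s1.[number])
FROM master.dbo.spt_values AS s1
CROSS JOIN master.dbo.spt_values AS s2;

Once it exists, generating x rows is just SELECT TOP (x) n FROM dbo.Numbers ORDER BY n.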
Don't think iteratively, in loops, but set-based: all at once.
An INSERT INTO ... SELECT TOP (x) ... should do what you need without repeated inserts.
I will follow with an example when I'm not bound to my phone.
UPDATE:
What @AaronBertrand said. :} A CROSS JOIN in the SELECT is spot-on.
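A minimal sketch of that combined idea, assuming a hypothetical target table dbo.Target(col1, col2) and variables to repeat (none of these names come from the question):

DECLARE @x INT = 5000, @val1 INT = 42, @val2 VARCHAR(10) = 'abc';

INSERT INTO dbo.Target (col1, col2)
SELECT TOP (@x) @val1, @val2
FROM master.dbo.spt_values AS s1
CROSS JOIN master.dbo.spt_values AS s2;  -- the cross join yields ~6 million rows to draw from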
I am trying to add the same data for a row into my table x number of times in PostgreSQL. Is there a way of doing that without manually entering the same values x times? I am looking for the equivalent of SQL Server's GO [count] for Postgres... if that exists.
Use the function generate_series(), e.g.:
insert into my_table
select id, 'alfa', 'beta'
from generate_series(1,4) as id;
Test it in db<>fiddle.
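If the generated number itself is not needed, the series can act purely as a row multiplier (a sketch; the column names here are assumptions about my_table):

insert into my_table (col_a, col_b)
select 'alfa', 'beta'
from generate_series(1, 4);  -- inserts the identical row 4 times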
Idea
Produce a result set of a given size and cross join it with the record that you want to insert x times. What would still be missing is the generation of proper PK values; a specific suggestion would require more details on the data model.
Query
The sample query below presupposes that your PK values are autogenerated.
CREATE TABLE test ( id SERIAL, a VARCHAR(10), b VARCHAR(10) );

INSERT INTO test (a, b)
WITH RECURSIVE Numbers(i) AS (
    SELECT 1
    UNION ALL
    SELECT i + 1
    FROM Numbers
    WHERE i < 5 -- This is the value `x`
)
SELECT adhoc.*
FROM Numbers n
CROSS JOIN ( -- This is the single record to be inserted multiple times
    SELECT 'value_a' a
         , 'value_b' b
) adhoc;
See it in action in this db fiddle.
Note / Reference
The solution is adapted from here with minor modifications (there are a host of other solutions for generating x consecutive numbers with hierarchical / recursive SQL queries, so the choice of reference is somewhat arbitrary).
Every time I hit a variation of this problem I can't remember the workaround, only "oops, it was so simple, but how?"... Perhaps there are some patterns, and a best way to work with each pattern. Let's look at the main one, exemplified by unnest() and ts_stat().
First, the good examples, with no problems, because unnest() returns only one column:
SELECT * FROM unnest(array[1,2,3]) t(id); -- ok, the int column is there!
SELECT unnest(array[1,2,3]) AS id;        -- ok, the int column
WITH t AS (SELECT unnest(array[1,2,3]) AS id)
SELECT id, unnest(array[4,id]) AS x
FROM t; -- more complex, but ok!
Now, a function that returns a defined SETOF RECORD:
SELECT * FROM ts_stat('SELECT kx FROM terms where id=2') -- GOOD:
-- shows all three columns, word | ndoc | nentry
SELECT ts_stat('SELECT kx FROM terms where id=2') as x   -- BAD:
-- the columns are lost; only a single "x" column shows... but it works
-- NOTE: you can imagine any other such function here, e.g. json_each()
See the GOOD/BAD remarks above. So, this is the problem: a SETOF RECORD with more than one column. In the simplest case (unnest above), the solution is to use the function on the "FROM side", as a table; but when the RECORD has multiple fields, the problem arises.
-- MAIN EXAMPLE FOR THE DISCUSSION:
WITH t AS (SELECT unnest(array[1,2,3]) AS id)
SELECT id, ts_stat('SELECT kx FROM terms where id=' || id) AS x
FROM t; -- BAD, but works...
Now, in this main example, it seems impossible to use ts_stat() on the "FROM side", so this characterizes the pattern: a function that returns a TABLE or a SETOF RECORD, in a query where we need its columns, but where the function apparently can't go on the "FROM side".
QUESTION: What is the generic (and most elegant) solution to this pattern? What syntax pattern exposes the columns?
NOTE: another problem is that, if you don't remember the exact syntax of the solution, you try things that don't work... In this case, an error:
WITH t AS (SELECT unnest(array[1,2,3]) as id)
SELECT id, x.word, x.ndoc, x.nentry
FROM (
SELECT t.nsid,
ts_stat('SELECT kx FROM terms where id='||id) as x
FROM t
) s;
SQL PARSER ERROR (PostgreSQL 9.5): no table "x" in the FROM clause.
You should never use a set-returning function (SRF) in a SELECT list. The main example should be written with an implicit LATERAL join:
SELECT v.id, x.*
FROM (VALUES (1),(2),(3)) v(id)
JOIN ts_stat('SELECT kx FROM terms where id=' || v.id) x ON true;
The join is implicitly LATERAL here because an SRF in the FROM clause can refer to columns of relations specified before it in the FROM clause without using the keyword LATERAL. In the example above, the SRF ts_stat() makes a lateral reference to the column id of the relation v(id). You can also do this with e.g. sub-queries, but then you have to use the keyword LATERAL explicitly.
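For instance, the same kind of correlation written against a sub-query needs the keyword spelled out (a contrived sketch):

SELECT v.id, s.tenfold
FROM (VALUES (1),(2),(3)) v(id)
CROSS JOIN LATERAL (SELECT v.id * 10 AS tenfold) s;  -- without LATERAL, v.id would be invisible here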
Note that while you can use an SRF in a select list, its use is discouraged. You provide the example of unnest(anyarray), which is interesting because there is also the overloaded variant unnest(anyarray, ...) (i.e. unnesting multiple arrays in one call) which throws an error when used in a select list; it can only be used as a row source. The reason you should not use SRFs in a select list is that there is no obvious semantics when multiple SRFs each produce a different number of rows.
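A quick sketch of that difference using the multi-argument variant (as a row source, the shorter array is padded with NULLs):

SELECT * FROM unnest(array[1,2], array['a','b','c']) AS u(n, s);  -- fine: used as a row source
-- SELECT unnest(array[1,2], array['a','b','c']);                 -- error: not allowed in a select list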
Is there a way to select rows until some condition is met? That is, a kind of LIMIT, but not limited to N rows: rather, to all the rows up to the first non-matching row?
For example, say I have the table:
CREATE TABLE t (id SERIAL PRIMARY KEY, rank INTEGER, value INTEGER);
INSERT INTO t (rank, value) VALUES (1, 1), (2, 1), (2, 2), (3, 1);
that is:
test=# SELECT * FROM t;
id | rank | value
----+------+-------
1 | 1 | 1
2 | 2 | 1
3 | 2 | 2
4 | 3 | 1
(4 rows)
I want to order by rank, and select rows up until the first row whose value is over 1.
I.e. SELECT * FROM t ORDER BY rank UNTIL value > 1
so that I get the first 2 rows back.
One solution is to use a subquery and the bool_and window aggregate:
SELECT *
FROM (SELECT id, rank, value,
             bool_and(value < 2) OVER (ORDER BY rank, id) AS ok
      FROM t
      ORDER BY rank) t2
WHERE ok = true;
BUT won't that end up going through all the rows, even if I only want a handful?
(Real-world context: I have timestamped events in a table. I can use a lead/lag window query to compute the time between two events, and I want all events from now going back for as long as they happened less than 10 minutes apart. The lead/lag window query complicates things, hence the simplified example here.)
edit: made window-function order by rank, id
What you want is a sort of stop condition. As far as I am aware there is no such thing in SQL, at least not in PostgreSQL's dialect.
What you can do is use a PL/pgSQL function to read rows from a cursor and return them until the stop condition is met. It won't be super fast, but it'll be alright. It's just a FOR loop over a query with an IF expression THEN exit; ELSE return next; END IF;. No explicit cursor is required, because PL/pgSQL will use one internally if you FOR-loop over a query.
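A minimal sketch of such a function, written against the table t from the question and stopping at the first row with value > 1 (the function name is made up):

CREATE OR REPLACE FUNCTION rows_until_stop()
RETURNS SETOF t
LANGUAGE plpgsql AS
$$
DECLARE
    r t%ROWTYPE;
BEGIN
    FOR r IN SELECT * FROM t ORDER BY rank, id LOOP
        IF r.value > 1 THEN
            EXIT;           -- stop condition met: stop fetching rows
        ELSE
            RETURN NEXT r;  -- emit this row and keep scanning
        END IF;
    END LOOP;
    RETURN;
END;
$$;

-- SELECT * FROM rows_until_stop();  -- returns the first two rows of the sample data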
Another option is to create a cursor and read chunks of rows from it in the application, then discard part of the last chunk once the stop condition is met.
Either way, a cursor is going to be what you want.
A stop expression wouldn't actually be too hard to implement in PostgreSQL by the way. You'd have to implement a new executor node type, but the new CustomScan support would make that practical to do in an extension. Then you'd just evaluate an expression to decide whether or not to carry on fetching rows.
You can try something such as:
select *
from t,
     (select rank from t where value = 1 order by rank limit 1) x
where t.rank <= x.rank
order by rank;
It will make two passes through the first part of the table (which you might be able to cut down by indexing rank where value = 1), but it shouldn't evaluate the rest of the table if you have an index on rank.
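The supporting indexes might look like this (a sketch; the second is a partial index serving the value = 1 probe):

create index t_rank_idx on t (rank);                         -- lets the outer scan stop early
create index t_rank_value1_idx on t (rank) where value = 1;  -- serves the limit 1 subquery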
[If you could have window expressions in WHERE clauses, you could use one to make sure no previous row had value > 1... but even if that were possible, getting the query evaluator to use it to limit the search would be yet another challenge.]
This may be no better than your solution, since you yourself asked, "won't that end up going through all rows?"
I can tell you this: the explain plan is different from the one for your solution. I don't know how the guts of PostgreSQL work, but if I were writing a "min" function, I would expect it to always be O(n). By contrast, you had an ORDER BY, which is average case O(n log n), worst case O(n^2).
That said, I cannot deny that this will go through all rows:
select *
from sandbox.t
where id < (select min(id) from sandbox.t where value > 1);
One thing to clarify, though: unless you scan all rows, I'm not sure how you could determine the minimum value. Any time you invoke an aggregate concept across all records, doesn't that mean you must read all rows?
I have a table with a char(5) field for tracking Bin Numbers. The numbers are stored with leading zeros. The numbers go from 00200 through 90000. There are a lot of gaps in the numbers already in use and I need to be able to query them out so the user knows which numbers are available to use.
Assume you have a table of valid bin numbers.
Table: bins
bin_num
--
00200
00201
00202
...
90000
Assume your table is named "inventory". The bin numbers returned by this query are the ones that aren't in "inventory".
select bins.bin_num
from bins
left join inventory t2
on bins.bin_num = t2.bin_num
where t2.bin_num is null
order by bins.bin_num
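If such a bins table doesn't exist yet, it can be populated once; here is one sketch that borrows the spt_values cross join trick from the first answer on this page (89,801 numbers cover 00200 through 90000):

CREATE TABLE bins (bin_num CHAR(5) NOT NULL PRIMARY KEY);

INSERT INTO bins (bin_num)
SELECT RIGHT('00000' + CAST(199 + n AS VARCHAR(5)), 5)
FROM (SELECT TOP (89801) ROW_NUMBER() OVER (ORDER BY s1.[number]) AS n
      FROM master.dbo.spt_values AS s1
      CROSS JOIN master.dbo.spt_values AS s2
      ORDER BY n) AS nums;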
If your version of SQL Server supports analytic functions (and, solely for convenience, common table expressions), you can find most of the gaps like this.
with bin_and_next_bin as (
    select bin, lead(bin) over (order by bin) next_bin
    from inventory
)
select bin
from bin_and_next_bin
where cast(bin as integer) <> cast(next_bin as integer) - 1
Analytic functions don't require a table of valid bin numbers, although you can make a really strong case that you ought to have such a table in the first place. If you're working in an environment where you don't have such a table, and you're not allowed to build such a table, a common table expression can save the day. (It doesn't show "missing" bin numbers before the first used bin number, though, as it's written here.)
One other disadvantage of this statement is that the WHERE clause isn't sargable; it can't use an index. Yet another is that it assumes bin numbers can be cast to integer. The table-based approach doesn't assume anything about the value or data type of the bin number; it works just as well with mixed alphanumerics as it does with integers or anything else.
I was able to get exactly what I needed by reading this article by Pinal Dave.
I created a stored procedure that returns the gaps in the bin number sequence from the first bin number to the last. In my application I group the bin numbers by Shop (Vehicles would be 1000 through 2000, Buildings 2001 through 3000, etc.).
ALTER PROCEDURE [dbo].[spSelectLOG_BinsAvailable]
    (@Shop varchar(9))
AS
BEGIN
    DECLARE @start AS varchar(5) = (SELECT b.Start FROM BinShopCodeBlocks b WHERE b.Shop = @Shop);
    DECLARE @finish AS varchar(5) = (SELECT b.Finish FROM BinShopCodeBlocks b WHERE b.Shop = @Shop);

    SET NOCOUNT ON;

    WITH CTE AS
    (
        SELECT CAST(@start AS int) AS Start,
               CAST(@finish AS int) AS Finish
        UNION ALL
        SELECT Start + 1,
               Finish
        FROM CTE
        WHERE Start < Finish
    )
    SELECT RIGHT('00000' + CAST(Start AS VARCHAR(5)), 5)
    FROM CTE
    WHERE NOT EXISTS
        (SELECT *
         FROM BinMaster b
         WHERE b.BinNumber = RIGHT('00000' + CAST(Start AS VARCHAR(5)), 5))
    OPTION (MAXRECURSION 0);
END
I thought I understood how I can do a SELECT from the results of another SELECT statement, but there seems to be some sort of blurring of scope that I don't understand. I am using SQL Server 2008R2.
It is easiest to explain with an example.
Create a table with a single nvarchar column, then load the table with a single text value and a couple of numbers:
CREATE TABLE #temptable( a nvarchar(30) );
INSERT INTO #temptable( a )
VALUES('apple');
INSERT INTO #temptable( a )
VALUES(1);
INSERT INTO #temptable( a )
VALUES(2);
select * from #temptable;
This will return: apple, 1, 2
Use IsNumeric to get only the rows of the table that can be cast to numeric - this will leave the text value apple behind. This works fine.
select cast(a as int) as NumA
from #temptable
where IsNumeric(a) = 1 ;
This returns: 1, 2
However, if I use that exact same query as an inner select and add a numeric WHERE clause, it fails, saying it cannot convert the nvarchar value 'apple' to data type int. How has it got the value 'apple' back??
select
x.NumA
from
(
select cast(a as int) as NumA
from #temptable
where IsNumeric(a) = 1
) x
where x.NumA > 1
;
Note that the failing query works just fine without the WHERE clause:
select
x.NumA
from
(
select cast(a as int) as NumA
from #temptable
where IsNumeric(a) = 1
) x
;
I find this very surprising. What am I not getting? TIA
If you take a look at the estimated execution plan you'll find that it has optimized the inner query into the outer and combined the WHERE clauses.
Using a CTE to isolate the operations works (in SQL Server 2008 R2):
declare @temptable as table ( a nvarchar(30) );

INSERT INTO @temptable( a )
VALUES ('apple'), ('1'), ('2');

with Numbers as (
    select cast(a as int) as NumA
    from @temptable
    where IsNumeric(a) = 1
)
select * from Numbers
The reason you are getting this is fairly simple. When a query is executed, a number of steps are followed: parse, algebrize, optimize, and compile.
The algebrization step resolves all the objects the query needs. The optimizer then uses those objects to create the best query plan it can, which is compiled and executed.
So when you look at that part, you will see it does a table scan on #temptable, and #temptable is defined the way you created your table. That you then do some computation on it is a different matter; the column still has the nvarchar datatype.
To understand how this works, you have to know how a query is read. First all the objects are retrieved (FROM table, INNER JOIN table), then the predicates (WHERE, ON), then the grouping and such, then the SELECT of the columns (with the cast), and then the ORDER BY.
So with that in mind, when you have a combination of selects, the optimizer will still process it that way: since your select is subordinate to the FROM and JOIN parts of the query, that is why you get this error.
I hope I made it a little clearer.
The optimizer is free to move expressions around in the query plan in order to produce the most cost-efficient plan for retrieving the data (the evaluation order of the predicates is not guaranteed). I think using a CASE expression like the one below produces a NULL in the absence of an ELSE clause, and thus takes 'apple' out:
select a
from #temptable
where case when isnumeric(a) = 1 then a end > 1;
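Applied to the original subquery form, the same guard looks like this (a sketch):

select x.NumA
from (select case when isnumeric(a) = 1 then cast(a as int) end as NumA
      from #temptable) x
where x.NumA > 1;

Non-numeric rows produce NULL for NumA, and NULL > 1 is never true, so 'apple' can no longer reach the cast no matter how the predicates are rearranged.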