PostgreSQL - Update rows in table with generate_series()

I have the following table:
create table test(
    id serial primary key,
    firstname varchar(32),
    lastname varchar(64),
    id_desc char(8)
);
I need to insert 100 rows of data. Getting the names is no problem - I have two tables, one containing ten first names and the other containing ten last names. By doing an insert-select query with a cross join I am able to get 100 rows of data (10x10 cross join).
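For reference, that insert-select could look roughly like this; the source table and column names (first_names, last_names, name) are assumptions, since they are not shown in the question:
insert into test (firstname, lastname)
select f.name, l.name
from first_names f
cross join last_names l;  -- 10 x 10 = 100 rows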
id_desc consists of eight characters (the fixed size is mandatory). It always starts with the same pattern (e.g. abcde) followed by 001, 002, etc. up to 999. I have tried to achieve this with the following statement:
update test set id_desc = 'abcde' || num.id
from (select * from generate_series(1, 100) as id) as num
where num.id = (select id from test where id = num.id);
The statement executes but affects zero rows. I know that the where clause probably does not make much sense; I have been trying to get this to work for a while and just started trying a couple of things. I didn't want to omit it when posting here, though, because I know it is definitely required.

Laurenz's suggestion fits this specific case very well. I recommend using it.
The rest of this answer is for the more general case where that simplification is not appropriate.
In my tests, the statement from the question does not work as written.
I think you are better off using a WITH clause and a window function.
WITH ranked_ids (id, rank) AS (
    SELECT id, row_number() OVER (ROWS UNBOUNDED PRECEDING)
    FROM test
)
UPDATE test
SET id_desc = 'abcde' || ranked_ids.rank
FROM ranked_ids
WHERE test.id = ranked_ids.id;

It should be as simple as
UPDATE test SET id_desc = 'abcde' || to_char(id, 'FM099');
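A quick illustration (not from the original answer) of why this produces fixed-width values: in to_char, 099 zero-pads the number to three digits and the FM modifier drops the leading space reserved for the sign, so the result is always exactly eight characters.
select 'abcde' || to_char(7, 'FM099');    -- abcde007
select 'abcde' || to_char(123, 'FM099');  -- abcde123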

Related

Is there a way to add the same row multiple times with different ids into a table with postgresql?

I am trying to add the same data for a row into my table x number of times in PostgreSQL. Is there a way of doing that without manually entering the same values x number of times? I am looking for the equivalent of SQL Server's GO [count] for Postgres... if that exists.
Use the function generate_series(), e.g.:
insert into my_table
select id, 'alfa', 'beta'
from generate_series(1,4) as id;
Test it in db<>fiddle.
Idea
Produce a resultset of a given size and cross join it with the record that you want to insert x times. What would still be missing is the generation of proper PK values. A specific suggestion would require more details on the data model.
Query
The sample query below presupposes that your PK values are autogenerated.
CREATE TABLE test ( id SERIAL, a VARCHAR(10), b VARCHAR(10) );

INSERT INTO test (a, b)
WITH RECURSIVE Numbers(i) AS (
    SELECT 1
    UNION ALL
    SELECT i + 1
    FROM Numbers
    WHERE i < 5 -- This is the value `x`
)
SELECT adhoc.*
FROM Numbers n
CROSS JOIN ( -- This is the single record to be inserted multiple times
    SELECT 'value_a' a
         , 'value_b' b
) adhoc
;
See it in action in this db fiddle.
Note / Reference
The solution is adapted from here with minor modifications (there are a host of other solutions for generating x consecutive numbers with SQL hierarchical / recursive queries, so the choice of reference is somewhat arbitrary).

Optimizing a query with multiple IN

I have a query like this:
SELECT * FROM table
WHERE department='param1' AND type='param2' AND product='param3'
AND product_code IN (10-30 alphanumerics) AND unit_code IN (10+ numerics)
AND first_name || last_name IN (10-20 names)
AND sale_id LIKE ANY(list of regex string)
The runtime was too high, so I was asked to optimize it.
The list of parameters for the code columns varies between users.
Each user provides their list of codes and then loops over product.
product used to be an IN-clause list as well, but it was split up.
Things I tried
By adding an index on (department, type, product) I was able to get a 4x improvement.
Currently, some values of product take only 2-3 seconds, while others take 30 seconds.
I tried creating a pre-concatenated column of first_name || last_name, but the runtime improvement was too small to be worth it.
Is there some way I can improve the performance of the other clauses, such as the "IN" clauses or the LIKE ANY clause?
In my experience, replacing large IN lists with a JOIN to a VALUES clause often improves performance.
So instead of:
SELECT *
FROM table
WHERE department='param1'
AND type='param2'
AND product='param3'
AND product_code IN (10-30 alphanumerics)
Use:
SELECT *
FROM table t
JOIN ( values (1),(2),(3) ) as x(code) on x.code = t.product_code
WHERE department='param1'
AND type='param2'
AND product='param3'
But you have to make sure you don't have any duplicates in the values () list.
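If the incoming list might contain duplicates, one option (a sketch using the same placeholder values) is to de-duplicate inside the derived table before joining:
SELECT *
FROM table t
JOIN (
    SELECT DISTINCT code
    FROM ( values (1),(2),(3),(3) ) as v(code)  -- the duplicate 3 collapses to one row
) as x on x.code = t.product_code
WHERE department='param1'
AND type='param2'
AND product='param3'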
The concatenation is also wrong, because comparing the concatenated value is something different than comparing each value individually; e.g. ('alexander', 'son') would be treated as identical to ('alex', 'anderson').
You should use:
and (first_name, last_name) in ( ('fname1', 'lname1'), ('fname2', 'lname2'))
This can also be written as a join:
SELECT *
FROM table t
JOIN ( values (1),(2),(3) ) as x(code) on x.code = t.product_code
JOIN (
values ('fname1', 'lname1'), ('fname2', 'lname2')
) as n(fname, lname) on (n.fname, n.lname) = (t.first_name, t.last_name)
WHERE department='param1'
AND type='param2'
AND product='param3'
You generally don't have to do anything special to enable an index for it to be used with multiple IN-lists, other than keep the table well vacuumed and analyzed. A btree index on (department, type, product, product_code, unit_code, (first_name || last_name)) should work well. If it doesn't, please show an EXPLAIN (ANALYZE, BUFFERS) for it, preferably with track_io_timing turned on. If the selectivities of each of your conditions are not mostly independent of each other, that might lead to planning problems.
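For concreteness, here is a hypothetical version of that index; the table name your_table and the index name are made up, since the real table name isn't shown:
CREATE INDEX idx_dept_type_product_lookup
    ON your_table (department, type, product, product_code, unit_code, (first_name || last_name));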

TSQL order by but first show these

I'm researching a dataset.
And I just wonder if there is a way to order like below in 1 query
Select * From MyTable where name ='international%' order by id
Select * From MyTable where name != 'international%' order by id
So first show all international items, then the names that don't start with international.
My question is not about adding columns to make this work, or using multiple DBs, or a larger TSQL script to clone a DB into a new order.
I just wonder if anything after WHERE or ORDER BY can be tricked into doing this.
You can use expressions in the ORDER BY:
Select * From MyTable
order by
    CASE
        WHEN name like 'international%' THEN 0
        ELSE 1
    END,
    id
(From your narrative, it also sounded like you wanted like, not =, so I changed that too)
Another way (slightly cleaner and a tiny bit faster)
-- Sample Data
DECLARE @mytable TABLE (id INT IDENTITY, [name] VARCHAR(100));
INSERT @mytable([name])
VALUES('international something'),('ACME'),('international waffles'),('ABC Co.');
-- solution
SELECT t.*
FROM @mytable AS t
ORDER BY -PATINDEX('international%', t.[name]);
Note too that you can add a persisted computed column for -PATINDEX('international%', t.[name]) to speed things up.
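A rough, untested sketch of that computed-column idea (the column and index names are made up; PATINDEX is deterministic, so the column can be persisted and indexed):
ALTER TABLE MyTable
    ADD intl_sort AS (-PATINDEX('international%', [name])) PERSISTED;
CREATE INDEX IX_MyTable_intl_sort ON MyTable (intl_sort, id);
-- the ORDER BY can then use the indexed column:
SELECT * FROM MyTable ORDER BY intl_sort, id;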

PostgreSQL: Copy data from a random row of another table

I have two tables, stuff and nonsense.
create table stuff(
    id serial primary key,
    details varchar,
    data varchar,
    more varchar
);
create table nonsense (
    id serial primary key,
    data varchar,
    more varchar
);
insert into stuff(details) values
('one'),('two'),('three'),('four'),('five'),('six');
insert into nonsense(data,more) values
('apple','accordion'),('banana','banjo'),('cherry','cor anglais');
See http://sqlfiddle.com/#!17/313fb/1
I would like to copy random values from nonsense to stuff. I can do this for a single value using the answer to my previous question: SQL Server Copy Random data from one table to another:
update stuff
set data=(select data from nonsense where stuff.id=stuff.id
order by random() limit 1);
However, I would like to copy more than one value (data and more) from the same row, and the subquery won't let me do that, of course.
In Microsoft SQL, I can use the following:
update stuff
set data=sq.data, more=sq.more
from stuff s outer apply
    (select top 1 * from nonsense where s.id=s.id order by newid()) sq
I have read that PostgreSQL uses something like LEFT JOIN LATERAL instead of OUTER APPLY, but simply substituting doesn't work for me.
How can I update with multiple values from a random row of another table?
As of Postgres 9.5, you can assign multiple columns from a subquery:
update stuff
set (data, more) = (
    select data, more
    from nonsense
    where stuff.id = stuff.id
    order by random()
    limit 1
);

Is there a way to find TOP X records with grouped data?

I'm working with a Sybase 12.5 server and I have a table defined as such:
CREATE TABLE SomeTable(
[GroupID] [int] NOT NULL,
[DateStamp] [datetime] NOT NULL,
[SomeName] varchar(100),
PRIMARY KEY CLUSTERED (GroupID,DateStamp)
)
I want to be able to list, per [GroupID], only the latest X records by [DateStamp]. The kicker is X > 1, so plain old MAX() won't cut it. I'm assuming there's a wonderfully nasty way to do this with cursors and what-not, but I'm wondering if there is a simpler way without that stuff.
I know I'm missing something blatantly obvious and I'm gonna kick myself for not getting it, but .... I'm not getting it. Please help.
Is there a way to find TOP X records, but with grouped data?
According to the online manual, Sybase 12.5 supports WINDOW functions and ROW_NUMBER(), though their syntax differs from standard SQL slightly.
Try something like this:
SELECT SP.*
FROM (
    SELECT *, ROW_NUMBER() OVER (windowA ORDER BY [DateStamp] DESC) AS RowNum
    FROM SomeTable
    WINDOW windowA AS (PARTITION BY [GroupID])
) AS SP
WHERE SP.RowNum <= 3
ORDER BY RowNum DESC;
I don't have an instance of Sybase, so I haven't tested this. I'm just synthesizing this example from the doc.
I made a mistake. The doc I was looking at was Sybase SQL Anywhere 11. It seems that Sybase ASA does not support the WINDOW clause at all, even in the most recent version.
Here's another query that could accomplish the same thing. You can use a self-join to match each row of SomeTable to all rows with the same GroupID and a later DateStamp. If fewer than three later rows exist, then we've got one of the three most recent.
SELECT s1.[GroupID], s1.[Foo], s1.[Bar], s1.[Baz]
FROM SomeTable s1
LEFT OUTER JOIN SomeTable s2
ON s1.[GroupID] = s2.[GroupID] AND s1.[DateStamp] < s2.[DateStamp]
GROUP BY s1.[GroupID], s1.[Foo], s1.[Bar], s1.[Baz]
HAVING COUNT(*) < 3
ORDER BY s1.[DateStamp] DESC;
Note that you must list the same columns in the SELECT list as you list in the GROUP BY clause. Basically, all columns from s1 that you want this query to return.
Here's quite an unscalable way!
SELECT GroupID, DateStamp, SomeName
FROM SomeTable ST1
WHERE X >
    (SELECT COUNT(*)
     FROM SomeTable ST2
     WHERE ST1.GroupID = ST2.GroupID AND ST2.DateStamp > ST1.DateStamp)
Edit: Bill's solution is vastly preferable, though.