Apply a sequence based on the value of another column - postgresql

I would like to create a sequence that starts at 1000 and increments by one. Then I want to apply this sequence to the column "identification" of my table "person". The table "person" has 958 records, so the sequence should run through to the last row. In addition, I want the numbering to be based on the Age field. That is to say: the field 'Identification' currently holds no values, only NULLs. And when I say sort by age, I mean that the person with the youngest age will be assigned the ID number 1000, the second youngest will be assigned 1001, and so on.
I have tried something similar to the following, but I get no results. I have also tried putting an order by age desc in the middle of the statement, also without result. Any ideas on how to do it using only sequences, please?
TABLE PERSON (Name, Surname, City, Identification, Age)
CREATE SEQUENCE seq
START WITH 1000
INCREMENT BY 1
MINVALUE 1000;
ALTER TABLE person
ADD COLUMN identification integer DEFAULT nextval('seq');

A quick example using dummy data:
create table age_seq_test(age int, fld_1 varchar, id integer);
insert into age_seq_test values (10, 'test'), (30, 'test2'), (20, 'test3');
select * from age_seq_test order by age;
 age | fld_1 | id
-----+-------+------
  10 | test  | NULL
  30 | test2 | NULL
  20 | test3 | NULL
BEGIN;
with t as
    (select age, row_number() over (order by age) as rn
     from age_seq_test)
update age_seq_test AS ast
set id = t.rn + 999
from t
where ast.age = t.age;
select * from age_seq_test order by age;
 age | fld_1 |  id
-----+-------+------
  10 | test  | 1000
  20 | test3 | 1001
  30 | test2 | 1002
--COMMIT/ROLLBACK depending on what the SELECT shows.
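One caveat with the UPDATE above: it joins on age, so two rows sharing the same age would match the same rows of t and could receive duplicate ids. A sketch of a duplicate-safe variant of the same statement, joining on the system column ctid (PostgreSQL's physical row locator) instead:
BEGIN;
with t as
    (select ctid as rid, row_number() over (order by age) as rn
     from age_seq_test)
update age_seq_test AS ast
set id = t.rn + 999
from t
where ast.ctid = t.rid;
--COMMIT/ROLLBACK as above.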

Related

tsql - How to convert multiple rows and columns into one row

id | acct_num | name    | orderdt
---+----------+---------+----------
 1 | 1006A    | Joe Doe | 1/1/2021
 2 | 1006A    | Joe Doe | 1/5/2021
EXPECTED OUTPUT
id | acct_num | name    | orderdt  | id1 | acct_num1 | NAME1   | orderdt1
---+----------+---------+----------+-----+-----------+---------+----------
 1 | 1006A    | Joe Doe | 1/1/2021 |   2 | 1006A     | Joe Doe | 1/5/2021
My query is the following:
Select id,
       acct_num,
       name,
       orderdt
from order_tbl
where acct_num = '1006A'
  and orderdt >= '1/1/2021'
If you always have one or two rows, you could do it like this (I'm assuming the latest version of SQL Server, because you said TSQL):
NOTE: If you have a known max (e.g. 4), this solution can be converted to support any number by changing the modulus and adding more columns and another join.
WITH order_table_numbered AS
(
    SELECT ID, ACCT_NUM, NAME, ORDERDT,
           ROW_NUMBER() OVER (PARTITION BY ACCT_NUM ORDER BY ORDERDT) AS RN
    FROM order_tbl
)
SELECT first.id AS id, first.acct_num AS acct_num, first.name AS name, first.orderdt AS orderdt,
       second.id AS id1, second.acct_num AS acct_num1, second.name AS name1, second.orderdt AS orderdt1
FROM order_table_numbered first
LEFT JOIN order_table_numbered second
       ON first.ACCT_NUM = second.ACCT_NUM AND (second.RN % 2 = 0)
WHERE first.RN % 2 = 1
If you have an unknown number of rows I think you should solve this on the client OR convert the groups to XML -- the XML support in SQL Server is not bad.
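If you do take the XML route, here is a minimal sketch of the idea (an illustration only, against the same order_tbl, not a drop-in solution; FOR XML PATH packs each account's rows into a single XML value):
SELECT o1.acct_num,
       (SELECT o2.id, o2.name, o2.orderdt
        FROM order_tbl o2
        WHERE o2.acct_num = o1.acct_num
        FOR XML PATH('order'), TYPE) AS orders
FROM order_tbl o1
GROUP BY o1.acct_num;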

Adding days to sysdate based off of values in a column

I am trying to create a manual table based on a currently built view.
The structure of this current table is as follows:
ID | Column1 | Column2 | Buffer Days
---+---------+---------+------------
 1 | Asdf    | Asdf1   | 91
 2 | Qwert   | Qwert1  | 11
 3 | Zxcv    | Zxcv1   | 28
The goal is to add a fourth column after Buffer Days that lists SYSDATE plus the number in Buffer Days.
So the outcome would look like:
ID | Column1 | Column2 | Buffer Days | Lookout Date
---+---------+---------+-------------+--------------
 1 | Asdf    | Asdf1   | 91          | 02-Jan-18
That requirement smells like a virtual column candidate. However, it won't work:
SQL> create table test
2 (id number,
3 column1 varchar2(10),
4 buffer_days number,
5 --
6 lookout_date as (SYSDATE + buffer_days) --> virtual column
7 );
lookout_date as (SYSDATE + buffer_days)
*
ERROR at line 6:
ORA-54002: only pure functions can be specified in a virtual column expression
Obviously, that's because SYSDATE is a non-deterministic function (it doesn't always return the same value when invoked).
Why not an "ordinary" column in the existing table? Because you shouldn't store values that are calculated from other table columns anyway. For example, good old Scott's EMP table contains SAL and COMM columns. It doesn't (and shouldn't) contain a TOTAL_SAL column (as SAL + COMM) because, when SAL and/or COMM changes, you'd have to remember to update TOTAL_SAL as well.
Therefore, a view is what could help here. For example:
SQL> create table test
2 (id number,
3 column1 varchar2(10),
4 buffer_days number
5 );
Table created.
SQL> create or replace view v_test as
2 select id,
3 column1,
4 buffer_days,
5 sysdate + buffer_days lookout_date
6 from test;
View created.
SQL> insert into test (id, column1, buffer_days) values (1, 'asdf', 5);
1 row created.
SQL> select sysdate, v.* from v_test v;
SYSDATE            ID COLUMN1    BUFFER_DAYS LOOKOUT_DA
---------- ---------- ---------- ----------- ----------
23.12.2017          1 asdf                 5 28.12.2017
SQL>
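One more point worth illustrating: because the view computes lookout_date at query time, there is nothing to keep in sync by hand. A hypothetical continuation of the session above:
update test set buffer_days = 10 where id = 1;
select sysdate, v.* from v_test v;   -- lookout_date now reflects sysdate + 10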

Collect rating statistics using PostgreSQL

I am currently trying to collect rating statistics from a PostgreSQL database. Below you can find a simplified example of the database schema I would like to query.
CREATE DATABASE test_db;
CREATE TABLE rateable_object (
    id          BIGSERIAL PRIMARY KEY,
    cdate       TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    mdate       TIMESTAMP,
    name        VARCHAR(160) NOT NULL,
    description VARCHAR NOT NULL
);
CREATE TABLE ratings (
    id        BIGSERIAL PRIMARY KEY,
    cdate     TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    mdate     TIMESTAMP,
    parent_id BIGINT NOT NULL,
    rating    INTEGER NOT NULL DEFAULT -1
);
I would now like to collect statistics for the values in the rating column. The response should look like this:
+--------------+-------+
| column_value | count |
+--------------+-------+
|           -1 |     2 |
|            0 |    45 |
|            1 |    37 |
|            2 |    13 |
|            3 |     5 |
|            4 |    35 |
|            5 |    75 |
+--------------+-------+
My first solution (see below) is quite naive and probably neither the fastest nor the simplest one. So my question is whether there is a better solution.
WITH
    stars AS (SELECT generate_series(-1, 5) AS value),
    votes AS (SELECT * FROM ratings WHERE parent_id = 1)
SELECT
    stars.value AS stars, coalesce(COUNT(votes.*), 0) AS votes
FROM stars
LEFT JOIN votes ON votes.rating = stars.value
GROUP BY stars.value
ORDER BY stars.value;
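For comparison, the same idea can be written more compactly without the CTEs; with a LEFT JOIN, count(r.id) already yields 0 for stars nobody picked, so the coalesce is unnecessary. A sketch against the schema above:
SELECT s.value AS stars,
       count(r.id) AS votes  -- count(column) ignores the NULLs produced by the outer join
FROM generate_series(-1, 5) AS s(value)
LEFT JOIN ratings r ON r.rating = s.value
                   AND r.parent_id = 1
GROUP BY s.value
ORDER BY s.value;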
As I would not like to waste your time, I prepared some test data for you:
INSERT INTO rateable_object (name, description) VALUES
('Penguin', 'This is the Linux penguin.'),
('Gnu', 'This is the GNU gnu.'),
('Elephant', 'This is the PHP elephant.'),
('Elephant', 'This is the postgres elephant.'),
('Duck', 'This is the duckduckgo duck.'),
('Cat', 'This is the GitHub cat.'),
('Bird', 'This is the Twitter bird.'),
('Lion', 'This is the Leo lion.');
CREATE OR REPLACE FUNCTION generate_test_data() RETURNS INTEGER LANGUAGE plpgsql AS
$$
BEGIN
    FOR i IN 0..1000 LOOP
        INSERT INTO ratings (parent_id, rating) VALUES
        (
            (1 + (10 - 1) * random())::numeric::int,
            (-1 + (5 + 1) * random())::numeric::int
        );
    END LOOP;
    RETURN 0;
END;
$$;
SELECT generate_test_data();

Restart primary key numbers of existing rows after deleting most of a big table

I am working with a PostgreSQL 8.4.13 database.
Recently I had around 86.5 million records in a table. I deleted almost all of them - only 5000 records are left now. I ran:
vacuum full
after deleting the rows, and that returned disk space to the OS (thanks to a suggestion from a fellow SO member).
But I see that my id numbers are still stuck in the millions. For example:
   id    |        date_time        | event_id | secs_since_1970 |  value
---------+-------------------------+----------+-----------------+---------
61216287 | 2013/03/18 16:42:42:041 |        6 |   1363646562.04 | 46.4082
61216289 | 2013/03/18 16:42:43:041 |        6 |   1363646563.04 | 55.4496
61216290 | 2013/03/18 16:42:44:041 |        6 |   1363646564.04 | 40.0553
61216291 | 2013/03/18 16:42:45:041 |        6 |   1363646565.04 | 38.5694
In an attempt to start the id value at 1 again for the remaining rows, I tried:
cluster mytable_pkey on mytable;
where mytable is the name of my table. But that did not help.
So, my question(s) is/are:
Is there a way to get the index (id value) to start at 1 again?
If I add or update the table with a new record, will it start from 1 or pick up the next highest integer value (say 61216292 in the above example)?
My table description is as follows: There is no FK constraint and no sequence in it.
jbossql=> \d mytable;
Table "public.mytable"
Column | Type | Modifiers
-----------------+------------------------+-----------
id | bigint | not null
date_time | character varying(255) |
event_id | bigint |
secs_since_1970 | double precision |
value | real |
Indexes:
"mydata_pkey" PRIMARY KEY, btree (id) CLUSTER
Drop the primary key first and create a temporary sequence:
alter table mytable drop constraint mydata_pkey;
create temporary sequence temp_seq;
Use the sequence to update:
update mytable
set id = nextval('temp_seq');
Recreate the primary key and drop the sequence:
alter table mytable add primary key (id);
drop sequence temp_seq;
If there is a foreign key dependency on this table then you will have to deal with it first and the update will be a more complex procedure.
Is your primary key defined using a serial? If so, that creates an implicit sequence. You can use ALTER SEQUENCE (see http://www.postgresql.org/docs/8.2/static/sql-altersequence.html for syntax) to change the starting number back to 1.
Based on the fact that you have some records left (just noticed the 5000 left), you DO NOT want to reset that number to anything at or below the highest remaining ID, because then the sequence would generate non-unique numbers. The point of using a sequence is that it gives you a transactional way to increment a number and guarantees that successive operations get unique, incremented numbers.
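For reference, both adjustments are one-liners. The sequence name mytable_id_seq below is an assumption (a serial column named id on mytable would create a sequence of that name; check \d mytable for the real one):
-- Only safe if the table is empty or you renumber the rows first:
ALTER SEQUENCE mytable_id_seq RESTART WITH 1;
-- Safer with rows left: continue just past the highest remaining id:
SELECT setval('mytable_id_seq', (SELECT max(id) FROM mytable));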
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;
--
-- Note: "deferrable initially deferred" appears to be the default
--
CREATE TABLE funky
( id SERIAL NOT NULL PRIMARY KEY DEFERRABLE INITIALLY DEFERRED
, tekst varchar
);
-- create some data with gaps in it
INSERT INTO funky(id, tekst)
SELECT gs, 'Number:' || gs::text
FROM generate_series(1,100,10) gs
;
-- set the sequence to the max occurring id
SELECT setval('funky_id_seq' , mx.mx)
FROM (SELECT max(id) AS mx FROM funky) mx
;
SELECT * FROM funky ;
-- compress the keyspace, making the ids consecutive
UPDATE funky xy
SET id = self.newid
FROM (
SELECT id AS id
, row_number() OVER (ORDER BY id) AS newid
FROM funky
) self
WHERE self.id = xy.id
;
-- set the sequence to the new max occurring id
SELECT setval('funky_id_seq' , mx.mx)
FROM (SELECT max(id) AS mx FROM funky) mx
;
SELECT * FROM funky ;
Result:
CREATE TABLE
INSERT 0 10
setval
--------
91
(1 row)
id | tekst
----+-----------
1 | Number:1
11 | Number:11
21 | Number:21
31 | Number:31
41 | Number:41
51 | Number:51
61 | Number:61
71 | Number:71
81 | Number:81
91 | Number:91
(10 rows)
UPDATE 10
setval
--------
10
(1 row)
id | tekst
----+-----------
1 | Number:1
2 | Number:11
3 | Number:21
4 | Number:31
5 | Number:41
6 | Number:51
7 | Number:61
8 | Number:71
9 | Number:81
10 | Number:91
(10 rows)
WARNING WARNING WARNING WARNING WARNING WARNING ACHTUNG:
Changing key values is generally a terrible idea. Avoid it at all cost.

Adding missing dates in a table in PostgreSQL

I have a table that contains data for every day in 2002, but it has some missing dates; namely, there are 354 records for 2002 (instead of 365). For my calculations, I need to have the missing dates in the table with NULL values:
+-----+------------+------------+
| ID | rainfall | date |
+-----+------------+------------+
| 100 | 110.2 | 2002-05-06 |
| 101 | 56.6 | 2002-05-07 |
| 102 | 65.6 | 2002-05-09 |
| 103 | 75.9 | 2002-05-10 |
+-----+------------+------------+
You can see that 2002-05-08 is missing. I want my final table to look like:
+-----+------------+------------+
| ID | rainfall | date |
+-----+------------+------------+
| 100 | 110.2 | 2002-05-06 |
| 101 | 56.6 | 2002-05-07 |
| 102 | | 2002-05-08 |
| 103 | 65.6 | 2002-05-09 |
| 104 | 75.9 | 2002-05-10 |
+-----+------------+------------+
Is there a way to do that in PostgreSQL?
It doesn't matter if I have the result just as a query result (not necessarily an updated table).
date is a reserved word in standard SQL and the name of a data type in PostgreSQL. PostgreSQL allows it as an identifier, but that doesn't make it a good idea. I use thedate as the column name instead.
Don't rely on the absence of gaps in a surrogate ID. That's almost always a bad idea. Treat such an ID as a unique number without meaning, even if it seems to carry certain other attributes most of the time.
In this particular case, as @Clodoaldo commented, thedate seems to be a perfect primary key and the column id is just cruft - which I removed:
CREATE TEMP TABLE tbl (thedate date PRIMARY KEY, rainfall numeric);
INSERT INTO tbl(thedate, rainfall) VALUES
('2002-05-06', 110.2)
, ('2002-05-07', 56.6)
, ('2002-05-09', 65.6)
, ('2002-05-10', 75.9);
Query
Full table by query:
SELECT x.thedate, t.rainfall -- rainfall automatically NULL for missing rows
FROM (
SELECT generate_series(min(thedate), max(thedate), '1d')::date AS thedate
FROM tbl
) x
LEFT JOIN tbl t USING (thedate)
ORDER BY x.thedate
Similar to what @a_horse_with_no_name posted, but simplified and ignoring the pruned id.
Fills in gaps between the first and last date found in the table. If there can be leading / lagging gaps, extend accordingly. You can use date_trunc() like @Clodoaldo demonstrated - but his query suffers from syntax errors and can be simpler.
INSERT missing rows
The fastest and most readable way to do it is a NOT EXISTS anti-semi-join.
INSERT INTO tbl (thedate, rainfall)
SELECT x.thedate, NULL
FROM (
SELECT generate_series(min(thedate), max(thedate), '1d')::date AS thedate
FROM tbl
) x
WHERE NOT EXISTS (SELECT 1 FROM tbl t WHERE t.thedate = x.thedate)
Just do an outer join against a query that returns all dates in 2002:
with all_dates as (
select date '2002-01-01' + i as date_col
from generate_series(0, extract(doy from date '2002-12-31')::int - 1) as i
)
select row_number() over (order by ad.date_col) as id,
t.rainfall,
ad.date_col as date
from all_dates ad
left join your_table t on ad.date_col = t.date
order by ad.date_col;
This will not change your table, it will just produce the result as desired.
Note that the generated id column will not contain the same values as the ID column in your table as it is merely a counter in the result set.
You could also replace the row_number() function with extract(doy from ad.date_col):
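With the same all_dates CTE, that variant looks like this:
select extract(doy from ad.date_col)::int as id,  -- day-of-year as the id
       t.rainfall,
       ad.date_col as date
from all_dates ad
left join your_table t on ad.date_col = t.date
order by ad.date_col;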
To fill the gaps (this will not reorder the IDs):
insert into t (rainfall, "date")
select null, "date"
from (
    select s.d::date as "date"
    from t
    right join generate_series(
        (select date_trunc('year', min("date")) from t)::timestamp,
        (select max("date") from t),
        '1 day'
    ) s(d) on t."date" = s.d::date
    where t."date" is null
) q
You have to fully re-create your table, as indexes have to change.
The better way to do it is to use your preferred DB interface language: make a loop that ignores the old ID and puts the values into a new table with newly serialized IDs.
for day in (whole needed calendar)
    value = select rainfall from oldbrokentable where date = day
    insert into newcleanedtable date=day, rainfall=value, id=serialized
(That's not real code! It's just conceptual, to be adapted to your preferred scripting language.)
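For what it's worth, in PostgreSQL itself that conceptual loop collapses into a single set-based statement. A sketch using the hypothetical table names from the pseudocode above, with the calendar hard-coded to 2002:
insert into newcleanedtable (id, "date", rainfall)
select row_number() over (order by s.d) as id,  -- freshly serialized ids
       s.d::date as "date",
       o.rainfall                               -- NULL where the old table has no row
from generate_series(date '2002-01-01', date '2002-12-31', interval '1 day') as s(d)
left join oldbrokentable o on o."date" = s.d::date;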