I have this:
with required_sportsman as (
select sportsman_result_id from sportsman
where sportsman.sportsman_id = 1
)
update sportsman_result
set shown_result = shown_result + 1
where sportsman_result.sportsman_result_id in (select sportsman_result_id from required_sportsman);
But I want to remove the last select: (select sportsman_result_id from required_sportsman)
How can I do it?
My tables:
sportsman_result:

column name           data type   constraints
sports_id             integer
competition_id_id     integer
shown_result          integer
result_date           date
sportsman_result_id   integer     primary key

sportsman:

column name           data type   constraints
first_name            text
last_name             text
sports_id             integer
trainer_id            integer
sportsman_result_id   integer     foreign key
number_of_wins        integer
year_of_birth         date
country               text
sportsman_id           integer     primary key
Use the PostgreSQL extension UPDATE ... FROM:
UPDATE sportsman_result AS sr
SET shown_result = sr.shown_result + 1
FROM sportsman AS s
WHERE sr.sportsman_result_id = s.sportsman_result_id
AND s.sportsman_id = 1;
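If you would rather keep the CTE and just drop the extra subselect, the CTE can also be joined directly in the FROM clause of the UPDATE (a sketch, equivalent to the statement above):
WITH required_sportsman AS (
    SELECT sportsman_result_id
    FROM sportsman
    WHERE sportsman_id = 1
)
UPDATE sportsman_result AS sr
SET shown_result = sr.shown_result + 1
FROM required_sportsman AS rs
WHERE sr.sportsman_result_id = rs.sportsman_result_id;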
Keeping redundant data like shown_result in the database is usually not a good idea.
I intend to create a TABLE called WEB_TICKETS where the PRIMARY KEY is equal to the key->ID value. For some reason, when I run the CREATE TABLE instruction the PRIMARY KEY value is prefixed with the characters 'J0' - why is this happening?
KsqlDb Statements
These work as expected
CREATE STREAM STREAM_WEB_TICKETS (
ID_TICKET STRUCT<ID STRING> KEY
)
WITH (KAFKA_TOPIC='web.mongodb.tickets', FORMAT='AVRO');
CREATE STREAM WEB_TICKETS_REKEYED
WITH (KAFKA_TOPIC='web_tickets_by_id') AS
SELECT *
FROM STREAM_WEB_TICKETS
PARTITION BY ID_TICKET->ID;
PRINT 'web_tickets_by_id' FROM BEGINNING LIMIT 1;
key: 5d0c2416b326fe00515408b8
The following successfully creates the table but the PRIMARY KEY value isn't what I expect:
CREATE TABLE web_tickets (
id_pk STRING PRIMARY KEY
)
WITH (KAFKA_TOPIC = 'web_tickets_by_id', VALUE_FORMAT = 'AVRO');
select id_pk from web_tickets EMIT CHANGES LIMIT 1;
|ID_PK|
|J05d0c2416b326fe00515408b8
As you can see, the ID_PK value has the characters 'J0' prepended to it. Why is this?
It appears as though I wasn't properly setting the KEY FORMAT: with only VALUE_FORMAT='AVRO' specified, the key fell back to the default KAFKA format, so the Avro-serialized key written by the rekeyed stream was read as a raw string and its serialization header presumably shows up as the leading 'J0' characters. The following command produces the expected result.
CREATE TABLE web_tickets_test_2 (
id_pk VARCHAR PRIMARY KEY
)
WITH (KAFKA_TOPIC = 'web_tickets_by_id', FORMAT = 'AVRO');
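Re-running the earlier check against the new table should now show the bare key (a sketch, reusing the key value printed above):
select id_pk from web_tickets_test_2 EMIT CHANGES LIMIT 1;
|ID_PK|
|5d0c2416b326fe00515408b8|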
I have two PostgreSQL tables:
CREATE TABLE source.staticprompts (
id INT,
projectid BIGINT,
scriptid INT,
promptnum INT,
prompttype VARCHAR(20),
inputs VARCHAR(2000),
attributes VARCHAR(2000),
text VARCHAR(2000),
corpuscode VARCHAR(2000),
comment VARCHAR(2000),
created TIMESTAMP,
modified TIMESTAMP
);
and
CREATE TABLE target.dim_collect_user_inp_configs (
collect_project_id BIGINT NOT NULL,
prompt_type VARCHAR(20),
prompt_input_desc VARCHAR(3000),
prompt_input_name VARCHAR(1000),
no_of_prompt_count BIGINT,
prompt_input_value VARCHAR(100),
prompt_input_value_id BIGSERIAL PRIMARY KEY,
script_id BIGINT,
corpuscode VARCHAR(20),
min_recordings VARCHAR(2000),
max_recordings VARCHAR(2000),
recordings_count VARCHAR(2000),
lease_duration VARCHAR(2000),
date_created TIMESTAMP WITHOUT TIME ZONE NOT NULL DEFAULT NOW(),
date_updated TIMESTAMP WITHOUT TIME ZONE,
CONSTRAINT must_be_unique UNIQUE (prompt_input_value, collect_project_id)
);
I need to copy data from source to target with these conditions:
Each value needs to be stored as one row in the dim_collect_user_inp_configs table. For example, Indoor-Loud becomes one row with its own unique identifier as prompt_input_value_id, Indoor-Normal becomes another row with its own prompt_input_value_id, and so on up to Semi-Outdoor-Whisper.
There can be multiple input "name" entries in one inputs column. Each name and its values need to be stored separately.
prompt_input_value_id - generate a unique sequence number for the combination of each prompt_input_value and collect_project_id.
The source table has this data:
20030,input,m66,,null,"[{""desc"": ""Select the setting that you will do the recordings under."", ""name"": ""ambient"", ""type"": ""dropdown"", ""values"": ["""", ""Indoors + High + Loud"", ""Indoors + High + Normal"", ""Indoors + Low + Normal"", ""Indoors + Low + LowVolume"", ""Outdoors + High + Normal"", ""Outdoors + Low + Loud"", ""Outdoors + Low + Normal"", ""Outdoors + Low + LowVolume""]}, {""desc"": ""Select the noise type that you will do the recordings under."", ""name"": ""Noise type"", ""type"": ""dropdown"", ""values"": ["""", ""Human Speech"", ""Ambient Speech"", ""Non-Speech""]}]",,2018-12-13 13:49:24.408933,1,5,5906,2021-08-26 12:43:54.061000
I tried to do this task with this query:
INSERT INTO target.dim_collect_user_inp_configs AS t (
collect_project_id,
prompt_type,
prompt_input_desc,
prompt_input_name,
prompt_input_value,
script_id,
corpuscode)
SELECT
s.projectid,
s.prompttype,
el.inputs->>'name' AS name,
el.inputs->>'desc' AS description,
jsonb_array_elements(el.inputs->'values') AS value,
s.scriptid,
s.corpuscode
FROM source.staticprompts AS s,
jsonb_array_elements(s.inputs::jsonb) el(inputs)
ON CONFLICT
(prompt_input_value, collect_project_id)
DO UPDATE SET
(prompt_input_desc, prompt_input_name, date_updated) =
(EXCLUDED.prompt_input_desc,
EXCLUDED.prompt_input_name,
NOW())
WHERE t.prompt_input_desc != EXCLUDED.prompt_input_desc
OR t.prompt_input_name != EXCLUDED.prompt_input_name
RETURNING *;
But I get an error:
ON CONFLICT DO UPDATE command cannot affect row a second time Hint: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.
Can you help me find the mistake?
Change the SELECT so that all rows with the same prompt_input_value and collect_project_id are grouped together; then each target row will be updated at most once. Use aggregate functions for all other columns.
Something like
SELECT s.projectid,
max(s.prompttype),
max(el.inputs->>'name') AS name,
max(el.inputs->>'desc') AS description,
v.value,
max(s.scriptid),
max(s.corpuscode)
FROM source.staticprompts AS s
CROSS JOIN LATERAL jsonb_array_elements(s.inputs::jsonb) AS el(inputs)
CROSS JOIN LATERAL jsonb_array_elements(el.inputs->'values') AS v(value)
GROUP BY s.projectid, v.value
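Plugged back into the original INSERT, that would look roughly like this (a sketch; unwrapping the jsonb string with #>> '{}' before storing it in the varchar column is an assumption about how you want the value stored):
INSERT INTO target.dim_collect_user_inp_configs AS t (
    collect_project_id,
    prompt_type,
    prompt_input_desc,
    prompt_input_name,
    prompt_input_value,
    script_id,
    corpuscode)
SELECT s.projectid,
       max(s.prompttype),
       max(el.inputs->>'name'),
       max(el.inputs->>'desc'),
       v.value #>> '{}',          -- unwrap the jsonb string to plain text
       max(s.scriptid),
       max(s.corpuscode)
FROM source.staticprompts AS s
CROSS JOIN LATERAL jsonb_array_elements(s.inputs::jsonb) AS el(inputs)
CROSS JOIN LATERAL jsonb_array_elements(el.inputs->'values') AS v(value)
GROUP BY s.projectid, v.value
ON CONFLICT (prompt_input_value, collect_project_id)
DO UPDATE SET
    (prompt_input_desc, prompt_input_name, date_updated) =
    (EXCLUDED.prompt_input_desc, EXCLUDED.prompt_input_name, NOW())
WHERE t.prompt_input_desc != EXCLUDED.prompt_input_desc
   OR t.prompt_input_name != EXCLUDED.prompt_input_name;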
I have a table consisting of products (with IDs, ~15k records) and another table price_changes (~88m records) recording a change in the price of a given productID at a given changedate.
I'm now interested in the price of each product at given points in time (say every 2 hours for a year, so altogether ~4,300 points in time; altogether resulting in ~64m data points of interest). While it's very straightforward to determine the price of a given product at a given time, it seems to be quite time-consuming to determine all 64m data points.
My approach is to pre-populate a new target table fullprices with the data points of interest:
insert into fullprices(obsdate,productID)
select obsdate, productID from targetdates, products
and then update each price observation in this new table like this:
update fullprices f set price = (select price from price_changes where
productID = f.productID and date < f.obsdate
order by date desc
limit 1)
which should give me the most recent price change in each point in time.
Unfortunately, this takes ... well, ages. Is there any better way to do it?
== Edit: My tables are created as follows: ==
CREATE TABLE products
(
productID uuid NOT NULL,
name text NOT NULL,
CONSTRAINT products_pkey PRIMARY KEY (productID )
);
CREATE TABLE price_changes
(
id integer NOT NULL,
productID uuid NOT NULL,
price smallint,
date timestamp NOT NULL
);
CREATE INDEX idx_pc_date
ON price_changes USING btree
(date);
CREATE INDEX idx_pc_productID
ON price_changes USING btree
(productID);
CREATE TABLE targetdates
(
obsdate timestamp
);
CREATE TABLE fullprices
(
obsdate timestamp NOT NULL,
productID uuid NOT NULL,
price smallint
);
I have a date column which I want to be unique once populated, but want the date field to be ignored if it is not populated.
In MySQL the way this is accomplished is to set the date column to "not null" and give it a default value of '0000-00-00' - this allows all other fields in the unique index to be "checked" even if the date column is not populated yet.
This does not work in PostgreSQL because '0000-00-00' is not a valid date, so you cannot store it in a date field (this makes sense to me).
At first glance, leaving the field nullable seemed like an option, but this creates a problem:
=> create table uniq_test(NUMBER bigint not null, date DATE, UNIQUE(number, date));
CREATE TABLE
=> insert into uniq_test(number) values(1);
INSERT 0 1
=> insert into uniq_test(number) values(1);
INSERT 0 1
=> insert into uniq_test(number) values(1);
INSERT 0 1
=> insert into uniq_test(number) values(1);
INSERT 0 1
=> select * from uniq_test;
number | date
--------+------
1 |
1 |
1 |
1 |
(4 rows)
NULL apparently "isn't equal to itself" and so it does not count towards constraints.
If I add an additional unique constraint only on the number field, it checks only number and not date and so I cannot have two numbers with different dates.
I could pick a default date that is a 'valid date' (but outside the working scope) to get around this, and could in fact get away with that for the current project, but there are cases I may encounter in the next few years where it will not be evident that the date is a non-real date just because it is "a long time ago" or "in the future."
The advantage the '0000-00-00' mechanic had for me was precisely that this date isn't real and therefore indicated a non-populated entry (where 'non-populated' was a valid uniqueness attribute). When I look around for solutions to this on the internet, most of what I find is "just use NULL" and "storing zeros is stupid."
TL;DR
Is there a PostgreSQL best practice for needing to include "not populated" as a possible value in a unique constraint including a date field?
Not clear what you want. This is my guess:
create table uniq_test (number bigint not null, date date);
create unique index i1 on uniq_test (number, date)
where date is not null;
create unique index i2 on uniq_test (number)
where date is null;
There will be a unique constraint for non-null dates and another one for null dates, effectively turning the (number, date) tuples into distinct values.
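A quick sketch of how the two partial indexes behave (expected outcomes in the comments):
insert into uniq_test (number)       values (1);                -- ok
insert into uniq_test (number)       values (1);                -- rejected by i2: (1) with null date already exists
insert into uniq_test (number, date) values (1, '2021-01-01');  -- ok
insert into uniq_test (number, date) values (1, '2021-01-01');  -- rejected by i1: (1, 2021-01-01) already exists
insert into uniq_test (number, date) values (1, '2021-06-30');  -- ok, different date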
Check partial indexes in the documentation.
It's not a best practice, but you can do it this way:
t=# create table so35(i int, d date);
CREATE TABLE
t=# create unique index i35 on so35(i, coalesce(d,'-infinity'));
CREATE INDEX
t=# insert into so35 (i) select 1;
INSERT 0 1
t=# insert into so35 (i) select 2;
INSERT 0 1
t=# insert into so35 (i) select 2;
ERROR: duplicate key value violates unique constraint "i35"
DETAIL: Key (i, (COALESCE(d, '-infinity'::date)))=(2, -infinity) already exists.
STATEMENT: insert into so35 (i) select 2;
I am trying to create a database for movielens (http://grouplens.org/datasets/movielens/). We've got movies and ratings. Movies have multiple genres. I split those out into a separate table since it's a 1:many relationship. There's a many:many relationship as well, users to movies. I need to be able to query these tables in multiple ways.
So I created:
CREATE TABLE genre (
genre_id serial NOT NULL,
genre_name char(20) DEFAULT NULL,
PRIMARY KEY (genre_id)
)
INSERT INTO genre VALUES
(1,'Action'),(2,'Adventure'),(3,'Animation'),(4,'Children\s'),(5,'Comedy'),(6,'Crime'),
(7,'Documentary'),(8,'Drama'),(9,'Fantasy'),(10,'Film-Noir'),(11,'Horror'),(12,'Musical'),
(13,'Mystery'),(14,'Romance'),(15,'Sci-Fi'),(16,'Thriller'),(17,'War'),(18,'Western');
CREATE TABLE movie (
movie_id int NOT NULL DEFAULT '0',
movie_name char(75) DEFAULT NULL,
movie_year smallint DEFAULT NULL,
PRIMARY KEY (movie_id)
);
CREATE TABLE moviegenre (
movie_id int NOT NULL DEFAULT '0',
genre_id tinyint NOT NULL DEFAULT '0',
PRIMARY KEY (movie_id, genre_id)
);
I don't know how to import my movies.csv with columns movie_id, movie_name and movie_genre. For example, the first row is (1;Toy Story (1995);Animation|Children's|Comedy).
If I INSERT manually, it would look like:
INSERT INTO moviegenre VALUES (1,3),(1,4),(1,5)
Because 3 is Animation, 4 is Children's, and 5 is Comedy.
How can I import the whole data set this way?
You should first create a table that can ingest the data from the CSV file:
CREATE TABLE movies_csv (
movie_id integer,
movie_name varchar,
movie_genre varchar
);
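Loading the file into the staging table can then be done with COPY (or \copy from psql if the file lives on the client). The path and the ';' delimiter below are assumptions based on the sample row in the question:
COPY movies_csv FROM '/path/to/movies.csv' WITH (FORMAT csv, DELIMITER ';');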
Note that any single quotes (Children's) should be doubled (Children''s). Once the data is in this staging table you can copy the data over to the movie table, which should have the following structure:
CREATE TABLE movie (
movie_id integer, -- A primary key has implicit NOT NULL and should not have default
movie_name varchar NOT NULL, -- Movie should have a name, varchar more flexible
movie_year integer, -- Regular integer is more efficient
PRIMARY KEY (movie_id)
);
Sanitize your other tables likewise.
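For example, moviegenre could become something like this (a sketch; PostgreSQL has no tinyint, and the foreign key references are an addition here):
CREATE TABLE moviegenre (
    movie_id integer REFERENCES movie,   -- no quoted '0' default on a key column
    genre_id integer REFERENCES genre,   -- integer instead of the non-existent tinyint
    PRIMARY KEY (movie_id, genre_id)
);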
Now copy the data over, extracting the unadorned name and the year from the CSV name:
INSERT INTO movie (movie_id, movie_name, movie_year)
SELECT movie_id, parts[1], parts[2]::integer
FROM movies_csv, regexp_matches(movie_name, '([[:ascii:]]*)\s\(([\d]*)\)$') p(parts);
Here the regular expression says:
([[:ascii:]]*) - Capture all characters until the matches below
\s - Read past a space
\( - Read past an opening parenthesis
([\d]*) - Capture any digits
\) - Read past a closing parenthesis
$ - Match from the end of the string
So on input "Die Hard 17 (John lives forever) (2074)" it creates a string array with {'Die Hard 17 (John lives forever)', '2074'}. The scanning has to be from the end $, assuming all movie titles end with the year of publication in parentheses, in order to preserve parentheses and numbers in movie titles.
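A quick way to sanity-check the pattern before running the INSERT (a sketch):
SELECT p.parts
FROM regexp_matches('Die Hard 17 (John lives forever) (2074)',
                    '([[:ascii:]]*)\s\(([\d]*)\)$') AS p(parts);
-- expected: {"Die Hard 17 (John lives forever)",2074}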
Now you can work on the movie genres. You have to split the string on the bar | using the regexp_split_to_table() function and then join to the genre table on the genre name:
INSERT INTO moviegenre
SELECT movie_id, genre_id
FROM movies_csv, regexp_split_to_table(movie_genre, '\|') p(genre) -- escape the |
JOIN genre ON genre.genre_name = p.genre;
After all is done and dusted you can delete the movies_csv table.
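That is, something along the lines of:
DROP TABLE movies_csv;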