I have a MySQL table describing the maintenance cycle of railway vehicles (type of overhaul and period in kilometers):
CREATE TABLE `cycle_test` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`type` varchar(2) COLLATE utf8_spanish_ci NOT NULL,
`km` int(11) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_spanish_ci
I have populated this table as follows:
INSERT INTO `cycle_test` (`id`, `type`, `km`) VALUES (NULL, 'R1', '12000'), (NULL, 'R2', '24000'), (NULL, 'R3', '72000'), (NULL, 'R4', '144000');
SELECT type, km FROM cycle_test;
+------+--------+
| type | km |
+------+--------+
| R1 | 12000 |
| R2 | 24000 |
| R3 | 72000 |
| R4 | 144000 |
+------+--------+
The goal is to obtain the maintenance schedule, with all the overhauls to be carried out throughout the cycle and the mileage of each overhaul, taking into account that, when overhauls of different types coincide, the one with the highest rank (R4 > R3 > R2 > R1) supersedes those with lower rank, as shown below:
+--------+------+
| km | type |
+--------+------+
| 12000 | R1 |
| 24000 | R2 |
| 36000 | R1 |
| 48000 | R2 |
| 60000 | R1 |
| 72000 | R3 |
| 84000 | R1 |
| 96000 | R2 |
| 108000 | R1 |
| 120000 | R2 |
| 132000 | R1 |
| 144000 | R4 |
+--------+------+
I have already solved this problem in PHP, but I am looking for a purely MySQL solution, either with session variables or with a stored procedure.
The (possible) solution must be scalable for different maintenance cycles (not hardcoded).
Thanks in advance for your help and/or hints.
Well, with the help of the MySQL docs, it has turned out to be easier than I thought. First, I have created the schedule table (in production, it will be a temporary table):
CREATE TABLE `schedule_test` (
`type` varchar(2) COLLATE utf8_spanish_ci NOT NULL,
`km` int(11) unsigned NOT NULL,
UNIQUE KEY `km` (`km`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_spanish_ci
The UNIQUE KEY on km makes it possible to detect when overhauls of different ranks fall on the same mileage.
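As a quick standalone illustration of that mechanism (hypothetical inserts, not part of the procedure below), a clash at the same mileage resolves in favour of the last write:
INSERT INTO schedule_test (type, km) VALUES ('R1', 72000);
INSERT INTO schedule_test (type, km) VALUES ('R3', 72000)
ON DUPLICATE KEY UPDATE type = VALUES(type); -- the existing row for km 72000 now holds 'R3'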
Then I have created the following stored procedure in the database:
BEGIN
    /* Declaration of variables, cursors and handlers */
    DECLARE done INT DEFAULT 0;
    DECLARE vTip CHAR(16);
    DECLARE vKm, acKm INT;
    DECLARE vFinal INT DEFAULT 0;
    -- Process overhauls from shortest to longest period, so a higher rank overwrites a lower one
    DECLARE cur1 CURSOR FOR SELECT type, km FROM cycle_test ORDER BY km;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
    /* end declaration */

    TRUNCATE schedule_test;                       -- Get rid of any data
    SELECT MAX(km) INTO vFinal FROM cycle_test;   -- Find the final mileage of the cycle

    OPEN cur1;
    REPEAT
        FETCH cur1 INTO vTip, vKm;                -- Read record from overhaul cycle
        IF NOT done THEN
            SET acKm = 0;                         -- Initialize mileage
            REPEAT
                SET acKm = acKm + vKm;            -- Accumulate mileage
                INSERT INTO schedule_test (type, km) VALUES (vTip, acKm)
                    ON DUPLICATE KEY UPDATE type = VALUES(type); -- Insert schedule; overwrite the type if an overhaul already exists at this mileage
            UNTIL acKm = vFinal END REPEAT;       -- ... until the end of the cycle is reached
        END IF;
    UNTIL done END REPEAT;
    CLOSE cur1;
END
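If you define the procedure from the mysql command-line client rather than a GUI, the BEGIN ... END block above goes inside a CREATE PROCEDURE with a changed delimiter, and is then invoked with CALL; the name build_schedule below is just a placeholder of mine:
DELIMITER //
CREATE PROCEDURE build_schedule()
    -- the whole BEGIN ... END block above goes here
//
DELIMITER ;
CALL build_schedule();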
And this is the result I wanted, after running the procedure:
mysql> select * from schedule_test;
+------+--------+
| type | km |
+------+--------+
| R1 | 12000 |
| R2 | 24000 |
| R1 | 36000 |
| R2 | 48000 |
| R1 | 60000 |
| R3 | 72000 |
| R1 | 84000 |
| R2 | 96000 |
| R1 | 108000 |
| R2 | 120000 |
| R1 | 132000 |
| R4 | 144000 |
+------+--------+
12 rows in set (0.00 sec)
I am using libpq to connect to the Postgres server from C++ code. The Postgres server version is 12.10.
My table schema is defined below:
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
---------------------+----------+-----------+----------+------------+----------+--------------+-------------
event_id | bigint | | not null | | plain | |
event_sec | integer | | not null | | plain | |
event_usec | integer | | not null | | plain | |
event_op | smallint | | not null | | plain | |
rd | bigint | | not null | | plain | |
addr | bigint | | not null | | plain | |
masklen | bigint | | not null | | plain | |
path_id | bigint | | | | plain | |
attribs_tbl_last_id | bigint | | not null | | plain | |
attribs_tbl_next_id | bigint | | not null | | plain | |
bgp_id | bigint | | not null | | plain | |
last_lbl_stk | bytea | | not null | | extended | |
next_lbl_stk | bytea | | not null | | extended | |
last_state | smallint | | | | plain | |
next_state | smallint | | | | plain | |
pkey | integer | | not null | 1654449420 | plain | |
Partition key: LIST (pkey)
Indexes:
"event_pkey" PRIMARY KEY, btree (event_id, pkey)
"event_event_sec_event_usec_idx" btree (event_sec, event_usec)
Partitions: event_spl_1651768781 FOR VALUES IN (1651768781),
event_spl_1652029140 FOR VALUES IN (1652029140),
event_spl_1652633760 FOR VALUES IN (1652633760),
event_spl_1653372439 FOR VALUES IN (1653372439),
event_spl_1653786420 FOR VALUES IN (1653786420),
event_spl_1654449420 FOR VALUES IN (1654449420)
When I execute the following query, it takes 1-2 milliseconds. Time is provided as a parameter to the function executing this query; it contains epoch seconds and microseconds.
SELECT event_id FROM event WHERE (event_sec > time.seconds) OR ((event_sec = time.seconds) AND (event_usec >= time.useconds)) ORDER BY event_sec, event_usec LIMIT 1
This query is executed every 30 seconds on the same client connection (which is persistent for weeks). The process runs for weeks, but sometimes the same query starts taking more than 10 minutes.
If I restart the process, it recreates the connection with the server and the execution time falls back to 1-2 milliseconds. This issue is intermittent: sometimes it triggers after a week of running the process, sometimes after 2-3 weeks.
We add a new partition to the table every Sunday and write new data into the new partition.
I don't know why the performance is inconsistent; there are many possibilities we can't distinguish with the info provided. For example, does the plan change when the performance changes, or does the same plan just perform worse?
But your query is not written to take maximal advantage of the index. In my hands it can use the index for ordering, but then it still needs to read and individually skip over rows that fail the WHERE clause until it finds the first one which passes. And due to partitioning, I think it is even worse than that: it has to do this read-and-skip until it finds the first one which passes in each partition.
You could rewrite it to do a tuple comparison, which can use the index to determine both the order, and where to start:
SELECT event_id FROM event
WHERE (event_sec, event_usec) >= (:seconds, :useconds)
ORDER BY event_sec, event_usec LIMIT 1;
Now this might also degrade, or might not, or maybe will degrade but still be so fast that it doesn't matter.
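If it does degrade, it is worth capturing the plan in both the fast and the slow state to see whether the plan itself changes (with the parameters substituted by literal values), along the lines of:
EXPLAIN (ANALYZE, BUFFERS)
SELECT event_id FROM event
WHERE (event_sec, event_usec) >= (:seconds, :useconds)
ORDER BY event_sec, event_usec LIMIT 1;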
I'm curious if there is a way to write a unique constraint to support the following situation.
Suppose I have table table1 with facts about a user, with four columns:
user_id: unique id for user
source: where the detail came from
d1: dimension 1 of the fact
d2: dimension 2 of the fact
The following is an example of data in this table:
| row_id | user_id | source | d1 | d2 |
|--------|---------|--------|--------|---------|
| 1 | aaa111 | foo | bar | 123 |
| 2 | aaa111 | foo | baz | 'horse' |
| 3 | aaa111 | scrog | bar | 123 |
| 4 | bbb222 | foo | goober | 456 |
Currently, a unique constraint exists on source + d1 + d2. This is good, because it allows the same user to have duplicates of (d1,d2), as long as they have a different source.
Rows #1 and #3 demonstrate this for user aaa111.
However, this constraint does not prevent the following row from getting added...
| row_id | user_id | source | d1 | d2 |
|--------|---------|--------|--------|---------|
| 1 | aaa111 | foo | bar | 123 |
| 2 | aaa111 | foo | baz | 'horse' |
| 3 | aaa111 | scrog | bar | 123 |
| 4 | bbb222 | foo | goober | 456 |
| 5 | bbb222 | turnip | baz | 'horse' | <---- allowed new row
...because source is different for rows #2 and #5.
I would like to add a unique constraint where the combination of (d1,d2) may only exist for a single user_id.
Said another way, a single user can have as many unique (source, d1, d2) combinations as needed, but cannot share (d1,d2) with another user_id.
Is this data model fundamentally flawed for supporting this constraint, or is there a unique constraint that might help enforce it? Thanks in advance for any suggestions.
This is a conditional constraint; you can use a BEFORE INSERT OR UPDATE trigger that raises an exception when the constraint is violated:
CREATE OR REPLACE FUNCTION check_user_combination() RETURNS trigger AS
$$
DECLARE
    vCheckUser table1.user_id%TYPE;  -- match the type of user_id (the sample data uses text values)
BEGIN
    -- Look for another user that already has this (d1, d2) combination
    SELECT user_id INTO vCheckUser
    FROM table1
    WHERE d1 = NEW.d1
      AND d2 = NEW.d2
      AND user_id <> NEW.user_id;

    IF vCheckUser IS NOT NULL THEN
        RAISE EXCEPTION 'User % already has d1=% and d2=%', vCheckUser, NEW.d1, NEW.d2;
    END IF;

    RETURN NEW;
END;
$$
LANGUAGE plpgsql;

CREATE TRIGGER tr_check_combination
    BEFORE INSERT OR UPDATE ON table1
    FOR EACH ROW EXECUTE PROCEDURE check_user_combination();
This prevents inserting or updating a row that would give the same (d1, d2) combination to a different user.
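A quick sanity check against the sample data from the question (assuming rows 1-4 are already present and the columns are text): the row that was previously allowed is now rejected:
INSERT INTO table1 (user_id, source, d1, d2)
VALUES ('bbb222', 'turnip', 'baz', 'horse');
-- should fail with something like: ERROR:  User aaa111 already has d1=baz and d2=horse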
I want to create a function that can create a table, in which part of the columns is derived from the other two tables.
input table1:
This is a static table for each loan. Each loan has only one row with information related to that loan. For example, original unpaid balance, original interest rate...
| id | loan_age | ori_upb | ori_rate | ltv |
| --- | -------- | ------- | -------- | --- |
| 1 | 360 | 1500 | 4.5 | 0.6 |
| 2 | 360 | 2000 | 3.8 | 0.5 |
input table2:
This is a dynamic table for each loan. Each loan has several rows showing the loan performance in each month. For example, current unpaid balance, current interest rate, delinquency status...
| id | month| cur_upb | cur_rate |status|
| ---| --- | ------- | -------- | --- |
| 1 | 01 | 1400 | 4.5 | 0 |
| 1 | 02 | 1300 | 4.5 | 0 |
| 1 | 03 | 1200 | 4.5 | 1 |
| 2 | 01 | 2000 | 3.8 | 0 |
| 2 | 02 | 1900 | 3.8 | 0 |
| 2 | 03 | 1900 | 3.8 | 1 |
| 2 | 04 | 1900 | 3.8 | 2 |
output table:
The output table contains information from table1 and table2. Payoffupb is the last record of cur_upb in table2. This table is built for model development.
| id | loan_age | ori_upb | ori_rate | ltv | payoffmonth| payoffupb | payoffrate |lastStatus | modification |
| ---| -------- | ------- | -------- | --- | ---------- | --------- | ---------- |---------- | ------------ |
| 1 | 360 | 1500 | 4.5 | 0.6 | 03 | 1200 | 4.5 | 1 | null |
| 2 | 360 | 2000 | 3.8 | 0.5 | 04 | 1900 | 3.8 | 2 | null |
Most columns in the output table can be taken directly or derived from columns in the two input tables; the columns that cannot be derived are left blank.
My main question is how to write a function to take two tables as inputs and output another table?
I already wrote the feature transformation part for data files in 2018, but I need to do the same thing again for data files in some other years. That's why I want to create a function to make things easier.
As you want to insert the latest entry of table2 against each entry of table1, try this (DISTINCT ON (t1.id) keeps only the first row per id according to the ORDER BY, and ordering by t2.month DESC makes that row the latest month):
insert into table3 (id, loan_age, ori_upb, ori_rate, ltv,
                    payoffmonth, payoffupb, payoffrate, lastStatus)
select distinct on (t1.id)
       t1.id, t1.loan_age, t1.ori_upb, t1.ori_rate, t1.ltv,
       t2.month, t2.cur_upb, t2.cur_rate, t2.status
from table1 t1
inner join table2 t2 on t1.id = t2.id
order by t1.id, t2.month desc;
DEMO1
EDIT for your updated question:
A function to do the above, assuming the structures of table1, table2 and table3 are always identical:
create or replace function insert_values(table1 varchar, table2 varchar, table3 varchar)
returns int as $$
declare
    count_ int;
begin
    execute format('insert into %I (id, loan_age, ori_upb, ori_rate, ltv, payoffmonth, payoffupb, payoffrate, lastStatus)
                    select distinct on (t1.id) t1.id, t1.loan_age, t1.ori_upb,
                           t1.ori_rate, t1.ltv, t2.month, t2.cur_upb, t2.cur_rate, t2.status
                    from %I t1 inner join %I t2 on t1.id = t2.id
                    order by t1.id, t2.month desc', table3, table1, table2);
    GET DIAGNOSTICS count_ = ROW_COUNT;
    return count_;
end;
$$
language plpgsql;
Call the above function as shown below; it returns the number of inserted rows:
select * from insert_values('table1','table2','table3');
DEMO2
I'm using PostgreSQL and pgAdmin. I'm attempting to copy data between a staging table and a production table using INSERT INTO with a SELECT FROM statement and a to_char conversion along the way. This may or may not be the wrong approach. The SELECT fails because apparently "column i.dates does not exist".
The question is: Why am I getting 'column i.dates does not exist'?
The schema for both tables is identical except for a date conversion.
I've tried matching the schema of the tables with the exception of the to_char conversion. I've checked and double-checked that the column exists.
This is the code I'm trying:
INSERT INTO weathergrids (location, dates, temperature, rh, wd, ws, df, cu, cc)
SELECT
i.location AS location,
i.dates as dates,
i.temperature as temperature,
i.rh as rh,
i.winddir as winddir,
i.windspeed as windspeed,
i.droughtfactor as droughtfactor,
i.curing as curing,
i.cloudcover as cloudcover
FROM (
SELECT location,
to_char(to_timestamp(dates, 'YYYY-DD-MM HH24:MI'), 'HH24:MI YYYY-MM-DD HH24:MI'),
temperature, rh, wd, ws, df, cu, cc
FROM wosweathergrids
) i;
The error I'm receiving is:
ERROR: column i.dates does not exist
LINE 4: i.dates as dates,
^
SQL state: 42703
Character: 151
My data schema is like:
+-----------------+-----+-------------+-----------------------------+-----+
| TABLE | NUM | COLNAME | DATATYPE | LEN |
+-----------------+-----+-------------+-----------------------------+-----+
| weathergrids | 1 | id | integer | 32 |
| weathergrids | 2 | location | numeric | 6 |
| weathergrids | 3 | dates | timestamp without time zone | |
| weathergrids | 4 | temperature | numeric | 3 |
| weathergrids | 5 | rh | numeric | 4 |
| weathergrids | 6 | wd | numeric | 4 |
| weathergrids | 7 | wsd | numeric | 4 |
| weathergrids | 8 | df | numeric | 4 |
| weathergrids | 9 | cu | numeric | 4 |
| weathergrids | 10 | cc | numeric | 4 |
| wosweathergrids | 1 | id | integer | 32 |
| wosweathergrids | 2 | location | numeric | 6 |
| wosweathergrids | 3 | dates | character varying | 16 |
| wosweathergrids | 4 | temperature | numeric | 3 |
| wosweathergrids | 5 | rh | numeric | 4 |
| wosweathergrids | 6 | wd | numeric | 4 |
| wosweathergrids | 7 | ws | numeric | 4 |
| wosweathergrids | 8 | df | numeric | 4 |
| wosweathergrids | 9 | cu | numeric | 4 |
| wosweathergrids | 10 | cc | numeric | 4 |
+-----------------+-----+-------------+-----------------------------+-----+
Your derived table (sub-query) named i has no column named dates: the dates column is "hidden" inside the to_char() function, and since no alias is defined for that expression, no column dates is available "outside" of the derived table.
But I don't see the reason for a derived table to begin with. Also, aliasing a column with the same name is unnecessary: i.location AS location is exactly the same thing as i.location.
So your query can be simplified to:
INSERT INTO weathergrids (location, dates, temperature, rh, wd, ws, df, cu, cc)
SELECT
    location,
    to_timestamp(dates, 'YYYY-DD-MM HH24:MI'),
    temperature,
    rh,
    wd,
    ws,
    df,
    cu,
    cc
FROM wosweathergrids
You don't need to give an alias to the to_timestamp() expression, as the columns are matched by position, not by name, in an INSERT ... SELECT statement.
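If you prefer to keep the derived table, the fix is simply to give the converted expression an alias so that i.dates actually exists (a sketch using the column names from the wosweathergrids schema above):
INSERT INTO weathergrids (location, dates, temperature, rh, wd, ws, df, cu, cc)
SELECT i.location, i.dates, i.temperature, i.rh, i.wd, i.ws, i.df, i.cu, i.cc
FROM (
    SELECT location,
           to_timestamp(dates, 'YYYY-DD-MM HH24:MI') AS dates,
           temperature, rh, wd, ws, df, cu, cc
    FROM wosweathergrids
) i;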
I need help with a bit of a crazy single-query goal, please; I'm not sure whether GROUP BY or a sub-SELECT applies.
The following query:
SELECT id_finish, description, inside_rate, outside_material, id_part, id_metal
FROM parts_finishing AS pf
LEFT JOIN parts_finishing_descriptions AS fd ON (pf.id_description=fd.id);
Returns the results like the following:
+-------------+-------------+------------------+--------------------------------+
| description | inside_rate | outside_material | id_part - id_finish - id_metal |
+-------------+-------------+------------------+--------------------------------+
| Nickle | 0 | 33.44 | 4444-44-44, 5555-55-55 |
+-------------+-------------+------------------+--------------------------------+
| Bend | 11.22 | 0 | 1111-11-11 |
+-------------+-------------+------------------+--------------------------------+
| Pack | 22.33 | 0 | 2222-22-22, 3333-33-33 |
+-------------+-------------+------------------+--------------------------------+
| Zinc | 0 | 44.55 | 6000-66-66 |
+-------------+-------------+------------------+--------------------------------+
I need the results to return in the fashion below but there are catches:
I need to group by either the inside_rate column or the outside_material column, and ORDER BY the description column, but not sort by price (inside_rate and outside_material are the prices). A row belongs to one group if inside_rate is 0 and to the other group if outside_material is 0.
I need to ORDER BY the description column descending as a secondary sort, after the rows are returned per group.
I need to return a list of parts (composed of three separate columns) for that inside/outside group and price for that finishing.
+-------------+-------------+------------------+--------------------------------+
| description | inside_rate | outside_material | id_part - id_finish - id_metal |
+-------------+-------------+------------------+--------------------------------+
| Bend | 11.22 | 0 | 1111-11-11 |
+-------------+-------------+------------------+--------------------------------+
| Pack | 22.33 | 0 | 2222-22-22, 3333-33-33 |
+-------------+-------------+------------------+--------------------------------+
| Nickle | 0 | 33.44 | 4444-44-44, 5555-55-55 |
+-------------+-------------+------------------+--------------------------------+
| Zinc | 0 | 44.55 | 6000-66-66 |
+-------------+-------------+------------------+--------------------------------+
The tables I'm working with and their data types:
Table "public.parts_finishing"
Column | Type | Modifiers
------------------+---------+-------------------------------------------------------------
id | bigint | not null default nextval('parts_finishing_id_seq'::regclass)
id_part | bigint |
id_finish | bigint |
id_metal | bigint |
id_description | bigint |
date | date |
inside_hours_k | numeric |
inside_rate | numeric |
outside_material | numeric |
sort | integer |
Indexes:
"parts_finishing_pkey" PRIMARY KEY, btree (id)
Table "public.parts_finishing_descriptions"
Column | Type | Modifiers
------------+---------+------------------------------------------------------------------
id          | bigint  | not null default nextval('parts_finishing_descriptions_id_seq'::regclass)
date | date |
description | text |
rate_hour | numeric |
type | text |
Indexes:
"parts_finishing_descriptions_pkey" PRIMARY KEY, btree (id)
I'd make an SQL fiddle though it refuses to load for me regardless of the browser.
Not entirely sure I understand your question. Might look like this:
SELECT fd.description, pf.inside_rate, pf.outside_material
     , concat_ws(' - ', pf.id_part::text
                      , pf.id_finish::text
                      , pf.id_metal::text) AS id_part_finish_metal
FROM   parts_finishing pf
LEFT   JOIN parts_finishing_descriptions fd ON pf.id_description = fd.id
ORDER  BY (pf.inside_rate = 0)           -- 1. sorts group "inside_rate" first
        , fd.description DESC NULLS LAST -- 2. possible NULL values last
;