How to increment a sequence only if a column's value differs from the previous one? (PostgreSQL)

Exactly as in the title. As a really simple example, I have this table:
create table cars (
    make TEXT,
    value INTEGER,
    owner TEXT
);
With records like this:
('BMW', 100000, 'AcmeCompany')
('BMW', 350000, 'XCompany')
('Lamborgini', 500000, 'YCompany')
('Fiat', 30000, 'ZCompany')
('BMW', 200000, 'XCompany')
and I want to create a table with a makeID column (don't ask why, it's just an example :P)
create table informations (
    makeID INTEGER
    -- other stuff...
);
I have a sequence on makeID, and I have to increment the sequence ONLY when I encounter a new value of make in my SELECT (it's like a translation from string to integer; if possible I don't want sha1 or other cryptographic things, just a simple sequence!):
INSERT INTO Informations(makeID) (
-- IF I have new value: SELECT nextVal('mysequence')
-- ELSE : SELECT currval('mysequence')
)
I know it should be possible with a PL/pgSQL function (sort the table by the make column, save the last make into a temp variable, and on each INSERT compare the current make with the temp value to decide whether to increment the sequence). But is there an easier way, using just a query? Some combination of join, sort, and distinct?

Not sure I understand; it would simply be a serial column combined with SELECT DISTINCT:
create table make (
make_id serial,
make_brand text,
make_owner text
);
insert into make (make_brand, make_owner)
select distinct make, owner
from cars;
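To then fill the informations table from the question, you could join cars back against the new make table to translate each make string into its generated ID (a sketch; the informations table and its makeID column are assumed from the question):
-- look each make back up to get its serial id
insert into informations (makeID)
select m.make_id
from cars c
join make m on m.make_brand = c.make and m.make_owner = c.owner;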

Related

Use sum function in calculated column

Is it possible to use a sum function in a calculated column?
If yes, I would like to create a calculated column that computes the sum of a column in the same table, over the rows whose date is smaller than the date of this entry. Is this possible?
And lastly, would this optimize repeated reads of this value compared with the view exemplified below?
SELECT ProductGroup, SalesDate,
       (SELECT SUM(Sales)
        FROM SomeList
        WHERE ProductGroup = KVU.ProductGroup
          AND SalesDate <= KVU.SalesDate) AS cumulated
FROM SomeList AS KVU
Is it possible to use a sum function in a calculated column?
Yes, it's possible to use a scalar-valued function (scalar UDF) for your computed column, but it would be a disaster. Using scalar UDFs for computed columns destroys performance, and a scalar UDF that accesses data (which would be required here) makes things even worse.
It sounds to me like you just need a good ol' fashioned index to speed things up. First some sample data:
IF OBJECT_ID('dbo.somelist','U') IS NOT NULL DROP TABLE dbo.somelist;
GO
CREATE TABLE dbo.somelist
(
ProductGroup INT NOT NULL,
[Month] TINYINT NOT NULL CHECK ([Month] <= 12),
Sales DECIMAL(10,2) NOT NULL
);
INSERT dbo.somelist
VALUES (1,1,22),(2,1,45),(2,1,25),(2,1,19),(1,2,100),(1,2,200),(2,2,50.55);
and the correct index:
CREATE NONCLUSTERED INDEX nc_somelist ON dbo.somelist(ProductGroup,[Month])
INCLUDE (Sales);
With this index in place this query would be extremely efficient:
SELECT s.ProductGroup, s.[Month], SUM(s.Sales)
FROM dbo.somelist AS s
GROUP BY s.ProductGroup, s.[Month];
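If you still need the running total from the original query (everything up to and including each row's date), a window function avoids the correlated subquery entirely. A sketch against the sample table above (the sample uses [Month] where the question used SalesDate), assuming SQL Server 2012 or later:
SELECT s.ProductGroup, s.[Month], s.Sales,
       -- running total per product group, ordered by month;
       -- rows sharing a month are summed together (default RANGE frame)
       SUM(s.Sales) OVER (PARTITION BY s.ProductGroup
                          ORDER BY s.[Month]) AS cumulated
FROM dbo.somelist AS s;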
If you needed to get a COUNT by month & product group you could create an indexed view like so:
CREATE VIEW dbo.vw_somelist WITH SCHEMABINDING AS
SELECT s.ProductGroup, s.[Month], TotalSales = COUNT_BIG(*)
FROM dbo.somelist AS s
GROUP BY s.ProductGroup, s.[Month];
GO
CREATE UNIQUE CLUSTERED INDEX uq_cl__vw_somelist ON dbo.vw_somelist(ProductGroup, [Month]);
Once that indexed view is in place, your COUNTs will be pre-aggregated. You can include SUM in an indexed view as well, provided it is taken over a non-nullable expression and the view also includes COUNT_BIG(*); aggregates such as AVG, MIN, and MAX, however, are not allowed.
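For example (a sketch; the SUM is legal here because Sales is declared NOT NULL above):
CREATE VIEW dbo.vw_somelist_sum WITH SCHEMABINDING AS
SELECT s.ProductGroup, s.[Month],
       SUM(s.Sales) AS TotalSales, -- allowed: non-nullable expression
       COUNT_BIG(*) AS RowCnt      -- required in an indexed view with GROUP BY
FROM dbo.somelist AS s
GROUP BY s.ProductGroup, s.[Month];
GO
CREATE UNIQUE CLUSTERED INDEX uq_cl__vw_somelist_sum
ON dbo.vw_somelist_sum (ProductGroup, [Month]);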

PostgreSQL getting new id during insert

I need to create a customer number on record insert; the format is 'A' + 4 digits, based on the ID, so record ID 23 -> A0023 and so on. My current solution is this:
-- Table
create table t (
id bigserial unique primary key,
x text,
y text
);
-- Insert
insert into t (x, y)
select concat('A', lpad((currval(pg_get_serial_sequence('t','id')) + 1)::text, 4, '0')), 'test';
This works perfectly. Now my question: is that 'safe', in the sense that currval(seq) + 1 is guaranteed to be the same value the id column will receive? I think the sequence should be locked during statement execution. Is this the correct way to do it, or is there a shortcut to access the to-be-created ID directly?
Instead of storing this data, you could just compute it each time you need it, making the whole thing a lot less error-prone:
SELECT id, 'A' || LPAD(id::varchar, 4, '0')
FROM t;
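If the customer number really must be stored, one pattern that avoids guessing with currval() + 1 is to draw the ID from the sequence once and reuse it for both columns (a sketch against the table from the question):
-- fetch the next id once, then use it for both the key and the number
with new_id as (
    select nextval(pg_get_serial_sequence('t','id')) as id
)
insert into t (id, x, y)
select id, 'A' || lpad(id::text, 4, '0'), 'test'
from new_id;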

Can the categories in the postgres tablefunc crosstab() function be integers?

It's all in the title. The documentation has something like this:
SELECT *
FROM crosstab('...') AS ct(row_name text, category_1 text, category_2 text);
I have two tables, lab_tests and lab_tests_results. Every lab_tests_results row is tied to the integer primary key id in the lab_tests table. I'm trying to make a pivot table where the lab tests (identified by an integer) are row headers and the respective results fill the table, but I can't get past a syntax error at or around the integer.
Is this possible with the current setup? Am I missing something in the documentation? Or do I need to perform an inner join of sorts to turn the categories into strings, or modify lab_tests_results to use a text identifier for the lab tests?
Thanks for the help, all. Much appreciated.
Edit: Got it figured out with Dmitry's help. He had the data layout figured out, but I was unclear about what kind of output I needed: I wanted the pivot table keyed on the batch_id numbers in the lab_tests_results table. I had to hammer out the base query and cast the data types.
SELECT *
FROM crosstab('SELECT lab_tests_results.batch_id, lab_tests.test_name, lab_tests_results.test_result::FLOAT
FROM lab_tests_results, lab_tests
WHERE lab_tests.id=lab_tests_results.lab_test AND (lab_tests.test_name LIKE ''Test Name 1'' OR lab_tests.test_name LIKE ''Test Name 2'')
ORDER BY 1,2'
) AS final_result(batch_id VARCHAR, test_name_1 FLOAT, test_name_2 FLOAT);
This provides a pivot table from the lab_tests_results table like below:
batch_id |test_name_1 |test_name_2
---------------------------------------
batch1 | result1 | <null>
batch2 | result2 | result3
If I understand correctly your tables look something like this:
CREATE TABLE lab_tests (
id INTEGER PRIMARY KEY,
name VARCHAR(500)
);
CREATE TABLE lab_tests_results (
id INTEGER PRIMARY KEY,
lab_tests_id INTEGER REFERENCES lab_tests (id),
result TEXT
);
And your data looks something like this:
INSERT INTO lab_tests (id, name)
VALUES (1, 'test1'),
(2, 'test2');
INSERT INTO lab_tests_results (id, lab_tests_id, result)
VALUES (1,1,'result1'),
(2,1,'result2'),
(3,2,'result3'),
(4,2,'result4'),
(5,2,'result5');
First of all, crosstab is part of the tablefunc extension, so you need to enable it:
CREATE EXTENSION tablefunc;
This only needs to be run once per database, as per this answer.
The final query will look like this:
SELECT *
FROM crosstab(
'SELECT lt.name::TEXT, lt.id, ltr.result
FROM lab_tests AS lt
JOIN lab_tests_results ltr ON ltr.lab_tests_id = lt.id
ORDER BY 1, 2'
) AS ct(test_name text, result_1 text, result_2 text, result_3 text);
Explanation:
The crosstab() function takes the text of a query that should return 3 columns: (1) a row name, (2) a category, and (3) a value. The wrapping query simply selects everything crosstab() returns and defines the list of output columns after AS: first the row name (test_name), then the value columns (result_1, result_2, result_3). In my query I'll get up to 3 results per test; if a test has more than 3 results, the extras are dropped, and if it has fewer than 3, the remaining columns are NULL.
The result for this query is:
test_name |result_1 |result_2 |result_3
---------------------------------------
test1 |result1 |result2 |<null>
test2 |result3 |result4 |result5

PostgreSQL inserts values into the wrong columns

I want to add a denormalized table for some data from a GTFS feed. For that I created a new table:
CREATE TABLE denormalized_trips (
stops_coords json NOT NULL,
stops_object json NOT NULL,
agency_key text NOT NULL,
trip_id text NOT NULL,
route_id text NOT NULL,
service_id text NOT NULL,
shape_id text,
route_color text,
route_long_name text,
route_desc text,
direction_id text
);
CREATE INDEX denormalized_trips_index ON denormalized_trips (agency_key, trip_id);
CREATE UNIQUE INDEX denormalized_trips_unique_index ON denormalized_trips (agency_key, route_id);
Now I want to transfer data from one table to the other via an insert statement. The statement is rather complex.
INSERT INTO denormalized_trips
SELECT
trps.stops_coords,
trps.stops_object,
trps.trip_id,
trps.service_id,
trps.route_id,
trps.direction_id,
trps.agency_key,
trps.shape_id,
trps.route_color,
trps.route_long_name,
trps.route_desc
FROM (
SELECT
array_to_json(ARRAY_AGG(array[stop_lat, stop_lon])) AS stops_coords,
array_to_json(ARRAY_AGG(array[
stops.stop_id,
CAST ( stop_times.stop_sequence AS TEXT ),
stops.stop_name,
stop_times.departure_time,
CAST ( stop_times.departure_time_seconds AS TEXT ),
stop_times.arrival_time,
CAST ( stop_times.arrival_time_seconds AS TEXT )
])) AS stops_object,
trips.trip_id,
trips.service_id,
trips.direction_id,
trips.agency_key,
trips.shape_id,
routes.route_id,
routes.route_color,
routes.route_long_name,
routes.route_desc
FROM gtfs_stop_times AS stop_times
INNER JOIN gtfs_trips AS trips
ON trips.trip_id = stop_times.trip_id AND trips.agency_key = stop_times.agency_key
INNER JOIN gtfs_routes AS routes ON trips.agency_key = routes.agency_key AND routes.route_id = trips.route_id
INNER JOIN gtfs_stops AS stops
ON stops.stop_id = stop_times.stop_id
AND stops.agency_key = stop_times.agency_key
AND NOT EXISTS (
SELECT 0
FROM denormalized_max_stop_sequence AS max
WHERE max.agency_key = stop_times.agency_key
AND max.trip_id = stop_times.trip_id
AND max.trip_max = stop_times.stop_sequence
)
GROUP BY
trips.trip_id,
trips.service_id,
trips.direction_id,
trips.agency_key,
trips.shape_id,
routes.route_id,
routes.route_color,
routes.route_long_name,
routes.route_desc
) as trps
If I just run the inner SELECT statement, I get the right results; they look something like this (screenshot omitted, it's too long to show all the columns):
But if I execute the INSERT statement and display the content of the table, I get something like this (again, screenshot omitted):
As you may notice, the contents are not inserted into the right columns of the table: agency_key now has the values of trip_id, direction_id is now service_id, and more columns are mixed up in the same way.
So my question is: what am I doing wrong that makes my INSERT statement put the contents into the wrong columns of the newly created table?
Thanks for your help.
Postgres, by default, will insert your values in the order the columns are declared in the table; it has nothing to do with what your columns are named in the query.
https://www.postgresql.org/docs/9.5/static/sql-insert.html
If no list of column names is given at all, the default is all the columns of the table in their declared order; or the first N column names, if there are only N columns supplied by the VALUES clause or query.
You can alter your insert to declare the order of the columns you're inserting, or you can change the order of your select to match the order of columns in the table.
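A minimal illustration with hypothetical toy tables:
-- without the explicit column list below, 'agency1' would land in trip_id
create table src (a text, b text);
insert into src values ('agency1', 'trip1');
create table dst (trip_id text, agency_key text);
insert into dst (agency_key, trip_id)
select a, b from src;  -- a -> agency_key, b -> trip_id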

An empty row with null-like values in NOT NULL fields

I'm using PostgreSQL 9.0 beta 4.
After inserting a lot of data into a partitioned table, I found a weird thing: when I query the table, I can see an empty row with null-like values in NOT NULL fields.
The weird query result looks like this (screenshot omitted): the 689th row is empty. The first 3 fields (stid, d, ticker) make up the primary key, so they should not be null. The query I used is this:
select * from st_daily2 where stid=267408 order by d
I can even do a GROUP BY on this data:
select stid, date_trunc('month', d) ym, count(*) from st_daily2
where stid=267408 group by stid, date_trunc('month', d)
The GROUP BY result still has the empty row; there, the 1st row is empty.
But if I query for rows where stid or d is null, it returns nothing.
Is this a bug in PostgreSQL 9.0 beta 4, or some data corruption?
EDIT :
I added my table definition.
CREATE TABLE st_daily
(
stid integer NOT NULL,
d date NOT NULL,
ticker character varying(15) NOT NULL,
mp integer NOT NULL,
settlep double precision NOT NULL,
prft integer NOT NULL,
atr20 double precision NOT NULL,
upd timestamp with time zone,
ntrds double precision
)
WITH (
OIDS=FALSE
);
CREATE TABLE st_daily2
(
CONSTRAINT st_daily2_pk PRIMARY KEY (stid, d, ticker),
CONSTRAINT st_daily2_strgs_fk FOREIGN KEY (stid)
REFERENCES strgs (stid) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE,
CONSTRAINT st_daily2_ck CHECK (stid >= 200000 AND stid < 300000)
)
INHERITS (st_daily)
WITH (
OIDS=FALSE
);
The data in this table is simulation results. Multiple multithreaded simulation engines written in C# insert data into the database using Npgsql.
psql also shows the empty row.
You'd better file a report at http://www.postgresql.org/support/submitbug
Some questions:
- Could you show us the table definitions and constraints for the partitions?
- How did you load your data?
- Do you get the same result when using another tool, like psql?
The answer to your problem may very well lie in your first sentence:
I'm using postgresql 9.0 beta 4.
Why would you do that? Upgrade to a stable release, preferably the latest point release of the current version. That is 9.1.4 as of today.
I got to the same point, wondering "what in the heck is that blank value?"
No, it's not a NULL; it's -infinity.
To filter for such a row use:
WHERE
case when mytestcolumn = '-infinity'::timestamp or
mytestcolumn = 'infinity'::timestamp
then NULL else mytestcolumn end IS NULL
instead of:
WHERE mytestcolumn IS NULL
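An alternative sketch: PostgreSQL's isfinite() detects infinity timestamps directly, which avoids the CASE expression (mytable is a placeholder table name):
SELECT *
FROM mytable
WHERE mytestcolumn IS NULL            -- genuine NULLs
   OR NOT isfinite(mytestcolumn);     -- 'infinity' and '-infinity'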