Physical size of int2, int4, int8 in PostgreSQL - postgresql

I don't understand the difference between the storage sizes of the INT types (all of these types have a fixed size).
In the official manual I see this description:
The types smallint, integer, and bigint store whole numbers, that is,
numbers without fractional components, of various ranges. Attempts to
store values outside of the allowed range will result in an error.
The type integer is the common choice, as it offers the best balance
between range, storage size, and performance. The smallint type is
generally only used if disk space is at a premium. The bigint type is
designed to be used when the range of the integer type is
insufficient.
SQL only specifies the integer types integer (or int), smallint, and
bigint. The type names int2, int4, and int8 are extensions, which are
also used by some other SQL database systems.
However, a simple test shows that changing the column's type does not change the table size:
create table test_big_table_int (
f_int integer
);
INSERT INTO test_big_table_int (f_int )
SELECT ceil(random() * 1000)
FROM generate_series(1,1000000);
SELECT
pg_size_pretty(pg_total_relation_size(relid)) As "Size_of_table"
FROM pg_catalog.pg_statio_user_tables
where relname = 'test_big_table_int';
--"35 MB";
alter table test_big_table_int ALTER COLUMN f_int TYPE bigint;
SELECT
pg_size_pretty(pg_total_relation_size(relid)) As "Size_of_table"
FROM pg_catalog.pg_statio_user_tables
where relname = 'test_big_table_int';
--"35 MB";
alter table test_big_table_int ALTER COLUMN f_int TYPE smallint;
SELECT
pg_size_pretty(pg_total_relation_size(relid)) As "Size_of_table"
FROM pg_catalog.pg_statio_user_tables
where relname = 'test_big_table_int';
--"35 MB";
Every time I get the same table size: 35 MB. So where is the benefit of using integer or smallint instead of int8?
And a second question: why does Postgres rewrite the tuples when changing between integer types (int2 <-> int4 <-> int8)?
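The per-value sizes really do differ; what hides the difference in this test is row-level padding. A quick sketch using pg_column_size (the 2/4/8-byte results are the documented storage sizes; the padding note is my reading of how heap tuples are aligned):

```sql
-- The individual values have the documented fixed sizes:
SELECT pg_column_size(1::smallint);  -- 2
SELECT pg_column_size(1::integer);   -- 4
SELECT pg_column_size(1::bigint);    -- 8

-- But a heap tuple (~23-byte header + data) is padded to an
-- 8-byte boundary on most platforms, so a table with a single
-- int2/int4/int8 column can end up with the same on-disk row
-- size either way. With several integer columns per row, the
-- difference in type sizes does become visible.
```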

Related

How to create regtype column in postgreSQL that includes length and precision parameter

I would like to store some meta information in a PostgreSQL database.
This requires storing column type information. I am aware of the regtype type, but it does not store information about length or precision.
How can I achieve this? I could use a TEXT column instead, but then I would need to take care of all the validations and referential integrity myself. Is there a more convenient way to achieve this?
Below I present example code.
CREATE TABLE foo
(name TEXT,
sql_type regtype);
INSERT INTO foo
VALUES('my_field_1', 'character varying'::regtype);
INSERT INTO foo
VALUES('my_field_2', 'VARCHAR(50)'::regtype);
INSERT INTO foo
VALUES('my_field_3', 'NUMERIC(32,16)'::regtype);
SELECT * from foo;
The result is as follows:
name sql_type
text regtype
-------------------------------------
my_field_1 character varying
my_field_2 character varying
my_field_3 numeric
Expected result:
name sql_type
text regtype
-------------------------------------
my_field_1 character varying <-- I won't need such cases
my_field_2 character varying(50)
my_field_3 numeric(32,16)
I am currently using PostgreSQL 9.6
The type regtype is a convenience type that internally is just the type's numeric object identifier (OID), so it does not contain information about scale, precision, length, or other type modifiers.
I would store the type together with its modifiers as text.
But if you want, you can also do it like this:
CREATE TABLE coldef (
column_name name NOT NULL,
data_type regtype NOT NULL,
numeric_precision smallint
CHECK (numeric_precision IS NULL
OR numeric_precision BETWEEN 1 AND 1000),
numeric_scale smallint
CHECK (numeric_scale IS NULL
OR numeric_scale BETWEEN 0 AND numeric_precision),
character_maximum_length integer
CHECK (character_maximum_length IS NULL
OR character_maximum_length BETWEEN 1 AND 10485760),
datetime_precision smallint
CHECK (datetime_precision IS NULL
OR datetime_precision BETWEEN 0 AND 6),
interval_precision smallint
CHECK (interval_precision IS NULL
OR interval_precision BETWEEN 0 AND 6)
);
You can add more check constraints to make sure that there are no forbidden combinations, like a character varying with a numeric precision, or to require that numeric_precision is NOT NULL whenever numeric_scale is set.
Get inspired by information_schema.columns, the view that contains the column metadata.
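One such extra constraint could look like this (a sketch; the rule itself, that only numeric takes precision/scale modifiers in this model, is an assumption about your data):

```sql
-- Forbid precision/scale on anything that is not numeric
ALTER TABLE coldef
ADD CONSTRAINT numeric_modifiers_only_for_numeric
CHECK (data_type = 'numeric'::regtype
       OR (numeric_precision IS NULL AND numeric_scale IS NULL));
```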

Postgresql - retrieving referenced fields in a query

I have a table created like
CREATE TABLE data
(value1 smallint references labels,
value2 smallint references labels,
value3 smallint references labels,
otherdata varchar(32)
);
and a second 'label holding' table created like
CREATE TABLE labels (id serial primary key, name varchar(32));
The rationale behind it is that value1-3 are a very limited set of strings (6 options) and it seems inefficient to enter them directly in the data table as varchar types. On the other hand these do occasionally change, which makes enum types unsuitable.
My question is, how can I execute a single query such that instead of the label IDs I get the relevant labels?
I looked at creating a function for it and stumbled at the point where I needed to pass the label-holding table name to the function (there are several such label-holding tables across the schema). Do I need to create one function per label table to avoid that?
create or replace function translate
(ref_id smallint,reference_table regclass) returns varchar(128) as
$$
begin
select name from reference_table where id = ref_id;
return name;
end;
$$
language plpgsql;
And then do
select
translate(value1, labels) as foo,
translate(value2, labels) as bar
from data;
This however errors out with
ERROR: relation "reference_table" does not exist
All suggestions welcome - at this point I can still alter just about anything...
CREATE TABLE labels
( id smallserial primary key
, name varchar(32) UNIQUE -- <<-- might want this, too
);
CREATE TABLE data
( value1 smallint NOT NULL REFERENCES labels(id) -- <<-- here
, value2 smallint NOT NULL REFERENCES labels(id)
, value3 smallint NOT NULL REFERENCES labels(id)
, otherdata varchar(32)
, PRIMARY KEY (value1,value2,value3) -- <<-- added primary key here
);
-- No need for a function here.
-- For small sizes of the `labels` table, the query below will always
-- result in hash-joins to perform the lookups.
SELECT l1.name AS name1, l2.name AS name2, l3.name AS name3
, d.otherdata AS the_data
FROM data d
JOIN labels l1 ON l1.id = d.value1
JOIN labels l2 ON l2.id = d.value2
JOIN labels l3 ON l3.id = d.value3
;
Note: labels.id -> labels.name is a functional dependency (id is the primary key), but that doesn't mean that you need a function. The query just acts like a function.
You can pass the label table name as a string, construct the query dynamically, and EXECUTE it. Identifiers must be quoted with format('%I') (or quote_ident), and values are best passed with USING:
sql := format('SELECT name FROM %I WHERE id = $1', reference_table_name);
EXECUTE sql INTO name USING ref_id;
RETURN name;
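Put together as a complete function (a sketch; taking the table as regclass, as in the question's attempt, lets PostgreSQL validate the name and quote it for us):

```sql
CREATE OR REPLACE FUNCTION translate(ref_id smallint, reference_table regclass)
RETURNS varchar(128) AS
$$
DECLARE
    result varchar(128);
BEGIN
    -- %s is safe here: a regclass value renders as a properly
    -- quoted, schema-qualified identifier when interpolated
    EXECUTE format('SELECT name FROM %s WHERE id = $1', reference_table)
    INTO result
    USING ref_id;
    RETURN result;
END;
$$ LANGUAGE plpgsql;
```

Note that the table argument must then be passed as a literal, e.g. `translate(value1, 'labels')`, not as a bare identifier.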

How to create a pageable function in PostgreSQL

I have two tables: event and location
CREATE TABLE location
(
location_id bigint NOT NULL,
version bigint NOT NULL,
active boolean NOT NULL,
created timestamp without time zone NOT NULL,
latitude double precision NOT NULL,
longitude double precision NOT NULL,
updated timestamp without time zone,
CONSTRAINT location_pkey PRIMARY KEY (location_id)
)
CREATE TABLE event
(
event_id bigint NOT NULL,
version bigint NOT NULL,
active boolean NOT NULL,
created timestamp without time zone NOT NULL,
end_date date,
entry_fee numeric(19,2),
location_id bigint NOT NULL,
organizer_id bigint NOT NULL,
start_date date NOT NULL,
timetable_id bigint,
updated timestamp without time zone,
CONSTRAINT event_pkey PRIMARY KEY (event_id),
CONSTRAINT fk_organizer FOREIGN KEY (organizer_id)
REFERENCES "user" (user_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT fk_timetable FOREIGN KEY (timetable_id)
REFERENCES timetable (timetable_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT fk_location FOREIGN KEY (location_id)
REFERENCES location (location_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
Other tables are of lesser to no importance so they will not be shown (unless explicitly asked).
And for those tables, using the cube and earthdistance extensions, I've created the following function for finding all event_ids within a certain radius of a given point.
CREATE OR REPLACE FUNCTION eventidswithinradius(
lat double precision,
lng double precision,
radius double precision)
RETURNS SETOF bigint AS
$BODY$
BEGIN
RETURN QUERY SELECT event.event_id
FROM event
INNER JOIN location ON location.location_id = event.location_id
WHERE earth_box( ll_to_earth(lat, lng), radius) #> ll_to_earth(location.latitude, location.longitude);
END;
$BODY$ LANGUAGE plpgsql;
And this works as expected. Now I wish to make it pageable, and am stuck on how to get all the necessary values (the table with paged contents and total count).
So far I've created this:
CREATE OR REPLACE FUNCTION pagedeventidswithinradius(
IN lat double precision,
IN lng double precision,
IN radius double precision,
IN page_size integer,
IN page_offset integer)
RETURNS TABLE( total_size integer , event_id bigint ) AS
$BODY$
DECLARE total integer;
BEGIN
SELECT COUNT(location.*) INTO total FROM location WHERE earth_box( ll_to_earth(lat, lng), radius) #> ll_to_earth(location.latitude, location.longitude);
RETURN QUERY SELECT total, event.event_id as event_id
FROM event
INNER JOIN location ON location.location_id = event.location_id
WHERE earth_box( ll_to_earth(lat, lng), radius) #> ll_to_earth(location.latitude, location.longitude)
ORDER BY event_id
LIMIT page_size OFFSET page_offset;
END;
$BODY$ LANGUAGE plpgsql;
Here count is called only once and stored in a variable since I assumed that if I placed COUNT into the return query itself it would be called for each row.
This kind of works, but it is difficult to parse on the back-end since each result row has the form (count, event_id), and the count is needlessly repeated across all rows. I was hoping I could simply add total as an OUT parameter and have the function return the table while filling the OUT variable with the total count, but it seems this is not allowed. I can always compute the count in a separate function, but I was wondering if there is a better way to approach this issue?
No, there isn't really a better option. You want two different types of quantities so you need two queries. You can improve upon your function, however:
CREATE FUNCTION eventidswithinradius(lat float8, lng float8, radius float8) RETURNS SETOF bigint AS $BODY$
SELECT event.event_id
FROM event
JOIN location l USING (location_id)
WHERE earth_box(ll_to_earth(lat, lng), radius) #> ll_to_earth(l.latitude, l.longitude);
$BODY$ LANGUAGE sql STRICT;
As a LANGUAGE sql function it is more efficient than as a PL/pgSQL function, plus you can do your paging on the outside:
SELECT *
FROM eventidswithinradius(121.056, 14.582, 3000)
LIMIT 15 OFFSET 1;
Internally the query planner will resolve the function call to its underlying query and apply the paging directly to that level.
Get the total with the obvious:
SELECT count(*)
FROM eventidswithinradius(121.056, 14.582, 3000);

Varchar to Numeric Conversion

I have loaded the excel data into a table using SSIS.
Table Structure :
Monthly_Budget
SEQUENCE INT IDENTITY,
TRANSACTION_DATE VARCHAR(100),
TRANSACTION_REMARKS VARCHAR(1000),
WITHDRAWL_AMOUNT VARCHAR(100),
DEPOSIT_AMOUNT VARCHAR(100),
BALANCE_AMOUNT VARCHAR(100)
Values in WITHDRAWL_AMOUNT Column:
7,987.00
1,500.00
7,000.00
50.00
NULL
253.00
4,700.00
2,000.00
148.00
2,000.00
64.00
1,081.00
2,000.00
NULL
NULL
7,000.00
Now, I am trying to run a query to get the summation of the values under WITHDRAWL_AMOUNT, but I am getting an error:
Error converting data type varchar to numeric.
My Query :
SELECT SUM(CAST(ISNULL(LTRIM(RTRIM(WITHDRAWL_AMOUNT)),0) AS NUMERIC(6,2))) AS NUM FROM MONTHLY_BUDGET
Try converting them like this:
select SUM(CAST(ltrim(rtrim(replace(WITHDRAWL_AMOUNT, ',', ''))) as numeric(6, 2)))
from MONTHLY_BUDGET;
It is much, much preferable to store the values in the proper types that you want. I can, however, understand putting external data into a staging table and then using logic such as the above to load the data into the final table.
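If the column may also contain values that are not numbers at all, TRY_CONVERT (available since SQL Server 2012) returns NULL instead of raising an error, and SUM ignores NULLs; a hedged sketch:

```sql
-- TRY_CONVERT yields NULL for unparseable values rather than failing;
-- numeric(10, 2) is deliberately wider than the original numeric(6, 2)
SELECT SUM(TRY_CONVERT(numeric(10, 2),
                       REPLACE(LTRIM(RTRIM(WITHDRAWL_AMOUNT)), ',', '')))
       AS total_withdrawals
FROM MONTHLY_BUDGET;
```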

PostgreSQL, SQL state: 42601

I want to insert into a table (circuit) using a select which takes values from 2 tables (segment and wgs). My query:
INSERT INTO circuit (id_circuit, description, date_start, date_end, speed,
length, duration)
SELECT (seg.id_segment, cir.nomcircuit, seg.date_start, seg.date_end, seg.speed_average,
cir.shape_leng, (seg.date_end - seg.date_start))
FROM segment seg, wgs cir where seg.id = 13077
My Tables: circuit:
CREATE TABLE circuit
(
id serial NOT NULL,
id_circuit integer,
description character varying(50),
date_start time without time zone,
date_end time without time zone,
speed double precision,
length double precision,
duration double precision,
CONSTRAINT circuit_pkey PRIMARY KEY (id)
)
segment:
CREATE TABLE segment
(
id serial NOT NULL,
id_segment integer,
date_start timestamp without time zone,
date_end timestamp without time zone,
speed_average double precision,
mt_identity character varying,
truck_type character varying,
CONSTRAINT segment_pkey PRIMARY KEY (id)
)
wgs:
CREATE TABLE wgs
(
id serial NOT NULL,
nomcircuit character varying(50),
shape_leng numeric,
CONSTRAINT wgs_pkey PRIMARY KEY (id)
)
But when I run my query, I get this error:
ERROR: INSERT has more target columns than expressions
LINE 1: INSERT INTO circuit (id_circuit, description, dat...
^
HINT: The insertion source is a row expression containing the same number of columns
expected by the INSERT. Did you accidentally use extra parentheses?
As far as I can see, I do not have extra parentheses; I double-checked that the column data types match, and after various tries I still don't get why the error occurs.
PS: the 13077 is just to try it out with one value I'm sure I have.
This constructs an anonymous composite value:
select (1, 'a');
For example:
=> select (1, 'a');
row
-------
(1,a)
(1 row)
=> select row(1, 'a');
row
-------
(1,a)
(1 row)
Note that that is a single composite value, not multiple values.
From the fine manual:
8.16.2. Composite Value Input
To write a composite value as a literal constant, enclose the field values within parentheses and separate them by commas. You can put double quotes around any field value, and must do so if it contains commas or parentheses.
[...]
The ROW expression syntax can also be used to construct composite values. In most cases this is considerably simpler to use than the string-literal syntax since you don't have to worry about multiple layers of quoting. We already used this method above:
ROW('fuzzy dice', 42, 1.99)
ROW('', 42, NULL)
The ROW keyword is actually optional as long as you have more than one field in the expression, so these can simplify to:
('fuzzy dice', 42, 1.99)
('', 42, NULL)
The Row Constructors section might also be of interest.
When you say this:
INSERT INTO circuit (id_circuit, description, date_start, date_end, speed,
length, duration)
SELECT (...)
FROM segment seg, wgs cir where seg.id = 13077
your SELECT list has only one column, as the whole (...) expression represents a single composite value. The solution is to simply drop those parentheses:
INSERT INTO circuit (id_circuit, description, date_start, date_end, speed, length, duration)
SELECT seg.id_segment, ..., (seg.date_end - seg.date_start)
FROM segment seg, wgs cir where seg.id = 13077