Is it possible to limit character length with byte size for Postgres? - postgresql

I would like to know whether it is possible to limit the length of a character column by byte size in Postgres.
I would appreciate it if you could tell me.
Postgres version: 9.2.17

The length limit for a varchar column is in characters based on the encoding of the database. Unless you want to change your database to use a single-byte encoding (which I would strongly discourage), there is no direct way to do this.
What you can do is to use a check constraint that converts the character value to a byte array based on a specific encoding and then checks the length of the array:
alter table the_table
  add constraint check_byte_length
  check ( length(convert_to(the_column, 'UTF-8')) <= 42 );
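To see why a character limit and a byte limit can disagree, here is a small Python sketch of the same test the constraint performs. The helper name fits_byte_limit and the sample strings are made up for illustration; only the limit of 42 and the UTF-8 encoding come from the constraint.

```python
def fits_byte_limit(value: str, max_bytes: int = 42, encoding: str = "utf-8") -> bool:
    """Count encoded bytes rather than characters, like length(convert_to(col, 'UTF-8'))."""
    return len(value.encode(encoding)) <= max_bytes


ascii_text = "a" * 42          # 42 characters, 1 byte each in UTF-8
accented_text = "\u00e9" * 22  # 22 characters (e with acute accent), 2 bytes each in UTF-8

print(len(ascii_text), fits_byte_limit(ascii_text))        # 42 True
print(len(accented_text), fits_byte_limit(accented_text))  # 22 False
```

Both strings would fit into a varchar(42) column, but only the first one passes the byte-length check.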

Related

Does the size physically occupied by a value in a database column depend on my string size or on the column's max size?

For example, I have a varchar(2000) column for messages.
If most of my messages are about 50 characters long, is the "real place in memory" they occupy optimized?
Or does each of them occupy 2000 characters?
I use PostgreSQL.
The storage (and memory) space needed depends only on the actual data stored in the column. A column defined as varchar(2000) that contains at most 50 characters does not need more storage or memory than a column defined as varchar(50).
Quote from the manual:
If the string to be stored is shorter than the declared length, [...] values of type character varying will simply store the shorter string
(Emphasis mine)
Note that this is different for the character data type, which is padded with spaces to the declared length, but that type shouldn't be used anyway.

Inputted string must be exactly 10 characters

I am creating a column in Postgres that must only accept exactly 10 characters. VARCHAR(10) works in the sense that it limits the data to a maximum of 10 characters, but I can't figure out a way to define a minimum.
How can one do this in Postgres 11.5?
Add a constraint to the table that checks the length:
ALTER TABLE tbl ADD CONSTRAINT check_min_len CHECK (length(col) >= 10);

Can we create a column of character varying(MAX) with PostgreSQL database

I am unable to set the maximum size of a particular column in PostgreSQL with the MAX keyword. Is there any keyword like MAX? If not, how can we create the column with the maximum size?
If you want to create an "unbounded" varchar column, just use varchar without a length restriction.
From the manual:
If character varying is used without length specifier, the type accepts strings of any size
So you can use:
create table foo
(
  unlimited varchar
);
Another alternative is to use text:
create table foo
(
  unlimited text
);
More details about character data types are in the manual:
http://www.postgresql.org/docs/current/static/datatype-character.html
You should use TEXT data type for this use-case IMO.
https://www.postgresql.org/docs/9.1/datatype-character.html

Create big integer from the big end of a uuid in PostgreSQL

I have a third-party application connecting to a view in my PostgreSQL database. It requires the view to have a primary key but can't handle the UUID type (which is the primary key for the view). It also can't handle the UUID as the primary key if it is served as text from the view.
What I'd like to do is convert the UUID to a number and use that as the primary key instead. However,
SELECT x'14607158d3b14ac0b0d82a9a5a9e8f6e'::bigint
Fails because the number is out of range.
So instead, I want to use SQL to take the big end of the UUID and create an int8 / bigint. I should clarify that maintaining order is 'desirable' but I understand that some of the order will change by doing this.
I tried:
SELECT x(substring(UUID::text from 1 for 16))::bigint
but the x operator for converting hex doesn't seem to like brackets. I abstracted it into a function but
SELECT hex_to_int(substring(UUID::text from 1 for 16))::bigint
still fails.
How can I get a bigint from the 'big end' half of a UUID?
Fast and without dynamic SQL
Cast the leading 16 hex digits of a UUID in text representation as bitstring bit(64) and cast that to bigint. See:
Convert hex in text representation to decimal number
Conveniently, excess hex digits to the right are truncated in the cast to bit(64) automatically - exactly what we need.
Postgres accepts various formats for input. Your given string literal is one of them:
14607158d3b14ac0b0d82a9a5a9e8f6e
The default text representation of a UUID (and the text output in Postgres for data type uuid) adds hyphens at predefined places:
14607158-d3b1-4ac0-b0d8-2a9a5a9e8f6e
The manual:
A UUID is written as a sequence of lower-case hexadecimal digits, in
several groups separated by hyphens, specifically a group of 8 digits
followed by three groups of 4 digits followed by a group of 12 digits,
for a total of 32 digits representing the 128 bits.
If input format can vary, strip hyphens first to be sure:
SELECT ('x' || translate(uuid_as_string, '-', ''))::bit(64)::bigint;
Cast actual uuid input with uuid::text.
Note that Postgres uses signed integers, so the bigint overflows to negative numbers in the upper half of the value range, which should be irrelevant for this purpose.
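As a cross-check outside the database, the same "leading 64 bits as a signed integer" value can be computed with Python's standard uuid module. This is only a sketch for verifying results; the function name uuid_high_bigint is made up for illustration, and the second sample value is an invented UUID chosen so that its leading bit is set.

```python
import uuid


def uuid_high_bigint(value: str) -> int:
    """Return the leading 64 bits of a UUID as a signed 64-bit integer."""
    u = uuid.UUID(value)  # accepts the hex string with or without hyphens
    # The first 8 bytes are the "big end"; signed=True gives two's complement,
    # so values with the top bit set come out negative, like a bigint.
    return int.from_bytes(u.bytes[:8], byteorder="big", signed=True)


print(uuid_high_bigint("14607158-d3b1-4ac0-b0d8-2a9a5a9e8f6e"))
print(uuid_high_bigint("f4607158d3b14ac0b0d82a9a5a9e8f6e"))  # leading bit set, so negative
```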
DB design
If at all possible add a bigserial column to the underlying table and use that instead.
This is all very shaky, both the problem and the solution you describe in your self-answer.
First, a mismatch between a database design and a third-party application is always possible, but usually indicative of a deeper problem. Why does your database use the uuid data type as a PK in the first place? They are not very efficient compared to a serial or a bigserial. Typically you would use a UUID if you are working in a distributed environment where you need to "guarantee" uniqueness over multiple installations.
Secondly, why does the application require the PK to begin with (incidentally: views do not have a PK, the underlying tables do)? If it is only to view the data then a PK is rather useless, particularly if it is based on a UUID (and there is thus no conceivable relationship between the PK and the rest of the tuple). If it is used to refer to other data in the same database or do updates or deletes of existing data, then you need the exact UUID and not some extract of it because the underlying table or other relations in your database would have the exact UUID. Of course you can convert all UUIDs with the same hex_to_int() function, but that leads straight back to my point above: why use UUIDs in the first place?
Thirdly, do not mess around with things you have little or no knowledge of. This is not intended to be offensive, take it as well-meant advice (look around on the internet for programmers who tried to improve on cryptographic algorithms or random number generation by adding their own twists of obfuscation; quite entertaining reads). There are 5 algorithms for generating UUIDs in the uuid-ossp package and while you know or can easily find out which algorithm is used in your database (the uuid_generate_vX() functions in your table definitions, most likely), do you know how the algorithm works? The claim of practical uniqueness of a UUID is based on its 128 bits, not a 64-bit extract of it. Are you certain that the high 64 bits are random? My guess is that 64 consecutive bits are less random than the "square root of the randomness" (for lack of a better way to phrase the theoretical drop in periodicity of a 64-bit number compared to a 128-bit number) of the full UUID. Why? Because all but one of the algorithms are made up of randomized blocks of otherwise non-random input (such as the MAC address of a network interface, which is always the same on a machine generating millions of UUIDs). Had 64 bits been enough for randomized value uniqueness, then a UUID would have been that long.
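To put rough numbers on the difference between 64 and 128 bits, the standard birthday approximation can be evaluated in a few lines of Python. This is a best-case sketch: it assumes every bit is uniformly random, which is exactly what is in doubt for the high half of a UUID, so real collision odds for a 64-bit extract may be worse. The function name and the sample key counts are made up for illustration.

```python
import math


def collision_probability(n_keys: int, bits: int) -> float:
    """Birthday approximation: p ~ 1 - exp(-n^2 / (2 * 2^bits)) for uniformly random keys."""
    exponent = n_keys * n_keys / (2.0 * 2.0 ** bits)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-exponent)


for n in (10**6, 10**9, 2**32):
    print(n, collision_probability(n, 64), collision_probability(n, 128))
```

With about 4 billion keys (2**32), the 64-bit case reaches roughly a 39% chance of at least one collision, while the 128-bit case stays negligible.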
What a better solution would be in your case is hard to say, because it is unclear what the third-party application does with the data from your database and how dependent it is on the uniqueness of the "PK" column in the view. An approach that is likely to work if the application does more than trivially display the data without any further use of the "PK" would be to associate a bigint with every retrieved uuid in your database in a (temporary) table and include that bigint in your view by linking on the uuids in your (temporary) tables. Since you can not trigger on SELECT statements, you would need a function to generate the bigint for every uuid the application retrieves. On updates or deletes on the underlying tables of the view or upon selecting data from related tables, you look up the uuid corresponding to the bigint passed in from the application. The lookup table and function would look somewhat like this:
CREATE TEMPORARY TABLE temp_table(
  tempint bigserial PRIMARY KEY,
  internal_uuid uuid);
CREATE INDEX ON temp_table(internal_uuid);
CREATE FUNCTION temp_int_for_uuid(pk uuid) RETURNS bigint AS $$
DECLARE
  id bigint;
BEGIN
  SELECT tempint INTO id FROM temp_table WHERE internal_uuid = pk;
  IF NOT FOUND THEN
    INSERT INTO temp_table(internal_uuid) VALUES (pk)
    RETURNING tempint INTO id;
  END IF;
  RETURN id;
END; $$ LANGUAGE plpgsql STRICT;
Not pretty, not efficient, but fool-proof.
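The get-or-create logic of that lookup can be sketched in plain Python to make the behaviour explicit: the first request for a UUID assigns the next counter value, later requests return the same value, and the reverse lookup recovers the exact UUID for updates or deletes. The class and method names are hypothetical, and the in-memory dictionaries merely stand in for the temporary table.

```python
import itertools
import uuid


class SurrogateKeyMap:
    """In-memory stand-in for the lookup table: one counter value per UUID."""

    def __init__(self):
        self._next_id = itertools.count(1)  # plays the role of the bigserial column
        self._by_uuid = {}                  # uuid -> integer
        self._by_id = {}                    # integer -> uuid

    def id_for(self, key):
        """Return the integer already assigned to key, or assign the next one."""
        if key not in self._by_uuid:
            new_id = next(self._next_id)
            self._by_uuid[key] = new_id
            self._by_id[new_id] = key
        return self._by_uuid[key]

    def uuid_for(self, surrogate):
        """Reverse lookup, as needed when the application sends the integer back."""
        return self._by_id[surrogate]


mapping = SurrogateKeyMap()
first = uuid.uuid4()
print(mapping.id_for(first), mapping.id_for(first))  # same integer both times
```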
Cast to bit(64) to parse a number from a hex literal built from a substring of the UUID (given as a hex string without hyphens):
select ('x'||substr(UUID, 1, 16))::bit(64)::bigint;
Solution found.
UUID::text will return a string with hyphens. In order for substring(UUID::text from 1 for 16) to create a string that x can parse as hex the hyphens need to be stripped first.
The final query looks like:
SELECT hex_to_int(substring((select replace(id::text,'-','')) from 1 for 16))::bigint FROM table
The hex_to_int function needs to be able to handle a bigint, not just int. It looks like:
CREATE OR REPLACE FUNCTION hex_to_int(hexval character varying)
RETURNS bigint AS
$BODY$
DECLARE
  result bigint;
BEGIN
  -- Build and run a hex bit-string literal such as x'1460...'::bigint
  EXECUTE 'SELECT x''' || hexval || '''::bigint' INTO result;
  RETURN result;
END;
$BODY$ LANGUAGE plpgsql;

What is the equivalent of varbinary(10) in PostgreSQL?

I'm trying to define a bit field of 10 bytes. In SQL Server I'd use varbinary(10). I know that bytea replaces varbinary(MAX) for images, but I didn't find any documentation on limiting its length.
Is there a way to do that?
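You want to look at bit(n), not bytea. Note that n counts bits, so 10 bytes would be bit(80), or bit varying(80) for a variable-length field.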
http://www.postgresql.org/docs/current/static/datatype-bit.html