Casting a character varying field into a date - PostgreSQL

I have two tables,
details
  id integer primary key
  onsetdate Date

questionnaires
  id integer primary key
  patient_id integer foreign key
  questdate Character Varying
Is it possible to write a SELECT statement that performs a JOIN on these two tables, ordering by the earliest date taken from a comparison of onsetdate and questdate? (Is it possible, for example, to cast questdate into a Date field to do this?)
Typical format for questdate is "2009-04-22"
The actual tables have an encrypted BYTEA field for the onsetdate - but I'll leave that part until later (the application is written in RoR, using 'ezcrypto' to encrypt the BYTEA field).

Something like:
SELECT ...
FROM details d
JOIN questionnaires q ON d.id = q.id
ORDER BY LEAST(decrypt_me(onsetdate), questdate::DATE)
maybe? I'm not sure about the meaning of 'id' - whether you want to join by it or by something else.
By the way, you could leave out the explicit cast, since the value is in ISO format after all.
I guess you know what to use in place of decrypt_me().

There is a date-parsing function in Postgres: http://www.postgresql.org/docs/9.0/interactive/functions-formatting.html
Look for the to_timestamp function.
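For example, a quick sketch using the date format from the question (assuming all values look like "2009-04-22"):

SELECT to_timestamp('2009-04-22', 'YYYY-MM-DD')::date;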

PostgreSQL supports the standard SQL CAST() function. (And a couple others, too.) So you can use
CAST (questdate AS DATE)
as long as all the values in the column 'questdate' evaluate to a valid date. If this database has been in production for a while, though, that's pretty unlikely. Not impossible, but pretty unlikely.
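Putting that together with the join from the question - a sketch only, assuming questionnaires.patient_id references details.id and ignoring the encrypted column for now:

SELECT d.id,
       LEAST(d.onsetdate, CAST(q.questdate AS DATE)) AS earliest
FROM details d
JOIN questionnaires q ON q.patient_id = d.id
ORDER BY earliest;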

Related

Alphanumeric sorting without any pattern on the strings [duplicate]

I've got a Postgres ORDER BY issue with the following table:
em_code  name
EM001    AAA
EM999    BBB
EM1000   CCC
To insert a new record into the table,
I select the last record with SELECT * FROM employees ORDER BY em_code DESC
Strip the alphabetic part from em_code using a regexp and store it in ec_alpha
Cast the remaining part to an integer ec_num
Increment ec_num by one
Pad with sufficient zeros and prefix ec_alpha again
When em_code reaches EM1000, the above algorithm fails.
The first step returns EM999 instead of EM1000, so it generates EM1000 as the new em_code again, breaking the unique key constraint.
Any idea how to select EM1000?
Since Postgres 10, it is possible to specify a collation which will sort columns with numbers naturally.
https://www.postgresql.org/docs/10/collation.html
-- First create a collation with numeric sorting
CREATE COLLATION numeric (provider = icu, locale = 'en@colNumeric=yes');
-- Alter table to use the collation
ALTER TABLE "employees" ALTER COLUMN "em_code" TYPE TEXT COLLATE numeric;
Now just query as you would otherwise.
SELECT * FROM employees ORDER BY em_code
On my data, I get results in this order (note that it also sorts foreign numerals):
Value
0
0001
001
1
06
6
13
۱۳
14
One approach you can take is to create a naturalsort function for this. Here's an example, written by Postgres legend RhodiumToad.
create or replace function naturalsort(text)
  returns bytea language sql immutable strict as $f$
  select string_agg(convert_to(coalesce(r[2], length(length(r[1])::text) || length(r[1])::text || r[1]), 'SQL_ASCII'), '\x00')
  from regexp_matches($1, '0*([0-9]+)|([^0-9]+)', 'g') r;
$f$;
Source: http://www.rhodiumtoad.org.uk/junk/naturalsort.sql
To use it, simply call the function in your ORDER BY:
SELECT * FROM employees ORDER BY naturalsort(em_code) DESC
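Since the function is declared immutable, it can also back an expression index - a sketch, assuming the employees table from the question:

CREATE INDEX employees_em_code_natsort_idx ON employees (naturalsort(em_code));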
The reason is that the string sorts alphabetically (instead of numerically, like you would want), and '1' sorts before '9'.
You could solve it like this:
SELECT * FROM employees
ORDER BY substring(em_code, 3)::int DESC;
It would be more efficient to drop the redundant 'EM' from your em_code - if you can - and save an integer number to begin with.
Answer to question in comment
To strip any and all non-digits from a string:
SELECT regexp_replace(em_code, E'\\D','','g')
FROM employees;
\D is the regular expression class-shorthand for "non-digits".
'g' as 4th parameter is the "globally" switch to apply the replacement to every occurrence in the string, not just the first.
After replacing every non-digit with the empty string, only digits remain.
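For example, a sketch that combines this with a cast for numeric ordering (assuming every em_code contains at least one digit, so the stripped result is never empty):

SELECT *
FROM employees
ORDER BY regexp_replace(em_code, E'\\D', '', 'g')::int DESC;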
This always comes up in questions and in my own development, and I finally tired of the tricky ways of doing this. I eventually broke down and implemented it as a PostgreSQL extension:
https://github.com/Bjond/pg_natural_sort_order
It's free to use, MIT license.
Basically it just normalizes the numerics within strings (zero-prepending them) such that you can create an index column for full-speed natural sorting. The readme explains.
The advantage is you can have a trigger do the work and not your application code. It will be calculated at machine-speed on the PostgreSQL server and migrations adding columns become simple and fast.
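As a rough illustration of that trigger idea (not the extension's actual code), one could maintain a precomputed sort-key column, here reusing the naturalsort() function from the answer above; all other names are made up:

-- keep a precomputed natural-sort key up to date on every write
ALTER TABLE employees ADD COLUMN em_code_key bytea;

CREATE FUNCTION employees_set_key() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
  NEW.em_code_key := naturalsort(NEW.em_code);  -- naturalsort() as defined above
  RETURN NEW;
END;
$$;

CREATE TRIGGER employees_key_biu
BEFORE INSERT OR UPDATE OF em_code ON employees
FOR EACH ROW EXECUTE PROCEDURE employees_set_key();

CREATE INDEX ON employees (em_code_key);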
You can use just this line:
ORDER BY length(substring(em_code FROM '[0-9]+')), em_code
I wrote about this in detail in this related question:
Humanized or natural number sorting of mixed word-and-number strings
(I'm posting this answer as a useful cross-reference only, so it's community wiki).
I came up with something slightly different.
The basic idea is to create an array of tuples (integer, string) and then order by these. The magic number 2147483647 is int32_max, used so that strings are sorted after numbers.
ORDER BY ARRAY(
  SELECT ROW(
    CAST(COALESCE(NULLIF(match[1], ''), '2147483647') AS INTEGER),
    match[2]
  )
  FROM REGEXP_MATCHES(col_to_sort_by, '(\d*)|(\D*)', 'g') AS match
)
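To see what the sort key looks like for a single value - a hedged illustration:

SELECT ARRAY(
  SELECT ROW(
    CAST(COALESCE(NULLIF(m[1], ''), '2147483647') AS INTEGER),
    m[2]
  )
  FROM REGEXP_MATCHES('EM1000', '(\d*)|(\D*)', 'g') AS m
);
-- the 'EM' chunk gets the 2147483647 sentinel and the '1000' chunk its numeric
-- value, so EM999 sorts before EM1000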
I thought about another way of doing this that uses less db storage than padding and saves time compared to calculating on the fly.
https://stackoverflow.com/a/47522040/935122
I've also put it on GitHub
https://github.com/ccsalway/dbNaturalSort
The following solution is a combination of various ideas presented in another question, as well as some ideas from the classic solution:
create function natsort(s text) returns text immutable language sql as $$
  select string_agg(r[1] || E'\x01' || lpad(r[2], 20, '0'), '')
  from regexp_matches(s, '(\D*)(\d*)', 'g') r;
$$;
The design goals of this function were simplicity and pure string operations (no custom types and no arrays), so it can easily be used as a drop-in solution, and is trivial to be indexed over.
Note: If you expect numbers with more than 20 digits, you'll have to replace the hard-coded maximum length 20 in the function with a suitable larger length. Note that this will directly affect the length of the resulting strings, so don't make that value larger than needed.
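A usage sketch (assuming the employees table from the question); because the function is immutable, the same expression can be indexed:

CREATE INDEX employees_natsort_idx ON employees (natsort(em_code));
SELECT * FROM employees ORDER BY natsort(em_code);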

Using DDL, how can the default value for a Numeric(8,0) field be set to today's date as YYYYMMDD?

I'm trying to convert this, which works:
create_timestamp for column
CREATETS TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
to something that works like this, but this code is not working:
date_created for column
DTCREATE NUMERIC(8,0) NOT NULL DEFAULT VARCHAR_FORMAT(CURRENT_TIMESTAMP, 'YYYYMMDD'),
Can anyone advise DDL to accomplish what I'm going for? Thank you.
When asking for help with Db2, always specify your Db2-server platform (z/OS, IBM i, Linux/Unix/Windows) and Db2-server version, because the answer can depend on these facts.
The default clause for a column does not accept the syntax you expect, and that is the reason you get a syntax error.
It can be a mistake to store a date as a numeric, because it causes no end of hassle for programmers, reporting tools, and data exchange. It's usually a mistake based on false assumptions.
If you want to store a date (not a timestamp) then use the column datatype DATE which lets you use:
DTCREATE DATE NOT NULL DEFAULT CURRENT DATE
How you choose, or future programmers choose , to render the value of a date on the SQL output is a different matter.
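For example - a sketch only, assuming a table MYTAB with a DATE column DTCREATE (names made up) - one could render the date as YYYYMMDD at output time:

-- TIMESTAMP() promotes the DATE so VARCHAR_FORMAT can format it
SELECT VARCHAR_FORMAT(TIMESTAMP(DTCREATE), 'YYYYMMDD') AS DTCREATE_YYYYMMDD
FROM MYTAB;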
You may instead use a BEFORE INSERT trigger to emulate a DEFAULT clause with such an otherwise unsupported expression:
CREATE TRIGGER MYTAB_BIR
BEFORE INSERT ON MYTAB
REFERENCING NEW AS N
FOR EACH ROW
WHEN (N.DATE_CREATED IS NULL)
SET N.DATE_CREATED = VARCHAR_FORMAT(CURRENT_TIMESTAMP, 'YYYYMMDD');

Postgres truncates trailing zeros for timestamps

Postgres (v11.3, 64-bit, Windows) truncates trailing zeros from timestamps. So if I insert the timestamp '2019-06-12 12:37:07.880' into the table and read it back as text, Postgres returns '2019-06-12 12:37:07.88'.
Table date_test:
CREATE TABLE public.date_test (
  id SERIAL,
  "timestamp" TIMESTAMP WITHOUT TIME ZONE NOT NULL,
  CONSTRAINT pkey_date_test PRIMARY KEY(id)
);
SQL command when inserting data:
INSERT INTO date_test (timestamp) VALUES( '2019-06-12 12:37:07.880' )
SQL command to retrieve data:
SELECT dt.timestamp::TEXT FROM date_test dt
returns '2019-06-12 12:37:07.88'
Do you consider this a bug or a feature?
My real issue is: I'm running queries from a C++ program, and I have to convert the data returned from the database into appropriate data types. Since the protocol is text-based, everything I read from the database is plain text. When parsing timestamps, I first tokenize the string and then convert each token to an integer. And because the millisecond part is truncated, the last token is "88" instead of "880", and converting "88" to an integer yields a different value than converting "880".
That's the default display format when using a cast to text.
If you want to see all three digits, use to_char()
SELECT to_char(dt.timestamp, 'yyyy-mm-dd hh24:mi:ss.ms')
FROM date_test dt;
will return 2019-06-12 12:37:07.880
It’s a matter of presentation only.
First note that 07.88 seconds and 07.880 seconds are the same amount of time (as are 7.88 and 07.880000000, for that matter).
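You can check that in SQL directly - a quick example:

SELECT '2019-06-12 12:37:07.880'::timestamp = '2019-06-12 12:37:07.88'::timestamp;
-- returns true: both literals denote the same instant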
PostgreSQL internally represents a timestamp in a way that we shouldn’t be concerned about as long as it’s an unambiguous representation. When you retrieve the timestamp, it is formatted into some string. This is where PostgreSQL apparently chooses not to print redundant trailing zeros. So it’s probably not even correct to say that it truncates anything. It just refrains from generating that 0.
I think that the nice solution would be to modify your parser in C++ to accept any number of decimals and parse them correctly with and without trailing zeroes. Another solution that should work is given in the answer by a_horse_with_no_name.

Create timestamp index from JSON on PostgreSQL

I have a table in PostgreSQL with a field named data that is jsonb with a lot of objects. I want to make an index to speed up the queries. I'm using few rows to test the data (just 15 rows), but I don't want to have problems with the queries in the future. I'm getting data from the Twitter API, so within a week I get around 10 GB of data. If I create the normal index
CREATE INDEX ON tweet((data->>'created_at'));
I get a text index. If I make:
CREATE INDEX ON tweet((CAST(data->>'created_at' AS timestamp)));
I get
ERROR: functions in index expression must be marked IMMUTABLE
I've tried to make it "immutable" by setting the time zone with
date_trunc('seconds', CAST(data->>'created_at' AS timestamp) at time zone 'GMT')
but I'm still getting the "immutable" error. So, how can I create a timestamp index from a JSON field? I know that I could add a simple column with the date, since it will probably remain constant over time, but I want to learn how to do this.
This expression won't be allowed in the index either:
(CAST(data->>'created_at' AS timestamp) at time zone 'UTC')
It's not immutable, because the first cast depends on your DateStyle setting (among other things). It doesn't help to translate the result to UTC after the cast; the uncertainty has already crept in ...
The solution is a function that makes the cast immutable by fixing the time zone (like @a_horse already hinted).
I suggest to use to_timestamp() (which is also only STABLE, not IMMUTABLE) instead of the cast to rule out some source of trouble - DateStyle being one.
CREATE OR REPLACE FUNCTION f_cast_isots(text)
RETURNS timestamptz AS
$$SELECT to_timestamp($1, 'YYYY-MM-DD HH24:MI')$$ -- adapt to your needs
LANGUAGE sql IMMUTABLE;
Note that this returns timestamptz. Then:
CREATE INDEX foo ON t (f_cast_isots(data->>'created_at'));
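A usage sketch (hypothetical data): the query has to go through the same function so the planner can match the index expression:

SELECT *
FROM t
WHERE f_cast_isots(data->>'created_at') >= timestamptz '2019-01-01 00:00+0';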
Detailed explanation for this technique in this related answer:
Does PostgreSQL support "accent insensitive" collations?
Related:
Query on a time range ignoring the date of timestamps

TSQL Default Minimum DateTime

Using Transact-SQL, is there a way to specify a default datetime on a column (in the CREATE TABLE statement) such that the datetime is the minimum possible value for datetime values?
create table atable
(
atableID int IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
Modified datetime DEFAULT XXXXX??????
)
Perhaps I should just leave it null.
As far as I am aware, no function exists to return this; you will have to hard-code it.
Attempting to cast from values such as 0 to get a minimum date will default to 1900-01-01.
As suggested previously, it is best left set to NULL (use ISNULL when reading, if you need to), or if you are worried about setting it correctly you could even set a trigger on the table to set your modified date on edits.
If you have your heart set on getting the minimum possible date then:
create table atable
(
atableID int IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
Modified datetime DEFAULT '1753-01-01'
)
I agree with the sentiment in "don't use magic values". But I would like to point out that there are times when it's legit to resort to such solutions.
There is a price to pay for setting columns nullable: NULLs are not indexable. A query like "get all records that haven't been modified since the start of 2010" includes those that have never been modified. If we use a nullable column we're thus forced to use [modified] < @cutoffDate OR [modified] IS NULL, and this in turn forces the database engine to perform a table scan, since the NULLs are not indexed. And this last can be a problem.
In practice, one should go with NULL if this does not introduce a practical, real-world performance penalty. But it can be difficult to know, unless you have some idea of realistic data volumes today and in the so-called foreseeable future. You also need to know if a large proportion of the records will have the special value - if so, there's no point in indexing it anyway.
In short, by default/rule of thumb one should go for NULL. But if there is a huge number of records, the data is frequently queried, and only a small proportion of the records have the NULL/special value, there could be a significant performance gain for locating records based on this information (provided, of course, one creates the index!), and IMHO this can at times justify the use of "magic" values.
"Perhaps I should leave it null"
Don't use magic numbers - it's bad practice - if you don't have a value, leave it null.
Otherwise, if you really want a default date, use one of the other techniques posted to set one.
Unless you are doing a DB to track historical times more than a century ago, using
Modified datetime DEFAULT ((0))
is perfectly safe and sound and allows more elegant queries than '1753-01-01' and more efficient queries than NULL.
However, since the first Modified datetime is the time at which the record was inserted, you can use:
Modified datetime NOT NULL DEFAULT (GETUTCDATE())
which avoids the whole issue and makes your inserts easier and safer - as in you don't insert it at all and SQL does the housework :-)
With that in place you can still have elegant and fast queries by using 0 as a practical minimum, since it's guaranteed to always be lower than any insert-generated GETUTCDATE().
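A quick illustration of both points, reusing the atable from the question:

SELECT CAST(0 AS DATETIME);  -- 1900-01-01 00:00:00.000, the datetime "epoch"
-- with a non-null default, range queries need no OR ... IS NULL clause:
SELECT * FROM atable WHERE Modified < '2010-01-01';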
Sometimes you inherit brittle code that is already expecting magic values in a lot of places. Everyone is correct: you should use NULL if possible. However, as a shortcut to make sure every reference to that value is the same, I like to put "constants" (for lack of a better name) in a scalar function and then call that function when I need the value. That way, if I ever want to update them all to be something else, I can do so easily. Or if I want to change the default value moving forward, I only have one place to update it.
The following code creates the function and a table that uses it for the default DateTime value, then inserts into and selects from the table without specifying a value for Modified, and finally cleans up after itself. I hope this helps.
-- CREATE FUNCTION
CREATE FUNCTION dbo.DateTime_MinValue ( )
RETURNS DATETIME
AS
BEGIN
DECLARE @dateTime_min DATETIME ;
SET @dateTime_min = '1/1/1753 12:00:00 AM' ;
RETURN @dateTime_min ;
END ;
GO
-- CREATE TABLE USING FUNCTION FOR DEFAULT
CREATE TABLE TestTable
(
TestTableId INT IDENTITY(1, 1)
PRIMARY KEY CLUSTERED ,
Value VARCHAR(50) ,
Modified DATETIME DEFAULT dbo.DateTime_MinValue()
) ;
-- INSERT VALUE INTO TABLE
INSERT INTO TestTable
( Value )
VALUES ( 'Value' ) ;
-- SELECT FROM TABLE
SELECT TestTableId ,
Value ,
Modified
FROM TestTable ;
-- CLEANUP YOUR DB
DROP TABLE TestTable ;
DROP FUNCTION dbo.DateTime_MinValue ;
I think your only option here is a constant. With that said - don't use it - stick with nulls instead of bogus dates.
create table atable
(
atableID int IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
Modified datetime DEFAULT '1/1/1753'
)
I think this would work...
create table atable
(
atableID int IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
Modified datetime DEFAULT ((0))
)
Edit: This is wrong... The minimum SQL DateTime value is 1/1/1753. My solution provides a datetime of 1/1/1900 00:00:00. Other answers have the correct minimum date...