Create timestamp index from JSON on PostgreSQL

I have a PostgreSQL table with a jsonb field named data that holds a lot of objects, and I want to create an index to speed up queries. I'm testing with just 15 rows for now, but I don't want to run into problems with the queries later: I'm pulling data from the Twitter API, so in a week I collect around 10 GB. If I create a plain index:
CREATE INDEX ON tweet((data->>'created_at'));
I get a text index. If I instead run:
Create index on tweet((CAST(data->>'created_at' AS timestamp)));
I get
ERROR: functions in index expression must be marked IMMUTABLE
I've tried to make it immutable by setting the time zone with
date_trunc('seconds', CAST(data->>'created_at' AS timestamp) at time zone 'GMT')
but I'm still getting the "immutable" error. So how can I create a timestamp index from a JSON field? I know I could add a plain column holding the date, since the value will probably never change, but I want to learn how to do it this way.

This expression won't be allowed in the index either:
(CAST(data->>'created_at' AS timestamp) at time zone 'UTC')
It's not immutable, because the first cast depends on your DateStyle setting (among other things). It doesn't help to translate the result to UTC after the function call; uncertainty has already crept in ...
The solution is a function that makes the cast immutable by fixing the time zone (like @a_horse already hinted).
I suggest using to_timestamp() (which is also only STABLE, not IMMUTABLE) instead of the cast, to rule out some sources of trouble, DateStyle being one.
CREATE OR REPLACE FUNCTION f_cast_isots(text)
RETURNS timestamptz AS
$$SELECT to_timestamp($1, 'YYYY-MM-DD HH24:MI')$$ -- adapt to your needs
LANGUAGE sql IMMUTABLE;
Note that this returns timestamptz. Then:
CREATE INDEX foo ON t (f_cast_isots(data->>'created_at'));
Detailed explanation for this technique in this related answer:
Does PostgreSQL support "accent insensitive" collations?
Related:
Query on a time range ignoring the date of timestamps
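One point worth spelling out: the planner only considers an expression index when the query repeats the indexed expression. A minimal sketch, assuming the table t, the function f_cast_isots() and the index foo from above; the date range is made up for illustration:

```sql
-- Range query that can use the expression index foo.
-- The WHERE clause has to repeat the indexed expression as written in the index.
EXPLAIN
SELECT *
FROM   t
WHERE  f_cast_isots(data->>'created_at') >= timestamptz '2016-01-01 00:00+00'
AND    f_cast_isots(data->>'created_at') <  timestamptz '2016-01-08 00:00+00';
```

With only a handful of rows the planner will most likely still pick a sequential scan; whether the index pays off only shows on a realistically sized table.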

Related

Can I use to_char() and make_date() in postgreSQL table definition?

I'm working on a PoC to migrate an on-prem SQL Server database to Amazon Aurora for PostgreSQL. Amazon's Schema Conversion Tool struggled to translate the SQL Server code for the creation of a table on this column:
[DOB] AS (CONVERT([varchar],datefromparts([DOB_year],[DOB_month],[DOB_day]),(120))) PERSISTED,
as the CONVERT function is unsupported in Postgres.
The best translation I can come up with is:
dob varchar(30) GENERATED ALWAYS AS (to_char((make_date(dob_year, dob_month, dob_day))::timestamp, 'YYYY-MM-DD HH24:MI:SS')) STORED,
but neither the SCT nor pgAdmin4 are recognising to_char() and make_date() as functions. 'dob_day', 'dob_month' and 'dob_year' are all column names with datatype of integer. I'm new to all this but another column definition is using other functions, e.g. replace() and right(), successfully, so I'm confused why this isn't working.
When I tried to run the code in pgAdmin I got this error:
ERROR: generation expression is not immutable
SQL state: 42P17
Thanks
to_char() is not marked as immutable, even though in your case it would be. There are format masks that are not immutable, e.g. when time zones or different locales are involved.
If you really want to (or are forced to) convert day, month and year into a formatted string (rather than a proper date, which would be the correct thing to do), then you can only achieve this with a custom function:
create function create_string_date(p_year int, p_month int, p_day int)
returns text
as
$$
select to_char(make_date(p_year, p_month, p_day), 'yyyy-mm-dd hh24:mi:ss');
$$
language sql
immutable;
Marking the function as immutable isn't cheating, because we know that with the given input and format string this is indeed immutable.
dob text generated always as (create_string_date(dob_year, dob_month, dob_day)) stored
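Put together, a table definition could look roughly like this. It is a sketch only: the table name person and the extra dob_date column are hypothetical, create_string_date() is the function defined above, and generated columns require PostgreSQL 12 or later.

```sql
CREATE TABLE person (
    dob_year  integer,
    dob_month integer,
    dob_day   integer,
    -- formatted string, via the custom function marked immutable above
    dob       text GENERATED ALWAYS AS
                  (create_string_date(dob_year, dob_month, dob_day)) STORED,
    -- the "proper date" alternative: make_date() is immutable by itself
    dob_date  date GENERATED ALWAYS AS
                  (make_date(dob_year, dob_month, dob_day)) STORED
);

INSERT INTO person (dob_year, dob_month, dob_day) VALUES (1990, 5, 17);

SELECT dob, dob_date FROM person;
```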

PostgreSQL: Insert Date and Time of day in the same field/box?

After performing the following:
INSERT INTO times_table (start_time, end_time) VALUES (to_date('2/3/2016 12:05',
'MM/DD/YYYY HH24:MI'), to_date('2/3/2016 15:05', 'MM/DD/YYYY HH24:MI'));
PostgreSQL only displays the date.
If possible, would I have to run a separate select statement to extract the time (i.e. 12:05 and 15:05) stored in that field? Or are the times completely discarded when the query gets executed?
I don't want to use timestamp, since I'd like to execute this in Oracle SQL as well.
to_date returns... a date. Surprise! So yeah, it's not going to give you the time.
You should be using the timestamp data type to store times and functions which return timestamps. So use to_timestamp.
Oracle also has a timestamp data type and to_timestamp function.
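As a rough sketch of that change, assuming start_time and end_time are declared as timestamp (the question does not show the table definition):

```sql
-- Assumed table definition; only the column names come from the question.
CREATE TABLE times_table (
    start_time timestamp,
    end_time   timestamp
);

-- Same format mask as in the question, but to_timestamp() keeps the time of day.
INSERT INTO times_table (start_time, end_time)
VALUES (to_timestamp('2/3/2016 12:05', 'MM/DD/YYYY HH24:MI'),
        to_timestamp('2/3/2016 15:05', 'MM/DD/YYYY HH24:MI'));

SELECT start_time, end_time FROM times_table;
```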
In general, trying to write one set of SQL that works with multiple databases results in either having to write very simple SQL that doesn't take advantage of any of the database's features, or madness.
Instead, use a SQL query builder to write your SQL for you, take care of compatibility issues, and allow you to add clauses to existing statements. For example, JavaScript has Knex.js and Perl has SQL::Abstract.

postgres index with date in where clause

I have a large table with several million rows in PostgreSQL 9.1. One of the columns is a timestamp with time zone.
A frequently used query filters with the where clause 'column > (now()::date - 11)' to fetch the last ten days.
I want to build an index that covers only the last month's data, to limit the scan: a partial index.
So far I have not figured out how to refer to the actual last month, so I started by hardcoding '2015-12-01' as the start date for the index.
create index q on test (i) where i > '2015-01-01';
This worked fine and the index was created. Unfortunately, it was not used: it treats '2015-01-01' as a ::timestamp, while the query uses a ::date. So I was back to square one.
Next I tried to modify the index to compare the column with a date, so it would match. But here I hit the immutable wall.
Since to_date and a cast to date are not immutable (they depend on the local time zone), index creation fails.
If I have a test table like this:
create table test (i timestamptz);
and then try to create index with
create index q on test (i) where i > to_date('2015-01-01','YYYY-DD-MM');
then it fails with
ERROR: functions in index predicate must be marked IMMUTABLE
This is understandable. But now, when I try it with a specific time zone
create index q on test (i) where i > to_date('2015-01-01','YYYY-DD-MM')
at time zone 'UTC';
it still fails
ERROR: functions in index predicate must be marked IMMUTABLE
This I no longer understand. The time zone is defined. What else keeps it from being immutable?
I also tried creating immutable function myself:
create or replace function immutable_date(timestamptz) returns date as $$
select ($1::date at time zone 'UTC')::date;
$$ language sql immutable;
but using this function in the index:
create index q on test (i) where i > immutable_date('2015-01-01');
fails with the same error:
ERROR: functions in index predicate must be marked IMMUTABLE
I am at a loss here. Maybe it has something to do with locales, not only time zones? Or does something else make it mutable?
Also, is there maybe another, simpler way to limit the index to the last month or two of data? Table partitioning in Postgres would require rebuilding the entire database, and so far I have not found anything else.
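A hedged sketch of one way forward, not a confirmed fix for this exact setup: PostgreSQL uses a partial index only when it can prove at plan time that the query's WHERE clause implies the index predicate. now() is not immutable, so a condition like i > now()::date - 11 cannot be matched against a constant predicate on its own. Repeating the constant cutoff in the query makes the implication provable. The index name and the cutoff value below are illustrative.

```sql
CREATE TABLE test (i timestamptz);

-- Constant cutoff with an explicit UTC offset, so the stored predicate does
-- not depend on the session time zone in effect when the index is created.
CREATE INDEX test_recent_idx ON test (i)
WHERE i > timestamptz '2015-12-01 00:00:00+00';

-- The extra constant condition matches the index predicate; the now()
-- condition then narrows the result to the last ten days.
EXPLAIN
SELECT *
FROM   test
WHERE  i > timestamptz '2015-12-01 00:00:00+00'
AND    i > now()::date - 11;
```

The cutoff is fixed when the index is created, so an index like this has to be dropped and recreated from time to time to move the window forward.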

Rectifying timeformat in PostgreSQL

I am working on third party data which I need to load into my postgresql database. I am running into problems where sometimes I get the time '24:00:30' when it actually should be '00:00:30'. This rejects the data.
I tried to cast but it did not work.
insert into stop_times_test
select trip_id, cast(arrival_time as time), feed_id, status
from external_source;
Is there any way to convert to the correct one internally?
This may work for your case:
> select '0:0:0'::time + '24:00:30'::interval;
00:00:30
Cast to interval, then cast to time:
SELECT '24:00:30'::interval::time
If you want to bulk load the data with COPY or mass INSERT make the target column data type interval and convert it to time later. This works out of the box:
ALTER TABLE mytable ALTER col1 TYPE time;
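The interval route can be sketched end to end as follows. The table name stop_times_demo and the sample rows are made up for illustration; the column names echo the question.

```sql
-- Load into an interval column first: '24:00:30' is a valid interval.
CREATE TEMP TABLE stop_times_demo (
    trip_id      text,
    arrival_time interval
);

INSERT INTO stop_times_demo (trip_id, arrival_time)
VALUES ('t1', '24:00:30'),
       ('t2', '23:15:00');

-- Convert the column afterwards; values past 24 hours wrap around midnight.
ALTER TABLE stop_times_demo ALTER arrival_time TYPE time;

SELECT trip_id, arrival_time FROM stop_times_demo;
```

Note that the wrap-around discards the information that the stop falls on the following day, which may or may not matter for the data at hand.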
No, there is no magic way of doing it. No cast will help you. 24:00:30 is an invalid time. Period.
You could try loading that value into a varchar column and then using regular expressions to fix the bad values before inserting them into the right columns. This sort of thing happens a lot when doing data transformation.

Casting character varying field into a date

I have two tables:
details
    id integer primary key
    onsetdate Date
questionnaires
    id integer primary key
    patient_id integer foreign key
    questdate Character Varying
Is it possible to write a SELECT statement that performs a JOIN on these two tables, ordering by the earlier of onsetdate and questdate? (Is it possible, for example, to cast questdate to a Date to do this?)
The typical format for questdate is "2009-04-22".
The actual tables have an encrypted BYTEA field for the onsetdate, but I'll leave that part until later (the application is written in RoR using 'ezcrypto' to encrypt the BYTEA field).
something like
SELECT...
FROM details d
JOIN questionnaires q ON d.id=q.id
ORDER BY LEAST (decrypt_me(onsetdate), questdate::DATE)
maybe? I'm not sure about the meaning of 'id', whether you want to join on it or on something else.
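By the way, the explicit cast is needed here because questdate is a character varying column; only an untyped string literal in ISO format would be converted to a date automatically.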
I guess you know what to use in place of decrypt_me()
There is a date parsing function in postgres: http://www.postgresql.org/docs/9.0/interactive/functions-formatting.html
Look for the to_timestamp function.
PostgreSQL supports the standard SQL CAST() function. (And a couple others, too.) So you can use
CAST (questdate AS DATE)
as long as all the values in the column 'questdate' evaluate to a valid date. If this database has been in production for a while, though, that's pretty unlikely. Not impossible, but pretty unlikely.
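Before relying on the cast, it may be worth looking for rows that would break it. A minimal sketch, assuming the questionnaires table from the question and that well-formed values look like "2009-04-22":

```sql
-- Rows whose questdate does not have the YYYY-MM-DD shape.
-- This checks the shape only: a value such as '2009-02-30' passes the
-- pattern but would still fail when cast to DATE.
SELECT id, questdate
FROM   questionnaires
WHERE  questdate !~ '^\d{4}-\d{2}-\d{2}$';
```

Rows where questdate is NULL are not reported by this query; the cast itself handles NULL without error.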