How to use the age() function in PostgreSQL

I have a column in the students table called birthdate. I need to find students over the age of 12.
select ......, age(timestamp 'birthdate') as StudentAge
from students
.....
where StudentAge > 11
I don't know if that's the proper syntax or if I'm using the correct function for the situation.

I think most of your confusion comes from unfamiliarity with Postgres's rich type system, and the syntax it uses.
In the page on date/time functions, the age function is listed with two forms. Assuming you want to compare to "today", you want the form with a single argument:
Function: age(timestamp)
Return type: interval
Description: Subtract from current_date (at midnight)
Example: age(timestamp '1957-06-13')
Result: 43 years 8 mons 3 days
So, you have a function which takes a value of type timestamp, and returns a value of type interval.
The example shows the input being specified as timestamp '1957-06-13'; this is just a way of creating a value of type timestamp from a hard-coded value - like creating an object in an object-oriented language. In your query, birthdate is not a hard-coded value, it's the name of a column, so this is not the syntax you want. If the column is of type timestamp, you can just use age(birthdate) directly; if not, you might need to convert it, e.g. age(CAST(birthdate AS timestamp)).
The output is of type interval, not a number of years, so comparing it against a plain number like 12 won't work (PostgreSQL has no operator for comparing an interval with an integer). Instead, you should compare it against another interval value. Similar to the timestamp '1957-06-13' example, you can write interval '12 years' to directly create an interval value representing 12 years.
So your comparison would look like age(birthdate) >= interval '12 years'.
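To make the semantics of that comparison concrete, here is a Python sketch (not PostgreSQL) of the same test. The helper names age_in_years and is_at_least are hypothetical; counting completed years like this should agree with age(birthdate) >= interval '12 years' for ordinary dates, but edge cases such as 29 February birthdays are an assumption worth checking against PostgreSQL itself.

```python
from datetime import date


def age_in_years(birthdate: date, today: date) -> int:
    """Count completed years between birthdate and today (hypothetical helper)."""
    years = today.year - birthdate.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years


def is_at_least(birthdate: date, years: int, today: date) -> bool:
    """Rough analogue of age(birthdate) >= interval 'N years'."""
    return age_in_years(birthdate, today) >= years


today = date(2022, 5, 1)
print(is_at_least(date(2010, 5, 1), 12, today))  # True: the 12th birthday is today
print(is_at_least(date(2010, 5, 2), 12, today))  # False: the birthday is tomorrow
```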

I don't know that tutorial you are talking about, but the documentation has the following to say about column labels:
The entries in the select list can be assigned names for subsequent processing, such as for use in an ORDER BY clause or for display by the client application.
Note the word "subsequent" here: the SELECT list is (logically) processed after the WHERE clause, so you cannot use column labels there.
You'll have to repeat the expression. This is in accordance with the SQL standard.
Moreover, birthdate is a column name, not a string literal, so don't quote it, and drop the timestamp keyword in front of it.

Related

PostgreSQL: increment number in output

I am extracting three values (server, region, max(date)) from my PostgreSQL database, but I want to extract an additional fourth field, which should be the third field plus 1. I am unable to use a date addition function because the date field is defined as an integer in the database.
date type in DB
date|integer|not null
I tried using a cast and date addition:
MAX(s.date)::date + cast('1 day' as interval)
Error received:
ERROR: cannot cast type integer to date
Required output:
select server, region, max(alarm_date), next date from table .....
testserver, europe, 20190901, 20190902
testserver2, europe, 20191001, 20191002
The next date value should be alarm_date plus one day.
To convert an integer like 20190901 to a date, use something like
to_date(CAST(s.date AS text), 'YYYYMMDD')
It is a bad idea to store dates as integers like that. Using the date data type will prevent corrupted data from entering the database, and it will make all operations natural.
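Adding 1 to the raw integer only works until a month boundary (20190930 + 1 would give 20190931), which is why both answers go through a real date. As a rough illustration outside PostgreSQL, here is a Python sketch of that round trip; the helper names are hypothetical, and the last step converts back to the YYYYMMDD integer layout that the required output shows.

```python
from datetime import date, datetime, timedelta


def int_to_date(value: int) -> date:
    """Parse an integer like 20190901 as YYYYMMDD, like to_date(CAST(... AS text), 'YYYYMMDD')."""
    return datetime.strptime(str(value), "%Y%m%d").date()


def date_to_int(d: date) -> int:
    """Format a date back into the YYYYMMDD integer layout."""
    return int(d.strftime("%Y%m%d"))


def next_day(value: int) -> int:
    """Return the following day, still encoded as a YYYYMMDD integer."""
    return date_to_int(int_to_date(value) + timedelta(days=1))


print(next_day(20190901))  # 20190902
print(next_day(20190930))  # 20191001: the month rolls over correctly
```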
First solution that came to my mind:
select (20190901::varchar)::date + 1
This outputs 2019-09-02 as type date.
Other solutions can be found here.

Connecting BigQuery and Google Sheets - DATE parameter issue

Following [1], I started creating a spreadsheet that reads data from BigQuery, but I'm having an issue handling parameters related to date values.
In the first sheet, I created 2 cells with 2 parameters, the start and the end of a date interval, with proper values. Both cells are formatted as "Date" value.
In the second sheet I configured the BigQuery connector; for this example, I'm using a public dataset with dates: bigquery-public-data.utility_eu.date_greg
From the BigQuery connector wizard I added:
"STARTDATE" as "PARAMETERS!B1"
"ENDDATE" as "PARAMETERS!B2"
After this configuration, this is the resulting query:
SELECT
date,
date_str,
date_int
FROM `bigquery-public-data.utility_eu.date_greg`
WHERE date > DATE(#STARTDATE) AND date < DATE(#ENDDATE)
LIMIT 10
I'm getting an error directly from the editor with this message:
> Error BigQuery: No matching signature for function DATE for argument types: INT64. Supported signatures: DATE(TIMESTAMP, [STRING]); DATE(DATETIME); DATE(INT64, INT64, INT64) at [8:14]
As far as I can understand, the "date" cells are retrieved as a number, so parsing them directly does not work. After a couple of tests, I realized that the integer is the same number I get when I change the cell format to "Number".
If you change the cell format from Date to Number, you get these values:
01/05/2019 -> 43586
31/05/2019 -> 43616
What is this number? It is not milliseconds; it increases by 1 each day. To write a query that correctly parses this integer, I need to understand what it is. (Of course, I could format the cell as "Text" and write the timestamp value directly, but I would prefer to keep the native date format so that I can use the built-in calendar.)
My reasoning (with simple math) is that this number counts the days since 30/12/1899, which seems very odd (also, every date BEFORE that day always gives 0), so I'm asking directly how to handle this value. Based on my understanding of when the counter starts (30/12/1899), I created this query, which adds the number retrieved from the cell:
SELECT *
FROM `bigquery-public-data.utility_eu.date_greg`
WHERE
date >= DATE_ADD(DATE("1899-12-30"), INTERVAL #DATAINIZIO DAY)
AND date <= DATE_ADD(DATE("1899-12-30"), INTERVAL #DATAFINE DAY)
It is working... but I think I'm doing a workaround that is not the proper way of doing this.
Also, is there any full documentation for this BigQuery connection in Google Sheets? Besides the presentation in [1], I'm unable to find any specific documentation.
Spreadsheets (Google, Excel, ...) store dates as the number of days elapsed since a starting date, with the fractional part of a day representing the time.
From here: "Excel stores dates and times as a number representing the number of days since 1900-Jan-0, plus a fractional portion of a 24 hour day: ddddd.tttttt . This is called a serial date, or serial date-time."
Now, you have two ways to filter by date in your query:
In the query, you can use DATE_ADD to add your number of days (the cell value) to the base date. (Careful: DATE_ADD takes an INT64, and the date value is a float, so it needs to be cast first.)
(Preferred) In your spreadsheet, use TEXT(cell, "yyyy-mm-dd") so that you can then use DATE() in the BigQuery query.
I use the second method: although you need that extra cell (unless you store the date directly as YYYY-MM-DD), it keeps the query cleaner than having a cast and DATE_ADD in there. It also saves you from the "1904 problem" explained in the link above.
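The serial-number arithmetic behind both methods can be sketched in Python (this is not BigQuery code). It assumes the Google Sheets convention that serial 0 is 1899-12-30, which is also what the asker worked out; the helper names are hypothetical. Truncating with int() drops the fractional time of day, and the ISO string is what TEXT(cell, "yyyy-mm-dd") would hand to DATE().

```python
from datetime import date, timedelta

# Assumed day 0 of the Google Sheets serial calendar.
SHEETS_EPOCH = date(1899, 12, 30)


def serial_to_date(serial: float) -> date:
    """Convert a spreadsheet serial number to a date, dropping the time-of-day fraction."""
    return SHEETS_EPOCH + timedelta(days=int(serial))


def serial_to_iso(serial: float) -> str:
    """Produce the same text as TEXT(cell, "yyyy-mm-dd")."""
    return serial_to_date(serial).isoformat()


print(serial_to_iso(43586))  # 2019-05-01
print(serial_to_iso(43616))  # 2019-05-31
```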
> What is this number? It is not milliseconds, it increases by 1 every next day.
This is the so-called serial number, which represents the number of days since the "very beginning".
Google Sheets' date calendar counts from 1899-12-30, which is treated as the "very beginning" (day 0).
> In order to create the proper query that can parse this int, I need to understand what is this int
Armed with the above info, you can adjust your date calculation to be in sync with what BigQuery expects.
You mentioned that your fields are already in Date format, so maybe you are doing an extra parsing step in your query.
Try it without the DATE functions.
Also, I found this other doc, not strictly about the connection, but it might be helpful: Getting info from Spreadsheets with BigQuery.

Converting string timestamp into date

I have dates in a Postgres database. The problem is that they are stored in a string field and have values like "1187222400000" (which would correspond to 07.08.2007).
I would like to convert them into readable dates using some SQL to_date() expression or something similar, but I can't come up with the correct syntax to make it work.
There really isn't enough information here for a conclusion, so I propose this 'scientific-wild-ass-guess' to resolve your puzzle. :)
It appears this number is UNIX 'epoch time' in milliseconds. I'll show this example as if your string field had the arbitrary name 'epoch_milli'. Because the field is a string, it has to be cast to bigint first. In PostgreSQL you can convert it to a timestamp using this statement:
SELECT TIMESTAMP WITH TIME ZONE 'epoch' + epoch_milli::bigint * INTERVAL '1 millisecond';
or using this built-in PostgreSQL function (dividing by 1000.0 keeps the milliseconds):
SELECT to_timestamp(epoch_milli::bigint / 1000.0)
either of which, for the example '1187222400000', produces the following result in a session whose time zone is UTC-7 (that is, 2007-08-16 00:00:00 UTC):
"2007-08-15 17:00:00-07"
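The same guess (milliseconds since the Unix epoch, in UTC) can be checked outside the database with a small Python sketch that mirrors the 'epoch' + n * interval '1 millisecond' form; the helper name is hypothetical. It also shows that the displayed value depends only on which time zone it is rendered in.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_millis_to_utc(text_value: str) -> datetime:
    """Interpret a string such as '1187222400000' as milliseconds since the Unix epoch."""
    # Integer milliseconds avoid the precision loss of dividing by 1000 as a float.
    return UNIX_EPOCH + timedelta(milliseconds=int(text_value))


ts = epoch_millis_to_utc("1187222400000")
print(ts)  # 2007-08-16 00:00:00+00:00
# Rendered in a fixed UTC-7 zone, this is the same instant as the PostgreSQL output.
print(ts.astimezone(timezone(timedelta(hours=-7))))  # 2007-08-15 17:00:00-07:00
```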
You can do some of your own sleuthing with quite a few values selected similarly to this:
SELECT to_timestamp(epoch_milli/1000)::DATE
FROM (VALUES (1187222400000),(1194122400000)) AS val(epoch_milli);
"Well, bollocks, man. I just want the date." Point taken.
Simply cast the timestamp to a date to discard the excess bits:
SELECT to_timestamp(epoch_milli::bigint / 1000)::DATE
Of course, it's possible that this value is a conversion or is relative to some other value, hence the request for a second example data point.

What are the benefits of using Postgresql Daterange type instead of two Date fields?

I'm working with PostgreSQL 9.4 and today I discovered the Daterange type. Until now I used a field startDateTime and another field endDateTime, so what would be the benefits of using the Daterange type instead?
There is nothing you can do with a tsrange (or a daterange for dates) that you can't also do with a startDateTime and an endDateTime. However, there are quite a few operators on range types that make queries far more concise and understandable. Operators like overlap &&, containment @> and <@, and adjacency -|- between two ranges are especially useful for date and time ranges. A big bonus of range types is that you can put a GiST index on them, which can make such searches much faster.
As an example, find all rows where an event takes place within a certain time range:
Start/end
SELECT *
FROM events
WHERE startDateTime >= '2016-01-01'::timestamp
AND endDateTime < '2016-01-19'::timestamp;
Range
SELECT *
FROM events
WHERE startEndRange <@ tsrange('2016-01-01'::timestamp, '2016-01-19'::timestamp);
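To show what those operators compute, here is a toy Python model of a range with the default '[)' bounds that tsrange(a, b) uses. It is only an illustration, not how PostgreSQL implements ranges, and the class and method names are hypothetical. It also treats '[x, x)' as empty and rejects a lower bound after the upper bound, as tsrange does.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class HalfOpenRange:
    """Toy model of a tsrange with '[)' bounds: lower included, upper excluded."""
    lower: datetime
    upper: datetime

    def __post_init__(self) -> None:
        # tsrange raises an error when the lower bound is after the upper bound.
        if self.lower > self.upper:
            raise ValueError("range lower bound must be less than or equal to range upper bound")

    def is_empty(self) -> bool:
        # '[x, x)' contains no points at all.
        return self.lower == self.upper

    def contained_by(self, other: "HalfOpenRange") -> bool:
        """Analogue of `self <@ other`; an empty range is contained by any range."""
        if self.is_empty():
            return True
        return other.lower <= self.lower and self.upper <= other.upper

    def overlaps(self, other: "HalfOpenRange") -> bool:
        """Analogue of `self && other`; empty ranges overlap nothing."""
        if self.is_empty() or other.is_empty():
            return False
        return self.lower < other.upper and other.lower < self.upper

    def adjacent(self, other: "HalfOpenRange") -> bool:
        """Analogue of `self -|- other` for '[)' ranges: one ends where the other starts."""
        if self.is_empty() or other.is_empty():
            return False
        return self.upper == other.lower or other.upper == self.lower


window = HalfOpenRange(datetime(2016, 1, 1), datetime(2016, 1, 19))
event = HalfOpenRange(datetime(2016, 1, 5), datetime(2016, 1, 6))
print(event.contained_by(window))  # True
print(event.overlaps(window))      # True
```

With half-open bounds, a range ending on 2016-01-19 and one starting on 2016-01-19 are adjacent but do not overlap, which is exactly the case where hand-written start/end comparisons tend to go wrong.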
Some additions to Patrick's answer:
Whether the lower/upper bound is included or excluded
The range type explicitly includes this information, and it can be different for each row.
NULL
With two separate columns, we can make each of them nullable independently.
When using a single tsrange column, we can use a check constraint: PostgreSQL 9.2 tstzrange null/infinity CONSTRAINT CHECK
Cannot get date from empty tsrange
select lower(tsrange('2020-12-01', '2020-12-01')); returns NULL
This was a gotcha for me: I don't know whether it's possible to get the start/end date back.
Updating
When you have two date-time fields, you can update each one separately.
For a tsrange you must construct a new value and take the bounds into account: see Update lower/upper bound of range type
tsrange will throw an error when the start is later than the end:
select tsrange('2020-12-20', '2020-12-19');
-- [22000] ERROR: range lower bound must be less than or equal to range upper bound
Keep in mind that range types may not be supported by some drivers, libraries, or tools; e.g. pg-promise and node-postgres have no built-in support. But there are external packages that add support, e.g. node-pg-range.

How to work with date and time in KDB

I tried to work with the dateDtimespan type by subtracting one dateDtimespan from another, but KDB (QPad) always shows 0 as the result. Why?
Also, if I have, say, the datetime 12.11.2014:22:33:00.000000000 in one column and only the time 22:32:00.000000000 in another, how can I remove the date part from the first column so that I can subtract the time in the second column?
To remove the date, you can use the cast operator, $. To reference only the time, prefix $ with `time, as shown below.
q).z.z
2015.02.23T14:10:33.523
q)`time$.z.z
14:10:30.731
q)t:([]ts:10#.z.N;ti:.z.t-til 10)
q)exec `time$ts-ti from t
00:00:00.000 00:00:00.001 00:00:00.002 00:00:00.003 00:00:00.004 00:00:00.005..
You can see more examples here. http://code.kx.com/q/ref/casting/#cast
I prefer downcasting the timestamp to a timespan first and then calculating the difference, i.e. (`timespan$p)-n. There's no harm in doing it the other way (`timespan$p-n), but it is less explicit than the former.
q)dt:( [] p:2#2014.12.11D22:33:00.000000000;n:2#22:32:00.000000000)
q)select (`timespan$p)-n from dt
p
--------------------
0D00:01:00.000000000
0D00:01:00.000000000
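For comparison, the same "drop the date, then subtract the time" idea as a Python sketch; the helper names are hypothetical, and Python's datetime only has microsecond precision, unlike q's nanosecond timespans.

```python
from datetime import datetime, time, timedelta


def time_of_day(t: time) -> timedelta:
    """Express a time of day as a duration since midnight (roughly like `timespan$ in q)."""
    return timedelta(hours=t.hour, minutes=t.minute, seconds=t.second,
                     microseconds=t.microsecond)


def time_part_diff(stamp: datetime, t: time) -> timedelta:
    """Drop the date from `stamp`, then subtract the time of day `t`."""
    return time_of_day(stamp.time()) - time_of_day(t)


print(time_part_diff(datetime(2014, 12, 11, 22, 33), time(22, 32)))  # 0:01:00
```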