Storing dates in postgresql arrays

I have a large person-oriented historical dataset, which includes birth-dates recorded in YYYY, YYYY-MM, or YYYY-MM-DD format. I've been thinking I should use a date[] array for this field because the dataset frequently lists two or more birth-dates.
The PG docs say that ISO 8601 dates are supported, and ISO 8601 accommodates reduced precision, but PG doesn't let me insert a reduced-precision date (like 1882-11 for November 1882).
So, what's the best approach for handling records that need to contain multiple birth-dates that might look like 1883, 1882-11, or 1882-12-12?

Let's imagine you have a table person with
+-----------+--------+---------+-----------------------------+
| person_id | fname  | lname   | bdate[]                     |
+-----------+--------+---------+-----------------------------+
| 1         | 'John' | 'Smith' | {1883, 1882-11, 1882-12-12} |
+-----------+--------+---------+-----------------------------+
You don't want that, because it then becomes hard to search for a single date or to update the array.
Instead, you want an additional table, birthdays:
+-----------+------+------------+
| person_id | type | bdate      |
+-----------+------+------------+
| 1         | 1    | 1883-01-01 |
| 1         | 2    | 1882-11-01 |
| 1         | 3    | 1882-12-12 |
+-----------+------+------------+
This way, even though the date is saved as 1883-01-01, type = 1 tells you that only the year (1883) is actually known.
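A minimal sketch of this design, using Python's built-in sqlite3 for illustration (the table and column names follow the answer; the convention that type 1 = year only, 2 = year-month, 3 = full date is the one assumed above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    fname TEXT,
    lname TEXT
);
CREATE TABLE birthdays (
    person_id INTEGER REFERENCES person(person_id),
    type INTEGER,   -- 1 = year only, 2 = year-month, 3 = full date
    bdate TEXT      -- always padded to a full date, e.g. 1883-01-01
);
""")
conn.execute("INSERT INTO person VALUES (1, 'John', 'Smith')")
conn.executemany("INSERT INTO birthdays VALUES (?, ?, ?)", [
    (1, 1, "1883-01-01"),   # only the year 1883 is known
    (1, 2, "1882-11-01"),   # year and month are known
    (1, 3, "1882-12-12"),   # the full date is known
])

# Searching for a date is now a plain WHERE clause, not an array scan.
rows = conn.execute(
    "SELECT type, bdate FROM birthdays WHERE person_id = 1 ORDER BY type"
).fetchall()
print(rows)  # [(1, '1883-01-01'), (2, '1882-11-01'), (3, '1882-12-12')]
```

Each candidate birth-date is one row, so adding, removing, or indexing a single date needs no array surgery.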

Related

How to represent repeating month in ISO 8601

I try to promote the use of ISO 8601. How can every month of any year, and the halves of these months, be represented in ISO 8601?
We use a perpetual calendar in Excel with two header rows: the first row holds the month names (January, February, etc.), and below it each month's column is subdivided into two, something like the following example:
| Tasks | January     | February    | March       | ...
|       |      |      |      |      |      |      | ...
| task1 |      |  X   |      |  X   |      |      | ...
| task2 |      |      |  X   |      |      |      | ...
How best can these header rows be merged into one meaningful row, written as Jan-<first half> | Jan-<second half> | etc., in an easily readable form? I think January-01 | January-02 is obviously not the answer. If this is not the right way to do it, please describe how to deal with this kind of repetition.
This question is different from the question about representing date ranges that I was redirected to, as in the latter the start and end years/dates are indicated. My question is about recurring approximate date spans.
Thanks

Combine multiple rows into single row in Google Data Prep

I have a table which has multiple payload values in separate rows. I want to combine those rows into a single row to have all the data together. Table looks something like this.
+------------+--------------+------+----+----+----+----+
| Date       | Time         | User | D1 | D2 | D3 | D4 |
+------------+--------------+------+----+----+----+----+
| 2020-04-15 | 05:39:45 UTC | A    | 2  |    |    |    |
| 2020-04-15 | 05:39:45 UTC | A    |    | 5  |    |    |
| 2020-04-15 | 05:39:45 UTC | A    |    |    | 8  |    |
| 2020-04-15 | 05:39:45 UTC | A    |    |    |    | 7  |
+------------+--------------+------+----+----+----+----+
And I want to convert it to something like this.
+------------+--------------+------+----+----+----+----+
| Date       | Time         | User | D1 | D2 | D3 | D4 |
+------------+--------------+------+----+----+----+----+
| 2020-04-15 | 05:39:45 UTC | A    | 2  | 5  | 8  | 7  |
+------------+--------------+------+----+----+----+----+
I tried "set" and "aggregate" but they didn't work as I wanted them to and I am not sure how to go forward.
Any help would be appreciated.
Thanks.
tl;dr:
use the fill() function to fill all the empty values within the D1-D4 columns in each group (i.e. grouped by the Date + Time + User columns), then dedup/aggregate to your heart's content.
long version
So the quickest way to do this is by using a window function called fill().
For each empty field in a column, it effectively says:
"Look down. Look up. Find the closest non-empty value, and copy it!"
You can of course limit its sight (look only 3 rows above, for example), but this example doesn't need the limitation, so your fill function will look like this:
FILL($col, -1, -1)
The "$col" references each of the chosen columns, and the two -1s mean "unlimited sight" in both directions. In the column selection, the "~" means "from column D1 to column D4".
Applying the function will make your columns look like this:
+------------+--------------+------+----+----+----+----+
| Date       | Time         | User | D1 | D2 | D3 | D4 |
+------------+--------------+------+----+----+----+----+
| 2020-04-15 | 05:39:45 UTC | A    | 2  | 5  | 8  | 7  |
| 2020-04-15 | 05:39:45 UTC | A    | 2  | 5  | 8  | 7  |
| 2020-04-15 | 05:39:45 UTC | A    | 2  | 5  | 8  | 7  |
| 2020-04-15 | 05:39:45 UTC | A    | 2  | 5  | 8  | 7  |
+------------+--------------+------+----+----+----+----+
Now you can use the "dedup" transformation to remove the duplicates, so that only one copy of each group remains.
Alternatively, if you still want to use "group by", you can do that as well.
Hope this helps =]
P.S.
There are more ways to do this, which entail using the "pivot" transformation and array unnesting, but in the process you'll lose your columns' names and will need to rename them.
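Outside Dataprep, the same fill-then-deduplicate idea can be sketched in plain Python (column names taken from the question; the merge keeps the first non-empty value per D column within each Date + Time + User group, mirroring what FILL($col, -1, -1) followed by dedup achieves):

```python
rows = [
    {"Date": "2020-04-15", "Time": "05:39:45 UTC", "User": "A", "D1": 2,    "D2": None, "D3": None, "D4": None},
    {"Date": "2020-04-15", "Time": "05:39:45 UTC", "User": "A", "D1": None, "D2": 5,    "D3": None, "D4": None},
    {"Date": "2020-04-15", "Time": "05:39:45 UTC", "User": "A", "D1": None, "D2": None, "D3": 8,    "D4": None},
    {"Date": "2020-04-15", "Time": "05:39:45 UTC", "User": "A", "D1": None, "D2": None, "D3": None, "D4": 7},
]

# Group on Date+Time+User; within each group, fill every empty D column
# with the value found in another row of the same group.
merged = {}
for r in rows:
    key = (r["Date"], r["Time"], r["User"])
    out = merged.setdefault(key, dict(r))
    for col in ("D1", "D2", "D3", "D4"):
        if out[col] is None:
            out[col] = r[col]

result = list(merged.values())
print(result)  # one combined row per Date+Time+User group
```

After the merge, each group collapses to a single row with D1-D4 all populated, which is exactly the target table in the question.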

when converting text/string to date in postgres, random date is generated

I have a text column indicating a date, e.g. 20170101.
UPDATE table_name
SET work_date = to_date(workdate, 'YYYYMMDD');
I used this command to convert it to a date. However, I got an odd result. I read through other existing posts, but I'm not sure what's wrong here.
+----------+------------+
| workdate | work_date  |
+----------+------------+
| 20170211 | 2207-05-09 |
| 20170930 | 2209-04-27 |
| 20170507 | 2208-02-29 |
| 20170318 | 2207-08-24 |
+----------+------------+
I think you must be mistaken about the data you are supplying to to_date.
For example, input to these functions is not restricted by normal ranges, thus to_date('20096040','YYYYMMDD') returns 2014-01-17 rather than causing an error.
Source: https://www.postgresql.org/docs/9.6/static/functions-formatting.html
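As a sanity check that a literal like 20170211 does parse cleanly under a YYYYMMDD pattern, here is the equivalent parse in Python (strptime, unlike to_date, rejects out-of-range fields instead of rolling them over):

```python
from datetime import datetime

# The same literal parsed with an explicit format string gives the expected date:
d = datetime.strptime("20170211", "%Y%m%d").date()
print(d)  # 2017-02-11

# Unlike PostgreSQL's to_date, strptime refuses out-of-range fields
# (month 60, day 40) rather than silently rolling them over:
rejected = False
try:
    datetime.strptime("20096040", "%Y%m%d")
except ValueError:
    rejected = True
print("rejected:", rejected)  # rejected: True
```

So if clean YYYYMMDD strings still produce dates like 2207-05-09, the values actually reaching to_date are worth inspecting.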

Order by date AND id, sqldeveloper

I have some tables with date and id as two of the columns:
ID | DATE    | ITEMS
1  | 7/1/13  | More Apples
2  | 6/29/13 | Carrots
1  | 6/20/13 | Apples
2  | 6/10/13 | Broccoli
I would like to order them by DATE and then group them by ID's so that all the 1's are together ordered by dates:
ID | DATE    | ITEMS
1  | 7/1/13  | More Apples
1  | 6/20/13 | Apples
2  | 6/29/13 | Carrots
2  | 6/10/13 | Broccoli
How would I accomplish this?
I'm thinking my solution might be a sub-select, but I haven't gotten anywhere close to what I want to achieve. Note that the above tables are very simplified; I'm actually trying to accomplish this with many tables joined and many different fields being displayed. Thanks.
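No sub-select is needed: this is a single ORDER BY with two sort keys, ID first and DATE second (descending). A sketch using Python's built-in sqlite3 (the table name items and the column name dt are illustrative; dt avoids the reserved word DATE):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, dt TEXT, item TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?, ?)", [
    (1, "2013-07-01", "More Apples"),
    (2, "2013-06-29", "Carrots"),
    (1, "2013-06-20", "Apples"),
    (2, "2013-06-10", "Broccoli"),
])

# Sorting by id first groups the 1s and 2s together;
# dt DESC then orders each group newest-first.
rows = conn.execute(
    "SELECT id, dt, item FROM items ORDER BY id, dt DESC"
).fetchall()
for r in rows:
    print(r)
```

The same two-key ORDER BY works unchanged at the end of a multi-table join: the sort applies to the joined result, not to any one table.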

Finding the last seven days in a time series

I have a spreadsheet with column A which holds a timestamp and updates daily. Column B holds a value. Like the following:
+--------------------+---------+
| 11/24/2012 1:14:21 | $487.20 |
| 11/25/2012 1:14:03 | $487.20 |
| 11/26/2012 1:14:14 | $487.20 |
| 11/27/2012 1:14:05 | $487.20 |
| 11/28/2012 1:13:56 | $487.20 |
| 11/29/2012 1:13:57 | $487.20 |
| 11/30/2012 1:13:53 | $487.20 |
| 12/1/2012 1:13:54 | $492.60 |
+--------------------+---------+
What I am trying to do is get the average of the last 7, 14, 30 days.
I've been playing with the GoogleClock() function to filter the dates in column A, but I can't seem to find a way to subtract TODAY - 7 days. I suspect FILTER will also help, but I'm a little bit lost.
There are a few ways to go about this; one way is to return an array of values with a QUERY function (this assumes a header row in row 1, and that you want the last 7 dates):
=QUERY(A2:B;"select B order by A desc limit 7";0)
and you can wrap this in whatever aggregation function you like:
=AVERAGE(QUERY(A2:B;"select B order by A desc limit 7";0))
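The same "sort descending, take the last n, average" logic that the QUERY formula performs can be sketched in Python for clarity (timestamps and values taken from the question; avg_last_n is an illustrative helper name):

```python
from datetime import datetime

rows = [
    ("11/24/2012 1:14:21", 487.20),
    ("11/25/2012 1:14:03", 487.20),
    ("11/26/2012 1:14:14", 487.20),
    ("11/27/2012 1:14:05", 487.20),
    ("11/28/2012 1:13:56", 487.20),
    ("11/29/2012 1:13:57", 487.20),
    ("11/30/2012 1:13:53", 487.20),
    ("12/1/2012 1:13:54", 492.60),
]

def avg_last_n(rows, n):
    # Sort by timestamp descending, keep the n most recent values, average them.
    parsed = sorted(
        ((datetime.strptime(ts, "%m/%d/%Y %H:%M:%S"), v) for ts, v in rows),
        reverse=True,
    )
    recent = [v for _, v in parsed[:n]]
    return sum(recent) / len(recent)

print(round(avg_last_n(rows, 7), 2))  # → 487.97
```

Swapping 7 for 14 or 30 gives the other windows, just as changing the limit in the QUERY formula does.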