How to parse month-year string using Presto - type-conversion

I have a column that contains a Month-Year string that I would like to convert to an actual date representing the first day of the Month and Year combination. For example
+----------+------------+
| Original | Desired |
+----------+------------+
| Aug-19 | 08/01/2019 |
+----------+------------+
| Sep-20 | 09/01/2020 |
+----------+------------+
| May-22 | 05/01/2022 |
+----------+------------+
I have tried breaking apart the Month-Year string using split_part, but when I try to pass the month as a parameter into date_parse, it throws an error on the input (INVALID_FUNCTION_ARGUMENT). I could break the Month-Year apart into strings and then recombine them, hard-coding the 01; however, the problem seems to be that Presto cannot parse a three-letter month into an actual month. I also want to avoid a 12-line CASE WHEN statement to parse the month, if possible.

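Assuming the two digits after the hyphen are the year (as your Desired column suggests), parse them with %y rather than %d; the day should then default to the first of the month. The query will be like this:
select date_format(date_parse('May-22', '%b-%y'), '%m/%d/%Y')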
https://trino.io/docs/current/functions/datetime.html?mysql-date-functions
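For comparison, the same format-specifier idea can be sketched outside Presto. This is a minimal Python illustration, not Presto code: the helper name month_year_to_date is hypothetical, and it assumes English month abbreviations and that the two digits are a year in the current century.

```python
from datetime import datetime

def month_year_to_date(text):
    # '%b' matches an abbreviated month name (English locale assumed),
    # '%y' matches a two-digit year; the day defaults to 1 when it is not parsed.
    parsed = datetime.strptime(text, "%b-%y")
    return parsed.strftime("%m/%d/%Y")

for original in ["Aug-19", "Sep-20", "May-22"]:
    print(original, "->", month_year_to_date(original))
```

The specifiers mirror the MySQL-style ones used by date_parse, which is why the month abbreviation needs no CASE WHEN mapping.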

Related

Excel: Select the newest date from a list that contains multiple rows with the same ID

In Excel, I have a list with multiple rows of the same ID (column A), each with various dates recorded (Column B). I need to extract one row for each ID that contains the newest date. See below for example:
|Column A | Column B|
|(ID) | (Date) |
|-----------|-----------|
|00001 | 01/01/2022|
|00001 | 02/01/2022|
|00001 | 03/01/2022| <-- I Need this one
|00002 | 01/02/2022|
|00002 | 02/02/2022|
|00002 | 03/02/2022| <-- I Need this one
|00003 | 01/03/2022|
|00003 | 02/03/2022|
|00003 | 03/03/2022| <-- I Need this one
|00004 | 01/04/2022|
|00004 | 02/04/2022|
|00004 | 03/04/2022| <-- I Need this one
|00005 | 01/05/2022|
|00005 | 02/05/2022|
|00005 | 03/05/2022| <-- I Need this one
I need to extract the above rows, where the row with the newest date is extracted for each unique ID. It needs to look like this:
|Column A | Column B |
|(ID) | (Date) |
|----------|--------------|
|00001 | 03/01/2022 |
|00002 | 03/02/2022 |
|00003 |03/03/2022 |
|00004 | 03/04/2022 |
|00005 | 03/05/2022 |
I'm totally stumped and I can't seem to find the right answer (probably because of how I'm wording the question!)
Thank you!
Google searches for the answer - no joy. I don't know where to start in excel with this function, I thought perhaps DISTINCT or similar...
Assuming you have an Office 365 compatible version of Excel, you could do something like this:
=INDEX(SORTBY(A2:B11,B2#,-1),SEQUENCE(1,1,1,1),SEQUENCE(1,2,1,1))
This formula is more general than strictly needed: the first SEQUENCE is not really required, since only one row is returned. However, using the same formula with a leading 2 in the first argument of that SEQUENCE returns the top two dates (in descending order), and so forth.
FOR THOSE w/ Office 365 you could do something like this....
=LARGE(B2#+(ROW(B2#)-ROW(B2))/1000,1)
i.e. adding a "little bit" to the dates that we can subtract later and use as a unique reference (row number, original unsorted list)
As mentioned, reverse engineer, throw into an index, and voila!
=INDEX(A2:A11,ROUND((H2-ROUND(H2,0))*1000,6))
caveats:
the round(<>,6) is purely to eliminate Excel's irritating lack of precision issue.
can work if you're looking up text strings (i.e. attempting to sort alphabetically) EXCEPT large doesn't work with string (no prob, just use unicode - but good luck with expanding out the string etc. ☺ with mid(<>,row(a1:offset(a1,len(<>)-1)..,1)..
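The underlying logic (keep, for each ID, the row with the largest date) can be sketched in a few lines of Python. This is only an illustration of the grouping step, not an Excel formula; the function name newest_per_id is hypothetical, the rows are a subset of the question's sample data, and the dates are assumed to be in dd/mm/yyyy order.

```python
from datetime import datetime

# A subset of the sample rows from the question: (ID, date text).
rows = [
    ("00001", "01/01/2022"),
    ("00001", "02/01/2022"),
    ("00001", "03/01/2022"),
    ("00002", "01/02/2022"),
    ("00002", "02/02/2022"),
    ("00002", "03/02/2022"),
]

def newest_per_id(rows, fmt="%d/%m/%Y"):
    # Assumption: dates are dd/mm/yyyy; pass a different fmt if they are not.
    newest = {}
    for row_id, date_text in rows:
        parsed = datetime.strptime(date_text, fmt)
        # Keep this row only if it is the first or the latest seen for its ID.
        if row_id not in newest or parsed > newest[row_id][0]:
            newest[row_id] = (parsed, date_text)
    return {row_id: text for row_id, (_, text) in newest.items()}

print(newest_per_id(rows))
```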

String splitting and operations on only some results

I have strings that look like this:
schedulestart | event_labels
2018-04-04 | 9=TTR&11=DNV&14=SWW&26=DNV&2=QQQ&43=FTW
That is how they appear when I look at them in the database. I have code that relies on the string being in this format to display a schedule with events with those labels on those days.
Now I find myself needing to break down the string in postgres for reporting/analysis, and I can't really pull out the string and parse it in another language, so I have to stick to postgres.
I've figured out a way to unpack the string so my results look like this:
User ID | Schedule Start | Unpacked String
2 | 2018-04-04 | TTR
2 | 2018-04-04 | 9
2 | 2018-04-04 | DNV
2 | 2018-04-04 | 11
2 | 2018-04-04 | SWW
2 | 2018-04-04 | 14
2 | 2018-04-04 | DNV
2 | 2018-04-04 | 26
select schedulestart, unnest(string_to_array(unnest(string_to_array(event_labels, '&')), '=')) from table;
Now what I need is a way to actually perform an interval calculation (so 2018-04-04+11 days::interval), and I can if I only get a numbers list, but I need to also bind that result to each string. So the goal is an output like this:
eventdate | event_label
2018-04-12 | TTR
2018-04-20 | DNV
Where eventdate is the schedule start + which day of the schedule the event is on. I'm not sure how to take the unpacked string I created and use it to perform date calculations, and tie it to the string.
I've considered doing only one unnest, so that it's 11=TTR and 14=DNV, but I'm not sure how to take that to my desired result either. Is there a way to read a string until you reach a certain character, and then use that in calculations, and then read every character past a certain character in a string into a new column?
I'm aware completely rewriting how this is handled would be ideal, but I did not initially write it, and I don't have the time or means to rewrite the ~20 locations this is used.
Here is your table (I added userid column):
CREATE TABLE test(userid INTEGER, schedulestart DATE, event_labels VARCHAR);
And input data:
INSERT INTO test(userid,schedulestart , event_labels) VALUES
(2,DATE '2018-04-04', '9=TTR&11=DNV&14=SWW&26=DNV&2=QQQ&43=FTW');
And finally the solution:
SELECT
userid,
(schedulestart + (SPLIT_PART(kv,'=',1)||' days')::INTERVAL)::DATE AS eventdate,
SPLIT_PART(kv,'=',2) AS event_label
FROM (
SELECT
userid,schedulestart,
REGEXP_SPLIT_TO_TABLE(event_labels, '&') AS kv
FROM test
WHERE userid = 2
) a
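The same two-step split (first on '&', then on '=') can be sketched in Python to check what the query is expected to return. This is an illustration only, since the question requires staying in Postgres; the function name unpack_events is hypothetical.

```python
from datetime import date, timedelta

def unpack_events(schedule_start, event_labels):
    events = []
    # Each '&'-separated item looks like '<day offset>=<label>'.
    for pair in event_labels.split("&"):
        day_offset, label = pair.split("=", 1)
        # Add the offset in days to the schedule start, as the INTERVAL cast does.
        events.append((schedule_start + timedelta(days=int(day_offset)), label))
    return events

sample = "9=TTR&11=DNV&14=SWW&26=DNV&2=QQQ&43=FTW"
for event_date, label in unpack_events(date(2018, 4, 4), sample):
    print(event_date, label)
```

Splitting each pair only once keeps the day offset and its label together, which is what the double unnest in the question loses.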

PostgreSQL timestamp with negative

I have a set of date and time rows in varchar type which looks like this:
| TimeLocal |
| 2017-11-06 12:13:55 -21:18 |
| 2017-11-06 12:18:50 -21:18 |
| 2017-11-06 12:13:09 -21:18 |
I want to perform a conversion of this column into timestamp
Select TimeLocal::timestamp as New_Time_Local From tb1
However I am getting this error
ERROR: time zone displacement out of range
This happens only for the values with the -21:18 offset; the other values were converted successfully.
Any help would be much appreciated
Thanks!

Calculate time range in org-mode table

Given a table that has a column of time ranges e.g.:
| <2015-10-02>--<2015-10-24> |
| <2015-10-05>--<2015-10-20> |
....
how can I create a column showing the results of org-evaluate-time-range?
If I attempt something like:
#+TBLFM: $2='(org-evaluate-time-range $1)
the 2nd column is populated with
Time difference inserted
in every row.
It would also be nice to generate the same result from two different columns with, say, start date and end date instead of creating one column of time ranges out of those two.
If you have your date range split into 2 columns, a simple subtraction works and returns number of days:
| <2015-10-05> | <2015-10-20> | 15 |
| <2013-10-02 08:30> | <2015-10-24> | 751.64583 |
#+TBLFM: $3=$2-$1
Using org-evaluate-time-range is also possible, and you get a nice formatted output:
| <2015-10-02>--<2015-10-24> | 22 days |
| <2015-10-05>--<2015-10-20> | 15 days |
| <2015-10-22 Thu 21:08>--<2015-08-01> | 82 days 21 hours 8 minutes |
#+TBLFM: $2='(org-evaluate-time-range)
Note that the only optional argument that org-evaluate-time-range accepts is a flag to indicate insertion of the result in the current buffer, which you don't want.
How this function (called without arguments) gets the correct time range when evaluated is a complete mystery to me; pure magic(!)
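The day counts produced by the two-column subtraction can be cross-checked outside Org. The following is a minimal Python sketch, assuming a timestamp without a time of day means midnight; the helper name days_between is hypothetical.

```python
from datetime import datetime

def days_between(start, end):
    # Difference expressed as a (possibly fractional) number of days.
    return (end - start).total_seconds() / 86400

# <2015-10-05> to <2015-10-20>
print(days_between(datetime(2015, 10, 5), datetime(2015, 10, 20)))

# <2013-10-02 08:30> to <2015-10-24>, rounded to five decimals
print(round(days_between(datetime(2013, 10, 2, 8, 30), datetime(2015, 10, 24)), 5))
```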

How do I convert Epoch time to Date in Open Refine?

I don't care which language I use (as long as it's one of the three available in Open Refine), but I need to convert a timestamp returned from an API from epoch time to a regular date (see Expression box in the screenshot below). Not too picky about the output date format, just that it retains the date down to the second. Thanks!
Can use: GREL, Jython, or Clojure.
If you have to stick to GREL you can use the following one-liner:
inc(toDate("01/01/1970 00:00:00","dd/MM/yyyy H:m:s"),value.toNumber(),"seconds").toString('yyyy-MM-dd HH:mm:ss')
Breaking it down:
inc(date d, number value, string unit) as defined in the GREL documentation : Returns a date changed by the given amount in the given unit of time. Unit defaults to 'hour'
toDate(o, string format) : Returns o converted to a date object. (more complex uses of toDate() are shown in the GREL documentation)
We use the string "01/01/1970 00:00:00" as input for toDate() to get the start of the UNIX Epoch (January 1st 1970 midnight).
We pass the newly created date object into inc(). The second parameter is the result of value.toNumber() (assuming value is a string representation of the number of seconds since the start of the Unix Epoch), and the third parameter is the string "seconds", which tells inc() the unit of the second parameter.
We finally convert the resulting date object into a string using the format: yyyy-MM-dd HH:mm:ss
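The same "epoch start plus N seconds" idea can be sketched in plain Python to check the GREL result independently. This is an illustration under stated assumptions, not OpenRefine's implementation: the function name epoch_seconds_to_string is hypothetical, the value is assumed to be in seconds, and the output is assumed to be wanted in UTC.

```python
from datetime import datetime, timedelta, timezone

# Start of the Unix Epoch, pinned to UTC (assumption: output is wanted in UTC).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def epoch_seconds_to_string(value):
    # Mirrors inc(epoch, value.toNumber(), "seconds") followed by toString(...).
    moment = EPOCH + timedelta(seconds=int(float(value)))
    return moment.strftime("%Y-%m-%d %H:%M:%S")

print(epoch_seconds_to_string("0"))
print(epoch_seconds_to_string("86400"))
```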
Test Data
Following is a result of using the function described above to turn a series of timestamps grabbed from the Timestamp Generator into string dates.
| Name | Value | Date String |
|-----------|------------|---------------------|
| Timestamp | 1491998962 | 2017-04-12 12:09:22 |
| +1 Hour   | 1492002562 | 2017-04-12 13:09:22 |
| +1 Day    | 1492085362 | 2017-04-13 12:09:22 |
| +1 Week   | 1492603762 | 2017-04-19 12:09:22 |
| +1 Month  | 1494590962 | 2017-05-12 12:09:22 |
| +1 Year   | 1523534962 | 2018-04-12 12:09:22 |
Unfortunately, I do not think you can do it with a GREL statement like this or somesuch, but I might be pleasantly surprised by someone else that can make it work somehow:
value.toDate().toString("dd/MM/yyy")
So in the meantime, use this Jython / Python Code:
import time

# Convert 'value' to an integer, since the time module works with numbers.
# time.localtime() expects seconds; if the epoch value is in milliseconds, divide it by 1000 first.
# time.localtime() uses the machine's local time zone, which can shift the date; use time.gmtime() for UTC.
epochlong = int(float(value))
datetimestamp = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(epochlong))
return datetimestamp