This question is far from unique, but I cannot find a way to convert the strings in this DataFrame column to datetime (or date-only) objects so that I can use them as the index of my DataFrame.
How can I convert these strings to datetime or date format to use them as an index on my df?
The format of this column is as follows:
>>> data['DateTime']
0 20140101 00:00:00
1 20140101 00:00:00
3 20140101 00:00:00
4 20140101 00:00:00
5 20140101 00:00:00
6 20140101 00:00:00
7 20140101 00:00:00
8 20140101 00:00:00
9 20140101 00:00:00
10 20140101 00:00:00
Name: DateTime, Length: 3779, dtype: object
Use to_datetime to convert a string to a datetime. You can pass a format string, but in this case it seems to handle the input fine without one. Then, if you want a date only, call apply and use a lambda to call .date() on each datetime entry:
In [59]:
df = pd.DataFrame({'DateTime':['20140101 00:00:00']*10})
df
Out[59]:
DateTime
0 20140101 00:00:00
1 20140101 00:00:00
2 20140101 00:00:00
3 20140101 00:00:00
4 20140101 00:00:00
5 20140101 00:00:00
6 20140101 00:00:00
7 20140101 00:00:00
8 20140101 00:00:00
9 20140101 00:00:00
In [60]:
df['DateTime'] = pd.to_datetime(df['DateTime'])
df.dtypes
Out[60]:
DateTime datetime64[ns]
dtype: object
In [61]:
df['DateTime'] = df['DateTime'].apply(lambda x:x.date())
print(df)
df.dtypes
DateTime
0 2014-01-01
1 2014-01-01
2 2014-01-01
3 2014-01-01
4 2014-01-01
5 2014-01-01
6 2014-01-01
7 2014-01-01
8 2014-01-01
9 2014-01-01
Out[61]:
DateTime object
dtype: object
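If the automatic parsing ever guesses wrong, the format can be spelled out explicitly. Below is a minimal sketch using the standard library's datetime.strptime. The format string is my reading of the sample values above (an assumption), and the same string can be passed as the format argument of pd.to_datetime:

```python
from datetime import datetime

# Format inferred from the sample value '20140101 00:00:00' (an assumption).
FMT = "%Y%m%d %H:%M:%S"

raw = "20140101 00:00:00"
parsed = datetime.strptime(raw, FMT)  # full datetime object
only_date = parsed.date()             # date part alone

print(parsed)
print(only_date)
```

Once the column is converted in pandas, df.set_index('DateTime') makes it the index, which is what the question asks for.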
I have two tables. The first table has columns: id, start_date, and end_date. The second table has columns: id, timestamp, value. Is there a way to sum the values in table 2 based on the intervals in table 1?
Table 1:
 id | start_date          | end_date
----+---------------------+---------------------
  5 | 2000-01-01 01:00:00 | 2000-01-05 02:45:00
  5 | 2000-01-10 01:00:00 | 2000-01-15 02:45:00
  6 | 2000-01-01 01:00:00 | 2000-01-05 02:45:00
  6 | 2000-01-11 01:00:00 | 2000-01-12 02:45:00
  6 | 2000-01-15 01:00:00 | 2000-01-20 02:45:00
Table 2:
 id | timestamp           | value
----+---------------------+-------
  5 | 2000-01-01 05:00:00 | 1
  5 | 2000-01-01 06:00:00 | 2
  6 | 2000-01-01 05:00:00 | 1
  6 | 2000-01-11 05:00:00 | 2
  6 | 2000-01-15 05:00:00 | 2
  6 | 2000-01-15 05:30:00 | 2
Desired result:
 id | start_date          | end_date            | Sum
----+---------------------+---------------------+------
  5 | 2000-01-01 01:00:00 | 2000-01-05 02:45:00 | 3
  5 | 2000-01-10 01:00:00 | 2000-01-15 02:45:00 | null
  6 | 2000-01-01 01:00:00 | 2000-01-05 02:45:00 | 1
  6 | 2000-01-11 01:00:00 | 2000-01-12 02:45:00 | 2
  6 | 2000-01-15 01:00:00 | 2000-01-20 02:45:00 | 4
Try this (the LEFT JOIN keeps intervals that have no matching rows, so their sum is null):
SELECT a.id, a.start_date, a.end_date, sum(b.value) AS sum
FROM table1 AS a
LEFT JOIN table2 AS b
ON b.id = a.id
AND b.timestamp >= a.start_date
AND b.timestamp < a.end_date
GROUP BY a.id, a.start_date, a.end_date
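To check the query against the sample data, here is a small sketch that runs it in an in-memory SQLite database from Python. SQLite is my choice for illustration only (the question does not say which database is used). The timestamps are stored as text, the sum column is aliased as total, and an ORDER BY is added so the output order is predictable:

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER, start_date TEXT, end_date TEXT);
CREATE TABLE table2 (id INTEGER, timestamp TEXT, value INTEGER);
""")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [
    (5, "2000-01-01 01:00:00", "2000-01-05 02:45:00"),
    (5, "2000-01-10 01:00:00", "2000-01-15 02:45:00"),
    (6, "2000-01-01 01:00:00", "2000-01-05 02:45:00"),
    (6, "2000-01-11 01:00:00", "2000-01-12 02:45:00"),
    (6, "2000-01-15 01:00:00", "2000-01-20 02:45:00"),
])
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)", [
    (5, "2000-01-01 05:00:00", 1),
    (5, "2000-01-01 06:00:00", 2),
    (6, "2000-01-01 05:00:00", 1),
    (6, "2000-01-11 05:00:00", 2),
    (6, "2000-01-15 05:00:00", 2),
    (6, "2000-01-15 05:30:00", 2),
])

# Same join and grouping as the answer; ISO-formatted text compares
# in chronological order, so >= and < work on the TEXT columns.
rows = conn.execute("""
SELECT a.id, a.start_date, a.end_date, SUM(b.value) AS total
FROM table1 AS a
LEFT JOIN table2 AS b
  ON b.id = a.id
 AND b.timestamp >= a.start_date
 AND b.timestamp < a.end_date
GROUP BY a.id, a.start_date, a.end_date
ORDER BY a.id, a.start_date
""").fetchall()

for row in rows:
    print(row)
```

The interval with no matching rows comes back with None (SQL null) as its total, matching the desired result.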
I am trying to use a CASE WHEN statement like below to add 1 day to a timestamp based on the time part of the timestamp:
CASE WHEN to_char(pickup_date, 'HH24:MI') between 0 and 7 then y.pickup_date else dateadd(day,1,y.pickup_date) end as ead_target
pickup_Date is a timestamp with default format YYYY-MM-DD HH:MM:SS
My output
pickup_Date ead_target
2020-07-01 10:00:00 2020-07-01 10:00:00
2020-07-02 3:00:00 2020-07-02 3:00:00
When the hour of the day is between 0 and 7 then ead_target = pickup_Date ELSE add 1 day
Expected output
pickup_Date ead_target
2020-07-01 10:00:00 2020-07-02 10:00:00
2020-07-02 3:00:00 2020-07-02 3:00:00
You will want to use the date_part() function to extract the hour of the day: https://docs.aws.amazon.com/redshift/latest/dg/r_DATE_PART_function.html
to_char(pickup_date, 'HH24:MI') returns a string such as '10:00', which cannot be meaningfully compared to the numbers 0 and 7. Your case statement should work if you extract 'hour' from the timestamp and compare it to the range 0 - 7.
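The rule itself can be sketched outside SQL to make the expected output easy to check. This is an illustration in plain Python, not Redshift code, and the function name ead_target is borrowed from the column alias in the question:

```python
from datetime import datetime, timedelta

def ead_target(pickup: datetime) -> datetime:
    """Keep the timestamp if its hour is 0-7 inclusive, else add one day."""
    if 0 <= pickup.hour <= 7:
        return pickup
    return pickup + timedelta(days=1)

# The two sample rows from the question.
late = ead_target(datetime(2020, 7, 1, 10, 0, 0))
early = ead_target(datetime(2020, 7, 2, 3, 0, 0))
print(late)
print(early)
```

Comparing the integer hour, rather than a formatted string, is the same idea as extracting 'hour' with date_part() in the CASE expression.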
I have a delivery_slots table that has a from column (datetime).
Delivery slots are stored as intervals of 1 hour to 1 hour and 30 minutes, daily.
e.g. 3.00am-4.30am, 6.00am-7.30am, 9.00am-10.30am and so forth
id | from
------+---------------------
1 | 2016-01-01 03:00:00
2 | 2016-01-01 04:30:00
3 | 2016-01-01 06:00:00
4 | 2016-01-01 07:30:00
5 | 2016-01-01 09:00:00
6 | 2016-01-01 10:30:00
7 | 2016-01-01 12:00:00
8 | 2016-01-02 03:00:00
9 | 2016-01-02 04:30:00
10 | 2016-01-02 06:00:00
11 | 2016-01-02 07:30:00
12 | 2016-01-02 09:00:00
13 | 2016-01-02 10:30:00
14 | 2016-01-02 12:00:00
I'm trying to get all delivery_slots between the hours of 3.00am and 4.30am. I've got the following so far:
SELECT * FROM delivery_slots WHERE EXTRACT(HOUR FROM delivery_slots.from) >= 3 AND EXTRACT(MINUTE FROM delivery_slots.from) >= 0 AND EXTRACT(HOUR FROM delivery_slots.from) <= 4 AND EXTRACT(MINUTE FROM delivery_slots.from) <= 30;
Which kinda works. Kinda, because it is only returning delivery slots that have minutes of 00.
That's because of the last where condition (EXTRACT(MINUTE FROM delivery_slots.from) <= 30)
To give you an idea of what I am expecting:
id | from
-------+---------------------
1 | 2016-01-01 03:00:00
2 | 2016-01-01 04:30:00
8 | 2016-01-02 03:00:00
9 | 2016-01-02 04:30:00
15 | 2016-01-03 03:00:00
16 | 2016-01-03 04:30:00
etc...
Is there a better way to go about this?
Try this: (not tested)
SELECT * FROM delivery_slots WHERE delivery_slots.from::time >= '03:00:00' AND delivery_slots.from::time <= '04:30:00'
Hope this helps.
Cheers.
The easiest way to do this, in my mind, is to cast the from column to type time and filter with >= and <=, like so (here with a test table named testing and a column named date):
select * from testing where (date::time >= '3:00'::time and date::time <= '4:30'::time);
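The reason the cast works is that it compares the whole time of day at once, instead of testing hour and minute separately. A minimal Python sketch of the same comparison, using a few of the sample rows (the names slots and in_window are made up for this illustration):

```python
from datetime import datetime, time

# A subset of the sample rows: (id, from).
slots = [
    (1, datetime(2016, 1, 1, 3, 0)),
    (2, datetime(2016, 1, 1, 4, 30)),
    (3, datetime(2016, 1, 1, 6, 0)),
    (4, datetime(2016, 1, 1, 7, 30)),
    (8, datetime(2016, 1, 2, 3, 0)),
    (9, datetime(2016, 1, 2, 4, 30)),
    (10, datetime(2016, 1, 2, 6, 0)),
]

def in_window(ts: datetime, start: time = time(3, 0), end: time = time(4, 30)) -> bool:
    """True when the time-of-day part of ts lies in [start, end]."""
    return start <= ts.time() <= end

matching_ids = [slot_id for slot_id, ts in slots if in_window(ts)]
print(matching_ids)
```

Both bounds are inclusive here, mirroring the >= and <= in the SQL above.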
So I have a DataFrame object called 'df' and I'm trying to convert the 'timestamp' column into an actual readable date.
timestamp
0 1465893683657
1 1457783741932
2 1459730006393
3 1459744745346
4 1459744756375
I've tried
df['timestamp'] = pd.to_datetime(df['timestamp'],unit='s')
but this gives
timestamp
0 1970-01-01 00:24:25.893683657
1 1970-01-01 00:24:17.783741932
2 1970-01-01 00:24:19.730006393
3 1970-01-01 00:24:19.744745346
4 1970-01-01 00:24:19.744756375
which is clearly wrong since I know the date should be either this year or last year.
What am I doing wrong?
The values are milliseconds since the epoch, so the solution is to use unit ms:
print (pd.to_datetime(df.timestamp, unit='ms'))
0 2016-06-14 08:41:23.657
1 2016-03-12 11:55:41.932
2 2016-04-04 00:33:26.393
3 2016-04-04 04:39:05.346
4 2016-04-04 04:39:16.375
Name: timestamp, dtype: datetime64[ns]
You can reduce the significant digits or, better, use @jezrael's unit ('ms').
In [133]: pd.to_datetime(df.timestamp // 10**3, unit='s')
Out[133]:
0 2016-06-14 08:41:23
1 2016-03-12 11:55:41
2 2016-04-04 00:33:26
3 2016-04-04 04:39:05
4 2016-04-04 04:39:16
Name: timestamp, dtype: datetime64[ns]
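To see why the unit matters, the first value can be converted by hand with the standard library. This is only a sketch of the arithmetic, not what pandas does internally, and the helper name from_epoch_ms is made up for this example:

```python
from datetime import datetime, timedelta, timezone

# Unix epoch in UTC; the timestamps count milliseconds from this instant.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def from_epoch_ms(ms: int) -> datetime:
    """Convert integer milliseconds since the epoch to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)

first = from_epoch_ms(1465893683657)
second = from_epoch_ms(1457783741932)
print(first.isoformat())
print(second.isoformat())
```

A 13-digit epoch value in this range is a strong hint that the unit is milliseconds; a seconds-based timestamp for 2016 has 10 digits.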
I have the data below.
Date Interval
2014-01-01 12:00 AM
2014-01-01 12:30 AM
2014-01-01 1:00 AM
2014-01-01 1:30 AM
2014-01-01 2:00 AM
2014-01-01 2:30 AM
2014-01-01 3:00 AM
2014-01-01 3:30 AM
2014-01-01 4:00 AM
2014-01-01 4:30 AM
I need to extract the hour of the interval column.
I could do it using EXTRACT('hour', Interval) which gives me the hour numbers as int.
Result will be as follows:
Date Interval HourCount
2014-01-01 12:00 AM 0
2014-01-01 12:30 AM 0
2014-01-01 1:00 AM 1
2014-01-01 1:30 AM 1
2014-01-01 2:00 AM 2
2014-01-01 2:30 AM 2
2014-01-01 3:00 AM 3
2014-01-01 3:30 AM 3
2014-01-01 4:00 AM 4
2014-01-01 4:30 AM 4
But what I'm looking for is a count that increases by 1 for every 30 minutes.
Example of the data I'm looking for:
Date Interval HourCount
2014-01-01 12:00 AM 1
2014-01-01 12:30 AM 2
2014-01-01 1:00 AM 3
2014-01-01 1:30 AM 4
2014-01-01 2:00 AM 5
2014-01-01 2:30 AM 6
2014-01-01 3:00 AM 7
2014-01-01 3:30 AM 8
2014-01-01 4:00 AM 9
2014-01-01 4:30 AM 10
This way, in a day I'll be getting 48 intervals.
I could use ROW_NUMBER() OVER (PARTITION BY Date ORDER BY Date). But this will give me the wrong count if any interval is missing.
Suppose the row below is missing.
2014-01-01 4:00 AM 9
Then I'll get 9 as the HourCount for this row.
2014-01-01 4:30 AM 9
Can someone help me get the count on 30 minute intervals?
Try something like:
with series as
(select interval_num, interval_num * interval '30 minutes' as interval_time
from generate_series(0,47) series(interval_num))
select *
from data_table
right join series on series.interval_time = data_table.interval
As you wrote, you can extract the hour. What you're missing is that you can also extract the minutes and check whether they are greater than or equal to 30. Add 1 at the end so the count starts at 1 rather than 0, as in your desired output:
SELECT date_part('hour', interval) * 2 + CASE WHEN date_part('minute', interval) >= 30 THEN 1 ELSE 0 END + 1
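The arithmetic can be checked in a few lines of Python. This sketch assumes the interval labels look like '4:30 AM' as in the sample data, and it adds 1 so that numbering starts at 1 as in the desired output. The function name half_hour_slot is made up for this example:

```python
from datetime import datetime

def half_hour_slot(label: str) -> int:
    """Map a label such as '4:30 AM' to its 1-based half-hour slot (1..48)."""
    t = datetime.strptime(label, "%I:%M %p")
    # Two slots per hour, plus one more for the second half of the hour.
    return t.hour * 2 + (1 if t.minute >= 30 else 0) + 1

for label in ["12:00 AM", "12:30 AM", "4:00 AM", "4:30 AM", "11:30 PM"]:
    print(label, half_hour_slot(label))
```

Because each number is computed from the time itself, a missing row does not shift the numbers of the rows after it, unlike ROW_NUMBER().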