Joining time series events with daily 'shift' data? - tableau-api

What is the best practice for joining 'shift' data and other time series data in Tableau? I am working with data from multiple regions (LA, India, UK, NY, Malaysia, Australia, China, etc.), and a lot of employees work past midnight.
For example, an employee has a shift from 9 PM to 6 AM on 2016-07-31. The 'report date' is 2016-07-31, but no time zone information is provided.
This employee does work, and there are events (time stamps in UTC) between 2016-07-31 21:00 and 2016-08-01 06:00. When I look at the events, though, 7/31 will only have the events between 21:00 and 23:59. If I filter for just July, my calculations will be skewed (the event data will be cut off at midnight even though the shift extended to 6 AM).
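The cutoff described above can be reproduced with a small sketch. It assumes the shift window is known and already expressed in UTC (21:00 to 06:00, taken from the example). The interval timestamps and variable names are made up for illustration, since the real payroll data carries no start or end times.

```python
from datetime import datetime, timedelta

# Hypothetical shift window from the example, assumed to already be in UTC.
shift_start = datetime(2016, 7, 31, 21, 0)
shift_end = datetime(2016, 8, 1, 6, 0)

# Made-up 30-minute interval timestamps (UTC) covering the whole shift.
events = [shift_start + timedelta(minutes=30 * i) for i in range(18)]

# Naive approach: keep only events whose calendar date equals the report date.
report_date = shift_start.date()
by_calendar_date = [e for e in events if e.date() == report_date]

# Shift-aware approach: keep events that fall inside the shift window.
by_shift_window = [e for e in events if shift_start <= e < shift_end]

print(len(by_calendar_date), "of", len(events), "intervals match on date alone")
print(len(by_shift_window), "of", len(events), "intervals match on the shift window")
```

Matching on the date alone drops every interval after midnight, which is the skew described in the question.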
I need to make calculations based upon the total time an employee was actually engaged with work (productive) and the total time they were paid. The request is for this to be daily/weekly/monthly.
If anyone can help me out here or give me some talking points to explain this to my superiors, it would be appreciated. This seems like it must be a common scenario. Do I need to request a new raw data format, or is there something I can do on my end?
The shift data only looks like this:
id   date        regular_hours  overtime_hours  total_hours
abc  2016-06-17  8              0.52            8.52
abc  2016-06-18  7.64           0.83            8.47
abc  2016-06-19  7.87           0.23            8.1
The event data is more detailed (30-minute interval data on events handled and the time it took to complete those events, in seconds):
id   date        interval  events  event_duration
abc  2016-06-17  01:30:00  4       688
abc  2016-06-17  02:00:00  6       924
abc  2016-06-17  02:30:00  10      1320
So, you sum up the event_duration for an entire day and you get the number of seconds that were actually spent doing work. You can then compare this to the amount of time the employee was paid for to see how efficient the staffing is.
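As a rough sketch of that comparison, using only the sample rows shown for id abc on 2016-06-17: three intervals are not a full day, so the resulting ratio is illustrative only, and the variable names are assumptions.

```python
# Sample rows copied from the question (id abc, 2016-06-17).
event_durations_sec = [688, 924, 1320]   # event_duration per 30-minute interval
paid_hours = 8.52                        # total_hours from the shift data

productive_sec = sum(event_durations_sec)
paid_sec = paid_hours * 3600

# Share of paid time that was spent handling events.
utilization = productive_sec / paid_sec
print(f"{productive_sec} s productive of {paid_sec:.0f} s paid ({utilization:.1%})")
```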
My concern is that the event data has the date and the time (UTC). The payroll data only has a date without any time zone information. This causes inaccuracies when blending data in Tableau because some shifts cross midnight. Is there a way around this or do I need to propose new data requirements?
(FYI - people have been calculating it just based on the date for years most likely without considering time zones before. My assumption is that they just did not realize that this could cause inaccurate results)

Related

How to handle dates in neo4j

I'm a historian of the medieval period and I'm trying to code networks between kings, dukes, popes etc. over a period of about 50 years (from 1220 to 1270) in medieval Germany. As I'm not a specialist in graph databases, I'm looking for a way to handle dates and date ranges.
Is it possible to attach a date range to an edge so that the edge, which represents a relationship, disappears after e.g. 3 years?
Is it possible to query for relationships whose date tag falls within a date range?
The common way to deal with dates in Neo4j is storing them either as a string representation or as millis since epoch (aka msec passed since Jan 01 1970).
The first approach makes the graph more easily readable; the latter allows you to do math, e.g. calculate deltas.
In your case I'd store two properties called validFrom and validTo on the relationships. Your queries need to make sure you're looking for the correct time interval.
E.g. to find the king(s) in charge of France from Jan 01 1220 to Dec 31st 1221 you do:
MATCH (c:Country{name:'France'})-[r:HAS_KING]->(king)
WHERE r.validFrom >= -23667123600000 and r.validTo <=-23604051600000
RETURN king, r.validFrom, r.validTo
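For illustration, here is a minimal Python sketch of the two ingredients: converting a date to milliseconds since the epoch, and testing a validity interval against a query window. It assumes UTC and Python's proleptic Gregorian calendar, so the numbers it produces will not necessarily match the constants in the query above, which depend on the calendar and time zone used to generate them. The function names and the sample reign are hypothetical. The query above matches intervals that lie entirely inside the window. The overlap test is an alternative if partially overlapping intervals should also be found.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_epoch_millis(year, month, day):
    """Milliseconds since 1970-01-01 UTC (negative for earlier dates)."""
    moment = datetime(year, month, day, tzinfo=timezone.utc)
    return (moment - EPOCH) // timedelta(milliseconds=1)

def contained_in(valid_from, valid_to, start, end):
    """True if [valid_from, valid_to] lies entirely within [start, end]."""
    return valid_from >= start and valid_to <= end

def overlaps(valid_from, valid_to, start, end):
    """True if [valid_from, valid_to] shares any instant with [start, end]."""
    return valid_from <= end and valid_to >= start

window = (to_epoch_millis(1220, 1, 1), to_epoch_millis(1221, 12, 31))
# Hypothetical interval that starts before the window and ends inside it.
reign = (to_epoch_millis(1215, 6, 1), to_epoch_millis(1220, 9, 1))
print(contained_in(*reign, *window), overlaps(*reign, *window))
```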
addendum
Since Neo4j 3.0 there's the APOC library, which provides a couple of functions for converting timestamps to/from human-readable date strings.
You can also store the dates in their number representation in the following format: YYYYMMDD
In your case 12200101 would be Jan 1st 1220 and 12701231 would be Dec 31st 1270.
It's a useful and readable format and you can perform range searches like:
MATCH (h:HistoricEvent)
WHERE h.date >= 12200101 AND h.date < 12701231
RETURN h
It would also let you order by dates, if you need to.
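A minimal sketch of why this encoding works, written in Python rather than Cypher: because the year occupies the most significant digits, plain integer comparison agrees with chronological order. The event names and dates below are hypothetical.

```python
from datetime import date

def to_yyyymmdd(d):
    """Encode a date as an integer such as 12200101."""
    return d.year * 10000 + d.month * 100 + d.day

# Hypothetical event dates for illustration.
events = {
    "event_a": date(1250, 12, 13),
    "event_b": date(1220, 11, 22),
    "event_c": date(1273, 10, 1),
}
encoded = {name: to_yyyymmdd(d) for name, d in events.items()}

# Range search and chronological ordering work with plain integer comparison.
in_range = sorted(
    (name for name, num in encoded.items() if 12200101 <= num <= 12701231),
    key=encoded.get,
)
print(in_range)
```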
As of Neo4j 3.4, the system handles durations and dates natively; see the official documentation for details and more examples.
An example related to the original question: retrieve the historical events that happened in the last 30 days:
WITH duration({days: 30}) AS duration
MATCH (h:HistoricEvent)
WHERE date() - duration < date(h.date)
RETURN h
Another option for dates that keeps the number of nodes/properties you create fairly low is to use three linked lists: one of years (earliest year of interest to latest year), one of months (1-12), and one of days in a month (1-31). Then every "event" in your graph can be connected to a year, a month, and a day. This way you don't have to create a new node for every new combination of year, month and day. You just have a single set of years, one of months, and one of days. I scale the numbers to make manipulating them easier, like so:
Years are yyyy*10000
Months are mm*100
Days are dd
so if you run a query such as
match (event)-[:happened]->(t:time)
with event,sum(t.num) as date
return event.name,date
order by date
You will get a list of all events in chronological order, with dates like January 17th, 1904 appearing as 19040117 (yyyymmdd format).
Further, since these are linked lists where, for example,
...-(t0:time {num:19040000})-[:precedes]->(t1:time {num:19050000})-...
ordering is built into the nodes too.
This is, so far, how I have liked to do my event dating
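The arithmetic behind the scaled values can be checked with a short Python sketch. The three node values are the ones from the January 17th, 1904 example; the helper name is an assumption.

```python
# Time nodes attached to one event, scaled as described above.
year_node = 1904 * 10000   # 19040000
month_node = 1 * 100       # 100
day_node = 17              # 17

# Mirrors sum(t.num) in the query.
date_key = year_node + month_node + day_node

def split_date_key(key):
    """Recover (year, month, day) from a yyyymmdd integer."""
    year, rest = divmod(key, 10000)
    month, day = divmod(rest, 100)
    return year, month, day

print(date_key, split_date_key(date_key))
```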

After midnight times in postgresql

I have data from a text file I'm reading into a postgres 9.1 table, and the data looks like this:
451,22:30:00,22:30:00,San Jose,1
451,22:35:00,22:35:00,Santa Clara,2
451,22:40:00,22:40:00,Lawrence,3
451,22:44:00,22:44:00,Sunnyvale,4
451,22:49:00,22:49:00,Mountain View,5
451,22:53:00,22:53:00,San Antonio,6
451,22:57:00,22:57:00,California Ave,7
451,23:01:00,23:01:00,Palo Alto,8
451,23:04:00,23:04:00,Menlo Park,9
451,23:07:00,23:07:00,Atherton,10
451,23:11:00,23:11:00,Redwood City,11
451,23:15:00,23:15:00,San Carlos,12
451,23:18:00,23:18:00,Belmont,13
451,23:21:00,23:21:00,Hillsdale,14
451,23:24:00,23:24:00,Hayward Park,15
451,23:27:00,23:27:00,San Mateo,16
451,23:30:00,23:30:00,Burlingame,17
451,23:33:00,23:33:00,Broadway,18
451,23:38:00,23:38:00,Millbrae,19
451,23:42:00,23:42:00,San Bruno,20
451,23:47:00,23:47:00,So. San Francisco,21
451,23:53:00,23:53:00,Bayshore,22
451,23:58:00,23:58:00,22nd Street,23
451,24:06:00,24:06:00,San Francisco,24
It is from a timetable for a commuter rail line, Caltrain. I'm trying to query stations, to get train arrival and departure times. I did this several months ago in MySql, and I got
select * from trains as a, trains as b
where a.trip_id = b.trip_id
  and a.stop_id = 'San Antonio' and b.stop_id = 'San Carlos'
  and a.arrival_time < b.arrival_time;
So far so good, pretty straightforward. However, when I tried copying the data into a postgres database, I got an error for the various columns that had times after midnight, either 24 or 25:00:00 something. However, if I change them to be 00:00:00 and 01:00:00 something, won't that mess with the query? A time after midnight will appear to be before the starting time? MySql apparently didn't have a problem with those times, and I'm not sure what to do. I'm thinking I should use the last column, or maybe convert the times to something that doesn't take into account PM/AM?
You should try using the interval type for the time columns. Those will keep track of the number of hours, minutes, and seconds instead of trying to record a time of day.
See the PostgreSQL documentation on dates and times.
An interval can have a time component greater than 24 hours, unlike the time data type, which is confined to a single day (00:00:00 to 24:00:00).
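The same idea can be sketched outside the database. The Python snippet below treats each timetable value as elapsed time since the start of the service day, which is an assumption about what values such as 24:06:00 mean. The function name is hypothetical.

```python
from datetime import timedelta

def parse_service_time(text):
    """Parse 'HH:MM:SS' as elapsed time since the start of the service day.

    Hours may exceed 23, as in '24:06:00'.
    """
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

# Values taken from the timetable rows in the question.
san_antonio = parse_service_time("22:53:00")
san_carlos = parse_service_time("23:15:00")
san_francisco = parse_service_time("24:06:00")

# Ordering is preserved across midnight because nothing wraps back to 00:xx.
print(san_antonio < san_carlos < san_francisco)
print(san_francisco - san_antonio)
```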

facebook api date format

I've been working on a Facebook application, and I've run into a strange bug(?).
If I request detailed info about an event using the Graph API, the start date differs from the one I get using FQL. For example:
https://graph.facebook.com/209798352393506/ - start date is 2011-05-26T19:00:00
https://api.facebook.com/method/fql.query?query=select%20eid%2C%20name%2C%20tagline%2C%20pic%2C%20host%20%2C%20start_time%20from%20event%20where%20eid%20%3D209798352393506 - start time is 1306461600. Which in human readable format equals to Fri, 27 May 2011 02:00:00 GMT.
As you can see, the difference between the two dates is 5 hours. Sometimes I get dates that differ by 8 hours, sometimes by 6.
Correct date is the first one:
http://www.facebook.com/events/209798352393506/
I can't figure out what happens. All events I'm trying to view are from Denmark. My timezone is Europe/Kiev. Difference is 1 hour.
Is this a Facebook bug? Or a documented feature? Or am I doing something wrong?
A link to the documentation or to another answer on Stack Overflow would be enough.
Here are two events:
http://www.facebook.com/events/290600150977115/ - starts on 2012-03-22 at 20:00
http://www.facebook.com/events/289501924395338/ - starts on 2012-03-03 at 21:00
But using FQL, I get that the first event starts on 2012-03-23 at 04:00, a difference of 8 hours, and that the second one starts on 2012-03-04 at 06:00, a difference of 9 hours. Why?
It was because of daylight saving time.
The time difference between me and Facebook (Los Angeles) was sometimes 8 and sometimes 9 hours, because Denmark and Los Angeles do not switch to summer time on the same date.
The problem occurred when an event started in "winter time" and finished in summer time. In this case I needed to add one hour.
Facebook is weird.
From /fql/insights/
The end of the period during which the metrics were collected, expressed as a unix time (which should always be midnight, Pacific Daylight Time) or using the function end_time_date() which takes a date string in 'YYYY-MM-DD' format.
2011-05-26T19:00:00 ===> 2011-05-26T19:00:00 PDT ===> Fri, 27 May 2011 02:00:00 GMT.
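The conversion in the last line can be checked with a short Python sketch. It uses a fixed UTC-7 offset for Pacific Daylight Time, which is an assumption that only holds while daylight saving is in effect. A real application would look the offset up in a time zone database instead.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset used for illustration; the real offset changes with
# daylight saving, so production code should use a time zone database.
PDT = timezone(timedelta(hours=-7), "PDT")

fql_start_time = 1306461600  # value returned by the FQL query in the question

as_utc = datetime.fromtimestamp(fql_start_time, tz=timezone.utc)
as_pacific = as_utc.astimezone(PDT)

print(as_utc.isoformat())      # UTC reading of the timestamp
print(as_pacific.isoformat())  # the same instant in Pacific Daylight Time
```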

Can't understand values being returned by Facebook Insights API

I don't understand the way the API returns values. Here's a sample of a page_impressions call, with 'week' as the period.
"values"=>
[{"end_time"=>"2012-01-08T08:00:00+0000", "value"=>1116},
{"end_time"=>"2012-01-09T08:00:00+0000", "value"=>1171},
{"end_time"=>"2012-01-10T08:00:00+0000", "value"=>1175}]
It seems that they're showing how many hits I had in the last 7 days up to the date in "end_time", is that correct? If it is, then I don't understand what use this would have, there is a huge overlap in the data.
How can I get the number of impressions of the last weeks instead? And how can I get more than 3 values to display? I really can't understand the logic behind this or how it could be useful.
What's happening here is that you're being given the total number of page_impressions for the 7-day period ending on each of the dates shown (i.e., how many times was the page seen over the 7 days ending on end_time? And then on end_time+1? end_time+2?).
Facebook is returning three (3) separate readings, presumably so you can spot/review very local trends (e.g., "are my weekly impressions creeping up?") or perhaps because you missed a measurement and want to have values for every day.
To answer your question specifically:
The 7-day period 2012-01-01 through 2012-01-08 12:00am* had 1,116 impressions.
The 7-day period 2012-01-02 through 2012-01-09 12:00am* had 1,171 impressions.
The 7-day period 2012-01-03 through 2012-01-10 12:00am* had 1,175 impressions.
As is quoted below, the end_time itself is always midnight in PDT. Thus, an end_time of 2012-01-08 really means the measurement stopped the night before, i.e., at 1 minute past 11:59pm on 2012-01-07.
From https://developers.facebook.com/docs/reference/fql/insights/:
The end of the period during which the metrics were collected,
expressed as a UNIX time (which should always be midnight, Pacific
Daylight Time) or using the function end_time_date() which takes a
date string in 'YYYY-MM-DD' format. Note: If the UNIX time provided is
not midnight, Pacific Daylight Time, your query may return an empty
resultset. Example: To obtain data for the 24-hour period starting on
September 15th at 00:00 (i.e. 12:00 midnight) and ending on September
16th at 00:00 (i.e. 12:00 midnight), specify 1284620400 as the
end_time and 86400 as the period. Note: end_time should not be
specified when querying lifetime metrics.
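The overlapping windows described in the answer can be reconstructed from the end_time values alone. This is a minimal Python sketch that assumes each reading covers exactly the 7 days before its end_time. The function name is hypothetical.

```python
from datetime import datetime, timedelta

# end_time strings copied from the sample response in the question.
end_times = [
    "2012-01-08T08:00:00+0000",
    "2012-01-09T08:00:00+0000",
    "2012-01-10T08:00:00+0000",
]

def week_window(end_time_text):
    """Return (start, end) of the 7-day period ending at end_time."""
    end = datetime.strptime(end_time_text, "%Y-%m-%dT%H:%M:%S%z")
    return end - timedelta(days=7), end

windows = [week_window(text) for text in end_times]
for start, end in windows:
    print(start.isoformat(), "->", end.isoformat())

# Consecutive windows share six of their seven days.
shared = windows[0][1] - windows[1][0]
print("overlap between first two windows:", shared)
```

Under that assumption, non-overlapping weekly totals would come from readings whose end_time values are seven days apart.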

Capturing Employee Time using a collection

I want to create a timesheet application that collects, from the logged-in user, the number of hours they worked on a specific date.
The user will be required to log into an application which will capture their credentials and employee ID.
Users will be presented with a form that will list the days of the week and a corresponding textbox for entering their hours (decimal). The following table is my vision of the entry form (basic).
Mon   Tue   Wed   Thu   Fri   Sat   Sun
9/6   9/7   9/8   9/9   9/10  9/11  9/12
0.00  0.00  0.00  0.00  0.00  0.00  0.00
I will need to store the information in a table where I will need to store:
Entry Date (DateTime) The date worked
EmpID (Int) The Employee’s ID
RptHours (Decimal) The number of hours worked
I am attempting to design the process so that it will be streamlined and easy to use. The current process will be:
1. Read table for reported hours and dates for the current logged in user
2. Display the dates for the current week
3. Display the hours worked (for the days that have been reported to date).
4. Allow the user to enter/edit data (textbox)
5. Save the data back to the table.
This data structure looks like it should be a class; however, I am uncertain how to design a class that lets me access the information for all seven days of the week. I know I can do this with an array, but I think implementing a class would be more professional, as well as a chance to learn.
I am fairly certain that I would use a collection (like List) however I am not seeing the solution where I can access and modify dates & times for a time period (7 days).
I am using C#. Can anyone give me a push (kick in the pants) in the right direction? I will appreciate any help and insight.
Thanks
Ray
I would have a class for Employee which has information like their employee ID, name, etc. Then I would have a WorkingInfo (or similar) class which has the following properties: Employee, Date, StartTime, EndTime. You then have a List<WorkingInfo> collection.
To calculate how long the employee worked, you would use the TimeSpan class to get the difference between your Start and EndTime properties.
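The structure suggested above is sketched here in Python rather than C#, purely as a language-neutral illustration. The class and property names follow the answer, but the Date property is folded into the start and end datetimes, and the employee and week values are hypothetical. Subtracting two datetimes yields a timedelta, which plays the role of C#'s TimeSpan.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Employee:
    employee_id: int
    name: str

@dataclass
class WorkingInfo:
    employee: Employee
    start_time: datetime
    end_time: datetime

    @property
    def hours_worked(self) -> float:
        """Elapsed time between start and end, in decimal hours."""
        return (self.end_time - self.start_time) / timedelta(hours=1)

ray = Employee(employee_id=1, name="Ray")  # hypothetical values
monday = datetime(2010, 9, 6)              # hypothetical week start

# One entry per weekday, 09:00 to 17:30, collected in a list.
week = [
    WorkingInfo(ray, monday + timedelta(days=d, hours=9),
                monday + timedelta(days=d, hours=17, minutes=30))
    for d in range(5)
]
total_hours = sum(entry.hours_worked for entry in week)
print(total_hours)
```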