Can't understand values being returned by Facebook Insights API

I don't understand the way the API returns values. Here's a sample of a page_impressions call, with 'week' as the period.
"values"=>
[{"end_time"=>"2012-01-08T08:00:00+0000", "value"=>1116},
{"end_time"=>"2012-01-09T08:00:00+0000", "value"=>1171},
{"end_time"=>"2012-01-10T08:00:00+0000", "value"=>1175}]
It seems that they're showing how many hits I had in the last 7 days up to the date in "end_time"; is that correct? If it is, then I don't understand what use this would have; there is a huge overlap in the data.
How can I get the number of impressions for the last few weeks instead? And how can I get more than 3 values to display? I really can't understand the logic behind this or how it could be useful.

What's happening here is that you're being given the total number of page_impressions for the 7-day period ending on each of the dates shown (i.e., how many times was the page seen over the 7 days ending at end_time, then at end_time+1, then at end_time+2?).
Facebook is returning three (3) separate readings, presumably so you can spot very short-term trends (e.g., "are my weekly impressions creeping up?"), or perhaps so you still have a value for every day if you miss a measurement.
To answer your question specifically:
The 7-day period 2012-01-01 through 2012-01-08 12:00am* had 1,116 impressions.
The 7-day period 2012-01-02 through 2012-01-09 12:00am* had 1,171 impressions.
The 7-day period 2012-01-03 through 2012-01-10 12:00am* had 1,175 impressions.
As is quoted below, the end_time itself is always midnight, Pacific Daylight Time. Thus, an end_time of 2012-01-08 really means the measurement stopped the night before, i.e., one minute past 11:59 pm on 2012-01-07.
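To make the windows concrete, here is a quick sketch that recovers the 7-day window covered by a returned value (plain Python; the end_time string is taken from the sample response above):

from datetime import datetime, timedelta

end = datetime.strptime("2012-01-08T08:00:00+0000", "%Y-%m-%dT%H:%M:%S%z")
start = end - timedelta(days=7)
print(start.isoformat(), "->", end.isoformat())
# 2012-01-01T08:00:00+00:00 -> 2012-01-08T08:00:00+00:00
# i.e. a midnight-to-midnight Pacific window ending as 2012-01-08 begins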
From https://developers.facebook.com/docs/reference/fql/insights/:
The end of the period during which the metrics were collected,
expressed as a UNIX time (which should always be midnight, Pacific
Daylight Time) or using the function end_time_date() which takes a
date string in 'YYYY-MM-DD' format. Note: If the UNIX time provided is
not midnight, Pacific Daylight Time, your query may return an empty
resultset. Example: To obtain data for the 24-hour period starting on
September 15th at 00:00 (i.e. 12:00 midnight) and ending on September
16th at 00:00 (i.e. 12:00 midnight), specify 1284620400 as the
end_time and 86400 as the period. Note: end_time should not be
specified when querying lifetime metrics.
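To sanity-check the example in that quote, a small sketch using Python's zoneinfo (the 1284620400 and 86400 values come from the documentation above; nothing else is assumed):

from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

end = datetime.fromtimestamp(1284620400, tz=ZoneInfo("America/Los_Angeles"))
start = end - timedelta(seconds=86400)  # the 'period' parameter
print(start, "->", end)
# 2010-09-15 00:00:00-07:00 -> 2010-09-16 00:00:00-07:00 (midnight to midnight, PDT)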

Why is Date.getMinutes() returning 2 for the time 4:00 PM?

I'm creating a customized function that does some calculations for a given time.
When a time is entered in a cell, for example 4:00 PM, it is automatically converted into a date, in this case 12/30/1899 16:00:00, and when getTheMinutes() is called, it returns 2 instead of 0.
function getTheMinutes(dateTime){
  return dateTime.getMinutes();
}
The behavior of the function is different if it's used with a more recent date like 5/1/2019 16:00:00.
I want the user to be able to just write a time in a cell then use the customized function in another cell. Please let me know your thoughts.
Now that you have indicated the time zone for your spreadsheet, I can confirm what @RobG deduced almost a day ago: Guatemala adjusted its offset from UTC, and, as you have confirmed, Sheets treats this as a roughly two-minute change with effect from October 5, 1918.
More specifically, the adjustment was 2 minutes and 4 seconds, effective from 03:00 that day:
(Source: IANA tz database version 2019b, file northamerica.)
There have been very many such minor adjustments around the world over the years (even between towns in the same country), and adjustments continue, though nowadays usually of a whole hour, between 'standard' and summer time. Sheets has quite properly recognised that "normal arithmetic" does not work across such a transition: while noon yesterday to noon today is normally, for any one specific location, a difference of 24 hours, it is often 23 or 25 hours on the day the clocks go forward or back.
And the moral of the story is to beware of obliging Sheets to assume, for want of a specific date, that it has index number 0, i.e. that it is December 30, 1899.
I did some testing and found that the formula gives a wrong result for any time before 10/5/1918 0:03:00; from that date-time on, the formula works as expected.
Here is my sheet https://docs.google.com/spreadsheets/d/1psm8_GJYRczO53TILJCOzo0p4GpnS-ooiGWqOJrC8ZU/edit?usp=sharing
I would need to add date validation to my customized formula to make it useful. I don't know why Google Sheets chooses that date as the default when just a time is typed in a cell; I think it should be improved.
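For reference, the 1918 transition is visible in any IANA-based time zone library. A minimal sketch in Python, assuming the spreadsheet's zone is America/Guatemala as discussed above:

from datetime import datetime
from zoneinfo import ZoneInfo

gt = ZoneInfo("America/Guatemala")
print(datetime(1918, 10, 4, tzinfo=gt).utcoffset())  # -1 day, 17:57:56 (UTC-6:02:04, local mean time)
print(datetime(1918, 10, 6, tzinfo=gt).utcoffset())  # -1 day, 18:00:00 (UTC-6:00)

Any time-of-day stored against the zero date of December 30, 1899 is interpreted under the old 6:02:04 offset, which is where the stray 2 minutes (and 4 seconds) come from.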

Inconsistency between month, day, second representation of the interval data type

I understand why PostgreSQL uses month, day and second fields to represent the SQL interval data type. A month is not always the same length, and a day can have 23, 24 or 25 hours if a daylight saving time adjustment is involved; this is from the PostgreSQL documentation.
But then I do not understand why this is not handled consistently for both months and days. See the following query, which calculates an interval where the number of seconds between the two points in time is exactly computable:
select ('2017-01-01'::timestamp - '2016-01-01'::timestamp); --> 366 days
PostgreSQL chooses to give a result in days, not in months and not in seconds.
But why is the result in days and not in seconds? It is NOT defined how long a day is (it can be 23, 24 or 25 hours long), so why does it not give the output in seconds?
And since the length of a month is also not defined, why doesn't PostgreSQL give an output of 12 months instead of 366 days?
It does not care that the length of a day is not defined, but obviously it cares that the length of a month is not defined.
Why this asymmetry?
For further explanation, see this query:
select ('10 days'::interval - '24 hours'::interval); --> 10 days -24:00:00
You see that PostgreSQL correctly refuses to answer with 9 days; it is well aware that days and hours cannot be interchanged. But then again, why does the first query return days?
I can't answer your question, but I can point you in the right direction: I think the book SQL-99 Complete, Really is the most accessible source for understanding SQL intervals. It's available online: https://mariadb.com/kb/en/sql-99/08-temporal-values/.
SQL standards describe two kinds of intervals: year-month intervals and day-time intervals. It does this to prevent month parts and day parts from appearing in the same interval, because, as you already know, the number of days in a month is ambiguous. The number of days in the interval '3' month depends on which three months you're talking about.
I think this is the verbose, standard SQL way to write your first query.
select cast(timestamp '2017-01-01' - timestamp '2016-01-01' as interval day to hour) as new_column;
new_column
interval day to hour
--
366 days
I suspect that you'll find that SQL standards have rules for what a SQL dbms is supposed to do when things like interval day to hour are omitted. PostgreSQL might or might not follow those rules.
PostgreSQL chooses to give a result in days, not in months and not in seconds.
Standard SQL prevents month parts and day parts from appearing in the same interval. Also, the range of valid seconds is from 0 to 59.
select interval '59' second;
interval
interval second
--
00:00:59
select interval '60' second;
interval
interval second
--
00:01:00
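As an aside, Python's datetime makes the same design choice: subtracting two timestamps yields a timedelta that stores days and seconds but can never store months, precisely because months have no fixed length. A small illustration (Python, not PostgreSQL):

from datetime import datetime

# Timestamp subtraction produces a day/second quantity, never months
delta = datetime(2017, 1, 1) - datetime(2016, 1, 1)
print(delta)       # 366 days, 0:00:00
print(delta.days)  # 366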

Joining time series events with daily 'shift' data?

What is the best practice for joining 'shift' data and other time series data in Tableau? I am working with data from multiple geographies (LA, India, the UK, NY, Malaysia, Australia, China, etc.), and a lot of employees work past midnight.
For example, an employee has a shift from 9 PM to 6 AM on 2016-07-31. The 'report date' is 2016-07-31, but no time zone information is provided.
This employee does work, and there are events (timestamps in UTC) between 2016-07-31 21:00 and 2016-08-01 06:00. When I look at the events, though, 7/31 will only have the events between 21:00 and 23:59. If I filter for just July, my calculations will be skewed (the event data will be cut off at midnight even though the shift extended to 6 AM).
I need to make calculations based upon the total time an employee was actually engaged with work (productive) and the total time they were paid. The request is for this to be daily/weekly/monthly.
If anyone can help me out here or give me some talking points to explain this to my superiors, it would be appreciated. This seems like it must be a common scenario. Do I need to request a new raw data format, or is there something I can do on my end?
The shift data only looks like this:
id   date         regular_hours  overtime_hours  total_hours
abc  2016-06-17   8              0.52            8.52
abc  2016-06-18   7.64           0.83            8.47
abc  2016-06-19   7.87           0.23            8.1
The event data is more detailed (30-minute interval data on events handled, and the time, in seconds, it took to complete those events):
id   date         interval  events  event_duration
abc  2016-06-17   01:30:00  4       688
abc  2016-06-17   02:00:00  6       924
abc  2016-06-17   02:30:00  10      1320
So, you sum up the event_duration for an entire day and you get the number of seconds actually spent doing work. You can then compare this to the amount of time the employee was paid for, to see how efficient the staffing is.
My concern is that the event data has a date and time (UTC), while the payroll data has only a date, without any time zone information. This causes inaccuracies when blending data in Tableau, because some shifts cross midnight. Is there a way around this, or do I need to propose new data requirements?
(FYI: people have most likely been calculating this based on the date alone for years, without considering time zones. My assumption is that they just did not realize this could produce inaccurate results.)
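One way to handle this on your end, if you can at least get (or assume) a time zone per site, is to assign each UTC event to the local 'shift date' it belongs to before aggregating. A minimal sketch in Python; the America/Los_Angeles zone and the 6 AM cutover are assumptions for illustration, not part of the original data:

from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

SHIFT_CUTOVER_HOUR = 6  # assumed end of the overnight 9 PM - 6 AM shift

def shift_date(event_utc, site_tz):
    # Convert the UTC event time to site-local time.
    local = event_utc.astimezone(ZoneInfo(site_tz))
    # Events before the cutover belong to the previous day's shift.
    if local.hour < SHIFT_CUTOVER_HOUR:
        local -= timedelta(days=1)
    return local.date().isoformat()

evt = datetime(2016, 8, 1, 1, 30, tzinfo=timezone.utc)
print(shift_date(evt, "America/Los_Angeles"))  # 2016-07-31 (18:30 local)

Summing event_duration grouped by this shift date rather than by the raw UTC date keeps the overnight shift together; whether that happens upstream in the raw data or in a calculated field is exactly the data-requirements conversation to have.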

What is the significance of January 1, 1601?

This structure is a 64-bit value representing the number
of 100-nanosecond intervals since January 1, 1601.
Reference: http://msdn.microsoft.com/en-us/library/aa915351
Why is it set "since 1601"? Why not Unix time's 1970, or even 2000?
What can I do about compatibility with dates so distant in time?
Answering my own question:
The ANSI Date defines January 1, 1601 as day 1, and is used as the origin of COBOL integer dates. This epoch is the beginning of the previous 400-year cycle of leap years in the Gregorian calendar, which ended with the year 2000.
as you can find on Wikipedia under the Julian_day entry.
Further:
Why is the Win32 epoch January 1, 1601
Because 1/1/1601 was the start of the epoch.
Take it from Raymond Chen:
Why is the Win32 epoch January 1, 1601?
The FILETIME structure records time in the form of 100-nanosecond intervals since January 1, 1601. Why was that date chosen?
The Gregorian calendar operates on a 400-year cycle, and 1601 is the first year of the cycle that was active at the time Windows NT was being designed. In other words, it was chosen to make the math come out nicely.
I actually have the email from Dave Cutler confirming this.
Bonus Chatter
RFC4122 UUIDs also measure 100 ns ticks, but they start at 10/15/1582 (as opposed to FILETIME's 1/1/1601):
Date                    Ticks               Uuid Epoch ticks
----------------------  ------------------  ------------------
1582-10-15              -5748192000000000   0                   Start of uuid epoch
1601-01-01              0                   0x00146BF33E42C000  Start of Windows epoch
1899-12-30              0x014F35A9A90CC000  0x0163A19CE74F8000  Lotus 123/Excel/Access/COM zero date
1900-01-01              0x014F373BFDE04000  0x0163A32F3C230000  SQL Server zero date
1970-01-01              0x019DB1DED53E8000  0x01B21DD213814000  Unix epoch timestamp
2000-01-01              0x01BF53EB256D4000  0x01D3BFDE63B00000
2010-01-01              0x01CA8A755C6E0000  0x01DEF6689AB0C000
2020-01-01              0x01D5C03669050000  0x01EA2C29A747C000

//FILETIME eras
1972-01-21 11:43:51 PM  0x01A0000000000000  0x01B46BF33E42C000  Start of 0x01A era
1986-04-30 11:43:13 AM  0x01B0000000000000  0x01C46BF33E42C000  Start of 0x01B era
2000-08-06 11:42:36 PM  0x01C0000000000000  0x01D46BF33E42C000  Start of 0x01C era
2014-11-14 11:41:59 AM  0x01D0000000000000  0x01E46BF33E42C000  Start of 0x01D era
2029-02-20 11:41:22 PM  0x01E0000000000000  0x01F46BF33E42C000  Start of 0x01E era
2043-05-31 11:40:44 AM  0x01F0000000000000  0x02046BF33E42C000  Start of 0x01F era

//UUID eras
1968-02-11 11:43:13 AM  0x019B940CC1BD4000  0x01B0000000000000  Start of uuid 0x01B era
1982-05-20 11:42:36 PM  0x01AB940CC1BD4000  0x01C0000000000000  Start of uuid 0x01C era
1996-08-27 11:41:59 AM  0x01BB940CC1BD4000  0x01D0000000000000  Start of uuid 0x01D era
2010-12-04 11:41:22 PM  0x01CB940CC1BD4000  0x01E0000000000000  Start of uuid 0x01E era
2025-03-13 11:40:44 AM  0x01DB940CC1BD4000  0x01F0000000000000  Start of uuid 0x01F era
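A FILETIME value is easy to decode by hand. Here is a small Python sketch (not part of the answer above) that verifies the Unix epoch row of the table:

from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def filetime_to_datetime(ticks):
    # FILETIME counts 100-nanosecond intervals since 1601-01-01 UTC.
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)

print(filetime_to_datetime(0x019DB1DED53E8000))  # 1970-01-01 00:00:00+00:00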
Bonus Chatter
Excel uses a zero date of 12/30/1899 in order to be bug-for-bug compatible with Lotus 1-2-3. Which is also why Excel considers 1900 to be a leap year (because the Lotus 1-2-3 guys thought it was). Which is why it's also impossible to represent dates before March 1, 1900 in Excel.
Well, 1 January 1601 was the first day of the 17th century. And pendulum clocks were invented in the 17th century, allowing time to be measured to 1-second accuracy[1]. So (in theory) there might be references in extant literature from that period to timepoints measured with that accuracy.
But in reality the choice is arbitrary. There has to be an "epoch", and provided
- the epoch is far enough back that "negative time" values are rare, and
- the wrap-around time is far enough in the future to be a few generations away,
any choice will do.
But hey, if it worries you that much, send a letter to Steve Ballmer[2].
I'm inclined to believe Ian Boyd's answer, given the claimed source. And the reason therein is that it makes the math easier (for Gregorian leap year calculation). However, given how tiny that simplification is, and how weak the reasoning behind it, the choice is (IMO) essentially arbitrary. (Not that I'm saying it is wrong ...)
[1] OK ... probably not that accurate.
[2] Or Satya Nadella.
It's a pragmatic choice.
The modern Western calendar was not consistent until 1752, when Britain (and its colonies) adopted the Gregorian calendar, which had been in use in most of Catholic Europe since 1582.
This is the modern calendar, with leap years etc. to keep the calendar year aligned with the solar year.
So why not start from 1st January 1752? Because the leap year rule ("it's a leap year if the year is divisible by four, except century years, which are leap years only if divisible by 400") establishes a 400-year cycle, and the first full cycle started on 1st January 1601 (at least in Rome).
The leap year and date calculations are painful enough without starting midway through a four-hundred-year cycle, so 1601 is a pretty good start, as long as you remember that any date before 1752 needs to be qualified by a geographic location: British dates were 11 days out of sync with Roman dates by that time.
As has already been mentioned, I think the popular answer is that the Gregorian calendar operates on a 400-year cycle, and 1601 is the first year of the cycle that was active at the time Windows NT was being designed.
January 1, 1601 is origin of COBOL integer dates.
It is also day 1 by ANSI date format.
And to speculate further: in ISO 8601, which is the format this date is in, dates prior to 1583 are based on the proleptic Gregorian calendar. Perhaps they just rounded up to the next century.

Facebook API date format

I've been working on a Facebook application, but I've run into a strange bug(?).
If I try to get detailed info about any event, the start date from the Graph API differs from the one I get using FQL. For example:
https://graph.facebook.com/209798352393506/ - start date is 2011-05-26T19:00:00
https://api.facebook.com/method/fql.query?query=select%20eid%2C%20name%2C%20tagline%2C%20pic%2C%20host%20%2C%20start_time%20from%20event%20where%20eid%20%3D209798352393506 - start time is 1306461600, which in human-readable form is Fri, 27 May 2011 02:00:00 GMT.
As you can see, the dates returned differ by 7 hours. Sometimes I get dates that differ by 8 hours, sometimes by 6.
The correct date is the first one:
http://www.facebook.com/events/209798352393506/
I can't figure out what is happening. All the events I'm trying to view are from Denmark. My timezone is Europe/Kiev; the difference is 1 hour.
Is this a Facebook bug? A documented feature? Or am I doing something wrong?
A link to the documentation or to another answer on Stack Overflow would be enough.
Here are two events:
http://www.facebook.com/events/290600150977115/ - starts on 2012-03-22 at 20:00
http://www.facebook.com/events/289501924395338/ - starts on 2012-03-03 at 21:00
But using FQL I get that the first event starts on 2012-03-23 at 04:00 (a difference of 8 hours), and the second one starts on 2012-03-04 at 06:00 (a difference of 9 hours). Why???
It was because of daylight saving time.
The time difference between me and Facebook (Los Angeles) was sometimes 8 and sometimes 9 hours, because there was a period when Denmark had already changed to summer time and Los Angeles had not.
The problem occurred when an event started in winter time and finished in summer time; in that case I needed to add one hour.
Facebook is weird.
From https://developers.facebook.com/docs/reference/fql/insights/:
The end of the period during which the metrics were collected, expressed as a unix time (which should always be midnight, Pacific Daylight Time) or using the function end_time_date() which takes a date string in 'YYYY-MM-DD' format.
2011-05-26T19:00:00 ===> 2011-05-26T19:00:00 PDT ===> Fri, 27 May 2011 02:00:00 GMT.
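You can verify that round trip with a few lines of Python (the values are the ones from the question; the time zone lookup uses the standard zoneinfo module):

from datetime import datetime, timezone
from zoneinfo import ZoneInfo

utc = datetime.fromtimestamp(1306461600, tz=timezone.utc)
print(utc)                                              # 2011-05-27 02:00:00+00:00
print(utc.astimezone(ZoneInfo("America/Los_Angeles")))  # 2011-05-26 19:00:00-07:00 (PDT)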