Expires header for Facebook JS SDK and Google Analytics - facebook

We all know that adding a far-future expiration date to static resources is a good practice that improves a website's page load speed. So we've ensured it for all of our own resources, but the all-too-common Facebook JS SDK and Google Analytics scripts don't do that, and thus they lower the entire page's speed score.
Examining the headers shows Facebook uses 20 minutes:
Cache-Control: public, max-age=1200
Connection: keep-alive
Content-Type: application/x-javascript; charset=utf-8
Date: Tue, 23 Sep 2014 04:46:38 GMT
Etag: "566aa5d57a352e6f298ac52e73344fdc"
Expires: Tue, 23 Sep 2014 05:06:38 GMT
and Google Analytics uses 2 hours:
HTTP/1.1 200 OK
Date: Tue, 23 Sep 2014 04:45:49 GMT
Expires: Tue, 23 Sep 2014 06:45:49 GMT
Last-Modified: Mon, 08 Sep 2014 18:50:13 GMT
X-Content-Type-Options: nosniff
Content-Type: text/javascript
Server: Golfe2
Age: 1390
Cache-Control: public, max-age=7200
Alternate-Protocol: 80:quic,p=0.002
Content-Length: 16062
Is there a way to force them to longer expiration dates?

These scripts have short cache expiration headers because they're frequently updated. When Facebook and Google add new features and fix bugs, they deploy these changes by overwriting the existing files (the ones you linked to in your question). This allows users of these services to get the latest features without having to do anything, but it comes at the cost (as you point out) of needing short cache expiration headers.
You could host these scripts yourself and set far-future expire headers on them, but that would require you to manually update them when the libraries change. This would be very time-consuming and often impossible because most of these updates aren't put in public changelogs.
Moreover, doing this yourself could very likely end up being a net loss in performance because you'd lose the network cache effect that you gain due to the sheer popularity of these services. For example, I'd imagine when most users come to your site they already have a cached version of these scripts (i.e. it's extremely likely that sometime in the past two hours, the person visiting your website also visited another site that uses Google Analytics). On the other hand, if you hosted your own version, first-time visitors would always have to download your version.
To sum up, I wouldn't go out of my way to fix this "problem". Doing so would take a lot of time and probably not give you the desired effect.

The solution finally implemented was to move to Facebook's redirect API, which doesn't force their script to load on every page view. It's actually what Stack Overflow does here as well; start a session in a private/incognito browser and you'll see.
This link might help: https://developers.facebook.com/docs/php/howto/example_facebook_login
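For illustration only, here is a minimal sketch of the redirect-style flow in the browser; the app ID and callback URL are placeholders, and the linked doc shows the corresponding server-side PHP handling.
// Build the Facebook OAuth dialog URL for redirect-based login.
// 'YOUR_APP_ID' and the redirect_uri below are placeholders, not real values.
const params = new URLSearchParams({
  client_id: 'YOUR_APP_ID',
  redirect_uri: 'https://example.com/fb-callback',
  response_type: 'code',
  scope: 'email'
});
const loginUrl = 'https://www.facebook.com/dialog/oauth?' + params.toString();
// Navigate the user to loginUrl instead of calling FB.login() from the JS SDK;
// Facebook redirects back to redirect_uri with a code that is exchanged server-side.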

Related

Service worker JavaScript update frequency (every 24 hours?)

As per this doc on MDN:
After that it is downloaded every 24 hours or so. It may be downloaded more frequently, but it must be downloaded every 24h to prevent bad scripts from being annoying for too long.
Is the same true for Firefox and Chrome? Or does the update to the service worker JavaScript only happen when the user navigates to the site?
Note: As of Firefox 57, and Chrome 68, as well as the versions of Safari and Edge that support service workers, the default behavior has changed to account for the updated service worker specification. In those browsers, HTTP cache directives will, by default, be ignored when checking the service worker script for updates. The description below still applies to earlier versions of Chrome and Firefox.
Every time you navigate to a new page that's under a service worker's scope, Chrome will make a standard HTTP request for the JavaScript resource that was passed in to the navigator.serviceWorker.register() call. Let's assume it's named service-worker.js. This request is only made in conjunction with a navigation or when a service worker is woken up via, e.g., a push event. There is not a background process that refetches each service worker script every 24 hours, or anything automated like that.
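For reference, that registration call is just the standard one sketched below (the file name service-worker.js follows the example above):
// Register the service worker. On later navigations within its scope, the browser
// re-requests this same URL with a normal HTTP request, as described next.
if ('serviceWorker' in navigator) {
  navigator.serviceWorker.register('/service-worker.js')
    .then(registration => console.log('SW registered, scope:', registration.scope))
    .catch(error => console.error('SW registration failed:', error));
}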
This HTTP request will obey standard HTTP cache directives, with one exception (which is covered in the next paragraph). For instance, if your server set appropriate HTTP response headers that indicated the cached response should be used for 1 hour, then within the next hour, the browser's request for service-worker.js will be fulfilled by the browser's cache. Note that we're not talking about the Cache Storage API, which isn't relevant in this situation, but rather standard browser HTTP caching.
The one exception to standard HTTP caching rules, and this is where the 24 hours thing comes in, is that browsers will always go to the network if the age of the service-worker.js entry in the HTTP cache is greater than 24 hours. So, functionally, there's no difference in using a max-age of 1 day or 1 week or 1 year—they'll all be treated as if the max-age was 1 day.
Browser vendors want to ensure that developers don't accidentally roll out a "broken" or buggy service-worker.js that gets served with a max-age of 1 year, leaving users with what might be a persistent, broken web experience for a long period of time. (You can't rely on your users knowing to clear out their site data or to shift-reload the site.)
Some developers prefer to explicitly serve their service-worker.js with response headers causing all HTTP caching to be disabled, meaning that a network request for service-worker.js is made for each and every navigation. Another approach might be to use a very, very short max-age—say a minute—to provide some degree of throttling in case there is a very large number of rapid navigations from a single user. If you really want to minimize requests and are confident you won't be updating your service-worker.js anytime soon, you're free to set a max-age of 24 hours, but I'd recommend going with something shorter on the off chance you unexpectedly need to redeploy.
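As a rough sketch of those two header strategies (assuming a Node/Express server, which the answer above does not specify):
// Hypothetical Express route serving service-worker.js with explicit cache headers.
const express = require('express');
const app = express();

app.get('/service-worker.js', (req, res) => {
  // Option 1: disable HTTP caching entirely, so every navigation re-fetches the script.
  res.set('Cache-Control', 'no-cache, no-store, must-revalidate');
  // Option 2 (alternative): throttle instead with a very short max-age, e.g. one minute.
  // res.set('Cache-Control', 'max-age=60');
  res.sendFile('service-worker.js', { root: __dirname });
});

app.listen(3000);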
Some developers prefer to explicitly serve their service-worker.js with response headers causing all HTTP caching to be disabled, meaning that a network request for service-worker.js is made for each and every navigation.
This no-cache strategy can be useful in a fast-paced «agile» environment.
Here is how:
Simply place the following (hidden) .htaccess file in the server directory containing service-worker.js:
# DISABLE CACHING
<IfModule mod_headers.c>
Header set Cache-Control "no-cache, no-store, must-revalidate"
Header set Pragma "no-cache"
Header set Expires 0
</IfModule>
<FilesMatch "\.(html|js)$">
<IfModule mod_expires.c>
ExpiresActive Off
</IfModule>
<IfModule mod_headers.c>
FileETag None
Header unset ETag
Header unset Pragma
Header unset Cache-Control
Header unset Last-Modified
Header set Pragma "no-cache"
Header set Cache-Control "max-age=0, no-cache, no-store, must-revalidate"
Header set Expires "Thu, 1 Jan 1970 00:00:00 GMT"
</IfModule>
</FilesMatch>
This will disable caching for all .js and .html files in this server directory and those below it, which covers more than service-worker.js alone.
Only these two file types were selected because they are the non-static files of my PWA that affect users who run the app in a browser window without (yet) installing it as a full-fledged, automatically updating PWA.
More details about service worker behaviour are available from Google Web Fundamentals.

Format of Retry-After Header?

In Dropbox's Core API best practices there is a statement:
"Apps that hit the rate limits will receive a 503 error which uses the Retry-After header to indicate exactly when it's okay to start making requests again."
This answer references the Retry-After header definition, which allows two formats:
Retry-After: Fri, 31 Dec 1999 23:59:59 GMT
Retry-After: 120
Does anyone know which format Dropbox uses?
It's the latter, where the value is the number of seconds the app should wait before trying again.
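A small sketch (my own, not from Dropbox's documentation) that handles both forms defensively:
// Parse a Retry-After value that may be either delta-seconds ("120") or an
// HTTP-date ("Fri, 31 Dec 1999 23:59:59 GMT"); returns a delay in milliseconds.
function retryAfterMs(headerValue) {
  const seconds = Number(headerValue);
  if (!Number.isNaN(seconds)) {
    return seconds * 1000;                    // delta-seconds form (what Dropbox sends)
  }
  const date = new Date(headerValue);         // HTTP-date form
  return Math.max(0, date.getTime() - Date.now());
}

// Example: retryAfterMs('120') === 120000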

How to efficiently retrieve expiration date for facebook photo URL and renew it before it expires?

Main problem:
Application caches URL's from facebook photo CDN
The photos expires at some point
My "technical" problem:
Facebook CDN "expires" header doesn't seem to be reliable ( or i don't know how to deal with them )
Using CURL to retrieve expiration date:
curl -I 'https://scontent-b.xx.fbcdn.net/hphotos-xap1/v/t1.0-9/q82/p320x320/10458607_4654638300864_4316534570262772059_n.jpg?oh=9d34386036754232b79c2208c1075def&oe=54BE4EE2'
Calling it one minute ago returned: Mon, 05 Jan 2015 01:34:28 GMT
Calling it again now returned: Mon, 05 Jan 2015 01:35:27 GMT
Both times "Cache-Control" returned the same: Cache-Control: max-age=1209600
So far:
It seems like one of the most reliable ways would be to have a background job checking the photos all the time, but that feels a bit "wrong", like "brute forcing".
Having a background job would still allow expired pictures to be served up to the moment the photo URL is "renewed"
My questions are:
Should I use the max-age parameter even though it doesn't seem to change?
Is there a reliable way of using Facebook's CDN URLs?
Any other idea of how this should be implemented?
< joke >Should facebook API be used to punish badly behaving coders?< /joke >
Possible solutions ?
Check facebook for the most recent URL before serving any CDN URL
~> would slow down my requests a lot
Have a background job renewing the URL and expiration dates
~> would potentially serve expired photos until the job "catches" them
Download the photos to my own CDN
~> not a good practice, I would guess
UPDATE:
~> Perhaps Tinder actually caches users' pictures on its own CDN: https://gist.github.com/rtt/10403467 so it seems like Facebook is kind of OK with it?
Expires means exactly one thing, and it's not what you think it is:
The Expires entity-header field gives the date/time after which the response is considered stale. […]
The presence of an Expires field does not imply that the original resource will change or cease to exist at, before, or after that time.
— RFC 2616 §14.21, emphasis mine
If Facebook's image URLs stop working after some point in time, that's their business. Their HTTP headers don't have to mention it, and in fact, don't.
That being said, I suspect that the oe URL parameter may contain an expiration timestamp. If I interpret 54be4ee2 as a hexadecimal number containing a UNIX timestamp, I get January 20th, 2015, which is almost exactly a month from now. Might that be the value you're looking for?
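A quick way to check that guess (my own sketch; reading oe as a timestamp is an assumption, not anything Facebook documents):
// Interpret the "oe" query parameter as a hexadecimal UNIX timestamp (assumption)
// and convert it to a Date.
const oe = '54be4ee2';
const expiresAt = new Date(parseInt(oe, 16) * 1000);
console.log(expiresAt.toUTCString()); // "Tue, 20 Jan 2015 12:49:38 GMT"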

If I enable the "July 2013 Breaking Changes" migration for my app, searching for users by email no longer works

I'm using the search graph API to search for users by email. Here's an example of how I do that:
GET https://graph.facebook.com/search?q=Sample%40gmail.com&fields=name%2clink%2ceducation%2cid%2cwork%2cabout%2cpicture&limit=2&type=user&access_token=...
Before the July 2013 Breaking Changes it was working fine. Once I enabled the breaking changes, I started getting HTTP 403 responses saying that the access token is not valid.
HTTP/1.1 403 Forbidden
Access-Control-Allow-Origin: *
Cache-Control: no-store
Content-Type: text/javascript; charset=UTF-8
Expires: Sat, 01 Jan 2000 00:00:00 GMT
Pragma: no-cache
WWW-Authenticate: OAuth "Facebook Platform" "insufficient_scope" "(#200) Must have a valid access_token to access this endpoint"
X-FB-Rev: 798183
X-FB-Debug: lZPVbdTmZrCo+Bde/MNEXy/halUzQx7qIDW5aiZeT0g=
Date: Mon, 29 Apr 2013 07:25:29 GMT
Connection: keep-alive
Content-Length: 120
{"error":{"message":"(#200) Must have a valid access_token to access this endpoint","type":"OAuthException","code":200}}
Once I remove the %40 (the @ sign) or the '.com' part from the request, I get normal HTTP 200 results. The problem is that it's not what I'm looking for. I want to be able to search for users by email the way I was able to before.
Examples of requests that do work:
GET https://graph.facebook.com/search?q=Samplegmail.com&fields=name%2clink%2ceducation%2cid%2cwork%2cabout%2cpicture&limit=2&type=user&access_token=...
GET https://graph.facebook.com/search?q=Sample%40gmail&fields=name%2clink%2ceducation%2cid%2cwork%2cabout%2cpicture&limit=2&type=user&access_token=...
As 林果皞 said, this is a bug in the Graph API. I filed a bug report here:
https://developers.facebook.com/bugs/335452696581712
Have you tried FQL?
SELECT uid, username, first_name, middle_name, pic, pic_small, pic_big,
       pic_square, last_name, name, email, birthday, birthday_date,
       contact_email, current_address, current_location, education,
       hometown_location, languages, locale, profile_url, sex, work
FROM user
WHERE contains('youremail@example.com')
Search by email works fine for me (the access token was granted only basic permissions):
https://developers.facebook.com/tools/explorer?method=GET&path=%2Fsearch%3Fq%3Dlimkokhole%40gmail.com%26fields%3Dname%2Clink%2Ceducation%2Cid%2Cwork%2Cabout%2Cpicture%26limit%3D2%26type%3Duser
Update:
Recently the Graph API Explorer app enabled the "July 2013 Breaking Changes", so the example I've shown above wouldn't work anymore.

Session affinity not maintained with HTTP requests from IFrame in a canvas Facebook application

I am migrating a Facebook canvas application from FBML-based to iframe-based.
The Facebook client class used to communicate with the Facebook APIs is placed in the HTTP session the first time a user accesses my application. For subsequent requests, I retrieve the Facebook client object stored in the session and communicate with facebook.com using the same client.
There are two types of Facebook canvas applications, that is, applications within facebook.com:
FBML version
iframe version
The FBML version of the application maintains session affinity, that is, the same session object is used by the application server for requests from the same Facebook user.
Hence, I am able to retrieve the Facebook client placed in the session and reuse it. In the iframe-based canvas application, however, where the application is displayed within an iframe, the same session object is not reused; a new session is created on each request, and the Facebook client I placed earlier vanishes.
No session affinity is maintained and new sessions keep getting created. On inspecting the cookies further, I found that the JSESSIONID cookie is not present in the HttpServletRequest object for the iframe canvas application.
Dump of cookies and session taken for consecutive requests from the Facebook application to my server:
FBML APP:
--------------------Cookies-------------------
JSESSIONID==6E8792ADDF2AF192BF71864C353DE8E5==null
----------------Session-----------------
Session ID : 6E8792ADDF2AF192BF71864C353DE8E5
Creation time : Thu Sep 08 16:36:19 IST 2011
--------------------Cookies-------------------
JSESSIONID==6E8792ADDF2AF192BF71864C353DE8E5==null
----------------Session-----------------
Session ID : 6E8792ADDF2AF192BF71864C353DE8E5
Creation time : Thu Sep 08 16:36:19 IST 2011
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
IFrame App:
---------------------------------------------
--------------------Cookies-------------------
null
----------------Session-----------------
Session ID : D03845C95FC49E79EF7EED1FE8377799
Creation time : Thu Sep 08 16:39:09 IST 2011
--------------------Cookies-------------------
null
----------------Session-----------------
Session ID : 7466CDB69784FA10C570122BC866DB14
Creation time : Thu Sep 08 16:39:19 IST 2011
--------------------Cookies-------------------
null
----------------Session-----------------
Session ID : 4A23EA79AF929E6C2BD4114173AB250F
Creation time : Thu Sep 08 16:39:45 IST 2011
It is because of this that session affinity is not maintained, but I cannot work out why it is happening. I am using Struts 2 and plain servlets. The solution would be to enable an iframe canvas application to maintain session affinity, that is, to have the JSESSIONID cookie sent with every request. What should I do, or are there alternative solutions?
In order for the session cookie to be preserved in an iframe, you need to add the P3P HTTP header. I do not know the exact value, but the following, found on the Internet, worked for me:
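// Compact P3P policy: without one, Internet Explorer's default privacy settings
// reject third-party cookies (such as JSESSIONID) set from inside an iframe.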
httpResponse.setHeader("P3P","CP='IDC DSP COR ADM DEVi TAIi PSA PSD IVAi IVDi CONi HIS OUR IND CNT'");