How to create and implement a pixel tracking code - zend-framework

OK, here's a goal I've been chasing for a while.
As is well known, most advertising and analytics companies use a so-called "pixel" code in order to track website views, transactions, conversions, etc.
I have a general idea of how it works; the problem is how to implement it. The tracking code consists of a few parts.
The tracking code itself.
This is the code that the user inserts into the <head> section of his webpage. The main goal of this code is to set some customer-specific variables and to call the *.js file.
*.js file.
This file holds all the magic: creating/reading/updating/deleting (CRUD) cookies and tracking the user's events and interactions with the webpage.
The pixel code.
This is an <img> tag with the src attribute pointing to an image file, a *.gif for example, whose request carries all the parameters collected on the page so they can be stored in the database.
Example:
WordPress pixel code: <img id="wpstats" src="http://stats.wordpress.com/g.gif?host=www.hostname.com&list_of_cookies_value_pairs;" alt="">
Google Analytics:
http://www.google-analytics.com/__utm.gif?utmwv=4&utmn=769876874&etc
Now, it's obvious that the *.gif request has to reach a server-side scripting language in order to read the parameter data and store it in a DB.
Does anyone have an idea how to implement this in Zend?
UPDATE
Another thing I'm interested in: how do I prevent the user's browser from loading a cached *.gif? Will a random parameter value do the trick? Example: src="pixel.gif?nocache=random_number", where the nocache parameter value is different on every request.

As Zend is built using PHP, it might be worth reading the following question and answer: Developing a tracking pixel.
In addition to that answer: since you're looking for a way to avoid caching of the tracking image, the easiest way is to append a unique/random string to its URL, generated at runtime.
For example, on the server side, with the creation of each image tag, you might append a random id to the URL:
<?php
// Generate a random 8-digit id
$rand_id = mt_rand(10000000, 99999999);
// Echo the image tag, appending the random id as a cache buster
echo "<img src='pixel.php?a=".urlencode($vara)."&b=".urlencode($varb)."&rand=".$rand_id."'>";
?>

Just adding my 2 cents to this thread, because I think an important and frequently used option is missing: you don't necessarily need a scripting language to capture the request. A more efficient approach is to let the web server's access log (the Apache access log, for instance) record the request, and then process that log with whatever tools you see fit, such as an ELK stack.
This makes serving the requests much lighter: no scripting language is loaded to prepare the response, just a native Apache response, which is typically much more efficient.
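As a rough sketch of this approach, assuming Apache and a plain static file named pixel.gif (the log path and format name below are illustrative choices, not requirements), you could route pixel requests to a dedicated log and process it offline:

# Log only requests for the tracking pixel to a dedicated file;
# "pixel.log" and the "tracking" format name are arbitrary choices.
SetEnvIf Request_URI "^/pixel\.gif" is_pixel
LogFormat "%h %t \"%r\" \"%{Referer}i\" \"%{User-Agent}i\"" tracking
CustomLog /var/log/apache2/pixel.log tracking env=is_pixel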

First of all, the *.gif doesn't need to actually be that file type; the only thing that matters is the Content-Type HTTP header. Set it to image/gif (or any other appropriate type) at the beginning, execute your code, and render some sort of image to the response body.
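As a minimal sketch of that idea in plain PHP (the log file here is just a stand-in for a real database write, and in Zend you'd put the same logic in a controller action):

<?php
// pixel.php - record the query-string parameters, then return a 1x1 GIF
file_put_contents(
    __DIR__ . '/tracking.log',
    date('c') . ' ' . json_encode($_GET) . PHP_EOL,
    FILE_APPEND | LOCK_EX
);

// Discourage caching so every view triggers a fresh request
header('Cache-Control: no-cache, must-revalidate');
header('Expires: Sat, 26 Jul 1997 05:00:00 GMT');
header('Content-Type: image/gif');

// A transparent 1x1 GIF, base64-encoded
echo base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');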

All of the above answers are good, but to build on the "g.gif" mentioned above:
You can just add a simple PHP script that writes to SQL, or to a file with file_put_contents("file.txt", $opened),
where the variable $opened serves as a counter that is incremented when someone opens your mail... then save that script as "g.gif".
To do all of this, just add these lines:
<Files "g.gif">
AddType application/x-httpd-php .gif
</Files>
to your ".htaccess" file (note that <Files> matches file names, not directories, so match the pixel itself), and be sure to make a new directory for that g.gif (or whatever.gif), where the directory contains only g.gif and the .htaccess.
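For completeness, a rough sketch of what that g.gif script might contain (the counter file name is arbitrary, and this assumes the .htaccess trick above so the .gif is executed as PHP):

<?php
// g.gif - increment a hit counter each time the tracking image is requested
$file = __DIR__ . '/counter.txt';
$opened = file_exists($file) ? (int) file_get_contents($file) : 0;
file_put_contents($file, ++$opened, LOCK_EX);

// Respond with a transparent 1x1 GIF so the <img> tag renders cleanly
header('Content-Type: image/gif');
echo base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');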

Related

Fetching a redirect's target URL in OpenRefine

I have a CSV of ~2000 URLs that, when queried, do a 301 or 302 redirect, and I'm trying to figure out whether OpenRefine is able to export to a new column the destination URL that it actually retrieves HTML from when I fetch the HTML (or get at it some other way).
e.g.
https://www-istp.gsfc.nasa.gov/stargaze/Ssolsys.htm
redirects to
https://pwg.gsfc.nasa.gov/stargaze/Ssolsys.htm
And I know that from clicking the link in my browser of choice. I've found a few answers suggesting that this can be done in various coding languages, but nothing so far suggesting how to do so in OpenRefine, even though I'm like 80% sure that it can be.
Does anyone out there know what I might be able to do to make this happen?
In OpenRefine you can write expressions in GREL, Jython (Java Implementation of Python 2) and Clojure.
As far as I know GREL does not support analyzing the target of a redirection URL, so I would use Python for that.
In your OpenRefine project, go to the column containing the URLs and use "Edit column" > "Add column based on this column..."
In the corresponding dialog window, change the expression language to "Python / Jython" and use the following code snippet to retrieve the "real" URL of the request:
import urllib2

# "value" holds the current cell content (the original URL);
# urlopen follows redirects, so geturl() returns the final URL
response = urllib2.urlopen(value)
return response.geturl()

REST API versioning when using Atom for resource collections

I know this is something that has been discussed over and over, and I have done extensive research to get where I am so far, but can't seem to get over the final hurdle.
I am designing a custom REST API for our application, and have decided that I would like to version using media types, e.g. application/vnd.mycompany.resource.v2+xml. I realise the pros and cons of this model, and it seems to weigh up as the most flexible.
Hence my GET would look as follows:
=== REQUEST ===>
GET /workspaces/123/contacts?firstName=Neil&accessID=789264&timestamp=1317611 HTTP/1.1
Accept: application/vnd.mycompany.contact-v2+xml
<== RESPONSE ===
HTTP/1.1 200 OK
Content-Type: application/vnd.mycompany.contact-v2+xml
<contact>
<name>Neil Armstrong</name>
<mobile>+61456838435</mobile>
<email>neil.armstrong@space.com</email>
</contact>
The problem is that I would like to use Atom feeds and entries to represent my resource collections. This way I can harness the searching and pagination of Atom without it infecting my resources or API structure.
If I use Atom for my requests, my request structure now looks like:
=== REQUEST ===>
GET /workspaces/123/contacts HTTP/1.1
Accept: application/atom+xml; type=feed;
<== RESPONSE ===
HTTP/1.1 200 OK
Content-Type: application/atom+xml; type=feed;
<feed xmlns="http://www.w3.org/2005/Atom">
<title>Contacts Feed</title>
<link rel="self" href="https://api.mycompany.com/workspaces/contacts"/>
<updated>2011-11-13T18:30:02Z</updated>
...
<entry>
<title>Neil Armstrong</title>
...
<content type="application/vnd.mycompany.contact-v2+xml">
<contact>
<name>Neil Armstrong</name>
<mobile>+61456838435</mobile>
<email>neil.armstrong@space.com</email>
</contact>
</content>
</entry>
</feed>
Using Atom to represent my collections of resources, I lose the ability to version using media types, as the media type is now hidden within the content of the Atom entry:
<content type="application/vnd.mycompany.contact-v2+xml">
What is the best practice for determining the media type version of my resource, while still utilising the power of Atom for Resource collection management?
My thinking is that I could pass it through the Accept header, e.g.
Accept: application/atom+xml; type=feed; version=1.0
But then this is confusing as you are asking for version 1.0 of the Atom feed, not the resource itself...
Any help would be really appreciated!!
The problem is that IMHO you're misusing the media types.
Media types give you information on the STRUCTURE of the actual payload, but not the SEMANTICS of the payload. "I know this is an XHTML page, but I don't know if it's a blog post or an item on Amazon." By being an XHTML page, you know how to get the component parts out of the payload and ask interesting questions, but interpretation of the payload is not part of the media type.
Consider an example, paraphrased from one of Roy Fielding's: sending a 10,000-bit array as a GIF file that's 100x100 pixels. GIF, as "everyone knows", is used for sending pictures, but it's really simpler than that: it's a mechanism for sending structured binary data that just happens, most of the time, to be images. So, in this case of using it to send a 10,000-bit array (perhaps represented as a grayscale image of 00 and FF), you get the benefit of a common decoder (GIF), GIF's built-in compression, etc.
But in this case, it's not a picture. You can show it as a picture, but it's a meaningless picture. The classic semantic of GIF being used for the picture scenario is not relevant here. The benefit is the ubiquity of the format.
Another example: years ago, an engineer was doing radar studies. He would take the 3-view drawings of aircraft you would find in books and such, and encode them, using a tablet, into AutoCAD drawings. The DWG format was well documented, and he had code to read it. What he wanted was the coordinates and measurements of the specific aircraft.
So, in the end he had a bunch of "meaningless" AutoCAD files with nothing but a bunch of lines in them that "made no sense". But in fact they were chock full of good information for his domain. The DWG file was the media type, but these weren't "CAD drawings". (Can you say "spontaneous reuse"?)
It's fine to version something via media type, but that's only relevant if the media type is in fact changing. ATOM, as you noted, isn't changing, or at least it's not changing under your control, and you may choose not to support the new version if/when it does change. ATOM is not changing because how it represents its information, how that information is encoded, is not changing. The information itself may well change; in fact, it changes all the time. Every ATOM feed is different, with different information. MOST have similar semantics (blog feeds), but many do not (for example, perhaps your scenario).
But how you will parse and get information out of the ATOM feed will not change. And that's what the media type represents. An encoding of information, not the information itself.
So, if you want to detect versioning, then check within your payload. Inspect it. You KNOW where, for V1 of your data, the invoice number is (perhaps it's at invoice/inv_no in XPath). If the invoice number is NOT there, then what do you do? You either a) look in some other well-known place (i.e. V2's), or b) throw an error ("Whatever this is, it's not an invoice!"). You would have to do that no matter what, because you could be getting anything, regardless of what the version says, what the media type says, or what anything else says.
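A minimal sketch of that payload inspection in PHP (the V1 location comes from the example above; the V2 location and function name are invented for illustration):

<?php
// Detect the payload version by checking where each version keeps the invoice number
function detectInvoiceVersion($xml)
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    $xpath = new DOMXPath($doc);

    if ($xpath->query('/invoice/inv_no')->length > 0) {
        return 'v1'; // V1 keeps the number at invoice/inv_no
    }
    if ($xpath->query('/invoice/header/number')->length > 0) {
        return 'v2'; // hypothetical V2 location
    }
    throw new RuntimeException('Whatever this is, it\'s not an invoice!');
}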
You can make your payloads forward compatible, resistant to breaking changes; then versioning is a matter of making use of all the information you can see. If clients get A and B, then while they'd like to have C and D as well, they can get by with the more limited information. Or, if the clients see C and D, they know to ignore A and B, as that data is deprecated. Same with the server: if something is sending A and B, it's implied to be an older processing model than if it sent along C and D.
You can version through rel names: "order" vs "order_2". Old clients only know to use "order"; new clients know to use "order_2" and follow that link instead.
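In Atom terms that might look like the following (the rel names match the example above; the hrefs are made up):

<entry>
  ...
  <!-- old clients follow "order"; new clients prefer "order_2" -->
  <link rel="order" href="https://api.mycompany.com/orders/42"/>
  <link rel="order_2" href="https://api.mycompany.com/v2/orders/42"/>
</entry>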
Or you simply include a version identifier in the payload, that's an easy check as well (especially since it's early in your design).
There are a lot of ways to manage the versioning, but the media type really shouldn't be the mechanism. That's why this really isn't a "problem" with ATOM. So, it's a matter of perspective.
I have another discussion about the Accept header over here: REST API having same object, but light
This is (IMHO) unrelated to your versioning issue, but it's an example of extended media types. That's only my perception of why and how most folks want "versioning", though. A case could be made that it's the same thing, but most folks associate versioning with services, not simply data representations, which is what that other post was mostly about.
In the end, either your client and/or server are flexible enough to handle versioned data or they're not. They will (mostly, they are computers after all. Deterministic my heinie...) do what they're told. A simple rule of "ignore stuff that you don't know" can take you quite far in terms of versioning without ever changing a v1 to a v2, regardless of your encoding. Likewise "work with what you have" is a nice rule for a flexible, tolerant server. If you have problems in either case, that's what errors, logs, operators, and 24hr pagers are for, and you need those anyway.

Best method to serve a dynamic image for mails

I would like to represent dynamic images in an email. For example with the given url
<img src="http://myserver.com/index.php/user_key/thispagestate.jpg" />
I would like to serve a different image based on logic within my server. There will only be between 2 to 4 static images used to represent the result of any given request.
The 2 options I had in mind were:
to serve the images directly using perhaps
imagecreatefromjpeg
Or generate 302 redirects
Seeing as each request will result in one of a limited number of images I thought a redirect might save resources on our end and make use of caching on the user's end too. The result for each request will change depending on the user and time, perhaps using redirects will have some consecuence for SEO or spam filtering?
Your opinions on the best method will be appreciated
The 2 options I had in mind were: to serve the images directly, perhaps using imagecreatefromjpeg, or to generate 302 redirects.
I'd go with #1 in this case, though since it's a static image you can simply use:
header("Cache-Control: no-cache, must-revalidate"); // HTTP/1.1
header("Expires: Sat, 26 Jul 1997 05:00:00 GMT"); // Date in the past
header('Content-Type: image/jpg'); // or image/png, etc.
echo file_get_contents($image_path); // where $image_path is the path to the image
exit;
instead. You'd only need to use the GD functions if you were trying to do something like adding text on top of the static image.
Note that in this case I'm setting the cache to expire immediately, since the URL will stay the same but the content might change, which could otherwise confuse caching systems.
Seeing as each request will result in one of a limited number of images I thought a redirect might save resources on our end and make use of caching on the user's end too.
The reverse actually, since the same file will now have different content. You'll want them to revalidate the content each time to make sure the proper image shows.

RESTful, efficient way to query List.contains(element)?

Given:
/images: list of all images
/images/{imageId}: specific image
/feed/{feedId}: potentially huge list of some images (not all of them)
How would you query if a particular feed contains a particular image without downloading the full list? Put another way, how would you check whether a resource state contains a component without downloading the entire state? The first thought that comes to mind is:
Alias /images/{imageId} to /feed/{feedId}/images/{imageId}
Clients would then issue an HTTP GET against /feed/{feedId}/images/{id} to check for its existence. The downside I see with this approach is that it forces me to hard-code logic into the client for breaking an image URI down into its proprietary id, something that REST frowns upon. Ideally I should be using the opaque image URI. Another option is:
Issue HTTP GET against /feed/{feedId}?contains={imageURI} to check for existence
but that feels a lot closer to RPC than I'd like. Any ideas?
What's wrong with this?
HEAD /images/id
It's unclear what "feed" means, but assuming it contains resources, it'd be the same:
HEAD /feed/id
It's tricky to say without seeing some examples to provide context.
But you could just have clients call HEAD /feed/images/{imageURI} (assuming that you might need to encode the imageURI). The server would respond with the usual HEAD response, or with a 404 error if the resource doesn't exist. You'd need to code some logic on the server to understand the imageURI.
Then the client either uses the image meta info in the headers, or gracefully handles the 404 error and does something else (depending on the application, I guess).
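As a minimal client-side sketch in PHP (the URL layout follows the alias suggested in the question and is hypothetical, as is the function name):

<?php
// Returns true if the feed contains the image, judged by the HEAD status code
function feedContainsImage($feedUrl, $imageUri)
{
    $ch = curl_init($feedUrl . '/images/' . rawurlencode($imageUri));
    curl_setopt($ch, CURLOPT_NOBODY, true);         // issue a HEAD request
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // don't print the response
    curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $status === 200; // a 404 means "not in the feed"
}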
There's nothing "un-RESTful" about:
/feed/{feedId}?contains={imageURI}[,{imageURI}]
It returns the subset as specified. The resource, /feed/{feedid}, is a list resource containing a list of images. How is the resource returned with the contains query any different?
The URI is unique and returns the appropriate state from the application. I can't say anything about the caching semantics of the request, but they're identical to whatever the caching semantics of the original /feed/{feedid} are; it's simply a subset.
Finally, there's nothing that says a /feed/{feedid}/image/{imageURL} even has to exist. If you want to work with the sub-resources at that level, then fine, but you're not required to. The list coming back will likely just be a list of direct image URLs, so where's the link describing the /feed/{feedid}/image/{imageURL} relationship? You were going to embed that in the payload, correct?
How about setting up a ImageQuery resource:
# Create a new query from form data where you could constrain results for a given feed.
# May or may not redirect to /image_queries/query_id.
POST /image_queries/
# Optional - view query results containing URIs to query resources.
GET /image_queries/query_id
This video demonstrates the idea using Rails.

Use GET or POST for a search form

I have a couple search forms, 1 with ~50 fields and the other with ~100. Typically, as the HTML spec says, I do searches using the GET method as no data is changed. I haven't run into this problem yet, but I'm wondering if I will run out of URL space soon?
The limit in Internet Explorer is 2083 characters; other browsers have a much higher limit. I'm running Apache, so the limit there is around 4000 characters, while IIS's is 16384 characters.
At 100 fields, with an average field name length of 10 characters plus encoded values and separators, that's already around 5000 characters... amazingly, I haven't had any errors on the 100-field form yet. (25% of the fields are multiple selects, so their contribution is much longer.)
So, I'm wondering what my options are. (Shortening the forms is not an option.) Here are my ideas:
Use POST. I don't like this as much because at the moment users can bookmark their searches and perform them again later--a really dang nice feature.
Have JavaScript loop through the form to determine which fields are different than default, populate another form and submit that one. The user would of course bookmark the shortened version.
Any other ideas?
Also, does anyone know if the length is the encoded length or just plain text?
I'm developing in PHP, but it probably doesn't make a difference.
Edit: I am unable to remove any fields; I am unable to shorten the form. This is what the client has asked for and they often do use a range of fields, in the different categories. I know that it's hard to think of a form that looks nice with this many fields, but the users don't have a problem understanding how it works.
Are your users actually going to be using all 50-100 fields to do their searches? If they're only using a few, why not POST the search to an "in between" page which header()-redirects them to the results page with only the user-changed fields in the URL? The results page would then use the default values for the fields that don't exist in the URL.
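A minimal sketch of that in-between page in PHP (the $defaults array and the results.php target are placeholders for this example):

<?php
// search-redirect.php - keep only the fields that differ from their defaults,
// then redirect to a bookmarkable GET URL
$defaults = array('sort' => 'date', 'status' => 'any'); // one entry per form field

$changed = array();
foreach ($_POST as $name => $value) {
    if (!isset($defaults[$name]) || $value !== $defaults[$name]) {
        $changed[$name] = $value;
    }
}

header('Location: results.php?' . http_build_query($changed));
exit;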
To indirectly address your question, if I was faced with a 100-field form to fill in on one page, I'd most likely close my browser, it sounds like a complete usability nightmare.
My answer is, if there's a danger that I'm getting anywhere near that limit for normal usage of the form, I'm probably Doing It Wrong.
In order of preference, I would
Split the form up and use some server-side state retention
Switch to POST, and then generate and redirect to a shorter URL on POST that resolved to the same result
Give up ;)
You mention in a comment that many of the fields "are hidden and can be opened as required".
If you are willing to discard graceful degradation, you could always actually add and remove the fields from the form, rather than just hiding and showing them: the browser won't submit the ones that aren't included in the form.
This is a variant of the "Make and model" forms that online insurance etc. pages use -- select the make, submit back to the server and get the list of models for that manufacturer.
If you don't mind using JavaScript, you could have it work out the length of the query string and, if it is too long, switch to a POST. Then have some sort of URL mapper to allow bookmarking these POSTed searches.
Use POST, and if the user bookmarks the search, save it in a database and give it a unique token; then redirect to the search page using GET, passing the token as a parameter.
TinyURL is a nice example: You give it a very long URL, it saves it to a DB, gives you a unique identifier for that URL and later you can request the long URL using that identifier.
In PHP it would be something along the lines of:
<?php
if (isset($_GET['token']))
{
    // Look up a previously saved search by its token
    // (mysql_* was standard when this was written; use PDO/mysqli in modern PHP)
    $token = mysql_real_escape_string($_GET['token']);
    $qry = mysql_query("SELECT fields FROM searches WHERE token = '{$token}'");
    if ($row = mysql_fetch_assoc($qry))
    {
        performSearch(unserialize($row['fields']));
        exit;
    }
    showError('Your saved search has been removed because it hasn\'t been used in a while');
    exit;
}

// No token yet: save the POSTed fields under a new token and redirect to it
$fields = mysql_real_escape_string(serialize($_POST));
$token = sha1($_SERVER['REMOTE_ADDR'].rand());
mysql_query("INSERT INTO searches (token, fields, save_time) VALUES ('{$token}', '{$fields}', NOW())");
header('Location: ?token='.$token);
exit;
?>
And run a script daily:
<?php
mysql_query('DELETE FROM searches WHERE save_time < DATE_ADD(NOW(), INTERVAL -200 DAY)');
?>
Also, does anyone know if the length is the encoded length or just plain text?
My guess was the encoded length. I made a simple test: a textarea and a submit button submitting via GET to a simplistic PHP script.
I loaded the page in IE6 and pasted some French text into the textarea, 2000 characters of it. When I hit the submit button, nothing happened; I had to reduce the length of the text to be able to submit. (Accented characters URL-encode to several characters each, so 2000 plain characters easily exceed the limit once encoded.)
In other words, the 2083-character limit is exactly the maximum length of the URL found in the address bar after submitting the GET request.
I would go for the JavaScript solution: on submit, analyze the form, create a secondary form with hidden attributes, and submit that.
Some strategies on shortening the output:
As you point out, you can already skip all values left to default (no field, no value).
If you have a form like the one at Processing forum search, you can group all checkbox states in one variable only, e.g. using letter encoding.
Use short value attributes (in select for example).
Note: if the search page is actually composed of several independent forms, where users fill only one section or another, you can make several separate forms.
It might not apply to your case and might seem obvious, but it's worth mentioning for the record... ^_^
One could philosophically look at the search submission POST as the creation of a saved search (especially when a search is as complex an object as the one your users are making). In this case, you could accept the post for the creation of a search and then redirect using a GET to fetch the appropriate search results (post/redirect/get).
This would also allow users to bookmark the search results (GET) and come back at any time to re-run the search.
GET can have one more advantage if your search results can be shared: with a POST request, if you send the link to someone, that person won't see any search results.