I'm trying to generate tracking statistics where I can save the campaign and traffic sources. To do that, I would like to generate visitor cookies with an id and save the traffic sources in a table. Is there a TYPO3 way to generate such a cookie? If not, where should I include the logic for it? The whole project is organised as an extension. Should I generate the cookie inside ext_localconf.php?
I'm thankful for any help.
The core uses the plain PHP functions for cookies. There is no separate TYPO3 API available (or necessary) to handle them.
Output is buffered so you can set cookies (and other headers) anywhere in your code.
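A minimal sketch in plain PHP, assuming a cookie named tracking_id and a one-year lifetime (both made up for illustration):

<?php
if (empty($_COOKIE['tracking_id'])) {
    // Assumed cookie name and lifetime; adjust them to your tracking needs.
    $visitorId = uniqid('', true);
    setcookie('tracking_id', $visitorId, time() + 365 * 86400, '/');
} else {
    $visitorId = $_COOKIE['tracking_id'];
}
// $visitorId can now be written to your tracking table together with the
// campaign and traffic source parameters of the current request.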
I am working on the link checker and want to know when AEM saves URLs in /var/linkchecker, and on what basis.
Does it save a link when I open it, or does it poll, i.e. traverse the complete content and put the links in /var/linkchecker?
Which Java class helps to store valid or invalid links in its storage directory?
The link checker is based on an event handler for /content (and child) nodes that fires on creates and updates. All content is parsed, and links are validated against the allowed protocols and (configurable) external site links.
External Links
All the validation is done asynchronously in the background and the HTML is updated based on verification results.
/var/linkchecker is the cache for external links. The results are based on simple GET requests to external links, in order to optimise the process. An HTTP 200/30x response means the link is valid. AEM looks at this cache before requesting a validation of the external link, in order to optimise page processing. This also means that link validation is NOT real time and the delay is proportional to the load on your server.
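This is not AEM's actual implementation class, just an illustration of what each external check conceptually amounts to (plain java.net, purely for illustration):

import java.net.HttpURLConnection;
import java.net.URL;

public class LinkProbe {

    // Conceptual sketch only: a 200/30x status on a plain GET counts as valid.
    public static boolean isValid(String externalUrl) {
        try {
            HttpURLConnection connection =
                    (HttpURLConnection) new URL(externalUrl).openConnection();
            connection.setInstanceFollowRedirects(false); // a 30x response already counts as valid
            connection.setRequestMethod("GET");
            int status = connection.getResponseCode();
            return status >= 200 && status < 310;
        } catch (Exception e) {
            return false; // unreachable or malformed URL -> treated as broken
        }
    }
}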
All the links that have been checked can be seen via the /etc/linkchecker.html screen, where you can request revalidation and refresh the status of the links.
You can configure the frequency of this background check via the Day CQ Link Checker Service configuration under /system/console/configMgr. The default interval is 5 seconds (scheduler.period parameter).
Under the config manager /system/console/configMgr you will find a lot of other Day CQ Link * configurations that control this feature.
For example, Day CQ Link Checker Transformer contains config for all the elements that need to be transformed by the link checker.
Similarly Day CQ Link Checker Info Storage Service configures the link cache.
Internal Links
Internal links are ignored unless they use FQDNs/external URLs (which is not normally the case on author). The only exception is a multi-tenant environment where a page from one site links to another site and all the mapping information is stored in Sling mappings.
I want to know the difference between Follow Redirects and Redirect Automatically while recording with JMeter.
Also, what effect will each of these have when used with Retrieve All Embedded Resources from HTML?
Redirect Automatically will not record a redirect as a separate request, whereas Follow Redirects will record each redirection as a separate request.
This difference can be visualized in the Listener (View Results Tree).
If Retrieve All Embedded Resources from HTML is checked, it will give you the page load time: in addition to the response time, it keeps measuring until all the supporting files of the HTML page (CSS, images, JavaScript files, etc.) have been downloaded locally.
Also, if any values need to be captured from a redirect request, you need to configure Follow Redirects; otherwise you will not be able to capture that data using extractors (Set-Cookie values, for example).
Hope this will help.
I'm using WWW::Mechanize to retrieve a webpage. I need to check if the page has been updated and retrieve information from it. How can I do this?
Use the mirror method. This works fine for GET requests; see the method attribute of the form element you are submitting. Just take note of the URI where you arrived and use it to repeatedly call mirror; then there is no need to fill in and submit the form any more.
In the case of POST, you cannot leverage any assistance from the HTTP protocol (conditional requests, ETags and other cacheability features). You have to manually write out the fetched results to files and then compare them.
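The question is about Perl's WWW::Mechanize, but just to illustrate the manual approach for the POST case, a rough Python 2 sketch (URL, form field and file name are placeholders) is: fetch, hash the body, and compare against the hash stored from the previous run.

import hashlib
import urllib
import urllib2

URL = "http://example.com/search-results"   # placeholder for the page you fetch
CACHE_FILE = "last_result.sha1"              # where the previous run's hash is kept

# The data argument turns this into a POST; the form field is a placeholder.
body = urllib2.urlopen(URL, urllib.urlencode({"q": "example"})).read()
digest = hashlib.sha1(body).hexdigest()

try:
    previous = open(CACHE_FILE).read()
except IOError:
    previous = None                          # first run, nothing stored yet

if digest != previous:
    # The page changed since the last run: extract your information here.
    open(CACHE_FILE, "w").write(digest)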
Please give me an idea about managing data in GWT. I am using GWT in my travel portal project, and my pages depend on data from previous pages, but when I press the browser's refresh button my data is lost. Please let me know if there is any way to handle this problem.
The GWT History class cannot be used to manage a page refresh (only back/forward navigation).
A click on the refresh button sends a request to the server, and the state of the application is reloaded from the server. That's all. You have to deal with it.
If you don't want to lose your data, you have to find a way to save it on the server when it's needed.
If your users have modern browsers, you can use the HTML5 localStorage feature to store the data in the browser across page refreshes.
Check this thread for supported browsers.
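A minimal sketch of that idea with GWT's wrapper around localStorage (com.google.gwt.storage.client.Storage, available from GWT 2.3 on; the key names are made up):

import com.google.gwt.storage.client.Storage;

public class FormStateStore {

    // Null on browsers without HTML5 localStorage, so the caller still
    // needs a fallback (e.g. saving to the server as suggested above).
    private final Storage storage = Storage.getLocalStorageIfSupported();

    public void save(String key, String value) {
        if (storage != null) {
            storage.setItem(key, value);   // survives a page refresh
        }
    }

    public String restore(String key) {
        return storage == null ? null : storage.getItem(key);
    }
}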
You can create a URL fragment to encode your data.
String location = "ny";
History.newItem("location="+location);
will result in a URL fragment of www.example.com#location=ny.
Then, if the browser is refreshed, you can decode the URL fragment and determine that the location is ny.
For multiple parameters you can create a more complex fragment and parse it.
History.newItem("start="+startLocation+"&end="+endLocation);
Then the URL would look like www.example.com#start=newyork&end=boston.
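To read the values back after a refresh, something along these lines could split the current token (a rough sketch; the start/end names just follow the example above):

import java.util.HashMap;
import java.util.Map;

import com.google.gwt.user.client.History;

public class FragmentParser {

    // Turns a token such as "start=newyork&end=boston" into a key/value map.
    public static Map<String, String> parse() {
        Map<String, String> params = new HashMap<String, String>();
        String token = History.getToken();   // the fragment without the '#'
        if (token.length() == 0) {
            return params;
        }
        for (String pair : token.split("&")) {
            String[] keyValue = pair.split("=", 2);
            if (keyValue.length == 2) {
                params.put(keyValue[0], keyValue[1]);
            }
        }
        return params;
    }
}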
The basic idea is to store some state in the URL fragment (the part of the URL after the #) -- for example your-site.com/app#page-1
To listen for changes to the fragment, use GWT's History class. The fragment will change when the user goes back/forward, or refreshes the page.
So you could have your app do different things when the URL has #page-1 vs #page-2, etc.
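A rough sketch of that wiring with the History class (the page-1/page-2 tokens are only placeholders):

import com.google.gwt.event.logical.shared.ValueChangeEvent;
import com.google.gwt.event.logical.shared.ValueChangeHandler;
import com.google.gwt.user.client.History;

public class AppHistoryHandler implements ValueChangeHandler<String> {

    public void register() {
        History.addValueChangeHandler(this);
        // Re-fires the current fragment so a refreshed page shows the right view.
        History.fireCurrentHistoryState();
    }

    public void onValueChange(ValueChangeEvent<String> event) {
        String token = event.getValue();   // e.g. "page-1" or "page-2"
        if ("page-1".equals(token)) {
            // show the first view
        } else if ("page-2".equals(token)) {
            // show the second view
        }
    }
}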
A more generalized and scalable solution to this is something like gwt-platform's Place architecture (along with Presenters, which are also a good idea for large apps)
I want to crawl a site with Greasemonkey and wonder if there is a better way to temporarily store values than with GM_setValue.
What I want to do is crawl my contacts in a social network and extract the Twitter URLs from their profile pages.
My current plan is to open each profile in its own tab, so that it looks more like a normal browsing person (i.e. CSS, scripts and images will be loaded by the browser), then store the Twitter URL with GM_setValue. Once all profile pages have been crawled, create a page using the stored values.
I am not so happy with the storage option, though. Maybe there is a better way?
I have considered inserting the user profiles into the current page so that I could process them all with the same script instance, but I am not sure whether XMLHttpRequest looks indistinguishable from normal user-initiated requests.
I've had a similar project where I needed to get a whole lot of data (invoice lines) from a website and export it into an accounting database.
You could create a .aspx (or PHP etc) back end, which processes POST data and stores it in a database.
Any data you want from a single page can be stored in a form (hidden using style properties if you want), using field names or ids to identify the data. Then all you need to do is point the form action at the .aspx page and submit the form using JavaScript.
(Alternatively you could add a submit button to the page, allowing you to check the form values before submitting to the database).
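As a stand-in for that .aspx/PHP page, here is a rough Python 2 sketch of a back end that accepts the POSTed form fields and writes them to a local SQLite database (field, table, file and port names are all made up):

import BaseHTTPServer
import sqlite3
import urlparse

class CollectorHandler(BaseHTTPServer.BaseHTTPRequestHandler):

    def do_POST(self):
        # Read and parse the urlencoded form body submitted by the userscript.
        length = int(self.headers.getheader('content-length', 0))
        fields = urlparse.parse_qs(self.rfile.read(length))
        twitter_url = fields.get('twitter_url', [''])[0]

        db = sqlite3.connect('crawl.db')
        db.execute('CREATE TABLE IF NOT EXISTS profiles (twitter_url TEXT)')
        db.execute('INSERT INTO profiles VALUES (?)', (twitter_url,))
        db.commit()
        db.close()

        self.send_response(200)
        self.end_headers()

if __name__ == '__main__':
    BaseHTTPServer.HTTPServer(('localhost', 8080), CollectorHandler).serve_forever()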
I think you should first ask yourself why you want to use Greasemonkey for your particular problem. Greasemonkey was developed as a way to modify one's browsing experience -- not as a web spider. While you might be able to get Greasemonkey to do this using GM_setValue, I think you will find your solution to be kludgy and hard to develop. That, and it will require many manual steps (like opening all of those tabs, clearing the Greasemonkey variables between runs of your script, etc).
Does anything you are doing require the JavaScript on the page to be executed? If so, you may want to consider using Perl and WWW::Mechanize::Plugin::JavaScript. Otherwise, I would recommend that you do all of this in a simple Python script. You will want to take a look at the urllib2 module. For example, take a look at the following code (note that it uses cookielib to support cookies, which you will most likely need if your script requires you to be logged into a site):
import urllib2
import cookielib
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookielib.CookieJar()))
response = opener.open("http://twitter.com/someguy")
responseText = response.read()
Then you can do all of the processing you want using regular expressions.
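For example, continuing from the snippet above, pulling the Twitter links out of responseText could look like this (the pattern is a guess at how the profile markup links to Twitter, so adjust it to the real pages):

import re

# Matches links such as http://twitter.com/someguy anywhere in the fetched HTML.
twitter_urls = re.findall(r'https?://(?:www\.)?twitter\.com/\w+', responseText)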
Have you considered Google Gears? That would give you access to a local SQLite database which you can store large amounts of information in.
The reason for wanting Greasemonkey is that the page to be crawled does not really approve of robots. Greasemonkey seemed like the easiest way to make the crawler look legitimate.
Actually, running your crawler through the browser does not make it that much more legitimate. You are still breaking the site's terms of use! WWW::Mechanize, for example, is equally well suited to 'spoof' your User-Agent string, but that, and crawling itself, is illegal if the site does not allow spiders/crawlers!
The reason for wanting Greasemonkey is that the page to be crawled does not really approve of robots. Greasemonkey seemed like the easiest way to make the crawler look legitimate.
I think this is the hardest way imaginable to make a crawler look legitimate. Spoofing a web browser is trivially easy with some basic understanding of HTTP headers.
Also, some sites have heuristics that look for clients that behave like spiders, so simply making the requests look like they come from a browser doesn't mean they won't know what you are doing.