How can I create a website summary with Perl? - perl

When you share something on Facebook or Digg, it generates some summary of the page. How would I do this in Perl? What algorithms are there?
For example:
If I go to Facebook and tried to share this question as a link:
How can I create a website summary with Perl?
It retrieves "Facebook/Digg get website summary? - Stack Overflow" as the title (which is just the title of the page) and [... incomplete question?]

CPAN is your friend.
Some promising looking modules:
HTML::Summary
HTML::SummaryBasic
Lingua::EN::Summarize

Assuming you mean sharing a link...
Usually the summary is written by the user submitting the URL. If you have to write a summary automagically this can be achieved by:
Using the first 100 or so characters of the document body (in itself not easy)
Using metadata like the description or keywords (often empty or spammed)
Context-relevant summaries like recreating Google snippets (sorry, it's PHP, but simple)
Tags/keywords from the document using something like the Yahoo Keyword Extractor API or your own keyword density function
Your best bet is to ask the user!
Hope that helps somewhat :)
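The first two options above can be combined into a simple fallback chain. The following is a minimal sketch in Python rather than Perl, using only the standard library; the names `SummaryParser` and `summarize` and the 100-character default are assumptions made for illustration, not part of any module mentioned here. It prefers the meta description and falls back to the start of the visible body text.

```python
from html.parser import HTMLParser


class SummaryParser(HTMLParser):
    """Collects the page title, the meta description, and visible body text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.text_parts = []
        self._in_title = False
        self._in_body = False
        self._skip_depth = 0  # greater than zero while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            if (attributes.get("name") or "").lower() == "description":
                self.description = attributes.get("content") or ""
        elif tag == "body":
            self._in_body = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False
        elif tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_body and self._skip_depth == 0:
            self.text_parts.append(data)


def summarize(html, limit=100):
    """Return (title, summary); the summary is cut at a word boundary."""
    parser = SummaryParser()
    parser.feed(html)
    parser.close()
    # Prefer the meta description, otherwise use the visible body text.
    summary = " ".join(parser.description.split())
    if not summary:
        summary = " ".join(" ".join(parser.text_parts).split())
    if len(summary) > limit:
        summary = summary[:limit].rsplit(" ", 1)[0] + "..."
    return parser.title.strip(), summary
```

Real pages are messier than this handles (navigation text, cookie banners, missing `<body>` tags), which is why taking the first characters of the body is "in itself not easy".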

Basically you want to scrape the URL and find the "most significant paragraph" which might be the first <div> or <p> element after the first <h2> or <h1>, depending on the layout of the page.
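As a rough illustration of that heuristic, here is a sketch in Python (standard library only) that returns the first non-empty `<p>` following the first `<h1>` or `<h2>`. The names `LeadParagraphParser` and `lead_paragraph` are made up for this example, and it only looks at `<p>` elements, not `<div>`, so it would need adjusting for pages laid out differently.

```python
from html.parser import HTMLParser


class LeadParagraphParser(HTMLParser):
    """Captures the first non-empty <p> that follows the first <h1>/<h2>."""

    def __init__(self):
        super().__init__()
        self.seen_heading = False
        self.in_paragraph = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag in ("h1", "h2"):
            self.seen_heading = True
        elif tag == "p" and self.seen_heading:
            self.in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "p" and self.in_paragraph:
            self.in_paragraph = False
            if "".join(self.parts).strip():
                self.done = True   # keep this paragraph and stop collecting
            else:
                self.parts = []    # empty paragraph, try the next one

    def handle_data(self, data):
        if self.in_paragraph:
            self.parts.append(data)


def lead_paragraph(html):
    """Return the lead paragraph text, or an empty string if none is found."""
    parser = LeadParagraphParser()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())
```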

You could check and see if there is a meta description on the page, but that leaves you at the mercy of whoever wrote the meta description.

Related

confluence create a link of a page to another page name?

I am using Confluence via Emacs confluence-mode, and I would like to give my pages a meaningful name, but then also give them a short name that would make it faster to type when editing. Can I create a page that is just a link to another page with a different name?
For example, create a page called My meaningful name for this page and then create a link to that page called p1 that is simply a pointer to the other page. When opening p1, it would actually be opening the other page.
Essentially, I want to be able to open the page:
http://my-confluence.mydomain.com/display/MySpace/p1
And end up with:
http://my-confluence.mydomain.com/display/MySpace/My+meaningful+name+for+this+page
Any ideas?
Sounds like you need a redirect. On your p1 page, throw this inside an html macro:
<script type="text/javascript">
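// href is a property, not a function, so assign the target URL to it
window.location.href = "http://my-confluence.mydomain.com/display/MySpace/My+meaningful+name+for+this+page";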
</script>
That will automatically redirect to the correct page. Only downside is it might visibly show the p1 page for a fraction of a second. If it does, and that is unacceptable, I would refer you to this answer on Atlassian's forums:
https://answers.atlassian.com/questions/124121/how-to-properly-redirect-pages-in-confluence
It lists 2 plug-ins, one of which I use, which do it very seamlessly. They're not free though.
Can you describe your use-case a bit more? You can achieve this by using anchors (https://confluence.atlassian.com/display/DOC/Working+with+Anchors) if you're linking to the same page. If you need to link to a new page, this might help:
https://confluence.atlassian.com/display/DOC/Configuring+Shortcut+Links

How do I post a query string to a form field?

I have a website with a link that says "Click here to claim your prize". That link goes to a Salesforce catch-all web-to-lead form (multiple sites use it, and I append a site ID to the URL so the data goes to the right account).
When a user clicks that link and goes to the form page a string (in this case it's the promo code "my prize") needs to be passed to that form page and placed in the comments form field.
My questions are how is this done, are there any tutorials you could point me to, and is there a better method for accomplishing this?
What I'm trying to avoid is having the link say "Click here to win a free prize! Must enter "My Prize" in the comments" and having the user manually enter the promo code.
Thanks muchly in advance for your help!
I've used a handy jQuery script in the past called preset.js. The instructions are not that great, but it's fairly simple to implement, though you may have to pick through the source code.
This question may be better suited to Stack Overflow, but I found this blog post, which shows example code using PHP and cURL to post to the web-to-lead form:
http://www.paulwest.co.uk/article.php/salesforce-form-integration-with-php
This is another post about pushing leads to salesforce with php
http://sim.plified.com/2009/02/13/pushing-leads-to-salesforce-with-php/
And one more for good luck
http://www.seobywebmechanix.com/salesforce-php-form-processor-curl-tutorial
Hope these help
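The underlying idea is independent of the library used: put the promo code in the link's query string, then read it back on the form page and use it as the field's initial value. Below is a minimal sketch in Python, not the PHP or jQuery approaches linked above; the parameter and field name `comments`, the example URL, and both function names are assumptions for illustration, and the real Salesforce form will have its own field names.

```python
from html import escape
from urllib.parse import parse_qs, urlencode, urlsplit


def build_prize_link(form_url, promo_code):
    """Append the promo code to the form URL as a 'comments' parameter."""
    separator = "&" if urlsplit(form_url).query else "?"
    return form_url + separator + urlencode({"comments": promo_code})


def comments_field(request_url):
    """Render the comments field pre-filled from the query string."""
    params = parse_qs(urlsplit(request_url).query)
    value = params.get("comments", [""])[0]
    # Escape the value so a crafted link cannot inject markup into the page.
    return '<textarea name="comments">' + escape(value) + "</textarea>"


link = build_prize_link("https://example.com/lead-form?site_id=42", "my prize")
print(link)
print(comments_field(link))
```

The user then only has to submit the form; the promo code arrives in the comments field without being typed by hand.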

How to share a few specific elements of a page only

I have a page of products on my website with a share button on each. How can I tell Facebook to automatically select the title and image of the shared product rather than those of the whole page?
A link to my site
The site is written in PHP using Smarty-style templates (tpl files).
P.S. I don't really know PHP very well.
Thanks in advance :)
A server-side scripting language would be the easiest:
In PHP you could share the page URL plus an extra variable; if your page detects this variable, it can set product-specific metadata.
http://www.fancydressclothing.net/categories/49/1950s-Costumes.php?product=1
Then, in PHP, fetch the correct data for product 1 and set the metadata:
http://developers.facebook.com/docs/opengraph/
You should use a canonical URL for each product, which helps in two ways: the Facebook linter gets the correct og tags, and it helps your SEO rankings too.
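To make the per-product metadata concrete, here is a small sketch of the tags such a page might emit. It is written in Python rather than PHP purely for illustration; `open_graph_tags` and the sample product values are hypothetical, while `og:title`, `og:image`, and `og:url` are standard Open Graph properties.

```python
from html import escape


def open_graph_tags(title, image_url, page_url):
    """Build the Open Graph <meta> tags for one product page."""
    properties = [
        ("og:title", title),
        ("og:image", image_url),
        ("og:url", page_url),  # the canonical URL for this product
    ]
    return "\n".join(
        '<meta property="%s" content="%s">' % (name, escape(value, quote=True))
        for name, value in properties
    )


print(open_graph_tags(
    '1950s "Poodle" Skirt',
    "http://www.example.com/images/poodle-skirt.jpg",
    "http://www.example.com/categories/49/1950s-Costumes.php?product=1",
))
```

These tags belong in the page's `<head>`, and the values should come from whatever product the extra URL variable selects.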

From where does Google get the abstract for each of its site results, that it displays on its search result page?

I am working on a project in which I have to search for terms on a search engine and then cluster the results by their contextual sense, so I have to treat each result as a document. Unfortunately, the data shown with each result on the results page is too little for clustering. Hence, I want to know where search engines get the abstract they show for each result. If I could get that entire abstract, I could cluster the results by treating them as separate documents.
Where does Google get the abstract from?
For eg: If you search for "1000 Mile" on google, the second result shows the following abstract:
"The women's 1000 Mile Collection is based on classic designs and reflects Wolverine's long heritage of crafting quality footwear. Complementing these classics ..."
This abstract is not present in the Meta tags of the page.
Where does Google find this data?
Thanks
From Does Google use the Meta Description Tag for Description of Page?
Google will choose your search results snippets from the following places (not necessarily in this order):
The page's Meta Description tag
The page's Open Directory Project (ODP) Listing
Page content relevant to the search query
If you do not want Google to use the ODP listing's description then you can tell them not to do so with the following Meta tag:
<meta name="robots" content="NOODP">
If you want to encourage Google to use your Meta Description tag then make sure it is unique to each page. Also make sure it contains an accurate description of the page's content.
In the absence of an ODP description and Meta Description tag, Google will use a portion of the page's text as the description. This text will contain the closest matches to the search query. I have not seen any official limit to how long this can be, but a couple of sentences seems about right.
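Google's actual snippet selection is not public, but the general idea of picking the portion of page text that best matches the query can be sketched as follows. This Python example is only a toy under stated assumptions: the function name `best_snippet`, the sentence splitting rule, and the scoring by counting query-term occurrences are all made up for illustration.

```python
import re


def best_snippet(text, query, max_sentences=2):
    """Return the run of consecutive sentences with the most query-term hits."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    terms = {t.lower() for t in re.findall(r"\w+", query)}

    def score(window):
        words = re.findall(r"\w+", " ".join(window))
        return sum(1 for word in words if word.lower() in terms)

    # Every window of up to max_sentences consecutive sentences is a candidate.
    windows = [sentences[i:i + max_sentences] for i in range(len(sentences))]
    if not windows:
        return ""
    # max() keeps the earliest window when scores tie.
    return " ".join(max(windows, key=score))


page_text = (
    "We sell boots. The 1000 Mile collection is handmade. "
    "Each 1000 Mile boot is built to last. Contact us today."
)
print(best_snippet(page_text, "1000 mile"))
```

For clustering purposes, fetching the full page text yourself and extracting a longer window this way may give more material than the snippet shown on the results page.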
On a related note, if you don't want a snippet to be shown with a particular page you can use the following Meta tag to prevent one from being shown:
<meta name="robots" content="nosnippet">
See this blog post for Google's tips on using the meta description tag.
According to this site, "The meta description should typically be at most 145 to 150 characters in length as these are the maximum number of characters typically displayed at Yahoo! and Google, respectively."
That site is Flash-based, and Google can index Flash content, so given that the snippet isn't in the HTML source of the page as you point out, nor is it in the cached version of the page, I'm guessing that it's somewhere in the Flash movie.
It's kind of arbitrary that the snippet mentions 'The women's 1000 Mile Collection' while the site link itself is to the parent category of 1000 mile, not just women's, so I'm guessing here that gathering snippet-friendly metadata from a Flash site is an imprecise science. That's my best guess.
In this Google Webmaster blog post, they explain how they use external text or HTML files loaded into the Flash movie, and in one of the comments Jonathan Simon says (sorry):
"We try our best to crawl Flash content but the results can sometimes be less than ideal. You are only seeing a title in the search results for your site because that's the only bit of HTML text that you have outside of your Flash content. You could add a Meta description element to offer more information in HTML. You could also add some other text that's not a part of your Flash content. Just doing this should improve the snippet you see associated with your site in the search results."

1. I fill out a form & click submit. 2. I get the results page. Goal: Get the same results without filling out the form again

This is my first time posting - I greatly appreciate any and all guidance on this subject.
Background: I am building a real estate web site. I would like to use the free IDX data provided by my local MLS board. The MLS board does not give me the option of displaying a predefined search and only provides me with a link to the search form. After filling out the search form, I am able to view the results.
Goal: I would like to bypass this step and frame the results page into a GoDaddy website I am building, which supports HTML.
Here is a link to the search page:
http://fgcmls.rapmls.com/scripts/mgrqispi.dll?APPNAME=Fortmyers&PRGNAME=MLSLogin&ARGUMENT=vBSJvLQtMcbg7F0O0KnXDiggv%2F12B0S6Ss9wv4510QA%3D&KeyRid=1
I am trying to only show the listings that appear in my neighborhood. Options include:
1. Property Type - Residential
2. GEO Area - FM11
3. Developments: Fiddlesticks Country Club
Once these criteria are entered, I have the page needed to make this project work.
Thank all of you for taking the time to read this and for the time you spend helping me out.
Best regards,
Chris
Without looking at the page itself, it's probably doing a "POST" operation to give the form to the website. You should be able to use javascript (maybe jquery or some other ajax framework) to do this for you in the frame and have it display the results.
-Adam
So long as this is a POST form and they aren't doing a lot of strict referrer checking, the following should work:
Replicate the form on your own site.
Make a few minor changes to automate a few of the fields to better serve your geographic area/company.
Ensure everything is a full path and not relative to the server handling the query.
You will probably end up changing a lot of the text/select fields to hidden fields with pre-set values to keep it simple for end-users. The server handling the request won't know the kind of field it came from, just the value and name.
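As a sketch of that approach, the following Python snippet generates a replicated form whose search criteria are fixed as hidden fields. The field names `PropertyType`, `GeoArea`, and `Development` are placeholders, not the real ones; the actual names have to be read from the MLS page's HTML. The function name `preset_search_form` is likewise made up for this example.

```python
from html import escape


def preset_search_form(action_url, preset_fields):
    """Render a POST form that submits fixed values via hidden inputs."""
    lines = ['<form action="%s" method="POST">' % escape(action_url, quote=True)]
    for name, value in preset_fields.items():
        lines.append(
            '  <input type="hidden" name="%s" value="%s">'
            % (escape(name, quote=True), escape(value, quote=True))
        )
    lines.append('  <input type="submit" value="Show listings">')
    lines.append("</form>")
    return "\n".join(lines)


# Hypothetical field names; the action URL is absolute, not server-relative.
print(preset_search_form(
    "http://fgcmls.rapmls.com/scripts/mgrqispi.dll",
    {
        "PropertyType": "Residential",
        "GeoArea": "FM11",
        "Development": "Fiddlesticks Country Club",
    },
))
```

The visitor then sees a single button instead of the full search form, and the receiving server gets the same names and values it would from the original fields.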
I took a look at the page HTML; the form is defined like this:
<FORM action="/scripts/mgrqispi.dll" method="POST" name="InputForm" />
You may be able to create your own form defined like this:
<FORM action="http://fgcmls.rapmls.com/scripts/mgrqispi.dll" method="POST" name="InputForm">
</FORM>
You will have to go through the HTML on the page you provided to get the appropriate IDs and names of the form elements you are interested in. It's possible their processing page checks that it is their own form submitting to it, in which case this wouldn't work.
good luck.