Why is Google truncating my URLs? (using Zend framework) - zend-framework

I run a site called anecdotage.com (bunch of funny stories about famous ppl).
We just moved to a new platform (socialengine 4, Zend framework).
Our URLs are new & google has been indexing them, which is great!
But I just noticed that it's ignoring the most iportant part of the URL... not so great!
It still works, but I'm worried about our page rank.
EG:
System URL:
http://www.anecdotage.com/articles/7994/allman-brothers-foot-shootin-morons
Google URL:
http://www.anecdotage.com/articles/7994/
[You can see here:
https://www.google.com.mx/#hl=en&sclient=psy-ab&q=site:anecdotage.com%2Farticles&oq=site:anecdotage.com%2Farticles
We submitted a txt sitemap, with complete links. Why is Google doing this? Does it matter for SEO?
Also:
When I follow a link from google, it does this:
http://www.anecdotage.com/articles/7994/#.UHOF_JhX3ZI
Can anyone explain what's going on?

Both urls have identical content.
To solve this you could had canonical link in the header pointing for the desired url you wish goole to index:
<link rel="canonical" href="http://www.anecdotage.com/articles/7994/allman-brothers-foot-shootin-morons" />
More details at google blog

Related

English Literature site is not indexing new pages in google

This is my site link:
www.englishact.com
This is the sitemap current position:
Google is showing no error in sitemap or any other pages. But indexed pages are 0 for about 3 months. I also have uploaded new sitemaps which are acting same way with no index.
NB:
I am using 1and1 paid hosting package. Also, google has accepted adsence for this site. Now what can I do? Any suggestions?
Your website is index on Google, i just searched for site: www.englishact.com and got many results.
Check if the links in your XML sitemap are valid or redirecting to another URL.
Also you have to solve the duplication in the URLs, you can access your website with WWW and without it, also you have two URLs for the homepage http://englishact.com/ and http://www.englishact.com/index.php
After fixing these errors your website should be healthy and Google will understand the structure of it.

AMP errors in web master tool

I have implemented AMP successfully for my webpages and google started indexing it, which I came to know via WebMaster tool. I am facing some issues which is present and disappears in short span of time.
Issue logged are:
User authored JavaScript found on page
The pages doesn't contain any script tags except schema.
This error is showing for few pages from 120 pages instead of following same
template. Below is the image link:
Have some more query:
I have observe different amp urls getting redirected to its original page when the same amp url is being used in Web Browser.
Is Google taking care of it or its on us to do the redirection?
I am planning to implement the sign in and share buttons on my web pages which will be using javascript. But if I do so, I do get validation error. So what is the right approach.
Can anyone please help me on this?
Please ensure that all script tags are of type application/ld+json. There should be no executable code in these script tags.
Redirection is something that you must be doing on your end. Google doesn't do any sort of redirection from AMP to non-amp pages if the URL is hit directly. In fact that URL schema that Google uses in their carousel is entirely their own, and just includes the path to your page inside it. E.g. https://cdn.ampproject.org/v/www.yoursitehere.com/path/to/article.html
Social sharing using Javascript inserted in the page is not allowed, as no Javascript is allowed. If you want to use social sharing, use a non-javascript implemention, or try out the amp-social-share
thanks for the response. As per the query which I asked
Please ensure that all script tags are of type application/ld+json. There should be no executable code in these script tags - I am not using any Script as of now except amp only
Redirection is something that you must be doing on your end. Google doesn't do any sort of redirection from AMP to non-amp pages if the URL is hit directly. In fact that URL schema that Google uses in their carousel is entirely their own, and just includes the path to your page inside it. E.g. https://cdn.ampproject.org/v/www.yoursitehere.com/path/to/article.html -
Understood
Social sharing using Javascript inserted in the page is not allowed, as no Javascript is allowed. If you want to use social sharing, use a non-javascript implementation, or try out the amp-social-share - Implemented Social Share and its working fine
Can we implement AMP for eCommerce sites where a lot of JavaScript, forms, plugins can be included? As of my knowledge AMP wants to keep it simple and thus restrict as many JavaScript, form tag is not valid only. So is there any chance we can implement AMP on eCommerce sites.

Ajax Base Googlebot Crawl

This is the website which we released last week(One page website) http://www.itslayer.com/
We are getting problem to test this with Google Webmaster tool. We developed the website as per the doc - https://developers.google.com/webmasters/ajax-crawling/docs/getting-started
Please let us know how can we test Ajax base website with googlebot and whether our implementation is correct.
Thanks in advance for suggestion. Please suggest.
Just try this : http://www.itslayer.com/?_escaped_fragment_=md5 and see that you don't have content in it.
The "HTML snapshots" must be what real user see after loading the page with JS.
In consequence, the URL with ?_escaped_fragment_ must return the whole HTML page. Not only <title> & <meta>.
Read further on Google doc : How do I create an HTML snapshot?

Site not valid - but it is

So, I'm building a website called "dagbok.nu", which is swedish for "diary now" :)
Anyway, when creating the Facebook application, it claims that the site URL is invalid as well as the app domain. For site url, I used "http://dagbok.nu" and for site domain, I used "dagbok.nu". Please don't reply (as I've seen others do on similar issues) that I should type the site url with the scheme and the domain without - that's exactly what I'm doing.
Right, so according to another question here, one could trouble shoot this functionality using FB's own URL scraper, so I did just that:
http://developers.facebook.com/tools/debug/og/object?q=http%3A%2F%2Fdagbok.nu
And the reply: Error Parsing URL: Error parsing input URL, no data was scraped
Right, so now I can assume that the reason for it being considered invalid is because of FB not being able to scrape the URL. But why?
According to this question, one of the reasons seems to be that FB has deemed the URL insecure or "spammy". I've acquired this domain from a previous owner so this wasn't all that impossible. But when doing the same thing as Matthew in that post - i.e. trying to post in my timeline using the domain "http://dagbok.nu", I didn't get any information. The status box expanded as if to include a thumbnail and information about the link, but it only contained a "(No title)" text and nothing more.
So now I don't know what to do. I've tried to check the DIG and NS records from multiple servers around the web, and everyone seems to resolve it correctly, and I've had friends double check the URL from the states as well. I can't understand what's wrong and I have no idea how to ask someone at FB how to resolve this. Does anyone here have a good advice for this? Thanks in advance! :)
EDIT
When changing the domain to another domain that points to the exact same web server and document_root, it works! So this is definitely a problem with the domain "dagbok.nu" and not with the code on that page.
EDIT
When using the debug function above - I see no activity in the server log what so ever. Facebook doesn't even contact the server. When using the alternate url - the one from the last edit, it pops up in the logs as it should.
EDIT
I filed a bug report with Facebook, And their first response was that they were going to follow up. Now, a month later, I got an email that said "We are prioritizing bugs based on impact to the developer community. As this bug report has not received much attention from other developers, we are closing it so as to better focus on the top issues", and then they told me to go here to stackoverflow to try to solve my issue - but the issue is WITH THEM, and of course no one else have reported that my site doesn't work, it affects only me, and I haven't opened it yet due to this bug!
EDIT
I wanted to file a new bug report, but I can't even that now, since they are blocking bug reports with this URL as well!
I had to edit the URL - here is the new bug report
When Facebook tries to scrap your site for information, they send a call to your server with specific user agent called "facebookexternalhit"...
Facebook needs to scrape your page to know how to display it around
the site.
Facebook scrapes your page every 24 hours to ensure the properties are
up to date. The page is also scraped when an admin for the Open Graph
page clicks the Like button and when the URL is entered into the
Facebook URL Linter. Facebook observes cache headers on your URLs -
it will look at "Expires" and "Cache-Control" in order of preference.
However, even if you specify a longer time, Facebook will scrape your
page every 24 hours.
The user agent of the scraper is: "facebookexternalhit/1.1(+http://www.facebook.com/externalhit_uatext.php)"
Make sure it is not blocked by your server firewall
Look in your server log if it even tried to access your site
If you think this is a firewall issue look at this link
Your problem appears to be with your character encoding string. Your Apache server is currently sending the unsupported string latin1. You've defined your meta:content-type as iso-8859-1. See the w3c validator
From what I've seen, the Facebook parser will stop immediately if it encounters either an unrecognized character encoding string or a mismatch in character encoding strings between your header and meta tags.
The problem could be originating from either your httpd.conf or php.ini files. Change these to match your meta and restart Apache. Since the problem seems to be domain-specific, I'd check httpd.conf first.
Could your domain be blacklisted? Could you try messaging your url to someone, and see if Facebook gives you a "This message contains blocked content..." error?
For example:
If you don't provide certain minimum Facebook markup on your page, it will respond with "Error Parsing URL: Error parsing input URL, no data was scraped." I only looked at the homepage, but it appears that dagbok.nu contains no Facebook markup. I'm not sure what things must be present at minimum, but in my implementation, I assume the fb:app_id meta tag and the JavaScript SDK script must be there. You may want to take a look at http://developers.facebook.com/docs/guides/web/#plugins , particularly the Authentication section.
I discovered your question because I had this same error today for an unknown reason. I found that it was caused because the content of my og:image meta tag used an incorrect URL to the image I was trying to use. So as you add Facebook markup to your page, make sure your values are correct or you may continue to receive this message.
This doesn't seem to be a Facebook problem if you take a look at what I've discovered.
The results when testing it with W3C Online Validation Tool are 1 of 2 results.
Tested using: dagbok.nu but note http://dagbok.nu has no difference in test results. Remove the last forward slash in between tests.
Test: 1
Results: 72 Errors 0 Warning
Note: Shown here is a fragment of the source Frameset DOCTYPE webpage.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<NOSCRIPT><IMG SRC="http://svs.bystorm.se/rv?java=off"></NOSCRIPT><SCRIPT SRC="http://svs.bystorm.se/rvj"></SCRIPT>
<HTML STYLE="height:100%;">
<HEAD>
<META HTTP-EQUIV="content-type" CONTENT="text/html;charset=iso-8859-1">
Test: 2
Results: 4 Errors 1 Warning
Note: Shown here is a fragment of the source Transitional DOCTYPE webpage.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html >
<head>
<title>Dagbok: Framsida</title>
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
<meta name="author" content="Jonas Eklundh Communication (http://jonas.eklundh.com)">
<meta name="author-email" content="jonas#eklundh.com">
<meta name="copyright" content="Jonas Eklundh Communication #2012">
<meta name="keywords" content="Atlas,Innehållssystem,Jonas Eklundh">
<meta name="description" content="">
<meta name="creation-time" content="0,079s">
<meta name="kort" content="DGB">
Repeated tests loop these results when done a couple seconds apart indicating a page-redirect is occurring.
Security warnings are seen in Firefox and Chrome when visiting your site using these secure URL's:
https://dagbok.nu
https://www.dagbok.nu
The browser indicates the site should not be trusted because it's impersonating another site using invalid security certificate from *.loopiasecure.com
Recommendation: Check your .htaccess file, CMS Settings, page redirection, and security settings. Use the above source webpages to realize those file-locations / file-names that are being served to discover what's set incorrectly.
Once that's done, I think Facebook will be happy to then debug your webpage and provide additional recommendations.
Had the same problem and I discovered it was an incorrect IPv6 address in the AAAA records for my domain. The IPv4 record was correct, so the site worked in a browser but FB obviously check the IPv6 records!
This issue may also happen when Cloudflare is used. This is because Cloudflare protects the page from Facebook, which is then unable to collect the data, which in turn makes Facebook think the page is invalid.
My fix was:
Turn off Cloudflare for the page.
Scrape the page using Facebook's Dev Tools: https://developers.facebook.com/tools/debug/og/object
Click and let run the "Fetch new scrape information" button.
Re-enable cloudflare protection for the page.
You should then be able to continue to add the page where you needed.

Facebook Post Link Image

When someone posts a link on facebook, a script usually scans that link for any images, and displays a quick thumbnail next to the post. For certain URLs though (including mine), FB doesn't seem to pick up anything, despite their being a number of images on that page.
I read up that FB prefers the "image_src" rel tag for the image the user wishes to specify, but this does not generate that thumbnail either for my site.
My url goes directly to the DNS, and is not forwarded, so I don't imagine that could be the problem either.
Does anyone have an idea as to why FB can't generate any thumbnails from my site?
The easiest way is just a link tag:
<link rel="image_src" href="http://stackoverflow.com/images/logo.gif" />
But there are some other things you can add to your site to make it more Social media friendly:
Open Graph Tags
Open Graph tags are tags that you add to the <head> of your website to describe the entity your page represents, whether it is a band, restaurant, blog, or something else.
An Open Graph tag looks like this:
<meta property="og:tag name" content="tag value"/>
If you use Open Graph tags, the following six are required:
og:title - The title of the entity.
og:type - The type of entity. You must select a type from the list of Open Graph types.
og:image - The URL to an image that represents the entity. Images must be at least 50 pixels by 50 pixels. Square images work best, but you are allowed to use images up to three times as wide as they are tall.
og:url - The canonical, permanent URL of the page representing the entity. When you use Open Graph tags, the Like button posts a link to the og:url instead of the URL in the Like button code.
og:site_name - A human-readable name for your site, e.g., "IMDb".
fb:admins or fb:app_id - A comma-separated list of either the Facebook IDs of page administrators or a Facebook Platform application ID. At a minimum, include only your own Facebook ID.
More information on Open Graph tags and details on Administering your page can be found on the Open Graph protocol documentation.
http://developers.facebook.com/docs/reference/plugins/like
I know this question is old, but I recently dealt with the exact same problem and went round and round on it for a couple weeks. Multiple searches on Google turned up a lot of useful information, but most of it was focused on Open Graph tags, which I wasn't interested in using. Turns out my site had multiple issues, but here are some of the basics.
As EightyEight said, make sure your HTML is valid - and the same goes for your javascript and server-side code (PHP, ASP, etc.). I had a small PHP error in a piece of code that was executing as a separate call to the server from the main page. Due to a number of bizarre coincidences, that code was generating a 500 error - but ONLY for IE6 and strict parsing engines like the W3C validator and the Facebook page crawler. The problem didn't appear in modern browsers (Chrome 4, FF 3.5, IE 8, etc) so I didn't see it right away, but older/stricter clients were showing the 500 every time and that was the main reason FB wasn't crawling our page (when everything else seemed to be correct).
Regarding Randy's response, he's correct that Facebook will keep an old cached copy of your page long after you've updated it. FB claims it's only held for 24 hours, but I experienced much longer times than that. FORTUNATELY, FB has released their "URL Linter" tool that will show you a preview of how your page will appear when being shared on FB, and it will force FB to instantly update its cache of your page. This was a lifesaving tool. You can find it at http://developers.facebook.com/tools/lint/
Regarding the URL Linter tool, be aware that each variation of a URL is cached separately on Facebook, so "www.example.com" is not the same as "example.com". Also, unique capitalization is stored as well, so "ExampleOne.com" is not the same as "exampleone.com". (This led to a lot of confusion between my client and myself when it appeared to me that the cache had been updated just fine and the client claimed they weren't seeing the updates. Turns out I was looking at exampleone.com and had used Linter to update the cache, but they were looking at exampleOne.com which I hadn't submitted to Linter. As a result, I ended up submitting quite a few variations of the URL to Linter just to cover the bases.)
WyrdNEXUS's advice to use the image_src link tag is spot-on. This allows you to be sure that FB is scraping the best possible image for your page. There are some varying guidelines out there about what specs the image file should have, but I've successfully used a 128px square image and have seen a 130x97 image make it through as well. Here is Facebook's official documentation from http://developers.facebook.com/docs/reference/plugins/like/:
Images must be at least 50 pixels by 50 pixels. Square images work best, but you are allowed to use images up to three times as wide as they are tall.
Obviously, FB will resize a large image for you, but you'll almost always get better results if you resize it yourself beforehand.
Regarding Mike Cooper's link to the eHow article, avoid using step #1 in that article. It was valid advice when the article was written and when Mike posted the link, but it's now better to use the URL Linter tool for previewing how your page will appear when being shared. By using Linter, you won't cause FB to cache a (potentially) bad copy of the page before you get a chance to tweak it.
Use the facebook lintter available here. http://developers.facebook.com/tools/lint/
This will check your link and re fetch any images. this also clears any old cache.
Or try this - https://developers.facebook.com/tools/debug
To change Title, Description and Image, we need to add some meta tags under head tag.
STEP 1 :
Add meta tags under head tag
<html>
<head>
<meta property="og:url" content="http://www.test.com/" />
<meta property="og:image" content="http://www.test.com/img/fb-logo.png" />
<meta property="og:title" content="Prepaid Phone Cards, low rates for International calls with Lucky Prepay" />
<meta property="og:description" content="Cheap prepaid Phone Cards. Low rates for international calls anywhere in the world." />
NEXT STEP :
Click on below link
https://developers.facebook.com/tools/debug
Add your URL in text box (e.g http://www.test.com/) where you mentioned the tags. Click on DEBUG button.
Its done.
You can verify here https://www.facebook.com/sharer/sharer.php?u=http://www.test.com/
In above url, u = your website link
ENJOY !!!!
try this: http://www.ehow.com/how_4938148_thumbnail-show-up-facebook-share.html
Is the site's HTML valid? Run it through w3c validation service.
Actually, if you've already tried linking that page on Facebook BEFORE adding the "image_src" link, Facebook will keep using the old cached copy and not even see your changes. Try modifying the URL by removing or adding the 'www', or duplicate your page to test it.
I've noticed that Facebook does not take thumbnails from websites if they start with https, is that maybe your case?
had the same problem and figured out that my head closing tag was in the wrong place
Old question but recently I seemed to be running into same issue with thumbnail images from my link not showing in status updates on Facebook. I post for many clients and this is relatively new.
FB doesn't seem to like long URLs anymore — if you use a URL shortener such as goo.gl or bitly.com, the thumbnail from your link/post will appear in your FB update.
Try using something like this:
<link rel="image_src" href="http://yoursite.com/graphics/yourimage.jpg" /link>`
Seems to work just fine on Firefox as long as you use a full path to your image.
Trouble is it get vertically offset downward for some reason. Image is 200 x 200 as recommended somewhere I read.
If you used any plugin for seo then Check 1st your seo plugin settings.Then find out Noindex setting if Enable Media for Noindex then disable it.