Need to prevent a website link from being displayed in Bing search results

I have a web application that shows up in Bing search, and I do not want the application's link to show up there. In the application root directory we have a robots.txt file that contains the following:
User-agent: *
Disallow: /
User-agent: bingbot
Disallow: /
However, the link still shows up in Bing search. I also tried adding this tag in the head section of specific web pages:
<meta name="robots" content="noindex,nofollow">
However, the link is still displayed in Bing search. We waited 2-3 weeks and more, but the links are still appearing.
We contacted Bing Webmaster Tools support, and they suggested that for a URL to be removed from their index it has to be deleted from our site so that the URL returns a 404 (Not Found) or 410 (Gone) HTTP status. They also mentioned that in order for Bing to detect that the page has in fact been removed from the site and is now returning a 404 or 410 HTTP status code, Bingbot needs to be able to access the URL, so we should not block the URL from being re-crawled through robots.txt.
Now the problem here is that we cannot delete our site or have it return a 404 error page, since it is still used by our client. Google search does not show the link, but Bing does. Is there any other way by which we can make the link(s) not appear in Bing search?

You can do that from the Bing Webmaster Tools console if you need quick removal: https://www.bing.com/webmaster/help/block-urls-from-bing-264e560a
Have a look at the last paragraph, on blocking URLs immediately.
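One detail worth noting: the robots.txt rules and the noindex meta tag work against each other here. While bingbot is disallowed in robots.txt it will not fetch the pages at all, so it never sees the noindex tag (or a 404/410), which is exactly why Bing support asked you not to block re-crawling. If the console removal is not enough on its own, a possible alternative is to drop the Disallow rules for bingbot and instead answer every request with a noindex directive, for example via the X-Robots-Tag response header. A minimal sketch in Python/Flask, purely as an illustration (the framework and route are my assumptions, not from the question):

from flask import Flask

app = Flask(__name__)

@app.after_request
def add_noindex_header(response):
    # Same effect as <meta name="robots" content="noindex,nofollow">,
    # but sent as a response header, so it also covers non-HTML URLs.
    response.headers["X-Robots-Tag"] = "noindex, nofollow"
    return response

@app.route("/")
def index():
    # The site stays fully usable for the client; only indexing is discouraged.
    return "Page content"

With this in place the pages keep returning 200 for your client, while any crawler that can still fetch them is told not to index them.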

Related

"Circular Redirect" errors in Facebook open graph debugger after API was upgraded to 2.6

On April 12th, Facebook upgraded the FB App we use to share new articles to our page to API version 2.6. Since then, stories get posted, but the image is usually not added to the story.
When I check with the Open Graph debugger, I see strange errors like:
Circular Redirect
We could not resolve the canonical URL because the redirect path contained a cycle.
With redirect paths of:
Redirect Path
Input URL → https://www.bleepingcomputer.com/news/government/34-tech-firms-sign-accord-not-to-assist-government-hacking-operations/
301 HTTP Redirect → https://www.bleepingcomputer.com/news/government/34-tech-firms-sign-accord-not-to-assist-government-hacking-operations/
Even stranger, sometimes the redirect paths show urls like below, which are not from my site:
Input URL → https://www.bleepingcomputer.com/news/security/crooks-hijack-router-dns-settings-to-redirect-users-to-android-malware/
301 HTTP Redirect → https://www.bleepingcomputer.com/news/security/crooks-hijack-router-dns-settings-to-redirect-users-to-android-malware/?utm_content=70198165&utm_medium=social&utm_source=facebook
og:url Meta Tag → https://www.bleepingcomputer.com/news/security/crooks-hijack-router-dns-settings-to-redirect-users-to-android-malware/
Once I click on the scrape button again, it gets rid of the error, but the images still do not show in the post on my FB page.
It is almost as if the scraper is targeting URLs that are not mine, even though I am sending them correctly and I have the same URLs in my og:url and canonical tags.
Any ideas?
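One way to narrow this down is to fetch the article URL yourself while pretending to be the scraper and inspect the redirect chain the server actually returns. A rough sketch with Python's requests library (the facebookexternalhit user-agent string is my assumption about what the scraper sends):

import requests

url = "https://www.bleepingcomputer.com/news/government/34-tech-firms-sign-accord-not-to-assist-government-hacking-operations/"
headers = {"User-Agent": "facebookexternalhit/1.1"}

# Follow redirects the way a crawler would and print every hop,
# to see whether the server ever sends the URL back to itself.
response = requests.get(url, headers=headers, allow_redirects=True, timeout=10)
for hop in response.history:
    print(hop.status_code, hop.url, "->", hop.headers.get("Location"))
print("final:", response.status_code, response.url)

If the chain printed here differs from what the debugger shows, the redirect is probably being added only for certain user agents or query strings (note the utm_* parameters in the second example above).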

Facebook takes wrong canonical URL

Scenario of the problem:
We enforced HTTPS on a website. Any URL with HTTP now redirects (301 permanent redirect) to an appropriate HTTPS URL.
To avoid the Facebook like/share buttons (which are placed on many pages of the website) losing their previous like/share counts, we made the buttons "link" to the old HTTP URLs via the "data-href" property.
Additionally, we placed the "og:url" meta tag on some pages, pointing to the old HTTP URLs.
I then scraped those pages with the Facebook debugger tool https://developers.facebook.com/tools/debug to make sure Facebook gets the fresh data. According to the scraped data, the canonical URLs were indeed pointing to the old HTTP URLs, just as they should be given our actions listed above. This was also reflected in the like/share buttons on our pages keeping the old numbers.
A few days later I discovered that some pages lose the old numbers of likes/shares. Checking those pages in the Facebook debugger shows that Facebook now takes the HTTPS URLs as canonical. We did not make any changes on our pages, and the "og:url" tag is still pointing to the HTTP URLs. Yet Facebook wrongly takes the HTTPS URLs as canonical. If I scrape the information again in the debugger, it again becomes normal, showing HTTP as canonical and restoring the old number of likes/shares. But obviously that is not a solution to the problem, because we cannot constantly monitor all our pages and scrape them again and again.
Any ideas what may be causing the problem?
Facebook follows HTTP redirects as well. You need to make your old HTTP URLs available to the scraper, without redirecting it to the HTTPS version. (The scraper can be recognized by its User-Agent; see the social plugins FAQ.)
The old HTTP URLs need to be available to the scraper, and not redirected to HTTPS, as the FAQ also mentions:
“This also requires that the old URL still renders a document with Open Graph tags and returns a HTTP 200 response, at least when loaded by Facebook's crawler. If you want other clients to redirect when they visit the URL, you must send your 301 HTTP response to all non-Facebook crawler clients. The old URL should contain its own og:url tag that points to itself.”
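A rough illustration of that advice, as a sketch only (Flask and the user-agent substrings are my assumptions, not part of the answer): keep the 301 to HTTPS for normal visitors, but let requests whose User-Agent looks like Facebook's scraper receive the old HTTP URL with a 200 so the og:url tags stay readable.

from flask import Flask, request, redirect

app = Flask(__name__)

# Substrings commonly seen in Facebook's crawler User-Agent (assumption).
FACEBOOK_UA_HINTS = ("facebookexternalhit", "facebot")

@app.before_request
def https_redirect_except_facebook_scraper():
    ua = request.headers.get("User-Agent", "").lower()
    if any(hint in ua for hint in FACEBOOK_UA_HINTS):
        # The scraper fetches the HTTP URL and gets a 200 with the OG tags.
        return None
    if not request.is_secure:
        # Everyone else still gets the permanent redirect to HTTPS.
        return redirect(request.url.replace("http://", "https://", 1), code=301)

The same user-agent check can of course be done in whatever server or framework actually runs the site (nginx, Apache rewrite rules, etc.).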

Will Google recrawl a 301 that redirected to a 404 error?

I made an error when bulk 301 redirecting from an old domain to a new domain with the same URL structure.
Googlebot followed the 301 redirect for each page on the old site to my new site, which gave a 404 error. I tested it in a browser and it worked for a user, but somehow it did not work for Googlebot, and I detected the problem too late.
I have now fixed the error, and the pages can (hopefully) be accessed by Googlebot at their new URLs.
Question: will Google recrawl 301 redirects which led to a 404?
They'll crawl it for a little while but eventually the 404 status will tell them the page is gone and they will stop crawling it and remove it from their index. If you fixed the error before they stopped crawling the original URL then they will follow the redirect and associate the new URL with the old URL.
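If you want to confirm the fix before waiting for a recrawl, a small script can walk the old URLs the way a crawler would and report where each redirect ends up. A sketch only; the URL list and user-agent string are placeholders, not from the question:

import requests

old_urls = [
    "http://old-domain.example/some-page/",
    "http://old-domain.example/another-page/",
]

for url in old_urls:
    r = requests.get(url, headers={"User-Agent": "Googlebot"},
                     allow_redirects=True, timeout=10)
    # A healthy redirect ends on the new domain with a 200, not a 404.
    print(url, "->", r.url, r.status_code)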

Comments not crawlable by search engines?

I was wondering if search engine spiders can see the comments. When I open the source of the page, the comments do not show up (same as with Disqus), so I'm assuming that when search engines crawl the page they won't see the comments either. Is this assumption correct? If so, is there a way to change this?
Found the solution:
http://developers.facebook.com/docs/reference/plugins/comments/
How can I get an SEO boost from the comments left on my site?
The Facebook comments box is rendered in an iframe on your page, and
most search engines will not crawl content within an iframe. However,
you can access all the comments left on your site via the graph API as
described above. Simply grab the comments from the API and render them
in the body of your page behind the comments box. We recommend you
cache the results, as pulling the comments from the graph API on each
page load could slow down the rendering time of the page.
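As a very rough sketch of what that advice could look like server-side (Python; fetch_comments_from_graph_api is a hypothetical stand-in for whatever Graph API call the documentation describes, and the cache is just an in-memory dict):

import html
import time

_cache = {}      # page URL -> (timestamp, list of comment texts)
CACHE_TTL = 600  # seconds; re-fetch from the Graph API at most every 10 minutes

def fetch_comments_from_graph_api(page_url):
    # Hypothetical helper: call the Graph API as described in the Facebook
    # docs and return the comments left on page_url.
    raise NotImplementedError

def get_comments(page_url):
    cached = _cache.get(page_url)
    if cached and time.time() - cached[0] < CACHE_TTL:
        return cached[1]
    comments = fetch_comments_from_graph_api(page_url)
    _cache[page_url] = (time.time(), comments)
    return comments

def render_comments_html(page_url):
    # Output the comments as plain markup behind the comments box,
    # so crawlers see them even though they ignore the iframe.
    items = "".join("<li>{}</li>".format(html.escape(c)) for c in get_comments(page_url))
    return '<ul class="fb-comments-fallback">{}</ul>'.format(items)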
A crawl engine can only see what is actually sent to it, so these comments have to be output in the page in order to get crawled and saved into the search engine's database (or whatever it uses to collect data about websites). You can check the headers of the incoming request to see whether it belongs to a crawl engine; the relevant header is the user agent (for humans, that is the browser). Here you can find a way to detect crawlers using PHP; after detecting one, you force the comments to be shown so they get crawled. Here is also a good resource from Google itself on how to deal with crawlers.
Now, if you're talking about comments on Facebook itself, it's impossible to get them indexed by a crawler or search engine: when a crawler attempts to visit one of the Facebook pages, it won't be able to see users' data because of the login page. If you are talking about the Facebook comments plugin, you can do what I suggested above; here is also an article about crawling Facebook comments.
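For the user-agent check mentioned above, a minimal sketch (in Python rather than PHP; the list of bot substrings is my own and far from exhaustive):

KNOWN_CRAWLER_HINTS = ("googlebot", "bingbot", "slurp", "duckduckbot")

def is_crawler(user_agent):
    # Decide from the User-Agent header whether the request comes from a
    # search engine crawler rather than a person's browser.
    ua = (user_agent or "").lower()
    return any(hint in ua for hint in KNOWN_CRAWLER_HINTS)

# Example: only inline the static comments markup when a crawler is visiting.
if is_crawler("Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"):
    print("render the comments into the page body")

Whether you render the comments for everyone (as the Facebook docs suggest) or only for detected crawlers is a trade-off; rendering them for everyone is simpler and avoids serving crawlers different content.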

Facebook debug tool giving Bad Response Code (redirect) for my site

Today, I tried to post a link from my website on my wall, but no OG meta information was fetched. Therefore, I went on to Facebook URL Linter to check things for myself and see if OG meta tags are fetched properly.
To my surprise, every link on my website including the domain itself, generated Response code: 302.
My OG tags are well set and they were working fine a few days ago. Following is what the debugger shows for my domain: http://www.price-tag.org
Response Code: 302
Fetched URL: http://price-tag.org/
Canonical URL: http://price-tag.org/
Final URL: http://price-tag.org/WpjZW/
For every page from my website, the Facebook linter is appending an arbitrary string like the WpjZW above.
Please let me know if this is a Facebook error or something has gone wrong at my end.
It seems the linter is being sent on a circular redirect loop. It is seeing a 302 redirect.
For http://www.price-tag.org I don't see anything but a 200 response code when I navigate to your website using Firefox.
However, if I remove the www from the URL and go to http://price-tag.org, I do see the strange behaviour of the 302s and then a random URL like http://price-tag.org/UKXRN/
I would suggest you contact your server admin or webmaster to see what they have in place that is doing this 302 redirect and adding the strange URL characters.
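To see exactly what the linter is being served, you can also request both host variants yourself without following redirects and with a scraper-like User-Agent; a rough sketch (the user-agent string is my assumption):

import requests

for url in ("http://www.price-tag.org/", "http://price-tag.org/"):
    r = requests.get(url,
                     headers={"User-Agent": "facebookexternalhit/1.1"},
                     allow_redirects=False,
                     timeout=10)
    # If the server injects the random path segment, it shows up in Location.
    print(url, r.status_code, r.headers.get("Location"))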