Site is not accessible by web scrapers - Facebook

The website I am using is being scraped with the wrong information (checked with a meta tag checker): the scraper returns the information for a domain of my hosting company (my hosting provider) instead of my own. I can't understand why. I want to be able to share my site on social media, but somehow it is not available. I really don't know what is going on. Can someone please help?
The following links show what I mean:
https://rankingapp.metatags.nl/nl/reports/mysite.nl
https://developers.facebook.com/tools/debug/sharing/?q=www.mysite.nl
And this is my website: http://www.mysite.nl

Facebook prefers IPv6 over IPv4 when available, and in the debug tool output you can see
Server IP: 2a02:2268:ffff:ffff::4
So check (or have your host check) that your DNS configuration is correct.
A common cause of such problems is an IPv6 (AAAA) DNS record for your domain that wrongly points to the address serving the "main domain" of a (shared?) hosting account instead of your site. Most end users still browse over IPv4, so real visitors hardly ever notice such issues; but when the Facebook scraper comes along and requests the page via IPv6, it goes wrong.
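To see what the scraper sees, you can compare the addresses your domain resolves to over IPv4 (A records) and IPv6 (AAAA records). Below is a minimal standard-library sketch. It goes through the local system resolver, so results may differ slightly from what Facebook's resolvers return, and `resolve_by_family` is a hypothetical helper name:

```python
import socket


def resolve_by_family(host: str) -> dict:
    """Return {4: [IPv4 addresses], 6: [IPv6 addresses]} for `host`."""
    result = {4: [], 6: []}
    for version, family in ((4, socket.AF_INET), (6, socket.AF_INET6)):
        try:
            infos = socket.getaddrinfo(host, None, family, socket.SOCK_STREAM)
        except socket.gaierror:
            # No record of this family (e.g. no AAAA record): leave the list empty.
            continue
        # info[4] is the sockaddr tuple; its first element is the address string.
        result[version] = sorted({info[4][0] for info in infos})
    return result
```

For example, `resolve_by_family("www.mysite.nl")[6]` lists the IPv6 addresses. If it contains 2a02:2268:ffff:ffff::4 and that address serves your host's main domain rather than your site, the AAAA record is the one to correct (or remove) at your DNS provider.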


GitHub page 'Your connection is not private'

I have searched for this and found answers, but they do not work in my case. I would appreciate some thoughts on this.
I have set up a GitHub page at: https://ir-ischool-uos.github.io/mwpd/
Some users reported that when they visit the page, Chrome displays a security error ("Your connection is not private"). However, many other users say it works fine for them.
Some sources say this only happens if the link contains 'https' instead of 'http', but I tested on two computers, one mobile phone and one tablet, and they all work fine. I also found sources saying I should use GitHub Pages' HTTPS support; I checked my settings, and that option is already ticked.
Is there anything I can do to fix this for every user?
Thanks
This error can happen for numerous reasons. For example:
The server certificate (or at least one of the certificates in the chain of trust) is not among the trusted certificates that the browser/system maintains (maybe an outdated list?). Try to update the browser/system.
The date/time on the system is not configured correctly.
The connection is being intercepted (by an attacker?) and the certificate is manipulated, hence the SSL connection handshake process could not complete.
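These causes can be checked from an affected machine by running the same certificate validation a browser does. A minimal sketch with Python's `ssl` module, assuming Python 3.7+ (for `ssl.SSLCertVerificationError`); `check_certificate` and `seconds_until_expiry` are hypothetical helper names:

```python
import socket
import ssl
import time


def seconds_until_expiry(not_after: str, now: float) -> float:
    """Seconds from `now` (Unix time) until the certificate's notAfter date."""
    return ssl.cert_time_to_seconds(not_after) - now


def check_certificate(host: str, port: int = 443) -> None:
    """Validate the site's certificate against the system's trusted CAs."""
    context = ssl.create_default_context()  # checks chain, hostname and dates
    try:
        with socket.create_connection((host, port), timeout=10) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
    except ssl.SSLCertVerificationError as err:
        # Same class of failure as "Your connection is not private": untrusted
        # or substituted chain, wrong name, or a date outside the validity window.
        print("verification failed:", err.verify_message)
        return
    # cert["issuer"] is a tuple of RDNs, each a tuple of (key, value) pairs.
    issuer = dict(rdn[0] for rdn in cert["issuer"])
    remaining = seconds_until_expiry(cert["notAfter"], time.time())
    print("issuer:", issuer.get("organizationName"))
    print(f"valid until {cert['notAfter']} "
          f"(about {remaining / 86400:.0f} days left by this machine's clock)")
```

Running `check_certificate("ir-ischool-uos.github.io")` on a machine that shows the error should print the verifier's reason, such as an expired certificate (which can also mean the local clock is wrong) or an issuer the system does not trust.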
The "Your connection is not private" error appears on websites using the SSL/HTTPS protocol when a browser is unable to validate the SSL certificate presented by the website.
Basically, any website using SSL/HTTPS sends its security certificate to the user's browser on each visit. The browser then tries to validate the certificate by checking that it matches the site's name, is within its validity dates, and is signed (directly or through intermediate certificates) by a certificate authority the browser trusts.
If it checks out, the browser uses the public key in the certificate during the handshake to agree on session keys with your website, and those keys encrypt the data; the website's private key never leaves the server. This encryption secures the data transfer between the user's browser and your website.
I have checked it across 3 different connections and they all worked just fine.
I believe the problem could be on the users' side. They may need to clear their cache, check that their clock is set correctly, or check whether their antivirus is blocking the site; their browsers may also be outdated.
My advice is to contact GitHub Support (https://support.github.com/contact). They could verify whether this is an issue on the server side or not.
But from what I am seeing, this may be an issue with the users' devices.
Here are also a few links you could refer to, to check that all the settings on your side are set correctly:
[1] https://github.com/docsifyjs/docsify/issues/236
[2] https://help.github.com/en/github/working-with-github-pages/securing-your-github-pages-site-with-https
[3] https://help.github.com/en/github/working-with-github-pages/troubleshooting-custom-domains-and-github-pages#https-errors
I hope this helps. Let me know!
If you are using a school/college Wi-Fi, it may be that someone else has your credentials and is using them at the same time as you, so whenever they are online you'll get this message. You should probably change your password or switch on a VPN.
If the Wi-Fi/other network used to access the website in question is a school or public network, some third-party software used by its administrator might be trying to prevent or override the connection to your website.
That might happen in order to display an error message (e.g. "Website access prohibited"), a captive portal (network login window), or just to watch the data being sent around.
Since you're using HTTPS, this was prevented when the certificate check failed: with HTTPS in place, that software has no way of presenting its own page or eavesdropping other than creating its own certificate with your website's name in it on the fly. That certificate was, of course, rejected by the browser, either because the user didn't expect it or, if it is indeed a school/company network, because the PC wasn't properly enrolled for use on that network.
Either way, there is no problem with your webpage itself. Because GitHub manages the server for your Pages, the chance that you caused this problem yourself is pretty much zero.
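One way to test the interception explanation is to compare the certificate fingerprint the site presents on the suspect network with the one seen from a trusted connection (for example mobile data). A minimal sketch; `fingerprint` and `served_fingerprint` are hypothetical names, and a mismatch is only a hint, since GitHub may rotate certificates or serve different ones from different servers:

```python
import hashlib
import socket
import ssl


def fingerprint(der_cert: bytes) -> str:
    """SHA-256 fingerprint in the colon-separated hex form browsers display."""
    digest = hashlib.sha256(der_cert).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def served_fingerprint(host: str, port: int = 443) -> str:
    """Fingerprint of whatever certificate the current network hands us."""
    context = ssl.create_default_context()
    # Validation is disabled on purpose: we want to see a substituted
    # certificate, not have the handshake abort. Never send data this way.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return fingerprint(tls.getpeercert(binary_form=True))
```

If `served_fingerprint("ir-ischool-uos.github.io")` on the school network differs from the value on mobile data, and the browser's certificate viewer shows an unfamiliar issuer, something on that network is most likely replacing the certificate.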
Sometimes this happens because of wrong IP/DNS settings. Checking the following might help resolve the issue:
1. Make sure you are using a common public DNS server. How to check which DNS server you are using depends on your operating system. Also, if you are using a VPN client that has its own DNS configuration, check that setting too.
2. Check whether there is an IP address associated with GitHub in the system's hosts file. In Linux and macOS you may use sudo vi /etc/hosts. If there is one, turn that line into a comment by adding # at the beginning of the line. Save, exit, and check whether you still see the error. Do step 3 only if you are still getting the same error.
3. Go to https://www.ipaddress.com, search for github.io, and add its IP address at the bottom of the /etc/hosts file, like this example: 140.82.114.4 github.io.
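The hosts-file check can be scripted. This sketch assumes the usual plain-text `address hostname` format; `hosts_entries_for` is a hypothetical helper that lists the active (non-commented) lines mentioning a given name:

```python
def hosts_entries_for(hosts_text: str, needle: str = "github") -> list:
    """Return (line number, line) for active hosts-file entries containing `needle`."""
    matches = []
    for number, line in enumerate(hosts_text.splitlines(), start=1):
        entry = line.split("#", 1)[0].strip()  # ignore comments and commented-out lines
        if entry and needle.lower() in entry.lower():
            matches.append((number, line))
    return matches
```

For example, `hosts_entries_for(open("/etc/hosts").read())` on Linux or macOS; on Windows the file is `C:\Windows\System32\drivers\etc\hosts`. An empty list means no active GitHub override.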
Hope this helps.

How to redirect a website according to country's IP address

I'm working on a messenger app whose server side code is developed in Erlang.
The problem I'm facing is redirecting a website according to the country-specific domain.
For example, when a user types google.co in the message box, it automatically displays google.co.uk; how can I redirect it to google.co.in if I'm in India?
To find the user's location, I found this library on GitHub: https://github.com/mochi/egeoip
How can I use this geolocation to redirect to a particular country-specific website?
For example, when I enter facebook.com in other apps, the preview is automatically displayed in my local language.
But in my app, the preview is shown in some foreign language, Russian maybe.
I've read the comments, and since you are not considering datasets as an option, I think what you may want to do is something like this:
The first thing to understand is how those previews work. In any (popular) messaging app, if you type in a URL, the app sends a request to the URL and gets the website's metadata, which is then displayed in the UI.
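To make the preview step concrete: once the page is downloaded, apps commonly read the Open Graph `og:` meta tags and fall back to `<title>`. A minimal standard-library sketch of that parsing; the chosen fields and the `extract_preview` name are assumptions, and your app would do the equivalent in its own language:

```python
from html.parser import HTMLParser


class PreviewParser(HTMLParser):
    """Collect <title> text and <meta property="og:..."> tags from a page."""

    def __init__(self) -> None:
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = attributes.get("property") or attributes.get("name")
            content = attributes.get("content")
            if key and content is not None and key.startswith("og:"):
                self.meta.setdefault(key, content)  # keep the first occurrence

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_preview(html: str) -> dict:
    parser = PreviewParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.meta.get("og:title", parser.title.strip()),
        "description": parser.meta.get("og:description", ""),
        "image": parser.meta.get("og:image", ""),
    }
```

Because the site localizes these tags itself, the language of the preview depends on where the request was made from.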
Country detection is a bit more complicated and can be done in a variety of ways. But thankfully, you (almost) don't have to do anything. This is a rather long topic, but I'll try to keep it short.
Text Localization
On some websites (this might be the case with Facebook in your example), country detection is done on the application layer, and based on that country a specific language is used for the website's text. This all usually happens before the website renders its content, so you do not have to worry about it.
GeoDNS
This one happens on the DNS layer and is probably the most popular. A domain name can be assigned a handful of IP addresses. These IPs can point to different versions of the website, and with GeoDNS it is up to the DNS manager to assign a country to an IP. So when a DNS query comes from Russia, the requesting IP's country is resolved, and the IP assigned to that country (if any) is returned. Websites use this especially for country-specific features or content; a good example is Netflix.
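The GeoDNS selection logic can be pictured as a lookup table keyed by country. This is a toy sketch, not a DNS server: the country lookup (which something like egeoip would provide) is assumed to have happened already, and the table and addresses (RFC 5737 documentation ranges) are hypothetical. In practice the DNS server usually sees the IP of the user's recursive resolver rather than the user's own, unless the resolver forwards the client subnet (EDNS Client Subnet, RFC 7871).

```python
from typing import Optional

# Hypothetical GeoDNS table: country -> A record to hand out.
# The addresses are RFC 5737 documentation addresses, not real servers.
GEO_RECORDS = {
    "RU": "192.0.2.10",
    "IN": "198.51.100.20",
}
DEFAULT_RECORD = "203.0.113.30"  # answer for every other or unknown country


def geodns_answer(country_code: Optional[str]) -> str:
    """Return the A record a GeoDNS server would answer with for this country."""
    if not country_code:
        return DEFAULT_RECORD  # the GeoIP lookup found nothing
    return GEO_RECORDS.get(country_code.upper(), DEFAULT_RECORD)
```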
Redirects
Google redirecting you to a different domain might work this way: the country is resolved from the IP address in the application (HTTP) layer, and the server then responds with a 301/302 redirect pointing to the new domain name. This is the one you may need to do something about. Since your application needs to make an HTTP request to the URL the user has entered, if it returns a redirect, you must follow it. Many HTTP libraries/clients already do this, but with some you might have to explicitly turn on the option to follow redirects.
One important thing to note is to make the HTTP request on the client side. Otherwise, you will always resolve to the same country (where your server resides), regardless of where your user is.
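On the client, all that is needed is an HTTP library that follows redirects and reports where it ended up. A minimal sketch with Python's `urllib.request`, whose default opener includes `HTTPRedirectHandler` and therefore follows 301/302 responses automatically; `final_url` and the User-Agent string are hypothetical:

```python
import urllib.request


def final_url(url: str, timeout: float = 10) -> str:
    """Request `url` and return the URL reached after any redirects."""
    request = urllib.request.Request(url, headers={"User-Agent": "preview-sketch/0.1"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.geturl()  # last URL in the redirect chain
```

Run on the user's device, `final_url("http://google.co")` would return whichever country-specific address Google picks for that device's location; the same call made from your server would reflect the server's country instead.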

How to redirect a root (naked) domain to www - heroku and zerigo

I have the domains example.com and www.example.com. I'm using Heroku, and Zerigo for DNS. Right now I have forwarding from the root domain to www.example.com set up in my Hostgator account, but that's not working. I'd prefer to use Zerigo for the redirect, or an ALIAS record. A lot of the articles I've found talk about ALIAS and ANAME records, but I can't find those in Zerigo, unless an ALIAS is what they mean by an A record.
Does anyone have a solution for pointing a naked domain to its www version using Zerigo?
I have done this already:
Went to the Zerigo dashboard
Clicked Add Snippet
Clicked Heroku
Added both of those
Changed the CNAME to my Heroku app name
It seems this only makes it work temporarily.
Heroku support didn't even give me a great answer at all...
One short-term solution seems to be pointing an A record to 174.129.25.170.
If you go to http://wwwizer.com/naked-domain-redirect, you'll see that they redirect naked domains to the full www domain free of charge if you point the A record to that address. Also, instead of using Zerigo, I did this on my domain registrar's website, but using Zerigo should work as well.
Hope this helps anyone who has this trouble in the future. Remember, this isn't the best solution, but it does work. If anyone has a better answer, please let us know for future reference.
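What a naked-domain redirector such as wwwizer does is conceptually small: accept requests for example.com and answer with a permanent redirect to www.example.com. A minimal sketch of that idea using `http.server`; it illustrates the technique and is not wwwizer's actual implementation, and `www_redirect_target` is a hypothetical name:

```python
from http.server import BaseHTTPRequestHandler


def www_redirect_target(host: str, path: str) -> str:
    """Build the www URL that a naked-domain request should be redirected to."""
    hostname = host.split(":", 1)[0].strip().lower()  # drop an optional :port
    if not hostname.startswith("www."):
        hostname = "www." + hostname
    return f"http://{hostname}{path}"


class NakedDomainRedirect(BaseHTTPRequestHandler):
    """Answer every request with a 301 to the www version of the requested host."""

    def do_GET(self):
        host = self.headers.get("Host")
        if not host:
            self.send_error(400, "Missing Host header")
            return
        self.send_response(301)  # permanent redirect
        self.send_header("Location", www_redirect_target(host, self.path))
        self.send_header("Content-Length", "0")
        self.end_headers()

    do_HEAD = do_GET
```

To try it locally: `from http.server import HTTPServer; HTTPServer(("", 8080), NakedDomainRedirect).serve_forever()`. In production something like this must be reachable at the naked domain's A record, which is exactly the part the third-party service provides.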
EDIT on April 10, 2014
The best solution is to buy an SSL certificate for your website. With Heroku, you can use their SSL Endpoint add-on, which requires you to buy a certificate elsewhere. This isn't the easiest process, but it will work 100% of the time.
Happy coding

Block website during maintenance except for testers

I would like to block access to my website while I make changes to it, but I want some selected people and myself to have access to run tests. I found this method, which is good (http://25yearsofprogramming.com/blog/20070704.htm), except that it is based on IP addresses (I don't know everyone's IP and can't ask).
How can I do the following:
- redirect all urls to a page like maintenance.php
- on that page, there is a form that people can use to enter a code
- if the code is valid, the redirection stops and they have access to the website normally
Thanks
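The three steps in the question map directly onto a small piece of middleware. A minimal sketch as Python WSGI middleware, assuming the site can be wrapped at that level (a PHP site would do the same in a front controller or `.htaccess` rewrite). `TESTER_CODE`, `COOKIE_NAME` and the `/maintenance` path are hypothetical, and storing the plain code in the cookie is a simplification; a signed token would be safer:

```python
import hmac
from http.cookies import SimpleCookie
from urllib.parse import parse_qs

TESTER_CODE = "change-me"         # hypothetical code handed out to testers
COOKIE_NAME = "maintenance_pass"  # hypothetical cookie name
MAINTENANCE_PATH = "/maintenance"

MAINTENANCE_PAGE = b"""<html><body><h1>Down for maintenance</h1>
<form method="post" action="/maintenance">
  <input name="code"> <button>Enter</button>
</form></body></html>"""


def _valid(code: str) -> bool:
    # Compare as bytes so non-ASCII input cannot raise TypeError.
    return hmac.compare_digest(code.encode("utf-8"), TESTER_CODE.encode("utf-8"))


def maintenance_gate(app):
    """Wrap a WSGI app so only visitors holding the tester cookie reach it."""

    def gated(environ, start_response):
        cookie = SimpleCookie(environ.get("HTTP_COOKIE", ""))
        if COOKIE_NAME in cookie and _valid(cookie[COOKIE_NAME].value):
            return app(environ, start_response)  # tester: normal access

        if environ.get("PATH_INFO", "/") != MAINTENANCE_PATH:
            # Step 1: every other URL redirects to the maintenance page.
            start_response("302 Found", [("Location", MAINTENANCE_PATH)])
            return [b""]

        if environ.get("REQUEST_METHOD") == "POST":
            # Steps 2-3: check the submitted code; if valid, set the cookie.
            length = int(environ.get("CONTENT_LENGTH") or 0)
            form = parse_qs(environ["wsgi.input"].read(length).decode("utf-8"))
            if _valid(form.get("code", [""])[0]):
                start_response("303 See Other", [
                    ("Location", "/"),
                    ("Set-Cookie", f"{COOKIE_NAME}={TESTER_CODE}; Path=/; HttpOnly"),
                ])
                return [b""]

        # Show the form; 503 tells crawlers the outage is temporary.
        start_response("503 Service Unavailable", [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Retry-After", "3600"),
        ])
        return [MAINTENANCE_PAGE]

    return gated
```

Wrapping an existing WSGI app is then `application = maintenance_gate(application)`.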
If you don't want to use IP addresses, you can set up password protection with .htaccess for your testers, and everybody else will be redirected.
This site can help you with that: http://tools.dynamicdrive.com/password/
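.htaccess password protection on Apache typically uses HTTP Basic authentication: the server answers with 401 and a `WWW-Authenticate: Basic` header, the browser prompts for a username and password, and then sends them base64-encoded (not encrypted, so use HTTPS) in an `Authorization: Basic ...` header on every request. A minimal sketch of the server-side check, with a hypothetical plain-text user table (a real `.htpasswd` file stores password hashes instead):

```python
import base64
import binascii
import hmac
from typing import Optional

# Hypothetical tester accounts; a real .htpasswd stores hashes, not plain passwords.
TESTERS = {"alice": "s3cret", "bob": "another-secret"}


def is_authorized(authorization: Optional[str], users: dict) -> bool:
    """Validate an HTTP Basic 'Authorization' header against a user table."""
    if not authorization:
        return False
    scheme, _, encoded = authorization.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return False
    username, sep, password = decoded.partition(":")
    expected = users.get(username)
    if not sep or expected is None:
        return False
    # Constant-time comparison avoids leaking how much of the password matched.
    return hmac.compare_digest(password.encode("utf-8"), expected.encode("utf-8"))
```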

Reroute web traffic. Act as gateway

In hotels, the Wi-Fi is often unprotected and anyone can connect. But no matter which URL you request, you are redirected to their "Give a password" page.
I am interested in making something similar, but I'm kind of lost. I don't know where to start.
What is the minimum hardware required for such a thing? Do I have to buy a special router that does this, or can I simply use a computer with Apache or IIS and manage the traffic from there?
Edit:
Since the second scenario seems more likely to be the answer, I am looking for some info on where to start.
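On the HTTP side, the "give a password" part of a captive portal is an ordinary web server that answers every URL with a redirect to its login page until the client has authenticated. A minimal sketch with `http.server`; the password, port and IP-based bookkeeping are hypothetical simplifications (real portals usually track MAC addresses). It does not cover the networking half: the machine running it must also be the gateway for the Wi-Fi clients (for example a computer with two network interfaces) and use firewall/NAT rules, such as an iptables `REDIRECT` rule, to send not-yet-authorized clients' port-80 traffic to this server. HTTPS requests cannot be transparently redirected this way without certificate errors.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs

PORTAL_PASSWORD = "guest-wifi"   # hypothetical password handed out at the front desk
authorized_clients = set()       # client IPs that have entered the password

LOGIN_PAGE = b"""<html><body><h1>Enter the Wi-Fi password</h1>
<form method="post" action="/login">
  <input name="password" type="password"> <button>Connect</button>
</form></body></html>"""


class PortalHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.client_address[0] in authorized_clients:
            # A real gateway would stop intercepting this client at this point.
            self._reply(200, b"You are connected.")
        elif self.path == "/login":
            self._reply(200, LOGIN_PAGE, "text/html")
        else:
            # Whatever URL was requested, send the browser to the login page.
            self.send_response(302)
            self.send_header("Location", "/login")
            self.send_header("Content-Length", "0")
            self.end_headers()

    def do_POST(self):
        length = int(self.headers.get("Content-Length") or 0)
        form = parse_qs(self.rfile.read(length).decode("utf-8"))
        if self.path == "/login" and form.get("password", [""])[0] == PORTAL_PASSWORD:
            authorized_clients.add(self.client_address[0])
            self._reply(200, b"Password accepted, you are online.")
        else:
            self._reply(403, b"Wrong password.")

    def _reply(self, status, body, content_type="text/plain"):
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port: int = 8080) -> None:
    """Serve the portal on all interfaces (blocks forever)."""
    ThreadingHTTPServer(("0.0.0.0", port), PortalHandler).serve_forever()
```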