As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I am trying to interpret HttpReferer strings in our server logs. It seems like there is quite a high number of empty values.
I am wondering how many of these empty values are due to direct hits from people entering our URL directly into a browser and how many might be due to some kind of blocking utility that prevents the Referer from being sent.
I really have no idea how many people are using tools or browsers or 'anonymizers' that might block the referrer. Any input?
I personally disable it using the "Web Developer" extension for Firefox, only because of some "helpful" sites that highlight the search terms that I used to get to that page.
Thanks, I am fully capable of installing a highlighter plugin, or searching for the words inside your page.
I think a large proportion may actually be caused by ISPs' restrictions. I know my ISP (BT, in the UK) filters it out (probably at the router) which is bloody annoying at times.
As it turns out, the block is actually put in place by Zone Alarm, a software firewall, which is often supplied by ISPs.
Opera has a quick toggle in the F12 menu that lets you switch "Send Referrer Information" on or off for the site(s) you're surfing around.
I used to log all this stuff in my blogging app - pretty much all bots never send referrer info.
You should be able to make an educated guess as to whether it's down to it being filtered out or just people entering the URL.
If the first hit has no referrer but the loading of images/CSS etc has referrer info then they just entered the URL directly.
If they only ever pull down HTML with no images or CSS they are most likely a bot (or using Lynx perhaps).
If they pull down HTML, images and CSS with no referrer then it's being filtered out.
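The three rules above can be sketched in code. This is only a rough sketch: the `Hit` record, the list of asset extensions, and the convention that a missing referrer is logged as `""` or `"-"` are all assumptions for illustration, and you would need to group log lines per visitor yourself first.

```python
from dataclasses import dataclass
from typing import List

# Assumed set of "page furniture" extensions; adjust to your site.
ASSET_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico")


@dataclass
class Hit:
    path: str      # requested path, e.g. "/index.html"
    referrer: str  # Referer value from the log; "" or "-" when absent


def is_asset(path: str) -> bool:
    """True if the path looks like an image/CSS/JS request."""
    return path.lower().split("?", 1)[0].endswith(ASSET_EXTENSIONS)


def has_referrer(hit: Hit) -> bool:
    return hit.referrer not in ("", "-")


def classify_visit(hits: List[Hit]) -> str:
    """Guess why the first hit of one visitor's session had no referrer."""
    if not hits:
        return "unknown"
    if has_referrer(hits[0]):
        return "referred"   # arrived via a link, nothing to explain
    assets = [h for h in hits if is_asset(h.path)]
    if not assets:
        return "bot"        # HTML only: most likely a bot (or Lynx)
    if any(has_referrer(a) for a in assets):
        return "direct"     # browser sends referrers, so the URL was typed
    return "filtered"       # full page load but never a referrer
```

Counting the results of `classify_visit` over all sessions should give a ballpark split between typed-in URLs and stripped referrers, nothing more precise than that.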
Some badly behaved antivirus software has also started doing this for "security" reasons.
We had an email form that used referrer tracking to eliminate the bulk of the random bot-spam, and some people moaned that it didn't work.
Not entirely wonderful, but the referrer header has far more good uses than just 'let's be evil and watch where people came from', and those legitimise it.
( Some antivirus packages have been known to stop email working altogether, for instance, and the clients will ring you and tell you it's your fault until you tell them to get rid of their rubbish 'I've never heard of that company before' antivirus for the 40th time, and they listen and their problem magically resolves. )
Addendum on security
Referrer tracking is very useful for keeping state within a site (without needing cookies).
Referrer tracking is very useful for confirming that a user's origin was the site itself (without needing cookies).
Though I see a legitimate privacy concern with 3rd-party sites leaking data via the referrer, and the recipient seeing that.
So:
3rd-party => site # referrer preferred blank
local => local # referrer preferred kept
At least here you can easily distinguish between a "hotlink" from an external source and an internal link.
Also, because of this, cross-domain referrals from SSL websites are blocked by default by some browsers.
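As a rough illustration of telling a "hotlink" from an internal link, here is a small sketch using only the standard library. The function name and the three labels are made up for the example, and it compares host names exactly, so `www.example.com` and `example.com` count as different sites unless you normalise them first.

```python
from urllib.parse import urlsplit


def referrer_kind(referrer: str, own_host: str) -> str:
    """Label a Referer value as blank, internal, external or unknown."""
    if not referrer:
        return "blank"
    host = urlsplit(referrer).hostname  # lower-cased by urlsplit
    if host is None:
        return "unknown"                # not a parseable absolute URL
    return "internal" if host == own_host.lower() else "external"
```

For example, `referrer_kind("http://example.com/page", "example.com")` gives `"internal"`, while a referrer from any other host gives `"external"`.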
Closed 10 years ago.
With the release of the new Facebook commenting module, could people please share their experiences with the various commenting systems -- specifically, Disqus, Echo, Intense Debate, and Facebook Comments?
What are the pros and cons of each system?
Thanks!
For all commenting systems
The pros:
One simple login for all sites.
Spam control.
Expanded social media presence.
Easy comment subscriptions.
The cons:
Complicates the comment process.
Lessens your control.
Facebook Commenting System
Pros
Real names and identities greatly reduce the number of trolls and anonymous cowards in comments.
Social virality boosts traffic by creating a feedback loop between Facebook and participating sites. Friends pull in their friends, creating a social entry point to your site.
Automatic sign-in if you already signed into Facebook elsewhere, lowers the barriers to commenting.
Most “liked” comments get voted to the top. It also knows who your friends are, so you will see those comments first.
Cons
No support for Twitter or Google IDs, which leaves out the other half of the social Web.
No backups and other lock-ins will make it hard for sites to leave.
If you work somewhere that blocks Facebook, you are out of luck.
Your friends might be surprised to find their replies in your Facebook News stream reproduced on another site’s comments. Expect a backlash.
Moderation bugs, no view counts at the top of posts or ways to highlight site owners/writers in comments.
Source : http://techcrunch.com/2011/03/01/pros-cons-facebook-comments/
Intense Debate
Pros
Highly customizable. CSS style sheet is easy to work with and more importantly,
Well integrated into WordPress. It’s made by the same company, apparently.
Can add a bunch of add-ons to the comment system, such as CommentLuv.
Feels simple and crisp.
Cons
Does not render properly under IE9 and Opera (just one button misplacement in Opera).
Importing comments process is buggy.
Replies are hidden and you have to click the ‘Replies’ text to expand them. Replies were already shown for pages with few comments, but were not shown for those with many comments.
DISQUS
Pros
Lots of login options. You can use just about any of your login credentials (Google, Facebook, Twitter, etc). Of course, you can still post comments anonymously if you choose to.
Looks nice and clean, though it took me a long time to customize the CSS. Still not 100% satisfied.
It’s popular. Lots of websites use it; therefore, many people know what they’re dealing with when they see Disqus logo in the comment section of a blog.
Cons
Not as customizable as Intense Debate.
By default, it inherits the blog’s main theme style sheet.
All URLs in comments are auto-linked.
Doesn’t integrate well with WordPress comments.
Comments count
The Help section is lacking.
Source: http://www.scamfreeinternet.com/2011/04/disqus-vs-intense-debate/
The major pro of each of these systems is that you don't have to write them yourself.
Personally I wouldn't use Facebook Comments, because (believe it or not) not everybody uses (or even likes) Facebook.
Disqus is very good because you can sign in with a variety of services, so you're likely to get quite a few people using it who wouldn't use Facebook Comments.
a post that grew out of a comment to this page:
Unfortunately my comment was rejected (too many links).
So here is an excerpt from that comment, and a link to the now-fully-fledged blog post, in which I have aggregated all the Disqus links, pro, con, and neutral, that illustrate their respective points.
Having contemplated the wonderful pitter-patter of keyboards that,
all-too-often, does not warm this blog from underneath, I decided
renovation might be just the thing. Disqus has an overall style that
definitely appeals to me. According to the brief overview I quickly
search-engined for myself, Disqus has problems with privacy and
anonymity, just like (it should by now go without saying) Facebook.
The question, for me, is: exactly how close is the resemblance.
And the real question is, how dissimilar can any data-mining,
profile-generating, social-network-enabling corporate entity be from
such creeping Evil. Breaches of privacy cannot be easily explained by
accident, by exceptional circumstances, especially if they recur.
They are soon exposed for what they are: evidence of the sort of
underlying motivations best met with corresponding breaches of trust.
I remain as yet unconvinced and undecided.
In case anyone is interested, these are the Disqus issues that my very
brief search uncovered, with relevant links, loosely separated into
general, pro, and con:
the accidental public disclosure of private information such as email address, photo, or real name, when signing up or signing in;
the forcing of users to enable 3rd-party cookies;
difficulty or impossibility of integration with exclusive HTTPS.
For the links to which this excerpt refers, go to
A Better Comment Platform Should Be Possible
I just tried LiveFyre and disliked it as it still has some major bugs like spam getting through, lack of comment moderation etc.
I decided to try Disqus and boy I LOVED IT FROM THE BEGINNING.
Right now I'm using both Facebook comments and Disqus on my blogs. Facebook comments helps in a way to drive traffic to my blogs from Facebook ;-)
I agree with using Disqus instead of the Facebook comments system. The major advantage of Facebook comments is that the comments will sync with comments on Facebook; if you don't need this, just go with Disqus. MICBook.net is now also considering using Disqus; you could take a look when it's ready. The website is http://micbook.net.
Facebook commenting is the best option in my opinion.
When you enter something and you see lots of people commenting, and you can see their pictures etc., it makes people want to enter the discussion. Or, at least, they read more comments than they would with normal comments.
And, when you comment, the text is posted to your wall feed and most of your friends will see it.
That's why, in my opinion, Facebook commenting is the best choice to use today.
att,
Jonathan
I don't like using Facebook. We could be better off with Disqus. Facebook has no option for deleting comments, so how can it be moderated? For example, if your website has been spammed to death, how do you delete the spam posts? The answer is, you can't. Facebook is good for going viral but not for commenting; I wouldn't recommend it.
Closed 9 years ago.
Note: I have seen this and tried to take as much from it as possible; but I believe my context is different.
I am working on a small-ish project. Call it Foobar. I want to get this done in a more organised way. I've tried a few projects, mostly as an unorganised programming-as-a-light-hobby student. I'm trying to get more organised; 90% of those projects died either because I failed to document them at all, or because I lost them.
As such, I've been thinking about getting version control/hosting going. Not only will it organise me more, but (a big if here) if it gets anywhere into a usable state, it will be easier for people to get.
The two places I'm considering are Google Code and GitHub. From the question I linked:
Google Code:
As with any Google page, the complexity is almost non-existent
Everyone (or almost everyone) has a Google account, which is nice if
people want to report problems using the issues system
GitHub:
May (or may not) be a little more complex (not a problem for me though) than Google's pages but...
...has a much prettier interface than Google's service
It needs people to be registered on GitHub to post about issues
I like the fact that with Git, you have your own revisions locally
From this I'm leaning towards GitHub, as Google Code doesn't look appealing to me.
For a small hobby project - basically making community features irrelevant - are there features that should take me over to one side or the other?
I prefer Google Code since it's just easier for my small personal projects. At the end of the day, for free projects, it's hard to steal time from family, friends or other commitments and the key to making small free projects a success is being realistic with your time. (Elsewise, you get the "80% done" problem.)
Google Code now has Git support.
Biggest advantage of Google Code is that you don't need a website.
- The frontpage of the project is enough.
- You can add simple binary downloads in the Downloads section.
- In comparison, GitHub's interface is REALLY confusing to non-programmers. Your frontpage is full of technobabble, so unless it's a coder's tool, you'll need a separate website.
- Marketing's really good: you get a good rank on Google and often you'll be picked up and sometimes reviewed by other download sites. There's no sense donating your time if no one can find your project.
If it is entirely a coder's tool (not just a handy IT tool), then perhaps GitHub is better.
You say "I believe my context is different", but don't give any reasons why it is. As such, I can't offer you any specific suggestions other than the generic pros and cons, which are outlined in various documents and tutorials online.
My suggestion: pick a program first (git, Mercurial, or SVN) and use it. Find a hosting site that supports the software (at the time of this answer, GitHub for git, BitBucket or Google Code for Mercurial, Google Code for SVN) and use it. If you run into problems, switch to another one.
I've used all three, and typically the problem isn't the hosting, but the fact that you need to learn the program itself. All of the hosting providers listed here will suit you fine until you have a specific reason why it doesn't.
I would go with GitHub. The single reason for this is that Google Code shows your email and your full name (name only if you have Google+, I think). And you cannot disable this at the moment.
Let's split the problem into two parts: for developers and for users.
In fact, if only end users are considered, both Google Code and GitHub have friendly interfaces, and as we all know, Google is better known among those who do not program.
But when we turn to programmers, Git is more fashionable and (arguably) more comfortable.
So, personally, I would choose Google Code if I were planning an end-user-oriented product, and GitHub of course if I wanted to involve lots of potential collaborators or if I were developing a product purely for programmers, like an API.
Closed 10 years ago.
For instance, if the original message (message 1) is...
Hey Jon,
Want to go get some pizza?
-Bill
And the reply (message 2) is...
Bill,
Sorry, I can't make lunch today.
Jonathon Parks, CTO Acme Systems
On Wed, Feb 24, 2010 at 4:43 PM, Bill Waters wrote:
> Hey Jon,
> Want to go get some pizza?
> -Bill
In Gmail, the system (a) detects that message 2 is a reply to message 1 and turns this into a 'thread' of sorts and (b) detects where the replied portion of the message actually is and hides it from the user. (In this case the hidden portion would start at "On Wed, Feb..." and continue to the end of the message.)
Obviously, in this simple example it would be easy to detect the "On <Date>, <Name> wrote:" or the ">" character prefixes. But many email systems have many different styles of marking replies (not to mention HTML emails). I get the feeling that you would have to have some damn smart string parsing algorithms to get anywhere near how good Gmail's is.
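To make the "easy" case concrete, here is a minimal sketch that handles only the two markers mentioned above: an "On ... wrote:" attribution line and ">" prefixes. It is not how Gmail does it (that is not public as far as I know), the function name is made up, and it will miss other mail clients' quoting styles and HTML mail entirely.

```python
import re
from typing import Tuple

# Matches attribution lines such as
# "On Wed, Feb 24, 2010 at 4:43 PM, Bill Waters wrote:"
ATTRIBUTION = re.compile(r"^On .+ wrote:\s*$")


def split_reply(body: str) -> Tuple[str, str]:
    """Split a plain-text body into (visible part, quoted part)."""
    lines = body.splitlines()
    for i, line in enumerate(lines):
        # The quoted portion starts at the first attribution or ">" line.
        if ATTRIBUTION.match(line) or line.lstrip().startswith(">"):
            return "\n".join(lines[:i]).rstrip(), "\n".join(lines[i:])
    return body.rstrip(), ""
```

Run on message 2 from the example, the visible part ends at the signature line and the quoted part starts at "On Wed, Feb 24, 2010 ...".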
Does this technology already exist in an open source project somewhere? Either in some library devoted to this exclusively or perhaps in some open source email client that does similar message threading?
Thanks.
There's a good article written by Zawinski here:
http://www.jwz.org/doc/threading.html
I believe Gmail works by subject title. I can't check it at the moment, but a quick change to the title might break the threading.
The following is difficult to predict, as you mention:
On Wed, Feb 24, 2010 at 4:43 PM, Bill Waters wrote:
but grabbing the email title Pizza tomorrow and assuming a prefix of Re: Pizza tomorrow is considerably more predictable. You could also assume the cases of FW: and RE: (in caps).
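A rough sketch of that subject-based grouping, assuming you simply strip any run of Re:/RE:/FW:/Fwd: prefixes and compare what is left case-insensitively. The function name is hypothetical, and localized prefixes (such as "AW:" or "SV:") are not handled.

```python
import re

# One or more reply/forward prefixes at the start of the subject.
PREFIX = re.compile(r"^\s*((re|fw|fwd)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    """Return a key that is equal for a subject and its replies/forwards."""
    return PREFIX.sub("", subject).strip().lower()
```

Messages whose `normalize_subject` values match can then be treated as candidates for the same thread, with the obvious caveat that unrelated mails sometimes share a subject.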
Do you mean to solve problems where the correspondent doesn't set In-Reply-To: or References: header fields?
Otherwise, you might use mutt and configure it to not show quotes by default.
(This should be possible in any other mail tool on earth too. Well, I never got a tree-thread view in Outlook.)
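When the headers are present, finding a message's parent is straightforward. Here is a minimal sketch using Python's standard `email` package; the helper name is made up, and it assumes `In-Reply-To` holds a single message ID and that the last entry in `References` is the direct parent.

```python
from email import message_from_string
from typing import Optional


def parent_id(raw_message: str) -> Optional[str]:
    """Return the Message-ID this mail replies to, or None if unknown."""
    msg = message_from_string(raw_message)
    in_reply_to = msg.get("In-Reply-To")
    if in_reply_to:
        return in_reply_to.strip()
    # Fall back to References, which lists ancestors oldest-first.
    references = msg.get("References", "").split()
    return references[-1] if references else None
```

Building a thread tree is then a matter of mapping each `Message-ID` to the messages whose `parent_id` points at it.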
[edited below in reaction to comment]
If you try to build your own software, then this question obviously is suited well. But then, I can only give you my 2c on this. If you cannot rely on the explicit headers, then the only thing to do is take a bunch of mails and learn the most common phrases used to indicate quotes. (Luckily there are some conventions, and date formats and names/emails are not completely arbitrary.)
If you do this for analysis of communication threads, you probably want to indicate the likelihood of the relation. If you only do it for convenience of the user... well... my personal opinion? Don't sweat about people not able to use a decent mail tool.
What kind of Mail Delivery Agent are you using?
Are you developing your own? In that case, are you planning to implement IMAP protocol?
If you're using Cyrus (or any other product that handles IMAP) with SORT and THREAD extensions, then it's already built in.
In both cases, you should take a look at RFC 5256.
You could have a look at sup http://freshmeat.net/articles/sup-gmail-meets-the-console as it does almost what you want
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
I have a serious question. Is it ever ethical to ignore the presence of a robots.txt file on a website? These are some of the considerations I've got in mind:
If someone puts a web site up they're expecting some visits. Granted, web crawlers are using bandwidth without clicking on ads that may support the site but the site owner is putting their site on the web, right, so how reasonable is it for them to expect that they'll never get visited by a bot?
Some sites apparently use a robots.txt exactly in order to keep their site from being crawled by Google or some other utility that might grab prices and therefore allow people to do price comparisons easily. They have private search engines on the site so they obviously want people to be able to search the site; apparently they just don't want people to be able to easily compare their information with other vendors.
As I said, I'm not trying to be argumentative; I would just like to know if anyone has ever come up with a case where it's ethically permissible to ignore the presence of a robots.txt file? I cannot think of a case where it's permissible to ignore the robots.txt mainly because people (or businesses) are paying money to put up their web sites so they should be able to tell the Googles/Yahoos/Other SE's of the world that they don't want to be on their indices.
To put this discussion in context, I'd like to create a price comparison website and one of the major vendors has a robots.txt that basically prevents anyone from grabbing their prices. I'd like to be able to get their information but, as I said, I can't justify simply ignoring the wishes of the site owner.
I have seen some very sharp discussion here and that's why I would like to hear the opinions of developers that follow Stack Overflow.
By the way, there is some discussion of this topic on a Hacker News question but they seem to mainly focus on the legal aspects of this.
Arguments:
A robots.txt file is an implied license, especially since you are aware of it. Thus, continuing to scrape their site could be seen as unauthorized access (i.e., hacking). Sucks, but arguments like this have been made in other legal cases recently (not directly related to robots.txt, but in relation to other "passive controls".)
Grabbing prices violates no copyright law, including DMCA, since copyright does not include factual information, only creative.
Ethically, you should not grab prices because the vendor should have the ability to change prices without worrying about being accused of a bait/switch by people coming from your site.
Have you taken the high road, explaining the site to them and saying you'd love to include them in your list of vendors? Maybe they will love the idea and actually expose the data in a way that is easy for you to consume and less resource-intensive for them to produce.
There are no laws written directly about robots.txt because netiquette is generally followed. Don't be one of the "bad guys."
Some people filter robots because they use URL links to perform "actions" like adding things to carts, and robots leave them with massive numbers of abandoned shopping carts in their database.
Some people filter robots because they have exclusive prices that they can't advertise openly based on agreements with their vendors. You could be putting them in a bad position by exposing those prices on your site.
In this economy, if a company doesn't want to do everything possible to advertise themselves, it's their own fault that you don't include them.
The other use of robots.txt is to help protect web spiders from themselves. It's relatively easy for a web spider to get mired in an infinitely deep forest of links, and a properly constructed robots.txt file will tell the spider that "you don't need to go here".
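For anyone writing a crawler who does want to honour the file, Python's standard library can do the check. A minimal sketch follows; the rules, the "PriceBot" user agent, and the shop.example URLs are invented for illustration, and the rules are parsed from a string here so that nothing is fetched over the network.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules a vendor might publish to keep bots off price pages.
rules = """\
User-agent: *
Disallow: /prices/
"""
```

With these rules, `allowed(rules, "PriceBot", "http://shop.example/prices/widget")` is `False`, while the same call for `http://shop.example/about` is `True`. In a real crawler you would point `RobotFileParser` at the site's own robots.txt with `set_url()` and `read()` instead.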
Many people have tried to build businesses off building "price comparison" engines that scraped major sites.
Once you start getting any sort of traffic/revenue to speak of, you will receive a cease and desist. It's happened to dozens, if not hundreds of projects. I even worked on a small project that received a C&D from Craigslist.
You know how they say "It's easier to ask forgiveness than it is to get permission"? It doesn't hold true with page scraping. Get permission, or you will be hearing from their lawyers.
If you're lucky, it'll be early on, when you've got nothing to lose. If it's late, you may lose your business and all your work overnight, with a single letter.
Getting permission shouldn't be hard. Unless you're doing something sneaky, you're likely going to drive them additional traffic. Hell, once your product takes off, sites may be begging you, or even paying you to add their data.
One reason we allow robots to dig through the web without complaint is that we have a way to stop them if we want to. Protects both sides.
Remember the uproar when Cuil's robots were accused of going over-the-top, apparently acting like a DoS attack in some cases and using up bandwidth allowances of some small sites?
If too many people violate robots.txt we might get something worse.
"No" means "no".
To answer the narrow question, for the price comparison website you're probably best grabbing the price in real time, rather than scraping the database in advance. Hard to imagine that being a problem.
An interesting IRL version of this story involving The Harvard Coop:
Coop Calls Cops On ISBN Copiers.
Short answer: No.
On the narrow issue: If a seller says that their prices are secret, I think you have to respect that. I'd contact them and ask if they really don't want price comparison engines like yours to include them, or if the "no trespassing" sign is for technical reasons. If the latter, perhaps they'll provide you with an alternative. If the former, then I'd say too bad, they don't get included, they lose some business, and it's their problem.
Tangential rant: Personally, I get pretty annoyed with companies that make me jump through hoops to find out the price of their products, places that make me call and talk to a salesman so he can give me a hard-sell pitch, or worse, make me give them my phone number so their salesman can call and harass me. I figure that if they're afraid to tell me the price, it probably means that it's too high.
In general: A robots.txt file is like a "No Trespassing" sign. It's the owner's right to say who is allowed on their property. If you think their reasons are dumb, you can politely suggest they take the sign down. But you don't have the right to disregard their wishes. If someone puts a No Trespassing sign on his yard, and I say, "Hey, I just want to take a quick short cut, what's the big deal?" -- Maybe I'm stepping on his prized Bulgarian violet bulbs and destroying a valuable investment. Maybe I'm crossing his people's sacred burial ground and offending their religious sensibilities. Or maybe he's just an ornery jerk. But it's still his property and his right. Oh, and if I fall into the dangerous sinkhole after ignoring the No Trespassing sign, who's to blame? (In America, I could probably still sue him for all he's worth despite the fact that he warned me, but is that right?)
I'm showing some ignorance here, but I always thought a bot was something only sent out by a search engine. Like Google or Yahoo.
Thus, if you wrote an application that searched content on the internet, I wouldn't consider that a search engine bot, which to my knowledge is what robots.txt is trying to block.
But this may just be selective ignorance, because I might do it until the webmaster of that site contacted me and asked me to stop :)
If people make it available to public access, they shouldn't try to put limits on it. Adding a robots.txt file to your site is the equivalent to putting a sign on your lawn that says "Please don't look at me."
Closed 11 years ago.
I am looking for a CMS that would be incredibly user-friendly and would have the following features:
really simple message board (no login required)
family tree
story telling area
photo section
news section
Is there anything out there like this that is really easily configurable? I've already messed around with Mambo and Family Connects, but I didn't like either of those. In the past I've just programmed my own websites, for lack of easily implementable features. However, I'm assuming there's something I need out there just like this, that I can't find. Thanks.
I don't want anyone to have to log in, for one. This is for a family website, and much of my family doesn't really know what a website is, let alone how to use one. I want a super simple website with huge buttons and not a whole lot of distractions. Family Connects is a good example of what I want, except the photo album is horrible. I want people to post messages without logging in or signing up, and haven't seen that ability in the Mambo sites I've looked at.
I can understand your stipulation that your users (family) shouldn't have to sign up - but without a sign-in, your site will be a free-for-all for spammers, hackers and other bored Internet denizens.
That said, my suggestion is to use WordPress for a front end - register your family members yourself, and use a very basic template - or better yet, create one.
I have created a CMS for exactly what you are looking for. My family uses it all the time and the majority of them are not computer savvy. The only downside is that it requires a login, but like other people have said, there really isn't a way around that if you want your information to be private.
Anyway, if you are still looking, try http://www.familycms.com/
I've been using http://www.myfamily.com/ and it fits all my needs. It includes:
Pictures (with option to order prints)
Discussion
Family Trees (free from ancestry.com)
Videos
Files
Events
I've set up CMS Made Simple a couple of times now. It's all PHP and you can edit it to your heart's content. Give it a try.
CMS Made Simple seems to be dying, according to this study about content management systems found on MytestBox.com
But if it's just for a family website...
maybe you can try other CMSs which any web hosting company provides (like Joomla or WordPress).
These can be installed in several clicks (especially WordPress: you can build a good site in WordPress and it's very easy to maintain).
For a family website I think WordPress is the best and is enough (lots of plugins and skins can be found for it on the web).
If you're going for a family website you do have the option of removing the usernames/passwords/accounts by setting it up as an intranet site. Then you can browse it at home or from selective addresses.
I recommend geni.com. It's much better than Myfamily.com