I don't really know how to explain this, so bear with me. Our Facebook pixel detected traffic from another domain, but we only have one domain. We went to see what other domain it could possibly be referencing. It turns out this other domain was a carbon copy of our site; the only thing that was different was the web address. Does anyone have a clue what is going on? It's as though someone is retargeting our customers to a mirrored website.
We tested the foreign site by placing an order using store credit we gave ourselves on the backend of our own site. The order went through, and instead of showing that the order was placed in the US, it said it was placed in Turkey.
This is over my head and I have no clue where to start solving this issue.
I've actually seen this happen to someone else before. I'm not sure what the motive behind doing something like this is - but if the orders from the cloned store are being paid to your gateway, then the upside is that you're not losing money over it. However, I do believe that the intent is somewhat malicious.
The most logical reason I have been able to come up with is that if your store has high amounts of traffic, is well known, and has a good SEO rating, the people that are cloning your store are trying to "SEO-Hijack" you in a sense. Essentially piggybacking off of your site because of the SEO ratings it already has in order to boost their own and potentially turn it into a separate store/website later.
This isn't necessarily something BigCommerce can fix, since the copy of your store isn't on the platform at all; the cloners are essentially just piggybacking off your SEO rating. The best option here would be to do a WHOIS lookup on their domain and report it as fraud to their registrar, as a first step toward legal action or a cease & desist.
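If it helps, here's a minimal sketch of doing that lookup programmatically; the WHOIS protocol is just a line of text over TCP port 43, and the domain below is a placeholder:

    import socket

    def whois(server, query):
        """Raw WHOIS query per RFC 3912: connect to port 43, send one line, read until EOF."""
        with socket.create_connection((server, 43), timeout=10) as sock:
            sock.sendall((query + "\r\n").encode())
            data = b""
            while chunk := sock.recv(4096):
                data += chunk
        return data.decode(errors="replace")

    domain = "cloned-store.example"  # placeholder for the copycat domain

    # Ask IANA which WHOIS server handles the TLD, then query that server for the
    # domain itself; the response includes the registrar and its abuse contact.
    tld_info = whois("whois.iana.org", domain)
    referral = None
    for line in tld_info.splitlines():
        if line.lower().startswith("refer:"):
            referral = line.split(":", 1)[1].strip()
            break
    print(whois(referral, domain) if referral else tld_info)

A web-based WHOIS tool or the command-line whois client gets you the same information, of course; the point is just to find the registrar's abuse contact.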
Sorry that this is happening to you!
Here's a helpful explanation I was able to find, along with a blog post on how to prevent it and the steps to take.
Oh no, I'm sorry to hear about this! As blurfus suggested above -- please contact the BigCommerce Support team to report this as soon as you can. You can find their contact information here: https://support.bigcommerce.com/s/#contact
This question concerns the 'Contact Us' option on the support page for Actions on Google. We used it to ask for some clarification regarding a certain part of the Smart Home documentation recently (last Thursday, if memory serves), but have not heard back yet. Additionally, we have not received a confirmation e-mail to indicate receipt of our request. The 'quota' of remaining questions did go down (from 15 to 14), which makes us think the request was probably processed.
However, we are uncertain whether we will receive any response, or how soon we may expect one. Our request was sent with 'Medium' urgency.
Does anyone have any experience using this support option, who may be able to vouch for its efficacy? Additionally, is there a possibility to view currently 'open' support requests, to see if it is being looked into or has perhaps been closed?
The Contact Us page is generally for specific help related to your project. If your question is more general, like about the documentation, it may be preferable to ask in a broader forum such as Google+ or Stack Overflow, where more individuals with technical experience with the platforms will be able to provide help.
(Such as myself)
Title pretty much says it all. Blogs are under the same account. Asked this question on Quora with little response.
I'm looking for perhaps a web app, which automates the process. If there isn't anything already out there, I'm ready to build my own web app using Tumblr's API.
Take a look at this post for a detailed explanation of moving data from one Tumblr blog to another: http://vinylanswer.tumblr.com/post/42009904333/how-to-turn-your-secondary-tumblr-blog-into-your
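If you do end up rolling your own with the API, a rough sketch using the pytumblr client (credentials and blog names below are placeholders) could look like this:

    import pytumblr  # Tumblr API v2 Python client: pip install pytumblr

    # OAuth credentials for the account that owns both blogs (placeholders).
    client = pytumblr.TumblrRestClient(
        "CONSUMER_KEY", "CONSUMER_SECRET", "OAUTH_TOKEN", "OAUTH_SECRET"
    )

    SOURCE = "old-blog.tumblr.com"   # hypothetical blog names
    TARGET = "new-blog.tumblr.com"

    offset = 0
    while True:
        batch = client.posts(SOURCE, limit=20, offset=offset).get("posts", [])
        if not batch:
            break
        for post in batch:
            # Reblogging carries the content over; it appears on the target blog
            # as a reblog of the original post rather than a fresh post.
            client.reblog(TARGET, id=post["id"], reblog_key=post["reblog_key"])
        offset += len(batch)

Mind Tumblr's API rate limits if the source blog has thousands of posts.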
I have a serious question. Is it ever ethical to ignore the presence of a robots.txt file on a website? These are some of the considerations I've got in mind:
If someone puts a web site up, they're expecting some visits. Granted, web crawlers use bandwidth without clicking on the ads that may support the site, but the site owner is putting their site on the web, so how reasonable is it for them to expect that they'll never be visited by a bot?
Some sites apparently use a robots.txt exactly in order to keep their site from being crawled by Google or some other utility that might grab prices and therefore allow people to do price comparisons easily. They have private search engines on the site so they obviously want people to be able to search the site; apparently they just don't want people to be able to easily compare their information with other vendors.
As I said, I'm not trying to be argumentative; I would just like to know whether anyone has ever come up with a case where it's ethically permissible to ignore the presence of a robots.txt file. I cannot think of one, mainly because people (or businesses) are paying money to put up their web sites, so they should be able to tell the Googles/Yahoos/other SEs of the world that they don't want to be on their indices.
To put this discussion in context, I'd like to create a price comparison website and one of the major vendors has a robots.txt that basically prevents anyone from grabbing their prices. I'd like to be able to get their information but, as I said, I can't justify simply ignoring the wishes of the site owner.
I have seen some very sharp discussion here and that's why I would like to hear the opinions of developers that follow Stack Overflow.
By the way, there is some discussion of this topic on a Hacker News question but they seem to mainly focus on the legal aspects of this.
Arguments:
A robots.txt file effectively withdraws any implied license to crawl, especially since you are aware of it. Thus, continuing to scrape their site could be seen as unauthorized access (i.e., hacking). Sucks, but arguments like this have been made in other legal cases recently (not directly related to robots.txt, but in relation to other "passive controls").
Grabbing prices violates no copyright law, including the DMCA, since copyright does not cover factual information, only creative expression.
Ethically, you should not grab prices because the vendor should have the ability to change prices without worrying about being accused of a bait/switch by people coming from your site.
Have you taken the high road, explaining the site to them and saying you'd love to include them in your list of vendors? Maybe they will love the idea and actually expose the data in a way that is easy for you to consume and less resource-intensive for them to produce.
There are no laws written directly about robots.txt because netiquette is generally followed. Don't be one of the "bad guys."
Some people filter robots because they use URL links to perform "actions" like adding things to carts, and robots leave them with massive numbers of abandoned shopping carts in their database.
Some people filter robots because they have exclusive prices that they can't advertise openly based on agreements with their vendors. You could be putting them in a bad position by exposing those prices on your site.
In this economy, if a company doesn't want to do everything possible to advertise themselves, it's their own fault that you don't include them.
The other use of robots.txt is to help protect web spiders from themselves. It's relatively easy for a web spider to get mired in an infinitely deep forest of links, and a properly constructed robots.txt file will tell the spider that "you don't need to go here".
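For illustration, a hypothetical robots.txt that fences off exactly that kind of infinite link space might look like this (the paths are made up):

    User-agent: *
    Disallow: /calendar/   # date navigation nests forever
    Disallow: /search      # every filter combination is a new URL
    Disallow: /cart/       # "add to cart" action links (see the abandoned-carts point above)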
Many people have tried to build businesses off building "price comparison" engines that scraped major sites.
Once you start getting any sort of traffic/revenue to speak of, you will receive a cease and desist. It's happened to dozens, if not hundreds of projects. I even worked on a small project that received a C&D from Craigslist.
You know how they say "It's easier to ask forgiveness than it is to get permission"? It doesn't hold true with page scraping. Get permission, or you will be hearing from their lawyers.
If you're lucky, it'll be early on, when you've got nothing to lose. If it's late, you may lose your business and all your work overnight, with a single letter.
Getting permission shouldn't be hard. Unless you're doing something sneaky, you're likely going to drive them additional traffic. Hell, once your product takes off, sites may be begging you, or even paying you to add their data.
One reason we allow robots to dig through the web without complaint is that we have a way to stop them if we want to. Protects both sides.
Remember the uproar when Cuil's robots were accused of going over-the-top, apparently acting like a DoS attack in some cases and using up bandwidth allowances of some small sites?
If too many people violate robots.txt we might get something worse.
"No" means "no".
To answer the narrow question: for the price comparison website, you're probably best off grabbing the price in real time, rather than scraping the database in advance. Hard to imagine that being a problem.
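As a rough sketch of what an on-demand lookup could look like (it also checks robots.txt first, since that's the crux of this thread; the bot name and URL are made up):

    from urllib.parse import urlsplit, urlunsplit
    from urllib.request import Request, urlopen
    from urllib.robotparser import RobotFileParser

    USER_AGENT = "PriceCompareBot/1.0"  # hypothetical crawler name

    def fetch_if_allowed(url):
        """Fetch one product page on demand, but only if the site's robots.txt permits it."""
        parts = urlsplit(url)
        robots_url = urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

        rp = RobotFileParser()
        rp.set_url(robots_url)
        rp.read()  # download and parse the vendor's robots.txt

        if not rp.can_fetch(USER_AGENT, url):
            return None  # the vendor has asked bots to stay out of this path

        req = Request(url, headers={"User-Agent": USER_AGENT})
        with urlopen(req, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")

    html = fetch_if_allowed("https://vendor.example/product/12345")  # placeholder URL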
An interesting IRL version of this story, involving The Harvard Coop:
Coop Calls Cops On ISBN Copiers.
Short answer: No.
On the narrow issue: If a seller says that their prices are secret, I think you have to respect that. I'd contact them and ask if they really don't want price comparison engines like yours to include them, or if the "no trespassing" sign is for technical reasons. If the latter, perhaps they'll provide you with an alternative. If the former, then I'd say too bad, they don't get included, they lose some business, and it's their problem.
Tangential rant: Personally, I get pretty annoyed with companies that make me jump through hoops to find out the price of their products, places that make me call and talk to a salesman so he can give me a hard-sell pitch, or worse, make me give them my phone number so their salesman can call and harass me. I figure that if they're afraid to tell me the price, it probably means that it's too high.
In general: A robots.txt file is like a "No Trespassing" sign. It's the owner's right to say who is allowed on their property. If you think their reasons are dumb, you can politely suggest they take the sign down. But you don't have the right to disregard their wishes. If someone puts a No Trespassing sign on his yard, and I say, "Hey, I just want to take a quick short cut, what's the big deal?" -- Maybe I'm stepping on his prized Bulgarian violet bulbs and destroying a valuable investment. Maybe I'm crossing his people's sacred burial ground and offending their religious sensibilities. Or maybe he's just an ornery jerk. But it's still his property and his right. Oh, and if I fall into the dangerous sinkhole after ignoring the No Trespassing sign, who's to blame? (In America, I could probably still sue him for all he's worth despite the fact that he warned me, but is that right?)
I'm showing some ignorance here, but I always thought a bot was something only sent out by a search engine. Like Google or Yahoo.
Thus, if you wrote an application that searched content on the internet, I wouldn't consider that a search engine bot, which to my knowledge is what robots.txt is trying to block.
But this may just be selective ignorance, because I might do it until the webmaster of that site contacted me and asked me to stop :)
If people make it available to public access, they shouldn't try to put limits on it. Adding a robots.txt file to your site is the equivalent to putting a sign on your lawn that says "Please don't look at me."
A client of ours is a membership organization and they are looking for functionality that seems closely aligned with Google Sites capabilities.
They want a system where their members can have a content managed site of their own that one or more admins can create by submitting a simple form.
The member organization could then add/remove pages, add/edit/remove content, add their own users, modify their color scheme and layout.
They would like the ability to have a URL structure like "member_org_url_to_be_named/member_name", but it could also be subdomains (e.g. "member_name.member_org_url_to_be_named").
So they need a security hierarchy to be able to have different levels of users:
Admin - can add/edit/remove sites, users, etc.
Member Admin - can add/edit content within their site, add users that are also able to add/edit content within their site.
Member user - can add/edit content within their site.
From what I've seen and read, Google Sites seems to be able to handle this functionality. It's a little difficult to get in touch with someone there who would be able to tell me this definitively, however. So I'm wondering if there are any other platforms that might be able to handle this workflow.
Obviously, I'd love to hear from anyone who has implemented a system like this before. I'd also love to hear from anyone who has actually used Google Sites.
(Disclaimer: I work for Google. I don't know much about Sites though.)
Have you actually tried to use Google Sites for this? It strikes me that it shouldn't take very long to give it a whirl. If you have any Sites-specific questions, the Google Sites help centre and user forum are probably good starting places.
This sounds like content management with roles. Drupal fits this purpose pretty much perfectly.
http://drupal.org/
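For what it's worth, the hierarchy described in the question boils down to a small role-to-permission mapping, whatever platform you pick. Here's a rough, platform-agnostic sketch (the role and permission names are purely illustrative):

    # Illustrative only: roles and permissions mirror the list in the question.
    PERMISSIONS = {
        "admin":        {"create_site", "remove_site", "manage_all_users", "edit_any_content"},
        "member_admin": {"edit_own_site_content", "manage_own_site_users"},
        "member_user":  {"edit_own_site_content"},
    }

    def can(role, permission):
        """Return True if the given role carries the given permission."""
        return permission in PERMISSIONS.get(role, set())

    assert can("member_admin", "manage_own_site_users")
    assert not can("member_user", "manage_own_site_users")

Drupal's roles-and-permissions system maps onto this kind of structure directly.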
I've used Google Sites (the free "standard edition") a very little bit; it was easy to set up, and it was easy to reconfigure my DNS records via nearlyfreespeech.net to set up CNAME and MX records for a domain I own.
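The DNS side was just a couple of records. In zone-file terms it looked roughly like the fragment below; the exact target host names come from Google's setup instructions, so treat these values as illustrative:

    ; illustrative zone-file fragment -- use the targets Google's setup wizard gives you
    www   IN  CNAME  ghs.google.com.
    @     IN  MX 1   aspmx.l.google.com.
    @     IN  MX 5   alt1.aspmx.l.google.com.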
The mailing list stuff works nicely. The site editing is very easy for anyone to use, but a bit slllllooooowwww and somewhat clumsy, and it doesn't appear to "play nicely" with the concept of uploading/downloading via FTP/SFTP/etc. I don't like the idea of my group's users spending all this time developing a website that I can't back up or transfer to someone other than Google if I run into an issue.
I don't know if these issues are addressed in the pay version of Google Sites. For the moment I'm definitely keeping the email-mailing-list features going, but looking around elsewhere for something similar that works better.
(If you find something please post!)