How to refuse wget?

I am uploading images to a public directory and I would like to prevent users from downloading the whole lot using wget. Is there a way to do this?
As far as I can see, there must be. I have found a number of sites where, as a public browser, I can download a single image, but as soon as I run wget against them I get a 403 (Forbidden). I have tried using the no-robot argument, but I'm still not able to download them. (I won't name the sites here, for security reasons).

You can restrict access based on the User-Agent string; see Apache 2.4's mod_authz_core for an example.
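A minimal sketch of such a rule, assuming Apache 2.4 and a hypothetical image directory (note that wget can trivially send a different User-Agent, so treat this as a speed bump only):

<Directory "/var/www/html/images">
    Require expr "%{HTTP_USER_AGENT} !~ /wget/i"
</Directory>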
Wget also respects robots.txt directives by default, which should deter any casual user.
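For example, a robots.txt along these lines (the /images/ path is only an assumed example) asks well-behaved crawlers, wget included, to stay out of the directory:

User-agent: *
Disallow: /images/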
However, a careful read of the wget manual shows how to bypass both restrictions. Wget can also add random delays between requests, so even more advanced detection based on access-pattern analysis can be defeated.
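For instance, something along these lines (the flags are documented in the wget manual; the URL is a placeholder) ignores robots.txt, spoofs a browser User-Agent, and randomizes the delay between requests:

wget -r -np -e robots=off --wait=2 --random-wait --user-agent="Mozilla/5.0" http://example.com/images/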
So the proper way is to interfere with wget's link/reference recognition: load the content you want to keep unmirrored dynamically with JavaScript, and encode the URLs so that JavaScript code is required to decode them. This would protect your content, but it would require you to manually provide an unobfuscated version for the web bots you do want indexing your site, such as Googlebot (and no, it is not the only one you should care about). Also, some people do not run JavaScript by default (esoteric browsers, low-end machines, and mobile devices may call for such a policy).
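A minimal sketch of that idea (the data-attribute name and the Base64 encoding are only illustrative; any encoding a non-JS crawler won't follow will do):

<script>
// Markup carries the encoded URL, e.g. <img data-src-b64="<base64 of the real URL>" alt="">,
// so the real image URLs never appear as plain links in the HTML source.
document.querySelectorAll('img[data-src-b64]').forEach(function (img) {
    img.src = atob(img.getAttribute('data-src-b64'));
});
</script>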

Related

Can Squarespace connect to an external JSON REST API?

I am new to Squarespace and I was wondering whether it can interact with an external REST API using JSON?
For example, say I have a database hosted privately and I want data from it to be shown via Squarespace, with certain pages restricted according to the user's privileges.
Is any of the above possible, and if so, can you direct me to an example? I can't seem to find anything on this via Google.
Thanks
From Squarespace:
Squarespace doesn’t support server-side code, including PHP, Ruby, Ruby on Rails, and SQL.
Therefore, the only way to connect to an external API (besides those supported by Squarespace's official 'extensions') is to use "client-side" (in-browser) JavaScript.
So, the database solution you use must be capable of securely handling client-side connections (for example, Firebase can do that). To interface with it, you must add the JavaScript to your Squarespace site via a code block or code injection. An example explanation of doing that can be found at this question.
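A minimal sketch of that client-side approach, assuming a hypothetical JSON endpoint (https://api.example.com/records) that allows CORS requests from your Squarespace domain, and a made-up #record-count element on the page; you would paste this into a code block or into code injection:

<script>
  // Runs in the visitor's browser: fetch JSON from the external API
  // and render part of it into the page.
  fetch('https://api.example.com/records')
    .then(function (response) { return response.json(); })
    .then(function (data) {
      document.querySelector('#record-count').textContent = data.count;
    })
    .catch(function (err) { console.error('API request failed', err); });
</script>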
As to allowing/disallowing content based on data returned from the database, it can be done, but only client-side. That means that, while you can make the site appear to restrict access and make it inconvenient to reach certain pages based on information in the database, because it all happens client-side it could be circumvented by anyone familiar with web development, the browser's web inspector, etc. So it's not something you'd want to rely on if it is critical that the content be truly restricted.
Squarespace does have its own "Members Areas" which can be used to solve content access problems. However, it's extremely limited at the moment, and there are many scenarios it does not address.

A/B Test a Page Step in a Single Page without a new URL

I am trying to figure out how to run an A/B Test for a change on a Page Step for a Single Page. The idea is we have a payment flow with several page steps each containing a form. We'd like to swap out forms and test how our users react. We are trying to avoid changing the URL.
I looked into tools such as Google Analytics, but that requires a different URL to run the A/B test. We hesitate to create new URLs because our users are known to bookmark them, we don't want to maintain a backlog of redirects from invalid URLs, and we'd like to avoid constantly deploying new URLs for our tests.
I cannot seem to find any tool to do this, so I've tried to think of a few solutions but I'm not having a lot of luck.
My best idea is to build both the A and B forms into the page, and when a user enters the flow, the session randomly (based on a preset percentage) stores a value that dictates whether the user is in test A or B. Then when they reach that step, the server serves the proper form to them. If they abandon their session, we'd track that, and if they complete the action, we'd track that too.
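Roughly, I imagine the bucketing step on our .NET side looking something like this (the class, session key, and 50% split are just placeholders):

using System;
using System.Web;

public static class AbTesting
{
    private static readonly Random Rng = new Random();

    // Pick variant "A" or "B" once per session and reuse it for every
    // step of the payment flow, so the user always sees the same form.
    public static string GetPaymentFormVariant(HttpSessionStateBase session)
    {
        var variant = session["paymentFormVariant"] as string;
        if (variant == null)
        {
            variant = Rng.Next(100) < 50 ? "A" : "B";   // 50 = preset percentage for A
            session["paymentFormVariant"] = variant;
        }
        return variant;
    }
}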
I feel like there should be a better solution, but I just cannot come up with one.
My search results online were mostly blogs showing how to approach it from a high level, and all of them used different URLs; I have found almost no developer resources.
Thanks.
We're using ExtJS 4.2.2, and .NET as our server.
Whenever you need the server to be involved, you need server-side instrumentation. No free tools offer that, but you could consider Optimizely "full-stack" (which has C# support) or Variant (which does not yet).

One robots.txt to allow crawling of only the live website; the rest should be disallowed

I need some guidance on using robots.txt; my problem is as follows.
I have one live website "www.faisal.com" or "faisal.com" and have two testing web servers as follows
"faisal.jupiter.com" and "faisal.dev.com"
I want one robots.txt to handle all of this. I don't want crawlers to index pages from "faisal.jupiter.com" or "faisal.dev.com"; they should only be allowed to index pages from "www.faisal.com" or "faisal.com".
I want one robots.txt file which will be on all web servers and should allow indexing of only the live website.
The Disallow directive specifies only relative URLs, so I guess you cannot have the same robots.txt file for all of them.
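For example (a sketch; adapt the paths to your site), the dev/test servers would each need a blocking file while the live site gets a permissive one:

# robots.txt served on faisal.jupiter.com and faisal.dev.com
User-agent: *
Disallow: /

# robots.txt served on www.faisal.com / faisal.com
User-agent: *
Disallow: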
Why not force HTTP authentication on the dev/test servers?
That way the robots won't be able to crawl those servers.
It seems like a good idea if you want to allow specific people to check them, but not everybody trying to find flaws in your not-yet-debugged new version...
Especially now that you have given the addresses to everybody on the web.
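A minimal .htaccess sketch for that, assuming Apache (the password-file path is just an example):

# Create the password file first, e.g.: htpasswd -c /etc/apache2/.htpasswd someuser
AuthType Basic
AuthName "Development server"
AuthUserFile /etc/apache2/.htpasswd
Require valid-user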
Depending on who needs to access the dev and test servers, and from where, you could use .htaccess or iptables to restrict access at the IP address level.
Or, you could separate your robots.txt file from the web application itself, so that you can control its contents per environment.
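For example, with Apache and mod_rewrite you could keep the permissive robots.txt plus a blocking file, and serve the blocking one on every hostname except the live site (a sketch in .htaccess style; robots-disallow.txt is a made-up name):

RewriteEngine On
RewriteCond %{HTTP_HOST} !^(www\.)?faisal\.com$ [NC]
RewriteRule ^robots\.txt$ /robots-disallow.txt [L]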

What language combination should I use to permanently modify a webpage?

I'm trying to make a page with 2 fields (email and feedback) and 1 button. When the user clicks the button, a table on a page elsewhere is filled in with the data, permanently.
Does anyone have recommendations on how I should do this? I'd like to avoid having a script send me an email or writing to a database, but if I have to, whichever is easier to configure would be preferred.
Thanks,
Matt
So you want a comments system like you find on most blogs? You'll need to store those comments somewhere, probably in a database. As for how to do it, that depends entirely on what you already know and what the site is currently written in. You could use PHP and MySQL if you already have those skills, or ASP.NET/SQL Server, or if you want to be down with the cool kids you can use Ruby on Rails or Python/Django.
If you post what languages you already have experience in, and/or what the site is written in you might get a more specific answer :-)
There are two types of scripts: server-side and client-side. A client-side script (JavaScript) stores info only for the particular visitor, on their own computer, and it can't be seen by anybody else.
You need a server-side script to save feedback on the server. The language or technology depends on the hosting you use; not all hosting services allow server-side scripts. You first need to find out which scripting languages and technologies your hosting provider supports. Then we can help you more.
ADD:
For an inexperienced person, I recommend looking for a hosted service that already has most of the needed functionality, something like a blog platform. On such services you can create pages that have comments, feedback, and much more.
While it may seem outdated, it's not necessarily a bad design. You can use PHP or Perl (thanks to its string-parsing capabilities) and simply store the main page on disk.
Here's your pseudo code/design...
You'll need an HTML page that looks as follows:
<tr><td>email 1</td><td>comment 1</td></tr>
<tr><td>email 2</td><td>comment 2</td></tr>
<tr><td>email 3</td><td>comment 3</td></tr>
Then you'll need a PHP script page that reads this HTML file in and displays it.
The PHP page will also contain code for handling a user-submitted comment. When a user posts a comment, you open the HTML file containing the rows and append to it.
You need to be careful with this design, however, because you may run into concurrency issues if two people attempt to write to the file at the same time. Add code to handle this gracefully.
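A rough sketch of the append step in PHP, using an exclusive lock so two simultaneous submissions don't corrupt the file (the file and field names are placeholders, and real code should validate the input more carefully):

<?php
// comments.html holds only the <tr> rows shown above.
$email   = htmlspecialchars($_POST['email'] ?? '');
$comment = htmlspecialchars($_POST['comment'] ?? '');

$fh = fopen('comments.html', 'a');
if ($fh !== false) {
    if (flock($fh, LOCK_EX)) {   // wait for an exclusive write lock
        fwrite($fh, "<tr><td>$email</td><td>$comment</td></tr>\n");
        flock($fh, LOCK_UN);
    }
    fclose($fh);
}
?>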

Using web hit counter to track application usage, recommendations wanted

I have an internal tool written in Java. It would be useful to get a little feedback on how much it is used by colleagues.
A simple solution would be to have the application display an image which it fetches from a web-hit-counter-like application, and then just look at how often the image is accessed.
So what I am looking for is a stand-alone application (i.e. no Apache modules, CGI scripts, etc.) which serves one or a couple of static images and can log accesses, preferably with as little support for anything else as possible.
Searching for "hit counter" turned up little of relevance; "lightweight http server" was more relevant, although mostly still overkill. Any suggestions?
You could try using Google Analytics. Most of the time, people using Google Analytics are tracking pageviews on a web page; Google provides some JavaScript that you place on your page, and it tracks visits to that page as well as browser capabilities, etc. Behind the scenes, that JavaScript requests an image in the manner you describe.
However, since your application is Java and not a web app (I assume it's standalone and not an applet), you won't be able to include Google's JavaScript (unless you embed a JavaScript interpreter... yick). Fortunately, it is possible to use Google Analytics without JavaScript.
The trick is that Google's scripts request the image http://www.google-analytics.com/__utm.gif and pass parameters via the query string. You can find a list of the parameters you can pass in the query string here. So all you'd have to do is figure out what the query string should be and have your client make the request to Google's image (after setting up your Google Analytics account, of course).
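For illustration only, a request might look roughly like this (the parameter names come from the legacy __utm.gif protocol and should be double-checked against that parameter list; the account ID and hostname are placeholders):

http://www.google-analytics.com/__utm.gif?utmwv=4.3&utmn=123456789&utmhn=mytool.internal&utmp=%2Fapp%2Fstartup&utmac=UA-XXXXXXX-1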
Just use Google Analytics; it's really easy and only requires a short script on your pages.
Michal Kebrt's simple UNIX HTTP server does exactly what I was looking for.