What's the purpose of the '--delete-after' option of wget?

I came across the "--delete-after" option while reading the manpage of wget.
What's the purpose of providing such an option? Is it just for testing that a page downloads OK? Or are there other situations where this option is useful? I hope you can give me some hints.

With reference to your comments above, here are some examples of how we use it. We have a few websites running on Rackspace Cloud Sites, which is a managed cloud hosting solution, so we don't have access to regular cron.
We had an issue with runaway usage on a WordPress site because WP kept calling wp-cron.php. To give you a sense of the runaway usage: in one day it used up the CPU cycles allotted for a month. What I did was disable wp-cron.php from being called within WordPress and call it manually through wget instead. I'm not interested in the output from the process, so if I didn't use --delete-after with wget (wget ... > /dev/null 2>&1 works well too), the folder where wget runs would fill up with hundreds of useless logs and copies of the output from each time the script was called.
We also have SugarCRM installed, and that system requires its cron script to be called to handle system maintenance. We use wget silently for that as well. A lot of these web-based systems have cron scripts; if you can't call your scripts directly, say using php on the machine, then the other option is calling them silently with wget.
The command to call these cron scripts is quite basic - wget --delete-after http://example.com/cron.php?parameters=if+needed
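A fully quiet variant might look like the line below (-q suppresses wget's own logging; the URL is the same placeholder as above):
wget -q --delete-after "http://example.com/cron.php?parameters=if+needed"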

I'm using wget (with cron) to automate commands to a web application, so I have no interest in the contents of the pages. --delete-after is ideal for this.

You can use it to test that a page downloads OK, but usually it's used to force proxy servers to cache their contents.
If you're sitting on a connection where a network appliance caches content between the site and your endpoint, and you have a site that's popular among users on that network, then what you may want to do as a sysadmin is use a machine just behind the proxy to script a recursive ("-r") or mirror ("-m") wget run.
The proxy appliance will see this and pre-cache the site and its assets, making accesses to the site a bit faster for users behind said proxy.
You'd then want to specify "--delete-after" to free up the disk space used, unless you want to keep a local copy of every site you force into the cache.
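A cache-warming run could then be scripted roughly like this (the URL and recursion depth are placeholders, and the request has to actually pass through the caching appliance for this to do any good):
wget -r -l 2 --delete-after http://popular-site.example.com/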

Sometimes you only need to visit a website to set an IP address - say if you are rolling your own dynamic DNS service.
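A sketch of that kind of fire-and-forget request (the update URL, hostname and token are made up for illustration):
wget -q --delete-after "https://dyndns.example.com/update?hostname=home.example.com&token=SECRET"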


Three ways to let PHP and a regular user edit the same files

I am a web developer, and for some upcoming projects I would like to use a file-based CMS. This means that many of the files I create at the start must be editable by the PHP user later, but also remain editable for my user (and also the other way around). My PC runs Debian 9, which I love but am not super knowledgeable about, and I have also just set up a local network server with Debian 9 for backups and possibly file sharing. (I'm using Webmin to configure this, which reflects my level of command line skills).
On my online shared hosting server, the PHP user and the FTP user seem to be the same, and 644/755 permissions work fine; this is also what the CMS I'm using recommends. I would like to mimic this on my computer so I don't have to fiddle with permissions all the time. But how do I do this? Currently, my regular user (anna) does not have access to www-data's files and vice versa. Putting them in the same group still means changing file permissions. Making anna the PHP user is a Bad Idea (as far as I understand it) because anna has sudo permissions.
So far I have researched three possible solutions that I don't really know very much about, and I would like to know which is the best route to take.
Develop locally on my computer and use apache-mpm-itk or suPHP to let PHP edit the files (I got that idea from this question on ServerFault).
Develop locally on my computer and rsync the files to my server with grunt-rsync, and somehow get rsync to set the ownership to www-data (another ServerFault thread helping here).
Mount the project's server directory, which is owned by www-data, on my computer with SSHFS and then either edit the files on the server directly or copy them over from my local directory with grunt-copy.
What do you think: from a security and ease of use perspective, which is the best way? Or do you know an even better one?
Thank you for taking the time to read and think about this!
Anna~
I figured it out! I finally ended up reading about running PHP as CGI instead of as an Apache module, and learned that this would solve my permissions problem. Plus, as far as I understand it, there are no extra security precautions to take when I'm the only one working with it on my local computer.
In case someone comes across this who might find it helpful, here's what I did (basically following these instructions):
I installed php7.0-fpm
Edited /etc/apache2/sites-enabled/000-default.conf and put the following just before </VirtualHost>:
DirectoryIndex index.php
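# hand every request for a *.php file to PHP-FPM listening on 127.0.0.1:9000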
<LocationMatch "^(.*\.php)$">
ProxyPass fcgi://127.0.0.1:9000/var/www/html
</LocationMatch>
I activated the Apache module proxy_fcgi (via Webmin, which apparently does an automatic Apache restart)
In /etc/php/7.0/fpm/pool.d/www.conf I commented out the existing listen line and added another below it, like this:
; listen = /run/php/php7.0-fpm.sock
listen = 127.0.0.1:9000
I then restarted PHP-FPM with this command: /etc/init.d/php7.0-fpm restart (slightly different from the instructions, since I'm on Debian 9). After that, phpinfo() reported the Server API as "FPM/FastCGI".
And finally, I changed the user and group from www-data to anna in three places: twice in /etc/php/7.0/fpm/pool.d/www.conf and once more in /usr/lib/tmpfiles.d/php7.0-fpm.conf (this last bit may be Ubuntu/Debian specific; my thanks go to Keith for a comment on StackExchange).
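For reference, the edited lines might look roughly like this (a sketch assuming the desktop user is anna; the exact keys and the tmpfiles.d line can differ between PHP and Debian versions):
; in /etc/php/7.0/fpm/pool.d/www.conf
user = anna
group = anna
# in /usr/lib/tmpfiles.d/php7.0-fpm.conf (runtime directory used by FPM)
d /run/php/ 0755 anna anna -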
And that was it! :-)

Google Cloud Storage - GSUtil Update fails because file used by another process

We use an ETL process to pull data from Google Cloud Storage, but annoyingly it hangs every time Google releases updates to GSUtil, because it sits at a prompt asking if you want to update the library. That's fine if you are doing this manually, but not when it's being run in an automated SSIS package: jobs don't finish for days and you keep wasting time on the same stupid cause.
I thought I was going to be clever and add "python gsutil update -n" to the top of the bash script whose build and execution I'm automating in my SSIS package, in the hope of curbing this problem. But when I run this command from the prompt on either Windows Server 2008r2 or Windows 7, I get the following:
C:\gsutil>python gsutil update -f -n
Copying gs://pub/gsutil.tar.gz...
OSError: The process cannot access the file because it is being used by another process.
Any help?
P.S. - Also, Google engineers... can you PLEASE remove these prompts for all of us using these tools in automated processes? I have other things to work on instead of constantly coming back to issues like this every few days/weeks.
What version of gsutil are you running?
Also, to be clear: Are you talking about the fact that gsutil checks for available software updates periodically, and if it finds them it then prompts you whether you want to update? Or are you talking about the fact that the gsutil update command asks if you want to perform the update?
If the former, gsutil shouldn't be performing this check/prompting if you are running gsutil from a script not connected to a TTY. If that's not working correctly, we'd like to know.
And also, if that's the problem you're having, you can completely disable automated software update checks by setting software_update_check_period=0 in the [GSUtil] section of your .boto config file.
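For example, the relevant part of the .boto file would look something like this:
[GSUtil]
software_update_check_period = 0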

How to replay traffic to web server from logs to profile / benchmark web app under real load?

Is there a way to take recorded real network traffic to a web server, e.g. from web server logs (Apache), and replay this traffic to either profile the web application (in Perl) under real load, or to benchmark and compare the speed of different implementations before choosing one or the other?
If it matters, the webapp is written in Perl and runs under plain CGI, FastCGI, mod_perl (via ModPerl::Registry), or PSGI (via Plack::App::WrapCGI).
Crossposted to Pro Webmasters
Similar questions on Server Fault:
How can I replay Apache access logs back at my servers to do real world load testing?
A quick scan on Google for this yielded an interesting blog entry, with useful follow-up comments, at http://www.igvita.com/2008/09/30/load-testing-with-log-replay/. A commenter also mentioned Tsung by Process-One, which allows recording sessions in real time, with the obvious note that you should be able to replay them back. That doesn't help much with existing Apache access logs, though.
Been here lately. I figured that if I dumped TCP traffic with tcpdump, I could rewrite the destination of the packets and then replay it against the new app servers. So I started out with something like this:
tcpdump -i eth1 -s 0 -w - dst <source_ip> and port 80 | \
tcprewrite --mtu-trunc --infile=- --outfile=- \
--dstipmap=<source_ip>:<destination_ip> | \
tcpslice -w - - | tcpreplay --intf1=eth1 -
It did not work for various reasons, so I started digging some more and found Gor: a small Go project by Leonid Bugaev from Granify, written for exactly what we wanted to accomplish here.
This is how we ended up using Gor: http://devblog.springest.com/testing-big-infrastructure-changes-at-springest/
We have a Chef cookbook for it as well: https://github.com/Springest/gor-chef
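To give a flavor of a basic Gor run (the flag names here follow the later unified gor CLI, which may differ from the version used in the blog post; hostnames are placeholders), this listens to live traffic on port 80 and forwards a copy of it to a staging host:
gor --input-raw :80 --output-http "http://staging.example.com"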
Hope this helps.
The short answer was given on the other side.
The longer answer is that you can't: you will be missing request headers and POST bodies.
Here's a simple perl way to record real http traffic and play it back:
http://patrick.net/sprocket/rwt.html
If only GET requests are needed and there is no session-tracking implemented via query parameters, then this is possible.
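As a rough sketch of that GET-only approach (assuming the standard Apache common/combined log format and a throwaway test host; this is not from the original answer):
awk '$6 == "\"GET" {print $7}' access.log | while read -r path; do
  wget -q -O /dev/null "http://test.example.com${path}"
done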
One question: do you want to do it this way because (1) you want to emulate real-world distribution of traffic among your pages or (2) there are too many pages to even consider building any sort of test scripts?

How to configure MAMP to serve perl CGI scripts (NOT localhost!)

I'm using MAMP-pro to serve my domain to the outside world.
I'm not a very experienced sysadmin, though I've slogged my way through a few basic things. I know what Apache is, and I can read most of the related .conf files, but I can't write them from scratch without a guide.
I've got a perl script which I've tested from the command line and it works (outputs as desired.)
When I try to access said script from the browser, I get 404.
I've tried placing the script at:
/Users/me/Sites/mydomain.com/htdocs/mycgi.pl
/Users/me/Sites/mydomain.com/cgi-bin/mycgi.pl
/Users/me/Sites/mydomain.com/htdocs/cgi-bin/mycgi.pl
and accessing it as:
http://www.mydomain.com/mycgi.pl
http://www.mydomain.com/cgi-bin/mycgi.pl
and all the various combinations, all to no avail (404.)
The script and its containing directory have permissions 755.
So, what other steps am I missing? Are there any good set-up guides? I tried the MAMP-Pro manual, but it is filled with such information as "the cancel button cancels the current operation" and not much that is actually useful. Google turned up several hits that all seem to be about making this work on localhost, but I'm trying to serve this to the outside world.
Any hints?
Thanks!
The official online documentation has a section on virtual hosts. When creating a host for www.mydomain.com, you can choose the DocumentRoot, which is called "Disk location" within MAMP PRO. If you still get a 404 error, take a look at the error_log for a more specific reason (i.e., where Apache is trying to find the file in question).
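Independent of the vhost settings, Apache also has to be told to execute .pl files as CGI in the directory holding the script. A minimal sketch of those directives (the path is taken from the question; MAMP PRO may already set an equivalent through its GUI):
<Directory "/Users/me/Sites/mydomain.com/htdocs/cgi-bin">
    Options +ExecCGI
    AddHandler cgi-script .pl
</Directory>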

Set a script to run on a schedule?

How can I set a PHP script to run on a schedule? I don't have full control over the server as I am using a hosting company, but I do have a Plesk administration panel for the hosting.
Thanks
I believe PLESK has a crontab area underneath each domain.
Alternatively, if you have shell access, here's a good tutorial on editing your crontab from the command-line.
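As a rough illustration, a crontab entry that runs a PHP script at the top of every hour might look like this (the interpreter and script paths are placeholders):
0 * * * * /usr/bin/php /path/to/script.php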
Crontab. There's a video tutorial for Plesk here:
http://www.webhostingresourcekit.com/flash/plesk-8-linux/plesk8linux_crontab.html
What you're looking for is called a cron job: an automated task that can, for example, execute an HTTP request against your server.
Since you're on shared hosting, you can't set up a cron job directly yourself. However, many web hosts offer online tools for creating cron jobs through their control panel (cPanel, Plesk, etc.).
If that isn't an option, there are some paid, and some free, cron services you might be able to find if you poke around long enough.