How can I efficiently jump to a specific time in a large log?

I have huge daily text logs (2-3 GB each) which I want to search for a specific event whose time I know. I'm using less, since it's all over ssh to a remote server.
I'm looking for a way to jump to the exact time as quickly as possible, and I suspect a binary search would be the fastest approach if such an option exists (right now, jumping to the end of the day takes tens of seconds).
Thanks!

Based on this other question's answer:
sgrep might work for you:
sudo apt-get install sgrep
sgrep -l '"needle"' haystack.txt
The project page http://sgrep.sourceforge.net/ says:
Sgrep uses a binary search algorithm, which is very fast, but requires sorted input.
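Since a daily log is already sorted chronologically, the same invocation pattern shown above can be pointed at a timestamp prefix; the timestamp format and path here are only an illustration:
sgrep -l '"2019-06-01 14:30"' /var/log/app/2019-06-01.log
That should drop you near the entries for that minute without scanning the whole file.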

Is it possible to run rrdtool serverside?

I have searched but not found an answer.
Presently I'm running RRDTool on the same machine that collects the information, creates the rrd files, and produces the related graph output.
Is it also possible to run RRDTool on a separate server for graph output, using rrd files that are uploaded to it?
Yes; at least to some extent. You need to run rrdcached on your backend server; then, your collector and graphing servers can make remote calls to obtain or store the data.
How you tune rrdcached depends on the amount of data, the frequency of writes, and how much you can afford to lose in the event of a server crash; however, a 30-minute cache generally works well. This also greatly decreases the amount of disk IO required.
Note that some rrdtool functions do not work exactly the same via rrdcached; check the documentation for more details.
Read about rrdcached here: https://oss.oetiker.ch/rrdtool/doc/rrdcached.en.html
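A rough sketch of how the pieces might fit together; the hostname, directories, RRD file and DS names below are placeholders, and the 1800-second write timeout matches the 30-minute cache mentioned above:
# on the backend server: accept remote connections, flush dirty values every 30 minutes
rrdcached -l 0.0.0.0:42217 -w 1800 -z 900 -b /var/lib/rrdcached/db -j /var/lib/rrdcached/journal
# on the collector: send updates through the daemon instead of writing files directly
rrdtool update --daemon backend.example.com:42217 load.rrd N:0.42
# on the graphing host: read the data through the same daemon
rrdtool graph --daemon backend.example.com:42217 load.png DEF:l=load.rrd:load:AVERAGE LINE1:l#0000ff:load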

Will the depth of a file within a filesystem change the time taken to copy it?

I am trying to figure out whether or not the depth of a file in a filesystem will change the amount of time it takes to execute a cp command on that file.
By depth I mean how many parent directories it's contained in.
I tried running a few tests, but my results are pretty inconclusive, and when I try to reason it out I can think of arguments either way.
What is the purpose of this?
Provided nothing is cached, the deeper the directory tree the more data has to be read from storage to get to the file - you have to find the name of the second dir, then the third within the second and so on. On the other hand if the file is big, the time needed to do this can be negligible in comparison.
Also, the mere startup of a command like cp is not without cost.
If you are interested in how file systems work read this free book: http://www.nobius.org/~dbg/practical-file-system-design.pdf
Performance is a complicated subject, especially when physical storage is involved. Without a proper understanding of how this works, and of statistics, you can't perform a meaningful test.
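That said, a rough way to exercise just the path-lookup cost is something like the sketch below (paths and sizes are arbitrary; caches are dropped between runs so the directory lookups actually hit the disk):
deep=$(printf 'd/%.0s' $(seq 1 100))        # builds d/d/d/... 100 levels deep
mkdir -p shallow "$deep"
dd if=/dev/zero of=shallow/file bs=1M count=64
cp shallow/file "$deep/file"
sync; echo 3 | sudo tee /proc/sys/vm/drop_caches
time cp shallow/file /tmp/copy1
sync; echo 3 | sudo tee /proc/sys/vm/drop_caches
time cp "$deep/file" /tmp/copy2
Expect the difference to be small relative to the copy itself unless the file is tiny or the tree is extremely deep.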

WTMP (RHEL 5/6) log maintenance - need to keep a rolling log rather than rotate

We have a policy requirement to use tools that rely on wtmp, such as the 'last' command or GDM's last-login details. We've discovered that these will have gaps depending on when wtmp was last rotated, and we need to work around this.
Because these gaps have been deemed unacceptable, and keeping wtmp data in a single active logfile forever without splitting the old data off into archives is not really viable, I'm looking for a way to roll over / age out old wtmp entries while still keeping the more recent ones.
From some initial research I've seen this problem addressed in the Unix (AIX, SunOS) world with the use of 'fwtmp' and some pre/post logrotate scripts. Has this been addressed in the Linux world and I've just missed it?
So far as I can tell, 'fwtmp' is a Unix utility that hasn't made it into RHEL 5 & 6, judging by searches of the RHEL customer portal and some 'yum whatprovides' runs on my test boxes.
Many thanks in advance!
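For what it's worth, one possible Linux analogue of the fwtmp approach, sketched purely as an assumption rather than a tested recipe, is utmpdump, which can dump wtmp to text and rebuild a binary wtmp from edited text:
utmpdump /var/log/wtmp > /tmp/wtmp.txt
# filter /tmp/wtmp.txt here to drop entries older than your cutoff; the timestamp
# is the last bracketed field, though its exact format varies between versions
utmpdump -r < /tmp/wtmp.txt > /var/log/wtmp
Run it from a logrotate prerotate hook, or with logins quiesced, so no entries are written while the file is being replaced.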

How to quickly get directory (and contents) size in cygwin perl

I have a Perl script that monitors usage of several Windows network shares. It currently monitors the free space of several network drives using the cygwin df command. I'd like to add the individual drive usages as well. When I use the
du -s | grep total
command, it takes forever. I need to look at the shared drive usages because several network drives are shared from a single physical drive on the server; thus, filling one network drive fills them all (yes I know, not the best solution, not my choice).
So, I'd like to know if there is a quicker way to get the folder usage that doesn't take forever.
du -s works by recursively querying the size of every directory and file. If your filesystem implementation doesn't store this total value somewhere, this is the only way to determine disk usage. Therefore, you should investigate which filesystem and drivers you are using, and see if there is a way to directly query for this data. Otherwise, you're probably SOL and will have to suck up the time it takes to run du.
1) The problem possibly lies in the fact that they are network drives - local du is acceptably fast in most cases. Are you doing du on the exact server where the disk is housed? If not, try to approach the problem from a different angle - run an agent on every server hosting the drives which calculates the local du summaries and then report the totals to a central process (either IPC or heck, by writing a report into a file on that same share filesystem).
2) If one of the drives is taking a significantly larger share of space (on average) than the rest of them, you can optimize by doing du on all but the "biggest" one and then calculating the biggest one by subtracting the sum of the others from the df result (see the sketch after this list)
3) Also, to be perfectly honest, it sounds like a suboptimal solution from design standpoint - while you indicated that it's not your choice, I'd strongly recommend that you post a question on how you can improve the design within the parameters you were given (to ServerFault website, not SO)
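Here is a sketch of the subtraction idea from 2), with hypothetical cygwin paths and the assumption that all of the shares live on one physical volume:
# used space on the whole volume, in 1K blocks (POSIX output format)
used_total=$(df -P /cygdrive/z | awk 'NR==2 {print $3}')
# du over every share except the biggest one
used_small=$(du -sk /cygdrive/z/projects /cygdrive/z/archive | awk '{sum += $1} END {print sum}')
echo "approx. usage of the big share: $((used_total - used_small)) KB"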

How to get a command line process to use less processing power

I am wondering how to get a process run at the command line to use less processing power. The problem I'm having is that the process is basically taking over the CPU and taking MySQL and the rest of the server with it. Everything is becoming very slow.
I have used nice before but haven't had much luck with it. If it is the answer, how would you use it?
I have also thought of putting in sleep commands, but it'll still be using up memory so it's not the best option.
Is there another solution?
It doesn't matter to me how long it runs for, within reason.
If it makes a difference, the script is a PHP script, but I'm running it at the command line as it already takes 30+ minutes to run.
Edit: the process is a migration script, so I really don't want to spend too much time optimizing it, as it only needs to be run for testing purposes and once to go live. Just for testing, it keeps bringing the server to pretty much a halt... and it's a shared server.
The best you can really do without modifying the program is to change the nice value to the maximum using nice or renice. Your best bet is probably to profile the program to find out where it is spending most of its time or using most of its memory, and try to find a more efficient algorithm for what you are trying to do. For example, if you are operating on a large result set from MySQL, you may want to process records one at a time instead of loading the entire result set into memory, or perhaps you can optimize your queries or the processing being performed on the results.
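If the script is already running, renice can be applied to it in place (the PID here is just a placeholder):
renice -n 19 -p 12345    # 12345 = PID of the running PHP process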
You should use nice with a "niceness" of 19; this makes the process very unlikely to run if there are other processes waiting for the CPU.
nice -n 19 <command>
Be sure that the program does not have busy waits and also check the I/O wait time.
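A quick way to check whether the machine is CPU-bound or waiting on disk while the script runs is to watch the id (idle) and wa (I/O wait) columns:
vmstat 1 10    # ten one-second samples; a high "wa" means the bottleneck is I/O, not CPU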
Which process is actually taking up the CPU? PHP or MySQL? If it's MySQL, 'nice' won't help at all (since the server is not 'nice'd up).
If it's MySQL in general you have to look at your queries and MySQL tuning as to why those queries are slamming the server.
Slamming your MySQL server process can show up as "the whole system being slow" if your primary view of the system is through MySQL.
You should also consider whether the command-line process is IO intensive. That can be adjusted on some Linux distros using the 'ionice' command, though its usage is not quite as simple as the CPU 'nice' command.
Basic usage:
ionice -c2 -n7 cmd
will run 'cmd' using 'best effort' scheduler at the lowest priority. See the man page for more usage details.
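The two can be combined when launching the migration (the script name is just a placeholder):
nice -n 19 ionice -c2 -n7 php migrate.php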
Using CPU cycles alone shouldn't take over the rest of the system. You can show this by doing:
while true; do :; done
This is an infinite loop and will use as many CPU cycles as it can get (stop it with ^C). You can use top to verify that it is doing its job. I am quite sure that this won't significantly affect the overall performance of your system to the point where MySQL dies.
However, if your PHP script is allocating a lot of memory, that certainly can make a difference. Linux has a tendency to go around killing processes when the system starts to run out of memory.
I would narrow down the problem and be sure of the cause, before looking for a solution.
You could mount your server's interesting directory/filesystem/whatever on another machine via NFS and run the script there (I know, this means avoiding the problem and is not really practical :| ).