Storing time series data, without a database - perl

I would like to store time series data, such as CPU usage over 6 months (I will poll the CPU usage every 2 minutes, so later I can get several resolutions, such as 1 week or 1 month, or even higher resolutions such as 5 minutes).
I'm using Perl, and I don't want to use RRDtool or a relational database. I was thinking of implementing my own storage using some sort of circular buffer (ring buffer) with the following properties:
6 Months = 186 Days = 4,464 Hours = 267,840 Minutes.
Dividing it into 2-minute sections: 267,840 / 2 = 133,920.
133,920 is the ring-buffer size.
Each element in the ring buffer will be a hashref whose key is the epoch (easily converted to a datetime using localtime) and whose value is the CPU usage at that time.
I will serialize this ring buffer (using Storable, I guess).
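For illustration, this is roughly what I have in mind (an untested sketch; the storage path and the get_cpu_usage_somehow() helper are placeholders):

use strict;
use warnings;
use Storable qw(nstore retrieve);

my $slots = 133_920;               # 6 months of 2-minute samples
my $store = 'cpu_ring.stor';       # placeholder storage path

# Load the existing buffer, or start a fresh one
my $ring = -e $store ? retrieve($store) : { pos => 0, data => [] };

sub record_sample {
    my ($epoch, $cpu) = @_;
    $ring->{data}[ $ring->{pos} ] = { $epoch => $cpu };   # key: epoch, value: CPU usage
    $ring->{pos} = ( $ring->{pos} + 1 ) % $slots;         # wrap around at 6 months
    nstore $ring, $store;                                 # persist after each poll
}

record_sample( time, get_cpu_usage_somehow() );           # helper assumed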
Any other suggestions?
Thanks,

I suspect you're overthinking this. Why not just use a flat (e.g. TAB-delimited) file with one line per time interval, with each line containing a timestamp and the CPU usage? That way, you can just append new entries to the file as they are collected.
If you want to automatically discard data older than 6 months, you can do this by using a separate file for each day (or week or month or whatever) and deleting old files. This is more efficient than reading and rewriting the entire file every time.
Writing and parsing such files is trivial in Perl. Here's some example code, off the top of my head:
Writing:
use strict;
use warnings;
use POSIX qw'strftime';

my $dir  = '/path/to/log/directory';
my $now  = time;
my $date = strftime '%Y-%m-%d', gmtime $now;   # ISO 8601 date format
my $time = strftime '%H:%M:%S', gmtime $now;
my $data = get_cpu_usage_somehow();
my $filename = "$dir/cpu_usage_$date.log";

open my $fh, '>>', $filename
    or die "Failed to open $filename for append: $!\n";
print $fh "${date}T${time}\t$data\n";
close $fh or die "Error writing to $filename: $!\n";
Reading:
use strict;
use warnings;

my $dir = '/path/to/log/directory';

foreach my $filename (sort glob "$dir/cpu_usage_*.log") {
    open my $fh, '<', $filename
        or die "Failed to open $filename for reading: $!\n";
    while (my $line = <$fh>) {
        chomp $line;
        my ($timestamp, $data) = split /\t/, $line, 2;
        # do something with timestamp and data (or save for later processing)
    }
    close $fh;
}
(Note: I can't test either of these example programs right now, so they might contain bugs or typos. Use at your own risk!)
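For the automatic discarding of data older than 6 months described above, a minimal sketch along the same lines (untested; it reuses the hypothetical cpu_usage_YYYY-MM-DD.log naming and treats a file's mtime as the day it was last written):

use strict;
use warnings;

my $dir    = '/path/to/log/directory';
my $cutoff = time - 186 * 24 * 60 * 60;   # ~6 months, matching the question's figure

foreach my $filename (glob "$dir/cpu_usage_*.log") {
    # (stat)[9] is the mtime; a day's log was last written on that day
    unlink $filename if ( stat $filename )[9] < $cutoff;
}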

As @Borodin suggests, use SQLite or DBM::Deep as recommended here.
If you want to stick to Perl itself, go with DBM::Deep:
A unique flat-file database module, written in pure perl. ... Can handle millions of keys and unlimited levels without significant slow-down. Written from the ground-up in pure perl -- this is NOT a wrapper around a C-based DBM. Out-of-the-box compatibility with Unix, Mac OS X and Windows.
You mention your need for storage, which could be satisfied by a simple text file as advocated by @llmari. (And, of course, using a CSV format would allow the file to be manipulated easily in a spreadsheet.)
But, if you plan on collecting a lot of data, and you wish to eventually be able to query it with good performance, then go with a tool designed for that purpose.
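For instance, here is a minimal sketch of the DBM::Deep route (assumptions: epoch-keyed entries, a hypothetical cpu_usage.db file, and the question's get_cpu_usage_somehow() helper):

use strict;
use warnings;
use DBM::Deep;

my $db = DBM::Deep->new( 'cpu_usage.db' );   # hypothetical file name

# One entry per poll: epoch key => CPU usage value
$db->{ time() } = get_cpu_usage_somehow();   # helper assumed

# Later, pull every sample that falls inside a window
my ($start, $end) = ( time - 7 * 24 * 3600, time );   # e.g. the last week
my @in_window = grep { $_ >= $start && $_ < $end } keys %$db;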

Limitations of the Perl Tie::File module

I tried using the Tie::File module to write a text file which should contain 1 billion lines, but it throws an error after writing 16 million:
"Out of memory!"
"Callback called exit at C:/perl/lib/Tie/File.pm line 979 during global destruction."
this is the code I tried with.
use Tie::File;
tie @array, 'Tie::File', "Out.txt";
$start = time();
for ($i = 0; $i <= 15000000; $i++) {
    $array[$i] .= "$i,";
}
$end = time();
print("time taken: ", $end - $start, "seconds");
untie @array;
I don't know why it throws an error. Any solutions to overcome this? It also took about 55 minutes to write 16 million records before throwing the error! Is this the usual time it takes to write?
The Tie::File module is known to be quite slow, and it is best used where the advantage of having random access to the lines of the file outweighs the poor performance.
But this isn't a problem with the module, it is a limitation of Perl. Or, more accurately, a limitation of your computer system. If you take the module out of the situation and just try to create an ordinary array with 1,000,000,000 elements then Perl will die with an Out of memory! error. The limit for my 32-bit build of Perl 5 version 20 is around 30 million. For 64-bit builds it will be substantially more.
Tie::File doesn't keep the whole file in memory but pages it in and out to save space, so it can handle very large files. Just not that large!
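As an aside, Tie::File's cache size can be capped explicitly with its memory option (a byte count); a small sketch:

use strict;
use warnings;
use Tie::File;

# Cap Tie::File's read/write cache at 10 MB; this bounds the cache,
# not the number of lines the underlying file may hold.
tie my @lines, 'Tie::File', 'Out.txt', memory => 10_000_000
    or die "Cannot tie Out.txt: $!";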
In this case you don't have any need of the advantages of Tie::File, and you should just write the data sequentially to the file. Something like this:
use strict;
use warnings;
use 5.010;
use autodie;

open my $fh, '>', 'Out.txt';

my $time = time;

for my $i (0 .. 15_000_000) {
    print $fh "$i,\n";
}

$time = time - $time;

printf "Time taken: %d seconds\n", $time;
This program ran in seven seconds on my system.
Please also note use strict and use warnings at the start of the program. This is essential for any Perl program, and will quickly reveal many simple problems that you would otherwise overlook. With use strict in place, each variable must be declared with my as close as possible to the first point of use.

Perl to read each line

I'm a beginner at Perl scripting. The script below is meant to check whether a file's modified time is more than 600 seconds ago. I read the file names from filelist.txt.
When I try to print the file modified time, it shows the modified time as blank.
Could you help me see where I'm wrong?
filelist.txt
a.sh
file.txt
Perl script
#!/usr/bin/perl
my $filename = '/root/filelist.txt';
open(INFO, $filename) or die("Could not open file.");
foreach $eachfile (<INFO>) {
    my $file = "/root/$eachfile";
    my $file_timestamp = (stat $file)[9];
    my $timestamp = localtime($epoch_timestamp);
    my $startTime = time();
    my $fm = $startTime - $file_timestamp;
    print "The file modified time is = $file_timestamp\n";
    if ($rm > 600) {
        print "$file modified time is greater than 600 seconds";
    }
    else {
        print "$file modified time is less than 600 seconds\n";
    }
}
You didn't include use strict; or use warnings; which is your downfall.
You set $fm; you test $rm. These are not the same variable. Using strictures and warnings would have pointed out the error of your ways. Experts use them routinely to make sure they aren't making silly mistakes. Beginners should use them too, to make sure they aren't making silly mistakes either.
This revised script:
Uses use strict; and use warnings;
Makes sure each variable is defined with my
Doesn't contain $epoch_timestamp or $timestamp
Uses lexical file handles ($info) and the three argument form of open
Closes the file
Includes newlines at the ends of messages
Chomps the file name read from the file
Prints the file name so it can be seen that the chomp is doing its stuff
Locates the files in the current directory instead of /root
Avoids parentheses around the argument to die
Includes the file name in the argument to die
Could be optimized by moving my $startTime = time; outside the loop
Uses $fm in the test
Could be improved if the greater than/less than messages handled the equals case correctly
Code:
#!/usr/bin/perl
use strict;
use warnings;

my $filename = './filelist.txt';
open my $info, '<', $filename or die "Could not open file $filename";

foreach my $eachfile (<$info>)
{
    chomp $eachfile;
    my $file = "./$eachfile";
    print "[$file]\n";
    my $file_timestamp = (stat $file)[9];
    my $startTime = time();
    my $fm = $startTime - $file_timestamp;
    print "The file modified time is = $file_timestamp\n";
    if ($fm > 600) {
        print "$file modified time is greater than 600 seconds\n";
    }
    else {
        print "$file modified time is less than 600 seconds\n";
    }
}
close $info;
Tangentially: if you're working in /root, the chances are you are running as user root. I trust you have good backups. Experimenting in programming as root is a dangerous game. A simple mistake can wipe out the entire computer, doing far more damage than if you were running as a mere mortal user (rather than the god-like super user).
Strong recommendation: Don't learn to program as root!
If you ignore this advice, make sure you have good backups, and know how to recover from them.
(FWIW: I run my Mac as a non-root user; I even run system upgrades as a non-root user. I do occasionally use root privileges via sudo or equivalents, but I never login as root. I have no need to do so. And I minimize the amount of time I spend as root to minimize the chance of doing damage. I've been working on Unix systems for 30 years; I haven't had a root-privileged accident in over 25 years, because I seldom work as root.)
What others have run into before is that when you read the file name from the INFO file handle, you end up with a newline character at the end of the string, and then trying to open /root/file1<cr> doesn't work because that file doesn't exist.
Try calling:
chomp $eachfile
before constructing $file

How to add time and replace it in a file in Perl?

I have written the following code to fetch the date from the server and to display it in yyyy/mm/dd-hh:mm:ss format.
#!/usr/bin/perl
system(`date '+ %Y/%m/%d-%H:%M:%S' >ex.txt`);
open(MYINPUTFILE, "/tmp/ranjan/ex.txt");
while(<MYINPUTFILE>)
{
    chomp;
    print "$_\n";
}
close(MYINPUTFILE);
output:
2013/07/29-18:58:04
I want to add two minutes to the time and then replace the time present in the file. Please give me some ideas.
Change your date command to add the 2 minutes:
date --date "+2 min" '+ %Y/%m/%d-%H:%M:%S'
or a Perl version:
use POSIX;
print strftime("%Y/%m/%d-%H:%M:%S", localtime(time + 120));
It is best to use Time::Piece to do the parsing and formatting of dates. It is a built-in module and shouldn't need installation.
Unusually, in this case the replacement date/time string is exactly the same length as the original string read from the file, so the modification can be done in-place. Normally the overall length of a file changes, so it is necessary either to create a new file and delete the old one, or to read the entire file into memory and write it out again.
This program opens the file for simultaneous read/write, reads the first line from the file, parses it using Time::Piece, adds two minutes (120 seconds), seeks to the start of the file again, and prints the new date/time reformatted in the same way as the original back to the file.
use strict;
use warnings;
use autodie;

use Time::Piece;

my $format = '%Y/%m/%d-%H:%M:%S';

open my $fh, '+<', 'ex.txt';

my $date_time = <$fh>;
chomp $date_time;

$date_time = Time::Piece->strptime($date_time, $format);
$date_time += 60 * 2;

seek $fh, 0, 0;
print $fh $date_time->strftime($format);
close $fh;
output
2013/07/29-19:00:04

Open a directory and sort files by date created

I need to open directories and sort the files by the time they were created. I can find some discussion (under the Perl, sorting, and files tags) on sorting files based on date of modification; I assume that is a more common need than sorting by date of creation. I use Perl. There are some previous postings on sorting by creation date in languages other than Perl, such as PHP or Java.
For example, I need to do the following:
opendir(DIR, $ARGV[0]);
my @files = "sort-by-date-created" (readdir(DIR));
closedir(DIR);
do things with @files...
The CPAN has a page on the sort command, but it's not very accessible to me, and I don't find the words "date" or "creation" on the page.
In response to an edit, I should say I use Mac, OS 10.7. I know that in the Finder, there is a sort by creation date option, so there must be some kind of indication for date of creation somehow attached to files in this system.
In response to an answer, here is another version of the script that attempts to sort the files:
#!/usr/bin/perl
use strict; use warnings;
use feature 'say';
use File::stat; # helps with sorting files by ctime, the inode date that hopefully can serve as creation date

my $usage = "usage: enter name of directory to be scanned for SNP containing lines\n";
die $usage unless @ARGV == 1;

opendir(DIR, $ARGV[0]); # open directory for getting file list
#my @files = (readdir(DIR));
my @file_list = grep ! /^\./, readdir DIR;
closedir(DIR);

print scalar @file_list . "\n";

for my $file (sort {
    my $a_stat = stat($a);
    my $b_stat = stat($b);
    $a_stat->mtime <=> $b_stat->mtime;
} @file_list ) {
    say "$file";
}
You can customize the sorting order by providing a subroutine or code block to the sort function.
In this sub or block, you need to use the special variables $a and $b, which represent the values from the @array as they are compared.
The sub or block needs to return a value less than, equal to, or greater than 0 to indicate whether $a is less than, equal to, or greater than $b (respectively).
You may use the special comparison operators (<=> for numbers, cmp for strings) to do this for you.
So the default sort, sort @numbers, is actually equivalent to sort {$a cmp $b} @numbers, because the default comparison is string-wise; to sort numerically you must supply {$a <=> $b}.
In the case of sorting by creation time, you can use the stat function to get that information about the file. It returns an array of information about the file, some of which may not be applicable to your platform. Last modification time of the file is generally safe, but creation time is not. The ctime (11th value that it returns) is as close as you can get (it represents inode change time on *nix, creation time on win32), which is expressed as the number of seconds since the epoch, which is convenient because it means you can do a simple numeric sort.
my @files = sort {(stat $a)[10] <=> (stat $b)[10]} readdir($dh);
I'm not sure if you want to filter out the directories also. If that is the case, you'll probably also want to use grep.
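For example, a sketch that keeps only plain files (building paths against a $dir you supply, since readdir returns bare names):

use strict;
use warnings;

my $dir = shift || '.';   # directory to scan
opendir my $dh, $dir or die "Cannot open $dir: $!";
my @files = sort { (stat "$dir/$a")[10] <=> (stat "$dir/$b")[10] }
            grep { -f "$dir/$_" }   # skip subdirectories and other non-files
            readdir $dh;
closedir $dh;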
I need to open directories and sort the files by the time they were created.
You can't. The creation time simply does not exist. There are three time elements tracked by *nix-like operating systems:
mtime: This is the time the file was last modified.
atime: This is the time the file was last accessed.
ctime: This is the time when the inode was last modified.
In Unix, certain file information is stored in the inode. This includes the various things you see when you take the Perl stat of a file: the owner of the file, the size of the file, the device it's on, the link count, and of course the mtime, atime, and ctime timestamps.
Why no creation time? Because how would you define it? What if I move a file? Should there be a new creation time? (By the way, ctime won't change with a move.) What if I copy the file? Should the new copy have a new creation time? What if I did a copy, then deleted the original? What if I edited a file? How about if I changed everything in the file with my edit? Or edited the file, then renamed it to a completely new name?
Even Windows, which has a file creation time, doesn't really track the file's creation. It merely tracks when the directory entry was created, which is sort of what ctime does. And you can even modify this creation time via the Windows API. I suspect that the Mac's file creation time is a relic of the HFS file system, and really points not to a file creation time so much as to the time the directory entry was first created.
As others have pointed out. You can add into the sort routine a block of code stating how you want something sorted. Here's a quickie example. Note I use File::stat which gives me a nice by name interface to the old stat command. If I used the old stat command, I would get an array, and then have to figure out where in the array the item I want is located. Here, the stat command gives me a stat object, and I can use the mtime, atime, or ctime method for pulling out the right time.
I also use <=>, which is a comparison operator specifically made for sort command blocks.
The sort command gives you two items, $a and $b. You use these two items to figure out the order you want, and then use either <=> or cmp to say whether $a is bigger, $b is bigger, or they're both the same size.
#! /usr/bin/env perl
use 5.12.0;
use warnings;
use File::stat;

my $dir_name = shift;
if ( not defined $dir_name ) {
    die qq(Usage: $0 <directory>);
}

opendir(my $dir_fh, $dir_name);
my @file_list;
while ( my $file = readdir $dir_fh ) {
    if ( $file !~ /^\./ ) {
        push @file_list, "$dir_name/$file";
    }
}
closedir $dir_fh;

say scalar @file_list;

for my $file (sort {
    my $a_stat = stat($a);
    my $b_stat = stat($b);
    $a_stat->ctime <=> $b_stat->ctime;
} @file_list ) {
    say "$file";
}
OS X stores the creation date in Mac-specific metadata, so the standard Perl filesystem functions don't know about it. You can use the MacOSX::File module to access this information.
#!/usr/bin/env perl
use strict;
use warnings;
opendir(DIR, $ARGV[0]);
chdir($ARGV[0]);
my @files = sort { (stat($a))[10] <=> (stat($b))[10] } (readdir(DIR));
closedir(DIR);
print join("\n", @files);
stat gives you all kinds of status info for files. field 10 of that is ctime (on filesystems that support it) which is inode change time (not creation time).
Mojo::File brings some interesting and readable ways to do it.
#!/usr/bin/env perl
use Mojo::File 'path';
my $files_list = path( '/whatever/dir/path/' )->list;
# ->list returns a Mojo::Collection of Mojo::File objects
my @files = sort { $a->stat->ctime <=> $b->stat->ctime } $files_list->each;

# An array of path strings, sorted by ctime (if needed)
my @paths = map { $_->realpath->to_string } @files;

Parsing multiple files at a time in Perl

I have a large data set (around 90GB) to work with. There are data files (tab delimited) for each hour of each day and I need to perform operations in the entire data set. For example, get the share of OSes which are given in one of the columns. I tried merging all the files into one huge file and performing the simple count operation but it was simply too huge for the server memory.
So, I guess I need to perform the operation on each file at a time and then add up the results at the end. I am new to Perl and am especially naive about performance issues. How do I do such operations in a case like this?
As an example, here are two columns of the file:
ID OS
1 Windows
2 Linux
3 Windows
4 Windows
Let's do something simple: counting the share of the OSes in the data set. Each .txt file has millions of these lines and there are many such files. What would be the most efficient way to operate on the entire data set?
Unless you're reading the entire file into memory, I don't see why the size of the file should be an issue.
my %osHash;
while (<>)
{
    my ($id, $os) = split("\t", $_);
    if (!exists($osHash{$os}))
    {
        $osHash{$os} = 0;
    }
    $osHash{$os}++;
}

foreach my $key (sort(keys(%osHash)))
{
    print "$key : ", $osHash{$key}, "\n";
}
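Because this reads from the diamond operator <>, you can pass all the hourly files on the command line and they are streamed one line at a time, so no file is ever held in memory whole. A hypothetical invocation:

perl count_os.pl data/*.txt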
While Paul Tomblin's answer dealt with filling the hash, here's the same plus opening the files:
use strict;
use warnings;
use 5.010;
use autodie;

my @files = map { "file$_.txt" } 1..10;

my %os_count;
for my $file (@files) {
    open my $fh, '<', $file;
    while (<$fh>) {
        my ($id, $os) = split /\t/;
        ... #Do something with %os_count and $id/$os here.
    }
}
We just open each file serially. Since you need to read all lines from all files, there isn't much more you can do about it. Once you have the hash, you could store it somewhere and load it when the program starts, then skip all lines until the last one you read, or simply seek there if your records permit it, which doesn't look like the case here.
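A minimal sketch of that checkpointing idea, assuming Storable and a hypothetical os_count.stor file:

use strict;
use warnings;
use Storable qw(nstore retrieve);

my $state = 'os_count.stor';   # hypothetical checkpoint file
my %os_count = -e $state ? %{ retrieve($state) } : ();

# ... tally more files into %os_count here ...

nstore \%os_count, $state;     # save the counts for the next run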