The following code generates a list of the average number of clients connected by subnet. Currently I have to pipe it through sort | uniq | grep -v HASH.
Trying to keep it all in Perl, this doesn't work:
foreach $subnet (keys %{keys %{keys %days}}) {
print "$subnet\n";
}
The source is this:
foreach $file (@ARGV) {
open(FH, $file) or warn("Can't open file $file\n");
if ($file =~ /(2009\d{4})/) {
$dt = $+;
}
%hash = {};
while(<FH>) {
@fields = split(/~/);
$subnet = $fields[0];
$client = $fields[2];
$hash{$subnet}{$client}++;
}
close(FH);
$file = "$dt.csv";
open(FH, ">$file") or die("Can't open $file for output");
foreach $subnet (sort keys %hash) {
$tot = keys(%{$hash{$subnet}});
$days{$dt}{$subnet} = $tot;
print FH "$subnet, $tot\n";
push @{$subnet}, $tot;
}
close(FH);
}
foreach $day (sort keys %days) {
foreach $subnet (sort keys %{$days{$day}}) {
$tot = $i = 0;
foreach $amt (@{$subnet}) {
$i++;
$tot += $amt;
}
print "$subnet," . int($tot/$i) . "\n";
}
}
How can I eliminate the need for the sort | uniq process outside of Perl? The last foreach gets me the subnet ids which are the 'anonymous' names for the arrays. It generates these multiple times (one for each day that subnet was used).
This seemed easier than combining spreadsheets in Excel, though.
Actually, modules like Spreadsheet::ParseExcel make that really easy, in most cases. You still have to deal with rows as if from CSV or the "A1" type addressing, but you don't have to do the export step. And then you can output with Spreadsheet::WriteExcel!
I've used these modules to read a spreadsheet of a few hundred checks, sort and arrange and munge the contents, and write to a new one for delivery to an accountant.
In this part:
foreach $subnet (sort keys %hash) {
$tot = keys(%{$hash{$subnet}});
$days{$dt}{$subnet} = $tot;
print FH "$subnet,$tot\n";
push @{$subnet}, $tot;
}
$subnet is a string, but you use it in the last statement as an array reference. Since you don't have strictures on, it treats it as a soft reference to a variable with the name the same as the content of $subnet. Which is okay if you really want to, but it's confusing. As for clarifying the last part...
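To make the soft-reference behavior concrete, here's a minimal sketch (with a made-up subnet name) showing what `push @{$subnet}` actually does without strictures, next to the hash-of-arrays form that `use strict` would steer you toward:

```perl
use strict;
use warnings;

my %totals_for;   # the strict-safe container, used below

{
    # Under "no strict 'refs'", @{$subnet} is a symbolic reference:
    # Perl uses the *string* in $subnet as the name of a package array.
    no strict 'refs';
    my $subnet = '10.1.2.0';          # hypothetical subnet id
    push @{$subnet}, 42;              # silently creates the array @{"10.1.2.0"}
    print "symbolic: @{$subnet}\n";   # symbolic: 42
}

# The strict-safe equivalent: a hash of arrays keyed by the subnet string.
push @{ $totals_for{'10.1.2.0'} }, 42;
push @{ $totals_for{'10.1.2.0'} }, 58;
print "hash of arrays: @{ $totals_for{'10.1.2.0'} }\n";   # hash of arrays: 42 58
```

Both forms run, but only the second survives `use strict`, and the hash also gives you the unique subnet names for free as its keys.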
Update: I'm guessing this is what you're looking for, where the subnet value is only saved if it hasn't appeared before, even from another day (?):
use List::Util qw(sum); # List::Util was first released with perl 5.007003 (5.7.3, I think)
my %buckets;
foreach my $day (sort keys %days) {
foreach my $subnet (sort keys %{$days{$day}}) {
next if exists $buckets{$subnet}; # only gives you this value once, regardless of what day it came in
my $total = sum @{$subnet}; # no need to reuse a variable
$buckets{$subnet} = int($total/@{$subnet}); # array in scalar context is number of elements
}
}
use Data::Dumper qw(Dumper);
print Dumper \%buckets;
Building on Anonymous's suggestions, I built a hash of the subnet names to access the arrays:
..
push @{$subnet}, $tot;
$subnets{$subnet}++;
}
close(FH);
}
use List::Util qw(sum); # List::Util was first released with perl 5.007003
foreach my $subnet (sort keys %subnets) {
my $total = sum @{$subnet}; # no need to reuse a variable
print "$subnet," . int($total/@{$subnet}) . "\n"; # array in scalar context is number of elements
}
I am not sure if this is the best solution, but I don't have the duplicates any more.
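For what it's worth, here is a sketch of a variation that drops the symbolically named arrays entirely and keys a hash of arrays by subnet (the `%days` data and `%totals_for` name are made up for illustration):

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical data shaped like the $days{$dt}{$subnet} = $tot assignments.
my %days = (
    20090101 => { '10.1.0.0' => 10, '10.2.0.0' => 4 },
    20090102 => { '10.1.0.0' => 20 },
);

# Gather every day's total per subnet into a hash of arrays...
my %totals_for;
for my $day (keys %days) {
    while ( my ($subnet, $tot) = each %{ $days{$day} } ) {
        push @{ $totals_for{$subnet} }, $tot;
    }
}

# ...so each subnet is a hash key and can only come out once: no
# sort | uniq needed outside of Perl.
for my $subnet (sort keys %totals_for) {
    my @totals = @{ $totals_for{$subnet} };
    printf "%s,%d\n", $subnet, sum(@totals) / @totals;
}
```

This prints each subnet exactly once with its averaged total, and it works under `use strict`.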
Related
I want to pair two arrays and add the character '/' between them. Let's say the two arrays are like below:
@array1 = (FileA .. FileZ);
@array2 = (FileA.txt .. FileZ.txt);
The output that I want is like below
../../../experiment/fileA/fileA.txt
.
.
../../../experiment/fileZ/fileZ.txt
here is my code
my @input_name = input();
my $dirname = "../../../experiment/";
# CREATE FOLDER PATH
my @fileDir;
foreach my $input_name (@input_name){
chomp $input_name;
$_ = $dirname . $input_name;
push @fileDir, $_;
}
# CREATE FILE NAME
my @filename;
my $extension = '.txt';
foreach my $input_name (@input_name){
chomp $input_name;
$_ = $input_name . $extension;
push @filename, $_;
}
The code I tried is below, but it doesn't seem to work:
#CREATE FULL PATH
foreach my $test_path (@test_path){
foreach my $testname (@testname){
my $test = map "$test_path[$_]/$testname[$_]", 0..$#test_path;
push @file, $test;
}
}
print @file;
I assume input() returns something like ('fileA', 'fileB').
The problem with your code is the nested loop here:
foreach my $test_path (@test_path){
foreach my $testname (@testname){
This combines every $test_path with every possible $testname. You don't want that. Also, it doesn't make much sense to assign the result of map to a scalar: All you'll get is the number of elements in the list created by map.
(Also, you have random chomp calls sprinkled throughout your code. None of those should be there.)
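To see the scalar-context pitfall in isolation, here's a tiny sketch (the `@paths` data is made up):

```perl
use strict;
use warnings;

# map in list context returns the transformed list; in scalar context
# (such as "my $test = map ...") it returns only the element count.
my @paths = ('dirA', 'dirB', 'dirC');    # made-up example data

my @joined = map { "$_/$_.txt" } @paths; # list context: three strings
my $count  = map { "$_/$_.txt" } @paths; # scalar context: just 3

print "@joined\n";   # dirA/dirA.txt dirB/dirB.txt dirC/dirC.txt
print "$count\n";    # 3
```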
You only need a single array and a single loop:
use strict;
use warnings;
sub input {
return ('fileA', 'fileB');
}
my @input = input();
my $dirname = '../../../experiment';
my @files = map "$dirname/$_/$_.txt", @input;
for my $file (@files) {
print "got $file\n";
}
Here the loop is hidden in the map ..., @input call. If you want to write it as a for loop, it would look like this:
my @files;
for my $input (@input) {
push @files, "$dirname/$input/$input.txt";
}
The problem is your algorithm. You're iterating all filenames and all dirnames at the same time.
I mean, your code says "For every directory, create every file".
Try something along the lines of this and you'll be fine:
# WRITE TESTFILE
foreach my $filename (@filename){
chomp $filename;
if ( -e "$filename/$filename" and -d "$filename/$filename" ){
print "File already exists\n";
}
else {
open ( my $txt_file, '>', "$filename/$filename" ) or die "Can't open $filename/$filename: $!";
print $txt_file "Hello World";
close $txt_file;
}
}
So I have some data in tab delimited form:
Windows Department1 Enterprise
Windows Department1 Home
Linux Department2 Santiago
Windows Department1 Professional
Windows Department1 Enterprise
Windows Department2 Enterprise
In this case I need to match the first column first and get the count of each value in the 2nd and 3rd columns. Essentially I want to count the number of exact row matches.
So to end up with something like:
Windows Department1 Enterprise = 2
Windows Department2 Professional = 1
Linux Department2 Santiago = 1
Windows Department3 Home = 1
Windows Department2 Enterprise = 1
So I tried loads of things, with this being the last attempt and I got many different unwanted results:
use strict;
use warnings;
my %seen;
my $count = 0;
while (<INPUTFILE>) {
my ($app,$dep,$name) = split(/\t/,$_);
if ($app.$dep.$name eq 'Windows.Department1.Professional') {
unless ($seen{$app.$dep.name}++) {
$count++;
}
}
}
print $app . " " . $dep . " " . $name . " " . $count++
But this does not do remotely what I want; it just prints the last values with a count. I want to match on the unique $app value once, then match both the second and third values to get a count. On top of that, I have to manually match each item with eq, and the example above does not remotely show the amount of data in the file, so this will become a pain. I would greatly appreciate any help.
First construct a hash keyed by what you want to count uniquely: the combination of $app, $dep, and $name. You can use a combined key for this but let's use a multidimensional hash to keep the keys separate for later. Each intermediate level will automatically be autovivified when we increment a count.
use strict;
use warnings;
open my $input, '<', $filename or die "open $filename failed: $!";
my %counts;
while (my $line = <$input>) {
chomp $line; # otherwise trailing field will contain a newline
my ($app, $dep, $name) = split /\t/, $line;
$counts{$app}{$dep}{$name}++;
}
Then iterate through the hash to print out each count.
foreach my $app (sort keys %counts) {
my $app_counts = $counts{$app};
foreach my $dep (sort keys %$app_counts) {
my $dep_counts = $app_counts->{$dep};
foreach my $name (sort keys %$dep_counts) {
my $count = $dep_counts->{$name};
print "$app $dep $name $count\n";
}
}
}
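The combined-key alternative mentioned above can be sketched like this (the `@lines` array stands in for the tab-delimited input file):

```perl
use strict;
use warnings;

# Stand-in for reading the tab-delimited input file.
my @lines = (
    "Windows\tDepartment1\tEnterprise",
    "Windows\tDepartment1\tHome",
    "Windows\tDepartment1\tEnterprise",
);

my %counts;
for my $line (@lines) {
    my ($app, $dep, $name) = split /\t/, $line;
    # Join with a character that can't appear inside the fields themselves.
    $counts{ join "\t", $app, $dep, $name }++;
}

for my $key (sort keys %counts) {
    my $label = join ' ', split /\t/, $key;   # back to space-separated for display
    print "$label = $counts{$key}\n";
}
# Windows Department1 Enterprise = 2
# Windows Department1 Home = 1
```

The flat hash is simpler when you only ever need the full three-field combination; the multidimensional version is handier when you later want per-app or per-department subtotals.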
I am trying to read a config file and discard the directories that are listed in there, with the size mentioned in the file. So far I have this:
open FILE, 'C:\reports\config.txt' or die $!;
my $size_req;
my $path;
my $sub_dir;
my $count;
my #lines = <FILE>;
foreach $_ (#lines)
{
my @line = split /\|/, $_;
if ($line[0] eq "size")
{
$size_req= $line[1];
$size_req= ">".$size_req*1024;;
}
if ($line[0] eq "path")
{
$path= $line[1];
}
if ($line[0] eq "directories")
{ my $aa;
my $siz_two_digit;
my $sub_dir;
my $i;
my $array_size=@line;
for($i=1; $i < $array_size; )
{
$sub_dir=$line[$i];
print $sub_dir;
print "\n";
print $path;
print "\n";
my $r1 = File::Find::Rule->directory
->name($sub_dir)
->prune # don't go into it
->discard; # don't report it
my $fn = File::Find::Rule->file
->size( $size_req );
my @files = File::Find::Rule->or( $r1, $fn )
->in( $path);
print @files;
undef @files;
print @files;
$i++;
print "\n";
print "\n";
}
}
}
The problem with the for loop is that it stores all the subdirectories to be discarded from an array just fine. However, when it reads the name of the first directory to be discarded, it does not know about the remaining subdirectories and lists them too. When it goes to the 2nd value, it ignores the previous one and lists that as well.
Does anyone know if File::Find::Rule takes an array at a time, so that the code will consider the entire line in the configuration file at once? Or any other logic?
Thank you
This code does not do what you think:
my $r1 = File::Find::Rule->directory
->name($sub_dir)
->prune # don't go into it
->discard; # don't report it
You are trying to store a rule in a scalar, but what you are actually doing is calling Find::File::Rule and converting the resulting list to an integer (the number of elements in the list) and storing that in $r1.
Just put the whole call into the @files assignment. It may look messy but it will work a whole lot better. (As for handling the whole line at once: name() accepts a list of patterns, so you can pass every subdirectory name to a single rule instead of looping over them.)
Here is the code; I know it is not perfect Perl. If you have insight on how I can do better, let me know. My main question is: how would I print out the arrays without using Data::Dumper?
#!/usr/bin/perl
use Data::Dumper qw(Dumper);
use strict;
use warnings;
open(MYFILE, "<", "move_headers.txt") or die "ERROR: $!";
#First split the list of files and the headers apart
my @files;
my @headers;
my @file_list = <MYFILE>;
foreach my $source_parts (@file_list) {
chomp($source_parts);
my @parts = split(/:/, $source_parts);
unshift(@files, $parts[0]);
unshift(@headers, $parts[1]);
}
# Next get a list of unique headers
my @unique_files;
foreach my $item (@files) {
my $found = 0;
foreach my $i (@unique_files) {
if ($i eq $item) {
$found = 1;
last;
}
}
if (!$found) {
unshift @unique_files, $item;
}
}
@unique_files = sort(@unique_files);
# Now collect the headers is a list per file
my %hash_table;
for (my $i = 0; $i < @files; $i++) {
unshift @{ $hash_table{"$files[$i]"} }, "$headers[$i]";
}
# Process the list with regex
while ((my $key, my $value) = each %hash_table) {
if (ref($value) eq "ARRAY") {
print "$value", "\n";
}
}
The Perl documentation has a tutorial on "Printing of a HASH OF ARRAYS" (without using Data::Dumper)
perldoc perldsc
You're doing a couple things the hard way. First, a hash will already uniqify its keys, so you don't need the loop that does that. It appears that you're building a hash of files, with the values meant to be the headers found in those files. The input data is "filename:header", one per line. (You could use a hash of hashes, since the headers may need uniquifying, but let's let that go for now.)
use strict;
use warnings;
open my $files_and_headers, "<", "move_headers.txt" or die "Can't open move_headers: $!\n";
my %headers_for_file;
while (defined(my $line = <$files_and_headers> )) {
chomp $line;
my($file, $header) = split /:/, $line, 2;
push @{ $headers_for_file{$file} }, $header;
}
# Print the arrays for each file:
foreach my $file (keys %headers_for_file) {
print "$file: @{ $headers_for_file{$file} }\n";
}
We're letting Perl do a chunk of the work here:
If we add keys to a hash, they're always unique.
If we interpolate an array into a print statement, Perl adds spaces between them.
If we push onto an empty hash element, Perl automatically puts an empty anonymous array in the element and then pushes onto that.
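The last point, autovivification, is worth seeing in isolation (file and header names here are made up):

```perl
use strict;
use warnings;

# Autovivification in action: pushing onto a hash element that doesn't
# exist yet makes Perl create an anonymous array there first.
my %headers_for_file;                          # completely empty
push @{ $headers_for_file{'a.txt'} }, 'Date';  # element springs into existence
push @{ $headers_for_file{'a.txt'} }, 'Host';

print ref $headers_for_file{'a.txt'}, "\n";    # ARRAY (it holds an array reference)
print "@{ $headers_for_file{'a.txt'} }\n";     # Date Host (interpolation adds the space)
```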
An alternative to using Data::Dumper is to use Data::Printer:
use Data::Printer;
p $value;
You can also use this to customise the format of the output. E.g. you can have it all in a single line without the indexes (see the documentation for more options):
use Data::Printer {
index => 0,
multiline => 0,
};
p $value;
Also, as a suggestion for getting unique files, put the elements into a hash:
my %unique;
@unique{ @files } = @files;
my #unique_files = sort keys %unique;
Actually, you could even skip that step and put everything into %hash_table in one pass:
my %hash_table;
foreach my $source_parts (#file_list) {
chomp($source_parts);
my #parts = split(/:/, $source_parts);
unshift @{ $hash_table{$parts[0]} }, $parts[1];
}
Perl newbie here... I had help with this working Perl script, which uses some hash code, and I just need help understanding that code, and whether it could be written in a way that would make the use of hashes easier to grasp, or more visual.
In summary the script does a regex to filter on date and the rest of the regex will pull data related to that date.
use strict;
use warnings;
use constant debug => 0;
my $mon = 'Jul';
my $day = 28;
my $year = 2010;
my %items = ();
while (my $line = <>)
{
chomp $line;
print "Line: $line\n" if debug;
if ($line =~ m/(.* $mon $day) \d{2}:\d{2}:\d{2} $year: ([a-zA-Z0-9._]*):.*/)
{
print "### Scan\n" if debug;
my $date = $1;
my $set = $2;
print "$date ($set): " if debug;
$items{$set}->{'a-logdate'} = $date;
$items{$set}->{'a-dataset'} = $set;
if ($line =~ m/(ERROR|backup-date|backup-size|backup-time|backup-status)[:=](.+)/)
{
my $key = $1;
my $val = $2;
$items{$set}->{$key} = $val;
print "$key=$val\n" if debug;
}
}
}
print "### Verify\n";
for my $set (sort keys %items)
{
print "Set: $set\n";
my %info = %{$items{$set}};
for my $key (sort keys %info)
{
printf "%s=%s;", $key, $info{$key};
}
print "\n";
}
What I am trying to understand is these lines:
$items{$set}->{'a-logdate'} = $date;
$items{$set}->{'a-dataset'} = $set;
And again couple lines down:
$items{$set}->{$key} = $val;
Is this an example of hash reference? hash of hashes?
I guess I'm confused by the use of {$set} :-(
%items is a hash of hash references (conceptually, a hash of hashes). $set is the key into %items and then you get back another hash, which is being added to with keys 'a-logdate' and 'a-dataset'.
(corrected based on comments)
Lou Franco's answer is close, with one minor typographical error—the hash of hash references is %items, not $items. It is referred to as $items{key} when you are retrieving a value from %items because the value you are retrieving is a scalar (in this case, a hash reference), but $items would be a different variable.
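To tie the sigil rules together, here's a stripped-down sketch of the same structure (`db_backup` is a made-up dataset name):

```perl
use strict;
use warnings;

# %items is the whole hash; $items{$set} is one value from it (a scalar),
# and here that scalar is a reference to another hash.
my %items;
my $set = 'db_backup';    # made-up dataset name

$items{$set}{'a-logdate'} = 'Wed Jul 28';
$items{$set}->{'a-dataset'} = $set;   # the -> is optional between subscripts

print ref $items{$set}, "\n";          # HASH (the value is a hash reference)
print "$items{$set}{'a-logdate'}\n";   # Wed Jul 28
```

So `{$set}` picks the inner hash for one dataset, and the second subscript picks a field inside that inner hash; the leading `$` just says the result of the whole lookup is a single scalar.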