Picking up files from a directory in Perl

I have gone through a bunch of questions to find the best way to get the names of the files from a directory. However, I have a peculiar scenario and need some help.
The files in my directory are as follows:
-rw-rw-r-- 1 root 55000 53916 Apr 12 2013 Update_2013-04-12_02-17-55.txt
-rw-rw-r-- 1 root 55000 53916 Apr 12 2013 UpdateCIMS_2013-04-12_03-20-30.txt
-rw-rw-r-- 1 root 55000 53763 Apr 15 2013 UpdateCIMSFlag_2013-04-15_05-47-41.txt
-rw-rw-r-- 1 root 55000 91981 Apr 23 2013 UserManagementService_2013-04-23_03-55-52.txt
-rw-rw-r-- 1 root 55000 92076 Apr 23 2013 UserManagementService_2013-04-23_04-34-42.txt
-rw-rw-r-- 1 root 55000 92086 Apr 23 2013 UserManagementService_2013-04-23_23-55-10.txt
-rw-rw-r-- 1 root 55000 91971 Apr 24 2013 UserManagementService_2013-04-24_02-23-20.txt
-rw-rw-r-- 1 root 55000 59441 Apr 24 2013 SecuredService_2013-04-24_02-29-08.txt
-rw-rw-r-- 1 root 55000 42240 May 20 2013 UpdateCIMSFlag_2013-05-20_04-24-19.txt
-rw-rw-r-- 1 root 55000 40547 May 20 2013 UpdateCIMSFlag_2013-05-20_05-31-29.txt
-rw-rw-r-- 1 root 55000 42238 May 20 2013 UpdateCIMSFlag_2013-05-20_05-43-54.txt
-rw-rw-r-- 1 root 55000 59493 May 21 2013 SecuredService_2013-05-21_04-25-32.txt
-rw-rw-r-- 1 root 55000 88374 May 21 2013 RegistrationService_2013-05-21_23-55-33.txt
-rw-rw-r-- 1 root 55000 88426 May 22 2013 RegistrationService_2013-05-22_00-20-04.txt
-rw-rw-r-- 1 root 55000 60014 Jul 31 04:16 SecuredService_2013-07-31_04-16-56.txt
-rw-rw-r-- 1 root 55000 91636 Sep 2 06:11 AdminServices_2013-09-02_06-11-17.txt
-rw-rw-r-- 1 root 55000 91649 Sep 3 05:37 AdminServices_2013-09-03_05-37-54.txt
-rw-rw-r-- 1 root 55000 133629 Sep 3 05:43 UserManagementService2_2013-09-03_05-43-56.txt
-rw-rw-r-- 1 root 55000 556 Sep 9 08:26 Test_2013-09-09_08-26-23.txt
-rw-rw-r-- 1 root 55000 556 Sep 9 08:37 Test_2013-09-09_08-37-20.txt
-rw-rw-r-- 1 root 55000 133708 Sep 13 02:28 UserManagementService2_2013-09-13_02-28-49.txt
-rw-rw-r-- 1 root 55000 60107 Sep 13 02:30 SecuredService_2013-09-13_02-30-43.txt
-rw-rw-r-- 1 root 55000 133743 Sep 13 04:44 UserManagementService2_2013-09-13_04-44-29.txt
-rw-rw-r-- 1 root 55000 100886 Sep 16 04:27 AdminServices_2013-09-16_04-27-33.txt
-rw-rw-r-- 1 root 55000 556 Sep 20 06:40 Test_2013-09-20_06-40-16.txt
-rw-rw-r-- 1 root 55000 110236 Nov 25 02:35 AdminServices_2013-11-25_02-35-37.txt
-rw-rw-r-- 1 root 55000 142357 Dec 18 03:13 UserManagementService2_2013-12-18_03-13-20.txt
As you can see, I have files with similar names but different timestamps, as well as entirely different files. For each group of files whose names match once the timestamp is excluded, I need the latest file. The end result should display the latest file for each unique name, with its timestamp.
I am trying opendir but am not seeing any result.
#!/usr/bin/perl
use File::stat;
my $DIR = "/home/DIR";
opendir(my $DH, $DIR) or die "Error opening the dir";
my %files = map { $_ => (stat("$DIR/$_"))[9] } grep(! /^\.\.?$/, readdir($DH));
closedir($DH);
my @sorted_files = sort { $files{$b} <=> $files{$a} } (keys %files);
print $_;
Please help.
The output I am expecting is
AdminServices_2013-11-25_02-35-37.txt
UserManagementService2_2013-12-18_03-13-20.txt
SecuredService_2013-09-13_02-30-43.txt
RegistrationService_2013-05-22_00-20-04.txt
etc...

For a start, opendir isn't the problem. You aren't using the print statement correctly.
foreach (@sorted_files)
{ print $_ . "\n"; }
That outputs the file names. This is only a start to get you some output. I didn't finish the problem.

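#!/usr/bin/perl
my $DIR = "/home/DIR";
opendir(my $DH, $DIR) or die $!;
my %files;
while (my $f = readdir($DH)) {
    next if $f =~ /^\.\.?$/;
    # Split "Name_2013-09-13_02-30-43.txt" into the name and the timestamp part
    my ($key, $t) = split /_/, $f, 2;
    # Equivalent one-statement form using a hash slice:
    # @{ $files{$key} }{ "t","f" } = ($t, $f)
    #     if !$files{$key} or $files{$key}{t} lt $t;
    my $h = $files{$key} ||= {};
    # Keep the entry with the greatest (latest) timestamp string
    if (! %$h or $h->{t} lt $t) {
        $h->{t} = $t;
        $h->{f} = $f;
    }
}
closedir($DH);
print "$files{$_}{f}\n" for sort keys %files;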
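The loop above relies on the timestamps being zero-padded, so that comparing them as strings (lt) also orders them by time. As a rough illustration of the same idea outside Perl, here is a minimal Python sketch; the function name latest_per_prefix and the four sample names (taken from the listing in the question) are only there for the demonstration.

```python
def latest_per_prefix(names):
    """Return {prefix: newest file name}, splitting each name at its first underscore."""
    latest = {}
    for name in names:
        prefix, _, stamp = name.partition("_")
        # Timestamps like 2013-09-13_02-30-43 are zero-padded, so plain
        # string comparison orders them chronologically.
        best = latest.get(prefix)
        if best is None or best.partition("_")[2] < stamp:
            latest[prefix] = name
    return latest


if __name__ == "__main__":
    sample = [
        "SecuredService_2013-04-24_02-29-08.txt",
        "SecuredService_2013-09-13_02-30-43.txt",
        "AdminServices_2013-09-02_06-11-17.txt",
        "AdminServices_2013-11-25_02-35-37.txt",
    ]
    for prefix, name in sorted(latest_per_prefix(sample).items()):
        print(name)
```

Because the grouping key is everything before the first underscore, UserManagementService and UserManagementService2 end up in separate groups, which matches the expected output in the question.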

You could use glob:
my $filespec = "/tmp/test*";
my %files;
while (my $file = glob($filespec)) {
    next if $file eq '.' or $file eq '..';
    next if ! -f $file;
    my $datestamp = (stat($file))[9];
    # remove the timestamp and extension, e.g. _2013-09-13_02-30-43.txt
    my $filename = $file;
    $filename =~ s!_\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2}\.\w{3}\z!!is;
    # remember the newest modification time seen for each base name
    if (! exists $files{$filename} || $files{$filename} < $datestamp) {
        $files{$filename} = $datestamp;
    }
}
foreach my $key (sort { $files{$b} <=> $files{$a} } (keys %files)) {
    print "$key\t$files{$key}\n";
}


My symbol column file size in partitioned table is unusually large -- why would that be?

I've just built my first proper q/kdb+ database with splayed and partitioned tables. Everything is going fine, but I just noticed that my symbol s column file size is unusually large. Here is what I can see from the OS and from inside q:
# ls -latr 2017.10.30/ngbarx
total 532
-rw-r--r-- 1 root root 24992 Apr 17 20:53 vunadj
-rw-r--r-- 1 root root 24992 Apr 17 20:53 v
-rw-r--r-- 1 root root 300664 Apr 17 20:53 s
...
q)meta ngbarx
c | t f a
------| -----
date | d
s | s p
v | e
vunadj| e
...
q)get `:2017.10.30/ngbarx/s
`p#`sym$`A`AA`AACG`AADI`AADR`AAIC`AAIC-B`AAL`AAM-A`AAMC`AAME`AAOI`AAON`AAP`AA..
q)-22!get `:2017.10.30/ngbarx/v
24990
q)-22!get `:2017.10.30/ngbarx/s
28678
q)all (get `:2017.10.30/ngbarx/s) in sym
1b
q)count sym
62136
So comparing the real-type v column with the symbol-type s column, I see from ls that the symbol column is more than 10x the size, even though the internal size in bytes is similar and everything seems properly encoded in the sym file.
Is this expected behavior? Or am I doing something wrong that could be fixed?
UPDATE: I have not used compression, and I wrote the files using the magical function .Q.dcfgnt, which can be viewed here. More precisely, I used a slightly modified version: I noticed that the function as-is also saved a date file in the directory, even though the column should be virtual, so I did some hacking in k (I'm not very good at it) and updated the inner function .Q.dpfgnt to this:
k){[d;p;f;g;n;t]if[~&/qm'r:+en[d]t;'`unmappable];
{[d;g;t;i;x]@[d;x;g;t[x]i]}[d:par[d;p;n];g;r;<r f]'{x@&~x=`date}(!r);
@[;f;`p#]@[d;`.d;:;f,r@&~f=r:{x@&~x=`date}(!r)];n}
Applying the parted attribute is not free and requires storage. It is usually not that costly, but judging by your sample output of s, the column doesn't look suitable for parting, as it does not contain repeating values:
q)get `:2017.10.30/ngbarx/s
`p#`sym$`A`AA`AACG`AADI`AADR`AAIC`AAIC-B`AAL`AAM-A`AAMC`AAME`AAOI`AAON`AAP`AA..
The tables below illustrate the issue:
/ no part - 16 distinct syms
t1:([]s:100000?`1;v:100000?2e)
/ part - 16 distinct syms
t2:update `p#s from `s xasc ([]s:100000?`1;v:100000?2e)
/ no part - 99999 distinct syms
t3:([]s:100000?`8;v:100000?2e)
/ part - 99999 distinct syms
t4:update `p#s from `s xasc ([]s:100000?`8;v:100000?2e)
The difference in size between t1 and t2 (with the parted attribute) is insignificant (804096 -> 804664). However, when the number of distinct syms (parts) becomes very large, so does the storage cost (804096 -> 4749872).
ls | xargs ls -latr
t1:
total 1180
-rw-r--r-- 1 matmoore matmoore 12 Apr 19 10:28 .d
-rw-r--r-- 1 matmoore matmoore 804096 Apr 19 10:28 s
-rw-r--r-- 1 matmoore matmoore 400016 Apr 19 10:28 v
drwxr-xr-x 1 matmoore matmoore 4096 Apr 19 10:28 .
drwxr-xr-x 1 matmoore matmoore 4096 Apr 19 10:28 ..
t2:
total 1180
-rw-r--r-- 1 matmoore matmoore 12 Apr 19 10:28 .d
-rw-r--r-- 1 matmoore matmoore 804664 Apr 19 10:28 s
-rw-r--r-- 1 matmoore matmoore 400016 Apr 19 10:28 v
drwxr-xr-x 1 matmoore matmoore 4096 Apr 19 10:28 .
drwxr-xr-x 1 matmoore matmoore 4096 Apr 19 10:28 ..
t3:
total 1180
-rw-r--r-- 1 matmoore matmoore 12 Apr 19 10:28 .d
-rw-r--r-- 1 matmoore matmoore 804096 Apr 19 10:28 s
-rw-r--r-- 1 matmoore matmoore 400016 Apr 19 10:28 v
drwxr-xr-x 1 matmoore matmoore 4096 Apr 19 10:28 .
drwxr-xr-x 1 matmoore matmoore 4096 Apr 19 10:28 ..
t4:
total 5032
-rw-r--r-- 1 matmoore matmoore 12 Apr 19 10:28 .d
drwxr-xr-x 1 matmoore matmoore 4096 Apr 19 10:28 ..
-rw-r--r-- 1 matmoore matmoore 4749872 Apr 19 10:28 s
-rw-r--r-- 1 matmoore matmoore 400016 Apr 19 10:28 v
drwxr-xr-x 1 matmoore matmoore 4096 Apr 19 10:28 .
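To put numbers on the listing above, the following Python sketch divides the extra bytes by the number of distinct symbols. The sizes are copied from the ls output; the part counts (16 and 99999) come from the comments on the table definitions and are taken at face value here, and the function name overhead_per_part is just for illustration. It is plain arithmetic on these four files, not a statement about the on-disk format.

```python
# File sizes in bytes, copied from the ls output above.
PLAIN = 804096          # s column without the parted attribute (t1, t3)
PARTED_FEW = 804664     # t2: parted, 16 distinct symbols
PARTED_MANY = 4749872   # t4: parted, about 99999 distinct symbols


def overhead_per_part(parted_size, plain_size, parts):
    """Extra bytes on disk per distinct value after applying the attribute."""
    return (parted_size - plain_size) / parts


few = overhead_per_part(PARTED_FEW, PLAIN, 16)
many = overhead_per_part(PARTED_MANY, PLAIN, 99999)
print(f"16 parts:    {PARTED_FEW - PLAIN} extra bytes, {few:.1f} per part")
print(f"99999 parts: {PARTED_MANY - PLAIN} extra bytes, {many:.1f} per part")
```

In both cases the extra cost works out to somewhere between 35 and 40 bytes per distinct value, which suggests the overhead grows with the number of parts rather than with the number of rows.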
I would also question whether this column should be a symbol at all. If your sym file already holds 62k symbols with just one date written, you risk ending up with a bloated sym file. If you have loaded a full history from 2017.10.30 and the sym file is still at 62k, then it's fine; but if you are adding that many new symbols each day, the sym file will quickly spiral out of control.

Perl Script to determine how many logins were still on the system after 16:00

I have a data file that displays the following data:
cfs264su pts/6 x.x.x.x.x Tue May 26 16:46 - 19:21 (02:34)
cfs264su pts/6 x.x.x.x.x Tue May 26 16:30 - 16:46 (00:15)
cfs264su pts/6 x.x.x.x.x Tue May 26 16:19 - 16:30 (00:10)
cfs264su pts/6 x.x.x.x.x Tue May 26 14:59 - 15:30 (00:31)
cfs264su pts/5 x.x.x.x.x Tue May 26 14:40 - 17:13 (02:33)
cfs264su pts/1 x.x.x.x.x Tue May 26 14:02 - 19:06 (05:03)
cfs264su pts/6 x.x.x.x.x Tue May 26 10:36 - 13:18 (02:41)
cfs264su pts/5 x.x.x.x.x Tue May 26 10:22 - 12:45 (02:23)
cfs264su pts/1 x.x.x.x.x Tue May 26 08:45 - 12:12 (03:27)
cfs264su pts/5 x.x.x.x.x Tue May 26 00:34 - 01:28 (00:54)
I have created a Perl script that is supposed to display how many logins were still signed in after 16:00 on May 26th. This is what I have so far, and I can't figure out how to display the correct number of logins, which is 5.
#!/usr/bin/perl
open(FILE, $ARGV[0]) or die ("Error Found: $!");
while ( $line = <FILE> ) {
($login, $time) = split('\(', $line);
# print "Time: $time";
($hour,$m) = split('\:',$time);
# print "Hour: $hour\n";
if ( $hour <= 16 ) {
# print "Found: $hour\n";
$n++;
}
}
print "There were $n logins that were still on the system after 16:00 on May 26.\n";
close(FILE);
I finally figured it out: the split had to use / - /
#!/usr/bin/perl
open(FILE, $ARGV[0]) or die ("Error Found: $!");
while ( $line = <FILE> ) {
    # Everything after " - " starts with the logout time, e.g. "19:21 (02:34)"
    ($login, $time) = split(/ - /, $line);
    # print "Time: $time";
    ($hour,$m) = split(/:/,$time);
    # print "Hour: $hour\n";
    # Count sessions whose logout hour is 16 or later
    if ( $hour >= 16 ) {
        # print "Found: $hour\n";
        $n++;
    }
}
print "There were $n logins that were still on the system after 16:00 on May 26.\n";
close(FILE);
Thanks!
The following code sample demonstrates one possible way.
Algorithm:
convert the $start and $end date and time into epoch seconds
read each input line into a hash $record->@{@fields}, with the field names as keys
store each record in the HoA %login, keyed by user id
for the user of interest, go through all records
compare the login time with the $start and $end timeframe
on a match, increase $count
print $count on completion
Run as:
Linux: script.pl user '[start]' '[end]'
Windows: script.pl user "[start]" "[end]"
use strict;
use warnings;
use feature 'say';
use Date::Parse;
my $user = shift || 'cfs264su';
my $start = shift || 'May 26 10:00 2020';
my $end = shift || 'May 26 15:00 2020';
$start = date2epoch($start);
$end = date2epoch($end);
say 'USER: ' . $user;
say 'START: ' . scalar localtime $start;
say 'END: ' . scalar localtime $end;
say '-' x 45;
my %login;
my @fields = qw/id pts what wday month mday in out duration/;
my $count;
while( <DATA> ) {
my($id, $record);
$record->@{@fields} = split "[- ()]+";
$id = $record->{id};
delete $record->{id};
push @{$login{$id}}, $record;
}
for( @{$login{$user}} ) {
my $in = date2epoch(join ' ', $_->@{qw/month mday in/});
my $out = date2epoch(join ' ', $_->@{qw/month mday out/});
if( $in >= $start && $in <= $end) {
++$count;
say 'IN: ' . scalar localtime $in;
say 'OUT: ' . scalar localtime $out;
say '-' x 45;
}
}
say "Logged in: $count times";
exit;
sub date2epoch {
my $date = shift;
return str2time($date);
}
__DATA__
cfs264su pts/6 x.x.x.x.x Tue May 26 16:46 - 19:21 (02:34)
cfs264su pts/6 x.x.x.x.x Tue May 26 16:30 - 16:46 (00:15)
cfs264su pts/6 x.x.x.x.x Tue May 26 16:19 - 16:30 (00:10)
cfs264su pts/6 x.x.x.x.x Tue May 26 14:59 - 15:30 (00:31)
cfs264su pts/5 x.x.x.x.x Tue May 26 14:40 - 17:13 (02:33)
cfs264su pts/1 x.x.x.x.x Tue May 26 14:02 - 19:06 (05:03)
cfs264su pts/6 x.x.x.x.x Tue May 26 10:36 - 13:18 (02:41)
cfs264su pts/5 x.x.x.x.x Tue May 26 10:22 - 12:45 (02:23)
cfs264su pts/1 x.x.x.x.x Tue May 26 08:45 - 12:12 (03:27)
cfs264su pts/5 x.x.x.x.x Tue May 26 00:34 - 01:28 (00:54)
Output
USER: cfs264su
START: Tue May 26 10:00:00 2020
END: Tue May 26 15:00:00 2020
---------------------------------------------
IN: Tue May 26 14:59:00 2020
OUT: Tue May 26 15:30:00 2020
---------------------------------------------
IN: Tue May 26 14:40:00 2020
OUT: Tue May 26 17:13:00 2020
---------------------------------------------
IN: Tue May 26 14:02:00 2020
OUT: Tue May 26 19:06:00 2020
---------------------------------------------
IN: Tue May 26 10:36:00 2020
OUT: Tue May 26 13:18:00 2020
---------------------------------------------
IN: Tue May 26 10:22:00 2020
OUT: Tue May 26 12:45:00 2020
---------------------------------------------
Logged in: 5 times
A simple approach using Time::Piece.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use Time::Piece;
# The format of the dates and times that we're dealing with
my $fmt = '%b %d %H:%M';
# Store the count
my $count;
# Get a Time::Piece object that represents our cut-off time
my $cutoff = Time::Piece->strptime('May 26 16:00', $fmt);
# Note: I'm reading from DATA here. It's an easy way to
# prototype stuff without having to deal with opening files
while (<DATA>) {
# Split the data on whitespace
# Pull out columns 4, 5 and 8
# Join them with a space - which gives us the end date/time
my $end_str = join ' ', (split /\s+/)[4, 5, 8];
# Parse that string into a Time::Piece object
my $end = Time::Piece->strptime($end_str, $fmt);
# Increment the counter if this date/time is
# greater than the cut-off
$count++ if $end > $cutoff;
}
say $count;
__DATA__
cfs264su pts/6 x.x.x.x.x Tue May 26 16:46 - 19:21 (02:34)
cfs264su pts/6 x.x.x.x.x Tue May 26 16:30 - 16:46 (00:15)
cfs264su pts/6 x.x.x.x.x Tue May 26 16:19 - 16:30 (00:10)
cfs264su pts/6 x.x.x.x.x Tue May 26 14:59 - 15:30 (00:31)
cfs264su pts/5 x.x.x.x.x Tue May 26 14:40 - 17:13 (02:33)
cfs264su pts/1 x.x.x.x.x Tue May 26 14:02 - 19:06 (05:03)
cfs264su pts/6 x.x.x.x.x Tue May 26 10:36 - 13:18 (02:41)
cfs264su pts/5 x.x.x.x.x Tue May 26 10:22 - 12:45 (02:23)
cfs264su pts/1 x.x.x.x.x Tue May 26 08:45 - 12:12 (03:27)
cfs264su pts/5 x.x.x.x.x Tue May 26 00:34 - 01:28 (00:54)

Insert newline in html format using Powershell

I have a string value that contains a newline character entity (it shows up as a line break in the sample below). I want to replace it with \n so that, after I convert the string to HTML, each line appears on its own line. But this does not work.
$StdOut = 'total 40
drwxr-xr-x 3 root root 4096 Jun 16 14:55 .
drwxr-xr-x 5 root root 4096 Jun 16 14:54 ..
-rw------- 1 root root 0 Jun 16 14:55 cimserver_start.lock
srwxrwxrwx 1 root root 0
Jun 16 14:55 cim.socket
drwxr-xr-x 2 root root 4096 Jun 16 17:58 localauth
-rw------- 1 root root 6 Jun 16 14:55 scx-cimd.pid
'
$CResult = $StdOut -replace "
", "\n"
After using ConvertTo-Html I get text like this:
'total 40\ndrwxr-xr-x 3 root root 4096 Jun 16 14:55 .\ndrwxr-xr-x 5 root root 4096 Jun 16 14:54 ..\n-rw------- 1 root root 0 Jun 16 14:55 cimserver_start.lock\nsrwxrwxrwx 1 root root 0 Jun 16 14:55
cim.socket\ndrwxr-xr-x 2 root root 4096 Jun 16 17:58 localauth\n-rw------- 1 root root 6 Jun 16 14:55 scx-cimd.pid\n
'
How can I do this?
The approach Mathias mentions works fine for newlines. But if you have (or could have) other entity refs then I would use the HtmlDecode method e.g.
Add-Assembly System.Web
[System.Web.HttpUtility]::HtmlDecode($StdOut)
Outputs:
total 40
drwxr-xr-x 3 root root 4096 Jun 16 14:55 .
drwxr-xr-x 5 root root 4096 Jun 16 14:54 ..
-rw------- 1 root root 0 Jun 16 14:55 cimserver_start.lock
srwxrwxrwx 1 root root 0
PS C:\WINDOWS\system32> Add-Assembly System.Web
Add-Assembly : The term 'Add-Assembly' is not recognized as the name of a
cmdlet, function, script file, or operable program. Check the spelling of
the name, or if
a path was included, verify that the path is correct and try again.
At line:1 char:1
+ Add-Assembly System.Web
+ ~~~~~~~~~~~~
+ CategoryInfo : ObjectNotFound: (Add-Assembly:String) [],
CommandNotFoundException
+ FullyQualifiedErrorId : CommandNotFoundException
Try this
$null = [Reflection.Assembly]::LoadWithPartialName('System.Web')
[System.Web.HttpUtility]::HtmlDecode($StdOut)
Gives me
total 40
drwxr-xr-x 3 root root 4096 Jun 16 14:55 .
drwxr-xr-x 5 root root 4096 Jun 16 14:54 ..
-rw------- 1 root root 0 Jun 16 14:55 cimserver_start.lock
srwxrwxrwx 1 root root 0
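For comparison, the same decode step can be sketched in Python, where html.unescape plays the role of HtmlDecode. The entity &#xA; used in the sample is an assumption (the original entity is not visible in the question), and the helper name decode_for_html is made up for this sketch. The second step, turning newlines into <br>, is one common way to make the line breaks visible once the text is placed in an HTML page, since browsers collapse bare newlines.

```python
import html


def decode_for_html(text):
    """Decode character references, then mark line breaks for HTML display."""
    decoded = html.unescape(text)           # "&#xA;" becomes a real newline
    return decoded.replace("\n", "<br>\n")  # browsers ignore bare newlines


if __name__ == "__main__":
    # Assumed input: the entity &#xA; stands in for whatever the real string contains.
    raw = "total 40&#xA;drwxr-xr-x 3 root root 4096 Jun 16 14:55 ."
    print(decode_for_html(raw))
```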

Print command result side by side?

Is it possible to print the results of two commands side by side?
Something like this
something `ls -l /a` `cat bla.txt`
result:
total 24 #while [ 1 = 1 ]; do
-rw-r--r-- 1 wolfy wolfy 194 Aug 13 08:50 c.in # echo "bla"
-rwxr-xr-x 1 wolfy wolfy 52 Sep 24 11:48 bla.sh #done
-rwxr-xr-x 1 wolfy wolfy 38 Sep 24 11:48 bla1.sh echo "bla"
-rwxr-xr-x 1 wolfy wolfy 147 Sep 24 11:54 ble.sh
I know that pr can do something like this with files, but I didn't find a way to do this for commands...
You can use process substitution
pr -m <(cmd1) <(cmd2)
though in your case, since you have one command and one file:
ls -l | pr -m - bla.txt
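To show what merging two outputs side by side amounts to, here is a small Python sketch that pairs up the lines of two lists and pads the left column. It is only an illustration of the idea, not how pr works internally (pr also truncates lines to fit its column width); the name side_by_side, the width parameter and the sample lines are made up for the example.

```python
from itertools import zip_longest


def side_by_side(left, right, width=40):
    """Merge two lists of lines into rows with a fixed-width left column."""
    rows = []
    for a, b in zip_longest(left, right, fillvalue=""):
        # Pad the left line to the column width, then drop trailing spaces
        rows.append(f"{a:<{width}}{b}".rstrip())
    return rows


if __name__ == "__main__":
    left = ["total 24", "-rw-r--r-- 1 wolfy wolfy 194 c.in"]
    right = ["#while [ 1 = 1 ]; do", '# echo "bla"', "#done"]
    print("\n".join(side_by_side(left, right)))
```

zip_longest keeps going until the longer input is exhausted, so a short listing next to a long file still shows every line of the file.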

copy specific file in command line

I want to copy specific files: Java files that were last modified on Oct 16-17.
shia@ubuntu:~/code$ ls -alxo
total 96
drwx------ 2 shia 4096 Oct 20 18:54 .
drwxr-xr-x 61 shia 12288 Oct 20 19:24 ..
-rw------- 1 shia 12288 Oct 16 21:52 .Reuse.java.swp
-rw-rw-r-- 1 shia 746 Oct 20 11:16 Argus.class
-rw-rw-r-- 1 shia 302 Oct 20 11:16 Argus.java
-rw------- 1 shia 310 Oct 16 21:30 Call.java
-rw-rw-r-- 1 shia 417 Oct 17 15:20 Ordinary.class
-rw-rw-r-- 1 shia 298 Oct 17 14:57 Overriding.java
-rw-rw-r-- 1 shia 562 Oct 19 21:27 Package.class
-rw-rw-r-- 1 shia 430 Oct 19 21:27 Package.java
-rw------- 1 shia 729 Oct 17 13:50 Reuse.java
-rw------- 1 shia 424 Oct 17 13:47 Room.java
-rw------- 1 shia 321 Oct 16 21:22 Simpleobject.java
-rw-rw-r-- 1 shia 1187 Oct 17 00:04 Static.java
-rw-rw-r-- 1 shia 686 Oct 17 15:20 Super.class
-rw-rw-r-- 1 shia 1010 Oct 17 15:20 Super.java
-rw------- 1 shia 843 Oct 17 14:20 This.java
-rw-rw-r-- 1 shia 521 Oct 17 14:51 b.java
-rw-rw-r-- 1 shia 90 Oct 20 18:54 cp.awk
-rw-rw-r-- 1 shia 105 Oct 20 17:19 file.txt
I tried to select them, but I don't know how to copy them.
shia@ubuntu:~/code$ ls -alxo|grep 'Oct 1[67].*java$'|awk '{print $8}'
Call.java
Overriding.java
Reuse.java
Room.java
Simpleobject.java
Static.java
Super.java
This.java
b.java
Any help is appreciated, thanks a lot!
One way using find:
find . -maxdepth 1 -type f -name "*.java" -newermt 2012-10-16 ! -newermt 2012-10-18 -exec cp '{}' /home/user/dstFolder/ \;
You can use xargs to copy the files found:
... | xargs -I '{}' cp '{}' /home/user/dstFolder/
This will copy all the files found to the folder /home/user/dstFolder/.
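If a script is preferable to a one-liner, the same selection (a file name pattern plus a modification-time window) can be sketched in Python. The function name copy_modified_between and its parameters are hypothetical, and the window here is half-open (start inclusive, end exclusive), which is close to, but not exactly, what the two -newermt tests do.

```python
import shutil
from datetime import datetime
from pathlib import Path


def copy_modified_between(src, dst, start, end, pattern="*.java"):
    """Copy files matching pattern whose mtime falls in [start, end)."""
    dst = Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(src).glob(pattern)):
        if not path.is_file():
            continue
        mtime = datetime.fromtimestamp(path.stat().st_mtime)
        if start <= mtime < end:
            shutil.copy2(path, dst / path.name)  # copy2 keeps the timestamp
            copied.append(path.name)
    return copied

# Example call (hypothetical destination path):
# copy_modified_between(".", "/home/user/dstFolder",
#                       datetime(2012, 10, 16), datetime(2012, 10, 18))
```

Like find with -maxdepth 1, the sketch only looks at the top level of the source directory.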