Copying files that start with numbers to another directory - perl

I've solved #1, but I have two questions left.
Can anyone help me?
From the directory local dir, list all the files that start with a number (0..9) and whose file extension is .song.
Create the subdirectory "local dir/selected", into which you will copy each of these files after numbering each (non-blank) line in each of them.
Print out, in a file called stats.txt, the following information concerning each of the files:
a) Number of (non-blank) lines.
b) Number of paragraphs. A paragraph here is a block of text composed of non-empty lines and delimited at its beginning and at its end by either the beginning of the file, the end of the file, or a blank line.
c) The mean size of a paragraph (in number of lines).
d) Whether or not all paragraphs in the file have the same length.
Bonus questions:
e) Detect the rhymes present in each file.
f) Give the ratio of rhyming lines to the total number of lines.
For #1:
#!/usr/bin/perl
use strict;
use warnings;
my $directory = '/local_dir';
opendir (DIR, $directory) or die $!;
while (my $file = readdir(DIR))
{
    # Use a regular expression to find files ending in .song.txt
    next unless ($file =~ m/\.song\.txt$/);
    print "$file\n";
}
closedir(DIR);
exit 0;

Here's how you might go about doing the first and second steps:
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
my $src_dir = '';
my @files = glob $src_dir . qq([0-9]*.song.txt);
my $dest_dir = 'selected';
mkdir $dest_dir;

for my $file (@files) {
    open my $fin, "<", $file;
    open my $fout, ">", "$dest_dir/$file";
    my $c = 1;
    while (<$fin>) {
        $_ = $c++ . " $_" unless /^$/;
        print $fout $_;
    }
}
Rather than using opendir, you can use glob to find the files that you are interested in, starting with a number [0-9] and ending with .song.txt. After creating the output directory, the for loop goes through these files and creates a new file in the destination directory with the same name. The while loop goes through each line of the input file. It adds a number to the start of each line if it is not blank, i.e. the start of the line is not followed immediately by the end of the line /^$/. Then it writes the line to the new file.
As for the third step, I think that it would be worth you having a go at that yourself and asking a new question if you get stuck.

Related

Recursive grep in perl

I am new to Perl. I have a directory structure with a log file in each directory. I want to grep a pattern from each file and do post-processing. Right now I am grepping the pattern from those files using Unix grep, putting the results into a text file, and reading that text file to do the post-processing, but I want to automate the task of reading each file and grepping the pattern from it. In the code below, mdp_cgdis_1102.txt holds the patterns grepped from the directories. I would really appreciate any help
#!/usr/bin/perl
use strict;
use warnings;
open FILE, 'mdp_cgdis_1102.txt' or die "Cannot open file $!";
my @array = <FILE>;
my @arr;
my @brr;
foreach my $i (@array){
    @arr = split (/\//, $i);
    @brr = split (/\:/, $i);
    print " $arr[0] --- $brr[2]";
}
It is unclear to me which part of the process needs automating. I'll go by "want to automate reading each file and grepping the pattern from that file," in which case you presumably already have a list of files. If you actually need to build the file list as well, see the added code below.
One way: pull all patterns from each file and store that in a hash (filename => arrayref-with-patterns)
my %file_pattern;

foreach my $file (@filelist) {
    open my $fh, '<', $file or die "Can't open $file: $!";
    $file_pattern{$file} = [ grep { /$pattern/ } <$fh> ];
    close $fh;
}
The [ ] takes a reference to the list returned by grep, i.e. it constructs an "anonymous array", and that reference is assigned as the value for the $file key.
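For illustration only, the resulting structure might look like this (the filenames and matched lines here are made up):
my %file_pattern = (
    'app1.log' => [ "ERROR: disk full\n", "ERROR: timeout\n" ],
    'app2.log' => [ "ERROR: bad config\n" ],
);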
Now you can process your patterns, per log file
foreach my $filename (sort keys %file_pattern) {
    print "Processing log $filename.\n";
    my @patterns = @{ $file_pattern{$filename} };
    # Process the list of patterns in this log file
}
ADDED
In order to build the list of files @filelist used above, from a known list of directories, use the core File::Find module, which recursively scans the supplied directories and applies the supplied subroutines
use File::Find;
find( { wanted => \&process_logs, preprocess => \&select_logs }, @dir_list);
Your subroutine process_logs() is applied to each file/directory that passed preprocessing by the second sub, with its name available as $File::Find::name, and in it you can either populate the hash with patterns-per-log as shown above, or run complete processing as needed.
Your subroutine select_logs() contains code to filter the log files from all the files in each directory that File::Find would normally process, so that process_logs() only gets the log files.
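As a rough sketch of the two subroutines (the .log extension is an assumption for illustration, and %file_pattern and $pattern are the variables from the snippet above):
sub select_logs {
    # Called once per directory with all of its entries; keep subdirectories
    # (so the scan can recurse) and anything that looks like a log file
    return grep { -d $_ or /\.log\z/ } @_;
}
sub process_logs {
    return unless -f;    # skip the directories that preprocess let through
    open my $fh, '<', $_ or do {
        warn "Can't open $File::Find::name: $!";
        return;
    };
    $file_pattern{$File::Find::name} = [ grep { /$pattern/ } <$fh> ];
}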
Another way would be to use the other invocation
find(\&process_all, @dir_list);
where now the sub process_all() is applied to every entry (files and directories) found, and thus this sub itself needs to ensure that it only processes the log files. See the linked documentation.
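A brief sketch of that variant, under the same assumptions as above:
sub process_all {
    return unless -f and /\.log\z/;    # filter: regular .log files only
    open my $fh, '<', $_ or do { warn "Can't open $File::Find::name: $!"; return };
    $file_pattern{$File::Find::name} = [ grep { /$pattern/ } <$fh> ];
}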
The equivalent of
find ... -name '*.txt' -type f -exec grep ... {} +
is
use File::Find::Rule qw( );

my $base_dir_qfn = ...;
my $re = qr/.../;

my @log_qfns =
    File::Find::Rule
        ->name(qr/\.txt\z/)
        ->file
        ->in($base_dir_qfn);

my $success = 1;
for my $log_qfn (@log_qfns) {
    open(my $fh, '<', $log_qfn)
        or do {
            $success = 0;
            warn("Can't open log file \"$log_qfn\": $!\n");
            next;
        };
    while (<$fh>) {
        print if /$re/;
    }
}
exit(1) if !$success;
Use File::Find to traverse the directory.
In a loop go through all the logfiles:
Open the file
read it line by line
For each line, do a regular expression match (if ($line =~ /pattern/)) or use if (index($line, $searchterm) >= 0) if you are looking for a certain static string.
If you find a match, print the line.
close the file
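A minimal sketch of those steps (the file list and search string are placeholders you would fill in):
my @logfiles   = ('mdp_cgdis_1102.txt');   # placeholder list of log files
my $searchterm = 'pattern';                # placeholder static search string

for my $logfile (@logfiles) {
    open my $fh, '<', $logfile or die "Can't open $logfile: $!";
    while (my $line = <$fh>) {
        # index() suffices for a static string; use =~ for a regex
        print $line if index($line, $searchterm) >= 0;
    }
    close $fh;
}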
I hope that gives you enough pointers to get started. You will learn more if you find out how to do each of these steps in Perl by yourself (I pointed out the hard ones).

Remove files less than n lines in Perl

I'm writing a Perl script to remove files that have fewer than a given number of lines. What I have so far is
my $cmd = join('','wc -l ', $file); #prints number of lines to command line
if (system($cmd) < 4)
{
my $rmcmd = join('','rm ',$file);
system($rmcmd);
}
where $file is the name and location of a file.
There's no need to use system for this. Perl is perfectly capable of counting lines:
sub count_lines {
    open my $fh, '<', shift;
    while(local $_ = <$fh>) {} # loop through all lines
    return $.;                 # $. holds the number of the last line read
}
unlink $file if count_lines($file) < 4;
I'm assuming your end goal is to have it search through a directory tree removing files with line count less than n. Check out File::Find and its nifty code generator find2perl to handle that part for you.
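If you do take the File::Find route, a sketch might look like this (the start directory is a placeholder, and the threshold of 4 is carried over from your snippet):
use File::Find;

find(sub {
    return unless -f;              # only consider regular files
    if (count_lines($_) < 4) {     # count_lines() from above
        unlink $_ or warn "Can't remove $File::Find::name: $!";
    }
}, '/path/to/start_dir');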

How to find position of a word by using a counter?

I am currently working on code that changes certain words to Shakespearean words. I have to extract the sentences that contain the words and print them out into another file. I had to remove .START from the beginning of each file.
First I split the files of text by spaces, so now I have the words. Next, I iterated over the words with a hash. The hash keys and values come from a tab-delimited file structured like so: OldEng/ModernEng (lc_Shakespeare_lexicon.txt). Right now, I'm trying to figure out how to find the exact position of each modern English word that is found, change it to the Shakespearean one, then find the sentences with the changed words and print them out to a different file. Most of the code is finished except for this last part. Here is my code so far:
#!/usr/bin/perl -w
use diagnostics;
use strict;
#Declare variables
my $counter=();
my %hash=();
my $conv1=();
my $conv2=();
my $ssph=();
my @text=();
my $key=();
my $value=();
my $conversion=();
my @rmv=();
my $splits=();
my $words=();
my @word=();
my $vals=();
my $existingdir='/home/nelly/Desktop';
my @file='Sentences.txt';
my $eng_words=();
my $results=();
my $storage=();
#Open file to tab delimited words
open (FILE,"<", "lc_shakespeare_lexicon.txt") or die "could not open lc_shakespeare_lexicon.txt\n";
#split words by tabs
while (<FILE>){
    chomp($_);
    ($value, $key) = split(/\t/, $_);
    $hash{$value}=$key;
}
#open directory to Shakespearean files
my $dir="/home/nelly/Desktop/input";
opendir(DIR,$dir) or die "can't opendir Shakespeare_input.tar.gz";
#Use grep to get WSJ file and store into an array
my @array= grep {/WSJ/} readdir(DIR);
#store file in a scalar
foreach my $file(@array){
    #open files inside of input
    open (DATA,"<", "/home/nelly/Desktop/input/$file") or die "could not open $file\n";
    #loop through each file
    while (<DATA>){
        @text=$_;
        chomp(@text);
        #Remove .START
        @rmv=grep(!/.START/, @text);
        foreach $splits(@rmv){
            #split data into separate words
            @word=(split(/ /, $splits));
            #Loop through each word and replace with Shakespearean word that exists
            $counter=0;
            foreach $words(@word){
                if (exists $hash{$words}){
                    $eng_words= $hash{$words};
                    $results=$counter;
                    print "$counter\n";
                    $counter++;
                    #create a new directory and store sentences with Shakespearean words in new file called "Sentences.txt"
                    mkdir $existingdir unless -d $existingdir;
                    open my $FILE, ">>", "$existingdir/@file" or die "Can't open $existingdir/conversion.txt\n";
                    #print $FILE "@words\n";
                    close ($FILE);
                }
            }
        }
    }
}
close (FILE);
close (DIR);
Natural language processing is very hard to get right except in trivial cases. For instance, it is difficult to define exactly what is meant by a word or a sentence, and it is awkward to distinguish between a single quote and an apostrophe when they are both represented using the U+0027 "apostrophe" character '
Without any example data it is difficult to write a reliable solution, but the program below should be reasonably close.
Please note the following
use warnings is preferable to -w on the shebang line
A program should contain as few comments as possible, as long as it remains comprehensible. Too many comments just make the program bigger and harder to grasp without adding any new information. The choice of identifiers should make the code mostly self-documenting
I believe use diagnostics to be unnecessary. Most messages are fairly self-explanatory, and diagnostics can produce large amounts of unnecessary output
Because you are opening multiple files it is more concise to use autodie, which avoids the need to explicitly test every open call for success
It is much better to use lexical file handles, such as open my $fh ... instead of global ones, like open FH .... For one thing a lexical file handle will be implicitly closed when it goes out of scope, which helps to tidy up the program a lot by making explicit close calls unnecessary
I have removed all of the variable declarations from the top of the program except those that are assigned a value. Declaring variables close to where they are first used is considered best practice, as it aids debugging and assists the writing of clean code
The program lower-cases the original word using lc before checking to see if there is a matching entry in the hash. If a translation is found, then the new word is capitalised using ucfirst if the original word started with a capital letter
I have written a regular expression that will take the next sentence from the beginning of the string $content. But this is one of the things that I can't get right without sample data, and there may well be problems, for instance, with sentences that end with a closing quotation mark or a closing parenthesis
use strict;
use warnings;
use autodie;

my $lexicon = 'lc_shakespeare_lexicon.txt';
my $dir = '/home/nelly/Desktop/input';
my $existing_dir = '/home/nelly/Desktop';
my $sentences = 'Sentences.txt';

my %lexicon = do {
    open my ($fh), '<', $lexicon;
    local $/;
    # Slurp the file, pull out the fields, and reverse the flat list so that
    # the modern English words become the keys and the Shakespearean the values
    reverse(<$fh> =~ /[^\t\n\r]+/g);
};

my @files = do {
    opendir my ($dh), $dir;
    grep /WSJ/, readdir $dh;
};

for my $file (@files) {

    my $contents = do {
        open my $fh, '<', "$dir/$file";
        join '', grep { not /\A\.START/ } <$fh>;
    };

    # Change any CR or LF to a space, and reduce multiple spaces to single spaces
    $contents =~ tr/\r\n/ /;
    $contents =~ s/ {2,}/ /g;

    # Find and process each sentence
    while ( $contents =~ / \s* (.+?[.?!]) (?= \s+ [A-Z] | \s* \z ) /gx ) {
        my $sentence = $1;
        my @words = split ' ', $sentence;
        my $changed;

        for my $word (@words) {
            my $eng_word = $lexicon{lc $word};
            if ($eng_word) {
                $eng_word = ucfirst $eng_word if $word =~ /\A[A-Z]/;
                $word = $eng_word;
                ++$changed;
            }
        }

        if ($changed) {
            mkdir $existing_dir unless -d $existing_dir;
            open my $out_fh, '>>', "$existing_dir/$sentences";
            print $out_fh "@words\n";
        }
    }
}

Files are not getting renamed in the same folder

I am trying to rename the existing files to Kernel.txt on the basis of "Linux kernel Version" or "USB_STATE=DISCONNECTED". The script runs without any error, but no output is produced. The renamed file needs to stay in the same folder (F1, F2, F3) it was in before.
Top dir: Log
SubDir: F1,F2,F3
F1: .bin file,.txt file,.jpg file
F2: .bin file,.txt file,.jpg file
F3: .bin file,.txt file,.jpg file
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Basename;
use File::Spec;
use Cwd;
chdir('C:\\doc\\logs');
my $dir_01 = getcwd;
my $all_file=find ({ 'wanted' => \&renamefile }, $dir_01);
sub renamefile
{
    if ( -f and /.txt?/ )
    {
        my @files = $_;
        foreach my $file (@files)
        {
            open (FILE,"<" ,$file) or die"Can not open the file";
            my @lines = <FILE>;
            close FILE;
            for my $line ( @lines )
            {
                if($line=~ /Linux kernel Version/gi || $line=~ /USB_STATE=DISCONNECTED/gi)
                {
                    my $dirname = dirname($file); # file's directory, so we rename only the file itself.
                    my $file_name = basename($file); # File name for renaming.
                    my $new_file_name = $file_name;
                    $new_file_name =~ s/.* /Kernal.txt/g; # replace the name with Kernal.txt
                    rename($file, File::Spec->catfile($dirname, $new_file_name)) or die $!;
                }
            }
        }
    }
}
This code looks a bit like cargo-cult programming. That is, some constructs are here without any indication that you understand what they are doing.
chdir('C:\\doc\\logs');
my $dir_01 = getcwd;
Do yourself a favour and use forward slashes, even for Windows pathnames. This is generally supported.
Your directory diagram says that there is a top dir Log, yet you chdir to C:/doc/logs. Which is it?
You do realize that $dir_01 is a very nondescriptive name, and is the path you just chdir'd to? Also, File::Find does not require you to start in the working directory. That is, the chdir is a bit useless here. You actually want:
my $start_directory = "C:/doc/Log"; # or whatever
my $all_file=find ({ 'wanted' => \&renamefile }, $dir_01);
I'm not sure what the return value of find would mean. But I'm sure that we don't have to put it into some unused variable.
When we provide key names with the => fat comma, we don't have to manually quote these keys. Therefore:
find({ wanted => \&renamefile }, $start_directory);
/.txt?/
This regex does the following:
match any character (that isn't a newline),
followed by literal tx,
and optionally a t. The ? is a zero-or-one quantifier.
If you want to match filenames that end with .txt, you should do
/\.txt$/
the \. matches a literal period. The $ anchors the regex at the end of the string.
my @files = $_;
foreach my $file (@files) {
    ...;
}
This would normally be written as
my $file = $_;
...;
You assign the value of $_ to the @files array, which then has one element: the contents of $_. Then you loop over this one element. Such loops don't deserve to be called loops.
open (FILE,"<" ,$file) or die"Can not open the file";
my #lines = <FILE>;
close FILE;
for my $line ( #lines )
{ ... }
Ah, where to begin?
Use lexical variables for file handles. These have the nice property of closing themselves.
For error handling, use autodie. If you really want to do it yourself, the error message should contain two important pieces of information:
the name of the file you couldn't open ($file)
the reason why the open failed ($!)
That would mean something like ... or die "Can't open $file: $!".
Don't read the whole file into an array and loop over that. Instead, be memory-efficient and iterate over the lines, using a while(<>)-like loop. This only reads one line at a time, which is much better.
Combined, this would look like
use autodie; # at the top
open my $fh, "<", $file;
LINE: while (<$fh>) {
...; # no $line variable, let's use $_ instead
}
Oh, and I labelled the loop (with LINE) for later reference.
if($line=~ /Linux kernel Version/gi || $line=~ /USB_STATE=DISCONNECTED/gi) { ... }
Putting the /g flag on regexes turns them into an iterator. You really don't want that. And I'm not quite sure that case-insensitive matching is really necessary here. You can move the || into the regex, using the regex alternation |. As we now use $_ to contain the lines, we don't have to manually bind the regex to a string. Therefore, we can write:
if (/Linux Kernel Version|USB_STATE=DISCONNECTED/i) { ... }
my $dirname = dirname($file); # file's directory, so we rename only the file itself.
my $file_name = basename($file); # File name for renaming.
By default, the original $_, and therefore our $file, contains only the filename, not the directory. This isn't a problem: File::Find chdir'd into the correct directory, which makes our processing a lot easier. If you want the directory, use the $File::Find::dir variable.
my $new_file_name = $file_name;
$new_file_name =~ s/.* /Kernal.txt/g;
The /.* / regex says:
match anything up to and including the last space
If this matches, replace the matched part with Kernal.txt.
The /g flag is completely useless here. Are you sure you don't want Kernel.txt with an e? And why the space in the filename? I don't quite understand that. If you want to rename the file to Kernel.txt, just assign that as a string, instead of doing weird stuff with substitutions:
my $new_file_name = "Kernel.txt";
rename($file, File::Spec->catfile($dirname, $new_file_name)) or die $!;
We already established that an error message should also include the filename, or even better: we should use automatic error handling.
Also, we are already in the correct directory, so we don't have to concatenate the new name with the directory.
rename $file => $new_file_name; # error handling by autodie
last LINE;
That should be enough. Also note that I leave the LINE loop. Once we renamed the file, there is no need to check the other lines as well.
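Putting those pieces together, the wanted subroutine might look something like this sketch (note that the file handle is closed before the rename, which matters on Windows, where renaming an open file fails):
use autodie;

sub renamefile {
    return unless -f and /\.txt$/;
    my $file  = $_;
    my $match = 0;

    open my $fh, '<', $file;
    while (<$fh>) {
        if (/Linux kernel Version|USB_STATE=DISCONNECTED/i) {
            $match = 1;
            last;
        }
    }
    close $fh;

    # File::Find has already chdir'd into the file's directory
    rename $file => 'Kernel.txt' if $match;
}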

Perl Find - No such file or directory

I am using File::Find and file i/o on a text file to parse a series of directories and move the contents into a new folder. It is a simple script (see below):
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Copy;
my $dir = "/opt/CollectMinderDocuments/coastalalglive"; #base directory for Coastal documents
#read file that contains a list of closed IDs
open(MYDATA, "Closed.txt");
mkdir("Closed");
while(my $line = <MYDATA>) {
    chomp $line;
    my $str = "$dir" . "/Account$line";
    print "$str\n";
    find(\&move_documents, $str);
}
sub move_documents {
    my $smallStr = substr $File::Find::name, 43;
    if(-d) {
        #system("mkdir ~/Desktop/Closed/$smallStr");
        print "I'm here\n";
        system("mkdir /opt/CollectMinderDocuments/coastalalglive/Closed/$smallStr");
        #print "Made a directory: /opt/CollectMinderDocuments/coastalalglive/Closed/$smallStr\n";
    }
    else {
        print "Now I'm here\n";
        my $smallerStr = substr $File::Find::dir, 43;
        my $temp = "mv * /opt/CollectMinderDocuments/coastalalglive/Closed/$smallerStr/";
        system("$temp");
    }
}
The text file contains a list of numbers:
1234
2805
5467
The code worked when I executed it last month, but it is now returning a "file or directory not found" error. The actual error is "No such file or directoryerDocuments/coastalalglive/Account2805". I know all of the directories it is searching for exist. I have manually typed in one of the directories, and the script executes fine:
find(\&move_documents, "/opt/CollectMinderDocuments/coastalalglive/Account2805/");
I am not sure why the error is being returned. Thanks in advance for the help.
Your error:
"No such file or directoryerDocuments/coastalalglive/Account2805"
Seems to imply that there is an \r that was not removed by your chomp. That will happen when transferring files between different file systems, where the file contains \r\n as line endings. The real error string would be something like:
/opt/CollectMinderDocuments/coastalalglive/Account2805\r: No such file or directory
Try changing chomp $line to $line =~ s/[\r\n]+$//; instead, and see if that works.
Also:
my $temp = "mv * /opt/CollectMinderDocuments/coastalalglive/Closed/$smallerStr/";
system("$temp");
This is very wrong. The first non-directory file in that loop will move all the remaining files (including dirs? I'm not sure whether mv does that by default). Hence, subsequent iterations of the subroutine will find nothing to move, also causing a "Not found" type error, though not one caught by Perl, since you are using system instead of File::Copy::move. E.g.:
move $_, "/opt/CollectMinderDocuments/coastalalglive/Closed/$smallerStr/" or die $!;
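In the same spirit, move_documents() could handle one entry at a time with File::Copy's move rather than shelling out with mv * (a sketch only; the hard-coded paths and the substr offset of 43 are taken from the question):
use File::Copy qw(move);

my $closed_dir = '/opt/CollectMinderDocuments/coastalalglive/Closed';

sub move_documents {
    my $rel = substr $File::Find::name, 43;   # path relative to the base dir
    if (-d) {
        mkdir "$closed_dir/$rel" or warn "Can't mkdir $closed_dir/$rel: $!";
    }
    else {
        move($File::Find::name, "$closed_dir/$rel")
            or die "Can't move $File::Find::name: $!";
    }
}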