I use readdir to get the files of a directory , but I want to remove . and .. using grep . The output shows it still contain the . and .. , but I can't figure out what's wrong with it ?
here is my code
#!/usr/bin/perl
opendir(Dir,$ARGV[0]);
#Dirs = readdir(Dir);
#Dirs = grep { $_ != /./ } #Dirs;
# #Dirs = grep { $_ =~ /^./ } #Dirs;
print join("\n",#Dirs);
Thanks
I strongly suggest you take note of the following
Always use strict and use warnings, even for the tiniest bit of code. They will repay you the extra typing time many times over
Always use lexical directory handles and file handles. Global handles like this have been the
wrong choice for over twelve years now
Always check the success of file and directory open calls, and use a die string that includes the $! variable to say why the open failed
Use lower-case letters and underscores for local variable names. Upper case is reserved by convention for global items like package names and built-in variables
Use print "$_\n" for #array instead of print join "\n", #array because a) using join produces a second copy of the text in the array and wastes space, and b) using join omits the newline from the last line of the array
Take a look at this alternative to your program, which applies the advice above. I have excluded all directory entries beginning with a dot, as it successfully removes . and .. as well as Linux "hidden" entries that start with a dot. You may require something different.
#!/usr/bin/perl
use strict;
use warnings;
opendir my $dh, $ARGV[0] or die $!;
my #dirs = grep { not /^\./ } readdir $dh;
print "$_\n" for #dirs;
Try escaping the .:
#Dirs = grep { $_ !~ /^\.\.?$/ } #Dirs;
The dot is a special metacharacter which matches any character when not escaped.
. in a regexp means "any character", try escaping it like this: \.
Related
I am currently working on a code that changes certain words to Shakespearean words. I have to extract the sentences that contain the words and print them out into another file. I had to remove .START from the beginning of each file.
First I split the files with the text by spaces, so now I have the words. Next, I iterated the words through a hash. The hash keys and values are from a tab delimited file that is structured as so, OldEng/ModernEng (lc_Shakespeare_lexicon.txt). Right now, I'm trying to figure out how to find the exact position of each modern English word that is found, change it to the Shakespearean; then find the sentences with the change words and printing them out to a different file. Most of the code is finished except for this last part. Here is my code so far:
#!/usr/bin/perl -w
use diagnostics;
use strict;
#Declare variables
my $counter=();
my %hash=();
my $conv1=();
my $conv2=();
my $ssph=();
my #text=();
my $key=();
my $value=();
my $conversion=();
my #rmv=();
my $splits=();
my $words=();
my #word=();
my $vals=();
my $existingdir='/home/nelly/Desktop';
my #file='Sentences.txt';
my $eng_words=();
my $results=();
my $storage=();
#Open file to tab delimited words
open (FILE,"<", "lc_shakespeare_lexicon.txt") or die "could not open lc_shakespeare_lexicon.txt\n";
#split words by tabs
while (<FILE>){
chomp($_);
($value, $key)= (split(/\t/), $_);
$hash{$value}=$key;
}
#open directory to Shakespearean files
my $dir="/home/nelly/Desktop/input";
opendir(DIR,$dir) or die "can't opendir Shakespeare_input.tar.gz";
#Use grep to get WSJ file and store into an array
my #array= grep {/WSJ/} readdir(DIR);
#store file in a scalar
foreach my $file(#array){
#open files inside of input
open (DATA,"<", "/home/nelly/Desktop/input/$file") or die "could not open $file\n";
#loop through each file
while (<DATA>){
#text=$_;
chomp(#text);
#Remove .START
#rmv=grep(!/.START/, #text);
foreach $splits(#rmv){
#split data into separate words
#word=(split(/ /, $splits));
#Loop through each word and replace with Shakespearean word that exists
$counter=0;
foreach $words(#word){
if (exists $hash{$words}){
$eng_words= $hash{$words};
$results=$counter;
print "$counter\n";
$counter++;
#create a new directory and store senteces with Shakespearean words in new file called "Sentences.txt"
mkdir $existingdir unless -d $existingdir;
open my $FILE, ">>", "$existingdir/#file", or die "Can't open $existingdir/conversion.txt'\n";
#print $FILE "#words\n";
close ($FILE);
}
}
}
}
}
close (FILE);
close (DIR);
Natural language processing is very hard to get right except in trivial cases, for instance it is difficult to define exactly what is meant by a word or a sentence, and it is awkward to distinguish between a single quote and an apostrophe when they are both represented using the U+0027 "apostrophe" character '
Without any example data it is difficult to write a reliable solution, but the program below should be reasonably close
Please note the following
use warnings is preferable to -w on the shebang line
A program should contain as few comments as possible as long as it is comprehensible. Too many comments just make the program bigger and harder to grasp without adding any new information. The choice of identifiers should make the code mostly self documenting
I believe use diagnostics to be unnecessary. Most messages are fairly self-explanatory, and diagnostics can produce large amounts of unnecessary output
Because you are opening multiple files it is more concise to use autodie which will avoid the need to explicitly test every open call for success
It is much better to use lexical file handles, such as open my $fh ... instead of global ones, like open FH .... For one thing a lexical file handle will be implicitly closed when it goes out of scope, which helps to tidy up the program a lot by making explicit close calls unnecessary
I have removed all of the variable declarations from the top of the program except those that are non-empty. This approach is considered to be best practice as it aids debugging and assists the writing of clean code
The program lower-cases the original word using lc before checking to see if there is a matching entry in the hash. If a translation is found, then the new word is capitalised using ucfirst if the original word started with a capital letter
I have written a regular expression that will take the next sentence from the beginning of the string $content. But this is one of the things that I can't get right without sample data, and there may well be problems, for instance, with sentences that end with a closing quotation mark or a closing parenthesis
use strict;
use warnings;
use autodie;
my $lexicon = 'lc_shakespeare_lexicon.txt';
my $dir = '/home/nelly/Desktop/input';
my $existing_dir = '/home/nelly/Desktop';
my $sentences = 'Sentences.txt';
my %lexicon = do {
open my ($fh), '<', $lexicon;
local $/;
reverse(<$fh> =~ /[^\t\n\r]+/g);
};
my #files = do {
opendir my ($dh), $dir;
grep /WSJ/, readdir $dh;
};
for my $file (#files) {
my $contents = do {
open my $fh, '<', "$dir/$file";
join '', grep { not /\A\.START/ } <$fh>;
};
# Change any CR or LF to a space, and reduce multiple spaces to single spaces
$contents =~ tr/\r\n/ /;
$contents =~ s/ {2,}/ /g;
# Find and process each sentence
while ( $contents =~ / \s* (.+?[.?!]) (?= \s+ [A-Z] | \s* \z ) /gx ) {
my $sentence = $1;
my #words = split ' ', $sentence;
my $changed;
for my $word (#words) {
my $eng_word = $lexicon{lc $word};
$eng_word = ucfirst $eng_word if $word =~ /\A[A-Z]/;
if ($eng_word) {
$word = $eng_word;
++$changed;
}
}
if ($changed) {
mkdir $existing_dir unless -d $existing_dir;
open my $out_fh, '>>', "$existing_dir/$sentences";
print "#words\n";
}
}
}
I am trying to rename the existing file name with Kernel.txt on the basis of "Linux kernel Version" or "USB_STATE=DISCONNECTED". Script is running without any error but no output is coming. The changed file needs to be in the same folder(F1,F2,F3) as it was earlier.
Top dir: Log
SubDir: F1,F2,F3
F1: .bin file,.txt file,.jpg file
F2: .bin file,.txt file,.jpg file
F3: .bin file,.txt file,.jpg file
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Basename;
use File::Spec;
use Cwd;
chdir('C:\\doc\\logs');
my $dir_01 = getcwd;
my $all_file=find ({ 'wanted' => \&renamefile }, $dir_01);
sub renamefile
{
if ( -f and /.txt?/ )
{
my #files = $_;
foreach my $file (#files)
{
open (FILE,"<" ,$file) or die"Can not open the file";
my #lines = <FILE>;
close FILE;
for my $line ( #lines )
{
if($line=~ /Linux kernel Version/gi || $line=~ /USB_STATE=DISCONNECTED/gi)
{
my $dirname = dirname($file); # file's directory, so we rename only the file itself.
my $file_name = basename($file); # File name fore renaming.
my $new_file_name = $file_name;
$new_file_name =~ s/.* /Kernal.txt/g; # replace the name with Kernal.txt
rename($file, File::Spec->catfile($dirname, $new_file_name)) or die $!;
}
}
}
}
}
This code looks a bit like cargo-cult programming. That is, some constructs are here without indication that you are understanding what this is doing.
chdir('C:\\doc\\logs');
my $dir_01 = getcwd;
Do yourself a favour and use forward slashes, even for Windows pathnames. This is generally supported.
Your directory diagram says that there is a top dir Log, yet you chdir to C:/doc/logs. What is it?
You do realize that $dir_01 is a very nondescriptive name, and is the path you just chdir'd to? Also, File::Find does not require you to start in the working directory. That is, the chdir is a bit useless here. You actually want:
my $start_directory = "C:/doc/Log"; # or whatever
my $all_file=find ({ 'wanted' => \&renamefile }, $dir_01);
I'm not sure what the return value of find would mean. But I'm sure that we don't have to put it into some unused variable.
When we provide key names with the => fat comma, we don't have to manually quote these keys. Therefore:
find({ wanted => \&renamefile }, $start_directory);
/.txt?/
This regex does the following:
match any character (that isn't a newline),
followed by literal tx,
and optionally a t. the ? is a zero-or-one quantifier.
If you want to match filenames that end with .txt, you should do
/\.txt$/
the \. matches a literal period. The $ anchors the regex at the end of the string.
my #files = $_;
foreach my $file (#files) {
...;
}
This would normally be written as
my $file = $_;
...;
You assign the value of $_ to the #files array, which then has one element: The $_ contents. Then you loop over this one element. Such loops don't deserve to be called loops.
open (FILE,"<" ,$file) or die"Can not open the file";
my #lines = <FILE>;
close FILE;
for my $line ( #lines )
{ ... }
Ah, where to begin?
Use lexical variables for file handles. These have the nice property of closing themselves.
For error handling, use autodie. If you really want to do it yourself, the error message should contain two important pieces of information:
the name of the file you couldn't open ($file)
the reason why the open failed ($!)
That would mean something like ... or die "Can't open $file: $!".
Don't read the whole file into an array and loop over that. Instead, be memory-efficient and iterate over the lines, using a while(<>)-like loop. This only reads one line at a time, which is much better.
Combined, this would look like
use autodie; # at the top
open my $fh, "<", $file;
LINE: while (<$fh>) {
...; # no $line variable, let's use $_ instead
}
Oh, and I labelled the loop (with LINE) for later reference.
if($line=~ /Linux kernel Version/gi || $line=~ /USB_STATE=DISCONNECTED/gi) { ... }
Putting the /g flag on regexes turns them into an iterator. you really don't want that. And I'm not quite sure if that case-insensitive matching is really neccessary. You can move the || or into the regex, with the regex alternation |. As we now use $_ to contain the lines, we don't have to manually bind the regex to a string. Therefore, we can write:
if (/Linux Kernel Version|USB_STATE=DISCONNECTED/i) { ... }
my $dirname = dirname($file); # file's directory, so we rename only the file itself.
my $file_name = basename($file); # File name fore renaming.
The by default, the original $_, and therefore our $file, only contains the filename, but not the directory. This isn't a problem: File::Find chdir'd into the correct directory. This makes our processing a lot easier. If you want to have the directory, use the $File::Find::dir variable.
my $new_file_name = $file_name;
$new_file_name =~ s/.* /Kernal.txt/g;
The /.* / regex says:
match anything up to including the last space
If this matches, replace the matched part with Kernal.txt.
The /g flag is completely useless here. Are you sure you don't want Kernel.txt with an e? And why the space in the filename? I don't quite understand that. If you want to rename the file to Kernel.txt, just assign that as a string, instead of doing weird stuff with substitutions:
my $new_file_name = "Kernel.txt";
rename($file, File::Spec->catfile($dirname, $new_file_name)) or die $!;
We already established that an error message should also include the filename, or even better: we should use automatic error handling.
Also, we are already in the correct directory, so we don't have to concatenate the new name with the directory.
rename $file => $new_file_name; # error handling by autodie
last LINE;
That should be enough. Also note that I leave the LINE loop. Once we renamed the file, there is no need to check the other lines as well.
I need to compare the big file(2GB) contains 22 million lines with the another file. its taking more time to process it while using Tie::File.so i have done it through 'while' but problem remains. see my code below...
use strict;
use Tie::File;
# use warnings;
my #arr;
# tie #arr, 'Tie::File', 'title_Nov19.txt';
# open(IT,"<title_Nov19.txt");
# my #arr=<IT>;
# close(IT);
open(RE,">>res.txt");
open(IN,"<input.txt");
while(my $data=<IN>){
chomp($data);
print"$data\n";
my $occ=0;
open(IT,"<title_Nov19.txt");
while(my $line2=<IT>){
my $line=$line2;
chomp($line);
if($line=~m/\b$data\b/is){
$occ++;
}
}
print RE"$data\t$occ\n";
}
close(IT);
close(IN);
close(RE);
so help me to reduce it...
Lots of things wrong with this.
Asides from the usual (lack of use strict, use warnings, use of 2-argument open(), not checking open() result, use of global filehandles), the specific problem in your case is that you are opening/reading/closing the second file once for every single line of the first. This is going to be very slow.
I suggest you open the file title_Nov19.txt once, read all the lines into an array or hash or something, then close it; and then you can open the first file, input.txt and walk along that once, comparing to things in the array so you don't have to reopen that second file all the time.
Futher I suggest you read some basic articles on style/etc.. as your question is likely to gain more attention if it's actually written in vaguely modern standards.
I tried to build a small example script with a better structure but I have to say, man, your problem description is really very unclear. It's important to not read the whole comparison file each time as #LeoNerd explained in his answer. Then I use a hash to keep track of the match count:
#!/usr/bin/env perl
use strict;
use warnings;
# cache all lines of the comparison file
open my $comp_file, '<', 'input.txt' or die "input.txt: $!\n";
chomp (my #comparison = <$comp_file>);
close $comp_file;
# prepare comparison
open my $input, '<', 'title_Nov19.txt' or die "title_Nov19.txt: $!\n";
my %count = ();
# compare each line
while (my $title = <$input>) {
chomp $title;
# iterate comparison strings
foreach my $comp (#comparison) {
$count{$comp}++ if $title =~ /\b$comp\b/i;
}
}
# done
close $input;
# output (sorted by count)
open my $output, '>>', 'res.txt' or die "res.txt: $!\n";
foreach my $comp (#comparison) {
print $output "$comp\t$count{$comp}\n";
}
close $output;
Just to get you started... If someone wants to further work on this: these were my test files:
title_Nov19.txt
This is the foo title
Wow, we have bar too
Nothing special here but foo
OMG, the last title! And Foo again!
input.txt
foo
bar
And the result of the program was written to res.txt:
foo 3
bar 1
Here's another option using memowe's (thank you) data:
use strict;
use warnings;
use File::Slurp qw/read_file write_file/;
my %count;
my $regex = join '|', map { chomp; $_ = "\Q$_\E" } read_file 'input.txt';
for ( read_file 'title_Nov19.txt' ) {
my %seen;
!$seen{ lc $1 }++ and $count{ lc $1 }++ while /\b($regex)\b/ig;
}
write_file 'res.txt', map "$_\t$count{$_}\n",
sort { $count{$b} <=> $count{$a} } keys %count;
Numerically-sorted output to res.txt:
foo 3
bar 1
An alternation regex which quotes meta characters (\Q$_\E) is built and used, so only one pass against the large file's lines is needed. The hash %seen is used to insure that the input words are only counted once per line.
Hope this helps!
Try this:
grep -i -c -w -f input.txt title_Nov19.txt > res.txt
in Unix, what I want to do is "history | grep keyword", just because it takes quite some steps if i wanna grep many types of keywords, so I want it to be automation, which I write a Perl script to do everything, instead of repeating the commands by just changing the keyword, so whenever I want to see those certain commands, I will just use the Perl script to do it for me.
The keyword that I would like to 'grep' is such as source, ls, cd, etc.
It can be printed out in any format, as long as to know how to do it.
Thanks! I appreciate any comments.
modified (thanks to #chas-owens)
#!/bin/perl
my $searchString = $ARGV[0];
my $historyFile = ".bash.history";
open FILE, "<", $historyFile or die "could not open $historyFile: $!";
my #line = <FILE>;
print "Lines that matched $searchString\n";
for (#lines) {
if ($_ =~ /$searchString/) {
print "$_\n";
}
}
original
#!/bin/perl
my $searchString = $ARGV[0];
my $historyFile = "<.bash.history";
open FILE, $historyFile;
my #line = <FILE>;
print "Lines that matched $searchString\n";
for (#lines) {
if ($_ =~ /$searchString/) {
print "$_\n";
}
}
to be honest ... history | grep whatever is clean and simple and nice ; )
note code may not be perfect
because it takes quite some steps if i wanna grep many types of keywords
history | grep -E 'ls|cd|source'
-P will switch on the Perl compatible regular expression library, if you have a new enough version of grep.
This being Perl, there are many ways to do it. The simplest is probably:
#!/usr/bin/perl
use strict;
use warnings;
my $regex = shift;
print grep { /$regex/ } `cat ~/.bash_history`;
This runs the shell command cat ~/.bash_history and returns the output as a list of lines. The list of lines is then consumed by the grep function. The grep function runs the code block for every item and only returns the ones that have a true return value, so it will only return lines that match the regex.
This code has several things wrong with it (it spawns a shell to run cat, it holds the entire file in memory, $regex could contain dangerous things, etc.), but in a safe environment where speed/memory isn't an issue, it isn't all that bad.
A better script would be
#!/usr/bin/perl
use strict;
use warnings;
use constant HISTORYFILE => "$ENV{HOME}/.bash_history";
my $regex = shift;
open my $fh, "<", HISTORYFILE
or die "could not open ", HISTORYFILE, ": $!";
while (<$fh>) {
next unless /$regex/;
print;
}
This script uses a constant to make it easier to change which history file it is using at a latter date. It opens the history file directly and reads it line by line. This means the whole file is never in memory. This can be very important if the file is very large. It still has the problem that $regex might contain a harmful regex, but so long as you are the person running it, you only have yourself to blame (but I wouldn't let outside users pass arguments to a command like this through, say a web application).
I think you are better off writing a perlscript which does you fancy matching (i.e. replaces the grep) but does not read the history file. I say this because the history does not appear to be flushed to the .bash_history file until I exit the shell. Now there are probably settings and/or environment variables to control this, but I don't know what they are. So if you just write a perl script which scanns STDIN for your favourite commands you can invoke it like
history | findcommands.pl
If its less typing you are after set up a shell function or alias to do this for you.
As requested by #keifer here is a sample perl script which searches for a specified (or default set of commands in your history). Onbiously you should change the dflt_cmds to whichever ones you search for most frequently.
#!/usr/bin/perl
my #dflt_cmds = qw( cd ls echo );
my $cmds = \#ARGV;
if( !scalar(#$cmds) )
{
$cmds = \#dflt_cmds;
}
while( my $line = <STDIN> )
{
my( $num, $cmd, #args ) = split( ' ', $line );
if( grep( $cmd eq $_ , #$cmds ) )
{
print join( ' ', $cmd, #args )."\n";
}
}
The script below takes function names in a text file and scans on a
folder that contains multiple c,h files. It opens those files one-by-one and
reads each line. If the match is found in any part of the files, it prints the
line number and the line that contains the match.
Everything is working fine except that the comparison is not working properly. I would be very grateful to whoever solves my problem.
#program starts:
use FileHandle;
print "ENTER THE PATH OF THE FILE THAT CONTAINS THE FUNCTIONS THAT YOU WANT TO
SEARCH: ";#getting the input file
our $input_path = <STDIN>;
$input_path =~ s/\s+$//;
open(FILE_R1,'<',"$input_path") || die "File open failed!";
print "ENTER THE PATH OF THE FUNCTION MODEL: ";#getting the folder path that
#contains multiple .c,.h files
our $model_path = <STDIN>;
$model_path =~ s/\s+$//;
our $last_dir = uc(substr ( $model_path,rindex( $model_path, "\\" ) +1 ));
our $output = $last_dir."_FUNC_file_names";
while(our $func_name_input = <FILE_R1> )#$func_name_input is the function name
#that is taken as the input
{
$func_name_input=reverse($func_name_input);
$func_name_input=substr($func_name_input,rindex($func_name_input,"\("+1);
$func_name_input=reverse($func_name_input);
$func_name_input=substr($func_name_input,index($func_name_input," ")+1);
#above 4 lines are func_name_input is choped and only part of the function
#name is taken.
opendir FUNC_MODEL,$model_path;
while (our $file = readdir(FUNC_MODEL))
{
next if($file !~ m/\.(c|h)/i);
find_func($file);
}
close(FUNC_MODEL);
}
sub find_func()
{
my $fh1 = FileHandle->new("$model_path//$file") or die "ERROR: $!";
while (!$fh1->eof())
{
my $func_name = $fh1->getline(); #getting the line
**if($func_name =~$func_name_input)**#problem here it does not take the
#match
{
next if($func_name=~m/^\s+/);
print "$.,$func_name\n";
}
}
}
$func_name_input=substr($func_name_input,rindex($func_name_input,"\("+1);
You're missing an ending parenthesis. Should be:
$func_name_input=substr($func_name_input,rindex($func_name_input,"\(")+1);
There's probably an easier way than those four statements, too. But it's a little early to wrap my head around it all. Do you want to match "foo" in "function foo() {"? If so, you could use a regex like /\s+([^) ]+)/.
When you say $func_name =~$func_name_input, you're treating all characters in $func_name_input as special regex characters. If this is not what you mean to do, you can use quotemeta (perldoc -f quotemeta): $func_name =~quotemeta($func_name_input) or $func_name =~ qr/\Q$func_name_input\E/.
Debugging will be easier with strictures (and a syntax-hilighting editor). Also note that, if you're not using those variables in other files, "our" doesn't do anything "my" wouldn't do for file-scoped variables.
find + xargs + grep does 90% of what you want.
find . -name '*.[c|h]' | xargs grep -n your_pattern
ack does it even easier.
ack --type=cc your_pattern
Simply take your list of patterns from your file and "or" them together.
ack --type=cc 'foo|bar|baz'
This has the benefit of only search the files once, and not once for each pattern being searched for as you're doing.
I still think you should just use ack, but your code needed some serious love.
Here is an improved version of your program. It now takes the directory to search and patterns on the command line rather than having to ask for (and the user write) files. It searches all the files under the directory, not just the ones in the directory, using File::Find. It does this in one pass by concatenating all the patterns into regular expressions. It uses regexes instead of index() and substr() and reverse() and oh god. It simply uses built in filehandles rather than the FileHandle module and checking for eof(). Everything is declared lexical (my) instead of global (our). Strict and warnings are on for easier debugging.
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
die "Usage: search_directory function ...\n" unless #ARGV >= 2;
my $Search_Dir = shift;
my $Pattern = build_pattern(#ARGV);
find(
{
wanted => sub {
return unless $File::Find::name =~ m/\.(c|h)$/i;
find_func($File::Find::name, $pattern);
},
no_chdir => 1,
},
$Search_Dir
);
# Join all the function names into one pattern
sub build_pattern {
my #patterns;
for my $name (#_) {
# Turn foo() into foo. This replaces all that reverse() and rindex()
# and substr() stuff.
$name =~ s{\(.*}{};
# Use \Q to protect against regex metacharacters in the input
push #patterns, qr{\Q$name\E};
}
# Join them up into one pattern.
return join "|", #patterns;
}
sub find_func {
my( $file, $pattern ) = #_;
open(my $fh, "<", $file) or die "Can't open $file: $!";
while (my $line = <$fh>) {
# XXX not all functions are unindented, but your choice
next if $line =~ m/^\s+/;
print "$file:$.: $line" if $line =~ $pattern;
}
}