Read multiple files with the same extension in one directory in Perl

I currently have an issue with reading files in one directory.
I need to take all the fastq files in a directory, run the script on each file, and put the new files in an ‘Edited_sequences’ folder.
The one-liner I have is
perl -ne '$i++; if($i<80001){print}' BM2003_TCCCAGAACAAC_L001_R1_001.fastq > ./Edited_sequences/BM2003_TCCCAGAACAAC_L001_R1_001.fastq
It takes the first 80000 lines of one fastq file and writes them to the output file.
With, say, 2000 fastq files, I would have to copy and paste this command 2000 times.
I know that glob is suited to this situation, but I do not know how to use it.
Please help me out.

You can let Perl do the copying and pasting for you. The leading arguments (*.fastq) are all the fastq files, and the last argument (./Edited_sequences) is the target folder for the new files:
perl -e '$d=pop; `head -n 80000 "$_" > "$d/$_"` for @ARGV' *.fastq ./Edited_sequences

glob returns a list of the filenames matching a pattern. It's frequently written with <> brackets, much like reading input (you can think of it as reading filenames from a directory).
This is a simple example that will print the name of every ".fastq" file in the current directory:
print "$_\n" for <*.fastq>;
The important part is <*.fastq>, which gives us the list of filenames matching that pattern (in this case, a file extension). If you need to change which directory your Perl script is working in, you can use chdir.
From there, we can process your files as needed:
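while (my $filename = <*.fastq>) {
    open(my $in, '<', $filename) or die "Can't read $filename: $!";
    open(my $out, '>', "./Edited_sequences/$filename") or die "Can't write ./Edited_sequences/$filename: $!";
    for (1..80000) {
        my $line = <$in>;
        last unless defined $line;   # the file has fewer than 80000 lines
        print $out $line;
    }
    close($out) or die "Can't close ./Edited_sequences/$filename: $!";
    close($in);
}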
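The same idea can be wrapped in reusable subs. This is only a sketch: copy_head and head_all_fastq are hypothetical names, the output directory is created if it is missing, and the directory name is assumed to contain no spaces or glob metacharacters.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Copy at most $max_lines lines from $src to $dst; returns the number copied.
sub copy_head {
    my ($src, $dst, $max_lines) = @_;
    open(my $in,  '<', $src) or die "Can't read $src: $!";
    open(my $out, '>', $dst) or die "Can't write $dst: $!";
    my $copied = 0;
    # Stop at the line limit or at end of file, whichever comes first.
    while ($copied < $max_lines and defined(my $line = <$in>)) {
        print {$out} $line;
        $copied++;
    }
    close($out) or die "Can't close $dst: $!";
    close($in);
    return $copied;
}

# Run copy_head on every .fastq file in $src_dir; returns the number of files handled.
sub head_all_fastq {
    my ($src_dir, $dst_dir, $max_lines) = @_;
    -d $dst_dir or mkdir $dst_dir or die "Can't create $dst_dir: $!";
    my $count = 0;
    for my $path (glob("$src_dir/*.fastq")) {
        copy_head($path, "$dst_dir/" . basename($path), $max_lines);
        $count++;
    }
    return $count;
}
```

For the question as asked, the call would be head_all_fastq('.', './Edited_sequences', 80000).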

You have two choices:
Use Perl to read in the 2000 files and run it as part of your program
Use the shell to pass each of those 2000 files to your command line
Here's the bash alternative:
for file in *.fastq
do
perl -ne '$i++; if($i<80001){print}' "$file" > "./Edited_sequences/$file"
done
This is your same Perl one-liner, but with the shell finding each file. It should work without overloading the command line, because a bash for loop that is handed a glob expands it correctly.
However, I always recommend that you don't execute the commands right away, but first echo the resulting commands into a file:
for file in *.fastq
do
echo "perl -ne '\$i++; if(\$i<80001){print}' \
\"$file\" > \"./Edited_sequences/$file\"" >> myoutput.txt
done
Then, you can look at myoutput.txt to make sure it looks good before you actually do any real harm. Once you've determined that myoutput.txt is a good file, you can execute that as a shell script:
$ bash myoutput.txt

Related

Perl regular expression loop through all the directory and get specific file

I would like to translate a Unix shell pattern into Perl to find specific files that meet a condition.
Suppose I have a Perl script called totalResult.pl in the directory /nfs/cs/test_case/y2016. This directory also contains many directories such as testWeek1, testWeek2, testWeek3, etc. Each of them contains sub-directories such as testCase1, testCase2, testCase3, etc., and each testCase directory contains a file called .test_result, whose contents record the result as either success or fail.
I can get the file information with a Unix command, for example:
wc /nfs/cs/test_case/y2016/testWeek1/testCase1/.test_result
If I would like to find the .test_result files in every directory and sub-directory that contain fail, I can do it from the current path /nfs/cs/test_case/y2016 in Unix like this:
grep -ri "fail" */*/.test_result
It will give me the output:
/nfs/cs/test_case/y2016/testWeek1/testCase1/.test_result:fail
/nfs/cs/test_case/y2016/testWeek3/testCase45/.test_result:fail
/nfs/cs/test_case/y2016/testWeek4/testCase12/.test_result:fail
.
.
...etc
How can I achieve this in a Perl script, so that just running perl testCase.pl gives the same output? I'm new to Unix and Perl; can anyone help?
# Collect names of all test files
my @TestFiles = glob('/nfs/cs/test_case/y2016/*/*/.test_result');
# Check test files for "fail" (case-insensitively, like grep -i)
foreach my $TestFile ( @TestFiles ) {
    open(my $T,'<',$TestFile) or die "Can't open < $TestFile: $!";
    while(<$T>){
        if( /fail/i ) {
            chomp;
            print $TestFile,":",$_,"\n";
        }
    }
    close($T);
}
You can also execute the same Linux command from within Perl using the backtick (`) operator.
@result=`grep -ri "fail" */*/.test_result`;
print @result;

How to read a .conf file in Perl

I just created a text file, test.conf, with some information in it. How can I read it in Perl?
I am new to Perl and I am not sure what I need to do.
I tried the following:
C:\Perl\Perl_Project>perl
#!/usr/local/bin/perl
open (MYFILE, 'test.conf');
while (<MYFILE>)
{ chomp; print "$_\n"; }
close (MYFILE);
I installed Perl on my laptop, which runs Windows 7, and I am using the command line.
Instead of typing the program at the command line, write it in a file (any editor will do; I would suggest Notepad++) and save it as myprogram.pl in the same directory as your .conf file.
use warnings;
use strict;
open my $fh, "<", "test.conf" or die $!;
while (<$fh>)
{
chomp;
print "$_\n";
}
close $fh;
Now open a command prompt, go to the directory that holds both myprogram.pl and test.conf, and execute your program by typing this:
perl myprogram.pl
You can also give the full path of your input file inside the program; you can then run it from any directory by giving the full path of the program:
perl path\to\myprogram.pl
Side note: Always put use warnings; and use strict; at the top of your program, and always open files with a lexical filehandle, the three-argument form of open, and error handling.
This is an extended comment more than an answer, as I believe @serenesat has given you everything you need to execute your program.
When you do "command line" Perl, it's typically stuff that is relatively brief or trivial, such as:
perl -e "print 2 ** 16"
Anything that goes beyond a few lines, and you're probably better off putting that in a file and having Perl run the file. You certainly can put larger programs on the command line, but when it comes to going back in and editing lines, it becomes more of a hassle than a shortcut.
Also, for what it's worth the -n and -p parameters allow you to process the contents of a stream, meaning you could do something like this:
perl -ne "print if /oracle/i" test.conf
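For reference, -n wraps the -e code in an implicit while (<>) { ... } loop, and -p does the same but also prints each line afterwards. Here is a rough sketch of the equivalent explicit loop; matching_lines is a hypothetical helper name, and it takes a filehandle rather than reading the files named on the command line.

```perl
use strict;
use warnings;

# Return the lines read from $fh that match $pattern (a qr// object).
sub matching_lines {
    my ($fh, $pattern) = @_;
    my @hits;
    while (my $line = <$fh>) {                    # -n supplies this loop implicitly
        push @hits, $line if $line =~ $pattern;   # the -e code runs once per line
    }
    return @hits;
}

# Example use in a script:
#   open(my $fh, '<', 'test.conf') or die "Can't open test.conf: $!";
#   print matching_lines($fh, qr/oracle/i);
```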

how to create a script from a perl script which will use bash features to copy a directory structure

Hi, I have written a Perl script that copies an entire directory structure from source to destination. I then need the Perl script to create a restore script that undoes what it has done: a shell script that uses bash features to restore the contents from the destination back to the source. I am struggling to find the right function or command. Copying recursively is not a requirement, but I want exactly the same structure as before.
Below is how I am trying to create a file called restore to do the restoration.
I am particularly looking for an algorithm.
restore should also restore the structure to a directory given on the command line if one is supplied; if not, you can assume the defaults supplied to the Perl script:
$source
$target
In this case we would want to copy from target to source.
So we have two different parts in one script:
1. It copies from source to destination.
2. It creates a script file that will undo what part 1 has done.
I hope this makes it clear.
unless(open FILE, '>'."$source/$file")
{
# Die with error message
# if we can't open it.
die "\nUnable to create $file\n";
}
# Write some text to the file.
print FILE "#!/bin/sh\n";
print FILE "$1=$target;\n";
print FILE "cp -r \n";
# close the file.
close FILE;
# here we change the permissions of the file
chmod 0755, "$source/$file";
The last problem I have is that I can't get a literal $1 into my restore file, because Perl treats it as one of its own variables.
I need it to read the command-line argument when I run restore, e.g. $0 = ./restore and $1 = /home/xubuntu/User.
First off, the standard way in Perl to write this:
unless (open FILE, '>'."$source/$file") {
    die "\nUnable to create $file\n";
}
is with the or operator:
open my $file_fh, ">", "$source/$file"
    or die "Unable to create '$file': $!";
It's just easier to understand.
A more modern way would be use autodie;, which makes built-ins such as open and close die with a descriptive message when they fail.
use strict;
use warnings;
use autodie;
open my $file_fh, '>', "$source/$file";
You should look at the Perl Modules File::Find, File::Basename, and File::Copy for copying files and directories:
use File::Find;
use File::Basename;
use File::Copy;
my @file_list;
find ( sub {
        return unless -f;
        push @file_list, $File::Find::name;
    },
    $directory );
Now, @file_list will contain all the files in $directory.
for my $file ( @file_list ) {
    my $directory = dirname $file;
    mkdir $directory unless -d $directory;
    copy $file, ...;
}
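Note that autodie will also terminate your program if mkdir fails. copy comes from File::Copy rather than being a built-in, so a plain use autodie; does not cover it; check its return value yourself (copy(...) or die ...).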
I didn't fill in the copy command because where you want to copy and how may differ. Also you might prefer use File::Copy qw(cp); and then use cp instead of copy in your program. The copy command will create a file with default permissions while the cp command will copy the permissions.
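As a rough illustration of how these modules can fit together, the sketch below recreates a tree under a new root. It assumes File::Path's make_path (not mentioned above) for creating nested directories and uses cp so that permissions are carried over; copy_tree is a hypothetical name, and symlinks and other special files are simply skipped.

```perl
use strict;
use warnings;
use File::Find;
use File::Copy qw(cp);
use File::Path qw(make_path);
use File::Spec;

# Recreate the tree under $source inside $target; returns the number of files copied.
sub copy_tree {
    my ($source, $target) = @_;
    my $copied = 0;
    find({
        no_chdir => 1,    # keep the working directory fixed so relative paths stay valid
        wanted   => sub {
            my $from = $File::Find::name;
            my $rel  = File::Spec->abs2rel($from, $source);
            my $dest = $rel eq '.' ? $target : File::Spec->catfile($target, $rel);
            if (-d $from) {
                make_path($dest);    # directories are visited before their contents
            }
            elsif (-f $from) {
                cp($from, $dest) or die "Copy to $dest failed: $!";
                $copied++;
            }
        },
    }, $source);
    return $copied;
}
```

Swapping the two arguments, copy_tree($target, $source), gives the restore direction.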
You didn't explain why you wanted a bash shell command. I suspect you wanted to use it for the directory copy, but you can do that in Perl anyway. If you still need to create a shell script, the easiest way is via a here-document:
print {$file_fh} <<END_OF_SHELL_SCRIPT;
Your shell script goes here
and it can contain as many lines as you need.
Since there are no quotes around `END_OF_SHELL_SCRIPT`,
Perl variables will be interpolated
This is the last line. The END_OF_SHELL_SCRIPT marks the end
END_OF_SHELL_SCRIPT
close $file_fh;
See Here-docs in Perldoc.
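On the $1 problem from the question: inside an interpolating here-document, a backslash in front of the dollar sign (\$1) keeps it literal, so the shell sees $1 when the script runs. A minimal sketch follows; restore_script_text and write_restore_script are hypothetical names, the generated script uses cp -Rp as one possible way to copy back, and the paths are assumed not to contain double quotes or other shell-special characters.

```perl
use strict;
use warnings;

# Build the text of a restore script. $source and $target are interpolated by
# Perl now; \$1 and \$dest are escaped so the shell expands them at run time.
sub restore_script_text {
    my ($source, $target) = @_;
    return <<"END_OF_SHELL_SCRIPT";
#!/bin/sh
# Restore to the directory given as \$1, or to the original source by default.
dest="\${1:-$source}"
mkdir -p "\$dest"
cp -Rp "$target/." "\$dest/"
END_OF_SHELL_SCRIPT
}

# Write the script to $path and make it executable.
sub write_restore_script {
    my ($path, $source, $target) = @_;
    open(my $fh, '>', $path) or die "Unable to create $path: $!";
    print {$fh} restore_script_text($source, $target);
    close($fh) or die "Unable to close $path: $!";
    chmod 0755, $path or die "Unable to chmod $path: $!";
    return $path;
}
```

If the script needs no Perl variables at all, a single-quoted terminator (<<'END_OF_SHELL_SCRIPT') turns interpolation off and no escaping is needed.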
First, I see that you want to make a copy-script - because if you only need to copy files, you can use:
system("cp -r /sourcepath /targetpath");
Second, if you need to copy subfolders, you can use -r switch, can't you?

How do I run a Perl script on multiple input files with the same extension?

How do I run a Perl script on multiple input files with the same extension?
perl scriptname.pl file.aspx
I'm looking to have it run for all aspx files in the current directory
Thanks!
In your Perl file,
my @files = <*.aspx>;
for my $file (@files) {
    # do something.
}
The <*.aspx> is called a glob.
You can pass those files to Perl with a wildcard.
In your script:
foreach (@ARGV) {
    print "file: $_\n";
    # open your file here...
    # ...do something
    # close your file
}
On the command line:
$ perl myscript.pl *.aspx
You can use glob explicitly to expand the arguments without depending too much on the shell's behaviour.
for my $file ( map {glob($_)} @ARGV ) {
    print $file, "\n";
}
You may need to guard against the same filename appearing twice when more than one argument expands to it.
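One common way to handle such duplicates is a %seen hash that keeps only the first occurrence of each name. This is a sketch, and expand_unique is a hypothetical helper name.

```perl
use strict;
use warnings;

# Expand each pattern with glob and drop names already seen, keeping first-seen order.
sub expand_unique {
    my @patterns = @_;
    my %seen;
    return grep { !$seen{$_}++ } map { glob($_) } @patterns;
}

# Example use in a script:
#   for my $file (expand_unique(@ARGV)) { print "$file\n" }
```

Note that glob returns a pattern without wildcards unchanged even when no such file exists, so the caller may still want a -f check.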
For a simple one-liner with -n or -p, you want
perl -i~ -pe 's/foo/bar/' *.aspx
The -i~ says to modify each target file in place, and leave the original as a backup with an ~ suffix added to the file name. (Omit the suffix to not leave a backup. But if you are still learning or experimenting, that's a bad idea; removing the backups when you're done is a much smaller hassle than restoring the originals from a backup if you mess something up.)
If your Perl code is too complex for a one-liner (or just useful enough to be reusable) obviously replace -e '# your code here' with scriptname.pl ... though then maybe refactor scriptname.pl so that it accepts a list of file name arguments, and simply use scriptname.pl *.aspx to run it on all *.aspx files in the current directory.
If you need to recurse a directory structure and find all files with a particular naming pattern, the find utility is useful.
find . -name '*.aspx' -exec perl -pi~ -e 's/foo/bar/' {} +
If your find does not support -exec ... + try with -exec ... \; though it will be slower and launch more processes (one per file you find instead of as few as possible to process all the files).
To only scan some directories, replace . (which names the current directory) with a space-separated list of the directories to examine, or even use find to find the directories themselves (and then perhaps explore -execdir for doing something in each directory that find selects with your complex, intricate, business-critical, maybe secret list of find option predicates).
Maybe also explore find2perl to do this directory recursion natively in Perl.
If you are on a Linux machine, you could try something like this.
for i in /tmp/*.aspx; do perl scriptname.pl "$i"; done
For example, to handle perl scriptname.pl *.aspx *.asp:
On Linux, the shell expands wildcards, so the Perl can simply be
for (@ARGV) {
    operation($_); # do something with each file
}
The Windows shell doesn't expand wildcards, so expand the wildcards in each argument in Perl as follows. The for loop then processes each file in the same way as above.
for (map {glob} @ARGV) {
    operation($_); # do something with each file
}
For example, this will print the expanded list under Windows:
print "$_\n" for (map {glob} @ARGV);
You can also pass the path to the directory that holds your aspx files and read them one by one.
#!/usr/bin/perl
use strict;
use warnings;
my $path = shift // '.';   # directory given on the command line, else the current one
my @files = glob("$path/*.aspx");
foreach my $file (@files) {
    # do something...
}

How can I scan multiple log files to find which ones have a particular IP address in them?

Recently there have been a few attackers trying malicious things on my server so I've decided to somewhat "track" them even though I know they won't get very far.
Now, I have an entire directory containing the server logs and I need a way to search through every file in the directory, and return a filename if a string is found. So I thought to myself, what better of a language to use for text & file operations than Perl? So my friend is helping me with a script to scan all files for a certain IP, and return the filenames that contain the IP so I don't have to search for the attacker through every log manually. (I have hundreds)
#!/usr/bin/perl
$dir = ".";
opendir(DIR, "$dir");
@files = grep(/\.*$/,readdir(DIR));
closedir(DIR);
foreach $file(@files) {
    open FILE, "$file" or die "Unable to open files";
    while(<FILE>) {
        print if /12.211.23.200/;
    }
}
However, it is giving me directory read errors. Any assistance is greatly appreciated.
EDIT: Code edited, still saying permission denied cannot open directory on line 10. I am just going to run the script from within the logs directory if you are questioning the directory change to "."
Mike.
Can you use grep instead?
To get all the lines with the IP, I would use grep directly; there is no need to list the files first. It's a simple command:
grep '12\.211\.23\.200' *
I like to redirect the output to another file and then open that file in an editor...
If you insist on wanting the filenames, that's also easy:
grep -l '12\.211\.23\.200' *
grep is available on all Unix/Linux systems with the GNU tools, and on Windows through one of the many implementations (unxutils, cygwin, etc.).
You have to concatenate $dirname with $filname when using files found through readdir; remember that you haven't chdir'ed into the directory where those files reside.
open FH, "<", "$dirname/$filname" or die "Cannot open $filname:$!";
Incidentally, why not just use grep -r to recursively search all subdirectories under your log dir for your string?
EDIT: I see your edits, and two things. First, this line:
@files = grep(/\.*$/,readdir(DIR));
Is not effective, because you are searching for zero or more . characters at the end of the string. Since it's zero or more, it'll match everything in the directory. If you're trying to exclude files ending in ., try this:
@files = grep(!/\.$/,readdir(DIR));
Note the ! sign for negation if you're trying to exclude those files. Otherwise (if you only want those files and I'm misunderstanding your intent), leave the ! out.
In any case, if you're getting your die message on line 10, most likely you're hitting a file that has permissions such that you can't read it. Try putting the filename in the die output so you can see which file it's failing on:
open FILE, "$file" or die "Unable to open file: $file";
But as with other answers, and to reiterate: Why not use grep? The unix command, not the Perl function.
This will get the file names you are looking for in Perl, and will probably be much faster than opening each file and running a Perl regex over it.
@files = `find ~/ServerLogs -name "*.log" | xargs grep -l "<ip address>"`;
Although, this will require a *nix compliant system, or Cygwin on Windows.
First, get a list of the files in your source directory:
opendir(DIR, "$dir");
@files = grep(/\.log$/,readdir(DIR));
closedir(DIR);
Then loop through those files:
foreach $file (@files)
{
    # file processing code
}
My first suggestion would be to use grep instead. The right tool for the job, they say...
But to answer your question:
readdir just returns the filenames from the directory. You'll need to concatenate the directory name and filename together.
$path = "$dirname/$filname";
open FH, $path or die ...
Then you should ignore files that are actually directories, such as "." and "..". After getting the $path, check to see if it's a file.
if (-f $path) {
open FH, $path or die ...
while (<FH>)
BTW, I thought I would throw in a mention of File::Next. To iterate over all files in a directory (recursively):
use feature 'say'; # needed for say
use Path::Class; # always useful.
use File::Next;
my $files = File::Next::files( dir(qw/path to files/) ); # look in path/to/files
while( defined ( my $file = $files->() ) ){
    $file = file( $file );
    say "Examining $file";
    say "found foo" if $file->slurp =~ /foo/;
}
File::Next is taint-safe.
~ doesn't auto-expand in Perl.
opendir my $fh, '~/' or die("Doin It Wrong"); # Doing It Wrong.
opendir my $fh, glob('~/') and die( "Thats right!" );
Also, if you must use readdir(), make sure you guard the expression thus:
while (defined(my $filename = readdir(DH))) {
...
}
If you don't do the defined() test, the loop will terminate if it finds a file called '0'.
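Putting the readdir guard together with the other points raised in this thread (prefix the directory name, skip anything that is not a plain file, match the address literally), a scan might look like the sketch below. files_mentioning is a hypothetical name, and the digit lookarounds are an extra assumption so that the address does not match inside a longer number.

```perl
use strict;
use warnings;

# Return the sorted names of plain files in $dir that contain the literal string $ip.
sub files_mentioning {
    my ($dir, $ip) = @_;
    my @hits;
    opendir(my $dh, $dir) or die "Can't open directory $dir: $!";
    while (defined(my $name = readdir($dh))) {
        my $path = "$dir/$name";       # readdir returns bare names
        next unless -f $path;          # skips ".", ".." and subdirectories
        open(my $fh, '<', $path) or die "Can't open $path: $!";
        while (my $line = <$fh>) {
            # \Q...\E makes the dots literal; the lookarounds reject longer numbers
            if ($line =~ /(?<!\d)\Q$ip\E(?!\d)/) {
                push @hits, $name;
                last;                  # one hit per file is enough
            }
        }
        close($fh);
    }
    closedir($dh);
    @hits = sort @hits;
    return @hits;
}

# Example use in a script:
#   print "$_\n" for files_mentioning('.', '12.211.23.200');
```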
Have you looked on CPAN for log parsers? I searched with 'log parse' and it yielded over 200 hits. Some (probably many) won't be relevant - some may be. It depends, in part, on which web server you are using.
Am I reading this right? Your line 10 that gives you the error is
open FILE, "$file" or die "Unable to open files";
And the $file you are trying to read, according to line 6,
@files = grep(/\.*$/,readdir(DIR));
is a file that ends with zero or more dots. Is this what you really wanted? This basically matches every file in the directory, including "." and "..". Maybe you don't have enough permission to open the parent directory for reading?
EDIT: if you only want to read all files (including hidden ones), you might want to use something like the following:
opendir(DIR, ".");
@files = readdir(DIR);
closedir(DIR);
foreach $file (@files) {
    if ($file ne "." and $file ne "..") {
        open FILE, "$file" or die "cannot open $file\n";
        # do stuff with FILE
    }
}
Note that this doesn't take care of sub directories.
I know I am way late to this discussion (ran across it while searching for grep related posts) but I am going to answer anyway:
It isn't specified clearly if these are web server logs (Apache, IIS, W3SVC, etc.) but the best tool for mining those for data is the LogParser tool from Microsoft. See logparser.com for more info.
LogParser will allow you to write SQL-like statements against the log files. It is very flexible and very fast.
Use perl from the command line, like a better grep:
perl -wnl -e '/12\.211\.23\.200/ and print;' *.log > output.txt
The benefit here is that you can chain logic far more easily:
perl -wnl -e '(/12\.211\.23\.20[1-11]/ or /denied/i ) and print;' *.log
If you are feeling wacky, you can also use more advanced command-line options to feed the result of one Perl one-liner into other Perl one-liners.
You really need to read "Minimal Perl: For UNIX and Linux People", awesome book on this very sort of thing.
First, use grep.
But if you don't want to, here are two small improvements you can make that I haven't seen mentioned yet:
1) Change:
@files = grep(/\.*$/,readdir(DIR));
to
@files = grep({ !-d "$dir/$_" } readdir(DIR));
This way you will exclude not just "." and ".." but also any other subdirectories that may exist in the server log directory (which the open downstream would otherwise choke on).
2) Change:
print if /12.211.23.200/;
to
print if /12\.211\.23\.200/;
"." is a regex wildcard meaning "any character". Changing it to "\." will reduce the number of false positives (unlikely to change your results in practice but it's more correct anyway).