Globbing and regular expression problem with bash script - perl

I have a problem with my regex:
My script is written in perl.
#!/usr/bin/perl
# Swap columns 1 and 2
while (<>) {
    my @cols = split /\|/;
    print "$cols[-3]/$cols[-4]\n";
}
exit;
I create an alias using the command :
alias inverseur="perl /laboratoire10/inverseur_colonnes.pl"
I am hoping to accomplish the following:
Write a "bash" script that creates a file container for each movie title (.avi) in the file.
The original file is: http://www.genxvideo.com/genxinventory-current.xls
but I have since renamed it to liste_films.csv .
All quotation marks, spaces, dashes, and other strange characters must be replaced by an underscore, "_".
The group becomes the directory name, and the title of the movie becomes the file name, followed by the suffix (.avi). In order to do this, the code must process the fields "title" and "class" in reverse. You can reverse the fields "title" and "class" with the alias "inverseur" created earlier.
The script will obviously create each directory in "/laboratoire10" before creating the .avi files. There should be 253 valid directories total. Directories are created through a pipe ("|") to the command "xargs mkdir -pv /".
I need help augmenting my current code with a command to find .avi files whose name contains the string "wood", case-insensitive.

It is very hard to understand what exactly you are trying to do. Under the assumption that you have a |-separated CSV and want a directory tree of the form CATEGORY/TITLE, with a file named "cans.avi" under each such directory, here is a one-liner Perl script.
perl -mText::CSV -e '$csv = Text::CSV->new({ sep_char=>"|",binary=>1,auto_diag => 1 } ) || die; open my $fh, "<", $ARGV[0] or die; while (my $row = $csv->getline($fh)) { $file = cleaner($row->[1])."/".cleaner($row->[0]); print "mkdir $file; touch $file/cans.avi\n"; } sub cleaner($) { my($f) = @_; $f =~ s/\W/_/g; $f;}' ~/tmp/genxinventory-current.csv
I converted the XLS file to | separated CSV using libreoffice, so your conversion mileage (kilometerage?) may vary.
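For readability, here is the same logic as a short script rather than a one-liner (same assumptions as above: a |-separated file with the title in the first column and the category in the second, and a placeholder "cans.avi" in each directory):
#!/usr/bin/perl
# Same idea as the one-liner above, spread out for readability.
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ sep_char => "|", binary => 1, auto_diag => 1 })
    or die "Cannot create Text::CSV object";
open my $fh, '<', $ARGV[0] or die "can't open $ARGV[0]: $!";

while (my $row = $csv->getline($fh)) {
    # column 2 = category (directory), column 1 = title (subdirectory)
    my $dir = cleaner($row->[1]) . "/" . cleaner($row->[0]);
    print "mkdir $dir; touch $dir/cans.avi\n";
}

# Replace anything that is not a word character with "_"
sub cleaner {
    my ($f) = @_;
    $f =~ s/\W/_/g;
    return $f;
}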

Related

How to rename multiple files in a folder with a specific format?

I have many files in a folder with the format '{galaxyID}-cutout-HSC-I-{#}-pdr2_wide.fits', where {galaxyID} and {#} are different numbers for each file. Here are some examples:
2185-cutout-HSC-I-9330-pdr2_wide.fits
992-cutout-HSC-I-10106-pdr2_wide.fits
2186-cutout-HSC-I-9334-pdr2_wide.fits
I want to change the format of all files in this folder to match the following:
2185_HSC-I.fits
992_HSC-I.fits
2186_HSC-I.fits
namely, I want to take out "cutout", the second number, and "pdr2_wide" from each file name. I would prefer to do this in either Perl or Python. For my Perl script, so far I have the following:
rename [-n];
my @parts=split /-/;
my $this=$parts[0].$parts[1].$parts[2].$parts[3].$parts[4].$parts[5];
$_ = $parts[0]."_".$parts[2]."_".$parts[3];
*fits
which gives me the error message
Not enough arguments for rename at ./rename.sh line 3, near "];" Execution of ./rename.sh aborted due to compilation errors.
I included the [-n] because I want to make sure the changes are what I want before actually doing it; either way, this is in a duplicated directory just for safety.
It looks like you are using the rename you get on Ubuntu (it's not the one that's on my ArchLinux box), but there are other ones out there. But, you've presented it oddly. The brackets around -n shouldn't be there and the ; ends the command.
The syntax, if you are using what I think you are, is this:
% rename -n -e PERL_EXPR file1 file2 ...
The Perl expression is the argument to the -e switch, and can be a simple substitution. Note that this expression is a string that you give to -e, so that probably needs to be quoted:
% rename -n -e 's/-\d+-pdr2_wide//' *.fits
rename(2185-cutout-HSC-I-9330-pdr2_wide.fits, 2185-cutout-HSC-I.fits)
And, instead of doing this in one step, I'd do it in two:
% rename -n -e 's/-cutout-/-/; s/-\d+-pdr2_wide//' *.fits
rename(2185-cutout-HSC-I-9330-pdr2_wide.fits, 2185-HSC-I.fits)
There are other patterns that might make sense. Instead of taking away parts, you can keep parts:
% rename -n -e 's/\A(\d+).*(HSC-I).*/$1-$2.fits/' *.fits
rename(2185-cutout-HSC-I-9330-pdr2_wide.fits, 2185-HSC-I.fits)
I'd be inclined to use named captures so the next poor slob knows what you are doing:
% rename -n -e 's/\A(?<galaxy>\d+).*(HSC-I).*/$+{galaxy}-$2.fits/' *.fits
rename(2185-cutout-HSC-I-9330-pdr2_wide.fits, 2185-HSC-I.fits)
From your description {galaxyID}-cutout-HSC-I-{#}-pdr2_wide.fits, I assume that cutout-HSC-I is fixed.
Here's a script that will do the rename. It takes a list of files on stdin. But, you could adapt to take the output of readdir:
#!/usr/bin/perl

master(@ARGV);
exit(0);

sub master
{
    my($oldname);

    while ($oldname = <STDIN>) {
        chomp($oldname);

        # find the file extension/suffix
        my($ix) = rindex($oldname, ".");
        next if ($ix < 0);

        # get the suffix
        my($suf) = substr($oldname, $ix);

        # only take filenames of the expected format
        next unless ($oldname =~ /^(\d+)-cutout-(HSC-I)/);

        # get the new name
        my($newname) = $1 . "_" . $2 . $suf;

        printf("OLDNAME: %s NEWNAME: %s\n", $oldname, $newname);

        # rename the file
        # change to "if (1)" to actually do it
        if (0) {
            rename($oldname, $newname) or
                die("unable to rename '$oldname' to '$newname' -- $!\n");
        }
    }
}
For your sample input file, here's the program output:
OLDNAME: 2185-cutout-HSC-I-9330-pdr2_wide.fits NEWNAME: 2185_HSC-I.fits
OLDNAME: 992-cutout-HSC-I-10106-pdr2_wide.fits NEWNAME: 992_HSC-I.fits
OLDNAME: 2186-cutout-HSC-I-9334-pdr2_wide.fits NEWNAME: 2186_HSC-I.fits
The above is how I usually do things but here's one with just a regex. It's fairly strict in what it accepts [for safety], but you can adapt as desired:
#!/usr/bin/perl

master(@ARGV);
exit(0);

sub master
{
    my($oldname);

    while ($oldname = <STDIN>) {
        chomp($oldname);

        # only take filenames of the expected format
        next unless ($oldname =~ /^(\d+)-cutout-(HSC-I)-\d+-pdr2_wide([.].+)$/);

        # get the new name
        my($newname) = $1 . "_" . $2 . $3;

        printf("OLDNAME: %s NEWNAME: %s\n", $oldname, $newname);

        # rename the file
        # change to "if (1)" to actually do it
        if (0) {
            rename($oldname, $newname) or
                die("unable to rename '$oldname' to '$newname' -- $!\n");
        }
    }
}
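Since the script could also take the output of readdir instead of STDIN, here is a minimal sketch of that variant (scanning the current directory "." and keeping the if (0) dry-run switch are assumptions, adjust as needed):
#!/usr/bin/perl
# Sketch: same renaming logic, but reading filenames via readdir
# instead of STDIN.
use strict;
use warnings;

opendir my $dh, "." or die "can't open directory: $!";

for my $oldname (sort readdir $dh) {
    # only take filenames of the expected format
    next unless $oldname =~ /^(\d+)-cutout-(HSC-I)-\d+-pdr2_wide([.].+)$/;

    my $newname = $1 . "_" . $2 . $3;
    printf("OLDNAME: %s NEWNAME: %s\n", $oldname, $newname);

    # change to "if (1)" to actually do it
    if (0) {
        rename($oldname, $newname)
            or die "unable to rename '$oldname' to '$newname' -- $!\n";
    }
}

closedir $dh;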

How to print result STDOUT to a temporary blank new file in the same directory in Perl?

I'm new to Perl, so this may be a very basic case that I still can't understand.
Case:
The program tells the user to type the file name.
The user types the file name (1 or more files).
The program reads the content of the input file(s).
If it's a single-file input, then it just prints the entire content of it.
If it's a multi-file input, then it combines the contents of each file in sequence.
It then prints the result to a temporary new file, located in the same directory as program.pl.
file1.txt:
head
a
b
end
file2.txt:
head
c
d
e
f
end
SINGLE INPUT program ioSingle.pl:
#!/usr/bin/perl
print "File name: ";
$userinput = <STDIN>; chomp ($userinput);
#read content from input file
open ("FILEINPUT", $userinput) or die ("can't open file");
# PRINT CONTENT while there is still some in that file
while (<FILEINPUT>) {
    print;
}
close FILEINPUT;
SINGLE RESULT in cmd:
>perl ioSingle.pl
File name: file1.txt
head
a
b
end
I found tutorial code that combines content from multi-file input, but I cannot adapt the while argument to the code above:
while ($userinput = <>) {
print ($userinput);
}
I was stuck at making it work for multi-file input.
How am I supposed to reformat the code so my program gives a result like this?
EXPECTED MULTIFILES RESULT in cmd:
>perl ioMulti.pl
File name: file1.txt file2.txt
head
a
b
end
head
c
d
e
f
end
I appreciate your response :)
A good way to start working on a problem like this, is to break it down into smaller sections.
Your problem seems to break down to this:
get a list of filenames
for each file in the list
display the file contents
So think about writing subroutines that do each of these tasks. You already have something like a subroutine to display the contents of the file.
sub display_file_contents {
    # filename is the first (and only) argument to the sub
    my $filename = shift;

    # Use lexical filehandle and three-arg open
    open my $filehandle, '<', $filename or die $!;

    # Shorter version of your code
    print while <$filehandle>;
}
The next task is to get our list of files. You already have some of that too.
sub get_list_of_files {
    print 'File name(s): ';
    my $files = <STDIN>;
    chomp $files;

    # We might have more than one filename. Need to split input.
    # Assume filenames are separated by whitespace
    # (Might need to revisit that assumption - filenames can contain spaces!)
    my @filenames = split /\s+/, $files;

    return @filenames;
}
We can then put all of that together in the main program.
#!/usr/bin/perl

use strict;
use warnings;

my @list_of_files = get_list_of_files();

foreach my $file (@list_of_files) {
    display_file_contents($file);
}
By breaking the task down into smaller tasks, each one becomes easier to deal with. And you don't need to carry the complexity of the whole program in you head at one time.
p.s. But like JRFerguson says, taking the list of files as command line parameters would make this far simpler.
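For illustration, a minimal sketch of that command-line version (reusing the display_file_contents sub from above; the usage would be perl ioMulti.pl file1.txt file2.txt):
#!/usr/bin/perl
# Sketch: take the file names from the command line instead of STDIN.
use strict;
use warnings;

foreach my $file (@ARGV) {
    display_file_contents($file);
}

sub display_file_contents {
    my $filename = shift;
    open my $filehandle, '<', $filename or die "can't open $filename: $!";
    print while <$filehandle>;
}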
The easy way is to use the diamond operator <> to open and read the files specified on the command line. This would achieve your objective:
while (<>) {
    chomp;
    print "$_\n";
}
Thus: ioSingle.pl file1.txt file2.txt
If this is the sole objective, you can reduce this to a command line script using the -p or -n switch like:
perl -pe '1' file1.txt file2.txt
perl -ne 'print' file1.txt file2.txt
These switches create implicit loops around the -e commands. The -p switch prints $_ after every loop as if you had written:
LINE:
while (<>) {
    # your code...
} continue {
    print;
}
Using -n creates:
LINE:
while (<>) {
    # your code...
}
Thus, -p adds an implicit print statement.

Cannot find argument passed to program called using Perl "system" command

I'm writing a Perl script to run an external program on every file in a directory. This program converts files from one format to another. Here's the deal...
When I run the program from the command line, everything works as it should:
computer.name % /path/program /inpath/input.in /outpath/output.out
converting: /inpath/input.in to /outpath/output.out
computer.name %
Here's the code I wrote to convert all files in a directory (listed in "file_list.txt"):
#!/usr/bin/perl -w

use warnings;
use diagnostics;
use FileHandle;
use File::Copy;

# Set simulation parameters and directories
@test_dates = ("20110414");
$listfile = "file_list.txt";
$execname = "/path/program";

foreach $date (@test_dates)
{
    # Set/make directories
    $obs_file_dir = "inpath";
    $pred_file_dir = "outpath";
    mkdir "$pred_file_dir", 0755 unless -d "$pred_file_dir";

    # Read input file names to array
    $obs_file_list = $obs_file_dir . $listfile;
    open(DIR, $obs_file_list) or die "Could not open file!";
    @obs_files = <DIR>;
    close(DIR);

    # Convert and save files
    foreach $file (@obs_files)
    {
        $file =~ s/(\*)//g;
        $infile = $obs_file_dir . $file;
        $outfile = $pred_file_dir . $file;
        $outfile =~ s/in/out/g;
        print $infile . "\n";
        @arg_list = ($execname, $infile, $outfile);
        system(@arg_list);
    }
}
The output shows me the following error for every file in the list:
computer.name % perl_script_name.pl
/inpath/input.in
converting: /inpath/input.in to /outpath/output.out
unable to find /inpath/input.in
stat status=-1
error while processing the product
I verified every file is in the proper place and have no idea why I am getting this error. Why can't the files be found? When I manually pass the arguments using the command line, no problem. When I pass the arguments through a variable via a system call, they can't be found even though the path and file names are correct.
Your advice is greatly appreciated!
Your list of files (@obs_files) comes from reading in a file via @obs_files = <DIR>;
When you do that, each element of the array will be a line from the file (e.g. a directory listing), with the line terminated by a newline character.
Before using it, you need to remove the newline character via chomp($file).
Please note that s/(\*)//g; does NOT remove that trailing newline!
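In other words, the inner loop from the question would become something like this (only the added chomp is new; everything else is unchanged):
# Convert and save files
foreach $file (@obs_files)
{
    chomp($file);              # strip the trailing newline from the listing
    $file =~ s/(\*)//g;
    $infile = $obs_file_dir . $file;
    $outfile = $pred_file_dir . $file;
    $outfile =~ s/in/out/g;
    print $infile . "\n";
    @arg_list = ($execname, $infile, $outfile);
    system(@arg_list);
}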

Splitting a concatenated file based on header text

I have a few very large files which are basically a concatenation of several small files and I need to split them into their constituent files. I also need to name the files the same as the original files.
For example the files QMAX123 and QMAX124 have been concatenated to:
;QMAX123 - Student
... file content ...
;QMAX124 - Course
... file content ...
I need to recreate the file QMAX123 as
;QMAX123 - Student
... file content ...
And QMAX124 as
;QMAX124 - Course
... file content ...
The original file's header ;QMAX<some number> is unique and only appears as a header in the file.
I used the script below to split the content of the files, but I haven't been able to adapt it to get the file names right.
awk '/^;QMAX/{close("file"f);f++}{print $0 > "file"f}' <filename>
So I can either adapt that script to name the file correctly or I can rename the split files created using the script above based on the content of the file, whichever is easier.
I'm currently using cygwin bash (which has perl and awk) if that has any bearing on your answer.
The following Perl should do the trick
use warnings;
use strict;

my $F;    # will hold a filehandle

while (<>) {
    if ( / ^ ; (\S+) /x ) {
        my $filename = $1;
        open $F, '>', $filename or die "can't open $filename: $!";
        print $F $_;    # keep the header line in the recreated file
    } else {
        next unless defined $F;
        print $F $_ or warn "can't write: $!";
    }
}
Note that it discards any input before the first header line (the next unless defined $F; line). You may care to generate an error or write such lines to a default file instead. Let me know and I can change it.
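Usage would be something along these lines (the script name is just an example); it creates QMAX123, QMAX124, and so on in the current directory:
perl split_qmax.pl bigfile.txt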
With Awk, it's as simple as
awk '/^;QMAX/ {filename = substr($1,2)} {print >> filename}' input_file

Using perl to parse a file and insert specific values into a database

Disclaimer: I'm a newbie at scripting in Perl; this is partially a learning exercise (but still a project for work). Also, I have a much stronger grasp on shell scripting, so my examples will likely be formatted in that mindset (but I would like to create them in Perl). Sorry in advance for my verbosity; I want to make sure I am at least marginally clear in getting my point across.
I have a text file (a reference guide) that is a Word document converted to text and then switched from Windows to UNIX format in Notepad++. The file is uniform in that each section of the file has the same fields/formatting/tables.
What I plan to do, in a basic way, is grab each section, keyed by unique batch job names, and place all of the values into a database (or maybe just an Excel file) so all the fields can be searched/edited for each job much more easily than in the Word file, and possibly create a web interface later on.
So what I want to do is grab each section by doing something like:
sed -n '/job_name_1_regex/,/job_name_2_regex/p' file.txt -- how would this be formatted within a perl script?
(grab the section in total, then break it down further from there)
To read the file in the script I have open FORMAT_FILE, 'test_format.txt'; and then use foreach $line (<FORMAT_FILE>) to parse the file line by line. --is there a better way?
My next problem is that since I converted from a word doc with tables, which looks like:
Table Heading 1 Table Heading 2
Heading 1/Value 1 Heading 2/Value 1
Heading 1/Value 2 Heading 2/Value 2
but the text file it looks like:
Table Heading 1
Table Heading 2Heading 1/Value 1Heading 1/Value 2Heading 2/Value 1Heading 2/Value 2
So I want to have "Heading 1" and "Heading 2" as column names and then put the respective values under them. I am just not sure how to get the values in relation to the heading from the text file. The values for Heading 1 will always start at the line number of Heading 1 plus 2 (Heading 1, Heading 2, values for Heading 1). I know this can be done in awk/sed pretty easily; I'm just not sure how to address it inside a Perl script.
---EDIT---
For this I was thinking of doing an array something like:
my @heading1 = ($value1, $value2, etc.)
my @heading2 = ($value1, $value2, etc.)
I just need to be able to associate the correct values and headings together. So that heading1 = the line after heading2 (where the values start).
Like saying (in shell):
x=$(grep -n "Heading 1" file.txt | cut -d":" -f1) #gets the line that "Heading 1" is on in the file
(( x = x+2 )) #adds 2 to the line (where the values will start)
#print values from file.txt from the line where they start to the
#last one (I'll figure that out at some point before this)
sed -n "$x,$last_line_of_values p" file.txt
This is super-hacked together for the moment, to try to elaborate what I want to do...let me know if it clears it up a little...
---/EDIT---
After I have all the right values and such, linking it up to a database may be an issue as well; I haven't started looking at the way Perl interacts with DBs yet.
Sorry if this is a bit scatterbrained...it's still not fully formed in my head.
http://perlmeme.org/tutorials/connect_to_db.html
#!/usr/bin/perl

use strict;
use warnings;
use DBI;

my $driver   = "mysql";    # Database driver type
my $database = "test";     # Database name
my $user     = "";         # Database user name
my $password = "";         # Database user password

my $dbh = DBI->connect(
    "DBI:$driver:$database",
    $user, $password,
    {
        RaiseError => 1,
        PrintError => 1,
    }
) or die $DBI::errstr;

my $sth = $dbh->prepare("
    INSERT INTO test
    (col1, col2)
    VALUES (?, ?)
") or die $dbh->errstr;

my $intable = 0;

open my $file, "file.txt" or die "can't open file $!";

while (<$file>) {
    if (/job_name_1_regex/../job_name_2_regex/) {    # job 1 section
        $intable = 1 if /Table Heading 1/;           # table start
        if ($intable) {
            my $next_line = <$file>;                 # heading 2 line
            chomp; chomp $next_line;
            $sth->execute($_, $next_line) or die $dbh->errstr;
        }
    }
}

close $file or die "can't close file $!";
$dbh->disconnect;
Several things in this post... First, the basic "best practices":
Use modern Perl. Start your scripts with
use strict; use warnings;
Don't use global filehandles; use lexical filehandles (declare them in a variable).
Always check "open" for return values.
open my $file, "/some/file" or die "can't open file: $!";
Then, about pattern matching: I don't understand your example at all, but I suppose you want something like:
foreach my $line ( <$file> ) {
    if ( $line =~ /regexp1/ ) {
        # do something...
    }
}
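To tie this back to the sed -n '/start/,/end/p' part of the question: in Perl, the range (flip-flop) operator gives the same section-by-section behaviour, and it is what the /job_name_1_regex/../job_name_2_regex/ test in the script above relies on. A minimal sketch, assuming the same start/end patterns and file name as in the question:
#!/usr/bin/perl
# Print only the lines between (and including) the two job-name headers,
# the Perl equivalent of: sed -n '/job_name_1_regex/,/job_name_2_regex/p'
use strict;
use warnings;

open my $file, '<', 'test_format.txt' or die "can't open file: $!";

while (my $line = <$file>) {
    if ($line =~ /job_name_1_regex/ .. $line =~ /job_name_2_regex/) {
        print $line;
    }
}

close $file;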
Edit: about the table, I suppose the best thing is to build two arrays, one for each column.
If I understand correctly, when reading the file you need to split the line and put one part in the @col1 array and the second part in the @col2 array. The clear and easy way is to use two temporary variables:
my ( $val1, $val2 ) = split /\s+/, $line;
push @col1, $val1;
push @col2, $val2;
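Put together, the column-collecting part might look roughly like this (the file name and the assumption that each data line holds two whitespace-separated values are carried over from the question and the snippet above):
#!/usr/bin/perl
# Sketch: read the file and collect the two columns into @col1 and @col2.
use strict;
use warnings;

my (@col1, @col2);

open my $file, '<', 'test_format.txt' or die "can't open file: $!";

while (my $line = <$file>) {
    chomp $line;
    next unless $line =~ /\S/;                 # skip blank lines
    my ($val1, $val2) = split /\s+/, $line;
    push @col1, $val1;
    push @col2, $val2;
}

close $file;

# e.g. print the pairs side by side
print "$col1[$_]\t$col2[$_]\n" for 0 .. $#col1;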