perl write variables to a file

Here's my code to parse a configuration file, write the retrieved data to another file and send it to a MySQL database.
The database connection and writing data to a table work fine; however, I can't get it to write data to the mentioned file, mongoData.txt.
I'm quite new to Perl, so any help will be highly appreciated.
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $line;

# Retrieving data
open( my $FILE, "<", "/etc/mongod.conf" )
    or die "Cannot find file! : $!\n";

while ( $line = <$FILE> ) {
    chomp($line);
    my ( $KEY, $VALUE ) = split /\:/, $line;

    # Ignoring commented lines
    $_ = $line;
    unless ( $_ =~ /^#/ ) {
        # Write to file
        open my $FILE2, ">", "/home/sierra/Documents/mongoData.txt"
            or die "Cannot create file $!\n";
        print $FILE2 "$KEY", "$VALUE\n";
    }

    # Connection to SQL database
    my $db = DBI->connect(
        "dbi:mysql:dbname=mongodconf;host=localhost;", "root", "sqladmin"
    ) or die "can't connect to mysql";

    # Inserting into database
    $db->do("insert into data values ('$KEY', '$VALUE')")
        or die "query error\n";
}
close($FILE);

Every time you open a file for output, you create a new file and delete any pre-existing file with the same name. That means you're going to be left with only the last line you wrote to the file.
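A minimal sketch of the difference (using a throwaway file name, demo.txt, not the file from the question):

```perl
use strict;
use warnings;

# '>' truncates: the second open discards what the first one wrote
open my $fh, '>', 'demo.txt' or die "cannot open: $!";
print $fh "first line\n";
close $fh;

open $fh, '>', 'demo.txt' or die "cannot open: $!";
print $fh "second line\n";
close $fh;

# '>>' appends to whatever is already there
open $fh, '>>', 'demo.txt' or die "cannot open: $!";
print $fh "third line\n";
close $fh;

# demo.txt now holds "second line" and "third line" only
```

Opening the output file once, before the loop, avoids the problem entirely and is cheaper than reopening the file for every record.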
Here are some more pointers:
Variable identifiers should in general consist only of digits, lower-case letters, and underscores. Capital letters are reserved for global identifiers such as package names
If you are running a version of Perl later than v5.14 then you can use autodie which checks all IO operations for you and removes the need to test the return status by hand
If you use a die string that has no newline at the end, then Perl will add information about the source file name and line number where it occurred, which can be useful for debugging
It is unnecessary to name your loop control variables. Programs can be made much more concise and readable by using Perl's pronoun variable $_ which is the default for many built-in operators
It is wasteful to reconnect to your database every time you need to make changes. You should connect once at the top of your program and use that static connection throughout your code
You should use placeholders when passing parameter expressions to an SQL operation. Interpolating the values directly into the statement can be dangerous, and with placeholders DBI will quote them correctly for you
There is no need to close input files explicitly. Everything will be closed automatically at the end of the program. But if you are worried about the integrity of your output data, you may want to do an explicit close on output file handles so that you can check that they succeeded
Here's what I would write. Rather than testing whether each line of the input begins with a hash, it removes everything from the first hash character onwards and then checks to see if there are any non-blank characters in what remains. That allows for trailing comments in the data
#!/usr/bin/perl
use strict;
use warnings 'all';
use autodie;
use DBI;

my ($input, $output, $dsn) = qw{
    /etc/mongod.conf
    /home/sierra/Documents/mongoData.txt
    dbi:mysql:dbname=mongodconf;host=localhost;
};

open my $fh,     '<', $input;
open my $out_fh, '>', $output;

my $dbh = DBI->connect($dsn, qw/ root sqladmin /)
    or die "Can't connect to MySQL: $DBI::errstr";

while ( <$fh> ) {
    chomp;
    s/#.*//;
    next unless /\S/;
    my ( $key, $val ) = split /:/;
    print $out_fh "$key $val\n";
    $dbh->do('insert into data values (?, ?)', undef, $key, $val);
}

close $out_fh or die $!;
$dbh->disconnect or warn $dbh->errstr;
close $out_fh or die $!;
$dbh->disconnect or warn $dbh->errstr;

You need to append the text to mongoData.txt instead of recreating the file on every pass through the loop:
while ( $line = <$FILE> )
{
    chomp($line);
    my ($KEY, $VALUE) = split /\:/, $line;

    # Ignoring commented lines
    $_ = $line;
    unless ( $_ =~ /^#/ )
    {
        open my $FILE2, ">>", "/home/sierra/Documents/mongoData.txt"
            or die "Cannot create file $!\n";
        print $FILE2 "$KEY", "$VALUE\n";
        close($FILE2);
    }
}
Or else create the text file once, before the while loop:
open my $FILE2, ">", "/home/sierra/Documents/mongoData.txt"
    or die "Cannot create file $!\n";

while ( $line = <$FILE> )
{
    chomp($line);
    my ($KEY, $VALUE) = split /\:/, $line;

    # Ignoring commented lines
    $_ = $line;
    unless ( $_ =~ /^#/ )
    {
        print $FILE2 "$KEY", "$VALUE\n";
    }
}
close($FILE2);
Maybe this will help you.

Related

My perl script isn't working, I have a feeling it's the grep command

I'm trying to search one file for instances of a number and print the line if the other file contains that number
#!/usr/bin/perl
open(file, "textIds.txt");
@file = <file>;    # file looking into
# close file;
while(<>){
    $temp = $_;
    $temp =~ tr/|/\t/;    # puts tab between name and id
    @arrayTemp = split("\t", $temp);
    @found = grep{/$arrayTemp[1]/} <file>;
    if (defined $found[0]){
    #if (grep{/$arrayTemp[1]/} <file>){
        print $_;
    }
    @found = ();
}
print "\n";
close file;
#the input file lines have the format of
#John|7791 154
#Smith|5432 290
#Conor|6590 897
#And in the file the format is
#5432
#7791
#6590
#23140
There are some issues in your script.
Always include use strict; and use warnings;.
This would have told you about odd things in your script in advance.
Never use barewords as filehandles as they are global identifiers. Use three-parameter open
instead: open( my $fh, '<', 'textIds.txt');
use autodie; or check whether the opening worked.
You read and store textIds.txt into the array @file but later on (in your grep) you are
again trying to read from that file(handle) (with <file>). As @PaulL said, this will always
give undef (false) because the file was already read.
Replacing | with tabs and then splitting at tabs is not necessary. You can split at the
tabs and pipes at the same time (assuming "John|7791 154" is really "John|7791\t154").
You're talking about "input file" and "in file" without saying exactly which is which.
I assume your "textIds.txt" is the one with only the numbers and the other input file is the
one read from STDIN (with the |'s in it).
With this in mind your script could be written as:
#!/usr/bin/perl
use strict;
use warnings;

# Open 'textIds.txt' and slurp it into the array @file:
open( my $fh, '<', 'textIds.txt') or die "cannot open file: $!\n";
my @file = <$fh>;
close($fh);

# iterate over STDIN and compare with lines from 'textIds.txt':
while( my $line = <>) {
    # split "John|7791\t154" into ("John", "7791", "154"):
    my ($name, $number1, $number2) = split(/\||\t/, $line);
    # compare $number1 to each member of @file and print if found:
    if ( grep( /$number1/, @file) ) {
        print $line;
    }
}
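As a side note (not part of the answer above): grep rescans the whole @file array once per input line. If the id file is large, a hash lookup costs one probe per line and also avoids accidental substring matches. A sketch, with inline stand-ins for the contents of the two files:

```perl
use strict;
use warnings;

# Stand-ins for the contents of 'textIds.txt' and the STDIN lines
my @ids   = qw(5432 7791 6590 23140);
my @input = ("John|7791\t154\n", "Smith|5432\t290\n", "Nobody|9999\t000\n");

# Build the lookup hash once...
my %wanted = map { $_ => 1 } @ids;

# ...then each line costs one hash probe instead of a grep over @ids
my @matched;
for my $line (@input) {
    my ($name, $number1, $number2) = split /\||\t/, $line;
    push @matched, $line if $wanted{$number1};
}
print @matched;    # the John and Smith lines only
```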

How to search and replace a column in a CSV via perl?

So, I'm super new to perl... I'm on a windows box within a Corporate network. I am unable to download CPAN modules...
I have a CSV that is exported with multiple columns, the 2nd column contains 2 rows with text and the rest of the rows are IP addresses. I need to change the first 3 octets of the IP for all the rows excluding the text obviously. I then need to either save it to the same file, or create a new one... I also need all the other columns to remain in the document. I've looked and looked and everyone suggests modules which makes sense but I cannot get them. Here is my code:
if ( $cmd[0] eq "update" ) {
    print "Old subnet ex: 10.0.0\n";
    my $oldsubnet = <STDIN>;
    chomp $oldsubnet;
    print "New subnet ex: 10.0.0\n";
    my $newsubnet = <STDIN>;
    chomp $newsubnet;
    my $file = "path\\file.csv";
    open( my $fh, '<', $file ) or die "'$file' would not open $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        my @fields = split ",", $line;
        my $string = $fields[1];
        $string =~ s/$oldsubnet/$newsubnet/g;
        my $ofile = "path\\test.csv";
        open( my $ofh, '>>', $ofile ) or die "'$ofile' would not open $!";
        print $ofh "$string\n";
    }
}
So I just end up with a single column with the updated IP's with this code. How in the world do I get the rest of the CSV... Is there an easier way to just replace the IP's???? It seems like this would be easy but the CSV is making it hard. Any help would be appreciated. P.S. I am using strict and warnings :)
So just to clarify I am aware that @fields is being left out and thus not being printed... What I'm asking is how do I update only the data in the 2nd column and then put the whole document back into a new file?
You need to output more than just the changed field...
my @fields = split(/,/, $line);
$fields[1] =~ s/$oldsubnet/$newsubnet/g;
print $ofh join(',', @fields), "\n";
Also, don't open the output file in the loop to read the input file.
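Putting both points together, the loop could be restructured as below. This is a sketch that builds its own tiny input file so it runs standalone; in the question, the file names and the subnet values come from the STDIN prompts:

```perl
use strict;
use warnings;

# Stand-ins for the values read from STDIN in the question
my ($oldsubnet, $newsubnet) = ('10.0.0', '10.1.1');

# Build a small input file so the sketch is self-contained
open my $make, '>', 'file.csv' or die "cannot create file.csv: $!";
print $make "site,10.0.0.5,owner1\n", "siteb,10.0.0.9,owner2\n";
close $make;

# Open input and output once, BEFORE the loop
open my $fh,  '<', 'file.csv' or die "'file.csv' would not open $!";
open my $ofh, '>', 'test.csv' or die "'test.csv' would not open $!";
while ( my $line = <$fh> ) {
    chomp $line;
    my @fields = split /,/, $line;
    $fields[1] =~ s/\Q$oldsubnet\E/$newsubnet/;   # \Q protects the dots
    print $ofh join(',', @fields), "\n";
}
close $ofh or die "close failed: $!";
```

Note the \Q...\E: without it, the dots in "10.0.0" would match any character in the substitution.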
Without your source CSV, I can't say for sure, but would be thinking "use the Text::CSV module":
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV;

print "Old subnet ex: 10.0.0\n";
my $oldsubnet = <STDIN>;
chomp $oldsubnet;
print "New subnet ex: 10.0.0\n";
my $newsubnet = <STDIN>;
chomp $newsubnet;

my $file  = "path\\file.csv";
my $ofile = "path\\test.csv";
open( my $input,  '<', $file )  or die "'$file' would not open $!";
open( my $output, '>', $ofile ) or die "'$ofile' would not open $!";

my $csv = Text::CSV->new( { binary => 1, eol => "\n" } );
while ( my $row = $csv->getline($input) ) {
    $row->[1] =~ s/$oldsubnet/$newsubnet/;
    $csv->print( $output, $row );
}
It's not always the best answer, but where you're trying to read CSV, change a single field, and write CSV it works quite nicely.

Alter a file using information from another file

I want to alter the names in a phylip file using information from another file. The phylip is just one continuous string of information, and the names I want to alter (e.g. aaaaaaabyd) are embedded in it. Like so
((aaaaaaabyd:0.23400159127856412500,(((aaaaaaaaxv:0.44910864993667892753,aaaaaaaagf:0.51328033054009691849):0.06090419044604544752,((aaaaaaabyc:0.11709094683204501752,aaaaaaafzz:0.04488198976629347720):0.09529995111708353117,((aaaaaaadbn:0.34408087090010841536,aaaaaaaafj:0.47991503739434709930):0.06859184769990583908,((aaaaaaaabk:0.09244297511609228524,aaaaaaaete:0.12568841555837687030):0.28431
(there are no new lines)
The names within are like aaaaaaaabk.
The other file has the information change to, like so in the other file,
aaaaaaaabk;Ciona savignyi
aaaaaaaete;Homo sapiens
aaaaaaaafj;Cryptosporidium hominis
aaaaaaaaad;Strongylocentrotus purpuratus
aaaaaaabyd;Theileria parva
aaaaaaaaaf;Plasmodium vivax
I have tried numerous things but this is the closest I got. The problem is it does it for one and doesn't print out the rest of the phylip file. I need to get to ((Theileria parva:0.23400159127856412500, etc.
open(my $tree, "$ARGV[0]") or die "Failed to open file: $!\n";
open(my $csv, "$ARGV[0]") or die "Failed to open file: $!\n";
open(my $new_tree, "> raxml_tree.phy");

# Declare variables
my $find;
my $replace;
my $digest;

# put the file of the tree into string variable
my $string = <$tree>;

# open csv file
while (my $line = <$csv>) {
    # aaaaaaaaaa;Ciona savignyi
    if ($line =~ m/(\w+)\;+(\w+\s+\w*)/) {
        $find = $1;
        $replace = $2;
        $string =~ s/$find/$replace/g;
    }
}
print $new_tree "$string";

close $tree;
close $csv;
close $new_tree;
Some guidelines on your own code
The problem is almost certainly that you are opening the same file $ARGV[0] twice. Presumably one of them should be $ARGV[1]
You must always use strict and use warnings at the top of every Perl program you write (there is very little point in declaring your variables unless use strict is in place) and declare all your variables with my as close as possible to their first point of use. It is bad form to declare all your variables in a block at the start, because it makes them all effectively global, and you lose most of the advantages of declaring lexical variables
You should use the three-parameter form of open, and it is a good idea to put the name of the file in the die string so that you can see which one failed. So
open(my $tree, "$ARGV[0]") or die "Failed to open file: $!\n";
becomes
open my $tree, '<', $ARGV[0] or die qq{Failed to open "$ARGV[0]" for input: $!\n};
You should look for simpler solutions rather than apply regex methods every time. $line =~ m/(\w+)\;+(\w+\s+\w*)/ is much tidier as chomp, split /;/
You shouldn't use double-quotes around variables when you want just the value of the variable, so print $new_tree "$string" should be print $new_tree $string
Rather than trying to use the data from the other file (please try to use useful names for items in your question, as it's tough to know what to call them when writing a solution) it is best to build a hash that contains all the translations
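To illustrate the chomp-plus-split point on one record of the other file:

```perl
use strict;
use warnings;

my $line = "aaaaaaaabk;Ciona savignyi\n";   # one record from the other file
chomp $line;
my ($find, $replace) = split /;/, $line;
# $find is now "aaaaaaaabk" and $replace is "Ciona savignyi",
# with no regex captures needed
print "$find => $replace\n";
```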
This program will do as you ask. It builds a regex consisting of an alternation of all the hash keys, and then converts all ocurrences of that pattern into its corresponding name. Only those names that are in your sample other file are translated: the others are left as they are
use strict;
use warnings;
use 5.014;    # For non-destructive substitution
use autodie;

my %names;

open my $fh, '<', 'other_file.txt';
while ( <$fh> ) {
    my ($k, $v) = split /;/, s/\s+\z//r;
    $names{$k} = $v;
}

open $fh, '<', 'phylip.txt';
my $data = <$fh>;
close $fh;

my $re = join '|', sort { length $b <=> length $a } keys %names;
$re = qr/(?:$re)/;
$data =~ s/\b($re)\b/$names{$1}/g;

print $data;
output
((Theileria parva:0.23400159127856412500,(((aaaaaaaaxv:0.44910864993667892753,aaaaaaaagf:0.51328033054009691849):0.06090419044604544752,((aaaaaaabyc:0.11709094683204501752,aaaaaaafzz:0.04488198976629347720):0.09529995111708353117,((aaaaaaadbn:0.34408087090010841536,Cryptosporidium hominis:0.47991503739434709930):0.06859184769990583908,((Ciona savignyi:0.09244297511609228524,Homo sapiens:0.12568841555837687030):0.28431
Update
Here is a revised version of your own program with the above points accounted for and the bugs fixed
use strict;
use warnings;
open my $tree_fh, '<', $ARGV[0] or die qq{Failed to open "$ARGV[0]" for input: $!\n};
my $string = <$tree_fh>;
close $tree_fh;
open my $csv_fh, '<', $ARGV[1] or die qq{Failed to open "$ARGV[1]" for input: $!\n};
while ( <$csv_fh> ) {
    chomp;
    my ($find, $replace) = split /;/;
    $string =~ s/$find/$replace/g;
}
close $csv_fh;
open my $new_tree_fh, '>', 'raxml_tree.phy' or die qq{Failed to open "raxml_tree.phy" for output: $!\n};
print $new_tree_fh $string;
close $new_tree_fh;

splitting a large file into small files based on column value in perl

I am trying to split up a large file (around 17.6 million rows) into 6-7 small files based on a column value. Currently, I am using the sql bcp utility to dump all the data into one table and creating separate files using the bcp out utility.
But someone suggested to me to use Perl, as it would be faster and you don't need to create a table for that. As I am not a perl guy, I am not sure how to do it in perl.
Any help..
INPUT file :
inputfile.txt
0010|name|address|city|.........
0020|name|number|address|......
0030|phone no|state|street|...
output files:
0010.txt
0010|name|address|city|.........
0020.txt
0020|name|number|address|......
0030.txt
0030|phone no|state|street|...
It is simplest to keep a hash of output file handles, keyed by the file name. This program shows the idea. The number at the start of each record is used to create the name of the file where it belongs, and a file of that name is opened unless we already have a file handle for it.
All of the handles are closed once all of the data has been processed. Any errors are caught by use autodie, so explicit checking of the open, print and close calls is unnecessary.
use strict;
use warnings;
use autodie;

open my $in_fh, '<', 'inputfile.txt';

my %out_fh;

while (<$in_fh>) {
    next unless /^(\d+)/;
    my $filename = "$1.txt";
    open $out_fh{$filename}, '>', $filename unless $out_fh{$filename};
    print { $out_fh{$filename} } $_;
}

close $_ for values %out_fh;
Note that close caught me out here because, unlike most operators that work on $_ if you pass no parameters, a bare close will close the currently selected file handle. That is a bad choice IMO, but it's way too late to change it now
17.6 million rows is going to be a pretty large file, I'd imagine. It'll still be slow for perl to process.
That said, you're going to want something like the below:
use strict;
use warnings;

my $input = 'FILENAMEHERE.txt';

my %results;

open(my $fh, '<', $input) or die "cannot open input file: $!";
while (<$fh>) {
    my ($key) = split /\|/, $_;
    my $array = $results{$key} || [];
    push @$array, $_;
    $results{$key} = $array;
}

for my $filename (keys %results) {
    open(my $out, '>', "$filename.txt") or die "Cannot open output file $filename.txt: $!";
    print $out @{ $results{$filename} };
    close($out);
}
I haven't explicitly tested this, but it should get you going in the right direction.
$ perl -F'|' -lane '
$key = $F[0];
$fh{$key} or open $fh{$key}, ">", "$key.txt" or die $!;
print { $fh{$key} } $_
' inputfile.txt
perl -Mautodie -ne'
sub out { $h{$_[0]} ||= open(my $f, ">", "$_[0].txt") && $f }
print { out($1) } $_ if /^(\d+)/;
' file

help merging perl code routines together for file processing

I need some perl help in putting these (2) processes/code to work together. I was able to get them working individually to test, but I need help bringing them together especially with using the loop constructs. I'm not sure if I should go with foreach..anyways the code is below.
Also, any best practices would be great too as I'm learning this language. Thanks for your help.
Here's the process flow I am looking for:
read a directory
look for a particular file
use the file name to strip out some key information to create a newly processed file
process the input file
create the newly processed file for each input file read (if i read in 10, I create 10 new files)
Part 1:
my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
    next if ($file =~ /^\.+$/);
    # Get filename attributes
    if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
        print "$1\n";
        print "$2\n";
        print "$3\n";
    }
    print "$file\n";
}
Part 2:
use strict;
use Digest::MD5 qw(md5_hex);

# Create new file
open (NEWFILE, ">/backups/processed/foo$1.name.$2-foo_p$3.out") || die "cannot create file";

my $data = '';
my $line1 = <>;
chomp $line1;

my @heading = split /,/, $line1;
my ($sep1, $sep2, $eorec) = ( "^A", "^E", "^D");

while (<>)
{
    my $digest = md5_hex($data);
    chomp;
    my (@values) = split /,/;
    my $extra = "__mykey__$sep1$digest$sep2" ;
    $extra .= "$heading[$_]$sep1$values[$_]$sep2" for (0..scalar(@values));
    $data .= "$extra$eorec";
    print NEWFILE "$data";
}
#print $data;
close (NEWFILE);
You are using an old style of Perl programming. I recommend using functions and CPAN modules (http://search.cpan.org). Perl pseudocode:
use Modern::Perl;
# use...

sub get_input_files {
    # return an array of files (@)
}
sub extract_file_info {
    # takes the file name and returns an array of values (filename attrs)
}
sub process_file {
    # reads the input file, takes the previous attribs and builds the output file
}

my @ifiles = get_input_files;
foreach my $ifile (@ifiles) {
    my @attrs = extract_file_info($ifile);
    process_file($ifile, @attrs);
}
Hope it helps
I've bashed your two code fragments together (making the second a sub that the first calls for each matching file) and, if I understood your description of the objective correctly, this should do what you want. Comments on style and syntax are inline:
#!/usr/bin/env perl

# - Never forget these!
use strict;
use warnings;

use Digest::MD5 qw(md5_hex);

my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
    # Parens on postfix "if" are optional; I prefer to omit them
    next if $file =~ /^\.+$/;
    if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
        process_file($file, $1, $2, $3);
    }
    print "$file\n";
}

sub process_file {
    my ($orig_name, $foo_x, $name_x, $p_x) = @_;
    my $new_name = "/backups/processed/foo$foo_x.name.$name_x-foo_p$p_x.out";

    # - From your description of the task, it sounds like we actually want to
    #   read from the found file, not from <>, so opening it here to read
    # - Better to use lexical ("my") filehandle and three-arg form of open
    # - "or" has lower operator precedence than "||", so less chance of
    #   things being grouped in the wrong order (though either works here)
    # - Including $! in the error will tell why the file open failed
    open my $in_fh, '<', $orig_name or die "cannot read $orig_name: $!";
    open(my $out_fh, '>', $new_name) or die "cannot create $new_name: $!";

    my $data = '';
    my $line1 = <$in_fh>;
    chomp $line1;
    my @heading = split /,/, $line1;

    my ($sep1, $sep2, $eorec) = ("^A", "^E", "^D");
    while (<$in_fh>) {
        chomp;
        my $digest = md5_hex($data);
        my (@values) = split /,/;
        my $extra = "__mykey__$sep1$digest$sep2";
        $extra .= "$heading[$_]$sep1$values[$_]$sep2"
            for (0 .. scalar(@values));
        # - Useless use of double quotes removed on next two lines
        $data .= $extra . $eorec;
        #print $out_fh $data;
    }

    # - Moved print to output file to here (where it will print the complete
    #   output all at once) rather than within the loop (where it will print
    #   all previous lines each time a new line is read in) to prevent
    #   duplicate output records. This could also be achieved by printing
    #   $extra inside the loop. Printing $data at the end will be slightly
    #   faster, but requires more memory; printing $extra within the loop and
    #   getting rid of $data entirely would require less memory, so that may
    #   be the better option if you find yourself needing to read huge input
    #   files.
    print $out_fh $data;

    # - $in_fh and $out_fh will be closed automatically when they go out of
    #   scope at the end of the block/sub, so there's no real point to
    #   explicitly closing them unless you're going to check whether the close
    #   succeeded or failed (which can happen in odd cases usually involving
    #   full or failing disks when writing; I'm not aware of any way that
    #   closing a file open for reading can fail, so that's just being left
    #   implicit)
    close $out_fh or die "Failed to close file: $!";
}
Disclaimer: perl -c reports that this code is syntactically valid, but it is otherwise untested.