How to search and replace a column in a CSV via Perl?

So, I'm super new to perl... I'm on a windows box within a Corporate network. I am unable to download CPAN modules...
I have a CSV that is exported with multiple columns, the 2nd column contains 2 rows with text and the rest of the rows are IP addresses. I need to change the first 3 octets of the IP for all the rows excluding the text obviously. I then need to either save it to the same file, or create a new one... I also need all the other columns to remain in the document. I've looked and looked and everyone suggests modules which makes sense but I cannot get them. Here is my code:
if ( $cmd[0] eq "update" ) {
print "Old subnet ex: 10.0.0\n";
my $oldsubnet = <STDIN>;
chomp $oldsubnet;
print "New subnet ex: 10.0.0\n";
my $newsubnet = <STDIN>;
chomp $newsubnet;
my $file = "path\\file.csv";
open( my $fh, '<', $file ) or die "'$file' would not open $!";
while ( my $line = <$fh> ) {
chomp $line;
my @fields = split ",", $line;
my $string = $fields[1];
$string =~ s/$oldsubnet/$newsubnet/g;
my $ofile = "path\\test.csv";
open( my $ofh, '>>', $ofile ) or die "'$ofile' would not open $!";
print $ofh "$string\n";
}
}
So I just end up with a single column with the updated IP's with this code. How in the world do I get the rest of the CSV... Is there an easier way to just replace the IP's???? It seems like this would be easy but the CSV is making it hard. Any help would be appreciated. P.S. I am using strict and warnings :)
So just to clarify, I am aware that @fields is being left out and thus not being printed... What I'm asking is how do I update only the data in the 2nd column and then put the whole document back into a new file?

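You need to output more than just the changed field: modify the second field in place, then join all the fields back together and print the whole line.
my @fields = split(/,/, $line, -1); # -1 keeps trailing empty fields
$fields[1] =~ s/$oldsubnet/$newsubnet/g;
print $ofh join(',', @fields), "\n"; # restore the newline removed by chomp
Also, don't open the output file inside the loop that reads the input file; open it once before the loop.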
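Since CPAN modules are not available to the asker, the advice above can be sketched with core Perl only. This is a minimal sketch, not a full CSV parser: it assumes no field contains a quoted comma, and the helper name replace_subnet and the sample rows are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite the subnet prefix in the second comma-separated
# field of one line and return the whole line with every column preserved.
sub replace_subnet {
    my ($line, $old, $new) = @_;
    my @fields = split /,/, $line, -1;    # -1 keeps trailing empty fields
    # \Q...\E makes the dots literal; ^ and the trailing \. anchor the match
    # to the first three octets, so non-IP text rows are left untouched
    $fields[1] =~ s/^\Q$old\E\./$new./ if defined $fields[1];
    return join ',', @fields;
}

# Demo on sample rows (made-up data, not the asker's file)
for my $line ('host,Address,site', 'a,10.0.0.15,x,', 'b,110.0.0.7,y,') {
    print replace_subnet($line, '10.0.0', '192.168.1'), "\n";
}
```

In a real script you would call the helper on each chomped input line and print the result to an output handle opened once before the loop. Anchoring the pattern is what stops 110.0.0.7 from being rewritten when the old subnet is 10.0.0.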

Without your source CSV, I can't say for sure, but I would be thinking "use the Text::CSV module":
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV;
print "Old subnet ex: 10.0.0\n";
my $oldsubnet = <STDIN>;
chomp $oldsubnet;
print "New subnet ex: 10.0.0\n";
my $newsubnet = <STDIN>;
chomp $newsubnet;
my $file = "path\\file.csv";
my $ofile = "path\\test.csv";
open( my $input, '<', $file ) or die "'$file' would not open $!";
open( my $output, '>', $ofile ) or die "'$ofile' would not open $!";
my $csv = Text::CSV->new( { binary => 1, eol => "\n" } );
while ( my $row = $csv->getline($input) ) {
$row->[1] =~ s/$oldsubnet/$newsubnet/;
$csv->print( $output, $row );
}
It's not always the best answer, but where you're trying to read CSV, change a single field, and write CSV it works quite nicely.

Related

Writing a file and Reading it in Perl

I'm trying to build a primary key into a new file from an original File which has the following structure (tbl_20180615.txt):
573103150033,0664,54,MSS02VEN*',INT,zxzc,,,,,
573103150033,0665,54,MSS02VEN,INT,zxzc,,,,,
573103150080,0659,29,MSS05ARA',INT,zxzc,,,,,
573103150080,0660,29,MSS05ARA ,INT,zxzc,,,,,
573103154377,1240,72,MSSTRI01,INT,zxzc,,,,,
573103154377,1240,72,MSSTRI01,INT,zxzc,,,,,
I launch my perl Verify.pl then I send the arguments, the first one is the number of columns to build the primary key in the new file, after I have to send the name of file (original file).
(Verify.pl)
#!/usr/bin/perl
use strict;
use warnings;
my $n1 = $ARGV[0];
my $name = $ARGV[1];
$n1 =~ s/"//g;
my $n2 = $n1 + 1;
my %seen;
my ( $file3 ) = qw(log.txt);
open my $fh3, '>', $file3 or die "Can't open $file3: $!";
print "Loading file ...\n";
open( my $file, "<", "$name" ) || die "Can't read file somefile.txt: $!";
while ( <$file> ) {
chomp;
my @rec = split( /,/, $_, $n2 ); # $n2 is used to build the primary key: split only as far as the desired fields
for ( my $i = 0; $i < $n1; $i++ ) {
print $fh3 "@rec[$i],";
}
print $fh3 "\n";
}
close( $file );
print "Done!\n";
#########to check duplicates
my ($file4) = qw(log.txt);
print "Checking duplicates records...\n\n";
open (my $file4, "<", "log.txt") || die "Can't read file log.txt: $!";
while ( <$file4> ) {
print if $seen{$_}++;
}
close($file4);
if I send the following instruction
perl Verify.pl 2 tbl_20180615.txt
this code builds a new file called "log.txt" with the following structure, splitting the original file into two columns given by the first argument:
(log.txt)
573103150033,0664,
573103150033,0665,
573103150080,0659,
573103150080,0660,
573103154377,1240,
573103154377,1240,
That works OK, but reading the new file log.txt to check for duplicates doesn't work. However, if I comment out the lines that generate log.txt (everything before the "to check duplicates" comment) and run only the second part of the code, it works, giving me the two duplicate lines, which look like this:
(Result in command line)
573103154377,1240
573103154377,1240
How can I solve this issue?
I think this does what you're asking for. It builds a unique list of derived keys before printing any of them, using a hash to check whether a key has already been generated.
Note that I have assigned values to @ARGV to emulate input values. You must remove that statement before running the program with input from the command line.
#!/usr/bin/perl
use strict;
use warnings;
use autodie; # Handle bad IO statuses automatically
local @ARGV = qw/ 2 tbl_20180615.txt /; # For testing only
tr/"//d for @ARGV; # "
my ($key_fields, $input_file) = @ARGV;
my $output_file = 'log.txt';
my (@keys, %seen);
print "Loading input ... ";
open my $in_fh, '<', $input_file;
while ( <$in_fh> ) {
chomp;
my @rec = split /,/;
my $key = join ',', @rec[0..$key_fields-1];
push @keys, $key unless $seen{$key}++;
}
print "Done\n";
open my $out_fh, '>', $output_file;
print $out_fh "$_\n" for @keys;
close $out_fh;
output log.txt
573103150033,0664
573103150033,0665
573103150080,0659
573103150080,0660
573103154377,1240
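As for why the original script found nothing when it read log.txt back: the write handle $fh3 is never closed, and Perl buffers file output, so with only a few short lines nothing has reached the disk when the file is reopened for reading. A minimal sketch of the fix, closing the output handle before reading, follows. The file name log_demo.txt and the sample keys are made up for the demo.

```perl
use strict;
use warnings;

my $log_file = 'log_demo.txt';    # hypothetical file name for this demo

# Write the keys, then close the handle so buffered output reaches the disk
open my $out_fh, '>', $log_file or die "Can't write $log_file: $!";
print $out_fh "$_\n" for '573103154377,1240', '573103150033,0664', '573103154377,1240';
close $out_fh or die "Can't close $log_file: $!";

# Only now reopen the same file for reading and collect repeated lines
my (%seen, @duplicates);
open my $in_fh, '<', $log_file or die "Can't read $log_file: $!";
while ( my $line = <$in_fh> ) {
    chomp $line;
    push @duplicates, $line if $seen{$line}++;
}
close $in_fh;
unlink $log_file;                 # remove the demo file again

print "Duplicate: $_\n" for @duplicates;
```

This also explains why commenting out the first half appeared to help: the log.txt left behind by the previous run had already been flushed when that run exited.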

perl write variables to a file

Here's my code to parse a configuration file, write the retrieved data to another file and send it to a MySQL database.
The database connection and writing data to a table works fine, however I can't get it to write data to the mentioned file mongoData.txt.
I'm quite new to Perl, so any help will be highly appreciated.
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
my $line;
# Retrieving data
open( my $FILE, "<", "/etc/mongod.conf" )
or die "Cannot find file! : $!\n";
while ( $line = <$FILE> ) {
chomp($line);
my ( $KEY, $VALUE ) = split /\:/, $line;
# Ignoring commented lines
$_ = $line;
unless ( $_ = ~/^#/ ) {
# Write to file
open my $FILE2, ">", "/home/sierra/Documents/mongoData.txt"
or die "Cannot create file $!\n";
print $FILE2 "$KEY", "$VALUE\n";
}
# Connection to SQL database
my $db = DBI->connect(( "dbi:mysql:dbname=mongodconf;
host = localhost;", "root", "sqladmin"
)) or die "can't connect to mysql";
# Inserting into database
$db->do("insert into data values ('$KEY', '$VALUE')")
or die "query error\n";
}
close($FILE);
Every time you open a file for output, you create a new file and delete any pre-existing file with the same name. Because the open is inside the loop, you are left with only the last line you wrote to the file.
Here are some more pointers:
Variable identifiers should in general consist only of digits, lower-case letters, and underscores. Capital letters are reserved for global identifiers such as package names.
autodie has been a core module since Perl v5.10.1, so on any reasonably recent Perl you can use it to check all IO operations for you, which removes the need to test the return status by hand.
If you use a die string that has no newline at the end, then Perl will add information about the source file name and line number where it occurred, which can be useful for debugging.
It is unnecessary to name your loop control variables. Programs can be made much more concise and readable by using Perl's pronoun variable $_, which is the default for many built-in operators.
It is wasteful to reconnect to your database every time you need to make changes. You should connect once at the top of your program and use that one connection throughout your code.
You should use placeholders when passing parameter expressions to an SQL operation. Interpolating values directly into the SQL string can be dangerous (it allows SQL injection), and with placeholders DBI will quote the values correctly for you.
There is no need to close input files explicitly. Everything will be closed automatically at the end of the program. But if you are worried about the integrity of your output data, you may want to do an explicit close on output file handles so that you can check that they succeeded.
Here's what I would write. Rather than testing whether each line of the input begins with a hash, it removes everything from the first hash character onwards and then checks to see if there are any non-blank characters in what remains. That allows for trailing comments in the data:
#!/usr/bin/perl
use strict;
use warnings 'all';
use autodie;
use DBI;
my ($input, $output, $dsn) = qw{
/etc/mongod.conf
/home/sierra/Documents/mongoData.txt
dbi:mysql:dbname=mongodconf;host=localhost;
};
open my $fh, '<', $input;
open my $out_fh, '>', $output;
my $dbh = DBI->connect($dsn, qw/ root sqladmin /)
or die "Can't connect to MySQL: $DBI::errstr";
while ( <$fh> ) {
chomp;
s/#.*//;
next unless /\S/;
my ( $key, $val ) = split /\:/;
print $out_fh "$key $val\n";
$dbh->do('insert into data values (?, ?)', undef, $key, $val); # second argument is the attribute hash; bind values follow it
}
close $out_fh or die $!;
$dbh->disconnect or warn $dbh->errstr;
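The comment-stripping and splitting steps can be tried in isolation, without the files or the database. This is only a sketch: parse_conf_line is a hypothetical helper name, the sample lines are invented, and it assumes (as the program above does) that a '#' never appears inside a value. It also passes a limit of 2 to split, which keeps any further colons inside the value.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (key, value) for a "key: value" line, or an
# empty list when the line is blank or holds only a comment.
sub parse_conf_line {
    my ($line) = @_;
    $line =~ s/#.*//s;                        # drop everything from the first '#'
    return unless $line =~ /\S/;              # nothing left but whitespace
    my ($key, $val) = split /:/, $line, 2;    # limit 2: keep colons in the value
    $val = '' unless defined $val;
    s/^\s+|\s+$//g for $key, $val;            # trim surrounding whitespace
    return ($key, $val);
}

# Demo on invented sample lines
for my $line ("# network settings\n", "port: 27017  # default\n", "bindIp: ::1\n") {
    my ($key, $val) = parse_conf_line($line) or next;
    print "$key=$val\n";
}
```

The "or next" works because a list assignment in scalar context yields the number of values returned, which is zero for a skipped line.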
You need to append the text to the new file mongoData.txt, so open it with ">>" rather than ">":
while ($line=<$FILE>)
{
chomp ($line);
my ($KEY, $VALUE) = split /\:/,$line;
# Ignoring commented lines
$_ = $line;
unless ($_ =~ /^#/)
{
open my $FILE2, ">>", "/home/sierra/Documents/mongoData.txt" or die "Cannot create file $!\n";
print $FILE2 "$KEY","$VALUE\n";
close($FILE2); # close here: $FILE2 is only in scope inside this block
}
}
Alternatively, create the text file once, before entering the while loop:
open my $FILE2, ">", "/home/sierra/Documents/mongoData.txt"
or die "Cannot create file $!\n";
while ($line=<$FILE>)
{
chomp ($line);
my ($KEY, $VALUE) = split /\:/,$line;
# Ignoring commented lines
$_ = $line;
unless ($_ =~ /^#/)
{
print $FILE2 "$KEY","$VALUE\n";
}
}
close($FILE2);
Maybe this will help you.

Extract data from file

I have data like
"scott
E -45 COLLEGE LANE
BENGALI MARKET
xyz -785698."
"Tomm
D.No: 4318/3,Ansari Road, Dariya Gunj,
xbc - 289235."
I wrote one Perl program to extract names i.e;
open(my$Fh, '<', 'printable address.txt') or die "!S";
open(my$F, '>', 'names.csv') or die "!S";
while (my @line =<$Fh> ) {
for(my$i =0;$i<=13655;$i++){
if ($line[$i]=~/^"/) {
print $F $line[$i];
}
}
}
It works fine and extracts the names exactly. Now my aim is to extract the addresses, like
BENGALI MARKET
xyz -785698."
D.No: 4318/3,Ansari Road, Dariya Gunj,
xbc - 289235."
into a CSV file. How can I do this?
There are a lot of flaws in your original code. We should address those before suggesting any enhancements:
Always have use strict; and use warnings; at the top of every script.
Your or die "!S" statements are broken. The error message is actually in $!. However, you can skip the need to do that by just adding use autodie; to your script.
Give your filehandles more meaningful names. $Fh and $F say nothing about what those are for. At minimum label them as $infh and $outfh.
The while (my @line = <$Fh>) { is flawed as that can just be reduced to my @line = <$Fh>;. Because you're doing a readline in list context, it will slurp the entire file on the first iteration, and on the next iteration the loop will exit. Instead, assign to a scalar, and you don't even need the inner for loop.
If you wanted to slurp your entire file into @line, your use of for(my$i =0;$i<=13655;$i++){ is also flawed. You should iterate to the last index of @line, which is $#line.
if ($line[$i]=~/^"/) { is also flawed as you leave the quote character " at the beginning of your names that you're trying to match. Instead add a capture group to pull the name.
With the suggested changes, the code reduces to:
use strict;
use warnings;
use autodie;
open my $infh, '<', 'printable address.txt';
open my $outfh, '>', 'names.csv';
while (my $line = <$infh>) {
if ($line =~ /^"(.*)/) {
print $outfh "$1\n";
}
}
Now if you also want to isolate the address, you can use a similar method as you did with the name. I'm going to assume that you might want to build the whole address in a variable so you can do something more complicated with it than throwing them blindly at a file. However, mirroring the file setup for now:
use strict;
use warnings;
use autodie;
open my $infh, '<', 'printable address.txt';
open my $namefh, '>', 'names.csv';
open my $addressfh, '>', 'address.dat';
my $address = '';
while (my $line = <$infh>) {
if ($line =~ /^"(.*)/) {
print $namefh "$1\n";
} elsif ($line =~ /(.*)"$/) {
$address .= $1;
print $addressfh "$address\n";
$address = '';
} else {
$address .= $line;
}
}
Ultimately, no matter what you want to use your data for, your best solution is probably to output it to a real CSV file using Text::CSV. That way it can be imported into a spreadsheet or some other system very easily, and you won't have to parse it again.
use strict;
use warnings;
use autodie;
use Text::CSV;
my $csv = Text::CSV->new ( { binary => 1, eol => "\n" } )
or die "Cannot use CSV: ".Text::CSV->error_diag ();
open my $infh, '<', 'printable address.txt';
open my $outfh, '>', 'address.csv';
my @data;
while (my $line = <$infh>) {
# Name Field
if ($line =~ /^"(.*)/) {
@data = ($1, '');
# End of Address
} elsif ($line =~ /(.*)"$/) {
$data[1] .= $1;
$csv->print($outfh, \@data);
# Address lines
} else {
$data[1] .= $line;
}
}

How to search and replace using hash with Perl

I'm new to Perl and I'm afraid I am stuck and wanted to ask if someone might be able to help me.
I have a file with two columns (tab separated) of oldname and newname.
I would like to use the oldname as key and newname as value and store it as a hash.
Then I would like to open a different file (gff file) and replace all the oldnames in there with the newnames and write it to another file.
I have given it my best try but am getting a lot of errors.
If you could let me know what I am doing wrong, I would greatly appreciate it.
Here are how the two files look:
oldname newname(SFXXXX) file:
genemark-scaffold00013-abinit-gene-0.18 SF130001
augustus-scaffold00013-abinit-gene-1.24 SF130002
genemark-scaffold00013-abinit-gene-1.65 SF130003
file to search and replace in (an example of one of the lines):
scaffold00013 maker gene 258253 258759 . - . ID=maker-scaffold00013-augustus-gene-2.187;Name=maker-scaffold00013-augustus-gene-2.187;
Here is my attempt:
#!/usr/local/bin/perl
use warnings;
use strict;
my $hashfile = $ARGV[0];
my $gfffile = $ARGV[1];
my %names;
my $oldname;
my $newname;
if (!defined $hashfile) {
die "Usage: $0 hash_file gff_file\n";
}
if (!defined $gfffile) {
die "Usage: $0 hash_file gff_file\n";
}
###save hashfile with two columns, oldname and newname, into a hash with oldname as key and newname as value.
open(HFILE, $hashfile) or die "Cannot open $hashfile\n";
while (my $line = <HFILE>) {
chomp($line);
my ($oldname, $newname) = split /\t/;
$names{$oldname} = $newname;
}
close HFILE;
###open gff file and replace all oldnames with newnames from %names.
open(GFILE, $gfffile) or die "Cannot open $gfffile\n";
while (my $line2 = <GFILE>) {
chomp($line2);
eval "$line2 =~ s/$oldname/$names{oldname}/g";
open(OUT, ">SFrenamed.gff") or die "Cannot open SFrenamed.gff: $!";
print OUT "$line2\n";
close OUT;
}
close GFILE;
Thank you!
Your main problem is that you aren't splitting the $line variable. split /\t/ splits $_ by default, and you haven't put anything in there.
This program builds the hash, and then constructs a regex from all the keys by sorting them in descending order of length and joining them with the | regex alternation operator. The sorting is necessary so that the longest of all possible choices is selected if there are any alternatives.
Every occurrence of the regex is replaced by the corresponding new name in each line of the input file, and the output written to the new file.
use strict;
use warnings;
die "Usage: $0 hash_file gff_file\n" if @ARGV < 2;
my ($hashfile, $gfffile) = @ARGV;
open(my $hfile, '<', $hashfile) or die "Cannot open $hashfile: $!";
my %names;
while (my $line = <$hfile>) {
chomp($line);
my ($oldname, $newname) = split /\t/, $line;
$names{$oldname} = $newname;
}
close $hfile;
# quotemeta escapes regex metacharacters such as "." in the names
my $regex = join '|', map { quotemeta($_) } sort { length $b <=> length $a } keys %names;
$regex = qr/$regex/;
open(my $gfile, '<', $gfffile) or die "Cannot open $gfffile: $!";
open(my $out, '>', 'SFrenamed.gff') or die "Cannot open SFrenamed.gff: $!";
while (my $line = <$gfile>) {
chomp($line);
$line =~ s/($regex)/$names{$1}/g;
print $out $line, "\n";
}
close $out;
close $gfile;
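To see why the keys are sorted longest first before being joined, here is a small self-contained sketch. The two names and their replacements are made up; build_regex is a hypothetical helper that escapes each key with quotemeta and wraps the alternation in a capture group.

```perl
use strict;
use warnings;

# Made-up names: one key is a prefix of the other
my %names = ( 'gene-1.2' => 'SF0001', 'gene-1.24' => 'SF0002' );

# Build a capturing alternation from the keys, in the order given
sub build_regex {
    my @keys = @_;
    my $alternation = join '|', map { quotemeta($_) } @keys;
    return qr/($alternation)/;
}

my $text = 'ID=gene-1.24;';

# Shortest key first: "gene-1.2" matches inside "gene-1.24"
my $short_first = build_regex( sort { length $a <=> length $b } keys %names );
( my $wrong = $text ) =~ s/$short_first/$names{$1}/g;

# Longest key first: the full name wins
my $long_first = build_regex( sort { length $b <=> length $a } keys %names );
( my $right = $text ) =~ s/$long_first/$names{$1}/g;

print "$wrong\n";    # ID=SF00014;
print "$right\n";    # ID=SF0002;
```

Regex alternation tries the branches left to right and takes the first that matches, so a shorter key listed first will consume part of a longer name and leave the rest behind.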
Why are you using an eval? Also, $oldname is going to be undefined in the second while loop, because the first while loop redeclares it with my in its own scope (and even if you used the outer-scope variable, it would only hold the very last value that you processed, which wouldn't be helpful).
Take out the my $oldname and my $newname at the top of your script; they are useless.
Take out the entire eval line. You need to repeat the regex for each thing you want to replace. Try something like:
$line2 =~ s/\Q$_\E/$names{$_}/g for keys %names; # \Q...\E treats the name as literal text
Also see Borodin's answer. He made one big regex instead of a loop, and caught your lack of the second argument to split.

help merging perl code routines together for file processing

I need some perl help in putting these (2) processes/code to work together. I was able to get them working individually to test, but I need help bringing them together especially with using the loop constructs. I'm not sure if I should go with foreach..anyways the code is below.
Also, any best practices would be great too as I'm learning this language. Thanks for your help.
Here's the process flow I am looking for:
read a directory
look for a particular file
use the file name to strip out some key information to create a newly processed file
process the input file
create the newly processed file for each input file read (if i read in 10, I create 10 new files)
Part 1:
my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
next if ($file =~ /^\.+$/);
#Get filename attributes
if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
print "$1\n";
print "$2\n";
print "$3\n";
}
print "$file\n";
}
Part 2:
use strict;
use Digest::MD5 qw(md5_hex);
#Create new file
open (NEWFILE, ">/backups/processed/foo$1.name.$2-foo_p$3.out") || die "cannot create file";
my $data = '';
my $line1 = <>;
chomp $line1;
my @heading = split /,/, $line1;
my ($sep1, $sep2, $eorec) = ( "^A", "^E", "^D");
while (<>)
{
my $digest = md5_hex($data);
chomp;
my (@values) = split /,/;
my $extra = "__mykey__$sep1$digest$sep2" ;
$extra .= "$heading[$_]$sep1$values[$_]$sep2" for (0..scalar(@values));
$data .= "$extra$eorec";
print NEWFILE "$data";
}
#print $data;
close (NEWFILE);
You are using an old style of Perl programming. I recommend that you use functions and CPAN modules (http://search.cpan.org). Perl pseudocode:
use Modern::Perl;
# use...
sub get_input_files {
# return an array of files (@)
}
sub extract_file_info {
# takes the file name and returns an array of values (filename attrs)
}
sub process_file {
# reads the input file, takes the previous attribs and builds the output file
}
my @ifiles = get_input_files();
foreach my $ifile (@ifiles) {
my @attrs = extract_file_info($ifile);
process_file($ifile, @attrs);
}
Hope it helps
I've bashed your two code fragments together (making the second a sub that the first calls for each matching file) and, if I understood your description of the objective correctly, this should do what you want. Comments on style and syntax are inline:
#!/usr/bin/env perl
# - Never forget these!
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
# Parens on postfix "if" are optional; I prefer to omit them
next if $file =~ /^\.+$/;
if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
process_file($file, $1, $2, $3);
}
print "$file\n";
}
sub process_file {
my ($orig_name, $foo_x, $name_x, $p_x) = @_;
my $new_name = "/backups/processed/foo$foo_x.name.$name_x-foo_p$p_x.out";
# - From your description of the task, it sounds like we actually want to
# read from the found file, not from <>, so opening it here to read
# - Better to use lexical ("my") filehandle and three-arg form of open
# - "or" has lower operator precedence than "||", so less chance of
# things being grouped in the wrong order (though either works here)
# - Including $! in the error will tell why the file open failed
# - readdir returns bare file names, so prepend $target_dir (which ends in "/")
open my $in_fh, '<', "$target_dir$orig_name" or die "cannot read $orig_name: $!";
open(my $out_fh, '>', $new_name) or die "cannot create $new_name: $!";
my $data = '';
my $line1 = <$in_fh>;
chomp $line1;
my @heading = split /,/, $line1;
my ($sep1, $sep2, $eorec) = ("^A", "^E", "^D");
while (<$in_fh>) {
chomp;
my $digest = md5_hex($data);
my (@values) = split /,/;
my $extra = "__mykey__$sep1$digest$sep2";
# - $#values is the last valid index; 0 .. scalar(@values) runs one past it
$extra .= "$heading[$_]$sep1$values[$_]$sep2"
for (0 .. $#values);
# - Useless use of double quotes removed on next two lines
$data .= $extra . $eorec;
#print $out_fh $data;
}
# - Moved print to output file to here (where it will print the complete
# output all at once) rather than within the loop (where it will print
# all previous lines each time a new line is read in) to prevent
# duplicate output records. This could also be achieved by printing
# $extra inside the loop. Printing $data at the end will be slightly
# faster, but requires more memory; printing $extra within the loop and
# getting rid of $data entirely would require less memory, so that may
# be the better option if you find yourself needing to read huge input
# files.
print $out_fh $data;
# - $in_fh and $out_fh will be closed automatically when they go out of
# scope at the end of the block/sub, so there's no real point to
# explicitly closing them unless you're going to check whether the close
# succeeded or failed (which can happen in odd cases usually involving
# full or failing disks when writing; I'm not aware of any way that
# closing a file open for reading can fail, so that's just being left
# implicit)
close $out_fh or die "Failed to close file: $!";
}
Disclaimer: perl -c reports that this code is syntactically valid, but it is otherwise untested.
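The lower-memory option mentioned in the comments (print each record inside the loop and drop $data) needs one extra step, because the digest is computed over everything written so far. A minimal sketch, assuming a running Digest::MD5 context is an acceptable stand-in for md5_hex($data): write_records is a hypothetical helper, and the separators and rows in the demo are made up.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: writes one record per data row to $out_fh without
# accumulating the whole output in memory. The running Digest::MD5 context
# stands in for md5_hex($data) over everything written so far.
sub write_records {
    my ($out_fh, $heading, $rows, $sep1, $sep2, $eorec) = @_;
    my $ctx = Digest::MD5->new;
    for my $values (@$rows) {
        # clone first: hexdigest resets the context it is called on
        my $digest = $ctx->clone->hexdigest;
        my $extra  = "__mykey__$sep1$digest$sep2";
        $extra .= "$heading->[$_]$sep1$values->[$_]$sep2" for 0 .. $#$values;
        my $record = $extra . $eorec;
        print {$out_fh} $record;
        $ctx->add($record);
    }
    return;
}

# Demo: write two rows to an in-memory filehandle
my $buffer = '';
open my $mem_fh, '>', \$buffer or die "cannot open in-memory handle: $!";
write_records($mem_fh, [qw(id name)], [[1, 'a'], [2, 'b']], '|', ';', "\n");
close $mem_fh or die "close failed: $!";
print $buffer;
```

In the real program the rows would come from the input file one line at a time rather than from an array, so only the current record and the digest state are held in memory.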