Comparing a CSV file with another file and finding matches is not working - perl

I am comparing a CSV file with another plain text file. Both files contain many of the same words (fields), but nothing matches.
use strict;
use warnings;
use Text::CSV;
use File::Slurp qw(read_file);

my $file = "sample.csv";
open my $fh, "<", $file or die "$file: $!";
my $csv1 = Text::CSV->new({
    binary    => 1,    # Allow special characters. Always set this
    auto_diag => 1,    # Report irregularities immediately
});
my @lines = read_file("brand1.txt");
my $count = 0;
while (my $row = $csv1->getline($fh)) {
    $count = $count + 1;
    foreach my $line (@lines) {
        my $che = $row->[4];
        print $count;
        if ($line eq $che) {
            print $line . "\t" . $che;
        }
    }
}
This code gives me no output in the terminal. But comparing two plain files (without the CSV file) works with the same script.

The best thing one can do when trying to figure out why two things aren't equal is to print those two things.
print "<<<$line>>>\n>>>$che<<<\n";
This will show you visually what the differences are, which most of the time will make it obvious.
In your case the issue is that read_file doesn't chomp its input, so each line in @lines has a \n at the end. However, your parsed CSV fields do not.
If you do this:
chomp @lines;
It should work fine.
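For reference, here is the corrected loop in full (a minimal sketch; it assumes read_file comes from File::Slurp, as the original code suggests):
use strict;
use warnings;
use Text::CSV;
use File::Slurp qw(read_file);

my $file = "sample.csv";
open my $fh, "<", $file or die "$file: $!";
my $csv1 = Text::CSV->new({ binary => 1, auto_diag => 1 });

my @lines = read_file("brand1.txt");
chomp @lines;    # strip trailing newlines so eq can match the CSV fields

while (my $row = $csv1->getline($fh)) {
    my $che = $row->[4];
    foreach my $line (@lines) {
        print "$line\t$che\n" if $line eq $che;
    }
}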


match first columns in two files

I have two files of unequal sizes. The first file has two columns and the second has only one. I want to match the column in the second file against the first column in the first file and, when they match, print the whole line from the first file. Pretty simple, but I am stuck. Here's what I did after opening both files and storing their contents in arrays:
foreach (@q) {    # second file
    $line = $_;
    foreach (@gs) {    # first file
        $line1 = $_;
        if ($line1 =~ /$line/) {
            print $line1;
        }
    }
}
This doesn't give any output.
I suspect you might be getting tripped up by line endings in one or both of your files. Regardless, it's not necessary to slurp both files, just the second one. And a regex is most likely overkill; a simple equality check is sufficient, and more likely what you intend.
The following is probably what you intend:
use strict;
use warnings;
use autodie;

my $file1 = 'foo.txt';
my $file2 = 'bar.txt';

open my $fh2, '<', $file2;
my @keys = <$fh2>;
chomp(@keys);

open my $fh1, '<', $file1;
while (my $line = <$fh1>) {
    my @fields = split ' ', $line;
    if (grep { $fields[0] eq $_ } @keys) {
        print $line;
    }
}
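One refinement: grep rescans @keys for every input line, so for a large key file a hash lookup is faster. A sketch of the same logic using a hash (same file names as above):
use strict;
use warnings;
use autodie;

open my $fh2, '<', 'bar.txt';
chomp(my @keys = <$fh2>);
my %is_key = map { $_ => 1 } @keys;    # constant-time membership test

open my $fh1, '<', 'foo.txt';
while (my $line = <$fh1>) {
    my @fields = split ' ', $line;
    print $line if @fields && $is_key{$fields[0]};
}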
use strict;
use warnings;

my $file2 = 'foo.txt';
my $file1 = 'bar.txt';
my @line1;

open FF, $file2 or die "$file2: $!";
while (<FF>) {
    unshift(@line1, $_);
}
close(FF);

open FH, $file1 or die "$file1: $!";
while (<FH>) {
    my $se = $_;
    chomp($se);
    foreach my $data (@line1) {
        if ($data =~ m/^\s*$se\s*\t/is) {
            print $data . "\n";
        }
    }
}
close(FH);
Try This....

Perl - Read multiple files and read each text file line by line

I am trying to read multiple .txt files in a folder. Each file should be read line by line; however, I failed to pick up the .txt files using glob. Any advice on my code?
my %data;
@FILES = glob("*.txt");

$EmailMsg .= "EG. Folder(week) = Folder(CW01) --CW01 = Week 1 -- Number is week\n ";
$EmailMsg .= "=======================================================================================================\n";

# Try to loop over multiple files here
foreach my $file (@FILES) {
    local $/ = undef;
    open my $fh, '<', $file;
    $data{$file} = <$fh>;
    # Read the file one line at a time.
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/^\s+//;
        $line =~ s/\s+$//;
        my ($name, $date, $week) = split /\:/, $line;
        if ($name eq "NoneFolder") {
            $EmailMsg .= "Folder ($week) - No Folder created on the FTP! Failed to open folder!\n";
        }
        if ($name eq "EmptyFiles") {
            $EmailMsg .= "Folder ($week) - No Files insides the folder! Failed download files!\n";
        }
    }
}
$EmailMsg .= "=======================================================================================================\n";
$EmailMsg .= "Please note that if you receive this email means that the script is running fine just that no folder is created or no files inside the folder for the week on the FTP.\n";
# close the file.
#close <$fh>;
Current output:
EG. Folder(week) = Folder(CW01) --CW01 = Week 1 -- Number is week
=======================================================================================================
=======================================================================================================
Please note that if you receive this email means that the script is running fine just that no folder is created or no files inside the folder for the week on the FTP.
It failed to get any .txt files.
You are trying to read each file twice: first into the hash %data and then again line by line.
Once you have reached end of file, you have to either reopen the file or use seek to move the read pointer back to the beginning.
You also need to set $/ back to its original value, otherwise your loop will read the entire file instead of one line at a time.
It's not clear whether you really need the second copy of the file data in the hash, but you can avoid having to reset $/ by putting the change within a block, like this:
open my $fh, '<', $file;
$data{$file} = do {
    local $/ = undef;
    <$fh>;
};
and then reset the file pointer to the start again before the while loop.
seek $fh, 0, 0;
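Putting those pieces together, the corrected loop might look like this (a sketch, assuming you do want both the slurped copy in %data and the line-by-line pass):
foreach my $file (@FILES) {
    open my $fh, '<', $file or die "$file: $!";

    # Slurp the whole file without clobbering $/ for the rest of the program
    $data{$file} = do {
        local $/ = undef;
        <$fh>;
    };

    # Rewind so the line loop below starts at the top of the file
    seek $fh, 0, 0;

    while (my $line = <$fh>) {
        chomp $line;
        # ... process $line as before ...
    }
}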
#!/usr/bin/perl
use strict;
use warnings FATAL => 'all';

my @files = ('Read a file.pl', 'Read a single text file.pl',
             'Read only one file.pl', 'Read the file using while.pl',
             'Reading the file.pl');
foreach my $i (@files) {
    open(FH, "<$i") or die "$i: $!";
    while (my $row = <FH>) {
        chomp $row;
        print "$row\n";
    }
    close FH;
}
The file globbing works for me. You might want to specify scope for your @FILES variable and check that there actually are files matching the path you have specified:
#!/bin/env perl
use strict;
use warnings;

## glob on all files in home directory
## see: http://perldoc.perl.org/File/Glob.html
use File::Glob ':globally';
my @configs = <~myname/project/etc/*.cfg>;
foreach my $fn (@configs) {
    print "file $fn\n";
}
And your code:
my %data;
# here are some .c files
my @FILES = glob("../*.c");
foreach my $fn (@FILES) {
    print "file $fn\n";
}
exit;
This way catches more garbage for about the same amount of code.
my $PATH = shift @ARGV;
chomp $PATH;
opendir(TXTFILE, $PATH) || die("failed to opendir: $PATH");
my @file = readdir TXTFILE;
closedir(TXTFILE);

foreach (@file) {
    next unless ($_ =~ /\.txt$/i);           # Only get .txt files
    $PATH =~ s/\/$//g; $PATH =~ s/$/\//;     # Uniform trailing slash
    my $thisfile = $PATH . $_;               # now a fully qualified filename
    unless (open(THISFILE, $thisfile)) {     # Notify on busted files.
        warn("$thisfile failed to open");
        next;
    }
    while (<THISFILE>) {
        # etc. etc.
    }
    close(THISFILE);
}

Parsing Tab Delimited File into an array

I am attempting to read a CSV into an array in a way that lets me access each column in a row. However, when I run the following code, intended to print a specific column from each row, it only outputs empty lines.
# set command line arguments
my ($infi, $outdir, $idcol) = @ARGV;

# load file of data to get annotations for
open FILE, "<", $infi or die "Can't read file '$infi' [$!]\n";

my @data;
foreach my $row (<FILE>) {
    chomp $row;
    my @cells = split /\t/, $row;
    push @data, @cells;
}

# fetch genes
foreach (@data) {
    print "@_[$idcol]\n";
    # print $geneadaptor->fetch_by_dbID($_[$idcol]);
}
With a test input of
a b c
1 2 3
d e f
4 5 6
I think the issue here isn't so much loading the file as handling the resulting array. How should I approach this problem?
First of all, you need to push @data, \@cells; otherwise you will get all the fields flattened into a single list.
Then you need to use the loop value in the second for loop.
foreach (@data) {
    print $_->[$idcol], "\n";
}
@_ is a completely different variable from $_ and is unpopulated here.
You should also consider using
while (my $row = <FILE>) { ... }
to read your file. It reads only a single line at a time whereas for will read the entire file into a list of lines before iterating over it.
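With both of those changes applied, the corrected version of your code would read (a sketch keeping your original filehandle and variable names):
my @data;
while (my $row = <FILE>) {
    chomp $row;
    my @cells = split /\t/, $row;
    push @data, \@cells;    # store a reference to each row's fields
}

foreach my $row (@data) {
    print $row->[$idcol], "\n";
}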
I recommend not parsing the CSV file by hand; use the Text::CSV module instead.
use Text::CSV;
use Carp;

# set command line arguments
my ($infi, $outdir, $idcol) = @ARGV;

my $csv = Text::CSV->new({
    sep_char => "\t"
});

open(my $fh, "<:encoding(UTF-8)", $infi) || croak "can't open $infi: $!";

# Uncomment if you need to skip a header line
# <$fh>;

while (<$fh>) {
    if ($csv->parse($_)) {
        my @columns = $csv->fields();
        print "$columns[0]\t$columns[1]\t$columns[2]\n";
    } else {
        my $err = $csv->error_input;
        print "Failed to parse line: $err";
    }
}
close $fh;
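As an aside, Text::CSV's getline method reads and parses in one step (and copes with fields that contain embedded newlines), so the read loop above could also be written, for example, as:
while (my $row = $csv->getline($fh)) {
    print join("\t", @{$row}[0 .. 2]), "\n";
}
$csv->eof or $csv->error_diag();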

File manipulation in Perl

I have a simple .csv file that I want to extract data out of and write to a new file.
I want to write a script that reads in the file, reads each line, then splits and restructures the columns in a different order; and if a line in the .csv contains 'xxx', it should not be written to the output file.
I have already managed to read in a file and create a secondary file, but I am new to Perl and still trying to work out the commands. The following is a test script I wrote to get to grips with Perl, and I was wondering if I could alter it to do what I need:
open (FILE, "c1.csv") || die "couldn't open the file!";
open (F1, ">c2.csv") || die "couldn't open the file!";
#print "start\n";

sub trim($);
sub trim($)
{
    my $string = shift;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

$a = 0;
$b = 0;
while ($line = <FILE>)
{
    chop($line);
    if ($line =~ /xxx/)
    {
        $addr = $line;
        $post = substr($line, length($line) - 18, 8);
    }
    $a = $a + 1;
}
print $b;
print " end\n";
Any help is much appreciated.
To manipulate CSV files it is better to use one of the available modules on CPAN. I like Text::CSV:
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1, empty_is_undef => 1 })
    or die "Cannot use CSV: " . Text::CSV->error_diag();

open my $fh, "<", 'c1.csv' or die "ERROR: $!";
$csv->column_names('field1', 'field2');
while (my $l = $csv->getline_hr($fh)) {
    next if ($l->{'field1'} =~ /xxx/);
    printf "Field1: %s Field2: %s\n", $l->{'field1'}, $l->{'field2'};
}
close $fh;
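Since the original goal was to write the reordered rows to c2.csv rather than just print them, Text::CSV can produce the output file too. A minimal sketch (the column order 2, 0, 1 is only an assumption; adjust it to your real layout):
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1, eol => "\n" })
    or die "Cannot use CSV: " . Text::CSV->error_diag();

open my $in,  '<', 'c1.csv' or die "c1.csv: $!";
open my $out, '>', 'c2.csv' or die "c2.csv: $!";

while (my $row = $csv->getline($in)) {
    next if grep { /xxx/ } @$row;              # skip lines containing 'xxx'
    $csv->print($out, [ @{$row}[2, 0, 1] ]);   # write columns in a new order
}
close $out;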
If you only need to do this once, and won't need the program later, you can do it with a one-liner:
perl -F, -lane 'next if /xxx/; @n=map { s/(^\s*|\s*$)//g;$_ } @F; print join(",", (map{$n[$_]} qw(2 0 1)));'
Breakdown:
perl -F, -lane
     ^^^ ^    <- split each line at ',' and store the fields in array @F
next if /xxx/;    # skip lines that contain xxx
@n=map { s/(^\s*|\s*$)//g;$_ } @F;
    # trim spaces from the beginning and end of each field
    # and store the result in a new array @n
print join(",", (map{$n[$_]} qw(2 0 1)));
    # recombine array @n in a new order - here 2 0 1
    # join the fields with commas
    # and print
Of course, for repeated use, or in a bigger project, you should use a CPAN module. And the above one-liner has many caveats too.

help merging perl code routines together for file processing

I need some Perl help in putting these two processes/pieces of code to work together. I was able to get them working individually for testing, but I need help bringing them together, especially with the loop constructs. I'm not sure if I should go with foreach; anyway, the code is below.
Also, any best practices would be great, as I'm learning this language. Thanks for your help.
Here's the process flow I am looking for:
read a directory
look for a particular file
use the file name to strip out some key information to create a newly processed file
process the input file
create the newly processed file for each input file read (if I read in 10, I create 10 new files)
Part 1:
my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
    next if ($file =~ /^\.+$/);
    # Get filename attributes
    if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
        print "$1\n";
        print "$2\n";
        print "$3\n";
    }
    print "$file\n";
}
Part 2:
use strict;
use Digest::MD5 qw(md5_hex);

# Create new file
open (NEWFILE, ">/backups/processed/foo$1.name.$2-foo_p$3.out") || die "cannot create file";

my $data = '';
my $line1 = <>;
chomp $line1;
my @heading = split /,/, $line1;
my ($sep1, $sep2, $eorec) = ("^A", "^E", "^D");
while (<>)
{
    my $digest = md5_hex($data);
    chomp;
    my (@values) = split /,/;
    my $extra = "__mykey__$sep1$digest$sep2";
    $extra .= "$heading[$_]$sep1$values[$_]$sep2" for (0..scalar(@values));
    $data .= "$extra$eorec";
    print NEWFILE "$data";
}
#print $data;
close (NEWFILE);
You are using an old style of Perl programming. I recommend you use functions and CPAN modules (http://search.cpan.org). Perl pseudocode:
use Modern::Perl;
# use ...

sub get_input_files {
    # return an array of files (@)
}
sub extract_file_info {
    # takes the file name and returns an array of values (filename attrs)
}
sub process_file {
    # reads the input file, takes the previous attribs and builds the output file
}

my @ifiles = get_input_files;
foreach my $ifile (@ifiles) {
    my @attrs = extract_file_info($ifile);
    process_file($ifile, @attrs);
}
Hope it helps
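As a concrete example of one of those pieces, extract_file_info could simply wrap the regex from Part 1 (a sketch; the sub name comes from the pseudocode above):
sub extract_file_info {
    my ($file) = @_;
    return unless $file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+\.csv$/;
    return ($1, $2, $3);    # the three attributes embedded in the name
}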
I've bashed your two code fragments together (making the second a sub that the first calls for each matching file) and, if I understood your description of the objective correctly, this should do what you want. Comments on style and syntax are inline:
#!/usr/bin/env perl

# - Never forget these!
use strict;
use warnings;

use Digest::MD5 qw(md5_hex);

my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
    # Parens on postfix "if" are optional; I prefer to omit them
    next if $file =~ /^\.+$/;
    if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
        process_file($file, $1, $2, $3);
    }
    print "$file\n";
}

sub process_file {
    my ($orig_name, $foo_x, $name_x, $p_x) = @_;
    my $new_name = "/backups/processed/foo$foo_x.name.$name_x-foo_p$p_x.out";

    # - From your description of the task, it sounds like we actually want to
    #   read from the found file, not from <>, so opening it here to read
    # - Better to use a lexical ("my") filehandle and the three-arg form of open
    # - "or" has lower operator precedence than "||", so less chance of
    #   things being grouped in the wrong order (though either works here)
    # - Including $! in the error will tell why the file open failed
    open my $in_fh, '<', $orig_name or die "cannot read $orig_name: $!";
    open(my $out_fh, '>', $new_name) or die "cannot create $new_name: $!";

    my $data = '';
    my $line1 = <$in_fh>;
    chomp $line1;
    my @heading = split /,/, $line1;
    my ($sep1, $sep2, $eorec) = ("^A", "^E", "^D");
    while (<$in_fh>) {
        chomp;
        my $digest = md5_hex($data);
        my (@values) = split /,/;
        my $extra = "__mykey__$sep1$digest$sep2";
        # - 0 .. $#values covers every field index; the original
        #   0 .. scalar(@values) would run one index past the end
        $extra .= "$heading[$_]$sep1$values[$_]$sep2"
            for (0 .. $#values);
        # - Useless use of double quotes removed on the next line
        $data .= $extra . $eorec;
        #print $out_fh $data;
    }

    # - Moved the print to the output file here (where it will print the
    #   complete output all at once) rather than within the loop (where it
    #   would print all previous lines each time a new line is read in) to
    #   prevent duplicate output records. This could also be achieved by
    #   printing $extra inside the loop. Printing $data at the end will be
    #   slightly faster, but requires more memory; printing $extra within the
    #   loop and getting rid of $data entirely would require less memory, so
    #   that may be the better option if you find yourself needing to read
    #   huge input files.
    print $out_fh $data;

    # - $in_fh and $out_fh will be closed automatically when they go out of
    #   scope at the end of the block/sub, so there's no real point in
    #   explicitly closing them unless you're going to check whether the
    #   close succeeded or failed (which can happen in odd cases, usually
    #   involving full or failing disks when writing; I'm not aware of any
    #   way that closing a file open for reading can fail, so that's just
    #   being left implicit)
    close $out_fh or die "Failed to close file: $!";
}
Disclaimer: perl -c reports that this code is syntactically valid, but it is otherwise untested.