Perl text::csv to check then add field - perl

I am reading an csv file and checking for text within a certain field.
If text exists I would like to keep the line and also add a field.
The following while loop in a program works fine, to save lines with text in col[17]:
while (my $row = $csv->getline($infh)) {
if ($row->[16] ne ""){
my $string = $csv->string;
print {$outfh} $string;
}
}
In another test program the following while loop also works fine, to add a field:
while (my $row = $csv->getline($infh)) {
splice #$row, 1, 0, "1GM";
$csv->print($outfh, $row);
}
I would like to combine these to work in the same program but I am having syntax nitemares.
Can anyone show some pointers to a perl novice?
Here is the full "test program":
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
use Cwd;
use File::Basename;
my $dirname = basename(getcwd);
my $input_file = $dirname . ".txt";
my $output_file = $dirname . '_extract.txt';
my $csv = Text::CSV->new({binary => 1, eol=> "\015\012"})
or die "Cannot use CSV: ".Text::CSV->error_diag ();
open my $infh, $input_file or die "Cannot open '$input_file': $!";
open my $outfh, '>>', $output_file or die "Cannot open '$output_file': $!";
while (my $row = $csv->getline($infh)) {
splice #$row, 1, 0, "1GM";
$csv->print($outfh, $row);
}
close $infh;
close $outfh;
$csv->eof or die "Processing of '$input_file' terminated prematurely";
Added:
I would like the test program to also include the test for the text, somehow using;
if ($row->[16] ne "")
From the other program - So that the extra column is only added to the rows I want to keep.
Cannot figure out how to do this...
Added 2/18: This Worked!, but does not seem very elegant...
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
use Cwd;
use File::Basename;
my $dirname = basename(getcwd);
my $input_file = $dirname . ".txt";
my $output_file = $dirname . '_extract.txt';
my $column = '1GM'; # Text for new column to insert if a match
my $csv = Text::CSV->new({binary => 1, eol=> "\015\012"})
or die "Cannot use CSV: ".Text::CSV->error_diag ();
open my $infh, $input_file or die "Cannot open '$input_file': $!";
open my $outfh, '>>', $output_file or die "Cannot open '$output_file': $!";
while (my $row = $csv->getline($infh)) {
if ($row->[16] ne "") # add/insert column if col[17] has text
{
splice #$row, 1, 0, "$column";
if ($row->[1] eq "$column") # print to file only rows with new column
{
$csv->print($outfh, $row);
}
}
}
close $infh;
close $outfh;
$csv->eof or die "Processing of '$input_file' terminated prematurely";
Added 2/18 5:40pm - Got it..This works!!
while (my $row = $csv->getline($infh)) {
splice #$row, 1, 0, "$column"; # add column
if ($row->[17] ne "") # print if test is true
{
$csv->print($outfh, $row);
}
}

Related

Write old fasta header and new to file

I want to extract the old fasta names which looks something like this:
>Bartonella bibbi
AUUCCGGUUGAUCCUGCCGGAGGCCACUGCUAUCGGGGUCCG
The new headers should look like this:
>Seq1
AUUCCGGUUGAUCCUGCCGGAGGCCACUGCUAUCGGGGUCCG
and so on...
The Bartonella Bibbi should be saved together with the new name Seq1 in a new file an so on. So I've started a bit, by looking for lines with >, and then I split to get an array to get the old name. I don't know how to continue, because I want two things here, first to put the new name in there, but also extracting the old name together with the new in a file, and ALSO get an output file with my sequence and my new names. Please, any input from you will help!
#!/usr/bin/perl
use warnings;
use strict;
my $infile = $ARGV[0];
open my $IN, '<', $infile or die "Could not open $infile: $!, $?";
while (my $line = <$IN>) {
if ($line =~ /^>/) {
my #header = split (/\>/, $line);
my $oldfasta = "$header[1]";
}
}
So after some edits, this is the current script:
#!/usr/bin/perl
use warnings;
use strict;
my $infile = $ARGV[0];
open my $IN, '<', $infile or die "Could not open $infile: $!, $?";
my $seqid = 1;
my %id;
while (my $line = <$IN>) {
if ($line =~ /^>/) {
$id{"Seq$seqid "} = $line;
print ">Seq$seqid\n";
$seqid++
} else {
print $line;
}
}
my $outfile = 'output';
open my $OUT, '>', $outfile or die "Could not open $outfile: $!, $?"; # overwrites the file $outfile;
print $OUT %id;
This gives me a file that looks like this:
Seq29 >Sulfophobococcus_zilligii
Seq20 >Pyrococcus_shinkaii
and so on.
They are not in order, how do I sort them and get rid of the > in the species name?
You’re simply not printing anything. Once you add a print statement, it should work.
In addition, it’s unclear what you’re using split for. Just increase a counter for the sequence:
#!/usr/bin/perl
use warnings;
use strict;
my $infile = $ARGV[0];
open my $IN, '<', $infile or die "Could not open $infile: $!, $?";
my $seqid = 1;
while (my $line = <$IN>) {
if ($line =~ /^>/) {
print ">Seq$seqid\n";
$seqid++;
} else {
print $line;
}
}
Simply write the new entries as you create them.
#!/usr/bin/perl
use warnings;
use strict;
my $infile = $ARGV[0];
open my $IN, '<', $infile or die "Could not open $infile: $!, $?";
my $outfile = 'output';
open my $OUT, '>', $outfile or die "Could not open $outfile: $!, $?"; # overwrites the file $outfile;
my $seqid = 1;
while (my $line = <$IN>) {
if ($line =~ /^>(.+)/) {
print $OUT "Seq$seqid\t$1\n"
print ">Seq$seqid\n";
$seqid++
} else {
print $line;
}
}
I tried to fix the indentation but left the gratutious variable for the $OUT file name.
If you want to keep the mapping in memory for other reasons (maybe to develop this into a much more complex script) using an array instead of a hash would seem like a natural way to keep the entries sorted; the new label is trivially derivable from the array index.

Read multiple files from folder in perl

I'm pretty new on perl and in need for some help, basically what I want is a program that reads all .txt files from a folder, doing the script and throw the output in a new folder with a new name. Everything works when I'm working with one file at the time, specifying the name of the file.. But I can't get it to work with all of the files in the folder. This is how far I've gotten.
#!/usr/bin/perl
use warnings;
use strict;
use Path::Class;
use autodie;
use File::Find;
my #now = localtime();
my $timeStamp = sprintf(
"%04d%02d%02d-%02d:%02d:%02d",
$now[5] + 1900,
$now[4] + 1,
$now[3], $now[2], $now[1], $now[0]); #A function that translates time
my %wordcount;
my $dir = "/home/smenk/.filfolder";
opendir(DIR, $dir) || die "Kan inte öppna $dir: $!";
my #files = grep { /txt/ } readdir(DIR);
closedir DIR;
my $new_dir = dir("/home/smenk/.result"); # Reads in the folder for save
my $new_file = $new_dir->file("$timeStamp.log"); # Reads in the new file timestamp variable
open my $fh, '<', $dir or die "Kunde inte öppna '$dir' $!";
open my $fhn, '>', $new_file or die "test '$new_file'";
foreach my $file (#files) {
open(FH, "/home/smenk/.filfolder/$file") || die "Unable to open $file - $!\n";
while (<FH>) {
}
close(FH);
}
while (my $line = <$fh>) {
foreach my $str (split /\s+/, $line) {
$wordcount{$str}++;
}
}
my #listing = (sort { $wordcount{$b} <=> $wordcount{$a} } keys %wordcount)[0 .. 9];
foreach my $str (#listing) {
my $output = $wordcount{$str} . " $str\n";
print $fhn $output;
}
Here is the simplest skeleton for the reading part using Path::Class (see also dir and file:
#!/usr/bin/perl
use warnings;
use strict;
use Path::Class;
my $src = dir("/home/smenk/.filfolder");
my #txt_files = grep /[.] txt\z/x, $src->children;
for my $txt_file ( #txt_files ) {
my $in = $txt_file->openr;
while (my $line = <$in>) {
print "OUT: $line";
}
}
You can also use another great module Path::Tiny, for dir/file operations and the Time::Piece for the date/time functions - like:
#!/usr/bin/env perl
use strict;
use warnings;
use Path::Tiny;
use Time::Piece;
my #txtfiles = path("/home/smenk/.filfolder")->children(qr/\.txt\z/);
my $outdir = path("home/smenk/.result");
$outdir->mkpath; #create the dir...
my $t = localtime;
my $outfile = $outdir->child($t->strftime("%Y%m%d-%H%M%S.txt"));
$outfile->touch;
my #outdata;
for my $infile (#txtfiles) {
my #lines = $infile->lines({chomp => 1});
#do something with lines and create the output #data
push #outdata, scalar #lines;
}
$outfile->append({truncate => 1}, map { "$_\n" } #outdata); #or spew;

Modify Perl script to run for each file with specified extension in a given directory

I need to modify a Perl script x937.pl to run on all files with extension .x937 within a specific directory. Currently, I use a separate script test.pl that calls my main script, and runs it for each file of that type. However, I need to combine both into one script.
Ideally, I would be able to specify a directory path in the script, and loop through all *.x937 files in that directory.
test.pl
#!/usr/bin/perl -w
use strict;
use Encode;
my #files = <*.x937>;
foreach my $file (#files) {
system('x937.pl', $file);
}
x937.pl
#!/usr/bin/perl -w
use strict;
use Encode;
use warnings;
my $tiff_flag = 0;
my $count = 0;
my $file = "output_$ARGV[0].txt";
unless ( open OPUT, '>' . $file ) {
die "Unable to create $file";
}
open FILE, '<:raw', $ARGV[0] or die "Error opening '$ARGV[0]' $!";
binmode( FILE ) or die 'Error setting binary mode on input file';
while ( read( FILE, $_, 4 ) ) {
my $rec_len = unpack( "N", $_ );
die "Bad record length: $rec_len" unless ( $rec_len > 0 );
read( FILE, $_, $rec_len );
if ( substr( $_, 0, 2 ) eq "\xF5\xF2" ) {
if ( $tiff_flag ) {
$count++;
open( TIFF, '>', 'output_' . $ARGV[0] . '_img' . sprintf( "%04d", $count ) . '.tiff' )
or die "Can't create image file $!";
binmode( TIFF ) or die 'Error setting binary mode on image file';
print TIFF substr( $_, 117 );
close TIFF;
}
$_ = substr( $_, 0, 117 );
}
print OPUT decode( 'cp1047', $_ ) . "\n";
}
close FILE;
close OPUT;
I think I managed to generate this correctly (on iPad, sat on the sofa) ... There could be some typos ; )
Usage: perl test_x397.pl <path>
test_x397.pl
#!/usr/bin/perl -w
use strict; use warnings;
use Encode;
my ($path) = #ARGV;
$path // die "No path specified";
(-e $path) or die "Path not found: $path";
(-d $path) or die "Not a directory: $path";
my #files = <$path/*.x937>;
foreach my $file (#files) {
process($file);
}
sub process {
my ($fname) = #_;
my ($dir, $file) = $fname =~ m{^(.*)/(.+)$};
my $tiff_flag = 0;
my $count = 0;
my $outfile = sprintf("%s/output_%s.txt", $dir, $file);
open (my $outfh, '>', $outfile) or die "Unable to create $outfile. $!";
open (my $infh, '<:raw', $file) or die "Error opening '$file'. $!";
my $buffer = undef;
while (read ($infh,$buffer,4)) {
my $rec_len = unpack("N", $buffer);
die "Bad record length: $rec_len" unless ($rec_len > 0);
read ($infh, $buffer, $rec_len);
if (substr($buffer, 0, 2) eq "\xF5\xF2") {
if ($tiff_flag) {
$count++;
my $tiff_filename = sprintf('%s/output_%s_img%04d.tiff', $dir, $file, $count);
open (my $tiffh, '>', $tiff_filename) or die "Can't create image file $!";
binmode($tiffh) or die 'Error setting binary mode on image file';
print $tiffh substr($buffer, 117);
close $tiffh;
}
$buffer = substr($buffer, 0, 117);
}
print $outfh decode ('cp1047', $buffer) . "\n";
}
close $infh;
close $outfh;
}
A few things to note:
Always use the three argument version of open
Using a scalar filehandle makes it easier to pass it around (not necessary in this example but good practice)
Don't modify $_. It can lead to nasty surprises in larger programs
You already used sprintf to make part of your tiff filename, so why not use it for the whole thing.

Combining two csv files together in perl

Hi i'm very new to perl and i've got litle knowledge on it but i'm trying to create a script that conbines two .csv files into a new one
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV_XS;
my #rows;
{ # Read the CSV file
my $csv = Text::CSV_XS->new() or die "Cannot use Text::CSV_XS ($!)";
my $file = "file.csv";
open my $fh, '<', $file or die "Cannot open $file ($!)";
while (my $row = $csv->getline($fh)) {
push #rows, $row;
}
$csv->eof or $csv->error_diag();
close $fh or die "Failed to close $file ($!)";
}
{ # Gather the data
foreach my $row (#rows) {
foreach my $col (#{$row}) {
$col = uc($col);
}
print "\n";
}
}
# (over)Write the data
# Needs to be changed to ADD data
{
my $csv = Text::CSV_XS->new({ binary => 1, escape_char => undef })
or die "Cannot use Text::CSV ($!)";
my $file = "output.csv";
open my $fh, '>', $file or die "Cannot open $file ($!)";
$csv->eol("\n");
foreach my $row (#rows) {
$csv->print($fh, \#{$row}) or die "Failed to write $file ($!)";
}
close $fh or die "Failed to close $file ($!)";
}
this is my current code i do know this over write's the data insted of actually adding it to the new file but this is how far i managed to get with the limited time and knowledge i've got on perl
the csv format of both files are the same.
"Header1";"Header2";"Header3";"Header4";"Header5"
"Data1";"Data2";"Data3";"Data4";"Data5"
"Data1";"Data2";"Data3";"Data4";"Data5"
"Data1";"Data2";"Data3";"Data4";"Data5"
"Data1";"Data2";"Data3";"Data4";"Data5"
"Data1";"Data2";"Data3";"Data4";"Data5"
I believe the issue is here:
open my $fh, '>', $file
or die "Cannot open $file ($!)";
If I remember my Perl properly, the line should read:
open my $fh, '>>', $file
or die "Cannot open $file ($!)";
The >> should open the file handle $fh for append instead of for overwrite.
you could try something like this
opendir(hand,"DIRPATH");
#files = readdir(hand);
closedir(hand);
foreach(#files){
if(/\.csv$/i) { #if the filename has .csv at the end
push(#csvfiles,$_);
}
}
foreach(#csvfiles) {
$csvfile=$_;
open(hanr,"DIRPATH".$csvfile)or die"error $!\n"; #read handler
open(hanw , ">>DIRPATH"."outputfile.csv") or die"error $! \n"; #write handler for creating new sorted files
#lines=();
#lines=<hanr>;
foreach $line (#lines){
chomp $line;
$count++;
next unless $count; # skip header i.e the first line containing stock details
print hanw join $line,"\n";
}
$count= -1;
close(hanw);
close(hanr);
}`

Extracting specific multiple line of records that is pipe delimited in perl

I have a file that looks like
NAME|JOHN|TOKYO|JPN
AGE|32|M
INFO|SINGLE|PROFESSIONAL|IT
NAME|MARK|MANILA|PH
AGE|37|M
INFO|MARRIED|PROFESSIONAL|BPO
NAME|SAMANTHA|SYDNEY|AUS
AGE|37|F
INFO|MARRIED|PROFESSIONAL|OFFSHORE
NAME|LUKE|TOKYO|JPN
AGE|27|M
INFO|SINGLE|PROFESSIONAL|IT
I want to separate the records by country. I have stored each line into array variable #fields
my #fields = split(/\|/, $_ );
making $fields[3] as my basis for sorting it. I wanted it to separate into 2 output text files
OUTPUT TEXT FILE 1:
NAME|JOHN|TOKYO|JPN
AGE|32|M
INFO|SINGLE|PROFESSIONAL|IT
NAME|LUKE|TOKYO|JPN
AGE|27|M
INFO|SINGLE|PROFESSIONAL|IT
OUTPUT TEXT FILE 2
NAME|MARK|MANILA|PH
AGE|37|M
INFO|MARRIED|PROFESSIONAL|BPO
NAME|SAMANTHA|SYDNEY|AUS
AGE|37|F
INFO|MARRIED|PROFESSIONAL|OFFSHORE
Putting all that is from JPN to output text 1 & non-JPN country to output text file 2
here's the code that what trying to work out
use strict;
use warnings;
use Data::Dumper;
use Carp qw(croak);
my #fields;
my $tmp_var;
my $count;
;
my ($line, $i);
my $filename = 'data.txt';
open(my $input_fh, '<', $filename ) or croak "Can't open $filename: $!";
open(OUTPUTA, ">", 'JPN.txt') or die "wsl_reformat.pl: could not open $ARGV[0]";
open(OUTPUTB, ">", 'Non-JPN.txt') or die "wsl_reformat.pl: could not open $ARGV[0]";
my $fh;
while (<$input_fh>) {
chomp;
my #fields = split /\|/;
if ($fields[0] eq 'NAME') {
for ($i=1; $i < #fields; $i++) {
if ($fields[3] eq 'JPN') {
$fh = $_;
print OUTPUTA $fh;
}
else {
$fh = $_;
print OUTPUTB $fh;
}
}
}
}
close(OUTPUTA);
close(OUTPUTB)
Still has no luck on it :(
Here is the way I think ikegami was saying, but I've never tried this before (although it gave the correct results).
#!/usr/bin/perl
use strict;
use warnings;
open my $jpn_fh, ">", 'o33.txt' or die $!;
open my $other_fh, ">", 'o44.txt' or die $!;
my $fh;
while (<DATA>) {
if (/^NAME/) {
if (/JPN$/) {
$fh = $jpn_fh;
}
else {
$fh = $other_fh;
}
}
print $fh $_;
}
close $jpn_fh or die $!;
close $other_fh or die $!;
__DATA__
NAME|JOHN|TOKYO|JPN
AGE|32|M
INFO|SINGLE|PROFESSIONAL|IT
NAME|MARK|MANILA|PH
AGE|37|M
INFO|MARRIED|PROFESSIONAL|BPO
NAME|SAMANTHA|SYDNEY|AUS
AGE|37|F
INFO|MARRIED|PROFESSIONAL|OFFSHORE
NAME|LUKE|TOKYO|JPN
AGE|27|M
INFO|SINGLE|PROFESSIONAL|IT
You didn't say what you needed help with, so I'm assuming it's coming up with an algorithm. Here's a good one:
Open the file to read.
Open the file for the JPN entries.
Open the file for the non-JPN entries.
While not eof,
Read a line.
Parse the line.
If it's the first line of a record,
If the person's country is JPN,
Set current file handle to the file handle for JPN entries.
Else,
Set current file handle to the file handle for non-JPN entries.
Print the line to the current file handle.
my $jpn_qfn = '...';
my $other_qfn = '...';
open(my $jpn_fh, '>', $jpn_qfn)
or die("Can't create $jpn_qfn: $!\n");
open(my $other_fh, '>', $other_qfn)
or die("Can't create $other_qfn: $!\n");
my $fh;
while (<>) {
chomp;
my #fields = split /\|/;
if ($fields[0] eq 'NAME') {
$fh = $fields[3] eq 'JPN' ? $jpn_fh : $other_fh;
}
say $fh $_;
}
#!/usr/bin/env perl
use 5.012;
use autodie;
use strict;
use warnings;
# store per country output filehandles
my %output;
# since this is just an example, read from __DATA__ section
while (my $line = <DATA>) {
# split the fields
my #cells = split /[|]/, $line;
# if first field is NAME, this is a new record
if ($cells[0] eq 'NAME') {
# get the country code, strip trailing whitespace
(my $country = $cells[3]) =~ s/\s+\z//;
# if we haven't created and output file for this
# country, yet, do so
unless (defined $output{$country}) {
open my $fh, '>', "$country.out";
$output{$country} = $fh;
}
my $out = $output{$country};
# output this and the next two lines to
# country specific output file
print $out $line, scalar <DATA>, scalar <DATA>;
}
}
close $_ for values %output;
__DATA__
NAME|JOHN|TOKYO|JPN
AGE|32|M
INFO|SINGLE|PROFESSIONAL|IT
NAME|MARK|MANILA|PH
AGE|37|M
INFO|MARRIED|PROFESSIONAL|BPO
NAME|SAMANTHA|SYDNEY|AUS
AGE|37|F
INFO|MARRIED|PROFESSIONAL|OFFSHORE
NAME|LUKE|TOKYO|JPN
AGE|27|M
INFO|SINGLE|PROFESSIONAL|IT
Thanks for your Help heaps
I was able to solved this problem in perl,
many thanks
#!/usr/local/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Carp qw(croak);
my #fields;
my $tmp_var;
my ($rec_type, $country);
my $filename = 'data.txt';
open (my $input_fh, '<', $filename ) or croak "Can't open $filename: $!";
open my $OUTPUTA, ">", 'o33.txt' or die $!;
open my $OUTPUTB, ">", 'o44.txt' or die $!;
my $Combline;
while (<$input_fh>) {
$_ = _trim($_);
#fields = split (/\|/, $_);
$rec_type = $fields[0];
$country = $fields[3];
if ($rec_type eq 'NAME') {
if ($country eq 'JPN') {
*Combline = $OUTPUTA;
}
else {
*Combline = $OUTPUTB;
}
}
print Combline;
}
close $OUTPUTA or die $!;
close $OUTPUTB or die $!;
sub _trim {
my $word = shift;
if ( $word ) {
$word =~ s/\s*\|/\|/g; #remove trailing spaces
$word =~ s/"//g; #remove double quotes
}
return $word;
}