How do I copy a CSV file, but skip the first line? - perl

I want to write a script that takes a CSV file, deletes its first row and creates a new output csv file.
This is my code:
use Text::CSV_XS;
use strict;
use warnings;
my $csv = Text::CSV_XS->new({sep_char => ','});
my $file = $ARGV[0];
open(my $data, '<', $file) or die "Could not open '$file'\n";
my $csvout = Text::CSV_XS->new({binary => 1, eol => $/});
open my $OUTPUT, '>', "file.csv" or die "Can't able to open file.csv\n";
my $tmp = 0;
while (my $line = <$data>) {
# if ($tmp==0)
# {
# $tmp=1;
# next;
# }
chomp $line;
if ($csv->parse($line)) {
my #fields = $csv->fields();
$csvout->print($OUTPUT, \#fields);
} else {
warn "Line could not be parsed: $line\n";
}
}
On the perl command line I write: c:\test.pl csv.csv and it doesn't create the file.csv output, but when I double click the script it creates a blank CSV file. What am I doing wrong?

Your program isn't ideally written, but I can't tell why it doesn't work if you pass the CSV file on the command line as you have described. Do you get the errors Could not open 'csv.csv' or Can't able to open file.csv? If not then the file must be created in your current directory. Perhaps you are looking in the wrong place?
If all you need to do is to drop the first line then there is no need to use a module to process the CSV data - you can handle it as a simple text file.
If the file is specified on the command line, as in c:\test.pl csv.csv, you can read from it without explicitly opening it using the <> operator.
This program reads the lines from the input file and prints them to the output only if the line counter (the $. variable) isn't equal to one).
use strict;
use warnings;
open my $out, '>', 'file.csv' or die $!;
while (my $line = <>) {
print $out $line unless $. == 1;
}

Yhm.. you don't need any modules for this task, since CSV ( comma separated value ) are simply text files - just open file, and iterate over its lines ( write to output all lines except particular number, e.g. first ). Such task ( skip first line ) is so simple, that it would be probably better to do it with command line one-liner than a dedicated script.
quick search - see e.g. this link for an example, there are numerous tutorials about perl input/output operations
http://learn.perl.org/examples/read_write_file.html
PS. Perl scripts ( programs ) usually are not "compiled" into binary file - they are of course "compiled", but, uhm, on the fly - that's why /usr/bin/perl is called rather "interpreter" than "compiler" like gcc or g++. I guess what you're looking for is some editor with syntax highlighting and other development goods - you probably could try Eclipse with perl plugin for that ( cross platform ).
http://www.eclipse.org/downloads/
http://www.epic-ide.org/download.php/
this
user#localhost:~$ cat blabla.csv | perl -ne 'print $_ if $x++; '
skips first line ( prints out only if variable incremented AFTER each use of it is more than zero )

You are missing your first (and only) argument due to Windows.
I think this question will help you: #ARGV is empty using ActivePerl in Windows 7

Related

Replace a string in file

My file is like this
DIV=25
FACILITY=11111
and I want to use Perl to replace DIV=25 into DIV=30. Below is my script to do it, but the output of the file is DIV=3030
open( IN_IOE, $FILE_NAME ) || die "Cannot open file";
my #line_ioe = <IN_IOE>;
close(IN_IOE);
chomp #line_ioe;
foreach $_ ( #line_ioe ) {
s/DIV=/DIV=30/
}
open( OUT, ">test.txt" );
foreach $_ (#line_ioe) {
print OUT "$_ \n";
}
close(OUT);
The output of my file is
DIV=3030
FACILITY=11111
Can anyone please show me how to replace that line in file with Perl, and point out where I was wrong.
You can do that in one line of Perl at the command line:
perl -pi -e 's/DIV=25/DIV=30/' file.txt
if you have multiple lines with different numeric numbers (i.e. DIV=25, DIV =31, DIV=21) you could do following.
s/DIV=\d+/DIV=25/g
here \d is to replace any digits and 'g' to perform this globally.
The code you show certainly didn't change DIV=30 into DIV=3030. It didn't do anything at all because you have opened your output file for input
This line
open( OUT, "<test.txt");
should look like this
open OUT, '>', 'test.txt' or die $!;
Also, if you want to replace DIV=30 with DIV=25 then you need to write that. I think it's clear that the substitution
s/DIV=/DIV=25/
will change DIV=30 into DIV=2530. Use this instead
s/DIV=30/DIV=25/

Perl: Merge Multiple Text File and Split Filename

I am using below solution from earlier solution from Alan, which works to combine text files with Pipe Character. Thanks !
Merge multiple text files and append current file name at the end of each line
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new( { 'sep_char' => '|' } );
open my $fho, '>', 'combined.csv' or die "Error opening file: $!";
while ( my $file = <*.txt> ) {
open my $fhi, '<', $file or die "Error opening file: $!";
( my $last_field = $file ) =~ s/\.[^\.]+$//; # Strip the file extension off
while ( my $row = $csv->getline($fhi) ) {
$csv->combine( #$row, $last_field ); # Construct new row by appending the file name without the extension
print $fho $csv->string, "\n"; # Write the combined string to combined.csv
}
}
If anyone could help enhance the solution with following 3 more requirements, would be helpful.
1) Within my text data, some data is within quotation marks in forllowing format; |"XYZ NR 456"|
Above solution is placing these data into different columns when I open the final combine.csv file, Is there a way to ensure all data remains combined within pipe character when merged.
2) Delete an entire line where it finds word Place_of_Destination
3) Current solution adds the filename at end of each line. I also want to split the file name with pipe characters
My filename structure is X_INRUS6_08072013.txt, solution adds pipe character and filename X_INRUS6_08072013 at the end of each line, What I also want to do is to split this file name further in following ; X | INRUS6 | 08072013
Is this possible, Thanks in advance for all the help.
As for your point 2, in order to skip the lines, you could use a 'next if', with the effect of skipping the line concerned:
next if ($line =~ /Place_of_Destination/)
or 'unless', as your wish:
unless ($line =~ /Place_of_Destination/) {
# do something
}
If I understood properly what you are trying to do this test is to be used in your second 'while' loop.

I want to replace a sequence name in fasta file with another name

I have one fasta file and one text file fasta file contains sequences in fasta format and text file contains name of genes now I want to replace name of the sequences in fasta file after '>' sign with the gene names in text file
I am new to perl though I have written a script but I don't know why its not working can anyone help me on that please
following is my script:
print"Enter annotated file...";
$f1=<STDIN>;
print"Enter sequence file...";
$f2=<STDIN>;
open(FILE1,$f1) || die"Can't open $f1";
#annotfile=<FILE1>;
open(FILE2,$f2) || die"Can't open $f2";
#seqfile=<FILE2>;
#d=split('\t',#annotfile[0]);
for($i=0;$i<scalar(#annotfile);$i++)
{
#curr_all=split('\t',#annotfile[$i]);
#curr_id[$i]=#curr_all[0];
#gene_nm[$i]=#curr_all[1];
}
for($j=0;$j<scalar(#seqfile);$j++)
{
$id=#curr_id[$j];
$gene=#gene_nm[$j];
#seqfile[$j]=~s/$id[$j]/$gene[$j]/g;
print #seqfile[$j];
}
my files looks like following:
annot.txt
pool75_contig_389 ubiquitin ligase e3a
pool75_contig_704 tumor susceptibility
pool75_contig_1977 serine threonine-protein phosphatase 4 catalytic subunit
pool75_contig_3064 bardet-biedl syndrome 2 protein P
pool75_contig_2499 succinyl- ligase
goat300.fasta
goat300.fasta
>pool75_contig_704
CCCTTTCTCCCTTCCCAACATTCAGAGATACTGAATCGAAACTCTTACTGTCTGTTAGAT
GACAAAGAGTTATCCATCCTACATACTCCAATTTCCTTCCGCAACTTGTGATTTCGCCGC
TTGAATCTTGACGCCGTGCGTCCACAGTTTGTTGTGTTTTATCAATCAAGGTCATTATCA
ACCGAAGACGCTATCTATTTTCTTGGCGAAGCTCTCGGAAAGGAGCCATCGAAATGGAAG
TATTTCTCAAGAAAGTCCGCGAGTTATCCCGGAAGCAGTTC
>pool75_contig_389
GACCTATACCGGACCGTCACTGAAAGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
ACGATCCAGGCATGGAGTTGTGGTGACGAGTAGGAGGGTCACCGTGGTGAGCGGGAAGCC
TCGGGCGTGAGCCTGGGTGGAGCCGCCACGGGTGCAGATCTTGGTGGTAGTAGCAAATAT
TCAAGTGAGAACCTTGAAGGCCGAGGTGGAGAAGGNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTCATTTGTAT
CGCCCGGAAAACGTCACAAGAACGGGAGTTGCGTACAGAA
>pool75_contig_1977
AAGGGACACCGTTGGGTGAGGCGAGCTGCGTTCCTCGAACCATGGCTTCAAAAAGCGACT
TAGACCGTCAGATTGAACAGCTCAGGGCCTGCAAGCTCATTACAGAGGATGAGGTTAAGG
CACTCTGCGCTAAGGCGCGTGAGATTTTAATTGAAGAGAGTAATGTCCAGTGCGTGGACT
CACCTGTCACGGTTTGTGGCGATATCCACGGCCAGTTTTACGACTTGATTGAACTGTTTA
AAGTGGGCGGAGATGTTC
>pool75_contig_3064
TTACTATTTCTGGGCCTTAAGACTGGCTTAGTCGCTTACGACCCTTATAACAATGTAGAT
GTATATTATAAGGATCTTCCTGATGGTGCTAACGCTATGTTAATTTATTCAAACTCACCG
ACAAAGGAACAGAATATGCTTTGGCAGGTGGAAACTGTTCGATAATTGGATTGAACGACG
GCGGATGCGAGGTATTTTGGACAGTCACTGGCGACTCCGTTTGCTCTCTTTGCTCGATTA
AATCCGACAGCGATAAGTCAAGAGATTTTGTGGTTGGCTCTGAAGATTTTGACATCCGAA
TCTTCCATGGGGATGCCATAATATATGAAATCACGGAGTCTGATG
>pool75_contig_2499
AAGAGAAGAGGTGAGTTTGAGTATTGTTTGTGTGTGTGTGGTTGGGTGAGTGTGTGGTAT
GTGGTGTATGTGTGTGATGAATGTATGTGAAAGAGAGTGATGAATCTCATGGATATGTTC
GAGTTCGTGGTTTCCATTGATCGGTTATAGCCGAGATGATGGATGTGTTCCATGTGTCTG
ATTTCAGTTTAGGATTGTGTTGATGATGTTGATGATGAAAATTGTTGATGGTGATGACGA
TAGTGATGATGATGACGATGTTTCGGATAATGGTGATGATGATGATGGTTCCGACGATGA
TGTTTCGCTTGATGATGGTGATAATGATGACTCCGAAAATAACGTTGACTCGGATGAG
Consider using Bio::SeqIO to parse your Fasta dataset, instead of doing it yourself. Bio::SeqIO lives for this task, and is well developed for it. Additionally, if you're in bioinformatics, it would serve you well to get to know Bio::SeqIO. Given this, consider the following:
use strict;
use warnings;
use Bio::SeqIO;
open my $fh, '<', 'annot.txt' or die $!;
my %annot = map { /(\S+)\s+(.+)/; $1 => $2 } <$fh>;
close $fh;
my $in = Bio::SeqIO->new( -file => 'goat300.fasta', -format => 'Fasta' );
while ( my $seq = $in->next_seq() ) {
my $seqID = $annot{ $seq->id } // $seq->id;
print "$seqID\n" . $seq->seq . "\n";
}
Output on your datasets:
tumor susceptibility
CCCTTTCTCCCTTCCCAACATTCAGAGATACTGAATCGAAACTCTTACTGTCTGTTAGATGACAAAGAGTTATCCATCCTACATACTCCAATTTCCTTCCGCAACTTGTGATTTCGCCGCTTGAATCTTGACGCCGTGCGTCCACAGTTTGTTGTGTTTTATCAATCAAGGTCATTATCAACCGAAGACGCTATCTATTTTCTTGGCGAAGCTCTCGGAAAGGAGCCATCGAAATGGAAGTATTTCTCAAGAAAGTCCGCGAGTTATCCCGGAAGCAGTTC
ubiquitin ligase e3a
GACCTATACCGGACCGTCACTGAAAGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNACGATCCAGGCATGGAGTTGTGGTGACGAGTAGGAGGGTCACCGTGGTGAGCGGGAAGCCTCGGGCGTGAGCCTGGGTGGAGCCGCCACGGGTGCAGATCTTGGTGGTAGTAGCAAATATTCAAGTGAGAACCTTGAAGGCCGAGGTGGAGAAGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTCATTTGTATCGCCCGGAAAACGTCACAAGAACGGGAGTTGCGTACAGAA
serine threonine-protein phosphatase 4 catalytic subunit
AAGGGACACCGTTGGGTGAGGCGAGCTGCGTTCCTCGAACCATGGCTTCAAAAAGCGACTTAGACCGTCAGATTGAACAGCTCAGGGCCTGCAAGCTCATTACAGAGGATGAGGTTAAGGCACTCTGCGCTAAGGCGCGTGAGATTTTAATTGAAGAGAGTAATGTCCAGTGCGTGGACTCACCTGTCACGGTTTGTGGCGATATCCACGGCCAGTTTTACGACTTGATTGAACTGTTTAAAGTGGGCGGAGATGTTC
bardet-biedl syndrome 2 protein P
TTACTATTTCTGGGCCTTAAGACTGGCTTAGTCGCTTACGACCCTTATAACAATGTAGATGTATATTATAAGGATCTTCCTGATGGTGCTAACGCTATGTTAATTTATTCAAACTCACCGACAAAGGAACAGAATATGCTTTGGCAGGTGGAAACTGTTCGATAATTGGATTGAACGACGGCGGATGCGAGGTATTTTGGACAGTCACTGGCGACTCCGTTTGCTCTCTTTGCTCGATTAAATCCGACAGCGATAAGTCAAGAGATTTTGTGGTTGGCTCTGAAGATTTTGACATCCGAATCTTCCATGGGGATGCCATAATATATGAAATCACGGAGTCTGATG
succinyl- ligase
AAGAGAAGAGGTGAGTTTGAGTATTGTTTGTGTGTGTGTGGTTGGGTGAGTGTGTGGTATGTGGTGTATGTGTGTGATGAATGTATGTGAAAGAGAGTGATGAATCTCATGGATATGTTCGAGTTCGTGGTTTCCATTGATCGGTTATAGCCGAGATGATGGATGTGTTCCATGTGTCTGATTTCAGTTTAGGATTGTGTTGATGATGTTGATGATGAAAATTGTTGATGGTGATGACGATAGTGATGATGATGACGATGTTTCGGATAATGGTGATGATGATGATGGTTCCGACGATGATGTTTCGCTTGATGATGGTGATAATGATGACTCCGAAAATAACGTTGACTCGGATGAG
The hash %annot is initialized by reading and capturing the contents of your annot.txt data. A Bio::SeqIO object is created using your goat300.fasta file data. The while loop iterates through your fasta sequences. The variable $seqID either takes the associated value of the key in the %annot hash or it keeps the current sequence ID (the // notation means defined or, so that insures $seqID will be defined). Finally, the Fasta record is printed.
Hope this helps!
There were a lot of warnings in your code, and your approach was inefficient. Let me first show you a working Perl program. I'll explain afterwards.
#!/usr/bin/perl
use strict;
use warnings;
# Read the annotations file
print"Enter annotated file...\n";
# my $f1 = <STDIN>;
my $f1 = 'annot.txt';
open(my $fh_annotations, '<', $f1) or die "Can't open $f1";
my #annotfile = <$fh_annotations>;
close $fh_annotations;
# Read the sequence file
print"Enter sequence file...\n";
# my $f2 = <STDIN>;
my $f2 = 'goat300.fasta';
open(my $fh_genes, '<', $f2) or die "Can't open $f2";
my #seqfile = <$fh_genes>;
close $fh_genes;
# Process the annotations data
my %names; # this hash is going to hold the names
foreach my $line (#annotfile) {
chomp $line; # remove newline
my #fields = split /\t/, $line; # split into array
$names{$fields[0]} = $fields[1]; # save in the hash as key->value pair
}
# Process the sequence data
foreach my $line (#seqfile) {
# Look at each line
if ($line =~ m/>(.+)$/) {
# If there is a heading there, remember it...
if (exists $names{$1}) {
# ... check if we know a name for it and replace it in the line
$line =~ s/($1)/$names{$1}/;
}
}
# output the line (this would be done to another filehandle)
print $line;
}
This reads both files and saves them in memory, just like yours did. But instead of trying to build two arrays for the names, I went with a hash, which is a key/value pair. Think of it like an array with names instead of numbers and no particular sorting.
Once these names are set up, I can process the sequence file. I simply look at each line and check if there is a heading there, by looking for the > sign. If it's there (it goes into $1 because of the parenthesis), I look if we have a hash entry (with exists) in our %names hash. If we do, we can replace the heading with the proper name.
After that, we could write it out to a new file. I'm just printing it.
I've used a few other techniques. Unfortunately the literature people get in a BioPerl context is quite outdated. Please take this advice, it will make your live easier.
Always use strict and warnings. They will tell you about problems with your code.
Always declare your variables with my. This is not like other languages, where you need to set up a variable at the top of your problem. You can declare it where you need it. The vars only live in a certain scope, which means between the nearest enclosing { and } brackets, or block.
Use three-argument open and lexical file handles for security. Read more here.
Perl offers foreach as an alternative to the C for loop. In this case, it made things a lot easier.
One more thing about this program: While this example data was rather short, I believe your actual data might be a lot larger. Consider processing the sequence file while you read it so you do not run out of memory. There's no need to save all the lines, unless you want to do something else with them.
open my $fh_out, '>', $filename_out or die $!;
open my $fh_in, '<', $filename_in or die $!;
while (my $line = <$fh_in>) {
# do stuff with the line, like your regex
print $fh_out $line;
}
close $fh_in;
close $fh_out;

Perl program skipping first line in csv file

I am trying to understand an error with a PERL program. I have a comma-separated file, and I want to extract the contents of each row to a separate text file, using the contents of the first field in each row as the file name.
The program below does exactly this EXCEPT it skips the first line of the csv file. I tried to nail down the source of the error by adding a couple of print commands. The print command on line 22 shows that the first line is read by the command in line 21. But, once the foreach loop starts, the first line is not printed.
I'm not quite sure of the problem. I appreciate any help!
#!/usr/bin/perl
# script that takes a .csv file (such as that exported from Excel) and
# extracts the contents of each row into a separate text file, using the first column as the filename
# original source: http://www.tek-tips.com/viewthread.cfm?qid=1516940
# modified 3/14/12
# usage = ./export_rows.pl <yourfilename>.csv
use warnings;
use strict;
use Text::CSV_XS;
use Tie::Handle::CSV;
unless(#ARGV) {
print "Please supply a .csv file at the command line! For example, export_rows.pl myfile.csv\n";
exit;
}
my $fh = Tie::Handle::CSV->new(file => $ARGV[0],
header => 0);
my #headers = #{scalar <$fh>};
print "$headers[0]\n\n";
foreach my $csv_line (<$fh>) {
print "$csv_line->[0]\n";
open OUT, "> $csv_line->[0].txt" or die "Could not open file $csv_line->[0].txt for output.\n$!";
for my $i (1..$#headers) {
print OUT "$csv_line->[$i]\n";
}
close OUT;
}
close $fh;
Try beginning at 0 in your for loop:
for my $i (1..$#headers)
Should be:
for my $i (0..$#headers)
EDIT:
To get the first line of the file you can use Tie::File
Here is sample code:
my #arr;
tie #arr, 'Tie::File', 'a.txt' or die $!;
my $first = $arr[0];
untie #arr;
print "$first\n";
This module is cool in that it allow you to access lines of a file via accessing indices in an array. If your file is big it is not incredibly efficient, but I think you can definitely use it here.

How to remove one line from a file using Perl?

I'm trying to remove one line from a text file. Instead, what I have wipes out the entire file. Can someone point out the error?
removeReservation("john");
sub removeTime() {
my $name = shift;
open( FILE, "<times.txt" );
#LINES = <FILE>;
close(FILE);
open( FILE, ">times.txt" );
foreach $LINE (#LINES) {
print NEWLIST $LINE unless ( $LINE =~ m/$name/ );
}
close(FILE);
print("Reservation successfully removed.<br/>");
}
Sample times.txt file:
04/15/2012&08:00:00&bob
04/15/2012&08:00:00&john
perl -ni -e 'print unless /whatever/' filename
Oalder's answer is correct, but he should have tested whether the open statements succeeded or not. If the file times.txt doesn't exist, your program would continue on its merry way without a word of warning that something terrible has happened.
Same program as oalders' but:
Testing the results of the open.
Using the three part open statement which is more goof proof. If your file name begins with > or |, your program will fail with the old two part syntax.
Not using global file handles -- especially in subroutines. File handles are normally global in scope. Imagine if I had a file handle named FILE in my main program, and I was reading it, I called this subroutine. That would cause problems. Use locally scoped file handle names.
Variable names should be in lowercase. Constants are all uppercase. It's just a standard that developed over time. Not following it can cause confusion.
Since oalders put the program in a subroutine, you should pass the name of your file in the subroutine as well...
Here's the program:
#!/usr/bin/env perl
use strict;
use warnings;
removeTime( "john", "times.txt" );
sub removeTime {
my $name = shift;
my $time_file = shift;
if (not defined $time_file) {
#Make sure that the $time_file was passed in too.
die qq(Name of Time file not passed to subroutine "removeTime"\n);
}
# Read file into an array for processing
open( my $read_fh, "<", $time_file )
or die qq(Can't open file "$time_file" for reading: $!\n);
my #file_lines = <$read_fh>;
close( $read_fh );
# Rewrite file with the line removed
open( my $write_fh, ">", $time_file )
or die qq(Can't open file "$time_file" for writing: $!\n);
foreach my $line ( #file_lines ) {
print {$write_fh} $line unless ( $line =~ /$name/ );
}
close( $write_fh );
print( "Reservation successfully removed.<br/>" );
}
It looks like you're printing to a filehandle which you have not yet defined. At least you haven't defined it in your sample code. If you enable strict and warnings, you'll get the following message:
Name "main::NEWLIST" used only once: possible typo at remove.pl line 16.
print NEWLIST $LINE unless ($LINE =~ m/$name/);
This code should work for you:
#!/usr/bin/env perl
use strict;
use warnings;
removeTime( "john" );
sub removeTime {
my $name = shift;
open( FILE, "<times.txt" );
my #LINES = <FILE>;
close( FILE );
open( FILE, ">times.txt" );
foreach my $LINE ( #LINES ) {
print FILE $LINE unless ( $LINE =~ m/$name/ );
}
close( FILE );
print( "Reservation successfully removed.<br/>" );
}
A couple of other things to note:
1) Your sample code calls removeReservation() when you mean removeTime()
2) You don't require the round brackets in your subroutine definition unless your intention is to use prototypes. See my example above.
This is in the FAQ.
How do I change, delete, or insert a line in a file, or append to the beginning of a file?
It's always worth checking the FAQ.
Just in case someone wants to remove all lines from a file.
For example, a file (4th line is empty; 5th line has 3 spaces):
t e st1
test2 a
e
aa
bb bb
test3a
cc
To remove lines which match a pattern some might use:
# Remove all lines with a character 'a'
perl -pi -e 's/.*a.*//' fileTest && sed -i '/^$/d' fileTest;
The result:
t e st1
e
bb bb
cc
Related:
perl -h
# -p assume loop like -n but print line also, like sed
# -i[extension] edit <> files in place (makes backup if extension supplied)
# -e program one line of program (several -e's allowed, omit programfile)
sed -h
# -i[SUFFIX], --in-place[=SUFFIX]
# edit files in place (makes backup if SUFFIX supplied)
Reference 1, Reference 2