Related
I have a perl code that extract from .xls file. My .xls file is as below
NUMBER NAME ALPHABET
one Jane a
two Adam b
three Josh c
;four
five Agnes e
six Mary f
;seven
eight Lara h
I want to extract the info and only take column 1 and 2. My perl code is as below.
#!/usr/bin/perl
use warnings;
use strict;
use Spreadsheet::ParseExcel;
main ();
sub main {
my $filename = 'Book1.xls';
my $parser = Spreadsheet::ParseExcel->new();
my $workbook = $parser->parse( $filename );
if ( !defined $workbook ) {
die "-E-: cannot parse <$filename>.\n ", $parser->error(), ".\n";
}
my $worksheet = $workbook -> Worksheet ( 'a' ) || die "-E-: cannot parse family pin list.\n";
my ( $row_min, $row_max ) = $worksheet-> row_range();
open ( my $file,"> output.txt");
for my $row ( 1 .. $row_max ) {
my #data;
for my $col ( 0 ) {
my $number = $worksheet-> get_cell( $row, $col );
if ( $number ) {
push #data, $number-> value();
}
else {
push #data, '';
}
}
for my $col ( 2 ) {
my $alphabet = $worksheet->get_cell( $row, $col );
if ( $alphabet ) {
push #data, $alphabet->value();
print $file "#data\n";
}
else {
push #data, '';
}
}
}
close $file;
print "done\n";
}
The result is
one a
two b
three c
;four
five e
six f
;seven
eight h
I want to remove the entire array that start with string ";". I extend my code like below
open ( my $file,"> output.txt");
for my $row ( 1 .. $row_max ) {
my #data;
for my $col ( 0 ) {
my $number = $worksheet-> get_cell( $row, $col );
if ( $number ) {
push #data, $number-> value();
}
else {
push #data, '';
}
}
for my $col ( 11 ) {
my $alphabet = $worksheet->get_cell( $row, $col );
if ( $alphabet ) {
push #data, $alphabet->value();
}
else {
push #data, '';
}
}
my #new_data = grep(!/;/, #data);
my #latest_data = grep ( $_ ne '', #new_data);
print $file "#latest_data\n";
}
close $file;
print "done\n";
}
The output result produce like below.
one a
two b
three c
five e
six f
eight h
I don't want to be empty space. How i want to eliminate the empty space that produce result as below?
one a
two b
three c
five e
six f
eight h
I also try doing like this, but the result is same.
for my $index (reverse 0..$#data) {
if ( $data[$index] =~ /^;/ ) {
splice(#data, $index, 1);
}
}
print $file "#data\n";
I hope you are looking for next
use warnings;
use strict;
#data=("one a", "two b","three c", ";four d","five e","six e","seven g","eight h")
for(#data){
next if (/^;/);
print $_,"\n";
}
let's get your data gist
#data=("one a", "two b","three c", ";four d","five e","six e","seven g","eight h")
for(#data){
s/^;.*//; # // <-- Put Replacer here between `//` , leave it to remove line
push(#new_data,$_)}
print "$_\n" for #new_data
I am new to Perl. I have excel file say "sample.xls" which looks like follows.
Sample.xls
There are about data of 1000 rows like this. I want to parse this file and write it in another file say "output.xls" with following output format.
output.xls
I have written a script in perl, however, it doesn't give me the exact output the way I want. Also, looks like the script is not very efficient. Can anyone guide me how I can improve my script as well as have my output as shown in "output.xls" ??
Here's the Script:
#!/usr/bin/perl –w
use strict;
use warnings;
use Spreadsheet::ParseExcel;
use Spreadsheet::WriteExcel;
use Spreadsheet::WriteExcel::Chart;
# Read the input and output filenames.
my $inputfile = "path/sample.xls";
my $outputfile = "path/output.xls";
if ( !$inputfile || !$outputfile ) {
die( "Couldn't find file\n" );
}
my $parser = Spreadsheet::ParseExcel->new();
my $inwb = $parser->parse( $inputfile );
if ( !defined $inwb ) {
die "Parsing error: ", $parser->error(), ".\n";
}
my $outwb = Spreadsheet::WriteExcel->new( $outputfile );
my $inws = $inwb->worksheet( "Sheet1" );
my $outws = $outwb->add_worksheet("Sheet1");
my $out_row = 0;
my ( $row_min, $row_max ) = $inws->row_range();
my ( $col_min, $col_max ) = $inws->col_range();
my $format = $outwb->add_format(
center_across => 1,
bold => 1,
size => 10,
border => 4,
color => 'black',
border_color => 'black',
align => 'vcenter',
);
$outws->write(0,0, "Item Name", $format);
$outws->write(0,1, "Spec", $format);
$outws->write(0,2, "First name", $format);
$outws->write(0,3, "Middle Name", $format);
$outws->write(0,4, "Last Name", $format);
$outws->write(0,5, "Customer Number", $format);
$outws->write(0,6, "Age", $format);
$outws->write(0,7, "Units", $format);
my $col_count = 1;
#$row_min = 1;
for my $inws ( $inwb->worksheets() ) {
my ( $row_min, $row_max ) = $inws->row_range();
my ( $col_min, $col_max ) = $inws->col_range();
for my $in_row ( 2 .. $row_max ) {
for my $col ( 0 .. 0 ) {
my $cell = $inws->get_cell( $in_row, $col);
my #fields = split /_/, $cell->value();
next unless $cell;
$outws->write($in_row,$col, $cell->value());
$outws->write($in_row,$col+1, $fields[1]);
}
}
for my $in_row ( 2 .. $row_max ) {
for my $col ( 1 .. 1 ) {
my $cell = $inws->get_cell( $in_row, $col);
my #fields = split /_/, $cell->value();
next unless $cell;
#$outws->write($in_row,$col+1, $cell->value());
$outws->write($in_row,$col+1, $fields[0]);
$outws->write($in_row,$col+2, $fields[1]);
$outws->write($in_row,$col+3, $fields[2]);
$outws->write($in_row,$col+4, $fields[3]);
}
}
for my $in_row ( 2 .. $row_max ) {
for my $col ( 2 .. 2 ) {
my $cell = $inws->get_cell( $in_row, $col);
my #fields = split /_/, $cell->value();
next unless $cell;
$outws->write($in_row,6, $cell->value());
}
}
for my $in_row ( 2 .. $row_max ) {
for my $col ( 3 .. 9 ) {
my $cell = $inws->get_cell( $in_row, $col);
next unless $cell;
}
}
for my $in_row ( 2 .. $row_max ) {
for my $col ( 10 .. 10 ) {
my $cell = $inws->get_cell( $in_row, $col );
next unless $cell;
$outws->write($in_row,7, $cell->value());
}
}
}
To get your output sorted, you need to collect all the information first before you are writing it out. Right now, you are doing a bit of jumping back and forth between rows and columns.
Here are some changes I would make to get it sorted, and make it more efficient (to read).
Create a data structure $data outside of your loop to store all the information.
If there is only one worksheet, you don't need to loop over sheets. Just work with one sheet.
Loop over the lines.
Inside that loop, use the code you have to parse the individual fields to just parse them. No 2..2 loops. Just a bunch of statements.
my #item_fields = split /_/, $inws->get_cell( $in_row, 0 ) || q{};
my #name_fields = split /_/, $inws->get_cell( $in_row, $col ) || q{};
Store them in $data per item.
push #{ $data } = [ $item_fields[0], ... ];
Done with the loop. Open the output file.
Loop over $data with a sort and write to the output file.
foreach my $row (sort { $a->[0] cmp $b->[0] } #{ $data } ) { ... }
Done.
I suggest you read up on sort and also check out perlref and perlreftut to learn more about references (data structures).
I'm trying to run a program to check each line of one file against each line of a second file to see if some of the elements match. Each file is around 200k lines.
What I've got so far looks like this;
#!/usr/bin/perl
#gffgenefind.pl
use strict;
use warnings;
die "SNP gff\n" unless #ARGV == 4;
open( my $snp, "<", $ARGV[0] ) or die "Can't open $:";
open( my $gff, "<", $ARGV[1] ) or die "can't open $:";
open( my $outg, ">", $ARGV[2] );
open( my $outs, ">", $ARGV[3] );
my $scaffold;
my $site;
my #snplines = <$snp>;
my #gfflines = <$gff>;
foreach my $snpline (#snplines) {
my #arr = split( /\t/, $snpline );
$scaffold = $arr[0];
$site = $arr[1];
foreach my $line (#gfflines) {
my #arr1 = split( /\t/, $line );
if ( $arr1[3] <= $site and $site <= $arr1[4] and $arr1[0] eq $scaffold ) {
print $outg "$line";
print $outs "$snpline";
}
}
}
File 1 (snp) looks like this scaffold_100 10689 A C A 0 0 0 0 0 0
File 2 (gff) looks like this scaffold_1 phytozomev10 gene 750912 765975 . - . ID=Carubv10008059m.g.v1.0;Name=Carubv10008059m.g
Essentially, I'm looking to see if the first values match and if the second value from snp is within the range defined on the second file (in this case 750912 to 765975)
I've seen that nested loops are to be avoided, and was wondering if there's an alternative way for me to look through this data.
Thanks!
Firstly - lose the foreach loop. That reads your whole file into memory, when you probably don't need to.
Try instead:
while ( my $snpline = <$snp> ) {
because it reads line by line.
Generally - mixing array indicies and named variables is also bad style.
The core problem will most likely be though because each line of your first file, you're cycling all of the second file.
Edit: Note - because 'scaffold' isn't unique, amended accordingly
This seems like a good place to use a hash. E.g.
my %sites;
while ( <$snp> ) {
my ( $scaffold, $site ) = split ( /\t/ );
$sites{$site}{$scaffold}++
}
while ( <$gff> ) {
my ( $name, $tmp1, $tmp2, $range_start, $range_end ) = split ( /\t/ );
if ( $sites{$name} ) {
foreach my $scaffold ( keys %{ $sites{$name} ) {
if ( $scaffold > $range_start
and $scaffold < $range_end ) {
#do stuff with it;
print;
}
}
}
}
Hopefully you get the gist, even if it isn't specifically what you're after?
Try this Python snippet:
#!/usr/bin/env python
import sys
import contextlib
if len(sys.argv) !=5:
raise Exception('SNP gff')
snp, gff, outg, outs = sys.argv[1:]
gff_dict = {}
with open(gff) as gff_handler:
for line in gff_handler:
fields=line.split()
try:
gff_dict[fields[0]].append(fields[1:])
except KeyError:
gff_dict[fields[0]] = [fields[1:]]
with contextlib.nested(open(snp),
open(outs, 'w'),
open(outg, 'w')) as (snp_handler,
outs_handler,
outg_handler):
for line_snp in snp_handler:
fields=line_snp.split()
key = fields[0]
if key in gff_dict:
for ele in gff_dict[key]:
if ele[2] <= fields[1] <= ele[3]:
outs_handler.write(line_snp)
outg_handler.write("{0}\t{1}\n".format(key,"\t".join(ele)))
I have two files myresult and annotation. details of these files are as follows.
myresult:
288..639 1.13075739182609-6.20035408429888i
300..651 1.90372125344918-6.09008858828515i
312..663 1.6908117147722-5.67058877579329i
324..675 0.644484787809351-5.54571698740166i
336..687 1.21850904281332-5.47700589647424i
annotation:
272..1042
1649..2629
For loop is running only once after satisfying the If condition. It is not entering again in loop even after the condition is satisfied. for example, in file myresult 2nd line satisfies the condition i.e numbers 300..651 lies in the range of 1st line of file annotation, therefore it prints all values from 300 to 651. But when it need to go for next round(3rd line) i.e. 312..663 it is not entering this loop since this range also lies in 1st line of file annotation.
so the output should be numbers from 300 to 663, but it is printing from 300 to 651 only.
Code:
#!/usr/bin/perl
use Math::Complex;
open( $inp0, "<myresult" ) or die "not found";
open( $inp2, "<annotation" ) or die "not found";
my #arr2 = <$inp0>;
my #arr4 = <$inp2>;
my #result;
foreach my $line1 (#arr2) {
my ( $col1, $col2 ) = split( /\s/, $line1 );
if ( $col2 > 1.60 ) {
my ( $from1, $to1 ) = split( /\.\./, $col1 );
foreach my $line2 (#arr4) {
my ( $from2, $to2 ) = split( /\.\./, $line2 );
for ( my $i = $from1; $i <= $to1; $i++ ) {
for ( my $j = $from2; $j <= $to2; $j++ ) {
$res = grep( /$i/, #result );
if ( $i == $j && $res == 0 ) {
print "$i \n";
push( #result, $i );
}
}
}
}
}
}
Second columns of the file "myresult" contains complex numbers like "1.13075739182609-6.20035408429888i" for example.
Two complex numbers can not be compared. (Complex number is a vector of complex plane with real axis and imaginary axis. Two vectors could not be compared like integer.)
'($col2>1.60)' will be false for all of your data in the file "myresult". This is why for loop is not executed.
As Fumu said two complex numbers can not be compared.
If you need help with complex numbers in Perl then check out Math::Complex module.
So I have an array like this:
W,X,Y,Z
-7,6,101,15
-7,6,101,16
-7,6,101,17
-7,6,101,78
-7,6,101,79
-7,6,101,84
-7,6,101,92
-7,9,615,49
-7,9,615,50
-7,10,759,38
-7,10,759,39
Now, I want to print a line when W=X=Y and Z are consecutive numbers.
Expected Output:
W,X,Y,Z
-7,6,101,15-16-17
-7,6,101,78-79
-7,6,101,84
-7,6,101,92
-7,9,615,49-50
-7,10,759,38-39
How do I implement this on Perl?
Thanks,
Art
Here is my script:
while ( $output_line = <FILE_C> ) {
chomp $output_line;
my ( $W, $X, $Y, $C, $D, $E, $F, $Z ) = ( split /\s/, $output_line );
if ( $Y == $Block_previous ) {
print("Yes\t$Block_previous\t$Y\t$Z\n");
push( #Z_array, $Z );
push( #Y_array, $Y );
next;
}
else {
push( #Z_array_nonblkmatch, $Z );
}
foreach $Z_printer (#Z_array) {
print("$Y_array[0]\t$Z_printer\n");
if ( ( $Z_delta == 1 ) || ( $Z_delta == -1 ) ) {
push( #Z_adj, $Z_printer, $Z_printer_prev );
#~ print ("pair: $Z_printer_prev-$Z_printer\n");
}
else {
#~ print ("$Z_printer\n");
}
$Z_printer_prev = $Z_printer;
}
#Z_adj = ();
#Z_array = ();
#Y_array = ();
#Z_array_nonblkmatch = ();
$Block_previous = $Y;
#~ <STDIN>;
}
close(FILE_C);
Thanks, raina77ow! However, this is what the output look like:
-7,6,101,15-16-17-79
-7,6,101,16-17-79
-7,6,101,17-79
-7,6,101,78-79
-7,6,101,79-50
-7,6,101,84-50
-7,6,101,92
-7,6,615,49-50-39
-7,6,615,50
One possible approach (ideone demo):
use warnings;
use strict;
my $prev;
while (<DATA>) {
chomp;
next unless /\S/;
my #numbers = split /,/;
if (defined $prev && $numbers[3] == $prev + 1) {
print '-' . ++$prev;
next;
}
print "\n" if defined $prev;
print join ',', #numbers;
$prev = $numbers[3];
}
__DATA__
-7,6,101,15
-7,6,101,16
-7,6,101,17
-7,6,101,78
-7,6,101,79
-7,6,101,84
-7,6,101,92
-7,9,615,49
-7,9,615,50
-7,10,759,38
-7,10,759,39
I choose not to collect this data into intermediate array, as you did, as the question was simple: print it grouped. The key is storing the value of the last (Z) column, then checking each new line against it: if it matches, you print just the incremented value (that's what print '-' . ++$prev line for), if not, you end this line (for all but the first case) and start a new one with the numbers of this line.