Extract zip Files on cmd with progress indicator - perl

I am looking for a program that can extract zip archives from the Windows command line and display a progress bar or a percentage indicator in the cmd window. I want to call it from within a Perl script to give the user a hint of how long the extraction will take. I tried 7-Zip (http://www.7-zip.org/) and UnZip (from Info-ZIP) so far, but was not able to produce the behaviour described above. Does somebody know how to solve this?
Update:
Currently I'm trying this approach:
#!/usr/bin/perl
use strict; $|++;
use warnings;
use Archive::Zip;
my $zip = Archive::Zip->new('file.zip');
my $total_bytes = 0;
my $bytes_already_unzipped = 0;
foreach my $member ($zip->members()) {
$total_bytes += $member->uncompressedSize();
}
foreach my $member ($zip->members()) {
$zip->extractMember($member);
$bytes_already_unzipped += $member->uncompressedSize();
print progress_bar($bytes_already_unzipped, $total_bytes, 25, '=' );
}
#routine by tachyon at http://tachyon.perlmonk.org/
#also have a look at http://oreilly.com/pub/h/943
sub progress_bar {
my ( $got, $total, $width, $char ) = @_;
$width ||= 25; $char ||= '=';
my $num_width = length $total;
sprintf "|%-${width}s| Got %${num_width}s bytes of %s (%.2f%%)\r",
$char x (($width-1)*$got/$total). '>',
$got, $total, 100*$got/+$total;
}
However, I have two problems:
This approach seems to be very slow.
The progress bar is not updated periodically, only when a file has finished being extracted. As I have some large files, the system seems unresponsive while extracting them.

Do the extraction from within your program instead of delegating to a different one. Use Archive::Zip and Term::ProgressBar. Extract files one by one. Update the progress after each.
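A minimal sketch of that approach, assuming Archive::Zip and Term::ProgressBar are installed from CPAN. The sub name extract_with_progress and its destination-directory argument are illustrative and not taken from the question. Progress is counted in uncompressed bytes and is still updated only after each member, so a single very large file will not move the bar while it is being written.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Archive::Zip qw(:ERROR_CODES);
use File::Spec;
use Term::ProgressBar;

# Hypothetical helper: extract every member of $zip_path into $dest_dir,
# advancing a byte-based progress bar after each member.
sub extract_with_progress {
    my ($zip_path, $dest_dir) = @_;

    my $zip = Archive::Zip->new();
    $zip->read($zip_path) == AZ_OK
        or die "Cannot read $zip_path\n";

    my @members = $zip->members();
    my $total   = 0;
    $total += $_->uncompressedSize() for @members;

    # Guard against a zero total (empty archive or only empty files).
    my $progress = Term::ProgressBar->new({
        name  => 'Extracting',
        count => $total || 1,
    });

    my $done = 0;
    for my $member (@members) {
        my $target = File::Spec->catfile($dest_dir, $member->fileName());
        $zip->extractMember($member, $target) == AZ_OK
            or die "Failed to extract ", $member->fileName(), "\n";
        $done += $member->uncompressedSize();
        $progress->update($done);
    }
    $progress->update($total || 1);
    return scalar @members;
}

# Run only when both arguments are given on the command line.
extract_with_progress(@ARGV) if @ARGV == 2;
```

It could be run as: perl extract.pl file.zip target_dir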

Related

Expand multiple for loop embedded inside Perl code

my $limit = 10;
my $new_limit = 20;
$command = '
some_plain_lines_here
foreach my $i (0..$limit-1) { print"
some loop_lines_here with $i
";}
some_more_plain_lines_here
foreach my $j (0..$new_limit-1) {print"
some_more_loop_lines_here with $j
";}
some_more_nonloop_lines_here;
';
#TODO: How to scoop out individual for-loops from command and expand and put replace the result back.
my $out_line = eval $command;
I have a piece of code like above where I have a $command variable containing a mixture of text and foreach loops. The text comes from an XLS and this apparently weird format is to help users write content fast and automate rest in backend.
Now, if there's a single for loop, we can easily call eval and expand the content into an $out_line. However in this case we have to scoop out every foreach and then replace the expanded output in same location.
I am not sure how to split the content into an array at the foreach boundaries. If we get the array, we can call eval as required and expand/stitch back text.
It's essentially a string-to-array conversion problem, but I gave the complete picture for clarity.
Like I said in the comments, eval() is very dangerous and almost always unnecessary. You should basically never use it, except in very controlled and select situations, and certainly never when you do not control what is being eval'ed.
With something like this, you could do the same thing without eval, assuming that the user input is the limits. Note that I am cleaning the user input to make sure it is OK to use. You may even add error feedback for the user, but do not reveal how the de-tainting works.
use strict;
use warnings; # always use these
my $limit = shift;
my $new_limit = shift; # user input
$limit //= 10; # default
$new_limit //= 20;
$limit =~ s/[^0-9]+//g; # de-taint input
$new_limit =~ s/[^0-9]+//g;
sub command {
my ($limit, $new_limit) = @_;
# some_plain_lines_here <--- dont know what this is supposed to be
for my $i (0 .. $limit-1) {
print "some loop_lines_here with $i";
}
# some_more_plain_lines_here
for my $j (0 .. $new_limit-1) {
print "some_more_loop_lines_here with $j";
}
# some_more_nonloop_lines_here;
}
#TODO: How to scoop out individual for-loops from command and expand and put replace the result back.
# ^-- no idea what this means
command($limit, $new_limit); # execute the command
It is impossible to say what you could put in a subroutine like this without knowing more about what you are looking for. But you sure do not need eval() to perform a for loop and print.

print doesn't work while iterations are going inside foreach loop

I am trying to find a way to print a progress bar on the command line while parsing log files. Get logfiles => foreach file => foreach line {do}.
The idea: I want to print a part of the progress bar in every "foreach file" loop. Meaning: print the whole bar if you parse just 1 file, print half of the bar for each file when you parse 2 files, and so on. You will find the specific code at the bottom.
The problem: The output (print "*") is printed after ALL foreach iterations are done, not in between. Details are in the code.
Does someone have an idea how to print inside a foreach? Or can you tell me what the problem is? I don't get it :(.
my @logfiles=glob($logpath);
print "<------------------>\n";
$vari=20/(scalar @logfiles);
foreach my $logfile (@logfiles){
open(LOGFILEhandle, $logfile);
@lines = <LOGFILEhandle>;
print "*" x $vari; #won't work, only after loop. Even a "print "*";" doesn't work
foreach my $line (@lines){
#print "*"; works "in between". print "*" x $vari; does not.
if ($line=~/xyz/){
......
......
}
close(LOGFILEhandle);
}
}
I would like to suggest Term::ProgressBar module to avoid reinventing the wheel.
#!/usr/bin/perl
use strict;
use warnings;
use Term::ProgressBar;
my @files = qw(file1 file2 file3 file4);
my $progress = Term::ProgressBar->new(scalar @files);
$| = 1;
for my $done (1 .. @files) {
sleep(1); # sleep for demo purposes only, otherwise the bar fills up instantly
# open the file, do some operations and when you are done
# update the bar with the number of files finished so far
$progress->update($done);
}
You are suffering from buffering. The output is buffered until a certain amount is reached or you print a newline. To change this behaviour simply add
$| = 1 ;
at the top of your file. This will turn on autoflush for STDOUT.
There is more than one way to do it; a slightly longer and less cryptic alternative is Borodin's suggestion:
STDOUT->autoflush();
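As a sketch of how this applies to the per-file bar in the question: enable autoflush once, then print each file's share of the bar as soon as that file is done. The helper name bar_segment, the width of 20, and taking the log file names from the command line are illustrative assumptions. Computing each segment from cumulative positions also avoids the truncation in "*" x $vari when 20 is not divisible by the number of files.

```perl
use strict;
use warnings;
use IO::Handle;    # provides STDOUT->autoflush on older perls

# Hypothetical helper: the stars to print after finishing file number
# $index (0-based) out of $total, so that all segments add up to $width.
sub bar_segment {
    my ($index, $total, $width) = @_;
    my $before = int($width * $index / $total);
    my $after  = int($width * ($index + 1) / $total);
    return '*' x ($after - $before);
}

STDOUT->autoflush(1);    # same effect as $| = 1 while STDOUT is selected

my @logfiles = @ARGV;    # assumption: log files are named on the command line
if (@logfiles) {
    print "<", "-" x 18, ">\n";
    for my $i (0 .. $#logfiles) {
        # ... open and parse $logfiles[$i] here ...
        print bar_segment($i, scalar @logfiles, 20);
    }
    print "\n";
}
```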

Remove duplicate lines on file by substring - preserve order (PERL)

I'm trying to write a Perl script to deal with some 3+ GB text files that are structured like this:
1212123x534534534534xx4545454x232322xx
0901001x876879878787xx0909918x212245xx
1212123x534534534534xx4545454x232323xx
1212133x534534534534xx4549454x232322xx
4352342xx23232xxx345545x45454x23232xxx
I want to perform two operations:
Count the number of delimiters per line and compare it to a fixed number (e.g. 5); lines that exceed that number should be written to a file.control.
Remove duplicates from the file by substr($line, 0, 7), i.e. the first 7 digits, while preserving order. That output should go to a file.output.
I have coded this as a simple shell script (just bash), but it took too long to process. The same script calling Perl one-liners was quicker, but I'm interested in a way to do this purely in Perl.
The code I have so far is:
open $file_hndl_ot_control, '>', $FILE_OT_CONTROL;
open $file_hndl_ot_out, '>', $FILE_OT_OUTPUT;
# INPUT.
open $file_hndl_in, '<', $FILE_IN;
while ($line_in = <$file_hndl_in>)
{
# Calculate n. of delimiters
my $delim_cur_line = $line_in =~ y/"$delimiter"//;
# print "$commas \n"
if ( $delim_cur_line != $delim_amnt_per_line )
{
print {$file_hndl_ot_control} "$line_in";
}
# Remove duplicates by substr(0,7) maintain order
my $substr_in = substr $line_in, 0, 7;
print {$file_hndl_ot_out} $line_in if not $lines{$substr_in}++;
}
And i want the file.output file to look like
1212123x534534534534xx4545454x232322xx
0901001x876879878787xx0909918x212245xx
1212133x534534534534xx4549454x232322xx
4352342xx23232xxx345545x45454x23232xxx
and the file.control file to look like :
(assuming delimiter control number is 6)
4352342xx23232xxx345545x45454x23232xxx
Could someone assist me? Thank you.
Edit: code I tried
my %seen;
my $delimiter = 'x';
my $delim_amnt_per_line = 5;
open(my $fh1, ">>", "outputcontrol.txt");
open(my $fh2, ">>", "outputoutput.txt");
while ( <> ) {
my $count = ($_ =~ y/x//);
print "$count \n";
# print $_;
if ( $count != $delim_amnt_per_line )
{
print fh1 $_;
}
my ($prefix) = substr $_, 0, 7;
next if $seen{$prefix}++;
print fh2;
}
I don't know if I'm supposed to post new code here, but I tried the above, based on your example. What baffles me (I'm still very new to Perl) is that it doesn't output to either filehandle, but when I redirected from the command line as you said, it worked perfectly. The problem is that I need to output into 2 different files.
It looks like entries with the same seven-character prefix may appear anywhere in the file, so it's necessary to use a hash to keep track of which ones have already been encountered. With a 3 GB text file this may result in your perl process running out of memory, in which case a different approach is necessary. Please give this a try and see if it comes in under the bar.
The tr/// operator (the same as y///) doesn't accept variables for its character list, so I've used eval to create a subroutine delimiters() that will count the number of occurrences of $delimiter in $_.
It's usually easiest to pass the input file as a parameter on the command line, and redirect the output as necessary. That way you can run your program on different files without editing the source, and that's how I've written this program. You should run it as
$ perl filter.pl my_input.file > my_output.file
use strict;
use warnings 'all';
my %seen;
my $delimiter = 'x';
my $delim_amnt_per_line = 5;
eval "sub delimiters { tr/$delimiter// }";
while ( <> ) {
next if delimiters() == $delim_amnt_per_line;
my ($prefix) = substr $_, 0, 7;
next if $seen{$prefix}++;
print;
}
output
1212123x534534534534xx4545454x232322xx
0901001x876879878787xx0909918x212245xx
1212133x534534534534xx4549454x232322xx
4352342xx23232xxx345545x45454x23232xxx
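The edited attempt in the question prints to the barewords fh1 and fh2 rather than to the lexical handles $fh1 and $fh2, which would explain why neither file receives output. A possible sketch that writes both files in one pass follows. The sub name filter_lines and its parameter order are illustrative, and it counts the delimiter with a quoted regex match in list context instead of tr/// with eval. Lines whose count differs from the expected number go to the control handle, as the != test in the question does.

```perl
use strict;
use warnings;

# Hypothetical helper: read lines from $in; write lines whose delimiter
# count differs from $expected to $control, and the first line seen for
# each 7-character prefix to $out.
sub filter_lines {
    my ($in, $out, $control, $delimiter, $expected) = @_;
    my %seen;
    while (my $line = <$in>) {
        # Count matches in list context; \Q...\E quotes the delimiter.
        my $count = () = $line =~ /\Q$delimiter\E/g;
        print {$control} $line if $count != $expected;

        my $prefix = substr $line, 0, 7;
        print {$out} $line unless $seen{$prefix}++;
    }
    return;
}

# Usage: perl filter.pl input.txt file.output file.control
if (@ARGV == 3) {
    my ($in_name, $out_name, $control_name) = @ARGV;
    open my $in,      '<', $in_name      or die "Can't read $in_name: $!";
    open my $out,     '>', $out_name     or die "Can't write $out_name: $!";
    open my $control, '>', $control_name or die "Can't write $control_name: $!";
    filter_lines($in, $out, $control, 'x', 6);
    close $_ or die "Close failed: $!" for $in, $out, $control;
}
```

The delimiter 'x' and the expected count of 6 are the values from the question's sample; in a real script they would probably come from configuration or further arguments.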

BioPerl with clustalw - outputting file

I have a Perl script to automate many multiple alignments. (I'm writing the script first for only one file and one multiple alignment, a big one though; I can then modify it for multiple files.) I want to output the resulting file, but I am unsure how to do this with AlignIO. So far I have:
use warnings;
use strict;
use Bio::AlignIO;
use Bio::SeqIO;
use Bio::Tools::Run::Alignment::Clustalw;
my $file = shift or die; # Get filename from command prompt.
my $factory = Bio::Tools::Run::Alignment::Clustalw->new(-matrix => 'BLOSUM');
my $ktuple = 3;
$factory->ktuple($ktuple);
my $inseq = Bio::SeqIO->new(
-file => "<$file",
-format => $format
);
my $seq;
my @seq_array;
while ($seq = $inseq->next_seq) {
push(@seq_array, $seq);
}
# Now we do the actual alignment.
my $seq_array_ref = \@seq_array;
my $aln = $factory->align($seq_array_ref);
Once the alignment is done I have $aln which is the alignment I want to get out of the process as a fasta file - I tried something like:
my $out = Bio::AlignIO->new(-file => ">outputalignmentfile",
-format => 'fasta');
while( my $outaln = $aln->next_aln() ){
$out->write_aln($outaln);
}
but it didn't work, presumably because the method next_aln() only applies to AlignIO things, which $aln is probably not. So I need to know what it is that is generated by the line my $aln = $factory->align($seq_array_ref); and how to get the aligned sequences output into a file. My next step is tree estimation or network analysis.
Thanks,
Ben.
$out->write_aln($aln);
This was the only line needed. $factory->align() returns a single alignment object rather than an AlignIO stream, so it can be passed straight to write_aln() on the output stream, without the next_aln() loop.

How to make recursive calls using Perl, awk or sed?

If a .cpp or .h file has #includes (e.g. #include "ready.h"), I need to make a text file that has these filenames on it. Since ready.h may have its own #includes, the calls have to be made recursively. Not sure how to do this.
The solution of @OneSolitaryNoob will likely work all right, but it has an issue: each recursion starts another process, which is quite wasteful. We can use subroutines to do this more efficiently. Assuming that all header files are in the working directory:
sub collect_recursive_includes {
# Unpack parameter from subroutine
my ($filename, $seen) = @_;
# Open the file to lexically scoped filehandle
# In your script, you'll probably have to transform $filename to correct path
open my $fh, "<", $filename or do {
# On failure: Print a warning, and return. I.e. go on with next include
warn "Can't open $filename: $!";
return;
};
# Loop through each line, recursing as needed
LINE: while(<$fh>) {
if (/^\s*#include\s+"([^"]+)"/) {
my $include = $1;
# you should probably normalize $include before testing if you've seen it
next LINE if $seen->{$include}; # skip seen includes
$seen->{$include} = 1;
collect_recursive_includes($include, $seen);
}
}
}
This subroutine remembers what files it has already seen, and avoids recursing there again—each file is visited one time only.
At the top level, you need to provide a hashref as second argument, that will hold all filenames as keys after the sub has run:
my %seen = ( $start_filename => 1 );
collect_recursive_includes($start_filename, \%seen);
my @files = sort keys %seen;
# output @files, e.g. print "$_\n" for @files;
I hinted in the code comments that you'll probably have to normalize the filenames. For example, consider a starting filename ./foo/bar/baz.h, which points to qux.h. Then the actual filename we want to recurse to is ./foo/bar/qux.h, not ./qux.h. The Cwd module can help you find your current location and transform relative paths to absolute ones. The File::Spec module is a lot more complex, but has good support for platform-independent filename and path manipulation.
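One way the normalization could look, as a sketch using only core modules: each include is resolved relative to the directory of the file that names it, and the absolute path is used as the key in the seen hash. The names resolve_include and collect_includes are illustrative, and the sketch assumes quoted includes are always relative to the including file, which ignores any compiler include paths.

```perl
use strict;
use warnings;
use Cwd qw(abs_path);
use File::Basename qw(dirname);
use File::Spec;

# Resolve $include relative to the directory of the file that names it.
# Returns an absolute path, or undef if no such file exists.
sub resolve_include {
    my ($including_file, $include) = @_;
    my $path = File::Spec->catfile(dirname($including_file), $include);
    return -e $path ? abs_path($path) : undef;
}

# Record every file reachable from $filename as a key in %$seen.
sub collect_includes {
    my ($filename, $seen) = @_;
    open my $fh, '<', $filename or do {
        warn "Can't open $filename: $!";
        return;
    };
    while (my $line = <$fh>) {
        next unless $line =~ /^\s*#\s*include\s+"([^"]+)"/;
        my $path = resolve_include($filename, $1);
        # Skip missing files and files that were already visited.
        next if !defined $path || $seen->{$path}++;
        collect_includes($path, $seen);
    }
    close $fh;
    return;
}

# Run only when a start file is given on the command line.
if (@ARGV == 1) {
    my $start = abs_path($ARGV[0]) or die "No such file: $ARGV[0]\n";
    my %seen = ($start => 1);
    collect_includes($start, \%seen);
    print "$_\n" for sort keys %seen;
}
```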
In Perl, recursion is straightforward:
sub factorial
{
my $n = shift;
if($n <= 1)
{ return 1; }
else
{ return $n * factorial($n - 1); }
}
print factorial 7; # prints 7 * 6 * 5 * 4 * 3 * 2 * 1
Offhand, I can think of only two things that require care:
In Perl, variables are global by default, and therefore static by default. Since you don't want one function-call's variables to trample another's, you need to be sure to localize your variables, e.g. by using my.
There are some limitations with prototypes and recursion. If you want to use prototypes (e.g. sub factorial($) instead of just sub factorial), then you need to provide the prototype before the function definition, so that it can be used within the function body. (Alternatively, you can use & when you call the function recursively; that will prevent the prototype from being applied.)
It's not totally clear what you want the display to look like, but the basic version would be a script called follow_includes.pl:
#!/usr/bin/perl -w
while(<>) {
if(/\#include "(\S+)\"/) {
print STDOUT $1 . "\n";
system("./follow_includes.pl $1");
}
}
Run it like:
% follow_includes.pl somefile.cpp
And if you want to hide any duplicate includes, run it like:
% follow_includes.pl somefile.cpp | sort -u
Usually you'd want this in some sort of tree-print.