In Perl, can I sort the output of a foreach loop according to keywords in the original string?

I am working on a problem that involves iterating through an array. I am new to Perl, so sorry if this is something very obvious that I am not seeing.
I want to sort the output according to a keyword in the original string. I have two foreach loops that give me something like this:
[blup]
[ich]
[du]
[er]
[sie]
[es]
something something something
somethingelse something else something else
Instead, I want the output ordered by a keyword in the original string from which the substrings were extracted, like this:
[blup blup]
[ich]
something something something
[er]
[sie]
[es]
something else something else something else
Thank you for your help!
This is my code:
#!/usr/bin/perl
# perl -d ./perl_debugger.pl
use strict;
use warnings;
use Data::Dumper qw(Dumper);
use File::Slurp;
my @a_linesorig;
my @solution;
my $line;
my $str;
my $grab;
my $s;
my $rs;
my $capture;
open(my $fh, "<", "output.txt")
or die "cannot open < output.txt: $!";
$line = read_file('output.txt');
$line = read_file('output.txt');
@a_linesorig = split( /\*/, $line);
@solution = split( /\bsolution\b/, $line);
close $fh
or die "can't close file: $!";
my $filename = 'neu.txt';
open(my $fh1, '>', $filename)
or die "can't open file: $!";
foreach $str (@a_linesorig) {
if ($str =~ (/\[(.*?)\]/)) {
print ($fh1 "content bracket: $1\n\n");
}
}
foreach $str (@a_linesorig) {
if ($str =~ /\brewrites\b([^\|]+)((\bcpu\b))*/g) {
print ($fh1 "decision: $&\n\n");
}
}
close $fh1
or die "can't close file: $!";

As you calculate your results, you could store them in a hash. At the end, you can iterate through the hash by its sorted keys.
This is an untested sketch, but the concept is:
Define a hash:
my %hash;
When you store an entry, you would do
$hash{$sort_key} = $text_to_print;
Then, when you are done, you could loop through your keys
for my $key (sort keys %hash) {
print $hash{$key};
}

At a high level, I would say that you should employ a hash whose keys are the words in the original string and whose values are the ordering that you want to preserve.
Then, after you've processed your input, you loop over the keys of the hash, sorted by your ordering, and print your results for each word inside the loop.
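A minimal sketch of that idea, assuming the keywords and their positions are already known. The keywords, positions, and result strings below are made up for illustration and are not taken from the asker's data:

```perl
use strict;
use warnings;

# Hypothetical data: keyword => position it had in the original string.
my %order = (ich => 1, blup => 0, er => 3, something => 2);

# Hypothetical results, collected in any order and keyed by the same keywords.
my %result = (
    er        => '[er]',
    blup      => '[blup]',
    something => 'something something something',
    ich       => '[ich]',
);

# Walk the keywords by their recorded position and collect the results.
my @sorted = map { $result{$_} } sort { $order{$a} <=> $order{$b} } keys %order;

print "$_\n" for @sorted;
```

The numeric comparison (`<=>`) matters here: a plain `sort keys %order` would order the keywords alphabetically rather than by their position in the original string.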

perl: make script fast to use big file

My problem is how to make my script faster (I work with big files).
The script below adds "bbb" between words if the words occur in another file that contains sequences of words.
For example, file2.txt: i eat big pizza .my big pizza ...
file1.txt (sequences):
eat big pizza
big pizza
The resulting newfile.txt:
i eatbbbbigbbbpizza.my bigbbbpizza ...
my script:
use strict;
use warnings;
use autodie;
open Newfile ,">./newfile.txt" or die "Cannot create Newfile.txt";
my %replacement;
my ($f1, $f2) = ('file1.txt', 'file2.txt');
open(my $fh, $f1);
my @seq;
foreach (<$fh> )
{
chomp;
s/^\s+|\s+$//g;
push @seq, $_;
}
close $fh;
@seq = sort bylen @seq;
open($fh, $f2);
foreach (<$fh> ) {
foreach my $r (@seq) {
my $t = $r;
$t =~ s/\h+/bbb/g;
s/$r/$t/g;
}
print Newfile ;
}
close $fh;
close Newfile ;
exit 0;
sub bylen {
length($b) <=> length($a);
}
Instead of an array
my @seq;
define your words as a hash.
my %seq;
Instead of pushing the words
push @seq, $_;
store the words in the hash. Precalculate the replacement and move it out of the loop.
my $t = $_;
$t =~ s/\h+/bbb/g;
$seq{$_} = $t;
Precalculate the word list, still sorted longest first, before the outer loop:
my @seq = sort bylen keys %seq;
And use hash look-ups to find the replacement in the inner loop:
my $t = $seq{$r};
This might be a bit faster, but do not expect too much.
In most cases it is better to reduce the problem by preparing the input in a way that makes the solution easier. For example, grep -f is much faster than your Perl loops. Use grep to find the lines that need a replacement, and do the replacement with Perl or sed.
Another way is to parallelize the job. You can divide your input into n parts and run n processes on n CPUs in parallel. See the GNU parallel tutorial.
What about a regexp like this? (Beware that building a regexp from the contents of a file can raise security concerns.)
use strict;
use warnings;
open (my $Newfile, '>', 'newfile.txt') or die "Cannot create Newfile.txt: $!";
my ($f1, $f2) = qw(file1.txt file2.txt);
open (my $fh, $f1) or die "Can't open $f1 for reading: $!";
my @seq = map {split ' ', $_ } <$fh>;
close $fh;
# an improvement would be to use a hash to avoid duplicates
my $regexp = '(' . join('|', @seq) . ')';
open($fh, $f2) or die "Can't open $f2 for reading: $!";
foreach my $line (<$fh> ) {
$line =~ s/$regexp/$1bbb/g;
print $Newfile $line;
}
close $fh;
close $Newfile ;
exit 0;
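A variant of the single-regexp idea that keeps whole sequences together, as a sketch rather than a drop-in replacement. It assumes the sequences fit in memory, uses in-memory arrays (`@sequences`, `@lines`) as hypothetical stand-ins for file1.txt and file2.txt, and escapes each sequence with `quotemeta` so that file contents are not interpreted as regexp syntax:

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins for file1.txt (sequences) and file2.txt (text).
my @sequences = ('eat big pizza', 'big pizza');
my @lines     = ('i eat big pizza .my big pizza ...');

# Map each sequence to its replacement once, outside the per-line loop.
my %replacement;
for my $seq (@sequences) {
    (my $joined = $seq) =~ s/\h+/bbb/g;
    $replacement{$seq} = $joined;
}

# One alternation, longest sequence first; quotemeta keeps the
# sequences from being treated as regexp metacharacters.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length $b <=> length $a } keys %replacement;
my $regex = qr/($alternation)/;

# A single substitution pass per line instead of one pass per sequence.
my @out = map { (my $copy = $_) =~ s/$regex/$replacement{$1}/g; $copy } @lines;

print "$_\n" for @out;
```

Each line is scanned once regardless of how many sequences there are, which is where any speed-up would come from. Overlapping sequences may be handled slightly differently than in the original nested loops.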

Printing value from split result Perl

Here I have an abc.txt file:
aaa,1000,kevin
bbb,2000,john
ccc,3000,jane
ddd,4000,kevin
Then I want to print out:
kevin
john
jane
my Perl script is:
open (INFILE, $ARGV[1]) or die "An input file is required as argument\n";
@store=();
while(<INFILE>)
{
chomp();
@data=split(/,/);
#
#
#
if (%store ne "0")
{
print "Printing users:\n";
foreach $key (keys %store)
{print $key."\n";}
}
print "Printing users:\n";
foreach $key (keys %store)
{print $key."\n";}
}
My idea is to store each value in a hash, using the value as the key. What should I write in place of the ### lines?
You have declared @store but then use %store. I don't understand why you are doing that, but the code below will give you the desired output. It reads the input file, splits the data, and then removes the duplicates.
use strict;
use warnings;
my $infile = $ARGV[0];
open my $fh, "<", $infile or die "An input file is required as argument: $!";
my %store;
while(my $line = <$fh>)
{
chomp($line);
my @data = split /,/, $line;
my $user = $data[2];
# skip lines without a third field
next unless defined $user && $user ne '';
# print each name only the first time it is seen
print "$user\n" unless $store{$user}++;
}
close $fh;
Output:
kevin
john
jane
Hmm, it depends on what you want. Maybe this example will help you:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper; # for debugging, if you want
my $infile='abc.txt'; # or $ARGV[0], whatever it is
my $fh;
open $fh,'<',$infile or die "problem with $infile: $!";
my $inputline;
my %Storage;
my @Values;
while (defined($inputline=<$fh>)) {
chomp $inputline;
@Values=split ',',$inputline;
if (@Values != 3) {
warn "$inputline is badly formatted";
next;
}
#warn if exists $Storage{$Values[1]}; # optional warning for detected duplicates
$Storage{$Values[1]}=[@Values[0,2]]; # store the other two fields as an array reference
# duplicate keys are overwritten automatically
}
close $fh;
print Dumper \%Storage; # print how Perl stores it
foreach my $Key (keys %Storage) { # example loop
print "@{$Storage{$Key}}\n"; # do anything
}
I hope this template will be enough for you.

Extract data from file

I have data like
"scott
E -45 COLLEGE LANE
BENGALI MARKET
xyz -785698."
"Tomm
D.No: 4318/3,Ansari Road, Dariya Gunj,
xbc - 289235."
I wrote a Perl program to extract the names:
open(my$Fh, '<', 'printable address.txt') or die "!S";
open(my$F, '>', 'names.csv') or die "!S";
while (my@line =<$Fh> ) {
for(my$i =0;$i<=13655;$i++){
if ($line[$i]=~/^"/) {
print $F $line[$i];
}
}
}
It works fine and extracts the names exactly. Now my aim is to extract the addresses, like
BENGALI MARKET
xyz -785698."
D.No: 4318/3,Ansari Road, Dariya Gunj,
xbc - 289235."
into a CSV file. How can I do this?
There are a number of flaws in your original program that should be addressed before suggesting any enhancements:
Always have use strict; and use warnings; at the top of every script.
Your or die "!S" statements are broken. The error message is actually in $!. However, you can skip the need for them by just having use autodie;
Give your filehandles more meaningful names. $Fh and $F say nothing about what they are for. At minimum, label them $infh and $outfh.
The while (my @line = <$Fh>) { is flawed, as it can be reduced to my @line = <$Fh>;. Because you're calling readline in list context, it slurps the entire file on the first iteration, and the loop exits on the next one. Instead, assign to a scalar, and you don't even need the inner for loop.
If you did want to slurp the entire file into @line, your use of for(my$i =0;$i<=13655;$i++){ is also flawed. You should iterate to the last index of @line, which is $#line.
if ($line[$i]=~/^"/) { is also flawed, as it leaves the quote character " at the beginning of the names that you're trying to match. Instead, add a capture group to pull out the name.
With the suggested changes, the code reduces to:
use strict;
use warnings;
use autodie;
open my $infh, '<', 'printable address.txt';
open my $outfh, '>', 'names.csv';
while (my $line = <$infh>) {
if ($line =~ /^"(.*)/) {
print $outfh "$1\n";
}
}
Now if you also want to isolate the address, you can use a similar method as you did with the name. I'm going to assume that you might want to build the whole address in a variable so you can do something more complicated with it than throwing them blindly at a file. However, mirroring the file setup for now:
use strict;
use warnings;
use autodie;
open my $infh, '<', 'printable address.txt';
open my $namefh, '>', 'names.csv';
open my $addressfh, '>', 'address.dat';
my $address = '';
while (my $line = <$infh>) {
if ($line =~ /^"(.*)/) {
print $namefh "$1\n";
} elsif ($line =~ /(.*)"$/) {
$address .= $1;
print $addressfh "$address\n";
$address = '';
} else {
$address .= $line;
}
}
Ultimately, no matter what you want to use your data for, your best solution is probably to output it to a real CSV file using Text::CSV. That way it can be imported into a spreadsheet or some other system very easily, and you won't have to parse it again.
use strict;
use warnings;
use autodie;
use Text::CSV;
my $csv = Text::CSV->new ( { binary => 1, eol => "\n" } )
or die "Cannot use CSV: ".Text::CSV->error_diag ();
open my $infh, '<', 'printable address.txt';
open my $outfh, '>', 'address.csv';
my @data;
while (my $line = <$infh>) {
# Name Field
if ($line =~ /^"(.*)/) {
@data = ($1, '');
# End of Address
} elsif ($line =~ /(.*)"$/) {
$data[1] .= $1;
$csv->print($outfh, \@data);
# Address lines
} else {
$data[1] .= $line;
}
}

How to search and replace using hash with Perl

I'm new to Perl and I'm afraid I am stuck and wanted to ask if someone might be able to help me.
I have a file with two columns (tab separated) of oldname and newname.
I would like to use the oldname as key and newname as value and store it as a hash.
Then I would like to open a different file (gff file) and replace all the oldnames in there with the newnames and write it to another file.
I have given it my best try but am getting a lot of errors.
If you could let me know what I am doing wrong, I would greatly appreciate it.
Here are how the two files look:
oldname newname(SFXXXX) file:
genemark-scaffold00013-abinit-gene-0.18 SF130001
augustus-scaffold00013-abinit-gene-1.24 SF130002
genemark-scaffold00013-abinit-gene-1.65 SF130003
file to search and replace in (an example of one of the lines):
scaffold00013 maker gene 258253 258759 . - . ID=maker-scaffold00013-augustus-gene-2.187;Name=maker-scaffold00013-augustus-gene-2.187;
Here is my attempt:
#!/usr/local/bin/perl
use warnings;
use strict;
my $hashfile = $ARGV[0];
my $gfffile = $ARGV[1];
my %names;
my $oldname;
my $newname;
if (!defined $hashfile) {
die "Usage: $0 hash_file gff_file\n";
}
if (!defined $gfffile) {
die "Usage: $0 hash_file gff_file\n";
}
###save hashfile with two columns, oldname and newname, into a hash with oldname as key and newname as value.
open(HFILE, $hashfile) or die "Cannot open $hashfile\n";
while (my $line = <HFILE>) {
chomp($line);
my ($oldname, $newname) = split /\t/;
$names{$oldname} = $newname;
}
close HFILE;
###open gff file and replace all oldnames with newnames from %names.
open(GFILE, $gfffile) or die "Cannot open $gfffile\n";
while (my $line2 = <GFILE>) {
chomp($line2);
eval "$line2 =~ s/$oldname/$names{oldname}/g";
open(OUT, ">SFrenamed.gff") or die "Cannot open SFrenamed.gff: $!";
print OUT "$line2\n";
close OUT;
}
close GFILE;
Thank you!
Your main problem is that you aren't splitting the $line variable. split /\t/ splits $_ by default, and you haven't put anything in there.
This program builds the hash, and then constructs a regex from all the keys by sorting them in descending order of length and joining them with the | regex alternation operator. The sorting is necessary so that the longest of all possible choices is selected if there are any alternatives.
Every occurrence of the regex is replaced by the corresponding new name in each line of the input file, and the output written to the new file.
use strict;
use warnings;
die "Usage: $0 hash_file gff_file\n" if @ARGV < 2;
my ($hashfile, $gfffile) = @ARGV;
open(my $hfile, '<', $hashfile) or die "Cannot open $hashfile: $!";
my %names;
while (my $line = <$hfile>) {
chomp($line);
my ($oldname, $newname) = split /\t/, $line;
$names{$oldname} = $newname;
}
close $hfile;
my $regex = join '|', map { quotemeta($_) } sort { length $b <=> length $a } keys %names;
$regex = qr/$regex/;
open(my $gfile, '<', $gfffile) or die "Cannot open $gfffile: $!";
open(my $out, '>', 'SFrenamed.gff') or die "Cannot open SFrenamed.gff: $!";
while (my $line = <$gfile>) {
chomp($line);
$line =~ s/($regex)/$names{$1}/g;
print $out $line, "\n";
}
close $out;
close $gfile;
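To see why the longest-first sort matters when building the alternation, here is a small self-contained sketch. The names in `%names` and the helper `rename_with` are hypothetical, chosen so that one key is a prefix of another:

```perl
use strict;
use warnings;

# Hypothetical names where one key is a prefix of the other.
my %names = ('gene-1' => 'SF1', 'gene-1.24' => 'SF2');

# Build an alternation from the keys in the given order and apply it.
sub rename_with {
    my ($text, @keys) = @_;
    my $alt = join '|', map { quotemeta($_) } @keys;
    $text =~ s/($alt)/$names{$1}/g;
    return $text;
}

my @shortest_first = sort { length $a <=> length $b } keys %names;
my @longest_first  = reverse @shortest_first;

# With the shorter key first, it wins and leaves ".24" behind.
my $wrong = rename_with('ID=gene-1.24;', @shortest_first);
# With the longer key first, the whole name is replaced.
my $right = rename_with('ID=gene-1.24;', @longest_first);

print "shortest first: $wrong\n";
print "longest first:  $right\n";
```

Perl's alternation tries the branches left to right at each position and takes the first one that matches, so the order of the keys decides which name is replaced.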
Why are you using an eval? Also, $oldname is going to be undefined in the second while loop, because the first while loop redeclares it in its own scope (and even if you used the outer variable, it would only hold the last value that you processed, which wouldn't be helpful).
Take out the my $oldname and my $newname at the top of your script; they are useless.
Take out the entire eval line. You need to repeat the substitution for each name you want to replace. Try something like:
$line2 =~ s/\Q$_\E/$names{$_}/g for keys %names;
Also see Borodin's answer. He made one big regex instead of a loop, and caught the missing second argument to split.

Perl while loop is repeating itself

I am 100% new to Perl but do have some PHP knowledge. I'm trying to create a quick script that collects URLs in @urls and saves them to a .txt file. The problem I'm having is that it saves the earlier URLs again every time it runs through the loop, which is super annoying. So when the loop runs, the file looks like this.
url1.com
url1.com url2.com
url1.com url2.com url3.com
What I would like it to look like is just a plain and simple:
url1.com
url2.com
url3.com
Here is my code. If anyone can help, I would appreciate it SO SO much!
#!/usr/bin/perl
use strict;
use warnings;
my $file = "data.rdf.u8";
my @urls;
open(my $fh, "<", $file) or die "Unable to open $file\n";
while (my $line = <$fh>) {
if ($line =~ m/<(?:ExternalPage about|link r:resource)="([^\"]+)"\/?>/) {
push @urls, $1;
}
open (FH, ">>my_urls.txt") or die "$!";
print FH "@urls ";
close(FH);
}
close $fh;
Your print is inside your while loop. It sounds like you want to move your print outside of the loop.
Or, if you want to print each URL as you go through each line, move the declaration of "my @urls" down into the loop, so that it is reset for each line.
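A sketch of the first option, collecting inside the loop and printing once afterwards. The sample lines in `@lines` are hypothetical stand-ins for the contents of data.rdf.u8, and the result is built as a string instead of being written to my_urls.txt so the example runs on its own:

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for data.rdf.u8.
my @lines = (
    '<ExternalPage about="url1.com">',
    '<d:Title>ignored</d:Title>',
    '<link r:resource="url2.com"/>',
);

my @urls;
for my $line (@lines) {
    # Collect only; nothing is printed inside the loop.
    if ($line =~ m{<(?:ExternalPage about|link r:resource)="([^"]+)"/?>}) {
        push @urls, $1;
    }
}

# Print once, after the loop, one URL per line.
my $output = join("\n", @urls) . "\n";
print $output;
```

Because the print happens after the loop has finished, each URL appears exactly once.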
Shouldn't this part:
open (FH, ">>my_urls.txt") or die "$!";
print FH "@urls ";
close(FH);
...be placed outside of the while loop? It makes no sense inside it, as @urls is still incomplete there.
And two regex-related side notes: first, with the m operator you may choose another set of delimiters so you don't have to escape the / sign; second, it's not necessary to escape the " sign within a character class. In fact, it's not required to escape it in a regex at all, unless you choose this character as the delimiter.
So your regex may look like this:
$line =~ m#<(?:ExternalPage about|link r:resource)="([^"]+)"/?>#
Do you need the @urls array elsewhere? If not, you could simply print each URL as it is matched:
#!/usr/bin/perl
use strict;
use warnings;
my $file = "data.rdf.u8";
my @urls;
open(my $fh, "<", $file) or die "Unable to open $file\n";
open (FH, ">>my_urls.txt") or die "$!";
while (my $line = <$fh>) {
if ($line =~ m/<(?:ExternalPage about|link r:resource)="([^\"]+)"\/?>/) {
print FH "$1\n";
}
}
close(FH);
close $fh;