Displaying duplicate records - perl

I have the code below to parse a text file and display all words after the "Enter:" keyword on every line of the file. It does display all the words after "Enter:", but duplicates are repeated, and I don't want them to be. Please guide me as to what is wrong in my code.
#! /usr/bin/perl
use strict;
use warnings;
$infile "xyz.txt";
open (FILE, $infile) or die ("can't open file:$!");
if(FILE =~ /ENTER/ ){
@functions = substr($infile, index($infile, 'Enter:'));
@functions =~/@functions//;
%seen=();
@unique = grep { ! $seen{$_} ++ } @array;
while (@unique != ''){
print '@unique\n';
}
}
close (FILE);

Here is a way to do the job. It prints the unique words found on each line that begins with the keyword Enter:
#!/usr/bin/perl
use strict;
use warnings;
my $infile = "xyz.txt";
# use 3-arg open with a lexical filehandle
open my $fh, '<', $infile or die "unable to open '$infile' for reading: $!";
# loop through all lines
while (my $line = <$fh>) {
    # remove the trailing newline
    chomp($line);
    # if the line begins with "Enter:", remove the keyword
    if ($line =~ s/^Enter:\s+//) {
        # split the line on whitespace
        # and populate the array with all words found
        my @words = split(/\s+/, $line);
        # create a hash where the keys are the words found
        my %seen = map { $_ => 1 } @words;
        # display unique words (hash keys come back in no particular order)
        print "$_\t" for (keys %seen);
        print "\n";
    }
}

If I understand you correctly, one problem is that your 'grep' only counts the occurrences of each word. I think you want to use 'map' so that '@unique' only contains the unique words from '@array'. Something like this:
@unique = map {
    if (exists($seen{$_})) {
        ();
    } else {
        $seen{$_}++; $_;
    }
} @array;
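For comparison, here is a minimal sketch of the grep/%seen idiom from the question applied to a populated list. As far as I can tell, it also keeps only the first occurrence of each word, and unlike iterating over hash keys it preserves the original order. The sample line and the helper name unique_words are assumptions made for illustration; they are not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the distinct elements of a list, keeping first-seen order.
# The post-increment yields the old count, so the test is true only
# the first time a given word is encountered.
sub unique_words {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Hypothetical sample line, standing in for one line of xyz.txt.
my $line = "Enter: foo bar foo baz bar";

if ($line =~ s/^Enter:\s+//) {
    my @unique = unique_words(split /\s+/, $line);
    print "@unique\n";    # prints "foo bar baz"
}
```

Both this and the map version give the same list; the grep form is just shorter.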

Related

How to randomly pair items in a list

I have a list of accession numbers that I want to pair randomly using the Perl script below:
#!/usr/bin/perl -w
use List::Util qw(shuffle);
my $file = 'randomseq_acc.txt';
my @identifiers = map { (split /\n/)[1] } <$file>;
chomp @identifiers;
#Shuffle them and put in a hash
@identifiers = shuffle @identifiers;
my %pairs = (@identifiers);
#print the pairs
for (keys %pairs) {
print "$_ and $pairs{$_} are partners\n";
but I keep getting errors.
The accession numbers in the file randomseq_acc.txt are:
1094711
1586007
2XFX_C
Q27031.2
P22497.2
Q9TVU5.1
Q4N4N8.1
P28547.2
P15711.1
AAC46910.1
AAA98602.1
AAA98601.1
AAA98600.1
EAN33235.2
EAN34465.1
EAN34464.1
EAN34463.1
EAN34462.1
EAN34461.1
EAN34460.1
I needed to add the closing right curly brace to be able to compile the script.
As arrays are indexed from 0, (split /\n/)[1] returns the second field, i.e. what follows newline on each line (i.e. nothing). Change it to [0] to make it work:
my @identifiers = map { (split /\n/)[0] } <$file>; # Still wrong.
The diamond operator needs a file handle, not a file name. Use open to associate the two:
open my $FH, '<', $file or die $!;
my @identifiers = map { (split /\n/)[0] } <$FH>;
Using split to remove a newline is not common. I'd probably use something else:
map { /(.*)/ } <$FH>
# or
map { chomp; $_ } <$FH>
# or, thanks to ikegami
chomp(my @identifiers = <$FH>);
So, the final result would be something like the following:
#!/usr/bin/perl
use warnings;
use strict;
use List::Util qw(shuffle);
my $filename = '...';
open my $FH, '<', $filename or die $!;
chomp(my @identifiers = <$FH>);
my %pairs = shuffle(@identifiers);
print "$_ and $pairs{$_} are partners\n" for keys %pairs;
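One thing to watch with the hash approach: assigning an odd number of identifiers to a hash triggers an "Odd number of elements" warning, and an identifier that appears twice as a key overwrites the earlier pair. A minimal sketch that avoids both by splicing pairs off a shuffled array follows; the sub name random_pairs and the inlined identifiers are assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Shuffle a list and split it into two-element pairs.
# Returns (\@pairs, $leftover); $leftover is undef for an even count.
sub random_pairs {
    my @items = shuffle(@_);
    my $leftover = @items % 2 ? pop @items : undef;
    my @pairs;
    push @pairs, [ splice(@items, 0, 2) ] while @items;
    return (\@pairs, $leftover);
}

# A few accession numbers from the question, inlined for illustration.
my @ids = qw(1094711 1586007 2XFX_C Q27031.2 P22497.2);

my ($pairs, $leftover) = random_pairs(@ids);
print "$_->[0] and $_->[1] are partners\n" for @$pairs;
print "$leftover has no partner\n" if defined $leftover;
```

With the 20 identifiers from the question there is no leftover, so only the partner lines are printed.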

search a group of string in a file is present in another file or not

I'm writing a Perl script that opens a file containing many strings (one string per line), checks whether each of these strings is present in another file (the search file), and prints each occurrence. I have written the code below to find one particular string. How can I extend it to a list of strings read from a file?
open(DATA, "<filetosearch.txt") or die "Couldn't open file filetosearch.txt for reading: $!";
my $find = "word or string to find";
#open FILE, "<signatures.txt";
my @lines = <DATA>;
print "Lines that matched $find\n";
for (@lines) {
if ($_ =~ /$find/) {
print "$_\n";
}
}
I'd try something like this:
use strict;
use warnings;
use Tie::File;
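tie my @lines, 'Tie::File', 'filetosearch.txt' or die "Cannot tie filetosearch.txt: $!";
my @result;
tie my @patterns, 'Tie::File', 'patterns.txt' or die "Cannot tie patterns.txt: $!";
foreach my $pattern (@patterns)
{
    # Escape into a copy: $pattern aliases the tied array element,
    # so assigning to it would rewrite the line in patterns.txt.
    my $quoted = quotemeta $pattern;
    push @result, grep { /$quoted/ } @lines;
}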
I use Tie::File because it is convenient (not especially in this case, but in others). Others (perhaps a lot of others?) would disagree, but that is of no importance here.
grep is a core function that is very good at what it does (in my experience).
Ok, something like this will be faster.
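sub testmatch
{
    my ($find, $linesref) = @_;
    # \Q...\E treats the search string literally, not as a regex
    for (@$linesref) { if ($_ =~ /\Q$find\E/) { return 1; } }
    return 0;
}
{
    open(DATA, "<filetosearch.txt") or die "Couldn't open filetosearch.txt: $!";
    my @lines = <DATA>;
    open(SRC, "<tests.txt") or die "Couldn't open tests.txt: $!";
    while (my $find = <SRC>)
    {
        chomp $find;    # otherwise the newline becomes part of the pattern
        if (testmatch($find, \@lines)) { print "a match\n" }
    }
}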
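If it's matching full line to full line, you can pack the lines of one file in as keys of a hash and just test for existence:
{
    open(DATA, "<filetosearch.txt") or die "Couldn't open filetosearch.txt: $!";
    my %lines;
    @lines{<DATA>} = ();    # hash slice: every line becomes a key
    open(SRC, "<tests.txt") or die "Couldn't open tests.txt: $!";
    while (<SRC>)
    {
        # exists is a plain hash lookup and avoids the experimental ~~ operator
        if (exists $lines{$_}) { print "a match\n" }
    }
}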
Maybe something like this will do the job:
open FILE1, "<", "filetosearch.txt" or die "Can't open filetosearch.txt: $!";
my @arrFileToSearch = <FILE1>;
close FILE1;
open FILE2, "<", "signatures.txt" or die "Can't open signatures.txt: $!";
my @arrSignatures = <FILE2>;
close FILE2;
chomp(@arrSignatures);
# escape special characters once, up front; doing it inside the loops
# would re-escape the same (aliased) array elements on every pass
my @arrPatterns = map { quotemeta($_) } @arrSignatures;
for my $i (0 .. $#arrFileToSearch) {
    foreach my $pattern (@arrPatterns) {
        if ($arrFileToSearch[$i] =~ /$pattern/) {
            print $arrFileToSearch[$i]; # the matching line; use any other index (e.g. $i-3) if you want a nearby line
        }
    }
}
Here's another option:
use strict;
use warnings;
my $searchFile = pop;
my @strings = map { chomp; "\Q$_\E" } <>;
my $regex = '(?:' . ( join '|', @strings ) . ')';
push @ARGV, $searchFile;
while (<>) {
print if /$regex/;
}
Usage: perl script.pl strings.txt searchFile.txt [>outFile.txt]
The last, optional parameter directs output to a file.
First, the search file's name is (implicitly) popped off @ARGV and saved for later. Then the strings file is read (<>), and map is used to chomp each line and escape meta-characters (the \Q and \E, in case there are regex characters, e.g., a '.' or '*', in the string); the resulting lines are stored in an array. The array's elements are joined with the regex alternation character (|) to effectively form an OR of all the strings, which is matched against each of the search file's lines. Next, the search file's name is pushed onto @ARGV so its lines can be read. Finally, each line is printed if one of the strings is found on it.
Hope this helps!
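The alternation approach above can be sketched without any files. This is only an illustration under assumptions: build_matcher is a hypothetical helper, and the sample strings and lines are made up. It also skips empty strings, since an empty alternative would match every line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build one compiled regex that matches any of the given literal strings.
sub build_matcher {
    # Drop empty strings, then escape regex meta-characters in the rest.
    my @literals = map { quotemeta($_) } grep { length($_) } @_;
    my $alternation = join '|', @literals;
    return qr/(?:$alternation)/;
}

# Hypothetical sample data standing in for strings.txt and searchFile.txt.
my @strings = ('a.b', 'foo*', 'bar');
my @lines   = ("xx a.b yy\n", "aXb\n", "foo* here\n", "nothing\n");

my $regex = build_matcher(@strings);

# Prints only the lines that contain one of the literal strings.
print grep { /$regex/ } @lines;
```

Compiling the pattern once with qr// means each line of the search file is tested against a single regex instead of looping over every string.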

check if a pattern exist in a file

I have a very simple Perl question regarding a pattern-matching problem.
I am reading file with a list of names (fileA).
I would like to check if any of these names exist in another file (fileB).
if ($name -e $fileB){
do something
}else{
do something else
}
In a way, it is a check for whether a pattern exists in a file.
I have tried
open(IN, $controls) or die "Can't open the control file\n";
while(my $line = <IN>){
if ($name =~ $line ){
print "$name\tfound\n";
}else{
print "$name\tnotFound\n";
}
}
This is repeating itself as it checks and prints every entry rather than checking whether the name exists or not.
When you compare one list to another, you're interested in hashes. A hash is like an array that is indexed by keys rather than by position, and it has no inherent order. A hash can only have a single instance of a particular key (but different keys can have the same data).
What you can do is go through the first file and create a hash keyed by each line. Then you go through the second file and check whether any of its lines match a key in your hash:
#! /usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
use autodie; #You don't have to check if "open" fails.
use constant {
FIRST_FILE => 'file1.txt',
SECOND_FILE => 'file2.txt',
};
open my $first_fh, "<", FIRST_FILE;
# Get each line as a hash key
my %line_hash;
while ( my $line = <$first_fh> ) {
chomp $line;
$line_hash{$line} = 1;
}
close $first_fh;
Now each line is a key in your hash %line_hash. The data really doesn't matter. The important part is the value of the key itself.
Now that I have my hash of the lines in the first file, I can read in the second file and see if that line exists in my hash:
open my $second_fh, "<", SECOND_FILE;
while ( my $line = <$second_fh> ) {
chomp $line;
if ( exists $line_hash{$line} ) {
say qq(I found "$line" in both files);
}
}
close $second_fh;
There's a map function too that can be used:
#! /usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
use autodie; #You don't have to check if "open" fails.
use constant {
FIRST_FILE => 'file1.txt',
SECOND_FILE => 'file2.txt',
};
open my $first_fh, "<", FIRST_FILE;
chomp ( my @lines = <$first_fh> );
# Get each line as a hash key
my %line_hash = map { $_ => 1 } @lines;
close $first_fh;
open my $second_fh, "<", SECOND_FILE;
while ( my $line = <$second_fh> ) {
chomp $line;
if ( exists $line_hash{$line} ) {
say qq(I found "$line" in both files);
}
}
close $second_fh;
I am not a great fan of map because I don't find it that much more efficient and it is harder to understand what is going on.
To check whether a pattern exists in a file, you have to open the file and read its content. The fastest way to test which members of one list are included in another is to store the content in a hash:
#!/usr/bin/perl
use strict;
use warnings;
open my $LST, '<', 'fileA' or die "fileA: $!\n";
open my $FB, '<', 'fileB' or die "fileB: $!\n";
my %hash;
while (<$FB>) {
chomp;
undef $hash{$_};
}
while (<$LST>) {
chomp;
if (exists $hash{$_}) {
print "$_ exists in fileB.\n";
}
}
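If the goal is one found/notFound line per name (as in the question's attempt), the hash lookup can be wrapped like this. A minimal sketch: report_names is a hypothetical helper, the two arrays stand in for the chomped contents of fileA and fileB, and names are compared as whole lines rather than as patterns.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Report each name exactly once as found or notFound.
# $names and $candidates are array references of already-chomped lines.
sub report_names {
    my ($names, $candidates) = @_;
    my %in_b = map { $_ => 1 } @$candidates;
    return map { $in_b{$_} ? "$_\tfound" : "$_\tnotFound" } @$names;
}

# Hypothetical contents of fileA and fileB.
my @file_a = qw(alice bob carol);
my @file_b = qw(carol dave alice);

print "$_\n" for report_names(\@file_a, \@file_b);
```

Each name is looked up once, so the output has exactly as many lines as fileA has names.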
I have just sketched the algorithm in code; it is not tested, but I think it does the job for you.
my @a;
my $matched;
my $line;
open(A, "<", "fileA") or die "fileA: $!";
open(B, "<", "fileB") or die "fileB: $!";
while(<A>)
{
    chomp;
    push @a, $_;
}
while(<B>)
{
    chomp;
    $line = $_;
    $matched = 0;
    # set the flag before leaving the loop; \Q...\E matches the name literally
    for (@a) { if ($line =~ /\Q$_\E/) { $matched = 1; last; } }
    if ($matched)
    {
        # do something
    }
    else
    {
        # do something else
    }
}

Printing array in Perl

I currently have a Perl script that reads fstab files, splits them up by column, and searches for the longest word in each column in order to display its length. All that works peachy (I think). The problem I'm having is that it keeps printing out the same length for every line, which is not correct. For example, $dev_parts prints 24, $label_parts prints 24, and so on.
Below is my code.
#!/usr/bin/perl
use strict;
print "Enter file name: \n";
my $file_name = <STDIN>;
open(IN, "$file_name");
my @parts = split( /\s+/, $file_name);
foreach my $usr_file (<IN>) {
chomp($usr_file);
@parts = split( /\s+/, $usr_file);
push(@dev, $parts[0]);
push(@label, $parts[1]);
push(@tmpfs, $parts[2]);
push(@devpts, $parts[3]);
push(@sysfs, $parts[4]);
push(@proc, $parts[5]);
}
foreach $dev_parts (@dev) {
$dev_length1 = length ($parts[$dev_parts]);
if ( $dev_length1 > $dev_length2) {
$dev_length2 = $dev_length1;
}
}
print "The longest word in the first line is: $dev_length2 \n";
foreach $label_parts (@label) {
$label_length1 = length($parts[$label_parts]);
if ($label_length1 > $label_length2) {
$label_length2 = $label_length1;
}
}
print "The longest word in the first line is: $label_length2 \n";
This is how your code should be
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
print "Enter file name: \n";
my $file_name = <STDIN>;
chomp($file_name);
open(FILE, "$file_name") or die $!;
my %colhash;
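while (my $line = <FILE>) {
    my $col = 0;
    # split ' ' ignores leading whitespace, so columns stay aligned
    my @parts = split ' ', $line;
    for my $part (@parts) {
        my $len = length($part);
        $col++;
        # treat a column not seen yet as length 0 to avoid "uninitialized" warnings
        if (($colhash{$col} // 0) < $len) {
            $colhash{$col} = $len; # store the longest word length for each column
        }
    }
}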
print Dumper(\%colhash);
You have a mistake here:
foreach $dev_parts (@dev) {
$dev_length1 = length ($parts[$dev_parts]);
As I understand it, you are looking for the longest element in @dev. However, you take the length of an element from the @parts array. This array is always set to whatever the last line of the file is. So you are measuring an element of the last line of the file, rather than each element of the appropriate column.
You just need to take length($dev_parts) instead.
Incidentally, here is a simpler way to find the longest length in an array:
use List::Util qw/max/; #Core module, always available.
my $longest_dev = max map {length} @dev;
A few other comments on your code:
use strict; is good. You should also use warnings;. It will help you catch silly mistakes in your code.
You ought to check for errors whenever you open a file:
open(IN, $file_name) or die "Failed to open $file_name: $!";
Better yet, use the preferred open syntax with a lexical filehandle:
open(my $in_file, '<', $file_name) or die "Failed to open $file_name: $!";
...
while (<$in_file>) {
I'm not sure what you are trying to do here:
my @parts = split( /\s+/, $file_name);
You are splitting the file name by white space, but you don't use that for anything. And then you re-use the same array to hold the lines later.
A while loop is preferred to foreach when you go through lines of a file. It saves memory because it doesn't read the whole file into memory first (and it is otherwise exactly the same).
while (my $usr_file = <IN>) {
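Putting these comments together, the longest field in every column can be computed in one pass. A rough sketch, assuming whitespace-separated columns; column_widths and the sample lines are made up for illustration and are not taken from a real fstab.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(max);

# Return the longest field length for each whitespace-separated column.
sub column_widths {
    my @widths;
    for my $line (@_) {
        my @fields = split ' ', $line;    # split ' ' skips leading whitespace
        for my $i (0 .. $#fields) {
            # a column not seen before counts as width 0
            $widths[$i] = max($widths[$i] // 0, length $fields[$i]);
        }
    }
    return @widths;
}

# Hypothetical fstab-style lines, for illustration only.
my @lines = (
    "/dev/sda1  /         ext4   defaults  1 1\n",
    "tmpfs      /dev/shm  tmpfs  defaults  0 0\n",
);

my @widths = column_widths(@lines);
print "Column ", $_ + 1, ": $widths[$_]\n" for 0 .. $#widths;
```

When reading from a real file, the lines can be fed in from a while loop instead of being held in an array.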

Program in Perl that reads from file, finds a line containing specific character and prints them.

I have been learning Perl for a few days and I am completely new.
The code is supposed to read from a big file and if a line contains "warning" it should store it and print it on a new line and also count the number of appearances of each type of warning. There are different types of warnings in the file e.g "warning GR145" or "warning GT10" etc.
So I want to print something like
Warning GR145 14 warnings
Warning GT10 12 warnings
and so on
The problem is that when I run it, it doesn't print the whole list of warnings.
I will appreciate your help. Here is the code:
use strict;
use warnings;
my #warnings;
open (my $file, '<', 'Warnings.txt') or die $!;
while (my $line = <$file>) {
if($line =~ /warning ([a-zA-Z0-9]*):/) {
push (#warnings, $line);
print $1 ,"\n";
}
}
close $file;
You are using case sensitive matching in your if statement. Try adding a /i:
if($line =~ /warning ([a-z0-9]*):/i)
EDIT: I misread the actual question, so this answer could be ignored...
You need to use a hash (associative array), a mapping from warning string to occurrence count.
use strict;
use warnings;
my %warnings;
open (my $file, '<', 'Warnings.txt') or die $!;
while (my $line = <$file>) {
    if ($line =~ /warning ([a-zA-Z0-9]*):/) {
        ++$warnings{$1};
    }
}
close $file;
foreach my $w (keys %warnings) {
    print $w, ": ", $warnings{$w}, "\n";
}
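To get output in the shape the question asks for ("Warning GR145 14 warnings"), the counting hash can be printed with printf. A minimal sketch, assuming each warning line contains the word warning followed by a code and a colon; count_warnings and the sample log lines are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count how often each warning code appears in a list of lines.
# Assumes lines look like "... warning GR145: ..." as in the question's regex.
sub count_warnings {
    my %count;
    for my $line (@_) {
        $count{$1}++ if $line =~ /warning\s+([A-Za-z0-9]+):/i;
    }
    return %count;
}

# Hypothetical log lines standing in for Warnings.txt.
my @log = (
    "a.c(3): warning GR145: unused variable\n",
    "b.c(9): warning GT10: implicit cast\n",
    "c.c(1): warning GR145: unused variable\n",
);

my %count = count_warnings(@log);
printf "Warning %s %d warnings\n", $_, $count{$_} for sort keys %count;
```

Sorting the keys only makes the output order stable; the counts themselves do not depend on it.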