The script I have written outputs all lines from file 2 that start with a number that is in file 1.
Question
How do I output all the other lines that didn't match?
#!/usr/bin/perl
# Print every line of file "2" whose leading number appears in file "1".
# NOTE: the '@' sigils in the original paste were garbled to '#'; restored here.
use strict;
use warnings;
use Data::Dumper;

# Slurp the numbers from file "1" into @res.
my @res;
open(my $fh1, '<', "1") or die $!;
while (defined (my $line = <$fh1>)) {
    chomp $line;
    push @res, $line;
}
close $fh1;

# Scan file "2" and print each line whose first number is in @res.
open(my $fh2, '<', "2") or die $!;
while (defined (my $line = <$fh2>)) {
    chomp $line;
    # Bug fix: the original matched unconditionally and then tested
    # 'defined $1', but $1 persists from a previous successful match,
    # so a non-matching line could reuse a stale capture.
    if ($line =~ m/(\d+)/) {
        foreach my $num (@res) {
            if ($num == $1) {
                print $line . "\n";
            }
        }
    }
}
close $fh2;
File 1
155
156
157
158
159
160
File 2
150 a
151 f
152 r
153 a
154 a
155 a
156 a
157 f
158 f
159 f
Your answer is pretty close actually: it's enough to change this
# ('@res' was garbled to '#res' in the paste; restored.)
foreach my $a (@res) {
    if ($a == $1) {
        print $line . "\n";
    }
}
... to this ...
my $found;
foreach my $a (@res) {   # '@res' restored ('#res' was paste garbling)
    if ($a eq $1) { # we compare strings, not numbers, even if these strings are 'numeric'
        $found = 1;
        print $line . "\n";
        last; # no need to look further, we already found an item
    }
}
print "Not matched: $line", "\n" unless $found;
Yet still there's something to talk about. ) See, as all these number strings in the first file are unique, it's much better to use a hash for storing them. The code will actually not change that much:
# Sketch: replace the array scan with a hash for constant-time lookups.
my %digits;
... # in the first file processing loop:
$digits{$line} = 1;
... # in the second file processing loop, instead of foreach:
if ($digits{$1}) {
print $line, "\n";
} else {
print "Not matched: $line", "\n";
}
But the point is that searching in hash is MUCH faster than looping through an array again and again. )
# Print the lines of file "2" whose first number is NOT listed in file "1".
use strict;
use warnings;

# Build a lookup hash of the numbers in file "1".
my %res;
open(my $in1, '<', "1") or die $!;   # lexical handle instead of bareword FILE
while (defined (my $line = <$in1>)) {
    chomp $line;
    $res{$line} = 1;
}
close $in1;

# Report the non-matching lines of file "2" (lines keep their newline).
open(my $in2, '<', "2") or die $!;
while (defined (my $line = <$in2>)) {
    if ($line =~ m/(\d+)/) {
        print $line if not $res{$1};
    }
}
close $in2;
Related
The script is printing the amount of input lines, I want it to print the amount of input lines that are present in another file
#!/usr/bin/perl -w
# ('@todd' was garbled to '#todd' in the paste; restored. The grep logic
# bug this question is about is left as posted.)
open("file", "text.txt");   # 2-arg, unchecked open -- discussed in the answer
@todd = <file>;
close "file";
while(<>){
    if( grep( /^$_$/, @todd)){
    #if( grep @todd, /^$_$/){
        print $_;
    }
    print "\n";
}
if for example file contains
1
3
4
5
7
and the input file that will be read from contains
1
2
3
4
5
6
7
8
9
I would want it to print 1,3,4,5 and 7
but 1-9 are being printed instead
UPDATE******
This is my code now and I am getting this error
readline() on closed filehandle todd at ./may6test.pl line 3.
#!/usr/bin/perl -w
# ('@files' was garbled to '#files' in the paste; restored.)
open("todd", "<text.txt");
@files = <todd>; #file looking into
close "todd";
while( my $line = <> ){
    chomp $line;
    if ( grep( /^$line$/, @files) ) {
        print $_;
    }
    print "\n";
}
which makes no sense to me because I have this other script that is basically doing the same thing
#!/usr/bin/perl -w
# ('@file' was garbled to '#file' in the paste; restored.)
open("file", "<text2.txt"); #
@file = <file>; #file looking into
close "file"; #
while(<>){
    $temp = $_;
    $temp =~ tr/|/\t/; #puts tab between name and id
    my ($name, $number1, $number2) = split("\t", $temp);
    if ( grep( /^$number1$/, @file) ) {
        print $_;
    }
}
print "\n";
OK, the problem here is that grep sets $_ too. So grep { $_ } @array will always give you every element in the array.
At a basic level - you need to:
while ( my $line = <> ) {
    chomp $line;
    if ( grep { /^$line$/ } @todd ) {   # '@todd' restored from garbled '#todd'
        #do something
    }
}
But I'd suggest instead that you might want to consider building a hash of your lines instead:
# Build a set of the lines of text.txt (newlines included), then echo
# only the input lines that appear in that set.
open( my $input, '<', "text.txt" ) or die $!;
my %in_todd;
$in_todd{$_} = 1 for <$input>;
close $input;
while (<>) {
    print if $in_todd{$_};
}
Note - you might want to watch for trailing linefeeds.
Sorry if it seems obvious, but I'm pretty new at Perl and programming, and I've been working on this for over a week and can't get it done.
My idea is simple. I've got a .csv where I've got the names in the first column, a number from -1 to 1 in the second and a position on the third. Then another file where I have got the names (line starts with >) and the info with 80 characters per line.
What I want to do is keep the name lines of the first file and grab the 'position' given, from -20 to +60. But I cannot get it to work, and I've got to the point where I don't know how to proceed.
use strict; #read file line by line
use warnings;
my $outputfile = "Output1.txt";
my $filename = "InputP.txt";
my $inputfasta = "Inputfasta.txt";
open my $fh, '<', $filename or die "Couldn't open '$filename'";
open my $fh2, '>', $outputfile or die "Couldn't create '$outputfile'";
open my $fh3, '<', $inputfasta or die "Couldn't open '$inputfasta'";
my $Psequence = 0;
my $seqname = 0;
while (my $line = <$fh>) {
chomp $line;
my $length = index ($line, ",");
$seqname = substr ($line, 0, $length);
my $length2 = index ($line, ",", $length);
# NOTE(review): substr's 3rd argument is a LENGTH, but $length2 here is an
# end POSITION from index() -- the extracted field is likely wrong.
my $score = substr ($line, $length +1, $length2);
my $length3 = index ($line, ",", $length2);
my $position = substr ($line, $length2 +1, $length3); # same length-vs-position issue
#print $fh2 "$seqname"."\t"."$score"."\t"."$position"."\n"; }
my $Rlength2 = index ($score, ",");
my $Rscore = substr ($score, 0, $Rlength2);
#print "$Rscore"."\n";}
while (my $linea = <$fh3>){ #same order.
chomp $linea;
if ($linea=~/^>(.+)/) {
# NOTE(review): $fh3 was opened for READING; printing to it cannot work.
# Presumably $fh2 (the output file) was intended -- confirm.
print $fh3 "\n"."$linea"."\n"; }
else { $linea =~ /^\s*(.*)\s*$/;
chomp $linea;
print $fh3 "$linea". "\n"; }
}
# NOTE(review): $linea below is the loop variable of the while above and no
# longer exists here -- this is the scoping bug the answer explains.
if ($Rscore >= 0.5){
$Psequence = substr ($linea, -20, 81);
print "$seqname"."\n"."$Psequence";}
}
Please, learn to indent the code correctly. Then the error will be more obvious:
while (my $linea = <$fh3>){ #same order.
chomp $linea;
if ($linea =~ /^>(.+)/) {
# NOTE(review): $fh3 is the input handle; this print is kept from the
# original but presumably $fh2 (the output file) was meant -- confirm.
print $fh3 "\n$linea\n";
} else {
# Commented out as it does nothing.
# $linea =~ /^\s*(.*)\s*$/;
# chomp $linea;
print $fh3 "$linea\n";
}
}
# $linea is not in scope here -- see the explanation below.
if ($Rscore >= 0.5){
$Psequence = substr $linea, -20, 81;
print "$seqname\n$Psequence";
}
$linea exists only in the while loop, but you try to use it in the following paragraph, too. The variable disappears when the loop ends.
Create a hash from the CSV where the key is the name and the value is the position.
use Text::CSV_XS qw( );

# Map each name (column 0) to its position (column 2).
my %pos_by_name;
{
    open(my $fh, '<', $input_qfn)
        or die("Can't open $input_qfn: $!\n");

    my $csv = Text::CSV_XS->new({ auto_diag => 1, binary => 1 });
    # Bug fix: the parser was created as $csv but called as $csv_in,
    # which is never declared.
    while (my $row = $csv->getline($fh)) {
        $pos_by_name{ $row->[0] } = $row->[2];
    }
}
Then, it's just a question of extracting the names from the other file, and using the hash to find the associated position.
# Walk the FASTA-style file; for each ">name" header line, look up the
# position recorded for that name in %pos_by_name.
open(my $fh, '<', $fasta_qfn)
or die("Can't open $fasta_qfn: $!\n");
while (<$fh>) {
chomp;
my ($name) = /^>(.*)/
or next;    # skip non-header lines
my $pos = $pos_by_name{$name};
if (!defined($pos)) {
die("Can't find position for $name\n");
}
... Do something with $name and $pos ...
}
The following table represents a tab-delimited file that I have.
1 2 3
A Jack 01
A Mary 02
A Jack 03
B Mary 04
B Mike 05
B Mary 06
C Mike 07
C Mike 08
C Jack 09
I would like to parse this text file and create multiple text files based on columns 1 & 2. Each text file would contain data (column 3) where column 1&2 are the same. So in this example, the data would be organized as follows:
> file1.txt
A Jack 01
A Jack 03
> file2.txt
A Mary 02
> file3.txt
B Mary 04
B Mary 06
> file4.txt
B Mike 05
> file5.txt
C Mike 07
C Mike 08
> file6.txt
C Jack 09
What would be the best way to tackle this? The only method I can think of is to create a 2-dimensional array and then comparing every row/col pair.
edit: The following code seems to work. Is there a better alternative?
#!/usr/bin/perl
# ('@' sigils were garbled to '#' in the paste; restored throughout.)
use strict;
use warnings;
my $file = "/home/user/Desktop/a.txt";
my @array = ();
open(FILE, $file) || die "cannot open file";
while(my $line = <FILE>){
    my @row = split("\t", $line);
    my $two = $row[1]."\t".$row[2];
    push(@array, $two);
}
close FILE;
@array = uniq(@array);
my $counter = 0;
foreach my $x (@array){
    #unique tab
    open(SEC, $file) || die "cannot open 2nd time\n";
    while(my $line = <SEC>){
        if($line =~ /($x)/){
            my $output = "txt".$counter.".txt";
            print "hit!\n";
            open(OUT, '>>', $output) || die "cannot write out";
            print OUT $line."\n";
            close OUT;
        }
    }
    $counter++;
}
close SEC;

# Return the distinct elements of the argument list (order not preserved).
sub uniq {
    return keys %{{ map { $_ => 1 } @_ }};
}
I know how to sort it via the command line (sort -t: -k1,2 a.txt), but I'm wondering how to do it within Perl and write out the multiple files.
Concatenate the two first fields and use them as hash keys. Each hash key points to an array where you add all relevant lines:
#!/usr/bin/perl
# Group lines by the first two fields, then write each group to its own
# fileN.txt. ('@{' derefs were garbled to '#{' in the paste; restored.)
use strict;
use warnings;

open my $fh, "/home/johan/Desktop/tabdel.txt"
    or die $!;
<$fh>;    # Skip header

my $data = {};
while (my $line = <$fh>) {
    # match the fields
    next unless $line =~ /^(\S+)\s+(\S+)\s+\S+/;
    # append $line to the hash value, key is the concatenated two first fields:
    push @{ $data->{"$1 $2"}->{'lines'} }, "$line";
}

my $file_count = 0;
foreach my $key (sort keys %{$data}) {
    my $qfn = "file".(++$file_count).".txt";
    # Renamed from $fh to avoid shadowing the input handle above.
    open(my $out, '>', $qfn)
        or die $!;
    foreach my $line (@{ $data->{$key}->{'lines'} }) {
        print $out $line;
    }
}
Perhaps the following will be helpful:
use strict;
use warnings;

# One output filehandle per unique (col1, col2) pair, autovivified into
# the nested hash slot. ('my @a' was garbled to 'my #a'; restored.)
my %hash;
my $n = 1;
while (<>) {
    next if $. == 1;    # skip the header line
    my @a = split;
    if ( !exists $hash{ $a[0] }{ $a[1] } ) {
        open $hash{ $a[0] }{ $a[1] }, '>', 'file' . $n++ . '.txt' or die $!;
    }
    print { $hash{ $a[0] }{ $a[1] } } $_;
}
Usage: perl script.pl inFile.txt
Acknowledging ikegami's point about possibly running out of file handles in case your dataset is large, here's a option that collects results in a hash and then prints the results to files (the usage is the same as the above script):
use strict;
use warnings;

# Collect all third-column values per (col1 col2) key, then write each
# group out. ('@{' derefs were garbled to '#{' in the paste; restored.)
my ( %hash, %seen );
my $n = 0;
while (<>) {
    next if $. == 1;    # skip the header line
    my ( $key, $elem ) = /(.+\s+)(\d+)\Z/;   # key keeps its trailing whitespace
    push @{ $hash{$key} }, $elem;
}
for my $key ( sort { $hash{$a}->[0] <=> $hash{$b}->[0] } keys %hash ) {
    $n++ if !$seen{$key}++;
    open my $fh, '>', 'file' . $n . '.txt' or die $!;
    print $fh "$key$_\n" for @{ $hash{$key} };
}
I have written the following program with the hope of getting success. But I could never get it.
# ('@values' was garbled to '#values' in the paste; restored.)
my $fileName = 'myfile.txt';
print $fileName,"\n";
if (open MYFILE, "<", $fileName) {
    my $Data;
    {
        local $/ = undef;   # slurp the whole file
        $Data = <MYFILE>;
    }
    my @values = split('\n', $Data);
    chomp(@values);
    if($values[2] eq '9999999999') {
        print "Success"."\n";
    }
}
The content of myfile.txt is
160002
something
9999999999
700021
Try splitting by \s*[\r\n]+
# Split on runs of CR/LF (with optional trailing spaces) so CRLF files
# work too. ('@values' was garbled to '#values'; restored.)
my $fileName = 'myfile.txt';
print $fileName,"\n";
if (open MYFILE, "<", $fileName) {
    my $Data;
    {
        local $/ = undef;   # slurp the whole file
        $Data = <MYFILE>;
    }
    my @values = split(/\s*[\r\n]+/, $Data);
    if($values[2] eq '9999999999') {
        print "Success";
    }
}
If myfile.txt contain carriage return (CR, \r), it will not work as expected.
Another possible cause is trailing spaces before linefeed (LF, \n).
You don't need to read an entire file into an array to check one line. Open the file, skip the lines you don't care about, then play with the line you do care about. When you've done what you need to do, stop reading the file. This way, only one line is ever in memory:
# Check only the third line without slurping the whole file.
my $fileName = 'myfile.txt';
open MYFILE, "<", $fileName or die "$fileName: $!";   # bug fix: was "$filename" (wrong case)
while( <MYFILE> ) {
    next if $. < 3; # $. is the line number
    last if $. > 3;
    chomp;
    print "Success\n" if $_ eq '9999999999';
}
close MYFILE;
my $fileName = 'myfile.txt';
# Bug fix: '|| die' bound to $fileName (always true), so a failed open was
# never reported; the low-precedence 'or' is required after list-form open.
open MYFILE, "<", $fileName or die "$fileName: $!";
while( $rec = <MYFILE> ) {
    for ($rec) { chomp; s/\r//; s/^\s+//; s/\s+$//; } #Remove line-feed and space characters
    $cnt++;
    if ( $rec =~ /^9+$/ ) { print "Success\n"; last; } #if record matches "9"s only
    #print "Success" and leave the loop
}
close MYFILE;
#Or you can write: if ($cnt==3 and $rec =~ /^9{10}$/) { print "Success\n"; last; }
#If record 3 matches ten "9"s, print "Success" and leave the loop.
This gives the whole line:
#!/usr/bin/perl
# Print every line of output.txt that contains an _NN tag.
my $file = 'output.txt';
open(my $txt, '<', $file) or die "Can't open $file: $!";   # was 2-arg, unchecked, bareword
while(my $line = <$txt>) {
    print "$line" if $line =~ /_NN/;
}
close($txt);
#!/usr/bin/perl
# Print each _NN-tagged word (without the tag), one per line, UTF-8 aware.
use strict;
use warnings FATAL => "all";

binmode(STDOUT, ":utf8") || die;
my $file = "output.txt";
open(my $text, "< :utf8", $file) || die "Can't open $file: $!";   # lexical handle, was bareword TEXT
while(<$text>) {
    print "$1\n" while /(\w+)_NN\b/g;
}
close($text) || die "Can't close $file: $!";
Your answer script reads a bit awkwardly, and has a couple of potential errors. I'd rewrite the main logic loop like so:
# For each sentence containing an expend_VB token: print its _NN-tagged
# words (tag stripped), then the sentence itself if any nouns were found.
foreach my $sentence (grep { /expend_VB/ } @sentences) {
    my @tagged_nouns = grep { /_NN/ } split /\s+/, $sentence;
    s/_NN// for @tagged_nouns;
    print "$_\n" for @tagged_nouns;
    print "$sentence\n" if @tagged_nouns;
}
You need to put the my declaration inside the loop - otherwise it will persist longer than you want it to, and could conceivably cause problems later.
foreach is a more common perl idiom for iterating over a list.
print "$1" if $line =~ /(\S+)_NN/;   # first _NN-tagged word on the line, tag excluded
#!/usr/bin/perl
# ('@sentences'/'@words' sigils were garbled to '#' in the paste; restored.)
use strict;
use warnings FATAL => "all";

my $search_key = "expend"; ## CHANGE "..." to <>

open(my $tag_corpus, '<', "ch13tagged.txt") or die $!;
my @sentences = <$tag_corpus>; # This breaks up each line into list
my @words;
for (my $i=0; $i < @sentences; $i++) {   # bug fix: '<=' indexed one past the end
    if ( defined( $sentences[$i] ) and $sentences[$i] =~ /($search_key)_VB.*/i) {
        @words = split /\s/,$sentences[$i]; ## \s is a whitespace
        for (my $j=0; $j < @words; $j++) {   # bug fix: '<=' indexed one past the end
            #FILTER if word is noun:
            if ( defined( $words[$j] ) and $words[$j] =~ /_NN/) {
                #PRINT word and sentence:
                print "**",split(/_\S+/,$words[$j]),"**", "\n";
                print split(/_\S+/,$sentences[$i]), "\n"
            }
        } ## put print sentences here to print each sentence after all the nouns inside
    }
}
# Bug fix: 'close $fh || die' parses as close($fh || die) and never dies;
# the low-precedence 'or' is required.
close $tag_corpus or die "Can't close the corpus file: $!";