How to check whether one file's values are contained in another text file? (Perl script) - perl

I would like to check whether values from one file are contained in another file. If a value is present, the script should report that there is an existing bin for that specific DID; if not, it should report that there is no existing bin limit. The problem is that I am not sure how to check all the values at once.
The first text file, DID1, contains:
L84A:D:O:M:
L84C:B:E:D:
The second text file, DID, contains:
L84A:B:E:Q:X:F:i:M:Y:
L84C:B:E:Q:X:F:i:M:Y:
L83A:B:E:Q:X:F:i:M:Y:
If the first four characters match, I need to check all the values on that line.
For example, the L84A line in the first text file and in the second text file both have M, so it should print out that there is an existing M bin.
Below is my code:
use strict;
use warnings;

my $filename  = 'DID.txt';
my $filename1 = 'DID1.txt';
my $count     = 0;

open( FILE2, "<$filename1" )
    or die("Could not open log file. $!\n");

while (<FILE2>) {
    my ($number) = $_;
    chomp($number);
    my @values1 = split( ':', $number );

    open( FILE, "<$filename" )
        or die("Could not open log file. $!\n");

    while (<FILE>) {
        my ($line) = $_;
        chomp($line);
        my @values = split( ':', $line );

        foreach my $val (@values) {
            if ( $val =~ /$values1[0]/ ) {
                $count++;
                if ( $values[$count] =~ /$values1[$count]/ ) {
                    print
                        "Yes ,There is an existing bin & DID\n @values1\n";
                }
                else {
                    print "No, There is an existing bin & DID\n";
                }
            }
        }
    }
}
I cannot check all the values. Please give me any advice on this, since this is my first time learning the Perl language. Thanks a lot :)

Based on my understanding, I wrote this code:
use strict;
use warnings;
#use ReadWrite;
use Array::Utils qw(:all);
use vars qw($my1file $myfile1cnt $my2file $myfile2cnt @output);
$my1file = "did1.txt"; $my2file = "did2.txt";
We are going to read both first and second files (DID1 and DID2).
readFileinString($my1file, \$myfile1cnt); readFileinString($my2file, \$myfile2cnt);
In the first file, as per the OP's request, the first four characters should be matched against the second file; if they match, we need to check the rest of the characters in the first file against the second one.
while($myfile1cnt=~m/^((\w){4})\:([^\n]+)$/mig)
{
print "<LineStart>";
my $lineChk = $1; my $full_Line = $3; #print ": $full_Line\n";
my @First_values = split /\:/, $full_Line; #print join "\n", @First_values;
If the first four characters matched, then:
if($myfile2cnt=~m/^$lineChk\:([^\n]+)$/m)
{
Store the rest of the line, split it on colons, and get the characters to be matched against the first file's contents.
my $FullLine = $1; my @second_values = split /:/, $FullLine;
Then search for each letter of the first file's content in the matching line of the second file...
foreach my $sngletter(@First_values)
{
If a letter appears in both the first and second file, it is printed.
if( grep {$_ eq "$sngletter"} @second_values)
{
print "Matched: $sngletter\t";
}
}
}
else { print "Not Matched..."; }
This just marks the end of the line.
print "<LineEnd>\n"
}
#------------------>Reading a file
sub readFileinString
#------------------>
{
my $File = shift;
my $string = shift;
use File::Basename;
my $filenames = basename($File);
open(FILE1, "<$File") or die "\nFailed Reading File: [$File]\n\tReason: $!";
read(FILE1, $$string, -s $File, 0);
close(FILE1);
}

Read the search pattern and the data into hashes (the first field is the key), then go through the data and select only the fields included in the pattern for that key.
use strict;
use warnings;
use feature 'say';

my $input1 = 'DID1.txt';    # look for key,pattern(array)
my $input2 = 'DID.txt';     # data - key,elements(array)

my $pattern;
my $data;
my %result;

$pattern = file2hash($input1);    # read pattern into hash
$data    = file2hash($input2);    # read data into hash

while( my($k,$v) = each %{$data} ) {            # walk through data
    next unless defined $pattern->{$k};         # skip those which are not in pattern hash
    my $find  = join '|', @{ $pattern->{$k} };  # form search pattern for grep
    my @found = grep {/$find/} @{ $v };         # extract only those of interest
    $result{$k} = \@found;                      # store in result hash
}

while( my($k,$v) = each %result ) {             # walk through result hash
    say "$k has " . join ':', @{ $v };          # output final result
}

sub file2hash {
    my $filename = shift;
    my %hash;
    my $fh;
    open $fh, '<', $filename
        or die "Couldn't open $filename";
    while(<$fh>) {
        chomp;
        next if /^\s*$/;    # skip empty lines
        my($key,@data) = split ':';
        $hash{$key} = \@data;
    }
    close $fh;
    return \%hash;
}
Output
L84C has B:E
L84A has M

Related

How to replace specific lines instead of appending to the file

I have two files: one is the user's input file and the other is the original config file. After comparing the two files, I perform add/delete operations on my original config file.
User's input file (shown line by line):
add:L28A:Z:W #add--> DID ID --> Bin ID
del:L28C:B:Q:X:
rpl:L38A:B:M:D:
Original input file:
L28A:B:Q:M:X:
L28C:B:Q:M:X:
L38A:B:Q:M:X:
Based on the user's input file, the first line is an add operation, the second a delete, and the third a replace.
So the output for the original input txt file should show:
L28A:B:Q:M:X:Z:W
L28C:M:
L38A:B:M:D:
But my code is showing:
L28A:B:Q:M:X:
L28C:B:Q:M:X:
L38A:B:Q:M:X:
L28A:B:Q:M:X:Z:W
L28C:M:
L38A:B:M:D:
How can I replace the above three lines with the new modified lines?
use strict;
use warnings;
use File::Copy;
use vars qw($requestfile $requestcnt $configfile $config2cnt $my3file $myfile3cnt $new_file $new_filecnt @output);
my $requestfile = "DID1.txt"; #user's input file
my $configfile = "DID.txt"; #original config file
my $new_file = "newDID.txt";
readFileinString($requestfile, \$requestcnt);
readFileinString($configfile, \$config2cnt);
copy($configfile, $new_file) or die "The copy operation failed: $!";
while ($requestcnt =~ m/^((\w){3})\:([^\n]+)$/mig) #Each line from user request
{
my $action = $1;
my $requestFullLine = $3;
while ($requestFullLine =~ m/^((\w){4})\:([^\n]+)$/mig) #Each line from user request
{
my $DID = $1; #DID
my $requestBinList = $3; #Bin List in user request
#my @First_values = split /\:/, $requestBinList;
if ($config2cnt =~ m/^$DID\:([^\n]+)$/m) #configfile
{
my $ConfigFullLine = $1; #Bin list in config
my $testfile = $1;
my @First_values = split /\:/, $ConfigFullLine;
my @second_values = split /\:/, $requestBinList;
foreach my $sngletter(@second_values) # Each line from user request
{
if( grep {$_ eq "$sngletter"} @First_values)
{
print " $DID - $sngletter - Existing bin..\n\n";
}
else
{
print "$DID - $sngletter - Not existing bin..\n\n";
}
}
print "Choose option 1.Yes 2.No\n";
my $option = <STDIN>;
if ($option == 1) {
open(DES,'>>',$configfile) or die $!;
if($action eq 'add')
{
$ConfigFullLine =~ s/$/$requestBinList/g;
my $add = "$DID:$ConfigFullLine";
print DES "$add\n" ;
print"New Added Bin Valu $add\n\n";
}
if ( $action eq 'del')
{
foreach my $sngletter(@second_values){
$ConfigFullLine =~ s/$sngletter://g;
}
print DES "$DID:$ConfigFullLine\n";
print "New Deleted Bin Value $DID:$ConfigFullLine\n\n";
}
if ( $action eq 'rpl')
{
my $ConfigFullLine = $requestBinList;
my $replace = "$DID:$ConfigFullLine";
print DES "$replace\n";
print"Replace Bin Value $replace\n\n";
}
}
elsif ($option == 2)
{
print"Start from begining\n";
}
else
{
print "user chose invalid process or input is wrong\n";
}
}
else
{
print "New DID $DID detected\n";}
}
}
sub readFileinString
{
my $File = shift;
my $string = shift;
use File::Basename;
my $filenames = basename($File);
open(FILE1, "<$File") or die "\nFailed Reading File: [$File]\n\tReason: $!";
read(FILE1, $$string, -s $File, 0);
close(FILE1);
}
The problem is here:
open(DES,'>>',$configfile) or die $!;
You open your file for appending. So you get the original data, followed by your edited data.
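One way to fix that (a minimal sketch only; the add/del/rpl handling from the question is elided and the temp-file name is made up) is to write every line, edited or not, to a temporary file and then rename it over the original:

use strict;
use warnings;

my $configfile = 'DID.txt';          # original config file (name taken from the question)
my $tempfile   = "$configfile.tmp";  # hypothetical scratch file

open my $in,  '<', $configfile or die "Cannot read '$configfile': $!";
open my $out, '>', $tempfile   or die "Cannot write '$tempfile': $!";

while (my $line = <$in>) {
    # ... apply the add/del/rpl changes to $line here ...
    print {$out} $line;
}

close $in;
close $out;

# Replace the original file with the edited copy.
rename $tempfile, $configfile or die "Cannot rename '$tempfile': $!";

That way the output contains only the edited data, not the original lines followed by the edited ones.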
Update: It appears that you have a working solution now, but I thought it might be interesting to show you how I would write this.
This program is a Unix filter. That is, it reads from STDIN and writes to STDOUT. I find that far more flexible than hard-coded filenames. You also don't have to explicitly open files - which saves time :-)
It also takes a command-line option, -c, telling it which file contains the edit definitions. So it is called like this (assuming we've called the program edit_files):
$ edit_files -c edit_definitions.txt < your_input_file > your_output_file
And here's the code.
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;
my %opts;
getopts('c:', \%opts);
my %edits = read_edits($opts{c});
while (<>) {
chomp;
my ($key, $val) = split /:/, $_, 2; #/ stop faulty syntax highlight
if (!exists $edits{$key}) {
print "$_\n";
next;
}
my $edit = $edits{$key};
if ($edit->[0] eq 'add') {
print "$_$edit->[1]\n";
} elsif ($edit->[0] eq 'del') {
$val =~ s/$_:// for split /:/, $edit->[1]; #/
print "$key:$val\n";
} elsif ($edit->[0] eq 'rpl') {
print "$key:$edit->[1]\n";
} else {
warn "$edit->[0] is an invalid edit type\n";
next;
}
}
sub read_edits {
my $file = shift;
open my $edit_fh, '<', $file or die $!;
my %edits;
while (<$edit_fh>) {
chomp;
# Remove comments
s/\s*#.*//; #/
my ($type, $key, $val) = split /:/, $_, 3; #/
$edits{$key} = [ $type, $val ];
}
return %edits;
}

Parsing string in multiline data with positive lookbehind

I am trying to parse data like:
header1
-------
var1 0
var2 5
var3 9
var6 1
header2
-------
var1 -3
var3 5
var5 0
Now I want to get, e.g., var3 for header2. What's the best way to do this?
So far I was parsing my files line-by-line via
open(FILE,"< $file");
while (my $line = <FILE>){
# do stuff
}
but I guess it's not possible to handle multiline parsing properly.
Now I am thinking of parsing the file all at once, but I haven't been successful so far...
my @Input;
open(FILE,"< $file");
while (<FILE>){ @Input = <FILE>; }
if (@Input =~ /header2/){
#...
}
The easier way to handle this is "paragraph mode".
local $/ = "";
while (<>) {
my ($header, $body) =~ /^([^\n]*)\n-+\n(.*)/s
or die("Bad data");
my #data = map [ split ], split /\n/, $body;
# ... Do something with $header and #data ...
}
The same can be achieved without messing with $/ as follows:
my @buf;
while (1) {
    my $line = <>;
    $line =~ s/\s+\z// if defined($line);
    if (!defined($line) || !length($line)) {
        if (@buf) {
            my $header = shift(@buf);
            shift(@buf);
            my @data = map [ split ], splice(@buf);
            # ... Do something with $header and @data ...
        }
        last if !defined($line);
        next;
    }
    push @buf, $line;
}
(In fact, the second snippet includes a couple of small improvements over the first.)
Quick comments on your attempt:
The while loop is useless because @Input = <FILE> places the remaining lines of the file in @Input.
@Input =~ /header2/ matches header2 against the stringification of the array, which is the stringification of the number of elements in @Input. If you want to check whether an element of @Input contains header2, you will need to loop over the elements of @Input and check them individually.
while (<FILE>){ @Input = <FILE>; }
This doesn't make much sense. "While you can read a record from FILE, read all of the data on FILE into @Input". I think what you actually want is just:
my @Input = <FILE>;
if (@Input =~ /header2/){
This is quite strange too. The binding operator (=~) expects scalar operands, so it evaluates both operands in scalar context. That means @Input will be evaluated as the number of elements in @Input. That's an integer and will never match "header2".
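If @Input already holds the lines, a short sketch of that check using grep might look like this:

# grep returns the matching elements (a count in boolean context),
# so this is true if any line contains "header2".
if ( grep { /header2/ } @Input ) {
    print "header2 found\n";
}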
A couple of approaches. Firstly a regex approach.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my $file = 'file';
open my $fh, '<', $file or die $!;
my $data = join '', <$fh>;
if ($data =~ /header2.+var3 (.+?)\n/s) {
say $1;
} else {
say 'Not found';
}
The key to this is the /s on the m// operator. Without it, the two dots in the regex won't match newlines.
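A tiny illustration of the difference, using a made-up two-line string:

my $text = "header2\nvar3 5\n";
print "with /s: matched\n"    if $text =~ /header2.+var3/s;  # . can cross the newline
print "without /s: matched\n" if $text =~ /header2.+var3/;   # never prints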
The other approach is more of a line by line parser.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my $file = 'file';
open my $fh, '<', $file or die $!;
my $section = '';
while (<$fh>) {
chomp;
# if the line is all word characters,
# then we've got a section header.
if ($_ !~ /\W/) {
$section = $_;
next;
}
my ($key, $val) = split;
if ($section eq 'header2' and $key eq 'var3') {
say $val;
last;
}
}
We read the file a line at a time and make a note of the section headers. For data lines, we split on whitespace and check to see if we're in the right section and have the right key.
In both cases, I've switched to using a more standard approach (lexical filehandles, 3-arg open(), or die $!) for opening the file.
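For comparison, the two styles look like this ($file is just a placeholder for whatever filename you use):

# Old style: global bareword filehandle, 2-arg open, errors silently ignored
open(FILE, "< $file");

# Preferred style: lexical filehandle, 3-arg open, failure reported with $!
open my $fh, '<', $file or die "Cannot open '$file': $!";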

Reading and comparing lines in Perl

I am having trouble getting my Perl script to work. The issue might be related to the reading of the extract file line by line within the while loop; any help would be appreciated. There are two files:
A bad file that contains a list of bad IDs (hundreds of IDs):
2
3
An extract file that contains delimited data with the ID in field 1 (millions of rows):
1|data|data|data
2|data|data|data
2|data|data|data
2|data|data|data
3|data|data|data
4|data|data|data
5|data|data|data
I am trying to remove all the rows from the large extract file where the IDs match. There can be multiple rows where the ID matches. The extract is sorted.
#use strict;
#use warnings;

$SourceFile   = $ARGV[0];
$ToRemove     = $ARGV[1];
$FieldNum     = $ARGV[2];
$NewFile      = $ARGV[3];
$LargeRecords = $ARGV[4];

open(INFILE, $SourceFile) or die "Can't open source file: $SourceFile \n";
open(REMOVE, $ToRemove) or die "Can't open toRemove file: $ToRemove \n";
open(OutGood, "> $NewFile") or die "Can't open good output file \n";
open(OutLarge, "> $LargeRecords") or die "Can't open Large Records output file \n";

#Read in the list of bad IDs into array
@array = <REMOVE>;

#Loop through each bad record
foreach (@array)
{
    $badID = $_;
    #read the extract line by line
    while(<INFILE>)
    {
        #take the line and split it into fields
        @fields = split /\|/, $_;
        my $extractID = $fields[$FieldNum];
        #print "Here's what we got: $badID and $extractID\n";
        while($extractID == $badID)
        {
            #Write out bad large records
            print OutLarge join '|', @fields;
            #Get the next line in the extract file
            @fields = split /\|/, <INFILE>;
            my $extractID = $fields[$FieldNum];
            $found = 1; #true
            #print " We got a match!!";
            #remove item after it has been found
            my $input_remove = $badID;
            @array = grep {!/$input_remove/} @array;
        }
        print OutGood join '|', @fields;
    }
}
Try this:
$ perl -F'\|' -nae 'BEGIN {while(<>){chomp; $bad{$_}++;last if eof;}} print unless $bad{$F[0]};' bad good
First, you are lucky: The number of bad IDs is small. That means, you can read the list of bad IDs once, stick them in a hash table without running into any difficulty with memory usage. Once you have them in a hash, you just read the big data file line by line, skipping output for bad IDs.
#!/usr/bin/env perl
use strict;
use warnings;
# hardwired for convenience
my $bad_id_file = 'bad.txt';
my $data_file = 'data.txt';
my $bad_ids = read_bad_ids($bad_id_file);
remove_data_with_bad_ids($data_file, $bad_ids);
sub remove_data_with_bad_ids {
my $file = shift;
my $bad = shift;
open my $in, '<', $file
or die "Cannot open '$file': $!";
while (my $line = <$in>) {
if (my ($id) = extract_id(\$line)) {
exists $bad->{ $id } or print $line;
}
}
close $in
or die "Cannot close '$file': $!";
return;
}
sub read_bad_ids {
my $file = shift;
open my $in, '<', $file
or die "Cannot open '$file': $!";
my %bad;
while (my $line = <$in>) {
if (my ($id) = extract_id(\$line)) {
$bad{ $id } = undef;
}
}
close $in
or die "Cannot close '$file': $!";
return \%bad;
}
sub extract_id {
my $string_ref = shift;
if (my ($id) = ($$string_ref =~ m{\A ([0-9]+) }x)) {
return $id;
}
return;
}
I'd use a hash as follows:
use warnings;
use strict;
my @bad = qw(2 3);
my %bad;
$bad{$_} = 1 foreach @bad;

my @file = qw (1|data|data|data 2|data|data|data 2|data|data|data 2|data|data|data 3|data|data|data 4|data|data|data 5|data|data|data);

my %hash;
foreach (@file){
    my @split = split(/\|/);
    $hash{$split[0]} = $_;
}

foreach (sort keys %hash){
    print "$hash{$_}\n" unless exists $bad{$_};
}
Which gives:
   
1|data|data|data
4|data|data|data
5|data|data|data

How to write my results to an external file in Perl

I am trying to read some particular columns from my data into my output file. I succeeded in reading one column at a time, but I want to read several columns of interest at once (I have the list of columns I want to extract in a separate text file), because extracting individual columns and joining them into one separate file would become hectic for me. Here is the code I tried to extract a single column:
#!/usr/bin/perl
use strict;
use warnings;
open (DATA, "<file.txt") or die ("Unable to open file");
my $search_string = "IADC512444";
my $header = <DATA>;
my @header_titles = split /\t/, $header;
my $extract_col = 0;
for my $header_line (@header_titles) {
last if $header_line =~ m/$search_string/;
$extract_col++;
}
print "Extracting column $extract_col\n";
while ( my $row = <DATA> ) {
last unless $row =~ /\S/;
chomp $row;
my @cells = split /\t/, $row;
print "$cells[$extract_col] ";
}
Is there any possibility to extract all the columns I want at once, instead of only IADC512444, from my text file into an output file on my hard disk? Please help me solve this problem.
Thanks
If you need to print the contents to a file on disk, then you should open a file in write mode and write to it. Also, if you want more columns you can get them by accessing the corresponding elements in the array @cells. In this example I am printing the column you are printing plus columns 1 and 2.
open(OUT_FILE,">path_to_out_file") || die "cant open file...";
while ( my $row = <DATA> ) {
last unless $row =~ /\S/;
chomp $row;
my @cells = split /\t/, $row;
#print "$cells[$extract_col] ";
print OUT_FILE "$cells[$extract_col],$cells[1],$cells[2]\n";
}
close(OUT_FILE);
I have tweaked the code a little bit to suit your requirement.
In the variable req_hdr_string you should give the column names you require, separated by commas.
They will be split and stored in a hash.
Then from the header I get the positions of those columns and print only them.
#!/usr/bin/perl
use strict;
use warnings;
open (DATA, "<h11.txt") or die ("Unable to open file");
my $req_hdr_string = "abc,ghi,mno,";
my %req_hdrs = ();
my %extract_col = ();
foreach(split /,/, $req_hdr_string)
{
print "req hdr is:$_\n";
$req_hdrs{$_} = $_;
}
my $index = 0;
my $header = <DATA>;
chomp $header;
foreach (split /\t/, $header)
{
print "input is:|$_|\n";
if(exists $req_hdrs{$_})
{
print "\treq index is:$index\n";
$extract_col{$index} = 1;
}
$index++;
}
open(OUT_FILE,">out_file") || die "cant open file...";
while ( my $row = <DATA> )
{
last unless $row =~ /\S/;
chomp $row;
my @cells = split /\t/, $row;
foreach $index (sort { $a <=> $b } keys %extract_col)
{
print OUT_FILE "$cells[$index],";
}
print OUT_FILE "\n";
}
close(OUT_FILE);
close(DATA);

Reading Data from a file in Perl

I have a file abc.txt that has data of the form
sHost = "Arun";
sUid ="Abc";
I want to get Arun for sHost and so forth using Perl. My code:
my $filename = "abc.txt";
use strict;
use warnings;
open(my $fh, '<:encoding(UTF-8)', $filename)
or die "Could not open file '$filename' $!";
while (my $row = <$fh>)
{
chomp $row;
if ($row=~m/sHost/)
{
print $row;
}
}
The output I am getting is sHost = Arun;
But I want only 'Arun'. What logic should I apply here? I am very new to Perl and Linux.
After the chomp, alter the code to this and the variable $host will contain the value:
if ($row=~m/sHost = "(.*)"/) {
    my $host = $1;
}
In simple terms the ( ) section is given to $1 if there is a match. See man perlre for the details
To generalise this to read any key and any value do something like this
my %value;
while (my $row = <$fh>) {
    if ($row =~ /^(\w+) = "([^"]+)"/) {
        $value{$1} = $2;
    }
}
Then $value{'sHost'} will be "Arun" etc
For universal config file parsing you can use following piece of code:
my %config;
if ($row =~ m/^\s*(["'`])?(\S+)\1?\s*=\s*(["'`])?(\S+?)\3?;?$/) {
my $key = $2;
my $value = $4;
$config{$key} = $value;
}
This regexp lets you process key-value lines where the key and value are plain or surrounded by different quote types (" ' `, but you can add your own symbols if you like), with leading and/or trailing whitespace; the semicolon is not obligatory. Also, you can change (\S+) according to your requirements for the possible key/value characters (\S means everything except whitespace).
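Putting that regexp into the OP's read loop, a rough sketch (the final print is only there to show the result) could look like this:

use strict;
use warnings;

my $filename = "abc.txt";
my %config;

open(my $fh, '<:encoding(UTF-8)', $filename)
    or die "Could not open file '$filename' $!";

while (my $row = <$fh>) {
    chomp $row;
    if ($row =~ m/^\s*(["'`])?(\S+)\1?\s*=\s*(["'`])?(\S+?)\3?;?$/) {
        $config{$2} = $4;    # key => value, with quotes and semicolon stripped
    }
}
close $fh;

print "$_ = $config{$_}\n" for sort keys %config;    # e.g. sHost = Arun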
use m/.*=\s*([^\s]*)/g instead of m/sHost/
use print $1 instead of print $row
Replace
if ($row=~m/sHost/)
with
if ($row=~s/sHost//)