Compare and replace two files in Perl

I am trying to compare strings in two documents, test1 and test2.
test1:
<p><imagedata rid="rId7"></p>
...
<p><imagedata rid="rId8"></p>
test2:
<imagesource Id="rId7" Target="image/image1.jpg"/>
...
<imagesource Id="rId9" Target="image/image2.jpg"/>
...
<imagesource Id="rId8" Target="image/image3.jpg"/>
What I want is for each rid in the first file to be replaced with the matching image target path, like this:
<p><imagedata src="image/image1.jpg"></p>
...
<p><imagedata src="image/image3.jpg"></p>
I tried to extract the text from both files, but I am stuck comparing the two sets of strings:
opendir(DIR, $filenamenew1);
our (@test1, @test2);
open fhr,  "$filenamenew1/test1.txt";
open fhr1, "$filenamenew1/test2.txt";
my @line;
@line = <fhr>;
for (my $i = 0; $i <= $#line; $i++)
{
    if ($line[$i] =~ m/rid="(rId[0-9])"/)
    {
        my $k = $1;
        push(@test1, "$k");
    }
}
my @file2;
@file2 = <fhr1>;
for (my $i = 0; $i <= $#file2; $i++)
{
    if ($file2[$i] =~ m/Id="(rId[0-9])"/)
    {
        my $k1 = $1;
        push(@test2, "$k1");
        foreach (@test1 = @test2)
        {
            print "equal";
        }
    }
}

One solution is to read the file with the <imagesource> tags first and save each rid and its target in a hash. After that, read the other file line by line, check whether the rid exists in the hash, and do the substitution, something like:
Content of script.pl:
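#!/usr/bin/env perl
use warnings;
use strict;

my %hash;

# First argument: the file with the <imagesource> tags; second: the file to rewrite.
open my $fh2, '<', shift or die "Cannot open image source file: $!";
open my $fh1, '<', shift or die "Cannot open input file: $!";

# Remember every rId => Target pair.
while ( <$fh2> ) {
    chomp;
    if ( m/Id="(rId\d+)".*Target="([^"]*)"/i ) {
        $hash{ $1 } = $2;
    }
}

# Replace rid="rIdN" with src="<target>" when the id is known.
while ( <$fh1> ) {
    if ( m/rId="([^"]+)"/i && defined $hash{ $1 } ) {
        s/rId="([^"]+)"/src="$hash{ $1 }"/i;
    }
    print $_;
}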
Run it like:
perl script.pl test2 test1
That yields:
<p><imagedata src="image/image1.jpg"></p>
...
<p><imagedata src="image/image3.jpg"></p>
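If one line of test1 can hold several imagedata tags, one substitution per line is not enough. A minimal sketch of the same hash-lookup idea, assuming the attribute layout shown in the question: s///ge rewrites every rid="..." on a line and leaves ids with no matching Target untouched. The sub names build_map and rewrite_line are made up for this example, and the in-memory filehandle only stands in for test2.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Build the rId => Target map from the <imagesource> lines of a filehandle.
sub build_map {
    my ($fh) = @_;
    my %target;
    while (my $line = <$fh>) {
        # /g in a while loop: pick up every Id/Target pair on the line
        while ($line =~ /Id="(rId\d+)"[^>]*?Target="([^"]*)"/gi) {
            $target{$1} = $2;
        }
    }
    return \%target;
}

# Rewrite every rid="..." attribute that has a known target.
sub rewrite_line {
    my ($line, $target) = @_;
    # /g: all occurrences on the line; /e: evaluate the replacement as code,
    # so an unknown id is put back exactly as it was ($1)
    $line =~ s{(rid="([^"]+)")}{exists $target->{$2} ? qq(src="$target->{$2}") : $1}ge;
    return $line;
}

my $sources = <<'END';
<imagesource Id="rId7" Target="image/image1.jpg"/>
<imagesource Id="rId9" Target="image/image2.jpg"/>
<imagesource Id="rId8" Target="image/image3.jpg"/>
END

open my $src_fh, '<', \$sources or die $!;   # in-memory file, for the demo only
my $map = build_map($src_fh);
print rewrite_line('<p><imagedata rid="rId7"></p><p><imagedata rid="rId8"></p>', $map), "\n";
```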

Related

Search a list of files for keywords

I have an array of files. I need to read each file and search it for a list of keywords stored in keywords.txt.
My keywords.txt contains the following:
AES
3DES
MD5
DES
SHA-1
SHA-256
SHA-512
10.*
http://
www.
#john.com
john.com
and I expect output like this:
file jack.txt contains AES:5 (5 is the line number) http://:55
file new.txt contains 3DES:75 http://:105
Here is my code:
use warnings;
use strict;

open STDOUT, '>>', "my_stdout_file.txt";
my $filename = $ARGV[2];
chomp ($filename);
open my $fh, q[<], shift or die $!;
my %keyword = map { chomp; $_ => 1 } <$fh>;
print "$fh\n";
while ( <> ) {
    chomp;
    my @words = split;
    for ( my $i = 0; $i <= $#words; $i++ ) {
        if ( $keyword{ $words[ $i ] } ) {
            print "Keyword Found for file:$filename\n";
            printf qq[$filename Line: %4d\tWord position: %4d\tKeyword: %s\n],
                $., $i, $words[ $i ];
        }
    }
}
But the problem is that it treats all the arguments as input files and also tries to open $ARGV[2]. I only need to open $ARGV[0] and $ARGV[1]; $ARGV[2] is only used in the output.
I'd appreciate any responses.
Thanks.
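The reason is that <> opens every name still left in @ARGV. A minimal sketch, assuming the argument order is keywords file, then the input files, then the output label (the question keeps the label in $ARGV[2]; putting it last is an assumption made here so any number of input files works): take the keyword file and the label out of @ARGV before the while (<>) loop runs. $ARGV holds the name of the file <> is currently reading, and close ARGV if eof resets $. for each file. The helper split_args is hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Assumed usage (hypothetical): perl search.pl keywords.txt file1.txt file2.txt label
sub split_args {
    my @args         = @_;
    my $keyword_file = shift @args;   # first argument: keyword list
    my $label        = pop @args;     # last argument: only used in the output
    return ($keyword_file, $label, @args);
}

if (@ARGV >= 3) {
    my ($keyword_file, $label, @inputs) = split_args(@ARGV);
    @ARGV = @inputs;                  # from here on, <> reads only the input files

    open my $kw_fh, '<', $keyword_file or die "Cannot open $keyword_file: $!";
    chomp(my @keywords = <$kw_fh>);
    close $kw_fh;
    my %keyword = map { $_ => 1 } @keywords;

    while (my $line = <>) {
        my @words = split ' ', $line;
        for my $i (0 .. $#words) {
            next unless $keyword{ $words[$i] };
            # $ARGV is the name of the file <> is currently reading
            printf "%s: file %s line %d word %d keyword %s\n",
                $label, $ARGV, $., $i, $words[$i];
        }
    }
    continue {
        close ARGV if eof;            # restart $. at 1 for the next file
    }
}
```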
I think there may be multiple keywords on one line of those .txt files, so here is some code for your reference:
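use strict;
use warnings;

# Read one keyword (used as a regex) per line, skipping empty lines.
my @keywords;
open my $in, '<', 'keywords.txt' or die "Cannot open keywords.txt: $!";
while (<$in>) {
    if (/^(.+)$/) {
        push(@keywords, $1);
    }
}
close($in);

my @filelist = glob("*.txt");
foreach my $filename (@filelist) {
    if (open my $fh, '<', $filename) {
        my $ln = 1;
        while (my $line = <$fh>) {
            foreach my $kw (@keywords) {
                if ($line =~ /$kw/) {
                    print $filename . ':' . $ln . ':' . $kw . "\n";
                }
            }
            $ln++;
        }
        close($fh);    # inside the if block, where $fh is in scope
    }
}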
OK, but one thing I noticed here:
for the keywords
http://
oracle.com
it matches the whole word; instead it should match them as part of a string. Only these keywords should behave this way; the others should be searched as above.
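One way to handle that follow-up is to compile one regex per keyword: most keywords only match as whole words (not preceded or followed by a word character), while the keywords in a hypothetical %substring set, here http:// and oracle.com, match anywhere in the line. The sketch treats every keyword as literal text via \Q...\E (quotemeta), which is an assumption: a keyword such as 10.* then matches only the literal characters 10.* and no longer acts as a wildcard. The in-memory input is made up for the demo.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Keywords that may appear inside a longer string (assumption, from the comment above)
my %substring = map { $_ => 1 } ('http://', 'oracle.com');

# Compile one regex per keyword; \Q...\E treats the keyword as literal text.
sub compile_keywords {
    my @keywords = @_;
    my %re;
    for my $kw (@keywords) {
        $re{$kw} = $substring{$kw}
            ? qr/\Q$kw\E/                    # match anywhere in the line
            : qr/(?<!\w)\Q$kw\E(?!\w)/;      # match only as a whole word
    }
    return \%re;
}

# Return "keyword:line" strings for every match read from a filehandle.
sub scan {
    my ($fh, $re) = @_;
    my @hits;
    while (my $line = <$fh>) {
        for my $kw (sort keys %$re) {
            push @hits, "$kw:$." if $line =~ $re->{$kw};
        }
    }
    return @hits;
}

my $re   = compile_keywords('AES', 'MD5', 'http://', 'oracle.com');
my $text = "AES enabled\nsee http://www.oracle.com/x\nAES256 and MD5sum\n";
open my $fh, '<', \$text or die $!;
print "$_\n" for scan($fh, $re);
```

Here AES256 and MD5sum are not reported, because those keywords are followed by another word character, while http:// and oracle.com are found inside the URL.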

File comparison in Perl - by line or substring

I'm trying to compare two text files and came up with the following Perl script, but for some reason, even when I use the same file as both base and filter, it doesn't output anything. I'm really new to Perl, so apologies if any of this sounds basic.
my $file_base = 'CSP8216.TXT';
my $file_filter = 'CSP8216.TXT';
open my $info_filter, $file_filter or die "Die: Could not open $file_filter: $!";
while (my $line_filter = <$info_filter>)
{
    open my $info_base, $file_base or die "Die: Could not open $file_base: $!";
    while (my $line_base = <$info_base>)
    {
        if ("$line_filter" == "$line_base")
        #if (substr($line_filter, 0, 11) == substr($line_base, 0, 11))
        {
            print $line_base;
        }
    }
    close $info_bae;
}
close $info_filter;
Could someone point out why this doesnt seem to work?
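Use eq to compare strings:
if ($line_filter eq $line_base)
The == operator compares numbers, so both lines are converted to numbers before the comparison (under use warnings, Perl reports an "Argument ... isn't numeric" warning for lines that are not numbers).
Also add use strict; to see errors in your program, such as the misspelled $info_bae.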
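A corrected sketch of the question's loop, assuming the goal is to print every base line that also appears in the filter file. It reads the filter file once into a hash and looks each base line up by string, instead of re-reading the base file for every filter line. The optional third argument of the hypothetical matching_lines sub mirrors the commented-out substr(..., 0, 11) test: pass 11 to compare only the first 11 characters.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Return the lines of $base_fh whose key also occurs in $filter_fh.
# $key_len (optional): compare only the first $key_len characters of each line.
sub matching_lines {
    my ($base_fh, $filter_fh, $key_len) = @_;
    my $key = sub {
        my $line = shift;
        chomp $line;
        return defined $key_len ? substr($line, 0, $key_len) : $line;
    };
    my %wanted;
    while (my $line = <$filter_fh>) {
        $wanted{ $key->($line) } = 1;
    }
    my @found;
    while (my $line = <$base_fh>) {
        push @found, $line if $wanted{ $key->($line) };   # string lookup, no == involved
    }
    return @found;
}

if (@ARGV == 2) {
    my ($file_base, $file_filter) = @ARGV;
    open my $base,   '<', $file_base   or die "Could not open $file_base: $!";
    open my $filter, '<', $file_filter or die "Could not open $file_filter: $!";
    print matching_lines($base, $filter);
}
```

Run it with the base file first and the filter file second; with the same file for both, every line is printed once.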
I would do it a little differently.
You probably don't want to read the files into arrays if they are large, though.
use strict;
use warnings;

my $file_base   = '1.TXT';
my $file_filter = '2.TXT';
open ( my $filter_fh, '<', $file_filter )
    or die "Die: Could not open $file_filter: $!";
open ( my $base_fh, '<', $file_base )
    or die "Die: Could not open $file_base: $!";
my @filterArray = <$filter_fh>;
my @baseArray   = <$base_fh>;
close $base_fh;
close $filter_fh;

unless ( arrayDiff( \@filterArray, \@baseArray ) )
{
    print "Success!\n";
}

# Returns 1 if the arrays contain different sets of lines, 0 otherwise.
# Line order and duplicate lines are ignored.
sub arrayDiff {
    my $array1 = shift(@_);
    my $array2 = shift(@_);
    my %array1_hash;
    my %array2_hash;
    # Create a hash entry for each element in @array1
    for my $element ( @{$array1} ) {
        $array1_hash{$element} = 1;
    }
    # Same for @array2, written as a statement modifier this time
    $array2_hash{$_} = 1 for @{$array2};
    for my $entry ( @{$array2} ) {
        if ( not $array1_hash{$entry} ) {
            return 1;    # Entry in @array2 but not in @array1: differ
        }
    }
    if ( keys %array1_hash != keys %array2_hash ) {
        return 1;        # Arrays differ
    }
    else {
        return 0;        # Arrays contain the same elements
    }
}
perl 1156663.pl
Success!

Perl Help Needed: Replacing values

I have an input file like this:
Input file
I need to replace the value #pSBSB_ID="*" in the #rectype=#pRECTYPE="SBSB" record with the #pMEME_SSN="034184233" value from the #pRECTYPE="SMSR" record, and I have to delete the row where #rectype='#pRECTYPE="SMSR", '.
Example:
So, after changes have been made, the file should be like this:
....#pRECTYPE="SBSB", #pGWID="17199269", #pINPUT_METHOD="E", #pGS08="005010X220A1", #pSBSB_FAM_UPDATE_CD="UP", #pSBSB_ID="034184233".....
....#pRECTYPE="SBEL", #pSBEL_EFF_DT="01/01/2013", #pSBEL_UPDATE_CD="TM", #pCSPD_CAT="M", #pCSPI_ID="MHMO1003"
.
.
.
Update
I tried the code below.
The input files have the extension .mms, and there are multiple files to process.
my $save_for_later;
my $record;
my @KwdFiles;
my $r;
my $FilePath = $ARGV[0];
chdir($FilePath);
@KwdFiles = <*>;
foreach $File (@KwdFiles)
{
    unless (substr($File, length($File) - 4, length($File)) eq '.mms')
    {
        next;
    }
    unless (open(INFILE, "$File"))
    {
        print "Unable to open file: $File";
        exit(0);
    }
    print "Successfully opened the file: \"$File\" for processing\n\n";
    while ( my $record = <INFILE> ) {
        my %r = $record =~ /\#(\w+) = '(.*?)'/xg;
        if ($r{rectype} eq "SMSR") {
            $save_for_later = $r{pMEME_SSN};
            next;
        }
        elsif ($r{rectype} eq "SBSB" and $r{pSBSB_ID} eq "*") {
            $record =~ s|(\#pSBSB_ID = )'.*?'|$1'$save_for_later'|x;
        }
        close(INFILE);
    }
}
But I am still not getting the updated values in the file.
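#!/usr/bin/perl
use strict;
use warnings;

open IN,  '<', 'in.txt'  or die "Cannot open in.txt: $!";
open OUT, '>', 'out.txt' or die "Cannot open out.txt: $!";
my $CUR_RECID = 0;    # rec_id of the last 'ABC' record; 0 when none is pending
while (<IN>) {
    if ($CUR_RECID) {
        s/recname='.+?'/recname='$CUR_RECID'/ if /rectype='DEF'/;
        $CUR_RECID = 0;
    }
    print OUT;        # copy every line, modified or not
    $CUR_RECID = $1 if /rectype='ABC'.+?rec_id='(.+?)'/;
}
close OUT;
close IN;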
Try this code as a whole. There is no need for a separate function; this code does everything.
Run this script from your terminal with the files to be modified as arguments:
use strict;
use warnings;

$^I = '.bak';    # modify the original files and keep a backup of each with .bak appended to the name
my $replacement;
while (<>) {
    $replacement = $1 if m/(?<=\#pMEME_SSN=)("\d+")/;    # assumes the replacement is on the first line of every file
    next if m/^\s*\#pRECTYPE="SMSR"/;
    s/(?<=\#pSBSB_ID=)("\*")/$replacement/g;
    print;
}
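The question's attempt and the in-place answer both depend on the SMSR record being read before the SBSB record. If that ordering is not guaranteed, one option is to read each file into memory first. A sketch, assuming each .mms file holds at most one SMSR record and uses the #p...="..." layout from the example; fix_records is a made-up name, and the driver overwrites each file after renaming the original to a .bak copy.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Apply the requested edits to the lines of one file:
#  1. take #pMEME_SSN="..." from the SMSR record, wherever it is in the file,
#  2. drop the SMSR record,
#  3. put that value into #pSBSB_ID="*" of the SBSB record.
sub fix_records {
    my @lines = @_;
    my $ssn;
    for (@lines) {
        if (/\#pRECTYPE="SMSR"/ && /\#pMEME_SSN="([^"]*)"/) {
            $ssn = $1;
            last;
        }
    }
    my @out;
    for my $line (@lines) {
        next if $line =~ /\#pRECTYPE="SMSR"/;    # delete the SMSR row
        if (defined $ssn && $line =~ /\#pRECTYPE="SBSB"/) {
            $line =~ s/\#pSBSB_ID="\*"/#pSBSB_ID="$ssn"/;
        }
        push @out, $line;
    }
    return @out;
}

# Hypothetical driver: rewrite every .mms file named on the command line.
for my $file (grep { /\.mms\z/ } @ARGV) {
    open my $in, '<', $file or die "Unable to open $file: $!";
    my @lines = <$in>;
    close $in;
    rename $file, "$file.bak" or die "Unable to back up $file: $!";
    open my $out, '>', $file or die "Unable to write $file: $!";
    print {$out} fix_records(@lines);
    close $out or die "Unable to close $file: $!";
}
```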

How to write a correct filehandle name using a combination of a variable and a string?

I want to make a tool that sorts each line of an input file into one of several files,
but there seems to be a problem with naming the filehandles, so I can't go ahead. How do I solve it?
Here is my program.
$ARGV[0] is the input file.
$ARGV[1] is the number of classes.
#!/usr/bin/perl
use POSIX;
use warnings;

# open input file
open(Raw, "<", "./$ARGV[0]") or die "Can't open $ARGV[0] \n";
# create a directory class to store class files
system("mkdir", "Class");
# create files for store class informations
for ($i = 1; $i <= $ARGV[1]; $i++)
{
    # it seems something wrong in here
    open("Class$i", ">", "./Class/$i.class") or die "Can't create $i.class \n";
}
# read each line and random decide which class to store
while (eof(Raw) != 1)
{
    $Line = readline(*Raw);
    $Random_num = ceil(rand $ARGV[1]);
    for ($k = 1; $k <= $ARGV[1]; $k++)
    {
        if ($Random_num == $k)
        {
            # Store to the file
            print "Class$k" $Line;
            last;
        }
    }
}
for ($h = 1; $h <= $ARGV[1]; $h++)
{
    close "Class$h";
}
close Raw;
Thanks.
Later I used the advice provided by Bill Ruppert:
I put the names of the filehandles into an array, but a syntax error appears that I can't correct.
I marked the syntax error with ######## A syntax error but it looks quite OK ########
Here is my code:
#!/usr/bin/perl
use POSIX;
use warnings;
use Data::Dumper;

# open input file
open(Raw, "<", "./$ARGV[0]") or die "Can't open $ARGV[0] \n";
# create a directory class to store class files
system("mkdir", "Class");
# put the names of the filehandles into an array
for ($i = 0; $i < $ARGV[1]; $i++)
{
    push(@Name, ("Class" . $i));
}
# create files of classes
for ($i = 0; $i <= $#Name; $i++)
{
    $I = ($i + 1);
    open($Name[$i], ">", "./Class/$I.class") or die "Can't create $I.class \n";
}
# read each line and random decide which class to store
while (eof(Raw) != 1)
{
    $Line = readline(*Raw);
    $Random_num = ceil(rand $ARGV[1]);
    for ($k = 0; $k <= $#Name; $k++)
    {
        if ($Random_num == ($k + 1))
        {
            print $Name[$k] $Line; ######## A syntax error but it looks quite OK ########
            last;
        }
    }
}
for ($h = 0; $h <= $#Name; $h++)
{
    close $Name[$h];
}
close Raw;
thanks
To quote the Perl documentation on the print function:
If you're storing handles in an array or hash, or in general whenever you're using any expression more complex than a bareword handle or a plain, unsubscripted scalar variable to retrieve it, you will have to use a block returning the filehandle value instead, in which case the LIST may not be omitted:
print { $files[$i] } "stuff\n";
print { $OK ? STDOUT : STDERR } "stuff\n";
Thus, print $Name[$k] $Line; needs to be changed to print { $Name[$k] } $Line;.
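Putting that together, a sketch that keeps lexical filehandles in an array instead of string names, so no symbolic references are needed and the script also runs under use strict. The helper distribute is hypothetical; int rand @$handles picks an index from 0 to the last index, so the class number can never fall outside the array.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Distribute lines at random over the handles stored in an array.
# print needs a block, { $handles->[$k] }, because the handle is an array element.
sub distribute {
    my ($in_fh, $handles) = @_;
    while (my $line = <$in_fh>) {
        my $k = int rand @$handles;          # 0 .. last index, never out of range
        print { $handles->[$k] } $line;
    }
}

# Usage: perl classify.pl input.txt number_of_classes
if (@ARGV == 2) {
    my ($input, $n) = @ARGV;
    mkdir 'Class' or -d 'Class' or die "Cannot create Class: $!";
    open my $in, '<', $input or die "Can't open $input: $!";
    my @handles;
    for my $i (1 .. $n) {
        open my $fh, '>', "Class/$i.class" or die "Can't create $i.class: $!";
        push @handles, $fh;                  # a fresh lexical handle per file
    }
    distribute($in, \@handles);
    close $_ for $in, @handles;
}
```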
How about this one:
#! /usr/bin/perl -w
use strict;

my $input_file = shift;
my $file_count = shift;
my %hash;

open(INPUT, '<', $input_file) || die "Can't open file $input_file: $!";
while (my $line = <INPUT>) {
    # 1 .. $file_count; ceil(rand(...)) could give 0 when rand returns exactly 0
    my $num = 1 + int(rand($file_count));
    $hash{$num} .= $line;
}
close INPUT;

foreach my $i (1 .. $file_count) {
    open(OUTPUT, '>', "$i.txt") || die "Can't open file $i.txt: $!";
    print OUTPUT $hash{$i} // '';    # a class may have received no lines
    close OUTPUT;
}

Perl script: file handling issues

I have written a Perl script:
#!/usr/bin/perl
use strict;
use warnings;

my $file_name;
my $ext = ".text";
my $subnetwork2;
my %files_list = ();
opendir my $dir, "." or die "Cannot open directory: $!";
my @files = readdir $dir;

sub create_files() {
    my $subnetwork;
    open(MYFILE, 'file.txt');
    while (<MYFILE>) {
        if (/.subnetwork/) {
            my @string = split /[:,\s]+/, $_;
            $subnetwork = $string[2];
        }
        if (/.set/ && (defined $subnetwork)) {
            my @string = split /[:,\s]+/, $_;
            my $file = $subnetwork . $string[1];
            open FILE, ">", "$file.text" or die $!;
            close(FILE);
        }
    }
    close(MYFILE);
}

sub create_hash() {
    foreach (@files) {
        if (/.text/) {
            open($files_list{$_}, ">>$_") || die("This file will not open!");
        }
    }
}

sub init() {
    open(MYFILE3, 'file.txt');
    while (<MYFILE3>) {
        if (/.subnetwork/) {
            my @string3 = split /[:,\s]+/, $_;
            $subnetwork2 = $string3[2];
            last;
        }
    }
    close(MYFILE3);
}

sub main_process() {
    init;
    create_files;
    create_hash;
    open(MYFILE1, 'file.txt');
    while (<MYFILE1>) {
        if (/.subnetwork/) {
            my @string3 = split /[:,\s]+/, $_;
            $subnetwork2 = $string3[2];
        }
        if (/.set/) {
            my @string2 = split /[:,\s]+/, $_;
            $file_name = $subnetwork2 . $string2[1] . $ext;
        }
        if (/.domain/ || /.end/ || ($. < 6)) {
            my $domain = $_;
            foreach (@files) {
                if (/.text/ && /$subnetwork2/) {
                    print { $files_list{$_} } "$domain";
                }
            }
        }
        elsif ($. >= 6) {
            print { $files_list{$file_name} } "$_";
        }
    }
    close(MYFILE1);
    foreach my $val (values %files_list) { close($val); }
    closedir $dir;
}

main_process;
This script creates files in the current directory based on the content of file.txt and then opens those files again.
Then it processes file.txt and redirects each line to a file whose name is set dynamically.
The file name is also derived from the data in file.txt.
The problem I am facing is that all the output goes to a single file, which suggests there is a problem with the filehandles.
All the files that are expected to be created are created correctly, but the data goes into only one of them.
I suspect the problem is with the filehandle I am using for the redirection.
Could anyone please help?
A sample input file is below:
..cnai #Generated on Thu Aug 02 18:33:18 2012 by CNAI R21D06_EC01, user tcssrpi
..capabilities BASIC
.utctime 2012-08-02 13:03:18
.subnetwork ONRM_ROOT_MO:NETSim_BAG
.domain BSC
.set BAG01
AFRVAMOS="OFF"
AWBVAMOS="OFF"
ALPHA=0
AMRCSFR3MODE=1,3,4,7
AMRCSFR3THR=12,21,21
AMRCSFR3HYST=2,3,3
AMRCSFR3ICM=
AMRCSFR4ICM=
USERDATA=""
.set BAG02
AFRVAMOS="OFF"
AWBVAMOS="OFF"
ALPHA=0
AMRCSFR3MODE=1,3,4,7
AMRCSFR3THR=12,21,21
AMRCSFR3HYST=2,3,3
..end
This is the error I get during execution:
> process.pl
Use of uninitialized value in ref-to-glob cast at process.pl line 79, <MYFILE1> line 6.
Can't use string ("") as a symbol ref while "strict refs" in use at process.pl line 79, <MYFILE1> line 6.
I can see that the problem is with this line:
print { $files_list{$_} } "$domain";
but I am unable to understand why.
The output I need is:
> cat NETSim_BAGBAG01.text
.set BAG01
AFRVAMOS="OFF"
AWBVAMOS="OFF"
ALPHA=0
AMRCSFR3MODE=1,3,4,7
AMRCSFR3THR=12,21,21
AMRCSFR3HYST=2,3,3
AMRCSFR3ICM=
AMRCSFR4ICM=
USERDATA=""
> cat NETSim_BAGBAG02.text
.set BAG02
AFRVAMOS="OFF"
AWBVAMOS="OFF"
ALPHA=0
AMRCSFR3MODE=1,3,4,7
AMRCSFR3THR=12,21,21
AMRCSFR3HYST=2,3,3
>
Your problem is in the following lines:
open(PLOT,">>$_") || die("This file will not open!");
$files_list{$_}=*PLOT;
You should replace them with:
open($files_list{$_},">>$_") || die("This file will not open!");
This portion of your code is the key:
open(PLOT,">>$_") || die("This file will not open!");
$files_list{$_}=*PLOT;
The problem is that you are essentially using the filehandle PLOT as a global variable: every entry in your hash points to the same filehandle. Replace the code with something like this:
local *PLOT;
open(PLOT,">>$_") || die("This file will not open!");
$files_list{$_}=*PLOT;
You have got yourself very entangled with this program. There is no need for the hash table or the multiple subroutines.
Here is a quick refactoring of your code that works with your data and writes files NETSim_BAG.BAG01.text and NETSim_BAG.BAG02.text. I put a dot between the subnet and the set to make the names a little clearer.
use strict;
use warnings;

my $out_fh;
open my $fh, '<', 'file.txt' or die $!;
my ($subnetwork, $set, $file);
while (<$fh>) {
    if ( /^\.subnetwork\s+\w+:(\w+)/ ) {
        $subnetwork = $1;
    }
    elsif ( /^\.set\s+(\w+)/ and $subnetwork ) {
        $set  = $1;
        $file = "$subnetwork.$set.text";
        open $out_fh, '>', $file or die qq(Unable to open "$file" for output: $!);
        print $out_fh $_;    # copy the .set line; "print $out_fh;" alone would print the handle itself to STDOUT
    }
    elsif ( /^\.\.end/ ) {
        undef $subnetwork;
        undef $file;
    }
    if ( /^[^.]/ and $file ) {
        print $out_fh $_;
    }
}
close $out_fh if $out_fh;
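On the hash-of-filehandles approach in the question: @files is filled by readdir when the script starts, before create_files runs, so a .text file created during the run never gets an entry in %files_list. Looking such a name up gives undef, which is consistent with the "uninitialized value in ref-to-glob cast" message. If you want to keep a hash of handles, a sketch that avoids the separate create and open passes, assuming one output file per .set block named as in the expected output: open each handle the first time its name is needed and cache it in the hash. The names handle_for, split_sets and the $opener callback are hypothetical; the callback only exists so the same code can write to real files or to memory.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Return the cached handle for $name, opening it through $opener on first use.
sub handle_for {
    my ($cache, $name, $opener) = @_;
    $cache->{$name} //= $opener->($name);    # one distinct lexical handle per name
    return $cache->{$name};
}

# Each .set block (the .set line plus its data lines) goes to "<subnetwork><set>.text".
sub split_sets {
    my ($in_fh, $opener) = @_;
    my (%cache, $subnetwork, $out);
    while (my $line = <$in_fh>) {
        if ($line =~ /^\.subnetwork\s+\w+:(\w+)/) {
            $subnetwork = $1;
        }
        elsif ($line =~ /^\.set\s+(\w+)/ && defined $subnetwork) {
            $out = handle_for(\%cache, "$subnetwork$1.text", $opener);
        }
        elsif ($line =~ /^\.\.end/) {
            undef $out;
        }
        # write the .set line and data lines, but no other dot-directives
        print {$out} $line if $out && $line !~ /^\.(?!set)/;
    }
    close $_ for values %cache;
    return sort keys %cache;
}

# Usage: perl split.pl file.txt
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    split_sets($in, sub {
        my $name = shift;
        open my $fh, '>', $name or die "Cannot open $name: $!";
        return $fh;
    });
}
```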