File truncated, when opened in Perl - perl

Im new to perl, so sorry if this is obvious, but i looked up how to open a file, and use the flags, but for the life of me they dont seem to work right I narrowed it down to these lines of code.
if ($flag eq "T"){
open xFile, ">" , "$lUsername\\$openFile";
}
else
{
open xFile, ">>", "$lUsername\\$openFile";
}
Both of these methods seem to delete the contents of my file. I also checked if the flag is formatted correctly and it is, i know for a fact ive gone down both conditions.
EDIT: codepaste of a larger portion of my code http://codepaste.net/n52sma

New to Perl? I hope you're using use strict and use warnings.
As other's have stated, you should be using a test to make sure your file is open. However, that's not really the problem here. In fact, I used your code, and it seems to work fine for me. Maybe you should try printing some debugging messages to see if this is doing what you think it's doing:
use strict;
use warnings;
use autodie; #Will stop your program if the "open" doesn't work.
my $lUsername = "ABaker";
my $openFile = "somefile.txt";
if ($flag eq "T") {
print qq(DEBUG: Flag = "$flag": Deleting file "$lUsername/$openFile");
open xFile, ">" , "$lUsername/$openFile";
}
else {
print qq(DEBUG: Flag = "$flag": Appending file "$lUsername/$openFile");
open xFile, ">>", "$lUsername/$openFile";
}
You want to use strict and warnings in order to make sure you're not having issues with variable names. The use strict forces you to declare your variables first. For example, are you setting $Flag, but then using $flag? Maybe $flag is set the first time through, but you're setting $Flag the second time through.
Anyway, the DEBUG: statements will give you a better idea of what your error could be.
By the way, in Perl, you're checking if $flag is set to T and not t. If you want to test against both t and T, test whether uc $flag eq 'T' and not just $flag eq 'T'.
#Ukemi
I reformated to comply with use strict, i also made print statements to make sure i was trunctating when i want to, and not when i dont. It still is deleting the file. Although now sometimes its simply not writing, im going to give a larger portion of my code in a link, id really appreciate it if you gave it a once over.
Are you seeing it say Truncating, but the file is empty? Are you sure the file already existed? There's a reason why I put the flag and everything in my debug statements. The more you print, the more you know. Try the following section of code:
$file = "lUsername/$openFile" #Use forward slashes vs. back slashes.
if ($flag eq "T") {
print qq(Flag = "$flag". Truncating file "$file"\n);
open $File , '>', $file
or die qq(Unable to open file "$file" for writing: $!\n);
}
else {
print qq(Flag = "$flag". Appending to file "$file"\n);
if (not -e $file) {
print qq(File "$file" does not exist. Will create it\n");
}
open $File , '>>', $file
or die qq(Unable to open file "$file" for appending: $!\n);
}
Note I'm printing out the flag and the name of the file in quotes. This will allow me to see if there are any hidden characters in my file name.
I'm using the qq(...) method to quote strings, so I can use the quotation marks in my print statements.
Also note I'm checking for the existence of the file when I truncate. This way, I make sure the file actually exists.
This should point out any possible errors in your logic. The other thing you can do is to stop your program when you finish writing out the file and verify that the file was written out as expected.
print "Write to file now:\n";
my $writeToFile = <>;
printf $File "$writeToFile";
close $File;
print "DEBUG: Temporary stop. Examine file\n";
<STDIN>; #DEBUG:
Now, if you see it saying it's appending to the file, and the file exists, and you still see the file being overwritten, we'll know the problem lies in your actual open xFile, ">>" $file statement.

You should use the three-argument-version of open, lexical filehandles and check wether there might have been an error:
# Writing to file (clobbering it if it exists)
open my $file , '>', $filename
or die "Unable to write to file '$filename': $!";
# Appending to file
open my $file , '>>', $filename
or die "Unable to append to file '$filename': $!";

>> does not clobber or truncate. Either you ended up in the "then" clause when you expected to be in the "else" clause, or the problem is elsewhere.
To check what $flag contains:
use Data::Dumper;
local $Data::Dumper::Useqq = 1;
print(Dumper($flag));

For your reference I have mentioned some basic file handling techniques below.
open FILE, "filename.txt" or die $!;
The command above will associate the FILE filehandle with the file filename.txt. You can use the filehandle to read from the file. If the file doesn't exist - or you cannot read it for any other reason - then the script will die with the appropriate error message stored in the $! variable.
open FILEHANDLE, MODE, EXPR
The available modes are the following:
read < #this mode will read the file
write > # this mode will create the new file. If the file already exists it will truncate and overwrite.
append >> #this will append the contents if the file already exists,else it will create new one.
if you have confusion on this, you can use the module called File::Slurp;
I have mentioned the sample codes using File::Slurp module.
use strict;
use File::Slurp;
my $read_mode=read_file("test.txt"); #to read file contents
write_file("test2.txt",$read_mode); #to write file
my #all_files=read_dir("/home/desktop",keep_dot_dot=>0); #read a dir
write_file("test2.txt",{append=>1},"#all_files"); #Append mode

Related

How to match and find common based on substring from two files?

I have two files. File1 contains list of email addresses. File2 contains list of domains.
I want to filter out all the email addresses after matching exact domain using Perl script.
I am using below code, but I don't get correct result.
#!/usr/bin/perl
#use strict;
#use warnings;
use feature 'say';
my $file1 = "/home/user/domain_file" or die " FIle not found\n";
my $file2 = "/home/user/email_address_file" or die " FIle not found\n";
my $match = open(MATCH, ">matching_domain") || die;
open(my $data1, '<', $file1) or die "Could not open '$file1' $!\n";
my #wrd = <$data1>;
chomp #wrd;
# loop on the fiile to be searched
open(my $data2, '<', $file2) or die "Could not open '$file2' $!\n";
while(my $line = <$data2>) {
chomp $line;
foreach (#wrd) {
if($line =~ /\#$_$/) {
print MATCH "$line\n";
}
}
}
File1
abc#1gmail.com.au
abc#gmail.com
abc#gmail.com1
abc#2outlook.com2
abc#outlook.com1
abc#yahoo.com
abc#yahooo1.com
abc#yahooo.com
File2
yahoo.com
gmail.com
Expected output
abc#gmail.com
abc#yahoo.com
First off, since you seem to be on *nix, you might want to check out grep -f, which can take search patterns from a given file. I'm no expert in grep, but I would try the file and "match whole words" and this should be fairly easy.
Second: Your Perl code can be improved, but it works as expected. If you put the emails and domains in the files as indicated by your code. It may be that you have mixed the files up.
If I run your code, fixing only the paths, and keeping the domains in file1, it does create the file matching_domain and it contains your expected output:
abc#gmail.com
abc#yahoo.com
So I don't know what you think your problem is (because you did not say). Maybe you were expecting it to print output to the terminal. Either way, it does work, but there are things to fix.
#use strict;
#use warnings;
It is a huge mistake to remove these two. Biggest mistake you will ever do while coding Perl. It will not remove your errors, just hide them. You will spend 10 times as much time bug fixing. Uncomment this as your first thing you do to fix this.
use feature 'say';
You never use this. You could for example replace print MATCH "$line\n" with say MATCH $line, which is slightly more concise.
my $file1 = "/home/user/domain_file" or die " FIle not found\n";
my $file2 = "/home/user/email_address_file" or die " FIle not found\n";
This is very incorrect. You are placing a condition on the creation of a variable. If the condition fails, does the variable exist? Don't do this. I assume this is to check if the file exists, but that is not what this does. To check if a file exists, you can use -e, documented as perldoc "-X" (various file tests).
Furthermore, a statement in the form of a string, "/home/user..." is TRUE ("truthy"), as far as Perl conditions are concerned. It is only false if it is "0" (zero), "" (empty) or undef (undefined). So your or clause will never be executed. E.g. "foo" or die will never die.
Lastly, this test is quite meaningless, as you will be testing this in your open statement later on anyway. If the file does not exist, the open will fail and your program will die.
my $match = open(MATCH, ">matching_domain") || die;
This is also very incorrect. First off, you never use the $match variable. Secondly, I bet it does not contain what you think it does. (it contains a boolean which states whether open was successful or not, see perldoc -f open) Thirdly, again, don't put conditions on my declarations of variables, it is a bad idea.
What this statement really means is that $match will contain either the return value of the open, or the return value of die. This should probably be simply:
open my $match, ">", "matching_domain" or die "Cannot open '$match': $!;
Also, use the three argument open with explicit open MODE, and use lexical file handles, like you have done elsewhere.
And one more thing on top of all the stuff I've already badgered you with: I don't recommend hard coding output files for small programs like this. If you want to redirect the output, use shell redirection: perl foo.pl > output.txt. I think this is what has prompted you to think something is wrong with your code: You don't see the output.
Other than that, your code is fine, as near as I can tell. You may want to chomp the lines from the domain file, but it should not matter. Also remember that indentation is a good thing, and it helps you read your code. I mentioned this in a comment, but it was removed for some reason. It is important though.
Good luck!
This assumes that the lines labeled File1 are in the file pointed to by $file1 and the lines labeled File2 are in the file pointed to by $file2.
You have your variables swapped. You want to match what is in $line against $_, not the other way around:
# loop on the file to be searched
open( my $data2, '<', $file2 ) or die "Could not open '$file2' $!\n";
while ( my $line = <$data2> ) {
chomp $line;
foreach (#wrd) {
if (/\#$line$/) {
print MATCH "$_\n";
}
}
}
You should un-comment the warnings and strict lines:
use strict;
use warnings;
warnings shows you that the or die checks are not really working the way you intended in the file name assignment statements. Just use :
my $file1 = "/home/user/domain_file";
my $file2 = "/home/user/email_address_file";
You are already doing the checks where they belong (on open).

Replace multiple lines in text file

I have text files containing the text below (amongst other text)
DIFF_COEFF= 1.000e+07,1.000e+07,1.000e+07,1.000e+07,
1.000e+07,1.000e+07,1.000e+07,1.000e+07,1.000e+07,1.000e+07,1.000e+07,
1.000e+07,1.000e+07,1.000e+07,1.000e+07,1.000e+07,1.000e+07,1.000e+07,
1.000e+07,1.000e+07,1.000e+07,1.000e+07,1.000e+07,1.000e+07,1.000e+07,
1.000e+07,1.000e+07,1.000e+07,1.000e+07,1.000e+07,1.000e+07,1.000e+07,
1.000e+07,1.000e+07,1.000e+07,1.000e+07,1.000e+07,4.000e+05,
and I need to replace it with the following text:
DIFF_COEFF= 2.000e+07,2.000e+07,2.000e+07,2.000e+07,
2.000e+07,2.000e+07,2.000e+07,2.000e+07,2.000e+07,2.000e+07,2.000e+07,
2.000e+07,2.000e+07,2.000e+07,2.000e+07,2.000e+07,2.000e+07,2.000e+07,
2.000e+07,2.000e+07,2.000e+07,2.000e+07,2.000e+07,2.000e+07,2.000e+07,
2.000e+07,2.000e+07,2.000e+07,2.000e+07,2.000e+07,2.000e+07,2.000e+07,
2.000e+07,2.000e+07,2.000e+07,2.000e+07,2.000e+07,8.000e+05,
Each line above corresponds to a new line in the text file.
After some googling, I thought making use of Perl in the following might work, but it did not. I got the error message
Illegal division by zero at -e line 1, <> chunk 1
s_orig='DIFF_COEFF=*4.000e+05,'
s_new='DIFF_COEFF= 2.000e+07,2.000e+07,2.000e+07,2.000e+07,\n2.000e+07,2.000e+07,2.000e+07,2.000e+07,2.000e+07,2.000e+07,2.000e+07,\n2.000e+07,2.000e+07,2.000e+07,2.000e+07,2.000e+07,2.000e+07,2.000e+07,\n2.000e+07,2.000e+07,2.000e+07,2.000e+07,2.000e+07,2.000e+07,2.000e+07,\n2.000e+07,2.000e+07,2.000e+07,2.000e+07,2.000e+07,2.000e+07,2.000e+07,\n2.000e+07,2.000e+07,2.000e+07,2.000e+07,2.000e+07,8.000e+05,'
perl -0 -i -pe "s:\Q${s_orig}\E:${s_new}:/igs" file.txt
Does anyone here know the right way to do this?
Edit - some more details: the text after this block is "DIFF_COEFF_Q=" followed by the same set of numbers, so I need to search for and replace the specific lines shown. The text files are not very large in size.
Copy the file over to a new one, except that within the range of text between these markers drop the replacement text instead. Then move that file to replace the original, as it may be needed judging by the attempted perl -0 -i in the question.
Note that when changing a file we have to build new content and then replace the file. There are a few ways to do this and modules that make it easier, shown further below.
The code below uses the range operator and the fact that it returns the counter for lines within the range, 1 for the first and the number ending with E0 for the last. So we don't copy lines inside that region while we write the replacement text (and the post-region-end marker) on the last line.
I consider the region of interest to end right before DIFF_COEFF_Q= line, per the question edit.
use warnings;
use strict;
use feature 'say';
use File::Copy 'move';
my $replacement = "replacement text";
my $file = 'input.txt';
my $out_file = 'new_' . $file;
open my $fh_out, '>', $out_file or die "Can't open $out_file: $!";
open my $fh, '<', $file or die "Can't open $file: $!";
while (<$fh>)
{
if (my $range_cnt = /^\s*DIFF_COEFF\s*=/ .. /^\s*DIFF_COEFF_Q\s*=/) #/
{
if ($range_cnt =~ /E0$/)
{
print $fh_out $replacement; # may need a newline
print $fh_out $_;
}
}
else {
print $fh_out $_;
}
}
close $fh or die "Can't close $file: $!"; # don't overwrite original
close $fh_out or die "Can't close $out_file: $!"; # if there are problems
#move $out_file, $file or die "Can't move $file to $out_file: $!";
Uncomment the move line once this has been tested well enough on your actual files, if you want to replace the original. You may or may not need a newline after $replacement, depending on it.
An alternative is to use flags for entering/leaving that range. But this won't be cleaner since there are two distinct actions, to stop copying when entering the range and write replacement when leaving. Thus multiple flags need be set and checked, what may end up messier.
If the files can't ever be huge it is simpler to read and process the file in memory. Then open the same file for writing and dump the new content
my $text = do { # slurp file into a scalar
local $/;
open my $fh, '<', $file or die "Can't open $file: $!";
<$fh>
};
$text =~ s/^\s*DIFF_COEFF\s*=.*?(\n\s*DIFF_COEFF_Q)/$replacement$1/ms;
# Change $out_file to $file to overwrite
open my $fh_out, '>', $out_file or die "Can't open $out_file: $!";
print $fh_out $text;
Here /m modifier is for multiline mode in which we can use ^ for the beginning of a line (not the whole string), what is helpful here. The /s makes . match a newline, too. Also note that we can slurp a file with Path::Tiny as simply as: my $text = path($file)->slurp;
Another option is to use Path::Tiny, which in newer versions has edit and edit_lines methods
use Path::Tiny;
# NOTE: edits $file in place (changes it)
path($file)->edit(
sub { s/DIFF_COEFF=.*?(\n\s*DIFF_COEFF_Q)/$replacement$1/s }
);
For more on this see, for example, this post and this post and this post.
The first and last way change the inode number of the file. See this post if that is a problem.
It's an interesting error that you've made and I can see what has led you to make it. But I don't think I've ever seen anyone else make the same mistake :-)
Your substitution statement is this:
s:\Q${s_orig}\E:${s_new}:/igs
So you've decided to use : as the delimiter of the substitution operator. But you want to use the options i, g and s and everywhere you've seen people talk about options on a substitution operator, they talk about using / to introduce the options. So you've added /igs to your substitution operator.
But what you've missed (and I completely understand why) is that the / that comes before the options is actually the closing delimiter of the standard, s/.../.../, version of the substitution operator. If you change the delimiter (as you have done) then your altered closing delimiter is all you need.
In your case, Perl doesn't expect the / as it has already seen the closing delimiter. It, therefore, decides that the / is a division operator and tries to divide the result of your substitution by igs. It interprets igs as zero and you get your error.
The fix is to remove that / so:
s:\Q${s_orig}\E:${s_new}:/igs
becomes:
s:\Q${s_orig}\E:${s_new}:igs

perl: canot open file within a loop

I am trying to read in a bunch of similar files and process them one by one. Here is the code I have. But somehow the perl script doesn't read in the files correctly. I'm not sure how to fix it. The files are definitely readable and writable by me.
#!/usr/bin/perl
use strict;
use warnings;
my #olap_f = `ls /full_dir_to_file/*txt`;
foreach my $file (#olap_f){
my %traits_h;
open(IN,'<',$file) || die "cannot open $file";
while(<IN>){
chomp;
my #array = split /\t/;
my $trait = $array[4];
$traits_h{$trait} ++;
}
close IN;
}
When I run it, the error message (something like below) showed up:
cannot open /full_dir_to_file/a.txt
You have newlines at the end of each filename:
my #olap_f = `ls ~dir_to_file/*txt`;
chomp #olap_f; # Remove newlines
Better yet, use glob to avoid launching a new process (and having to trim newlines):
my #olap_f = glob "~dir_to_file/*txt";
Also, use $! to find out why a file couldn't be opened:
open(IN,'<',$file) || die "cannot open $file: $!";
This would have told you
cannot open /full_dir_to_file/a.txt
: No such file or directory
which might have made you recognize the unwanted newline.
I'll add a quick plug for IO::All here. It's important to know what's going on under the hood but it's convenient sometimes to be able to do:
use IO::All;
my #olap_f = io->dir('/full_dir_to_file/')->glob('*txt');
In this case it's not shorter than #cjm's use of glob but IO::All does have a few other convenient methods for working with files as well.

Perl: Substitute text string with value from list (text file or scalar context)

I am a perl novice, but have read the "Learning Perl" by Schwartz, foy and Phoenix and have a weak understanding of the language. I am still struggling, even after using the book and the web.
My goal is to be able to do the following:
Search a specific folder (current folder) and grab filenames with full path. Save filenames with complete path and current foldername.
Open a template file and insert the filenames with full path at a specific location (e.g. using substitution) as well as current foldername (in another location in the same text file, I have not gotten this far yet).
Save the new modified file to a new file in a specific location (current folder).
I have many files/folders that I want to process and plan to copy the perl program to each of these folders so the perl program can make new .
I have gotten so far ...:
use strict;
use warnings;
use Cwd;
use File::Spec;
use File::Basename;
my $current_dir = getcwd;
open SECONTROL_TEMPLATE, '<secontrol_template.txt' or die "Can't open SECONTROL_TEMPLATE: $!\n";
my #secontrol_template = <SECONTROL_TEMPLATE>;
close SECONTROL_TEMPLATE;
opendir(DIR, $current_dir) or die $!;
my #seq_files = grep {
/gz/
} readdir (DIR);
open FASTQFILENAMES, '> fastqfilenames.txt' or die "Can't open fastqfilenames.txt: $!\n";
my #fastqfiles;
foreach (#seq_files) {
$_ = File::Spec->catfile($current_dir, $_);
push(#fastqfiles,$_);
}
print FASTQFILENAMES #fastqfiles;
open (my ($fastqfilenames), "<", "fastqfilenames.txt") or die "Can't open fastqfilenames.txt: $!\n";
my #secontrol;
foreach (#secontrol_template) {
$_ =~ s/#/$fastqfilenames/eg;
push(#secontrol,$_);
}
open SECONTROL, '> secontrol.txt' or die "Can't open SECONTROL: $!\n";
print SECONTROL #secontrol;
close SECONTROL;
close FASTQFILENAMES;
My problem is that I cannot figure out how to use my list of files to replace the "#" in my template text file:
my #secontrol;
foreach (#secontrol_template) {
$_ =~ s/#/$fastqfilenames/eg;
push(#secontrol,$_);
}
The substitute function will not replace the "#" with the list of files listed in $fastqfilenames. I get the "#" replaced with GLOB(0x8ab1dc).
Am I doing this the wrong way? Should I not use substitute as this can not be done, and then rather insert the list of files ($fastqfilenames) in the template.txt file? Instead of the $fastqfilenames, can I substitute with content of file (e.g. s/A/{r file.txt ...). Any suggestions?
Cheers,
JamesT
EDIT:
This made it all better.
foreach (#secontrol_template) {
s/#/$fastqfilenames/g;
push #secontrol, $_;
}
And as both suggestions, the $fastqfiles is a filehandle.
replaced this: open (my ($fastqfilenames), "<", "fastqfilenames.txt") or die "Can't open fastqfilenames.txt: $!\n";
with this:
my $fastqfilenames = join "\n", #fastqfiles;
made it all good. Thanks both of you.
$fastqfilenames is a filehandle. You have to read the information out of the filehandle before you can use it.
However, you have other problems.
You are printing all of the filenames to a file, then reading them back out of the file. This is not only a questionable design (why read from the file again, since you already have what you need in an array?), it also won't even work:
Perl buffers file I/O for performance reasons. The lines you have written to the file may not actually be there yet, because Perl is waiting until it has a large chunk of data saved up, to write it all at once.
You can override this buffering behavior in a few different ways (closing the file handle being the simplest if you are done writing to it), but as I said, there is no reason to reopen the file again and read from it anyway.
Also note, the /e option in a regex replacement evaluates the replacement as Perl code. This is not necessary in your case, so you should remove it.
Solution: Instead of reopening the file and reading it, just use the #fastqfiles variable you previously created when replacing in the template. It is not clear exactly what you mean by replacing # with the filenames.
Do you want to to replace each # with a list of all filenames together? If so, you should probably need to join the filenames together in some way before doing the replacement.
Do you want to create a separate version of the template file for each filename? If so, you need an inner for loop that goes over each filename for each template. And you will need something other than a simple replacement, because the replacement will change the original string on the first time through. If you are on Perl 5.16, you could use the /r option to replace non-destructively: push(#secontrol,s/#/$file_name/gr); Otherwise, you should copy to another variable before doing the replacement.
$_ =~ s/#/$fastqfilenames/eg;
$fastqfilenames is a file handle, not the file contents.
In any case, I recommend the use of Text::Template module in order to do this kind of work (file text substitution).

How to read and write a file, syntax wrong

I end up having my script appending the new changes that I wanted to make to the end of the file instead of in the actual file.
open (INCONFIG, "+<$text") or die $!;
#config = <INCONFIG>;
foreach(#config)
{
if ( $_ =~ m/$checker/ )
{
$_ = $somethingnew;
}
print INCONFIG $_;
}
close INCONFIG or die;
Ultimately I wanted to rewrite the whole text again, but with certain strings modified if it matched the search criterion. But so far it only appends ANOTHER COPY of the entire file(with changes) to the bottom of the old file.
I know that I can just close the file, and use another write file -handle and parse it in. But was hoping to be able to learn what I did wrong, and how to fix it.
As I understand open, using read/write access for a text file isn't a good idea. After all a file just is a byte stream: Updating a part of the file with something of a different length is the stuff headaches are made of ;-)
Here is my approach: Try to emulate the -i "inplace" switch of perl. So essentially we write to a backup file, which we will later rename. (On *nix system, there is some magic with open filehandles keeping deleted files available, so we don't have to create a new file. Lets do it anyway.)
my $filename = ...;
my $tempfile = "$filename.tmp";
open my $inFile, '<', $filename or die $!;
open my $outFile, '>', $tempfile or die $!;
while (my $line = <$inFile>) {
$line = doWeirdSubstitutions($line);
print $outFile $line;
}
close $inFile or die $!;
close $outFile or die $!;
rename $tempfile, $filename
or die "rename failed: $!"; # will break under weird circumstances.
# delete temp file
unlink $tempfile or die $!;
Untested, but obvious code. Does this help with your problem?
Your problem is a misunderstanding of what <+ "open for update" does. It is discussed in the Perl Tutorial at
Mixing Reads and Writes.
What you really want to do is copy the old file to a new file and then rename it after the fact. This is discussed in the perlfaq5 mentioned by daxim. Plus there are entire modules dedicated to doing this safely, such as File::AtomicWrite. These help with the issue of your program aborting and leaving you with a clobbered file.
As others pointed out, there are better ways :)
But if you really want to read and write using +<, you should remember that, after reading the file, you're at the end of the file... That explains that your output is appended after the original content.
What you need to do is reset the file-pointer to the beginning of the file, using seek:
seek(INCONFIG ,0,0);
Then start writing...
perlopentut says this about mixing reads and writes
In fact, when it comes to updating a file, unless you're working on a
binary file as in the WTMP case above, you probably don't want to use
this approach for updating. Instead, Perl's -i flag comes to the
rescue.
Another way is to use the Tie::File module. The code reduces to just this:
tie my #config, 'Tie::File', $text or die $!;
s/$checker/$somethingnew/g for #config;
But remember to back the file up before you modify it until you have debugged your program.