Splitting a string into multiple lines and writing to a file - perl

With the Ensembl API, the seq() method returns a sequence as a single line (1xN characters). The sequence may be long (on the order of 10K characters), so I want to split the string every 100 characters. Currently, to write it to a file, I use
my $outfilename = $row;
open(my $ofh, '>', $outfilename) or die "Could not open file '$outfilename' $!";
print $ofh $gene->seq();
Where $gene has been defined in the code.
How can I do that?

This is easiest using unpack
open my $ofh, '>', $outfilename or die qq{Could not open file "$outfilename" for output: $!};
print $ofh "$_\n" for unpack '(A100)*', $gene->seq;

my $outfilename = $row;
open(my $ofh, '>', $outfilename) or die "Could not open file '$outfilename' $!";
{
my $gene_seq = $gene->seq();
my @chunks = $gene_seq =~ /(.{1,100})/sg; # split $gene_seq into chunks of up to 100 chars
print $ofh join("\n",@chunks),"\n";
}
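Both answers can be compared on a short in-memory string. The following is a minimal sketch, not part of either answer: the helper names chunk_unpack and chunk_regex are made up for illustration, and the width of 4 is only there to keep the output small. One difference worth knowing is that the A format in unpack strips trailing whitespace from each chunk, which should not matter for sequence data.

```perl
use strict;
use warnings;

# Hypothetical helper: split $seq into pieces of at most $width characters
# using unpack, as in the first answer.
sub chunk_unpack {
    my ($seq, $width) = @_;
    return unpack "(A$width)*", $seq;
}

# Hypothetical helper: the same split using a global regex match,
# as in the second answer.
sub chunk_regex {
    my ($seq, $width) = @_;
    return $seq =~ /(.{1,$width})/sg;
}

my $seq = 'ACGTACGTAC';    # stand-in for $gene->seq
print "$_\n" for chunk_unpack($seq, 4);
```

Calling either helper in list context with the real sequence and a width of 100 gives the lines to print to $ofh.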

Related

Perl iterating through each line in a file and appending to the end of each line in another file - follow up

I have a follow-up question regarding an earlier post.
The post in question is:
Perl iterating through each line in a file and appending to the end of each line in another file
I used:
use warnings;
use strict;
open my $animals, '<', 'File1.txt' or die "Can't open animals: $!";
open my $payloads, '<', 'File2.txt' or die "Can't open payloads: $!";
my @payloads = <$payloads>; # each line of the file into an array
close $payloads or die "Can't close payloads: $!";
while (my $line = <$animals>) {
    chomp $line;
    print $line.$_ foreach (@payloads);
}
close $animals or die "Can't close animals: $!";
This works fine for files that look like this:
file 1: file 2:
line1 lineA
line2 lineB
line3 lineC
but not for files that look like this:
<01 line1
<02 line2
So what I want to do is the following:
file 1: file 2:
<01 line1 <AA lineAA
<02 line2 <AB lineAB
should become:
file 3:
<01_AA line1lineAA
<01_AB line1lineAB
<02_AA line2lineAA
<02_AB line2lineAB
I have tried to solve it by splitting the strings on the tab using while loops in while loops (see below), but I cannot get it to work.
my script:
#!C:/perl64/bin/perl.exe
use warnings;
use strict;
open my $file1, '<', 'file1.fasta' or die "Can't open file1: $!";
open my $file2, '<', 'file2.fasta' or die "Can't open file2: $!";
open(OUT, '>', 'file3.fasta') or die "Cannot write $!";
while (<$file2>)
{
    chomp;
    my ($F2_Id, @SF2_seq) = split (/\t/, $_);
    while (<$file1>)
    {
        chomp;
        my ($F1_Id, @F1_seq) = split (/\t/, $_);
        foreach my $seq (@F1_seq)
        {
            print OUT $F1_Id,"_",$F2_Id,"\t",$seq.$_ foreach (@F2_seq),"\n";
        }
        close;
    }
}
I started with perl just recently so I can imagine that there are a lot of faults in the script.
I'm sorry for the really long post, but I would appreciate any help.
You can store the id and seq of the first file in an array of arrays.
You also have to replace the < in the second file with _.
#!/usr/bin/perl
use warnings;
use strict;
open my $LEFT, '<', 'file1.fasta' or die "Can't open file1: $!";
open my $RIGHT, '<', 'file2.fasta' or die "Can't open file2: $!";
open my $OUT, '>', 'file3.fasta' or die "Cannot write: $!";
my @left;
while (<$LEFT>) {
    chomp;
    push @left, [ split /\t/ ];
}
while (<$RIGHT>) {
    chomp;
    my ($id, $seq) = split /\t/;
    $id =~ s/</_/;
    print {$OUT} "$_->[0]$id\t$_->[1]$seq\n" for @left;
}
close $OUT or die "Cannot close: $!";
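Note that the loop above prints every record of the first file for each record of the second file, so its output is grouped by the second file's id. If the order shown in the question matters (all combinations for <01 first, then <02), the loops can be swapped. The following is a minimal sketch on in-memory data; the helper name cross_join is made up for illustration, and it assumes each record is an [id, sequence] pair whose id starts with <.

```perl
use strict;
use warnings;

# Hypothetical helper: pair every record of @$left with every record of @$right.
# Each record is [ id, sequence ]; the leading '<' of the right id becomes '_'.
sub cross_join {
    my ($left, $right) = @_;
    my @out;
    for my $l (@$left) {
        for my $r (@$right) {
            (my $rid = $r->[0]) =~ s/^</_/;
            push @out, "$l->[0]$rid\t$l->[1]$r->[1]";
        }
    }
    return @out;
}

my @left  = ( [ '<01', 'line1' ],  [ '<02', 'line2' ] );
my @right = ( [ '<AA', 'lineAA' ], [ '<AB', 'lineAB' ] );
print "$_\n" for cross_join(\@left, \@right);
```

With real files, both would be read into arrays of [id, sequence] pairs with split /\t/, as in the answer above.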

Read and write file bit by bit

There is a .jpg file for example or some other file. I want to read it bit by bit. I do this:
open(FH, "<", "red.jpg") or die "Error: $!\n";
my $str;
while(<FH>) {
$str .= unpack('B*', $_);
}
close FH;
Well it gives me $str with 0101001 of the file. After that I do this:
open(AB, ">", "new.jpg") or die "Error: $!\n";
binmode(AB);
print AB $str;
close AB;
but it doesn't work.
How can I do it? And how can I make it work regardless of byte order (cross-platform)?
Problems:
You didn't use binmode when reading either.
It makes no sense to read a binary file line by line since they don't have lines.
You're needlessly using global variables for your file handles.
And the one that answers your question: You didn't reverse the unpack.
open(my $IN, "<", "red.jpg")
    or die("Can't open red.jpg: $!\n");
binmode($IN);
my $file; { local $/; $file = <$IN>; } # slurp the whole file
my $binary = unpack('B*', $file);
open(my $OUT, ">", "new.jpg")
    or die("Can't create new.jpg: $!\n");
binmode($OUT);
print $OUT pack('B*', $binary);
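The round trip can be checked without touching any file. This is a minimal sketch on four arbitrary bytes chosen for illustration. The B format works on each byte separately, most significant bit first, so the bit string does not depend on the byte order of the machine.

```perl
use strict;
use warnings;

my $bytes = "\xFF\xD8\x00\x41";      # four arbitrary bytes
my $bits  = unpack 'B*', $bytes;     # string of '0' and '1' characters
my $again = pack 'B*', $bits;        # back to the original bytes

print "$bits\n";                     # 11111111110110000000000001000001
print $again eq $bytes ? "round trip ok\n" : "mismatch\n";
```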

copy text after a specific string from a file and append to another in perl

I want to extract the desired information from a file and append it to another. The first file starts with some header lines that follow no specific pattern and end with the string "END OF HEADER". I wrote the following code to find the line that ends the header:
$find = "END OF HEADER";
open FILEHANDLE, $filename_path;
while (<FILEHANDLE>) {
my $line = $_;
if ($line =~ /$find/) {
#??? what shall I do here???
}
}
but I don't know how to get the rest of the file and append it to the other file.
Thank you for any help
I guess if the file isn't enormous you can just load the whole file into a scalar, split it on "END OF HEADER", and then append the right-hand side of the split to the new file:
open READHANDLE, 'readfile.txt' or die $!;
my $content = do { local $/; <READHANDLE> };
close READHANDLE;
my (undef,$restcontent) = split(/END OF HEADER/,$content,2); # limit 2: keep everything after the first match
open WRITEHANDLE, '>>writefile.txt' or die $!;
print WRITEHANDLE $restcontent;
close WRITEHANDLE;
This code will take the filenames from the command line, print all files up to END OF HEADER from the first file, followed by all lines from the second file. Note that the output is sent to STDOUT so you will have to redirect the output, like this:
perl program.pl headfile.txt mainfile.txt > newfile.txt
Update: Now modified to print all of the first file after the line END OF HEADER followed by all of the second file
use strict;
use warnings;
my ($header_file, $main_file) = @ARGV;
open my $fh, '<', $header_file or die $!;
my $print;
while (<$fh>) {
print if $print;
$print ||= /END OF HEADER/;
}
open $fh, '<', $main_file or die $!;
print while <$fh>;
use strict;
use warnings;
use File::Slurp;
my @lines = read_file('readfile.txt');
while ( defined( my $line = shift @lines ) ) {
    next unless ($line =~ m/END OF HEADER/);
    last;
}
append_file('writefile.txt', @lines);
I believe this will do what you need:
use strict;
use warnings;
my $find = 'END OF HEADER';
my $fileContents;
{
local $/;
open my $fh_read, '<', 'theFile.txt' or die $!;
$fileContents = <$fh_read>;
}
my ($restOfFile) = $fileContents =~ /$find(.+)/s;
open my $fh_write, '>>', 'theFileToAppend.txt' or die $!;
print $fh_write $restOfFile;
close $fh_write;
my $status = 0;
my $find = "END OF HEADER";
open my $fh_write, '>', $file_write
or die "Can't open file $file_write $!";
open my $fh_read, '<', $file_read
or die "Can't open file $file_read $!";
LINE:
while (my $line = <$fh_read>) {
if ($line =~ /$find/) {
$status = 1;
next LINE;
}
print $fh_write $line if $status;
}
close $fh_read;
close $fh_write;
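The line-by-line approach can be tried on an in-memory file before pointing it at real data. This is a minimal sketch; the helper name lines_after_marker and the sample text are made up for illustration. It keeps everything after the first line that contains the marker, including any later lines that happen to contain the marker again.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines that follow the first line
# containing $marker.
sub lines_after_marker {
    my ($fh, $marker) = @_;
    my ($seen, @rest);
    while (my $line = <$fh>) {
        if ($seen) { push @rest, $line }
        elsif (index($line, $marker) >= 0) { $seen = 1 }
    }
    return @rest;
}

my $text = "name: demo\nEND OF HEADER\nrow 1\nrow 2\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
print lines_after_marker($fh, 'END OF HEADER');
close $fh;
```

To append the result to another file, the returned lines can be printed to a handle opened with '>>'.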

merging two files using perl keeping the copy of original file in other file

I have two files, A.ini and B.ini, and I want to merge both of them into A.ini.
Examples of the files:
A.ini contains:
a=123
b=xyx
c=434
B.ini contains:
a=abc
m=shank
n=paul
My output in A.ini should look like:
a=123abc
b=xyx
c=434
m=shank
n=paul
I want this merging to be done in Perl, and I want to keep a copy of the old A.ini file somewhere else so that I can still use it.
A command line variant:
perl -lne '
($a, $b) = split /=/;
$v{$a} = $v{$a} ? $v{$a} . $b : $_;
END {
print $v{$_} for sort keys %v
}' A.ini B.ini >NEW.ini
How about:
#!/usr/bin/perl
use strict;
use warnings;
my %out;
my $file = 'path/to/A.ini';
open my $fh, '<', $file or die "unable to open '$file' for reading: $!";
while(<$fh>) {
chomp;
my ($key, $val) = split /=/;
$out{$key} = $val;
}
close $fh;
$file = 'path/to/B.ini';
open $fh, '<', $file or die "unable to open '$file' for reading: $!";
while(<$fh>) {
chomp;
my ($key, $val) = split /=/;
if (exists $out{$key}) {
$out{$key} .= $val;
} else {
$out{$key} = $val;
}
}
close $fh;
$file = 'path/to/A.ini';
open $fh, '>', $file or die "unable to open '$file' for writing: $!";
foreach(keys %out) {
print $fh $_,'=',$out{$_},"\n";
}
close $fh;
The two files to be merged can be read in a single pass and don't need to be treated as separate source files. That allows the use of <> to read all files passed as parameters on the command line.
Keeping a backup copy of A.ini is simply a matter of renaming it before writing the merged data to a new file of the same name.
This program appears to do what you need.
use strict;
use warnings;
my $file_a = $ARGV[0];
my (@keys, %values);
while (<>) {
    if (/\A\s*(.+?)\s*=\s*(.+?)\s*\z/) {
        push @keys, $1 unless exists $values{$1};
        $values{$1} .= $2;
    }
}
rename $file_a, "$file_a.bak" or die qq(Unable to rename "$file_a": $!);
open my $fh, '>', $file_a or die qq(Unable to open "$file_a" for output: $!);
printf $fh "%s=%s\n", $_, $values{$_} for @keys;
output (in A.ini)
a=123abc
b=xyx
c=434
m=shank
n=paul
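The merging rule itself (append the values of repeated keys, keep keys in order of first appearance) can be separated from the file handling and tried on in-memory lines. This is a minimal sketch; the helper name merge_ini_lines is made up for illustration, and it assumes simple key=value lines without sections or comments.

```perl
use strict;
use warnings;

# Hypothetical helper: merge key=value lines, appending the values of
# repeated keys and keeping keys in order of first appearance.
sub merge_ini_lines {
    my @lines = @_;
    my (@order, %value);
    for my $line (@lines) {
        next unless $line =~ /\A\s*(.+?)\s*=\s*(.*?)\s*\z/;
        push @order, $1 unless exists $value{$1};
        $value{$1} .= $2;
    }
    return map { "$_=$value{$_}" } @order;
}

my @a = ('a=123', 'b=xyx', 'c=434');      # contents of A.ini
my @b = ('a=abc', 'm=shank', 'n=paul');   # contents of B.ini
print "$_\n" for merge_ini_lines(@a, @b);
```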

perl: Writing file at Nth position

I am trying to write to a file at the Nth position. I tried the example below, but it writes at the end. Please help me achieve this.
#!/usr/bin/perl
open(FILE,"+>>try.txt")
or
die ("Cant open file try.txt");
$POS=5;
seek(FILE,$POS,0);
print FILE "CP1";
You are opening the file in read-write appending mode. Try opening the file in read-write mode:
my $file = "try.txt";
open my $fh, "+<", $file
or die "could not open $file: $!";
Also, note the use of the three argument open, the lexical filehandle, and $!.
#!/usr/bin/perl
use strict;
use warnings;
#create an in-memory file
my $fakefile = "1234567890\n";
open my $fh, "+<", \$fakefile
or die "Cant open file: $!";
my $offset = 5;
seek $fh, $offset, 0
or die "could not seek: $!";
print $fh "CP1";
print $fakefile;
The code above prints:
12345CP190
If I understand you correctly, if the file contents are
123456789
you want to change that to
1234CP156789
You cannot achieve that using modes supplied to open (regardless of programming language).
You need to open the source file and another temporary file (see File::Temp). Read up to the insertion point from the source and write the contents to the temporary file, write what you want to insert, then write the remainder of the source file to the temporary file. Finally, close the source and rename the temporary file to the source.
If you are going to do this using seek, both files must be opened in binary mode.
Here is an example using line oriented input and text mode:
#!/usr/bin/perl
use strict; use warnings;
use File::Temp qw( :POSIX );
my $source = 'test.test';
my $temp = tmpnam;
open my $source_h, '<', $source
or die "Failed to open '$source': $!";
open my $temp_h, '>', $temp
or die "Failed to open '$temp' for writing: $!";
while ( my $line = <$source_h> ) {
if ( $line =~ /^[0-9]+$/ ) {
$line = substr($line, 0, 5) . "CP1" . substr($line, 5);
}
print $temp_h $line;
}
close $temp_h
or die "Failed to close '$temp': $!";
close $source_h
or die "Failed to close '$source': $!";
rename $temp => $source
or die "Failed to rename '$temp' to '$source': $!";
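The difference between overwriting at a position (what seek followed by print does) and inserting at a position (what the temporary-file approach does) can be seen on a plain string. This is a minimal sketch using the four-argument form of substr; the sample data is made up for illustration.

```perl
use strict;
use warnings;

my $original = '123456789';

# Overwrite: replace the 3 characters starting at offset 5.
my $overwritten = $original;
substr($overwritten, 5, 3, 'CP1');

# Insert: replace 0 characters at offset 5, so the rest shifts right.
my $inserted = $original;
substr($inserted, 5, 0, 'CP1');

print "$overwritten\n";   # 12345CP19
print "$inserted\n";      # 12345CP16789
```

A file on disk only supports the first operation directly, which is why an insert needs the remainder of the file to be rewritten.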
This works for me:
use strict;
use warnings;
open( my $fh, '+<', 'foo.txt' ) or die $!;
seek( $fh, 3, 0 );
print $fh "WH00t?";
This is also a more "modern" use of open(), see http://perldoc.perl.org/functions/open.html
The file will be closed when $fh goes out of scope.
"Inserting" a string into a file can (mostly) be done in place. See the lightly used truncate built-in function.
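open my $fh, '+<', $file or die $!;
seek $fh, 5, 0 or die $!;
my $x = do { local $/; <$fh> }; # read everything after the 5th byte into $x
truncate $fh, 5 or die $!;
seek $fh, 5, 0 or die $!; # the read moved the position to the old end of file
print $fh "CP1";
print $fh $x;
close $fh or die $!;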
If your file is line or record oriented, you can insert lines or modify individual lines easily with the core module Tie::File. This allows the file to be treated as an array, so Perl string and array manipulation can be used to modify it. The file is not loaded into memory as a whole, so you can safely operate on huge files larger than your RAM with this method.
Here is an example:
use strict; use warnings;
use Tie::File;
#create the default .txt file:
open (my $out, '>', "nums.txt") or die $!;
while(<DATA>) { print $out "$_"; }
close $out or die $!;
tie my @data, 'Tie::File', "nums.txt" or die $!;
my $offset=5;
my $insert="INSERTED";
#insert in a string:
$data[0]=substr($data[0],0,$offset).$insert.substr($data[0],$offset)
if (length($data[0])>$offset);
#insert a new array element that becomes a new file line:
splice @data,$offset,0,join(':',split(//,$insert));
#insert vertically:
$data[$_]=substr($data[$_],0,$offset) .
substr(lc $insert,$_,1) .
substr($data[$_],$offset) for (0..length($insert));
untie @data; #close the file too...
__DATA__
123456789
234567891
345678912
456789123
567891234
678912345
789123456
891234567
912345678
Output:
12345iINSERTED6789
23456n7891
34567s8912
45678e9123
56789r1234
I:N:St:E:R:T:E:D
67891e2345
78912d3456
891234567
912345678
The file modifications with Tie::File are made in place and as the array is modified. You could use Tie::File just on the first line of your file to modify and insert as you requested. You can put sleep between the array mods and use tail -n +0 -f on the file and watch the file change if you wish...
Alternatively, if your file is of a reasonable size and you want to treat it as a string of characters, you can read the entire file into memory, do string operations on the data, then write the modified data back out. Consider:
use strict; use warnings;
#create the default .txt file:
open (my $out, '>', "nums.txt") or die $!;
while(<DATA>) { print $out "$_"; }
close $out or die $!;
my $data;
open (my $in, '<', "nums.txt") or die $!;
{ local $/=undef; $data=<$in>; }
close $in or die $!;
my $offset=5;
my $insert="INSERTED";
open ($out, '>', "nums.txt") or die $!;
print $out substr($data,0,$offset).$insert.substr($data,$offset);
close $out or die $!;
__DATA__
123456789
2
3
4
5
6
7
8
9
Output:
12345INSERTED6789
2
3
4
5
6
7
8
9
If you treat files as characters, beware that under Windows, files in text mode have a \r\n for a new line. That is two characters if opened in binary mode.