Decrypting a file read in from the command line - perl

I need to decrypt ROT-25, which I think I already have set up. The program also needs to decrypt a file named on the command line, and that's where my problem is. I'm guessing it would have to be run like perl filename anyfile.txt, but how do I set this up?
#!/Strawberry/perl/bin/perl
use v5.14;
my ($file1) = @ARGV;
open my $fh1, '<', $file1;
while (<$fh1>) {
sub encode_decode {
shift =~ tr/A-Za-z/Z-ZA-Yz-za-y/r;
}
my $enc = encode_decode();
my $dec = encode_decode($enc);
say "Enc: ", $enc;
say "Dec: ", $dec;
}
close $fh1;

There are several issues here. First, a single encode_decode() function that applies the same mapping in both directions only works for ROT13, which is its own inverse; ROT25 needs a separate decoder. To create your initial encoded file, you can use the Unix tr command:
echo "The secret of getting ahead is getting started -- Mark Twain" | tr "A-Za-z" "Z-ZA-Yz-za-y" > encoded_twain.txt
Then run your program on encoded_twain.txt.
Since you need to determine whether "the" appears anywhere in the text, reading the file line by line isn't your best bet. You're better off reading it in as a single string and then both decoding and testing that.
Your decoder has to do the opposite of what it does now (which is encoding).
Putting it all together, we get something like:
use English;
my $file_name = shift;
sub decode
{
return shift =~ tr/Z-ZA-Yz-za-y/A-Za-z/r;
}
open my $file_handle, '<', $file_name or die "Can't open $file_name: $!";
my $encoded = '';
{ # allow us to read entire file in as a string:
local $INPUT_RECORD_SEPARATOR = undef;
$encoded = <$file_handle>;
}
close $file_handle;
my $decoded = decode($encoded);
if ($decoded =~ m/(^| )the /m) # make this more robust!
{
print($decoded);
}
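As a rough sketch of why ROT-25 needs distinct encode and decode mappings (unlike ROT-13), the round trip below uses two hypothetical helpers, rot25_encode and rot25_decode, which are not part of the original code. It assumes Perl 5.14 or later for the /r modifier and works on an in-line string rather than a file:

```perl
use strict;
use warnings;

# Hypothetical helper names; tr///r (Perl 5.14+) returns a modified copy.
sub rot25_encode {
    my ($text) = @_;
    return $text =~ tr/A-Za-z/ZA-Yza-y/r;   # A->Z, B->A, ..., Z->Y
}

sub rot25_decode {
    my ($text) = @_;
    return $text =~ tr/ZA-Yza-y/A-Za-z/r;   # the inverse mapping
}

my $plain   = "The secret of getting ahead is getting started";
my $encoded = rot25_encode($plain);
my $decoded = rot25_decode($encoded);

print "$encoded\n";
print "$decoded\n";
print "decoded text contains 'the'\n" if $decoded =~ /\bthe\b/i;
```

Applying rot25_encode twice would shift by two letters instead of restoring the text, which is why one shared function is not enough here.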

Only a small change is needed: either declare a variable to hold the current line, or use $_:
open my $fh1, '<', $file1;
while ( my $line = <$fh1> ) {
my $dec = decode( $line );
# say "Dec: ", $dec;
}
close $fh1;
You can test the decoded lines for "the". If it is found, open the file again and print all lines.

Related

Perl print to separate files

I have a text file which lists a service, device and a filter, here I list 3 examples only:
service1 device04 filter9
service2 device01 filter2
service2 device10 filter11
I have written a Perl script that iterates through the file and should then print device=device filter=filter to a file named according to the service it belongs to, but if a service appears more than once, it should add the devices to the same file, separated by semicolons. Looking at the above example, I then need a result of:
service1.txt
device=device04 filter=filter9
service2.txt
device=device01 filter=filter2 ; device=device10 filter=filter11
Here is my code:
use strict;
use warnings qw(all);
open INPUT, "<", "file.txt" or die $!;
my @Input = <INPUT>;
foreach my $item(@Input) {
my ($serv, $device, $filter) = split(/ /, $item);
chomp ($serv, $device, $filter);
push my @arr, "device==$device & filter==$filter";
open OUTPUT, ">>", "$serv.txt" or die $!;
print OUTPUT join(" ; ", @arr);
close OUTPUT;
}
The problem I am having is that both service1.txt and service2.txt are created, but my results are all wrong, see my current result:
service1.txt
device==device04 filter==filter9
service2.txt
device==device04 filter==filter9 ; device==device01 filter==filter2device==device04 filter==filter9 ; device==device01 filter==filter2 ; device==device10 filter==filter11
I apologise, I know this is something stupid, but it has been a really long night and my brain cannot function properly I believe.
For each service to have its own file in which its data accumulates, you need to decide for each line which file to print it to.
Open a new service file whenever a service without one is encountered; this is feasible since there aren't many services, as clarified in a comment. This can be organized with a hash mapping service => filehandle.
use warnings;
use strict;
use feature 'say';
my $file = shift @ARGV || 'data.txt';
my %handle;
open my $fh, '<', $file or die "Can't open $file: $!";
while (<$fh>) {
my ($serv, $device, $filter) = split;
if (exists $handle{$serv}) {
print { $handle{$serv} } " ; device==$device & filter==$filter";
}
else {
open my $fh_out, '>', "$serv.txt" or do {
warn "Can't open $serv.txt: $!";
next;
};
print $fh_out "device==$device & filter==$filter";
$handle{$serv} = $fh_out;
}
}
say $_ '' for values %handle; # terminate the line in each file
close $_ for values %handle;
For clarity, the code prints almost the same thing in both branches, which could surely be made cleaner. This was tested only with the provided sample data and produces the desired output.
Note that when the filehandle has to be evaluated from an expression (here a hash element) rather than given as a simple scalar, it must be wrapped in { }. See this post, for example.
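A minimal sketch of the { } rule for filehandles stored in a hash. To keep it runnable without touching the disk, it assumes in-memory buffers in place of the per-service files; the %buffer hash is purely illustrative and not part of the answer's code:

```perl
use strict;
use warnings;

# In-memory buffers stand in for service1.txt and service2.txt
# (an assumption made so the sketch needs no real files).
my %buffer = (service1 => '', service2 => '');
my %handle;

for my $serv (sort keys %buffer) {
    open $handle{$serv}, '>', \$buffer{$serv}
        or die "Can't open in-memory file for $serv: $!";
}

# A hash element is an expression, so it goes inside { } after print.
print { $handle{service1} } "device==device04 & filter==filter9";
print { $handle{service2} } "device==device01 & filter==filter2";
print { $handle{service2} } " ; device==device10 & filter==filter11";

close $_ for values %handle;

print "$_: $buffer{$_}\n" for sort keys %buffer;
```

Without the braces, print $handle{service1} "..." is a syntax error, because only a bareword or a simple scalar variable is accepted in the filehandle slot.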
Comments on the original code (addressed in the code above)
Use lexical filehandles (my $fh) instead of typeglobs (FH)
Don't read the whole file at once unless there is a specific reason for that
split has nice defaults, split ' ', $_, where ' ' splits on whitespace and discards leading and trailing space as well. (And then there is no need to chomp in this case.)
Another option is to first collect data for each service, just as OP attempts, but again use a hash (service => arrayref/string with data) and print at the end. But I don't see a reason to not print as you go, since you'd need the same logic to decide when ; need be added.
Your code looks pretty perl4-ish, but that's not a problem. As MrTux has pointed out, you are confusing collection and fanning out of your data. I have refactored this to use a hash as an intermediate container with the service names as keys. Please note that this will not accumulate results across multiple calls (as it uses ">" and not ">>").
use strict;
use warnings qw(all);
use File::Slurp qw/read_file/;
my @Input = read_file('file.txt', chomp => 1);
my %store = (); # Global container
# Capture
foreach my $item(@Input) {
my ($serv, $device, $filter) = split(/ /, $item);
push @{$store{$serv}}, "device==$device & filter==$filter";
}
# Write out for each service file
foreach my $k(keys %store) {
open(my $OUTPUT, ">", "$k.txt") or die $!;
print $OUTPUT join(" ; ", @{$store{$k}});
close( $OUTPUT );
}

Calculate the length of a string in a specific file format with perl

I am trying to both learn Perl and use it in my research. I need to do a simple task: counting the number of sequences and their lengths in a file such as the following:
>sequence1
ATCGATCGATCG
>sequence2
AAAATTTT
>sequence3
CCCCGGGG
The output should look like this:
sequence1 12
sequence2 8
sequence3 8
Total number of sequences = 3
This is the code I have written which is very crude and simple:
#!/usr/bin/perl
use strict;
use warnings;
my ($input, $output) = @ARGV;
open(INFILE, '<', $input) or die "Can't open $input, $!\n"; # Open a file for reading.
open(OUTFILE, '>', $output) or die "Can't open $output, $!"; # Open a file for writing.
while (<INFILE>) {
chomp;
if (/^>/)
{
my $number_of_sequences++;
}else{
my length = length ($input);
}
}
print length, number_of_sequences;
close (INFILE);
I'd be grateful if you could give me some hints, for example, in the else block, when I use the length function, I am not sure what argument I should pass into it.
Thanks in advance
You're printing out just the last length, not each sequence length, and you want to catch the sequence names as you go:
#!/usr/bin/perl
use strict;
use warnings;
my ($input, $output) = @ARGV;
my ($lastSeq, $number_of_sequences) = ('', 0);
open(INFILE, '<', $input) or die "Can't open $input, $!\n"; # Open a file for reading.
# You never use OUTFILE
# open(OUTFILE, '>', $output) or die "Can't open $output, $!"; # Open a file for writing.
while (<INFILE>) {
chomp;
if (/^>(.+)/)
{
$lastSeq = $1;
$number_of_sequences++;
}
else
{
my $length = length($_);
print "$lastSeq $length\n";
}
}
print "Total number of sequences = $number_of_sequences\n";
close (INFILE);
Since you have indicated that you want feedback on your program, here goes:
my ($input, $output) = @ARGV;
open(INFILE, '<', $input) or die "Can't open $input, $!\n"; # Open a file for reading.
open(OUTFILE, '>', $output) or die "Can't open $output, $!"; # Open a file for writing.
Personally, I think when dealing with a simple input/output file relation, it is best to just use the diamond operator and standard output. That means that you read from the special file handle <>, commonly referred to as "the diamond operator", and you print to STDOUT, which is the default output. If you want to save the output in a file, just use shell redirection:
perl program.pl input.txt > output.txt
In this part:
my $number_of_sequences++;
you are creating a new variable. This variable will go out of scope as soon as you leave the block { .... }, in this case: the if-block.
In this part:
my length = length ($input);
you forgot the $ sigil. You are also using length on the file name, not the line you read. If you want to read a line from your input, you must use the file handle:
my $length = length(<INFILE>);
Although this will also include the newline in the length.
Here you have forgotten the sigils again:
print length, number_of_sequences;
And of course, this will not create the expected output. It will print something like sequence112.
Recommendations:
Use a while (<>) loop to read your input. This is the idiomatic method to use.
You do not need to keep a count of your input lines, there is a line count variable: $.. Though keep in mind that it will also count "bad" lines, like blank lines or headers. Using your own variable will allow you to account for such things.
Remember to chomp the line before finding out its length. Or use an alternative method that only counts the characters you want: my $length = ( <> =~ tr/ATCG// ) This will read a line, count the letters ATGC, return the count and discard the read line.
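To illustrate the tr counting idiom from these recommendations, here is a sketch that also tolerates sequences wrapped over several lines, which the sample data does not show. The sequence_lengths helper and the in-memory filehandle are assumptions made for the example, and it presumes that sequence names are unique:

```perl
use strict;
use warnings;

# Hypothetical helper: returns [name, length] pairs in file order.
sub sequence_lengths {
    my ($fh) = @_;
    my (@order, %length);
    my $name;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>(\S+)/) {
            $name = $1;
            push @order, $name;
            $length{$name} = 0;
        }
        elsif (defined $name) {
            # tr with an empty replacement list only counts; nothing is changed.
            $length{$name} += ($line =~ tr/ACGTacgt//);
        }
    }
    return map { [ $_, $length{$_} ] } @order;
}

# Sample data from the question, read through an in-memory filehandle.
my $fasta = ">sequence1\nATCGATCGATCG\n>sequence2\nAAAATTTT\n>sequence3\nCCCCGGGG\n";
open my $fh, '<', \$fasta or die "Can't open in-memory file: $!";
my @results = sequence_lengths($fh);
close $fh;

print "$_->[0] $_->[1]\n" for @results;
print "Total number of sequences = ", scalar(@results), "\n";
```

Because only the listed letters are counted, newlines and stray whitespace never inflate the length, so no separate chomp is needed for the count itself.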
Summary:
use strict;
use warnings; # always use these two pragmas
my $count;
while (<>) {
next unless /^>/; # ignore non-header lines
$count++; # increment counter
chomp;
my $length = (<> =~ tr/ATCG//); # get length of next line
s/^>(\S+)/$1 $length\n/; # remove > and insert length
} continue {
print; # print to STDOUT
}
print "Total number of sequences = $count\n";
Note the use of continue here, which will allow us to skip a line that we do not want to process, but that will still get printed.
And as I said above, you can redirect this to a file if you want.
For starters, you need to change your inner loop to this:
...
chomp;
if (/^>/)
{
$number_of_sequences++;
$sequence_name = $_;
}else{
print "$sequence_name ", length($_), "\n"; # length of the current line, not of the file name
}
...
Note the following:
The my declaration has been removed from $number_of_sequences
The sequence name is captured in the variable $sequence_name. It is used later when the next line is read.
To make the script run under strict mode, you can add my declarations for $number_of_sequences and $sequence_name outside of the loop:
my $sequence_name;
my $number_of_sequences = 0;
while (<INFILE>) {
...(as above)...
}
print "Total number of sequences: $number_of_sequences\n";
The my keyword declares a new lexically scoped variable - i.e. a variable which only exists within a certain block of code, and every time that block of code is entered, a new version of that variable is created. Since you want to have the value of $sequence_name carry over from one loop iteration to the next you need to place the my outside of the loop.
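The scoping difference can be seen in a few lines. This is only an illustration with made-up variable names ($inner_count, $outer_count) and an in-line list in place of the input file:

```perl
use strict;
use warnings;

my $outer_count = 0;          # declared once, survives every iteration
my @inner_seen;

for my $line ('>a', 'ACGT', '>b', 'TT') {
    my $inner_count = 0;      # a brand-new variable on every iteration
    if ($line =~ /^>/) {
        $inner_count++;
        $outer_count++;
    }
    push @inner_seen, $inner_count;
}

print "inner: @inner_seen\n";   # the inner counter never gets past 1
print "outer: $outer_count\n";  # the outer counter saw both header lines
```

The inner counter starts over each time the block is entered, so it can never accumulate a total; only the variable declared before the loop does.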
#!/usr/bin/perl
use strict;
use warnings;
my ($file, $line, $length, $tag);
my $count = 0; # flag: set to 1 once a sequence line has been measured
$file = $ARGV[0];
open (FILE, '<', $file) or die "can't open file $file: $!\n";
while (<FILE>){
$line=$_;
chomp $line;
if ($line=~/^>/){
$tag = $line;
}
else{
$length = length ($line);
$count=1;
}
if ($count==1){
print "$tag\t$length\n";
$count=0
}
}
close FILE;

how to extract substrings by knowing the coordinates

I am terribly sorry for bothering you with my problem in several questions, but I need to solve it...
I want to extract several substrings from a file which contains a string, using another file that holds the begin and end of each substring I want to extract.
The first file is like:
>scaffold30 24194
CTTAGCAGCAGCAGCAGCAGTGACTGAAGGAACTGAGAAAAAGAGCGAGCTGAAAGGAAGCATAGCCATTTGGGAGTGCCAGAGAGTTGGGAGG GAGGGAGGGCAGAGATGGAAGAAGAAAGGCAGAAATACAGGGAGATTGAGGATCACCAGGGAG.........
.................
(the string must be everything in the file except the first line), and the coordinates file is like:
44801988 44802104
44846151 44846312
45620133 45620274
45640443 45640543
45688249 45688358
45729531 45729658
45843362 45843490
46066894 46066996
46176337 46176464
.....................
my script is this:
my $chrom = $ARGV[0];
my $coords_file = $ARGV[1];
#finds subsequences: fasta files
open INFILE1, $chrom or die "Could not open $chrom: $!";
my $count = 0;
while(<INFILE1>) {
if ($_ !~ m/^>/) {
local $/ = undef;
my $var = <INFILE1>;
open INFILE, $coords_file or die "Could not open $coords_file: $!";
my @cline = <INFILE>;
foreach my $cline (@cline) {
print "$cline\n";
my @data = split('\t', $cline);
my $start = $data[0];
my $end = $data[1];
my $offset = $end - $start;
$count++;
my $sub = substr ($var, $start, $offset);
print ">conserved $count\n";
print "$sub\n";
}
close INFILE;
}
}
When I run it, it seems to do only one iteration and prints the start of the first file.
It seems like the foreach loop doesn't work.
substr doesn't seem to work either.
When I add an exit after printing $cline to check the loop, it prints all the lines of the coordinates file.
I am sorry if I become annoying, but I must finish it and I am a little bit desperate...
Thank you again.
This line
local $/ = undef;
changes $/ for the entire enclosing block, which includes the section where you read in your second file. $/ is the input record separator, which essentially defines what a "line" is (it is a newline by default, see perldoc perlvar for details). When you read from a filehandle using <>, $/ is used to determine where to stop reading. For example, the following program relies on the default line-splitting behavior, and so only reads until the first newline:
my $foo = <DATA>;
say $foo;
# Output:
# 1
__DATA__
1
2
3
Whereas this program reads all the way to EOF:
local $/;
my $foo = <DATA>;
say $foo;
# Output:
# 1
# 2
# 3
__DATA__
1
2
3
This means your @cline array gets only one element, which is a string containing the text of your entire coordinates file. You can see this using Data::Dumper:
use Data::Dumper;
print Dumper(\@cline);
Which in your case will output something like:
$VAR1 = [
'44801988 44802104
44846151 44846312
45620133 45620274
45640443 45640543
45688249 45688358
45729531 45729658
45843362 45843490
46066894 46066996
46176337 46176464
'
];
Notice how your array (technically an arrayref in this case), delineated by [ and ], contains only a single element, which is a string (delineated by single quotes) that contains newlines.
Let's walk through the relevant sections of your code:
while(<INFILE1>) {
if ($_ !~ m/^>/) {
# Enable localized slurp mode. Stays in effect until we leave the 'if'
local $/ = undef;
# Read the rest of INFILE1 into $var (from current line to EOF)
my $var = <INFILE1>;
open INFILE, $coords_file or die "Could not open $coords_file: $!";
# In list context, return each block until the $/ character as a
# separate list element. Since $/ is still undef, this will read
# everything until EOF into our first list element, resulting in
# a one-element array
my @cline = <INFILE>;
# Since @cline only has one element, the loop only has one iteration
foreach my $cline (@cline) {
As a side note, your code could be cleaned up a bit. The names you chose for your filehandles leave something to be desired, and you should probably use lexical filehandles anyway (and the three-argument form of open):
open my $chromosome_fh, "<", $ARGV[0] or die $!;
open my $coordinates_fh, "<", $ARGV[1] or die $!;
Also, you do not need to nest your loops in this case, it just makes your code more convoluted. First read the relevant parts of your chromosome file into a variable (named something more meaningful than var):
# Get rid of the `local $/` statement, we don't need it
my $chromosome;
while (<$chromosome_fh>) {
next if /^>/;
$chromosome .= $_;
}
Then read in your coordinates file:
my @cline = <$coordinates_fh>;
Or if you only need to use the contents of the coordinates file once, process each line as you go using a while loop:
while (<$coordinates_fh>) {
# Do something for each line here
}
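If you do want slurp mode for the sequence, one way to keep it from leaking into the coordinates read is to confine local $/ to a do block. A small sketch, using in-memory filehandles and made-up sample data so that it runs on its own:

```perl
use strict;
use warnings;

# Tiny stand-ins for the two input files (hypothetical sample data).
my $chromosome_text  = ">scaffold30 24\nCTTAGCAGCAGC\nAGCAGCAGTGAC\n";
my $coordinates_text = "0\t4\n4\t8\n8\t12\n";

open my $chromosome_fh, '<', \$chromosome_text
    or die "Can't open sequence data: $!";
my $header = <$chromosome_fh>;    # read and discard the '>' line

# $/ is undef only inside this do block, so only this read slurps.
my $chromosome = do { local $/; <$chromosome_fh> };
$chromosome =~ tr/\n//d;          # drop the line breaks inside the sequence
close $chromosome_fh;

open my $coordinates_fh, '<', \$coordinates_text
    or die "Can't open coordinate data: $!";
my @cline = <$coordinates_fh>;    # $/ is "\n" again: one element per line
close $coordinates_fh;

print "sequence length: ", length($chromosome), "\n";
print "coordinate lines: ", scalar(@cline), "\n";
```

The second read returns one list element per line because the localized value of $/ was restored as soon as the do block ended.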
As 'ThisSuitIsBlackNot' suggested, your code could be cleaned up a little. Here is a possible solution that may be what you want.
#!/usr/bin/perl
use strict;
use warnings;
my $chrom = $ARGV[0];
my $coords_file = $ARGV[1];
#finds subsequences: fasta files
open INFILE1, $chrom or die "Could not open $chrom: $!";
my $fasta;
<INFILE1>; # get rid of the first line - '>scaffold30 24194'
while(<INFILE1>) {
chomp;
$fasta .= $_;
}
close INFILE1 or die "Could not close '$chrom'. $!";
open INFILE, $coords_file or die "Could not open $coords_file: $!";
my $count = 0;
while(<INFILE>) {
my ($start, $end) = split;
# Or, should this be: my $offset = $end - ($start - 1);
# That would include the start fasta
my $offset = $end - $start;
$count++;
my $sub = substr ($fasta, $start, $offset);
print ">conserved $count\n";
print "$sub\n";
}
close INFILE or die "Could not close '$coords_file'. $!";
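On the off-by-one question raised in the comment above: the answer depends on the coordinate convention, which the question does not state. The sketch below assumes 1-based, inclusive coordinates; extract_region is a hypothetical helper, and the range check is an addition, not something from the original script:

```perl
use strict;
use warnings;

# Hypothetical helper: $start and $end are ASSUMED to be 1-based and inclusive.
sub extract_region {
    my ($sequence, $start, $end) = @_;
    # Refuse ranges that fall outside the sequence instead of letting
    # substr warn and return undef.
    return if $start < 1 || $end < $start || $end > length($sequence);
    # substr is 0-based, and the length must include both end points.
    return substr($sequence, $start - 1, $end - $start + 1);
}

my $fasta = 'ACGTACGTAC';
for my $pair ([2, 4], [1, 10], [9, 12]) {
    my ($start, $end) = @$pair;
    my $sub = extract_region($fasta, $start, $end);
    print "$start-$end: ", (defined $sub ? $sub : 'out of range'), "\n";
}
```

A range check like this may matter for the data in the question: if the 24194 in the header line is the scaffold length (an assumption), coordinates in the 44 million range lie far beyond the end of the string.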

How do I change values with perl and regex/sed inside a file?

I'm pretty sure I am doing something stupid and I apologize for this ahead of time. I have looked at the one-liners that were suggested elsewhere on similar searches and I like the idea of them, I'm just not sure how to apply because it's not a direct swap. And if the answer is that this can't be done, then that is fine and I will script around that.
The problem: I have log files I need to send through a parser that requires the dates to be in YYYY-MM-DD. The files can be saved this way; however, some people prefer them in YYYY/MM/DD for their own viewing and send those to me. I can modify one or two dates with sed and this works beautifully; however, when there are 2-3+ years in the files, it would be nice not to have to do it manually for each date.
My code (I have left the debugging commands in place):
use strict;
use File::Copy;
use Getopt::Std;
my %ARGS = ();
getopts('f:v', \%ARGS);
my $file = $ARGS{f};
&main();
sub main($)
{
open (FIN, "<$file") || die ("Cannot open file");
print "you opened the file\n";
while (<FIN>) {
my $line = $_;
if ($line =~ /(\d*)\/(\d*)\/(\d*) /i) {
#print "you are in the if";
my $year = $1;
my $month = $2;
my $day = $3;
print $line;
print "\nyou have year $1\n";
print "you have month $2\n";
print "you have day $3\n";
s/'($1\/$2\/$3)/$1-$2-$3'/;
}
}
close FIN;
}
I can see that the regex is getting the right values into my variables but the original line is not being replaced in the file.
Questions:
1) Should this be possible to do within the same file or do I need to output it to a different file? Looking at other answers, same file should be fine.
2) Does the file need to be opened in another way or somehow set to be written to rather than merely running the replace command like I do with sed? <--I am afraid that the failure may be in here somewhere simple that I am overlooking.
Thanks!
You never write to the file. With sed, you'd use -i, and you can do exactly the same in Perl.
perl -i -pe's{(\d{4})/(\d{2})/(\d{2})}{$1-$2-$3}g' file
Or with a backup:
perl -i~ -pe's{(\d{4})/(\d{2})/(\d{2})}{$1-$2-$3}g' file
That's equivalent to
local $^I = ''; # Or for the second: local $^I = '~';
while (<>) {
s{(\d{4})/(\d{2})/(\d{2})}{$1-$2-$3}g;
print;
}
If you didn't want to rely on $^I, you'd have to replicate its behaviour.
for my $qfn (@ARGV) {
open(my $fh_in, '<', $qfn)
or do { warn("Can't open $qfn: $!\n"); next; };
unlink($qfn)
or do { warn("Can't overwrite $qfn: $!\n"); next; };
open(my $fh_out, '>', $qfn)
or do { warn("Can't create $qfn: $!\n"); next; };
while (<$fh_in>) {
s{(\d{4})/(\d{2})/(\d{2})}{$1-$2-$3}g;
print $fh_out $_;
}
}
perl -pi.bak -e 's|(\d{4})/(\d\d)/(\d\d)|$1-$2-$3|g;' input
Replace input with your log file name. A backup file input.bak will be created in case you ever need the original data.
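Before running an in-place edit on real logs, it can help to try the substitution on a few sample lines. A sketch, assuming Perl 5.14+ for the /r modifier; normalize_dates and the sample lines are made up for illustration, and the \b anchors are an extra precaution that the one-liners above do not use:

```perl
use strict;
use warnings;

# Hypothetical helper; s///r returns the changed copy and leaves $line alone.
sub normalize_dates {
    my ($line) = @_;
    return $line =~ s{\b(\d{4})/(\d{2})/(\d{2})\b}{$1-$2-$3}gr;
}

my @log = (
    "2013/12/31 23:59 shutdown\n",
    "2014/01/01 00:01 boot, last seen 2013/12/31\n",
    "path /usr/12/bin untouched\n",
);

print normalize_dates($_) for @log;
```

The /g flag is what handles several dates on one line; the question's script had no /g and never wrote the result back, so nothing changed in the file.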

How do I read the contents of a small text file into a scalar in Perl?

I have a small text file that I'd like to read into a scalar variable exactly as it is in the file (preserving line separators and other whitespace).
The equivalent in Python would be something like
buffer = ""
try:
file = open("fileName", 'rU')
try:
buffer += file.read()
finally:
file.close()
except IOError:
buffer += "The file could not be opened."
This is for simply redisplaying the contents of the file on a web page, which is why my error message is going into my file buffer.
From the Perl Cookbook:
my $filename = 'file.txt';
open( FILE, '<', $filename ) or die 'Could not open file: ' . $!;
undef $/;
my $whole_file = <FILE>;
I would localize the changes though:
my $whole_file = '';
{
local $/;
$whole_file = <FILE>;
}
As an alternative to what Alex said, you can install the File::Slurp module (cpan -i File::Slurp from the command line) and use this:
use File::Slurp;
# Read data into a variable
my $buffer = read_file("fileName");
# or read data into an array
my @buffer = read_file("fileName");
Note that this dies (well... croaks, but that's just the proper way to call die from a module) on errors, so you may need to run this in an eval block to catch any errors.
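The eval approach might look roughly like this. To stay runnable without File::Slurp installed, the sketch uses a hypothetical slurp_or_die function as a stand-in for any reader that dies on failure, and a path that is assumed not to exist:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a slurping function that dies on failure.
sub slurp_or_die {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!\n";
    local $/;                 # slurp mode, restored when the sub returns
    my $content = <$fh>;
    close $fh;
    return $content;
}

# eval returns undef if the block dies; the error text is then in $@.
my $buffer = eval { slurp_or_die('no-such-dir/fileName') };
$buffer = 'The file could not be opened.' unless defined $buffer;

print "$buffer\n";
```

This mirrors the Python version in the question: the error message ends up in the same buffer that would otherwise hold the file contents.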
If I don't have File::Slurp or Perl6::Slurp nearby, then I normally go with:
open my $fh, '<', 'file.txt' or die $!;
my $whole_file = do { local $/; <$fh> };
There is a discussion of the various ways to read a file here.
I don't have enough reputation to comment, so I apologize for making this another post.
@Harold Bamford: $/ should not be an obscure variable to a Perl programmer. A beginner may not know it, but he or she should learn it. The join method is a poor choice for the reasons stated in the article linked by hackingwords above. Here's the relevant quotation from the article:
That needlessly splits the input file into lines (join provides a list context to <FH>) and then joins up those lines again. The original coder of this idiom obviously never read perlvar and learned how to use $/ to allow scalar slurping.
You could do something like:
$data_file="somefile.txt";
open(DAT, $data_file);
@file_data = <DAT>;
close(DAT);
That'll give you the file contents in an array, that you can use for whatever you want, for example, if you wanted each individual line, you could do something like:
foreach $LINE (@file_data)
{
dosomethingwithline($LINE);
}
For a full usage example:
my $result;
$data_file = "somefile.txt";
my $opened = open(DAT, $data_file);
if (!$opened)
{
$result = "Error.";
}
else
{
@lines = <DAT>;
foreach $LINE (@lines)
{
$result .= $LINE;
}
close(DAT);
}
Then you can use $result however you need. Note: This code is untested, but it should give you an idea.
I'd tweak draegtun's answer like this, to make it do exactly what was being asked:
my $buffer;
if ( open my $fh, '<', 'fileName' ) {
$buffer = do { local $/; <$fh> };
close $fh;
} else {
$buffer = 'The file could not be opened.';
}
Just join all lines together into a string:
open(F, $file) or die $!;
my $content = join("", <F>);
close F;
(It was previously suggested to use join "\n" but that will add extra newlines. Each line already has a newline at its end when it's read.)