I am writing a code to create files named after one column in an array, and then have all the common values of another column within it. I have successfully done this, but now I would like to eliminate the white space in-between the file names and convert it into underscores. How do I do that?
#!/usr/local/bin/perl
use strict;
my #traitarray;
my $traitarray;
my $input ;
my %traithash ;
my $t_out ;
my $TRAIT;
my $SNPS;
open ($input, "gwas_catalog_v1.0-downloaded_2015-07-08") || die () ;
while(<$input>) {
#traitarray = split (/\t/);
$TRAIT = $traitarray[7];
$SNPS = $traitarray[21];
if (!exists $traithash {$TRAIT}) {
open ($t_out, ">outputFiles/".$TRAIT.".txt");
print ($t_out "$SNPS\n");
$traithash {$TRAIT} = 1 ;
push (#traitarray, $TRAIT) ;
}
else {
print $t_out "$SNPS\n";
}
}
foreach ($traitarray) {
close "$TRAIT.txt";
}
I have tried looking for an answer but many of the questions either include something else as well, or how to go about this within the bash terminal, something I am not comfortable with yet, as I am still new to coding.
The file is 10947980 lines, and has 33 columns.
You're presumably talking about the file you open with this
open ($t_out, ">outputFiles/".$TRAIT.".txt")
You can do that using the transliterate operator first tr/ /_/ and your open call would be better written like this
my $outfile = "outputFiles/$TRAIT.txt";
$outfile =~ tr/ /_/;
open my $t_out, '>', $outfile or die qq{Unable to open "$outfile" for output: $!};
I assume you're referring to the value in the $TRAIT var which is then used as part of the new filename.
$TRAIT = $traitarray[7];
$TRAIT =~ s/\s+/_/g;
Related
Here I had tried to read the file using standard input. In my case standard input fails to print the flower bracket contents.
My code:
#!/usr/local/bin/perl
use strict;
use warnings;
my $file = 'file.txt';
open my $fh, "<", $file
or die "Could not open '$file': $!";
chomp(my #files = <$fh>);
close $fh
or die "Coould not close '$file' $!";
while (my $stdin = <>) {
chomp $stdin;
if ( grep { $stdin eq $_ } #files ) {
print "#files\n";
last
}
else {
print "There is no word in the $file\n";
last;
}
}
File.txt:
{data1}
data2
data3
{data4}
File Execution:
perl t.pl
data1
There is no word in the file.txt
Looks like this question has been edited since I first looked at it a few hours ago. Originally, the crucial line looked like this:
if ( grep { $stdin eq $_ } #files ) {
That was never going to work because you are giving it "data1" as input and none of the lines matches that string using eq. You have a line that contains "data1", but as it is surrounded by "{" and "}", the strings are different - 'data1 eq {data1} is obviously false.
You have now changed that line to:
if ( grep { $stdin && $_ } #files ) {
And that's very strange. This check asks the question "do both $stdin and $_ contain true values?". And that will almost certainly always be true. I'm really not sure what that change was supposed to achieve.
Your question doesn't actually say what you're trying to do here. But I'm guessing that you want to match if any of the lines contains the string that is entered (but it's ok if it doesn't make up the entire line). In that case, you want a regex check and your line of code should be:
if ( grep { /\Q$stdin/ } #files ) {
Note: I've added the \Q as suggested in the comments. This is a good idea as it prevents the strings in #files being interpreted as containing regex metacharacters.
I need the decrypting to be in ROT-25 which i think I already have set up. Next it needs to decrypt a file read in from the command line and that's where my problem is. I'm guessing it would have to be run like perl filename anyfile.txt but how do i set this up?
#!/Strawberry/perl/bin/perl
use v5.14;
my ($file1) = #ARGV;
open my $fh1, '<', $file1;
while (<$fh1>) {
sub encode_decode {
shift =~ tr/A-Za-z/Z-ZA-Yz-za-y/r;
}
my $enc = encode_decode();
my $dec = encode_decode($enc);
say "Enc: ", $enc;
say "Dec: ", $dec;
}
close $fh1;
There are several issue here. First, a function that uses the same logic to encode_decode() doesn't make sense for ROT25, only for ROT13. To create your initial encoded file, you can use Unix to do it:
echo "The secret of getting ahead is getting started -- Mark Twain" | tr "A-Za-z" "Z-ZA-Yz-za-y" > encoded_twain.txt
then run your program on encoded_twain.txt
Since you need to determine if "the" appears anywere in the text, reading the file in line by line isn't your best bet. You're better off reading it in as a single string and then both decoding and testing that.
Your decoder has to do the opposite of what it does now (encoding.)
Putting it all together, we get something like:
use English;
my $file_name = shift;
sub decode
{
return shift =~ tr/Z-ZA-Yz-za-y/A-Za-z/r;
}
open my $file_handle, '<', $file_name;
my $encoded = '';
{ # allow us to read entire file in as a string:
local $INPUT_RECORD_SEPARATOR = undef;
$encoded = <$file_handle>;
}
close $file_handle;
my $decoded = &decode($encoded);
if ($decoded =~ m/(^| )the /m) # make this more robust!
{
print($decoded);
}
Only a small change, either declare a variable for holding the current line (or use $_):
open my $fh1, '<', $file1;
while ( my $line = <$fh1> ) {
my $dec = decode( $line );
# say "Dec: ", $dec;
}
close $fh1;
You can test the decoded lines for "the". If it is found, open the file again and print all lines.
I am a beginner with Perl and I want to merge the content of two text files.
I have read some similar questions and answers on this forum, but I still cannot resolve my issues
The first file has the original ID and the recoded ID of each individual (in the first and fourth columns)
The second file has the recoded ID and some information on some of the individuals (in the first and second columns).
I want to create an output file with the original, recoded and information of these individuals.
This is the perl script I have created so far, which is not working.
If anyone could help it would be very much appreciated.
use warnings;
use strict;
use diagnostics;
use vars qw( #fields1 $recoded $original $IDF #fields2);
my %columns1;
open (FILE1, "<file1.txt") || die "$!\n Couldn't open file1.txt\n";
while ($_ = <FILE1>)
{
chomp;
#fields1=split /\s+/, $_;
my $recoded = $fields1[0];
my $original = $fields1[3];
my %columns1 = (
$recoded => $original
);
};
open (FILE2, "<file2.txt") || die "$!\n Couldnt open file2.txt \n";
for ($_ = <FILE2>)
{
chomp;
#fields2=split /\s+/, $_;
my $IDF= $fields2[0];
my $F=$fields2[1];
my %columns2 = (
$F => $IDF
);
};
close FILE1;
close FILE2;
open (FILE3, ">output.txt") ||die "output problem\n";
for (keys %columns1) {
if (exists ($columns2{$_}){
print FILE3 "$_ $columns1{$_}\n"
};
}
close FILE3;
One problem is with scoping. In your first loop, you have a my in front of $column1 which makes it local to the loop and will not be in scope when you next the loop. So the %columns1 (which is outside of the loop) does not have any values set (which is what I suspect you want to set). For the assignment, it would seem to be easier to have $columns1{$recorded} = $original; which assigns the value to the key for the hash.
In the second loop you need to declare %columns2 outside of the loop and possibly use the above assignment.
For the third loop, in the print you just need add $columns2{$_} in front part of the string to be printed to get the original ID to be printed before the recorded ID.
Scope:
The problem is with scope of the hash variables you have defined. The scope of the variable is limited to the loop inside which the variable has been defined.
In your code, since %columns1 and %columns2 are used outside the while loops. Hence, they should be defined outside the loops.
Compilation error : braces not closed properly
Also, in the "if exists" part, the open-and-closed braces symmetry is affected.
Here is your code with the required corrections made:
use warnings;
use strict;
use diagnostics;
use vars qw( #fields1 $recoded $original $IDF #fields2);
my (%columns1, %columns2);
open (FILE1, "<file1.txt") || die "$!\n Couldn't open CFC_recoded.txt\n";
while ($_ = <FILE1>)
{
chomp;
#fields1=split /\s+/, $_;
my $recoded = $fields1[0];
my $original = $fields1[3];
%columns1 = (
$recoded => $original
);
}
open (FILE2, "<file2.txt") || die "$!\n Couldnt open CFC_F.xlsx \n";
for ($_ = <FILE2>)
{
chomp;
#fields2=split /\s+/, $_;
my $IDF= $fields2[0];
my $F=$fields2[1];
%columns2 = (
$F => $IDF
);
}
close FILE1;
close FILE2;
open (FILE3, ">output.txt") ||die "output problem\n";
for (keys %columns1) {
print FILE3 "$_ $columns1{$_} \n" if exists $columns2{$_};
}
close FILE3;
am fairly new to the perl scripting and need some help. below is my query:
I have a file which has contents like below:
AA ABC 0 0
line1
line2
...
AA XYZ 1 1
line..
line..
AA GHI 2 2
line..
line...
Now I would like get all the lines between those lines which have the starting string/pattern "AA" and write them to files ABC.txt, XYZ.txt, GHI.txt, repsectively including the line AA*, for examples ABC.txt should look like
AA ABC 0 0
line1
line2...
and XYZ.txt should look like
AA XYZ 1 1
line..
line..
Hope am clear in this question and any help regarding this is much appreciated.
Thanks,
Sandy
I presume you're asking for an algorithm since you didn't specify what you needed help with.
Declare a file handle for use for output.
While you haven't reached the end of the input file,
Read a line.
If it's a header line,
Parse it.
Determine file name.
(Re)open the output file.
Print the line to the output file handle.
Lest you be tempted to use one of the poor solutions that have been posted since I posted the above, here's the code:
my $fh;
while (<>) {
if (my ($fn) = /^AA\s+(\S+)/) {
$fn .= '.txt';
open($fh, '>', $fn)
or die("Can't create file \"$fn\": $!\n");
}
print $fh $_;
}
Possible improvements, all of which are easy to add:
Check for duplicate headers. (if -e $fn is one way)
Check for data before the first header. (if !$fh is one way)
You just need to keep one file open at a time... When a line matches XYZ, then you open your XYZ.txt file and output the line. You keep that file open (let's just say it's the handle CURRENT_FILE) and output each successive line to it until you match a new header line. Then you close the current file and open another one.
My Perl is a extremely rusty, so I don't think I can provide code that compiles, but essentially it's something close to this.
my $current_name = "";
foreach my $line (<INPUT>)
{
my($name) = $line =~ /^AA (\w+)/;
if( $name ne $current_name ) {
close(CURRENT_FILE) if $current_name ne "";
open(CURRENT_FILE, ">>", "$name.txt") || die "Argh\n";
$current_name = $name;
}
next if $current_name eq "";
print CURRENT_FILE $line;
}
close(CURRENT_FILE) if $current_name ne "";
What do you think about this one?
1: Get contents from the file (maybe using File::Slurp's read_file) and save to a scalar.
use File::Slurp qw(read_file write_file);
my $contents = read_file($filename);
2: Have a regex pattern matching similar to this:
my #file_rows = ($contents ~= /(AA\s[A-Z]{3}\s+\d+\s+\w*)/);
3: If column 2 values are always unique throughout the file:
foreach my $file_row (#file_rows) {
my #values = split(' ', $file_row, 3);
write_file($values[1] . ".txt", $file_row);
}
3: Otherwise: Split the row values. Store them to a hash using the second column as the key. Write data to output files using the hash.
my %hash;
foreach my $file_row (#file_rows) {
my #values = split(' ', $file_row, 3);
if (defined $hash{$value[1]}) {
$hash{$values[1]} .= $file_row;
} else {
$hash{$values[1]} = $file_row;
}
}
foreach my $key (keys %hash) {
write_file($key .'txt', $hash{$key});
}
Here's an option that looks for the pattern matching the start of each record. When found, it loops through the data file's lines and builds a record until it finds the same pattern again or eof, then that record is written to a file. It does not check to see if the file already exists before writing to it, so it will replace ABC.txt if it already exists:
use strict;
use warnings;
my $dataFile = 'data.txt';
my $nextLine = '';
my $recordRegex = qr/^AA\s+(\S+)\s+\d+\s+\d+/;
open my $inFH, '<', $dataFile or die $!;
RECORD: while ( my $line = <$inFH> ) {
my $record = $nextLine . $line;
if ( $record =~ $recordRegex ) {
my $fileName = $1 . '.txt';
while ( $nextLine = <$inFH> ) {
if ( $nextLine =~ $recordRegex or eof $inFH ) {
$record .= $nextLine if eof $inFH;
open my $outFH, '>', $fileName or die $!;
print $outFH $record;
close $outFH;
next RECORD;
}
$record .= $nextLine;
}
}
}
close $inFH;
Hope this helps!
Edit: This code replaces the original that was problematic. Thank you, amon, for reviewing the original code.
Ive been trying to compare lines between two files and matching lines that are the same.
For some reason the code below only ever goes through the first line of 'text1.txt' and prints the 'if' statement regardless of if the two variables match or not.
Thanks
use strict;
open( <FILE1>, "<text1.txt" );
open( <FILE2>, "<text2.txt" );
foreach my $first_file (<FILE1>) {
foreach my $second_file (<FILE2>) {
if ( $second_file == $first_file ) {
print "Got a match - $second_file + $first_file";
}
}
}
close(FILE1);
close(FILE2);
If you compare strings, use the eq operator. "==" compares arguments numerically.
Here is a way to do the job if your files aren't too large.
#!/usr/bin/perl
use Modern::Perl;
use File::Slurp qw(slurp);
use Array::Utils qw(:all);
use Data::Dumper;
# read entire files into arrays
my #file1 = slurp('file1');
my #file2 = slurp('file2');
# get the common lines from the 2 files
my #intersect = intersect(#file1, #file2);
say Dumper \#intersect;
A better and faster (but less memory efficient) approach would be to read one file into a hash, and then search for lines in the hash table. This way you go over each file only once.
# This will find matching lines in two files,
# print the matching line and it's line number in each file.
use strict;
open (FILE1, "<text1.txt") or die "can't open file text1.txt\n";
my %file_1_hash;
my $line;
my $line_counter = 0;
#read the 1st file into a hash
while ($line=<FILE1>){
chomp ($line); #-only if you want to get rid of 'endl' sign
$line_counter++;
if (!($line =~ m/^\s*$/)){
$file_1_hash{$line}=$line_counter;
}
}
close (FILE1);
#read and compare the second file
open (FILE2,"<text2.txt") or die "can't open file text2.txt\n";
$line_counter = 0;
while ($line=<FILE2>){
$line_counter++;
chomp ($line);
if (defined $file_1_hash{$line}){
print "Got a match: \"$line\"
in line #$line_counter in text2.txt and line #$file_1_hash{$line} at text1.txt\n";
}
}
close (FILE2);
You must re-open or reset the pointer of file 2. Move the open and close commands to within the loop.
A more efficient way of doing this, depending on file and line sizes, would be to only loop through the files once and save each line that occurs in file 1 in a hash. Then check if the line was there for each line in file 2.
If you want the number of lines,
my $count=`grep -f [FILE1PATH] -c [FILE2PATH]`;
If you want the matching lines,
my #lines=`grep -f [FILE1PATH] [FILE2PATH]`;
If you want the lines which do not match,
my #lines = `grep -f [FILE1PATH] -v [FILE2PATH]`;
This is a script I wrote that tries to see if two file are identical, although it could easily by modified by playing with the code and switching it to eq. As Tim suggested, using a hash would probably be more effective, although you couldn't ensure the files were being compared in the order they were inserted without using a CPAN module (and as you can see, this method should really use two loops, but it was sufficient for my purposes). This isn't exactly the greatest script ever, but it may give you somewhere to start.
use warnings;
open (FILE, "orig.txt") or die "Unable to open first file.\n";
#data1 = ;
close(FILE);
open (FILE, "2.txt") or die "Unable to open second file.\n";
#data2 = ;
close(FILE);
for($i = 0; $i < #data1; $i++){
$data1[$i] =~ s/\s+$//;
$data2[$i] =~ s/\s+$//;
if ($data1[$i] ne $data2[$i]){
print "Failure to match at line ". ($i + 1) . "\n";
print $data1[$i];
print "Doesn't match:\n";
print $data2[$i];
print "\nProgram Aborted!\n";
exit;
}
}
print "\nThe files are identical. \n";
Taking the code you posted, and transforming it into actual Perl code, this is what I came up with.
use strict;
use warnings;
use autodie;
open my $fh1, '<', 'text1.txt';
open my $fh2, '<', 'text2.txt';
while(
defined( my $line1 = <$fh1> )
and
defined( my $line2 = <$fh2> )
){
chomp $line1;
chomp $line2;
if( $line1 eq $line2 ){
print "Got a match - $line1\n";
}else{
print "Lines don't match $line1 $line2"
}
}
close $fh1;
close $fh2;
Now what you may really want is a diff of the two files, which is best left to Text::Diff.
use strict;
use warnings;
use Text::Diff;
print diff 'text1.txt', 'text2.txt';