Modifying CSV file and Preserving Order - perl

The question that follows is a made-up, simplified example of a more complex problem that I'm trying to solve. I would like to preserve the structure of the code, especially the use of the %hash that stores the outcome for each patient, but I do not need to read the data file into memory (although I cannot find a way of reading my CSV data file line by line from the end).
My sample data is made up of events that occur to patients. A patient can be added to the study (Event=B), die (Event=D), or exit the study (Event=F). Death and Exit are the only two possible outcomes for each patient.
For each event I have the date of occurrence (in hours from a given point in time), the unique ID number of the patient, the event, and the Outcome (a field initially set to 0 for every patient).
I'm trying to write code that will change the input file by recording, next to each addition of a new patient, his eventual outcome (death or exit).
In order to do so, I read the file from the end, and whenever I encounter the death or exit of a patient, I populate a hash that matches his patient ID with his outcome. When I encounter an event telling me that a new patient has been added to the study, I match his ID against those in the hash and change the value of "Outcome" from 0 to either D or F.
I have been able to write code that reads the file from the bottom and then creates a new, modified file with the updated values for Outcome. The problem is that since I read the input file from bottom to top and print each line after reading it, the output file is in reverse order, and I do not know how to change this. Also, ideally I don't want to create a new file but would like to simply modify the input one. However, I have failed with every attempt to do so.
Sample data:
Data,PatientNumber,Event,Outcome
25201027,562962838335407,B,0
25201028,562962838335408,B,0
25201100,562962838335407,D,0
25201128,562962838335408,F,0
My code:
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;

open (my $fh_input, "<", "mini_test2.csv")
    or die "cannot open > mini_test2.csv: $!";
my @lines = <$fh_input>;
close $fh_input;

open (my $fh_output, ">>", "Revised_mini_test2.csv")
    or die "cannot open > Revised_mini_test2.csv: $!";

my $length = scalar(@lines);
my %outcome;
my @input_variables;

for (my $i = 1; $i < @lines; $i++) {
    chomp($lines[$length - $i]);
    @input_variables = split(/,/, $lines[$length - $i]);
    if ($input_variables[2] eq "D" || $input_variables[2] eq "F") {
        $outcome{$input_variables[1]} = $input_variables[2];
        my $line = join(",", @input_variables);
        print $fh_output $line . "\n";
    }
    elsif ($input_variables[2] eq "B") {
        $input_variables[3] = $outcome{$input_variables[1]};
        my $line = join(",", @input_variables);
        print $fh_output $line . "\n";
    }
    else {
        # necessary since the actual data has many more possible "Events"
        my $line = join(",", @input_variables);
        print $fh_output $line . "\n";
    }
}
close $fh_output;
EDIT: desired output should be
Data,PatientNumber,Event,Outcome
25201027,562962838335407,B,D
25201028,562962838335408,B,F
25201100,562962838335407,D,0
25201128,562962838335408,F,0
Also, an additional complication is that a patient's unique ID gets re-used after that patient exits the study. This means that I cannot do a first pass to store the outcome for each patient and a second one to update the values of Outcome.
EDIT 2: let me clarify that when I say that each patient has a "unique ID" I mean that there cannot be two patients with the same ID in the study at the same time. However, if a patient exits the study, his ID gets re-used.

Update
I have just read your additional information that patient numbers are re-used once they exit the study. Why you would design a system like that I don't know, but there it is.
It becomes far harder to write something straightforward without reading the file into an array, so that's what I have done here.
use strict;
use warnings;
use 5.010;
use autodie;

open my $fh, '<', 'mini_test2.csv';

my @data;
while ( <$fh> ) {
    chomp;
    push @data, [ split /,/ ];
}

my %outcome;

for ( my $i = $#data; $i > 0; --$i ) {
    my ($patient_number, $event) = @{$data[$i]}[1,2];
    if ( $event =~ /[DF]/ ) {
        $outcome{$patient_number} = $event;
    }
    elsif ( $event =~ /[B]/ ) {
        $data[$i][3] = delete $outcome{$patient_number} // 0;
    }
}

print join(',', @$_), "\n" for @data;
output
Data,PatientNumber,Event,Outcome
25201027,562962838335407,B,D
25201028,562962838335408,B,F
25201100,562962838335407,D,0
25201128,562962838335408,F,0
There are a few ways to approach this. I have chosen to take two passes through the file, first accumulating the outcome for each patient in a hash, and then replacing all the Outcome fields in the B records.
use strict;
use warnings;
use 5.010;
use autodie;
use Fcntl ':seek';

my %outcome;

open my $fh, '<', 'mini_test2.csv';

<$fh>;    # Drop header

while ( <$fh> ) {
    chomp;
    my @fields = split /,/;
    my ($patient_number, $event) = @fields[1,2];
    if ( $event =~ /[DF]/ ) {
        $outcome{$patient_number} = $event;
    }
}

seek $fh, 0, SEEK_SET;    # Rewind
print scalar <$fh>;       # Copy header

while ( <$fh> ) {
    chomp;
    my @fields = split /,/;
    my ($patient_number, $event) = @fields[1,2];
    if ( $event !~ /[DF]/ ) {
        $fields[3] = $outcome{$patient_number} // 0;
    }
    print join(',', @fields), "\n";
}
output
Data,PatientNumber,Event,Outcome
25201027,562962838335407,B,D
25201028,562962838335408,B,F
25201100,562962838335407,D,0
25201128,562962838335408,F,0

What we can do is, instead of printing out each line as we process it, write it back to the array of lines. Then we can just print them all out at the end.
for (my $i = $#lines; $i >= 0; $i--)
{
    chomp $lines[$i];
    @input_variables = split /,/, $lines[$i];
    if ($input_variables[2] eq "D" || $input_variables[2] eq "F")
    {
        $outcome{$input_variables[1]} = $input_variables[2];
    }
    elsif ($input_variables[2] eq "B")
    {
        $input_variables[3] = $outcome{$input_variables[1]};
    }
    $lines[$i] = join ",", @input_variables;
}
$, = "\n";    # Make the list separator for printing a newline.
print $fh_output @lines;
As for the second question, about modifying the original file: it is possible to open a file for both reading and writing using the modes "+<", "+>", or "+>>". Don't do this! It is error-prone, because you can only overwrite existing data in place, character by character.
The standard way to "modify" an existing file is to rename it, read from the renamed file, write to a new file with the original name, and delete the temp file.
my $file_name = "mini_test2.csv";
my $tmp_file_name = $file_name . ".tmp";

rename $file_name, $tmp_file_name
    or die "cannot rename $file_name: $!";

open (my $fh_input, "<", $tmp_file_name)
    or die "cannot open $tmp_file_name: $!";
open (my $fh_output, ">", $file_name)
    or die "cannot open $file_name: $!";

# Your code to process the data.

close $fh_input;
close $fh_output;

# Delete the temp file.
unlink $tmp_file_name;
But in your case, you slurp all of the data into memory right away, so you can just open the file for writing, which clobbers the existing file:
open (my $fh_output, ">", "mini_test2.csv")
or die "cannot open > mini_test2.csv: $!";

Related

Perl print to separate files

I have a text file which lists a service, device and a filter, here I list 3 examples only:
service1 device04 filter9
service2 device01 filter2
service2 device10 filter11
I have written a Perl script that iterates through the file and should then print device=device filter=filter to a file named according to the service it belongs to, but if a string contains a duplicate filter, it should add the devices to the same file, separated by semicolons. Looking at the above example, I then need a result of:
service1.txt
device=device04 filter=filter9
service2.txt
device=device01 filter=filter2 ; device=device10 filter=filter11
Here is my code:
use strict;
use warnings qw(all);

open INPUT, "<", "file.txt" or die $!;
my @Input = <INPUT>;

foreach my $item (@Input) {
    my ($serv, $device, $filter) = split(/ /, $item);
    chomp ($serv, $device, $filter);
    push my @arr, "device==$device & filter==$filter";
    open OUTPUT, ">>", "$serv.txt" or die $!;
    print OUTPUT join(" ; ", @arr);
    close OUTPUT;
}
The problem I am having is that both service1.txt and service2.txt are created, but my results are all wrong, see my current result:
service1.txt
device==device04 filter==filter9
service2.txt
device==device04 filter==filter9 ; device==device01 filter==filter2device==device04 filter==filter9 ; device==device01 filter==filter2 ; device==device10 filter==filter11
I apologise, I know this is something stupid, but it has been a really long night and my brain cannot function properly I believe.
For each service to have its own file where its data accumulates, you need to work out, for each line, which file it should be printed to.
Then open a new service file whenever a service without one is encountered; this is feasible since there aren't many of them, as clarified in a comment. This can be organized with a hash of service => filehandle.
use warnings;
use strict;
use feature 'say';

my $file = shift @ARGV || 'data.txt';

my %handle;

open my $fh, '<', $file or die "Can't open $file: $!";

while (<$fh>) {
    my ($serv, $device, $filter) = split;

    if (exists $handle{$serv}) {
        print { $handle{$serv} } " ; device==$device & filter==$filter";
    }
    else {
        open my $fh_out, '>', "$serv.txt" or do {
            warn "Can't open $serv.txt: $!";
            next;
        };
        print $fh_out "device==$device & filter==$filter";
        $handle{$serv} = $fh_out;
    }
}

say {$_} '' for values %handle;    # terminate the line in each file
close $_ for values %handle;
For clarity, the code prints almost the same string in both branches, which could surely be factored out more cleanly. It was tested only with the provided sample data and produces the desired output.
Note that when the filehandle comes from an expression that must be evaluated, we need to wrap it in { }. See this post, for example.
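A minimal illustration of that syntax (with hypothetical filehandles):
print { $handle{$serv} } "text\n";   # the expression in the { } block is evaluated to a filehandle
print { $fh_out } "text\n";          # a block around a plain scalar filehandle is also allowed
# whereas:  print $handle{$serv} "text\n";  is a syntax error, because only a
# simple scalar (or a block) may appear in the filehandle slot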
Comments on the original code (addressed in the code above)
Use lexical filehandles (my $fh) instead of typeglobs (FH)
Don't read the whole file at once unless there is a specific reason for that
split has nice defaults, split ' ', $_, where ' ' splits on whitespace and discards leading and trailing space as well. (And then there is no need to chomp in this case.)
Another option is to first collect the data for each service, just as the OP attempts, but again use a hash (service => arrayref/string with data) and print at the end. But I don't see a reason not to print as you go, since you'd need the same logic to decide when the ; needs to be added.
Your code looks pretty perl4-ish, but that's not a problem. As MrTux has pointed out, you are confusing collection and fanning out of your data. I have refactored this to use a hash as an intermediate container, with the service names as keys. Please note that this will not accumulate results across multiple calls (as it uses ">" and not ">>").
use strict;
use warnings qw(all);
use File::Slurp qw/read_file/;

my @Input = read_file('file.txt', chomp => 1);
my %store = ();    # Global container

# Capture
foreach my $item (@Input) {
    my ($serv, $device, $filter) = split(/ /, $item);
    push @{$store{$serv}}, "device==$device & filter==$filter";
}

# Write out each service file
foreach my $k (keys %store) {
    open(my $OUTPUT, ">", "$k.txt") or die $!;
    print $OUTPUT join(" ; ", @{$store{$k}});
    close( $OUTPUT );
}

How do I properly find double entries in two files in Perl?

Let's say I have two files with lists of ip-addresses. Lines in the first file are unique. Lines in the second may or may not be the same as in the first one.
What I need is to compare two files, and remove possible doubles from the second file in order to merge it with the base file later.
I've managed to write the following code and it seems to work properly, but I have a solid feeling that this code can be improved or I may be totally missing some important concept.
Are there any ways to solve the task without using complex data structures, i.e. hashrefs?
#!/usr/bin/perl
use strict;
use warnings;

my $base  = shift @ARGV;
my $input = shift @ARGV;
my $res   = 'result.txt';

open ("BASE", "<", "$base");
open ("INP",  "<", "$input");
open ("RES",  ">", "$res");

my $rhash = {};    # result hash

# create uniq table
while (my $line = <BASE>) { chomp($line); $rhash->{$line}{'res'} = 1; }

# create compare table, marking its entries as new and incrementing doubled keys
while (my $line = <INP>) { chomp($line); $rhash->{$line}{'res'}++; $rhash->{$line}{'new'} = 1; }

close BASE;
close INP;

for my $line (sort keys %$rhash) {
    next if $line =~ /\#/;    # removing comments
    # kinda diagnostic output of doubles
    printf("%-30s%3s%1s", $line, $rhash->{$line}{'res'}, "\n") if $rhash->{$line}{'res'} > 1;
    if (($rhash->{$line}{'new'}) and ($rhash->{$line}{'res'} < 2)) {
        print RES "$line\n";    # printing new uniq entries to the result file
    }
}
close RES;
If I understand correctly, file1 and file2 each contain IPs (unique within each file), and you want the IPs that are in file2 but not in file1. If so, then maybe the following code achieves your goal.
Although it seems your code will do it, this might be clearer.
#!/usr/bin/perl
use strict;
use warnings;

my $base  = shift @ARGV;
my $input = shift @ARGV;
my $res   = 'result.txt';

open ("BASE", "<", "$base") or die $!;
open ("INP",  "<", "$input") or die $!;
open ("RES",  ">", "$res") or die $!;

my %seen;
while (my $line = <BASE>) {
    chomp $line;
    $seen{$line}++;
}
close BASE or die $!;

while (my $line = <INP>) {
    chomp $line;
    print RES "$line\n" unless $seen{$line};    # only in file2, not in file1
}
close INP or die $!;
close RES or die $!;

Parsing file based on column ID: perl

I have a tab-delimited file with repeated values in the first column. Each distinct, repeated value in the first column corresponds to multiple values in the second column. It looks something like this:
AAAAAAAAAA1 m081216|101|123
AAAAAAAAAA1 m081216|100|1987
AAAAAAAAAA1 m081216|927|463729
BBBBBBBBBB2 m081216|254|260489
BBBBBBBBBB2 m081216|475|1234
BBBBBBBBBB2 m081216|987|240
CCCCCCCCCC3 m081216|433|1000
CCCCCCCCCC3 m081216|902|366
CCCCCCCCCC3 m081216|724|193
For every type of sequence in the first column, I am trying to print to a file with just the sequences that correspond to it. The name of the file should include the repeated sequence in the first column and the number of sequences that correspond to it in the second column. In the above example I would therefore have 3 files of 3 sequences each. The first file would be named something like "AAAAAAAAAA1.3.txt" and look like the following when opened:
m081216|101|123
m081216|100|1987
m081216|927|463729
I have seen other similar questions, but they have been answered using a hash. I don't think I can use a hash, because I need to keep the relationships between the columns. Maybe there is a way to use a hash of hashes? I am not sure.
Here is my code so far.
use warnings;
use strict;
use List::MoreUtils 'true';

open(IN, "<", "/path/to/in_file") or die $!;

my @array;
my $queryID;

while (<IN>) {
    chomp;
    my $OutputLine = $_;
    processOutputLine($OutputLine);
}

sub processOutputLine {
    my ($OutputLine) = @_;
    my @Columns = split("\t", $OutputLine);
    my ($queryID, $target) = @Columns;
    push(@array, $target, "\n") unless grep { $queryID eq $_ } @array;
    my $delineator = "\n";
    my $count = true { /$delineator/g } @array;
    open(OUT, ">", "/path/to/out_$..$queryID.$count.txt") or die $!;
    foreach (@array) {
        print OUT @array;
    }
}
I would still recommend a hash. However, you store all sequences related to the same ID in an anonymous array, which is the value for that ID key. It's really two lines of code.
use warnings;
use strict;
use feature qw(say);

my $filename = 'rep_seqs.txt';    # input file name
open my $in_fh, '<', $filename or die "Can't open $filename: $!";

my %seqs;
foreach my $line (<$in_fh>) {
    chomp $line;
    my ($id, $seq) = split /\t/, $line;
    push @{$seqs{$id}}, $seq;
}
close $in_fh;

my $out_fh;
for (sort keys %seqs) {
    my $outfile = $_ . '_' . scalar @{$seqs{$_}} . '.txt';
    open $out_fh, '>', $outfile or do {
        warn "Can't open $outfile: $!";
        next;
    };
    say $out_fh $_ for @{$seqs{$_}};
}
close $out_fh;
With your input I get the desired files, named AA..._count.txt, each with its corresponding three lines. If the items separated by | should themselves be split, you can do that while writing them out, for example.
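For instance, a hypothetical variant of the write loop that splits each stored item on | and writes the pieces tab-separated (assuming that is the layout you want):
for my $item ( @{$seqs{$_}} ) {
    # m081216|101|123  becomes  m081216<TAB>101<TAB>123
    say $out_fh join "\t", split /\|/, $item;
}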
Comments
The anonymous array for a key $seqs{$id} is created by the push itself (autovivification), if it isn't already there
If there are issues with tabs (converted to spaces?), use ' '. See the comment.
A lexical filehandle is closed implicitly when it is re-used in another open, so there is no need to close it every time
The default pattern for split is ' ', also triggering specific behavior -- it matches "any contiguous whitespace", and also omits leading whitespace. (The pattern / / matches a single space, turning off this special behavior of ' '.) See a more precise description on the split page. Thus it is advisable to use ' ' when splitting on unspecified number of spaces, since in the case of split this is a bit idiomatic, is perhaps the most common use, and is its default. Thanks to Borodin for prompting this comment and update (the original post had the equivalent /\s+/).
Note that in this case, since ' ' is the default along with $_, we can shorten it a little
for (<$in_fh>) {
    chomp;
    my ($id, $seq) = split;
    push @{$seqs{$id}}, $seq;
}
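To make the difference between split ' ' and split / / concrete, here is a small demonstration:
my $str = "  a  b  c";
my @w1 = split ' ', $str;   # ('a', 'b', 'c') -- whitespace runs, leading space discarded
my @w2 = split / /, $str;   # ('', '', 'a', '', 'b', '', 'c') -- every single space is a boundary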

Record separator within a record separator

How can I make use of a record separator, and then simultaneously use a sub-record separator? Perhaps that isn't the best way to think about what I am trying to do. Here is my goal:
I want to perform a while loop on a single tab-delimited item at a time, in a specified row of items. For every line (row) of tab-separated items, I need to print the outcomes of all the while loops into a unique file. Allow the following examples to help clarify.
My input file will be something like the following. It will be called "Clustered_Barcodes.txt"
TTTATGC TTTATGG TTTATCC TTTATCG
TTTATAA TTTATAA TTTATAT TTTATAT TTTATTA
CTTGTAA
My Perl code looks like the following:
#!/usr/bin/perl
use warnings;
use strict;
open(INFILE, "<", "Clustered_Barcodes.txt") or die $!;
my %hash = (
    "TTTATGC" => "TATAGCGCTTTATGCTAGCTAGC",
    "TTTATGG" => "TAGCTAGCTTTATGGGCTAGCTA",
    "TTTATCC" => "GCTAGCTATTTATCCGCTAGCTA",
    "TTTATCG" => "TAGCTAGCTTTATCGCGTACGTA",
    "TTTATAA" => "TAGCTAGCTTTATAATAGCTAGC",
    "TTTATAA" => "ATCGATCGTTTATAACGATCGAT",
    "TTTATAT" => "TCGATCGATTTATATTAGCTAGC",
    "TTTATAT" => "TAGCTAGCTTTATATGCTAGCTA",
    "TTTATTA" => "GCTAGCTATTTATTATAGCTAGC",
    "CTTGTAA" => "ATCGATCGCTTGTAACGATTAGC",
);
while (<INFILE>) {
    $/ = "\n";
    my @lines = <INFILE>;
    open my $out, '>', "Clustered_Barcode_$..fasta" or die $!;
    foreach my $sequence (@lines) {
        if (exists $hash{$sequence}) {
            print $out ">$sequence\n$hash{$sequence}\n";
        }
    }
}
My desired output would be three different files.
The first file will be called "Clustered_Barcode_1.fasta" and will look like:
>TTTATGC
TATAGCGCTTTATGCTAGCTAGC
>TTTATGG
TAGCTAGCTTTATGGGCTAGCTA
>TTTATCC
GCTAGCTATTTATCCGCTAGCTA
>TTTATCG
TAGCTAGCTTTATCGCGTACGTA
Note that this is formatted so that each key is preceded by a ">" character, and on the next line is the longer associated sequence (the value). This file includes all the sequences in the first line of Clustered_Barcodes.txt.
My third file should be named "Clustered_Barcode_3.fasta" and look like the following:
>CTTGTAA
ATCGATCGCTTGTAACGATTAGC
When I run my code, it only takes the second and third lines of sequences in the input file. How can I start with the first line (by getting rid of the \n requirement for a record separator)? How can I then process each item at a time and then print the line's worth of results into one file? Also, if there is a way to incorporate the number of sequences into the file name, that would be great. It would help me to later organize the files by size. For example, the name could be something like "Clustered_Barcodes_1_File_3_Sequences.fasta".
Thank you all.
OK, so here's one way to do it:
#!/usr/bin/perl
use strict;
use warnings;
Standard preamble.
my %hash = (
    "TTTATGC" => "TATAGCGCTTTATGCTAGCTAGC",
    "TTTATGG" => "TAGCTAGCTTTATGGGCTAGCTA",
    "TTTATCC" => "GCTAGCTATTTATCCGCTAGCTA",
    "TTTATCG" => "TAGCTAGCTTTATCGCGTACGTA",
    "TTTATAA" => "TAGCTAGCTTTATAATAGCTAGC",
    "TTTATAA" => "ATCGATCGTTTATAACGATCGAT",
    "TTTATAT" => "TCGATCGATTTATATTAGCTAGC",
    "TTTATAT" => "TAGCTAGCTTTATATGCTAGCTA",
    "TTTATTA" => "GCTAGCTATTTATTATAGCTAGC",
    "CTTGTAA" => "ATCGATCGCTTGTAACGATTAGC",
);
Set up the hash of sequences.
my $infile = 'Clustered_Barcodes.txt';
open my $infh, '<', $infile or die "$0: $infile: $!\n";
Open file for reading.
chomp(my @rows = readline $infh);
my $row_count = @rows;
Slurp all lines into memory in order to get the number of rows. If you have too many sequences, this approach is not going to work, because you'll run out of memory (though that depends on how much RAM you have).
my $i = 1;
for my $row (@rows) {
Loop over the lines.
my @fields = split /\t/, $row;
Split each line into fields separated by tabs.
my $outfile = "Clustered_Barcodes_${i}_File_${row_count}_Sequences.fasta";
$i++;
open my $outfh, '>', $outfile or die "$0: $outfile: $!\n";
Open current output file and increment counter.
for my $field (@fields) {
    print $outfh ">$field\n$hash{$field}\n" if exists $hash{$field};
}
Write each field (and its mapping) to outfile.
}
And we're done. The main difference from your original code is using split /\t/ and a foreach loop over the fields within each line.
We can do it without slurping, too:
while (my $row = readline $infh) {
    chomp $row;
Loop over the lines, one by one. This replaces the 4 lines from chomp(my @rows = readline $infh); to for my $row (@rows) {.
But now we've lost the $i and $row_count variables, so we have to change the initialization of $outfile:
my $outfile = "Clustered_Barcodes_$..fasta";
That should be all the changes you need. (You can get $row_count back in this scenario by reading $infh twice (the first time just for counting, then seeking back to the start); this is left as an exercise for the reader.)
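For reference, a minimal sketch of that exercise, assuming the same $infh as above (Fcntl ':seek' supplies the SEEK_SET constant):
use Fcntl ':seek';

my $row_count = 0;
$row_count++ while defined( readline $infh );   # first pass: just count the rows
seek $infh, 0, SEEK_SET;                        # rewind for the real processing pass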
There's no need that I can see to read in the whole file here. You just need to loop over the contents of each line:
while (my $line = <INFILE>) {
    chomp $line;
    open my $out, '>', "Clustered_Barcode_$..fasta" or die $!;
    foreach my $sequence ( split /\t/, $line ) {
        if (exists $hash{$sequence}) {
            print $out ">$sequence\n$hash{$sequence}\n";
        }
    }
}

Perl merging columns in two text files

I am a beginner with Perl and I want to merge the content of two text files.
I have read some similar questions and answers on this forum, but I still cannot resolve my issues.
The first file has the original ID and the recoded ID of each individual (in the first and fourth columns).
The second file has the recoded ID and some information on some of the individuals (in the first and second columns).
I want to create an output file with the original ID, the recoded ID and the information for these individuals.
This is the Perl script I have created so far, which is not working.
If anyone could help it would be very much appreciated.
use warnings;
use strict;
use diagnostics;
use vars qw( @fields1 $recoded $original $IDF @fields2 );

my %columns1;

open (FILE1, "<file1.txt") || die "$!\n Couldn't open file1.txt\n";
while ($_ = <FILE1>)
{
    chomp;
    @fields1 = split /\s+/, $_;
    my $recoded  = $fields1[0];
    my $original = $fields1[3];
    my %columns1 = (
        $recoded => $original
    );
};

open (FILE2, "<file2.txt") || die "$!\n Couldnt open file2.txt \n";
for ($_ = <FILE2>)
{
    chomp;
    @fields2 = split /\s+/, $_;
    my $IDF = $fields2[0];
    my $F   = $fields2[1];
    my %columns2 = (
        $F => $IDF
    );
};

close FILE1;
close FILE2;

open (FILE3, ">output.txt") || die "output problem\n";
for (keys %columns1) {
    if (exists ($columns2{$_}){
        print FILE3 "$_ $columns1{$_}\n"
    };
}
close FILE3;
One problem is with scoping. In your first loop, you have a my in front of %columns1, which makes it local to the loop body, so a fresh hash is created and thrown away on every iteration. As a result, the %columns1 declared outside of the loop (which is what I suspect you want to set) never has any values set. For the assignment, it would be easier to write $columns1{$recoded} = $original;, which assigns the value to that key of the hash.
In the second loop you likewise need to declare %columns2 outside of the loop, and you can use the same style of assignment.
For the third loop, in the print you just need to add $columns2{$_} to the front of the string being printed, to get the original ID printed before the recoded ID.
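A minimal sketch of the first loop with those fixes applied (the second loop changes in the same way):
my %columns1;
open (FILE1, "<file1.txt") || die "$!\n Couldn't open file1.txt\n";
while ($_ = <FILE1>)
{
    chomp;
    my @fields1 = split /\s+/, $_;
    $columns1{ $fields1[0] } = $fields1[3];   # accumulates recoded => original across all lines
}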
Scope:
The problem is with the scope of the hash variables you have defined. The scope of a variable is limited to the loop inside which it has been defined.
In your code, %columns1 and %columns2 are used outside the while loops, so they should be defined outside the loops.
Compilation error: braces not closed properly.
Also, in the "if exists" part, the opening and closing braces are not balanced.
Here is your code with the required corrections made:
use warnings;
use strict;
use diagnostics;
use vars qw( @fields1 $recoded $original $IDF @fields2 );

my (%columns1, %columns2);

open (FILE1, "<file1.txt") || die "$!\n Couldn't open file1.txt\n";
while ($_ = <FILE1>)
{
    chomp;
    @fields1 = split /\s+/, $_;
    my $recoded  = $fields1[0];
    my $original = $fields1[3];
    $columns1{$recoded} = $original;
}

open (FILE2, "<file2.txt") || die "$!\n Couldn't open file2.txt\n";
while ($_ = <FILE2>)
{
    chomp;
    @fields2 = split /\s+/, $_;
    my $IDF = $fields2[0];
    my $F   = $fields2[1];
    $columns2{$F} = $IDF;
}

close FILE1;
close FILE2;

open (FILE3, ">output.txt") || die "output problem\n";
for (keys %columns1) {
    print FILE3 "$_ $columns1{$_} \n" if exists $columns2{$_};
}
close FILE3;