changing a hash value from string to array - perl

data.txt
Name:xyz
ID:1
Value: 1 2 3 4 5 6 7 8 9 ...
ID:2
Value: 9 8 7 6 5 4 3 2 1..
ID:3
Value: 90 89 88....
Name:abc
ID:11
value:...
Initial file.txt
## Header
..
data
data
data
..
Final expected file.txt
## Header xyz_1,xyz_2,xyz_3,abc_11,...
..
data 1 9 90
data 2 8 89
data 3 7 88
data 4 6
..
Current output file.txt
## Header xyz_1,xyz_2,xyz_3,abc_11,...
...
data, 1 2 3 4 5 6 7 8 9 ..,9 8 7 6 5 4 3 2 1 ..,90 89 88
data
data
...
Code
#!/usr/local/bin/perl
use diagnostics;
use strict;
use warnings;
use Tie::File;
my @name_id;
my %test;
#local $/ = '';
open my $fh, '<', 'data.txt' or die "failed: $!";
my %var;
while (<$fh>) {
    chomp;
    if (m/^([A-Z:]+):\s*(.*)/) {
        $var{$1} = $2;
        if (exists($var{Name}) && exists($var{ID}) && exists($var{value}) && $1 eq 'value') {
            my $var_name = "$var{Name}_$var{ID}";
            push @name_id, $var_name;
            $test{$var_name} = $var{value};
        }
    }
}
# print join "\n\t", @test{@name_id};
my $match = "## Header";
tie my @lines, 'Tie::File', 'file.txt' or die "failed : $!";
for my $line (@lines) {
    if ($line =~ /^($match.*)/) {
        $line = $1 . "," . join ',', @name_id;
    }
}
untie @lines;
my $match = "data";
tie my @lines, 'Tie::File', 'file.txt' or die "failed : $!";
my $i = 0;
for my $line (@lines) {
    if ($line =~ /^($match.*)/) {
        $line = $1 . "," . join(',', map { $test{$_}->[$i] } @name_id);
        $i++;
    }
}
untie @lines;
I have a problem with this line: $line = $1 . "," . join (',', map { $test{$_}->[$i]} @name_id); It throws the error
Can't use string ("1 2 3 4 5 6 7 8 9 .."...) as an ARRAY ref while "strict refs" in use at test.pl line 46, line 80. at test.pl line 46
I think the hash (%test) value I have is a string, so I can't index it as an array. Please let me know how to convert it to an array. I tried $test{$var_name} = [qw($var{value})]; but it didn't work.

You may be interested in this refactoring of your code that seems to do what you want.
#!/usr/local/bin/perl
use strict;
use warnings;
use Tie::File;
open my $fh, '<', 'data.txt' or die "failed: $!";
my @name_id;
my %test;
my %var;
while (<$fh>) {
    chomp;
    if (my ($key, $val) = /^(\w+):\s*(.*)/) {
        $var{$key} = $val;
        if ($key eq 'value') {
            my $var_name = "$var{Name}_$var{ID}";
            push @name_id, $var_name;
            $test{$var_name} = [ split ' ', $var{value} ];
        }
    }
}
tie my @lines, 'Tie::File', 'file.txt' or die "failed : $!";
my $count = 0;
for my $line (@lines) {
    if ($line =~ /^## Header/) {
        $line .= ' ' . join ',', @name_id;
    }
    elsif ($line =~ /^data/) {
        $line .= ' ' . join ' ', map { $test{$_}[$count] // '' } @name_id;
        $count++;
    }
}
untie @lines;
output (file.txt)
## Header xyz_1,xyz_2 ,xyz_3
data 1 9 90
data 2 8 89
data 3 7 88
data 4 6

This is surely not right:
$test{$_}->[$i]
Because $test{$_} can only contain a string of some sort.
If you have a string and want to split into an arrayref so the above works, do this:
$test{$var_name} = [split /\s+/, $var{value}];
I have no idea what the code is supposed to accomplish, which means that it may run, but I can't tell whether it does what it is meant to. The odd variable names (like $test and $var_name) didn't help me understand the purpose.
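To illustrate why the attempt with qw failed while split works, here is a minimal sketch. It uses a made-up sample string rather than the real data.txt. qw never interpolates variables, so [qw($var{value})] produces a single element holding the literal text $var{value}; split looks at the contents of the string.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for one "Value:" line from data.txt.
my %var = ( value => '1 2 3 4 5' );

# qw() never interpolates: this is one element holding the literal text.
my $not_split = [ qw($var{value}) ];

# split ' ' skips leading whitespace and splits on runs of whitespace.
my $as_array = [ split ' ', $var{value} ];

print scalar(@$not_split), " element: $not_split->[0]\n";
print scalar(@$as_array), " elements, third is $as_array->[2]\n";
```

With the array reference in place, $test{$var_name}->[$i] indexes a real array instead of trying to dereference a string.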

I'm not too sure I followed your code, but I thought I'd post how to transpose the numbers (unless your code already does that :-) ).
#!/usr/bin/perl
use strict;
use warnings;
my (%data, $name);
while (<DATA>) {
    if (/^Name:(.+)/) {
        $name = $1;
    }
    elsif (/^Value/) {
        # transpose
        my $r = 0;
        push @{ $data{$name}[$r++] }, $_ for /\d+/g;
    }
}
use Data::Dumper; print Dumper \%data;
__DATA__
Name:xyz
ID:1
Value: 1 2 3 4 5 6 7 8 9
ID:2
Value: 9 8 7 6 5 4 3 2 1
ID:3
Value: 90 89 88 87 86 85 84 83 82
Name:abc
ID:11
The dumped results are:
$VAR1 = {
'xyz' => [
[
'1',
'9',
'90'
],
[
'2',
'8',
'89'
],
[
'3',
'7',
'88'
],
[
'4',
'6',
'87'
],
[
'5',
'5',
'86'
],
[
'6',
'4',
'85'
],
[
'7',
'3',
'84'
],
[
'8',
'2',
'83'
],
[
'9',
'1',
'82'
]
]
};

Related

Perl-printing a 2-d matrix from a file

So I am trying to read a 2-d matrix in from a file so that I can multiply two matrices together. I can get the individual rows of the matrix to print, but I can't get the subroutine to return the entire matrix. I'm not sure what I'm doing wrong. I pasted the test matrix from the file I am using:
12345
67890
34567
The output I get is:
final matrix is: ##THIS IS WHAT I AM TRYING TO PRINT OUT BUT I GET NOTHING
row is:12345
row is:67890
row is:34567
Here is an example:
use feature qw(say);
use strict;
use warnings;
use Data::Dumper;
{
    print "Enter filename: ";
    chomp(my $matrix_file = <STDIN>);
    say "final matrix is:";
    my $matrix = matrix_read_file($matrix_file);
    print Dumper($matrix);
}
sub matrix_read_file {
    my ($filename) = @_;
    my @matrix;
    open (my $F, '<', $filename) or die "Could not open $filename: $!";
    while (my $line =<$F> ) {
        chomp $line;
        next if $line =~ /^\s*$/; # skip blank lines
        my @row = split /\s+/, $line;
        push @matrix, \@row;
    }
    close $F;
    return \@matrix;
}
If you give the following input file:
1 2 3 4 5
6 7 8 9 10
The program outputs:
final matrix is:
$VAR1 = [
[
'1',
'2',
'3',
'4',
'5'
],
[
'6',
'7',
'8',
'9',
'10'
]
];
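Since the stated goal was to multiply two matrices, here is a rough sketch of how two array-of-arrays structures like the one returned above could be multiplied. The sub name matrix_multiply and the sample matrices are made up for illustration; they are not part of the answer.

```perl
use strict;
use warnings;

# Hypothetical helper: multiplies two matrices given as references to
# arrays of row references, and returns a reference to the product.
sub matrix_multiply {
    my ($left, $right) = @_;
    my $inner = @{ $left->[0] };    # number of columns in $left
    die "Dimension mismatch\n" unless $inner == @$right;
    my $cols = @{ $right->[0] };    # number of columns in $right

    my @product;
    for my $i (0 .. $#$left) {
        for my $j (0 .. $cols - 1) {
            my $sum = 0;
            $sum += $left->[$i][$_] * $right->[$_][$j] for 0 .. $inner - 1;
            $product[$i][$j] = $sum;
        }
    }
    return \@product;
}

my $product = matrix_multiply([ [1, 2], [3, 4] ], [ [5, 6], [7, 8] ]);
print join(' ', @$_), "\n" for @$product;
```

The sketch assumes every row has the same length; it only checks that the column count of the first matrix equals the row count of the second.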

Read specific column in perl

I'm new to Perl. I have the text file below, and from it I want only one Time column followed by the value columns. How can I create a text file with my desired output in Perl?
Time Value Time Value Time Value
1 0.353366497 1 0.822193251 1 0.780866396
2 0.168834182 2 0.865650713 2 0.42429447
3 0.323540698 3 0.865984245 3 0.856875894
4 0.721728497 4 0.634773162 4 0.563059042
5 0.545131335 5 0.029808531 5 0.645993399
6 0.143720835 6 0.949973296 6 0.14425803
7 0.414601876 7 0.53421424 7 0.826148814
8 0.194818367 8 0.942334356 8 0.837107013
9 0.291448263 9 0.242588271 9 0.939609775
10 0.500159997 10 0.428897293 10 0.41946448
I've tried below code:
use strict;
use warnings;
use IO::File;
my $result;
my @files = (q[1.txt],q[2.txt],q[3.txt]);
my @fhs = ();
foreach my $file (@files) {
    my $fh = new IO::File $file, O_RDONLY;
    push @fhs, $fh if defined $fh;
}
while(1) {
    my @lines = map { $_->getline } @fhs;
    last if grep { not defined $_ } @lines[0..(@fhs-1)];
    my @result=join(qq[\t], map { s/[\r?\n]+/ /g; $_ } @lines ) . qq[\r\n];
    open (MYFILE, '>>Result.txt');
    print (MYFILE "@result");
    close (MYFILE);
}
I'd go with split.
use warnings;
use strict;
open (my $f, '<', 'your-file.dat') or die;
while (my $line = <$f>) {
    my @elems = split ' ', $line;
    print join "\t", @elems[0,1,3,5];
    print "\n";
}
This is a one-liner; no need to write a script:
$ perl -lanE '$,="\t"; say @F[0,1,3,5]' 1.txt 2.txt 3.txt
If you like, you can shorten it to:
$ perl -lanE '$,="\t"; say @F[0,1,3,5]' [123].txt
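For readers unfamiliar with the switches, here is a rough sketch of what -n, -a and -l arrange around the body of the one-liner. The sub name pick_columns is made up for illustration, and a single sample row from the question stands in for the input files.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper doing the per-line work of the one-liner.
sub pick_columns {
    my ($line) = @_;
    chomp $line;                   # -l strips the line ending on input
    my @F = split ' ', $line;      # -a splits each line on whitespace into @F
    return join "\t", @F[0, 1, 3, 5];
}

# -n would supply a "while (<>) { ... }" loop over the files named on the
# command line; here one sample row stands in for that input.
my $sample = "1 0.353366497 1 0.822193251 1 0.780866396\n";
say pick_columns($sample);
```

The one-liner sets $, so that say separates the list with tabs; the sketch uses join for the same effect.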
Right now, you're just concatenating the lines of the files together. If that doesn't give you the output you like, you need to chop some columns out.
Since your output looks like you have tab delimited files as input, I split the lines coming in by tabs. And since you only wanted the second column, I only take the column at the first offset from the split.
my $line_num = 0;
while(1) {
    my @lines = map { $_->getline } @fhs;
    last if grep { not defined $_ } @lines[0..$#fhs];
    $line_num++;
    # chomp first so the last field does not keep its newline
    my @rows = map { chomp; [ split /\t/ ] } @lines;
    my $time_val = $rows[0][0];
    die "Time values are not all equal on line #$line_num!"
        if grep { $time_val != $_->[0] } @rows
    ;
    # qq[\t] is a real tab; q[\t] would be a literal backslash and "t"
    my $result = join( qq[\t], $time_val, map { $_->[1] } @rows );
    open (MYFILE, '>>Result.txt');
    print (MYFILE "$result\n");
    close (MYFILE);
}
Of course, there is no reason to do custom coding to split delimited columns:
use Text::CSV;
...
my $csv = Text::CSV->new( { sep_char => "\t" } );
while(1) {
    my @rows = map { $csv->getline( $_ ) } @fhs;
    last if grep { not defined $_ } @rows[0..$#fhs];
    my ( $time_val, @time_vals ) = map { $_->[0] } @rows;
    my @values = map { $_->[1] } @rows;
    die "Time values are not all equal on line #$line_num!"
        if grep { $time_val != $_ } @time_vals
    ;
    # qq[\t] is a real tab; q[\t] would be a literal backslash and "t"
    my $result = join( qq[\t], $time_val, @values );
    ...
}
use strict;
use warnings;
open(FH, "<", "a.txt") or die "Cannot open a.txt: $!";
print "=========== A File content =========== \n";
my $a = `cat a.txt`;
print "$a\n";
my @temp = <FH>;    # read from the handle opened above
my (@arr, @entries, @final);
foreach ( @temp ) {
    @arr = split ( " ", $_ );
    push @entries, @arr;
}
close FH;
my @entries1 = @entries;
# skip the six header words, then take every second field (the values)
for(my $i = 7; $i<=$#entries; $i=$i+2) {
    push @final, $entries[$i];
}
my $size = scalar @final;
open FH1, ">", "b.txt" or die "Cannot open b.txt: $!";
print FH1 "Time \t Value\n";
for(my $i = 0; $i < $size; $i++) {
    my $j = $i+1;
    print FH1 "$j \t $final[$i]\n";
}
close FH1;
print "============ B file content ===============\n";
my $b = `cat b.txt`;
print "$b";
O/P:
=========== A File content ===========
Time Value Time Value Time Value
1 0.353366497 1 0.822193251 1 0.780866396
2 0.168834182 2 0.865650713 2 0.42429447
3 0.323540698 3 0.865984245 3 0.856875894
4 0.721728497 4 0.634773162 4 0.563059042
5 0.545131335 5 0.029808531 5 0.645993399
6 0.143720835 6 0.949973296 6 0.14425803
7 0.414601876 7 0.53421424 7 0.826148814
8 0.194818367 8 0.942334356 8 0.837107013
9 0.291448263 9 0.242588271 9 0.939609775
10 0.500159997 10 0.428897293 10 0.41946448
============ B file content ===============
Time Value
1 0.353366497
2 0.822193251
3 0.780866396
4 0.168834182
5 0.865650713
6 0.42429447
7 0.323540698
8 0.865984245
9 0.856875894
10 0.721728497
11 0.634773162
12 0.563059042
13 0.545131335
14 0.029808531
15 0.645993399
16 0.143720835
17 0.949973296
18 0.14425803
19 0.414601876
20 0.53421424
21 0.826148814
22 0.194818367
23 0.942334356
24 0.837107013
25 0.291448263
26 0.242588271
27 0.939609775
28 0.500159997
29 0.428897293
30 0.41946448

printing hash values in new line using tie

I have a hash with a few keys, and each key has 20 values.
%test={
a=> 10 14 34 56 ....
b=> 56 67 89 66 ...
..
}
@values= {a,b,..}
I want to tie values from this hash to another file as shown below
my input file.txt
ID
ID
ID
...
expected file.txt
ID ,10 ,56
ID ,14, 67
ID ,34, 89
ID ,56, 66
..
My code right now ties all the values to the first line of my file. Please help me format it.
my $match = "ID";
tie my @lines, 'Tie::File', 'file.txt' or die "failed : $!";
for my $line (@lines) {
    while ( $line =~ /^($match.*)/ ) {
        $line = $1 . "," . join ',',@test{@values};
    }
}
untie @lines;
right now my output is
file.txt
ID ,10 ,14, 34, 56,... 56, 67, 89, 66....
ID
ID
ID
I'm a bit confused by your question...
You have some template file that only contains ID at the beginning of (n) lines?
And you want to iterate over each $key by $test->{$key}[$line_count]?
Something seems fishy here (I think you must be leaving something out). There are quite a few ways to go wrong with this design...
Anyways, I think this is what you're going for:
my $match = "ID";
my $test = {
    a => [ qw(1 3 5) ],
    b => [ qw(2 4 6) ],
};
tie my @lines, 'Tie::File', 'file.txt' or die "failed : $!";
my $i = 0;
for my $line (@lines) {
    if( $line =~ /^($match.*)/ ) {
        my @stuff = ();
        # sort the keys: plain keys() returns them in no particular order
        for my $key ( sort keys %$test ) {
            push @stuff, $test->{$key}[$i];
        }
        $line = $1 . ", " . join(', ', @stuff);
        $i++;
    }
}
untie @lines;
Assuming that this is what you have/want:
$ cat file.txt
ID
ID
ID
$ test.pl
$ !cat
cat file.txt
ID, 1, 2
ID, 3, 4
ID, 5, 6
Do you simply want
my %test = (
a => [ 10, 14, 34, 56, ... ],
b => [ 56, 67, 89, 66, ... ],
);
for (0..$#{ $test{a} }) {
    print(join(',', 'ID', $test{a}[$_], $test{b}[$_]), "\n");
}
You could write to a file instead of STDOUT by creating the file using
open(my $fh, '>', 'file.txt')
or die("Can't create file.txt: $!\n");
and then using
print($fh ...);
but it's better to let the user redirect the output to the file using >file.txt.
Here is my take, although the tie seems superfluous to me:
use strict;
use warnings;
use Tie::File;
my %test=(
    a=> [qw(10 14 34 56)],
    b=> [qw(56 67 89 66)]
);
my @values= qw(a b);
my $match = "ID";
tie my @lines, 'Tie::File', 'file.txt' or die "failed : $!";
my $i = 0;
for my $line (@lines) {
    if ( $line =~ /^($match.*)/ ) {
        $line = $1 . "," . join(',', map { $test{$_}->[$i]} @values );
        $i++;
    }
}
untie @lines;
Output (file.txt):
ID,10,56
ID,14,67
ID,34,89
ID,56,66
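If the tie really is superfluous, the same per-line logic can be applied to a plain array of lines. The following is only a sketch: the sub name append_columns is made up, and reading file.txt into the array and writing the result back are left out.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a new list of lines in which the n-th line
# starting with $match has the n-th value of each key appended.
sub append_columns {
    my ($lines, $test, $keys, $match) = @_;
    my $i = 0;
    my @out;
    for my $line (@$lines) {
        my $new = $line;    # copy, so the caller's array is left untouched
        if ($new =~ /^\Q$match\E/) {
            # // '' leaves an empty field once a list runs out of values
            $new = join ',', $new, map { $test->{$_}[$i] // '' } @$keys;
            $i++;
        }
        push @out, $new;
    }
    return \@out;
}

my %test = ( a => [qw(10 14 34 56)], b => [qw(56 67 89 66)] );
my $result = append_columns([ ('ID') x 4 ], \%test, [qw(a b)], 'ID');
print "$_\n" for @$result;
```

Passing the key order explicitly (here a, then b) keeps the columns stable, which a bare keys %test would not guarantee.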

parse a tab delimited data using perl

I have tab-delimited data that I want to process using Perl. I am a newbie to Perl and could not figure out how to solve this.
This is a sample table (my original file is almost a GB):
gi|306963568|gb|GL429799.1|_1316857_1453052 13 1
gi|306963568|gb|GL429799.1|_1316857_1453052 14 1
gi|306963568|gb|GL429799.1|_1316857_1453052 15 1
gi|306963568|gb|GL429799.1|_1316857_1453052 16 1
gi|306963568|gb|GL429799.1|_1316857_1453052 17 1
gi|306963568|gb|GL429799.1|_1316857_1453052 360 1
gi|306963568|gb|GL429799.1|_1316857_1453052 361 1
gi|306963568|gb|GL429799.1|_1316857_1453052 362 1
gi|306963568|gb|GL429799.1|_1316857_1453052 363 1
gi|306963568|gb|GL429799.1|_1316857_1453052 364 1
gi|306963568|gb|GL429799.1|_1316857_1453052 365 1
gi|306963568|gb|GL429799.1|_1316857_1453052 366 1
gi|306963580|gb|GL429787.1|_4276355_4500645 38640 1
gi|306963580|gb|GL429787.1|_4276355_4500645 38641 1
gi|306963580|gb|GL429787.1|_4276355_4500645 38642 1
gi|306963580|gb|GL429787.1|_4276355_4500645 38643 1
gi|306963580|gb|GL429787.1|_4276355_4500645 38644 1
gi|306963580|gb|GL429787.1|_4276355_4500645 38645 1
I would like to get the output as
Name, start value, end value, average
gi|306963568|gb|GL429799.1|_1316857_1453052 13 17 1
gi|306963568|gb|GL429799.1|_1316857_1453052 360 366 1
gi|306963580|gb|GL429787.1|_4276355_4500645 38640 38645 1
It would be great if someone could share their wisdom.
The general pattern is
use strict;
use warnings;
open my $fh, '<', 'myfile' or die $!;
while (<$fh>) {
    chomp;
    my @fields = split /\t/;
    ...
}
Within the loop the fields can be accessed as $fields[0] through $fields[2].
Update
I have understood your question better, and I think this solution will work for you. Note that it assumes the input data is sorted, as you have shown in your question.
It accumulates the start and end values, the total and the count in hash %data, and keeps a list of all the names encountered in @names so that the data can be displayed in the order it was read.
The program expects the input file name as a parameter on the command line.
You need to consider the formatting of the average because it is a floating point value. As it stands it will display the value to sixteen significant figures, and you may want to curtail that using sprintf.
use strict;
use warnings;
my ($filename) = @ARGV;
open my $fh, '<', $filename or die qq{Unable to open "$filename": $!};
my @names;
my %data;
my $current_name = '';
my $last_index;
while (<$fh>) {
    chomp;
    my ($name, $index, $value) = split /\t/;
    if ( $current_name ne $name or $index > $last_index + 1 ) {
        push @names, $name unless $data{$name};
        push @{ $data{$name} }, {
            start => $index,
            count => 0,
            total => 0,
        };
        $current_name = $name;
    }
    my $entry = $data{$name}[-1];
    $entry->{end} = $index;
    $entry->{count} += 1;
    $entry->{total} += $value;
    $last_index = $index;
}
for my $name (@names) {
    for my $entry (@{ $data{$name} }) {
        my ($start, $end, $total, $count) = @{$entry}{qw/ start end total count /};
        print join("\t", $name, $start, $end, $total / $count), "\n";
    }
}
output
gi|306963568|gb|GL429799.1|_1316857_1453052 13 17 1
gi|306963568|gb|GL429799.1|_1316857_1453052 360 366 1
gi|306963580|gb|GL429787.1|_4276355_4500645 38640 38645 1
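The remark about curtailing the average with sprintf could look something like the sketch below. The totals are made-up numbers, and three decimal places is an arbitrary choice.

```perl
use strict;
use warnings;

# Hypothetical total and count for one run of consecutive indexes.
my ($total, $count) = (10, 3);

# %.3f rounds to three decimal places; pick whatever precision suits the data.
my $average = sprintf '%.3f', $total / $count;
print "$average\n";
```

In the program above, the same sprintf call would wrap $total / $count in the final print statement.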
This will produce the same output for the sample in your question:
#!/usr/bin/env perl -n
#
my ($name, $i, $value) = split(/\t/);
sub print_stats {
    print join("\t", $prev_name, $start, $prev_i, $sum / ($prev_i - $start + 1)), "\n";
}
if ($prev_name eq $name && $i == $prev_i + 1) {
    $sum += $value;
    $prev_i = $i;
}
else {
    if ($prev_name) {
        &print_stats();
    }
    $start = $i;
    $prev_name = $name;
    $sum = $value;
    $prev_i = $i;
}
END {
    &print_stats();
}
Use it as:
./parser.pl < sample.txt
UPDATE: answers to the questions in comments:
To print output to a file, run like this: ./parser.pl < sample.txt > output.txt
$prev_name and $prev_i are NOT initialized, so they are undef at first (= NULL)
You could do something like this....
open (FILE, 'data.txt') or die "Cannot open data.txt: $!";
while (<FILE>) {
    chomp;
    ($name, $start_value, $end_value, $average) = split("\t");
    print "Name: $name\n";
    print "Start Value: $start_value\n";
    print "End Value: $end_value\n";
    print "Average: $average\n";
    print "---------\n";
}
close (FILE);
exit;
Those look like GenBank files...so I'm unsure where you are getting the start, end values, average.
Here's an example using Text::CSV:
use Text::CSV; # This will implicitly use Text::CSV_XS if it's installed
my $parser = Text::CSV->new( { sep_char => '|' } );
open my $fh, '<', 'myfile' or die $!;
while (my $row = $parser->getline($fh)) {
# $row references an array of field values from the line just read
}
Also, as a minor side detail, your sample data is delimited by pipe characters, not tabs, although that may just be to avoid copy/paste errors for those answering your question. If the actual data is tab-delimited, set sep_char to "\t" instead of '|'.

merging two files with similar columns

I have two tab-separated files that I need to align together. For example:
File 1: File 2:
AAA 123 BBB 345
BBB 345 CCC 333
CCC 333 DDD 444
(These are large files, potentially thousands of lines!)
What I would like to do is to have the output look like this:
AAA 123
BBB 345 BBB 345
CCC 333 CCC 333
DDD 444
Preferably I would like to do this in Perl, but I'm not sure how. Any help would be greatly appreciated.
If it's just about making a data structure, this can be quite easy.
#!/usr/bin/env perl
# usage: script.pl file1 file2 ...
use strict;
use warnings;
my %data;
while (<>) {
    chomp;
    my ($key, $value) = split;
    push @{$data{$key}}, $value;
}
use Data::Dumper;
print Dumper \%data;
You can then output in any format you like. If it's really about using the files exactly as they are, then it's a little bit more tricky.
Assuming the files are sorted,
sub get {
    my ($fh) = @_;
    my $line = <$fh>;
    return () if !defined($line);
    return split(' ', $line);
}
my ($key1, $val1) = get($fh1);
my ($key2, $val2) = get($fh2);
while (defined($key1) && defined($key2)) {
    if ($key1 lt $key2) {
        print(join("\t", $key1, $val1), "\n");
        ($key1, $val1) = get($fh1);
    }
    elsif ($key1 gt $key2) {
        print(join("\t", '', '', $key2, $val2), "\n");
        ($key2, $val2) = get($fh2);
    }
    else {
        print(join("\t", $key1, $val1, $key2, $val2), "\n");
        ($key1, $val1) = get($fh1);
        ($key2, $val2) = get($fh2);
    }
}
# one file is exhausted; print whatever remains of the other
while (defined($key1)) {
    print(join("\t", $key1, $val1), "\n");
    ($key1, $val1) = get($fh1);
}
while (defined($key2)) {
    print(join("\t", '', '', $key2, $val2), "\n");
    ($key2, $val2) = get($fh2);
}
Similar to Joel Berger's answer, but this approach allows you to keep track of whether files did or did not contain a given key:
my %data;
while (my $line = <>){
    chomp $line;
    my ($k) = $line =~ /^(\S+)/;
    $data{$k}{line} = $line;
    $data{$k}{$ARGV} = 1;
}
use Data::Dumper;
print Dumper(\%data);
Output:
$VAR1 = {
'CCC' => {
'other.dat' => 1,
'data.dat' => 1,
'line' => 'CCC 333'
},
'BBB' => {
'other.dat' => 1,
'data.dat' => 1,
'line' => 'BBB 345'
},
'DDD' => {
'other.dat' => 1,
'line' => 'DDD 444'
},
'AAA' => {
'data.dat' => 1,
'line' => 'AAA 123'
}
};
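From a structure like the one dumped above, the side-by-side layout asked for in the question could be produced roughly as follows. This is only a sketch: the hash is built by hand with tab-separated lines, the file names data.dat and other.dat are taken from the dump, and it assumes a key's line is identical in both files, because only one copy of the line is stored.

```perl
use strict;
use warnings;

# Same shape as the dumped structure, built by hand for illustration.
my %data = (
    AAA => { line => "AAA\t123", 'data.dat' => 1 },
    BBB => { line => "BBB\t345", 'data.dat' => 1, 'other.dat' => 1 },
    CCC => { line => "CCC\t333", 'data.dat' => 1, 'other.dat' => 1 },
    DDD => { line => "DDD\t444", 'other.dat' => 1 },
);

my @rows;
for my $key (sort keys %data) {
    my $entry = $data{$key};
    # A file that lacks the key contributes two empty fields (one tab).
    my $left  = $entry->{'data.dat'}  ? $entry->{line} : "\t";
    my $right = $entry->{'other.dat'} ? $entry->{line} : "\t";
    push @rows, "$left\t$right";
}
print "$_\n" for @rows;
```

Sorting the keys gives a stable order here; for files that are not sorted alphabetically, the order of first appearance would have to be recorded separately.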
As ikegami mentioned, it assumes that the files' contents are arranged as shown in your example.
use strict;
use warnings;
open my $file1, '<file1.txt' or die $!;
open my $file2, '<file2.txt' or die $!;
my $file1_line = <$file1>;
print $file1_line;
while ( my $file2_line = <$file2> ) {
    if( defined( $file1_line = <$file1> ) ) {
        chomp $file1_line;
        print $file1_line;
    }
    my $tabs = $file1_line ? "\t" : "\t\t";
    print "$tabs$file2_line";
}
close $file1;
close $file2;
Reviewing your example, you show some identical key/value pairs in both files. Given this, it looks like you want to show the pair(s) unique to file 1, unique to file 2, and show the common pairs. If this is the case (and you're not trying to match the files' pairs by either keys or values), you can use List::Compare:
use strict;
use warnings;
use List::Compare;
open my $file1, '<file1.txt' or die $!;
my @file1 = <$file1>;
close $file1;
open my $file2, '<file2.txt' or die $!;
my @file2 = <$file2>;
close $file2;
my $lc = List::Compare->new(\@file1, \@file2);
my @file1Only = $lc->get_Lonly; # L(eft array)only
for(@file1Only) { print }
my @bothFiles = $lc->get_intersection;
for(@bothFiles) { chomp; print "$_\t$_\n" }
my @file2Only = $lc->get_Ronly; # R(ight array)only
for(@file2Only) { print "\t\t$_" }