I have a hash of hashes where, at the last level, I want each value to be appended - not updated - if that value already exists. What would be the best way to do this? I was thinking about making the values as lists, but this is probably not the most efficient way...
Here's where I got so far:
#!/usr/local/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $dir='D:\';
open my $data,"<","$dir\\file.txt";
my (#selecteddata,#array,%hash);
while (<$data>) {
chomp $_;
my #line= split "\t";
$hash{$line[1]}{$line[2]}=$line[0];
warn Dumper \%hash;
}
close $data;
Note, this code updates the values at last level with value $line[0], but if the key $line[4] already exists (meaning, it already has a previous value $line[0]) I want this value to be appended and not updated.
So, ideally, for the following (tab sepparated) list:
a1 b1 c1
a2 b2 c2
a3 b3 c3
a4 b4 c4
a5 b4 c4
The hash would look something like this - I don't know exactly how the grouping of a4 and a5 should look like, so as long as they are grouped it should be ok:
{
'b1' => {'c1' => 'a1'},
'b2' => {'c2' => 'a2'},
'b3' => {'c3' => 'a3'},
'b4' => {'c4' => 'a4, a5'}
}
You can append your string,
$_ = defined($_) ? "$_, $line[0]" : $line[0]
for $hash{$line[1]}{$line[2]};
or use array which is better suited for storing list of elements,
push #{ $hash{$line[1]}{$line[2]} }, $line[0];
Related
My data looks like this:
G1 G2 G3 G4
Pf1 NO B1 NO D1
Pf2 NO NO C1 D1
Pf3 A1 B1 NO D1
Pf4 A1 NO C1 D2
Pf5 A3 B2 C2 D3
Pf6 NO B3 NO D3
My purpose is to check in each column if an element (different from the "NO" cases) is showed twice (like A1 in column 2, for example) and only twice (if it is showed three times or more I don't want it in the output) and, if so, write the correspondent elements of the first column. So, the desired output looks like this:
Pf3 Pf4 A1
Pf1 Pf3 B1
Pf2 Pf4 C1
Pf5 Pf6 D3
I'm trying to write a perl script, but I need some help to focus on the different steps. This is what I did so far:
open (HAN, "< $file_in") || die "Impossible open the in_file";
#r = <HAN>;
close (HAN);
for ($i=0; $i<=$#r; $i++){
chomp ($r[$i]);
($Ids, #v) = split (/\t/, $r[$i]);
}
}
But I cannot go on in any direction!
(My perl knowledge needs to be pushed by you!)
The hot points in my mind are:
how do I compare elements from the same column (or anyway in the same file)?
how can I associate the elements of the first column with the other column ones (may be keys)?
Any help is absolutely necessary and welcome!
use Data::Dumper;
my %hash;
while (<DATA>) {
next if $.==1;
chomp;
my ($first,#others) = (split /\s+/);
for (#others){
$hash{$_}.=' '.$first;
}
}
print Dumper \%hash;
__DATA__
G1 G2 G3 G4
Pf1 NO B1 NO D1
Pf2 NO NO C1 D1
Pf3 A1 B1 NO D1
Pf4 A1 NO C1 D2
Pf5 A3 B2 C2 D3
Pf6 NO B3 NO D3
What I use here? (tricks)
while (<DATA>){BLOCK} - read data from specific DATA section in Perl script file. (yes, you can put test data here, if you want. But don't store everything! this is not a bin!)
next if $.==1 - $. - special variable, that store a line number of input data. like 'index'.
chomp; - back to while(<DATA>).
Some variables in Perl are hidden. In functions - #_ array of input parameters. And always Perl programmers like to use $_ - You variable.
And this while(<DATA>) really a hidden while(defined($_ = <DATA>)).
Function chomp use hidden-You variable and try to chop \n symbol at the end.
Function split /REGEX/ also take as default variable hidden-You variable ($_).
Perl multi liner :),
perl -anE '
/^\S/ or next;
$k = shift #F;
push #{$t{$_}}, $k for#F;
}{
#$_-1==2 and say join" ",#$_ for map [#{$t{$_}},$_], sort keys%t;
' file
I would like a relatively compact command to perform line-by-line de-interleaving of a text file, i.e
a1
a2
a3
a4
b1
b2
b3
b4
c1
c2
c3
c4
d1
d2
d3
d4
maps to
a1
b1
c1
d1
a2
b2
c2
d2
a3
b3
c3
d3
a4
b4
c4
d4
The interleaving depth should be adjustable. The lines themselves do not contain any useful structure to assist with the process, and the example above is just a toy example for demonstration purposes. What tool can I use to do this?
sort can do it!
$ sort -k1.2 your_file
-k1.2 sorts by first field starting from 2nd character.
Output:
a1
b1
c1
d1
a2
b2
c2
d2
a3
b3
c3
d3
a4
b4
c4
d4
Basically, what you're looking at doing is reading your data into a 2D array. As you read it in, you can (for example) put the data into the array row by row.
Then when you write the data out, you traverse the array column by column. Adjusting the (de-)interleaving you do just requires a different size of array (or at least that you use a different amount of it, though you could leave the array size itself fixed, if you chose).
According to your new requirements, reordering elements based on their position in the file:
use strict;
use warnings;
my #sorted;
my $depth = 4; # the adjustable interleaving depth
while (<DATA>) {
my $num = ($. % $depth) - 1; # $. is input line number
push #{ $sorted[$num] }, $_;
}
for (#sorted) {
print #$_;
}
__DATA__
a1
a2
a3
a4
b1
b2
b3
b4
c1
c2
c3
c4
d1
d2
d3
d4
Note that the script can be tested on an input file by changing <DATA> to <> and running:
perl script.pl input.txt
Update
Having finally understood your question, thanks to TLP, I suggest this solution. It expects the depth and the input file name on the command line:
$ perl deinter.pl 4 interleaved.txt
and prints the reordered data to STDOUT.
use strict;
use warnings;
my $depth = shift;
my #data = <>;
for my $start (0 .. $depth-1) {
for (my $i = $start; $i < #data; $i += $depth) {
print $data[$i];
}
}
output
a1
b1
c1
d1
a2
b2
c2
d2
a3
b3
c3
d3
a4
b4
c4
d4
Previous solution
Here is a technique that reads the whole file into memory, builds a set of keys for comparison, and sorts the indices of the data so that they can be printed in the new order.
It can be changed for your purposes by modifying the regular expression that extracts the keys fields, and changing the sort block so that the sorted order is correct.
If your file is enormous then it may be necessary to build only the array of keys in memory, and leave the rest of the data on file to be read as it is output.
use strict;
use warnings;
open my $fh, '<', 'interleaved.txt' or die $!;
my #data = <$fh>;
my #keys = map [ /^(.)(.)/ ], #data;
my #sorted = sort {
$keys[$a][1] <=> $keys[$b][1] or
$keys[$a][0] cmp $keys[$b][0]
} 0 .. $#keys;
print $data[$_] for #sorted;
This might work for you (GNU sed and sort):
sed '1{x;s/^/1/;x};G;s/\n/\t/p;x;y/1234/2341/;x;d' file|sort -sk2|sed 's/\t.*//'
I'd like to credit Borodin and TLP for their inputs and answers, which inspired the solution. Its ugly, but I like it
awk 'BEGIN{v=4}{now=(NR-1)%v; STOR[now] = STOR[now] "\n" $0;} END {for (v in STOR) print STOR[v]}'
It also has the flaw of printing spurious newlines (well, the ones appended to the start of the array), but I can deal with that.
EDIT:
Solution for the newlines:
awk 'BEGIN{v=4}{now=(NR-1)%v; STOR[now] = STOR[now] "\n" $0;} END {for (v in STOR) print substr(STOR[v],2)}'
I would like to remove the first character from a the elements of an array in a Perl script.
I have this line of script:
#dash = split /\s+/, $dash;
The variable "dash" is read from a particular row of my file: Example
21 A10 A11 A12 A13 ..
Then I have tried to push these values to my hash called "flowers"
for $i (1..$#dash) {
push(#flowers, $line[$i]);
}
This seems to work for what I need in my subsequent lines of script but I have found out that $dash contains unwanted character in front of each values:
A10 A11 A12 A13 ..
instead of
10 11 12 13 .....
but I wanted #flowers to contain:
10 11 12 13 ....
How can I delete the first character Before I pushed it to my hash (#flowers)
chop(#flowers);
could have worked but it only chops out the last character. When I tried to use
substr($dash, 0, 2)
It does produce 10, but all the rest of the values A11 A12 A13 is no longer in my #flowers.
Any help is appreciated.
This will operate on each element of the #dash array :
#dash = split /\s+/, $dash;
shift #dash;
#dash = map { substr($_, 1) } #dash;
Your substr($dash, 0, 2) was operating on the line as one string, not each element of it.
And, unless you need the index for some other operation :
push #flowers, #dash
That will push all elements of #dash onto #flowers. Which looks like what you're doing.
Why not just change the regex in the split?
split /\s+\D?/, $dash;
Adding them to #flowers this way if you want:
push( #flowers, split(/\s+\D?/, $dash) );
You need some kind of loop, since you want to do something to each element of #dash other than the first. map is convenient here.
my #flowers = map substr($dash[$_], 1), 1..$#dash;
which is the short way of writing
my #flowers;
for (1..$#dash) {
push #flowers, substr($dash[$_], 1);
}
I suggest that you just pull out all the digit sequences from $dash, like this:
my $dash = '21 A10 A11 A12 A13 .. ';
my #flowers = $dash =~ /\d+/g;
shift #flowers;
print "#flowers";
output
10 11 12 13
This is a possible solution:
use strict;
use warnings;
my $dash = "21 A10 A11 A12 A13"; #test data
my #dash = split /\s+/, $dash; #split into #dash array
shift #dash; #delete first array value
$_ = substr($_,1) for #dash; #for each item in array, remove the first character
print "#dash\n"; #prints: 10 11 12 13
I have a file with around 25000 records, each records has more than 13 entries are drug names. I want to form all the possible pair combination for these entries. Eg: if a line has three records A, B, C. I should form combinations as 1) A B 2) A C 3)B C. Below is the code I got from internet, it works only if a single line is assigned to an array:
use Math::Combinatorics;
my #n = qw(a b c);
my $combinat = Math::Combinatorics->new(
count => 2,
data => [#n],
);
while ( my #combo = $combinat->next_combination ) {
print join( ' ', #combo ) . "\n";
}
The code I am using, it doesn't produce any output:
open IN, "drugs.txt" or die "Cannot open the drug file";
open OUT, ">Combination.txt";
use Math::Combinatorics;
while (<IN>) {
chomp $_;
#Drugs = split /\t/, $_;
#n = $Drugs[1];
my $combinat = Math::Combinatorics->new(
count => 2,
data => [#n],
);
while ( my #combo = $combinat->next_combination ) {
print join( ' ', #combo ) . "\n";
}
print "\n";
}
Can you please suggest me a solution to this problem?
You're setting #n to be an array containing the second value of the #Drugs array, try just using data => \#Drugs in the Math::Combinatorics constructor.
Also, use strict; use warnings; blahblahblah.
All pairs from an array are straightforward to compute. Using drugs A, B, and C as from your question, you might think of them forming a square matrix.
AA AB AC
BA BB BC
CA CB CC
You probably do not want the “diagonal” pairs AA, BB, and CC. Note that the remaining elements are symmetrical. For example, element (0,1) is AB and (1,0) is BA. Here again, I assume these are the same and that you do not want duplicates.
To borrow a term from linear algebra, you want the upper triangle. Doing it this way eliminates duplicates by construction, assuming that each drug name on a given line is unique. An algorithm for this is below.
Select in turn each drug q on the line. For each of these, perform steps 2 and 3.
Beginning with the drug immediately following q and then for each drug r in the rest of the list, perform step 3.
Record the pair (q, r).
The recorded list is the list of all unique pairs.
In Perl, this looks like
#! /usr/bin/env perl
use strict;
use warnings;
sub pairs {
my #a = #_;
my #pairs;
foreach my $i (0 .. $#a) {
foreach my $j ($i+1 .. $#a) {
push #pairs, [ #a[$i,$j] ];
}
}
wantarray ? #pairs : \#pairs;
}
my $line = "Perlix\tScalaris\tHashagra\tNextium";
for (pairs split /\t/, $line) {
print "#$_\n";
}
Output:
Perlix Scalaris
Perlix Hashagra
Perlix Nextium
Scalaris Hashagra
Scalaris Nextium
Hashagra Nextium
I've answered something like this before for someone else. For them, they had a question on how to combine a list of letters into all possible words.
Take a look at How Can I Generate a List of Words from a group of Letters Using Perl. In it, you'll see an example of using Math::Combinatorics from my answer and the correct answer that ikegami had. (He did something rather interesting with regular expressions).
I'm sure one of these will lead you to the answer you need. Maybe when I have more time, I'll flesh out an answer specifically for your question. I hope this link helps.
in this example I want to read the letter "d" from $ref:
$ref={a,b,c,{d,e}}
# Start using these!
use strict;
use warnings;
# A more standard way of writing your example.
my $ref = { a => "b", c => { d => "e", f => "g" } };
# How to access elements within the structure.
my $inner = $ref->{c};
print $_, "\n" for
$inner->{d}, # e
keys %$inner, # d f
$ref->{c}{d}, # e (directly, without using intermediate variable).
;
For more info, see the Perl Data Structures Cookbook.
print keys %{$ref->{c}}; will work for that specific (awful) example. It may or may not solve your problem since we don't know what the problem actually is.