Remove row and create new array - perl

I am reading a CSV file
A B C D
A 1, 2, 3, 4
B 5, 1, 7, 8
C 9, 4, 1, 2
D 2, 7, 8, 1
The idea is to compute matrix correlation
How can I remove a row and create a new array?
I tried this
my @new_row = split(/\s+/, $header_line);
This is my first Perl program
my @row = /@desired_row/ && $_;
current output
A 1,2,3,4
What I am trying now
my @newarray = ( );
@newarray = grep ($_ > 2, @row);
print "@newarray\n";
Result I am trying to get
A 3, 4

Your question is confusing. My interpretation is that you are trying to read a correlation matrix into a Perl data structure. The fact that the matrix has a unit diagonal seems to support that interpretation. On the other hand, the matrix is not symmetric, which makes it even more confusing.
I am assuming you are unsure how to remove the variable labels and get just the numbers into a matrix. In Perl, you can represent a matrix either as an array of references to anonymous arrays (dense matrix) or as a hash whose keys are row or column indexes and whose values are references to anonymous arrays (sparse matrix). The choice of whether to store column vectors or row vectors can affect how well you can work with the data but, once again, there is not enough information in your question to deduce what would be most appropriate in your situation.
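To make the two representations concrete, here is a minimal sketch. The variable names @dense and %sparse are made up for illustration, and only a few rows of the sample matrix are filled in.

```perl
use strict;
use warnings;

# Dense: an array of references to anonymous arrays, indexed by row number.
my @dense = (
    [ 1, 2, 3, 4 ],
    [ 5, 1, 7, 8 ],
);
print $dense[1][2], "\n";        # row 1, column 2: prints 7

# Sparse: a hash keyed by row label (or index); absent rows take no space.
my %sparse = (
    A => [ 1, 2, 3, 4 ],
    D => [ 2, 7, 8, 1 ],
);
print $sparse{D}[1], "\n";       # row D, column 1: prints 7
print exists $sparse{B} ? "B stored\n" : "B not stored\n";
```

Either way, an element is reached with two subscripts; the arrow between adjacent subscripts is optional.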
The code below shows the most basic way of reading that matrix into a Perl data structure. In your code, you would open the data file and assign a filehandle, e.g. $fh, and read from that rather than the __DATA__ section of your script.
#!/usr/bin/env perl
use strict;
use warnings;
my @vars = split ' ', <DATA>;
my @correl;
while (my $line = <DATA>) {
    push @correl, [ $line =~ /([0-9]+)/g ];
}
print join("\t", @vars), "\n";
print join("\t", @$_), "\n" for @correl;
__DATA__
A B C D
A 1, 2, 3, 4
B 5, 1, 7, 8
C 9, 4, 1, 2
D 2, 7, 8, 1
Output:
A B C D
1 2 3 4
5 1 7 8
9 4 1 2
2 7 8 1
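The question also asks how to build a new array from part of a row. Assuming the goal is to keep only the values greater than 2 (which is what the attempted grep suggests), a sketch could look like this. The sample line is hard-coded here purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample row, in the same "label value, value, ..." layout
# as the data in the question.
my $line = 'A 1, 2, 3, 4';

# Split off the label, then pull out the integers.
my ($label, $rest) = split ' ', $line, 2;
my @row = $rest =~ /([0-9]+)/g;

# grep with a block returns a new list; @row itself is left unchanged.
my @newarray = grep { $_ > 2 } @row;

print "$label ", join(', ', @newarray), "\n";   # prints "A 3, 4"
```

Note the block form grep { ... } @row; the filter has to be a block or an expression followed by a comma, and the array needs its @ sigil.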

Related

How to create a numeric ruler and evenly spaced numeric width markers

I am new to Perl. I am working on an exercise in which I have to create a numeric ruler that I can use to size columns of data at 20 characters wide, and it is proving a little difficult to complete. So far, I have:
printf "%10d" x 5, (1..6);
@ruler = (1..10) x 7;
print @ruler, "\n";
It should look something like,
1         2         3         4
1234567890123456789012345678901234567890
What I get for the top row of numbers is an error, 'Redundant argument in printf at <script.pl> line #'. The bottom row produces the numbers from 1 to 10, as it should with the range operator, but I would like it to produce 1 to 9 with a zero on the end. I did think to start the range from 0, but I haven’t figured out how to remove the first element and only the first element.
I would be grateful for your guidance with both issues.
The warning is due to the fact that you pass 6 numbers to printf, but the format only requires 5.
To me,
1         2         3         4
1234567890123456789012345678901234567890
reads as
11, 12, 13, ..., 19, 10, 21, 22, 23, ...
Why does it start with 11? Why is 10 between 19 and 21?
The following makes more sense:
          1         2         3         4
01234567890123456789012345678901234567890   0-based
and
         1         2         3         4
1234567890123456789012345678901234567890    1-based
I'm not going to give the solution outright.
If you want the numbers 1 to 9 and 0, that would be 1..9, 0.
%10d will add padding on the left. %-10d will add padding on the right.
Nothing says you can't prefix the output with something that doesn't repeat, like a zero or a space.
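These hints can be tried in isolation. The snippet below is only a sketch of the individual building blocks, not the finished ruler; the variable names are made up.

```perl
use strict;
use warnings;

# Right-justified: padding goes on the left of the number.
my $right = sprintf '%10d', 1;
# Left-justified: padding goes on the right of the number.
my $left  = sprintf '%-10d', 1;

# Brackets make the padding visible.
print "[$right]\n";   # [         1]
print "[$left]\n";    # [1         ]

# The digits 1 to 9 followed by 0, as a single string.
my $digits = join '', 1..9, 0;
print "$digits\n";    # 1234567890
```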
The desired output as posted starts counting from 11 instead of 1, which doesn't look right.
Perhaps the OP intended to count from 1 up to some $max value, with a digit representing the tens placed above the main counter.
Please check whether the following code sample meets your requirements.
use strict;
use warnings;
use feature 'say';
my $max = shift || 45;
rule($max);
sub rule {
    my $max = shift;
    my($a,$b);
    $a .= ' ' x 9 . $_ for 1..$max/10;
    $b .= $_ % 10 for 1..$max;
    say $a . "\n" . $b;
}
Output
         1         2         3         4
123456789012345678901234567890123456789012345
The OP's original code needs only a slight modification to produce the desired output:
use strict;
use warnings;
use feature 'say';
my $max = shift || 45;
printf "%10d" x int($max/10) . "\n", (1..$max/10);
print $_ % 10 for 1..$max;
print "\n";

Perl script to check another array values depending on current array index

I'm working on a Perl assignment that has three arrays, @array_A, @array_B and @array_C, with some values in them. I grep for the string "CAT" in @array_A and fetch its indices too:
my @index = grep { $#array_A[$_] =~ 'CAT' } 0..$#array_A;
print "Index : @index\n";
Output: Index : 2 5
I have to take this as an input and check the value of other two arrays at indices 2 and 5 and print it to a file.
Trick is the position of the string - "CAT" varies. (Index might be 5 , 7 and 9)
I'm not quite getting the logic here , looking for some help with the logic.
Here's an overly verbose example of how to extract the values you want, written to show what's happening while hopefully leaving some room for you to investigate further. Note that it's idiomatic Perl to use regex delimiters when using =~, e.g. $name =~ /steve/.
use warnings;
use strict;
my @a1 = qw(AT SAT CAT BAT MAT CAT SLAT);
my @a2 = qw(a b c d e f g);
my @a3 = qw(1 2 3 4 5 6 7);
# note the difference in the next line... no # symbol...
my @indexes = grep { $a1[$_] =~ /CAT/ } 0..$#a1;
for my $index (@indexes){
    my $a2_value = $a2[$index];
    my $a3_value = $a3[$index];
    print "a1 index: $index\n" .
          "a2 value: $a2_value\n" .
          "a3 value: $a3_value\n" .
          "\n";
}
Output:
a1 index: 2
a2 value: c
a3 value: 3
a1 index: 5
a2 value: f
a3 value: 6
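Since the question also mentions printing to a file, here is a more compact sketch that uses array slices and a lexical filehandle. The file name matches.txt and the tab-separated layout are placeholders, not something the question specifies.

```perl
use strict;
use warnings;

my @a1 = qw(AT SAT CAT BAT MAT CAT SLAT);
my @a2 = qw(a b c d e f g);
my @a3 = qw(1 2 3 4 5 6 7);

my @indexes = grep { $a1[$_] =~ /CAT/ } 0 .. $#a1;

# An array slice fetches the elements at all matching indexes at once.
my @a2_values = @a2[@indexes];
my @a3_values = @a3[@indexes];

# 'matches.txt' is a placeholder file name.
open my $out, '>', 'matches.txt' or die "Cannot open matches.txt: $!";
print {$out} "$indexes[$_]\t$a2_values[$_]\t$a3_values[$_]\n" for 0 .. $#indexes;
close $out or die "Cannot close matches.txt: $!";
```

Because the indexes are recomputed from @a1 on every run, it does not matter where "CAT" happens to appear.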

How to get the array of all ngrams in Perl Text::Ngrams

As you know the module Text::Ngrams in Perl can give Ngrams analysis. There is the following function to retrieve the array of Ngrams and frequencies.
get_ngrams(orderby=>'ngram|frequency|none',onlyfirst=>NUMBER,out=>filename|handle,normalize=>1)
But it gives only the last Ngrams.
For example the following code does not give both Uni-Gram and Bi-Gram:
my $ng3 = Text::Ngrams->new( windowsize => 2, type=>'byte');
my $text = "test teXT TESTtexT";
$text =~ s/ +/ /g; # replace multiple spaces to single
$text = uc $text; # uppercase all
$ng3->process_text($text);
my @ngramsarray = $ng3->get_ngrams(orderby=>'frequency', onlyfirst=>10, normalize=>0 );
foreach(@ngramsarray)
{
    print "$_\n";
}
output:
T E
4
E X
2
_ T
2
E S
2
S T
2
X T
2
T _
2
T T
1
However by using the function
to_string(orderby=>'ngram|frequency|none',onlyfirst=>NUMBER,out=>filename|handle,normalize=>1,spartan=>1)
it shows both n-gram sizes. However, it only displays the result, and I need the result in an array.
How to get all Ngrams (Unigram and Bigram) at the same time by this array?
You can't get all the different sizes of n-grams at the same time, but you can get them all using multiple calls to get_ngrams. There is an undocumented parameter n to get_ngrams that says the size of the n-grams you want listed.
In your code, if you say
my @ngramsarray = $ng3->get_ngrams(
    n => 1,
    orderby => 'frequency',
    onlyfirst => 10,
    normalize => 0);
you get this list
('T', 8, 'E', 4, 'X', 2, '_', 2, 'S', 2)
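The returned list alternates n-gram and count, so it can be loaded straight into a hash if that is more convenient. The sketch below does not call Text::Ngrams at all; it starts from the list shown above, copied in by hand, and %unigram_count is a made-up name.

```perl
use strict;
use warnings;

# The list shown above, copied in by hand; in a real script it would be
# the return value of get_ngrams.
my @ngramsarray = ('T', 8, 'E', 4, 'X', 2, '_', 2, 'S', 2);

# A flat list of key/value pairs assigns directly to a hash.
my %unigram_count = @ngramsarray;

# Sort by descending frequency, breaking ties alphabetically.
for my $ngram (sort { $unigram_count{$b} <=> $unigram_count{$a} or $a cmp $b }
               keys %unigram_count) {
    print "$ngram\t$unigram_count{$ngram}\n";
}
```

A hash does not preserve the order of the list, which is why the loop sorts again. Repeating the call with n => 2 and filling a second hash would presumably give the bigrams the same way.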

Perl Fibonacci number program with array references?

EDIT: SOLVED. See solution below.
I wrote the following Fibonacci number program for the first 10 numbers; however I can't get it to work. It keeps telling me that Use of uninitialized value $secondLast in addition (+) at fib.plx line 22. and it runs forever. I'm a beginner Perl programmer so I'm sure the error is very simple. Thanks.
#!/usr/bin/perl
use warnings;
use strict;
sub fib(\@$);
my @defaultNums = (1,1);
my $max = 10;
fib(@defaultNums,10);
sub fib(\@$)
{
    my $nums_ref = $_[0];
    my $max = $_[1];
    foreach(@{$nums_ref})
    {
        print "$_, ";
    }
    print "\n";
    my $last = pop (@{$nums_ref});
    my $secondLast = pop (@{$nums_ref});
    my $sum = $last + $secondLast;
    push (@{$nums_ref}, $sum);
    if( scalar @{$nums_ref} >= $max) { return; }
    fib (@{$nums_ref},$max);
    print "\n";
}
A few notes on your program:
It is important that you avoid using subroutine prototypes unless you are absolutely sure about what you are doing. Prototypes are primarily intended for writing replacements for Perl built-in operators, and that is something that is rarely required. You should explicitly pass a reference to an array by writing the call as fib(\@defaultNums, 10).
The first two values in the Fibonacci sequence are zero and one. You have seeded your sequence with the second and third values which, while it will work fine, isn't mathematically correct.
Because Perl is sensitive to context, you can remove the scalar call when comparing the number of elements in the array. You can also use the if statement modifier to avoid a lot of noise, so your return line becomes return if @{$nums_ref} >= $max.
Your chosen solution, replacing the items popped from the array, is inefficient and counter-intuitive. Using list assignment and Perl's ability to index array elements from the end of the array lets you write my ($last, $secondLast) = @{$nums_ref}[-1, -2], which doesn't remove the elements, so they don't need to be replaced.
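The last two points can be seen in isolation with a small, made-up array. This is just a sketch of the slice and the context rule, separate from the Fibonacci program itself.

```perl
use strict;
use warnings;

my @nums     = (0, 1, 1, 2, 3);
my $nums_ref = \@nums;

# A slice with negative indexes reads from the end without removing anything.
my ($last, $second_last) = @{$nums_ref}[-1, -2];
print "$last $second_last\n";          # prints "3 2"

# In a numeric comparison the array is in scalar context and yields its length.
print "five or more\n" if @{$nums_ref} >= 5;

# The array is unchanged, unlike after two pops.
print scalar(@nums), "\n";             # prints 5
```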
Here is a quick rewrite of your program to show you what you've been missing!
use strict;
use warnings;
my @defaultNums = (0, 1);
fib(\@defaultNums, 10);
sub fib {
    my ($nums_ref, $max) = @_;
    print join(', ', @$nums_ref), "\n";
    my ($last, $secondLast) = @{$nums_ref}[-1, -2];
    my $sum = $last + $secondLast;
    push @$nums_ref, $sum;
    return if @{$nums_ref} >= $max;
    fib($nums_ref, $max);
}
output
0, 1
0, 1, 1
0, 1, 1, 2
0, 1, 1, 2, 3
0, 1, 1, 2, 3, 5
0, 1, 1, 2, 3, 5, 8
0, 1, 1, 2, 3, 5, 8, 13
0, 1, 1, 2, 3, 5, 8, 13, 21
I figured out my mistake. I'm popping off of the array twice each time, so I need to remember to push those numbers back on before I push on $sum. Thanks anyways.

How to subtract values in 2 different arrays in perl?

Hi I have two arrays containing 4 columns and I want to subtract the value in column1 of array2 from value in column1 of array1 and value of column2 of array2 from column2 of array1 so on..example:
my @array1=(4.3,0.2,7,2.2,0.2,2.4)
my @array2=(2.2,0.6,5,2.1,1.3,3.2)
so the required output is
2.1 -0.4 2 # [4.3-2.2] [0.2-0.6] [7-5]
0.1 -1.1 -0.8 # [2.2-2.1] [0.2-1.3] [2.4-3.2]
For this the code I used is
my @diff = map {$array1[$_] - $array2[$_]} (0..2);
print OUT join(' ', @diff), "\n";
and the output I am getting now is
2.1 -0.4 2
2.2 -1.1 3.8
Again the first row is used from array one and not the second row, how can I overcome this problem?
I need the output in rows of 3 columns, as shown above, which is why I filled my arrays in rows of 3 values.
This will produce the requested output. However, I suspect (based on your comments), that we could produce a better solution if you simply showed your raw input.
use strict;
use warnings;
my @a1 = (4.3,0.2,7,2.2,0.2,2.4);
my @a2 = (2.2,0.6,5,2.1,1.3,3.2);
my @out = map { $a1[$_] - $a2[$_] } 0 .. $#a1;
print "@out[0..2]\n";
print "@out[3..$#a1]\n";
First of all, your code doesn't even compile. Perl arrays aren't space separated - you need a qw() to turn those into arrays. Not sure how you got your results.
Perl doesn't have 2D arrays. 2.2 is NOT column 1 of row 1 of @array1; it's the element with index 3 of @array1. As far as Perl is concerned, your newline is just another whitespace separator, NOT something that magically turns a 1-d array into a table as you seem to think.
To get the result you want (process those 6 elements as two 3-element arrays), you can either store them in an array of arrayrefs (Perl's implementation of C multidimensional arrays):
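my @array1=( [ 4.3, 0.2, 7   ],
             [ 2.2, 0.2, 2.4 ] );
# @array2 is built the same way from the values in the question
my @array2=( [ 2.2, 0.6, 5   ],
             [ 2.1, 1.3, 3.2 ] );
for(my $idx1=0; $idx1 < 2; $idx1++) {
    for(my $idx2=0; $idx2 < 3; $idx2++) {
        print $array1[$idx1]->[$idx2] - $array2[$idx1]->[$idx2];
        print " ";
    }
    print "\n";
}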
or, you can simply fake it by using offsets, the same way pointer arithmetic works in C's multidimensional arrays:
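my @array1=( 4.3, 0.2, 7,      # index 0..2
             2.2, 0.2, 2.4);   # index 3..5
# @array2 is laid out the same way, using the values from the question
my @array2=( 2.2, 0.6, 5,      # index 0..2
             2.1, 1.3, 3.2);   # index 3..5
for(my $idx1=0; $idx1 < 2; $idx1++) {
    for(my $idx2=0; $idx2 < 3; $idx2++) {
        print $array1[$idx1 * 3 + $idx2] - $array2[$idx1 * 3 + $idx2];
        print " ";
    }
    print "\n";
}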
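If the row width might change, a third option is to compute all the differences on the flat arrays and then cut the result into rows with splice. This is only a sketch; $cols and @rows are names introduced here, and the width of 3 is taken from the question's layout.

```perl
use strict;
use warnings;

my @array1 = (4.3, 0.2, 7, 2.2, 0.2, 2.4);
my @array2 = (2.2, 0.6, 5, 2.1, 1.3, 3.2);
my $cols   = 3;    # assumed row width

# Element-wise differences over the whole flat array.
my @diff = map { $array1[$_] - $array2[$_] } 0 .. $#array1;

# splice removes and returns the first $cols elements on each pass.
my @rows;
push @rows, [ splice @diff, 0, $cols ] while @diff;

print "@$_\n" for @rows;
```

Note that splice empties @diff as it goes, so copy the array first if the flat differences are needed afterwards.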