Removing Lines and columns with all zeros - perl

How can I delete lines (rows) and columns in a text file that contain all the zeros.
For example, I have a file:
1 0 1 0 1
0 0 0 0 0
1 1 1 0 1
0 1 1 0 1
1 1 0 0 0
0 0 0 0 0
0 0 1 0 1
I want to delete 2nd and 4th line and also the 2nd column. The output should look like:
1 0 1 1
1 1 1 1
0 1 1 1
1 1 0 0
0 0 1 1
I can do this using sed and egrep
sed '/0 0 0 0/d' or egrep -v '^(0 0 0 0 )$'
for lines with zeros but that would too inconvenient for files with thousands of columns. I have no idea how can I remove the column with all zeros, 2nd column here.

Perl solution. It keeps all the non-zero lines in memory to be printed at the end, because it cannot tell what columns will be non-zero before it processes the whole file. If you get Out of memory, you may only store the numbers of the lines you want to output, and process the file again while printing the lines.
#!/usr/bin/perl
use warnings;
use strict;
my #nonzero; # What columns where not zero.
my #output; # The whole table for output.
while (<>) {
next unless /1/;
my #col = split;
$col[$_] and $nonzero[$_] ||= 1 for 0 .. $#col;
push #output, \#col;
}
my #columns = grep $nonzero[$_], 0 .. $#nonzero; # What columns to output.
for my $line (#output) {
print "#{$line}[#columns]\n";
}

Rather than storing lines in memory, this version scans the file twice: Once to find the "zero columns", and again to find the "zero rows" and perform the output:
awk '
NR==1 {for (i=1; i<=NF; i++) if ($i == 0) zerocol[i]=1; next}
NR==FNR {for (idx in zerocol) if ($idx) delete zerocol[idx]; next}
{p=0; for (i=1; i<=NF; i++) if ($i) {p++; break}}
p {for (i=1; i<=NF; i++) if (!(i in zerocol)) printf "%s%s", $i, OFS; print ""}
' file file
1 0 1 1
1 1 1 1
0 1 1 1
1 1 0 0
0 0 1 1
A ruby program: ruby has a nice array method transpose
#!/usr/bin/ruby
def remove_zeros(m)
m.select {|row| row.detect {|elem| elem != 0}}
end
matrix = File.readlines(ARGV[0]).map {|line| line.split.map {|elem| elem.to_i}}
# remove zero rows
matrix = remove_zeros(matrix)
# remove zero rows from the transposed matrix, then re-transpose the result
matrix = remove_zeros(matrix.transpose).transpose
matrix.each {|row| puts row.join(" ")}

Another awk variant:
awk '{show=0; for (i=1; i<=NF; i++) {if ($i!=0) show=1; col[i]+=$i;}} show==1{tr++; for (i=1; i<=NF; i++) vals[tr,i]=$i; tc=NF} END{for(i=1; i<=tr; i++) { for (j=1; j<=tc; j++) { if (col[j]>0) printf("%s%s", vals[i,j], OFS)} print ""; } }' file
Expanded Form:
awk '{
show=0;
for (i=1; i<=NF; i++) {
if ($i != 0)
show=1;
col[i]+=$i;
}
}
show==1 {
tr++;
for (i=1; i<=NF; i++)
vals[tr,i]=$i;
tc=NF
}
END {
for(i=1; i<=tr; i++) {
for (j=1; j<=tc; j++) {
if (col[j]>0)
printf("%s%s", vals[i,j], OFS)
}
print ""
}
}' file

Try this:
perl -n -e '$_ !~ /0 0 0 0/ and print' data.txt
Or simply:
perl -n -e '/1/ and print' data.txt
Where data.txt contains your data.
In Windows, use double quotes:
perl -n -e "/1/ and print" data.txt

All together:
$ awk '{for (i=1; i<=NF; i++) {if ($i) {print; next}}}' file | awk '{l=NR; c=NF; for (i=1; i<=c; i++) {a[l,i]=$i; if ($i) e[i]++}} END{for (i=1; i<=l; i++) {for (j=1; j<=c; j++) {if (e[j]) printf "%d ",a[i,j] } printf "\n"}}'
This makes the row checking:
$ awk '{for (i=1; i<=NF; i++) {if ($i) {print; next}}}' file
1 0 1 1
1 0 1 0
1 0 0 1
It loops through all the fields of the line. If any of them are "true" (meaning not 0), it prints the line (print) and breaks to next line (next).
This makes the column checking:
$ awk '{l=NR; c=NF;
for (i=1; i<=c; i++) {
a[l,i]=$i;
if ($i) e[i]++
}}
END{
for (i=1; i<=l; i++){
for (j=1; j<=c; j++)
{if (e[j]) printf "%d ",a[i,j] }
printf "\n"
}
}'
It basically saves all the data in the a array, l number of lines, c number of columns. e is an array saving if a column has any value different from 0 or not. Then it loops and prints all fields just when e array index is set, meaning if that column has any non-zero value.
Test
$ cat a
1 0 1 0 1
0 0 0 0 0
1 1 1 0 1
0 1 1 0 1
1 1 0 0 0
0 0 0 0 0
0 0 1 0 1
$ awk '{for (i=1; i<=NF; i++) {if ($i) {print; next}}}' a | awk '{l=NR; c=NF; for (i=1; i<=c; i++) {a[l,i]=$i; if ($i) e[i]++}} END{for (i=1; i<=l; i++) {for (j=1; j<=c; j++) {if (e[j]) printf "%d ",a[i,j] } printf "\n"}}'
1 0 1 1
1 1 1 1
0 1 1 1
1 1 0 0
0 0 1 1
previous input:
$ cat file
1 0 1 1
0 0 0 0
1 0 1 0
0 0 0 0
1 0 0 1
$ awk '{for (i=1; i<=NF; i++) {if ($i) {print; next}}}' file | awk '{l=NR; c=NF; for (i=1; i<=c; i++) {a[l,i]=$i; if ($i) e[i]++}} END{for (i=1; i<=l; i++) {for (j=1; j<=c; j++) {if (e[j]) printf "%d ",a[i,j] } printf "\n"}}'
1 1 1
1 1 0
1 0 1

The following script also makes two passes. During the first pass, it saves the line numbers of lines to be omitted from the output and the column indexes that should be included in the output. In the second pass, it outputs those lines and columns. I think this should provide close to the smallest possible memory footprint which might matter if you are dealing with large files.
#!/usr/bin/env perl
use strict;
use warnings;
filter_zeros(\*DATA);
sub filter_zeros {
my $fh = shift;
my $pos = tell $fh;
my %nonzero_cols;
my %zero_rows;
while (my $line = <$fh>) {
last unless $line =~ /\S/;
my #row = split ' ', $line;
my #nonzero_idx = grep $row[$_], 0 .. $#row;
unless (#nonzero_idx) {
$zero_rows{$.} = undef;
next;
}
$nonzero_cols{$_} = undef for #nonzero_idx;
}
my #matrix;
{
my #idx = sort {$a <=> $b } keys %nonzero_cols;
seek $fh, $pos, 0;
local $. = 0;
while (my $line = <$fh>) {
last unless $line =~ /\S/;
next if exists $zero_rows{$.};
print join(' ', (split ' ', $line)[#idx]), "\n";
}
}
}
__DATA__
1 0 1 0 1
0 0 0 0 0
1 1 1 0 1
0 1 1 0 1
1 1 0 0 0
0 0 0 0 0
0 0 1 0 1
Output:
1 0 1 1
1 1 1 1
0 1 1 1
1 1 0 0
0 0 1 1

Little bit unorthodox solution but fast as hell and small memory consumption:
perl -nE's/\s+//g;$m|=$v=pack("b*",$_);push#v,$v if$v!~/\000/}{$m=unpack("b*",$m);#m=split//,$m;#m=grep{$m[$_]eq"1"}0..$#m;say"#{[(split//,unpack(q(b*),$_))[#m]]}"for#v'

This is my awk solution. It would work with variable number of rows and columns.
#!/usr/bin/gawk -f
BEGIN {
FS = " "
}
{
for (c = 1; c <= NF; ++c) {
v = $c
map[c, NR] = v
ctotal[c] += v
rtotal[NR] += v
}
fields[NR] = NF
}
END {
for (r = 1; r <= NR; ++r) {
if (rtotal[r]) {
append = 0
f = fields[r]
for (c = 1; c <= f; ++c) {
if (ctotal[c]) {
if (append) {
printf " " map[c, r]
} else {
printf map[c, r]
append = 1
}
}
}
print ""
}
}
}

Off the top of my head...
The problem is the columns. How do you know if a column is all zeros until you read in the entire file?
I'm thinking you need an array of the columns with each array being the column. You can push in the amounts. An array of arrays.
The trick is to skip the rows that contain all zeros as you read them in:
#! /usr/bin/env perl
#
use strict;
use warnings;
use autodie;
use feature qw(say);
use Data::Dumper;
my #array_of_columns;
for my $row ( <DATA> ) {
chomp $row;
next if $row =~ /^(0\s*)+$/; #Skip zero rows;
my #columns = split /\s+/, $row;
for my $index ( (0..$#columns) ) {
push #{ $array_of_columns[$index] }, $columns[$index];
}
}
# Remove the columns that contain nothing but zeros;
for my $column ( (0..$#array_of_columns) ) {
my $index = $#array_of_columns - $column;
my $values = join "", #{ $array_of_columns[$index] };
if ( $values =~ /^0+$/ ) {
splice ( #array_of_columns, $index, 1 );
}
}
say Dumper \#array_of_columns;
__DATA__
1 0 1 0 1
0 0 0 0 0
1 1 1 0 1
0 1 1 0 1
1 1 0 0 0
0 0 0 0 0
0 0 1 0 1
Of course, you could use Array::Transpose which will transpose your array which makes things much easier.

This is a real tricky and challenging question.. so in order to solve we need to be tricky too :) in my version I depend on script learning, every time we read a new line we check for new field possibility to be omitted and if new change detected we start over.
The check and start over process should not be repeated so often as we will have few rounds
until we get a constant number of fields to omit or zero, then we omit each row zero value at specific position.
#! /usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
open my $fh, '<', 'file.txt' or die $!;
##open temp file for output
open my $temp, '>', 'temp.txt' or die $!;
##how many field you have in you data
##you can increase this by one if you have more fields
my #fields_to_remove = (0,1,2,3,4);
my $change = $#fields_to_remove;
while (my $line = <$fh>){
if ($line =~ /1/){
my #new = split /\s+/, $line;
my $i = 0;
for (#new){
unless ($_ == 0){
#fields_to_remove = grep(!/$i/, #fields_to_remove);
}
$i++;
}
foreach my $field (#fields_to_remove){
$new[$field] = 'x';
}
my $new = join ' ', #new;
$new =~ s/(\s+)?x//g;
print $temp $new . "\n";
##if a new change detected start over
## this should repeat for limited time
## as the script keeps learning and eventually stop
if ($#fields_to_remove != $change){
$change = $#fields_to_remove;
seek $fh, 0, 0;
close $temp;
unlink 'temp.txt';
open $temp, '>', 'temp.txt';
}
} else {
##nothing -- removes 0 lines
}
}
### this is just for showing you which fields has been removed
print Dumper \#fields_to_remove;
I have tested with 9 fields 25mb data file and it worked perfectly it wasn't super fast but it didn't consume much memory as well.

My compact and large-file-compatible alternative using grep and cut. Only drawback : lengthy for large files because of the for loop.
# Remove constant lines using grep
$ grep -v "^[0 ]*$\|^[1 ]*$" $fIn > $fTmp
# Remove constant columns using cut and wc
$ nc=`cat $fTmp | head -1 | wc -w`
$ listcol=""
$ for (( i=1 ; i<=$nc ; i++ ))
$ do
$ nitem=`cut -d" " -f$i $fTmp | sort | uniq | wc -l`
$ if [ $nitem -gt 1 ]; then listcol=$listcol","$i ;fi
$ done
$ listcol2=`echo $listcol | sed 's/^,//g'`
$ cut -d" " -f$listcol2 $fTmp | sed 's/ //g' > $fOut

Checking the rows can be done this way: awk '/[^0[:blank:]]/' file
It just states if a line contains any character that is different from 0 or a <blank> character, then print the line.
If you now want to check the columns, then I suggest an adaptation of Glenn Jackman's answer
awk '
NR==1 {for (i=1; i<=NF; i++) if ($i == 0) zerocol[i]=1; next}
NR==FNR {for (idx in zerocol) if ($idx) delete zerocol[idx]; next}
/[^0[:blank:]]/ {for (i=1; i<=NF; i++) if (i in zerocol) $i=""; print}
' file file

Related

Majority Element Failing to close cycles

I'm trying to figure out why this keeps printing the "majority element" candidate in every cycle.
The code I've been trying to make work is a Majority Element search (to find an element that is repeated more than half of the length of a list).
I can't separate the processes of finding the candidate and testing against the array because my input is a text file that has an indeterminate number of arrays. It's an exercise from rosalind.info that has different inputs every time you try to solve it.
An example of the input would be
-5 5 5 5 5 5 5 5 -8 7 7 7 1 7 3 7 -7 1 6 5 10 100 1000 1 -5 1 6 7 1 1 10 1
Here's what I've written so far.
foreach my $currentrow (#lists) {
my #row = ();
#row = split( /\s/, $currentrow );
my $length = $#row;
my $count = 0;
my $i = 0;
for $i ( 0 .. $length - 1 ) {
if ( $count == 0 ) {
$candidate = $row[$i];
$count++;
}
if ( ( $count > 0 ) and ( $i = $length - 1 ) ) {
my $counter2 = 0;
for my $j ( 0 .. $length - 1 ) {
if ( $row[$j] == $candidate ) {
$counter2++;
}
}
if ( $counter2 <= ( $#row / 2 ) and ( $i = $length - 1 ) ) {
$candidate = -1;
print $candidate, " ", $i, " ";
}
if ( $counter2 > ( $#row / 2 ) and ( $i = $length - 1 ) ) {
print $candidate, " ", $i, " ";
}
}
if ( $candidate == $row[$i] and $count > 0 ) {
$count = $count + 1;
}
if ( $candidate != $row[$i] and $count > 0 ) {
$count = $count - 1;
}
}
}
Do you have use strict and use warnings 'all' in place?
I imagine that your problem may be because of the test $i = $length - 1, which is an assignment, and should be $i == $length - 1
To find a majority element I would use a hash:
perl -nae '%h=(); $h{$_}+=2 for #F; $h{$_}>#F and print for keys %h; print "\n"'
Each line of input is treated separately. Each line of output matches a line of input and presents its majority element or is empty if there is no such element.
Edit: Now the solution uses autosplit (-a), which is shorter and work not only for numbers.

Perl : how to print all the lines where second field is 0 and third field is 1

How to print all the lines where second field is 0 and third field is 1 in Perl. I am specifically looking for a Perl Solution since there are few limitations in AIX Server (shell version is very old) and hence shell script is not working properly. Below is the sample file:
237576 6 1 RMAN_Backup Default_Backup clmjk 1442149024 0000000055 1442149079 23328
237575 0 3 RMAN_Backup Default_Backup clmjk 1442148956 0000000053 1442149009 2848
237574 0 1 RMAN_Backup Default_Backup clmjk 1442148954 0000000045 1442148999 544
237573 0 1 RMAN_Backup Default_Backup clmjk 1442145436 0000000053 1442145489 23328
237572 0 3 RMAN_Backup Default_Backup clmjk 1442145352 0000000047 1442145399 544
As a one-liner:
perl -ane 'print if 0 == $F[1] && 1 == $F[2]' input-file
-n reads the input line by line
-a splits the input on whitespace to the #F array
Read file line by line and split it with whitespace, then check if second field is 0 and third field is 1, write it in an output file.
use warnings;
use strict;
open my $fh, "<", "file.txt" or die $!;
open my $fhout, ">", "outfile.txt" or die $!;
while (my $line = <$fh>)
{
chomp $line;
my #data = split (' ', $line);
if ($data[1] == 0 && $data[2] == 1)
{
print $fhout $line, "\n";
}
}
close $fh;
close $fhout;
Output (in output file 'outfile.txt'):
237574 0 1 RMAN_Backup Default_Backup clmjk 1442148954 0000000045 1442148999 544
237573 0 1 RMAN_Backup Default_Backup clmjk 1442145436 0000000053 1442145489 23328
This kind of thing is perfect to be written as a Unix filter (reading from STDIN and writing to STDOUT).
use warnings;
use strict;
while (<>) {
my #data = split;
print if $data[1] == 0 && $data[2] == 1;
}
Run it like this:
$ ./filter < input_file > output_file
In fact, it's simple enough to write it as a command line program
$ perl -ane 'print if $F[1] == 0 && $F[2] == 1' input_file > output_file

Adding columns of numbers in Perl

I have a file with columns of numbers:
1 0.0 0.0
2 0.0 0.0
3 15.2 0.0
4 7.0 9.0
5 0.0 3.0
6 1.0 0.0
7 0.0 2.5
8 0 0 0 0
I need to find the sum of numbers from row 3 to 7 of the right two columns. So for column2 i want to sum 15.2, 7.0 and 1.0. For column3 i want to sum 9.0, 3.0 and 2.5. I need to maintain the single decimal point format.
code:
While (<INPUT>){
my #a = split;
my $c2 .= $a[1];
my $c3 .= $a[2];
my $c2_string = substr($c2, 2, 5);
my $c3_string = substr($c3, 2, 5);
my #sumarray = split ('', $c2);
#then loop through each element and add them up.
This doesnt seem to work. How can i maintain separation of each number while maintaining the decimal format?
For c2, wrong Output:
1
5
.
2
7
.
0
0
.
0
etc
Desired Output:
c2=23.2
c3=14.5
my $x = my $y = 0;
while (<INPUT>) {
my #a = split;
($a[0] >=3 and $a[0] <=7) or next;
$x += $a[1];
$y += $a[2];
}
print "c2=$x\n", "c3=$y\n";
perl -lane'
($F[0] >=3 and $F[0] <=7) or next;
$x += $F[1]; $y += $F[2];
END{ print for "c2=$x","c3=$y" }
' file
my #data;
while (<INPUT>) {
push #data, [ split ];
}
my ($sum2, $sum3);
for (my $i = 2; $i < 7; $i++) {
$sum2 += $data[$i][1];
$sum3 += $data[$i][2];
}
print "$sum2, $sum3\n";
Output:
23.2, 14.5
And this one does not create an array for the entire file:
my ($sum2, $sum3);
while (<INPUT>) {
my #v = split;
if ($v[0] > 2 && $v[0] < 8) {
$sum2 += $v[1];
$sum3 += $v[2];
}
}
#!/usr/bin/perl -w
use strict;
my $infile = 'in.txt';
open my $input, '<', $infile or die "Can't open to $infile: $!";
my ($col1, $sum_col2, $sum_col3 );
while (<$input>) {
my (#cols) = split;
$col1 = $cols[0];
$sum_col2 += $cols[1] if $col1 == 3 .. 7;
$sum_col3 += $cols[2] if $col1 == 3 .. 7;
}
print "Column2: $sum_col2\n";
print "Column3: $sum_col3\n";
Output:
Column2: 23.2
Column3: 14.5

In trying to write a script to permute values in a list, I'm getting an uninitialized value warning

I'm very unsure what is happening. For various lengths of #depths, the code generally works. However, throughout the output, at certain points the program barfs up complaints about use of an uninitialized value.
The errors blame line 20: print $chars[$depths[$y]];
I apologize ahead of time if it's something obvious, but I'm definitely missing the issue. I've googled for about an hour with no luck so any nudges in the right direction would be very appreciated!
Complete code:
#! /usr/bin/perl -w
use strict;
use warnings;
#my #chars = ("0" .. "9");
#my #depths = (0) x 4;
my #chars = ("0" .. "1");
my #depths = (0) x 3;
my ($i, $x, $y);
for ($i = 0; $i < #chars ** #depths; $i++) {
for ($y = 0; $y < #depths; $y++) {
print $chars[$depths[$y]];
}
print"\n";
$depths[$#depths]++;
for($x = 0; $x < #depths; $x++) {
if($depths[$x] == #chars) {
$depths[$x-1]++;
while($x < #depths) {
$depths[$x++] = 0;
}
}
}
}
Output:
000
001
010
011
Use of uninitialized value in print at a.pl line 15.
00
100
101
110
I added some print statements to your code, and the problem becomes apparent.
#! /usr/bin/perl -w
use strict;
use warnings;
my #chars = ("0".."1");
my #depths = (0) x 3;
my $i;
my $x;
my $y;
for ($i = 0; $i < #chars**#depths; $i++)
{
printf "m = %d\n", scalar(#depths);
for ($y = 0; $y < #depths; $y++)
{
print "y:$y; d[y] = $depths[$y]\n";
print " :$chars[$depths[$y]]:\n";
}
print"\n";
printf "b[%d] ", $depths[$#depths];
$depths[$#depths]++;
printf "a[%d]\n", $depths[$#depths];
for($x = 0; $x < #depths; $x++){
if($depths[$x] == #chars){
$depths[$x-1]++;
while($x < #depths){
$depths[$x++] = 0;
}
}
}
}
The output was:
m = 3
y:0; d[y] = 0
:0:
y:1; d[y] = 0
:0:
y:2; d[y] = 0
:0:
b[0] a[1]
m = 3
y:0; d[y] = 0
:0:
y:1; d[y] = 0
:0:
y:2; d[y] = 1
:1:
b[1] a[2]
m = 3
y:0; d[y] = 0
:0:
y:1; d[y] = 1
:1:
y:2; d[y] = 0
:0:
b[0] a[1]
m = 3
y:0; d[y] = 0
:0:
y:1; d[y] = 1
:1:
y:2; d[y] = 1
:1:
b[1] a[2]
m = 3
y:0; d[y] = 0
:0:
y:1; d[y] = 2
Use of uninitialized value within #chars in concatenation (.) or string at perm.pl line 18.
::
y:2; d[y] = 0
:0:
b[0] a[1]
m = 3
y:0; d[y] = 1
:1:
y:1; d[y] = 0
:0:
y:2; d[y] = 0
:0:
b[0] a[1]
m = 3
y:0; d[y] = 1
:1:
y:1; d[y] = 0
:0:
y:2; d[y] = 1
:1:
b[1] a[2]
m = 3
y:0; d[y] = 1
:1:
y:1; d[y] = 1
:1:
y:2; d[y] = 0
:0:
b[0] a[1]
The line immediately before the error report shows that $depths[$y] is 2, but the array #chars only contains elements with indexes 0 and 1, so the warning is accurate.
Think of #depths as a base N number (where N is the number of #chars), and each element of #depths is a digit. You're trying to add one to that number, but you're failing.
A carry propagates from the least significant digit to the most significant, so you should be the loop than handles the carry should loop from the least significant digit to the most significant, but you're doing it in the opposite order. $x should start at $#depth and decrease as the loop advances.
Let's look at an example. Say there are 10 symbols (#chars == 10):
456999 <- #depth
+ 1
-------
457000
So you want to increment the right-most digit that's not equal to $#chars, and set to zero all the right-most digits that are equal to $#chars. Here's how I coded that:
#!/usr/bin/perl -w
use strict;
use warnings;
my #chars = ("0".."2");
my #depths = (0) x 3;
OUTER: while (1) {
print #chars[#depths], "\n";
my $x = $#depths;
while ($depths[$x] == $#chars) {
$depths[$x] = 0;
last OUTER if $x == 0;
--$x;
}
++$depths[$x];
}

Capturing Non-Zero Elements, Counts and Indexes of Sparse Matrix

I have the following sparse matrix A.
2 3 0 0 0
3 0 4 0 6
0 -1 -3 2 0
0 0 1 0 0
0 4 2 0 1
Then I would like to capture the following information from there:
cumulative count of entries, as matrix is scanned columnwise.
Yielding:
Ap = [ 0, 2, 5, 9, 10, 12 ];
row indices of entries, as matrix is scanned columnwise.
Yielding:
Ai = [0, 1, 0, 2, 4, 1, 2, 3, 4, 2, 1, 4 ];
Non-zero matrix entries, as matrix is scanned columnwise.
Yielding:
Ax = [2, 3, 3, -1, 4, 4, -3, 1, 2, 2, 6, 1];
Since the actual matrix A is potentially very2 large, is there any efficient way
in Perl that can capture those elements? Especially without slurping all matrix A
into RAM.
I am stuck with the following code. Which doesn't give what I want.
use strict;
use warnings;
my (#Ax, #Ai, #Ap) = ();
while (<>) {
chomp;
my #elements = split /\s+/;
my $i = 0;
my $new_line = 1;
while (defined(my $element = shift #elements)) {
$i++;
if ($element) {
push #Ax, 0 + $element;
if ($new_line) {
push #Ai, scalar #Ax;
$new_line = 0;
}
push #Ap, $i;
}
}
}
push #Ai, 1 + #Ax;
print('#Ax = [', join(" ", #Ax), "]\n");
print('#Ai = [', join(" ", #Ai), "]\n");
print('#Ap = [', join(" ", #Ap), "]\n");
A common strategy for storing sparse data is to drop the values you don't care about (the zeroes) and to store the row and column indexes with each value that you do care about, thus preserving their positional information:
[VALUE, ROW, COLUMN]
In your case, you can economize further since all of your needs can be met by processing the data column-by-column, which means we don't have to repeat COLUMN for every value.
use strict;
use warnings;
use Data::Dumper;
my ($r, $c, #dataC, #Ap, #Ai, #Ax, $cumul);
# Read data row by row, storing non-zero values by column.
# $dataC[COLUMN] = [
# [VALUE, ROW],
# [VALUE, ROW],
# etc.
# ]
$r = -1;
while (<DATA>) {
chomp;
$r ++;
$c = -1;
for my $v ( split '\s+', $_ ){
$c ++;
push #{$dataC[$c]}, [$v, $r] if $v;
}
}
# Iterate through the data column by column
# to compute the three result arrays.
$cumul = 0;
#Ap = ($cumul);
$c = -1;
for my $column (#dataC){
$c ++;
$cumul += #$column;
push #Ap, $cumul;
for my $value (#$column){
push #Ax, $value->[0];
push #Ai, $value->[1];
}
}
__DATA__
2 3 0 0 0
3 0 4 0 6
0 -1 -3 2 0
0 0 1 0 0
0 4 2 0 1
This is what you are looking for, I guess:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper::Simple;
my #matrix;
# Populate #matrix
while (<>) {
push #matrix, [ split /\s+/ ];
}
my $columns = #{ $matrix[0] };
my $rows = #matrix;
my ( #Ap, #Ai, #Ax );
my $ap = 0;
for ( my $j = 0 ; $j <= $rows ; $j++ ) {
for ( my $i = 0 ; $i <= $columns ; $i++ ) {
if ( $matrix[$i]->[$j] ) {
$ap++;
push #Ai, $i;
push #Ax, $matrix[$i]->[$j];
}
}
push #Ap, $ap;
}
print Dumper #Ap;
print Dumper #Ai;
print Dumper #Ax;
Updated based on FM's comment. If you do not want to store any of the original data:
#!/usr/bin/perl
use strict;
use warnings;
my %matrix_info;
while ( <DATA> ) {
chomp;
last unless /[0-9]/;
my #v = map {0 + $_ } split;
for (my $i = 0; $i < #v; ++$i) {
if ( $v[$i] ) {
push #{ $matrix_info{$i}->{indices} }, $. - 1;
push #{ $matrix_info{$i}->{nonzero} }, $v[$i];
}
}
}
my #cum_count = (0);
my #row_indices;
my #nonzero;
for my $i ( sort {$a <=> $b } keys %matrix_info ) {
my $mi = $matrix_info{$i};
push #nonzero, #{ $mi->{nonzero} };
my #i = #{ $mi->{indices} };
push #cum_count, $cum_count[-1] + #i;
push #row_indices, #i;
}
print(
"\#Ap = [#cum_count]\n",
"\#Ai = [#row_indices]\n",
"\#Ax = [#nonzero]\n",
);
__DATA__
2 3 0 0 0
3 0 4 0 6
0 -1 -3 2 0
0 0 1 0 0
0 4 2 0 1
Output:
C:\Temp> m
#Ap = [0 2 5 9 10 12]
#Ai = [0 1 0 2 4 1 2 3 4 2 1 4]
#Ax = [2 3 3 -1 4 4 -3 1 2 2 6 1]
Ap is easy: simply start with zeroes and increment each time you meet a nonzero number. I don't see you trying to write anything into #Ap, so it's no surprise it doesn't end up as you wish.
Ai and Ax are trickier: you want a columnwise ordering while you're scanning rowwise. You won't be able to do anything in-place since you don't know yet how many elements the columns will yield, so you can't know in advance the elements' position.
Obviously, it would be a hell lot easier if you could just alter the requirement to have a rowwise ordering instead. Failing that, you could get complex and collect (i, j, x) triplets. While collecting, they'd naturally be ordered by (i, j). Post-collection, you'd just want to sort them by (j, i).
The code you provided works on a row-by-row basis. To get results sequential by columns you have to accumulate your values into separate arrays, one for each column:
# will look like ([], [], [] ...), one [] for each column.
my #columns;
while (<MATRIX>) {
my #row = split qr'\s+';
for (my $col = 0; $col < #row; $col++) {
# push each non-zero value into its column
push #{$columns[$col]}, $row[$col] if $row[$col] > 0;
}
}
# now you only need to flatten it to get the desired kind of output:
use List::Flatten;
#non_zero = flat #columns;
See also List::Flatten.