Find values in a matrix and sum when found - matlab

I have a 1e4-by-20 matrix X whose entries take values 0:4.
I'm interested in finding (row by row) the number of values that are ~=0, that equal 1, 2 or 3, and that equal 3.
Why doesn't
eg:
X = randi([0 4],1e4,20)
for ii = 1:1e4
    onestwosorfours(ii,1) = sum(X(ii,:)==1|2|4)
end
work?
I've ended up doing
sum(X(ii,:)==1)+sum(X(ii,:)==2), etc

This expression is wrong:
sum( X(ii,:)==1|2|4 )
Because == binds tighter than |, it parses as (X(ii,:)==1) | 2 | 4, and since anything other than false or 0 counts as true, the 2 and the 4 make the whole expression true for every element. The sum is therefore just the number of columns, not the number of matches.
Instead, rewrite it as:
sum( X(ii,:)==1 | X(ii,:)==2 | X(ii,:)==4 )
Or, even better,
nnz( X(ii,:)==1 | X(ii,:)==2 | X(ii,:)==4 )
which makes it clear that you are counting the matching (nonzero) elements.

You have to repeat the X(ii,:) == value comparison for each value and take the element-wise OR of the results:
X = randi([0 4],1e4,20);
onestwosorfours = zeros(1e4,1);   % preallocate the result
for ii = 1:1e4
    onestwosorfours(ii,1) = sum( X(ii,:)==1 | X(ii,:)==2 | X(ii,:)==4 );
end

Tableau mixing aggregate and non-aggregate results error

I have a problem creating a calculated field in Tableau. I have data like so:
ID ... Status     Step1  Step2  Step3
1  ... Accepted   1      1      1
2  ... Waiting    1      0      0
3  ... Discard    0      0      0
4  ... Waiting    1      1      0
...
I would like to create a calculated column that will give me the name of the last Step, but only when status is 'Accepted'. Otherwise I want the status. The syntax is quite easy, it looks like this:
IF [Status] = 'Accepted' THEN (
    IF [Step3] = 1 THEN 'Step3' ELSEIF [STEP2] = 1 THEN 'Step2' ELSEIF [STEP1] = '1' THEN 'Step1' ELSE 'Step0' END)
ELSE [Status]
END
The problem is that the 'Status' column is a Dimension while the 'Step' fields are Measures, so they appear as AGG(Step1), AGG(Step2), ...
I guess that is the reason I get this error:
Cannot mix aggregate and non-aggregate comparisons or results in 'IF' expressions.
I am not very familiar with Tableau. Any idea how I can solve this?
Solution:
Just use the ATTR function, which turns the non-aggregate field (Status) into an aggregate one. Then it is possible to combine them and the calculation works.
IF ATTR([Status]) = 'Accepted' THEN (
    IF [Step3] = 1 THEN 'Step3' ELSEIF [STEP2] = 1 THEN 'Step2' ELSEIF [STEP1] = '1' THEN 'Step1' ELSE 'Step0' END)
ELSE ATTR([Status])
END
Tableau automatically interprets numeric values as measures. It appears, though, that in your case they are booleans (0 for false, 1 for true) and really ought to be dimensions.
Convert Step 1, Step 2, and Step 3 to dimensions. Highlight the fields, right click, and choose Convert to Dimension.

Reference to non-existent field 'd'

My .mat file contains 40,000 rows and two columns. I have to read it row by row
and then get the value of the last column in each row.
Following is my code:
for v = 1:40000
    firstRowB = data.d(v,:)
    if (firstRowB(1,2)==1)
        count1 = count1+1;
    end
    if (firstRowB(1,2)==2)
        count2 = count2+1;
    end
end
firstRowB gets the row; the code then checks whether the last column equals 1 or 2 and increases the respective count by 1.
But I keep getting this error:
Reference to non-existent field 'd'.
You could use vectorization (it is always convenient, especially in MATLAB). Taking advantage of the fact that true is one and false is zero, if you just want to count you can do:
count1 = sum( data.d(:,2) == 1 );
count2 = sum( data.d(:,2) == 2 );
In fact, in general you could define:
getNumberOfElementsInLastColEqualTo = @(numb) sum( data.d(:,end) == numb );
counts = arrayfun( getNumberOfElementsInLastColEqualTo, [1 2] );
Hope this helps.

Parallelizing sequential for-loop for GPU

I have a for-loop in which the current element of a vector depends on the previous elements, and I am trying to parallelize it for a GPU in MATLAB.
A is an nx1 known vector
B is an nx1 output vector that is initialized to zeros.
The code is as follows:
for n = 1:size(A)
    B(n+1) = B(n) + A(n)*B(n) + A(n)^k + B(n)^2;
end
I have looked at this similar question and tried to find a simple closed form for the recurrence relation, but couldn't find one.
I could do a prefix sum as mentioned in the first link over the A(n)^k term, but I was hoping there would be another method to speed up the loop.
Any advice is appreciated!
P.S. My real code involves 3D arrays that index and sum along 2D slices, but any help for the 1D case should transfer to a 3D scaling.
The word "parallelizing" sounds magical, but scheduling rules apply:
Your problem is not the effort spent on trying to convert a pure SEQ-process into its PAR-re-representation, but the cost of doing so, if you indeed insist on going PAR at any cost.
m = size(A); %{
+---+---+---+---+---+---+---+---+---+---+---+---+---+ .. +---+
const A[] := | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | | M |
+---+---+---+---+---+---+---+---+---+---+---+---+---+ .. +---+
:
\
\
\
\
\
\
\
\
\
\
\
+---+---+---+---+---+ .. + .. +---+---+---+---+---+ .. +---+
var B[] := | 0 | 0 | 0 | 0 | 0 | : | 0 | 0 | 0 | 0 | 0 | | 0 |
+---+---+---+---+---+ .. : .. +---+---+---+---+---+ .. +---+ }%
%% : ^ :
%% : | :
for n = 1:m %% : | :
B(n+1) =( %% ====:===+ : .STO NEXT n+1
%% : :
%% v :
B(n)^2 %% : { FMA B, B, .GET LAST n ( in SEQ :: OK, local data, ALWAYS )
+ B(n) %% v B } ( in PAR :: non-local data. CSP + bcast + many distributed-caches invalidates )
+ B(n) * A(n) %% { FMA B, A,
+ A(n)^k %% ApK}
);
end
Once the SEQ-process data-dependency is recurrent (here, the LAST value B(n) is needed before the NEXT value B(n+1) can be assigned), any attempt to make such a SEQ calculation work in PAR has to introduce a system-wide communication of the known values: a "new" value can only be computed after the respective "previous" B(n) has been evaluated and assigned, i.e. after the pure serial SEQ chain of recurrent evaluation has processed all of the previous cells, because the LAST piece is always needed for the NEXT step (ref. the "crossroads" in the for()-loop iterator dependency-map above). Given this, all the rest have to wait in a "queue" before they can perform their two primitive .FMA-s and .STO the result for the next one in the recurrence-indoctrinated "queue".
Yes, one can "enforce" the formula to become PAR-executed, but the very cost of communicating the LAST values "across" the PAR-execution fabric (towards the NEXT one) is typically prohibitively expensive in terms of resources and accrued delays. It either damages the SIMT-optimised scheduler's latency-masking, or it blocks all the threads until they receive the "neighbour"-assigned LAST value that they rely on and cannot proceed without. Either way, this effectively devastates any potential benefit from all the efforts invested into going PAR.
Even just a pair of FMA-s is not enough code to justify the add-on costs: it is indeed an extremely small amount of work for all the PAR efforts.
Unless some very mathematically "dense" processing is in place, the additional costs are not easily amortised, and such an attempt to introduce a PAR-mode of computing exhibits nothing but a negative (adverse) effect, instead of any wished-for speedup. In all professional cases, one ought to express all the add-on costs during the Proof-of-Concept phase (a PoC), before deciding whether any feasible PAR-approach is possible at all, and how to achieve a speedup of >> 1.0 x.
Relying on just the advertised theoretical GFLOPS and TFLOPS figures is nonsense. Your actual GPU-kernel will never be able to repeat the advertised tests' performance (unless you run exactly the same optimised layout and code, which one does not need, does one?). One typically needs to compute one's own specific algorithmisation, related to one's own problem domain, without artificially aligning all the toy-problem elements so that the GPU-silicon will not have to wait for real data and can enjoy some tweaked cache/register-based ILP-artifacts, which are practically not achievable in most real-world problem solutions. If there is one step to recommend: always evaluate an overhead-fair PoC first, to see whether there is any chance for a speedup at all, before diving resources and investing time and money into prototyping a detailed design & testing.
Recurrent and "weak" GPU kernel-payloads will in almost every case struggle just to cover their additional overhead times (bidirectional data-transfer related (H2D + D2H) plus kernel-code related loads).
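For what it's worth, a minimal overhead-fair CPU baseline for such a PoC could look like the sketch below (in Python/NumPy rather than MATLAB, with an assumed exponent k and problem size, since neither is given in the question). Any GPU variant first has to beat this time after paying its H2D + D2H transfer and kernel-launch costs:
import time
import numpy as np

def recurrence_cpu(A, k):
    # Sequential evaluation of B(n+1) = B(n) + A(n)*B(n) + A(n)^k + B(n)^2.
    # Every step needs the value produced by the previous step, which is
    # exactly the recurrent dependency discussed above.
    B = np.zeros(A.size + 1)
    for n in range(A.size):
        B[n + 1] = B[n] + A[n] * B[n] + A[n] ** k + B[n] ** 2
    return B

A = 1e-6 * np.random.rand(10_000)   # assumed size; small values keep B(n)^2 from overflowing
t0 = time.perf_counter()
B = recurrence_cpu(A, k=2)          # assumed k
print(f"serial CPU baseline: {time.perf_counter() - t0:.4f} s")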

Simplify Boolean Function with don't care

Can you help me with this problem:
"Simplify the Boolean Function together with the don't care condition d in sum of the products and product of sum.
F(x,y,z) = ∑(0,1,2,4,5)
d(x, y, z) = ∑(3,6,7)"
I tried to solve it but I came up with 1 and 0.
I would use a Karnaugh map for this problem. The order of the minterms would be (in the top row) 0, 2, 6, 4 and (in the bottom row) 1, 3, 7, 5. The map evaluates to 1, since the don't cares can take whatever value (1 or 0):
| 1 | 1 | d | 1 |
| 1 | d | d | 1 |
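A quick brute-force check of that result (a small Python sketch, not part of the original answer; the minterm index is taken as 4x + 2y + z):
from itertools import product

minterms   = {0, 1, 2, 4, 5}   # F(x,y,z) = sum(0,1,2,4,5)
dont_cares = {3, 6, 7}         # d(x,y,z) = sum(3,6,7)

# Every one of the 8 input combinations is either a required minterm or a
# don't care, so choosing 1 for every don't care makes F identically 1.
for x, y, z in product((0, 1), repeat=3):
    index = 4 * x + 2 * y + z
    assert index in minterms or index in dont_cares
print("F simplifies to the constant 1, in both SOP and POS form")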

Randomize matrix in perl, keeping row and column totals the same

I have a matrix that I want to randomize a couple of thousand times, while keeping the row and column totals the same:
    1  2  3
A   0  0  1
B   1  1  0
C   1  0  0
An example of a valid random matrix would be:
    1  2  3
A   1  0  0
B   1  1  0
C   0  0  1
My actual matrix is a lot bigger (about 600x600 items), so I really need an approach that is computationally efficient.
My initial (inefficient) approach consisted of shuffling arrays using the Perl Cookbook shuffle
I pasted my current code below. I've got extra code in place to start with a new shuffled list of numbers, if no solution is found in the while loop. The algorithm works fine for a small matrix, but as soon as I start scaling up it takes forever to find a random matrix that fits the requirements.
Is there a more efficient way to accomplish what I'm searching for?
Thanks a lot!
#!/usr/bin/perl -w
use strict;

my %matrix = ( 'A' => { '3' => 1 },
               'B' => { '1' => 1,
                        '2' => 1 },
               'C' => { '1' => 1 }
             );

my @letters = ();
my @numbers = ();

foreach my $letter (keys %matrix){
    foreach my $number (keys %{$matrix{$letter}}){
        push (@letters, $letter);
        push (@numbers, $number);
    }
}

my %random_matrix = ();
&shuffle(\@numbers);

foreach my $letter (@letters){
    while (exists($random_matrix{$letter}{$numbers[0]})){
        &shuffle (\@numbers);
    }
    my $chosen_number = shift (@numbers);
    $random_matrix{$letter}{$chosen_number} = 1;
}

sub shuffle {
    my $array = shift;
    my $i = scalar(@$array);
    my $j;
    foreach my $item (@$array)
    {
        --$i;
        $j = int rand ($i+1);
        next if $i == $j;
        @$array[$i,$j] = @$array[$j,$i];
    }
    return @$array;
}
The problem with your current algorithm is that you are trying to shuffle your way out of dead ends -- specifically, when your @letters and @numbers arrays (after the initial shuffle of @numbers) yield the same cell more than once. That approach works when the matrix is small, because it doesn't take too many tries to find a viable re-shuffle. However, it's a killer when the lists are big. Even if you could hunt for alternatives more efficiently -- for example, trying permutations rather than random shuffling -- the approach is probably doomed.
Rather than shuffling entire lists, you might tackle the problem by making small modifications to an existing matrix.
For example, let's start with your example matrix (call it M1). Randomly pick one cell to change (say, A1). At this point the matrix is in an illegal state. Our goal will be to fix it in the minimum number of edits -- specifically 3 more edits. You implement these 3 additional edits by "walking" around the matrix, with each repair of a row or column yielding another problem to be solved, until you have walked full circle (err ... full rectangle).
For example, after changing A1 from 0 to 1, there are 3 ways to walk for the next repair: A3, B1, and C1. Let's decide that the 1st edit should fix rows. So we pick A3. On the second edit, we will fix the column, so we have choices: B3 or C3 (say, C3). The final repair offers only one choice (C1), because we need to return to the column of our original edit. The end result is a new, valid matrix.
        Orig         Change A1    Change A3    Change C3    Change C1
        M1                                                  M2

        1 2 3        1 2 3        1 2 3        1 2 3        1 2 3
        -----        -----        -----        -----        -----
   A |  0 0 1        1 0 1        1 0 0        1 0 0        1 0 0
   B |  1 1 0        1 1 0        1 1 0        1 1 0        1 1 0
   C |  1 0 0        1 0 0        1 0 0        1 0 1        0 0 1
If an editing path leads to a dead end, you backtrack. If all of the repair paths fail, the initial edit can be rejected.
This approach will generate new, valid matrixes quickly. It will not necessarily produce random outcomes: M1 and M2 will still be highly correlated with each other, a point that will become more directly evident as the size of the matrix grows.
How do you increase the randomness? You mentioned that most cells (99% or more) are zeros. One idea would be to proceed like this: for each 1 in the matrix, set its value to 0 and then repair the matrix using the 4-edit method outlined above. In effect, you would be moving all of the ones to new, random locations.
Here is an illustration. There are probably further speed optimizations in here, but this approach yielded 10 new 600x600 matrixes, at 0.5% density, in 30 seconds or so on my Windows box. Don't know if that's fast enough.
use strict;
use warnings;

# Args: N rows, N columns, density, N iterations.
main(@ARGV);

sub main {
    my $n_iter = pop;
    my $matrix = init_matrix(@_);
    print_matrix($matrix);
    for my $n (1 .. $n_iter){
        warn $n, "\n"; # Show progress.
        edit_matrix($matrix);
        print_matrix($matrix);
    }
}

sub init_matrix {
    # Generate initial matrix, given N of rows, N of cols, and density.
    my ($rows, $cols, $density) = @_;
    my @matrix;
    for my $r (1 .. $rows){
        push @matrix, [ map { rand() < $density ? 1 : 0 } 1 .. $cols ];
    }
    return \@matrix;
}

sub print_matrix {
    # Dump out a matrix for checking.
    my $matrix = shift;
    print "\n";
    for my $row (@$matrix){
        my @vals = map { $_ ? 1 : '' } @$row;
        print join("\t", @vals), "\n";
    }
}

sub edit_matrix {
    # Takes a matrix and moves all of the non-empty cells somewhere else.
    my $matrix = shift;
    my $move_these = cells_to_move($matrix);
    for my $cell (@$move_these){
        my ($i, $j) = @$cell;
        # Move the cell, provided that the cell hasn't been moved
        # already and the subsequent edits don't lead to a dead end.
        $matrix->[$i][$j] = 0
            if $matrix->[$i][$j]
            and other_edits($matrix, $cell, 0, $j);
    }
}

sub cells_to_move {
    # Returns a list of non-empty cells.
    my $matrix = shift;
    my $i = -1;
    my @cells = ();
    for my $row (@$matrix){
        $i ++;
        for my $j (0 .. @$row - 1){
            push @cells, [$i, $j] if $matrix->[$i][$j];
        }
    }
    return \@cells;
}

sub other_edits {
    my ($matrix, $cell, $step, $last_j) = @_;

    # We have succeeded if we've already made 3 edits.
    $step ++;
    return 1 if $step > 3;

    # Determine the roster of next edits to fix the row or
    # column total upset by our prior edit.
    my ($i, $j) = @$cell;
    my @fixes;
    if ($step == 1){
        @fixes =
            map  { [$i, $_] }
            grep { $_ != $j and not $matrix->[$i][$_] }
            0 .. @{$matrix->[0]} - 1
        ;
        shuffle(\@fixes);
    }
    elsif ($step == 2) {
        @fixes =
            map  { [$_, $j] }
            grep { $_ != $i and $matrix->[$_][$j] }
            0 .. @$matrix - 1
        ;
        shuffle(\@fixes);
    }
    else {
        # On the last edit, the column of the fix must be
        # the same as the column of the initial edit.
        @fixes = ([$i, $last_j]) unless $matrix->[$i][$last_j];
    }

    for my $f (@fixes){
        # If all subsequent fixes succeed, we are golden: make
        # the current fix and return true.
        if ( other_edits($matrix, [@$f], $step, $last_j) ){
            $matrix->[$f->[0]][$f->[1]] = $step == 2 ? 0 : 1;
            return 1;
        }
    }

    # Failure if we get here.
    return;
}

sub shuffle {
    my $array = shift;
    my $i = scalar(@$array);
    my $j;
    for (@$array){
        $i --;
        $j = int rand($i + 1);
        @$array[$i, $j] = @$array[$j, $i] unless $i == $j;
    }
}
Step 1: First I would initialize the matrix to zeros and calculate the required row and column totals.
Step 2: Now pick a random row, weighted by the count of 1s that must be in that row (so a row with count 300 is more likely to be picked than a row with count 5).
Step 3: For this row, pick a random column, weighted by the count of 1s in that column (except ignore any cells that already contain a 1 - more on this later).
Step 4: Place a 1 in this cell and reduce both the remaining row and column counts for the appropriate row and column.
Step 5: Go back to Step 2 until no rows have a non-zero count.
The problem, though, is that this algorithm can fail to terminate: you may have a row where you need to place a one, and a column that needs a one, but you've already placed a one in that cell, so you get 'stuck'. I'm not sure how likely this is to happen, but I wouldn't be surprised if it happened very frequently - enough to make the algorithm unusable. If this is a problem, I can think of two ways to fix it:
a) Construct the above algorithm recursively and allow backtracking on failure.
b) Allow a cell to contain a value greater than 1 if there is no other option and keep going. Then at the end you have a correct row and column count but some cells may contain numbers greater than 1. You can fix this by finding a grouping that looks like this:
2 . . . . 0
. . . . . .
. . . . . .
0 . . . . 1
and changing it to:
1 . . . . 1
. . . . . .
. . . . . .
1 . . . . 0
It should be easy to find such a grouping if you have many zeros. I think b) is likely to be faster.
I'm not sure it's the best way, but it's probably faster than shuffling arrays. I'll be tracking this question to see what other people come up with.
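For what it's worth, here is a rough Python sketch of Steps 1 to 5 above (the function name and sizes are hypothetical, and it simply restarts on a dead end instead of implementing the backtracking of option a) or the repair trick of option b)):
import random

def random_binary_matrix(row_totals, col_totals, max_restarts=1000):
    n, m = len(row_totals), len(col_totals)
    for _ in range(max_restarts):
        rows, cols = list(row_totals), list(col_totals)   # remaining 1s per row / column
        matrix = [[0] * m for _ in range(n)]
        stuck = False
        while not stuck and any(rows):
            # Step 2: pick a row, weighted by its remaining count of 1s.
            i = random.choices(range(n), weights=rows)[0]
            # Step 3: pick a column, weighted by its remaining count,
            # ignoring cells that already contain a 1.
            weights = [c if matrix[i][j] == 0 else 0 for j, c in enumerate(cols)]
            if sum(weights) == 0:
                stuck = True          # dead end: restart from scratch
                break
            j = random.choices(range(m), weights=weights)[0]
            # Step 4: place the 1 and reduce both remaining counts.
            matrix[i][j] = 1
            rows[i] -= 1
            cols[j] -= 1
        if not stuck:
            return matrix             # Step 5: done once every row count reaches zero
    return None                       # gave up after max_restarts attempts

# The totals of the example matrix in the question (rows A..C, columns 1..3):
print(random_binary_matrix([1, 2, 1], [2, 1, 1]))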
I'm not a mathematician, but I figure that if you need to keep the same column and row totals, then random versions of the matrix will have the same quantity of ones and zeros.
Correct me if I'm wrong, but that would mean that making subsequent versions of the matrix would only require you to shuffle around the rows and columns.
Randomly shuffling columns won't change your totals for rows and columns, and randomly shuffling rows won't either. So, what I would do, is first shuffle rows, and then shuffle columns.
That should be pretty fast.
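A minimal NumPy sketch of that idea (variable names are mine; note that each total simply travels with its row or column when you shuffle):
import numpy as np

rng = np.random.default_rng()
M = np.array([[0, 0, 1],
              [1, 1, 0],
              [1, 0, 0]])

shuffled = M[rng.permutation(M.shape[0]), :]          # shuffle the rows
shuffled = shuffled[:, rng.permutation(M.shape[1])]   # then shuffle the columns

print(sorted(M.sum(axis=1).tolist()), sorted(shuffled.sum(axis=1).tolist()))  # row totals, reordered
print(sorted(M.sum(axis=0).tolist()), sorted(shuffled.sum(axis=0).tolist()))  # column totals, reordered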
Not sure if it will help, but you could try going from one corner and, for each column and row, keeping track of the required total and the actual sum so far. Instead of trying to hit a good matrix directly, treat the total as an amount to be split. For each element, take the smaller of (row total - actual row sum) and (column total - actual column sum); that gives you the upper bound for your random number.
Is it clear? Sorry, I don't know Perl, so I cannot show any code.
Like @Gabriel, I'm not a Perl programmer, so it's possible that this is what your code already does ...
You've only posted one example. It's not clear whether you want a random matrix which has the same number of 1s in each row and column as your start matrix, or one which has the same rows and columns but shuffled. If the latter is good enough, you could create an array of row (or column, it doesn't matter) indexes and randomly permute that. You can then read your original array in the order specified by the randomised index. There is no need to modify the original array or create a copy.
Of course, this might not meet aspects of your requirements which are not explicit.
Thanks to FMc for the Perl code. Based on that solution, I rewrote it in Python (for my own use, and I share it here for clarity), as shown below:
import random
import numpy

matrix = numpy.array(
    [[0, 0, 1],
     [1, 1, 0],
     [1, 0, 0]]
)

def shuffle(array):
    i = len(array)
    j = 0
    for _ in (array):
        i -= 1
        j = random.randrange(0, i+1)  # int rand($i + 1);
        #print('array:', array)
        #print(f'len(array)={len(array)}, (i, j)=({i}, {j})')
        if i != j:
            tmp = array[i]
            array[i] = array[j]
            array[j] = tmp
    return array

def other_edits(matrix, cell, step, last_j):
    # We have succeeded if we've already made 3 edits.
    step += 1
    if step > 3:
        return True

    # Determine the roster of next edits to fix the row or
    # column total upset by our prior edit.
    (i, j) = cell
    fixes = []
    if (step == 1):
        fixes = [[i, x] for x in range(len(matrix[0])) if x != j and not matrix[i][x]]
        fixes = shuffle(fixes)
    elif (step == 2):
        fixes = [[x, j] for x in range(len(matrix)) if x != i and matrix[x][j]]
        fixes = shuffle(fixes)
    else:
        # On the last edit, the column of the fix must be
        # the same as the column of the initial edit.
        if not matrix[i][last_j]:
            fixes = [[i, last_j]]

    for f in (fixes):
        # If all subsequent fixes succeed, we are golden: make
        # the current fix and return true.
        if (other_edits(matrix, f, step, last_j)):
            matrix[f[0]][f[1]] = 0 if step == 2 else 1
            return True

    # Failure if we get here.
    return False

def cells_to_move(matrix):
    # Returns a list of non-empty cells.
    i = -1
    cells = []
    for row in matrix:
        i += 1
        for j in range(len(row)):
            if matrix[i][j]:
                cells.append([i, j])
    return cells

def edit_matrix(matrix):
    # Takes a matrix and moves all of the non-empty cells somewhere else.
    move_these = cells_to_move(matrix)
    for cell in move_these:
        (i, j) = cell
        # Move the cell, provided that the cell hasn't been moved
        # already and the subsequent edits don't lead to a dead end.
        if matrix[i][j] and other_edits(matrix, cell, 0, j):
            matrix[i][j] = 0
    return matrix

def Shuffle_Matrix(matrix, N, M, n_iter):
    for n in range(n_iter):
        print(f'iteration: {n+1}')  # Show progress.
        matrix = edit_matrix(matrix)
        #print('matrix:\n', matrix)
    return matrix

print(matrix.shape[0], matrix.shape[1])

# Args: N rows, N columns, N iterations.
matrix2 = Shuffle_Matrix(matrix, matrix.shape[0], matrix.shape[1], 1)
print("The resulting matrix:\n", matrix2)