Matlab - very inefficient loop - matlab

I have two big matrices. Matrix A which is [4144514 x 3] and matrix B which is [51962 x 17].
The first three columns of A and B have the identifiers. On matrix B the 3 columns make a unique identifier, but this might be repeated on A.
I want to merge the two matrices, in such a away that the resulting matrix A, which should be [4144514 x 20], i.e. I am merging the 17 columns of B with matrix A given the criteria on the first three columns of each matrix.
This is the loop I am doing:
for i=1:size(B,1)
aux = sum(A(:,1)==B(i,1) & A(:,2)==B(i,2) & A(:,3) == B(i,3));
A(A(:,1)==B(i,1) & A(:,2)==B(i,2) & A(:,3) == B(i,3),4:20) = repmat(B(i,:),aux,1);
end
The variable aux tells me how many lines in A match the criteria from B.
The second line of the loop, creates a matrix with the columns of B repeated for aux number of times, and puts them on the correct place in A. However this is extremely inefficient.
Let me give you a toy example below with smaller matrices. I will have A to be [108 x 3] and B to be [27 x 17].
What I am looking to do is the following:
A = [100, 1, 2000 ;100, 1, 2000 ;100, 1, 2000 ;100, 1, 2000 ;100, 2, 2000 ;100, 2, 2000 ;100, 2, 2000 ;100, 2, 2000 ;100, 3, 2000 ;100, 3, 2000 ;100, 3, 2000 ;100, 3, 2000 ;100, 1, 2001 ;100, 1, 2001 ;100, 1, 2001 ;100, 1, 2001 ;100, 2, 2001 ;100, 2, 2001 ;100, 2, 2001 ;100, 2, 2001 ;100, 3, 2001 ;100, 3, 2001 ;100, 3, 2001 ;100, 3, 2001 ;100, 1, 2002 ;100, 1, 2002 ;100, 1, 2002 ;100, 1, 2002 ;100, 2, 2002 ;100, 2, 2002 ;100, 2, 2002 ;100, 2, 2002 ;100, 3, 2002 ;100, 3, 2002 ;100, 3, 2002 ;100, 3, 2002 ;101, 1, 2000 ;101, 1, 2000 ;101, 1, 2000 ;101, 1, 2000 ;101, 2, 2000 ;101, 2, 2000 ;101, 2, 2000 ;101, 2, 2000 ;101, 3, 2000 ;101, 3, 2000 ;101, 3, 2000 ;101, 3, 2000 ;101, 1, 2001 ;101, 1, 2001 ;101, 1, 2001 ;101, 1, 2001 ;101, 2, 2001 ;101, 2, 2001 ;101, 2, 2001 ;101, 2, 2001 ;101, 3, 2001 ;101, 3, 2001 ;101, 3, 2001 ;101, 3, 2001 ;101, 1, 2002 ;101, 1, 2002 ;101, 1, 2002 ;101, 1, 2002 ;101, 2, 2002 ;101, 2, 2002 ;101, 2, 2002 ;101, 2, 2002 ;101, 3, 2002 ;101, 3, 2002 ;101, 3, 2002 ;101, 3, 2002 ;103, 1, 2000 ;103, 1, 2000 ;103, 1, 2000 ;103, 1, 2000 ;103, 2, 2000 ;103, 2, 2000 ;103, 2, 2000 ;103, 2, 2000 ;103, 3, 2000 ;103, 3, 2000 ;103, 3, 2000 ;103, 3, 2000 ;103, 1, 2001 ;103, 1, 2001 ;103, 1, 2001 ;103, 1, 2001 ;103, 2, 2001 ;103, 2, 2001 ;103, 2, 2001 ;103, 2, 2001 ;103, 3, 2001 ;103, 3, 2001 ;103, 3, 2001 ;103, 3, 2001 ;103, 1, 2002 ;103, 1, 2002 ;103, 1, 2002 ;103, 1, 2002 ;103, 2, 2002 ;103, 2, 2002 ;103, 2, 2002 ;103, 2, 2002 ;103, 3, 2002 ;103, 3, 2002 ;103, 3, 2002 ;103, 3, 2002];
B = [100, 1, 2000, 8, 7, 9, 10, 1, 2, 9, 2, 1, 3, 3, 3, 9, 7; 100, 2, 2000, 8, 2, 7, 2, 7, 5, 5, 9, 2, 7, 1, 2, 6, 4; 100, 3, 2000, 8, 8, 7, 3, 2, 8, 1, 10, 9, 8, 6, 1, 5, 7; 100, 1, 2001, 10, 10, 1, 7, 2, 5, 5, 8, 6, 5, 3, 6, 6, 4; 100, 2, 2001, 6, 7, 3, 1, 5, 3, 9, 9, 3, 8, 1, 6, 4, 4; 100, 3, 2001, 1, 5, 7, 1, 5, 4, 2, 10, 5, 4, 6, 5, 1, 10; 100, 1, 2002, 7, 4, 6, 4, 7, 8, 3, 7, 7, 8, 2, 1, 6, 2; 100, 2, 2002, 8, 7, 8, 10, 2, 10, 8, 4, 7, 5, 10, 4, 4, 2; 100, 3, 2002, 4, 9, 4, 10, 2, 4, 2, 1, 4, 10, 9, 2, 6, 9; 101, 1, 2000, 5, 9, 5, 3, 10, 1, 4, 2, 10, 2, 6, 8, 5, 4; 101, 2, 2000, 6, 1, 8, 10, 10, 7, 4, 6, 5, 2, 8, 3, 2, 1; 101, 3, 2000, 7, 7, 5, 6, 2, 8, 8, 8, 1, 6, 1, 1, 9, 7; 101, 1, 2001, 8, 4, 5, 5, 8, 7, 2, 2, 9, 8, 1, 4, 2, 1; 101, 2, 2001, 3, 7, 10, 4, 9, 9, 1, 1, 10, 7, 6, 5, 10, 9; 101, 3, 2001, 10, 8, 2, 4, 6, 1, 6, 4, 8, 10, 7, 9, 4, 7; 101, 1, 2002, 6, 9, 3, 2, 10, 8, 5, 2, 6, 9, 1, 3, 6, 6; 101, 2, 2002, 8, 3, 8, 4, 4, 3, 4, 2, 4, 7, 2, 10, 3, 3; 101, 3, 2002, 10, 2, 10, 10, 6, 5, 3, 5, 10, 1, 3, 4, 8, 5; 103, 1, 2000, 8, 3, 9, 9, 4, 3, 3, 9, 7, 7, 6, 5, 2, 6; 103, 2, 2000, 6, 7, 5, 5, 7, 10, 5, 3, 5, 4, 7, 8, 9, 7; 103, 3, 2000, 3, 4, 9, 10, 3, 10, 7, 2, 10, 3, 3, 3, 6, 6; 103, 1, 2001, 7, 7, 1, 7, 10, 7, 10, 7, 9, 8, 4, 7, 6, 2; 103, 2, 2001, 5, 5, 4, 3, 7, 7, 6, 5, 2, 5, 5, 6, 6, 5; 103, 3, 2001, 6, 3, 9, 9, 2, 10, 10, 10, 10, 7, 10, 9, 9, 8; 103, 1, 2002, 5, 10, 2, 8, 6, 5, 7, 6, 4, 3, 6, 8, 7, 4; 103, 2, 2002, 10, 7, 6, 3, 10, 4, 5, 5, 1, 3, 1, 9, 1, 5; 103, 3, 2002, 2, 1, 5, 5, 2, 8, 6, 2, 6, 6, 10, 1, 4, 9];
for i=1:size(B,1)
aux = sum(A(:,1)==B(i,1) & A(:,2)==B(i,2) & A(:,3) == B(i,3));
A(A(:,1)==B(i,1) & A(:,2)==B(i,2) & A(:,3) == B(i,3),4:20) = repmat(B(i,:),aux,1);
end
With this smaller example the code runs very fast. But as soon as the matrices get as big as the ones I have, it takes ages.
Is there any faster way to do this?

A very simple task to vectorize, ismember does all the work.
%your example
A = [100, 1, 2000 ;100, 1, 2000 ;100, 1, 2000 ;100, 1, 2000 ;100, 2, 2000 ;100, 2, 2000 ;100, 2, 2000 ;100, 2, 2000 ;100, 3, 2000 ;100, 3, 2000 ;100, 3, 2000 ;100, 3, 2000 ;100, 1, 2001 ;100, 1, 2001 ;100, 1, 2001 ;100, 1, 2001 ;100, 2, 2001 ;100, 2, 2001 ;100, 2, 2001 ;100, 2, 2001 ;100, 3, 2001 ;100, 3, 2001 ;100, 3, 2001 ;100, 3, 2001 ;100, 1, 2002 ;100, 1, 2002 ;100, 1, 2002 ;100, 1, 2002 ;100, 2, 2002 ;100, 2, 2002 ;100, 2, 2002 ;100, 2, 2002 ;100, 3, 2002 ;100, 3, 2002 ;100, 3, 2002 ;100, 3, 2002 ;101, 1, 2000 ;101, 1, 2000 ;101, 1, 2000 ;101, 1, 2000 ;101, 2, 2000 ;101, 2, 2000 ;101, 2, 2000 ;101, 2, 2000 ;101, 3, 2000 ;101, 3, 2000 ;101, 3, 2000 ;101, 3, 2000 ;101, 1, 2001 ;101, 1, 2001 ;101, 1, 2001 ;101, 1, 2001 ;101, 2, 2001 ;101, 2, 2001 ;101, 2, 2001 ;101, 2, 2001 ;101, 3, 2001 ;101, 3, 2001 ;101, 3, 2001 ;101, 3, 2001 ;101, 1, 2002 ;101, 1, 2002 ;101, 1, 2002 ;101, 1, 2002 ;101, 2, 2002 ;101, 2, 2002 ;101, 2, 2002 ;101, 2, 2002 ;101, 3, 2002 ;101, 3, 2002 ;101, 3, 2002 ;101, 3, 2002 ;103, 1, 2000 ;103, 1, 2000 ;103, 1, 2000 ;103, 1, 2000 ;103, 2, 2000 ;103, 2, 2000 ;103, 2, 2000 ;103, 2, 2000 ;103, 3, 2000 ;103, 3, 2000 ;103, 3, 2000 ;103, 3, 2000 ;103, 1, 2001 ;103, 1, 2001 ;103, 1, 2001 ;103, 1, 2001 ;103, 2, 2001 ;103, 2, 2001 ;103, 2, 2001 ;103, 2, 2001 ;103, 3, 2001 ;103, 3, 2001 ;103, 3, 2001 ;103, 3, 2001 ;103, 1, 2002 ;103, 1, 2002 ;103, 1, 2002 ;103, 1, 2002 ;103, 2, 2002 ;103, 2, 2002 ;103, 2, 2002 ;103, 2, 2002 ;103, 3, 2002 ;103, 3, 2002 ;103, 3, 2002 ;103, 3, 2002];
B = [100, 1, 2000, 8, 7, 9, 10, 1, 2, 9, 2, 1, 3, 3, 3, 9, 7; 100, 2, 2000, 8, 2, 7, 2, 7, 5, 5, 9, 2, 7, 1, 2, 6, 4; 100, 3, 2000, 8, 8, 7, 3, 2, 8, 1, 10, 9, 8, 6, 1, 5, 7; 100, 1, 2001, 10, 10, 1, 7, 2, 5, 5, 8, 6, 5, 3, 6, 6, 4; 100, 2, 2001, 6, 7, 3, 1, 5, 3, 9, 9, 3, 8, 1, 6, 4, 4; 100, 3, 2001, 1, 5, 7, 1, 5, 4, 2, 10, 5, 4, 6, 5, 1, 10; 100, 1, 2002, 7, 4, 6, 4, 7, 8, 3, 7, 7, 8, 2, 1, 6, 2; 100, 2, 2002, 8, 7, 8, 10, 2, 10, 8, 4, 7, 5, 10, 4, 4, 2; 100, 3, 2002, 4, 9, 4, 10, 2, 4, 2, 1, 4, 10, 9, 2, 6, 9; 101, 1, 2000, 5, 9, 5, 3, 10, 1, 4, 2, 10, 2, 6, 8, 5, 4; 101, 2, 2000, 6, 1, 8, 10, 10, 7, 4, 6, 5, 2, 8, 3, 2, 1; 101, 3, 2000, 7, 7, 5, 6, 2, 8, 8, 8, 1, 6, 1, 1, 9, 7; 101, 1, 2001, 8, 4, 5, 5, 8, 7, 2, 2, 9, 8, 1, 4, 2, 1; 101, 2, 2001, 3, 7, 10, 4, 9, 9, 1, 1, 10, 7, 6, 5, 10, 9; 101, 3, 2001, 10, 8, 2, 4, 6, 1, 6, 4, 8, 10, 7, 9, 4, 7; 101, 1, 2002, 6, 9, 3, 2, 10, 8, 5, 2, 6, 9, 1, 3, 6, 6; 101, 2, 2002, 8, 3, 8, 4, 4, 3, 4, 2, 4, 7, 2, 10, 3, 3; 101, 3, 2002, 10, 2, 10, 10, 6, 5, 3, 5, 10, 1, 3, 4, 8, 5; 103, 1, 2000, 8, 3, 9, 9, 4, 3, 3, 9, 7, 7, 6, 5, 2, 6; 103, 2, 2000, 6, 7, 5, 5, 7, 10, 5, 3, 5, 4, 7, 8, 9, 7; 103, 3, 2000, 3, 4, 9, 10, 3, 10, 7, 2, 10, 3, 3, 3, 6, 6; 103, 1, 2001, 7, 7, 1, 7, 10, 7, 10, 7, 9, 8, 4, 7, 6, 2; 103, 2, 2001, 5, 5, 4, 3, 7, 7, 6, 5, 2, 5, 5, 6, 6, 5; 103, 3, 2001, 6, 3, 9, 9, 2, 10, 10, 10, 10, 7, 10, 9, 9, 8; 103, 1, 2002, 5, 10, 2, 8, 6, 5, 7, 6, 4, 3, 6, 8, 7, 4; 103, 2, 2002, 10, 7, 6, 3, 10, 4, 5, 5, 1, 3, 1, 9, 1, 5; 103, 3, 2002, 2, 1, 5, 5, 2, 8, 6, 2, 6, 6, 10, 1, 4, 9];
%reference code, changed output to C to preserve input
C=A;
for i=1:size(B,1)
aux = sum(A(:,1)==B(i,1) & A(:,2)==B(i,2) & A(:,3) == B(i,3));
C(A(:,1)==B(i,1) & A(:,2)==B(i,2) & A(:,3) == B(i,3),4:20) = repmat(B(i,:),aux,1);
end
%vectorized version
[~,lia]=ismember(A(:,1:3),B(:,1:3),'rows');
C2=nan(numel(lia),size(B,2)+3);
C2(lia>0,:)=B(lia(lia>0),[1,2,3,1:end]);
toc;
tic;
%simplified vectorized version, assuming there is no need to duplicate the first three rows:
[~,lia]=ismember(A(:,1:3),B(:,1:3),'rows');
C3=nan(numel(lia),size(B,2));
C3(lia>0,:)=B(lia(lia>0),:);
toc;
Comparing the performance with some example data close to your real data (same data was to large):
n=100000 %4144514
m=10000 %51962
B=rand(10000,17);
B=unique(B,'rows');
A=B(randi([1 size(B,1)],100000,1),1:3);
Decreased the execution time from 14s to less 0.05s

Related

Seq sortWith function with strange behaviour

I was trying to sort elements of a Seq object with the sortWith function when I got an exception. I didn't use the sorted function because the code below is a simplification of the real code where the seq has tuples instead of ints.
See below that in the last two cases, when comparing with (v1 <= v2) an exception is thrown, but when comparing with (v1 < v2) no exception is thrown.
heitor#heitor-340XAA-350XAA-550XAA:~$ sbt console
[info] welcome to sbt 1.6.2 (Ubuntu Java 11.0.11)
[info] loading settings for project global-plugins from sbt-updates.sbt ...
[info] loading global plugins from /home/heitor/.sbt/1.0/plugins
[info] loading project definition from /home/heitor/project
[info] loading settings for project root from build.sbt ...
[info] set current project to example (in build file:/home/heitor/)
[info] Starting scala interpreter...
Welcome to Scala 2.13.8 (OpenJDK 64-Bit Server VM, Java 11.0.11).
Type in expressions for evaluation. Or try :help.
scala> val lst69 = List(1, 10, 4, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 2, 1, 1, 1, 1, 1, 3, 1, 4, 10, 1, 1, 3, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1)
val lst69: List[Int] = List(1, 10, 4, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 2, 1, 1, 1, 1, 1, 3, 1, 4, 10, 1, 1, 3, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1)
scala> lst69.size
val res0: Int = 69
scala> val lst68 = lst69.take(68)
val lst68: List[Int] = List(1, 10, 4, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 2, 1, 1, 1, 1, 1, 3, 1, 4, 10, 1, 1, 3, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1)
scala> lst68.size
val res1: Int = 68
scala> lst68.sorted
val res2: List[Int] = List(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 10, 10)
scala> lst68.sortWith{ case (v1,v2) => (v1 <= v2) }
val res3: List[Int] = List(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 10, 10)
scala> lst69.sorted
val res4: List[Int] = List(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 10, 10)
scala> lst69.sortWith{ case (v1,v2) => (v1 <= v2) }
java.lang.IllegalArgumentException: Comparison method violates its general contract!
at java.base/java.util.TimSort.mergeHi(TimSort.java:903)
at java.base/java.util.TimSort.mergeAt(TimSort.java:520)
at java.base/java.util.TimSort.mergeForceCollapse(TimSort.java:461)
at java.base/java.util.TimSort.sort(TimSort.java:254)
at java.base/java.util.Arrays.sort(Arrays.java:1441)
at scala.collection.SeqOps.sorted(Seq.scala:700)
at scala.collection.SeqOps.sorted$(Seq.scala:692)
at scala.collection.immutable.List.scala$collection$immutable$StrictOptimizedSeqOps$$super$sorted(List.scala:79)
at scala.collection.immutable.StrictOptimizedSeqOps.sorted(StrictOptimizedSeqOps.scala:78)
at scala.collection.immutable.StrictOptimizedSeqOps.sorted$(StrictOptimizedSeqOps.scala:78)
at scala.collection.immutable.List.sorted(List.scala:79)
at scala.collection.SeqOps.sortWith(Seq.scala:727)
at scala.collection.SeqOps.sortWith$(Seq.scala:727)
at scala.collection.AbstractSeq.sortWith(Seq.scala:1161)
... 59 elided
scala> lst69.sortWith{ case (v1,v2) => (v1 < v2) }
val res6: List[Int] = List(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 10, 10)
scala> ```
You can try instead:
lst69.sortWith{ case (v1,v2) => (v1 < v2 || v1 == v2 ) }

MongoDB - transform array field to set

I have documents with a field containing an array of values which can be duplicated. I want to transform these documents with an extra field corresponding to unique values of this array. I tried aggregate + addToSet without success.
Data:
{..., "random_integers" : [1, 1, 2, 2, 3, 3]},
{..., "random_integers" : [2, 3, 4, 4, 5, 6]},
{..., "random_integers" : [9, 9, 8, 8, 7, 7]}
Expecting:
{
...
"random_integers" : [1, 1, 2, 2, 3, 3],
"unique_integers" : [1, 2, 3],
},
{
...
"random_integers" : [2, 3, 4, 4, 5, 6],
"unique_integers" : [2, 3, 4, 5, 6],
},
{
...
"random_integers" : [9, 9, 8, 8, 7, 7],
"unique_integers" : [7, 8, 9],
}
Try with aggregate + addToSet():
# Query
db.getCollection().aggregate([
{
$group: {
_id: '$_id',
unique_integers: {$addToSet: '$random_integers' }
}
}
])
# Results
{..., "unique_integers" : [[1, 1, 2, 2, 3, 3]]},
{..., "unique_integers" : [[2, 3, 4, 4, 5, 6]]},
{..., "unique_integers" : [[9, 9, 8, 8, 7, 7]]}
$addToSet add the whole list into a set, instead of each element of the array. I tried to combine $addToSet with $each but it is not recognize by mongo on a group:
# Query
db.getCollection().aggregate([
{
$group: {
_id: '$_id',
unique_integers: {$addToSet: { $each: '$random_integers' }}
}
}
])
# Error
Error: command failed: {
"ok" : 0,
"errmsg" : "Unrecognized expression '$each'",
"code" : 168,
"codeName" : "InvalidPipelineOperator"
} : aggregate failed
db.ints.aggregate( [
{ $project: {
random_integers: 1,
unique_integers: { $setIntersection: [ "$random_integers", "$random_integers" ] },
_id: 0
} }
] )

Numpy array: Conditional encoding

I have following numpy array:
array([1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 2, 0, 1, 1, 2, 3, 3, 3, 3, 1, 1, 1, 1,
1, 3, 1, 1, 3, 0, 1, 3, 1, 2, 1, 1, 1, 1, 1, 2, 1, 2, 0, 1, 2, 0, 2,
2, 2, 1, 2, 2, 0, 2, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 3, 0, 2, 1, 1,
1, 1, 3, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 0, 2, 3,
2, 1, 1, 1, 1, 3, 1, 0])
Question: How can I create another array that encodes the data, given condition: If value = 3 or 2, then "1", else "0".
I tried:
from sklearn.preprocessing import label_binarize
label_binarize(doc_topics, classes=[3,2])[:15]
array([[0, 0],
[0, 0],
[0, 0],
[1, 0],
[0, 0],
[0, 0],
[0, 0],
[0, 0],
[0, 0],
[0, 0],
[0, 1],
[0, 0],
[0, 0],
[0, 0],
[0, 1]])
However, this seems to return a 2-D array.
Use np.where and pass your condition to mask the elements of interest to set where the condition is met to 1, 0 otherwise:
In[18]:
a = np.where((a==3) | (a == 2),1,0)
a
Out[18]:
array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0,
0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1,
0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0,
1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0,
0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0])
Here we compare the array with the desired values, and use the unary | to or the conditions, due to operator precedence we have to use parentheses () around the conditions.
To do this using sklearn:
In[68]:
binarizer = preprocessing.Binarizer(threshold=1)
binarizer.transform(a.reshape(1,-1))
Out[68]:
array([[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0,
0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1,
0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0,
1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0,
0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0]])
This treats values above 1 as 1 and 0 otherwise, this only works for this specific data set as you want 2 and 3 to be 1, it won't work if you have other values, so the numpy method is more general

Compare an element from a map with the rest of the map

I'm trying to select the last element from a map and compare it against the other last elements in the map to say if the selected element is higher or lower.
So far I can get it to select the element from user input but cannot work out how to loop through the map and get it to compare against the other elements:
def menu(f: (String) => (String, Int)) = {
print("Number > ")
val data = f(readLine)
println(s"${data._1}: ${data._2}")
}
Here is the map which the elements are coming from:
val mapdata = Map(
"A1" -> List(9, 7, 2, 0, 0, 2, 7, 9, 1, 2, 4, 1, 9, 6, 5, 3, 2, 3, 7, 2, 8, 5, 4, 5, 1, 6, 5, 2, 4, 1),
"B2" -> List(0, 7, 6, 3, 3, 3, 1, 2, 9, 2, 9, 7, 4, 7, 3, 6, 3, 9, 5, 2, 9, 7, 3, 4, 6, 3, 4, 3, 4, 1),
"C3" -> List(8, 7, 1, 8, 0, 5, 8, 0, 5, 9, 7, 5, 3, 7, 9, 8, 1, 4, 6, 5, 6, 6, 3, 6, 8, 8, 7, 4, 0, 6),
"D4" -> List(2, 9, 5, 7, 3, 8, 6, 9, 7, 9, 0, 1, 3, 1, 3, 0, 0, 1, 3, 8, 5, 4, 0, 9, 7, 1, 4, 5, 2, 9),
"E5" -> List(2, 6, 8, 0, 3, 5, 2, 1, 5, 9, 4, 5, 3, 5, 5, 8, 8, 2, 5, 9, 3, 8, 6, 7, 8, 7, 4, 1, 2, 3),
"F6" -> List(2, 7, 5, 9, 1, 9, 2, 4, 1, 6, 3, 7, 4, 3, 4, 5, 9, 2, 2, 4, 8, 7, 9, 2, 2, 7, 9, 1, 6, 9),
"G7" -> List(6, 9, 5, 0, 8, 0, 0, 5, 8, 5, 8, 7, 1, 6, 6, 1, 5, 2, 2, 7, 9, 5, 5, 9, 1, 4, 4, 0, 2, 0),
"H8" -> List(2, 8, 8, 3, 2, 1, 1, 8, 5, 9, 0, 2, 1, 6, 9, 7, 9, 6, 7, 7, 0, 9, 5, 2, 5, 0, 2, 1, 8, 6),
"I9" -> List(2, 1, 8, 2, 4, 4, 2, 4, 9, 4, 0, 6, 9, 5, 9, 4, 9, 1, 8, 6, 3, 4, 4, 3, 7, 9, 1, 2, 6, 6)
)
for example I select H8 and its last element is 6
I then want to compare it to all last elements and say if it is higher or lower than each of the last elements.
Thanks in advance
I'm not sure if I understand you correctly. You can access last element of the list by calling last.
object A extends App {
val mapdata: Map[String, List[Int]] = Map(
"A1" -> List(9, 7, 2, 0, 0, 2, 7, 9, 1, 2, 4, 1, 9, 6, 5, 3, 2, 3, 7, 2, 8, 5, 4, 5, 1, 6, 5, 2, 4, 1),
"B2" -> List(0, 7, 6, 3, 3, 3, 1, 2, 9, 2, 9, 7, 4, 7, 3, 6, 3, 9, 5, 2, 9, 7, 3, 4, 6, 3, 4, 3, 4, 1),
"C3" -> List(8, 7, 1, 8, 0, 5, 8, 0, 5, 9, 7, 5, 3, 7, 9, 8, 1, 4, 6, 5, 6, 6, 3, 6, 8, 8, 7, 4, 0, 6),
"D4" -> List(2, 9, 5, 7, 3, 8, 6, 9, 7, 9, 0, 1, 3, 1, 3, 0, 0, 1, 3, 8, 5, 4, 0, 9, 7, 1, 4, 5, 2, 9),
"E5" -> List(2, 6, 8, 0, 3, 5, 2, 1, 5, 9, 4, 5, 3, 5, 5, 8, 8, 2, 5, 9, 3, 8, 6, 7, 8, 7, 4, 1, 2, 3),
"F6" -> List(2, 7, 5, 9, 1, 9, 2, 4, 1, 6, 3, 7, 4, 3, 4, 5, 9, 2, 2, 4, 8, 7, 9, 2, 2, 7, 9, 1, 6, 9),
"G7" -> List(6, 9, 5, 0, 8, 0, 0, 5, 8, 5, 8, 7, 1, 6, 6, 1, 5, 2, 2, 7, 9, 5, 5, 9, 1, 4, 4, 0, 2, 0),
"H8" -> List(2, 8, 8, 3, 2, 1, 1, 8, 5, 9, 0, 2, 1, 6, 9, 7, 9, 6, 7, 7, 0, 9, 5, 2, 5, 0, 2, 1, 8, 6),
"I9" -> List(2, 1, 8, 2, 4, 4, 2, 4, 9, 4, 0, 6, 9, 5, 9, 4, 9, 1, 8, 6, 3, 4, 4, 3, 7, 9, 1, 2, 6, 6)
)
val choice = "H8"
val numberToBeCompared = mapdata.getOrElse(choice, throw new RuntimeException(s"couldn't find $choice")).last
mapdata.filter(_._1 != choice)
.values
.foreach(list => {
if (list.last > numberToBeCompared)
println(s"${list.last} > $numberToBeCompared")
else
println(s"${list.last} <= $numberToBeCompared")
})
}
Result:
1 <= 6
1 <= 6
0 <= 6
3 <= 6
9 > 6
9 > 6
6 <= 6
6 <= 6
Is this what you need ?
Note that I still learn Scala so there's probably better way of doing this.
I also don't know if any list in mapdata can be empty so use Option if necessary.
// EDIT
If you want the letters you may try sth like this:
val choice = "H8"
val numberToBeCompared = mapdata.getOrElse(choice, throw new RuntimeException(s"couldn't find $choice")).last
mapdata.filter(_._1 != choice)
.foreach(entry => {
if (entry._2.last > numberToBeCompared)
println(s"${choice} > ${entry._1}")
else
println(s"${choice} <= ${entry._1}")
})
H8 <= A1
H8 <= B2
H8 <= G7
H8 <= E5
H8 > D4
H8 > F6
H8 <= C3
H8 <= I9
You just need to remember that when you iterate through mapdata you get entry. entry._1 is the key (your string), entry._2 is the list.

Probit model in winbugs

I conducted an analysis using a logit model and now want to do the same using a probit model. Can anyone please turn this winbugs logit model into a winbugs probit model?
model
{
for (i in 1:n) {
# Linear regression on logit
logit(p[i]) <- alpha + b.sex*sex[i] + b.age*age[i]
# Likelihood function for each data point
frac[i] ~ dbern(p[i])
}
alpha ~ dnorm(0.0,1.0E-4) # Prior for intercept
b.sex ~ dnorm(0.0,1.0E-4) # Prior for slope of sex
b.age ~ dnorm(0.0,1.0E-4) # Prior for slope of age
}
Data
list(sex=c(1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1,
1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0,
0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1,
0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1),
age= c(69, 57, 61, 60, 69, 74, 63, 68, 64, 53, 60, 58, 79, 56, 53, 74, 56, 76, 72,
56, 66, 52, 77, 70, 69, 76, 72, 53, 69, 59, 73, 77, 55, 77, 68, 62, 56, 68, 70, 60,
65, 55, 64, 75, 60, 67, 61, 69, 75, 68, 72, 71, 54, 52, 54, 50, 75, 59, 65, 60, 60,
57, 51, 51, 63, 57, 80, 52, 65, 72, 80, 73, 76, 79, 66, 51, 76, 75, 66, 75, 78, 70,
67, 51, 70, 71, 71, 74, 74, 60, 58, 55, 61, 65, 52, 68, 75, 52, 53, 70),
frac=c(1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0,
1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1,
1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1,
1, 0, 1, 1, 0, 0, 1, 0, 0, 1),
n=100)
Initial Values
list(alpha=0, b.sex=1, b.age=1)
WinBUGS accepts multiple types of link functions (see page 15 in the WinBUGS manual). For a probit model, change your linear regression equation to:
probit(p[i]) <- alpha + b.sex*sex[i] + b.age*age[i]
I would recommend you center the age variable, otherwise you may well run into some convergence problems, so something like:
probit(p[i]) <- alpha + b.sex*sex[i] + b.age*(age[i] - mean(age[]))
Alternatively, for a probit model (if the probit functions gives you some trap errors) you could use the phi standard normal cdf function:
p[i] <- phi(alpha + b.sex*sex[i] + b.age*(age[i] - mean(age[])))