Numpy array: Conditional encoding - encoding

I have the following NumPy array:
array([1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 2, 0, 1, 1, 2, 3, 3, 3, 3, 1, 1, 1, 1,
1, 3, 1, 1, 3, 0, 1, 3, 1, 2, 1, 1, 1, 1, 1, 2, 1, 2, 0, 1, 2, 0, 2,
2, 2, 1, 2, 2, 0, 2, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 3, 0, 2, 1, 1,
1, 1, 3, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 1, 0, 2, 3,
2, 1, 1, 1, 1, 3, 1, 0])
Question: How can I create another array that encodes the data according to this condition: if the value is 3 or 2, then "1", else "0"?
I tried:
from sklearn.preprocessing import label_binarize
label_binarize(doc_topics, classes=[3,2])[:15]
array([[0, 0],
[0, 0],
[0, 0],
[1, 0],
[0, 0],
[0, 0],
[0, 0],
[0, 0],
[0, 0],
[0, 0],
[0, 1],
[0, 0],
[0, 0],
[0, 0],
[0, 1]])
However, this seems to return a 2-D array.

Use np.where and pass it your condition as a mask: elements where the condition is met are set to 1, all others to 0:
In[18]:
a = np.where((a==3) | (a == 2),1,0)
a
Out[18]:
array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0,
0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1,
0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0,
1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0,
0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0])
Here we compare the array with the desired values and use the element-wise (bitwise) or operator | to combine the two conditions. Because | binds more tightly than ==, each comparison has to be wrapped in parentheses ().
To do this using sklearn:
In[68]:
binarizer = preprocessing.Binarizer(threshold=1)
binarizer.transform(a.reshape(1,-1))
Out[68]:
array([[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0,
0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1,
0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0,
1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0,
0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0]])
This maps values above 1 to 1 and everything else to 0. It only works for this specific data set, because the values you want encoded as 1 (2 and 3) happen to be exactly the ones above the threshold. It won't work if you have other values, so the numpy method is more general.
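As a further sketch of the "more general" NumPy route, np.isin tests membership in an arbitrary set of values, so the set does not have to be a contiguous range above a threshold. The variable name doc_topics is taken from the question, and only the first 15 values of the array are used here to keep the example short:

```python
import numpy as np

# First 15 values of the array from the question (shortened for brevity).
doc_topics = np.array([1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 2, 0, 1, 1, 2])

# True where the value is one of the listed classes, then cast to 0/1.
encoded = np.isin(doc_topics, [2, 3]).astype(int)

print(encoded)  # [0 0 0 1 0 0 0 0 0 0 1 0 0 0 1]
```

The result is 1-D, like the np.where version, and adding another value to the list is a one-character change.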

Seq sortWith function with strange behaviour

I was trying to sort elements of a Seq object with the sortWith function when I got an exception. I didn't use the sorted function because the code below is a simplification of the real code where the seq has tuples instead of ints.
See below that in the last two cases, when comparing with (v1 <= v2) an exception is thrown, but when comparing with (v1 < v2) no exception is thrown.
heitor#heitor-340XAA-350XAA-550XAA:~$ sbt console
[info] welcome to sbt 1.6.2 (Ubuntu Java 11.0.11)
[info] loading settings for project global-plugins from sbt-updates.sbt ...
[info] loading global plugins from /home/heitor/.sbt/1.0/plugins
[info] loading project definition from /home/heitor/project
[info] loading settings for project root from build.sbt ...
[info] set current project to example (in build file:/home/heitor/)
[info] Starting scala interpreter...
Welcome to Scala 2.13.8 (OpenJDK 64-Bit Server VM, Java 11.0.11).
Type in expressions for evaluation. Or try :help.
scala> val lst69 = List(1, 10, 4, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 2, 1, 1, 1, 1, 1, 3, 1, 4, 10, 1, 1, 3, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1)
val lst69: List[Int] = List(1, 10, 4, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 2, 1, 1, 1, 1, 1, 3, 1, 4, 10, 1, 1, 3, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1)
scala> lst69.size
val res0: Int = 69
scala> val lst68 = lst69.take(68)
val lst68: List[Int] = List(1, 10, 4, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 2, 1, 1, 1, 1, 1, 3, 1, 4, 10, 1, 1, 3, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1)
scala> lst68.size
val res1: Int = 68
scala> lst68.sorted
val res2: List[Int] = List(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 10, 10)
scala> lst68.sortWith{ case (v1,v2) => (v1 <= v2) }
val res3: List[Int] = List(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 10, 10)
scala> lst69.sorted
val res4: List[Int] = List(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 10, 10)
scala> lst69.sortWith{ case (v1,v2) => (v1 <= v2) }
java.lang.IllegalArgumentException: Comparison method violates its general contract!
at java.base/java.util.TimSort.mergeHi(TimSort.java:903)
at java.base/java.util.TimSort.mergeAt(TimSort.java:520)
at java.base/java.util.TimSort.mergeForceCollapse(TimSort.java:461)
at java.base/java.util.TimSort.sort(TimSort.java:254)
at java.base/java.util.Arrays.sort(Arrays.java:1441)
at scala.collection.SeqOps.sorted(Seq.scala:700)
at scala.collection.SeqOps.sorted$(Seq.scala:692)
at scala.collection.immutable.List.scala$collection$immutable$StrictOptimizedSeqOps$$super$sorted(List.scala:79)
at scala.collection.immutable.StrictOptimizedSeqOps.sorted(StrictOptimizedSeqOps.scala:78)
at scala.collection.immutable.StrictOptimizedSeqOps.sorted$(StrictOptimizedSeqOps.scala:78)
at scala.collection.immutable.List.sorted(List.scala:79)
at scala.collection.SeqOps.sortWith(Seq.scala:727)
at scala.collection.SeqOps.sortWith$(Seq.scala:727)
at scala.collection.AbstractSeq.sortWith(Seq.scala:1161)
... 59 elided
scala> lst69.sortWith{ case (v1,v2) => (v1 < v2) }
val res6: List[Int] = List(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 10, 10)
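sortWith expects a strict "less than" comparison. With <=, two equal elements each compare as "less than" the other, which violates the comparator contract that TimSort checks. Whether the violation is actually detected depends on the input, which is why the 68-element list happened to sort without an exception. Note that (v1 < v2 || v1 == v2) is equivalent to (v1 <= v2), so it fails in the same way. Use the strict comparison instead:
lst69.sortWith{ case (v1,v2) => (v1 < v2) }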
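To illustrate the comparator contract behind the exception, here is a small Python sketch (not Scala) of the three-way comparator that a "less than" predicate induces. It assumes the comparator is derived the way Scala's Ordering.fromLessThan does it (-1 if lt(x, y), else 1 if lt(y, x), else 0); the helper names compare_from_lt and is_antisymmetric are made up for this example:

```python
def compare_from_lt(lt):
    """Build a three-way comparator from a 'less than' predicate."""
    def compare(x, y):
        if lt(x, y):
            return -1
        if lt(y, x):
            return 1
        return 0
    return compare


def is_antisymmetric(compare, values):
    """Check compare(x, y) == -compare(y, x) for every pair of values."""
    return all(compare(x, y) == -compare(y, x) for x in values for y in values)


strict = compare_from_lt(lambda a, b: a < b)
non_strict = compare_from_lt(lambda a, b: a <= b)
sample = [1, 10, 4, 2, 1]

print(is_antisymmetric(strict, sample))      # True
print(is_antisymmetric(non_strict, sample))  # False: compare(1, 1) is -1 both ways
```

With the non-strict predicate, equal elements report "less than" in both directions, which is the inconsistency a sort implementation may detect and reject.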

Compare an element from a map with the rest of the map

I'm trying to select the last element from a map and compare it against the other last elements in the map to say if the selected element is higher or lower.
So far I can get it to select the element from user input but cannot work out how to loop through the map and get it to compare against the other elements:
def menu(f: (String) => (String, Int)) = {
print("Number > ")
val data = f(readLine)
println(s"${data._1}: ${data._2}")
}
Here is the map which the elements are coming from:
val mapdata = Map(
"A1" -> List(9, 7, 2, 0, 0, 2, 7, 9, 1, 2, 4, 1, 9, 6, 5, 3, 2, 3, 7, 2, 8, 5, 4, 5, 1, 6, 5, 2, 4, 1),
"B2" -> List(0, 7, 6, 3, 3, 3, 1, 2, 9, 2, 9, 7, 4, 7, 3, 6, 3, 9, 5, 2, 9, 7, 3, 4, 6, 3, 4, 3, 4, 1),
"C3" -> List(8, 7, 1, 8, 0, 5, 8, 0, 5, 9, 7, 5, 3, 7, 9, 8, 1, 4, 6, 5, 6, 6, 3, 6, 8, 8, 7, 4, 0, 6),
"D4" -> List(2, 9, 5, 7, 3, 8, 6, 9, 7, 9, 0, 1, 3, 1, 3, 0, 0, 1, 3, 8, 5, 4, 0, 9, 7, 1, 4, 5, 2, 9),
"E5" -> List(2, 6, 8, 0, 3, 5, 2, 1, 5, 9, 4, 5, 3, 5, 5, 8, 8, 2, 5, 9, 3, 8, 6, 7, 8, 7, 4, 1, 2, 3),
"F6" -> List(2, 7, 5, 9, 1, 9, 2, 4, 1, 6, 3, 7, 4, 3, 4, 5, 9, 2, 2, 4, 8, 7, 9, 2, 2, 7, 9, 1, 6, 9),
"G7" -> List(6, 9, 5, 0, 8, 0, 0, 5, 8, 5, 8, 7, 1, 6, 6, 1, 5, 2, 2, 7, 9, 5, 5, 9, 1, 4, 4, 0, 2, 0),
"H8" -> List(2, 8, 8, 3, 2, 1, 1, 8, 5, 9, 0, 2, 1, 6, 9, 7, 9, 6, 7, 7, 0, 9, 5, 2, 5, 0, 2, 1, 8, 6),
"I9" -> List(2, 1, 8, 2, 4, 4, 2, 4, 9, 4, 0, 6, 9, 5, 9, 4, 9, 1, 8, 6, 3, 4, 4, 3, 7, 9, 1, 2, 6, 6)
)
For example, I select H8, whose last element is 6.
I then want to compare it to all the other last elements and say whether it is higher or lower than each of them.
Thanks in advance
I'm not sure I understand you correctly, but you can access the last element of a list by calling last.
object A extends App {
val mapdata: Map[String, List[Int]] = Map(
"A1" -> List(9, 7, 2, 0, 0, 2, 7, 9, 1, 2, 4, 1, 9, 6, 5, 3, 2, 3, 7, 2, 8, 5, 4, 5, 1, 6, 5, 2, 4, 1),
"B2" -> List(0, 7, 6, 3, 3, 3, 1, 2, 9, 2, 9, 7, 4, 7, 3, 6, 3, 9, 5, 2, 9, 7, 3, 4, 6, 3, 4, 3, 4, 1),
"C3" -> List(8, 7, 1, 8, 0, 5, 8, 0, 5, 9, 7, 5, 3, 7, 9, 8, 1, 4, 6, 5, 6, 6, 3, 6, 8, 8, 7, 4, 0, 6),
"D4" -> List(2, 9, 5, 7, 3, 8, 6, 9, 7, 9, 0, 1, 3, 1, 3, 0, 0, 1, 3, 8, 5, 4, 0, 9, 7, 1, 4, 5, 2, 9),
"E5" -> List(2, 6, 8, 0, 3, 5, 2, 1, 5, 9, 4, 5, 3, 5, 5, 8, 8, 2, 5, 9, 3, 8, 6, 7, 8, 7, 4, 1, 2, 3),
"F6" -> List(2, 7, 5, 9, 1, 9, 2, 4, 1, 6, 3, 7, 4, 3, 4, 5, 9, 2, 2, 4, 8, 7, 9, 2, 2, 7, 9, 1, 6, 9),
"G7" -> List(6, 9, 5, 0, 8, 0, 0, 5, 8, 5, 8, 7, 1, 6, 6, 1, 5, 2, 2, 7, 9, 5, 5, 9, 1, 4, 4, 0, 2, 0),
"H8" -> List(2, 8, 8, 3, 2, 1, 1, 8, 5, 9, 0, 2, 1, 6, 9, 7, 9, 6, 7, 7, 0, 9, 5, 2, 5, 0, 2, 1, 8, 6),
"I9" -> List(2, 1, 8, 2, 4, 4, 2, 4, 9, 4, 0, 6, 9, 5, 9, 4, 9, 1, 8, 6, 3, 4, 4, 3, 7, 9, 1, 2, 6, 6)
)
val choice = "H8"
val numberToBeCompared = mapdata.getOrElse(choice, throw new RuntimeException(s"couldn't find $choice")).last
mapdata.filter(_._1 != choice)
.values
.foreach(list => {
if (list.last > numberToBeCompared)
println(s"${list.last} > $numberToBeCompared")
else
println(s"${list.last} <= $numberToBeCompared")
})
}
Result:
1 <= 6
1 <= 6
0 <= 6
3 <= 6
9 > 6
9 > 6
6 <= 6
6 <= 6
Is this what you need?
Note that I am still learning Scala, so there's probably a better way of doing this.
I also don't know whether any list in mapdata can be empty, so use Option (e.g. lastOption) if necessary.
// EDIT
If you want the keys instead of the numbers, you can try something like this:
val choice = "H8"
val numberToBeCompared = mapdata.getOrElse(choice, throw new RuntimeException(s"couldn't find $choice")).last
mapdata.filter(_._1 != choice)
.foreach(entry => {
// entry's last value is larger, so the chosen key is the smaller one
if (entry._2.last > numberToBeCompared)
println(s"${choice} < ${entry._1}")
else
println(s"${choice} >= ${entry._1}")
})
Result:
H8 >= A1
H8 >= B2
H8 >= G7
H8 >= E5
H8 < D4
H8 < F6
H8 >= C3
H8 >= I9
Just remember that when you iterate through mapdata you get an entry: entry._1 is the key (your string) and entry._2 is the list.
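For comparison, the same idea can be sketched in Python with a three-way result (higher, lower, equal), which the question seems to ask for. The function name compare_last is made up for this example, and the lists are abbreviated to the last three values of four of the original entries:

```python
def compare_last(mapdata, choice):
    """Compare the last value of `choice` with the last value of every other key."""
    target = mapdata[choice][-1]
    result = {}
    for key, values in mapdata.items():
        if key == choice:
            continue  # do not compare the chosen entry with itself
        other = values[-1]
        if target > other:
            result[key] = "higher"
        elif target < other:
            result[key] = "lower"
        else:
            result[key] = "equal"
    return result


# Abbreviated data: only the last three values of each original list.
mapdata = {
    "A1": [2, 4, 1],
    "D4": [5, 2, 9],
    "H8": [1, 8, 6],
    "I9": [2, 6, 6],
}

print(compare_last(mapdata, "H8"))
# {'A1': 'higher', 'D4': 'lower', 'I9': 'equal'}
```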

Matlab - very inefficient loop

I have two big matrices. Matrix A which is [4144514 x 3] and matrix B which is [51962 x 17].
The first three columns of A and B hold the identifiers. In matrix B the three columns together form a unique identifier, but the same identifier may be repeated in A.
I want to merge the two matrices so that the resulting matrix A is [4144514 x 20], i.e. the 17 columns of B are appended to every row of A whose first three columns match those of a row in B.
This is the loop I am doing:
for i=1:size(B,1)
aux = sum(A(:,1)==B(i,1) & A(:,2)==B(i,2) & A(:,3) == B(i,3));
A(A(:,1)==B(i,1) & A(:,2)==B(i,2) & A(:,3) == B(i,3),4:20) = repmat(B(i,:),aux,1);
end
The variable aux tells me how many lines in A match the criteria from B.
The second line of the loop creates a matrix with the matching row of B repeated aux times and puts it in the correct place in A. However, this is extremely inefficient.
Let me give you a toy example below with smaller matrices. I will have A to be [108 x 3] and B to be [27 x 17].
What I am looking to do is the following:
A = [100, 1, 2000 ;100, 1, 2000 ;100, 1, 2000 ;100, 1, 2000 ;100, 2, 2000 ;100, 2, 2000 ;100, 2, 2000 ;100, 2, 2000 ;100, 3, 2000 ;100, 3, 2000 ;100, 3, 2000 ;100, 3, 2000 ;100, 1, 2001 ;100, 1, 2001 ;100, 1, 2001 ;100, 1, 2001 ;100, 2, 2001 ;100, 2, 2001 ;100, 2, 2001 ;100, 2, 2001 ;100, 3, 2001 ;100, 3, 2001 ;100, 3, 2001 ;100, 3, 2001 ;100, 1, 2002 ;100, 1, 2002 ;100, 1, 2002 ;100, 1, 2002 ;100, 2, 2002 ;100, 2, 2002 ;100, 2, 2002 ;100, 2, 2002 ;100, 3, 2002 ;100, 3, 2002 ;100, 3, 2002 ;100, 3, 2002 ;101, 1, 2000 ;101, 1, 2000 ;101, 1, 2000 ;101, 1, 2000 ;101, 2, 2000 ;101, 2, 2000 ;101, 2, 2000 ;101, 2, 2000 ;101, 3, 2000 ;101, 3, 2000 ;101, 3, 2000 ;101, 3, 2000 ;101, 1, 2001 ;101, 1, 2001 ;101, 1, 2001 ;101, 1, 2001 ;101, 2, 2001 ;101, 2, 2001 ;101, 2, 2001 ;101, 2, 2001 ;101, 3, 2001 ;101, 3, 2001 ;101, 3, 2001 ;101, 3, 2001 ;101, 1, 2002 ;101, 1, 2002 ;101, 1, 2002 ;101, 1, 2002 ;101, 2, 2002 ;101, 2, 2002 ;101, 2, 2002 ;101, 2, 2002 ;101, 3, 2002 ;101, 3, 2002 ;101, 3, 2002 ;101, 3, 2002 ;103, 1, 2000 ;103, 1, 2000 ;103, 1, 2000 ;103, 1, 2000 ;103, 2, 2000 ;103, 2, 2000 ;103, 2, 2000 ;103, 2, 2000 ;103, 3, 2000 ;103, 3, 2000 ;103, 3, 2000 ;103, 3, 2000 ;103, 1, 2001 ;103, 1, 2001 ;103, 1, 2001 ;103, 1, 2001 ;103, 2, 2001 ;103, 2, 2001 ;103, 2, 2001 ;103, 2, 2001 ;103, 3, 2001 ;103, 3, 2001 ;103, 3, 2001 ;103, 3, 2001 ;103, 1, 2002 ;103, 1, 2002 ;103, 1, 2002 ;103, 1, 2002 ;103, 2, 2002 ;103, 2, 2002 ;103, 2, 2002 ;103, 2, 2002 ;103, 3, 2002 ;103, 3, 2002 ;103, 3, 2002 ;103, 3, 2002];
B = [100, 1, 2000, 8, 7, 9, 10, 1, 2, 9, 2, 1, 3, 3, 3, 9, 7; 100, 2, 2000, 8, 2, 7, 2, 7, 5, 5, 9, 2, 7, 1, 2, 6, 4; 100, 3, 2000, 8, 8, 7, 3, 2, 8, 1, 10, 9, 8, 6, 1, 5, 7; 100, 1, 2001, 10, 10, 1, 7, 2, 5, 5, 8, 6, 5, 3, 6, 6, 4; 100, 2, 2001, 6, 7, 3, 1, 5, 3, 9, 9, 3, 8, 1, 6, 4, 4; 100, 3, 2001, 1, 5, 7, 1, 5, 4, 2, 10, 5, 4, 6, 5, 1, 10; 100, 1, 2002, 7, 4, 6, 4, 7, 8, 3, 7, 7, 8, 2, 1, 6, 2; 100, 2, 2002, 8, 7, 8, 10, 2, 10, 8, 4, 7, 5, 10, 4, 4, 2; 100, 3, 2002, 4, 9, 4, 10, 2, 4, 2, 1, 4, 10, 9, 2, 6, 9; 101, 1, 2000, 5, 9, 5, 3, 10, 1, 4, 2, 10, 2, 6, 8, 5, 4; 101, 2, 2000, 6, 1, 8, 10, 10, 7, 4, 6, 5, 2, 8, 3, 2, 1; 101, 3, 2000, 7, 7, 5, 6, 2, 8, 8, 8, 1, 6, 1, 1, 9, 7; 101, 1, 2001, 8, 4, 5, 5, 8, 7, 2, 2, 9, 8, 1, 4, 2, 1; 101, 2, 2001, 3, 7, 10, 4, 9, 9, 1, 1, 10, 7, 6, 5, 10, 9; 101, 3, 2001, 10, 8, 2, 4, 6, 1, 6, 4, 8, 10, 7, 9, 4, 7; 101, 1, 2002, 6, 9, 3, 2, 10, 8, 5, 2, 6, 9, 1, 3, 6, 6; 101, 2, 2002, 8, 3, 8, 4, 4, 3, 4, 2, 4, 7, 2, 10, 3, 3; 101, 3, 2002, 10, 2, 10, 10, 6, 5, 3, 5, 10, 1, 3, 4, 8, 5; 103, 1, 2000, 8, 3, 9, 9, 4, 3, 3, 9, 7, 7, 6, 5, 2, 6; 103, 2, 2000, 6, 7, 5, 5, 7, 10, 5, 3, 5, 4, 7, 8, 9, 7; 103, 3, 2000, 3, 4, 9, 10, 3, 10, 7, 2, 10, 3, 3, 3, 6, 6; 103, 1, 2001, 7, 7, 1, 7, 10, 7, 10, 7, 9, 8, 4, 7, 6, 2; 103, 2, 2001, 5, 5, 4, 3, 7, 7, 6, 5, 2, 5, 5, 6, 6, 5; 103, 3, 2001, 6, 3, 9, 9, 2, 10, 10, 10, 10, 7, 10, 9, 9, 8; 103, 1, 2002, 5, 10, 2, 8, 6, 5, 7, 6, 4, 3, 6, 8, 7, 4; 103, 2, 2002, 10, 7, 6, 3, 10, 4, 5, 5, 1, 3, 1, 9, 1, 5; 103, 3, 2002, 2, 1, 5, 5, 2, 8, 6, 2, 6, 6, 10, 1, 4, 9];
for i=1:size(B,1)
aux = sum(A(:,1)==B(i,1) & A(:,2)==B(i,2) & A(:,3) == B(i,3));
A(A(:,1)==B(i,1) & A(:,2)==B(i,2) & A(:,3) == B(i,3),4:20) = repmat(B(i,:),aux,1);
end
With this smaller example the code runs very fast. But as soon as the matrices get as big as the ones I have, it takes ages.
Is there any faster way to do this?
A very simple task to vectorize, ismember does all the work.
%your example
A = [100, 1, 2000 ;100, 1, 2000 ;100, 1, 2000 ;100, 1, 2000 ;100, 2, 2000 ;100, 2, 2000 ;100, 2, 2000 ;100, 2, 2000 ;100, 3, 2000 ;100, 3, 2000 ;100, 3, 2000 ;100, 3, 2000 ;100, 1, 2001 ;100, 1, 2001 ;100, 1, 2001 ;100, 1, 2001 ;100, 2, 2001 ;100, 2, 2001 ;100, 2, 2001 ;100, 2, 2001 ;100, 3, 2001 ;100, 3, 2001 ;100, 3, 2001 ;100, 3, 2001 ;100, 1, 2002 ;100, 1, 2002 ;100, 1, 2002 ;100, 1, 2002 ;100, 2, 2002 ;100, 2, 2002 ;100, 2, 2002 ;100, 2, 2002 ;100, 3, 2002 ;100, 3, 2002 ;100, 3, 2002 ;100, 3, 2002 ;101, 1, 2000 ;101, 1, 2000 ;101, 1, 2000 ;101, 1, 2000 ;101, 2, 2000 ;101, 2, 2000 ;101, 2, 2000 ;101, 2, 2000 ;101, 3, 2000 ;101, 3, 2000 ;101, 3, 2000 ;101, 3, 2000 ;101, 1, 2001 ;101, 1, 2001 ;101, 1, 2001 ;101, 1, 2001 ;101, 2, 2001 ;101, 2, 2001 ;101, 2, 2001 ;101, 2, 2001 ;101, 3, 2001 ;101, 3, 2001 ;101, 3, 2001 ;101, 3, 2001 ;101, 1, 2002 ;101, 1, 2002 ;101, 1, 2002 ;101, 1, 2002 ;101, 2, 2002 ;101, 2, 2002 ;101, 2, 2002 ;101, 2, 2002 ;101, 3, 2002 ;101, 3, 2002 ;101, 3, 2002 ;101, 3, 2002 ;103, 1, 2000 ;103, 1, 2000 ;103, 1, 2000 ;103, 1, 2000 ;103, 2, 2000 ;103, 2, 2000 ;103, 2, 2000 ;103, 2, 2000 ;103, 3, 2000 ;103, 3, 2000 ;103, 3, 2000 ;103, 3, 2000 ;103, 1, 2001 ;103, 1, 2001 ;103, 1, 2001 ;103, 1, 2001 ;103, 2, 2001 ;103, 2, 2001 ;103, 2, 2001 ;103, 2, 2001 ;103, 3, 2001 ;103, 3, 2001 ;103, 3, 2001 ;103, 3, 2001 ;103, 1, 2002 ;103, 1, 2002 ;103, 1, 2002 ;103, 1, 2002 ;103, 2, 2002 ;103, 2, 2002 ;103, 2, 2002 ;103, 2, 2002 ;103, 3, 2002 ;103, 3, 2002 ;103, 3, 2002 ;103, 3, 2002];
B = [100, 1, 2000, 8, 7, 9, 10, 1, 2, 9, 2, 1, 3, 3, 3, 9, 7; 100, 2, 2000, 8, 2, 7, 2, 7, 5, 5, 9, 2, 7, 1, 2, 6, 4; 100, 3, 2000, 8, 8, 7, 3, 2, 8, 1, 10, 9, 8, 6, 1, 5, 7; 100, 1, 2001, 10, 10, 1, 7, 2, 5, 5, 8, 6, 5, 3, 6, 6, 4; 100, 2, 2001, 6, 7, 3, 1, 5, 3, 9, 9, 3, 8, 1, 6, 4, 4; 100, 3, 2001, 1, 5, 7, 1, 5, 4, 2, 10, 5, 4, 6, 5, 1, 10; 100, 1, 2002, 7, 4, 6, 4, 7, 8, 3, 7, 7, 8, 2, 1, 6, 2; 100, 2, 2002, 8, 7, 8, 10, 2, 10, 8, 4, 7, 5, 10, 4, 4, 2; 100, 3, 2002, 4, 9, 4, 10, 2, 4, 2, 1, 4, 10, 9, 2, 6, 9; 101, 1, 2000, 5, 9, 5, 3, 10, 1, 4, 2, 10, 2, 6, 8, 5, 4; 101, 2, 2000, 6, 1, 8, 10, 10, 7, 4, 6, 5, 2, 8, 3, 2, 1; 101, 3, 2000, 7, 7, 5, 6, 2, 8, 8, 8, 1, 6, 1, 1, 9, 7; 101, 1, 2001, 8, 4, 5, 5, 8, 7, 2, 2, 9, 8, 1, 4, 2, 1; 101, 2, 2001, 3, 7, 10, 4, 9, 9, 1, 1, 10, 7, 6, 5, 10, 9; 101, 3, 2001, 10, 8, 2, 4, 6, 1, 6, 4, 8, 10, 7, 9, 4, 7; 101, 1, 2002, 6, 9, 3, 2, 10, 8, 5, 2, 6, 9, 1, 3, 6, 6; 101, 2, 2002, 8, 3, 8, 4, 4, 3, 4, 2, 4, 7, 2, 10, 3, 3; 101, 3, 2002, 10, 2, 10, 10, 6, 5, 3, 5, 10, 1, 3, 4, 8, 5; 103, 1, 2000, 8, 3, 9, 9, 4, 3, 3, 9, 7, 7, 6, 5, 2, 6; 103, 2, 2000, 6, 7, 5, 5, 7, 10, 5, 3, 5, 4, 7, 8, 9, 7; 103, 3, 2000, 3, 4, 9, 10, 3, 10, 7, 2, 10, 3, 3, 3, 6, 6; 103, 1, 2001, 7, 7, 1, 7, 10, 7, 10, 7, 9, 8, 4, 7, 6, 2; 103, 2, 2001, 5, 5, 4, 3, 7, 7, 6, 5, 2, 5, 5, 6, 6, 5; 103, 3, 2001, 6, 3, 9, 9, 2, 10, 10, 10, 10, 7, 10, 9, 9, 8; 103, 1, 2002, 5, 10, 2, 8, 6, 5, 7, 6, 4, 3, 6, 8, 7, 4; 103, 2, 2002, 10, 7, 6, 3, 10, 4, 5, 5, 1, 3, 1, 9, 1, 5; 103, 3, 2002, 2, 1, 5, 5, 2, 8, 6, 2, 6, 6, 10, 1, 4, 9];
%reference code, changed output to C to preserve input
tic;
C=A;
for i=1:size(B,1)
aux = sum(A(:,1)==B(i,1) & A(:,2)==B(i,2) & A(:,3) == B(i,3));
C(A(:,1)==B(i,1) & A(:,2)==B(i,2) & A(:,3) == B(i,3),4:20) = repmat(B(i,:),aux,1);
end
toc;
%vectorized version
tic;
[~,lia]=ismember(A(:,1:3),B(:,1:3),'rows');
C2=nan(numel(lia),size(B,2)+3);
C2(lia>0,:)=B(lia(lia>0),[1,2,3,1:end]);
toc;
%simplified vectorized version, assuming there is no need to duplicate the first three columns:
tic;
[~,lia]=ismember(A(:,1:3),B(:,1:3),'rows');
C3=nan(numel(lia),size(B,2));
C3(lia>0,:)=B(lia(lia>0),:);
toc;
Comparing the performance with example data close to your real data (the full-size data was too large):
n=100000; %4144514
m=10000; %51962
B=rand(m,17);
B=unique(B,'rows');
A=B(randi([1 size(B,1)],n,1),1:3);
This decreased the execution time from 14s to less than 0.05s.
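The underlying idea is to look up each key of A once instead of rescanning all of A for every row of B. As a rough Python/NumPy sketch of that idea (not a translation of ismember itself), the helper merge_on_keys below is a hypothetical name and the tiny A and B are made-up toy data:

```python
import numpy as np


def merge_on_keys(A, B, n_keys=3):
    """Append the matching row of B to every row of A, matching on the first n_keys columns."""
    # Keys in B are unique, so each key maps to exactly one row index.
    index_of = {tuple(row): i for i, row in enumerate(B[:, :n_keys].tolist())}
    # One dictionary lookup per row of A; -1 marks "no match in B".
    lia = np.array([index_of.get(tuple(row), -1) for row in A[:, :n_keys].tolist()])
    out = np.full((A.shape[0], n_keys + B.shape[1]), np.nan)
    out[:, :n_keys] = A[:, :n_keys]
    hit = lia >= 0
    out[hit, n_keys:] = B[lia[hit]]  # gather all matching rows of B at once
    return out


# Toy data: the last row of A has no partner in B.
A = np.array([[100, 1, 2000], [100, 2, 2000], [100, 1, 2000], [999, 9, 9999]])
B = np.array([[100, 1, 2000, 8, 7], [100, 2, 2000, 6, 5]])
merged = merge_on_keys(A, B)
print(merged.shape)  # (4, 8)
```

Rows of A without a match keep their identifiers and get NaN in the appended columns.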

Probit model in winbugs

I conducted an analysis using a logit model and now want to do the same using a probit model. Can anyone please turn this winbugs logit model into a winbugs probit model?
model
{
for (i in 1:n) {
# Linear regression on logit
logit(p[i]) <- alpha + b.sex*sex[i] + b.age*age[i]
# Likelihood function for each data point
frac[i] ~ dbern(p[i])
}
alpha ~ dnorm(0.0,1.0E-4) # Prior for intercept
b.sex ~ dnorm(0.0,1.0E-4) # Prior for slope of sex
b.age ~ dnorm(0.0,1.0E-4) # Prior for slope of age
}
Data
list(sex=c(1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1,
1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0,
0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1,
0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1),
age= c(69, 57, 61, 60, 69, 74, 63, 68, 64, 53, 60, 58, 79, 56, 53, 74, 56, 76, 72,
56, 66, 52, 77, 70, 69, 76, 72, 53, 69, 59, 73, 77, 55, 77, 68, 62, 56, 68, 70, 60,
65, 55, 64, 75, 60, 67, 61, 69, 75, 68, 72, 71, 54, 52, 54, 50, 75, 59, 65, 60, 60,
57, 51, 51, 63, 57, 80, 52, 65, 72, 80, 73, 76, 79, 66, 51, 76, 75, 66, 75, 78, 70,
67, 51, 70, 71, 71, 74, 74, 60, 58, 55, 61, 65, 52, 68, 75, 52, 53, 70),
frac=c(1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0,
1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1,
1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1,
1, 0, 1, 1, 0, 0, 1, 0, 0, 1),
n=100)
Initial Values
list(alpha=0, b.sex=1, b.age=1)
WinBUGS accepts multiple types of link functions (see page 15 in the WinBUGS manual). For a probit model, change your linear regression equation to:
probit(p[i]) <- alpha + b.sex*sex[i] + b.age*age[i]
I would recommend you center the age variable, otherwise you may well run into some convergence problems, so something like:
probit(p[i]) <- alpha + b.sex*sex[i] + b.age*(age[i] - mean(age[]))
Alternatively, if the probit function gives you trap errors, you could use phi, the standard normal CDF function:
p[i] <- phi(alpha + b.sex*sex[i] + b.age*(age[i] - mean(age[])))
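To make the difference between the two links concrete, here is a small Python sketch of what the model computes for one observation. It is not WinBUGS code; the function names and the coefficient values in the usage lines are made up for illustration:

```python
import math


def phi(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def inv_logit(x):
    """Inverse of the logit link (logistic function)."""
    return 1.0 / (1.0 + math.exp(-x))


def linear_predictor(alpha, b_sex, b_age, sex, age, mean_age):
    """alpha + b.sex*sex + b.age*(age - mean(age)), with age centered."""
    return alpha + b_sex * sex + b_age * (age - mean_age)


# Hypothetical coefficient values, only to show the two links side by side.
eta = linear_predictor(alpha=0.2, b_sex=0.5, b_age=0.03, sex=1, age=70, mean_age=65)
p_probit = phi(eta)
p_logit = inv_logit(eta)
print(p_probit, p_logit)
```

Both links map the same linear predictor to a probability in (0, 1); because they use different curves, the fitted coefficients of a logit and a probit model are on different scales and are not directly comparable.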

easiest way to prototype a symbolic orthogonal matrix

I have 8 sines and cosines that I am trying to define symbolically in Matlab, as shown below. My goal is to symbolically build an 8x8 matrix H (the accumulated Givens rotation matrix) from all these sines and cosines and see what the formula for this orthogonal matrix H is. Conceptually, the code below computes G7*G6*...*G0*I, where I is the 8x8 identity and each Gi is the Givens rotation acting on elements (i:i+1,i:i+1).
c_0 = sym('c_0');
c_1 = sym('c_1');
c_2 = sym('c_2');
c_3 = sym('c_3');
c_4 = sym('c_4');
c_5 = sym('c_5');
c_6 = sym('c_6');
c_7 = sym('c_7');
s_0 = sym('s_0');
s_1 = sym('s_1');
s_2 = sym('s_2');
s_3 = sym('s_3');
s_4 = sym('s_4');
s_5 = sym('s_5');
s_6 = sym('s_6');
s_7 = sym('s_7');
% create H orthogonal matrix using the sin and cos symbols
% filling in the first rotation
I=eye(9,9)
H = I;
H(1:2,1:2) = [c_0 -s_0; s_0 c_0]
% build the 2nd rotation and update H
G = I;
G(2:3,2:3) = [c_1 -s_1; s_1 c_1]
H = G*H
% build the 3rd rotation and update H
G = I;
G(3:4,3:4) = [c_2 -s_2; s_2 c_2]
H = G*H
% build the 4th rotation and update H
G = I;
G(4:5,4:5) = [c_3 -s_3; s_3 c_3]
H = G*H
% build the 5th rotation and update H
G = I;
G(5:6,5:6) = [c_4 -s_4; s_4 c_4]
H = G*H
% build the 6th rotation and update H
G = I;
G(6:7,6:7) = [c_5 -s_5; s_5 c_5]
H = G*H
% build the 7th rotation and update H
G = I;
G(7:8,7:8) = [c_6 -s_6; s_6 c_6]
H = G*H
% build the 8th rotation and update H
G = I;
G(8:9,8:9) = [c_7 -s_7; s_7 c_7]
H = G*H
The code fails with the following error, and I can't figure out how to fix it:
The following error occurred converting from sym to double:
Error using mupadmex
Error in MuPAD command: DOUBLE cannot convert the input expression into a double array.
If the input expression contains a symbolic variable, use the VPA function instead.
Error in build_rotH_test (line 26)
H(1:2,1:2) = [c_0 -s_0; s_0 c_0]
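The error occurs because eye(9,9) returns a double array, so assigning symbolic entries into H forces a conversion from sym to double. Making the identity matrix symbolic fixes it, and I solved it as shown below (sym(eye(9)) would be a shorter way to get the same symbolic identity). Note that I also realized I need the transpose of each rotation so that I can build and apply H'*x, i.e. G7'*G6'*...*G0'*I, which is why the signs of the sines are flipped in the solution.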
clear all;
% defining 0 and 1 as symbols too, solves the problem
sym_0 = sym('0');
sym_1 = sym('1');
c0 = sym('c0');
c1 = sym('c1');
c2 = sym('c2');
c3 = sym('c3');
c4 = sym('c4');
c5 = sym('c5');
c6 = sym('c6');
c7 = sym('c7');
s0 = sym('s0');
s1 = sym('s1');
s2 = sym('s2');
s3 = sym('s3');
s4 = sym('s4');
s5 = sym('s5');
s6 = sym('s6');
s7 = sym('s7');
% create H orthogonal matrix using the sin and cos symbols
% filling in the first rotation
I = repmat(sym_0,9,9);
for i=1:9
I(i,i)=sym_1;
end
H = I
H(1:2,1:2) = [c0 s0; -s0 c0]
% build the 2nd rotation and update H
G = I;
G(2:3,2:3) = [c1 s1; -s1 c1]
H = G*H;
% build the 3rd rotation and update H
G = I;
G(3:4,3:4) = [c2 s2; -s2 c2]
H = G*H;
% build the 4th rotation and update H
G = I;
G(4:5,4:5) = [c3 s3; -s3 c3]
H = G*H;
% build the 5th rotation and update H
G = I;
G(5:6,5:6) = [c4 s4; -s4 c4]
H = G*H;
% build the 6th rotation and update H
G = I;
G(6:7,6:7) = [c5 s5; -s5 c5]
H = G*H;
% build the 7th rotation and update H
G = I;
G(7:8,7:8) = [c6 s6; -s6 c6]
H = G*H;
% build the 8th rotation and update H
G = I;
G(8:9,8:9) = [c7 s7; -s7 c7]
H = G*H
and the output is:
H =
[ 1, 0, 0, 0, 0, 0, 0, 0, 0]
[ 0, 1, 0, 0, 0, 0, 0, 0, 0]
[ 0, 0, 1, 0, 0, 0, 0, 0, 0]
[ 0, 0, 0, 1, 0, 0, 0, 0, 0]
[ 0, 0, 0, 0, 1, 0, 0, 0, 0]
[ 0, 0, 0, 0, 0, 1, 0, 0, 0]
[ 0, 0, 0, 0, 0, 0, 1, 0, 0]
[ 0, 0, 0, 0, 0, 0, 0, 1, 0]
[ 0, 0, 0, 0, 0, 0, 0, 0, 1]
H =
[ c0, s0, 0, 0, 0, 0, 0, 0, 0]
[ -s0, c0, 0, 0, 0, 0, 0, 0, 0]
[ 0, 0, 1, 0, 0, 0, 0, 0, 0]
[ 0, 0, 0, 1, 0, 0, 0, 0, 0]
[ 0, 0, 0, 0, 1, 0, 0, 0, 0]
[ 0, 0, 0, 0, 0, 1, 0, 0, 0]
[ 0, 0, 0, 0, 0, 0, 1, 0, 0]
[ 0, 0, 0, 0, 0, 0, 0, 1, 0]
[ 0, 0, 0, 0, 0, 0, 0, 0, 1]
G =
[ 1, 0, 0, 0, 0, 0, 0, 0, 0]
[ 0, c1, s1, 0, 0, 0, 0, 0, 0]
[ 0, -s1, c1, 0, 0, 0, 0, 0, 0]
[ 0, 0, 0, 1, 0, 0, 0, 0, 0]
[ 0, 0, 0, 0, 1, 0, 0, 0, 0]
[ 0, 0, 0, 0, 0, 1, 0, 0, 0]
[ 0, 0, 0, 0, 0, 0, 1, 0, 0]
[ 0, 0, 0, 0, 0, 0, 0, 1, 0]
[ 0, 0, 0, 0, 0, 0, 0, 0, 1]
G =
[ 1, 0, 0, 0, 0, 0, 0, 0, 0]
[ 0, 1, 0, 0, 0, 0, 0, 0, 0]
[ 0, 0, c2, s2, 0, 0, 0, 0, 0]
[ 0, 0, -s2, c2, 0, 0, 0, 0, 0]
[ 0, 0, 0, 0, 1, 0, 0, 0, 0]
[ 0, 0, 0, 0, 0, 1, 0, 0, 0]
[ 0, 0, 0, 0, 0, 0, 1, 0, 0]
[ 0, 0, 0, 0, 0, 0, 0, 1, 0]
[ 0, 0, 0, 0, 0, 0, 0, 0, 1]
G =
[ 1, 0, 0, 0, 0, 0, 0, 0, 0]
[ 0, 1, 0, 0, 0, 0, 0, 0, 0]
[ 0, 0, 1, 0, 0, 0, 0, 0, 0]
[ 0, 0, 0, c3, s3, 0, 0, 0, 0]
[ 0, 0, 0, -s3, c3, 0, 0, 0, 0]
[ 0, 0, 0, 0, 0, 1, 0, 0, 0]
[ 0, 0, 0, 0, 0, 0, 1, 0, 0]
[ 0, 0, 0, 0, 0, 0, 0, 1, 0]
[ 0, 0, 0, 0, 0, 0, 0, 0, 1]
G =
[ 1, 0, 0, 0, 0, 0, 0, 0, 0]
[ 0, 1, 0, 0, 0, 0, 0, 0, 0]
[ 0, 0, 1, 0, 0, 0, 0, 0, 0]
[ 0, 0, 0, 1, 0, 0, 0, 0, 0]
[ 0, 0, 0, 0, c4, s4, 0, 0, 0]
[ 0, 0, 0, 0, -s4, c4, 0, 0, 0]
[ 0, 0, 0, 0, 0, 0, 1, 0, 0]
[ 0, 0, 0, 0, 0, 0, 0, 1, 0]
[ 0, 0, 0, 0, 0, 0, 0, 0, 1]
G =
[ 1, 0, 0, 0, 0, 0, 0, 0, 0]
[ 0, 1, 0, 0, 0, 0, 0, 0, 0]
[ 0, 0, 1, 0, 0, 0, 0, 0, 0]
[ 0, 0, 0, 1, 0, 0, 0, 0, 0]
[ 0, 0, 0, 0, 1, 0, 0, 0, 0]
[ 0, 0, 0, 0, 0, c5, s5, 0, 0]
[ 0, 0, 0, 0, 0, -s5, c5, 0, 0]
[ 0, 0, 0, 0, 0, 0, 0, 1, 0]
[ 0, 0, 0, 0, 0, 0, 0, 0, 1]
G =
[ 1, 0, 0, 0, 0, 0, 0, 0, 0]
[ 0, 1, 0, 0, 0, 0, 0, 0, 0]
[ 0, 0, 1, 0, 0, 0, 0, 0, 0]
[ 0, 0, 0, 1, 0, 0, 0, 0, 0]
[ 0, 0, 0, 0, 1, 0, 0, 0, 0]
[ 0, 0, 0, 0, 0, 1, 0, 0, 0]
[ 0, 0, 0, 0, 0, 0, c6, s6, 0]
[ 0, 0, 0, 0, 0, 0, -s6, c6, 0]
[ 0, 0, 0, 0, 0, 0, 0, 0, 1]
G =
[ 1, 0, 0, 0, 0, 0, 0, 0, 0]
[ 0, 1, 0, 0, 0, 0, 0, 0, 0]
[ 0, 0, 1, 0, 0, 0, 0, 0, 0]
[ 0, 0, 0, 1, 0, 0, 0, 0, 0]
[ 0, 0, 0, 0, 1, 0, 0, 0, 0]
[ 0, 0, 0, 0, 0, 1, 0, 0, 0]
[ 0, 0, 0, 0, 0, 0, 1, 0, 0]
[ 0, 0, 0, 0, 0, 0, 0, c7, s7]
[ 0, 0, 0, 0, 0, 0, 0, -s7, c7]
H =
[ c0, s0, 0, 0, 0, 0, 0, 0, 0]
[ -c1*s0, c0*c1, s1, 0, 0, 0, 0, 0, 0]
[ c2*s0*s1, -c0*c2*s1, c1*c2, s2, 0, 0, 0, 0, 0]
[ -c3*s0*s1*s2, c0*c3*s1*s2, -c1*c3*s2, c2*c3, s3, 0, 0, 0, 0]
[ c4*s0*s1*s2*s3, -c0*c4*s1*s2*s3, c1*c4*s2*s3, -c2*c4*s3, c3*c4, s4, 0, 0, 0]
[ -c5*s0*s1*s2*s3*s4, c0*c5*s1*s2*s3*s4, -c1*c5*s2*s3*s4, c2*c5*s3*s4, -c3*c5*s4, c4*c5, s5, 0, 0]
[ c6*s0*s1*s2*s3*s4*s5, -c0*c6*s1*s2*s3*s4*s5, c1*c6*s2*s3*s4*s5, -c2*c6*s3*s4*s5, c3*c6*s4*s5, -c4*c6*s5, c5*c6, s6, 0]
[ -c7*s0*s1*s2*s3*s4*s5*s6, c0*c7*s1*s2*s3*s4*s5*s6, -c1*c7*s2*s3*s4*s5*s6, c2*c7*s3*s4*s5*s6, -c3*c7*s4*s5*s6, c4*c7*s5*s6, -c5*c7*s6, c6*c7, s7]
[ s0*s1*s2*s3*s4*s5*s6*s7, -c0*s1*s2*s3*s4*s5*s6*s7, c1*s2*s3*s4*s5*s6*s7, -c2*s3*s4*s5*s6*s7, c3*s4*s5*s6*s7, -c4*s5*s6*s7, c5*s6*s7, -c6*s7, c7]
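The final H printed above suggests a closed form. Using 0-based row and column indices and the convention c_(-1) = c_8 = 1, the entries appear to be H(i,j) = (-1)^(i-j) * c_(j-1) * c_i * s_j*...*s_(i-1) for j <= i, H(i,i+1) = s_i, and 0 elsewhere. This formula is read off the output, not derived in the thread, so the following Python sketch only checks it numerically against the product of transposed rotations; the function names and the sample angles are made up for the check:

```python
import math


def givens_product(cs, ss):
    """Compute G_{n-1}' * ... * G_0' * I, where G_k' has the block [c s; -s c] at rows k, k+1."""
    n = len(cs)
    size = n + 1
    H = [[1.0 if r == c else 0.0 for c in range(size)] for r in range(size)]
    for k in range(n):
        # Left-multiplying by G_k' only mixes rows k and k+1.
        row_k, row_k1 = H[k], H[k + 1]
        H[k] = [cs[k] * a + ss[k] * b for a, b in zip(row_k, row_k1)]
        H[k + 1] = [-ss[k] * a + cs[k] * b for a, b in zip(row_k, row_k1)]
    return H


def closed_form(cs, ss):
    """Conjectured entries of H, with c_(-1) = c_n = 1."""
    n = len(cs)
    size = n + 1

    def c(i):
        return cs[i] if 0 <= i < n else 1.0

    H = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            # Product s_j * ... * s_(i-1); the empty product is 1.
            H[i][j] = (-1) ** (i - j) * c(j - 1) * c(i) * math.prod(ss[j:i])
        if i < n:
            H[i][i + 1] = ss[i]
    return H


# Arbitrary sample angles for the eight rotations.
angles = [0.1 * (k + 1) for k in range(8)]
cs = [math.cos(t) for t in angles]
ss = [math.sin(t) for t in angles]

H_num = givens_product(cs, ss)
H_formula = closed_form(cs, ss)
max_err = max(abs(a - b) for ra, rb in zip(H_num, H_formula) for a, b in zip(ra, rb))
print(max_err)
```

A maximum difference at the level of floating-point round-off supports the pattern for these angles, but it is a numerical check and not a proof.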