Understanding the | operator in Scala [duplicate]
This question already has answers here:
What are bitwise operators?
(9 answers)
Closed 3 years ago.
I came across this code:
scala> val a = 0 | 1
a: Int = 1
scala> val a = 0 | 1 | 2
a: Int = 3
scala> val a = 0 | 1 | 2 | 3
a: Int = 3
scala> val a = 0 | 1 | 2 | 3 | 4
a: Int = 7
The only result I expected from the | operator is the result of the first command. It seems to behave like a logical OR, yet in the second command it also appears to add the values together. Could someone explain how the | operator works when its operands are integers?
| is the bitwise OR operator:
val a = 0 | 1
a: Int = 1
00 0
01 1
-----
01 1
val a = 0 | 1 | 2 | 3
a: Int = 3
00 0
01 1
10 2
11 3
------
11 3
val a = 0 | 1 | 2 | 3 | 4
a: Int = 7
000 0
001 1
010 2
011 3
100 4
-------
111 7
It's just a logical OR between corresponding bits of the binary representations of the integer values (1 or 1 = 1, 1 or 0 = 1, 0 or 1 = 1, 0 or 0 = 0):
val a = 0 | 1
//0 or 1 = 1 (1 - decimal number)
val a = 0 | 1 | 2
//00 or 01 or 10 = 11 (3 - decimal number)
val a = 0 | 1 | 2 | 3
//00 or 01 or 10 or 11 = 11 (3 - decimal number)
val a = 0 | 1 | 2 | 3 | 4
//000 or 001 or 010 or 011 or 100 = 111 (7 - decimal number)
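A quick way to see this for yourself in the REPL (my own illustration, using the JDK's Integer.toBinaryString):

// Fold the operands together with | and print everything in binary.
val values = List(0, 1, 2, 3, 4)
values.foreach(v => println(v + " = " + Integer.toBinaryString(v)))

val combined = values.reduce(_ | _) // same as 0 | 1 | 2 | 3 | 4
println(combined + " = " + Integer.toBinaryString(combined)) // prints: 7 = 111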
Related
summarise (avg) table (keyed) for each row
Given a keyed table, e.g.:

q)\S 7 / seed random numbers for reproducibility
q)v:flip (neg[d 0]?`1)!#[;prd[d]?12] d:4 6 / 4 cols 6 rows
q)show kt:([]letter:d[1]#.Q.an)!v
letter| c  g b  e
------| ----------
a     | 11 0 3  9
b     | 11 8 10 0
c     | 7  2 2  3
d     | 8  4 9  6
e     | 0  0 5  0
f     | 1  0 0  11

How to calculate an average for each row --- e.g. (c+g+b+e)%4 --- for any number of columns?
Following on from your own solution, note that you have to be a little careful with null handling. Your approach won't ignore nulls in the way that avg normally would.

q).[`kt;("a";`g);:;0N];
q)update av:avg flip value kt from kt
letter| c  g b  e  av
------| ---------------
a     | 11   3  9
b     | 11 8 10 0  7.25
c     | 7  2 2  3  3.5
d     | 8  4 9  6  6.75
e     | 0  0 5  0  1.25
f     | 1  0 0  11 3

To make it ignore nulls you have to avg each row rather than averaging the flip.

q)update av:avg each value kt from kt
letter| c  g b  e  av
------| -------------------
a     | 11   3  9  7.666667
b     | 11 8 10 0  7.25
c     | 7  2 2  3  3.5
d     | 8  4 9  6  6.75
e     | 0  0 5  0  1.25
f     | 1  0 0  11 3
Solution 1: q-sql

q)update av:avg flip value kt from kt
letter| c  g b  e  av
------| ---------------
a     | 11 0 3  9  5.75
b     | 11 8 10 0  7.25
c     | 7  2 2  3  3.5
d     | 8  4 9  6  6.75
e     | 0  0 5  0  1.25
f     | 1  0 0  11 3

Solution 2: functional q-sql

tl;dr:

q)![kt;();0b;](1#`av)!enlist(avg;)enlist,cols[`kt]except cols key`kt
letter| c  g b  e  av
------| ---------------
a     | 11 0 3  9  5.75
b     | 11 8 10 0  7.25
c     | 7  2 2  3  3.5
d     | 8  4 9  6  6.75
e     | 0  0 5  0  1.25
f     | 1  0 0  11 3

Let's start with a look at how the parse tree of a non-general solution would look:

q)parse"update av:avg (c;g;b;e) from kt"
!
`kt
()
0b
(,`av)!,(avg;(enlist;`c;`g;`b;`e))

(Note that q is a wrapper implemented in k, so the , prefix operator in the above expression is the same as the enlist keyword in q.)

So all of the below are equivalent (verify with ~). Relying on projection, (x;y)~(x;)y, we can further improve readability by reducing the distance between parens:

q)k)(!;`kt;();0b;(,`av)!,(avg;(enlist;`c;`g;`b;`e)))
q)(!;`kt;();0b;(enlist`av)!enlist(avg;(enlist;`c;`g;`b;`e)))
q)(!;`kt;();0b;(1#`av)!enlist(avg;(enlist;`c;`g;`b;`e)))
q)(!;`kt;();0b;)(1#`av)!enlist(avg;)(enlist;`c;`g;`b;`e)

Let's evaluate the parse tree to check:

q)eval(!;`kt;();0b;)(1#`av)!enlist(avg;)(enlist;`c;`g;`b;`e)
letter| c  g b  e  av
------| ---------------
a     | 11 0 3  9  5.75
b     | 11 8 10 0  7.25
c     | 7  2 2  3  3.5
d     | 8  4 9  6  6.75
e     | 0  0 5  0  1.25
f     | 1  0 0  11 3

(enlist;`c;`g;`b;`e) in the general case is:

q)enlist,cols[`kt]except cols key`kt
enlist
`c
`g
`b
`e

So let's plug that in and check:

q)eval(!;`kt;();0b;(1#`av)!enlist(avg;)enlist,cols[`kt]except cols key`kt)
letter| c  g b  e  av
------| ---------------
a     | 11 0 3  9  5.75
b     | 11 8 10 0  7.25
c     | 7  2 2  3  3.5
d     | 8  4 9  6  6.75
e     | 0  0 5  0  1.25
f     | 1  0 0  11 3

Also:

q)![`kt;();0b;(1#`av)!enlist(avg;)enlist,cols[`kt]except cols key`kt]
q)![ kt;();0b;](1#`av)!enlist(avg;)enlist,cols[`kt]except cols key`kt
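For readers more comfortable with Scala than q, here is a rough analogue of the null-handling point above (my own sketch, not part of the q answer): model nullable cells as Option[Double] and average only the values that are present, the way q's avg each ignores nulls.

// Keyed table as a Map from key to row; None plays the role of a q null.
val kt: Map[String, Vector[Option[Double]]] = Map(
  "a" -> Vector(Some(11.0), None, Some(3.0), Some(9.0)), // null in column g
  "b" -> Vector(Some(11.0), Some(8.0), Some(10.0), Some(0.0))
)

val av = kt.map { case (letter, row) =>
  val present = row.flatten            // drop the "nulls"
  letter -> present.sum / present.size // average only the remaining values
}

println(av) // Map(a -> 7.666..., b -> 7.25), matching `avg each` above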
Stata merge with multiple match variables
I am having difficulty combining datasets for a project. Our primary dataset is organized by individual judges. It is an attribute dataset.

judge
j | x | y | z
--|---|---|--
1 | 2 | 3 | 4
2 | 5 | 6 | 7

The second dataset is a case database. Each observation is a case, and judges can appear in any one of three variables.

case
case | j1 | j2 | j3 | year
-----|----|----|----|-----
1    | 1  | 2  | 3  | 2002
2    | 2  | 3  | 1  | 1997

We would like to merge the case database into the attribute database, matching by judge. So, for each case in which a judge appears in j1, j2, or j3, an observation for that case would be added, creating a dataset that looks like the one below.

combined
j | x | y | z | case | year
--|---|---|---|------|-----
1 | 2 | 3 | 4 | 1    | 2002
1 | 2 | 3 | 4 | 2    | 1997
2 | 5 | 6 | 7 | 1    | 2002
2 | 5 | 6 | 7 | 2    | 1997

My best guess is to use

rename j1 j
merge 1:m j using case
rename j j1
rename j2 j
merge 1:m j using case

However, I am unsure that this will work, especially since the merging dataset has three possible variables in which the j identifier can occur.
Your examples are clear, but even better would be to present them as code that would not require edits to remove the scaffolding. See dataex from SSC (ssc inst dataex).

It's a case of the missing reshape, I think.

clear
input j x y z
1 2 3 4
2 5 6 7
end
save judge

clear
input case j1 j2 j3 year
1 1 2 3 2002
2 2 3 1 1997
end

reshape long j , i(case) j(which)
merge m:1 j using judge
list

     +-------------------------------------------------------+
     | case   which   j   year   x   y   z           _merge  |
     |-------------------------------------------------------|
  1. |    1       1   1   2002   2   3   4      matched (3)  |
  2. |    2       3   1   1997   2   3   4      matched (3)  |
  3. |    2       1   2   1997   5   6   7      matched (3)  |
  4. |    1       2   2   2002   5   6   7      matched (3)  |
  5. |    2       2   3   1997   .   .   .  master only (1)  |
     |-------------------------------------------------------|
  6. |    1       3   3   2002   .   .   .  master only (1)  |
     +-------------------------------------------------------+

drop if _merge < 3
list

     +---------------------------------------------------+
     | case   which   j   year   x   y   z       _merge  |
     |---------------------------------------------------|
  1. |    1       1   1   2002   2   3   4   matched (3) |
  2. |    2       3   1   1997   2   3   4   matched (3) |
  3. |    2       1   2   1997   5   6   7   matched (3) |
  4. |    1       2   2   2002   5   6   7   matched (3) |
     +---------------------------------------------------+
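The reshape-then-merge idea is language-agnostic; here is a rough Scala sketch of the same logic (my illustration, with hypothetical Judge and Case types, not part of the Stata answer), going "long" over the three judge slots and then inner-joining on j:

case class Judge(j: Int, x: Int, y: Int, z: Int)
case class Case(id: Int, js: Seq[Int], year: Int)

val judges = Map(1 -> Judge(1, 2, 3, 4), 2 -> Judge(2, 5, 6, 7))
val cases  = Seq(Case(1, Seq(1, 2, 3), 2002), Case(2, Seq(2, 3, 1), 1997))

// "reshape long": one row per (case, judge slot); then keep matches only.
val combined = for {
  c  <- cases
  j  <- c.js                 // j1, j2, j3 flattened into long form
  jd <- judges.get(j).toSeq  // inner join; drops "master only" rows (judge 3)
} yield (jd.j, jd.x, jd.y, jd.z, c.id, c.year)

combined.sorted.foreach(println)
// (1,2,3,4,1,2002), (1,2,3,4,2,1997), (2,5,6,7,1,2002), (2,5,6,7,2,1997)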
K-map (Karnaugh map) 8,4,-2,-1 to binary code conversion
I'm taking computer science courses and some digital design knowledge is required, so I'm taking Digital Design 101. The image above represents the conversion process from 8,4,-2,-1 code to binary using a K-map (Karnaugh map). I have no idea why 0001, 0011, 0010, 1100, 1101, and 1110 are marked as 'X'. For 0001, 0011, and 0010, they could be expressed in 8,4,-2,-1 as 0111, 0110, and 0101. And among 1100, 1101, and 1110, the value 1100 can still be expressed in 8,4,-2,-1 form as 1100. The rest cannot be expressed in 8,4,-2,-1, since 1100 is the largest value in 8,4,-2,-1 binary form (I think). Is there something I'm missing? I understand the excess-3 to binary code conversion example provided in my textbook (m10-m15 are marked as 'X' since excess-3 is used to express only 0-9).
According to the definition of BCD, one decimal digit (not one number) is represented by 4 bits. The 4 given inputs can therefore represent only values from the interval 0 to 9. The corresponding, complete truth table looks like this:

decimal |  8 4 -2 -1 | decimal   || BCD
/index  |  A B  C  D | result    || W X Y Z
--------|------------|-----------||-------------
   0    |  0 0  0  0 | 0         || 0 0 0 0 ~ 0
   1    |  0 0  0  1 | -1        || X X X X
   2    |  0 0  1  0 | -2        || X X X X
   3    |  0 0  1  1 | -2-1=-3   || X X X X
   4    |  0 1  0  0 | 4         || 0 1 0 0 ~ 4
   5    |  0 1  0  1 | 4-1=3     || 0 0 1 1 ~ 3
   6    |  0 1  1  0 | 4-2=2     || 0 0 1 0 ~ 2
   7    |  0 1  1  1 | 4-2-1=1   || 0 0 0 1 ~ 1
   8    |  1 0  0  0 | 8         || 1 0 0 0 ~ 8
   9    |  1 0  0  1 | 8-1=7     || 0 1 1 1 ~ 7
  10    |  1 0  1  0 | 8-2=6     || 0 1 1 0 ~ 6
  11    |  1 0  1  1 | 8-2-1=5   || 0 1 0 1 ~ 5
  12    |  1 1  0  0 | 8+4=12    || X X X X
  13    |  1 1  0  1 | 8+4-1=11  || X X X X
  14    |  1 1  1  0 | 8+4-2=10  || X X X X
  15    |  1 1  1  1 | 8+4-2-1=9 || 1 0 0 1 ~ 9

The K-maps then match the truth table by its indexes. Using the K-maps, it can indeed be simplified to these boolean expressions:

W = A·B + A·¬C·¬D
X = ¬B·C + ¬B·D + B·¬C·¬D
Y = ¬C·D + C·¬D
Z = D
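To double-check, here is a small Scala sketch (my own illustration, not from the original answer) that evaluates the derived expressions on every input row that is not a don't-care and compares the result against the expected BCD bits:

// Verify W, X, Y, Z against the truth table; rows outside 0..9 are don't-cares.
object Verify8421 extends App {
  for (i <- 0 until 16) {
    val (a, b, c, d) = ((i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1)
    val value = 8 * a + 4 * b - 2 * c - d          // the 8,4,-2,-1 weighted value
    if (value >= 0 && value <= 9) {                // skip the X (don't-care) rows
      val (nb, nc, nd) = (1 - b, 1 - c, 1 - d)     // negated inputs
      val w = (a & b) | (a & nc & nd)              // W = A·B + A·¬C·¬D
      val x = (nb & c) | (nb & d) | (b & nc & nd)  // X = ¬B·C + ¬B·D + B·¬C·¬D
      val y = (nc & d) | (c & nd)                  // Y = ¬C·D + C·¬D
      val z = d                                    // Z = D
      val bcd = (w << 3) | (x << 2) | (y << 1) | z
      assert(bcd == value, s"mismatch at row $i")
      println(s"$a$b$c$d -> $w$x$y$z (~ $value)")
    }
  }
}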
Scala for-comprehension if filtering too much?
I have the following program (Scala 2.9.2, Java 6):

object Forcomp {
  def main(args: Array[String]): Unit = {
    val xs = List(-1, 0, 1)
    val xss = for (a <- xs; b <- xs if a != 0 && b != 0) yield (a,b)
    println(xss)
  }
}

It produces this output:

List((-1,-1), (-1,1), (1,-1), (1,1))

I would have expected it to only filter out values where a and b are both 0 – not all values where either a or b is 0. I can get the behaviour I want by changing the if-clause to this: if (a,b) != (0,0) – however, should I really have to? Is this a bug or is this intentional behaviour? I, for one, was surprised by this.
The truth table for the filter you have is this:

a!=0 | b!=0 | (a!=0 && b!=0)
-----|------|---------------
  0  |  0   |       0
  0  |  1   |       0
  1  |  0   |       0
  1  |  1   |       1

whereas the behaviour you say you want is:

a!=0 | b!=0 | !(a==0 && b==0)
-----|------|----------------
  0  |  0   |       0
  0  |  1   |       1
  1  |  0   |       1
  1  |  1   |       1
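By De Morgan's law, !(a == 0 && b == 0) is the same as a != 0 || b != 0. Here is a minimal sketch of the corrected comprehension, reusing the xs from the question:

val xs = List(-1, 0, 1)

// Negate the conjunction instead of conjoining the negations:
val xss = for (a <- xs; b <- xs if !(a == 0 && b == 0)) yield (a, b)

println(xss)
// List((-1,-1), (-1,0), (-1,1), (0,-1), (0,1), (1,-1), (1,0), (1,1))
// Only the pair (0,0) is filtered out.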
Difference between correctly / incorrectly classified instances in decision tree and confusion matrix in Weka
I have been using Weka's J48 decision tree to classify frequencies of keywords in RSS feeds into target categories. And I think I may have a problem reconciling the generated decision tree with the number of correctly classified instances reported and in the confusion matrix. For example, one of my .arff files contains the following extracts:

@attribute Keyword_1_nasa_Frequency numeric
@attribute Keyword_2_fish_Frequency numeric
@attribute Keyword_3_kill_Frequency numeric
@attribute Keyword_4_show_Frequency numeric
...
@attribute Keyword_64_fear_Frequency numeric
@attribute RSSFeedCategoryDescription {BFE,FCL,F,M,NCA,SNT,S}

@data
0,0,0,34,0,0,0,0,0,40,0,0,0,0,0,0,0,0,0,0,24,0,0,0,0,13,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,BFE
0,0,0,10,0,0,0,0,0,11,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,BFE
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,BFE
...
20,0,64,19,0,162,0,0,36,72,179,24,24,47,24,40,0,48,0,0,0,97,24,0,48,205,143,62,78,0,0,216,0,36,24,24,0,0,24,0,0,0,0,140,24,0,0,0,0,72,176,0,0,144,48,0,38,0,284,221,72,0,72,0,SNT
...
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,S

And so on: there is a total of 64 keywords (columns) and 570 rows, where each row contains the frequencies of the keywords in one feed for one day. In this case, there are 57 feeds over 10 days, giving a total of 570 records to be classified. Each keyword is prefixed with a surrogate number and suffixed with 'Frequency'.

My use of the decision tree is with default parameters, using 10-fold cross-validation. Weka reports the following:

Correctly Classified Instances         210   36.8421 %
Incorrectly Classified Instances       360   63.1579 %

With the following confusion matrix:

=== Confusion Matrix ===

  a   b   c   d   e   f   g   <-- classified as
 11   0   0   0  39   0   0 |  a = BFE
  0   0   0   0  60   0   0 |  b = FCL
  1   0   5   0  72   0   2 |  c = F
  0   0   1   0  69   0   0 |  d = M
  3   0   0   0 153   0   4 |  e = NCA
  0   0   0   0  90  10   0 |  f = SNT
  0   0   0   0  19   0  31 |  g = S

The tree is as follows:

Keyword_22_health_Frequency <= 0
|   Keyword_7_open_Frequency <= 0
|   |   Keyword_52_libya_Frequency <= 0
|   |   |   Keyword_21_job_Frequency <= 0
|   |   |   |   Keyword_48_pic_Frequency <= 0
|   |   |   |   |   Keyword_63_world_Frequency <= 0
|   |   |   |   |   |   Keyword_26_day_Frequency <= 0: NCA (461.0/343.0)
|   |   |   |   |   |   Keyword_26_day_Frequency > 0: BFE (8.0/3.0)
|   |   |   |   |   Keyword_63_world_Frequency > 0
|   |   |   |   |   |   Keyword_31_gaddafi_Frequency <= 0: S (4.0/1.0)
|   |   |   |   |   |   Keyword_31_gaddafi_Frequency > 0: NCA (3.0)
|   |   |   |   Keyword_48_pic_Frequency > 0: F (7.0)
|   |   |   Keyword_21_job_Frequency > 0: BFE (10.0/1.0)
|   |   Keyword_52_libya_Frequency > 0: NCA (31.0)
|   Keyword_7_open_Frequency > 0
|   |   Keyword_31_gaddafi_Frequency <= 0: S (32.0/1.0)
|   |   Keyword_31_gaddafi_Frequency > 0: NCA (4.0)
Keyword_22_health_Frequency > 0: SNT (10.0)

My question concerns reconciling the matrix to the tree, or vice versa. As far as I understand the results, a rating like (461.0/343.0) indicates that 461 instances have been classified as NCA. But how can that be when the matrix reveals only 153? I am not sure how to interpret this, so any help is welcome. Thanks in advance.
The number in parentheses at each leaf should be read as (total instances classified at this leaf / incorrect classifications at this leaf). In your example, for the first NCA leaf it says there are 461 instances classified as NCA, and of those 461, there were 343 incorrect classifications. So there are 461 - 343 = 118 correctly classified instances at that leaf.

Looking through your decision tree, note that NCA also appears at other leaves. I count 118 + 3 + 31 + 4 = 156 correctly classified instances out of 461 + 3 + 31 + 4 = 499 total classifications of NCA. Your confusion matrix shows 153 correct classifications of NCA out of 39 + 60 + 72 + 69 + 153 + 90 + 19 = 502 total classifications of NCA. So there is a slight difference between the tree (156/499) and your confusion matrix (153/502).

Note that if you are running Weka from the command line, it outputs a tree and a confusion matrix for testing on all the training data and also for testing with cross-validation, i.e. two pairs of matrix and tree. Be careful that you are looking at the right matrix for the right tree; you may have mixed them up.
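If it helps, the leaf arithmetic above can be written out mechanically. A tiny Scala sketch (mine, not Weka output), where each leaf is a (total, errors) pair taken from the parenthesised labels, and a leaf printed as NCA (3.0) has zero errors:

// NCA leaves from the tree: (total instances, misclassified at that leaf).
val ncaLeaves = List((461.0, 343.0), (3.0, 0.0), (31.0, 0.0), (4.0, 0.0))

val total   = ncaLeaves.map(_._1).sum                    // 499.0
val correct = ncaLeaves.map { case (t, e) => t - e }.sum // 156.0

println(correct + " correct out of " + total + " classified as NCA")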