Understanding the | operator in Scala [duplicate]
This question already has answers here:
What are bitwise operators?
(9 answers)
Closed 3 years ago.
I came across this code:
scala> val a = 0 | 1
a: Int = 1
scala> val a = 0 | 1 | 2
a: Int = 3
scala> val a = 0 | 1 | 2 | 3
a: Int = 3
scala> val a = 0 | 1 | 2 | 3 | 4
a: Int = 7
The only result I expected from the | operator is the result of the first command. It seems to behave like a logical OR, yet in the second command it also appears to add the values together. Could someone explain how the | operator works when its operands are integers?
| is the bitwise OR operator:
val a = 0 | 1
a: Int = 1
00 0
01 1
-----
01 1
val a = 0 | 1 | 2 | 3
a: Int = 3
00 0
01 1
10 2
11 3
------
11 3
val a = 0 | 1 | 2 | 3 | 4
a: Int = 7
000 0
001 1
010 2
011 3
100 4
-------
111 7
It's just a logical OR between corresponding bits of the binary representations of the integer values (1 or 1 = 1, 1 or 0 = 1, 0 or 1 = 1, 0 or 0 = 0):
val a = 0 | 1
//0 or 1 = 1 (1 - decimal number)
val a = 0 | 1 | 2
//00 or 01 or 10 = 11 (3 - decimal number)
val a = 0 | 1 | 2 | 3
//00 or 01 or 10 or 11 = 11 (3 - decimal number)
val a = 0 | 1 | 2 | 3 | 4
//000 or 001 or 010 or 011 or 100 = 111 (7 - decimal number)
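A quick way to see this for yourself in the REPL (my own illustration, using the JDK's Integer.toBinaryString):

// Fold the operands together with | and print everything in binary.
val values = List(0, 1, 2, 3, 4)
values.foreach(v => println(v + " = " + Integer.toBinaryString(v)))

val combined = values.reduce(_ | _) // same as 0 | 1 | 2 | 3 | 4
println(combined + " = " + Integer.toBinaryString(combined)) // prints: 7 = 111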
Related
summarise (avg) table (keyed) for each row
Given a keyed table, e.g.:

q)\S 7 / seed random numbers for reproducibility
q)v:flip (neg[d 0]?`1)!#[;prd[d]?12] d:4 6 / 4 cols 6 rows
q)show kt:([]letter:d[1]#.Q.an)!v
letter| c  g b  e
------| ----------
a     | 11 0 3  9
b     | 11 8 10 0
c     | 7  2 2  3
d     | 8  4 9  6
e     | 0  0 5  0
f     | 1  0 0  11

How to calculate an average for each row --- e.g. (c+g+b+e)%4 --- for any number of columns?
Following on from your own solution, note that you have to be a little careful with null handling. Your approach won't ignore nulls in the way that avg normally would.

q).[`kt;("a";`g);:;0N];
q)update av:avg flip value kt from kt
letter| c  g b  e  av
------| ---------------
a     | 11   3  9
b     | 11 8 10 0  7.25
c     | 7  2 2  3  3.5
d     | 8  4 9  6  6.75
e     | 0  0 5  0  1.25
f     | 1  0 0  11 3

To make it ignore nulls you have to avg each row rather than averaging the flip.

q)update av:avg each value kt from kt
letter| c  g b  e  av
------| -------------------
a     | 11   3  9  7.666667
b     | 11 8 10 0  7.25
c     | 7  2 2  3  3.5
d     | 8  4 9  6  6.75
e     | 0  0 5  0  1.25
f     | 1  0 0  11 3
Solution 1: q-sql

q)update av:avg flip value kt from kt
letter| c  g b  e  av
------| ---------------
a     | 11 0 3  9  5.75
b     | 11 8 10 0  7.25
c     | 7  2 2  3  3.5
d     | 8  4 9  6  6.75
e     | 0  0 5  0  1.25
f     | 1  0 0  11 3

Solution 2: functional q-sql

tl;dr:

q)![kt;();0b;](1#`av)!enlist(avg;)enlist,cols[`kt]except cols key`kt
letter| c  g b  e  av
------| ---------------
a     | 11 0 3  9  5.75
b     | 11 8 10 0  7.25
c     | 7  2 2  3  3.5
d     | 8  4 9  6  6.75
e     | 0  0 5  0  1.25
f     | 1  0 0  11 3

Let's start with a look at how the parse tree of a non-general solution would look:

q)parse"update av:avg (c;g;b;e) from kt"
!
`kt
()
0b
(,`av)!,(avg;(enlist;`c;`g;`b;`e))

(Note that q is a wrapper implemented in k, so the , prefix operator in the above expression is the same as the enlist keyword in q.)

So all of the below are equivalent (verify with ~). Relying on projection, (x;y)~(x;)y, we can further improve readability by reducing the distance between parens:

q)k)(!;`kt;();0b;(,`av)!,(avg;(enlist;`c;`g;`b;`e)))
q)(!;`kt;();0b;(enlist`av)!enlist(avg;(enlist;`c;`g;`b;`e)))
q)(!;`kt;();0b;(1#`av)!enlist(avg;(enlist;`c;`g;`b;`e)))
q)(!;`kt;();0b;)(1#`av)!enlist(avg;)(enlist;`c;`g;`b;`e)

Let's evaluate the parse tree to check:

q)eval(!;`kt;();0b;)(1#`av)!enlist(avg;)(enlist;`c;`g;`b;`e)
letter| c  g b  e  av
------| ---------------
a     | 11 0 3  9  5.75
b     | 11 8 10 0  7.25
c     | 7  2 2  3  3.5
d     | 8  4 9  6  6.75
e     | 0  0 5  0  1.25
f     | 1  0 0  11 3

(enlist;`c;`g;`b;`e) in the general case is:

q)enlist,cols[`kt]except cols key`kt
enlist
`c
`g
`b
`e

So let's plug that in and check:

q)eval(!;`kt;();0b;(1#`av)!enlist(avg;)enlist,cols[`kt]except cols key`kt)
letter| c  g b  e  av
------| ---------------
a     | 11 0 3  9  5.75
b     | 11 8 10 0  7.25
c     | 7  2 2  3  3.5
d     | 8  4 9  6  6.75
e     | 0  0 5  0  1.25
f     | 1  0 0  11 3

Also:

q)![`kt;();0b;(1#`av)!enlist(avg;)enlist,cols[`kt]except cols key`kt]
q)![ kt;();0b;](1#`av)!enlist(avg;)enlist,cols[`kt]except cols key`kt
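For readers more comfortable with Scala than q, here is a rough analogue of the null-handling point above (my own sketch, not part of the q answer): model nullable cells as Option[Double] and average only the values that are present, the way q's avg each ignores nulls.

// Keyed table as a Map from key to row; None plays the role of a q null.
val kt: Map[String, Vector[Option[Double]]] = Map(
  "a" -> Vector(Some(11.0), None, Some(3.0), Some(9.0)), // null in column g
  "b" -> Vector(Some(11.0), Some(8.0), Some(10.0), Some(0.0))
)

val av = kt.map { case (letter, row) =>
  val present = row.flatten            // drop the "nulls"
  letter -> present.sum / present.size // average only the remaining values
}

println(av) // Map(a -> 7.666..., b -> 7.25), matching `avg each` above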
Stata merge with multiple match variables
I am having difficulty combining datasets for a project. Our primary dataset is organized by individual judges. It is an attribute dataset.

judge
j | x | y | z
--|---|---|--
1 | 2 | 3 | 4
2 | 5 | 6 | 7

The second dataset is a case database. Each observation is a case, and judges can appear in any one of three variables.

case
case | j1 | j2 | j3 | year
-----|----|----|----|-----
1    | 1  | 2  | 3  | 2002
2    | 2  | 3  | 1  | 1997

We would like to merge the case database into the attribute database, matching by judge. So, for each case in which a judge appears in j1, j2, or j3, an observation for that case would be added, creating a dataset that looks like the one below.

combined
j | x | y | z | case | year
--|---|---|---|------|-----
1 | 2 | 3 | 4 | 1    | 2002
1 | 2 | 3 | 4 | 2    | 1997
2 | 5 | 6 | 7 | 1    | 2002
2 | 5 | 6 | 7 | 2    | 1997

My best guess is to use

rename j1 j
merge 1:m j using case
rename j j1
rename j2 j
merge 1:m j using case

However, I am unsure that this will work, especially since the merging dataset has three possible variables in which the j identifier can occur.
Your examples are clear, but even better would be to present them as code that would not require edits to remove the scaffolding. See dataex from SSC (ssc inst dataex).

It's a case of the missing reshape, I think.

clear
input j x y z
1 2 3 4
2 5 6 7
end
save judge

clear
input case j1 j2 j3 year
1 1 2 3 2002
2 2 3 1 1997
end

reshape long j , i(case) j(which)
merge m:1 j using judge
list

     +-------------------------------------------------------+
     | case   which   j   year   x   y   z           _merge  |
     |-------------------------------------------------------|
  1. |    1       1   1   2002   2   3   4      matched (3)  |
  2. |    2       3   1   1997   2   3   4      matched (3)  |
  3. |    2       1   2   1997   5   6   7      matched (3)  |
  4. |    1       2   2   2002   5   6   7      matched (3)  |
  5. |    2       2   3   1997   .   .   .  master only (1)  |
     |-------------------------------------------------------|
  6. |    1       3   3   2002   .   .   .  master only (1)  |
     +-------------------------------------------------------+

drop if _merge < 3
list

     +---------------------------------------------------+
     | case   which   j   year   x   y   z       _merge  |
     |---------------------------------------------------|
  1. |    1       1   1   2002   2   3   4   matched (3) |
  2. |    2       3   1   1997   2   3   4   matched (3) |
  3. |    2       1   2   1997   5   6   7   matched (3) |
  4. |    1       2   2   2002   5   6   7   matched (3) |
     +---------------------------------------------------+
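The reshape-then-merge idea is language-agnostic; here is a rough Scala sketch of the same logic (my illustration, with hypothetical Judge and Case types, not part of the Stata answer), going "long" over the three judge slots and then inner-joining on j:

case class Judge(j: Int, x: Int, y: Int, z: Int)
case class Case(id: Int, js: Seq[Int], year: Int)

val judges = Map(1 -> Judge(1, 2, 3, 4), 2 -> Judge(2, 5, 6, 7))
val cases  = Seq(Case(1, Seq(1, 2, 3), 2002), Case(2, Seq(2, 3, 1), 1997))

// "reshape long": one row per (case, judge slot); then keep matches only.
val combined = for {
  c  <- cases
  j  <- c.js                 // j1, j2, j3 flattened into long form
  jd <- judges.get(j).toSeq  // inner join; drops "master only" rows (judge 3)
} yield (jd.j, jd.x, jd.y, jd.z, c.id, c.year)

combined.sorted.foreach(println)
// (1,2,3,4,1,2002), (1,2,3,4,2,1997), (2,5,6,7,1,2002), (2,5,6,7,2,1997)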
K-map (Karnaugh map) 8,4,-2,-1 to binary code conversion
I'm taking computer science courses and some digital design knowledge is required, so I'm taking Digital Design 101. The image above represents the conversion process from 8,4,-2,-1 code to binary using a K-map (Karnaugh map). I have no idea why 0001, 0011, 0010, 1100, 1101, and 1110 are marked as 'X'. For 0001, 0011, and 0010, they could be expressed in 8,4,-2,-1 as 0111, 0110, and 0101. And among 1100, 1101, and 1110, the value 1100 can still be expressed in 8,4,-2,-1 form as 1100. The rest cannot be expressed in 8,4,-2,-1, since 1100 is the largest value in 8,4,-2,-1 binary form (I think). Is there something I'm missing? I understand the excess-3 to binary code conversion example provided in my textbook (m10-m15 are marked as 'X' since excess-3 is used to express only 0-9).
According to the definition of BCD, one decimal digit (not one number) is represented by 4 bits. The 4 given inputs can therefore represent only values from the interval 0 to 9. The corresponding, complete truth table looks like this:

decimal |  8 4 -2 -1 | decimal   || BCD
/index  |  A B  C  D | result    || W X Y Z
--------|------------|-----------||-------------
   0    |  0 0  0  0 | 0         || 0 0 0 0 ~ 0
   1    |  0 0  0  1 | -1        || X X X X
   2    |  0 0  1  0 | -2        || X X X X
   3    |  0 0  1  1 | -2-1=-3   || X X X X
   4    |  0 1  0  0 | 4         || 0 1 0 0 ~ 4
   5    |  0 1  0  1 | 4-1=3     || 0 0 1 1 ~ 3
   6    |  0 1  1  0 | 4-2=2     || 0 0 1 0 ~ 2
   7    |  0 1  1  1 | 4-2-1=1   || 0 0 0 1 ~ 1
   8    |  1 0  0  0 | 8         || 1 0 0 0 ~ 8
   9    |  1 0  0  1 | 8-1=7     || 0 1 1 1 ~ 7
  10    |  1 0  1  0 | 8-2=6     || 0 1 1 0 ~ 6
  11    |  1 0  1  1 | 8-2-1=5   || 0 1 0 1 ~ 5
  12    |  1 1  0  0 | 8+4=12    || X X X X
  13    |  1 1  0  1 | 8+4-1=11  || X X X X
  14    |  1 1  1  0 | 8+4-2=10  || X X X X
  15    |  1 1  1  1 | 8+4-2-1=9 || 1 0 0 1 ~ 9

The K-maps then match the truth table by its indexes. Using the K-maps, it can indeed be simplified to these boolean expressions:

W = A·B + A·¬C·¬D
X = ¬B·C + ¬B·D + B·¬C·¬D
Y = ¬C·D + C·¬D
Z = D
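To double-check, here is a small Scala sketch (my own illustration, not from the original answer) that evaluates the derived expressions on every input row that is not a don't-care and compares the result against the expected BCD bits:

// Verify W, X, Y, Z against the truth table; rows outside 0..9 are don't-cares.
object Verify8421 extends App {
  for (i <- 0 until 16) {
    val (a, b, c, d) = ((i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1)
    val value = 8 * a + 4 * b - 2 * c - d          // the 8,4,-2,-1 weighted value
    if (value >= 0 && value <= 9) {                // skip the X (don't-care) rows
      val (nb, nc, nd) = (1 - b, 1 - c, 1 - d)     // negated inputs
      val w = (a & b) | (a & nc & nd)              // W = A·B + A·¬C·¬D
      val x = (nb & c) | (nb & d) | (b & nc & nd)  // X = ¬B·C + ¬B·D + B·¬C·¬D
      val y = (nc & d) | (c & nd)                  // Y = ¬C·D + C·¬D
      val z = d                                    // Z = D
      val bcd = (w << 3) | (x << 2) | (y << 1) | z
      assert(bcd == value, s"mismatch at row $i")
      println(s"$a$b$c$d -> $w$x$y$z (~ $value)")
    }
  }
}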
Scala for-comprehension if filtering too much?
I have the following program (Scala 2.9.2, Java 6):

object Forcomp {
  def main(args: Array[String]): Unit = {
    val xs = List(-1, 0, 1)
    val xss = for (a <- xs; b <- xs if a != 0 && b != 0) yield (a,b)
    println(xss)
  }
}

It produces this output:

List((-1,-1), (-1,1), (1,-1), (1,1))

I would have expected it to only filter out values where a and b are both 0 – not all values where either a or b is 0. I can get the behaviour I want by changing the if-clause to this: if (a,b) != (0,0) – however, should I really have to? Is this a bug or is this intentional behaviour? I, for one, was surprised by this.
The truth table for the filter you have is this:

a!=0 | b!=0 | (a!=0 && b!=0)
-----|------|---------------
  0  |  0   |       0
  0  |  1   |       0
  1  |  0   |       0
  1  |  1   |       1

whereas the behaviour you say you want is:

a!=0 | b!=0 | !(a==0 && b==0)
-----|------|----------------
  0  |  0   |       0
  0  |  1   |       1
  1  |  0   |       1
  1  |  1   |       1
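By De Morgan's law, !(a == 0 && b == 0) is the same as a != 0 || b != 0. Here is a minimal sketch of the corrected comprehension, reusing the xs from the question:

val xs = List(-1, 0, 1)

// Negate the conjunction instead of conjoining the negations:
val xss = for (a <- xs; b <- xs if !(a == 0 && b == 0)) yield (a, b)

println(xss)
// List((-1,-1), (-1,0), (-1,1), (0,-1), (0,1), (1,-1), (1,0), (1,1))
// Only the pair (0,0) is filtered out.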
Difference between correctly / incorrectly classified instances in decision tree and confusion matrix in Weka
I have been using Weka's J48 decision tree to classify frequencies of keywords in RSS feeds into target categories. And I think I may have a problem reconciling the generated decision tree with the number of correctly classified instances reported and in the confusion matrix. For example, one of my .arff files contains the following extracts:

@attribute Keyword_1_nasa_Frequency numeric
@attribute Keyword_2_fish_Frequency numeric
@attribute Keyword_3_kill_Frequency numeric
@attribute Keyword_4_show_Frequency numeric
...
@attribute Keyword_64_fear_Frequency numeric
@attribute RSSFeedCategoryDescription {BFE,FCL,F,M,NCA,SNT,S}

@data
0,0,0,34,0,0,0,0,0,40,0,0,0,0,0,0,0,0,0,0,24,0,0,0,0,13,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,BFE
0,0,0,10,0,0,0,0,0,11,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,BFE
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,BFE
...
20,0,64,19,0,162,0,0,36,72,179,24,24,47,24,40,0,48,0,0,0,97,24,0,48,205,143,62,78,0,0,216,0,36,24,24,0,0,24,0,0,0,0,140,24,0,0,0,0,72,176,0,0,144,48,0,38,0,284,221,72,0,72,0,SNT
...
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,S

And so on: there is a total of 64 keywords (columns) and 570 rows, where each row contains the frequencies of the keywords in one feed for one day. In this case, there are 57 feeds over 10 days, giving a total of 570 records to be classified. Each keyword is prefixed with a surrogate number and suffixed with 'Frequency'.

My use of the decision tree is with default parameters, using 10-fold cross-validation. Weka reports the following:

Correctly Classified Instances         210   36.8421 %
Incorrectly Classified Instances       360   63.1579 %

With the following confusion matrix:

=== Confusion Matrix ===

  a   b   c   d   e   f   g   <-- classified as
 11   0   0   0  39   0   0 |  a = BFE
  0   0   0   0  60   0   0 |  b = FCL
  1   0   5   0  72   0   2 |  c = F
  0   0   1   0  69   0   0 |  d = M
  3   0   0   0 153   0   4 |  e = NCA
  0   0   0   0  90  10   0 |  f = SNT
  0   0   0   0  19   0  31 |  g = S

The tree is as follows:

Keyword_22_health_Frequency <= 0
|   Keyword_7_open_Frequency <= 0
|   |   Keyword_52_libya_Frequency <= 0
|   |   |   Keyword_21_job_Frequency <= 0
|   |   |   |   Keyword_48_pic_Frequency <= 0
|   |   |   |   |   Keyword_63_world_Frequency <= 0
|   |   |   |   |   |   Keyword_26_day_Frequency <= 0: NCA (461.0/343.0)
|   |   |   |   |   |   Keyword_26_day_Frequency > 0: BFE (8.0/3.0)
|   |   |   |   |   Keyword_63_world_Frequency > 0
|   |   |   |   |   |   Keyword_31_gaddafi_Frequency <= 0: S (4.0/1.0)
|   |   |   |   |   |   Keyword_31_gaddafi_Frequency > 0: NCA (3.0)
|   |   |   |   Keyword_48_pic_Frequency > 0: F (7.0)
|   |   |   Keyword_21_job_Frequency > 0: BFE (10.0/1.0)
|   |   Keyword_52_libya_Frequency > 0: NCA (31.0)
|   Keyword_7_open_Frequency > 0
|   |   Keyword_31_gaddafi_Frequency <= 0: S (32.0/1.0)
|   |   Keyword_31_gaddafi_Frequency > 0: NCA (4.0)
Keyword_22_health_Frequency > 0: SNT (10.0)

My question concerns reconciling the matrix to the tree, or vice versa. As far as I understand the results, a rating like (461.0/343.0) indicates that 461 instances have been classified as NCA. But how can that be when the matrix reveals only 153? I am not sure how to interpret this, so any help is welcome. Thanks in advance.
The number in parentheses at each leaf should be read as (total instances classified at this leaf / incorrect classifications at this leaf). In your example, for the first NCA leaf it says there are 461 instances classified as NCA, and of those 461, there were 343 incorrect classifications. So there are 461 - 343 = 118 correctly classified instances at that leaf.

Looking through your decision tree, note that NCA also appears at other leaves. I count 118 + 3 + 31 + 4 = 156 correctly classified instances out of 461 + 3 + 31 + 4 = 499 total classifications of NCA. Your confusion matrix shows 153 correct classifications of NCA out of 39 + 60 + 72 + 69 + 153 + 90 + 19 = 502 total classifications of NCA. So there is a slight difference between the tree (156/499) and your confusion matrix (153/502).

Note that if you are running Weka from the command line, it outputs a tree and a confusion matrix for testing on all the training data and also for testing with cross-validation, i.e. two pairs of matrix and tree. Be careful that you are looking at the right matrix for the right tree; you may have mixed them up.
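If it helps, the leaf arithmetic above can be written out mechanically. A tiny Scala sketch (mine, not Weka output), where each leaf is a (total, errors) pair taken from the parenthesised labels, and a leaf printed as NCA (3.0) has zero errors:

// NCA leaves from the tree: (total instances, misclassified at that leaf).
val ncaLeaves = List((461.0, 343.0), (3.0, 0.0), (31.0, 0.0), (4.0, 0.0))

val total   = ncaLeaves.map(_._1).sum                    // 499.0
val correct = ncaLeaves.map { case (t, e) => t - e }.sum // 156.0

println(correct + " correct out of " + total + " classified as NCA")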