XOR Majority Algebraic Logic - boolean

How do you implement a Majority function with XOR and AND only? How do the authors of this paper get to the equation that they present below?

A Majority Function with three inputs can be written as CNF (product of sums)
(a or b) and (a or c) and (b or c)
or as DNF (sum of products)
ab or ac or bc
Using AND and XOR, you can write
maj(a,b,c) = ab xor bc xor ac
A truth table is probably the easiest way to check this. A three-input XOR is true if exactly one input is true or if all three are.
          ab
      00  01  11  10
    +---+---+---+---+
  0 | 0 | 0 | 1 | 0 |
c   +---+---+---+---+
  1 | 0 | 1 | 1 | 1 |
    +---+---+---+---+
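The reason OR can be swapped for XOR here is that the three products ab, bc and ac are never true in pairs: if two of them are true then all three inputs are 1, so the third product is true as well. The number of true products is therefore 0, 1 or 3, and for those counts OR and XOR agree. As a quick sanity check, here is a small Python sketch (the function names are just for illustration) that compares both forms over all eight input rows:

```python
from itertools import product

def maj_dnf(a, b, c):
    # Sum-of-products form: ab or ac or bc
    return (a & b) | (a & c) | (b & c)

def maj_xor(a, b, c):
    # AND/XOR form: ab xor bc xor ac
    return (a & b) ^ (b & c) ^ (a & c)

rows = list(product((0, 1), repeat=3))
agree = all(maj_dnf(a, b, c) == maj_xor(a, b, c) for a, b, c in rows)
print(agree)  # True
```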

kdb union join (with plus join)

I have been stuck on this for a while now and cannot come up with a solution. Any help would be appreciated.
I have 2 tables like
q)x
a b c d
--------
1 x 10 1
2 y 20 1
3 z 30 1
q)y
a b| c d
---| ----
1 x| 1 10
3 h| 2 20
I would like to sum the common columns for matching keys and append the rows that are new. The expected result is
a b c d
--------
1 x 11 11
2 y 20 1
3 z 30 1
3 h 2 20
pj appears to update only the (1,x) row and does not insert the new (3,h) row. I assume there has to be a way to do some sort of union+plus join in kdb.
You can take advantage of the plus (+) operator here by simply keying x and adding the table y to get the desired table:
q)(2!x)+y
a b| c d
---| -----
1 x| 11 11
2 y| 20 1
3 z| 30 1
3 h| 2 20
The same "plus if there's a matching key, insert if not" behaviour works for dictionaries too:
q)(`a`b!1 2)+`a`c!10 30
a| 11
b| 2
c| 30
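For readers who don't use q, here is a rough Python analogue of that behaviour. It is only a sketch of the semantics, not how kdb implements it; the helper names plus_join and plus_join_rows are made up for this illustration, and the keyed tables are modelled as dicts from (a, b) to (c, d):

```python
def plus_join(left, right):
    """Add values on matching keys; insert keys that only appear in right."""
    result = dict(left)
    for key, value in right.items():
        result[key] = result.get(key, 0) + value
    return result

def plus_join_rows(left, right):
    """Same idea for keyed rows: add the value columns element by element."""
    result = dict(left)
    for key, row in right.items():
        if key in result:
            result[key] = tuple(p + q for p, q in zip(result[key], row))
        else:
            result[key] = row
    return result

print(plus_join({"a": 1, "b": 2}, {"a": 10, "c": 30}))
# {'a': 11, 'b': 2, 'c': 30}

x = {(1, "x"): (10, 1), (2, "y"): (20, 1), (3, "z"): (30, 1)}
y = {(1, "x"): (1, 10), (3, "h"): (2, 20)}
joined = plus_join_rows(x, y)
print(joined)
# {(1, 'x'): (11, 11), (2, 'y'): (20, 1), (3, 'z'): (30, 1), (3, 'h'): (2, 20)}
```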
got it :)
q) (x pj y), 0!select from y where not ([]a;b) in key 2!x
a b c d
--------
1 x 11 11
2 y 20 1
3 z 30 1
3 h 2 20
Always open for a better implementation :D I am sure there is one.

Combining logical operators

I have an expression of the form
A or A and B
Can we represent it more concisely by representing the expression some other way?
As stated, the expression is slightly ambiguous. It can be interpreted in two ways:
(A or A) and B
Obviously A or A is logically equivalent to A, so in this case the entire statement is simply equivalent to A and B.
More likely, this is intended to be read as
A or (A and B)
Let's write a truth table for this
A B | A or (A and B) | result
-----------------------------
0 0 | 0 or (0 and 0) | 0
0 1 | 0 or (0 and 1) | 0
1 0 | 1 or (1 and 0) | 1
1 1 | 1 or (1 and 1) | 1
Now you can pretty clearly see, in this case the statement is equivalent to A alone.
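If you'd rather check this mechanically than by eye, a short Python sketch can enumerate the same four rows for both readings (the variable names are only for illustration):

```python
from itertools import product

rows = list(product((False, True), repeat=2))

# Reading 1: (A or A) and B collapses to A and B
reading_one_ok = all(((a or a) and b) == (a and b) for a, b in rows)

# Reading 2: A or (A and B) collapses to A (the absorption law)
reading_two_ok = all((a or (a and b)) == a for a, b in rows)

print(reading_one_ok, reading_two_ok)  # True True
```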

Difference between correctly / incorrectly classified instances in decision tree and confusion matrix in Weka

I have been using Weka’s J48 decision tree to classify frequencies of keywords
in RSS feeds into target categories. And I think I may have a problem
reconciling the generated decision tree with the number of correctly classified
instances reported and in the confusion matrix.
For example, one of my .arff files contains the following data extracts:
@attribute Keyword_1_nasa_Frequency numeric
@attribute Keyword_2_fish_Frequency numeric
@attribute Keyword_3_kill_Frequency numeric
@attribute Keyword_4_show_Frequency numeric
...
@attribute Keyword_64_fear_Frequency numeric
@attribute RSSFeedCategoryDescription {BFE,FCL,F,M, NCA, SNT,S}
@data
0,0,0,34,0,0,0,0,0,40,0,0,0,0,0,0,0,0,0,0,24,0,0,0,0,13,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,BFE
0,0,0,10,0,0,0,0,0,11,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,BFE
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,BFE
...
20,0,64,19,0,162,0,0,36,72,179,24,24,47,24,40,0,48,0,0,0,97,24,0,48,205,143,62,78,
0,0,216,0,36,24,24,0,0,24,0,0,0,0,140,24,0,0,0,0,72,176,0,0,144,48,0,38,0,284,
221,72,0,72,0,SNT
...
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,S
And so on: there’s a total of 64 keywords (columns) and 570 rows where each one contains the frequency of a keyword in a feed for a day. In this case, there are 57 feeds for
10 days giving a total of 570 records to be classified. Each keyword is prefixed
with a surrogate number and postfixed with ‘Frequency’.
I run the decision tree with default parameters, using 10-fold cross-validation.
Weka reports the following:
Correctly Classified Instances 210 36.8421 %
Incorrectly Classified Instances 360 63.1579 %
With the following confusion matrix:
=== Confusion Matrix ===
a b c d e f g <-- classified as
11 0 0 0 39 0 0 | a = BFE
0 0 0 0 60 0 0 | b = FCL
1 0 5 0 72 0 2 | c = F
0 0 1 0 69 0 0 | d = M
3 0 0 0 153 0 4 | e = NCA
0 0 0 0 90 10 0 | f = SNT
0 0 0 0 19 0 31 | g = S
The tree is as follows:
Keyword_22_health_Frequency <= 0
| Keyword_7_open_Frequency <= 0
| | Keyword_52_libya_Frequency <= 0
| | | Keyword_21_job_Frequency <= 0
| | | | Keyword_48_pic_Frequency <= 0
| | | | | Keyword_63_world_Frequency <= 0
| | | | | | Keyword_26_day_Frequency <= 0: NCA (461.0/343.0)
| | | | | | Keyword_26_day_Frequency > 0: BFE (8.0/3.0)
| | | | | Keyword_63_world_Frequency > 0
| | | | | | Keyword_31_gaddafi_Frequency <= 0: S (4.0/1.0)
| | | | | | Keyword_31_gaddafi_Frequency > 0: NCA (3.0)
| | | | Keyword_48_pic_Frequency > 0: F (7.0)
| | | Keyword_21_job_Frequency > 0: BFE (10.0/1.0)
| | Keyword_52_libya_Frequency > 0: NCA (31.0)
| Keyword_7_open_Frequency > 0
| | Keyword_31_gaddafi_Frequency <= 0: S (32.0/1.0)
| | Keyword_31_gaddafi_Frequency > 0: NCA (4.0)
Keyword_22_health_Frequency > 0: SNT (10.0)
My question concerns reconciling the matrix to the tree or vice versa. As far as
I understand the results, a rating like (461.0/343.0) indicates that 461 instances have been classified as NCA. But how can that be when the matrix reveals only 153? I am
not sure how to interpret this so any help is welcome.
Thanks in advance.
The number in parentheses at each leaf should be read as (number of total instances of this classification at this leaf / number of incorrect classifications at this leaf).
In your example, the first NCA leaf says that 461 instances reached that leaf and were classified as NCA, and of those 461, 343 were classified incorrectly. So there are 461 - 343 = 118 correctly classified instances at that leaf.
Looking through your decision tree, note that NCA is also at other leaves. I count 118 + 3 + 31 + 4 = 156 correctly classified instances out of 461 + 3 + 31 + 4 = 499 total classifications of NCA.
Your confusion matrix shows 153 correct classifications of NCA out of 39 + 60 + 72 + 69 + 153 + 90 + 19 = 502 total classifications of NCA.
So there is a slight difference between the tree (156/499) and your confusion matrix (153/502).
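To make the bookkeeping easy to repeat, here is the same arithmetic as a small Python sketch. The numbers are copied from the tree leaves and the confusion matrix in the question; the variable names are mine.

```python
# (total, incorrect) for every leaf that predicts NCA in the printed tree
nca_leaves = [(461, 343), (3, 0), (31, 0), (4, 0)]
tree_total = sum(total for total, _ in nca_leaves)
tree_correct = sum(total - wrong for total, wrong in nca_leaves)

# Confusion matrix rows in the order BFE, FCL, F, M, NCA, SNT, S
confusion = [
    [11, 0, 0, 0, 39, 0, 0],
    [0, 0, 0, 0, 60, 0, 0],
    [1, 0, 5, 0, 72, 0, 2],
    [0, 0, 1, 0, 69, 0, 0],
    [3, 0, 0, 0, 153, 0, 4],
    [0, 0, 0, 0, 90, 10, 0],
    [0, 0, 0, 0, 19, 0, 31],
]
NCA = 4  # column "e" in the matrix
matrix_total = sum(row[NCA] for row in confusion)
matrix_correct = confusion[NCA][NCA]
overall_correct = sum(confusion[i][i] for i in range(len(confusion)))

print(tree_correct, tree_total)      # 156 499
print(matrix_correct, matrix_total)  # 153 502
print(overall_correct)               # 210
```

The diagonal sum of 210 matches the "Correctly Classified Instances" line that Weka printed, so the matrix and the summary agree with each other; it is only the tree counts that come from a different evaluation.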
Note that if you are running Weka from the command line, it outputs a tree and a confusion matrix for testing on all the training data, and another pair for testing with cross-validation. Be careful that you are comparing the right matrix with the right tree; you may have mixed them up.

MATLAB: Identify if a value is repeated sequentially N times in a vector

I am trying to identify if a value is repeated sequentially in a vector N times. The challenge I am facing is that it could be repeated sequentially N times several times within the vector. The purpose is to determine how many times in a row certain values fall above the mean value. For example:
>> return_deltas
return_deltas =
7.49828129642663
11.5098198572327
15.1776644881294
11.256677995536
6.22315734182976
8.75582103474613
21.0488849115947
26.132605745393
27.0507649089989
...
(I only printed a few values for example but the vector is large.)
>> mean(return_deltas)
ans =
10.50007490258002
>> sum(return_deltas > mean(return_deltas))
ans =
50
So there are 50 instances of a value in return_deltas being greater than the mean of return_deltas.
I need to identify the number of times, sequentially, the value in return_deltas is greater than its mean 3 times in a row. In other words, if the values in return_deltas are greater than its mean 3 times in a row, that is one instance.
For example:
---------------------------------------------------------------------
| `return_delta` value | mean | greater or less | sequence |
|--------------------------------------------------------------------
| 7.49828129642663 |10.500074902 | LT | 1 |
| 11.5098198572327 |10.500074902 | GT | 1 |
| 15.1776644881294 |10.500074902 | GT | 2 |
| 11.256677995536 |10.500074902 | GT | 3 * |
| 6.22315734182976 |10.500074902 | LT | 1 |
| 8.75582103474613 |10.500074902 | LT | 2 |
| 21.0488849115947 |10.500074902 | GT | 1 |
| 26.132605745393 |10.500074902 | GT | 2 |
| 27.0507649089989 |10.500074902 | GT | 3 * |
---------------------------------------------------------------------
The star represents a successful sequence of 3 in a row. The result of this set would be two because there were two occasions where the value was greater than the mean 3 times in a row.
What I am thinking is to create a new vector:
>> a = return_deltas > mean(return_deltas)
which contains ones wherever a value in return_deltas is greater than the mean, and then use it to count how many times the value in return_deltas is greater than its mean 3 times in a row. I would like to do this with a built-in function (if there is one, I have not discovered it), or at least avoid loops.
Any thoughts on how I might approach?
With a little work, this snippet finds the starting index of every run of numbers:
[0 find(diff(v) ~= 0)] + 1
An Example:
>> v = [3 3 3 4 4 4 1 2 9 9 9 9 9]; % vector of integers
>> run_starts = [0 find(diff(v) ~= 0)] + 1 % for floating-point data, abs(diff(v)) > EPSILON may be a better test
run_starts =
1 4 7 8 9
To find the length of each run
>> run_lengths = [diff(run_starts), length(v) - run_starts(end) + 1]
These variables then make it easy to query which runs reach a given length:
>> find(run_lengths >= 4)
ans =
5
>> find(run_lengths >= 2)
ans =
1 2 5
This tells us that the only run of at least four integers in a row was run #5.
However, there were three runs that were at least two integers in a row, specifically runs #1, #2, and #5.
You can reference where each run starts from the run_starts variable.
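Applied to the original question, the same run-length idea would be run on the logical vector (value > mean) rather than on the raw values. Below is a rough sketch of that in Python rather than MATLAB, using itertools.groupby to form the runs. The function name is made up, and it counts maximal runs, so a streak of six in a row counts once; that is my assumption about what is wanted. The threshold is passed in explicitly because the mean in the question comes from the full vector, not just the nine printed values.

```python
from itertools import groupby

def count_runs_above(values, threshold, n):
    """Count maximal runs of at least n consecutive values above threshold."""
    flags = [v > threshold for v in values]
    return sum(
        1
        for is_above, run in groupby(flags)
        if is_above and len(list(run)) >= n
    )

# The nine values printed in the question, with the mean of the full vector
return_deltas = [
    7.49828129642663, 11.5098198572327, 15.1776644881294,
    11.256677995536, 6.22315734182976, 8.75582103474613,
    21.0488849115947, 26.132605745393, 27.0507649089989,
]
result = count_runs_above(return_deltas, 10.50007490258002, 3)
print(result)  # 2
```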

Can all boolean expressions be written sequentially?

I'm working with a tool whose only way to determine whether a particular transaction is successful is to pass it through various checks. However, it is limited in that it can only do one check at a time, and the checks must be sequential. Everything must be computed from left to right.
For example,
A || C && D
It will be computed with A || C first, and then the result will be AND'ed with D.
It gets tougher with parentheses. I am unable to compute an expression like this, since B || C would need to be computed first, and I cannot rely on any order of operations:
A && ( B || C)
I think I've worked this down to this sequential boolean expression,
C || B && A
Where C || B is computed first, then that result is AND'd with A
Can all boolean expressions be successfully worked into a sequential boolean expression? (Like the example I have)
The answer is no:
Consider (A || B) && (C || D), which has the truth table:
A | B | C | D |
0 | 0 | 0 | 0 | 0
0 | 0 | 0 | 1 | 0
0 | 0 | 1 | 0 | 0
0 | 0 | 1 | 1 | 0
0 | 1 | 0 | 0 | 0
0 | 1 | 0 | 1 | 1
0 | 1 | 1 | 0 | 1
0 | 1 | 1 | 1 | 1
1 | 0 | 0 | 0 | 0
1 | 0 | 0 | 1 | 1
1 | 0 | 1 | 0 | 1
1 | 0 | 1 | 1 | 1
1 | 1 | 0 | 0 | 0
1 | 1 | 0 | 1 | 1
1 | 1 | 1 | 0 | 1
1 | 1 | 1 | 1 | 1
If it were possible to evaluate sequentially there would have to be a last expression which would be one of two cases:
Case 1:
X || Y such that Y is one of A,B,C,D and X is any sequential boolean expression.
Now, since there is no variable in A,B,C,D where the entire expression is true whenever that variable is true, none of:
X || A
X || B
X || C
X || D
can possibly be the last operation in the expression (for any X).
Case 2:
X && Y such that Y is one of A,B,C,D and X is any sequential boolean expression.
Now, since there is no variable in A,B,C,D where the entire expression is false whenever that variable is false, none of:
X && A
X && B
X && C
X && D
can possibly be the last operation in the expression (for any X).
Therefore you cannot write (A || B) && (C || D) in this way.
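The two cases can be checked by brute force. The sketch below (Python, with a helper name I made up) lists every variable that could sit at the end of a left-to-right chain: one that forces the whole expression true when it is true, or forces it false when it is false.

```python
from itertools import product

def forcing_variables(f, names):
    """List (variable, operator) pairs that could end a left-to-right chain."""
    found = []
    rows = list(product((False, True), repeat=len(names)))
    for i, name in enumerate(names):
        # Case 1: expression is true whenever this variable is true
        if all(f(*row) for row in rows if row[i]):
            found.append((name, "||"))
        # Case 2: expression is false whenever this variable is false
        if not any(f(*row) for row in rows if not row[i]):
            found.append((name, "&&"))
    return found

blocked = forcing_variables(lambda a, b, c, d: (a or b) and (c or d), "ABCD")
peelable = forcing_variables(lambda a, b, c: a and (b or c), "ABC")
print(blocked)   # []
print(peelable)  # [('A', '&&')]
```

An empty list for (A || B) && (C || D) is exactly the statement that neither case applies to any of the four variables.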
The reason you are able to do this for some expressions, like A && ( B || C) becoming C || B && A, is that the expression can be built recursively out of expressions which have one of the two properties above.
For example:
The truth table for A && ( B || C) is:
A | B | C |
0 | 0 | 0 | 0
0 | 0 | 1 | 0
0 | 1 | 0 | 0
0 | 1 | 1 | 0
1 | 0 | 0 | 0
1 | 0 | 1 | 1
1 | 1 | 0 | 1
1 | 1 | 1 | 1
We can quickly see that it has the property that it is false whenever A is 0. So our expression could be X && A.
Then we take A out of the truth table and look at only the rows where A is 1 in the original:
B | C
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 1
This has the property that it is true whenever B is 1 (or C, we can pick either here). So we can write this expression as
X || B, and the entire expression becomes X || B && A.
Then we reduce the table again to the portion where B was 0 and we get:
C
0 | 0
1 | 1
X is just C. So the final expression is C || B && A
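The peeling procedure above can be written as a short recursive search. This is only a sketch under my own assumptions: it looks for chains that use each variable exactly once, with no negation, and the function name sequentialize is hypothetical. It returns the chain as a string to be read strictly left to right, or None if no variable can be peeled off.

```python
from itertools import product

def sequentialize(f, free, fixed=None):
    """Peel one variable at a time off the right-hand end of the chain."""
    fixed = dict(fixed or {})

    def values(extra):
        # Outputs of f over all remaining variables, with `extra` pinned
        rest = [n for n in free if n not in extra]
        return [
            f(**fixed, **extra, **dict(zip(rest, bits)))
            for bits in product((False, True), repeat=len(rest))
        ]

    if len(free) == 1:
        (name,) = free
        is_identity = values({name: False}) == [False] and values({name: True}) == [True]
        return name if is_identity else None

    for name in free:
        rest = [n for n in free if n != name]
        if all(values({name: True})):        # true whenever name is true
            inner = sequentialize(f, rest, {**fixed, name: False})
            if inner is not None:
                return f"{inner} || {name}"
        if not any(values({name: False})):   # false whenever name is false
            inner = sequentialize(f, rest, {**fixed, name: True})
            if inner is not None:
                return f"{inner} && {name}"
    return None

chain = sequentialize(lambda A, B, C: A and (B or C), ["A", "B", "C"])
stuck = sequentialize(lambda A, B, C, D: (A or B) and (C or D), ["A", "B", "C", "D"])
print(chain)  # C || B && A
print(stuck)  # None
```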
This is a problem of rewriting an expression so that no parentheses occur on the right. Logical AND (∧) and OR (∨) are both commutative:
A ∧ B = B ∧ A
A ∨ B = B ∨ A
So you can rewrite any expression of the form “X op (Y)”, where op is ∧ or ∨, as “(Y) op X”:
A ∧ (B ∧ C) = (B ∧ C) ∧ A
A ∧ (B ∨ C) = (B ∨ C) ∧ A
A ∨ (B ∧ C) = (B ∧ C) ∨ A
A ∨ (B ∨ C) = (B ∨ C) ∨ A
They are also distributive, by the following laws:
(A ∧ B) ∨ (A ∧ C)
= A ∧ (B ∨ C)
= (B ∨ C) ∧ A
(A ∨ B) ∧ (A ∨ C)
= A ∨ (B ∧ C)
= (B ∧ C) ∨ A
So many Boolean expressions can be rewritten without parentheses on the right. But, as a counterexample, there is no way to rewrite an expression such as (A ∧ B) ∨ (C ∧ D) if A ≠ C, because of the lack of common factors.
Can't you just do this:
(( A || C ) && D)
and for your second example:
((( A && C ) || B ) && A )
Would that work for you?
Hope that helps...
You'll hit problems if you need to do something like (A || B) && (C || D) unless you can store the intermediate values for later use.
If you're allowed to construct more than one chain and try them all until either one of them passes or they all fail (so each chain is effectively ORed with the next) then I should think you can handle any combination. The example above would become (where each line is a separate query):
(A && C) ||
(A && D) ||
(B && C) ||
(B && D)
However, for a very complex check this could easily get out of hand!
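A quick way to convince yourself that the four chains really cover (A || B) && (C || D) is to compare them over all sixteen inputs. The Python sketch below does that; the function names are mine, and each entry in chains stands for one separate left-to-right query.

```python
from itertools import product

def original(a, b, c, d):
    return (a or b) and (c or d)

def any_chain(a, b, c, d):
    # Each chain has no parentheses on the right, so it can run sequentially
    chains = [a and c, a and d, b and c, b and d]
    return any(chains)

same = all(
    original(*row) == any_chain(*row)
    for row in product((False, True), repeat=4)
)
print(same)  # True
```

The number of chains is the product of the sizes of the OR groups, which is why this grows quickly for complex checks.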