Suppose I have a query combining AND and OR conditions without parentheses:
SELECT * FROM tbl1
WHERE a = 1 AND b = 2 OR c = 3;
How does PostgreSQL evaluate these conditions? Is it evaluated as (a = 1 AND b = 2) OR c = 3, or as a = 1 AND (b = 2 OR c = 3)? I couldn't find this anywhere in the documentation.
Note: I'm not purposefully writing an ambiguous query like this. I'm building a tool where the user could potentially create a query like that.
Note 2: If it makes any difference, I'm using PostgreSQL 9.6 in one instance and 11 in another.
AND binds more tightly than OR (this is standard SQL operator precedence, so it is the same in both PostgreSQL 9.6 and 11), so:
a AND b OR c == (a AND b) OR c
demo:db<>fiddle
a | b | c | a AND b OR c | (a AND b) OR c | a AND (b OR c)
:- | :- | :- | :----------- | :------------- | :-------
f | f | f | f | f | f
f | f | t | t | t | f
f | t | f | f | f | f
f | t | t | t | t | f
t | f | f | f | f | f
t | f | t | t | t | t
t | t | f | t | t | t
t | t | t | t | t | t
That, of course, means in your case:
a = 1 AND b = 2 OR c = 3 == (a = 1 AND b = 2) OR c = 3
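Since you're generating the SQL from a tool, the safest route is to have the tool always emit explicit parentheses, so the generated query never depends on precedence at all. A minimal sketch in Scala (the Cond/Pred/render names are purely illustrative, not from any library):
//every boolean combination is rendered with explicit parentheses,
//so operator precedence can never surprise the user
sealed trait Cond
case class Pred(sql: String) extends Cond
case class And(l: Cond, r: Cond) extends Cond
case class Or(l: Cond, r: Cond) extends Cond
def render(c: Cond): String = c match {
  case Pred(s)   => s
  case And(l, r) => s"(${render(l)} AND ${render(r)})"
  case Or(l, r)  => s"(${render(l)} OR ${render(r)})"
}
//render(Or(And(Pred("a = 1"), Pred("b = 2")), Pred("c = 3")))
//returns "((a = 1 AND b = 2) OR c = 3)"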
I have data in the following format, sorted by timestamp, with each row representing an event:
+----------+--------+---------+
|event_type| data |timestamp|
+----------+--------+---------+
| A | d1 | 1 |
| B | d2 | 2 |
| C | d3 | 3 |
| C | d4 | 4 |
| C | d5 | 5 |
| A | d6 | 6 |
| A | d7 | 7 |
| B | d8 | 8 |
| C | d9 | 9 |
| B | d10 | 12 |
| C | d11 | 20 |
+----------+--------+---------+
I need to collect these events into series like so:
1. Event of type C marks the end of the series
2. If there are multiple consecutive events of type C, they fall to the same series and the last one marks the end of that series
3. Each series can span 7 days at max, even if there is no C event to end it
Please also note that there can be multiple series in a single day. In reality, the timestamp column contains standard UNIX timestamps; here, let the numbers express days for simplicity.
So desired output would look like this:
+---------------------+--------------------------------------------------------------------+
|first_event_timestamp| events: List[(event_type, data, timestamp)] |
+---------------------+--------------------------------------------------------------------+
| 1 | List((A, d1, 1), (B, d2, 2), (C, d3, 3), (C, d4, 4), (C, d5, 5)) |
| 6 | List((A, d6, 6), (A, d7, 7), (B, d8, 8), (C, d9, 9)) |
| 12 | List((B, d10, 12)) |
| 20 | List((C, d11, 20)) |
+---------------------+--------------------------------------------------------------------+
I tried to solve this using window functions, where I would add two columns like this:
1. A seed column marking each event directly succeeding an event of type C with some unique id
2. A series_id column filled with values from the seed column using last(), to mark all events in one series with the same id
3. I would then group the events by series_id
Unfortunately, this doesn't seem possible:
+----------+--------+---------+------+-----------+
|event_type| data |timestamp| seed | series_id |
+----------+--------+---------+------+-----------+
| A | d1 | 1 | null | null |
| B | d2 | 2 | null | null |
| C | d3 | 3 | null | null |
| C | d4 | 4 | 0 | 0 |
| C | d5 | 5 | 1 | 1 |
| A | d6 | 6 | 2 | 2 |
| A | d7 | 7 | null | 2 |
| B | d8 | 8 | null | 2 |
| C | d9 | 9 | null | 2 |
| B | d10 | 12 | 3 | 3 |
| C | d11 | 20 | null | 3 |
+----------+--------+---------+------+-----------+
I don't seem to be able to test the preceding row for equality using lag(); i.e., the following code:
df.withColumn(
  "seed",
  when(
    (lag($"eventType", 1) === EventType.Conversion).over(w),
    typedLit(DigestUtils.sha256Hex("some fields").substring(0, 32))
  )
)
throws
org.apache.spark.sql.AnalysisException: Expression '(lag(eventType#76, 1, null) = C)' not supported within a window function.
As the table shows, it fails in the case where there are multiple consecutive C events, and it also wouldn't work for the first and last series.
I'm kind of stuck here; any help would be appreciated (using the DataFrame/Dataset API is preferred).
Here is the approach:
1. Identify the start of each event series, based on the conditions
2. Tag that record as a start event
3. Select the records of the start events
4. Get each record's series end time (if we order the start-event records descending, then the previous start time will be the current series' end time)
5. Join the original data with the above dataset
Here is a UDF to tag a record as "start":
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
import spark.implicits._ // assumes a SparkSession in scope named spark, as in spark-shell

//tag the starting event, based on the conditions
def tagStartEvent: (String, String, Int, Int) => String =
  (prevEvent: String, currEvent: String, prevTimeStamp: Int, currTimeStamp: Int) => {
    //the very first event is tagged as "start" (lag's default value supplies prevEvent = "start")
    if (prevEvent == "start")
      "start"
    else if ((currTimeStamp - prevTimeStamp) > 7)
      "start"
    else {
      prevEvent match {
        case "C" =>
          if (currEvent == "A" || currEvent == "B")
            "start" //a non-C event right after a C starts a new series
          else
            "" //current event is also C: the same series continues
        case _ => ""
      }
    }
  }
val tagStartEventUdf = udf(tagStartEvent)
data.csv
event_type,data,timestamp
A,d1,1
B,d2,2
C,d3,3
C,d4,4
C,d5,5
A,d6,6
A,d7,7
B,d8,8
C,d9,9
B,d10,12
C,d11,20
val df = spark.read.format("csv")
.option("header", "true")
.option("inferSchema", "true")
.load("data.csv")
val window = Window.partitionBy("all").orderBy("timestamp")
//tag the starting event
val dfStart =
df.withColumn("all", lit(1))
.withColumn("series_start",
tagStartEventUdf(
lag($"event_type",1, "start").over(window), df("event_type"),
lag($"timestamp",1,1).over(window),df("timestamp")))
val dfStartSeries = dfStart.filter($"series_start" === "start").select($"timestamp".as("series_start_time"), $"all")
val window2 = Window.partitionBy("all").orderBy($"series_start_time".desc)
//get the series end times: ordered descending, lag(1) pulls the next series' start time
//(for the sample data: 20 -> null, 12 -> 20, 6 -> 12, 1 -> 6)
val dfSeriesTimes = dfStartSeries.withColumn("series_end_time", lag($"series_start_time", 1, null).over(window2)).drop($"all")
val dfSeries =
  df.crossJoin(dfSeriesTimes) //explicit cartesian product (a plain join without a condition may be rejected depending on spark.sql.crossJoin.enabled)
    .withColumn("timestamp_series",
      // if series_end_time is null and timestamp >= series_start_time, then series_start_time
      when(col("series_end_time").isNull && col("timestamp") >= col("series_start_time"), col("series_start_time"))
        // if the record is >= series_start_time and < series_end_time, then series_start_time
        .otherwise(when(col("timestamp") >= col("series_start_time") && col("timestamp") < col("series_end_time"), col("series_start_time"))
          .otherwise(null)))
    .filter($"timestamp_series".isNotNull)
dfSeries.show()
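One step is still missing to reach the desired output: grouping each series and collecting its events. A sketch, assuming the column names used above:
val result = dfSeries
  .groupBy($"timestamp_series")
  .agg(collect_list(struct($"event_type", $"data", $"timestamp")).as("events"))
  .withColumnRenamed("timestamp_series", "first_event_timestamp")
  .orderBy("first_event_timestamp")
result.show(false)
//note: collect_list gives no ordering guarantee inside each list;
//sort the collected events by timestamp afterwards if the order matters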
For a Karnaugh map of three or more variables, deciding which side each variable goes on can make the solution easier to spot and simpler. But how do you know which variables go on which side?
E.g., for variables x, y and z: you could have x and y as column headers and z as a row header, or you could have y and z as column headers and x as a row header, which would give two different tables.
For maps with up to four variables, it is a matter of taste which variable is put on which side. However, Mahoney maps, as the extension of Karnaugh maps to five and more variables, do require a certain ordering along the sides.
The expression used in the following examples:
abcd!e + abc!de
Five-input Mahoney map: (image not reproduced here)
Equivalent Karnaugh map:
de de
00 01 11 10 00 01 11 10
abc +---+---+---+---+ abc +---+---+---+---+
000 | 0 | 0 | 0 | 0 | 001 | 0 | 0 | 0 | 0 |
+---+---+---+---+ +---+---+---+---+
010 | 0 | 0 | 0 | 0 | 011 | 0 | 0 | 0 | 0 |
+---+---+---+---+ +---+---+---+---+
110 | 0 | 0 | 0 | 0 | 111 | 0 | 1 | 0 | 1 |
+---+---+---+---+ +---+---+---+---+
100 | 0 | 0 | 0 | 0 | 101 | 0 | 0 | 0 | 0 |
+---+---+---+---+ +---+---+---+---+
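Reading the 1-cells off the map (row abc = 111, columns de = 01 and de = 10) recovers the expression. The two cells differ in both d and e, so they are not adjacent and the map offers no grouping, but ordinary Boolean factoring still compresses the result:
abc\,d\,\overline{e} + abc\,\overline{d}\,e \;=\; abc\,(d\,\overline{e} + \overline{d}\,e) \;=\; abc\,(d \oplus e)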
It is always possible to swap variables; the map is then redrawn with the corresponding rows and columns exchanged, and it still describes the same function.
Here you can find a nice online tool to draw and simplify Karnaugh-Veitch/Mahoney maps.
I have the following logic:
if(!(A || B)) {}
How can this be simplified and how can this simplification be visualized?
A | B
-----
0 0
0 1 -
1 0 |- this is A OR B
1 1 -
A | B
-----
0 0 - This is !(A OR B) ?
0 1
1 0
1 1
The simplification !(A || B) <=> !A && !B (which is one of De Morgan's laws, as noted by @JamesChoi) is best visualised by observing that the truth value that accrues to the major truth-functor in each expression is the same for all possible distributions of truth values to the variables:
 A | B | !(A || B) | !A && !B
---|---|-----------|---------
 T | T | F(T T T)  | FT F FT
 T | F | F(T T F)  | FT F TF
 F | T | F(F T T)  | TF F FT
 F | F | T(F F F)  | TF T TF
-----------------------------
         ^              ^
This shows that the expressions are truth-functionally equivalent. It is an
application of the truth-table method of propositional calculus.
The truth table for && is:
 A | B | A && B
---|---|--------
 T | T | T T T
 T | F | T F F
 F | T | F F T
 F | F | F F F
and the truth-table for || (inclusive-or) is:
 A | B | A || B
---|---|--------
 T | T | T T T
 T | F | T T F
 F | T | F T T
 F | F | F F F
The truth-table for ! must be self-evident.
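As a quick cross-check, a tiny Scala snippet (written for this answer, not taken from the question) exhaustively verifies the equivalence over all four combinations:
//verify De Morgan's law: !(A || B) == !A && !B for every truth-value pair
val allRowsAgree = (for {
  a <- Seq(true, false)
  b <- Seq(true, false)
} yield !(a || b) == (!a && !b)).forall(identity)
println(allRowsAgree) //prints: true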
How do I simplify this to 3 literals/letters?
LM' + LN + N'B
How would you simplify this boolean expression? I don't know which boolean laws I need to use. I tried, but I could only get it down to four literals, not three.
I have also not been able to reduce your expression to three literals.
The Karnaugh map:
BL
00 01 11 10
+---+---+---+---+
00 | 0 | 1 | 1 | 1 |
+---+---+---+---+
01 | 0 | 1 | 1 | 0 |
MN +---+---+---+---+
11 | 0 | 1 | 1 | 0 |
+---+---+---+---+
10 | 0 | 0 | 1 | 1 |
+---+---+---+---+
From looking at the map, you can see that three terms are needed to cover the nine minterms (depicted by "1") in the map. Each of the terms has two literals and covers four minterms. A term with just one literal would cover eight minterms.
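As a sanity check (a Scala sketch written for this illustration), enumerating all 16 assignments of B, L, M and N confirms the nine minterms counted in the map:
//count the minterms of LM' + LN + N'B over all 16 assignments
val minterms = for {
  b <- Seq(false, true); l <- Seq(false, true)
  m <- Seq(false, true); n <- Seq(false, true)
  if (l && !m) || (l && n) || (!n && b)
} yield (b, l, m, n)
println(minterms.size) //prints: 9, matching the nine 1-cells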