Firebird and dummy conditions

I have two similar queries. One of them has an additional dummy WHERE condition (such as 1=1, 0=0, or true):
FROM table1 t1
JOIN table2 t2 ON t2.fk_t1 =
JOIN table3 t3 ON = t1.fk_t3
WHERE 0 = 0 AND /* this line is present in the 1st case and absent in the 2nd case */
t3.field = 6
AND EXISTS (SELECT 1 FROM table2 x WHERE x.fk2_t2 =
All necessary fields are indexed.
Firebird (both 2.1 and 3.0) behaves differently in each case, and the read statistics look like this:
1st case (with 0=0):
Query Time
Prepare : 32,00 ms
Execute : 1 046,00 ms
Avg fetch time: 61,53 ms
Read : 8 342
Writes : 1
Fetches: 1 316 042
Marks : 0
Enhanced Info:
| Table Name | Records | Indexed | Non-Indexed | Updates | Deletes | Inserts | Backouts | Purges | Expunges |
| | Total | reads | reads | | | | | | |
|TABLE2 | 0 | 4804 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|TABLE1 | 0 | 0 | 96884 | 0 | 0 | 0 | 0 | 0 | 0 |
|TABLE3 | 0 | 387553 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
And in 2nd case (without dummy condition):
Query Time
Prepare : 16,00 ms
Execute : 515,00 ms
Avg fetch time: 30,29 ms
Read : 7 570
Writes : 1
Fetches: 648 103
Marks : 0
Enhanced Info:
| Table Name | Records | Indexed | Non-Indexed | Updates | Deletes | Inserts | Backouts | Purges | Expunges |
| | Total | reads | reads | | | | | | |
|TABLE2 | 0 | 506 | 152655 | 0 | 0 | 0 | 0 | 0 | 0 |
|TABLE1 | 0 | 467 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|TABLE3 | 0 | 1885 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Queries have different execution plans.
This seems strange to me. Why do two queries with logically equivalent conditions perform so differently? How does the Firebird optimizer work, and how should I write fast, optimal queries?


Conditionally lag value over multiple rows

I am trying to find cases where one type of error causes multiple sequential instances of a second type of error on a vehicle. For example, if there are two vehicles, 'a' and 'b', and vehicle a has an error of type 1 ('error_1') on day 0, it can cause errors of type 2 ('error_2') on days 1, 2, 3, and 4. I want to create a variable named cascading_error that shows every consecutive error_2 following an error_1. Note that in the case of vehicle b, it is possible to have an error_2 without a preceding error_1, in which case the value for cascading_error should be 0.
Here's what I've tried:
from pyspark.sql import Window
from pyspark.sql import functions as F

vals = [('a',0,1,0),('a',1,0,1),('a',2,0,1),('a',3,0,1),('a',4,0,1),('b',0,0,0),('b',1,0,0),('b',2,0,1),('b',3,0,1)]
df = spark.createDataFrame(vals, ['vehicle','day','error_1','error_2'])
w = Window.partitionBy('vehicle').orderBy('day')
df = df.withColumn('cascading_error', F.lag(df.error_1).over(w) * df.error_2)
df = df.withColumn('cascading_error', F.when((F.lag(df.cascading_error).over(w)==1) & (df.error_2==1), F.lit(1)).otherwise(df.cascading_error))
This is my result
| vehicle | day | error_1 | error_2 | cascading_error |
| ------- | --- | ------- | ------- | --------------- |
| a | 0 | 1 | 0 | null |
| a | 1 | 0 | 1 | 1 |
| a | 2 | 0 | 1 | 1 |
| a | 3 | 0 | 1 | 0 |
| a | 4 | 0 | 1 | 0 |
| b | 0 | 0 | 0 | null |
| b | 1 | 0 | 0 | 0 |
| b | 2 | 0 | 1 | 0 |
| b | 3 | 0 | 1 | 0 |
The code is generating the correct cascading_error value on days 1 and 2 for vehicle a, but not on days 3 and 4, which should also be 1. It seems that the logic of combining cascading_error with error_2 to update cascading_error only works for a single row, not sequential ones.
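To pin down the intended result, here is a minimal plain-Python sketch of the rule described above (it is not a Spark solution). The second `withColumn` reads the values of `cascading_error` as they were before the update, so the flag can only spread by one row. The sketch instead keeps a per-vehicle state while walking the rows in day order. The function name `cascading_errors` is hypothetical, and two points are assumptions: the row carrying the error_1 itself gets 0, and a cascade ends at the first row without an error_2.

```python
def cascading_errors(rows):
    """rows: iterable of (vehicle, day, error_1, error_2) tuples.

    Returns a dict mapping (vehicle, day) -> cascading_error (0 or 1).
    """
    result = {}
    in_cascade = {}  # vehicle -> True while an error_1 cascade is still running
    for vehicle, day, error_1, error_2 in sorted(rows, key=lambda r: (r[0], r[1])):
        if error_1 == 1:
            # Assumption: the triggering row itself is not a cascading error.
            in_cascade[vehicle] = True
            flag = 0
        elif error_2 == 1 and in_cascade.get(vehicle, False):
            # An error_2 that directly continues a run started by an error_1.
            flag = 1
        else:
            # Assumption: any other row ends the cascade for this vehicle.
            in_cascade[vehicle] = False
            flag = 0
        result[(vehicle, day)] = flag
    return result


vals = [('a', 0, 1, 0), ('a', 1, 0, 1), ('a', 2, 0, 1), ('a', 3, 0, 1), ('a', 4, 0, 1),
        ('b', 0, 0, 0), ('b', 1, 0, 0), ('b', 2, 0, 1), ('b', 3, 0, 1)]
flags = cascading_errors(vals)
```

With the sample data, vehicle a gets 1 on days 1 through 4 and vehicle b gets 0 everywhere.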

Split postgres records into groups based on time fields

I have a table with records that look like this:
| id | coord-x | coord-y | time |
| 1 | 0 | 0 | 123 |
| 1 | 0 | 1 | 124 |
| 1 | 0 | 3 | 125 |
The time column represents a time in milliseconds. What I want to do is find all coord-x, coord-y as a set of points for a given timeframe for a given id. For any given id there is a unique coord-x, coord-y, and time.
What I need to do however is group these points as long as they're n milliseconds apart. So if I have this:
| id | coord-x | coord-y | time |
| 1 | 0 | 0 | 123 |
| 1 | 0 | 1 | 124 |
| 1 | 0 | 3 | 125 |
| 1 | 0 | 6 | 140 |
| 1 | 0 | 7 | 141 |
I would want a result similar to this:
| id | points | start-time | end-time |
| 1 | (0,0), (0,1), (0,3) | 123 | 125 |
| 1 | (0,6), (0,7) | 140 | 141 |
I do have PostGIS installed on my database. The times I posted above are not representative; I kept them small just as a sample. The time column is just a millisecond timestamp.
The tricky part is picking the expression inside your GROUP BY. If n = 5, you can do something like time / 5. To match the example exactly, the query below uses (time - 3) / 5. Once you group it, you can aggregate them into an array with array_agg.
SELECT
array_agg(("coord-x", "coord-y")) AS points,
min(time) AS time_start,
max(time) AS time_end
FROM "<your_table>"
WHERE id = 1
GROUP BY (time - 3) / 5
Here is the output
| points | time_start | time_end |
| {"(0,0)","(0,1)","(0,3)"} | 123 | 125 |
| {"(0,6)","(0,7)"} | 140 | 141 |
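Fixed-width buckets can split points that are close together but straddle a bucket boundary: with (time - 3) / 5, times 127 and 128 land in different groups. As a rough alternative, here is a plain-Python sketch of gap-based grouping, which starts a new group whenever the gap to the previous point exceeds n. It assumes that "n milliseconds apart" means consecutive points at most n ms apart. The function name `group_by_gap` and the dict layout are made up for illustration.

```python
def group_by_gap(points, max_gap):
    """points: iterable of (x, y, time) tuples for a single id.

    Returns a list of dicts with keys "points", "start" and "end",
    one per run of points whose consecutive times differ by <= max_gap.
    """
    groups = []
    for x, y, t in sorted(points, key=lambda p: p[2]):
        if groups and t - groups[-1]["end"] <= max_gap:
            # Close enough to the previous point: extend the current group.
            groups[-1]["points"].append((x, y))
            groups[-1]["end"] = t
        else:
            # First point, or the gap is too large: start a new group.
            groups.append({"points": [(x, y)], "start": t, "end": t})
    return groups


sample = [(0, 0, 123), (0, 1, 124), (0, 3, 125), (0, 6, 140), (0, 7, 141)]
groups = group_by_gap(sample, 5)
```

On the sample rows this yields the same two groups as the table above (123 to 125 and 140 to 141). Unlike the bucket expression, it also keeps 127 and 128 together.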

PostgreSQL Join Two Tables by Nearest Date

I have a large single table of sent emails with dates and outcomes and I'd like to be able to match each row with the last time that email was sent and a specific outcome occurred (here that open=1). This needs to be done with PostgreSQL. For example:
Initial table:
id | sent_dt | bounced | open | clicked | unsubscribe
1 | 2015-01-01 | 1 | 0 | 0 | 0
1 | 2015-01-02 | 0 | 1 | 1 | 0
1 | 2015-01-03 | 0 | 1 | 1 | 0
2 | 2015-01-01 | 0 | 1 | 0 | 0
2 | 2015-01-02 | 1 | 0 | 0 | 0
2 | 2015-01-03 | 0 | 1 | 0 | 0
2 | 2015-01-04 | 0 | 1 | 0 | 1
Result table:
id | sent_dt | bounced| open | clicked | unsubscribe| previous_time
1 | 2015-01-01 | 1 | 0 | 0 | 0 | NULL
1 | 2015-01-02 | 0 | 1 | 1 | 0 | NULL
1 | 2015-01-03 | 0 | 1 | 1 | 0 | 2015-01-02
2 | 2015-01-01 | 0 | 1 | 0 | 0 | NULL
2 | 2015-01-02 | 1 | 0 | 0 | 0 | 2015-01-01
2 | 2015-01-03 | 0 | 1 | 0 | 0 | 2015-01-01
2 | 2015-01-04 | 0 | 1 | 0 | 1 | 2015-01-03
I have tried using Lag but I don't know how to go about that with the conditional that open needs to equal 1 while still returning all rows. I also tried doing a many to many Join on id then finding the minimum Datediff but that is going to essentially square the size of my table and takes entirely too long to compute (>7hrs). There are several answers which would work for SQL but none that I see work for PostgreSQL.
Thanks for any help guys!
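You can use ROW_NUMBER() to achieve the desired result: connect each row to the one that occurred just before it, if that one has open = 1.
SELECT t.*, s.sent_dt
FROM
(SELECT p.*,
ROW_NUMBER() OVER (PARTITION BY p.id ORDER BY p.sent_dt DESC) AS rnk
FROM YourTable p) t
LEFT JOIN
(SELECT p.*,
ROW_NUMBER() OVER (PARTITION BY p.id ORDER BY p.sent_dt DESC) AS rnk
FROM YourTable p) s
ON (t.id = s.id AND t.rnk = s.rnk - 1 AND s.open = 1)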
First, I create a CTE, openFilter, containing the dates on which the mail was opened.
Then I join the mail table with that filter to get the open dates earlier than each email. Finally, I filter out everything except the latest earlier open.
WITH openFilter as (
SELECT m."id", m."sent_dt"
FROM mail m
WHERE "open" = 1
)
SELECT m."id",
to_char(m."sent_dt", 'YYYY-MM-DD'),
"bounced", "open", "clicked", "unsubscribe",
to_char(o."sent_dt", 'YYYY-MM-DD') previous_time
FROM mail m
LEFT JOIN openFilter o
ON m."id" = o."id"
AND m."sent_dt" > o."sent_dt"
WHERE o."sent_dt" = (SELECT MAX(t."sent_dt")
FROM openFilter t
WHERE t."id" = m."id"
AND t."sent_dt" < m."sent_dt")
OR o."sent_dt" IS NULL
| id | to_char | bounced | open | clicked | unsubscribe | previous_time |
| 1 | 2015-01-01 | 1 | 0 | 0 | 0 | (null) |
| 1 | 2015-01-02 | 0 | 1 | 1 | 0 | (null) |
| 1 | 2015-01-03 | 0 | 1 | 1 | 0 | 2015-01-02 |
| 2 | 2015-01-01 | 0 | 1 | 0 | 0 | (null) |
| 2 | 2015-01-02 | 1 | 0 | 0 | 0 | 2015-01-01 |
| 2 | 2015-01-03 | 0 | 1 | 0 | 0 | 2015-01-01 |
| 2 | 2015-01-04 | 0 | 1 | 0 | 1 | 2015-01-03 |
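For reference, here is a small plain-Python sketch of the same "latest earlier open per id" rule. It is handy for checking a SQL result on a sample, and it avoids the many-to-many comparison by sorting once and remembering the last open date per id. The function name `previous_open_dates` is hypothetical, and dates are assumed to be ISO strings so that they sort chronologically.

```python
def previous_open_dates(rows):
    """rows: iterable of (id, sent_dt, open) with sent_dt as 'YYYY-MM-DD'.

    Returns a list of (id, sent_dt, previous_time), where previous_time is
    the latest earlier sent_dt for the same id with open == 1, or None.
    """
    out = []
    last_open = {}  # id -> most recent sent_dt seen so far with open == 1
    for id_, sent_dt, opened in sorted(rows, key=lambda r: (r[0], r[1])):
        # Look up before updating, so a row never matches itself.
        out.append((id_, sent_dt, last_open.get(id_)))
        if opened == 1:
            last_open[id_] = sent_dt
    return out


mail = [
    (1, "2015-01-01", 0), (1, "2015-01-02", 1), (1, "2015-01-03", 1),
    (2, "2015-01-01", 1), (2, "2015-01-02", 0), (2, "2015-01-03", 1),
    (2, "2015-01-04", 1),
]
result = previous_open_dates(mail)
```

On the sample rows from the question, this reproduces the previous_time column of the result table.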

Count occurrences of value in field for a particular ID using Redshift

I want to count the occurrences of particular values in a certain field for an ID. So what I have is this:
| Location ID | Group |
|:----------- |:---------|
| 1 | Group A |
| 2 | Group B |
| 3 | Group C |
| 4 | Group A |
| 4 | Group B |
| 4 | Group C |
| 3 | Group A |
| 2 | Group B |
| 1 | Group C |
| 2 | Group A |
And what I would hope to yield through some computer magic is this:
| Location ID | Group A Count | Group B Count | Group C Count |
|:----------- |:--------------|:--------------|:-------------|
| 1 | 1 | 0 | 1 |
| 2 | 1 | 2 | 0 |
| 3 | 1 | 0 | 1 |
| 4 | 1 | 1 | 1 |
Is there some sort of pivoting function I can use in Redshift to achieve this?
This requires a CASE expression and a GROUP BY clause, as in the following example.
SELECT l_id,
SUM(CASE WHEN l_group = 'Group A' THEN 1 ELSE 0 END) AS a,
SUM(CASE WHEN l_group = 'Group B' THEN 1 ELSE 0 END) AS b -- and so on
FROM location
GROUP BY l_id;
This should give a result like this:
| l_id | a | b |
| 4 | 1 | 1 |
| 1 | 1 | 0 |
| 3 | 1 | 0 |
| 2 | 1 | 2 |
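To try the conditional-aggregation pattern locally, here is a small sketch that runs the same kind of query, extended to Group C, against the sample data from the question. It uses Python's built-in sqlite3 with an in-memory database purely as a stand-in for Redshift. The table and column names (location, l_id, l_group) follow the query above, and the function name `pivot_group_counts` is made up for illustration.

```python
import sqlite3


def pivot_group_counts(rows):
    """rows: iterable of (l_id, l_group). Returns [(l_id, a, b, c), ...]."""
    conn = sqlite3.connect(":memory:")  # in-memory SQLite, not Redshift
    conn.execute("CREATE TABLE location (l_id INTEGER, l_group TEXT)")
    conn.executemany("INSERT INTO location VALUES (?, ?)", rows)
    # One SUM(CASE ...) column per group value; GROUP BY collapses rows per id.
    result = conn.execute(
        """
        SELECT l_id,
               SUM(CASE WHEN l_group = 'Group A' THEN 1 ELSE 0 END) AS a,
               SUM(CASE WHEN l_group = 'Group B' THEN 1 ELSE 0 END) AS b,
               SUM(CASE WHEN l_group = 'Group C' THEN 1 ELSE 0 END) AS c
        FROM location
        GROUP BY l_id
        ORDER BY l_id
        """
    ).fetchall()
    conn.close()
    return result


sample = [
    (1, "Group A"), (2, "Group B"), (3, "Group C"), (4, "Group A"),
    (4, "Group B"), (4, "Group C"), (3, "Group A"), (2, "Group B"),
    (1, "Group C"), (2, "Group A"),
]
counts = pivot_group_counts(sample)
```

The group values have to be listed explicitly in the query, so a new group means a new SUM(CASE ...) column.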

PostgreSQL group by bug on Unicode strings?

I have a very weird thing happening, where I noticed that a group by (word) wasn't always grouping by word if that word is a UTF-8 string. In the same query, I get cases where it's been grouped correctly, and cases where it hasn't. I wonder if anybody knows what's up with that?
select *,count(*) over (partition by md5(word)) as k
from (
select word,count(*) as n
from :tmpwl
group by 1
) a order by 1,2 limit 12;
/* gives:
word | n | k
いい | 1 | 1
くず | 1 | 1
ごみ | 1 | 1
さま | 1 | 1
さん | 1 | 1
へま | 1 | 1
まめ | 1 | 1
よく | 1 | 1
ろく | 1 | 1
ネガ | 1 | 2 -- what the heck?
ネガ | 1 | 2
パス | 1 | 1
*/
Note that the following workaround works fine:
select word,n,count(*) over (partition by md5(word)) as k
from (
select md5(word),max(word) as word,count(*) as n
from :tmpwl
group by 1
) a order by 1,2 limit 12;
/* gives:
word | n | k
いい | 1 | 1
くず | 1 | 1
ごみ | 1 | 1
さま | 1 | 1
さん | 1 | 1
へま | 1 | 1
まめ | 1 | 1
よく | 1 | 1
ろく | 1 | 1
ネガ | 2 | 1
パス | 1 | 1
プア | 1 | 1
*/
The version is PostgreSQL 8.2.14 (Greenplum Database build 3 Single-Node Edition) on x86_64-unknown-linux-gnu, compiled by GCC gcc.exe (GCC) 4.1.1 compiled on Nov 30 2010 17:20:26.
The source table :tmpwl:
\d :tmpwl
Table "pg_temp_25149.pdtmp_foo706453357357532"
Column | Type | Modifiers
baseword | text |
word | text |
value | integer |
lexicon | text |
nalts | bigint |
Distributed by: (word)