In a table with some float fields built for testing, we can filter on field = 0.0 but not on field <> 0.0.
Table contents:
| id | lower_name | sphinx_internal_id | sphinx_internal_class | sphinx_deleted | latt | lngt | status_hash | uploaded_as_hash | wifi |
| 119626 | lekarna novo mesto | 23925 | Spot | 0 | 0.000000 | 0.400000 | 75439824 | 21396735 | 0 |
| 119627 | lekarna novo mesto | 23925 | Spot | 0 | 0.000000 | 1.700000 | 75439824 | 21396735 | 0 |
2 rows in set (0.00 sec)
Example output:
> SELECT * FROM `spot_core`, `spot_delta` WHERE `latt` = 0.0;
| id | lower_name | sphinx_internal_id | sphinx_internal_class | sphinx_deleted | latt | lngt | status_hash | uploaded_as_hash | wifi |
| 119626 | lekarna novo mesto | 23925 | Spot | 0 | 0.000000 | 0.000000 | 75439824 | 21396735 | 0 |
1 row in set (0.00 sec)
> SELECT * FROM `spot_core`, `spot_delta` WHERE `latt` <> 0.0;
ERROR 1064 (42000): sphinxql: NEQ filter on floats is not (yet?) supported near '0.0'
> SELECT * FROM `spot_core`, `spot_delta` WHERE `latt` != 0.0;
ERROR 1064 (42000): sphinxql: NEQ filter on floats is not (yet?) supported near '0.0'
> SELECT * FROM `spot_core`, `spot_delta` WHERE `latt` != 0;
Empty set, 1 warning (0.00 sec)
> SELECT * FROM `spot_core`, `spot_delta` WHERE `latt` <> 0;
Empty set, 1 warning (0.00 sec)
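The warning behind those empty sets can be inspected with SphinxQL's SHOW WARNINGS:
SHOW WARNINGS;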
A workaround is:
SELECT *, IF(latt <> 0.0, 1, 0) AS myint FROM `spot_core`, `spot_delta` WHERE myint = 1;
The IF() function implements the small epsilon threshold required for comparing floats.
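If you want explicit control over the tolerance rather than relying on IF()'s built-in threshold, here is a sketch using Sphinx's ABS() expression, assuming an epsilon of 1e-6 suits your data:
SELECT *, IF(ABS(latt) > 0.000001, 1, 0) AS nonzero
FROM `spot_core`, `spot_delta`
WHERE nonzero = 1;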
Related
I have a dataset similar to this. For each client I need to pick out the sum of all quantities, together with the most recent metadata and execution time (a greater execution time means more recent), where the metadata and execution time are taken from the rows with quantity > 0.
| Name | Quantity | Metadata | Execution time |
| -------- | ---------|----------|----------------|
| Neil | 1 | [1,3] | 4 |
| James | 1 | [2,18] | 5 |
| Neil | 1 | [4, 1] | 6 |
| Mike | 1 | [5, 42] | 7 |
| James | -1 | Null | 8 |
| Neil | -1 | Null | 9 |
E.g. the query needs to return:
| Name | Summed Quantity | Metadata | Execution time |
| -------- | ----------------|----------|----------------|
| James | 0 | [2,18] | 5 |
| Neil | 1 | [4, 1] | 6 |
| Mike | 1 | [5, 42] | 7 |
My query doesn't quite work as it's not returning the sum of the quantities correctly.
SELECT
distinct on (name) name,
(
SELECT
cast(
sum(quantity) as int
)
) as summed_quantity,
meta,
execution_time
FROM
table
where
quantity > 0
group by
name,
meta,
execution_time
order by
name,
execution_time desc;
This query gives a result of
| Name | Summed Quantity | Metadata | Execution time |
| -------- | ----------------|----------|----------------|
| James | 1 | [2,18] | 5 |
| Neil | 1 | [4, 1] | 6 |
| Mike | 1 | [5, 42] | 7 |
I.e. it's just taking the quantity > 0 rows from the WHERE clause and not adding up the quantities in the subquery (I assume because of the DISTINCT clause). I'm unsure how to fix my query to produce the desired output.
This can be achieved using window functions (hence with a single pass of the data):
select
name
, sum_qty
, metadata
, execution_time
from (
select
*
, sum(Quantity) over(partition by name) sum_qty
, row_number() over(partition by name, case when quantity > 0 then 1 else 0 end
order by Execution_time DESC) as rn
from mytable
) d
where rn = 1 and quantity > 0
order by name
Result:
+-------+---------+----------+----------------+
| name | sum_qty | metadata | execution_time |
+-------+---------+----------+----------------+
| James | 0 | [2,18] | 5 |
| Mike | 1 | [5,42] | 7 |
| Neil | 1 | [4,1] | 6 |
+-------+---------+----------+----------------+
db<>fiddle here
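For reference, a minimal setup to reproduce this result (assuming PostgreSQL, and that Metadata is stored as plain text):
CREATE TABLE mytable (name text, quantity int, metadata text, execution_time int);
INSERT INTO mytable VALUES
    ('Neil',   1, '[1,3]',   4),
    ('James',  1, '[2,18]',  5),
    ('Neil',   1, '[4, 1]',  6),
    ('Mike',   1, '[5, 42]', 7),
    ('James', -1, NULL,      8),
    ('Neil',  -1, NULL,      9);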
I have two similar queries. In the first case there is an additional dummy WHERE condition (1=1, 0=0, true):
SELECT t1.*
FROM table1 t1
JOIN table2 t2 ON t2.fk_t1 = t1.id
JOIN table3 t3 ON t3.id = t1.fk_t3
WHERE
0 = 0 AND /* with this in 1st case, without this line in 2nd case */
t3.field = 6
AND EXISTS (SELECT 1 FROM table2 x WHERE x.fk2_t2 = t2.id)
All necessary fields are indexed.
Firebird (both versions 2.1 and 3.0) handles the two cases differently, and the read statistics look like this:
1st case (with 0=0):
Query Time
------------------------------------------------
Prepare : 32,00 ms
Execute : 1 046,00 ms
Avg fetch time: 61,53 ms
Operations
------------------------------------------------
Read : 8 342
Writes : 1
Fetches: 1 316 042
Marks : 0
Enhanced Info:
+-------------------------------+-----------+-----------+-------------+---------+---------+---------+----------+----------+----------+
| Table Name | Records | Indexed | Non-Indexed | Updates | Deletes | Inserts | Backouts | Purges | Expunges |
| | Total | reads | reads | | | | | | |
+-------------------------------+-----------+-----------+-------------+---------+---------+---------+----------+----------+----------+
|TABLE2 | 0 | 4804 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|TABLE1 | 0 | 0 | 96884 | 0 | 0 | 0 | 0 | 0 | 0 |
|TABLE3 | 0 | 387553 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+-------------------------------+-----------+-----------+-------------+---------+---------+---------+----------+----------+----------+
And in 2nd case (without dummy condition):
Query Time
------------------------------------------------
Prepare : 16,00 ms
Execute : 515,00 ms
Avg fetch time: 30,29 ms
Operations
------------------------------------------------
Read : 7 570
Writes : 1
Fetches: 648 103
Marks : 0
Enhanced Info:
+-------------------------------+-----------+-----------+-------------+---------+---------+---------+----------+----------+----------+
| Table Name | Records | Indexed | Non-Indexed | Updates | Deletes | Inserts | Backouts | Purges | Expunges |
| | Total | reads | reads | | | | | | |
+-------------------------------+-----------+-----------+-------------+---------+---------+---------+----------+----------+----------+
|TABLE2 | 0 | 506 | 152655 | 0 | 0 | 0 | 0 | 0 | 0 |
|TABLE1 | 0 | 467 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|TABLE3 | 0 | 1885 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+-------------------------------+-----------+-----------+-------------+---------+---------+---------+----------+----------+----------+
The queries have different execution plans:
PLAN JOIN (T2 NATURAL, T1 INDEX (T1_ID_IDX), T3 INDEX (T3_ID_IDX))
PLAN JOIN (T1 NATURAL, T3 INDEX (T3_ID_IDX1), T2 INDEX (T2_FK_T1_IDX))
This is strange to me. Why do two queries whose conditions mean the same thing perform so differently? How does the Firebird optimizer work, and how should I write fast, optimal queries? How can I make sense of this?
P.S. https://github.com/FirebirdSQL/firebird/issues/6941
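For anyone reproducing the comparison, isql can print the chosen plan and per-statement statistics alongside the query; SET PLAN and SET STATS are standard isql switches, though their output format varies between Firebird versions:
SET PLAN ON;
SET STATS ON;
SELECT t1.*
FROM table1 t1
JOIN table2 t2 ON t2.fk_t1 = t1.id
JOIN table3 t3 ON t3.id = t1.fk_t3
WHERE 0 = 0
  AND t3.field = 6
  AND EXISTS (SELECT 1 FROM table2 x WHERE x.fk2_t2 = t2.id);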
I have a table in Postgres that I would like to copy into from a csv file. I usually do it like so:
\copy my_table from '/workdir/some_file.txt' with null as 'NULL' delimiter E'|' csv header;
The problem now, however, is that my_table has one extra column that I would like to fill in manually during the copy, with the same value 'b' for every row. Here are my tables:
some_file.txt:
col1 | col2 | col3
0 0 1
0 1 3
my_table :
xtra_col | col1 | col2 | col3
a 5 2 5
a 6 2 5
a 7 2 5
Desired my_table after copy into:
xtra_col | col1 | col2 | col3
a 5 2 5
a 6 2 5
a 7 2 5
b 0 0 1
b 0 1 3
Is there a way to specify the constant 'b' value for column xtra_col in the copy statement? If not, how should I approach this problem?
You could set a (temporary) default value for the xtra_col:
ALTER TABLE my_table ALTER COLUMN xtra_col SET DEFAULT 'b';
COPY my_table (col1, col2, col3) FROM '/workdir/some_file.txt' WITH (FORMAT CSV, DELIMITER '|', NULL 'NULL', HEADER true);
ALTER TABLE my_table ALTER COLUMN xtra_col DROP DEFAULT;
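Note that COPY ... FROM 'file' runs on the server and reads the server's filesystem; since the question uses psql's client-side \copy, the same column-list trick works there too (one line, same options):
\copy my_table (col1, col2, col3) from '/workdir/some_file.txt' with (format csv, delimiter '|', null 'NULL', header true)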
Is there a way to not repeat the columns of my_table? The real my_table has 20 columns and I wouldn't want to list all of them.
If my_table has a lot of columns and you wish to avoid having to type out all the column names,
you could dynamically generate the COPY command like this:
SELECT format($$COPY my_table(%s) FROM '/workdir/some_file.txt' WITH (FORMAT CSV, DELIMITER '|', NULL 'NULL', HEADER true);$$
, string_agg(quote_ident(attname), ','))
FROM pg_attribute
WHERE attrelid = 'my_table'::regclass
AND attname != 'xtra_col'
AND attnum > 0          -- excludes system columns (ctid, xmin, ...)
AND NOT attisdropped    -- excludes previously dropped columns
you could then copy-and-paste the SQL to run it.
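If you're in psql, you can also skip the copy-and-paste by terminating the generating query with \gexec (psql 9.6+), which executes each result value as a statement:
SELECT format($$COPY my_table(%s) FROM '/workdir/some_file.txt' WITH (FORMAT CSV, DELIMITER '|', NULL 'NULL', HEADER true)$$
     , string_agg(quote_ident(attname), ','))
FROM pg_attribute
WHERE attrelid = 'my_table'::regclass
AND attname != 'xtra_col'
AND attnum > 0
AND NOT attisdropped
\gexec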
Or, for totally hands-free operation, you could create a function to generate the SQL and execute it:
CREATE OR REPLACE FUNCTION test_func(filepath text, xcol text, fillval text)
RETURNS void
LANGUAGE plpgsql
AS $func$
DECLARE sql text;
BEGIN
-- %I quotes identifiers and %L quotes literals, which is safer than bare %s
EXECUTE format($$ALTER TABLE my_table ALTER COLUMN %I SET DEFAULT %L;$$, xcol, fillval);
SELECT format($$COPY my_table(%s) FROM %L WITH (FORMAT CSV, DELIMITER '|', NULL 'NULL', HEADER true);$$
, string_agg(quote_ident(attname), ','), filepath)
INTO sql
FROM pg_attribute
WHERE attrelid = 'my_table'::regclass
AND attname != xcol      -- use the parameter, not a hard-coded name
AND attnum > 0
AND NOT attisdropped;
EXECUTE sql;
EXECUTE format($$ALTER TABLE my_table ALTER COLUMN %I DROP DEFAULT;$$, xcol);
END;
$func$;
SELECT test_func('/workdir/some_file.txt', 'xtra_col', 'b');
This is the sql I used to test the solution above:
DROP TABLE IF EXISTS test;
CREATE TABLE test (
xtra_col text
, col1 int
, col2 int
, col3 int
);
INSERT INTO test VALUES
('a', 5, 2, 5)
, ('a', 6, 2, 5)
, ('a', 7, 2, 5);
with the contents of /tmp/data being
col1 | col2 | col3
0 | 0 | 1
0 | 1 | 3
Then
SELECT test_func('/tmp/data', 'xtra_col', 'b');
SELECT * FROM test;
results in
+----------+------+------+------+
| xtra_col | col1 | col2 | col3 |
+----------+------+------+------+
| a | 5 | 2 | 5 |
| a | 6 | 2 | 5 |
| a | 7 | 2 | 5 |
| b | 0 | 0 | 1 |
| b | 0 | 1 | 3 |
+----------+------+------+------+
(5 rows)
Regarding the pg.dropped column:
The test_func call does not seem to produce the pg.dropped column, at least on the test table used above:
unutbu=# SELECT *
FROM pg_attribute
WHERE attrelid = 'test'::regclass;
+----------+----------+----------+---------------+--------+--------+----------+-------------+-----------+----------+------------+----------+------------+-----------+-------------+--------------+------------+-------------+--------------+--------+------------+---------------+
| attrelid | attname | atttypid | attstattarget | attlen | attnum | attndims | attcacheoff | atttypmod | attbyval | attstorage | attalign | attnotnull | atthasdef | attidentity | attisdropped | attislocal | attinhcount | attcollation | attacl | attoptions | attfdwoptions |
+----------+----------+----------+---------------+--------+--------+----------+-------------+-----------+----------+------------+----------+------------+-----------+-------------+--------------+------------+-------------+--------------+--------+------------+---------------+
| 53393 | tableoid | 26 | 0 | 4 | -7 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | | |
| 53393 | cmax | 29 | 0 | 4 | -6 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | | |
| 53393 | xmax | 28 | 0 | 4 | -5 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | | |
| 53393 | cmin | 29 | 0 | 4 | -4 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | | |
| 53393 | xmin | 28 | 0 | 4 | -3 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | | |
| 53393 | ctid | 27 | 0 | 6 | -1 | 0 | -1 | -1 | f | p | s | t | f | | f | t | 0 | 0 | | | |
| 53393 | xtra_col | 25 | -1 | -1 | 1 | 0 | -1 | -1 | f | x | i | f | f | | f | t | 0 | 100 | | | |
| 53393 | col1 | 23 | -1 | 4 | 2 | 0 | -1 | -1 | t | p | i | f | f | | f | t | 0 | 0 | | | |
| 53393 | col2 | 23 | -1 | 4 | 3 | 0 | -1 | -1 | t | p | i | f | f | | f | t | 0 | 0 | | | |
| 53393 | col3 | 23 | -1 | 4 | 4 | 0 | -1 | -1 | t | p | i | f | f | | f | t | 0 | 0 | | | |
+----------+----------+----------+---------------+--------+--------+----------+-------------+-----------+----------+------------+----------+------------+-----------+-------------+--------------+------------+-------------+--------------+--------+------------+---------------+
(10 rows)
As far as I know, a pg.dropped entry is a natural artifact of how PostgreSQL handles dropped columns: the row stays in pg_attribute, its attname is rewritten to something like ........pg.dropped.2........, and attisdropped is set to true. Such rows keep their positive attnum, so the attnum > 0 filter in test_func only removes the system columns (ctid, xmin, and so on); it is the NOT attisdropped condition that actually skips previously dropped columns from the generated list of column names.
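To see the behavior, drop a column from a scratch copy and inspect pg_attribute; the dropped row keeps its positive attnum and is flagged by attisdropped (a throwaway example):
CREATE TABLE test_drop AS SELECT * FROM test;
ALTER TABLE test_drop DROP COLUMN col3;
SELECT attname, attnum, attisdropped
FROM pg_attribute
WHERE attrelid = 'test_drop'::regclass
AND attnum > 0;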
I usually load a file into a temporary table then insert (or update) from there. In this case,
CREATE TEMP TABLE input (LIKE my_table);
ALTER TABLE input DROP xtra_col;
\copy input from 'some_file.txt' ...
INSERT INTO my_table
SELECT 'b', * FROM input;
The INSERT statement looks tidy, but that can only really be achieved when the columns you want to exclude are on either end of my_table. In your (probably simplified) example, xtra_col is at the front so we can quickly append the remaining columns using *.
If the arrangement of the CSV file's columns differs from my_table by more than that, you'll need to start typing out column names.
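With explicit column lists the position of xtra_col no longer matters, at the cost of the typing this answer tries to avoid (a sketch reusing the staging table above):
INSERT INTO my_table (xtra_col, col1, col2, col3)
SELECT 'b', col1, col2, col3
FROM input;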
I have a large single table of sent emails with dates and outcomes and I'd like to be able to match each row with the last time that email was sent and a specific outcome occurred (here that open=1). This needs to be done with PostgreSQL. For example:
Initial table:
id | sent_dt    | bounced | open | clicked | unsubscribe
1 | 2015-01-01 | 1 | 0 | 0 | 0
1 | 2015-01-02 | 0 | 1 | 1 | 0
1 | 2015-01-03 | 0 | 1 | 1 | 0
2 | 2015-01-01 | 0 | 1 | 0 | 0
2 | 2015-01-02 | 1 | 0 | 0 | 0
2 | 2015-01-03 | 0 | 1 | 0 | 0
2 | 2015-01-04 | 0 | 1 | 0 | 1
Result table:
id | sent_dt    | bounced | open | clicked | unsubscribe | previous_time
1 | 2015-01-01 | 1 | 0 | 0 | 0 | NULL
1 | 2015-01-02 | 0 | 1 | 1 | 0 | NULL
1 | 2015-01-03 | 0 | 1 | 1 | 0 | 2015-01-02
2 | 2015-01-01 | 0 | 1 | 0 | 0 | NULL
2 | 2015-01-02 | 1 | 0 | 0 | 0 | 2015-01-01
2 | 2015-01-03 | 0 | 1 | 0 | 0 | 2015-01-01
2 | 2015-01-04 | 0 | 1 | 0 | 1 | 2015-01-03
I have tried using LAG(), but I don't know how to apply the condition that open must equal 1 while still returning all rows. I also tried a many-to-many join on id followed by finding the minimum date difference, but that essentially squares the size of my table and takes far too long to compute (>7 hrs). There are several answers that would work in other SQL dialects, but none that I can see work for PostgreSQL.
Thanks for any help guys!
You can use ROW_NUMBER() to achieve the desired result: connect each row to the one that occurred just before it, provided that earlier row has open = 1.
SELECT t.*,s.sent_dt
FROM
(SELECT p.*,
ROW_NUMBER() OVER(PARTITION BY ID ORDER BY sent_dt DESC) rnk
FROM YourTable p) t
LEFT OUTER JOIN
(SELECT p.*,
ROW_NUMBER() OVER(PARTITION BY ID ORDER BY sent_dt DESC) rnk
FROM YourTable p) s
ON (t.id = s.id AND t.rnk = s.rnk - 1 AND s.open = 1)
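Note that the rnk join only looks at the row immediately before each one, so it finds an open row only when it is directly adjacent. To match the desired output (the latest earlier row with open = 1, however far back), one alternative is a running window maximum; a sketch assuming the table is named mail, as in the answer below:
SELECT id, sent_dt, bounced, "open", clicked, unsubscribe,
       MAX(CASE WHEN "open" = 1 THEN sent_dt END)
           OVER (PARTITION BY id ORDER BY sent_dt
                 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS previous_time
FROM mail;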
First I create a CTE openFilter with the dates on which the mail was opened.
Then I join the table mail against that filter to get the dates previous to each email. Finally I filter out everything except the latest open mail.
SQL Fiddle Demo
WITH openFilter as (
SELECT m."id", m."sent_dt"
FROM mail m
WHERE "open" = 1
)
SELECT m."id",
to_char(m."sent_dt", 'YYYY-MM-DD'),
"bounced", "open", "clicked", "unsubscribe",
to_char(o."sent_dt", 'YYYY-MM-DD') previous_time
FROM mail m
LEFT JOIN openFilter o
ON m."id" = o."id"
AND m."sent_dt" > o."sent_dt"
WHERE o."sent_dt" = (SELECT MAX(t."sent_dt")
FROM openFilter t
WHERE t."id" = m."id"
AND t."sent_dt" < m."sent_dt")
OR o."sent_dt" IS NULL
Output
| id | to_char | bounced | open | clicked | unsubscribe | previous_time |
|----|------------|---------|------|---------|-------------|---------------|
| 1 | 2015-01-01 | 1 | 0 | 0 | 0 | (null) |
| 1 | 2015-01-02 | 0 | 1 | 1 | 0 | (null) |
| 1 | 2015-01-03 | 0 | 1 | 1 | 0 | 2015-01-02 |
| 2 | 2015-01-01 | 0 | 1 | 0 | 0 | (null) |
| 2 | 2015-01-02 | 1 | 0 | 0 | 0 | 2015-01-01 |
| 2 | 2015-01-03 | 0 | 1 | 0 | 0 | 2015-01-01 |
| 2 | 2015-01-04 | 0 | 1 | 0 | 1 | 2015-01-03 |
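On PostgreSQL 9.3+, the same "latest earlier open row" lookup can also be written with LEFT JOIN LATERAL, which keeps the correlation explicit (a sketch against the same mail table):
SELECT m.*,
       o.sent_dt AS previous_time
FROM mail m
LEFT JOIN LATERAL (
    SELECT sent_dt
    FROM mail o
    WHERE o."id" = m."id"
      AND o."open" = 1
      AND o.sent_dt < m.sent_dt
    ORDER BY o.sent_dt DESC
    LIMIT 1
) o ON true;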
I'm looking to preserve the sid and cid pairs that link my tables when using SELECT DISTINCT in my query. signature, ip_src, and ip_dst are what make a row distinct. I just want the output to also include the corresponding sid and cid pairs.
QUERY:
SELECT DISTINCT signature, ip_src, ip_dst FROM
(SELECT *
FROM event
INNER JOIN sensor ON (sensor.sid = event.sid)
INNER JOIN iphdr ON (iphdr.cid = event.cid) AND (iphdr.sid = event.sid)
WHERE timestamp >= NOW() - '1 day'::INTERVAL
ORDER BY timestamp DESC)
as d_dup;
OUTPUT:
signature | ip_src | ip_dst
-----------+------------+------------
29177 | 3244829114 | 2887777034
29177 | 2960340989 | 2887777034
29179 | 2887777893 | 2887777556
29178 | 1208608738 | 2887777034
29178 | 1211607091 | 2887777034
29177 | 776526845 | 2887777034
29177 | 1332731268 | 2887777034
(7 rows)
SUB QUERY:
SELECT *
FROM event
INNER JOIN sensor ON (sensor.sid = event.sid)
INNER JOIN iphdr ON (iphdr.cid = event.cid) AND (iphdr.sid = event.sid)
WHERE timestamp >= NOW() - '1 day'::INTERVAL
ORDER BY timestamp DESC;
OUTPUT:
sid | cid | signature | timestamp | sid | hostname | interface | filter | detail | encoding | last_cid | sid | cid | ip_src | ip_dst | ip_ver | ip_hlen | ip_tos | ip_len | ip_id | ip_flags | ip_off | ip_ttl | ip_proto | ip_csum
-----+-------+-----------+-------------------------+-----+---------------------+-----------+--------+--------+----------+----------+-----+-------+------------+------------+--------+---------+--------+--------+-------+----------+--------+--------+----------+---------
3 | 13123 | 29177 | 2014-11-15 20:53:14.656 | 3 | VS-101-Z0:dna0:dna1 | dna0:dna1 | | 1 | 0 | 12888 | 3 | 13123 | 3244829114 | 2887777034 | 4 | 5 | 0 | 344 | 19301 | 0 | 0 | 122 | 6 | 8686
3 | 13122 | 29177 | 2014-11-15 20:53:14.43 | 3 | VS-101-Z0:dna0:dna1 | dna0:dna1 | | 1 | 0 | 12888 | 3 | 13122 | 3244829114 | 2887777034 | 4 | 5 | 0 | 69 | 19071 | 0 | 0 | 122 | 6 | 9191
3 | 13121 | 29177 | 2014-11-15 18:45:13.461 | 3 | VS-101-Z0:dna0:dna1 | dna0:dna1 | | 1 | 0 | 12888 | 3 | 13121 | 3244829114 | 2887777034 | 4 | 5 | 0 | 366 | 25850 | 0 | 0 | 122 | 6 | 2115
3 | 13120 | 29177 | 2014-11-15 18:45:13.23 | 3 | VS-101-Z0:dna0:dna1 | dna0:dna1 | | 1 | 0 | 12888 | 3 | 13120 | 3244829114 | 2887777034 | 4 | 5 | 0 | 69 | 25612 | 0 | 0 | 122 | 6 | 2650
3 | 13119 | 29177 | 2014-11-15 18:45:01.887 | 3 | VS-101-Z0:dna0:dna1 | dna0:dna1 | | 1 | 0 | 12888 | 3 | 13119 | 3244829114 | 2887777034 | 4 | 5 | 0 | 352 | 13697 | 0 | 0 | 122 | 6 | 14282
3 | 13118 | 29177 | 2014-11-15 18:45:01.681 | 3 | VS-101-Z0:dna0:dna1 | dna0:dna1 | | 1 | 0 | 12888 | 3 | 13118 | 3244829114 | 2887777034 | 4 | 5 | 0 | 69 | 13464 | 0 | 0 | 122 | 6 | 14798
4 | 51 | 29179 | 2014-11-15 18:44:02.06 | 4 | VS-101-Z1:dna2:dna3 | dna2:dna3 | | 1 | 0 | 51 | 4 | 51 | 2887777893 | 2887777556 | 4 | 5 | 0 | 80 | 18830 | 0 | 0 | 63 | 17 | 40533
3 | 13117 | 29177 | 2014-11-15 18:41:46.418 | 3 | VS-101-Z0:dna0:dna1 | dna0:dna1 | | 1 | 0 | 12888 | 3 | 13117 | 1332731268 | 2887777034 | 4 | 5 | 0 | 261 | 15393 | 0 | 0 | 119 | 6 | 62131
...
(30 rows)
How do I keep the sid and cid when using SELECT DISTINCT?
This is shorter and probably faster:
SELECT DISTINCT ON (signature, ip_src, ip_dst)
signature, ip_src, ip_dst, sid, cid
FROM event e
JOIN sensor s USING (sid)
JOIN iphdr i USING (cid, sid)
WHERE timestamp >= NOW() - '1 day'::interval
ORDER BY signature, ip_src, ip_dst, timestamp DESC;
Assuming you want the latest row (greatest timestamp) from each set of dupes.
Detailed explanation:
Select first row in each GROUP BY group?
Sounds like you are looking for a window function:
SELECT *
FROM (
SELECT *,
row_number() over (partition by signature, ip_src, ip_dst order by timestamp desc) as rn
FROM event
JOIN sensor ON sensor.sid = event.sid
JOIN iphdr ON iphdr.cid = event.cid AND iphdr.sid = event.sid
WHERE timestamp >= NOW() - interval '1' day
) as d_dup
where rn = 1
order by timestamp desc;
Maybe something like this?
SELECT DISTINCT e.sid, e.cid, ip_src, ip_dst
FROM event e
INNER JOIN sensor s ON (s.sid = e.sid)
INNER JOIN iphdr i ON (i.cid = e.cid) AND (i.sid = e.sid)
WHERE timestamp >= NOW() - '1 day'::INTERVAL;
If you want the combination of (signature, ip_src, ip_dst) to be unique in the result (one row for each combination) then you can try something like this:
SELECT max(e.cid), max(e.sid), signature, ip_src, ip_dst
FROM event e
INNER JOIN sensor s ON (s.sid = e.sid)
INNER JOIN iphdr i ON (i.cid = e.cid) AND (i.sid = e.sid)
WHERE timestamp >= NOW() - '1 day'::INTERVAL
GROUP BY signature, ip_src, ip_dst;
But it will give the max cid and the max sid for each combination, and since the two maxes are computed independently, they may not come from the same row.