DB2 cardinality estimation for BETWEEN predicate

This is more of an academic question, since I'm interested in the details of the DB2 query optimizer.
I have a table with 10000 records and no indexes. COLCARD in SYSCAT.TABLES shows 10000, and a column named MesFabricacao has the following column statistics:
COLCARD  HIGH2KEY  LOW2KEY  TABNAME  COLUMN
198      198       2        CARINFO  MESFABRICACAO
According to the core manual “DB2PerfTuneTroubleshoot-db2d3e1011.pdf” (page 451), the cardinality formula for a BETWEEN predicate with no histogram is: ((KEY2 - KEY1) / (HIGH2KEY - LOW2KEY)) * CARD
For the query "SELECT COUNT(*) FROM STATS.CARINFO WHERE MesFabricacao BETWEEN 1 AND 3", db2exfmt shows a filter factor of 0.0151258. By the formula above I would expect (3 - 1) / (198 - 2) ≈ 0.0102, so I can't explain why the QO is using this value for the estimation.
Does anyone have an explanation of why DB2 applies this filter factor? I'm using DB2 10.1.0.0.
(Output from db2exfmt)
Predicates:
2) Sargable Predicate,
Comparison Operator: Less Than or Equal (<=)
Subquery Input Required: No
Filter Factor: 0.0151258
Predicate Text:
--------------
(Q1.MESFABRICACAO <= 3)
3) Sargable Predicate,
Comparison Operator: Less Than or Equal (<=)
Subquery Input Required: No
Filter Factor: 1
Predicate Text:
--------------
(1 <= Q1.MESFABRICACAO)
Input Streams:
-------------
1) From Object STATS.CARINFO
Estimated number of rows: 10000
Number of columns: 2
Subquery predicate ID: Not Applicable
Column Names:
------------
+Q1.$RID$+Q1.MESFABRICACAO
Output Streams:
--------------
2) To Operator #2
Estimated number of rows: 151.258
Number of columns: 0
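For reference, plugging the question's statistics into the manual's formula gives a noticeably different number from what db2exfmt reports; the gap can be checked with plain SQL arithmetic (a sketch; SYSIBM.SYSDUMMY1 is just a convenient one-row table):

```sql
-- Manual's formula: ((KEY2 - KEY1) / (HIGH2KEY - LOW2KEY)) * CARD
-- with KEY1 = 1, KEY2 = 3, LOW2KEY = 2, HIGH2KEY = 198, CARD = 10000
SELECT (3.0 - 1) / (198 - 2)          AS manual_ff,    -- ~0.0102
       (3.0 - 1) / (198 - 2) * 10000  AS manual_rows,  -- ~102
       0.0151258 * 10000              AS observed_rows -- 151.258 per db2exfmt
FROM sysibm.sysdummy1;
```

So the optimizer's 151.258 rows is roughly half again the ~102 rows the published formula predicts, which is the unexplained gap the question is about.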

How to sum the total length of an array-of-uuids column

Currently I have one table consisting of id and otherIds.
I want to calculate the total number of otherIds present in the database.
id: 1, otherIds: {1,2,3,4,5}
id: 2, otherIds: {3,4,5}
id: 3, otherIds: {9,2,1}
id: 4, otherIds: {}
Desired result: 11 (5 + 3 + 3 + 0)
SELECT
sum(jsonb_array_elements("table"."otherIds")) as "sumLength"
FROM
"Table"
LIMIT 1
[42883] ERROR: function jsonb_array_elements(uuid[]) does not exist
I don't see how JSONB is relevant here. If otherIds is an array of UUID values, then wouldn't you just need
SELECT
SUM(ARRAY_LENGTH("Table"."otherIds", 1)) AS "sumLength"
FROM
"Table"
(array_length takes the dimension as a second argument, and with an aggregate there is no need for LIMIT 1)
You can get the number of elements in an array with the cardinality() function. Just sum the results over all rows.
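A minimal sketch of the cardinality() approach, using the sample rows from the question (integer arrays stand in for the uuid[] column, which doesn't change the counting):

```sql
WITH "Table" ("id", "otherIds") AS ( VALUES
  (1, '{1,2,3,4,5}'::int[]),
  (2, '{3,4,5}'::int[]),
  (3, '{9,2,1}'::int[]),
  (4, '{}'::int[])
)
SELECT SUM(cardinality("otherIds")) AS "sumLength"
FROM "Table";
-- cardinality('{}') is 0, so the empty array contributes 0 and the sum is 11
```

Unlike array_length, cardinality returns 0 (not NULL) for an empty array, so no COALESCE is needed.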
I'd like to remark that a table design that includes an array of UUIDs is not pretty and will probably give you performance and data-integrity problems some day.

PostgreSQL: get first non null value per group

What I'd like to obtain is the first occurrence of a non-null value per category.
If there are only null values, the result for that category shall be NULL.
For a table like this:
Category Value
1 NULL
1 1922
2 23
2 99
3 NULL
3 NULL
the result should be
Category Value
1 1922
2 23
3 NULL
How can this be achieved using PostgreSQL?
Unfortunately, the two features that would make this trivial are not implemented in PostgreSQL:
IGNORE NULLS in FIRST_VALUE, LAST_VALUE
FILTER clause in non-aggregate window functions
However, you can hack the desired result using GROUP BY and array_agg, which does support the FILTER clause, and then pick the first element using square-bracket syntax. (Recall that PostgreSQL array indexing starts at 1.)
Also, I would advise providing an explicit ordering for the aggregation step. Otherwise the value that ends up as the first element depends on the query plan and the physical data layout of the underlying table.
WITH vals (category, val) AS ( VALUES
(1,NULL),
(1,1922),
(2,23),
(2,99),
(3,NULL),
(3,NULL)
)
SELECT
category
, (ARRAY_AGG(val) FILTER (WHERE val IS NOT NULL))[1]
FROM vals
GROUP BY 1
produces the following output:
category | array_agg
----------+-----------
1 | 1922
3 |
2 | 23
(3 rows)
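The explicit ordering advised above could look like this; the ord column is hypothetical, standing in for whatever defines "first" in the real table:

```sql
WITH vals (category, ord, val) AS ( VALUES
  (1, 1, NULL),
  (1, 2, 1922),
  (2, 1, 23),
  (2, 2, 99),
  (3, 1, NULL),
  (3, 2, NULL)
)
SELECT
  category
, (ARRAY_AGG(val ORDER BY ord) FILTER (WHERE val IS NOT NULL))[1] AS val
FROM vals
GROUP BY 1
ORDER BY 1;
```

With ORDER BY inside the aggregate, the first element of the array is guaranteed to be the non-null value with the smallest ord, independent of the plan.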

Can PostgreSQL LAG() function refer to itself?

I've just discovered the LAG() function in PostgreSQL and I've been experimenting to see what it can achieve. I thought that I might calculate a factorial with it, so I wrote
SELECT i, i * lag(factorial, 1, 1) OVER (ORDER BY i, 1) as factorial FROM generate_series(1, 10) as i;
But the online IDE complains with 42703: column "factorial" does not exist.
Is there any way I can access the result of previous LAG call?
You can't refer to a column recursively in its own definition.
However, you can express the factorial calculation as:
SELECT i, EXP(SUM(LN(i)) OVER w)::int factorial
FROM generate_series(1, 10) i
WINDOW w AS (ORDER BY i ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW);
-- outputs:
i | factorial
----+-----------
1 | 1
2 | 2
3 | 6
4 | 24
5 | 120
6 | 720
7 | 5040
8 | 40320
9 | 362880
10 | 3628800
(10 rows)
PostgreSQL does support an advanced SQL feature called a recursive query, which can also be used to build the factorial table recursively:
WITH RECURSIVE series AS (
SELECT i FROM generate_series(1, 10) i
)
, rec AS (
SELECT i, 1 factorial FROM series WHERE i = 1
UNION ALL
SELECT series.i, series.i * rec.factorial
FROM series
JOIN rec ON series.i = rec.i + 1
)
SELECT *
FROM rec;
What EXP(SUM(LN(i)) OVER w) does:
This exploits the mathematical identities that:
[1]: log(a * b * c) = log (a) + log (b) + log (c)
[2]: exp (log a) = a
[combining 1&2]: exp(log a + log b + log c) = a * b * c
SQL does not have an aggregate multiply operation, so to perform one we first take the log of each value, then use the SUM aggregate function to obtain the log of the values' product, and finally invert that with the exponentiation step.
This works as long as the values being multiplied are positive, since log is undefined for 0 and for negative numbers. If you have zeros or negative numbers, the trick is: if any value is 0, the whole product is 0; otherwise, if the count of negative values is even, the result is positive, else it is negative (multiplying the logs of the absolute values as before). Alternatively, you could convert the reals to the complex plane and use the identity Log(z) = ln(r) + iπ for negative reals.
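That zero/sign bookkeeping could be sketched as follows (a sketch only; the sample values are made up). Note the FILTER on the SUM, which keeps LN from ever seeing 0:

```sql
SELECT i, v,
  CASE
    WHEN COUNT(*) FILTER (WHERE v = 0) OVER w > 0 THEN 0
    WHEN COUNT(*) FILTER (WHERE v < 0) OVER w % 2 = 1
      THEN -ROUND(EXP(SUM(LN(ABS(v))) FILTER (WHERE v <> 0) OVER w))
    ELSE  ROUND(EXP(SUM(LN(ABS(v))) FILTER (WHERE v <> 0) OVER w))
  END AS running_product
FROM (VALUES (1, 2), (2, -3), (3, 4), (4, 0)) t(i, v)
WINDOW w AS (ORDER BY i ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW);
```

For the sample rows this yields the running products 2, -6, -24, 0: the sign flips with each negative factor, and the first zero pins the product at 0 from then on.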
What ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW does:
This declares an expanding window frame that includes all preceding rows, and the current row.
e.g.
when i equals 1 the values in this window frame are {1}
when i equals 2 the values in this window frame are {1,2}
when i equals 3 the values in this window frame are {1,2,3}
What is a recursive query?
A recursive query lets you express recursive logic using SQL. Recursive queries are often used to traverse parent-child relationships in relational data (think manager-report, or a product classification hierarchy), but they can generally be used to query any tree-like structure.
Here is a SO answer I wrote a while back that illustrates and explains some of the capabilities of recursive queries.
There are also a tonne of useful tutorials on recursive queries. It is a very powerful SQL-language feature and solves a class of problems that is very difficult to handle without recursion.
Hope this gives you more insight into what the code does. Happy learning!

Iterate over current row values in kdb query

Consider the table:
q)trade
stock price amt time
-----------------------------
ibm 121.3 1000 09:03:06.000
bac 5.76 500 09:03:23.000
usb 8.19 800 09:04:01.000
and the list:
q)x: 10000 20000
The following query:
q)select from trade where price < x[first where (x - price) > 100f]
'length
fails as above. How can I pass the current row's value of price into each iteration of the search query?
While price[0] in the square brackets above works, that's obviously not what I want. I even tried price[i], but that gives the same error.

Postgresql - VALUE between two columns

I have a long list of six-digit numbers (e.g. 123456).
In my PostgreSQL DB I have a table with two columns, start_value and end_value. The rows hold start and end values which are 9 digits in length and represent a range of numbers, i.e. start_value might be 123450000 and end_value might be 123459999.
I need to match each of the six-digit numbers with the row in the DB table whose range contains it.
For many numbers in my list I can simply run the following
SELECT * FROM table WHERE start_value=(number + 000)
However, this does not cover numbers which fall inside a range, but do not match this pattern.
I have been trying statements such as:
SELECT * FROM table WHERE start_value > (number + 000) AND end_value < (number + 999)
But this doesn't work because some rows cover larger ranges than xxxxx0000 to xxxxx9999 and so the statement above may return 20 rows or none.
Any pointers would be most welcome!
EDIT: the Data Type of the columns are numeric(25)
Assuming number is numeric:
select *
from table
where number * 1000 between start_value and end_value
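A quick sketch of that predicate against made-up ranges (the first range is the one from the question):

```sql
WITH ranges (start_value, end_value) AS ( VALUES
  (123450000::numeric, 123459999::numeric),
  (900000000::numeric, 999999999::numeric)
)
SELECT *
FROM ranges
WHERE 123456 * 1000 BETWEEN start_value AND end_value;
-- 123456000 falls in 123450000 .. 123459999, so only the first row matches
```

Multiplying by 1000 pads the six-digit number to nine digits, which is exactly the scale of the stored ranges.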
OK, so if I'm understanding correctly, first you need to pad your search value to 9 digits. You can do that with this: 12345 * (10 ^ (9 - length(12345::text))).
length(12345::text) gets the number of digits you currently have; the expression subtracts that from 9 and multiplies your search value by 10 to the power of the result. Then you just plug it into your search. The resulting query looks something like this -
SELECT * FROM table WHERE (12345 * (10 ^ (9 - length(12345::text)))) > start_value AND (12345 * (10 ^ (9 - length(12345::text)))) < end_value
You could also use the BETWEEN operator, but it is inclusive, which doesn't match the example query you have.
POSTGRESQL
Sometimes we get stuck on data-type casting problems and null-value exceptions.
SELECT *
FROM TABLE
WHERE COALESCE(number::int8, 0::int8) * 1000 BETWEEN start_value::int8 AND end_value::int8
;
number::int8 casts to bigint
start_value::int8 casts to bigint
COALESCE(number::int8, 0::int8) returns the number, or zero if the value is NULL, to avoid exceptions