What is the difference between lj and ljf?
Looking at https://code.kx.com/q/ref/lj/ I don't see ljf getting mentioned.
The lj behaviour was changed in kdb+ 3.0. The old behaviour is preserved by the ljf function. From TimeStored:
The joins in 3.x for uj/ij and lj all changed how they treat nulls from the keyed table. In particular, nulls now by default overwrite existing values. In the past, nulls from the joining table did not overwrite and left the original value in the column.
For more details and examples visit TimeStored.
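A minimal q sketch of the difference, with illustrative table and column names:
t:([] s:`a`b; v:1 2)     / plain table
kt:([s:`a`b] v:10 0N)    / keyed table with a null v for `b
t lj kt                  / 3.0+ behaviour: v is 10 0N, the null overwrites 2
t ljf kt                 / pre-3.0 behaviour: v is 10 2, the null does not overwrite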
The following query is not randomizing the array in Postgres 10. Is this expected behaviour?
select array(select generate_series(1,10) order by random());
v9.4.15
array
------------------------
{7,1,10,6,2,8,9,4,5,3}
v10.4
array
------------------------
{1,2,3,4,5,6,7,8,9,10}
This is a consequence of commit 69f4b9c85f168ae006929eec44fc44d569e846b9, which changed how set-returning functions in the SELECT list are handled.
Tim's answer and your comment show how to deal with the problem.
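One common fix (not necessarily what Tim's answer shows, since it is not quoted here) is to move the set-returning function into the FROM list, where random() is evaluated once per row:
SELECT array(SELECT i FROM generate_series(1,10) AS i ORDER BY random());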
I think the issue here is that the newer version of Postgres has a smarter optimizer, which caches the value of random() after a single call to that function.
One workaround is to force a new random value to be calculated for each record. We can add a dummy WHERE clause to force this:
WITH cte AS (
    SELECT generate_series(1,10) AS col
)
SELECT col
FROM cte
WHERE col IS NOT NULL
ORDER BY random();
Demo
You may observe in the demo that the order is in fact random. However, in the same demo, if you run your original query, the order won't be random.
Edit:
The reason why this trick works is that the WHERE clause convinces the optimizer that you really care about the values being used in each record. Therefore, it calls the function in ORDER BY once for each record rather than caching it.
Is there a configuration option somewhere or anything that will allow me to force postgres to use NULLS LAST on every query that uses DESC ordering?
I don't want to rewrite all queries from Criteria API to JPQL in my app, and it seems the JPA Criteria API does not allow setting the nulls-last option.
No. At least I have never heard of such an option. A simple check does not give hope for it either:
t=# select setting, name from pg_settings where name like '%null%';
 setting |         name
---------+-----------------------
 on      | array_nulls
 off     | transform_null_equals
(2 rows)
https://www.postgresql.org/docs/current/static/queries-order.html does not mention such global switches either, just:
The NULLS FIRST and NULLS LAST options can be used to determine
whether nulls appear before or after non-null values in the sort
ordering. By default, null values sort as if larger than any non-null
value; that is, NULLS FIRST is the default for DESC order, and NULLS
LAST otherwise.
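So the only option is to spell it out per query. A minimal sketch (table and column names are placeholders):
-- my_table and my_column are hypothetical names
SELECT *
FROM my_table
ORDER BY my_column DESC NULLS LAST;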
I'm selecting distinct values from tables through Java's JDBC connector, and it seems that the NULL value (if there is any) is always the first row in the ResultSet.
I need to remove this NULL from the List where I load this ResultSet. The logic looks only at the first element and, if it's null, ignores it.
I'm not using any ORDER BY in the query, can I still trust that logic? I can't find any reference in Postgres' documentation about this.
You can add a check for NOT NULL, like:
select distinct columnName
from Tablename
where columnName IS NOT NULL
Also, if you are not providing the ORDER BY clause, the order in which you are going to get the results is not guaranteed, hence you cannot rely on it. So it is better and recommended to provide the ORDER BY clause if you want your results in a particular order (i.e., ascending or descending).
If you are looking for a reference, the PostgreSQL documentation says:
If ORDER BY is not given, the rows are returned in whatever order the
system finds fastest to produce.
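Combining both suggestions, with the same placeholder names as above:
select distinct columnName
from Tablename
where columnName IS NOT NULL
order by columnName  -- deterministic ascending order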
If it is not stated in the manual, I wouldn't trust it. However, just for fun, and to try to figure out what logic is being used, running the following query does bring the NULL (for no apparent reason) to the top, while all other values are in an apparently random order:
with t(n) as (values (1),(2),(1),(3),(null),(8),(0))
select distinct * from t
However, cross joining the table with a modified version of itself brings two NULLs to the top, but leaves the remaining NULLs randomly dispersed throughout the resultset. So it doesn't seem to have a clear-cut logic clumping all NULL values at the top.
with t(n) as (values (1),(2),(1),(3),(null),(8),(0))
select distinct * from t
cross join (select n+3 as m from t) t2
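One way to see where this ordering comes from is to look at the plan: a hash-based DISTINCT (HashAggregate) makes no ordering promise at all, while a sort-based one (Unique over Sort) would order the values:
EXPLAIN
with t(n) as (values (1),(2),(1),(3),(null),(8),(0))
select distinct * from t;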
How can I modify this query to improve it?
I think that doing it with a join would be better.
UPDATE t1 HIJA
SET IND_ESTADO = 'P'
WHERE IND_ESTADO = 'D'
AND NOT EXISTS
(SELECT COD_OPERACION
FROM t1 PADRE
WHERE PADRE.COD_SISTEMA_ORIGEN = HIJA.COD_SISTEMA_ORIGEN
AND PADRE.COD_OPERACION = HIJA.COD_OPERACION_DEPENDIENTE)
Best regards.
According to this article by Quassnoi:
Oracle's optimizer is able to see that NOT EXISTS, NOT IN and LEFT JOIN / IS NULL are semantically equivalent as long as the list values are declared as NOT NULL.
It uses the same execution plan for all three methods, and they yield the same results in the same time.
In Oracle, it is safe to use any method of the three described above to select values from a table that are missing in another table.
However, if the values are not guaranteed to be NOT NULL, LEFT JOIN / IS NULL or NOT EXISTS should be used rather than NOT IN, since the latter will produce different results depending on whether or not there are NULL values in the subquery resultset.
So what you have is already fine. A JOIN would be as good, but not better.
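For reference, the anti-join condition rewritten as LEFT JOIN / IS NULL (a sketch of the SELECT form only; in Oracle the UPDATE itself would still use NOT EXISTS or MERGE):
SELECT HIJA.*
FROM t1 HIJA
LEFT JOIN t1 PADRE
  ON PADRE.COD_SISTEMA_ORIGEN = HIJA.COD_SISTEMA_ORIGEN
 AND PADRE.COD_OPERACION = HIJA.COD_OPERACION_DEPENDIENTE
WHERE HIJA.IND_ESTADO = 'D'
  AND PADRE.COD_OPERACION IS NULL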
If performance is a problem, there are several guidelines for re-writing a where not exists into a more efficient form:
When given the choice between not exists and not in, most DBAs prefer to use the not exists clause.
When SQL includes a not in clause, a subquery is generally used, while with not exists, a correlated subquery is used.
In many cases a NOT IN will produce the same execution plan as a NOT EXISTS query or a not-equal query (!=).
In some cases a correlated NOT EXISTS subquery can be re-written with a standard outer join with a NOT NULL test.
Some NOT EXISTS subqueries can be tuned using the MINUS operator (see the sketch after this list).
See Burleson for more information.
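As an illustration of the MINUS idea, using the same table and columns as the query above (a sketch, not tested against the real schema; it yields the child keys with no matching parent, which would still need to be joined back for the UPDATE):
SELECT COD_SISTEMA_ORIGEN, COD_OPERACION_DEPENDIENTE
FROM t1
WHERE IND_ESTADO = 'D'
MINUS
SELECT COD_SISTEMA_ORIGEN, COD_OPERACION
FROM t1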
I have a table of time series data where, for almost all queries, I wish to select data ordered by collection time. I do have a timestamp column, but I do not want to rely on actual timestamps alone, because if two entries have the same timestamp it is crucial that I be able to sort them in the order they were collected, which is information I have at insert time.
My current schema just has a timestamp column. How would I alter my schema to make sure I can sort based on collection/insertion time, and make sure querying in collection/insertion order is efficient?
Add a column based on a sequence (i.e. serial), and create an index on (timestamp_column, serial_column). Then you can get insertion order (more or less) by doing:
ORDER BY timestamp_column, serial_column;
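A minimal sketch of the schema change (the table name readings is a placeholder; the serial pseudo-type creates and fills the backing sequence automatically):
-- readings is a hypothetical table name
ALTER TABLE readings ADD COLUMN serial_column serial;
CREATE INDEX readings_ts_serial_idx ON readings (timestamp_column, serial_column);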
You could use a SERIAL column called insert_order. This way no two rows will have the same value. However, I am not sure that your requirement of being in absolute time order is possible to achieve.
For example, suppose there are two transactions, T1 and T2, and they happen at the same time, and you are running on a machine with multiple processors, so in fact both T1 and T2 did the insert at exactly the same instant. Is this a case that you are concerned about? There was not enough info in your question to know exactly.
Also, with a serial column you have the issue of gaps: for example, T1 could grab serial value 14 and T2 could grab value 15, then T1 rolls back and T2 does not, so you have to expect that the insert_order column might have gaps in it.