PostgreSQL: exclude data based on other incomplete data

In this data, there are multiple DATA_ID values, each associated with time-series data. I am trying to exclude all data for any DATA_ID that has a NULL value for USE at any timestamp.
In other words, I only want to return DATA_ID values (and their data) if they have complete (non-NULL) USE values for all timestamps.
A sample query is given below:
SELECT
My.Table.DATA_ID,
MY.Table.timestamp,
My.Table.USE
FROM
My.TABLE
WHERE timestamp BETWEEN '2012-06-01 00:00:00' AND '2012-06-02 23:59:59'
-- Something here that says exclude all data from DATA_ID(s)
-- with any missing USE data, i.e. USE=NULL
ORDER BY DATA_ID, timestamp

Assuming I understand your question correctly, you want to exclude whole batches of samples (all rows sharing a data_id within the queried timestamp range) that contain a null value:
SELECT
o.DATA_ID,
o.timestamp,
o.USE
FROM
My.Table o
WHERE o.timestamp BETWEEN '2012-06-01 00:00:00' AND '2012-06-02 23:59:59'
and not exists (select 1 from My.Table i
where i.use is null
and i.data_id = o.data_id
and i.timestamp BETWEEN '2012-06-01 00:00:00' AND '2012-06-02 23:59:59')
ORDER BY DATA_ID, timestamp
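A minimal, runnable sketch of the NOT EXISTS approach, assuming SQLite (through Python's built-in sqlite3) as a stand-in for PostgreSQL. The table name readings, the column ts, and the sample rows are hypothetical; timestamps are stored as ISO-8601 text so that BETWEEN compares them correctly.

```python
import sqlite3

# Hypothetical rows: (data_id, ts, use). ISO-8601 strings stand in for
# PostgreSQL timestamps; SQLite compares them correctly as text.
ROWS = [
    (1, "2012-06-01 00:00:00", 5.0),
    (1, "2012-06-01 01:00:00", 6.0),
    (2, "2012-06-01 00:00:00", 7.0),
    (2, "2012-06-01 01:00:00", None),  # one NULL excludes every row of data_id 2
    (3, "2012-06-01 00:00:00", 1.0),
    (3, "2012-05-01 00:00:00", None),  # NULL outside the window is ignored
]

# Keep a row only if no row with the same data_id inside the window has a NULL use.
QUERY = """
SELECT o.data_id, o.ts, o.use
FROM readings o
WHERE o.ts BETWEEN :lo AND :hi
  AND NOT EXISTS (SELECT 1 FROM readings i
                  WHERE i.use IS NULL
                    AND i.data_id = o.data_id
                    AND i.ts BETWEEN :lo AND :hi)
ORDER BY o.data_id, o.ts
"""


def complete_series(rows, lo, hi):
    """Return only the rows of data_ids whose use values are complete in [lo, hi]."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (data_id INTEGER, ts TEXT, use REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY, {"lo": lo, "hi": hi}).fetchall()
    conn.close()
    return result


for row in complete_series(ROWS, "2012-06-01 00:00:00", "2012-06-02 23:59:59"):
    print(row)
```

Only data_ids 1 and 3 come back: data_id 3 survives because its NULL lies outside the queried window.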

The simple thing to do is something like this:
CREATE FUNCTION missing_info(MY.TABLE)
RETURNS BOOL
LANGUAGE SQL AS
$$ select $1.use is null -- chain conditions together with or.
-- no from clause needed. no where clause needed.
$$;
Then you can just add (with the table aliased, e.g. FROM My.Table o):
where o.missing_info is not true;
As you need to change the logic for what sorts of info count as missing, you can just change it in the function and everything still works.
This is the sort of encapsulation of derived information where ORDBMSs like PostgreSQL really shine.
Edit: Re-reading your example, it looks like what you are looking for is the IS NULL operator. However, if you need to reuse some sort of logic, see the example above. NULL never "equals" NULL (because we can't say whether two unknown values are the same), but IS NULL tells you whether a value is NULL or not.
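To illustrate the difference between = NULL and IS NULL, here is a small sketch using SQLite through Python's sqlite3 (an assumption; PostgreSQL follows the same three-valued logic here). The table t and its values are hypothetical.

```python
import sqlite3


def count_matches(predicate):
    """Count rows of a tiny hypothetical table that satisfy `predicate`.

    The predicate is pasted into the SQL text only because it is a fixed
    demo string; never do this with user input.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (use REAL)")
    conn.executemany("INSERT INTO t VALUES (?)", [(1.0,), (None,), (None,)])
    (n,) = conn.execute(f"SELECT count(*) FROM t WHERE {predicate}").fetchone()
    conn.close()
    return n


print(count_matches("use = NULL"))   # 0: NULL = NULL is unknown, so no row passes
print(count_matches("use IS NULL"))  # 2
```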

Related

PostgreSQL: order null dates to the bottom

I have a SQL table with a datetime field. The field in question can be null. I have a query, and I want the results sorted in ascending order by the datetime field; however, I want rows where the datetime field is null at the end of the list, not at the beginning.
Is there a simple way to accomplish that?
select MyDate
from MyTable
order by case when MyDate is null then 1 else 0 end, MyDate
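A minimal sketch of the CASE-based sort key, run in SQLite through Python's sqlite3 as a portable stand-in (an assumption; the same CASE expression works in PostgreSQL, SQL Server and MySQL). The sample dates are hypothetical.

```python
import sqlite3


def sorted_nulls_last(dates):
    """Sort hypothetical MyDate values ascending, NULLs last, via a CASE key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (MyDate TEXT)")
    conn.executemany("INSERT INTO MyTable VALUES (?)", [(d,) for d in dates])
    # First key: 0 for real dates, 1 for NULL; second key: the date itself.
    rows = conn.execute(
        "SELECT MyDate FROM MyTable "
        "ORDER BY CASE WHEN MyDate IS NULL THEN 1 ELSE 0 END, MyDate"
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


print(sorted_nulls_last(["2020-03-01", None, "2019-12-31", None]))
# ['2019-12-31', '2020-03-01', None, None]
```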
(A "bit" late, but this hasn't been mentioned at all)
You didn't specify your DBMS.
In standard SQL (and most modern DBMSs like Oracle, PostgreSQL, DB2, Firebird, Apache Derby, HSQLDB and H2) you can specify NULLS LAST or NULLS FIRST:
Use NULLS LAST to sort them to the end:
select *
from some_table
order by some_column DESC NULLS LAST
I also just stumbled across this, and the following seems to do the trick for me on MySQL and PostgreSQL:
ORDER BY date IS NULL, date DESC
as found at https://stackoverflow.com/a/7055259/496209
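A minimal sketch of ORDER BY x IS NULL, x DESC, run in SQLite through Python's sqlite3 instead of MySQL or PostgreSQL (an assumption for portability). In all three, x IS NULL is false/0 for real values and true/1 for NULLs, so the NULL rows sort after the rest; the sample dates are hypothetical.

```python
import sqlite3


def newest_first_nulls_last(dates):
    """Sort hypothetical dates newest first, with NULLs at the end."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (MyDate TEXT)")
    conn.executemany("INSERT INTO MyTable VALUES (?)", [(d,) for d in dates])
    # MyDate IS NULL yields 0 for real dates and 1 for NULLs, so it acts as
    # the primary key; the dates themselves then sort descending.
    rows = conn.execute(
        "SELECT MyDate FROM MyTable ORDER BY MyDate IS NULL, MyDate DESC"
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


print(newest_first_nulls_last(["2019-12-31", None, "2020-03-01"]))
# ['2020-03-01', '2019-12-31', None]
```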
If your engine allows ORDER BY x IS NULL, x or ORDER BY x NULLS LAST, use that. But if it doesn't, these (written for SQL Server's two-argument ISNULL) might help:
If you're sorting by a numeric type, you can do this (borrowing the schema from another answer):
SELECT *
FROM Employees
ORDER BY ISNULL(DepartmentId*0,1), DepartmentId;
Any non-null number becomes 0, and nulls become 1, which sorts nulls last because 0 < 1.
You can also do this for strings:
SELECT *
FROM Employees
ORDER BY ISNULL(LEFT(LastName,0),'a'), LastName
Any non-null string becomes '', and nulls become 'a', which sorts nulls last because '' < 'a'.
This even works with dates by coercing to a nullable int and using the method for ints above:
SELECT *
FROM Employees
ORDER BY ISNULL(CONVERT(INT, HireDate)*0, 1), HireDate
(Let's pretend the schema has HireDate.)
These methods avoid having to come up with or manage a "maximum" value for every type, or to fix queries if the data type (and its maximum) changes; both are issues that other ISNULL solutions suffer from. Plus, they're much shorter than a CASE.
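The tricks above use SQL Server's two-argument ISNULL and LEFT. As a minimal sketch, here they are translated to SQLite through Python's sqlite3 (an assumption for portability): SQLite spells the two-argument form IFNULL and has substr(x, 1, 0) in place of LEFT(x, 0). The Employees rows are hypothetical, and the HireDate variant is left out.

```python
import sqlite3

# Hypothetical Employees rows: (LastName, DepartmentId)
EMPLOYEES = [("Smith", 3), (None, 1), ("Adams", None), ("Baker", 2)]


def run(order_by):
    """Return all Employees rows sorted by the given ORDER BY expression list."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employees (LastName TEXT, DepartmentId INTEGER)")
    conn.executemany("INSERT INTO Employees VALUES (?, ?)", EMPLOYEES)
    # order_by is a fixed demo string, not user input.
    rows = conn.execute(
        f"SELECT LastName, DepartmentId FROM Employees ORDER BY {order_by}"
    ).fetchall()
    conn.close()
    return rows


# Numeric: any non-null number times 0 is 0, NULL becomes 1, so NULL sorts last.
by_dept = run("IFNULL(DepartmentId * 0, 1), DepartmentId")
# String: any non-null string becomes '', NULL becomes 'a', so NULL sorts last.
by_name = run("IFNULL(substr(LastName, 1, 0), 'a'), LastName")
print(by_dept)  # [(None, 1), ('Baker', 2), ('Smith', 3), ('Adams', None)]
print(by_name)  # [('Adams', None), ('Baker', 2), ('Smith', 3), (None, 1)]
```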
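You can use the built-in ISNULL function to check for null, as below. I tested it and it's working fine.
select MyDate from MyTable order by ISNULL(MyDate,1) DESC, MyDate ASC;
Note that in SQL Server, where ISNULL(MyDate,1) turns NULL into a very early date, the leading DESC key returns the non-null dates in descending order, with the NULLs at the end.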
Replace NULL with a date far in the future:
order by coalesce(MyDate, '9999-12-31')
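When your order column is numeric (like a rank), you can multiply it by -1 and then order descending. It will keep the order you're expecting but put NULL last. This relies on NULLs sorting first in ascending order, as in SQL Server and MySQL; in PostgreSQL and Oracle, NULLs come first in descending order by default, so use NULLS LAST there instead.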
select *
from table
order by -rank desc
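A minimal sketch of the -rank DESC trick in SQLite through Python's sqlite3 (an assumption). SQLite treats NULL as the smallest value, like SQL Server and MySQL, so the trick works here; in PostgreSQL the same ORDER BY would put the NULL first. The rank values are hypothetical.

```python
import sqlite3


def ranks_nulls_last(ranks):
    """Sort hypothetical ranks ascending with NULLs last using -rank DESC."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (rank INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(r,) for r in ranks])
    # Negating reverses the numbers and DESC reverses them back; -NULL is
    # still NULL, and SQLite places NULLs last in descending order.
    rows = conn.execute("SELECT rank FROM t ORDER BY -rank DESC").fetchall()
    conn.close()
    return [r[0] for r in rows]


print(ranks_nulls_last([3, None, 1, 2]))  # [1, 2, 3, None]
```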
In Oracle, you can use NULLS FIRST or NULLS LAST to specify that NULL values should be returned before or after non-NULL values:
ORDER BY column-Name [ ASC | DESC ] [ NULLS FIRST | NULLS LAST ]
For example:
ORDER BY date DESC NULLS LAST
Ref: http://docs.oracle.com/javadb/10.8.3.0/ref/rrefsqlj13658.html
If you're using MariaDB, they mention the following in the NULL Values documentation.
Ordering
When you order by a field that may contain NULL values, any NULLs are considered to have the lowest value. So ordering in DESC order will see the NULLs appearing last. To force NULLs to be regarded as highest values, one can add another column which has a higher value when the main field is NULL.
Example:
SELECT col1 FROM tab ORDER BY ISNULL(col1), col1;
Descending order, with NULLs first:
SELECT col1 FROM tab ORDER BY IF(col1 IS NULL, 0, 1), col1 DESC;
All NULL values are also regarded as equivalent for the purposes of the DISTINCT and GROUP BY clauses.
The above shows two ways to order by NULL values; you can combine these with the ASC and DESC keywords as well. For example, the other way to get the NULL values first would be:
SELECT col1 FROM tab ORDER BY ISNULL(col1) DESC, col1;
-- ^^^^
SELECT *
FROM Employees
ORDER BY ISNULL(DepartmentId, 99999);
See this blog post.
Thanks to RedFilter for providing an excellent solution to the nagging issue of sorting a nullable datetime field.
I am using a SQL Server database for my project.
Changing the datetime null value to '1' does solve the sorting problem for a datetime column. However, if the column has a data type other than datetime, this approach fails.
To handle sorting a varchar column, I tried using 'ZZZZZZZ', as I knew the column does not have values beginning with 'Z'. It worked as expected.
Along the same lines, I used the max value + 1 for int and other data types to get the expected sort. This also gave me the required results.
However, it would always be ideal to have something simpler in the database engine itself that could do something like:
Order by Col1 Asc Nulls Last, Col2 Asc Nulls First
As mentioned in the answer provided by a_horse_with_no_name.
The solution using CASE is universal, but then indexes are not used:
order by case when MyDate is null then 1 else 0 end, MyDate
In my case, I needed performance, so I split the query in two:
SELECT someCol1,someCol2
FROM someSch.someTab
WHERE someCol2 = 2101 and ( someCol1 IS NULL )
UNION
SELECT someCol1,someCol2
FROM someSch.someTab
WHERE someCol2 = 2101 and ( someCol1 IS NOT NULL)
Note that SQL does not guarantee the order of rows returned by a UNION without an ORDER BY, so this relies on engine behavior rather than on the standard.
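Use the NVL function (Oracle):
select * from MyTable order by NVL(MyDate, to_date('1-1-1','DD-MM-YYYY'))
Because this substitutes a very early date, NULLs sort first in ascending order; substitute a far-future date instead to sort them last.
Here's the equivalent of NVL in the most popular DBMSs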
order by -cast([nativeDateModify] as bigint) desc

Is there a way to create a generated column in postgresql that will store a boolean value from comparing 2 dates?

I am trying to create a new generated column called memberstat, a boolean that holds true or false depending on whether the current date is later than the expiration date.
So far, whenever I create memberstat boolean generated always as (case when expiredate < current_date then '0' else '1' end) stored, it fails with ERROR: generation expression is not immutable.
Is there a way around this? Sorry, I am not that familiar with PostgreSQL.
As you have seen, CURRENT_DATE cannot be used in a generated column definition, because its value changes without the values in the row changing. A generated column must be computed only from data in the same row, using immutable expressions.
The solution is to create a view using either your case statement or a simple comparison operator.
CREATE VIEW expired_or_not AS
SELECT
product_id,
product_name,
expiredate < current_date AS "expired"
FROM table_name;
which will return t or f.
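A minimal sketch of the view approach, assuming SQLite through Python's sqlite3 as a stand-in for PostgreSQL: date('now') plays the role of current_date, and the comparison yields 1/0 instead of t/f. The product rows are hypothetical.

```python
import sqlite3


def expired_flags(products):
    """Build the view from the answer over hypothetical rows and read it back."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_name (product_id INTEGER, product_name TEXT, expiredate TEXT)"
    )
    conn.executemany("INSERT INTO table_name VALUES (?, ?, ?)", products)
    # date('now') is evaluated each time the view is queried and is never
    # stored, which is exactly what a generated column cannot do.
    conn.execute(
        "CREATE VIEW expired_or_not AS "
        "SELECT product_id, product_name, expiredate < date('now') AS expired "
        "FROM table_name"
    )
    rows = conn.execute(
        "SELECT product_id, expired FROM expired_or_not ORDER BY product_id"
    ).fetchall()
    conn.close()
    return rows


print(expired_flags([(1, "milk", "2000-01-01"), (2, "salt", "2999-01-01")]))
# [(1, 1), (2, 0)]
```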

how to coalesce timestamp with not null constraint postgres

insert into employee(eid,dojo) SELECT
14,coalesce(to_char(dojo,'dd-mm-yyyy'),'')
from employee;
I have to insert into a table by selecting from that same table. My column dojo has a NOT NULL constraint, and a timestamp column doesn't accept ''. Please suggest an alternative for when the timestamp returned by the select query is null.
Your current query has several problems, two of which I think my answer can resolve. First, you are trying to insert an empty string '' to handle NULL values in the dojo column. This won't work, because an empty string is not a valid timestamp. As others have pointed out, one solution would be to use current_timestamp as a placeholder.
Another problem is that you are using to_char incorrectly to format your timestamp data. The output of to_char is a string, and the way you are using it would cause Postgres to reject it. Instead, you should be using to_timestamp(), which parses a string and returns a timestamp. Something like the following is what I believe you intend to do:
insert into employee (eid, dojo)
select 14, coalesce(to_timestamp(dojo, 'DD/MM/YYYY HH:MI:SS PM'), current_timestamp)
from employee;
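This assumes that dojo is stored as text formatted as follows:
DD/MM/YYYY HH:MI:SS PM (e.g. 19/2/1995 12:00:00 PM)
If dojo is already of type timestamp, no conversion is needed at all: coalesce(dojo, current_timestamp).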
It is also not clear to me why you are inserting back into the employee table, which has unusable data, rather than inserting into a new table. If you choose to reuse employee, you might want to scrub away the bad data later.
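You can use some default date value like 1 Jan 1900 or now().
Your query should look like this (dojo is already a timestamp, so there is no need for to_char, whose text result could not be combined with now() in coalesce anyway):
insert into employee(eid,dojo) SELECT
14,coalesce(dojo,now())
from employee;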
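A minimal sketch of the coalesce-with-default idea, assuming SQLite through Python's sqlite3 as a stand-in for PostgreSQL (CURRENT_TIMESTAMP plays the role of now()). The nullable source table src is hypothetical, since employee.dojo itself cannot hold NULLs under its NOT NULL constraint.

```python
import sqlite3


def copy_rows(use_coalesce):
    """Copy hypothetical dojo values into a NOT NULL column, optionally defaulting NULLs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (eid INTEGER, dojo TEXT NOT NULL)")
    # Hypothetical nullable source table.
    conn.execute("CREATE TABLE src (dojo TEXT)")
    conn.executemany("INSERT INTO src VALUES (?)", [("1995-02-19 12:00:00",), (None,)])
    expr = "coalesce(dojo, CURRENT_TIMESTAMP)" if use_coalesce else "dojo"
    try:
        # Without coalesce, the NULL row violates NOT NULL and raises IntegrityError.
        conn.execute(f"INSERT INTO employee (eid, dojo) SELECT 14, {expr} FROM src")
        return [r[0] for r in conn.execute("SELECT dojo FROM employee ORDER BY rowid")]
    finally:
        conn.close()


print(copy_rows(True))  # the second value is the current timestamp
```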
There is no such thing as a non-null yet blank timestamp. NULL = blank.
There is literally nothing you can do but store a valid timestamp or a null. Since you have a non-null constraint your only option is to pick a default timestamp that you consider "blank".
Using a hard-coded date to indicate a blank value is a terrible, terrible, terrible idea, btw. If it is blank, remove the not null constraint, make it null and move on.
I am not trying to be condescending but I do not think you understand nulls. See here
https://en.wikipedia.org/wiki/Null_(SQL)

date_trunc on timestamp column returns nothing

I have a strange problem when retrieving records from the db after comparing a field truncated with date_trunc().
This query doesn't return any data:
select id from my_db_log
where date_trunc('day',creation_date) >= to_date('2014-03-05'::text,'yyyy-mm-dd');
But if I add the column creation_date alongside id, then it returns data (i.e. select id, creation_date...).
I have another column, last_update_date, of the same type, and when I use that one, it behaves the same way.
select id from my_db_log
where date_trunc('day',last_update_date) >= to_date('2014-03-05'::text,'yyyy-mm-dd');
As with the previous one, it returns records if I select id, last_update_date.
To dig further, I added both creation_date and last_update_date to my where clause, and this time it requires both of them in my select clause to return records (i.e. select id, creation_date, last_update_date).
Has anyone ever encountered the same problem? The same thing works with my other tables that have these types of columns!
If it helps, here is my table schema:
id serial NOT NULL,
creation_date timestamp without time zone NOT NULL DEFAULT now(),
last_update_date timestamp without time zone NOT NULL DEFAULT now(),
CONSTRAINT db_log_pkey PRIMARY KEY (id),
I asked a different question earlier that didn't get any answers. This problem may be related to that one. If you are interested in that one, here is the link.
EDIT: EXPLAIN (FORMAT XML) with select * returns:
<explain xmlns="http://www.postgresql.org/2009/explain">
<Query>
<Plan>
<Node-Type>Result</Node-Type>
<Startup-Cost>0.00</Startup-Cost>
<Total-Cost>0.00</Total-Cost>
<Plan-Rows>1000</Plan-Rows>
<Plan-Width>658</Plan-Width>
<Plans>
<Plan>
<Node-Type>Result</Node-Type>
<Parent-Relationship>Outer</Parent-Relationship>
<Alias>my_db_log</Alias>
<Startup-Cost>0.00</Startup-Cost>
<Total-Cost>0.00</Total-Cost>
<Plan-Rows>1000</Plan-Rows>
<Plan-Width>658</Plan-Width>
<Node/s>datanode1</Node/s>
<Coordinator-quals>(date_trunc('day'::text, creation_date) >= to_date('2014-03-05'::text, 'yyyy-mm-dd'::text))</Coordinator-quals>
</Plan>
</Plans>
</Plan>
</Query>
</explain>
"Impossible" phenomenon
The number of rows returned is completely independent of items in the SELECT clause. (But see #Craig's comment about SRFs.) Something must be broken in your db.
Maybe a broken covering index? When you throw in the additional column, you force Postgres to visit the table itself. Try to re-index:
REINDEX TABLE my_db_log;
The manual on REINDEX. Or:
VACUUM FULL ANALYZE my_db_log;
Better query
Either way, use instead:
select id from my_db_log
where creation_date >= '2014-03-05'::date
Or:
select id from my_db_log
where creation_date >= '2014-03-05 00:00'::timestamp
'2014-03-05' is in ISO 8601 format. You can just cast this string literal to date. No need for to_date(), works with any locale. The date is coerced to timestamp [without time zone] automatically when compared to creation_date (being timestamp [without time zone]). More details about timestamps in Postgres here:
Ignoring timezones altogether in Rails and PostgreSQL
Also, you gain nothing by throwing in date_trunc() here. On the contrary, your query will be slower, and no plain index on the column can be used (potentially making it much slower).
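A minimal sketch of why the plain comparison is preferable, assuming SQLite through Python's sqlite3 as a stand-in for PostgreSQL. SQLite has no date_trunc(), so date(creation_date) plays its role (an assumption), and the rows are hypothetical. Both filters return the same ids, but only the plain one lets the planner search the index instead of scanning.

```python
import sqlite3

ROWS = [("2014-03-04 23:59:59",), ("2014-03-05 00:00:00",), ("2014-03-06 08:30:00",)]

# date(creation_date) stands in for date_trunc('day', creation_date).
WRAPPED = "SELECT id FROM my_db_log WHERE date(creation_date) >= ?"
PLAIN = "SELECT id FROM my_db_log WHERE creation_date >= ?"


def run(cutoff="2014-03-05"):
    """Return {name: (matching ids, query plan text)} for both filters."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_db_log (id INTEGER PRIMARY KEY, creation_date TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX my_db_log_creation_idx ON my_db_log (creation_date)")
    conn.executemany("INSERT INTO my_db_log (creation_date) VALUES (?)", ROWS)
    out = {}
    for name, sql in (("wrapped", WRAPPED), ("plain", PLAIN)):
        ids = sorted(r[0] for r in conn.execute(sql, (cutoff,)))
        # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
        plan = " ".join(
            str(r[-1]) for r in conn.execute("EXPLAIN QUERY PLAN " + sql, (cutoff,))
        )
        out[name] = (ids, plan)
    conn.close()
    return out


for name, (ids, plan) in run().items():
    print(name, ids, plan)  # plain: SEARCH ... USING ... INDEX; wrapped: SCAN ...
```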

PostgreSQL - get records with null values

I'm trying to write a query that shows distributors that haven't sold anything in 90 days, but the problem I get is with NULL values. It seems PostgreSQL ignores null values, even when I query for them (or maybe I did it the wrong way).
Let's say there are 1000 distributors, but with this query I only get 1 distributor. There should be more distributors that didn't sell anything, because if I write an SQL query to show distributors that sold any amount in the last 90 days, it shows about 500. So I wonder where the other 499 are. If I understand correctly, those other 499 didn't have any sales, so all their records are null and are not shown in the query.
Does anyone know how to make it show null values of one table where the related table is not null? (For example, the partners table (res_partner) is not null, but the sale_order table (sales) is null.) I also tried filtering with so.id IS NULL, but that way I get an empty result.
My query:
(
SELECT
min(f1.id) as id,
f1.partner as partner,
f1.sum1
FROM
(
SELECT
min(f2.id) as id,
f2.partner as partner,
sum(f2.null_sum) as sum1
FROM
(
SELECT
min(rp.id) as id,
rp.search_name as partner,
CASE
WHEN
sol.price_subtotal IS NULL
THEN
0
ELSE
sol.price_subtotal
END as null_sum
FROM
sale_order as so,
sale_order_line as sol,
res_partner as rp
WHERE
sol.order_id=so.id and
so.partner_id=rp.id
and
rp.distributor=TRUE
and
so.date_order <= now()::timestamp::date
and
so.date_order >= date_trunc('day', now() - '90 day'::interval)::timestamp::date
and
rp.contract_date <= date_trunc('day', now() - '90 day'::interval)::timestamp::date
GROUP BY
partner,
null_sum
)as f2
GROUP BY
partner
) as f1
WHERE
sum1=0
GROUP BY
partner,
sum1
)as fld
EDIT: 2012-09-18 11 AM.
I think I understand why PostgreSQL behaves like this. It is because of the time interval: it checks whether there is any non-null value in that interval. So it only found one record, because that record had a sale order with zero (it was not converted from null to zero), and the part that checked for null values was just skipped. If I remove the time interval, I see all distributors that didn't sell anything at all. But with the time interval, for some reason, it stops checking null values and only looks at non-null values.
So does anyone know how to make it check for null values too in given interval?.. (for the last 90 days to be exact)
Aggregates like sum() and min() do ignore NULL values. This is required by the SQL standard, and every DBMS I know behaves like that.
If you want to treat a NULL value as e.g. a zero, then use something like this:
sum(coalesce(f2.null_sum, 0)) as sum1
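A minimal sketch of how sum() treats NULLs, assuming SQLite through Python's sqlite3 (PostgreSQL behaves the same for these aggregates). The column x and its values are hypothetical. Note that sum(coalesce(x, 0)) is still NULL when there are no input rows at all, whereas coalesce(sum(x), 0) also covers that case.

```python
import sqlite3


def aggregates(values):
    """Evaluate a few aggregates over a hypothetical nullable column x."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in values])
    row = conn.execute(
        "SELECT sum(x), count(x), count(*), sum(coalesce(x, 0)), coalesce(sum(x), 0) FROM t"
    ).fetchone()
    conn.close()
    return row


print(aggregates([5, None]))     # (5, 1, 2, 5, 5)
print(aggregates([None, None]))  # (None, 0, 2, 0, 0)
print(aggregates([]))            # (None, 0, 0, None, 0)
```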
But as far as I understand your question and your invalid query, you actually want an outer join between res_partner and the sales tables.
Something like this (the date conditions on sale_order go into the ON clause: in the WHERE clause, they would discard partners without any orders and turn the outer join back into an inner join):
SELECT min(rp.id) as id,
rp.search_name as partner,
sum(coalesce(sol.price_subtotal,0)) as price_subtotal
FROM res_partner as rp
LEFT JOIN sale_order as so ON so.partner_id=rp.id
and so.date_order <= CURRENT_DATE
and so.date_order >= date_trunc('day', now() - '90 day'::interval)::timestamp::date
LEFT JOIN sale_order_line as sol ON sol.order_id=so.id
WHERE rp.distributor=TRUE
and rp.contract_date <= date_trunc('day', now() - '90 day'::interval)::timestamp::date
GROUP BY rp.search_name
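A minimal sketch contrasting a filter on the outer-joined table in WHERE versus in ON, assuming SQLite through Python's sqlite3 as a stand-in for PostgreSQL. The schema is a hypothetical simplification (the amount lives on sale_order instead of sale_order_line), and a fixed cutoff date replaces the 90-day expression so the result is deterministic.

```python
import sqlite3

# Hypothetical, simplified schema and data.
SETUP = """
CREATE TABLE res_partner (id INTEGER PRIMARY KEY, search_name TEXT, distributor INTEGER);
CREATE TABLE sale_order (id INTEGER PRIMARY KEY, partner_id INTEGER, date_order TEXT, amount REAL);
INSERT INTO res_partner VALUES (1, 'active', 1), (2, 'idle', 1), (3, 'stale', 1);
INSERT INTO sale_order VALUES (10, 1, '2012-09-01', 100.0), (11, 3, '2012-01-15', 50.0);
"""

# Filter in WHERE: unmatched partners have a NULL date_order, the comparison
# is unknown, and those rows are dropped (effectively an inner join).
FILTER_IN_WHERE = """
SELECT rp.search_name, coalesce(sum(so.amount), 0)
FROM res_partner rp
LEFT JOIN sale_order so ON so.partner_id = rp.id
WHERE rp.distributor = 1 AND so.date_order >= :since
GROUP BY rp.search_name ORDER BY rp.search_name
"""

# Filter in ON: every distributor survives; unmatched ones get NULLs.
FILTER_IN_ON = """
SELECT rp.search_name, coalesce(sum(so.amount), 0)
FROM res_partner rp
LEFT JOIN sale_order so ON so.partner_id = rp.id AND so.date_order >= :since
WHERE rp.distributor = 1
GROUP BY rp.search_name ORDER BY rp.search_name
"""


def totals(sql, since="2012-06-20"):
    """Run one of the queries above against fresh hypothetical data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(sql, {"since": since}).fetchall()
    conn.close()
    return rows


print(totals(FILTER_IN_WHERE))  # [('active', 100.0)]
print([name for name, total in totals(FILTER_IN_ON) if total == 0])  # ['idle', 'stale']
```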
I'm not 100% sure I understood your problem correctly, but it might give you a headstart.
Try to name your subqueries and reference their columns as q1.col, q2.col, etc., to make sure which query/subquery each column comes from. Maybe it's something simple, e.g. it merges some rows containing only NULLs into one row? Also, at least for debugging purposes, it's smart to add , count(*) to the select list of each query/subquery to see how many rows each result row was built from. It's hard to guess what exactly happened.