What should my WHERE clause be in a SQL statement where I want to return the rows in which column A is null or column B is null, but not the rows where both are null?
WHERE (ColA is NULL AND ColB is NOT NULL)
OR (ColB is NULL AND ColA is NOT NULL)
(A IS NULL OR B IS NULL) AND NOT (A IS NULL AND B IS NULL)
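Both forms express "exactly one of the two is NULL" and return the same rows. A minimal sketch using SQLite (table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, 1), (1, None), (None, 2), (None, None)])

# exactly one of a, b is NULL
rows = conn.execute("""
    SELECT a, b FROM t
    WHERE (a IS NULL AND b IS NOT NULL)
       OR (b IS NULL AND a IS NOT NULL)
""").fetchall()
print(rows)  # only the (1, None) and (None, 2) rows
```

The second form, `(a IS NULL OR b IS NULL) AND NOT (a IS NULL AND b IS NULL)`, produces the identical result set.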
Related
I'd like to write a query whose result ignores a condition in the WHERE clause when that condition's parameter is null, but applies the condition when the parameter is not null.
A query I made is
select * from mytable where (num_lot = :num_lot or :num_lot is null) and date_work between :date_start and :date_stop
When :num_lot is null, the result does not depend on num_lot, which is what I wanted.
But when :date_start and :date_stop are null, no rows are returned, instead of the date filter being ignored.
SELECT * FROM mytable WHERE
num_lot=COALESCE(:num_lot,num_lot) AND
date_work BETWEEN COALESCE(:date_start,date_work) and COALESCE(:date_stop,date_work)
When the bound value is NULL, it is replaced with the column's own value, so the comparison is always true. Note one caveat: rows where the column itself is NULL are still excluded, because NULL = NULL is not true.
Use coalesce() to check if both :date_start and :date_stop are null:
select *
from mytable
where (num_lot = :num_lot or :num_lot is null)
and (date_work between :date_start and :date_stop or coalesce(:date_start, :date_stop) is null)
or:
select *
from mytable
where (num_lot = :num_lot or :num_lot is null)
and ((date_work between :date_start and :date_stop) or (:date_start is null and :date_stop is null))
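The corrected pattern can be seen in a quick sketch with SQLite and named parameters (table and data are made up): with all parameters null every row comes back, and with the dates bound the range filter applies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (num_lot INTEGER, date_work TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)",
                 [(1, "2020-01-10"), (2, "2020-02-10")])

query = """
    SELECT * FROM mytable
    WHERE (num_lot = :num_lot OR :num_lot IS NULL)
      AND ((date_work BETWEEN :date_start AND :date_stop)
           OR (:date_start IS NULL AND :date_stop IS NULL))
"""

# all parameters NULL: the filters are ignored, every row comes back
all_rows = conn.execute(query, {"num_lot": None, "date_start": None,
                                "date_stop": None}).fetchall()

# date range bound: only the matching row is returned
jan = conn.execute(query, {"num_lot": None, "date_start": "2020-01-01",
                           "date_stop": "2020-01-31"}).fetchall()
```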
How to remove columns containing only null values from a table? Suppose I have a table -
SnapshotDate CreationDate Country Region CloseDate Probability BookingAmount RevenueAmount SnapshotDate1 CreationDate1 CloseDate1
null null null null null 25 882000 0 null null null
null null null null null 25 882000 0 null null null
null null null null null 0 882000 0 null null null
null null null null null 0 882000 0 null null null
null null null null null 0 882000 0 null null null
null null null null null 0 882000 0 null null null
null null null null null 0 882000 0 null null null
null null null null null 0 882000 0 null null null
null null null null null 0 882000 0 null null null
null null null null null 0 882000 0 null null null
null null null null null 0 882000 0 null null null
null null null null null 0 882000 0 null null null
null null null null null 0 882000 0 null null null
null null null null null 0 882000 0 null null null
null null null null null 0 882000 0 null null null
null null null null null 0 882000 0 null null null
null null null null null 0 882000 0 null null null
null null null null null 0 882000 0 null null null
null null null null null 0 882000 0 null null null
null null null null null 0 882000 0 null null null
So I would just like to have Probability, BookingAmount and RevenueAmount columns and ignore the rest.
Is there a way to dynamically select the columns?
I am using Spark 1.6.1.
I solved this with a global groupBy. This works for numeric and non-numeric columns:
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, when}

case class Entry(id: Long, name: String, value: java.lang.Float)

val results = Seq(
  Entry(10, null, null),
  Entry(10, null, null),
  Entry(20, null, null)
)
val df: DataFrame = spark.createDataFrame(results)

// mark each value with 0 where null and 1 otherwise, then take the
// per-column max: 1 means the column has at least one non-null value
val row = df
  .select(df.columns.map(c => when(col(c).isNull, 0).otherwise(1).as(c)): _*)
  .groupBy().max(df.columns: _*)
  .first

// keep only the columns whose max is 1
val colKeep = row.getValuesMap[Int](row.schema.fieldNames)
  .map { c => if (c._2 == 1) Some(c._1) else None }
  .flatten.toArray

// the aggregated field names come back as "max(name)", so strip that wrapper
df.select(row.schema.fieldNames.intersect(colKeep)
  .map(c => col(c.drop(4).dropRight(1))): _*).show(false)
+---+
|id |
+---+
|10 |
|10 |
|20 |
+---+
Edit: I removed the shuffling of columns. The new approach keeps the given order of the columns.
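The same idea works independently of Spark; a short plain-Python sketch (column names are illustrative): count non-null values per column and keep only the columns that have at least one.

```python
# rows as dicts; None plays the role of NULL
rows = [
    {"id": 10, "name": None, "value": None},
    {"id": 10, "name": None, "value": None},
    {"id": 20, "name": None, "value": None},
]

columns = list(rows[0])
# a column survives if any row has a non-null value in it
keep = [c for c in columns if any(r[c] is not None for r in rows)]
result = [{c: r[c] for c in keep} for r in rows]
print(keep)    # ['id']
print(result)  # [{'id': 10}, {'id': 10}, {'id': 20}]
```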
You can register a custom UDF and use it in Spark SQL:
sqlContext.udf.register("ISNOTNULL", (str: String) => Option(str).getOrElse(""))
Then with Spark SQL you can do:
SELECT ISNOTNULL(Probability) Probability, ISNOTNULL(BookingAmount) BookingAmount, ISNOTNULL(RevenueAmount) RevenueAmount FROM df
I created a constraint intended to require that, for the completed column to be marked true, certain other columns must have a value.
But for some reason the constraint does not complain when I leave one of those columns blank while completed is set to true. I have also deliberately inserted NULL into a specified column, and the constraint still does not fire.
Any ideas?
CREATE TABLE info (
id bigserial PRIMARY KEY,
created_at timestamptz default current_timestamp,
posted_by text REFERENCES users ON UPDATE CASCADE ON DELETE CASCADE,
title character varying(31),
lat numeric,
lng numeric,
contact_email text,
cost money,
description text,
active boolean DEFAULT false,
activated_date date,
deactivated_date date,
completed boolean DEFAULT false,
images jsonb,
CONSTRAINT columns_null_check CHECK (
(completed = true
AND posted_by != NULL
AND title != NULL
AND lat != NULL
AND lng != NULL
AND contact_email != NULL
AND cost != NULL
AND description != NULL
AND images != NULL) OR completed = false)
);
From the PostgreSQL manual, Chapter 9, Functions and Operators:
To check whether a value is or is not null, use the predicates:
expression IS NULL
expression IS NOT NULL
or the equivalent, but nonstandard, predicates:
expression ISNULL
expression NOTNULL
Therefore you cannot use value != NULL to check for null values; you can only use value IS NULL and value IS NOT NULL. Any comparison with NULL evaluates to NULL (unknown) rather than false, and a CHECK constraint rejects a row only when its expression evaluates to false, so your constraint silently passes.
The same applies to boolean values:
Boolean values can also be tested using the predicates
boolean_expression IS TRUE
boolean_expression IS NOT TRUE
boolean_expression IS FALSE
boolean_expression IS NOT FALSE
boolean_expression IS UNKNOWN
boolean_expression IS NOT UNKNOWN
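The behavior is easy to reproduce; a minimal SQLite sketch (the principle is the same in PostgreSQL): `x != NULL` evaluates to NULL, and a CHECK constraint only rejects a row when its expression is false, so a NULL result lets the row through.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# any comparison with NULL is NULL (unknown), not true or false
print(conn.execute("SELECT 1 != NULL").fetchone()[0])  # None

# a CHECK whose expression evaluates to NULL does not reject the row
conn.execute("CREATE TABLE t (completed INTEGER, title TEXT, "
             "CHECK (completed = 0 OR title != NULL))")
conn.execute("INSERT INTO t VALUES (1, NULL)")  # accepted, no error raised

# with IS NOT NULL the constraint fires as intended
conn.execute("CREATE TABLE t2 (completed INTEGER, title TEXT, "
             "CHECK (completed = 0 OR title IS NOT NULL))")
```

Inserting `(1, NULL)` into `t2` raises an integrity error, which is the behavior the original constraint was after.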
I get NULL output from SELECT for the following Hive table.
Describe studentdetails;
clustername string
schemaname string
tablename string
primary_key map<string,int>
schooldata struct<alternate_aliases:string,application_deadline:bigint,application_deadline_early_action:string,application_deadline_early_decision:bigint,calendaring_system:string,fips_code:string,funding_type:string,gender_preference:string,iped_id:bigint,learning_environment:string,mascot:string,offers_open_admission:boolean,offers_rolling_admission:boolean,region:string,religious_affiliation:string,school_abbreviation:string,school_colors:string,school_locale:string,school_term:string,short_name:string,created_date:bigint,modified_date:bigint,percent_students_outof_state:float> from deserializer
deletedind boolean
truncatedind boolean
versionid bigint
select * from studentdetails limit 3;
Output :
NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL
I have used the following properties while creating the table.
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ("ignore.malformed.json" = "true")
And the following properties while selecting the data.
SET hive.exec.compress.output=true;
SET io.seqfile.compression.type=BLOCK;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
ADD JAR s3://emr/hive/lib/hive-serde-1.0.jar;
Thank you for the comments; I have found the solution.
The issue was that the column names in my JSON file differed from the column names I used while creating the table.
Once I synced the column names between the Hive table and the JSON file, the issue was resolved.
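What happened can be sketched in a few lines of Python (the field names here are made up): a JSON SerDe looks up each table column by name in the parsed record, and a name that is not present simply yields NULL rather than an error.

```python
import json

record = json.loads('{"cluster_name": "c1", "schema_name": "s1"}')

# column names as declared in the Hive DDL -- note the missing underscores
table_columns = ["clustername", "schemaname"]

# a lookup by a non-matching name silently returns None (NULL in Hive)
values = [record.get(c) for c in table_columns]
print(values)  # [None, None]

# once the names match, the values appear
fixed_columns = ["cluster_name", "schema_name"]
print([record.get(c) for c in fixed_columns])  # ['c1', 's1']
```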
This is something very basic, but I can't understand it, and the manual is not helping:
declare @rule int =
    (select id from menu_availability_rules
     where (daily_serving_start = null or
           (daily_serving_start is null and null is null)) and
           (daily_serving_end = null or
           (daily_serving_end is null and null is null)) and
           (weekly_service_off = 3 or
           (weekly_service_off is null and 3 is null)) and
           (one_time_service_off = null or
           (one_time_service_off is null and null is null)));
print @rule;
-- syntax error here --\/
if (@rule is not null) raiseerror ('test error', 42, 42);
if @rule is not null
begin
    delete from menu_availability
    where menu_id = 5365 and rule_id = @rule
    delete from menu_availability_rules
    where (daily_serving_start = null or
          (daily_serving_start is null and null is null)) and
          (daily_serving_end = null or
          (daily_serving_end is null and null is null)) and
          (weekly_service_off = 3 or
          (weekly_service_off is null and 3 is null)) and
          (one_time_service_off = null or
          (one_time_service_off is null and null is null))
    and not exists
        (select rule_id from menu_availability
         where rule_id = @rule)
end
Why is it a syntax error? How would I write it? I need to throw error for debugging purposes, just to make sure the code reached the conditional branch.
I can just replace the raiseerror with select 1 / 0 and I will get what I need, but why can't I do it normally?
The correct name is RAISERROR (with a single E), e.g. RAISERROR('test error', 16, 1). Note also that severity must be between 0 and 25, and levels above 18 require elevated privileges, so a severity of 42 would be rejected even once the spelling is fixed.