SQL issue with 'IN' syntax in Avatica Calcite - Druid

I'm using Avatica Calcite as a JDBC driver to query a Druid DB. I found that the 'IN' syntax CANNOT be followed by more than 19 elements. For example:
SELECT * FROM ds1 WHERE city_id IN
(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19)
This works, but this one errors:
SELECT * FROM ds1 WHERE city_id IN
(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
How can I use the 'IN' syntax with more than 19 elements?

The reason this doesn't work is described in https://github.com/druid-io/druid/issues/4203. It should be fixed in Druid SQL after Calcite 1.14 is released, which will let us customize its behavior a bit more.
Until then, try the workaround suggested by @melpomene, which should work:
SELECT * FROM ds1 WHERE
city_id IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19)
OR city_id IN (20,21,22)
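Since the limit applies to each IN list separately, client code can apply this workaround automatically by splitting a long value list into chunks. A minimal Python sketch (`chunked_in_clause` is a hypothetical helper; for untrusted input, prefer bound JDBC parameters over string building):

```python
def chunked_in_clause(column, values, chunk_size=19):
    """Split a long IN list into ORed IN clauses of at most chunk_size items."""
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    return " OR ".join(
        "{} IN ({})".format(column, ",".join(str(v) for v in chunk))
        for chunk in chunks
    )

# 22 city ids become two IN lists joined with OR
where_clause = chunked_in_clause("city_id", list(range(1, 23)))
sql = "SELECT * FROM ds1 WHERE " + where_clause
```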

Related

Pivot function without manually typing values in `for in`?

The documentation provides an example of using the pivot() function:
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname IN ('prop', 'rudder', 'wing')
);
I would like to use pivot() without having to manually specify each value of partname. I want all parts. I tried:
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname);
That gave an error. Then I tried:
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname IN (select distinct partname from part)
);
That also threw an error.
How can I tell Redshift to include all values of partname in the pivot?
I don't think this can be done in a simple single query: it would mean the query compiler has to plan without knowing how many output columns will be produced, and I don't think it can do that.
You can do this in multiple queries: use one query to create the list of partnames, then use that list to generate a second query that populates the IN list. Something needs to issue the first query and generate the second; this can be code external to Redshift (lots of options) or a stored procedure in Redshift. This code, wherever it lives, should respect Redshift's limit of 1,600 columns per table.
The Redshift docs are fairly good on the topic of dynamic SQL in stored procedures; the EXECUTE statement is what fires off the generated second query. See: https://docs.aws.amazon.com/redshift/latest/dg/c_PLpgSQL-statements.html
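If the generating code lives outside Redshift, the two-step approach might look like the Python sketch below. `build_pivot_sql` is a hypothetical helper; in practice you would feed it the rows returned by `SELECT DISTINCT partname FROM part`, and production code should use the driver's quoting facilities rather than hand-escaping:

```python
def build_pivot_sql(partnames, max_columns=1600):
    """Generate the PIVOT query from a previously fetched list of partnames."""
    # Guard against Redshift's per-table column limit
    if len(partnames) > max_columns:
        raise ValueError("Redshift tables are limited to 1,600 columns")
    # Quote each partname as a SQL string literal (doubling embedded quotes)
    in_list = ", ".join("'{}'".format(p.replace("'", "''")) for p in partnames)
    return (
        "SELECT * FROM (SELECT partname, price FROM part) "
        "PIVOT (AVG(price) FOR partname IN ({}));".format(in_list)
    )

# Step 1 (not shown): run "SELECT DISTINCT partname FROM part" and collect rows.
# Step 2: build and execute the generated query.
print(build_pivot_sql(["prop", "rudder", "wing"]))
```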

Problem with a query using GROUP BY in PostgreSQL

I'm using the query tool in PGAdmin 4.20 and trying the following query:
select * from metadatavalue group by resource_id order by resource_id;
And I'm getting the following:
ERROR: column "metadatavalue.metadata_value_id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 3: select * from metadatavalue group by resource_id order by re...
SQL state: 42803 Character: 176
The thing is that on another table, the same syntax works:
select * from metadatafieldregistry group by metadata_field_id order by metadata_field_id;
Also, I'm not getting all the entries for the same resource_id, only a few. Could these two problems be related?
Please, help!
Thank you in advance.

How to find data length and index length of particular tables in a postgresql schema?

I am migrating my application from MySQL to PostgreSQL. In MySQL, I used the following query to calculate the size of particular tables:
SELECT SUM(ROUND(((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024),2)) 'Size in MB' FROM INFORMATION_SCHEMA.TABLES where TABLE_NAME like 'table_name';
But I can't do the same in PostgreSQL. If I run the same query there, I get:
ERROR: type "sum" does not exist
I have tried the solutions on various sites but couldn't find one that fits my requirement. Please suggest a solution. Thanks in advance.
PostgreSQL does not have this information in INFORMATION_SCHEMA.TABLES. (Note that SVV_TABLE_INFO, which exposes table size, is an Amazon Redshift system view and is not available in vanilla PostgreSQL.)
Hope it helps!
Your query is failing because you're using single quotes for the alias 'Size in MB'; PostgreSQL requires double quotes for identifiers. Further, there are no DATA_LENGTH and INDEX_LENGTH columns in INFORMATION_SCHEMA.TABLES.
In Postgres, you can use the pg_total_relation_size() function:
SELECT ROUND(pg_total_relation_size(oid) / (1024.0 * 1024.0), 2) AS "Size in MB"
FROM pg_class c
WHERE relname = 'table_name';
Have a look at this answer for the other functions available to get various sizes.
DEMO

Query is not working in Impala

(SELECT CONCAT('ABCDE',SUM((SELECT MAX(id) FROM optigo_data.admin_userdetails LIMIT 1)+1)))
The above works in MySQL but not in Impala/Hive; please help me out.
Error: sub query is not supported.
The reason for this error is that Impala only supports subqueries in the FROM and WHERE clauses.
I think this would be the impala equivalent:
SELECT
CONCAT('ABCDE',cast(SUM(t.value+1) as string))
from
(SELECT MAX(cast(id as int)) as value
FROM optigo_data.admin_userdetails LIMIT 1) as t
But looking at the query assuming your goal is to produce a string containing the highest ID + 1, a simpler solution would be:
SELECT
CONCAT('ABCDE',cast(MAX(cast(id as int)+1) as string))
FROM optigo_data.admin_userdetails
Please correct me if my assumption is incorrect. You can drop the cast as int if id is already in numeric format.

Aliases in Cassandra CQL

My question is about using aliases in CQL queries.
For example in SQL we can write:
SELECT p.name FROM Persons as p
Is there something similar in CQL?
As of Cassandra 2.0, CQL 3 supports aliases in SELECT:
http://www.datastax.com/dev/blog/cql-in-cassandra-2-0
SELECT event_id,
dateOf(created_at) AS creation_date,
blobAsText(content) AS content
FROM timeline;
When I browsed through the documentation of CQL 3, I didn't find any reference to using the AS alias.
I'd advise you to have a read through DataStax's documentation on what the SELECT statement can and can't do in CQL 3.