What does 'granularity' affect in Druid's select query?

As the Druid docs say, select queries return raw Druid rows, so what does granularity mean in a select query? In my opinion, select doesn't need this argument, since it returns raw rows.

Yes, granularity doesn't have much effect in the case of a select query, since it returns raw data.
It's not a mandatory field, though, and defaults to ALL.
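For illustration, a minimal native select query (the dataSource, interval, and paging threshold are made up for the example). With granularity set to all, the raw rows come back in a single result bucket; a finer granularity such as day would only change how the same rows are bucketed by time in the result, not which rows are returned:

{
  "queryType": "select",
  "dataSource": "wikipedia",
  "granularity": "all",
  "intervals": ["2013-01-01/2013-01-02"],
  "pagingSpec": {"pagingIdentifiers": {}, "threshold": 5}
}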

Related

How PostgreSQL orders the data while using a DISTINCT * option

When I use SELECT * FROM table, PostgreSQL returns the data ordered by id. But when I use SELECT DISTINCT * FROM table, PostgreSQL returns the same data set (there are no duplicates), yet the order has changed, which is beyond my understanding.
How does PostgreSQL sort the data when DISTINCT * is used without specifying any ORDER BY clause?
If you put DISTINCT into a query, PostgreSQL sorts the result set by all result columns in order to eliminate duplicates. The sort order is “implementation defined” unless you add an explicit ORDER BY clause.
Two remarks:
without the DISTINCT, the table is returned in id order because you inserted it that way and performed no updates or deletes, and because there are no concurrent sequential scans on the table. You can never rely on an order in the result set unless you use ORDER BY.
DISTINCT can be very expensive on large result sets. Use it only if you are certain you need it.
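To make that concrete, a minimal sketch (the table mytable and its id column are hypothetical):

-- Row order falls out of whatever duplicate-elimination strategy the planner picks; unspecified:
select distinct * from mytable;

-- Deterministic: an explicit ORDER BY pins the order (PostgreSQL can often satisfy
-- the DISTINCT and the ORDER BY with one and the same sort):
select distinct * from mytable order by id;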

How to limit the maximum no. of rows that a select query can return

I need to limit the maximum number of rows that a select query can return when it is run under a particular login. So basically, we would like to append a LIMIT clause to any select query fired by a particular user.
Is this possible in PostgreSQL? If so, how can it be done?
Note: we need to enforce the limit without any intervention from the person executing the query. This means creating functions doesn't help in this scenario, since the user would have to add the function name to the SELECT query.
The user should just run SELECT * FROM table_name and get back a limited result instead of the whole data set.
Thanks in advance.

How can I get the total run time of a query in redshift, with a query?

I'm in the process of benchmarking some queries in redshift so that I can say something intelligent about changes I've made to a table, such as adding encodings and running a vacuum. I can query the stl_query table with a LIKE clause to find the queries I'm interested in, so I have the query id, but tables/views like stv_query_summary are much too granular and I'm not sure how to generate the summarization I need!
The GUI dashboard shows the metrics I'm interested in, but the format is difficult to store for later analysis/comparison (in other words, I want to avoid taking screenshots). Is there a good way to rebuild that view with SQL selects?
To add to Alex's answer: the stl_query table has the inconvenience that if the query sat in a queue before running, the queue time is included in its run time, so the runtime there isn't a very good indicator of the query's performance.
To get the actual runtime of the query, check stl_wlm_query for the total_exec_time.
select total_exec_time  -- reported in microseconds
from stl_wlm_query
where query = 12345;  -- replace with the query id from stl_query
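For completeness, a sketch that also pulls the queue time and converts both to seconds (the query id is again a placeholder):

select total_queue_time / 1000000.0 as queue_seconds,  -- time spent waiting in the WLM queue
       total_exec_time  / 1000000.0 as exec_seconds    -- actual execution time
from stl_wlm_query
where query = 12345;  -- replace with your query id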
There are some useful tools/scripts in https://github.com/awslabs/amazon-redshift-utils
Here is one of those scripts, stripped down to give you query run times in milliseconds. Play with the filters, ordering, etc. to show the results you are looking for:
select userid, label, stl_query.query,
       trim(database) as database, trim(querytxt) as qrytext,
       starttime, endtime,
       datediff(milliseconds, starttime, endtime)::numeric(12,2) as run_milliseconds,
       aborted,
       decode(alrt.event,
              'Very selective query filter', 'Filter',
              'Scanned a large number of deleted rows', 'Deleted',
              'Nested Loop Join in the query plan', 'Nested Loop',
              'Distributed a large number of rows across the network', 'Distributed',
              'Broadcasted a large number of rows across the network', 'Broadcast',
              'Missing query planner statistics', 'Stats',
              alrt.event) as event
from stl_query
left outer join (select query, trim(split_part(event, ':', 1)) as event
                 from stl_alert_event_log
                 group by query, trim(split_part(event, ':', 1))) as alrt
       on alrt.query = stl_query.query
where userid <> 1
  -- and (querytxt like 'SELECT%' or querytxt like 'select%')
  -- and database = ''
order by starttime desc
limit 100

DB2 to Netezza Migration

I have a query in DB2, shown below.
What would be the equivalent syntax in Netezza?
select distinct acct_num from GTD_demo_dim where ACCT_NUM fetch first 1 rows only);
First, I don't think your statement is valid.
select distinct acct_num from GTD_demo_dim where ACCT_NUM fetch first 1 rows only);
The where clause needs to be finished and you've used a closing parenthesis without an opening one.
fetch first is common (in fact, standard SQL) syntax, so it's very likely that this will work. However, the usual way to do this in Netezza is to use limit. All that said, this is how I'd write the query to get the intended result (omitting your where clause, since I can't infer its intent):
select distinct acct_num from gtd_demo_dim limit 1;
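If the where clause was meant to filter on acct_num, a completed sketch could look like this (the condition is a placeholder, since the original statement was cut off):

select distinct acct_num
from gtd_demo_dim
where acct_num is not null  -- hypothetical condition; the original where clause was incomplete
limit 1;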

PostgreSQL changing returned rows order

I have a table named categories, which contains ID (long), Name (varchar(50)), parentID (long), and shownByDefault (boolean) columns.
This table contains 554 records. All the shownByDefault values are false.
When I execute select id, name from categories, PostgreSQL returns all the categories, ordered by id.
Then I update some of the rows of the table (update categories set shownByDefault = true where parentID = 1), and the update succeeds.
Then, when I run the first query again, which returns all the categories, they come back in a very weird order.
I have no problem adding order by, but since I am using JPA to fetch these values, does anyone know what the problem is, or whether there is a way to fix it?
That's not a problem. The order of rows returned by a SQL SELECT is undefined unless it has an ORDER BY. The order in which you get them is usually influenced by the order in which they are stored in the table and/or by the indexes the statement uses.
So depending on that order without using ORDER BY is a very, very bad idea.
If you need them in some order, simply specify that.
It is important to understand that a table is a set of rows, not a sequence of rows.
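With the table from the question, that just means:

select id, name from categories order by id;

If the query is issued through JPA, the same idea applies: put the ordering into the JPQL/criteria query, or use an @OrderBy mapping on the relationship, rather than relying on the order the database happens to return.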
From the docs:
If the ORDER BY clause is specified, the returned rows are sorted in the specified order. If ORDER BY is not given, the rows are returned in whatever order the system finds fastest to produce.
The rows are returned in whatever their physical order on disk is; you can reorder them physically using the CLUSTER SQL command, but due to the way Postgres works they'll become unordered as soon as you start modifying rows.
For what you're doing an ORDER BY is the right answer.
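For reference, a sketch of the CLUSTER command mentioned above (the index name categories_pkey is an assumption; use whichever index matches the order you want):

cluster categories using categories_pkey;  -- one-time physical reorder; decays as rows are modified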