How to skip showing results in HIVE Command Line? - tsql

I have executed a query in the Hive CLI that should return around 11,000,000 rows; I know the row count because I have also run the query in MS SQL Server Management Studio.
The problem is that in the Hive CLI the rows keep scrolling on and on (right now it has been more than 12 hours since I started the execution), and all I want to know is the processing time, which is shown only after the results have been displayed.
So I have 2 questions:
How do I skip showing the result rows in the Hive command line?
If I execute the query in Beeswax, how do I see statistics such as execution time, similar to SET STATISTICS TIME ON in T-SQL?

You can check it using the link given in the log, but it won't give you the total processing time remaining.
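One way to get only the elapsed time is to run the query non-interactively and throw the output away. The following is a minimal sketch, assuming the query can be launched as an external command (for example through the Hive CLI's `-e` option). The helper name `time_command` is hypothetical, and the wall-clock figure it reports includes client start-up and row transfer, not only server-side processing.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run a command, discard its stdout, and return (exit_code, seconds)."""
    start = time.perf_counter()
    # stdout goes to the null device, so result rows are never rendered;
    # stderr is left alone so progress and log messages stay visible.
    completed = subprocess.run(argv, stdout=subprocess.DEVNULL)
    elapsed = time.perf_counter() - start
    return completed.returncode, elapsed


if __name__ == "__main__":
    # Stand-in command that prints many rows; with Hive this might be
    # ["hive", "-e", "<your query>"] (assumed invocation, adjust as needed).
    demo = [sys.executable, "-c", "print('row\\n' * 100000)"]
    code, seconds = time_command(demo)
    print(f"exit code {code}, elapsed {seconds:.2f} s")
```

Because the rows are never written to a terminal, the run is not slowed down by console scrolling, which is likely what dominates a multi-hour display of 11 million rows.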

Related

Data viewer in SSIS not showing all records

I am using SSIS in Visual Studio for a data analysis task where I am comparing two databases and trying to identify records that appear in one set and not the other. I have enabled a data viewer to see the details of the records; however, not all the records appear. For example, in the attached image there is a difference of 20 records, but only 18 appear. I have pressed the green play button but no more appear. Does anyone have any idea how to fix this?
My first guess would be that you've already clicked the Play symbol and the first 2 rows were displayed although I'm not 100% sold on this explanation as I would expect it to display 18 rows then 2.
Quick attempt at a repro. This should page 4 times
SELECT TOP 25 row_number() OVER (ORDER BY (SELECT NULL)) AS rn
,replicate('x', 4000) AS c2
FROM sys.all_columns
We can observe that total rows are cumulative, so the next probable explanation would be that an error was generated such that the 2 other rows weren't able to be sent to the Data Viewer.
Without seeing your package, it'll be difficult to reproduce this. Were it me, I'd add a Multicast operation before the data viewer. This allows the package to run "normal." Then add a second output to the multicast and route it to a new flat file destination. Run the package, get the "20 rows, 18 displayed" situation, and compare that to the flat file output.

Query execution time increased after changing a server

I have moved the test database server from Windows to Synology. After this change, there is one query that takes 9-10 seconds to return a result on Synology. On the Windows server, it took 0.18 seconds to get the data. When I exported the data, the indexes and triggers were also exported, so there should be no issue with indexes.
I have also compared the "Explain SQL" result on both servers. They are the same, with no difference.
select SUM(work_hours) as apt_hour, `employee_id` from `job_status` where `job_date` = '2020-10-23' and `status` != '0' and `job_status`.`deleted_at` is null group by `employee_id`
What is causing the issue here? How can I decrease the execution time?

Redshift Compile Time For First Time Run Queries

I am struggling with the performance of my dashboard, which runs queries on Redshift using the JDBC driver.
The query looks like this:
select <ALIAS_TO_SCHEMA.TABLENAME>.<ANOTHER_COLUMN_NAME> as col_0_0_,
sum(<ALIAS_TO_SCHEMA.TABLENAME>.devicecount) as col_1_0_ from <table_schema>.<table_name> <ALIAS_TO_SCHEMA.TABLENAME> where <ALIAS_TO_SCHEMA.TABLENAME>.<COLUMN_NAME>=$1
or <ALIAS_TO_SCHEMA.TABLENAME>.<COLUMN_NAME>=$2
or <ALIAS_TO_SCHEMA.TABLENAME>.<COLUMN_NAME>=$3
or <ALIAS_TO_SCHEMA.TABLENAME>.<COLUMN_NAME>=$4
or <ALIAS_TO_SCHEMA.TABLENAME>.<COLUMN_NAME>=$5
or <ALIAS_TO_SCHEMA.TABLENAME>.<COLUMN_NAME>=$6
or <ALIAS_TO_SCHEMA.TABLENAME>.<COLUMN_NAME>=$7
or <ALIAS_TO_SCHEMA.TABLENAME>.<COLUMN_NAME>=$8
or <ALIAS_TO_SCHEMA.TABLENAME>.<COLUMN_NAME>=$9
or <ALIAS_TO_SCHEMA.TABLENAME>.<COLUMN_NAME>=$10
or <ALIAS_TO_SCHEMA.TABLENAME>.<COLUMN_NAME>=$11
or <ALIAS_TO_SCHEMA.TABLENAME>.<COLUMN_NAME>=$12
or <ALIAS_TO_SCHEMA.TABLENAME>.<COLUMN_NAME>=$13
or <ALIAS_TO_SCHEMA.TABLENAME>.<COLUMN_NAME>=$14
or <ALIAS_TO_SCHEMA.TABLENAME>.<COLUMN_NAME>=$15
or <ALIAS_TO_SCHEMA.TABLENAME>.<COLUMN_NAME>=$16
or <ALIAS_TO_SCHEMA.TABLENAME>.<COLUMN_NAME>=$17
or <ALIAS_TO_SCHEMA.TABLENAME>.<COLUMN_NAME>=$18
or <ALIAS_TO_SCHEMA.TABLENAME>.<COLUMN_NAME>=$19
or <ALIAS_TO_SCHEMA.TABLENAME>.<COLUMN_NAME>=$20
or <ALIAS_TO_SCHEMA.TABLENAME>.<COLUMN_NAME>=$21
or <ALIAS_TO_SCHEMA.TABLENAME>.<COLUMN_NAME>=$22
or <ALIAS_TO_SCHEMA.TABLENAME>.<COLUMN_NAME>=$23
or <ALIAS_TO_SCHEMA.TABLENAME>.<COLUMN_NAME>=$24
or <ALIAS_TO_SCHEMA.TABLENAME>.<COLUMN_NAME>=$25
or <ALIAS_TO_SCHEMA.TABLENAME>.<COLUMN_NAME>=$26
or ........
For the dashboard we use Spring and Hibernate (I am not 100% sure about it, though).
The query might sometimes stretch to $1000 or more, depending on the filters/options selected in the UI.
The problem we are seeing is this: the first time the query is run by the reports, it takes more than 40 to 60 seconds to respond. After the first time, the query runs quite fast and takes only a few seconds.
We initially suspected that something was wrong with Redshift caching, but it turns out that even simple (but huge) queries like these take considerable time to compile. This is clear when we look at the svl_compile table, which shows that this query was compiled in over 35 seconds.
What should I do to handle such issues?
Recommend restructuring the query generated by your dashboard to use an IN list. Redshift should be able to reuse the already compiled query segments for different length IN lists.
Note that IN lists with fewer than 10 values will still be evaluated as OR. https://docs.aws.amazon.com/redshift/latest/dg/r_in_condition.html#r_in_condition-optimization-for-large-in-lists
SELECT <ALIAS_TO_SCHEMA.TABLENAME>.<ANOTHER_COLUMN_NAME> as col_0_0_
, SUM(<ALIAS_TO_SCHEMA.TABLENAME>.devicecount) AS col_1_0_
FROM <table_schema>.<table_name> <ALIAS_TO_SCHEMA.TABLENAME>
WHERE <ALIAS_TO_SCHEMA.TABLENAME>.<COLUMN_NAME> IN ( $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11 … $1000 )
;
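If the dashboard code assembles the SQL text itself, the placeholder list can be generated from the number of selected values. This is only a sketch, assuming positional `$n` placeholders as in the query above. The helper `in_predicate` and the column name in the usage line are hypothetical, and how the ORM in use can be made to emit this form is outside the sketch.

```python
def in_predicate(column, n_values, start=1):
    """Build 'column IN ($start, ..., $start+n-1)' with positional placeholders.

    The column name is interpolated as-is, so it must come from trusted
    code, never from user input.
    """
    if n_values < 1:
        raise ValueError("an IN list needs at least one value")
    placeholders = ", ".join(f"${i}" for i in range(start, start + n_values))
    return f"{column} IN ({placeholders})"


# region_id is a made-up column name standing in for <COLUMN_NAME>.
print(in_predicate("t.region_id", 3))  # t.region_id IN ($1, $2, $3)
```

The values themselves are still sent as bind parameters; only the number of placeholders in the single IN predicate changes with the filter selection.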

$P{LoggedInUsername} and data from a user in a WITH-clause

I am using JasperSoft Reports v.6.2.1, and when I run a report in the Studio preview the output appears after 2 seconds.
Running the same report (output xlsx) on the server takes more than half a minute, though there is no data volume issue (crosstab, 500 lines, 17 columns in Excel, "ignore pagination" = true).
I am using $P{LoggedInUsername} to filter data in the WHERE part of a WITH clause (based on the user's rights). I noticed that when I use a fixed value (the user's id as a string) instead of the parameter in the query, the report executes quickly.
The same holds against the Oracle DB from SQL Developer: the query result set with a user's id string is back in 2 seconds.
Also, the output of $P{LoggedInUsername} in a TextField produces a String.
Once I switch back to the $P{LoggedInUsername} parameter in the query, the report takes ages again or runs out of heap memory in the Studio/server.
What could be the issue?
Finally my problem was solved using the expression user_id = '$P!{LoggedInUsername}' instead of $P{LoggedInUsername} in the WHERE-part of my query.
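The two parameter forms differ in how the value reaches the database: `$P{...}` is passed as a prepared-statement bind value, while `$P!{...}` is pasted into the query text before the query is sent. The sketch below illustrates that difference with Python's built-in `sqlite3` rather than JasperReports or Oracle. The table and data are invented, and it does not reproduce the slowdown, only the two mechanisms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rights (user_id TEXT, item TEXT)")
conn.executemany(
    "INSERT INTO rights VALUES (?, ?)",
    [("alice", "a"), ("alice", "b"), ("bob", "c")],
)

user = "alice"

# Bind parameter: the SQL text stays constant, the value travels separately.
bound = conn.execute(
    "SELECT item FROM rights WHERE user_id = ? ORDER BY item", (user,)
).fetchall()

# Text substitution: the value becomes a literal inside the SQL text itself.
# Only acceptable for trusted values, because this is open to SQL injection.
inlined_sql = f"SELECT item FROM rights WHERE user_id = '{user}' ORDER BY item"
inlined = conn.execute(inlined_sql).fetchall()

print(bound == inlined)  # both forms return the same rows
```

Both forms return the same rows. What can differ on a real database is the execution plan, since with a literal the optimizer sees the actual value when it plans the query.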

In SQL Management studio, what is the overhead of displaying the result of a SELECT statement?

When you perform a SELECT over a large table in SQL Management Studio, the result is displayed in a grid by default. As we can imagine, when the result set is a million rows long, all that data is fed to the grid widget displayed in SQL Management Studio.
Will that have an impact on the execution duration of the query?
If so, is it possible to disable the display of the query results, to get a more realistic execution time for that query?
UPDATE :
When I say display, I don't mean "Display ... Execution Plan" but the display of the data on the screen
There are 3 ways to display a result set in Management Studio:
1. Results to text
2. Results to grid
3. Results to file
You can choose option 1, check the execution time, and compare it with the execution time of option 2.
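The same kind of comparison can be made outside Management Studio by timing a query once while only draining the rows and once while also formatting them for display. This is a sketch using Python's built-in `sqlite3` with an invented table, not SQL Server, so absolute numbers will differ. The point is only to separate the two costs.

```python
import io
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    ((i, "x" * 50) for i in range(200_000)),
)


def timed(consume):
    """Run the query, hand every row to consume(), return (rows, seconds)."""
    start = time.perf_counter()
    count = 0
    for row in conn.execute("SELECT id, payload FROM t"):
        consume(row)
        count += 1
    return count, time.perf_counter() - start


# 1) Drain the rows without rendering them.
rows_discarded, t_discard = timed(lambda row: None)

# 2) Render every row as text into an in-memory buffer (stand-in for a grid).
buffer = io.StringIO()
rows_rendered, t_render = timed(
    lambda row: buffer.write(f"{row[0]}\t{row[1]}\n")
)

print(f"discard: {t_discard:.3f} s, render: {t_render:.3f} s")
```

The difference between the two timings approximates what the client spends on presentation rather than on fetching the data.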