Hive: Read timeout exception - scala

The problem is that when I query something from Hive using my application, e.g. analyze table table_entity compute statistics or, for example, select count(*) from table_entity, I sometimes get this exception: java.net.SocketTimeoutException: Read timed out
But when I query, for example, show tables or show tblproperties table_entity, I don't get that exception.
Has anyone faced this problem?
Thank you in advance.

I know this is a little late, but I was stuck with the same sort of problem, where we were not able to drop a table. We raised a case with Cloudera, and the following was the suggestion they provided:
Upon investigating the logs and the stack of messages, we have observed that
during the execution of the DROP query for the table "tableName",
the query failed due to a "Read time out" in HiveServer2 from
HiveMetastore. This means the drop operation of the table could not be
completed at the metadata level. (In general, a DROP operation
deletes the metadata details of a table from HMS.) We suspect
that the metadata of this table is not up to date and might hold more
data than expected, which could have led to the timeout. Could you please perform the
below steps and let us know your feedback?
1. Log in to Beeline-Hive.
2. Update the partition-level metadata in the Hive Metastore:
MSCK REPAIR TABLE db.tableName DROP PARTITIONS;
3. Compute the statistics for the table:
ANALYZE TABLE db.tableName PARTITION(col1, col2) COMPUTE STATISTICS NOSCAN;
4. Drop the partitions of the table first, using:
ALTER TABLE db.tableName DROP [IF EXISTS] PARTITION partition_spec[, PARTITION partition_spec, ...]
Replace the partition values in the above command according to the partitions of the table.
5. Increase the Hive Metastore client socket timeout:
set hive.metastore.client.socket.timeout=1500;
This gives an increased socket timeout only for this session, so run step 6 in the same session.
6. Drop the table:
DROP TABLE db.tableName;
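For illustration only, here is roughly what that sequence might look like on the table from the question, assuming it lives in a database db and is partitioned by hypothetical columns year and month (the partition columns and values are placeholders, not taken from the original post):
-- step 2: refresh partition metadata in the metastore
MSCK REPAIR TABLE db.table_entity DROP PARTITIONS;
-- step 3: recompute basic statistics without scanning the data files
ANALYZE TABLE db.table_entity PARTITION(year, month) COMPUTE STATISTICS NOSCAN;
-- step 4: drop the partitions, one partition_spec per partition
ALTER TABLE db.table_entity DROP IF EXISTS PARTITION (year=2023, month=1), PARTITION (year=2023, month=2);
-- step 5: raise the metastore client timeout for this session only (the bare value is commonly interpreted as seconds)
SET hive.metastore.client.socket.timeout=1500;
-- step 6: finally drop the table, in the same session
DROP TABLE db.table_entity;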
Although the question here is about an ANALYZE TABLE issue, this answer is intended to cover the other issues related to the read timeout error as well.
I have shown my scenario (a DROP statement), but these steps should help with all sorts of queries that lead to the Read timed out exception.
Cheers!

Related

Db2 doesn't allow updating a table, throws an error saying that the operation is incomplete

I'm getting an error when trying to update a table. The SQL statement is:
UPDATE dda_accounts SET TYPE_SK = TYPE_SK - 10 WHERE TYPE_SK > 9;
The error I get is:
SQL Error [57007]: Operation not allowed for reason code "7" on table
"BANK_0002_TEST.DDA_ACCOUNTS".. SQLCODE=-668, SQLSTATE=57007, DRIVER=4.27.25
SQLSTATE 57007 says that there's something incomplete after an ALTER TABLE was executed.
I found this resolution, but it's not clear whether the table can be fixed or whether the only way to recover it is from a backup.
Running a select statement works, only the update fails. What is the way to fix this table?
You need to REORG the table to recover; see this page for details.
When you get an error like this, look up the SQL0668N message with reason code "7".
This shows:
The table is in the reorg pending state. This can occur after an ALTER
TABLE statement containing a REORG-recommended operation.
Be aware that the previous ALTER TABLE (the one that put the table into this reorg-needed state) might have happened quite some time ago, possibly without your knowledge.
If you lack the authorisation to perform reorg table inplace "BANK_0002_TEST.DDA_ACCOUNTS", then contact your DBA for assistance. The DBA may choose to also reorg indexes at the same time, to perform runstats (docs) on the table following completion of the reorg, and to check whether anything else needs rebinding.
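As a sketch only, the commands involved typically look like the following, issued from the Db2 command line processor (or wrapped in SYSPROC.ADMIN_CMD) by someone with the necessary authority; exact options depend on your Db2 version and on whether an inplace reorg is permitted for a table in reorg-pending state:
-- reorganize the table to clear the reorg-pending state
REORG TABLE BANK_0002_TEST.DDA_ACCOUNTS;
-- refresh optimizer statistics once the reorg has finished
RUNSTATS ON TABLE BANK_0002_TEST.DDA_ACCOUNTS WITH DISTRIBUTION AND INDEXES ALL;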

Odoo 10 is not backing up DB in PostgreSQL 9.5. Shows "SQL state: 22008. Timestamp out of range on account_bank_statement_line."

At our company we had a DB crash a few days ago due to hardware reasons. We recovered from that, but since then we've been getting the following error every time we try to back up our DB.
pg_dump: ERROR: timestamp out of range
pg_dump: SQL command to dump the contents of table "account_bank_statement_line"
The error is in the "account_bank_statement_line" table, where we have 5 rows in which only the 'create_date' column has a value (a date in the year 4855!); all the other columns are NULL, even the id (primary key). We can't even delete or update those rows using pgAdmin 4 or the PostgreSQL terminal.
We're in a very risky situation right now, with no backup of a few days of retail sales. Any hints at all would be highly appreciated.
First, if the data are important, hire a specialist.
Second, run your pg_dump with the option --exclude-table=account_bank_statement_line so that you at least have a backup of the rest of your database.
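For example (the database name and output file are placeholders):
pg_dump --exclude-table=account_bank_statement_line -f partial_backup.sql your_database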
The next thing you should do is to stop the database and take a cold backup of all the files. That way you have something to go back to if you mess up.
The key to proceeding is to find the ctids (physical addresses) of the problematic rows. You can then use those to delete the rows.
You can approach that by running queries like
SELECT create_date FROM account_bank_statement_line
WHERE ctid < '(42,0)';
and try to find the ctids where you get an error. Once you have found a row where the following falls over:
SELECT * FROM account_bank_statement_line
WHERE ctid = '(42,14)';
you can delete the row by its ctid.
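For example, if the broken row turned out to be at ctid (42,14), the delete would be (the ctid value here is just an illustration):
DELETE FROM account_bank_statement_line WHERE ctid = '(42,14)';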
Once you are done, take a pg_dumpall of the database cluster, create a new one and restore the dump. It is dangerous to continue working with a cluster that has experienced corruption, because corruption can remain unseen and spread.
I know what we did might not be the most technically advanced, but it solved our issue. We consulted a few experts, and what we did was:
1. Migrate all the data to a new table (account_bank_statement_line2); this transferred all the rows that had valid data.
2. Delete the old "account_bank_statement_line" table; once the valid rows had been copied, we could drop it.
3. Rename the new table to "account_bank_statement_line".
Then the DB backup ran smoothly, like always.
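A rough sketch of that copy-and-swap, assuming the corrupted rows can be filtered out by their NULL id (the filter is an assumption, and constraints, indexes and foreign keys on the original table would have to be recreated separately):
-- copy only the readable, valid rows into a new table
CREATE TABLE account_bank_statement_line2 AS
SELECT * FROM account_bank_statement_line WHERE id IS NOT NULL;
-- drop the corrupted original and take over its name
DROP TABLE account_bank_statement_line;
ALTER TABLE account_bank_statement_line2 RENAME TO account_bank_statement_line;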
Hope this helps anyone who's in deep trouble like us. Cheers!

PostgreSQL select query on table that is being updated

I assume this question has been asked before, but unfortunately I cannot find the answer to my question.
I have a table, and I am using an UPDATE statement to update a column. Simultaneously I am running a CREATE TABLE ... AS SELECT query that retrieves data from the same table and column that is being updated.
My questions are: can this lead to wrong results in the output of the CREATE TABLE statement? Does the UPDATE query finish first and then the CREATE TABLE with the SELECT execute? I just know that the CREATE TABLE statement is taking way longer to execute.
In PostgreSQL, readers never block writers and vice versa. This is guaranteed by PostgreSQL's MVCC implementation, which keeps old row versions around.
If the updating transaction isn't finished yet, the reading transaction will see the old value, and the result is consistent.
There is nothing inside PostgreSQL that should slow down the SELECT statement noticeably, but of course I/O contention is a possible explanation.
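A small demonstration of that snapshot behaviour, with hypothetical table and column names, run in two separate sessions:
-- session 1: update a row but leave the transaction open
BEGIN;
UPDATE accounts SET balance = balance + 100 WHERE id = 1;
-- session 2: does not wait for session 1; it sees the last committed balance
CREATE TABLE accounts_snapshot AS
SELECT id, balance FROM accounts;
-- session 1: only after COMMIT will new readers see the updated value
COMMIT;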

Query to find a critical situation in Greenplum database (4.3.12), which has Postgres version 8.2.15

I need a SELECT query that finds the users and procpids, along with the query statements they are running and their waiting state, whenever a particular table is accessed by those queries, i.e. the situation where one query blocks a table and the others wait on it.
Thanks.
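A rough sketch of such a query (the column names procpid, current_query and waiting are assumptions based on the PostgreSQL 8.2-era catalogs that Greenplum 4.3 builds on, and your_table_name is a placeholder, so verify them on your system):
SELECT a.usename, a.procpid, a.current_query, a.waiting, l.mode, l.granted
FROM pg_stat_activity a
JOIN pg_locks l ON l.pid = a.procpid
JOIN pg_class c ON c.oid = l.relation
WHERE c.relname = 'your_table_name';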

Alter a SELECT Query to Limit

I'm working on a 1M+ row table. The software that inserts the data sometimes tries to select all rows, and when it does that, it crashes.
I'm not able to modify the software so I'm trying to implement a fix on the Postgresql side.
I want PostgreSQL to limit the SELECT query results coming from one particular user to 1 row.
I tried to implement a RULE but haven't been able to do it with success. Any suggestions are welcome.
Br,
You could rename the table and create a view with the name of the table (selecting from the renamed table).
Then you can include a LIMIT clause in the view definition.
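A minimal sketch of that idea, with hypothetical object names (note that, as written, the limit applies to every user who selects from the view, not only the special one):
-- keep the real data under a new name
ALTER TABLE big_table RENAME TO big_table_data;
-- expose the old name as a view that never returns more than one row
CREATE VIEW big_table AS
SELECT * FROM big_table_data LIMIT 1;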
There is a chance you need an index. Let me give you a few scenarios:
There is a unique constraint on one of the fields but no corresponding index. This way, when you insert a record, PostgreSQL has to scan the table to see whether there is an existing record with the same value in that field.
Your software mimics a unique field constraint. Before inserting a new record, it scans the table for a record with the same value in one of the fields to check whether such a record already exists. An index on the right field would definitely help.
Your software wants to compute the next "id" value. In this case it runs SELECT MAX(id) to find the next available value, so "id" needs an index (see the sketch below).
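For example (table and column names are placeholders for whatever your schema actually uses):
-- for the uniqueness checks: enforce them with an index instead of a full-table scan
CREATE UNIQUE INDEX CONCURRENTLY orders_external_id_idx ON orders (external_id);
-- for SELECT MAX(id): a plain index lets PostgreSQL read just the last index entry
CREATE INDEX CONCURRENTLY orders_id_idx ON orders (id);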
Try to find out whether indexing one of the table fields helps. You can also trace and analyze the queries submitted to the server and see whether they would benefit from an index; you can enable query logging as described in "How to log PostgreSQL queries?".
Another guess is that your software buffers all records before processing them, and reading 1M records into memory may crash it. Limiting the fetch size (e.g. if your software uses JDBC, you could add the defaultRowFetchSize connection parameter to the connection string) may help, though I realize you may not have a way to change how the existing software fetches data from the DB.
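For instance, with the PostgreSQL JDBC driver the parameter goes into the connection URL (host, port and database name below are placeholders, and the driver only streams rows in batches when autocommit is off):
jdbc:postgresql://dbhost:5432/mydb?defaultRowFetchSize=1000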