I connect to my server via SSH, and I tried to run a query with impala-shell on the server like below:
impala-shell -B --output_delimiter=',' -o test_file.txt -q " my query "
The data size is quite big; the total row count is about 500,000,000.
But after a few hours the server disconnected, and the result file has only 470,000,000 rows.
The strange thing is that when a query ends successfully, I always see a message like the one below:
Fetched 10438968 row(s) in 433.84s
but this time there was no such message. I think the server disconnected while the query was still running.
Actually, the server is set up for auto-logout (the timeout value is 300). Is this the problem? Can the server disconnect even though a query is running?
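One way to test that hypothesis (a sketch only, assuming the disconnect really does come from the shell's auto-logout): run the same command detached from the terminal with nohup, so the query is not tied to the interactive SSH session; the log file name here is just a placeholder.
nohup impala-shell -B --output_delimiter=',' -o test_file.txt -q " my query " > impala_run.log 2>&1 &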
I have executed a query against my remote PostgreSQL database using a simple JDBC Java program, and I want the process ID for it. Can anyone suggest how I can get the process ID for the query I have executed?
The process ID is assigned when you open the connection to the database server.
It's per connection, not an ID "per query"!
So before you run your actual query, you can run:
select pg_backend_pid();
to get the PID assigned to your JDBC connection. Then you can, e.g., log it or print it somehow so that you know it once your query is running.
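A minimal JDBC sketch of that idea (the URL, credentials, and final query are placeholders, and it assumes the PostgreSQL JDBC driver is on the classpath): fetch the backend PID on the same connection you will use for your query, log it, then run the actual statement.

import java.sql.*;

public class BackendPidExample {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:postgresql://localhost:5432/mydb";   // placeholder connection details
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement st = conn.createStatement()) {

            // PID of the server backend handling this connection
            try (ResultSet rs = st.executeQuery("select pg_backend_pid()")) {
                rs.next();
                System.out.println("Backend PID: " + rs.getInt(1));
            }

            // ... now run the actual query on the same connection ...
            st.execute("select 1");   // placeholder for the real query
        }
    }
}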
I have a 3-member replica set, and the read preference is set to "Secondary Preferred". How can I check that the application is reading from a secondary node in MongoDB? Please suggest.
Firstly, you can configure profiling. For that you need to start your MongoDB servers with the option --profile 2 and configure a log file. It'll record all queries.
After that you can read the recorded queries for each instance's database. A simple example:
db.system.profile.find({ns: "name_your_db.your_collection"})
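To read that collection when connected directly to a secondary, you also have to allow secondary reads; a rough sketch in the legacy mongo shell (database and collection names are placeholders):

rs.slaveOk()   // allow reads on a secondary; newer shells use rs.secondaryOk()
db.system.profile.find({ ns: "name_your_db.your_collection" }).sort({ ts: -1 }).limit(5)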
Secondly, you can use mongotop. You need to start it for each MongoDB server.
For example:
mongotop -h server_ip:server_port seconds
mongotop -h 127.0.0.1:27017 5
Every specified period of time it will print a report from which you can read how much time was spent on reads and writes for each collection.
Other means of determining whether queries are sent to secondaries:
Enable command logging in the driver, which should tell you which server each command was sent to.
Set the minimum connection pool size to 0 so that a connection is only made when needed for a query, run some queries, then inspect the server logs to see which members did or did not receive connections.
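As a rough illustration of the log-inspection approach (the log path is an assumption, and the exact wording varies by MongoDB version and configuration), you can watch the secondary's log for incoming client connections while the application runs its queries:

tail -f /var/log/mongodb/mongod.log | grep -i "connection accepted"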
I am running MySQL Workbench 6.3 on a 64-bit Windows 7 laptop. When I do a simple query to get all the data in a single table with ~400 rows of data, the query stays in "running . . ." status and eventually returns Error Code: 2013 Lost connection to MySQL server at "waiting for initial communication". If I limit the results to 1000 rows, the query works fine; it's only when I allow more than 2000 rows that this occurs.
I do have "Use compression protocol" enabled, which I had hoped would fix the issue.
The other thing I noticed is that if I run the query on my Mac I do not have this issue; I get more than 10,000 rows with no problem.
Has anyone else had this issue and resolved it?
~michemali
Seems like a timeout issue. Please refer to this post to see if it resolves your issue:
MySQL Workbench: How to keep the connection alive
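For reference, a hedged sketch of the usual knobs (the values are arbitrary examples, not recommendations): Workbench has a "DBMS connection read timeout" setting under Edit > Preferences > SQL Editor, and on the server side you can raise the session timeouts before running the query:

SET SESSION net_read_timeout = 600;    -- seconds the server waits while reading from the client
SET SESSION net_write_timeout = 600;   -- seconds the server waits while writing to the client
SET SESSION wait_timeout = 28800;      -- seconds an idle connection is kept open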
I'm trying to bulk load around 200M lines (3.5GB) of data to an Amazon RDS postgresql DB using the following command:
cat data.csv | psql -h<host>.rds.amazonaws.com -U<user> <db> -c "COPY table FROM STDIN DELIMITER AS ','"
After a couple of minutes I get this error:
connection not open
connection to server was lost
If I run head -n 100000000 data.csv to send only the first 100M lines instead of all 200M, the command succeeds. I'm guessing that there's a timeout somewhere that's causing the query with the full dataset to fail. I've not been able to find any relevant timeout settings or parameters though.
How can I make the bulk insert succeed with the full dataset?
As I read the statement you're using, it basically creates a giant string, then connects to SQL, and then it tries to feed the entire string as an argument.
If you load psql and run something like \copy ... from '/path/to/data.csv' ..., I'd imagine the connection might stay alive while the file's content is streamed chunk by chunk.
That would be my hunch as to why 100M lines work (= the argument is pushed entirely before the connection times out) but not the entire file (= the argument is still uploading).
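If you want to try that route, a sketch of the client-side \copy variant (adjust host, user, table name, and CSV options to your case); psql reads the file itself and streams it to the server:

psql -h <host>.rds.amazonaws.com -U <user> <db> -c "\copy table FROM 'data.csv' WITH (FORMAT csv, DELIMITER ',')"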
I'm trying to load a huge amount of data into PostgreSQL (PostGIS, to be specific).
There are about 100 scenes; each scene contains 12 bands of raster imagery, and each image is about 100 MB.
What I do:
For each scene in scenes (
    for each band in scene (
        Open connection to postGIS db
        Add band
    )
    SET PGPASSWORD=password
    psql -h 192.168.2.1 -p 5432 -U user -d spatial_db -f combine_bands.sql
)
It ran well until scene #46, then it caused the error: No buffer space available (maximum connections reached).
I run the script on Windows 7; my remote server runs Ubuntu 12.04 LTS.
UPDATE: Connect to the remote server and run the SQL file.
This message:
No buffer space available (maximum connections reached?)
comes from a Java exception, not the PostgreSQL server. A Java stack trace may be useful to get some context.
If the connection was rejected by PostgreSQL, the message would be:
FATAL: connection limit exceeded for non-superusers
Still, it may be that the program exceeds its maximum number of open sockets by not closing its connections to PostgreSQL. Your script should close each DB connection as soon as it's finished with it, or open just one and reuse it throughout the whole process.
Simultaneous connections from the same program are only needed when issuing queries in parallel, which doesn't seem to be the case here.
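As a concrete (if simplified) illustration of the single-connection advice, with file names that are only assumptions: instead of opening a connection for every band, you could write the per-band SQL to files, concatenate them, and run psql once per scene, so only one connection is open at a time.

SET PGPASSWORD=password
REM Concatenate the per-band SQL (hypothetical file names) and run psql once,
REM so a single connection is opened instead of one per band.
type band_*.sql > all_bands.sql
psql -h 192.168.2.1 -p 5432 -U user -d spatial_db -f all_bands.sql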