phantom "name" column? - postgresql

I start simple:
hoops=# select * from core_school limit 3;
id | school_name | nickname
----+------------------+----------
1 | Marshall |
2 | Ohio |
3 | Houston |
(3 rows)
Let's introduce an intentional error:
hoops=# select name from core_school;
ERROR: column "name" does not exist
LINE 1: select name from core_school;
But why does this work? (with an unexpected result!):
hoops=# select core_school.name from core_school limit 3;
name
-----------------
(1,Marshall,"")
(2,Ohio,"")
(3,Houston,"")
(3 rows)
Where did the "name" column come from in the third query?

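This is PostgreSQL's attribute notation for function calls, which allows calling function(argument) as argument.function. Because name is also a data type, the call amounts to casting the whole row to name.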
What you are really calling is
SELECT NAME(core_school)
FROM core_school
Compare to this:
SELECT (1::int).exp
--
2.71828182845905
which is quite self-explanatory.
This "feature" very often leads to confusion and will (finally) be removed in 9.1.

Maybe you have a different version of Postgres than I do. (I've got 8.3.7.) But I don't have any such "phantom" name column.
If you simply say "select core_school from core_school" you'll get one line of output for each row in the table, with that line consisting of a parenthesized row value holding all the columns in the table. That's what you're seeing.
Oh, I notice that you're getting a column name of name. Maybe you didn't really put a period between "core_school" and "name" but a space, and now "name" is an alias for the column name. (My Postgres installation requires the word "as" to make an alias for a column name, but some databases do not require this, so maybe there's an option in Postgres somewhere for compatibility.)

Related

Optimise a simple update query for large dataset

I have some data migration that has to occur between a parent and child table. For the sake of simplicity, the schemas are as follows:
 -------      -----------
| event |    | parameter |
 -------      -----------
| id    |    | id        |
| order |    | eventId   |
 -------     | order     |
              -----------
Because of an oversight with business logic that needs to be performed, we need to update parameter.order to the parent event.order. I have come up with the following SQL to do that:
UPDATE "parameter"
SET "order" = e."order"
FROM "event" e
WHERE "eventId" = e.id
The problem is that this query didn't resolve after over 4 hours and I had to clock out, so I cancelled it.
There are 11 million rows on parameter and 4 million rows on event. I've run EXPLAIN on the query and it tells me this:
Update on parameter (cost=706691.80..1706622.39 rows=11217313 width=155)
-> Hash Join (cost=706691.80..1706622.39 rows=11217313 width=155)
Hash Cond: (parameter."eventId" = e.id)
-> Seq Scan on parameter (cost=0.00..435684.13 rows=11217313 width=145)
-> Hash (cost=557324.91..557324.91 rows=7724791 width=26)
-> Seq Scan on event e (cost=0.00..557324.91 rows=7724791 width=26)
According to this article, the "cost" reported by EXPLAIN is an "arbitrary unit of computation".
Ultimately, this update needs to be performed, but I would accept it happening in one of two ways:
1. I am advised of a better way to do this query that executes in a timely manner (I'm open to all suggestions, including updating schemas, indexing, etc.)
2. The query remains the same but I can somehow get an accurate prediction of execution time (even if it's hours long). This way, at least, I can manage the expectations of the team. I understand that without actually running the query it can't be expected to know the times, but is there an easy way to "convert" these arbitrary units into some millisecond execution time?
Edit for Jim Jones' comment:
I executed the following query:
SELECT psa.pid,locktype,mode,query,query_start,state FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid
I got 9 identical rows like the following:
pid   | locktype | mode            | query       | query_start         | state
------+----------+-----------------+-------------+---------------------+--------
23192 | relation | AccessShareLock | <see below> | 2021-10-26 14:10:01 | active
query column:
--update parameter
--set "order" = e."order"
--from "event" e
--where "eventId" = e.id
SELECT psa.pid,locktype,mode,query,query_start,state FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid
Edit 2: I think I've been stupid here... The query produced by checking these locks is just the commented query. I think that means there's actually nothing to report.
If some rows already have the target value, you can skip those empty updates, which otherwise cost as much as a real update. Like:
UPDATE parameter p
SET "order" = e."order"
FROM event e
WHERE p."eventId" = e.id
AND p."order" IS DISTINCT FROM e."order"; -- this
If both "order" columns are defined NOT NULL, simplify to:
...
AND p."order" <> e."order";
See:
How do I (or can I) SELECT DISTINCT on multiple columns?
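To see how many rows such a filter spares, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The sample rows are made up, SQLite's null-safe IS NOT takes the place of IS DISTINCT FROM, and a correlated subquery replaces UPDATE ... FROM so that it also runs on older SQLite versions. Those substitutions are assumptions of the sketch, not part of the answer above.

```python
import sqlite3

# In-memory stand-in for the two tables; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event (id INTEGER PRIMARY KEY, "order" INTEGER);
    CREATE TABLE parameter (id INTEGER PRIMARY KEY, "eventId" INTEGER, "order" INTEGER);
    INSERT INTO event VALUES (1, 10), (2, 20);
    -- parameter 1 already matches its event, 2 differs, 3 is NULL
    INSERT INTO parameter VALUES (1, 1, 10), (2, 1, 99), (3, 2, NULL);
""")

# Only touch rows whose value really changes. SQLite's "IS NOT" is null-safe,
# playing the role of PostgreSQL's IS DISTINCT FROM here.
cur = conn.execute("""
    UPDATE parameter
    SET "order" = (SELECT e."order" FROM event e WHERE e.id = parameter."eventId")
    WHERE EXISTS (
        SELECT 1 FROM event e
        WHERE e.id = parameter."eventId"
          AND e."order" IS NOT parameter."order"
    )
""")
rows_touched = cur.rowcount
final_rows = conn.execute('SELECT id, "order" FROM parameter ORDER BY id').fetchall()
print(rows_touched, final_rows)
```

The row that already holds the target value is left alone, so only the two rows that actually differ are rewritten.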
If you have to update all or most rows - and can afford it! - writing a new table may be cheaper overall, like Mike already mentioned. But concurrency and dependent objects may stand in the way.
Aside: use legal, lower-case identifiers, so you don't have to double-quote. Makes your life with Postgres easier.
The query will be slow because for each UPDATE operation, it has to look up the index by id. Even with an index, on a large table, this is a per-row read/write so it is slow.
I'm not sure how to get a good estimate; maybe run the update on 1% of the table and multiply?
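The "sample and multiply" idea can be sketched as a small Python helper. It assumes the run time grows linearly with the number of rows, which is only a rough approximation (caching, checkpoints and index maintenance can all bend the curve). The function name and the timing figure are hypothetical; the row counts are taken from the EXPLAIN output in the question.

```python
def estimate_total_seconds(sample_rows: int, sample_seconds: float, total_rows: int) -> float:
    """Linearly extrapolate the run time of a sample update to the whole table."""
    if sample_rows <= 0:
        raise ValueError("sample_rows must be positive")
    return sample_seconds * total_rows / sample_rows


# Hypothetical measurement: updating about 1% of the rows took 90 seconds.
estimate = estimate_total_seconds(
    sample_rows=112_173, sample_seconds=90.0, total_rows=11_217_313
)
print(f"roughly {estimate / 3600:.1f} hours")
```

Treat the result as an order-of-magnitude figure for planning, not as a promise.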
I suggest creating a new table, then dropping the old one and renaming the new table.
CREATE TABLE parameter_new AS
SELECT
parameter.id,
parameter."eventId",
e."order"
FROM
parameter
JOIN event AS "e" ON
"e".id = parameter."eventId"
Later, once you verify things:
ALTER TABLE parameter RENAME TO parameter_old;
ALTER TABLE parameter_new RENAME TO parameter;
Later, once you're completely certain:
DROP TABLE parameter_old;

PostgreSQL UPDATE doesn't seem to update some rows

I am trying to update a table from another table, but a few rows simply don't update, while the other million rows work just fine.
The statement I am using is as follows:
UPDATE lotes_infos l
SET quali_ambiental = s.quali_ambiental
FROM sirgas_lotes_centroid s
WHERE l.sql = s.sql AND l.quali_ambiental IS NULL;
It says 647 rows were updated, but I can't see the change.
I've also tried without the IS NULL clause; the results are the same.
If I do a join it seems to work as expected, the join query I used is this one:
SELECT sql, l.quali_ambiental, c.quali_ambiental FROM lotes_infos l
JOIN sirgas_lotes_centroid c
USING (sql)
WHERE l.quali_ambiental IS NULL;
It returns 787 rows, (some are both null, that's ok), this is a sample from the result from the join:
sql | quali_ambiental | quali_ambiental
------------+-----------------+-----------------
1880040001 | | PA 10
1880040001 | | PA 10
0863690003 | | PA 4
0850840001 | | PA 4
3090500003 | | PA 4
1330090001 | | PA 10
1201410001 | | PA 9
0550620002 | | PA 6
0430790001 | | PA 1
1340180002 | | PA 9
I used QGIS to visualize the results, and could not find any clue as to why this is happening. The sirgas_lotes_centroid table is derived from the other table, with the geometry being the centroid of the polygon. I used the centroid to perform faster spatial joins and now need to copy the information into the table with the original polygon.
The sql column is type text, quali_ambiental is varchar(6) for both.
If I directly update one row using the following query, it works just fine:
UPDATE lotes_infos
SET quali_ambiental = 'PA 1'
WHERE sql LIKE '0040510001';
If you don't see results of a seemingly sound data-modifying query, the first question to ask is:
Did you commit your transaction?
Many clients work with auto-commit by default, but some do not. And even in the standard client psql you can start an explicit transaction with BEGIN (or syntax variants) to disable auto-commit. Then results are not visible to other transactions before the transaction is actually committed with COMMIT. The open transaction might hang indefinitely (which creates additional problems), or be rolled back by some later interaction.
That said, you mention: some are both null, that's ok. You'll want to avoid costly empty updates with something like:
UPDATE lotes_infos l
SET quali_ambiental = s.quali_ambiental
FROM sirgas_lotes_centroid s
WHERE l.sql = s.sql
AND l.quali_ambiental IS NULL
AND s.quali_ambiental IS NOT NULL; --!
Related:
How do I (or can I) SELECT DISTINCT on multiple columns?
The duplicate 1880040001 in your sample can have two explanations. Either lotes_infos.sql is not UNIQUE (even after filtering with l.quali_ambiental IS NULL). Or sirgas_lotes_centroid.sql is not UNIQUE. Or both.
If it's just lotes_infos.sql, your query should still work. But duplicates in sirgas_lotes_centroid.sql make the query non-deterministic (as @jjanes also pointed out). A target row in lotes_infos can have multiple candidates in sirgas_lotes_centroid. The outcome is arbitrary for lack of definition. If one of them has quali_ambiental IS NULL, it can explain what you observed.
My suggested query fixes the observed problem superficially, in that it excludes NULL values in the source table. But if there can be more than one non-null, distinct quali_ambiental for the same sirgas_lotes_centroid.sql, your query remains broken, as the result is arbitrary. You'll have to define which source row to pick and translate that into SQL.
Here is one example of how to do that (chapter "Multiple matches..."):
Updating the value of a column
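As a minimal sketch of "define which source row to pick", the following uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The tie-break rule (take the smallest non-null quali_ambiental per sql value) is purely an assumption for illustration, as are the sample rows; the real rule has to come from the data's meaning.

```python
import sqlite3

# In-memory stand-in; the rows are invented to include duplicates and a NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lotes_infos (sql TEXT, quali_ambiental TEXT);
    CREATE TABLE sirgas_lotes_centroid (sql TEXT, quali_ambiental TEXT);
    INSERT INTO lotes_infos VALUES
        ('1880040001', NULL), ('0430790001', NULL), ('0000000000', NULL);
    INSERT INTO sirgas_lotes_centroid VALUES
        ('1880040001', NULL), ('1880040001', 'PA 10'), ('1880040001', 'PA 4'),
        ('0430790001', 'PA 1');
""")

# Hypothetical tie-break: MIN() ignores NULLs and picks exactly one value per
# key, so the result no longer depends on which duplicate is visited first.
conn.execute("""
    UPDATE lotes_infos
    SET quali_ambiental = (
        SELECT MIN(s.quali_ambiental)
        FROM sirgas_lotes_centroid s
        WHERE s.sql = lotes_infos.sql
    )
    WHERE quali_ambiental IS NULL
      AND EXISTS (
        SELECT 1 FROM sirgas_lotes_centroid s
        WHERE s.sql = lotes_infos.sql
          AND s.quali_ambiental IS NOT NULL
      )
""")
result = conn.execute(
    "SELECT sql, quali_ambiental FROM lotes_infos ORDER BY sql"
).fetchall()
print(result)
```

Note that MIN() compares text here, so 'PA 10' sorts before 'PA 4'. Whether that is the right winner is exactly the kind of decision the query has to spell out.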
Always include exact table definitions (CREATE TABLE statements) with any such question. This would save a lot of time wasted for speculation.
Aside: Why are the sql columns type text? Values like 1880040001 strike me as integer or bigint. If so, text is a costly design error.

Is this a postgresql bug? Only one row can not query by equal but can query by like

I have a table in which exactly one row cannot be found with an equality comparison, but can be found with LIKE (the pattern does not include %).
PostgreSQL server version: 90513 (9.5.13)
# select id,external_id,username,external_id from users where username = 'oFIC94vdidrrKHpi5lc1_2Ibv-OA';
id | external_id | username | external_id
----+-------------+----------+-------------
(0 rows)
# select id,external_id,username,external_id from users where username like 'oFIC94vdidrrKHpi5lc1_2Ibv-OA';
id | external_id | username | external_id
--------------------------------------+------------------------------+------------------------------+------------------------------
61ebea19-74f5-4713-9a30-63eb5af8ac8f | oFIC94vdidrrKHpi5lc1_2Ibv-OA | oFIC94vdidrrKHpi5lc1_2Ibv-OA | oFIC94vdidrrKHpi5lc1_2Ibv-OA
(1 row)
If I dump this table and restore it, the problem goes away. But why?
Is it a PostgreSQL bug? How can I work around it? I've run into this twice.
Do you have an index on this table? If yes, this looks like a corrupted index: PostgreSQL uses the index in the first case, and if the index is corrupt it might return no result.
This is usually a bug, either in software or in hardware (data loss on power failure, or memory issues). Try dropping and recreating the index, or rebuilding it with https://www.postgresql.org/docs/9.3/sql-reindex.html

Build a list of grouped values

I'm new to this site and this is the first time I've posted a question, so sorry if I get anything wrong. The question may be old, but I just can't find an answer for SQL Anywhere.
I have a table like
Order | Mark
======|========
1 | AA
2 | BB
1 | CC
2 | DD
1 | EE
I want to have result as following
Order | Mark
1 | AA,CC,EE
2 | BB,DD
My current SQL is
Select Order, Cast(Mark as NVARCHAR(20))
From #Order
Group by Order
and it just gives me a result identical to the original table.
Any ideas?
You can use the ASA LIST() aggregate function (untested, you might need to enclose the order column name into quotes as it is also a reserved name):
SELECT Order, LIST( Mark )
FROM #Order
GROUP BY Order;
You can customize the separator character and order if you need.
Note: it is rather a bad idea to
name a table or column after an SQL keyword (as in ORDER BY)
use the same name for a column and a table (Order)
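To try the grouping idea without a SQL Anywhere server, here is a minimal sketch using Python's built-in sqlite3 module, where group_concat() plays the role of LIST(). That substitution, the table name order_marks and the quoted column name are assumptions made for the sketch. SQLite does not guarantee the order of the concatenated items, so the check below compares them after sorting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "Order" is a reserved word, so the column name is double-quoted.
conn.executescript("""
    CREATE TABLE order_marks ("Order" INTEGER, Mark TEXT);
    INSERT INTO order_marks VALUES
        (1, 'AA'), (2, 'BB'), (1, 'CC'), (2, 'DD'), (1, 'EE');
""")

# One output row per "Order" value, with its marks joined by commas.
rows = conn.execute("""
    SELECT "Order", group_concat(Mark, ',')
    FROM order_marks
    GROUP BY "Order"
    ORDER BY "Order"
""").fetchall()

# Normalise the item order inside each group before inspecting it.
grouped = {order: sorted(marks.split(",")) for order, marks in rows}
print(grouped)
```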

Jmeter Issues with JDBC Request and variables

I'm having a few issues with Jmeter and storing/using variables from them:
I have a JDBC request which does a VERY simple "select statement" with the following sql:
select count(member_id) from member
This is then stored in a variable named count. I know what the count should be (should be 312), but the value count_1 gets is 40077. What is even more troubling is at some point, it started working and getting the correct count. Any idea what is going on?
In a separate JDBC request, I retrieve a list of members:
select member_id from members
This is stored in a variable named members. Then I created a THIRD JDBC request to query and grab a random member:
select * from members where member_id = ?
In "Parameter values", I put in ${__V(member_${__Random(1,10)})} (note I put 10, not $count because I can't even get it to work correctly with a hard coded number). I see that this gets parsed correctly, but the error I get is:
org.postgresql.util.PSQLException: ERROR: invalid input syntax for integer: "member_7"
So it's not substituting the member_7 variable's value. Instead it's just passing the string. What am I doing wrong here?
If you have table member, where you have some member_id in this way (for example):
| member_id |
+-----------+
| 1 |
| 2 |
| 1 |
And you would like to count UNIQUE members from this table, you must use SELECT in this way:
SELECT COUNT(DISTINCT member_id) FROM member;
If you omit the keyword DISTINCT, you get a count of all rows with a non-NULL member_id, duplicates included.
The second SELECT should be written in a similar way:
SELECT DISTINCT member_id FROM member;
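The difference between the counts can be checked with a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the real database. The sample rows are made up, with a NULL added to show that COUNT(column) skips NULLs while COUNT(*) does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE member (member_id INTEGER);
    INSERT INTO member VALUES (1), (2), (1), (NULL);
""")

# Three ways of counting the same four rows.
count_rows, count_ids, count_unique = conn.execute("""
    SELECT COUNT(*), COUNT(member_id), COUNT(DISTINCT member_id)
    FROM member
""").fetchone()
print(count_rows, count_ids, count_unique)  # 4 3 2
```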
As for the last question: why are you passing a value like 'member_7' where an integer is expected?