PostgreSQL Select acorss tables - postgresql

I have very little experience with PostgreSQL and looking for a way using pgadmin to list out the rows across tables with foreign keys so that I can quickly update some records without doing this via a webapp.
All my tables are prefixed with app
I can select * from one table, but don't know what I'm doing beyond that to get the related table data.
SELECT app_choice.choice_text, app_choice.question_id, app_choice_choice_value
FROM app_choice;
INNER JOIN app_projectquestionnairequestion
ON app.questionnaire.id = app_projectquestionnairequestion.id;
Some my choice table is linked to my projectquestionnairequestion table. So basically every question in projectquestionnairequestion has a related choice in the choice table.
But not all questions have a choice available so i need a way to list all the questions to let me add a choice to it?
Sorry for the bad explanation. It's hard to explain when I dont know the terms
Thanks

This is not an answer but a comment that doesn't fit in the comments section. Don't upvote it since I'll remove it.
Your syntax is a bit wrong, but you are not too far from it. Also, you should use "table aliases" as well (c and q below). Your query could look like:
SELECT
c.choice_text, c.question_id, c.choice_value,
q.name -- this would be the "name" column of the second table
FROM app_choice c
INNER JOIN app_projectquestionnairequestion q
ON q.id = c.question_id;

Related

Most efficient way to DECODE multiple columns -- DB2

I am fairly new to DB2 (and SQL in general) and I am having trouble finding an efficient method to DECODE columns
Currently, the database has a number of tables most of which have a significant number of their columns as numbers, these numbers correspond to a table with the real values. We are talking 9,500 different values (e.g '502=yes' or '1413= Graduate Student')
In any situation, I would just do WHERE clause and show where they are equal, but since there are 20-30 columns that need to be decoded per table, I can't really do this (that I know of).
Is there a way to effectively just display the corresponding value from the other table?
Example:
SELECT TEST_ID, DECODE(TEST_STATUS, 5111, 'Approved, 5112, 'In Progress') TEST_STATUS
FROM TEST_TABLE
The above works fine.......but I manually look up the numbers and review them to build the statements. As I mentioned, some tables have 20-30 columns that would need this AND some need DECODE statements that would be 12-15 conditions.
Is there anything that would allow me to do something simpler like:
SELECT TEST_ID, DECODE(TEST_STATUS = *TableWithCodeValues*) TEST_STATUS
FROM TEST_TABLE
EDIT: Also, to be more clear, I know I can do a ton of INNER JOINS, but I wasn't sure if there was a more efficient way than that.
From a logical point of view, I would consider splitting the lookup table into several domain/dimension tables. Not sure if that is possible to do for you, so I'll leave that part.
As mentioned in my comment I would stay away from using DECODE as described in your post. I would start by doing it as usual joins:
SELECT a.TEST_STATUS
, b.TEST_STATUS_DESCRIPTION
, a.ANOTHER_STATUS
, c.ANOTHER_STATUS_DESCRIPTION
, ...
FROM TEST_TABLE as a
JOIN TEST_STATUS_TABLE as b
ON a.TEST_STATUS = b.TEST_STATUS
JOIN ANOTHER_STATUS_TABLE as c
ON a.ANOTHER_STATUS = c.ANOTHER_STATUS
JOIN ...
If things are too slow there are a couple of things you can try:
Create a statistical view that can help determine cardinalities from the joins (may help the optimizer creating a better plan):
https://www.ibm.com/support/knowledgecenter/sl/SSEPGG_9.7.0/com.ibm.db2.luw.admin.perf.doc/doc/c0021713.html
If your license admits you can experiment with Materialized Query Tables (MQT). Note that there is a penalty for modifications of the base tables, so if you have more of a OLTP workload, this is probably not a good idea:
https://www.ibm.com/developerworks/data/library/techarticle/dm-0509melnyk/index.html
A third option if your lookup table is fairly static is to cache the lookup table in the application. Read the TEST_TABLE from the database, and lookup descriptions in the application. Further improvements may be to add triggers that invalidate the cache when lookup table is modified.
If you don't want to do all these joins you could create yourself an own LOOKUP function.
create or replace function lookup(IN_ID INTEGER)
returns varchar(32)
deterministic reads sql data
begin atomic
declare OUT_TEXT varchar(32);--
set OUT_TEXT=(select text from test.lookup where id=IN_ID);--
return OUT_TEXT;--
end;
With a table TEST.LOOKUP like
create table test.lookup(id integer, text varchar(32))
containing some id/text pairs this will return the text value corrseponding to an id .. if not found NULL.
With your mentioned 10k id/text pairs and an index on the ID field this shouldn't be a performance issue as such data amount should be easily be cached in the corresponding bufferpool.

ERROR: syntax error at or near "group"

Hello I'm writing an sql query But i am getting a syntax error on the line with the GROUP BY. What can possibly be the problem, help if you can please.
UPDATE intersection_points i
SET nbr_victimes = sum(tue+bl+bg)
FROM accident_ma a ,intersection_points i
WHERE (ST_DWithin(i.st_intersection,a.geom_acc, 10000) group by st_intersection)) ;
GROUP BY is its own clause, it's not part of a WHERE clause.
This is what you have:
WHERE (
ST_DWithin(i.st_intersection,a.geom_acc, 10000)
group by st_intersection
)
This is what you need:
WHERE ST_DWithin(i.st_intersection,a.geom_acc, 10000)
group by st_intersection
Edit: In response to comments, it sounds like your JOIN is a bit more complex than the UPDATE ... FROM syntax would need. Take a look at the "Notes" section on this page:
When a FROM clause is present, what essentially happens is that the target table is joined to the tables mentioned in the from_list, and each output row of the join represents an update operation for the target table. When using FROM you should ensure that the join produces at most one output row for each row to be modified. In other words, a target row shouldn't join to more than one row from the other table(s). If it does, then only one of the join rows will be used to update the target row, but which one will be used is not readily predictable.
Because of this indeterminacy, referencing other tables only within sub-selects is safer, though often harder to read and slower than using a join.
Normally this would involve changing the syntax to something like:
UDPATE SomeTable
SET SomeColumn = 'Some Value'
WHERE AnotherColumn =
(SELECT AnotherColumn
FROM AnotherTable
-- etc.)
However, the use of ST_DWithin() in this query may complicate that quite a bit. Without much deeper knowledge of the table structures, relationships, and overall intent of this update there probably isn't much more help I can give. Essentially you're going to need to clarify for the database exactly what records need to be updated and how to update them, which may involve changing your query to this latter sub-select syntax in some way.
I don' t understand your data structure. I create the following tables from your query. Please check table structure.
if table's structure is this
your query must be
UPDATE intersection_points SET nbr_victimes = (SELECT SUM(a.tue+a.bl+a.bg) FROM accident_ma a WHERE st_dwithin(st_intersection, a.geom_acc, 1000));

Suggestion needed to avoid table scan in large view w/many duplicate values

Could anyone offer a solution to speed up one of our processes? We have a view used for reporting that is a union all of 10 tables. The view has 180 million rows. We would like to generate a list of the distinct values of individual columns. The current SQL generated by the reporting tool does a select distinct on the view which takes 10 minutes. Preferably the solution would be automatically updated. We have been trying to create a MQT in DB2 udb V8 as a union all, refresh immediate with little success. Any suggestions would be greatly appreciated.
Charles.
There are a lot of restrictions in DB2 8.2 for refresh immediate MQTs, and they can have a significant performance impact on applications that write to the base tables. That said, you may be able to use an MQT. However, instead of using SELECT DISTINCT, try making the query look something like:
select yourcolumn, count(*) as ignore
from union_all_view
group by yourcolumn
The column (yourcolumn) from must be defined as NOT NULL for this to work (in DB2 8.2). The optimizer may not select this MQT if you still issue SELECT DISTINCT against the union all view, so you may need to query the MQT (or a view defined on top of it) directly. Ignore the column "ignore" in the MQT -- that is there only for DB2; if you really don't want to see it, you can create a vew on top of the MQT.
However, this is really a database design issue. Why do you need to scan 180 million rows of data to find the unique values in a particular column? Why don't these values already reside in their own table, with foreign keys defined against it from each of the 10 base tables?

join multiple fields to one table

I have 2 tables (there are more but un related to question) optionValue and productStock
I want to get the option names from the optionValue table for each option1, option2, option3 (the query below will should help to make more sense)
below is my attempt, the current query it only works if all options are set but is returns null if any option is not set:
SELECT s.option1, n1.name s.optionName1,
s.option2, n2.name s.optionName2,
s.option3, n3.name s.optionName3
FROM productStock as s
INNER JOIN optionValue n1 on s.option1 = v1.optionValueID
INNER JOIN optionValue n2 on s.option2 = v2.optionValueID
INNER JOIN optionValue n3 on s.option3 = v3.optionValueID
WHERE s.productStockID = 1
I understand why it doesn't work because when the option is null ther is no matches to the optionValue table but im not sure how to fix it (if it is fixable)
I read in a couple of places about using IN or COALESCE but I don't understand how to use them.
It seems like some of your syntax is a little incorrect.
Apart from that you want LEFT OUTER JOIN instead of INNER JOIN.
Visual explanation of SQL joins.
What you really need first it to correct your database design. Anytime you have fields like this:
s.option1,s.option2, s.option3
Then what you really need is a child table to store the information. What happens when you need 6 options or 25? This is a very bad database design and will cause no end of problems incuding the inefficent query you now have to write. This is a cancer at the heart of your system and needs to be fixed before anything else is done.

sybase - fails to use index unless string is hard-coded

I'm using Sybase 12.5.3 (ASE); I'm new to Sybase though I've worked with MSSQL pretty extensively. I'm running into a scenario where a stored procedure is really very slow. I've traced the issue to a single SELECT stmt for a relatively large table. Modifying that statement dramatically improves the performance of the procedure (and reverting it drastically slows it down; i.e., the SELECT stmt is definitely the culprit).
-- Sybase optimizes and uses multi-column index... fast!<br>
SELECT ID,status,dateTime
FROM myTable
WHERE status in ('NEW','SENT')
ORDER BY ID
-- Sybase does not use index and does very slow table scan<br>
SELECT ID,status,dateTime
FROM myTable
WHERE status in (select status from allowableStatusValues)
ORDER BY ID
The code above is an adapted/simplified version of the actual code. Note that I've already tried recompiling the procedure, updating statistics, etc.
I have no idea why Sybase ASE would choose an index only when strings are hard-coded and choose a table scan when choosing from another table. Someone please give me a clue, and thank you in advance.
1.The issue here is poor coding. In your release, poor code and poor table design are the main reasons (98%) the optimiser makes incorrect decisions (the two go hand-in-hand, I have not figured out the proportion of each). Both:
WHERE status IN ('NEW','SENT')
and
WHERE status IN (SELECT status FROM allowableStatusValues)
are substandard, because in both cases they cause ASE to create a worktable for the contents between the brackets, which can easily be avoided (and all consequential issues avoided with it). There is no possibility of statistics on a worktable, since the statistics on either t.status or s.status is missing (AdamH is correct re that point), it correctly chooses a table scan.
Subqueries have their place, but never as a substitute for a pure (the tables are related) join. The corrections are:
WHERE status = "NEW" OR status = "SENT"
and
FROM myTable t,
allowableStatusValues s
WHERE t.status = s.status
2.The statement
|Now you don't have to add an index to get statistics on a column, but it's probably the best way.
is incorrect. Never create Indices that you will not use. If you want statistics updated on a column, simply
UPDATE STATISTICS myTable (status)
3.It is important to ensure that you have current statistics on (a) all indexed columns and (b) all join columns.
4.Yes, there is no substitute for SHOWPLAN on every code segment that is intended for release, doubly so for any code with questionable performance. You can also SET NOEXEC ON, to avoid execution, eg. for large result sets.
An index hint will work around it, but is probably not the solution.
Firstly I'd like to know if there is an index on allowableStatusValues.status, if there is then sybase will have stats on it and will have a good idea on the number of values in there.
If not then the optimiser probably won't have a good idea how many different values Status may take. It's then having to make the assumption that you're going to be extracting almost all of the rows from myTable, and the best way of doing this is a table scan (if no covering index).
Now you don't have to add an index to get statistics on a column, but it's probably the best way.
If you do have an index on allowableStatusValues.status, then i'd wonder how good your stats are. Get yourself a copy of sp__optdiag. You probably also need to tune the values of "histogram tuning factor" and "number of histogram steps", increasing these slightly from the defaults will give you more detailed statistics which always helps the optimiser.
Does it still do a table scan if you replace the subquery with a join:
SELECT m.ID,m.status,m.dateTime
FROM myTable m
JOIN allowableStatusValues a on m.status = a.status
ORDER BY ID
Rather than relying on experimental observations of how long a query takes to run, I would highly recommend getting Sybase to show you the execution plans for each query, for example:
SET showplan ON
GO
-- query/procedure call goes here
SELECT id, status, datetime
FROM myTable
WHERE status IN('NEW','SENT')
ORDER BY id
GO
SET showplan OFF
GO
With SET showplan ON, Sybase generates execution plans for every statement it executes. These can be invaluable in helping to identify where queries are not making use of appropriate indexes. For stored procedures in Sybase, the execution plan for the entire procedure is generated when the stored procedure is first executed after being compiled.
If you post the plans for each of your queries we might be able to shed more light on the problem.
Amazingly, using an index hint resolves the issue (see the (index myIndexName) line below - re-written/simplififed code below:
-- using INDEX HINT
SELECT ID,status,dateTime
FROM myTable (index myIndexName)
WHERE status in (select status from allowableStatusValues)
ORDER BY ID
Weird that I have to use this technique to avoid a table scan, but there ya go.
Garrett, by showing only the simplified code, you have likely stripped out exactly the information that would illuminate the source of the problem.
My first guess would be a type mismatch between allowableStatusValues.status and myTable.status. However, that is not the only possibility. As ninesided stated, the complete query plans (using showplan and fmtonly flags), as well as the actual table definitions and stored procedure source, is much more likely to produce a useful answer.