Join versus SecondaryTable - openjpa

Trying to understand the pros & cons of SecondaryTables here.
I have 2 tables with schema like this:
Table1 pk ID int, NAME varchar(100), DESC varchar(256)
Table1_Strings fk Table1_ID int, LONG_STRING text
Each Table1 entry will have 0-1 Table1_Strings entry. I'd like to lazy load the Table1_Strings.
What is the difference between treating Table1_Strings as a SecondaryTable and mapping it with a Join? Clearly I don't really understand the difference. This is my first taste of OpenJPA.
Thanks! Sorry for the ugly formatting above!

Related

How do you optimize less than in where?

Suppose we have a query like below:
select top 1 X from dbo.Tab
where SomeDateTime < @P1 and SomeID = @P2
order by SomeDateTime desc
What indexes or other techniques do you use to effectively optimize these types of queries?
The table may still have a primary key but on a different column like Id:
Id BIGINT IDENTITY PK
X something
SomeDateTime DateTime2
SomeID BIGINT FK to somewhere
Suppose there are millions of records for a given SomeDateTime or SomeID - table is "long".
Let's assume that data is read and written with a similar frequency.
For posterity, an article with pictures on the topic:
https://use-the-index-luke.com/sql/where-clause/searching-for-ranges/greater-less-between-tuning-sql-access-filter-predicates
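One commonly suggested approach for this query shape (the linked article covers it) is a composite index with the equality column first and the range column second, so the engine can seek to the SomeID group and read SomeDateTime in descending order. Below is a minimal sketch that uses Python's sqlite3 as a stand-in for SQL Server, so LIMIT 1 replaces TOP 1. The index name, the sample rows, and the helper names are illustrative assumptions, and the plan a real SQL Server instance picks may differ.

```python
import sqlite3

# Same shape as the query in the question; SQLite uses LIMIT instead of TOP.
QUERY = """
    SELECT X FROM Tab
    WHERE SomeDateTime < ? AND SomeID = ?
    ORDER BY SomeDateTime DESC
    LIMIT 1
"""

def build_demo_db():
    """Create an in-memory table shaped like dbo.Tab with a composite index."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Tab (
            Id INTEGER PRIMARY KEY,
            X TEXT,
            SomeDateTime TEXT,   -- ISO-8601 strings sort chronologically
            SomeID INTEGER
        );
        -- Equality column first, range column second (hypothetical name).
        CREATE INDEX ix_tab_someid_somedatetime ON Tab (SomeID, SomeDateTime);
        INSERT INTO Tab (X, SomeDateTime, SomeID) VALUES
            ('a', '2024-01-01', 7),
            ('b', '2024-02-01', 7),
            ('c', '2024-03-01', 7),
            ('d', '2024-02-15', 8);
    """)
    return conn

def latest_before(conn, p1, p2):
    """Return X for the newest row of SomeID = p2 strictly before p1, or None."""
    row = conn.execute(QUERY, (p1, p2)).fetchone()
    return row[0] if row else None

def plan_details(conn, p1, p2):
    """Return the human-readable steps of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (p1, p2)).fetchall()
    return [row[-1] for row in rows]

if __name__ == "__main__":
    conn = build_demo_db()
    print(latest_before(conn, "2024-03-01", 7))
    print(plan_details(conn, "2024-03-01", 7))
```

Since the table is written as often as it is read, the extra index adds write cost; whether to also include X in the index to avoid lookups is a trade-off to measure rather than assume.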

How to insert new rows only on tables without Primary or Foreign Keys?

Scenario: I have two tables, Table A and Table B; both have exactly the same columns. My task is to create a master table. I need to ensure the master table contains no duplicates: a row should be added only if it is a new record.
Problem: Whoever built the tables did not assign a primary key to them.
Attempts: I tried running an INSERT INTO ... WHERE NOT EXISTS query (below is an example, not the actual query I ran).
Question: The WHERE t2.id = t1.id portion of the query below confuses me. My table has many columns but no id column; as I said, there is no primary key to anchor the match. So, when all I have are values without primary keys, how can I append only new records? Also, perhaps I am going about this the wrong way: are there any other functions or options in T-SQL worth considering, maybe something other than an INSERT INTO statement? My SQL skills aren't that advanced yet, so I am not asking for a complete solution, just ideas or other methods worth considering. Any ideas are welcome.
INSERT INTO TABLE_2
(id, name)
SELECT t1.id,
t1.name
FROM TABLE_1 t1
WHERE NOT EXISTS(SELECT id
FROM TABLE_2 t2
WHERE t2.id = t1.id)
If I understand your question correctly, you would need to amend the SQL sample you posted by changing the condition t2.id = t1.id to whatever columns you do have.
Say your 2 tables have name and brand columns and you don't want duplicates, just change the sample to:
WHERE t2.name = t1.name
AND t2.brand = t1.brand
This will ensure you don't insert any rows from table 1 into table 2 that are duplicates. You would have to make sure the WHERE condition contains all columns (you said the table schemas are identical).
Also, the above code sample copies everything into table 2 - but you said you want a master table - so you'd have to change it to insert into the master table, not table 2.
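To make the idea concrete, here is a minimal sketch that runs the all-columns NOT EXISTS pattern against an in-memory SQLite database from Python. The table names (table_a, table_b, master), the name and brand columns, and the append_new_rows helper are assumptions for illustration only; the SQL it generates has the same shape as the sample above, with the master table as the target. Note that a plain = comparison never matches NULL values, so nullable columns would need extra handling.

```python
import sqlite3

def append_new_rows(conn, source, target, columns):
    """Insert rows from source that have no all-column match in target.

    Table and column names are spliced into the SQL text, so they must
    come from trusted code, never from user input.
    """
    col_list = ", ".join(columns)
    src_cols = ", ".join(f"s.{c}" for c in columns)
    match = " AND ".join(f"t.{c} = s.{c}" for c in columns)
    sql = (
        f"INSERT INTO {target} ({col_list}) "
        # DISTINCT collapses duplicates that exist inside the source itself.
        f"SELECT DISTINCT {src_cols} FROM {source} s "
        f"WHERE NOT EXISTS (SELECT 1 FROM {target} t WHERE {match})"
    )
    conn.execute(sql)

def build_demo_db():
    """Two source tables with identical columns and an empty master table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table_a (name TEXT, brand TEXT);
        CREATE TABLE table_b (name TEXT, brand TEXT);
        CREATE TABLE master  (name TEXT, brand TEXT);
        INSERT INTO table_a VALUES ('Widget', 'Acme'), ('Gadget', 'Acme');
        INSERT INTO table_b VALUES ('Widget', 'Acme'), ('Widget', 'Globex');
    """)
    return conn

if __name__ == "__main__":
    conn = build_demo_db()
    for source in ("table_a", "table_b"):
        append_new_rows(conn, source, "master", ["name", "brand"])
    print(conn.execute("SELECT name, brand FROM master ORDER BY name, brand").fetchall())
```

Running the helper once per source table loads the master table without copying the row that appears in both sources.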

Create pivot table with dynamic column names

I am creating a pivot table which represents crash values for a particular year. Currently, I am hard-coding the column names to create the pivot table. Is there any way to make the column names dynamic? The years are stored inside an array:
{2018,2017,2016 ..... 2008}
with crash as (
--- pivot table generated for total fatality ---
SELECT *
FROM crosstab('SELECT b.id, b.state_code, a.year, count(case when a.type = ''Fatal'' then a.type end) as fatality
FROM '||state_code_input||'_all as a, (select * from source_grid_repository where state_code = '''||upper(state_code_input)||''') as b
where st_contains(b.geom,a.geom)
group by b.id, b.state_code, a.year
order by b.id, a.year',$$VALUES ('2018'),('2017'),('2016'),('2015'),('2014'),('2013'),('2012'),('2011'),('2010'),('2009'),('2008') $$)
AS pivot_table(id integer, state_code varchar, fat_2018 bigint, fat_2017 bigint, fat_2016 bigint, fat_2015 bigint, fat_2014 bigint, fat_2013 bigint, fat_2012 bigint, fat_2011 bigint, fat_2010 bigint, fat_2009 bigint, fat_2008 bigint)
)
In the above code, fat_2018, fat_2017, fat_2016, etc. are hard-coded. I need the years after fat_ to be dynamic.
This question has been asked many times, & there are decent (even dynamic) solutions. While CROSSTAB() is available in recent versions of Postgres, not everyone has sufficient privileges to install the prerequisite extensions.
One such solution involves a temp type (temp table) created by an anonymous function & JSON expansion of the resultant type.
See also: DB FIDDLE (UK): https://dbfiddle.uk/Sn7iO4zL
How to pivot or crosstab in postgresql without writing a function?
It is not possible. PostgreSQL has a strict type system. The result of a query is a table (relation), and the format of this table (columns, column names, column types) must be defined before query execution (at planning time). So you cannot write a single query for Postgres that returns a dynamic number of columns.
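Because the column list has to be fixed before the query is planned, a common workaround is a two-step approach: first generate the query text from the array of years, then run that text as a separate statement. The sketch below shows only the first step, in Python on the client side. The helper names are hypothetical, and the id and state_code columns and the fat_ prefix are taken from the question; the resulting strings would still need to be spliced into the crosstab() call and executed as dynamic SQL.

```python
def crosstab_column_defs(years, prefix="fat_", sql_type="bigint"):
    """Build the column definition list for the AS pivot_table(...) clause."""
    cols = ["id integer", "state_code varchar"]
    for y in years:
        year = int(y)  # int() rejects anything that is not a plain number
        cols.append(f"{prefix}{year} {sql_type}")
    return ", ".join(cols)

def crosstab_category_values(years):
    """Build the VALUES list passed as the second argument of crosstab()."""
    return "VALUES " + ", ".join(f"('{int(y)}')" for y in years)

if __name__ == "__main__":
    years = range(2018, 2007, -1)  # 2018 down to 2008, as in the question
    print(crosstab_column_defs(years))
    print(crosstab_category_values(years))
```

Converting each year with int() keeps arbitrary text out of the generated SQL, which matters whenever query text is assembled from data.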

Postgres using an index for one table but not another

I have three tables in my app, call them tableA, tableB, and tableC. tableA has fields for tableB_id and tableC_id, with indexes on both. tableB has a field foo with an index, and tableC has a field bar with an index.
When I do the following query:
select *
from tableA
left outer join tableB on tableB.id = tableA.tableB_id
where lower(tableB.foo) = lower(my_input)
it is really slow (~1 second).
When I do the following query:
select *
from tableA
left outer join tableC on tableC.id = tableA.tableC_id
where lower(tableC.bar) = lower(my_input)
it is really fast (~20 ms).
From what I can tell, the tables are about the same size.
Any ideas as to the huge performance difference between the two queries?
UPDATES
Table sizes:
tableA: 2061392 rows
tableB: 175339 rows
tableC: 1888912 rows
postgresql-performance tag info
Postgres version - 9.3.5
Full text of the queries are above.
Explain plans - tableB tableC
Relevant info from tables:
tableA
tableB_id, integer, no modifiers, storage plain
"index_tableA_on_tableB_id" btree (tableB_id)
tableC_id, integer, no modifiers, storage plain
"index_tableA_on_tableC_id" btree (tableC_id)
tableB
id, integer, not null default nextval('tableB_id_seq'::regclass), storage plain
"tableB_pkey" PRIMARY_KEY, btree (id)
foo, character varying(255), no modifiers, storage extended
"index_tableB_on_lower_foo_tableD" UNIQUE, btree (lower(foo::text), tableD_id)
tableD is a separate table that is otherwise irrelevant
tableC
id, integer, not null default nextval('tableC_id_seq'::regclass), storage plain
"tableC_pkey" PRIMARY_KEY, btree (id)
bar, character varying(255), no modifiers, storage extended
"index_tableC_on_tableB_id_and_bar" UNIQUE, btree (tableB_id, bar)
"index_tableC_on_lower_bar" btree (lower(bar::text))
Hardware:
OS X 10.10.2
CPU: 1.4 GHz Intel Core i5
Memory: 8 GB 1600 MHz DDR3
Graphics: Intel HD Graphics 5000 1536 MB
Solution
Looks like running vacuum and then analyze on all three tables fixed the issue. After running the commands, the slow query started using "index_tableB_on_lower_foo_tableD".
The other thing is that you query your indexed columns through lower(), so a plain index on the column cannot be used for that predicate.
If you will always query the column as lower(), then the index should be built on the expression lower(column_name), as in:
create index idx_1 on tableb(lower(foo));
Also, have you looked at the execution plan? This will answer all your questions if you can see how it is querying the tables.
Honestly, there are many factors to this. The best solution is to study up on indexes, specifically in Postgres, so you can see how they work. It is a bit of a holistic subject; you can't really solve all your problems with a minimal understanding of how they work.
For instance, Postgres has an initial "let's look at these tables and see how we should query them" step before the query runs. It looks at all the tables involved, how big each of them is, what indexes exist, etc., and then figures out how the query should run. THEN it executes it. Oftentimes, this is where things go wrong: the engine incorrectly determines how to execute the query.
Many of these calculations are based on summarized table statistics. You can clean up a table and refresh its statistics by running:
vacuum [table_name];
(this helps to prevent bloating from dead rows)
and then:
analyze [table_name];
I haven't always seen this work, but oftentimes it helps.
Anyway, the best bet is to:
a) Study up on Postgres indexes (a SIMPLE write up, not something ridiculously complex)
b) Study up the execution plan of the query
c) Using your understanding of Postgres indexes and how the query plan is executing, you cannot help but solve the exact problem.
For starters, your LEFT JOIN is counteracted by the WHERE predicate on the right table (tableB) and is forced to act like an [INNER] JOIN. Replace with:
SELECT *
FROM tableA a
JOIN tableB b ON b.id = a.tableB_id
WHERE lower(b.foo) = lower(my_input);
Or, if you actually want the LEFT JOIN to include all rows from tableA:
SELECT *
FROM tableA a
LEFT JOIN tableB b ON b.id = a.tableB_id
AND lower(b.foo) = lower(my_input);
I think you want the first one.
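The difference between the two variants above can be checked on a toy data set. This is a minimal sketch using Python's sqlite3 rather than Postgres; the three sample rows and the compare_joins helper are assumptions for illustration. It runs the original LEFT JOIN with the predicate in WHERE, the plain JOIN, and the LEFT JOIN with the predicate moved into the ON clause.

```python
import sqlite3

def compare_joins(my_input):
    """Return rows for: LEFT JOIN + WHERE, plain JOIN, LEFT JOIN + ON predicate."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tableB (id INTEGER PRIMARY KEY, foo TEXT);
        CREATE TABLE tableA (id INTEGER PRIMARY KEY, tableB_id INTEGER);
        INSERT INTO tableB VALUES (1, 'Alpha'), (2, 'Beta');
        INSERT INTO tableA VALUES (10, 1), (11, 2), (12, NULL);
    """)
    select = "SELECT a.id, b.id, b.foo FROM tableA a "
    # Predicate in WHERE: NULL-extended rows fail the test and are dropped.
    left_where = conn.execute(
        select + "LEFT JOIN tableB b ON b.id = a.tableB_id "
        "WHERE lower(b.foo) = lower(?) ORDER BY a.id", (my_input,)).fetchall()
    # Plain inner join with the same predicate.
    inner = conn.execute(
        select + "JOIN tableB b ON b.id = a.tableB_id "
        "WHERE lower(b.foo) = lower(?) ORDER BY a.id", (my_input,)).fetchall()
    # Predicate in ON: every tableA row survives, unmatched ones get NULLs.
    left_on = conn.execute(
        select + "LEFT JOIN tableB b ON b.id = a.tableB_id "
        "AND lower(b.foo) = lower(?) ORDER BY a.id", (my_input,)).fetchall()
    return left_where, inner, left_on

if __name__ == "__main__":
    for rows in compare_joins("ALPHA"):
        print(rows)
```

With this data the first two queries return the same single row, while the third returns one row per tableA row, with NULLs where tableB did not match.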
An index on (lower(foo::text)) like you posted is syntactically invalid. You better post the verbatim output from \d tbl in psql like I commented repeatedly. A shorthand syntax for a cast (foo::text) in an index definition needs more parentheses, or use the standard syntax: cast(foo AS text):
Create index on first 3 characters (area code) of phone field?
But that's also unnecessary. You can just use the data type (character varying(255)) of foo. Of course, the data type character varying(255) rarely makes sense in Postgres to begin with. The odd limitation to 255 characters is derived from limitations in other RDBMS which do not apply in Postgres. Details:
Refactor foreign key to fields
Be that as it may. The perfect index for this kind of query would be a multicolumn index on B - if (and only if) you get index-only scans out of this:
CREATE INDEX "tableB_lower_foo_id" ON tableB (lower(foo), id);
You can then drop the mostly superseded index "index_tableB_on_lower_foo". Same for tableC.
The rest is covered by the (more important!) indices in table A on tableB_id and tableC_id.
If there are multiple rows in tableA per tableB_id / tableC_id, then either one of these competing commands can swing the performance to favor the respective query by physically clustering related rows together:
CLUSTER tableA USING "index_tableA_on_tableB_id";
CLUSTER tableA USING "index_tableA_on_tableC_id";
You can't have both. It's either B or C. CLUSTER also does everything a VACUUM FULL would do. But be sure to read the details first:
Optimize Postgres timestamp query range
And don't use mixed case identifiers, sometimes quoted, sometimes not. This is very confusing and is bound to lead to errors. Use legal, lower-case identifiers exclusively - then it doesn't matter if you double-quote them or not.

Return Next Available PK in a Collection of Tables

I'm building a report from a series of known tables and their primary keys, e.g.:
BOOKS.bookid
AUTHORS.authorid
GENRE.genreid
What I would like to do is build a t-sql report that simply shows the table, the primary key, and the next available PK, e.g.:
tabl_name  prim_key  avail_key
BOOKS      BOOKID    281
AUTHORS    AUTHORID  29
GENRE      GENREID   18
I already have the table name and its PK by using the information_schema, but somehow joining that with the actual table to derive its next available PK is proving elusive. I'm guessing there's some sort of dynamic sql with cursors solution, but that's maxing my sql skills out.
Try this:
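SELECT Col.TABLE_NAME, Col.COLUMN_NAME,
       -- IDENT_CURRENT returns the last identity value generated, so add the increment to get the next one
       IDENT_CURRENT(Col.TABLE_NAME) + IDENT_INCR(Col.TABLE_NAME) AS avail_key
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS Tab
JOIN INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE Col
  ON Col.CONSTRAINT_NAME = Tab.CONSTRAINT_NAME
 AND Col.TABLE_NAME = Tab.TABLE_NAME
WHERE Tab.CONSTRAINT_TYPE = 'PRIMARY KEY'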
By the way, most of the above came from this answer:
https://stackoverflow.com/a/96049/37613