H2 Optimize select statement / shutdown defrag - select

Test Case:
drop table master;
create table master(id int primary key, fk1 int, fk2 int, fk3 int, dataS varchar(255), data1 int, data2 int, data3 int, data4 int,data5 int,data6 int,data7 int,data8 int,data9 int,b1 boolean,b2 boolean,b3 boolean,b4 boolean,b5 boolean,b6 boolean,b7 boolean,b8 boolean,b9 boolean,b10 boolean,b11 boolean,b12 boolean,b13 boolean,b14 boolean,b15 boolean,b16 boolean,b17 boolean,b18 boolean,b19 boolean,b20 boolean,b21 boolean,b22 boolean,b23 boolean,b24 boolean,b25 boolean,b26 boolean,b27 boolean,b28 boolean,b29 boolean,b30 boolean,b31 boolean,b32 boolean,b33 boolean,b34 boolean,b35 boolean,b36 boolean,b37 boolean,b38 boolean,b39 boolean,b40 boolean,b41 boolean,b42 boolean,b43 boolean,b44 boolean,b45 boolean,b46 boolean,b47 boolean,b48 boolean,b49 boolean,b50 boolean);
create index idx_comp on master(fk1,fk2,fk3);
@loop 5000000 insert into master values(?, mod(?,100), mod(?,5), ?,'Hello World Hello World Hello World',?, ?, ?,?, ?, ?, ?, ?, ?,true,true,true,true,true,true,false,false,false,true,true,true,true,true,true,true,false,false,false,true,true,true,true,true,true,true,false,false,false,true,true,true,true,true,true,true,false,false,false,true,true,true,true,true,true,true,false,false,false,true);
1. The following select statement takes up to 30 seconds. Is there a way to optimize the response time?
SELECT count(*), SUM(CONVERT(b1,INT)) ,SUM(CONVERT(b2,INT)),SUM(CONVERT(b3,INT)),SUM(CONVERT(b4,INT)),SUM(CONVERT(b5,INT)),SUM(CONVERT(b6,INT)),SUM(CONVERT(b7,INT)),SUM(CONVERT(b8,INT)),SUM(CONVERT(b9,INT)),SUM(CONVERT(b10,INT)),SUM(CONVERT(b11,INT)),SUM(CONVERT(b12,INT)),SUM(CONVERT(b13,INT)),SUM(CONVERT(b14,INT)),SUM(CONVERT(b15,INT)),SUM(CONVERT(b16,INT))
FROM master
WHERE fk1=53 AND fk2=3
2. I tried SHUTDOWN DEFRAG, but that statement took about 40 minutes for my test case. After the shutdown defrag, the select takes up to 15 seconds. If I execute the statement again, it takes under 1 second. Even if I stop and start the server, the statement takes about 1 second.
Does H2 have a persistent cache?
Infrastructure: WebBrowser <-> H2 Console Server <-> H2 DB: h2 1.3.158

According to the profiler output, the main problem (93%) is reading from the disk. I ran this in the H2 Console:
@prof_start;
SELECT ... FROM master WHERE fk1=53 AND fk2=3;
@prof_stop;
and got:
Profiler: top 3 stack trace(s) of 48039 ms [build-158]:
4084/4376 (93%):
at java.io.RandomAccessFile.readBytes(Native Method)
at java.io.RandomAccessFile.read(RandomAccessFile.java:338)
at java.io.RandomAccessFile.readFully(RandomAccessFile.java:397)
at org.h2.store.FileStore.readFully(FileStore.java:285)
at org.h2.store.PageStore.readPage(PageStore.java:1253)
at org.h2.store.PageStore.getPage(PageStore.java:707)
at org.h2.index.PageDataIndex.getPage(PageDataIndex.java:225)
at org.h2.index.PageDataNode.getRowWithKey(PageDataNode.java:269)
at org.h2.index.PageDataNode.getRowWithKey(PageDataNode.java:270)
According to EXPLAIN ANALYZE SELECT, the query reads over 55,000 pages from disk (2 KB per page, about 110 MB). I'm not sure how other databases perform for such a query, but I guess that, if possible, the query should be changed so that it reads less data.

Is it possible to have a temporary table/view that already has the datatype conversions done? If it's feasible to have that update itself from the main table occasionally (once a night or so), then much of the processing that goes into the conversion is already done by the time you query.
If that's not feasible, you may want to do multiple sub-selects, one for each "b" column, where you only pull where b# = 1. Then do a COUNT instead of a SUM, which should be faster as well. For instance:
SELECT count1 + count2 AS total_count, count1, count2
FROM (SELECT
(SELECT COUNT(*) FROM master WHERE fk1=53 AND fk2=3 AND b1=1) AS count1,
(SELECT COUNT(*) FROM master WHERE fk1=53 AND fk2=3 AND b2=1) AS count2
) AS counts
I'm not sure if that exact syntax works in your program, but hopefully as a generic SQL idea it gets you on the right track.
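As a rough check of the COUNT-instead-of-SUM idea above, here is a minimal Python sketch that uses the standard-library sqlite3 module as a stand-in for H2. Everything in it is an assumption for illustration: the cut-down `master` table, the 1,000 generated rows, and the use of 0/1 integers for the boolean columns. It only shows that counting rows where a flag is set gives the same number as summing the converted flag; it says nothing about how H2 performs.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for H2 (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE master (id INTEGER PRIMARY KEY, fk1 INT, fk2 INT, b1 INT, b2 INT)"
)

# Hypothetical data: fk1 and fk2 follow the mod() pattern of the test case,
# b1 and b2 are arbitrary 0/1 flags.
rows = [(i, i % 100, i % 5, i % 2, 1 if i % 3 == 0 else 0) for i in range(1, 1001)]
conn.executemany("INSERT INTO master VALUES (?, ?, ?, ?, ?)", rows)

# Original style: one pass that sums the flags.
sum_b1, sum_b2 = conn.execute(
    "SELECT SUM(b1), SUM(b2) FROM master WHERE fk1 = 53 AND fk2 = 3"
).fetchone()

# Suggested style: one filtered COUNT per flag.
count_b1 = conn.execute(
    "SELECT COUNT(*) FROM master WHERE fk1 = 53 AND fk2 = 3 AND b1 = 1"
).fetchone()[0]
count_b2 = conn.execute(
    "SELECT COUNT(*) FROM master WHERE fk1 = 53 AND fk2 = 3 AND b2 = 1"
).fetchone()[0]

print(sum_b1, count_b1, sum_b2, count_b2)
```

Note that each sub-select is a separate lookup, so whether this is faster than a single pass depends on how many rows match and on which indexes exist; measure both forms before settling on one.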


DB2 measure execution time of triggers

How can I best measure the execution time of triggers in DB2 for insert or update?
I need this to investigate some performance issues; some of the triggers are behaving very slowly.
CREATE OR REPLACE TRIGGER CHECK
NO CASCADE BEFORE INSERT ON DAG
REFERENCING NEW AS OBJ
FOR EACH ROW MODE DB2SQL
WHEN (
xyz
)
SIGNAL SQLSTATE xxx
For compiled triggers (that is, with BEGIN ... END body):
SELECT
T.TRIGNAME
, M.SECTION_NUMBER
, M.STMT_EXEC_TIME
, M.NUM_EXEC_WITH_METRICS
-- Other M metrics
, M.*
, M.STMT_TEXT
FROM SYSCAT.TRIGGERS T
JOIN SYSCAT.TRIGDEP D
ON (D.TRIGSCHEMA, D.TRIGNAME) = (T.TRIGSCHEMA, T.TRIGNAME)
JOIN TABLE (MON_GET_PKG_CACHE_STMT (NULL, NULL, NULL, -2)) M
ON (M.PACKAGE_SCHEMA, M.PACKAGE_NAME) = (D.BSCHEMA, D.BNAME)
WHERE D.BTYPE = 'K'
-- Or use whatever filter on your triggers
AND (T.TABSCHEMA, T.TABNAME) = ('MYSCHEMA', 'MYTABLE')
ORDER BY 1, 2
For inlined triggers (that is, with BEGIN ATOMIC ... END body):
There is no way to get separate statistics for them. They are compiled and optimized together with the statement that fired them.

Get postgres query log statement and duration as one record

I have log_min_duration_statement=0 in the config.
When I check the log file, the SQL statement and its duration are saved in different rows.
(I am not sure what I have configured wrong, but the statement and duration are not saved together the way this answer describes.)
As I understand it, the session_line_num of a duration record always equals the session_line_num of the relevant statement plus 1, within the same session of course.
Is this correct? Is the query below reliable for getting each statement with its duration in one row?
(The CSV log is imported into the postgres_log table):
WITH
sql_cte AS(
SELECT session_id, session_line_num, message AS sql_statement
FROM postgres_log
WHERE
message LIKE 'statement%'
)
,durat_cte AS (
SELECT session_id, session_line_num, message AS duration
FROM postgres_log
WHERE
message LIKE 'duration%'
)
SELECT
t1.session_id,
t1.session_line_num,
t1.sql_statement,
t2.duration
FROM sql_cte t1
LEFT JOIN durat_cte t2
ON t1.session_id = t2.session_id AND t1.session_line_num + 1 = t2.session_line_num;
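For illustration only, the same pairing rule can be written as a small in-memory Python sketch. The `LogRow` type, the `pair_statements` function, and the sample messages are all hypothetical; the sketch simply mirrors the LEFT JOIN above (statement at line n, duration at line n + 1 of the same session) and does not establish whether that rule always holds in real logs.

```python
from typing import List, NamedTuple, Optional, Tuple


class LogRow(NamedTuple):
    """Hypothetical, cut-down view of one postgres_log record."""
    session_id: str
    session_line_num: int
    message: str


def pair_statements(rows: List[LogRow]) -> List[Tuple[str, int, str, Optional[str]]]:
    """Attach to each statement the duration found at line_num + 1 in the same session."""
    # Index duration records by (session, line number) for direct lookup.
    durations = {
        (r.session_id, r.session_line_num): r.message
        for r in rows
        if r.message.startswith("duration")
    }
    paired = []
    for r in rows:
        if r.message.startswith("statement"):
            key = (r.session_id, r.session_line_num + 1)
            # Like a LEFT JOIN: None when no matching duration record exists.
            paired.append((r.session_id, r.session_line_num, r.message, durations.get(key)))
    return paired


# Made-up sample records, not real log output.
sample = [
    LogRow("s1", 1, "statement: SELECT 1"),
    LogRow("s1", 2, "duration: 0.120 ms"),
    LogRow("s2", 1, "statement: SELECT 2"),
    LogRow("s1", 3, "statement: SELECT 3"),
]
result = pair_statements(sample)
for row in result:
    print(row)
```

Statements whose duration comes back as None are the cases worth inspecting by hand, since they are exactly where the n + 1 assumption did not find a partner.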

Is it possible to have hibernate generate update from values statements for postgresql?

Given a postgresql table
Table "public.test"
Column | Type | Modifiers
----------+-----------------------------+-----------
id | integer | not null
info | text |
And the following values :
# select * from test;
id | info
----+--------------
3 | value3
4 | value4
5 | value5
As you may know, with postgresql you can use this kind of statement to update multiple rows with different values:
update test set info=tmp.info from (values (3,'newvalue3'),(4,'newvalue4'),(5,'newvalue5')) as tmp (id,info) where test.id=tmp.id;
And it results in the table being updated in a single query to:
# select * from test;
id | info
----+--------------
3 | newvalue3
4 | newvalue4
5 | newvalue5
I have been looking everywhere for a way to make hibernate generate this kind of statement for update queries. I know how to make it work for insert queries (with the reWriteBatchedInserts jdbc option and the hibernate batch config options).
But is it possible for update queries, or do I have to write the native query myself?
No matter what I do, hibernate always sends separate update queries to the database (I base this on the postgresql server statement logs).
2020-06-18 08:19:48.895 UTC [1642] LOG: execute S_6: BEGIN
2020-06-18 08:19:48.895 UTC [1642] LOG: execute S_8: update test set info = $1 where id = $2
2020-06-18 08:19:48.895 UTC [1642] DETAIL: parameters: $1 = 'newvalue3', $2 = '3'
2020-06-18 08:19:48.896 UTC [1642] LOG: execute S_8: update test set info = $1 where id = $2
2020-06-18 08:19:48.896 UTC [1642] DETAIL: parameters: $1 = 'newvalue4', $2 = '4'
2020-06-18 08:19:48.896 UTC [1642] LOG: execute S_8: update test set info = $1 where id = $2
2020-06-18 08:19:48.896 UTC [1642] DETAIL: parameters: $1 = 'newvalue4', $2 = '5'
2020-06-18 08:19:48.896 UTC [1642] LOG: execute S_1: COMMIT
I always find it many times faster to issue a single massive update query than many separate updates each targeting a single row. With many separate update queries, even though they are sent in a batch by the jdbc driver, they still need to be processed sequentially by the server, so it is not as efficient as a single update query targeting multiple rows. So if anyone has a solution that wouldn't involve writing native queries for my entities, I would be very glad!
Update
To further refine my question, I want to add a clarification. I'm looking for a solution that wouldn't abandon the Hibernate dirty checking feature for entity updates. I'm trying to avoid writing batch update queries by hand for the general case of updating a few basic fields with different values on a list of entities. I'm currently looking into the hibernate SPI to see if it's doable. org.hibernate.engine.jdbc.batch.spi.Batch seems to be the proper place, but I'm not quite sure yet because I've never done anything with the hibernate SPI. Any insights would be welcome!
You can use Blaze-Persistence for this. It is a query builder on top of JPA that supports many advanced DBMS features on top of the JPA model.
It does not yet support the FROM clause in DML, but that is about to land in the next release: https://github.com/Blazebit/blaze-persistence/issues/693
Meanwhile you could use CTEs for this. First you need to define a CTE entity (a concept of Blaze-Persistence):
@CTE
@Entity
public class InfoCte {
@Id Integer id;
String info;
}
I'm assuming your entity model looks roughly like this
@Entity
public class Test {
@Id Integer id;
String info;
}
Then you can use Blaze-Persistence like this:
criteriaBuilderFactory.update(entityManager, Test.class, "test")
.with(InfoCte.class, false)
.fromValues(Test.class, "newInfos", newInfosCollection)
.bind("id").select("newInfos.id")
.bind("info").select("newInfos.info")
.end()
.set("info")
.from(InfoCte.class, "cte")
.select("cte.info")
.where("cte.id").eqExpression("test.id")
.end()
.whereExists()
.from(InfoCte.class, "cte")
.where("cte.id").eqExpression("test.id")
.end()
.executeUpdate();
This will create an SQL query similar to the following
WITH InfoCte(id, info) AS(
SELECT t.id, t.info
FROM (VALUES(1, 'newValue', ...)) t(id, info)
)
UPDATE test
SET info = (SELECT cte.info FROM InfoCte cte WHERE cte.id = test.id)
WHERE EXISTS (SELECT 1 FROM InfoCte cte WHERE cte.id = test.id)

Inserting many rows in treelike structure in SQL SERVER 2005

I have a few tables in SQL that are pretty much like this
A        B        C
ID_A     ID_B     ID_C
Name     Name     Name
         ID_A     ID_B
As you can see, A is linked to B and B to C. Those are basically tables that contain data models. Now, I need to be able to create data based on those tables. For example, if I have the following data
A            B                C
1 Name1      1 SubName1 1     1 SubSubName1 1
2 Name2      2 SubName2 1     2 SubSubName2 1
             3 SubName3 2     3 SubSubName3 2
                              4 SubSubName4 3
                              5 SubSubName5 3
I would like to copy the 'content' of those tables into other tables. Of course, the auto-numbered keys generated when inserting into the new tables are different from the original ones, and I would like to keep track of them so that I can copy the entire thing. The structure of the recipient tables contains more information than these, but it's mainly dates and other values that are easy for me to get.
I need to do this entirely in TRANSACT-SQL (with built-in functions if needed). Is this possible, and can anyone give me a short example? I managed to do it for one level, but I get confused for the rest.
Thanks
EDIT : The info above is just an example, because my actual diagram looks more like this
Model tables :
Processes -- (1-N) Steps -- (1-N) Task -- (0-N) TaskCheckList
-- (0-N) StepsCheckLists
Where as the table I need to fill looks like this
Client -- (0-N) Sequence -- (1-N) ClientProcesses -- (1-N) ClientSteps -- (1-N)ClientTasks -- (0-N) ClientTaskCheckList
-- (0-N)ClientStepCheckLists
The Client already exists and, when I need to run the script, I create one sequence, which will contain all processes, which will in turn contain their steps, tasks, etc...
Ok,
So I did a lot of trial and error, and here is what I got. It seems to work fine, although it looks quite big for something that seemed easy at first.
The whole thing is a mix of French and English because our client is French and so are we anyway. It inserts all the data in all the tables that I needed. The only thing left to do is the first lines, where I need to select the data to insert according to some parameters, but this is the easy part.
DECLARE @IdProcessusRenouvellement bigint
DECLARE @NomProcessus nvarchar(255)
SELECT @IdProcessusRenouvellement = ID FROM T_Ref_Processus WHERE Nom LIKE 'EXP%'
SELECT @NomProcessus = Nom FROM T_Ref_Processus WHERE Nom LIKE 'EXP%'
DECLARE @InsertedSequence table(ID bigint)
DECLARE @Contrats table(ID bigint,IdClient bigint,NumeroContrat nvarchar(255))
INSERT INTO @Contrats SELECT ID,IdClient,NumeroContrat FROM T_ClientContrat
DECLARE @InsertedIdsSeq as Table(ID bigint)
-- Work sequences
INSERT INTO T_ClientContratSequenceTravail(IdClientContrat,Nom,DateDebut)
OUTPUT Inserted.ID INTO @InsertedIdsSeq
SELECT ID, @NomProcessus + ' - ' + CONVERT(VARCHAR(10), GETDATE(), 120) + ' : ' + NumeroContrat ,GETDATE()
FROM @Contrats
-- Processes
DECLARE @InsertedIdsPro as Table(ID bigint,IdProcessus bigint)
INSERT INTO T_ClientContratProcessus
(IdClientContratSequenceTravail,IdProcessus,Nom,DateDebut,DelaiRappel,DateRappel,LienAvecProcessusRenouvellement,IdStatutProcessus,IdResponsable,Sequence)
OUTPUT Inserted.ID,Inserted.IdProcessus INTO @InsertedIdsPro
SELECT I.ID,P.ID,P.Nom,GETDATE(),P.DelaiRappel,GETDATE(),P.LienAvecProcessusRenouvellement,0,0,0
FROM @InsertedIdsSeq I, T_Ref_Processus P
WHERE P.ID = @IdProcessusRenouvellement
-- Steps
DECLARE @InsertedIdsEt as table(ID bigint,IdProcessusEtape bigint)
INSERT INTO T_ClientContratProcessusEtape
(IdClientContratProcessus,IdProcessusEtape,Nom,DateDebut,DelaiRappel,DateRappel,NomListeVerification,Sequence,IdStatutEtape,IdResponsable,IdTypeResponsable,ListeVerificationTermine)
OUTPUT Inserted.ID,Inserted.IdProcessusEtape INTO @InsertedIdsEt
SELECT I.ID,E.ID,
E.Nom,GETDATE(),E.DelaiRappel,GETDATE(),COALESCE(L.Nom,''),E.Sequence,0,0,E.IdTypeResponsable,0
FROM @InsertedIdsPro I INNER JOIN T_Ref_ProcessusEtape E ON I.IdProcessus = E.IdProcessus
LEFT JOIN T_Ref_ListeVerification L ON E.IdListeVerification = L.ID
-- Steps: checklist items
INSERT INTO T_ClientContratProcessusEtapeListeVerificationItem
(IdClientContratProcessusEtape,Nom,Requis,Verifie)
SELECT I.ID,IT.Nom,IT.Requis,0
FROM @InsertedIdsEt I
INNER JOIN T_Ref_ProcessusEtape E ON I.IdProcessusEtape = E.ID
INNER JOIN T_Ref_ListeVerificationItem IT ON E.IdListeVerification = IT.IdListeVerification
-- Tasks
DECLARE @InsertedIdsTa as table(ID bigint, IdProcessusEtapeTache bigint)
INSERT INTO T_ClientContratProcessusEtapeTache
(IdClientContratProcessusEtape,IdProcessusEtapeTache,Nom,DateDebut,DelaiRappel,DateRappel,NomListeVerification,Sequence,IdStatutTache,IdResponsable,IdTypeResponsable,ListeVerificationTermine)
OUTPUT Inserted.ID,Inserted.IdProcessusEtapeTache INTO @InsertedIdsTa
SELECT I.ID,T.ID,
T.Nom,GETDATE(),T.DelaiRappel,GETDATE(),COALESCE(L.Nom,''),T.Sequence,0,0,T.IdTypeResponsable,0
FROM @InsertedIdsEt I
INNER JOIN T_Ref_ProcessusEtapeTache T ON I.IdProcessusEtape = T.IdProcessusEtape
LEFT JOIN T_Ref_ListeVerification L ON T.IdListeVerification = L.ID
-- Tasks: checklist items
INSERT INTO T_ClientContratProcessusEtapeTacheListeVerificationItem
(IdClientContratProcessusEtapeTache,Nom,Requis,Verifie)
SELECT I.ID,IT.Nom,IT.Requis,0
FROM @InsertedIdsTa I
INNER JOIN T_Ref_ProcessusEtapeTache T ON I.IdProcessusEtapeTache = T.ID
INNER JOIN T_Ref_ListeVerificationItem IT ON T.IdListeVerification = IT.IdListeVerification
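The core idea of the script above is to remember, at every level, which new key was generated for which source row, and to use that mapping when inserting the children. As a minimal sketch of the same bookkeeping on the simplified A/B/C example from the question, here is a Python version using the standard-library sqlite3 module. The CopyA/CopyB/CopyC tables and the `copy_level` helper are hypothetical names, and the copy is done row by row, whereas the T-SQL script does it set-based with the OUTPUT clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (ID_A INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE B (ID_B INTEGER PRIMARY KEY, Name TEXT, ID_A INT);
CREATE TABLE C (ID_C INTEGER PRIMARY KEY, Name TEXT, ID_B INT);
CREATE TABLE CopyA (ID INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT);
CREATE TABLE CopyB (ID INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT, IdCopyA INT);
CREATE TABLE CopyC (ID INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT, IdCopyB INT);
INSERT INTO A VALUES (1,'Name1'),(2,'Name2');
INSERT INTO B VALUES (1,'SubName1',1),(2,'SubName2',1),(3,'SubName3',2);
INSERT INTO C VALUES (1,'SubSubName1',1),(2,'SubSubName2',1),(3,'SubSubName3',2),
                     (4,'SubSubName4',3),(5,'SubSubName5',3);
-- A pre-existing row, so that the generated keys differ from the source keys.
INSERT INTO CopyA (Name) VALUES ('already here');
""")


def copy_level(conn, select_sql, insert_sql, parent_map=None):
    """Copy one level of the tree and return {source_id: new_id}."""
    id_map = {}
    # fetchall() first, so we are not reading and writing on the same cursor.
    for old_id, name, old_parent in conn.execute(select_sql).fetchall():
        if parent_map is None:
            params = (name,)
        else:
            # Translate the source parent key into the key generated for its copy.
            params = (name, parent_map[old_parent])
        id_map[old_id] = conn.execute(insert_sql, params).lastrowid
    return id_map


map_a = copy_level(conn, "SELECT ID_A, Name, NULL FROM A ORDER BY ID_A",
                   "INSERT INTO CopyA (Name) VALUES (?)")
map_b = copy_level(conn, "SELECT ID_B, Name, ID_A FROM B ORDER BY ID_B",
                   "INSERT INTO CopyB (Name, IdCopyA) VALUES (?, ?)", map_a)
map_c = copy_level(conn, "SELECT ID_C, Name, ID_B FROM C ORDER BY ID_C",
                   "INSERT INTO CopyC (Name, IdCopyB) VALUES (?, ?)", map_b)

copied_b = conn.execute("SELECT Name, IdCopyA FROM CopyB ORDER BY ID").fetchall()
print(map_a, copied_b)
```

In the T-SQL script, the table variables filled by OUTPUT (for example the one holding Inserted.ID together with Inserted.IdProcessus) play the role of these dictionaries: they are only possible because each target table stores the source key in one of its own columns.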

Entity Framework object missing related data

I have an Entity Framework model that has two tables, client and postcode. Postcode can have many clients, client can have 1 postcode. They are joined on the postcode.
The two tables are mapped to views.
I have some clients that do not have a Postcode in the model, although in the DB they do!
I ran some tests and found postcodes that return clients when I access Postcode.Clients, but not all of the clients. In the DB, one postcode had 14 related clients, but EF was only returning the first 6. Basically, certain postcodes are not returning all the data.
Lazy loading is turned on and I have tried turning it off without any luck.
Any ideas?
I am using VS 2010, C#, .NET 4.0, EF4 and SQL Server 2008
Thanks
UPDATE:
I have been running through this in LinqPad. I try the following code
Client c = Clients.Where(a => a.ClientId == 9063202).SingleOrDefault();
c.PostcodeView.Dump();
This returns null.
I then take the generated SQL and run this in a separate SQL query and it works correctly (after I add the @ to the start of the variable name)
SELECT TOP (2)
[Extent1].[ClientId] AS [ClientId],
[Extent1].[Surname] AS [Surname],
[Extent1].[Forename] AS [Forename],
[Extent1].[FlatNo] AS [FlatNo],
[Extent1].[StNo] AS [StNo],
[Extent1].[Street] AS [Street],
[Extent1].[Town] AS [Town],
[Extent1].[Postcode] AS [Postcode]
FROM (SELECT
[ClientView].[ClientId] AS [ClientId],
[ClientView].[Surname] AS [Surname],
[ClientView].[Forename] AS [Forename],
[ClientView].[FlatNo] AS [FlatNo],
[ClientView].[StNo] AS [StNo],
[ClientView].[Street] AS [Street],
[ClientView].[Town] AS [Town],
[ClientView].[Postcode] AS [Postcode]
FROM [dbo].[ClientView] AS [ClientView]) AS [Extent1]
WHERE 9063202 = [Extent1].[ClientId]
GO
-- Region Parameters
DECLARE @EntityKeyValue1 VarChar(8) = 'G15 6NB'
-- EndRegion
SELECT
[Extent1].[Postcode] AS [Postcode],
[Extent1].[ltAstId] AS [ltAstId],
[Extent1].[ltLhoId] AS [ltLhoId],
[Extent1].[ltChcpId] AS [ltChcpId],
[Extent1].[ltCppId] AS [ltCppId],
[Extent1].[ltWardId] AS [ltWardId],
[Extent1].[ltAst] AS [ltAst],
[Extent1].[ltCpp] AS [ltCpp],
[Extent1].[ltWard] AS [ltWard],
[Extent1].[WardNo] AS [WardNo],
[Extent1].[Councillor] AS [Councillor],
[Extent1].[ltAdminCentre] AS [ltAdminCentre],
[Extent1].[ltChcp] AS [ltChcp],
[Extent1].[Forename] AS [Forename],
[Extent1].[Surname] AS [Surname],
[Extent1].[AreaNo] AS [AreaNo],
[Extent1].[LtAomId] AS [LtAomId],
[Extent1].[OOHltCoordinatorId] AS [OOHltCoordinatorId],
[Extent1].[OvernightltCoordinatorId] AS [OvernightltCoordinatorId],
[Extent1].[DayltCoordinatorId] AS [DayltCoordinatorId]
FROM (SELECT
[PostcodeView].[Postcode] AS [Postcode],
[PostcodeView].[ltAstId] AS [ltAstId],
[PostcodeView].[ltLhoId] AS [ltLhoId],
[PostcodeView].[ltChcpId] AS [ltChcpId],
[PostcodeView].[ltCppId] AS [ltCppId],
[PostcodeView].[ltWardId] AS [ltWardId],
[PostcodeView].[ltAst] AS [ltAst],
[PostcodeView].[ltCpp] AS [ltCpp],
[PostcodeView].[ltWard] AS [ltWard],
[PostcodeView].[WardNo] AS [WardNo],
[PostcodeView].[Councillor] AS [Councillor],
[PostcodeView].[ltAdminCentre] AS [ltAdminCentre],
[PostcodeView].[ltChcp] AS [ltChcp],
[PostcodeView].[Forename] AS [Forename],
[PostcodeView].[Surname] AS [Surname],
[PostcodeView].[AreaNo] AS [AreaNo],
[PostcodeView].[LtAomId] AS [LtAomId],
[PostcodeView].[DayltCoordinatorId] AS [DayltCoordinatorId],
[PostcodeView].[OOHltCoordinatorId] AS [OOHltCoordinatorId],
[PostcodeView].[OvernightltCoordinatorId] AS [OvernightltCoordinatorId]
FROM [dbo].[PostcodeView] AS [PostcodeView]) AS [Extent1]
WHERE [Extent1].[Postcode] = @EntityKeyValue1
I ended up removing the relationship and fetching the child data manually.
It is nasty, but I cannot find a reason why this is happening. Cheers for the comments.