<mapper namespace="src.main.domain.EqMapper">
<cache
eviction="FIFO"
size="512"
readOnly="true"/>
<select id="getStoreIdAndEqId" resultType="String" flushCache="false" useCache="true">
select count(author) from blog
</select>
<select id="getWholeData" resultType="java.util.LinkedHashMap" flushCache="false" useCache="true">
select * from blog
</select>
</mapper>
Scenario: First I call the getWholeData method, which brings back the complete data at once; at that moment the result of select count(author) from blog is 20 in the real DB. Ten minutes later I call getStoreIdAndEqId and expect to get 20, since I already loaded the whole data the first time. That is not what happens: while getWholeData still holds 20 records, getStoreIdAndEqId returns fresh data, maybe 25 by then, and that value then stays in the cache afterwards. Is it not possible that the first call to getStoreIdAndEqId, wherever it happens, does not query the DB but instead answers from the getWholeData cache and returns 20 instead of the new 25?
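A note on why this happens, plus a possible workaround (this is my reading of MyBatis, not something stated above): the namespace cache is shared, but each cached entry is keyed by the statement that produced it (statement id, SQL, and parameters), so getStoreIdAndEqId can never be answered from getWholeData's cached entry. A minimal sketch of one workaround is to derive the count in the service layer from the cached whole-data result; the EqMapper interface and the blog column used here are assumptions:

import java.util.List;
import java.util.Map;

// Assumed Java mapper interface for the src.main.domain.EqMapper namespace above.
interface EqMapper {
    String getStoreIdAndEqId();
    List<Map<String, Object>> getWholeData();
}

public class EqService {
    private final EqMapper eqMapper;

    public EqService(EqMapper eqMapper) {
        this.eqMapper = eqMapper;
    }

    public long getAuthorCount() {
        // Served from the namespace cache after the first getWholeData call.
        List<Map<String, Object>> rows = eqMapper.getWholeData();
        // Equivalent of "select count(author) from blog": count rows with a non-null author.
        return rows.stream().filter(r -> r.get("author") != null).count();
    }
}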
I have to update multiple records in a table in the most efficient way possible: the lowest latency, without using the CPU extensively. The number of records to update at a time can range from 1 to 1000.
We do not want to lock the database when this update occurs as other services are utilising it.
Note: There are no dependencies generated from this table towards any other table in the system.
After looking in many places, I've narrowed it down to a few ways to do the task:
simple-update: A simple update query to the table using the already-known ids, as
either multiple update queries (one query for each individual record), or
usage of an update ... from clause as mentioned here as a single query (one query for all records); see the sketch further below
delete-then-insert: First delete the outdated rows and then insert the updated data with new ids (since nothing depends on these records, new ids are acceptable)
insert-then-delete: First insert the updated records with new ids and then delete the outdated rows using the old ids (since nothing depends on these records, new ids are acceptable)
temp-table: First insert the updated records into a temporary table. Then update the original table from the records in the temporary table. Finally, remove the temporary table.
We must not drop the existing table and create a new one in its place
We must not truncate the existing table because we have a huge number of records that we cannot store in the buffer memory
I'm open to any more suggestions.
Also, what will be the impact of making the update all at once vs doing it in batches of 100, 200 or 500?
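For reference, here is a minimal JDBC sketch (my own illustration, not from the linked answer) of the single-query form from the simple-update option above, using PostgreSQL's UPDATE ... FROM (VALUES ...); the table and column names (items, id, name) are placeholders:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;
import java.util.StringJoiner;

public class BulkUpdate {

    // Updates up to ~1000 rows in one statement via UPDATE ... FROM (VALUES ...).
    // Returns the number of rows actually updated.
    public static int updateNames(Connection conn, Map<Long, String> newNamesById) throws SQLException {
        if (newNamesById.isEmpty()) {
            return 0;
        }
        StringJoiner values = new StringJoiner(", ");
        for (int i = 0; i < newNamesById.size(); i++) {
            values.add("(?::bigint, ?)"); // one (id, name) pair per record to update
        }
        String sql = "UPDATE items AS t SET name = v.name "
                   + "FROM (VALUES " + values + ") AS v(id, name) "
                   + "WHERE t.id = v.id";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int p = 1;
            for (Map.Entry<Long, String> e : newNamesById.entrySet()) {
                ps.setLong(p++, e.getKey());
                ps.setString(p++, e.getValue());
            }
            return ps.executeUpdate();
        }
    }
}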
References:
https://dba.stackexchange.com/questions/75631
https://dba.stackexchange.com/questions/230257
As mentioned by @Frank Heikens in the comments, I'm sure different people will have different statistics depending on their system design. I did some checks, and I have some insights to share from one of my development systems.
Configurations of the system used:
AWS
Engine: PostgreSQL
Engine version: 12.8
Instance class: db.m6g.xlarge
Instance vCPU: 4
Instance RAM: 16GB
Storage: 1000 GiB
I used a lambda function and the pg package to write data into a table (default FILLFACTOR) that contains 3,409,304 records.
Both the lambda function and the database were in the same region.
UPDATE 1000 records in the database with a single query

Run | Time taken
1   | 143.78ms
2   | 115.277ms
3   | 98.358ms
4   | 98.065ms
5   | 114.78ms
6   | 111.261ms
7   | 107.883ms
8   | 89.091ms
9   | 88.42ms
10  | 88.95ms
UPDATE 1000 records in the database in 2 concurrent batches of 500 records (one query per batch)

Run | Time taken
1   | 43.786ms
2   | 48.099ms
3   | 45.677ms
4   | 40.578ms
5   | 41.424ms
6   | 44.052ms
7   | 42.155ms
8   | 37.231ms
9   | 38.875ms
10  | 39.231ms
DELETE + INSERT 1000 records into the database

Run | Time taken
1   | 230.961ms
2   | 153.159ms
3   | 157.534ms
4   | 151.055ms
5   | 132.865ms
6   | 153.485ms
7   | 131.588ms
8   | 135.99ms
9   | 287.143ms
10  | 175.562ms
I did not go on to check updating records with the help of a buffer table, because I had already found my answer.
I looked at the database metrics graphs provided by AWS, and it was clear that DELETE + INSERT was more CPU intensive; from the statistics shared above, DELETE + INSERT also took more time than UPDATE.
If updates are done concurrently in batches, yes, updates will be faster, depending on the number of connections (a connection pool is recommended).
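A rough sketch of what concurrent batches could look like (my own illustration; the benchmark above was actually run from a Node.js lambda with the pg package). It splits the ids into batches and runs one UPDATE per batch on its own pooled connection; the table and column names are placeholders:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.sql.DataSource;

public class ConcurrentBatchUpdate {

    // Runs one UPDATE per batch concurrently; the DataSource should be backed by a connection pool.
    public static void updateInBatches(DataSource pool, List<Long> ids, int batchSize) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(Math.max(1, ids.size() / batchSize));
        List<Future<Integer>> results = new ArrayList<>();
        for (int from = 0; from < ids.size(); from += batchSize) {
            List<Long> batch = ids.subList(from, Math.min(from + batchSize, ids.size()));
            results.add(executor.submit(() -> updateBatch(pool, batch)));
        }
        executor.shutdown();
        for (Future<Integer> f : results) {
            f.get(); // propagate any failure from the batch updates
        }
    }

    private static int updateBatch(DataSource pool, List<Long> batch) throws Exception {
        // Placeholder statement: 'items' and 'updated_at' are assumed names, not from the post.
        String sql = "UPDATE items SET updated_at = now() WHERE id = ANY (?)";
        try (Connection conn = pool.getConnection();
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setArray(1, conn.createArrayOf("bigint", batch.toArray()));
            return ps.executeUpdate();
        }
    }
}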
Using a buffer table, truncate, and other methods might be more suitable approaches if needed to update almost all the records in a giant table, though I currently do not have metrics to support this. However, for a limited number of records, UPDATE is a fine choice to proceed with.
Also be mindful that, if not executed properly (for example, not inside a single transaction), a failed DELETE + INSERT can lose records, and a failed INSERT + DELETE can leave duplicate records.
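One way to remove that risk (a minimal sketch, assuming plain JDBC and placeholder table/column names) is to run the DELETE and the INSERT inside a single transaction, so a failure rolls both statements back and nothing is lost or duplicated:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class DeleteThenInsert {

    // DELETE + INSERT as one transaction: if either statement fails, both are rolled back.
    public static void replaceRecord(Connection conn, long oldId, long newId, String name) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement delete = conn.prepareStatement("DELETE FROM items WHERE id = ?");
             PreparedStatement insert = conn.prepareStatement("INSERT INTO items (id, name) VALUES (?, ?)")) {
            delete.setLong(1, oldId);
            delete.executeUpdate();
            insert.setLong(1, newId);
            insert.setString(2, name);
            insert.executeUpdate();
            conn.commit();
        } catch (SQLException e) {
            conn.rollback(); // the delete is undone together with the failed insert
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}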
I have a table "items" in PostgreSQL with columns like id, name, desc, config, etc.
It contains 1.6 million records.
I need to run a query to get all the results, like "select id, name, description from items".
What is the proper pattern for iterating over large result sets?
I used EntityListIterator:
// Open a scroll-insensitive cursor over the three selected fields.
EntityListIterator iterator = EntityQuery.use(delegator)
        .select("id", "name", "description")
        .from("items")
        .cursorScrollInsensitive()
        .queryIterator();
// Total number of rows, needed for the Datatables pagination info.
int total = iterator.getResultsSizeAfterPartialList();
// One page of results for the current request (length rows starting at position start + 1).
List<GenericValue> items = iterator.getPartialList(start + 1, length);
iterator.close();
The start here is 0 and the length is 10.
I implemented this so I can do pagination with Datatables.
The problem with this is that I have millions of records and it takes like 20 seconds to complete.
What can I do to improve the performance?
If you are implementing pagination, you shouldn't load all 1.6 million records in memory at once. Use order by id in your query and id ranges (0 to 10, 10 to 20, etc.) in the where clause. Keep a counter that records the last id you have traversed.
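A minimal JDBC sketch of that id-based approach (often called keyset pagination); the column names match the question's "select id, name, description from items", and the helper itself is my own illustration:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class KeysetPage {

    // Fetches the next page after lastSeenId, ordered by id, without scanning the skipped rows.
    // Pass 0 for the first page and remember the highest id returned; that is the counter mentioned above.
    public static List<String[]> nextPage(Connection conn, long lastSeenId, int pageSize) throws SQLException {
        String sql = "SELECT id, name, description FROM items WHERE id > ? ORDER BY id LIMIT ?";
        List<String[]> page = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, lastSeenId);
            ps.setInt(2, pageSize);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    page.add(new String[] { rs.getString("id"), rs.getString("name"), rs.getString("description") });
                }
            }
        }
        return page;
    }
}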
If you really want to pull all records into memory, then load just the first few pages' records (e.g. from id=1 to id=100), return them to the client, and then use something like CompletableFuture to asynchronously retrieve the rest of the records in the background.
Another approach is to run multiple small queries in separate threads, depending on how many parallel reads your database supports, and then merge the results.
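A rough sketch combining the last two ideas (my own illustration, with assumed table/column names): split the id range into a few chunks, load each chunk asynchronously on its own pooled connection with CompletableFuture, and merge the partial results:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import javax.sql.DataSource;

public class ParallelFetch {

    // Splits [1, maxId] into 'parts' ranges and loads them concurrently, keeping id order in the merge.
    public static List<String[]> loadAll(DataSource pool, long maxId, int parts) {
        long chunk = Math.max(1, maxId / parts);
        List<CompletableFuture<List<String[]>>> futures = new ArrayList<>();
        for (long from = 1; from <= maxId; from += chunk) {
            long lo = from;
            long hi = Math.min(from + chunk - 1, maxId);
            futures.add(CompletableFuture.supplyAsync(() -> loadRange(pool, lo, hi)));
        }
        List<String[]> all = new ArrayList<>();
        for (CompletableFuture<List<String[]>> f : futures) {
            all.addAll(f.join()); // futures were created in id order, so the merge preserves it
        }
        return all;
    }

    private static List<String[]> loadRange(DataSource pool, long lo, long hi) {
        String sql = "SELECT id, name, description FROM items WHERE id BETWEEN ? AND ? ORDER BY id";
        List<String[]> rows = new ArrayList<>();
        try (Connection conn = pool.getConnection();
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, lo);
            ps.setLong(2, hi);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    rows.add(new String[] { rs.getString("id"), rs.getString("name"), rs.getString("description") });
                }
            }
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
        return rows;
    }
}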
What about CopyManager? You could fetch your data as a text/CSV output stream; retrieving it that way might be faster.
CopyManager cm = new CopyManager((BaseConnection) conn); // requires the PostgreSQL JDBC driver's copy API
String sql = "COPY (SELECT id, name, description FROM items) TO STDOUT WITH DELIMITER ';'";
try (Writer out = new BufferedWriter(new FileWriter("C:/export_transaction.csv"))) {
    cm.copyOut(sql, out); // closing the writer flushes the buffered rows to the file
}
I have a large repository method which generates a regular query at the backend. Some of the parameters I pass to that repository method are max-results, first-result, order-by and order-by-dir, in order to control the number of records to display, the pagination, and the order of the records. The problem is that with some configurations, e.g. 4th page, max-results: 10, first-result: 40, which should give me the 40th to 50th records out of the 1000+ records in the database, it returns fewer than 10 records out of those 1000+.
QB Code
....
return $total ? //this is a bool parameter to find out if I want the records or the records amount
$qb
->select($qb->expr()->count('ec.id'))
->getQuery()->getSingleScalarResult() :
$qb//these are the related entities all are joined by leftJoin of QB
->addSelect('c')
->addSelect('e')
->addSelect('pr')
->addSelect('cl')
->addSelect('ap')
->addSelect('com')
->addSelect('cor')
->addSelect('nav')
->addSelect('pais')
->addSelect('tarifas')
->addSelect('transitario')
->orderBy(isset($options['sortBy']) ? $options['sortBy'] : 'e.bl', isset($options['sortDir']) ? $options['sortDir'] : 'asc')
->getQuery()
->setMaxResults(isset($options['limit']) ? $options['limit'] : 10)
->setFirstResult(isset($options['offset']) ? $options['offset'] : 0)
->getArrayResult();
Scenario 1: QueryBuilder with orderBy and database
QB: in this case the result is only one entity with the expected data; only one entity, not 10, even though more than 1000 records exist.
DB: in this case I get 10 rows, but all for the same entity (the same output as the QB, repeated 10 times).
Scenario 2: QueryBuilder without orderBy and database
QB: in this case the result is as expected: 10 records filtered from the 1000+ records.
DB: in this case the result is as expected: 10 records.
The only problem in this scenario is that I can't order my results using the QB.
Environment description
Symfony: 3.4.11
PostgreSQL: 9.2
PHP 7.2
OS: Ubuntu Server 16.04 x64
Why are Doctrine/Postgres giving me that kind of result?
There are no exceptions or misconfigurations; it only cuts the results when I use orderBy.
As discussed in the comments, posting this as an answer.
I guess it's because you are selecting the related entities via left join, so you get multiple result rows per main entity (due to the one-to-many relationships). Without order by, the duplicates are still there but scattered through the unsorted results, so you didn't notice them or count them as duplicates; with order by on your result set, the duplicate rows for the same entity end up next to each other.
What I think would work around your case is to select only your main entity (say A) in the query builder, not the related ones via addSelect(...), and use lazy loading when you want to display the desired data from the related entities.
For example, I want to retrieve all data from the citizen table, which contains about 18K rows.
String sqlResult = "SELECT * FROM CITIZEN";
Query query = getEntityManager().createNativeQuery(sqlResult);
query.setFirstResult(searchFrom);
query.setMaxResults(searchCount); // searchCount is 20
List<Object[]> listStayCit = query.getResultList();
Everything was fine until the "searchFrom" offset got large (17K or so). For example, it took 3-4 minutes to get 20 rows (17,000 to 17,020). So is there any better way to make it faster, other than tuning the DB?
P.S.: Sorry for my bad English.
You could use batch queries.
A good article explaining a solution to your problem is available here:
http://java-persistence-performance.blogspot.in/2010/08/batch-fetching-optimizing-object-graph.html
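Separately from batch fetching, it is usually the large OFFSET itself that makes deep pages slow, because the database still has to walk past the first 17,000 rows before returning the 20 you want. A keyset-style native query that filters on the last id already shown avoids that (this is my suggestion, not from the article above); the ID column name is an assumption, since the question only shows SELECT * FROM CITIZEN:

import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.Query;

public class CitizenPages {

    // Keyset-style paging: instead of a large offset, filter on the last id already shown.
    @SuppressWarnings("unchecked")
    public static List<Object[]> nextPage(EntityManager em, long lastSeenId, int pageSize) {
        Query query = em.createNativeQuery("SELECT * FROM CITIZEN WHERE ID > ?1 ORDER BY ID");
        query.setParameter(1, lastSeenId);
        query.setMaxResults(pageSize); // e.g. 20, like searchCount in the question
        return query.getResultList();
    }
}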
I am working on a project where I came across a SQL query. I want to see the execution flow fully, i.e. how the query is executed in the database. What can I do for this?
Can anybody help me with this?
Atomic Operations
The query will be an atomic operation so that you could set table A.Description = table B.Description without having to worry about overwriting the data from either table.
A great book on SQL querying: the one you need to read is Inside Microsoft® SQL Server® 2008: T-SQL Querying. This will show you exactly how a query is processed. You can also use the Display Estimated Execution Plan / Include Actual Execution Plan options in SQL Server Management Studio to see a visual plan; you can right-click the query pane to toggle them on and off. You will have to learn how to read it first.
The actual (logical) order of a query, based on Itzik Ben-Gan in the book:
1 FROM
2 ON <join_condition>
3 <join_type> JOIN <right_table>
4 WHERE <where_condition>
5 GROUP BY <group_by_list>
6 WITH { CUBE | ROLLUP }
7 HAVING <having_condition>
8 SELECT
9 DISTINCT
10 ORDER BY <order_by_list>
11 TOP