Queries are very slow - TYPO3

I built a website with the CMS TYPO3. Everything works, but I have a performance issue: the frontend is generated very slowly. So I decided to install the extension "T3Profiler" to analyze the problem.
In the profiler I find queries like:
SELECT content FROM cf_cache_hash WHERE identifier = 'f33c135b63eac6bb7194edab51f3c57a' AND cf_cache_hash.expires >= 1441015330 LIMIT 1
Such queries take 90.000 - 600.000 ms. Why are these selects so slow? How can I solve this issue?
Can someone give me a hint?

Sometimes when system tables like sys_log, *_cache_* and others grow to huge sizes, querying them becomes slower and slower. Because they are accessed frequently during the normal rendering process, they can become real performance killers.
There are several workarounds for this:
Add scheduler tasks to clean these tables regularly, e.g. sys_log and history entries can be purged after 30 days, especially when the system is in a development state and many changes are made every day.
Check what fills the tables - e.g. some extension may add several hundred entries to sys_log, for instance when a method doesn't receive an expected argument. If that method is called in a loop over a collection of a hundred items, the logger has to write the error several hundred times per request (!). Fix the code to avoid such situations.
Make sure that all table structures are correct via Install Tool > Compare current database with specification.
Finally, use your database GUI to optimize and/or repair tables (see the sketch below).
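A rough sketch of what that cleanup can look like on the database side, assuming MySQL/MariaDB and TYPO3's default cf_* cache tables (replace typo3_db with your actual database name):
-- find the largest tables in the TYPO3 database
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024) AS size_mb
FROM information_schema.tables
WHERE table_schema = 'typo3_db'          -- your database name here
ORDER BY size_mb DESC
LIMIT 20;
-- cache tables are safe to empty: TYPO3 rebuilds them on the next request
TRUNCATE TABLE cf_cache_hash;
TRUNCATE TABLE cf_cache_hash_tags;
-- reclaim disk space and refresh index statistics afterwards
OPTIMIZE TABLE cf_cache_hash, cf_cache_hash_tags;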

In addition to biesior's answer, another tip is to store caches outside of your TYPO3 database, because the DB is already stressed enough by data selection.
For TYPO3 7.x I do it like this in AdditionalConfiguration.php:
$redisCacheOptions = [
    'hostname' => 'localhost',
    'port' => 6379,
    'database' => 2,
    'password' => '******',
];
$cacheConfigurations = [
    'cache_hash',
    'cache_imagesizes',
    'cache_pages',
    'cache_pagesection',
    'cache_rootline',
    'extbase_datamapfactory_datamap',
    'extbase_object',
    'extbase_reflection',
    'extbase_typo3dbbackend_queries',
    'extbase_typo3dbbackend_tablecolumns',
];
foreach ($cacheConfigurations as $cacheConfiguration) {
    $GLOBALS['TYPO3_CONF_VARS']['SYS']['caching']['cacheConfigurations'][$cacheConfiguration]['backend'] =
        \TYPO3\CMS\Core\Cache\Backend\RedisBackend::class;
    $GLOBALS['TYPO3_CONF_VARS']['SYS']['caching']['cacheConfigurations'][$cacheConfiguration]['options'] =
        $redisCacheOptions + (array)$GLOBALS['TYPO3_CONF_VARS']['SYS']['caching']['cacheConfigurations'][$cacheConfiguration]['options'];
}

Related

Data syncing with pouchdb-based systems client-side: is there a workaround to the 'deleted' flag?

I'm planning on using RxDB + Hasura/PostgreSQL on the backend. I'm reading this RxDB page, for example, which right off the bat requires sync-able entities to have a deleted flag.
Q1 (main question)
Is there ANY point at which I can finally hard-delete these entities? What conditions would have to be met - e.g. could I simply use "older than X months" and then force my app to only ever display data from the last X months?
Is such a hard-delete, if possible, best carried out directly in the central DB, since it will be the source of truth? Would there be any repercussions client-side that I'm not foreseeing/understanding?
I foresee the number of deleted records growing rapidly in my app, and I don't want to have to store all this extra data forever.
Q2 (bonus / just curious)
What is the (algorithmic) basis for needing a 'deleted' flag? Is it just that it's faster to check a flag than to check for the omission of an object from, say, a very large list? I apologize if it's kind of a stupid question :(
Ultimately it comes down to a decision, informed by your particular business/product, about how long you want to keep deleted entities in your system. For some applications it's important to always keep a history of deleted things, or even individual revisions to records, stored as a kind of ledger or history. You'll have to make a judgement call as to how long you want to keep your deleted entities.
I'd recommend that you also add a deleted_at column if you haven't already; then you could easily leverage something like Hasura's new Scheduled Triggers functionality to run a recurring job that fully deletes records older than whatever your threshold is.
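The SQL that such a recurring job runs could be as simple as the following sketch (the table and column names items, deleted and deleted_at are assumptions here, not part of RxDB or Hasura):
-- hard-delete soft-deleted rows older than the retention threshold
DELETE FROM items
WHERE deleted = true
  AND deleted_at < now() - interval '6 months';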
You could also leverage Hasura's permission system to ensure that rows that have been deleted aren't returned to the client. There is documentation, with examples, on ways to work with soft deletes in Hasura.
For your second question: it is definitely much faster to check the deleted flag on records than to try to diff the entire dataset looking for things that are now missing.

Can I debug a PostgreSQL query sent from an external source, that I can't edit?

I see how to debug queries stored as functions in the database. But my problem is with an external QGIS plugin that connects to my Postgres 10.4 over the network, runs a complex query with calculations, and stores the results back into PostGIS tables:
FOR r IN c LOOP
    SELECT
        (1 - ST_LineLocatePoint(path.geom, ST_Intersection(r.geom, path.geom))) * ST_Length(path.geom)
    INTO
        station
    (continues ...)
When it errors out, it just returns that line number as the failing location, with no clue where it was in the loop over hundreds of features. (And any features it has already processed are not stored to the output tables when it fails.) I don't know nearly enough about the plugin or about SQL to hack the external query, and I suspect that if it were a reasonable task the plugin author would have included more revealing debug messages.
So is there some way I could use pgAdmin 4 (or anything) on the server side to watch the query in progress? Even being able to see whether it fails the first time through the loop or later would help immensely. Knowing the loop count at failure would point me to the exact problem feature. Being able to see "station" or "r.geom" would make it even easier.
It's perfectly fine if the process is miserably slow or interferes with other queries; I'm the only user on this server.
This is not actually a way to watch the RiverGIS query in action, but it is the best I have found. It extracts the failing ST_Intersects() call from the RiverGIS code and runs it under your control, where you can display any clues you want.
When you're totally mystified about where the RiverGIS problem might be, run this SQL query:
SELECT
    xs."XsecID" AS "XsecID",
    xs."ReachID" AS "ReachID",
    xs."Station" AS "Station",
    xs."RiverCode" AS "RiverCode",
    xs."ReachCode" AS "ReachCode",
    ST_Intersection(xs.geom, riv.geom) AS "Fraction"
FROM
    "<your project name>"."StreamCenterlines" AS riv,
    "<your project name>"."XSCutLines" AS xs
WHERE
    ST_Intersects(xs.geom, riv.geom)
ORDER BY xs."ReachID" ASC, xs."Station" DESC
Obviously, replace <your project name> with your QGIS project name.
This also works for the BankLines step if you replace "StreamCenterlines" with "BankLines", and it could probably be adapted to other situations where ST_Intersects() fails without a clue.
You'll get a listing with shorter geometry strings for good cross sections and double-length strings for bad ones. You'll probably need to widen your display column a lot to see this.
It works for me in pgAdmin 4, or in QGIS 3 -> Database -> DB Manager -> (click the wrench icon). You could select only the bad lines (see the sketch below), but I find the background info helpful.
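If you do want only the problem rows, one possible filter, under the assumption that a "bad" cross section is one whose intersection with the centerline is not a single point (which is what breaks ST_LineLocatePoint), is:
-- assumption: bad cross sections intersect the centerline in more than one point
SELECT
    xs."XsecID" AS "XsecID",
    ST_GeometryType(ST_Intersection(xs.geom, riv.geom)) AS "IntersectionType"
FROM
    "<your project name>"."StreamCenterlines" AS riv,
    "<your project name>"."XSCutLines" AS xs
WHERE
    ST_Intersects(xs.geom, riv.geom)
    AND ST_GeometryType(ST_Intersection(xs.geom, riv.geom)) <> 'ST_Point';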

My Heroku Postgres psycopg2 (Python) query gets slower and slower each time it's executed. Any insight?

I have a Python app running on Heroku that uses a standard Heroku Postgres DB (the $50 version). There are 4 tables in the DB. My app queries for one primary key in the main table, based on input from the users of my app.
The querying worked great at first, but now I'm finding it becomes too slow after about 40-50 minutes without restarting my dyno. The queries will take 2,000 ms after a while and take several seconds to load in front of the users. I'm newer to programming and this is my second app. I'm wondering what would make queries get slower over time instead of staying constant, since they are so fast at first. What are best practices for psycopg2 within an app to ensure the DB doesn't get hung up? Here is an example of one of the queries (all others have similar syntax throughout the script):
if eventText == "Mc3 my champs":
    user = event.source.user_id
    profile = line_bot_api.get_profile(user)
    name = str(profile.display_name)
    cur = None
    try:
        cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
        # get the user's information if it exists
        cur.execute("""SELECT lineid, summoner_name, champ_data FROM prestige_data WHERE lineid = %(lineid)s LIMIT 1""", {"lineid": user})
        rows = cur.fetchall()
        for row in rows:
            champs = row[2]
            prestige = calculate_prestige(champs)
            champs = json.loads(champs)
            champsdict = dict.items(champs)
            champs_sorted = sorted(champsdict, key=lambda student: student[1], reverse=True)
            l = '\n'.join(map(str, champs_sorted))
            hello = str(l).replace('(', '').replace(')', '')
            yay = str(hello).replace("'", "").replace("'", "")
            msg = (yay + '\n' + "---------------------------" + '\n' + name + '\n' + "Prestige:" + str(prestige))
            line_bot_api.reply_message(
                event.reply_token,
                TextSendMessage(text=msg))
            break  # we should only have one result, but we'll stop just in case
        # The user does not exist in the database already
        else:
            msg = "Oops! You need to add some champs first. Try 'Mc3 inputchamp'."
            line_bot_api.reply_message(
                event.reply_token,
                TextSendMessage(text=msg))
    except BaseException:
        if cur is not None:
            conn.rollback()
    finally:
        if cur is not None:
            cur.close()
While I know I did not frame that question well (I'd only been programming for a month), I have found one issue that was potentially causing this and that warrants documenting here.
I had a suspicion the concurrency issue arose when the queries did not find the expected data. In that situation, my conn would not be rolled back, committed, or closed in the example above.
Per psycopg2's documentation, even SELECT queries need to be committed or rolled back, otherwise the transaction remains open. This in turn keeps your Heroku dyno worker tied up in the transaction for 30 seconds, causing an H12 error. So make sure you commit or roll back after each query, regardless of the outcome, to ensure you do not end up with an idle transaction.
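A minimal sketch of that pattern, reusing the names from the code above (psycopg2's connection context manager commits on success and rolls back on any exception, but does not close the connection):
import psycopg2
import psycopg2.extras

# conn and user come from the surrounding application code
try:
    with conn:  # ends the transaction: commit on success, rollback on exception
        with conn.cursor(cursor_factory=psycopg2.extras.DictCursor) as cur:
            cur.execute(
                "SELECT lineid, summoner_name, champ_data "
                "FROM prestige_data WHERE lineid = %(lineid)s LIMIT 1",
                {"lineid": user},
            )
            rows = cur.fetchall()
except psycopg2.Error:
    # the with-block has already rolled back; log or handle as needed
    rows = []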
My queries are spiffy now, but the issue persists. I'm not sure what is slowly but surely idling my waitress workers. I think some ancillary process is somehow started in one of the class modules I've created and runs indefinitely, taking hold of each worker until they are all stuck on a transaction, which leads to H12.
I'd love someone's input if they've had a similar experience. I don't want to have a cron job reboot the app every 10 minutes just to keep it functioning.

How do we optimize our mongo logging for analytics

We log activity on our website using Mongo, but now that our traffic has grown we're struggling to pull the reports we want; we hit timeouts. Each user session has its own record, which includes the following attributes:
{
    channel: 'adwords',
    device: {
        type: 'mobile'
    },
    abTestArms: [
        'test1_arm2',
        'test3_arm4'
    ],
    userEventCounters: {
        clickOnRedButton: 3,
        adjustPriceSlider: 4,
        clickOnBlueButton: 2,
        useSearchBox: 4
    }
}
We run lots of A/B tests and want to find out how different versions affect user activity. So we might run queries like:
db.sessions.count({channel:'bing',device:'tablet',abTestArms:'test1_arm1','userEventCounters.useSearchBox':{$exists:true}})
Or
db.sessions.count({channel:'bing',device:'tablet',abTestArms:'test1_arm1'})
We used to use aggregation pipelines, but they started timing out, so now we're building the results bit by bit; even like this we hit timeouts.
We've tried various indexes, for example compound indexes like:
{channel:1, device:1, abTestArms:1, userEventCounters:1}
The only thing we haven't tried is creating lots of compound indexes like
{channel:1, device:1, abTestArms:1, 'userEventCounters.hoverOverAd':1}
The issue with this is that there are lots of user events that we track, and we don't want 40+ compound indexes like this. Also, we'd have to create new indexes whenever we start tracking a new event.
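For reference, each of those per-event indexes would have to be created individually with something like the following (the hoverOverAd event name is just the example from above):
// one index per tracked event - this is exactly what we want to avoid
db.sessions.createIndex(
    { channel: 1, device: 1, abTestArms: 1, 'userEventCounters.hoverOverAd': 1 }
)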
We currently keep about 3 million sessions in our DB, and the bigger count queries normally return counts in the hundreds of thousands. This can't be an uncommon thing to do; ideally I'd like to be able to do this stuff on the fly (as we used to when traffic levels were small). What am I missing?

Issue with Entity Framework 4.2 Code First taking a long time to add rows to a database

I am using Entity Framework 4.2 with Code First. I have a Windows 2008 application server and a database server running on Amazon EC2. The application server has a Windows service installed that runs once per day. The service executes the following code:
// returns between 2000-4000 records
var users = userRepository.GetSomeUsers();

// do some work
foreach (var user in users)
{
    var userProcessed = new UserProcessed { User = user };
    userProcessedRepository.Add(userProcessed);
}

// Calls SaveChanges() on DbContext
unitOfWork.Commit();
This code takes a few minutes to run. It also maxes out the CPU on the application server. I have tried the following measures:
Removed the unitOfWork.Commit() call to see if the problem is network related when the application server talks to the database. This did not change the outcome.
Changed my application server from a medium instance to a high-CPU instance on Amazon to see if the problem is resource related. The server no longer maxes out the CPU and the execution time improved slightly, but it still takes a few minutes.
As a test, I modified the above code to run three times to see how the execution time changed for the second and third loops when using the same DbContext. Every consecutive loop took longer to run than the previous one, but that could be related to using the same DbContext.
Am I missing something? Is it really possible that something as simple as this takes minutes to run? Even if I don't commit to the database after each loop? Is there a way to speed this up?
Entity Framework (as it stands) isn't really well suited to this kind of bulk operation. Are you able to use one of the bulk insert methods on EC2? Otherwise, you might find that hand-coding the T-SQL INSERT statements is significantly faster. If performance is important, that probably outweighs the benefits of using EF.
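For illustration, the hand-coded route boils down to batched multi-row INSERT statements; the table and column names below are assumptions, not your actual schema:
-- assumed schema: UserProcessed(UserId) referencing the users table
INSERT INTO UserProcessed (UserId)
VALUES (2001), (2002), (2003);  -- send many rows per statement, in batches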
My guess is that your ObjectContext is accumulating a lot of entity instances. SaveChanges seems to have a phase whose running time is linear in the number of entities loaded, which is likely why each consecutive loop takes longer and longer.
A way to resolve this is to use multiple, smaller ObjectContexts so you regularly get rid of old entity instances (see the sketch below).
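A minimal sketch of that idea, under the assumption that you can create a context directly; MyDbContext, the UserId foreign-key property and the Id key are hypothetical names, so adapt them to your actual model (requires System.Linq for Skip/Take):
const int batchSize = 500;
var users = userRepository.GetSomeUsers().ToList();  // same call as in the question

for (int i = 0; i < users.Count; i += batchSize)
{
    // a fresh, short-lived context keeps the change tracker small,
    // so each SaveChanges() only has to deal with one batch of entities
    using (var context = new MyDbContext())
    {
        foreach (var user in users.Skip(i).Take(batchSize))
        {
            // setting the foreign key avoids re-attaching the User entity,
            // which was loaded by a different context
            context.Set<UserProcessed>().Add(new UserProcessed { UserId = user.Id });
        }
        context.SaveChanges();
    }
}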