After a few days, my AEM server becomes unresponsive and crashes. Following this article - https://helpx.adobe.com/experience-manager/kb/check-and-analyze-if-JCR-session-leaks-in-your-AEM-instance.html - I checked http://localhost:4502/system/console/jmx and found more than 60,000 SessionStatistics objects. What do these represent? Are these active sessions, or is this the list of all the sessions ever created on the AEM server?
Yes, these are active open sessions currently running on your AEM server, created since you last started your instance. You can find the last start time at /system/console/vmstat, and all the session objects will have a timestamp after that Last Started time. You'll notice the timestamp appended to the session name, something like this:
"communities-user-admin#session-1132#25/10/2018 5:03:26 PM"
The link you've posted already indicates potential fixes for open sessions.
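Those fixes essentially come down to making sure every session that gets opened is also closed. A minimal sketch with the plain JCR API (class name and credentials are placeholders; in AEM you would normally obtain the session via a service user / ResourceResolver and close that instead):

import javax.jcr.Repository;
import javax.jcr.Session;
import javax.jcr.SimpleCredentials;

public class SessionCleanupExample {

    public void doWork(Repository repository) throws Exception {
        Session session = repository.login(new SimpleCredentials("admin", "admin".toCharArray()));
        try {
            // ... read or write content here ...
        } finally {
            // Without this, the session stays open and shows up as yet another
            // SessionStatistics entry in the JMX console.
            session.logout();
        }
    }
}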
Another possible reason for the build-up of session objects is inefficient, long-running JCR queries (queries without indexes, very broad predicates, etc.). This can also increase garbage collection because of higher memory usage (if memory parameters are not specified in the start script), so analysing gc.log might provide some insights. If you know that queries are causing the build-up of session objects, you can use these parameters in your start script to limit the resources they consume:
-Doak.queryLimitInMemory=1000 -Doak.queryLimitReads=1000 -Dupdate.limit=1000 -Doak.fastQuerySize=true
To find the location of gc.log, use lsof:
lsof -p ${JAVA_PID} | grep gc.log
Here's our issue. Every day, we update our search path by replacing a schema with another.
So if today our search path would be public, alpha, tomorrow it will be public, beta, then back to public, alpha the day after that. We do this because we want our users to get data from the latest schema, while we do some work on the previous day's data.
Our problem is that whenever we switch the search path, we have to wait some time until the connections in Npgsql's pool are closed and pick up the updated search path. Add to that a user who might hit our API continuously, and we might end up with a connection that keeps using the same search path for a lot longer.
Is there a way to update the search path for the whole pool using some kind of trigger? I know that we could set a lifetime for each connection and allow for something like 30 minutes for a connection until it's closed, but I was hoping there was a better solution.
Instead of "switching the search path" (more detail is needed on what exactly that means), you can simply include the search path in the connection string, meaning that you'd be alternating between two connection strings. Since each connection string gets its own pool, there's no problem. The older pool would gradually empty thanks to connection pruning.
Otherwise, a connection pool can be emptied by calling NpgsqlConnection.ClearPool (or ClearAllPools).
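For illustration, a rough C# sketch of the two-connection-string approach (connection-string values and the day-switching logic are placeholders):

using System;
using Npgsql;

class SearchPathSwitchExample
{
    // Only the Search Path differs, so each string gets its own connection pool.
    const string AlphaConnString = "Host=localhost;Database=mydb;Username=app;Password=secret;Search Path=public,alpha";
    const string BetaConnString = "Host=localhost;Database=mydb;Username=app;Password=secret;Search Path=public,beta";

    static void Main()
    {
        // Stand-in for your own "which schema is live today" logic.
        bool useAlpha = DateTime.UtcNow.Day % 2 == 0;

        using (var conn = new NpgsqlConnection(useAlpha ? AlphaConnString : BetaConnString))
        {
            conn.Open();
            // ... unqualified table names now resolve against public + alpha (or beta) ...
        }

        // Optionally empty yesterday's pool right away instead of waiting for pruning:
        NpgsqlConnection.ClearPool(new NpgsqlConnection(useAlpha ? BetaConnString : AlphaConnString));
    }
}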
My atomist client exposes metrics on commands that are run. Each command is a metric with a username element as well as a status element.
I've been scraping this data for months without resetting the counts.
My requirement is to show the number of active users over a time period, i.e. 1h, 1d, 7d and 30d, in Grafana.
The original query was:
count(count({Username=~".+"}) by (Username))
This is an issue because I don't clear the metrics, so it's always a count since inception.
I then tried this:
count(
  max_over_time(help_command{job="Application Name",Username=~".+"}[1w])
  -
  max_over_time(help_command{job="Application Name",Username=~".+"}[1w] offset 1w)
  > 0
)
which works, but only for one command; I have about 50 other commands that need to be added to that count.
I also tried:
"{__name__=~".+_command",job="app name"}[1w] offset 1w"
but this is obviously very expensive (timeout in browser) and has issues with integrating max_over_time which doesn't support it.
Any help would be appreciated: am I using the metric in the wrong way, or is there a better way to query this? My only option at the moment is to repeat the working count format above for each command.
Thanks in advance.
To start, I will point out a number of issues with your approach.
First, the Prometheus documentation recommends against using arbitrarily large sets of values for labels (as your usernames are). As you can see (based on your experience with the query timing out) they're not entirely wrong to advise against it.
Second, Prometheus may not be the right tool for analytics (such as active users). Partly due to the above, partly because it is inherently limited by the fact that it samples the metrics (which does not appear to be an issue in your case, but may turn out to be).
Third, you collect separate metrics per command (i.e. help_command, foo_command) instead of a single metric with the command name as a label (i.e. command_usage{command="help"}, command_usage{command="foo"}).
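For example, a single counter exposed along these lines is usually easier to work with (metric and label names here are just illustrative):

# HELP command_usage_total Number of times a command was run
# TYPE command_usage_total counter
command_usage_total{command="help",username="alice",status="success"} 42
command_usage_total{command="foo",username="bob",status="error"} 3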
To get back to your question though, you don't need the max_over_time, you can simply write your query as:
count by(__name__)(
  (
    {__name__=~".+_command",job="Application Name"}
    -
    {__name__=~".+_command",job="Application Name"} offset 1w
  ) > 0
)
This only works, though, because you say that whatever exports the counts never resets them. If that is simply because the exporter has never restarted, and the counts will drop to zero when it does, then you'd need to use increase instead of minus and you'd run into the exact same performance issues as with max_over_time.
count by(__name__)(
  increase({__name__=~".+_command",job="Application Name"}[1w]) > 0
)
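For the other windows in your question (1h, 1d, 30d) the same shape should work with just the offset changed, e.g. the daily-active-users variant:

count by(__name__)(
  (
    {__name__=~".+_command",job="Application Name"}
    -
    {__name__=~".+_command",job="Application Name"} offset 1d
  ) > 0
)

In Grafana you can keep one panel per window, or drive the offset from a custom template variable holding 1h/1d/7d/30d.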
I've run into a mystifying XMLA timeout error when running an ADOMD.Net command from a .Net application. The Visual Basic routine iterates over a list of mining models residing on a SQL Server Analysis Services 2014 instance and performs a cross-validation test on each one. Whenever the time elapsed on the cross-validation test reaches the 60-minute mark, the XML for Analysis parser throws an error, saying that the request timed out. For any routine operations taking less than one hour, I can use the same ADOMD.Net connections with the same server and application without any hitches. The culprit in such cases is often the ExternalCommandTimeout setting on the server, which defaults to 3600 seconds, i.e. one hour. In this case, however, all of the following timeout properties on the server are set to zero: CommitTimeout, ExternalCommandTimeout, ExternalConnectionTimeout, ForceCommitTimeout, IdleConnectionTimeout, IdleOrphanSessionTimeout, MaxIdleSessionTimeout and ServerTimeout.
There are only three other timeout properties available, none of which is set to one hour: MinIdleSessionTimeout (currently at 2700), DatabaseConnectionPoolConnectTimeout (now at 60 seconds) and DatabaseConnectionPoolTimeout (at 120000). The MSDN documentation lists another three timeout properties that aren't visible even with Advanced Properties checked in SQL Server Management Studio 2017:
AdminTimeout, DefaultLockTimeoutMS and DatabaseConnectionPoolGeneralTimeout. The first two default to no timeout and the third defaults to one minute. MSDN also mentions a few "forbidden" timeout properties, like SocketOptions\LingerTimeout, InitialConnectTimeout, ServerReceiveTimeout and ServerSendTimeout, which all carry the warning, "An advanced property that you should not change, except under the guidance of Microsoft support." I do not see any means of setting these through the SSMS 2017 GUI though.
Since I've literally run out of timeout settings to try, I'm stumped as to how to correct this behavior and allow my .Net app to wait on those cross-validations through ADOMD. Long ago I was able to solve a few arcane SSAS timeout issues by appending certain property settings to the connection strings, such as "Connect Timeout=0;CommitTimeout=0;Timeout=0" and so on. Nevertheless, attempting to assign an ExternalCommandTimeout value through the connection string in this manner results in the XMLA error
"The ExternalCommandTimeout property was not recognized." I have not tested each and every one of the SSAS server timeouts in this manner, but this exception signifies that ADOMD.Net connection strings can only accept a subset of the timeout properties.
Am I missing a timeout setting somewhere? Does anyone have any ideas on what else could cause this kind of esoteric error? Thanks in advance. I've put this issue on the back burner about as long as I can and really need to get it fixed now. I wonder if perhaps ADOMD.Net has its own separate timeout settings, perhaps going by different names, but I can't find any documentation to that effect...
I tracked down the cause of this error: buried deep in the VB.Net code on the front end was a line that set the CommandTimeout property of the ADOMD.Net Command object to 3600 seconds. This overrode the connection string settings mentioned above, as well as all of the server-level settings. The problem was masked by the fact that cross-validation retrieval operations were also timing out in the Visual Studio 2017 GUI. That occurred because the VS instance was only recently installed and the Connection and Query Timeouts hadn't yet been set to 0 under Options menu/Business Intelligence Designers/Analysis Services Designs/General.
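For anyone hunting a similar issue, this is roughly where that client-side timeout lives; a schematic VB.Net sketch (connection string and DMX text are placeholders, and check how your ADOMD.Net version interprets a value of 0 before relying on it as "no timeout"):

Imports Microsoft.AnalysisServices.AdomdClient

Module CrossValidationTimeoutExample
    Sub Main()
        ' Placeholder DMX text; substitute your own cross-validation call here.
        Dim dmx As String = "CALL SystemGetCrossValidationResults( ... )"
        Using conn As New AdomdConnection("Data Source=localhost;Catalog=MyMiningDb")
            conn.Open()
            Using cmd As New AdomdCommand(dmx, conn)
                ' The bug described above was a hard-coded CommandTimeout of 3600 here,
                ' overriding the connection-string and server-level settings.
                cmd.CommandTimeout = 0
                cmd.ExecuteNonQuery()
            End Using
        End Using
    End Sub
End Module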
Is there any way to see how many transaction logs a process (agent_id) currently spans? Or list the transaction logs it's currently using/spanning? I.e. is it possible to check if NUM_LOG_SPAN is about to be reached?
We've had an issue recently whereby a long running transaction breached NUM_LOG_SPAN. This was set to 70, when we had 105 logs. We've increased this now but potentially it may still not be enough. We could set NUM_LOG_SPAN to 0, but that's a last resort... what we'd like to be able to do is at least monitor the situation (and not just wait until it hits and causes issues) - to be able to run a command to see if, for example, a process was now using/spanning, say, 90 of the logs? And then we could decide whether to cancel it or not.
We're after something similar to the following statement where you can see the percentage of transaction log usage:
select log_utilization_percent,dbpartitionnum from sysibmadm.log_utilization
Is there anything similar for monitoring processes to ensure they don't cross the NUM_LOG_SPAN threshold?
NB: This is in a SAP system (NW7.3)... perhaps there's something in DBACOCKPIT to view this too?
As far as I can tell you can't calculate this from the monitor functions only, because none of the monitoring functions expose the Start LSN for a unit of work.
You can do this with db2pd, though. Use db2pd -db <dbname> -logs to find the Current LSN, and use db2pd -db <dbname> -transactions to find Firstlsn for the particular unit of work.
With these two numbers, you can use the formula
Log Files Spanned = (currentLSN - firstLSN) / (logfilsiz * 4096)
(You should convert the hex values for current LSN and firstLSN returned by db2pd to decimal values).
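A hypothetical worked example (the LSNs and LOGFILSIZ value below are made up, just to show the arithmetic; MYDB is a placeholder database name):

# Firstlsn comes from "db2pd -db MYDB -transactions", Current LSN from "db2pd -db MYDB -logs"
first=$(printf "%d" 0x10000000)      # 268435456
current=$(printf "%d" 0x19000000)    # 419430400
logfilsiz=10000                      # LOGFILSIZ in 4 KB pages (from the db cfg)
echo "scale=2; ($current - $first) / ($logfilsiz * 4096)" | bc
# => 3.68, i.e. this unit of work currently spans roughly 4 log files;
# compare that against your NUM_LOG_SPAN (70) to see how much headroom is left.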
Normally I'm not the person who should be doing this (I'm a PHP developer with only general knowledge of Apache and security administration), but as an emergency measure I have to do it now.
I'm in a situation where I need to write Mod_Security rule that:
- blocks a specific IP address from accessing our website,
- for 5 minutes,
- if it tries to call more than 10 links in less than 10 seconds.
Can I achieve that by writing a mod_security rule?
ModSecurity can do this, but I wouldn't suggest it.
Have a look at the DOS rules in the OWASP CRS: https://github.com/SpiderLabs/owasp-modsecurity-crs/blob/master/experimental_rules/modsecurity_crs_11_dos_protection.conf. Note these do depend on set up in the main CRS setup file: https://github.com/SpiderLabs/owasp-modsecurity-crs/blob/master/modsecurity_crs_10_setup.conf.example
However, ModSecurity collections are not the most stable, especially at high volume; you can run into problems with multiple threads accessing the collection file. You might also find you have to delete the collection file regularly (e.g. every 24 hours) to keep it from growing continually.
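If you do want to try it anyway, here is a rough, untested sketch of the pattern those CRS rules are built on (rule ids, the 10-requests-in-10-seconds window and the 300-second block are placeholders to adjust, and persistent collections need a SecDataDir configured):

SecAction "id:10000,phase:1,nolog,pass,initcol:ip=%{REMOTE_ADDR}"

# Deny straight away if this IP is currently flagged as blocked.
SecRule IP:blocked "@eq 1" "id:10001,phase:1,deny,status:429,log,msg:'Client temporarily blocked'"

# Count requests; the counter expires 10 seconds after it was last touched.
SecAction "id:10002,phase:1,nolog,pass,setvar:ip.requests=+1,expirevar:ip.requests=10"

# More than 10 requests inside that window: flag the IP for 300 seconds (5 minutes).
SecRule IP:requests "@gt 10" "id:10003,phase:1,pass,nolog,setvar:ip.blocked=1,expirevar:ip.blocked=300"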