Spring Boot RESTful service memory keeps growing with no load

I have deployed a Spring Boot RESTful service (spring-boot-starter-parent 1.3.0.RELEASE) with the following Maven dependencies:
spring-boot-starter-cloud-connectors,
spring-boot-starter-data-rest,
spring-boot-starter-jdbc,
spring-boot-starter-test,
spring-boot-configuration-processor,
spring-context-support (4.1.2.RELEASE), and
spring-cloud-starter-parent (Brixton.M4).
It is also bound to the MySQL and AutoScalar marketplace services.
The service performs well with respect to SLA and memory utilisation. However, I have observed the following concerns:
1. The application starts at its initial memory footprint, but after a short while (say 15-20 minutes) memory keeps growing by 2-5 MB every few minutes, even though there is no load (no service calls).
2. After a heavy workload the application reaches a certain memory level (say 600 MB), but that memory does not come back down once the load stops, even after idling for a while; because of point 1, it keeps growing instead.
JMX shows both minor and major GC running under load, and the following are the Spring metrics exposed through spring-boot-starter-actuator:
{
"_links" : { "self" : { "href" : "https://Promotion/metrics" } },
"mem" : 376320,
"mem.free" : 199048,
"processors" : 4,
"instance.uptime" : 6457682,
"uptime" : 6464241,
"systemload.average" : 0.05,
"heap.committed" : 376320,
"heap.init" : 382976,
"heap.used" : 177271,
"heap" : 376320,
"threads.peak" : 22,
"threads.daemon" : 20,
"threads.totalStarted" : 27,
"threads" : 22,
"classes" : 9916,
"classes.loaded" : 9916,
"classes.unloaded" : 0,
"gc.ps_scavenge.count" : 47,
"gc.ps_scavenge.time" : 344,
"gc.ps_marksweep.count" : 0,
"gc.ps_marksweep.time" : 0,
"httpsessions.max" : -1,
"httpsessions.active" : 0,
"datasource.primary.active" : 0,
"datasource.primary.usage" : 0.0,
"gauge.response.metrics" : 7.0,
"gauge.response.BuyMoreSaveMore.getDiscDetails" : 14.0,
"counter.status.200.metrics" : 7,
"counter.status.200.BuyMoreSaveMore.getDiscDetails" : 850
}

Related

Deploy Graylog on GKE

I'm having a hard time deploying Graylog on Google Kubernetes Engine. I'm using this configuration https://github.com/aliasmee/kubernetes-graylog-cluster with some minor modifications. My Graylog server is up, but it shows this error in the interface:
Error message
Request has been terminated
Possible causes: the network is offline, Origin is not allowed by Access-Control-Allow-Origin, the page is being unloaded, etc.
Original Request
GET http://ES_IP:12900/system/sessions
Status code
undefined
Full error message
Error: Request has been terminated
Possible causes: the network is offline, Origin is not allowed by Access-Control-Allow-Origin, the page is being unloaded, etc.
Graylog logs show nothing in particular other than this:
org.graylog.plugins.threatintel.tools.AdapterDisabledException: Spamhaus service is disabled, not starting (E)DROP adapter. To enable it please go to System / Configurations.
at org.graylog.plugins.threatintel.adapters.spamhaus.SpamhausEDROPDataAdapter.doStart(SpamhausEDROPDataAdapter.java:68) ~[?:?]
at org.graylog2.plugin.lookup.LookupDataAdapter.startUp(LookupDataAdapter.java:59) [graylog.jar:?]
at com.google.common.util.concurrent.AbstractIdleService$DelegateService$1.run(AbstractIdleService.java:62) [graylog.jar:?]
at com.google.common.util.concurrent.Callables$4.run(Callables.java:119) [graylog.jar:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
but at the end:
2019-01-16 13:35:00,255 INFO : org.graylog2.bootstrap.ServerBootstrap - Graylog server up and running.
The Elasticsearch health check is green, and there are no issues in the ES or Mongo logs.
I suspect a problem with the connection to Elasticsearch, though.
curl http://ip_address:9200/_cluster/health\?pretty
{
"cluster_name" : "elasticsearch",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 4,
"active_shards" : 4,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
After reading the tutorial you shared, I was able to identify that kubelet needs to run with the argument --allow-privileged.
"Elasticsearch pods need for an init-container to run in privileged mode, so it can set some VM options. For that to happen, the kubelet should be running with args --allow-privileged, otherwise the init-container will fail to run."
On GKE it is not possible to customize or modify the kubelet parameters/arguments; there is a feature request for this at https://issuetracker.google.com/118428580, so it may be implemented in the future.
Also, if you modify the kubelet directly on the node(s), the master may reset the configuration, and the changes are not guaranteed to persist.

Service Fabric Replica Stuck

I am upgrading an application on Service Fabric and one of the replicas is showing the following warning:
Unhealthy event: SourceId='System.RAP', Property='IStatefulServiceReplica.ChangeRole(S)Duration', HealthState='Warning', ConsiderWarningAsError=false.
The api IStatefulServiceReplica.ChangeRole(S) on node _gtmsf1_0 is stuck. Start Time (UTC): 2018-03-21 15:49:54.326.
After some debugging, I suspect I'm not properly honoring a cancellation token. In the meantime, how do I safely force a restart of this stuck replica to get the service working again?
Partial results of Get-ServiceFabricDeployedReplica:
...
ReplicaRole : ActiveSecondary
ReplicaStatus : Ready
ServiceTypeName : MarketServiceType
...
ServicePackageActivationId :
CodePackageName : Code
...
HostProcessId : 6180
ReconfigurationInformation : {
PreviousConfigurationRole : Primary
ReconfigurationPhase : Phase0
ReconfigurationType : SwapPrimary
ReconfigurationStartTimeUtc : 3/21/2018 3:49:54 PM
}
You might be able to pipe that directly to Restart-ServiceFabricReplica. If that remains stuck, then you should be able to use Get-ServiceFabricDeployedCodePackage and Restart-ServiceFabricDeployedCodePackage to restart the surrounding process. Since Restart-ServiceFabricDeployedCodePackage has options for selecting random packages to simulate failure, just be sure to target the specific code package you're interested in restarting.
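A minimal PowerShell sketch of that approach, assuming you are already connected to the cluster with Connect-ServiceFabricCluster; the application name below is a placeholder, and the property/parameter names should be checked against your SDK version:

# Placeholder values based on the output above; adjust for your cluster.
$node = "_gtmsf1_0"
$app  = "fabric:/MarketApp"   # hypothetical application name

# Locate the stuck replica on the node and restart it explicitly.
$replica = Get-ServiceFabricDeployedReplica -NodeName $node -ApplicationName $app |
    Where-Object { $_.ServiceTypeName -eq "MarketServiceType" }
Restart-ServiceFabricReplica -NodeName $node `
    -PartitionId $replica.PartitionId `
    -ReplicaOrInstanceId $replica.ReplicaId

# If the replica stays stuck, list the deployed code packages so you can target the
# hosting process (HostProcessId 6180) with Restart-ServiceFabricDeployedCodePackage.
Get-ServiceFabricDeployedCodePackage -NodeName $node -ApplicationName $app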

How to get a list of available offers and the resources they use in Azure RM REST API?

For the classic Azure model there is a method to get a list of available roles which is described here:
https://msdn.microsoft.com/en-us/library/dn469422.aspx
https://management.core.windows.net/<subscription-id>/rolesizes
The method returns a list of offers and the resources they use, such as memory and number of cores.
In PowerShell it's Get-AzureRoleSize, which outputs a list of elements like this:
InstanceSize : Standard_L8s
RoleSizeLabel : Standard_L8s (8 cores, 65536 MB)
Cores : 8
MemoryInMb : 65536
SupportedByWebWorkerRoles : False
SupportedByVirtualMachines : True
MaxDataDiskCount : 16
WebWorkerResourceDiskSizeInMb : 0
VirtualMachineResourceDiskSizeInMb : 1421312
OperationDescription : Get-AzureRoleSize
OperationId : 6aae4878-e8f4-7e1a-b434-8fb4dc4fd389
OperationStatus : Succeeded
I need that information to know how many resources a new VM is going to take before deploying it, but using the newer ARM REST API.
Is there an equivalent?
Use the following API to view available machine sizes.
https://learn.microsoft.com/en-us/rest/api/compute/virtualmachines/virtualmachines-list-sizes-region
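For example, a quick sketch of the ARM-era equivalents, assuming the AzureRM PowerShell module and an active login; the region and api-version below are illustrative placeholders:

# ARM PowerShell equivalent of Get-AzureRoleSize: lists the VM sizes available in a
# region, including cores, memory in MB and max data disk count.
Get-AzureRmVMSize -Location "westus"

# The underlying REST call (substitute your subscription ID, region and a current api-version):
# GET https://management.azure.com/subscriptions/{subscription-id}/providers/Microsoft.Compute/locations/westus/vmSizes?api-version=2017-12-01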

Is CMS Replication required for ApplicationPool also?

Is CMS Replication required for ApplicationPool also?
When I run the command Get-CsManagementStoreReplicationStatus, I get UpToDate : True for my domain, but it comes back as False for my ApplicationPool.
UpToDate : True
ReplicaFqdn : ****.*****
LastStatusReport : 07-08-2014 11:42:26
LastUpdateCreation : 07-08-2014 11:42:26
ProductVersion : 5.0.8308.0
UpToDate : False
ReplicaFqdn : MyApplicationPool.****.*****
LastStatusReport :
LastUpdateCreation : 08-08-2014 15:16:03
ProductVersion :
UpToDate : False
ReplicaFqdn : ****.*****
LastStatusReport :
LastUpdateCreation : 08-08-2014 15:10:59
Am I on the right track? Have I created my ApplicationPool incorrectly?
Yes, UCMA applications running on an app server generally require access to the CMS, so replication should be enabled.
On the app server, you'd need to:
Ensure the "Lync Server Replica Replicator Agent" service is running
Run Enable-CsReplica in the management shell
Run Enable-CsTopology
Then run Invoke-CsManagementStoreReplication to force a replication (a consolidated sketch follows below)
I've noticed that it often takes a while for the CMS to be replicated to the app server, so you might need to run Get-CsManagementStoreReplicationStatus a few times before you see UpToDate change to True.
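Putting those steps together, a minimal sketch for the Lync Server Management Shell on the app server; the service lookup by display name is illustrative, and the cmdlets are the ones listed above:

# Ensure the "Lync Server Replica Replicator Agent" service is running.
Get-Service -DisplayName "Lync Server Replica Replicator Agent*" | Start-Service

# Enable the replica on this server, publish the topology, then force a replication cycle.
Enable-CsReplica
Enable-CsTopology
Invoke-CsManagementStoreReplication

# Re-check periodically until UpToDate flips to True; this can take several minutes.
Get-CsManagementStoreReplicationStatus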

mongod crashed without logging

I'm using MongoDB v2.2.2 on a single server (Ubuntu 12.04).
It crashed without writing anything to /var/log/mongodb/mongodb.log.
It appears to have crashed while logging: the last log line is cut off mid-character, and it is an ordinary query log entry.
I also checked syslog for memory issues (for example, a killed process) but couldn't find anything.
Then I found the following error in the mongo shell (via the db.printCollectionStats() command):
DLLConnectionResultData
{
"ns" : "UserData.DLLConnectionResultData",
"count" : 8215398,
"size" : 4831306500,
"avgObjSize" : 588.0794211065611,
"errmsg" : "exception: assertion src/mongo/db/database.cpp:300",
"code" : 0,
"ok" : 0
}
How do I figure out what the problem is?
Thank you.
I checked that line in the source code for 2.2.2 (see here for reference). That error is specifically related to enforcing quotas on MongoDB. You haven't mentioned whether you are enforcing quotas or what you have set the files limit to (the default is 8), but you could be running into that limit here.
First, I would recommend getting onto a more recent version of 2.2 (and upgrading to 2.4 eventually, but definitely 2.2.7+ initially). If you are using quotas, this fix, which went into 2.2.5, logs quota-exceeded messages (previously they were logged only at log level 1; the default is log level 0). Hence, if a quota violation is the culprit here, you may get an early warning.
If that is the root cause, then you have a couple of options:
After upgrading to the latest version of 2.2, if the issue happens repeatedly, file a bug report for the crash on 2.2
Upgrade to 2.4, verify that the issue still occurs, and file a bug (or add to the above report for 2.2)
In either case, turning off quotas in the interim would be the obvious way to prevent the crash.