Ceph behavior when pool quota is reached - ceph

According to the information from http://docs.ceph.com/docs/master/release-notes/, from Infernalis release, ceph changed default behavior when pool quota is reached (When a pool quota is reached, librados operations now block indefinitely, the same way they do when the cluster fills up. (Previously they would return -ENOSPC.) By default, a full cluster or pool will now block. If your librados application can handle ENOSPC or EDQUOT errors gracefully, you can get error returns instead by using the new librados OPERATION_FULL_TRY flag.)
Does anyone know, is there any way to change this behavior back to return ENOSPC when pool quota is reached?
Thank you!

Have a look at this: https://github.com/ceph/ceph/pull/13615
And if you browser the source code,
in PrimaryLogPG.cc prepare_transaction() function,
you can ignore the check of CEPH_OSD_FLAG_FULL_TRY flag,
and return ENOSPC once the pool FULL flag is set.

Related

Anylogic - recording resource pool activity

I have a job shop scheduling model in anylogic and I want to do a gantt chart for the resource pools (machines). In resource pools actions section exists one called On unit state change and in the help of the resource pool it says the this field has a busy variable associated to it.
Here is the help of resource pool.
I though I could do a while loop in this field so that when the busy variable is true I add the value 1 to my data set and when busy is false I add the value 0. But the problem is when I run my model using that while loop I don't get any error but my model doens't run anymore.
Here is the while loop.
If anyone know what to do please help. Thank you in advance.
Yeah, while loops can easily kill models if the condition stays true.
You need to understand that the code is executed whenever any resource in the pool changes its state.
So you should not use a while loop but a simple if-statement:
if (unit.equals(theUnitMyGanttChartCaresAbout) {
if (busy) {
// tell Gantt chart that "unit" started being busy
} else {
// tell Gantt chart that "unit" started being idle
}
}

How to find the number of active threads in a hystrix thread pool?

I am trying to finetune hystrix threadpool core size and max size. For that I need to know and plot the number of active threads at anytime in the pool. Is there a way to do so?
Is this the right way?
HystrixThreadPoolKey hystrixThreadPoolKey = new HystrixThreadPoolKey() {
#Override
public String name() {
return threadPoolKey;
}
};
HystrixThreadPoolMetrics hystrixThreadPoolMetrics = HystrixThreadPoolMetrics.getInstance(hystrixThreadPoolKey);
log.info("Hystrix active threads: {}", hystrixThreadPoolMetrics.getCurrentActiveCount().toString());
I am not sure because when I use this I get active thread count as 0, when the corePoolSize setting is 10.
This code works fine (after putting a null check for the time till no request is made). But the right way should be to use Netflix's servo.
Netflix Announcement
How to Use
Quoting some part from the announcement:
Servo is designed to make it easy for developers to export metrics from their application code, register them with JMX, and publish them to external monitoring systems
(Also active thread pool size can be less than the core size)

Quickfix reset sequence number at start time but not set ResetSeqNum in Logon message

When the quickfix initiator reconnects at startTime (defined in config) it deletes the files with sequence number, but does not set ResetSeqNumFlag to Y, and the server replies with a Logout message with text "seq msg number to low ..."
Is there a way to set ResetSeqNumFlag = Y only for this behavior? I don`t want to reset the sequence on every log-on.
This appears to be a QuickFIX/J quirk (some might consider it a bug). If ResetOnLogon=N then no ResetSeqNumFlag=Y is sent when the session start time triggers a logon. If ResetOnLogon=Y, the ResetSeqNumFlag=Y is sent on every logon. I believe this is not a big problem in practice because participants in a FIX session typically reset their sequence numbers locally after a session ends (logically ends at the end time, not a connection disconnect).
If you want to slightly modify the source code to implement this behavior, you'd modify the quickfix.Session next() method. You could add a local flag that indicates a session has restarted (per the schedule as determined by checkSessionTime()). Pass that flag to generateLogon() and that method would use it to determine when to send ResetSeqNumFlag=Y regardless of the ResetOnLogon configuration.
I don`t want to reset the sequence on every log-on.
Then don't do it! Set ResetOnLogon=N.
At StartTime, the session will reset sequence numbers always. If ResetOnLogon=N, then they won't reset again until the next StartTime.
The initiator and acceptor should always have matching ResetOnXXX settings.
What you are asking cannot, should not be done. You start you engine with some config and then you change the config while running. If something goes wrong it will be very difficult to pinpoint what started the issue.
Instead of doing ResetSeqNumFlag = Y try adding ResetOnLogon=Y in your config for the acceptor side(that is if you have control over it) or ResetOnLogout=Y / ResetOnDisconnect=Y in your initiator config file. That would be much easier and changing config while running, is possibly not the best solution.
Your logout(disconnect can happen anytime) will happen anyways at EndTime anyways and should be easier for your application.

How does Solaris SMF determine if something is to be in maintenance or to be restarted?

I have a daemon process that I wrote being executed by SMF. The problem is when an error occurs, I have fail code and then it will need to restart from scratch. Right now it is sending sys.exit(0) (Python), but SMF keeps throwing it in maintenance mode.
I've worked with SMF enough to know that it sometimes auto-restarts certain services (and lets others fail and have you deal with them like this). How do I classify this process as one that needs to auto-restart? Is it an SMF setting, a method of failing, what?
Manpage
Solaris uses a combination of startd/critical_failure_count and startd/critical_failure_period as described in the svc.startd manpage:
startd/critical_failure_count
startd/critical_failure_period
The critical_failure_count and critical_failure_period properties together specify the maximum number of service failures allowed in a given time interval before svc.startd transitions the service to maintenance. If the number of failures exceeds critical_failure_count in any period of critical_failure_period seconds, svc.startd will transition the service to maintenance.
Defaults in the source code
The defaults can be found in the source, the value depends on whether the service is "wait style":
if (instance_is_wait_style(inst))
critical_failure_period = RINST_WT_SVC_FAILURE_RATE_NS;
else
critical_failure_period = RINST_FAILURE_RATE_NS;
The defaults are either 5 failures/10 minutes or 5 failures/second:
#define RINST_START_TIMES 5 /* failures to consider */
#define RINST_FAILURE_RATE_NS 600000000000LL /* 1 failure/10 minutes */
#define RINST_WT_SVC_FAILURE_RATE_NS NANOSEC /* 1 failure/second */
These variables can be set in the SMF as properties:
<service_bundle type="manifest" name="npm2es">
<service name="site/npm2es" type="service" version="1">
...
<property_group name="startd" type="framework">
<propval name='critical_failure_count' type='integer' value='10'/>
<propval name='critical_failure_period' type='integer' value='30'/>
<propval name="ignore_error" type="astring" value="core,signal" />
</property_group>
...
</service>
</service_bundle>
TL;DR
After checking against the startd values, If the service is "wait style", it will be throttled to a max restart of 1/sec, until it no longer exits with a non-cfg error. If the service is not "wait style" it will be put into maintenance mode.
Presuming a normal service manifest, I would suspect that you're dropping into maintenance because SMF is restarting you "too quickly" (which is a bit arbitrarily defined). svcs -xv should tell you if that is the case. If it is, SMF is restarting you, and then you're exiting again rapidly and it's decided to give up until the problem is fixed (and you've manually svcadm clear'd it.
I'd wondered if exiting 0 (and indicating success) may cause further confusion, but it doesn't appear that it will.
I don't think Oracle Solaris allows you to tune what SMF considers "too quickly".
You have to create a service manifest. This is more complicated than not. This has example manifests and documents the manifest structure.
http://www.oracle.com/technetwork/server-storage/solaris/solaris-smf-manifest-wp-167902.pdf
As it turns out, I had two pkills in a row to make sure everything was terminated correctly. The second one, naturally, was exiting something other than 0. Changing this to include an exit 0 at the end of the script solved the problem.

Invoke process activity not logging any error in log file

I am trying to use Invoke process to invoke an executable from my windows workflow in my TFS 2010 build.
But when I am looking at the log file it is not logging any error.
I did use WriteBuildMessage and WriteBuildwarning inside my invoke process activity.
I also set the filename,workingdirectory etc in activity.
Can someone please point out why it is not logging?
You can do something like this:
In this case you have to ensure that Message are set as follows:
With those parameters set as depicted, I catch what you seem to be after.
Furthermore, you can check in the Properties of your InvokeProcess: Set the Result into a string-variable and then set in a subsequent WriteBuildMessage this string-variable to be the Message. This way, you 'll additionally catch the return of your invoked process.
EDIT
Another common thing that you 've possibly overlooked is the BuildMessageImportance: if it is not set as High, messages do NOT appear under default Logging Verbosity (= Normal). See here for some background.
In your Invoke Process, you want to set the Result property to update a variable (returns an Int, so lets call it ExitCode), under your Invoke Process (but still in the Agent Scope) you can drop in an If, so you can set the condition of this to ExitCode <> 0 and do whatever you like on a failure (such as failing the build).
Also, as a note, if your WriteBuildMessage is not showing anything in your log, you need to set the Importance to Microsoft.TeamFoundation.Build.Client.BuildMessageImportance.High, anything lower and it wont show in the Normal logging level.