Quickfix reset sequence number at start time but not set ResetSeqNum in Logon message - quickfix

When the quickfix initiator reconnects at startTime (defined in config) it deletes the files with sequence number, but does not set ResetSeqNumFlag to Y, and the server replies with a Logout message with text "seq msg number to low ..."
Is there a way to set ResetSeqNumFlag = Y only for this behavior? I don`t want to reset the sequence on every log-on.

This appears to be a QuickFIX/J quirk (some might consider it a bug). If ResetOnLogon=N then no ResetSeqNumFlag=Y is sent when the session start time triggers a logon. If ResetOnLogon=Y, the ResetSeqNumFlag=Y is sent on every logon. I believe this is not a big problem in practice because participants in a FIX session typically reset their sequence numbers locally after a session ends (logically ends at the end time, not a connection disconnect).
If you want to slightly modify the source code to implement this behavior, you'd modify the quickfix.Session next() method. You could add a local flag that indicates a session has restarted (per the schedule as determined by checkSessionTime()). Pass that flag to generateLogon() and that method would use it to determine when to send ResetSeqNumFlag=Y regardless of the ResetOnLogon configuration.

I don`t want to reset the sequence on every log-on.
Then don't do it! Set ResetOnLogon=N.
At StartTime, the session will reset sequence numbers always. If ResetOnLogon=N, then they won't reset again until the next StartTime.
The initiator and acceptor should always have matching ResetOnXXX settings.

What you are asking cannot, should not be done. You start you engine with some config and then you change the config while running. If something goes wrong it will be very difficult to pinpoint what started the issue.
Instead of doing ResetSeqNumFlag = Y try adding ResetOnLogon=Y in your config for the acceptor side(that is if you have control over it) or ResetOnLogout=Y / ResetOnDisconnect=Y in your initiator config file. That would be much easier and changing config while running, is possibly not the best solution.
Your logout(disconnect can happen anytime) will happen anyways at EndTime anyways and should be easier for your application.

Related

Trouble with agent state chart

I'm trying to create an agent statechart where a transition should happen every day at 4 pm (except weekends).
I have already tried:
1. a conditional transition (condition: getHourOfDay() == 16)
2: A timeout transition that will "reinsert" my agent into the chart every 1 s and check if time = 16.
My code is still not running, does anyone have any idea how to solve it?
This is my statechart view. Customer is a single resource that is supposed to "get" the products out of my stock everyday at 4pm. It is supposed to happen in the "Active" state.
I have set a timeout transition (from Active-Active) that runs every 1s.
Inside my "Active" state in the "entrance action" i wrote my code to check if it is 4 pm and run my action if so.
I thought since i set a timeout transition it would check my condition every 1s, but apparently it is not working.
Your agent does not enter the Active state via an internal transition.
Redraw the transition to actually go out of the Active state and then enter it again as below:
Don't use condition-based transitions, for performance reasons. In your case, it also never triggers because it is only evaluated when something happens in the model. Incidentally, that is not the case at 4pm.
re your timeout approach: Why would you "reinsert" your agent into its own statechart? Not sure I understand...
Why not set up a schedule or event with your recurrence requirement and make it send a message to the statechart: stateChart.fireEvent("trigger!");. In your statechart, add a message-based transition that waits for this message. This will work.
Be careful to understand the difference between the Statechart.fireEvent() and the Statechart.receiveMessage() functions, though.
PS: and agree with Felipe: please start using SOF the way it is meant, i.e. also mark replies as solved. It helps us but also future users to quickly find solutions :-) cheers

Moving from file-based tracing session to real time session

I need to log trace events during boot so I configure an AutoLogger with all the required providers. But when my service/process starts I want to switch to real-time mode so that the file doesn't explode.
I'm using TraceEvent and I can't figure out how to do this move correctly and atomically.
The first thing I tried:
const int timeToWait = 5000;
using (var tes = new TraceEventSession("TEMPSESSIONNAME", #"c:\temp\TEMPSESSIONNAME.etl") { StopOnDispose = false })
{
tes.EnableProvider(ProviderExtensions.ProviderName<MicrosoftWindowsKernelProcess>());
Thread.Sleep(timeToWait);
}
using (var tes = new TraceEventSession("TEMPSESSIONNAME", TraceEventSessionOptions.Attach))
{
Thread.Sleep(timeToWait);
tes.SetFileName(null);
Thread.Sleep(timeToWait);
Console.WriteLine("Done");
}
Here I wanted to make that I can transfer the session to real-time mode. But instead, the file I got contained events from a 15s period instead of just 10s.
The same happens if I use new TraceEventSession("TEMPSESSIONNAME", #"c:\temp\TEMPSESSIONNAME.etl", TraceEventSessionOptions.Create) instead.
It seems that the following will cause the file to stop being written to:
using (var tes = new TraceEventSession("TEMPSESSIONNAME"))
{
tes.EnableProvider(ProviderExtensions.ProviderName<MicrosoftWindowsKernelProcess>());
Thread.Sleep(timeToWait);
}
But here I must reenable all the providers and according to the documentation "if the session already existed it is closed and reopened (thus orphans are cleaned up on next use)". I don't understand the last part about orphans. Obviously some events might occur in the time between closing, opening and subscribing on the events. Does this mean I will lose these events or will I get the later?
I also found the following in the documentation of the library:
In real time mode, events are buffered and there is at least a second or so delay (typically 3 sec) between the firing of the event and the reception by the session (to allow events to be delivered in efficient clumps of many events)
Does this make the above code alright (well, unless the improbable happens and for some reason my thread is delayed for more than a second between creating the real-time session and starting processing the events)?
I could close the session and create a new different one but then I think I'd miss some events. Or I could open a new session and then close the file-based one but then I might get duplicate events.
I couldn't find online any examples of moving from a file-based trace to a real-time trace.
I managed to contact the author of TraceEvent and this is the answer I got:
Re the exception of the 'auto-closing and restarting' feature, it is really questions about the OS (TraceEvent simply calls the underlying OS API). Just FYI, the deal about orphans is that it is EASY for your process to exit but leave a session going. This MAY be what you want, but often it is not, and so to make the common case 'just work' if you do Create (which is the default), it will close a session if it already existed (since you asked for a new one).
Experimentation of course is the touchstone of 'truth' but I would frankly expecting unusual combinations to just work is generally NOT true.
My recommendation is to keep it simple. You need to open a new session and close the original one. Yes, you will end up with duplicates, but you CAN filter them out (after all they are IDENTICAL timestamps).
The other possibility is use SetFileName in its intended way (from one file to another). This certainly solves your problem of file size growth, and often is a good way to deal with other scenarios (after all you can start up you processing and start deleting files even as new files are being generated).

Request time-out in Quickfix ??

is there anyway to set request time-out while sending message from initiator ??
we had a issue where we got late reply from acceptor and application went in not responsive mode. issue can be with network delay or etc. but I think it will be good if we can set time-out option here.
Seeing with Application call back didn't find anything .
I want to set time-out option with SendToTarget API,,
any suggestion
Did you add CheckLatency and MaxLatency in your config file and confirmed ?
CheckLatency If set to Y, messages must be received from the counterparty within a defined number of seconds (see MaxLatency). It is useful to turn this off if a system uses localtime for it's timestamps instead of GMT.
MaxLatency If CheckLatency is set to Y, this defines the number of seconds latency allowed for a message to be processed. Default is 120. positive integer
I'm experiencing the same problem using QuickFix /n
Looking at the source code for version 1.4 the section that reads those settings from the configuration file is commented out and replaced with hard coded default values.
// FIXME to get from config if available
session.MaxLatency = 120;
session.CheckLatency = true;

How does Solaris SMF determine if something is to be in maintenance or to be restarted?

I have a daemon process that I wrote being executed by SMF. The problem is when an error occurs, I have fail code and then it will need to restart from scratch. Right now it is sending sys.exit(0) (Python), but SMF keeps throwing it in maintenance mode.
I've worked with SMF enough to know that it sometimes auto-restarts certain services (and lets others fail and have you deal with them like this). How do I classify this process as one that needs to auto-restart? Is it an SMF setting, a method of failing, what?
Manpage
Solaris uses a combination of startd/critical_failure_count and startd/critical_failure_period as described in the svc.startd manpage:
startd/critical_failure_count
startd/critical_failure_period
The critical_failure_count and critical_failure_period properties together specify the maximum number of service failures allowed in a given time interval before svc.startd transitions the service to maintenance. If the number of failures exceeds critical_failure_count in any period of critical_failure_period seconds, svc.startd will transition the service to maintenance.
Defaults in the source code
The defaults can be found in the source, the value depends on whether the service is "wait style":
if (instance_is_wait_style(inst))
critical_failure_period = RINST_WT_SVC_FAILURE_RATE_NS;
else
critical_failure_period = RINST_FAILURE_RATE_NS;
The defaults are either 5 failures/10 minutes or 5 failures/second:
#define RINST_START_TIMES 5 /* failures to consider */
#define RINST_FAILURE_RATE_NS 600000000000LL /* 1 failure/10 minutes */
#define RINST_WT_SVC_FAILURE_RATE_NS NANOSEC /* 1 failure/second */
These variables can be set in the SMF as properties:
<service_bundle type="manifest" name="npm2es">
<service name="site/npm2es" type="service" version="1">
...
<property_group name="startd" type="framework">
<propval name='critical_failure_count' type='integer' value='10'/>
<propval name='critical_failure_period' type='integer' value='30'/>
<propval name="ignore_error" type="astring" value="core,signal" />
</property_group>
...
</service>
</service_bundle>
TL;DR
After checking against the startd values, If the service is "wait style", it will be throttled to a max restart of 1/sec, until it no longer exits with a non-cfg error. If the service is not "wait style" it will be put into maintenance mode.
Presuming a normal service manifest, I would suspect that you're dropping into maintenance because SMF is restarting you "too quickly" (which is a bit arbitrarily defined). svcs -xv should tell you if that is the case. If it is, SMF is restarting you, and then you're exiting again rapidly and it's decided to give up until the problem is fixed (and you've manually svcadm clear'd it.
I'd wondered if exiting 0 (and indicating success) may cause further confusion, but it doesn't appear that it will.
I don't think Oracle Solaris allows you to tune what SMF considers "too quickly".
You have to create a service manifest. This is more complicated than not. This has example manifests and documents the manifest structure.
http://www.oracle.com/technetwork/server-storage/solaris/solaris-smf-manifest-wp-167902.pdf
As it turns out, I had two pkills in a row to make sure everything was terminated correctly. The second one, naturally, was exiting something other than 0. Changing this to include an exit 0 at the end of the script solved the problem.

Quickfix Syncing Sequence Numbers

If, for whatever reason, my application loses track of what sequence number I was on, what is the recommended way of re-establishing the session to continue trading?
From FIX 4.1 onwards, the Logon message (A) contains a tag ResetSeqNumFlag (141). Setting it to 'Y' "indicates that both sides of the FIX session should reset sequence numbers."
If the acceptor receives a message out of sequence, it would send a reject message and the reason for it in the text field. In the text field it would mention the sequence number expected, so parse that part of the message, take the sequence number and start again.