Moving from a file-based tracing session to a real-time session

I need to log trace events during boot, so I configure an AutoLogger with all the required providers. But when my service/process starts, I want to switch the session to real-time mode so that the file doesn't grow indefinitely.
I'm using TraceEvent and I can't figure out how to make this switch correctly and atomically.
The first thing I tried:
const int timeToWait = 5000;
using (var tes = new TraceEventSession("TEMPSESSIONNAME", @"c:\temp\TEMPSESSIONNAME.etl") { StopOnDispose = false })
{
    tes.EnableProvider(ProviderExtensions.ProviderName<MicrosoftWindowsKernelProcess>());
    Thread.Sleep(timeToWait);
}

using (var tes = new TraceEventSession("TEMPSESSIONNAME", TraceEventSessionOptions.Attach))
{
    Thread.Sleep(timeToWait);
    tes.SetFileName(null);
    Thread.Sleep(timeToWait);
    Console.WriteLine("Done");
}
Here I wanted to make sure that I could transfer the session to real-time mode. But instead, the file I got contained events from the whole 15 s period instead of just the first 10 s.
The same happens if I use new TraceEventSession("TEMPSESSIONNAME", @"c:\temp\TEMPSESSIONNAME.etl", TraceEventSessionOptions.Create) instead.
It seems that the following will cause the file to stop being written to:
using (var tes = new TraceEventSession("TEMPSESSIONNAME"))
{
    tes.EnableProvider(ProviderExtensions.ProviderName<MicrosoftWindowsKernelProcess>());
    Thread.Sleep(timeToWait);
}
But here I must re-enable all the providers, and according to the documentation, "if the session already existed it is closed and reopened (thus orphans are cleaned up on next use)". I don't understand the last part about orphans. Obviously some events might occur in the time between closing the old session, opening the new one and subscribing to the events. Does this mean I will lose these events, or will I get them later?
I also found the following in the documentation of the library:
In real time mode, events are buffered and there is at least a second or so delay (typically 3 sec) between the firing of the event and the reception by the session (to allow events to be delivered in efficient clumps of many events)
Does this make the above code all right (well, unless the improbable happens and for some reason my thread is delayed for more than a second between creating the real-time session and starting to process the events)?
I could close the session and create a new, different one, but then I think I'd miss some events. Or I could open a new session and then close the file-based one, but then I might get duplicate events.
I couldn't find any examples online of moving from a file-based trace to a real-time trace.

I managed to contact the author of TraceEvent and this is the answer I got:
Re the 'auto-closing and restarting' feature, these are really questions about the OS (TraceEvent simply calls the underlying OS API). Just FYI, the deal about orphans is that it is EASY for your process to exit but leave a session going. This MAY be what you want, but often it is not, and so to make the common case 'just work', if you do Create (which is the default), it will close a session if it already existed (since you asked for a new one).
Experimentation of course is the touchstone of 'truth', but frankly, expecting unusual combinations to just work is generally NOT realistic.
My recommendation is to keep it simple. You need to open a new session and close the original one. Yes, you will end up with duplicates, but you CAN filter them out (after all, they have IDENTICAL timestamps).
The other possibility is to use SetFileName in its intended way (from one file to another). This certainly solves your problem of file-size growth, and is often a good way to deal with other scenarios (after all, you can start up your processing and start deleting files even as new files are being generated).
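A minimal sketch of that first recommendation (open a real-time session alongside the file-based one, stop the file-based one, then filter the overlap by timestamp). The session names, the lambda and the way lastFileTimestamp is obtained are illustrative, and it assumes a TraceEvent version where TraceEventSession.Source exposes the real-time event stream:
// 1. Start a real-time session next to the file-based one and enable the same provider(s).
var realTime = new TraceEventSession("TEMPSESSIONNAME-RT");
realTime.EnableProvider(ProviderExtensions.ProviderName<MicrosoftWindowsKernelProcess>());

// 2. Stop the file-based session; from this point on the .etl file no longer grows.
using (var fileBased = new TraceEventSession("TEMPSESSIONNAME", TraceEventSessionOptions.Attach))
{
    fileBased.Stop();
}

// 3. Process the real-time stream, skipping events already captured in the file.
//    lastFileTimestamp would come from reading the .etl file (e.g. with ETWTraceEventSource).
DateTime lastFileTimestamp = DateTime.MinValue; // placeholder
realTime.Source.Dynamic.All += data =>
{
    if (data.TimeStamp <= lastFileTimestamp)
        return; // duplicate of an event the file-based session already recorded
    Console.WriteLine($"{data.TimeStamp:O} {data.EventName}");
};
realTime.Source.Process(); // blocks; call realTime.Stop() from another thread to end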

How to run a bpy callback on workspace tools change

How can I add a pre-draw hook that fires when the current context's workspace.tools change?
I attempted to get there using bpy.types.SpaceView3D.draw_handler_add(...), which runs on every draw, checks whether workspace.tools changed and, if so, runs my callback. But my callback wants to add its own SpaceView3D.draw_handler_add, and doing it this way adds it a frame too late, leaving the viewport undrawn until a user event repaints the screen.
I found this post online:
https://devtalk.blender.org/t/update-property-when-active-tool-changes/11467/12
Summary: maybe there is a new mainline callback (https://developer.blender.org/D10635); in the meantime, here is the relevant exchange from the thread:
AFWS (Jan '20), replying to kaio:
This seems like a better solution. It's kind of mystery code, since I couldn't figure out where you got that info, but I then started looking at the space_toolsystem_common.py file.
kaio (Jan '20), replying to AFWS:
Just realized there might be a cleaner way of getting callbacks for active tools, using the msgbus. Since workspace tools aren't RNA properties themselves, I figured it's possible to monitor the bpy_prop_collection which changes with the tool.
The handle is the workspace itself, so you shouldn't have to worry about keeping a reference. The subscription lasts until a new file is loaded, so add a load_post callback which reapplies it.
Note this doesn't proactively subscribe to workspaces added afterwards. You might need a separate callback for that.
import bpy

def rna_callback(workspace):
    idname = workspace.tools[-1].idname
    print(idname)

def subscribe(workspace):
    bpy.msgbus.subscribe_rna(
        key=workspace.path_resolve("tools", False),
        owner=workspace,
        args=(workspace,),
        notify=rna_callback)

if __name__ == "__main__":
    subscribe(bpy.context.workspace)

    # Subscribe to all workspaces:
    if 0:
        for ws in bpy.data.workspaces:
            subscribe(ws)

    # Clear all workspace subscriptions:
    if 0:
        for ws in bpy.data.workspaces:
            bpy.msgbus.clear_by_owner(ws)
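The quoted reply mentions re-applying the subscription after a file load. A minimal sketch of such a load_post handler (not part of the original thread; it reuses subscribe() from the snippet above):
import bpy
from bpy.app.handlers import persistent

@persistent
def resubscribe_on_load(*_args):
    # msgbus subscriptions do not survive loading a new file, so re-apply them.
    for ws in bpy.data.workspaces:
        subscribe(ws)

bpy.app.handlers.load_post.append(resubscribe_on_load)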

Late data handling | Apache Beam

Late data which has missed the window and the .withAllowedLateness() period is dropped from the pipeline, as documented here.
I have a few questions on this behavior:
How do we handle late data that is dropped from the pipeline? Can we add a default behavior, say, log all late data somewhere like a catch-all bucket?
Can we have a metric (Google Dataflow Metrics/Beam) that tells us how many of these messages were dropped from the pipeline due to excessive lateness?
In general, we define late data as elements that, by the time they arrive, we prefer to simply drop and not process any further. As far as I know, adding extra functionality to handle those messages would require substantial effort to modify the Java SDK. However, if you just want to log them, this is done by the LateDataDroppingDoFnRunner code, which is responsible for dropping data from expired windows:
for (WindowedValue<InputT> input : concatElements) {
  BoundedWindow window = Iterables.getOnlyElement(input.getWindows());
  if (canDropDueToExpiredWindow(window)) {
    // The element is too late for this window.
    droppedDueToLateness.inc();
    WindowTracing.debug(
        "{}: Dropping element at {} for key:{}; window:{} "
            + "since too far behind inputWatermark:{}; outputWatermark:{}",
        LateDataFilter.class.getSimpleName(),
        input.getTimestamp(),
        key,
        window,
        timerInternals.currentInputWatermarkTime(),
        timerInternals.currentOutputWatermarkTime());
  }
}
Note that the log has DEBUG level so you might not see it. As explained here, to override the level in Dataflow, you can use --defaultWorkerLogLevel=DEBUG or, even better, specify a particular class such as --workerLogLevelOverrides={"org.apache.beam.sdk.util.WindowTracing":"DEBUG"}. Choosing your keys wisely can help expose information to identify the dropped message (i.e. data lineage).
As can be seen in the previous snippet, droppedDueToLateness is a Counter metric that is incremented each time we drop an element: droppedDueToLateness.inc();. You can monitor it using Stackdriver with resource type dataflow_job and metric custom.googleapis.com/dataflow/droppedDueToLateness.
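For context, a minimal sketch of the windowing configuration that determines when an element counts as droppable (the input PCollection, window size and lateness values here are illustrative, not taken from the question):
import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

// Elements arriving after windowEnd + allowedLateness are the ones that
// increment droppedDueToLateness and show up in the WindowTracing debug log.
PCollection<KV<String, Long>> windowed = input.apply(
    Window.<KV<String, Long>>into(FixedWindows.of(Duration.standardMinutes(10)))
        .withAllowedLateness(Duration.standardMinutes(30))
        .triggering(AfterWatermark.pastEndOfWindow())
        .discardingFiredPanes());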

(Laravel 5) Monitor and optionally cancel an ALREADY RUNNING job on queue

I need to be able to monitor, and optionally cancel, an ALREADY RUNNING job on the queue.
There are a lot of answers about deleting QUEUED jobs, but none about an already running one.
This is the situation: I have a "job" which consists of HUNDREDS OF THOUSANDS of rows in a database that need to be queried ONE BY ONE against a web service.
Every row needs to be picked up, queried against the web service, the response stored and its status updated.
I already had this working as a Command (launching from / outputting to the console), but now I need to implement queues in order to allow piling up more jobs from more users.
So far I've looked at Horizon (which doesn't run on Windows due to missing process control libs). However, judging from the demos I've seen around, it lacks (I believe) a couple of things I need:
Dynamically configurable timeout (the whole job may take more than 12 hours, depending on the number of rows to process on the selected job)
Ability to CANCEL an ALREADY RUNNING job.
I also considered generating EACH REQUEST as a new job instead of treating a "job" as the whole collection of rows (this would overcome the timeout issue), but that would give me a Horizon "pending jobs" list of hundreds of thousands of records per job, and that would kill the browser (I know Redis can handle it without breaking a sweat). Furthermore, I guess it's not possible to cancel "all jobs belonging to tag X".
I've also been thinking about hitting an API route, firing the job and decoupling it from the app, but it seems that this requires forking processes.
For the ability to cancel, I would keep a database table with the job_id, and when the user hits an API endpoint to cancel a job, I'd mark it as "halted". On every loop the job would check its status, and if it finds "halted", kill itself.
If I've missed any aspect, just holler and I'll add it or clarify it.
So I'm asking for advice here, since I'm new to Laravel: how could I achieve this?
So I finally came up with this (a bit clunky) solution:
In Controller:
public function cancelJob()
{
    $jobs = DB::table('jobs')->get();
    # I could use a specific ID, a user/owner filter, etc.

    foreach ($jobs as $job) {
        DB::table('jobs')->delete($job->id);
    }

    # This is a file that... well, it's self explaining
    touch(base_path(config('files.halt_process_signal')));

    return "Job cancelled - It will stop soon";
}
In the job class (inside the Model::chunk() callback):
# CHECK FOR HALT SIGNAL AND [OPTIONALLY] STOP THE PROCESS
if ($this->service->shouldHaltProcess()) {
    # build stats, do some cleanup, log, etc...
    $this->halted = true;
    $this->service->stopProcess();

    # Returning FALSE is what makes the chunk() method stop looping
    return false;
}
In the service class:
/**
 * Checks the existence of the 'Halt Process Signal' file
 *
 * @return bool
 */
public function shouldHaltProcess(): bool
{
    return file_exists($this->config['files.halt_process_signal']);
}

/**
 * Stop the batch process
 *
 * @return void
 */
public function stopProcess(): void
{
    logger()->info("=== HALT PROCESS SIGNAL FOUND - STOPPING THE PROCESS ===");
    $this->deleteHaltProcessSignalFile();
    return;
}
It doesn't look very elegant, but it works.
I've surfed the whole web, and most answers go for Horizon or other tools that don't fit my case.
If anyone has a better way to achieve this, you're welcome to share it.
Laravel queues have 3 important config values:
1. retry_after
2. timeout
3. tries
See more: https://laravel.com/docs/5.8/queues
Dynamically configurable timeout (the whole job may take more than 12
hours, depending on the number of rows to process on the selected job)
I think you can configure timeout + retry_after to be about 24h.
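As an illustration only (the class name and the exact values are made up, not from the question), a Laravel 5.8 job class could declare a long per-job timeout and a single attempt, with retry_after in config/queue.php set even higher:
<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

class ProcessRowsJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    // Per-job timeout in seconds (~24h). Keep it LOWER than the connection's
    // retry_after in config/queue.php, or the worker will re-dispatch the job
    // while it is still running.
    public $timeout = 86400;

    // Run the job only once; no automatic retries.
    public $tries = 1;

    public function handle()
    {
        // ... chunked processing with the halt-signal check shown above ...
    }
}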
Ability to CANCEL an ALREADY RUNNING job.
Delete the job row from the jobs table
Kill the worker process by its process id on your server
Hope it helps you :)

javax.jcr.InvalidItemStateException: Item cannot be saved

I am getting the following exception in a single-box CQ5 author environment.
javax.jcr.InvalidItemStateException: Item cannot be saved
because node property has been modified externally
more exception details:
Caused by: javax.jcr.InvalidItemStateException: Unable to update a stale item: item.save()
at org.apache.jackrabbit.core.ItemSaveOperation.perform(ItemSaveOperation.java:262)
at org.apache.jackrabbit.core.session.SessionState.perform(SessionState.java:216)
at org.apache.jackrabbit.core.ItemImpl.perform(ItemImpl.java:91)
at org.apache.jackrabbit.core.ItemImpl.save(ItemImpl.java:329)
at org.apache.jackrabbit.core.session.SessionSaveOperation.perform(SessionSaveOperation.java:65)
at org.apache.jackrabbit.core.session.SessionState.perform(SessionState.java:216)
at org.apache.jackrabbit.core.SessionImpl.perform(SessionImpl.java:361)
at org.apache.jackrabbit.core.SessionImpl.save(SessionImpl.java:812)
at com.day.crx.core.CRXSessionImpl.save(CRXSessionImpl.java:142)
at org.apache.sling.jcr.resource.internal.helper.jcr.JcrResourceProvider.commit(JcrResourceProvider.java:511)
... 215 more
Caused by: org.apache.jackrabbit.core.state.StaleItemStateException: 3bec1cb7-9276-4bed-a24e-0f41bb3cf5b7/{}ssn has been modified externally
at org.apache.jackrabbit.core.state.SharedItemStateManager$Update.begin(SharedItemStateManager.java:679)
at org.apache.jackrabbit.core.state.SharedItemStateManager.beginUpdate(SharedItemStateManager.java:1507)
at org.apache.jackrabbit.core.state.SharedItemStateManager.update(SharedItemStateManager.java:1537)
at org.apache.jackrabbit.core.state.LocalItemStateManager.update(LocalItemStateManager.java:400)
at org.apache.jackrabbit.core.state.XAItemStateManager.update(XAItemStateManager.java:354)
at org.apache.jackrabbit.core.state.LocalItemStateManager.update(LocalItemStateManager.java:375)
at org.apache.jackrabbit.core.state.SessionItemStateManager.update(SessionItemStateManager.java:275)
at org.apache.jackrabbit.core.ItemSaveOperation.perform(ItemSaveOperation.java:258)
Here is the code sample:
adminResourceResolver = resourceResolverFactory.getAdministrativeResourceResolver(null);
Resource fundPageResource = adminResourceResolver.getResource(page.getPath() + "/jcr:content");
ModifiableValueMap homePageResourceProperties = fundPageResource.adaptTo(ModifiableValueMap.class);
homePageResourceProperties.put("ssn", person.getSsn());
adminResourceResolver.commit();
Any ideas? It could well be that multiple threads access this code, as multiple authors on multiple pages call it from an authored component.
Thank you,
Sri
This is an error you see often in CQ 5.5 (and it lessens with each version upwards). The root cause is that multiple processes/services are modifying the same resource in roughly the same timespan (usually using different sessions, sometimes even different users).
A small example to demonstrate: sessions A and B both have a reference to resource X. Session A modifies some properties on X, saves and commits, and is destroyed. This all goes smoothly. Session B still has a snapshot of the situation before A made its modifications; session B makes its own modifications and all seems well UNTIL it tries to save. At this point, session B detects that it can't commit its changes because it doesn't have the latest node state: another session has made changes to the same node. In essence, the current node state conflicts with the modifications session A has made, and a stale-item exception is thrown. The reason for this exception is that the API doesn't know whether you want to keep the changes made by A, keep the changes made by the current session and discard those made by A, or merge them.
This error happens often with long-running sessions and with workflow/listener combinations. Therefore the recommendation is to keep sessions as short-lived as possible to prevent this kind of conflict.
One way to deal with this is to call session.refresh(keepChangesBoolean) before calling .save(). This instructs the current session to check for updates made by other sessions and deal with them according to the boolean flag you pass. This however is not a guarantee, as it's still possible that between your refresh and your save call yet another session makes a change. It only lowers the odds of the exception occurring.
Another way to deal with it is to retry the whole operation from scratch.
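A minimal sketch combining both suggestions, applied to the code from the question (it reuses resourceResolverFactory, page and person from above and is a mitigation, not a guaranteed fix):
import javax.jcr.RepositoryException;
import javax.jcr.Session;
import org.apache.sling.api.resource.ModifiableValueMap;
import org.apache.sling.api.resource.PersistenceException;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;

// (LoginException handling omitted, as in the question)
ResourceResolver resolver = resourceResolverFactory.getAdministrativeResourceResolver(null);
try {
    Resource content = resolver.getResource(page.getPath() + "/jcr:content");
    ModifiableValueMap props = content.adaptTo(ModifiableValueMap.class);
    props.put("ssn", person.getSsn());

    // Pull in changes made by other sessions while keeping our own pending ones;
    // this narrows (but does not close) the stale-item window.
    Session session = resolver.adaptTo(Session.class);
    session.refresh(true);

    resolver.commit();
} catch (PersistenceException | RepositoryException e) {
    // Still conflicted: discard local changes and retry the whole operation.
    resolver.revert();
} finally {
    resolver.close();
}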

Perl CGI gets parameters from a different request to the current URL

This is a weird one. :)
I have a script running under Apache 1.3, with Apache::PerlRun option of mod_perl. It uses the standard CGI.pm module. It's a regularly accessed script on a busy server, accessed over https.
The URL is typically something like...
/script.pl?action=edit&id=47049
Which is then brought into Perl the usual way...
my $action = $cgi->param("action");
my $id = $cgi->param("id");
This has been working successfully for a couple of years. However we started getting support requests this week from our customers who were accessing this script and getting blank pages. We already had a line like the following that put the current URL into a form we use for customers to report an issue about a page...
$cgi->url(-query => 1);
And when we view source of the page, the result of that command is the same URL, but with an entirely different query string.
/script.pl?action=login&user=foo&password=bar
A query string that we recognise as being from a totally different script elsewhere on our system.
However crazy it sounds, it seems that when users are accessing a URL with a query string, the query string that the script is seeing is one from a previous request on another script. Of course the script can't handle that action and outputs nothing.
We have some automated test scripts running to see how often this happens, and it's not every time. To throw some extra confusion into the mix, after an Apache restart, the problem seems to initially disappear completely only to come back later. So whatever is causing it is somehow relieved by a restart, but we can't see how Apache can possibly take the request from one user and mix it up with another.
This, it appears, is an interesting combination of Apache 1.3, mod_perl 1.31, CGI.pm and Apache::GTopLimit.
A bug was logged against CGI.pm in May last year: RT #57184
That report also references the question "CGI.pm params not being cleared?".
CGI.pm registers a cleanup handler in order to clean up all of its cached state (line 360):
$r->register_cleanup(\&CGI::_reset_globals);
Apache::GTopLimit (like Apache::SizeLimit mentioned in the bug report) also has a handler like this:
$r->post_connection(\&exit_if_too_big) if $r->is_main;
In mod_perl before 1.31, post_connection and register_cleanup appear to push onto a stack of handlers, while in 1.31 the GTopLimit one appears to clobber the CGI.pm entry. So if your GTopLimit handler fires because the Apache process has got too large, then CGI.pm won't be cleaned up, leaving it open to returning the same parameters the next time you use it.
The solution seems to be to change line 360 of CGI.pm to:
$r->push_handlers( 'PerlCleanupHandler', \&CGI::_reset_globals);
Which explicitly pushes the handler onto the list.
Our restart of Apache temporarily resolved the problem because it reduced the size of all the processes and gave GTopLimit no reason to fire.
And we assume it has only appeared over the past few weeks because we have increased the size of the Apache processes through new development that included something that wasn't there before.
All tests so far point to this being the issue, so fingers crossed it is!