How to force PerfView to collect ETW events coming only from one process

I know there is a /Process:NameOrPID switch, but it affects only the /StopXXX commands. Collecting ETW events from all processes leads to a big *.ETL file. I am trying to catch ETW events only from one process, in order to avoid polluting the output file with irrelevant ETW events.

Updated 2019-04-14.
There is now a way to do that. Use the /focusProcess=ProcessIDOrName option, available since PerfView 2.0.32 (also available in the UI starting from 2.0.39).

If you know the names of the ETW providers emitting events from your process, you can filter by process when specifying providers in the Additional Providers text box, or in the -Providers or -OnlyProviders command line arguments to perfview.
From PerfView's docs:
The Additional Providers TextBox - A comma separated list of specifications for providers. This can be specified by using the ... button or by the following textual specification. Each provider specification has the general form of provider:keywords:level:values. The keywords and level specification parts are optional and can be omitted (for example, provider:keywords:values or provider:values is legal).
Process filters occur in the values section. Relevant portions from the docs:
values - this is a list of semicolon-separated KEY=VALUE pairs, which are used to pass extra information to the provider or to the ETW system. KEY values that begin with a # are commands to the ETW system. Everything else is passed on to the provider (EventSources have direct support for accepting this information in their OnEventCommand method). The special ETW keywords include:
#ProcessIDFilter - a space separated list of decimal process IDs to collect data from. Only events from these processes (or those named in the #ProcessNameFilter) will be collected. Since IDs only exist after a process is created, this only works on processes that are running at the time collection starts.
#ProcessNameFilter - a space separated list of process names (a process name is the file name (no path) of the executable, INCLUDING the .EXE extension). Only events from the named processes (or those named in the #ProcessIDFilter) will be collected. It does not matter whether the process was running before collection started or not.
So, if I have an ETW provider named my-provider running in a process named my.process.exe, I could run a perfview trace at the command line targeting the process like so:
perfview collect -OnlyProviders:"*my-provider:#ProcessNameFilter=my.process.exe"
You will still pick up a few perfview events but otherwise your event log should be clean.
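If the target process is already running, the same approach works with a process ID instead of a name (per the #ProcessIDFilter notes above, this only catches processes alive when collection starts; 1234 here is a placeholder PID):

perfview collect -OnlyProviders:"*my-provider:#ProcessIDFilter=1234"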

CQRS and Passing Data

Suppose I have an aggregate containing some data, and when it reaches a certain state I'd like to take all that state and pass it to some outside service. For the sake of argument and simplicity, let's just say it is an aggregate that has a list, and when all items in that list are checked off, I'd like to send the entire state to some outside service. Now, when I'm handling the command for checking off the last item in the list, I'll know that I'm at the end, but it doesn't seem correct to send it to the outside system from the processing of the command. So given this scenario, what is the recommended approach if the outside system requires all of the state of the aggregate? Should the outside system build its own copy of the data based on the aggregate events, or is there some better approach?
Should the outside system build its own copy of the data based on the aggregate events?
Probably not -- it's almost never a good idea to share the responsibility of rehydrating an aggregate from its history. The service that owns the object should be responsible for rehydration.
The first key idea to understand is when in the flow the call to the outside service should happen.
First, the domain model processes the command arguments, computing the update to the event history, including the ChecklistCompleted event.
The application takes that history and saves it to the book of record.
The transaction completes successfully.
At this point, the application knows that the operation was successful, but the caller doesn't. So the usual answer is to be thinking of an asynchronous operation that will do the rest of the work.
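A minimal C# sketch of that synchronous part of the flow; every name below (IBookOfRecord, Checklist, CheckOffItem, the event records) is hypothetical and exists only to illustrate the ordering of the steps:

// All names below are hypothetical and only illustrate the flow described above.
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public record ItemAdded(Guid ChecklistId, Guid ItemId);
public record ItemCheckedOff(Guid ChecklistId, Guid ItemId);
public record ChecklistCompleted(Guid ChecklistId);
public record CheckOffItem(Guid ChecklistId, Guid ItemId);

// The "book of record" (event store); Append completes the transaction.
public interface IBookOfRecord
{
    Task<IReadOnlyList<object>> LoadHistory(Guid aggregateId);
    Task Append(Guid aggregateId, IReadOnlyList<object> newEvents);
}

public class Checklist
{
    private readonly Guid id;
    private readonly HashSet<Guid> openItems = new HashSet<Guid>();

    // Rehydration from history belongs to the service that owns the aggregate.
    public Checklist(Guid id, IReadOnlyList<object> history)
    {
        this.id = id;
        foreach (var e in history)
        {
            if (e is ItemAdded added) openItems.Add(added.ItemId);
            if (e is ItemCheckedOff done) openItems.Remove(done.ItemId);
        }
    }

    // The domain model computes the update to the event history,
    // including ChecklistCompleted when the last open item is checked off.
    public IReadOnlyList<object> CheckOff(Guid itemId)
    {
        openItems.Remove(itemId);
        var events = new List<object> { new ItemCheckedOff(id, itemId) };
        if (openItems.Count == 0) events.Add(new ChecklistCompleted(id));
        return events;
    }
}

public class CheckOffItemHandler
{
    private readonly IBookOfRecord bookOfRecord;
    public CheckOffItemHandler(IBookOfRecord bookOfRecord) => this.bookOfRecord = bookOfRecord;

    public async Task Handle(CheckOffItem command)
    {
        var history = await bookOfRecord.LoadHistory(command.ChecklistId);
        var checklist = new Checklist(command.ChecklistId, history);

        var newEvents = checklist.CheckOff(command.ItemId);
        await bookOfRecord.Append(command.ChecklistId, newEvents);

        // The operation succeeded, but the outside service has not been called yet;
        // that part happens asynchronously (see the possibilities below).
    }
}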
Possibility one: the application takes the history that it just saved, and uses that history to schedule a task that rehydrates a read-only copy of the aggregate state and then sends that state to the external service.
Possibility two: you ditch the copy of the history that you have now, and fire off an asynchronous task that has enough information to load its own copy of the history from the book of record.
There are at least three ways that you might do this. First, you could have the command schedule the task as before.
Second, you could have an event handler listening for ChecklistCompleted events in the book of record, and have that handler schedule the task.
Third, you could read the ChecklistCompleted event from the book of record, and publish a representation of that event to a shared bus, and let the handler in the external service call you back for a copy of the state.
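Reusing the hypothetical types from the sketch above, possibility two might look roughly like this: a handler subscribed to ChecklistCompleted loads its own copy of the history and makes the outbound call, so the command handler never talks to the outside world.

// Possibility two, sketched with the same hypothetical types as above.
using System;
using System.Threading.Tasks;

public interface IExternalService
{
    Task Notify(Checklist snapshot);   // hypothetical outbound call
}

public class ChecklistCompletedReactor
{
    private readonly IBookOfRecord bookOfRecord;
    private readonly IExternalService externalService;

    public ChecklistCompletedReactor(IBookOfRecord bookOfRecord, IExternalService externalService)
    {
        this.bookOfRecord = bookOfRecord;
        this.externalService = externalService;
    }

    // Invoked asynchronously, e.g. by a subscription or catch-up reader over the
    // book of record, so a slow or unavailable external service blocks nobody.
    public async Task On(ChecklistCompleted @event)
    {
        var history = await bookOfRecord.LoadHistory(@event.ChecklistId);
        var snapshot = new Checklist(@event.ChecklistId, history);
        await externalService.Notify(snapshot);
    }
}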
I was under the impression that one bounded context should not reach out to get state from another bounded context but rather keep local copies of the data it needed.
From my experience, the key idea is that the services shouldn't block each other -- or, more specifically, that service B should not block when service A is unavailable. Responding to events is fundamentally non-blocking; does it really matter that we respond to an asynchronously delivered event by making a blocking call of our own?
What this buys you, however, is independent evolution of the two services - A broadcasts an event, B reacts to the event by calling A and asking for a representation of the aggregate that B understands, A -- being backwards compatible -- delivers the requested representation.
Compare this with requiring a new release of B every time the rehydration logic in A changes.
Udi Dahan raised a challenging idea - the notion that each piece of data belongs to a single technical authority. "Raw business data" should not be replicated between services.
A service is the technical authority for a specific business capability.
Any piece of data or rule must be owned by only one service.
So in Udi's approach, you'd start to investigate why B has any responsibility for data owned by A, and from there determine how to align that responsibility and the data into a single service. (Part of the trick: the physical view of a service can span process boundaries; in other words, a process may be composed from components that belong to more than one service).
Jeppe Cramon's series on microservices is nicely sourced, and touches on many of the points above.
You should never externalise your state. Reporting on that state is a function of the read side, since it produces the reports, and you'll need that data to call the service. The structure of your state is plastic, and you shouldn't have an external service that relies upon that structure; otherwise you'll have to update both in lockstep, which is a bad thing.
There is a blog that puts forward a strong argument that the process manager is the correct place to put this type of feature (calling an external service), because that's the appropriate place for orchestrating events.

Alfresco: Which tables do I need to query to get workflow details

I am trying to generate analytics reports using Highcharts.
For that I need to get values from the Alfresco DB using a Postgres query. Can anyone tell me which tables are related to workflow creation and store all the workflow details?
I got some references from the below link.
http://techogeek.blogspot.in/2015/09/how-retrieve-activiti-workflow-details.html
As Gagravarr suggests, don't hit the database directly; always use the WorkflowService to get the workflow-related data. For reference, the underlying Activiti tables are grouped by prefix:
ACT_RE_*: RE stands for repository. Tables with this prefix contain static information such as process definitions and process resources (images, rules, etc.).
ACT_RU_*: RU stands for runtime. These are the runtime tables that contain the runtime data of process instances, user tasks, variables, jobs, etc. Activiti only stores the runtime data during process instance execution, and removes the records when a process instance ends. This keeps the runtime tables small and fast.
ACT_ID_*: ID stands for identity. These tables contain identity information, such as users, groups, etc.
ACT_HI_*: HI stands for history. These are the tables that contain historic data, such as past process instances, variables, tasks, etc.
ACT_GE_*: general data, which is used in various use cases.
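If you do decide to read the history tables directly for reporting (at your own risk; the schema belongs to Activiti and can change between versions), a query along these lines lists process instances with their tasks. Column names follow the Activiti 5 schema, so verify them against your installation:

-- Hypothetical reporting query; check the column names against your Activiti version.
SELECT p.proc_inst_id_,
       p.proc_def_id_,
       p.start_time_,
       p.end_time_,
       t.name_     AS task_name,
       t.assignee_ AS task_assignee,
       t.end_time_ AS task_end_time
FROM act_hi_procinst p
LEFT JOIN act_hi_taskinst t ON t.proc_inst_id_ = p.proc_inst_id_
ORDER BY p.start_time_ DESC;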

Can watchman send why a file changed?

Is watchman capable of telling the configured command why it's sending a file to that command?
For example:
a file that is new to a folder would possibly send a FILE_CREATE flag;
a file that is deleted would send to the command the FILE_DELETE flag;
a file that's modified would send a FILE_MOD flag etc.
Perhaps even when a folder gets deleted (and therefore the files thereunder) would send a FOLDER_DELETE parameter naming the folder, as well as a FILE_DELETE to the files thereunder / FOLDER_DELETE to the folders thereunder
Is there such a thing?
No, it can't do that. The reasons why are pretty fundamental to its design.
The TL;DR is that it is a lot more complicated than you might think for a client to correctly process those individual events and in almost all cases you don't really want them.
Most file watching systems are abstractions that simply translate from the system-specific notification information into some common form. They don't deal, either very well or at all, with the notification queue overflowing, and don't provide their clients with a way to reliably respond to that situation.
In addition to this, the filesystem can be subject to many and varied changes in a very short amount of time, and from multiple concurrent threads or processes. This makes this area extremely prone to TOCTOU issues that are difficult to manage. For example, creating and writing to a file typically results in a series of notifications about the file and its containing directory. If the file is removed immediately after this sequence (perhaps it was an intermediate file in a build step), by the time you see the notifications about the file creation there is a good chance that it has already been deleted.
Watchman takes the input stream of notifications and feeds it into its internal model of the filesystem: an ordered list of observed files. Each time a notification is received watchman treats it as a signal that it should go and look at the file that was reported as changed and then move the entry for that file to the most recent end of the ordered list.
When you ask Watchman for information about the filesystem it is possible or even likely that there may be pending notifications still due from the kernel. To minimize TOCTOU and ensure that its state is current, watchman generates a synchronization cookie and waits for that notification to be visible before it responds to your query.
The combination of the two things above means that watchman result data has two important properties:
You are guaranteed to have observed all notifications that happened before your query
You receive the most recent information for any given file only once in your query results (the change results are coalesced together)
Let's talk about the overflow case. If your system is unable to keep up with the rate at which files are changing (e.g. you have a big project, are very quickly creating and deleting files, and the system is heavily loaded), the OS can't fit all of the pending notifications in the buffer resources allocated to the watches. When that happens, it blows those buffers and sends an overflow signal. What that means is that the client of the watching API has missed some number of events and is no longer in synchronization with the state of the filesystem. If that client maintains state about the filesystem, that state is no longer valid.
Watchman addresses this situation by re-examining the watched tree and synthetically marking all of the files as being changed. This causes the next query from the client to see everything in the tree. We call this a fresh instance result set because it is the same view you'd get when you are querying for the first time. We set a flag in the result so that the client knows that this has happened and can take appropriate steps to repair its own state. You can configure this behavior through query parameters.
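For example, a client issuing a since query can ask for an empty file list on a fresh instance and use the is_fresh_instance flag in the response to trigger its own full re-scan; the clock value below is just a placeholder taken from a previous response:

watchman -j <<-EOT
["query", "/path/to/root", {
  "since": "c:123:45",
  "fields": ["name", "exists", "new"],
  "empty_on_fresh_instance": true
}]
EOT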
In these fresh instance result sets, we don't know whether any given file really changed or not (it's possible that it changed in such a way that we can't detect via lstat) and even if we can see that its metadata changed, we don't know the cause of that change.
There can be multiple events that contribute to why a given file appears in the results delivered by watchman. We don't record them individually because we can't track them with unbounded history; imagine a file that is incrementally being written once every second all day long. Do we keep 86400 change entries for it per day on hand and deliver those to our clients? What if there are hundreds of thousands of files like this? We'd have to truncate that data, and at that point the loss in the data reduces how well you can reason about it.
At the end of all of this, it is very rare for a client to do much more than try to read a file or look at its metadata, and generally speaking, they want to do that only when the file has stopped changing. For this use case, watchman-wait, watchman-make and trigger all have the concept of a settle period that causes the change notifications to be delayed in delivery until after the filesystem has stopped changing.

Should Commands / Handlers hold the full aggregate or only its id?

I'm trying to play around with DDD and CQRS.
And I came up with these two solutions:
add the AggregateId to my command / event. It's nice because I can use my command as my web service's parameter, and I can also return some instances of my command to my forms to say "you can do this command, this one and this one"
add my full Aggregate to my command / event. It's nice because I'm sure that I won't load my aggregate 100 times if there are a lot of events going on; I'll just pass my reference around (for instance I won't load it in my command's validator and in my command handler). But I'd have to create a parameter class for each command with only the id.
For now I have the id in the commands and the full model in the events (I trust my unit of work to cache the Load(aggregateId), so I won't execute the same request 100 times for 1 command).
Is there a right / better way?
Yes, your current approach is correct - reference the aggregate with an identity value on the command. A command is meant to be serialized and sent across process boundaries. Also, a command is normally constructed by a client who may not have enough information to create an entire aggregate instance. This is also why an identity should be used. And yes, your unit of work should take care of caching an aggregate for the duration of a unit of work, if need be.
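A small C# sketch of that shape; the repository and domain types are hypothetical and exist only to show that the command carries nothing but the identity and the operation data:

// Hypothetical names; the command carries only the identity plus the operation
// data, so any client can construct and serialize it without knowing the aggregate.
using System;
using System.Threading.Tasks;

public record CheckOffItemCommand(Guid ChecklistId, Guid ItemId);

public class Checklist
{
    public void CheckOff(Guid itemId) { /* domain behaviour elided */ }
}

public interface IChecklistRepository
{
    Task<Checklist> Load(Guid checklistId);   // the unit of work may cache this
    Task Save(Checklist checklist);
}

public class CheckOffItemCommandHandler
{
    private readonly IChecklistRepository repository;
    public CheckOffItemCommandHandler(IChecklistRepository repository) => this.repository = repository;

    public async Task Handle(CheckOffItemCommand command)
    {
        // Only the handler, on the server side, loads the full aggregate by id.
        var checklist = await repository.Load(command.ChecklistId);
        checklist.CheckOff(command.ItemId);
        await repository.Save(checklist);
    }
}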

How to make an InArgument's value dependent upon the value of another InArgument at design time

I have a requirement to allow a user to specify the value of an InArgument / property from a list of valid values (e.g. a combobox). The list of valid values is determined by the value of another InArgument (the value of which will be set by an expression).
For instance, at design time:
User enters a file path into workflow variable FilePath
The DependedUpon InArgument is set to the value of FilePath
The file is queried and a list of valid values is displayed to the user to select the appropriate value (presumably via a custom PropertyValueEditor).
Is this possible?
Considering this is being done at design time, I'd strongly suggest you provide for all this logic within the designer, rather than in the Activity itself.
Design-time logic shouldn't be contained within your Activity. Your Activity should be able to run independent of any designer. Think about it this way...
You sit down and design your workflow using Activities and their designers. Once done, you install/xcopy the workflows to a server somewhere else. When the server loads that Activity prior to executing it, what happens when your design logic executes in CacheMetadata? Either it is skipped using some heuristic to determine that you are not running in design time, or you include extra logic to skip this code when it is unable to locate that file. Either way, why is a server executing this design time code? The answer is that it shouldn't be executing it; that code belongs with the designers.
This is why, if you look at the framework, you'll see that Activities and their designers exist in different assemblies. Your code should be the same way--design-centric code should be delivered in separate assemblies from your Activities, so that you may deliver both to designers, and only the Activity assemblies to your application servers.
When do you want to validate this, at design time or run time?
Design time is limited because the user can use an expression that depends on another variable and you can't read the value from there at design time. You can however look at the expression and possibly deduce an invalid combination that way. In this case you need to add code to the CacheMetadata function.
At run time you can get the actual values and validate them in the Execute function.
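As a rough sketch of the run-time side (the activity, its arguments, and the file format are all made up for illustration; the design-time combobox/PropertyValueEditor would live in a separate designer assembly, as discussed above):

// Rough sketch (hypothetical activity and arguments). Run-time validation of the
// selected value happens in Execute, where the real argument values are available.
using System;
using System.Activities;
using System.IO;

public sealed class SelectValueFromFile : CodeActivity<string>
{
    [RequiredArgument]
    public InArgument<string> FilePath { get; set; }

    [RequiredArgument]
    public InArgument<string> SelectedValue { get; set; }

    protected override void CacheMetadata(CodeActivityMetadata metadata)
    {
        base.CacheMetadata(metadata);
        // Keep design-time-only behaviour (reading the file, populating the combobox,
        // the PropertyValueEditor) in the designer assembly, not here: CacheMetadata
        // also runs on the server before execution.
    }

    protected override string Execute(CodeActivityContext context)
    {
        var path = context.GetValue(FilePath);
        var selected = context.GetValue(SelectedValue);

        // Run-time validation against the actual file contents.
        var validValues = File.ReadAllLines(path);
        if (Array.IndexOf(validValues, selected) < 0)
            throw new InvalidOperationException(
                "'" + selected + "' is not one of the values listed in '" + path + "'.");

        return selected;
    }
}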