Recommender API - Upload Usage Event

The documentation of this API is a little hard to understand in functional terms.
https://westus.dev.cognitive.microsoft.com/docs/services/Recommendations.V4.0/operations/577d91f77270320f24da2592
Upload a usage event to a model. If buildId is set to "-1", the event is ingested against the Active Build of the model. If the buildId is set to null or 0, the events are ingested against the Active Build; if an Active Build doesn't exist, the events are not associated with any build.
"is ingested against the Active Build of the model"
What does this mean?
What happens when you associate events to a build?
I have been sending events using the Upload usage event API, but I don't see any changes on the active build on the Data Statistics tab.
Any help to understand this would be appreciated.
I'm building a batch process to send new usage events, and right now my approach is this:
Upload New Usage File
Delete Old Usage file
Create New Build
Change Active Build
Delete Old Build
I was hoping that the other API, meant just for sending user events, would work, but since I can't make it work as expected, I changed to this approach.
Is this a good approach, or should I be doing this in a different way?
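For reference, the per-event call I was making looks roughly like the sketch below (Java; the region, model ID, key, and body values are placeholders, and the exact field names should be double-checked against the Upload usage event reference):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class UploadUsageEvent {
    public static void main(String[] args) throws Exception {
        // Placeholders -- substitute your own region, model ID and key.
        String endpoint = "https://westus.api.cognitive.microsoft.com/recommendations/v4.0";
        String modelId = "<MODEL_ID>";
        String subscriptionKey = "<SUBSCRIPTION_KEY>";

        // Illustrative body; verify the exact field names/values against
        // the "Upload usage event" API reference.
        String body = "{"
            + "\"userId\": \"user-123\","
            + "\"buildId\": -1,"                 // -1 = ingest against the Active Build
            + "\"events\": [{"
            + "  \"eventType\": \"Purchase\","
            + "  \"itemId\": \"item-456\","
            + "  \"timestamp\": \"2016-01-01T10:00:00\""
            + "}]}";

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(endpoint + "/models/" + modelId + "/usage/events"))
            .header("Ocp-Apim-Subscription-Key", subscriptionKey)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}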

Uploading a usage file is a better approach than uploading individual usage events.
Reasons:
You get to send the events as one file, thus decreasing your API usage count
You can always review and correct your usage files in case something is wrong; I do not see an API command to view/edit/delete uploaded events
You can reuse your usage files to recreate the model in case of an issue with the current one
Here is my own process, which runs at midnight:
Upload new usage file based on today's events
Create new build
Update my system to use new build number (since I have different build types in the same model)
Why this process?
Apparently, we will need to create a new build anyway for new usage data to be considered.
Per another post (answered by an authority on the subject):
After uploading a usage event you need to create a new build in that model for the usage event to be considered as part of the recommendations request.
You can check the whole post here
Also, as mentioned in the linked post, a few usage events may not be enough to change the recommendations, so sending them in real time or very frequently wastes effort. A batch process that uses usage files and runs once per day is the more pragmatic approach.
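To make the nightly batch concrete, here is a rough Java sketch of the upload-file-then-build steps; the region, model ID, key, file name, and build parameters are placeholders, and the exact routes should be double-checked against the API reference:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class NightlyUsageBatch {
    static final String ENDPOINT = "https://westus.api.cognitive.microsoft.com/recommendations/v4.0";
    static final String MODEL_ID = "<MODEL_ID>";
    static final String KEY = "<SUBSCRIPTION_KEY>";
    static final HttpClient HTTP = HttpClient.newHttpClient();

    public static void main(String[] args) throws Exception {
        // 1. Upload today's usage file (CSV of userId,itemId,time,event rows).
        HttpRequest upload = HttpRequest.newBuilder()
            .uri(URI.create(ENDPOINT + "/models/" + MODEL_ID
                    + "/usage?usageDisplayName=usage-" + java.time.LocalDate.now()))
            .header("Ocp-Apim-Subscription-Key", KEY)
            .header("Content-Type", "application/octet-stream")
            .POST(HttpRequest.BodyPublishers.ofFile(Path.of("todays-usage.csv")))
            .build();
        System.out.println("Upload usage file: " + send(upload).statusCode());

        // 2. Kick off a new recommendation build; the build runs asynchronously.
        HttpRequest build = HttpRequest.newBuilder()
            .uri(URI.create(ENDPOINT + "/models/" + MODEL_ID + "/builds"))
            .header("Ocp-Apim-Subscription-Key", KEY)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(
                    "{\"description\":\"nightly build\",\"buildType\":\"recommendation\"}"))
            .build();
        HttpResponse<String> buildResponse = send(build);
        // The response body/headers contain the new build id / operation to poll.
        System.out.println("Create build: " + buildResponse.statusCode()
                + " " + buildResponse.body());
    }

    static HttpResponse<String> send(HttpRequest r) throws Exception {
        return HTTP.send(r, HttpResponse.BodyHandlers.ofString());
    }
}

Since the create-build call is asynchronous, the real job should wait for the build to finish (polling the operation it returns) before pointing recommendation requests, or the model's active build, at the new build id.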

Related

Extending Azure multistage yaml pipelines logs

I'm trying to log completion of each stage of multi-stage YAML pipelines with some custom details.
How can I add custom details to the https://dev.azure.com//_settings/audit logs?
Is there a way to persist this information in SQL DB or any other persistent storage option?
How can I subscribe to these log events?
How can I add custom details to the https://dev.azure.com//_settings/audit logs?
I'm afraid this is not available for you to achieve.
The sentence format of the details is defined and fixed by a backend class. Once the corresponding action occurs, besides the action class, an event method is also called to generate the log entry and record it on the audit page. This is all handled by the backend, and this capability has not been exposed to users so far.
That said, in my opinion this is a good idea that could be worth expanding, because customized details can make the logs more readable for a company. You can raise your idea here, then vote and comment on it. The corresponding Product Group reviews these suggestions regularly and considers taking them onto the development roadmap depending on their priority (votes).
How can I subscribe to these log events?
There is one important thing I need to let you know: the audit log is only kept for 90 days. After 90 days it is cleared, including from the backend database. In a nutshell, if you want audit logs older than 90 days, there is no way for us to restore them.
So I suggest you configure a scheduled pipeline with a PowerShell task.
In this PowerShell task, call the audit log REST API to fetch the entries, then store them in any file format you want, e.g. .csv, .json, etc.
For the schedule, you can use any period you like, as long as it is less than 90 days, so that you do not lose any audit event logs.
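If you would rather keep that export step in code than in an inline script, the sketch below shows the same idea in Java; the organization name and PAT are placeholders, and the audit log route and api-version should be verified against the Audit REST API documentation (the PowerShell equivalent would simply call Invoke-RestMethod with the same URL):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDate;
import java.util.Base64;

public class ExportAuditLog {
    public static void main(String[] args) throws Exception {
        String organization = "<ORGANIZATION>";      // placeholder
        String pat = "<PERSONAL_ACCESS_TOKEN>";      // needs audit read scope
        // Route and api-version as documented for the Audit REST API --
        // double-check the version string for your organization.
        String url = "https://auditservice.dev.azure.com/" + organization
                + "/_apis/audit/auditlog?api-version=7.1-preview.1";

        String basicAuth = Base64.getEncoder()
                .encodeToString((":" + pat).getBytes());

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("Authorization", "Basic " + basicAuth)
                .GET()
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // Store the raw JSON so nothing is lost when the 90-day window rolls over.
        Path out = Path.of("audit-" + LocalDate.now() + ".json");
        Files.writeString(out, response.body());
        System.out.println("Saved " + out + " (HTTP " + response.statusCode() + ")");
    }
}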
Is there a way to persist this information in SQL DB or any other persistent storage option?
If you can use a different database, I would suggest you consider a document storage solution such as CouchDB, DynamoDB, or MongoDB.
Depending on what you actually use, you can add a Command Line task running on a self-hosted agent to execute the corresponding import commands.
For example, I use MongoDB, and I can run the command below to import the JSON file that the API call generated previously:
C:\>mongodb\bin\mongoimport --jsonArray -d mer -c docs --file audit20191231.json

Cognitive Service Recommendation API Upload Usage Event

The Upload Usage Event method of the Cognitive Service Recommendation API does not seem to work as expected.
Implementation Technique
In the Cognitive Service Recommendation API, I created, in this order, the model, the catalog, the usage file, and the build.
The response of "Upload Usage Event" is a successful 201 status code.
Then I call "Update model".
Then I call "Download usage file" and "Get item-to-item recommendations".
I tried to confirm that the item from "Upload Usage Event" is reflected.
However, it is not reflected.
I want to know how to get items sent via Upload Usage Event reflected in the build.
Is my implementation procedure wrong?
After uploading a usage event you need to create a new build in that model for the usage event to be considered as part of the recommendations request.
Note that a single usage event may not significantly change the model. Usually you retrain the model once a week (more or less often depending on the level of traffic you receive), and at that point you will have sent hundreds or thousands of usage events that may actually impact the model.
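For illustration, once the new build has completed, the recommendation request can target it explicitly. In the sketch below the ids are placeholders and the query parameters should be checked against the "Get item-to-item recommendations" reference:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class GetItemToItemRecommendations {
    public static void main(String[] args) throws Exception {
        String endpoint = "https://westus.api.cognitive.microsoft.com/recommendations/v4.0";
        String modelId = "<MODEL_ID>";       // placeholder
        String key = "<SUBSCRIPTION_KEY>";   // placeholder
        long newBuildId = 123456;            // the build created after uploading usage data

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(endpoint + "/models/" + modelId
                + "/recommend/item?itemIds=item-456&numberOfResults=10&buildId=" + newBuildId))
            .header("Ocp-Apim-Subscription-Key", key)
            .GET()
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}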

CQRS client command management

I'm building a new project using CQRS. It's a 3-tier application, and client state is expected to stay synchronized: the server receives commands and calls events back to the clients.
Currently, the model has several sub-models that can be added/removed/updated, each of which has its own command. Events are field-specific, i.e.:
UpdateItemCommand
CreateSubItemCommand
RemoveSubItemCommand
UpdateSubItemCommand
...
ItemFieldAUpdatedEvent
SubItemFieldAUpdatedEvent
SubItemFieldBUpdatedEvent
...
So here's my question: the client gets the current state of the model, the user edits the local model and clicks the Save button, and this is where I get stuck. Should I
Compare the original state of the model (kept up to date with received events) against the edited state to generate a set of commands (for every received event I would need to identify which fields were updated and notify the user if they edited a field that has changed), or
Create commands as the user edits the model, doing and undoing edits as they go (which would be hard to manage),
...
Basically I don't know which strategy I should apply to generate the commands!
Is there an example out there? I've Googled around but found nothing on this subject.
Thanks,
Dominik
I recommend you read about Task-based UI.
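To make the contrast concrete, here is a small illustrative sketch (all names are invented for the example): instead of diffing the edited model into generic field updates on Save, a task-based UI raises one command per user intention at the moment the user completes that task.

// Hypothetical task-based commands: each one captures a user intention
// rather than a field-level diff of the whole model.
final class RenameItemCommand {
    final String itemId;
    final String newName;
    RenameItemCommand(String itemId, String newName) {
        this.itemId = itemId;
        this.newName = newName;
    }
}

final class AddSubItemCommand {
    final String itemId;
    final String subItemName;
    AddSubItemCommand(String itemId, String subItemName) {
        this.itemId = itemId;
        this.subItemName = subItemName;
    }
}

// The UI sends a command when the user completes the task (closes the rename
// dialog, confirms the "add sub-item" form, ...) instead of comparing
// snapshots when a global Save button is clicked.
interface CommandBus {
    void send(Object command);
}

class RenameDialogController {
    private final CommandBus bus;
    RenameDialogController(CommandBus bus) { this.bus = bus; }

    void onRenameConfirmed(String itemId, String newName) {
        bus.send(new RenameItemCommand(itemId, newName));
    }
}

With this style, conflict handling also becomes per task (for example by sending the expected version of the item along with each command) instead of comparing whole client snapshots.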

How to programmatically create a new version of a CQ5 page?

Is it possible to programmatically create a new version of a CQ5 page that has a start time some time in the future?
As an example, let's say we have a page that displays tax rates. We have a component that allows the author to upload a new rates table (in the form of a CSV file), and it creates the rates page content. We would like to allow the author to upload rates that will become effective the first of next month.
I know the JCR supports multiple versions of nodes, but it's unclear how (or whether) this relates to CQ5 page versioning, and, further, whether a new version can be activated in the future.
Given the requirements as you've described them, I would probably accomplish the task in a slightly different way...
Instead of storing my rates table information directly within the page's jcr:content node (or a sub-node thereof), I'd probably abstract it out to somewhere else in the repository. You could then, if you so desired, create some sort of admin interface to allow content authors to upload their CSV file of new rates and ingest it into the repository as needed. Alternatively, assuming that data comes from some sort of database, you could probably just write a job to automatically ingest it on a scheduled basis using a JDBC connection from CQ. Once the data is in the repository, you could then write the display component to read the data from the repository, instead of it living directly inside the page.
This approach has the advantage of making that data re-usable within CQ, to be shown on multiple pages, multiple sites, and even in many different display formats if need be. In addition, you can design your JCR structure to support whatever requirements you have around updates to the data (daily, weekly, monthly, yearly, etc.); obviously this will depend on the specific requirements.
The one downside is that since there is a separation between the data and the page(s) where it is displayed, you may need to find a way to ensure the cache is properly cleared whenever the data changes.
Update (based on your comment):
The problem I foresee with versioning the page (granted, I've not tried this, so maybe it will work) is that there can only ever be one active version at a time. Therefore, once next month's data is uploaded, you need to maintain the old data (active) and the new data (not yet active) at the same time. What happens if you require a separate content change during that window? From a business-process perspective that just seems messy to me.
Back to the cache-clearing issue: if you know the affected pages, especially if they are all in one subtree, you could write a custom workflow process that uses the replicator service to clear the cache for the affected pages, then set up a launcher to run the workflow on node changes for the data.
The other option, and this one is less defined in my head, so some experimentation is required, would be to use CQ's built-in "activate later" and "deactivate later" functionality.
Maybe create a specific template for the rates data, with the implicit requirement that only one page using that template is ever active at one time. Your display components could use a query to find the currently active rates data.
I have not personally tried this, but...
I assume that you can use the PageManager service's createRevision method, and then, if that returns without throwing an exception, you can call page.getContentResource().adaptTo(Node.class) and, from the node that is returned, edit the JCR properties for your tax rates component.
See PageManager
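A minimal sketch of that idea, assuming a hypothetical rates component (the property names are invented for the example; the PageManager and adaptTo calls are the standard ones):

import javax.jcr.Node;
import javax.jcr.Session;

import org.apache.sling.api.resource.ResourceResolver;

import com.day.cq.wcm.api.Page;
import com.day.cq.wcm.api.PageManager;

public class TaxRatesUpdater {

    /**
     * Creates a revision of the given page and then writes the new rates
     * onto its jcr:content node. Property names below are illustrative.
     */
    public void updateRates(ResourceResolver resolver, String pagePath, String ratesJson)
            throws Exception {
        PageManager pageManager = resolver.adaptTo(PageManager.class);
        Page page = pageManager.getPage(pagePath);

        // Snapshot the current state as a revision before changing anything.
        pageManager.createRevision(page);

        // Edit the properties used by the (hypothetical) rates component.
        Node contentNode = page.getContentResource().adaptTo(Node.class);
        contentNode.setProperty("rates", ratesJson);
        contentNode.setProperty("effectiveFrom", "2013-02-01");

        resolver.adaptTo(Session.class).save();
    }
}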
You could write a workflow that includes a publish step that is triggered by the arrival of a calendar date. The version of the page with the new tax rates remains in the workflow pipeline in draft form and is only published/activated when the date arrives. (So you'd need some sort of process that wakes up once a day to check the calendar.)
Each time a page is modified, CQ creates a version of the page.
The modified page's modification time is set in its jcr:lastModified property.
This property could be manipulated to hold a future date and activate the page on that date, though that's not the preferred way.
Instead, you can store the future date as a property on the page.
Later, as suggested by @David, you can create a workflow or a scheduled job which activates the pages whose date has arrived.
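A rough sketch of such a scheduled job, assuming a hypothetical publishOn property on the rate pages and using the replicator service mentioned above (the query and property name are illustrative, not a CQ API):

import javax.jcr.Node;
import javax.jcr.NodeIterator;
import javax.jcr.Session;
import javax.jcr.query.Query;

import com.day.cq.replication.ReplicationActionType;
import com.day.cq.replication.Replicator;

/**
 * Sketch of a job (wire it up via the Sling scheduler or an OSGi scheduled
 * service) that activates pages whose stored activation date has arrived.
 */
public class ScheduledRatesActivator {

    private final Replicator replicator;

    public ScheduledRatesActivator(Replicator replicator) {
        this.replicator = replicator;
    }

    public void run(Session session) throws Exception {
        // Find rate pages whose hypothetical "publishOn" property is in the past.
        String sql = "SELECT * FROM [cq:PageContent] AS c "
                   + "WHERE c.[publishOn] <= CAST('" + java.time.Instant.now() + "' AS DATE)";
        Query query = session.getWorkspace().getQueryManager()
                .createQuery(sql, Query.JCR_SQL2);

        NodeIterator nodes = query.execute().getNodes();
        while (nodes.hasNext()) {
            Node content = nodes.nextNode();
            String pagePath = content.getParent().getPath(); // the cq:Page node
            replicator.replicate(session, ReplicationActionType.ACTIVATE, pagePath);
        }
    }
}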

How to Implement Queue Based Workflow System?

I'm working on a document management system. An example workflow would be something like this:
A document is emailed to the system
The system does a number of preparatory actions to the document
Document is presented to a user for further processing
Afterwards, document is sent to Quality Assurance
Afterwards, the system does a number of post-processing actions to the document
Document is considered completely processed and disseminated (e.g. emailed back to whoever emailed the document to the system, etc.)
Since the volume of my input will vary (but will usually be high), I am very concerned about scalability.
For example, say the system has already downloaded the email attachments. If the attachments are PDF documents, the system needs to split the PDF into individual pages, then convert each page into multiple thumbnail sizes, etc. I plan to have a cron job check (say, every minute) to see if there are any PDF documents that need to be processed. Using a flagging system (e.g. "PDF Document Ready to be Processed"), I can check the database for all PDF documents that are flagged to be processed. Once the PDF processing is done, the flag can be updated to say "PDF Processing Done."
However, since the processing of each PDF document is very time consuming, I am concerned that when the next cron job is executed, that cron job will also try to process the PDFs that the previous cron job is still processing.
A possible solution is to immediately flag the PDF documents with "PDF Document Currently Being Processed." That way, when the next cron job is executed, it will exclude the ones already being processed.
Thus, each step in the workflow will probably have 3 flags:
PDF Document Ready to be Processed
PDF Document Currently Being Processed
PDF Processing Done
Same for QA:
Document Ready for QA
Document Currently Being QAd
Document QA Done
Is this a good approach? Is there a better approach? Should I have these flags as a single column of the "PDF Document" table in the database, or should the flags be their own table (especially if a document can have multiple flags set)?
I'd like to solicit suggestions on how to implement such a system.
To solve your concern about concurrent processing of the same document, you can use one of the many scheduler packages to help you manage this aspect. http://www.quartz-scheduler.org/ is one I've used with great success.
To address your problem, I'd have the 3 states, received, queued, processed (similar to what you suggest).
I'd have a scheduled recurring job which polls the database looking for received PDFs and, for each one, queues a job to process it and marks the PDF as queued. If you ensure this happens in the same transaction and utilize optimistic locking, there is no risk that another job could come along and re-read it as received.
Quartz uses a thread pool, with many configuration options, and is great for deferred, resource-intensive processing (I use it for image thumbnailing in a server setting).
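A condensed sketch of that pattern with Quartz; the table, columns, and JDBC URL are invented for the example, and the "claim" is a conditional UPDATE so an overlapping run cannot grab the same document:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

import org.quartz.Job;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.Scheduler;
import org.quartz.SimpleScheduleBuilder;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.impl.StdSchedulerFactory;

// Polls for RECEIVED documents and claims them atomically before processing.
public class PdfPollingJob implements Job {

    @Override
    public void execute(JobExecutionContext context) {
        // Placeholder JDBC URL and schema -- adapt to your own database.
        try (Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/docs")) {
            conn.setAutoCommit(false);
            // "Claim" step: flipping RECEIVED -> QUEUED in a conditional UPDATE acts as
            // the lock, so a second overlapping run cannot pick up the same row.
            PreparedStatement claim = conn.prepareStatement(
                "UPDATE document SET status = 'QUEUED' "
              + "WHERE id = ? AND status = 'RECEIVED'");
            PreparedStatement select = conn.prepareStatement(
                "SELECT id FROM document WHERE status = 'RECEIVED'");
            ResultSet rs = select.executeQuery();
            while (rs.next()) {
                long id = rs.getLong("id");
                claim.setLong(1, id);
                if (claim.executeUpdate() == 1) {
                    // hand the id to the actual PDF splitting/thumbnailing work
                    System.out.println("queued document " + id);
                }
            }
            conn.commit();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        JobDetail job = JobBuilder.newJob(PdfPollingJob.class)
                .withIdentity("pdfPollingJob").build();
        Trigger trigger = TriggerBuilder.newTrigger()
                .startNow()
                .withSchedule(SimpleScheduleBuilder.repeatMinutelyForever())
                .build();
        scheduler.scheduleJob(job, trigger);
        scheduler.start();
    }
}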
To take a step back, there are some great workflow packages in the Java world which can handle most of what you want to do, including the deferred PDF processing. Take a look at jBPM or Drools Flow; these are two great, if complex, packages.
UPDATE: Drools Flow has been merged into jBPM. For this particular problem it may be a bit of a "killing a mosquito with a bazooka" situation, but it's a great workflow package.
The solution depends somewhat on which technologies you are using to implement this system: is the pre-/post-processing done by the same software/language as the emailing software? Additionally, are they running in separate processes?
If you have distributed components, you could do much worse than investigating an AMQP solution like RabbitMQ, as this takes care of putting each job into a queue and making sure that only one of your consumers takes each job (you'd model each thumbnailing job as an individual task).
If, however, the entire system is implemented in one language and inside a single process, there are some simpler options you can use:
Resque is a good solution for Ruby
Java would work well with a LinkedBlockingQueue (see the sketch below)
I'm sure C# will have some way of creating a queue of jobs (disclaimer: I know nothing of C#)
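For the single-process Java case, a minimal LinkedBlockingQueue sketch (paths and pool size are illustrative):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class InProcessPdfQueue {

    public static void main(String[] args) throws Exception {
        // One job per PDF to split/thumbnail; here a job is just a path string.
        BlockingQueue<String> jobs = new LinkedBlockingQueue<>();

        // A small pool of workers; each take() blocks until a job is available,
        // and each job is handed to exactly one worker.
        ExecutorService workers = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 4; i++) {
            workers.submit(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    try {
                        String job = jobs.take();
                        System.out.println(Thread.currentThread().getName()
                                + " processing " + job);
                        // ... split / thumbnail the PDF here ...
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
        }

        // Producer side: the email-ingest step pushes work instead of setting a DB flag.
        jobs.put("incoming/invoice-0001.pdf");
        jobs.put("incoming/invoice-0002.pdf");
    }
}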