Quartz trigger/job key immutability - quartz-scheduler

I am trying to use the Quartz scheduler library to run some jobs in my web application.
I have trouble finding "the right way" to construct the trigger/job key. If I don't need to
cancel/reschedule the trigger/job there is no problem; I can generate a random UUID and use it
as a key. The problem comes when I need to cancel/reschedule the trigger/job after scheduling
it. How should I generate the key so that later someone doesn't accidentally change my key
construction logic in a way that will cause canceling/rescheduling key to be different from
the original scheduling key? I have come up with some solutions but neither of them satisfies me.
One solution would be to just write a comment on the key construction code. Another more complex solution that I am thinking of is for every use case to have some Key class, write the schema of the class the first time, and later check during runtime that the schema hasn't been changed (something like Hibernate does).
Any ideas would be highly appreciated.

Related

Deploy Knowledge Studio dictionary pre-annotator to Natural Language Understanding

I'm getting started with Knowledge Studio and Natural Language Understanding.
I'm able to deploy a machine-learning model toNatural Language Understanding and use the API to query it.
I would know if there's a way to deploy only the pre-annotator.
I read from Knowledge Studio's documentation that
You can deploy or export a machine-learning annotator. A dictionary pre-annotator can only be used to pre-annotate documents within Watson Knowledge Studio.
Does exist a workaround to create a model that simply does the job of the pre-annotator, i.e. use dictionaries to find entities instead of the machine-learning model?
Does exist a workaround to create a model that simply does the job of the pre-annotator, i.e. use dictionaries to find entities instead of the machine-learning model?
You may need to explain this better in what you need.
WKS allows you to pre-annotate documents with dictionaries you upload. Once you have created a ML model, you can alternatively use that to annotate your training documents, and then manually correct. As you continue the amount of manual work will reduce after each model iteration.
The assumption is that you are creating a model with a reasonable amount of examples. In your model results, you will want the mention/relations to be outside or close to outside the gray area of the report.
The other interpretation of your request I took was you want to create a dictionary based model only. This is possible using the "Rule-Based Model" functionality. You would have to create the parsing rules but you just map what you want to find to the dictionary/rule.
Using this in production though is still limited. You should get a warning when you deploy these kinds of models.
It's slightly better than just a keyword search as you can map items to parts of speech.
The last point. The purpose of WKS is to create a machine learning model which will do the work in discovering new terms you haven't seen before. With the rule based engine it can only find what you explicitly tell it to find.
If all you want is just dictionary entries, then you can create a very simple string comparison solution, but you lose the linguistic features.

Event Sourcing Commands vs Events

I understand the difference between commands and events but in a lot of cases you end up with redundancy and mapping between 2 classes that are essentially the same (ThingNameUpdateCommand, ThingNameUpdatedEvent). For these simple cases can you / do you use the event also as a command? Do people serialise to a store all commands as well as all events? Just seems to be a little redundant to me.
All lot of this redundancy is for a reason in general and you want to avoid using the same message for two different purposes for a number of reasons:
Sourced events must be versioned when they change since they are stored and re-used (deserialized) when you hydrate an aggregate root. It will make things a bit awkward if the class is also being used as a message.
Coupling is increased, the same class is now being used by command handlers, the domain model and event handlers now. De-coupling the command side from the event can simplify life for you down the road.
Finally clarity. Commands are issued in a language that asks something to be done (imperative generally). Events are representations of what has happened (past-tense generally). This language gets muddled if you use the same class for both.
In the end these are just data classes, it isn't like this is "hard" code. There are ways to actually avoid some of the typing for simple scenarios like code-gen. For example, I know Greg has used XML and XSD transforms to create all the classes needed for a given domain in the past.
I'd say for a lot of simple cases you may want to question if this is really domain (i.e. modeling behavior) or just data. If it is just data consider not using event sourcing here. Below is a link to a talk by Udi Dahan about breaking up your domain model so that not all of it requires event-sourcing. I'm kind of in line with this way of thinking now myself.
http://skillsmatter.com/podcast/design-architecture/talk-from-udi-dahan
After working through some examples and especially the Greg Young presentation (http://www.youtube.com/watch?v=JHGkaShoyNs) I've come to the conclusion that commands are redundant. They are simply events from your user, they did press that button. You should store these in exactly the same way as other events because it is data you don't know if you will want to use it in a future view. Your user did add and then later remove that item from the basket or at least attempt to. You may later want to use this information to remind the user of this at later date.

In salesforce.com can you have multivalued attributes?

I am developing a Novell Identity Manager driver for Salesforce.com, and am trying to understand the Salesforce.com platform better.
I have had really good success to date. I can read pretty much arbitrary object classes out of SFDC, and create eDirectory objects for them, and what not. This is all done and working nicely. (Publisher Channel). Once I got Query events mapped out, most everything started working in the Publisher Channel.
I am now working on sending events back to SFDC (Subscriber channel) when changes occur in eDirectory.
I am using the upsert() function in the SOAP API, and with Novell Identity Manager, you basically build the SOAP doc, and can see the results as you build it. (You can do it in XSLT or you can use the various allowed tokens to build the document in DirXML Script. I am using DirXML Script which has been working well so far.).
The upshot of that comment is that I can build the SOAP document, see it, to be sure I get it right. Which is usually different than the Java/C++ approach that the sample code usually provides. Much more visual this way.
There are several things about upsert() that I do not entirely understand. I know how to blank a value, should I get that sort of event. Inside the <urn:sObjects> node, add a node like (assuming you get your namespaces declared already):
<urn1:fieldsToNull>FieldName</urn1:fieldsToNull>
I know how to add a value (AttrValue) to the attribute (FieldName), add a node like:
<FieldName>AttrValue</FieldName>
All this works and is pretty straight forward.
The question I have is, can a value in SFDC be multi-valued? In eDirectory, a multi valued attribute being changed, can happen two ways:
All values can be removed, and the new set re-added.
The single value removed can be sent as that sort of event (remove-value) or many values can be removed in one operation.
Looking at SFDC, I only ever see Multi-picklist attributes that seem to be stored in a single entry : or ; delimited. Is there another kind of multi valued attribute managed differently in SFDC? And if so, how would one manipulate it via the SOAP API?
I still have to decide if I want to map those multi-picklists to a single string, or a multi valued attribute of strings. First way is easier, second way is more useful... Hmmm... Choices...
Some references:
I have been using the page Sample SOAP messages to understand what the docs should look like.
Apex Explorer is a kicking tool for browsing the database and testing queries. Much like DBVisualizer does for JDBC connected databases. This would have been so much harder without it!
SoapUi is also required, and a lovely tool!
As far as I know there's no multi-value field other than multi-select picklists (and they map to semicolon-separated string). Generally platform encourages you to create a proper relationship with another (possibly new, custom) table if you're in need of having multiple values associated to your data.
Only other "unusual" thing I can think of is how the OwnerId field on certain objects (Case, Lead, maybe something else) can be used to point to User or Queue record. Looks weird when you are used to foreign key relationships from traditional databases. But this is not identical with what you're asking as there will be only one value at a time.
Of course you might be surpised sometimes with values you'll see in the database depending on the viewing user's locale (stuff like System Administrator profile becoming Systeembeheerder in Dutch). But this will be still a single value, translated on the fly just before the query results are sent back to you.
When I had to perform SOAP integration with SFDC, I've always used WSDL files and most of the time was fine with Java code generated out of them with Apache Axis. Hand-crafting the SOAP message yourself seems... wow, hardcore a bit. Are you sure you prefer visualisation of XML over the creation of classes, exceptions and all this stuff ready for use with one of several out-of-the-box integration methods? If they'll ever change the WSDL I need just to regenerate the classes from it; whereas changes to your SOAP message creation library might be painful...

Workflow Foundation: where do I put my data?

I'm new to Workflow Foundation. I've worked my way through one book (K. Scott Allen's Programming Windows Workflow Foundation) which was okay but I'm left with several questions. The biggest one is 'where do I put the data'?
Throughout the book he uses the idea of a Bug Tracking system; the scenario isn't too far from what I want to do. His examples use a simple Bug class with three properties and nothing else and he just makes it a field on his activities and passes it around where necessary. But in the real world a bug report would probably have lines of text that you'd want to be searchable; my scenario certainly does. If all of this text is squirrelled away by the persistence service, how do I get at it for text searches?
In the real world, what do you do with your data?
For simple keys or assciated foreign keys you can use the tracking service to store such key/value pairs in the database too and query them later by using the tracking information in the tracking tables.
For full text search you can store the data in additonal tables you are creating side-by-side to the original tables of the sql-persistence-service. You have to manually add and delete them in case of workflow ending or error.

Hashes vs Numeric id's

When creating a web application that some how displays the display of a unique identifier for a recurring entity (videos on YouTube, or book section on a site like mine), would it be better to use a uniform length identifier like a hash or the unique key of the item in the database (1, 2, 3, etc).
Besides revealing a little, what I think is immaterial, information about the internals of your app, why would using a hash be better than just using the unique id?
In short: Which is better to use as a publicly displayed unique identifier - a hash value, or a unique key from the database?
Edit: I'm opening up this question again because Dmitriy brought up the good point of not tying down the naming to db specific property. Will this sort of tie down prevent me from optimizing/normalizing the database in the future?
The platform uses php/python with ISAM /w MySQL.
Unless you're trying to hide the state of your internal object ID counter, hashes are needlessly slow (to generate and to compare), needlessly long, needlessly ugly, and needlessly capable of colliding. GUIDs are also long and ugly, making them just as unsuitable for human consumption as hashes are.
For inventory-like things, just use a sequential (or sharded) counter instead. If you migrate to a different database, you will just have to initialize the new counter to a value at least as large as your largest existing record ID. Pretty much every database server gives you a way to do this.
If you are trying to hide the state of your counter, perhaps because you're counting users and don't want competitors to know how many you have, I suggest avoiding the display of your internal IDs. If you insist on displaying them and don't want the drawbacks of a hash, you might consider using a maximal-period linear feedback shift register to generate IDs.
I typically use hashes if I don't want the user to be able to guess the next ID in the series. But for your book sections, I'd stick with numerical id's.
Using hashes is preferable in case you need to rebuild your database for some reason, for example, and the ordering changes. The ordinal numbers will move around -- but the hashes will stay the same.
Not relying on the order you put things into a box, but on properties of the things, just seems.. safer.
But watch out for collisions, obviously.
With hashes you
Are free to merge the database with a similar one (or a backup), if necessary
Are not doing something that could help some guessing attacks even a bit
Are not disclosing more private information about the user than necessary, e.g. if somebody sees a user number 2 in your current database log in, they're getting information that he is an oldie.
(Provided that you use a long hash or a GUID,) greatly helping youself in case you're bought by YouTube and they decide to integrate your databases.
Helping yourself in case there appears a search engine that indexes by GUID.
Please let us know if the last 6 months brought you some clarity on this question...
Hashes aren't guaranteed to be unique, nor, I believe, consistent.
will your users have to remember/use the value? or are you looking at it from a security POV?
From a security perspective, it shouldn't matter - since you shouldn't just be relying on people not guessing a different but valid ID of something they shouldn't see in order to keep them out.
Yeah, I don't think you're looking for a hash - you're more likely looking for a Guid.If you're on the .Net platform, try System.Guid.
However, the most important reason not to use a Guid is for performance. Doing database joins and lookups on (long) strings is very suboptimal. Numbers are fast. So, unless you really need it, don't do it.
Hashes have the advantage that you can check if they are valid or not BEFORE performing any check to your database whether they exist or not. This can help you to fend off attacks with random hashes as you don't need to burden your database with fake lookups.
Therefor, if your hash has some kind of well-defined format with for example a checksum at the end, you can check if it's correct without needing to go to the database.