Turning off adaptive scoring selectively - emacs

I would like to
- Turn off adaptive scoring for one usenet group
or even better
- Turn off adaptive scoring for one sender in one newsgroup
How do I do it?

Look in the Gnus documentation, under Scoring/Score File Format.
‘adapt’
This entry controls the adaptive scoring. If it is ‘t’, the
default adaptive scoring rules will be used. If it is ‘ignore’, no
adaptive scoring will be performed on this group. If it is a list,
this list will be used as the adaptive scoring rules. If it isn’t
present, or is something other than ‘t’ or ‘ignore’, the default
adaptive scoring rules will be used. If you want to use adaptive
scoring on most groups, you’d set ‘gnus-use-adaptive-scoring’ to
‘t’, and insert an ‘(adapt ignore)’ in the groups where you do not
want adaptive scoring. If you only want adaptive scoring in a few
groups, you’d set ‘gnus-use-adaptive-scoring’ to ‘nil’, and insert
‘(adapt t)’ in the score files of the groups where you want it.


Is rule engine suitable for validating data against set of rules?

I am trying to design an application that allows users to create subscriptions based on different configurations - expressing their interest to receive alerts when those conditions are met.
While evaluating options, I considered using a generic rule engine such as Drools, which seemed a natural fit for this problem at a high level. But after digging deeper and giving it more thought, I doubt whether a business rule engine is the right thing to use.
I see a rule engine as something that selects a rule based on a predefined condition and applies that rule to the data to produce an outcome. My requirement, by contrast, is to start with the data (the event that is generated) and identify all the rules (subscriptions configured by users) that the event satisfies, so that alerts can be generated for all those subscribers.
To give an example, a hypothetical subscription from one user could be to be alerted when a product on Amazon drops below $10 in the next 7 days. Another user might have created a subscription to be notified when a product on Amazon drops below $15 within the next 30 days and also offers free one-day shipping for Prime members.
After a bit of thought, I have settled on storing the rules/subscriptions in a relational DB and identifying which subscriptions should fire an alert for an event by querying the DB.
My main reason for choosing this approach is volume: I will begin with about 1000 complex rules/subscriptions, and the number will grow exponentially as more users are added to the system. With the query approach I can run a single query that validates all rules in one go, whereas the rule engine approach would require multiple validations, depending on the number of rules configured.
While I know my DB approach would work (though it may not be efficient), I wanted to understand whether a rule engine can be used for such purposes and scale well as the number of rules increases. (Performance is of utmost importance, as about 1000+ events are to be processed per minute.)
If a rule engine is not the right approach, what other options can I explore rather than writing my own implementation?
You are getting it wrong. A standard rule engine selects the rules to execute based on the data. Each rule's constraints are evaluated against the data you insert into the rule engine, and if all constraints in a rule match the data, the rule is executed. I suggest you try Drools.
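To make the "data selects the rules" idea concrete, here is a minimal sketch in Python. It is not Drools and does not use its API or its matching algorithm; it is a naive linear scan over subscriptions, each modelled as a list of constraints that must all hold. The names `PriceEvent`, `Subscription`, and `matching_subscribers`, and the `day` field, are hypothetical and chosen only to mirror the two example subscriptions from the question.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PriceEvent:
    product: str
    price: float
    free_prime_shipping: bool
    day: int  # days elapsed since the subscription window opened (assumed field)


@dataclass
class Subscription:
    user: str
    # A rule fires only if every constraint accepts the event.
    constraints: List[Callable[[PriceEvent], bool]]

    def matches(self, event: PriceEvent) -> bool:
        return all(check(event) for check in self.constraints)


def matching_subscribers(event: PriceEvent, subscriptions: List[Subscription]) -> List[str]:
    """Insert one event and return the users whose rules it satisfies."""
    return [s.user for s in subscriptions if s.matches(event)]


subscriptions = [
    # "below $10 in the next 7 days"
    Subscription("alice", [lambda e: e.price < 10, lambda e: e.day <= 7]),
    # "below $15 within 30 days, with free Prime shipping"
    Subscription("bob", [lambda e: e.price < 15,
                         lambda e: e.day <= 30,
                         lambda e: e.free_prime_shipping]),
]

event = PriceEvent(product="widget", price=12.0, free_prime_shipping=True, day=3)
print(matching_subscribers(event, subscriptions))  # ['bob']
```

A scan like this evaluates every rule for every event, which is exactly the cost the question worries about; a real rule engine or an indexed database query is there to avoid that.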

Can I exclude some intents in some nodes?

In some situations I need the bot to classify a sentence against all intents except a few, that is, to compare the sentence only to certain intents and not to others.
That way I avoid cases where two intents may affect each other's confidence.
Can I do something like that?
You can do it indirectly.
In each node you can specify the matching criteria, i.e., predicates that need to be true. The expression language for intents lets you access their properties, so you could check which of the two intents in question has the higher confidence.
Depending on the exact situation (you did not provide any details) you might need to set alternate_intents to true to have more detected intents returned.
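The same comparison can also be done outside the dialog, on the list of detected intents. The sketch below is plain Python, not the dialog expression language; it assumes each detected intent arrives as a dictionary with `intent` and `confidence` keys, and the intent names and the helper `best_allowed_intent` are made up for illustration.

```python
def best_allowed_intent(intents, excluded):
    """Return the highest-confidence intent whose name is not excluded, or None."""
    allowed = [i for i in intents if i["intent"] not in excluded]
    return max(allowed, key=lambda i: i["confidence"], default=None)


# Hypothetical classifier output with alternate intents included.
detected = [
    {"intent": "order_status", "confidence": 0.62},
    {"intent": "cancel_order", "confidence": 0.58},
    {"intent": "greeting", "confidence": 0.11},
]

# In this node we do not want "order_status" to be considered.
choice = best_allowed_intent(detected, excluded={"order_status"})
print(choice["intent"])  # cancel_order
```

Note that this only ignores the excluded intents after classification; it does not change the confidence values the classifier assigned.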

Migrating to Bounded Context

I currently have a Web API project in which all the system processing lives in the same solution. I'm breaking this out into separate solutions so that they can be run independently (e.g. as an Azure WebJob), as I don't want to have to redeploy the Web project if something in the backend has changed.
My issue with this is that even though I have separated the logic, the solutions are tied together by a single context, so if I make a change in one I have to redeploy them all because the migrations won't match up.
So that's why I've been looking at Bounded Context and DDD. I'm looking at how to break this up but having trouble understanding how relationships work.
A lot of the site is administrative (i.e. creating entities, no actual processing), so I was going to split contexts around this, e.g.:
A user adds and maintains currency conversion rates (this is two entities in total).
A user adds and maintains details on how to process payments (note that this is not processing payments, it only holds information about PayPal account details etc).
So I was splitting the contexts up by this. Does this sound reasonable to start with (there are a lot more like this, such as tax bands, charge structures etc)?
If this is the way to go, how do I handle relationships between those two contexts? As an example:
A payment method requires a link to an 'active' currency conversion. I understand I can just have this as an Id, but I need to check its state so need access to the model.
A currency conversion can only be set to 'Inactive' if there are no payment methods currently using it. Again this needs access to the other model.
So logically the models need access to each other, how would this be included in the context? Can I add navigation properties to a model in a different context? Or should I add it as a separate DbSet and possibly map using a view?
Thanks
"So I was splitting the context's up by this, does this sound reasonable to start with (there are a lot more like this such as tax bands, charge structures etc)?"
"So that they can be run and deployed independently" may not be a sufficient heuristic for deciding when to split Bounded Contexts. It addresses one aspect of the solution space, but if you haven't looked closely enough at the problem space, you'll suffer from a misalignment between BCs and subdomains that can cause a lot of friction. You might end up always deploying a cluster of seemingly unrelated "independently deployable units" together because you didn't realize they talk about the same thing.
Identifying subdomains is the product of distilling your business: separating the big functional areas and defining which parts are your core domain and which are ancillary activities. Each subdomain has its own specific semantics (Ubiquitous Language). In your case, as has been pointed out in the comments, Currency Conversion and Payment Methods might well be part of the same subdomain (Payment?). That does not automatically mean they should also be in the same BC, but it might be a good idea to keep subdomains aligned 1-to-1 with BCs, as additional BCs come at a cost.
Back to deployability: even if it can be one beneficial effect of Bounded Contexts, BCs do not always translate easily into independent units of deployment. Context mapping patterns (Shared Kernel, Customer Supplier, etc.) and BC communication in general can lead to a model, and therefore a part of a codebase, being shared by multiple BCs. Code and API synchronization issues arise that call into question a simplistic "deployable free electron" view.
Just because you're using the Bounded Context approach doesn't mean you have to use DDD's tactical patterns (Aggregate Root, invariants, etc.) inside each BC. Using them should be an educated decision to trade solution space complexity off for problem space manageability. If "Currency Conversion can only be set to inactive..." is the only rule pertaining to payment method and currency management in your business, it might not be worth the bother to give that Bounded Context a full-fledged rich domain model. CRUD could be better suited there.

how to categorize "tastekid" and "clerkdogs" recommender system

Which category of recommender system do Tastekid and Clerkdogs belong to? Neither seems to require any ratings from users.
I'm not sure about Clerkdogs, but Tastekid seems to be a collaborative filtering system. Even when users don't provide ratings, you can still apply collaborative filtering techniques. For instance, you can use a binary representation where 1 means a user liked an item and 0 means they didn't like it (or that their opinion is unknown).
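As a rough illustration of collaborative filtering on binary data, the sketch below stores only the items each user liked (the 1s) and scores unseen items by the Jaccard similarity of the users who liked them. This is one simple user-based scheme picked for illustration, not a description of how Tastekid actually works; the user names, titles, and function names are invented.

```python
def jaccard(a, b):
    """Similarity of two like-sets: shared items divided by all items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, likes, top_n=3):
    """Rank items the target has not liked by the similarity of users who did."""
    scores = {}
    for user, items in likes.items():
        if user == target:
            continue
        sim = jaccard(likes[target], items)
        for item in items - likes[target]:
            scores[item] = scores.get(item, 0.0) + sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, score in ranked[:top_n] if score > 0]


# Binary data: an item in the set means 1 (liked); absence means 0 or unknown.
likes = {
    "ann": {"Alien", "Blade Runner"},
    "ben": {"Alien", "Blade Runner", "Dune"},
    "cat": {"Amelie"},
}

print(recommend("ann", likes))  # ['Dune']
```

Ann shares two of three items with Ben and nothing with Cat, so only Ben's extra item is recommended.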

How do I adapt my recommendation engine to cold starts?

I am curious what methods or approaches exist to overcome the "cold start" problem: when a new user or item enters the system, making recommendations is hard because there is no information about the new entity.
I can think of doing some prediction-based recommendation (based on gender, nationality and so on).
You can cold start a recommendation system.
There are two types of recommendation systems: collaborative filtering and content-based. Content-based systems use meta data about the things you are recommending; the question is then which meta data is important. Collaborative filtering doesn't care about the meta data. It just uses what people did or said about an item to make a recommendation, so you don't have to worry about which terms in the meta data are important. In fact, you don't need any meta data at all to make the recommendation. The problem with collaborative filtering is that you need data. Until you have enough data, you can use content-based recommendations. You can also provide recommendations based on both methods: start out 100% content-based, then mix in collaborative filtering as you get more data.
That is the method I have used in the past.
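A minimal sketch of that blending, assuming both recommenders already produce a score in the same range for a given item. The linear ramp and the `full_trust_at=50` threshold are arbitrary choices for illustration, not values from the answer above.

```python
def collaborative_weight(n_interactions, full_trust_at=50):
    """Share of the final score given to collaborative filtering (0.0 to 1.0)."""
    return min(n_interactions / full_trust_at, 1.0)


def hybrid_score(content_score, collab_score, n_interactions, full_trust_at=50):
    """Blend the two scores; with no interaction data the result is 100% content-based."""
    w = collaborative_weight(n_interactions, full_trust_at)
    return (1 - w) * content_score + w * collab_score


# The same item scored as interaction data accumulates.
for n in (0, 25, 100):
    print(n, hybrid_score(content_score=0.8, collab_score=0.2, n_interactions=n))
```

With no interactions the content-based score is used as-is, and once the threshold is reached the collaborative score takes over completely.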
Another common technique is to treat the content-based portion as a simple search problem. You just put in meta data as the text or body of your document then index your documents. You can do this with Lucene & Solr without writing any code.
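To show what "treat it as a search problem" means without setting up Lucene or Solr, here is a toy stand-in in Python: a tiny inverted index over item meta data, queried with the meta data of an item the user already likes. It ranks by the number of shared terms only, which is far cruder than a real search engine's scoring; the catalog and function names are invented.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document ids whose meta data contains it."""
    index = {}
    for doc_id, text in documents.items():
        for token in set(tokenize(text)):
            index.setdefault(token, set()).add(doc_id)
    return index


def search(index, query, exclude=()):
    """Return document ids ordered by how many query terms they share."""
    hits = Counter()
    for token in set(tokenize(query)):
        for doc_id in index.get(token, ()):
            if doc_id not in exclude:
                hits[doc_id] += 1
    return [doc_id for doc_id, _ in sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))]


catalog = {
    "alien": "science fiction horror space",
    "dune": "science fiction desert epic",
    "amelie": "romance comedy paris",
}
index = build_index(catalog)

# "More like this": use the liked item's own meta data as the query.
print(search(index, catalog["alien"], exclude={"alien"}))  # ['dune']
```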
If you want to know how basic collaborative filtering works, check out Chapter 2 of "Programming Collective Intelligence" by Toby Segaran.
Maybe there are times you just shouldn't make a recommendation? "Insufficient data" should qualify as one of those times.
I just don't see how prediction recommendations based on "gender, nationality and so on" will amount to more than stereotyping.
IIRC, places such as Amazon built up their databases for a while before rolling out recommendations. It's not the kind of thing you want to get wrong; there are lots of stories out there about inappropriate recommendations based on insufficient data.
I'm working on this problem myself, but this paper from Microsoft on Boltzmann machines looks worthwhile: http://research.microsoft.com/pubs/81783/gunawardana09__unified_approac_build_hybrid_recom_system.pdf
This has been asked several times before (naturally, I cannot find those questions now :/), but the general conclusion was that it's better to avoid such recommendations. In various parts of the world the same names belong to different sexes, and so on ...
Recommendations based on "similar users liked..." clearly must wait. You can give out coupons or other incentives to survey respondents if you are absolutely committed to doing predictions based on user similarity.
There are two other ways to cold-start a recommendation engine.
Build a model yourself.
Get your suppliers to fill in key information to a skeleton model. (Also may require $ incentives.)
Lots of potential pitfalls in all of these, which are too common sense to mention.
As you might expect, there is no free lunch here. But think about it this way: recommendation engines are not a business plan. They merely enhance the business plan.
There are three things needed to address the Cold-Start Problem:
The data must have been profiled such that you have many different features (with product data the term used for 'feature' is often 'classification facets'). If you don't properly profile data as it comes in the door, your recommendation engine will stay 'cold' as it has nothing with which to classify recommendations.
MOST IMPORTANT: You need a user-feedback loop with which users can review the personalization engine's suggestions. For example, a Yes/No button for 'Was This Suggestion Helpful?' should queue a review that moves participants from one training dataset (i.e. the 'Recommend' training dataset) to another training dataset (i.e. the 'DO NOT Recommend' training dataset).
The model used for (Recommend/DO NOT Recommend) suggestions should never be considered to be a one-size-fits-all recommendation. In addition to classifying the product or service to suggest to a customer, how the firm classifies each specific customer matters too. If functioning properly, one should expect that customers with different features will get different suggestions for (Recommend/DO NOT Recommend) in a given situation. That would be the 'personalization' part of personalization engines.
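The feedback loop in the second point can be sketched as two sets of training examples that a Yes/No answer moves entries between. This is a simplification: the answer describes queuing a review, whereas the sketch moves the entry immediately. The `FeedbackLoop` class and the (customer segment, item) pairs are hypothetical; keying on the segment is meant to reflect the third point, that the same item may be right for one kind of customer and wrong for another.

```python
class FeedbackLoop:
    """Keeps 'Recommend' and 'DO NOT Recommend' examples as (segment, item) pairs."""

    def __init__(self, recommend=()):
        self.recommend = set(recommend)
        self.do_not_recommend = set()

    def record(self, pair, helpful):
        """Apply one 'Was This Suggestion Helpful?' answer to the training sets."""
        if helpful:
            self.do_not_recommend.discard(pair)
            self.recommend.add(pair)
        else:
            self.recommend.discard(pair)
            self.do_not_recommend.add(pair)


loop = FeedbackLoop(recommend={("student", "laptop"), ("retiree", "laptop")})

# A retiree answers "No" to the laptop suggestion; students are unaffected.
loop.record(("retiree", "laptop"), helpful=False)

print(sorted(loop.recommend))         # [('student', 'laptop')]
print(sorted(loop.do_not_recommend))  # [('retiree', 'laptop')]
```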