Type Rank in NDepend

1) In our application we have found two different ranks: one is 287 and another is 409. Which one is the better rank? What code changes does a developer have to make to get a better rank for all of the methods?
2) With the help of the rank, how can we say that our method has successfully passed all of the NDepend rules?

A higher or lower Type Rank or Method Rank value is not better or worse.
It simply indicates the popularity of a type or method, i.e. whether it is used a lot or not. Internally, the famous PageRank algorithm is applied to the graph of types and to the graph of methods.
The Type Rank documentation is here.
Recommendation: types with a high TypeRank should be tested more carefully, because bugs in such types will likely be more catastrophic.
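For intuition only, here is a minimal sketch of that idea using Python and networkx; the type names and "uses" edges below are made up, and this is not NDepend's actual implementation:

# Illustrative sketch: PageRank over a hypothetical "type X uses type Y" graph.
import networkx as nx

g = nx.DiGraph()
# An edge X -> Y means "type X uses type Y" (all names are made up).
g.add_edges_from([
    ("OrderController", "OrderService"),
    ("OrderService", "OrderRepository"),
    ("InvoiceService", "OrderRepository"),
    ("ReportJob", "OrderRepository"),
])

# The type that many others depend on (OrderRepository) gets the highest score.
ranks = nx.pagerank(g, alpha=0.85)
for type_name, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{type_name}: {score:.3f}")

The point of the sketch is only that the rank reflects how much the rest of the code base depends on a type, not how "good" the type is.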

Related

Imbalanced multiclass classification using company names

I have the classification scenario below, in which I'm getting very low F1, precision, recall, and other metrics.
The target is multiclass (~200 classes) and highly imbalanced.
I only use company names as the classifier input (mostly 1-2 words, with a maximum of 8 words), no other fields (like description, etc.).
Training data: ~100k+ records.
Preprocessing: removal of numeric characters, special characters, and stopwords.
I have very limited processing resources (that's why I always get memory errors when I try oversampling techniques like SMOTE, distance_smote for multiclass, etc.).
I tried different vectorizations/embeddings/tokenizers like word2vec, TF-IDF, fastText, BERT, RoBERTa, etc., but to no avail.
I tried (and fine-tuned) different algorithms (neural networks, SVMs, trees, boosting, etc.), but also got low scores.
I also did cost-sensitive learning (using class weights), but it only decreased my scores.
I have tried all the options I know, but the scores are not increasing. Can you recommend other options, or do you think any part of the process may be wrong or should be discarded? Thank you!
[Image: distribution of target labels]
[Image: sample observations]
There is essentially no way to know that 'Exxon' is an oil company, and 'Apple' a computer company, and 'McDonalds' a fast-food chain, just from their company names.
Even if you have a list of every other company in the world, by name and type, that's not enough to make the deduction for these last 3. Only other outside info – like a few sentences about them, or other data – could classify them.
In fact, while company names sometimes describe their exact field-of-commerce, often they're totally arbitrary, as that gives them more freedom to range over many products/services, or create their own unique associations with the name (aka branding).
So I strongly suspect your (unshown) names & (unshown) labels are just too arbitrary for the data you're using to get very good at the task you're attempting.
Is there a real-world situation where someone will only have a company name – no other info, or research options – and benefit from correctly guessing the class? If so, more specifics about the situation might help generate more specific tactical recommendations. But mainly such recommendations will be: get richer data about the targets of the classification.
You might squeeze a little more out of vague trends in corporate naming via better preprocessing/feature-extraction. You may want to keep numbers, special characters, and punctuation in some form, as they might carry extra slight hints. Using subwords (character n-grams) might also reveal shared word-roots used even in made-up names; see the sketch below.
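Under your low-memory constraints, one cheap baseline along those lines is a character n-gram TF-IDF with a sparse linear model and class weights instead of oversampling. This is only a sketch; the file name and column names ("companies.csv", "name", "label") are hypothetical:

# Character n-gram baseline for company-name classification (illustrative only).
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

df = pd.read_csv("companies.csv")  # hypothetical file with "name" and "label" columns

X_train, X_test, y_train, y_test = train_test_split(
    df["name"], df["label"], test_size=0.2, stratify=df["label"], random_state=42
)

model = make_pipeline(
    # char_wb n-grams keep digits/punctuation as weak hints and capture word roots.
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5), min_df=2),
    # A sparse linear model is memory-friendly; class_weight="balanced" addresses
    # imbalance without SMOTE-style oversampling.
    LogisticRegression(max_iter=1000, class_weight="balanced"),
)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test), zero_division=0))

Even so, expect the ceiling to stay low for the reasons above: many names simply carry no signal about the class.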

Regarding Microservices Fuzzy Boundaries

I am reading Microservices Patterns by Chris Richardson. In the book he gives an example in section 5.2.1, "The problem with fuzzy boundaries", which I was not able to understand.
Here is the link to read it online. Can someone please look into section 5.2.1 and help me understand what exactly the issue with fuzzy boundaries is?
I especially did not clearly understand the statement below:
In this scenario, Sam reduces the order total by $X and Mary reduces it by $Y. As a result, the Order is no longer valid, even though the application verified that the order still satisfied the order minimum after each consumer’s update
Regarding the above statement, can someone please explain to me why the Order is no longer valid?
The business problem that Chris Richardson is using in this example assumes that (a) the system should ensure that orders are always valid, and (b) a valid order exceeds some minimum amount.
The minimum amount is checked against the sum of the order_items associated with a specific order.
The "fuzzy boundary" issue comes about because the code in question allows Sam and Mary to manipulate order_items directly; in other words, writing changes to one order item does not lock the other items of the order. Each of them therefore validates the minimum against a total that does not yet include the other's reduction: if the minimum is $100 and the total is $110, Sam's $10 reduction and Mary's $10 reduction each pass the check individually, yet together they leave a $90 order.
If Sam and Mary were forced to acquire a lock on the entire order before validating their changes, then you wouldn't have a problem; the second person would see the changes made by the first.
Alternatively, locking at the level of the order_item would be fine if you weren't trying to ensure that the set of order items satisfies some property. Take away the constraint on the total order cost, and Sam and Mary only need to acquire locks on their specific items.
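For illustration only (this is not the book's code), here is a minimal sketch of the "lock the entire order before validating" idea using an optimistic version check; the Order class, the ORDER_MINIMUM value, and the item names are all made up:

# Sketch: whole-order validation guarded by an optimistic version check, so two
# concurrent edits cannot both pass validation against the same stale total.
from dataclasses import dataclass, field

ORDER_MINIMUM = 100  # hypothetical business rule

@dataclass
class Order:
    items: dict = field(default_factory=dict)  # item_id -> price
    version: int = 0

def reduce_item(order: Order, expected_version: int, item_id: str, new_price: int) -> None:
    # The whole-order check and the write happen against a single version of the order.
    if order.version != expected_version:
        raise RuntimeError("Order changed concurrently; re-read it and retry")
    proposed_total = sum(order.items.values()) - order.items[item_id] + new_price
    if proposed_total < ORDER_MINIMUM:
        raise ValueError("Change rejected: order would fall below the order minimum")
    order.items[item_id] = new_price
    order.version += 1

order = Order(items={"sam_item": 60, "mary_item": 50})  # total is 110
stale_version = order.version
reduce_item(order, stale_version, "sam_item", 50)       # total becomes 100, still valid
try:
    reduce_item(order, stale_version, "mary_item", 40)  # would make the total 90
except RuntimeError as err:
    print(err)  # Mary must re-read the order, see Sam's change, and revalidate

Without the version check, both reductions would validate against the original total and the order would end up below the minimum, which is exactly the fuzzy-boundary failure described in the book.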

Evaluating an NLP classifier with annotated data

If we want to evaluate an NLP classifier with data annotated by two annotators who do not completely agree on the annotations, what is the procedure?
That is, should we compare the classifier output with just the portion of the data the annotators agreed on? Or with just one annotator's data? Or with both of them separately and then compute the average?
Taking the majority vote between annotators is common. Throwing out disagreements is also done.
Here's a blog post on the subject:
Suppose we have a bunch of annotators and we don’t have perfect agreement on items. What do we do? Well, in practice, machine learning evals tend to either (1) throw away the examples without agreement (e.g., the RTE evals, some biocreative named entity evals, etc.), or (2) go with the majority label (everything else I know of). Either way, we are throwing away a huge amount of information by reducing the label to artificial certainty. You can see this pretty easily with simulations, and Raykar et al. showed it with real data.
What's right for you depends heavily on your data and on how the annotators disagree. For starters, why not evaluate only on the items they agree on, and then separately compare the model's output to the items they didn't agree on?
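A minimal sketch of both options (and of reporting the agreement itself); the toy labels below are made up:

# Sketch: inter-annotator agreement plus two evaluation strategies.
from sklearn.metrics import accuracy_score, cohen_kappa_score

annotator_a = ["pos", "neg", "pos", "neg", "pos"]
annotator_b = ["pos", "neg", "neg", "neg", "pos"]
predictions = ["pos", "neg", "pos", "pos", "pos"]

# Agreement (e.g. Cohen's kappa) gives context for how high any model score can realistically be.
print("Cohen's kappa:", cohen_kappa_score(annotator_a, annotator_b))

# Option 1: evaluate only on the items both annotators agree on.
agreed = [i for i, (a, b) in enumerate(zip(annotator_a, annotator_b)) if a == b]
print("Accuracy on agreed items:",
      accuracy_score([annotator_a[i] for i in agreed], [predictions[i] for i in agreed]))

# Option 2: score against each annotator separately and report both (or the average).
print("vs. annotator A:", accuracy_score(annotator_a, predictions))
print("vs. annotator B:", accuracy_score(annotator_b, predictions))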

Designing a REST URL

I have the following problem. Please give me your input.
I require a list of transactions which can be filtered by:
a. Pending/Completed status
b. Given account numbers
c. Given customer ids
d. Given categories (category A, category B), and so on.
Any of these four filters can be used, and all of them are optional.
I am thinking of passing the above four options as query parameters and having a URL something like this: http://localhost:8080/Transaction/?status=pending&customerid=3,4&category=catA
Is this a good design for this requirement?
[EDIT]
I do not know if it is good design to pass nouns as query parameters.
As far as REST URI design is concerned, you have to identify the resources first, then the operations, and how they are linked to each other.
Moreover, you should use nouns in your URIs and try to minimize the number of query parameters.
For this scenario, you basically want to search for transactions based on some conditions, as you have mentioned:
http://localhost:8080/transaction?status=pending&customerid=3,4&category=catA
You are right, but the implementation will need to split comma-separated query parameter values (such as customerid=3,4) to extract the individual values.
As a general practice, Consumer and Account can be treated as separate resources:
http://localhost:8080/consumer/{consumerId}/account/{accountNumber}/transaction?status=pending
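For what it's worth, here is a minimal sketch of the query-parameter variant; the framework choice (Flask) and the handler name are my own assumptions, not something from the question:

# Sketch: one endpoint accepting the optional filters as query parameters,
# splitting comma-separated values such as customerid=3,4.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/transaction")
def list_transactions():
    filters = {
        "status": request.args.get("status"),  # e.g. pending or completed
        "account_numbers": [v for v in request.args.get("accountnumber", "").split(",") if v],
        "customer_ids": [v for v in request.args.get("customerid", "").split(",") if v],
        "categories": [v for v in request.args.get("category", "").split(",") if v],
    }
    # A real implementation would query the transaction store with these filters;
    # here the parsed filters are simply echoed back.
    return jsonify(filters)

# Example request:
#   GET /transaction?status=pending&customerid=3,4&category=catA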

Most helpful NDepend CQL queries

A client I work for has begun using NDepend as a replacement for FXCop, and the "architect" has compiled a list of practically unusable CQL queries, which I gather he has taken from advice from the NDepend website.
An example of what (I think) is an unhelpful query:
WARN IF Count > 0 IN
SELECT METHODS WHERE PercentageComment < 20
AND NbLinesOfCode > 10
i.e., a method must have at least 2 lines of comments for every 10 lines of code.
So what I am trying to gather is a useful set of queries that we can use as developers.
Please only provide a single query per response (with description) so that it can be voted accordingly.
Xian, now that CQLinq (Code Rule over LINQ Query) is released, dozens of new default rules are available and most existing ones have been enhanced.
Here are ten of my preferred ones:
Avoid namespaces dependency cycles
UI layer shouldn't use directly DB types
Types with disposable instance fields must be disposable
Types that used to be 100% covered by test but not anymore
Avoid transforming an immutable type into a mutable one
Avoid making complex methods even more complex
API Breaking Changes: Types
Potentially dead Methods
Namespace name should correspond to file location
Methods should be declared static if possible