NDepend Average Cyclomatic Complexity without get and set

We are using NDepend 5 and I have some doubts about our Average Cyclomatic Complexity.
When checking how this query is made, I found out that it includes the getters and setters of our properties. It also seems to include auto-property methods. Those methods usually have a CC of 1. I don't like that, because it lowers our average CC and doesn't show the real average of the methods we write.
Is there a way to remove properties from this calculation?

You can define a custom complexity metric with a code query, which can be transformed into a rule if needed (with the warnif count > 0 prefix and a threshold condition):
from t in JustMyCode.Types
let complexity = t.Methods.Where(m => !(m.IsPropertyGetter || m.IsPropertySetter))
.Sum(m => m.CyclomaticComplexity)
orderby complexity descending
select new { t, complexity, t.CyclomaticComplexity }
The screenshot below compares the obtained values.
Btw, NDepend v5 is not supported anymore, and v6 and v2017 brought a lot of new features.
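The effect of excluding property accessors from the average can be illustrated with a small sketch (hypothetical method data in Python, not NDepend's API; the filter mirrors the CQLinq query's IsPropertyGetter/IsPropertySetter test):

```python
# Hypothetical method list: (name, cyclomatic_complexity, is_property_accessor)
methods = [
    ("ProcessOrder", 7, False),
    ("get_Name", 1, True),   # auto-property getter, CC of 1
    ("set_Name", 1, True),   # auto-property setter, CC of 1
    ("Validate", 5, False),
]

def average_cc(methods, include_accessors):
    # Mirror the CQLinq filter: drop getters/setters before averaging
    selected = [cc for _, cc, acc in methods if include_accessors or not acc]
    return sum(selected) / len(selected)

print(average_cc(methods, True))   # 3.5: the trivial accessors drag the average down
print(average_cc(methods, False))  # 6.0: the average over real logic only
```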

Related

StartsWith not working with integer data type

I am getting a System.NullReferenceException (Object reference not set to an instance of an object) when running the code below, where lotId is of integer type:
inventories = inventories.Where(u => u.lotId.ToString().StartsWith(param.Lot));
It used to work in netcoreapp2.0 but does not work in netcoreapp3.1.
The reason it likely worked before is that you were running EF Core 2.x, which enabled client-side evaluation by default, whereas EF Core 3.x has it disabled by default. You can enable it for that DbContext instance, or better, consider an approach that doesn't result in client-side evaluation. For instance, if your lot IDs are 7-digit numbers where the first digit denotes a Lot, then calculate a range to compare:
var lotStart = param.Lot * 1000000;
var lotEnd = lotStart + 999999;
inventories = inventories.Where(u => u.lotId >= lotStart && u.lotId <= lotEnd);
This assumes that the first digit was used to group lots. Client-side evaluation should be avoided where possible because it results in returning far more data to be processed in memory. A client-side-evaluated version like yours would return all inventory records (with whatever filtering the server could still do), then apply the lot ID check only after all of those rows are loaded.
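The range trick turns a string prefix test on a number into a plain comparison the database can translate and index. A quick sketch of the arithmetic (Python, for illustration only; assumes the 7-digit layout described above):

```python
def lot_range(lot):
    # First digit of a 7-digit ID denotes the lot (assumption from the answer)
    start = lot * 1_000_000
    return start, start + 999_999

def in_lot(lot_id, lot):
    # Equivalent of: lot_id >= lotStart AND lot_id <= lotEnd
    start, end = lot_range(lot)
    return start <= lot_id <= end

print(lot_range(3))        # (3000000, 3999999)
print(in_lot(3123456, 3))  # True: falls inside lot 3's range
print(in_lot(4000000, 3))  # False: first digit is 4
```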

How to add index in Database using ELKI java API for Custom POJO with String type fields

I am using DBSCAN to cluster some categorical data using a POJO. My class looks like this
public class Dimension {
private String app;
private String node;
private String cluster;
.............
All my fields are String instead of Integer or Float because they hold discrete/categorical values. The rest of my code is as follows.
final SimpleTypeInformation<Dimension> dimensionTypeInformation = new SimpleTypeInformation<>(Dimension.class);
PrimitiveDistanceFunction<Dimension> dimensionPrimitiveDistanceFunction = new PrimitiveDistanceFunction<Dimension>() {
public double distance(Dimension d1, Dimension d2) {
return simpleMatchingCoefficient(d1, d2);
}
public SimpleTypeInformation<? super Dimension> getInputTypeRestriction() {
return dimensionTypeInformation;
}
public boolean isSymmetric() {
return true;
}
public boolean isMetric() {
return true;
}
public <T extends Dimension> DistanceQuery<T> instantiate(Relation<T> relation) {
return new PrimitiveDistanceQuery<>(relation, this);
}
};
DatabaseConnection dbc = new DimensionDatabaseConnection(dimensionList);
Database db = new StaticArrayDatabase(dbc, null);
db.initialize();
DBSCAN<Dimension> dbscan = new DBSCAN<>(dimensionPrimitiveDistanceFunction, 0.6, 20);
Result result = dbscan.run(db);
Now, as expected, this code works fine for a small dataset but gets very slow when my dataset gets bigger. So I want to add an index to speed up the process. But all the indexes that I could think of require me to implement NumberVector, and my class has only Strings, not numbers.
What index can I use in this case? Can I use the distance function, double simpleMatchingCoefficient(Dimension d1, Dimension d2), to create an IndexFactory?
Thanks in advance.
There are (at least) three broad families of indexes:
Coordinate-based indexes, such as the k-d-tree and R-tree. These work well on dense, continuous variables.
Metric indexes, which require the distance function to satisfy the triangle inequality. These can work on any kind of data, but may still need a fairly smooth distribution of distance values (e.g., they will not help with the discrete metric, that is 0 if x=y and 1 otherwise).
Inverted lookup indexes. These are mostly used for text search, and exploit the fact that for each attribute value only a small subset of the data is relevant. They work well for high-cardinality discrete attributes.
In your case, I'd consider an inverted index. If you have a lot of attributes, a metric index may work, but I doubt that holds, because you use POJOs with strings to store your data.
And of course, profile your code and check if you can improve the implementation of your distance function! E.g., string interning may help: it can reduce string matching to reference-equality testing rather than comparing each character...
First of all, note that the SMC is usually defined as a similarity function, not a distance function, but 1-SMC is the usual transformation into a distance. Just don't confuse the two.
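For clarity, here is the similarity/distance pair sketched in Python (toy attribute tuples standing in for the POJO's fields; this is an illustration, not ELKI code):

```python
def smc(d1, d2):
    # Simple matching coefficient over two equal-length attribute tuples:
    # the fraction of attributes with the same value, a similarity in [0, 1]
    assert len(d1) == len(d2)
    matches = sum(a == b for a, b in zip(d1, d2))
    return matches / len(d1)

def smc_distance(d1, d2):
    # The usual similarity-to-distance transformation: 1 - SMC
    return 1.0 - smc(d1, d2)

a = ("app1", "node3", "clusterA")
b = ("app1", "node7", "clusterA")
print(smc(a, b))           # 2 of 3 attributes match -> similarity 0.666...
print(smc_distance(a, b))  # distance 0.333...
```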
For the simple matching coefficient, you probably will want to build your own inverted index, for your particular POJO data type. Because of your POJO design (Dimension sounds like a very bad name, btw.), this cannot be implemented in a generic, reusable, way easily. That would require expensive introspection, and still require customization: should string matches be case sensitive? Do they need trimming? Should they be tokenized?
Your inverted index will then likely contain a series of maps specific to your POJO:
Map<String, DBIDs> by_app;
Map<String, DBIDs> by_node;
Map<String, DBIDs> by_cluster;
...
and for each attribute, you get the matching DBIDs, and count how often they appear. The most often returned DBIDs have the highest SMC (and hence lowest distance).
At some point, you can stop counting candidates that can no longer make it into the result set. Look up how such searches work in an information-retrieval book.
Such an index is beneficial if the average number of matches for each attribute is low. You can further speed this up by bitmap index compression and such techniques, but that is likely not necessary to do (at some point, it can be attractive to build upon existing tools such as Apache Lucene to handle the search then).
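A minimal sketch of such a per-attribute inverted index, in Python with toy data (dicts stand in for the by_app / by_node / by_cluster maps; record ids stand in for DBIDs; this is an illustration of the scheme, not ELKI code):

```python
from collections import Counter, defaultdict

# Toy records standing in for the POJO: (app, node, cluster)
records = {
    0: ("app1", "node1", "clusterA"),
    1: ("app1", "node2", "clusterB"),
    2: ("app2", "node2", "clusterA"),
}

# One map per attribute: value -> set of record ids (the by_app, by_node,
# by_cluster maps from the answer)
index = [defaultdict(set) for _ in range(3)]
for rid, attrs in records.items():
    for i, value in enumerate(attrs):
        index[i][value].add(rid)

def match_counts(query):
    # For each attribute, fetch the matching ids and count occurrences.
    # The count is the number of matching attributes, so the most often
    # returned ids have the highest SMC (and hence lowest distance).
    counts = Counter()
    for i, value in enumerate(query):
        for rid in index[i].get(value, ()):
            counts[rid] += 1
    return counts

print(match_counts(("app1", "node1", "clusterA")))
# record 0 matches on all 3 attributes; records 1 and 2 on only 1 each
```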

Sphinx Mulit-Level Sort with Randomize

Here is my challenge with Sphinx Sort where I have Vendors who pay for premium placement and those who don't:
I already do a multi-level order including the PaidVendorStatus which is either 0 or 1 as:
order by PaidVendorStatus,Weight()
So in essence I end up with multiple sort groups:
PaidVendorStatus=1, Weight1
....
PaidVendorStatus=1, WeightN
PaidVendorStatus=0, Weight1
...
PaidVendorStatus=0, WeightN
The problem is I have three goals:
Randomly prioritize each vendor in any given sort group
Have each vendor's 'odds' of being randomly assigned top position be equal regardless of how many records they have returned in the group (so if Vendor A has 50 results and VendorB has 2 results they still both have 50% odds of being randomly assigned any given spot)
Ideally, maintain the same result order in any given search (so that if the user searches again, the same order will be displayed)
I've tried various solutions:
Select CRC32(Vendor) as RANDOM...Order by PaidVendorStatus,Weight(),RANDOM
which solves 2 and 3, except that due to the nature of CRC32 it ALWAYS puts the same vendor first (and second, third, etc.), so in essence it does not solve the issue at all.
I tried making a Sphinx sql_attr_string in my Sphinx configuration which was a concatenation of Vendor and the record Title (Select... concat(Vendor,Title) as RANDOMIZER..) and then used that to randomize:
Select CRC32(RANDOMIZER) as RANDOM...
which solves 1 and 3, as now the Title field gets thrown into the randomization mix so that the same Vendor does not always get first billing. However, it fails at 2, since in essence I am only sorting by Title, and thus Vendor B with two results now has a very low chance of being sorted first.
In an ideal world, naturally, I could just order this way:
Order by PaidVendorStatus,Weight(),RAND(Vendor)
but that is not possible.
Any thoughts on this are appreciated. I did, btw, check out (per Barry Hunter's suggestion) this thread on UDFs, but unless I am not understanding it at all (possible), it does not seem to be the solution for this problem.
Well one idea is:
SELECT * FROM (
SELECT *,uniqueserial(vendor_id) AS sorter FROM index WHERE MATCH(...)
ORDER BY PaidVendorStatus DESC ,Weight() DESC LIMIT 1000
) ORDER BY sorter DESC, WEIGHT() DESC
This exploits Sphinx's 'multiple sort' behaviour with a pseudo subquery.
It works because the inner query is sorted by PaidVendorStatus first, so paid vendors' items come first, which in turn affects the order in which uniqueserial() is called.
It's NOT really 'randomising' the results as such; it seems you just want to mix them up so that a single vendor doesn't dominate the results. uniqueserial() works by 'spreading' a particular vendor's results out: the results will tend to cycle through the vendors.
This is tricky, as it exploits a relatively undocumented Sphinx feature: subqueries.
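What uniqueserial() effectively does to the ordering can be sketched in Python with toy data (the leading letter stands in for vendor_id; the serial counter is my reimplementation of the idea, not Sphinx's UDF):

```python
from collections import defaultdict

# Results already sorted by (PaidVendorStatus, weight), best first
results = ["A1", "A2", "A3", "B1", "B2", "C1"]

def vendor_of(r):
    return r[0]  # toy convention: vendor is the first character

# Mimic uniqueserial(): the Nth time a vendor is seen gets serial N
serial = defaultdict(int)
keyed = []
for pos, r in enumerate(results):
    serial[vendor_of(r)] += 1
    keyed.append((serial[vendor_of(r)], pos, r))

# Re-sorting by serial spreads each vendor out, cycling through vendors
interleaved = [r for _, _, r in sorted(keyed)]
print(interleaved)  # ['A1', 'B1', 'C1', 'A2', 'B2', 'A3']
```

Note that vendor A, with three results, still gets more total slots than C; the spreading only prevents one vendor's results from clumping together at the top.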
For the UDF see http://svn.geograph.org.uk/svn/modules/trunk/sphinx/
I still don't have an answer for your biased random (as in 2.),
but I just remembered another feature that can help with 3.: you can supply a specific seed to the random number generator. Typically random generators are seeded from the current time, which gives ever-changing values, but a specific seed gives reproducible ones.
The seed, however, is a number, so you need a predictable but changing number. You could CRC the query.
... Sphinx doesn't support expressions in OPTION, so you would have to calculate the hash in the app:
<?php
$query = $db->Quote($_GET['q']);
$crc = crc32($query);
$sql = "SELECT id,IDIV(WEIGHT(),100) as i,RAND() as r FROM index WHERE MATCH($query)
ORDER BY PaidVendorStatus DESC,i DESC,r ASC OPTION random_seed=$crc";
If you wanted the results to evolve only slowly, add the current date, so each day gives a new selection:
$crc = crc32($query.date('Ymd'));
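The query-derived seed idea can be sketched in Python (zlib.crc32 and random.Random stand in for Sphinx's random_seed option; an illustration of the principle, not Sphinx code):

```python
import random
import zlib
from datetime import date

def stable_seed(query, daily=False):
    # CRC of the query text gives the same seed for the same search,
    # so repeated searches see the same "random" order.  Mixing in the
    # date makes the selection change once per day instead.
    text = query + (date.today().strftime("%Y%m%d") if daily else "")
    return zlib.crc32(text.encode())

def shuffled(items, query):
    # Seeded generator: deterministic shuffle for a given query string
    rng = random.Random(stable_seed(query))
    out = list(items)
    rng.shuffle(out)
    return out

a = shuffled(range(10), "red shoes")
b = shuffled(range(10), "red shoes")
print(a == b)  # True: same query, same order on every search
```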

How to use CQLinq to get metrics of Methods and Fields within a single query

I am calculating the average length of identifiers with CQLinq in NDepend, and I want to get the length of the names of classes, fields and methods. I walked through this page on CQLinq syntax: http://www.ndepend.com/docs/cqlinq-syntax, and I have code like:
let id_m = Methods.Select(m => new { m.SimpleName, m.SimpleName.Length })
let id_f = Fields.Select(f => new { f.Name, f.Name.Length })
select id_m.Union(id_f)
It doesn't work, one error says:
'System.Collections.Generic.IEnumerable' does not
contain a definition for 'Union'...
The other one is:
cannot convert from
'System.Collections.Generic.IEnumerable' to
'System.Collections.Generic.HashSet'
However, according to MSDN, IEnumerable Interface defines Union() and Concat() methods.
It seems to me that I cannot use CQLinq in exactly the same way as LINQ. Anyway, is there a way to get the information from the Types, Methods and Fields domains within a single query?
Thanks a lot.
is there a way to get the information from Types, Methods and Fields domains within a single query?
Not for now, because a CQLinq query can only match a sequence of types, or a sequence of methods, or a sequence of fields, so you need 3 distinct code queries.
In the next version, CQLinq will be improved a lot, and indeed you'll be able to write things like:
from codeElement in Application.TypesAndMembers
select new { codeElement, codeElement.Name.Length }
Next version will be available before the end of the year 2016.

T-SQL speed comparison between LEFT() vs. LIKE operator

I'm creating result paging based on the first letter of a certain nvarchar column, rather than the usual paging on the number of results.
And now I'm faced with the choice of whether to filter results using the LIKE operator or the equality (=) operator.
select *
from table
where name like @firstletter + '%'
vs.
select *
from table
where left(name, 1) = @firstletter
I've tried searching the net for speed comparison between the two, but it's hard to find any results, since most search results are related to LEFT JOINs and not LEFT function.
"Left" vs. "Like": one should always use "Like" when possible where indexes are implemented, because "Like" is not a function and therefore can utilize any indexes you may have on the data.
"Left", on the other hand, is a function, and therefore cannot make use of indexes. This web page describes the usage differences with some examples. What this means is that SQL Server has to evaluate the function for every record that's returned.
"Substring" and other similar functions are also culprits.
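The reason a prefix LIKE can use an index while LEFT() cannot is that LIKE 'b%' is equivalent to a range predicate, which an ordered index can seek directly. A Python sketch, with a sorted list standing in for a B-tree index (illustration only, not SQL Server internals; assumes single-character, lowercase prefixes):

```python
import bisect

# A sorted column stands in for a B-tree index on `name`
names = sorted(["alice", "adam", "bob", "brian", "carol", "beth"])

def prefix_seek(names, letter):
    # LIKE 'b%' is the range ['b', 'c'): seek to the range start and
    # read forward, instead of evaluating a function on every row
    lo = bisect.bisect_left(names, letter)
    hi = bisect.bisect_left(names, chr(ord(letter) + 1))
    return names[lo:hi]

print(prefix_seek(names, "b"))  # ['beth', 'bob', 'brian']
```

Applying LEFT(name, 1) to each row, by contrast, corresponds to scanning the whole list and computing the function per element, which is why the optimizer falls back to a scan.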
Your best bet would be to measure the performance on real production data rather than trying to guess (or ask us). That's because performance can sometimes depend on the data you're processing, although in this case it seems unlikely (but I don't know that, hence why you should check).
If this is a query you will be doing a lot, you should consider another (indexed) column which contains the lowercased first letter of name and have it set by an insert/update trigger.
This will, at the cost of a minimal storage increase, make this query blindingly fast:
select * from table where name_first_char_lower = #firstletter
That's because most databases are read far more often than written to, and this amortises the cost of the calculation (done only on writes) across all reads.
It introduces redundant data but it's okay to do that for performance as long as you understand (and mitigate, as in this suggestion) the consequences and need the extra performance.
I had a similar question, and ran tests on both. Here is my code.
where (VOUCHER like 'PCNSF%'
or voucher like 'PCLTF%'
or VOUCHER like 'PCACH%'
or VOUCHER like 'PCWP%'
or voucher like 'PCINT%')
Returned 1434 rows in 1 min 51 seconds.
vs
where (LEFT(VOUCHER,5) = 'PCNSF'
or LEFT(VOUCHER,5)='PCLTF'
or LEFT(VOUCHER,5) = 'PCACH'
or LEFT(VOUCHER,4)='PCWP'
or LEFT (VOUCHER,5) ='PCINT')
Returned 1434 rows in 1 min 27 seconds.
My data is faster with the LEFT(...,5) version. As an aside, my overall query does hit some indexes.
I would always suggest using the LIKE operator when the search column has an index. I tested the above query in my production environment with select count(column_name) from table_name where left(column_name,3)='AAA' OR left(column_name,3)='ABA' OR ... up to 9 OR clauses. My count returns 7,301,477 records in 4 seconds with LEFT, and in 1 second with LIKE, i.e. where column_name like 'AAA%' OR column_name like 'ABA%' or ... up to 9 LIKE clauses.
Calling a function in the WHERE clause is not best practice. See http://blog.sqlauthority.com/2013/03/12/sql-server-avoid-using-function-in-where-clause-scan-to-seek/
Entity Framework Core users
You can use EF.Functions.Like(columnName, searchString + "%") instead of columnName.StartsWith(...) and you'll get just a LIKE in the generated SQL instead of all this LEFT craziness!
Depending on your needs, you will probably need to preprocess searchString.
See also https://github.com/aspnet/EntityFrameworkCore/issues/7429
This function isn't present in Entity Framework (non-Core) EntityFunctions, so I'm not sure how to do it in EF6.