This is a question about the definition of a class.
Of course I have read the endless examples on the Internet of what should be called a class. I have read that it is all the verbs and nouns that make up a thing. I understand the concept of a car class with properties like size and colour, and methods like drive.
I also understand the idea that a class should have only one responsibility and adhere to the other SOLID principles.
My problem relates to a program I have developed.
The responsibility of the program is to extract all the similar words from a document. It is therefore not a 'noun' like a car or an animal, but more of a 'verb' type of class, I suppose.
In order to do this, the program iterates through a folder of text files, extracts all the text, splits the text by line and then into 20-character chunks, compares each of the chunks in one file with all of the others by similarity, keeps only the words that are similar between two files, cleans the words to get rid of various characters, and then adds the words to a text file. It repeats this for all the files in the folder.
So I have one responsibility for the class and I have written methods for each of the phrases between the commas.
Having read more about class design, it occurs to me that some of these methods might be classes in their own right. If a class is defined by having a single responsibility, then presumably I could define more classes instead of these methods. For example, why don't I have a class to find word similarity with only one method?
So my question is: how do I define a class on a single-responsibility basis if a method also has a single responsibility, and the class doesn't define a thing but more of an action? What are the boundaries of what defines a class?
Please, no 'Have you read...' answers, because I have read them all. I'm looking for a simple explanation with a well-illustrated example (a conceptual example is fine).
The term "single responsibility" is very nebulous. I find it much easier to think of it in terms of cohesion and coupling. In short, we have to get things that tend to change together (i.e. are strongly cohesive) into one class and things that don't (i.e. are loosely coupled) into separate classes.
In practice that means things that tend to work with the same "data" belong to the same class. This can be easily enforced if data does not leave the object. Even more pragmatically that means avoiding "getter" methods that return data from an object.
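A minimal sketch of the "data does not leave the object" idea, in Java to match the design further down. The `Account` class and its methods are hypothetical and chosen only for illustration: instead of exposing the balance through a getter and letting callers decide what to do with it, the object keeps the data and offers the behaviour that needs it.

```java
public class TellDontAsk {

    // Hypothetical class: the balance never leaves the object through a getter.
    static final class Account {
        private long balanceCents;

        Account(long openingCents) {
            this.balanceCents = openingCents;
        }

        // The rule that uses the balance lives next to the balance,
        // instead of callers doing "if (account.getBalance() >= x) ...".
        boolean withdraw(long cents) {
            if (cents <= 0 || cents > balanceCents) {
                return false;
            }
            balanceCents -= cents;
            return true;
        }

        // Output goes to a destination the caller provides, not via a getter.
        void printStatement(StringBuilder out) {
            out.append("balance=").append(balanceCents);
        }
    }

    public static void main(String[] args) {
        Account account = new Account(1_000);
        System.out.println(account.withdraw(300));   // true
        System.out.println(account.withdraw(5_000)); // false
        StringBuilder statement = new StringBuilder();
        account.printStatement(statement);
        System.out.println(statement);               // balance=700
    }
}
```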
Regarding your problem: you're saying it's not a noun, but only because you don't think of it that way. What is your "business logic"? To collect SimilarWords from a Document. Both are nouns. Your phrases are all about what steps should be taken. Rethink your application in terms of what things are involved and what actions those things would be able to do for you.
Here is a short/incomplete design for the things you describe:
public interface Folder {
    public SimilarWords extract();
}
Meaning: I want to extract SimilarWords from a Folder.
import java.util.function.Consumer;

public interface TextFile {
    public void chunk(Consumer<Chunk> chunkConsumer);
}
Meaning: TextFile chunks the text.
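public class Comparison {
    private final TextFile file1, file2;
    public Comparison(TextFile file1, TextFile file2) { this.file1 = file1; this.file2 = file2; }
    public SimilarWords extract() { /* compare the chunks of file1 with those of file2 */ throw new UnsupportedOperationException("not implemented yet"); }
}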
Meaning: two TextFiles are compared, and that comparison is where the SimilarWords come from. You didn't use the word "Comparison" explicitly; I made that up.
And of course SimilarWords need to be added together for all file pairs (?) and then written to some output:
import java.io.IOException;
import java.io.OutputStream;

public interface SimilarWords {
    public SimilarWords add(SimilarWords other);
    public void writeTo(OutputStream output) throws IOException;
}
So that would be a proper OO design. I didn't catch all the details of your domain, so this model may not be exactly what you want, but I think you get the point.
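To show how these pieces might fit together, here is a runnable sketch with hypothetical in-memory implementations. It assumes Java 16+ (records), turns the interfaces into concrete records for brevity, and simplifies "similar" to "the same cleaned word occurs in both files"; a real similarity measure would replace `Comparison.extract()`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Consumer;

public class SimilarWordsDemo {

    // A piece of at most 20 characters taken from one line of a file.
    record Chunk(String text) {}

    // Hypothetical in-memory text file; a real one would read from disk.
    record TextFile(String content) {
        // Split by line, then into 20-character chunks. Words that straddle a
        // 20-character boundary get cut, as in the question's approach.
        void chunk(Consumer<Chunk> chunkConsumer) {
            for (String line : content.split("\\R")) {
                for (int i = 0; i < line.length(); i += 20) {
                    chunkConsumer.accept(new Chunk(line.substring(i, Math.min(line.length(), i + 20))));
                }
            }
        }
    }

    // Sorted set of cleaned words; add() returns a new combined value.
    record SimilarWords(Set<String> words) {
        SimilarWords add(SimilarWords other) {
            Set<String> all = new TreeSet<>(words);
            all.addAll(other.words());
            return new SimilarWords(all);
        }

        void writeTo(OutputStream output) throws IOException {
            output.write(String.join("\n", words).getBytes(StandardCharsets.UTF_8));
        }
    }

    // Simplification: "similar" means the same cleaned word occurs in both files.
    record Comparison(TextFile file1, TextFile file2) {
        SimilarWords extract() {
            Set<String> common = words(file1);
            common.retainAll(words(file2));
            return new SimilarWords(common);
        }

        // Clean each word: keep letters only, lower-case, drop empty results.
        private static Set<String> words(TextFile file) {
            Set<String> result = new TreeSet<>();
            file.chunk(chunk -> {
                for (String w : chunk.text().split("\\s+")) {
                    String cleaned = w.replaceAll("[^\\p{L}]", "").toLowerCase();
                    if (!cleaned.isEmpty()) {
                        result.add(cleaned);
                    }
                }
            });
            return result;
        }
    }

    // The folder only coordinates: it compares every pair of files and adds up the results.
    record Folder(List<TextFile> files) {
        SimilarWords extract() {
            SimilarWords result = new SimilarWords(new TreeSet<>());
            for (int i = 0; i < files.size(); i++) {
                for (int j = i + 1; j < files.size(); j++) {
                    result = result.add(new Comparison(files.get(i), files.get(j)).extract());
                }
            }
            return result;
        }
    }

    public static void main(String[] args) throws IOException {
        Folder folder = new Folder(List.of(
                new TextFile("Red apples\nGreen pears"),
                new TextFile("apples, plums"),
                new TextFile("PEARS and plums")));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        folder.extract().writeTo(out);
        // prints apples, pears and plums, one per line
        System.out.println(out.toString(StandardCharsets.UTF_8));
    }
}
```

Note that `Folder` only decides which files to pair up; chunking, cleaning and comparing each stay with the object that owns the data.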
Let's think a little about your problem, problems in general, and SRP.
SRP states that a class should be concerned with one thing. It doesn't mean the class must have exactly one method that does only one thing.
Actually this can be applied outside OOP too: a function should do only a single thing.
Now imagine your program has to implement 200 features. Imagine they are so simple that a single function is enough to implement any of them. And suppose you are using only functions. By the same principle, you have to write (at least) 200 functions. This is not as great as it looks. First, your program's structure looks like an endless list of micro-sized pieces of code. Second, if they are micro-sized, they can't do much by themselves (this is not bad per se). As you suspected, in the real world a feature doesn't usually map to a single function. Third, if they do almost nothing, they have to ask someone else for everything, or someone else is doing that work somewhere else. So there is some place where a function, or a class, is calling all the others. That place centralizes a lot of knowledge about the system: it has to know about everything to be able to call everyone. This is not good for an architecture.
The alternative is to distribute the knowledge.
If you allow those functions or classes to do a little more, they ask others for fewer things, and some of those things are solved locally. Let me guess: as all these classes are in the same application, some of them are related to each other. They can form a group and collaborate. Maybe they can be the same class, or inherit from others. This reduces communication paths. Communication becomes more local.
Communication paths matter. Imagine there are 125 people in your company, and the company needs to make collective decisions. Would you hold a 125-person meeting, or would you split people into, say, 5 groups, each with 5 teams of 5 people, hold small meetings instead, and then have the team and group leaders meet among themselves? This is a form of hierarchy, or structure, that helps.
Can you imagine the fan-in and fan-out in the new structure? 5/5/5 is much better than 1/125.
So this is about a trade-off: you are trading communication paths for responsibilities. What you want in the end is a reasonable architecture, with knowledge distributed evenly.
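The 5/5/5 versus 1/125 comparison can be made concrete with a little arithmetic. Assuming, purely for illustration, that every meeting is all-to-all, so a meeting of n people has n(n-1)/2 communication channels, this sketch counts the channels in one flat meeting and in the three-level hierarchy described above (25 teams of 5, then 5 meetings of 5 team leaders, then 1 meeting of 5 group leaders).

```java
public class CommunicationPaths {

    // Pairwise channels in one all-to-all meeting of n people: n(n-1)/2.
    static long channels(long n) {
        return n * (n - 1) / 2;
    }

    // Everybody in a single meeting.
    static long flat(int people) {
        return channels(people);
    }

    // Meetings of size k at every level: first people/k team meetings, then the
    // leaders meet in groups of k, and so on until one meeting is left.
    // Assumes people is a power of k (125 = 5^3).
    static long hierarchical(int people, int k) {
        long total = 0;
        for (long meetings = people / k; meetings >= 1; meetings /= k) {
            total += meetings * channels(k);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(flat(125));            // 7750
        System.out.println(hierarchical(125, 5)); // 310 = 25*10 + 5*10 + 1*10
    }
}
```

Under that model the hierarchy needs about 4% of the channels of the flat meeting, and no meeting is larger than 5 people.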
I'm looking into code smells that have an impact on the readability of an application. I came across long method names and I was wondering if there is a convention for this.
I've checked the naming conventions in the Scala documentation, but they don't say anything about the length of a method name.
I also checked the Scalastyle rules and noticed that the limit defaults to 50.
Is there an official convention for the maximum length of a method name, and if so how long is it?
Scalastyle does default the value to 50.
PayPal's Style Guide mentions "For function names over 30 characters, try to shorten the name."
Neither Databricks' Scala Style Guide nor Twitter's Effective Scala seem to mention method name length.
Personally, I believe that 30 might still be too much: just try to picture yourself reading code with that method name scattered all through it, and imagine how much harder it will be to tell it apart from a similar name if both are long.
It might be useful to look into patterns and defaults from other languages. I compiled a bit of information in this blog post: Cross-language Best Practices.
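The thresholds mentioned above (50 in Scalastyle, 30 in PayPal's guide) are easy to check mechanically. As a hedged illustration only, written in Java like the design snippets earlier and not a description of how Scalastyle implements its rule, this sketch uses reflection to list the methods of a class whose names exceed a chosen limit; `Sample` and its method names are hypothetical.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class LongMethodNames {

    // Names of methods declared by `type` that are longer than maxLength, sorted.
    static List<String> findLongNames(Class<?> type, int maxLength) {
        List<String> result = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            // Skip compiler-generated methods (e.g. lambda bodies).
            if (!m.isSynthetic() && m.getName().length() > maxLength) {
                result.add(m.getName());
            }
        }
        result.sort(null); // natural (alphabetical) order
        return result;
    }

    // Hypothetical class to check; the long name is 44 characters.
    static class Sample {
        void save() {}
        void saveCustomerIfAddressIsValidAndNotDuplicated() {}
    }

    public static void main(String[] args) {
        System.out.println(findLongNames(Sample.class, 30)); // [saveCustomerIfAddressIsValidAndNotDuplicated]
        System.out.println(findLongNames(Sample.class, 50)); // []
    }
}
```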
This has previously been asked here (http://framework.zend.com/issues/browse/ZF-11135) with no response from Zend, so it really has to come down to a popular or majority decision.
The reason I am asking is that the company I work for is growing, and having a standard style is obviously a sensible approach.
One case that the example linked above doesn't cover is multiple method calls per line, i.e.:
$this->setAction()->setMethod()->etc()
    ->etc()->andSoForth();
This helps with complying with the line-length limit.
So what's your personal opinion?
Method chaining can get a little hard to follow on long lines, but if you put a line break before each method call, it is perfectly readable and saves you from repeatedly typing the class variable.
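Chaining only works because each call returns the object itself (in PHP, `return $this;`). A sketch of the same idea in Java, with one call per line as suggested above; the `Request` class and its `setAction`/`setMethod` methods are hypothetical stand-ins for the ones in the question.

```java
public class FluentRequest {

    // Hypothetical fluent class: every setter returns `this`, which is what enables chaining.
    static final class Request {
        private String action = "";
        private String method = "GET";

        Request setAction(String action) {
            this.action = action;
            return this;
        }

        Request setMethod(String method) {
            this.method = method;
            return this;
        }

        String describe() {
            return method + " " + action;
        }
    }

    public static void main(String[] args) {
        // One call per line keeps a long chain readable and within line-length limits.
        String description = new Request()
                .setAction("/users")
                .setMethod("POST")
                .describe();
        System.out.println(description); // POST /users
    }
}
```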
Regarding the question asked at http://framework.zend.com/issues/browse/ZF-11135 - the first and second code examples are identical - should they be showing a difference?
This is a general design question not relating to any language. I'm a bit torn between going for minimum code or optimum organization.
I'll use my current project as an example. I have a bunch of tabs on a form that perform different functions. Let's say Tab 1 reads in a file with a specific layout, tab 2 exports a file to a specific location, etc. The problem I'm running into now is that I need these tabs to do something slightly different based on the contents of a variable. If it contains a 1, I may need to use Layout A and perform some extra concatenation; if it contains a 2, I may need to use Layout B and do no concatenation but add two integer fields, etc. There could be 10+ codes that I will be looking at.
Is it preferable to create an individual path for each code early on, or to attempt to create a single path that branches out only when absolutely required?
Creating an individual path for each code would allow my code to be extremely easy to follow at a glance, which in turn will help me out later on down the road when debugging or making changes. The downside to this is that I will increase the amount of code written by calling some of the same functions in multiple places (for example, steps 3, 5, and 9 for every single code may be exactly the same).
Creating a single path that would branch out only when required will be a bit messier and more difficult to follow at a glance, but I would create less code by placing conditionals only at steps that are unique.
I realize that this may be a case-by-case decision, but in general, if you were handed a previously built program to work on, which would you prefer?
Edit: I've drawn some simple images to help express it. Codes 1/2/3 are the variables and the lines under them represent the paths they would take. All of these steps need to be performed in a linear chronological fashion, so there would be a function to essentially just call other functions in the proper order.
[Image: Different Paths]
[Image: Single Path]
"Creating a single path that would branch out only when required will be a bit messier and more difficult to follow at a glance, but I would create less code by placing conditionals only at steps that are unique."
I'm not buying this statement. There is a level of finesse involved in deciding when to write new functions. Functions should be as simple and reusable as possible (but no simpler). The correct answer is almost never 'one big file that does a lot of branching'.
Fewer LOC (lines of code) should not be the goal. Readability and maintainability should be the goal. When you create functions, the names should be self-documenting. If you have a large block of code, it is good to do something like:
function doSomethingComplicated() {
    stepOne();
    stepTwo();
    // and so on
}
where the function names are self-documenting. Not only will the code be more readable, but you will also make it easier to unit test each segment of the code in isolation.
For the case where you will have a lot of methods that call the same exact methods, you can use good OO design and design patterns to minimize the number of functions that do the same thing. This is in reference to your statement "The downside to this is that I will increase the amount of code written by calling some of the same functions in multiple places (for example, steps 3, 5, and 9 for every single code may be exactly the same)."
The biggest danger in starting with one big block of code is that it will never actually get refactored into smaller units. Just start down the right path to begin with.
EDIT --
For your picture, I would create a base class with all of the common methods that are used. The base class would be abstract, with an abstract method. Subclasses would implement the abstract method and use the common functions they need. Of course, replace 'abstract' with whatever your language of choice provides.
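The base-class idea above is essentially the Template Method pattern. A minimal sketch in Java, assuming three steps where only the middle one differs per code; `CodeProcessor`, `Code1`, `Code2` and the step names are hypothetical, and the `log` list just records which steps ran.

```java
import java.util.ArrayList;
import java.util.List;

public class TemplateMethodDemo {

    // Abstract base class: the fixed order of steps is written once, here.
    abstract static class CodeProcessor {
        protected final List<String> log = new ArrayList<>();

        // The template method: every code runs the same sequence of steps.
        final List<String> run() {
            readFile();   // shared step
            transform();  // the step that varies per code
            exportFile(); // shared step
            return log;
        }

        private void readFile() { log.add("read " + layout()); }
        private void exportFile() { log.add("export"); }

        // Hooks each subclass must supply.
        protected abstract String layout();
        protected abstract void transform();
    }

    // Code 1: Layout A plus extra concatenation.
    static final class Code1 extends CodeProcessor {
        @Override protected String layout() { return "Layout A"; }
        @Override protected void transform() { log.add("concatenate"); }
    }

    // Code 2: Layout B, no concatenation, add two integer fields.
    static final class Code2 extends CodeProcessor {
        @Override protected String layout() { return "Layout B"; }
        @Override protected void transform() { log.add("add integers"); }
    }

    public static void main(String[] args) {
        System.out.println(new Code1().run()); // [read Layout A, concatenate, export]
        System.out.println(new Code2().run()); // [read Layout B, add integers, export]
    }
}
```

Adding another code means adding one subclass; the shared steps stay in one place, so changing them cannot be forgotten in one of the copies.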
You should always err on the side of generalization, the only exception being early prototyping (where the throughput of producing working stuff is heavily impacted by designing correct abstractions/generalizations). Having said that, you should NEVER leave that mess of non-generalized cloned branches past the early prototype stage, as it leads to messy, hard-to-maintain code (if you are doing almost the same thing 3 different times and need to change that thing, you're almost sure to forget to change 1 out of 3).
Again, it's hard to answer such an open-ended question specifically, but I believe you don't have to sacrifice one for the other.
OOP techniques solve this issue by allowing you to encapsulate the reusable portions of your code and create child classes to handle object-specific behaviors.
Personally, I think you might (if your API allows it) create inherited forms, create them on the fly on the master form (with tabs), pass arguments to them, and embed them in the tab container.
When to inherit a form and when to use arguments (code) to show/hide/add/remove functionality is up to you, but the master form should contain only decisions and argument passing, and the embeddable forms just plain functionality. This way you can separate organisation from implementation.
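A related way to keep "decisions" apart from "plain functionality", swapped in here rather than taken from the answer, is table-driven dispatch: the master only looks up a handler for the code and passes the arguments on. This Java sketch uses a hypothetical `HANDLERS` map in which code 1 concatenates two fields and code 2 adds two integer fields, echoing the question's examples.

```java
import java.util.Map;
import java.util.function.Function;

public class CodeDispatch {

    // Hypothetical handlers: each one holds only the behaviour for its code.
    static final Map<Integer, Function<String[], String>> HANDLERS = Map.of(
            1, f -> f[0] + f[1],                                                      // concatenate
            2, f -> String.valueOf(Integer.parseInt(f[0]) + Integer.parseInt(f[1])) // add two integer fields
    );

    // The "master" only decides which handler to use and passes the data along.
    static String process(int code, String... fields) {
        Function<String[], String> handler = HANDLERS.get(code);
        if (handler == null) {
            throw new IllegalArgumentException("Unknown code: " + code);
        }
        return handler.apply(fields);
    }

    public static void main(String[] args) {
        System.out.println(process(1, "ab", "cd")); // abcd
        System.out.println(process(2, "2", "40"));  // 42
    }
}
```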
I know that there is no right answer to this question, I'm just asking for your opinions.
I know that creating HUGE class files with thousands of lines of code is not good practice, since they're hard to maintain, and it usually means that you should probably review your program logic.
In your opinion, what is an average line count for a class in, let's say, Java? (I don't know if the choice of language has anything to do with it, but just in case...)
Yes, I'd say it does have to do with the language, if only because some languages are more verbose than others.
In general, I use these rules of thumb:
< 300 lines: fine
300 - 500 lines: reasonable
500 - 1000 lines: maybe ok, but plan on refactoring
> 1000 lines: definitely refactor
Of course, it really depends more on the nature and complexity of the code than on LOC, but I've found these reasonable.
In general, number of lines is not the issue - a slightly better metric is number of public methods. But there is no correct figure. For example, a utility string class might correctly have hundreds of methods, whereas a business level class might have only a couple.
If you are interested in LOC, cyclomatic and other complexity measurements, I can strongly recommend Source Monitor from http://www.campwoodsw.com, which is free, works with major languages such as Java and C++, and is all-round great.
From Eric Raymond's "The Art of Unix Programming":
In nonmathematical terms, Hatton's empirical results imply a sweet spot between 200 and 400 logical lines of code that minimizes probable defect density, all other factors (such as programmer skill) being equal. This size is independent of the language being used — an observation which strongly reinforces the advice given elsewhere in this book to program with the most powerful languages and tools you can. Beware of taking these numbers too literally however. Methods for counting lines of code vary considerably according to what the analyst considers a logical line, and other biases (such as whether comments are stripped). Hatton himself suggests as a rule of thumb a 2x conversion between logical and physical lines, suggesting an optimal range of 400–800 physical lines.
Taken from here
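Because the result depends so much on what counts as a line, here is a rough Java sketch that counts the same source two ways: every physical line, and only lines that are neither blank nor comments. The non-blank, non-comment count is only an approximation of Hatton's "logical lines" (it does not count statements), and the parsing is deliberately naive.

```java
import java.util.List;

public class LineCounter {

    // Physical lines: every line, including blank lines and comments.
    static int physicalLines(List<String> lines) {
        return lines.size();
    }

    // Lines that are neither blank nor comments. Naive: it ignores comment
    // markers inside string literals and code sharing a line with a comment.
    static int codeLines(List<String> lines) {
        int count = 0;
        boolean inBlockComment = false;
        for (String raw : lines) {
            String line = raw.strip();
            if (inBlockComment) {
                // Stay inside the comment until a line contains the closing marker.
                if (line.contains("*/")) {
                    inBlockComment = false;
                }
                continue;
            }
            if (line.isEmpty() || line.startsWith("//")) {
                continue;
            }
            if (line.startsWith("/*")) {
                // Enter block-comment mode unless it closes on the same line.
                inBlockComment = !line.contains("*/");
                continue;
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> source = List.of(
                "/**",
                " * Adds two numbers.",
                " */",
                "int add(int a, int b) {",
                "",
                "    // sum them",
                "    return a + b;",
                "}");
        System.out.println(physicalLines(source)); // 8
        System.out.println(codeLines(source));     // 3
    }
}
```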
Better to measure something like cyclomatic complexity and use that as a gauge. You could even hook it into your build script (Ant file, etc.).
It's too easy, even with a standardized code format, for lines of code to be disconnected from the real complexity of the class.
Edit: See this question for a list of cyclomatic complexity tools.
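For a feel of what cyclomatic complexity measures: for a single method it is the number of decision points plus one. The Java sketch below approximates that with a regular expression over the source text; it is a naive illustration, not how the real tools work (they parse the code), and the list of counted tokens is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NaiveComplexity {

    // Decision points counted: if, for, while, case, catch, &&, || and the ternary '?'.
    // "else if" counts once, through its "if". Because this is plain text matching,
    // it also counts keywords in strings/comments and '?' in generic wildcards.
    private static final Pattern DECISION =
            Pattern.compile("\\b(if|for|while|case|catch)\\b|&&|\\|\\||\\?");

    // Approximate cyclomatic complexity of one method body: decision points + 1.
    static int complexity(String methodSource) {
        Matcher m = DECISION.matcher(methodSource);
        int decisions = 0;
        while (m.find()) {
            decisions++;
        }
        return decisions + 1;
    }

    public static void main(String[] args) {
        String source =
                "int sign(int x) {\n" +
                "    if (x > 0 && x < 100) return 1;\n" +
                "    else if (x < 0) return -1;\n" +
                "    return 0;\n" +
                "}\n";
        System.out.println(complexity(source)); // 4: if, &&, else if, plus 1
    }
}
```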
I focus on methods and (try to) keep them below 20 lines of code. Class length is in general dictated by the single responsibility principle. But I believe that this is no absolute measure because it depends on the level of abstraction, hence somewhere between 300 and 500 lines I start looking over the code for a new responsibility or abstraction to extract.
Small enough to do only the task it is charged with.
Large enough to do only the task it is charged with.
No more, no less.
In my experience, once a source file grows past 1000 text lines, I start wanting to break it up. Ideally, methods should fit on a single screen, if possible.
Lately I've started to realise that removing unhelpful comments can help greatly with this. I comment far more sparingly now than I did 20 years ago when I first started programming.
The short answer: less than 250 lines.
The shorter answer: Mu.
The longer answer: Is the code readable and concise? Does the class have a single responsibility? Does the code repeat itself?
For me, the issue isn't LOC. I look at several factors. First, I check my If-Else-If statements. If a lot of them have the same conditions, or result in similar code being run, I try to refactor that. Then I look at my methods and variables. Any single class should have one primary function and only that function. If it has variables and methods for a different area, consider putting those into their own class. Either way, avoid counting LOC, for two reasons:
1) It's a bad metric. If you count LOC, you're counting long lines, blank lines, and comment lines as though they were all the same. You can avoid counting blanks and comments, but you're still counting short lines and long lines equally.
2) It's misleading. Readability isn't purely a function of LOC. A class can be perfectly readable, but if it violates your LOC limit, you're going to find yourself working hard to squeeze as many lines out of it as you can. You may even end up making the code LESS readable. If you spend a few lines assigning variables and then use them in a method call, that's more readable than writing those expressions directly inside the method call itself. It's better to have 5 lines of readable code than to condense them into 1 line of unreadable code.
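A small Java illustration of that trade-off; the pricing formula and parameter names are hypothetical. Both methods compute the same value, but the second spends extra lines naming each intermediate result.

```java
public class ReadableVsCondensed {

    // Condensed: one line, but the reader has to unpack every sub-expression.
    static double priceCondensed(double base, int qty, double taxRate, boolean member) {
        return (base * qty - (member ? base * qty * 0.1 : 0)) * (1 + taxRate);
    }

    // Readable: more lines, each intermediate value has a name.
    static double priceReadable(double base, int qty, double taxRate, boolean member) {
        double subtotal = base * qty;
        double memberDiscount = member ? subtotal * 0.1 : 0; // 10% off for members
        double taxMultiplier = 1 + taxRate;
        return (subtotal - memberDiscount) * taxMultiplier;
    }

    public static void main(String[] args) {
        // Both lines print the same value.
        System.out.println(priceCondensed(10.0, 3, 0.2, true));
        System.out.println(priceReadable(10.0, 3, 0.2, true));
    }
}
```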
Instead, I'd look at nesting depth and line length. These are better metrics because they tell you two things. First, the nesting depth tells you whether your logic needs to be refactored. If you are looking at if statements or loops nested more than 2 deep, seriously consider refactoring; even more than one level of nesting is worth a second look. Second, a long line is generally hard to read. Try splitting that line into several more readable lines. This might break your LOC limit if you have one, but it does actually improve readability.
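One common way to remove nesting without changing behaviour is to use guard clauses (early returns). In this hypothetical Java example, both methods return the same result for every input, but the second one has no nesting at all.

```java
public class GuardClauses {

    // Nested version: the main case sits three levels deep.
    static String shippingNested(String country, double weightKg, boolean paid) {
        if (paid) {
            if (country != null) {
                if (weightKg <= 30) {
                    return "ship to " + country;
                } else {
                    return "too heavy";
                }
            } else {
                return "no address";
            }
        } else {
            return "awaiting payment";
        }
    }

    // Same checks in the same order, but each special case exits early.
    static String shippingFlat(String country, double weightKg, boolean paid) {
        if (!paid) return "awaiting payment";
        if (country == null) return "no address";
        if (weightKg > 30) return "too heavy";
        return "ship to " + country;
    }

    public static void main(String[] args) {
        System.out.println(shippingFlat("NL", 2.0, true));  // ship to NL
        System.out.println(shippingFlat("NL", 40.0, true)); // too heavy
    }
}
```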
line counting == bean counting.
The moment you start employing tools to find out just how many lines of code a certain file or function has, you're screwed, IMHO, because you stopped worrying about managebility of the code and started bureaucratically making rules and placing blame.
Have a look at the file / function, and consider whether it is still comfortable to work with, or is starting to get unwieldy. If in doubt, call in a co-developer (or, if you are running a one-man show, some developer unrelated to the project) to have a look, and have a quick chat about it.
It's really just that: a look. Does someone else immediately get the drift of the code, or is it a closed book to the uninitiated? This quick look tells you more about the readability of a piece of code than any line metric ever devised. It depends on so many things: language, problem domain, code structure, working environment, experience. What's OK for one function in one project might be all out of proportion for another.
If you are in a team / project situation, and can't readily agree by this "one quick look" approach, you have a social problem, not a technical one. (Differing quality standards, and possibly a communication failure.) Having rules on file / function lengths is not going to solve your problem. Sitting down and talking about it over a cool drink (or a coffee, depending...) is a better choice.
You're right... there is no answer to this. You cannot put a "best practice" down as a number of lines of code.
However, as a guideline, I often go by what I can see on one page. As soon as a method doesn't fit on one page, I start thinking I'm doing something wrong. As far as the whole class is concerned, if I can't see all the method/property headers on one page then maybe I need to start splitting that out as well.
Again though, there really isn't an answer, some things just have to get big and complex. The fact that you know this is bad and you're thinking about it now, probably means that you'll know when to stop when things get out of hand.
Lines of code are much more about verbosity than anything else. In the project I'm currently working on, we have some files with over 1000 LOC. But if you strip the comments, probably about 300 or even fewer will remain. If you change declarations like
int someInt;
int someOtherInt;
to one line, the file will be even shorter.
However, if you're not verbose and you still have a big file, you'll probably need to think about refactoring.