Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
I know that there is no right answer to this question, I'm just asking for your opinions.
I know that creating HUGE class files with thousand lines of code is not a good practice since it's hard to maintain and also it usually means that you should probably review your program logic.
In your opinion what is an average line count for a class let's say in Java (i don't know if the choice of language has anything to do with it but just in case...)
Yes, I'd say it does have to do with the language, if only because some languages are more verbose than others.
In general, I use these rules of thumb:
< 300 lines: fine
300 - 500 lines: reasonable
500 - 1000 lines: maybe ok, but plan on refactoring
> 1000 lines: definitely refactor
Of course, it really depends more on the nature and complexity of the code than on LOC, but I've found these reasonable.
In general, number of lines is not the issue - a slightly better metric is number of public methods. But there is no correct figure. For example, a utility string class might correctly have hundreds of methods, whereas a business level class might have only a couple.
If you are interested in LOC, cyclomatic and other complexity measurements, I can strongly recommend Source Monitor from http://www.campwoodsw.com, which is free, works with major languages such as Java & C++, and is all round great.
From Eric Raymond's "The Art Of Unix Programming"
In nonmathematical terms, Hatton's empirical results imply a sweet spot between 200 and 400 logical lines of code that minimizes probable defect density, all other factors (such as programmer skill) being equal. This size is independent of the language being used — an observation which strongly reinforces the advice given elsewhere in this book to program with the most powerful languages and tools you can. Beware of taking these numbers too literally however. Methods for counting lines of code vary considerably according to what the analyst considers a logical line, and other biases (such as whether comments are stripped). Hatton himself suggests as a rule of thumb a 2x conversion between logical and physical lines, suggesting an optimal range of 400–800 physical lines.
Taken from here
Better to measure something like cyclomatic complexity and use that as a gauge. You could even stick it in your build script/ant file/etc.
It's too easy, even with a standardized code format, for lines of code to be disconnected from the real complexity of the class.
Edit: See this question for a list of cyclomatic complexity tools.
I focus on methods and (try to) keep them below 20 lines of code. Class length is in general dictated by the single responsibility principle. But I believe that this is no absolute measure because it depends on the level of abstraction, hence somewhere between 300 and 500 lines I start looking over the code for a new responsibility or abstraction to extract.
Small enough to do only the task it is charged with.
Large enough to do only the task it is charged with.
No more, no less.
In my experience any source file over 1000 text lines I will start wanting to break up. Ideally methods should fit on a single screen, if possible.
Lately I've started to realise that removing unhelpful comments can help greatly with this. I comment far more sparingly now than I did 20 years ago when I first started programming.
The short answer: less than 250 lines.
The shorter answer: Mu.
The longer answer: Is the code readable and concise? Does the class have a single responsibility? Does the code repeat itself?
For me, the issue isn't LOC. What I look at is several factors. First, I check my If-Else-If statements. If a lot of them have the same conditions, or result in similar code being run, I try to refactor that. Then I look at my methods and variables. In any single class, that class should have one primary function and only that function. If it has variables and methods for a different area, consider putting those into their own class. Either way, avoid counting LOC for two reasons:
1) It's a bad metric. If you count LOC you're counting not just long lines, but also lines which are whitespace and used for comments as though they are the same. You can avoid this, but at the same time, you're still counting small lines and long lines equally.
2) It's misleading. Readability isn't purely a function of LOC. A class can be perfectly readable but if you have a LOC count which it violates, you're gonna find yourself working hard to squeeze as many lines out of it as you can. You may even end up making the code LESS readable. If you take the LOC to assign variables and then use them in a method call, it's more readable than calling the assignments of those variables directly in the method call itself. It's better to have 5 lines of readable code than to condense it into 1 line of unreadable code.
Instead, I'd look at depth of code and line length. These are better metrics because they tell you two things. First, the nested depth tells you if you're logic needs to be refactored. If you are looking at If statements or loops nested more than 2 deep, seriously consider refactoring. Consider refactoring if you have more than one level of nesting. Second, if a line is long, it is generally very unreadable. Try separating out that line onto several more readable lines. This might break your LOC limit if you have one, but it does actually improve readability.
line counting == bean counting.
The moment you start employing tools to find out just how many lines of code a certain file or function has, you're screwed, IMHO, because you stopped worrying about managebility of the code and started bureaucratically making rules and placing blame.
Have a look at the file / function, and consider if it is still comfortable to work with, or starts getting unwieldly. If in doubt, call in a co-developer (or, if you are running a one-man-show, some developer unrelated to the project) to have a look, and have a quick chat about it.
It's really just that: a look. Does someone else immediately get the drift of the code, or is it a closed book to the uninitiated? This quick look tells you more about the readability of a piece of code than any line metrics ever devised. It is depending on so many things. Language, problem domain, code structure, working environment, experience. What's OK for one function in one project might be all out of proportion for another.
If you are in a team / project situation, and can't readily agree by this "one quick look" approach, you have a social problem, not a technical one. (Differing quality standards, and possibly a communication failure.) Having rules on file / function lengths is not going to solve your problem. Sitting down and talking about it over a cool drink (or a coffee, depending...) is a better choice.
You're right... there is no answer to this. You cannot put a "best practice" down as a number of lines of code.
However, as a guideline, I often go by what I can see on one page. As soon as a method doesn't fit on one page, I start thinking I'm doing something wrong. As far as the whole class is concerned, if I can't see all the method/property headers on one page then maybe I need to start splitting that out as well.
Again though, there really isn't an answer, some things just have to get big and complex. The fact that you know this is bad and you're thinking about it now, probably means that you'll know when to stop when things get out of hand.
Lines of code is much more about verbosity than any other thing. In the project I'm currently working we have some files with over 1000 LOC. But, if you strip the comments, it will probably remain about 300 or even less. If you change declarations like
int someInt;
int someOtherInt;
to one line, the file will be even shorter.
However, if you're not verbose and you still have a big file, you'll probably need to think about refactoring.
Related
I've seen code where there are just two rather static elements to be mapped such as time intervals with start and end dates, yet map() is being used rather than explicit code for mapping, e.g.
{ map { ... } qw(start end) } # vs.
{ start => ..., end => ... }
Which way is preferrable, and why?
The map form may be less concise but looks more functional (as in functional programming), so I guess that's why it may be preferred over explicit code and is perhaps more DRY.
However, it looks less legible to me because there is more logic going on behind, and mapping should also be less efficient because it invokes calls a and consists of more atomic operations.
EDIT
There is a conflicting goal in programming: KISS (keep it { pick 2 from: small, simple, stupid }). Using map slightly complicates code.
Assuming you're not just setting both items to the same constant or something similarly trivial, I would expect the map version to be more concise.
IMO, the main point in favor of the map version is that you know the same process will be used to produce both values. Not only for the sake of DRY, but also because it eliminates any concern that one might have a subtle change which the other doesn't.
As for the performance concern... If your use case is sufficiently performance-sensitive for any potential difference to matter, then you shouldn't be using Perl in the first place. Switching to well-written C (not C#, not C++, not Objective C - just plain C) will have a far greater performance impact than micro-optimizing whether you assign two values individually vs. using a loop to set them. But the odds of your use case being that sensitive are approximately zero anyhow.
There is a principle of coding known as DRY. Don't Repeat Yourself.
It asserts that:
Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.
And that can be interpreted as condensing duplicate typing with (things like) map/for.
I use idioms like the one you've quoted when I'm trying to expand some text - for example:
my #defs = map { "DEF:$_=$source_file:$_:MAX" } qw ( read write );
This generates me some DEF lines for rrdtool.
I'm doing it this way, because for some cases, I've got considerably longer lists of 'things I want to define' and want to be consistent. (Sometimes I have say, 10 similar lines that differ only by a single word).
But also because:
my #defs = ( "DEF:read=$source_file:read:MAX",
"DEF:write=$source_file:write:MAX" );
There's not much in it for two elements, and I'd suggest it's as much a matter of style as anything. However, if you've got more than that, it quickly becomes very beneficial because you can change the single line - say you've got a different file location? Want to swap MAX for AVERAGE?
It's also quite shockingly easy to go 'punctuation blind' when looking at a long sequence of similar statements, where someone's typo-ed and added a , where it should be . or similar.
And ... you probably don't lose a great deal in terms of readability. But will acknowledge that's something of a style point, because whilst map is pretty amazing, it can make for some rather hard to read code if you're not careful.
Also to specifically address:
mapping should also be less efficient because it invokes calls a and consists of more atomic operations.
A wise man once said:
premature optimization is the root of all evil
Don't think about the efficiency of a statement - look at the legibility/readability. Compilers are pretty clever. Most "obvious" optimisations, they already make for you. Processors are also pretty fast. Your limiting factor in most code isn't the amount of CPU cycles you need, it's IO throughput and memory footprint. So don't worry about it - write clear code.
And if there's a performance critical demand on your code, you should be using a code profiler to look at where you gain the most efficiency for your effort at refactoring. You may end up with less clear code in doing so (sometimes) but that's a more clear tradeoff.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 8 years ago.
Improve this question
My aim is to write a basic chess playing AI. It doesn't need to be incredible, but I want it to play with some degree of competence against people with some level of familiarity with the game.
I have a trait called Piece which has abstract methods canMakeMove(m: Move, b: Board) and allMovesFrom(p: Position, b: Board). These methods are important to the program logic for obvious reasons, and are implemented by the concrete classes King, Queen, Pawn, Rook, Bishop, and Knight. Thus in other places, say for example in the code that determines whether or not a particular board has a King in check, these methods are called on values whose type is the abstract type Piece, (piece canMakeMove (..., ...)) so the actual method called is determined at runtime via dynamic dispatch.
I'm wondering if this is too costly for the purposes of a Chess AI program which is going to have to execute this code many times. After looking around online and reading some more about chess programming I've found that the most common representation of a chess board is not like my Vector[Vector[Option[Piece]] but an int matrix ('bit board') that likely uses a switch statement on the values in the board to accomplish the effect that I'm currently relying on dynamic dispatch to achieve. Will this prevent my AI from reaching viable performance level?
I will suggest in this answer that you are subject to a common pitfall in programming.
There is one very important wisdom that I learned from my colleagues, and it has proven its worth to me time and time again:
Only solve problems when they occur, and not any sooner.
Keep in mind that there have been chess programs and chess computers around for a long, long time.
There were even chess programs on the C64, and there were those tiny travel chess computers that were embedded in the chess board, so you could carry them around easily.
Those systems had far less resources than your present day computer by several orders of magnitude.
If you had a similar algorithm as the one that was running on those old systems, and that algorithm was implemented in Scala, with dynamic dispatch, generous vectors instead of bit arrays et cetera, they would still be faster than they were on the older hardware from the past days.
Still, writing even a moderate search heuristic for the chess problem is a colossal task for a single developer.
When faced with a colossal task, one intuitively tends to look for the first bit of the problem that one feels competent in solving, in hopes that this will then naturally lead to the next little problem that you can solve, and so on, eventually leading to a full solution to the colossal problem. Divide and conquer.
However, my personal experience tells me that this is not a good way to tackle bigger programming problems.
Usually, if you follow this path, you end up with a really, really optimized representation of the chess board, consuming only very little memory, and optimized for runtime as much as possible.
And then you're stuck. You have put all your energy into this, you're really proud of your great chessboard class, and somehow it's really dissatisfying that you feel no step closer to actually have a working chess AI.
That's where the rule that I mentioned in the beginning comes in handy.
In short: Don't optimize. Yet. You can still optimize later, if it's necessary.
Instead, do the following:
Make a class or trait that represents a chessboard.
Leave it empty in the beginning - no methods or members.
Do the same with the types representing pieces, board positions, moves etc.
Then, when you start implementing the chess AI, fill in those types with methods. But only add the methods that you really need, and only when you need them, and not any single method more than that.
Prefer to use traits in the beginning, as you can later on produce more optimized versions of a trait, if necessary.
In that way, try to get as soon as you can to the first version of your chess AI that does something interesting.
It's up to you to define what "interesting" means in this context.
It's okay if the first version of the chess AI is crappy, and loses every game because it makes really bad decisions.
It should just have that one single thing: It should do something that you find at least remotely interesting.
Then, identify the top problem that you have with your chess AI.
Think about how to solve it, come up with a plan that solves only that one problem, in the simplest way that you can possibly imagine, even if your programmer instincts tell you to do something more complex.
Resist the urge to delve into complexity fast, because the most complex code that you write will eventually turn out to be your biggest problem, and also the most inflexible part of your code base.
Short, simple, almost stupid steps towards a solution. Prototype a lot, create simple versions in a short time, and only optimize when needed.
As for the optimization - that's really not a problem.
For example, let's say that in the beginning you thought it's a good idea to define a board position like this:
case class Position(x:Int, y:Int)
Later on, you figure that it would have been a better choice to just use a tuple of ints to represent a board position. Simply substitute your previous definition with:
type Position = (Int, Int)
There you go. Do that change, compile the code, and the compile errors will show you all the places in the code that you need to adapt. It's not going to take very long to refactor this.
Even later down the line, you decide that since there are only 64 possible positions on the board, you can easily represent the position with a number in the range from 0 to 64 - reserving the number 64 for "off-board".
Again, that's an easy change:
type Position = Byte
Save, compile, fix the compile errors. Should not take longer than 15 minutes.
With this example, I want to illustrate that you shouldn't be afraid of optimizing later, waiting to solve problems only when they actually occur.
This will always require some refactoring, but the effort is usually not really worth mentioning.
I tend to think of this as "massaging the code".
Of course we know that junior programmers sometimes tend to write "spaghetti code" that is very hard to refactor later.
Maybe that's why there is a certain fear of late optimization that practically every developer knows, and the urge to optimize early.
The only real antidote against such "spaghetti code" is to go through this painful process several times. After a while, you will develop a very good intuition on how to write code that can be optimized and refactored easily.
This will match all the common programming wisdoms, like separation of concerns, short methods, encapsulation and so on.
At this point, I would recommend the book "Clean Code" as an excellent guideline.
Some more tips:
See your program as a bridge. You stand on one side of the bridge, starting off with nothing. On the far end, there is your vision - a decent chess AI. Between the ends, there is a gap. Your program will be the bridge. Don't start by putting all your energy into optimizing the very first brick that you put in your bridge. A bridge with just one perfect brick won't hold. Build a prototype bridge first - crappy, but it connects the two ends. When you have that, it's easier to enhance the crappy bridge step by step.
Abstract all the functionality into traits and classes. Be wary of code repetitions - when you find a bug or need to do a refactoring, repetitious code can be a neck breaker.
Don't be afraid to write code that doesn't look smart, as long as the code is simple and solves the problem. Bonus points for code that reads well and looks almost like an intuitive description of the algorithm.
Keep your methods short. One to six lines.
Make sure that your code doesn't nest too deeply - everyone with a bit of Scala experience should be able to understand what your code is doing.
It can be a good thing to spend some time in order to find good names for your concepts. Maybe there are even some standards of naming certain parts of a chess AI. It's good to do a bit of research first.
While you are researching, see if there are some Scala/Java libraries available that you can use, for important parts of your problem. You can still replace the libraries with your own implementations later, but in the beginning, using the knowledge of other people can give you a jump start into creating your first, prototype version "bridge". Everything that will help you close the "gap" quickly is good.
Don't think of your code as something eternal, perfect, unchangeable. Maybe you have the ambition to write the best representation of a chess board in Scala ever. Don't do that. Eternal, perfect code cannot be refactored or changed easily. Code that can be refactored easily is really good code. Refactoring code is 50% of what a programmer does.
Jump into automated testing. For Scala, using the scalatest library can be a good choice. Use a tool like SBT that will run all the automated tests on every build. If you have certain assumptions about the classes you write, hardcode those assumptions into automated tests. It will feel a little stupid and redundant at first. But the first time that one of your automated tests show you the cause of a bug that would otherwise have been very hard to find, all the time to write those automated tests will have paid out.
An advanced topic about automated testing is code coverage. You can use a tool that will generate a coverage report, how much of your code is covered by automated tests. From my own experience, I would recommend jacoco4sbt.
And finally, the mandatory citation: "Optimization is the root of all evil." - There is a lot of wisdom in this one.
Good luck, and a whole lot of fun!
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 8 years ago.
Improve this question
For example, what piece of code is considered better styled? If I show my code to a professional developer and ask if my code is good or not, will using the second style be probably considered as a (minor, but...) minus or a plus to my code quality?
I myself tend to like the second style but would prefer to comply to the most common opinion in this case.
1
val foo : Int = -1
val bar : Int = 1
val yohoho : Double = NaN
2
val foo : Int = -1
val bar : Int = 1
val yohoho : Double = NaN
Did you change your tags? I see several answers referring to Python but the only language tag I see is Scala. If you're interested in Scala I would refer to the unofficial Scala Style Guide. Based on the style guide I would suggest your example should look like this:
val foo = -1
val bar = 1
val yohoho = Double.NaN
The second style will get messed up when variables are renamed using refactoring tools.
It's just my humble opinion, but I personally just hate indentation like the second variant in any programming language. Some claim that "it looks nice" but I find it completely impractical. For example in order to rename the variable yohoho you have to realign all the other variables around (and most editors won't help you with that).
Whatever suits you, you're the coder. The first one just looks funny to me really.
Python's PEP 8 (which is the "Style Guide for Python Code") requests that you don't use whitespace to align code:
Pet Peeves
Avoid extraneous whitespace in the following situations:
....
- More than one space around an assignment (or other) operator to
align it with another.
Yes:
x = 1
y = 2
long_variable = 3
No:
x = 1
y = 2
long_variable = 3
I've tried this with Python, for instance in Django model declarations.
It becomes an OCD inducing pain when you're modifying the code later on. For instance, what if you add a var called yoohoohoey in the 2nd example? You have to go back and add spaces to each line to align it to the new length.
So, I prefer to do things the first way.
Scala doesn't force you to work in either style, but the first is generally preferred because it can really play havoc with diff tools if you have to add a new line and then respace everything else...
On the other hand, if your code/algorithm is much more readable when it's aligned, then align it! Readability trumps all other concerns, as per the agile manifesto:
Individuals and interactions over processes and tools
i.e. It's more important to keep other programmers happy than to keep diff tools happy.
I use the second way, if necessary. Is time consuming to do, but with a bit of time you can configure the auto formatting to make it automatically.
But this is a matter of preference.
The Scala Style Guide is silent on the subject, but I haven't seen vertically align declarations in Scala code, so I'd go with the first choice.
If your goal is to conform to community guidelines, probably the first is more common. But are you looking to fit in -- or to stand out by the quality of your code? Learning what works for you is extremely important.
I suggest you try both styles -- then see how you find working with both. For me, aligning variables is a pain, but it's very helpful when working with larger code files. It's also a nice bit of busy work when I'm stuck on some particular aspect of code and want a little time to let my ideas percolate.
Along this same line of thought, I find that when working with, say, JavaScript, pushing the first line of a function to the far left allows me to scan code very rapidly. That breaks normal indentation rules -- but it's helped me read code far better than nicely indented code.
Whatever suits you is only good enough if you work on your own. If you're part of a group of developers, you should vote and stick to one way. Keep a list of coding standards and enforce it in code reviews.
How often do you have multiple initialized val or var declarations in a Scala file anyway? Top level vals and vars in classes should mostly be primary constructor parameters, if possible. Top level vals in traits will likely be abstract, and thus not initialized. Vals and vars inside methods should absolutely not be padded, as reordering is entirely too likely.
I just searched through a 200 class Scala project, and only found one class where this issue even arises (a "cake pattern" module, where the vals declare the components of my application). It's a non-issue.
Whatever suits you, you're the coder. The second one just looks funny to me really.
Whatever suits you, you're the coder. I find the second one is more of a pain to edit.
I can't speak for Python, but in general you will find both examples across all different languages. In my experience, the first example is more common, but it's a matter of personal taste and I don't believe that another developer would think poorly of you for using either style.
I greatly prefer the first example and agree with others who have commented on the maintainability of the second example. With syntax highlighting available in all IDEs and most quality text editors these days, I don't think the latter style is necessary to keep things readable.
From a usability standpoint while reading, the second is akin to a table layout and it helps you scan it vertically to quickly see that there're two Ints and one Double, but it's also slightly harder to connect the horizontal extremes to see that bar is 1.
The first one makes the individual rows take focus and is easier to scan horizontally to see that bar is 1 but it's harder to quickly see that there're two Ints and one Double.
Expand the example with a lot more lines and it might be easier to "feel" this - but which one to use would be up to which is more important and of course what style is used in the project, team, workplace or best practice for the language in question.
Also, why limit table layouts to only left-aligned columns? ^^
val foo : Int = -1
val bar : Int = 1
val yohoho : Double = NaN
Second style giving headache when you need to search for specific text. When code file grows large sometime you need to seek for specific text, If you coding with custom spaces it is hard to find.
We had bad experience with large XML files holding TSQL scripts, Coders before me using style 2 as in your example, end whenever We added new rule or upgrade old one it is painful to search trough files which is codded with custom spaces.
This kind of indentation is useful when someone reads the code, but it is quite up to you to decide.
My answer is don't use it, you will save some time, and you really want to use it, make some text editor that does it, or maybe do yourself some compiler that interpret printed html code, because it's worthless to do it on a text file, which I'll remind you, is a plain long line of bytes reorganized with tabs, space and return carriages.
I have a more or less large Perl script of ~ 1000 lines. The script accepts a few arguments and it runs straight forward. No modules, no functions. The script could be divided into three parts, initialization part, arguments parsing part and work part, but I don't know how to do that. Everything must be kept in a single file. Please, can anyone give me instructions/advice how to structure my Perl script?
Thanks.
You ask for advice on how to refactor your script, but you don't appear to understand why to refactor it. Without the why, the how isn't going to do you much good. And with the why, the how may fall out quite naturally.
If your script is working perfectly and needs no modification and all you'll ever do with it is run it, then you probably don't have a reason to refactor it - and I say that from the perspective of despising long routines. But...
If something's wrong with it
If you are trying to find a bug in your 1,000-line program, you have some hard work ahead of you. The problem could be anywhere. Break it up into smaller pieces so that you can verify the input and output at different stages - ideally, write tests for the smaller pieces. Fine-grained unit tests will tell you what isn't working, the nature of the error, and where the error exists.
If you need to modify it
If you need to change the script to - say - accommodate a new graphics format, or take advantage of multiple processors, or record its activities to a log - you will find it easier to extend if the program elements that need revision or extension are better isolated.
If you're trying to explain it to someone else, or show it off
You will find it much easier to convey the ideas in your script to another developer if the ideas are broken out into discrete methods.
So, there are some reasons why you might choose to refactor. If any of them apply, refactor accordingly; the how will drop out naturally. Extract Method may be your best friend.
If you can see logical parts of your script, you should definitely abstract them into functions. Having a single script of over 1000 lines, and not breaking it up into whatever abstraction units your language provides (functions, classes, etc.) is a very bad idea. Maintaining your script, i. e. adding features and fixing bugs, will be a nightmare.
I strongly suggest you read the book Clean Code by Robert C. Martin. It uses Java for examples, but the ideas are applicable to any language. The one that is most relevant here is "Make your functions small. Then make them smaller."
1000 lines and no functions? why not? this is a vague question.
You could break each section into a separate function, and then have a function that runs through each of these functions in the correct order called 'run()' or something similar. This would allow you to break up the program into more mangeable chunks.
p.s. man I think I used the word function too many times in this answer!
Have you refactored at all? At 1000 lines I'd suspect to see some code that could be broken down into functions internal to the script.
Well, if you have three separate sections that's the logical choice.
You could make each one into a function and then have a simple linear control at the top:
my $var1, $var2, $var3;
$var1 = init();
$var2 = parseInput();
$doWork();
sub init() {
some code here
}
sub parseInput() {
some code here
}
sub doWork() {
some code here
}
The big issue is you're going to be using globals a lot. I'd build them into a structure or two. I would also expect to see the big three broken down into functions themselves. Back in the 80s when the big thing I was learning was structured programming (the best design here I think) the rule of thumb was a function should fit on roughly one screen or less.
Most people would typically answer something like "a subroutine should do one thing" and "a subroutine should only take up one page in your editor". You can try to keep these things in mind when you refactor your code.
Try to identify parts of your code that can be split off into logical sections. You've started this process by spotting 'initialization', 'argument parsing', and 'work'. See if there are some sub-sections within that that can be pruned off into other subroutines.
Also, why do you not use any modules? One that springs to mind is Getopt::Long, which is a core module, so you won't have to install it manually. It will handle all of your argument parsing, and by using it you will probably avoid bugs and could shorten your code to make it more maintainable. By using standard modules like this, you not only (hopefully!) reduce the number of bugs in your code, you make it easier for other Perl programmers to understand.
You could look at search.cpan.org, maybe some Perl module suits your needs. For example there is a CGI::Application
In my C project I have quite a large utils.c file. It is really full of many utilities of different sorts. I feel a bit naughty just stuffing different miscellaneous functions in there. For example it has some utilities related to low level stuff such as a lowercase() function, and it also has some quite sophisticated utilities such as converting to/from different colour formats.
My question is, is it very naughty to have such a large utils.c with many different types of utilities in it? Should I break it up into many different kinds of utility files? Such as graphics_utils.c and so on What do you think?
Breaking them up into separate files based on categories (ie graphics, strings, etc.) will lead to better organization, making it easier to locate certain pieces of code, having smaller files to go through, instead of just one large file.
You want to break it up, not just for organizational reasons, but because you will have many other files that depend on this one. Because everything will depend on this file, it makes this one file difficult to change because it might cause widespread breakage.
http://ifacethoughts.net/2006/04/15/stable-dependencies-principle/
If it's just you that will EVER maintain the stuff, it's a matter of when the complexity gets to the point where you find yourself searching for things. That would be the time to refactor and reorganize (there's a cost to reorganize, just as there's a cost to not reorganize).
If it's POSSIBLE that anyone else will maintain a project that includes your utils, you have to consider THEIR pain point when deciding when to reorganize. Theirs is MUCH lower than yours.
I tend to break them up into various sub-utils as you say (graphics_utils) when it becomes appropriate.
Break it up. Stuff will be easier to find, easier to reuse, easier to refactor, easier to unit test. I recently needed to get a set of ISO-8601 date handling methods out of a ginormous Java utility class of static methods, and it was really hard to find the 5% of the code I needed.
It is definitely not kosher, because the next guy coming through your code won't know where to look for anything. Break it up by function, and your coworkers will thank you!
Another advantage that comes from breaking up the file into separates is that when you place it under source control, you can have finer grained control. This really is useful if you have bits that are tweaked/extended/specialised frequently, and other bits that are relatively stable.
Another point: You should organize your code, i. e. break it up in smaller modules and categorize it, because at some point in time you will end up writing a second and third function for the same thing, simply for the reason that you wont find that function that you knew it was there, but you don't remember it's name.
I've got a (rather large) project with such a module and there is programming logic for which there are up to 5-6 implementations (for the same thing).
Like everyone else I would break them up. But I tend to use Extension Methods now, so I would have one class (and one file) per class being extended (e.g. StringExtensions, SqlDataReaderExtensions, etc). I find this tends to break up the utility methods nicely.