Scala legacy code: how to access input parameters at different points in execution path? - scala

I am working with a legacy scala codebase, and as is always the case modifying the code is quite difficult without touching different parts.
One of my new requirement in to make several decisions based on some input parameters. Problem is that these decisions are to be made at various points along the execution. So either I encapsulate all those parameters in a case class instance and pass it along. But it means I would have to modify multiple methods signatures, and I want to avoid this approach as much as possible.
Another approach can be to create a global object containing all those input parameters and accessible from different points in the execution. Is it a good approach in Scala?

No, using global mutable variables to pass “hidden” parameters is not a good idea, not in Scala and not in any other programming language. It makes the code hard to understand and modify, because a function's behaviour will now depend on which functions were invoked earlier. And it's extremely fragile, because you might forget setting one of those global parameters before invoking the function, which means that it will use whatever value was stored there before. This is the kind of thing that can appear to work for years, and then break when you modify a completely unrelated part of the program.
I can't stress this enough: do not use global mutable variables, period. The solution is to man up and change those method signatures. Depending on the details, dependency injection may or may not help in your particular case.

Related

Best way to initialize Matlab parameters based on the machine

I am currently in a stage where I would like to have my code modularized and following software-engineering techniques to make it reusable and understandable. In particular, I run my code either in my laptop and in an external server.
My goal is to have the main part of the code exactly the same in the laptop and the server, but different initialization parameters in the two case (in the server I will increase the # of iterations for instance and other variables). What is the common practice to do so, apart from clearly an if-else statement in the main?
I was thinking of an initialization file (like a JSON) in the laptop and the server, which is different and I just need to modify the values. Or a Matlab function which initializes the variables, but still, it would have an if-else statement.
Other suggestions? Keep in mind I might to want to extend the algorithmic part and introduce new parameters in the future.
Thanks

How to wrap procedural algorithms in OOP language

I have to implement an algorithm which fits perfectly to the procedural design approach. It has no relations with some data structure, it just takes couple of objects, bunch of control parameters and performs complicated operations on them, including creating and modifying intermediate temporal data, subroutines calls, many cpu-intensive data transformations. The algorithm is too specific to include in either parameter object as method.
What is idiomatic way to wrap such algorithms in an OOP language? Define static object with static method that performs calculation? Define class that takes all algorithm parameters as constructor arguments and have result method to return result? Any other way?
If you need more specifics, I'm writing in scala. But any general OOP approach is also applicable.
A static method (or a method on a singleton object in the case of Scala -- which I'm just gonna call a static method because that's the most common terminology) can work perfectly fine and is probably the most common approach to this.
There's some reasons to use other approaches, but they aren't strictly necessary and I'd avoid them unless you actually need an advantage that they give. The reason for this is because static methods are the simplest (if least versatile) approach.
Using a non-static method can be useful because you can then utilize design patterns like the factory pattern. For example, you might have an Operator class with a method evaluate. Now you could have different factories create different Operators so that you can swap your algorithm on the fly. Perhaps a calculator might have an AddOperatorFactory, MultiplyOperatorFactory and so on. Obviously this requires that you are able to instantiate an object that represents the algorithm. Of course, you could just pass a function around directly, as Scala and many other languages allow. Classes allow for inheritance, though, which opens the doors for some design patterns and, well, you're asking about OOP, not Scala specifically.
Also useful is the ability to have state with an object. With static methods, your only options for retaining state are either having global state (ew) or making the user of the static methods keep track of this state (more work for the users). With an instance of an object, you can keep that state inside the instance. For example, if your algorithm is a graph search, perhaps you'd want to allow resuming a search after you find the first match (which obviously requires storing state).
It's not much harder to have to do new MyAlgorithm().doStuff() instead of MyAlgorithm.doStuff(), so if in doubt, I would err on the side of avoiding static methods if you think you'll need the functionality that having an instance offers.

Progress-gl - What's benefit of placing variable declaration on top of the procedure

I've been doing Progress 4GL for 8 years though it's not my main responsibility. I do C++ and Java a lot more. When programming in other language it's suggested to have the declaration close to the usage. With 4GL however I see people place the declaration on top of the file. It's even in the coding standard.
I think placing them on top of them file would lead to 'vertical separation' problem. In most other language it's even suggested to do the assignment at the same line as the declaration.
The question is why it's suggested to do so in 4GL ? What's the benefit ? I know that it's possible to place the declaration anywhere in the file, given that it's declared before it is used.
I think the answer is to do with scoping, or the lack of it, within Progress 4GL.
If you are used to Java, say, and read a Progress 4GL program, that looks like
DO:
DEFINE VARIABLE x AS INTEGER INITIAL 4.
DISPLAY x.
END.
then you wouldn't expect to be able to use this value of x anywhere else in the program, and that any changes made in the block, wouldn't effect anything outside the block.
As I understand it, all progress variables declared within the body of a program are scoped to the whole program, unless they are declared are within an internal procedure or function, in which case they are scoped to the procedure or function.
(Incidentally any default buffers [i.e. undeclared] you use within an internal procedure/function are scoped to the whole program, not just the procedure or function, so you need to be very careful to explicity declare buffers in functions you intend ot use recursively).
I therefore think the convention of declaring variables at the beginning of a program is in order to reflect the fact that Progress will treat them has having been done so, regardless of where you put the declaration.
There is absolutely no benefit in scoping anything to the program as a whole when it could be scoped smaller.
Smaller scopes are easier to test, give less possibility of namespace conflict, and less opportunity for error.
Tightly scoped named buffers are especially useful when writing to the database because they eliminate the possibility of there ever being some other part of your code that uses the same buffer and causes a share-lock, i.e., this fails to compile:
do for b-customer transaction:
find b-customer where .... exclusive...
...
end.
...
find b-customer...
On the other hand, procedures and functions (and include files...) that share scope with the main body of code are a major source of bugs, because when you pick up your variable or whatever, you can never be entirely certain where it has been...
All of this is just basic Structured Programming, of course. It's true for every language and has been accepted since the 70's.
The "reason" that you usually see variables defined at the top is simple. Habit. That is just how things were done in the bad old days.
A lot of old code, or code written by old fossils, is written that way. No matter the language.
Some languages (COBOL springs to mind) even formalized it.
Is there any advantage to such an approach?
Not especially. I guess you could argue "they are all in one place and easy to find" but that isn't very compelling.
"Habit" is actually more compelling ;) If you are working with a team that expects a certain style or in an application where a particular style is prevalent then you should think twice before unilaterally throwing out a new way of doing things - the confusion could be a bigger problem than the advantages gained.

Design - When to create new functions?

This is a general design question not relating to any language. I'm a bit torn between going for minimum code or optimum organization.
I'll use my current project as an example. I have a bunch of tabs on a form that perform different functions. Lets say Tab 1 reads in a file with a specific layout, tab 2 exports a file to a specific location, etc. The problem I'm running into now is that I need these tabs to do something slightly different based on the contents of a variable. If it contains a 1 I may need to use Layout A and perform some extra concatenation, if it contains a 2 I may need to use Layout B and do no concatenation but add two integer fields, etc. There could be 10+ codes that I will be looking at.
Is it more preferable to create an individual path for each code early on, or attempt to create a single path that branches out only when absolutely required.
Creating an individual path for each code would allow my code to be extremely easy to follow at a glance, which in turn will help me out later on down the road when debugging or making changes. The downside to this is that I will increase the amount of code written by calling some of the same functions in multiple places (for example, steps 3, 5, and 9 for every single code may be exactly the same.
Creating a single path that would branch out only when required will be a bit messier and more difficult to follow at a glance, but I would create less code by placing conditionals only at steps that are unique.
I realize that this may be a case-by-case decision, but in general, if you were handed a previously built program to work on, which would you prefer?
Edit: I've drawn some simple images to help express it. Codes 1/2/3 are the variables and the lines under them represent the paths they would take. All of these steps need to be performed in a linear chronological fashion, so there would be a function to essentially just call other functions in the proper order.
Different Paths
Single Path
Creating a single path that would
branch out only when required will be
a bit messier and more difficult to
follow at a glance, but I would create
less code by placing conditionals only
at steps that are unique.
Im not buying this statement. There is a level of finesse when deciding when to write new functions. Functions should be as simple and reusable as possible (but no simpler). The correct answer is almost never 'one big file that does a lot of branching'.
Less LOC (lines of code) should not be the goal. Readability and maintainability should be the goal. When you create functions, the names should be self documenting. If you have a large block of code, it is good to do something like
function doSomethingComplicated() {
stepOne();
stepTwo();
// and so on
}
where the function names are self documenting. Not only will the code be more readable, you will make it easier to unit test each segment of the code in isolation.
For the case where you will have a lot of methods that call the same exact methods, you can use good OO design and design patterns to minimize the number of functions that do the same thing. This is in reference to your statement "The downside to this is that I will increase the amount of code written by calling some of the same functions in multiple places (for example, steps 3, 5, and 9 for every single code may be exactly the same."
The biggest danger in starting with one big block of code is that it will never actually get refactored into smaller units. Just start down the right path to begin with....
EDIT --
for your picture, I would create a base-class with all of the common methods that are used. The base class would be abstract, with an abstract method. Subclasses would implement the abstract method and use the common functions they need. Of course, replace 'abstract' with whatever your language of choice provides.
You should always err on the side of generalization, with the only exception being early prototyping (where throughput of generating working stuff is majorly impacted by designing correct abstractions/generalizations). having said that, you should NEVER leave that mess of non-generalized cloned branches past the early prototype stage, as it leads to messy hard to maintain code (if you are doing almost the same thing 3 different times, and need to change that thing, you're almost sure to forget to change 1 out of 3).
Again it's hard to specifically answer such an open ended question, but I believe you don't have to sacrifice one for the other.
OOP techniques solves this issue by allowing you to encapsulate the reusable portions of your code and generate child classes to handle object specific behaviors.
Personally I think you might (if possible by your API) create inherited forms, create them on fly on master form (with tabs), pass agruments and embed in tab container.
When to inherit form and when to decide to use arguments (code) to show/hide/add/remove functionality is up to you, yet master form should contain only decisions and argument passing and embeddable forms just plain functionality - this way you can separate organisation from implementation.

What functions to put inside a class

If I have a function (say messUp that does not need to access any private variables of a class (say room), should I write the function inside the class like room.messUp() or outside of it like messUp(room)? It seems the second version reads better to me.
There's a tradeoff involved here. Using a member function lets you:
Override the implementation in derived classes, so that messing up a kitchen could involve trashing the cupboards even if no cupboards are available in a generic room.
Decide that you need to access private variables later on, without having to refactor all the code that uses the function.
Make the function part of an interface, so that a piece of code may require that its argument be mess-up-able.
Using an external function lets you:
Make that function generic, so that you may apply it to rooms, warehouses and oil rigs equally (if they provide the member functions required for messing up).
Keep the class signature small, so that creating mock versions for unit testing (or different implementations) becomes easier.
Change the class implementation without having to examine the code for that function.
There's no real way to have your cake and eat it too, so you have to make choices. A common OO decision is to make everything a method (unless clearly idiotic) and sacrifice the three latter points, but that doesn't mean you should do it in all situations.
Any behaviour of a class of objects should be written as an instance method.
So room.messUp() is the OO way to do this.
Whether messUp has to access any private members of the class or not, is irrelevant, the fact that it's a behaviour of the room, suggests that it's an instance method, as would be cleanUp or paint, etc...
Ignoring which language, I think my first question is if messUp is related to any other functions. If you have a group of related functions, I would tend to stick them in a class.
If they don't access any class variables then you can make them static. This way, they can be called without needing to create an instance of the class.
Beyond that, I would look to the language. In some languages, every function must be a method of some class.
In the end, I don't think it makes a big difference. OOP is simply a way to help organize your application's data and logic. If you embrace it, then you would choose room.messUp() over messUp(room).
i base myself on "C++ Coding Standards: 101 Rules, Guidelines, And Best Practices" by Sutter and Alexandrescu, and also Bob Martin's SOLID. I agree with them on this point of course ;-).
If the message/function doesnt interract so much with your class, you should make it a standard ordinary function taking your class object as argument.
You should not polute your class with behaviours that are not intimately related to it.
This is to repect the Single Responsibility Principle: Your class should remain simple, aiming at the most precise goal.
However, if you think your message/function is intimately related to your object guts, then you should include it as a member function of your class.