Significance of using the term "over" when talking about operations - scala

Here is an example sentence which uses the term "over" when talking about a type and one of its operations:
"A monoid is a type along with an associative binary operation over it..."
My question is not about monoids - what I don't quite understand is the use of the term "over".
Is this from mathematics?
Does it mean that the operation works on the data defined by the type itself - as opposed to any data we might pass to it from outside the instance of that type?

Yes, it comes from Math.
In Math and FP, unlike OOP, data structures are defined independently of the functions that operate on them.
As a consequence, it's easy to have many different operations over the same data type, for example the Sum monoid over Integers and the Multiplication monoid over Integers.
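For example, here is a small Scala sketch (using a hypothetical Monoid trait for illustration, not any particular library's): two different monoids over the same type Int, differing only in the operation and its identity element.

trait Monoid[A] {
  def empty: A                      // identity element
  def combine(x: A, y: A): A        // associative binary operation over A
}

// The "Sum" monoid over Integers: (Int, +, 0)
val sumMonoid: Monoid[Int] = new Monoid[Int] {
  def empty: Int = 0
  def combine(x: Int, y: Int): Int = x + y
}

// The "Multiplication" monoid over Integers: (Int, *, 1)
val productMonoid: Monoid[Int] = new Monoid[Int] {
  def empty: Int = 1
  def combine(x: Int, y: Int): Int = x * y
}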

Related

Why do Calculus of Construction based languages use Setoids so much?

One finds that Setoids are widely used in languages such as Agda, Coq, ... Indeed, the designers of languages such as Lean have argued that their built-in quotients help avoid "Setoid Hell". What is the reason for using Setoids in the first place?
Does the move to extensional type theories based on HoTT (such as cubical Agda) reduce the need for Setoids?
As Li-yao Xia's answer describes, setoids are used when we don't have or don't want to use quotients.
In the HoTT book and in Lean, quotients are (basically) axiomatized. One difference between Lean and the HoTT book is that the latter has many more higher inductive types, while Lean only has quotients and (regular) inductive types. If we restrict our attention to quotients (set quotients in the HoTT book), they work the same in Lean and in Book HoTT.
In this case you just postulate that, given a type A and an equivalence relation R on A, you have a quotient Q and a function [-] : A → Q with the property ∀ x y : A, R x y → [x] = [y]. It comes with the following elimination principle: to construct a function g : Q → X for some type X (or hSet X in HoTT) we need a function f : A → X such that we can prove ∀ x y : A, R x y → f x = f y. This comes with the computation rule that states ∀ x : A, g [x] ≡ f x (a definitional equality in both Lean and Book HoTT).
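As a concrete illustration, here is a minimal sketch in Lean 4 using the built-in Quot primitives; the relation r and the names Q, mk, canon are invented for this example.

-- The equivalence relation we quotient by: equality of parity.
def r (a b : Int) : Prop := a % 2 = b % 2

-- The quotient type Q and the map [-] : A → Q.
def Q : Type := Quot r
def mk (a : Int) : Q := Quot.mk r a

-- The postulated property: R x y → [x] = [y].
example (a b : Int) (h : r a b) : mk a = mk b := Quot.sound h

-- Elimination: a function out of Q is given by a function out of Int that respects r.
def canon : Q → Int :=
  Quot.lift (fun a => a % 2) (fun _ _ h => h)

-- The computation rule g [x] ≡ f x holds definitionally:
example (a : Int) : canon (mk a) = a % 2 := rfl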
The main disadvantage of this quotient is that it breaks canonicity. Canonicity states that every closed term (that is, a term without free variables) in (say) the natural numbers normalizes to either zero or the successor of some natural number. The reason that this quotient breaks canonicity is that we can apply the elimination principle for = to the new equalities in a quotient, and a term like that will not reduce. In Lean the opinion is that this doesn't matter, since in all cases we care about we can still prove an equality, even though it might not be a definitional equality.
Cubical type theory has a fancy way to work with quotients while retaining canonicity. In CTT equality works differently, and this means that higher inductive types can be introduced while keeping canonicity. Potential disadvantages of CTT are that the type theory is a lot more complicated, and that you have to embrace HoTT (and in particular give up on proof irrelevance).
[The answers by Li-yao Xia and Floris van Doorn are both excellent, so I will try to augment them with additional information.]
Another view is that quotients, while used a lot in classical mathematics, are perhaps not so great after all. Not taking quotients but sticking to Groupoids is exactly where non-commutative geometry starts from! It teaches us that some quotients are incredibly badly behaved, and the last thing we want to do (in those cases!) is to actually quotient. But that the thing itself is not so bad, even quite good, if you treat it using the 'right' tools.
It is arguably also deeply embedded in all of category theory, where one doesn't quotient out equivalent objects. Taking 'skeletons' in category theory is regarded as being in bad taste. The same is true of strictness, and a host of other things, all of which boil down to trying to squish down things that are better left as they are, since they do no harm at all. We're just used to wanting 'uniqueness' to be reflected in our representations - something we should just get over.
Setoid hell arises not because some coherences must be proven (you need to prove them to show you have a proper equivalence, and again whenever you define functions on raw representations instead of on the quotiented version). It arises when you're forced to prove these coherences again and again when defining functions that can't possibly "go wrong". So Setoid hell is actually caused by a failure to provide proper abstraction mechanisms.
In other words, if you're doing sufficiently simple mathematics, where quotients are well-behaved, then there should be some automation that lets you work with them smoothly. Working out exactly what that could look like is ongoing research in type theory. Floris' answer outlines one pitfall well: at some point you give up on computations being well-behaved, and from then on you are forced to do everything via proofs.
Ideally one would certainly like to be able to treat arbitrary equivalence relations as Leibniz equality (eq), enabling rewriting in arbitrary contexts. That means to define the quotient of a type by an equivalence relation.
I'm not an expert on the topic, but I've been wondering the same for a while, and I think the reliance on setoids stems from the fact that quotients are still a poorly understood concept in type theory.
Setoid Hell is where we're stuck when we don't have/want quotient types.
We can axiomatize quotient types; I believe (I could be mistaken) that's what Lean does.
We can develop a type theory which can naturally express quotients, that's what HoTT/Cubical TT do with higher inductive types.
Furthermore, quotient types (or my naive imagination of them) force us to package programs and proofs together in a perhaps less-than-ideal way: a function between two quotient types is a plain function together with a proof that it respects the underlying equivalence relation. While one can technically do that, this interleaving of programming and proving is arguably undesirable because it makes programs unreadable. One often seeks either to keep programs and proofs in two completely separate worlds (which mandates setoids, keeping types separate from their equivalence relations), or to change the representation so that the program and the proof are one and the same entity (so we might not even need to reason explicitly about equivalences in the first place).

Modelica I/O blocks vs. Functions

Blocks and functions in Modelica have some similarities and differences. In blocks, output variables are most likely expressed in terms of input variables using equations, whereas in functions output variables are expressed in terms of input variables using assignments. Given a relationship y = f(u) that can be expressed using both notions, I am interested in knowing which notion you should favour in which situation.
Personally,
Blocks can be better integrated in block diagrams using input/output connectors
Equations in blocks can most likely be treated better by compilers for symbolic manipulation, optimization, and evaluating the analytical derivatives required for Jacobian evaluation. So I guess blocks are likely less sensitive to numerical errors in some boundary cases. For functions, derivatives are likely to be evaluated using finite-difference methods if they are not provided explicitly.
On the other hand, a set of assignments in a function will most likely be treated as a single equation. The same assignments, if expressed as a larger set of equations in a block, will result in a larger model, probably leading to decreased runtime performance.
Although a block with an algorithm section is roughly equivalent to a function with the same set of assignments, the function-call syntax is favoured in a couple of situations.
One can establish hierarchies of block types and do all sorts of object-oriented modelling. Functions are more limited: it is not possible to extend from a non-abstract function that contains an algorithm section, but it is possible to have abstract functions that act as interfaces from which implemented functions can be derived, etc.
Some of the above arguments depend on how a specific simulation environment treats a block or a function; these may be low-level details that are not necessarily known.
The list in your "question" is already a pretty good summary. Still there are some additional things that should be considered:
Regarding the differentiation of functions, the developer at least needs to define how often the assignments can be differentiated (here is a nice read on this), as e.g. Dymola will not do it automatically. Alternatively the differentiated function can be specified manually (here). By the way, a partial derivative can be defined as well, see Language Specification, Sec. 12.7.2.
When a function needs to be inverted, it can be necessary to define the inverse manually. This is described in the Language Specification, Sec. 12.8.
Also it could be important that code from a function can be inlined, which should overcome some of the issues mentioned above, see Language Specification, Sec. 18.3.
Generally I would go for blocks whenever there is no very strong reason for a function. Some reasons that come to mind are the need for procedural execution or for-loops.
This is just my two cents - more opinions welcome...
You might be interested in the opposite: calling a block as if it was a function:
https://github.com/modelica/ModelicaSpecification/issues/1512
The advantage of using function syntax is that you don't need to declare + connect components:
Block b;
equation
  connect(x, b.in1);
  connect(y, b.in2);
  connect(z, b.out1);

vs

z = Block(x, y);
Of course right now, this syntax does not exist yet. And you really want to use blocks when you can. Algorithmic blocks might as well be functions as they are shorter and easier to write and will introduce fewer trajectories in your result-file (good unless you want to debug what happens inside the function call I guess).

Are there algebraic data types outside of sum and product?

By most definitions, the common or basic algebraic data types in Haskell or Scala are sum and product. Examples: 1, 2.
Sometimes a definition just says algebraic data types are sum and product, perhaps for simplicity.
However, the definitions leave an impression that other algebraic data types are possible, and sum and product are just the most useful to describe selection or combination of elements.
Given that there are subtraction, division, and raising-to-an-integer-power operations in basic algebra - is it correct that implementations of other, alternative algebraic types are possible in programming, but they are just not useful?
Do any programming languages have algebraic data types implemented that are not sum and product types?
"Algebraic" comes from category theory. Every algebraic data type is an initial algebra of a functor. So you could in principle call anything that comes from a functor in this way algebraic, and I think it's quite a large class.
Interpreting "algebraic" to mean "high-school algebra" (I don't mean to be condescending, that's just how we refer to it) as you have, there are some nice analogies.
Arbitrary powers, not just integer powers, are closely analogous to function types; that is, A -> B is analogous to B^A. In category theory, when you consider a function ("morphism") as an object of a category, it's called an exponential object, and the latter notation is used. For fun, see if you can prove the law C^(A+B) = C^A × C^B by writing a bijection between the corresponding types.
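For example, here is that bijection written as a Scala sketch, with Either[A, B] playing the role of A + B (the names split and join are just illustrative):

// C^(A+B) = C^A × C^B: the two directions are mutually inverse.
def split[A, B, C](f: Either[A, B] => C): (A => C, B => C) =
  (a => f(Left(a)), b => f(Right(b)))

def join[A, B, C](fg: (A => C, B => C)): Either[A, B] => C = {
  case Left(a)  => fg._1(a)
  case Right(b) => fg._2(b)
}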
Division is analogous to quotient types, which is a fascinating area of research that reaches into things as hott and trendy as homotopy type theory. The analogy of quotients to division is not as strong as product types with multiplication, as you have to divide by an equivalence relation.
At this rate, you would expect subtraction to have some beautiful analogy to go with it, but alas I know of none. Dan Piponi has explored it a little through the antidiagonal, but it is far from a general analogy.
The conventional answer is as given by @luqui, and I have nothing to add to that. The downside is that while you distinguish the alternants of the sum by tag ('constructor' in Haskell), you must access the components of the product by position. For homogeneous components, as in a vector/array, that's fine; but for typical data structures (record types) you want to access components by 'field label' and abstract away the position. Haskell's record/label system a) does a very bad job of that; and b) is so deeply ingrained into the language that it's proving almost impossible to improve -- see the endless proposals and endless discussions that have so far resulted in no change.
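To illustrate the by-position vs. by-label distinction in Scala terms (an invented toy example, not from the answer):

// Access by position: what _2 means lives only in the programmer's head.
val personByPosition: (String, Int) = ("Ada", 36)
val age1 = personByPosition._2

// Access by label: the field name abstracts away the position.
case class Person(name: String, age: Int)
val personByLabel = Person("Ada", 36)
val age2 = personByLabel.age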
Then an unconventional answer is indexed families, aka 'indexed sets'. The idea has been developed mostly around the 'Set-Theoretic Data Structures' of D.L. Childs, for example this one. Childs' approach got a mention in Codd 1970, the seminal paper on the Relational (database) Model.
The critical feature is that you can use any type to index the collection of components; the components are heterogeneous; and the compiler supports type-safe access (read and update) both by-component and whole-structure. The components might well be organised positionally within the structure, but that's an implementation detail hidden from the programmer. (Haskell's record system fails on this point.)
Do any programming languages have algebraic data types implemented that are not sum and product types?
You might or might not accept that SQL is a programming language. I might but mostly don't accept that SQL 'column names' are an implementation of 'Indexed families'. SQL's columns and rows are far too much oriented to physical layout (and indeed most vendors' SQL still allows positional notation for columns, even though it's been deprecated by the standard). That said, SQL is the nearest you'll find.
There have been a few extensible/anonymous record systems proposed/developed in Haskell (especially HList), in Haskell-like languages (like Ur/Web), and even in dear old Hugs' TRex. (See the Gaster & Jones paper for links to other attempts in FP languages.) All of them are limited because they're trying to put lipstick on Haskell's sum-of-product types.
I know Sum, Product, Exponential and Recursive

MAAB guideline : na_0002: Appropriate implementation of fundamental logical and numerical operations

The MAAB guideline na_0002: Appropriate implementation of fundamental logical and numerical operations indicates that the logical data type shouldn't be used in numerical operations. Among the rationales listed we can see code generation, but I can't understand how using the logical data type in numerical operations could negatively affect code generation, since a cast is done for the logical data type.
Any help please.
Regards
You shouldn't add, subtract, multiply or divide numbers by logicals (booleans). It just doesn't make sense.
Should the result of a math operation involving a logical operand give a numerical or a logical result, for example? It's ambiguous.

When should I choose Vector in Scala?

It seems that Vector was late to the Scala collections party, and all the influential blog posts had already left.
In Java ArrayList is the default collection - I might use LinkedList but only when I've thought through an algorithm and care enough to optimise. In Scala should I be using Vector as my default Seq, or trying to work out when List is actually more appropriate?
As a general rule, default to using Vector. It’s faster than List for almost everything and more memory-efficient for larger-than-trivial sized sequences. See this documentation of the relative performance of Vector compared to the other collections. There are some downsides to going with Vector. Specifically:
Updates at the head are slower than List (though not by as much as you might think)
Another downside before Scala 2.10 was that pattern matching support was better for List, but this was rectified in 2.10 with generalized +: and :+ extractors.
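For example, a small illustrative snippet of those extractors on a Vector:

import scala.collection.{+:, :+}   // explicit import of the extractors (they may already be in scope, depending on the Scala version)

val v = Vector(1, 2, 3, 4)

v match {
  case head +: rest => println(s"head = $head, rest = $rest")   // head = 1, rest = Vector(2, 3, 4)
  case _            => println("empty")
}

v match {
  case init :+ last => println(s"init = $init, last = $last")   // init = Vector(1, 2, 3), last = 4
  case _            => println("empty")
}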
There is also a more abstract, algebraic way of approaching this question: what sort of sequence do you conceptually have? Also, what are you conceptually doing with it? If I see a function that returns an Option[A], I know that function has some holes in its domain (and is thus partial). We can apply this same logic to collections.
If I have a sequence of type List[A], I am effectively asserting two things. First, my algorithm (and data) is entirely stack-structured. Second, I am asserting that the only things I’m going to do with this collection are full, O(n) traversals. These two really go hand-in-hand. Conversely, if I have something of type Vector[A], the only thing I am asserting is that my data has a well defined order and a finite length. Thus, the assertions are weaker with Vector, and this leads to its greater flexibility.
Well, a List can be incredibly fast if the algorithm can be implemented solely with ::, head and tail. I had an object lesson of that very recently, when I beat Java's split by generating a List instead of an Array, and couldn't beat that with anything else.
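For example, an algorithm in exactly that style (an illustrative sketch, not the actual split code referred to above), built purely from pattern matching and :: :

// Split a list of characters on a separator, consuming the input once
// and building the result only with prepend (::).
def splitChars(s: List[Char], sep: Char): List[List[Char]] = {
  def go(rest: List[Char], current: List[Char], acc: List[List[Char]]): List[List[Char]] =
    rest match {
      case Nil        => (current.reverse :: acc).reverse
      case `sep` :: t => go(t, Nil, current.reverse :: acc)
      case c :: t     => go(t, c :: current, acc)
    }
  go(s, Nil, Nil)
}

// splitChars("a,bc,d".toList, ',') == List(List('a'), List('b', 'c'), List('d'))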
However, List has a fundamental problem: it doesn't work with parallel algorithms. I cannot split a List into multiple segments, or concatenate it back, in an efficient manner.
There are other kinds of collections that can handle parallelism much better -- and Vector is one of them. Vector also has great locality -- which List doesn't -- which can be a real plus for some algorithms.
So, all things considered, Vector is the best choice unless you have specific considerations that make one of the other collections preferable -- for example, you might choose Stream if you want lazy evaluation and caching (Iterator is faster but doesn't cache), or List if the algorithm is naturally implemented with the operations I mentioned.
By the way, it is preferable to use Seq or IndexedSeq unless you want a specific piece of API (such as List's ::), or even GenSeq or GenIndexedSeq if your algorithm can be run in parallel.
Some of the statements here are confusing or even wrong, especially the idea that immutable.Vector in Scala is anything like an ArrayList.
List and Vector are both immutable, persistent (i.e. "cheap to get a modified copy") data structures.
There is no reasonable default choice as there might be for mutable data structures; rather, it depends on what your algorithm is doing.
List is a singly linked list, while Vector is a base-32 integer trie, i.e. it is a kind of search tree with nodes of degree 32.
Using this structure, Vector can provide most common operations reasonably fast, i.e. in O(log_32(n)). That works for prepend, append, update, random access and decomposition into head/tail. Iteration in sequential order is linear.
List, on the other hand, provides only linear iteration and constant-time prepend and decomposition into head/tail. Everything else generally takes linear time.
This might make it look as if Vector were a good replacement for List in almost all cases, but prepend, decomposition and iteration are often the crucial operations on sequences in a functional program, and the constants of these operations are (much) higher for Vector due to its more complicated structure.
I made a few measurements: iteration is about twice as fast for List, prepend is about 100 times faster on List, decomposition into head/tail is about 10 times faster on List, and generation from a traversable is about 2 times faster for Vector (probably because Vector can allocate arrays of 32 elements at once when you build it up using a builder, instead of prepending or appending elements one by one).
Of course, all operations that take linear time on lists but effectively constant time on vectors (such as random access or append) will be prohibitively slow on large lists.
So which data structure should we use?
Basically, there are four common cases:
We only need to transform sequences by operations like map, filter, fold etc:
basically it does not matter; we should program our algorithm generically and might even benefit from accepting parallel sequences. For sequential operations List is probably a bit faster, but you should benchmark it if you have to optimize.
We need a lot of random access and various updates: use Vector; List will be prohibitively slow.
We operate on lists in a classical functional way, building them by prepending and iterating by recursive decomposition: use List; Vector will be slower by a factor of 10-100 or more.
We have a performance-critical algorithm that is basically imperative and does a lot of random access on a list, something like in-place quicksort: use an imperative data structure such as ArrayBuffer locally, and copy your data to and from it.
For immutable collections, if you want a sequence, your main decision is whether to use an IndexedSeq or a LinearSeq, which give different guarantees for performance. An IndexedSeq provides fast random-access of elements and a fast length operation. A LinearSeq provides fast access only to the first element via head, but also has a fast tail operation. (Taken from the Seq documentation.)
For an IndexedSeq you would normally choose a Vector. Ranges and WrappedStrings are also IndexedSeqs.
For a LinearSeq you would normally choose a List or its lazy equivalent Stream. Other examples are Queues and Stacks.
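A small illustration of the two sets of guarantees (an illustrative snippet):

import scala.collection.LinearSeq

val indexed: IndexedSeq[Int] = Vector(10, 20, 30, 40)
indexed(2)       // fast random access: 30
indexed.length   // fast length: 4

val linear: LinearSeq[Int] = List(10, 20, 30, 40)
linear.head      // fast access to the first element: 10
linear.tail      // fast tail: List(20, 30, 40)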
So in Java terms, ArrayList is used similarly to Scala's Vector, and LinkedList similarly to Scala's List. But in Scala I would tend to use List more often than Vector, because Scala has much better support for functions that include traversal of the sequence, like mapping, folding, iterating etc. You will tend to use these functions to manipulate the list as a whole, rather than randomly accessing individual elements.
In situations which involve a lot of random access and random mutation, a Vector (or – as the docs say – a Seq) seems to be a good compromise. This is also what the performance characteristics suggest.
Also, the Vector class seems to play nicely in distributed environments without much data duplication because there is no need to do a copy-on-write for the complete object. (See: http://akka.io/docs/akka/1.1.3/scala/stm.html#persistent-datastructures)
If you're programming immutably and need random access, Seq is the way to go (unless you want a Set, which you often actually do). Otherwise List works well, except that its operations can't be parallelized.
If you don't need immutable data structures, stick with ArrayBuffer since it's the Scala equivalent to ArrayList.