Pattern matching rewrites with internal DSL? - scala

My question is about DSL design. It is related to internal vs. external DSLs, but is more specific than that.
Background info: I have gone through DSL in Action and other tutorials. The difference between internal and external is clear to me. I also have experience developing external DSLs in Haskell.
Let's take a very simple example. Below is a (simplified) relational algebra expression:
SELECT conditions (
  CROSS (A, B)
)
Algebra expressions (an ADT in Haskell) can be rewritten easily. For example, this one can be trivially rewritten into:
JOIN conditions (A,B)
In the DSLs I have developed, I have always taken this approach: write a parser which creates algebraic expressions like the one above. Then, with a language that allows pattern matching like Haskell, apply a number of rewrites and eventually translate into a target language.
Here comes the question.
For a new DSL I would like to develop, I'd rather opt for an internal DSL, mainly because I want to take advantage of the host language's capabilities (probably Scala in this case). The debate of whether this is the right choice is not the point here. Let's assume it's a good choice.
What I'm missing is this: if I go for an internal DSL, then there is no parsing into an ADT. Where does my beloved pattern-matching rewrite fit into this? Do I have to give up on it? Are there best practices to get the best of both worlds? Or am I not seeing things correctly here?

I'll demonstrate this using an internal expression language for arithmetic in Haskell. We'll implement double-negation elimination.
Internal DSL "embeddings" are either deep or shallow. In a shallow embedding you rely on operations borrowed from the host language to make the domain language run. In our example, this almost annihilates the very DSL-ness of the problem. I'll show it anyway.
newtype Shallow = Shallow { runShallow :: Int }
underShallow1 :: (Int -> Int) -> (Shallow -> Shallow)
underShallow1 f (Shallow a) = Shallow (f a)
underShallow2 :: (Int -> Int -> Int) -> (Shallow -> Shallow -> Shallow)
underShallow2 f (Shallow a) (Shallow b) = Shallow (f a b)
-- DSL definition
instance Num Shallow where
  fromInteger n = Shallow (fromInteger n)  -- embed constants
  (+) = underShallow2 (+)                  -- lift the host implementation into the DSL
  (*) = underShallow2 (*)
  (-) = underShallow2 (-)
  negate = underShallow1 negate
  abs    = underShallow1 abs
  signum = underShallow1 signum
So now we can write and execute our Shallow DSL using the overloaded Num methods and runShallow :: Shallow -> Int:
>>> runShallow (2 + 2 :: Shallow)
4
Notably, everything in this Shallow embedding carries almost no structure besides the result: all of the work has been dropped down to the host language, where our domain language can't "see" it.
A deep embedding clearly separates the representation and the interpretation of a DSL. Typically, a good representation is an ADT whose branches and arities match a minimal basis API. We'll just reflect the whole Num class:
data Deep
  = FromInteger Integer
  | Plus Deep Deep
  | Mult Deep Deep
  | Subt Deep Deep
  | Negate Deep
  | Abs Deep
  | Signum Deep
  deriving ( Eq, Show )
Notably, this representation admits equality (note that this is the smallest equality possible, since it ignores "values" and "equivalences") and showing, which is nice. We tie it into the same internal API by instantiating Num:
instance Num Deep where
  fromInteger = FromInteger
  (+) = Plus
  (*) = Mult
  (-) = Subt
  negate = Negate
  abs    = Abs
  signum = Signum
but now we have to write an interpreter which ties the deep embedding to values in the host language. Here an advantage of deep embeddings appears: we can trivially introduce multiple interpreters. For instance, "showing" can be considered an interpreter from Deep to String:
interpretString :: Deep -> String
interpretString = show
We can count the number of embedded constants as an interpreter
countConsts :: Deep -> Int
countConsts x = case x of
  FromInteger _ -> 1
  Plus x y      -> countConsts x + countConsts y
  Mult x y      -> countConsts x + countConsts y
  Subt x y      -> countConsts x + countConsts y
  Negate x      -> countConsts x
  Abs x         -> countConsts x
  Signum x      -> countConsts x
And finally we can interpret the thing into not just an Int but any other thing which follows the Num API
interp :: Num a => Deep -> a
interp x = case x of
  FromInteger n -> fromInteger n
  Plus x y      -> interp x + interp y
  Mult x y      -> interp x * interp y
  Subt x y      -> interp x - interp y
  Negate x      -> negate (interp x)
  Abs x         -> abs (interp x)
  Signum x      -> signum (interp x)
So, finally, we can create a deep embedding and execute it in several ways
>>> let x = 3 + 4 * 5 in (interpretString x, countConsts x, interp x)
("Plus (FromInteger 3) (Mult (FromInteger 4) (FromInteger 5))", 3, 23)
And, here's the finale, we can use our Deep ADT to implement optimizations
opt :: Deep -> Deep
opt x = case x of
  Negate (Negate x) -> opt x            -- double-negation elimination
  FromInteger n     -> FromInteger n
  Plus x y          -> Plus (opt x) (opt y)
  Mult x y          -> Mult (opt x) (opt y)
  Subt x y          -> Subt (opt x) (opt y)
  Negate x          -> Negate (opt x)
  Abs x             -> Abs (opt x)
  Signum x          -> Signum (opt x)

Related

How do I use scala collection methods to make a method for correlation coefficient?

So I'm using the sample correlation coefficient formula from this website: https://en.wikipedia.org/wiki/Pearson_correlation_coefficient
Formula: https://i.stack.imgur.com/Jzkm8.png
How do I even sum up each x and y value from the list individually? This is all I have so far:
def correlation[T](elements: List[T], property1: T => Double, property2: T => Double): Double = {
  val xValues = elements.map(property1)
  val yValues = elements.map(property2)
  val Sx = standardDeviation(xValues, property1)
  val Sy = standardDeviation(yValues, property2)
  val xSize = xValues.size.toDouble
  val ySize = yValues.size.toDouble
  val xMean = xValues.sum / xSize
  val yMean = yValues.sum / ySize
  (1/xSize-1) * (xValues.map(x => x - xMean) * yValues.map(y => y - yMean)).sum
}
So, for example, say the data set is List((2,7), (8,12), (11,17)).
x̅ would be 7 ((2+8+11)/3= 7). y̅ would be 12 ((7+12+17)/3=12).
I'm trying to take each x value and subtract x̅ from it. This gives us (2-7) = -5; (8-7) = 1; (11-7) = 4. Same for the y values: (7-12) = -5; (12-12) = 0; (17-12) = 5.
Then multiplying each pair of x and y values gives us (-5 * -5) = 25; (1 * 0) = 0; (4 * 5) = 20.
Adding these up gives us (25 + 0 + 20) = 45.
But I can't seem to get the multiplication of each x and y pair before summing. Would I need recursion for this?
Edit: I have a separate method for calculating the standard deviation
You can do it with the zip function:
xValues.zip(yValues).map((x, y) => (x - xMean) * (y - yMean)).sum
(sorry if the syntax is wrong, have not been programming in scala for years)
The last line ought to be
xValues.zip(yValues).map { case (x, y) => (x - xMean) * (y - yMean) }.sum
This is very close to the other answer, with the added detail that, when mapping, we need to use case to take apart the tuple created by zip. This must be done because map takes only one parameter, while (x, y) => ... is a function with two parameters.
Another valid approach would have been:
xValues.zip(yValues).map(pair => (pair._1 - xMean) * (pair._2 - yMean)).sum
where _1 and _2 access the items of the tuple.
Note that this applies to Scala 2.x; Scala 3 will support parameter untupling.
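For what it's worth, the zip/map/sum pipeline can be checked against the worked example from the question in plain Python (the sum of products should come out to 45):

```python
pairs = [(2, 7), (8, 12), (11, 17)]
xs = [x for x, _ in pairs]
ys = [y for _, y in pairs]
x_mean = sum(xs) / len(xs)   # 7.0
y_mean = sum(ys) / len(ys)   # 12.0
# zip the two series, multiply the centered values pairwise, then sum
total = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
print(total)  # 45.0
```

This mirrors the accepted Scala one-liner: zip pairs the series, the mapped function multiplies the centered values, and sum folds the result.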

How do I write an algorithm to cut nodes while preserving a NetworkX network?

Assume I have a simple network as follows, and I want to remove the lower case nodes while preserving the overall structure. How do I do that? Here's some sample code:
import networkx as nx
G = nx.DiGraph();
G.add_edge("A","b")
G.add_edge("b","C")
G.add_edge("b","D")
G.add_edge("D","e")
G.add_edge("e","F")
def printHackyDot(x):
    for n in x.nodes():
        for pre in x.predecessors(n):
            print(pre + " -> " + n)
printHackyDot(G)
badNodes = [n for n in G.nodes if str.islower(n)]
Running this will yield:
A -> b
b -> C
b -> D
D -> e
e -> F
i.e., how do I write f(G) such that I get a similar, simplified graph without the lowercase nodes:
A -> C
A -> D
D -> F
I tried the following, but it fails when you have two lower case in a row:
for badNode in [x for x in list(G.nodes) if str.islower(x)]:
    R.remove_node(badNode)
    for predNode in G.predecessors(badNode):
        for succNode in G.successors(badNode):
            R.add_edge(predNode, succNode)
This did it, provided the graph is a DAG. Someone clever could lift that requirement.
def isBadNode(x):
    return str.islower(x)

def goodPreds(X, node):
    return [n for n in X.predecessors(node) if not isBadNode(n)] + [gn for n in X.predecessors(node) for gn in goodPreds(X, n) if isBadNode(n)]

def goodSuccs(X, node):
    return [n for n in X.successors(node) if not isBadNode(n)] + [gn for n in X.successors(node) for gn in goodSuccs(X, n) if isBadNode(n)]

R = G.copy()
for badNode in [x for x in list(R.nodes) if isBadNode(x)]:
    for predNode in goodPreds(G, badNode):
        for succNode in goodSuccs(G, badNode):
            R.add_edge(predNode, succNode)
    R.remove_node(badNode)
I'm assuming R starts off as a copy of G?
This would work if you replace
for badNode in [x for x in list(G.nodes) if str.islower(x)]:
    R.remove_node(badNode)
    for predNode in G.predecessors(badNode):
        for succNode in G.successors(badNode):
            R.add_edge(predNode, succNode)
with (edited for bug deleting badNode too soon)
for badNode in [x for x in list(R.nodes) if str.islower(x)]:
    for predNode in R.predecessors(badNode):
        for succNode in R.successors(badNode):
            R.add_edge(predNode, succNode)
    R.remove_node(badNode)
The issue is that when you handle the first of two lowercase nodes in a row, you remove that node. But when you handle the second one, your code sees that in G that node has a lowercase neighbor, and it puts the edge to it back. If you look at its neighbors in R instead, you'll handle it correctly.
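To see that the rewire-then-remove order handles the two-in-a-row case, the same algorithm can be sketched without NetworkX, on plain dictionaries of predecessor and successor sets (the function name is invented):

```python
from collections import defaultdict

def contract_bad_nodes(edges, is_bad):
    """Remove bad nodes, connecting each predecessor to each successor.

    Works on the evolving graph (the 'R' of the answer above), so
    chains of bad nodes are handled correctly.
    """
    succ, pred = defaultdict(set), defaultdict(set)
    nodes = set()
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)
        nodes.update((u, v))
    for bad in [n for n in nodes if is_bad(n)]:
        for p in list(pred[bad]):        # rewire first ...
            for s in list(succ[bad]):
                succ[p].add(s)
                pred[s].add(p)
        for p in list(pred[bad]):        # ... then remove the node
            succ[p].discard(bad)
        for s in list(succ[bad]):
            pred[s].discard(bad)
        nodes.discard(bad)
    return sorted((u, v) for u in nodes for v in succ[u] if v in nodes)
```

Running it on the sample graph gives A -> C, A -> D, D -> F, and a chain like A -> b -> c -> D correctly collapses to A -> D regardless of the order in which b and c are processed.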

Macro to generate element-wise methods

I see myself writing this pattern again and again, where a function can take an entity E or an iterator over Es and apply elementwise, e.g. an index or a vector, a column or a matrix. I was wondering if there is a macro so that I need not define multiple methods. I am looking for a Julian way of doing this.
In the simplified version of my use case below, compare getrow, which is naturally overloaded to work on ints or slices, with innerprod, which needs to be redefined for Vectors and Matrices.
type Mat{T<:Number}
    data::Matrix{T}
end
getrow{T}(m::Mat{T}, i=size(m)[1]) = m.data[i, :]
innerprod{T}(m::Mat{T}, v::Array{T}) = m.data*v
innerprod{T}(m::Mat{T}, vv::Matrix{T}) = mapslices(v->innerprod(m, v), vv, 1)
m = Mat(rand(1:9, 2, 3))
v = rand(1:9, 3, 4)
f = innerprod(m, v[:,1])
g = innerprod(m, v)
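The lifting pattern itself fits in a few lines of any language with higher-order functions. As a sketch, here is a hypothetical Python decorator (all names invented; a matrix is represented as a list of column lists) that extends a per-column function to whole matrices, so only one definition is needed:

```python
def lift_columnwise(f):
    """Extend f(m, column) to also accept a list of columns."""
    def lifted(m, v):
        if v and isinstance(v[0], list):     # got a matrix (list of columns)
            return [f(m, col) for col in v]  # apply column by column
        return f(m, v)                       # got a single column
    return lifted

@lift_columnwise
def innerprod(row, col):
    # toy inner product between a row vector and a column vector
    return sum(a * b for a, b in zip(row, col))
```

In Julia the idiomatic equivalents would be a higher-order wrapper like this, mapslices as in the question, or (in later versions) the built-in broadcasting dot syntax.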

Symbolic integration vs numeric integration in MATLAB

I have an expression with three variables x,y and v. I want to first integrate over v, and so I use int function in MATLAB.
The command that I use is the following:
g = int((1 - fxyv)*pv, v, y, +Inf)
PS: I haven't given you what the function fxyv is, but it is very complicated, so int is taking very long and I am afraid that after all the waiting it might not solve it.
I know one option for me is to integrate numerically using, for example, integral. However, I want to note that the second part of this problem requires me to integrate exp[g(x,y)] over x and y, from 0 to infinity and from x to infinity respectively. So I can't take numerical values of x and y when I want to integrate over v, I think. Or maybe not?
Thanks
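For reference, here is my reading of the two integrals being described (the limits are as stated above, but this is an assumption):

```latex
g(x, y) = \int_y^{\infty} \bigl(1 - f(x, y, v)\bigr)\, p(v)\, dv,
\qquad
T = \int_0^{\infty} \int_x^{\infty} e^{g(x, y)}\, dy\, dx
```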
Since the question does not contain sufficient detail to attempt analytic integration, this answer focuses on numeric integration.
It is possible to solve these equations numerically. However, because of complex dependencies between the three integrals, it is not possible to simply use integral3. Instead, one has to define functions that compute parts of the expressions using a simple integral, and are themselves fed into other calls of integral. Whether this approach leads to useful results in terms of computation time and precision cannot be answered generally, but depends on the concrete choice of the functions f and p. Fiddling around with precision parameters to the different calls of integral may be necessary.
I assume that the functions f(x, y, v) and p(v) are defined in the form of Matlab functions:
function val = f(x, y, v)
    val = ...
end
function val = p(v)
    val = ...
end
Because of the way they are used later, they have to accept multiple values for v in parallel (as an array) and return as many function values (again as an array, of the same size). x and y can be assumed to always be scalars. A simple example implementation would be val = ones(size(v)) in both cases.
First, let's define a Matlab function g that implements the first equation:
function val = g(x, y)
    val = integral(@gIntegrand, y, inf);
    function val = gIntegrand(v)
        % output must be of the same dimensions as parameter v
        val = (1 - f(x, y, v)) .* p(v);
    end
end
The nested function gIntegrand defines the object of integration, the outer performs the numeric integration that gives the value of g(x, y). Integration is over v, parameters x and y are shared between the outer and the nested function. gIntegrand is written in such a way that it deals with multiple values of v in the form of arrays, provided f and p do so already.
Next, we define the integrand of the outer integral in the second equation. To do so, we need to compute the inner integral, and therefore also have a function for the integrand of the inner integral:
function val = TIntegrandOuter(x)
    val = nan(size(x));
    for i = 1 : numel(x)
        val(i) = integral(@TIntegrandInner, x(i), inf);
    end
    function val = TIntegrandInner(y)
        val = nan(size(y));
        for j = 1 : numel(y)
            val(j) = exp(g(x(i), y(j)));
        end
    end
end
Because both functions are meant to be fed as an argument into integral, they need to be able to deal with multiple values. In this case, this is implemented via an explicit for loop. TIntegrandInner computes exp(g(x, y)) for multiple values of y, but for the fixed value of x that is current in the loop in TIntegrandOuter. This value x(i) plays both the role of a parameter to g(x, y) and of an integration limit. Variables x and i are shared between the outer and the nested function.
Almost there! We have the integrand, only the outermost integration needs to be performed:
T = integral(@TIntegrandOuter, 0, inf);
This is a very convoluted implementation, which is not very elegant, and probably not very efficient. Again, whether results of this approach prove to be useful needs to be tested in practice. However, I don't see any other way to implement these numeric integrations in Matlab in a better way in general. For specific choices of f(x, y, v) and p(v), there might be possible improvements.

Apply Functions to get to Multiple States in Matlab

I am trying to create a program that converts between multiple coordinate systems in Matlab.
I have different systems and want to transfer between them. There are different topocentric, geocentric & heliocentric systems. I have written the transformation matrices to transfer between these coordinate systems.
To simplify this problem I'll use an example.
If I have 3 coordinate systems:
Cartesian, Cylindrical & Spherical coordinates
To convert from Cylindrical coordinates to Cartesian coordinates. I can apply:
x = r ∙ cos(ø)
y = r ∙ sin(ø)
z = z
To convert from Spherical coordinates to Cartesian coordinates. I can apply:
x = R ∙ sin(θ) ∙ cos(ø)
y = R ∙ sin(θ) ∙ sin(ø)
z = R ∙ cos(θ)
Assuming we don't convert Spherical Coordinates directly to Cylindrical Coordinates, we convert:
Spherical -> Cartesian
Cartesian -> Cylindrical
In my real problem, I have 8 different coordinate systems with transformations back and forth between each of them. Each system only has paths linking it to at most two neighbouring coordinate systems.
It looks like this:
A <-> B <-> C <-> D <-> E <-> F <-> G <-> H
I want to create a method for the user to choose a coordinate system, input coordinates and select the destination coordinate systems.
Instead of manually writing functions for: A -> C , A -> D , A -> E ... for 54 different steps
Is there a way I can create a system to connect the paths?
Is there a way I can use a graph or nodes and apply functions which connect the nodes (A->C)
What is this concept so I can read up more on it?
You could implement something complicated with object oriented programming, but to keep things simple I propose to store all different types of coordinates as structs that all have a member type and whatever other members needed for that particular type of coordinate.
You could then define all your conversion functions that do a single step to all have the same function signature function out_coord = A2B(in_coord), e.g.:
function cart = sphere2cart(sphere)
    assert(strcmp(sphere.type, 'sphere')) % make sure input is correct type
    cart.type = 'cart';
    cart.x = sphere.R * sin(sphere.theta) * cos(sphere.omega);
    cart.y = sphere.R * sin(sphere.theta) * sin(sphere.omega);
    cart.z = sphere.R * cos(sphere.theta);
end
These functions can be called from one universal convert function like this:
function output_coord = convert(input_coord, target_type)
    output_coord = input_coord;
    while ~strcmp(output_coord.type, target_type)
        func = get_next_conversion_func(output_coord.type, target_type);
        output_coord = func(output_coord);
    end
end
which does one conversion step at a time, until output_coord has the correct type. The only missing step is then a function that determines which conversion to do next, based on the current type and the target type. In your case with a 'linear' conversion chain, this is not so hard. In more complex cases, where the types are connected in a complex graph, this might require some shortest path algorithm. Unfortunately, this is a bit cumbersome to implement in Matlab, but one hard-coded solution might be something like:
function func = get_next_conversion_func(current_type, target_type)
switch current_type
    case 'A'
        func = @A2B;
    case 'B'
        switch target_type
            case 'A'
                func = @B2A;
            case {'C','D','E'}
                func = @B2C;
        end
    case 'C'
        switch target_type
            case {'A','B'}
                func = @C2B;
            case {'D','E'}
                func = @C2D;
        end
    ...
end
For sure, there are smarter ways to implement this, this is basically a dispatch table that says in which direction to go based on the current type and the target type.
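One smarter way, for a strictly linear chain: compute the direction by comparing positions in the chain instead of enumerating cases. A hypothetical Python sketch (function name invented) of the step/path computation:

```python
CHAIN = list("ABCDEFGH")  # the linear chain of coordinate systems

def conversion_path(current, target):
    """List the systems visited when converting current -> target,
    one adjacent step at a time."""
    step = 1 if CHAIN.index(target) > CHAIN.index(current) else -1
    path = [current]
    while path[-1] != target:
        path.append(CHAIN[CHAIN.index(path[-1]) + step])
    return path
```

Each adjacent pair (X, Y) in the resulting path names the single-step function X2Y to apply, so no per-target dispatch table is needed.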
EDIT
Following Jonas' suggestion of doing all conversions via one central type (let's say C), all of this can be simplified to
function output_coord = convert(input_coord, target_type)
    output_coord = input_coord;
    if strcmp(output_coord.type, target_type)
        return % nothing to convert
    end
    if ~strcmp(output_coord.type, 'C')
        switch output_coord.type
            case 'A'
                output_coord = A2C(output_coord);
            case 'B'
                output_coord = B2C(output_coord);
            case 'D'
                output_coord = D2C(output_coord);
            case 'E'
                output_coord = E2C(output_coord);
        end
    end
    assert(strcmp(output_coord.type, 'C'))
    if ~strcmp(output_coord.type, target_type)
        switch target_type
            case 'A'
                output_coord = C2A(output_coord);
            case 'B'
                output_coord = C2B(output_coord);
            case 'D'
                output_coord = C2D(output_coord);
            case 'E'
                output_coord = C2E(output_coord);
        end
    end
end
In your current logic, A <-> B <-> C <-> D <-> E <-> F <-> G <-> H, you will have to write 14 functions for converting (if A->B and B->A count as two).
I suggest that instead, you choose one reference coordinate system, say, A and write the functions A<->B, A<->C, etc.
This solution requires you to write the same number of functions as your solution, but the logic becomes trivial. Furthermore, each conversion takes a maximum of two conversion steps, which will avoid accumulating round-off errors when you perform a chain of conversions.
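The hub idea is easy to prototype outside Matlab too. Here is a minimal Python sketch (toy types and toy conversion functions, all invented for illustration) where every conversion goes through a single hub type 'A', so any conversion takes at most two steps:

```python
# Single-step conversions into and out of the hub type 'A'.
# The lambdas are toy stand-ins for real coordinate transformations.
to_hub   = {'B': lambda v: v * 2,  'C': lambda v: v + 10}   # X -> A
from_hub = {'B': lambda v: v / 2,  'C': lambda v: v - 10}   # A -> X

def convert(value, src, dst):
    """Convert value from system src to system dst via the hub 'A'."""
    if src != 'A':
        value = to_hub[src](value)    # step 1: into the hub
    if dst != 'A':
        value = from_hub[dst](value)  # step 2: out of the hub
    return value
```

Each pair of inverse functions is written once, and the chain length, and therefore the accumulated round-off, is bounded by two conversions.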