Kafka Streams DSL folding hierarchical data - apache-kafka

Using Kafka Streams DSL, this is what I want to do:
Input message Serdes: Avro for both key and value
Key: Record with fields L1, L2, L3
Value: Record with value V (in this case an int)
What I want to do is collapse this hierarchy in such a way that the produced stream has the correctly summed value. For example,
Input:
L1 L2 L3 V
a1 b1 c1 v1
a1 b1 c2 v2
a1 b2 c1 v3
a1 b2 c2 v4
a2 b1 c1 v5
Output 1: (Data wanted at L1, L2)
L1 L2 V
a1 b1 v1 + v2
a1 b2 v3 + v4
a2 b1 v5
Output 2 (Data wanted at L1)
L1 V
a1 v1 + v2 + v3 + v4
a2 v5
Is there a way the Streams DSL would allow me to do this? Note that the key type changes across all outputs, and I couldn't find a way to perform this rekey + aggregation (since the rekey is essentially supposed to merge multiple values). While there might be ways to achieve this using the Processor API or a basic Kafka Consumer, I want to check how to do this in the DSL (if possible).

You should be able to use selectKey():
KStream input = builder.stream(...);
// Output 1: rekey to (L1, L2), then aggregate per new key
input.selectKey(/* create a new key record with only the 2 attributes L1 and L2 */)
     .groupByKey()
     .aggregate(...);
// Output 2: rekey to L1 only, then aggregate per new key
input.selectKey(/* create a new key record with only the 1 attribute L1 */)
     .groupByKey()
     .aggregate(...);
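As a rough way to check the expected numbers, the same rekey-then-sum idea can be mimicked with plain Java collections. This is not Kafka Streams code: the class HierarchyRollup, the Rec type, the "|" key separator, and the sample values 1 to 5 standing in for v1 to v5 are all made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

final class HierarchyRollup {
    /** One input record: the three key fields plus the int value (hypothetical type). */
    static final class Rec {
        final String l1, l2, l3;
        final int v;

        Rec(String l1, String l2, String l3, int v) {
            this.l1 = l1;
            this.l2 = l2;
            this.l3 = l3;
            this.v = v;
        }
    }

    /** Rekey every record with newKey, then sum the values that now share a key. */
    static Map<String, Integer> sumBy(List<Rec> input, Function<Rec, String> newKey) {
        return input.stream().collect(
                Collectors.groupingBy(newKey, TreeMap::new,
                        Collectors.summingInt((Rec r) -> r.v)));
    }

    /** The five rows from the question, with v1..v5 assumed to be 1..5. */
    static List<Rec> sample() {
        return Arrays.asList(
                new Rec("a1", "b1", "c1", 1),
                new Rec("a1", "b1", "c2", 2),
                new Rec("a1", "b2", "c1", 3),
                new Rec("a1", "b2", "c2", 4),
                new Rec("a2", "b1", "c1", 5));
    }

    public static void main(String[] args) {
        List<Rec> input = sample();
        // Output 1: key reduced to (L1, L2)
        System.out.println(sumBy(input, r -> r.l1 + "|" + r.l2)); // {a1|b1=3, a1|b2=7, a2|b1=5}
        // Output 2: key reduced to L1
        System.out.println(sumBy(input, r -> r.l1)); // {a1=10, a2=5}
    }
}
```

Each call to sumBy plays the role of one selectKey/groupByKey/aggregate chain: the key function drops the lower levels, and records that collapse onto the same key are summed.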


Creating an RDD of edges from an RDD of graphs

Consider a collection of graphs. In my current case, it is an RDD[Graph[VD, Double]], but it could with certain effort be reworked into Seq[Graph[VD, Double]], if it would make an answer easier, but I prefer the former.
My question is, how to efficiently create an RDD[Edge[Double]] containing the edges of each of the graphs in the collection?
As an example, let the graph collection contain three graphs G1, G2 and G3. Let G1 have edge set { e1, e2, e3 }, G2 have edge set { e4, e5 } and G3 have edge set { e6, e7, e8, e9 }. For an input RDD of graphs containing G1, G2 and G3, the output should be an RDD[Edge[Double]] containing { e1, e2, e3, e4, e5, e6, e7, e8, e9 }.
First, I've tried with flatMap (graphs.flatMap(graph => graph.edges)), but I get a type mismatch error, stating that a TraversableOnce[?] type is required, but EdgeRDD[Double] found.
Further, I've tried first creating a collection of EdgeRDD[Double] with graphs.map(graph => graph.edges) with the intent of further modifying it, but it expectedly failed with 'Spark does not support nested RDDs'
Look at .toLocalIterator. This method turns an EdgeRDD into an Iterator, which flatMap accepts.
Remember that this operation might be expensive: toLocalIterator fetches the partitions of the EdgeRDD sequentially. If your initial RDD of type RDD[Graph[VD, Double]] is not cached, you should consider caching it.
Your final call could look like this
graphs.flatMap(_.edges.toLocalIterator)

Notations in Coq

I want to use the notations to represent the predicate test as follows:
Variable A B : Type.
Inductive test : A -> B -> A -> B -> Prop :=
| test1 : forall a1 a2 b1 b2,
a1 \ b1 || a2 \ b2
where "c1 '\' st '||' c2 '\' st'" := (test c1 st c2 st')
.
However, Coq reports an error.
Why can't Coq accept this notation?
The notation is accepted, it's actually that Coq is incorrectly parsing your use of the notation within the definition of test1. To correctly parse this notation you need to adjust the parsing levels of its terms. You can do that with a reserved notation, since these where clauses for notation within an inductive don't support the syntax for configuring the notation:
Variable A B : Type.
Reserved Notation "c1 '\' st '||' c2 '\' st'" (at level 40, st at next level, c2 at next level, no associativity).
Inductive test : A -> B -> A -> B -> Prop :=
| test1 : forall a1 a2 b1 b2,
a1 \ b1 || a2 \ b2
where "c1 '\' st '||' c2 '\' st'" := (test c1 st c2 st')
.
I don't have a good intuition for what parsing levels work well (40 is somewhat arbitrary above), so the best advice I can give is to experiment and if it's parsed incorrectly somewhere then try adjusting the level.

What are the possible ways to define parallel composition in Coq apart from using list?

I know how to define it using lists, but I am looking for an alternative, because I think it is not possible to prove associativity and commutativity if we define it using lists.
Just guessing that you are asking about parallel composition of state machines. There are many ways to model them in Coq's logic, I don't know if lists are the best way...
If you define the steps as propositional relations you can compose two machines "in parallel" by creating a new inductive relation that can take a step either in one of the machines, or in the other.
This sketch is not the most general way to write it, but hopefully it gets the idea across.
Suppose the machines can take steps between the states A, B, or C. Machine M1 behaves in one way, and M2 in another. I.e. M1 can step from A to B or C, from B to C, and from C to A. M2 can only step from A to C, and from C to A.
Inductive state : Set := A | B | C.
Inductive M1 : state -> state -> Prop :=
s1 : M1 A B | s2 : M1 A C | s3 : M1 B C | s4 : M1 C A.
Inductive M2 : state -> state -> Prop :=
ss1 : M2 A C | ss2 : M2 C A.
Then we can create a new step relation for the combined system:
Inductive M1M2 : (state*state) -> (state*state) -> Prop :=
| st1 s1 s1' s2: M1 s1 s1' -> M1M2 (s1, s2) (s1', s2) (* M1 takes a step *)
| st2 s1 s2 s2': M2 s2 s2' -> M1M2 (s1, s2) (s1, s2'). (* M2 takes a step *)
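To see what the combined relation allows, here is a small executable sketch of the same interleaving idea in Java. It only enumerates successor pairs and proves nothing; the class name Interleaving, the step method, and the map-based encoding of M1 and M2 are assumptions made for this illustration, not part of the Coq development.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class Interleaving {
    enum State { A, B, C }

    // Transition relations of the two machines, as described in the text.
    static final Map<State, Set<State>> M1 = new EnumMap<>(State.class);
    static final Map<State, Set<State>> M2 = new EnumMap<>(State.class);

    static {
        M1.put(State.A, EnumSet.of(State.B, State.C)); // A -> B, A -> C
        M1.put(State.B, EnumSet.of(State.C));          // B -> C
        M1.put(State.C, EnumSet.of(State.A));          // C -> A
        M2.put(State.A, EnumSet.of(State.C));          // A -> C
        M2.put(State.C, EnumSet.of(State.A));          // C -> A
    }

    /** Successors of (s1, s2) in the combined system: exactly one machine moves. */
    static Set<List<State>> step(State s1, State s2) {
        Set<List<State>> next = new LinkedHashSet<>();
        // M1 takes a step, M2 stays put (constructor st1)
        for (State t1 : M1.getOrDefault(s1, EnumSet.noneOf(State.class))) {
            next.add(List.of(t1, s2));
        }
        // M2 takes a step, M1 stays put (constructor st2)
        for (State t2 : M2.getOrDefault(s2, EnumSet.noneOf(State.class))) {
            next.add(List.of(s1, t2));
        }
        return next;
    }

    public static void main(String[] args) {
        System.out.println(step(State.A, State.A)); // [[B, A], [C, A], [A, C]]
        System.out.println(step(State.B, State.B)); // [[C, B]]
    }
}
```

From (B, B) only M1 can move, because M2 has no transition out of B; that matches the inductive definition, where st2 needs a proof of M2 s2 s2'.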

"for" translation into lists high order functions

As far as I understand, for expressions are translated into Scala expressions which are built upon the following higher-order list methods:
map
flatMap
withFilter
foreach
A common example is the one where:
for(b1 <- books; b2 <- books if b1 != b2;
a1 <- b1.authors; a2 <- b2.authors if a1 == a2) yield a1;
Results in:
books flatMap (b1 =>
books withFilter( b2 => b1 != b2) flatMap( b2 =>
b1.authors flatMap ( a1 =>
b2.authors withFilter ( a2 => a2 == a1 ) map ( a2 => a1 )
)
)
)
Where:
books is a list of class Book objects (List[Book])
Book has a public attribute authors of type List[String]
My question is about this line:
b2.authors withFilter ( a2 => a2 == a1 ) map ( a2 => a1 )
Since the condition is a2 == a1 that line is equivalent to:
b2.authors withFilter ( a2 => a2 == a1 ) map ( x => x )
Why isn't the generated code just the following?
b2.authors filter ( a2 => a2 == a1 )
Can it be explained by the fact that the example is the reproduction of code automatically generated by Scala's compiler?
Is filter out of the for "building bricks"?
The translation of for/yield syntax into method calls is very simple and mechanical, almost at the level of string manipulation. withFilter is necessary in some places for its laziness, therefore it's used everywhere for simplicity. I don't understand the phrasing of your final question, but for/yield expressions are AIUI never translated into calls to filter except in a deprecated way for objects that don't yet have a withFilter method.

Encoding abstract keyword semantic in Alloy by constraints

I would like to encode the semantics of the abstract keyword as a constraint in Alloy (be patient, I need to do this for a reason! :) ). If I have the following code:
abstract sig A {}
sig a1 extends A{}
sig a2 extends A{}
I think its meaning would be as follows (I hope I am right!):
sig A {}
sig a1 in A{}
sig a2 in A{}
fact {
A=a1+a2 //A is nothing other than a1 and a2
a1 & a2 = none // a1 and a2 are disjoint
}
So the two specifications above should be semantically equal.
I am eager to use the abstract keyword that Alloy provides to make life easy, but a problem arises when I make A a subset of sig O and use the abstract keyword:
sig O{}
abstract sig A in O{}
sig a1 extends A{}
sig a2 extends A{}
The above syntax returns an error! Alloy complains: "Subset signature cannot be abstract." So my first question is: why does Alloy not allow this?
I didn't stop there; I encoded the semantics of the abstract keyword (as explained above) and came up with the following code:
sig O{}
sig A in O{}
sig a1 in A{}
sig a2 in A{}
fact {
A=a1+a2 // A can not be independently instantiated
a1 & a2 = none // a1 and a2 are disjoint
}
And this works, and everything is fine :)
Now if I want to add a3 to my Alloy specification, I need to tweak my specification as the following:
sig O{}
sig A in O{}
sig a1 in A{}
sig a2 in A{}
sig a3 in A{}
fact {
A=a1+a2+a3
a1 & a2 = none
a1 & a3 = none
a2 & a3 = none
}
But as you can see by comparing the two specifications above, if I continue like this and add a4 in the same way, I need to change the fact even more, and this becomes a hassle! The number of ai & aj = none expressions (for 1 <= i < j <= n) does not grow linearly: adding a4 forces me to add more than one constraint:
fact {
A=a1+a2+a3+a4
a1 & a2 = none
a1 & a3 = none
a1 & a4 = none
a2 & a3 = none
a2 & a4 = none
a3 & a4 = none
}
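Counting the lines makes the growth explicit. With n pairwise-disjoint subset signatures there is one ai & aj = none line per unordered pair, so (writing the count as a function of n, a notation introduced only for this note):

```latex
\[
\#\text{constraints}(n) \;=\; \binom{n}{2} \;=\; \frac{n(n-1)}{2},
\qquad \binom{3}{2} = 3, \quad \binom{4}{2} = 6 .
\]
```

This matches the three disjointness lines needed for a1..a3 and the six needed for a1..a4.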
So my second question:
Is there any workaround (or probably simpler way) to do this?
Any comment is appreciated.
Thx :)
On Q1 (why does Alloy not allow extension of subset signatures?): I don't know.
On Q2 (is there a workaround): the simplest workaround is to make a1 ... an be subsignatures (extensions) of A, and find another way to establish the relation of A and O. In the simple examples you have given, O has no subtypes so simply changing A in O to A extends O would work.
If O is already partitioned by other signatures you haven't shown us, then that workaround doesn't work; it's impossible to say what would work without more detail. (Ideally, you want a minimum complete working example to illustrate the difficulty: the examples you give are minimal, and illustrate one difficulty, but they don't illustrate why A cannot be an extension of O.)
[Addendum]
In a comment, you say
The reason [that I used A in O instead of A extends O] is that there is another signature C in O that is not shown here. A and C are not necessarily disjoint, so that is the reason I think I have to use in instead of extend in defining them to be subset of O.
The devil is in the details, but the conclusion doesn't follow from the premises stated. If A and C both extend O, they will be disjoint, but if one uses extend and the other uses in they are not automatically disjoint. So if you want to have A and C each be a subset of O, and A to be partitioned by several other signatures, it is possible to do so (unless there are other constraints not yet mentioned).
sig O {}
abstract sig A extends O {}
sig a1, a2 extends A {}
sig a3, a4 extends A {}
sig C in O {}