Grouping and Combining Observables in RxJava - rx-java2

I would like to do the following with RxJava
class Invoice(val dayOfMonth:Int,val amount:Int)
Below is a sample monthInvoices: List<Invoice> to process:
Invoice(3,100)
Invoice(3,150)
Invoice(3,50)
Invoice(4,350)
Invoice(8,400)
Invoice(8,100)
First, I would like to group the invoices by day of the month and sum the amounts, like this:
Invoice(3,300)
Invoice(4,350)
Invoice(8,500)
Then I would like to create a list covering every day of the month. Say this month has 30 days: the output list must then include an empty Invoice object with amount 0 for each day that has no invoice.
Desired output List
Invoice(1,0) //Since day 1 is not in the group summed list
Invoice(2,0) //day 2 is also not there
Invoice(3,300)
Invoice(4,350)
Invoice(5,0)
Invoice(6,0)
Invoice(7,0)
Invoice(8,500)
…..
Invoice(30,0)
I hope I have explained the need clearly. Can anyone suggest a solution that does this entirely with RxJava?

Try this
fun task(invoices: List<Invoice>) =
    Observable.fromIterable(invoices)
        .groupBy { it.dayOfMonth }
        .flatMapSingle { group ->
            group.reduce(0) { t1, t2 -> t1 + t2.amount }
                .map { group.key to it }
        }
        .toMap({ it.first }, { it.second })
        .flatMapObservable { map ->
            Observable.range(1, 30)
                .map { Invoice(it, map[it] ?: 0) }
        }

This can be achieved much more easily using the collection operators inside Kotlin's standard library, but in pure RxJava you can do this by using groupBy and reduce.
val invoices = listOf(
Invoice(3, 100),
Invoice(3, 150),
Invoice(3, 50),
Invoice(4, 350),
Invoice(8, 400),
Invoice(8, 100)
)
Observable.range(1, 30)
.map { Invoice(it, 0) } // Create an Observable of Invoice([day], 0)
.mergeWith(Observable.fromIterable(invoices))
.groupBy { it.dayOfMonth } // Merge the sources and groupBy day
.flatMapMaybe { group ->
group.reduce { t1: Invoice, t2: Invoice ->
Invoice(t1.dayOfMonth, t1.amount + t2.amount) // Reduce each group into a single Invoice
}
}
.subscribe {
// Optionally you can call toList before this if you want to aggregate the emissions into a single list
println(it)
}
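For comparison, the same group, sum, and fill steps can be sketched without Rx using plain collection operations. This is only an illustration in JavaScript, not the Kotlin standard-library version mentioned above; the function name fillMonth and the plain-object shape of an invoice are assumptions made for the example.

```javascript
// Hypothetical helper: sums invoice amounts per day and fills missing days with 0.
// Each invoice is assumed to be a plain object { dayOfMonth, amount }.
function fillMonth(invoices, daysInMonth) {
  // Step 1: group by day and sum the amounts.
  const totals = new Map();
  for (const { dayOfMonth, amount } of invoices) {
    totals.set(dayOfMonth, (totals.get(dayOfMonth) || 0) + amount);
  }
  // Step 2: emit one entry per day, defaulting to 0 where no invoice exists.
  return Array.from({ length: daysInMonth }, (_, i) => ({
    dayOfMonth: i + 1,
    amount: totals.get(i + 1) || 0,
  }));
}

const monthInvoices = [
  { dayOfMonth: 3, amount: 100 },
  { dayOfMonth: 3, amount: 150 },
  { dayOfMonth: 3, amount: 50 },
  { dayOfMonth: 4, amount: 350 },
  { dayOfMonth: 8, amount: 400 },
  { dayOfMonth: 8, amount: 100 },
];
const filled = fillMonth(monthInvoices, 30);
```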

Related

Flutter Firebase Query 2 Reference [duplicate]

From the docs:
You can also chain multiple where() methods to create more specific queries (logical AND).
How can I perform an OR query?
Example:
Give me all documents where the field status is open OR upcoming
Give me all documents where the field status == open OR createdAt <= <somedatetime>
OR isn't supported as it's hard for the server to scale it (requires keeping state to dedup). The work around is to issue 2 queries, one for each condition, and dedup on the client.
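The client-side dedup step of that workaround might look roughly like the sketch below. It assumes each query result has already been turned into an array of plain objects carrying a unique id property; the helper name mergeById is hypothetical and not part of any Firebase SDK.

```javascript
// Hypothetical helper: merges the results of several queries and keeps
// the first document seen for each id, so overlapping matches appear once.
function mergeById(...resultLists) {
  const seen = new Map();
  for (const list of resultLists) {
    for (const doc of list) {
      if (!seen.has(doc.id)) {
        seen.set(doc.id, doc);
      }
    }
  }
  // Map preserves insertion order, so results stay in first-seen order.
  return Array.from(seen.values());
}

// Example: one result list per condition, with document "b" matching both.
const openDocs = [{ id: "a", status: "open" }, { id: "b", status: "open" }];
const oldDocs = [{ id: "b", status: "open" }, { id: "c", status: "closed" }];
const merged = mergeById(openDocs, oldDocs);
```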
Edit (Nov 2019):
Cloud Firestore now supports IN queries which are a limited type of OR query.
For the example above you could do:
// Get all documents in 'foo' where status is open or upcoming
db.collection('foo').where('status','in',['open','upcoming']).get()
However it's still not possible to do a general OR condition involving multiple fields.
With the recent addition of IN queries, Firestore supports "up to 10 equality clauses on the same field with a logical OR"
A possible solution to (1) would be:
documents.where('status', 'in', ['open', 'upcoming']);
See Firebase Guides: Query Operators | in and array-contains-any
I suggest also storing a numeric value for each status, for example:
{ name: "a", statusValue: 10, status: 'open' }
{ name: "b", statusValue: 20, status: 'upcoming' }
{ name: "c", statusValue: 30, status: 'close' }
You can then query with ref.where('statusValue', '<=', 20), and both 'a' and 'b' will be found.
This can save query cost and improve performance.
Note that this does not cover every case.
I would have no "status" field, but status related fields, updating them to true or false based on request, like
{ name: "a", status_open: true, status_upcoming: false, status_closed: false}
However, check Firebase Cloud Functions. You could have a function listening status changes, updating status related properties like
{ name: "a", status: "open", status_open: true, status_upcoming: false, status_closed: false}
one or the other, your query could be just
...where('status_open','==',true)...
Hope it helps.
This doesn't solve all cases, but for "enum" fields, you can emulate an "OR" query by making a separate boolean field for each enum-value, then adding a where("enum_<value>", "==", false) for every value that isn't part of the "OR" clause you want.
For example, consider your first desired query:
Give me all documents where the field status is open OR upcoming
You can accomplish this by splitting the status: string field into multiple boolean fields, one for each enum-value:
status_open: bool
status_upcoming: bool
status_suspended: bool
status_closed: bool
To perform your "where status is open or upcoming" query, you then do this:
where("status_suspended", "==", false).where("status_closed", "==", false)
How does this work? Well, because it's an enum, you know one of the values must have true assigned. So if you can determine that all of the other values don't match for a given entry, then by deduction it must match one of the values you originally were looking for.
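The deduction can be made mechanical: given the full list of enum values and the values you want to match, the clauses to add are exactly the ones for the remaining values. The sketch below only builds the clause descriptions as arrays; the helper name exclusionClauses and the status_ prefix are assumptions for illustration, and applying the clauses to a real query is left to the caller.

```javascript
// Hypothetical helper: returns one [field, operator, value] triple for every
// enum value that is NOT part of the desired OR clause.
function exclusionClauses(allValues, wanted, prefix = "status_") {
  return allValues
    .filter((value) => !wanted.includes(value))
    .map((value) => [prefix + value, "==", false]);
}

const statusValues = ["open", "upcoming", "suspended", "closed"];
const clauses = exclusionClauses(statusValues, ["open", "upcoming"]);
// clauses is [["status_suspended", "==", false], ["status_closed", "==", false]]
```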
See also
in/not-in/array-contains-in: https://firebase.google.com/docs/firestore/query-data/queries#in_and_array-contains-any
!=: https://firebase.googleblog.com/2020/09/cloud-firestore-not-equal-queries.html
I don't like everyone saying it's not possible.
It is, if you create another "hacky" field in the model to build a composite.
For instance, create an array on each document that holds all the logical OR elements,
then query with .where("field", arrayContains: [...])
you can bind two Observables using the rxjs merge operator.
Here you have an example.
import { Observable } from 'rxjs/Observable';
import 'rxjs/add/observable/merge';
...
getCombinatedStatus(): Observable<any> {
return Observable.merge(this.db.collection('foo', ref => ref.where('status','==','open')).valueChanges(),
this.db.collection('foo', ref => ref.where('status','==','upcoming')).valueChanges());
}
Then you can subscribe to the new Observable's updates using the above method:
this.getCombinatedStatus().subscribe(results => console.log(results));
I hope this can help you, greetings from Chile!!
We have the same problem just now, luckily the only possible values for ours are A,B,C,D (4) so we have to query for things like A||B, A||C, A||B||C, D, etc
As of like a few months ago firebase supports a new query array-contains so what we do is make an array and we pre-process the OR values to the array
if (a) {
    [array addObject:@"a"];
}
if (b) {
    [array addObject:@"b"];
}
if (a||b) {
    [array addObject:@"a||b"];
}
etc
And we do this for every combination of the 4 values, however many combos there are.
THEN we can simply check the query [document arrayContains:@"a||c"] or whatever type of condition we need.
So if something only qualified for conditional A of our 4 conditionals (A,B,C,D) then its array would contain the following literal strings: @["A", "A||B", "A||C", "A||D", "A||B||C", "A||B||D", "A||C||D", "A||B||C||D"]
Then for any of those OR combinations we can just search array-contains on whatever we may want (e.g. "A||C")
Note: This is only a reasonable approach if you have a small number of possible values to combine with OR.
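Building that array by hand gets tedious, so here is a minimal sketch of generating it. It enumerates every non-empty subset of the possible values and keeps the ones that contain the document's own value; the function name orCombosContaining is hypothetical, and the "||" separator simply mirrors the strings used above.

```javascript
// Hypothetical helper: lists every OR-combination string (in the order of
// `values`) that includes `member`, for storing in the document's array field.
function orCombosContaining(values, member) {
  const combos = [];
  // Each bitmask from 1 to 2^n - 1 selects one non-empty subset of `values`.
  for (let mask = 1; mask < (1 << values.length); mask++) {
    const subset = values.filter((_, i) => (mask & (1 << i)) !== 0);
    if (subset.includes(member)) {
      combos.push(subset.join("||"));
    }
  }
  return combos;
}

const combosForA = orCombosContaining(["A", "B", "C", "D"], "A");
```

The number of strings doubles with each extra possible value, which is why this only suits small enums.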
More info on Array-contains here, since it's newish to firebase docs
If you have a limited number of fields, definitely create new fields with true and false like in the example above. However, if you don't know what the fields are until runtime, you have to just combine queries.
Here is a tags OR example...
// the ids of students in class
const students = [studentID1, studentID2,...];
// get all docs where student.studentID1 = true
const results = this.afs.collection('classes',
ref => ref.where(`students.${students[0]}`, '==', true)
).valueChanges({ idField: 'id' }).pipe(
switchMap((r: any) => {
// get all docs where student.studentID2...studentIDX = true
const docs = students.slice(1).map(
(student: any) => this.afs.collection('classes',
ref => ref.where(`students.${student}`, '==', true)
).valueChanges({ idField: 'id' })
);
return combineLatest(docs).pipe(
// combine results by reducing array
map((a: any[]) => {
const g: [] = a.reduce(
(acc: any[], cur: any) => acc.concat(cur)
).concat(r);
// filter out duplicates by 'id' field
return g.filter(
(b: any, n: number, a: any[]) => a.findIndex(
(v: any) => v.id === b.id) === n
);
}),
);
})
);
Unfortunately there is no other way to combine more than 10 items (use array-contains-any if < 10 items).
There is also no other way to avoid duplicate reads, as you don't know the ID fields that will be matched by the search. Luckily, Firebase has good caching.
For those of you that like promises...
const p = await results.pipe(take(1)).toPromise();
For more info on this, see this article I wrote.
OR isn't supported.
But if you need it, you can apply the OR condition in your own code.
For example, to query products where (size equals XL OR XXL) AND gender is Male:
productsCollectionRef
//1* first get query where can firestore handle it
.whereEqualTo("gender", "Male")
.addSnapshotListener((queryDocumentSnapshots, e) -> {
if (queryDocumentSnapshots == null)
return;
List<Product> productList = new ArrayList<>();
for (DocumentSnapshot snapshot : queryDocumentSnapshots.getDocuments()) {
Product product = snapshot.toObject(Product.class);
//2* then check your query OR Condition because firestore just support AND Condition
if (product.getSize().equals("XL") || product.getSize().equals("XXL"))
productList.add(product);
}
liveData.setValue(productList);
});
For Flutter dart language use this:
db.collection("projects").where("status", whereIn: ["public", "unlisted", "secret"]);
Actually, I found that @Dan McGrath's answer works. Here is a rewrite of his answer:
private void query() {
FirebaseFirestore db = FirebaseFirestore.getInstance();
db.collection("STATUS")
.whereIn("status", Arrays.asList("open", "upcoming")) // you can add up to 10 different values like : Arrays.asList("open", "upcoming", "Pending", "In Progress", ...)
.addSnapshotListener(new EventListener<QuerySnapshot>() {
@Override
public void onEvent(@Nullable QuerySnapshot queryDocumentSnapshots, @Nullable FirebaseFirestoreException e) {
for (DocumentSnapshot documentSnapshot : queryDocumentSnapshots) {
// I assume you have a model class called MyStatus
MyStatus status= documentSnapshot.toObject(MyStatus.class);
if (status!= null) {
//do somthing...!
}
}
}
});
}

Iterate over row and create batch: DataFrame

I have a DataFrame with millions of rows and I am iterating over them using the following code:
df.foreachPartition { dataSetPartition => {
dataSetPartition.foreach(row => {
// DO SOMETHING like DB write/ s3 publish
})
}
}
Now I want to process the rows in batches, so I changed the code to:
df.foreachPartition { dataSetPartition => {
    val rowBuffer = scala.collection.mutable.ListBuffer[Row]()
    dataSetPartition.foreach(row => {
      rowBuffer += row
      if (rowBuffer.size == 1000) {
        // DO ACTION like DB write/s3 publish <- DO_ACTION
        rowBuffer.clear
      }
    })
    if (rowBuffer.size > 0) {
      // DO ACTION like DB write/s3 publish <-DO_ACTION
      rowBuffer.clear
    }
  }
}
The problem with this approach is that DO_ACTION appears twice. I do not want to call dataSetPartition.size to get the row count beforehand, as the partition is lazily evaluated and that might be a costly operation.
Version:
Scala: 2.11
Spark: 2.2.1
I would suggest using Scala's grouped method to create the batches:
df.foreachPartition { dataSetPartition => {
    dataSetPartition.grouped(1000).foreach(batch => {
      // DO ACTION like DB write/s3 publish <- DO_ACTION
    })
  }
}
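To show why this removes the duplication, here is a rough JavaScript analogue of batching a lazy iterator. It is not Spark or Scala code; the generator name grouped is borrowed for illustration only. The generator yields full batches followed by one final, possibly shorter batch, so the caller writes the action once and never needs the total row count.

```javascript
// Illustrative generator: lazily splits any iterable into arrays of at most `size` items.
function* grouped(iterable, size) {
  let batch = [];
  for (const item of iterable) {
    batch.push(item);
    if (batch.length === size) {
      yield batch;
      batch = [];
    }
  }
  // Emit the trailing partial batch, if any.
  if (batch.length > 0) {
    yield batch;
  }
}

// The action is written once, regardless of how the rows divide into batches.
const written = [];
for (const batch of grouped([1, 2, 3, 4, 5], 2)) {
  written.push(batch); // stand-in for a DB write or S3 publish
}
```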

Linear funnel from a collection of events with MongoDB aggregation, is it possible?

I have a number of event documents, each event has a number of fields, but the ones that are relevant for my query are:
person_id - a reference to the person that triggered the event
event - a string key to identify the event
occurred_at - the utc of the time the event occurred
What I want to achieve is:
for a list of event keys, e.g. ['event_1', 'event_2', 'event_3']
get counts of the number of people who performed each event and all the events before it, in order, i.e.:
the number of people who performed event_1
the number of people who performed event_1, and then event_2
the number of people who performed event_1, and then event_2, and then event_3
etc
a secondary goal is to be able to get the average occurred_at date for each event so that I can calculate the average time between each event
The best I have got is the following two map reduces:
db.events.mapReduce(function () {
emit(this.person_id, {
e: [{
e: this.event,
o: this.occurred_at
}]
})
}, function (key, values) {
return {
e: [].concat.apply([], values.map(function (x) {
return x.e
}))
}
}, {
query: {
account_id: ObjectId('52011239b1b9229f92000003'),
event: {
$in: ['event_a', 'event_b', 'event_c','event_d','event_e','event_f']
}
},
out: 'people_funnel_chains',
sort: { person_id: 1, occurred_at: 1 }
})
And then:
db.people_funnel_chains.mapReduce(function() {
funnel = ['event_a', 'event_b', 'event_c','event_d','event_e','event_f']
events = this.value.e;
for (var e in funnel) {
e = funnel[e];
if ((i = events.map(function (x) {
return x.e
}).indexOf(e)) > -1) {
emit(e, { c: 1, o: events[i].o })
events = events.slice(i + 1, events.length);
} else {
break;
}
}
}, function(key,values) {
return {
c: Array.sum(values.map(function(x) { return x.c })),
o: new Date(Array.sum(values.map(function(x) { return x.o.getTime() }))/values.length)
};
}, { out: {inline: 1} })
I would like to achieve this in real time using the aggregation framework, but I can see no way to do it. For tens of thousands of records this takes tens of seconds. I can run it incrementally, which means it's fast enough for new data coming in, but if I want to modify the original query (e.g. change the event chain) it can't be done in a single request, which I would love to be able to do.
Update using Cursor.forEach()
Using Cursor.forEach() I've managed to get huge improvement on this (essentially removing the requirement for the first map reduce).
var time = new Date().getTime(), funnel_event_keys = ['event_a', 'event_b', 'event_c','event_d','event_e','event_f'], looking_for_i = 0, looking_for = funnel_event_keys[0], funnel = {}, last_person_id = null;
for (var i in funnel_event_keys) { funnel[funnel_event_keys[i]] = [0,null] };
db.events.find({
account_id: ObjectId('52011239b1b9229f92000003'),
event: {
$in: funnel_event_keys
}
}, { person_id: 1, event: 1, occurred_at: 1 }).sort({ person_id: 1, occurred_at: 1 }).forEach(function(e) {
var current_person_id = e['person_id'].str;
if (last_person_id != current_person_id) {
looking_for_i = 0;
looking_for = funnel_event_keys[0]
}
if (e['event'] == looking_for) {
var funnel_event = funnel[looking_for]
funnel_event[0] = funnel_event[0] + 1;
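// running mean of occurred_at: mean += (t - mean) / count
var t = e['occurred_at'].getTime();
funnel_event[1] = funnel_event[1] === null ? t : funnel_event[1] + (t - funnel_event[1]) / funnel_event[0];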
looking_for_i = looking_for_i + 1;
looking_for = funnel_event_keys[looking_for_i]
}
last_person_id = current_person_id;
})
funnel;
new Date().getTime() - time;
I wonder if something custom with the data in memory would be able to improve on this? Getting hundreds of thousands of records out of MongoDB into memory (on a different machine) is going to be a bottleneck. Is there a technology I'm not aware of that could do this?
I wrote up a complete answer on my MongoDB blog, but in summary the pipeline has four steps. First, project the actions you care about, mapping the values of the action field into appropriate key names. Second, group by person, aggregating when they did each of the three actions (and optionally how many times). Third, project new fields that check whether action2 was done after action1, and action3 after action2. The last phase sums up the number of people who did just 1, or 1 and then 2, or 1 and then 2 and then 3.
Using a function to generate the aggregation pipeline, it's possible to generate results based on array of actions passed in.
In my test case, the entire pipeline ran in under 200ms for a collection of 40,000 documents (this was on my small laptop).
As it was correctly pointed out, the general solution I describe assumes that while an actor can take any action multiple times that they can only advance from action1 to action2 but that they cannot skip directly from action1 to action3 (interpreting action order as describing prerequisites where you cannot do action3 until you've done action2).
As it turns out, aggregation framework can be used even for sequences of events where the order is completely arbitrary but you still want to know how many people at some point did the sequence action1, action2, action3.
The main adjustment to make to the original answer is to add an extra two-stage step in the middle. This step unwinds the document collected per person and re-groups it to find the first occurrence of the second action that comes after the first occurrence of the first action.
Once we have that, the final comparison takes the earliest occurrence of action2 that follows action1 and compares it to the latest occurrence of action3.
It can probably be generalized to handle arbitrary number of events but every additional event past two would add two more stages to the aggregation.
Here is my write-up of the modification of the pipeline to achieve the answer you are looking for.
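As a rough illustration of the sequence logic described in this answer, the sketch below computes the funnel counts in plain JavaScript rather than as an aggregation pipeline, so it is not the pipeline from the write-up. It assumes events are plain objects with person_id, event, and a numeric occurred_at (milliseconds); the function name funnelCounts is hypothetical. For each person it looks for the first occurrence of each step that comes after the previous step was matched.

```javascript
// Hypothetical helper: counts[k] is the number of people who performed
// steps[0], then steps[1], ..., then steps[k], in that order.
function funnelCounts(events, steps) {
  // Group events by person.
  const byPerson = new Map();
  for (const e of events) {
    if (!byPerson.has(e.person_id)) {
      byPerson.set(e.person_id, []);
    }
    byPerson.get(e.person_id).push(e);
  }

  const counts = steps.map(() => 0);
  for (const personEvents of byPerson.values()) {
    // Walk each person's events in time order, advancing one step at a time.
    personEvents.sort((a, b) => a.occurred_at - b.occurred_at);
    let next = 0;
    for (const e of personEvents) {
      if (next < steps.length && e.event === steps[next]) {
        counts[next] += 1;
        next += 1;
      }
    }
  }
  return counts;
}

const sampleEvents = [
  { person_id: "p1", event: "event_1", occurred_at: 1 },
  { person_id: "p1", event: "event_2", occurred_at: 2 },
  { person_id: "p1", event: "event_3", occurred_at: 3 },
  { person_id: "p2", event: "event_2", occurred_at: 1 }, // before event_1, so ignored
  { person_id: "p2", event: "event_1", occurred_at: 2 },
  { person_id: "p2", event: "event_2", occurred_at: 3 },
  { person_id: "p3", event: "event_3", occurred_at: 1 },
  { person_id: "p3", event: "event_1", occurred_at: 5 },
];
const funnelResult = funnelCounts(sampleEvents, ["event_1", "event_2", "event_3"]);
// funnelResult is [3, 2, 1]
```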

mongodb query with group()?

this is my collection structure :
coll{
id:...,
fieldA:{
fieldA1:[
{
...
}
],
fieldA2:[
{
text: "ciao",
},
{
text: "hello",
},
]
}
}
I want to extract all the fieldA2 text values in my collection, but if a value appears two or more times I want to show it only once.
I tried this:
db.runCommand({distinct: 'coll', key: 'fieldA.fieldA2.text'})
but it didn't work: it returns all fieldA1 in the collection.
So I tried:
db.coll.group( {
  key: { 'fieldA.fieldA2.text': 1 },
  cond: { },
  reduce: function ( curr, result ) { },
  initial: { }
} )
but this returns an empty array...
How can I do this and also see the execution time? Thank you very much.
Since you are running 2.0.4 (I recommend upgrading), you must run this through MR (I think, maybe there is a better way). Something like:
map = function(){
  for (var i in this.fieldA.fieldA2) {
    // emit per text value so that this will group unique text values
    emit(this.fieldA.fieldA2[i].text, 1);
  }
}
// reduce receives the key first, then the array of emitted values
reduce = function(key, values){
  // Now let's just do a simple count of how many times that text value was seen
  var count = 0;
  for (var index in values) {
    count += values[index];
  }
  return count;
}
This will then give you a collection of documents where _id is the unique text value from fieldA2 and the value field is the number of times it appeared in the collection.
Again this is a draft and is not tested.
I think the answer is simpler than a Map/Reduce .. if you just want distinct values plus execution time, the following should work:
var startTime = new Date()
var values = db.coll.distinct('fieldA.fieldA2.text');
var endTime = new Date();
print("Took " + (endTime - startTime) + " ms");
That would result in a values array with a list of distinct fieldA.fieldA2.text values:
[ "ciao", "hello", "yo", "sayonara" ]
And a reported execution time:
Took 2 ms
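To make clear what distinct does with a field nested inside an array, here is a small in-memory sketch of the same idea in plain JavaScript. It is not how the server implements distinct; the function name distinctTexts is hypothetical, and the documents are assumed to have the fieldA.fieldA2 shape from the question.

```javascript
// Hypothetical helper: collects each text value from every element of
// fieldA.fieldA2 across all documents, keeping each value only once.
function distinctTexts(docs) {
  const seen = new Set();
  for (const doc of docs) {
    const items = (doc.fieldA && doc.fieldA.fieldA2) || [];
    for (const item of items) {
      seen.add(item.text);
    }
  }
  return Array.from(seen);
}

const sampleDocs = [
  { fieldA: { fieldA1: [], fieldA2: [{ text: "ciao" }, { text: "hello" }] } },
  { fieldA: { fieldA1: [], fieldA2: [{ text: "ciao" }] } },
];
const texts = distinctTexts(sampleDocs);
// texts is ["ciao", "hello"]
```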

Retrieve unique random items from a mongodb collection?

I run an IRC bot and I have a function which returns 1 random url using Math.random at the moment, from my Mongodb collection.
I would like to refactor it to return x number of unique items, and for each subsequent invocation of the url fetching command .getlinks I would like that it keeps everything unique, so that a user doesn't see the same link unless all the possible links have been already returned.
Is there some algorithm or native mongodb function I could use for this?
Here's a sample scenario:
I have a total of 9 records in the collection. They have a _id and url field.
user a: .getlinks()
bot returns: http://unique-link-1, http://unique-link-2, http://unique-link-3, http://unique-link-4
user a: .getlinks()
bot returns: http://unique-link-5, http://unique-link-6, http://unique-link-7, http://unique-link-8
user a: .getlinks()
bot returns: http://unique-link-9, http://unique-link-6, http://unique-link-1, http://unique-link-3
Background information:
There's a total of about 200 links. I estimate that will grow to around 5000 links by the end of next year.
Currently the only thing I can think of is keeping an array of all returned items, and grabbing all items from the collection at once and getting a random one 4 times and making sure it's unique and hasn't been shown already.
var shown = [], amountToReturn = 4;
function getLinks() {
  var items = links.find(), returned = [];
  for ( var i = 0; i<amountToReturn; i++ ) {
    var rand = randItem( items );
    if ( shown.indexOf( rand.url ) == -1 && shown.length < items.length ) {
      returned.push( rand.url );
      shown.push( rand.url ); // remember it so it is not returned again
    }
  }
  message.say( returned.join(',') );
}
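One way to do the bookkeeping described above without retrying random picks is to shuffle the links once and deal them out like a deck of cards, reshuffling when the deck runs out. This is a swapped-in technique (a Fisher-Yates shuffle), not something MongoDB provides; the names shuffle and makeLinkDealer are hypothetical, and the sketch assumes the URLs have already been loaded into an array with no duplicates.

```javascript
// Returns a shuffled copy of `items` using the Fisher-Yates algorithm.
function shuffle(items, random) {
  const a = items.slice();
  for (let i = a.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
}

// Hypothetical helper: returns a getLinks(count) function that never repeats
// a URL until every URL has been dealt, and never repeats within one batch.
function makeLinkDealer(urls, random = Math.random) {
  let deck = shuffle(urls, random);
  return function getLinks(count) {
    const batch = [];
    const wanted = Math.min(count, urls.length);
    while (batch.length < wanted) {
      if (deck.length === 0) {
        // Start a new cycle, leaving out what this batch already contains.
        deck = shuffle(urls.filter((u) => !batch.includes(u)), random);
      }
      batch.push(deck.pop());
    }
    return batch;
  };
}

const nineUrls = Array.from({ length: 9 }, (_, i) => "http://unique-link-" + (i + 1));
const getLinks = makeLinkDealer(nineUrls);
const firstBatch = getLinks(4);
const secondBatch = getLinks(4);
const thirdBatch = getLinks(4);
```

With 9 links and batches of 4, the third batch starts with the one link not yet shown and then begins a fresh cycle, matching the scenario in the question.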
You should find a number of possible options to get random item(s) from Collection here ...
http://jira.mongodb.org/browse/SERVER-533
Another interesting method is documented here ...
http://cookbook.mongodb.org/patterns/random-attribute/
The method mentioned above basically creates a new key/value on the document using Math.random()
> db.docs.drop()
> db.docs.save( { key : 1, ..., random : Math.random() } )
> db.docs.save( { key : 1, ..., random : Math.random() } )
> db.docs.save( { key : 2, ..., random : Math.random() } )
... many more insertions with 'key : 2' ...
> db.docs.save( { key : 2, ..., random : Math.random() } )
...
Get random records from MongoDB via map/reduce
// map
function() {
emit(0, {k: this, v: Math.random()})
}
// reduce
function(k, v) {
var a = []
v.forEach(function(x) {
a = a.concat(x.a ? x.a : x)
})
return {a:a.sort(function(a, b) {
return a.v - b.v;
}).slice(0, 3 /*how many records you want*/)};
}
// finalize
function(k, v) {
return v.a.map(function(x) {
return x.k
})
}