Task list with re-ordering feature using Firebase/Firestore

I want to make a list of tasks whose order can be changed, but I am not sure how to store this in a database.
I don't want to use an array because I will need to run queries on the tasks later.
Here is the screenshot of my database:
I'm trying to make something like Trello, where the user adds tasks and can move them up and down according to their priority. I need to change the position of the tasks in the database as well, to keep the stored order in sync. I can't work out how to do that in any database. I'm an experienced developer and I have worked with MongoDB and Firebase, but this problem is new to me.
Here is the code to create and get all tasks. To support moving a task within the collection, I maintain an index in each task.
Let's say I move a task from index 5 to index 2: I then have to shift the indexes of all the tasks in between by +1. Is there some way I can avoid doing this?
Code Sample
import FirebaseFirestore

class taskManager {
    static let shared = taskManager()
    typealias TasksCompletion = (_ tasks: [Task], _ error: String?) -> Void
    typealias SucessCompletion = (_ error: String?) -> Void

    func addTask(task: Task, completion: @escaping SucessCompletion) {
        Firestore.firestore().collection("tasks").addDocument(data: task.toDic) { err in
            if let err = err {
                print(err.localizedDescription)
            }
            // Pass the error (nil on success) back to the caller.
            completion(err?.localizedDescription)
        }
    }

    func getAllTask(completion: @escaping TasksCompletion) {
        Firestore.firestore().collection("tasks")
            .addSnapshotListener { taskSnap, error in
                taskSnap?.documentChanges.forEach { task in
                    let object = task.document.data()
                    let json = try! JSONSerialization.data(withJSONObject: object, options: .prettyPrinted)
                    var taskData = try! JSONDecoder().decode(Task.self, from: json)
                    taskData.id = task.document.documentID
                    if task.type == .added {
                        Task.shared.append(taskData)
                    }
                    // Only replace the local copy if it is actually in the list.
                    if task.type == .modified,
                       let index = Task.shared.firstIndex(where: { $0.id == taskData.id }) {
                        Task.shared[index] = taskData
                    }
                }
                if error == nil {
                    completion(Task.shared, nil)
                } else {
                    completion([], error?.localizedDescription)
                }
            }
    }
}

I think your question is really about database design.
When you want to keep a group of items in order while still being able to reorder them, you need a column that stores the order.
You run into an issue when you reorder the items if their order values are sequential.
Example
For example if you wanted to move Item1 behind Item4:
Before
An item with an ordering index.
1. Item1, order: 1
2. Item2, order: 2
3. Item3, order: 3
4. Item4, order: 4
5. Item5, order: 5
6. Item6, order: 6
After
Problem: we had to update every record between the item's old position and its new position.
Why this is a problem: this is O(n), because the number of records to update grows with the distance the item moves. As you get more tasks this becomes more of an issue, as it will take longer and not scale well. It would be nice to have O(1), where each move needs a constant number of changes, or at least as few as possible.
1. Item2, order: 1 - Updated
2. Item3, order: 2 - Updated
3. Item4, order: 3 - Updated
4. Item1, order: 4 - Updated
5. Item5, order: 5
6. Item6, order: 6
Possible Solution #1 (OK Maybe?) - Spacing
You could try to come up with a crafty method where you space the order numbers out, so that you have holes that can be filled without updating multiple records.
This could get tricky though, and you may think, "Why not store Item1 at order: 4.5?" I added a related question below that goes into that idea and why you should avoid it.
You may be able to verify the safety of the new order value on the client side and avoid hitting the database to determine it.
This also has limitations: you may have to rebalance the spacing, or you may run out of numbers between items. You would have to check for a conflict and, when one arises, rebalance either everything or, recursively, the items around the conflict, making sure that the rebalancing updates don't cause more conflicts and that any additional conflicts are resolved.
1. Item2, order: 200
2. Item3, order: 300
3. Item4, order: 400
4. Item1, order: 450 - Updated
5. Item5, order: 500
6. Item6, order: 600
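A minimal Swift sketch of the spacing idea, assuming integer order values and a step of 100 as in the list above. The names orderBetween and rebalanced are made up for illustration; a nil result is the signal that there is no hole left and a rebalance is needed.

```swift
/// Returns an order value strictly between two neighbours,
/// or nil when no integer gap is left between them.
func orderBetween(_ lower: Int, _ upper: Int) -> Int? {
    guard upper - lower > 1 else { return nil }
    return lower + (upper - lower) / 2
}

/// Fresh, evenly spaced order values for `count` items that are
/// already in display order (hypothetical step of 100).
func rebalanced(count: Int, step: Int = 100) -> [Int] {
    (0..<count).map { ($0 + 1) * step }
}

// Moving Item1 between Item4 (400) and Item5 (500) touches one record.
let newOrder = orderBetween(400, 500)
// Two neighbours with no gap left: time to rebalance.
let noRoom = orderBetween(450, 451)
```

Only when orderBetween returns nil does the whole list (or a neighbourhood of it) need rewriting, which is the rebalancing cost mentioned above.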
Possible Solution #2 (Better) - Linked Lists
As mentioned in the related link below, you could use a data structure like a linked list. This keeps the number of updates constant, so it is O(1). I will go into linked lists a bit in case you haven't played with the data structure yet.
As you can see below, this change only required 4 updates; I believe the max would be 5, as shown in Expected Updates. You may be thinking, "Well, it took about that many with the original problem/example!" The thing is that this will always be a max of 5 updates, compared to the possibility of thousands or millions with the original approach [O(n)].
1. Item2, previous: null, next: Item3 - Updated // previous is now null
2. Item3, previous: Item2, next: Item4
3. Item4, previous: Item3, next: Item1 - Updated // next is now Item1
4. Item1, previous: Item4, next: Item5 - Updated // previous & next updated
5. Item5, previous: Item1, next: Item6 - Updated // previous is now Item1
6. Item6, previous: Item5, next: null
Expected Updates
Item being moved (previous, next)
Old previous item's next
Old next item's previous
New previous item's next
New next item's previous
Linked Lists
I guess I used a doubly linked list. You could probably get away with a singly linked list, which has no previous attribute and only a next.
The idea behind a linked list is to think of it as a chain: when you want to move one item, you decouple it from the links in front of it and behind it, then join those two links together. Next you open the chain at the place where you want to insert the item. The item is now linked to a new neighbor on each side, and those neighbors are linked to the item instead of to each other.
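Here is a rough Swift sketch of that unlink/relink step, using an in-memory dictionary as a stand-in for the stored records. Node, move, makeChain and orderedIDs are hypothetical names, not part of any SDK; the set returned by move is the list of records you would have to write back.

```swift
// Hypothetical in-memory model: each record stores its neighbours' ids.
struct Node: Equatable {
    var previous: String?
    var next: String?
}

/// Moves `id` so it directly follows `newPrevious` (nil means "move to the head").
/// Returns the ids whose records changed.
func move(_ id: String, after newPrevious: String?,
          head: inout String?, nodes: inout [String: Node]) -> Set<String> {
    guard let node = nodes[id], newPrevious != id, node.previous != newPrevious,
          newPrevious.map({ nodes[$0] != nil }) ?? true else {
        return []   // unknown id, or the item is already in that position
    }
    var changed: Set<String> = [id]

    // 1. Unlink the item and join its old neighbours together.
    if let p = node.previous { nodes[p]?.next = node.next; changed.insert(p) } else { head = node.next }
    if let n = node.next { nodes[n]?.previous = node.previous; changed.insert(n) }

    // 2. Find the new next neighbour, then link the item in between.
    let newNext: String?
    if let p = newPrevious { newNext = nodes[p]?.next } else { newNext = head }
    nodes[id] = Node(previous: newPrevious, next: newNext)
    if let p = newPrevious { nodes[p]?.next = id; changed.insert(p) } else { head = id }
    if let n = newNext { nodes[n]?.previous = id; changed.insert(n) }
    return changed
}

/// Builds a chain from ids given in display order.
func makeChain(_ ids: [String]) -> [String: Node] {
    var nodes: [String: Node] = [:]
    for (i, id) in ids.enumerated() {
        nodes[id] = Node(previous: i > 0 ? ids[i - 1] : nil,
                         next: i < ids.count - 1 ? ids[i + 1] : nil)
    }
    return nodes
}

/// Walks the chain from the head to recover the display order.
func orderedIDs(head: String?, nodes: [String: Node]) -> [String] {
    var result: [String] = []
    var current = head
    while let id = current {
        result.append(id)
        current = nodes[id]?.next
    }
    return result
}

// The example from above: move Item1 behind Item4.
var chain = makeChain(["Item1", "Item2", "Item3", "Item4", "Item5", "Item6"])
var chainHead: String? = "Item1"
let touched = move("Item1", after: "Item4", head: &chainHead, nodes: &chain)
```

One trade-off to keep in mind: a query cannot sort by the linked order, so the client has to load the items and walk the chain (as orderedIDs does) to display them.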
Possible Solution #3 - Document/Json/Array Storage
You said you want to stay away from arrays, but you could utilize document storage. You could still have a searchable table of items, and then each collection of items would just have an array of item id/references.
Items Table
- Item1, id: 1
- Item2, id: 2
- Item3, id: 3
- Item4, id: 4
- Item5, id: 5
- Item6, id: 6
Item Collection
[2, 3, 4, 1, 5, 6]
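With this layout a move is a single write of the id array. A small Swift sketch, assuming the ids are plain integers as in the table above (the function name moving is made up):

```swift
/// Returns a copy of `order` with `id` moved to `newIndex`.
/// `newIndex` is the position in the resulting array and is clamped to its bounds.
func moving(_ id: Int, toIndex newIndex: Int, in order: [Int]) -> [Int] {
    guard let oldIndex = order.firstIndex(of: id) else { return order }
    var result = order
    result.remove(at: oldIndex)
    result.insert(id, at: min(max(newIndex, 0), result.count))
    return result
}

// Item1 moved behind Item4, as in the example.
let itemOrder = moving(1, toIndex: 3, in: [1, 2, 3, 4, 5, 6])
```

The items themselves stay untouched and searchable; only the one array that holds the order is rewritten.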
Related Question(s)
Storing a reorderable list in a database
Other Considerations
Your database design will depend on what you're trying to accomplish. Can items belong to multiple boards or users?
Can you offload some ordering to the client side and allow it to tell the server what the new order is? You should still avoid inefficient ordering algorithms on the client side, but you can get them to do some of the dirty work if you trust them and don't have any issues with data integrity if multiple people are working on the same items at the same time (those are other design problems, that may or may not be related to the DB, depending on how you handle them.)

I was stuck on the same problem for a long time. The best solution I found was to order them lexicographically.
Trying to manage a decimal rank (1, 2, 3, 4...) runs into a lot of problems that are all mentioned in other answers on this question. Instead, I store the rank as a string of characters ('aaa', 'bbb', 'ccc'...) and I use the character codes of the characters in the strings to find a spot between two ranks when adjustments are made.
For example, I have:
{
item: "Star Wars",
rank: "bbb"
},
{
item: "Lord of the Rings",
rank: "ccc"
},
{
item: "Harry Potter",
rank: "ddd"
},
{
item: "Star Trek",
rank: "eee"
},
{
item: "Game of Thrones",
rank: "fff"
}
Now I want to move "Game of Thrones" to the third slot, below "Lord of the Rings" ('ccc') and above "Harry Potter" ('ddd').
So I use the character codes of 'ccc' and 'ddd' to mathematically find the average between the two strings; in this case, that ends up being 'cpp' and I'll update the document to:
{
item: "Game of Thrones",
rank: "cpp"
}
Now I have:
{
item: "Star Wars",
rank: "bbb"
},
{
item: "Lord of the Rings",
rank: "ccc"
},
{
item: "Game of Thrones",
rank: "cpp"
},
{
item: "Harry Potter",
rank: "ddd"
},
{
item: "Star Trek",
rank: "eee"
}
If I run out of room between two ranks, I can simply add a letter to the end of the string; so, between 'bbb' and 'bbc', I can insert 'bbbn'.
This is a benefit over decimal ranking.
Things to be aware of
Do not assign 'aaa' or 'zzz' to any item. These need to be withheld to easily allow for moving items to the top or bottom of the list. If "Star Wars" has rank 'aaa' and I want to move something above it, there would be problems. Solvable problems, but this is easily avoided if you start at rank 'bbb'. Then, if you want to move something above the top rank, you can simply find the average between 'bbb' and 'aaa'.
If your list gets reshuffled frequently, it would be good practice to periodically refresh the rankings. If things are moved to the same spot in a list thousands of times, you may get a long string like 'bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbn'. You may want to refresh the list when a string gets to be a certain length.
Implementation
The algorithm and an explanation of the functions used to achieve this effect can be found here. Credit for this idea goes to the author of that article.
The code I use in my project
Again, credit for this code goes to the author of the article I linked above, but this is the code I have running in my project to find the average between two strings. This is written in Dart for a Flutter app.
import 'dart:math';

const ALPHABET_SIZE = 26;

String getRankBetween({required String firstRank, required String secondRank}) {
  assert(firstRank.compareTo(secondRank) < 0,
      "First position must be lower than second. Got firstRank $firstRank and second rank $secondRank");

  /// Make positions equal
  while (firstRank.length != secondRank.length) {
    if (firstRank.length > secondRank.length)
      secondRank += "a";
    else
      firstRank += "a";
  }

  var firstPositionCodes = [];
  firstPositionCodes.addAll(firstRank.codeUnits);
  var secondPositionCodes = [];
  secondPositionCodes.addAll(secondRank.codeUnits);

  var difference = 0;
  for (int index = firstPositionCodes.length - 1; index >= 0; index--) {
    /// Codes of the elements of positions
    var firstCode = firstPositionCodes[index];
    var secondCode = secondPositionCodes[index];

    /// i.e. ' a < b ': borrow from the next more significant position
    if (secondCode < firstCode) {
      /// ALPHABET_SIZE = 26 for now
      secondCode += ALPHABET_SIZE;
      secondPositionCodes[index - 1] -= 1;
    }

    /// formula: x = a * size^0 + b * size^1 + c * size^2
    final powRes = pow(ALPHABET_SIZE, firstRank.length - index - 1);
    difference += (secondCode - firstCode) * powRes;
  }

  var newElement = "";
  if (difference <= 1) {
    /// add middle char from alphabet
    newElement = firstRank +
        String.fromCharCode('a'.codeUnits.first + ALPHABET_SIZE ~/ 2);
  } else {
    difference ~/= 2;
    var offset = 0;
    for (int index = 0; index < firstRank.length; index++) {
      /// formula: x = difference / size^(place - 1) % size;
      /// i.e. difference = 110, size = 10, we want place 2 (middle),
      /// then x = 110 / 10^(2 - 1) % 10 = 110 / 10 % 10 = 11 % 10 = 1
      final diffInSymbols =
          difference ~/ pow(ALPHABET_SIZE, index) % (ALPHABET_SIZE);
      var newElementCode = firstRank.codeUnitAt(secondRank.length - index - 1) +
          diffInSymbols +
          offset;
      offset = 0;

      /// if newElement is greater than 'z', carry into the next position
      if (newElementCode > 'z'.codeUnits.first) {
        offset++;
        newElementCode -= ALPHABET_SIZE;
      }
      newElement += String.fromCharCode(newElementCode);
    }
    newElement = newElement.split('').reversed.join();
  }
  return newElement;
}
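Since the question is about a Swift app, here is a rough Swift sketch of the same idea. It is not a port of the Dart code above: it assumes ranks use only the letters a to z and are at most 12 characters long (so the base-26 value fits in an Int), and the name rankBetween is made up. It returns nil when the inputs break those assumptions or are not in ascending order.

```swift
/// Returns a rank that sorts strictly between `first` and `second`, or nil
/// if the inputs are not ascending, not lowercase a-z, or longer than 12 letters.
func rankBetween(_ first: String, _ second: String) -> String? {
    let width = max(first.count, second.count)
    guard width >= 1, width <= 12 else { return nil }   // 26^12 still fits in Int

    // Pad with 'a' (the lowest letter) and convert each letter to a digit 0...25.
    func digits(_ rank: String) -> [Int]? {
        let padded = rank + String(repeating: "a", count: width - rank.count)
        var result: [Int] = []
        for ch in padded {
            guard let ascii = ch.asciiValue, (97...122).contains(ascii) else { return nil }
            result.append(Int(ascii) - 97)
        }
        return result
    }
    // Read the digits as one base-26 number.
    func value(_ ds: [Int]) -> Int { ds.reduce(0) { $0 * 26 + $1 } }

    guard let a = digits(first), let b = digits(second) else { return nil }
    let low = value(a), high = value(b)
    guard low < high else { return nil }

    if high - low <= 1 {
        // No room at this length: extend the lower rank with a middle letter.
        return first + String(repeating: "a", count: width - first.count) + "n"
    }

    // Otherwise take the midpoint and convert it back to letters.
    var mid = low + (high - low) / 2
    var letters: [Character] = []
    for _ in 0..<width {
        letters.append(Character(UnicodeScalar(UInt8(97 + mid % 26))))
        mid /= 26
    }
    return String(letters.reversed())
}

let betweenCandD = rankBetween("ccc", "ddd")   // "cpp", as in the example above
let noRoomLeft = rankBetween("bbb", "bbc")     // "bbbn"
```

For ranks longer than the assumed 12 letters you would need digit-by-digit arithmetic like the Dart version, or a periodic refresh of the ranks as suggested above.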

There are several approaches you might follow to achieve such functionality.
Approach #1:
You can give your task distant positions instead of continuous position, something like this:
Date: 10 April 2019
Name: "some task name"
Index: 10
...
Index: 20
...
Index: 30
Here there are 3 tasks in total, with positions 10, 20 and 30. Now let's say you want to move the third task to the middle: simply change its position to 15, and you have three tasks with positions 10, 15 and 20. You can sort by position when getting all tasks from the database, and because the user is rearranging the tasks in a mobile or web app, you can easily read the positions of the surrounding tasks and calculate the position halfway between them.
Now let's say you want to move the first task (which has position 10) to the middle. Simply take the positions of the surrounding tasks, 15 and 20, and calculate the midpoint, which is 17.5 ((15+20)/2 = 17.5). Now you have positions 15, 17.5 and 20.
Someone said there is an infinity of numbers between 1 and 2, so I think you are not going to run out of numbers (in practice a stored number has limited precision, so the gaps can eventually be used up). If you think you will run out of room for division soon, you can increase the spacing and make it 100000...00 instead of 10.
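To get a feel for how far midpoint division goes in practice, here is a small Swift experiment. It assumes positions are stored as 64-bit floating-point numbers (Swift's Double) and simulates the worst case, where every new task is dropped directly after the same task. The function name is made up for illustration.

```swift
/// Counts how many times a new position can be inserted directly after `low`
/// by repeated midpoint division before the midpoint is no longer distinct
/// from its neighbours.
func midpointInsertsBeforeCollision(low: Double, high: Double) -> Int {
    var upper = high
    var inserts = 0
    while true {
        let mid = (low + upper) / 2
        // Precision exhausted: the midpoint collapsed onto a neighbour.
        if mid <= low || mid >= upper { return inserts }
        upper = mid
        inserts += 1
    }
}

let betweenOneAndTwo = midpointInsertsBeforeCollision(low: 1, high: 2)
let betweenTenAndTwenty = midpointInsertsBeforeCollision(low: 10, high: 20)
```

Under that assumption the gap is used up after roughly fifty inserts in the same spot, not infinitely many, and widening the spacing does not change that by much. So a rebalancing step is still worth planning for.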
Approach #2:
You can save all of your tasks in the same document, instead of in separate documents, in stringified JSON form, something like this:
Tasks: [ {name:"some name",date: "some date" },{name:"some name",date: "some date"},{name:"some name",date: "some date" } ]
By doing this you will get all the tasks on the screen at once and parse the JSON into a local array. When the user rearranges the tasks, you simply change the position of that array element locally and save the stringified version of the tasks to the database as well. There are some cons to this approach: pagination would be difficult, but hopefully you will not need pagination in a task management app, since you probably want to show all tasks on the screen at the same time, just like Trello does.

Related

Better way to find sums in a grid in Swift

I have an app with a 6x7 grid that lets the user input values. After each value is obtained the app checks to find if any of the consecutive values create a sum of ten and executes further code (which I have working well for the 4 test cases I've written). So far I've been writing if statements similar to the below:
func findTens() {
    if (rowOneColumnOnePlaceHolderValue + rowOneColumnTwoPlaceHolderValue) == 10 {
        //code to execute
    } else if (rowOneColumnOnePlaceHolderValue + rowOneColumnTwoPlaceHolderValue + rowOneColumnThreePlaceHolderValue) == 10 {
        //code to execute
    } else if (rowOneColumnOnePlaceHolderValue + rowOneColumnTwoPlaceHolderValue + rowOneColumnThreePlaceHolderValue + rowOneColumnFourPlaceHolderValue) == 10 {
        //code to execute
    } else if (rowOneColumnOnePlaceHolderValue + rowOneColumnTwoPlaceHolderValue + rowOneColumnThreePlaceHolderValue + rowOneColumnFourPlaceHolderValue + rowOneColumnFivePlaceHolderValue) == 10 {
        //code to execute
    }
}
That's not quite halfway through row one, and it will end up being a very large set of if statements (231 if I'm calculating correctly, since a single 7 column row would be 1,2-1,2,3-...-2,3-2,3,4-...-6,7 so 21 possibilities per row). I think there must be a more concise way of doing it but I've struggled to find something better.
I've thought about using an array of each of the rowXColumnYPlaceHolderValue variables similar to the below:
let rowOnePlaceHolderArray = [rowOneColumnOnePlaceHolderValue, rowOneColumnTwoPlaceHolderValue, rowOneColumnThreePlaceHolderValue, rowOneColumnFourPlaceHolderValue, rowOneColumnFivePlaceHolderValue, rowOneColumnSixPlaceHolderValue, rowOneColumnSevenPlaceHolderValue]
for row in rowOnePlaceHolderArray {
//compare each element of the array here, 126 comparisons
}
But I'm struggling to find a next step for that approach, in addition to the fact that those array elements then apparently become copies and are no longer references to the original variables...
I've been lucky enough to find some fairly clever solutions to some of the other issues I've come across for the app, but this one has given me trouble for about a week now so I wanted to ask for help to see what ideas I might be missing. It's possible that there will not be another approach that is significantly better than the 231 if statement approach, which will be ok. Thank you in advance!

Here's an idea (off the top of my head; I have not bothered to optimize). I'll assume that your goal is:
Given an array of Int, find the first consecutive elements that sum to a given Int total.
Your use of "10" as a target total is just a special case of that.
So I'll look for consecutive elements that sum to a given total, and if I find them, I'll return their range within the original array. If I don't find any, I'll return nil.
Here we go:
extension Array where Element == Int {
    func rangeOfSum(_ sum: Int) -> Range<Int>? {
        // A run needs at least two elements.
        guard count >= 2 else { return nil }
        newstart:
        for start in 0..<count-1 {
            let slice = dropFirst(start)
            for n in 2...slice.count {
                let total = slice.prefix(n).reduce(0,+)
                if total == sum {
                    return start..<(start+n)
                }
                // Assumes non-negative values: a longer run cannot get smaller.
                if total > sum {
                    continue newstart
                }
                if n == slice.count && total < sum {
                    return nil
                }
            }
        }
        return nil
    }
}
Examples:
[1, 8, 6, 2, 8, 4].rangeOfSum(10) // 3..<5, i.e. 2,8
[1, 8, 1, 2, 8, 4].rangeOfSum(10) // 0..<3, i.e. 1,8,1
[1, 8, 3, 2, 9, 4].rangeOfSum(10) // nil
Okay, so now that we've got that, extracting each possible row or column from the grid (or whatever the purpose of the game is) is left as an exercise for the reader. 🙂
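For what it's worth, here is one way that exercise could go. This is a self-contained sketch, assuming the grid is stored as an array of equal-length rows of Int rather than as separate variables; firstRun and runsOfTen are made-up names, and firstRun is a simplified stand-in for the extension above that checks every run instead of bailing out early.

```swift
/// First run of at least two consecutive values in `line` that sums to `target`.
func firstRun(in line: [Int], summingTo target: Int) -> Range<Int>? {
    guard line.count >= 2 else { return nil }
    for start in 0..<(line.count - 1) {
        var total = line[start]
        for end in (start + 1)..<line.count {
            total += line[end]
            if total == target { return start..<(end + 1) }
        }
    }
    return nil
}

/// Checks every row and every column of a rectangular grid.
/// Keys are row/column indexes; values are the matching ranges within that line.
func runsOfTen(in grid: [[Int]]) -> (rows: [Int: Range<Int>], columns: [Int: Range<Int>]) {
    var rows: [Int: Range<Int>] = [:]
    var columns: [Int: Range<Int>] = [:]
    for (r, row) in grid.enumerated() {
        rows[r] = firstRun(in: row, summingTo: 10)
    }
    for c in 0..<(grid.first?.count ?? 0) {
        // Build the column by taking element c of every row.
        let column = grid.map { $0[c] }
        columns[c] = firstRun(in: column, summingTo: 10)
    }
    return (rows, columns)
}

// Small 3x3 grid for illustration; a 6x7 grid works the same way.
let sample = [[1, 9, 2],
              [4, 3, 3],
              [5, 5, 7]]
let found = runsOfTen(in: sample)
```

Two loops over lines replace the 231 hand-written conditions, and the returned ranges tell you which cells to act on.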

When using queryOrderedByChild to get a top score list, how can I display tied scores in correct order?

My app has a top scores table, which uses .queryOrdered(byChild: "userHighScore") and .queryLimited(toLast: 100) to get the 100 highest scores:
func getTopScores() {
var scores = [String]()
self.reference
.child("users")
.queryOrdered(byChild: "userHighScore")
.queryLimited(toLast: 100)
.observeSingleEvent(of: .value, with: { (snapshot) in
for child in snapshot.children.reversed() {
if let dataSnapshot = child as? DataSnapshot, let user = dataSnapshot.value as? [String:AnyObject] {
if let userHighScore = user["userHighScore"] as? Int {
var stringToAppend = "\(scores.count+1). "
if let userName = user["userName"] as? String {
stringToAppend += " \(userName) • \(userHighScore)"
}
scores.append(stringToAppend)
}
}
}
})
}
The problem is when there's a tie score.
Ideally, the first user to get that score would be above the others with the same score on the table (and the most recent user to get that score would be the lowest of those with the same score).
However, the table displays those with tied scores in the same random order that they appear in the Firebase Realtime Database.
Is there a way to sort tied scores chronologically?
I've spun my wheels on this for many hours now. Thanks in advance for any help!

Ideally, the first user to get that score would be above the others with the same score on the table
You're essentially trying to order on two properties: userHighScore and userHighScoreTimestamp (I made up a name, since you didn't specify it in your question). The Firebase Realtime Database can only query on a single property. See Query based on multiple where clauses in Firebase.
What you can do is define an additional property that combines the values of the high score and the timestamp at which it was achieved:
users: {
  derenceUid: {
    userHighScore: 42,
    userHighScoreTimestamp: 1557695609344,
    userHighScore_Timestamp: "000042_1557695609344",
  },
  dougUid: {
    userHighScore: 5,
    userHighScoreTimestamp: 1557609264895,
    userHighScore_Timestamp: "000005_1557609264895",
  },
  pufUid: {
    userHighScore: 42,
    userHighScoreTimestamp: 1557695651730,
    userHighScore_Timestamp: "000042_1557695651730",
  }
}
With this structure you can get the most-recent highest-scoring user with:
ref.queryOrdered(byChild: "userHighScore_Timestamp").queryLimited(toLast: 1)
One thing to note is that I padded the user's high score to six digits. This is necessary since Firebase will do a lexicographical comparison of these values, and without the padding "5_1557609264895" would be larger than "42_1557695609344". You'll have to figure out how many digits your maximum score can hold. In my example I used 6 digits, so it can hold scores up to 999,999.
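A sketch of how the combined value could be built in Swift. The helper names are made up; scoreKey follows the format shown above, while scoreKeyEarliestFirst is my own variation (an assumption, not part of the structure above) that inverts the timestamp so that, when results are read from highest to lowest, the earliest of two tied scores comes out first.

```swift
/// Left-pads a non-negative integer with zeros to a fixed width.
func zeroPadded(_ value: Int64, width: Int) -> String {
    let digits = String(value)
    return String(repeating: "0", count: max(0, width - digits.count)) + digits
}

/// "000042_1557695609344": padded score, underscore, timestamp.
func scoreKey(score: Int, timestampMillis: Int64, scoreDigits: Int = 6) -> String {
    "\(zeroPadded(Int64(score), width: scoreDigits))_\(timestampMillis)"
}

/// Variation (assumption): store Int64.max minus the timestamp, padded to 19 digits,
/// so an earlier timestamp produces a larger key within the same score.
func scoreKeyEarliestFirst(score: Int, timestampMillis: Int64, scoreDigits: Int = 6) -> String {
    let inverted = Int64.max - timestampMillis
    return "\(zeroPadded(Int64(score), width: scoreDigits))_\(zeroPadded(inverted, width: 19))"
}

let derenceKey = scoreKey(score: 42, timestampMillis: 1557695609344)
let dougKey = scoreKey(score: 5, timestampMillis: 1557609264895)
```

Whichever variant you choose, the value has to be rewritten every time the user's high score changes, alongside the score and timestamp themselves.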

Ideally, what you need is orderByChild for two values, score and date. As far as I know, Firebase Realtime Database does not support that. I could suggest two alternatives:
1. Store scores not as strings but as objects that include two values: "Score" and "Timestamp". Then when you get the list of score objects, sort them on the client side by score, and then sort the ties by date.
2. Add a small value to each score based on the date. The value needs to be small enough that it won't affect the score. For example: calculate the number of minutes since the epoch (right now it's about 25,961,000) and divide by 1 billion, which should give you about 0.025961. Adding that to every score won't change the score sorting, and will help to keep ties chronological. Just don't forget to drop the fractional part when presenting the scores.
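The first alternative (sorting on the client) could look roughly like this in Swift. ScoreEntry and ranked are hypothetical names, and the sample values are only for illustration:

```swift
// Hypothetical client-side model of one leaderboard row.
struct ScoreEntry: Equatable {
    let userName: String
    let score: Int
    let timestampMillis: Int64
}

/// Highest score first; among equal scores, the earliest timestamp first.
func ranked(_ entries: [ScoreEntry]) -> [ScoreEntry] {
    entries.sorted { a, b in
        if a.score != b.score { return a.score > b.score }
        return a.timestampMillis < b.timestampMillis
    }
}

let board = ranked([
    ScoreEntry(userName: "puf", score: 42, timestampMillis: 1557695651730),
    ScoreEntry(userName: "doug", score: 5, timestampMillis: 1557609264895),
    ScoreEntry(userName: "derence", score: 42, timestampMillis: 1557695609344),
])
```

Note that this only orders the entries you have already downloaded; which tied users make it into a limited query is still decided by the database.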

"Appending" to an ArraySlice?

Say ...
you have about 20 Thing
very often, you do a complex calculation running through a loop of say 1000 items. The end result is a varying number around 20 each time
you don't know how many there will be until you run through the whole loop
you then want to quickly (and of course elegantly!) access the result set in many places
for performance reasons you don't want to just make a new array each time. note that unfortunately there's a differing amount so you can't just reuse the same array trivially.
What about ...
var thingsBacking = [Thing](repeating: Thing(), count: 100) // hard limit!
var things: ArraySlice<Thing> = []

func fatCalculation() {
    var pin: Int = 0
    // happily, no need to clean-out thingsBacking
    for c in .. some huge loop {
        ... only some of the items (roughly 20 say) become the result
        x = .. one of the result items
        thingsBacking[pin] = Thing(... x, y, z )
        pin += 1
    }
    // and then, magic of slices ...
    things = thingsBacking[0..<pin]
}
(Then, you can do this anywhere... for t in things { .. } )
What I am wondering, is there a way you can call to an ArraySlice<Thing> to do that in one step - to "append to" an ArraySlice and avoid having to bother setting the length at the end?
So, something like this ..
things = ... set it to zero length
things.quasiAppend(x)
things.quasiAppend(x2)
things.quasiAppend(x3)
With no further effort, things now has a length of three and indeed the three items are already in the backing array.
I'm particularly interested in performance here (unusually!)
Another approach,
var thingsBacking = [Thing?](repeating: Thing(), count: 100) // hard limit!
and just set the first one after your data to nil as an end-marker. Again, you don't have to waste time zeroing. But the end marker is a nuisance.
Is there a better way to solve this particular type of array-performance problem?

Based on MartinR's comments, it would seem that for the problem
the data points are incoming and
you don't know how many there will be until the last one (always less than a limit) and
you're having to redo the whole thing at high Hz
It would seem to be best to just:
(1) set up the array
var ra = [Thing](repeating: Thing(), count: 100) // hard limit!
(2) at the start of each run,
ra.removeAll(keepingCapacity: true)
(3) just go ahead and .append each one.
(4) you don't have to especially mark the end or set a length once finished.
It seems it will indeed then use the same array backing. And it of course "increases the length" as it were each time you append - and you can iterate happily at any time.
Slices - get lost!
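Put together, the steps above might look like the following sketch. Thing here is a stand-in struct, the "every 50th item" rule is a placeholder for the real calculation, and I use reserveCapacity for the initial setup instead of filling the array with dummy values (my own substitution, not what the steps above say).

```swift
// Stand-in for the real Thing type.
struct Thing {
    var x = 0.0
    var y = 0.0
    var z = 0.0
}

/// Refills `things` in place; the existing storage is reused between runs.
func fatCalculation(into things: inout [Thing]) {
    things.removeAll(keepingCapacity: true)
    // Placeholder for the real loop: keep roughly 20 of 1000 items.
    for c in 0..<1000 where c % 50 == 0 {
        things.append(Thing(x: Double(c), y: 0, z: 0))
    }
}

var results: [Thing] = []
results.reserveCapacity(100)            // hard limit, allocated once
let capacityBefore = results.capacity

fatCalculation(into: &results)
fatCalculation(into: &results)          // second run reuses the same storage
```

After each run results.count is simply the number of items appended, so there is no end marker or separate length to maintain, and for t in results { } works anywhere.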

In mongo, how to find for a set of items and then add more to fill the required item count

Let's say I have a list of items. I need to find (return a cursor for) exactly 8 items. First I need to see how many featured items there are. If I can get 8 featured items, then there is no issue. But if the count is less than 8, I need to randomly add other items until I get 8.
Is it possible to do this in mongodb?

If you sort the cursor by your featured field you can pick up the featured ones first and then fill in with others:
const noMoreThan8Docs = MyCollection.find({},{ sort: { featured: -1 }, limit: 8 });
This assumes that featured is a boolean key. Booleans sort false-then-true so you need to reverse the sort.
I'm not sure how random the documents that are selected after the featured ones will be. However, since you're using Meteor and Meteor uses random _ids (unlike MongoDB native) you can sort on that key as well.
const noMoreThan8Docs = MyCollection.find({},{ sort: { featured: -1, _id: 1 }, limit: 8 });
This is also not truly random since the same non-featured documents will tend to sort first. If you want to really randomize the non-featured items you'll want to do a random find of those and append them if you have less than 8 featured documents.
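The "featured first, then fill up with random others" logic from the last paragraph, sketched on in-memory values in Swift. This only illustrates the selection step, not a MongoDB or Meteor query; Item and pickItems are made-up names.

```swift
// Hypothetical in-memory stand-in for a document.
struct Item: Equatable {
    let id: Int
    let featured: Bool
}

/// Takes up to `limit` featured items, then fills the remaining
/// slots with randomly chosen non-featured items.
func pickItems(from items: [Item], limit: Int = 8) -> [Item] {
    let featured = items.filter { $0.featured }.prefix(limit)
    let fillers = items.filter { !$0.featured }
        .shuffled()
        .prefix(limit - featured.count)
    return Array(featured) + Array(fillers)
}

// 3 featured items and 10 others: the result has 8 items, featured ones first.
let pool = (1...13).map { Item(id: $0, featured: $0 <= 3) }
let picked = pickItems(from: pool)
```

On the server this would correspond to two finds, one for the featured documents and one random pick among the rest, which are then concatenated.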

I think what you want to do is pad out the list of items to make sure you always return 8. You can do this in the helper method,
var rows = MyTable.find({search: "Something"}, {limit: 8}).fetch();
for (var i = rows.length; i < 8; i++) {
  rows.push({name: "Empty data row " + i});
}
return rows;

MongoDB, Atomic Level Operation

I want to ask about findAndModify in MongoDB.
As far as I know, the query is "isolated by document".
This means that if I run 2 findAndModify operations like this:
{a:1},{$set:{status:"processing", engine:1}}
{a:1},{$set:{status:"processing", engine:2}}
and each query can potentially affect 2,000 documents, then because there are 2 queries (2 engines), some documents may end up with "engine:1" and some others with "engine:2".
I don't think findAndModify will isolate the "first query".
In order to isolate the first query I need to use $isolated.
Is everything I have written correct?
UPDATE - scenario
The idea is to write a proximity engine.
The collection User has 1000-2000-3000 users, or millions.
1 - Order by nearest from point "lng,lat"
2 - In NodeJS I do some computation that I CAN'T do in MongoDB
3 - Now I group the users into "UserGroup" and write a bulk update
When I have 2000-3000 users, this process (from 1 to 3) takes time.
So I want to have multiple threads in parallel.
Parallel threads mean parallel queries.
This can be a problem since Query3 can take some users of Query1.
If this happens, then at point (2) I don't have the nearest users overall, but the nearest "for this query", because another query may have taken the rest of the users. This could mean that some users in New York are grouped with users in Los Angeles.
UPDATE 2 - scenario
I have a collection like this:
{location:[lng,lat], name:"1",gender:"m", status:'undone'}
{location:[lng,lat], name:"2",gender:"m", status:'undone'}
{location:[lng,lat], name:"3",gender:"f", status:'undone'}
{location:[lng,lat], name:"4",gender:"f", status:'done'}
What I should be able to do is create a 'Group' of users by grouping the nearest ones together. Each group has 1 male + 1 female. In the example above, I'm expecting to have only 1 group (user1+user3), since they are male + female and are near each other (user-2 is also male, but is far away from user-3, and user-4 is also female but has status 'done', so she is already processed).
Now the groups are created (only 1 group), so the 2 users are marked as 'done' and the other user, user-2, stays marked as 'undone' for a future operation.
I want to be able to process 1000-2000-3000 users very fast.
UPDATE 3 : from community
Okay now. Can I please try to summarise your case. Given your data, you want to "pair" male and female entries together based on their proximity to each other. Presumably you don't want to do every possible match but just set up a list of general "recommendations", and let's say 10 for each user by the nearest location. Now I'd have to be stupid to not see the full direction of where this is going, but does this sum up the basic initial problem statement. Process each user, find their "pairs", mark them as "done" once paired and exclude them from other pairings by combination where complete?

This is a non-trivial problem and can not be solved easily.
First of all, an iterative approach (which admittedly was my first one) may lead to wrong results.
Given we have the following documents
{
_id: "A",
gender: "m",
location: { longitude: 0, latitude: 1 }
}
{
_id: "B",
gender: "f",
location: { longitude: 0, latitude: 3 }
}
{
_id: "C",
gender: "m",
location: { longitude: 0, latitude: 4 }
}
{
_id: "D",
gender: "f",
location: { longitude: 0, latitude: 9 }
}
With an iterative approach, we now would start with "A" and calculate the closest female, which, of course would be "B" with a distance of 2. However, in fact, the closest distance between a male and a female would be 1 (distance from "B" to "C"). But even when we found this, that would leave the other match, "A" and "D", at a distance of 8, where, with our previous solution, "A" would have had a distance of only 2 to "B".
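To make the trade-off concrete, here is a small Swift sketch that runs both strategies on the four documents above, using only the latitude as a one-dimensional distance. Person, Pair and the two function names are made up for illustration.

```swift
struct Person {
    let id: String
    let gender: String
    let latitude: Double
}

struct Pair: Equatable {
    let male: String
    let female: String
    let distance: Double
}

/// Iterative: walk the males in document order and give each the nearest unmatched female.
func pairInDocumentOrder(_ people: [Person]) -> [Pair] {
    var females = people.filter { $0.gender == "f" }
    var pairs: [Pair] = []
    for male in people where male.gender == "m" {
        guard let nearest = females.min(by: {
            abs($0.latitude - male.latitude) < abs($1.latitude - male.latitude)
        }) else { break }
        females.removeAll { $0.id == nearest.id }
        pairs.append(Pair(male: male.id, female: nearest.id,
                          distance: abs(nearest.latitude - male.latitude)))
    }
    return pairs
}

/// Alternative: always match the globally closest remaining male/female pair first.
func pairClosestFirst(_ people: [Person]) -> [Pair] {
    var males = people.filter { $0.gender == "m" }
    var females = people.filter { $0.gender == "f" }
    var pairs: [Pair] = []
    while !males.isEmpty && !females.isEmpty {
        var best: Pair? = nil
        for m in males {
            for f in females {
                let d = abs(m.latitude - f.latitude)
                if best == nil || d < best!.distance {
                    best = Pair(male: m.id, female: f.id, distance: d)
                }
            }
        }
        guard let chosen = best else { break }
        males.removeAll { $0.id == chosen.male }
        females.removeAll { $0.id == chosen.female }
        pairs.append(chosen)
    }
    return pairs
}

let people = [
    Person(id: "A", gender: "m", latitude: 1),
    Person(id: "B", gender: "f", latitude: 3),
    Person(id: "C", gender: "m", latitude: 4),
    Person(id: "D", gender: "f", latitude: 9),
]
let iterative = pairInDocumentOrder(people)
let closestFirst = pairClosestFirst(people)
let iterativeTotal = iterative.reduce(0) { $0 + $1.distance }
let closestFirstTotal = closestFirst.reduce(0) { $0 + $1.distance }
```

On this data the iterative pass yields A-B and C-D with a total distance of 7, while closest-pair-first yields B-C and A-D with a total of 9, so picking the single best pair does not give the best overall result.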
So we need to decide what way to go
Naively iterate over the documents
Find the lowest sum of distances between matching individuals (which itself isn't trivial to solve), so that all participants together have the shortest travel.
Matching only participants within an acceptable distance
Do some sort of divide and conquer and match participants within a certain radius of a common landmark (say cities, for example)
Solution 1: Naively iterate over the documents
var users = db.collection.find(yourQueryToFindThe1000users);
// We can safely use an unordered op here,
// which has greater performance.
// Since we use the "done" array to keep track of
// the processed members, there is no drawback.
var pairs = db.pairs.initializeUnorderedBulkOp();
var done = new Array();
users.forEach(
  function(currentUser){
    // Skip users that were already matched as someone's partner.
    if( done.indexOf(currentUser._id) !== -1 ) { return; }
    var genderToLookFor = ( currentUser.gender === "m" ) ? "f" : "m";
    // using the $near operator (which needs a geospatial index on "location"),
    // the returned documents automatically are sorted from nearest
    // to farthest, and since findAndModify returns only one document
    // we get the closest matching partner.
    var nearPartner = db.collection.findAndModify({
      query: {
        status: "undone",
        gender: genderToLookFor,
        location: {
          $near: {
            $geometry: {
              type: "Point" ,
              coordinates: currentUser.location
            }
          }
        }
      },
      update: { $set: { "status":"done" } },
      fields: { _id: 1}
    });
    // No partner left for this user, so leave them "undone".
    if( nearPartner === null ) { return; }
    // Obviously, the current user already is processed.
    // However, we store it for simplifying the process of
    // setting the processed users to done.
    done.push(currentUser._id, nearPartner._id);
    // We have a pair, so we store it in a bulk operation
    pairs.insert({
      _id:{
        a: currentUser._id,
        b: nearPartner._id
      }
    });
  }
)
// Write the found pairs
pairs.execute();
// Mark all that are unmarked by now as done
db.collection.update(
  {
    _id: { $in: done },
    status: "undone"
  },
  {
    $set: { status: "done" }
  },
  { multi: true }
)
Solution 2: Find the smallest sum of distances between matches
This would be the ideal solution, but it is extremely complex to solve. We need to take all members of one gender, calculate all distances to all members of the other gender and iterate over all possible sets of matches. In our example it is quite simple, since there are only 4 combinations for any given gender. Thinking about it twice, this might be at least a variant of the traveling salesman problem (MTSP?). If I am right with that, the number of combinations should be
n!/2
for all n>2, where n is the number of possible pairs,
and hence
1,814,400
for n=10
and an astonishing
7.755 × 10^24 (approximately)
for n=25.
That's 7.755 quadrillion (long scale) or 7.755 septillion (short scale).
While there are approaches to solving this kind of problem, the world record is somewhere in the range of 25,000 nodes using massive amounts of hardware and quite tricky algorithms. I think for all practical purposes, this "solution" can be ruled out.
Solution 3
In order to prevent the problem that people might be matched with unacceptable distances between them and depending on your use case, you might want to match people depending on their distance to a common landmark (where they are going to meet, for example the next bigger city).
For our example assume we have cities at [0,2] and [0,7]. The distance (5) between the cities hence has to be our acceptable range for matches. So we do a query for each city
db.collection.find({
  location: {
    $near: {
      $geometry: {
        type: "Point" ,
        coordinates: [ 0 , 2 ]
      },
      $maxDistance: 5
    }
  },
  status: "undone"
})
and iterate over the results naively. Since "A" and "B" would be the first in the result set, they would be matched and done. Bad luck for "C" here, as no girl is left for him. But when we do the same query for the second city he gets his second chance. Ok, his travel gets a bit longer, but hey, he got a date with "D"!
To find the respective distances, take a fixed set of cities (towns, metropolitan areas, whatever your scale is), order them by location and set each city's radius to the larger of the two distances to its immediate neighbors. This way, you get overlapping areas, so even when a match cannot be found in one place, it may be found in another.
IIRC, Google Maps lets you fetch the cities of a nation based on their size. An easier way would be to let people choose their respective city.
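The radius rule described above (the larger of the two distances to the immediate neighbors) can be sketched in Swift as follows. For simplicity this assumes the cities lie on a line and are given as sorted one-dimensional positions, like the latitudes in the example; the function name is made up.

```swift
/// For cities sorted by position, each radius is the larger of the
/// distances to the previous and the next city (0 if there is no neighbour).
func radii(forSortedPositions positions: [Double]) -> [Double] {
    positions.indices.map { i -> Double in
        let toPrevious = i > 0 ? positions[i] - positions[i - 1] : 0
        let toNext = i < positions.count - 1 ? positions[i + 1] - positions[i] : 0
        return max(toPrevious, toNext)
    }
}

// The two example cities at latitude 2 and 7 both get a radius of 5.
let exampleRadii = radii(forSortedPositions: [2, 7])
```

With real coordinates the same rule applies, but "immediate neighbor" and distance would have to be computed geographically rather than on a line.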
Notes
The code shown is not production ready and needs to be refined.
Instead of using "m" and "f" for denoting a gender, I suggest using 1 and 0: Can still be easily mapped, but needs less space to save.
Same goes for status.
I think the last solution is the best: it optimizes distances to some extent while keeping the chances of a match high.
