More efficient way of searching multiple arrays for singular item? - swift

I need to search multiple large arrays in search of one item. If that item is in at least one of the arrays, the result should be true. Between the three examples below, is one more efficient than the other? Or is there a more efficient way?
let arr1 = ["a", "b", "c"]
let arr2 = ["d", "e", "f"]
let arr3 = ["g", "h", "i"]
let q = "h"
if arr1.contains(q) || arr2.contains(q) || arr3.contains(q) {
    print("y")
}
for arr in [arr1, arr2, arr3] {
    if arr.contains(q) {
        print("y")
        break
    }
}
if (arr1 + arr2 + arr3).contains(q) {
    print("y")
}
And I suppose I should ask, if I'm able to make these arrays sets (if I can safely do that knowing they are all unique), would that change anything?

The first two approaches are equivalent: they stop as soon as they find a match and do not create any additional lists. They scan the existing arrays in place, so if, for example, the first array contains the target element, the second and third arrays are not touched at all.
The third option is different: it first concatenates all the arrays into a new one and then scans the new array. With large arrays, this roughly doubles memory consumption for the duration of the operation, and joining the arrays takes additional time.
Converting the arrays to sets to check for a single element will not help, because the conversion has to scan every array in full, which is the same work the search itself does. But if you have a large number of search requests, it is reasonable to use sets, preferably sorted sets, e.g. a tree set.
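As a minimal sketch of the short-circuiting behaviour described above, the first two options can also be written with contains(where:) from the standard library. The variable name found is only illustrative. Wrapping the arrays in an outer array does not copy their elements, because Swift arrays share storage until one of them is mutated.

```swift
let arr1 = ["a", "b", "c"]
let arr2 = ["d", "e", "f"]
let arr3 = ["g", "h", "i"]
let q = "h"

// contains(where:) stops at the first array for which the closure returns true,
// and the inner contains(_:) stops at the first matching element.
let found = [arr1, arr2, arr3].contains { $0.contains(q) }
if found {
    print("y")
}
```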

Assuming the arrays contain unsorted elements that are unique across all arrays, and that each array holds n elements:
Arrays:
Option 1 has a worst-case running time of O(3n). Option 2 has a worst-case running time of O(3n). Option 3 has a worst-case running time of O(6n), since it copies all 3n elements before scanning them.
Eliminating constant factors, the asymptotic running time of all three is O(n).
Sets:
Conversion from array to set is also an O(n) operation. It would make sense to just make one set and add everything to it. Once done, the .contains() lookup has a time complexity of O(1) on average.
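A rough sketch of the single-set idea, assuming the elements are Hashable (String is) and that many lookups will follow the one-time conversion. The name all is made up for this example.

```swift
let arr1 = ["a", "b", "c"]
let arr2 = ["d", "e", "f"]
let arr3 = ["g", "h", "i"]

// Built once: every element of every array is visited a single time.
let all = Set(arr1).union(arr2).union(arr3)

// Each later lookup is a hash lookup rather than a linear scan.
for q in ["h", "z"] {
    print(q, all.contains(q))
}
```

For a single lookup this costs more than scanning the arrays directly, so it only pays off when the set is reused.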

Related

Scala Count Occurrences in Large List

In Scala I have a list of tuples, List[(String, String)]. From this list I want to find how many times each unique tuple appears.
One way to do this would be to apply groupBy(x => x) and then take the length of each group. But my data set is quite large and this is taking a lot of time.
So is there any better way of doing this?
I would do the counting manually, using a Map. Iterate over your collection/list. During the iteration, build a count map. Keys in the count map are unique items from the original collection/list, values are number of occurrences of the key. If the item being processed during the iteration is in the count collection, increase its value by 1. If not, add value 1 to the count map. You can use getOrElse:
// count is assumed to be a scala.collection.mutable.Map
count(current_item) = count.getOrElse(current_item, 0) + 1
This should work faster than groupBy followed by a length check. It will also require less memory.
For other suggestions, see also this discussion.

MongoDB Collection Structure Performance

I have a MongoDB database of semi-complex records and my reporting queries are struggling as the collection size increases. I want to make some reporting Views that are optimized for quick searching and aggregating. Here is a sample format:
var record = {
    fieldOne: "",
    fieldTwo: "",
    fieldThree: "", // There are approx. 30 fields at this level
    ArrayOne: [
        {subItem1: ""},
        {subItem2: ""} // There are usually about 10-15 items in this array
    ],
    ArrayTwo: [
        {subItem1: ""}, // ArrayTwo items reference ArrayOne item ids for ref
        {subItem2: ""} // There are usually about 20-30 items in this array
    ],
    ArrayThree: [
        {subItem1: ""}, // ArrayThree items reference both ArrayOne and ArrayTwo items for ref
        {subItem2: ""}, // There are usually about 200-300 items in this array
        {subArray: [
            {subItem1: ""},
            {subItem2: ""} // There are usually about 5 items in this array
        ]}
    ]
};
I used to have this data where ArrayTwo was inside ArrayOne items and ArrayThree was inside ArrayTwo items so that referencing a parent was implied, but reporting became a nightmare with multiple nested levels of arrays.
I have a field called 'fieldName' at every level which is a way we target objects in the arrays.
I will often need to aggregate values from any of the 3 arrays across thousands of records in a query.
I see two ways of doing it.
A). Flatten and go Vertically, making a single smaller record in the database for every item in ArrayThree, essentially adding 200 records per single complex record. I tried this and I already have 200K records in 5 days of new data coming in. The benefit to this is that I have fieldNames that I can put indexing on.
B). Flatten Horizontally, making every array flat within a single collection record. I would use the FieldName located in each array object as the key. This would make a single record with 200-300 fields in it. This would make far fewer records in the collection, but the fields would be dynamic, so adding indexes would not be possible (that I know of).
At this time, I have approx 300K existing records that I would be building this View off of. If I go vertical, that would place 60 Million simple records in the db and if I go Horizontal, it would be 300K records with 200 fields flattened in each with no indexing ability.
What's the right way to approach this?
I'd be inclined to stick with the mongo philosophy and do individual entries for each distinct set/piece of information, rather than relying on references within a weird composite object.
60 million records is "a lot" (but it really isn't "a ton"), and MongoDB loves to have lots of little things tossed at it. On the flip side, with the horizontal approach you'd end up with fewer, bigger objects that take up just as much space.
(Using the WiredTiger back end with compression will make your disk go further too.)
Edit:
I'd also add that you really, really, really want indexes at the end of the day, so that's another vote for this approach.

Selectively remove and delete objects from a NSMutableArray in Swift

Basic question. What is the best way to selectively remove and delete items from a mutable Array in Swift?
There are options that do NOT seem to be suited for this, like calling removeObject inside a
- for in loop
- enumeration block
and others that appear to work in general, like
- for loop using an index + calling removeObjectAtIndex, even inside the loop
- for in loop for populating an arrayWithItemsToRemove and then use originalArray.removeObjectsInArray(arrayWithItemsToRemove)
- using .filter to create a new array seems to be really nice, but I am not quite sure how I feel about replacing the whole original array
Is there a recommended, simple and secure way to remove items from an array? One of those I mentioned or something I am missing?
It would be nice to get different takes (with pros and cons) or preferences on this. I still struggle choosing the right one.
If you want to loop and remove elements from an NSMutableArray based on a condition, you can loop over the array in reverse order (from the last index to zero), and remove the objects satisfying the condition.
For example, if you have an array of integers and want to remove the numbers divisible by three, you can run the loop like this:
import Foundation

let array: NSMutableArray = [1, 2, 3, 4, 5, 6, 7]
// Walk from the last index down to 0 so removals never shift unvisited elements.
for index in stride(from: array.count - 1, through: 0, by: -1) {
    // removeObject(at:) is the Swift 3+ spelling of removeObjectAtIndex.
    if let number = array[index] as? Int, number % 3 == 0 {
        array.removeObject(at: index)
    }
}
Looping in reverse order ensures that the index of the array elements still to check doesn't change. In forward mode instead, if you remove for instance the first element, then the element previously at index 1 will change to index 0, and you have to account for that in the code.
Using removeObject (which doesn't work with the above code) in a loop is not recommended for performance reasons, because its implementation loops through all elements of the array and uses isEqual: to determine whether to remove each object. The complexity rises from O(n) to O(n^2): in the worst case, where all elements of the array are removed, the array is traversed once in the main loop and traversed again for each element of the array. So all solutions based on enumeration blocks, for-in, etc., should be avoided unless you have a good reason.
filter, on the other hand, is a good alternative, and it's what I'd use because:
- it's concise and clear: 1 line of code as opposed to 5 lines (including closing brackets) for the index-based solution
- its performance is comparable to the index-based solution: it is a bit slower, but I think not by much
It might not be ideal in all cases though because, as you said, it generates a new array rather than operating in place.
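A minimal sketch of the filter approach, assuming a native Swift array of Int rather than an NSMutableArray. It also shows removeAll(where:), a standard-library method (Swift 4.2 and later) that was not mentioned above and that performs the same removal in place. The names numbers and kept are illustrative.

```swift
var numbers = [1, 2, 3, 4, 5, 6, 7]

// filter returns a new array; `numbers` itself is left untouched here.
let kept = numbers.filter { $0 % 3 != 0 }
print(kept)

// removeAll(where:) mutates `numbers` in place in a single pass.
numbers.removeAll { $0 % 3 == 0 }
print(numbers)
```

Both calls take a closure, so the removal condition stays in one place whichever variant is used.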
When working with NSMutableArray you shouldn't remove objects while you are looping along the mutable array itself (unless looping backwards, as pointed out by Antonio's answer).
A common solution is to make an immutable copy of the array, iterate over the copy, and remove objects selectively from the original mutable array by calling "removeObject" or "removeObjectAtIndex". With "removeObjectAtIndex" you have to calculate the index yourself, since indexes in the original array and the copy stop matching once removals begin (you have to decrement the "removal index" each time an object is removed).
A better solution is to loop over the array once, create an NSIndexSet with the indexes of the objects to remove, and then call "removeObjectsAtIndexes:" on the mutable array.
See documentation on NSMutableArray's "removeObjectsAtIndexes:" in Swift.
Some of the options:
- For loop over indexes and calling removeObjectAtIndex: 1) You will have to deal with the fact that when you remove, the index of the following object will become the current index, so you have to make sure to not increment the index in that case; you can avoid this by iterating backwards. 2) Each call to removeObjectAtIndex is O(n) (since it must shift all following elements forwards), so the algorithm is O(n^2).
- For loop to build a set of elements to remove and then calling removeObjectsInArray: The first part is O(n). removeObjectsInArray uses a hash table to test elements for removal efficiently; hash table access is O(1) on average but O(n) worst-case, so the algorithm is O(n) on average, but O(n^2) worst-case.
- Using filter to create a new array: This is O(n). It creates a new array.
- For loop to build an index set of indexes of elements to remove (or with indexesOfObjectsPassingTest), then remove them using removeObjectsAtIndexes: I believe this is O(n). It does not create a new array.
- Use filterUsingPredicate using a predicate based on a block of your test: I believe this is also O(n). It does not create a new array.
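The index-set option might look roughly like this in current Swift, where indexesOfObjectsPassingTest and removeObjectsAtIndexes are spelled indexesOfObjects(passingTest:) and removeObjects(at:). This is a sketch that assumes the array holds integers; the name toRemove is made up for the example.

```swift
import Foundation

let array: NSMutableArray = [1, 2, 3, 4, 5, 6, 7]

// First pass: collect the indexes of every element divisible by three.
let toRemove = array.indexesOfObjects(passingTest: { element, _, _ in
    guard let number = element as? Int else { return false }
    return number % 3 == 0
})

// Second step: remove all collected indexes in one call, without building a new array.
array.removeObjects(at: toRemove)
print(array.compactMap { $0 as? Int })
```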

Searching for subrow in an array

I have a problem with Swift. I have an array of strings made up of 3 different elements that fill the array in a certain manner. I want to search the array for a sequence of 3 particular strings next to each other and return their indexes. Is there a special array method for this?
Copied from comment:
Suppose I have an Int array like [1,2,1,3,2]. I want to search it for the subarray [1,2,1] and return the indexes of those elements. Should I use the findwithPredicate method?
I would just iterate over the array, checking for matches.
Check the items in the array in sequence, comparing the current item with the first item of the subarray. If it does not match, move on. If it does, start comparing the rest of the subarray; if that fails, go back to scanning the array.
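The scan described above can be sketched as follows. findSubarray is a hypothetical helper written for this example, not a standard-library method; it returns the index range of the first match, or nil if there is none.

```swift
// Hypothetical helper: returns the index range of the first occurrence of
// `pattern` inside `array`, or nil if it does not occur.
func findSubarray<T: Equatable>(_ pattern: [T], in array: [T]) -> Range<Int>? {
    guard !pattern.isEmpty, pattern.count <= array.count else { return nil }
    for start in 0...(array.count - pattern.count) {
        let end = start + pattern.count
        // Compare the window starting at `start` with the pattern, element by element.
        if array[start..<end].elementsEqual(pattern) {
            return start..<end
        }
    }
    return nil
}

let haystack = [1, 2, 1, 3, 2]
if let range = findSubarray([1, 2, 1], in: haystack) {
    print(Array(range)) // prints [0, 1, 2], the indexes of the matched elements
}
```

This straightforward version is O(n * m) for an array of n elements and a pattern of m elements, which is fine for a 3-element pattern.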

Mongodb: Skip collection values from between (not a normal pagination)

I have browsed through various examples but have failed to find what I am looking for. What I want is to find a specific document by _id and, within one of its arrays, skip and return elements alternately, all in one query. Failing that, I need some alternative that is fast enough for my case.
The following query would skip the first comment and return the second:
db.posts.find( { "_id" : 1 }, { comments: { $slice: [ 1, 1 ] } } )
That is: skip element 0, return element 1, and leave the rest out of the result.
But what if there were, say, 10000 comments and I wanted to use the same pattern, returning the array values like this:
skip 0, return 1, skip 2, return 3, skip 4, return 5
That would return a comments array of size 5000, because half of the elements are skipped. Is this possible? I picked a large number like 10000 because I fear that running multiple queries for this would not perform well (example shown here: multiple queries to accomplish something similar). Thanks!
I went through several resources and concluded that this is currently impossible with one query. Instead, I see only two options to overcome this problem:
1.) Make a loop of some sort and run several slice queries while increasing the slice position, similar to the resource I linked:
var skip = NUMBER_OF_ITEMS * (PAGE_NUMBER - 1)
db.companies.find({}, {$slice:[skip, NUMBER_OF_ITEMS]})
However, depending on the type of data, I would not want to run 5000 individual queries to get only half of the array contents, so I decided to use option 2, which seems relatively fast to me.
2.) Make a single query by _id for the document you want and, before returning results to the client or some other part of your code, drop the unwanted array items with a for loop. I did this on the Java side, since I talk to Mongo via Morphia. I also ran explain() on the query and saw that returning a single document with a 10000-item array, when filtering by _id, is so fast that speed wasn't really an issue; I bet that slicing with skips would only be slower.