Inferring type in queries instead of string predicates in Realm Swift?

I'm considering moving from my Core Data wrapper to Realm for my app and one thing that's nagging is how Realm uses strings for their predicates instead of inferring the type in their queries.
For example, why do I have to do this:
Realm().objects(Dog).filter("age < 5").sorted("name")
Instead of the Swift way like this:
Realm().objects(Dog).filter { $0.age < 5 }.sorted { $0.name }
Am I missing something, or is this really how you use Realm for Swift?

Using Swift's built-in collection filtering methods is less efficient than using Realm's NSPredicate interface for querying.
A key reason that Swift's built-in collection filtering is less efficient is that it requires allocating a Swift object for each object stored in the Realm. This is necessary as a Swift object must exist in memory for Swift to evaluate expressions such as $0.age < 5. Using NSPredicate allows Realm to translate the predicate into an internal query format that can be evaluated directly against the properties stored in the Realm, without allocating instances of the Swift model classes. The instances can then be lazily allocated as objects in the result set are accessed.
Realm's query execution engine can also perform more optimizations when it understands the semantics of the query being performed. For instance, indexes can be used to more efficiently execute queries when indexed properties are used. If the predicate were a Swift closure its behavior would be opaque to Realm, preventing these optimizations.
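To make the contrast concrete, here is a minimal sketch (using a slightly newer Realm Swift API than the question's, with a hypothetical indexed age property):

import RealmSwift

class Dog: Object {
    @objc dynamic var name = ""
    @objc dynamic var age = 0

    // Indexing `age` lets Realm's query engine use the index when
    // evaluating predicates such as "age < 5".
    override static func indexedProperties() -> [String] {
        return ["age"]
    }
}

let realm = try! Realm()

// Evaluated inside Realm's query engine: no Dog instances are allocated
// until results are accessed, and the index on `age` can be used.
let puppies = realm.objects(Dog.self).filter("age < 5").sorted(byKeyPath: "name")

// Evaluated in Swift: every Dog must be instantiated so the closure can
// read $0.age, and the result is a plain Array, not a live Results.
let puppiesEagerly = realm.objects(Dog.self).filter { $0.age < 5 }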
It's worth pointing out that NSPredicate is used for queries by Core Data too, for very similar reasons.

Related

NSManagedObjectID vs custom UUID identifier attribute - fetch performance

I would really like to avoid using NSManagedObjectID as a way to connect my model structs to their CoreData objects. I mean something like this:
Say I have a Book entity in CoreData and then I have a model struct like this representing it for my model layer:
struct BookModel {
    let name: String
    ...
    let objectID: NSManagedObjectID // I need this to refer back to the entry in the database
}
I don't like this approach. It makes working with the structs tedious and, for instance, testing is annoying because I always have to generate dummy objectIds or make BookModel.objectID optional.
What I would love to have is an id property of type UUID inside the Book entity. This would be so easy to connect to structs and also allows the structs to properly exist without a database:
struct BookModel {
    let name: String
    ...
    let id: UUID
    ...
    func object() -> Book {
        // Retrieve managed object using a fetch request with a predicate.
    }
}
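A sketch of what that object() might look like, assuming a Book entity with an id: UUID attribute (the context parameter and optional return are illustrative, not part of the original design):

// (as a method on BookModel, using its `id` property)
func object(in context: NSManagedObjectContext) -> Book? {
    let request = NSFetchRequest<Book>(entityName: "Book")
    request.predicate = NSPredicate(format: "id == %@", id as CVarArg)
    request.fetchLimit = 1
    return (try? context.fetch(request))?.first
}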
I've noticed that you can actually have UUID properties in an entity. However, the performance difference seems to be enormous. I've created an example that tries to fetch individual objects 10000 times.
First, I fetched them using the context's object(with:) method. I hard-coded all the possible objectIDs in an array and passed a random one each time.
Then, I used a simple fetch request with an NSPredicate that got passed a random UUID.
The difference in execution time is significant:
With ObjectID: 0.015282376s
With UUID: 1.093346287s
However, the strange thing is that the first method didn't actually produce any SQL queries (I logged them using the launch argument -com.apple.CoreData.SQLDebug 4). This would explain the speed but not why it doesn't need to communicate with the database at all.
I researched a bit but can't really figure out what object(with: NSManagedObjectID) actually does behind the scenes.
Does this mean using a "custom" UUID property is not a good idea? I would really appreciate any insights on this!
I would not rely on NSManagedObjectID in your code. It ties your code to Apple's database implementation, which may change at any time, and leaves your app less resilient to future changes.
By way of example, you would not be able to use the new NSPersistentCloudKitContainer. It does not support NSManagedObjectID: see https://developer.apple.com/documentation/coredata/mirroring_a_core_data_store_with_cloudkit/creating_a_core_data_model_for_cloudkit
Instead of hardcoding NSManagedObjectID you are better off giving your entities unique UUIDs, as you have suggested. This may or may not affect performance, but you are better off in the long run, as the underlying core database technologies will shift.
You should just use a String to represent the NSManagedObjectID. To convert from NSManagedObjectID to string is easy:
objectID.uriRepresentation().absoluteString
To convert from String to NSManagedObjectID is slightly more complicated:
if let url = URL(string: viewModel.id),
   let objectID = context.persistentStoreCoordinator?.managedObjectID(forURIRepresentation: url) {
    // use objectID here
}
This will make your model objects cleaner.
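Putting both directions together, a minimal sketch (viewModel.id holding the string form is an assumption carried over from the snippet above):

import CoreData

extension NSManagedObjectID {
    // NSManagedObjectID -> String
    var asString: String {
        return uriRepresentation().absoluteString
    }
}

// String -> NSManagedObjectID
func objectID(fromString string: String,
              in context: NSManagedObjectContext) -> NSManagedObjectID? {
    guard let url = URL(string: string) else { return nil }
    return context.persistentStoreCoordinator?
        .managedObjectID(forURIRepresentation: url)
}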
NSManagedObjectID is fine to use within one application on one device, but it should never be stored and referenced across different applications or devices. I don't think it's true that NSManagedObjectID is unsupported with CloudKit.
As for why object(with:) is fast: the documentation says it returns:
The identified object, if it's known to the context; otherwise, a fault
with its objectID property set to objectID.
This means that if the object has been loaded before, it is returned immediately; if it has not, a fault is returned. To trigger a SQL query for a fair comparison, you need to access one of the attributes after calling object(with:), as sketched below. I would assume the performance would then be very similar to the UUID approach.
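A sketch of what a fairer benchmark step might look like (someObjectID stands in for one of the hard-coded IDs):

let book = context.object(with: someObjectID) as! Book
_ = book.name // accessing an attribute fires the fault and actually hits the store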

What does Realm Database save? Only variables or functions as well?

In this simple class:
class Simple: Object {
    @objc var name: String = ""
    func doSomething() {}
}
When I save this into Realm, what gets saved? Only the variable, or the function as well? The reason I am asking is that when I have a lot of Simple objects, I of course do not want to save the functions; the objects would get bigger, hurting performance.
The variable. It creates a 'column' named "name". Check the Realm docs.
Also, if you have a lot of data and would like to browse it, you can use the Realm Browser, which clearly shows your Realm database structure.
You should read through the official documentation and especially the part about supported model properties, which clearly mentions what you can persist in Realm objects.
You can only save properties of certain supported types (Int, String, etc.) or references to other Realm objects (as one-to-one, one-to-many or inverse relations), but you cannot save function references, and it wouldn't make sense anyway.
You can add ignored properties and functions to your Realm model classes, but they will only exist in memory, they won't be saved to Realm. For functions this is all you actually need, it wouldn't make any sense to save a function to local storage.
Also, your current model is flawed: the name property is missing the dynamic keyword in its declaration, so it cannot be treated as a Realm property.
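A sketch of the corrected model, assuming you also want some in-memory-only state:

import RealmSwift

class Simple: Object {
    @objc dynamic var name = ""   // persisted as a "name" column

    var scratch = 0               // ignored: never written to the Realm file

    override static func ignoredProperties() -> [String] {
        return ["scratch"]
    }

    func doSomething() {}         // functions are never persisted
}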

Optimizing lazy collections

This question is about optimizing lazy collections. I will first explain the problem and then give some thoughts for a possible solution. Questions are in bold.
Problem
Swift expects operations on Collections to be O(1). Some operations, especially prefix- and suffix-like ones, deviate and are on the order of O(n) or higher.
Lazy collections can't iterate through the base collection during initialization since computation should be deferred for as long as possible until the value is actually needed.
So, how can we optimize lazy collections? And of course this raises the question: what constitutes an optimized lazy collection?
Thoughts
The most obvious solution is caching. This means that the first call to a collection's method has an unfavourable time complexity, but subsequent calls to the same or other methods can possibly be computed in O(1). We trade space complexity on the order of O(n) for faster computation.
Attempting to optimize lazy collections on structs by using caching is impossible, since subscript(_ position:) and all the other methods you'd need to implement to conform to LazyCollectionProtocol are non-mutating, and structs are immutable by default. This means we have to recompute all operations for every call to a property or method.
This leaves us with classes. Classes are mutable, meaning that all computed properties and methods can internally mutate state. When we use classes to optimize a lazy collection we have two options. First, if the properties of the lazy type are variables, then we're bringing ourselves into a world of hurt: changing a property could invalidate previously cached results, and I can imagine that managing the code paths needed to keep the cache consistent would be headache-inducing. Second, if we use lets, we're good; the state set during initialization can't be changed, so a cached result never needs to be invalidated. Note that we're only talking about lazy collections with pure methods and no side effects here.
But classes are reference types. What are the downsides of using reference types for lazy collections? The Swift standard library doesn't use them for starters.
Any thoughts, or ideas for different approaches?
I completely agree with Alexander here. If you're storing lazy collections, you're generally doing something wrong, and the cost of repeated accesses is going to constantly surprise you.
These collections already blow up their complexity requirements, it's true:
Note: The performance of accessing startIndex, first, or any methods that depend on startIndex depends on how many elements satisfy the predicate at the start of the collection, and may not offer the usual performance given by the Collection protocol. Be aware, therefore, that general operations on LazyDropWhileCollection instances may not have the documented complexity.
But caching won't fix that. They'll still be O(n) on the first access, so a loop like
for i in 0..<xs.count { print(xs[i]) }
is still O(n^2). Also remember that O(1) and "fast" are not the same thing. It feels like you're trying to get to "fast" but that doesn't fix the complexity promise (that said, lazy structures are already breaking their complexity promises in Swift).
Caching is a net-negative because it makes the normal (and expected) use of lazy data structures slower. The normal way to use lazy data structures is to consume them either zero or one times. If you were going to consume them more than one time, you should use a strict data structure. Caching something that you never use is a waste of time and space.
There are certainly conceivable use cases where you have a large data structure that will be sparsely accessed multiple times, and so caching would be useful, but this isn't the use case lazy was built to handle.
Attempting to optimize lazy collections on structs by using caching is impossible, since subscript(_ position:) and all the other methods you'd need to implement to conform to LazyCollectionProtocol are non-mutating, and structs are immutable by default. This means we have to recompute all operations for every call to a property or method.
This isn't true. A struct can internally store a reference type to hold its cache and this is common. Strings do exactly this. They include a StringBuffer which is a reference type (for reasons related to a Swift compiler bug, StringBuffer is actually implemented as a struct that wraps a class, but conceptually it is a reference type). Lots of value types in Swift store internal buffer classes this way, which allows them to be internally mutable while presenting an immutable interface. (It's also important for CoW and lots of other performance and memory related reasons.)
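A minimal sketch of that pattern, with illustrative names: the struct's interface stays non-mutating while a private class instance holds the cache.

final class Box<Value> {
    var stored: Value?
}

struct Memoized<Value> {
    private let cache = Box<Value>()
    private let compute: () -> Value

    init(_ compute: @escaping () -> Value) {
        self.compute = compute
    }

    // Non-mutating, as Collection members must be, yet still able to cache:
    // the struct never changes, but the class instance it references does.
    var value: Value {
        if let v = cache.stored { return v }
        let v = compute()
        cache.stored = v
        return v
    }
}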
Note that adding caching today would also break existing use cases of lazy:
struct Massive {
    let id: Int
    // Lots of data, but rarely needed.
}

// We have lots of items that we look at occasionally.
let ids = 0..<10_000_000

// `massives` is lazy. When we ask for an element it is created, and when
// we're done with it, it's thrown away. If `lazy` forced caching, then
// everything we accessed would be kept alive forever. Also, if the values
// in `Massive` change over time, I may well want them rebuilt at that
// point rather than served from a cache.
let massives = ids.lazy.map(Massive.init)
let aMassive = massives[10]
This isn't to say a caching data structure wouldn't be useful in some cases, but it certainly isn't always a win. It imposes a lot of costs and breaks some uses while helping others. So if you want those other use cases, you should build a data structure that provides them. But it's reasonable that lazy is not that tool.
Swift's lazy collections are intended to provide one-off access to elements. Subsequent accesses cause redundant computation (e.g. a lazy map sequence re-evaluates its transform closure on every access).
In the case where you want repeated access to elements, it's best to just slice the portion of the lazy sequence/collection you care about, and create a proper Collection (e.g. an Array) out of it.
The bookkeeping overhead of lazily evaluating and caching each element would probably outweigh the benefits.
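A sketch of the difference:

let squares = (0..<1_000_000).lazy.map { $0 * $0 }

// Each access to `squares` re-runs the transform closure.
// For repeated access, materialize just the slice you need once:
let firstHundred = Array(squares.prefix(100))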

Why RealmCollectionType methods return Results and not RealmCollectionType/AnyRealmCollection?

I recently moved from Array to RealmCollectionType because it provides more efficient filters. Now I want to migrate my unit tests as well, but I don't like the in-memory Realm because it requires me to set up a lot of links and relations between my objects. I was trying to mock Results and LinkingObjects by conforming my mock to RealmCollectionType. Unfortunately, I'm stuck implementing the filter operation because it must return Results, which is declared final.
What is the purpose of narrowing filter's return type to Results?
RealmCollection.filter(...) returns a Results because that's the query result container in Realm. It does share some common interface elements with other collection types in Realm (like LinkingObjects and List), which is why it conforms to the RealmCollection protocol.
If you'd like to test code that's generic over, say, the Collection protocol in the Swift standard library, which RealmCollection inherits from, you can do so.
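For example, a sketch (reusing a hypothetical Dog model like the one in the first question): write the logic against Collection, so tests can pass a plain Array where production code passes a Results.

import RealmSwift

func names<C: Collection>(of dogs: C) -> [String] where C.Element == Dog {
    return dogs.map { $0.name }
}

// Production: names(of: realm.objects(Dog.self).filter("age < 5"))
// Unit test:  names(of: [Dog(value: ["name": "Rex"])])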

Scala, Morphia and Enumeration

I need to store a Scala class with Morphia. With annotations it works well, unless I try to store a collection of _ <: Enumeration.
Morphia complains that it does not have serializers for that type, and I am wondering how to provide one. For now I have changed the collection's type to Seq[String] and fill it by invoking toString on every item in the collection.
That works well; however, I'm not sure it is the right way.
This problem is common to several of the available abstraction layers on top of MongoDB. It all comes back to one root cause: there is no enum equivalent in JSON/BSON. Salat, for example, has the same problem.
In fact, the MongoDB Java driver does not support enums, as you can read in the discussion at https://jira.mongodb.org/browse/JAVA-268, where the problem is still open. Most of the frameworks I have seen for using MongoDB with Java do not implement low-level functionality such as this. I think this choice makes a lot of sense: they leave you the choice of how to deal with data structures the low-level driver doesn't handle, instead of imposing one on you.
In general I feel the absence of support comes not from a technical limitation but from a design choice. For enums there are multiple ways to map them, each with pros and cons, while for other data types it is probably simpler. I don't know the MongoDB Java driver in detail, but I guess supporting multiple "modes" would have required some refactoring (maybe that's why they are talking about a new version of serialization?).
These are two strategies I am thinking about:
If you want to index on the enum and minimize space usage, map the enum to an integer (not its ordinal, please; see "can set enum start value in java").
If your concern is queryability in the mongo shell, because your data will be accessed by data scientists, you would rather store the enum using its string value.
To conclude, there is nothing wrong with adding an intermediate data structure between your native object and MongoDB. Salat supports this through CustomTransformers; with Morphia you may need to do the conversion explicitly. Go for it.
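Though this question is Scala-specific, the intermediate-representation strategy is language-agnostic; here is a sketch of the same idea in Swift (this thread's main language), mapping an enum through its raw string value at the persistence boundary:

enum Status: String {
    case active, archived
}

struct StoredRecord {
    var statusRaw: String   // what actually goes into the database
}

struct Record {
    var status: Status

    // Explicit conversion at the storage boundary, as suggested above.
    func toStored() -> StoredRecord {
        return StoredRecord(statusRaw: status.rawValue)
    }

    static func fromStored(_ stored: StoredRecord) -> Record? {
        guard let status = Status(rawValue: stored.statusRaw) else { return nil }
        return Record(status: status)
    }
}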