Why is IQueryable used instead of IEnumerable with MEF catalogs?

Why is it that the catalogs expose parts through IQueryable and not just IEnumerable? I've been thinking about it, but I don't understand how (or whether) they actually use any of the services that interface provides.

Because it allows for implementations which don't have to scan all available parts (an O(N) operation) for each query.
To give a concrete example, consider the following query which might be similar to something that MEF does internally to find an export with the right contract:
var matches = catalog.Parts
    .Where(part => part.ExportDefinitions.Any(
        export => export.ContractName == "foo"));
The catalog implementation of IQueryProvider could recognize the resulting expression tree as "give me parts which export the contract 'foo'" and then retrieve them from a dictionary by using 'foo' as the key, an O(1) operation - instead of actually enumerating all parts and executing the lambda passed to .Where, as would be the case for an IEnumerable.
edit: my example above isn't really a good one because there already is a GetExports method specifically for this case; it wouldn't be necessary to query the Parts property like that. Perhaps a better example would involve export.Metadata.
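To illustrate the underlying idea anyway, here is a minimal sketch of how a catalog could answer "which parts export contract X" from an index instead of a linear scan. The Part and ExportDefinition types below are made up for the example, not MEF's actual classes:

using System.Collections.Generic;

class ExportDefinition { public string ContractName; }
class Part { public List<ExportDefinition> ExportDefinitions = new List<ExportDefinition>(); }

class IndexedCatalog
{
    // Maps contract name -> parts exporting it; built once as parts are added.
    private readonly Dictionary<string, List<Part>> byContract =
        new Dictionary<string, List<Part>>();

    public void Add(Part part)
    {
        foreach (var export in part.ExportDefinitions)
        {
            List<Part> list;
            if (!byContract.TryGetValue(export.ContractName, out list))
                byContract[export.ContractName] = list = new List<Part>();
            list.Add(part);
        }
    }

    // O(1) dictionary lookup instead of an O(N) scan over all parts.
    public IEnumerable<Part> PartsExporting(string contractName)
    {
        List<Part> list;
        return byContract.TryGetValue(contractName, out list)
            ? (IEnumerable<Part>)list
            : new Part[0];
    }
}

A smart IQueryProvider could pattern-match the expression tree of the Where/Any query above and dispatch to something like PartsExporting internally.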

ToAsyncEnumerable().Single() vs SingleAsync()

I'm constructing and executing my queries in a way that's independent of EF Core, so I'm relying on IQueryable<T> to obtain the required level of abstraction. I'm replacing awaited SingleAsync() calls with awaited ToAsyncEnumerable().Single() calls, and ToListAsync() calls with ToAsyncEnumerable().ToList() calls. But I just happened upon the ToAsyncEnumerable() method, so I'm unsure whether I'm using it correctly.
To clarify which extension methods I'm referring to, they're defined as follows:
SingleAsync and ToListAsync are defined on the EntityFrameworkQueryableExtensions class in the Microsoft.EntityFrameworkCore namespace and assembly.
ToAsyncEnumerable is defined on the AsyncEnumerable class in the System.Linq namespace in the System.Interactive.Async assembly.
When the query runs against EF-Core, are the calls ToAsyncEnumerable().Single()/ToList() versus SingleAsync()/ToListAsync() equivalent in function and performance? If not then how do they differ?
For methods returning a sequence (like ToListAsync, ToArrayAsync) I don't expect a difference.
However, for single-value-returning methods (the async versions of First, FirstOrDefault, Single, Min, Max, Sum, etc.) there will definitely be a difference. It's the same as the difference between executing those methods on IQueryable<T> vs IEnumerable<T>: in the former case they are processed by a database query that returns a single value to the client, while in the latter the whole result set is returned to the client and processed in memory.
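For example (db.People here is a hypothetical DbSet<Person>; the snippet is illustrative only), where the aggregation executes depends on the static type the operator is resolved against:

// IQueryable: translated to SQL, e.g. SELECT MAX([p].[Age]) runs in the database
var maxAge1 = db.People.Max(p => p.Age);
// IEnumerable: every row is fetched and Max runs in client memory
var maxAge2 = db.People.AsEnumerable().Max(p => p.Age);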
So, while in general the idea of abstracting away EF Core is good, it will cause performance issues with IQueryable<T>, because the async processing of queryables is not standardized, and converting to IEnumerable<T> changes the execution context, and hence the implementation of the single-value-returning LINQ methods.
P.S. By standardization I mean the following. The synchronous processing of IQueryable is provided by the Execute methods of IQueryProvider (a standard interface in the System.Linq namespace, System.Core.dll assembly). Asynchronous processing would require introducing another standard interface, similar to EF Core's custom IAsyncQueryProvider (in the Microsoft.EntityFrameworkCore.Query.Internal namespace, Microsoft.EntityFrameworkCore.dll assembly). That, I guess, requires cooperation/approval from the BCL team and takes time, which is why they decided to take a custom path for now.
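For illustration, such a standard interface might look roughly like this. This is purely hypothetical, modeled on the shape of EF Core's internal IAsyncQueryProvider, not an actual BCL type:

using System.Linq;
using System.Linq.Expressions;
using System.Threading;
using System.Threading.Tasks;

// A hypothetical shape for standardized async query execution.
public interface IStandardAsyncQueryProvider : IQueryProvider
{
    Task<TResult> ExecuteAsync<TResult>(
        Expression expression, CancellationToken cancellationToken);
}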
When the original source is a DbSet, ToAsyncEnumerable().Single() is not as performant as SingleAsync() in the exceptional case where the database contains more than one matching row. But in the more likely scenario, where you both expect and receive only one row, it's the same. Compare the generated SQL:
SingleAsync():
SELECT TOP(2) [l].[ID]
FROM [Ls] AS [l]
ToAsyncEnumerable().Single():
SELECT [l].[ID]
FROM [Ls] AS [l]
ToAsyncEnumerable() breaks the IQueryable call chain and enters LINQ-to-Objects land. Any downstream filtering occurs in memory. You can mitigate this problem by doing your filtering upstream. So instead of:
ToAsyncEnumerable().Single( l => l.Something == "foo" ):
SELECT [l].[ID], [l].[Something]
FROM [Ls] AS [l]
you can do:
Where( l => l.Something == "foo" ).ToAsyncEnumerable().Single():
SELECT [l].[ID], [l].[Something]
FROM [Ls] AS [l]
WHERE [l].[Something] = N'foo'
If that approach still leaves you squeamish then, as an alternative, consider defining extension methods like this one:
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Query.Internal;

static class Extensions
{
    public static Task<T> SingleAsync<T>(this IQueryable<T> source) =>
        source.Provider is IAsyncQueryProvider
            ? EntityFrameworkQueryableExtensions.SingleAsync(source)
            : Task.FromResult(source.Single());
}
According to the official Microsoft documentation for EF Core (all versions, including the current 2.1 one):
This API supports the Entity Framework Core infrastructure and is not intended to be used directly from your code. This API may change or be removed in future releases.
Source: https://learn.microsoft.com/en-us/dotnet/api/microsoft.entityframeworkcore.query.internal.asynclinqoperatorprovider.toasyncenumerable?view=efcore-2.1
P.S. I personally found it problematic in combination with the AutoMapper tool (at least until ver. 6.2.2): it just doesn't map collections of type IAsyncEnumerable (unlike IEnumerable, with which AutoMapper works seamlessly).
I took a peek at the source code of Single (Line 90).
It clearly illustrates that the enumerator is advanced at most twice for a successful operation (the second call merely verifies there is no second element).
using (var e = source.GetEnumerator())
{
    if (!await e.MoveNext(cancellationToken)
                .ConfigureAwait(false))
    {
        throw new InvalidOperationException(Strings.NO_ELEMENTS);
    }

    var result = e.Current;

    if (await e.MoveNext(cancellationToken)
               .ConfigureAwait(false))
    {
        throw new InvalidOperationException(Strings.MORE_THAN_ONE_ELEMENT);
    }

    return result;
}
Since this kind of implementation is as good as it gets (nowadays), one can say with certainty that using the Ix Single operator would not harm performance.
As for SingleAsync, you can be sure that it is implemented in a similar manner, and even if it is not (which is doubtful), it could not outperform the Ix Single operator.

Common methods in Scala's Array and List

I'm new to Scala (just started learning it), but have noticed something that seems strange to me: there are the classes Array and List, and they both have methods/functions such as foreach, forall, map, etc. But none of these methods seem to be inherited from some common class (trait). From a Java perspective, if Array and List provide some contract, that contract has to be declared in an interface and partially implemented in abstract classes. Why does each type (Array and List) in Scala declare its own set of methods? Why don't they have some common type?
But any of these methods aren't inherited from some special class(trait)
That is simply not true.
If you open the scaladoc and look up, say, the .map method of Array and of List, and then click on it, you'll see where each one is defined.
See also the info about Traversable and Iterable, both of which define most of the contracts in Scala collections (though some collections may re-implement methods defined in Traversable/Iterable, e.g. for efficiency).
You may also want to look at the relations between collections in general (scroll to the two diagrams).
I'll extend om-nom-nom's answer here.
Scala doesn't have an Array -- that's a Java array, and Java arrays don't implement any interface. In fact, an array isn't even a proper class, if I'm not mistaken, and it is certainly implemented through special mechanisms at the bytecode level.
In Scala, however, everything is a class -- an Int (Java's int) is a class, and so is Array. But in these cases, where the actual class comes from Java, Scala is limited by the type hierarchy provided by Java.
Now, going back to foreach, map, etc.: they are not methods present in Java. However, Scala allows one to add implicit conversions from one class to another and, through that mechanism, add methods. When you call arr.foreach(println), what is really done is Predef.refArrayOps(arr).foreach(println), which means foreach belongs to the ArrayOps class -- as you can see in the scaladoc documentation.
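Here is a minimal sketch of that enrichment mechanism with made-up names (the real definitions live in Predef and ArrayOps):

import scala.language.implicitConversions

class RichIntArray(underlying: Array[Int]) {
  // A method the raw array type does not have.
  def describe: String = s"array of ${underlying.length} ints"
}

object Enrichment {
  // When the compiler sees a missing method on Array[Int], it searches for an
  // implicit conversion in scope and rewrites the call site to use it.
  implicit def enrich(arr: Array[Int]): RichIntArray = new RichIntArray(arr)

  def main(args: Array[String]): Unit = {
    val arr = Array(1, 2, 3)
    println(arr.describe) // compiles as: enrich(arr).describe
  }
}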

Why use Collection.empty[T] instead of new Collection[T]()

I was wondering: is there a good reason to use Collection.empty[T] instead of new Collection[T]() (or the inverse), or is it just a personal preference?
Thanks.
Calling new Collection[T]() will create a new instance every time. On the other hand, Collection.empty[T] will most likely always return the same singleton object, usually defined somewhere as
object Empty extends Collection[Nothing] ...
which will be much faster. Edit: this is only possible for immutable collections; mutable collections have to return a new instance every time empty is called.
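Here is a sketch of the pattern with a made-up MyList (not the real library code); covariance is what lets one Empty instance serve every element type:

sealed trait MyList[+A]
case class Cons[+A](head: A, tail: MyList[A]) extends MyList[A]
case object Empty extends MyList[Nothing] // the one shared instance

object MyList {
  // No allocation per call: MyList[Nothing] <: MyList[A] by covariance.
  def empty[A]: MyList[A] = Empty
}

// MyList.empty[String] and MyList.empty[Int] return the very same object:
// val e1: MyList[String] = MyList.empty
// val e2: MyList[Int]    = MyList.empty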
You should always prefer Collection.empty[Type].
In addition to Collection.empty[T] being clearer about the intent, you should favour it for the same reason that you should favour factory methods in general when instantiating a collection: because those factories abstract away some implementation details that you might not (or should not) care about.
For example, when you do Seq.empty[String] you actually get an instance of List[String]. You could directly instantiate a List[String], but if all you care about is having some Seq you would introduce a needless dependency on List (well, OK, you actually cannot as it stands, because List is abstract, but let's pretend we can for the sake of the argument).
The whole point of factories is precisely to have some amount of separation of concern and not bother with unnecessary instantiation details.
As another more elaborate example, let's talk about collection.immutable.HashMap. This one is very much a concrete class so you might think there is no need for a factory here. Except that for optimization purpose the factory in the companion object collection.immutable.HashMap will actually create different sub-classes depending on the number of elements that you initialize the map with (see this question: Scala: how to make a Hash(Trie)Map from a Map (via Anorm in Play)). Obviously, if you directly instantiate collection.immutable.HashMap you will lose this optimization.
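You can see the same idea at work with the standard Map factory (the exact class names vary across Scala versions, so treat this as illustrative):

val m1 = Map("a" -> 1)
val m5 = Map("a" -> 1, "b" -> 2, "c" -> 3, "d" -> 4, "e" -> 5)
println(m1.getClass.getSimpleName) // a small specialized class, e.g. Map1
println(m5.getClass.getSimpleName) // a hash-based class, e.g. HashTrieMap or HashMap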
Another common optimization for empty is to always return (when it is an immutable collection) the same instance, yet another useful optimization that you would lose by directly instantiating the collection.
So as a rule of thumb, as far as you can you should use the factories that are provided by the various collection companion objects, so as to shield yourself from unneeded dependencies while at the same time benefiting from potential optimizations provided by the collection framework.
empty is just a special case of factory, and so the same logic applies.

How do I deal with Scala collections generically?

I have realized that my typical way of passing Scala collections around could use some improvement.
def doSomethingCool(theFoos: List[Foo]) = { /* insert cool stuff here */ }
// if I happen to have a List
doSomethingCool(theFoos)
// but elsewhere I may have a Vector, Set, Option, ...
doSomethingCool(theFoos.toList)
I tend to write my library functions to take a List as the parameter type, but I'm certain that there's something more general I can put there to avoid all the occasional .toList calls I have in the application code. This is especially annoying since my doSomethingCool function typically only needs to call map, flatMap and filter, which are defined on all the collection types.
What are my options for that 'something more general'?
Here are more general traits, each of which extends the previous one:
GenTraversableOnce
GenTraversable
GenIterable
GenSeq
The traits above do not specify whether the collection is sequential or parallel. If your code requires that things be executed sequentially (typically, if your code has side effects of any kind), they are too general for it.
The following traits mandate sequential execution:
TraversableOnce
Traversable
Iterable
Seq
LinearSeq
The first one, TraversableOnce, only allows you to call one method on the collection; after that, the collection has been "used". In exchange, it is general enough to accept iterators as well as collections.
Traversable is a pretty general collection that has most methods. There are some things it cannot do, however, in which case you need to go to Iterable.
All Iterables implement the iterator method, which allows you to get an Iterator for the collection. This gives it the capability for a few methods not present in Traversable.
A Seq[A] implements the function Int => A, which means you can access any element by its index. This is not guaranteed to be efficient, but it is a guarantee that each element has an index, and that you can make assertions about what that index is going to be. Contrast this with Map and Set, where you cannot tell what the index of an element is.
A LinearSeq is a Seq that provides fast head, tail, isEmpty and prepend. This is as close as you can get to a List without actually using a List explicitly.
Alternatively, you could have an IndexedSeq, which has fast indexed access (something List does not provide).
See also this question and this FAQ based on it.
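Putting it together, here is a sketch under the pre-2.13 collections library (where these traits live), reusing the question's doSomethingCool. Since the body only needs map, flatMap and filter, Traversable is general enough:

case class Foo(n: Int)

// Accept the most general trait that supports what the body actually uses.
def doSomethingCool(theFoos: Traversable[Foo]): Traversable[Int] =
  theFoos.filter(_.n > 0).map(_.n * 2)

// Now any collection works, with no .toList at the call sites:
doSomethingCool(List(Foo(1), Foo(-1)))
doSomethingCool(Vector(Foo(2)))
doSomethingCool(Set(Foo(3)))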
The most obvious one is to use Traversable as the most general trait which will have the goodies you want. However, I think you are generally better sticking to:
Seq
IndexedSeq
Set
Map
A Seq will cover List, Vector, etc., and IndexedSeq will cover Vector, etc. I found myself not using Iterable because I often need (or want) to know the size of the thing I have, and back in pre-2.8 Scala, Iterable did not provide access to this, so I kept having to turn things into sequences anyway!
Looks like Traversable and Iterable now have size methods so maybe I should go back to using them! Of course you could start "going mad" with GenTraversableOnce but that is not likely to aid in readability.

How to workaround the XmlSerialization issue with type name aliases (in .NET)?

I have a collection of objects, and for each object I have its type FullName and either its value (as a string) or a subtree of inner objects' details (again type FullName and value or a subtree).
For each root object I need to create a piece of XML that will be xml de-serializable.
The problem with XmlSerializer is that, e.g., the following object
int age = 33;
will be serialized to
<int>33</int>
At first this seems perfectly OK. However, when working with reflection you will be using System.Int32 as the type name (int is just an alias), and this
<System.Int32>33</System.Int32>
will not deserialize.
Now additional complexity comes from the fact that I need to handle any possible data type.
So solutions that utilize System.Activator.CreateInstance(..) and casting won't work, unless I go down the path of code generation & compilation as a way of achieving this (which I would rather avoid).
Notes:
Some quick research with .NET Reflector revealed that XmlSerializer uses the internal class TypeScope; look at its static constructor to see that it initializes an inner Hashtable with the mappings.
So the question is what is the best (and I mean elegant and solid) way to workaround this sad fact?
I don't exactly see where your problem originally stems from. XmlSerializer will use the same syntax/mapping for serializing as for deserializing, so there is no conflict when using it for both.
The type tags used are probably some XML-standard thing, but I'm not sure about that.
I guess the problem is more your usage of reflection. Do you instantiate your imported/deserialized objects by calling Activator.CreateInstance?
I would recommend the following instead, if you have some type Foo to be created from the XML in xmlReader:
Foo deserializedObject = (Foo)new XmlSerializer(typeof(Foo)).Deserialize(xmlReader);
Alternatively, if you don't want to switch to XmlSerializer completely, you could do some preprocessing of your input. The standard way would be to create some XSLT in which you transform all those type elements to their aliases, or vice versa. Then, before processing the XML, you apply your transformation using System.Xml.Xsl.XslCompiledTransform, using your (or the reflector's) mappings for each type.
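A minimal sketch of that preprocessing step (the file names are hypothetical, and the alias-mapping stylesheet itself is up to you):

using System.Xml;
using System.Xml.Xsl;

class Preprocess
{
    static void Main()
    {
        var transform = new XslCompiledTransform();
        // A stylesheet that rewrites e.g. <System.Int32> elements to <int>.
        transform.Load("TypeAliases.xslt");
        using (var writer = XmlWriter.Create("clean.xml"))
        {
            // XmlSerializer can then deserialize clean.xml as usual.
            transform.Transform("input.xml", writer);
        }
    }
}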
Why don't you serialize each field's type as an attribute?
Instead of
<age>
    <int>33</int>
</age>
you could do
<age type="System.Int32">33</age>
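A sketch of the reading side under that scheme (illustrative only; real code would need error handling and support for non-primitive types):

using System;
using System.Xml.Linq;

class TypedXml
{
    static object ReadTyped(XElement element)
    {
        // Type.GetType resolves full names like "System.Int32" (aliases like "int" would fail).
        var type = Type.GetType((string)element.Attribute("type"), throwOnError: true);
        return Convert.ChangeType(element.Value.Trim(), type);
    }

    static void Main()
    {
        var e = XElement.Parse("<age type=\"System.Int32\">33</age>");
        Console.WriteLine(ReadTyped(e)); // 33
    }
}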