MongoDB and Large Datasets when using a Repository pattern - mongodb

Okay, so at work we are developing a system using C# MVC and MongoDB. When first developing it, we decided it would probably be a good idea to follow the repository pattern (what a pain in the ass!). Here is the code to give an idea of what is currently implemented.
The MongoRepository class:
public class MongoRepository { }

public class MongoRepository<T> : MongoRepository, IRepository<T>
    where T : IEntity
{
    private MongoClient _client;
    private IMongoDatabase _database;
    private IMongoCollection<T> _collection;

    public string StoreName
    {
        get { return typeof(T).Name; }
    }

    public MongoRepository()
    {
        _client = new MongoClient(ConfigurationManager.AppSettings["MongoDatabaseURL"]);
        _database = _client.GetDatabase(ConfigurationManager.AppSettings["MongoDatabaseName"]);
        /* misc code here */
        Init();
    }

    public void Init()
    {
        _collection = _database.GetCollection<T>(StoreName);
    }

    public IQueryable<T> SearchFor()
    {
        return _collection.AsQueryable<T>();
    }
}
The IRepository interface:
public interface IRepository { }

public interface IRepository<T> : IRepository
    where T : IEntity
{
    string StoreNamePrepend { get; set; }
    string StoreNameAppend { get; set; }
    IQueryable<T> SearchFor();
    /* misc code */
}
The repository is then instantiated using Ninject, but without that it would look something like this (simplified to keep the example short):
MongoRepository<Client> clientCol = new MongoRepository<Client>();
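For completeness, the Ninject binding behind that instantiation would look roughly like this (a sketch; the kernel setup and the InSingletonScope lifetime are assumptions, not taken from our actual configuration):

var kernel = new StandardKernel();
// Bind the open generic so that any IRepository<T> resolves to MongoRepository<T>.
kernel.Bind(typeof(IRepository<>)).To(typeof(MongoRepository<>)).InSingletonScope();
// Resolve a repository for the Client entity.
var clientCol = kernel.Get<IRepository<Client>>();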
Here is the code used for the search pages; it feeds a controller action that outputs JSON for a table rendered with DataTables. Please note that the following uses Dynamic LINQ so that the query can be built from string input:
tmpFinalList = clientCol
    .SearchFor()
    .OrderBy(tmpOrder)  // tmpOrder = "ClientDescription DESC"
    .Skip(Start)        // Start = 99900
    .Take(PageLength)   // PageLength = 10
    .ToList();
Now the problem: the collection has a lot of records (99,905 to be exact), and everything works fine as long as the data in a field isn't very large. For example, our Key field is a 5-character fixed-length string, and I can Skip and Take fine using this query. However, with a field that can be much longer, such as ClientDescription, I can sort fine and take fine from the front of the query (i.e. page 1), but when I page to the end with Skip = 99900 and Take = 10 it gives the following memory error:
An exception of type 'MongoDB.Driver.MongoCommandException' occurred
in MongoDB.Driver.dll but was not handled in user code
Additional information: Command aggregate failed: exception: Sort
exceeded memory limit of 104857600 bytes, but did not opt in to
external sorting. Aborting operation. Pass allowDiskUse:true to opt
in..
Okay, so that is easy enough to understand, I guess. I have had a look online and almost everything suggests using aggregation with "allowDiskUse: true". However, since I use IQueryable in IRepository, I cannot start using IAggregateFluent<>, because that would mean exposing MongoDB-related classes through IRepository, which goes against IoC principles.
Is there any way to force IQueryable to use this, or does anyone know of a way for me to access IAggregateFluent without going against IoC principles?
One thing of interest to me is why the sort works for page 1 (Start = 0, Take = 10) but then fails when I page to the end. Surely everything must be sorted for me to get the items in order for page 1, so shouldn't (Start = 99900, Take = 10) need the same amount of sorting, with MongoDB just sending me the last page of records? Why does the error happen for one of these sorts but not the other?
ANSWER
Okay, so with the help of @craig-wilson: upgrading to the newest version of the MongoDB C# driver and changing the following in MongoRepository fixed the problem:
public IQueryable<T> SearchFor()
{
    return _collection.AsQueryable<T>(new AggregateOptions { AllowDiskUse = true });
}
I was getting a System.MissingMethodException, but that was caused by other copies of the MongoDB drivers needing to be updated as well.

When creating the IQueryable from an IMongoCollection, you can pass in the AggregateOptions which allow you to set AllowDiskUse.
https://github.com/mongodb/mongo-csharp-driver/blob/master/src/MongoDB.Driver/IMongoCollectionExtensions.cs#L53
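As for the follow-up question about why page 1 sorts fine while the last page blows up: the likely explanation is MongoDB's top-k sort optimization. When a sort is followed by a limit, the server only keeps Skip + Take documents in memory while scanning, so Start = 0, Take = 10 needs room for roughly 10 documents, whereas Start = 99900, Take = 10 needs room for roughly 99,910, and with large ClientDescription values that is what pushes the sort over the 100 MB in-memory limit. For reference, here is a sketch of the same page expressed directly against the driver's aggregate fluent API inside MongoRepository<T>; the method name and parameters are purely illustrative:

public List<T> GetPageSortedDescending(string fieldName, int start, int pageLength)
{
    // Roughly the pipeline the Dynamic LINQ query above translates to;
    // AllowDiskUse lets the server spill the sort stage to disk once the
    // 100 MB in-memory limit is exceeded.
    return _collection
        .Aggregate(new AggregateOptions { AllowDiskUse = true })
        .Sort(Builders<T>.Sort.Descending(fieldName)) // e.g. "ClientDescription"
        .Skip(start)                                  // e.g. 99900
        .Limit(pageLength)                            // e.g. 10
        .ToList();
}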

Related

Advice needed for 'Cannot use multiple context instances within a single query execution. Ensure the query uses a single context instance.'

I've been having an issue with one of my .NET Core services when using a join statement.
This is the problem code:
public List<Category> GetCategoryList()
{
    var catlist = (from m in _mappingProductCategoryRepository.Table
                   join c in _categoryRepository.Table on m.CategoryId equals c.Id
                   select c).ToList();
    return catlist;
}
and the error it throws is this:
InvalidOperationException: Cannot use multiple context instances within a single query execution. Ensure the query uses a single context instance.
Here is the relevant excerpt from my repository:
public partial class PSRepository<TEntity> : IRepository<TEntity> where TEntity : class
{
    private readonly PSContext _db;

    public PSRepository(PSContext PSContext)
    {
        _db = PSContext;
    }

    public virtual IQueryable<TEntity> Table => _db.Set<TEntity>();
}
And here is my context being registered in Startup.cs:
public void ConfigureContainer(ContainerBuilder builder)
{
    builder.RegisterType<PSContext>().As<PSContext>().InstancePerDependency();
    builder.RegisterGeneric(typeof(PSRepository<>)).As(typeof(IRepository<>)).InstancePerLifetimeScope();
    builder.RegisterType<CategoryService>().As<ICategoryService>().InstancePerLifetimeScope();
}
I've changed the context registration from 'InstancePerDependency' to 'InstancePerLifetimeScope' so that it now looks like this:
builder.RegisterType<PSContext>().As<PSContext>().InstancePerLifetimeScope();
And it seems to have fixed the issue. So, being somewhat new to .NET Core, my question is whether this was the correct fix, or whether I've created further issues for myself down the line.
I realise I could get rid of the repository and use the context directly but I don't really want to do this.
Any help gratefully received
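For what it's worth, the usual explanation of the error: with InstancePerDependency, each injected PSRepository<> received its own PSContext instance, so _mappingProductCategoryRepository.Table and _categoryRepository.Table belonged to two different contexts, and EF Core refuses to run a join across them. Scoping the context per lifetime scope (i.e. per request) is the standard pattern; it is also what the stock Microsoft container does. A minimal sketch of the equivalent registration with IServiceCollection, assuming SQL Server, the usual Startup.Configuration property, and a connection string named "Default":

public void ConfigureServices(IServiceCollection services)
{
    // AddDbContext registers PSContext with a scoped lifetime (one instance
    // per request), matching Autofac's InstancePerLifetimeScope, so every
    // repository resolved within a request shares the same context.
    services.AddDbContext<PSContext>(options =>
        options.UseSqlServer(Configuration.GetConnectionString("Default")));
    services.AddScoped(typeof(IRepository<>), typeof(PSRepository<>));
    services.AddScoped<ICategoryService, CategoryService>();
}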

Getting JsonSerializationException

I'm having an issue trying to convert an object to json. The error is a Newtonsoft.Json.JsonSerializationException:
Self referencing loop detected for property 'Project' with type 'System.Data.Entity.DynamicProxies.Project_F29F70EF89942F6344C5B0A3A7910EF55268857CD0ECC4A484776B2F4394EF79'. Path '[0].Categories[0]'.
The problem is that the object (it's actually a list of objects) has a property which is another object that refers back to the first object:
public partial class Project
{
    ...
    public virtual ICollection<Category> Categories { get; set; }
    ...
}

public partial class Category
{
    ...
    public virtual Project Project { get; set; }
    ...
}
This is all fine and dandy as far as Entity Framework is concerned, but to convert this to json would result in an infinite regress, hence the exception.
Here is my code:
public async Task<HttpResponseMessage> GetProjects()
{
    var projects = _projectService.GetProjects().ToList();
    string jsonString = JsonConvert.SerializeObject(projects); // <-- Offending line
    return Request.CreateResponse(HttpStatusCode.OK, jsonString);
}
I've looked online for solutions to this and I found this stackoverflow post:
JSON.NET Error Self referencing loop detected for type
They suggest three solutions, none of which work:
1) Ignore the circular reference:
public async Task<HttpResponseMessage> GetProjects()
{
    var projects = _projectService.GetProjects().ToList();
    JsonSerializerSettings settings = new JsonSerializerSettings()
    {
        ReferenceLoopHandling = ReferenceLoopHandling.Ignore
    };
    string jsonString = JsonConvert.SerializeObject(projects, settings);
    return Request.CreateResponse(HttpStatusCode.OK, jsonString);
}
This resulted in the call to SerializeObject(...) hanging for a bit then throwing a System.OutOfMemoryException (which tells me the circular references were NOT being ignored).
Mind you, the author of this proposed solution at stackoverflow says to set the ignore setting in WebApiConfig.cs but I tried that and it has no effect.
He also says:
"If you want to use this fix in a non-api ASP.NET project, you can add the above line to Global.asax.cs, but first add: var config = GlobalConfiguration.Configuration;"
Mine's a web API with no global file so I shouldn't have to do this.
I also don't want to ignore circular references because I don't want to lose data.
2) Preserve the circular reference:
public async Task<HttpResponseMessage> GetProjects()
{
    var projects = _projectService.GetProjects().ToList();
    JsonSerializerSettings settings = new JsonSerializerSettings()
    {
        ReferenceLoopHandling = ReferenceLoopHandling.Serialize,
        PreserveReferencesHandling = PreserveReferencesHandling.Objects
    };
    string jsonString = JsonConvert.SerializeObject(projects, settings);
    return Request.CreateResponse(HttpStatusCode.OK, jsonString);
}
This just resulted in the request timing out because it would just hang.
Again, the author says to put this in WebApiConfig.cs, but again this had no effect.
3) Add ignore/preserve reference attributes to the objects and properties:
Ignoring Categories:
public partial class Project
{
    ...
    [JsonIgnore]
    public virtual ICollection<Category> Categories { get; set; }
    ...
}
This has no effect. I hover over the project list and see that it still has categories, and each category still has an instance of the project. I still get the same exception.
Again, even if this worked, I don't want to ignore the categories.
Preserve Categories:
[JsonObject(IsReference = true)]
public partial class Project
{
    ...
    public virtual ICollection<Category> Categories { get; set; }
    ...
}
Again, same results.
Even if this method worked, the attributes wouldn't be preserved. I'd be doing it on Entity Framework classes which are re-generated automatically every time I recompile. (Is there a way to tell it to set these attributes in the model? Can I set them on the other half of the partial class?)
Alternatively, I'm open to suggestions other than converting to json and sending back in the response. Is there another way to get the data back to the client?
What would be the fix to this problem? Thanks.
Briefly
The best way to fix this problem is to create completely brand-new models (xxxModel, xxxViewModel, xxxResponse, etc.) in the presentation layer and return those to end users. Then just map one object to the other using AutoMapper or your own custom methods.
Keep your database entities separate from the real world!
In detail
There are so many problems you will encounter otherwise:
Disclosure of sensitive data. Your database entities could (and often will) contain sensitive data that end users shouldn't receive;
Performance issues and wasted RAM and CPU. It is better to load only the properties that end users actually require, rather than all of them;
Serialization problems. EF entities almost always contain navigation properties, which get serialized along with the root object when lazy loading is enabled. Imagine dozens of related entities being lazy-loaded while your aggregate root is serialized; it will cause dozens of unexpected database queries;
Fragility. Any change to your EF entities will ripple through the presentation layer to end users. For instance, with an API, a newly added property merely extends the response, but a deleted or renamed one will break logic in your customers' applications.
There are plenty of other problems; just be careful.
I would recommend not serializing Entity Framework classes, and instead creating a specific class that inherits only from Object and carries only the data you need.
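To make that concrete, here is a minimal sketch of the view-model approach; ProjectModel, CategoryModel and their Id/Name properties are hypothetical stand-ins for whatever data your client actually needs:

public class CategoryModel
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class ProjectModel
{
    public int Id { get; set; }
    public string Name { get; set; }
    public List<CategoryModel> Categories { get; set; }
}

public async Task<HttpResponseMessage> GetProjects()
{
    // Project the EF entities into flat models; the Category -> Project
    // back-reference is simply never copied, so no cycle can exist.
    var models = _projectService.GetProjects()
        .Select(p => new ProjectModel
        {
            Id = p.Id,
            Name = p.Name,
            Categories = p.Categories
                .Select(c => new CategoryModel { Id = c.Id, Name = c.Name })
                .ToList()
        })
        .ToList();
    // Hand Web API the models and let it serialize them; no special settings needed.
    return Request.CreateResponse(HttpStatusCode.OK, models);
}

This also sidesteps the lazy-loading and over-fetching issues listed above, since only the projected properties are touched.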

Specify connection string for a query with DbContextScope project

I am currently using Mehdi El Gueddari's DbContextScope project, I think by the book, and it's awesome. But I came across a problem today that I'm unsure how to solve. I have a query that needs to execute under a different database login/user because it requires additional permissions. I can add another connection string to my web.config, but I'm not sure how to tell this one query to use it. Here is my usage:
In my logic layer:
private static IDbContextScopeFactory _dbContextFactory = new DbContextScopeFactory();

public static Guid GetFacilityID(string altID)
{
    ...
    using (_dbContextFactory.CreateReadOnly())
    {
        entity = entities.GetFacilityID(altID);
    }
}
That calls into my data layer which would look something like this:
private AmbientDbContextLocator _dbcLocator = new AmbientDbContextLocator();

protected CRMEntities DBContext
{
    get
    {
        var dbContext = _dbcLocator.Get<CRMEntities>();
        if (dbContext == null)
            throw new InvalidOperationException("No ambient DbContext....");
        return dbContext;
    }
}

public virtual Guid GetFacilityID(string altID)
{
    return DBContext.Set<Facility>().Where(f => f.altID == altID).Select(f => f.ID).FirstOrDefault();
}
Currently my connection string is set in the default way:
public partial class CRMEntities : DbContext
{
    public CRMEntities()
        : base("name=CRMEntities")
    {
    }
}
Is it possible for this specific query to use a different connection string and how?
I ended up modifying the source code in a way that feels slightly hacky, but is getting the job done for now. I created a new IAmbientDbContextLocator with a Get<TDbContext> method override that accepts a connection string:
public TDbContext Get<TDbContext>(string nameOrConnectionString) where TDbContext : DbContext
{
    var ambientDbContextScope = DbContextScope.GetAmbientScope();
    return ambientDbContextScope == null
        ? null
        : ambientDbContextScope.DbContexts.Get<TDbContext>(nameOrConnectionString);
}
Then I updated the DbContextCollection to pass this parameter through to the DbContext's existing constructor overload. Lastly, I updated the DbContextCollection to maintain a Dictionary<KeyValuePair<Type, string>, DbContext> instead of a Dictionary<Type, DbContext> as its cached _initializedDbContexts, where the added string is the nameOrConnectionString parameter. In other words, it now caches unique DbContext type/connection-string pairs.
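A rough sketch of what that change to DbContextCollection looks like (paraphrased, not the actual DbContextScope source; it assumes your contexts expose a constructor that takes nameOrConnectionString, as CRMEntities' DbContext base does):

private readonly Dictionary<KeyValuePair<Type, string>, DbContext> _initializedDbContexts
    = new Dictionary<KeyValuePair<Type, string>, DbContext>();

public TDbContext Get<TDbContext>(string nameOrConnectionString) where TDbContext : DbContext
{
    // Key on the (context type, connection string) pair so the same scope
    // can hold e.g. CRMEntities twice under two different logins.
    var key = new KeyValuePair<Type, string>(typeof(TDbContext), nameOrConnectionString);
    DbContext dbContext;
    if (!_initializedDbContexts.TryGetValue(key, out dbContext))
    {
        // DbContext exposes a (string nameOrConnectionString) constructor overload.
        dbContext = (TDbContext)Activator.CreateInstance(typeof(TDbContext), nameOrConnectionString);
        _initializedDbContexts.Add(key, dbContext);
    }
    return (TDbContext)dbContext;
}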
Then I can get at the DbContext with the connection I need like this:
var dbContext = new CustomAmbientDbContextLocator().Get<CRMEntities>("name=CRMEntitiesAdmin");
Of course, you'd have to be careful that your code doesn't end up going through two different contexts/connection strings when it should be using the same one. In my case I have them separated into two different data-access class implementations.

How to intercept a SELECT query

I am exploring Entity Framework 7 and I would like to know if there is a way to intercept a "SELECT" query. Every time an entity is created, updated or deleted I stamp the entity with the current date and time.
SELECT *
FROM MyTable
WHERE DeletedOn IS NULL
I would like all my SELECT queries to exclude deleted data (see WHERE clause above). Is there a way to do that using Entity Framework 7?
I am not sure what your underlying infrastructure looks like, or whether you have any abstraction between your application and Entity Framework, but assuming you are working with DbSet<T>, you could write an extension method to exclude data that has been deleted:
public class BaseEntity
{
    public DateTime? DeletedOn { get; set; }
}

public static class EfExtensions
{
    public static IQueryable<T> ExcludeDeleted<T>(this IDbSet<T> dbSet)
        where T : BaseEntity
    {
        return dbSet.Where(e => e.DeletedOn == null);
    }
}

// Usage
context.Set<BaseEntity>().ExcludeDeleted().Where(...additional where clause)
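As a side note: in EF Core 2.0 and later (well after the "EF7" prereleases this question targets), this pattern is built in as global query filters. The filter is declared once in OnModelCreating and applied to every query against the entity automatically; the context and entity names below are hypothetical:

public class MyDbContext : DbContext
{
    public DbSet<MyEntity> MyEntities { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Every SELECT against MyEntity automatically gets WHERE DeletedOn IS NULL;
        // individual queries can opt out with IgnoreQueryFilters().
        modelBuilder.Entity<MyEntity>().HasQueryFilter(e => e.DeletedOn == null);
    }
}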
I have somewhat the same issue: I'm trying to intercept read queries (Select, Where, etc.) in order to inspect the returned result set. Unfortunately, EF Core has no read-query equivalent of overriding SaveChanges.
You can, however, still hook into CommandExecuting and CommandExecuted in Entity Framework Core by using
var listener = _context.GetService<DiagnosticSource>();
(listener as DiagnosticListener).SubscribeWithAdapter(new CommandListener());
and creating a class with the following two methods:
public class CommandListener
{
    [DiagnosticName("Microsoft.EntityFrameworkCore.Database.Command.CommandExecuting")]
    public void OnCommandExecuting(DbCommand command, DbCommandMethod executeMethod, Guid commandId, Guid connectionId, bool async, DateTimeOffset startTime)
    {
        // do stuff
    }

    [DiagnosticName("Microsoft.EntityFrameworkCore.Database.Command.CommandExecuted")]
    public void OnCommandExecuted(object result, bool async)
    {
        // do stuff
    }
}
However, these are high-level interceptors, so you won't be able to view the returned result set (which makes them useless in your case).
I recommend two things. First, cast a vote for the implementation of "Hooks to intercept and modify queries on the fly at high and low level" at: https://data.uservoice.com/forums/72025-entity-framework-core-feature-suggestions/suggestions/1051569-hooks-to-intercept-and-modify-queries-on-the-fly-a
Second, you can use PostSharp (a commercial product) with interceptors such as LocationInterceptionAspect on properties or OnMethodBoundaryAspect on methods.

Testing EF ConcurrencyCheck

I have a base object that contains a Version property marked as ConcurrencyCheck:
public class EntityBase : IEntity, IConcurrencyEnabled
{
    public int Id { get; set; }

    [ConcurrencyCheck]
    [Timestamp]
    public byte[] Version { get; set; }
}
This works; however, I want to write a test to ensure it doesn't get broken. Unfortunately, I can't seem to figure out how to write a test that doesn't rely on the physical database!
And the relevant test code that works, but uses the database...
protected override void Arrange()
{
    const string asUser = "ConcurrencyTest1"; // used to anchor and look up this test record in the db

    Context1 = new MyDbContext();
    Context2 = new MyDbContext();
    Repository1 = new Repository<FooBar>(Context1);
    Repository2 = new Repository<FooBar>(Context2);
    UnitOfWork1 = new UnitOfWork(Context1);
    UnitOfWork2 = new UnitOfWork(Context2);

    Sut = Repository1.Find(x => x.CreatedBy.Equals(asUser)).FirstOrDefault();
    if (Sut == null)
    {
        Sut = new FooBar
        {
            Name = "Concurrency Test"
        };
        Repository1.Insert(Sut);
        UnitOfWork1.SaveChanges(asUser);
    }
    ItemId = Sut.Id;
}
protected override void Act()
{
    _action = () =>
    {
        var item1 = Repository1.FindById(ItemId);
        var item2 = Repository2.FindById(ItemId);

        item1.Name = string.Format("Changed # {0}", DateTime.Now);
        UnitOfWork1.SaveChanges("test1");

        item2.Name = string.Format("Conflicting Change # {0}", DateTime.Now);
        UnitOfWork2.SaveChanges("test2"); // Should throw DbUpdateConcurrencyException
    };
}
[TestMethod]
[ExpectedException(typeof(DbUpdateConcurrencyException))]
public void Assert()
{
    _action();
}
How can I remove the DB requirement???
I would recommend extracting your MyDbContext into an IMyDbContext interface, and then creating a TestDbContext class that implements SaveChanges the way you have it up there, except returning a fixed value (like 1) instead of actually saving to the database.
At that point, all you'd need to do is test that all of the entities did, in fact, get their version number bumped.
Or you could also do the examples found here or here, as well.
EDIT: I actually just found a direct example with using TimeStamp for concurrency checks on this blog post.
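A minimal sketch of what such a test double might look like; IMyDbContext, FakeDbContext and the Attach helper are hypothetical names, and the fake imitates only the one behaviour under test (the database assigning a new rowversion on save):

public interface IMyDbContext
{
    int SaveChanges();
}

public class FakeDbContext : IMyDbContext
{
    private readonly List<EntityBase> _tracked = new List<EntityBase>();

    public void Attach(EntityBase entity)
    {
        _tracked.Add(entity);
    }

    public int SaveChanges()
    {
        // Imitate SQL Server stamping a new rowversion on each saved row.
        foreach (var entity in _tracked)
        {
            entity.Version = Guid.NewGuid().ToByteArray();
        }
        return _tracked.Count; // number of rows "affected"
    }
}

A test can then attach an EntityBase, call SaveChanges, and assert that Version changed, without ever touching a database.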
It's my opinion that you should not try to mock this behaviour to enable "pure" unit testing. For two reasons:
it requires quite a lot of code that mocks database behaviour: materializing objects so that they have a version value, caching the original objects (to mock a store), modifying the version value on update, comparing the version values with the original ones, throwing an exception when a version differs, and maybe more. All this code is potentially subject to bugs and, worse, may differ subtly from what happens in reality.
you'll get trapped in circular reasoning: you write code specifically for unit tests, and then... you write unit tests to test that code. Green tests say everything is OK, but essential parts of your application code are not covered.
This is only one of the many aspects of LINQ to Entities that are hard (or impossible) to mock. I am compiling a list of these differences here.