Having conditional multiple filters in Morphia query for Mongo database - mongodb

Environment : MongoDb 3.2, Morphia 1.1.0
So lets say i am having a collection of Employees and Employee entity has several fields. I need to do something like apply multiple filters (conditional) and return a batch of 10 records per request.
pesudocode as below.
#Entity("Employee")
Employee{
String firstname,
String lastName,
int salary,
int deptCode,
String nationality
}
and in my EmployeeFilterRequesti carry the request parameter to the dao
EmployeeFilterRequest{
int salaryLessThen
int deptCode,
String nationality..
}
Pseudoclass
class EmployeeDao{
public List<Employee> returnList;
public getFilteredResponse(EmployeeFilterRequest request){
DataStore ds = getTheDatastore();
Query<Employee> query = ds.createQuery(Emploee.class).disableValidation();
//conditional request #1
if(request.filterBySalary){
query.filter("salary >", request.salary);
}
//conditional request #2
if(request.filterBydeptCode){
query.filter("deptCode ==", request.deptCode);
}
//conditional request #3
if(request.filterByNationality){
query.filter("nationality ==", request.nationality);
}
returnList = query.batchSize(10).asList();
/******* **THIS IS RETURNING ME ALL THE RECORDS IN THE COLLECTION, EXPECTED ONLY 10** *****/
}
}
SO as explained above in the code.. i want to perform conditional filtering on multiple fields. and even if batchSize is present as 10, i am getting complete records in the collection.
how to resolve this ???
Regards
Punith

Blakes is right. You want to use limit() rather than batchSize(). The batch size only affects how many documents each trip to the server comes back with. This can be useful when pulling over a lot of really large documents but it doesn't affect the total number of documents fetched by the query.
As a side note, you should be careful using asList() as it will create objects out of every document returned by the query and could exhaust your VM's heap. Using fetch() will let you incrementally hydrate documents as you need each one. You might actually need them all as a List and with a size of 10 this is probably fine. It's just something to keep in mind as you work with other queries.

Related

Validate data exits using EF core best practices

List list = new List();
I have a list of Guid. What is the best to check all guid exits or not using ef core table?
I am currently using the below code but the performance is very bad. assume user table as 1 million records.
for Example
public async Task<bool> IsIdListValid(IEnumerable<int> idList)
{
var validIds = await _context.User.Select(x => x.Id).ToListAync();
return idList.All(x => validIds.Contains(x));
}
The performance is bad because you are reading each row of the table into memory, and then iterating through it (ToList materializes the query.) Try using the Any() method to take advantage of the strength of the database. Use something like the following: bool exists = _context.User.Any(u => idList.Contains(u));. This should translate to an SQL IN clause.
Provided you assert that the # of IDs being sent in is kept reasonable, you could do the following:
var idCount = _context.User.Where(x => idList.Contains(x.Id)).Count();
return idCount == idList.Count;
This assumes that you are comparing on a unique constraint like the PK. We get a count of how many rows have a matching ID from the list, then compare that to the count of IDs sent.
If you're passing a large # of IDs, you would need to break the list up into reasonable sets as there are limits to what you can do with an IN clause and potential performance costs as well.

JPQL In clause error - Statement too complex

Following is the code which is blowing up if the list which is being passed in to "IN" clause has several values. In my case the count is 1400 values. Also the customer table has several thousands (arround 100,000) of records in it. The query is executing against DERBY database.
public List<Customer> getCustomersNotIn(String custType, List<Int> customersIDs) {
TypedQuery<Customer> query = em.createQuery("from Customer where type=:custType and customerId not in (:customersIDs)", Customer.class);
query.setParameter("custType", custType);
query.setParameter("customersIDs", customersIDs);
List<Customer> customerList = query.getResultList();
return customerList;
}
The above mentioned method perfectly executes if the list has less values ( probably less than 1000 ), if the list customersIDs has more values since the in clause executes based on it, it throws an error saying "Statement too complex"
Since i am new to JPA can any one please tell me how to write the above mention function in the way described below.. * PLEASE READ COMMENTS IN CODE *
public List<Customer> getCustomersNotIn(String custType, List<Int> customersIDs) {
// CREATE A IN-MEMORY TEMP TABLE HERE...
// INSERT ALL VALUES FROM customerIDs collection into temp table
// Change following query to get all customers EXCEPT THOSE IN TEMP TABLE
TypedQuery<Customer> query = em.createQuery("from Customer where type=:custType and customerId not in (:customersIDs)", Customer.class);
query.setParameter("custType", custType);
query.setParameter("customersIDs", customersIDs);
List<Customer> customerList = query.getResultList();
// REMOVE THE TEMP TABLE FROM MEMORY
return customerList;
}
The Derby IN clause support does have a limit on the number of values that can be supplied in the IN clause.
The limit is related to an underlying limitation in the size of a single function in the Java bytecode format; Derby currently implements IN clause execution by generating Java bytecode to evaluate the IN clause, and if the generated bytecode would exceed the JVM's basic limitations, Derby throws the "statement too complex" error.
There have been discussions about ways to fix this, for example see:
DERBY-6784
DERBY-6301, or
DERBY-216
But for now, your best approach is probably to find a way to express your query without generating such a large and complex IN clause.
Ok here is my solution that worked for me. I could not change the part generating the customerList since it is not possible for me, so the solution has to be from within this method. Bryan your explination was the best one, i am still confuse how "in" clause worked perfectly with table. Please see below solution.
public List<Customer> getCustomersNotIn(String custType, List<Int> customersIDs) {
// INSERT customerIds INTO TEMP TABLE
storeCustomerIdsIntoTempTable(customersIDs)
// I AM NOT SURE HOW BUT, "not in" CLAUSE WORKED INCASE OF TABLE BUT DID'T WORK WHILE PASSING LIST VALUES.
TypedQuery<Customer> query = em.createQuery("select c from Customer c where c.customerType=:custType and c.customerId not in (select customerId from TempCustomer)");
query.setParameter("custType", custType);
List<Customer> customerList = query.getResultList();
// REMOVE THE DATA FROM TEMP TABLE
deleteCustomerIdsFromTempTable()
return customerList;
}
private void storeCustomerIdsIntoTempTable(List<Int> customersIDs){
// I ENDED UP CREATING TEMP PHYSICAL TABLE, INSTEAD OF JUST IN MEMORY TABLE
TempCustomer tempCustomer = null;
try{
tempCustomerDao.deleteAll();
for (Int customerId : customersIDs) {
tempCustomer = new TempCustomer();
tempCustomer.customerId=customerId;
tempCustomerDao.save(tempCustomer);
}
}catch(Exception e){
// Do logging here
}
}
private void deleteCustomerIdsFromTempTable(){
try{
// Delete all data from TempCustomer table to start fresh
int deletedCount= tempCustomerDao.deleteAll();
LOGGER.debug("{} customers deleted from temp table", deletedCount);
}catch(Exception e){
// Do logging here
}
}
JPA and the underlying Hibernate simply translate it into a normal JDBC-understood query. You wouldn't write a query with 1400 elements in the IN clause manually, would you? There is no magic. Whatever doesn't work in normal SQL, wouldn't in JPQL.
I am not sure how you get that list (most likely from another query) before you call that method. Your best option would be joining those tables on the criteria used to get those IDs. Generally you want to execute correlated filters like that in one query/transaction which means one method instead of passing long lists around.
I also noticed your customerId is double - a poor choice for a PK. Typically people use long (autoincremented/sequenced, etc.) And I don't get the "temp table" logic.

Count in jpa without getting result [duplicate]

I like the idea of Named Queries in JPA for static queries I'm going to do, but I often want to get the count result for the query as well as a result list from some subset of the query. I'd rather not write two nearly identical NamedQueries. Ideally, what I'd like to have is something like:
#NamedQuery(name = "getAccounts", query = "SELECT a FROM Account")
.
.
Query q = em.createNamedQuery("getAccounts");
List r = q.setFirstResult(s).setMaxResults(m).getResultList();
int count = q.getCount();
So let's say m is 10, s is 0 and there are 400 rows in Account. I would expect r to have a list of 10 items in it, but I'd want to know there are 400 rows total. I could write a second #NamedQuery:
#NamedQuery(name = "getAccountCount", query = "SELECT COUNT(a) FROM Account")
but it seems a DRY violation to do that if I'm always just going to want the count. In this simple case it is easy to keep the two in sync, but if the query changes, it seems less than ideal that I have to update both #NamedQueries to keep the values in line.
A common use case here would be fetching some subset of the items, but needing some way of indicating total count ("Displaying 1-10 of 400").
So the solution I ended up using was to create two #NamedQuerys, one for the result set and one for the count, but capturing the base query in a static string to maintain DRY and ensure that both queries remain consistent. So for the above, I'd have something like:
#NamedQuery(name = "getAccounts", query = "SELECT a" + accountQuery)
#NamedQuery(name = "getAccounts.count", query = "SELECT COUNT(a)" + accountQuery)
.
static final String accountQuery = " FROM Account";
.
Query q = em.createNamedQuery("getAccounts");
List r = q.setFirstResult(s).setMaxResults(m).getResultList();
int count = ((Long)em.createNamedQuery("getAccounts.count").getSingleResult()).intValue();
Obviously, with this example, the query body is trivial and this is overkill. But with much more complex queries, you end up with a single definition of the query body and can ensure you have the two queries in sync. You also get the advantage that the queries are precompiled and at least with Eclipselink, you get validation at startup time instead of when you call the query.
By doing consistent naming between the two queries, it is possible to wrap the body of the code to run both sets just by basing the base name of the query.
Using setFirstResult/setMaxResults do not return a subset of a result set, the query hasn't even been run when you call these methods, they affect the generated SELECT query that will be executed when calling getResultList. If you want to get the total records count, you'll have to SELECT COUNT your entities in a separate query (typically before to paginate).
For a complete example, check out Pagination of Data Sets in a Sample Application using JSF, Catalog Facade Stateless Session, and Java Persistence APIs.
oh well you can use introspection to get named queries annotations like:
String getNamedQueryCode(Class<? extends Object> clazz, String namedQueryKey) {
NamedQueries namedQueriesAnnotation = clazz.getAnnotation(NamedQueries.class);
NamedQuery[] namedQueryAnnotations = namedQueriesAnnotation.value();
String code = null;
for (NamedQuery namedQuery : namedQueryAnnotations) {
if (namedQuery.name().equals(namedQueryKey)) {
code = namedQuery.query();
break;
}
}
if (code == null) {
if (clazz.getSuperclass().getAnnotation(MappedSuperclass.class) != null) {
code = getNamedQueryCode(clazz.getSuperclass(), namedQueryKey);
}
}
//if not found
return code;
}

How do I get the date a MongoDB collection was created using MongoDB C# driver?

I need to iterate through all of the collections in my MongoDB database and get the time when each of the collections was created (I understand that I could get the timestamp of each object in the collection, but I would rather not go that route if a simpler/faster method exists).
This should give you an idea of what I'm trying to do:
MongoDatabase _database;
// code elided
var result = _database.GetAllCollectionNames().Select(collectionName =>
{
_database.GetCollection( collectionName ) //.{GetCreatedDate())
});
As far as I know, MongoDB doesn't keep track of collection creation dates. However, it's really easy to do this yourself. Add a simple method, something like this, and use it whenever you create a new collection:
public static void CreateCollectionWithMetadata(string collectionName)
{
var result = _db.CreateCollection(collectionName);
if (result.Ok)
{
var collectionMetadata = _db.GetCollection("collectionMetadata");
collectionMetadata.Insert(new { Id = collectionName, Created = DateTime.Now });
}
}
Then whenever you need the information just query the collectionMetadata collection. Or, if you want to use an extension method like in your example, do something like this:
public static DateTime GetCreatedDate(this MongoCollection collection)
{
var collectionMetadata = _db.GetCollection("collectionMetadata");
var metadata = collectionMetadata.FindOneById(collection.Name);
var created = metadata["Created"].AsDateTime;
return created;
}
The "creation date" is not part of the collection's metadata. A collection does not "know" when it was created. Some indexes have an ObjectId() which implies a timestamp, but this is not consistent and not reliable.
Therefore, I don't believe this can be done.
Like Mr. Gates VP say, there is no way using the metadata... but you can get the oldest document in the collection and get it from the _id.
Moreover, you can insert an "empty" document in the collection for that purpose without recurring to maintain another collection.
And it's very easy get the oldest document:
old = db.collection.find({}, {_id}).sort({_id: 1}).limit(1)
dat = old._id.getTimestamp()
By default, all collection has an index over _id field, making the find efficient.
(I using MongoDb 3.6)
Seems like it's some necroposting but anyway: I tried to find an answer and got it:
Checked it in Mongo shell, don't know how to use in C#:
// db.payload_metadata.find().limit(1)
ObjectId("60379be2bec7a3c17e6b662b").getTimestamp()
ISODate("2021-02-25T12:45:22Z")

MongoDB C# offic. List<BsonObject> query issue and always olds values?

I have not clearly issue during query using two criterials like Id and Other. I use a Repository storing some data like id,iso,value. I have created an index("_id","Iso") to performs queries but queries are only returning my cursor if i use only one criterial like _id, but is returning nothing if a use two (_id, Iso) (commented code).
Are the index affecting the response or the query method are failing?
use :v1.6.5 and C# official.
Sample.
//Getting Data
public List<BsonObject> Get_object(string ID, string Iso)
{
using (var helper = BsonHelper.Create())
{
//helper.Db.Repository.EnsureIndex("_Id","Iso");
var query = Query.EQ("_Id", ID);
//if (!String.IsNullOrEmpty(Iso))
// query = Query.And(query, Query.EQ("Iso", Iso));
var cursor = helper.Db.Repository.FindAs<BsonObject>(query);
return cursor.ToList();
}
}
Data:
{
"_id": "2345019",
"Iso": "UK",
"Data": "Some data"
}
After that I have Updated my data using Update.Set() methods. I can see the changed data using MongoView. The new data are correct but the query is always returning the sames olds values. To see these values i use a page that can eventually cached, but if add a timestamp at end are not changing anything, page is always returning the same olds data. Your comments are welcome, thanks.
I do not recall offhand how the C# driver creates indexes, but the shell command for creating an index is like this:
db.things.ensureIndex({j:1});
Notice the '1' which is like saying 'true'.
In your code, you have:
helper.Db.Repository.EnsureIndex("_Id","Iso");
Perhaps it should be:
helper.Db.Repository.EnsureIndex("_Id", 1);
helper.Db.Repository.EnsureIndex("Iso", 1);
It could also be related to the fact that you are creating indexes on "_Id" and the actual id field is called "_id" ... MongoDB is case sensitive.
Have a quick look through the index documentation: http://www.mongodb.org/display/DOCS/Indexes