How to create/guarantee a unique timestamp in a MongoDB collection using Spring Boot, to limit queries sorted by timestamp without paging

The application syncs data records between devices via an online MongoDB collection. Multiple devices can send batches of new or modified records to the server collection at any time. Each device fetches the record updates it doesn't already have by requesting records added or modified since its last get request.
Approach 1 was to add a Date field (called stored1) to the records before saving to MongoDB. When a device requests records, MongoDB paging is used to skip entries up to the current page, then limit to 1000. Now that the data set is large, each page request takes a long time, and MongoDB hit a memory error:
https://docs.mongodb.com/manual/reference/limits/#operations
Setting allowDiskUse(true) as shown in the posted code isn't fixing the memory error in my current configuration, for reasons I haven't found. Even if it were fixed, it wouldn't be a long-term solution: the query times with paging are already too long.
Approach 2:
What is the best way for pagination on mongodb using java
https://arpitbhayani.me/blogs/benchmark-and-compare-pagination-approach-in-mongodb
The second approach considered is to switch from Mongo paging that skips already-returned records to keyset pagination: ask for records with stored time greater than the largest stored time last received, repeating until a response contains fewer records than the limit. This requires the stored timestamp to be unique across all records matching the query; otherwise records could be missed or duplicated.
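The keyset loop described above can be sketched in plain Java over an in-memory sorted map standing in for the indexed stored-time field (names and types here are illustrative, not the poster's API):

```java
import java.util.*;
import java.util.stream.Collectors;

// In-memory sketch of the keyset ("seek") loop: repeatedly request records
// whose key is greater than the last key received, limited to pageSize,
// until a page comes back short.
public class KeysetSync {
    public static List<String> fetchAll(NavigableMap<Long, String> byStoredTime, int pageSize) {
        List<String> out = new ArrayList<>();
        long lastKey = Long.MIN_VALUE;
        while (true) {
            List<Map.Entry<Long, String>> page = byStoredTime.tailMap(lastKey, false)
                    .entrySet().stream().limit(pageSize).collect(Collectors.toList());
            for (Map.Entry<Long, String> e : page) out.add(e.getValue());
            if (page.size() < pageSize) return out; // short page: caught up
            lastKey = page.get(page.size() - 1).getKey();
        }
    }
}
```

Each "query" restarts from the last key seen rather than skipping N rows, which is what makes this cheap on an indexed field, and also why duplicate keys break it.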
In the example code, using the stored2 field, there's still a chance of duplicate timestamps, even if the probability is low.
Mongo has a BSON timestamp type that guarantees unique values per mongod instance, but I don't see a way to set it with document save(), or to query on it, in Spring Boot. It would need to be set on each record that is newly inserted, replaced, or updated.
https://docs.mongodb.com/manual/reference/bson-types/#timestamps
Any suggestions on how to do this?
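For the per-process half of the problem, one sketch (an assumption of mine, not Mongo's internal BSON timestamp) is a monotonic counter seeded from the clock, so two records stored in the same instant still get distinct values:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: strictly increasing nanosecond-scale values within one
// JVM. If the clock hasn't advanced past the last value handed out, bump by 1.
public class UniqueClock {
    private static final AtomicLong last = new AtomicLong();

    public static long nextNanos() {
        long now = System.currentTimeMillis() * 1_000_000L;
        return last.updateAndGet(prev -> now > prev ? now : prev + 1);
    }
}
```

Note this only guarantees uniqueness within one server process, which is why the posted code adds a random offset and why the accepted solution below ultimately concatenates the _id instead.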
@Getter
@Setter
public abstract class DataModel {
    private Map<String, Object> data;

    @Id // maps this field name to the database _id field, automatically indexed
    private String uid;

    /** Time this entry is written to the db (new or modified), to support querying for changes since last query */
    private Date stored1; // APPROACH 1
    private long stored2; // APPROACH 2
}
/** SpringBoot+MongoDb database interface implementation */
@Component
@Scope("prototype")
public class SpringDb implements DbInterface {

    @Autowired
    public MongoTemplate db; // the database
    @Override
    public boolean set(Collection<DataModel> newRecords, Collection<DataModel> updatedRecords) {
        // get current time for this set
        Date date = new Date();
        // APPROACH 2: seed a nanosecond counter; the random offset reduces (but does not eliminate) collisions between callers
        Instant instant = Instant.now();
        int randomOffset = ThreadLocalRandom.current().nextInt(0, 500000);
        long startingNs = instant.getEpochSecond() * 1_000_000_000L + instant.getNano() + randomOffset;
        int ns = 0;
        if (updatedRecords != null && !updatedRecords.isEmpty()) {
            for (DataModel entry : updatedRecords) {
                entry.setStored1(date); // APPROACH 1
                entry.setStored2(startingNs + ns++); // APPROACH 2
                db.save(entry, repoName);
            }
        }
        // for new documents only
        if (newRecords != null && !newRecords.isEmpty()) {
            for (DataModel entry : newRecords) {
                entry.setStored1(date); // APPROACH 1
                entry.setStored2(startingNs + ns++); // APPROACH 2
            }
            // multi-record insert
            db.insert(newRecords, repoName);
        }
        return true;
    }
    @Override
    public List<DataModel> get(Map<String, String> params, int maxResults, int page, String sortParameter) {
        // generate query
        Query query = buildQuery(params);

        // APPROACH 1: paged query
        Pageable pageable = PageRequest.of(page, maxResults, Direction.ASC, sortParameter);
        List<DataModel> queryResults = db.find(query.allowDiskUse(true).with(pageable), DataModel.class, repoName); // allowDiskUse(true) not working, still get memory error
        // count total results
        Page<DataModel> pageQuery = PageableExecutionUtils.getPage(queryResults, pageable,
                () -> db.count(Query.of(query).limit(-1).skip(-1), DataModel.class, repoName));
        // return the paged query results
        return pageQuery.getContent();

        // APPROACH 2: unpaged; the query itself limits by stored time
        // List<DataModel> queryResults = db.find(query.allowDiskUse(true), DataModel.class, repoName);
        // return queryResults;
    }
    @Override
    public boolean update(Map<String, String> params, Map<String, Object> data) {
        // generate query
        Query query = buildQuery(params);
        // this applies the same changes to every matching entry
        Update update = new Update();
        for (Map.Entry<String, Object> entry : data.entrySet()) {
            update.set(entry.getKey(), entry.getValue());
        }
        db.updateMulti(query, update, DataModel.class, repoName);
        return true;
    }

    private Query buildQuery(Map<String, String> params) {
        //...
    }
}

The solution I ended up using was to define, and index on, another field called storedId: a string concatenation of the modified record's storedTime and its _id. Because _id is unique, every storedId is unique.
Here's an example showing how indexing and querying on the concatenated storedTime+_id field works, while indexing and querying on the separate storedTime and _id fields fails:
public abstract class DataModel {
    private Map<String, Object> data;

    @Indexed
    private String _id; // unique id

    @Indexed
    private String storedTime; // time this entry is written to the db (new or modified)

    @Indexed
    private String storedId; // string concatenation of storedTime and _id
}
//Querying on separate fields and indexes:
{
//storedTime, _id
"time1", "id2"
"time1", "id3"
"time1", "id4"
"time2", "id1"
"time2", "id5"
}
get(storedTime>"time0", _id>"id0", limit=2) // returns _id's 2,3 (next query needs to check for more at storedTime="time1" but skip _id's <="id3")
get(storedTime>="time1", _id>"id3", limit=2) // returns _id's 4,5
// FAILS: this second query MISSES _id 1 (any existing _id can be modified at any time, so _id values are not in storedTime order)
//Querying on the combined field and index:
{
//storedId
"time1-id2"
"time1-id3"
"time1-id4"
"time2-id1"
"time2-id5"
}
get(storedId>"time0", limit=2) // returns _id's 2,3 (each query asks for values greater than the greatest value last returned)
get(storedId>"time1-id3", limit=2) // returns _id's 4,1
get(storedId>"time2-id1", limit=2) // returns _id 5
// WORKS: this misses no records and duplicates none
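One detail worth adding: plain string concatenation only preserves time order if the storedTime portion is fixed-width, since "time10-..." sorts lexicographically before "time2-...". A minimal sketch with zero-padded millisecond timestamps (the helper name and format are my own, illustrative choices):

```java
// Build a lexicographically ordered storedId from a zero-padded millisecond
// timestamp plus the unique _id; 19 digits is enough for any positive long.
public class StoredIdUtil {
    public static String storedId(long storedTimeMillis, String id) {
        return String.format("%019d-%s", storedTimeMillis, id);
    }
}
```

With this format, comparing storedId strings agrees with comparing (storedTime, _id) pairs, which is exactly what the keyset query relies on.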

Related

Morphia Aggregation taking more time than expected

I am using dev.morphia.aggregation.experimental Aggregation to fetch the data, but it takes almost 70-110 ms to fetch the results.
I am not sure if this is normal or whether it can be improved.
I have 8078801 documents in the DB. Indexes:
compound index: D_ID, L_ID, P_ID
_id field
Here is my method:
public AnnoPage find(String dId, String lId, String pId, List<Type> aTypes) {
    List<Filter> filters = new ArrayList<>(
            Arrays.asList(eq(D_ID, dId), eq(L_ID, lId), eq(P_ID, pId)));
    Aggregation<Page> query = datastore.aggregate(Page.class)
            .match(filters.toArray(new Filter[0]));
    query = filter(query, aTypes);
    return query.execute(Page.class).tryNext();
}
private Aggregation<Page> filter(Aggregation<Page> pageQuery, List<Type> aTypes) {
    if (aTypes.isEmpty()) {
        return pageQuery;
    }
    List<String> dcTypes = getDcTypes(aTypes);
    return pageQuery.project(Projection.project()
            .include(D_ID)
            .include(L_ID)
            .include(P_ID)
            .include(RESOURCE)
            .include(MODIFIED)
            .include(DELETED)
            .include("ans",
                    filter(field("ans"),
                            ArrayExpressions.in(value("$$ann.type"), value(dcTypes)))
                            .as("annotation")));
}

SpringData Cassandra findAll method returns only one record

Creating a project with Spring Data using the reactive Cassandra repository. I have a sample Book application where I wrote a custom query.
@GetMapping("/books2")
public Flux<Book2> getBooks(@Valid @RequestBody Book2 book) {
    MapId id1 = id("id", book.getId()).with("isbn", book.getIsbn());
    if (Objects.nonNull(book.getName()))
        id1.with("name", book.getName());
    if (Objects.nonNull(book.getLocalDate()))
        id1.with("localDate", book.getLocalDate());
    return book2Repository.findAllById(Collections.singletonList(id1));
}
I have many rows, but the result returned contains only one.
Looking into the code of SimpleReactiveCassandraRepository.java:
public Flux<T> findAllById(Iterable<ID> ids) {
    Assert.notNull(ids, "The given Iterable of ids must not be null");
    if (FindByIdQuery.hasCompositeKeys(ids)) {
        return this.findAllById((Publisher) Flux.fromIterable(ids));
    } else {
        FindByIdQuery query = FindByIdQuery.forIds(ids);
        List<Object> idCollection = query.getIdCollection();
        ....
        ....

public Flux<T> findAllById(Publisher<ID> idStream) {
    Assert.notNull(idStream, "The given Publisher of ids must not be null");
    return Flux.from(idStream).flatMap(this::findById);
}
findAllById checks whether the query has composite keys and, if so, calls the Publisher overload, which maps each id through findById; findById returns a single record per id.
How do I get it to return multiple rows?
Is this a bug?
I tried with spring-data-cassandra 2.2.7 and 3.0.1, and the results seem the same.

SpringBatch JpaPagingItemReader SortOrder

I am using SpringBatch version 3.0.7, Hibernate 4.3.11, and H2 database. When using the JpaPagingItemReader, does the JPQL require a unique sort order? I see that it is required for the JdbcPagingItemReader (see BATCH-2465).
In a Step I am using a JpaPagingItemReader to load entities from the database and then write them to a flat file. I expect the flat file to contain unique entities sorted in the order specified by the JPQL. If I set the page size to something small, like 1, and provide a JPQL statement that sorts entities on a non-unique key, I see the same entity repeated multiple times in the output file. If I sort by a unique key, there are no "duplicates". If I set the page size >= the total number of entities, so there is only one page, there are also no "duplicates".
Empirically, it seems the JpaPagingItemReader requires the JPQL to have a unique sort key.
Having a look at the implementation of JpaPagingItemReader, you'll find the method doReadPage():
@Override
@SuppressWarnings("unchecked")
protected void doReadPage() {
    EntityTransaction tx = null;
    if (transacted) {
        tx = entityManager.getTransaction();
        tx.begin();
        entityManager.flush();
        entityManager.clear();
    }
    Query query = createQuery().setFirstResult(getPage() * getPageSize()).setMaxResults(getPageSize());
    if (parameterValues != null) {
        for (Map.Entry<String, Object> me : parameterValues.entrySet()) {
            query.setParameter(me.getKey(), me.getValue());
        }
    }
    if (results == null) {
        results = new CopyOnWriteArrayList<T>();
    } else {
        results.clear();
    }
    if (!transacted) {
        List<T> queryResult = query.getResultList();
        for (T entity : queryResult) {
            entityManager.detach(entity);
            results.add(entity);
        }
    } else {
        results.addAll(query.getResultList());
        tx.commit();
    }
}
As you can see, a new query is created for every page that is read. It must therefore be ensured that the query always returns its elements in exactly the same order, which is why it needs a unique sort key. Otherwise you will get duplicates and missing entries (one missing entry for every duplicate, since the total number of rows stays the same).
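The usual remedy (a sketch of mine, not the poster's code) is to extend the sort with a unique tiebreaker such as the entity id, so every page query sees the same total order. The effect can be shown with plain offset paging over a list:

```java
import java.util.*;
import java.util.stream.Collectors;

// Offset paging is only deterministic if the comparator defines a total
// order; adding a unique id as a tiebreaker guarantees that.
public class PagingDemo {
    public record Row(String name, int id) {}

    public static List<Row> page(List<Row> rows, Comparator<Row> order, int pageNo, int size) {
        return rows.stream().sorted(order)
                .skip((long) pageNo * size).limit(size)
                .collect(Collectors.toList());
    }
}
```

With `Comparator.comparing(Row::name).thenComparing(Row::id)` the pages partition the result set; with only `comparing(Row::name)` the order among ties is unspecified between queries, which is how the duplicated rows in the flat file arise. The JPQL equivalent is appending the id to the ORDER BY clause.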

Having conditional multiple filters in Morphia query for Mongo database

Environment: MongoDB 3.2, Morphia 1.1.0
Let's say I have a collection of Employees, and the Employee entity has several fields. I need to apply multiple (conditional) filters and return a batch of 10 records per request.
Pseudocode below:
@Entity("Employee")
Employee {
    String firstname,
    String lastName,
    int salary,
    int deptCode,
    String nationality
}
and in my EmployeeFilterRequest I carry the request parameters to the DAO:
EmployeeFilterRequest {
    int salaryLessThan,
    int deptCode,
    String nationality..
}
Pseudoclass:
class EmployeeDao {
    public List<Employee> returnList;

    public getFilteredResponse(EmployeeFilterRequest request) {
        DataStore ds = getTheDatastore();
        Query<Employee> query = ds.createQuery(Employee.class).disableValidation();
        // conditional filter #1
        if (request.filterBySalary) {
            query.filter("salary >", request.salary);
        }
        // conditional filter #2
        if (request.filterBydeptCode) {
            query.filter("deptCode ==", request.deptCode);
        }
        // conditional filter #3
        if (request.filterByNationality) {
            query.filter("nationality ==", request.nationality);
        }
        returnList = query.batchSize(10).asList();
        /* THIS IS RETURNING ALL THE RECORDS IN THE COLLECTION, EXPECTED ONLY 10 */
    }
}
As explained in the code above, I want to perform conditional filtering on multiple fields, and even though batchSize is set to 10, I am getting all the records in the collection.
How do I resolve this?
Regards,
Punith
Blakes is right. You want to use limit() rather than batchSize(). The batch size only affects how many documents each trip to the server comes back with. This can be useful when pulling over a lot of really large documents but it doesn't affect the total number of documents fetched by the query.
As a side note, you should be careful using asList() as it will create objects out of every document returned by the query and could exhaust your VM's heap. Using fetch() will let you incrementally hydrate documents as you need each one. You might actually need them all as a List and with a size of 10 this is probably fine. It's just something to keep in mind as you work with other queries.
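The distinction the answer draws can be illustrated with a toy fetch loop (illustrative code of my own, not the Morphia driver): batchSize only shapes the round trips, while limit caps the total.

```java
import java.util.*;

// Toy cursor: documents arrive in groups of batchSize, but the total
// returned never exceeds limit, mirroring limit() vs batchSize().
public class CursorDemo {
    public static List<Integer> fetch(List<Integer> collection, int batchSize, int limit) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < collection.size() && out.size() < limit; i += batchSize) {
            List<Integer> batch = collection.subList(i, Math.min(i + batchSize, collection.size()));
            for (int doc : batch) {
                if (out.size() == limit) break;
                out.add(doc);
            }
        }
        return out;
    }
}
```

Without a limit, the loop drains the whole collection no matter the batch size, which is exactly the behavior the question observed.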

MongoDB C# Remove doesn't work

I have this code for removing an item from a MongoDB collection:
private MongoCollection<T> GetCollection()
{
    connectionString = "mongodb://localhost/?safe=true";
    server = MongoServer.Create(connectionString);
    database = server.GetDatabase("CSCatalog");
    return database.GetCollection<T>("myCollectionName");
}
public bool Delete(T entity)
{
    var id = typeof(T).GetProperty("Id").GetValue(entity, null).ToString();
    var query = Query.EQ("_id", id);
    var finded = GetCollection().Find(query); // returns null
    var result = GetCollection().Remove(query, MongoDB.Driver.RemoveFlags.Single); // no errors, but doesn't remove
    return result.Ok; // returns true, but doesn't remove
}
The GetCollection() method retrieves the right collection; I have tested it with the debugger.
The collection contains the item I want to remove, and it has the same id that I retrieved in the first line.
The entity has some fields and an ObjectId field called "Id".
The _id stored in the collection is of type ObjectId, but you are comparing it against a string, so the query matches nothing and nothing is removed. Build the query with an ObjectId instead:
var query = Query.EQ("_id", new ObjectId(id));
Your finded variable would not be null if Find() had returned something from your database. That it is null means the query matched nothing, and therefore there is nothing to remove.
What appears to be happening is that you are querying _id for the ObjectId, while you are storing that ObjectId in the database under Id.