Preventing duplicates in MongoDB using Spring Data (Spring Roo)

I have been trying to wrap my head around MongoDB as it's used by Spring, so I decided to start a little project in Spring Roo.
In my project, I am storing my User login data in MongoDB. The trouble is that the registration process, which creates a new User object and stores it in MongoDB, has a tendency to create duplicates despite the fact that I have @Unique on the loginId field.
Now, I know part of the problem is that I am thinking about things from a JPA/RDBMS perspective, and MongoDB is not a relational DB and thus operates under a different set of parameters, but I am having trouble finding guidance in anything more than VERY simple sample code.
First, what Spring/other annotations are available, and more importantly, commonly used when dealing with MongoDB from the Spring world? Second, when dealing with documents that need to be "uniqued", how does one typically do this? Do you first search on the unique field to ensure it's not already there, then do the insert? Third, in JPA-land, I could use the annotations @PrePersist and @PreUpdate to do last-minute data manipulation, like MD5-hashing passwords that have been updated or adding/updating a "Last Modified" date just prior to storing. I know these are JPA-isms, but can I still use them, and if not, is there an alternative for use with Spring Data/MongoDB?

I ended up using the @Id annotation on my entities, which indicates which field is used as the id field. As long as that field is unique, writing subsequent updates will properly replace the existing entity instead of adding a new one.
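A minimal sketch of that mapping, assuming a Spring Data MongoDB entity (the class and field names are illustrative, not from the question):

import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.mapping.Document;

@Document(collection = "users")
public class User {

    // Using loginId itself as the document id: repeated saves with the
    // same loginId replace the existing document instead of inserting
    // a new one.
    @Id
    private String loginId;

    private String passwordHash;

    public User(String loginId, String passwordHash) {
        this.loginId = loginId;
        this.passwordHash = passwordHash;
    }

    public String getLoginId() {
        return loginId;
    }
}

If a separate generated id is preferred, Spring Data MongoDB also offers @Indexed(unique = true) on the loginId field, which makes MongoDB itself reject duplicate inserts at the index level.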

I ended up creating an additional method that checks whether a record with the same value as the one being entered already exists.
If it does, I return a failure stating that a duplicate value exists. Otherwise, the newly entered value is saved.
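A sketch of that check-then-insert approach, assuming a Spring Data repository (existsByLoginId is a standard derived query method; the other names are illustrative). Note that the check and the insert are not atomic, so a unique index is still worth having as a backstop:

import org.springframework.dao.DuplicateKeyException;
import org.springframework.data.mongodb.repository.MongoRepository;

interface UserRepository extends MongoRepository<User, String> {
    boolean existsByLoginId(String loginId);
}

class RegistrationService {

    private final UserRepository userRepository;

    RegistrationService(UserRepository userRepository) {
        this.userRepository = userRepository;
    }

    // Check-then-insert: two concurrent registrations can both pass the
    // check, which is why a unique index should back this up and make
    // the losing insert fail fast.
    User register(User user) {
        if (userRepository.existsByLoginId(user.getLoginId())) {
            throw new DuplicateKeyException("loginId already taken");
        }
        return userRepository.save(user);
    }
}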

Related

How to rehash and clean existing entities in the database?

We have an entity and a corresponding table in the database with one additional column which contains a digest hash of the entity's fields, calculated programmatically in the application each time. The entity has associations with two additional tables/entities whose fields also take part in the hashing.
Now a decision was made to get rid of one of the fields on the main entity (a boolean flag) and exclude it from hashing, since it makes two otherwise identical entities get different hashes when one entity has its flag set to true while the other's is false. Since the hashes differ, both entities get stored in the database, which is not what we want.
Removing the field is simple, but we also need to re-calculate hashes for entities which have already been stored in the database. Since there might be duplicates, we also need to get rid of one of the two duplicated entries. This whole operation must be done once, after migration.
The stack we use is Quarkus, Flyway, Hibernate with Panache, and PostgreSQL. I have tried to use Flyway callbacks with Event.AFTER_MIGRATE to get all existing entities from the db, but I can't use Panache since it's not initialised yet by the time the callback fires. Using plain java.sql.* Connection and Statement is pretty cumbersome, because I need to fetch data from 3 tables, create an entity from all of the fields, re-calculate the hash and put it back, all while taking care of possible conflicts. Another option would be to create a new REST API endpoint specifically for the job, which the client would have to call after the app has booted, but somehow I don't feel that that is the best solution.
How do you tackle this kind of situation?
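One pattern that sidesteps the Flyway-callback timing problem is a run-once job observing Quarkus's StartupEvent, which fires after Panache and the datasource are initialised. A rough sketch under that assumption; the entity and field names are hypothetical, and recomputeHash stands in for the application's real digest over the remaining fields:

// MyEntity.java (hypothetical stand-in for the real entity)
import io.quarkus.hibernate.orm.panache.PanacheEntity;
import jakarta.persistence.Entity;

@Entity
public class MyEntity extends PanacheEntity {
    public String payload; // stand-in for the fields that feed the hash
    public String hash;
}

// HashMigrator.java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.List;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.event.Observes;
import jakarta.transaction.Transactional;

import io.quarkus.runtime.StartupEvent;

@ApplicationScoped
public class HashMigrator {

    @Transactional
    void onStart(@Observes StartupEvent event) {
        // Guard with a marker table or flag in real code so this runs once.
        List<MyEntity> all = MyEntity.listAll(); // Panache is available here
        for (MyEntity e : all) {
            e.hash = recomputeHash(e); // new hash without the dropped flag
        }
        // Duplicate cleanup (keep the lowest id per hash) can then be a
        // single native statement, e.g. in PostgreSQL:
        //   DELETE FROM my_entity a USING my_entity b
        //   WHERE a.hash = b.hash AND a.id > b.id;
    }

    private String recomputeHash(MyEntity e) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(
                    md.digest(e.payload.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
    }
}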

Intercepting JPA query to calculate the key fields

I am quite new to JPA. I have a particular repository that uses keys with some parts set by the caller and some values that are automatically calculated from those parts. There is a need for this :)
Since the keys and entities are simple Java classes, it appears to me that I need to put the code that modifies the key (or substitutes it with an internal one with additional values) in the repository implementation. However, I do not think that copying the code from SimpleJpaRepository to my custom repositories is a good idea... I think that something should be possible with the entity manager. Basically what I need is a proxy that gets called every time something like find() or delete() is called, takes the entity, updates its key, and passes the call over to the real repository implementation.
Could someone point me to the right direction or an example that does something similar?
Thanks!
In JPA, you have a bunch of events for this; just choose the one that suits you best. It looks like you are looking for @PrePersist.
http://www.objectdb.com/api/java/jpa/annotations/callback
That said, if the data of these fields is calculated based only on the data of the other fields, it goes against database normalization. A more sensible approach would be to make the calculated field @Transient and provide only the getters, which calculate the values based on the persistent fields.
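As a rough illustration of the @PrePersist route (the entity and its fields are made up for the example), a lifecycle callback on the entity itself can derive the stored key from the caller-supplied parts just before the write. Deriving the actual @Id this way is provider-dependent, so the sketch keeps the derived value in a separate unique column:

import jakarta.persistence.Column;
import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.Id;
import jakarta.persistence.PrePersist;

@Entity
public class Record {

    @Id
    @GeneratedValue
    private Long id;

    // Derived lookup key, kept unique at the database level.
    @Column(unique = true)
    private String internalKey;

    private String callerPart;
    private int regionCode;

    // Runs just before INSERT and derives the stored key from the
    // caller-supplied parts.
    @PrePersist
    void computeKey() {
        this.internalKey = callerPart + "-" + regionCode;
    }
}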

JPA Lazy Fetch Custom Query

I am using JPA/JFreeChart to display data I collected with a microcontroller; however, I measure 14 sensors every 10 seconds. I have been measuring for over 2 months and I have over 7,000,000 sets of data.
Now to my actual problem: since I don't want to load 7,000,000 rows every time I start my program, I only want to use average values by minute/hour. I have thought of using a NamedQuery, but I don't know how to keep the relationship within it and make JPA use it, since up until now the loading of the data has been done by JPA itself. Maybe I can solve this just by adding more annotations to this?
@OneToMany(mappedBy = "sensor")
@OrderBy("timestamp ASC")
public List<Value> getValues() {
    return this.values;
}
Thanks in advance!
Best Regards
Straight JPA does not allow filtering results, since this would mean that the entity's relationship no longer reflects exactly what is in the database, and it would have to standardize behavior for what happens when adding an entity to the relationship that isn't in the collection but already exists in the database.
The easiest way for this mapping, though, would be to mark the attribute as @Transient. You can then use the get method to read the values from the database when needed, and cache them in the entity if you want.
Many providers do allow adding filters to the queries used to bring in mappings; for instance, EclipseLink allows setting @AdditionalCriteria on the mapping as described here: http://wiki.eclipse.org/EclipseLink/Development/AdditionalCriteria Or you can modify the mapping directly as shown here: http://wiki.eclipse.org/EclipseLink/Examples/JPA/MappingSelectionCriteria
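To make the @Transient suggestion concrete: instead of mapping every raw row, aggregated values can be fetched on demand. A sketch assuming JPA 2.1's function() passthrough and a PostgreSQL-style date_trunc; the entity and column names are guesses based on the question:

import java.util.List;
import jakarta.persistence.EntityManager;

public class SensorAverages {

    private final EntityManager em;

    public SensorAverages(EntityManager em) {
        this.em = em;
    }

    // Returns (minute, average reading) pairs instead of raw rows.
    public List<Object[]> minuteAverages(long sensorId) {
        return em.createQuery(
                "select function('date_trunc', 'minute', v.timestamp), avg(v.reading) "
              + "from Value v "
              + "where v.sensor.id = :id "
              + "group by function('date_trunc', 'minute', v.timestamp) "
              + "order by function('date_trunc', 'minute', v.timestamp)",
                Object[].class)
            .setParameter("id", sensorId)
            .getResultList();
    }
}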

How do I tell EF to ignore columns from my database?

I'm building my EF (v4.0.30319) data model from my SQL Server database. Each table has Created and Updated fields populated via database triggers.
This database is the backend of an ASP.NET Web API application, and I recently discovered a problem: since the Created and Updated fields are not populated in the model passed into the API endpoint, they are written to the database as NULL. This overwrites any values already in the database for those properties.
I discovered I can edit the EF data model and just delete those two columns from the entity. It works and the datetimes are not overwritten with NULL. But this leads to another, less serious but more annoying, problem... my data model has a bunch of tables that contain these properties and all the tables need to be updated by removing these two columns.
Is there a way to tell EF to ignore certain columns in entities across the entire data model without manually deleting them?
As far as I know, no. Generating the model from the database will always create all of the fields from the database table. Well, as long as it has a primary key it will.
It is possible to only update the fields that you want, i.e. don't include the "Created" and "Updated" fields in your create and update methods. I'd have to say, though, that I think it would be better if those fields didn't even exist on the model at that point. You may at some point see those fields on the model and not remember that they won't get persisted to the DB.
You may want to look into just inserting the datetimes into those fields when you call your create() and update() methods. Then you could just ditch the triggers. You'd obviously want to use a class library for all of your database operations so this functionality would be in one place. To keep it nice and neat, you know?

Managing changes in class structure to be consistent with mongodb collection

We are using MongoDB with C#. We are trying to figure out a way to keep our collections consistent seamlessly. Right now, if a developer makes any changes to the class structure (adding a field, changing a data type, or changing a property within a nested class), he/she has to change the Mongo collection manually.
It's a pain, as our project is growing and the number of developers working on the project keeps increasing. Was wondering whether someone has already figured out a way to manage this issue.
Research
I found a similar question; however, I couldn't find the solution.
I found a way to find all properties (Finding the properties); however, data types and nested documents become an issue.
If you want to migrate gradually as records are accessed, you need to follow a few simple rules (sketched in code below):
1) If you add a field, it had better be nullable or have a default value specified.
2) Never rename fields and never change field types.
- Instead, always add new fields, add migration code, and remove the old fields only when all documents have been migrated over.
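The question is about C#, but the pattern itself is language-agnostic; here is what rule 2 might look like in the Java/Spring Data MongoDB terms used earlier on this page (all names illustrative):

import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.mapping.Document;

@Document(collection = "sensors")
public class Sensor {

    @Id
    private String id;

    // Old field: kept (never renamed) until every document is migrated.
    @Deprecated
    private String name;

    // New field that replaces "name"; nullable so old documents still load.
    private String fullName;

    // Lazy migration on read: fall back to the old field and fill in the
    // new one, which gets persisted the next time the document is saved.
    public String getFullName() {
        if (fullName == null && name != null) {
            fullName = name;
        }
        return fullName;
    }
}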
For prototyping with MongoDB and C#, I built a dynamic wrapper ... that lets you specify your objects using only interfaces (no classes needed), and it lets you dynamically add new interfaces to an existing object. It's not ready for production use, but for prototyping it saves a lot of effort and makes migration really easy.