Change detection in data from webservice with Hibernate? - postgresql

My web application periodically fetches data from various remote web services. The data (an array of JSON objects) mostly stays the same, but minor changes, additions and deletions can happen.
The objective is to keep a local database that is up to date with the fetched data. However, I also need to know which objects were modified, added or deleted whenever the local database is updated.
Using Hibernate (and assuming nobody else modifies the database), what would be a good strategy to implement this? Would it be a good idea to keep a checksum column for identifying modifications? Table sizes can reach a few hundred thousand rows, but only around 10,000 rows are updated at once.
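To make the checksum idea from the question concrete, here is a minimal sketch (not from the original post; the names and diff logic are purely illustrative): hash a canonical JSON rendering of each fetched object, compare it to the checksum stored alongside the local row, and classify each object as added, modified or deleted.

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.util.HexFormat;
    import java.util.Map;

    public class ChecksumDiff {

        // Hash a *canonical* JSON rendering (stable key order); otherwise
        // semantically identical objects would produce different checksums.
        static String checksum(String canonicalJson) throws Exception {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(canonicalJson.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest); // Java 17+
        }

        // existing: id -> checksum column loaded from the local table (one query)
        // fetched:  id -> canonical JSON returned by the remote service
        static void diff(Map<String, String> existing, Map<String, String> fetched) throws Exception {
            for (Map.Entry<String, String> e : fetched.entrySet()) {
                String oldSum = existing.get(e.getKey());
                String newSum = checksum(e.getValue());
                if (oldSum == null) {
                    System.out.println("ADDED    " + e.getKey());
                } else if (!oldSum.equals(newSum)) {
                    System.out.println("MODIFIED " + e.getKey());
                }
            }
            for (String id : existing.keySet()) {
                if (!fetched.containsKey(id)) {
                    System.out.println("DELETED  " + id);
                }
            }
        }
    }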

You can use Hibernate interceptors to gain access to the previous and new property state, but depending on your use case it might be better to use a change data capture (CDC) solution like Debezium.
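For illustration only, a minimal sketch of the interceptor route, assuming Hibernate 5's Interceptor API (the logging is just a stand-in for whatever change tracking you actually need):

    import java.io.Serializable;
    import java.util.Objects;
    import org.hibernate.EmptyInterceptor;
    import org.hibernate.type.Type;

    // Register it on the session factory, e.g. via the hibernate.session_factory.interceptor setting.
    public class ChangeTrackingInterceptor extends EmptyInterceptor {

        @Override
        public boolean onFlushDirty(Object entity, Serializable id,
                                    Object[] currentState, Object[] previousState,
                                    String[] propertyNames, Type[] types) {
            // Called for updates; previous and new property state are both available here.
            if (previousState != null) {
                for (int i = 0; i < propertyNames.length; i++) {
                    if (!Objects.equals(previousState[i], currentState[i])) {
                        System.out.printf("MODIFIED %s#%s: %s %s -> %s%n",
                                entity.getClass().getSimpleName(), id,
                                propertyNames[i], previousState[i], currentState[i]);
                    }
                }
            }
            return false; // the interceptor did not change the state itself
        }

        @Override
        public boolean onSave(Object entity, Serializable id, Object[] state,
                              String[] propertyNames, Type[] types) {
            System.out.printf("ADDED %s#%s%n", entity.getClass().getSimpleName(), id);
            return false;
        }

        @Override
        public void onDelete(Object entity, Serializable id, Object[] state,
                             String[] propertyNames, Type[] types) {
            System.out.printf("DELETED %s#%s%n", entity.getClass().getSimpleName(), id);
        }
    }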

Related

Is there any way we can monitor all data modifications in intersystems cache?

I'm a newbie to InterSystems Caché. We have an old system using a Caché database. We first want to extract and transform all of its data and store it in another database such as PostgreSQL, and then monitor all modifications to the original Caché data so we can apply them (inserts or updates) to our transformed data in PostgreSQL in a timely manner.
Is there any way we can monitor all data modifications in Caché?
Does Caché have any modification/replication log, similar to MongoDB's oplog?
Any idea would be appreciated, thanks!
In short, yes, there is a way to monitor data modifications. InterSystems uses journaling for several purposes, mostly related to keeping data consistent: rolling back transactions, restoring data after unexpected shutdowns, backups, mirroring, and so on. In some situations journals contain more records than in others.
But I think in your situation it can help. Many of the older applications on Caché do not use objects and SQL tables and work with globals directly. Journals only record globals, so if you already have objects and tables in your Caché application, you need to know where and how it stores its data in globals. With the Journal API you will be able to read every change to the data. If the application uses transactions, you will have flags for that as well. Each journal record changes exactly one value, either a set or a kill.
And be aware that Caché purges outdated journal files according to its settings, so you may have to increase the number of days after which they are deleted.

PostgreSQL - store real-time data of multiple subjects

Scenario:
I'm trying to build a real-time monitoring webpage for ship operations
I have 1,000 - 10,000 ships operating
All ships send real-time data to the DB, 24 hours a day, for 30 days
Each new insert has a dimension of 1 row x 100 columns
When the webpage loads, all historic data of a chosen ship is fetched and visualized
The last row of the ship's real-time data table is queried and fetched by the webpage to update the real-time screen
Each ship has its own non-real-time data, such as ship dimensions, cargo, attendants, etc...
So far I've been thinking about creating a new schema for each ship. Something like this:
public_schema
ship1_schema
ship2_schema
ship3_schema
|--- realtime_table
|--- cargo_table
|--- dimensions_table
|--- attendants_table
ship4_schema
ship5_schema
Is this a good way to store individual ship's real-time data, and fetch them on a webserver? What other ways would you recommend?
As for the time-series side, I'm already using a PostgreSQL extension called TimescaleDB. My question is rather about how to store time-series data when I have many ships. Is it a good idea to separate each ship's real-time data by creating a new schema?
++ I'm pretty new to PostgreSQL, and some of the advice I got from other people was too advanced for me... I would greatly appreciate it if you could suggest a method and briefly explain what it is.
Personally, this seems like the wrong way to go about it.
In this case I would keep all the ship metadata in one table and add a ship id column to
realtime_table
cargo_table
dimensions_table
attendants_table
From there, if you expect the data volume to grow large, you have the following options.
Create indexes on the fields that are important to query; the Postgres query planner makes good use of them.
Recent Postgres versions support declarative table partitioning based on criteria you provide, without having to use table inheritance.
Since you will need live data on the web page, you can use Postgres's LISTEN command for when data arrive from a ship (unless you have another way of pushing the data to the web server, such as WebSockets); see the sketch below.
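Purely as an illustration of LISTEN/NOTIFY (assuming a JVM backend and the PostgreSQL JDBC driver; the channel name ship_updates, the connection URL and the credentials are placeholders), a listener loop could look roughly like this:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;
    import org.postgresql.PGConnection;
    import org.postgresql.PGNotification;

    public class ShipUpdateListener {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/ships", "app", "secret");
            try (Statement st = conn.createStatement()) {
                // The inserting code (or a trigger) would run: NOTIFY ship_updates, '<ship_id>'
                st.execute("LISTEN ship_updates");
            }
            PGConnection pgConn = conn.unwrap(PGConnection.class);
            while (true) {
                // Wait up to 10 seconds for notifications on this connection.
                PGNotification[] notifications = pgConn.getNotifications(10_000);
                if (notifications == null) {
                    continue;
                }
                for (PGNotification n : notifications) {
                    System.out.println("New data for ship " + n.getParameter());
                    // e.g. query the latest row for that ship and push it to the browser
                }
            }
        }
    }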
Adding a bit of color here - if you are already using the TimescaleDB extension, you won't need to use table partitioning, since TimescaleDB will handle that for you automatically.
The approach of storing all ship data in a single table with a metadata table outside of the time series table is a common practice. As long as you build the correct indexes, as others have suggested, you should be fine. An important thing to note is that if you (for example) build an index on time, you want to make sure to include time in your queries to benefit from constraint exclusion.

Hazelcast, MongoDB persistence

I am currently using Hazelcast Community Edition as a caching mechanism for a web application, so it needs to be fast.
Currently we store a lot of data in there and it keeps growing. As it's an in-memory store and RAM is expensive, I wanted to know what the best practice is. I was planning to keep only a small amount of data in the cache and store the rest in MongoDB, and to have Hazelcast persist to MongoDB and fetch data from it only when it can't find it in the cache.
I have created the MapStore, but I am not sure how to tell it to "look" in MongoDB for data it can't find in the cache. Is it simply a case of calling getMap("something") and, if the result is empty, load("") from the MapStore?
Thanks
It depends on the MapStore's configured initial load mode.
In EAGER mode it will load all the entries on map init (so on hz.getMap()).
In LAZY mode it will load a partition's entries when that partition is first touched.
Additionally, if you do map.get() and there's no value in the IMap it will try loading that value from the MapLoader using MapLoader.load(key) method.
Also, if you do map.put() and there is no value in the IMap, it will do MapLoader.load(key), since put() is supposed to return the previous value. If you want to avoid that, use map.set().
It would also be good to have a look at the manual section on MapStore/MapLoader; it describes all the subtle differences.
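To make the wiring concrete, here is a minimal sketch of a MongoDB-backed MapStore, assuming Hazelcast 4/5 (com.hazelcast.map.MapStore) and the MongoDB sync driver; the connection string, database and collection names are placeholders. With this configured on the map, map.get(key) falls through to load(key) on a cache miss, as described above.

    import com.hazelcast.map.MapStore;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.model.Filters;
    import com.mongodb.client.model.ReplaceOptions;
    import org.bson.Document;

    import java.util.Collection;
    import java.util.HashMap;
    import java.util.Map;

    // Values are stored as simple { _id: key, value: ... } documents, just for illustration.
    public class MongoMapStore implements MapStore<String, String> {

        private final MongoCollection<Document> collection =
                MongoClients.create("mongodb://localhost:27017")  // placeholder URI
                        .getDatabase("cache")                     // placeholder names
                        .getCollection("something");

        @Override
        public String load(String key) {
            // Called by Hazelcast on a cache miss (map.get of a key not in memory).
            Document doc = collection.find(Filters.eq("_id", key)).first();
            return doc == null ? null : doc.getString("value");
        }

        @Override
        public Map<String, String> loadAll(Collection<String> keys) {
            Map<String, String> result = new HashMap<>();
            for (String key : keys) {
                String value = load(key);
                if (value != null) {
                    result.put(key, value);
                }
            }
            return result;
        }

        @Override
        public Iterable<String> loadAllKeys() {
            // Returning null disables key pre-loading, so entries are only loaded on demand.
            return null;
        }

        @Override
        public void store(String key, String value) {
            collection.replaceOne(Filters.eq("_id", key),
                    new Document("_id", key).append("value", value),
                    new ReplaceOptions().upsert(true));
        }

        @Override
        public void storeAll(Map<String, String> map) {
            map.forEach(this::store);
        }

        @Override
        public void delete(String key) {
            collection.deleteOne(Filters.eq("_id", key));
        }

        @Override
        public void deleteAll(Collection<String> keys) {
            keys.forEach(this::delete);
        }
    }

Wiring it up is then a matter of map-store configuration, e.g. something along the lines of new MapStoreConfig().setEnabled(true).setImplementation(new MongoMapStore()).setInitialLoadMode(MapStoreConfig.InitialLoadMode.LAZY) on the map's MapConfig.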

Persistent: How can I get a random record from the database

I want to use persistent to fetch a number of records from the database, selected randomly.
My first idea was to have an auto-incremented id field on the model, and fetch records by id. But I quickly concluded that this is probably not a good idea. If for instance some records are deleted, there will be gaps. (in my scenario the data will be more or less static, but it's still vulnerable)
For now the plan is to use mongodb, but this may change, I'm just researching now, the project hasn't started yet.
Some databases have native functions for selecting random records; does Persistent support this? Or is there another way to accomplish the same thing?

Setting up Core Data object relations with efficient performance (large data set)

I have to import a large dataset into Core Data (~15,000 records). The problem is that setting up the relations takes a lot of time, maybe more than 10 minutes, because each record has 3 or more relations whose related objects I have to fetch from the MOC. By a rough calculation I have to perform 45,000 fetch requests while the MOC is still being populated. I have read some topics about this, and mostly it is suggested to migrate from Core Data to SQLite, but that would take time and lots of changes in code. Is there any way to solve this without giving up Core Data? Is it possible to insert records into Core Data's SQLite database directly?
Thanks
With that many records being saved, I would say your best bet is to make the best user experience out of it. Let users know it will take a while.
What kind of relationships are you building? Are you setting both sides of the relationship or just one?
I'd say the best improvement you'll get is to figure out the cleanest way to fetch everything. And if you can, only save the context at the end instead of every time (if you are doing that).
Another alternative might be to not do the relationship building until you need it.
But without code, it is hard to give suggestions for improvement.