Just wanted to find out if it's possible to persist subsets of the data, say only some tables, columns and/or rows, filtered by some conditions. Thanks for any input.
It's possible to control whether an entire cache/table is persisted: create a data region with persistence enabled and store all the caches that need persistence there.
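Assuming this is about Apache Ignite native persistence (which the data-region terminology suggests), a minimal Java sketch of that setup; the region and cache names are just placeholders:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cluster.ClusterState;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class PersistentRegionSketch {
    public static void main(String[] args) {
        // One extra data region with native persistence; the default region
        // stays purely in-memory.
        DataStorageConfiguration storageCfg = new DataStorageConfiguration()
            .setDataRegionConfigurations(new DataRegionConfiguration()
                .setName("persistent-region")
                .setPersistenceEnabled(true));

        IgniteConfiguration cfg = new IgniteConfiguration()
            .setDataStorageConfiguration(storageCfg);

        try (Ignite ignite = Ignition.start(cfg)) {
            // With persistence enabled the cluster starts inactive.
            ignite.cluster().state(ClusterState.ACTIVE);

            // Only caches assigned to the persistent region are written to disk.
            CacheConfiguration<Long, String> persistedCache =
                new CacheConfiguration<Long, String>("persistedCache")
                    .setDataRegionName("persistent-region");
            ignite.getOrCreateCache(persistedCache).put(1L, "stored on disk");

            // This cache lives in the default (in-memory) region and is not persisted.
            ignite.getOrCreateCache("inMemoryCache").put(1L, "memory only");
        }
    }
}
```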
My web application periodically fetches data from various remote web services. The data (an array of JSON objects) mostly stays the same, but minor changes and additions/deletions to that array can happen.
The objective is to keep a local database that is up to date with the fetched data. However, I also need to know which objects were modified, added or deleted whenever updating the local database.
Using Hibernate (and assuming nobody else modifies the database), what would be a good strategy to implement this? Would it be a good idea to keep a checksum column for identifying modifications? Table sizes can be up to a few hundred thousand rows, but only around 10 000 rows are updated at once.
You can use Hibernate interceptors to gain access to the previous and new property state, but depending on your use case, it might be better to use a CDC solution like Debezium.
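If you go the interceptor route, here is a minimal sketch (this assumes Hibernate 5.x, where `EmptyInterceptor` still exists; Hibernate 6 drops it in favour of default methods on `Interceptor`):

```java
import java.io.Serializable;
import java.util.Objects;

import org.hibernate.EmptyInterceptor;
import org.hibernate.type.Type;

public class ChangeTrackingInterceptor extends EmptyInterceptor {

    // Called during flush for every entity Hibernate found dirty; both the
    // previous and the new property values are available here.
    @Override
    public boolean onFlushDirty(Object entity, Serializable id,
                                Object[] currentState, Object[] previousState,
                                String[] propertyNames, Type[] types) {
        for (int i = 0; i < propertyNames.length; i++) {
            Object oldValue = previousState == null ? null : previousState[i];
            Object newValue = currentState[i];
            if (!Objects.equals(oldValue, newValue)) {
                System.out.printf("%s#%s: %s changed from %s to %s%n",
                        entity.getClass().getSimpleName(), id,
                        propertyNames[i], oldValue, newValue);
            }
        }
        return false; // we did not modify the state ourselves
    }

    // Additions and deletions can be tracked the same way.
    @Override
    public boolean onSave(Object entity, Serializable id, Object[] state,
                          String[] propertyNames, Type[] types) {
        System.out.println("added: " + entity.getClass().getSimpleName() + "#" + id);
        return false;
    }

    @Override
    public void onDelete(Object entity, Serializable id, Object[] state,
                         String[] propertyNames, Type[] types) {
        System.out.println("deleted: " + entity.getClass().getSimpleName() + "#" + id);
    }
}
```

You register it when building the SessionFactory, e.g. via the `hibernate.session_factory.interceptor` setting or `sessionFactory.withOptions().interceptor(...)` for a per-session interceptor.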
So, in DynamoDB the recommended approach to a many-to-many relationship is the Adjacency List Pattern.
Now, it works great when you need to read the data, because you can easily read several items with one request.
But what if I need to update/delete the data? These operations happen on a specific item instead of a query result.
So if I have thousands of replicated data to facilitate a GET operation, how am I going to update all of these replicas?
The easiest way I can think of is to store only an immutable ID instead of duplicating the data, but that's pretty much emulating a relational database and takes at least two requests.
Simple answer: You just update the duplicated items :) AFAIK redundant data is preferred in NoSQL databases and there are no shortcuts to updating data.
This of course works best when the read/write ratio of the data is heavily on the read side. And in most everyday apps that is the case (my gut feeling, which could be wrong), so updates to data are rare compared to queries.
DynamoDB has a couple of utilities that might be applicable here. Both have their shortcomings, though.
BatchWriteItem allows you to put or delete multiple items in one or more tables. Unfortunately, it does not allow updates, so it is probably not applicable to your case. The number of operations is also limited to 25.
TransactWriteItems allows you to perform an atomic operation that groups up to 10 action requests across one or more tables. Again, the number of operations is limited for your case.
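For completeness, this is roughly what such a transactional update of a few replicas looks like with the Java SDK v2 (the table, key and attribute names here are invented):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.TransactWriteItem;
import software.amazon.awssdk.services.dynamodb.model.TransactWriteItemsRequest;
import software.amazon.awssdk.services.dynamodb.model.Update;

public class TransactUpdateSketch {
    public static void main(String[] args) {
        DynamoDbClient dynamo = DynamoDbClient.create();

        // Keys of the replicated copies of one logical "book" item.
        List<Map<String, AttributeValue>> replicaKeys = List.of(
            Map.of("PK", AttributeValue.builder().s("AUTHOR#42").build(),
                   "SK", AttributeValue.builder().s("BOOK#7").build()),
            Map.of("PK", AttributeValue.builder().s("CATEGORY#SCIFI").build(),
                   "SK", AttributeValue.builder().s("BOOK#7").build()));

        // Each replica update becomes one action in a single atomic transaction
        // (the number of actions per call is limited).
        List<TransactWriteItem> actions = replicaKeys.stream()
            .map(key -> TransactWriteItem.builder()
                .update(Update.builder()
                    .tableName("AppTable")
                    .key(key)
                    .updateExpression("SET bookTitle = :t")
                    .expressionAttributeValues(
                        Map.of(":t", AttributeValue.builder().s("New title").build()))
                    .build())
                .build())
            .collect(Collectors.toList());

        dynamo.transactWriteItems(TransactWriteItemsRequest.builder()
            .transactItems(actions)
            .build());
    }
}
```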
My understanding is that both of these should be used with caution and consideration, since they might cause performance bottlenecks, for example. The simple way of updating each item separately is usually just fine. And since the data is redundant, you can use async operations to make multiple updates in parallel.
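A rough sketch of that last point with the Java SDK v2 async client (again, all names are invented, and in practice you would first query the item collection or an index to find the replica keys):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

import software.amazon.awssdk.services.dynamodb.DynamoDbAsyncClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.UpdateItemRequest;
import software.amazon.awssdk.services.dynamodb.model.UpdateItemResponse;

public class ReplicaUpdateSketch {

    // Update every replicated copy of one logical item; the UpdateItem calls
    // are issued in parallel and we wait until all of them have completed.
    static void updateAllReplicas(DynamoDbAsyncClient dynamo,
                                  List<Map<String, AttributeValue>> replicaKeys,
                                  String newTitle) {
        List<CompletableFuture<UpdateItemResponse>> pending = replicaKeys.stream()
            .map(key -> dynamo.updateItem(UpdateItemRequest.builder()
                .tableName("AppTable") // made-up table name
                .key(key)
                .updateExpression("SET bookTitle = :t")
                .expressionAttributeValues(
                    Map.of(":t", AttributeValue.builder().s(newTitle).build()))
                .build()))
            .collect(Collectors.toList());

        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
    }
}
```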
I have data that corresponds to 400 million rows in a table, and it will certainly keep increasing. I would like to know what I can do to have such a table in PostgreSQL in a way that it would still be possible to run complex queries on it. In other words, what should I do to store all the data in the most performant way?
Try to find a way to split your data into partitions (e.g. by day/month/week/year).
In Postgres, it is implemented using inheritance.
This way, if your queries are able to just use certain partitions, you'll have to handle less data at a time (e.g. read less data from disk).
You'll have to design your tables/indexes/partitions together with your queries - their structure will depend on how you want to use them.
Also, you could have overnight jobs preparing materialised views based on historical data. This way you don't have to delete your old data, and you can deal with an aggregated view and the most recent data only.
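If it helps, here is a rough JDBC sketch of the inheritance-based approach, with an invented events table range-partitioned by month (with inheritance you also need to route inserts to the right child, e.g. with a trigger or by inserting into the child table directly):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class PartitionSetupSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details and table layout.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://localhost:5432/appdb", "app", "secret");
             Statement st = conn.createStatement()) {

            // Parent table holds no rows itself; children inherit its columns.
            st.execute("CREATE TABLE IF NOT EXISTS events (" +
                       "  id bigserial, created_at timestamptz NOT NULL, payload jsonb)");

            // One child table per month with a CHECK constraint, so that
            // constraint exclusion lets queries skip irrelevant partitions.
            st.execute("CREATE TABLE IF NOT EXISTS events_2024_01 (" +
                       "  CHECK (created_at >= '2024-01-01' AND created_at < '2024-02-01')" +
                       ") INHERITS (events)");
            st.execute("CREATE INDEX IF NOT EXISTS events_2024_01_created_at_idx " +
                       "ON events_2024_01 (created_at)");
        }
    }
}
```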
I want to use persistent to fetch a number of records from the database, selected randomly.
My first idea was to have an auto-incremented id field on the model and fetch records by id. But I quickly concluded that this is probably not a good idea: if, for instance, some records are deleted, there will be gaps. (In my scenario the data will be more or less static, but it's still vulnerable.)
For now the plan is to use MongoDB, but this may change; I'm just researching now, the project hasn't started yet.
Some databases have native functions for selecting random records, does persistent support this? Or is there another way to accomplish the same?
I have to put a hierarchical classifier into a database. The problems with this classifier are that:
it contains several GBs of data and ~20 million records
it has a 'sparse' structure, e.g. a level 7 subtree may connect directly to a level 2 one
levels and level groups may have different sets of attributes
I need fast read queries for subtrees and parents (and no writes at all)
Initially, I tried to use a single table with nested sets and all possible columns, but it did not work well.
Now I'm trying to find a way to split all this data across different tables and keep the level-specific data within lazy-fetched entities. But, at the same time, I'd like to have a single entity (or Map) to work with.
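Roughly, this is the kind of mapping I have in mind (all names are placeholders, and I'm not even sure a lazy one-to-one on the inverse side works without bytecode enhancement):

```java
import jakarta.persistence.*; // or javax.persistence on older stacks

// Shared node table: tree structure plus the attributes common to all levels.
@Entity
@Table(name = "classifier_node")
public class ClassifierNode {

    @Id
    private Long id;

    private int depth;

    private String name;

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "parent_id")
    private ClassifierNode parent;

    // Level-specific attributes live in a separate table and are only
    // loaded when this association is actually accessed.
    @OneToOne(mappedBy = "node", fetch = FetchType.LAZY)
    private Level2Details level2Details;

    // ...similar detail entities for the other level groups
}

// One detail table per level group, sharing the node's primary key.
@Entity
@Table(name = "classifier_level2")
class Level2Details {

    @Id
    private Long id;

    @MapsId
    @OneToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "id")
    private ClassifierNode node;

    private String someLevel2Attribute;
}
```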
Is there a way to achieve this?