Since the data may become huge, currently I can only delete partitions to remove old metrics. Does CrateDB have a mechanism for expiring old data?
No, CrateDB does not support that currently. Feel free to open a feature request on GitHub including some details/use-cases.
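Until such a feature exists, dropping old partitions is the usual workaround. A minimal sketch over CrateDB's HTTP endpoint, assuming a hypothetical metrics table partitioned by a month column and the server listening on localhost:4200; when the WHERE clause pins down the partition column, CrateDB should be able to drop the whole partition rather than deleting row by row:

# Delete everything in the 2016-01 partition of the hypothetical metrics table
curl -X POST http://localhost:4200/_sql \
  -H "Content-Type: application/json" \
  -d '{"stmt": "DELETE FROM metrics WHERE month = ?", "args": ["2016-01"]}'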
I'm storing backups in Cloud Storage. A desirable property of such a backup is to ensure the device being backed up cannot erase the backups, to protect against ransomware or similar threats. At the same time, it is desirable to allow the backup client to delete files so that old backups can be pruned. (Because the backups are encrypted, it isn't possible to use lifecycle management to do this.)
The solution that immediately comes to mind is to enable object versioning and use lifecycle rules to retain object versions (deleted files) for a certain amount of time. However, I cannot see a way to allow the backup client to delete the current version, but not historical versions. I thought it might be possible to do this with an IAM condition, but the conditional logic doesn't seem flexible enough to parse out the object version. Is there another way I've missed?
The only other solution that comes to mind is to create a second bucket, inaccessible to the backup client, and use a Cloud Function to replicate the first bucket. The downside of that approach is the duplicate storage cost.
To answer this:
However, I cannot see a way to allow the backup client to delete the current version, but not historical versions
When you delete a live object, object versioning will retain a noncurrent version of it. When deleting the noncurrent object version, you will have to specify the object name along with its generation number.
Just to add, you may want to consider using a transfer job to replicate your data to a separate bucket.
Either way, both approaches (object versioning or replicating buckets) will incur additional storage costs.
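To illustrate, here's a minimal sketch with gsutil (the bucket and object names are hypothetical). Note that a plain rm only removes the live version; removing a noncurrent version requires naming its generation:

# Enable object versioning on the backup bucket
gsutil versioning set on gs://my-backup-bucket

# List all versions of an object; noncurrent versions carry a #generation suffix
gsutil ls -a gs://my-backup-bucket/backups/archive.enc

# Deleting without a generation only removes the live version;
# versioning keeps it around as a noncurrent version
gsutil rm gs://my-backup-bucket/backups/archive.enc

# Removing a noncurrent version requires the explicit generation number
gsutil rm gs://my-backup-bucket/backups/archive.enc#1556835845511221

# One way to copy everything into a second bucket the client cannot access
# (a scheduled transfer job is the managed alternative mentioned above)
gsutil rsync -r gs://my-backup-bucket gs://my-backup-replica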
I noticed that when using curl to get content from GitHub in this format:
https://raw.githubusercontent.com/${org}/${repo}/${branch}/path/to/file
It will sometimes return cached/stale content. For example with this sequence of operations:
1. curl https://raw.githubusercontent.com/${org}/${repo}/${branch}/path/to/file
2. Push a new commit to that branch
3. curl https://raw.githubusercontent.com/${org}/${repo}/${branch}/path/to/file
Step 3 will return the same content as step 1 and not reflect the new commit.
How can I avoid getting a stale version?
I noticed that the GitHub web UI adds a token to the URL, e.g. ?token=AABCIPALAGOZX5R, which presumably avoids getting cached content. What's the nature of this token and how can I emulate it? Would tacking on ?token=$(date +%s) work?
Also, I'm looking for a way to avoid the stale content without having to switch to a commit hash in the URL, since that would require more changes. However, if that's the only way to achieve it, then I'll go that route.
GitHub caches this data because otherwise every request for a frequently requested file would have to be served by the backend service, which is more expensive than serving a cached copy. Using a CDN provides improved performance and speed. You cannot bypass it.
The token you're seeing in the URL is a temporary token that is issued for the logged-in user. You cannot use a random token, since that won't pass authentication.
If you need the version of that file in a specific commit, then you'll need to explicitly specify that commit. However, do be aware that you should not do this with some sort of large-scale automated process as a way to bypass caching. For example, you should not try to do this to always get the latest version of a file for the purposes of a program you're distributing or multiple instances of a service you're running. You should provide that data yourself, using a CDN if necessary. That way, you can decide for yourself when the cache needs to be expired and get both good performance and the very latest data.
If you run such a process anyway, you may cause an outage or overload, and your repository or account may be suspended or blocked.
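If pinning to a commit is acceptable, you can resolve the branch to its current SHA yourself and request that; the content behind a commit SHA never changes, so a cached copy can't be stale. A minimal sketch, reusing the placeholders from the question:

# Resolve the branch to its current commit SHA
sha=$(git ls-remote https://github.com/${org}/${repo}.git refs/heads/${branch} | cut -f1)

# Fetch the file at that exact commit instead of the moving branch name
curl https://raw.githubusercontent.com/${org}/${repo}/${sha}/path/to/file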
My question might be simple, and the solution as well; however, I want to know: supposing that a user syncs a branch and later deletes the physical files from his local machine manually, the metadata about these files will still exist on the server...
In the long run I'm afraid this could slow down the server.
I haven't found much about this issue, which is why I'm asking here: how do companies usually manage their Perforce metadata? A trigger that verifies the existing metadata? A program that runs sync #none from time to time for client directories that no longer exist?
As I said, there might be many simple ways to solve this, but I'm looking for the best one.
Any help is appreciated.
In practice I don't think you'll have too much to worry about.
That being said, if you want to keep the workspace metadata size to a minimum, there are two things you'll need to do:
1. Write the sync #none script you referenced above, and also make sure to delete any workspaces that are no longer in use.
2. Create a checkpoint, and recreate the metadata from that checkpoint. When the metadata is recreated, that should remove any data from deleted clients. My understanding of the Perforce metadata is that it won't shrink unless it's recreated from a checkpoint.
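A rough sketch of both steps on the command line (the workspace name, depot path, server root, and checkpoint number are made up; adapt them to your installation):

# 1. Clear the have-list of a dead workspace, then delete the workspace spec
p4 -c old_workspace sync //depot/...#none
p4 client -d old_workspace

# 2. Take a checkpoint, move the old db.* files aside, and rebuild the
#    metadata from the checkpoint (run while the server is stopped)
p4d -r /p4root -jc
p4d -r /p4root -jr checkpoint.42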
In all enterprise applications I've seen, pretty much nothing actually gets deleted from persistent storage; it's just marked as deleted with a flag or a delete date. If I'm designing such an application, should I ever use DELETE requests? If they should be used, how exactly should they look? For example, if I want to, say, block a credit card, I would issue something like
POST /block_orders
card_number=123&reason=card_stolen
But the app doesn't look RESTful if it doesn't use all of the available verbs. Does DELETE have any place in the enterprise?
UPD: Is it good design to allow DELETE on a resource if you can GET that resource later to view the history of operations, for example?
DELETE has a place in the enterprise.
Use DELETE requests to flag records as deleted, and hide those records from GET requests: as far as the client is concerned, those records are deleted.
Don't worry that those records can be recovered; even if you deleted them from the database, they could still be recovered from the backups. Nothing is really deleted nowadays. :-)
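In other words, something like this (the host and path are made up for illustration); the server only flips a deleted flag, but from the client's point of view the record is gone:

# Soft-delete: the server merely marks the record as deleted
curl -X DELETE https://api.example.com/credit-cards/123

# Later reads behave as if it no longer exists (e.g. 404 or 410 Gone)
curl -i https://api.example.com/credit-cards/123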
That sounds like a PATCH / PUT to me:
If you want to change the state of the credit card (to blocked), use PATCH (more info here).
PATCH /users/<user_id>/credit_cards/<creditcard_id>/
Content-Type: application/json

{
    "reason": "stolen"
}
If you are sending the whole resource with the request, to replace it knowing its identifier, use PUT.
DELETE fits perfectly with the deleted-flag approach. Just update that flag when you receive a DELETE request, and if the user tries to retrieve the resource, just give them the available info for that blocked credit card. But in your scenario I would not use it to block credit cards... I would use it when a user wants to cancel the credit card, and I would also use soft deletion (a delete flag), because maybe you are interested in statistical processing or something... corporations have reasons not to delete data... :)
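For completeness, the PATCH above would look roughly like this with curl (host, IDs, and body fields are only illustrative):

# Block the card by changing its state; only the changed fields are sent
curl -X PATCH https://api.example.com/users/42/credit_cards/7/ \
  -H "Content-Type: application/json" \
  -d '{"reason": "stolen"}'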
The DELETE HTTP method conveys the intent of the client to DELETE a resource. There is no requirement on the server to actually physically delete it. Feel free to just flag it.
And REST doesn't care if you don't use all the methods. It only cares if you use one incorrectly.
My app is using TouchDB and is doing a ton of REST requests to pull deleted documents/revisions. Is it possible to basically tell Couch, "Hey, just delete them, forget about their past"? I know _changes will show what was deleted, but I'd love it if it just deleted, and didn't ask anything else... for the sake of iPhone battery life, and 3G connectivity.
There is a 'compact' API, which removes old revision data (including the bodies of deleted docs) from CouchDB. You can run it from time to time.
But standard CouchDB replication doesn't send ALL revisions, only the revision history (without data) and the last revision. Check your replication algorithm for compatibility with this API.
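For reference, compaction is triggered per database over HTTP and needs admin rights plus a JSON content type (the database name and credentials below are placeholders):

# Start compaction for one database; returns {"ok":true} and runs in the background
curl -X POST http://admin:password@localhost:5984/mydb/_compact \
  -H "Content-Type: application/json"

# Watch progress via the database info document (the compact_running field)
curl http://admin:password@localhost:5984/mydb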