OrientDB: RIDs assigned once and forever?

Imagine the following:
a newly created node gets for example the RID #19:2
some time later that node #19:2 gets deleted (so the RID would theoretically be available again)
Now my question is: is there a possibility/risk that the RID #19:2 could be assigned again to another newly created node, or can I be sure that it won't ever be assigned again?

Luca Garulli (Founder and CEO of OrientDB LTD) wrote here on SO:
the RID (Record ID) is never recycled.
But it would probably be wise in general to regard it as an internal "implementation detail" whenever possible, if only because #rid values might not survive export/import. (For this reason, I think it would be nice to have a shorthand for SELECT FROM <class> WHERE <id> = <value>)
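As a sketch of that approach (the Person class, its uid property and the connection credentials below are invented for the example, and the driver shown is pyorient), you can give records a unique business key of your own and query by it instead of by #rid:

    import pyorient

    client = pyorient.OrientDB("localhost", 2424)
    client.db_open("mydb", "admin", "admin")

    # Give the class its own stable key and index it, so lookups do not
    # depend on the physical record id at all.
    client.command("CREATE CLASS Person EXTENDS V")
    client.command("CREATE PROPERTY Person.uid STRING")
    client.command("CREATE INDEX Person.uid UNIQUE")

    client.command("CREATE VERTEX Person SET uid = 'user-42', name = 'Alice'")

    # This lookup survives export/import, unlike a hard-coded #19:2.
    result = client.query("SELECT FROM Person WHERE uid = 'user-42'")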

Related

How to check the last time a row was selected in PostgreSQL?

Literally what the title says.
I'm checking an old database left by an earlier developer, and apparently, instead of creating a new "Master" table, he created a table that contains constants in the form of JSON. Now, however, I want to check whether a given row is still used, and when it was last used.
During the handover, the developer didn't provide any documentation whatsoever, so I have to work out on my own how things are supposed to work. I want to know because the code is really messy. Also, since I can't seem to find this on Google, it seems worth asking.
You cannot log past events. PostgreSQL does not retain that information.
The best you can do is:
Set log_statement = 'all'
Examine the statements in the log.
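If you prefer to flip the setting from a client session rather than by editing postgresql.conf, a minimal psycopg2 sketch could look like this (the connection string is a placeholder, and ALTER SYSTEM requires superuser rights):

    import psycopg2

    conn = psycopg2.connect("dbname=mydb user=postgres")
    conn.autocommit = True  # ALTER SYSTEM cannot run inside a transaction block
    with conn.cursor() as cur:
        cur.execute("ALTER SYSTEM SET log_statement = 'all'")
        cur.execute("SELECT pg_reload_conf()")  # apply without a server restart
    conn.close()
    # From now on, every statement appears in the server log, so future reads
    # of that table can be found by searching the log.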

Is a SharedIndexInformer's Indexer's ThreadSafeStore ever emptied?

I'm carefully making my way through the Go code in the tools/cache package for what seems like the hundredth time.
From close-reading the code, when you create a new SharedIndexInformer, the following things happen:
SharedIndexInformer creates a new Indexer.
The Indexer so created creates a new ThreadSafeStore internally for holding representations of Kubernetes resources.
SharedIndexInformer.Run() creates a new DeltaFIFO with the new Indexer as its second parameter.
The Indexer supplied to the new DeltaFIFO therefore functions as its KeyLister and its KeyGetter. (This is used to track "prior state" for deletions; i.e. if there's an object in it but the latest sync up with Kubernetes does not contain that object, then we know it has been deleted from the cluster.)
The HandleDeltas function is the one that the underlying controller will call when Kubernetes resources are added, updated or deleted.
The HandleDeltas function will call the Add, Update and Delete methods on the Indexer (and, by extension, on its underlying ThreadSafeStore).
Let's say I write a new controller using a SharedIndexInformer. Let's further say that it is watching for Pods. Let's finally say that there are 10 Pods in the cluster.
Does this mean that, after the SharedIndexInformer's Run method has been called and some requisite amount of time has passed, there will be 10 JSON representations of Pods in the ThreadSafeStore, and, further, that none will be removed from this store until an actual deletion of one of the corresponding Pods occurs in Kubernetes?
Correct, except for the JSON part.
The Store contains native Object structs, deserialized from protobuf messages.
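client-go is Go, but as a rough analogue of that behaviour (this sketch uses the official Python kubernetes client with a plain dict standing in for the ThreadSafeStore; it is not client-go itself), entries are added and updated as events arrive and are only removed on an actual DELETED event:

    from kubernetes import client, config, watch

    config.load_kube_config()          # or load_incluster_config() inside a Pod
    v1 = client.CoreV1Api()

    store = {}                         # stand-in for the informer's store
    for event in watch.Watch().stream(v1.list_pod_for_all_namespaces,
                                      timeout_seconds=60):
        pod = event["object"]          # a V1Pod object, not JSON
        key = f"{pod.metadata.namespace}/{pod.metadata.name}"
        if event["type"] in ("ADDED", "MODIFIED"):
            store[key] = pod           # stays here across updates
        elif event["type"] == "DELETED":
            store.pop(key, None)       # removed only on actual deletion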

Google Cloud Function - Cloud storage object overwritten event

I have a cloud function which is set to be triggered by a cloud storage bucket.
Is there any way to tell whether an event was set off by a newly uploaded object or by an overwritten object?
I tried to console.log out the event object but nothing seems to be indicative of whether or not the object has been overwritten.
I do notice that there is an "overwroteGeneration" attribute in Cloud Pub/Sub Notifications (which the trigger event here is based on), but it's not available in the event my function receives.
As stated by @Doug Stevenson, it looks like there is no easy way to achieve this. Moreover, I have been playing around with metageneration and, according to the example in the documentation (under "You upload a new version of the image"), when an object is overwritten (even when versioning is enabled) it gets its own new generation and metageneration numbers, which is why, as you noted in your comment on the other answer, metageneration = 1 in that scenario as well.
The only workaround I see, and it may not satisfy your specific requirements, is using two Cloud Functions (let's call them funcA and funcB): one (funcA) that identifies object creation with google.storage.object.finalize, and another (funcB) that detects object overwriting with google.storage.object.archive or google.storage.object.delete (depending on whether you are using versioning or not, respectively). In this case, funcA would also be triggered on overwrites, not only on fresh uploads, because the finalize event fires for the new generation too; but depending on your use case, you can use the proximity in time of the create and delete events to detect that they correspond to a single overwrite:
(funcA and funcB log screenshots omitted.)
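A minimal sketch of what the two functions could look like in the Python runtime (the names and payload fields are illustrative; confirm the exact event shape in your own logs), logging just enough to correlate the events:

    # funcA: deployed with --trigger-event google.storage.object.finalize
    def funcA(data, context):
        # Fires for brand-new uploads and for overwrites (a new generation).
        print(f"finalize: {data.get('name')} generation={data.get('generation')}")

    # funcB: deployed with google.storage.object.delete (or .archive when
    # versioning is enabled on the bucket)
    def funcB(data, context):
        # If a finalize for the same object name is logged at nearly the same
        # moment, treat the pair as an overwrite rather than create + delete.
        print(f"delete/archive: {data.get('name')} generation={data.get('generation')}")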
I know this does not exactly solve the question you posted, but I do not think there is any way to identify, using a single Cloud Function, whether an object has been created or overwritten; it looks like you will need one Cloud Function for each of those cases.
In order to determine the nature of the upload (a new file rather than an overwrite of an existing file), you need to use the event data delivered to the function to figure it out.
You'll need to use an Object Finalize event type. As you can see from the documentation there:
This event is sent when a new object is created (or an existing object is overwritten, and a new generation of that object is created) in the bucket.
What's not so clear from that doc is that the metageneration combined with the resourceState property of the event is the indicator.
The documentation here should be clear about how to use metageneration along with resourceState to determine if a change to a bucket is a new file or a replaced file:
The resourceState attribute should be paired with the 'metageneration' attribute if you want to know if an object was just created. The metageneration attribute is incremented whenever there's a change to the object's metadata. For new objects, the metageneration value is 1.
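As a sketch in the Python runtime (the field names below follow the GCS object resource; treat the exact payload shape as an assumption and log the event once in your own function to confirm it):

    def on_object_finalize(data, context):
        name = data.get("name")
        metageneration = str(data.get("metageneration"))

        if metageneration == "1":
            # First metadata generation of this generation: a just-created object.
            print(f"new object: {name}")
        else:
            # Metadata changed on an existing object. Note that a plain overwrite
            # produces a new generation whose metageneration starts at 1 again,
            # so this check alone does not catch every overwrite (see the other
            # answer above).
            print(f"existing object changed: {name}")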

Preventing Deletion with django-simple-history

I started using django-simple-history in order to keep a history of my objects, but when I delete an object (from the admin page, at least) I notice that it is gone for good.
I suppose I could create tags and "hide" objects instead of deleting them in my views, but it would be nice if there were an easier way with django-simple-history, one that would also cover admin operations.
When objects are deleted, that deletion is also recorded in history. The object does not exist anymore, but its history is safe.
If you browse your database, you should find a table named:
[app_name]_historical[model_name]
It contains a line with the last state of the object. That line also contains additional columns: history_id, history_change_reason, history_date, history_type. For a deletion, history_type will be set to "-" (minus sign).
Knowing that, it is possible to revert a deletion programmatically, but not through the Django Admin. Have a look at django-simple-history documentation for details on how to do it programmatically.
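For illustration, a minimal sketch of such a programmatic revert (the Poll model and the primary key are hypothetical; the model is assumed to be registered with HistoricalRecords):

    from myapp.models import Poll  # hypothetical model using HistoricalRecords()

    # The most recent history row for the deleted object; "-" marks a deletion.
    last = Poll.history.filter(id=42).order_by("-history_date").first()
    if last is not None and last.history_type == "-":
        restored = last.instance   # rebuild a model instance from the snapshot
        restored.save()            # re-creates the row (and logs a new history entry)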
Hope that helps!

PostgreSQL transaction variables

This question is sort of a follow-up to this question, but it's a different enough topic that I feel it merits its own discussion. For a bit of background, you can refer to it.
As part of a new file importing system, I am building an audit system based on this wiki page. One of the things I would like to include in the audit trail is the name of the file the data came from (these files are archived for long-term storage, so if there are questions I can always go back).
One way I could go about it is to create an import_batch record, record the name of the file there, and then just stamp records when they are updated. That is the path I'm going down, but it feels a bit clunky. I've been pondering the idea of having the audit trigger get the import_batch_id without it having to be in the NEW.* record. It seems to me there are at least a couple of ways I might be able to accomplish this.
I could have a function that creates a temp table and stores any information I want in it (such as batch # or file name or whatever). This seems pretty clean, and as I understand it the table would only live for the duration of the transaction and wouldn't have to worry about naming collisions: each transaction would have its own temp table named "tmp_import_info".
If I only care about the import_batch_id (which is backed by a sequence), I could probably just get the current value of the sequence. I'm not 100% sure how this would behave in a multi-user setting, though. I would think it would be possible for transaction #1 to create import_batch_id #222 and then transaction #2 to start and get #223, and then my audit trail would record the wrong data.
Are there other options that I'm not seeing here? Is there a way to add a transaction/session variable? Basically, something like pg_settings, but one that allows inserts, updates and deletes of values.
It feels like the best option might be the temp table.
The main good news for variant 2 is, quoting the manual here:
currval
Return the value most recently obtained by nextval for this sequence in the current session. (An error is reported if nextval has never been called for this sequence in this session.) Because this is returning a session-local value, it gives a predictable answer whether or not other sessions have executed nextval since the current session did.
Store your import file names in a table with a serial primary key. You can refer to the last value from the sequence with currval or lastval. Concurrent users cannot interfere. As long as you don't undermine this yourself inside your own transaction (for instance by calling nextval on the same sequence again before reading currval), this is safe.
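A small sketch of that pattern from the application side (psycopg2; the table, the default sequence name import_batch_id_seq and the file name are placeholders assumed here):

    import psycopg2

    # Assumed schema: CREATE TABLE import_batch (id serial PRIMARY KEY, filename text);
    conn = psycopg2.connect("dbname=mydb")
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO import_batch (filename) VALUES (%s) RETURNING id",
            ("2017-03-01_orders.csv",),
        )
        batch_id = cur.fetchone()[0]

        # Session-local: an audit trigger firing later in this same session can
        # call currval('import_batch_id_seq') and get exactly this id, no matter
        # what other sessions do with the sequence in the meantime.
        cur.execute("SELECT currval('import_batch_id_seq')")
        assert cur.fetchone()[0] == batch_id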