Google Cloud Storage transactions?

It does not appear that GCS has any transaction mechanism. Is this correct?
I would like to be able to have a long-lived transaction. For example, it would be great if I could start a transaction and specify an expiration time (if not committed within X time, it automatically gets rolled back). Then I could use this handle to insert objects, compose, delete, etc., and if all goes well, issue an isCommitPossible(), and if yes, then commit().
Is this a possibility?

Object writes are transactional (either the complete object and its metadata are successfully written and the object becomes visible; or it fails without becoming visible). But there's no transaction mechanism spanning multiple GCS operations.
Mike

The Cloud Storage client libraries offer a file-like object to work with, which has an Open() and Close() operation. If a single operation can be transactional then, in theory, it should be possible to open a single "lock file" for the duration of all other operations, only closing it when you're done with all the other files.
In other words, you would have to write your processes to use a "lock file", and in that way you could at least know whether or not all your files were written/read, or whether there was some error. Whenever the next round of operations takes place, it would just look for the existence of the lock file that corresponds to the set of files written (you'd have to arrange your naming, directory layout, etc., to make sense for this). If it exists, you can assume that the file group was written successfully. If it doesn't exist, assume that something happened (or that the process hasn't yet completed).
I have not actually tested this out. But I offer it as an idea for others who might be desperate enough to try.
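For concreteness, here is a rough sketch of that marker-file idea using the Python google-cloud-storage client (the bucket name, object prefix, and payloads below are made up):
from google.cloud import storage
client = storage.Client()
bucket = client.bucket("my-bucket")      # hypothetical bucket
prefix = "batch-0001/"                   # hypothetical group of objects
# Writer: upload every object in the group first...
for name, payload in (("a.csv", b"..."), ("b.csv", b"...")):
    bucket.blob(prefix + name).upload_from_string(payload)
# ...and only then create the marker object that signals the group is complete.
bucket.blob(prefix + "_COMPLETE").upload_from_string(b"")
# Reader: process the group only if the marker exists.
if bucket.blob(prefix + "_COMPLETE").exists():
    for blob in client.list_blobs("my-bucket", prefix=prefix):
        pass  # safe to consume the group
Because each individual upload is atomic (as Mike notes above), the marker can only ever appear after the whole group has been written, though this still gives you no rollback.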

Related

Do Firebase/Firestore Transactions create internal queues?

I'm wondering if transactions (https://firebase.google.com/docs/firestore/manage-data/transactions) are viable tools to use in something like a ticketing system where users may be attempting to read/write to the same collection/document, and whoever makes the request first is handled first, whoever is second is handled second, and so on.
If not, what would be a good structure for such a need with Firestore?
Transactions just guarantee an atomic, consistent update among the documents involved in the transaction. They don't guarantee the order in which those transactions complete, as the transaction handler might get retried in the face of contention.
Since you tagged this question with google-cloud-functions (but didn't mention it in your question), it sounds like you might be considering writing a database trigger to handle incoming writes. Cloud Functions triggers also do not guarantee any ordering when under load.
Ordering of any kind at the scale on which Firestore and other Google Cloud products operate is a really difficult problem to solve (please read that link to get a sense of it). There is no simple database structure that will impose an order on changes as they are made. I suggest you think carefully about your need for ordering and come up with a different solution.
The best indication of order you can get is probably by adding a server timestamp to individual documents, but you will still have to figure out how to process them. The easiest thing might be to have a backend periodically query the collection, ordered by that timestamp, and process things in that order, in batch.
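As a rough illustration with the Python google-cloud-firestore client (the collection and field names are invented, and handle() stands in for whatever your processing is):
from google.cloud import firestore
db = firestore.Client()
# Writer: stamp each request with the server's clock, not the client's.
db.collection("requests").add({"user": "alice", "createdAt": firestore.SERVER_TIMESTAMP})
# Backend worker, run periodically: process requests in timestamp order, in batch.
for doc in db.collection("requests").order_by("createdAt").limit(100).stream():
    handle(doc)             # hypothetical processing function
    doc.reference.delete()  # or mark it as processed instead of deleting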

Hyperledger Fabric: Blockchain consistency checks

I have a few questions on internal consistency checks in HLF. Thanks in advance for any information.
Which part of the HLF system keeps track of the consistency of the blockchain?
If I were to open the HLF ledger file of a peer (on the docker instance) in a binary editor and change it at one place with a random number (thereby breaking the hash or clobbering the header), which part of the system detects this as a problem? And when?
Is such a consistency check done only for the last block while appending the new block? If the change is made to a historic block (i.e., not the last one), when is this detected?
If such a problem is found, does the damaged copy of the blockchain automatically get rebuilt by the peer?
Does a Read also trigger a consistency check (assuming that a Write/Append does)? If a block is damaged and there is no Write for a long time, will intervening Reads report wrong data?

DB2 AS400/IBM ISeries Triggers/On File Change

Looking for best practices to get DELTAs of data over time.
No timestamps available, cannot program timestamps!
GOAL: To get differences in all files for all fields over time. Only the primary key is needed as output. Also, I need this at 15-minute intervals of data changes.
Example:
The customer file has 50 columns/fields; if any field changes, I want another file to record the primary key, or anything that records the occurrence of a change in the customer file.
Issue:
I am not sure if triggers are the way to go since there is a lot of overhead associated with triggers.
Can anyone suggest best practices for DB2 deltas over time with consideration to overhead and performance?
I'm not sure why you think there is a lot of overhead associated with triggers; they are very fast in my experience. But as David suggested, you can journal the files you want to track, then analyze the journal receivers.
To turn on journaling you need to perform three steps:
1. Create a receiver using CRTJRNRCV.
2. Create a journal for the receiver using CRTJRN.
3. Start journaling on the files using STRJRNPF. You will need to keep *BEFORE and *AFTER images to detect a change on update, but you can omit open/close (*OPNCLO) entries to save some space.
Once you do this, you can also use commitment control to manage transactions! But, you will now have to manage those receivers as they use a lot of space. You can do that by using MNGRCV(*SYSTEM) on the CRTJRN command. I suspect that you will want to prevent the system from deleting the old receivers automatically as that could cause you to miss some changes when the system changes receivers. But that means you will have to delete old receivers on your own when you are done with them. I suggest waiting a day or two to delete old receivers. That can be an overnight process.
To read the journal receiver, you will need to use RTVJRNE (Retrieve Journal Entries), which lets you retrieve journal entries into variables, or DSPJRN (Display Journal), which lets you send journal entries to the display, a printer file, or an *OUTFILE. The *OUTFILE can then be read using ODBC or SQL, or however else you want to process it. You can filter the journal entries that you want to receive by file and by type.
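For example, assuming you dumped the entries with something like DSPJRN JRN(MYLIB/MYJRN) OUTPUT(*OUTFILE) OUTFILE(MYLIB/JRNOUT), a Python/pyodbc reader might look roughly like this (the DSN, library, and file names are made up, and the available columns depend on the outfile format you chose):
import pyodbc
conn = pyodbc.connect("DSN=MYISERIES;UID=MYUSER;PWD=secret")  # hypothetical ODBC DSN for the IBM i
cur = conn.cursor()
cur.execute("SELECT * FROM MYLIB.JRNOUT")  # the dumped journal entries
for row in cur.fetchall():
    print(row)  # filter for insert/update/delete entry types and pull out the keys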
Have you looked at journalling the files and evaluating the journal receivers?

Perl share hashmap through file

Currently I have a script that collects data about the specified server. The data is stored inside a hash, which I store into a file for persistence.
If the script is called with another server, it should load the hash from the file, extend the hash with the data from the second server, and then save it back.
I use the Storable module.
use Storable;
# Load the existing hash if the file is already there, otherwise start empty.
my $recordedpkgs = ( -e $MONPKGS_DATA_FILE ) ? retrieve($MONPKGS_DATA_FILE) : {};
# ... merge the data collected from the current server into %$recordedpkgs ...
store $recordedpkgs, $MONPKGS_DATA_FILE;
Obviously there is an access issue if one instance writes while another has already read the file; some data will then be lost.
What would be an ideal solution to that? Basic file locking? Are there better ways to achieve it?
It depends. What you're talking about is inter-process communication, and Perl has a whole documentation section on the subject: perlipc.
But to answer your question directly - yes, file locking is the way to go. It's exactly the tool for the job you describe.
Unfortunately, it's often OS-dependent; Windows and Linux locking semantics are different. Take a look at flock - that's the basic starting point on Unix-based systems. Also take a look at http://www.perlmonks.org/?node_id=7058
It's an advisory lock, where you can request a shared (read) or exclusive (write) lock. And either block (until released), or fail and return if you cannot acquire that lock.
Storable does implement some locking semantics: http://perldoc.perl.org/Storable.html#ADVISORY-LOCKING
But you might find you want to use a lock file if you're doing a read-modify-write cycle on the saved content.
I would just use a basic lock file that is checked before operations are performed on the data file. If the lock file is in place, make your other process either wait and re-check (forever, or a set number of times before giving up), or simply exit with an error.

Syncing objects between two disparate systems, best approach?

I am working on syncing two business objects between an iPhone and a Web site using an XML-based payload and would love to solicit some ideas for an optimal routine.
The nature of this question is fairly generic though and I can see it being applicable to a variety of different systems that need to sync business objects between a web entity and a client (desktop, mobile phone, etc.)
The business objects can be edited, deleted, and updated on both sides. Both sides can store the object locally but the sync is only initiated on the iPhone side for disconnected viewing. All objects have an updated_at and created_at timestamp and are backed by an RDBMS on both sides (SQLite on the iPhone side and MySQL on the web... again I don't think this matters much) and the phone does record the last time a sync was attempted. Otherwise, no other data is stored (at the moment).
What algorithm would you use to minimize network chatter between the systems for syncing? How would you handle deletes if "soft-deletes" are not an option? What data model changes would you add to facilitate this?
The simplest approach: when syncing, transfer all records where updated_at >= #last_sync_at. Downside: this approach doesn't tolerate clock skew very well at all.
It is probably safer to keep a version number column that is incremented each time a row is updated (so that clock skew doesn't foul your sync process) and a last-synced version number (so that potentially conflicting changes can be identified). To make this bandwidth-efficient, keep a cache in each database of the last version sent to each replication peer so that only modified rows need to be transmitted. If this is going to be a star topology, the leaves can use a simplified schema where the last synced version is stored in each table.
Some form of soft-deletes are required in order to support sync of deletes, however this can be in the form of a "tombstone" record which contains only the key of the deleted row. Tombstones can only be safely deleted once you are sure that all replicas have processed them, otherwise it is possible for a straggling replica to resurrect a record you thought was deleted.
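To make that concrete, here is a rough sketch of the version-number-plus-tombstone idea using SQLite (only because the question mentions SQLite on the phone side); every table and column name below is an assumption:
import sqlite3
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE items (
        id INTEGER PRIMARY KEY,
        payload TEXT,
        version INTEGER NOT NULL,            -- bumped on every local edit
        deleted INTEGER NOT NULL DEFAULT 0   -- tombstone flag instead of a hard delete
    );
    CREATE TABLE sync_state (
        peer TEXT PRIMARY KEY,
        last_sent_version INTEGER NOT NULL   -- highest version already shipped to this peer
    );
""")
def changes_for(peer):
    """Rows (including tombstones) modified since the last sync with this peer."""
    row = db.execute("SELECT last_sent_version FROM sync_state WHERE peer = ?", (peer,)).fetchone()
    last = row[0] if row else 0
    return db.execute("SELECT id, payload, version, deleted FROM items WHERE version > ?", (last,)).fetchall()
After a successful sync you would bump last_sent_version for that peer to the highest version you shipped, and purge tombstones only once every peer has seen them.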
So, in summary, your questions relate to disconnected synchronization.
Here is what I think should happen:
Initial sync: You retrieve the data and any information associated with it (row versions, file checksums, etc.). It is important that you store this information and leave it pristine until the next successful sync. Changes should be made on a COPY of this data.
Tracking changes: If you are dealing with database rows, you basically have to track insert, update, and delete operations. If you are dealing with text files like XML, it's slightly more complicated. If it is likely that multiple users will edit the same file at the same time, you will need a diff tool so conflicts can be detected at a more granular level (instead of at the whole-file level).
Checking for conflicts: Again, if you are just dealing with database rows, conflicts are easy to detect. You can have another column that is incremented whenever the row is updated (I think MSSQL has this built in; not sure about MySQL). If the copy you have carries a different number than what's on the server, you have a conflict. For files or strings, a checksum will do the job. You could also use the modified date, but make sure you have a very precise and accurate measurement to prevent misses. For example: say I retrieve a file and you save a new version one millisecond later; I then make changes to my copy and try to save it. If the recorded last-modified time is accurate only to 10 milliseconds, there is a good chance that the file I retrieved has the same modified date as the one you saved, so the program thinks there is no conflict and overwrites your changes. So I generally don't use that method, just to be on the safe side. On the other hand, the chance of a checksum/hash collision after a minor modification is close to none.
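A tiny sketch of the checksum comparison described above, with made-up payloads (SHA-256 here, but any stable hash works):
import hashlib
def checksum(data):
    """Stable fingerprint of an object's content."""
    return hashlib.sha256(data).hexdigest()
# Pretend these are the copy saved at the last sync and the server's copy now.
copy_at_last_sync = b"<customer><name>Bob</name></customer>"
server_copy_now = b"<customer><name>Robert</name></customer>"
if checksum(server_copy_now) != checksum(copy_at_last_sync):
    print("server copy changed since our last sync: potential conflict")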
Resolving conflicts: Now this is the tricky part. If this is an automated process, you have to assess the situation and decide whether to overwrite the other changes, lose your own changes, or retrieve the data from the server again and attempt to redo the changes. Luckily for you, it seems there will be human interaction, but it's still a lot of pain to code. If you are dealing with database rows, you can check each individual column, compare it against the data on the server, and present the differences to the user. The idea is to present conflicts to the user in a very granular way so as not to overwhelm them; most conflicts consist of very small differences in many different places, so present them one small difference at a time. For text files it's almost the same, only a hundred times more complicated: you would have to create or use a diff tool (text comparison is a whole different subject, too broad to cover here) that reports the small changes in the file and where they are, in a similar fashion to the database case: where text was inserted, deleted, or edited. Then present that to the user in the same way, so that for each small conflict the user can choose whether to discard their changes, overwrite the changes on the server, or perform a manual edit before sending to the server.
If you have done things right, the user should be given a list of conflicts, if there are any, granular enough to decide on quickly. For example, if the conflict is a spelling change, it is easier for the user to choose between the two spellings of a single word than to be shown the whole paragraph, told that something changed, and left to hunt for the small difference themselves.
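For the word-level presentation described above, Python's standard difflib module gives you most of what you need; the two strings here are invented local and server copies:
import difflib
local = "please send the reciept to the customer".split()
server = "please send the receipt to the customer".split()
# ndiff yields word-by-word differences; show only the changed words.
for change in difflib.ndiff(local, server):
    if change.startswith(("-", "+")):
        print(change)  # prints "- reciept" and "+ receipt"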
Other considerations:
Data Validation - keep in mind that you have to perform validation after resolving conflicts, since the data might have changed.
Text Comparison - like I said, this is a big subject, so google it!
Disconnected Synchronization - I think there are a few articles out there.
Source: https://softwareengineering.stackexchange.com/questions/94634/synchronization-web-service-methodologies-or-papers