Is it possible to get the generated key using Apache Beam JdbcIO.Write?

If I call a stored procedure using JdbcIO.Write is it possible to capture the ID (primary key) if the stored procedure returns this data?
public JdbcIO.Write<MyObject> writeMyObject() {
    final String UPSERT_MY_OBJECT = "EXEC [MySchema].[UspertMyObject] ?,?,?";
    // If my stored procedure returns the generated or existing ID
    // is it possible to update the object I'm writing with the ID?
    return JdbcIO.<MyObject>write()
        .withDataSourceConfiguration(myDataSourceConfig)
        .withStatement(UPSERT_MY_OBJECT)
        .withPreparedStatementSetter((JdbcIO.PreparedStatementSetter<MyObject>) (myObject, ps) -> {
            ps.setInt(1, myObject.getFieldOne());
            ps.setString(2, myObject.getFieldTwo());
            ps.setString(3, myObject.getFieldThree());
        });
}

I don't think it's possible, but as a workaround you can wait for the write to finish (with the Wait transform) and then read the generated keys back from the database.
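A rough sketch of that workaround, assuming a Beam SDK recent enough to have JdbcIO.Write.withResults(); myObjects, businessKeys, the read-back query and the column names are placeholders rather than anything from the question:

import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.VarIntCoder;
import org.apache.beam.sdk.io.jdbc.JdbcIO;
import org.apache.beam.sdk.transforms.Wait;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

// Turn the write into a signal with withResults(), wait on it, then read the IDs back.
PCollection<Void> writeDone = myObjects.apply("UpsertMyObjects",
    JdbcIO.<MyObject>write()
        .withDataSourceConfiguration(myDataSourceConfig)
        .withStatement(UPSERT_MY_OBJECT)
        .withPreparedStatementSetter((JdbcIO.PreparedStatementSetter<MyObject>) (myObject, ps) -> {
            ps.setInt(1, myObject.getFieldOne());
            ps.setString(2, myObject.getFieldTwo());
            ps.setString(3, myObject.getFieldThree());
        })
        .withResults()); // emits a PCollection<Void> signal instead of PDone

PCollection<KV<Integer, Integer>> idsAfterWrite = businessKeys
    .apply(Wait.on(writeDone)) // holds the keys until the upserts have completed
    .apply(JdbcIO.<Integer, KV<Integer, Integer>>readAll()
        .withDataSourceConfiguration(myDataSourceConfig)
        .withQuery("SELECT field_one, id FROM MyTable WHERE field_one = ?")
        .withParameterSetter((key, ps) -> ps.setInt(1, key))
        .withRowMapper(rs -> KV.of(rs.getInt(1), rs.getInt(2)))
        .withCoder(KvCoder.of(VarIntCoder.of(), VarIntCoder.of())));

Wait.on ensures the read-back only runs after the writes have been committed, so the IDs generated by the stored procedure are visible by then.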

Related

Google Cloud Storage atomic creation of a Blob

I'm using the hadoop-connectors project for writing BLOBs to Google Cloud Storage.
I'd like to make sure that a BLOB with a specific target name that is being written in a concurrent context is either written in full or not visible at all in case an exception occurs while writing.
In the code below, if an I/O exception occurs, the BLOB written will still appear on GCS because the stream is closed in the finally block:
val stream = fs.create(path, overwrite)
try {
  actions.map(_ + "\n").map(_.getBytes(UTF_8)).foreach(stream.write)
} finally {
  stream.close()
}
The other possibility would be to not close the stream and let it "leak" so that the BLOB does not get created. However, this is not really a valid option.
val stream = fs.create(path, overwrite)
actions.map(_ + "\n").map(_.getBytes(UTF_8)).foreach(stream.write)
stream.close()
Can anybody share with me a recipe on how to write to GCS a BLOB either with hadoop-connectors or cloud storage client in an atomic fashion?
I have used reflection within hadoop-connectors to retrieve an instance of com.google.api.services.storage.Storage from the GoogleHadoopFileSystem instance
GoogleCloudStorage googleCloudStorage = ghfs.getGcsFs().getGcs();
Field gcsField = googleCloudStorage.getClass().getDeclaredField("gcs");
gcsField.setAccessible(true);
Storage gcs = (Storage) gcsField.get(googleCloudStorage);
in order to be able to make a call based on an input stream over the data held in memory.
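For context, a brief sketch (with a placeholder bucket URI) of how the GoogleHadoopFileSystem instance used here might be obtained from a Hadoop Configuration:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem;

// Assumes the gcs-connector is on the classpath and credentials are configured
// through the usual fs.gs.* settings or application default credentials.
Configuration conf = new Configuration();
GoogleHadoopFileSystem ghfs =
    (GoogleHadoopFileSystem) FileSystem.get(URI.create("gs://my-bucket/"), conf);

The createBlob helper below then uses the Storage handle obtained via reflection to perform the conditional insert.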
private static StorageObject createBlob(URI blobPath, byte[] content, GoogleHadoopFileSystem ghfs, Storage gcs)
        throws IOException
{
    CreateFileOptions createFileOptions = new CreateFileOptions(false);
    CreateObjectOptions createObjectOptions = objectOptionsFromFileOptions(createFileOptions);
    PathCodec pathCodec = ghfs.getGcsFs().getOptions().getPathCodec();
    StorageResourceId storageResourceId = pathCodec.validatePathAndGetId(blobPath, false);
    StorageObject object =
        new StorageObject()
            .setContentEncoding(createObjectOptions.getContentEncoding())
            .setMetadata(encodeMetadata(createObjectOptions.getMetadata()))
            .setName(storageResourceId.getObjectName());
    InputStream inputStream = new ByteArrayInputStream(content, 0, content.length);
    Storage.Objects.Insert insert = gcs.objects().insert(
        storageResourceId.getBucketName(),
        object,
        new InputStreamContent(createObjectOptions.getContentType(), inputStream));
    // The operation succeeds only if there are no live versions of the blob.
    insert.setIfGenerationMatch(0L);
    insert.getMediaHttpUploader().setDirectUploadEnabled(true);
    insert.setName(storageResourceId.getObjectName());
    return insert.execute();
}
/**
* Helper for converting from a Map<String, byte[]> metadata map that may be in a
* StorageObject into a Map<String, String> suitable for placement inside a
* GoogleCloudStorageItemInfo.
*/
@VisibleForTesting
static Map<String, String> encodeMetadata(Map<String, byte[]> metadata) {
    return Maps.transformValues(metadata, QuickstartParallelApiWriteExample::encodeMetadataValues);
}

// A function to encode metadata map values
private static String encodeMetadataValues(byte[] bytes) {
    return bytes == null ? Data.NULL_STRING : BaseEncoding.base64().encode(bytes);
}
Note that in the example above, even if there are multiple callers trying to create a blob with the same name in parallel, ONE and only ONE will succeed in creating the blob; the other callers will receive 412 Precondition Failed.
GCS objects (blobs) are immutable [1], which means they can be created, deleted or replaced, but not appended.
The Hadoop GCS connector provides the HCFS interface, which gives the illusion of appendable files, but under the hood it is just a single blob creation. GCS doesn't know whether the content is complete from the application's perspective, just as you mentioned in your example, and there is no way to cancel a file creation.
There are 2 options you can consider:
1. Create a temp blob/file, copy it to the final blob/file, then delete the temp blob/file, see [2]. Note that there is no atomic rename operation in GCS; rename is implemented as copy-then-delete.
2. If your data fits into memory, first read up the stream and buffer the bytes in memory, then create the blob/file, see [3].
The GCS connector should also work with the two options above, but I think the GCS client library gives you more control.
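As a minimal sketch of the second option, using the Cloud Storage client library (com.google.cloud:google-cloud-storage) instead of hadoop-connectors; the bucket and object names are placeholders:

import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageException;
import com.google.cloud.storage.StorageOptions;
import java.nio.charset.StandardCharsets;

public class AtomicBlobCreateExample {
    public static void main(String[] args) {
        Storage storage = StorageOptions.getDefaultInstance().getService();
        byte[] content = "line1\nline2\n".getBytes(StandardCharsets.UTF_8);
        BlobInfo blobInfo = BlobInfo.newBuilder(BlobId.of("my-bucket", "path/to/blob.txt")).build();
        try {
            // doesNotExist() adds an ifGenerationMatch=0 precondition, so the upload
            // only succeeds if no live version of the object exists yet.
            storage.create(blobInfo, content, Storage.BlobTargetOption.doesNotExist());
        } catch (StorageException e) {
            if (e.getCode() == 412) {
                // Another writer created the blob first (Precondition Failed).
            } else {
                throw e;
            }
        }
    }
}

Since the whole byte[] is passed in a single call, the blob either becomes visible with its full content or, on failure, does not appear at all, matching the behaviour of the reflection-based example above.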

How do I confirm I am reading the data from Mongo secondary server from Java

For performance optimisation we are trying to read data from a Mongo secondary server in selected scenarios. I am reading the data using withReadPreference(ReadPreference.secondaryPreferred()); please find the code snippet below.
What I want to confirm is that the data returned after executing the highlighted query is actually coming from a secondary server. Is there any method available to check this from Java or Spring Boot?
public User read(final String userId) {
    final ObjectId objectId = new ObjectId(userId);
    final User user = collection.withReadPreference(ReadPreference.secondaryPreferred()).findOne(objectId).as(User.class);
    return user;
}
Pretty much the same way in Java. Note we use secondary(), not secondaryPreferred(); this guarantees reads from a secondary ONLY:
import com.mongodb.ReadPreference;
{
    // This is your "regular" primaryPreferred collection:
    MongoCollection<BsonDocument> tcoll = db.getCollection("myCollection", BsonDocument.class);
    // ... various operations on tcoll, then create a new
    // handle that FORCES reads from secondary and will timeout and
    // fail if no secondary can be found:
    MongoCollection<BsonDocument> xcoll = tcoll.withReadPreference(ReadPreference.secondary());
    BsonDocument f7 = xcoll.find(queryExpr).first();
}
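If you also want to verify at runtime which server actually served a read, one option (a sketch only, using the driver's command monitoring API; the connection string and host names are placeholders) is to register a CommandListener that logs the target server of each command:

import com.mongodb.ConnectionString;
import com.mongodb.MongoClientSettings;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.event.CommandFailedEvent;
import com.mongodb.event.CommandListener;
import com.mongodb.event.CommandStartedEvent;
import com.mongodb.event.CommandSucceededEvent;
{
    // Logs which server each command is routed to, so you can confirm
    // that finds are going to a secondary.
    CommandListener serverLogger = new CommandListener() {
        @Override
        public void commandStarted(CommandStartedEvent event) {
            System.out.println(event.getCommandName() + " -> "
                + event.getConnectionDescription().getServerAddress());
        }
        @Override
        public void commandSucceeded(CommandSucceededEvent event) { }
        @Override
        public void commandFailed(CommandFailedEvent event) { }
    };
    MongoClientSettings settings = MongoClientSettings.builder()
        .applyConnectionString(new ConnectionString("mongodb://host1,host2/?replicaSet=rs0"))
        .addCommandListener(serverLogger)
        .build();
    MongoClient client = MongoClients.create(settings);
    // Reads done through 'client' now log the address they were sent to,
    // which you can compare against the replica set's secondaries.
}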

Cannot drop Firebird table when using multiple connections

I would like to safely drop a Firebird table. I have 3 transactions: one to recreate the table, one to do something with the table (just inserting a single row to keep it simple), and the last one to drop the table.
If all these transactions are executed on a single connection, this works. If I use a different connection for each, the drop command fails with
lock conflict on no wait transaction
unsuccessful metadata update
object TABLE "DEMO" is in use
private static void Test() {
    using var conn1 = new FbConnection(ConnectionString);
    using var conn2 = new FbConnection(ConnectionString);
    using var conn3 = new FbConnection(ConnectionString);
    conn1.Open();
    conn2.Open();
    conn3.Open();
    ExecuteTxn(conn1, cmd => {
        cmd.CommandText = "recreate table demo (id int primary key)";
        cmd.ExecuteNonQuery();
    });
    ExecuteTxn(conn2, cmd => {
        cmd.CommandText = "insert into demo (id) values (1)";
        cmd.ExecuteNonQuery();
    });
    ExecuteTxn(conn3, cmd => {
        cmd.CommandText = "drop table demo";
        cmd.ExecuteNonQuery();
    });
}

private static void ExecuteTxn(FbConnection conn, Action<FbCommand> todo) {
    using (var txn = conn.BeginTransaction())
    using (var cmd = conn.CreateCommand()) {
        cmd.Transaction = txn;
        todo(cmd);
        txn.Commit();
    }
}
I realized that changing the transaction options to
txn = conn.BeginTransaction(new FbTransactionOptions { TransactionBehavior = FbTransactionBehavior.Wait })
seems to help, but I'm not sure if this is the right thing to do or just a coincidence...
Using Firebird 3.0.6, FirebirdSql.Data.FirebirdClient.dll 7.5.0.0
As far as I understand it, the problem has to do with how Firebird caches certain metadata, which might result in existence locks being retained, which will prevent deletion of the object. In addition, it is possible - this is a guess! - that the Firebird ADO.net provider retains the statement handle with the insert statement prepared, which will also result in an existence lock being retained.
Executing in a WAIT transaction (optionally with a timeout) is considered an appropriate workaround by the Firebird core developers.
For reference, see the following tickets:
CORE-3766 - Transaction can`t change metadata if it is run in no_wait and there is another connect that once had queried these metadata
CORE-6382 - Triggers accessing a table prevent concurrent DDL command from dropping that table
In certain cases, switching from Firebird ClassicServer or Firebird SuperClassic to Firebird SuperServer can also prevent this problem.
However, if you want a more in-depth explanation, it might be worthwhile to ask this question on the firebird-devel mailing list.

Clearing the update log on postgresql DB

What I am trying to do is call a function whenever there is an update to the DB. I use the committed transaction information (pg_xact_commit_timestamp) to get the status of DB updates.
pg_xact_commit_timestamp(xid)
SELECT pg_xact_commit_timestamp(xmin) ts FROM "TABLE_NAME" WHERE pg_xact_commit_timestamp(xmin) IS NOT NULL;
The problem: after the first round of updates to the DB, pg_xact_commit_timestamp(xmin) is never empty again. Is there a way, once the function has executed, to clear the commit timestamp list so it is empty again?
My requirement: to execute the function only when there is a new update, without any information about previous updates.
I understand that this is an overuse of a built-in function.
E.g.
Connection DBconn = DBConnection.connect(url, user, password);
Boolean DBStatus = checkDBStatus(DBconn, configuration);
if (DBStatus) {
    System.out.println("DB updated recently");
    feedbackStatusJSON = runBusinessLogic();
} else {
    System.out.println("DB not updated");
}
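A minimal sketch of what a checkDBStatus helper could look like under these assumptions: track_commit_timestamp is enabled, and instead of clearing anything the application remembers the newest commit timestamp it has already processed (lastSeenTs, an invented parameter that replaces the configuration argument above):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;

// Returns true only if there is a commit newer than the last one handled.
static boolean checkDBStatus(Connection conn, Timestamp lastSeenTs) throws SQLException {
    String sql = "SELECT max(pg_xact_commit_timestamp(xmin)) FROM \"TABLE_NAME\"";
    try (PreparedStatement ps = conn.prepareStatement(sql);
         ResultSet rs = ps.executeQuery()) {
        if (rs.next()) {
            Timestamp latest = rs.getTimestamp(1);
            return latest != null && (lastSeenTs == null || latest.after(lastSeenTs));
        }
        return false;
    }
}

Persisting lastSeenTs between runs gives the "react only to new updates" behaviour without having to reset the commit timestamp information itself.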

c# entity framework savechangesasync saves new record but returns 0

Entity Framework: 6.1.3.
I have a function that reads a simple table for a record and either updates it or first creates a new entity. Either way it then calls AddOrUpdate and SaveChangesAsync. This function has worked for quite some time without any apparent problem.
In my current situation, however, I'm getting a return value of 0 from SaveChangesAsync. I have a breakpoint just before the save and verified that the record doesn't exist. I step through the code and, as expected, a new entity is created. The curious part is that the record is now in the table as desired. If I understand the documentation, 0 should indicate that nothing was written out.
I'm not using transactions for this operation. Other database operations including writes would have already occurred on the context prior to this function being called, however, they should all have been committed.
So how can I get a return of 0 and still have something written out?
Here is a slightly reduced code fragment:
var settings = OrganizationDb.Settings;
var setting = await settings.FirstOrDefaultAsync(x => x.KeyName == key).ConfigureAwait(false);
if (setting == null)
{
    setting = new Setting()
    {
        KeyName = key,
    };
}
setting.Value = value;
settings.AddOrUpdate(setting);
if (await OrganizationDb.SaveChangesAsync().ConfigureAwait(false) == 0)
{
    //// error handling - record not written out.
}