Connect Cadence to Azure Cosmos DB Cassandra API (Cadence workflow) - cadence-workflow

I am running Cadence with Cassandra hosted externally, started with docker run -e CASSANDRA_SEEDS=10.x.x.x ... ubercadence/server:..., and it is running successfully.
The Azure Cosmos DB documentation says that any system running on Cassandra can use the Cosmos DB Cassandra API by modifying the client connection creation code. For example, here is the Go app sample code:
func GetSession(cosmosCassandraContactPoint, cosmosCassandraPort, cosmosCassandraUser, cosmosCassandraPassword string) *gocql.Session {
    clusterConfig := gocql.NewCluster(cosmosCassandraContactPoint)
    port, err := strconv.Atoi(cosmosCassandraPort)
    if err != nil {
        log.Fatalf("invalid port %q: %v", cosmosCassandraPort, err)
    }
    clusterConfig.Authenticator = gocql.PasswordAuthenticator{Username: cosmosCassandraUser, Password: cosmosCassandraPassword}
    clusterConfig.Port = port
    // The Cosmos DB Cassandra API requires TLS.
    clusterConfig.SslOpts = &gocql.SslOptions{Config: &tls.Config{MinVersion: tls.VersionTLS12}}
    clusterConfig.ProtoVersion = 4
    session, err := clusterConfig.CreateSession()
    if err != nil {
        log.Fatalf("failed to create session: %v", err)
    }
    ...
    return session
}
On my end, I can connect the external Cassandra's cqlsh (the Cassandra instance Cadence uses for persistence) to Azure Cosmos DB and can create a keyspace and tables there. However, when I run the Cadence server, all new tables are still created on the local Cassandra itself (instead of Azure Cosmos DB), presumably because Cadence is connected only to Cassandra.
So there are basically two questions:
1. Since Cadence is written in Go, can we modify the source code to establish the connection to Azure Cosmos DB? Or
2. can we pass the Cosmos Cassandra host, port, username, and password while running Cassandra and Cadence separately (docker run -e CASSANDRA_SEEDS=10.x.x.x ... ubercadence/server:...), as sketched below?
cosmosCassandraContactPoint: xyz.cassandra.cosmos.azure.com
cosmosCassandraPort: 10350
cosmosCassandraUser: xyz
cosmosCassandraPassword: xyz
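If the image's configuration template supports overriding the Cassandra connection settings through environment variables, the second option might look roughly like this. The variable names CASSANDRA_PORT, CASSANDRA_USER and CASSANDRA_PASSWORD (and any TLS switch) are assumptions that depend on the image version, so check docker/config_template.yaml in the Cadence repository before relying on them:
docker run -e CASSANDRA_SEEDS=xyz.cassandra.cosmos.azure.com \
    -e CASSANDRA_PORT=10350 \
    -e CASSANDRA_USER=xyz \
    -e CASSANDRA_PASSWORD=xyz \
    ubercadence/server:<tag>
If the template does not expose these variables, the alternative is to mount a modified config file (or rebuild the image) with the Cosmos endpoint, credentials and TLS enabled.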

It's really exciting that the Azure Cosmos DB Cassandra API now supports LWT!
I took a quick look at the docs.
I think it may not work directly, because Cosmos DB doesn't support LoggedBatch, and Cadence uses logged batches.
I think it's probably okay to use unlogged batches in Cadence, because all the operations are always within a single partition.
From the DataStax docs:
Single partition batch operations are atomic automatically, while multiple partition batch operations require the use of a batchlog to ensure atomicity.
This means using an unlogged batch should behave the same for Cadence
(though I believe Cadence chose logged batches to be safe).
It should work if we change the code slightly in the Cadence Cassandra plugin.
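For illustration, a minimal sketch of the kind of change this would be — switching the batch type where batches are created with gocql. The helper function here is hypothetical, not Cadence's actual code; it only shows the one-line difference (and assumes the github.com/gocql/gocql import):
// createBatch illustrates swapping the batch type. Single-partition unlogged
// batches are still atomic, so behaviour should be unchanged for Cadence.
func createBatch(session *gocql.Session) *gocql.Batch {
    // batch := session.NewBatch(gocql.LoggedBatch)   // what Cadence does today
    batch := session.NewBatch(gocql.UnloggedBatch)    // Cosmos DB-friendly variant
    return batch
}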

Related

Azure Data Factory not Using Data Flow Runtime

I have an Azure Data Factory with a pipeline that I'm using to pick up data from an on-premise database and copy to CosmosDB in the cloud. I'm using a data flow step at the end to delete documents that don't exist in the source from the sink.
I have 3 integration runtimes set up:
AutoResolveIntegrationRuntime (default set up by Azure)
Self hosted integration runtime (I set this up to connect to the on-premise database so it's used by the source dataset)
Data flow integration runtime (I set this up to be used by the data flow step with a TTL setting)
The issue I'm seeing is that when I trigger the pipeline, the AutoResolveIntegrationRuntime is the one being used, so I'm not getting the optimisation that I need from the data flow integration runtime with the TTL.
Any thoughts on what might be going wrong here?
In my experience, only the AutoResolveIntegrationRuntime (the default set up by Azure) supports that optimization.
When we choose to run the data flow on a non-default integration runtime, the optimization isn't available, and once an integration runtime has been created, we can't change its settings.
The Data Factory documentation doesn't say much about this. When I ran the pipeline, I found that the data flow runtime isn't used:
no matter which integration runtime you used to connect to the dataset, the data flow will always use the default Azure integration runtime.
A self-hosted integration runtime (SHIR) doesn't support data flow execution.

MDS import data queue

I am following this guidance: https://www.mssqltips.com/sqlservertutorial/3806/sql-server-master-data-services-importing-data/
The instructions say that after we load data into the staging tables, we go into the MDS integration screen and select "START BATCHES".
Is this a manual step to begin the process, or how do I automatically queue up a batch to begin?
Thanks!
Alternative way to run the staging process
After you load the staging table with the required data, call/execute the staging UDP.
Basically, staging UDPs are separate stored procedures for every entity in the MDS database (automatically created by MDS) that follow the naming convention:
stg.udp_<EntityName>_Leaf
You have to provide values for some parameters. Here is sample code showing how to call them.
USE [MDS_DATABASE_NAME]
GO
EXEC [stg].[udp_entityname_Leaf]
    @VersionName = N'VERSION_1',
    @LogFlag = 1,
    @BatchTag = N'batch1',
    @UserName = N'domain\user'
GO
For more details look at:
Staging Stored Procedure (Master Data Services).
Do remember that the @BatchTag value has to match the value that you initially populated in the staging table.
Automating the Staging process
The simplest way for you to do that would be to schedule a job in SQL Agent which would execute something like the code above to call the staging UDP.
Please note that you would need to get creative about figuring out how the Job will know the correct Batch Tag.
That said, a lot of developers just create a single SSIS Package which does the Loading of data in the Staging table (as step 1) and then Executes the Staging UDP (as the final step).
This SSIS package is then executed through a scheduled SQL Agent job.
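As a rough sketch of the SQL Agent approach (the job name, database name, entity and batch tag are placeholders), the job could be created like this:
USE [msdb]
GO
EXEC dbo.sp_add_job @job_name = N'MDS staging - MyEntity';
EXEC dbo.sp_add_jobstep @job_name = N'MDS staging - MyEntity',
    @step_name = N'Run staging UDP',
    @subsystem = N'TSQL',
    @database_name = N'MDS_DATABASE_NAME',
    @command = N'EXEC [stg].[udp_entityname_Leaf] @VersionName = N''VERSION_1'', @LogFlag = 1, @BatchTag = N''batch1'', @UserName = N''domain\user'';';
EXEC dbo.sp_add_jobserver @job_name = N'MDS staging - MyEntity';
GO
A schedule can then be attached with sp_add_jobschedule, or the same EXEC call can be made from the final step of an SSIS package as described above.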

Newbie help - how to connect to AWS Redshift cluster (currently using Aginity)

(I'm afraid I'm probably about to reveal myself as completely unfit for the task at hand!)
I'm trying to set up a Redshift cluster and database to help manage data for a class/group project.
I have a dc2.large cluster running with either default options or what looked like the most generic choice in the couple of places where I was forced to make entries.
I have downloaded Aginity (Win64) as it is described as being specialized for Redshift. That said, I can't find any instructions for connecting using it. The connection dialog requests the following:
Server: using the endpoint for my cluster (less :57xx at the end).
UserID: the Master username for the database defined for the cluster.
Password: to match the UserID
SSL Mode (Disable, Allow, Prefer, Require): trying various options
Database: as named in cluster setup
Port: as defined in cluster setup
I can't get it to connect ("failed to establish connection") and don't know if I'm entering something wrong in Aginity or if I haven't set up my cluster properly.
Message: Failed to establish a connection to 'abc1234-smtm.crone7m2jcwv.us-east-1.redshift.amazonaws.com'.
Type : Npgsql.NpgsqlException
Source : Npgsql
Trace : at Npgsql.NpgsqlClosedState.Open(NpgsqlConnector context, Int32 timeout)
at Npgsql.NpgsqlConnector.Open()
at Npgsql.NpgsqlConnection.Open()
at Aginity.MPP.Common.BaseDataProvider.get_Connection()
at Aginity.MPP.Common.BaseDataProvider.CreateCommand(String commandText, CommandType commandType, IDataParameter[] commandParams)
at Aginity.MPP.Common.BaseDataProvider.ExecuteReader(String commandText, CommandType commandType, IDataParameter[] commandParams)
--- Inner Exception: ---
......
It seems there is not enough information going into Aginity to authorize a connection to my cluster - no account credentials are supplied. For UserID, am I meant to enter the ID of a valid user? Can I use the root account? What would the ID look like? I have set up a user with FullAccess to S3 and Redshift, then entered the UserID in this format
arn:aws:iam::600123456789:user/john
along with the matching password, but that hasn't worked either.
The only training/tutorial I have been able to find/do on this is the intro that AWS directs you to, at https://qwiklabs.com/focuses/2366, which uses a web-based client (pgweb) that I can't find outside of the tutorial.
Any advice what I am doing wrong, and how to do it right?
Well, I think I got it working - I haven't had a chance to see if I can actually create tables yet, but it seems to be connected. I had to allow inbound traffic from outside the VPC (an inbound rule on the cluster's security group).
I'm guessing there's a better way than opening it up to all IP addresses, but I don't know the users' (fellow team members) IPs, and aren't they all subject to change depending on the device they're using to connect?
How does one go about getting inside the VPC to connect that way, presumably more securely?
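Rather than opening the cluster to all addresses (0.0.0.0/0), the usual approach is to authorize only specific CIDR ranges in the cluster's security group, and to put teammates with changing IPs behind a VPN or a bastion host inside the VPC. A rough sketch with placeholder values, using the AWS CLI:
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 5439 \
    --cidr 203.0.113.25/32
The port should match whatever port the cluster was created with; the security group ID and CIDR here are only examples.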

Hosting the database separately for Meteor apps

It seems to be a common and safer practice to host the database separately from Meteor apps. That is to say, have an EC2 instance for your Meteor app, and an EC2 instance for your MongoDB, and make them talk to one another.
From what I understand, people do this because it's more secure, and it allows them to deploy newer versions of their app without touching the database.
I'd like to do this with Amazon EC2 alone, as opposed to using another 3rd party service, like Compose.io.
How can I host a Meteor app and its database separately on two EC2 instances, and have them communicate with one another?
It is common practice, and people mostly do it because it offers you the ability to scale them both independently.
As to the how, you'll want to obviously configure each of your Amazon EC2 instances, installing meteor on one, and MongoDB on the other. You'll also need to configure your VPC (Amazon Virtual Private Cloud) so that your MongoDB instance accepts incoming connections on whatever port you specify (default is 27017), so that your Meteor Application can connect.
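One thing to check on the database instance itself: depending on the MongoDB version and package, mongod may listen only on localhost by default, so it also needs to be bound to the instance's private address. A minimal sketch of the relevant part of /etc/mongod.conf (the private IP is a placeholder):
net:
  port: 27017
  bindIp: 127.0.0.1,10.0.0.12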
After that it's just a matter of telling your Meteor app where to go to get the database connection. The most secure way of doing this is to set a few environment variables, such as MONGODBSERVER, MONGODBPORT, DBUSER, and DBPASSWORD.
You'll then want to set some variables in your server Meteor code, using something like:
Meteor.startup(function() {
var DbUser = process.env.DBUSER;
var DbPassword = process.env.DBPASSWORD;
var MongoDBServer = process.env.MONGODBSERVER;
var MongoDBPort = process.env.MONGODBPORT;
});
And if you're using the native MongoDB driver, connecting becomes trivial (inside the same startup function, so the variables above are in scope):
var MongoClient = require('mongodb').MongoClient;
var url = 'mongodb://' + DbUser + ':' + DbPassword + '@' + MongoDBServer + ':' + MongoDBPort + '/databasename';
MongoClient.connect(url, function(err, db) {
...
});
Then it's just a matter of constructing your Mongo models using something like:
Temperatures = new Mongo.Collection('temperatures');
Temperatures._ensureIndex({temp: 1, time: 1});
And then taking action on those models in regard to the database:
Temperatures.insert({temp: ftemp, time: Math.floor(Date.now() / 1000)});
I'll also mention that http://modulus.io is a really decent Meteor hosting solution. I'd recommend them, unless you are stuck on using Amazon EC2 instances, which is fine, but more complicated for a simple application.
You need to set an environment variable telling Meteor where Mongo is hosted:
MONGO_URL
mongodb://<user>:<password>@hostingproviderurl:port/xxx?autoReconnect=true&connectTimeoutMS=60000
The correct mongodb:// URL string would be provided by the MongoDB hosting provider.
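For example, when starting a bundled Meteor app on the app server (placeholder values, and assuming the app was packaged with meteor build):
export MONGO_URL='mongodb://dbuser:dbpassword@mongo-host.example.com:27017/myappdb?autoReconnect=true&connectTimeoutMS=60000'
export ROOT_URL='http://app-host.example.com'
export PORT=3000
node main.js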

Windows ServiceBus 1.1 for Windows Server

I moved the databases from our Service Bus test environment.
I started by leaving the farm with the single node, then I moved the databases.
After rejoining the farm, I see that GatewayDBConnectionString is still pointing to the old one.
I can't find any valid PowerShell command to reconfigure the value in question.
Anyone know how to fix this?
Thank you in advance.
To answer this you will need a bit more background, so here is a high-level overview of Service Bus 1.1 server farm configuration:
Service Bus Server 1.1 is a platform where users can create highly durable, distributed pub-sub (messaging queues/topics) entities. In simple words, its main job is to turn compute (your VMs) and data (your MsgContainer databases) into messaging functionality: durable queues and topics. So, in short, the configuration wizard or the PowerShell cmdlets used to configure Service Bus 1.1 Server will ask you for the VMs and databases.
The database SBManagementDB is considered the authoritative source of truth for any farm-level configuration, such as the nodes that are part of the farm (Store.Nodes), the ports opened on each of the nodes, the gateway database connection string (cluster config), etc. Also note that, per the Windows Server product guidelines, any information that has to be securely persisted is encrypted, including the gateway DB connection string.
a) When you did New-SBFarm (with a gateway DB connection string), you essentially told SBManagementDB the gateway DB server, database name, etc.
b) When you do Add-SBHost, you again told SBManagementDB that you want to add one node to this farm.
The gateway DB connection string is the one source of truth for all gateway services to find any run-time info, such as container databases, entity-to-container mapping, etc.
Likewise, when you run the New-SBMessageContainer cmdlet, you told the gateway DB that you are adding one container database.
Now, with this background, let's see what effect the actions you took above had:
- When you moved all the databases to a different server, you changed the gateway database connection string - but the gateway connection string you had communicated to SBManagementDB (using the New-SBFarm cmdlet) was still pointing to the old server.
- When you removed the node from the farm and joined it back again, you removed one node from the configuration and re-added it - no effect :)
The ANSWER
Use the Restore-SBFarm PowerShell cmdlet to tell SBManagementDB that you changed the gateway DB,
and then use the Restore-SBMessageContainer cmdlet to tell the gateway DB that you changed the container databases.
Now, add the nodes back to this restored farm.
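A rough sketch of those steps is below. The connection strings are placeholders, the parameter names are from memory, and the cmdlets may require additional parameters (certificate thumbprints, RunAs credentials), so verify the exact syntax with Get-Help before running them:
# Point the farm configuration at the moved gateway database
Restore-SBFarm -SBFarmDBConnectionString 'Data Source=NewSql;Initial Catalog=SBManagementDB;Integrated Security=True' `
    -GatewayDBConnectionString 'Data Source=NewSql;Initial Catalog=SBGatewayDatabase;Integrated Security=True'

# Re-register the moved message container database(s)
Restore-SBMessageContainer -Id 1 `
    -SBFarmDBConnectionString 'Data Source=NewSql;Initial Catalog=SBManagementDB;Integrated Security=True' `
    -ContainerDBConnectionString 'Data Source=NewSql;Initial Catalog=SBMessageContainer01;Integrated Security=True'

# Re-add the node to the restored farm
Add-SBHost -SBFarmDBConnectionString 'Data Source=NewSql;Initial Catalog=SBManagementDB;Integrated Security=True'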
HTH!
Sree