Scale set using keyvault in another region - azure-service-fabric

I'm working with an ARM template that creates a VM Scale Set for a Service Fabric cluster and associates some secrets with the VMs from a keyvault. I discovered this morning that it appears the VMs and keyvault must exist in the same region or I get an error like this:
New-AzureRmResourceGroupDeployment : 9:24:55 AM - Resource Microsoft.Compute/virtualMachineScaleSets 'StdNode' failed with message '{ "status": "Failed", "error": {
"code": "ResourceDeploymentFailure",
"message": "The resource operation completed with terminal provisioning state 'Failed'.",
"details": [
{
"code": "KeyVaultAndVMInDifferentRegions",
"message": "The Key Vault https://obscured.vault.azure.net/secrets/secretname/1112222aa31c4dcca4363bb0013e9999 is located in location West US, which is different from the location of the VM, northcentralus. "
}
] } }'
This feels like an artificial limitation and is a major issue for me. I want to have a centralized keyvault where I deploy all of my secrets and utilize them from all my deployments. Having to duplicate my secrets in regions around the world seems ridiculous and VERY error prone. There should be no significant perf issue here in obtaining secrets across regions. So what is the reason behind this, and will it change?
Anyone from the Azure Scale Sets team want to offer some color to this?

the reason that we enforce region boundaries is to prevent users from creating architectures that have cross region dependencies.
For an application designed like this an outage of the japaneast datacenter will cause your VMSSes in JapanWest to not be able to successfully scale out.
Regional isolation is a key design principle of cloud based applications, and we want to prevent users from making bad choices if we can.
The reason we do not allow cross subscription references is as an important final step to prevent malicious users from using CRP as a privilege escalation mechanism to access other users secrets.
There are other mechanisms which also prevent this in ARM, but are based on a configuration.

To overcome the problem you may simply want to apply a simple fix
Get-AzVM -ResourceGroupName "rg1" -Name "vm1" | Remove-AzVMSecret | Update-AzVM
This will remove the earlier secret and reissue a new one so that your vm is back in provisioning state.

You can use an architecture of a central key vault that you access for template parameters and store those secrets in a regional key vault. Then link to the regional key vault for your scale set. If the secrets are certificates you can have an ARM function to format the certificate (as a secret) properly to be imported as a part of the OSImage property on the VM/VMSS.
A more indepth look can be found here: https://devblogs.microsoft.com/premier-developer/centralized-vm-certificate-deployment-across-multiple-regions-with-arm-templates/

Related

Allow Cloud Service ARM template deployment to a different subscription than the keyVault

Our KeyVault is in subscription 1 and we have multiple Cloud Services for multiple areas that we need deployed in different subscriptions. While working in Azdo I found out that I am unable to deploy CSES to a subscription that is different than the keyVault since the ARM template used for deployment is trying to access secrets from the keyvault.
Then, when I read this document https://learn.microsoft.com/en-us/azure/cloud-services-extended-support/deploy-prerequisite, it states that the "The key vault must be created in the same region and subscription as the cloud service".
Does anyone know of a way around this? It's imperative that we are able to deploy multiple Cloud Services (for different areas) in different Subscriptions and we only have one keyvault that stores all values used by the cloud services.
As mentioned in the Microsoft Documentation that you have shared , its not possible as its a prerequisite to create Key vault in the same subscription as the cloud Services.
In this Github Issue , it is possible to use secrets from one subscription in another subscription but using certificates is an limitation in ARM templates.
It is recommended by Azure to use for different Key vault for different environments for using Certificates .
Secrets can be referenced as parameters in the ARM template to used by Azure Services but certificate can't be referenced from another subscriptions otherwise you will get the below error :
{
"status": "Failed",
"error": {
"code": "InvalidParameter",
"target": "sourceVault.id",
"message": "The SubscriptionId:\"<id>\" of the request must match the SubscriptionId \"<sharedId>\" contained in the Key Vault Id."
}
}

Getting started with Vault for existing non-containerized Windows apps

We have a bunch of Windows server applications that currently handle secrets as follows; our apps are in C#.
We store them in settings files in code
We store them encrypted, using a certificate
The servers have this certificate with the private key, so they can decrypt the secret
We're looking at implementing Hashicorp Vault. It seems easy enough to simply replace the encrypt-store-decrypt with storing the secret in Vault in the KV engine, and just grabbing it in our apps - that takes that certificate out of the picture entirely. Since we're on-prem, I'll need to figure out our auth method.
We have different apps running on different machines, and it's somewhat dynamic (not as much as an autoscaling scenario, but not permanent - so we can't just assign servers to roles one time and depend on Kerberos auth).
I'm unsure how to make AppRole work in our scenario. We don't have one of the example "trusted platforms" or "trusted entities", there's no Nomad, Chef, Terraform, etc. We have Windows machines, in a domain, and we have a homegrown orchestrator that could be queried to say "This machine name runs these apps", so maybe there's something that can be done there?
Am I in "write your own auth plugin" territory, to speak to our homegrown orchestrator?
Edit - someone on Reddit suggested that this is a simple solution if our apps are all 1-to-1 with the Windows domain account they run under, because then we can just use kerb authentication. That's not currently the way we're architected, but we've got to solve this somehow, and that might do it nicely.
2nd edit - replaced "services" with "apps", since most of our services aren't actually running as Windows services, just processes. The launcher is a Windows service but the individual processes it launches are not.
How about Group Managed Service Accounts?
https://learn.microsoft.com/en-us/windows-server/security/group-managed-service-accounts/group-managed-service-accounts-overview
Essentially you created one "trusted platform" (to your key vault service).
Your service can still has its own identity but delegation to the gMSA when you want to retrieve the secrets.
For future visibility, here's what we landed on:
TLS certificate authentication. Using Vault, we issue a handful of certs, each will correspond to a security policy/profile, so that any machine that holds that certificate will be able to authenticate and retrieve the secrets they should have access to.
Kerberos ended up being a dead-end for two reasons. The vault.exe agent (which is part of this use case) can't use the native Windows Kerberos SSPI, so we'd have to manage and distribute keytab files. Also, if we used machine authentication, it would blow up our client count (we're using the cloud-hosted HCP Vault, where pricing is partially based on client count).
Custom plugins can't be loaded into the HCP, of course
Azure won't work, it requires Managed Identities which you can't assign to on-prem machines. Otherwise this might have been a great fit

How to use Hashicorp Vault's AppRole in production?

We have installed and configured Hashicorp Vault AppRole authentication for one server, by storing the role_id and secret_id in a local file on the server, and we're able to have code on the server read the values from file, authenticate to Vault, receive a token and then read the secrets it needs from Vault. So far so good. However, the secret_id expires after 31 days, and so the process fails.
I've read up on the concepts of using AppRoles, and they seem like the perfect fit for our use case, but for this expiration. We don't want to have to re-generate the secret_id every month.
From what I've read, if you create the role without setting secret_id_ttl it should be non-expiring, but that isn't the case. This may be due to how the AppRole auth method is configured, but I haven't seen anything solid on this.
So I found an article on the Hashicorp website where AppRoles are discussed in detail. The article gives good arguments for expiring secret_id's in a CI/CD environment, even illustrating how this works in 8 simple steps. I understand how this works, but the article fails to mention how the CI/CD and Orchestrator systems themselves are authenticated to Vault? Or am I missing something?
In the end, I want to have the secret_id not expire. Ever.
Without additional support from your environment you will have to write some logic in your installer, and have a service manager of some sort to start your services. In many cloud environments, you may already have the equivalent entities (Terraform, Cloud Formation, etc.) and you should leverage their secrets management capabilities where needed.
For custom installations, here is a workflow that I have used.
Have an installation manager process that can be invoked to perform installation / upgrade. Make sure installation / upgrade of services is always through this process.
Have a service manager process that is responsible for starting individual services and monitoring them / restarting them. Make sure service start-ups are always via this service manager.
During installation, generate self-signed certificates for Vault, installation manager and service manager. Vault certificates should trust the certs for the installation manager and the service manager. Store these with limited permission (600) in directories owned by the installation user or the service manager user as the case may be. Set up certificate-based authentication in Vault using these certs.
These credentials should have limited capabilities associated with them. The installation manager should only be able to create new roles and not delete anything. The service manager should only be able to create secrets for the named roles created by the installation manager, and delete nothing.
During installation / upgrade, the installation manager should connect to Vault and create all necessary service-specific roles. It should also be able to set role ids for individual services in per-service config files that the services may read on start-up.
During each service's start-up, the service manager should connect to Vault and create secret ids corresponding to each service's role. It should set the secret id in an environment variable and start the service. The secret id should have time-bound validity (by setting TTLs) so that they cannot be used for much beyond the creation of the auth token (see #7).
Each service should read the role id from the config file, and the secret id from the environment variable. It should then generate the auth token using these two, and use the token to authenticate itself with vault for its lifetime.
It is possible to create a Vault AppRole with a secret_id that essentially never expires. However, this should be limited to use on a Vault development server -- one that does not contain any production credentials -- and for use in a development environment.
That being said, here's the procedure I used based on several articles in the Vault documentation, but primarily AppRole Pull Authentication.
This assumes that the Vault approle authentication method is already installed at approle/ and that you are logged in to Vault, have root or admin privileges on the Vault server and have a valid, non-expired token.
Note: For the values supplied for the fields below, the maximum value that vault seems to accept is 999,999,999. For the TTL fields, that is the number of seconds which comes out to more than 31 years. That's not forever, but it is long enough that renewing the secret_id will probably be somebody else's problem (SEP).
# Vault server address to be used by the Vault CLI.
export VAULT_ADDR="https://vault-dev.example.com:8200/"
# Vault namespace to be used by the CLI.
# Required for Cloud and Enterprise editions
# Not applicable for Open Source edition
export VAULT_NAMESPACE="admin"
# The name of the Vault AppRole
export VAULT_ROLE=my-approle
# Override defaults on the approle authentication method
# NOTE: In this command, the field names, default-lease-ttl
# and max-lease-ttl contain dashes ('-'), NOT
# underscores ('_'), and are preceded by a single
# dash ('-').
vault auth tune \
-default-lease-ttl=999999999 \
-max-lease-ttl=999999999 approle/
# Override defaults on the approle
# NOTE: In this command, the field names, secret_id_ttl and
# secret_id_num contain underscores ('_'), NOT
# dashes ('-'), and are NOT preceded by a single
# dash ('-').
vault write auth/approle/role/my-approle \
secret_id_ttl=999999999 \
secret_id_num_uses=999999999
# Create a new secret_id for the approle which uses the new defaults
vault write -f auth/approle/role/my-approle/secret-id
Update the server config file to use the new secret_id and you are ready to go.
As the OP has noted, the Hashicorp Vault documentation assumes that the application is able to authenticate, somehow, to the vault and then retrieve the secret ID (possibly wrapped) from the vault and then, use that to authenticate and fetch a token used to actually work with secrets. The answers here are posing alternative approaches to retrieving that initial token.
Alan Thatcher wrote a blog article, Vault AppRole Authentication, that provides another well thought out approach:
Create a policy that allows the user to retrieve the secret-id and role-id, but nothing else.
Create a long lived, periodic/renewable token based on that policy.
Store the long lived token securely, e.g. as a Kubernetes secret
At runtime, use the long-lived token to:
acquire the secret-id and role-id,
authenticate to vault using these and acquire short-lived token
use current short-lived token to work with secrets
For Java applications, the Spring Vault project supports this approach if you configure the long-lived token as the "initial token" and the approle authencation name, e.g. chef-ro in the blog case.
My personal feeling is that this approach is about as secure but a bit simpler than the mutual TLS approach. I agree that using an infinite TTL for the secret-id is a less secure practice for Production environments.
Thanks to Mr. Thatcher for thinking this one through.
This is probably not the canonnical answer, but I found it empty so decided to add some pointers.
As per Hashicorp Vault AppRole: role-id and secret-id:
Additional brownie information: Ideally, it's best practice to keep
the TTL low, 30 minutes max - if your application is stateful, or
maybe even less if it's a stateless application. The secret key of
Vault approle should also be rotated every 90 days. Please note by
default, Vault approle backend has 31 days of TTL, so if you want to
set it to 90 days, you need to increase TTL of the approle backend as
well.
However (in the same question):
You can generate secret-id with indefinite validity. But doing so will
be as good as keeping your secrets in the configuration file.
For ephemeral instances you can use configuration management to pass in secrets via a third (broker) role. With regard to a server that exists indefinitely, i'm still working that out...
Ideas:
TLS certificates might work well on Windows, don't know about Linux.
GitHub Personal Access Tokens, but this is not org. friendly.
Review the other auth methods available to see if there's one that fits your requirements (e.g. AWS).

Using Azure RM Powershell how do I find out which Recovery Services Vault VM is connected to?

I can clearly see they can do that in the Portal, so must be the way - without looping through all available recovery services vaults - to get the RSV VM belongs to.
In other words: I know only VM - Name, Id etc. I want to know the Vault it belongs to.
Without traversing through all the vaults in the subscription - I know how to achieve this, but it's too slow with many Vaults.
Traversing / Looping / FOREACHing through all vaults trying to find the VM is not an acceptable, although working, solution.

Best practices for setting up developer access to Azure Resources

I would like to find out what the best practices are for managing developers' access to a sub-set of resources on a client's subscription?
I've searched Google and the Azure documentation looking for definitive answers, but I have yet to come across an article that puts it all together. Because Azure is still developing so rapidly I often find it difficult to determine whether a particular article may still be relevant.
To sum up our situation:
I've been tasked with researching and implementing the Azure infrastructure for a web site our company is developing for a client. At the moment our manager and I have access to the client's entire subscription on the Azure Portal by means of the Service Administrator's credentials, even though we're managing only:
Azure Cloud Service running a Web-Role (2-instances with Production and Staging environments).
Azure SQL Database.
Azure Blob Storage for deployments, diagnostics etc.
We're now moving into a phase where more of the developers in the team will require access to perform maintenance type tasks such as performing a VIP swap, retrieving diagnostic info etc.
What is the proper way to manage developer's access on such a project?
The approach I've taken was to implement Role Based Access Control (https://azure.microsoft.com/en-us/documentation/articles/role-based-access-control-configure/)
Move 1, 2, and 3 above into a new Resource Group according to http://blog.kloud.com.au/2015/03/24/moving-resources-between-azure-resource-groups/
Creating a new User Group for our company, say "GroupXYZ".
Adding the "GroupXYZ" to the Contributor role.
Adding the particular developer's company accounts to "GroupXYZ"
Motivation for taking the role-based approach
From what I understand giving everyone access as a Co-Administrator would mean that they have full access to every subscription in the portal.
Account-based authentication is preferable to certificate-based authentication due to the complexity added by managing the certificates.
What caused me to question my approach was the fact that I could not perform a VIP swap against the Cloud Service using PowerShell; I received an error message stating that a certificate could not be found.
Do such role-based accounts only have access to Azure by means of the Resource Manager Commandlets?
I had to switch PowerShell to the Azure Service Manager (ASM) Mode before having access to the Move-AzureDeployment commandlet.
Something else I'm not sure of is whether or not Visual Studio will have access to those resources (in the Resource Group) when using Role Based Access Control.
When you apply RBAC to Azure as you have or just in general, give access to an account via RBAC, then those accounts can only access Azure via the Azure Resource Manager APIs, whether that's PowerShell, REST or VS.
VS 2015 can access Azure resources via RBAC when using the 2.7 SDK. VS 2013 will have support for it soon.