Obtaining Azure VM scale set private IP addresses through Azure REST API - rest

Can you get a list of Azure VM scale set instance private IP addresses through the Azure REST API?
It seems that Microsoft does not publish the VMSS IP configuration objects under the normal methods for retrieving a list of "ipConfigurations".
Here are some relevant API doc pages:
https://learn.microsoft.com/en-us/rest/api/compute/virtualmachinescalesets/listall
https://learn.microsoft.com/en-us/rest/api/compute/virtualmachinescalesets/get
https://learn.microsoft.com/en-us/rest/api/compute/virtualmachines/listall
In particular, this one only gives you the IP configuration of VMs, not VMSSes:
https://learn.microsoft.com/en-us/rest/api/virtualnetwork/networkinterfaces/listall

Here's how to get a list of private IP addresses for VMs and VMSS instances through Ruby:
require 'openssl'
require 'azure_mgmt_network'
require 'azure_mgmt_compute'
require 'awesome_print'
options = {
tenant_id: '<tenant_id>',
client_id: '<client_id>',
client_secret: '<client_secret>',
subscription_id: '<subscription_id>'
}
def net_interface_to_ip_mapping(client)
network_interfaces = client.network_interfaces.list_all
pairs = network_interfaces.collect { |ni| [ni.id.split('/').last, ni.ip_configurations.collect { |ip| ip.private_ipaddress }.flatten.compact[0] ] }
[network_interfaces, pairs]
end
def net_interface_to_vm(ni)
interface_vm_set = ni.collect { |prof| [prof.id, prof.virtual_machine, prof.ip_configurations.collect(&:id)] }
ipconf_to_host = interface_vm_set.collect { |x| [x[2][0], x[1]&.id&.split('/')&.last] }.to_h
conf_ip_map = ni.collect(&:ip_configurations).flatten.compact.collect { |ipconf| [ipconf&.id, ipconf&.private_ipaddress] }.to_h
[ipconf_to_host, conf_ip_map]
end
puts "*** Network Interfaces"
puts
client = Azure::Network::Profiles::Latest::Mgmt::Client.new(options)
ni, pairs = net_interface_to_ip_mapping(client)
pairs.to_h.each do |ni, ip|
puts " #{ni}: #{ip}"
end
puts
puts "*** Virtual Machines"
puts
ipconf_to_host, conf_ip_map = net_interface_to_vm(ni)
ipconf_to_host.each do |ipconf, host|
ni_name = ipconf.split('/')[-3]
puts " #{host || '# ' + ni_name} - #{conf_ip_map[ipconf]}"
end
puts
puts "*** Virtual Machine Scale Sets"
puts
vns = client.virtual_networks.list_all
vns.each do |vn|
resource_group = vn.id.split('/')[4]
puts
vn_details = client.virtual_networks.get(resource_group, vn.name, expand: 'subnets/ipConfigurations')
ip_configs = vn_details&.subnets&.collect { |subnet| subnet&.ip_configurations&.collect { |ip| [ip&.id, ip&.name, ip&.private_ipaddress] } }.compact
vmss_ipconf = ip_configs.collect { |subnet| subnet.select { |ipconf| ipconf[0].include?('/virtualMachineScaleSets/') } }
vmss_ipconf.each do |subnet|
subnet.each do |ipconf|
vmss_name = ipconf[0].split('/')[8]
vmss_instance = ipconf[0].split('/')[10]
puts "#{vmss_name} ##{vmss_instance} - #{ipconf[2]}"
end
end
end

Looking at the Azure CLI, there is az vmss nic list which returns all network interfaces in a virtual machine scale set. Looking at the results, there is
{
"dnsSettings": {
...
},
"ipConfigurations": [
{
privateIpAddress: "..."
}
]
}
You can use the --query syntax to get all private IPs.
az vmss nic list -g <resource_group> --vmss-name <vmss_name> --query [].{ip:ipConfigurations[0].privateIpAddress} -o tsv

you can get VM hostnames that will resolve to IPs thanks to Azure DNS
$ curl -H "Authorization: Bearer $JWT_TOCKEN" -sf https://management.azure.com/subscriptions/${subscription_id}/resourceGroups/${resourc_group}/providers/Microsoft.Compute/virtualMachineScaleSets/${scale_set}/virtualMachines?api-version=2018-10-01 | jq '.value[].properties.osProfile.computerName'
"influx-meta000000"
"influx-meta000001"
$ getent hosts influx-meta000001
10.120.10.7 influx-meta000001.l55qt5nuiezudgvyxzyvtbihmf.gx.internal.cloudapp.net

Related

GCP: using image from one account's Artifact Registry on other account

Hello and wish you a great time!
I've got following terraform service account deffinition:
resource "google_service_account" "gke_service_account" {
project = var.context
account_id = var.gke_account_name
display_name = var.gke_account_description
}
That I use in GCP kubernetes node pool:
resource "google_container_node_pool" "gke_node_pool" {
name = "${var.context}-gke-node"
location = var.region
project = var.context
cluster = google_container_cluster.gke_cluster.name
management {
auto_repair = "true"
auto_upgrade = "true"
}
autoscaling {
min_node_count = var.gke_min_node_count
max_node_count = var.gke_max_node_count
}
initial_node_count = var.gke_min_node_count
node_config {
machine_type = var.gke_machine_type
service_account = google_service_account.gke_service_account.email
metadata = {
disable-legacy-endpoints = "true"
}
# Needed for correctly functioning cluster, see
# https://www.terraform.io/docs/providers/google/r/container_cluster.html#oauth_scopes
oauth_scopes = [
"https://www.googleapis.com/auth/logging.write",
"https://www.googleapis.com/auth/monitoring",
"https://www.googleapis.com/auth/devstorage.read_only",
"https://www.googleapis.com/auth/servicecontrol",
"https://www.googleapis.com/auth/cloud-platform"
]
}
}
However the current solution requires the prod and dev envs to be on various GCP accounts but use the same image from prod artifact registry.
As for now I have JSON key file for service account in prod having access to it's registry. Maybe there's a pretty way to use the json file as a second service account for kubernetes or update current k8s service account with json file to have additional permissions to the remote registry?
I've seen the solutions like put it to a secret or user cross-account-service-account.
But it's not the way I want to resolve it since we have some internal restrictions.
Hope someone faced similar task and has a solution to share - it'll save me real time.
Thanks in advance!

Terraform Error creating Topic: googleapi: Error 403: User not authorized to perform this action

Googleapi: Error 403: User not authorized to perform this action
provider "google" {
project = "xxxxxx"
region = "us-central1"
}
resource "google_pubsub_topic" "gke_cluster_upgrade_notifications" {
name = "cluster-notifications"
labels = {
foo = "bar"
}
message_storage_policy {
allowed_persistence_regions = [
"region",
]
}
}
# create the storage bucket for our scripts
resource "google_storage_bucket" "source_code" {
name = "xxxxxx-bucket-lh05111992"
location = "us-central1"
force_destroy = true
}
# zip up function source code
data "archive_file" "function_script_zip" {
type = "zip"
source_dir = "./function/"
output_path = "./function/main.py.zip"
}
# add function source code to storage
resource "google_storage_bucket_object" "function_script_zip" {
name = "main.py.zip"
bucket = google_storage_bucket.source_code.name
source = "./function/main.py.zip"
}
resource "google_cloudfunctions_function" "gke_cluster_upgrade_notifications" {---
-------
}
The service account has the owner role attached
Also tried using
1.export GOOGLE_APPLICATION_CREDENTIALS={{path}}
2.credentials = "${file("credentials.json")}" by place json file in terraform root folder.
It seems that the used account is missing some permissions (e.g. pubsub.topics.create) to create the Cloud Pub/Sub topic. The owner role should be sufficient to create the topic, as it contains the necessary permissions (you can check this here). Therefore, a wrong service account might be set in Terraform.
To address these IAM issues I would suggest:
Use the Policy Troubleshooter.
Impersonate service account and do the API call using CLI with --verbosity=debug flag, which will provide helpful information about the missing permissions.

How can I name eks worker nodes provisioned with terraform?

I am using terraform 12.20.0 and I have provisioned an EKS cluster with 2 node groups.
How can I add name tags to EKS node workers according to their node group names?
I have tried adding "Name" tag in the additional tag sections of each node-group but the tags did not take and my EC2 instance names are empty, while other tags appear.
Here is the configuration - I have skipped the less relevant bits:
module "eks-cluster" {
...
node_groups_defaults = {
disk_size = 128
key_name = var.key_name
subnets = [
aws_subnet.{{}}.id,
aws_subnet.{{}}.id,
]
k8s_labels = {
env = var.environment
}
additional_tags = {
env = var.environment
"k8s.io/cluster-autoscaler/enabled" = "true"
"k8s.io/cluster-autoscaler/${var.cluster-name}" = "true"
}
}
node_groups = {
app = {
name = "app"
.....
k8s_labels = {
nodegroup = "app"
}
additional_tags = {
nodegroup = "app"
Name = "${var.environment}-app-node"
}
}
ml = {
name = "ml"
...
instance_type = "m5.xlarge"
k8s_labels = {
nodegroup = "ml"
}
additional_tags = {
nodegroup = "ml"
Name = "${var.environment}-ml-node"
}
}
}
tags = {
env = var.environment
}
map_roles = [{
......
}]
}
As per documentation Resource: aws_eks_node_group doesn't allow for modifying tags on your instances.
There is a nice feature coming soon to EKS node groups which will allow you to pass a custom userdata script. Using that you will be able to modify programatically tags for your instances. Issues can be tracked -> https://github.com/aws/containers-roadmap/issues/596
UPDATE:
As of 20/08/2020, you can now utilise launch_template with your node group. This will allow you to pass in Name tag. Example:
resource "aws_launch_template" "cluster" {
image_id = data.aws_ssm_parameter.cluster.value
instance_type = "t3.medium"
name = "eks-launch-template-test"
update_default_version = true
tag_specifications {
resource_type = "instance"
tags = {
Name = "eks-node-group-instance-name"
}
}
}
The following Terraform resource works.
resource "aws_autoscaling_group_tag" "your_group_tag" {
autoscaling_group_name = aws_eks_node_group.your_group.resources[0].autoscaling_groups[0].name
tag {
key = "Name"
value = "enter-your-name-here"
propagate_at_launch = true
}
depends_on = [
aws_eks_node_group.your_group
]
}
Had the same issue, can this is the solution I came up with which works great.
First, create the ASG tags via the aws_autoscaling_group_tag resource.
resource "aws_autoscaling_group_tag" "mytag" {
autoscaling_group_name = aws_eks_node_group.main.resources[0].autoscaling_groups[0].name
tag {
key = "foo"
value = "bar"
propagate_at_launch = true
}
depends_on = [aws_eks_node_group.main]
}
Unfortunately this resource block doesn't accept multiple tags, so you'd have to create this resource block individually for each tag.
Another thing to keep in mind, is that the tags are applied to future scaled EC2 instances, not the currently running ones.
Which means, that you need to either manually scale down your nodes and scale back up, or write a bash script and run it as a local provisioner with terraform.
resource "null_resource" "refresh_autoscale" {
provisioner "local-exec" {
command = "cd ${path.module}/scripts ; bash ./scale_refresh.sh"
environment = {
ASG_NAME = aws_eks_node_group.main.resources[0].autoscaling_groups[0].name
CLUSTER_NAME = "foo_cluster"
NODE_GROUP_NAME = "foo_cluster_node"
REGION = var.region
AWS_PROFILE = var.aws_profile
DESIRED_SIZE = var.desired_size
MIN_SIZE = var.min_size
MAX_SIZE = var.max_size
}
}
depends_on = [aws_eks_node_group.main]
}
Your bash script can run commands via the AWS CLI to scale down and up your node groups.
aws --profile ${AWS_PROFILE} --region ${REGION} eks update-nodegroup-config --cluster-name ${CLUSTER_NAME} \
--scaling-config "minSize=0,maxSize=1,desiredSize=0" --nodegroup-name ${NODE_GROUP_NAME}
Because the instances are not scaled down immediately, there is a period of waiting for the scale down to complete. If you have jq installed, you can periodically query the state of your ASG and see how many instances are currently running.
INSTANCE_COUNT=$(aws --profile ${AWS_PROFILE} --region ${REGION} autoscaling describe-auto-scaling-groups --auto-scaling-group-name ${ASG_NAME} \
| jq '.[][0] | .Instances | length')
I just noticed that
"k8s.io/cluster-autoscaler/${var.cluster-name}" = "true"
Might need to be
"k8s.io/cluster-autoscaler/${var.cluster-name}" = "owned"
There is an existing issue with node group to add the "Name" tag on ASG. https://github.com/aws/containers-roadmap/issues/608 (open) and this on terraform end https://github.com/terraform-aws-modules/terraform-aws-eks/issues/860 (closed)
However, there is an alternative to use aws cli command to add tag explicitly.
Try below to use in terraform
resource "null_resource" "add_custom_tags_to_asg" {
for_each = module.eks-cluster.node_groups
provisioner "local-exec" {
command = <<EOF
aws autoscaling create-or-update-tags \
--tags ResourceId=${each.value.resources[0].autoscaling_groups[0].name},ResourceType=auto-scaling-group,Key=Name,Value=k8s-node-groups-${each.value.labels["env"]},PropagateAtLaunch=true
EOF
}
}

Terraform: How to create a Kubernetes cluster on Google Cloud (GKE) with namespaces?

I'm after an example that would do the following:
Create a Kubernetes cluster on GKE via Terraform's google_container_cluster
... and continue creating namespaces in it, I suppose via kubernetes_namespace
The thing I'm not sure about is how to connect the newly created cluster and the namespace definition. For example, when adding google_container_node_pool, I can do something like cluster = "${google_container_cluster.hosting.name}" but I don't see anything similar for kubernetes_namespace.
In theory it is possible to reference resources from the GCP provider in K8S (or any other) provider in the same way you'd reference resources or data sources within the context of a single provider.
provider "google" {
region = "us-west1"
}
data "google_compute_zones" "available" {}
resource "google_container_cluster" "primary" {
name = "the-only-marcellus-wallace"
zone = "${data.google_compute_zones.available.names[0]}"
initial_node_count = 3
additional_zones = [
"${data.google_compute_zones.available.names[1]}"
]
master_auth {
username = "mr.yoda"
password = "adoy.rm"
}
node_config {
oauth_scopes = [
"https://www.googleapis.com/auth/compute",
"https://www.googleapis.com/auth/devstorage.read_only",
"https://www.googleapis.com/auth/logging.write",
"https://www.googleapis.com/auth/monitoring"
]
}
}
provider "kubernetes" {
host = "https://${google_container_cluster.primary.endpoint}"
username = "${google_container_cluster.primary.master_auth.0.username}"
password = "${google_container_cluster.primary.master_auth.0.password}"
client_certificate = "${base64decode(google_container_cluster.primary.master_auth.0.client_certificate)}"
client_key = "${base64decode(google_container_cluster.primary.master_auth.0.client_key)}"
cluster_ca_certificate = "${base64decode(google_container_cluster.primary.master_auth.0.cluster_ca_certificate)}"
}
resource "kubernetes_namespace" "n" {
metadata {
name = "blablah"
}
}
However in practice it may not work as expected due to a known core bug breaking cross-provider dependencies, see https://github.com/hashicorp/terraform/issues/12393 and https://github.com/hashicorp/terraform/issues/4149 respectively.
The alternative solution would be:
Use 2-staged apply and target the GKE cluster first, then anything else that depends on it, i.e. terraform apply -target=google_container_cluster.primary and then terraform apply
Separate out GKE cluster config from K8S configs, give them completely isolated workflow and connect those via remote state.
/terraform-gke/main.tf
terraform {
backend "gcs" {
bucket = "tf-state-prod"
prefix = "terraform/state"
}
}
provider "google" {
region = "us-west1"
}
data "google_compute_zones" "available" {}
resource "google_container_cluster" "primary" {
name = "the-only-marcellus-wallace"
zone = "${data.google_compute_zones.available.names[0]}"
initial_node_count = 3
additional_zones = [
"${data.google_compute_zones.available.names[1]}"
]
master_auth {
username = "mr.yoda"
password = "adoy.rm"
}
node_config {
oauth_scopes = [
"https://www.googleapis.com/auth/compute",
"https://www.googleapis.com/auth/devstorage.read_only",
"https://www.googleapis.com/auth/logging.write",
"https://www.googleapis.com/auth/monitoring"
]
}
}
output "gke_host" {
value = "https://${google_container_cluster.primary.endpoint}"
}
output "gke_username" {
value = "${google_container_cluster.primary.master_auth.0.username}"
}
output "gke_password" {
value = "${google_container_cluster.primary.master_auth.0.password}"
}
output "gke_client_certificate" {
value = "${base64decode(google_container_cluster.primary.master_auth.0.client_certificate)}"
}
output "gke_client_key" {
value = "${base64decode(google_container_cluster.primary.master_auth.0.client_key)}"
}
output "gke_cluster_ca_certificate" {
value = "${base64decode(google_container_cluster.primary.master_auth.0.cluster_ca_certificate)}"
}
Here we're exposing all the necessary configuration via outputs and use backend to store the state, along with these outputs in a remote location, GCS in this case. This enables us to reference it in the config below.
/terraform-k8s/main.tf
data "terraform_remote_state" "foo" {
backend = "gcs"
config {
bucket = "tf-state-prod"
prefix = "terraform/state"
}
}
provider "kubernetes" {
host = "https://${data.terraform_remote_state.foo.gke_host}"
username = "${data.terraform_remote_state.foo.gke_username}"
password = "${data.terraform_remote_state.foo.gke_password}"
client_certificate = "${base64decode(data.terraform_remote_state.foo.gke_client_certificate)}"
client_key = "${base64decode(data.terraform_remote_state.foo.gke_client_key)}"
cluster_ca_certificate = "${base64decode(data.terraform_remote_state.foo.gke_cluster_ca_certificate)}"
}
resource "kubernetes_namespace" "n" {
metadata {
name = "blablah"
}
}
What may or may not be obvious here is that cluster has to be created/updated before creating/updating any K8S resources (if such update relies on updates of the cluster).
Taking the 2nd approach is generally advisable either way (even when/if the bug was not a factor and cross-provider references worked) as it reduces the blast radius and defines much clearer responsibility. It's (IMO) common for such deployment to have 1 person/team responsible for managing the cluster and a different one for managing K8S resources.
There may certainly be overlaps though - e.g. ops wanting to deploy logging & monitoring infrastructure on top of a fresh GKE cluster, so cross provider dependencies aim to satisfy such use cases. For that reason I'd recommend subscribing to the GH issues mentioned above.

unable to use ssh key when spawning digitalocean droplet using terraform

I am using terraform v0.10.6 to spin up a droplet on digitalocean. I am referencing a key and SSH fingerprint that has already been added to digitalocean in my terraform config (copied below). I am able to log onto existing droplets using this ssh key but not on a newly formed droplet (SSH simply fails). Any thoughts on how to troubleshoot this so that when I launch the droplet via terraform, I should be able to log onto the droplet via the key that has already been added on digitalocean (and visible on DO console). Currently, the droplet appears on the digitalocean admin console but I am never able to SSH onto the server (connection gets denied).
test.tf
# add base droplet with name
resource "digitalocean_droplet" "do-mail" {
image = "ubuntu-16-04-x64"
name = "tmp.validdomain.com"
region = "nyc3"
size = "1gb"
private_networking = true
ssh_keys = [
"${var.ssh_fingerprint}",
]
connection {
user = "root"
type = "ssh"
private_key = "${file(var.private_key)}"
timeout = "2m"
}
provisioner "remote-exec" {
inline = [
"export PATH=$PATH:/usr/bin",
"sudo apt-get update",
]
}
}
terraform.tfvars
digitalocean_token = "correcttoken"
public_key = "~/.ssh/id_rsa.pub"
private_key = "~/.ssh/id_rsa"
ssh_fingerprint = "correct:finger:print"
provider.tf
provider "digitalocean" {
token = "${var.digitalocean_token}"
}
variables.tf
##variables used by terraform
# DO token
variable "digitalocean_token" {
type = "string"
}
# DO public key file location on local server
variable "public_key" {
type = "string"
}
# DO private key file location on local server
variable "private_key" {
type = "string"
}
# DO ssh key fingerprint
variable "ssh_fingerprint" {
type = "string"
}
I was able to setup a new droplet with the SSH key at initialization time when I specified the digitalocean token as an environment variable (as opposed to relying on the terraform.tfvars file).