How does cloud-init work? - cloud-init

cloud-init is a package that performs various configuration tasks on a virtual machine at first boot. You write a file with your configuration and pass it to the VM when you create it.
But how exactly does it work? How is the user data sent to the VM, and how does cloud-init manage to execute the configuration?
Thank you.

Short answer: datasources.
cloud-init has the concept of datasources, which point to the source of the user data and metadata.
Have a look here: https://cloudinit.readthedocs.io/en/latest/topics/datasources.html
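To make the datasource idea concrete, here is a minimal sketch of "try each possible source in order and use the first one that yields user data". It is not cloud-init's actual implementation; the names first_available and from_file are hypothetical and exist only for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple

# A "datasource" in this sketch is a name plus a function that returns
# user-data bytes, or None when that source is not available.
DataSource = Tuple[str, Callable[[], Optional[bytes]]]


def first_available(sources: Sequence[DataSource]) -> Tuple[str, bytes]:
    """Return (name, user_data) from the first source that yields data."""
    for name, fetch in sources:
        try:
            data = fetch()
        except OSError:
            # An unreachable source (missing file, no network) counts as absent.
            data = None
        if data is not None:
            return name, data
    raise LookupError("no datasource provided user-data")


def from_file(path: str) -> Callable[[], Optional[bytes]]:
    """Build a fetch function that reads user-data from a local file."""
    def fetch() -> Optional[bytes]:
        with open(path, "rb") as handle:
            return handle.read()
    return fetch
```

A caller would pass an ordered list such as a mounted config drive followed by a metadata service, and configure the instance from whichever answers first.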

Related

How to wait for full cloud-initialization before VM is marked as running

I am currently configuring a virtual machine to work as an agent within Azure (with Ubuntu as the image). The additional configuration is applied through a cloud-init file.
In it, among other things, I have the 'fix' below within bootcmd and multiple steps within runcmd.
However, the machine already shows the state 'running' in the Azure portal while it is still in the cloud configuration phase (cloud_config_modules). As a result, pipelines see the machine as ready for use before everything is installed and configured, and they break.
I tried a couple of things that did not have the desired effect, after which I stumbled on the following article/bug;
The proposed solution worked; however, when I switched to a RHEL image it stopped working.
I noticed this image does not use walinuxagent, as the solution states, but waagent, so I tried replacing that as in the example below, without any success.
bootcmd:
- mkdir -p /etc/systemd/system/waagent.service.d
- echo "[Unit]\nAfter=cloud-final.service" > /etc/systemd/system/waagent.service.d/override.conf
- sed "s/After=multi-user.target//g" /lib/systemd/system/cloud-final.service > /etc/systemd/system/cloud-final.service
- systemctl daemon-reload
After this, I also tried moving the runcmd steps into bootcmd. This resulted in a boot that took ages and eventually froze.
Since I am not that familiar with RHEL and Linux overall, I wanted to ask whether anyone has suggestions for what else I could try.
(Apply some other configuration to ensure await on the cloud-final.service within a waagent?)
However the machine already had the state running, while still running the cloud configuration phase (cloud_config_modules).
Could you please be more specific? Where did you read the machine state?
The reason I ask is that cloud-init status will report status: running until cloud-init is done running, at which point it will report status: done.
What is the purpose of waiting until cloud-init is done? I'm not sure exactly what you are expecting to happen, but here are a couple of things that might help.
If you want to execute a script "at the end" of cloud-init initialization, you could put the script directly in runcmd, and if you want to wait for cloud-init in an external script you could do cloud-init status --wait, which will print a visual indicator and eventually return once cloud-init is complete.
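As a rough sketch of the "wait in an external script" option, the helper below runs cloud-init status --wait and extracts the final status line. The function names are hypothetical, and the parsing assumes the output contains a line of the form "status: done", possibly preceded by progress dots.

```python
import subprocess
from typing import Optional


def parse_status(output: str) -> Optional[str]:
    """Extract the value of the 'status:' line from cloud-init status output."""
    for line in output.splitlines():
        # --wait prints progress dots, which may share a line with the status.
        key, sep, value = line.lstrip(".").partition(":")
        if sep and key.strip() == "status":
            return value.strip()
    return None


def wait_for_cloud_init() -> str:
    """Block until cloud-init finishes, then return its reported status."""
    result = subprocess.run(
        ["cloud-init", "status", "--wait"],
        capture_output=True,
        text=True,
        check=False,  # inspect the reported status instead of raising
    )
    return parse_status(result.stdout) or "unknown"
```

A pipeline step could call wait_for_cloud_init() over SSH and only proceed when the result is "done".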
On reasonably recent Azure Linux VM images, cloud-init rather than WALinuxAgent acts as the VM provisioner. The VM is marked as provisioned by the Azure cloud-init datasource module very early during cloud-init processing (source), before any of the cloud-init modules that can be configured with user data have run. WALinuxAgent is only responsible for provisioning Azure VM extensions. It does not appear to be possible to delay sending the 'VM ready' signal to Azure without modifying the VM image and patching the source code of the cloud-init Azure datasource.

Cloud Init on Google Compute Engine (GCE) with Centos7/8 doesn't run properly on First boot, but fine after any other reboot

We have a CentOS 8 (tried 7 as well) image and I am adding some config to act as a router.
The issue is that, for some reason, the first time the instance is created, cloud-init doesn't read the network config we pass using the user-data metadata:
#cloud-config
network:
  version: 1
  etc...
We configure eth1 to use DHCP and get cloud-init to manage it, as well as add a route.
This works perfectly every time after the initial boot (and after a stop and start).
To me it feels like cloud-init is not aware of the config, but when I log in to the machine and run cloud-init query userdata I can see the data. Even then, running cloud-init clean && cloud-init init doesn't do anything. The same commands work fine once the machine has been rebooted.
Try running cloud-init analyze show both times (instance creation and consecutive reboot) and check for any differences.
Sadly, cloud providers somewhat abuse the abilities of cloud-init, though not entirely without reason. cloud-init allows customization of vendor- and user-provided configuration (who overrides what), changing the order of boot stages, and so on.
This is done mostly because different cloud providers need network/provisioning/storage at different times. For example, AWS attaches storage after network (EBS only), Azure provides VM only after storage is attached and it's natively provided as NTFS (they really format the drive if you need anything else), etc.
These shenanigans, while understandable (datacenter infrastructure defines user availability) make cloud-init's documentation merely a suggestion for the user to investigate.
From my experience, Azure is the closest to original implementation. Possibly they haven't learned yet how to utilize the potential in their favor.
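My general suggestion for any instance customization (it almost always works) is to write scripts with write_files and execute them with runcmd, because runcmd runs at the final stage and so gives the best opportunity to override earlier configuration. Note that bootcmd, by contrast, runs early in the boot, so it is the wrong place for anything that depends on files written by write_files. Editing hosts, changing firewall rules: most of this will not require a reboot.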
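A minimal sketch of that pattern: generate a cloud-config document that writes a script with write_files and runs it with runcmd. The function name and the script path are hypothetical. The sketch relies on JSON being accepted as YAML, which avoids a third-party YAML library; treat that as an assumption to verify against your cloud-init version.

```python
import json


def build_cloud_config(script_path: str, script_body: str) -> str:
    """Return user data that writes a script and runs it in the final stage."""
    config = {
        "write_files": [
            {
                "path": script_path,
                "permissions": "0755",  # must be executable for runcmd to run it
                "content": script_body,
            }
        ],
        # Each runcmd entry given as a list is executed without a shell.
        "runcmd": [[script_path]],
    }
    # The first line must be the #cloud-config marker; the body is JSON,
    # which a YAML parser is assumed to accept.
    return "#cloud-config\n" + json.dumps(config, indent=2) + "\n"


if __name__ == "__main__":
    print(build_cloud_config("/usr/local/bin/setup-router.sh",
                             "#!/bin/sh\necho configuring\n"))
```

The printed document can be passed as the instance's user data in the usual way for your provider.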

How to enable networking service at boot in Yocto

I have a Yocto based OS on which I have everything installed to start the network.
Nevertheless, at each boot I need to do systemctl start networking to start it. Initially the service was even masked. I found out how to unmask it but I can't find a way to start it automatically.
I don't know much about systemd but the networking.service is located in generator.late folder. From what I understood, it's generated afterward.
How can I enable it?
It depends on whether you want to enable the service only on one particular device. If so, it is simple:
systemctl enable networking
Append the parameter --now if you also want to start the service right away.
If you want to enable the service on all your devices (i.e. it will be automatically enabled in all the images coming from your build), the best way is to extend the recipe, but please see below for other ways to handle the network. The process is described at NXP support, for example.
Some notes about networking.service itself: I assume that your networking.service comes from the init-ifupdown recipe. If so, is there any reason to handle network configuration using an old SysV init script on a system with systemd? The service is generated from the SysV init script by systemd-sysv-generator. So I would suggest trying other networking services such as systemd's native "systemd-networkd", "NetworkManager" or "connman". The best choice depends on the type of your embedded system. These services are integrated with systemd much better.
Some more information on activating or enabling the services: https://unix.stackexchange.com/questions/302261/systemd-unit-activate-vs-enable

NixOS within NixOS?

I'm starting to play around with NixOS deployments. To that end, I have a repo with some packages defined, and a configuration.nix for the server.
It seems like I should then be able to test this configuration locally (I'm also running NixOS). I imagine it's a bad idea to change my global configuration.nix to point to the deployment server's configuration.nix (who knows what that will break); but is there a safe and convenient way to "try out" the server locally - i.e. build it and either boot into it or, better, start it as a separate process?
I can see docker being one way, of course; maybe there's nothing else. But I have this vague sense Nix could be capable of doing it alone.
There is a fairly standard way of doing this that is built into the default system.
Namely nixos-rebuild build-vm. This will take your current configuration file (by default /etc/nixos/configuration.nix), build it and create a script that lets you boot the configuration in a virtual machine.
Once the build has finished, it will leave a symlink in the current directory. You can then boot by running ./result/bin/run-$HOSTNAME-vm, which will start your virtual machine for you to play around with.
TL;DR:
nixos-rebuild build-vm
./result/bin/run-$HOSTNAME-vm
nixos-rebuild build-vm is the easiest way to do this; however, you could also import the configuration into a NixOS container (see Chapter 47. Container Management in the NixOS manual and the nixos-container command).
This would be done with something like:
containers.mydeploy = {
  privateNetwork = true;
  config = import ../mydeploy-configuration.nix;
};
Note that you would not want to specify the network configuration in mydeploy-configuration.nix if it's static as that could cause conflicts with the network subnet created for the container.
As you may already know, system configurations can coexist without any problems in the Nix store. The problem here is running more than one system at once. For this, you need an isolation or virtualization tool such as Docker, VirtualBox, etc.
NixOS Containers
NixOS provides an efficient implementation of the container concept, backed by systemd-nspawn instead of an image-based container runtime.
These can be specified declaratively in configuration.nix or imperatively with the nixos-container command if you need more flexibility.
Docker
Docker was not designed to run an entire operating system inside a container, so it may not be the best fit for testing NixOS-based deployments, which expect and provide systemd and some services inside their units of deployment. While you won't get a good NixOS experience with Docker, Nix and Docker are a good fit.
UPDATE: Both 'raw' Nix packages and NixOS run in Docker. For example, Arion supports images from plain Nix, NixOS modules and 'normal' Docker images.
NixOps
To deploy NixOS inside NixOS it is best to use a technology that is designed to run a full Linux system inside.
It helps to have a program that manages the integration for you. In the Nix ecosystem, NixOps is the first candidate for this. You can use NixOps with its multiple backends, such as QEMU/KVM, VirtualBox, the (currently experimental) NixOS container backend, or you can use the none backend to deploy to machines that you have created using another tool.
Here's a complete example of using NixOps with QEMU/KVM.
Tests
If your goal is to run automated integration tests, you can make use of the NixOS VM testing framework. This uses Linux KVM virtualization (expose /dev/kvm in the sandbox) to run integration tests on networks of virtual machines, and it runs them as a derivation. It is quite efficient because it does not have to create virtual machine images; instead it mounts the Nix store in the VM. These tests are "built" like any other derivation, making them easy to run.
Nix store optimization
A unique feature of Nix is that you can often reuse the host Nix store, so being able to mount a host filesystem in the container/VM is a nice feature to have in your solution. If you are creating your own solution, depending on your needs, you may want to postpone this optimization, because it becomes a bit more involved if you want the container/VM to be able to modify the store. NixOS tests solve this with an overlay file system in the VM. Another approach may be to bind mount the Nix store and forward the Nix daemon socket.

When is cloud-init run and how does it find its data?

I'm currently dealing with CoreOS, and so far I think I got the overall idea and concept. One thing that I did not yet get is execution of cloud-init.
I understand that cloud-init is a process that does some configuration for CoreOS. What I do not yet understand is…
When does CoreOS run cloud-init? On first boot? On each boot? …?
How does cloud-init know where to find its configuration data? I've seen that there is config-drive and that totally makes sense, but is this the only way? What exactly is the role of the user-data file? …?
CoreOS runs cloudinit a few times during the boot process. Right now this happens at each boot, but that functionality may change in the future.
The first pass is the OEM cloud-init, which is baked into the image to set up networking and other features required for that provider. This is done for EC2, Rackspace, Google Compute Engine, etc., since they all have different requirements. You can see these files on GitHub.
The second pass is the user-data pass, which is handled differently per provider. For example, EC2 allows the user to input free-form text in their UI, which is stored in their metadata service. The EC2 OEM has a unit that reads this metadata and passes it to the second cloud-init run. On Rackspace/Openstack, config-drive is used to mount a read-only filesystem that contains the user-data. The Rackspace and Openstack OEMs know to mount and look for the user-data file at that location.
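To illustrate the config-drive case, here is a small sketch that looks for the user-data file on an already mounted config drive. It assumes the usual OpenStack config-drive layout (openstack/latest/user_data); the function name and the mount point used in the usage note are hypothetical, and this is not the code CoreOS itself runs.

```python
from pathlib import Path
from typing import Optional

# Location of user data relative to the root of an OpenStack-style
# config drive (assumed layout, see the lead-in).
USER_DATA_RELATIVE = Path("openstack") / "latest" / "user_data"


def read_config_drive_user_data(mount_point: str) -> Optional[str]:
    """Return the user-data text from a mounted config drive, if present."""
    candidate = Path(mount_point) / USER_DATA_RELATIVE
    if not candidate.is_file():
        # No user data was supplied, or the drive uses a different layout.
        return None
    return candidate.read_text(encoding="utf-8")
```

For example, read_config_drive_user_data("/media/configdrive") would return the contents of the file, or None when the instance was created without user data.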
The latest version of CoreOS also has a flag to fetch a remote file to be evaluated for use with PXE booting.
The CoreOS distribution docs have a few more details as well.