Managing Azure AKS clusters with VMware Aria Automation

The use case presented to me for a POC was to deploy a new Azure AKS cluster, then install a basic application. A simple use case, but those of you using vRA know its Kubernetes capabilities are all but nonexistent.

But after digging around and tinkering, I figured a CodeStream (now called Pipelines) pipeline would probably fit the bill. The pipeline would run Terraform to build the deployment, and then destroy it later on.

Keeping track of the state file between runs also presented a ‘problem’. After lots of kicking the tires, I came up with a way to store the state file securely in an Azure Storage account. The state file in the container is simply named after the deployment, plus .tfstate. This allowed me to refer to it from day two actions and Event Broker Subscriptions (EBS).

Another issue that came up was deleting ‘codestream.execution’ resources when the deployment is deleted. Since these deployments are handled by Terraform, I needed another workflow (WF) that calls a pipeline to destroy the deployment when the eventType is DESTROY_DEPLOYMENT.

The files for this article can be found at azure-terraform-blog

Terraform is used to do the heavy lifting. The backend values get replaced with some pipeline inputs in the first pipeline task. The most important one is the deployment name. When destroying the deployment, terraform will pull the current state for that deployment and do its thing.
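To make the naming convention concrete, here is a minimal Python sketch of how a pipeline task could derive the state file name from the deployment name and drop it into a backend block. The `{{name}}` placeholder syntax, the template text, and the storage account and container names are all assumptions made up for illustration; they are not the contents of the actual pipeline.

```python
def state_key(deployment_name: str) -> str:
    """State blob name: the deployment name plus .tfstate."""
    return f"{deployment_name}.tfstate"


def render_backend(template: str, values: dict) -> str:
    """Replace {{placeholder}} tokens in a backend template (hypothetical syntax)."""
    rendered = template
    for name, value in values.items():
        rendered = rendered.replace("{{" + name + "}}", value)
    return rendered


# Hypothetical azurerm backend template; every value below is illustrative.
BACKEND_TEMPLATE = """terraform {
  backend "azurerm" {
    storage_account_name = "{{storage_account}}"
    container_name       = "{{container}}"
    key                  = "{{key}}"
  }
}
"""

backend = render_backend(
    BACKEND_TEMPLATE,
    {
        "storage_account": "examplestorage",
        "container": "tfstate",
        "key": state_key("my-aks-deployment"),
    },
)
print(backend)
```

Because the key is derived only from the deployment name, a later destroy run can find the same state without any extra bookkeeping.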

The CodeStream pipeline (now Pipelines) uses a custom Docker image. It includes Terraform (1.5.4, the latest version at the time of writing), the Azure CLI, kubectl, and Helm (for another use case). It is stored on Docker Hub as americanbwana/cas-terraform-154:latest.

I didn’t come up with the basic template; I found it in an article on vEducate.co.uk, which is a very good starting point. The ‘pipelineTask’ input is used by the pipeline to either create (apply) or destroy the deployment. More on that later.

formatVersion: 1
inputs:
  pipelineTask:
    type: string
    title: Pipeline Task
    description: 'Create '
    readOnly: true
    default: create
resources:
  cs.pipeline:
    type: codestream.execution
    properties:
      pipelineId: 2b80427c...
      outputs:
        computed: true
      inputs:
        deploymentName: ${env.deploymentName}
        pipelineTask: ${input.pipelineTask}

vRA doesn’t delete the actual codestream.execution items when you destroy the deployment. A workflow called ‘Terraform delete AKS and Helm deployment’ is called by an Event Broker Subscription (EBS). Make sure to update the ‘codestreamPipelineId’ in the WF variables.

And finally on to the pipeline. The initialize task copies several variables into a file, which is then sourced by most stages. Terraform apply is only fired if the pipelineTask = ‘create’. And Terraform destroy is only fired when pipelineTask = ‘destroy’.

‘Get Service IP’ is also only fired if the pipelineTask = ‘create’. This task will get the IP address of WordPress and export it back to vRA.
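The gating on pipelineTask can be sketched in a few lines of Python. This is only an illustration of the branching described above, not the pipeline's actual task code; the function names are hypothetical, and the `-auto-approve` flag is my assumption for a non-interactive run.

```python
def terraform_args(pipeline_task: str) -> list:
    """Pick the Terraform command line for the given pipelineTask value."""
    if pipeline_task == "create":
        return ["terraform", "apply", "-auto-approve"]
    if pipeline_task == "destroy":
        return ["terraform", "destroy", "-auto-approve"]
    raise ValueError(f"unsupported pipelineTask: {pipeline_task!r}")


def runs_get_service_ip(pipeline_task: str) -> bool:
    """'Get Service IP' only makes sense after a create."""
    return pipeline_task == "create"


for task in ("create", "destroy"):
    print(task, terraform_args(task), runs_get_service_ip(task))
```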

Nuff for now. Happy coding.

CyberArk Ansible Integration

As an alternative to vRA Cloud Secrets

Well, it’s been a while since I posted anything. To be honest, this site and posts were used to support my vExpert applications, but apparently blog content doesn’t count anymore. So... now that I’m free from that obligation, I can just post because I want to.

This article details my efforts to understand how CyberArk and Ansible work together. My particular use case is to replace vRA Cloud secrets with variables stored in CyberArk. More specifically, the issue with vRA secrets is that each one is limited to a single Project. This doesn’t work too well for a company with more than one project. You basically end up with one secret (mysecret) per project, so if you have 10 projects, you have 10 secrets named mysecret (one for each project).

Now down to business. The first thing is to set up CyberArk following the instructions from their Quick Start tutorial. The basic setup is done by step 6, so there is no real need to go past that unless you want to. A couple of notes here. First, the Master Key (Step 2) and Admin api_key (Step 5) are saved to a text file on your Docker host. Second, the SSL certificate generated by the installer uses localhost, proxy, and 127.0.0.1 as the SANs by default. You can change this in conjur-quickstart/conf/tls/tls.conf. I’ll be using the default proxy as the hostname, along with some entries in /etc/hosts on my Mac and Ansible host.

Next I installed the CyberArk CLI on my Mac. The instructions are available here. Note it is only supported on Windows, RHEL and Mac.

The setup file on my Mac, ~/.conjurrc, looks like this.

cert_file: /Users/me/conjur-server.pem
conjur_account: myConjurAccount
conjur_url: https://proxy:8443

Now to define some CyberArk Conjur (conjur) policy files. The first was to define a new clean branch for my ansible policies. I called it mybranch (Hey it was Friday and I already used my weekly good braincell quota). I even used a creative name, ‘create-ansible-branch.yaml’.

- !policy
  id: mybranch

And to apply it (assuming you’ve already logged in as Admin).

mymac>conjur policy replace -b root -f create-ansible-branch.yaml
mymac>conjur list
[
    "myConjurAccount:policy:mybranch",
    "myConjurAccount:policy:root"
]

Now on to defining the Ansible host (ansible2).

- !layer

- !host ansible2

- !grant
  role: !layer
  member: !host ansible2

mymac>conjur policy load -b mybranch -f ansible2-host-policy.yaml

The result will contain an api_key for the new host. You’ll probably want to copy this into your scratch pad.

  {
      "created_roles": {
          "myConjurAccount:host:mybranch/ansible2": {
              "id": "myConjurAccount:host:mybranch/ansible2",
              "api_key": "1xgpkp02d8etyz2zb........" # <--- api_key
          }
      },
      "version": 2
  }

Now to create a new group, variable, and grant ansible2 permissions.

# Declare the secrets which are used to access the database
- &variables
  - !variable password2

# Define a group which will be able to fetch the secrets
- !group secrets-users

- !permit
  resource: *variables
  # "read" privilege allows the client to read metadata.
  # "execute" privilege allows the client to read the secret data.
  # These are normally granted together, but they are distinct
  #   just like read and execute bits on a filesystem.
  privileges: [ read, execute ]
  roles: !group secrets-users

# Entitlements
- !grant
  role: !group secrets-users
  member: !layer /mybranch

mymac>conjur policy load -b mybranch -f ansible2-access-policy.yaml
### Set the password variable value
mymac>conjur variable set -i mybranch/password2 -v "HelloWorld"

Our work with CyberArk is done for the time being. Now on to your Ansible host. Here the assumption is that your Ansible host is set up properly. First install the cyberark.conjur collection.

ubuntu@ansible2$ ansible-galaxy collection install cyberark.conjur

Now to define some files on your ansible host. The file names and content are shown below. You can figure out how to get the contents of conjur.pem.

/etc/conjur.conf

account: myConjurAccount
appliance_url: https://proxy:8443
cert_file: /etc/conjur.pem
netrc_path: /etc/conjur.identity
plugins: []

/etc/conjur.identity

machine https://proxy:8443/authn
    login host/mybranch/ansible2
    password gybp2n1wssmh1fr8n5k27.........


/etc/conjur.pem

-----BEGIN CERTIFICATE-----
.......
-----END CERTIFICATE-----
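Since /etc/conjur.identity is laid out like a netrc file (hence netrc_path in conjur.conf), a quick way to sanity check it is to parse it with Python's standard netrc module. This is just a sketch for checking your own file, not how the collection itself reads it; the helper name and the api key value are placeholders, and the example writes a throwaway copy instead of touching /etc.

```python
import netrc
import os
import tempfile

# Same layout as /etc/conjur.identity, with a placeholder api key.
IDENTITY = """machine https://proxy:8443/authn
    login host/mybranch/ansible2
    password example-api-key
"""


def read_identity(path: str, machine: str):
    """Return (login, api_key) for the given machine entry."""
    entry = netrc.netrc(path).authenticators(machine)
    if entry is None:
        raise KeyError(f"no entry for {machine}")
    login, _account, password = entry
    return login, password


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "conjur.identity")
    with open(path, "w") as handle:
        handle.write(IDENTITY)
    login, api_key = read_identity(path, "https://proxy:8443/authn")

print(login)
```

If the machine line does not match the appliance URL plus /authn, the lookup comes back empty, which is an easy mistake to catch this way.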

Almost there, now to define and run a basic ansible playbook. And by basic, I mean basic.

# get_conjur_var.yaml

---
- hosts: localhost
  tasks:
  - name: Lookup variable in Conjur
    debug:
      msg: "{{ lookup('cyberark.conjur.conjur_variable', 'mybranch/password2') }}"

ubuntu@ansible2$ ansible-playbook get_conjur_var.yaml

.... 
ok: [localhost] => {
    "msg": "HelloWorld"
}
....

The next article will demonstrate how to use this with vRA Cloud to replace all those repetitive secrets (per project, yuk!).

Custom vSphere Template import into AWS as AMI

My current customer asked if they could use the same vSphere template as an AWS AMI. The current vSphere template has a custom disk layout to help them troubleshoot issues. The default single disk layout for AMIs actually hinders their troubleshooting methodology.

Aside from the custom disk layout, I knew VMware Tools would have to be replaced with cloud-init. Sure, no problem. RIGHT! Well, actually it wasn’t that hard.

Well, I was finally able to get it to work, and learned a bunch along the way. Those lessons include:

The RHEL default DHCP client is incompatible with AWS.
EFI bios is only supported in larger, more expensive instances.
AWS VM image import.
Make sure to enable ‘disable_vmware_customization’, if that made sense.

Requirements

AWS roles, policies and permissions per this document.
S3 bucket (packer-import-example) to store the VMDK until it is imported.
Basic IAM user (packer) with the correct permissions assigned (see above).
vSphere environment to build the image.
A RHEL 8.x DVD ISO for installation.
HTTP repo to store the kickstart file.

Now down to brass tacks. To be honest it took lots of trial and error (mostly error) to get this working right. For example, on one pass Cloud-Init wouldn’t run on the imported AMI. After looking at cloud.cfg I noticed ‘disable_vmware_customization’ was set to false instead of ‘true’. Another error occurred when my first import attempt failed as the machine did not have a ‘DHCP client’. That was odd as it booted up fine in vSphere and got an IP Address. Apparently AWS only supports certain DHCP clients. Go figure.

Eventually the machine booted properly in AWS, with the user-data applied correctly. The working user-data is in the repo’s cloud-init directory.

And my super simple vRAC blueprint even worked. This simple BP adds a new user, assigns a password, and grants it sudo permissions.

A couple of notes on the packer amazon-import post processor. Those include,

The images are encrypted by default, even though ‘ami_encrypt’ defaults to false.
‘ami_name’ requires the AWS permission of ec2:CopyImage on the policy for the import role.
Don’t use the default encryption key if you wish to share this. You’ll need a Customer Managed Key (CMK). The import role (vmimport) will need to be a key user. You can set this with ‘ami_kms_key’ set to the Id of the CMK (i.e., ebea!!!!!!!!-aaaa-zzzz-xxxxxxxxxxxxxx)
The CMK needs to be shared with the target customer before sharing the AMI. ‘ami_org_arns’ allows you to set the organizations you’d like to share the AMI with.
There are lots of import options, you can check them out here.

This working example, plus others I’ve been working on, are available in this github repo.

Now onto another vRAC adventure.

Packer HCL and PVSCSI drivers

Just this last week I was updating an old Packer build configuration from JSON to HCL. But for the life of me, I could not get a new vSphere Windows 2019 machine to find a disk attached to a paravirtual (PVSCSI) disk controller.

I repeatedly received this error after the new machine booted.

In researching error 0x80042405 in C:\Windows\Panther\setuperr.log, I found that setup simply could not find the attached disk.

After some research I determined the PVSCSI drivers added to the floppy disk were not being discovered. Or more specifically, the new machine didn’t know to search the floppy for additional drivers.

After an almost exhaustive online search, I finally found a configuration section for my autounattend.xml file which would fix it.

The magic section reads as follows.

<unattend xmlns="urn:schemas-microsoft-com:unattend">
    <settings pass="windowsPE">
        <component name="Microsoft-Windows-PnpCustomizationsWinPE" processorArchitecture="amd64" publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS" xmlns:wcm="http://schemas.microsoft.com/WMIConfig/2002/State" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
            <DriverPaths>
                <PathAndCredentials wcm:action="add" wcm:keyValue="A">
                    <!-- pvscsi-Windows8.flp -->
                    <Path>A:\</Path>
                </PathAndCredentials>
            </DriverPaths>
        </component>
...
    </settings>
...
</unattend>

After adding this section, the new vSphere Windows machine easily found the additional drivers.

This was tested against Windows 2019 in both AWS and vSphere deployments.

The vSphere deployment took an hour, mostly waiting for the updates to be applied. AWS takes significantly less time as I’m using the most recently updated image they provide.

The working files are located in the packer-hcl-vsphere-aws github repo.

Code Stream Nested Esxi pipeline Part 2

In this second part, I’ll discuss the actual Code Stream pipeline.

As stated before, the inspiration was William Lam’s wonderful PowerShell scripts to deploy a nested environment from a CLI. His original logic was retained as much as possible; however, due to the nature of K8s, a few things had to be changed. I’ll try to address those as they come up.

After some thought I decided NOT to allow the requester to select the amount of memory, vCPU, or VSAN size. Each ESXi host has 24G of RAM, 4 vCPUs, and contributes a touch over 100G to the VSAN. The resulting cluster has 72G of RAM, 12 vCPUs and a roughly 300G VSAN. Only Standard vSwitches are configured in each host.
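The cluster totals follow directly from the fixed per-host sizing. A small Python sketch of the arithmetic is below; the host count of three is inferred from the totals (72G / 24G), and the VSAN contribution is rounded down to an even 100G, so treat both as assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostSpec:
    """Fixed sizing for one nested host."""
    ram_gb: int = 24
    vcpus: int = 4
    vsan_gb: int = 100  # "a touch over 100G", rounded down here


def cluster_totals(host_count: int, spec: HostSpec = HostSpec()) -> dict:
    """Multiply the per-host sizing out to cluster totals."""
    return {
        "ram_gb": host_count * spec.ram_gb,
        "vcpus": host_count * spec.vcpus,
        "vsan_gb": host_count * spec.vsan_gb,
    }


totals = cluster_totals(3)  # three hosts assumed
print(totals)
```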

The code, pipeline and other information is available on this github repo.

Deployment of the ESXi hosts is initiated by ‘deployNestedEsxi.ps1’. There are a few changes from the original script.

The OVA configuration is only grabbed once. Then only the specific host settings (IP Address and Name) are changed.
The hosts are moved into a vApp once built.
The NetworkAdapter settings are performed after deployment.
The log is persisted to /var/workspace_cache/logs/vsphere-deployment-$BUILDTIME.log.

Deployment of the vCSA is handled by ‘deployVcsa.ps1’. Some notable changes from the original code include:

Hardcoded the SSO username to administrator@vsphere.local.
Hardcoded the size to ‘tiny’.
Save the log file to /var/workspace_cache/logs/NestedVcsa-$BUILDTIME.log.
Save the configuration template to /var/workspace_cache/vcsajson/NestedVcsa-$BUILDTIME.json.
Move the VCSA into the vApp after deployment is complete.

And finally ‘configureVc.ps1’ sets up the Cluster and VSAN. Some changes include:

Hardcoded the Datacenter name (DC), and Cluster (CL1).
Import the Esxi hosts by IP (No DNS records setup for the hosts or vCenter).
Append the configuration results to /var/workspace_cache/logs/vsphere-deployment-$BUILDTIME.log.

So there you go, a simple Code Stream pipeline to deploy a nested vSphere environment in about an hour.

Stay tuned. The next article will include an NSX-T deployment.

Code Stream Nested Esxi pipeline Part 1

Been a while since my last post. Over the last couple of months I’ve been tinkering with using Code Stream to deploy a Nested Esxi / vCenter environment.

My starting point is William Lam’s excellent PowerShell script (vsphere-with-tanzu-nsxt-automated-lab-deployment). I also wanted to use the official vmware/powerclicore Docker image.

Well let’s just say it’s been an adventure. Much has been learned through trial and (mostly) error.

For example, in William’s script, all of the files are located on the workstation where the script runs. Creating a custom Docker image with those files would have resulted in a HUGE image, almost 16GB (Nested ESXi appliance, vCSA appliance and supporting files, and NSX-T OVA files). As one of my co-workers says, “Don’t be that guy”.

At first I tried cloning the files into the container as part of the CI setup. Downloading the ESXi OVA worked fine, but failed when I tried copying over the vCSA files. I think it’s just too much.

I finally opted to use a Kubernetes Code Stream pipeline instead of a Docker one. This allowed me to use a Persistent Volume Claim.

Kubernetes setup

Some of the steps may lack details, as this has been an ongoing effort and I just can’t remember everything. Sorry peeps!

Create two namespaces, codestream-proxy and codestream-workspace. Codestream-proxy is used by Code Stream to host a Proxy pod.

Codestream-workspace will host the containers running the pipeline code.

Next came the service account for Code Stream. The path of least resistance was to simply assign ‘cluster-admin’ to the new service account. NOTE: Don’t do this in a production environment.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cs-cluster-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: codestream
  apiGroup: ""
  namespace: default

Next came the Persistent Volume (pv) and Persistent Volume Claim (pvc). My original pv was set to 20Gi, which after some testing proved too small, so I increased it to 30Gi. The larger pv allowed me to retain logs and configurations between runs (for troubleshooting).

apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
  name: cs-persistent-volume-cw
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 30Gi
  hostPath:
    path: /mnt/nested
  persistentVolumeReclaimPolicy: Retain

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cs-pvc-cw
  namespace: codestream-workspace
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 30Gi
  volumeName: cs-persistent-volume-cw

The final step in k8s is to get the Service Account token. In this example the SA is called ‘codestream’ (So creative).

k get secret codestream-token-blah!!! -o jsonpath={.data.token} | base64 -d | tr -d "\n"

eyJhbGciOiJSUzI1NiIsImtpZCI6IncxM0hIYTZndS1xcEdFVWR2X1Z4UFNLREdQcGdUWDJOWUF1NDE5YkZzb.........

Copy the token, then head off to Code Stream.
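If you would rather not chain jsonpath, base64 and tr, the same decode can be done in a few lines of Python against the secret's JSON. This is a sketch only: the function name is hypothetical, and the example secret below is made up to stand in for the output of `kubectl get secret <name> -o json`.

```python
import base64
import json


def decode_service_account_token(secret_json: str) -> str:
    """Pull .data.token out of a Secret and base64 decode it."""
    secret = json.loads(secret_json)
    encoded = secret["data"]["token"]
    return base64.b64decode(encoded).decode("utf-8").strip()


# Made-up secret for illustration; a real one comes from kubectl.
example_secret = json.dumps(
    {"data": {"token": base64.b64encode(b"example-token").decode("ascii")}}
)

print(decode_service_account_token(example_secret))
```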

Codestream setup

There I added a Variable to hold the token, called DAG-K8S-Secret.

Then I went over to Endpoints, where I added a new Kubernetes endpoint.

Repo setup

The original plan was to download the OVA/OVF files from a repo every time the pipeline ran. However an error would occur on every VCSA file set download. Adding more memory to the container didn’t fix the problem, so I had to go in another direction.

The repo is well connected to the k8s cluster, so the transfer is pretty quick. Here is the directory structure for the repo (http://repo.corp.local/repo/).

NOTE: You will need a valid account to download VCSA and NSX-T.

NOTE: NSX-T will be added to the pipeline later.

Simply copying the files interactively on the k8s node seemed like the next logical step. Yes the files copied over nicely, but any attempt to deploy the VCSA appliance would throw a python error complaining about a missing ‘vmware’ module.

However I was able to run the container manually, copy the files over and run the scripts successfully. Maybe a file permissions issue?

Finally I ran the pipeline with a long sleep at the beginning, opened an interactive session into the running pod, and copied the files over. This fixed the problem.

Here are the commands I used to copy the files over interactively.

k -n codestream-workspace exec -it po/running-cs-pod-id -- bash
wget -mxnp -q -nH http://repo.corp.local/repo/ -P /var/workspace_cache/ -R "index.html*"
# /var/workspace_cache is the mount point for the persistent volume
# need to chmod +x a few files to get the vCSA to deploy
chmod +x /var/workspace_cache/repo/vcsa/VMware-VCSA-all-7.0.3/vcsa/ovftool/lin64/ovftool*
chmod +x /var/workspace_cache/repo/vcsa/VMware-VCSA-all-7.0.3/vcsa/vcsa-cli-installer/lin64/vcsa-deploy*

This should do it for now. The next article will cover some of the pipeline details, and some of the changes I had to make to William Lam’s PowerShell code.

Happy holidays.

Cloud Extensibility Appliance vRO Properties using PowerShell

In this article I’ll show you how to return JSON as a vRO Property type using vRA Cloud Extensibility Proxy (CEXP) vRO PowerShell 7 scriptable tasks.

First a couple of notes about the CEXP.

It is BIG, 32GB of RAM. However my lab instance is using less than 7 GB active memory.
8 vCPU, and runs about 50% on average.
It deploys with 4 disks, using a tad less than 210 GB.

Why PowerShell 7? Well, it was a design decision based on the customer’s PowerShell proficiency.

Now down to the good stuff. Here are the details of this basic workflow using PowerShell 7 as Scriptable Tasks.

The first scriptable task gets a new vRA Cloud bearer token.
- It saves the token, along with other common header values, to an output variable named ‘headers’ (Properties).
The second scriptable task uses the headers and apiEndpoint to GET the vRAC version information (About).
- It then saves the version information to an output variable named ‘vRacAbout’ (Properties).

Getting (actually POST) the bearerToken is fairly simple. Here is the code for the first task.

function Handler($context, $inputs) {
    <#
    .PARAMETER $inputs.refreshToken (SecureString)
        vRAC Refresh Token

    .PARAMETER $inputs.apiEndpoint (String)
        vRAC Base API URL

    .OUTPUT headers (Properties)
        Headers including the bearerToken

    #>
    $body = @{ refreshToken = $inputs.refreshToken } | ConvertTo-Json

    $headers = @{'Accept' = 'application/json'
                'Content-Type' = 'application/json'}
    
    $Uri = $inputs.apiEndpoint + "/iaas/api/login"
    $requestResponse = Invoke-RestMethod -Uri $Uri -Method Post -Body $body -Headers $headers 

    $bearerToken = "Bearer " + $requestResponse.token 
    $authorization = @{ Authorization = $bearerToken}
    $headers += $authorization

    $output=@{headers = $headers}

    return $output
}

The second task consumes the headers produced by the first task, then GET(s) the Version Information from the vRA Cloud About route (‘/iaas/api/about’). The results are then returned as the vRacAbout (Properties) variable.

function Handler($context, $inputs) {
    <#
    .PARAMETER $inputs.headers (Properties)
        Headers including the bearerToken, produced by the first task

    .PARAMETER $inputs.apiEndpoint (String)
        vRAC Base API URL

    .OUTPUT vRacAbout (Properties)
        vRAC version information from the About route

    #>
    # Build the About route URL from the base API endpoint
    $requestUri = $inputs.apiEndpoint + "/iaas/api/about"
    # Use the headers passed in from the first task
    $requestResponse = Invoke-RestMethod -Uri $requestUri -Method Get -Headers $inputs.headers

    $output=@{vRacAbout = $requestResponse}

    return $output
}
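For anyone more at home in Python, the data flow between the two tasks can be sketched without any network calls. This is not a drop-in replacement for the PowerShell tasks; the function names and the example endpoint are hypothetical, and trimming a trailing slash is my own addition.

```python
def build_headers(token: str) -> dict:
    """Mirror of the first task: common headers plus the Authorization value."""
    headers = {
        "Accept": "application/json",
        "Content-Type": "application/json",
    }
    headers["Authorization"] = "Bearer " + token
    return headers


def about_url(api_endpoint: str) -> str:
    """Mirror of the second task: base API URL plus the About route."""
    return api_endpoint.rstrip("/") + "/iaas/api/about"


headers = build_headers("example-token")          # placeholder token
url = about_url("https://api.example.com/")       # placeholder endpoint
print(url)
print(sorted(headers))
```

The point is the same as in the workflow: the first step produces a plain key/value structure, and the second step consumes it as is.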

Here, you can see the output variables for both tasks are populated. Pretty cool.

As you can see, using the vRO Properties type is fairly simple with PowerShell on the CEXP vRO.

The working workflow package is available here.

Happy coding.