Packer HCL and PVSCSI drivers

Just this last week I was updating an old Packer build configuration from JSON to HCL, but for the life of me I could not get a new vSphere Windows Server 2019 machine to find a disk attached to the paravirtual SCSI (PVSCSI) controller.

I repeatedly received this error after the new machine booted.

Error

In researching error 0x80042405 in C:\Windows\Panther\setuperr.log, I found the installer simply could not find the attached disk.

setuperr.log

After some research I determined the PVSCSI drivers added to the floppy disk were not being discovered. More specifically, the new machine didn't know to search the floppy for additional drivers.

After an almost exhaustive online search, I finally found a configuration section for my autounattend.xml file that fixed it.

The magic section reads as follows.

<unattend xmlns="urn:schemas-microsoft-com:unattend">
    <settings pass="windowsPE">        
       <component name="Microsoft-Windows-PnpCustomizationsWinPE" processorArchitecture="amd64" publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS" xmlns:wcm="http://schemas.microsoft.com/WMIConfig/2002/State" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
            <DriverPaths>
                <PathAndCredentials wcm:action="add" wcm:keyValue="A">
                    <!-- pvscsi-Windows8.flp -->
                    <Path>A:\</Path>
                </PathAndCredentials>
            </DriverPaths>
        </component>
...
    </settings>
</unattend>

After adding this section, the new vSphere Windows machine easily found the additional drivers.

This was tested against Windows 2019 in both AWS and vSphere deployments.

The vSphere deployment took an hour, mostly waiting for the updates to be applied. AWS takes significantly less time as I’m using the most recently updated image they provide.

The working files are located in the packer-hcl-vsphere-aws GitHub repo.

Code Stream Nested Esxi pipeline Part 2

In this second part, I’ll discuss the actual Code Stream pipeline.

As stated before, the inspiration was William Lam's wonderful PowerShell scripts to deploy a nested environment from a CLI. His original logic was retained as much as possible; however, due to the nature of K8s, a few things had to be changed. I'll try to address those as they come up.

After some thought I decided NOT to allow the requester to select the amount of memory, vCPUs, or VSAN size. Each Esxi host has 24GB of RAM, 4 vCPUs, and contributes a touch over 100GB to the VSAN. The resulting cluster has 72GB of RAM, 12 vCPUs, and roughly a 300GB VSAN. Only standard vSwitches are configured in each host.

The code, pipeline, and other information are available in this GitHub repo.

Deployment of the Esxi hosts is initiated by ‘deployNestedEsxi.ps1’. There are a few changes from the original script:

  1. The OVA configuration is only grabbed once, then only the specific host settings (IP address and name) are changed.
  2. The hosts are moved into a vApp once built.
  3. The NetworkAdapter settings are performed after deployment.
  4. The log is persisted to /var/workspace_cache/logs/vsphere-deployment-$BUILDTIME.log.

Deployment of the vCSA is handled by ‘deployVcsa.ps1’. Some notable changes from the original code include:

  1. Hardcoded the SSO username to administrator@vsphere.local.
  2. Hardcoded the size to ‘tiny’.
  3. Saved the log file to /var/workspace_cache/logs/NestedVcsa-$BUILDTIME.log.
  4. Saved the configuration template to /var/workspace_cache/vcsajson/NestedVcsa-$BUILDTIME.json.
  5. Moved the VCSA into the vApp after deployment is complete.

And finally, ‘configureVc.ps1’ sets up the cluster and VSAN. Some changes include:

  1. Hardcoded the Datacenter name (DC) and Cluster name (CL1).
  2. Imported the Esxi hosts by IP (no DNS records were set up for the hosts or vCenter).
  3. Appended the configuration results to /var/workspace_cache/logs/vsphere-deployment-$BUILDTIME.log.

So there you go, a quick and simple Code Stream pipeline to deploy a nested vSphere environment in about an hour.

Stay tuned. The next article will include an NSX-T deployment.

Code Stream Nested Esxi pipeline Part 1

It's been a while since my last post. Over the last couple of months I've been tinkering with using Code Stream to deploy a nested Esxi / vCenter environment.

My starting point is William Lam's excellent PowerShell script (vsphere-with-tanzu-nsxt-automated-lab-deployment). I also wanted to use the official vmware/powerclicore Docker image.

Well let’s just say it’s been an adventure. Much has been learned through trial and (mostly) error.

For example, in William's script all of the files are located on the workstation where the script runs. Creating a custom Docker image with those files would have resulted in a HUGE image, almost 16GB (nested ESXi appliance, vCSA appliance and supporting files, and NSX-T OVA files). As one of my co-workers says, “Don’t be that guy”.

At first I tried cloning the files into the container as part of the CI setup. Downloading the ESXi OVA worked fine, but it failed when I tried copying over the vCSA files. I think they're just too large.

I finally opted to use a Kubernetes Code Stream workspace instead of a Docker pipeline. This allowed me to use a Persistent Volume Claim.

Kubernetes setup

Some of the steps may lack details, as this has been an ongoing effort and I just can't remember everything. Sorry peeps!

Create two namespaces, codestream-proxy and codestream-workspace. The codestream-proxy namespace is used by Code Stream to host a proxy pod.

The codestream-workspace namespace will host the containers running the pipeline code.

Next came the service account for Code Stream. The path of least resistance was to simply assign ‘cluster-admin’ to the new service account. NOTE: Don’t do this in a production environment.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cs-cluster-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: codestream
  apiGroup: ""
  namespace: default

Next came the Persistent Volume (pv) and Persistent Volume Claim (pvc). My original pv was set to 20Gi, which after some testing was determined to be too small. It was subsequently increased to 30Gi. The larger pv allowed me to retain logs and configurations between runs (for troubleshooting).

apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
  name: cs-persistent-volume-cw
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 30Gi
  hostPath:
    path: /mnt/nested
  persistentVolumeReclaimPolicy: Retain
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cs-pvc-cw
  namespace: codestream-workspace
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 30Gi
  volumeName: cs-persistent-volume-cw

The final step in k8s is to get the Service Account token. In this example the SA is called ‘codestream’ (So creative).

k get secret codestream-token-blah!!! -o jsonpath={.data.token} | base64 -d | tr -d "\n"

eyJhbGciOiJSUzI1NiIsImtpZCI6IncxM0hIYTZndS1xcEdFVWR2X1Z4UFNLREdQcGdUWDJOWUF1NDE5YkZzb.........

Copy the token, then head off to Code Stream.

Codestream setup

There I added a Variable to hold the token, called DAG-K8S-Secret.

Then I went over to Endpoints, where I added a new Kubernetes endpoint.

Repo setup

The original plan was to download the OVA/OVF files from a repo every time the pipeline ran. However, an error would occur on every VCSA file set download. Adding more memory to the container didn't fix the problem, so I had to go in another direction.

The repo is well connected to the k8s cluster, so the transfer is pretty quick. Here is the directory structure for the repo (http://repo.corp.local/repo/).

NOTE: You will need a valid account to download VCSA and NSX-T.

NOTE: NSX-T will be added to the pipeline later.

Simply copying the files interactively on the k8s node seemed like the next logical step. Yes, the files copied over nicely, but any attempt to deploy the VCSA appliance would throw a Python error complaining about a missing ‘vmware’ module.

However, I was able to run the container manually, copy the files over, and run the scripts successfully. Maybe it was a file permissions issue?

Finally, I ran the pipeline with a long sleep at the beginning, used an interactive session, and copied the files over. This fixed the problem.

Here are the commands I used to copy the files over interactively.

k -n codestream-workspace exec -it po/running-cs-pod-id bash
wget -mxnp -q -nH http://repo.corp.local/repo/ -P /var/workspace_cache/ -R "index.html*"
# /var/workspace_cache is the mount point for the persistent volume
# need to chmod +x a few files to get the vCSA to deploy
chmod +x /var/workspace_cache/repo/vcsa/VMware-VCSA-all-7.0.3/vcsa/ovftool/lin64/ovftool*
chmod +x /var/workspace_cache/repo/vcsa/VMware-VCSA-all-7.0.3/vcsa/vcsa-cli-installer/lin64/vcsa-deploy*

This should do it for now. The next article will cover some of the pipeline details, and some of the changes I had to make to William Lam's PowerShell code.

Happy holidays.

VMware Code Stream saved Custom Integration version issues

First the bad news. The VMware Code Stream Cloud version has a limit of 300 saved Custom Integrations and versions. Your once-working pipelines will all of a sudden get a validation error of “The saved Custom Integration version is no longer available” if you exceed this limit.

Now the not so good news. They haven’t fixed it yet!

And now the details.

vExpert 2021 Applications are open

The 2021 vExpert applications are now open!

The program “is about giving back to the community beyond your day job”.

One way I give back is by posting new and unique content here once or twice a month. Sometimes a post is simply me clearing a thought before the weekend, completing a commitment to a BU, or documenting something before moving on to another task. It doesn’t take long, but could open the door for one of my peers.

My most frequently used benefit is the vExpert and Cloud Management Slack channels. I normally learn something new every week. And it sure does feel good to help a peer struggling with something I’ve already tinkered with.

Here’s a list of some of the benefits for receiving the award.

  • Networking with 2,000+ vExperts / Information Sharing
  • Knowledge Expansion on VMware & Partner Technology
  • Opportunity to apply for vExpert BU Lead Subprograms
  • Possible Job Opportunities
  • Direct Access to VMware Business Units via Subprograms
  • Blog Traffic Boost through Advocacy, @vExpert, @VMware, VMware Launch & Announcement Campaigns
  • 1 Year VMware Licenses for Home Labs for almost all Products & Some Partner Products
  • Private VMware & VMware Partner Sessions
  • Gifts from VMware and VMware Partners
  • vExpert Celebration Parties at both VMworld US and VMworld Europe with VMware CEO, Pat Gelsinger
  • VMware Advocacy Platform Invite (share your content to thousands of vExperts & VMware employees who amplify your content via their social channels)
  • Private Slack Channels for vExpert and the BU Lead Subprograms

The applications close on January 9th, 2021. Start working on those applications now.

VMware Cloud Assembly Custom Integration lessons learned

I’ve recently spent some time refactoring Sam McGeown’s Image as Code (IaC) Codestream pipeline for VMware vRealize Automation Cloud. The following details some of the lessons learned.

First off, I found the built-in Code Stream REST tasks do not have a retry. I learned this the hard way when VMware had issues with their cloud offering last month. At times a task would get a 500 error back when making a request, resulting in a failed execution.

This forced me to look at Python custom integrations that would retry until the correct success code was returned. I was able to get the pipeline working, but it had a lot of repetitive code, lacked the ability to limit the number of retries, and was based on Python 2.

Seeing the error of my ways, I decided to refactor the code again with a custom module (for the repetitive code) and migrate to Python 3.

The original Docker image was CentOS based and did not have Python 3 installed. Instead of just installing Python 3 and increasing the size of that image, I opted to start with the Docker Official Python 3 image. I'll get to the build file later.

Now on to the actual refactoring. Here I wanted to combine the reused code into a custom Python module. My REST calls include POST (to get a Bearer Token), GET (with and without a filter), PATCH (to update Image Mappings), and DELETE (to delete the test Image Profile and Cloud Template).

This module snippet includes PostBearerToken_session, which returns the bearerToken and other headers, and GetFilteredVrac_sessions, which returns a filtered GET request. It also limits the retries to 5 attempts.

import requests
import logging
import os 
import json 
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry

# retry strategy to tolerate API errors.  Will retry if the status_forcelist matches.
retry_strategy = Retry (
    total=5,
    status_forcelist=[429, 500, 502, 503, 504],
    method_whitelist=["GET", "POST"],
    backoff_factor=2,
    raise_on_status=True,
)


adapter = HTTPAdapter(max_retries=retry_strategy)
https = requests.Session()
https.mount("https://", adapter)

vRacUrl = "https://api.mgmt.cloud.vmware.com"

def PostBearerToken_session(refreshToken):
    # Post to get the bearerToken from refreshToken
    # Will return the headers with Authorization populated
    # Build post payload 
    pl = {}
    pl['refreshToken'] = refreshToken.replace('\n', '')
    logging.info('payload is ' + json.dumps(pl))
    loginURL = vRacUrl + "/iaas/api/login"
    headers = {
        "Accept": "application/json",
        "Content-Type": "application/json"
        }
    r = https.post(loginURL, json=pl, headers=headers)
    responseJson = r.json()
    token = "Bearer " + responseJson["token"]
    headers['Authorization']=token
    return headers 

def GetFilteredVrac_sessions(requestUrl, headers, requestFilter):
    # Get a thing using a filter
    requestUrl = vRacUrl + requestUrl + requestFilter
    print(requestUrl)
    adapter = HTTPAdapter(max_retries=retry_strategy)
    https = requests.Session()
    https.mount("https://", adapter)
    r = https.get(requestUrl, headers=headers)
    responseJson = r.json()
    return responseJson

Here is the working Code Stream Custom Integration used to test this module. It will get the headers, then send a filtered request using the Cloud Account Name. It then pulls some information out of the payload and logs it.


runtime: "python3"
code: |
  import json
  import requests
  import os
  import sys
  import logging

  # append /build to the path
  # this is where the custom python module is copied to
  sys.path.append('/build')
  import vRAC

  # context.py is automatically added.
  from context import getInput, setOutput

  def main():
      refreshToken = getInput('RefreshToken')
      # now with the new module
      # authHeaders will have all of the required headers, including the bearerToken
      authHeaders = vRAC.PostBearerToken_session(refreshToken)
      # test the GET function with a filter
      requestUrl = "/iaas/api/cloud-accounts"
      requestFilter = "?$filter=name eq '" + getInput('cloudAccountName') + "'"
      # get the cloudAccount by name
      cloudAccountJson = vRAC.GetFilteredVrac_sessions(requestUrl, authHeaders, requestFilter)
      # get some specific data out for later
      cloudAccountId = cloudAccountJson['content'][0]['id']
      logging.info('cloudAccountId: ' + cloudAccountId)

  if __name__ == '__main__':
      main()
inputProperties:      # Enter fields for input section of a task
  # Password input
  - name: RefreshToken
    type: text
    title: vRAC Refresh Token
    placeHolder: 'secret/password field'
    defaultValue: changeMe
    required: true
    bindable: true
    labelMessage: vRAC RefreshToken

  - name: cloudAccountName
    type: text
    title: Cloud Account Name
    placeHolder: vc.corp.local
    defaultValue: vc.corp.local
    required: true
    bindable: true
    labelMessage: Cloud Account Name

Next, the new Docker image. This uses the official Python 3 image as a starting point. The build file copies everything over (including the custom module and requirements.txt), then installs ‘requests’.

FROM python:3

WORKDIR /build

COPY . ./
RUN pip install --no-cache-dir -r requirements.txt

Now that the framework is ready, it's time to create the pipeline and test it. This is well documented in Creating and using pipelines in VMware Code Stream. Update the Host field with your predefined Docker host, set the Builder image URL (Docker Hub repo and tag), and set the Working directory to ‘/build’ (to match WORKDIR in the Dockerfile).

Running the pipeline worked and returned the requested information.

This was a fairly brief article. I really just wanted to get everything written down before the weekend. I’ll have more in the near future.

VMware vRealize Automation Cloud Image Profiles lessons learned

Over the last month I've spent most of my time trying to replicate Sam McGeown's Build, test and release VM images with vRealize Automation Code Stream and Packer in vRealize Automation Cloud. One of the tasks is to create or update an Image Profile. The following article details some of the things I've learned.

First off, the Image Mappings shown in the UI cannot be updated via the API directly. The API allows you to create, change, and delete Image Profiles. An Image Profile contains the Image Mappings and is tied to a Region.

For example, in this image the IaC Image Mappings are displayed.

Here are some of the same Image Mappings as seen when you GET the Image Profile by id.

{
  "imageMapping": {
    "mapping": {
      "IaC-build-test-patch": {
        "id": "8fc331632163f53fd0c66e0407495504295b4c1c",
        "name": "",
        "description": "Template: vSphere-CentOS8-CUSTOM-2020.09.25.181019"
      },
      "IaC-prod-profile": {
        "id": "2e2d31be93c59531d2c1eeeadc58f68b66174559",
        "name": "",
        "description": "Template: vSphere-CentOS8-CUSTOM-2020.09.25.144314"
      },
      "IaC-test-profile": {
        "id": "842c91f05185978d62d201df3b47d1505cf3fea3",
        "name": "",
        "description": "Generic CentOS 7 template with cloud-init installed and VM hardware version 13 (compatible with ESXi 6.5 or greater)."
      }
    }
  },
  "regionId": "71cecc477594a67558b9d5xxxxxxx",
  "name": "IaC-build-test-profile",
  "description": "Packer build image for testing"
}

But how do you get the Image Profile Id? I ended up using a filter based on the externalRegionId (I used another filtered search to find the externalRegionId by Region Name).

https://api.mgmt.cloud.vmware.com/iaas/api/image-profiles?$filter=externalRegionId eq 'Datacenter:datacenter-xyz'

This returned the following payload (cleaned up a bit). I reference this later as cloudAccountJson.

{
  "content": [
    {
      "imageMappings": {
        "mapping": {
          "IaC-prod-profile": {
            ...
          },
          "IaC-build-test-patch": {
            ...
          },
          "IaC-test-profile": {
            ...
          }
        },
        "externalRegionId": "",
        ...
      },
      "externalRegionId": "Datacenter:datacenter-xyz",
      ...
      "name": "IaC-build-test-profile",
      "description": "Packer build image for testing",
      "id": "fa57fef8-5d0e-494b-b299-7e4a9030ac11-71cecc477594a67558b9d5f056260",
      ...
    }
  ],
  "totalElements": 1,
  "numberOfElements": 1
}

The id will be used later to update (PATCH) the Image Profile.
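For reference, pulling that id out of the filtered response can look roughly like the following. This is a minimal sketch rather than code from the pipeline: the get_image_profile_id helper and its parameters are illustrative, and it assumes you already have the Authorization headers (a bearer token from /iaas/api/login).

import requests

vRacUrl = "https://api.mgmt.cloud.vmware.com"

def get_image_profile_id(headers, externalRegionId):
    # Filtered GET for the Image Profile tied to a region,
    # e.g. externalRegionId = 'Datacenter:datacenter-xyz'
    requestUrl = vRacUrl + "/iaas/api/image-profiles"
    requestFilter = "?$filter=externalRegionId eq '" + externalRegionId + "'"
    r = requests.get(requestUrl + requestFilter, headers=headers)
    r.raise_for_status()
    cloudAccountJson = r.json()
    # the profile id sits in the first (and only) element of 'content'
    return cloudAccountJson['content'][0]['id']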

Now to build the PATCH body to update the Profile. The API has the following body example.

{ "name": "string", "description": "string", "imageMapping": "{ \"ubuntu\": { \"id\": \"9e49\", \"name\": \"ami-ubuntu-16.04-1.9.1-00-1516139717\"}, \"coreos\": { \"id\": \"9e50\", \"name\": \"ami-coreos-26.04-1.9.1-00-543254235\"}}", "regionId": "9e49" }

Then, using the cloudAccountJson returned previously, I built a new body using the following (partial) code (I couldn't get the formatting right, hence the image).
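In rough terms, the body is built from the profile returned above. Here is a minimal sketch of that idea; the patch_image_profile helper, its parameter names, and the way the new image id is passed in are illustrative, not the exact code from the pipeline. It rebuilds the full mapping (so no mappings are lost), JSON-encodes it because the API expects imageMapping as a string, and PATCHes the profile by id.

import json
import requests

vRacUrl = "https://api.mgmt.cloud.vmware.com"

def patch_image_profile(headers, profile, regionId, mappingName, newImageId):
    # 'profile' is one element of cloudAccountJson['content'] from the filtered GET
    # rebuild every existing mapping so nothing is dropped by the PATCH
    mapping = {}
    for name, image in profile['imageMappings']['mapping'].items():
        mapping[name] = {"id": image['id'], "name": image.get('name', '')}
    # point the chosen mapping at the newly built image
    mapping[mappingName] = {"id": newImageId, "name": ""}

    body = {
        "name": profile['name'],
        "description": profile['description'],
        "imageMapping": json.dumps(mapping),   # the API wants a JSON-encoded string here
        "regionId": regionId
    }
    r = requests.patch(vRacUrl + "/iaas/api/image-profiles/" + profile['id'],
                       json=body, headers=headers)
    r.raise_for_status()
    return r.json()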

Now some gotchas.

First, remember that the Image Mappings are tied to the region. You will lose any Image Mappings NOT included in the POST/PATCH body. Make sure you back up the Image Profile settings (do a GET by Image Profile id) before attempting to change the mappings via the API.
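A quick way to take that backup is to GET the profile by id and write the response to disk. A minimal sketch, assuming the same headers as before and an arbitrary backup path:

import json
import requests

vRacUrl = "https://api.mgmt.cloud.vmware.com"

def backup_image_profile(headers, profileId, backupPath):
    # GET the Image Profile by id and save it before changing any mappings
    r = requests.get(vRacUrl + "/iaas/api/image-profiles/" + profileId, headers=headers)
    r.raise_for_status()
    with open(backupPath, 'w') as f:
        json.dump(r.json(), f, indent=2)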

Secondly, an Image Profile does not have a name by default. You need to set this via the API. Why would you need it? Well, you may want to find the Image Profile by name later. My current customer creates new customer Image Profiles via the API and uses a ‘tag’-like naming convention.

Thirdly, I've experienced several 500 errors when interfacing with the vRA Cloud API. The out-of-box Code Stream REST tasks do not retry. I ended up writing Python custom integrations as a workaround. These retry until receiving the correct response code (I've seen up to fifteen 500 errors before getting a 200).
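The retry logic in those custom integrations boils down to something like this (a minimal sketch; the function name, attempt cap, and delay are illustrative, not the values used in the actual pipeline):

import time
import requests

def get_with_retry(url, headers, expected=200, maxAttempts=20, delay=5):
    # keep calling the vRA Cloud API until the expected status code comes back
    for attempt in range(1, maxAttempts + 1):
        r = requests.get(url, headers=headers)
        if r.status_code == expected:
            return r.json()
        time.sleep(delay)
    raise RuntimeError("gave up after " + str(maxAttempts) + " attempts; last status " + str(r.status_code))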

This is just one of the things I've learned about the vRA Cloud API and Code Stream. I'll post more as I have time.