Code Stream Nested ESXi pipeline Part 1

Been a while since my last post. Over the last couple of months I’ve been tinkering with using Code Stream to deploy a Nested ESXi / vCenter environment.

My starting point is William Lam’s excellent PowerShell script (vsphere-with-tanzu-nsxt-automated-lab-deployment). I also wanted to use the official vmware/powerclicore docker image.

Well let’s just say it’s been an adventure. Much has been learned through trial and (mostly) error.

For example, in William’s script all of the files are located on the workstation where the script runs. Creating a custom docker image with those files would have resulted in a HUGE image, almost 16GB (Nested ESXi appliance, vCSA appliance and supporting files, and NSX-T OVA files). As one of my co-workers says, “Don’t be that guy”.

At first I tried pulling the files into the container as part of the CI setup. Downloading the ESXi OVA worked fine, but the download failed every time I tried to bring over the vCSA files. I suspect the vCSA file set is simply too large to transfer reliably that way.

I finally opted to use a Kubernetes workspace for the Code Stream pipeline instead of a Docker host. This allowed me to use a Persistent Volume Claim.

Kubernetes setup

Some of the steps may lack details, as this has been an ongoing effort and I just can’t remember everything. Sorry peeps!

Create two namespaces: codestream-proxy and codestream-workspace. The codestream-proxy namespace is used by Code Stream to host a proxy pod, while codestream-workspace will host the containers running the pipeline code.
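
Nothing fancy is needed here; a couple of kubectl commands (k = kubectl, the same alias used later in this post) take care of it:

k create namespace codestream-proxy
k create namespace codestream-workspace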

Next came the service account for Code Stream. The path of least resistance was to simply assign ‘cluster-admin’ to the new service account. NOTE: Don’t do this in a production environment.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cs-cluster-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: codestream
  apiGroup: ""
  namespace: default
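
Creating the service account itself isn’t shown above, so for completeness, something along these lines creates it (in the default namespace, to match the subject in the binding) and applies the binding, assuming the YAML was saved as cs-cluster-role-binding.yaml:

k -n default create serviceaccount codestream
k apply -f cs-cluster-role-binding.yaml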

Next came the Persistent Volume (PV) and Persistent Volume Claim (PVC). My original PV was set to 20Gi, which after some testing turned out to be too small, so it was subsequently increased to 30Gi. The larger PV allowed me to retain logs and configurations between runs (for troubleshooting).

apiVersion: v1
kind: PersistentVolume
metadata:
  name: cs-persistent-volume-cw
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 30Gi
  hostPath:
    path: /mnt/nested
  persistentVolumeReclaimPolicy: Retain
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cs-pvc-cw
  namespace: codestream-workspace
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 30Gi
  volumeName: cs-persistent-volume-cw
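
Assuming both manifests live in one file, say cs-pv-pvc.yaml (my name for it, use whatever you like), applying and checking the claim looks like this:

k apply -f cs-pv-pvc.yaml
k -n codestream-workspace get pvc cs-pvc-cw
# the pvc should report Bound before you go any further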

The final step in k8s is to get the Service Account token. In this example the SA is called ‘codestream’ (So creative).
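
The secret name is cluster specific; on clusters that still auto-create token secrets for service accounts (pre 1.24), it can be looked up with something like:

k -n default get serviceaccount codestream -o jsonpath={.secrets[0].name}

With the secret name in hand, the token itself comes out like so: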

k get secret codestream-token-blah!!! -o jsonpath={.data.token} | base64 -d | tr -d "\n"

eyJhbGciOiJSUzI1NiIsImtpZCI6IncxM0hIYTZndS1xcEdFVWR2X1Z4UFNLREdQcGdUWDJOWUF1NDE5YkZzb.........

Copy the token, then head off to Code Stream.

Code Stream setup

There I added a Variable to hold the token, called DAG-K8S-Secret.

Then I went over to Endpoints, where I added a new Kubernetes endpoint.

Repo setup

The original plan was to download the OVA/OVF files from a repo every time the pipeline ran. However, an error occurred on every VCSA file set download. Adding more memory to the container didn’t fix the problem, so I had to go in another direction.

The repo server has a fast connection to the k8s cluster, so the transfer is pretty quick. The repo (http://repo.corp.local/repo/) holds the Nested ESXi OVA, the extracted VCSA installer files, and the NSX-T OVAs.

NOTE: You will need a valid account to download VCSA and NSX-T.

NOTE: NSX-T will be added to the pipeline later.

Simply copying the files interactively on the k8s node seemed like the next logical step. Yes, the files copied over nicely, but any attempt to deploy the VCSA appliance threw a Python error complaining about a missing ‘vmware’ module.

However, when I ran the container manually, copied the files over, and ran the scripts, everything worked. Maybe a file permissions issue?

Finally, I ran the pipeline with a long sleep at the beginning, opened an interactive session into the running pod, and copied the files over from there. This fixed the problem.
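
The ‘long sleep’ was nothing clever, just something like this as the first line of the CI task (the exact duration is arbitrary, it only has to outlive the manual copy):

sleep 7200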

Here are the commands I used to copy the files over interactively.

k -n codestream-workspace exec -it po/running-cs-pod-id bash
wget -mxnp -q -nH http://repo.corp.local/repo/ -P /var/workspace_cache/ -R "index.html*"
# /var/workspace_cache is the mount point for the persistent volume
# need to chmod +x a few files to get the vCSA to deploy
chmod +x /var/workspace_cache/repo/vcsa/VMware-VCSA-all-7.0.3/vcsa/ovftool/lin64/ovftool*
chmod +x /var/workspace_cache/repo/vcsa/VMware-VCSA-all-7.0.3/vcsa/vcsa-cli-installer/lin64/vcsa-deploy*
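
As a quick sanity check that everything landed on the persistent volume (the total should be in the ballpark of the ~16GB mentioned earlier):

du -sh /var/workspace_cache/repo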

This should do it for now. The next article will cover some of the pipeline details, and some of the changes I had to make to William Lam’s PowerShell code.

Happy holidays.

vRA Cloud Day 2 Resource Action using a Polyglot workflow

One of my peers came up with an interesting use case today. His customer wanted to mount an existing disk on a virtual machine using a vRA Cloud day 2 action.

I couldn’t find an out-of-the-box workflow or action in my vRO instance, which meant I had to build this thing from scratch.

After a quick look around I found a PowerCLI cmdlet (New-HardDisk) which allows mounting an existing disk.

My initial attempts to just run it as a scriptable task failed with a memory error.

Hmm, so how do you increase the memory in a scriptable task? Simple, you can’t. Thus I had to move the script into an action, which does allow me to increase the memory. After some tinkering I found that 256M was sufficient to run the code.

function Handler($context, $inputs) {
    # $inputs:
    ## vmName: string
    ## vcName: string (in configuration element)
    ## vcUsername: string (in configuration element)
    ## vcPassword: secureString (in configuration element)
    ## diskPath: string. Example in code. 
    # output:
    ## actionResult: Not used
    $inputsString = $inputs | ConvertTo-Json -Compress

    Write-Host "Inputs were $inputsString"

    $output=@{status = 'done'}

    # connect to viserver
    Set-PowerCLIConfiguration -InvalidCertificateAction:Ignore -Confirm:$false
    Connect-VIServer -Server $inputs.vcName -Protocol https -User $inputs.vcUsername -Password $inputs.vcPassword

    # Get vm by name
    Write-Host "vmName is $inputs.vmName"
    $vm = Get-VM -Name $inputs.vmName

    # New-HardDisk -VM $vm -DiskPath "[storage1] OtherVM/OtherVM.vmdk"
    $result = New-HardDisk -VM $vm -DiskPath $inputs.diskPath 
    Write-Host "Result is $result"

    return "It worked!"
}

Looking at the code, you will notice an input of vmName (used by PS to find the VM). Getting the vmName is actually pretty stupid simple using JavaScript. My first task in the WF takes care of this.

// get the vmName
// input: vm
// output: vmName
vmName = vm.name;

The next step was to set up a resource action. The settings are shown in the following snapshot. Please note the setting within the green box: ‘vm’ is set with a binding action.

Changing the binding is fairly simple. Just click the binding link, then change the value to ‘with binding action’. The default values work just fine.

The disk I used in the test was actually a copy of another VM boot disk. It was copied over to another datastore, then renamed to ‘ExistingDisk2.vmdk’. The full diskPath was [dag-nfs] ExistingDisk/ExistingDisk2.vmdk.

Running the day 2 action on a deployed machine seemed to work, as the WF logs show.

So there you have it: a basic polyglot vRO workflow using PowerCLI and JavaScript.

I trust this quick blog was helpful in some small way.

Changing vRealize Automation Cloud Proxy internal network ranges

My current customer needs to use 172.18.0.0/16 for their new VMware Cloud on AWS cluster. However, when we tried this in the past, we got a “NO ROUTE TO HOST” error when trying to add the VMC vCenter as a cloud account.

The problem was eventually traced back to the ‘on-prem-collector’ (br-57b69aa2bd0f) network in the Cloud Proxy which also uses the same subnet.

Let’s say the vCenter’s IP is 172.18.32.10. From inside the cloudassembly-sddc-agent container, any attempt to connect to the vCenter eventually fails with a ‘No route to host’ error. Can anyone say classic overlapping IP space?
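
For anyone wanting to reproduce the symptom, the test from the Cloud Proxy looked roughly like this (assuming curl is present in the agent container image; 172.18.32.10 is just the example vCenter IP from above):

# docker exec -it cloudassembly-sddc-agent curl -kv https://172.18.32.10

With the overlapping subnets in place, this eventually fails with the ‘No route to host’ error.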

We reached out to our VMware Customer Success Team and TAM, who eventually provided a way to change the Cloud Proxy docker and on-prem-collector subnets.

Now for the obligatory warning. Don’t try this in production without having GSS sign off on it.

In this example I’m going to change the docker network to 192.168.0.0/24 and the on-prem-collector network to 192.168.1.0/24.

First, update the docker bridge IP range.

Add the following two lines to /etc/docker/daemon.json. Don’t forget to add the necessary comma(s). Then save and close.

{
  "bip": "192.168.0.1/24",
  "fixed-cidr": "192.168.0.1/25"
}

Restart the docker service.

# systemctl restart docker
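
To confirm the docker0 bridge picked up the new range (it should now sit at 192.168.0.1/24):

# ip addr show docker0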

Now onto the on-prem-collector network.

Check which containers are using this network with docker network inspect on-prem-collector. Mine had two: cloudassembly-sddc-agent and cloudassembly-cmx-agent.

# docker network inspect on-prem-collector
[
    {
        "Name": "on-prem-collector",
        "Id": "57b69aa2bd0f694d76cc553769321deebcdb79e009e0964c4b5cc47aadb14684",
        "Created": "2021-02-10T16:05:21.953266873Z",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "172.18.0.0/16",
                    "Gateway": "172.18.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "05105324cff757d76de9e2f535cfb72d2e96094a630561aa141a40aa04095f00": {
                "Name": "cloudassembly-cmx-agent",
                "EndpointID": "8f6717a969b5a1edfea37b9e3d77565c38419de18774bebf4c3981e41c1ad017",
                "MacAddress": "02:42:ac:12:00:03",
                "IPv4Address": "172.18.0.3/16",
                "IPv6Address": ""
            },
            "b227cf1add6caca415b88f927fb10982b0cd846f71548f95071b65330e4024e1": {
                "Name": "cloudassembly-sddc-agent",
                "EndpointID": "4f802a81e0a5dfe50ca39675a5b5106a5fb647198f3bfa898f4f62793baad448",
                "MacAddress": "02:42:ac:12:00:02",
                "IPv4Address": "172.18.0.2/16",
                "IPv6Address": ""
            }
        },
        "Options": {},
        "Labels": {}
    }
]

Disconnect those two containers from the on-prem-collector network.

# docker ps
CONTAINER ID        IMAGE                                                                          COMMAND                  CREATED             STATUS              PORTS                      NAMES
05105324cff7        symphony-docker-external.jfrog.io/vmware/cloudassembly-cmx-agent:207           "./run.sh --lemansDa…"   4 days ago          Up 5 minutes        127.0.0.1:8004->8004/tcp   cloudassembly-cmx-agent
b227cf1add6c        symphony-docker-external.jfrog.io/vmware/cloudassembly-sddc-agent:4cda576      "./run.sh --lemansDa…"   4 days ago          Up 5 minutes        127.0.0.1:8002->8002/tcp   cloudassembly-sddc-agent

# docker network disconnect on-prem-collector b227cf1add6c
# docker network disconnect on-prem-collector 05105324cff7
# docker network inspect on-prem-collector
[
    {
        "Name": "on-prem-collector",
        "Id": "57b69aa2bd0f694d76cc553769321deebcdb79e009e0964c4b5cc47aadb14684",
        "Created": "2021-02-10T16:05:21.953266873Z",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "172.18.0.0/16",
                    "Gateway": "172.18.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {},
        "Options": {},
        "Labels": {}
    }
]

Delete the on-prem-collector network, then re-create it using the new subnet (192.168.1.0/24).

# docker network rm on-prem-collector
on-prem-collector
# docker network create --subnet=192.168.1.0/24 --gateway=192.168.1.1 on-prem-collector
47e3d477a87c4459f57e3a7305754b1d91e4d13e645ad4c160de5b8e64fede1a

Reconnect the two containers to the new docker network.

# docker network connect on-prem-collector 05105324cff7
# docker network connect on-prem-collector b227cf1add6c
# 
# docker network inspect on-prem-collector
[
    {
        "Name": "on-prem-collector",
        "Id": "47e3d477a87c4459f57e3a7305754b1d91e4d13e645ad4c160de5b8e64fede1a",
        "Created": "2021-05-18T15:58:55.019732144Z",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "192.168.1.0/24",
                    "Gateway": "192.168.1.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "05105324cff757d76de9e2f535cfb72d2e96094a630561aa141a40aa04095f00": {
                "Name": "cloudassembly-cmx-agent",
                "EndpointID": "34df13b0accf2f561e0226918a7e84d02995a25f4cc3969758a913a3f6c4e8bb",
                "MacAddress": "02:42:c0:a8:01:02",
                "IPv4Address": "192.168.1.2/24",
                "IPv6Address": ""
            },
            "b227cf1add6caca415b88f927fb10982b0cd846f71548f95071b65330e4024e1": {
                "Name": "cloudassembly-sddc-agent",
                "EndpointID": "405e7e8e1a4ad09b4cc99b0661454a4b0f32687152ca2346daf72f5a424dcd4d",
                "MacAddress": "02:42:c0:a8:01:03",
                "IPv4Address": "192.168.1.3/24",
                "IPv6Address": ""
            }
        },
        "Options": {},
        "Labels": {}
    }
]

Reboot and do the happy dance.

Happy not-coding.