AWS IPAM with vRealize Automation Cloud and InfoBlox Part 1

The next two articles will discuss how to set up InfoBlox for AWS as an IPAM provider for vRealize Automation Cloud (vRAC). InfoBlox will be hosted in AWS using a community AMI. I’ll be using the latest version (1.0) of the VMware InfoBlox vRA 8.x plugin available on the VMware Solution Exchange, and InfoBlox version 8.5.0 (any version that supports WAPI v2.7 should work).

Two AWS IAM users are needed: one for InfoBlox vDiscovery and the other for the vRAC AWS Cloud Account.

First, the InfoBlox vDiscovery user: create a role following the directions on page 35 of the vNIOS for AWS document, then create a new user and download the credentials.

Second, assuming you already have your AWS Cloud Account set up, add the following roles and permissions to your AWS vRAC user.

  • IAMReadOnlyAccess / AWS Managed Policy – Needed when adding the InfoBlox Integration
  • AWSLambdaBasicExecutionRole / AWS Managed Policy – Used by the plugin to run Lambda functions
  • IAM:PassRole / Inline policy – Needed when adding the InfoBlox Integration
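
For reference, here is a minimal sketch of the iam:PassRole inline policy added with the AWS CLI. The user name (vrac) and policy name are placeholders from my lab, and you may want to scope the Resource down instead of using a wildcard.

aws iam put-user-policy \
  --user-name vrac \
  --policy-name vrac-iam-passrole \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": "iam:PassRole",
        "Resource": "*"
      }
    ]
  }'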

Here is a screen shot of my working AWS Policy and Permissions for the vrac user account.

Now on to deploying the InfoBlox for AWS AMI. This deployment requires two subnets in the same availability zone. Detailed installation directions start on page 22 of the vNIOS for AWS document. Make sure to select one of the DDI BYOL AMIs. I’m using ami-044c7a717e19bb001 for this blog. Here is a screen shot of the search for the community InfoBlox AMIs.
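
If you prefer the CLI over the console search, something along these lines should list the public InfoBlox AMIs in your region (the name filter is an assumption; adjust it to match what you see in the console):

aws ec2 describe-images \
  --filters "Name=name,Values=*Infoblox*" "Name=is-public,Values=true" \
  --query 'Images[].[ImageId,Name]' \
  --output table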

Some notes on the AMI deployment:

  1. Make sure the additional (new) interface is on a different subnet. The management interface (eth1) will need internet access.
  2. Assign a Security Group which allows SSH from your local machine and HTTPS from anywhere.

Take a 10 or 15 minute break as the instance boots and the Status Checks complete. You may use this time to assign an EIP to the ENI assigned to eth1. You can get the Interface ID by clicking on the instance eth1 interface under Instance Description and copying the Interface ID value (at the top of the popup).

Next assign a new or existing EIP to the Network Interface.
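
The same thing can be done from the AWS CLI; here is a quick sketch with placeholder allocation and interface IDs:

# Allocate a new EIP (skip this if you are reusing an existing allocation)
aws ec2 allocate-address --domain vpc
# Associate it with the eth1 ENI captured above
aws ec2 associate-address \
  --allocation-id eipalloc-0123456789abcdef0 \
  --network-interface-id eni-0123456789abcdef0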

Once the instance has booted and the Status Checks have completed, SSH to the instance as admin with the default password of infoblox. After logging in you will need to add some temporary licenses (or permanent ones if you have them). Add the license options shown in this screen shot. When adding #4, select #2, IB-V825. This will force a reboot.

Give the appliance about 5 minutes before browsing to https://<EIP Address>. Login as admin with the default password of infoblox.

The first login will eventually send you to the Grid Setup Wizard. My environment was set up using these settings.

  1. Step 1, Configure as a Grid Master
  2. Step 2, Changed the Shared Secret
  3. Step 3, No changes
  4. Step 4, Changed the password to something more complex than ‘infoblox’
  5. Step 5, No changes
  6. Step 6, Click Finish

Next, enable the DNS Resolver in Grid Properties (click on Grid, click Grid Properties, then add the DNS server under DNS Resolver).

Add a new Authoritative forward-mapping zone under Data Management -> DNS. I’m using corp.local for this article.

Then start the DNS service under Grid -> Grid Manager: click DNS, select the Grid Master, and click the Start button.

Now on to discovering the VPCs, Subnets and used IPs. Click on Data Management -> IPAM, then click on vDiscovery on the right-hand side. I used the following settings.

  1. Step 1, Job Name – AWS. Member infoblox.localdomain (assuming you left everything default when setting up the grid).
  2. Step 2, Server Type – AWS, Service Endpoint – ec2.<region>.amazonaws.com, Access Key ID – <vDiscovery user Access Key>, Secret Access Key – <vDiscovery user Secret Access Key>.
  3. Step 3, no changes
  4. Step 4, enable DNS host record creation. Set the computed DNS name to ${vm_name}
  5. Step 5, Click Save & Close

Here is a screen shot of my settings for Step 4 (above).

Now to run the vDiscovery. Click the drop-down arrow on Discovery and select vDiscovery Manager. Select the AWS Job, then click Start.

Hopefully the job will complete in a few seconds (assuming you have a small environment). My job ran fine and discovered the two VPCs I have in my Region.

Drilling down into the first Subnet in my default VPC lists the addresses currently in use or reserved. Here I set the filter to show Status equals Used.

This should do for now. The next article will walk through the integration with vRAC, including the deployment of an AWS machine with a defined IP, and one with the first available IP in a Range.

Stay tuned.

vExpert 2020

It looks like my efforts in 2019 finally paid off, as I was awarded VMware vExpert 2020 in late February 2020.

I started working on content early last year, publishing close to 10 articles between knotacoder and my employer’s website.

Plus I signed up for a couple of VMware Design Programs (vRA and vROPS) which also helped give me more items to claim on my submissions.

Thanks to everyone who visited this site since last year. I’m currently working on additional vRAC content even as I type.

vRAC Security Groups revisited

This is a follow-up to the previous article. A co-worker of mine came up with a better and much cleaner solution.

My original solution worked, but introduced a nasty deployment topology diagram.  In effect it showed every SG as attached, even unused ones. This diagram is very misleading and doesn’t reflect the actual assignment of the Security Groups.

The new solution is much cleaner and more closely represents what the user actually requested. Here the two mandatory SGs as well as the required role SG are attached.

The new conceptual code seemed logical, but vRAC just didn’t like it.

formatVersion: 1
inputs:
  extraSG:
    type: string
    title: Select extra SG
    default: nsx:compute_web_sg
    oneOf:
      - title: web
        const: nsx:compute_web_sg
      - title: app
        const: nsx:compute_app_sg
      - title: db
        const: nsx:compute_db_sg
resources:
  ROLE_SG:
    type: Cloud.SecurityGroup
    properties:
      constraints:
        - tag: '${input.extraSG}'
      securityGroupType: existing
  Cloud_Machine_1:
    type: Cloud.Machine
    properties:
      image: RHEL 8 - Encrypted EBS
      flavor: generic.small
      networks:
        - network: '${resource.Cloud_Network_1.id}'
          assignPublicIpAddress: false
          securityGroups:
            - '${resource.ROLE_SG.id}'

After some tinkering I came up with the following blueprint.

formatVersion: 1
inputs:
  nsxNetwork:
    type: string
    default: compute
    enum:
      - compute
      - transit
  extraSG:
    type: string
    title: Select extra SG
    default: web
    enum:
      - web
      - app
      - db
resources:
  ROLE_SG:
    type: Cloud.SecurityGroup
    properties:
      name: '${input.extraSG + ''_sg''}'
      constraints:
        - tag: '${''nsx:'' + input.nsxNetwork + ''_'' + input.extraSG + ''_sg''}'
      securityGroupType: existing
  vmOverlay:
    type: Cloud.SecurityGroup
    properties:
      name: NSX Overlay
      constraints:
        - tag: 'nsx:vm-overlay-sg'
      securityGroupType: existing
  WebDMZ:
    type: Cloud.SecurityGroup
    properties:
      name: WebDMZ
      constraints:
        - tag: 'nsx:compute_webdmz'
      securityGroupType: existing
  Cloud_Machine_1:
    type: Cloud.Machine
    properties:
      remoteAccess:
        authentication: keyPairName
        keyPair: id-rsa 
      image: RHEL 8 - Encrypted EBS
      flavor: generic.small
      constraints:
        - tag: 'cloud_type:public'
      tags:
        - key: nsxcloud
          value: trans_ssh
      networks:
        - network: '${resource.Cloud_Network_1.id}'
          assignPublicIpAddress: false
          securityGroups:
            - '${resource.WebDMZ.id}'
            # Adding for NSX Cloud
            - '${resource.vmOverlay.id}'
            # if input.extraSG == "web" then WEB_SG else if input.extraSG == "app" then APP_SG else DB_SG
            # - '${input.extraSG == "web" ? resource.WEB_SG.id : input.extraSG == "app" ? resource.APP_SG.id : resource.DB_SG.id}'
            - '${resource.ROLE_SG.id}'
  Cloud_Network_1:
    type: Cloud.Network
    properties:
      networkType: existing
      constraints:
        - tag: 'nsx:cloud_compute'

As you can see, there is more than one way to solve use cases with vRAC.  The key sometimes is just to keep trying different options to get the results you want.

vRAC Security Groups lessons learned

A recent vRealize Automation Cloud (vRAC) use case involved applying AWS Security Groups (SGs) when deploying a new machine. First, every new AWS machine will be assigned two standard SGs. A third one will be assigned based on the application type (Web, App, DB).

After looking at the Cloud Assembly blueprint expression syntax page, it looked like we would be limited to only two options in our condition (if/else). For example, ${input.count < 2 ? "small" : "large"}, which reads as: if input.count < 2 then "small" else "large".

But we have three options, not two. Effectively we needed:

if (extraSG == 'web') {
  web_sg
} else if (extraSG == 'app') {
  app_sg
} else {
  db_sg
}

Or, using JavaScript's ternary shorthand:

extraSG == 'web' ? web_sg : extraSG == 'app' ? app_sg : db_sg

Or, converted into something vRAC can consume:

${input.extraSG == "web" ? resource.WEB_SG.id : input.extraSG == "app" ? resource.APP_SG.id : resource.DB_SG.id}

Let's see what happens when we deploy this blueprint, selecting WEB_SG from the list.

formatVersion: 1
inputs:
  extraSG:
    type: string
    title: Select extra SG
    default: web
    oneOf:
      - title: WEB_SG
        const: web
      - title: APP_SG
        const: app
      - title: DB_SG
        const: db
resources:
  WEB_SG:
    type: Cloud.SecurityGroup
    properties:
      name: WEB_SG
      constraints:
        - tag: 'nsx:compute_web_sg'
      securityGroupType: existing
  APP_SG:
    type: Cloud.SecurityGroup
    properties:
      name: APP_SG
      constraints:
        - tag: 'nsx:compute_app_db'
      securityGroupType: existing
  DB_SG:
    type: Cloud.SecurityGroup
    properties:
      name: DB_SG
      constraints:
        - tag: 'nsx:compute_db_sg'
      securityGroupType: existing
  vmOverlay:
    type: Cloud.SecurityGroup
    properties:
      name: NSX Overlay
      constraints:
        - tag: 'nsx:vm-overlay-sg'
      securityGroupType: existing
  WebDMZ:
    type: Cloud.SecurityGroup
    properties:
      name: WebDMZ
      constraints:
        - tag: 'nsx:compute_webdmz'
      securityGroupType: existing
  Cloud_Machine_1:
    type: Cloud.Machine
    properties:
      remoteAccess:
        authentication: keyPairName
        keyPair: id-rsa
      image: RHEL 8
      flavor: generic.small
      constraints:
        - tag: 'cloud_type:public'
      tags:
        - key: nsxcloud
          value: trans_ssh
      networks:
        - network: '${resource.Cloud_Network_1.id}'
          assignPublicIpAddress: false
          securityGroups:
            - '${resource.WebDMZ.id}'
            # Adding for NSX Cloud
            - '${resource.vmOverlay.id}'
            # if input.extraSG == "web" then WEB_SG else if input.extraSG == "app" then APP_SG else DB_SG
            - '${input.extraSG == "web" ? resource.WEB_SG.id : input.extraSG == "app" ? resource.APP_SG.id : resource.DB_SG.id}'
  Cloud_Network_1:
    type: Cloud.Network
    properties:
      networkType: existing
      constraints:
        - tag: 'nsx:cloud_compute'

Let’s take a look at the deployment topology from vRAC.

This diagram indicates that all of the SGs were attached to the machine. That can’t be right. I wonder what the machine looks like in AWS.

Hmm, looks like a bug to me.

Now on to leveraging NSX Cloud with vRAC. Stay tuned.

vRAC Ansible Control Host Lessons Learned

Now that I’ve spent a couple of months with vRealize Automation Cloud (vRAC) (AKA VMware Cloud Assembly) I figured it would be a good time to jot down some lessons learned from deploying and using several Ansible Control Hosts (ACH).

My first ACH was an Ubuntu machine deployed on our vSphere cluster. This deployment model requires connectivity to a Cloud Proxy. All of my ACHs since then have been AWS t2.micro Amazon Linux instances. The rest of this blog focuses on that environment.

First, you should assign an Elastic IP if you plan on shutting the ACH down when not in use. Failing to do so makes your vRAC-integrated control host unreachable after a stop and start, as the public IP will change.

This leaves you with an unusable ACH, with no option but to replace it with a new one. A word of warning here: make sure to delete any deployments that used that ACH. Failing to do so will leave you with an undeletable deployment, and you may not even be able to delete the ACH from the Integrations page. Oh joy!

Next is the AWS security group. The offshore developers cannot provide a list of source IPs for the vRAC calls into the ACH. In other words, you have to open your server up to the world. I’ve been told they are working on this, but I do not know when they’ll have a fix.

So, I’ve been experimenting with ConfigServer Security and Firewall (CSF). My current test is pretty much out of the box and uses two block lists in csf.blocklists. So far so good: I was able to add the test ACH without an issue, and it is blocking tons of IPs. The host is using about 180K of RAM with the blocklists in place.
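
For anyone curious, the block lists live in /etc/csf/csf.blocklists; my test just uncomments a couple of the entries that ship with CSF and reloads the firewall. A sketch (the exact list names and URLs may differ by CSF version):

# /etc/csf/csf.blocklists -- uncomment the lists you want, for example:
# SPAMDROP|86400|0|https://www.spamhaus.org/drop/drop.txt
# DSHIELD|86400|0|https://www.dshield.org/block.txt
csf -r    # reload CSF after editing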

Changing the SSH server port to something other than port 22 doesn’t work right now. I brought this up on my weekly VMware call, so hopefully they’ll get it fixed in the near future.

I’m using an S3 bucket to back up my playbooks once a day. I did have to create an IAM Role with S3 permissions and assign it to my ACH. I’ll eventually have the files pushed up to a repository.
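
The backup itself is nothing fancy; a daily cron entry along these lines does the trick (the bucket name and playbook path are placeholders):

# crontab entry on the ACH -- sync the playbooks to S3 once a day at 01:00
0 1 * * * aws s3 sync /home/ec2-user/playbooks s3://my-ach-playbook-backup/playbooks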

Now on to troubleshooting. One of my beefs is the naming convention of the log folders in the ACH user’s home directory (var/tmp/vmware/provider/user_defined_script/). The only way to figure out which directory to look in is to list them by date (ls -alt) or do a find on the machine IP.

The ACH playbook log directory contains different files depending on the run state. A running deployment contains additional information, like the actual ansible command that was called (a file named exec). The exec file disappears when the deployment completes or errors out, making it almost impossible to look at the ansible command and figure out why it failed. My workaround is to quickly jump into the directory and print out ‘exec’.
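
My quick-and-dirty workaround looks something like this, run from the ACH user's home directory while the deployment is still in progress (the folder name is a placeholder):

# List the deployment folders newest first
cd var/tmp/vmware/provider/user_defined_script/
ls -alt | head -5
# Grab the generated ansible command before vRAC cleans it up
cat <newest-folder>/exec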

The ansible command isn’t the only thing that is deleted when a deployment fails; the host and user variables under /etc/ansible/host_vars/{ip address} are removed as well. I just keep two windows open and print out the contents of anything beginning with ‘vra*’. However, ‘log.txt’ in the deployment log folder does contain the extra host variables, but not the user variables.

I’m still figuring out how much space I need in the user’s home directory for log files. vRAC doesn’t delete folders when destroying a deployment, so I suspect an active ACH will eventually fill up the user’s log directory (var/tmp/vmware/provider/user_defined_script/). Right now I’m seeing an average folder size of a bit less than 20k for completed deployments, or about 1G per 60 deployments. And no, this isn’t on my gripe list yet, but it will be.
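
Until vRAC cleans up after itself, something like this cron-able one-liner could keep the directory under control (run from the ACH user's home directory; the 30-day retention is just an example):

# Remove deployment log folders older than 30 days
find var/tmp/vmware/provider/user_defined_script/ -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +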

That’s it for now. Come back soon.

vRAC AWS Windows Domain Join using Ansible

My most recent vRealize Automation Cloud (vRAC) task was to leverage Ansible to join a new AWS Windows machine to an Active Directory domain.

First off I needed to figure out how to get WinRM working in an AWS AMI. Initially I just deployed a Windows instance, installed WinRM, added a new admin account, then created a private AMI. This did work, sometimes, but really wasn’t the best solution for the customer. What I really needed was a way to install WinRM and create the new user in an EC2 instance deployed from a publicly available AMI.

The dots finally connected last week when it dawned on me that vRAC cloudConfig equals AWS instance User Data. Yes it was that simple.

After reviewing Running Commands on Your Windows Instance at Launch and doing some tinkering, I came up with this basic vRAC cloudConfig PowerShell script. It adds the new user to the local Administrators group, then installs and configures WinRM. This leaves me a clean, Ansible-ready machine.

  cloudConfig: |
    <powershell>
    # Add new user for ansible access
    $password = ConvertTo-SecureString ${input.new_user_password} -AsPlainText -Force
    $newUser = New-LocalUser -Name "${input.new_user_name}" -Password $password -FullName "Ansible Remote User" -Description "Ansible remote user" 
    Add-LocalGroupMember -Group "Administrators" -Member "${input.new_user_name}"    
    # Setup WinRM
    Invoke-Expression ((New-Object System.Net.Webclient).DownloadString('https://raw.githubusercontent.com/ansible/ansible/devel/examples/scripts/ConfigureRemotingForAnsible.ps1'))
    </powershell>

The resulting instance User Data includes the expanded variables as seen below.

<powershell>
# Add new user for ansible access
$password = ConvertTo-SecureString VMware123! -AsPlainText -Force
$newUser = New-LocalUser -Name "ansibleuser" -Password $password -FullName "Ansible Remote User" -Description "Ansible remote user" 
Add-LocalGroupMember -Group "Administrators" -Member "ansibleuser"    
# Setup WinRM
Invoke-Expression ((New-Object System.Net.Webclient).DownloadString('https://raw.githubusercontent.com/ansible/ansible/devel/examples/scripts/ConfigureRemotingForAnsible.ps1'))
</powershell>
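
If you want to double-check what actually landed on the instance, the rendered user data can be pulled back with the AWS CLI (the instance ID is a placeholder):

aws ec2 describe-instance-attribute \
  --instance-id i-0123456789abcdef0 \
  --attribute userData \
  --query 'UserData.Value' \
  --output text | base64 --decode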

The playbook turned out to be fairly simple. It waits for 5 minutes, points DNS to the DC (also running DNS), renames the machine, and joins it to the domain.

- hosts: win
  gather_facts: no
  tasks:

  - name: Pause for OS
    pause:
      minutes: 5

  - name: Change DNS to DC
    win_dns_client:
      adapter_names: '*'
      ipv4_addresses:
        - 10.10.0.100

  - name: Rename machine
    win_hostname:
      name: "{{ hostname }}"
    register: res

  - name: Reboot if necessary
    win_reboot:
    when: res.reboot_required

  - name: Wait for WinRM to be reachable
    wait_for_connection:
      timeout: 900

  - name: Join to "{{ domain_name }}"
    win_domain_membership:
      hostname: "{{ hostname }}"
      dns_domain_name: "{{ domain_name }}"
      domain_admin_user: "{{ domain_user }}"
      domain_admin_password: "{{ domain_user_password }}"
      domain_ou_path: "{{ domain_oupath }}"
      state: domain
    register: domain_state

  - name: Wait for 2 minutes
    pause:
      minutes: 2

  - name: reboot if necessary
    win_reboot:
      post_reboot_delay: 120
    when: domain_state.reboot_required

Blending existing AWS User Data with vRAC cloudConfig finally provided a clean solution without having to write a super complex ansible playbook. Keeping it simple once again pays off.

The blueprint and playbook referenced in this article are available in this GitHub repo.

vRealize Automation Cloud Ansible Enhancements

VMware released some Ansible enhancements within the last couple of weeks.

First is the ability to use the private IP of the deployed machine. Prior to this fix, disabling the public IP threw an error and the deployment failed.

To disable the assignment of a public IP (default), simply add ‘assignPublicIpAddress: false‘ in the network properties.

Cloud_Machine_1:
  type: Cloud.Machine
  properties:
    remoteAccess:
      keyPair: id_rsa
      authentication: keyPairName
    image: CentOS 7
    flavor: generic.tiny
    attachedDisks:
      - source: '${resource.Cloud_Volume_1.id}'
    networks:
      - network: '${resource.Cloud_Network_1.id}'
        assignPublicIpAddress: false

By default, vRAC will use the private IP address of the first NIC on the machine.

Just a few things about the placement of the machines. First, my Ansible Control Host (ACH) is on a public AWS subnet. My first attempt to install NGINX on a machine deployed to the same subnet failed because it could not find the repo. After some troubleshooting I determined the new machine needs to be deployed on a private subnet with a NAT Gateway. Oh, and make sure the ACH can connect to the deployed machine on TCP port 22 (SSH).
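
The SSH piece can be handled with a security group rule that references the ACH's security group instead of an IP; here is a sketch with placeholder group IDs:

# Allow the ACH security group to reach deployed machines on TCP 22
aws ec2 authorize-security-group-ingress \
  --group-id sg-deployed-machines \
  --protocol tcp \
  --port 22 \
  --source-group sg-ansible-control-host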

The second is the ability to send extra variables to the ACH. Here the use case is joining an AWS-backed Windows server to a domain using an Ansible playbook.

Ansible extra variables can be added under the properties in the Ansible component.  Here I’m going to add several just to demonstrate what it looks like.

Cloud_Ansible_1:
  type: Cloud.Ansible
  properties:
    host: '${resource.Cloud_Machine_1.*}'
    osType: linux
    account: ansible-control-host
    username: centos
    privateKeyFile: /home/ansibleoss/.ssh/id_rsa
    playbooks:
      provision:
        - /home/ansibleoss/playbooks/centos-nginx/playbook.yml
    groups:
      - linux
    hostVariables:
      bluePrintName: BP- ${env.blueprintName}
      message: Hello World
      domain: corp.local
      orgUnit: ou=sample,dc=corp,dc=local
      disks:
        disk1:
          size: '${resource.Cloud_Volume_1.capacityGb}'
          label: '${input.disk1_label}'
        disk2:
          size: 20
          label: Fake disk

These variables are stored in /etc/ansible/host_vars/{machine IP}/vra_user_host_vars.yml.
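
To see them on the ACH, just cat the file under the deployed machine's IP folder:

cat /etc/ansible/host_vars/<machine-ip>/vra_user_host_vars.yml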

Here is a screen shot of the resulting YAML file for this blueprint request.


They also changed the connection type to default to winrm (instead of SSH) when the osType is set to ‘windows’.

This will be the topic of my next article.

Stay tuned.

CentOS Image for Cloud-init on VMware Cloud Assembly Services

My current customer is looking at using VMware Cloud Assembly Services (CAS) for their next generation SDDC.  It looked like CAS would be able to address some of their Ansible and other OS customization use cases.

The Ubuntu cloud-ready OVA works great, but unfortunately a cloud-ready CentOS OVA was not available (they use RHEL and CentOS as their primary Linux distros).

Well, it took me a bit, but I was able to build an OVA that worked.  Here is how I did it.

First I built a clean CentOS 7.x image using the minimal install ISO.  I’m not going through this step by step as it’s been well documented elsewhere.

Second, make sure to change the CD-ROM back to Client after the reboot.  Do not leave it pointed at an ISO on a datastore, even if it is not connected at power on.  Why? A cloud-init ISO is mounted on the machine when it powers up, and cloud-init failed to run when I left the CD pointed at an ISO on a datastore.

After the initial reboot, I simply updated the machine, and installed open-vm-tools and cloud-init.

#yum update -y

#yum install -y open-vm-tools cloud-init
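
Before the cleanup I like to do a quick sanity check that both packages landed and the services are present (just a sketch; the service name comes from the stock CentOS 7 open-vm-tools package):

# Confirm open-vm-tools and cloud-init are installed and available
systemctl status vmtoolsd
cloud-init --version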

Then cleaned up the machine.

#cloud-init clean --logs

#sys-unconfig

This last command will return the machine to an unconfigured state and shut it down.

Next, within vCenter I enabled the vApp Options.


Then I gave the appliance a name and added a few properties.


And finally, I enabled ISO as the environment transport.


After saving the settings, I converted it into a template and imported it into CAS.

From there I created a new Image Mapping, and gave it a try in CAS.


The blueprint and Ansible playbook can be found in this GitHub repository.