CyberArk Ansible Integration

As an alternative to vRA Cloud Secrets

Well, it’s been a while since I posted anything. To be honest, this site and its posts were used to support my vExpert applications, but apparently blog content doesn’t count anymore. So… now that I’m free from that obligation, I can just post because I want to.

This article details my efforts to understand how CyberArk and Ansible work together. My particular use case is to replace vRA Cloud secrets with variables stored in CyberArk. More specifically, the issue with vRA secrets is that each one is limited to a single Project, which doesn’t work too well for a company with more than one project. You basically end up with one copy of a secret (mysecret) per project; if you have 10 projects, that’s 10 secrets named mysecret, one for each project.

Now down to business. The first thing is to set up CyberArk following the instructions in their Quick Start tutorial. The basic setup is done by Step 6; there’s no real need to go past that unless you want to. A couple of notes here. First, the Master Key (Step 2) and Admin api_key (Step 5) are saved to a text file on your Docker host. Second, by default the SSL certificate generated by the installer uses localhost, proxy, and 127.0.0.1 as the SANs. You can change this in conjur-quickstart/conf/tls/tls.conf. I’ll be using the default proxy as the hostname, along with some entries in /etc/hosts on my Mac and Ansible host.

Next I installed the CyberArk Conjur CLI on my Mac. The instructions are available here. Note it is only supported on Windows, RHEL, and macOS.

The CLI config file on my Mac (~/.conjurrc) looks like this.

cert_file: /Users/me/conjur-server.pem
conjur_account: myConjurAccount
conjur_url: https://proxy:8443

Now to define some CyberArk Conjur (conjur) policy files. The first defines a new, clean branch for my Ansible policies. I called it mybranch (hey, it was Friday and I’d already used my weekly good-braincell quota). I even gave the file a creative name, ‘create-ansible-branch.yaml’.

- !policy
  id: mybranch

And to apply it (assuming you’ve already logged in as Admin).

mymac>conjur policy replace -b root -f create-ansible-branch.yaml
mymac>conjur list
[
    "myConjurAccount:policy:mybranch",
    "myConjurAccount:policy:root"
]

Now on to defining the Ansible host (ansible2) in ansible2-host-policy.yaml.

- !layer

- !host ansible2

- !grant
  role: !layer
  member: !host ansible2

mymac>conjur policy load -b mybranch -f ansible2-host-policy.yaml

The result will contain an api_key for the new host. You’ll probably want to copy this into your scratch pad.

  {
      "created_roles": {
          "myConjurAccount:host:mybranch/ansiblehost": {
              "id": "myConjurAccount:host:mybranch/ansiblehost",
              "api_key": "1xgpkp02d8etyz2zb........" # <--- api_key
          }
      },
      "version": 2
  }

Now to create a new group and a variable, and grant ansible2 permission to read it (ansible2-access-policy.yaml).

# Declare the secrets which are used to access the database
- &variables
  - !variable password2

# Define a group which will be able to fetch the secrets
- !group secrets-users

- !permit
  resource: *variables
  # "read" privilege allows the client to read metadata.
  # "execute" privilege allows the client to read the secret data.
  # These are normally granted together, but they are distinct
  #   just like read and execute bits on a filesystem.
  privileges: [ read, execute ]
  roles: !group secrets-users
# Entitlements

- !grant
  role: !group secrets-users
  member: !layer /mybranch

mymac>conjur policy load -b mybranch -f ansible2-access-policy.yaml
### Set the password variable value
mymac>conjur variable set -i mybranch/password2 -v "HelloWorld"

Our work with CyberArk is done for the time being. Now on to the Ansible host, with the assumption that it is already set up properly. First, install the cyberark.conjur collection.

ubuntu@ansible2$ ansible-galaxy collection install cyberark.conjur
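
If you’d rather pin this with the rest of your dependencies, the same install can be driven from a requirements file. A minimal sketch (the file path is just the usual convention, nothing the collection requires):

# collections/requirements.yml
collections:
  - name: cyberark.conjur
    # version: "x.y.z"   # optionally pin a specific release

Then install it with ansible-galaxy collection install -r collections/requirements.yml.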

Now to define some files on the Ansible host. The file names and contents are shown below. You can figure out how to get the contents of conjur.pem (hint: it’s the same server certificate the Mac’s CLI config points at).

/etc/conjur.conf

account: myConjurAccount
appliance_url: https://proxy:8443
cert_file: /etc/conjur.pem
netrc_path: /etc/conjur.identity
plugins: []

/etc/conjur.identity

machine https://proxy:8443/authn
    login host/mybranch/ansible2
    password gybp2n1wssmh1fr8n5k27.........


/etc/conjur.pem

-----BEGIN CERTIFICATE-----
.......
-----END CERTIFICATE-----
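
For what it’s worth, these three files are easy to lay down with Ansible itself. A minimal sketch, assuming the host api_key is supplied in a variable I’ve called conjur_host_api_key and the server certificate sits next to the playbook as conjur-server.pem (both names are mine, not anything the collection requires):

---
- hosts: localhost
  become: yes
  tasks:
    - name: Write the Conjur client configuration
      ansible.builtin.copy:
        dest: /etc/conjur.conf
        mode: '0644'
        content: |
          account: myConjurAccount
          appliance_url: https://proxy:8443
          cert_file: /etc/conjur.pem
          netrc_path: /etc/conjur.identity
          plugins: []

    - name: Write the host identity (netrc format)
      ansible.builtin.copy:
        dest: /etc/conjur.identity
        mode: '0600'
        content: |
          machine https://proxy:8443/authn
              login host/mybranch/ansible2
              password {{ conjur_host_api_key }}

    - name: Copy the Conjur server certificate
      ansible.builtin.copy:
        src: conjur-server.pem
        dest: /etc/conjur.pem
        mode: '0644'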

Almost there. Now to define and run a basic Ansible playbook. And by basic, I mean basic.

# get_conjur_var.yaml

---
- hosts: localhost
  tasks:
  - name: Lookup variable in Conjur
    debug:
      msg: "{{ lookup('cyberark.conjur.conjur_variable', 'mybranch/password2') }}"

ubuntu@ansible2$ ansible-playbook get_conjur_var.yaml

.... 
ok: [localhost] => {
    "msg": "HelloWorld"
}
....
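
The lookup isn’t limited to debug messages, of course. Here’s a sketch of pulling the secret into a variable and feeding it to a task without echoing it to the log (the uri call and its URL are made up purely for illustration):

---
- hosts: localhost
  vars:
    db_password: "{{ lookup('cyberark.conjur.conjur_variable', 'mybranch/password2') }}"
  tasks:
    - name: Use the secret without writing it to the log
      ansible.builtin.uri:
        url: https://db.example.com/api/login   # hypothetical endpoint
        method: POST
        body_format: json
        body:
          username: dbuser
          password: "{{ db_password }}"
      no_log: true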

The next article will demonstrate how to use this with vRA Cloud to replace all those repetitive secrets (per project, yuck!).

vRAC Ansible Control Host Lessons Learned

Now that I’ve spent a couple of months with vRealize Automation Cloud (vRAC) (AKA VMware Cloud Assembly), I figured it would be a good time to jot down some lessons learned from deploying and using several Ansible Control Hosts (ACHs).

My first ACH was an Ubuntu machine deployed on our vSphere cluster. This deployment model requires connectivity to a Cloud Proxy. All of my ACHs since then have been AWS t2.micro Amazon Linux instances. The remainder of this post focuses on that environment.

First, assign an Elastic IP if you plan on shutting the instance down when not in use. Failing to do so makes your vRAC-integrated control host unreachable when it comes back up, as the public IP will have changed.

This leaves you with an unusable ACH and no option but to replace it with a new one. A word of warning here: make sure to delete any deployments that used that ACH first. Failing to do so will leave you with undeletable deployments, and you may not even be able to delete the ACH from the Integrations page. Oh joy!

Next is the AWS security group. The offshore developers cannot provide a list of source IPs for the vRAC calls into the ACH. In other words, you have to open your server up to the world. I’ve been told they are working on this, but I don’t know when they’ll have a fix.

So, I’ve been experimenting with ConfigServer Security & Firewall (CSF). My current test is pretty much out of the box and uses two blocklists in csf.blocklists. So far so good: I was able to add the test ACH without an issue, and it is blocking tons of IPs. The host is using about 180K of RAM with the blocklists in place.

Changing the SSH server port to something other than port 22 doesn’t work right now. I brought this up on my weekly VMware call, so hopefully they’ll get it fixed in the near future.

I’m using an S3 bucket to back up my playbooks once a day. I did have to create an IAM role with S3 permissions and assign it to my ACH. I’ll eventually get the files pushed up to a repository; a sketch of one way to do the daily sync is below.
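
One simple way to handle the daily backup is a cron entry running aws s3 sync, which needs no stored credentials thanks to the instance role. A sketch, dropped in place with a quick playbook (the bucket name, paths, and schedule are placeholders):

---
- hosts: localhost
  tasks:
    - name: Schedule a daily playbook backup to S3
      ansible.builtin.cron:
        name: "daily playbook backup to S3"
        minute: "0"
        hour: "3"
        job: "aws s3 sync /home/ec2-user/playbooks s3://my-ach-backups/playbooks --delete"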

Now on to troubleshooting. One of my beefs is the naming convention of the log folders in the ACH user’s home directory (var/tmp/vmware/provider/user_defined_script/). The only way to figure out which directory to look in is to list them by date (ls -alt) or do a find on the machine’s IP.

Each deployment’s log directory contains different files depending on the run state. While a deployment is running, the directory holds additional information such as the actual ansible command that was invoked (exec). The exec file disappears when the deployment completes or errors out, making it almost impossible to inspect the ansible command afterwards and figure out why it failed. My workaround is to quickly jump into the directory while the run is in flight and print out ‘exec’.

The ansible command isn’t the only thing deleted when a deployment fails; the host and user variables under /etc/ansible/host_vars/{ip address} are removed as well. I just keep two windows open and print out the contents of anything beginning with ‘vra*’. ‘log.txt’ in the deployment log folder does contain the extra host variables, but not the user variables.

I’m still figuring out how much space I need in the user’s home directory for log files. vRAC doesn’t delete the folders when destroying a deployment, so I suspect a busy ACH will eventually fill up the log area (var/tmp/vmware/provider/user_defined_script/). Right now I’m seeing an average folder size of a bit less than 20k for completed deployments (which works out to only about 1 MB per 60 deployments). And no, this isn’t on my gripe list yet, but it will be. In the meantime, a cleanup job along the lines of the sketch below would keep things tidy.
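
Here’s a sketch of what that cleanup might look like (the 30-day retention window and the path, which sits under the ACH user’s home directory, are my assumptions):

---
- hosts: localhost
  vars:
    ach_log_root: "{{ lookup('env', 'HOME') }}/var/tmp/vmware/provider/user_defined_script"
  tasks:
    - name: Find deployment log folders older than 30 days
      ansible.builtin.find:
        paths: "{{ ach_log_root }}"
        file_type: directory
        age: 30d
      register: stale_log_dirs

    - name: Remove the stale folders
      ansible.builtin.file:
        path: "{{ item.path }}"
        state: absent
      loop: "{{ stale_log_dirs.files }}"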

That’s it for now. Come back soon.

vRAC AWS Windows Domain Join using Ansible

My most recent vRealize Automation Cloud (vRAC) task was to leverage Ansible to join a new AWS Windows machine to an Active Directory domain.

First off, I needed to figure out how to get WinRM working in an AWS AMI. Initially I just deployed a Windows instance, configured WinRM, added a new admin account, then created a private AMI. This did work, sometimes, but really wasn’t the best solution for the customer. What I really needed was a way to configure WinRM and create the new user in an EC2 instance deployed from a publicly available AMI.

The dots finally connected last week when it dawned on me that vRAC cloudConfig equals AWS instance User Data. Yes it was that simple.

After reviewing Running Commands on Your Windows Instance at Launch and some tinkering, I came up with this basic vRAC cloudConfig PowerShell script. It creates the new user, adds it to the local Administrators group, then installs and configures WinRM. This leaves me with a clean, Ansible-ready machine.

  cloudConfig: |
    <powershell>
    # Add new user for ansible access
    $password = ConvertTo-SecureString ${input.new_user_password} -AsPlainText -Force
    $newUser = New-LocalUser -Name "${input.new_user_name}" -Password $password -FullName "Ansible Remote User" -Description "Ansible remote user" 
    Add-LocalGroupMember -Group "Administrators" -Member "${input.new_user_name}"    
    # Setup WinRM
    Invoke-Expression ((New-Object System.Net.Webclient).DownloadString('https://raw.githubusercontent.com/ansible/ansible/devel/examples/scripts/ConfigureRemotingForAnsible.ps1'))
    </powershell>

The resulting instance User Data includes the expanded variables as seen below.

<powershell>
# Add new user for ansible access
$password = ConvertTo-SecureString VMware123! -AsPlainText -Force
$newUser = New-LocalUser -Name "ansibleuser" -Password $password -FullName "Ansible Remote User" -Description "Ansible remote user" 
Add-LocalGroupMember -Group "Administrators" -Member "ansibleuser"    
# Setup WinRM
Invoke-Expression ((New-Object System.Net.Webclient).DownloadString('https://raw.githubusercontent.com/ansible/ansible/devel/examples/scripts/ConfigureRemotingForAnsible.ps1'))
</powershell>

The playbook turned out to be fairly simple. It waits for 5 minutes, points DNS to the DC (also running DNS), renames the machine, and joins it to the domain.

- hosts: win
  gather_facts: no
  tasks:

  - name: Pause for OS
    pause:
      minutes: 5

  - name: Change DNS to DC
    win_dns_client:
      adapter_names: '*'
      ipv4_addresses:
        - 10.10.0.100

  - name: Rename machine
    win_hostname:
      name: "{{ hostname }}"
    register: res

  - name: Reboot if necessary
    win_reboot:
    when: res.reboot_required

  - name: Wait for WinRM to be reachable
    wait_for_connection:
      timeout: 900

  - name: Join to "{{ domain_name }}"
    win_domain_membership:
      hostname: "{{ hostname }}"
      dns_domain_name: "{{ domain_name }}"
      domain_admin_user: "{{ domain_user }}"
      domain_admin_password: "{{ domain_user_password }}"
      domain_ou_path: "{{ domain_oupath }}"
      state: domain
    register: domain_state

  - name: Wait for 2 minutes
    pause:
      minutes: 2

  - name: reboot if necessary
    win_reboot:
      post_reboot_delay: 120
    when: domain_state.reboot_required
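
For reference, the playbook expects these variables (the names come straight from the tasks above; the values shown are placeholders you would supply per environment):

# group_vars/win.yml (or supplied by vRAC as host/user variables)
hostname: win-app-01
domain_name: corp.local
domain_user: CORP\svc-domain-join
domain_user_password: "ChangeMe123!"   # placeholder; keep the real one in a vault or in Conjur
domain_oupath: OU=Servers,DC=corp,DC=local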

Blending existing AWS User Data with vRAC cloudConfig finally provided a clean solution without having to write a super complex ansible playbook. Keeping it simple once again pays off.

The blueprint and playbook referenced in this article are available in this GitHub repo.

vRealize Automation Cloud Ansible Enhancements

VMware released some Ansible enhancements within the last couple of weeks.

First is the ability to use the private IP of the deployed machine. Prior to this fix, disabling the public IP threw an error and the deployment failed.

To disable the assignment of a public IP (which is assigned by default), simply add ‘assignPublicIpAddress: false’ in the network properties.

Cloud_Machine_1:
  type: Cloud.Machine
  properties:
    remoteAccess:
      keyPair: id_rsa
      authentication: keyPairName
    image: CentOS 7
    flavor: generic.tiny
    attachedDisks:
      - source: '${resource.Cloud_Volume_1.id}'
    networks:
      - network: '${resource.Cloud_Network_1.id}'
        assignPublicIpAddress: false

By default, vRAC will use the private IP address of the first NIC on the machine.

Just a few things about the placement of the machines. First, my Ansible Control Host (ACH) is on a public AWS subnet. My first attempt to install NGINX on a machine deployed to that same subnet failed because it could not reach the package repo. After some troubleshooting I determined the new machine needs to be deployed on a private subnet behind a NAT Gateway. Oh, and make sure the ACH can connect to the deployed machine on TCP port 22 (SSH).

The second enhancement is the ability to send extra variables to the ACH. Here the use case is joining an AWS-backed Windows server to a domain using an Ansible playbook.

Ansible extra variables can be added under the properties in the Ansible component.  Here I’m going to add several just to demonstrate what it looks like.

Cloud_Ansible_1:
  type: Cloud.Ansible
  properties:
    host: '${resource.Cloud_Machine_1.*}'
    osType: linux
    account: ansible-control-host
    username: centos
    privateKeyFile: /home/ansibleoss/.ssh/id_rsa
    playbooks:
      provision:
        - /home/ansibleoss/playbooks/centos-nginx/playbook.yml
    groups:
      - linux
    hostVariables:
      bluePrintName: BP-${env.blueprintName}
      message: Hello World
      domain: corp.local
      orgUnit: ou=sample,dc=corp,dc=local
      disks:
        disk1:
          size: '${resource.Cloud_Volume_1.capacityGb}'
          label: '${input.disk1_label}'
        disk2:
          size: 20
          label: Fake disk

These variables are stored in /etc/ansible/host_vars/vra_user_host_vars.yml.

For this blueprint request, each binding ends up expanded to its actual value in that file.

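A rough reconstruction of its contents, assuming a 10 GB Cloud_Volume_1, a disk1_label input of ‘data’, and a placeholder blueprint name:

bluePrintName: BP-my-blueprint   # env.blueprintName expands to the actual blueprint name
message: Hello World
domain: corp.local
orgUnit: ou=sample,dc=corp,dc=local
disks:
  disk1:
    size: 10
    label: data
  disk2:
    size: 20
    label: Fake disk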

They also changed the connection type to winrm (instead of the default SSH) when the osType is set to ‘windows’.

This will be the topic of my next article.

Stay tuned.