I have spent the past week or so deploying VMware Enterprise PKS in my lab. My main installation guide was Pivotal’s Installing Enterprise PKS on vSphere with NSX-T.
All was going well until I tried to deploy a cluster. I could see the machines being deployed in vCenter and various NSX-T components being deployed. However, the cluster deployment failed with the following error.
Plan Name: small
Last Action: CREATE
Last Action State: failed
Last Action Description: Instance provisioning failed:
..... task-id: 289, ... result: 2 of 7 pre-start scripts failed. Failed Jobs: ...
As you can see, task 289 failed. Now how the heck do I get the details of the failed task?
The BOSH CLI appeared to be the answer. Reading further, it looked like I needed to set some environment variables to make it work properly.
After reading a few online documents, I found the BOSH command-line credentials (really the BOSH environment variables) by clicking the BOSH tile in Operations Manager, opening the Credentials tab, then clicking the link next to Bosh Commandline Credentials.
The BOSH_CA_CERT path and file given there do not exist on my jump machine, so I downloaded the root CA with the following steps. (Installing uaac is beyond the scope of this document.)
uaac target https://opsman.corp.local/uaa --skip-ssl-validation
uaac token owner get
Client ID: opsman
User name: admin
Copy the admin bearer token from the client_id section (the token is actually called access_token).
access_token: eyJhbGci .....
Finally, I downloaded the certificate to my jump machine.
curl https://opsman.corp.local/api/v0/security/root_ca_certificate -X GET -H "Authorization: Bearer eyJhbGci ....." -k > root_ca_certificate
My reformatted BOSH environment settings, along with the correct path to my certificate, ended up like this.
export BOSH_CA_CERT=/root/root_ca_certificate   # correct path and file
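For reference, here is a minimal sketch of what the full set of exports might look like. The variable names are the standard BOSH CLI environment variables. The client and environment values are taken from the bosh output shown later in this post, and the secret is a placeholder that has to be replaced with the value from Ops Manager.

```shell
# Sketch of the reformatted BOSH CLI settings, one export per line.
# Client name, as reported in the bosh output later in this post.
export BOSH_CLIENT=ops_manager
# Placeholder only: copy the real secret from Bosh Commandline Credentials.
export BOSH_CLIENT_SECRET='REPLACE_WITH_SECRET'
# Director address, as reported in the bosh output later in this post.
export BOSH_ENVIRONMENT=bosh.corp.local
# Path to the root CA downloaded to the jump machine.
export BOSH_CA_CERT=/root/root_ca_certificate
```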
After pasting the variables into my console, I attempted to get the details from the failed task.
bosh task 289
Validating Director connection config:
Parsing certificate 1: Missing PEM block
Exit code 1
What the heck? Apparently the downloaded certificate is actually in JSON format, and it contains literal ‘\n’ sequences where the line breaks should be.
Using Notepad++, I replaced every ‘\n’ with a real line break.
Then I removed the quotes, the braces, and the root_ca_certificate_pem key, and deleted all of the other newlines, leaving me with a clean certificate (each Base64 line should be 64 characters long, except possibly the last).
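The same cleanup can probably be scripted instead of done by hand. The sketch below is not what I ran at the time. It assumes the response is a single JSON object holding one PEM string with literal ‘\n’ separators and no other escaped characters, and json_to_pem is a hypothetical helper name.

```shell
# json_to_pem: read the JSON response (file argument or stdin) and
# print only the PEM certificate block.
# Step 1 (awk): turn each literal two-character "\n" into a real newline.
# Step 2 (sed -n): keep only the lines from BEGIN CERTIFICATE to END CERTIFICATE.
# Step 3 (sed): strip the JSON key and quote before BEGIN and anything after END.
json_to_pem() {
  awk '{ gsub(/\\n/, "\n"); print }' "$@" |
    sed -n '/-----BEGIN CERTIFICATE-----/,/-----END CERTIFICATE-----/p' |
    sed -e 's/^.*\(-----BEGIN CERTIFICATE-----\)/\1/' \
        -e 's/\(-----END CERTIFICATE-----\).*$/\1/'
}
```

With the function loaded, something like json_to_pem root_ca_certificate > root_ca_2.pem should produce the cleaned file, and openssl x509 -in root_ca_2.pem -noout -subject is a quick way to confirm that it parses as a certificate.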
After saving this on my machine, I ran the command again, this time using the --ca-cert option to point to the new certificate.
bosh task 289 --ca-cert root_ca_2.pem
Using environment 'bosh.corp.local' as client 'ops_manager'
Task 289 | 16:58:24 | Updating instance master: master/51431548-e35a-471b-853f-26dc7eca9f7c (0) (canary) (00:02:06)
L Error: Action Failed get_task: Task ...
Task 289 | 17:00:30 | Error: Action Failed get_task: ...
Task 289 Started Wed May 8 16:55:34 UTC 2019
Task 289 Finished Wed May 8 17:00:30 UTC 2019
Task 289 Duration 00:04:56
Task 289 error
Capturing task '289' output:
Expected task '289' to succeed but state is 'error'
Exit code 1
Now all I need to do is figure out the error. Oh joy!