Cluster Initialization

This details the process we went through to initialize all cluster components for the first time. This requires some additional steps compared to starting an already-initialized cluster or adding nodes to an existing cluster (see Cluster Maintenance for those topics).

Initial Cluster Setup

Set up the OS, sudo access, & networking for all of your Controller, Compute, & Storage Nodes according to the Node Setup section.

Add a variables file for each of your nodes to the host_vars directory - look at existing hosts in the same group to see which variables need to be defined.
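
For example, a host_vars file for a compute node might look something like the sketch below; the variable names here are purely illustrative, so copy an existing host's file in the same group to get the real ones:

# host_vars/stack-compute-1 - illustrative sketch only; mirror an existing host in the same group
management_ip: 10.2.1.21
provider_interface: eno2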

Next, run the Ansible playbook with the initial tag:

ansible-playbook acorn.yml -t initial

This will fail when MySQL is restarted, because there is no running cluster for the nodes to join - that's OK, since restarting MySQL is the last task in the initial tag.

Now run the playbook with the ha tag to install the High Availability dependencies:

ansible-playbook acorn.yml -t ha

Follow the instructions in the High Availability Initialization section to set up the MySQL Cluster, Master Controller Virtual IP Address, & HAProxy.

On your controllers, add the Open vSwitch Bridge:

sudo ovs-vsctl add-br br-provider

On your compute nodes, add the Open vSwitch Bridge & attach the provider interface:

sudo ovs-vsctl add-br br-provider
sudo ovs-vsctl add-port br-provider THE_NODES_PROVIDER_INTERFACE
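# e.g. sudo ovs-vsctl add-port br-provider eno2, where eno2 is that node's provider NIC (the name varies per node)
sudo ovs-vsctl show   # verify the bridge & attached port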

Now run the entire playbook:

ansible-playbook acorn.yml

Once that’s finished, follow the instructions in the Ceph Initialization section.

You should be all set now; you can verify by running the following commands on the first controller node:

cd ~
. admin-openrc.sh

# Image Service
sudo apt-get install -y qemu-utils
wget http://download.cirros-cloud.net/0.3.5/cirros-0.3.5-x86_64-disk.img
qemu-img convert -f qcow2 -O raw cirros-0.3.5-x86_64-disk.img cirros.raw
openstack image create "cirros" --file cirros.raw --disk-format raw \
    --container-format bare --public
openstack image list

# Compute Service
openstack compute service list

# Networking Service
neutron ext-list
openstack network agent list

# Block Storage Service
openstack volume service list

# Launch a VM
openstack flavor create --id 0 --vcpus 1 --ram 64 --disk 1 nano
. acorn-openrc.sh
openstack security group rule create --proto icmp default
openstack security group rule create --proto tcp --dst-port 22 default
openstack network list
PRIVATE_NETWORK_ID="$(openstack network list -f value -c ID -c Name | grep private | cut -f1 -d' ')"
openstack server create --flavor nano --image cirros \
    --nic net-id=$PRIVATE_NETWORK_ID --security-group default test-instance
openstack server list
openstack floating ip create provider   # Check the created IP
FLOATING_IP="$(openstack floating ip list -c 'Floating IP Address' -f value)"
openstack server add floating ip test-instance $FLOATING_IP
echo $FLOATING_IP
# Should be able to ssh in as `cirros` w/ password `cubswin:)`
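ssh cirros@$FLOATING_IP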

High Availability Initialization

Some manual setup is required for highly available controller nodes. All your Controller nodes should be online during this process & you should have already run the Ansible playbook with the ha tag.

MySQL

Stop the MySQL server on the first controller node & start it as a cluster:

# On stack-controller-1
sudo systemctl stop mysql
sudo galera_new_cluster

Once that has finished, you can start MySQL on the other controller nodes:

# On stack-controller-2, stack-controller-3
sudo systemctl start mysql
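
To verify that all three nodes joined, check the Galera cluster size on any controller (this assumes sudo gives the local root account access to MySQL; otherwise pass credentials to the mysql client):

sudo mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';"   # Value should be 3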

RabbitMQ

Join the backup controllers to the master controller:

# On stack-controller-2, stack-controller-3
sudo rabbitmqctl stop_app
sudo rabbitmqctl join_cluster rabbit@stack-controller-1
sudo rabbitmqctl start_app

Then, on any controller node, check the cluster status & enable mirroring of all queues:

sudo rabbitmqctl cluster_status
sudo rabbitmqctl set_policy ha-all '^(?!amq\.).*' '{"ha-mode": "all"}'
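
You can confirm the policy was applied with:

sudo rabbitmqctl list_policies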

Pacemaker

Ansible only installs the Pacemaker & HAProxy packages; you will need to create the Pacemaker cluster & Virtual IP address by hand when first creating the OpenStack cluster.

Start by SSHing into stack-controller-1, destroying any existing cluster configuration, & authenticating the controller nodes:

# On stack-controller-1
sudo pcs cluster destroy
sudo pcs cluster auth stack-controller-1 stack-controller-2 stack-controller-3 \
    -u hacluster -p PASSWORD

Create, start, & enable the cluster:

sudo pcs cluster setup --start --enable --name acorn-controller-cluster \
    --force stack-controller-1 stack-controller-2 stack-controller-3

Set some basic properties:

sudo pcs property set pe-warn-series-max=1000 \
    pe-input-series-max=1000 \
    pe-error-series-max=1000 \
    cluster-recheck-interval=3min

Disable STONITH:

sudo pcs property set stonith-enabled=false

Create the Virtual IP Address:

sudo pcs resource create management-vip ocf:heartbeat:IPaddr2 \
    ip="10.2.1.10" cidr_netmask="24" op monitor interval="30s"

Add HAProxy to the cluster & only serve the VIP when HAProxy is running:

sudo pcs resource create lb-haproxy lsb:haproxy --clone
sudo pcs constraint order start management-vip then lb-haproxy-clone kind=Optional
sudo pcs constraint colocation add lb-haproxy-clone with management-vip

Add the Glance service to Pacemaker:

sudo pcs resource create glance-api lsb:glance-api --clone --force

Add the Cinder service to Pacemaker:

sudo pcs resource create cinder-api lsb:cinder-api --clone interleave=true --force
sudo pcs resource create cinder-scheduler lsb:cinder-scheduler --clone interleave=true --force
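
At this point the cluster status should show the VIP plus the cloned HAProxy, Glance, & Cinder resources running on the controllers:

sudo pcs status
sudo pcs constraint   # review the ordering & colocation constraints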

Ceph Initialization

Ansible only installs the ceph-deploy tool on the controller nodes; the Ceph storage cluster must be initialized manually.

Ceph Setup

Start by SSHing into the master controller. We'll make running repeated commands easier by setting some array variables:

# On stack-controller-1
CONTROLLERS=('stack-controller-1' 'stack-controller-2' 'stack-controller-3')
COMPUTE=('stack-compute-1' 'stack-compute-2' 'stack-compute-3')
STORAGE=('stack-storage-1' 'stack-storage-2' 'stack-storage-3')

Then generate an SSH key & copy it to the Controller, Compute, & Storage nodes:

ssh-keygen -t ecdsa -b 521
for SRV in "${CONTROLLERS[@]}" "${COMPUTE[@]}" "${STORAGE[@]}"; do ssh-copy-id $SRV; done

Now create a directory for the cluster configuration:

mkdir ~/ceph-cluster
cd ~/ceph-cluster

Deploy the initial cluster with the Controller nodes as monitors:

ceph-deploy new --public-network 10.4.1.0/24 ${CONTROLLERS[@]}

Open up the ceph.conf in ~/ceph-cluster/ and add the cluster network & nearfull ratio settings:

cluster_network = 10.5.1.0/24
mon_osd_nearfull_ratio = 0.67

A nearfull ratio of 0.67 is based on allowing one node to fail in a 3-node Ceph cluster: if one of three equally sized nodes is lost, the remaining two provide roughly 2/3 ≈ 0.67 of the raw capacity, so warning at 67% usage means the surviving nodes can still hold all of the data.

Install Ceph on the nodes (we specify the full repo URL instead of just using --release mimic to avoid HTTPS, allowing packages to be cached by our web proxy):

ceph-deploy install --release mimic --repo-url http://download.ceph.com/debian-mimic ${CONTROLLERS[@]} ${STORAGE[@]}

Then create the initial monitors & enable them to start on boot:

ceph-deploy mon create-initial
for SRV in "${CONTROLLERS[@]}"; do
    ssh $SRV sudo systemctl enable ceph-mon.target
done

Next, add the OSDs. Each OSD is backed by an HDD; for filestore OSDs you'll also want an SSD with a journal partition per OSD (/dev/sdb#):

# Block Storage
ceph-deploy osd create stack-storage-1 --data /dev/sdc
ceph-deploy osd create stack-storage-1 --data /dev/sdd
ceph-deploy osd create stack-storage-2 --data /dev/sdc
ceph-deploy osd create stack-storage-2 --data /dev/sdd
# etc.

# File Storage
ceph-deploy osd create stack-storage-1 --filestore --data /dev/sdc --journal /dev/sdb1
# etc.

# If your drive layout is identical on every storage server:
OSDS=('/dev/sdc' '/dev/sdd')
for SRV in "${STORAGE[@]}"; do
    for OSD in "${OSDS[@]}"; do
        ceph-deploy osd create $SRV --data $OSD
    done
done

Now copy the configuration file & admin key to the controller nodes:

ceph-deploy admin ${CONTROLLERS[@]}

And set the correct permissions on the admin key:

for SRV in "${CONTROLLERS[@]}"; do
    ssh $SRV sudo chmod +r /etc/ceph/ceph.client.admin.keyring
done

Enable the manager daemon:

ceph-deploy mgr create ${CONTROLLERS[@]}

Check the health of the storage cluster with ceph health & watch syncing progress with ceph -w.
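
For example:

ceph health     # should eventually report HEALTH_OK
ceph -s         # cluster summary, including monitor quorum & OSD counts
ceph osd tree   # confirm every OSD is up & in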

OpenStack Integration

Now we’ll make OpenStack use the Ceph cluster for Image & Block storage. Start by creating some pools to use:

ceph osd pool create volumes 512 replicated replicated_rule 64
rbd pool init volumes
ceph osd pool create vms 128
rbd pool init vms
ceph osd pool create images 64
rbd pool init images
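
You can confirm the pools were created & check their usage with:

ceph osd lspools
ceph df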

Create Ceph Users for the various OpenStack Services, and assign them the appropriate pool permissions:

ceph auth get-or-create client.glance mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=images'
ceph auth get-or-create client.cinder mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=vms, allow rwx pool=images'

Then copy them to your nodes:

# Copy glance key to controllers
for SRV in "${CONTROLLERS[@]}"; do
    ceph auth get-or-create client.glance | ssh $SRV sudo tee /etc/ceph/ceph.client.glance.keyring
    ssh $SRV sudo chown glance:glance /etc/ceph/ceph.client.glance.keyring
done

# Copy cinder key to controller & compute nodes
for SRV in "${CONTROLLERS[@]}" "${COMPUTE[@]}"; do
    ceph auth get-or-create client.cinder | ssh $SRV sudo tee /etc/ceph/ceph.client.cinder.keyring
done

# Set the correct permissions on controller nodes
for SRV in "${CONTROLLERS[@]}"; do
    ssh $SRV sudo chown cinder:cinder /etc/ceph/ceph.client.cinder.keyring
done

Copy the ceph.conf to the Compute nodes (it should already be present on the other nodes):

for SRV in "${COMPUTE[@]}"; do
    ssh $SRV sudo tee /etc/ceph/ceph.conf < /etc/ceph/ceph.conf
done

Display the secret key for the client.cinder ceph user and add it to the ansible password vault as vaulted_rbd_cinder_key:

ceph auth get-key client.cinder

Generate a UUID to use for the libvirt secret using uuidgen. Add the UUID to the ansible password vault as vaulted_rbd_cinder_uuid. Make sure to re-run the ansible playbook for the compute nodes so the libvirt secret is added (ansible-playbook acorn.yml -t compute).
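
A sketch of those steps (the vault file path is an assumption - use whichever vault file this repo encrypts):

uuidgen                                       # generate the UUID for the libvirt secret
ansible-vault edit group_vars/all/vault.yml   # assumed path; set vaulted_rbd_cinder_key & vaulted_rbd_cinder_uuid
ansible-playbook acorn.yml -t compute         # re-run so the libvirt secret is defined on the compute nodes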

Finally, restart the OpenStack services:

# On Controller
for SRV in "${CONTROLLERS[@]}"; do
    ssh $SRV sudo systemctl restart glance-api
    ssh $SRV sudo systemctl restart cinder-volume
done

# On Compute
for SRV in "${COMPUTE[@]}"; do
    ssh $SRV sudo systemctl restart nova-compute
done
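
To confirm the integration end to end, you can create a small test volume from the first controller & check that it lands in the volumes pool (the volume name here is arbitrary):

. ~/admin-openrc.sh
openstack volume create --size 1 ceph-test
openstack volume list                  # status should become 'available'
sudo rbd ls volumes                    # a volume-<UUID> image should appear
openstack volume delete ceph-test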