Mirantis OpenStack Operations Guide

version 8.0

Contents
Preface 1
Intended Audience 1
Documentation History 1
Introduction 2
Accessing the shell on the nodes 3
Uploading Public Keys 3
SSH to the Fuel Master Node 3
SSH to target nodes 3
How To: Exclude some drives from RAID-1 array 5
How To: Modify Kernel Parameters 6
Using the Cobbler web UI to set kernel parameters 6
Using the dockerctl command to set kernel parameters 7
HowTo: Create an XFS disk partition 8
HowTo: Enable/Disable Galera Cluster Autorebuild Mechanism 9
HowTo: Backport Galera Pacemaker OCF script 10
HowTo: Backport Memcached backend fixes 12
HowTo: Backport RabbitMQ Pacemaker OCF script 14
HowTo: Manage OpenStack services 22
Adding, Redeploying, and Replacing Nodes 25
Redeploy a Non-Controller Node 25
Add a Non-Controller Node 25
Add a MongoDB node 25
Add a controller node 26
Remove a Controller node 27
Configuring an Operating System node 27
How To: Safely remove a Ceph OSD node 29
How To: Adjust Placement Groups when adding additional Ceph OSD node(s) 31
HowTo: Shut down the whole cluster 32
Starting up the cluster 32
Creating and Configuring ML2 Drivers for Neutron 33
Using YAML configuration files 34


Customizing Passwords 34
Adding new modules 35
Docker Containers and Dockerctl 37
Container types 37
Command reference 37
Basic usage 37
Dockerctl 38
System changes for Docker affecting Fuel 5.0 and later 39
Fuel Master architecture changes for Docker 40
Enable Experimental Features 41
Fuel Access Control 42
Managing your Ceph cluster 44
Accessing the Puppet manifest for Ceph 44
Verify the deployment 44
Missing OSD instances 45
Ceph pools 46
Test that Ceph works with OpenStack components 46
Glance 46
Cinder 47
Rados GW 49
Swift 49
Reset the Ceph cluster 51
S3 API in Ceph RADOS Gateway 51
Introduction 51
Getting started 51
User authentication 52
Migrate workloads from a compute node for maintenance 56
Disable virtual machine scheduling 56
Migrate instances 57
Monitor the migration process 59
Restore a compute node after maintenance 60
Maintenance Mode 61


Overview 61
Using the umm command 61
Configuring the UMM.conf file 62
Example of using MM on one node 62
Example of putting all nodes into the maintenance mode at the same time 63
Running vCenter 66
Nova-compute and vSphere clusters mapping 66
Performance Notes 67
Keystone Token Cleanup 67
HowTo: Backup and restore Fuel Master 68
Running the backup 68
Restoring Fuel Master 68
How slave nodes choose the interface to use for PXE booting 70
Horizon Deployment Notes 71
Overview 71
Details of Health Checks 72
Sanity tests description 72
Functional tests description 72
Network issues 74
HA tests description 74
Configuration tests description 74
Cloud validation tests description 75
Notes on Corosync and Pacemaker 76
Troubleshooting 77
Logs and messages 77
Screen notifications 77
Viewing logs through Fuel 77
Viewing the Fuel Master node logs 78
Viewing logs for target nodes ("Other servers") 78
syslog 79
/var/logs 79
atop logs 80


Fuel Master node log rotation 81


Enabling debug logging for OpenStack services 81
Fuel Master and Docker disk space troubleshooting 82
Overview 82
PostgreSQL database inconsistency 82
Docker metadata corruption loses containers 83
Read-only containers 86
Corrupt ext4 filesystem on Docker container 86
How To Troubleshoot Corosync/Pacemaker 87
crm - Cluster Resource Manager 87
How to verify that Neutron HA is working 93
Corosync crashes without network connectivity 95
How To Troubleshoot AMQP issues 97
Check if there is a problem in the Corosync/Pacemaker layer 97
How to recover 97
Check if there is a problem in the RabbitMQ layer 97
How to recover 98
Check if there is a problem in the Oslo messaging layer 98
How to recover 98
Check if there are AMQP problems with any of the OpenStack components 98
How to recover 98
How to make RabbitMQ OCF script tolerate rabbitmqctl timeouts 99
Timeout In Connection to OpenStack API From Client Applications 99
Enable Ubuntu bootstrap (EXPERIMENTAL) 101
HA testing scenarios 102
Regular testing scenarios 102
Nova-network 102
Neutron 104
Bonding 106
Failover testing scenarios 107
Rally 111
OpenStack Database Backup and Restore with Percona XtraBackup 112


Backing up with Percona XtraBackup 112


Restoring with Percona XtraBackup 113
Writing a bootable Fuel ISO to a USB drive 115
Deploying an Empty Role through Fuel CLI 116
Configuring repositories 118
For Ubuntu 118
Repository priorities 118
Downloading Ubuntu system packages 119
Setting up local mirrors 119
Installing on a Red Hat based server 122
Debian-based server 122
Troubleshooting partial mirror 123
Applying patches 124
Introduction 124
Usage scenarios 124
Default scenario 124
Custom scenario: deploying from local mirrors; patching from local mirrors 125
Custom scenario: deploying from Mirantis mirrors; patching from local mirrors 125
Additional information 126
Verifying the installed packages on the Fuel Master node 128
Verifying the installed packages on the Fuel Slave nodes 128
Using the reduced footprint feature 130
Reduced footprint flow in brief 130
Reduced footprint flow detailed 131
Switching on SSL and secure access 135
Horizon dashboard and the OpenStack publicURL endpoints 135
HTTPS access to the Fuel Master node 136
Additional information 136
Using Networking Templates 137
Networking Templates Limitations 137
Working with Networking Templates 137
Networking Templates Samples 137


Networking Templates Structure 138


Operating with Networking Templates 141
Network Template Examples 143
Configuring Two Networks 143
Configuring a Single Network 144
Configure Neutron 144
Index 145


Preface
This documentation provides information on how to use Fuel to deploy OpenStack environments. The
information is for reference purposes and is subject to change.

Intended Audience
This documentation is intended for OpenStack administrators and developers; it assumes that you have
experience with network and cloud concepts.

Documentation History
The following table lists the released revisions of this documentation:

Revision Date Description


February, 2016 8.0 GA


Introduction
This is a collection of useful procedures for using and managing your Mirantis OpenStack environment. The
information given here supplements the information in:

• OpenStack Admin User Guide


• OpenStack Cloud Administrator Guide


Accessing the shell on the nodes


Many maintenance and advanced configuration tasks require that you access the Fuel Master and target nodes at
the shell level. All these systems run the bash shell and support standard Linux system commands. Each system
has its own console that you can use directly but the standard practice is to use SSH to access the consoles of the
different nodes.

Uploading Public Keys


You access the shell on the Fuel Master node and the target nodes using SSH, and you must define a Public Key
to use SSH.

1. Generate a Public Key with the following command on your client:

ssh-keygen -t rsa

2. Paste the public key into the Public key field. Fuel uploads the public key to each Fuel Slave node it deploys.
However, the key is not uploaded to Fuel Slave nodes that have already been deployed.
3. To upload the SSH key to the Fuel Master node or any deployed node, use the following command
sequence:

ssh-agent
ssh-copy-id -i .ssh/id_rsa.pub root@<ip-addr>

<ip-addr> is the IP address for the Fuel Master node, which is the same IP address you use to access the Fuel
console.
You can use this same command to add a public key to a deployed target node. See SSH to target nodes for
information about getting the <ip-addr> values for the target nodes.
You can instead add the content of your key (stored in the .ssh/id_rsa.pub file) to the node's
/root/.ssh/authorized_keys file.
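For example, assuming your public key is in ~/.ssh/id_rsa.pub on your client, a one-liner such as the following appends it over an existing SSH session (the path and the target IP address are placeholders):

cat ~/.ssh/id_rsa.pub | ssh root@<ip-addr> 'mkdir -p /root/.ssh && cat >> /root/.ssh/authorized_keys'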

SSH to the Fuel Master Node


You can now use ssh to access the console of the Fuel Master Node or scp to securely copy a file to the Fuel
Master node.
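For example (the IP address below is a placeholder for your Fuel Master node, and the file name is only illustrative):

ssh root@<ip-addr>
scp some-archive.tar.gz root@<ip-addr>:/root/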

SSH to target nodes


You can SSH to any of the target nodes in your environment from the Fuel Master node.
Use the fuel node list command to get a list like the following:

id | status | name             | cluster | ip         | mac               | roles      | pe
---|--------|------------------|---------|------------|-------------------|------------|---
5  | ready  | Untitled (4d:4d) | 2       | 10.110.0.3 | b2:8b:55:17:ae:40 | controller |
8  | ready  | Untitled (3a:7f) | 2       | 10.110.0.6 | 92:93:99:70:14:4c | compute    |
6  | ready  | Untitled (34:84) | 2       | 10.110.0.4 | f2:b3:1a:74:da:41 | cinder     |
7  | ready  | Untitled (f0:9b) | 2       | 10.110.0.5 | 56:09:fe:c6:06:40 | compute    |

You can ssh to any of the nodes using the IP address. For example, to ssh to the Cinder node:

ssh 10.110.0.4

You can also use the "id" shown in the first column, for example:

ssh node-6


How To: Exclude some drives from RAID-1 array


RAID-1 spans all configured disks on a node, putting a boot partition on each disk, because OpenStack does not
have access to the BIOS. It is not currently possible to exclude some drives from the Fuel configuration in the
Fuel UI. This means that one cannot, for example, configure some drives to be used for backup and recovery or as
bcache.
You can work around this issue as follows. This example is for a system that has three disks: sda, sdb, and sdc.
Fuel will provision sda and sdb as RAID-1 for OpenStack but sdc will not be used as part of the RAID-1 array:

1. Use the Fuel CLI to obtain provisioning data:

fuel provisioning --env-id 1 -d

2. Remove the drive that you do not want to be part of the RAID array:

- size: 300
type: boot
- file_system: ext2
mount: /boot
name: Boot
size: 200
type: raid

3. Run deployment

fuel provisioning --env-id 1 -u

4. Confirm that your partition is not included in the RAID array:

[root@node-2 ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda3[0] sdb3[1]
      204736 blocks super 1.0 [2/2] [UU]


How To: Modify Kernel Parameters


Kernel parameters are options that are passed to the kernel when a Linux system is booted. Kernel parameters
can be set in any of the following ways:

• Use the Fuel Welcome screen to define kernel parameters that will be set for the Fuel Master node when it
is installed.
• Use the Initial parameters field on the Settings tab to define kernel parameters that Fuel will set on the
target nodes. This only affects target nodes that will be deployed in the future, not those that have already
been deployed.
• Use the Cobbler web UI (see Using the Cobbler web UI to set kernel parameters) to change kernel parameters
for all nodes or for specific nodes. The nodes appear in Cobbler only after they are deployed; to
change parameters before deployment, stop the deployment, change the parameters, then proceed.
• Issue the dockerctl command on each node where you want to set kernel parameters; see Using the dockerctl
command to set kernel parameters.
Any kernel parameter supported by Ubuntu can be set for the target nodes and the Fuel Master node.
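As an illustration, the value entered in the Initial parameters field (or on the Fuel Welcome screen) is expected to be a space-separated list of kernel options; the parameters below are illustrative examples only, not recommended settings:

console=tty0 biosdevname=0 intel_iommu=off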

Using the Cobbler web UI to set kernel parameters


You can use the Cobbler web UI to set kernel parameters:

• Use the https://<ip-addr>/cobbler_web URL to access the Cobbler web UI; replace <ip-addr> with the IP
address for your Fuel Master Node.
• Log in using the user name and password defined in the cobbler section of the /etc/fuel/astute.yaml file (see the example after this list for a quick way to look them up).
• Select Systems from the menu on the right. This lists the nodes that are deployed in your environment.
Select the node(s) for which you want to set new parameters and click Edit to open the system settings.


• Add the kernel parameters and values to the Kernel Options (Post-install) field, then click the Save button.
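A quick way to look up the Cobbler credentials mentioned above, assuming the cobbler section is a top-level key in /etc/fuel/astute.yaml, is:

grep -A 3 '^cobbler:' /etc/fuel/astute.yaml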

Using the dockerctl command to set kernel parameters


Use the dockerctl console command on the Fuel Master node to add a kernel parameter definition. For example,
the following command sets the intel_iommu=off parameter:

dockerctl shell cobbler cobbler profile edit --name bootstrap --kopts="intel_iommu=off"
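To confirm that the option was applied to the profile, you can inspect it with the standard Cobbler CLI; this is a sketch, and the profile name must match the one you edited:

dockerctl shell cobbler cobbler profile report --name bootstrap | grep -i 'kernel options'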


HowTo: Create an XFS disk partition


In most cases, Fuel creates the XFS partition for you. If for some reason you need to create it yourself, use this
procedure:

Note
Replace /dev/sdb with the appropriate block device you wish to configure.

1. Create the partition itself

fdisk /dev/sdb
n       (for new)
p       (for partition)
<enter> (to accept the defaults)
<enter> (to accept the defaults)
w       (to save changes)

2. Initialize the XFS partition

mkfs.xfs -i size=1024 -f /dev/sdb1

3. For a standard swift install, all data drives are mounted directly under /srv/node, so first create the mount
point

mkdir -p /srv/node/sdb1

4. Finally, add the new partition to fstab so it mounts automatically, then mount all current partitions

echo "/dev/sdb1 /srv/node/sdb1 xfs \


noatime,nodiratime,nobarrier,logbufs=8 0 0" >> /etc/fstab
mount -a
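To verify that the new partition is mounted as expected, a quick check such as the following can help:

df -h /srv/node/sdb1
mount | grep /srv/node/sdb1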


HowTo: Enable/Disable Galera Cluster Autorebuild Mechanism


By default, the autorebuild mechanism is enabled, so Fuel reassembles a Galera cluster automatically without
any user interaction.

• The OCF Galera script checks every node in the Galera Cluster for the SEQNO position. This allows it to
find the node with the most recent data.
• The script checks the status of the current node: if it is synchronized with the quorum, the
procedure stops; otherwise, the SEQNO is obtained and stored in the Corosync CIB as a variable.
• The script sleeps for 300 seconds, allowing other nodes to join the Corosync quorum and push their
UUIDs and SEQNOs, too.
• For every node in the quorum, the script compares the UUID and SEQNO. The node with the highest
SEQNO is bootstrapped as the Primary Component, allowing other nodes to join the
newly formed cluster later.
• The Primary Component node is started with the --wsrep-new-cluster option, forming a new quorum.
To disable the autorebuild feature, run:

crm_resource unmanage clone_p_mysql

To re-enable the autorebuild feature, run:

crm_resource manage clone_p_mysql

To check the GTID and SEQNO values saved in the Corosync CIB for all nodes, run:

cibadmin --query --xpath "//nodes/*/*/nvpair[@name=\"gtid\"]"

To attempt an automated reassembly without a reboot if the cluster is broken, run:

crm resource restart clone_p_mysql

To remove all GTIDs and SEQNOs from the Corosync CIB and allow the OCF script to reread the data from the
grastate.dat file, run:

cibadmin --delete-all --query --xpath "//nodes/*/*/nvpair[@name=\"gtid\"]" --force
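To inspect the locally stored UUID and SEQNO on a particular node directly, you can read the grastate.dat file (assuming the default MySQL data directory):

cat /var/lib/mysql/grastate.dat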


HowTo: Backport Galera Pacemaker OCF script


Starting from Fuel 5.1, the Galera OCF script was completely redesigned, which makes the Galera cluster more reliable
and predictable. The script can be backported to pre-5.1 Fuel releases following the instructions below;
similar steps can be used for older versions by adjusting the MySQL commands to match those
used by the specific version of MySQL.

Warning
Before performing any operations with Galera, you should schedule the maintenance window, perform
backups of all databases, and stop all MySQL related services.

1. Set the p_mysql primitive in maintenance mode:

crm configure edit p_mysql \
  meta is-managed=false

2. Check the status. It should show clone_p_mysql primitives as Unmanaged:

crm_mon -1

3. Download the latest OCF script from the fuel-library repository to the Fuel Master node:

wget --no-check-certificate -O \
  /etc/puppet/modules/galera/files/ocf/mysql-wss \
  https://raw.githubusercontent.com/stackforge/fuel-library/master/deployment/puppet/galera/files/ocf/mysql-wss

4. The OCF script requires some modification as it was originally designed for MySQL 5.6:

perl -pi -e 's/--wsrep-new-cluster/--wsrep-cluster-address=gcomm:\/\//g' \
  /etc/puppet/modules/galera/files/ocf/mysql-wss

5. Copy the script to all controllers:

for i in $(fuel nodes | awk '/ready.*controller.*True/{print $1}'); \
  do scp /etc/puppet/modules/galera/files/ocf/mysql-wss \
  node-$i:/etc/puppet/modules/galera/files/ocf/mysql-wss; done

for i in $(fuel nodes | awk '/ready.*controller.*True/{print $1}'); \
  do scp /etc/puppet/modules/galera/files/ocf/mysql-wss \
  node-$i:/usr/lib/ocf/resource.d/mirantis/mysql-wss; done

6. Configure the p_mysql resource for the new Galera OCF script:


crm configure edit p_mysql

Example of primitive for Ubuntu:

crm configure primitive p_mysql ocf:mirantis:mysql-wss \
  params socket="/var/run/mysqld/mysqld.sock" \
  pid="/var/run/mysqld/mysqld.pid" \
  test_passwd="password" test_user="wsrep_sst" \
  op monitor timeout="55" interval="60" enabled=true \
  op start timeout="475" interval="0" \
  op stop timeout="175" interval="0" \
  meta is-managed=true

Note
During this operation, the MySQL/Galera cluster will be restarted. This may take up to 5 minutes.

7. Check whether Galera Cluster is synced and functioning:

mysql -e "show global status like 'wsrep_cluster_status'"

8. The IP addresses of all nodes should be present in the Galera cluster. This confirms that all nodes participate
in cluster operations and are acting properly (an illustrative output is shown after this procedure).

mysql -e "show global status like 'wsrep_incoming_addresses'"

9. Restart MySQL related services:

• Restart neutron on every Controller (if installed).


• Restart the remaining OpenStack services on each Controller and Storage node.
• Restart the OpenStack services on the Compute nodes.
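For the checks in steps 7 and 8, a healthy cluster typically reports Primary status and lists every controller; the output below is illustrative only:

mysql -e "show global status like 'wsrep_cluster_status'"
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| wsrep_cluster_status | Primary |
+----------------------+---------+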


HowTo: Backport Memcached backend fixes


Fuel 6.1 contains several High Availability (HA) fixes to the configuration of the memcache_pool backend for
Keystone (see Memcached), which shorten the response time of OpenStack services if a memcached node fails. There
is also a security fix to the Keystone configuration that keeps token revocations in the MySQL
backend instead of memcached. These patches can be backported to the Fuel 6.0 release following the
instructions below. Note that Fuel 5.0 and older do not support the memcache_pool backend, so the fixes are not
applicable to them.

Warning
Before performing any operations with Keystone, you should schedule a maintenance window, perform
backups of Keystone configuration files, and stop all Keystone related services.

1. Download the related fixes for puppet modules from the fuel-library repository to the Fuel Master node:

wget --no-check-certificate -O /etc/puppet/modules/keystone/manifests/init.pp \
  https://raw.githubusercontent.com/stackforge/fuel-library/stable/6.0/deployment/puppet/keystone/manifests/init.pp
wget --no-check-certificate -O /etc/puppet/modules/keystone/spec/classes/keystone_spec.rb \
  https://raw.githubusercontent.com/stackforge/fuel-library/stable/6.0/deployment/puppet/keystone/spec/classes/keystone_spec.rb
wget --no-check-certificate -O /etc/puppet/modules/openstack/manifests/keystone.pp \
  https://raw.githubusercontent.com/stackforge/fuel-library/stable/6.0/deployment/puppet/openstack/manifests/keystone.pp

2. Copy the fixed files to all Controllers:

for i in $(fuel nodes --env <env_ID> | awk '/ready.*controller.*True/{print $1}'); do
  scp /etc/puppet/modules/keystone/manifests/init.pp \
    node-$i:/etc/puppet/modules/keystone/manifests/init.pp; \
  scp /etc/puppet/modules/keystone/spec/classes/keystone_spec.rb \
    node-$i:/etc/puppet/modules/keystone/spec/classes/keystone_spec.rb; \
  scp /etc/puppet/modules/openstack/manifests/keystone.pp \
    node-$i:/etc/puppet/modules/openstack/manifests/keystone.pp; \
done


Note
This step assumes that the environment id is "1" and that the controller node names follow the standard Fuel
notation, like "node-1", "node-42", and so on.

3. Update the /etc/keystone/keystone.conf configuration file on all of the controller nodes as follows:

[revoke]
driver = keystone.contrib.revoke.backends.sql.Revoke

[cache]
memcache_dead_retry = 30
memcache_socket_timeout = 1
memcache_pool_maxsize = 1000

[token]
driver = keystone.token.persistence.backends.memcache_pool.Token

4. Restart all Keystone-related services (an illustrative example follows this list):

• Restart Keystone on every Controller.


• Restart Neutron on every Controller (if installed).
• Restart the remaining OpenStack services on each Controller and Storage node.
• Restart the OpenStack services on the Compute nodes.
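On Ubuntu controllers, an illustrative (not authoritative) way to restart Keystone and Neutron with the init system is shown below; service names may differ between releases, and any Pacemaker-managed services must be restarted with pcs or crm instead:

service keystone restart
service neutron-server restart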


HowTo: Backport RabbitMQ Pacemaker OCF script


Fuel 6.1 contains many fixes to the RabbitMQ OCF script that make the RabbitMQ cluster more reliable and
predictable. These fixes can be backported to the Fuel 5.1 and 6.0 releases following the instructions below. Older
Fuel versions do not use Pacemaker for RabbitMQ cluster management, so the changes to the OCF
script are not applicable to them.

Note
The OCF script in the Fuel 6.1 release also distributes and ensures a consistent Erlang cookie file
among all controller nodes. For backports to older Fuel versions, this feature is disabled by default
in the OCF script. If you want to enable it, read the details below carefully.

1. Schedule maintenance window.

Warning
Before performing any operations with RabbitMQ, you should schedule a maintenance window,
back up all RabbitMQ mnesia files and OCF scripts, and stop all OpenStack services on all
environment nodes; see HowTo: Manage OpenStack services for details.

Mnesia files are located at /var/lib/rabbitmq/mnesia/ and OCF files can be found at
/usr/lib/ocf/resource.d/mirantis/.


2. Inside the maintenance window, put the p_rabbitmq-server primitive into the unmanaged state on one of the
controller nodes:

pcs resource unmanage master_p_rabbitmq-server

or with the crm tool:

crm resource unmanage master_p_rabbitmq-server

Note
Normally, the crm tool can be installed from the crmsh package by running:

yum install crmsh || apt-get install crmsh

3. Check the status. It should show p_rabbitmq-server primitives as "Unmanaged":

pcs resource show

or with the crm tool:

crm_mon -1

Unmanaged p_rabbitmq-server resources should look like:

Master/Slave Set: master_p_rabbitmq-server [p_rabbitmq-server] (unmanaged)
  p_rabbitmq-server (ocf::fuel:rabbitmq-server): Master node-1 (unmanaged)
  p_rabbitmq-server (ocf::fuel:rabbitmq-server): Slave node-2 (unmanaged)
  p_rabbitmq-server (ocf::fuel:rabbitmq-server): Slave node-3 (unmanaged)

4. Download the latest OCF script from the fuel-library repository to the Fuel Master node:

wget --no-check-certificate -O /etc/puppet/modules/nova/files/ocf/rabbitmq \
  https://raw.githubusercontent.com/stackforge/fuel-library/stable/6.0/deployment/puppet/nova/files/ocf/rabbitmq
chmod +x /etc/puppet/modules/nova/files/ocf/rabbitmq


Note
For the Fuel 5.1 release update the link to use a "5.1" version in the download path.

5. Copy the script to all controllers:

for i in $(fuel nodes --env <env_ID> | awk '/ready.*controller.*True/{print $1}'); \
  do scp /etc/puppet/modules/nova/files/ocf/rabbitmq \
  node-$i:/etc/puppet/modules/nova/files/ocf/rabbitmq; done
for i in $(fuel nodes --env <env_ID> | awk '/ready.*controller.*True/{print $1}'); \
  do scp /etc/puppet/modules/nova/files/ocf/rabbitmq \
  node-$i:/usr/lib/ocf/resource.d/mirantis/rabbitmq-server; done

Note
This step assumes that the environment id is "1" and that the controller node names follow the standard Fuel
notation, like "node-1", "node-42", and so on.

6. Update the configuration of the p_rabbitmq-server resource for the new RabbitMQ OCF script at any
controller node:

crm configure edit p_rabbitmq-server

An example primitive may look like:

primitive p_rabbitmq-server ocf:mirantis:rabbitmq-server \
  params node_port="5673" \
  meta failure-timeout="60s" migration-threshold="INFINITY" \
  op demote interval="0" timeout="60" \
  op notify interval="0" timeout="60" \
  op promote interval="0" timeout="120" \
  op start interval="0" timeout="120" \
  op monitor interval="30" timeout="60" \
  op stop interval="0" timeout="60" \
  op monitor interval="27" role="Master" timeout="60"

or in an XML notation:

xml <primitive class="ocf" id="p_rabbitmq-server" provider="mirantis" \


type="rabbitmq-server">


<operations>
<op id="p_rabbitmq-server-monitor-30" interval="30" name="monitor" timeout="60"/> \
<op id="p_rabbitmq-server-monitor-27" interval="27" name="monitor" \
role="Master" timeout="60"/>
<op id="p_rabbitmq-server-start-0" interval="0" \
name="start" timeout="60"/>
<op id="p_rabbitmq-server-stop-0" interval="0" \
name="stop" timeout="60"/>
<op id="p_rabbitmq-server-promote-0" interval="0" \
name="promote" timeout="120"/>
<op id="p_rabbitmq-server-demote-0" interval="0" \
name="demote" timeout="60"/>
<op id="p_rabbitmq-server-notify-0" interval="0" \
name="notify" timeout="60"/>
</operations> \
<instance_attributes id="p_rabbitmq-server-instance_attributes"> \
<nvpair id="p_rabbitmq-server-instance_attributes-node_port" \
name="node_port" value="5673"/>
</instance_attributes> \
<meta_attributes id="p_rabbitmq-server-meta_attributes"> \
<nvpair id="p_rabbitmq-server-meta_attributes-migration-threshold" \
name="migration-threshold" value="INFINITY"/>
<nvpair id="p_rabbitmq-server-meta_attributes-failure-timeout" \
name="failure-timeout" value="60s"/> \
</meta_attributes> \
</primitive>
#vim:set syntax=pcmk

Make sure the following changes are applied:

• To the params stanza:

• Add the parameter command_timeout with the value --signal=KILL

Note
The command_timeout parameter value is given for Ubuntu OS.

Use some_param="some_value" notation, or for the XML case:

<nvpair id="p_rabbitmq-server-instance_attributes-some_param" \
name="some_param" value="some_value"/>

• Add the erlang_cookie parameter with the value false


Note
If you want to allow the OCF script to manage the Erlang cookie files, provide the existing
Erlang cookie from /var/lib/rabbitmq/.erlang.cookie as the erlang_cookie parameter;
otherwise, set this parameter to false. Note that using a different Erlang cookie also
requires erasing the mnesia files on all controller nodes.

Warning
Erasing the mnesia files will also erase all custom users, vhosts, queues, and other
RabbitMQ entities, if any.

• To the meta stanza:

• Set the failure-timeout to "360s"


• To the op stanzas:

• Set the notify interval to "0" and the timeout to "180"


• Set the start interval to "0" and the timeout to "360"
Or the same with the pcs tool:

pcs resource meta p_rabbitmq-server failure-timeout=360s
pcs resource op remove p_rabbitmq-server notify interval=0 timeout=60
pcs resource op add p_rabbitmq-server notify interval=0 timeout=180
pcs resource op remove p_rabbitmq-server start interval=0 timeout=60
pcs resource op add p_rabbitmq-server start interval=0 timeout=360

Note
Ignore messages like "Error: Unable to find operation matching:"


Note
You cannot add resource attributes with the pcs tool; install the crmsh package and use the crm tool
to update the command_timeout and erlang_cookie parameters, as described above.

As a result, the given example resource should look like:

# pcs resource show p_rabbitmq-server

Resource: p_rabbitmq-server (class=ocf provider=fuel type=rabbitmq-server)


Attributes: node_port=5673 debug=false command_timeout=--signal=KILL erlang_cookie=EO
Meta Attrs: migration-threshold=INFINITY failure-timeout=360s
Operations: monitor interval=30 timeout=60 (p_rabbitmq-server-monitor-30)
monitor interval=27 role=Master timeout=60 (p_rabbitmq-server-monitor-27)
monitor interval=103 role=Slave timeout=60 OCF_CHECK_LEVEL=30 (p_rabbitmq
start interval=0 timeout=360 (p_rabbitmq-server-start-0)
stop interval=0 timeout=120 (p_rabbitmq-server-stop-0)
promote interval=0 timeout=120 (p_rabbitmq-server-promote-0)
demote interval=0 timeout=120 (p_rabbitmq-server-demote-0)
notify interval=0 timeout=180 (p_rabbitmq-server-notify-0)

or with the crm tool:

# crm configure show p_rabbitmq-server


primitive p_rabbitmq-server ocf:fuel:rabbitmq-server \
op monitor interval=30 timeout=60 \
op monitor interval=27 role=Master timeout=60 \
op monitor interval=103 role=Slave timeout=60 OCF_CHECK_LEVEL=30 \
op start interval=0 timeout=360 \
op stop interval=0 timeout=120 \
op promote interval=0 timeout=120 \
op demote interval=0 timeout=120 \
op notify interval=0 timeout=180 \
params node_port=5673 debug=false command_timeout="--signal=KILL" erlang_cookie=EOKOWX
meta migration-threshold=INFINITY failure-timeout=360s

The output also may have an XML notation and may look like:

xml <primitive class="ocf" id="p_rabbitmq-server" provider="mirantis" \


type="rabbitmq-server"> \
<operations> \
<op id="p_rabbitmq-server-monitor-30" interval="30" name="monitor" \
timeout="60"/>
<op id="p_rabbitmq-server-monitor-27" interval="27" name="monitor" \
role="Master" timeout="60"/>


<op id="p_rabbitmq-server-start-0" interval="0" name="start" \


timeout="360"/>
<op id="p_rabbitmq-server-stop-0" interval="0" name="stop" \
timeout="60"/>
<op id="p_rabbitmq-server-promote-0" interval="0" name="promote" \
timeout="120"/>
<op id="p_rabbitmq-server-demote-0" interval="0" name="demote" \
timeout="60"/>
<op id="p_rabbitmq-server-notify-0" interval="0" name="notify" \
timeout="180"/>
</operations> \
<instance_attributes id="p_rabbitmq-server-instance_attributes"> \
<nvpair id="p_rabbitmq-server-instance_attributes-node_port" \
name="node_port" value="5673"/>
<nvpair id="p_rabbitmq-server-instance_attributes-command_timeout" \
name="command_timeout" value="--signal=KILL"/>
<nvpair id="p_rabbitmq-server-instance_attributes-erlang_cookie" \
name="erlang_cookie" value="EOKOWXQREETZSHFNTPEY"/> \
</instance_attributes> \
<meta_attributes id="p_rabbitmq-server-meta_attributes"> \
<nvpair id="p_rabbitmq-server-meta_attributes-migration-threshold" \
name="migration-threshold" value="INFINITY"/>
<nvpair id="p_rabbitmq-server-meta_attributes-failure-timeout" \
name="failure-timeout" value="360s"/> \
</meta_attributes> \
</primitive>


7. Put the p_rabbitmq-server resource back into managed state and restart it:

pcs resource manage master_p_rabbitmq-server
pcs resource disable master_p_rabbitmq-server
pcs resource enable master_p_rabbitmq-server
pcs resource cleanup master_p_rabbitmq-server

or with the crm tool:

crm resource manage master_p_rabbitmq-server
crm resource restart master_p_rabbitmq-server
crm resource cleanup master_p_rabbitmq-server

Note
During this operation, the RabbitMQ cluster will be restarted. This may take from 1 up to 20
minutes. If there are any issues, see crm - Cluster Resource Manager.

8. Check whether the RabbitMQ cluster is functioning on each controller node:

rabbitmqctl cluster_status
rabbitmqctl list_users

9. Restart RabbitMQ related services:


See HowTo: Manage OpenStack services for details.


HowTo: Manage OpenStack services


1. Stop or start OpenStack services
To get the full list of OpenStack services, derived from the corresponding list of OpenStack projects, and their
statuses in SysV or Upstart, use the following commands:
On Ubuntu:

services=$(curl http://git.openstack.org/cgit/open\
stack/governance/plain/reference/projects.yaml | \
egrep -v 'Security|Documentation|Infrastructure' | \
perl -n -e'/^(\w+):$/ && print "openstack-",lc $1,".*\$|",lc $1,".*\$|"')

initctl list | grep -oE $services | grep start

Now you can start, stop, or restart the OpenStack services; see the details about the recommended order below.

Warning
Fuel configures some services, like Neutron agents, Heat engine, Ceilometer agents, to be managed
by Pacemaker instead of generic init scripts. These services should be managed only with the pcs or
crm tools!

To figure out which services are managed by Pacemaker, first find the disabled or not-running
services with the following command:
On Ubuntu:

initctl list | grep -oE $services | grep stop

Next, inspect the output of the pcs resource command (or crm resource list) and find the
corresponding services listed, if any.
An example Pacemaker resource list:

root@node-1:~# crm status

Stack: corosync
Current DC: node-1.domain.tld (1) - partition with quorum
Version: 1.1.12-561c4cf
3 Nodes configured
43 Resources configured

Online: [ node-1.domain.tld node-2.domain.tld node-5.domain.tld ]

Clone Set: clone_p_vrouter [p_vrouter]
    Started: [ node-1.domain.tld node-2.domain.tld node-5.domain.tld ]


vip__management (ocf::fuel:ns_IPaddr2): Started node-1.domain.tld
vip__public_vrouter (ocf::fuel:ns_IPaddr2): Started node-1.domain.tld
vip__management_vrouter (ocf::fuel:ns_IPaddr2): Started node-1.domain.tld
vip__public (ocf::fuel:ns_IPaddr2): Started node-2.domain.tld
Master/Slave Set: master_p_conntrackd [p_conntrackd]
Masters: [ node-1.domain.tld ]
Slaves: [ node-2.domain.tld node-5.domain.tld ]
Clone Set: clone_p_haproxy [p_haproxy]
Started: [ node-1.domain.tld node-2.domain.tld node-5.domain.tld ]
Clone Set: clone_p_dns [p_dns]
Started: [ node-1.domain.tld node-2.domain.tld node-5.domain.tld ]
Clone Set: clone_p_mysql [p_mysql]
Started: [ node-1.domain.tld node-2.domain.tld node-5.domain.tld ]
Master/Slave Set: master_p_rabbitmq-server [p_rabbitmq-server]
Masters: [ node-1.domain.tld ]
Slaves: [ node-2.domain.tld node-5.domain.tld ]
Clone Set: clone_p_heat-engine [p_heat-engine]
Started: [ node-1.domain.tld node-2.domain.tld node-5.domain.tld ]
Clone Set: clone_p_neutron-plugin-openvswitch-agent [p_neutron-plugin-openvswitch-agen
Started: [ node-1.domain.tld node-2.domain.tld node-5.domain.tld ]
Clone Set: clone_p_neutron-dhcp-agent [p_neutron-dhcp-agent]
Started: [ node-1.domain.tld node-2.domain.tld node-5.domain.tld ]
Clone Set: clone_p_neutron-metadata-agent [p_neutron-metadata-agent]
Started: [ node-1.domain.tld node-2.domain.tld node-5.domain.tld ]
Clone Set: clone_p_neutron-l3-agent [p_neutron-l3-agent]
Started: [ node-1.domain.tld node-2.domain.tld node-5.domain.tld ]
Clone Set: clone_p_ntp [p_ntp]
Started: [ node-1.domain.tld node-2.domain.tld node-5.domain.tld ]
Clone Set: clone_ping_vip__public [ping_vip__public]
Started: [ node-1.domain.tld node-2.domain.tld node-5.domain.tld ]

You may notice that, in this example, only the heat-engine service is managed by Pacemaker and disabled in the OS. On
any controller node, use the following commands to start or stop it cluster-wide:

pcs resource enable clone_p_openstack-heat-engine
pcs resource disable clone_p_openstack-heat-engine

or with crm tool:

crm resource start clone_p_openstack-heat-engine
crm resource stop clone_p_openstack-heat-engine

2. Start, stop, restart order for OpenStack services.

• Start/stop/restart keystone on every Controller.


• Start/stop/restart neutron-server and agents on every Controller (if installed).

Note
Use pcs or crm tools for corresponding services, when managed by Pacemaker

• Start/stop/restart the remaining OpenStack services on each Controller and Storage node, in any order.

Note
Use pcs or crm tools for corresponding services, when managed by Pacemaker

• Start/stop/restart the OpenStack services on the Compute nodes, in any order.


3. Unmanage and manage services controlled by Pacemaker.
To put a resource into the unmanaged state, use the following commands:

pcs resource unmanage <some_resource_name>

or with crm tool

crm resource unmanage <some_resource_name>

This does not stop the running resources.
To bring the resource back under Pacemaker management:

pcs resource manage <some_resource_name>

or with crm tool

crm resource manage <some_resource_name>


Adding, Redeploying, and Replacing Nodes


This section discusses how to add, redeploy, and replace nodes in your OpenStack environment. Compute and
Storage nodes have always been expandable; Controller nodes could not be added or replaced before Mirantis
OpenStack 5.1.

Redeploy a Non-Controller Node


Redeploying a node refers to the process of changing the roles that are assigned to it. For example, you may have
several nodes that run both Compute and Storage roles and you want to redeploy some of those nodes to be only
Storage nodes while others become only Compute nodes. Or you might want to redeploy some Compute and
Storage nodes to be MongoDB nodes.
To redeploy a non-controller node, follow these steps:

1. Use live migration to move instances from the Compute nodes you are going to redeploy.
2. If appropriate, back up or copy information from the Operating System nodes being redeployed.
3. In the Fuel web UI, select the node(s) to be removed from the environment, then click the Delete Nodes button.
4. Click on the "Deploy Changes" button on that screen.
5. Wait for the node to become available as an unallocated node.
6. Use the same Fuel screen to assign an appropriate role to each node being redeployed.
7. Click on the "Deploy Changes" button.
8. Wait for the environment to be deployed.
After redeploying an Operating System node, you will have to manually apply any configuration changes you
made and reinstall the software that was running on the node or restore the system from the backup you made
before redeploying the node.

Add a Non-Controller Node


Non-controller nodes can be added to your OpenStack environment.
To add a non-controller node to your environment, follow these steps:

1. Physically configure the node into your hardware environment.


2. Wait for the new node to show up as an "Unallocated Node" on your Fuel dashboard.
3. Click Add Node.
4. Assign a role or roles to the node that you want.
5. Click Deploy Changes and wait for the node to deploy.
The cluster must be redeployed to update the configuration files. Most of the services that are running are
not affected but the redeployment process restarts HAProxy and a few other services.

Add a MongoDB node


Additional MongoDB roles can be added to an existing deployment by using shell commands. Any number of
MongoDB roles (or standalone nodes) can be deployed into an OpenStack environment using the Fuel Web UI
during the initial deployment but you cannot use the Fuel Web UI to add MongoDB nodes to an existing
environment.
Fuel installs MongoDB as a backend for Ceilometer. Ideally, you should configure one MongoDB node for each
Controller node in the environment so, if you add Controller nodes, you should also add MongoDB nodes.
To add one or more MongoDB nodes to the environment:

1. Add an entry for each new MongoDB node to the connection parameter in the ceilometer.conf file on
each Controller node. This entry must specify the new node's IP address on the Management logical
network (an illustrative connection string is shown after this procedure).
2. Open the astute.yaml file on any deployed MongoDB node and determine which node has the
primary-mongo role. Write down the value of the fqdn parameter that you will use to connect to this
node.
For more information, see: MongoDB nodes configuration
<http://docs.openstack.org/developer/fuel-docs/userdocs/fuel-user-guide/file-ref/astute-yaml-target.html> in the
Fuel User Guide
3. Retrieve the db_password value from the Ceilometer configuration section in the astute.yaml file. You
will use this password to access the primary MongoDB node.
For more information, see: Ceilometer configuration
<http://docs.openstack.org/developer/fuel-docs/userdocs/fuel-user-guide/file-ref/astute-yaml-target.html> in the
Fuel User Guide.
4. Connect to the MongoDB node that has the primary-mongo role and log into Mongo:

ssh ... <fqdn-of-primary-mongo-node>


mongo -u admin -p <db_password> admin

5. Configure each MongoDB node to be added to the environment:

ceilometer:PRIMARY> rs.add ("<management-ip-address-of-node>")

6. Restart the ceilometer services.
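A minimal sketch of what the connection entry in ceilometer.conf might look like with three MongoDB nodes; the IP addresses, port, and user name are placeholders, and the real password should come from astute.yaml as described above:

[database]
connection = mongodb://ceilometer:<db_password>@192.168.0.3:27017,192.168.0.4:27017,192.168.0.5:27017/ceilometer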

Add a controller node


You can add new controller nodes to your OpenStack environment without redeploying the entire environment.
However, compute and Cinder storage nodes, if any, do not automatically apply the configuration changes after
addition of a new controller node. Therefore, you must redeploy compute and Cinder storage nodes.
You may want to add a new controller node to your environment if:

• OpenStack environment has only one controller node. To make the environment highly-available, add at
least two additional controller nodes.
• The resources of your existing controller nodes are being exhausted and you want to supplement them.
• A controller node has failed and requires replacement. In this case, you must first remove the failed
controller node as described in Remove a Controller node.


Note
Each OpenStack environment must include an odd number of the controller nodes. For example, you can
deploy one, three, or five controller nodes, and so on.

To add a controller node:

1. Configure the server hardware.


2. Wait for the server to be discovered and reflected in the Unallocated Node count in the Fuel web UI.
3. Assign the Controller role to each node.
4. Verify the connectivity between the nodes.
5. Click Deploy changes.
Fuel adds the new controller nodes.
6. Run the post-deployment tests.
7. Redeploy the Cinder and compute nodes.

Remove a Controller node


You may need to remove a Controller node from your environment, usually because you want to replace it with a
different server. This may be because of a catastrophic hardware failure on the node's server or because you want
to replace the server with a more powerful system.
To remove a Controller node:

1. Remove the Controller(s) from the environment by going to the Nodes tab in the Fuel web UI, selecting the node(s)
to be deleted, and clicking the Delete Nodes button.
Puppet removes the controller(s) from the configuration files and retriggers services.
2. Physically remove the controller from the configuration.

Configuring an Operating System node


Fuel provisions the Operating System Role with the supported operating system that was selected for the
environment but Puppet does not deploy other packages on this node. You can find out more about what Fuel
installs on such nodes in How the Operating System Role is provisioned.
You can access an Operating System node using ssh, just as you would access any other node; see Accessing the
shell on the nodes. Some general administrative tasks you may need to perform are:

• Create file systems on partitions you created and populate the fstab file so they will mount automatically.
• Configure additional logical networks you need; Fuel only configures the Admin/PXE network.
• Set up any monitoring facilities you want to use, such as monit and atop; configure syslog to send error
messages to a centralized syslog server.


• Tune kernel resources to optimize performance for the particular applications you plan to run here.
You are pretty much free to install and configure this node any way you like. By default, all the repositories from
the Fuel Master node are configured so you can install packages from these repositories by running the apt-get
install <package-name> command. You can also use scp to copy other software packages to this node and then
install them using apt-get or yum.


How To: Safely remove a Ceph OSD node


Before a Ceph OSD node can be deleted from an environment all data must be moved to other OSDs. Fuel
prevents the deletion of OSDs that still have data on them. The following process will move any data present on
the soon-to-be-deleted node to other OSDs in the cluster.

1. Determine which OSD processes are running on the target node (node-35 in this case):

# ceph osd tree
# id   weight    type name        up/down  reweight
-1     0.4499    root default
-2     0.09998       host node-35
0      0.04999           osd.0    up       1
1      0.04999           osd.1    up       1

From this output we can see that OSDs 0 and 1 are running on this node.
2. Remove the OSDs from the Ceph cluster:

# ceph osd out 0
# ceph osd out 1

This will trigger a rebalance. Placement groups will be moved to other OSDs. Once that has completed we
can finish removing the OSD. This process can take minutes or hours depending on the amount of data to
be rebalanced. While the rebalance is in progress the cluster state will look something like:

# ceph -s
cluster 7fb97281-5014-4a39-91a5-918d525f25a9
health HEALTH_WARN recovery 2/20 objects degraded (10.000%)
monmap e1: 1 mons at {node-33=10.108.2.4:6789/0}, election epoch 1, quorum 0 node-33
osdmap e172: 7 osds: 7 up, 5 in
pgmap v803: 960 pgs, 6 pools, 4012 MB data, 10 objects
10679 MB used, 236 GB / 247 GB avail
2/20 objects degraded (10.000%)
1 active
959 active+clean

Once the rebalance is complete it will look like:

# ceph -s

cluster 7fb97281-5014-4a39-91a5-918d525f25a9
health HEALTH_OK
monmap e1: 1 mons at {node-33=10.108.2.4:6789/0}, election epoch 1, quorum 0 node-33
osdmap e172: 7 osds: 7 up, 5 in
pgmap v804: 960 pgs, 6 pools, 4012 MB data, 10 objects


10679 MB used, 236 GB / 247 GB avail


960 active+clean

When the cluster is in state HEALTH_OK the OSD(s) can be removed from the CRUSH map:

On Ubuntu hosts:

# stop ceph-osd id=0
# ceph osd crush remove osd.0
# ceph auth del osd.0
# ceph osd rm osd.0

After all OSDs have been deleted the host can be removed from the CRUSH map:

# ceph osd crush remove node-35

3. This node can now be deleted from the environment with Fuel.
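Before deleting the node in Fuel, you can confirm that the host no longer appears in the CRUSH map:

# ceph osd tree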
Caveats:

• Ensure that at least as many hosts as the replica count remain in the cluster. If the replica count is 3, there should
always be at least 3 hosts in the cluster.
• Do not let the cluster reach its full ratio. By default, when the cluster reaches 95% utilization, Ceph
prevents clients from writing data to it. See the Ceph documentation for additional details.


How To: Adjust Placement Groups when adding additional Ceph OSD node(s)
When adding additional Ceph OSD nodes to an environment, it may be necessary to adjust the placement
groups (pg_num) and the placement groups for placement (pgp_num) for the pools. The following process can be
used to update these two values for the Ceph cluster. These adjustments may not be necessary unless you add a
significant number of OSDs to an existing cluster. Fuel calculates appropriate values for the initial
deployment but does not adjust them when additional OSDs are added.

1. Determine the current values of pg_num and pgp_num for each pool that will be touched; the pools that
may need to be adjusted are the 'backups', 'images', 'volumes', and 'compute' pools.
First, get the list of pools that are currently configured:

# ceph osd lspools
0 data,1 metadata,2 rbd,3 images,4 volumes,5 compute,

You can query each pool individually to see its current pg_num and pgp_num values:

# ceph osd pool get {pool-name} pg_num
# ceph osd pool get {pool-name} pgp_num

2. Calculate the correct values of pg_num and pgp_num based on the number of OSDs in the
cluster. See the Ceph Placement Groups documentation for additional details on how to properly
calculate these values. The Ceph PGs Per Pool Calculator on Ceph.com can help with this calculation (a worked example follows this procedure).
3. Adjust the pg_num and pgp_num for each of the pools as necessary.

# ceph osd pool set {pool-name} pg_num {pg_num}
# ceph osd pool set {pool-name} pgp_num {pgp_num}

Note that this causes a cluster rebalance, which may have a performance impact on
services consuming Ceph.
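As a rough illustration of the commonly cited guideline (total PGs ≈ OSDs × 100 / replica count, rounded up to the next power of two), a hypothetical cluster with 12 OSDs and a replica count of 3 gives (12 × 100) / 3 = 400, rounded up to 512, so you would run commands like the following (example values only):

# ceph osd pool set volumes pg_num 512
# ceph osd pool set volumes pgp_num 512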
Caveats:

• It is not advisable to set pg_num and pgp_num to large values unless necessary, as this increases the
amount of resources required. See the Choosing the Number of Placement Groups
documentation for additional details.


HowTo: Shut down the whole cluster


To shut down the whole cluster, follow these steps:

1. Stop all virtual machines.


2. Either power off all the nodes at once (by clicking the power-off button or running the poweroff command) or shut
them down in the following order:

• Computes
• Controllers (one by one or all in one)
• Cinder/Ceph/Swift
• Other/Neutron (if separate Neutron node exists)

Starting up the cluster


To start up the cluster, power on all the nodes.
If you have a cluster with Neutron, take the instructions below into consideration.
Cinder or Ceph requires Neutron, and the Neutron node requires the database and the Controllers, so the following sequence is
possible:

1. Power on Ceph/Cinder/Swift/MongoDB/Zabbix nodes.


2. Start all controllers and Neutron (if a separate Neutron node exists) and wait until RabbitMQ, the Neutron agents,
and Galera get synced by Pacemaker. It should take no more than 5 minutes.
3. Ensure that HAProxy is OK. Note that HAProxy is managed by Pacemaker, so use pcs resource
<command> clone_p_haproxy to manage it.
4. Ensure that the RabbitMQ cluster is OK and fix it if there are failed nodes.
5. Ensure that the Galera cluster is OK and fix it if necessary. Note that Galera is managed by Pacemaker, so
use pcs resource to manage it.
6. Run the crm resource restart clone_p_neutron-metadata-agent command.
7. Run the crm resource restart clone_p_neutron-plugin-openvswitch-agent command.
8. Wait several minutes and check the cluster state with the crm status command. All Neutron agents, including L3
and DHCP, should be started.
9. Restart all OpenStack services.
10. Power on the compute nodes.


Creating and Configuring ML2 Drivers for Neutron


Fuel supports the Modular Layer 2 (ML2) mechanism drivers for Neutron. Some network configurations, such as
advanced features provided by the Mellanox adapters, require ML2 mechanism driver configuration.
You can add ML2 configuration data to the quantum_settings section of the node.yaml file (see Using YAML
configuration files); this updates the astute.yaml file:

quantum_settings:
server:
service_plugins:
'neutron.services.l3_router.l3_router_plugin.L3RouterPlugin,
neutron.services.firewall.fwaas_plugin.FirewallPlugin,
neutron.services.metering.metering_plugin.MeteringPlugin'
L2:
provider:'ml2'
mechanism_drivers: 'openvswitch'
type_drivers: "local,flat,l2[:segmentation_type]"
tenant_network_types: "local,flat,l2[:segmentation_type]"
flat_networks: '*'
segmentation-type:'vlan'
tunnel_types: l2[:segmentation_type]
tunnel_id_ranges: l2[:tunnel_id_ranges]
vxlan_group: 'None'
vni_ranges: l2[:tunnel_id_ranges]

Note the following:

• The following values should be set only if L2[enable_tunneling] is true: tunnel_types, tunnel_id_ranges,
vxlan_group, vni_ranges.
• The l2[:item] settings refer to values that are already in the quantum_settings.
• This only shows new items that are related to ml2 configuration. The values shown are the defaults that are
used if no other value is set.
• All Neutron component packages are available to download.


Using YAML configuration files


Fuel uses YAML files to pass configuration attributes to Puppet.
Passing custom attributes is useful when you have Puppet manifests that should be run but are not
supported by Fuel itself.

Warning
Be very careful when modifying the configuration files. A simple typo when editing these files may
severely damage your environment. When you modify the YAML files, you will receive a warning that
some attributes were modified from the outside. Some features may become inaccessible from the UI
after you do this.

To do this:

1. Create an OpenStack environment as described in the Fuel User Guide.


2. Assign roles to nodes, but do not start deployment yet.
3. Log into the Fuel Master node and dump provisioning information using the following Fuel command:

fuel --env 1 provisioning default

where --env 1 points to the specific environment (id=1 in this example).
To dump deployment information, the command is:

fuel --env 1 deployment default

For a full list of configuration types that can be dumped, see the Fuel section.
These commands create directories called provisioning_1 or deployment_1, which include a number of
YAML files that correspond to the roles that are currently assigned to nodes. Each file includes the parameters for the
current role, so you can freely modify and save them.
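For example, a typical edit-and-upload cycle might look like the following (the file name is hypothetical; actual names depend on the node ids and roles in your environment):

fuel --env 1 deployment default
vi deployment_1/compute_4.yaml
fuel --env 1 deployment upload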

Customizing Passwords
A few service passwords are not managed by the Keystone based access control facility. You may want to set your
own passwords in some cases.
You can edit files and modify password values. For example, you can set the MySQL root password in this block:

"mysql": {
"root_password": "mynewpassword"
},


Adding new modules


You can extend Fuel functionality by adding new Puppet modules to the /etc/puppet/modules directory on
the Fuel Master node, or you can edit existing modules to change the deployment
behavior in some way.
As an example, let's add a new module called 'packages' that installs some useful packages from the repository
that is located on the Master node.
The module should have the following structure:

packages/
packages/manifests
packages/manifests/init.pp

init.pp should have this content:

class packages {
  # The list of packages to install is passed in through the Fuel parameters system (see below)
  $tools = $::fuel_settings['tools']
  package { $tools :
    ensure => installed,
  }
}

To implement this module:

1. Copy this module to the /etc/puppet/modules directory on the Master node.


2. Add 'include packages' to the end of the /etc/puppet/manifests/site.pp file to enable this module. Placing new
include statements in the middle of the file may break the deployment process and/or its dependencies.
3. As you can see, there is a list of packages to install that should be passed through the Fuel parameters
system.
Let's add this attribute to the top-level hash of the downloaded file:

"tools": [
  "htop",
  "tmux"
]

Provisioned nodes will have this addition in their parameters, and our 'packages' module will be able to access
these values and install the given list of packages during node deployment.
4. Upload the modified configuration:

fuel --env 1 deployment upload

You can also use the --dir option to set a directory from which to load the parameters.


5. Start the deployment process as usual.


This operation has following effects:

• Parameters that are about to be sent to the orchestrator are replaced completely with the ones you
specified.
• The cluster sets the is_customized flag, which is checked by the UI so you will get a message about
attributes customization.


Docker Containers and Dockerctl


Docker provides user-friendly commands that can be used to deploy LXC (Linux Containers).

• Docker brings LXC to the foreground by wrapping it with user-friendly commands.


• Dockerctl is a simple wrapper for Docker that improves Docker's offline image handling and container
versioning. It adds additional management tools that are useful when managing your Fuel deployment.
See Docker containers and dockerctl for more background information.

Container types
Application Container
An application container is the most common type of container. It usually runs a single process in the
foreground and writes logs to stdout/stderr. Docker traps these logs and saves them automatically in its
database.
Storage Container
A storage container is a minimal container that runs BusyBox and shares one or more directories. It needs to
run only once and then spends most of its existence in the Exited state.

Command reference
Below is a list of commands that are useful when managing LXC containers on the Fuel Master.

Basic usage
Get a list of available commands:

docker help

Get a list of all running containers:

docker ps

Get a list of all containers available:

docker ps -a

Note
The storage containers used for sharing files among application containers are usually in the Exited state.
Exited state means that the container exists, but no processes inside it are running.


Start a new Docker container with the specified commands:

docker run [options] imagename [command]

Example: The command below creates a temporary postgres container that is ephemeral and not tied to any other
containers. This is useful for testing without impacting production containers.

docker run --rm -i -t fuel/postgres /bin/bash

Import a Docker image:

docker load -i <archivefile>

Loads a Docker image in one of the following formats: .tar, .tar.gz, .tar.xz. The .lrz format is not supported.
Save a Docker image to a file:

docker save image > image.tar

Dockerctl
Build and run storage containers, then run application containers:

dockerctl build all

Note
This can take a few minutes, depending on your hardware.

Launch a container from its image with the necessary options. If the container already exists, this command ensures that the
container is in a running state:

dockerctl start <appname> [--attach]

Optionally, the --attach option can be used to monitor the process and view its stdout and stderr.
Display the entire container log for <appname>. Useful for troubleshooting:

dockerctl logs <appname>

Stop or restart a container:


dockerctl stop|restart <appname>

Create a shell or run a command:

dockerctl shell <appname> [command]

Note
The container must be running first in order to use this feature. Additionally, quotes must be escaped if
your command requires them.
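
For example, a couple of illustrative invocations (container names come from dockerctl list; the cobbler sync command is the same one used elsewhere in this guide):

dockerctl shell nailgun
dockerctl shell cobbler cobbler sync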

Stop and destroy a container:

dockerctl destroy <appname>

Note
This is not reversible, so use with caution.

Find container names by running:

dockerctl list

System changes for Docker affecting Fuel 5.0 and later


The Fuel Master base system is modified in 5.0. These changes were made mostly to enable directory sharing
between containers to operate smoothly:

• /etc/astute.yaml moved to /etc/fuel/astute.yaml


• /etc/nailgun/version.yaml moved to /etc/fuel/version.yaml
• Base OS puppet is now run from /etc/puppet/modules/nailgun/examples/host-only.pp
• Postgres DB is now inside a container. You can access it by running "dockerctl shell postgres" or by connecting to
localhost from the base host.
• DNS resolution is now performed inside the cobbler container. Additional custom entries should be added
inside /etc/dnsmasq.d/ inside the cobbler container or via Cobbler itself.


• Starting with Fuel 6.1, Docker containers with host networking are used. This means that dhcrelay is not
used anymore because cobbler/dnsmasq are bound to the host system.
• Application logs are inside /var/log/docker-logs, including astute, nailgun, cobbler, and others.
• Supervisord configuration is located inside /etc/supervisord.d/(CurrentRelease)/
• Containers are automatically restarted by supervisord. If you need to stop a container for any reason, first
run supervisorctl stop <appname>, and then dockerctl stop <appname>

Fuel Master architecture changes for Docker


Starting with Fuel 5.1, in order to enable containerization of the Fuel Master's services, several pieces of the Fuel
Master node design were changed. Most of that change came from Puppet, but below is a list of modifications to
Fuel to enable Docker:

• DNS lookups come from Cobbler container


• App containers launch in order, but not in a synchronous manner. Retries were added to several sections of
deployment in case a dependent service is not yet ready.
• The version.yaml file is extended to include production key with values docker and docker-build.
• Extended Docker's default iptables rules to ensure that traffic visibility is appropriate for each service.


Enable Experimental Features


Experimental features provide useful functionality but may not be mature enough to be appropriate for
environments that require high levels of stability. By default, experimental features are disabled in Fuel.
To enable experimental features, complete the steps described in this section. You should do this before creating
and configuring your environment if you want to have access to experimental features during those phases. You
can also enable experimental features after you upgrade your Fuel Master node.

• Log into your Fuel Master console as root.


• Manually modify the /etc/fuel/version.yaml file to add "experimental" to the "feature_groups" list in the
"VERSION" section. For example:

VERSION:
...
feature_groups:
- mirantis
- experimental

• Restart the Nailgun container with dependencies by running:

dockerctl restart nailgun


dockerctl restart nginx
dockerctl shell cobbler
cobbler sync
exit

For more details about configuring the Nailgun settings see Fuel Developer documentation
<http://docs.openstack.org/developer/fuel-docs/>.
Alternatively, you can build a custom ISO with the experimental features enabled:

make FEATURE_GROUPS=experimental iso


Fuel Access Control


Access to the Fuel Dashboard is controlled in Mirantis OpenStack 5.1 and later. Authentication is under control of
Keystone.
The default username/password can be changed:

• During Fuel installation; see Fuel Installation Guide.


• From the main Fuel UI screen; see Fuel User Guide.
• Using the Fuel CLI; see Fuel User Guide.
If the password for the Fuel Dashboard is changed using the Fuel UI or the Fuel CLI, the new password is stored
only in Keystone; it is not written to any file. If you forget the password, you can change it by using the keystone
command on the Fuel Master Node:

keystone --os-endpoint=http://<your_master_ip>:35357/v2.0 --os-token=<admin_token> password

The default value of <your_master_ip> is 10.20.0.2. The port number of 35357 never changes.
The <admin_token> is stored in the /etc/fuel/astute.yaml file on the Fuel Master node.
To run or disable authentication, modify /etc/nailgun/settings.yaml (AUTHENTICATION_METHOD) in the Nailgun
container.
All endpoints except the agent updates and version endpoint are protected by an authentication token, obtained
from Keystone by logging into Fuel as the admin user with the appropriate password.
Services such as Astute, Cobbler, Postgres, MCollective, and Keystone, which used to be protected with the default
password, are now each protected by a user/password pair that is unique for each Fuel installation.
Beginning with release 6.0, the Nailgun and OSTF service endpoints are added to Keystone, and it is now possible
to use the Keystone service catalog to obtain the URLs of those services instead of hardcoding them.
Fuel Authentication is implemented by a dedicated Keystone instance that Fuel installs in a new docker container
on the Fuel Master.

• Fuel Menu generates passwords for fresh installations; the upgrade script generates passwords when
upgrading. The password is stored in the Keystone database.
• The nailgun and ostf users are created in the services project with admin roles. They are used to authenticate
requests in middleware, rather than requiring that each request by middleware be validated using the
Keystone admin token as was done in Release 5.1.
• Some Nailgun URLs are not protected; they are defined in nailgun/middleware/keystone.py in the public_url
section.
• The authentication token is valid for 24 hours, so it is not necessary to store the username and
password in the browser cache.
• A cron script runs daily in the Keystone container to delete outdated tokens using the keystone-manage
token_flush command. It can be seen using the crontab -l command in the Keystone container.
• Support for storing authentication token in cookies is added in releases 5.1.1 and later; this allows the API
to be tested from the browser.


• The keystonemiddleware python package replaces the deprecated keystoneclient.middleware package; this
is an internal change that makes the implementation more stable. All recent fixes and changes are made to
keystonemiddleware, which was extracted from keystoneclient.middleware in earlier releases.
Beginning with releases 5.1.1 and later, the user must supply a password when upgrading Fuel from an earlier
release. This password can be supplied on the command line when running the installation script or in response
to the prompt (this is the same password that is used to access Fuel UI).


Managing your Ceph cluster


Accessing the Puppet manifest for Ceph
The node parameter defines the names of the Ceph pools to pre-create. By default, volumes and images are
required to set up the OpenStack hooks.

node 'default' {
...
}

The class section configures components for all Ceph-OSD nodes in the environment:

class { 'ceph::deploy':
auth_supported => 'cephx',
osd_journal_size => '2048',
osd_mkfs_type => 'xfs',
}

You can modify the authentication type, the journal size (specified in MB), and the filesystem type to use.

Verify the deployment


Use the ceph -s or ceph health command on one of the Controller or Ceph-OSD nodes to check the current status
of the Ceph cluster.
The output of this command looks like:

root@fuel-ceph-02:~# ceph -s
health HEALTH_OK
monmap e1: 2 mons at {fuel-ceph-01=10.0.0.253:6789/0,fuel-ceph-02=10.0.0.252:6789/0}, el
osdmap e23: 4 osds: 4 up, 4 in
pgmap v275: 448 pgs: 448 active+clean; 9518 bytes data, 141 MB used, 28486 MB / 28627 MB
mdsmap e4: 1/1/1 up {0=fuel-ceph-02.local.try=up:active}

Look for the following two lines:

• monmap: should display the correct number of Ceph-MON processes.


• osdmap: should display the correct number of Ceph-OSD instances (one per node per volume)
ceph -s may return an error similar to the following:

root@fuel-ceph-01:~# ceph -s
health HEALTH_WARN 63 pgs peering; 54 pgs stuck inactive; 208 pgs stuck unclean; recovery 2
...

ceph commands return "missing keyring" error, such as:


2013-08-22 00:06:19.513437 7f79eedea760 -1 monclient(hunting): ERROR: missing keyring, cann


2013-08-22 00:06:19.513466 7f79eedea760 -1 ceph_tool_common_init failed.

To analyze the problem:

• Check the links in /root/ceph*.keyring. There should be one for each admin, osd, and mon role that is
configured. If any are missing, this may be the cause of the error. To correct this, use soft links:

ceph-deploy gatherkeys {monitor host}

Place the downloaded keys in /etc/ceph/. Remove the original files from /root and create symlinks with the
same names in /root, pointing to the actual files in /etc/ceph/.
• Try to run the following command:

ceph-deploy gatherkeys {mon-server-name}

If this does not work, an error may have occurred when initializing the cluster.

• Run the following command to find running ceph processes:

ps axu | grep ceph

If this lists a python process running for ceph-create-keys, it usually indicates that the Ceph-MON processes
are unable to communicate with each other.

• Check the network and firewall for each Ceph-MON. Ceph-MON defaults to port 6789 (see the example check after this list).
• If public_network is defined in the ceph.conf file, mon_host and DNS names must be inside the
public_network or ceph-deploy does not create the Ceph-MON processes.
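
For example, on each Ceph-MON node you can quickly confirm that the monitor is listening on its default port and that the firewall is not filtering it (these are generic Linux checks, not Ceph-specific commands):

# confirm that a process is listening on the Ceph-MON port
ss -tlnp | grep 6789
# look for firewall rules that might affect monitor traffic
iptables -L -n | grep 6789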

Missing OSD instances


For the default configuration, you should have one Ceph-OSD instance per volume for each Ceph-OSD node
listed in the configuration. If one or more of these is missing, it may indicate a problem initializing and mounting
the disks. Common causes:

• The disk or volume is in use.


• The disk partition did not refresh in the kernel.
• A customized ceph.conf may contain errors in the mount parameters string; ceph-deploy fails to mount such
partitions.
Check the osd tree:

#ceph osd tree


# id weight type name up/down reweight


-1 6 root default
-2 2 host controller-1
0 1 osd.0 up 1
3 1 osd.3 up 1
-3 2 host controller-2
1 1 osd.1 up 1
4 1 osd.4 up 1
-4 2 host controller-3
2 1 osd.2 up 1
5 1 osd.5 up 1

Ceph pools
To see which Ceph pools have been created, use the ceph osd lspools command:

# ceph osd lspools


0 data,1 metadata,2 rbd,3 images,4 volumes,

By default, two pools -- images and volumes -- are created. In this case, we also have the data, metadata, and rbd pools.

Test that Ceph works with OpenStack components


Glance
Upload an image to Glance to see if it is saved in Ceph:

source ~/openrc
glance image-create --name cirros --container-format bare \
--disk-format qcow2 --is-public yes --location \
https://launchpad.net/cirros/trunk/0.3.0/+download/cirros-0.3.0-x86_64-disk.img

This should return something similar to the following:

+------------------+--------------------------------------+
| Property | Value |
+------------------+--------------------------------------+
| checksum | None |
| container_format | bare |
| created_at | 2013-08-22T19:54:28 |
| deleted | False |
| deleted_at | None |
| disk_format | qcow2 |
| id | f52fb13e-29cf-4a2f-8ccf-a170954907b8 |
| is_public | True |
| min_disk | 0 |
| min_ram | 0 |
| name | cirros |
| owner | baa3187b7df94d9ea5a8a14008fa62f5 |
| protected | False |
| size | 0 |
| status | active |
| updated_at | 2013-08-22T19:54:30 |
+------------------+--------------------------------------+

Then check rbd:

rbd ls images
rados -p images df

Cinder
Create a small volume and see if it is saved in Cinder:

source openrc
cinder create 1

This will instruct Cinder to create a 1 GB volume. This should return something similar to the following:

+=====================+======================================+
| Property | Value |
+=====================+======================================+
| attachments | [] |
+---------------------+--------------------------------------+
| availability_zone | nova |
+---------------------+--------------------------------------+
| bootable | false |
+---------------------+--------------------------------------+
| created_at | 2013-08-30T00:01:39.011655 |
+---------------------+--------------------------------------+
| display_description | None |
+---------------------+--------------------------------------+
| display_name | None |
+---------------------+--------------------------------------+
| id | 78bf2750-e99c-4c52-b5ca-09764af367b5 |
+---------------------+--------------------------------------+
| metadata | {} |
+---------------------+--------------------------------------+
| size | 1 |
+---------------------+--------------------------------------+
| snapshot_id | None |
+---------------------+--------------------------------------+
| source_volid | None |
+---------------------+--------------------------------------+
| status | creating |
+---------------------+--------------------------------------+
| volume_type | None |
+---------------------+--------------------------------------+

Check the status of the image using its id with the cinder show <id> command:

cinder show 78bf2750-e99c-4c52-b5ca-09764af367b5

+==============================+======================================+
| Property | Value |
+==============================+======================================+
| attachments | [] |
+------------------------------+--------------------------------------+
| availability_zone | nova |
+------------------------------+--------------------------------------+
| bootable | false |
+------------------------------+--------------------------------------+
| created_at | 2013-08-30T00:01:39.000000 |
+------------------------------+--------------------------------------+
| display_description | None |
+------------------------------+--------------------------------------+
| display_name | None |
+------------------------------+--------------------------------------+
| id | 78bf2750-e99c-4c52-b5ca-09764af367b5 |
+------------------------------+--------------------------------------+
| metadata | {} |
+------------------------------+--------------------------------------+
| os-vol-host-attr:host | controller-19.domain.tld |
+------------------------------+--------------------------------------+
| os-vol-tenant-attr:tenant_id | b11a96140e8e4522b81b0b58db6874b0 |
+------------------------------+--------------------------------------+
| size | 1 |
+------------------------------+--------------------------------------+
| snapshot_id | None |
+------------------------------+--------------------------------------+
| source_volid | None |
+------------------------------+--------------------------------------+
| status | available |
+------------------------------+--------------------------------------+
| volume_type | None |
+------------------------------+--------------------------------------+

If the image shows status available, it was successfully created in Ceph. You can check this with the rbd ls volumes
command.


rbd ls volumes
volume-78bf2750-e99c-4c52-b5ca-09764af367b5

Rados GW
First confirm that the cluster is HEALTH_OK using ceph -s or ceph health detail. If the cluster is not healthy, most of
these tests will not function.

Note
RedHat distros: mod_fastcgi's /etc/httpd/conf.d/fastcgi.conf must have FastCgiWrapper Off or rados calls
will return 500 errors.

Rados GW relies on the radosgw (Debian) or ceph-radosgw (RHEL) service to run and create a socket for the web server's
script service to talk to. If the radosgw service is not running, or does not stay running, you need to inspect it
more closely.
The service script for radosgw might exit 0 without actually starting the service. An easy way to test this is to simply run
service ceph-radosgw restart; if the service script cannot stop the service, it was not running in the first place.
You can also check whether the radosgw service is running with the ps axu | grep radosgw command, but this
might also show the web server's script server processes as well.
Most commands from radosgw-admin will work regardless of whether the radosgw service is running or not.

Swift
Create a new user:

radosgw-admin user create --uid=test --display-name="username" --email="username@domain


{ "user_id": "test",
"display_name": "username",
"email": "username@domain.com",
"suspended": 0,
"max_buckets": 1000,
"auid": 0,
"subusers": [],
"keys": [
{ "user": "test",
"access_key": "CVMC8OX9EMBRE2F5GA8C",
"secret_key": "P3H4Ilv8Lhx0srz8ALO\/7udwkJd6raIz11s71FIV"}],
"swift_keys": [],
"caps": []}

Swift authentication works with subusers. In OpenStack this will be tenant:user, so you need to mimic it:


radosgw-admin subuser create --uid=test --subuser=test:swift --access=full

{ "user_id": "test",
"display_name": "username",
"email": "username@domain.com",
"suspended": 0,
"max_buckets": 1000,
"auid": 0,
"subusers": [
{ "id": "test:swift",
"permissions": "full-control"}],
"keys": [
{ "user": "test",
"access_key": "CVMC8OX9EMBRE2F5GA8C",
"secret_key": "P3H4Ilv8Lhx0srz8ALO\/7udwkJd6raIz11s71FIV"}],
"swift_keys": [],
"caps": []}

Generate a secret key.

Note
--gen-secret is required in Cuttlefish and newer.

radosgw-admin key create --subuser=test:swift --key-type=swift --gen-secret


{ "user_id": "test",
"display_name": "username",
"email": "username@domain.com",
"suspended": 0,
"max_buckets": 1000,
"auid": 0,
"subusers": [
{ "id": "test:swift",
"permissions": "full-control"}],
"keys": [
{ "user": "test",
"access_key": "CVMC8OX9EMBRE2F5GA8C",
"secret_key": "P3H4Ilv8Lhx0srz8ALO\/7udwkJd6raIz11s71FIV"}],
"swift_keys": [
{ "user": "test:swift",
"secret_key": "hLyMvpVNPez7lBqFlLjcefsZnU0qlCezyE2IDRsp"}],
"caps": []}

Sample test commands should look as follows:


swift -A http://localhost:6780/auth/1.0 -U test:swift -K "eRYvzUr6vubg93dMRMk60RWYiGdJGv


swift -A http://localhost:6780/auth/1.0 -U test:swift -K "eRYvzUr6vubg93dMRMk60RWYiGdJGv
swift -A http://localhost:6780/auth/1.0 -U test:swift -K "eRYvzUr6vubg93dMRMk60RWYiGdJGv
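
The commands above are truncated in this guide. A complete set of basic checks, using the standard python-swiftclient CLI against the Rados GW endpoint, might look like the following; replace <swift_secret_key> with the secret_key from the swift_keys section of your own radosgw-admin key create output:

swift -A http://localhost:6780/auth/1.0 -U test:swift -K "<swift_secret_key>" stat
swift -A http://localhost:6780/auth/1.0 -U test:swift -K "<swift_secret_key>" post test-container
swift -A http://localhost:6780/auth/1.0 -U test:swift -K "<swift_secret_key>" list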

Reset the Ceph cluster


You can reset the Ceph cluster if necessary after correcting configuration errors. This is often easier than having
to re-deploy the entire OpenStack environment.
To do this, create a simple shell script with the following contents, then edit the export
all="compute-4 controller-1 controller-2 controller-3" line so that the variable $all contains all nodes that run the
Ceph-MON and Ceph-OSD services you want to re-initialize, as well as all Compute nodes.
You can get the node names with the fuel node list command.

export all="compute-4 controller-1 controller-2 controller-3"

for node in $all
do
    ssh $node 'service ceph -a stop; umount /var/lib/ceph/osd/ceph*'
done
ceph-deploy purgedata $all
ceph-deploy purge $all
yum install -y ceph-deploy
rm ~/ceph*
ceph-deploy install $all

S3 API in Ceph RADOS Gateway


Introduction
Ceph RADOS Gateway offers access to the same objects and containers through many different APIs. The two most
important are OpenStack Object Storage API v1 (aka Swift API) and Amazon S3 (Simple Storage Service). Besides
these, radosgw supports several internal interfaces dedicated to logging, replication, and administration.
Covering them is not the purpose of this document.

Getting started
This section assumes that you have a working Ceph cluster and that radosgw is able to access the cluster.
When the cephx security system is used, which is the default scenario, radosgw and the cluster must authenticate
to each other. Note that this is not related to any user-layer authentication mechanism used in radosgw, such as
Keystone, TempURL, or TempAuth. If radosgw is deployed with Fuel, cephx should work out of the box. In case of
manual deployment, refer to the official Ceph documentation.
To enable or just verify whether S3 has been properly configured, the configuration file used by radosgw (usually
/etc/ceph/ceph.conf) should be inspected. Please take a look at radosgw's section (usually client.radosgw.gateway)
and consider the following options:

• rgw_enable_apis - if present, it must contain at least s3


User authentication
The component providing S3 API implementation inside radosgw actually supports two methods of user
authentication: Keystone-based and RADOS-based (internal). Each of them may be separately enabled or disabled
with an appropriate configuration option. The first one takes precedence over the second. That is, if both
methods are enabled and Keystone authentication fails for any reason (wrong credentials, connectivity problems
etc.), the RADOS-based method is used as a fallback.

Keystone-based
Configuration
Keystone authentication for S3 is not enabled by default, even when using Fuel. Take a look at the appropriate
section (usually client.radosgw.gateway) in the radosgw configuration file and consider the following options:

rgw_s3_auth_use_keystone (default: false; present in the Fuel-deployed configuration: no)
    Must be present and set to true.
rgw_keystone_url (default: empty; present in the Fuel-deployed configuration: yes)
    Must be present and set to point to the admin interface of Keystone
    (usually port 35357; some versions of Fuel wrongly set the port to 5000).
rgw_keystone_admin_token (default: empty; present in the Fuel-deployed configuration: yes)
    Must be present and match the admin token set in the Keystone configuration.
rgw_keystone_accepted_roles (default: Member, admin; present in the Fuel-deployed configuration: yes)
    Should correspond to the Keystone schema; the Fuel setting is OK.

When Keystone-based authentication is used, user management is fully delegated to Keystone. You may use the
keystone CLI to do that. Be aware that the EC2/S3 <AccessKeyId>:<secret> credentials pair does not
map well into Keystone (there is no direct way to specify a tenant), so a special compatibility layer has been
introduced. In practice, this means you need to tell Keystone about the mapping parameters manually. For example:

keystone ec2-credentials-create --tenant-id=68a23e70b5854263ab64f2ddc16c2a38 --user-id=2ccd

WARNING: Bypassing authentication using a token & endpoint (authentication credentials are

+-----------+----------------------------------+
| Property | Value |
+-----------+----------------------------------+
| access | 3862b51ecc6a43a78ffca23a05e7c0ad |
| secret | a0b4cb375d5a409893b05e36812fb811 |
| tenant_id | 68a23e70b5854263ab64f2ddc16c2a38 |
| trust_id | |
| user_id | 2ccdd07ae153484296d308eab10c85dd |
+-----------+----------------------------------+

access and secret are the parameters needed to authenticate a client to S3 API.


Performance Impact
Please be aware that Keystone's PKI tokens are not available together with S3 API. Moreover, radosgw doesn't
cache Keystone responses while using S3 API. This could lead to authorization service overload.

RADOS-based (internal)
Configuration
The RADOS-based authentication mechanism should work out of the box. It is enabled by default in radosgw and
Fuel does not change this setting. However, if you need to disable it, set the rgw_s3_auth_use_rados option to false.
User management can be performed with the command-line utility radosgw-admin provided with Ceph. For
example, to create a new user, execute the following command:

radosgw-admin user create --uid=ant --display-name="aterekhin"


{ "user_id": "ant",
"display_name": "aterekhin",
"email": "",
"suspended": 0,
"max_buckets": 1000,
"auid": 0,
"subusers": [],
"keys": [
{ "user": "ant",
"access_key": "9TEP7FTSYTZF2HZD284A",
"secret_key": "8uNAjUZ+u0CcpbJsQBgpoVgHkm+PU8e3cXvyMclY"}],
"swift_keys": [],
"caps": [],
"op_mask": "read, write, delete",
"default_placement": "",
"placement_tags": [],
"bucket_quota": { "enabled": false,
"max_size_kb": -1,
"max_objects": -1},
"user_quota": { "enabled": false,
"max_size_kb": -1,
"max_objects": -1},
"temp_url_keys": []}

access_key and secret_key are the parameters needed to authenticate a client to S3 API.
Verification
To check whether everything works correctly, a low-level S3 API client may be very useful, especially if it can
assist with authentication signature generation. The S3 authentication model requires that the client
provide a key identifier (AccessKeyId) and an HMAC-based authentication signature, which is calculated against a
user key (secret) and some HTTP headers present in the request. The well-known solution is the s3curl application.
However, unpatched versions contain severe bugs (see LP1446704). We have already fixed them and sent a pull
request to the author; until it is merged, we recommend trying this version of s3curl.


Step-by-step instruction

1. Install the libdigest-hmac-perl package


2. Download the S3 API client using the following link:

git clone https://github.com/rzarzynski/s3curl

3. Set permissions for s3curl.pl:

chmod u+x s3curl.pl

4. Create .s3curl file in your home directory. This file should contain your AccessKeyId and SecretAccessKey
pairs.

%awsSecretAccessKeys = (
# your account
ant => {
id => '9TEP7FTSYTZF2HZD284A',
key => '8uNAjUZ+u0CcpbJsQBgpoVgHkm+PU8e3cXvyMclY',
},
);

5. Set the S3 endpoint in s3curl.pl file, for example:

my @endpoints = ('172.16.0.2');

or use s3curl.pl script to add it:

./s3curl.pl --id ant --endpoint <s3-endpoint-host>

Example:

./s3curl.pl --id ant --endpoint 172.16.0.2

Note
You can get your S3 endpoint using the keystone CLI command as follows:

keystone endpoint-get --service 's3'


+--------------+------------------------+
| Property | Value |
+--------------+------------------------+
| s3.publicURL | http://172.16.0.2:8080 |
+--------------+------------------------+

6. Try to run the s3curl command to test S3 API, for example:

• To get an object

./s3curl.pl --id <friendly name> -- <endpoint>/<bucket name>/<key name>

Example:

./s3curl.pl --id ant -- http://172.16.0.2:8080/bucket/key

• Upload a file

./s3curl.pl --id <friendly name> --put <path to file> -- <endpoint>/<bucket name>/

Example:

./s3curl.pl --id ant --put file -- http://172.16.0.2:8080/bucket/key

Note
Known issues: LP1477457 in Fuel 6.1.


Migrate workloads from a compute node for maintenance


To ensure application uptime during an operating system kernel upgrade or hardware replacement on a compute
node under maintenance, temporarily migrate virtual machine instances to an operational compute node. After
completing the maintenance procedures, re-enable virtual machine scheduling.
This section includes the following topics:

• Disable virtual machine scheduling


• Migrate instances
• Monitor the migration process
• Restore a compute node after maintenance

Disable virtual machine scheduling


Before starting maintenance procedures, you must cancel virtual machine creation on the compute nodes that
you need to maintain. To stop virtual machine scheduling, disable the nova-compute service on the
corresponding compute node.
To disable virtual machine scheduling:

1. View the list of all nova-compute services:

nova service-list

Example of system response:

+---+-----------------+-----+--------+-------+-------+----------+--------+
| Id| Binary |Host | Zone |Status | State |Updated_at|Disabled|
| | | | | | | |Reason |
+---+-----------------+-----+--------+-------+-------+----------+--------+
| 1 | nova-conductor |node1|internal|enabled| up |2015-11-16| - |
| 3 | nova-cert |node1|internal|enabled| up |2015-11-16| - |
| 4 | nova-network |node1|internal|enabled| up |2015-11-16| - |
| 5 | nova-scheduler |node1|internal|enabled| up |2015-11-16| - |
| 6 | nova-consoleauth|node1|internal|enabled| up |2015-11-16| - |
| 7 | nova-compute |node1|nova |enabled| up |2015-11-16| - |
| 8 | nova-network |node2|internal|enabled| up |2015-11-16| - |
| 9 | nova-compute |node2|nova |enabled| up |2015-11-16| |
+---+-----------------+-----+--------+-------+-------+----------+--------+

2. Disable the corresponding nova-compute service:

nova service-disable [--reason <reason>] <hostname> <binary>

Example:


nova service-disable node2 nova-compute

This command puts the compute node node2 into maintenance mode and prevents the
nova-compute service from booting instances on this compute node.
System response:

+-------+--------------+----------+
| Host | Binary | Status |
+-------+--------------+----------+
| node2 | nova-compute | disabled |
+-------+--------------+----------+

3. Proceed to Migrate instances.

Migrate instances
After you disable virtual machine scheduling as described in Disable virtual machine scheduling, you must migrate
the virtual machines instances.
To migrate instances:

1. Select from the following options:

• Manual live migration:

• If the compute node shares storage with other hosts, type:

nova live-migration <instance>

Example:

nova live-migration instance_1

• If the compute node does not have shared storage, type:

nova live-migration --block-migrate <instance_name> <host>

Example:

nova live-migration --block-migrate instance_1 node3

• Host live migration:

• Migrate all instances from the node under maintenance to other available compute nodes:


nova host-evacuate-live [--target-host <target_host>] \


[--block-migrate] [--disk-over-commit] [--max-servers <max_servers>]\
<host>

Example:
Host live migration with shared storage:

nova host-evacuate-live node2

Example of system response:

+--------------------------+-------------------------+--------------+
| Server UUID | Live Migration Accepted |Error Message |
+--------------------------+-------------------------+--------------+
| 4c379e02-f474-4fc3-929d- | True | |
| f917b5d3bca0 | | |
+--------------------------+-------------------------+--------------+

Example:
Host live evacuation without shared storage:

nova host-evacuate-live --block-migrate node2

• Cold migration:

• If you want to move specific instances:

1. Stop the required instance:

nova stop <instance>

2. Migrate the instance:

nova migrate <instance>

3. Confirm migration:

nova resize-confirm <instance>

4. Repeat the previous step for all migrated instances.


5. Start the instance:

nova start <instance>


• If you want to move all instances, type:

nova host-servers-migrate <host>

Example:

nova host-servers-migrate node2

Example of system response:

+-------------------------------+--------------------+---------------+
| Server UUID | Migration Accepted | Error Message |
+-------------------------------+--------------------+---------------+
| 4c379e02-f474-4fc3-929d-f917b | True | |
| 86b5e13a-35c4-434e-aaac-5941f | True | |
+-------------------------------+--------------------+---------------+
2. After migration completes, start the maintenance procedure. Before you begin, you can verify that no instances remain on the node, as shown in the example after this list.
3. When you complete maintenance, proceed to Restore a compute node after maintenance.
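
For example, the following check (node2 is the node used in the examples above) lists any instances still running on the node; an empty list means the migration has fully completed:

nova list --all-tenants --host node2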

Monitor the migration process


After you start the migration of your virtual machines, you can monitor the migration process. For monitoring, use
the nova instance-action command.
To monitor the migration process:

1. View the list of actions:

nova instance-action-list <instance>

Example of system response:

+----------------+-------------------+---------+------------------------+
| Action | Request_ID | Message | Start_Time |
+----------------+-------------------+---------+------------------------+
| create | req-16c835a0-ce12-| - | 2015-10-21T13:16:37.000|
| live-migration | req-fe6f0edc-e0ef | - | 2015-10-21T13:22:59.000|
+----------------+-------------------+---------+------------------------+

2. View detailed information about an action by specifying a Request_ID:

nova instance-action <instance> <request_id>

Example:


nova instance-action vm1 req-7c1e7b16-3a0b-4083-b740-53997211e441

Example of system response:

+-------------------+------------------------------------------+
| Property | Value |
+-------------------+------------------------------------------+
| action | live-migration |
| events | [] |
| instance_uuid | 6e891799-2180-4b22-99a3-565143a001ea |
| message | - |
| project_id | 7909fc66b9254c38955c5a1bcc75f918 |
| request_id | req-7c1e7b16-3a0b-4083-b740-53997211e441 |
| start_time | 2015-11-10T17:04:11.000000 |
| user_id | b9357b628a774a9eb79ae40aedf9add6 |
+-------------------+------------------------------------------+

Restore a compute node after maintenance


After completing maintenance, restore the compute node operation.
To restore a compute node after maintenance:

1. Log in to a controller node.


2. Enable the nova-compute service:

nova service-enable <node_name> nova-compute

Example:

nova service-enable node2 nova-compute

System response:

+-------+--------------+----------+
| Host | Binary | Status |
+-------+--------------+----------+
| node2 | nova-compute | enabled |
+-------+--------------+----------+
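
To confirm that the node accepts new instances again, you can re-check the service status; node2 is the node from the example above:

nova service-list | grep node2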


Maintenance Mode
During maintenance mode, an operating system runs only a critical set of working services needed for basic
network and disk operations. When in maintenance mode, a node is still reachable using SSH.
You can put your system in Maintenance Mode for system repair or other service operations.
Typically, Mirantis OpenStack maintenance updates do not cause service downtime. However, when you replace
hardware or update kernel packages, your virtual machines become unavailable to users. Therefore, if you run
critical applications that cannot tolerate downtime, you must move your virtual machines from the compute node
under maintenance to an operational compute node as described in Migrate workloads from a compute node for
maintenance before enabling maintenance mode.

Overview
Maintenance mode is enforced by using the umm parameter in one of the following ways:

• by selecting the respective option in the boot menu;


• by forcing the reboot into maintenance mode from shell with the umm on command;
• automatically, by reaching the number of unclean reboots specified in the REBOOT_COUNT parameter.
An unclean reboot means that the system reboots unexpectedly without a direct call from the user.
You can also disable maintenance mode functionality if you do not need it (for example, you do not want to be
automatically booted into it every time).
You can operate in maintenance mode through ssh or tty2.
A return back to normal mode is issued with the umm off command.

Note
If you manually start a service in the maintenance mode, it will not be automatically restarted when you
put the system back in the normal mode with the umm off command.

Using the umm command


There are several parameters to use with the umm command:

• umm on [cmd] - enter the maintenance mode, and execute cmd when MM is reached;
• umm status - check the mode status. There are four possible statuses:

• runlevel - the system is in the normal mode.


• reboot - the system is starting to enter the maintenance mode.
• umm - the system is in the maintenance mode.
• disabled - the maintenance mode functionality is disabled.


• umm off [reboot] - resume boot or reboot into normal mode.


• umm enable - enable the maintenance mode functionality.
• umm disable - disable the maintenance mode functionality.

Configuring the UMM.conf file


You can automate the maintenance mode start by editing the /etc/umm.conf file.
The configuration options are:

• UMM=yes
• REBOOT_COUNT=2
• COUNTER_RESET_TIME=10
where:
UMM
tells the system to go into the maintenance mode based on the REBOOT_COUNT and
COUNTER_RESET_TIME values. If the value is anything other than yes (or if the UMM.conf file is missing),
the system will go into the native Ubuntu recovery mode.
REBOOT_COUNT
determines the number of unclean reboots that trigger the system to go into the maintenance mode.
COUNTER_RESET_TIME
determines the period of time (in minutes) before the unclean reboot counter is reset.
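
Putting these together, a minimal /etc/umm.conf with the default values described above looks like this:

UMM=yes
REBOOT_COUNT=2
COUNTER_RESET_TIME=10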

Example of using MM on one node


• Switching a node into MM:

root@node-1:~# umm on
umm-gr start/running, process 6657

Broadcast message from root@node-1
(/dev/pts/0) at 14:29 ...

The system is going down for reboot NOW!
root@node-1:~# umm status
rebooting
root@node-1:~# Connection to node-1 closed by remote host.
Connection node-1:~# closed.
root@fuel:~#:~$

root@node-1:~# ssh

root@node-1:~# umm status
umm
root@node-1:~# ps -Af

We can see only a small set of working processes.


• Start the service:

root@node-1:~# /etc/init.d/apache2 start

root@node-1:~# /etc/init.d/apache2 status
Apache2 is running (pid 1907).

• Switch back to the working mode:

root@node-1:~# umm off

• Continue booting into working mode:

root@node-1:~# umm status
runlevel N 2
root@node-1:~# /etc/init.d/apache2 status
Apache2 is running (pid 1907).

We can see that the service was not restarted when switching from MM to the working mode.
• Check the state of the OpenStack services:

root@node-1:~# crm status

• If you want to reach working mode by reboot, you should use the following command:

root@node-1:~# umm off reboot
umm-gr start/running, process 2825

Broadcast message from root@node-1
(/dev/pts/0) at 11:23 ...

The system is going down for reboot NOW!
root@node-1:~# Connection to node-1 closed by remote host.
Connection to node-1 closed.
[root@fuel ~]#

Example of putting all nodes into the maintenance mode at the same time
The following maintenance mode sequence is called Last In, First Out. This guarantees that the Cloud Infrastructure
Controller (CIC) that comes back first has the most recent data.

• Determine which nodes have Controller (CIC) role:


[root@fuel ~]# fuel nodes
id | status | name             | cluster | ip        | mac               | roles
---|--------|------------------|---------|-----------|-------------------|-----------
2  | ready  | Untitled (c0:02) | 1       | 10.20.0.4 | e6:6a:42:96:a4:45 | controller
4  | ready  | Untitled (c0:04) | 1       | 10.20.0.6 | 66:10:2e:0c:12:4a | compute
1  | ready  | Untitled (c0:01) | 1       | 10.20.0.3 | fa:a1:39:94:7f:4c | controller
3  | ready  | Untitled (c0:03) | 1       | 10.20.0.5 | 82:cb:bb:50:40:47 | controller

• Copy id_rsa to the CICs for passwordless ssh authentication:

[root@fuel ~]# scp .ssh/id_rsa node-1:.ssh/id_rsa
Warning: Permanently added 'node-1' (RSA) to the list of known hosts.
id_rsa 100% 1675 1.6KB/s 00:00
[root@fuel ~]# scp .ssh/id_rsa node-2:.ssh/id_rsa
Warning: Permanently added 'node-2' (RSA) to the list of known hosts.
id_rsa 100% 1675 1.6KB/s 00:00
[root@fuel ~]# scp .ssh/id_rsa node-3:.ssh/id_rsa
Warning: Permanently added 'node-3' (RSA) to the list of known hosts.
id_rsa 100% 1675 1.6KB/s 00:00

• Enforce switching into MM mode on all nodes:

[root@fuel ~]# ssh node-1 umm on; ssh node-2 umm on; ssh node-3 umm on
Warning: Permanently added 'node-1' (RSA) to the list of known hosts.
umm-gr start/running, process 24318
Connection to node-1 closed by remote host.
Connection to node-1 closed.
[root@fuel ~]#
[root@fuel ~]# ssh -tt node-1 ssh -tt node-2 ssh -tt node-3 sleep 1
Warning: Permanently added 'node-1' (RSA) to the list of known hosts.
ECDSA key fingerprint is 84:17:0d:ea:27:1f:4e:08:f7:54:b2:8c:fe:8a:13:1a.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node-2,10.20.0.4' (ECDSA)
to the list of known hosts.
ECDSA key fingerprint is
c3:c6:ca:7d:11:d3:53:01:15:64:20:f7:c7:44:fb:d1.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node-3,192.168.0.6' (ECDSA)
to the list of known hosts.
Connection to node-3 closed.
Connection to node-2 closed.
Connection to node-1 closed.
[root@fuel ~]#

• Wait until the last node reboots:

[root@fuel ~]# ssh node-3
Warning: Permanently added 'node-3' (RSA) to the list of known hosts.
Welcome to Ubuntu 12.04.4 LTS (GNU/Linux 3.13.0-32-generic x86_64)

 * Documentation: https://help.ubuntu.com/
Last login: Tue Dec 23 05:55:47 2014 from 10.20.0.2
root@node-3:~#
Broadcast message from root@node-3
(unknown) at 6:00 ...
The system is going down for reboot NOW!
Connection to node-3 closed by remote host.
Connection to node-3 closed.
[root@fuel ~]#

• Perform all the steps planned for MM.


• Enforce a return to normal mode in reverse order:

[root@fuel ~]# ssh node-3 umm off
Warning: Permanently added 'node-3' (RSA) to the list of known hosts.
[root@fuel ~]# ssh node-2 umm off
Warning: Permanently added 'node-2' (RSA) to the list of known hosts.
[root@fuel ~]# ssh node-1 umm off
Warning: Permanently added 'node-1' (RSA) to the list of known hosts.


Running vCenter
After the OpenStack environment that is integrated with the vCenter server is deployed, you can manage the
VMware cluster using the Horizon dashboard and/or the Nova CLI:

• Log into the Horizon dashboard.


• Open the Hypervisor tab and select the VMware vCenter Server.
• To boot a new VM in vCenter:

• Open the "Manage Compute" tab and go to the Instances page.


• Press the "Launch instance" button and specify the VM parameters in the pop-up window. This
launches a new VM which can be seen in a vSphere web client.
• Use the Horizon UI or the Nova CLI to stop or delete booted VMs in the vCenter.
• Use the Nova CLI to see the VMware cluster resources or to boot a new VM in vCenter.

Nova-compute and vSphere clusters mapping


In earlier Fuel releases, a 1-N mapping between the nova-compute service and vSphere clusters (clusters formed
from ESXi hosts by the vCenter server) was used: in most cases, a single nova-compute service instance used many
vSphere clusters managed by a single vCenter. Beginning with the Fuel 6.1 release, this behaviour was changed to a
1-1 mapping, so that a single nova-compute service instance now interacts with a single vSphere cluster.


Performance Notes
Keystone Token Cleanup
The Keystone service creates new tokens in the Keystone database each time an external query is run against
OpenStack, but it does not automatically clean up expired tokens because they may be required for forensics work,
such as that required after a security breach. However, the accumulation of expired tokens in the database
can seriously degrade the performance of the entire OpenStack environment.
Beginning with version 5.0, Mirantis OpenStack includes the pt-archiver command from the Percona Toolkit. We
recommend using pt-archiver to set up a cleanup job that runs periodically; the cleanup-keystone-tokens.sh
script from TripleO is a good example:

pt-archiver --source h=$DB_HOST,u=$DB_USER,p=$DB_PASS,D=$DB_NAME,t=token \


--charset utf8 \
--where "expires < UTC_TIMESTAMP()" \
--purge \
--txn-size 500 \
--run-time 59m \
--statistics \
--primary-key-only

It is better to use pt-archiver instead of deleting the expired tokens using standard database manipulation
commands because it prevents the Keystone database from being blocked for significant time periods while the
rows with expired tokens are deleted.
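
For example, the cleanup can be scheduled via cron; the wrapper script path, log file, and schedule below are illustrative and should be adapted to your environment:

# /etc/cron.d/keystone-token-cleanup -- run the pt-archiver wrapper every 6 hours
0 */6 * * * root /usr/local/bin/cleanup-keystone-tokens.sh >> /var/log/keystone-token-cleanup.log 2>&1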


HowTo: Backup and restore Fuel Master


Because the Fuel Master itself is not highly available, it is strongly recommended to create a
backup after each deployment. This allows for recovery in the event of data corruption or hardware loss. The
backup process requires the operator to place the backup in a remote location so that it survives a complete
system failure. Note that this backup is only for the Fuel Master itself and not for OpenStack deployments. In
order to be able to manage existing OpenStack environments deployed by Fuel, it is necessary to restore a
backed up Fuel Master if a reinstall was performed.
You can back up your Fuel Master without downtime. This means that the Fuel API, Fuel UI, DHCP, DNS, and NTP
services continue to operate during the backup process, with no impact on any deployed or bootstrapped
nodes.
In order to back up the Fuel Master, you need to meet these requirements:

• No deployment tasks are currently running


• You have at least 11GB free disk space
The backup contains the following items:

• All docker containers (including Fuel DB)


• PXE deployment configuration
• All OpenStack environment configurations
• Package repositories
• Deployment SSH keys
• Puppet manifests
Items not backed up include logs and host network configuration. If preserving log data is important, back up the
/var/log directory separately. This could be done by using scp to transfer /var/log to another host. Network
configuration needs to be done manually via Fuel Setup if you reinstall your Fuel Master before restoring it.

Running the backup


To start a backup, run dockerctl backup. Optionally, you can specify a path for backup. The default path is
/var/backup/fuel. This process takes approximately 30 minutes and is dependent on the performance of your
hardware. After the backup is done, you may want to copy the backup to a separate storage medium.
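
For example (the remote host name and destination directory are illustrative):

dockerctl backup /var/backup/fuel
# copy the backup off the Fuel Master so that it survives a complete system failure
scp -r /var/backup/fuel backup-host:/srv/fuel-backups/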

Note
If you make further changes to your environment after a backup, you should make a new backup.

Restoring Fuel Master


The restore is quite similar to the backup process. This process can be run any time after installing a Fuel Master
node. Before starting a restore operation, ensure the following:


• The Fuel version is the same release as the backup


• There are no deployments running
• At least 11GB free space in /var
If you reinstall your Fuel Master host, you need to configure your network settings via Fuel Setup the same way they
were configured originally. It is particularly important that the configuration for the Admin (PXE) network is the
same as before.
To run the restore, simply run dockerctl restore /path/to/backup.


How slave nodes choose the interface to use for PXE booting
Fuel configures the NIC name/order based on the data seen by the Nailgun agent (/opt/nailgun/bin/agent) on the
discovered nodes. This is in turn the result of how the NICs are named/ordered by the bootstrap node.
The device used by the Admin (PXE) network will be the interface that is directly attached to this network. If one
is not available, it will fall back to the interface with the default gateway.
For example:

Physical device Interface MAC


0 eth0 :FE:A0
1 eth1 :BC:6D
2 eth2 :E1:B2

If physical device 0 is connected to the Admin (PXE) network, then eth0 will be the admin interface in Fuel.
If instead physical device 1 is connected to the Admin (PXE) network, then eth1 will be the admin interface in
Fuel.
A common issue here is that physical device 0 may not always be assigned device eth0. You may see:

Physical device Interface MAC


0 eth2 :FE:A0
1 eth0 :BC:6D
2 eth1 :E1:B2

In this case, having physical device 0 connected to the Admin (PXE) network will result in the eth2 interface
being used as the admin interface in Fuel.
You can confirm that the right interface is in use because the MAC address did not change even though the
device name did.
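
For example, you can list the interfaces and their MAC addresses on the node to confirm which one is attached to the Admin (PXE) network (standard Linux commands, shown here for the eth2 case above):

# show all interfaces together with their link/ether (MAC) addresses
ip link show
# or query a single interface directly
cat /sys/class/net/eth2/address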


Horizon Deployment Notes


Overview
Horizon is the OpenStack web dashboard, which provides a web-based user interface to OpenStack services,
including Nova, Swift, Keystone, and others.


Details of Health Checks


The Health Checks are run from the Fuel console. This section provides background details on the actual checks
these tests perform.

Sanity tests description


Sanity checks work by sending a query to all OpenStack components to get a response back from them. Many of
these tests are simple in that they ask each service for a list of their associated objects and then wait for a
response. The response can be something, nothing, an error, or a timeout, so there are several ways to determine
if a service is up. The following list includes the suite of sanity tests implemented:

• Instance list availability


• Images list availability
• Volume list availability
• Snapshots list availability
• Flavor list availability
• Limits list availability
• Services list availability
• User list availability
• Stack list availability
• Check if all the services execute normally
• Check Internet connectivity from a Compute node
• Check DNS resolution on a Compute node
• Murano environment and service creation, listing and deletion
• Networks availability
• Ceilometer meter, alarm and resource list availability

Functional tests description


Functional tests verify how your system handles basic OpenStack operations under normal circumstances. The
Functional Test series gives you information about the speed of your environment and runs timeout tests.
All tests use the basic OpenStack services (Nova, Glance, Keystone, Cinder, etc.), so if any of these are inactive,
the tests that use them will fail. You should run all sanity checks before running the functional checks to verify that all
services are alive. This helps ensure that you do not get false negatives. The following is a description of each
functional test available:

• Create instance flavor


• Create instance volume
• Launch instance, create snapshot, launch instance from snapshot


• Keypair creation
• Security group creation
• Check networks parameters
• Launch instance
• Assign floating IP
• Check that VM is accessible via floating IP address
• Check network connectivity from instance via floating IP
• Check network connectivity from instance without floating IP
• Launch instance with file injection
• Launch instance and perform live migration
• User creation and authentication in Horizon


Network issues
Fuel has the built-in capability to run a network check before or after an OpenStack environment deployment.
The connectivity check includes tests for connectivity between nodes through configured VLANs on configured
host interfaces. Additionally, the checks for an unexpected DHCP server are performed to verify that outside
DHCP servers will not interfere with a deployment. If the verification does not succeed, the Connectivity Check
screen displays the details of a failure.

HA tests description
HA tests verify the High Availability (HA) architecture. The following is a description of each HA test available:

• Check data replication over MySQL


• Check if the number of tables in the OpenStack databases is the same on each node
• Check Galera environment state
• RabbitMQ availability
• RabbitMQ replication
• Check Pacemaker status

Configuration tests description


Configuration tests verify whether the default user data (for example, the username and password for the OpenStack cluster) have been changed.
The following is a description of each test available:

• Check whether the default root password is still used for SSH access to the Fuel Master node. If the
default password has not been changed, the test fails with a recommendation to change it.


• Check whether the default credentials are still used for the OpenStack cluster. If the default values are used for the admin
user, the test fails with a recommendation to change the username and password of the OpenStack user
with the admin role.

Cloud validation tests description


The following tests for cloud validation are implemented:

• Check disk space outage on the Controller and Compute nodes


• Check log rotation configuration on all nodes


Notes on Corosync and Pacemaker


Keep in mind the following on Corosync and Pacemaker:

• If you restart the Corosync service, you must restart Pacemaker as well.
• Corosync 1.x cannot be upgraded to 2.x without full cluster downtime. All Pacemaker resources, such as
the Neutron agents and the database and AMQP clusters, are affected by this downtime as well.
• All location constraints for the cloned Pacemaker resources must be removed as part of the Corosync
version upgrade procedure.


Troubleshooting
Logs and messages
A number of logs are available to help you understand and troubleshoot your OpenStack environment.

Screen notifications
To view the latest updates on node statuses, open the notifications drop-down list using the rightmost icon in
the upper right corner of the Fuel web UI. Click a notification to get a summary configuration view of a particular
node.

Viewing logs through Fuel


The Logs tab on the Fuel web UI enables you to view the key logs for any node in the OpenStack environment.
Use the drop-down list on the left to select whether to view the Fuel Master node logs or the Fuel Slave nodes
logs:

Select the log to view by setting the fields:


Logs
Choose between the Fuel Master or Other servers. When you select Other servers, the display changes to
provide a drop-down list of all nodes in the OpenStack environment.
Source


Log you want to view. See the lists below.


Min Level
By default, this displays INFO messages only. If you are running your OpenStack environment in the debug
mode, you can use this field to filter out some of the messages.
When you have set all the fields, click SHOW to display the requested log for the specified node.

Viewing the Fuel Master node logs


The following logs can be selected from the Source list for the Fuel Master node:
puppet
Logs activity of the Puppet configuration management system.
anaconda
Logs activities of the Anaconda installation agent used for provisioning.
syslog
Displays the syslog entries that will be sent to the rsyslog server (the Fuel Master node, by default).
Other install logs
storage
Log entries for disk partitioning.
kickstart-pre
Shows activities before the Cobbler kickstart mechanism runs.
kickstart-post
Shows activities after the Cobbler kickstart mechanism runs.
Bootstrap logs
Web backend
Logs each connection from the Fuel Master node to the Internet.
REST API
Nailgun API activities.
RPC consumer
Logs messaging between Nailgun and the orchestration service.
Astute
Records the activity of the Astute agents that implement the Nailgun configuration tasks.
Health Check
Displays the results of the most recent run of the tests run from the Health Check tab.

Viewing logs for target nodes ("Other servers")


When you select Other Servers, the display includes an extra field that you use to select the node which logs you
want to view:


Many of the same logs shown for the Fuel Master node are also shown for the Fuel Slave nodes. The difference is
in the bootstrap logs provided for these nodes. Additionally, the Controller node includes a set of OpenStack logs that
displays logs for the services that run on the Controller node.
The "Bootstrap logs" for "Other servers" are:
Bootstrap logs
dmesg
Standard Linux dmesg log that displays log messages from the most recent system startup.
messages
Logs all kernel messages for the node.
mcollective
Logs activities of MCollective.
agent
Logs activities of the Nailgun agent.

syslog
OpenStack uses the standard Linux syslog/rsyslog facilities to manage logging in the environment. Fuel puts the
appropriate templates into the /etc/rsyslog.d directory on each target node.
By default, Fuel sets up the Fuel Master node to be the remote syslog server. See the Fuel User Guide for
instructions on how to configure the environment to use a different server as the rsyslog server. Note that Fuel
configures all the files required for rsyslog when you use the Fuel Master node as the remote server; if you
specify another server, you must configure that server to handle messages from the OpenStack environment.
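For illustration only, a client-side forwarding rule of the kind these templates contain looks like the following; the file name and server address below are examples, not the exact template that Fuel generates:

# /etc/rsyslog.d/99-remote-example.conf (illustrative only)
# Forward all messages over UDP to the remote syslog server at 10.20.0.2
*.* @10.20.0.2:514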

/var/log
Logs for each node are written to the node's /var/log directory and can be viewed there. Under this directory, you
will find subdirectories for the major services that run on that node, such as nova, cinder, glance, and heat.
On the Fuel Master node, /var/log/remote is a symbolic link to the /var/log/docker-logs/remote directory.
Logging events from Fuel OCF agents are collected both locally in /var/log/daemon.log and remotely. The naming
convention for a remote log file is the following:

ocf-<AGENT-NAME>.log

For example:

ocf-foo-agent.log

Note
RabbitMQ logs its OCF events to the lrmd.log file to remain backward compatible.


atop logs
Fuel installs and runs atop on all deployed nodes in the environment. The atop service uses screen to display
detailed information about resource usage on each node. The data shows usage of the hardware resources that
are most important from a performance standpoint: CPU, memory, disk, and network interfaces and can be used
to troubleshoot performance and scalability issues.
The implementation is:

• By default, atop takes a snapshot of system status every 20 seconds and stores this information in binary
form.
• The binary data is stored locally on each node in the /var/log/atop directory.
• Data is kept for seven days; a logrotate job deletes logs older than seven days.
• Data is stored locally on each target node; it is not aggregated for the whole environment.
• The data consumes no more than 2GB of disk space on each node with the default configuration settings.
• The atop service can be disabled from the Linux shell. A Puppet run (either done manually or when patching
OpenStack) re-enables atop.
• The Diagnostic Logs snapshot includes the atop data for the current day only.
To view the atop data, run the atop(1) command on the shell of the node you are analyzing:

atop -r /var/log/atop/<filename>

Use t and T to navigate through the state snapshots.


You can also search the atop data. For example, the following command reports on all sh processes recorded in the
atop_current binary file between 10:00 and 12:00. The -P PRG option produces parseable output that includes, for
each process, when it started, when it ended, and the exit code it returned:

atop -PPRG -r /var/log/atop/atop_current -b "10:00" -e "12:00" | grep 'sh'

See the atop(1) man page for a description of the atop options and commands.
Each target node has a configuration file that controls the behavior of atop on that node. On Ubuntu nodes, this
is the /etc/default/atop file. The contents of the file are:

INTERVAL=20
LOGPATH="/var/log/atop"
OUTFILE="$LOGPATH/daily.log"

Modifying the value of the INTERVAL parameter or the logrotate settings affects the size of the logs maintained.
For the most efficient log size, use a larger interval setting and a smaller rotate setting. To modify the rotate
setting, edit the /etc/logrotate.d/atop file and make both of the following modifications, keeping the value of X the
same in both cases (see the sketch after this list):

• Modify the value of the rotate X setting.


• Modify the value of the mtime +X argument to the lastaction setting.
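A minimal sketch of both modifications, assuming you want to keep four days of atop data; the exact directives in your /etc/logrotate.d/atop file may differ, so review the file before applying:

# Keep 4 rotations and delete atop data older than 4 days (the two values must match)
sed -i 's/rotate [0-9]\+/rotate 4/' /etc/logrotate.d/atop
sed -i 's/-mtime +[0-9]\+/-mtime +4/' /etc/logrotate.d/atop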


Fuel Master node log rotation


The Fuel Master node provides the ability to configure log rotation using a template file.

• By default, the logrotate script processes the /etc/logrotate.d/fuel.nodaily file every 15 minutes to verify whether one
of the following conditions is met:

• age - if the log has not been rotated in more than the specified period of time (weekly by default), and
the file is larger than 10MB on the Fuel Master node or 5MB on slave nodes;
• size - if a log file exceeds 100MB on the Fuel Master node or 20MB on slave nodes.

Warning
Fuel enforces the following global changes to the logrotate configuration: delaycompress and
copytruncate.

• You can run a quick test to check if the logrotate script works.
Find the biggest file in /var/log/, and check date time stamps in the first and the last line:

biggest_file=$(find /var/log/ -type f | xargs du -h | sort -h | tail -n 1 | cut -f2);
ls -lah $biggest_file;
head -n1 $biggest_file | head -c35; echo "";
tail -n1 $biggest_file | head -c35; echo "";

If it is older than your rotation schedule and bigger than maxsize, then logrotate is not working correctly.
To debug, type:

logrotate -v -d /etc/logrotate.d/fuel.nodaily

The output of this command is a list of files examined by logrotate, including whether they should be
rotated or not.
• You can find an example of a writing rate evaluation for the neutron-server log file: LP1382515.
• When backporting the reworked logrotate configuration to the older Fuel releases, purge old template files:

rm /etc/logrotate.d/{1,2}0-fuel*

The /usr/bin/fuel-logrotate script is needed as well, together with a new cron job that runs it (see the sketch below).
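For example, a cron entry of the following form would run the script every 15 minutes; the file name is an example and the schedule should match the default behavior described above:

# /etc/cron.d/fuel-logrotate (example)
*/15 * * * * root /usr/bin/fuel-logrotate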

Enabling debug logging for OpenStack services


Most OpenStack services use the same configuration options to control the log level and logging method. If you need
to troubleshoot a specific service, locate its config file under /etc (e.g. /etc/nova/nova.conf) and set the debug and
use_syslog flags as follows:

debug = True
use_syslog = False

Disabling syslog protects the Fuel Master node from being overloaded by a flood of debug messages sent
from that node to the rsyslog server. Do not forget to revert both flags to their original values when you are done
troubleshooting.
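As a sketch, the flags can be flipped and the service restarted as follows; the file path and service name are examples for Nova on a controller node and may differ for other services and roles:

# Enable debug logging for Nova and stop sending its logs to syslog
sed -i 's/^debug *=.*/debug = True/; s/^use_syslog *=.*/use_syslog = False/' /etc/nova/nova.conf
service nova-api restart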

Fuel Master and Docker disk space troubleshooting


Overview
One major consideration in maintaining a Fuel Master node is managing disk space. While there is currently no
monitoring for Fuel Master, it is important to budget enough disk space. Failure to do so may lead to logs
overwhelming the /var partition. For example, enabling Ceilometer and debug logging will quickly fill up disk
space. The following sections describe failures that may occur if the disk fills up and give solutions for resolving
them.
If the solution to your issue requires rebuilding Docker containers, note that data recovery is not necessary
for the following containers: mcollective, nginx, ostf, nailgun, rsyslog, keystone, rabbitmq, because the
data in these containers is stateless. For the astute, cobbler, and postgres containers, it is necessary to recover
stateful data from the affected container. The instructions below guide you through the process.

PostgreSQL database inconsistency


Diagnosis
The following symptoms will be present:

• Fuel Web UI fails to work


• The dockerctl list -l output reports that the nailgun, ostf, and/or keystone container is down
• The output of the fuel task command reports an error similar to the following:

HTTP Error 400: Bad Request (This Session's transaction has been rolled back
due to a previous exception during flush. To begin a new transaction with
this Session, first issue Session.rollback(). Original exception was:
(InternalError) index "notifications_pkey" contains unexpected zero page at
block 26
HINT: Please REINDEX it.

Solution
The postgres container should still be running, so you simply need to run an SQL command to correct the fault.
Before attempting to fix the database, make a quick backup of it:


date=$(date --rfc-3339=date)
dockerctl shell postgres su - postgres -c 'pg_dumpall --clean' \
> /root/postgres_backup_$(date +"%F-%T").sql

Now try to reindex nailgun database:

dockerctl shell postgres su - postgres -c \
"psql nailgun -S -c \"select pg_terminate_backend(pid) from pg_stat_activity \
where datname='nailgun';\""

dockerctl shell postgres su - postgres -c "psql nailgun -c 'reindex database nailgun;'"

Lastly, check proper function:

dockerctl check all

Note
You may need to restart the postgres, keystone, nailgun, nginx or ostf Docker container using the
dockerctl restart CONTAINERNAME command.

Docker metadata corruption loses containers


Diagnosis
The following symptoms will be present:

• Deployment fails for some reason


• One or more Docker containers is missing from docker ps -a
• /var/log/docker contains the following message:

Cannot start container fuel-core-6.1-postgres: Error getting container
273c9b19ea61414d8838772aa3aeb0f6f1b982a74555fb6631adb6232459fe80 from driver
devicemapper: Error writing metadata to
/var/lib/docker/devicemapper/devicemapper/.json325916422: write
/var/lib/docker/devicemapper/devicemapper/.json325916422: no space left on device

Solution
This solution requires data recovery, described in the Summary above. It is necessary to recover data manually
using the dmsetup and mount commands.


First, you need the full ID of the Docker container that was lost. In the log message above, the ID is
273c9b19ea61414d8838772aa3aeb0f6f1b982a74555fb6631adb6232459fe80. If you do not have such a
message, you can find the ID this way:

container=postgres
container_id=$(sqlite3 /var/lib/docker/linkgraph.db \
"select entity_id from edge where name like '%$container%'")
echo $container_id
#should look like:
273c9b19ea61414d8838772aa3aeb0f6f1b982a74555fb6631adb6232459fe80

Once you have the container ID, you need to get the devicemapper block device ID for the container:

device_id=$(python -c 'import sys; import json; input = json.load(sys.stdin);\
[sys.stdout.write(str(v["device_id"])) for k, v in input["Devices"].items() if \
k == sys.argv[1]]' "$container_id" < /var/lib/docker/devicemapper/devicemapper/json)
echo $device_id

Now activate the volume and mount it:

# Verify that your device_id and container variables are defined
echo $device_id
echo $container
pool=$(echo /dev/mapper/docker*pool)
dmsetup create "${container}_recovery" --table "0 20971520 thin $pool $device_id"
mkdir -p "/mnt/${container}_recovery"
mkdir -p "/root/${container}_recovery"
mount -t ext4 -o rw,relatime,barrier=1,stripe=16,data=ordered,discard \
"/dev/mapper/${container}_recovery" "/mnt/${container}_recovery"

Verify that data is present in the mounted directory:

• for PostgreSQL:

ls -la /mnt/${container}_recovery/rootfs/var/lib/pgsql/9.3/data/

• for Astute:

ls -la /mnt/${container}_recovery/rootfs/var/lib/astute

• for Cobbler:

ls -la /mnt/${container}_recovery/rootfs/var/lib/cobbler

Next, it is necessary to purge the container record from the Docker sqlite database. You may see an issue when
running dockerctl start CONTAINER:


Abort due to constraint violation: constraint failed

Run this command before trying to restore the container data or if you are simply destroying and recreating it:

#Make a backup dump of docker sqlite DB
cp /var/lib/docker/linkgraph.db /root/linkgraph_$(date +"%F-%T").db
container_id=$(sqlite3 /var/lib/docker/linkgraph.db \
"select entity_id from edge where name like '%$container%'")
echo "Deleting container ID ${container_id}..."
sqlite3 /var/lib/docker/linkgraph.db "delete from entity where\
id='${container_id}';delete from edge where entity_id='${container_id}';"

Now perform the following recovery actions, which vary depending on whether you need to recover data from
Cobbler, Astute, or PostgreSQL:
For Cobbler:

cp -Rp /mnt/cobbler_recovery/rootfs/var/lib/cobbler /root/cobbler_recovery
dockerctl destroy cobbler
dockerctl start cobbler
dockerctl copy "/root/cobbler_recovery/*" cobbler:/var/lib/cobbler/
dockerctl restart cobbler

For PostgreSQL:

cp -Rp /mnt/postgres_recovery/rootfs/var/lib/pgsql /root/postgres_recovery
dockerctl destroy postgres
dockerctl start postgres
dockerctl shell postgres mv /var/lib/pgsql /root/pgsql_old
dockerctl copy /root/postgres_recovery/pgsql postgres:/var/lib/
dockerctl shell postgres chown -R postgres:postgres /var/lib/pgsql
dockerctl restart postgres keystone nailgun nginx ostf

You may want to make a PostgreSQL backup at this point:

dockerctl shell postgres su - postgres -c 'pg_dumpall --clean' \
> /root/postgres_backup_$(date +"%F-%T").sql

To recover a corrupted PostgreSQL database, you can import the dump to another PostgreSQL installation of the
same version as on the Fuel Master node (in 6.0 it is 9.3.5). There you can get a clean dump that you then import
into your PostgreSQL container:

yum install postgresql-server
cp -rf data/ /var/lib/pgsql/
service postgresql start


su - postgres -c 'pg_dumpall --clean' > dump.sql
service postgresql stop

Now import the dump.sql file to the postgres container's database:

dockerctl shell postgres su - postgres -c "psql nailgun" < dump.sql

For Astute:

cp -Rp /mnt/astute_recovery/rootfs/var/lib/astute /root/astute_recovery
dockerctl destroy astute
dockerctl start astute
dockerctl copy "/root/astute_recovery/*" astute:/var/lib/astute/
dockerctl restart astute

Finally, clean up the recovery mount point:

umount "/mnt/${container}_recovery"
dmsetup clear $device_id

Read-only containers
Symptoms

• Fuel Web UI does not work


• Fuel CLI commands fail
• Some containers may be failing and stopped
• Trying to run dockerctl shell CONTAINER touch /root/test results in a "Read-only filesystem" error
Solution
Because of bugs in docker-io 0.10, the only way to correct this issue is to restart the Fuel Master node. If it still
fails with the same issue, you may have a corrupt filesystem. See the next section for more details.

Corrupt ext4 filesystem on Docker container


Symptoms
Error:

Cannot start container fuel-core-6.1-rsync: Error getting container
df5f1adfe6858a13b0a9fe81217bf7db33d41a3d4ab8088d12d4301023d4cca3 from driver
devicemapper: Error mounting
'/dev/mapper/docker-253:2-341202-df5f1adfe6858a...d41a3d4ab8088d12d4301023d4cca3'
on
'/var/lib/docker/devicemapper/mnt/df5f1adfe6858a...d41a3d4ab8088d12d4301023d4cca3':
invalid argument


Solution
If the container affected is stateful, it is necessary to recover the data. Otherwise, you can simply destroy and
recreate stateless containers.
For stateless containers:

container="rsync" # Change container name


dockerctl destroy $container
dockerctl start $container

For stateful containers:

container="postgres" # Set to the name of the affected stateful container
container_id=$(sqlite3 /var/lib/docker/linkgraph.db \
"select entity_id from edge where name like '%$container%'")
echo $container_id
umount -l /dev/mapper/docker-*$container_id
fsck -y /dev/mapper/docker-*$container_id
dockerctl start $container

How To Troubleshoot Corosync/Pacemaker


Pacemaker and Corosync come with several CLI utilities that can help you troubleshoot and understand what is
going on.

Note
In Mirantis OpenStack 6.0 and later, multiple L3 agents are configured as clones, one on each Controller.
When troubleshooting Corosync and Pacemaker, the clone_p_neutron-l3-agent resource (new in 6.0) is
used to act on all L3 agent clones in the environment. The p_neutron-l3-agent resource is still provided,
to act on a specific resource on a specific Controller node.

crm - Cluster Resource Manager


The crm utility shows you the state of the Pacemaker cluster and can be used for maintenance and to analyze
whether the cluster is consistent. This section discusses some frequently-used commands.
crm status
This command shows you the main information about the Pacemaker cluster and the state of the resources being
managed.
On the controller node, run:

root@node-1:~# crm


Then run the status command:

crm(live)#
crm(live)# status

Example of the output for the above command:

============
Last updated: Tue Jun 23 08:47:23 2015
Last change: Mon Jun 22 17:24:32 2015
Stack: corosync
Current DC: node-1.domain.tld (1) - partition with quorum
Version: 1.1.12-561c4cf
3 Nodes configured
43 Resources configured
============

Online: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]

Clone Set: clone_p_vrouter [p_vrouter]


Started: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
vip__management (ocf::fuel:ns_IPaddr2): Started node-1.domain.tld
vip__public_vrouter (ocf::fuel:ns_IPaddr2): Started node-1.domain.tld
vip__management_vrouter (ocf::fuel:ns_IPaddr2): Started node-1.domain.tld
vip__public (ocf::fuel:ns_IPaddr2): Started node-2.domain.tld
Master/Slave Set: master_p_conntrackd [p_conntrackd]
Masters: [ node-1.domain.tld ]
Slaves: [ node-2.domain.tld node-3.domain.tld ]
Clone Set: clone_p_haproxy [p_haproxy]
Started: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
Clone Set: clone_p_dns [p_dns]
Started: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
Clone Set: clone_p_mysql [p_mysql]
Started: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
Master/Slave Set: master_p_rabbitmq-server [p_rabbitmq-server]
Masters: [ node-1.domain.tld ]
Slaves: [ node-2.domain.tld node-3.domain.tld ]
Clone Set: clone_p_heat-engine [p_heat-engine]
Started: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
Clone Set: clone_p_neutron-plugin-openvswitch-agent [p_neutron-plugin-openvswitch-agent]
Started: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
Clone Set: clone_p_neutron-dhcp-agent [p_neutron-dhcp-agent]
Started: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
Clone Set: clone_p_neutron-metadata-agent [p_neutron-metadata-agent]
Started: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
Clone Set: clone_p_neutron-l3-agent [p_neutron-l3-agent]
Started: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]


Clone Set: clone_p_ntp [p_ntp]


Started: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
Clone Set: clone_ping_vip__public [ping_vip__public]
Started: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]

Or run the crm status command on the controller node:

root@node-1:~# crm status

Example of the output for the above command:

============
Last updated: Tue Jun 23 08:47:52 2015
Last change: Mon Jun 22 17:24:32 2015
Stack: corosync
Current DC: node-1.domain.tld (1) - partition with quorum
Version: 1.1.12-561c4cf
3 Nodes configured
43 Resources configured
============

Online: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]

Clone Set: clone_p_vrouter [p_vrouter]


Started: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
vip__management (ocf::fuel:ns_IPaddr2): Started node-1.domain.tld
vip__public_vrouter (ocf::fuel:ns_IPaddr2): Started node-1.domain.tld
vip__management_vrouter (ocf::fuel:ns_IPaddr2): Started node-1.domain.tld
vip__public (ocf::fuel:ns_IPaddr2): Started node-2.domain.tld
Master/Slave Set: master_p_conntrackd [p_conntrackd]
Masters: [ node-1.domain.tld ]
Slaves: [ node-2.domain.tld node-3.domain.tld ]
Clone Set: clone_p_haproxy [p_haproxy]
Started: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
Clone Set: clone_p_dns [p_dns]
Started: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
Clone Set: clone_p_mysql [p_mysql]
Started: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
Master/Slave Set: master_p_rabbitmq-server [p_rabbitmq-server]
Masters: [ node-1.domain.tld ]
Slaves: [ node-2.domain.tld node-3.domain.tld ]
Clone Set: clone_p_heat-engine [p_heat-engine]
Started: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
Clone Set: clone_p_neutron-plugin-openvswitch-agent [p_neutron-plugin-openvswitch-agent]
Started: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
Clone Set: clone_p_neutron-dhcp-agent [p_neutron-dhcp-agent]
Started: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]


Clone Set: clone_p_neutron-metadata-agent [p_neutron-metadata-agent]


Started: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
Clone Set: clone_p_neutron-l3-agent [p_neutron-l3-agent]
Started: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
Clone Set: clone_p_ntp [p_ntp]
Started: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
Clone Set: clone_ping_vip__public [ping_vip__public]
Started: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]

crm(live)# resource
Here you can enter resource-specific commands:

root@node-1:~# crm
crm(live)# resource
crm(live)resource# status

Example of the output for the above command:

Clone Set: clone_p_vrouter [p_vrouter]


Started: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
vip__management (ocf::fuel:ns_IPaddr2): Started
vip__public_vrouter (ocf::fuel:ns_IPaddr2): Started
vip__management_vrouter (ocf::fuel:ns_IPaddr2): Started
vip__public (ocf::fuel:ns_IPaddr2): Started
Master/Slave Set: master_p_conntrackd [p_conntrackd]
Masters: [ node-1.domain.tld ]
Slaves: [ node-2.domain.tld node-3.domain.tld ]
Clone Set: clone_p_haproxy [p_haproxy]
Started: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
Clone Set: clone_p_dns [p_dns]
Started: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
Clone Set: clone_p_mysql [p_mysql]
Started: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
Master/Slave Set: master_p_rabbitmq-server [p_rabbitmq-server]
Masters: [ node-1.domain.tld ]
Slaves: [ node-2.domain.tld node-3.domain.tld ]
Clone Set: clone_p_heat-engine [p_heat-engine]
Started: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
Clone Set: clone_p_neutron-plugin-openvswitch-agent [p_neutron-plugin-openvswitch-agent]
Started: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
Clone Set: clone_p_neutron-dhcp-agent [p_neutron-dhcp-agent]
Started: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
Clone Set: clone_p_neutron-metadata-agent [p_neutron-metadata-agent]
Started: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
Clone Set: clone_p_neutron-l3-agent [p_neutron-l3-agent]
Started: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]


Clone Set: clone_p_ntp [p_ntp]


Started: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
Clone Set: clone_ping_vip__public [ping_vip__public]
Started: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]

crm(live)resource# start|restart|stop|cleanup <resource_name>


These commands allow you to start, restart, stop, and clean up resources.
crm(live)resource# cleanup
The cleanup command resets a resource's state on a node if the resource is currently in a failed state because of some
unexpected operation, such as a side effect of a System V init operation on the resource. If this happens,
Pacemaker can do the cleanup and decide which node will run the resource.
Example:

============
3 Nodes configured, 3 expected votes
16 Resources configured.
============

Online: [ controller-01 controller-02 controller-03 ]

vip__management (ocf::heartbeat:IPaddr2): Started controller-01


vip__public (ocf::heartbeat:IPaddr2): Started controller-02
Clone Set: clone_p_haproxy [p_haproxy]
Started: [ controller-01 controller-02 controller-03 ]
Clone Set: clone_p_mysql [p_mysql]
Started: [ controller-01 controller-02 controller-03 ]
Clone Set: clone_p_neutron-openvswitch-agent [p_neutron-openvswitch-agent]
Started: [ controller-01 controller-02 controller-03 ]
Clone Set: clone_p_neutron-metadata-agent [p_neutron-metadata-agent]
Started: [ controller-01 controller-02 controller-03 ]
Clone Set: clone_p_neutron-metadata-agent [p_neutron-dhcp-agent]
Started: [ controller-01 controller-02 controller-03 ]
p_neutron-dhcp-agent (ocf::mirantis:neutron-agent-dhcp): Started controller-01
Clone Set: clone_p_neutron-l3-agent [p_neutron-l3-agent]
Started: [ controller-01 controller-02 controller-03 ]

In this case, crm found residual OpenStack agent processes that had been started by Pacemaker because of a
network failure and cluster partitioning. After the restoration of connectivity, Pacemaker saw these duplicate
resources running on different nodes. You can let it clean up this situation automatically or, if you do not want to
wait, clean them up manually, for example:
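The resource name below is shown for illustration; substitute the resource reported as failed in your environment:

crm resource cleanup p_neutron-dhcp-agent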
For more information, see crm interactive help and documentation.


Sometimes a cluster gets split into several parts. In this case, crm status shows something like this:

On ctrl1
============
….
Online: [ ctrl1 ]

On ctrl2
============
….
Online: [ ctrl2 ]

On ctrl3
============
….
Online: [ ctrl3 ]

You can troubleshoot this by checking connectivity between nodes. Look for the following:

1. By default, Fuel configures Corosync over UDP. Security appliances must not block UDP traffic on ports 5404
and 5405, and deep packet inspection should be turned off for these ports. These ports must be allowed on
the management network between all controllers.
2. Corosync should start after the network interfaces are activated.
3. bindnetaddr should be located in the management network or at least in the same reachable segment (see the sketch after this list).
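A quick sketch of such checks on a controller node; the file location matches a default Fuel deployment, so adjust as needed:

# Corosync should be listening on UDP ports 5404/5405
netstat -unlp | grep corosync
# bindnetaddr should point to the management network
grep bindnetaddr /etc/corosync/corosync.conf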
corosync-cfgtool -s
This command displays the cluster connectivity status:

Printing ring status.


Local node ID 50490378
RING ID 0
id = 10.107.0.8
status = ring 0 active with no faults

FAULTY status indicates connectivity problems.


corosync-objctl
This command can get/set runtime Corosync configuration values including the status of Corosync redundant ring
members:

runtime.totem.pg.mrp.srp.members.134245130.ip=r(0) ip(10.107.0.8)
runtime.totem.pg.mrp.srp.members.134245130.join_count=1
...
runtime.totem.pg.mrp.srp.members.201353994.ip=r(0) ip(10.107.0.12)
runtime.totem.pg.mrp.srp.members.201353994.join_count=1
runtime.totem.pg.mrp.srp.members.201353994.status=joined


If the IP of the node is 127.0.0.1, it means that Corosync started when only the loopback interface was available
and bound to it.
If the members list contains only one IP address or is incomplete, it indicates that there is a Corosync
connectivity issue because this node does not see the other ones.
Because no-quorum-policy is set to stop on a fully functioning cluster, Pacemaker stops all resources in a partition
that has lost quorum. If quorum is present, the cluster functions normally and can tolerate losing a minority of the
controllers. This eliminates split-brain scenarios where nodes do not have quorum or cannot see each other.
In some scenarios, such as manual cluster recovery, no-quorum-policy can be set to ignore. This setting allows the
operator to start operations on a single controller rather than waiting for quorum.

pcs property set no-quorum-policy=ignore

Once quorum or the cluster is restored, no-quorum-policy should be set back to its previous value.
Also, Fuel temporarily sets no-quorum-policy to ignore when the cloud operator adds controller nodes to or removes
them from the cluster. This is required for scenarios when the cloud operator adds more controller nodes than the
cluster currently consists of. Once the addition or removal of controller nodes is done, Fuel sets no-quorum-policy
back to stop.
It is also recommended to configure fencing (STONITH) for the Pacemaker cluster. This can be done manually or
with the help of the Fencing plugin for Fuel. When STONITH is enabled, no-quorum-policy can be set to suicide as
well. When set to suicide, the node will shoot itself and any other nodes in the partition without quorum, but it
will not try to shoot the nodes it cannot see. When set to ignore (or when the node has quorum), it will shoot any node
it cannot see. For any other value, it will not shoot anyone when it does not have quorum.
Furthermore, Corosync will always try to automatically restore the cluster back into a single partition and start all
of the resources, if any were stopped, unless some controller nodes are damaged (for example, cannot run the Corosync
service). Such nodes cannot rejoin the cluster and must be fenced by the STONITH daemon. That
is why a production cluster should always have fencing enabled.
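A minimal sketch of the related Pacemaker properties, assuming a STONITH device has already been configured for your hardware (the device configuration itself is specific to your environment and is not shown here):

pcs property set stonith-enabled=true
pcs property set no-quorum-policy=suicide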

How to verify that Neutron HA is working


To verify that Neutron HA is working, simply shut down the node hosting the Neutron agents (either gracefully or
with a hard shutdown). You should see the agents start on another node and the corresponding Neutron interfaces
appear on the new Neutron node:

# ip link show

11: tap7b4ded0e-cb: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc


12: qr-829736b7-34: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
13: qg-814b8c84-8f: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc

You can also check ovs-vsctl show output to see that all corresponding tunnels/bridges/interfaces are
created and connected properly:


ce754a73-a1c4-4099-b51b-8b839f10291c
Bridge br-mgmt
Port br-mgmt
Interface br-mgmt
type: internal
Port "eth1"
Interface "eth1"
Bridge br-ex
Port br-ex
Interface br-ex
type: internal
Port "eth0"
Interface "eth0"
Port "qg-814b8c84-8f"
Interface "qg-814b8c84-8f"
type: internal
Bridge br-int
Port patch-tun
Interface patch-tun
type: patch
options: {peer=patch-int}
Port br-int
Interface br-int
type: internal
Port "tap7b4ded0e-cb"
tag: 1
Interface "tap7b4ded0e-cb"
type: internal
Port "qr-829736b7-34"
tag: 1
Interface "qr-829736b7-34"
type: internal
Bridge br-tun
Port "gre-1"
Interface "gre-1"
type: gre
options: {in_key=flow, out_key=flow, remote_ip="10.107.0.8"}
Port "gre-2"
Interface "gre-2"
type: gre
options: {in_key=flow, out_key=flow, remote_ip="10.107.0.5"}
Port patch-int
Interface patch-int
type: patch
options: {peer=patch-tun}
Port "gre-3"
Interface "gre-3"
type: gre


options: {in_key=flow, out_key=flow, remote_ip="10.107.0.6"}


Port "gre-4"
Interface "gre-4"
type: gre
options: {in_key=flow, out_key=flow, remote_ip="10.107.0.7"}
Port br-tun
Interface br-tun
type: internal
ovs_version: "1.4.0+build0"

Corosync crashes without network connectivity


Depending on a wide range of systems and configurations in the network, it is possible for Corosync's networking
protocol, Totem, to time out. If this happens for an extended period of time, Corosync may crash. In addition,
MySQL may stop. This section illustrates the process of working through issues with Corosync and MySQL.
Workaround:

1. Verify that Corosync is really broken:

service corosync status

• You should see this error:

corosync dead but pid file exists

2. Start Corosync manually:

service corosync start

3. Run the following command:

ps -ef | grep mysql

and kill ALL(!) mysqld and mysqld_safe processes.


4. Wait for Pacemaker to completely start MySQL processes.

• Check it with the following command:

ps -ef | grep mysql


• If it doesn't start, run:

crm resource p_mysql

5. To verify that this host is a member of the cluster and that p_mysql does not contain any "Failed actions",
run the following command:

crm status


How To Troubleshoot AMQP issues


AMQP is the heart of OpenStack. If something goes wrong with the messaging layer, an OpenStack
application can normally tolerate this by reconnecting, reporting some requests as failed, or retrying them. While this
depends on the particular application and the underlying Oslo messaging library design, there are also some
generic health checks and troubleshooting steps that operators should know.

Check if there is a problem in the Corosync/Pacemaker layer


Normally, failures of the RabbitMQ multistate resource are automatically fixed by Corosync and Pacemaker. But if
a failure cannot be automatically fixed for some reason, the master_p_rabbitmq-server resource status will be
considered "bad". The status is considered "good" when there exists a single master for the
master_p_rabbitmq-server and the rest of the resource instances are reported as slaves. Note that the command
pcs status issued on all controllers should be enough to gather all the required information, see crm - Cluster
Resource Manager. You can also get extended output with the alternative crm_mon tool:

crm_mon -fotAW -1

If there are some RabbitMQ resource failures, they will be shown in the command output with the time stamps,
so you can search for the events in the logs around that moment.

Note
If there are split clusters of Corosync running, you should first fix your Corosync cluster, because you
cannot resolve issues with the Pacemaker resources, including RabbitMQ cluster, when there is a split
brain in the Corosync cluster.

How to recover
It is recommended to clean up and restart the master_p_rabbitmq-server Pacemaker resource, see crm - Cluster
Resource Manager.
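A sketch of this recovery sequence, run on one of the controllers; expect the downtime described in the note below:

crm resource cleanup master_p_rabbitmq-server
crm resource restart master_p_rabbitmq-server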

Note
Restarting the RabbitMQ Pacemaker resource will introduce a full downtime for the AMQP cluster and
OpenStack applications. The downtime may take from a few to up to 20 minutes.

Check if there is a problem in the RabbitMQ layer


Normally, failures of the RabbitMQ cluster are automatically healed by OCF resource agents in Pacemaker. But if
this fails for some reason, there may be unsynchronized queues, wrong cluster membership reported, or


partitions detected by the rabbitmqctl tool, or even some list channels/queues requests may hang. Note that the
command rabbitmqctl report issued on all controllers should be enough to gather all the required
information, but there is also a special group of Fuel OSTF HA health checks available in the Fuel UI and CLI. See
also RabbitMQ OSTF replication tests (https://blueprints.launchpad.net/fuel/+spec/ostf-rabbit-replication-tests).

How to recover
It is recommended to clean up and restart the master_p_rabbitmq-server Pacemaker resource, see crm - Cluster
Resource Manager.

Check if there is a problem in the Oslo messaging layer


Note that normally an OpenStack application should be able to reconnect to the AMQP host and eventually restore its
operations. But if it cannot for some reason, there may be "down" status reports, failures of CLI
commands, and messaging-related or publish/consume-related records in the log files of the OpenStack
services relying on the Oslo messaging library. For example, there may be records in the log files similar to the
following:

Timed out waiting for a reply to message...

How to recover
It is recommended to restart the affected OpenStack service or services, see HowTo: Manage OpenStack services.

Note
Restarting the OpenStack service will introduce a short (near to zero) downtime for the related OpenStack
application.

Check if there are AMQP problems with any of the OpenStack components
Note that normally an OpenStack application should be able to reconnect to the AMQP host and eventually restore its
operations. But if it cannot for some reason, there may be "down" status reports, failures of CLI
commands, and AMQP/messaging-related records in the log files of the services belonging to the affected
OpenStack component under verification. For Nova, for example, there may be records in the log files in the
/var/log/nova/ directory similar to the following:

AMQP server on ... is unreachable: [Errno 113] EHOSTUNREACH...

How to recover
It is recommended to restart all instances of the OpenStack services related to the affected OpenStack
component, see HowTo: Manage OpenStack services. For example, for the Nova compute component, you may want to
restart all instances of the Nova services on the Controller and Compute nodes affected by the AMQP issue:
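The service names below are Ubuntu init script names and are given as a sketch only; restart just the services actually deployed on each node:

# On the controllers
service nova-api restart; service nova-scheduler restart; service nova-conductor restart
# On the compute nodes
service nova-compute restart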


How to make RabbitMQ OCF script tolerate rabbitmqctl timeouts


If other processes on a node where RabbitMQ is deployed consume a significant part of the CPU, RabbitMQ starts
responding slowly to queries from the rabbitmqctl utility. The utility is used by the OCF script to monitor the state
of RabbitMQ. When the utility fails to complete in a pre-defined period of time, the OCF script considers
RabbitMQ to be down and restarts it, which might lead to several minutes of cloud downtime. Such restarts are
undesirable, as they cause downtime without any benefit. To mitigate the issue, you can configure the OCF
script to tolerate a certain number of rabbitmqctl timeouts in a row using the following command:

crm_resource --resource p_rabbitmq-server --set-parameter \
max_rabbitmqctl_timeouts --parameter-value N

Replace N with the number of timeouts. For instance, if it is set to 3, the OCF script will tolerate two rabbitmqctl
timeouts in a row, but fail if the third one occurs.
By default, the parameter is set to 1, which means rabbitmqctl timeout is not tolerated at all. The downside
of increasing the parameter is a delay in restarting RabbitMQ when it is down. For example, if a real issue occurs
and causes a rabbitmqctl timeout, the OCF script will detect that only after N monitor runs and then restart
RabbitMQ, which might fix the issue.
To understand whether a RabbitMQ restart was caused by a rabbitmqctl timeout, examine the lrmd.log of
the corresponding Controller on the Fuel Master node in the /var/log/docker-logs/remote/ directory for the presence of
the following lines:

"the invoked command exited 137: /usr/sbin/rabbitmqctl list_channels ..."

This indicates a rabbitmqctl timeout. The next line explains whether it caused a restart or not. For example:

"rabbitmqctl timed out 2 of max. 3 time(s) in a row. Doing nothing for now."

Timeout In Connection to OpenStack API From Client Applications


If you use Java, Python, or any other code to work with the OpenStack API, all connections should be made over the
OpenStack public network. To explain why you cannot use the Fuel internal networks, run the nova client with the
debug option enabled:

[root@controller-6 ~]# nova --debug list

REQ: curl -i http://192.168.0.2:5000/v2.0/tokens -X POST -H "Content-Type: appli


cation/json" -H "Accept: application/json" -H "User-Agent: python-novaclient" -d
'{"auth": {"tenantName": "admin", "passwordCredentials": {"username": "admin",
"password": "admin"}}}'

INFO (connectionpool:191) Starting new HTTP connection (1): 192.168.0.2


DEBUG (connectionpool:283) "POST /v2.0/tokens HTTP/1.1" 200 2702
RESP: [200] {'date': 'Tue, 06 Aug 2013 13:01:05 GMT', 'content-type': 'applicati


on/json', 'content-length': '2702', 'vary': 'X-Auth-Token'}


RESP BODY: {"access": {"token": {"issued_at": "2013-08-06T13:01:05.616481", "exp
ires": "2013-08-07T13:01:05Z", "id": "c321cd823c8a4852aea4b870a03c8f72", "tenant
": {"description": "admin tenant", "enabled": true, "id": "8eee400f7a8a4f35b7a92
bc6cb54de42", "name": "admin"}}, "serviceCatalog": [{"endpoints": [{"adminURL":
"http://192.168.0.2:8774/v2/8eee400f7a8a4f35b7a92bc6cb54de42", "region": "Region
One", "internalURL": "http://192.168.0.2:8774/v2/8eee400f7a8a4f35b7a92bc6cb54de4
2", "id": "6b9563c1e37542519e4fc601b994f980", "publicURL": "http://172.16.1.2:87
74/v2/8eee400f7a8a4f35b7a92bc6cb54de42"}], "endpoints_links": [], "type": "compu
te", "name": "nova"}, {"endpoints": [{"adminURL": "http://192.168.0.2:8080", "re
gion": "RegionOne", "internalURL": "http://192.168.0.2:8080", "id": "4db0e11de35
74c889179f499f1e53c7e", "publicURL": "http://172.16.1.2:8080"}], "endpoints_link
s": [], "type": "s3", "name": "swift_s3"}, {"endpoints": [{"adminURL": "http://1
92.168.0.2:9292", "region": "RegionOne", "internalURL": "http://192.168.0.2:9292
", "id": "960a3ad83e4043bbbc708733571d433b", "publicURL": "http://172.16.1.2:929
2"}], "endpoints_links": [], "type": "image", "name": "glance"}, {"endpoints": [
{"adminURL": "http://192.168.0.2:8776/v1/8eee400f7a8a4f35b7a92bc6cb54de42", "reg
ion": "RegionOne", "internalURL": "http://192.168.0.2:8776/v1/8eee400f7a8a4f35b7
a92bc6cb54de42", "id": "055edb2aface49c28576347a8c2a5e35", "publicURL": "http://
172.16.1.2:8776/v1/8eee400f7a8a4f35b7a92bc6cb54de42"}], "endpoints_links": [], "
type": "volume", "name": "cinder"}, {"endpoints": [{"adminURL": "http://192.168.
0.2:8773/services/Admin", "region": "RegionOne", "internalURL": "http://192.168.
0.2:8773/services/Cloud", "id": "1e5e51a640f94e60aed0a5296eebdb51", "publicURL":
"http://172.16.1.2:8773/services/Cloud"}], "endpoints_links": [], "type": "ec2"
, "name": "nova_ec2"}, {"endpoints": [{"adminURL": "http://192.168.0.2:8080/",
"region": "RegionOne", "internalURL": "http://192.168.0.2:8080/v1/AUTH_8eee400f
7a8a4f35b7a92bc6cb54de42", "id": "081a50a3c9fa49719673a52420a87557", "publicURL
": "http://172.16.1.2:8080/v1/AUTH_8eee400f7a8a4f35b7a92bc6cb54de42"}], "endpoi
nts_links": [], "type": "object-store", "name": "swift"}, {"endpoints": [{"admi
nURL": "http://192.168.0.2:35357/v2.0", "region": "RegionOne", "internalURL": "
http://192.168.0.2:5000/v2.0", "id": "057a7f8e9a9f4defb1966825de957f5b", "publi
cURL": "http://172.16.1.2:5000/v2.0"}], "endpoints_links": [], "type": "identit
y", "name": "keystone"}], "user": {"username": "admin", "roles_links": [], "id"
: "717701504566411794a9cfcea1a85c1f", "roles": [{"name": "admin"}], "name": "ad
min"}, "metadata": {"is_admin": 0, "roles": ["90a1f4f29aef48d7bce3ada631a54261"
]}}}

REQ: curl -i http://172.16.1.2:8774/v2/8eee400f7a8a4f35b7a92bc6cb54de42/servers/


detail -X GET -H "X-Auth-Project-Id: admin" -H "User-Agent: python-novaclient" -
H "Accept: application/json" -H "X-Auth-Token: c321cd823c8a4852aea4b870a03c8f72"

INFO (connectionpool:191) Starting new HTTP connection (1): 172.16.1.2

Even though the initial connection was made to 192.168.0.2, the client then tries to access the Nova API over the
public network. The reason is that Keystone returns the list of OpenStack service URLs, and production-grade
deployments require services to be accessed over the public network.
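One way to inspect the catalog that Keystone returns is to list the endpoints; this assumes the python-openstackclient is installed and admin credentials are sourced:

openstack endpoint list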


Enable Ubuntu bootstrap (EXPERIMENTAL)


By default, Fuel 7.0 uses the CentOS 6.6 bootstrap operating system. Ubuntu 14.04 bootstrap is only available as an
experimental feature. See Deployment with Ubuntu 14.04 bootstrap in Release notes for Mirantis OpenStack 7.0 for
details about known issues with this feature.
To enable Ubuntu 14.04 bootstrap:

1. Enable experimental features.


2. Verify that you are logged in as root on your Fuel Master node console and that your Master node has
access to the Internet.
3. Run the fuel-bootstrap-image-set ubuntu command.
4. Run the ls -l /var/www/nailgun/bootstrap/ubuntu/root.squashfs command to verify that the Ubuntu image
is built successfully. The build log is available in /var/log/fuel-bootstrap-image-build.log.
5. Reboot the discovered nodes.


HA testing scenarios
Currently, several testing scenarios are provided to check an HA environment.

Regular testing scenarios


Nova-network
You can run the following tests on the supported operating system.

1. Deploy a cluster in HA mode with VLAN Manager. Steps to perform:

• Create a cluster.
• Add 3 nodes with controller role.
• Add 2 nodes with compute role.
• Set up a cluster to use Network VLAN manager with 8 networks.
• Deploy the cluster.
• Make sure that the cluster is configured correctly: there should be no dead services and no errors in the
logs. Also, check that all nova services are running and in the up state; TestVM must
appear in Glance and only one nova-network should be present.
• Run network verification test.
• Run OSTF.
2. Deploy a cluster in HA mode with nova-network and Flat DHCP manager enabled. Steps to perform:

• Create a cluster.
• Add 3 nodes with controller role.
• Add 2 nodes with compute role.
• Deploy the cluster.
• Make sure that the cluster is configured correctly: there should be no dead services and no errors in the
logs. Also, check that all nova services are running and in the up state; TestVM must
appear in Glance and only one nova-network should be present.
• Run network verification test.
• Perform a security check: verify that it is impossible to access TCP or UDP unused ports.
• Run OSTF.
3. Add a compute node to a cluster in HA mode with nova-network with Flat DHCP manager enabled. Steps to
perform:

• Create cluster
• Add 3 nodes with controller role
• Add 2 nodes with compute role.
• Deploy the cluster.


• Make sure that the cluster is configured correctly: there should be no dead services and no errors in the
logs. Also, check that all nova services are running and in the up state; TestVM must
appear in Glance and only one nova-network is present.
• Add one node with compute role.
• Re-deploy the cluster.
• Make sure that the cluster is configured correctly: there should be no dead services and no errors in the
logs. Also, check that all nova services are running and in the up state; TestVM must
appear in Glance and only one nova-network should be present.
• Run network verification test.
• Run OSTF.
4. Deploy an HA cluster with Ceph and nova-network. Steps to perform:

• Create a cluster: use Ceph for volumes and images.


• Add 3 nodes with controller and Ceph OSD roles.
• Add one node with Ceph OSD role.
• Add 2 nodes with compute and Ceph OSD roles.
• Start cluster deployment.
• Check Ceph status with ceph health command. Command output should have HEALTH_OK.
• Run OSTF.
5. Stop and reset nova-network cluster in HA mode. Steps to perform:

• Create a cluster.
• Add 3 nodes with controller role.
• Start cluster deployment.
• Stop deployment.
• Reset settings.
• Add 2 nodes with compute role.
• Re-deploy the cluster.
• Run OSTF.
• Run network verification test.
6. Deploy nova-network cluster in HA mode with Ceilometer. Steps to perform:

• Create a cluster. On Settings tab of the Fuel web UI, select Install Ceilometer option.
• Add 3 nodes with controller role.
• Add one node with compute role.
• Add one node with MongoDB role.
• Deploy the cluster.


• Check that partitions on MongoDB node are the same as those selected on the Fuel web UI.
• Make sure that Ceilometer API is running (it must be present in ps ax output).
• Run OSTF.
7. Check HA mode on scalability. Steps to perform:

• Create a cluster.
• Add 1 controller node.
• Deploy the cluster.
• Add 2 controller nodes.
• Deploy the changes.
• Check Pacemaker status: all nodes must be online after running crm_mon -1 command.
• Run network verification test.
• Add 2 controller nodes.
• Deploy the changes.
• Check that public and management vIPs have started after running crm_mon -1 command.
• Run network verification test.
• Run OSTF.
8. Backup/restore Fuel Master node with HA cluster. Steps to perform:

• Create a cluster with 3 controllers and 2 compute nodes.


• Backup Fuel Master node.
• Check if the backup succeeded.
• Run OSTF.
• Add 1 node with compute role.
• Restore Fuel Master node.
• Check if the restore procedure succeeded. Before the backup, a file is created and its checksum is saved. After
backing up and restoring the environment, you get the checksum of this file and verify that it is equal
to the checksum that was saved before the backup.
• Run OSTF.

Neutron
You can run the following tests on the supported operating system.

1. Deploy a cluster in HA mode with Neutron GRE segmentation. Steps to perform:

• Create a cluster.
• Add 3 nodes with controller role.
• Add 2 nodes with compute role.


• Deploy the cluster.


• Run network verification test.
• Run OSTF.
2. Deploy a cluster in HA mode with Neutron GRE segmentation and public network assigned to all nodes.
Steps to perform:

• Create a cluster.
• Add 3 nodes with controller role.
• Add 2 nodes with compute role.
• On Settings tab of the Fuel web UI, select Assign public networks to all nodes option.
• Deploy the cluster.
• Check that public network is assigned to all nodes.
• Run network verification test.
• Perform a security check: verify that it is impossible to access TCP or UDP unused ports.
• Run OSTF.
3. Deploy a cluster in HA mode with Neutron VLAN. Steps to perform:

• Create a cluster.
• Add 3 nodes with controller role.
• Add 2 nodes with compute role.
• Deploy the cluster.
• Run network verification test.
• Run OSTF.
4. Deploy cluster in HA mode with Neutron VLAN and public network assigned to all nodes. Steps to perform:

• Create a cluster.
• Add 3 nodes with controller role.
• Add 2 nodes with compute role.
• On Settings tab of the Fuel web UI, select Assign public networks to all nodes option.
• Deploy the cluster.
• Check that public network is assigned to all nodes.
• Run network verification test.
• Perform a security check: verify that it is impossible to access TCP or UDP unused ports.
• Run OSTF.
5. Deploy a cluster in HA mode with Murano and Neutron GRE segmentation. Steps to perform:

• Create a cluster. On Settings tab of the Fuel web UI, select Install Murano option.


• Add 3 nodes with controller role.


• Add 1 node with compute role.
• Deploy the cluster.
• Verify that Murano services are up and running (check that murano-api is present in 'ps ax' output on
every controller).
• Run OSTF.
• Register Murano image.
• Run Murano platform OSTF tests.
6. Deploy Heat cluster in HA mode. Steps to perform:

• Create a cluster.
• Add 3 nodes with controller role.
• Add one node with compute role.
• Deploy the cluster.
• Verify that Heat services are up and running (check that heat-api is present in 'ps ax' output on every
controller).
• Run OSTF.
• Register Heat image.
• Run OSTF platform tests.
7. Deploy a new Neutron GRE cluster in HA mode after Fuel Master is upgraded. Steps to perform:

• Create a cluster with 1 controller with Ceph, 2 compute nodes with Ceph; Ceph for volumes and images
should also be enabled.
• Run upgrade on Fuel Master node.
• Check that upgrade has succeeded.
• Deploy a new upgraded cluster with HA Neutron VLAN manager, 3 controllers, 2 compute nodes and 1
Cinder.
• Run OSTF.

Bonding
You can run the following tests on the supported operating system:

1. Deploy cluster in HA mode for Neutron VLAN with bonding. Steps to perform:

• Create a cluster.
• Add 3 nodes with controller role.
• Add 2 nodes with compute role.
• Set up bonding for all interfaces in active-backup mode.


• Deploy the cluster.


• Run network verification test.
• Run OSTF.
2. Deploy cluster in HA mode for Neutron GRE with bonding. Steps to perform:

• Create a cluster.
• Add 3 nodes with controller role.
• Add 2 nodes with compute role.
• Set up bonding for all interfaces in balance-slb mode.
• Deploy the cluster.
• Run network verification test.
• Run OSTF.

Failover testing scenarios

Warning
These scenarios are destructive and you should not try to reproduce them.

1. Neutron L3-agent rescheduling after L3-agent dies. Steps to perform:

• Create a cluster (HA mode, Neutron with GRE segmentation).


• Add 3 nodes with controller role.
• Add 2 nodes with compute role.
• Add one node with Cinder role.
• Deploy the cluster.
• Manually reschedule the router from the primary controller to another one (see the sketch after this list).
• Stop the L3 agent on the new node with the pcs resource ban p_neutron-l3-agent NODE command.
• Check whether the L3 agent has been rescheduled.
• Check network connectivity to the instance from the DHCP namespace.
• Run OSTF.
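A hedged sketch of the manual rescheduling step using the Neutron CLI; the router and agent IDs are placeholders that you should take from your environment:

neutron l3-agent-list-hosting-router <ROUTER_ID>            # find the L3 agent hosting the router
neutron l3-agent-router-remove <OLD_AGENT_ID> <ROUTER_ID>   # detach the router from that agent
neutron l3-agent-router-add <NEW_AGENT_ID> <ROUTER_ID>      # attach it to an agent on another controller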
2. Deploy nova-network environment with Ceph in HA mode. Steps to perform:

• Create a cluster with Ceph for images and volumes.


• Add 3 nodes with controller and Ceph OSD roles.


• Add 1 node with Ceph OSD role.


• Add 2 nodes with compute and Ceph OSD roles.
• Deploy the cluster.
• Check Ceph status with the ceph health command. The command output should contain HEALTH_OK.
• Destroy a node with Ceph role and check Ceph status.
• Run OSTF and check Ceph status.
• Destroy the compute node with Ceph and check Ceph status.
• Run OSTF and check Ceph status.
• Restart 4 online nodes.
• Run OSTF and check Ceph status.
3. Monit on compute nodes for nova-network and Neutron. Steps to perform:

• Deploy an HA cluster with nova-network or Neutron, 3 controllers, and 2 compute nodes.


• SSH to each compute node.
• Kill the nova-compute service.
• Check that the service has been restarted by Monit (see the sketch after this list).
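A minimal sketch of the kill-and-verify step, assuming the standard nova-compute process name:

pkill -f nova-compute          # kill the service
sleep 60                       # give Monit time to react
ps ax | grep [n]ova-compute    # the process should be running again with a new PID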
4. Pacemaker restarts heat-engine when the AMQP connection is lost. Steps to perform:

• Deploy HA cluster with nova-network or Neutron, 3 controllers and 2 compute nodes.


• SSH to any controller.
• Check heat-engine status.
• Block heat-engine AMQP connections (see the sketch after this list).
• Check that heat-engine has stopped on the current controller.
• Unblock heat-engine AMQP connections.
• Check that heat-engine process is running with new pid.
• Check that AMQP connection has re-appeared for heat-engine.
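A hedged example of blocking the AMQP connections with iptables; it assumes the default AMQP port 5672, so check the RabbitMQ port used in your environment first:

iptables -I OUTPUT -p tcp --dport 5672 -j DROP    # block outgoing AMQP traffic
# ... wait until heat-engine is stopped on this controller ...
iptables -D OUTPUT -p tcp --dport 5672 -j DROP    # unblock AMQP traffic again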
The following testing scenarios (from 5 to 11) may be run with either nova-network or Neutron.

5. Shut down primary controller. Steps to perform:


• Deploy a cluster with 3 controllers and 2 compute nodes.


• Destroy the primary controller.
• Check Pacemaker status: all nodes must be online after running crm_mon -1 command.
• Wait until MySQL Galera is up (command should return "On"):

SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS WHERE VARIABLE_NAME \


= 'wsrep_ready'

• Run OSTF.
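A minimal sketch of the wait loop, assuming local root access to MySQL on a controller:

until mysql -Nse "SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS \
  WHERE VARIABLE_NAME = 'wsrep_ready';" 2>/dev/null | grep -q ON; do
  sleep 10    # keep polling until the node reports ON
done
echo "Galera is ready"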
6. Shut down non-primary controller. Steps to perform:

• Deploy a cluster with 3 controllers and 2 compute nodes.


• Destroy non-primary controller.
• Check Pacemaker status: all nodes must be online after running crm_mon -1 command.
• Wait until MySQL Galera is up (it must return "On"):

"SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS WHERE VARIABLE_NAME \


= 'wsrep_ready'

• Run OSTF.
7. Shut down management interface on the primary controller.

Note
Using ifdown, ifup, or commands that call them can cause the Corosync service to update the cluster
state and in most cases leads to a so-called split-brain: the test will fail. Instead, use ip link
set <ethX> down or physically disconnect the interface.

Steps to perform:


• Deploy a cluster with 3 controllers and 2 compute nodes.


• Disconnect eth2 of the first controller via iptables (see the sketch after this list).
• Check Pacemaker status: all nodes must be online after running crm_mon -1 command.
• Wait for vip__ resources to migrate to the working controllers.
• Run 'smoke' OSTF tests.
• Restore connectivity to the first controller.
• Wait until Pacemaker specifies the lost controller as online.
• Wait for Pacemaker resources to become operational on all controllers.
• Run "sanity" and "smoke" OSTF tests.
• Repeat steps described above (from disconnecting eth2) for another controller.
• Run OSTF.
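A hedged example of disconnecting the management interface with iptables, assuming it is eth2 as in the scenario above:

iptables -I INPUT -i eth2 -j DROP     # drop all incoming traffic on eth2
iptables -I OUTPUT -o eth2 -j DROP    # drop all outgoing traffic on eth2
# To restore connectivity later:
iptables -D INPUT -i eth2 -j DROP
iptables -D OUTPUT -o eth2 -j DROP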
8. Delete all management and public vIPs on all controller nodes. Steps to perform:

• Delete all secondary vIPs.


• Wait until the vIPs are restored.
• Ensure that the vIPs have been restored.
• Run OSTF.
9. Terminate HAProxy on all controllers one by one. Steps to perform:

• Terminate HAProxy on every controller in turn (see the sketch after this list).


• Wait until it is restarted.
• Go to another controller and repeat the steps above.
• Run OSTF.
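A minimal sketch for one controller; it assumes HAProxy is managed by the Pacemaker p_haproxy resource, as in a default Fuel deployment:

pkill haproxy                  # terminate the HAProxy processes
sleep 60                       # give Pacemaker time to restart the resource
crm_mon -1 | grep -i haproxy   # the p_haproxy clone should be started again
ps ax | grep [h]aproxy         # the HAProxy processes should be back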
10. Terminate MySQL on all controllers one by one. Steps to perform:

• Terminate MySQL on every controller in turn.


• Wait until it is restarted.
• Verify that MySQL has restarted.
• Go to another controller.
• Run OSTF.
11. Verify that resources are configured. Steps to perform:

• SSH to controller node.


• Verify that all resources are configured.
• Go to another controller.


Rally

1. Run Rally to generate typical activity on a cluster (for example, creating or deleting instances and/or
volumes). Shut down the primary controller and start Rally:

• Ensure that vIP addresses have moved to another controller.


• Ensure that VM is reachable from the outside world.
• Check the state of Galera and RabbitMQ clusters.
2. HA load testing with Rally. Steps to perform:
• Deploy an HA cluster with Neutron GRE or VLAN, 3 controller nodes with MongoDB, and 4 compute nodes with
Ceph OSD. Ceph for volumes and images should also be enabled in the Storage settings.
• Create an instance.
• Wait until instance is created.
• Delete the instance.
• Run Rally to generate the same activity on the cluster. On average, 500-1000 VMs should be created
using 50, 70, or 100 parallel requests.


OpenStack Database Backup and Restore with Percona


XtraBackup
With the procedure described in this topic you will be able to back up and restore your OpenStack MySQL
database.
You will need to put the OpenStack environment into maintenance mode.
In the maintenance mode the following services will be unavailable:

• MySQL and HAProxy on the selected controller node


• HAProxy on other controller nodes in the cluster for a short time

Backing up with Percona XtraBackup


1. Enable the HAProxy stats socket for every controller in a cluster:

1. Open the /etc/haproxy/haproxy.cfg file for editing.


2. Find the stats socket /var/lib/haproxy/stats line in the global section and add level admin at the end of
the line (see the sketch below).
3. Restart HAProxy in one of the following ways:

• Execute /usr/lib/ocf/resource.d/mirantis/ns_haproxy reload on every controller


Or

• Reload all HAProxy instances on all controllers in a cluster, with a temporary service stop, by running
the crm resource restart p_haproxy command.
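A hedged sketch of the configuration change on one controller; verify the resulting line before restarting HAProxy:

sed -i '/stats socket \/var\/lib\/haproxy\/stats/ s/$/ level admin/' /etc/haproxy/haproxy.cfg
grep 'stats socket' /etc/haproxy/haproxy.cfg    # the line should now end with "level admin"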
2. On the Fuel Master node, run the fuel nodes | grep controller command. If the node that you are going to
back up is a host for a Neutron agent, you can move the agent to a different controller with the following
command:

ssh node-1
pcs resource move agent_name node_name

where "node-1" is the name of the node from which you would like to move.
3. For every controller in the cluster, put the MySQL service into maintenance mode by running the following
command from the Fuel Master node:

ssh -t node-1 'echo "disable server mysqld/node-1" | socat stdio /var/lib/haproxy/stats

4. Put the node into maintenance mode for Pacemaker:

ssh node-1
crm node maintenance


where "node-1" is the name of the node from which you would like to move.
5. Stop data replication on the selected MySQL instance:

mysql -e "SET GLOBAL wsrep_on=off;"

6. Run the backup:

xtrabackup --backup --stream=tar ./ | gzip - > backup.tar.gz

7. Alternatively, make a streaming backup as described in the Percona guide.


8. Move the archive file to a safe place (see the verification sketch after this procedure).
9. Re-enable the data replication:

mysql -e "SET GLOBAL wsrep_on=on;"

10. Take the MySQL service out of maintenance mode with the following command for every controller in the
cluster:

ssh -t node-1 'echo "enable server mysqld/node-1" | socat stdio /var/lib/haproxy/stats'

11. Put the node into the ready mode:

ssh -t node-1 crm node ready

where "node-1" is the node that you have backed up.

Restoring with Percona XtraBackup


1. Remove grastate.dat (e.g. move to a different place) on all nodes:

ssh node-1 mv /var/lib/mysql/grastate.dat /var/lib/mysql/grastate.old


ssh node-2 mv /var/lib/mysql/grastate.dat /var/lib/mysql/grastate.old
ssh node-3 mv /var/lib/mysql/grastate.dat /var/lib/mysql/grastate.old

2. Extract the database backup file on the first controller:

ssh node-1 'cd /var/lib/mysql/ ;tar -xvzf clear-base.tgz'

where "node-1" is the node that you have backed up.


3. Change the owner:

chown -R mysql:mysql /var/lib/mysql


4. Export the variables for mysql-wss on all nodes:

export OCF_RESOURCE_INSTANCE=p_mysql
export OCF_ROOT=/usr/lib/ocf
export OCF_RESKEY_socket=/var/run/mysqld/mysqld.sock

5. Export the variable for mysql-wss on the first node:

export OCF_RESKEY_additional_parameters="--wsrep-new-cluster"

6. Start mysqld on the first controller:

/usr/lib/ocf/resource.d/fuel/mysql-wss start

7. Start mysqld on all other controllers:

/usr/lib/ocf/resource.d/fuel/mysql-wss start

8. Copy the extracted database backup.


9. Check the crm status for all nodes.
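A minimal sketch of this check, run on any controller; the exact resource names may differ in your environment:

crm status | grep -A 3 -i mysql                      # the Galera clone should be started on all controllers
mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size';"    # should report the number of controllers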


Writing a bootable Fuel ISO to a USB drive


Having downloaded a Fuel ISO, and having plugged in your USB drive, issue the following command:

# dd if=/path/to/your/ISO of=/path/to/your/USB/drive

where /path/to/your/ISO is the path to your Fuel ISO, and /path/to/your/USB/drive is the path to your USB drive.
For example, if your Fuel ISO is in the /home/user/fuel-isos/ folder and your USB drive is at /dev/sdc, issue the
following:

# dd if=/home/user/fuel-isos/fuel-7.0.iso of=/dev/sdc

Note
This operation will wipe all the data you have on the USB drive and will place a bootable Fuel ISO on
it. You also have to write the ISO to the USB drive itself, not to a partition on it.
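If you want to confirm that the image was written correctly, a minimal sketch using the same example paths:

sync    # flush the write buffers
cmp -n $(stat -c%s /home/user/fuel-isos/fuel-7.0.iso) \
  /home/user/fuel-isos/fuel-7.0.iso /dev/sdc && echo "Write verified"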


Deploying an Empty Role through Fuel CLI


Make sure there are zero environments:

[root@nailgun tmp]# fuel env


id | status | name | mode | release_id | pending_release_id
---|--------|------|------|------------|-------------------

Check the operating systems:

[root@nailgun tmp]# fuel release


id | name | state | operating_system | version
---|----------------------|-------------|------------------|-------------
2 | Kilo on Ubuntu 14.04 | available | Ubuntu | 2015.1.0-7.0
1 | Kilo on CentOS 6.5 | unavailable | CentOS | 2015.1.0-7.0

Note down the numbers under the id column. You will need these later.
Check the existing nodes:

[root@nailgun tmp]# fuel node


id | status | name | cluster | ip | mac | roles |
---|----------|------------------|---------|-------------|-------------------|-------|
10 | discover | Untitled (8a:15) | None | 10.109.0.4 | 64:dd:40:75:8a:15 | |
8 | discover | Untitled (96:c1) | None | 10.109.0.5 | 64:2e:f0:06:96:c1 | |
9 | discover | Untitled (8b:4a) | None | 10.109.0.3 | 64:7b:44:59:8b:4a | |
7 | discover | Untitled (d2:bf) | None | 10.109.0.12 | 64:79:31:7a:d2:bf | |

pending_roles | online | group_id
--------------|--------|---------
              | True   | None
              | True   | None
              | True   | None
              | False  | None

There are three nodes online: 8,9,10


Create a new environment:

[root@nailgun tmp]# fuel env create --name test --release 1


Environment 'test' with id=4, mode=ha_compact and network-mode=neutron was created!

Check if the environment has been created:

[root@nailgun tmp]# fuel env


id | status | name | mode | release_id | pending_release_id


---|--------|------|------------|------------|-------------------
4 | new | test | ha_compact | 1 | None

Note down the id of the environment. You will need this later.
Check the existing roles:

[root@nailgun tmp]# fuel role --release 1


name | id
--------------|---
controller | 1
compute | 2
cinder | 3
cinder-vmware | 4
ceph-osd | 5
mongo | 6
base-os | 7

The role that you need is base-os.


Add the node whose id is 8 and the role is base-os to the environment whose id is 4:

[root@nailgun tmp]# fuel node set --env 4 --node 8 --role base-os


Nodes [8] with roles ['base-os'] were added to environment 4

Check the results:

[root@nailgun tmp]# fuel node


id | status | name | cluster | ip | mac | roles |
---|----------|------------------|---------|-------------|-------------------|-------|
10 | discover | Untitled (8a:15) | None | 10.109.0.4 | 64:dd:40:75:8a:15 | |
8 | discover | Untitled (96:c1) | 4 | 10.109.0.5 | 64:2e:f0:06:96:c1 | |
9 | discover | Untitled (8b:4a) | None | 10.109.0.3 | 64:7b:44:59:8b:4a | |
7 | discover | Untitled (d2:bf) | None | 10.109.0.12 | 64:79:31:7a:d2:bf | |

pending_roles | online | group_id
--------------|--------|---------
              | True   | None
base-os       | True   | 4
              | True   | None
              | False  | None

Your node with an empty role has been added to the cluster.


Configuring repositories
You may need to configure repositories to:

• Download Ubuntu packages


• Apply patches
By default, your OpenStack environments are configured with repositories that point to the Mirantis
update and security repository mirrors. There is also an Auxiliary repository configured on the Fuel Master node,
which can be used to deliver packages to the nodes.
For the details, go to Settings > General > Repositories.

To change the list of repositories, amend the fields that contain the required information for the repository
configuration, depending on the distribution you install.

For Ubuntu, each repository entry consists of a repository name, an apt sources list string, and a repository priority:

repo-name | apt-sources-list-string                     | repo-priority
my-repo   | deb http://my-domain.local/repo trusty main | 1200

Repository priorities
The process of setting up repositories and repository priorities is the same as on your Linux distribution.
For more information, see the documentation to your Linux distribution.


Downloading Ubuntu system packages


In Fuel 6.0 and older, you could select an Ubuntu release in Fuel and deploy it, because all the Ubuntu
packages were located on the Fuel Master node by default.
Now all Ubuntu packages are downloaded from the official Ubuntu mirrors by default, but you can specify
another mirror in the Fuel web UI or by using Fuel CLI.
Updates to the Mirantis packages are fetched from the Mirantis mirrors by default.

Note
To be able to download Ubuntu system packages from the official Ubuntu mirrors and Mirantis packages
from the Mirantis mirrors you need to make sure your Fuel Master node and Slave nodes have Internet
connectivity.

To change the Ubuntu system package repositories from the official ones to your company's local ones, do the
following:

1. In Fuel web UI, navigate to the Settings tab and then scroll down to the Repositories section.
2. Change the path under URI.

Note
You can also change the repositories after a node is deployed, but the new repository paths will only be
used for the new nodes that you are going to add to a cluster.

See also Configuring repositories.


There is also a fuel-createmirror script on the Fuel Master node that you can use to synchronize Ubuntu
packages to the Fuel Master node.

Setting up local mirrors


You can create and update local mirrors of Mirantis OpenStack and/or Ubuntu packages using the
fuel-createmirror script.

Note
The script supports only rsync mirrors. Please refer to the official upstream Ubuntu mirrors list.


The script uses a Docker container with Ubuntu to support dependencies resolution.
The script can be installed on any Red Hat based or Debian based system. On a Debian based system it requires
only bash and rsync. On a Red Hat based system it also requires docker-io, dpkg, and dpkg-devel packages (from
Fedora).
When run on the Fuel Master node, the script will attempt to set the created Mirantis OpenStack and/or Ubuntu
local repositories as the default ones for new environments, and apply these repositories to all the existing
environments in the "new" state. This behavior can be changed by using the command line options described
below.
The script supports running behind an HTTP proxy as long as the proxy is configured to allow proxying to Port
873 (rsync). The following environment variables can be set either system-wide (via ~/.bashrc), or in the script
configuration file (see below):

http_proxy=http://username:password@host:port/
RSYNC_PROXY=username:password@host:port

You may also want to configure Docker to use the proxy to download the Ubuntu image needed to resolve the
packages dependencies. Add the above environment variables to the file /etc/sysconfig/docker, and export them:

http_proxy=http://username:password@host:port/
RSYNC_PROXY=username:password@host:port
export http_proxy RSYNC_PROXY

Then, restart the docker daemon:

service docker restart

Or alternatively (recommended), reboot the Fuel Master node.


Issue the following command to get the fuel-createmirror help:

fuel-createmirror -h

OR

fuel-createmirror --help

To create or update a local Mirantis OpenStack mirror only, issue:

fuel-createmirror -M

OR

fuel-createmirror --mos

To create or update a local Ubuntu mirror only, issue:


fuel-createmirror -U

OR

fuel-createmirror --ubuntu

If no parameters are specified, the script will create/update both Mirantis OpenStack and Ubuntu mirrors.

Note
Options -M/--mos and -U/--ubuntu can't be used simultaneously.

To disable changing the default repositories for new environments, issue:

fuel-createmirror -d

OR

fuel-createmirror --no-default

To disable applying the created repositories to all environments, in the "new" state, issue:

fuel-createmirror -a

OR

fuel-createmirror --no-apply

Note
If you change the default password (admin) in Fuel web UI, you will need to run the utility with the
--password switch, or it will fail.

The following configuration file can be used to modify the script behavior:

/etc/fuel-createmirror/common.cfg


In this file you can redefine the upstream mirrors, set local paths for repositories, configure the upstream
packages mirroring mode, set proxy settings, enable or disable using Docker, and set a path for logging. Please
refer to the comments inside the file for more information.
The following configuration file contains the settings related to Fuel:

/etc/fuel-createmirror/fuel.cfg

If you run the script outside of Fuel node, you may need to redefine the FUEL_VERSION and the FUEL_SERVER
parameters.

Installing on a Red Hat based server


1. Configure MOS RPM repository:

tee /etc/yum.repos.d/mos-rpm.repo <<EOF


[mos-rpm]
name=MOS RPM packages
baseurl=http://mirror.fuel-infra.org/fwm/6.1/centos/os/x86_64
gpgcheck=0
enabled=0
EOF

2. Install the package and its dependencies:

yum --enablerepo=mos-rpm install fuel-createmirror

3. Check and configure the settings in /etc/fuel-createmirror/common.cfg.


4. Make sure the Docker service is up and running.
5. Run fuel-createmirror

Debian-based server
1. Configure MOS DEB repository:

echo "deb http://mirror.fuel-infra.org/mos/ubuntu/ mos6.1 main restricted"\


| sudo tee /etc/apt/sources.list.d/mos-deb.list

2. Run apt-get update, then install the package: apt-get install fuel-createmirror


3. Check and configure the settings in /etc/fuel-createmirror/common.cfg.
4. Run fuel-createmirror


Troubleshooting partial mirror


If some packages required by your installation are missing from the partial mirror created by the script,
add them to /etc/fuel-createmirror/requirements-deb.txt.
The package format to add to the requirements-deb.txt file is simple:

package1
package2
...
packageN

You can also look up the package names at the official Ubuntu website.
Having done that, restart the script. This will download all the missing packages and recreate a local partial
mirror.


Applying patches
This section describes how to apply, rollback, and verify the patches applied to the Fuel Master node and the Fuel
Slave nodes.

Introduction
Patching in brief:

• The patching feature was introduced in Mirantis OpenStack 6.1 and will not work in older releases.
• There are two types of patches: bug-fixes and security updates.
• Patches are downloaded from the Mirantis public repositories.
• You can always check what patches are available and get instructions on how to apply them at the
Maintenance Update section of the Release Notes.
• The changes that the patches introduce will be applied to the new OpenStack environments.

Usage scenarios
Default scenario

• Verify you are registered at the official Mirantis website.


• Once you are registered, you will receive regular email notifications on the available patches with a link to
the documentation on how to apply the patches. The documentation is available in the Maintenance
Updates section of the Release Notes.
• By default, you download patches from the default Mirantis mirrors. The patching repositories are as
follows:


• Check each patching item and proceed with the instructions (plan accordingly: for example, schedule a
maintenance slot to run the update).

• Patching Fuel Master node:

• Run the command specified in the documentation to install the patch.


• After the patch is installed, restart the affected service as specified in the documentation.
• Patching a slave node:

• Run the command specified in the documentation to download the patch.


• Run the command specified in the documentation to install the patch.
Custom scenario: deploying from local mirrors; patching from local mirrors
In this custom scenario you deploy from your local mirrors and download patches from your local mirrors.
For information on how to create and update local mirrors of Mirantis OpenStack see Configuring repositories.

• Verify you are registered at the official Mirantis website.


• Once you are registered, you will receive regular email notifications on the available patches with a link to
the documentation on how to apply the patches. The documentation is available in the Maintenance
Updates section of the Release Notes.
• Check each patching item and proceed with the instructions (plan accordingly).

• Patching Fuel Master node:

• Make sure your local mirror is up to date: run fuel-createmirror -M


• Run the command specified in the documentation to download the patch.
• Run the command specified in the documentation to install the patch.
• After the patch is installed, restart the affected service as specified in the documentation.
• Patching a slave node:

• Run the command specified in the documentation to download the patch.


• Run the command specified in the documentation to install the patch.
Custom scenario: deploying from Mirantis mirrors; patching from local mirrors
In this custom scenario you deploy from Mirantis mirrors and download patches from your local mirrors.

• Verify you are registered at the official Mirantis website.


• Configure your local mirrors to download patches from Mirantis mirrors as described in Configuring
repositories.
• Once you are registered, you will receive regular email notifications on the available patches with a link to
the documentation on how to apply the patches. The documentation is available in the Maintenance
Updates section of the Release Notes.
• Check each patching item and proceed with the instructions (plan accordingly).


• Patching Fuel Master node:

• Make sure your local mirror is up to date: run fuel-createmirror -M


• Run the command specified in the documentation to download the patch.
• Run the command specified in the documentation to install the patch.
• After the patch is installed, restart the affected service as specified in the documentation.
• Patching a slave node:

• Run the command specified in the documentation to download the patch.


Additional information
Rolling back patches

Note
Use the instructions listed here only for Mirantis OpenStack 6.1 and 7.0.

Note
The rollback instructions listed here are for advanced administrators. If you are not sure how to plan and
execute the rollbacks, your best option is to contact Mirantis support.

Rolling back Fuel Master node

• Roll back the packages on the Fuel Master node. Refer to this article as an example.
• Roll back all the changes to the configuration you made when applying the patching instructions.
• Run dockerctl destroy all.
• Run dockerctl start all.
• Wait for bootstrap to complete.

Rolling back an Ubuntu slave node

• Evacuate all the running resources from the node.


• Make sure new workloads are not scheduled to the node: put Nova services into maintenance mode, put
Pacemaker into maintenance mode, and so on.
• Look up the packages you want to roll back in /var/log/apt/history.log and
/var/log/dpkg.log.


• Figure out where to get the old package version. Run apt-cache policy.
• Figure out if the old package version is available locally.
• If it is, install these versions using dpkg. Otherwise, check the snapshots of previous repositories on
http://mirror.fuel-infra.org/mos/snapshots and pick the repository that contains the packages you need.
• Add this repository to the environment configuration.
• On the Fuel Master node run:

fuel node --node-id <comma_separated_list_of_nodes_you_want_to_update_repo> \


--tasks upload_core_repos

This will propagate the new repos configuration.


• Install the packages with specific versions:
apt-get install <pkg1>=<ver1> <pkg2>=<ver2>
• Roll back all the changes to the configuration you made when applying the patching instructions.
• Reboot the node.
Applying all accumulated changes in one go

Note
This set of actions should be applied carefully and with consideration. It is strongly recommended that
you do this on your test staging environment before applying the updates to production.

It is a good practice to apply the updates node by node so that you can stop the update procedure whenever an
issue occurs. It is also strongly recommended to back up all sensitive data that can be altered continuously
during the whole lifetime of your environment and the Fuel Master node.
These instructions assume that if you add any custom repositories to your environment configuration, these
commands will update your environment taking packages from these repositories.

Patching Fuel Master node

• Back up your data with dockerctl backup. This will save the data to /var/backup/fuel/.
• Run yum update.
• Run dockerctl destroy all.
• Run dockerctl start all.
• Wait for the new containers deployment to finish.

Patching an Ubuntu slave node


• Run apt-get update.
• Run apt-get upgrade.
• Apply all the additional configuration options as described in the supporting documentation.
• Reboot the node.

Applying Puppet changes on a slave node


You may want to apply all changes on a slave node or run a single granular task so that Fuel Puppet changes
take effect.
To run a complete Puppet cycle on a slave node, run:

• Update fuel-libraryX.X on the Fuel Master node: yum update


• Run fuel node --node NODE_ID --deploy
If you want to just update the Puppet manifests and apply a single task, run:

• Update fuel-libraryX.X on the Fuel Master node: yum update


• Run fuel node --node node-XX --tasks rsync_core_puppet hiera globals TASK

Note
The tasks rsync_core_puppet, hiera, and globals are required for processing any Puppet changes.

Verifying the installed packages on the Fuel Master node


After you apply a patch to the Fuel Master node, you can verify that the Fuel Master node is using the latest
packages.
To verify the packages on the Fuel Master node:

1. Log in to the Fuel Master node CLI.


2. Type:

yum clean expire-cache


yum -y update

Verifying the installed packages on the Fuel Slave nodes


When you apply a patch to the Fuel Slave nodes, ensure that the versions of packages on all Fuel Slave nodes are
identical. Therefore, verify that the Fuel Slave nodes within one OpenStack environment have the same
repository configuration and that the same versions of packages are installed on all nodes.
To verify the packages are up-to-date on the Fuel Slave nodes:

1. Log in to the Fuel Slave node CLI.


2. Update the list of available packages:

apt-get update

3. Update all packages:

apt-get upgrade

4. Log in to the Fuel Master node GUI:


5. Click Support.
6. Generate and download a diagnostic snapshot by clicking Generate Diagnostic Snapshot.
The Fuel Master node generates ubuntu_installed_debs.txt.
7. Analyze ubuntu_installed_debs.txt to verify the versions of the packages.
Additionally, you can analyze the ubuntu_repo_list.txt file to verify the repositories. See also the CLI sketch below.
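A hedged CLI alternative that you can run from the Fuel Master node; the node names are examples:

for n in node-1 node-2 node-3; do
  ssh "$n" "dpkg-query -W -f='\${Package} \${Version}\n'" | sort > "/tmp/pkgs-$n.txt"
done
diff /tmp/pkgs-node-1.txt /tmp/pkgs-node-2.txt    # repeat for the remaining node pairs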


Using the reduced footprint feature


With the reduced footprint feature you can spawn virtual machines on nodes.
This can be useful in the following scenarios (but not limited to them):

• Run a minimal two node cluster on a single physical machine.


• Put external services on the spawned virtual machines (e.g. a monitoring service).
• Run three controllers on virtual machines on three different physical machines.

Reduced footprint flow in brief


Minimal requirements:

• Two bare metal nodes. Alternatively, you can have one virtual machine (with Fuel installed on it) and one
bare metal node.
• Fuel 7.0 ISO.
Deployment flow:

1. Install Fuel on a bare metal or virtual machine.


2. Boot another bare metal machine via Fuel PXE.
3. Enable the Advanced feature group in Fuel.
4. Create a new environment in Fuel.
5. Optionally, modify the libvirt VM template on the Fuel Master node:
/etc/puppet/modules/osnailyfacter/templates/vm_libvirt.erb. The default template
supports tunneling segmentation. If you use VLAN segmentation, change the bridge name 'br-mesh' to
'br-prv' and set the interface type to 'openvswitch'. For example:

<interface type='bridge'>
<source bridge='br-prv'/>
<virtualport type='openvswitch'/>
<model type='virtio'/>
</interface>

6. If you use tagged VLANs (VLAN segmentation or 'Use VLAN tagging' in the "Networks" tab), you should
upload a network template. For details see Using Networking Templates. See also network template samples
for reduced footprint:

• VLAN segmentation
• VLAN tagging
7. Assign the "virt" role to the discovered node.
8. Upload the virtual machine configuration to Fuel.
9. Provision the bare metal node with the "virt" role. This will also spawn the virtual machines.
10. Assign roles to the spawned and discovered virtual machines.


11. Deploy the environment.


12. Migrate the Fuel server as an additional virtual machine located on the physical server.

Reduced footprint flow detailed


1. Install Fuel on a bare metal or virtual machine as described in the Fuel Installation Guide.
2. Boot another bare metal machine via Fuel PXE.
3. Enable the Advanced feature group in Fuel. On the Fuel Master node, edit the
/etc/fuel/version.yaml file and add advanced under feature_groups. Here is a sample:

VERSION:
feature_groups:
- mirantis
- advanced

Having added "advanced" to the yaml file, issue the following commands:

dockerctl shell nailgun


supervisorctl restart nailgun

4. Create a new environment in Fuel.


5. Assign the "virt" role to the discovered node. On the Fuel Master node, issue the following command:

fuel --env-id=<ENV_ID> node set --node-id=<NODE_ID> --role=virt

where <NODE_ID> is the ID of a specific node (a number) that you can get by issuing the
fuel nodes command, and <ENV_ID> is the environment ID that you can get by issuing the
fuel environment command.
For example:

fuel --env-id=1 node set --node-id=1 --role=virt

6. Upload the virtual machine configuration to Fuel. On the Fuel Master node, issue the following command:

fuel2 node create-vms-conf <NODE_ID> --conf '{"id":<VM_ID>, \


"mem":<MEMORY_SIZE>,"cpu":<CPU_CORE_COUNT>}'

For example:

fuel2 node create-vms-conf 2 --conf '{"id":1,"mem":2,"cpu":4}'

where <NODE_ID> is "virt" node ID, <VM_ID> is VM ID that should be unique on that "virt" node,
<MEMORY_SIZE> is the memory amount in gigabytes, and <CPU_CORE_COUNT> is the number of CPUs.

7. Provision the bare metal node with the "virt" role and spawn the virtual machines. At this point, you can go
back to the Fuel web UI and click the Provision VMs button on the Dashboard.
Alternatively, you can do this through Fuel CLI on the Fuel Master node by issuing the following command:

fuel2 env spawn-vms <CLUSTER_ID>

For example:

fuel2 env spawn-vms 1

8. Assign controller roles to the spawned virtual machines in the Fuel web UI. Alternatively, you can do this
through Fuel CLI by issuing the following command:

fuel --env-id=<ENV_ID> node set --node-id=<NODE_ID> --role=controller

You can specify several nodes with the --node-id parameter. For example:

fuel --env-id=1 node set --node-id=2,3,4 --role=controller

9. Deploy the environment using Fuel UI or Fuel CLI. If you deploy the OpenStack environment using Fuel CLI,
type:

fuel --env <ENV_ID> node --deploy --node-id=<NODE_ID>

You can specify several nodes with the --node-id parameter. For example:

fuel --env 1 node --deploy --node-id=1,2,3,4

10. Use the fuel-migrate script to migrate the Fuel Master node into a virtual machine on a compute node.
This allows for reduced resource use in small environments and lets the Fuel Master node run on physical
or virtual machines by essentially making it host agnostic.
To run the script issue the following command:

fuel-migrate

Note
This will give you all the available parameters to properly do the migration with the
fuel-migrate script.


Simple usage scenario:

1. Identify the node with the compute role by issuing the following command on the Fuel Master node
(and checking its output):

fuel node

2. Run the migration script:

fuel-migrate <DESTINATION_COMPUTE>

where <DESTINATION_COMPUTE> is the name or IP address of the destination compute node where
the virtual machine will be created.
For example:

fuel-migrate node-1

Or:

fuel-migrate 192.168.116.1

Note
You can get the node name or the IP address by issuing the fuel node command.

Once you start the script, it will do the following:

1. Create a blank disk image on the destination node, define the virtual machine, start the virtual
machine, and boot with Fuel PXE server.
2. Partition the disk on the destination node.
3. Reboot the Fuel Master node into maintenance mode and synchronize the data.
4. Swap the IP address on the source and destination Fuel Master nodes. It will then reboot the
destination virtual machine.
An indication that the script has run successfully is the following message (with additional details
on how to proceed) that you will see after you log in to the Fuel Master node via SSH:

Congratulation! You are on cloned Fuel now!


The migration tasks have completed. The clone should be up and
functioning correctly.

Additional notes:

• You can define the destination disk size in gigabytes with the --fvm_disk_size parameter.

For example:

fuel-migrate node-1 --fvm_disk_size=50g

• By default, the destination node will use the admin network interface. If you need to create additional
interfaces, you can do so with the --other_net_bridges parameter.
For example:

fuel-migrate node-1 --other_net_bridges=eth1,,virbr13

Note
Pay attention that --other_net_bridges uses three parameters, and if you skip one of
these as in this example, you still need to separate it with commas ,,.

• By default, the migration log file is /var/log/fuel-migrate.log. If the migration fails, check the log for
errors.
Custom usage example:

fuel-migrate 192.168.116.1 --admin_net_br=virbr12 --del_vm \


--other_net_bridges=eth1,,virbr13 --fvm_disk_size=100g \
--dkvm_folder=/var/lib/libvirt/images/

This example will do the following:

1. Set the destination compute node with the IP address 192.168.116.1


2. Use virbr12 on the host to connect to the admin interface (which, in this case, is the public network
connected to the current Fuel Master node).
3. Remove the destination virtual machine if it exists.
4. Use virbr13 for Ethernet 1.
5. Set the destination disk size to 100 GB.
6. Set the path to the folder on the KVM host where the disk will be created to /var/lib/libvirt/images/


Switching on SSL and secure access


To configure secure access to Horizon and OpenStack public endpoints, go to Settings > Security, and configure
public TLS:

Horizon dashboard and the OpenStack publicURL endpoints


The HTTPS for Horizon checkbox enables SSL access to the Horizon dashboard.

Note
With the HTTPS enabled, you are not able to access the Horizon dashboard through plain HTTP. You will
automatically be redirected to HTTPS port 8443.

The TLS for OpenStack public endpoints checkbox enables TLS termination on HAProxy for OpenStack services.

Note
With TLS for OpenStack public endpoints enabled, you are not able to access the public endpoints through
plain HTTP.

After enabling one or both of the secure access options, you will need to generate or upload a certificate and
update your DNS entries:

1. Select the certificate source:

• Self-signed -- The certificate will be generated before the environment deployment.


• I have my own keypair with certificate -- You will need to upload a file with the certificate information
and a private key that can be consumed by HAProxy. For detailed information read HOWTO SSL
NATIVE IN HAPROXY.
2. Update your DNS entries -- Set the DNS hostname for public TLS endpoints. This hostname will be used in
the two following cases:

• When setting up DNS in the cluster.


• As a name for OpenStack services when adding them to Identity. For example, you will see this name
when you issue the keystone endpoint-list command on one of the Controllers in a
deployed cluster.

HTTPS access to the Fuel Master node


You can now access the Fuel Master node through HTTPS. To do this, you will need to use port 8443:
https://10.20.0.2:8443

Additional information
• Changing keypairs for a cluster -- There is currently no automated way to do this. You can manually change
the keypair in /var/lib/astute/haproxy/public_haproxy.pem on Controller and Compute
nodes. Make sure you restart the HAProxy service after you edit the file (see the sketch after this list).
• Changing keypairs for the Fuel Master node -- You need to write the key to
/var/lib/fuel/keys/master/nginx/nginx.key and the certificate to
/var/lib/fuel/keys/master/nginx/nginx.crt. Make sure you restart nginx after that.
• Making access to the Fuel Master node HTTPS only -- Edit the /etc/fuel/astute.yaml file so that it
contains the following:

SSL:
force_https: true

Note
Currently Fuel CLI does not support HTTPS. You will also need to update fuel-nailgun-agent
on all nodes deployed with releases older than 7.0, otherwise they will be reported as inactive.
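For the keypair replacement mentioned above, a minimal sketch for one controller; the certificate and key file names are examples, and the concatenation order should match your existing PEM file:

cat my_new_cert.crt my_new_key.key > /var/lib/astute/haproxy/public_haproxy.pem
crm resource restart p_haproxy    # restart the Pacemaker-managed HAProxy resource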


Using Networking Templates


Starting with Fuel 7.0 you can use networking templates. Templates allow for more flexible network
configurations and provide you with the following abilities:

• Ability to create additional networks (e.g. an extra network for Swift) and delete networks.
• Have a specific set of network roles.
• Ability to create a network only if a relevant node role is present on the node.
• Ability to provide custom networking topologies (e.g. subinterface bonding).

Networking Templates Limitations


• Interdependencies between templates for different node roles cannot be set.
• Network roles to networks mapping and network topology cannot be set for nodes individually. They can
only be set per node role and/or node group.
• There is no UI support for networking templates. You can only operate via CLI or API. The "Configure
Interfaces" tab of Fuel Web UI will become inactive after you upload a networking template.

Note
If you delete the template, Fuel's default network solution will automatically become live and all network
related sections in Fuel Web UI will become available again.

Working with Networking Templates


A networking template is a YAML file in the following format:

network_template_<ENV_ID>.yaml

where <ENV_ID> is the ID of your OpenStack environment that you can get by issuing the fuel environment
command.
For example, if the ID of your environment is 1, the name of the template must be
network_template_1.yaml to operate with the template via Fuel CLI.

Networking Templates Samples


You can download samples from the network_templates repository folder.


Note
There is no default or generated template in your Fuel installation provided by default.

Networking Templates Structure


Each template consists of five major sections.

• adv_net_template -- This is the network configuration template for the environment. The template
operates with node groups. Sample:

adv_net_template:
default: # name of the node group
nic_mapping:
...
templates_for_node_role:
...
network_assignments:
...
network_scheme:
...
group_11: # name of the node group
nic_mapping:
templates_for_node_role:
network_assignments:
network_scheme:

The following four sections are defined for each node group in the environment. Definitions from the
default node group will be used for the node groups not listed in the template.
• nic_mapping -- Aliases to NIC names mapping are set here.
• templates_for_node_role -- List of template names for every node role used in the environment.
• network_assignments -- Endpoints used in the template body. This is where the mapping is set
between endpoints and network names to set the L3 configuration for the endpoints.
• network_scheme -- Template bodies for every template listed under templates_for_node_role

nic_mapping section detailed


Sample:

nic_mapping:
default:
if1: eth0
if2: eth1


if3: eth2
if4: eth3
node-33:
if1: eth1
if2: eth3
if3: eth2
if4: eth0

NIC aliases (e.g. "if1") are used in templates in the topology description in the transformations section. With
nic_mapping you can set mapping of aliases to NIC names for different nodes.
The default mapping is set for all nodes that do not have name aliases. Custom mapping can be set for any
particular node (by node name).
The number of NICs for any node may vary. It depends on the topologies defined for the nodes in templates in
the transformations section.
Use of aliases in templates is optional. You can use NIC names if all nodes have the same set of NICs and they
are connected in the same way.

templates_for_node_role section detailed


Sample:

templates_for_node_role:
controller:
- public
- private
- storage
- common
compute:
- common
- private
- storage
ceph-osd:
- common
- storage

This is where you provide the list of template names for every node role used in the environment.
The order of templates matters. The description of the topology that is in the transformations section of the
template is executed by Puppet in the order provided on its input. Also, the order of creating the networking
objects cannot be arbitrary. For example, a bridge should be created first, and the subinterface that will carry its
traffic should be created after that.
While templates can be reused for different node roles, each template is executed once for every node.
When several roles are mixed on one node, an alphabetical order of node roles is used to determine the final
order of the templates.

network_assignments section detailed


Sample:

network_assignments:
storage:
ep: br-storage
private:
ep: br-prv
public:
ep: br-ex
management:
ep: br-mgmt
fuelweb_admin:
ep: br-fw-admin

Endpoints are used in the template body. The mapping between endpoints and network names is set here so that
the networks' L3 configuration can be applied to the endpoints.
The sample above shows the default mapping which is set without a template. The set of networks can be
changed using API: networks can be created or deleted via API.

network_scheme section detailed


Sample:

network_scheme:
storage: # template name
transformations:
...
endpoints:
...
roles:
...
private:
transformations:
...
endpoints:
...
roles:
...
...

Each template has a name which is referenced in the sections above and consists of the three following sections:

• transformations -- A sequence of actions to build proper network topology is defined here. The
"transformation" from physical interface to endpoint is described here. The transformations are applied by
the Puppet l23network module and must be compatible with it.
• endpoints -- All endpoints introduced by the template.


• roles -- The mapping of network roles to endpoints. When several templates are used for one node there
should be no contradictions in this mapping.

Operating with Networking Templates

Note
The order in which you add or remove networks and load the template does not matter. However,
adding or removing networks will not make sense if a template is not uploaded for the environment at all,
because the default network solution takes into account only the networks created by default.

To upload a networking template, on the Fuel Master node issue the following command:

fuel --env <ENV_ID> network-template --upload --dir <PATH>

where <ENV_ID> is the ID of your OpenStack environment that you can get by issuing the
fuel environment command; <PATH> is the path to where your template is.
For example:

fuel --env 1 network-template --upload --dir /home/stack/

To download a networking template to the current directory, on the Fuel Master node issue the following
command:

fuel --env <ENV_ID> network-template --download

For example:

fuel --env 1 network-template --download

To delete an existing networking template, on the Fuel Master node issue the following command:

fuel --env <ENV_ID> network-template --delete

For example:

fuel --env 1 network-template --delete

To create a network group, issue the following command:


fuel network-group --create --node-group <GROUP_ID> --name \


"<GROUP_NAME>" --release <RELEASE_ID> --vlan <VLAN_ID> \
--cidr <NETWORK_CIDR>

where <GROUP_ID> is the ID of your Node group that you can get by issuing the fuel nodegroup command;
<GROUP_NAME> is the name that you would like to assign to your group; <RELEASE_ID> is the ID of your release;
<VLAN_ID> is the VLAN ID; <NETWORK_CIDR> is an IP address with an associated routing prefix.
For example:

fuel network-group --create --node-group 1 --name \


"new network" --release 2 --vlan 100 --cidr 10.0.0.0/24

To list all available network groups issue the following command:

fuel network-group list

To filter network groups by node group:

fuel network-group --node-group <GROUP_ID>

For example:

fuel network-group --node-group 1

To delete network groups:

fuel network-group --delete --network <GROUP_ID>

For example:

fuel network-group --delete --network 1

You can also specify multiple groups to delete:

fuel network-group --delete --network 2,3,4


Network Template Examples


You can use network templates to configure Fuel to use one or two networks for all OpenStack network traffic.

Configuring Two Networks


Fuel supports the two-network configuration where one network interface is dedicated to PXE traffic and
another network interface, or bond, carries all other traffic.
To configure two networks:

1. Create a new network for all non-PXE traffic:

# fuel network-group --create --name everything --cidr <cidr>


--gateway <gateway> --nodegroup <nodegroup>

2. Set the render_addr_mask parameter to internal for this network by typing:

# fuel network-group --set --network 39 --meta '{"name":


"everything", "notation": "cidr", "render_type": null, "map_priority": 2,
"configurable": true, "use_gateway": true, "render_addr_mask":
"internal", "vlan_start": null, "cidr": "10.108.31.0/24"}'

This parameter is required by the Fuel library. The Fuel library requires a value called
internal_address for each node. This value is set to the node's IP address from a network group which
has render_addr_mask set to internal in its metadata. Therefore, update render_addr_mask for this
network.
3. Save network template for two networks as network_template_<env id>.yaml.

Note
Verify that nic_mapping matches your configuration.

4. Upload the network template by typing:

# fuel network-template --upload --env <env id>

5. Deploy the environment.


6. After Fuel completes the deployment, verify that only one bridge is configured by typing:

# ip -4 a

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default


inet 127.0.0.1/8 scope host lo


valid_lft forever preferred_lft forever
8: br-fw-admin: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group
inet 10.108.5.3/24 brd 10.108.5.255 scope global br-fw-admin
valid_lft forever preferred_lft forever
16: vr-host-base: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP
inet 240.0.0.5/30 scope global vr-host-base
valid_lft forever preferred_lft forever
30: hapr-host: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP gro
inet 240.0.0.1/30 scope global hapr-host
valid_lft forever preferred_lft forever

Configuring a Single Network


Fuel supports a single network configuration where one network interface is responsible for all OpenStack traffic.
This configuration is common in the proof of concept deployments where no additional network interfaces are
available.
To configure a single network:

1. Save network template for one network as network_template_<env id>.yaml.


2. Upload the network template by typing:

# fuel network-template --upload --env <env id>

3. Deploy the environment.


4. Proceed to Configure Neutron.

Configure Neutron
After you deploy your environment, allocate the correct floating IP pool to the network.
To allocate the correct floating IP pool (see the CLI sketch after this list):

1. Clear the gateway from router04.


2. Delete the admin_floating_net__subnet subnet.
3. Create a new subnet with the floating IP pool from the single network.
4. Set gateway on router04.
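A hedged sketch of these steps using the Neutron CLI; the external network name (admin_floating_net) and the address ranges are examples that you should replace with the values from your single network:

neutron router-gateway-clear router04
neutron subnet-delete admin_floating_net__subnet
neutron subnet-create --name admin_floating_net__subnet --disable-dhcp \
  --allocation-pool start=10.108.5.100,end=10.108.5.200 \
  --gateway 10.108.5.1 admin_floating_net 10.108.5.0/24
neutron router-gateway-set router04 admin_floating_net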


Index
F
Fuel UI: Network Issues

H
Horizon
HowTo: Backport Galera Pacemaker OCF script
HowTo: Backport Memcached backend fixes
HowTo: Backport RabbitMQ Pacemaker OCF script
HowTo: Backup and Restore Fuel Master
HowTo: Create an XFS disk partition
HowTo: Functional tests for HA
HowTo: Galera Cluster Autorebuild
HowTo: Manage OpenStack services
HowTo: Troubleshoot AMQP issues
HowTo: Troubleshoot Corosync/Pacemaker
