Jan 03 2017
 

In this step-by-step guide I will take you through all the steps required to configure a highly available, 2-node MySQL cluster (plus witness server) in Google Cloud Platform (Google Compute Engine, aka GCE).  The guide includes screenshots, shell commands, and code snippets as appropriate.  I assume that you are somewhat familiar with Google Cloud Platform and already have an account.  If not, you can sign up for a free trial today.  I’m also going to assume that you have basic Linux system administration skills as well as understand basic failover clustering concepts like Virtual IPs, quorum, etc.

Disclaimer: The cloud is a rapidly moving target.  As such, features/screens/buttons are bound to change over time, so your experience may vary slightly from what you’ll see below.  While this guide will show you how to make a MySQL database highly available, you could certainly adapt this information and process to protect other applications or databases, like SAP, PostgreSQL, Oracle, WebSphere MQ, NFS file servers, and more.

These are the high level steps to create a highly available MySQL database within Google Compute Engine:

  1. Create a Project
  2. Create Instances (Virtual Machines)
  3. Create Instance Groups
  4. Create Firewall Rule to allow VNC access
  5. Linux OS Configuration
  6. Install and Configure MySQL
  7. Install and Configure Cluster
  8. Create an Internal Load Balancer
  9. Create Firewall Rules for Internal Load Balancer
  10. Test Cluster Connectivity

Overview

This article will describe how to create a cluster within a single Google Cloud region.  The cluster nodes (node1, node2 and the witness server) will all reside in the region “us-central1” (10.128.0.0/20 network) but you can select your region accordingly.

The configuration will look like this:

The following IP addresses will be used:

  • node1: 10.128.0.2
  • node2: 10.128.0.3
  • witness: 10.128.0.4
  • Internal Load Balancer IP: 10.128.0.99
  • MySQL port: 3306

Create a Project

Upon first login, you’ll see an empty Home Dashboard and will be asked to create a Project.  All Google Compute Engine resources we will be creating will belong to this Google Cloud Platform project.

Here we will call our newly created Project “LinuxCluster”:

Create Instances (Virtual Machines)

We will be provisioning 3 Virtual Machines in this guide.  The first two VMs (I’ll call them “node1” and “node2”) will function as cluster nodes with the ability to bring the MySQL database and its associated resources online.  The 3rd VM will act as the cluster’s witness server for added protection against split-brain.

To ensure maximum availability, all 3 VMs will reside in different zones within the region (in this example: us-central1-a, us-central1-b, us-central1-c).

Create “node1” Instance

Create your first VM instance (“node1”).  If this is the first time you’ve created an instance, your screen will look something like the image below.  Click the “Create Instance” button in the center of your screen:

If you have other instances already running in GCE, your screen will look a bit different.  Still, click “create instance” to continue:

Debian Linux is usually selected by default.  We *DON’T* want this, as we’ll be using CentOS 6.X in this guide.

Give the instance a name (“node1”), select the first zone (a) within our region (us-central1), and make sure to click “Change” to select the proper boot image.  You can size the instance based on your workload requirements, but for the purposes of this guide we’ll be using the default size to minimize cost, which is a fairly small VM (1 vCPU and only 3.75GB RAM).

In the Boot disk pop-up screen, select CentOS 6, and at the bottom we will go with an SSD boot disk.  10GB is more than sufficient for the purposes of this guide.  You can size your systems accordingly:

After clicking “Select”, you’ll be brought back to the Create An Instance screen.  Towards the bottom, click “Management, disk, networking, SSH keys” because we will be adding a 2nd disk to our VM.  This 2nd disk will be used to store our databases, and it’s what will later be replicated/synchronized by the clustering software.

 

Select the “Disks” tab, and click “+ Add item” to add a 2nd disk to this instance:

Click “Create disk”:

Give the new disk a name, select the desired type, and start with a blank disk.  10 GB should be more than sufficient for our needs in this example configuration.  NOTE: Remember the value you set here.  Both cluster nodes (node1 and node2) need to be the SAME SIZE:
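
If you prefer the command line, a roughly equivalent data disk can also be created and attached with the gcloud CLI once the instance exists.  A minimal sketch; the disk name (“node1-data”) is an arbitrary example:

$ gcloud compute disks create node1-data --size=10GB --type=pd-ssd --zone=us-central1-a
$ gcloud compute instances attach-disk node1 --disk=node1-data --zone=us-central1-a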

Finally, click the “Networking” tab, and give node1 a custom Internal IP (10.128.0.2).

Click “Create” to launch your new instance:

Create “node2”

Repeat the steps above to create your second cluster node (“node2”).  Create this instance just like you did node1, including the addition of the 2nd disk.

IMPORTANT: make sure it’s in a different zone (us-central1-b) and give it a unique IP (10.128.0.3)

Create the “witness” VM

Create your third VM (“witness”) and make sure it’s in a different zone (us-central1-c) from the first two instances.

NOTE: This instance DOESN’T need the extra disk added.

It may take a little while for your 3 VM instances to provision.  Once complete, you’ll see your VMs listed on the VM Instances screen within your Google Cloud Console.  Verify that you properly launched each VM into a different zone:

 

Create Instance Groups

Later in this guide we will be creating an Internal Load Balancer to route traffic to the active cluster node.  All of the possible load balancer configurations available on Google Cloud Platform require instance groups to serve the traffic sent from the load balancer.

Two instance groups will be created, and each one will contain one cluster node.

Create Instance Group 1

Give your first instance group a name (“instance-group-1a”), select “Single-zone”, and make sure to properly select the Zone where your first VM instance resides.  Here, we select us-central1-a, because that is where “node1” was deployed.  Below, choose “Select existing instances” and pick “node1” from the VM instances drop down:

Create Instance Group 2

Repeat the previous step once more, this time selecting the zone your second node resides in: us-central1-b and node2:
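
If you would rather script this step, unmanaged instance groups can also be created with the gcloud CLI.  A sketch using the names and zones from above:

$ gcloud compute instance-groups unmanaged create instance-group-1a --zone=us-central1-a
$ gcloud compute instance-groups unmanaged add-instances instance-group-1a --instances=node1 --zone=us-central1-a
$ gcloud compute instance-groups unmanaged create instance-group-1b --zone=us-central1-b
$ gcloud compute instance-groups unmanaged add-instances instance-group-1b --instances=node2 --zone=us-central1-b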

Create Firewall Rule to allow VNC access

You can see all of your existing firewall rules by navigating to:

Networking -> Firewall Rules

By default, the only ports open in the “Google firewall” from the outside world into your VMs are ping, SSH (port 22) and RDP (port 3389).

Later in the guide, we’ll be using VNC to access the desktop of “node1” and configure the cluster using a GUI.  Create a Firewall Rule to allow VNC access.  In this guide port 5902 is used.  Adjust this accordingly based on your VNC configuration.
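
For reference, the same rule can be created from the gcloud CLI.  A sketch, assuming the rule name “allow-vnc”; in practice, narrow the source range to your own IP rather than 0.0.0.0/0:

$ gcloud compute firewall-rules create allow-vnc --allow=tcp:5902 --source-ranges=0.0.0.0/0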

 

Linux OS Configuration

Next, we will need to configure the Linux OS of our instances and get our hands dirty on the command line, which as a Linux administrator you should be used to by now.

There are a number of ways to connect to the console of your linux VMs.  You can initiate an SSH connection directly from the GCE web interface, or you can install the Google Cloud SDK locally on your laptop/workstation.

To SSH using your browser, simply go to Compute -> VM Instances and, to the right of the VM you wish to connect to, under “Connect” select “Open in browser window”.

If you would prefer to have the Google Cloud command line tools installed natively on your laptop/workstation, please see the documentation here:

https://cloud.google.com/sdk/docs/quickstarts
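
With the SDK installed, you can SSH into each instance directly from your workstation; for example, for node1 (zone per the layout above):

$ gcloud compute ssh node1 --zone=us-central1-a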

Once connected use the “sudo” command to gain root privileges:

$ sudo su -

Edit /etc/hosts

Unless you already have a DNS server set up, you’ll want to create host file entries on all 3 servers so that they can properly resolve each other by name.

Add the following lines to the end of your /etc/hosts file:

10.128.0.2    node1
10.128.0.3    node2
10.128.0.4    witness
10.128.0.99   mysql-vip
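
You can quickly confirm that each name now resolves; for example:

# getent hosts node1 node2 witness mysql-vip
# ping -c 1 node2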

Disable SELinux

Edit /etc/sysconfig/selinux and set “SELINUX=disabled”:

# vi /etc/sysconfig/selinux

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted
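
If you prefer to make this change non-interactively, a one-liner like the following should work; verify the result with grep (after the reboot later in this guide, “getenforce” should report “Disabled”):

# sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/sysconfig/selinux
# grep ^SELINUX= /etc/sysconfig/selinux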

Install various RPM packages

Next, install a handful of rpm packages that will be needed later as pre-requisites for our clustering software:

# yum install redhat-lsb patch

Install and Configure VNC (and related packages)

In order to access the GUI of our Linux servers, to later configure our cluster, install a VNC server on your cluster node.  In my setup I only did this on “node1”:

# yum install tigervnc-server xterm
# vncpasswd
# vi /etc/sysconfig/vncservers

      VNCSERVERS="2:root"
      VNCSERVERARGS[2]="-geometry 1024x768"

# service vncserver start
# chkconfig vncserver on

Test connectivity by opening a VNC client on your laptop/desktop, and connecting to the Public IP of your cluster node.

Reboot Cluster Nodes

Reboot so that SELinux is disabled. All 3 systems (node1, node2, witness) need to be rebooted.

Partition and Format the “data” disk

During VM instance creation, an extra disk was added to each cluster node to store the application data we will be protecting.  In this case it happens to be MySQL databases.

The disk configuration of our VMs is as follows:

  • /dev/sda – OS disk
  • /dev/sdb – data disk

The 2nd disk added during instance creation appears as /dev/sdb.  You can run the “fdisk -l” command to verify.  You’ll see that /dev/sda (OS) already has a disk partition and is being used.

# fdisk -l

Disk /dev/sda: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00035e98

 Device Boot Start End Blocks Id System
/dev/sda1 * 1 1306 10484736 83 Linux

Disk /dev/sdb: 10.7 GB, 10737418240 bytes
64 heads, 32 sectors/track, 10240 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x762b810b


Here we will create a partition (/dev/sdb1), format it, and mount it at the default location for MySQL, which is /var/lib/mysql.  Perform the following steps on BOTH “node1” and “node2”:

# fdisk /dev/sdb
Command (m for help): n
Command action
e   extended
p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-10240, default 1): <enter>
Using default value 1
Last cylinder, +cylinders or +size{K,M,G} (1-10240, default 10240): <enter>
Using default value 10240
 
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
[root@node1 ~]#

# mkfs.ext4 /dev/sdb1
# mkdir /var/lib/mysql

On node1, mount the filesystem:

# mount /dev/sdb1 /var/lib/mysql
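
Before moving on, it’s worth verifying the new filesystem on node1.  Note that we intentionally do NOT add /dev/sdb1 to /etc/fstab; the clustering software installed later will control mounting:

# df -h /var/lib/mysql
# blkid /dev/sdb1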

Install and Configure MySQL

Next, install the MySQL packages, initialize a sample database, and set the “root” password for MySQL.

On “node1”:

# yum -y install mysql mysql-server
# /usr/bin/mysql_install_db --datadir="/var/lib/mysql/" --user=mysql
# mysqld_safe --user=root --socket=/var/lib/mysql/mysql.sock --port=3306 --datadir=/var/lib/mysql --log &
#
# # NOTE: This next command allows remote connections from ANY host.  NOT a good idea for production!
# echo "update user set Host='%' where Host='node1'; flush privileges" | mysql mysql
#
# #Set MySQL's root password to 'SIOS'
# echo "update user set Password=PASSWORD('SIOS') where User='root'; flush privileges" | mysql mysql

Create a MySQL configuration file.  We will place this on the data disk (which will later be replicated) at /var/lib/mysql/my.cnf.  Example:

# vi /var/lib/mysql/my.cnf

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
pid-file=/var/lib/mysql/mysqld.pid
user=root
port=3306
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
 
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
 
[client]
user=root
password=SIOS

Delete the original MySQL configuration file, located in /etc, if it exists:

# rm /etc/my.cnf

On “node2”:

On “node2”, you ONLY need to install the MySQL packages.  The other steps aren’t required:

[root@node2 ~]# yum -y install mysql mysql-server

Install and Configure the Cluster

At this point, we are ready to install and configure our cluster.  SIOS Protection Suite for Linux (aka SPS-Linux) will be used in this guide as the clustering technology.  It provides both high availability failover clustering features (LifeKeeper) as well as real-time, block level data replication (DataKeeper) in a single, integrated solution.  SPS-Linux enables you to deploy a “SANLess” cluster, aka a “shared nothing” cluster, meaning that cluster nodes don’t have any shared storage, as is the case with Google Compute Engine VMs.

Install SIOS Protection Suite for Linux

Perform the following steps on ALL 3 VMs (node1, node2, witness):

Download the SPS-Linux installation image file (sps.img) and obtain either a trial license or purchase permanent licenses.  Contact SIOS for more information.

You will loopback mount it and run the “setup” script inside, as root (or first “sudo su -” to obtain a root shell).

For example:

# mkdir /tmp/install
# mount -o loop sps.img /tmp/install
# cd /tmp/install
# ./setup

During the installation script, you’ll be prompted to answer a number of questions.  You will hit Enter on almost every screen to accept the default values.  Note the following exceptions:

  • On the screen titled “High Availability NFS” you may select “n” as we will not be creating a highly available NFS server
  • Towards the end of the setup script, you can choose to install a trial license key now, or later. We will install the license key later, so you can safely select “n” at this point
  • In the final screen of the “setup” select the ARKs (Application Recovery Kits, i.e. “cluster agents”) you wish to install from the list displayed on the screen.
    • The ARKs are ONLY required on “node1” and “node2”.  You do not need to install on “witness”
    • Navigate the list with the up/down arrows, and press SPACEBAR to select the following:
      • lkDR – DataKeeper for Linux
      • lkSQL – LifeKeeper MySQL RDBMS Recovery Kit
    • This will result in the following additional RPMs installed on “node1” and “node2”:
      • steeleye-lkDR-9.0.2-6513.noarch.rpm
      • steeleye-lkSQL-9.0.2-6513.noarch.rpm

Install Witness/Quorum package

The Quorum/Witness Server Support Package for LifeKeeper (steeleye-lkQWK) combined with the existing failover process of the LifeKeeper core allows system failover to occur with a greater degree of confidence in situations where total network failure could be common. This effectively means that failovers can be done while greatly reducing the risk of “split-brain” situations.

Install the Witness/Quorum rpm on all 3 nodes (node1, node2, witness):

# cd /tmp/install/quorum
# rpm -Uvh steeleye-lkQWK-9.0.2-6513.noarch.rpm

On ALL 3 nodes (node1, node2, witness), edit /etc/default/LifeKeeper, set

NOBCASTPING=1

On ONLY the Witness server (“witness”), edit /etc/default/LifeKeeper, set

WITNESS_MODE=off/none

Install a License key

On all 3 nodes, use the “lkkeyins” command to install the license file that you obtained from SIOS:

# /opt/LifeKeeper/bin/lkkeyins <path_to_file>/<filename>.lic

Start LifeKeeper

On all 3 nodes, use the “lkstart” command to start the cluster software:

# /opt/LifeKeeper/bin/lkstart

Set User Permissions for LifeKeeper GUI

On all 3 nodes, edit /etc/group and add the “tony” user (or whatever username you are logged in as) to the “lkadmin” group to grant access to the LifeKeeper GUI.  By default only “root” is a member of the group, and we don’t have the root password in GCE:

# vi /etc/group

lkadmin:x:502:root,tony
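
Alternatively, rather than editing /etc/group by hand, the same result can be achieved with usermod (substitute your own username for “tony”):

# usermod -a -G lkadmin tony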

Open the LifeKeeper GUI

Make a VNC connection to the Public IP address of node1.  Based on the VNC and Firewall Rule configuration from above, you would connect to <Public_IP>:2 using the VNC password you specified earlier.  Once logged in, open a terminal window and run the LifeKeeper GUI using the following command:

# /opt/LifeKeeper/bin/lkGUIapp &

You will be prompted to connect to your first cluster node (“node1”).  Enter the linux userid and password specified during VM creation:

lk-gui-connect1

Next, connect to both “node2” and “witness” by clicking the “Connect to Server” button highlighted in the following screenshot:

lk-gui-connect2

You should now see all 3 servers in the GUI, with a green checkmark icon indicating they are online and healthy:

lk-gui-connect3

Create Communication Paths

Right-click on “node1” and select Create Comm Path

comm path1

Select BOTH “node2” and “witness” and then follow the wizard.  This will create comm paths between:

  • node1 & node2
  • node1 & witness

comm path2

A comm path still needs to be created between node2 & witness.   Right click on “node2” and select Create Comm Path.  Follow the wizard and select “witness” as the remote server:

comm path3

At this point the following comm paths have been created:

  • node1 <—> node2
  • node1 <—> witness
  • node2 <—> witness

The icons in front of the servers have changed from a green “checkmark” to a yellow “hazard sign”.  This is because we only have a single communication path between nodes.

If the VMs had multiple NICs (creating VMs with multiple NICs won’t be covered in this article), you would create redundant comm paths between each server.

comm path4

To remove the warning icons, go to the View menu and de-select “Comm Path Redundancy Warning”:

comm path5

Result:

comm path6

Verify Communication Paths

Use the “lcdstatus” command to view the state of cluster resources.  Run the following commands to verify that you have correctly created comm paths on each node to the other two servers involved:

# /opt/LifeKeeper/bin/lcdstatus -q -d node1

MACHINE  NETWORK ADDRESSES/DEVICE   STATE     PRIO
node2    TCP     10.128.0.2/10.128.0.3  ALIVE        1
witness  TCP     10.128.0.2/10.128.0.4  ALIVE        1

# /opt/LifeKeeper/bin/lcdstatus -q -d node2

MACHINE  NETWORK ADDRESSES/DEVICE   STATE     PRIO
node1    TCP     10.128.0.3/10.128.0.2  ALIVE        1
witness  TCP     10.128.0.3/10.128.0.4  ALIVE        1

# /opt/LifeKeeper/bin/lcdstatus -q -d witness

MACHINE  NETWORK ADDRESSES/DEVICE   STATE     PRIO
node1    TCP     10.128.0.4/10.128.0.2  ALIVE        1
node2    TCP     10.128.0.4/10.128.0.3  ALIVE        1

Create a Data Replication cluster resource (i.e. Mirror)

Next, create a Data Replication resource to replicate the /var/lib/mysql partition from node1 (source) to node2 (target).  Click the “green plus” icon to create a new resource:

data replication1

Follow the wizard with these selections:

Please Select Recovery Kit:  Data Replication
Switchback Type: intelligent
Server: node1
Hierarchy Type: Replicate Existing Filesystem
Existing Mount Point: /var/lib/mysql
Data Replication Resource Tag: datarep-mysql
File System Resource Tag: /var/lib/mysql
Bitmap File: (default value)
Enable Asynchronous Replication:  No

After the resource has been created, the “Extend” (i.e. define backup server) wizard will appear.  Use the following selections:

Target Server: node2
Switchback Type: Intelligent
Template Priority: 1
Target Priority: 10
Target Disk: /dev/sdb1
Data Replication Resource Tag: datarep-mysql
Bitmap File: (default value)
Replication Path: 10.128.0.2/10.128.0.3
Mount Point: /var/lib/mysql
Root Tag: /var/lib/mysql

The cluster will look like this:

data replication2

 

Create the MySQL resource hierarchy

Next, create a MySQL cluster resource.  The MySQL resource is responsible for stopping, starting, and monitoring your MySQL database.  Click the “green plus” icon to create a new resource:

Follow the wizard to create the MySQL resource with these selections:

Select Recovery Kit: MySQL Database
Switchback Type: Intelligent
Server: node1
Location of my.cnf: /var/lib/mysql
Location of MySQL executables: /usr/bin
Database Tag: mysql

Extend the MySQL resource with the following selections:

Target Server: node2
Switchback Type: intelligent
Template Priority: 1
Target Priority: 10

As a result, your cluster will look as follows.  Notice that the Data Replication resource was automatically moved underneath the database (dependency automatically created) to ensure it’s always brought online before the database:

 

Create an Internal Load Balancer

If this was a typical on-premises cluster using either physical or virtual servers, you’d be done at this point.  Clients and Applications would connect into the Virtual IP of the cluster (10.128.0.99) to reach the active node.  In Google Cloud, this doesn’t work without some additional configuration.

To connect into the cluster, Google provides a feature where you can set up an Internal Load Balancer (ILB).  Essentially, when you connect to the IP address of the ILB (which we will set to 10.128.0.99), you are routed to the currently active cluster node.

Create a TCP Load Balancer:

This will be an internal load balancer, so select “Only between my VMs”:

Next, give the load balancer a name (“internal-lb”) and then click Backend configuration:

Make sure you select the proper region (“us-central1”) and configure the Backend.  Click “+ Add backend” and add both instance groups (instance-group-1a AND instance-group-1b):

The load balancer decides which node to route traffic to based on a health check.  In this example, we will configure a health check that verifies MySQL is running by checking its default port (3306).  Select “Create a health check”:

Give the new health check a name (“mysql-health-check”) and configure it for TCP port 3306:
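
The equivalent health check can also be created from the gcloud CLI; a sketch:

$ gcloud compute health-checks create tcp mysql-health-check --port=3306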

Next, configure the Frontend of the load balancer.  Select “Frontend configuration” and under IP address, define a custom static internal IP of 10.128.0.99.  The port should be 3306, which is the default port for MySQL:

Finally, review and finalize load balancer creation.  Click “Create”:

Result: the load balancer is online; however, it’s not showing either instance group as healthy (as indicated by the 0/0).  We will fix that in the next section:

Create Firewall Rules for Internal Load Balancer

Per Google documentation (see “Configure a firewall rule to allow internal load balancing” section), two firewall rules need to be created.  The first allows traffic to the load balancer and from the load balancer to the instances. The second allows health check probes from the health checker.

Create a new firewall rule:

Give the new rule a name (allow-internal-lb) and specify “10.128.0.0/20” as the source IP range.  The allowed protocols and ports should be “tcp:3306”:

After clicking “Create”, you’ll be returned to the Firewall rules page and can see the newly created rule in the list.  Click “Create Firewall Rule” again so we can create the 2nd required rule:

Give the 2nd rule a name (“allow-health-check”).  TWO different source IP ranges will need to be defined:

  • 130.211.0.0/22
  • 35.191.0.0/16

Note: It’s always a good idea to double-check the Google Cloud documentation to make sure these IP ranges are still valid.
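
For reference, both rules can also be created from the gcloud CLI; a sketch using the names, port, and ranges above:

$ gcloud compute firewall-rules create allow-internal-lb --allow=tcp:3306 --source-ranges=10.128.0.0/20
$ gcloud compute firewall-rules create allow-health-check --allow=tcp:3306 --source-ranges=130.211.0.0/22,35.191.0.0/16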

You should now see both newly created firewall rules in the list:

Test Cluster Connectivity

At this point, all of our Google Cloud and Cluster configurations are complete!

Cluster resources are currently active on node1:

You will also notice that the Internal Load Balancer is showing node1 (a member of instance-group-1a) as “healthy” and, as such, is routing traffic coming into the virtual IP (10.128.0.99) to node1:

SSH into the witness server, “sudo su -” to gain root access.   Install the mysql client if needed:

[root@witness ~]# yum -y install mysql

Test MySQL connectivity to the cluster:

[root@witness ~]# mysql --host=10.128.0.99 mysql -u root -p

Execute the following MySQL query to display the hostname of the active cluster node:

mysql> select @@hostname;
+------------+
| @@hostname |
+------------+
| node1      |
+------------+
1 row in set (0.00 sec)
mysql>

Using the LifeKeeper GUI, failover from node1 -> node2.  Right click on the mysql resource underneath node2, and select “In Service…”:

After failover, resources are brought online on node2:

You will now see that the Internal Load Balancer is showing instance-group-1b, which contains node2, as healthy.  Traffic is now routed to node2:

After failover has completed, re-run the MySQL query.  You’ll notice that the MySQL client has detected that the session was lost (during failover) and automatically reconnects:

Execute the following MySQL query to display the hostname of the active cluster node, verifying that now “node2” is active:

mysql> select @@hostname;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    48
Current database: mysql
+------------+
| @@hostname |
+------------+
| node2      |
+------------+
1 row in set (0.56 sec)
mysql>

 

Nov 18 2016
 

Earlier this year, Microsoft announced that they would be releasing a version of SQL Server that would run on Linux.  Today, I was excited to learn that Microsoft finally announced the public preview of what they are currently calling SQL Server v.Next, and that it is available for both Linux and Windows operating systems.  More information, along with download links and documentation can be found here:

https://www.microsoft.com/en-us/sql-server/sql-server-vnext-including-Linux

In this article, I will not only show you how to deploy a Linux VM in Azure running SQL Server, but also how to configure a 2-node failover cluster to make it highly available!  And WITHOUT using shared storage (aka “sanless” or “shared nothing” cluster).

The end result will be a 2-node SQL Server for Linux cluster (plus witness server) in Microsoft Azure IaaS (Infrastructure as a Service).  The guide includes screenshots, shell commands, and code snippets as appropriate.  I assume that you are somewhat familiar with Microsoft Azure and already have an Azure account with an associated subscription.  If not, you can sign up for a free account today.  I’m also going to assume that you have basic Linux system administration skills as well as understand basic failover clustering concepts like Virtual IPs, etc.

Disclaimer: Azure is a rapidly moving target.  And here I’m working with a public preview version of SQL Server for Linux.  As such, features/screens/buttons are bound to change before SQL v.Next is officially released so your experience may vary slightly from what you’ll see below.  While this guide will show you how to make a SQL Server for Linux database highly available, you could certainly adapt this information and process to protect other applications or databases, like I have written about before here (MySQL example).

These are the high level steps to create a highly available SQL Server database within Microsoft Azure IaaS:

  1. Create a Resource Group
  2. Create a Virtual Network
  3. Create a Storage Account
  4. Create Virtual Machines in an Availability Set
  5. Set VM Static IP Addresses
  6. Add a Data Disk to cluster nodes
  7. Create Inbound Security Rule to allow VNC access
  8. Linux OS Configuration
  9. Install and Configure SQL Server
  10. Install and Configure Cluster
  11. Create an Internal Load Balancer
  12. Test Cluster Connectivity

Overview

This article will describe how to create a cluster within a single Azure region.  The cluster nodes (sql-linux1, sql-linux2 and the witness server) will reside in an Availability Set (3 different Fault Domains and Update Domains), thanks to the new Azure Resource Manager (ARM). We will be creating all resources using the new Azure Resource Manager.

The configuration will look like this:

azure-linux-sqlserver

The following IP addresses will be used:

  • sql-linux1: 10.0.0.7
  • sql-linux2: 10.0.0.8
  • sql-witness: 10.0.0.9
  • virtual/”floating” IP: 10.0.0.199
  • SQL Server port: 1433

Create a Resource Group

First, create a Resource Group.  Your resource group will end up containing all of the various objects related to our cluster deployment: virtual machines, storage account, etc.  Here we will call our newly created Resource Group “sql-cluster”.

sql-resource-group1

Be mindful when selecting your region.  All of your resources will need to reside within the same region.  Here, we’ll be deploying everything into the “West US” region:

sql-resource-group2

Create a Virtual Network (VNet)

Next, if you don’t already have one, create a Virtual Network.  A Virtual Network is an isolated network within the Azure cloud that is dedicated to you.  You have full control over things like IP address blocks and subnets, routing, security policies (i.e. firewalls), DNS settings, and more.  You will be launching your Azure IaaS virtual machines (VMs) into your Virtual Network.

My Azure account already has an existing VNet (10.0.0.0/16) called “cluster-network” that I’m going to use in this guide.  Creating a VNet is pretty straightforward, and I’ve covered creating one here if you need a refresher.

Create a Storage Account

Before you provision any Virtual Machines, you’ll need a Storage Account to store them.

sql-storage-account1

Next, give your new storage account a name.  The storage account name must be unique across *ALL* of Azure.  (Every object that you store in Azure Storage has a unique URL address. The storage account name forms the subdomain of that address.)  In this example I call my storage account “sqllinuxcluster” but you’ll need to select something different as you setup your own.

Select a storage Type based on your requirements and budget.  For the purposes of this guide, I selected “Standard-LRS” (i.e. Locally Redundant) to minimize cost.

Make sure your new Storage Account is added to the Resource Group you created in Step 1 (“sql-cluster”)  in the same Location (“West US” in this example):

sql-storage-account2

Create Virtual Machines in an Availability Set

We will be provisioning 3 Virtual Machines in this guide.  The first two VMs (I’ll call them “sql-linux1” and “sql-linux2”) will function as cluster nodes with the ability to bring the SQL Server database and its associated resources online.  The 3rd VM will act as the cluster’s witness server for added protection against split-brain.

To ensure maximum availability, all 3 VMs will be added to the same Availability Set, ensuring that they will end up in different Fault Domains and Update Domains.

The Azure Marketplace has a VM template called “SQL Server vNext on Red Hat Enterprise Linux 7.2” that has a public preview evaluation version SQL Server v.Next for Linux pre-installed, which will save you a few steps.  If you’d rather start with an empty VM and install SQL yourself, the installation instructions can be found here.

Create “sql-linux1” VM

Create your first VM (“sql-linux1”) and select the marketplace image mentioned above.

sql-create-vm1

Give the VM a hostname (“sql-linux1”) and username/password that will later be used to SSH into the system.  Make sure you add this VM to your Resource Group (“sql-cluster”) and that it resides in the same region as all of your other resources:

sql-create-vm2

Next, choose your instance size.  For more information on the various instance sizes available, click here.

For the purposes of this guide, I selected the smallest/cheapest size I could, in this case “DS1_V2 Standard”, to minimize cost since this won’t be running a production workload.  Select the instance size that makes most sense based on what you are looking to test:

sql-create-vm3

IMPORTANT: By default, your VM won’t be added to an Availability Set.  On the Settings screen during VM creation, make sure you create a new Availability Set; we’ll call it “sql-availability-set”.  Azure Resource Manager (ARM) allows you to create Availability Sets with 3 Fault Domains.  The default values here are fine:

sql-create-vm4

On the next screen, review your VM properties and click OK to create your first VM.

Create “sql-linux2” and “sql-witness” VMs

Repeat the steps above twice to create two more VMs.

The only difference here is that you’ll be ADDING these VMs to the Availability Set (“sql-availability-set”) we just created.

It may take a little while for your 3 VMs to provision.  Once complete, you’ll see your VMs (sql-linux1, sql-linux2 and sql-witness) listed on the Virtual Machines screen within your Azure Portal.

Set VM Static IP Addresses

The VMs will be set with the following IP addresses:

  • sql-linux1: 10.0.0.7
  • sql-linux2: 10.0.0.8
  • sql-witness: 10.0.0.9

Repeat this step for each VM.  Select your VM and edit its Network Interfaces:

sql-static-ip1

Select the network interface associated with the VM, and edit IP configurations.  Select “Static” and specify the desired IP address:

sql-static-ip2
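
If you prefer the command line, the current az CLI can make the same change.  A sketch for sql-linux1; the NIC and ipconfig names (“sql-linux1-nic”, “ipconfig1”) are assumptions, so check yours in the portal first:

$ az network nic ip-config update --resource-group sql-cluster --nic-name sql-linux1-nic --name ipconfig1 --private-ip-address 10.0.0.7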

Add a Data Disk to cluster nodes

Next, we will need to add an extra disk to each of our cluster nodes (“sql-linux1” and “sql-linux2”).  This disk will store our SQL Server databases, and it will later be replicated between nodes.

Note: You do NOT need to add an extra disk to the “sql-witness” node.  Only “sql-linux1” and “sql-linux2”.

Edit your VM, select Disks and then attach a new disk.  Select a disk type (Standard or Premium SSD)  and size based on your workload.  Here I create a 10GB Standard disk on both of my cluster nodes.  As far as Host caching goes, “None” or “Read only” caching is fine.  I do not recommend using “Read/Write” as there is potential for data loss:

sql-add-disk1

Create Inbound Security Rule to allow VNC access

If your VM is part of a Network Security Group (NSG), which by default it likely is unless you disabled it during VM creation, the only port open in the “Azure firewall” is SSH (port 22).  Later in the guide, I’ll be using VNC to access the desktop of “sql-linux1” and configure the cluster using a GUI.  Create an Inbound Security Rule to open up VNC access.  In this guide port 5902 is used.  Adjust this accordingly based on your VNC configuration.

Virtual Machines -> (select sql-linux1) -> Network interfaces -> (select NIC) -> Network security group -> (select the NSG) -> Inbound security rules -> Add

sql-security-group1

Linux OS Configuration

Here is where we will leave the Azure Portal for a little while and get our hands dirty on the command line, which as a Linux administrator you should be used to by now.  You aren’t given the root password to your Linux VMs in Azure, so once you login as the user specified during VM creation, use the “sudo” command to gain root privileges:

$ sudo su -

Edit /etc/hosts

Unless you already have a DNS server set up, you’ll want to create host file entries on all 3 servers so that they can properly resolve each other by name.

Add the following lines to the end of your /etc/hosts file:

10.0.0.7    sql-linux1
10.0.0.8    sql-linux2
10.0.0.9    sql-witness
10.0.0.199   sql-vip

Disable SELinux

Edit /etc/sysconfig/selinux and set “SELINUX=disabled”:

# vi /etc/sysconfig/selinux

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted

Configure iptables so that the cluster Virtual IP will work

In order to get connectivity to the cluster Virtual IP working, and to allow monitoring of the IP resource, a few iptables rules need to be set up.  Note: 10.0.0.199 is the Virtual IP we’ll be using in our cluster, and 1433 is the default port used by SQL Server.

Note: RHEL7 changed the default firewall to FirewallD instead of iptables.  I haven’t spent much time with firewalld yet, so for now this guide disables firewalld and uses iptables instead.  You’ll also need to install the “iptables-services” package so that the service and chkconfig commands below work.

# systemctl stop firewalld
# systemctl disable firewalld

On sql-linux1 (10.0.0.7), run the following commands:

# yum install iptables-services
# iptables --flush
# iptables -t nat -A PREROUTING -p tcp --dport 1433 -j DNAT --to-destination 10.0.0.199:1433
# iptables -t nat -A PREROUTING -p tcp --dport 1434 -j DNAT --to-destination 10.0.0.199:1434
# iptables -t nat -A POSTROUTING -p icmp -s 10.0.0.199 -j SNAT --to-source 10.0.0.7 
# service iptables save 
# chkconfig iptables on 

On sql-linux2 (10.0.0.8), run the following commands:

# yum install iptables-services
# iptables --flush 
# iptables -t nat -A PREROUTING -p tcp --dport 1433 -j DNAT --to-destination 10.0.0.199:1433
# iptables -t nat -A PREROUTING -p tcp --dport 1434 -j DNAT --to-destination 10.0.0.199:1434
# iptables -t nat -A POSTROUTING -p icmp -s 10.0.0.199 -j SNAT --to-source 10.0.0.8 
# service iptables save 
# chkconfig iptables on

Install and Configure VNC (and related packages)

In order to access the GUI of our linux servers, and to later install and configure our cluster, install VNC server, as well as a handful of other required packages (cluster software needs the redhat-lsb and patch rpms).

# yum install tigervnc-server xterm wget unzip patch redhat-lsb
# vncpasswd

The following URL is a great guide to getting VNC Server running on RHEL 7 / CentOS 7:

https://www.digitalocean.com/community/tutorials/how-to-install-and-configure-vnc-remote-access-for-the-gnome-desktop-on-centos-7

Note:  This example configuration runs VNC on display 2 (:2, aka port 5902) and as root (not secure).  Adjust accordingly!

# cp /lib/systemd/system/vncserver@.service /etc/systemd/system/vncserver@:2.service
# vi /etc/systemd/system/vncserver@:2.service


[Service]
Type=forking
# Clean any existing files in /tmp/.X11-unix environment
ExecStartPre=/bin/sh -c '/usr/bin/vncserver -kill %i > /dev/null 2>&1 || :'
ExecStart=/sbin/runuser -l root -c "/usr/bin/vncserver %i -geometry 1024x768"
PIDFile=/root/.vnc/%H%i.pid
ExecStop=/bin/sh -c '/usr/bin/vncserver -kill %i > /dev/null 2>&1 || :'


# systemctl daemon-reload
# systemctl enable vncserver@:2.service
# vncserver :2 -geometry 1024x768

Reboot Cluster Nodes

Reboot your cluster nodes so that SELinux is disabled, and the 2nd disk you previously added is detected.

Partition and Format the “data” disk

In Step 6 of this guide (“Add a Data Disk to cluster nodes”) we did just that: added an extra disk to each cluster node to store the application data we will be protecting.  In this case it happens to be SQL Server databases.

In Azure IaaS, Linux Virtual Machines use the following arrangement for disks:

  • /dev/sda – OS disk
  • /dev/sdb – temporary disk
  • /dev/sdc – 1st data disk
  • /dev/sdd – 2nd data disk
  • /dev/sdj – 8th data disk

The disk we added in Step 6 of this guide should appear as /dev/sdc.  You can run the “fdisk -l” command to verify.  You’ll see that /dev/sda (OS) and /dev/sdb (temporary) already have disk partitions and are being used.

# fdisk -l

Disk /dev/sda: 31.5 GB, 31457280000 bytes, 61440000 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk label type: dos
Disk identifier: 0x000c46d3

Device Boot Start End Blocks Id System
/dev/sda1 * 2048 1026047 512000 83 Linux
/dev/sda2 1026048 61439999 30206976 83 Linux

Disk /dev/sdb: 7516 MB, 7516192768 bytes, 14680064 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk label type: dos
Disk identifier: 0x7cd70e11

Device Boot Start End Blocks Id System
/dev/sdb1 128 14678015 7338944 83 Linux

Disk /dev/sdc: 10.7 GB, 10737418240 bytes, 20971520 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Here I will create a partition (/dev/sdc1), format it, and mount it at the default location for SQL, which is /var/opt/mssql.  Perform the following steps on BOTH “sql-linux1” and “sql-linux2”:

# fdisk /dev/sdc
Command (m for help): n
Command action
e   extended
p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-1305, default 1): <enter>
Using default value 1
Last cylinder, +cylinders or +size{K,M,G} (1-1305, default 1305): <enter>
Using default value 1305
 
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
[root@sql-linux1 ~]#
# mkfs.ext4 /dev/sdc1
# mkdir /var/opt/mssql
# chmod 770 /var/opt/mssql

Mount the filesystem:

# mount /dev/sdc1 /var/opt/mssql

Install and Configure SQL Server

If you started with a fresh linux system, the full installation instructions can be found here.

If you created your VMs using the “SQL Server vNext on Red Hat Enterprise Linux 7.2” Azure template, as I have done in this guide, then SQL Server is already installed.  All you need to do now is run the setup script:

# /opt/mssql/bin/sqlservr-setup
Microsoft(R) SQL Server(R) Setup

You can abort setup at anytime by pressing Ctrl-C. Start this program
with the --help option for information about running it in unattended
mode.

The license terms for this product can be downloaded from
http://go.microsoft.com/fwlink/?LinkId=746388 and found
in /usr/share/doc/mssql-server/LICENSE.TXT.

Do you accept the license terms? If so, please type "YES": YES

Please enter a password for the system administrator (SA) account: <enter desired password>
Please confirm the password for the system administrator (SA) account: <enter desired password>

Setting system administrator (SA) account password...

Do you wish to start the SQL Server service now? [y/n]: y
Do you wish to enable SQL Server to start on boot? [y/n]: n

You can use sqlservr-setup --enable-service to enable SQL Server to start
at boot.

Setup completed successfully.

Verify that the service is running:

systemctl status mssql-server

Stop SQL Server on both nodes.  The cluster software will later be responsible for starting it up:

systemctl stop mssql-server
systemctl stop mssql-server-telemetry

Install and Configure the Cluster

At this point, we are ready to install and configure our cluster.  SIOS Protection Suite for Linux (aka SPS-Linux) will be used in this guide as the clustering technology.  It provides both high availability failover clustering features (LifeKeeper) as well as real-time, block level data replication (DataKeeper) in a single, integrated solution.  SPS-Linux enables you to deploy a “SANLess” cluster, aka a “shared nothing” cluster meaning that cluster nodes don’t have any shared storage, as is the case with Azure VMs.

Install SIOS Protection Suite for Linux

Perform the following steps on ALL 3 VMs (sql-linux1, sql-linux2, sql-witness):

Download the SPS-Linux installation image file (sps.img) and obtain either a trial license or purchase permanent licenses.  Contact SIOS for more information.

You will loopback mount it and run the “setup” script inside, as root (or first “sudo su -” to obtain a root shell if you haven’t already).

For example:

# mkdir /tmp/install
# mount -o loop sps.img /tmp/install
# cd /tmp/install
# ./setup

During the installation script, you’ll be prompted to answer a number of questions.  You will hit Enter on almost every screen to accept the default values.  Note the following exceptions:

  • On the screen titled “High Availability NFS” you may select “n” as we will not be creating a highly available NFS server
  • Towards the end of the setup script, you can choose to install a trial license key now, or later. We will install the license key later, so you can safely select “n” at this point
  • In the final screen of the “setup” select the ARKs (Application Recovery Kits, i.e. “cluster agents”) you wish to install from the list displayed on the screen.
    • The ARKs are ONLY required on “sql-linux1” and “sql-linux2”.  You do not need to install on “sql-witness”
    • Navigate the list with the up/down arrows, and press SPACEBAR to select the following:
      • lkDR – DataKeeper for Linux
    • This will result in the following additional RPMs installed on “sql-linux1” and “sql-linux2”:
      • steeleye-lkDR-9.0.2-6513.noarch.rpm

Install Witness/Quorum package

The Quorum/Witness Server Support Package for LifeKeeper (steeleye-lkQWK) combined with the existing failover process of the LifeKeeper core allows system failover to occur with a greater degree of confidence in situations where total network failure could be common. This effectively means that failovers can be done while greatly reducing the risk of “split-brain” situations.

Install the Witness/Quorum rpm on all 3 nodes (sql-linux1, sql-linux2, sql-witness):

# cd /tmp/install/quorum
# rpm -Uvh steeleye-lkQWK-9.0.2-6513.noarch.rpm

On ALL 3 nodes (sql-linux1, sql-linux2, sql-witness), edit /etc/default/LifeKeeper, set

NOBCASTPING=1

On ONLY the Witness server (“sql-witness”), edit /etc/default/LifeKeeper, set

WITNESS_MODE=off/none

Install a License key

On all 3 nodes, use the “lkkeyins” command to install the license file that you obtained from SIOS:

# /opt/LifeKeeper/bin/lkkeyins <path_to_file>/<filename>.lic

Start LifeKeeper

On all 3 nodes, use the “lkstart” command to start the cluster software:

# /opt/LifeKeeper/bin/lkstart

Set User Permissions for LifeKeeper GUI

On all 3 nodes, edit /etc/group and add the “tony” user (or whatever username you specified during VM creation) to the “lkadmin” group to grant access to the LifeKeeper GUI.  By default only “root” is a member of the group, and we don’t have the root password in Azure:

# vi /etc/group

lkadmin:x:1001:root,tony

Open the LifeKeeper GUI

Make a VNC connection to the Public IP address of sql-linux1.  Based on the VNC and Inbound Security Rule configuration from above, you would connect to <Public_IP>:2 using the VNC password you specified earlier.  Once logged in, open a terminal window and run the LifeKeeper GUI using the following command:

# /opt/LifeKeeper/bin/lkGUIapp &

You will be prompted to connect to your first cluster node (“sql-linux1”).  Enter the linux userid and password specified during VM creation:

sql-gui-login1

Next, connect to both “sql-linux2” and “sql-witness” by clicking the “Connect to Server” button highlighted in the following screenshot:

sql-gui-connect1

You should now see all 3 servers in the GUI, with a green checkmark icon indicating they are online and healthy:

sql-gui-connect2

Create Communication Paths

Right-click on “sql-linux1” and select Create Comm Path

sql-gui-comm1

Select BOTH “sql-linux2” and “sql-witness” and then follow the wizard.  This will create comm paths between:

  • sql-linux1 & sql-linux2
  • sql-linux1 & sql-witness

sql-gui-comm2

A comm path still needs to be created between sql-linux2 & sql-witness.   Right click on “sql-linux2” and select Create Comm Path.  Follow the wizard and select “sql-witness” as the remote server:

sql-gui-comm3

At this point the following comm paths have been created:

  • sql-linux1 <—> sql-linux2
  • sql-linux1 <—> sql-witness
  • sql-linux2 <—> sql-witness

The icons in front of the servers have changed from a green “checkmark” to a yellow “hazard sign”.  This is because we only have a single communication path between nodes.

If the VMs had multiple NICs (information on creating Azure VMs with multiple NICs can be found here, but won’t be covered in this article), you would create redundant comm paths between each server.

sql-gui-comm4

To remove the warning icons, go to the View menu and de-select “Comm Path Redundancy Warning”:

sql-gui-comm5

Result:

sql-gui-nodes-green

Verify Communication Paths

Use the “lcdstatus” command to view the state of cluster resources.  Run the following commands to verify that you have correctly created comm paths on each node to the other two servers involved:

# /opt/LifeKeeper/bin/lcdstatus -q -d sql-linux1

MACHINE      NETWORK ADDRESSES/DEVICE   STATE     PRIO
sql-linux2   TCP     10.0.0.7/10.0.0.8  ALIVE        1
sql-witness  TCP     10.0.0.7/10.0.0.9  ALIVE        1

# /opt/LifeKeeper/bin/lcdstatus -q -d sql-linux2

MACHINE      NETWORK ADDRESSES/DEVICE   STATE     PRIO
sql-linux1   TCP     10.0.0.8/10.0.0.7  ALIVE        1
sql-witness  TCP     10.0.0.8/10.0.0.9  ALIVE        1

# /opt/LifeKeeper/bin/lcdstatus -q -d sql-witness

MACHINE      NETWORK ADDRESSES/DEVICE   STATE     PRIO
sql-linux1   TCP     10.0.0.9/10.0.0.7  ALIVE        1
sql-linux2   TCP     10.0.0.9/10.0.0.8  ALIVE        1

Create a Data Replication cluster resource (i.e. Mirror)

Before you proceed with this step, make sure that /var/opt/mssql is MOUNTED on sql-linux1, but NOT MOUNTED on sql-linux2.  Run “umount /var/opt/mssql” on sql-linux2 if needed.

Next, create a Data Replication resource to replicate the /var/opt/mssql partition from sql-linux1 (source) to sql-linux2 (target).  Click the “green plus” icon to create a new resource:

data replication1

Follow the wizard with these selections:

Please Select Recovery Kit:  Data Replication
Switchback Type: intelligent
Server: sql-linux1
Hierarchy Type: Replicate Existing Filesystem
Existing Mount Point: /var/opt/mssql
Data Replication Resource Tag: datarep-mssql
File System Resource Tag: /var/opt/mssql
Bitmap File: (default value)
Enable Asynchronous Replication:  No

After the resource has been created, the “Extend” (i.e. define backup server) wizard will appear.  Use the following selections:

Target Server: sql-linux2
Switchback Type: Intelligent
Template Priority: 1
Target Priority: 10
Mount Point: /var/opt/mssql
Root Tag: /var/opt/mssql
Target Disk: /dev/sdc1
Data Replication Resource Tag: datarep-mssql
Bitmap File: (default value)
Replication Path: 10.0.0.7/10.0.0.8

The cluster will look like this:

sql-mirror-created

Create Virtual IP

Next, create a Virtual IP cluster resource.  Click the “green plus” icon to create a new resource:

virtual ip1

Follow the wizard to create the IP resource with these selections:

Select Recovery Kit: IP
Switchback Type: Intelligent
Server: sql-linux1
IP Resource: 10.0.0.199
Netmask: 255.255.255.0
Network Interface: eth0
IP Resource Tag: ip-10.0.0.199

Extend the IP resource with these selections:

Target Server: sql-linux2
Switchback Type: Intelligent
Template Priority: 1
Target Priority: 10
IP Resource: 10.0.0.199
Netmask: 255.255.255.0
Network Interface: eth0
IP Resource Tag: ip-10.0.0.199

Configure a Ping List for the IP resource

By default, SPS-Linux monitors the health of IP resources by performing a broadcast ping.  In many virtual and cloud environments, broadcast pings don’t work.  In a previous step, we set “NOBCASTPING=1” in /etc/default/LifeKeeper to turn off broadcast ping checks. Instead, we will define a ping list.  This is a list of IP addresses to be pinged during IP health checks for this IP resource.   In this guide, we will add the witness server (10.0.0.9) to our ping list.

Right click on the IP resource (ip-10.0.0.199) and select Properties:

sql-pinglist1

You will see that initially, no ping list is configured for our 10.0.0.0 subnet.   Click “Modify Ping List”:

sql-pinglist2

Enter “10.0.0.9” (the IP address of our witness server), click “Add address” and finally click “Save List”:

sql-pinglist3

You will be returned to the IP properties panel, and can verify that 10.0.0.9 has been added to the ping list.  Click OK to close the window:

sql-pinglist4

Write scripts to stop/start/monitor/restart SQL Server

Next, create some simple shell scripts that will be loaded into the cluster to create a “Generic Application” resource, allowing the cluster to start/stop/monitor/restart SQL Server.  These are very basic examples that make calls to “systemctl” to control the SQL Server services.  We will create 4 scripts:

  • Restore Script – aka “start”: sql-start.sh
  • Remove Script – aka “stop”: sql-stop.sh
  • QuickCheck Script – aka “status”: sql-status.sh
  • Recover Script – aka “restart”: sql-restart.sh

# sudo su -

# cd /root

# vi sql-start.sh

#!/bin/bash
systemctl start mssql-server
systemctl start mssql-server-telemetry

# vi sql-stop.sh

#!/bin/bash
systemctl stop mssql-server
systemctl stop mssql-server-telemetry

# vi sql-status.sh

#!/bin/bash
systemctl status mssql-server
exit $?

# vi sql-restart.sh

#!/bin/bash
systemctl restart mssql-server
systemctl restart mssql-server-telemetry

Before loading these scripts into the cluster, make sure they are executable:

# chmod 755 sql-*.sh

Create the SQL Server resource hierarchy (Generic Application)

Next, create a Generic Application cluster resource.  We will load in the scripts created in the previous step.  Click the “green plus” icon to create a new resource:

Follow the wizard to create the Generic Application resource with these selections:

Select Recovery Kit: Generic Application
Switchback Type: Intelligent
Server: sql-linux1
Restore Script: /root/sql-start.sh
Remove Script: /root/sql-stop.sh
QuickCheck Script: /root/sql-status.sh
Local Recovery Script: /root/sql-restart.sh
Application Info: <empty>
Bring Resource In Service: Yes
Resource Tag: app-SQLServer

Extend the Generic Application resource with the following selections:

Target Server: sql-linux2
Switchback Type: intelligent
Template Priority: 1
Target Priority: 10
Resource Tag: app-SQLServer
Application Info: <empty>

As a result, your cluster will look as follows:

sql-resources-no-deps

Create Dependencies between cluster resources

Create a dependency between the IP resource, SQL Server resource, and replicated filesystem resource so that they failover together as a group.  Right click on the “app-SQLServer” resource and select “Create Dependency”.

sql-create-dependency

You will do this two times: first selecting “ip-10.0.0.199” as the child resource, and the second time selecting “/var/opt/mssql” as the child resource.  After dependency creation, your resource hierarchy should look like this:

sql-dependencies-created

Create an Internal Load Balancer

If this was a typical on-premises cluster using either physical or virtual servers, you’d be done at this point.  Clients and Applications would connect into the Virtual IP of the cluster (10.0.0.199) to reach the active node.  In Azure, this doesn’t work without some additional configuration.

You will notice that you can’t connect to the Virtual IP from any server other than the node that is currently active.  Most cloud providers, including Azure, do not allow or support gratuitous ARPs which is the reason you can’t connect to the Virtual IP directly.

To work around this, Azure provides a feature where you can set up an Internal Load Balancer (ILB).  Essentially, when you connect to the IP address of the ILB (which we will actually set to be the same as the cluster’s Virtual IP – 10.0.0.199), you are routed to the currently active cluster node.

Create a Load Balancer:

internal load balancer1

Give it a name, select “Internal” as the scheme, make sure your virtual network and subnet are properly selected, and assign a static IP that is the same as the cluster’s Virtual IP address.  In this example it’s 10.0.0.199:

sql-create-ilb

Next, add a backend pool behind the load balancer.  This is how you place the two cluster VMs behind this load balancer:

sql-ilb-backendpool

Next, configure a probe for your ILB.  The probe checks the health of a service behind the ILB to determine which node to route traffic to.  Here we will specify port 1433, which is the default for SQL Server:

sql-ilb-probe

Finally, complete the ILB configuration by creating a Load Balancing Rule.   TCP, Port 1433, and make sure you select “Enabled” for “Floating IP (direct server return)”:

sql-ilb-rule
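
For reference, roughly the same ILB can be built with the current az CLI.  A sketch under assumptions: the frontend/backend/probe/rule names (“fe-sql”, “be-sql”, “sql-probe”, “sql-rule”) are arbitrary, and your subnet name may differ:

$ az network lb create -g sql-cluster -n internal-lb --vnet-name cluster-network --subnet default --private-ip-address 10.0.0.199 --frontend-ip-name fe-sql --backend-pool-name be-sql
$ az network lb probe create -g sql-cluster --lb-name internal-lb -n sql-probe --protocol tcp --port 1433
$ az network lb rule create -g sql-cluster --lb-name internal-lb -n sql-rule --protocol tcp --frontend-port 1433 --backend-port 1433 --frontend-ip-name fe-sql --backend-pool-name be-sql --probe-name sql-probe --floating-ip true

Each cluster node’s NIC ipconfig then still needs to be added to the backend pool; the portal’s backend pool step above accomplishes the same thing.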

Test Cluster Connectivity

At this point, all of our Azure and Cluster configurations are complete!

Cluster resources are currently active on sql-linux1:

sql-dependencies-created

SSH into the witness server (sql-witness), “sudo su -” to gain root access.   Install sqlcmd if needed:

https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-connect-and-query-sqlcmd

Test SQL connectivity to the cluster:

[root@sql-witness ~]# sqlcmd -S 10.0.0.199 -U SA

Execute the following SQL query to display the hostname of the active cluster node:

1> select @@servername;
2> go

-------------
sql-linux1

(1 rows affected)

Using the LifeKeeper GUI, fail over from sql-linux1 -> sql-linux2.  Right click on the “app-SQLServer” resource underneath sql-linux2, and select “In Service…”:

sql-failover-node2

After failover:

sql-after-failover

After failover has completed, re-run the SQL query, verifying that now “sql-linux2” is active:

[root@sql-witness ~]# sqlcmd -S 10.0.0.199 -U SA

Execute the following SQL query to display the hostname of the active cluster node:

1> select @@servername;
2> go

-------------
sql-linux2

(1 rows affected)
Mar 21 2016
 

In this step by step guide I will take you through all steps required to configure a highly available, 2-node MySQL cluster (plus witness server) in Amazon’s Elastic Compute Cloud (Amazon EC2).  The guide includes both screenshots, shell commands and code snippets as appropriate.  I assume that you are somewhat familiar with Amazon EC2 and already have an account.  If not, you can sign up today.  I’m also going to assume that you have basic linux system administration skills as well as understand basic failover clustering concepts like Virtual IPs, etc.

Disclaimer: The cloud is a rapidly moving target. As such, features/screens/buttons are bound to change over time so your experience may vary slightly from what you’ll see below.  While this guide will show you how to make a MySQL database highly available, you could certainly adapt this information and process to protect other applications or databases, like SAP, Oracle, PostgreSQL, NFS file servers, and more.

These are the high level steps to create a highly available MySQL database within Amazon EC2:

  1. Create a Virtual Private Cloud (VPC)
  2. Create an Internet Gateway
  3. Create Subnets (Availability Zones)
  4. Configure Route Tables
  5. Configure Security Group
  6. Launch Instances
  7. Create Elastic IP
  8. Create Route Entry for the Virtual IP
  9. Disable Source/Dest Checking for ENI’s
  10. Obtain Access Key ID and Secret Access Key
  11. Linux OS Configuration
  12. Install EC2 API Tools
  13. Install and Configure MySQL
  14. Install and Configure Cluster
  15. Test Cluster Connectivity

Overview

This article will describe how to create a cluster within a single Amazon EC2 region.  The cluster nodes (node1, node2 and the witness server) will reside in different Availability Zones for maximum availability.  This also means that the nodes will reside in different subnets.

The configuration will look like this:

AWS-Linux-MySQL

The following IP addresses will be used:

  • node1: 10.0.0.4
  • node2: 10.0.1.4
  • witness: 10.0.2.4
  • virtual/”floating” IP: 10.1.0.10

Create a Virtual Private Cloud (VPC)

First, create a Virtual Private Cloud (aka VPC). A VPC is an isolated network within the Amazon cloud that is dedicated to you.  You have full control over things like IP address blocks and subnets, route tables, security groups (i.e. firewalls), and more.  You will be launching your EC2 instances (VMs) into your VPC.

From the main AWS dashboard, select “VPC”

vpc1

Under “Your VPCs”, make sure you have selected the proper region at the top right of the screen.  In this guide the “US West (Oregon)” region will be used, because it is a region that has 3 Availability Zones.   For more information on Regions and Availability Zones, click here

vpc2

Give the VPC a name, and specify the IP block you wish to use.  10.0.0.0/16 will be used in this guide:

vpc3

You should now see the newly created VPC on the “Your VPCs” screen:

vpc4

 

Create an Internet Gateway

Next, create an Internet Gateway.  This is required if you want your Instances (VMs) to be able to communicate with the internet.

On the left menu, select Internet Gateways and click the Create Internet Gateway button.  Give it a name, and create:

 

internet-gateway1

Next, attach the internet gateway to your VPC:

internet-gateway2

Select your VPC, and click Attach:

internet-gateway3

Create Subnets (Availability Zones)

Next, create 3 subnets.  Each subnet will reside in its own Availability Zone.  The 3 Instances (VMs: node1, node2, witness) will be launched into separate subnets (and therefore Availability Zones) so that the failure of an Availability Zone won’t take out multiple nodes of the cluster.

The US West (Oregon) region, aka us-west-2, has 3 availability zones (us-west-2a, us-west-2b, us-west-2c).  Create 3 subnets, one in each of the 3 availability zones.

Under VPC Dashboard, navigate to Subnets, and then Create Subnet:

subnets1

Give the first subnet a name (“Subnet1”), select the availability zone us-west-2a, and define the network block (10.0.0.0/24):

subnets2

Repeat to create the second subnet in availability zone us-west-2b:

subnets3

Repeat to create the third subnet in availability zone us-west-2c:

subnets4

Once complete, verify that the 3 subnets have been created, each with a different CIDR block, and in separate Availability Zones, as seen below:

subnets5

Configure Route Tables

Update the VPC’s route table so that traffic to the outside world is sent to the Internet Gateway created in a previous step.  From the VPC Dashboard, select Route Tables.  Go to the Routes tab; by default, only one route will exist, which allows traffic only within the VPC.

Click Edit:

route-table1

 

Add another route:

route-table2

The Destination of the new route will be “0.0.0.0/0” (the internet) and for Target, select your Internet Gateway.  Then click Save:

route-table3

 

Next, associate the 3 subnets with the Route Table.   Click the “Subnet Associations” tab, and Edit:

route-table4

Check the boxes next to all 3 subnets, and Save:

route-table5

Verify that the 3 subnets are associated with the main route table:

route-table6

 

Later, we will come back and update the Route Table once more, defining a route that will allow traffic to communicate with the cluster’s Virtual IP, but this needs to be done AFTER the linux Instances (VMs) have been created.

Configure Security Group

Edit the Security Group (a virtual firewall) to allow incoming SSH and VNC traffic.  Both will later be used to configure the linux instances as well as installation/configuration of the cluster software.

On the left menu, select “Security Groups” and then click the “Inbound Rules” tab.  Click Edit:

security-group1

 

Add rules for both SSH (port 22) and VNC.  VNC generally uses ports in the 5900 range, depending on how you configure it, so for the purposes of this guide we will open the 5900-5910 port range.  Configure accordingly based on your VNC setup:

security-group2

 

Launch Instances

We will be provisioning 3 Instances (Virtual Machines) in this guide.  The first two VMs (called “node1” and “node2”) will function as cluster nodes with the ability to bring the MySQL database and its associated resources online.  The 3rd VM will act as the cluster’s witness server for added protection against split-brain.

To ensure maximum availability, all 3 VMs will be deployed into different Availability Zones within a single region.  This means each instance will reside in a different subnet.

Go to the main AWS dashboard, and select EC2:

launch-instance1

Create “node1”

Create your first instance (“node1”).  Click Launch Instance:

launch-instance2

Select your linux distribution.  The cluster software used later supports RHEL, SLES, CentOS and Oracle Linux.  In this guide we will be using RHEL 7.X:

launch-instance3

Size your instance accordingly.  For the purposes of this guide and to minimize cost, t2.micro size was used because it’s free tier eligible.  See here for more information on instance sizes and pricing.

launch-instance4

Next, configure instance details.  IMPORTANT: make sure to launch this first instance (VM) into “Subnet1“, and define an IP address valid for the subnet (10.0.0.0/24) – below 10.0.0.4 is selected because it’s the first free IP in the subnet.  NOTE: .1/.2/.3 in any given subnet in AWS is reserved and can’t be used.

launch-instance5

Next, add an extra disk to the cluster nodes (this will be done on both “node1” and “node2”).  This disk will store our MySQL databases and will later be replicated between nodes.

Note: You do NOT need to add an extra disk to the “witness” node.  Only “node1” and “node2”.

Add New Volume, and enter the desired size:

launch-instance6

 

Define a Tag for the instance, Node1:

launch-instance7

Associate the instance with the existing security group, so the firewall rules created previously will be active:

launch-instance8

Click Launch:

launch-instance9

IMPORTANT:  If this is the first instance in your AWS environment, you’ll need to create a new key pair.  The private key file will need to be stored in a safe location, as it will be required when you SSH into the linux instances.

launch-instance10

Create “node2”

Repeat the steps above to create your second linux instance (node2).  Configure it exactly like Node1.  However, make sure that you deploy it into “Subnet2” (us-west-2b availability zone).  The IP range for Subnet2 is 10.0.1.0/24, so an IP of 10.0.1.4 is used here:

launch-instance11

Make sure to add a 2nd disk to Node2 as well.  It should be the same exact size as the disk you added to Node1:

launch-instance12

Give the second instance a tag, “Node2”:

launch-instance13

Create “witness”

Repeat the steps above to create your third linux instance (witness).  Configure it exactly like Node1 and Node2, EXCEPT you DON’T need to add a 2nd disk, since this instance will only act as a witness to the cluster, and won’t ever bring MySQL online.

Make sure that you deploy it into “Subnet3” (us-west-2c availability zone).  The IP range for Subnet3 is 10.0.2.0/24, so an IP of 10.0.2.4 is used here:

launch-instance14

Note: default disk configuration is fine for the witness node.  A 2nd disk is NOT required:

launch-instance15

Tag the witness node:

launch-instance16

It may take a little while for your 3 instances to provision.  Once complete, you’ll see them listed as running in your EC2 console:

launch-instance17

Create Elastic IP

Next, create an Elastic IP, which is a public IP address that will be used to connect into your instance from the outside world.  Select Elastic IPs in the left menu, and then click “Allocate New Address”:

elastic-ip1

Select the newly created Elastic IP, right-click, and select “Associate Address”:

elastic-ip2

Associate this Elastic IP with Node1:

elastic-ip3

Repeat this for the other two instances if you want them to have internet access or be able to SSH/VNC into them directly.

Create Route Entry for the Virtual IP

At this point all 3 instances have been created, and the route table will need to be updated one more time in order for the cluster’s Virtual IP to work.  In this multi-subnet cluster configuration, the Virtual IP needs to live outside the range of the CIDR allocated to your VPC.

Define a new route that will direct traffic destined for the cluster’s Virtual IP (10.1.0.10) to the primary cluster node (Node1).

From the VPC Dashboard, select Route Tables, click Edit.  Add a route for “10.1.0.10/32” with a destination of Node1:

route-table-after-instance-creation
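
Alternatively, if you have the AWS CLI installed, the same route can be added from the command line.  The route table and instance IDs below are placeholders for your own values:

# aws ec2 create-route --route-table-id rtb-xxxxxxxx --destination-cidr-block 10.1.0.10/32 --instance-id i-xxxxxxxx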

Disable Source/Dest Checking for ENI’s

Next, disable Source/Dest Checking for the Elastic Network Interfaces (ENI) of your cluster nodes. This is required in order for the instances to accept network packets for the virtual IP address of the cluster.

Do this for all ENIs.

Select “Network Interfaces”, right-click on an ENI, and select “Change Source/Dest Check”.

disable-source-dest-check1

Select “Disabled“:

disable-source-dest-check2
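
This step can also be scripted with the AWS CLI (the instance ID below is a placeholder; run it once per cluster node, or use the equivalent modify-network-interface-attribute command per ENI):

# aws ec2 modify-instance-attribute --instance-id i-xxxxxxxx --no-source-dest-check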

Obtain Access Key ID and Secret Access Key

Later in the guide, the cluster software will use the AWS Command Line Interface (CLI) to manipulate a route table entry for the cluster’s Virtual IP to redirect traffic to the active cluster node.  In order for this to work, you will need to obtain an Access Key ID and Secret Access Key so that the AWS CLI can authenticate properly.

In the top-right of the EC2 Dashboard, click on your name, and underneath select “Security Credentials” from the drop-down:

access-key1

 

Expand the “Access Keys (Access Key ID and Secret Access Key)” section of the table, and click “Create New Access Key”.  Download Key File and store the file in a safe location.

access-key2

Linux OS Configuration

Connect to the linux instance(s):

To connect to your newly created linux instances (via SSH), right click on the instance and select “Connect”.  This will display the instructions for connecting to the instance.  You will need the Private Key File you created/downloaded in a previous step:

connect1

Example:

connect2
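
For example, assuming your private key file is named my-key-pair.pem and the default “ec2-user” account for RHEL AMIs:

$ chmod 400 my-key-pair.pem
$ ssh -i my-key-pair.pem ec2-user@<Elastic_IP>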

Here is where we will leave the EC2 Dashboard for a little while and get our hands dirty on the command line, which as a Linux administrator you should be used to by now.

You aren’t given the root password to your Linux VMs in AWS (or the default “ec2-user” account either), so once you connect, use the “sudo” command to gain root privileges:

$sudo su -

Edit /etc/hosts

Unless you already have a DNS server set up, you’ll want to create host file entries on all 3 servers so that they can properly resolve each other by name.

Add the following lines to the end of your /etc/hosts file:

10.0.0.4    node1
10.0.1.4    node2
10.0.2.4    witness
10.1.0.10   mysql-vip

Disable SELinux

Edit /etc/sysconfig/selinux and set “SELINUX=disabled”:

# vi /etc/sysconfig/selinux

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted

Set Hostnames

By default, these Linux instances will have a hostname that is based upon the server’s IP address, something like “ip-10-0-0-4.us-west-2.compute.internal”

You might notice that if you attempt to modify the hostname the “normal” way (i.e. editing /etc/sysconfig/network, etc), after each reboot, it reverts back to the original!!  I found a great thread in the AWS discussion forums that describes how to actually get hostnames to remain static after reboots.

Details here:  https://forums.aws.amazon.com/message.jspa?messageID=560446

Comment out the modules that set the hostname in the “/etc/cloud/cloud.cfg” file.  The following modules can be commented out using #:

# - set_hostname
# - update_hostname

Next, also change your hostname in /etc/hostname.
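
Putting it together, the sequence on node1 would look something like this (on RHEL 7, hostnamectl applies the new name immediately; repeat on each node with the appropriate hostname):

# vi /etc/cloud/cloud.cfg      # comment out set_hostname and update_hostname
# echo "node1" > /etc/hostname
# hostnamectl set-hostname node1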

Reboot Cluster Nodes

Reboot all 3 instances so that SELinux is disabled, and the hostname changes take effect.

Install and Configure VNC (and related packages)

In order to access the GUI of our linux servers, and to later install and configure our cluster, install VNC server, as well as a handful of other required packages (cluster software needs the redhat-lsb and patch rpms).

# yum groupinstall "X Window System"
# yum groupinstall "Server with GUI"
# yum install tigervnc-server xterm wget unzip patch redhat-lsb
# vncpasswd

For RHEL 7.x/CentOS7.x:

The following URL is a great guide to getting VNC Server running on RHEL 7 / CentOS 7:

https://www.digitalocean.com/community/tutorials/how-to-install-and-configure-vnc-remote-access-for-the-gnome-desktop-on-centos-7

Note:  This example configuration runs VNC on display 2 (:2, aka port 5902) and as root (not secure).  Adjust accordingly!

# cp /lib/systemd/system/vncserver@.service /etc/systemd/system/vncserver@:2.service
# vi /etc/systemd/system/vncserver@:2.service


[Service]
Type=forking
# Clean any existing files in /tmp/.X11-unix environment
ExecStartPre=/bin/sh -c '/usr/bin/vncserver -kill %i > /dev/null 2>&1 || :'
ExecStart=/sbin/runuser -l root -c "/usr/bin/vncserver %i -geometry 1024x768"
PIDFile=/root/.vnc/%H%i.pid
ExecStop=/bin/sh -c '/usr/bin/vncserver -kill %i > /dev/null 2>&1 || :'


# systemctl daemon-reload
# systemctl enable vncserver@:2.service
# vncserver :2 -geometry 1024x768

For RHEL/CentOS 6.x systems:

# vi /etc/sysconfig/vncservers

      VNCSERVERS="2:root"
      VNCSERVERARGS[2]="-geometry 1024x768"

# service vncserver start
# chkconfig vncserver on

Open a VNC client, and connect to <ElasticIP>:2.  If you can’t connect, it’s likely your linux firewall is in the way.  Either open the VNC port we are using here (port 5902), or for now, disable the firewall (NOT RECOMMENDED FOR PRODUCTION ENVIRONMENTS):

# systemctl stop firewalld
# systemctl disable firewalld
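
If you’d rather keep firewalld running, a safer alternative is to open just the VNC port used in this guide:

# firewall-cmd --permanent --add-port=5902/tcp
# firewall-cmd --reload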

 

Partition and Format the “data” disk

When the linux instances were launched, an extra disk was added to each cluster node to store the application data we will be protecting.  In this case it happens to be MySQL databases.

The second disk should appear as /dev/xvdb.  You can run the “fdisk -l” command to verify.  You’ll see that /dev/xvda (OS) is already being used.

# fdisk -l

Disk /dev/xvda: 10.7 GB, 10737418240 bytes, 20971520 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: gpt

#         Start          End    Size  Type            Name
1         2048         4095      1M  BIOS boot parti
2         4096     20971486     10G  Microsoft basic
Disk /dev/xvdb: 2147 MB, 2147483648 bytes, 4194304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

 

Here I will create a partition (/dev/xvdb1), format it, and mount it at the default location for MySQL, which is /var/lib/mysql.  Perform the following steps on BOTH “node1” and “node2”:

# fdisk /dev/xvdb
Welcome to fdisk (util-linux 2.23.2).

Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Device does not contain a recognized partition table
Building a new DOS disklabel with disk identifier 0x8c16903a.

Command (m for help): n
Partition type:
   p   primary (0 primary, 0 extended, 4 free)
   e   extended
Select (default p): p
Partition number (1-4, default 1): 1
First sector (2048-4194303, default 2048): <enter>
Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-4194303, default 4194303): <enter>
Using default value 4194303
Partition 1 of type Linux and of size 2 GiB is set

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

# mkfs.ext4 /dev/xvdb1
# mkdir /var/lib/mysql

On node1, mount the filesystem:

# mount /dev/xvdb1 /var/lib/mysql

Install EC2 API Tools

The EC2 API Tools (EC2 CLI) must be installed on each of the cluster nodes, so that the cluster software can later manipulate Route Tables, enabling connectivity to the Virtual IP.

The following URL is an excellent guide to setting this up:

http://docs.aws.amazon.com/AWSEC2/latest/CommandLineReference/set-up-ec2-cli-linux.html

Here are the key steps:

Download, unzip, and move the CLI tools to the standard location (/opt/aws):

# wget http://s3.amazonaws.com/ec2-downloads/ec2-api-tools.zip
# unzip ec2-api-tools.zip
# mv ec2-api-tools-1.7.5.1/ /opt/aws/
# export EC2_HOME="/opt/aws"

If java isn’t already installed (run “which java” to check), install it:

# yum install java-1.8.0-openjdk

Example (based on the default config of a RHEL 7.2 system; adjust accordingly):

# export JAVA_HOME="/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.71-2.b15.el7_2.x86_64/jre/"

You’ll need your AWS Access Key and AWS Secret Key.  Keep these values handy, because they will be needed later during cluster setup too! Refer to the following URL for more information:

https://console.aws.amazon.com/iam/home?#security_credential

# export AWS_ACCESS_KEY=your-aws-access-key-id
# export AWS_SECRET_KEY=your-aws-secret-key
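
Note that these exports only last for the current shell.  To persist them across logins and reboots, you could append them to root’s profile, substituting your own values:

# cat <<'EOF' >> /root/.bash_profile
export EC2_HOME=/opt/aws
export PATH=$PATH:$EC2_HOME/bin
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.71-2.b15.el7_2.x86_64/jre/
export AWS_ACCESS_KEY=your-aws-access-key-id
export AWS_SECRET_KEY=your-aws-secret-key
EOF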

Test CLI utility functionality:

# /opt/aws/bin/ec2-describe-regions

REGION           eu-west-1       ec2.eu-west-1.amazonaws.com
REGION           ap-southeast-1           ec2.ap-southeast-1.amazonaws.com
REGION           ap-southeast-2           ec2.ap-southeast-2.amazonaws.com
REGION           eu-central-1    ec2.eu-central-1.amazonaws.com
REGION           ap-northeast-2            ec2.ap-northeast-2.amazonaws.com
REGION           ap-northeast-1            ec2.ap-northeast-1.amazonaws.com
REGION           us-east-1         ec2.us-east-1.amazonaws.com
REGION           sa-east-1         ec2.sa-east-1.amazonaws.com
REGION           us-west-1        ec2.us-west-1.amazonaws.com
REGION           us-west-2        ec2.us-west-2.amazonaws.com

 

Install and Configure MySQL

Next, install the MySQL packages, initialize a sample database, and set the “root” password for MySQL.  In RHEL7.X, the MySQL packages have been replaced with the MariaDB packages.

On “node1”:

# yum install mariadb mariadb-server
# mount /dev/xvdb1 /var/lib/mysql
# /usr/bin/mysql_install_db --datadir="/var/lib/mysql/" --user=mysql
# mysqld_safe --user=root --socket=/var/lib/mysql/mysql.sock --port=3306 --datadir=/var/lib/mysql --log &
#
# # NOTE: This next command allows remote connections from ANY host.  NOT a good idea for production!
# echo "update user set Host='%' where Host='node1'; flush privileges" | mysql mysql
#
# #Set MySQL's root password to 'SIOS'
# echo "update user set Password=PASSWORD('SIOS') where User='root'; flush privileges" | mysql mysql

Create a MySQL configuration file.  We will place this on the data disk that will later be replicated (/var/lib/mysql/my.cnf).  Example:

# vi /var/lib/mysql/my.cnf

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
pid-file=/var/run/mariadb/mariadb.pid
user=root
port=3306
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
 
[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid 
 
[client]
user=root
password=SIOS

Move the original MySQL configuration file aside, if it exists:

# mv /etc/my.cnf /etc/my.cnf.orig
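
As an optional sanity check, you can confirm that you can connect using the credentials stored in the relocated my.cnf (its [client] section supplies the user and password; this assumes the server socket is at its default /var/lib/mysql/mysql.sock location):

# mysql --defaults-file=/var/lib/mysql/my.cnf -e "select @@hostname;"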

On “node2”:

On “node2”, you ONLY need to install the MariaDB/MySQL packages.  The other steps aren’t required:

[root@node2 ~]# yum install mariadb mariadb-server

Install and Configure the Cluster

At this point, we are ready to install and configure our cluster.  SIOS Protection Suite for Linux (aka SPS-Linux) will be used in this guide as the clustering technology.  It provides both high availability failover clustering features (LifeKeeper) as well as real-time, block level data replication (DataKeeper) in a single, integrated solution.  SPS-Linux enables you to deploy a “SANLess” cluster, aka a “shared nothing” cluster meaning that cluster nodes don’t have any shared storage, as is the case with EC2 Instances.

Install SIOS Protection Suite for Linux

Perform the following steps on ALL 3 VMs (node1, node2, witness):

Download the SPS-Linux installation image file (sps.img) and obtain either a trial license or purchase permanent licenses.  Contact SIOS for more information.

You will loopback mount it and run the “setup” script inside, as root (or first “sudo su -” to obtain a root shell).

For example:

# mkdir /tmp/install
# mount -o loop sps.img /tmp/install
# cd /tmp/install
# ./setup

During the installation script, you’ll be prompted to answer a number of questions.  You will hit Enter on almost every screen to accept the default values.  Note the following exceptions:

  • On the screen titled “High Availability NFS” you may select “n” as we will not be creating a highly available NFS server
  • Towards the end of the setup script, you can choose to install a trial license key now, or later. We will install the license key later, so you can safely select “n” at this point
  • In the final screen of the “setup” select the ARKs (Application Recovery Kits, i.e. “cluster agents”) you wish to install from the list displayed on the screen.
    • The ARKs are ONLY required on “node1” and “node2”.  You do not need to install on “witness”
    • Navigate the list with the up/down arrows, and press SPACEBAR to select the following:
      • lkDR – DataKeeper for Linux
      • lkSQL – LifeKeeper MySQL RDBMS Recovery Kit
    • This will result in the following additional RPMs installed on “node1” and “node2”:
      • steeleye-lkDR-9.0.2-6513.noarch.rpm
      • steeleye-lkSQL-9.0.2-6513.noarch.rpm

Install Witness/Quorum package

The Quorum/Witness Server Support Package for LifeKeeper (steeleye-lkQWK) combined with the existing failover process of the LifeKeeper core allows system failover to occur with a greater degree of confidence in situations where total network failure could be common. This effectively means that failovers can be done while greatly reducing the risk of “split-brain” situations.

Install the Witness/Quorum rpm on all 3 nodes (node1, node2, witness):

# cd /tmp/install/quorum
# rpm -Uvh steeleye-lkQWK-9.0.2-6513.noarch.rpm

On ALL 3 nodes (node1, node2, witness), edit /etc/default/LifeKeeper, set

NOBCASTPING=1

On ONLY the Witness server (“witness”), edit /etc/default/LifeKeeper, set

WITNESS_MODE=off/none

Install the EC2 Recovery Kit Package

SPS-Linux provides specific features that allow resources to failover between nodes in different availability zones and regions. Here, the EC2 Recovery Kit (i.e. cluster agent) is used to manipulate Route Tables so that connections to the Virtual IP are routed to the active cluster node.

Install the EC2 rpm (node1, node2):

# cd /tmp/install/amazon
# rpm -Uvh steeleye-lkECC-9.0.2-6513.noarch.rpm

Install a License key

On all 3 nodes, use the “lkkeyins” command to install the license file that you obtained from SIOS:

# /opt/LifeKeeper/bin/lkkeyins <path_to_file>/<filename>.lic

Start LifeKeeper

On all 3 nodes, use the “lkstart” command to start the cluster software:

# /opt/LifeKeeper/bin/lkstart

Set User Permissions for LifeKeeper GUI

On all 3 nodes, create a new linux user account (i.e. “tony” in this example).  Edit /etc/group and add the “tony” user to the “lkadmin” group to grant access to the LifeKeeper GUI.  By default only “root” is a member of the group, and we don’t have the root password here:

# useradd tony
# passwd tony
# vi /etc/group

lkadmin:x:1001:root,tony

Open the LifeKeeper GUI

Make a VNC connection to the Elastic IP (Public IP) address of node1.  Based on the VNC  configuration from above, you would connect to <Public_IP>:2 using the VNC password you specified earlier.  Once logged in, open a terminal window and run the LifeKeeper GUI using the following command:

# /opt/LifeKeeper/bin/lkGUIapp &

You will be prompted to connect to your first cluster node (“node1”).  Enter the linux userid and password specified during VM creation:

lk-gui-connect1

Next, connect to both “node2” and “witness” by clicking the “Connect to Server” button highlighted in the following screenshot:

lk-gui-connect2

You should now see all 3 servers in the GUI, with a green checkmark icon indicating they are online and healthy:

lk-gui-connect3

Create Communication Paths

Right-click on “node1” and select Create Comm Path

comm path1

Select BOTH “node2” and “witness” and then follow the wizard.  This will create comm paths between:

  • node1 & node2
  • node1 & witness

comm path2

A comm path still needs to be created between node2 & witness.   Right click on “node2” and select Create Comm Path.  Follow the wizard and select “witness” as the remote server:

comm path3

At this point the following comm paths have been created:

  • node1 <—> node2
  • node1 <—> witness
  • node2 <—> witness

The icons in front of the servers have changed from a green “checkmark” to a yellow “hazard sign”.  This is because we only have a single communication path between nodes.

If the VMs had multiple NICs (information on creating instances with multiple network interfaces can be found here, but won’t be covered in this article), you would create redundant comm paths between each server.

comm path4

To remove the warning icons, go to the View menu and de-select “Comm Path Redundancy Warning”:

comm path5

Result:

comm path6

Verify Communication Paths

Use the “lcdstatus” command to view the state of cluster resources.  Run the following commands to verify that you have correctly created comm paths on each node to the other two servers involved:

# /opt/LifeKeeper/bin/lcdstatus -q -d node1

MACHINE  NETWORK ADDRESSES/DEVICE   STATE     PRIO

node2    TCP     10.0.0.4/10.0.1.4  ALIVE        1

witness  TCP     10.0.0.4/10.0.2.4  ALIVE        1

# /opt/LifeKeeper/bin/lcdstatus -q -d node2

MACHINE  NETWORK ADDRESSES/DEVICE   STATE     PRIO

node1    TCP     10.0.1.4/10.0.0.4  ALIVE        1

witness  TCP     10.0.1.4/10.0.2.4  ALIVE        1

# /opt/LifeKeeper/bin/lcdstatus -q -d witness

MACHINE  NETWORK ADDRESSES/DEVICE   STATE     PRIO

node1    TCP     10.0.2.4/10.0.0.4  ALIVE        1

node2    TCP     10.0.2.4/10.0.1.4  ALIVE        1

Create a Data Replication cluster resource (i.e. Mirror)

Next, create a Data Replication resource to replicate the /var/lib/mysql partition from node1 (source) to node2 (target).  Click the “green plus” icon to create a new resource:

data replication1

Follow the wizard with these selections:

Please Select Recovery Kit:  Data Replication
Switchback Type: intelligent
Server: node1
Hierarchy Type: Replicate Existing Filesystem
Existing Mount Point: /var/lib/mysql
Data Replication Resource Tag: datarep-mysql
File System Resource Tag: /var/lib/mysql
Bitmap File: (default value)
Enable Asynchronous Replication:  No

After the resource has been created, the “Extend” (i.e. define backup server) wizard will appear.  Use the following selections:

Target Server: node2
Switchback Type: Intelligent
Template Priority: 1
Target Priority: 10
Target Disk: /dev/xvdb1
Data Replication Resource Tag: datarep-mysql
Bitmap File: (default value)
Replication Path: 10.0.0.4/10.0.1.4
Mount Point: /var/lib/mysql
Root Tag: /var/lib/mysql

The cluster will look like this:

data replication2

Create Virtual IP

Next, create a Virtual IP cluster resource.  Click the “green plus” icon to create a new resource:

virtual ip1

Follow the wizard to create the IP resource with these selections:

Select Recovery Kit: IP
Switchback Type: Intelligent
IP Resource: 10.1.0.10
Netmask: 255.255.255.0
Network Interface: eth0
IP Resource Tag: ip-10.1.0.10

Extend the IP resource with these selections:

Switchback Type: Intelligent
Template Priority: 1
Target Priority: 10
IP Resource: 10.1.0.10
Netmask: 255.255.255.0
Network Interface: eth0
IP Resource Tag: ip-10.1.0.10

The cluster will now look like this, with both Mirror and IP resources created:

cluster-after-mirror-and-vip

Configure a Ping List for the IP resource

By default, SPS-Linux monitors the health of IP resources by performing a broadcast ping.  In many virtual and cloud environments, broadcast pings don’t work.  In a previous step, we set “NOBCASTPING=1” in /etc/default/LifeKeeper to turn off broadcast ping checks. Instead, we will define a ping list.  This is a list of IP addresses to be pinged during IP health checks for this IP resource.   In this guide, we will add the witness server (10.0.2.4) to our ping list.
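
Since IP health checks will depend on reaching the witness, first confirm that both cluster nodes can ping it:

# ping -c 3 10.0.2.4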

Right click on the IP resource (ip-10.1.0.10) and select Properties:

aws-ping-list1

You will see that initially, no ping list is configured for our 10.1.0.0 subnet.   Click “Modify Ping List”:

aws-ping-list2

Enter “10.0.2.4” (the IP address of our witness server), click “Add address” and finally click “Save List”:

aws-ping-list3

You will be returned to the IP properties panel, and can verify that 10.0.2.4 has been added to the ping list.  Click OK to close the window:

aws-ping-list4

Create the MySQL resource hierarchy

Next, create a MySQL cluster resource.  The MySQL resource is responsible for stopping/starting/monitoring of your MySQL database.

Before creating the MySQL resource, make sure the database is running.  Run “ps -ef | grep sql” to check.

If it’s running, great – nothing to do.  If not, start the database back up:

# mysqld_safe --user=root --socket=/var/lib/mysql/mysql.sock --port=3306 --datadir=/var/lib/mysql --log &

To create the resource, click the “green plus” icon:

Follow the wizard to create the MySQL resource with these selections:

Select Recovery Kit: MySQL Database
Switchback Type: Intelligent
Server: node1
Location of my.cnf: /var/lib/mysql
Location of MySQL executables: /usr/bin
Database Tag: mysql

Extend the MySQL resource with the following selections:

Target Server: node2
Switchback Type: intelligent
Template Priority: 1
Target Priority: 10

As a result, your cluster will look as follows.  Notice that the Data Replication resource was automatically moved underneath the database (dependency automatically created) to ensure it’s always brought online before the database:

aws-after-mysql-resource1

Create an EC2 resource to manage the route tables upon failover

SPS-Linux provides specific features that allow resources to failover between nodes in different availability zones and regions. Here, the EC2 Recovery Kit (i.e. cluster agent) is used to manipulate Route Tables so that connections to the Virtual IP are routed to the active cluster node.

To create the resource, click the “green plus” icon:

Follow the wizard to create the EC2 resource with these selections:

Select Recovery Kit: Amazon EC2
Switchback Type: Intelligent
Server: node1
EC2 Home: /opt/aws
EC2 URL: ec2.us-west-2.amazonaws.com
AWS Access Key: (enter Access Key obtained earlier)
AWS Secret Key: (enter Secret Key obtained earlier)
EC2 Resource Type: RouteTable (Backend cluster)
IP Resource: ip-10.1.0.10
EC2 Resource Tag: ec2-10.1.0.10

Extend the EC2 resource with the following selections:

Target Server: node2
Switchback Type: intelligent
Template Priority: 1
Target Priority: 10
EC2 Resource Tag: ec2-10.1.0.10

The cluster will look like this.  Notice how the EC2 resource is underneath the IP resource:

aws-after-ec2-resource1

Create a Dependency between the IP resource and the MySQL Database resource

Create a dependency between the IP resource and the MySQL Database resource so that they failover together as a group.  Right click on the “mysql” resource and select “Create Dependency”:

create-dependency1

On the following screen, select the “ip-10.1.0.10” resource as the dependency.  Click Next and continue through the wizard:

create-dependency2

At this point the SPS-Linux cluster configuration is complete.  The resource hierarchy will look as follows:

create-dependency3

Test Cluster Connectivity

At this point, all of our Amazon EC2 and Cluster configurations are complete!

Cluster resources are currently active on node1:

create-dependency3

Test connectivity to the cluster from the witness server (or another linux instance if you have one).  SSH into the witness server and “sudo su -” to gain root access.  Install the mysql client if needed:

[root@witness ~]# yum -y install mysql

Test MySQL connectivity to the cluster:

[root@witness ~]# mysql --host=10.1.0.10 mysql -u root -p

Execute the following MySQL query to display the hostname of the active cluster node:

MariaDB [mysql]> select @@hostname;
+------------+
| @@hostname |
+------------+
| node1      |
+------------+
1 row in set (0.00 sec)
MariaDB [mysql]>

Using the LifeKeeper GUI, fail over from Node1 -> Node2.  Right click on the mysql resource underneath node2, and select “In Service…”:

aws-failover1

 

After failover has completed, re-run the MySQL query.  You’ll notice that the MySQL client has detected that the session was lost (during failover) and automatically reconnects:

Execute the following MySQL query to display the hostname of the active cluster node, verifying that now “node2” is active:

MariaDB [mysql]> select @@hostname;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    12
Current database: mysql
+------------+
| @@hostname |
+------------+
| node2      |
+------------+
1 row in set (0.53 sec)
MariaDB [mysql]>

 

Mar 09 2016
 

In this step by step guide I will take you through all steps required to configure a highly available, 2-node MySQL cluster (plus witness server) in Microsoft Azure IaaS (Infrastructure as a Service).  The guide includes both screenshots, shell commands and code snippets as appropriate.  I assume that you are somewhat familiar with Microsoft Azure and already have an Azure account with an associated subscription.  If not, you can sign up for a free account today.  I’m also going to assume that you have basic linux system administration skills as well as understand basic failover clustering concepts like Virtual IPs, etc.

Disclaimer: Azure is a rapidly moving target.  It’s getting better and better every day!  As such, features/screens/buttons are bound to change over time so your experience may vary slightly from what you’ll see below.  While this guide will show you how to make a MySQL database highly available, you could certainly adapt this information and process to protect other applications or databases, like SAP, Oracle, PostgreSQL, NFS file servers, and more.

These are the high level steps to create a highly available MySQL database within Microsoft Azure IaaS:

  1. Create a Resource Group
  2. Create a Virtual Network
  3. Create a Storage Account
  4. Create Virtual Machines in an Availability Set
  5. Set VM Static IP Addresses
  6. Add a Data Disk to cluster nodes
  7. Create Inbound Security Rule to allow VNC access
  8. Linux OS Configuration
  9. Install and Configure MySQL
  10. Install and Configure Cluster
  11. Create an Internal Load Balancer
  12. Test Cluster Connectivity

Overview

This article will describe how to create a cluster within a single Azure region.  The cluster nodes (node1, node2 and the witness server) will reside in an Availability Set (3 different Fault Domains and Update Domains), thanks to the new Azure Resource Manager (ARM). We will be creating all resources using the new Azure Resource Manager.

The configuration will look like this:

Cluster-Diagram

The following IP addresses will be used:

  • node1: 10.0.0.4
  • node2: 10.0.0.5
  • witness: 10.0.0.6
  • virtual/”floating” IP: 10.0.0.99
  • MySQL port: 3306

Create a Resource Group

First, create a Resource Group.  Your resource group will end up containing all of the various objects related to our cluster deployment: virtual machines, virtual network, storage account, etc.  Here we will call our newly created Resource Group “cluster-resources”.


resource group1

Be mindful when selecting your region.  All of your resources will need to reside within the same region.  Here, we’ll be deploying everything into the “West US” region:

resource group2

Create a Virtual Network (VNet)

Next, create a Virtual Network.  A Virtual Network is an isolated network within the Azure cloud that is dedicated to you.  You have full control over things like IP address blocks and subnets, routing, security policies (i.e. firewalls), DNS settings, and more.  You will be launching your Azure IaaS virtual machines (VMs) into your Virtual Network.

virtual network1

Make sure you select Resource Manager as the deployment model anytime you are given the option:

virtual network2

Give your new Virtual Network a name (“virtual-network”) and make sure you select the resource group that was created in the previous step (“cluster-resources”).  Your Virtual Network needs to reside in the same region as your Resource Group.  We will leave the IP Address and Subnet values as default.

virtual network3

Create a Storage Account

Before you provision any Virtual Machines, you’ll need to create a Storage Account where they will be stored.

storage account1

Again, make sure you select Resource Manager as the deployment model anytime you are given the option:

storage account2

Next, give your new storage account a name.  The storage account name must be unique across *ALL* of Azure.  (Every object that you store in Azure Storage has a unique URL address. The storage account name forms the subdomain of that address.)  In this example I call my storage account “linuxclusterstorage” but you’ll need to select something different as you setup your own.

Select a storage Type based on your requirements and budget.  For the purposes of this guide, I selected “Standard-LRS” (i.e. Locally Redundant) to minimize cost.

Make sure your new Storage Account is added to the Resource Group you created in Step 1 (“cluster-resources”)  in the same Location (“West US” in this example):

storage account3

Create Virtual Machines in an Availability Set

We will be provisioning 3 Virtual Machines in this guide.  The first two VMs (I’ll call them “node1” and “node2”) will function as cluster nodes with the ability to bring the MySQL database and its associated resources online.  The 3rd VM will act as the cluster’s witness server for added protection against split-brain.

To ensure maximum availability, all 3 VMs will be added to the same Availability Set, ensuring that they will end up in different Fault Domains and Update Domains.

Create “node1” VM

Create your first VM (“node1”).  In this guide we will be using CentOS 6.X:

create vm1

Make sure you use the Resource Manager deployment model.  It should be selected by default:

create vm2

Give the VM a hostname (“node1”) and username/password that will later be used to SSH into the system.  Make sure you add this VM to your Resource Group (“cluster-resources”) and that it resides in the same region as all of your other resources:

create vm3

Next, choose your instance size.  For more information on the various instance sizes available, click here.

For the purposes of this guide, I’m using “A3 Standard” for Node1 and Node2, to minimize cost since this won’t be running a production workload.  I used an even smaller “A1 Standard” size for the witness server.  Select the instance size that makes most sense for you.

create vm4

If you want to be able to connect into the VM from the outside world, set a Public IP address.  I did this so I can later SSH and VNC into the system.

create vm5

IMPORTANT: By default, your VM won’t be added to an Availability Set.  On the Settings screen during VM creation, make sure you create a new Availability Set, which we’ll call “cluster-availability-set”.  Azure Resource Manager (ARM) allows you to create Availability Sets with 3 Fault Domains.  The default values here are fine:

create vm6

Review your VM properties and click OK to create your first VM:

create vm7

Create “node2” and “witness” VMs

Repeat the steps above twice to create two more VMs.  I created another “A3 Standard” size VM called “node2” and an “A1 Standard” size VM called “witness”.

The only difference here is that you’ll be ADDING these VMs to the Availability Set (“cluster-availability-set”) we just created:

create vm8

It may take a little while for your 3 VMs to provision.  Once complete, you’ll see your VMs listed on the Virtual Machines screen within your Azure Portal:

create vm9

Set VM Static IP Addresses

The VMs will be set with the following IP addresses:

  • node1: 10.0.0.4
  • node2: 10.0.0.5
  • witness: 10.0.0.6

Repeat this step for each VM.  Select your VM and edit the Network Interfaces

 

static ip address1

Select the network interface associated with the VM, and edit IP addresses.  Select “Static” and specify the desired IP address:

static ip address2

Add a Data Disk to cluster nodes

Next, we will need to add an extra disk to each of our cluster nodes (“node1” and “node2”).  This disk will store our MySQL databases and will later be replicated between nodes.

Note: You do NOT need to add an extra disk to the “witness” node.  Only “node1” and “node2”.

Edit your VM, select Disks and then attach a new disk:

second disk1

Select a disk type (Standard or Premium SSD)  and size based on your workload.  Here I create a 10GB Standard disk on both of my cluster nodes.  As far as Host caching goes, “None” or “Read only” caching is fine.  I do not recommend using “Read/Write” as there is potential for data loss:

second disk2

Create Inbound Security Rule to allow VNC access

If your VM is part of a Network Security Group (NSG), which by default it likely is unless you disabled it during VM creation, the only port open in the “Azure firewall” is SSH (port 22).  Later in the guide, I’ll be using VNC to access the desktop of “node1” and configure the cluster using a GUI.  Create an Inbound Security Rule to open up VNC access.  In this guide port 5902 is used.  Adjust this based on your VNC configuration.

Virtual Machines -> (select node1) -> Network interfaces -> (select NIC) -> Network security group -> (select the NSG) -> Inbound security rules -> Add


inbound security rule1

Linux OS Configuration

Here is where we will leave the Azure Portal for a little while and get our hands dirty on the command line, which as a Linux administrator you should be used to by now.  You aren’t given the root password to your Linux VMs in Azure, so once you login as the user specified during VM creation, use the “sudo” command to gain root privileges:

$sudo su -

Edit /etc/hosts

Unless you already have a DNS server set up, you’ll want to create host file entries on all 3 servers so that they can properly resolve each other by name.

Add the following lines to the end of your /etc/hosts file:

10.0.0.4    node1
10.0.0.5    node2
10.0.0.6    witness
10.0.0.99   mysql-vip

Disable SELinux

Edit /etc/sysconfig/selinux and set “SELINUX=disabled”:

# vi /etc/sysconfig/selinux

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted

Configure iptables so that the cluster Virtual IP will work

IMPORTANT: In order to get connectivity to the cluster Virtual IP to work, and also monitoring of the IP resource, a few iptables rules need to be set up.  Note: 10.0.0.99 is the Virtual IP we’ll be using in our cluster, and 3306 is the default port used by MySQL.

On node1 (10.0.0.4), run the following commands:

# iptables --flush
# iptables -t nat -A PREROUTING -p tcp --dport 3306 -j DNAT --to-destination 10.0.0.99:3306
# iptables -t nat -A POSTROUTING -p icmp -s 10.0.0.99 -j SNAT --to-source 10.0.0.4
# service iptables save
# chkconfig iptables on

On Node2 (10.0.0.5), run the following commands:

# iptables --flush
# iptables -t nat -A PREROUTING -p tcp --dport 3306 -j DNAT --to-destination 10.0.0.99:3306
# iptables -t nat -A POSTROUTING -p icmp -s 10.0.0.99 -j SNAT --to-source 10.0.0.5
# service iptables save
# chkconfig iptables on
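
To confirm that the rules took effect, you can list the nat table on each node:

# iptables -t nat -L -n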

Install and Configure VNC (and related packages)

In order to access the GUI of our linux servers, and to later configure our cluster, install VNC server on your cluster node.  In my setup I only did this on “node1”:

# yum install tigervnc-server xterm
# vncpasswd
# vi /etc/sysconfig/vncservers

      VNCSERVERS="2:root"
      VNCSERVERARGS[2]="-geometry 1024x768"

# service vncserver start
# chkconfig vncserver on

Test connectivity by opening a VNC client on your laptop/desktop, and connecting to the Public IP of your cluster node

Reboot Cluster Nodes

Reboot your cluster nodes so that SELinux is disabled, and the 2nd disk you previously added is detected. Only “node1” and “node2” need to be rebooted.

Partition and Format the “data” disk

In Step 6 of this guide (“Add a Data Disk to cluster nodes”) we did just that: added an extra disk to each cluster node to store the application data we will be protecting.  In this case it happens to be MySQL databases.

In Azure IaaS, Linux Virtual Machines use the following arrangement for disks:

  • /dev/sda – OS disk
  • /dev/sdb – temporary disk
  • /dev/sdc – 1st data disk
  • /dev/sdd – 2nd data disk
  • /dev/sdj – 8th data disk

The disk we added in Step 6 of this guide should appear as /dev/sdc.  You can run the “fdisk -l” command to verify.  You’ll see that /dev/sda (OS) and /dev/sdb (temporary) already have disk partitions and are being used.

# fdisk -l

Disk /dev/sdb: 306.0 GB, 306016419840 bytes
255 heads, 63 sectors/track, 37204 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xd3920649

Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1       37205   298842112   83  Linux

Disk /dev/sdc: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/sda: 32.2 GB, 32212254720 bytes
255 heads, 63 sectors/track, 3916 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000c23d3

Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1        3789    30432256   83  Linux
/dev/sda2            3789        3917     1024000   82  Linux swap / Solaris

 

Here I will create a partition (/dev/sdc1), format it, and mount it at the default location for MySQL, which is /var/lib/mysql.  Perform the following steps on BOTH “node1” and “node2”:

# fdisk /dev/sdc
Command (m for help): n
Command action
e   extended
p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-1305, default 1): <enter>
Using default value 1
Last cylinder, +cylinders or +size{K,M,G} (1-1305, default 1305): <enter>
Using default value 1305
 
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
[root@node1 ~]#

# mkfs.ext4 /dev/sdc1
# mkdir /var/lib/mysql

On node1, mount the filesystem:

# mount /dev/sdc1 /var/lib/mysql

Install and Configure MySQL

Next, install the MySQL packages, initialize a sample database, and set the “root” password for MySQL.

On “node1”:

# yum -y install mysql mysql-server
# /usr/bin/mysql_install_db --datadir="/var/lib/mysql/" --user=mysql
# mysqld_safe --user=root --socket=/var/lib/mysql/mysql.sock --port=3306 --datadir=/var/lib/mysql --log &
#
# # NOTE: This next command allows remote connections from ANY host.  NOT a good idea for production!
# echo "update user set Host='%' where Host='node1'; flush privileges" | mysql mysql
#
# #Set MySQL's root password to 'SIOS'
# echo "update user set Password=PASSWORD('SIOS') where User='root'; flush privileges" | mysql mysql

Create a MySQL configuration file.  We will place this on the data disk that will later be replicated (/var/lib/mysql/my.cnf).  Example:

# vi /var/lib/mysql/my.cnf

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
pid-file=/var/lib/mysql/mysqld.pid
user=root
port=3306
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
 
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
 
[client]
user=root
password=SIOS

Delete the original MySQL configuration file, located in /etc, if it exists:

# rm /etc/my.cnf

On “node2”:

On “node2”, you ONLY need to install the MySQL packages.  The other steps aren’t required:

[root@node2 ~]# yum -y install mysql mysql-server

Install and Configure the Cluster

At this point, we are ready to install and configure our cluster.  SIOS Protection Suite for Linux (aka SPS-Linux) will be used in this guide as the clustering technology.  It provides both high availability failover clustering features (LifeKeeper) as well as real-time, block level data replication (DataKeeper) in a single, integrated solution.  SPS-Linux enables you to deploy a “SANLess” cluster, aka a “shared nothing” cluster meaning that cluster nodes don’t have any shared storage, as is the case with Azure VMs.

Install SIOS Protection Suite for Linux

Perform the following steps on ALL 3 VMs (node1, node2, witness):

Download the SPS-Linux installation image file (sps.img) and obtain either a trial license or purchase permanent licenses.  Contact SIOS for more information.

You will loopback mount it and run the “setup” script inside, as root (or first “sudo su -” to obtain a root shell)

For example:

# mkdir /tmp/install
# mount -o loop sps.img /tmp/install
# cd /tmp/install
# ./setup

During the installation script, you’ll be prompted to answer a number of questions.  You will hit Enter on almost every screen to accept the default values.  Note the following exceptions:

  • On the screen titled “High Availability NFS” you may select “n” as we will not be creating a highly available NFS server
  • Towards the end of the setup script, you can choose to install a trial license key now, or later. We will install the license key later, so you can safely select “n” at this point
  • In the final screen of the “setup” select the ARKs (Application Recovery Kits, i.e. “cluster agents”) you wish to install from the list displayed on the screen.
    • The ARKs are ONLY required on “node1” and “node2”.  You do not need to install on “witness”
    • Navigate the list with the up/down arrows, and press SPACEBAR to select the following:
      • lkDR – DataKeeper for Linux
      • lkSQL – LifeKeeper MySQL RDBMS Recovery Kit
    • This will result in the following additional RPMs installed on “node1” and “node2”:
      • steeleye-lkDR-9.0.2-6513.noarch.rpm
      • steeleye-lkSQL-9.0.2-6513.noarch.rpm

Install Witness/Quorum package

The Quorum/Witness Server Support Package for LifeKeeper (steeleye-lkQWK) combined with the existing failover process of the LifeKeeper core allows system failover to occur with a greater degree of confidence in situations where total network failure could be common. This effectively means that failovers can be done while greatly reducing the risk of “split-brain” situations.

Install the Witness/Quorum rpm on all 3 nodes (node1, node2, witness):

# cd /tmp/install/quorum
# rpm -Uvh steeleye-lkQWK-9.0.2-6513.noarch.rpm

On ALL 3 nodes (node1, node2, witness), edit /etc/default/LifeKeeper, set

NOBCASTPING=1

On ONLY the Witness server (“witness”), edit /etc/default/LifeKeeper, set

WITNESS_MODE=off/none

Install a License key

On all 3 nodes, use the “lkkeyins” command to install the license file that you obtained from SIOS:

# /opt/LifeKeeper/bin/lkkeyins <path_to_file>/<filename>.lic

Start LifeKeeper

On all 3 nodes, use the “lkstart” command to start the cluster software:

# /opt/LifeKeeper/bin/lkstart

Set User Permissions for LifeKeeper GUI

On all 3 nodes, edit /etc/group and add the “tony” user (or whatever username you specified during VM creation) to the “lkadmin” group to grant access to the LifeKeeper GUI.  By default only “root” is a member of the group, and we don’t have the root password in Azure:

# vi /etc/group

lkadmin:x:1001:root,tony

Open the LifeKeeper GUI

Make a VNC connection to the Public IP address of node1.  Based on the VNC and Inbound Security Rule configuration from above, you would connect to <Public_IP>:2 using the VNC password you specified earlier.  Once logged in, open a terminal window and run the LifeKeeper GUI using the following command:

# /opt/LifeKeeper/bin/lkGUIapp &

You will be prompted to connect to your first cluster node (“node1”).  Enter the linux userid and password specified during VM creation:

lk-gui-connect1

Next, connect to both “node2” and “witness” by clicking the “Connect to Server” button highlighted in the following screenshot:

lk-gui-connect2

You should now see all 3 servers in the GUI, with a green checkmark icon indicating they are online and healthy:

lk-gui-connect3

Create Communication Paths

Right-click on “node1” and select Create Comm Path

comm path1

Select BOTH “node2” and “witness” and then follow the wizard.  This will create comm paths between:

  • node1 & node2
  • node1 & witness

comm path2

A comm path still needs to be created between node2 & witness.   Right click on “node2” and select Create Comm Path.  Follow the wizard and select “witness” as the remote server:

comm path3

At this point the following comm paths have been created:

  • node1 <—> node2
  • node1 <—> witness
  • node2 <—> witness

The icons in front of the servers have changed from a green “checkmark” to a yellow “hazard sign”.  This is because we only have a single communication path between nodes.

If the VMs had multiple NICs (information on creating Azure VMs with multiple NICs can be found here, but won’t be covered in this article), you would create redundant comm paths between each server.

comm path4

To remove the warning icons, go to the View menu and de-select “Comm Path Redundancy Warning”:

comm path5

Result:

comm path6

Verify Communication Paths

Use the “lcdstatus” command to view the state of cluster resources.  Run the following commands to verify that you have correctly created comm paths on each node to the other two servers involved:

# /opt/LifeKeeper/bin/lcdstatus -q -d node1

MACHINE  NETWORK ADDRESSES/DEVICE   STATE     PRIO
node2    TCP     10.0.0.4/10.0.0.5  ALIVE        1
witness  TCP     10.0.0.4/10.0.0.6  ALIVE        1

# /opt/LifeKeeper/bin/lcdstatus -q -d node2

MACHINE  NETWORK ADDRESSES/DEVICE   STATE     PRIO
node1    TCP     10.0.0.5/10.0.0.4  ALIVE        1
witness  TCP     10.0.0.5/10.0.0.6  ALIVE        1

# /opt/LifeKeeper/bin/lcdstatus -q -d witness

MACHINE  NETWORK ADDRESSES/DEVICE   STATE     PRIO
node1    TCP     10.0.0.6/10.0.0.4  ALIVE        1
node2    TCP     10.0.0.6/10.0.0.5  ALIVE        1
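
If you’d rather script this check than eyeball three sets of output, a small loop like the following will do it; this sketch assumes you can SSH as root from one node (or your workstation) to all three:

# for h in node1 node2 witness; do echo "== $h =="; ssh $h /opt/LifeKeeper/bin/lcdstatus -q -d $h; done

Any comm path not reported as ALIVE needs attention before you proceed.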

Create a Data Replication cluster resource (i.e. Mirror)

Next, create a Data Replication resource to replicate the /var/lib/mysql partition from node1 (source) to node2 (target).  Click the “green plus” icon to create a new resource:

data replication1

Follow the wizard with these selections:

Please Select Recovery Kit:  Data Replication
Switchback Type: intelligent
Server: node1
Hierarchy Type: Replicate Existing Filesystem
Existing Mount Point: /var/lib/mysql
Data Replication Resource Tag: datarep-mysql
File System Resource Tag: /var/lib/mysql
Bitmap File: (default value)
Enable Asynchronous Replication:  No

After the resource has been created, the “Extend” (i.e. define backup server) wizard will appear.  Use the following selections:

Target Server: node2
Switchback Type: Intelligent
Template Priority: 1
Target Priority: 10
Target Disk: /dev/sdc1
Data Replication Resource Tag: datarep-mysql
Bitmap File: (default value)
Replication Path: 10.0.0.4/10.0.0.5
Mount Point: /var/lib/mysql
Root Tag: /var/lib/mysql

The cluster will look like this:

data replication2

Create Virtual IP

Next, create a Virtual IP cluster resource.  Click the “green plus” icon to create a new resource:

virtual ip1

Follow the wizard to create the IP resource with these selections:

Select Recovery Kit: IP
Switchback Type: Intelligent
IP Resource: 10.0.0.99
Netmask: 255.255.255.0
Network Interface: eth0
IP Resource Tag: ip-10.0.0.99

Extend the IP resource with these selections:

Switchback Type: Intelligent
Template Priority: 1
Target Priority: 10
IP Resource: 10.0.0.99
Netmask: 255.255.255.0
Network Interface: eth0
IP Resource Tag: ip-10.0.0.99

Configure a Ping List for the IP resource

By default, SPS-Linux monitors the health of IP resources by performing a broadcast ping.  In many virtual and cloud environments, broadcast pings don’t work.  In a previous step, we set “NOBCASTPING=1” in /etc/default/LifeKeeper to turn off broadcast ping checks. Instead, we will define a ping list.  This is a list of IP addresses to be pinged during IP health checks for this IP resource.   In this guide, we will add the witness server (10.0.0.6) to our ping list.
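
Before adding an address to the ping list, confirm from node1 and node2 that it actually responds, since an unreachable ping-list entry would cause the IP health check to fail:

# ping -c 3 10.0.0.6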

Right click on the IP resource (ip-10.0.0.99) and select Properties:

ping list1

You will see that initially, no ping list is configured for our 10.0.0.0 subnet.   Click “Modify Ping List”:

ping list2

Enter “10.0.0.6” (the IP address of our witness server), click “Add address” and finally click “Save List”:

ping list3

You will be returned to the IP properties panel, and can verify that 10.0.0.6 has been added to the ping list.  Click OK to close the window:

ping list4

Create the MySQL resource hierarchy

Next, create a MySQL cluster resource.  The MySQL resource is responsible for stopping/starting/monitoring of your MySQL database.  To create, click the “green plus” icon to create a new resource:

Follow the wizard to create the MySQL resource with these selections:

Select Recovery Kit: MySQL Database
Switchback Type: Intelligent
Server: node1
Location of my.cnf: /var/lib/mysql
Location of MySQL executables: /usr/bin
Database Tag: mysql

Extend the MySQL resource with the following selections:

Target Server: node2
Switchback Type: intelligent
Template Priority: 1
Target Priority: 10

As a result, your cluster will look as follows.  Notice that the Data Replication resource was automatically moved underneath the database (dependency automatically created) to ensure it’s always brought online before the database:

mysql-resource1

Create a Dependency between the IP resource and the MySQL Database resource

Create a dependency between the IP resource and the MySQL Database resource so that they failover together as a group.  Right click on the “mysql” resource and select “Create Dependency”:

create-dependency1

On the following screen, select the “ip-10.0.0.99” resource as the dependency.  Click Next and continue through the wizard:

create-dependency2

At this point the SPS-Linux cluster configuration is complete.  The resource hierarchy will look as follows:

create-dependency3

 

Create an Internal Load Balancer

If this was a typical on-premises cluster using either physical or virtual servers, you’d be done at this point.  Clients and Applications would connect into the Virtual IP of the cluster (10.0.0.99) to reach the active node.  In Azure, this doesn’t work without some additional configuration.

You will notice that you can’t connect to the Virtual IP from any server other than the node that is currently active.  Most cloud providers, including Azure, do not allow or support gratuitous ARPs, which is the reason you can’t connect to the Virtual IP directly.

To work around this, Azure provides a feature where you can set up an Internal Load Balancer (ILB).  Essentially, when you connect to the IP address of the ILB (which we will actually set to be the same as the cluster’s Virtual IP – 10.0.0.99) you are routed to the currently active cluster node.

Create a Load Balancer:

internal load balancer1

Give it a name, select “Internal” as the scheme, make sure your virtual network and subnet are properly selected, and assign a static IP that is the same as the cluster’s Virtual IP address.  In this example it’s 10.0.0.99:

internal load balancer2

Next, add a backend pool behind the load balancer.  This is how you place the two cluster VMs behind this load balancer:

internal load balancer3

Select both of your cluster nodes (node1, node2) and add them to the Backend Pool:

internal load balancer4

Once saved, expand the backend pool (called “ILBBackEnd” here) and you’ll see both VMs underneath along with their status and IPs.  It may take a few seconds before the screen updates:

internal load balancer5

Next, configure a probe for your ILB.  The probe checks the health of a service behind the ILB to determine which node to route traffic to.  Here we will specify port 3306, which is the default for MySQL:

internal load balancer6
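
To see what the probe will see, check each cluster node for a listener on port 3306.  Only the active node (where the MySQL resource is in service) should show one, which is exactly how the ILB decides where to route traffic:

# netstat -tln | grep 3306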

Finally, complete the ILB configuration by creating a Load Balancing Rule.  Select TCP, port 3306, and make sure “Floating IP (direct server return)” is set to “Enabled”:

internal load balancer7

Test Cluster Connectivity

At this point, all of our Azure and Cluster configurations are complete!

Cluster resources are currently active on node1:

create-dependency3

SSH into the witness server and run “sudo su -” to gain root access.  Install the mysql client if needed:

[root@witness ~]# yum -y install mysql

Test MySQL connectivity to the cluster:

[root@witness ~]# mysql --host=10.0.0.99 mysql -u root -p

Execute the following MySQL query to display the hostname of the active cluster node:

mysql> select @@hostname;
+------------+
| @@hostname |
+------------+
| node1      |
+------------+
1 row in set (0.00 sec)
mysql>
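
To watch connectivity during the failover test that follows, you can poll the hostname in a loop from the witness.  This sketch passes the password on the command line purely for lab convenience; replace <password> with your MySQL root password:

[root@witness ~]# while true; do mysql --host=10.0.0.99 -u root -p<password> -e "select @@hostname;"; sleep 2; done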

Using the LifeKeeper GUI, fail over from node1 to node2.  Right-click on the mysql resource underneath node2 and select “In Service…”:

failover1

After failover:

failover2

After failover has completed, re-run the MySQL query to display the hostname of the active cluster node.  You’ll notice that the MySQL client detects that the session was lost during failover and automatically reconnects, verifying that “node2” is now active:

mysql> select @@hostname;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    48
Current database: mysql
+------------+
| @@hostname |
+------------+
| node2      |
+------------+
1 row in set (0.56 sec)
mysql>

 

Nov 262012
 

Much as a chain is only as strong as its weakest link, the effectiveness of a high availability cluster is limited by any single points of failure (SPOFs) that exist within its deployment.  To ensure the absolute highest levels of availability, SPOFs must be removed.  There is a straightforward method for ridding the cluster of these weak links.

First, you must identify any SPOFs that exist, paying particular attention to servers, network connections and storage devices.  Modern servers come with redundant and error-correcting memory, data striping across hard disks, and multiple CPUs, which eliminates most hardware components as a SPOF.  Software and human error, however, can result in server or application downtime.  Deploying a high availability cluster solution which monitors the health of servers and critical applications, and takes automatic recovery actions in the event of failure, eliminates this SPOF.  All clustering solutions provide basic ping tests to validate server functionality, but only more advanced offerings also track application health and have the ability to automatically recover from detected failures.  This deeper level of detection and recovery minimizes downtime.

Architecting all components of the cluster for redundancy is paramount to maximizing uptime.  Connections to storage often represent a SPOF and it is critical that multi-pathing is architected into any shared storage configuration.  Linux DM Multipath (DM-MPIO) provides the rerouting of block I/O to an alternate path in the event of a path failure. This eliminates all components in the path from server to storage as a potential SPOF and provides automatic recovery should a failure occur.

But even configured with multi-pathing, shared storage/SANs still represent single points of failure as does the physical data center where it is located.  To provide further protection, off-site replication of critical data combined with cross-site clustering must be deployed.  Combined with network redundancy between sites, this optimal solution removes all SPOFs.  Real-time replication ensures that an up-to-date copy of business critical data is always available; doing this off-site to a backup data center or into a cloud service also protects against primary data center outages that can result from fire, power outages, etc.

The use of application-level monitoring and auto-recovery, multi-pathing for shared storage, and data replication for off-site protection each eliminate potential Single Points of Failure within your cluster architecture.  Paying attention to these components during cluster architecture and deployment will ensure the greatest possible levels of uptime.

Nov 072012
 

Are you familiar with the Availability Equation? In a nutshell, this equation shows how the total time needed to restore an application to usability is equal to the time required to detect that an application is experiencing a problem plus the time required to perform a recovery action:

T_restore = T_detect + T_recover

The equation introduces the key concepts of high availability (HA): clustering, problem detection, and subsequent recovery. HA solutions monitor the health of business application components; when problems are detected, these solutions act to restore them to service. The objective of deploying an HA solution is to minimize downtime.
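
To put hypothetical numbers on it: if it takes 30 seconds to detect that an application has failed and another 90 seconds to bring it back into service, then

T_restore = 30 s + 90 s = 120 s

of downtime.  Shrinking either term shrinks the outage, which is worth keeping in mind as you compare solutions.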

Reducing detection time and reducing recovery time are two important tasks of any HA solution that you choose to deploy. Today’s applications are combinations of technologies: servers, storage, network infrastructure, and so on. When reviewing your HA options, be certain that you understand the technologies that each solution uses to detect and recover from all outage types. Each technology has a direct impact on service restoration times.

One technology that is crucial to providing the fastest possible restoration time is known as local detection and recovery (aka service-level problem detection and recovery). In a basic clustering solution, servers are connected and configured such that one or more servers can take over the operations of another in the event of a server failure. The server nodes in the cluster continuously send small data packets, often called heartbeat signals, to each other to indicate that they are “alive”.

In simple clustered environments, when one server stops generating heartbeats, other cluster members assume that this server is down and begin the process of taking over responsibility for that server’s domain of operation. This approach is adequate for detecting failure at the server level. But unless problems cause the interruption or cessation of heartbeat signals, server-level detection is inadequate. More than that, it can actually magnify the extent and impact of an outage.

For example, if Apache processes hang, the server may still send heartbeats — even though the Web server subsystem has ceased to perform its primary function. Rather than restart the Apache subsystem on the same or a different server, a basic server-level clustering solution would restart the entire software stack of the failed server on a backup server, thereby causing interruption to users and extending recovery time.

Using local detection and recovery, advanced clustering solutions deploy health-monitoring agents within individual cluster servers, to monitor individual system components such as a file system, a database, user-level application, IP address, and so on. These agents use heuristics that are specific to the monitored component. Therefore, the agents can predict and detect operational issues and then take the most appropriate recovery action. Often, the most efficient recovery method is to stop and restart the problem subsystem on the same server.
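
As a purely illustrative sketch (not LifeKeeper’s actual agent logic), a service-level monitor for a MySQL database might behave like this; mysqladmin ping is a real health probe, while escalate_to_failover is a hypothetical hook standing in for the cluster’s cross-node recovery:

#!/bin/sh
# Illustrative only: attempt local recovery first, escalate if it fails.
if ! mysqladmin ping --silent; then
    service mysqld restart
    if ! mysqladmin ping --silent; then
        escalate_to_failover    # hypothetical hook into the cluster
    fi
fi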

By detecting failures at a more granular level than simply by observing server-level heartbeats, and by enabling recovery within the same physical server, the time to restore an application to user availability can be greatly reduced. Solutions such as the SteelEye Protection Suite for Linux from SIOS provide this level of detection and recovery for your environment.  Make certain that whichever HA solution you deploy can also support local detection and recovery.

Nov 012012
 

When selecting a high-availability (HA) solution, you should consider several criteria. These range from the total cost of the solution, to the ease with which you can configure and manage the cluster, to the specific restrictions placed on hardware and software. This post touches briefly on 12 of the most important checklist items.

1. Support for standard OS and application versions

Solutions that require enterprise or advanced versions of the OS, database, or application software can greatly reduce the cost benefits of moving to a commodity server environment. By deploying the proper HA middleware, you can make standard versions of applications and OSs highly available and meet the uptime requirements of your business environment.

2. Support for a variety of data storage configurations

When you deploy an HA cluster, the data that the protected applications require must be available to all systems that might need to bring the applications into service. You can share this data via data replication, by using shared SCSI or Fibre Channel storage, or by using a NAS device. Whichever method you decide to deploy, the HA product that you use must be able to support all data configurations so that you can change as your business needs dictate.

3. Ability to use heterogeneous solution components

Some HA clustering solutions require that every system within the cluster has identical configurations. This requirement is common among hardware-specific solutions in which clustering technology is meant to differentiate servers or storage and among OS vendors that want to limit the configurations they are required to support. This restriction limits your ability to deploy scaled-down servers as temporary backup nodes and to reuse existing hardware in your cluster deployment. Deploying identically configured servers might be the correct choice for your needs, but the decision shouldn’t be dictated by your HA solution provider.

4. Support for more than two nodes within a cluster

The number of nodes that can be supported in a cluster is an important measure of scalability. Entry-level HA solutions typically limit you to one two-node cluster, usually in active/passive mode. Although this configuration provides increased availability (via the addition of a standby server), it can still leave you exposed to application downtime. In a two-node cluster configuration, if one server is down for any reason, then the remaining server becomes a single point of failure. By clustering three or more nodes, you not only gain the ability to provide higher levels of protection, but you can also build highly scalable configurations.

5. Support for active/active and active/standby configurations

In an active/standby configuration, one server is idle, waiting to take over the workload of its cluster member. This setup has the obvious disadvantage of underutilizing your compute resource investment. To get the most benefit from your IT expenditure, ensure that cluster nodes can run in an active/active configuration.

6. Detection of problems at node and individual service levels

All HA software products can detect problems with cluster server functionality. This task is done by sending heartbeat signals between servers within the cluster and initiating a recovery if a cluster member stops delivering the signals. But advanced HA solutions can also detect another class of problems, one that occurs when individual processes or services encounter problems that render them unavailable but that do not cause servers to stop sending or responding to heartbeat signals. Given that the primary function of HA software is to ensure that applications are available to end users, detecting and recovering from these service level interruptions is a crucial feature. Ensure that your HA solution can detect both node- and service-level problems.

7. Support for in-node and cross-node recovery

The ability to perform recovery actions both across cluster nodes and within a node is also important. In cross-node recovery, one node takes over the complete domain of responsibility for another. When systems-level heartbeats are missed, the server which should have sent the heartbeats is assumed to be out of operation, and other cluster members begin recovery operations. With in-node or local recovery, the solution first attempts to restore failed system services within the server on which they are running. This is typically done by stopping and restarting the service and any dependent system resources. This recovery method is much faster and minimizes downtime.

8. Transparency to client connections of server-side recovery

Server-side recovery of an application or even of an entire node should be transparent to client-side users. Through the use of virtualized IP addresses or server names, the mapping of virtual compute resources onto physical cluster entities during recovery, and automatic updating of network routing tables, no changes to client systems are necessary for the systems to access recovered applications and data. Solutions that require manual client-side configuration changes to access recovered applications greatly increase recovery time and introduce the risk of additional errors due to required human interaction. Recovery should be automated on both the servers and clients.

9. Protection for planned and unplanned downtime

In addition to providing protection against unplanned service outages, the HA solution that you deploy should be usable as an administration tool to lessen downtime caused by maintenance activities. By providing a facility to allow on-demand movement of applications between cluster members, you can migrate applications and users onto a second server while performing maintenance on the first. This can eliminate the need for maintenance windows in which IT resources are unavailable to end users. Ensure that your HA solution provides a simple and secure method for performing manual (on-demand) movement of applications and needed resources among cluster nodes.

10. Off-the-shelf protection for common business functions

Every HA solution that you evaluate should include tested and supported agents or modules that are designed to monitor the health of specific system resources: file systems, IP addresses, databases, applications, and so on. These modules are often called recovery modules. By deploying vendor-supplied modules, you benefit from the testing and real-world run time that the vendor and other customers have already put in. You also have the assurance of ongoing support and maintenance of these solution components.

11. Ability to easily incorporate protection for custom business applications

There will likely be applications, perhaps custom to your corporation, that you want to protect but for which there are no vendor-supplied recovery modules. It is important, therefore, that you have a method for easily incorporating your business application into your HA solution’s protection schema. You should be able to do this without modifying your application, and especially without having to embed any vendor-specific APIs. A software developer’s kit that provides examples and a step-by-step process for protecting your application should be available, along with vendor-supplied support services, to assist as needed.

12. Ease of cluster deployment and management

A common myth surrounding HA clusters is that they are costly and complex to deploy and administer. This is not necessarily true. Cluster administration interfaces should be wizard-driven to assist with initial cluster configuration, should include auto-discovery of new elements as they are added to the cluster, and should allow for at-a-glance status monitoring of the entire cluster. Also, any cluster metadata must be stored in an HA fashion, not on a single quorum disk within the cluster, where corruption or an outage could cause the entire cluster to fall apart.

 

By looking for the capabilities on this checklist, you can make the best decision for your particular HA needs.

Oct 112012
 

Are you looking for a powerful yet easy to implement High Availability / Disaster Recovery solution for your SAP environment?  If so, you will want to take a look at the SteelEye Protection Suite (SPS) for Linux, from SIOS Technologies.  SPS provides integrated High Availability and Data Replication functionality that works with any server or storage configuration.  Support for SAP is provided out-of-the-box without the need for any scripting or customizations.

SPS for Linux was recently officially certified by SAP against their “SAP NetWeaver High Availability Cluster 730 Certification” (NW-HA-CLU 730).

A list of certified HA solutions for SAP can be found here:  http://scn.sap.com/docs/DOC-31701

For more information on SPS for Linux’s SAP functionality, please visit:

http://us.sios.com

or

http://us.sios.com/wp-content/uploads/2011/05/SPS-for-Linux-SAP-July-23.pdf

Sep 252012
 

The primary advantage of running a MySQL cluster is obviously high availability (HA). To get the most from this type of solution, you will want to eliminate as many potential single points of failure as possible. Conventional wisdom says that you can’t form a cluster without some type of shared storage, which technically represents a single point of failure in your clustering architecture. However, there are solutions, such as the SteelEye Protection Suite (SPS) for Linux, that allow you to eliminate storage as a single point of failure by providing real-time data replication between cluster nodes. Let’s look at a typical scenario: You form a cluster that leverages local, replicated storage to protect a MySQL database.

This step-by-step discussion presumes that you’re working with an evaluation copy of SPS in a lab environment. We’re also presuming that you’ve confirmed that the primary and secondary servers and the network all meet the requirements for running this type of setup. (You can find details of these requirements in the SIOS SteelEye Protection Suite for Linux MySQL with Data Replication Evaluation Guide.)

Getting started

Before you begin setting up your cluster, you’ll need to configure the storage. The data that you want to replicate needs to reside on a separate file system or logical volume. Keep in mind that the size of the target disk, whether you’re using a partition or logical volume, must be equal to or larger than the source.

In this example, we presume that you’re using a disk partition. (However, LVM is also fully supported.) First, partition the local storage for use with SteelEye DataKeeper. On the primary server, identify a free, unused disk partition to use as the MySQL repository or create a new partition. Use the fdisk utility to partition the disk, then format the partition and temporarily mount it at /mnt. Move any existing data from /var/lib/mysql/ into this new disk partition (assuming a default MySQL configuration). Unmount and then remount the partition at /var/lib/mysql. You don’t need to add this partition to /etc/fstab, as it will be mounted automatically by SPS.
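
Concretely, the sequence on the primary server looks something like this (the device name /dev/sdb is an assumption; substitute your own free disk, and note that mkfs destroys any existing data on the partition):

# fdisk /dev/sdb
(create a single partition, /dev/sdb1 in this sketch)
# mkfs -t ext3 /dev/sdb1
# mount /dev/sdb1 /mnt
# mv /var/lib/mysql/* /mnt
# umount /mnt
# mount /dev/sdb1 /var/lib/mysql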

On the secondary server, configure your disk as you did on the primary server.

Installing MySQL

Next you’ll deal with MySQL. On the primary server, install both the mysql and mysql-server RPM packages (if they don’t already exist on the system) and apply any required dependencies. Verify that your local disk partition is still mounted at /var/lib/mysql. If necessary, initialize a sample MySQL database. Ensure that all the files in your MySQL data directory (/var/lib/mysql) have the correct permissions and ownership, and then manually start the MySQL daemon from the command line. (Note: Do not start MySQL via the service command or the /etc/init.d/ script.)
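
On a CentOS-era system, those steps translate roughly to the following (a sketch assuming the default data directory; the final line is the manual start the note above calls for, not the service script, and ownership should match the user configured in your my.cnf):

# yum -y install mysql mysql-server
# mysql_install_db --datadir=/var/lib/mysql --user=mysql
# chown -R mysql:mysql /var/lib/mysql
# /usr/bin/mysqld_safe --datadir=/var/lib/mysql &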

Connect with the mysql client to verify that MySQL is running.

Update and verify the root password for your MySQL configuration. Then create a MySQL configuration file, such as the sample file shown here:

———-

# cat /var/lib/mysql/my.cnf

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
pid-file=/var/lib/mysql/mysqld.pid
user=root
port=3306
# Default to using old password format for compatibility with mysql 3.x
# clients (those using the mysqlclient10 compatibility package).
old_passwords=1

# Disabling symbolic-links is recommended to prevent assorted security risks;
# to do so, uncomment this line:
# symbolic-links=0

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

[client]

user=root

password=SteelEye

———-

In this example, we place this file in the same directory that we will later replicate (/var/lib/mysql/my.cnf). Delete the original MySQL configuration file (in /etc).

On the secondary server, install both the mysql and mysql-server RPM packages if necessary, apply any dependencies, and ensure that all the files in your MySQL data directory (/var/lib/mysql) have the correct permissions and ownership.

Installing SPS for Linux

Next, install SPS for Linux. For ease of installation, SIOS provides a unified installation script (called “setup”) for SPS for Linux. Instructions for how to obtain this software are in an email that comes with the SPS for Linux evaluation license keys.

Download the software and evaluation license keys on both the primary and secondary servers. On each server, run the installer script, which will install a handful of prerequisite RPMs, the core clustering software, and any optional ARKs that are needed. In this case, you will want to install the MySQL ARK (steeleye-lkSQL) and the DataKeeper (i.e., Data Replication) ARK (steeleye-lkDR). Apply the license key via the /opt/LifeKeeper/bin/lkkeyins command and start SPS for Linux via its start script, /opt/LifeKeeper/bin/lkstart.

At this point you have SPS installed, licensed, and running on both of your nodes, and your disk and the MySQL database that you want to protect are configured.

In the next post, we’ll look at the remaining steps in the shared-nothing clustering process:

  • Creating communication (Comm) paths, i.e. heartbeats, between the primary and target servers
  • Creating an IP resource
  • Creating the mirror and launching data replication
  • Creating the MySQL database resource
  • Creating the MySQL IP address dependency
Sep 252012
 

The previous post introduced the advantages of running a MySQL cluster, using a shared-nothing storage configuration. We also began walking through the process of setting up the cluster, using data replication and SteelEye Protection Suite (SPS) for Linux. In this post, we complete the process. Let’s get started.

Creating Comm paths

Now it’s time to access the SteelEye LifeKeeper GUI. LifeKeeper is an integrated component of SPS for Linux, and the LifeKeeper GUI is a Java-based application that can be run as a native Linux app or as an applet within a Java-enabled Web browser. (The GUI is based on Java RMI with callbacks, so hostnames must be resolvable or you might receive a Java 115 or 116 error.)

To start the GUI application, enter this command on either of the cluster nodes:

# /opt/LifeKeeper/bin/lkGUIapp &

Or, to open the GUI applet from a Web browser, go to http://<hostname>:81.

The first step is to make sure that you have at least two TCP communication (Comm) paths between each primary server and each target server, for heartbeat redundancy. This way, the failure of one communication line won’t cause a split-brain situation. Verify the paths on the primary server. The following screenshots walk you through the process of logging into the GUI, connecting to both cluster nodes, and creating the Comm paths.

Step 1: Connect to primary server

tutorial image

Step 2: Connect to secondary server

tutorial image

Step 3: Create the Comm path

tutorial image

Step 4: Choose the local and remote servers

tutorial image

tutorial image

Step 5: Choose device type

tutorial image

Next, you are presented with a series of dialogue boxes. For each box, provide the required information and click Next to advance. (For each field in a dialogue box, you can click Help for additional information.)

Step 6: Choose IP address for local server to use for Comm path

tutorial image

Step 7: Choose IP address for remote server to use for Comm path

tutorial image

Step 8: Enter Comm path priority on local server

tutorial image

After entering data in all the required fields, click Create. You’ll see a message that indicates that the network Comm path was successfully created.

Step 9: Finalize Comm path creation

tutorial image

Click Next. If you chose multiple local IP addresses or remote servers and set the device type to TCP, then the procedure returns you to the setup wizard to create the next Comm path. When you’re finished, click Done in the final dialogue box. Repeat this process until you have defined all the Comm paths you plan to use.

Verify that the communications paths are configured properly by viewing the Server Properties dialogue box. From the GUI, select Edit > Server > Properties, and then choose the CommPaths tab. The displayed state should be ALIVE. You can also check the server icon in the right-hand primary pane of the GUI. If only one Comm path has been created, the server icon is overlayed with a yellow warning icon. A green heartbeat checkmark indicates that at least two Comm paths are configured and ALIVE.

Step 10: Review Comm path state

tutorial image

Creating and extending an IP resource

In the LifeKeeper GUI, create an IP resource and extend it to the secondary server by completing the following steps. This virtual IP can move between cluster nodes along with the application that depends on it. By using a virtual IP as part of your cluster configuration, you provide seamless redirection of clients upon switchover or failover of resources between cluster nodes because they continue to access the database via the same FQDN/IP.

Step 11: Create resource hierarchy

tutorial image

Step 12: Choose IP ARK

tutorial image

Enter the appropriate information for your configuration, using the following recommended values. (Click the Help button for further information.) Click Next to continue after entering the required information.

Resource Type: Choose IP Address as the resource type and click Next.
Switchback Type: Choose Intelligent and click Next.
Server: Choose the server on which the IP resource will be created (your primary server) and click Next.
IP Resource: Enter the virtual IP information and click Next.  (This is an IP address that is not in use anywhere on your network.  All clients will use this address to connect to the protected resources.)
Netmask: Enter the IP subnet mask that your TCP/IP resource will use on the target server.  Any standard netmask for the class of the specific TCP/IP resource address is valid.  The subnet mask, combined with the IP address, determines the subnet that the TCP/IP resource will use and should be consistent with the network configuration.  In this sample configuration, 255.255.255.0 is used as the subnet mask on both networks.
Network Connection: This is the physical Ethernet card with which the IP address interfaces.  Choose the network connection that will allow your virtual IP address to be routable, select the correct NIC, and click Next.
IP Resource Tag: Accept the default value and click Next.  This value affects only how the IP is displayed in the GUI.  The IP resource will be created on the primary server.

LifeKeeper creates and validates your resource. After receiving the message that the resource has been created successfully, click Next.

Step 13: Review notice of successful resource creation

tutorial image

Now you can complete the process of extending the IP resource to the secondary server.

Step 14: Extend IP resource to secondary server

tutorial image

The process of extending the IP resource starts automatically after you finish creating an IP address resource and click Next. You can also start this process from an existing IP address resource, by right-clicking the active resource and selecting Extend Resource Hierarchy. Use the information in the following table to complete the procedure.

Switchback Type: Leave as intelligent and click Next.
Template Priority: Leave as default (1).
Target Priority: Leave as default (10).
Network Interface: This is the physical Ethernet card with which the IP address interfaces.  Choose the network connection that will allow your virtual IP address to be routable.  The correct physical NIC should be selected by default; verify and then click Next.
IP Resource Tag: Leave as default.
Target Restore Mode: Choose Enable and click Next.
Target Local Recovery: Choose Yes to enable local recovery for the SQL resource on the target server.
Backup Priority: Accept the default value.

 

After receiving the message that the hierarchy extension operation is complete, click Finish and then click Done.

Your IP resource (example: 192.168.197.151) is now fully protected and can float between cluster nodes, as needed. In the LifeKeeper GUI, you can see that the IP resource is listed as Active on the primary cluster node and Standby on the secondary cluster node.

Step 15: Review IP resource state on primary and secondary nodes

tutorial image

Creating a mirror and beginning data replication

You’re ready to set up and configure the data replication resource, which you’ll use to synchronize MySQL data between cluster nodes. For this example, the data to replicate is in the /var/lib/mysql partition on the primary cluster node. The source volume must be mounted on the primary server, the target volume must not be mounted on the secondary server, and the target volume size must be equal to or larger than the source volume size.
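
You can verify these preconditions from the command line before touching the GUI (the /dev/sdb device name is an assumption; use whatever partition you prepared earlier).  On the primary, confirm the source is mounted:

# df -h /var/lib/mysql

On the secondary, confirm the target partition exists, is at least as large as the source, and is not mounted:

# fdisk -l /dev/sdb
# mount | grep mysql

(The mount command should return nothing for /var/lib/mysql on the secondary.)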

The following screenshots illustrate the next series of steps.

Step 16: Create resource hierarchy

tutorial image

Step 17: Choose Data Replication ARK

tutorial image

Use these values in the Data Replication wizard.

Switchback Type: Choose Intelligent.
Server: Choose LinuxPrimary (the primary cluster node or mirror source).
Hierarchy Type: Choose Replicate Existing Filesystem.
Existing Mount Point: Choose the mounted partition to replicate; in this example, /var/lib/mysql.
Data Replication Resource Tag: Leave as default.
File System Resource Tag: Leave as default.
Bitmap File: Leave as default.
Enable Asynchronous Replication: Leave as default (Yes).

Click Next to begin the creation of the data replication resource hierarchy. The GUI will display the following message.

Step 18: Begin creation of Data Replication resource

tutorial image

Click Next to begin the process of extending the data replication resource. Accept all default settings. When asked for a target disk, choose the free partition on your target server that you created earlier in this process. Make sure to choose a partition that is as large as or larger than the source volume and that is not mounted on the target system.

Step 19: Begin extension of Data Replication resource

tutorial image

Eventually, you are prompted to choose the network over which you want the replication to take place. In general, separating your user and application traffic from your replication traffic is best practice. This sample configuration has two separate network interfaces, our “public NIC” on the 192.168.197.X subnet and a “private/backend NIC” on the 192.168.198.X subnet. We will configure replication to go over the back-end network 192.168.198.X, so that user and application traffic is not competing with replication.

Step 20: Choose network for replication traffic

tutorial image

Click Next to continue through the wizard. Upon completion, your resource hierarchy will look like this:

Step 21: Review Data Replication resource hierarchy

tutorial image

Creating the MySQL resource hierarchy

You need to create a MySQL resource to protect the MySQL database and make it highly available between cluster nodes. At this point, MySQL must be running on the primary server but not running on the secondary server.

From the GUI toolbar, click Create Resource Hierarchy. Select MySQL Database and click Next. Proceed through the Resource Creation wizard, providing the following values.

Switchback Type: Choose Intelligent.
Server: Choose LinuxPrimary (primary cluster node).
Location of my.cnf: Enter /var/lib/mysql.  (Earlier in the MySQL configuration process, you created a my.cnf file in this directory.)
Location of MySQL executables: Leave as default (/usr/bin) because you’re using a standard MySQL install/configuration in this example.
Database tag: Leave as default.

 

Click Create to define the MySQL resource hierarchy on the primary server. Click Next to extend the file system resource to the secondary server. In the Extend wizard, choose Accept Defaults. Click Finish to exit the Extend wizard. Your resource hierarchy should look like this:

Step 22: Review MySQL resource hierarchy

tutorial image

Creating the MySQL IP address dependency

Next, you’ll configure MySQL to depend on a virtual IP (192.168.197.151) so that the IP address follows the MySQL database as it moves.

From the GUI toolbar, right-click the mysql resource. Choose Create Dependency from the context menu. In the Child Resource Tag drop-down menu, choose ip-192.168.197.151. Click Next, click Create Dependency, and then click Done. Your resource hierarchy should now look like this:

Step 23: Review MySQL IP resource hierarchy

tutorial image

At this point in the evaluation, you’ve fully protected MySQL and its dependent resources (IP addresses and replicated storage). Test your environment, and you’re ready to go.

You can find much more information and detailed steps for every stage of the evaluation process in the SIOS SteelEye Protection Suite for Linux MySQL with Data Replication Evaluation Guide. To download an evaluation copy of SPS for Linux, visit the SIOS website or contact SIOS at info@us.sios.com.
