Nov 07, 2012

Host-based or storage-based?

Data is commonly replicated from one of two platforms: the server host that operates on the data, or the storage array that holds it.

Host-based replication doesn’t lock users into a particular storage array from any one vendor. SIOS SteelEye DataKeeper, for example, can replicate from any array to any array, regardless of vendor. This ability ultimately lowers costs and provides users the flexibility to choose what is right for their environment. Most host-based replication solutions can also replicate data natively over IP networks, so users don’t need to buy expensive hardware to achieve this functionality.

Storage-based replication is OS-independent and adds no processing overhead. However, vendors often demand that users replicate from and to similar arrays. This requirement can be costly, especially when you use a high-performance disk at your primary site — and now must use the same at your secondary site. Also, storage-based solutions natively replicate over Fibre Channel and often require extra hardware to send data over IP networks, further increasing costs.

When creating remote replicas for business continuity, the decision whether to deploy a host- or storage-based solution depends heavily on the platform that is being replicated and the business requirements for the applications that are in use. If the business demands zero impact to operations in the event of a site disaster, then host-based techniques provide the only feasible solution.

Host-based solutions are storage-agnostic, providing IT managers complete freedom to choose any storage that matches the needs of the enterprise. Host-based replication software functions with any storage hardware that can be mounted to the application platform, offering heterogeneous storage support. Host-based solutions that operate at the block or volume level are also ideally suited for cluster configurations.

One disadvantage is that host-based solutions consume server resources and can affect overall server performance. Despite this possibility, a host-based solution might still be appropriate when IT managers need a multi-vendor storage infrastructure or have a legacy investment or internal expertise in a specific host-based application.

A storage-based alternative does provide the benefit of an integrated solution from a dedicated storage vendor. These solutions leverage the controller of the storage array as an operating platform for replication functionality. The tight integration of hardware and software gives the storage vendor unprecedented control over the replication configuration and allows for service-level guarantees that are difficult to match with alternative replication approaches. Most storage vendors have also tailored their products to complement server virtualization and use key features such as virtual machine storage failover. Some enterprises might also have a long-standing business relationship with a particular storage vendor; in such cases, a storage solution might be a relevant fit.

High quality of service comes at a cost, however. Storage-based replication invariably sets a precondition of like-to-like storage device configuration. This means that two similarly configured high-end storage arrays must be deployed to support replication functionality, increasing costs and tying the organization to one vendor’s storage solution.

This locking in to a specific storage vendor can be a drawback. Some storage vendors have compatibility restrictions within their storage-array product line, potentially making technology upgrades and data migration expensive. When investigating storage alternatives, IT managers should pay attention to the total cost of ownership: The cost of future license fees and support contracts will affect expenses in the longer term.

Cost is a key consideration, but it is affected by several factors beyond the cost of the licenses. Does the solution require dedicated hardware, or can it be used with pre-existing hardware? Will the solution require network infrastructure expansion and if so, how much? If you are using replication to place secondary copies of data on separate servers, storage, or sites, realize that this approach implies certain hardware redundancies. Replication products that provide options to redeploy existing infrastructure to meet redundant hardware requirements demand less capital outlay.

Before deciding between a host- or storage-based replication solution, carefully consider the pros and cons of each, as illustrated in the following table.

Host-Based Replication

Pros
  • Storage agnostic
  • Sync and async
  • Data can reside on any storage
  • Unaffected by storage upgrades
Cons
  • Use of computing resources on host
Best Fit
  • Multi-vendor storage environment
  • Need option of sync or async
  • Implementing failover cluster
  • Replicating to multiple targets

Storage-Based Replication

Pros
  • Single vendor for storage and replication
  • No burden on host system
  • OS agnostic
Cons
  • Vendor lock-in
  • Higher cost
  • Data must reside on array
  • Distance limitations of Fibre Channel
Best Fit
  • Prefer single vendor
  • Limited distance and controlled environment
  • Replicating to single target
Oct 08, 2012

When you want to replicate data across multi-site or wide area network (WAN) configurations, you first need to answer one important question: Is there sufficient bandwidth to successfully replicate the partition and keep the mirror in the mirroring state as the source partition is updated throughout the day? Keeping the mirror in the mirroring state is crucial, because a partition switchover is allowed only while the mirror is in that state.

Therefore, an important early step in any successful data replication solution is determining your network bandwidth requirements. How can you measure the rate of change—the value that indicates the amount of network bandwidth needed to replicate your data?

Establish Basic Rate of Change

First, use these commands to determine the basic daily rate of change for the files or partitions that you want to mirror; for example, to measure the amount of data written in a day for /dev/sda3, run this command at the beginning of the day:

MB_START=`awk '/sda3 / { print $10 / 2 / 1024 }' /proc/diskstats`

Wait for 24 hours, then run this command:

MB_END=`awk '/sda3 / { print $10 / 2 / 1024 }' /proc/diskstats`

The daily rate of change, in megabytes, is then MB_END - MB_START.
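If you capture both numbers (for example, by echoing each into a file when you take the reading), a quick way to turn the daily figure into an average bandwidth requirement is a sketch like the following, assuming MB_START and MB_END hold the two readings:

awk -v s="$MB_START" -v e="$MB_END" 'BEGIN {
    mbday = e - s                                    # MB written over 24 hours
    printf "Daily rate of change: %.1f MB\n", mbday
    printf "Average bandwidth needed: %.2f Mbps\n", mbday * 8 / 86400
}'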

The amounts of data that you can push through various network connections are as follows:

  • For T1 (1.5Mbps): 14,000 MB/day (14 GB)
  • For T3 (45Mbps): 410,000 MB/day (410 GB)
  • For Gigabit (1Gbps): 5,000,000 MB/day (5 TB)

Establish Detailed Rate of Change

Next, you’ll need to measure the detailed rate of change. The best way to collect this data is to log disk write activity for some period (e.g., one day) to determine the peak disk write periods. To do so, create a cron job that logs the system timestamp followed by a dump of /proc/diskstats. For example, to collect disk stats every 2 minutes, add this line to /etc/crontab:

*/2 * * * * root ( date ; cat /proc/diskstats ) >> /path_to/filename.txt

Wait for the determined period (e.g., one day, one week), then disable the cron job and save the resulting /proc/diskstats output file in a safe location.
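The roc-calc-diskstats utility described in the next section does this analysis for you, but if you want a rough per-interval look at a single device first, a simple awk pass over the logged file is enough. This is only a sketch: the device name sdb1 is an assumption, and the 120-second interval matches the cron schedule above.

awk -v dev="sdb1" -v interval=120 '
$3 == dev {
    mb = $10 / 2 / 1024                              # sectors written -> MB
    if (seen) printf "%.3f MB/s\n", (mb - prev) / interval
    prev = mb; seen = 1
}' /path_to/filename.txt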

Analyze and Graph Detailed Rate of Change Data

Next you should analyze the detailed rate of change data. You can use the roc-calc-diskstats utility for this task. This utility takes the /proc/diskstats output file and calculates the rate of change of the disks in the dataset. To run the utility, use this command:

# ./roc-calc-diskstats <interval> <start_time> <diskstats-data-file> [dev-list]

For example, the following dumps a summary (with per-disk peak I/O information) to the output file results.txt:

# ./roc-calc-diskstats 2m "Jul 22 16:04:01" /root/diskstats.txt sdb1,sdb2,sdc1 > results.txt

Here are sample results from the results.txt file:

Sample start time: Tue Jul 12 23:44:01 2011

Sample end time: Wed Jul 13 23:58:01 2011

Sample interval: 120s #Samples: 727 Sample length: 87240s

(Raw times from file: Tue Jul 12 23:44:01 EST 2011, Wed Jul 13 23:58:01 EST 2011)

Rate of change for devices dm-31, dm-32, dm-33, dm-4, dm-5, total

dm-31 peak:0.0 B/s (0.0 b/s) (@ Tue Jul 12 23:44:01 2011) average:0.0 B/s (0.0 b/s)

dm-32 peak:398.7 KB/s (3.1 Mb/s) (@ Wed Jul 13 19:28:01 2011) average:19.5 KB/s (156.2 Kb/s)

dm-33 peak:814.9 KB/s (6.4 Mb/s) (@ Wed Jul 13 23:58:01 2011) average:11.6 KB/s (92.9 Kb/s)

dm-4 peak:185.6 KB/s (1.4 Mb/s) (@ Wed Jul 13 15:18:01 2011) average:25.7 KB/s (205.3 Kb/s)

dm-5 peak:2.7 MB/s (21.8 Mb/s) (@ Wed Jul 13 10:18:01 2011) average:293.0 KB/s (2.3 Mb/s)

total peak:2.8 MB/s (22.5 Mb/s) (@ Wed Jul 13 10:18:01 2011) average:349.8 KB/s (2.7 Mb/s)

To help you understand your specific bandwidth needs over time, you can graph the detailed rate of change data. The following dumps graph data to results.csv (as well as dumping the summary to results.txt):

# export OUTPUT_CSV=1

# ./roc-calc-diskstats 2m "Jul 22 16:04:01" /root/diskstats.txt sdb1,sdb2,sdc1 2> results.csv > results.txt

SIOS has created a template spreadsheet, diskstats-template.xlsx, which contains sample data that you can overwrite with your data from roc-calc-diskstats. The following series of images shows the process of using the spreadsheet.

  1. Open results.csv, and select all rows, including the total column.

1-copy-csv-data_574x116

  2. Open diskstats-template.xlsx, select the diskstats.csv worksheet.

2-diskstats-worksheet

  3. In cell A1, right-click and select Insert Copied Cells.
  4. Adjust the bandwidth value in the cell towards the bottom left of the worksheet (as marked in the following figure) to reflect the amount of bandwidth (in megabits per second) that you have allocated for replication. The cells to the right are automatically converted to bytes per second to match the collected raw data.

3-extend-existing-bandwidth_536x96

  5. Take note of the following row and column numbers:
    • Total (row 6 in the following figure)
    • Bandwidth (row 9 in the following figure)
    • Last datapoint (column R in the following figure)

4-note-row-colums_535x86

  6. Select the bandwidth vs ROC worksheet.

5-bandwidth-worksheet

  7. Right-click the graph and choose Select Data.
  8. In the Select Data Source dialog box, choose bandwidth in the Legend Entries (Series) list, and then click Edit.

6-edit-bandwidth

  9. In the Edit Series dialog box, use the following syntax in the Series values field: =diskstats.csv!$B$<row>:$<final_column>$<row> (the following figure shows the series values for the range B9 to R9).

7-bandwidth-values

  10. Click OK to close the Edit Series box.
  11. In the Select Data Source box, choose ROC in the Legend Entries (Series) list, and then click Edit.

8-edit-roc

  12. In the Edit Series dialog box, use the following syntax in the Series values field: =diskstats.csv!$B$<row>:$<final_column>$<row> (the following figure shows the series values for the range B6 to R6).

9-roc-values

  13. Click OK to close the Edit Series box, then click OK to close the Select Data Source box.

The Bandwidth vs ROC graph updates. Analyze your results to determine whether you have sufficient bandwidth to support data replication.

Next Steps

If your Rate of Change exceeds your available bandwidth, you will need to consider some of the following points to ensure your replication solution performs optimally:

  • Enable compression in your replication solution or in the network hardware. (DataKeeper for Linux, which is part of the SteelEye Protection Suite for Linux, supports this type of compression.)
  • Create a local, non-replicated storage repository for temporary data and swap files that don’t need to be replicated.
  • Reduce the amount of data being replicated.
  • Increase your network capacity.
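For the first option above (compression), a rough way to gauge how much it might buy you is to compress a representative sample of the data you replicate and compare sizes; this is only an illustrative check, and the file path here is purely hypothetical:

ls -l /var/lib/mysql/ibdata1            # size of the uncompressed sample (illustrative path)
gzip -c /var/lib/mysql/ibdata1 | wc -c  # size after gzip compression, in bytes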
Sep 25, 2012

The primary advantage of running a MySQL cluster is obviously high availability (HA). To get the most from this type of solution, you will want to eliminate as many potential single points of failure as possible. Conventional wisdom says that you can’t form a cluster without some type of shared storage, which technically represents a single point of failure in your clustering architecture. However, there are solutions, such as the SteelEye Protection Suite (SPS) for Linux, that allow you to eliminate storage as a single point of failure by providing real-time data replication between cluster nodes. Let’s look at a typical scenario: You form a cluster that leverages local, replicated storage to protect a MySQL database.

This step-by-step discussion presumes that you’re working with an evaluation copy of SPS in a lab environment. We’re also presuming that you’ve confirmed that the primary and secondary servers and the network all meet the requirements for running this type of setup. (You can find details of these requirements in the SIOS SteelEye Protection Suite for Linux MySQL with Data Replication Evaluation Guide.)

Getting started

Before you begin setting up your cluster, you’ll need to configure the storage. The data that you want to replicate needs to reside on a separate file system or logical volume. Keep in mind that the size of the target disk, whether you’re using a partition or a logical volume, must be equal to or larger than the source.

In this example, we presume that you’re using a disk partition. (However, LVM is also fully supported.) First, partition the local storage for use with SteelEye DataKeeper. On the primary server, identify a free, unused disk partition to use as the MySQL repository or create a new partition. Use the fdisk utility to partition the disk, then format the partition and temporarily mount it at /mnt. Move any existing data from /var/lib/mysql/ into this new disk partition (assuming a default MySQL configuration). Unmount and then remount the partition at /var/lib/mysql. You don’t need to add this partition to /etc/fstab, as it will be mounted automatically by SPS.
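A hedged sketch of those steps follows; the device /dev/sdb1 and the ext3 filesystem are assumptions, so substitute whatever matches your environment:

fdisk /dev/sdb                     # create a new partition, e.g. /dev/sdb1
mkfs -t ext3 /dev/sdb1             # format the new partition
mount /dev/sdb1 /mnt               # temporary mount point
mv /var/lib/mysql/* /mnt/          # relocate any existing MySQL data
umount /mnt
mount /dev/sdb1 /var/lib/mysql     # remount at the MySQL data directory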

On the secondary server, configure your disk as you did on the primary server.

Installing MySQL

Next you’ll deal with MySQL. On the primary server, install both the mysql and mysql-server RPM packages (if they don’t already exist on the system) and apply any required dependencies. Verify that your local disk partition is still mounted at /var/lib/mysql. If necessary, initialize a sample MySQL database. Ensure that all the files in your MySQL data directory (/var/lib/mysql) have the correct permissions and ownership, and then manually start the MySQL daemon from the command line. (Note: Do not start MySQL via the service command or the /etc/init.d/ script.)

Connect with the mysql client to verify that MySQL is running.
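A minimal sketch of those steps on a RHEL/CentOS-style system is shown below; the package names and the mysql_install_db initialization are typical for MySQL 5.x of that era and may differ on your distribution:

yum install -y mysql mysql-server             # skip if the packages are already present
mount | grep /var/lib/mysql                   # confirm the replicated partition is still mounted
mysql_install_db --datadir=/var/lib/mysql     # initialize a sample database if needed
mysqld_safe --datadir=/var/lib/mysql &        # start the daemon manually, not via service or init.d
mysql -u root -e "status"                     # verify that the server is answering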

Update and verify the root password for your MySQL configuration. Then create a MySQL configuration file, such as the sample file shown here:

———-

# cat /var/lib/mysql/my.cnf

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
pid-file=/var/lib/mysql/mysqld.pid
user=root
port=3306
# Default to using old password format for compatibility with mysql 3.x
# clients (those using the mysqlclient10 compatibility package).
old_passwords=1

# Disabling symbolic-links is recommended to prevent assorted security risks;
# to do so, uncomment this line:
# symbolic-links=0

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

[client]
user=root
password=SteelEye

———-

In this example, we place this file in the same directory that we will later replicate (/var/lib/mysql/my.cnf). Delete the original MySQL configuration file (in /etc).
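Assuming the distribution default configuration lives at /etc/my.cnf, that cleanup step is simply:

rm -f /etc/my.cnf          # the protected copy now lives at /var/lib/mysql/my.cnf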

On the secondary server, install both the mysql and mysql-server RPM packages if necessary, apply any dependencies, and ensure that all the files in your MySQL data directory (/var/lib/mysql) have the correct permissions and ownership.

Installing SPS for Linux

Next, install SPS for Linux. For ease of installation, SIOS provides a unified installation script (called “setup”) for SPS for Linux. Instructions for how to obtain this software are in an email that comes with the SPS for Linux evaluation license keys.

Download the software and evaluation license keys on both the primary and secondary servers. On each server, run the installer script, which will install a handful of prerequisite RPMs, the core clustering software, and any optional ARKs that are needed. In this case, you will want to install the MySQL ARK (steeleye-lkSQL) and the DataKeeper (i.e., Data Replication) ARK (steeleye-lkDR). Apply the license key via the /opt/LifeKeeper/bin/lkkeyins command and start SPS for Linux via its start script, /opt/LifeKeeper/lkstart.
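On each node, the sequence looks roughly like this; the location of the setup script and the license file name are assumptions based on where you unpacked the evaluation kit:

./setup                                               # unified installer; select the MySQL and DataKeeper ARKs when prompted
/opt/LifeKeeper/bin/lkkeyins /path_to/evaluation.lic  # apply the evaluation license key
/opt/LifeKeeper/lkstart                               # start SPS for Linux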

At this point you have SPS installed, licensed, and running on both of your nodes, and your disk and the MySQL database that you want to protect are configured.

In the next post, we’ll look at the remaining steps in the shared-nothing clustering process:

  • Creating communication (Comm) paths, i.e. heartbeats, between the primary and target servers
  • Creating an IP resource
  • Creating the mirror and launching data replication
  • Creating the MySQL database resource
  • Creating the MySQL IP address dependency
Sep 25, 2012

The previous post introduced the advantages of running a MySQL cluster, using a shared-nothing storage configuration. We also began walking through the process of setting up the cluster, using data replication and SteelEye Protection Suite (SPS) for Linux. In this post, we complete the process. Let’s get started.

Creating Comm paths

Now it’s time to access the SteelEye LifeKeeper GUI. LifeKeeper is an integrated component of SPS for Linux, and the LifeKeeper GUI is a Java-based application that can be run as a native Linux app or as an applet within a Java-enabled Web browser. (The GUI is based on Java RMI with callbacks, so hostnames must be resolvable or you might receive a Java 115 or 116 error.)

To start the GUI application, enter this command on either of the cluster nodes:

/opt/LifeKeeper/bin/lkGUIapp &

Alternatively, to open the GUI applet from a Web browser, go to http://<hostname>:81.

The first step is to make sure that you have at least two TCP communication (Comm) paths between each primary server and each target server, for heartbeat redundancy. This way, the failure of one communication line won’t cause a split-brain situation. Verify the paths on the primary server. The following screenshots walk you through the process of logging into the GUI, connecting to both cluster nodes, and creating the Comm paths.

Step 1: Connect to primary server

tutorial image

Step 2: Connect to secondary server

tutorial image

Step 3: Create the Comm path

tutorial image

Step 4: Choose the local and remote servers

tutorial image

tutorial image

Step 5: Choose device type

tutorial image

Next, you are presented with a series of dialogue boxes. For each box, provide the required information and click Next to advance. (For each field in a dialogue box, you can click Help for additional information.)

Step 6: Choose IP address for local server to use for Comm path

tutorial image

Step 7: Choose IP address for remote server to use for Comm path

tutorial image

Step 8: Enter Comm path priority on local server

tutorial image

After entering data in all the required fields, click Create. You’ll see a message that indicates that the network Comm path was successfully created.

Step 9: Finalize Comm path creation

tutorial image

Click Next. If you chose multiple local IP addresses or remote servers and set the device type to TCP, then the procedure returns you to the setup wizard to create the next Comm path. When you’re finished, click Done in the final dialogue box. Repeat this process until you have defined all the Comm paths you plan to use.

Verify that the communications paths are configured properly by viewing the Server Properties dialogue box. From the GUI, select Edit > Server > Properties, and then choose the CommPaths tab. The displayed state should be ALIVE. You can also check the server icon in the right-hand primary pane of the GUI. If only one Comm path has been created, the server icon is overlayed with a yellow warning icon. A green heartbeat checkmark indicates that at least two Comm paths are configured and ALIVE.

Step 10: Review Comm path state

tutorial image

Creating and extending an IP resource

In the LifeKeeper GUI, create an IP resource and extend it to the secondary server by completing the following steps. This virtual IP can move between cluster nodes along with the application that depends on it. By using a virtual IP as part of your cluster configuration, you provide seamless redirection of clients upon switchover or failover of resources between cluster nodes because they continue to access the database via the same FQDN/IP.

Step 11: Create resource hierarchy

tutorial image

Step 12: Choose IP ARK

tutorial image

Enter the appropriate information for your configuration, using the following recommended values. (Click the Help button for further information.) Click Next to continue after entering the required information.

  • Resource Type: Choose IP Address as the resource type and click Next.
  • Switchback Type: Choose Intelligent and click Next.
  • Server: Choose the server on which the IP resource will be created (your primary server) and click Next.
  • IP Resource: Enter the virtual IP information and click Next. (This is an IP address that is not in use anywhere on your network. All clients will use this address to connect to the protected resources.)
  • Netmask: Enter the IP subnet mask that your TCP/IP resource will use on the target server. Any standard netmask for the class of the specific TCP/IP resource address is valid. The subnet mask, combined with the IP address, determines the subnet that the TCP/IP resource will use and should be consistent with the network configuration. In this sample configuration, 255.255.255.0 is used as the subnet mask on both networks.
  • Network Connection: This is the physical Ethernet card with which the IP address interfaces. Choose the network connection that will allow your virtual IP address to be routable. Choose the correct NIC and click Next.
  • IP Resource Tag: Accept the default value and click Next. This value affects only how the IP is displayed in the GUI. The IP resource will be created on the primary server.

LifeKeeper creates and validates your resource. After receiving the message that the resource has been created successfully, click Next.

Step 13: Review notice of successful resource creation

tutorial image

Now you can complete the process of extending the IP resource to the secondary server.

Step 14: Extend IP resource to secondary server

tutorial image

The process of extending the IP resource starts automatically after you finish creating an IP address resource and click Next. You can also start this process from an existing IP address resource, by right-clicking the active resource and selecting Extend Resource Hierarchy. Use the information in the following table to complete the procedure.

  • Switchback Type: Leave as Intelligent and click Next.
  • Template Priority: Leave as default (1).
  • Target Priority: Leave as default (10).
  • Network Interface: This is the physical Ethernet card with which the IP address interfaces. Choose the network connection that will allow your virtual IP address to be routable. The correct physical NIC should be selected by default. Verify and then click Next.
  • IP Resource Tag: Leave as default.
  • Target Restore Mode: Choose Enable and click Next.
  • Target Local Recovery: Choose Yes to enable local recovery for the SQL resource on the target server.
  • Backup Priority: Accept the default value.

 

After receiving the message that the hierarchy extension operation is complete, click Finish and then click Done.

Your IP resource (example: 192.168.197.151) is now fully protected and can float between cluster nodes, as needed. In the LifeKeeper GUI, you can see that the IP resource is listed as Active on the primary cluster node and Standby on the secondary cluster node.

Step 15: Review IP resource state on primary and secondary nodes

tutorial image

Creating a mirror and beginning data replication

You’re ready to set up and configure the data replication resource, which you’ll use to synchronize MySQL data between cluster nodes. For this example, the data to replicate is in the /var/lib/mysql partition on the primary cluster node. The source volume must be mounted on the primary server, the target volume must not be mounted on the secondary server, and the target volume size must be equal to or larger than the source volume size.

The following screenshots illustrate the next series of steps.

Step 16: Create resource hierarchy

tutorial image

Step 17: Choose Data Replication ARK

tutorial image

Use these values in the Data Replication wizard.

  • Switchback Type: Choose Intelligent.
  • Server: Choose LinuxPrimary (the primary cluster node or mirror source).
  • Hierarchy Type: Choose Replicate Existing Filesystem.
  • Existing Mount Point: Choose the mounted partition to replicate; in this example, /var/lib/mysql.
  • Data Replication Resource Tag: Leave as default.
  • File System Resource Tag: Leave as default.
  • Bitmap File: Leave as default.
  • Enable Asynchronous Replication: Leave as default (Yes).

Click Next to begin the creation of the data replication resource hierarchy. The GUI will display the following message.

Step 18: Begin creation of Data Replication resource

tutorial image

Click Next to begin the process of extending the data replication resource. Accept all default settings. When asked for a target disk, choose the free partition on your target server that you created earlier in this process. Make sure to choose a partition that is as large as or larger than the source volume and that is not mounted on the target system.

Step 19: Begin extension of Data Replication resource

tutorial image

Eventually, you are prompted to choose the network over which you want the replication to take place. In general, separating your user and application traffic from your replication traffic is best practice. This sample configuration has two separate network interfaces, our “public NIC” on the 192.168.197.X subnet and a “private/backend NIC” on the 192.168.198.X subnet. We will configure replication to go over the back-end network 192.168.198.X, so that user and application traffic is not competing with replication.

Step 20: Choose network for replication traffic

tutorial image

Click Next to continue through the wizard. Upon completion, your resource hierarchy will look like this:

Step 21: Review Data Replication resource hierarchy

tutorial image

Creating the MySQL resource hierarchy

You need to create a MySQL resource to protect the MySQL database and make it highly available between cluster nodes. At this point, MySQL must be running on the primary server but not running on the secondary server.

From the GUI toolbar, click Create Resource Hierarchy. Select MySQL Database and click Next. Proceed through the Resource Creation wizard, providing the following values.

  • Switchback Type: Choose Intelligent.
  • Server: Choose LinuxPrimary (the primary cluster node).
  • Location of my.cnf: Enter /var/lib/mysql. (Earlier in the MySQL configuration process, you created a my.cnf file in this directory.)
  • Location of MySQL executables: Leave as default (/usr/bin), because you’re using a standard MySQL install/configuration in this example.
  • Database tag: Leave as default.

 

Click Create to define the MySQL resource hierarchy on the primary server. Click Next to extend the file system resource to the secondary server. In the Extend wizard, choose Accept Defaults. Click Finish to exit the Extend wizard. Your resource hierarchy should look like this:

Step 22: Review MySQL resource hierarchy

tutorial image

Creating the MySQL IP address dependency

Next, you’ll configure MySQL to depend on a virtual IP (192.168.197.151) so that the IP address follows the MySQL database as it moves.

From the GUI toolbar, right-click the mysql resource. Choose Create Dependency from the context menu. In the Child Resource Tag drop-down menu, choose ip-192.168.197.151. Click Next, click Create Dependency, and then click Done. Your resource hierarchy should now look like this:

Step 23: Review MySQL IP resource hierarchy

tutorial image

At this point in the evaluation, you’ve fully protected MySQL and its dependent resources (IP addresses and replicated storage). Test your environment, and you’re ready to go.
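A simple functional check, using the virtual IP from this example, is to connect from a client machine and confirm that the database answers; repeat the test after switching the resource hierarchy over to the secondary node:

mysql -h 192.168.197.151 -u root -p -e "SHOW DATABASES;"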

You can find much more information and detailed steps for every stage of the evaluation process in the SIOS SteelEye Protection Suite for Linux MySQL with Data Replication Evaluation Guide. To download an evaluation copy of SPS for Linux, visit the SIOS website or contact SIOS at info@us.sios.com.

Aug 13, 2012

When most people think about setting up a cluster, it usually involves two or more servers and a SAN or some other type of shared storage. SANs are typically very costly and complex to set up and maintain. They also technically represent a potential single point of failure (SPOF) in your cluster architecture. These days, more and more people are turning to companies like Fusion-io, with their lightning-fast ioDrives, to accelerate critical applications. These storage devices sit inside the server (i.e., they aren’t “shared disks”) and therefore can’t be used as cluster disks with many traditional clustering solutions. Fortunately, there are solutions that allow you to form a failover cluster when no shared storage is involved – i.e., a “shared nothing” cluster.

Traditional Cluster vs. “Shared Nothing” Cluster

 

When leveraging data replication as part of a cluster configuration, it’s critical that you have enough bandwidth so that data can be replicated across the network just as fast as it’s written to disk.  The following are tuning tips that will allow you to get the most out of your “shared nothing” cluster configuration, when high-speed storage is involved:

Network

  • Use a 10Gbps NIC: Flash-based storage devices from Fusion-io (and similar products from OCZ, LSI, and others) are capable of writing data at 750+ MB/sec. A 1Gbps NIC can push only a theoretical maximum of ~125 MB/sec, so anyone taking advantage of an ioDrive’s potential can easily write data much faster than it could be pushed through a 1Gbps network connection. To ensure that you have sufficient bandwidth between servers to facilitate real-time data replication, a 10Gbps NIC should always be used to carry replication traffic.
  • Enable Jumbo Frames: Assuming that your network cards and switches support it, enabling jumbo frames can greatly increase your network’s throughput while reducing CPU cycles. To enable jumbo frames, perform the following configuration (example from a RedHat/CentOS/OEL Linux server):
    • ifconfig <interface_name> mtu 9000
    • Edit /etc/sysconfig/network-scripts/ifcfg-<interface_name> file and add “MTU=9000” so that the change persists across reboots
    • To verify end-to-end jumbo frame operation, run this command: ping -s 8900 -M do <IP-of-other-server>
  • Change the NIC’s transmit queue length:
    • /sbin/ifconfig <interface_name> txqueuelen 10000
    • Add this to /etc/rc.local to preserve the setting across reboots

TCP/IP Tuning

  • Change the NIC’s netdev_max_backlog:
    • Set “net.core.netdev_max_backlog = 100000” in /etc/sysctl.conf
  • Other TCP/IP tuning that has been shown to increase replication performance:
    • Note: these are example values and some might need to be adjusted based on your hardware configuration
    • Edit /etc/sysctl.conf and add the following parameters:
      • net.core.rmem_default = 16777216
      • net.core.wmem_default = 16777216
      • net.core.rmem_max = 16777216
      • net.core.wmem_max = 16777216
      • net.ipv4.tcp_rmem = 4096 87380 16777216
      • net.ipv4.tcp_wmem = 4096 65536 16777216
      • net.ipv4.tcp_timestamps = 0
      • net.ipv4.tcp_sack = 0
      • net.core.optmem_max = 16777216
      • net.ipv4.tcp_congestion_control=htcp
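After saving /etc/sysctl.conf, the new values can be loaded without a reboot:

sysctl -p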

Typically you will also need to make adjustments to your cluster configuration, which will vary based on the clustering and replication technology you decide to implement.  In this example, I’m using the SteelEye Protection Suite for Linux (aka SPS, aka LifeKeeper), from SIOS Technologies, which allows users to form failover clusters leveraging just about any back-end storage type: Fibre Channel SAN, iSCSI, NAS, or, most relevant to this article, local disks that need to be synchronized/replicated in real time between cluster nodes.  SPS for Linux includes integrated, block-level data replication functionality that makes it very easy to set up a cluster when there is no shared storage involved.

SteelEye Protection Suite (SPS) for Linux configuration recommendations:

  • Allocate a small (~100 MB) disk partition, located on the Fusion-io drive, to hold the bitmap file.  Create a filesystem on this partition and mount it, for example, at /bitmap:
    • # mount | grep /bitmap
    • /dev/fioa1 on /bitmap type ext3 (rw)
  • Prior to creating your mirror, adjust the following parameters in /etc/default/LifeKeeper
    • Insert: LKDR_CHUNK_SIZE=4096
      • Default value is 64
    • Edit: LKDR_SPEED_LIMIT=1500000
      • (Default value is 50000)
      • LKDR_SPEED_LIMIT specifies the maximum bandwidth that a resync will ever take — this should be set high enough to allow resyncs to go at the maximum speed possible
    • Edit: LKDR_SPEED_LIMIT_MIN=200000
      • (Default value is 20000)
      • LKDR_SPEED_LIMIT_MIN specifies how fast the resync should be allowed to go when there is other I/O going on at the same time — as a rule of thumb, this should be set to half or less of the drive’s maximum write throughput in order to avoid starving out normal I/O activity when a resync occurs
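After those edits, the relevant entries in /etc/default/LifeKeeper should look like this:

LKDR_CHUNK_SIZE=4096
LKDR_SPEED_LIMIT=1500000
LKDR_SPEED_LIMIT_MIN=200000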

From here, go ahead and create your mirrors and configure the cluster as you normally would.