Oct 08, 2012

When you want to replicate data across multi-site or wide area network (WAN) configurations, you first need to answer one important question: Is there sufficient bandwidth to successfully replicate the partition and keep the mirror in the mirroring state as the source partition is updated throughout the day? Keeping the mirror in the mirroring state is crucial. A partition switchover is allowed only when the mirror is in the mirroring state.

Therefore, an important early step in any successful data replication solution is determining your network bandwidth requirements. How can you measure the rate of change—the value that indicates the amount of network bandwidth needed to replicate your data?

Establish Basic Rate of Change

First, use these commands to determine the basic daily rate of change for the files or partitions that you want to mirror; for example, to measure the amount of data written in a day for /dev/sda3, run this command at the beginning of the day:

MB_START=`awk '/sda3 / { print $10 / 2 / 1024 }' /proc/diskstats`

Wait for 24 hours, then run this command:

MB_END=`awk '/sda3 / { print $10 / 2 / 1024 }' /proc/diskstats`

The daily rate of change, in megabytes, is then MB_END - MB_START.
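If you prefer to script the whole measurement, the following is a minimal sketch that wraps the two commands above. It assumes the partition of interest is sda3 and that the machine stays up for the full 24 hours; field 10 of /proc/diskstats is the cumulative count of 512-byte sectors written, so dividing by 2 gives kilobytes and by 1024 gives megabytes.

#!/bin/bash
# Minimal sketch: measure one day's rate of change for a single partition.
# DEV is an example value; substitute the partition you plan to mirror.
DEV=sda3
MB_START=$(awk -v d="$DEV" '$3 == d { print $10 / 2 / 1024 }' /proc/diskstats)
sleep 86400    # wait 24 hours
MB_END=$(awk -v d="$DEV" '$3 == d { print $10 / 2 / 1024 }' /proc/diskstats)
awk -v s="$MB_START" -v e="$MB_END" -v d="$DEV" \
    'BEGIN { printf "Daily rate of change for %s: %.1f MB\n", d, e - s }'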

The amounts of data that you can push through various network connections are as follows (a quick way to compute the theoretical daily ceiling appears after this list):

  • For T1 (1.5Mbps): 14,000 MB/day (14 GB)
  • For T3 (45Mbps): 410,000 MB/day (410 GB)
  • For Gigabit (1Gbps): 5,000,000 MB/day (5 TB)
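These figures are conservative: the theoretical ceiling for any link is its speed in megabits per second, divided by 8 to get MB/sec, multiplied by 86,400 seconds per day. The quoted daily amounts are lower, presumably to leave room for protocol overhead and less-than-full utilization. A quick way to compute the theoretical ceiling for a given link speed (45 Mbps is used here only as an example):

awk -v mbps=45 'BEGIN { printf "%.0f MB/day theoretical maximum\n", mbps / 8 * 86400 }'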

Establish Detailed Rate of Change

Next, you’ll need to measure the detailed rate of change. The best way to collect this data is to log disk write activity for some period (e.g., one day) to determine the peak disk write periods. To do so, create a cron job that logs the system timestamp followed by a dump of /proc/diskstats. For example, to collect disk stats every 2 minutes, add this line to /etc/crontab:

*/2 * * * * root ( date ; cat /proc/diskstats ) >> /path_to/filename.txt

Wait for the determined period (e.g., one day, one week), then disable the cron job and save the resulting /proc/diskstats output file in a safe location.
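Before running the analysis utility described in the next section, you can spot-check the collected file with a rough awk pass. This is only an approximation of what roc-calc-diskstats does; it assumes the fixed 120-second sample interval from the cron entry above and looks at a single device (sdb1 here, purely as a placeholder):

awk -v dev=sdb1 -v interval=120 '
  $3 == dev {
    # Field 10 = cumulative 512-byte sectors written; delta / 2 = KB written this interval
    if (prev != "") printf "sample %d: %.1f KB/s\n", ++n, ($10 - prev) / 2 / interval
    prev = $10
  }' /path_to/filename.txt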

Analyze and Graph Detailed Rate of Change Data

Next you should analyze the detailed rate of change data. You can use the roc-calc-diskstats utility for this task. This utility takes the /proc/diskstats output file and calculates the rate of change of the disks in the dataset. To run the utility, use this command:

# ./roc-calc-diskstats <interval> <start_time> <diskstats-data-file> [dev-list]

For example, the following dumps a summary (with per-disk peak I/O information) to the output file results.txt:

# ./roc-calc-diskstats 2m "Jul 22 16:04:01" /root/diskstats.txt sdb1,sdb2,sdc1 > results.txt

Here are sample results from the results.txt file:

Sample start time: Tue Jul 12 23:44:01 2011
Sample end time: Wed Jul 13 23:58:01 2011
Sample interval: 120s #Samples: 727 Sample length: 87240s
(Raw times from file: Tue Jul 12 23:44:01 EST 2011, Wed Jul 13 23:58:01 EST 2011)
Rate of change for devices dm-31, dm-32, dm-33, dm-4, dm-5, total
dm-31 peak:0.0 B/s (0.0 b/s) (@ Tue Jul 12 23:44:01 2011) average:0.0 B/s (0.0 b/s)
dm-32 peak:398.7 KB/s (3.1 Mb/s) (@ Wed Jul 13 19:28:01 2011) average:19.5 KB/s (156.2 Kb/s)
dm-33 peak:814.9 KB/s (6.4 Mb/s) (@ Wed Jul 13 23:58:01 2011) average:11.6 KB/s (92.9 Kb/s)
dm-4 peak:185.6 KB/s (1.4 Mb/s) (@ Wed Jul 13 15:18:01 2011) average:25.7 KB/s (205.3 Kb/s)
dm-5 peak:2.7 MB/s (21.8 Mb/s) (@ Wed Jul 13 10:18:01 2011) average:293.0 KB/s (2.3 Mb/s)
total peak:2.8 MB/s (22.5 Mb/s) (@ Wed Jul 13 10:18:01 2011) average:349.8 KB/s (2.7 Mb/s)

To help you understand your specific bandwidth needs over time, you can graph the detailed rate of change data. The following dumps graph data to results.csv (as well as dumping the summary to results.txt):

# export OUTPUT_CSV=1

# ./roc-calc-diskstats 2m "Jul 22 16:04:01" /root/diskstats.txt sdb1,sdb2,sdc1 2> results.csv > results.txt

SIOS has created a template spreadsheet, diskstats-template.xlsx, which contains sample data that you can overwrite with your data from roc-calc-diskstats. The following series of images shows the process of using the spreadsheet.

  1. Open results.csv, and select all rows, including the total column.

(Figure: Copying the CSV data)

  2. Open diskstats-template.xlsx, select the diskstats.csv worksheet.

(Figure: The diskstats.csv worksheet)

  3. In cell A1, right-click and select Insert Copied Cells.
  4. Adjust the bandwidth value in the cell towards the bottom left of the worksheet (as marked in the following figure) to reflect the amount of bandwidth (in megabits per second) that you have allocated for replication. The cells to the right are automatically converted to bytes per second to match the collected raw data.

(Figure: Adjusting the existing bandwidth value)

  5. Take note of the following row and column numbers:
    • Total (row 6 in the following figure)
    • Bandwidth (row 9 in the following figure)
    • Last datapoint (column R in the following figure)

(Figure: Noting the row and column numbers)

  6. Select the bandwidth vs ROC worksheet.

(Figure: The bandwidth vs ROC worksheet)

  7. Right-click the graph and choose Select Data.
  8. In the Select Data Source dialog box, choose bandwidth in the Legend Entries (Series) list, and then click Edit.

(Figure: Editing the bandwidth series)

  9. In the Edit Series dialog box, use the following syntax in the Series values field: =diskstats.csv!$B$<row>:$<final_column>$<row>. The following figure shows the series values for the range B9 to R9 (that is, =diskstats.csv!$B$9:$R$9).

(Figure: Bandwidth series values)

  10. Click OK to close the Edit Series box.
  11. In the Select Data Source box, choose ROC in the Legend Entries (Series) list, and then click Edit.

(Figure: Editing the ROC series)

  12. In the Edit Series dialog box, use the following syntax in the Series values field: =diskstats.csv!$B$<row>:$<final_column>$<row>. The following figure shows the series values for the range B6 to R6 (that is, =diskstats.csv!$B$6:$R$6).

(Figure: ROC series values)

  13. Click OK to close the Edit Series box, then click OK to close the Select Data Source box.

The Bandwidth vs ROC graph updates. Analyze your results to determine whether you have sufficient bandwidth to support data replication.
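As a concrete illustration using the sample data above: the total average of 349.8 KB/s (2.7 Mb/s) already exceeds what a T1 (1.5 Mbps) can carry, and the total peak of 2.8 MB/s (22.5 Mb/s) is roughly half of a T3's 45 Mbps, so in that example a T3-class link or better would be needed to keep the mirror in the mirroring state.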

Next Steps

If your rate of change exceeds your available bandwidth, consider the following options to ensure that your replication solution performs optimally:

  • Enable compression in your replication solution or in the network hardware. (DataKeeper for Linux, which is part of the SteelEye Protection Suite for Linux, supports this type of compression.)
  • Create a local, non-replicated storage repository for temporary data and swap files that don’t need to be replicated.
  • Reduce the amount of data being replicated.
  • Increase your network capacity.
Aug 13, 2012

When most people think about setting up a cluster, it usually involves two or more servers and a SAN, or some other type of shared storage. SANs are typically costly and complex to set up and maintain, and they also represent a potential single point of failure (SPOF) in your cluster architecture. These days, more and more people are turning to companies like Fusion-io, with their lightning-fast ioDrives, to accelerate critical applications. These storage devices sit inside the server (i.e., they aren’t “shared disks”), and therefore can’t be used as cluster disks with many traditional clustering solutions. Fortunately, there are solutions that allow you to form a failover cluster when there is no shared storage involved: a “shared nothing” cluster.

(Figure: Traditional cluster vs. “shared nothing” cluster)

When leveraging data replication as part of a cluster configuration, it’s critical that you have enough bandwidth so that data can be replicated across the network just as fast as it’s written to disk. The following tuning tips will help you get the most out of your “shared nothing” cluster configuration when high-speed storage is involved:

Network

  • Use a 10 Gbps NIC: Flash-based storage devices from Fusion-io (or similar products from OCZ, LSI, etc.) are capable of writing data at hundreds of MB/sec (750+ MB/sec in some cases). A 1 Gbps NIC can only push a theoretical maximum of ~125 MB/sec, so anyone taking advantage of an ioDrive’s potential can easily write data faster than it can be pushed through a 1 Gbps network connection. To ensure that you have sufficient bandwidth between servers to facilitate real-time data replication, a 10 Gbps NIC should always be used to carry replication traffic.
  • Enable Jumbo Frames: Assuming that your network cards and switches support it, enabling jumbo frames can greatly increase your network’s throughput while at the same time reducing CPU cycles. To enable jumbo frames, perform the following configuration (example from a RedHat/CentOS/OEL Linux server):
    • ifconfig <interface_name> mtu 9000
    • Edit the /etc/sysconfig/network-scripts/ifcfg-<interface_name> file and add MTU=9000 so that the change persists across reboots
    • To verify end-to-end jumbo frame operation, run this command: ping -s 8900 -M do <IP-of-other-server>
  • Change the NIC’s transmit queue length:
    • /sbin/ifconfig <interface_name> txqueuelen 10000
    • Add this command to /etc/rc.local so that the setting persists across reboots (see the consolidated example after this list)
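For reference, here is one way these network settings might look once made persistent on a RedHat-style system. The interface name eth1 is a placeholder for whichever NIC carries your replication traffic:

# /etc/sysconfig/network-scripts/ifcfg-eth1 (excerpt): keep jumbo frames across reboots
DEVICE=eth1
ONBOOT=yes
MTU=9000

# /etc/rc.local: reapply the transmit queue length at boot
/sbin/ifconfig eth1 txqueuelen 10000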

TCP/IP Tuning

  • Increase the kernel’s netdev_max_backlog:
    • Set “net.core.netdev_max_backlog = 100000” in /etc/sysctl.conf
  • Other TCP/IP tuning that has been shown to increase replication performance:
    • Note: these are example values and some might need to be adjusted based on your hardware configuration
    • Edit /etc/sysctl.conf and add the following parameters (see the note after this list on applying them without a reboot):
      • net.core.rmem_default = 16777216
      • net.core.wmem_default = 16777216
      • net.core.rmem_max = 16777216
      • net.core.wmem_max = 16777216
      • net.ipv4.tcp_rmem = 4096 87380 16777216
      • net.ipv4.tcp_wmem = 4096 65536 16777216
      • net.ipv4.tcp_timestamps = 0
      • net.ipv4.tcp_sack = 0
      • net.core.optmem_max = 16777216
      • net.ipv4.tcp_congestion_control=htcp
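After editing /etc/sysctl.conf, the new values can be loaded without a reboot and spot-checked. For example:

# sysctl -p
# sysctl net.core.rmem_max
net.core.rmem_max = 16777216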

Typically you will also need to make adjustments to your cluster configuration, which will vary based on the clustering and replication technology you decide to implement. In this example, I’m using the SteelEye Protection Suite for Linux (aka SPS, aka LifeKeeper) from SIOS Technologies, which allows users to form failover clusters leveraging just about any back-end storage type: Fibre Channel SAN, iSCSI, NAS, or, most relevant to this article, local disks that need to be synchronized/replicated in real time between cluster nodes. SPS for Linux includes integrated, block-level data replication functionality that makes it very easy to set up a cluster when there is no shared storage involved.

SteelEye Protection Suite (SPS) for Linux configuration recommendations:

  • Allocate a small (~100 MB) partition on the Fusion-io drive to hold the bitmap file. Create a filesystem on this partition and mount it, for example, at /bitmap:
    • # mount | grep /bitmap
    • /dev/fioa1 on /bitmap type ext3 (rw)
  • Prior to creating your mirror, adjust the following parameters in /etc/default/LifeKeeper (a consolidated sketch follows this list):
    • Insert: LKDR_CHUNK_SIZE=4096
      • Default value is 64
    • Edit: LKDR_SPEED_LIMIT=1500000
      • (Default value is 50000)
      • LKDR_SPEED_LIMIT specifies the maximum bandwidth that a resync will ever take; this should be set high enough to allow resyncs to proceed at the maximum speed possible
    • Edit: LKDR_SPEED_LIMIT_MIN=200000
      • (Default value is 20000)
      • LKDR_SPEED_LIMIT_MIN specifies how fast the resync should be allowed to go when there is other I/O going on at the same time; as a rule of thumb, this should be set to half or less of the drive’s maximum write throughput in order to avoid starving out normal I/O activity when a resync occurs
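Putting these recommendations together, a minimal sketch might look like the following. The device name /dev/fioa1 and the ext3 filesystem type are taken from the example above; verify the partition name on your own system before running a destructive command such as mkfs:

# mkfs.ext3 /dev/fioa1
# mkdir -p /bitmap
# mount /dev/fioa1 /bitmap
# echo "/dev/fioa1 /bitmap ext3 defaults 0 0" >> /etc/fstab

The corresponding lines in /etc/default/LifeKeeper would then read:

LKDR_CHUNK_SIZE=4096
LKDR_SPEED_LIMIT=1500000
LKDR_SPEED_LIMIT_MIN=200000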

From here, go ahead and create your mirrors and configure the cluster as you normally would.