Rapid VSS Snapshots with Exchange 2010 DAG on EMC VMAX
We recently published a very detailed Proven Solution Guide titled “EMC Virtual Infrastructure for Exchange 2010 – Enabled by EMC Symmetrix VMAX, VMWare vSphere 4, and EMC Replication Manager”. This solution was built out in our EMC Hopkinton Solutions Center, where we wanted to showcase a 20,000-mailbox deployment on EMC Symmetrix VMAX using a Microsoft Exchange Database Availability Group (DAG) for high availability.
The simulated user profile requirements were:
- Mailbox Profile: 150 msgs sent/received per day (0.15 IOPS per mailbox in a DAG configuration)
- Mailbox size – 1GB, virtually provisioned for 2.5GB
- DAG copies: 2 HA DAG copies in a single DAG
- RTO: 5 minutes
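To make the profile concrete, here is a rough back-of-the-envelope sketch of what these numbers add up to. It only multiplies the figures listed above; the function and key names are made up for illustration, and it deliberately ignores logs, content indexes, and any sizing overhead that the full guide accounts for.

```python
# Figures from the user profile above; names are illustrative only.
MAILBOXES = 20_000
IOPS_PER_MAILBOX = 0.15          # 150 msgs/day profile in a DAG
MAILBOX_SIZE_GB = 1.0            # initial mailbox size
PROVISIONED_GB_PER_MAILBOX = 2.5 # thin-provisioned size
DAG_COPIES = 2


def profile_totals(mailboxes=MAILBOXES):
    """Aggregate IOPS and capacity implied by the profile (no overheads)."""
    provisioned = mailboxes * PROVISIONED_GB_PER_MAILBOX
    return {
        "active_iops": mailboxes * IOPS_PER_MAILBOX,
        "consumed_gb_per_copy": mailboxes * MAILBOX_SIZE_GB,
        "provisioned_gb_per_copy": provisioned,
        "provisioned_gb_all_copies": provisioned * DAG_COPIES,
    }


totals = profile_totals()
for name, value in totals.items():
    print(f"{name}: {value:,.0f}")
```

Treat the output as a sanity check on scale rather than a sizing result; real sizing should follow the Microsoft and EMC calculators referenced in the guide.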
In this case, we used our previous Exchange building block on VMware vSphere of 5,000 users per Exchange mailbox server/VM guest and aimed for 8 total DBs per server (4 active/4 passive copies). Each mailbox server VM was sized with 6 vCPUs and 32GB of memory, per Microsoft-recommended calculations. Across 2 ESX hosts, we deployed a total of 32 active and 32 passive DBs, with 625 users per DB, and accounted for an interruption of service on either host so that active DBs could be taken over by Mailbox roles on the other ESX host.
The physical architecture of the solution is described below. We used two ESX hosts (based on Dell PE R900, 128GB RAM, two dual-port 4Gb QLogic HBAs) and Cisco MDS 9509 FC switching into the VMAX FAs:
The database layout across hosts/DAG members looked like this. It is pretty typical for a DAG architecture, where the aim is to distribute active/passive copies evenly across the members, and since this is a virtualized implementation, the aim was also to distribute them across physical hosts:
We all know there are occasions where a DAG copy will need to be reseeded, so we also wanted to look at the impact when reseeds do occur. We didn’t do the reseeding under load, so this is a best-case scenario:
- Steady 40% network utilization on a 1Gb/s link during reseeding
- A 575GB DB and a 250GB index took 6 hours and 15 minutes to seed together, which works out to approximately 2.2GB/minute for a single DB.
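The reseed rate quoted above is simple arithmetic, and the same arithmetic gives a rough way to estimate reseed time for a database of a different size. The sketch below assumes the rate stays constant and uses decimal units (1GB = 1000MB); the function names are illustrative, and since our test was not run under load, a production reseed will likely be slower.

```python
DB_GB = 575
INDEX_GB = 250
SEED_MINUTES = 6 * 60 + 15  # 6 hours 15 minutes


def reseed_rate_gb_per_min(db_gb, index_gb, minutes):
    """Observed seeding rate for DB plus index, in GB per minute."""
    return (db_gb + index_gb) / minutes


def estimated_reseed_minutes(total_gb, rate_gb_per_min):
    """Rough reseed time for another copy, assuming the same rate."""
    return total_gb / rate_gb_per_min


rate = reseed_rate_gb_per_min(DB_GB, INDEX_GB, SEED_MINUTES)
mbit_per_s = rate * 1000 * 8 / 60  # payload only, decimal units

print(f"{rate:.1f} GB/min")  # 2.2 GB/min
print(f"about {mbit_per_s:.0f} Mb/s of payload")
print(f"1TB would take about {estimated_reseed_minutes(1000, rate) / 60:.1f} hours")
```

The payload rate comes out below the 40% link utilization we observed, which is expected since the utilization figure also includes protocol overhead and other traffic on the link.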
We also covered specific design guidelines for virtualizing Exchange 2010 on the EMC Symmetrix VMAX, along with the disk design layout that was calculated based on our Proven guidance for sizing Exchange 2010. Virtual provisioning is also key in this solution, and we laid out some of the design considerations when planning a thin pool design on VMAX:
- A thin pool per Exchange Mailbox role is recommended; it provides a much more granular design and allows for easier troubleshooting and performance analysis if needed.
- Use large Data Devices (120-240GB for thin pools)
- Use concatenated thin device metas, since data devices are already striped in the disk group
In our case, we used a thin pool per Exchange Mailbox role. Each thin pool was backed by a disk group of 16 600GB 10k FC disks, with 240GB data devices and R5 (7+1) protection.
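As a rough illustration of how a disk group like this translates into thin pool capacity, the sketch below works from nominal disk sizes only. It is an approximation under stated assumptions: it ignores formatted capacity, hot spares, and any other array overhead, and the function names are hypothetical, so the real number of data devices per pool may differ from what it prints.

```python
DISKS = 16
DISK_GB = 600            # nominal size of each 10k FC disk
RAID_DATA, RAID_PARITY = 7, 1   # R5 (7+1)
DATA_DEVICE_GB = 240


def usable_gb(disks, disk_gb, data, parity):
    """Nominal usable capacity after RAID 5 parity, ignoring overheads."""
    return disks * disk_gb * data / (data + parity)


def data_device_count(pool_gb, device_gb):
    """How many whole data devices of the given size fit in the pool."""
    return int(pool_gb // device_gb)


pool_gb = usable_gb(DISKS, DISK_GB, RAID_DATA, RAID_PARITY)
devices = data_device_count(pool_gb, DATA_DEVICE_GB)

print(f"nominal usable capacity: {pool_gb:,.0f} GB")
print(f"240GB data devices that fit: {devices}")
```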
We also really wanted to show some performance and functionality testing with EMC Replication Manager and EMC TimeFinder for Symmetrix to show how snapshots of passive DBs in the DAG perform. These snapshots provide our point-in-time and roll-forward restores in the event of logical corruption, which is usually the most common form of corruption. A single Replication Manager mount host was used in this case to run the RM Management console and provide the VSS requestor function for our snapshots. The following diagram shows our strategy for snaps of the passive DAG copies:
Snapshots are great, but how well do they perform? With VSS snapshots, the most time-consuming part is usually the consistency check. With a 2-copy DAG, consistency checks are less important because checks are already being done as log files are copied to the other host, so we worry less about physical corruption. Microsoft mentions this in the “VSS Frequently Asked Questions” with Exchange 2010 at: http://msdn.microsoft.com/en-us/library/aa579091(EXCHG.140).aspx
In our case, we wanted to show some of the job timings we observed in our testing:
- Each RM job contained 2.53 TB of Exchange data
- Consistency checks were performed on snaps of the passive copy to determine their impact (as mentioned above, skipping daily consistency checks is less of an issue when you have a 2-copy DAG). As a best practice, we recommend running one per week.
Another question that we addressed is how much snap space is required to protect the Exchange data. We measured the amount of space required over a 48-hour period and took snaps against 24 passive copies at 6-hour increments while LoadGen was running. Snap space for a single DB and log LUN was around 2% a day. Data change rates also come into play; we saw about a 5% change rate.
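If you want a first-pass feel for snap space in your own environment, a simple linear estimate based on the roughly 2% per day we measured might look like the sketch below. This is only an assumption-laden approximation: the 1,000GB source size is a made-up example, the function name is hypothetical, and real snap consumption depends on your change rate and snap schedule rather than growing strictly linearly.

```python
MEASURED_DAILY_FRACTION = 0.02  # about 2% a day for a DB and log LUN in our test


def estimated_snap_space_gb(source_gb, days, daily_fraction=MEASURED_DAILY_FRACTION):
    """Linear first-pass estimate of snap space for a retention window."""
    return source_gb * daily_fraction * days


# Hypothetical example: 1,000GB of DB and log LUNs kept for the
# 48-hour window used in our measurement.
estimate = estimated_snap_space_gb(source_gb=1000, days=2)
print(f"estimated snap space: {estimate:.0f} GB")
```

Measure your own change rate under a representative load before committing to a save pool size; the estimate is only a starting point.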
Finally, 3 test scenarios were run with LoadGen to test overall performance of the solution under varying conditions:
- Test 1 – Normal operation – Very Heavy LoadGen load against 20,000 active mailboxes while taking snapshots off the passive DAG copies
- Test 2 – Mailbox Role down – Very Heavy LoadGen load against 20,000 active mailboxes with a single MBX server down
- Test 3 – Consistency check running on passive – LoadGen against passive, run for 24 hours during the consistency check
In summary, we proved that the solution provides outstanding performance, enabling large mailboxes with virtual provisioning on the VMAX and leveraging space-saving TimeFinder snapshots, along with strong integration with Exchange 2010 via EMC Replication Manager. Virtualizing Exchange 2010 with VMware vSphere provides a solid solution where you can consolidate Exchange roles and provide HA with Exchange DAGs.
You can find the full Proven Solutions Guide at: http://www.emc.com/collateral/software/technical-documentation/h7392-virtual-exchange-vmax-vsphere-psg.pdf
Until next time,