This time I have decided to share our best practices material for SharePoint. Yes, I know, some of you SharePoint-savvy folks would question its meaning, as SharePoint is SOOO broad and infrastructure is just one pillar. TRUE! However, it’s the foundation, and as such you have to invest there first before building your service of web applications, sites, web parts, workflows, dashboards and… you get the idea :-)
So here’s a summary of the attached presentation, which James Baldwin and I presented back at EMC World 2011 in Vegas.
I’ve also added a lot of links and references so you can find the technical material needed for planning and deployment activities.
Please feel free to comment. I would love to get your feedback on what works and what doesn’t work in your environment.
Servers and Virtualization
Virtualizing SharePoint servers has the same benefits as virtualizing any other application or database server in your datacenter, but there’s actually more to it than “just” the usual list:
- Consolidation – Achieve a 2-10x consolidation ratio, especially for larger deployments
- Performance – Improved front-end performance with more, smaller WFEs rather than a few large WFEs
- Maintenance – Live migration of virtual machines (VMware vMotion, Hyper-V Quick/Live Migration)
- Load Balancing – Maximized overall performance with balanced HW utilization across the farm (VMware DRS, SCVMM PRO)
Aside from all those obvious benefits, there are a couple that stand out in distributed configurations such as SharePoint:
- Availability – VM-based protection for SharePoint provides HOMOGENEOUS availability (VMware HA, WSFC)
- Business Continuity – Simplified DR management (vCenter Site Recovery Manager, Cluster Enabler)
SharePoint is a perfect candidate for horizontal scaling, meaning scaling out server roles as more resources are required. You actually get some performance benefit when web servers (front/back end) are split into multiple instances by scaling out rather than up on the same hardware resources. In many cases I find the same holds for the application and SQL Server roles too.
Don’t be intimidated by the hypervisor overhead (<10%); again, you can always distribute content databases across multiple SQL instances and partition the index across multiple crawl/index servers. All server roles in SharePoint 2010 can scale out!
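To illustrate the scale-out point, here is a minimal Python sketch of spreading content databases across several SQL instances. It assumes a simple greedy heuristic of my choosing (place the largest databases first, each onto the instance with the least data assigned so far) and uses hypothetical database and instance names. It balances capacity only, not I/O, and is not a SharePoint or SQL Server feature.

```python
import heapq


def distribute_content_dbs(db_sizes_gb: dict, instances: list) -> dict:
    """Greedy largest-first placement: each database goes to the SQL
    instance with the smallest total size assigned so far."""
    if not instances:
        raise ValueError("need at least one SQL instance")
    # Min-heap of (assigned_gb, instance_name); ties are broken by name.
    heap = [(0.0, name) for name in instances]
    heapq.heapify(heap)
    placement = {name: [] for name in instances}
    # Largest databases first; equal sizes are ordered by name for determinism.
    for db, size in sorted(db_sizes_gb.items(), key=lambda kv: (-kv[1], kv[0])):
        load, name = heapq.heappop(heap)
        placement[name].append(db)
        heapq.heappush(heap, (load + size, name))
    return placement


if __name__ == "__main__":
    # Hypothetical content databases and sizes (GB).
    dbs = {"WSS_Content_HR": 180, "WSS_Content_Sales": 150,
           "WSS_Content_IT": 120, "WSS_Content_Legal": 60}
    print(distribute_content_dbs(dbs, ["SQL01", "SQL02"]))
```

With these example sizes the two instances end up carrying 240 GB and 270 GB. In practice you would probably weigh each database by its expected load (busy site collections) rather than by size alone.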
Some would recommend leaving the index and/or SQL servers physical. We have proven many times that ALL server roles can be virtualized without any problem, as long as you balance processing (CPU) pressure across multiple virtual machines (I wouldn’t be that worried about I/O throughput). Microsoft’s general recommendation to dedicate at least 8 physical cores to a medium-sized farm is very generic, and I don’t see it as a showstopper. If you take that guidance literally, I would suggest waiting for the next revision of vSphere (very soon) and Hyper-V (later this year or next year), which will support up to 32 vCPUs.
- Plan for USER LOAD peaks, not system peaks. From what we have observed in lab tests and actual customer data, SharePoint’s regular timer jobs are responsible for most of the I/O and CPU peaks.
The configuration above supported more than 20,000 users at 10% concurrency using only a three-node Dell R910 ESX cluster.
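The concurrency arithmetic behind that statement is easy to sanity-check. A minimal sketch, assuming load spreads evenly across hosts; the “one host down” case is my addition to reflect HA failover headroom, not part of the measured configuration. Integer math is used so percentages don’t pick up floating-point rounding.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def concurrent_users(total_users: int, concurrency_pct: int) -> int:
    """Users active at the same time, rounded up."""
    return ceil_div(total_users * concurrency_pct, 100)


def users_per_host(concurrent: int, hosts: int) -> int:
    """Concurrent users each host carries if the load is spread evenly."""
    if hosts < 1:
        raise ValueError("need at least one host")
    return ceil_div(concurrent, hosts)


if __name__ == "__main__":
    active = concurrent_users(20_000, 10)
    print(active)                         # 2000
    print(users_per_host(active, 3))      # 667 with all three hosts up
    print(users_per_host(active, 3 - 1))  # 1000 with one host failed over
```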
Of course, in most cases you have to plan for performance rather than capacity, and definitely so for SharePoint farms in production environments. Before getting into details, here’s a good picture of where SharePoint data is located:
For a complete list of all databases installed with SharePoint, go to Database types and descriptions (SharePoint Server 2010).
To point the finger at the “hottest” areas in terms of I/O, I would rank them in the following order of importance (IOPS and latency):
- Search databases (Crawl and Property) data and log files
- Query Server/s – Query component/s
- tempdb data and log files
- Database log files
Here’s what we found in our lab tests in terms of I/O sizes and read/write ratios. As you can see, this is not your typical OLTP or OLAP profile. SharePoint workloads vary, but here’s some data you might find useful when planning storage resources:
Microsoft suggests really high I/O requirements for the search components of the farm. While that’s definitely true, I would question whether the suggested IOPS requirements apply to most environments. Here’s what we have observed vs. the TechNet recommendations:
For sizing, look no further than the following TechNet articles:
Here’s a summary of our recommendations:
- Use a 64 KB allocation unit (cluster) size when formatting database volumes (MSDN)
- Plan database file sizes accordingly
  - Don’t rely on autogrowth – file growth can cause locking and has performance implications; set file sizes and autogrowth increments appropriately
- When using thin/virtual provisioning:
  - Use the “Quick Format” option
- Enable instant file initialization, as it speeds up data file creation, restores and data file growth
  - Grant the SQL Server service account the “Perform Volume Maintenance Tasks” permission
  - Just bear in mind that log files (.ldf) are always fully allocated and zeroed upon creation or expansion
- Standard storage response time guidelines apply. On a well-tuned storage system, ideal values would be:
  - 1–5 ms for log files
  - 4–20 ms for data files on OLTP systems (ideally 10 ms or less)
  - 30 ms or less for data files on DSS (decision support system) workloads
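The response time guidelines above can be turned into a quick check against measured averages. A minimal sketch: the workload names and the “ideal/acceptable/investigate” labels are my own, and it assumes you feed it average latencies in milliseconds. Windows Performance Monitor’s LogicalDisk “Avg. Disk sec/Read” and “Avg. Disk sec/Write” counters report seconds, hence the conversion helper.

```python
from enum import Enum


class Workload(Enum):
    LOG = "log"
    OLTP_DATA = "oltp_data"
    DSS_DATA = "dss_data"


# (ideal upper bound, acceptable upper bound) in ms, from the guidelines above.
GUIDELINES_MS = {
    Workload.LOG: (5.0, 5.0),          # 1-5 ms for log files
    Workload.OLTP_DATA: (10.0, 20.0),  # 4-20 ms, ideally 10 ms or less
    Workload.DSS_DATA: (30.0, 30.0),   # 30 ms or less
}


def seconds_to_ms(value_s: float) -> float:
    """Perfmon disk latency counters are in seconds; convert to ms."""
    return value_s * 1000.0


def assess_latency(workload: Workload, avg_ms: float) -> str:
    """Classify an average latency against the guideline for its workload."""
    ideal, acceptable = GUIDELINES_MS[workload]
    if avg_ms <= ideal:
        return "ideal"
    if avg_ms <= acceptable:
        return "acceptable"
    return "investigate"


if __name__ == "__main__":
    print(assess_latency(Workload.LOG, seconds_to_ms(0.003)))        # ideal
    print(assess_latency(Workload.OLTP_DATA, seconds_to_ms(0.015)))  # acceptable
    print(assess_latency(Workload.DSS_DATA, 45.0))                   # investigate
```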
In general, it provides great value, but for maximum efficiency it depends on the storage role:
- Search index component – No (highly changing, throw-away data)
- Search query component – Yes (heavily read data with small bursts of write changes)
- tempdb – Yes (the same blocks are reused on disk, and tempdb performance directly affects SharePoint request performance, since tempdb is used in every SharePoint request)
- Content databases/BLOB store – Maybe (depending on the diversity of the workload, e.g. if some site collections tend to be busier than others)
CX/VNX Thin Provisioning and LUN Compression
BLOB Storage (RBS)
We work with Metalogix StoragePoint for BLOB externalization. Currently all EMC storage solutions are supported with StoragePoint, as it has connectors to file, block and object storage. Some of the possible BLOB stores:
- Symmetrix VMAX – Block
- VNX – Block and/or File (NFS/CIFS)
- Atmos/VE – Object (REST)
- Centera – Centera API
- Isilon – File (CIFS/NFS)
- Data Domain – File (CIFS/NFS)
Each of these can make sense for our customers depending on their overall storage design and requirements; either way, please consider the following guidelines:
- Latency – TTFB (Time to First Byte) should be less than 20 ms
- Recommended maximum content database size remains 200 GB (Microsoft may change this guidance)
- The SQL RBS FILESTREAM provider works, but it doesn’t scale as well as other RBS providers such as StoragePoint
- Performance improvement is more salient when:
  - Externalizing larger objects (>1 MB)
  - Access is read-intensive
- File size limit remains 2 GB, even with RBS
- Backup and replication considerations:
  - Native/item-level backup (stsadm based) includes BLOBs
  - SQL-based backup only protects the content database metadata
  - To maintain consistency:
    - Backup – first content databases, then BLOB store
    - Restore – first BLOB store, then content databases
  - For DR purposes, always tie RBS volumes to SQL Server volumes
  - For faster recovery, consider longer intervals between garbage collection jobs (this keeps previous BLOB versions)
- Here’s a reference architecture:
  - EMC VNX5300; SQL databases: 15K SAS; BLOB store: 7.2K NL-SAS (CIFS share for RBS)
  - Max user capacity – 8,630 (10% concurrency)
  - BLOBs made up 92% of the content databases
  - Full crawl duration – 34 hours (4.4 documents)
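To make the size guideline concrete, here is a minimal sketch of a size-based externalization rule. It is hypothetical: StoragePoint and the SQL FILESTREAM provider have their own configuration for externalization thresholds, and this is not their API. It simply encodes the >1 MB sweet spot and the 2 GB file size limit from the guidelines above.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

# Hypothetical thresholds taken from the guidelines above.
EXTERNALIZE_ABOVE_BYTES = 1 * MB  # larger objects benefit most from externalization
MAX_FILE_BYTES = 2 * GB           # file size limit still applies with RBS


def blob_placement(size_bytes: int) -> str:
    """Return where a file of this size would go under this sketch's rule."""
    if size_bytes < 0:
        raise ValueError("size must be non-negative")
    if size_bytes > MAX_FILE_BYTES:
        return "rejected"    # over the file size limit, RBS or not
    if size_bytes > EXTERNALIZE_ABOVE_BYTES:
        return "blob_store"  # externalize to the RBS BLOB store
    return "content_db"      # small items stay inside the content database


if __name__ == "__main__":
    for size in (200 * KB, 1 * MB, 25 * MB, 3 * GB):
        print(size, blob_placement(size))
```

Note that a file of exactly 1 MB stays in the content database here, since the guideline speaks of objects larger than 1 MB.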
While Microsoft has some guidance for SharePoint availability, covered in Plan for availability (SharePoint Server 2010), there’s a lot more involved in obtaining true and complete SharePoint availability across multiple sites. Storage-based replication can accelerate the failover process and can scale as your farm’s storage needs grow. The most basic forms of availability can be achieved with SQL Server log shipping and/or database mirroring, but while effective for smaller configurations, they still lack complete farm protection. The only components that can be continuously protected with database mirroring/log shipping are SQL databases, and not all of them! What about the index? The WFEs? The app servers?
SharePoint DR involves a lot, but it can be significantly simplified by virtualizing all server roles, which provides end-to-end mobility of SharePoint farm services without worrying about the BLOB file system, individual databases, index partitions etc. When choosing storage-based replication, the first thing to consider is placing all SharePoint volumes in a single consistency group. This is of great value, as it can guarantee end-to-end (I like that term :-) ) consistency at any point in time.
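A minimal sketch of an inventory sanity check for that advice, assuming a hypothetical volume list you maintain yourself (the names and roles below are made up). The consistency groups themselves are configured in SRDF, RecoverPoint or MirrorView; this Python code does not talk to any EMC API. It only flags volumes that would not fail over together, including RBS volumes that are not tied to the SQL Server volumes.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Volume:
    name: str
    role: str                         # e.g. "sql_data", "sql_log", "rbs_blob", "index"
    consistency_group: Optional[str]  # None means not replicated in any group


def check_single_group(volumes: List[Volume]) -> List[str]:
    """Return a list of problems; an empty list means all volumes share one group."""
    problems: List[str] = []
    groups = defaultdict(list)
    for vol in volumes:
        if vol.consistency_group is None:
            problems.append(f"{vol.name} ({vol.role}) is not in any consistency group")
        else:
            groups[vol.consistency_group].append(vol.name)
    if len(groups) > 1:
        problems.append("volumes are split across groups: " + ", ".join(sorted(groups)))
    return problems


if __name__ == "__main__":
    farm = [
        Volume("SP_SQL_Data", "sql_data", "CG_SharePoint"),
        Volume("SP_SQL_Log", "sql_log", "CG_SharePoint"),
        Volume("SP_RBS", "rbs_blob", "CG_Files"),  # RBS replicated separately: a problem
        Volume("SP_Index", "index", None),
    ]
    for issue in check_single_group(farm):
        print(issue)
```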
Here’s a table to help you understand what is related to what and how to go about consistency grouping, which is available with almost all EMC replication solutions (SRDF, RecoverPoint, MirrorView etc.):
While this is a great solution, that type of DR still involves manual failover, restart and reconfiguration. That’s why I always recommend virtualizing all server roles: it enables you to leverage automation solutions for virtual infrastructure, namely vCenter Site Recovery Manager (SRM) or multi-site Hyper-V clustering enabled by EMC Cluster Enabler (SRDF, RecoverPoint, MirrorView). If VPLEX is deployed, you won’t even need Cluster Enabler; you can simply rely on Windows Server Failover Clustering (of the VMs) to achieve that.
- Long distance application mobility for SharePoint 2010 (VPLEX Geo, Hyper-V)
- Business Continuity for SharePoint 2010 (RecoverPoint, vCenter SRM)
- vMotion over distance for SharePoint 2007 (VPLEX Metro, vSphere)
- Business Continuity for SharePoint 2007 (RecoverPoint, Cluster Enabler, Hyper-V)
I’ll keep updating this post based on feedback, new findings and updated guidance from our friends at Microsoft.