Define monitoring requirements

Monitoring is the art of using instrumentation to analyze and predict the behavior of a given system. Knowing what instrumentation is available, and the expected values for each instrument, enables an administrator to adjust for a system's performance idiosyncrasies without incurring downtime.

Each new revision of SharePoint has introduced more granular monitoring than the versions before it. SharePoint 2013 continues this pattern by providing insight into the operations of major subsystems such as Microsoft SQL Server, ASP.NET, IIS, and other services.

Service guarantee metrics

In the planning stages of your farm design, you should have developed SLAs that define the service guarantee provided by the farm. This guarantee defines not only the times that a system can be up or down entirely but also the availability and enforceability of scheduled outage windows.

Within the SLA for your environment you will find terms such as “downtime,” “scheduled downtime,” and “uptime percentage.” These terms describe, at a high level, the goals of monitoring. For instance, within the Microsoft Office 365 SLA for SharePoint Online, you will find definitions for the following:

  • Downtime: Defined as “Any period of time when users are unable to access SharePoint sites for which they have appropriate permissions.”
  • Scheduled Downtime: Defined as “(i) Downtime within preestablished maintenance windows; or (ii) Downtime during major version upgrade.” The SLA goes on to state that scheduled downtime is not considered downtime.
  • Monthly Uptime Percentage: Defined as being calculated “by taking the total number of minutes in a calendar month multiplied by the total number of users minus the total number of minutes of Downtime experienced by all users in a given calendar month, all divided by the total number of minutes in that calendar month multiplied by the total number of users.”

Immediately after these definitions, the SLA goes on to define what service credit is offered in the event of monthly uptime percentage falling below 99.9% (“Three Nines”), 99% (“Two Nines”), and 95% (“One Nine”).
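The Monthly Uptime Percentage definition quoted above can be written as a formula. The symbols are introduced here for illustration and do not appear in the SLA: M is the total number of minutes in the calendar month, U is the total number of users, and D is the total number of downtime minutes experienced by all users. The worked numbers are hypothetical.

```latex
\[
\text{Monthly Uptime Percentage}
  = \frac{M \times U - D}{M \times U} \times 100\%
\]

% Hypothetical example: a 30-day month, 1,000 users,
% and an outage in which 200 users lose access for 90 minutes.
\[
M = 30 \times 24 \times 60 = 43{,}200, \qquad
U = 1{,}000, \qquad
D = 200 \times 90 = 18{,}000
\]
\[
\frac{43{,}200{,}000 - 18{,}000}{43{,}200{,}000} \times 100\% \approx 99.958\%
\]
```

In this example the result stays above the 99.9% threshold because downtime is weighted by the number of users affected, not counted against the whole user base.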

Monitoring levels

Now that you know what your monthly uptime percentage is (for example, three nines allows a maximum of approximately 24 × 60 × 0.001, or 1.44 minutes of downtime per day on average) and what constitutes downtime, you can begin to monitor the SharePoint farm (or farms) to prevent these incidents.
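The same arithmetic can be extended from a day to a month. The following is a worked sketch that assumes a 30-day month (43,200 minutes) and treats the whole user base as affected; p is the uptime target expressed as a fraction, a symbol introduced here for illustration.

```latex
\[
\text{Downtime budget} = (1 - p) \times 43{,}200 \text{ minutes}
\]
\[
\begin{aligned}
p = 0.999 &: \quad 0.001 \times 43{,}200 = 43.2 \text{ minutes per month} \\
p = 0.99  &: \quad 0.01  \times 43{,}200 = 432 \text{ minutes} = 7.2 \text{ hours per month} \\
p = 0.95  &: \quad 0.05  \times 43{,}200 = 2{,}160 \text{ minutes} = 36 \text{ hours per month}
\end{aligned}
\]
```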

A single SharePoint farm has three major levels at which it can be monitored (from largest to smallest):

  • Server level: At this level, you will be monitoring the servers that constitute the farm:
    • Web tier servers
    • Application tier servers
    • SQL database servers
  • Service application level: At this level, you will be monitoring all the services provided within the farm, such as Excel Calculation Services, the User Profile service, and so on.
  • Site and site collection level: At this level, you monitor all the sites and site collections contained within the farm.

Remember that outages do not necessarily require the failure of an entire farm; an improperly deployed feature or a misconfigured service application can result in downtime for a considerable segment of the user base without rendering the entire farm inoperable.
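As a rough illustration of the three levels, the following sketch takes a quick inventory at each one from Windows PowerShell. It assumes it is run on a farm server by an account with farm administrator rights. The choice of which properties to display, and the read-only check at the site collection level, are assumptions made for this example and are not a complete health check.

```powershell
# Load the SharePoint cmdlets if this is not the SharePoint 2013 Management Shell.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Server level: each server in the farm with its role and status.
Get-SPServer |
    Format-Table Address, Role, Status -AutoSize

# Service application level: service instances that are neither Online nor
# Disabled. (Disabled is normal for services not started on a given server.)
Get-SPServiceInstance |
    Where-Object { $_.Status -notin 'Online', 'Disabled' } |
    Format-Table Server, TypeName, Status -AutoSize

# Service application level: every service application and its status.
Get-SPServiceApplication |
    Format-Table DisplayName, TypeName, Status -AutoSize

# Site and site collection level: site collections currently marked read-only.
Get-SPSite -Limit All |
    Where-Object { $_.ReadOnly } |
    Format-Table Url, ReadOnly -AutoSize
```

An empty result from the second and fourth commands simply means that nothing matched the filter at that level.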

Monitoring tools

There are four core tools that can be used to monitor SharePoint 2013 farms: Central Administration, Windows PowerShell, logs, and System Center 2012 Operations Manager.

Central Administration allows for the configuration and monitoring of the SharePoint logs as well as configuration of usage and health providers. Additionally, Health Analyzer runs a series of rules on a regular basis that check on the status of metrics such as these:

  • Free disk space on both SharePoint and SQL servers
  • Service issues such as problems with State service, InfoPath Forms Services, and Visio Graphics Service
  • SQL-specific issues, such as overly large content databases, databases in need of upgrade, and the read/write status for a given database
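The Health Analyzer rules and the diagnostic logging settings that Central Administration exposes can also be inspected from Windows PowerShell. The following is a minimal sketch that assumes it is run on a farm server; the property names selected are the ones these cmdlets are expected to return and should be confirmed with Get-Member in your own environment.

```powershell
# Load the SharePoint cmdlets if this is not the SharePoint 2013 Management Shell.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# List the Health Analyzer rules that are currently enabled, grouped by category.
Get-SPHealthAnalysisRule |
    Where-Object { $_.Enabled } |
    Sort-Object Category, Name |
    Format-Table Category, Name, Summary -AutoSize -Wrap

# Show where the trace logs are written and how long they are retained.
Get-SPDiagnosticConfig |
    Format-List LogLocation, DaysToKeepLogs,
                LogMaxDiskSpaceUsageEnabled, LogDiskSpaceUsageGB
```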

Windows PowerShell focuses on the diagnostic capabilities found in the Unified Logging Service (ULS) logs. The ULS logs can be quite detailed in scope, meaning that quite literally hundreds of thousands of entries can be found on a given server. Using the Get-SPLogEvent cmdlet, you can view trace events by level, area, category, event ID, process, or message text.

Additionally, you can pipe its output to the Out-GridView cmdlet to produce tabular log output in a graphical format (as shown in Figure 5-1), which can be easily refined and/or exported to an Excel spreadsheet for further analysis.

FIGURE 5-1 Using the Out-GridView cmdlet.
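A minimal sketch of this workflow follows. It assumes it is run on a farm server; the 30-minute window, the area and category values, and the output path are illustrative choices and should be replaced with values that suit your environment.

```powershell
# Load the SharePoint cmdlets if this is not the SharePoint 2013 Management Shell.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Retrieve trace events from the last 30 minutes at Warning level or higher.
# Limiting the time window keeps the query from scanning every log file.
$events = Get-SPLogEvent -StartTime (Get-Date).AddMinutes(-30) -MinimumLevel Warning

# Filter by area and category (illustrative values), then view the results in a grid.
$filtered = $events |
    Where-Object { $_.Area -eq 'SharePoint Foundation' -and $_.Category -eq 'Database' } |
    Select-Object Timestamp, Level, Area, Category, EventID, Process, Message

$filtered | Out-GridView -Title 'Recent ULS warnings and errors'

# Optionally save the same rows as a CSV file that can be opened in Excel.
# The path is hypothetical; the folder must already exist.
$filtered | Export-Csv -Path 'C:\Temp\uls-events.csv' -NoTypeInformation
```

Out-GridView requires an interactive desktop session, so on a server without the graphical shell the CSV export is the more practical option.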

Logs for use with SharePoint monitoring come from two distinct sources. At the operating system level, you find the standard event logs, in which events that concern SharePoint and its supporting technologies (SQL, IIS, and so on) are recorded (primarily in the Application and System logs). As previously mentioned, SharePoint also records information in its own series of trace logs, otherwise known as the ULS logs.
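The operating system event logs can be queried from Windows PowerShell as well. The following sketch uses the standard Get-WinEvent cmdlet to pull errors and warnings from the Application and System logs for the last 24 hours. The provider-name pattern is an assumption made for illustration and will likely need adjusting to match the event sources present on your servers.

```powershell
# Errors (level 2) and warnings (level 3) from the last 24 hours.
$filter = @{
    LogName   = 'Application', 'System'
    Level     = 2, 3
    StartTime = (Get-Date).AddHours(-24)
}

# Get-WinEvent reports an error when nothing matches, so suppress it here.
Get-WinEvent -FilterHashtable $filter -ErrorAction SilentlyContinue |
    # Keep events whose source looks related to SharePoint, SQL Server, or IIS.
    Where-Object { $_.ProviderName -match 'SharePoint|MSSQL|IIS|W3SVC|WAS' } |
    Sort-Object TimeCreated -Descending |
    Select-Object TimeCreated, LogName, ProviderName, Id, LevelDisplayName, Message
```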

Finally, if you have a larger SharePoint farm (or farms), you may find that monitoring each individual system becomes time-consuming, and that even the usage and health providers are not enough to provide a complete picture of the systems required to support SharePoint.

For this purpose, Microsoft produces a product known as System Center 2012 Operations Manager. Using this toolset with the System Center Management Pack for SharePoint 2013 allows not only for the effective monitoring of multiple SharePoint farms and their component systems but also for the alerting and preventative actions required to assist in the maintenance of service level guarantees.

The management pack monitors both Microsoft SharePoint Server 2013 and Project Server 2013. It also monitors the following service applications:

  • Access Services
  • Business Data Connectivity (BDC)
  • Security Token Service
  • Managed Metadata Web Service
  • Education Services
  • Excel Services Application
  • InfoPath Forms Services
  • PerformancePoint Services
  • Sandboxed Code Services
  • Secure Store Service
  • SharePoint Server Search
  • Translation Services
  • User Profile Service
  • Visio Services
  • Word Automation Service