Any individual or company looking to use a data center’s services needs a guaranteed level of quality. There has to be some assurance that their equipment will not fail and that their data will always be available (or that, in the worst-case scenario, their project will quickly be restored).
When it comes to data centers, the most commonly recognized guarantee is certification. Certification asserts that the center complies with a predetermined list of standards. Although issues of standardization have gained particular interest over the past few years, the number of relevant publications on the topic remains low.
In this article, we’ll be looking at the primary regulatory documents for data center certification.
Uptime Institute Certification
The best-known organization for data center certification is the Uptime Institute. It comprises a group of companies that specialize in training, consulting, and certification in the field of data center design, construction, and operation. It was founded by Kenneth G. Brill in 1993. In 2009, the Uptime Institute was acquired by the 451 Group and became an independent division. The organization’s head office is located in New York, and it has regional branches in Brazil, Russia, Mexico, Great Britain, Costa Rica, UAE, and Taiwan.
Through the Uptime Institute’s certification process, data centers are assigned a level of reliability (system availability), called a Tier. The Tier classification system was first described in the mid-90s and was widely adopted shortly thereafter. Levels are assigned based on an evaluation of the center’s fault tolerance and protection from potential failures.
Tier Certification Stages
The Tier certification process is complex and involves multiple stages.
Data center certification starts at the design stage: all design documents are sent to the Uptime Institute and undergo TCDD (Tier Certification of Design Documents).
The documents are inspected to ensure that all of the future data center’s subsystems will be able to handle specific tasks. The documentation must contain answers to 19 questions pertaining to all primary infrastructures. Answers should describe these systems, the topological location of IT and engineering networks, and the operating conditions of equipment. Tier 4 certification also requires an algorithm for infrastructure management automation. The current standards demand that malfunctions in these data centers be localized and backup systems activated automatically, with no human interaction.
The Uptime Institute analyzes the submitted documents and then holds a teleconference with the team of data center representatives. During this teleconference, the institute points out any issues that were discovered and discusses ways to correct them.
Once all of the necessary corrections have been made, the project receives a certificate of compliance that is valid for two years (starting the day it is issued).
The next step in the certification process is the TCCF (Tier Certification of Constructed Facility). Representatives from the Uptime Institute visit the now-constructed data center and look at how the project was implemented, noting and documenting any and all issues. Special attention is given to analyzing differences between the previously submitted project documentation and the actual project.
After the owner of the data center fixes all identified issues, the Uptime Institute issues a certificate of compliance with one of the four Tiers (described below).
The Uptime Institute looks not only at the formal compliance criteria, but also at how the data center is managed and how service levels are ensured.
Companies that have their own data centers, but for whatever reason don’t want to undergo the Tier certification process, can receive a Management and Operations Stamp of Approval.
Data centers are issued a certificate in accordance with one of the Tier levels based on the results of their evaluation. There are 4 possible levels:
- Tier 1 means the center has the basic ability to support its IT infrastructure: there’s an uninterruptible power supply and guaranteed protection from power fluctuations, as well as a cooling system and a generator, which ensure the continued operation of the center in the event of a power outage. There are no backup systems in place: if one component goes offline, the entire data center will go down.
- Tier 2 implies the center has a certain level of redundancy; data centers at this level may not fail if equipment goes offline. This is possible thanks to additional power feeds and cooling channels. However, construction and renovations cannot be carried out without suspending data center operations. This redundancy plan is called N+1 (the N components needed to carry the load are backed up by one spare).
- Tier 3 means an elevated level of redundancy: maintenance can be performed and failed components can be replaced without disrupting the standard operation of the data center. All infrastructure systems contain several backups: there are multiple power feeds and cooling channels, but only one of them will be active at a given time. This redundancy plan is called 2N (all primary systems are duplicated, preventing failure).
- Tier 4 requires the highest level of redundancy. Each of the data center’s infrastructure systems possesses 2(N+1) redundancy: both primary and backup systems are duplicated.
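The redundancy notations used in the list above can be illustrated with a short sketch. It assumes that N is the number of units (for example, UPS modules or cooling units) needed to carry the full load, and it simply counts how many units would be installed under each scheme. The function name `installed_units` and the example value N = 4 are hypothetical and are not taken from any Uptime Institute document.

```python
def installed_units(n: int, scheme: str) -> int:
    """Return how many units are installed when the load itself needs n units.

    This is an illustrative reading of the notation, not an official definition.
    """
    schemes = {
        "N": n,                 # no redundancy: exactly what the load needs
        "N+1": n + 1,           # one spare unit on top of the required set
        "2N": 2 * n,            # the whole required set is duplicated
        "2(N+1)": 2 * (n + 1),  # two sets, each with its own spare
    }
    return schemes[scheme]


# Hypothetical example: the load needs 4 units.
for scheme in ("N", "N+1", "2N", "2(N+1)"):
    print(f"{scheme}: {installed_units(4, scheme)} units installed")
```

The sketch only counts equipment; it says nothing about how the units are wired or switched over, which is what the certification actually examines.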
Each Tier requires the following service uptime:
- Tier 1 — 99.671%
- Tier 2 — 99.741%
- Tier 3 — 99.982%
- Tier 4 — 99.995%
At first glance, it may not seem like there’s much difference between these numbers, but in a critical situation, they could be very noticeable. The higher the service availability level, the lower the permissible downtime. The permissible downtime for each Tier is as follows:
- Tier 1 — 1729 minutes, or roughly 29 hours a year
- Tier 2 — 1361 minutes, or 23 hours a year
- Tier 3 — 95 minutes a year
- Tier 4 — only 26 minutes a year
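The downtime figures above follow directly from the availability percentages. The sketch below reproduces the arithmetic, assuming a non-leap year of 525,600 minutes; the names `MINUTES_PER_YEAR`, `TIER_AVAILABILITY`, and `annual_downtime_minutes` are introduced here purely for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year

# Availability percentages as listed in the article.
TIER_AVAILABILITY = {
    "Tier 1": 99.671,
    "Tier 2": 99.741,
    "Tier 3": 99.982,
    "Tier 4": 99.995,
}


def annual_downtime_minutes(availability_percent: float) -> float:
    """Return the permissible downtime per year, in minutes."""
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


for tier, availability in TIER_AVAILABILITY.items():
    minutes = annual_downtime_minutes(availability)
    print(f"{tier}: {minutes:.0f} min/year ({minutes / 60:.1f} h)")

# Prints:
# Tier 1: 1729 min/year (28.8 h)
# Tier 2: 1361 min/year (22.7 h)
# Tier 3: 95 min/year (1.6 h)
# Tier 4: 26 min/year (0.4 h)
```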
According to data from 2013, 187 data centers around the world (24 in Russia) have Tier certification (see the map here); most of these meet Tier 3 requirements. The certification process is paid for by the company and is not cheap: costs range on average from $100,000 to $300,000. Despite the high cost, more and more data centers around the world (including in Russia) are looking to be certified, and this is completely understandable. Firstly, potential investors and clients see certification as a sign of quality. Secondly, the involvement of the Uptime Institute’s consultants helps prevent errors and optimize expenses.
The Tier System and Other Standards
Despite the growing popularity of Tier certification around the world, it is not the only standard for data centers.
On the map linked above, we can see that there are significantly fewer data centers in Western Europe with Tier certification than in South America or Asia.
There is a logical explanation for this: national standards play a huge role in the design and construction of data centers in European countries. For example, the organization TÜV Süd is responsible for data center certification in Germany.
In the USA, a large number of data centers are certified to the TIA-942 standard, which was published in 2005 by the Telecommunications Industry Association (TIA) and approved by the American National Standards Institute (ANSI).
The differences between TIA-942 and the Uptime Institute standards have already been discussed around the Internet. The TIA standard gives a more detailed description of data center infrastructure requirements and has its own Tier level equivalents. Compared to TIA-942, the Uptime Institute standards seem much more flexible.
Unlike the TIA-942 standard, the Uptime Institute standards include not just technical requirements, but also organizational and managerial requirements. In 2010, the Uptime Institute published its recommendations under the title “Tier Standard: Operational Sustainability”. According to this document, reliability and sustainability comprise the following points:
- Location reliability (evaluates multiple parameters: from transport availability to level of risk of natural and technical disasters)
- Building reliability (compliance with technical requirements, durability against natural and technical influences, etc.)
- Functional compliance (uses reliable and tested technology, has redundant systems, etc.)
- Management and functionality (organizational features, qualified personnel, etc.)
The Uptime Institute regulatory documents also include an additional rating (gold, silver, or bronze), which complements the Tiers. Ratings are given based on the data center’s administration and operations.
Despite the advantages of the Tier system, there is one drawback: the certification process is bogged down by bureaucracy; this is why so few data centers on the above-linked map are marked as having completed the procedure. The process is much simpler for TIA-942 certification, which is why there are many more data centers certified to these standards.
We should also mention one well-known and fairly widespread standard: BICSI 002-2010, which was developed by Building Industry Consulting Service International (BICSI) in 2010. In terms of content, it is very similar to TIA-942. Its technical requirements are much more detailed compared to other standards, but there is no rating system or levels (although one section lists possible levels of power supply systems).
At the end of 2014, an updated and more thorough version of this standard was released: ANSI/BICSI 002-2014.
Emergency Power-Off Problems
The emergency power-off (EPO) capability is one of the most widely discussed problems pertaining to the design and construction of data centers. Different standards treat it differently.
According to the Uptime Institute standards, a data center must have an EPO option only when it is required by local law. As Uptime Institute documentation notes, EPO is often the cause of downtime (for example, if someone accidentally or carelessly activates this function).
This is why the presence of this function is not recommended in data centers.
The developers of TIA-942-2 hold a similar stance: they state directly that an emergency power-off button should not be installed unless required by local law.
The BICSI standard notes that emergency power-off capabilities in a data center carry significant risks. At the same time, this standard doesn’t have any direct instructions or recommendations; it states that the EPO function can be implemented at the data center owner’s discretion. For F0-F1 classes, the power-off procedure should be one step, and three steps for F2-F5 classes.
Some Words about Energy Efficiency
Along with the level of availability and reliability, an important characteristic of modern data centers is energy efficiency. The Green Grid consortium researches the effective use of data center resources. Members of this organization include well-known companies like Cisco, Dell, EMC, Intel, IBM, and more. Although Green Grid does not develop standards, the power usage effectiveness and infrastructure efficiency metrics created by the consortium were adopted by the US Environmental Protection Agency and other government organizations around the world.
Power usage effectiveness (PUE) is calculated with the following formula:
$$\mathrm{PUE} = \frac{\text{Total Facility Energy}}{\text{IT Equipment Energy}}$$
Here, PUE is the power usage effectiveness coefficient, Total Facility Energy is the total amount of energy consumed by the data center, and IT Equipment Energy is the amount of energy consumed by the IT equipment.
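As a minimal sketch, PUE can be computed as the ratio of total facility energy to IT equipment energy. The example below also computes the infrastructure efficiency metric (DCiE), which is the reciprocal of PUE expressed as a percentage. The function names and the sample energy readings are hypothetical and are not taken from any real data center.

```python
def pue(total_facility_energy_kwh: float, it_equipment_energy_kwh: float) -> float:
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_energy_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_energy_kwh / it_equipment_energy_kwh


def dcie_percent(total_facility_energy_kwh: float, it_equipment_energy_kwh: float) -> float:
    """Data center infrastructure efficiency: reciprocal of PUE, in percent."""
    if total_facility_energy_kwh <= 0:
        raise ValueError("Total facility energy must be positive")
    return it_equipment_energy_kwh / total_facility_energy_kwh * 100.0


# Hypothetical readings: the facility drew 1500 kWh, of which 1000 kWh went to IT equipment.
total_kwh = 1500.0
it_kwh = 1000.0
print(f"PUE:  {pue(total_kwh, it_kwh):.2f}")
print(f"DCiE: {dcie_percent(total_kwh, it_kwh):.1f}%")
```

A PUE closer to 1 means that less energy is spent on cooling, power distribution, and other overhead relative to the IT load.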
There are even online calculators (such as the one here) that can be used to calculate PUE for a specific data center. Lately, there have been a fair number of publications on ecological issues of data center operations; however, no standard has a corresponding requirement, if only because the concept of energy efficiency is ambiguous.
In this article, we gave you a quick overview of the primary standards regulating the construction and operations of data centers.
Naturally, there are a lot of aspects we couldn’t cover here. If you have anything to add, we’d love to hear from you in the comments below.
We plan on continuing our article series on data centers in the near future. In our next article, we’ll take an in-depth look at Russian standards.