One of the biggest challenges facing today’s data center professionals is managing their data centers effectively without being there. With edge data center deployments increasing, workloads shifting to colocation facilities, and the COVID-19 pandemic forcing staff to work from home, data center managers must be able to solve complex problems without going on site to see the data center or read local meters.
Data center managers are now asking questions such as: How much capacity do I have, and when will I run out? How do I manage moves, adds, and changes with remote hands? Where can I deploy equipment if I don’t have the luxury of walking the data center floor? How do I identify and manage hot spots? How do I ensure power loads don’t exceed capacity and cause downtime while I’m not at the data center? Without the right solution, these problems mount each day.
The answer lies in identifying and monitoring Key Performance Indicators (KPIs), then using the resulting insights to increase uptime, improve efficiency, better utilize capacity, and boost staff productivity.
The top 15 KPIs you need to monitor to remotely manage your data center are:
Power capacity per cabinet. Data center power resources are increasingly constrained, and the goal of maximizing uptime competes with the goal of using power efficiently. By monitoring your power capacity at the cabinet level, you will improve uptime by ensuring you don’t exceed capacity and save money by discovering stranded power capacity.
Actual active power per cabinet. Many data center managers measure their power consumption only weekly or monthly, leaving them vulnerable to short-term peaks and potential overloads that go undetected. Monitor your power consumption per rack in real time, trend that data continuously, and set thresholds and alerts so that you are notified and can react before a major issue develops or users are impacted.
Stranded power capacity per cabinet. Data center managers will often allocate more power to each rack than is actually demanded by the IT equipment. This causes stranded power that can be deployed elsewhere in the data center to save costs. For a single rack, a few kilowatts of stranded power may seem unremarkable, but when you factor in hundreds or thousands of racks, stranded power could account for as much as 50% of all available power. Monitor power consumption in your data center to identify stranded capacity. Then, deploy that power with confidence and delay spending millions to build your next data center.
Cooling capacity. To keep your equipment operating safely within the recommended temperatures, you must track your cooling capacity. This helps maintain uptime and ensures that you have the capacity to cool the heat output of IT equipment. Be sure to have additional capacity to account for potential equipment failures and load growth.
Free rack units trend. This KPI shows how many items of a given rack unit (RU) height can still be installed in your data center and how that number changes over time. It is useful for identifying trends in how efficiently you use space and for comparing how much space and how much power capacity you have available to deploy new devices.
Available floor space remaining. In addition to tracking available cabinet space, track available floor space by the number of open cabinet positions to know how much white space is available to deploy new cabinets on the data center floor.
Data and power ports capacity and usage trends. How effectively you plan and manage your data center capacity depends on how detailed your data is. Tracking capacity down to the data and power port level gives you granular data that shows exactly how many ports remain available. Monitor your usage and capacity by connector type to ensure you never run out of free data or power ports in your data center.
Cabinets with most free data and power ports. When provisioning new equipment, you should know the best place to reserve cabinet space to achieve optimal utilization of resources. This requires knowing which cabinets have available data and power port capacity. By tracking physical port capacity at the cabinet level, you can intelligently provision new equipment, make more informed capacity planning decisions, use power and network resources more efficiently, and reduce operating expenses.
Requests by requester, stage, type, and location. To maintain SLAs while improving efficiency and productivity of data center staff, you must properly monitor and manage moves, adds, and changes. Track the number of change requests, tickets, and work orders, who is making them and where, what progress is being made, and what types of changes are being requested. Track your requests from creation to approval to ensure work order quality and transparency while improving staff efficiency through improved collaboration.
Completed requests over time. It’s important to know how much work is being done in the data center. One way to do this is to monitor the number of completed moves, adds, and changes over time. Tracking data center activity and productivity in this manner allows you to determine whether staffing levels in the data center are justified, troubleshoot outages more easily, and bill your customers more accurately.
Asset audit trail. Having complete visibility and transparency into the information and history of any asset in your data center helps drive efficiency and facilitate compliance. For the most effective remote data center management, maintain a real-time audit log for all changes in your data center that includes what action was taken, by whom, and when.
Energy consumption per location. Energy consumption per server is growing each year as increases in performance drive energy demand, and the cost of energy consumed can account for up to 50% of total data center operating expenses. As such, energy consumption needs to be monitored and intelligently reduced. Track your energy consumption and set targets to reduce consumption, bill back users, meet corporate sustainability and green initiatives, and collect energy rebates and carbon credits.
Latest temperature per cabinet. A common mistake in data center monitoring is to monitor the temperature at the room level rather than the rack level, potentially leaving you blind to cabinets that are operating at unsafe temperatures. Instead, monitor each cabinet’s temperature in real time to ensure that your equipment is operating safely within ASHRAE standards, easily identify hot spots, and save money by avoiding overcooling.
Average temperature over time. In addition to tracking the latest temperature per cabinet, you should add a level of sophistication to your monitoring by trending that data over time to identify spikes and irregularities. By monitoring the average temperature per cabinet over time, you can ensure that your equipment is operating within safe guidelines not just now, but all the time. If you see temperature spikes, you’ll have the data to identify what the issue was and prevent it from recurring.
Delta-T per cabinet. Delta-T is the difference between two temperature readings taken at different locations on a cabinet. It is typically used to compare the inlet temperature of IT equipment with the temperature of the air the equipment exhausts. You should monitor the Delta-T for each cabinet in your data center to help balance airflow volume, identify hot spots, and maintain a safe environment. Doing so helps you maximize your cooling capacity, reduce operating expenses, and defer capital expenditures.
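To make a few of these KPIs concrete, here is a minimal Python sketch of how stranded power, Delta-T, and simple threshold alerts could be derived from per-cabinet readings. It is an illustration only, not how any particular DCIM product computes them. The `CabinetReading` fields, the sample values, the 80% power threshold, and the 27 °C inlet limit are all assumptions chosen for the example; use the limits that apply to your own equipment and policies.

```python
from dataclasses import dataclass


@dataclass
class CabinetReading:
    """One polled sample for a cabinet (all field names are hypothetical)."""
    name: str
    budgeted_kw: float  # power allocated to the cabinet
    active_kw: float    # measured active power draw
    inlet_c: float      # temperature at the equipment inlet
    exhaust_c: float    # temperature at the equipment exhaust


def stranded_kw(r: CabinetReading) -> float:
    """Allocated power that the IT load is not currently drawing."""
    return max(r.budgeted_kw - r.active_kw, 0.0)


def delta_t(r: CabinetReading) -> float:
    """Exhaust temperature minus inlet temperature for the cabinet."""
    return r.exhaust_c - r.inlet_c


def alerts(r: CabinetReading,
           power_limit: float = 0.8,      # assumed fraction of budgeted power
           inlet_max_c: float = 27.0) -> list[str]:  # assumed inlet limit
    """Return the names of the thresholds this reading exceeds."""
    found = []
    if r.active_kw > power_limit * r.budgeted_kw:
        found.append("power")
    if r.inlet_c > inlet_max_c:
        found.append("inlet temperature")
    return found


# Made-up sample data for two cabinets.
readings = [
    CabinetReading("A01", budgeted_kw=5.0, active_kw=2.0,
                   inlet_c=22.0, exhaust_c=34.0),
    CabinetReading("A02", budgeted_kw=5.0, active_kw=4.5,
                   inlet_c=28.5, exhaust_c=41.0),
]

for r in readings:
    print(r.name, stranded_kw(r), delta_t(r), alerts(r))

# Summing across cabinets shows how much allocated power sits unused.
total_stranded = sum(stranded_kw(r) for r in readings)
print("Total stranded kW:", total_stranded)
```

In this made-up data, cabinet A01 leaves 3 kW of its budget unused, while A02 trips both thresholds. In practice you would feed the same calculations with continuously polled readings and trend the results over time rather than inspect a single sample.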
It’s more critical than ever to integrate, analyze, and act on the KPIs that have the most impact on your data center, but how do you begin to remotely monitor these metrics? With a comprehensive remote Data Center Infrastructure Management (DCIM) solution, it’s easy.
A modern DCIM tool provides all your most important KPIs right out of the box with zero-configuration dashboard widgets, reports, and visual analytics. An enterprise-class data and health poller gathers data directly from facility equipment to ensure accurate, high-quality information that leads to deeper, more reliable insights. Second-generation DCIM makes it simple for data center professionals to make smarter, more informed remote data center management decisions to improve data center health and efficiency while dramatically simplifying capacity management.
Want to see Sunbird’s world-class dashboards for remote data center management for yourself? Take a free test drive today!