Infrastructure investigations: the case of the added second-and-a-half

By Nick York, Product Manager, Analytics

In our ongoing detective series, we’ll investigate unexplained IT infrastructure problems and walk through the process of using infrastructure performance management to solve the mysteries.

A Virtual Instruments customer planned to swap its production and disaster recovery sites. We had heavily outfitted each site with our infrastructure performance management (IPM) solution, VirtualWisdom4. The customer's legacy systems complicated the early stages of the migration: the business intelligence suite had to be upgraded and operating system patches applied to ensure the data stored on the legacy systems transferred correctly.

With migrations like this, our infrastructure investigators anticipate some issues, so we developed production workload models early to set performance benchmarks and ensure we would know what had changed once the project was complete.

The crime, the clues and investigation

We quickly saw the telltale signs of system delays, but only in certain components. Session stacking was apparent as soon as the site came online, and database administrators saw log-write elapsed times of more than 1,500 milliseconds in trace logs. The storage team, however, saw normal write-response times in its own performance monitoring tools. The problem was real on the customer side, though: they saw an extra second to a second and a half of delay.

This crime needed solving – and fast. Undoing the migration would be painful, taking as long as the failover, but it would be necessary if the delays remained.

The client turned to Virtual Instruments, and we got right on the case. Using VirtualWisdom, we could see exactly what was occurring even though the other monitoring solutions in place could not. Histograms that assessed every data point individually confirmed what we already knew, and then some. IPM found that the delay came at the top of every minute, and VirtualWisdom pinpointed the exact point in the process at which it was happening.
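For readers who want to see why a per-data-point view matters, here is a minimal Python sketch of the general idea. It is not how VirtualWisdom works internally; the function names, the sample format and the synthetic latencies are all assumptions made for illustration. It groups write latencies by the second of the minute in which they occurred and compares the worst case per bucket with the overall mean.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def max_latency_by_second(samples):
    """Return the worst latency (ms) seen in each second-of-minute bucket.

    `samples` is an iterable of (timestamp, latency_ms) pairs. This format
    is a hypothetical one chosen for the sketch.
    """
    buckets = defaultdict(list)
    for ts, latency_ms in samples:
        buckets[ts.second].append(latency_ms)
    return {sec: max(vals) for sec, vals in sorted(buckets.items())}


def synthetic_samples(minutes=3):
    """Build made-up data: one write per second, slow only at second 0."""
    start = datetime(2020, 1, 1, 0, 0, 0)
    samples = []
    for i in range(minutes * 60):
        ts = start + timedelta(seconds=i)
        # Assumed values: 1,500 ms at the top of the minute, 2 ms otherwise.
        latency_ms = 1500.0 if ts.second == 0 else 2.0
        samples.append((ts, latency_ms))
    return samples


samples = synthetic_samples()
worst = max_latency_by_second(samples)

# Seconds of the minute in which at least one write took a second or more.
slow_seconds = [sec for sec, ms in worst.items() if ms >= 1000.0]

# The overall mean stays low even though some writes are very slow.
mean_ms = sum(ms for _, ms in samples) / len(samples)

print("seconds-of-minute with writes of 1,000 ms or more:", slow_seconds)
print(f"overall mean latency: {mean_ms:.1f} ms")
```

With this made-up data, only the bucket for second 0 stands out, while the overall mean remains a small fraction of the slowest write. A tool that reports only averaged response times could therefore look normal while individual writes stall.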

The culprit

The new array's performance monitoring process created the delay at the top of every minute while it produced a summary of the performance data. Write requests entering the storage port at that moment were held up for 1,000 to 1,500 milliseconds (1 to 1.5 seconds), always at the top of the minute. With this insight from VirtualWisdom4 in hand, the client relayed the information to its storage provider to resolve the problem quickly. The end result was faster, unimpeded access to the data.

Another Infrastructure Investigation closed, thanks to IPM with Virtual Instruments.

Have a performance monitoring case you need to solve? IPM is exactly what you need.