Avoiding service interruptions in a data warehouse

Nicholas Dimotakis, Regional Service Director, PS Management

In an increasingly complex world, the way in which we use and store data has changed significantly. A data warehouse – a large store of data accumulated from a wide range of sources within a datacentre, often used to guide business decisions – is critical to many modern organisations. And, while the cost of flash continues to fall each year, spinning-disk performance has barely improved in decades. This gap has driven many businesses to adopt flash in addition to their spinning-disk storage. But the management of these multi-tiered storage environments can become extremely complex, and it is not uncommon for data warehouse service interruptions and performance issues to arise as a result.

Below are three common causes of data warehouse application performance issues:

Physical layer issues
These usually result in I/O frames becoming lost in transmission. This, in turn, forces a component higher up the stack (the HBA, multipath software, or OS) to wait out a timeout and re-issue the I/O command. To the application, this appears as long wait times, resulting in data bottlenecks and database slow-downs.
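As a rough sketch of why a single lost frame hurts so much, consider this toy Python model of the retry behaviour described above. The 0.5 ms service time, 2-second retry timer, and loss rate are illustrative assumptions, not measurements from any real SAN:

```python
import random

def io_latency(base_ms=0.5, timeout_ms=2000, loss_rate=0.001):
    """Model one I/O: if the frame is lost in transit, a layer higher
    up the stack must wait for a timeout before re-issuing the command."""
    latency = base_ms
    while random.random() < loss_rate:   # frame lost on the wire
        latency += timeout_ms            # wait out the retry timer
    return latency

random.seed(42)
samples = [io_latency() for _ in range(100_000)]
print(f"median latency: {sorted(samples)[50_000]:.1f} ms")
print(f"worst latency:  {max(samples):.1f} ms")
```

The point of the model: even a one-in-a-thousand loss rate leaves the median untouched while pushing the tail of the latency distribution out by three to four orders of magnitude – which is exactly why these issues surface as intermittent slow-downs rather than a uniformly slow database.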

Queue depth issues
If we imagine that every HBA in a server is a highway lane and I/Os are cars, then the more HBAs we introduce, the more lanes we have to access our storage. But on these highways there are no speed limits, nor any other regulations; the result would soon be motorist anarchy, and the highway would become inefficient. Tuning queue depths – the number of pending input/output (I/O) requests allowed per volume – lets the SAN administrator ensure that every I/O (every car on the road) has a smooth journey and therefore a fast response time.

In an Oracle RAC environment, multiple HBAs from multiple servers can make requests to the same storage ports or LUNs and very quickly overload the capabilities of the storage resources. We have seen many times that an improperly tuned HBA causes response times to increase 10x or even 100x.
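A back-of-the-envelope Python sketch makes the effect concrete. It applies Little's law to a single saturated storage port; the 20,000 IOPS capacity and 0.2 ms service time are hypothetical figures chosen for illustration:

```python
# Hypothetical storage port: sustains 20,000 IOPS at best.
PORT_IOPS = 20_000
SERVICE_MS = 0.2   # service time for one I/O with no queueing

def response_time_ms(outstanding_ios):
    """Little's law: once the port is saturated, response time grows
    linearly with the number of I/Os queued against it."""
    if outstanding_ios <= PORT_IOPS * SERVICE_MS / 1000:
        return SERVICE_MS
    return outstanding_ios / PORT_IOPS * 1000

# One well-tuned HBA holding its queue depth at 4, versus many
# untuned HBAs piling 2,000 outstanding I/Os onto the same port:
print(response_time_ms(4))
print(response_time_ms(2000))
```

In this model the tuned case answers in 0.2 ms while the untuned case takes 100 ms – a 500x difference from queueing alone, in line with the 10x–100x degradations described above.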

Buffer-to-buffer credit issues
Buffer-to-buffer credits are Fibre Channel's flow-control mechanism: a port may transmit a frame only while it holds a credit from the port at the other end of the link, and the receiver returns credits as it empties its buffers. When credits are exhausted, transmission stalls. To applications, these stalls appear as seemingly random and unpredictable slow-downs, unrelated to server load, and they too can increase response times by 10x or even 100x.
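The credit mechanism can be sketched in a few lines of Python. This is a toy model of the protocol, not an implementation of any Fibre Channel stack; the `FCLink` class and its method names are our own invention:

```python
from collections import deque

class FCLink:
    """Toy model of Fibre Channel buffer-to-buffer flow control:
    a port may transmit a frame only while it holds a credit; the
    receiver returns one credit (R_RDY) per frame it drains."""
    def __init__(self, credits):
        self.credits = credits
        self.in_flight = deque()

    def send(self, frame):
        if self.credits == 0:
            return False             # transmitter stalls: no credits left
        self.credits -= 1
        self.in_flight.append(frame)
        return True

    def receiver_ack(self):
        if self.in_flight:
            self.in_flight.popleft()
            self.credits += 1        # R_RDY hands the credit back

link = FCLink(credits=2)
print([link.send(f) for f in ("f1", "f2", "f3")])  # third send stalls
link.receiver_ack()                                 # a credit returns
print(link.send("f3"))                              # now it goes through
```

Note that the stall depends only on how quickly the far end returns credits, not on how busy the sending server is – which is why these slow-downs look random from the application's point of view.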

In conclusion, data warehouse applications are, by nature, I/O-intensive and therefore require servers that are able to process large volumes of information. The issues listed above are among the most common to affect performance, and none of them can be resolved by simply introducing a faster disk: for I/Os to flow properly, HBAs need to be well tuned, for example. Only with tools that give managers visibility into the real-time performance, health, and utilisation of the whole I/O path can these pitfalls be averted before they result in service interruptions.