Table of contents for Before You Walk in The Door to Performance Test
- Before You Walk in The Door to Performance Test (Knowing the Costs: Open-Source aka “Free” vs. Commercial): Part Seven of an Ongoing Series by Howard Clark
- Before You Walk in The Door to Performance Test (Knowing the Costs): Part Six of an Ongoing Series by Howard Clark
- Before You Walk in The Door to Performance Test (Developing a Performance Checklist): Part Five of an Ongoing Series by Howard Clark
- Before You Walk in The Door to Performance Test (Assessing Your Teammate’s and/or Client’s Ability to Deliver): Part Four of an Ongoing Series by Howard Clark
- Before You Walk in The Door to Performance Test (Evaluating Test Infrastructure): Part Three of an Ongoing Series by Howard Clark
- Before You Walk in The Door to Performance Test (Defining Performance Requirements): Part Two of an Ongoing Series by Howard Clark
- Before You Walk in The Door to Performance Test: Part One of an Ongoing Series by Howard Clark
Welcome to Mission Impossible, or at least that's how it appears at first glance. Consider an organization bringing you in cold, off the street, to test the performance of their system or systems and grade that performance against a set of criteria. Unfortunately those criteria have not been pre-defined; in fact, the intelligence gathering that needed to take place hasn't even occurred yet. What you eventually discover is that Company ABC takes orders for widgets through three different channels, which receive orders from three different types of users, each of whom uses the system with varying frequency. Company ABC hasn't determined any of this yet, but they would still like to know how many orders their system can handle. So how do you go about figuring this out? Let's look at a few of the vital resources you need to track down.
- Historical observations from previous versions or other similar applications.
- Business analysts involved with the system; in their absence, track down an internal or external power user.
- The web server logs, for page hit frequency and source IPs (see the log-parsing sketch after this list).
- The application server logs for building a usage map based on class invocations.
- The database is an excellent source for finding record counts over time.
- Your own usage pattern of the app can provide insight, especially as a new user going through training material or first-time walkthroughs.
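As a minimal sketch of mining those web server logs, assuming a common/combined-format access log (the file name and regex here are assumptions; adjust them to your server's actual format), the following tallies page hits and distinct source IPs:

```python
import re
from collections import Counter

# Matches the common/combined log format: source IP first, then the
# request line ("GET /path HTTP/1.1") in quotes. Assumed format.
LOG_PATTERN = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)')

hits_per_page = Counter()
hits_per_ip = Counter()

# "access.log" is a placeholder path; point it at your web server's log.
with open("access.log") as log:
    for line in log:
        match = LOG_PATTERN.match(line)
        if not match:
            continue  # skip malformed lines rather than abort the run
        ip, _method, path = match.groups()
        hits_per_page[path] += 1
        hits_per_ip[ip] += 1

print("Top 10 pages by hit frequency:")
for path, count in hits_per_page.most_common(10):
    print(f"{count:8d}  {path}")

print(f"Distinct source IPs: {len(hits_per_ip)}")
```

Bucketing those hits by timestamp as well would give you arrival rates and peak hours, which feed directly into the usage model.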
Now let's suppose that none of this information is available: the system is brand new and no one actually knows what to base the performance criteria on. This is where industry benchmarks can play a role. You move away from the actual application and look at it strictly from a hardware and infrastructure perspective. You can abandon user modeling in favor of load/stress scenarios that explore the boundaries of the hardware's capacity. Industry experts have defined typical distributed e-commerce applications and developed benchmark tests against those mock systems specifically to help ascertain hardware capacity. The organization whose standards I hold in the highest regard is the Standard Performance Evaluation Corporation (SPEC), a non-profit corporation formed to establish, maintain and endorse a standardized set of relevant benchmarks that can be applied to the newest generation of high-performance computers (http://www.spec.org/).

If that proves to be a little too far removed for the comfort of your client, then implement an arbitrary benchmark based on the larger, more CPU-, disk I/O-, and memory-intensive operations of the system under test. Package those operations together to form a customized unit of measure. You might create a product_catalog_browse unit which encompasses the following:
- Search for a specific product family.
- Browse the product image.
- Select a quantity.
- Add to cart.
This forms the unit you create a test script for and report as X product_catalog_browse operations per minute, hour, and so on (a sketch of such a script follows below). Once you've packaged enough operations together, you can test them both individually and in concert with one another. You can pace them arbitrarily, just be consistent, so that you produce a repeatable test with a very low coefficient of variation. That is, the standard deviation of a group of measurements divided by their mean should be very low between test instances: 0.2 or less. Anything higher tells you the test is not repeatable, so make sure you re-initialize the environment between test instances. This is baselining put into practice.

In the end you can give the client a count of how many product_catalog_browse operations, in conjunction with other units you defined such as product_purchase and product_purchase_change, can be completed in an hour. While arbitrary, these measurements begin to paint a picture of what types, and how many, groups of functional operations can be supported. Run these tests to the point of significant degradation in application response and you begin to get an idea of the hardware's capacity.
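As a minimal sketch of such a unit script, assuming a plain HTTP application, here is what a timed product_catalog_browse unit might look like in Python. The base URL, endpoint paths, parameters, and think times are all illustrative assumptions, not anyone's real API:

```python
import time

import requests  # third-party HTTP client: pip install requests

BASE_URL = "http://test-env.example.com"  # hypothetical system under test
THINK_TIME = 2.0  # seconds between steps; arbitrary, but keep it consistent

def product_catalog_browse(session: requests.Session) -> float:
    """Run one product_catalog_browse unit; return its duration in seconds."""
    start = time.perf_counter()

    # 1. Search for a specific product family (hypothetical endpoint).
    session.get(f"{BASE_URL}/search", params={"family": "widgets"}).raise_for_status()
    time.sleep(THINK_TIME)

    # 2. Browse the product image.
    session.get(f"{BASE_URL}/products/widget-42/image").raise_for_status()
    time.sleep(THINK_TIME)

    # 3. Select a quantity (modeled here as a form post).
    session.post(f"{BASE_URL}/products/widget-42", data={"qty": 3}).raise_for_status()
    time.sleep(THINK_TIME)

    # 4. Add to cart.
    session.post(f"{BASE_URL}/cart", data={"item": "widget-42", "qty": 3}).raise_for_status()

    return time.perf_counter() - start
```

Drive many of these in parallel (threads, processes, or your load tool of choice) and count completions per hour to get the throughput figure you report.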
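And as a small sketch of the repeatability check itself: given the per-unit durations collected from a test instance, the coefficient of variation is just the standard deviation divided by the mean, and we want it at or below 0.2 from one instance to the next. The durations below are made-up numbers for illustration:

```python
from statistics import mean, stdev

def coefficient_of_variation(durations: list[float]) -> float:
    """Standard deviation of the measurements divided by their mean."""
    return stdev(durations) / mean(durations)

# Per-unit durations (seconds) from two test instances; illustrative only.
run_one = [11.2, 10.8, 11.5, 10.9, 11.1]
run_two = [11.0, 11.4, 10.7, 11.3, 11.0]

for name, durations in (("run one", run_one), ("run two", run_two)):
    cov = coefficient_of_variation(durations)
    verdict = "repeatable" if cov <= 0.2 else "NOT repeatable"
    print(f"{name}: CoV = {cov:.3f} ({verdict})")
```

If the numbers creep above 0.2 between instances, re-initialize the environment (data, caches, connections) before trusting the results.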
So what we've done, in the absence of performance criteria from the client (which, quite honestly, rarely exist for a new system), is create an acceptable specification, or put an industry-standard specification to use, and explore the system's capacity through rigorous stress/load test scenarios.