Testing is Rocket Science Not Brain Surgery

Almost all of the World's Greatest Accomplishments Were The Result of Great Planning!

A Side Note: Job Posting for Aeronautical Engineer – Have experience as an airplane passenger? You’re hired! by Howard Clark

April 6th, 2007 · Uncategorized

The assumption is that you know where you are going and, to some extent, the terrain, versus making a haphazard venture into unknown programming logic. I’m of the opinion that software testing should be performed by software developers trained in the art and science of testing. Ideally these would be the same developers who wrote the code, cross-testing each other’s work in a peer-review setting. We’ve been led to believe that the bias these developers hold can’t be overcome, and so a neutral party needs to be involved. I’m fine with that, but that neutral party really needs a level of technical acumen around the technologies used. If you look at the profile of a flight engineer for a space shuttle mission, you typically find extensive military and aeronautical experience in addition to engineering. You aren’t going to see some brilliant microchip designer flying on a shuttle mission without certain prerequisites; the required skills and attributes are not mutually exclusive. Neither should the skills and attributes of a software tester and a developer be seen as mutually exclusive.

Before You Walk in The Door to Performance Test (Defining Performance Requirements): Part Two of an Ongoing Series by Howard Clark

April 4th, 2007 · Uncategorized

Welcome to Mission Impossible; at least at first glance it appears that way. Let’s look at the idea of an organization bringing you in cold, off the street, to test the performance of their system or systems and grade that performance against a set of criteria. Those criteria, unfortunately, have not been pre-defined, and in fact the intelligence gathering that needed to take place hasn’t even occurred yet. What you come to discover at some point is that Company ABC takes orders for widgets using three different channels, which receive orders from three different types of users, each of which uses the system with varying frequency. None of this has been determined by Company ABC yet, but they would like to know how many orders their system can handle nonetheless. So how do you go about figuring this out? Let’s look at a few of the vital resources you need to track down.

  • Historical observations from previous versions or other similar applications.
  • Business analysts involved with the system; in their absence, track down an external and/or internal power user.
  • The web server logs for page hit frequency and source IP.
  • The application server logs for building a usage map based on class invocations.
  • The database is an excellent source for finding record counts over time.
  • Your own usage pattern of the app can provide insight, especially as a new user going through training material or first-time walkthroughs.
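As one concrete example of mining these sources, a short script can tally page-hit frequency and source IPs from a web server access log. This is a minimal sketch, not tied to any particular tool; it assumes lines in the common Apache-style log format, and the sample entries are invented for illustration:

```python
import re
from collections import Counter

# Minimal Common Log Format parser: leading client IP, then the
# request line in quotes, e.g. "GET /catalog HTTP/1.1".
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST) (\S+)')

def tally(log_lines):
    """Count hits per page and per source IP from access-log lines."""
    pages, ips = Counter(), Counter()
    for line in log_lines:
        m = LINE_RE.match(line)
        if m:
            ip, path = m.groups()
            ips[ip] += 1
            pages[path] += 1
    return pages, ips

# Hypothetical sample entries standing in for a real access log.
sample = [
    '10.0.0.1 - - [04/Apr/2007:10:00:00 -0500] "GET /catalog HTTP/1.1" 200 512',
    '10.0.0.2 - - [04/Apr/2007:10:00:01 -0500] "GET /catalog HTTP/1.1" 200 512',
    '10.0.0.1 - - [04/Apr/2007:10:00:02 -0500] "POST /cart HTTP/1.1" 200 64',
]
pages, ips = tally(sample)
print(pages.most_common(1))  # the most frequently hit page
```

Even a crude tally like this turns raw logs into the usage distribution you need for modeling load.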

Now let’s propose that none of this information is available: the system is brand new, and no one actually knows what to base the performance criteria on. This is where industry benchmarks can play a role. You move away from the actual application and look at it strictly from a hardware and infrastructure perspective. You can abandon user modeling in favor of creating load/stress scenarios that explore the boundaries of the hardware’s capacity. Industry experts have defined typical distributed e-commerce applications and developed benchmark tests against those mock systems to help ascertain hardware capacity specifically. The organization whose standards I hold in the highest regard is the Standard Performance Evaluation Corporation (SPEC), a non-profit corporation formed to establish, maintain, and endorse a standardized set of relevant benchmarks that can be applied to the newest generation of high-performance computers (http://www.spec.org/). If this proves a little too far removed for the comfort of your client, then look to implement an arbitrary benchmark based on the larger, more CPU-, disk-I/O-, and memory-intensive operations of the system under test. Package those operations together to form a customized unit of measure. You might create a product_catalog_browse unit which encompasses the following:

  1. Search for a specific family of product.
  2. Browse the product image.
  3. Select a quantity.
  4. Add to cart.

This forms the unit that you would create a test script for and report as X product_catalog_browse operations per minute, hour, etc. Once you’ve packaged enough operations together, you can begin to test them both individually and in concert with each other. You can pace them out arbitrarily, just be consistent, so that you get a repeatable test with a very low coefficient of variation. What this means is that the standard deviation of a group of measurements divided by the average, or mean, should be very low between test instances: less than or equal to 0.2, as anything higher tells us that our test is not repeatable. So make sure you re-initialize the environment between test instances. This is basically putting base-lining into practice. In the end you can give the client a count of how many product_catalog_browse operations, in conjunction with other units you defined such as product_purchase and product_purchase_change, can be completed in an hour. While arbitrary, these measurements begin to paint a picture of what types and how many groups of functional operations can be supported. Perform these tests to the point of significant application response degradation, and we begin to get an idea of the hardware’s capacity.
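The repeatability check is easy to compute directly. A minimal sketch, with invented ops-per-hour results from three instances of the same test against a re-initialized environment:

```python
from statistics import mean, pstdev

def coefficient_of_variation(measurements):
    """CV = standard deviation / mean; a low CV means a repeatable test."""
    return pstdev(measurements) / mean(measurements)

# Hypothetical product_catalog_browse throughput (units/hour) from
# three instances of the same test run.
runs = [1180.0, 1240.0, 1205.0]
cv = coefficient_of_variation(runs)
assert cv <= 0.2, "test is not repeatable"
print(f"CV = {cv:.3f}")
```

A CV of about 0.02 between instances, as here, comfortably clears the 0.2 threshold; wildly different runs would push it past the limit and tell you the test, or the environment re-initialization, needs work.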

So what we’ve done in the absence of performance criteria from the client, which quite honestly rarely exist for a new system, is create an acceptable specification, or put into use an industry-standard specification, and explore the system’s capacity through rigorous stress/load test scenarios.
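To make the unit-of-measure idea concrete, here is one way a product_catalog_browse unit might be scripted with a fixed pacing and reported as units per hour. This is a sketch only: the four step functions are hypothetical stand-ins for whatever protocol-level calls your load tool actually makes, with sleeps simulating response time:

```python
import time

# Hypothetical stand-ins for real driver calls against the system under test.
def search_product_family(): time.sleep(0.01)
def browse_product_image(): time.sleep(0.01)
def select_quantity(): time.sleep(0.01)
def add_to_cart(): time.sleep(0.01)

def product_catalog_browse():
    """One unit of measure: the four packaged operations, in order."""
    search_product_family()
    browse_product_image()
    select_quantity()
    add_to_cart()

def run_units(duration_s, pacing_s):
    """Run units with a fixed, consistent pacing; return units/hour."""
    completed = 0
    start = time.monotonic()
    while time.monotonic() - start < duration_s:
        product_catalog_browse()
        completed += 1
        time.sleep(pacing_s)  # arbitrary pacing, but keep it consistent
    elapsed = time.monotonic() - start
    return completed * 3600.0 / elapsed

rate = run_units(duration_s=1.0, pacing_s=0.05)
print(f"{rate:.0f} product_catalog_browse operations/hour")
```

The pacing value itself is arbitrary; what matters, as above, is that it stays the same between instances so the resulting units/hour figure is comparable from run to run.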

Before You Walk in The Door to Performance Test: Part One of an Ongoing Series by Howard Clark

April 2nd, 2007 · Uncategorized

We’re here to benefit the testing community overall, so it’s prudent to help out on a few of the real issues you’ll face on the job. The goal of this series is to help you settle the hard questions whose answers you may have taken for granted when you signed up for this mission. If you only take one thing away from this blog, let it be this: DO NOT TAKE ANYTHING FOR GRANTED!

There, you’ve been warned, with capital letters to boot. I could almost rest on that statement alone and end this series, but that would be a disservice given the enormity of the error in not addressing the specifics. So over the next few weeks we will explore the big ones, the issues that leave you thinking to yourself, “Why did they even post this requirement, and what do I do now that I’m here on-site?”

Addendum:

Thanks to Ben Simo’s contributions at Quality Frog, the idea of “Defensive Pessimism” was brought to my attention. This archetype is actually a good place to start for the performance tester, but as with all things it should be used in moderation.

Testers, Developers and Mine Fields by Howard Clark

March 30th, 2007 · Uncategorized

By definition, when a person steps on a land mine and activates it, the mine’s main charge explodes and releases a blast wave consisting of hot gases (the by-products of the explosion). This blast wave sends a huge compressive force upwards, bringing the mine casing and bits of the soil covering the mine along with it.

Apparently this is information one needs to know before releasing the findings of a performance test analysis. The mine is the “bug,” or component responsible for the performance degradation, and man oh man does it have explosive potential. A performance bug can speak to an architectural misstep, which can have awesome consequences affecting everyone from the neighborhood developer all the way to the C-level sponsors. The realization that mines are sort of unconventional, that they lie in wait, ambushing the unsuspecting victim, should be cause for concern. This has a sort of malevolence about it, and really doesn’t have a place even in war, where combatants typically engage each other openly.

But when developers are going full throttle and testing is disengaged, waiting for code to be released, that open dialog doesn’t happen. Compound that with the pressure of meeting deadlines and the increased potential to take shortcuts, and we begin to set the stage.

So often, even after development has ended and testing has been through multiple cycles, the mines remain; people forget where they put them, and they become a long-lasting problem. That’s exactly what a performance bug is nine times out of ten: a forgotten land mine lying in wait. But what you, the performance tester, should do is begin your mine sweeping in advance and post a warning. You don’t blindly walk out into the minefield (a good way to lose a foot); you probe slowly and carefully place markers at suspected sites. You take proactive measures and gather intelligence early, letting the players know they need to watch their step. Then you can successfully go in, dispose of issues, and resolve the potential bugs in a controlled manner. Yes, you’ll still find something that you didn’t mark, and that’s to be expected. But what makes it easier for everyone is the big sign that says DANGER, watch your step. Not in an effort to be an alarmist (Chicken Little never convinced anyone of anything), but rather in an effort to be informative and insightful.

And we’re off!

March 26th, 2007 · Uncategorized

If testing is so easy, then why do we humans seem to fail at it at alarming rates? It probably has a lot to do with one’s initial viewpoint of it as an activity associated with software development, versus it being an activity with its own path. Maybe we need to coin the phrase “Software Testing Life Cycle.” This bizarro world I inhabit makes development a product of testing, where we test, develop, and iterate versus the traditional “code-fix” conundrum. How we do this is a matter of debate, as testing moves into the developer unit-testing world and lives as a parallel activity, beginning with requirements as the first object to be tested.
