Prussian statesman Otto von Bismarck once said, “Laws are like sausages, it is better not to see them being made.” I think a good case could be made that the same is true for ASL scenario playtesting and development. Most of the time, most ASL players don’t give much thought to how the scenario in front of them was created, nor is there any reason for them to. The exception is when they encounter a “dog” (i.e., unbalanced) scenario, or a scenario that clearly has some sort of rules or interpretation problem. Then one hears the irritated lament, “Was this thing even playtested?”
The answer may be no. Or the answer may be that it was not playtested enough, or not enough in the right way, or by the right players, or simply that the correct lessons were not learned from the playtesting that occurred. Sometimes a scenario turns out problematic or unbalanced because the publishers were lazy, sloppy, or incompetent. But often it turns out that way through no fault of their own. The bottom line, though, is that while extensive playtesting cannot guarantee that a scenario will be balanced or fun, the lack of such playtesting certainly raises the risk of a bad outcome.
What I’d like to do in this essay is discuss playtesting and development in ASL, some of the difficulties involved, and, perhaps more importantly, its past and present state. By way of introduction, I should say that I have had the opportunity to do scenario packs with three different publishers (MMP, Heat of Battle, and Schwerpunkt), to playtest for a number of official and unofficial products, and to witness many other playtests; I have also been close to a number of scenario designers and publishers and heard many an “inside” story (and gossip). Through all of this, I have seen a lot of sausage made, and it isn’t pretty. I will try to give this “insider’s” perspective without violating confidences.
For those new to this issue, I should say a word about, first, why playtesting is important and, second, how playtesting is done. Playtesting is the term used to refer to the pre-publication playing of a wargame or wargame scenario in order to eliminate rules problems and ahistorical or unrealistic outcomes, and to make sure that the end product is balanced; that is, that both sides have a reasonable chance to win. I should note that many wargame publishers do not care about this last point; they have no problem producing games with unbalanced scenarios or campaigns, because they belong to a camp that believes in “historical” victory conditions (i.e., achieving the historical goals of one side) rather than “competitive” victory conditions (i.e., victory conditions crafted so that both sides have a roughly equal chance at game victory, even if, due to a preponderance of advantages, one side would always overpower the other). In ASL specifically, however, balance is considered one of the greatest attributes of a scenario and is highly valued. One reason for this is that many wargames are primarily played solitaire, while ASL is primarily played competitively, whether face to face or online.
Development is the term used to refer to the process by which wargame and scenario designs are honed and improved. Usually this means interpreting and applying the results of playtests in order to work out the kinks of a scenario. In some wargaming companies the designer is also the chief developer, while in others the developer is a separate person. In the ASL world, the roles of designer and developer are usually combined, though not always (Avalon Hill, and later MMP, have typically relied on outside designers and internal developers).
There are five main types of playtesting, all with uses and weaknesses. These are:
- Solitaire playtesting. A designer playtests his own work, playing both sides. This is typically useful for sussing out set-up issues and errors, rules problems, and gross imbalances, often before sending a scenario out for playtesting with others. Beyond that, its value is limited, because a designer tends to be so familiar with the scenario that he is often blind to its problems. For example, because he wrote a rule, he knows how it should be interpreted, but may not realize how other people are likely to interpret it in actuality. Unfortunately, in the history of ASL, there have been some products, typically from smaller third party publishers or general wargaming magazines, in which solitaire playtesting seems to have been the primary type of playtesting involved.
- Designer-involved playtesting. This type of playtesting involves the designer playing his scenario with someone else. This is a very important early-stage type of playtesting. With the designer involved, he can quickly see problems emerging when his scenario is put to the test of play with an actual opponent (and may even be able to create solutions “on the fly”). The other person involved may try strategies the designer did not anticipate, or find other ways to push or even “break” the scenario. In these respects, this type of playtesting is very useful. However, its usefulness tends to decrease over time. The designer will quickly become familiar, even too familiar, with the intricacies of the scenario, whereas most of the time each opponent will be new to it. What does it tell you, in terms of balance, if someone very familiar with a scenario beats someone playing it for the first time? It’s hard to know how to interpret that. Sometimes a designer can play the same scenario with the same person several times, which is definitely better and produces results similar to internal playtesting (see below).
- Internal playtesting. Internal playtesting is when a relatively small group of people playtest the scenarios against each other. Many third party ASL publishers are oriented primarily around this method; Schwerpunkt and the East Side Gamers are two examples. This method offers many advantages and relatively few drawbacks. The main advantage is that experienced players of known strengths can repeatedly play the same scenarios, as both sides; done adequately, this is very likely to uncover even small problems. The chief drawback that does sometimes emerge stems from the fact that small groups of people who repeatedly play each other often tend to develop similar playing styles over time (plus, their menstrual cycles synchronize). But sometimes a scenario that tends to play out one way under a particular play style may actually play differently under another. This is why it is often useful to supplement internal playtesting with at least some blind playtesting.
- Supervised blind playtesting. This sort of playtesting is very valuable, though opportunities for it do not come up that often. With this sort of playtesting, two players not part of an internal group playtest a scenario, and the playing is witnessed by a designer, developer or other person already experienced with the scenario. With this, one gets all the benefits of true blind playtesting (see below), but avoids some of the weaknesses.
- True blind playtesting. In contrast to supervised blind playtesting, true blind playtesting involves two people playtesting a scenario away from the vigilant eyes of the designer/developer, then reporting the results to him. This is very useful because it provides completely fresh eyes on a scenario. However, it definitely adds issues of interpretation, because the designer/developer is completely reliant on the playtesters being able to adequately communicate the import of their play. If playtesters interpreted a rule incorrectly, or were of greatly unequal skill, the designer may never know. This is why blind playtesting is an important corrective, but cannot typically be the primary method. The old Avalon Hill, though, often had enough resources (in terms of people willing to volunteer for them) that some of their games had different volunteer playtest coordinators: outside people who took on the role of surrogate developers, gathering and supervising blind playtests. Because the coordinators ended up being very familiar with the games themselves, they could help designers/developers interpret the playtests conducted under their supervision.
One thing that all this should suggest is that not all playtests (or playtesters) are equal. If one playtester is a very experienced ASLer and his opponent a novice, the results of that playtest may not be very valuable to the designer (unless the novice wins!). If two novices play each other, how valuable will their results be? It is hard to know. There are a number of ASL scenarios that give one set of results when played by experienced players (who can better understand the nuances of the scenario) but another set of results when played by inexperienced players (typically because the scenario requires a knowledge of either offensive or defensive tactics that the novice will not have). And often it is the relatively inexperienced players, all fired up with a new passion, who may be most likely to volunteer to playtest.
So the magic question becomes how many playtests are the right number for a scenario. Alas, there is no magical answer. Some types of scenarios inherently require more playtests than others. For example, “timing” scenarios, in which different forces enter the mapboard (or must move across the mapboard) at different times from different places, are often notoriously difficult to playtest, because if the timing is not just right, one side or the other will easily be able to accomplish its goals. Moreover, some scenarios just start off being better designed (in terms of balance) than others. These scenarios happily require fewer playtests. To give a personal example, my favorite scenario from Action Pack #3: Few Returned (a pack I both designed and developed) is AP23, Agony at Arnautovo. The scenario involves an attack and counterattack (so both sides get to attack and defend during the scenario), which is not the easiest sort of scenario to design, so I anticipated a grueling playtest process. However, to my delight, I discovered that the scenario seemed to work well right from the very first playtest. After that first playing, I made some slight adjustments, and from that point on, it seemed very fun and balanced, with no more changes required. As every playing worked out well, it did not take very many playings before I could confidently declare it “finished.” Happily, its post-publication history bore out the playtest results, and the scenario has a good reputation. In contrast, scenario AP22 (Ghost Riders) took a huge number of playtests to get the balance right, and ended up being a lot more work (though all that work did produce a good scenario).
As a rule of thumb, the less certain one is about one’s playtesters (for example, if the bulk of playtesting is designer-involved playtesting or truly blind playtesting), the more playtests are required. There is no magic number of playtests after which a scenario can be deemed awesome. Generally speaking, the more one playtests, the greater a chance that the scenario will be a good one. However, there are no guarantees, and after a while, the law of diminishing returns begins to kick in. Unfortunately, the law doesn’t inform you when it has kicked in.
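As an aside that is purely my own illustration (nothing any designer or publisher mentioned here actually uses), a little arithmetic shows why a small handful of playings is such weak evidence of balance, and why additional playtests keep helping even as the returns diminish. The sketch below treats each playing of a genuinely 50/50 scenario as an independent coin flip and asks how often a lopsided record shows up purely by luck.

```python
# A minimal sketch (my own illustration, not from the essay): model each playing
# of a truly balanced scenario as a fair coin flip and compute how often one
# side piles up a lopsided record by chance alone.
from math import comb

def prob_at_least(wins: int, games: int, p: float = 0.5) -> float:
    """Probability that one side wins at least `wins` of `games` playings,
    given a per-playing win chance of `p` for that side."""
    return sum(comb(games, k) * p**k * (1 - p)**(games - k)
               for k in range(wins, games + 1))

# A perfectly balanced scenario still goes 4-1 or worse roughly 19% of the time,
# while a 15-5 record over twenty playings happens by luck only about 2% of the time.
print(f"4+ wins in  5 playings: {prob_at_least(4, 5):.2f}")    # ~0.19
print(f"8+ wins in 10 playings: {prob_at_least(8, 10):.2f}")   # ~0.05
print(f"15+ wins in 20 playings: {prob_at_least(15, 20):.2f}") # ~0.02
```

The point is not that designers should run statistics on their playtest files; it is simply that a 3-1 or 4-1 split from a few playings can easily be noise, which is one more reason why the quality of the playtests and the interpretation of the reports matter at least as much as the raw count.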
Several years ago, I had the pleasure of interviewing John Hill, the wargame designer who created the original Squad Leader. He told me that in designing the scenarios for Squad Leader, he took extra care to playtest the first scenario, Guards Counterattack, a very high number of times, because it would be the first scenario players would experience and he wanted it to be just right. His great efforts did not go unrewarded; the scenario ended up being both fun and well-balanced.
When ASL first came out, Avalon Hill followed that tradition. The scenarios that appeared in Beyond Valor were extensively playtested. The proof is in the pudding. The ROAR site, which allows players to record the results of scenario playings, shows that almost all of the original 10 Beyond Valor scenarios are well balanced, and some of them are almost exquisitely balanced. Subsequent early Avalon Hill ASL products did not match this great height, but nonetheless tended to have a high proportion of well balanced scenarios.
The same is true for some third party ASL products. One of the most praised third party products in ASL history is the Windy City Wargamers 1996 tournament scenario pack. The Windy City Wargamers are a Chicago-based gaming group that developed an extensive ASL following (most famous for an annual tournament, the ASL Open, that is still being run today). The pack had its genesis in late 1994 with just two scenarios, which grew in number over time until it became a full pack. Original playtesting of the scenarios was done primarily by two people, who playtested them extensively, a couple of them as many as 12 times. In August 1995, Louie Tokarz, whose baby the project was, began sending them to other club members for more playtesting. Thus the pack had the advantage of being playtested by a fairly large number of very experienced players. This continued for six more months, until January 1996. By this time all the scenarios had received extensive playtesting. The pack was finally finished in February 1996. The resulting scenarios were extremely well received. Looking at it today, 13 years later, only one of the ten scenarios seems unbalanced, while the rest range from balanced to extremely balanced. Moreover, they are fun. Fully four of the scenarios in the pack are considered classics today: Eye of the Tiger, Stand and Die, Abandon Ship and Will to Fight…Eradicated.
Most ASL products do not get that amount of tender loving care, however. It is sad to say, but some publishers have published ASL scenarios that have not been playtested at all. One prominent third party publisher developed something of a reputation in the late 1990s for occasionally doing this. I cannot independently confirm those allegations. However, over the past five years, two different scenario designers confided in me that they submitted unplaytested scenario designs to third party publishers who published the scenarios as submitted (i.e., without playtesting). One of these cases involved a third party publisher that has had a fairly high reputation. After the scenario was published, players quickly realized that it had a problem which rendered it more or less unplayable, and a revised version had to be released. Obviously, even a modicum of playtesting would have revealed that problem. Some early third party scenarios, like some of the ones published in On All Fronts, an early ASL newsletter, clearly had no playtesting before publication.
In most cases, though, playtesting is not missing, but merely insufficient. Sometimes this is because of overstretching or a lack of commitment, or perhaps even “burnout” on a project before it is truly finished. In some cases, it may be an overconfident person publishing his own scenarios. Many designers find the design process far more enjoyable than the more thankless playtesting and development process; it is perhaps not surprising that some might neglect the latter somewhat. I was recently informed, regarding one third party publisher who has been publishing a fair number of scenarios with balance issues, that the problem was that their internal playtesters were not experienced and that they were reluctant to accept the comments of their few outside playtesters. Is this true? I have not yet been able to confirm it.
Rather than engage in innuendo, or be forced to keep sources anonymous, I will relate a personal experience of a less than optimal playtest/development process. My first scenario design project was a scenario pack centered on the U.S. 37th Infantry Division in the PTO (on New Georgia and Bougainville). I was still a relatively new player, having only played ASL for a few years at that point, though extensively. I designed 10 scenarios and began to get them playtested, thinking at some point I would submit them somewhere.
As it turned out, I ended up doing the bulk of the development work on the project. I was able to get a pretty decent number of playtests for the scenarios, but the bulk of them were either designer-involved playtests or truly blind playtests. Note from my descriptions above that both of these methods have problems. In too many of the playtests, I was personally involved; I naturally became over-familiar with many of the scenarios and would win no matter which side I played, which made it difficult to understand the balance issues involved. Some playtesters were novices, while others sent inadequate reports. Far too often, a playtester or set of playtesters played a scenario only once, never becoming truly familiar with it.
Making sense of all this was difficult for me, an inexperienced designer. Although I did not realize it at the time, I tended to fall into some development traps. One common trap that developers can fall into is the tendency to explain away results. For example, a player sends a designer a playtest result from a playing between him and a friend. In this playtest, the Japanese player won. While one might think the designer would at least wonder whether the scenario is unbalanced in favor of the Japanese, especially if previous playtests showed similar results, sometimes a designer (who, it will be recalled, is often overfamiliar with his scenarios) will read the report and rationalize: “Well, the American player didn’t do X and Y; no wonder the Japanese player won.” It is actually quite easy to unconsciously slip into this mindset.
During all of this, I had reached an agreement with third party publisher Heat of Battle to publish the scenarios as a scenario pack; this eventually became Buckeyes! While making these arrangements, I had the self-awareness to realize that I was an inexperienced designer, developer and playtester, and that though all the scenarios had been playtested to some degree, these playtests were likely to be insufficient. I told Heat of Battle that the scenarios all needed to go through a thorough batch of playtesting by Heat of Battle. They began this process, playtesting several of the scenarios several times, but never completed it. I still don’t know why; I presume a combination of a lack of resources and lack of desire. Instead, the decision was made to go ahead and publish it. I went along with this; inexperienced as I was, I didn’t know any better.
The resulting product did not end up being a Beyond Valor or WCW pack, that’s for sure. While the actual errata were very minor, some of the scenarios ended up less balanced than they would have been with more, and more rigorous, playtesting. The first scenario in the pack, Welcome to the Jungle, designed as a good introductory PTO scenario, ended up strongly unbalanced in favor of the Japanese (I should note that Heat of Battle later let me add a bit of balancing errata to help correct this). After this was brought to my attention (in the not very shy way that ASL players have), I arranged some post-publication, designer-observed blind playtesting that confirmed the scenario was not balanced. I subsequently went back through all the playtest reports and e-mails related to this scenario, to see if I could trace what went wrong, and I discovered that I had clearly fallen victim to designer rationalization. If I had spent less time trying to explain away the playtest results and more time paying attention to them, I might have been better off. Of course, this was my first effort as a designer, and that was my first scenario.
As is the case with many third party products, not enough playings have been recorded in ROAR for me to know how many of the Buckeyes! scenarios might be unbalanced. One of them, Repple-Depples No More, certainly seems like it might be a candidate. Others I am more confident about, or have gotten good word of mouth reports on. Is Buckeyes! a bad pack? No, it’s not. It has some good scenarios in it. But it is only an average pack. There is every possibility that it could have been better if I had been a more experienced designer/developer and if Heat of Battle had given it the full playtesting that I had hoped they would. I think we jointly share the blame that Buckeyes! in all likelihood did not reach its potential.
So, you need a lot of playtesting and, moreover, you need a lot of that playtesting to be effective playtesting? And you need to be able to interpret those results correctly? That’s a lot of work. You bet.
I should note that there are others who have differing opinions on this issue. Glenn Houseman, an experienced scenario designer who is one of the main people behind the East Side Gamers scenario packs, thinks that many designers and publishers overstate the amount of playtesting their scenarios get (which may be true in some cases) and that scenarios typically don’t need a ton of playtesting.
In May 2006, Glenn (a friend of mine) made the following post (somewhat edited by me for length) on an ASL forum, explaining his views (he has reiterated this opinion at times, so I believe it is one he still holds):
“A good scenario designer creates a balanced scenario, which then only needs to be playtested a few times to polish the details. There seems to be a whole ignorant cult that’s developed in the last couple of years that the more a scenario is playtested, the better and more “balanced” it will be. It’s a load of crap. A lousy design doesn’t magically get better by playtesting it over and over. It just wastes playtesters’ time. I’m very disappointed that many of the designers I’ve met feel the need to inflate the number of playtests their scenario had before it was published. They seem ashamed to admit to anything under a dozen. They may be impressing people who don’t know anything about scenario design, but they make me laugh. If your design needs to be playtested two dozen times, you suck. A scenario’s balance and playability are decided at creation, NOT as the result of playtesting it over and over. The DESIGNER is responsible for how balanced and fun it is, not the amount of playtesting. If a designer creates an inherently unbalanced, boring scenario, two dozen playtests will not change this…it will only make it more bland. My most popular and balanced design (22-17 on ROAR) was only officially playtested three times. Most take four or five. Others have taken nine or ten. If it goes beyond that…I dump it…I simply can’t believe that I’m some sort of genius who can do in five playtests what other designers require in twenty…Please explain to me how this myth of a zillion playtests equals a good, balanced scenario came to be accepted as gospel; and why perfectly good, talented designers now feel the need to lie about how many times they playtest.”
This is clearly an area where Glenn and I disagree, at least for the most part. I think it is mostly true that extensive playtests cannot necessarily save a badly conceived or designed scenario. I will note that sometimes they can (playtesters can often come up with great suggestions), and also that sometimes designers don’t realize their design is poor (and perhaps should be abandoned or reconceived), and it is only the playtesters who can make them aware of this. Subtract those playtests and the designer may march forward in ignorance.
But I think more importantly, extensive (quality) playtesting makes it more likely that a scenario will be good, or that an inherently good scenario can become excellent. Extensive playtesting, though never a guarantee of success, offers at least the possibility that minor problems can be discovered and excised, that game concepts can be thoroughly tested, and that balance can be honed. Though Glenn, a talented designer, has designed and published a number of scenarios I have thoroughly enjoyed, I have to suspect that even more of his scenarios would have been even better had he arranged for more extensive (and more blind) playtesting. We will have to be satisfied with what we have.
I do think he is right that some people give the impression that their products are more playtested than they actually are. For people concerned about this issue, some potential warning signs to look for include: 1) few or no playtesters mentioned or credited in the product; 2) short period between first announcement that a product is in development and its appearance on the market (keep in mind that it typically takes at least a year to produce a pack of 10 well playtested scenarios, and adjust upwards accordingly for larger products); 3) publishers releasing a number of products in a short period of time; and 4) large amounts of errata or post-publication fixes, other than those related to production problems.
Unfortunately (and, in case people were wondering if I would continue this essay forever, I will let them know that we are at the final section), I am somewhat pessimistic about the state of playtesting today and its future. The main reason for this is that the pool of quality playtesters is getting mighty overworked these days. The good news is that we are experiencing a wealth of third party products (partially due to a bit of product constipation at MMP). Just think of the active third party publishers out there: Critical Hit, Heat of Battle, Bounding Fire Productions, Le Franc Tireur, Schwerpunkt, East Side Gamers, Friendly Fire, and several smaller ones.
Some of these publishers have their own internal groups, people who rarely do any ASL other than playtesting their own products. Others reach out to the ASL world, soliciting help in playtesting. I know a number of ASLers actively playtesting for at least three publishers. Yet only a certain percentage of ASLers are interested in playtesting at all; of these, not all are actually good playtesters; and of the good playtesters, many have limited time.
I can’t escape the feeling that there is more demand for good playtesting right now than there are good playtesters with sufficient time and energy. Even some publishers with internal playtest groups, like Schwerpunkt, are cutting back on the amount of playtesting they do for their products. Some have justified this by noting that they are far more experienced playtesters and developers today than a decade ago; therefore, it is easier for them to spot potential problems or issues today, whereas earlier it would have required more playtests to find the same problems. I think this explanation has a certain amount of validity to it, but as Stalin might have said, sometimes quantity has a quality all its own.
Though it would be difficult for me to prove objectively, I think we are suffering the consequences today of this comparatively poor playtester-to-product ratio. Equivalents to the WCW pack seem to be relatively rare these days. Some might suggest that some of the Friendly Fire packs are comparable, and this may be true (they are of high quality overall), but it will take more time to be sure. I am not privy to the amount or nature of playtesting that goes into the Friendly Fire packs, so I can’t comment on them in that regard.
It may well be that some of us will have to adjust our expectations for scenario packs, if we have not already. For a pack of 10 scenarios, perhaps a reasonable expectation would be that three of its scenarios turn out to be very well balanced, three more turn out to be reasonably balanced, and the rest range from somewhat unbalanced to very unbalanced. That is how a number of ASL products are turning out these days. Is this satisfactory to you? Unsatisfactory? I honestly am not sure myself.
Sometimes the back room of the butcher shop isn’t that pretty! But I still want my bacon…
Radicus says
Hehe, sometimes it pays to watch sausages being made; Herr Schmidt in Philly in the late 1800s got the idea for mass producing condoms from working in a sausage factory, created an empire and made a fortune!
Nice article Mark. Thanks.
scrub says
A great read. Lots to think about. I have done a bit of playtesting (and reviewing of games in a previous life).
Something changes when you have something you do for fun start to become “business” and “serious”. It is, as you say, a pretty thankless task and I applaud those able to keep it up so the rest of us can enjoy the fruit of their efforts.
BWP says
Nice article, and while I don’t actively disagree with anything you say here, I am more inclined to subscribe to Glenn’s POV. I don’t think there’s really that much of a science to playtesting, and I don’t think extensive extra testing will turn an already-good scenario into an even better one. I think that (in the vast majority of cases) the ones that are really good are born that way, not made … and ditto the really bad ones, which does at least prove that *some* playtesting is always a necessity!