# StatXact: Statistical Software for Exact Nonparametric Inference

*A full-featured statistical program with many sophisticated algorithms to ensure accuracy*

Figure 1: Main Screen

StatXact is software for conducting exact statistical tests, confidence intervals and power/sample size calculations. For the uninitiated, these are tests that calculate exact p-values by enumerating the possible data tables and summing the probabilities of those at least as extreme as the one observed. These are to be contrasted with p-values based upon asymptotic theory, which assumes large samples and a normal distribution. As large sample size and normal distribution requirements are not always a good match for real-world data, the researcher has some need for these types of calculations. StatXact is just one part of Cytel Studio, which serves as the Windows base for this program and its sister software, LogXact, and provides the more basic statistical and plotting functions. Version 6 of this software is designed for Windows NT/2000/XP and requires at least a 1 GHz processor with 512 MB to 1 GB of RAM (depending upon the size of the data sets and calculations). At least 5 GB of available hard disk space is recommended, but presumably this could be reduced for only modest-sized datasets.
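To make the exact-versus-asymptotic contrast concrete, here is a minimal sketch in plain Python for a single 2x2 table. It is not StatXact's algorithm, which this review does not describe. It uses naive enumeration of the tables that share the observed margins, with one common two-sided convention (sum every table no more probable than the observed one), and compares the result with the large-sample Pearson chi-square p-value. The function names and the example table are illustrative assumptions.

```python
from math import comb, erfc, sqrt


def fisher_exact_two_sided(a, b, c, d):
    """Exact two-sided p-value for the 2x2 table [[a, b], [c, d]].

    Enumerates every table with the same row and column totals and sums
    the hypergeometric probabilities of those no more likely than the
    observed table (one common two-sided convention, assumed here).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


def chi_square_asymptotic(a, b, c, d):
    """Large-sample Pearson chi-square p-value (1 df, no continuity correction)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return erfc(sqrt(stat / 2))


# Hypothetical small table: 8 subjects, 4 per group.
exact_p = fisher_exact_two_sided(3, 1, 1, 3)
asymptotic_p = chi_square_asymptotic(3, 1, 1, 3)
print(f"exact p = {exact_p:.4f}, asymptotic p = {asymptotic_p:.4f}")
```

With only eight observations the two p-values differ substantially, which is the small-sample situation where exact methods matter most.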

On startup, the main screen appears (Figure 1). This consists of the menu and tool bars (which will be familiar to Windows users), the navigator area and the case data spreadsheet. As analyses or plots are generated, they can be immediately accessed through either the navigator icons or the tabs that appear above the spreadsheet. The spreadsheet is not terribly Excel-like, which is a problem: the analyst will not be able to manipulate the data as readily as in Excel, or slice and dice columns and rows as with some other statistical programs. There is the capability to do simple cut-and-paste operations as well as to sort, filter and transform data. StatXact allows a wide variety of file imports (Excel, Lotus, ASCII, SAS, Stata, Statistica, SPSS, Systat, BMDP and dBASE), but not so many exports (ASCII, SAS, Stata and SPSS). In addition, it is a simple matter to copy and paste data (minus the headers) from programs such as Excel and JMP into StatXact, or vice versa.

Now, the standard commentary concerning the manuals…

Figure 2: Box Plots

For those who hate to continuously stare at a computer screen and who have trouble finding everything from the online help or the help menu, the developers supply three very well-written and complete manuals. The Cytel Studio Base manual will start the novice off with minimal pain, and the two StatXact manuals will fill in much of the theoretical detail for more advanced users. The indices are actually useful, the pagination continuous across both manuals, and both volumes have a full subject index.

Graphics capabilities are limited, but this is not a graphics program. Standard analytic plots such as the histogram, box, line, scatter, and stem-and-leaf are available, as well as several others (Figures 2 and 3). Basic modifications are possible through the 'Modify' button, which appears with each graphic.

As for the statistics, there is a complete menu of non-parametric choices (check their online brochure), of which the major categories are listed in Table 1. Please note, the software also will do the good old-fashioned descriptive statistics, as well as parametric tests and Monte Carlo simulations. The menus are logically arranged and the tests are easy to access. However, once the user is set to do an analysis, the little nuances of the program become a bit too obvious.

Figure 3: Scatter Plot

Upon selecting a Wilcoxon test, a dialog box appears requesting selection of such variables as 'Population,' 'Response,' 'Stratum' and 'Frequency.' For those neither intimately involved in statistical number crunching nor schooled in the language of epidemiology, this is a tough choice. What the program is actually asking for is stacked data with a numeric grouping column. The data go into the Response box, and the grouping variable into the Population box. And it only took me 15 minutes to figure it out! It is explained somewhere in the manuals, but I couldn't find it in the online help, where I found no examples with actual numbers. Luckily, the tests requiring row and column data are more transparent. The output of these tests is nicely tabulated, so it is easy to quickly pick out the relevant features (Figure 4). Unfortunately, on tests where many results are required, the table is too lengthy to be seen without scrolling, and it was not immediately apparent whether or not the tables could be reformatted.
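For readers puzzled by the same dialog, the sketch below shows what "stacked data with a numeric grouping column" looks like, using made-up numbers, and computes an exact two-sided Wilcoxon rank-sum p-value by brute-force enumeration. The column roles mirror the 'Population' and 'Response' boxes described above. The data, function names and enumeration approach are illustrative assumptions, not StatXact's implementation.

```python
from itertools import combinations

# Stacked layout: one row per observation.
# First field = grouping code ("Population"), second = measurement ("Response").
rows = [
    (1, 12.1), (1, 14.3), (1, 11.8), (1, 15.0),
    (2, 16.2), (2, 17.5), (2, 15.9), (2, 18.1),
]


def midranks(values):
    """Rank the values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def exact_rank_sum_p(rows, group=1):
    """Return (rank sum of `group`, exact two-sided p-value)."""
    ranks = midranks([response for _, response in rows])
    n1 = sum(1 for g, _ in rows if g == group)
    w_obs = sum(rk for (g, _), rk in zip(rows, ranks) if g == group)
    w_mean = n1 * (len(rows) + 1) / 2  # expected rank sum under the null
    dev_obs = abs(w_obs - w_mean)
    hits = total = 0
    # Every way of assigning n1 of the ranks to the first group.
    for combo in combinations(ranks, n1):
        total += 1
        if abs(sum(combo) - w_mean) >= dev_obs - 1e-12:
            hits += 1
    return w_obs, hits / total


w, p = exact_rank_sum_p(rows)
print(f"rank sum = {w}, exact two-sided p = {p:.4f}")
```

Here the two groups do not overlap at all, so only 2 of the 70 possible assignments are as extreme as the observed one.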

Another limitation is computational time. When the asymptotic tests are run, most calculations with even large datasets are carried out near instantaneously, and the software offers an elapsed time for each calculation. When the exact tests are run, I saw run times that exceeded those of the Monte Carlo simulations. These runs take only 5 to 7 seconds for modest data sizes, but will slow considerably with larger sets.
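A rough illustration of why exact calculations can slow down: with naive enumeration, the number of ways to split n observations into two equal groups grows explosively, whereas a Monte Carlo approach samples a fixed number of random splits. The sketch below is a generic permutation test on a difference in means with hypothetical data. It is an assumption-laden illustration and does not reflect how StatXact performs either calculation.

```python
import random
from math import comb


def monte_carlo_perm_p(x, y, n_resamples=10_000, seed=0):
    """Monte Carlo estimate of a two-sided permutation p-value
    for the absolute difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    pooled = list(x) + list(y)
    n1 = len(x)
    observed = abs(sum(x) / n1 - sum(y) / len(y))
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        a, b = pooled[:n1], pooled[n1:]
        if abs(sum(a) / n1 - sum(b) / len(b)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)


# Number of equal-sized splits a naive exact test would have to visit.
for n in (8, 20, 40):
    print(f"n = {n:2d}: {comb(n, n // 2):,} possible splits")

# Hypothetical data: the sampling cost is fixed by n_resamples, not by n.
x = [12.1, 14.3, 11.8, 15.0]
y = [16.2, 17.5, 15.9, 18.1]
print(f"Monte Carlo p is approximately {monte_carlo_perm_p(x, y):.3f}")
```

The Monte Carlo estimate carries sampling error, so it trades exactness for a predictable run time.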

And now, for the big pluses…

Many researchers notice little difference between asymptotic and exact p-values when comparing results. This is especially true when considering only the decisions that will be made based on the cutoff. While running through several dozen test sets, I noticed many differences between the two, but never one that would reverse a decision.

Figure 4: Output Format

However, upon "adjusting" some data to approach the cutoff alphas, it was apparent that the exact tests were suggesting sound conclusions, while the asymptotic values argued for more data.

Other pleasant discoveries were the speed of calculation and completeness of the tables produced for such commonly used multivariate procedures as discriminant analysis and principal components analysis. In both cases, however, it would have been helpful if the options included 2-D and 3-D graphics of the group separations, something usually done for a preliminary assessment of separation quality prior to looking at the numbers.

For the clinical crowd, the DataEditor menu has some very nice sorting and filtering properties, and other data manipulation features will greatly assist researchers attempting to subset the data.

Table 1: Selected Statistical Routines

Although this is definitely not a package for the rank beginner, it offers a variety of excellent tests (heavy on the non-parametric) that, while seemingly geared toward clinical and epidemiological data, will be useful in a wide variety of situations. Thus, many biological and physical scientists will find this of use and, as the routines are available through SAS, many statisticians as well. While pricey by my standards, it should be remembered that this is a full-featured statistical program with many sophisticated algorithms to ensure accuracy. I found the help desk to be courteous and mercifully fast, and the staff highly knowledgeable about the statistical routines. Full listings of features as well as upgrades may be easily found at the Web site.

*Note: While this review was awaiting the light of publication, Cytel introduced version 7, which includes a number of significant upgrades: correlation tests across a single (as opposed to bi-) variable, speedier batch language, p-values for the trend test in multiple outcomes, estimates of conditional maximum likelihood of trend parameters in correlated data, clustered response tests for correlated significance, and improved visualization and automation functions.*

##### Availability

• Commercial: $1,400

**Cytel Software**

675 Massachusetts Ave.

Cambridge, MA 02139-3309

617-661-2011; Fax: 617-661-4405

info@cytel.com; www.cytel.com

*John Wass is a statistician with GPRD Pharmacogenetics, Abbott Laboratories. He may be contacted at sceditor@scimag.com*