StatCalc

Requirement: StatCalc
Section: 3.2.6
JIRA Task: EIR-56
Reviewed For: Conventional spacing between sections
Date: 2016-08-31

Introduction

The StatCalc component of Epi Info™ 7 enables the user to evaluate the performance of different study designs and statistical tests by supplying high-level information on the properties of hypothetical data sets and the criteria used for evaluation. StatCalc tools can be divided into three broad categories: 1) sample size and power calculations for population surveys, cohort or cross-sectional studies, and unmatched case-control studies, together with chi-square for trend by the Mantel extension of the Mantel-Haenszel summary odds ratio and chi-square (a test for the presence of a trend in dose-response or other case-control studies where a series of increasing or decreasing exposures is being studied); 2) analysis of 2×2 tables to produce odds ratios and risk ratios (relative risks) with confidence limits, Fisher exact tests, and 1- and 2-tailed p-values, with Mantel-Haenszel summary odds ratios, chi-square tests, and associated p-values for stratified data; 3) distribution-based event probabilities, 2-tailed p-values, and confidence intervals for deviations from binomial (proportions) and Poisson (rare events) distributions given the number of observed and expected events. The StatCalc tools can be accessed as an independent module from the main menu or as part of the Visual Dashboard.


Accessing StatCalc

Overview

StatCalc appears on the Epi Info™ 7 main menu, middle row, right column, and is captioned "Statistical calculators for sample size, power, and more."  Clicking on StatCalc in the main menu opens the StatCalc menu, which is similar to the main menu in appearance and contains buttons for the eight (8) StatCalc calculators listed in StatCalc Calculator Properties, below. The individual calculators are also part of Visual Dashboard (VD).  In VD, the calculators can be accessed from the Options menu, under the "Add StatCalc calculator" submenu, and added to the dashboard Canvas in the same manner as other gadgets.  As StatCalc calculators do not access data records and process only data entered manually by the user, calculators can be added to a canvas without having to first attach a data source.  However, calculators placed on the dashboard cannot be saved with the canvas, nor can their output be exported or printed with the other analysis gadgets.

Functional Requirements

  1. Epi Info™ 7 shall enable the user to open the StatCalc menu from the Epi Info™ main menu.
  2. The StatCalc menu shall enable the user to start the eight (8) StatCalc calculators from an array of buttons bearing the names of the calculators.
  3. When the user selects a calculator from the StatCalc menu, the selected calculator shall open a window containing the components specified in StatCalc Calculator Properties, below.
  4. The VD shall enable the user to select StatCalc calculators from the Options menu.
  5. The VD shall enable the user to place the selected calculators on the Canvas.
  6. Calculators placed on the dashboard Canvas shall have all the relevant properties of other gadgets including the ability to:
    1. be repositioned on the Canvas,
    2. anchor to other gadgets,
    3. stack vertically with other gadgets in response to the user pressing the Vertical Arrange button on the VD status bar,
    4. hide borders in response to the user pressing the Hide Gadget Borders button on the VD status bar[1],
    5. collapse and expand the calculator window in response to the user pressing the Arrow toggle button on the gadget's title bar[2], and
    6. close and be removed from the Canvas in response to the user pressing the Close button (red box with a white × inside).

Notes:

  1. The behavior of this function in StatCalc calculators differs from that of other gadgets. In most cases, the Hide Gadget Borders button not only removes the outer rectangle, but the title bar and associated buttons as well. As a consequence, the gadgets can no longer be repositioned, reconfigured, collapsed, expanded, or closed. The only function the Hide Gadget Borders button provides to a StatCalc calculator is to remove its border.
  2. These functions have not been implemented as of Epi Info™ version 7.2.0.1. See also Future Development.


StatCalc Calculators

Overview

The StatCalc calculators can be divided into three general categories: 1) sample size and power calculations, 2)  two-way table calculations, and 3) distribution-based event probabilities. Each category has distinct input parameters and analyses, but they all provide the ability to enter hypothetical data and evaluate their statistical significance.  In doing so, the user may determine important study parameters, such as the number of subjects required to test a hypothesis given specific assumptions about variables such as rates of exposure or the ratios of cases to controls.
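
As an illustration of how a study parameter such as the required number of subjects follows from the inputs, the sketch below computes a sample size for a population survey. It assumes the standard formula for estimating a proportion with a finite population correction, n = deff × N·p·q / ((d² / z²)(N − 1) + p·q); whether StatCalc uses exactly this formula, and how it rounds the result, is not stated in this document. The function name `survey_sample_size` and its parameter names are hypothetical.

```python
from statistics import NormalDist


def survey_sample_size(population, expected_pct, margin_pct,
                       confidence_pct=95.0, design_effect=1.0, clusters=1):
    """Sample size for estimating a proportion (assumed formula, see lead-in).

    population     -- population size N
    expected_pct   -- expected frequency of the outcome, in percent
    margin_pct     -- acceptable margin of error, in percent
    confidence_pct -- two-sided confidence level, in percent
    design_effect  -- 1.0 for simple random sampling
    clusters       -- number of clusters (1 for simple random sampling)
    """
    p = expected_pct / 100.0
    q = 1.0 - p
    d = margin_pct / 100.0
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence_pct / 200.0)
    n = design_effect * population * p * q / (
        (d * d) / (z * z) * (population - 1) + p * q
    )
    # Unrounded values; the rounding rule used by StatCalc is not assumed here.
    return {"total_sample": n, "cluster_size": n / clusters}


result = survey_sample_size(population=999_999, expected_pct=50, margin_pct=5)
print(result["total_sample"])
```

With a population of 999,999, an expected frequency of 50%, and a 5% margin of error, this sketch gives a total sample of approximately 384 at the 95% confidence level.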

Functional Requirements

The requirements for StatCalc are best described in a tabular format, with a row for each calculator and a column for each feature. The features are: calculator name, title (used when the calculator is open as a window or gadget), description, input parameters, and derived values.


StatCalc Calculator Properties

Sample Size & Power
Calculator: Population Survey
Title: Sample Size and Power
Description: Population survey or descriptive study (For simple random sampling, leave design effect and clusters equal to 1.)
Input Parameters:
  1. Population size
  2. Expected frequency
  3. Acceptable margin of error
  4. Design effect
  5. Clusters
Derived Values:
  1. For confidence levels of:
    1. 80%
    2. 90%
    3. 95%
    4. 97%
    5. 99%
    6. 99.9%
    7. 99.99%
  2. Cluster size
  3. Total Sample

Calculator: Cohort or Cross-Sectional
Title: Sample Size and Power
Description: Unmatched cohort and cross-sectional studies (exposed and unexposed)
Input Parameters:
  1. Two-sided confidence level:
    1. 80%
    2. 90%
    3. 95%
    4. 97%
    5. 99%
    6. 99.9%
    7. 99.99%
  2. Power (%)
  3. Ratio (Unexposed: Exposed)
  4. Outcome in unexposed group (%)
  5. Risk ratio
  6. Odds ratio
  7. Outcome in exposed group (%)
Derived Values:
A table consisting of:
  1. Rows:
    1. Cases
    2. Controls
    3. Totals
  2. Columns:
    1. Kelsey
    2. Fleiss
    3. Fleiss w/CC Models

Calculator: Unmatched Case-Control
Title: Sample Size and Power
Description: Unmatched case-control study (Comparison of ILL and NOT ILL)
Input Parameters:
  1. Two-sided confidence level:
    1. 80%
    2. 90%
    3. 95%
    4. 97%
    5. 99%
    6. 99.9%
    7. 99.99%
  2. Power (%)
  3. Ratio of controls to cases
  4. Controls exposed (%)
  5. Odds ratio
  6. Cases with exposure (%)[1]
Derived Values:
A table consisting of:
  1. Rows:
    1. Cases
    2. Controls
    3. Totals
  2. Columns:
    1. Kelsey
    2. Fleiss
    3. Fleiss w/CC Models

Calculator: Chi-Square for Trend
Title: Chi-Square for Trend
Description: Analysis for Linear Trends in Proportions
Input Parameters:
  1. Series of records containing:
    1. exposure score
    2. N cases
    3. N controls
Derived Values:
  1. odds ratio (for each record)
  2. Chi square for linear trend[2]
  3. p-value

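
The derived values of the Chi-Square for Trend calculator can be sketched as follows. The sketch assumes the Mantel extension statistic with one degree of freedom, χ² = (N − 1)(N·T1 − A·T2)² / (A(N − A)(N·T3 − T2²)), where T1 = Σ xᵢaᵢ, T2 = Σ xᵢnᵢ, T3 = Σ xᵢ²nᵢ, aᵢ is the number of cases and nᵢ the number of subjects at exposure score xᵢ, A is the total number of cases, and N is the total number of subjects. It also assumes that each record's odds ratio is taken relative to the first exposure level. Neither assumption is confirmed by this document, and the function names are hypothetical.

```python
import math


def chi_square_for_trend(scores, cases, controls):
    """Mantel extension chi-square for linear trend (1 degree of freedom)."""
    totals = [a + b for a, b in zip(cases, controls)]
    n = sum(totals)            # all subjects
    n_cases = sum(cases)       # all cases
    t1 = sum(x * a for x, a in zip(scores, cases))
    t2 = sum(x * m for x, m in zip(scores, totals))
    t3 = sum(x * x * m for x, m in zip(scores, totals))
    numerator = (n - 1) * (n * t1 - n_cases * t2) ** 2
    denominator = n_cases * (n - n_cases) * (n * t3 - t2 ** 2)
    chi2 = numerator / denominator
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def odds_ratios_vs_first_level(cases, controls):
    """Odds ratio of each exposure level relative to the first level (assumed)."""
    a0, b0 = cases[0], controls[0]
    return [(a * b0) / (b * a0) for a, b in zip(cases, controls)]


chi2, p = chi_square_for_trend([0, 1], [10, 20], [20, 10])
print(chi2, p, odds_ratios_vs_first_level([10, 20], [20, 10]))
```

With only two exposure levels scored 0 and 1, the statistic reduces to the Mantel-Haenszel chi-square for the corresponding 2×2 table, which gives a simple way to check the sketch.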
2×2 Table Calculations
Calculator: Tables (2×2×N) (Stratified Two-way Tables)
Title: 2×2 Tables
Input Parameters:
By stratum, 1 - 9:
  1. Subject count: Exposure (plus), Outcome (plus)
  2. Subject count: Exposure (plus), Outcome (minus)
  3. Subject count: Exposure (minus), Outcome (plus)
  4. Subject count: Exposure (minus), Outcome (minus)
Derived Values (By Stratum):
  1. Odds-based Parameters
    1. Odds Ratio:
      1. Estimate
      2. Lower
      3. Upper
    2. Maximum Likelihood Estimate Odds Ratio (Mid-P):
      1. Estimate
      2. Lower
      3. Upper
    3. Fisher's Exact Test:
      1. Lower
      2. Upper
  2. Risk-based Parameters
    1. Risk ratio:
      1. Estimate
      2. Lower
      3. Upper
    2. Risk difference:
      1. Estimate
      2. Lower
      3. Upper
  3. Statistical Tests
    1. Uncorrected:
      1. chi-square
      2. 2-tailed p-value
    2. Mantel-Haenszel:
      1. chi-square
      2. 2-tailed p-value
    3. Mid-P exact test:
      1. 1-tailed p-value
    4. Fisher's exact test:
      1. 1-tailed p-value
      2. 2-tailed p-value
Derived Values (Summary Results):
  1. Odds ratio
    1. Crude (Cross Product):
      1. Estimate
      2. Lower
      3. Upper
    2. Crude (MLE):
      1. Estimate
      2. Lower
      3. Upper
    3. Fisher's Exact test:
      1. Lower
      2. Upper
    4. Adjusted (MH):
      1. Estimate
      2. Lower
      3. Upper
    5. Adjusted (MLE):
      1. Estimate
      2. Lower
      3. Upper
  2. Risk ratio
    1. Crude:
      1. Estimate
      2. Lower
      3. Upper
    2. Adjusted:
      1. Estimate
      2. Lower
      3. Upper
  3. Chi-square
    1. Uncorrected (MH):
      1. chi-square
      2. 1-tailed p-value
      3. 2-tailed p-value
    2. Corrected (MH):
      1. chi-square
      2. 1-tailed p-value
      3. 2-tailed p-value

Calculator: Matched-Pair Case Control
Title: Pair-Matched Case-Control Study
Input Parameters:
  1. Cases: Exposure (plus)
  2. Cases: Exposure (minus)
  3. Controls: Exposure (plus)
  4. Controls: Exposure (minus)
Derived Values:
  1. Odds-based parameters
    1. Odds Ratio:
      1. Estimate
      2. Lower
      3. Upper
    2. Exact:
      1. Lower
      2. Upper
  2. Statistical Tests
    1. McNemar:
      1. chi-square
      2. 2-tailed p-value
    2. Corrected:
      1. chi-square
      2. 2-tailed p-value
    3. Fisher's exact test:
      1. 1-tailed p-value
      2. 2-tailed p-value

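
The point estimates and chi-square statistics listed for the two-way table calculators can be sketched with the textbook formulas below. The sketch covers a single stratum only and omits confidence limits, exact tests, and stratum-adjusted summaries. The function names are hypothetical, and the mapping of the Matched-Pair calculator's four input cells onto the two discordant-pair counts used by `mcnemar` is an assumption.

```python
def two_by_two(a, b, c, d):
    """Point estimates and chi-square statistics for one 2x2 table.

    a -- Exposure (plus),  Outcome (plus)
    b -- Exposure (plus),  Outcome (minus)
    c -- Exposure (minus), Outcome (plus)
    d -- Exposure (minus), Outcome (minus)
    """
    n = a + b + c + d
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    cross = (a * d - b * c) ** 2
    return {
        "odds_ratio": (a * d) / (b * c),              # cross product
        "risk_ratio": risk_exposed / risk_unexposed,
        "risk_difference": risk_exposed - risk_unexposed,
        "chi2_uncorrected": n * cross / margins,
        "chi2_mantel_haenszel": (n - 1) * cross / margins,
    }


def mcnemar(case_only_exposed, control_only_exposed):
    """Matched-pair odds ratio and McNemar statistics from discordant pairs."""
    b, c = case_only_exposed, control_only_exposed
    return {
        "odds_ratio": b / c,
        "chi2": (b - c) ** 2 / (b + c),
        "chi2_corrected": (abs(b - c) - 1) ** 2 / (b + c),
    }


print(two_by_two(10, 20, 20, 10))
print(mcnemar(15, 5))
```

Both functions raise `ZeroDivisionError` when a required cell or margin is zero; how StatCalc handles such tables is not covered here.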
Distribution-Based Event Probabilities
Calculator: Binomial
Title: Binomial
Description: Binomial - Proportion vs. Standard
Input Parameters:
  1. Numerator (cases)
  2. Total observations
  3. Expected percentage
Derived Values:
  1. Probability that the number of cases is <, <=, =, >=, > numerator
  2. Two-tailed p-value
  3. 95% Confidence interval

Calculator: Poisson
Title: Poisson
Description: Rare Event vs. Standard
Input Parameters:
  1. Observed number of events
  2. Expected number of events
Derived Values:
  1. Probability that the number of cases is <, <=, =, >=, > the observed number
  2. 2-tailed p-value
  3. 95% confidence interval
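
The five event probabilities listed for the Binomial and Poisson calculators follow directly from the respective probability mass functions, as the sketch below shows. The function names are hypothetical. The two-tailed p-value and the 95% confidence interval are left out because this document does not specify which method StatCalc uses for them.

```python
import math


def tail_probabilities(pmf, k):
    """Probabilities that a count is <, <=, =, >=, > k for a given pmf."""
    below = sum(pmf(i) for i in range(k))
    equal = pmf(k)
    return {
        "<": below,
        "<=": below + equal,
        "=": equal,
        ">=": 1.0 - below,
        ">": 1.0 - below - equal,
    }


def binomial_probabilities(numerator, total, expected_pct):
    """Binomial calculator inputs: numerator, total observations, expected %."""
    p = expected_pct / 100.0

    def pmf(i):
        return math.comb(total, i) * p ** i * (1.0 - p) ** (total - i)

    return tail_probabilities(pmf, numerator)


def poisson_probabilities(observed, expected):
    """Poisson calculator inputs: observed and expected number of events."""

    def pmf(i):
        return math.exp(-expected) * expected ** i / math.factorial(i)

    return tail_probabilities(pmf, observed)


print(binomial_probabilities(3, 10, 50))
print(poisson_probabilities(2, 1.0))
```

For example, with 3 cases in 10 observations and an expected percentage of 50%, the probability of exactly 3 cases is 120/1024, or about 0.117.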

Notes:

  1. Not all input parameters are independent.
  2. Extended Mantel-Haenszel


Future Development

Overview

The StatCalc tools are well implemented from the mathematical and performance points of view.  The primary problem involves the labeling of fields and the limited documentation of specific tools.  Generally speaking, the calculators are designed with a very specific application in mind, while the underlying calculations may be applied to a much broader set of problems.  This can be addressed in a number of ways.  Making the labels more general may be helpful for some users.  However, a better approach might be to add a pull-down menu to tools such as "Cohort or Cross-Sectional" that offers a number of possible scenarios.  In this case, the options may include "Unmatched Cohort Study" and "Cross-Sectional Study".  Choosing an option would change the labeling of the input (and some output) fields to use terminology traditionally associated with that study type.

The options may also reflect the methodological context of the study.  In an infectious disease study, one examines the ratio of exposed to unexposed individuals and the corresponding numbers who are ill or unaffected.  In contrast, a genetic study examines individuals that do or do not possess a particular allele of a gene or marker; the outcome is expressed in terms of phenotypes.  The goal is to express the statistical variables in a manner that is familiar to the researcher using a particular study design in a particular discipline.  Doing so will help ensure that the user enters the proper information in the correct fields and can accurately interpret the results.

The remaining issues concern consistency in how the tools are named and referred to in different parts of Epi Info™.  The label on the button in the StatCalc menu or VD Options menu does not always match the labeling of the resulting window or gadget (when run in stand-alone mode or as part of Visual Dashboard, respectively).  There is also inconsistent hyphenation of compound adjectives (e.g., "case-control").  Finally, the StatCalc gadgets in VD do not follow the pattern for the control widgets (located in the upper-right corner) used in other VD gadgets (they cannot be collapsed, for example).