Scientific intelligence platform for AI-powered data management and workflow automation

Sample Size & Power Calculations

Calculate for a Variety of Frequentist and Bayesian Designs

Adaptive Design

Design and Analyze a Wide Range of Adaptive Designs

Milestone Prediction

Predict Interim Analysis Timing or Study Length

Randomization Lists

Generate and Save Lists for your Trial Design

Group Sequential and Promising Zone Designs

Calculate Boundaries & Find Sample Size. Evaluate Interim Data & Re-estimate Sample Size

Sample Size for Bayesian Statistics

Probability of Success (Assurance), Credible Intervals, Bayes Factors and more

Early Stage and Complex Designs

Sample size & operating characteristics for Phase I, II & Seamless Designs (MAMS)

nQuery 9.3

An upgraded group sequential design table format, improvements to our nQuery Predict tool, and 22 other new sample size tables in areas such as survival, correlation, pilot studies, and Bayesian equivalence.

25 new sample size tables have been added

TIER

New Pro Tables

TIER

New Plus Tables

TIER

New Base Tables

3 new sample size tables have been added to the Pro tier of nQuery 9.3.

**Group Sequential Design Overhaul**

**What is it?**

Group sequential design is the most common adaptive design used in confirmatory clinical trials. It allows trialists to stop a trial early at pre-specified interim analyses if there is sufficient evidence that the treatment is effective (efficacy) or ineffective (futility). The capacity of group sequential designs to stop trials early can lead to significant cost savings while also getting vital treatments into the hands of patients faster. Methods such as the Lan-DeMets error spending function give trialists significant flexibility to define the conditions under which a trial will stop early, while retaining flexibility in the number and timing of interim analyses during trial monitoring.

nQuery 9.3 is the first stage of our group sequential overhaul. This overhaul includes a number of additional group sequential methods, substantial improvements in user experience and additional detailed outputs to better explore different group sequential scenarios. Future updates will expand to additional endpoints, group sequential methods and user features.

**Overhauled Tables added:**

- **Group Sequential Design for Two Means**
- **Group Sequential Design for Two Proportions**
- **Information-Based Group Sequential Design**

**Group Sequential Methods added:**

- **11 Spending Functions (O’Brien-Fleming, Pocock, Power Family, Hwang-Shih-DeCani, Exponential, Beta, t-distribution, Logistic, Normal, Cauchy, User Defined/Interpolated)**
- **Wang-Tsiatis & Pampallona-Tsiatis Designs**
- **Haybittle-Peto (p-value) Design**
- **Unified Family Design**
- **Custom Boundary Design with custom Z statistic, p-value, Score Statistic or Effect Size boundary inputs**
- **2-sided Futility Boundaries**
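Several of the spending functions named above have simple closed forms. The following is a minimal Python sketch of four of them (the Lan-DeMets O'Brien-Fleming-type and Pocock-type functions, the power family and Hwang-Shih-DeCani), written from their standard textbook definitions. The function names and the parameters `rho` and `gamma` are illustrative choices, not nQuery's interface, and nQuery's own parameterization may differ.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for Phi and its inverse


def obrien_fleming_spend(t: float, alpha: float) -> float:
    """Lan-DeMets O'Brien-Fleming-type spending: 2 * (1 - Phi(z_{alpha/2} / sqrt(t)))."""
    if t <= 0:
        return 0.0
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return 2 * (1 - _STD_NORMAL.cdf(z / math.sqrt(t)))


def pocock_spend(t: float, alpha: float) -> float:
    """Lan-DeMets Pocock-type spending: alpha * ln(1 + (e - 1) * t)."""
    return alpha * math.log(1 + (math.e - 1) * t)


def power_family_spend(t: float, alpha: float, rho: float) -> float:
    """Power family spending: alpha * t**rho (rho > 0)."""
    return alpha * t ** rho


def hwang_shih_decani_spend(t: float, alpha: float, gamma: float) -> float:
    """Hwang-Shih-DeCani spending; gamma = 0 reduces to linear spending."""
    if gamma == 0:
        return alpha * t
    return alpha * (1 - math.exp(-gamma * t)) / (1 - math.exp(-gamma))


# Cumulative alpha spent at four equally spaced information fractions.
alpha = 0.025
for t in (0.25, 0.5, 0.75, 1.0):
    print(
        f"t={t:.2f}  "
        f"OBF={obrien_fleming_spend(t, alpha):.5f}  "
        f"Pocock={pocock_spend(t, alpha):.5f}"
    )
```

Every function spends exactly `alpha` at information fraction 1. The O'Brien-Fleming-type function spends very little early, whereas the Pocock-type function spends alpha more evenly across looks.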

**Additional User Features:**

- **New user-responsive interface**
- **Boundary Parameterization Conversion**
- **Detailed & Exportable Group Sequential Reports**
- **Improved Editable Boundary Plots**
- **Error Spending Plots**

To access these tables, you must have an **nQuery Pro Tier** subscription. **If you do, nQuery should automatically prompt you to update.**

You can manually update nQuery Advanced by clicking **Help>Check for updates.**

4 new sample size tables have been added to the Plus tier of nQuery 9.3.

- **Sample Size for Pilot Studies (3 New Tables)**
- **Bayesian Equivalence (1 New Table)**

**What is it?**

Pilot studies are small studies undertaken before a planned trial to assess the feasibility and expected operating characteristics of the main trial. For example, pilot studies will often be used to derive estimates of study parameters required for sample size determination.

Several rule-of-thumb proposals have been put forward for the size of pilot study required to get reasonable parameter estimates. However, research has shown that these are often inappropriate for many real-world scenarios. In response, methods were developed that find the appropriate sample size for a pilot study while taking account of the expected uncertainty in the pilot study estimate.

In nQuery 9.3, two common proposals are implemented for calculating the required sample size for a pilot study before a main trial that will use the two-sample t-test: one based on the non-central t-statistic and one based on the upper confidence limit for the estimate. In addition, tables for problem detection in a pilot study and for the upper confidence limit directly are included.

**Tables added:**

- **Sample Size for Pilot Study for Two Sample t-test Trial (Non-Central t-distribution, Upper Confidence Limit)**
- **Error Detection in Pilot Study**
- **Sample Size for Upper Confidence Limit of Standard Deviation from Pilot Study**
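To make the problem-detection idea concrete, here is a minimal Python sketch of one commonly used formulation: if a problem occurs independently in each participant with probability `problem_prob`, the chance of seeing it at least once among n participants is 1 - (1 - problem_prob)^n, and the pilot size is the smallest n for which this reaches the desired confidence. This is an assumption about the general approach, not a description of the exact method in the nQuery table, and the function names are hypothetical.

```python
import math


def detection_probability(n: int, problem_prob: float) -> float:
    """Probability of observing a problem at least once among n participants."""
    return 1 - (1 - problem_prob) ** n


def pilot_size_for_problem_detection(problem_prob: float, confidence: float) -> int:
    """Smallest n with detection probability >= confidence (independence assumed)."""
    if not 0 < problem_prob < 1 or not 0 < confidence < 1:
        raise ValueError("problem_prob and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - problem_prob))


# A problem affecting 5% of participants, to be seen at least once with 95% confidence.
n = pilot_size_for_problem_detection(problem_prob=0.05, confidence=0.95)
print(n, round(detection_probability(n, 0.05), 4))
```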

**What is it?**

Equivalence testing, where a researcher wants to establish if two treatments give equivalent results, is a common objective in areas such as generics development and medical devices. At present, most equivalence testing is conducted using frequentist methods such as the two one-sided tests (TOST) or checking if a confidence interval falls within the lower and upper equivalence limits.

Bayesian alternatives have been proposed for the testing of equivalence. These include the “Region of Practical Equivalence” (ROPE) and the use of Bayes factors. The ROPE method is analogous to the confidence interval approach, except that it uses a highest density interval (HDI) credible interval, and it offers improved interpretability at both the testing and estimation stages.

nQuery 9.3 implements a table that provides the required sample size for a two-arm study assessing equivalence using the Bayesian ROPE methodology.

**Table added:**

**Bayesian Equivalence using Region of Practical Equivalence (ROPE)**
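The ROPE decision rule described above can be sketched in a few lines of Python. The sketch assumes the posterior for the treatment difference is approximately normal, in which case the HDI coincides with the equal-tailed interval. The function name, the three-way decision labels and the example numbers are illustrative assumptions, not nQuery's implementation.

```python
from statistics import NormalDist


def rope_decision(post_mean, post_sd, rope_low, rope_high, mass=0.95):
    """Compare the HDI of a normal posterior with the ROPE [rope_low, rope_high]."""
    z = NormalDist().inv_cdf(0.5 + mass / 2)
    hdi_low = post_mean - z * post_sd
    hdi_high = post_mean + z * post_sd
    if rope_low <= hdi_low and hdi_high <= rope_high:
        decision = "equivalent"        # HDI lies entirely inside the ROPE
    elif hdi_high < rope_low or hdi_low > rope_high:
        decision = "not equivalent"    # HDI lies entirely outside the ROPE
    else:
        decision = "undecided"         # HDI overlaps a ROPE limit
    return hdi_low, hdi_high, decision


# Hypothetical posterior: mean difference 0.05, SD 0.10, ROPE of +/- 0.5.
print(rope_decision(0.05, 0.10, -0.5, 0.5))
```

A sample size calculation for this design then asks how large the study must be for the "equivalent" outcome to occur with the desired probability when the treatments truly are equivalent.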

To access these tables, you must have an **nQuery Pro or Plus Tier** subscription. **If you do, nQuery should automatically prompt you to update.**

You can manually update nQuery Advanced by clicking **Help>Check for updates.**

18 new sample size tables have been added to the Base tier of nQuery 9.3.

- **Proportions (6 New Tables)**
- **Survival (3 New Tables)**
- **Correlation/Agreement/Diagnostics/Variances (9 New Tables)**
- **Randomization List Improvements**

**What is it?**

Proportions (i.e. categorical data) are a common type of data in which the endpoint of interest is most often a dichotomous (binary) variable. Examples in clinical trials include the proportion of patients who experience a response such as tumour regression. A wide variety of methods have been proposed for binary proportions, ranging from exact methods to maximum likelihood to normal approximations.

In nQuery 9.3, sample size tables are added in the following areas for the design of trials involving proportions:

**Tables added:**

- **Logistic Regression**
- **Confidence Interval for Proportions**

**What is it?**

Logistic regression is the most widely used regression model for the analysis of binary endpoints such as response rate. This model can flexibly model the effect of multiple types of treatment configurations while accounting for the effect of other variables as covariates.

In nQuery 9.3, sample size determination is added for several additional logistic regression scenarios including covariate-adjusted analyses and additive interaction effects.

**Tables added:**

- **Covariate Adjusted Analysis for Binary Variable**
- **Covariate Adjusted Analysis for Normal Variable**
- **Additive Interaction for Cohort Design**
- **Additive Interaction for Case-Control Design**

**What is it?**

Confidence intervals are the most widely used statistical interval in clinical research. Statistical intervals allow for assessment of the degree of uncertainty in a statistical estimate. For proportions, many different approaches have been proposed for the construction of confidence intervals depending on the study design and desired operating characteristics.

In nQuery 9.3, sample size determination for the width of a confidence interval is added for a binary endpoint in a stratified and cluster randomized stratified design.

**Tables added:**

- **Confidence Interval for Stratified Binary Endpoint**
- **Confidence Interval for Cluster Randomized Stratified Binary Endpoint**

**What is it?**

Survival or Time-to-Event trials are trials in which the endpoint of interest is the time until a particular event occurs, for example death or tumour regression. Survival analysis is often encountered in areas such as oncology or cardiology.

In nQuery 9.3, sample size tables are added in the following areas for the design of trials involving survival analysis:

**Tables added:**

- **Maximum Combination (MaxCombo) Tests**
- **Linear-Rank Tests for Piecewise Survival Data**
- **Paired Survival Data**

**What is it?**

Combination Tests represent a unified approach to sample size determination for the unweighted and weighted log-rank tests under Proportional Hazard (PH) and Non-Proportional Hazard (NPH) patterns.

The log-rank test is one of the most widely used tests for the comparison of survival curves. However, a number of alternative linear-rank tests are available. The most common reason to use an alternative test is that the performance of the log-rank test depends on the proportional hazards assumption and may suffer significant power loss if the treatment effect (hazard ratio) is not constant. While the standard log-rank test assigns equal importance to each event, weighted log-rank tests apply a prespecified weight function to each event. However, there are many types of non-proportional hazards (delayed treatment effect, diminishing effect, crossing survival curves) so choosing the most appropriate weighted log-rank test can be difficult if the treatment effect profile is unknown at the design stage.

The maximum combination test can be used to compare multiple test statistics and select the most appropriate linear-rank test based on the data, while controlling the Type I error by adjusting for the multiplicity arising from the correlated test statistics. In this release, one new table is being added in the area of maximum combination tests.
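The weighted log-rank tests discussed above differ mainly in the weight given to each event time. As an illustration, the Fleming-Harrington G(rho, gamma) family uses the weight S(t-)^rho * (1 - S(t-))^gamma, where S(t-) is the pooled Kaplan-Meier estimate just before the event time. The Python sketch below computes these weights from at-risk and event counts. The function name and the example counts are hypothetical, and this is not nQuery's implementation.

```python
def fleming_harrington_weights(n_at_risk, n_events, rho, gamma):
    """Fleming-Harrington G(rho, gamma) weights at each distinct event time.

    n_at_risk[i] and n_events[i] are the pooled (both arms) counts at the
    i-th ordered event time.
    """
    weights = []
    surv_before = 1.0  # pooled Kaplan-Meier estimate just before the current event time
    for at_risk, events in zip(n_at_risk, n_events):
        weights.append(surv_before ** rho * (1.0 - surv_before) ** gamma)
        surv_before *= 1.0 - events / at_risk  # Kaplan-Meier update after this event time
    return weights


# Hypothetical pooled counts at four ordered event times.
at_risk = [10, 8, 6, 4]
events = [1, 2, 1, 1]
print(fleming_harrington_weights(at_risk, events, 0, 0))  # equal weights: standard log-rank
print(fleming_harrington_weights(at_risk, events, 0, 1))  # emphasizes late events
print(fleming_harrington_weights(at_risk, events, 1, 0))  # emphasizes early events
```

G(0, 0) reproduces the unweighted log-rank test, while G(0, 1) up-weights late events and so suits a delayed treatment effect. A MaxCombo test takes the maximum of several such weighted statistics and adjusts for their correlation.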

In nQuery 9.3, we add to the inequality and non-inferiority MaxCombo sample size tables from nQuery 9.1 & 9.2 with a sample size determination table for equivalence testing using the MaxCombo Test.

**Table added:**

**Equivalence Maximum Combination (MaxCombo) Linear Rank Tests using Piecewise Survival**

(Log-Rank, Wilcoxon, Tarone-Ware, Peto-Peto, Fleming-Harrington, Threshold Lag, Generalized Linear Lag)

**What is it?**

The log-rank test is one of the most widely used tests for the comparison of survival curves. However, a number of alternative linear-rank tests are available. The most common reason to use an alternative test is that the performance of the log-rank test depends on the proportional hazards assumption and may suffer significant power loss if the treatment effect (hazard ratio) is not constant. While the standard log-rank test assigns equal importance to each event, weighted log-rank tests apply a prespecified weight function to each event. However, there are many types of non-proportional hazards (delayed treatment effect, diminishing effect, crossing survival curves) so choosing the most appropriate weighted log-rank test can be difficult if the treatment effect profile is unknown at the design stage.

In nQuery 9.3, sample size determination is provided for seven linear-rank tests with flexible piecewise survival for equivalence testing, building on the inequality (superiority) and non-inferiority tables added in 9.2.

These nQuery tables can be used to easily compare the power achieved or sample size required for the Log-Rank, Wilcoxon, Tarone-Ware, Peto-Peto, Fleming-Harrington, Threshold Lag and Generalized Linear Lag tests. They complement the MaxCombo tables, which are provided for evaluating multiple tests simultaneously.

**Tables added:**

**Equivalence Linear Rank Tests using Piecewise Survival** (Log-Rank, Wilcoxon, Tarone-Ware, Peto-Peto, Fleming-Harrington, Threshold Lag, Generalized Linear Lag)

**What is it?**

Paired analyses are a common approach to increase the efficiency of trials by comparing the results between two highly related outcomes (e.g. from the same person). Where a paired analysis is appropriate, ignoring the pairing can lead to underpowered inference. For example, in ophthalmology, survival-type endpoints (e.g. time to vision loss/degradation) can occur where a different treatment is applied to each eye, yet the standard log-rank test is often incorrectly still used.

In nQuery 9.3, a sample size table for paired survival analysis using the rank test is added.

**Table added:**

**Test for Paired Survival Data**

**What is it?**

Correlation, agreement and diagnostic measures are all interested in the strength of relationships between two or more variables in different contexts. Correlation is interested in assessing the strength of the relationship between two variables.

Agreement assesses the degree to which two (or more) raters (e.g. two diagnostic tests) can reliably replicate their assessments. Diagnostic testing compares the degree of agreement between a proposed rater and the truth (e.g. screening programme result vs “gold standard” test such as biopsy).

Variance is used to assess the degree of variability in a measure. Tests comparing variances can be used to assess if the amount of variation differs significantly between groups.

In nQuery 9.3, sample size tables are added in the following areas for the design of trials using these concepts:

**Tables added:**

- **Correlation (Pearson’s, Spearman’s, Kendall tau-B)**
- **Agreement (Binary Kappa, Polychotomous Kappa, Coefficient (Cronbach) Alpha)**
- **Diagnostics (Partial ROC Analysis)**
- **Variances (F-test, Levene’s Test, Bonett’s Test)**

**What is it?**

Correlation measures are widely used to summarise the strength of association between variables. Commonly seen in areas such as regression analysis, the most widely used version is the Pearson correlation for a linear relationship.

However, other correlations may be more suitable in certain contexts, such as rank correlations like Spearman’s correlation for dealing with ordinal rank data.

In nQuery 9.3, tables are added for the confidence interval for a Pearson correlation and tests for the Spearman and Kendall tau-B rank correlations.

**Tables added:**

- **Confidence Interval for One (Pearson) Correlation**
- **Test for Spearman Correlation**
- **Test for Kendall tau-B Correlation**
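As an illustration of how a confidence interval for a Pearson correlation translates into a sample size, the Python sketch below uses the standard Fisher z-transformation, under which atanh(r) is approximately normal with standard error 1/sqrt(n - 3). It is offered as an assumption about a typical approach, since the source does not say which interval the nQuery table uses. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def pearson_ci_fisher(r: float, n: int, level: float = 0.95):
    """Approximate CI for a Pearson correlation via the Fisher z-transformation."""
    if n <= 3:
        raise ValueError("n must exceed 3")
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    q = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - q * se), math.tanh(z + q * se)


def n_for_ci_width(r: float, width: float, level: float = 0.95) -> int:
    """Smallest n whose Fisher-z interval around r is no wider than `width` (width > 0)."""
    n = 4
    while True:
        low, high = pearson_ci_fisher(r, n, level)
        if high - low <= width:
            return n
        n += 1


# Sample size so that a 95% interval around an anticipated r = 0.5 has width <= 0.2.
n = n_for_ci_width(0.5, 0.2)
print(n, pearson_ci_fisher(0.5, n))
```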

**What is it?**

Assessing the reliability of different “raters” is vital in areas where multiple assessors, criteria or methods are available to evaluate a disease or condition. For example, Cohen’s Kappa statistic is a widely used approach to quantify the degree of agreement between multiple raters and provides a basis for the testing and estimation of interrater reliability.

In nQuery 9.3, tables are added for the test and confidence interval for the polychotomous (≥2 raters) Kappa statistic and for a test comparing two coefficient (Cronbach) alphas.

**Tables added:**

- **Test for Polychotomous Kappa**
- **Confidence Interval for Polychotomous Kappa**
- **Test for Two Coefficient (Cronbach) Alpha**
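For readers unfamiliar with the statistic itself, the following is a minimal Python sketch of Cohen's Kappa computed from a square table of counts for two raters, using the usual definition (observed agreement minus chance agreement, divided by one minus chance agreement). It illustrates the quantity being tested rather than the sample size calculation, and the example counts are made up.

```python
def cohens_kappa(table):
    """Cohen's Kappa from a k x k table; table[i][j] counts subjects that
    rater 1 placed in category i and rater 2 placed in category j."""
    k = len(table)
    total = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / total
    row_marginals = [sum(row) / total for row in table]
    col_marginals = [sum(table[i][j] for i in range(k)) / total for j in range(k)]
    expected = sum(r * c for r, c in zip(row_marginals, col_marginals))
    return (observed - expected) / (1 - expected)


# Hypothetical counts with two categories, then with three categories.
print(cohens_kappa([[20, 5], [10, 15]]))
print(cohens_kappa([[12, 2, 1], [3, 10, 2], [1, 3, 11]]))
```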

**What is it?**

The statistical evaluation of diagnostic testing is a vital component of ensuring that proposed screening or testing procedures have the appropriate accuracy for clinical usage. A plethora of measures exist to evaluate the performance of a diagnostic test but one of the most common is Receiver Operating Characteristic (ROC) curve analysis where the Area Under the Curve (AUC) provides a snapshot of how well a test performs over the entire range of discrimination boundaries.

However, researchers may sometimes be interested in assessing the performance over a more limited range of outcomes. One such method is the partial ROC (pROC), which assesses the ROC performance only over a limited range of the True Positive Rate (TPR - Y-axis in ROC) or False Positive Rate (FPR - X-axis in ROC). For example, the region where FPR is greater than 0.8 implies that more than 80% of negative subjects are incorrectly classified as positives: this is unacceptable in many real cases.

In nQuery 9.3, sample size tables are added for partial ROC analysis for assessing one or two ROC curves.

**Tables added:**

- **Test for One Partial ROC**
- **Test for Two Partial ROC**
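The partial ROC idea can be illustrated with a short Python sketch that integrates an empirical ROC curve over a restricted FPR range using the trapezoidal rule with linear interpolation at the range limits. The function name and the example ROC points are hypothetical, and the sketch shows the quantity being assessed, not nQuery's sample size method.

```python
def partial_auc(fpr, tpr, fpr_low=0.0, fpr_high=1.0):
    """Area under a piecewise-linear ROC curve for FPR in [fpr_low, fpr_high].

    fpr and tpr are the ROC points ordered by increasing FPR, normally
    starting at (0, 0) and ending at (1, 1).
    """
    area = 0.0
    points = list(zip(fpr, tpr))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        a, b = max(x0, fpr_low), min(x1, fpr_high)  # clip the segment to the range
        if x1 <= x0 or a >= b:
            continue  # vertical segment or no overlap with the range
        slope = (y1 - y0) / (x1 - x0)
        y_a = y0 + slope * (a - x0)
        y_b = y0 + slope * (b - x0)
        area += (b - a) * (y_a + y_b) / 2  # trapezoid on the clipped segment
    return area


# Hypothetical ROC points; compare the full AUC with the area for FPR <= 0.2.
fpr = [0.0, 0.1, 0.3, 1.0]
tpr = [0.0, 0.6, 0.8, 1.0]
print(partial_auc(fpr, tpr), partial_auc(fpr, tpr, 0.0, 0.2))
```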

**What is it?**

The variance is the most commonly cited statistic for evaluating the degree of variation in a measure. Researchers will often be interested in evaluating the degree to which the variation differs between two or more groups. Several tests have been proposed for this purpose, including the F-test, Levene’s Test and Bonett’s Test for comparing variances.

In nQuery 9.3, a sample size table is added for the comparison of two independent variances using the F-test, Levene’s Test or Bonett’s Test.

**Table added:**

**Tests for Two Variances (F-test, Levene’s Test, Bonett’s Test)**

**What is it?**

Randomization is a vital part of ensuring valid statistical inferences from common statistical tests used widely in clinical trials. Randomization creates treatment groups that are comparable and reduces the influence of unknown factors. However, in clinical trials there are often ethical and logistical considerations that mean that simple random allocation may not be appropriate.

For example, it is often considered important to ensure that balance is maintained at any given time during a clinical trial to reduce the potential effect of time-varying covariates or when sequential analysis is planned. Additionally, it can be important that covariates such as gender are relatively balanced overall.

nQuery 9.2 saw the addition of the Randomization Lists tool, which allows for the easy generation of randomization lists that account both for randomization and for any balancing covariates of interest. nQuery 9.2 included the following randomization algorithms:

- **Block Randomization**
- **Complete Randomization**
- **Efron’s Biased Coin (2 Groups Only)**
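Two of these algorithms are easy to sketch. The Python below shows permuted block randomization and Efron's biased coin for two groups, using textbook definitions. The function names, the arm labels and the default bias of 2/3 (Efron's original suggestion) are assumptions for illustration and do not reflect the options in nQuery's tool.

```python
import random


def block_randomization(n_subjects, block_size, arms=("A", "B"), rng=None):
    """Permuted blocks: every complete block contains each arm equally often."""
    rng = rng or random.Random()
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    allocations = []
    while len(allocations) < n_subjects:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        allocations.extend(block)
    return allocations[:n_subjects]


def efron_biased_coin(n_subjects, p=2 / 3, rng=None):
    """Efron's biased coin: favour the under-represented arm with probability p."""
    rng = rng or random.Random()
    allocations = []
    imbalance = 0  # (number assigned to A) minus (number assigned to B)
    for _ in range(n_subjects):
        if imbalance == 0:
            prob_a = 0.5
        elif imbalance < 0:
            prob_a = p        # A is behind, so favour A
        else:
            prob_a = 1 - p    # A is ahead, so favour B
        arm = "A" if rng.random() < prob_a else "B"
        imbalance += 1 if arm == "A" else -1
        allocations.append(arm)
    return allocations


rng = random.Random(2024)  # fixed seed so the list can be regenerated
print(block_randomization(12, 4, rng=rng))
print(efron_biased_coin(12, rng=rng))
```

Block randomization guarantees balance at the end of every block, whereas the biased coin only nudges the allocation towards balance and is therefore less predictable.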

nQuery 9.3 sees the addition of multiple new randomization algorithms, an increase in the number of allowable centers and a fully updated chapter on randomization lists in our user manual. The updates to the randomization lists feature are summarized below:

- **4 New Randomization Algorithms (Smith’s, Wei’s Urn, Random Sorting, Random Sorting with Maximum Allowable Deviation)**
- **Increase in number of allowable centers from 25 to 500**
- **Improved User Manual Chapter on Randomization Lists**

To access these tables, you must have an active nQuery subscription. **If you do, nQuery should automatically prompt you to update.**

You can manually update nQuery Advanced by clicking **Help>Check for updates.**

**What's new in the Expert tier of nQuery?**

The Expert tier focuses on the nQuery Predict tool, which allows researchers to predict the expected length of time to reach a trial milestone, such as study end or an interim analysis, based on pre-trial assumptions or on interim data from an ongoing trial. nQuery Predict allows for the prediction of time to reach recruitment milestones or, for survival analysis trials, event milestones using a variety of different accrual, event and dropout models.

nQuery 9.3 adds the Piecewise Poisson model for accrual prediction. This flexible extension of the existing constant Poisson accrual model uses the inversion method and lets the trialist vary the accrual rate over time by assigning different Poisson rates to discrete future time periods. With up to eight discrete time periods available in the nQuery 9.3 Piecewise Poisson model, it can approximate many different accrual patterns.
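A piecewise Poisson accrual process can be simulated by inversion: generate arrival times from a unit-rate Poisson process and map each one back through the inverse of the cumulative accrual intensity. The Python sketch below shows one way to do this. It is a generic illustration rather than nQuery Predict's implementation, the function names are hypothetical, and it assumes that the final rate continues beyond the last specified period.

```python
import random


def invert_cumulative_intensity(target, rates, durations):
    """Calendar time at which the cumulative accrual intensity equals `target`.

    rates[i] applies for durations[i] time units; the final rate is assumed
    (for this sketch) to continue indefinitely after the last period.
    """
    if not rates or len(rates) != len(durations) or target <= 0:
        raise ValueError("need matching, non-empty rates/durations and target > 0")
    elapsed = 0.0
    remaining = target
    for i, (rate, duration) in enumerate(zip(rates, durations)):
        is_last = i == len(rates) - 1
        capacity = rate * duration  # expected accrual within this period
        if is_last or remaining <= capacity:
            if rate <= 0:
                raise ValueError("cannot reach target: final accrual rate is zero")
            return elapsed + remaining / rate
        remaining -= capacity
        elapsed += duration


def simulate_enrollment_times(n_subjects, rates, durations, rng=None):
    """Simulate enrollment times under piecewise-constant Poisson accrual."""
    rng = rng or random.Random()
    times = []
    unit_time = 0.0
    for _ in range(n_subjects):
        unit_time += rng.expovariate(1.0)  # next arrival of a unit-rate process
        times.append(invert_cumulative_intensity(unit_time, rates, durations))
    return times


# Hypothetical plan: 2 subjects/month for 2 months, then 4 subjects/month.
rates, durations = [2.0, 4.0], [2.0, 10.0]
times = simulate_enrollment_times(20, rates, durations, rng=random.Random(7))
print(f"simulated time to enroll 20 subjects: {times[-1]:.2f} months")
```

Repeating the simulation many times gives a predictive distribution for the time at which a recruitment milestone is reached.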