Meta-Student Statistical Methods Whitesheet
Version: 2.0 — IPD Linear Mixed Models
Date: May 2026
Platform: Meta-Student — Cumulative Science Platform for Undergraduate Research
R Engine: lme4/lmerTest + metafor packages, running under R >= 4.4
This document describes every statistical calculation performed by the platform, the assumptions underlying each, and how to interpret every output shown on the results page. It is intended as a transparent audit trail for researchers, supervisors, and reviewers.
1. Overview of the Analysis Pipeline
When a study administrator triggers a meta-analysis, the platform:
1. Fetches all submissions with status VALIDATED or INCLUDED from the database, in chronological order of submission.
2. Downloads the raw CSV data file from each submission (individual participant data).
3. Sends the complete participant-level data to the R engine via a POST request to /run.
4. The R engine stacks participant data across studies, fits a linear mixed model appropriate to the study design, computes per-study estimates, diagnostics, and the full analytics suite.
5. Results are stored in the database and displayed on the results page.
A minimum of 2 submissions with at least 4 total participants is required before analysis can run.
1.1 Why IPD Meta-Analysis?
Previous versions of the platform used summary-statistics meta-analysis: students entered group-level means, SDs, and sample sizes, and the platform computed per-study effect sizes using metafor::escalc(). Version 2.0 switches to Individual Participant Data (IPD) meta-analysis using linear mixed models.
Advantages of the IPD approach:
- No manual summary statistic entry — students upload raw data CSVs, eliminating transcription errors.
- Greater statistical power — participant-level data retains more information than summary statistics.
- Correct handling of complex designs — crossover trials, nested structures, and pre-post correlations are modelled directly rather than approximated.
- Natural variance decomposition — the mixed model simultaneously estimates between-study variance (τ²) and within-study residual variance (σ²).
- Flexibility — covariates and design features (period effects, sequence effects) are handled within the model rather than requiring external corrections.
2. Study Designs and Required CSV Columns
The platform supports five study designs. The design is set at study creation and determines the CSV template students receive.
2.1 Randomised Controlled Trial (RCT)
Each row is one participant.
| Column | Meaning |
|---|---|
| participant_id | Unique participant identifier |
| group | Group assignment: treatment / control (or 1 / 0) |
| outcome_value | Measured outcome in original units |
| age | (optional) Participant age |
| sex | (optional) Participant sex |
| notes | (optional) Free text |
2.2 Pre-Post Study (PRE_POST)
Each row is one participant measured before and after an intervention.
| Column | Meaning |
|---|---|
| participant_id | Unique participant identifier |
| score_1 | Pre-intervention score |
| score_2 | Post-intervention score |
| age, sex, notes | (optional) |
2.3 RCT with Pre-Post Measures (RCT_PRE_POST)
Each row is one participant in a two-group design with baseline and follow-up.
| Column | Meaning |
|---|---|
| participant_id | Unique participant identifier |
| group | treatment / control |
| pre_score | Baseline measurement |
| post_score | Follow-up measurement |
| age, sex, notes | (optional) |
2.4 Crossover Trial (CROSSOVER)
Each row is one participant who received both treatments in a defined sequence.
| Column | Meaning |
|---|---|
| participant_id | Unique participant identifier |
| sequence | Allocation sequence: AB or BA |
| period_1_value | Outcome measured in period 1 |
| period_2_value | Outcome measured in period 2 |
| age, sex, notes | (optional) |
2.5 Cross-Sectional Correlational (CROSS_SECTIONAL)
Each row is one participant with two measured variables.
| Column | Meaning |
|---|---|
| participant_id | Unique participant identifier |
| variable_1 | Predictor / independent variable |
| variable_2 | Outcome / dependent variable |
| age, sex, notes | (optional) |
3. Data Preparation
The stack_participants() function in the R engine:
1. Stacks all participant rows from all submissions into a single data frame, adding study_id and label columns to identify which submission each row belongs to.
2. Coerces outcome columns to numeric (non-numeric values become NA).
3. Drops rows with missing outcome data (listwise deletion within each study).
4. Factors study_id, group, and sequence for use in the mixed model.
For crossover designs, participant IDs are prefixed with study_id to prevent identity collisions when different studies happen to use the same ID scheme (e.g., "1", "2", "3").
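The preparation steps above can be sketched as follows. This is an illustrative Python version (the actual stack_participants() is R code in the engine); the field names mirror the CSV columns described in Section 2 but the function body is an assumption, not the platform's implementation:

```python
def stack_participants(submissions):
    """Stack participant rows across submissions (illustrative sketch)."""
    stacked = []
    for sub in submissions:
        for row in sub["rows"]:
            r = dict(row)
            # Tag each row with its source submission
            r["study_id"] = sub["study_id"]
            r["label"] = sub["label"]
            # Coerce the outcome to numeric; non-numeric values become NA (None)
            try:
                r["outcome_value"] = float(r["outcome_value"])
            except (TypeError, ValueError):
                r["outcome_value"] = None
            # Prefix participant IDs with study_id to prevent identity collisions
            r["participant_id"] = f'{sub["study_id"]}_{r["participant_id"]}'
            stacked.append(r)
    # Listwise deletion of rows with missing outcome data
    return [r for r in stacked if r["outcome_value"] is not None]
```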
4. Model Fitting — Design-Specific IPD Mixed Models
All models are fitted using lme4::lmer() with REML = TRUE (Restricted Maximum Likelihood). p-values are obtained via the Satterthwaite approximation provided by lmerTest.
4.1 RCT — Two Independent Groups
Model:
outcome_value ~ group + (1 | study_id) + (0 + group | study_id)
- group is relevelled so that the control group is the reference category. The platform identifies the control group by checking for labels containing "control", "0", "placebo", "con", or "ctrl" (case-insensitive). If none match, the first alphabetical level is used.
- The fixed effect for the treatment group is the mean difference (treatment minus control) — this is the primary unstandardised effect.
- (1 | study_id) captures between-study baseline variance (a nuisance parameter — how much labs differ in their control-condition measurements).
- (0 + group | study_id) captures between-study treatment effect variance — the meta-analytic τ². This is the quantity reported as heterogeneity.
- The two terms are independent (zero correlation assumed). This allows τ² to be estimated stably at k ≥ 4 by eliminating the need to estimate an intercept-slope correlation, which requires k > 20 to identify reliably.
- Falls back to (1 | study_id) intercept-only if k < 4 or if the slope term causes a convergence failure.
Primary output: Mean Difference (in original measurement units)
Secondary output: Hedges' g (see Section 5)
4.2 Pre-Post — Within-Subjects Change
Model:
change ~ 1 + (1 | study_id)
where: change = score_2 - score_1
- The change score is computed for each participant.
- The intercept is the mean change across all participants, accounting for study-level clustering.
- This design tests whether the mean change is significantly different from zero.
Primary output: Mean Change (in original units)
Secondary output: Cohen's d_z (see Section 5)
4.3 RCT with Pre-Post — ANCOVA Model
Model:
post_score ~ pre_score + group + (1 | study_id) + (0 + group | study_id)
- This is an ANCOVA (Analysis of Covariance) formulation: baseline (pre_score) is included as a covariate.
- The treatment group coefficient represents the ANCOVA-adjusted group difference at post-test, controlling for each participant's baseline score.
- This is statistically more efficient than analysing change scores whenever there is any baseline-outcome correlation — i.e., the confidence interval is narrower for the same data. Under the standard assumption of parallel regression slopes across groups, the two approaches estimate the same quantity.
- The independent random slope (0 + group | study_id) gives the meta-analytic τ² (between-study treatment effect variance). Falls back to intercept-only if k < 4 or convergence fails.
SMD standardisation: The standardised effect (Hedges' g) is computed via emmeans::eff_size on the ANCOVA-adjusted group contrast, with denominator sigma_total = sqrt(Var.random + Var.residual) from insight::get_variance() (Westfall et al. 2014 total-SD). See Section 5 for the full formula. The Calin-Jageman exact Hedges J small-sample correction is applied to both the estimate and its SE.
Primary output: Adjusted Mean Difference — ANCOVA (in original units)
Secondary output: Hedges' g (see Section 5)
4.4 Crossover — Senn's Mixed-Model Approach
Data reshaping: Each participant contributes two rows (one per period). The data is reshaped from wide to long format:
| Row | treatment | period | outcome |
|---|---|---|---|
| Participant 1 (AB), period 1 | A | 1 | period_1_value |
| Participant 1 (AB), period 2 | B | 2 | period_2_value |
| Participant 2 (BA), period 1 | B | 1 | period_1_value |
| Participant 2 (BA), period 2 | A | 2 | period_2_value |
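The wide-to-long reshape in the table above can be sketched in Python (illustrative only — the engine performs this step in R; the dictionary keys mirror the CSV columns from Section 2.4):

```python
def reshape_crossover(rows):
    """Wide -> long: each participant contributes one row per period."""
    long_rows = []
    for r in rows:
        seq = r["sequence"]  # "AB" or "BA": position gives the period's treatment
        long_rows.append({"participant_id": r["participant_id"],
                          "treatment": seq[0], "period": 1,
                          "outcome": r["period_1_value"]})
        long_rows.append({"participant_id": r["participant_id"],
                          "treatment": seq[1], "period": 2,
                          "outcome": r["period_2_value"]})
    return long_rows
```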
Model:
outcome ~ treatment + period + (1 | participant_within_study) + (0 + treatment | study_id)
- treatment (A vs B) captures the treatment effect, with B as the reference (control).
- period absorbs any period/carry-over effect.
- (1 | participant_within_study) accounts for within-person correlation (each person is measured twice).
- (0 + treatment | study_id) captures the meta-analytic τ² — between-study variability in the treatment effect. This replaces the previous (1 | study_id) random intercept, which captured between-study baseline variance rather than treatment effect variance.
- Falls back to (1 | study_id) if convergence fails.
This follows Senn's (2002) recommended approach for crossover meta-analysis with individual data, correctly separating the treatment, period, and carry-over effects.
Primary output: Treatment Effect, A vs B (in original units)
Secondary output: Hedges' g (see Section 5)
4.5 Cross-Sectional — Mixed-Effects Regression
Model (k ≥ 4 studies):
variable_2 ~ variable_1 + (1 | study_id) + (0 + variable_1 | study_id)
- (1 | study_id) captures between-study differences in the intercept (baseline level of variable_2).
- (0 + variable_1 | study_id) captures between-study variability in the regression slope — the meta-analytic τ².
- The two terms are independent (zero correlation assumed), enabling stable estimation at k ≥ 4. The previous correlated structure (1 + variable_1 | study_id) required k ≥ 8 because it added an unidentifiable correlation parameter that caused the optimizer to thrash at low k.
Falls back to intercept-only if k < 4 or convergence fails:
variable_2 ~ variable_1 + (1 | study_id)
Primary output: Regression Slope (unstandardised, in outcome-per-predictor units)
Secondary output: Standardised Beta (slope × SD_x / SD_y)
In addition to the mixed-model slope, the platform computes four complementary association metrics by pooling study-level estimates (see Section 6).
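The standardised beta point estimate is a simple rescaling of the slope. A minimal Python sketch (note: this gives only the point estimate; the platform's delta-method SE via marginaleffects is not reproduced here):

```python
import statistics

def standardised_beta(slope, xs, ys):
    """beta = slope * SD_x / SD_y (point estimate only; SE handled separately)."""
    return slope * statistics.stdev(xs) / statistics.stdev(ys)
```

For example, a slope of 1.0 with SD_x = 2 and SD_y = 4 gives beta = 0.5.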
5. Standardised Effect Sizes (Secondary Output)
For the three factor-contrast designs (RCT, RCT_PRE_POST, CROSSOVER), the secondary output is a standardised mean difference derived from the mixed model via emmeans::eff_size with the Westfall et al. (2014) total-variance sigma:
SMD = eff_size(emmeans_contrast, sigma = sigma_total, edf = df.residual(model)) × J
where:
sigma_total = sqrt(Var.random + Var.residual)
(Var.random = sum of all random-effect variances;
both terms extracted via insight::get_variance(model))
J = exp( lgamma(df/2) - log(sqrt(df/2)) - lgamma((df-1)/2) )
(Calin-Jageman exact Hedges' J; numerically equivalent to
the asymptotic 1 - 3/(4*df - 1) at large df but more
accurate at small df. Source: doi:10.31234/osf.io/s2597)
df = Satterthwaite df from the emmeans contrast (lmerTest)
edf = df.residual(model)
(df reference for the uncertainty in sigma; see Westfall 2014)
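As a numerical illustration of the J formula above (Python rather than the platform's R, but term-for-term the same expression), the exact correction can be compared against the asymptotic one:

```python
import math

def hedges_j_exact(df):
    """Exact Hedges' J = Gamma(df/2) / (sqrt(df/2) * Gamma((df-1)/2)),
    computed on the log scale for numerical stability."""
    return math.exp(math.lgamma(df / 2)
                    - math.log(math.sqrt(df / 2))
                    - math.lgamma((df - 1) / 2))

def hedges_j_asymptotic(df):
    """Classic approximation: 1 - 3 / (4*df - 1)."""
    return 1 - 3 / (4 * df - 1)
```

At small df the two diverge (the exact form is more accurate); at large df they agree closely, as noted above.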
Why total-variance sigma. Westfall et al. argue that the population SMD interpretation requires standardising against the full marginal SD of the outcome, not just the within-study residual SD. With independent random slopes the random-effect variance is non-trivial, so excluding it (the pre-v6 approach) systematically inflated SMDs relative to the population reference. The Westfall denominator includes every random component plus the residual.
For PRE_POST, the design is intercept-only (mean change against zero), so emmeans contrasts add no value. The secondary remains Cohen's d_z computed from the change-score variance with the same Calin-Jageman J factor; variance components are now extracted via insight::get_variance() for consistency with the other designs.
For CROSS_SECTIONAL, the secondary is the standardised regression coefficient β computed via marginaleffects::avg_slopes with a linear-transform hypothesis sprintf("%.10f * b1 = 0", sd_x / sd_y). This gives a delta-method SE for the standardised slope rather than the naive slope_SE × SD_x / SD_y, which ignores the uncertainty in the marginal SDs. Hedges' J is not applied (β is not a standardised mean difference).
> Methodological change vs earlier versions. Prior versions standardised by sqrt(σ²) (residual SD only). The current Westfall total-SD denominator includes the random-effect variances. Numerical SMD values for the same dataset will therefore differ from prior reports. The RCT_PRE_POST denominator in particular has changed from "pooled SD of within-group change scores" to the total marginal SD of post_score, which is the larger methodological change for that design.
5.1 Confidence intervals for SMD
SMD ± t(df, 0.975) × SE_SMD
where SE_SMD is propagated through emmeans::eff_size and scaled by J.
CIs use the t-distribution with Satterthwaite degrees of freedom, not the z-distribution, for consistency with the primary effect CIs.
5.2 Design-specific SMD labels
| Design | SMD Label | Interpretation |
|---|---|---|
| RCT | Hedges' g | Treatment vs control in total-SD units (Westfall) |
| PRE_POST | Cohen's d_z | Mean change in change-score SD units |
| RCT_PRE_POST | Hedges' g | ANCOVA-adjusted between-group difference in total-SD units (Westfall) |
| CROSSOVER | Hedges' g | Treatment effect in total-SD units (Westfall) |
| CROSS_SECTIONAL | Standardised Beta | Regression slope in SD units (β × SD_x / SD_y, delta-method SE) |
5.3 Interpretation benchmarks (Cohen, 1988)
| SMD | Interpretation |
|---|---|
| < 0.2 | Negligible |
| 0.2 – 0.49 | Small |
| 0.5 – 0.79 | Medium |
| 0.8 – 1.19 | Large |
| >= 1.2 | Very large |
Important caveat: These benchmarks are rough population-level defaults. A "small" effect on a hard-to-move physiological variable may be highly practically significant, while a "large" effect on a trivial outcome may not be. Always interpret magnitude in relation to the specific outcome and population.
6. Cross-Sectional Association Metrics
For cross-sectional studies, the platform reports four complementary measures of association. Each is computed per study and then pooled across studies using inverse-variance weighting.
6.1 OLS Regression Slope
The primary output from the mixed model (Section 4.5). This is the unstandardised slope — for each one-unit increase in the predictor, the expected change in the outcome. It is the most directly interpretable measure because it is in original measurement units.
6.2 Pearson r
Per-study Pearson correlations are pooled directly (without Fisher z transformation). The pooling uses inverse-variance weights with the asymptotic variance of Pearson r:
Var(r) = (1 - r²)² / (n - 1)
Pooled estimate and 95% CI:
r_pooled = Σ(w_i × r_i) / Σ(w_i) where w_i = 1 / Var(r_i)
SE = sqrt(1 / Σ(w_i))
CI = r_pooled ± 1.96 × SE
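The pooling formulas above amount to a few lines of arithmetic. A minimal Python sketch (illustrative; the platform computes this in R):

```python
def pool_pearson(rs, ns):
    """Inverse-variance pooling of per-study Pearson r.
    Var(r) = (1 - r^2)^2 / (n - 1), w = 1 / Var(r)."""
    weights = [(n - 1) / (1 - r ** 2) ** 2 for r, n in zip(rs, ns)]
    r_pooled = sum(w * r for w, r in zip(weights, rs)) / sum(weights)
    se = (1 / sum(weights)) ** 0.5
    return r_pooled, (r_pooled - 1.96 * se, r_pooled + 1.96 * se)
```

Note how a large precise study dominates the pooled value: pooling r = 0.2 (n = 1000) with r = 0.8 (n = 10) lands near 0.2, not near the midpoint.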
Rationale for omitting Fisher z: Fisher's z transformation is standard when pooling many studies with small samples, but it introduces bias when correlations are not near zero and studies are few. Since Meta-Student typically has a modest number of studies with reasonable sample sizes, direct pooling is simpler and avoids this bias.
6.3 Spearman ρ (rho)
A rank-order correlation that is robust to outliers and can detect monotonic (not just linear) relationships. Per-study Spearman correlations are pooled using the asymptotic variance approximation (Fieller, Hartley, & Pearson, 1957):
Var(ρ) = (1 + ρ²/2) / (n - 3)
6.4 Kendall τ (tau)
A robust rank correlation based on concordant and discordant pairs. More conservative than Spearman and performs well with small samples or tied values. Per-study values are pooled using the asymptotic variance (Kendall, 1970):
Var(τ) = 2(2n + 5) / (9n(n - 1))
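Both rank-correlation variance approximations above translate directly to code (Python sketch of the formulas as stated; pooling then proceeds exactly as for Pearson r):

```python
def var_spearman(rho, n):
    """Fieller-Hartley-Pearson asymptotic variance of Spearman's rho."""
    return (1 + rho ** 2 / 2) / (n - 3)

def var_kendall(n):
    """Kendall (1970) asymptotic variance of tau."""
    return 2 * (2 * n + 5) / (9 * n * (n - 1))
```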
6.5 Interpretation
If all four metrics broadly agree, the evidence for the association is strong. If they diverge (e.g., Pearson r is large but Spearman ρ is small), it suggests the relationship may not be linear, or outliers may be inflating the Pearson estimate. The platform displays all four to allow users to assess robustness.
7. Variance Components
The mixed model decomposes total variance into two levels:
7.1 τ² (tau-squared) — Between-Study Variance
The variance of the true study-level effects around the pooled mean. A larger τ² means the studies' underlying effects genuinely differ. This is the random-slope variance from the (0 + treatment | study_id) term in the mixed model — the between-study variability in the treatment effect specifically, not in baseline levels. The (1 | study_id) intercept term captures between-study baseline variance and is a nuisance parameter that is not reported as τ².
7.2 σ² (sigma-squared) — Residual (Within-Study) Variance
The variance of individual participant outcomes around their study-level mean, after accounting for the fixed effects. This captures individual-level noise.
7.3 I² — Proportion of Heterogeneity
I² is adapted for the IPD mixed-model context. Since σ² is individual-level variance while τ² is study-level, a naive τ²/(τ² + σ²) would vastly underestimate I² because σ² includes noise from every participant.
The correct formula scales σ² to the study level using the harmonic mean of per-study sample sizes:
n_typical = k / Σ(1/n_i) (harmonic mean of study sizes)
typical_sampling_var = σ² / n_typical (scaled to study level)
I² = τ² / (τ² + typical_sampling_var)
This yields an I² comparable to what would be obtained from a traditional summary-statistics random-effects model.
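The scaling above is a three-line computation. A Python sketch of the same formula (the engine does this in R):

```python
def i_squared(tau2, sigma2, study_ns):
    """I^2 with sigma^2 scaled to the study level via the harmonic mean n."""
    k = len(study_ns)
    n_typical = k / sum(1 / n for n in study_ns)   # harmonic mean of study sizes
    typical_sampling_var = sigma2 / n_typical      # study-level sampling variance
    return tau2 / (tau2 + typical_sampling_var)
```

For example, with τ² = 0.04, σ² = 0.8, and two studies of n = 20, the typical sampling variance is 0.8 / 20 = 0.04, giving I² = 50%.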
Interpretation (Higgins et al., 2003):
| I² | Interpretation |
|---|---|
| < 25% | Low heterogeneity — studies are fairly consistent |
| 25% – 49% | Moderate heterogeneity — some variability |
| 50% – 74% | Substantial heterogeneity — studies differ meaningfully |
| >= 75% | High heterogeneity — interpret the pooled estimate with caution |
8. Per-Study Estimates
For the forest plot and leave-one-out analyses, the platform computes per-study effect estimates using simple within-study calculations (not BLUPs from the mixed model):
| Design | Per-study estimate | SE |
|---|---|---|
| RCT | mean(treatment) - mean(control) | Pooled independent-samples SE |
| PRE_POST | mean(change scores) | SD(change) / sqrt(n) |
| RCT_PRE_POST | mean(Δtreatment) - mean(Δcontrol) | Pooled independent-samples SE on change scores |
| CROSSOVER | (mean(diff_AB) - mean(diff_BA)) / 2 | Pooled SE / 2 |
| CROSS_SECTIONAL | OLS slope from lm(y ~ x) | Standard OLS SE |
Weights are computed as w_i = 1/SE_i² and normalised to sum to 1, so the frontend can display them as percentages.
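The RCT row of the table and the weight normalisation can be sketched in Python (illustrative; the function names here are hypothetical, and the engine computes these in R):

```python
import statistics

def rct_study_estimate(treat, ctrl):
    """Mean difference with pooled independent-samples SE (per-study, not a BLUP)."""
    n1, n2 = len(treat), len(ctrl)
    md = statistics.mean(treat) - statistics.mean(ctrl)
    # Pooled variance across the two groups
    sp2 = ((n1 - 1) * statistics.variance(treat)
           + (n2 - 1) * statistics.variance(ctrl)) / (n1 + n2 - 2)
    se = (sp2 * (1 / n1 + 1 / n2)) ** 0.5
    return md, se

def normalised_weights(ses):
    """w_i = 1 / SE_i^2, normalised to sum to 1 for percentage display."""
    w = [1 / se ** 2 for se in ses]
    total = sum(w)
    return [wi / total for wi in w]
```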
These per-study estimates are then used to:
- Construct the forest plot (via metafor::forest())
- Run leave-one-out sensitivity analysis (via metafor::leave1out())
- Compute the funnel plot (via metafor::funnel())
- Run publication bias tests
9. Diagnostic Plots
The platform generates six types of diagnostic output from the IPD mixed model:
9.1 Q-Q Plot of Residuals
Plots the quantiles of the level-1 (participant) residuals against theoretical normal quantiles. Points should lie close to the diagonal line. Systematic deviations indicate non-normality of residuals, which can affect CI coverage and p-values.
9.2 Residuals vs Fitted Values
Plots residuals against predicted values. Should show a random scatter around zero. Patterns (curves, fanning) indicate model misspecification — e.g., a non-linear relationship, heteroscedasticity, or omitted variables.
A LOWESS smoother (red line) is overlaid to highlight any trends.
9.3 Q-Q Plot of Random Effects
Plots the study-level treatment slope random effects (the (0 + treatment | study_id) term — the meta-analytic τ component) against normal quantiles. Checks the assumption that true treatment effects are normally distributed across studies. With few studies (k < 10), this plot is noisy and should be interpreted cautiously.
9.4 Scale-Location Plot
Plots sqrt(|standardised residuals|) against fitted values. Checks for homoscedasticity (equal variance). If the red LOWESS line is roughly flat, variance is approximately constant. An upward or downward slope suggests variance increases or decreases with the predicted value.
9.5 Baujat Plot
Generated from the per-study summary statistics using metafor::baujat(). Each point is a study, with:
- x-axis: contribution to overall Q (heterogeneity)
- y-axis: influence on the pooled result
Studies in the top-right corner are both heterogeneous and influential — they are the strongest candidates for investigation.
9.6 Leave-One-Out Analysis
Each study is removed in turn, and the pooled effect is re-estimated from the remaining studies. If the result changes substantially when a single study is dropped, that study is highly influential and should be examined for data quality issues. Both a table and a forest-style plot are provided.
10. Sequential Analysis and Stopping Rules
10.1 Cumulative Re-Analysis
The sequential analysis re-fits the full design-specific IPD mixed model after each study is added, in chronological submission order. Starting from the first 2 submissions, each iteration adds one more.
For each cumulative model at study k, the platform records:
| Field | Meaning |
|---|---|
| k | Number of studies included |
| effect | Pooled unstandardised effect at that point |
| se | Standard error of the unstandardised effect |
| ci_lower, ci_upper | 95% confidence interval |
| smd, smd_se | Standardised effect and its SE |
| tau2, sigma2 | Variance components at that point |
| stability.effectChange | Absolute change in SMD (\|Δ SMD\|) from previous step (retained for diagnostics) |
| stability.tau2Change | Relative change in τ² (log-ratio) from previous step (retained for diagnostics) |
| stability.sigma2Change | Relative change in σ² (log-ratio) from previous step (retained for diagnostics) |
10.2 Sequential Stopping Rule (v6 — three-trigger framework with 2-consecutive confirmation)
When does a study close?
Meta-Student uses a sequential design: each new study submission is analysed as it arrives. Under v6 the study closes when any one of three triggers fires on two consecutive completed studies (strict 2-consecutive same-trigger rule). All three triggers are gated by an initial burn-in of k ≥ 4 studies.
| Trigger | Condition | Action on confirmed fire |
|---|---|---|
| T1 — Detection | SE(standardised effect) < SESOI ÷ 2 AND \|z\| > z<sub>α/2</sub> | Close as positive |
| T2 — Conditional-power futility | Projected CP at k<sub>max</sub> below the registered CP threshold | Close as futile (CP) |
| T3 — Precision futility | Projected k required to satisfy SE < SESOI/2 exceeds the registered k cap | Close as futile (precision) |
If none fire on consecutive looks, the study continues to the next k. Because the rule requires two consecutive matching fires, the practical minimum k for any close is 5.
Trigger 1 — Detection (T1). Identical in spirit to the v5 sufficient-information rule, with a significance check added. The standard error of the pooled standardised effect must be below SESOI ÷ 2 AND the pooled effect must be significantly different from zero at the two-sided alpha (default 0.05). The standardised effect is design-dependent:
- Experimental designs (RCT, PRE_POST, RCT_PRE_POST, CROSSOVER): the standardised effect is the SMD (Hedges' g or Cohen's d_z). Default SESOI = 0.20.
- Correlational designs (CROSS_SECTIONAL): the standardised effect is the standardised regression coefficient β. Default SESOI = 0.10.
Trigger 2 — Conditional-power futility (T2). Conditional power at the current observed estimate, projected forward to k<sub>max</sub>, answers: "if we kept going, what is the probability we ever cross significance?" When this probability falls below the CP threshold (default 0.20), T2 fires. The CP projection assumes future studies contribute equal information (the standard Lan–Wittes simplifying assumption); this slightly overestimates CP under non-trivial heterogeneity, an issue tracked for v7 (see §10.5).
Trigger 3 — Precision futility (T3). Projects the k required to satisfy the precision target from the current observed SE trajectory. Specifically, under the SE-scales-as-1/√k approximation,
k_required = ceiling( k_current × (SE_current / (SESOI / 2))^2 )
If k_required exceeds the registered k cap (default 50), T3 fires. T3 addresses the high-heterogeneity case: when τ̂ is large relative to SESOI/2, satisfying the precision criterion would require impractically many studies. T3 caps this explicitly without coupling the decision rule to heterogeneity directly.
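The T3 projection is a direct application of the formula above. A Python sketch (illustrative; the default cap of 50 matches the registered kCap parameter):

```python
import math

def t3_fires(k_current, se_current, sesoi, k_cap=50):
    """Project k needed for SE < SESOI/2 under the SE ~ 1/sqrt(k) approximation;
    T3 fires if the projection exceeds the registered k cap."""
    k_required = math.ceil(k_current * (se_current / (sesoi / 2)) ** 2)
    return k_required > k_cap, k_required
```

For example, at k = 4 with SE = 0.20 and SESOI = 0.20, the projection is k_required = 16 — well under the cap, so T3 does not fire. At k = 6 with SE = 0.50, the projection is 150 and T3 fires.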
Strict 2-consecutive confirmation. A trigger only closes a study when the same trigger fires at look k and look k+1. Triggers do not cross-confirm one another: a study flickering between T1 at k and T2 at k+1 keeps running until two consecutive looks agree. This mitigates ordering effects, where a single look at k = 4 might fire on noise.
Why this design.
1. v5 had no failure pathway. A study with true δ ≈ 0 would accumulate indefinitely until the precision criterion was met, at which point it terminated as a null result. This is inefficient and produces no early signal that the line of inquiry is unlikely to find a meaningful effect.
2. Conditional power gives a principled futility rule. CP at the current observed estimate is a standard tool in group-sequential trial design (Lan & Wittes 1988; Whitehead 1997). A threshold of 0.10 to 0.20 is well established in that literature.
3. τ-driven precision futility addresses the high-heterogeneity case. Heterogeneity influences the decision only through its effect on SE, exactly as it does in v5; T3 just caps the projection so a study cannot drag on indefinitely chasing precision that the data-generating process will not yield.
4. Two consecutive confirmations mitigate ordering effects. A single look at k = 4 has a meaningful chance of firing on noise. Requiring agreement at consecutive looks reduces this ordering sensitivity. Strict same-trigger matching prevents pathological close-outs where consecutive looks disagree about the exit state.
5. What is explicitly deferred to v7. Predictive-probability-based futility (P(|δ| > SESOI | data) < threshold) is deferred because it requires an additional simulation sweep to tune the probability threshold. A more sophisticated CP projection that accounts for τ̂ directly is also deferred to v7.
Combined verdict reported by the platform:
stopReason ∈ { "detection", "cp_futility", "precision_futility", null }
stoppedAtK = first k at which the same trigger fired at two consecutive looks
The platform also retains legacy v5 booleans (precisionMet, sufficientInformation, criteria.c1_burnin, criteria.c2_precision) for backwards compatibility with results persisted under the v5 framework, but these are not used by the v6 stopping logic itself.
> Important: A futile close (T2 or T3) is not a positive null finding. It indicates that continued accumulation under the current trajectory is unlikely to change the verdict, given the study's registered SESOI and feasibility constraints. Always interpret results using the full confidence interval and prediction interval displayed on the results page.
Parameter configuration: Each registered study has four v6 parameters set at creation and locked thereafter: sesoi (default 0.20 SMD scale, 0.10 β scale), cpThreshold (default 0.20), kCap (default 50), kMaxForCp (default 30). Researchers planning a study should set SESOI to the minimum meaningful effect in their field on the appropriate scale. The other three default values follow the group-sequential literature and are appropriate for most studies; adjust only with a registered rationale.
10.3 Limitations
The v6 framework's defaults are theoretically grounded but not yet verified by a full simulation re-run. A v6 power and operating-characteristics study (Type I error at δ = 0, incorrect-futility rate at δ ≥ SESOI, expected stopping-k distributions, and τ sensitivity) is the planned follow-up before v6 is used in confirmatory analyses. The CP projection currently uses the Lan–Wittes simplifying assumption that future studies contribute equal information; a τ̂-aware CP projection is on the v7 roadmap.
11. Publication Bias Assessment
Publication bias analyses use the per-study summary statistics (Section 8) fed into metafor::rma(), not the IPD mixed model directly. This is because metafor's bias diagnostics are designed for study-level data.
11.1 Funnel Plot
A scatter plot of each study's effect size (x-axis) against its standard error (y-axis, inverted). Under no publication bias, points should scatter symmetrically around the pooled effect in an inverted funnel shape. Asymmetry suggests that small studies with effects in a particular direction may be missing.
Requires >= 3 studies. Most informative with >= 10 studies.
11.2 Egger's Regression Test
Formal test for funnel asymmetry. Regresses standardised effects on precision:
(effect_i / SE_i) = a + b × (1 / SE_i) + error_i
A significant non-zero intercept (a) indicates asymmetry.
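The regression above is ordinary least squares on transformed variables, so the intercept estimate can be sketched in closed form (Python illustration of the formula; the platform uses metafor's implementation, which also supplies the z statistic and p-value):

```python
def egger_intercept(effects, ses):
    """OLS fit of (effect/SE) on (1/SE); a non-zero intercept suggests asymmetry."""
    ys = [e / s for e, s in zip(effects, ses)]
    xs = [1 / s for s in ses]
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    a = ybar - b * xbar   # intercept = bias estimate
    return a, b
```

With identical effects across studies of varying precision the intercept is exactly zero, matching the no-asymmetry case.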
| Output | Meaning |
|---|---|
| z | Test statistic |
| pValue | Two-tailed p-value for the intercept |
| bias | Estimated intercept — positive = small studies show larger effects |
Caution: Low power with few studies (k < 10). A non-significant result does not rule out publication bias.
11.3 Trim-and-Fill
Non-parametric method (Duval & Tweedie, 2000) that detects and corrects funnel asymmetry by:
1. Identifying extreme studies on one side
2. Temporarily removing them
3. Re-estimating the centre
4. Imputing mirror-image studies
5. Recomputing the pooled effect
| Output | Meaning |
|---|---|
| nImputed | Studies imputed to restore symmetry |
| adjustedEffectSize | Corrected pooled effect |
| adjustedCiLower, adjustedCiUpper | Corrected CI |
Uses the R0 estimator. If nImputed = 0, the funnel is already symmetric. Caution: Can over-correct when heterogeneity is high.
11.4 Fail-Safe N (Rosenthal)
The number of unpublished null-result studies that would need to exist to reduce the pooled effect to non-significance.
Conventionally robust if FSN > 5k + 10
A large FSN suggests the result is resilient to publication bias. A small FSN (< 10) means only a few null studies could invalidate the finding.
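The robustness convention is a one-line check (Python sketch of the rule as stated above):

```python
def fsn_robust(fsn, k):
    """Rosenthal convention: result is robust if FSN > 5k + 10."""
    return fsn > 5 * k + 10
```

With k = 5 studies the threshold is 35, so FSN = 60 counts as robust while FSN = 20 does not.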
12. Outlier and Influence Detection
Using the per-study summary statistics and metafor::influence(), a study is flagged as a potential outlier if either criterion is met:
| Criterion | Threshold | Meaning |
|---|---|---|
| Cook's distance | > 1 | Disproportionate influence on the pooled estimate |
| Studentised residual | \|rstudent\| > 2 | Effect size > 2 SDs from the model prediction |
Requires >= 3 studies. Flagging is a signal for manual review, not automatic exclusion. Flagged submission IDs are stored in outlierIds and highlighted on both the admin panel and results page.
13. Confidence Interval Construction
All confidence intervals for primary and secondary effects use the t-distribution with Satterthwaite degrees of freedom from lmerTest, not the z-distribution. This provides more accurate coverage with small samples and few studies.
The Satterthwaite approximation estimates the effective degrees of freedom for each fixed-effect coefficient by accounting for the variance-component structure of the model. When the approximation is unavailable (rare), the engine falls back to the z-distribution (qnorm(0.975)).
For per-study estimates and pooled publication-bias analyses, normal-theory CIs (± 1.96 × SE) are used, consistent with standard meta-analytic practice via metafor.
14. Completion Report Package
When a study is marked Completed, the platform automatically generates a downloadable ZIP archive containing the full reproducible analysis pipeline and a near-complete academic manuscript template. The intent is that nothing stored in the database is required for a reader to independently reproduce the analysis or write up the paper.
14.1 Archive contents
| File | Description |
|---|---|
| README.md | Human-readable summary and file index |
| analysis.Rmd | Reproducible R Markdown meta-analysis script |
| report.qmd | Quarto technical report — every statistic explained |
| manuscript.qmd | Quarto academic manuscript template |
| references.bib | Starter BibTeX bibliography |
| data/summary_statistics.csv | Pooled summary statistics (all included submissions) |
| submissions/<submitter>_<date>.csv | Individual participant-level CSV, one per contributor |
| deviations/<submitter>_methods.txt | Methods report and admin notes per contributor |
14.2 report.qmd — comprehensive technical report
This is a self-contained technical report rendering to HTML or PDF. For every statistic it shows, it also documents (a) what the statistic estimates, (b) the assumptions it depends on, (c) how to read it, and (d) when it misleads. It includes:
- Full summary(res) output from the fitted metafor model
- Pooled effect on both standardised and raw (unstandardised) scales
- Knapp-Hartung small-sample adjustment (t with k−1 df, inflated SE) alongside the standard Wald test — recommended when k < 20
- 95% prediction interval for the true effect of a new sample from the same population (distinct from the CI for the average)
- Model fit statistics (logLik, deviance, AIC, BIC)
- Per-sample best linear unbiased predictions (BLUPs) with shrinkage toward the pooled mean
- Heterogeneity: τ², τ, I², H², Cochran's Q, with profile-likelihood 95% CIs for τ² and I²
- Panel-by-panel interpretation of every diagnostic plot (Q–Q, residuals-vs-fitted, scale-location, leave-one-out, Baujat)
- Influence-diagnostic table with per-column interpretation guidance (Cook's d, rstudent, DFFITS, cov. ratio, hat)
- Publication-bias diagnostics (funnel plot, Egger's regression, trim-and-fill, Rosenthal's failsafe N)
- Sensitivity analyses: (i) exclusion of influential samples, (ii) comparison across alternative τ² estimators (REML, DL, PM, ML, HE)
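Two of the statistics listed above — the Knapp-Hartung adjustment and the prediction interval — can be reproduced directly in metafor. A minimal sketch, assuming a hypothetical data frame dat with per-study effects yi and sampling variances vi:

```r
library(metafor)

# Hypothetical per-study effects and variances, k = 4.
dat <- data.frame(yi = c(0.35, 0.52, 0.18, 0.44),
                  vi = c(0.04, 0.06, 0.05, 0.03))

# test = "knha" applies the Knapp-Hartung adjustment: a t-test with
# k - 1 df and an adjusted SE instead of the standard Wald z-test.
res <- rma(yi, vi, data = dat, method = "REML", test = "knha")

# predict() returns both the CI for the average effect (ci.lb / ci.ub)
# and the 95% prediction interval for a new study (pi.lb / pi.ub).
predict(res)
```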
14.3 manuscript.qmd — academic paper template
This is a Quarto manuscript pre-populated with project metadata (title, design, primary outcome, description) and all meta-analytic results, structured as an academic paper and designed to substantially reduce manuscript-writing overhead. It renders to HTML, Word (.docx), or PDF via quarto render manuscript.qmd --to docx.
Structure:
| Section | Content |
|---|---|
| Abstract | Structured (Background / Methods / Results / Conclusions) with Methods and Results pre-populated |
| Introduction | Three [AUTHOR: ...] paragraph prompts + auto-filled "The present study" paragraph |
| Methods | Fully auto-populated — design-specific blurbs for Design, Participants, Procedure, Outcomes, Effect Size, Statistical Analysis (including assumption checks, publication-bias procedures, information accrual) |
| Results | Sample characteristics → Assumption checks and sensitivity analyses (first) → Primary analysis (second) |
| Discussion | Fully [AUTHOR: ...] placeholders (intentionally left blank) |
| Data/code availability, CRediT, Funding, COI | Short boilerplate paragraphs |
The Methods and Results narrative adapts automatically to the study design (RCT, PRE_POST, RCT_PRE_POST, CROSSOVER, or CROSS_SECTIONAL), so the same template works across studies of the same design with different topics.
14.4 Mixed-model reporting detail
Both report.qmd and manuscript.qmd present the fitted random-effects linear mixed model in substantially more detail than the website's summary cards. Each file includes a mixed-model table reporting:
- Fixed-effect point estimate μ̂, SE, z (Wald), p (Wald), 95% CI
- Knapp-Hartung adjusted t, p, and 95% CI
- 95% prediction interval
- τ², its square root τ, profile-likelihood 95% CI for τ²
- I² with profile-likelihood 95% CI, H²
- Cochran's Q, df, p-value
- Per-sample BLUPs (observed vs shrunken estimate, shrinkage magnitude)
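The BLUP columns of that table can be sketched via metafor's blup() on a hypothetical random-effects fit (the data below are illustrative, not platform output):

```r
library(metafor)

dat <- data.frame(yi = c(0.35, 0.52, 0.18, 0.44),
                  vi = c(0.04, 0.06, 0.05, 0.03))
res <- rma(yi, vi, data = dat, method = "REML")

# blup() returns per-study estimates shrunken toward the pooled mean;
# the gap between observed and shrunken values is the shrinkage magnitude.
b <- blup(res)
data.frame(observed  = dat$yi,
           shrunken  = b$pred,
           shrinkage = dat$yi - b$pred)
```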
14.5 Raw files included
- data/summary_statistics.csv — one row per submission with its computed summary statistics (means, SDs, correlations, n) exactly as supplied to the R engine.
- submissions/<submitter>_<date>.csv — the individual participant-level CSV originally uploaded by each contributor. These allow independent re-execution of the full pipeline.
- deviations/<submitter>_methods.txt — the submitter's methods-report text and any admin validation notes recorded at the time the submission was reviewed.
Generation is implemented in [lib/reportGenerator.ts](../lib/reportGenerator.ts); the file is produced once on the transition to COMPLETED status and written to uploads/reports/<studyId>.zip.
15. Data Flow Summary
Student uploads a CSV of individual participant data; the supervisor approves it
|
v
Submission stored in database (status: PENDING)
|
v (admin reviews and validates)
Status set to VALIDATED or INCLUDED
|
v (admin triggers analysis)
POST /api/analysis/run - the system:
-> downloads CSV for each submission
-> parses to participant-level rows
-> POST to R engine at R_API_URL/run
|
v (R engine)
stack_participants() — combine all CSVs into one data frame
analyze_*() — design-specific lmer() model
-> primary: unstandardised effect (MD, slope, etc.)
-> secondary: standardised effect (Hedges' g, d_z, etc.)
-> variance: τ², σ², I²
compute_per_study() — within-study estimates for plots
forest() / funnel() — base64 PNG plots
compute_diagnostics() — Q-Q, residuals, scale-location, Baujat, LOO
run_sequential() — cumulative models (k=2..K) + stopping rules
run_publication_bias() — Egger's, fail-safe N, trim-fill, outliers
|
v
Results returned as JSON
|
v
Stored in MetaResult table
|
v
Displayed on /dashboard/results/[studyId]
and /studies/[studyId] (public view)
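The engine-side steps above can be sketched in R. This is an illustrative reconstruction, not the platform's code: stack_participants() and the analyze_*() family are the platform's own functions, and the column names and the simple two-arm (RCT) formula below are assumptions.

```r
library(lme4)
library(lmerTest)  # adds Satterthwaite df to the fixed-effect t-tests

# Stack participant-level CSVs from all submissions into one frame,
# tagging each row with its source study (paths are hypothetical).
csv_paths <- c("submissions/a.csv", "submissions/b.csv")
stacked <- do.call(rbind, lapply(seq_along(csv_paths), function(i) {
  d <- read.csv(csv_paths[i])
  d$study_id <- i
  d
}))

# Random intercept and random treatment slope by study: the fixed
# "group" coefficient is the pooled effect; the slope variance plays
# the role of the between-study heterogeneity tau^2.
fit <- lmer(outcome ~ group + (1 + group | study_id), data = stacked)
summary(fit)
```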
16. Methodological References
- Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Erlbaum.
- Duval, S., & Tweedie, R. (2000). Trim and fill: A simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics, 56(2), 455–463.
- Fieller, E. C., Hartley, H. O., & Pearson, E. S. (1957). Tests for rank correlation coefficients. I. Biometrika, 44(3/4), 470–481.
- Hedges, L. V. (1981). Distribution theory for Glass's estimator of effect size and related estimators. Journal of Educational Statistics, 6(2), 107–128.
- Higgins, J. P. T., Thompson, S. G., Deeks, J. J., & Altman, D. G. (2003). Measuring inconsistency in meta-analyses. BMJ, 327(7414), 557–560.
- Kendall, M. G. (1970). Rank Correlation Methods (4th ed.). Griffin.
- Morris, S. B., & DeShon, R. P. (2002). Combining effect size estimates in meta-analysis with repeated measures and independent-groups designs. Psychological Methods, 7(1), 105–125.
- Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86(3), 638–641.
- Senn, S. (2002). Cross-over Trials in Clinical Research (2nd ed.). Wiley.
- Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36(3), 1–48.
This whitesheet is version-controlled alongside the codebase. For questions or feedback, contact joe.warne@tudublin.ie.