Every week I get a version of the same question from thesis students: "I have my data — which test should I use?" Sometimes the data is right there in a spreadsheet. Sometimes the student has already run a test and wants to know if they chose correctly. Either way, the answer comes from the same four questions about the data, not from the test name itself.
This post gives you a decision flowchart followed by a reference table covering every major test you are likely to encounter in an agricultural, environmental, or social science thesis. The flowchart is faster. The table has more detail. Use both.
Why the Test Choice Matters
Using the wrong test does not just give you a wrong p-value. It can invalidate your entire analysis. Running repeated t-tests where ANOVA belongs inflates your Type I error rate. Running ANOVA on data that clearly violates normality makes your F-statistic unreliable. Applying Pearson's correlation to ordinal or ranked data treats the ranks as if they were evenly spaced measurements, so the coefficient can misrepresent the relationship between your variables.
Reviewers catch this. The Methods section of any manuscript includes a statistical analysis subsection, and the first thing a reviewer checks is whether the test matches the data structure. Getting it right at the thesis stage means your results are defensible — and your published paper does not get a revision request about statistical methods.
"The best statistical test is not the most sophisticated one — it is the one whose assumptions your data actually meets."
Four Questions to Ask First
Before you look at any test, answer these four questions about your data:
1. What type is your dependent variable (DV)? Is the outcome you are measuring continuous (yield in t/ha, pH, temperature) or categorical (survived/died, treatment group, land use class)? This is the single biggest fork in the decision tree.
2. How many independent variables (IVs) do you have? One IV or multiple? One-IV situations use simpler tests. Multiple IVs open up regression models and factorial ANOVA designs.
3. What type are your independent variables? Categorical IVs (treatment, group, variety) lead to t-tests and ANOVA. Continuous IVs (temperature, rainfall, dose) lead to regression and correlation. You can have both at once — that is ANCOVA.
4. Is your data normally distributed? This determines whether you use a parametric test (assumes normality, more statistical power) or a non-parametric alternative (no distribution assumption, but less power). Check this with a Shapiro-Wilk test for small samples (<50) or a histogram/Q-Q plot for larger ones.
The Decision Flowchart
Work through the questions in order. Each question branches you toward a smaller set of options until you reach a specific test.
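If you prefer code to a diagram, the same branching can be sketched as a small R function. This is only a rough stand-in: the function name choose_test and its arguments are hypothetical, the branches are taken from the reference table further down, and it covers only the common cases.

```r
# Hypothetical helper: maps answers to the four questions onto a test name.
# Covers only the common cases from the reference table; it is not exhaustive.
choose_test <- function(dv_type = c("continuous", "categorical"),
                        iv_type = c("categorical", "continuous", "mixed"),
                        n_groups = 2,     # number of levels of a categorical IV
                        n_ivs = 1,        # number of independent variables
                        paired = FALSE,   # same subjects or matched pairs?
                        normal = TRUE) {  # is the normality assumption plausible?
  dv_type <- match.arg(dv_type)
  iv_type <- match.arg(iv_type)

  # Question 1: categorical outcome
  if (dv_type == "categorical") {
    if (iv_type == "categorical" && n_ivs == 1) return("Chi-square test")
    return("Logistic regression")  # assumes the outcome is binary
  }

  # Continuous outcome with categorical and continuous IVs together
  if (iv_type == "mixed") return("ANCOVA")

  # Continuous outcome, continuous IVs
  if (iv_type == "continuous") {
    if (n_ivs > 1) return("Multiple linear regression")
    return(if (normal) "Simple linear regression / Pearson's r" else "Spearman's rho")
  }

  # Continuous outcome, categorical IVs
  if (n_ivs > 1) return("Two-way ANOVA")
  if (n_groups == 2) {
    if (paired) return(if (normal) "Paired t-test" else "Wilcoxon signed-rank")
    return(if (normal) "Independent t-test" else "Mann-Whitney U")
  }
  if (normal) "One-way ANOVA" else "Kruskal-Wallis"
}

# Four treatments, continuous yield, residuals clearly non-normal
choose_test("continuous", "categorical", n_groups = 4, normal = FALSE)
```

Treat the returned name as a starting point for the table, not as a verdict: the function knows nothing about your design beyond the six answers you give it.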
The Complete Reference Table
The table below covers all the major tests in one place. Use it to double-check your choice after the flowchart, or as a quick reference when reviewing statistical methods in a manuscript.
| Test | R call | DV Type | IV Type | Groups / IVs | Key Assumption | Type |
|---|---|---|---|---|---|---|
| Independent t-test | `t.test(y ~ group)` | Continuous | Categorical | 2 groups | Normal, equal variance | Parametric |
| Paired t-test | `t.test(a, b, paired=TRUE)` | Continuous | Categorical | 2 (paired) | Normal differences | Parametric |
| Mann-Whitney U | `wilcox.test(y ~ group)` | Continuous / Ordinal | Categorical | 2 groups | No normality assumption (rank-based) | Non-Param. |
| Wilcoxon Signed-Rank | `wilcox.test(a, b, paired=TRUE)` | Continuous / Ordinal | Categorical | 2 (paired) | Symmetric differences | Non-Param. |
| One-Way ANOVA | `aov(y ~ group)` | Continuous | Categorical | 3+ groups | Normal, equal variance | Parametric |
| Kruskal-Wallis | `kruskal.test(y ~ group)` | Continuous / Ordinal | Categorical | 3+ groups | No normality assumption (rank-based) | Non-Param. |
| Two-Way ANOVA | `aov(y ~ A * B)` | Continuous | 2 Categorical | 2 factors, 2+ levels each | Normal, equal variance | Parametric |
| Simple Linear Regression | `lm(y ~ x)` | Continuous | Continuous | 1 IV | Linear, normal residuals | Parametric |
| Multiple Linear Regression | `lm(y ~ x1 + x2 + ...)` | Continuous | Multiple (any) | 2+ IVs | Linear, normal residuals, no multicollinearity | Parametric |
| ANCOVA | `aov(y ~ group + covariate)` | Continuous | Categorical + Continuous | Mixed | Normal, equal slopes | Parametric |
| Pearson's r | `cor.test(x, y)` | Continuous | Continuous | 1 IV | Normal, linear relationship | Parametric |
| Spearman's ρ | `cor.test(x, y, method="spearman")` | Continuous / Ordinal | Continuous / Ordinal | 1 IV | Monotonic relationship | Non-Param. |
| Chi-square test | `chisq.test(table)` | Categorical | Categorical | 2+ categories each | Expected count ≥ 5 per cell | Non-Param. |
| Logistic Regression | `glm(y ~ x, family=binomial)` | Binary (0/1) | Any | Any | Large sample, no multicollinearity | Parametric |
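The chi-square row's "expected count ≥ 5 per cell" condition is easy to check before you run the test. Here is a minimal sketch with invented counts. The fallback to Fisher's exact test is my own addition as a common choice for sparse tables; it is not one of the tests in the table.

```r
# Hypothetical 2 x 2 table of counts: treatment by survival (invented numbers)
tab <- matrix(c(18,  7,
                 9, 16),
              nrow = 2, byrow = TRUE,
              dimnames = list(treatment = c("control", "treated"),
                              outcome   = c("died", "survived")))

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
print(expected)

if (all(expected >= 5)) {
  result <- chisq.test(tab)    # applies Yates' continuity correction for 2 x 2 tables
} else {
  result <- fisher.test(tab)   # exact test, used here as an assumed fallback
}
print(result)
```

The same expected counts are also available as chisq.test(tab)$expected if you have already run the test.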
Parametric vs Non-Parametric — When Does It Matter?
Parametric tests (t-test, ANOVA, linear regression) assume that your data, or more precisely your residuals, follow a roughly normal distribution. This assumption makes them more powerful — they detect real effects with smaller sample sizes. Non-parametric tests make no such assumption; they work on ranks rather than raw values, which makes them robust but slightly less sensitive when the normality assumption would actually hold.
In practice, with sample sizes above 30, the normality assumption matters less because the Central Limit Theorem ensures that sample means approach a normal distribution regardless of the underlying distribution. Below 30, check normality seriously. Below 15 (which happens a lot in pot experiments and small field trials), a non-parametric test is safer even if the Shapiro-Wilk p-value is above 0.05, because the test has low power to detect non-normality in small samples.
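You can see the low-power problem for yourself with a short simulation. The sketch below draws samples from an exponential distribution (my choice of a clearly skewed distribution, purely for illustration) and counts how often Shapiro-Wilk rejects normality at each sample size. The exact rates depend on the distribution and the seed, so read the output as a pattern rather than a set of reference values.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Share of simulated samples in which Shapiro-Wilk rejects normality at 0.05
# when the true distribution is exponential, i.e. definitely not normal.
rejection_rate <- function(n, reps = 2000) {
  mean(replicate(reps, shapiro.test(rexp(n))$p.value < 0.05))
}

power_by_n <- sapply(c(n8 = 8, n15 = 15, n30 = 30, n50 = 50), rejection_rate)
print(round(power_by_n, 2))
```

The rejection rate should climb steadily with n. At the smallest sample sizes, a large share of these skewed samples pass the test, which is why a non-significant Shapiro-Wilk result on a handful of pots is weak evidence of normality.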
n > 30 and data looks roughly symmetric on a histogram? Use the parametric test. n < 15 or the histogram is clearly skewed or multimodal? Use the non-parametric alternative. The 15–30 range is where the Shapiro-Wilk result matters most.
The Two-Group Situation — t-test vs Mann-Whitney U
You have two groups and a continuous DV. The two candidates are the independent t-test and the Mann-Whitney U test.
The t-test compares group means and assumes that the data in each group are approximately normally distributed with similar variance. It produces an interpretable test statistic (t) and a confidence interval for the difference in means. When the assumptions hold, it is the right choice.
Mann-Whitney U compares the distributions of the two groups without assuming normality. It works by ranking all observations together, then testing whether one group tends to have higher ranks than the other. It does not test means — it tests whether one group stochastically dominates the other — so you need to be precise about what you claim in your results section.
One situation that trips people up: the paired vs independent distinction. If you are comparing the same subjects before and after a treatment, or matched pairs (plot A on soil type X and plot B on soil type Y, matched by other properties), you have paired data. Use the paired t-test or Wilcoxon Signed-Rank, not the independent versions. Using independent tests on paired data ignores within-subject correlation and loses statistical power.
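The power loss from ignoring pairing is easy to demonstrate. In the invented data below, the eight plots differ a lot from one another, while the treatment adds a small but consistent amount to each plot. All numbers are made up for illustration.

```r
# Invented measurements on eight plots, before and after a treatment.
# Plot-to-plot differences are large; the treatment effect is small but consistent.
before <- c(10, 20, 30, 40, 50, 60, 70, 80)
after  <- before + c(1.0, 2.0, 1.5, 2.5, 1.2, 1.8, 2.2, 1.6)

# Welch two-sample t-test: ignores which 'after' belongs to which 'before'
p_independent <- t.test(after, before)$p.value

# Paired t-test: works on the eight per-plot differences
p_paired <- t.test(after, before, paired = TRUE)$p.value

print(c(independent = p_independent, paired = p_paired))
```

The independent test sees a mean difference of under 2 units buried in plot-to-plot variation of tens of units and finds nothing. The paired test removes that variation by differencing within plots and detects the effect clearly.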
Three or More Groups — ANOVA vs Kruskal-Wallis
Most agricultural and environmental experiments compare more than two treatments. This is where one-way ANOVA becomes the standard choice — and where the most common error in thesis statistics shows up.
ANOVA tests whether any of the group means differ significantly. A significant F-statistic (p < 0.05) tells you that not all means are equal, but it does not tell you which ones differ. That is what the post-hoc test does — and this is where you need Tukey HSD, LSD, or Duncan's test. See the ANOVA tutorial and the Tukey HSD post for the full procedure.
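A minimal sketch of the two-step procedure in base R, using invented yields for four treatments with three replicates each. Only Tukey HSD is shown, because TukeyHSD() ships with base R; the data and treatment labels are hypothetical.

```r
# Invented yields (t/ha) for four treatments with three replicates each
df <- data.frame(
  treatment = factor(rep(c("T1", "T2", "T3", "T4"), each = 3)),
  yield = c(3.8, 3.9, 4.0,   # T1
            5.8, 5.9, 6.0,   # T2
            4.0, 4.1, 4.2,   # T3
            6.0, 6.1, 6.2)   # T4
)

# Step 1: overall F-test. Do any of the treatment means differ?
fit <- aov(yield ~ treatment, data = df)
print(summary(fit))

# Step 2: Tukey HSD. Which pairs differ? One row per pair, with adjusted p-values.
tukey <- TukeyHSD(fit)$treatment
print(round(tukey, 4))
```

In the Tukey output, each row is named as a difference such as T4-T1, with the estimated difference, its confidence interval, and the adjusted p-value in the "p adj" column. Pairs whose interval excludes zero are the ones you report as significantly different.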
When normality fails (Shapiro-Wilk p < 0.05 on residuals, or n is very small), Kruskal-Wallis is the right alternative. It is the non-parametric version of one-way ANOVA. A significant result tells you the groups differ in distribution — follow up with Dunn's test with Bonferroni correction for pairwise comparisons.
ANOVA assumes equal variance across groups (homoscedasticity). Check this with Levene's test from the car package in R: leveneTest(y ~ group, data=df). If Levene's test is significant (p < 0.05), use Welch's ANOVA instead: oneway.test(y ~ group, data=df, var.equal=FALSE).
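If you want to see what the variance check is doing, or you cannot install extra packages, the logic can be sketched in base R. The version below computes a median-centred Levene test by hand (a one-way ANOVA on absolute deviations from each group's median, which to my knowledge matches the default centring in car::leveneTest) and then picks the classical or Welch test accordingly. The data are invented.

```r
# Invented data: three groups with visibly different spread
df <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 6)),
  y = c(5.0, 5.1, 4.9, 5.2, 4.8, 5.0,     # A: tight
        6.0, 7.5, 4.5, 8.0, 5.0, 6.5,     # B: wider
        9.0, 3.0, 12.0, 1.0, 10.0, 4.0)   # C: widest
)

# Median-centred Levene test by hand: one-way ANOVA on the absolute
# deviations of each observation from its own group's median.
df$abs_dev <- abs(df$y - ave(df$y, df$group, FUN = median))
levene_p <- anova(lm(abs_dev ~ group, data = df))[["Pr(>F)"]][1]
print(levene_p)

if (levene_p < 0.05) {
  result <- oneway.test(y ~ group, data = df, var.equal = FALSE)  # Welch's ANOVA
} else {
  result <- oneway.test(y ~ group, data = df, var.equal = TRUE)   # classical one-way F-test
}
print(result)
```

Choosing the test from a preliminary test is a pragmatic convention rather than a strict rule. Some analysts simply report Welch's ANOVA by default, since it behaves well whether or not the variances are equal.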
Checking Assumptions in R
Here is the minimum assumption-checking code you should run before reporting any parametric test result. Run this before ANOVA, before t-tests, and before regression.
```r
# === Assumption checks before parametric tests ===
library(car)        # provides leveneTest()
library(dunn.test)  # provides dunn.test()

# 1. Normality: Shapiro-Wilk (best for n < 50)
shapiro.test(df$yield)
# Example output: W = 0.962, p-value = 0.483 -> p > 0.05, no evidence against normality

# 2. Normality by group (before ANOVA or t-test)
by(df$yield, df$treatment, shapiro.test)

# 3. Q-Q plot of the ANOVA residuals (points near the line = normal)
res <- residuals(aov(yield ~ treatment, data = df))
qqnorm(res)
qqline(res, col = "red")

# 4. Homogeneity of variance: Levene's test
leveneTest(yield ~ treatment, data = df)
# p > 0.05 -> no evidence of unequal variances, ANOVA assumption met

# 5. If normality fails, run Kruskal-Wallis instead
kruskal.test(yield ~ treatment, data = df)

# 6. Dunn's pairwise test after Kruskal-Wallis
dunn.test(df$yield, df$treatment, method = "bonferroni")
```
For regression, the assumption checks are slightly different — you check residuals rather than raw data:
```r
# === Assumption checks for linear regression ===
model <- lm(yield ~ soil_ph, data = df)

# Four diagnostic plots in one call
par(mfrow = c(2, 2))
plot(model)
# Plot 1: Residuals vs Fitted: should show no pattern
# Plot 2: Q-Q plot: points should follow the diagonal
# Plot 3: Scale-Location: should be flat (equal variance)
# Plot 4: Residuals vs Leverage: Cook's distance contours flag influential points
par(mfrow = c(1, 1))  # restore the default single-panel layout

# Shapiro-Wilk on residuals
shapiro.test(residuals(model))
```
Reporting Your Test Result Correctly
Once you have chosen and run the right test, you need to report it in a way that lets readers replicate your decision. Here are the standard reporting formats for the tests covered above.
ANOVA: "Fertilizer treatment had a significant effect on grain yield (F(3, 8) = 24.6, p < 0.001, one-way ANOVA). Tukey HSD post-hoc comparison showed that T4 produced significantly higher yield than T1 and T3 (p < 0.05) but did not differ significantly from T2."
t-test: "Mean yield under urea treatment (5.94 t/ha) was significantly higher than the control (3.83 t/ha; t(4) = 12.3, p < 0.001, independent-samples t-test)."
Mann-Whitney U: "Soil pH was significantly higher in the pre-monsoon sampling than the post-monsoon period (W = 68, p = 0.009, Mann-Whitney U test)."
Kruskal-Wallis: "Available phosphorus differed significantly across sampling sites (H(5) = 14.8, p = 0.011, Kruskal-Wallis test). Dunn's pairwise test with Bonferroni correction identified significant differences between sites S4 and S6 (p = 0.008)."
Linear regression: "Soil pH explained 68% of the variance in maize yield (R² = 0.68, F(1, 10) = 21.4, p < 0.001). For each unit increase in pH, yield increased by an estimated 0.84 t/ha (β = 0.84, 95% CI [0.52, 1.16], p < 0.001)."
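Every number in the regression sentence can be pulled straight from the fitted model, which avoids copy-and-paste errors. The sketch below uses simulated data (12 hypothetical plots with an assumed linear relationship plus noise), so the printed values are not real results and will not match the example sentence above.

```r
set.seed(42)  # arbitrary seed; the data below are simulated, not real measurements
df <- data.frame(soil_ph = seq(5.0, 7.2, length.out = 12))   # 12 hypothetical plots
df$yield <- 0.8 * df$soil_ph + rnorm(nrow(df), sd = 0.3)     # assumed linear relation plus noise

model <- lm(yield ~ soil_ph, data = df)
s <- summary(model)

r2    <- s$r.squared
fstat <- s$fstatistic                    # named vector: value, numdf, dendf
p_val <- pf(fstat["value"], fstat["numdf"], fstat["dendf"], lower.tail = FALSE)
slope <- coef(model)["soil_ph"]
ci    <- confint(model)["soil_ph", ]     # 95% confidence interval by default

# Assemble the pieces in the order the reporting sentence needs them
cat(sprintf("R^2 = %.2f, F(%d, %d) = %.1f, p = %.3g; slope = %.2f, 95%% CI [%.2f, %.2f]\n",
            r2, fstat["numdf"], fstat["dendf"], fstat["value"], p_val,
            slope, ci[1], ci[2]))
```

The degrees of freedom in F(1, 10) come from one predictor and 12 − 2 = 10 residual degrees of freedom, which is a quick sanity check that the model was fitted to the number of observations you think it was.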
Five Mistakes That Show Up in Every Third Thesis
1. Running multiple t-tests instead of ANOVA. Running three t-tests to compare three groups (T1 vs T2, T1 vs T3, T2 vs T3) is wrong. Each test carries a 5% Type I error risk, and running three of them inflates your overall error rate to roughly 14% (1 − 0.95³ ≈ 0.143, treating the tests as independent). Use ANOVA, which tests all groups simultaneously while controlling this rate.
2. Stopping at the ANOVA F-test. A significant ANOVA p-value only tells you that some groups differ. It does not tell you which ones. You must follow ANOVA with a post-hoc comparison (Tukey HSD, LSD, or Duncan depending on your context). Stopping at the F-statistic leaves the main scientific question unanswered.
3. Using Pearson's r where Spearman's ρ belongs. Pearson's r assumes both variables are continuous and approximately normally distributed with a linear relationship. If your data is ordinal, heavily skewed, or shows a non-linear relationship, Spearman's ρ is the correct tool. Running Pearson on ranked data gives you a coefficient that is harder to interpret and potentially misleading.
4. Skipping assumption checks. Assumption checking is not optional. Journals increasingly require you to state which normality and variance tests were performed and to report their results. Run Shapiro-Wilk and Levene's test, plot your residuals, and include the results in your Methods or as a supplementary table.
5. Treating paired data as independent. If your data has a paired structure (before/after measurements on the same plot, matched sites, repeated measures on the same plant) and you use an independent-samples test, you discard within-subject correlation and lose power. Always identify whether your design is independent or paired before choosing the test.
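The first and third points in the list above are easy to verify numerically. The sketch below works out the inflated error rate for three pairwise t-tests (treating the tests as independent, which is a simplification) and then compares Pearson and Spearman on invented data with a perfectly monotonic but non-linear relationship.

```r
# Multiple t-tests: family-wise error rate for k groups compared pairwise
# at alpha = 0.05, assuming (as a simplification) independent tests.
alpha <- 0.05
k <- 3
n_tests <- choose(k, 2)            # 3 pairwise comparisons among 3 groups
fwer <- 1 - (1 - alpha)^n_tests
print(round(fwer, 3))              # 0.143

# Pearson vs Spearman: y rises with every increase in x, but not linearly
x <- 1:10
y <- exp(x)
pearson  <- cor(x, y, method = "pearson")
spearman <- cor(x, y, method = "spearman")
print(c(pearson = pearson, spearman = spearman))
```

Spearman's ρ is exactly 1 here because the ranks of x and y agree perfectly, while Pearson's r is noticeably lower because the relationship is not a straight line. With five groups instead of three, choose(5, 2) gives ten comparisons and the same formula puts the family-wise rate at about 40%.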
If you have a dataset in front of you right now and are unsure which test to use — or if you have already run an analysis and want a second pair of eyes — the data analysis service on this site covers exactly this: test selection, assumption checking, running the analysis in R or SPSS, and writing up the statistical methods and results sections for you.
Sajjadur Rahman
MSc Researcher · Data Analyst · University of Dhaka
NST Fellow and active researcher using R, SPSS, and Python across soil science and agricultural experiments. Developer of SPADE, open-source software for NUE analysis, ANOVA, and publication-ready figures. Available for statistical test selection, data analysis, and thesis consultation.