Researcher · Writer · Consultant · Dhaka, Bangladesh গবেষক · লেখক · পরামর্শদাতা · ঢাকা, বাংলাদেশ
Home About Services Research Blog Contact
Blog Data Analysis

One-Way ANOVA in R
for Agricultural Research

A complete practical tutorial — data entry, assumption checks, Tukey HSD with compact letter display, and a publication-ready bar chart. All code is copy-paste ready and the dataset mirrors a real Bangladesh fertilizer experiment.

I run ANOVA on every fertilizer experiment I do. The BBCU thesis data, the nano-urea pot experiment at DUNTC, the yield prediction work from 2023 — all of it goes through ANOVA before anything gets reported. SPSS handles it fine, but R gives you more control, the output is reproducible from a script anyone can rerun, and the figures go straight into manuscripts without reformatting.

This tutorial walks through a complete one-way ANOVA in R using a realistic agricultural dataset: four fertilizer treatments, three replications each, grain yield as the outcome. By the end you will have run the test, verified the assumptions, done Tukey HSD with a compact letter display, and produced a bar chart with error bars ready for a journal submission.

If you have never used R, install R from r-project.org and RStudio from posit.co. Both are free. Everything below runs in RStudio.

When to Use One-Way ANOVA

ANOVA tests whether the means of three or more groups are equal. You use it when you have one categorical independent variable — treatment, variety, location — and one continuous dependent variable: yield, NUE, nitrogen uptake. One-way means one grouping factor. If you have two factors (treatment and season, for example), you need two-way ANOVA instead.

For two groups only, a t-test is the right tool. For three or more, ANOVA is correct, and it controls the error rate better than running multiple t-tests in sequence.

Three assumptions apply: normality (residuals are approximately normally distributed), homogeneity of variance (spread is similar across groups), and independence of observations (satisfied by a properly randomised design). We check all three before interpreting the result.

The Dataset

The numbers below come from a CRD (Completely Randomised Design) pot experiment with four fertilizer treatments and three replications. Grain yield is in t/ha.

Treatment Rep 1 Rep 2 Rep 3 Mean (t/ha)
T1 Control (0 N) 3.82 3.95 3.71 3.83
T2 Urea 120 kg N/ha 5.94 6.18 6.08 6.07
T3 BBCU 80 kg N/ha 5.62 5.48 5.73 5.61
T4 BBCU 120 kg N/ha 6.41 6.28 6.52 6.40
Grand mean 5.48

Four treatments, twelve observations total. Small but typical for a pot study. The group means range from 3.83 to 6.40 — a spread that should produce a clear ANOVA result.

01 Enter the Data

The cleanest way to enter a small dataset by hand in R is a data frame. Open a new script, paste this in, and run it.

R
# Grain yield data — CRD pot experiment
# Yield in t/ha, 4 treatments, 3 reps each

y <- c(3.82, 3.95, 3.71,   # T1 Control
       5.94, 6.18, 6.08,   # T2 Urea 120 kg N/ha
       5.62, 5.48, 5.73,   # T3 BBCU 80 kg N/ha
       6.41, 6.28, 6.52)   # T4 BBCU 120 kg N/ha

treatment <- factor(rep(c("T1_Control", "T2_Urea120",
                          "T3_BBCU80",  "T4_BBCU120"), each = 3))

df <- data.frame(treatment, yield = y)
print(df)

The factor() call tells R that treatment is a categorical grouping variable. Without it, the ANOVA will not run correctly. For data in an Excel file, import it instead:

R
library(readxl)
df <- read_excel("your_data.xlsx")
df$treatment <- factor(df$treatment)

02 Descriptive Statistics

Look at your data before running any test. Summary statistics tell you whether the group means look noticeably different and whether the variances are in a similar range.

R
tapply(df$yield, df$treatment, mean)
tapply(df$yield, df$treatment, sd)
Output
T1_Control  T2_Urea120  T3_BBCU80  T4_BBCU120
    3.827       6.067      5.610       6.403

T1_Control  T2_Urea120  T3_BBCU80  T4_BBCU120
    0.1201      0.1206     0.1253      0.1201

Standard deviations all below 0.13, which means the spread is consistent across groups. A good early sign for the homogeneity assumption. A quick boxplot confirms the pattern visually:

R
boxplot(yield ~ treatment, data = df,
        col  = "lightyellow",
        main = "Grain Yield by Fertilizer Treatment",
        xlab = "Treatment",
        ylab = "Grain Yield (t/ha)")

03 Check Assumptions

Normality

With only three observations per group, running Shapiro-Wilk on each group separately has low power. The standard approach for small agricultural datasets is to fit the model first, then test normality on the residuals.

R
model <- aov(yield ~ treatment, data = df)
shapiro.test(residuals(model))

You want p > 0.05. A non-significant result means no evidence against normality. If p < 0.05, switch to the non-parametric Kruskal-Wallis test: kruskal.test(yield ~ treatment, data = df).

Homogeneity of variance

Levene's test is more appropriate here than Bartlett's because it is less sensitive to non-normality.

R
library(car)
leveneTest(yield ~ treatment, data = df)

Again, p > 0.05 means the assumption holds. If variance is significantly unequal, use Welch's ANOVA: oneway.test(yield ~ treatment, data = df, var.equal = FALSE).

04 Run the ANOVA

R
model <- aov(yield ~ treatment, data = df)
summary(model)
Output
             Df Sum Sq Mean Sq F value   Pr(>F)
treatment     3  11.84   3.947   267.1  9.6e-09 ***
Residuals     8   0.118   0.015
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Reading this output line by line:

Df (degrees of freedom): Treatment df = groups minus 1 = 4 - 1 = 3. Residual df = total observations minus groups = 12 - 4 = 8.

Sum Sq / Mean Sq: These partition total variance into between-group variance (treatment) and within-group variance (residuals). Mean Sq = Sum Sq / Df. The ratio of the two Mean Sq values is the F statistic.

F value of 267.1: The variance between group means is 267 times the variance within groups. That ratio is far too large to attribute to chance.

Pr(>F) of 9.6e-09: In full, 0.0000000096. With p < 0.001, you reject the null hypothesis. At least one treatment mean differs from the rest.

The ANOVA result alone does not tell you which treatments differ. For that, you need a post-hoc test.

05 Tukey HSD and Compact Letter Display

Tukey HSD compares every possible pair of group means while controlling the family-wise error rate. Agricultural journals almost always report results as a compact letter display (CLD): groups that share a letter do not differ significantly at p < 0.05.

The agricolae package handles both at once and is one of the most widely used tools in agricultural statistics in R.

R
install.packages("agricolae")   # Run this once only
library(agricolae)

tukey_out <- HSD.test(model, "treatment", group = TRUE)
print(tukey_out$groups)
Output
             yield groups
T4_BBCU120   6.403      a
T2_Urea120   6.067      a
T3_BBCU80    5.610      b
T1_Control   3.827      c

T4 and T2 share letter "a": they do not differ significantly from each other. T3 carries "b": it is below T4 and T2 but above T1. T1 carries "c": it differs from all fertilized treatments. This table goes directly into your manuscript results section.

— ❧ —

06 Publication-Ready Bar Chart

Bar charts with standard error bars are the standard figure format in most agricultural journals. This code produces one using ggplot2 and saves a 300 dpi PNG ready for submission.

R
library(ggplot2)
library(dplyr)

# Group means and standard errors
sum_df <- df %>%
  group_by(treatment) %>%
  summarise(mean_y = mean(yield),
            se     = sd(yield) / sqrt(n()))

# CLD letters — order matches factor levels: T1, T2, T3, T4
sum_df$cld <- c("c", "a", "b", "a")

ggplot(sum_df, aes(x = treatment, y = mean_y)) +
  geom_col(fill    = "#8b7e5a", alpha = 0.85, width = 0.55) +
  geom_errorbar(aes(ymin = mean_y - se, ymax = mean_y + se),
                width     = 0.18,
                linewidth = 0.7,
                colour    = "#1e1c18") +
  geom_text(aes(label = cld, y = mean_y + se + 0.10),
            family = "serif", size = 4.5) +
  labs(x = "Treatment", y = "Grain Yield (t/ha)") +
  theme_classic(base_size = 13) +
  theme(axis.title = element_text(face = "bold"))

ggsave("grain_yield_anova.png", dpi = 300, width = 6, height = 4.5)
Note on the CLD vector

The vector c("c","a","b","a") maps to the factor levels in alphabetical order: T1_Control, T2_Urea120, T3_BBCU80, T4_BBCU120. Always check that this matches your actual Tukey output before using it. If the treatment names or means differ in your dataset, the letters will differ too.

07 Writing It Up

This is how the ANOVA result reads in a Results section:

"Grain yield differed significantly across fertilizer treatments (F(3,8) = 267.1, p < 0.001). Tukey HSD post-hoc comparison showed that T4 (BBCU at 120 kg N/ha, 6.40 t/ha) and T2 (conventional urea at 120 kg N/ha, 6.07 t/ha) did not differ significantly from each other, but both were significantly higher than T3 (BBCU at 80 kg N/ha, 5.61 t/ha, p < 0.05). The unfertilized control (T1, 3.83 t/ha) was significantly lower than all fertilized treatments (p < 0.001)."

The reporting format is always F(df_treatment, df_residual) = F-value, p-value. Write this in the text and include the full ANOVA table as a numbered table in the paper.

Three Mistakes to Avoid

Mistake 1

Stopping after the ANOVA without a post-hoc test. A significant F only tells you at least one mean differs. Tukey HSD tells you which ones. Both steps belong in the paper, and a missing post-hoc test will be flagged by reviewers.

Mistake 2

Reporting "a significant difference was found between treatments" without the F statistic, degrees of freedom, and p-value. The full form is F(df_treatment, df_residual) = value, p = value. Papers returned for revision most often include this error in the Results section.

Mistake 3

Running ANOVA when the normality assumption fails. If Shapiro-Wilk returns p < 0.05 on the model residuals, use kruskal.test() instead. It is one line, requires no additional packages, and is the correct non-parametric equivalent for this design.

The Complete Script

Every step above in one place, ready to copy and adapt to your own dataset.

R — Complete Script
# ── ONE-WAY ANOVA — Agricultural Data ─────────────────────────────

# 1. Data entry
y <- c(3.82, 3.95, 3.71,
       5.94, 6.18, 6.08,
       5.62, 5.48, 5.73,
       6.41, 6.28, 6.52)

treatment <- factor(rep(c("T1_Control", "T2_Urea120",
                          "T3_BBCU80", "T4_BBCU120"), each = 3))
df <- data.frame(treatment, yield = y)

# 2. Descriptive stats
tapply(df$yield, df$treatment, mean)
tapply(df$yield, df$treatment, sd)

# 3. Assumption checks
model <- aov(yield ~ treatment, data = df)
shapiro.test(residuals(model))

library(car)
leveneTest(yield ~ treatment, data = df)

# 4. ANOVA table
summary(model)

# 5. Tukey HSD + compact letter display
library(agricolae)
HSD.test(model, "treatment", group = TRUE, console = TRUE)

# 6. Publication-ready figure
library(ggplot2)
library(dplyr)

sum_df <- df %>%
  group_by(treatment) %>%
  summarise(mean_y = mean(yield), se = sd(yield) / sqrt(n()))
sum_df$cld <- c("c", "a", "b", "a")  # T1, T2, T3, T4

ggplot(sum_df, aes(x = treatment, y = mean_y)) +
  geom_col(fill = "#8b7e5a", alpha = 0.85, width = 0.55) +
  geom_errorbar(aes(ymin = mean_y - se, ymax = mean_y + se),
                width = 0.18, linewidth = 0.7, colour = "#1e1c18") +
  geom_text(aes(label = cld, y = mean_y + se + 0.10),
            family = "serif", size = 4.5) +
  labs(x = "Treatment", y = "Grain Yield (t/ha)") +
  theme_classic(base_size = 13)

ggsave("grain_yield_anova.png", dpi = 300, width = 6, height = 4.5)

If you have a dataset that needs this kind of analysis and want a written results section and publication-ready figures alongside it, the data analysis service on this site handles exactly that — using R, SPSS, or Python depending on your preference.

All Posts Tukey HSD Test Explained →
SR

Sajjadur Rahman

MSc Researcher · Data Analyst · University of Dhaka

NST Fellow and active researcher using R, SPSS, and Python across soil science and agricultural experiments. Developer of SPADE — open-source software for nitrogen use efficiency analysis, ANOVA, and publication-ready figure export. Available for data analysis, manuscript editing, and thesis consultation.

💬