JS Estimator

Hunan U

1 Introduction

Previously, we talked about the bias-variance tradeoff and briefly mentioned the James-Stein estimator. The main take-away is that a biased estimator may outperform an unbiased estimator in terms of achieving a lower MSE. In statistics, the most notable example is the James-Stein estimator.

We use R to demonstrate the superiority of the James-Stein estimator for a particular model; this is also a good exercise in R programming and statistical computing.

2 Set-up

The hiring problem. Alice is a hiring manager. She looks at $N$ different aspects of each candidate's characteristics (say, communication skills, analytical skills, etc.). A candidate's true characteristic in aspect $i$ is $\mu_i$. Alice cannot observe $\mu_i$, but she can obtain an imperfect measure $z_i$ of each $\mu_i$. Alice wishes to have a good estimate of a candidate's characteristics $\mu = (\mu_1, \dots, \mu_N)$ based on the measures $z = (z_1, \dots, z_N)$.

We build a simple model for the hiring problem: each measurement equals the true characteristic plus independent standard normal noise, $$z_i = \mu_i + \epsilon_i, \qquad \epsilon_i \sim N(0, 1), \qquad i = 1, \dots, N,$$ with the noise terms $\epsilon_i$ independent of each other.

Question: How do we obtain a good estimate of $\mu = (\mu_1, \dots, \mu_N)$ based on the single observation $z = (z_1, \dots, z_N)$?
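To make the set-up concrete, here is a minimal R sketch of the model; the choice $N = 5$ and the seed are arbitrary and only for illustration:

set.seed(1)            # for reproducibility; any seed works
N = 5                  # number of aspects (illustrative choice)
mu = rnorm(N)          # true characteristics, unobserved by Alice
z = mu + rnorm(N)      # noisy measurements z_i = mu_i + eps_i that Alice observes
rbind(mu = mu, z = z)  # compare truth with observation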

  1. The maximum-likelihood estimator (MLE) is $\hat{\mu}^{(\mathrm{MLE})} = z$. We can verify that this estimator is unbiased: $\mathbb{E}\,\hat{\mu}^{(\mathrm{MLE})} = \mu$.

  2. The James-Stein estimator is $\hat{\mu}^{(\mathrm{JS})} = \left(1 - \frac{N-2}{\|z\|^2}\right) z$.

Clearly, the James-Stein estimator is biased. Indeed, it simply “shrinks” the unbiased MLE by the factor $1 - \frac{N-2}{\|z\|^2}$. When $N \ge 3$, we have $1 - \frac{N-2}{\|z\|^2} < 1$.
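As a quick illustration of the shrinkage, the sketch below computes both estimates from one simulated observation, again with an arbitrary $N = 5$ and seed; the shrinkage factor comes out strictly below 1:

set.seed(2)
N = 5
mu = rnorm(N)                   # true characteristics
z = mu + rnorm(N)               # one noisy observation
shrink = 1 - (N - 2)/sum(z^2)   # shrinkage factor; less than 1 whenever N >= 3
mu_MLE = z                      # maximum-likelihood estimate
mu_JSE = shrink * z             # James-Stein estimate
shrink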

James-Stein Theorem (1961). For $N \ge 3$, the James-Stein estimator dominates the MLE in terms of achieving a lower MSE: $$\mathbb{E}\,\|\hat{\mu}^{(\mathrm{JS})} - \mu\|^2 < \mathbb{E}\,\|\hat{\mu}^{(\mathrm{MLE})} - \mu\|^2,$$ where $\|\hat{\mu} - \mu\|^2 = \sum_{i=1}^N (\hat{\mu}_i - \mu_i)^2$.

Proving this theorem can be thorny if you have had limited exposure to mathematical statistics. However, we can use R to run some experiments/simulations, and the data will tell us whether the theorem holds in practice.

3 Experiments

The function run_experiment() performs the following operations:

  1. Take an input $N$ and generate a vector $\mu$ of length $N$ (each $\mu_i$ drawn from the standard normal distribution)

  2. Repeat the following steps nrep (= 100) times: draw a noise vector $e$ from the standard normal distribution, form the observation $z = \mu + e$, compute the MLE and the James-Stein estimate from $z$, and record their per-coordinate squared errors in err_MLE[i] and err_JSE[i]

  3. Return a data frame containing all the values of err_MLE and err_JSE

run_experiment = function(N) {
    nrep = 100
    err_MLE = rep(0, nrep)                # per-replication error of the MLE
    err_JSE = rep(0, nrep)                # per-replication error of the James-Stein estimator
    mu = rnorm(N)                         # true characteristics, generated from the standard normal dist
    
    for (i in 1:nrep) {
        e = rnorm(N, 0, 1)                # measurement noise
        z = mu + e                        # observed (imperfect) measurements
        mu_MLE = z                        # maximum-likelihood estimate
        mu_JSE = (1 - (N - 2)/sum(z^2))*z # James-Stein estimate
        err_MLE[i] = sum((mu_MLE - mu)^2)/N   # squared error per coordinate
        err_JSE[i] = sum((mu_JSE - mu)^2)/N
    }
    err_both = data.frame(err_MLE, err_JSE)
    return(err_both)
}
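As a quick usage check (the numbers will differ across runs), we can call run_experiment() once and inspect the returned data frame:

err = run_experiment(5)   # N = 5 is an arbitrary illustrative choice
dim(err)                  # 100 rows (replications) by 2 columns (err_MLE, err_JSE)
head(err)                 # first few per-replication errors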

We now have the squared errors of the James-Stein estimator and the maximum-likelihood estimator over 100 replications. We use box plots to compare them.

get_box = function(N) {
    err = run_experiment(N)
    boxplot(err, ylab = "Error: ||hat{μ} - μ||^2 / N")
    title(paste("μ_i is generated from Normal(0,1), dimension N=", N, sep = ""))
}
get_box(N=3)

get_box(N=5)

get_box(N=2)

The statistical experiments confirm the superiority of the James-Stein estimator when $N \ge 3$. For $N = 2$, the shrinkage factor equals 1, so the James-Stein estimator coincides with the MLE and there is no improvement.
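Besides the box plots, we can also summarize the experiments numerically; a minimal sketch, where the particular values of $N$ and the seed are arbitrary and the exact numbers vary with the random draws:

# Average per-coordinate squared error of each estimator for several N.
set.seed(2024)
Ns = c(2, 3, 5, 20)
avg = sapply(Ns, function(N) colMeans(run_experiment(N)))
colnames(avg) = paste0("N=", Ns)
round(avg, 3)   # for N >= 3, the err_JSE row is typically smaller than err_MLE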