Bias of an estimator

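In statistics, the bias (or bias function) of an estimator is the difference between the estimator's expected value and the true value of the parameter being estimated. An estimator or decision rule with zero bias is called unbiased; otherwise the estimator is said to be biased. In statistics, "bias" is an objective property of an estimator.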

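Bias can also be measured with respect to the median rather than the mean (expected value), in which case one speaks of median-unbiasedness to distinguish it from the usual mean-unbiasedness. Bias is related to consistency: consistent estimators converge in probability to the true parameter value and, under mild additional conditions, are asymptotically unbiased, although individual estimators in a consistent sequence may be biased (so long as the bias converges to zero); see bias versus consistency.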

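All else being equal, an unbiased estimator is preferable to a biased one, but in practice all else is rarely equal, and biased estimators are frequently used, generally with small bias. When a biased estimator is used, its bias is typically estimated as well. A biased estimator may be used for several reasons: because an unbiased estimator does not exist without further assumptions about the population or is difficult to compute (as in unbiased estimation of standard deviation); because the estimator is median-unbiased but not mean-unbiased (or the reverse); because a biased estimator (in particular a shrinkage estimator) reduces some loss function, especially mean squared error, compared with unbiased estimators; or because in some cases unbiasedness is too strong a condition and an unbiased estimator is not needed. Furthermore, mean-unbiasedness is not preserved under non-linear transformations, whereas median-unbiasedness is preserved under monotonic transformations (see effect of transformations); for example, the sample variance is an unbiased estimator of the population variance, but its square root, the sample standard deviation, is a biased estimator of the population standard deviation. This example is worked out below.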
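The failure of mean-unbiasedness under a non-linear transformation can be sketched with Jensen's inequality. Writing $s^{2}$ for the unbiased sample variance, and assuming it is not almost surely constant, the strict concavity of the square root gives

\operatorname {E} [s]=\operatorname {E} {\big [}{\sqrt {s^{2}}}{\big ]}<{\sqrt {\operatorname {E} [s^{2}]}}={\sqrt {\sigma ^{2}}}=\sigma ,

so the sample standard deviation underestimates σ on average even though $s^{2}$ is unbiased for σ².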

Definition

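Suppose we have a statistical model, parameterized by a real number θ, giving rise to a probability distribution for observed data, $P_{\theta }(x)=P(x\mid \theta )$, and a statistic ${\hat {\theta }}$ that serves as an estimator of θ based on any observed data $x$. That is, we assume that our data follow some unknown distribution $P_{\theta }(x)=P(x\mid \theta )$ (where θ is a fixed constant that is part of this distribution but whose value is unknown), and we construct an estimator ${\hat {\theta }}$ that maps observed data to values that we hope are close to θ. The bias of this estimator (relative to the parameter θ) is then defined as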

\operatorname {Bias} _{\theta }[\,{\hat {\theta }}\,]=\operatorname {E} _{\theta }[\,{\hat {\theta }}\,]-\theta =\operatorname {E} _{\theta }[\,{\hat {\theta }}-\theta \,],

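where $\operatorname {E} _{\theta }$ denotes the expected value over the distribution $P_{\theta }(x)=P(x\mid \theta )$, that is, the average over all possible observations $x$. The second equality holds because θ is a fixed constant under $P(x\mid \theta )$ (formally, it is measurable with respect to the conditional distribution), so $\operatorname {E} _{\theta }[\theta ]=\theta $.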

An estimator whose bias is zero for all values of the parameter θ is called an unbiased estimator.

In a simulation experiment concerning the properties of an estimator, the bias of the estimator may be assessed using the mean signed difference.
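As a rough illustration of such a simulation, the sketch below estimates the bias of the uncorrected sample variance (the divide-by-n estimator discussed in the example that follows) by averaging the signed difference between estimate and true value. The function names, the standard normal population, the sample size, and the number of trials are all assumptions made for this sketch, not part of the original text.

```python
import random

def uncorrected_variance(xs):
    """Sum of squared deviations from the sample mean, divided by n."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

def mean_signed_difference(estimator, true_value, n, trials, rng):
    """Average of (estimate - true_value) over simulated samples of size n.

    Samples are drawn from a standard normal population (an assumption
    of this sketch), so the true variance is 1.
    """
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += estimator(sample) - true_value
    return total / trials

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 5
msd = mean_signed_difference(uncorrected_variance, 1.0, n, 20000, rng)
print(f"simulated bias: {msd:.3f}, theoretical bias: {-1.0 / n:.3f}")
```

The simulated value should land close to the theoretical bias of −σ²/n, with the gap shrinking as the number of trials grows.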

Examples

Sample variance

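The sample variance of a random variable illustrates two aspects of estimator bias: first, the naive estimator is biased, which can be corrected by a scale factor; second, the unbiased estimator is not optimal in terms of mean squared error (MSE), which can be minimized by using a different scale factor, resulting in a biased estimator with lower MSE than the unbiased estimator.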

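Concretely, the naive estimator sums the squared deviations and divides by n, which is biased. Dividing instead by n − 1 yields an unbiased estimator. Conversely, the MSE can be minimized by dividing by a different number (which depends on the distribution), but this results in a biased estimator. This number is always larger than n − 1, so the result is known as a shrinkage estimator, because it "shrinks" the unbiased estimator towards zero; for the normal distribution the optimal value is n + 1.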
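The trade-off between the three divisors can be checked exactly for normal data. The sketch below assumes i.i.d. normal observations, for which the sum of squared deviations SS has E[SS] = (n − 1)σ² and Var(SS) = 2(n − 1)σ⁴ (scaled chi-squared facts); the helper name `bias_and_mse` and the choice n = 10 are hypothetical.

```python
from fractions import Fraction

def bias_and_mse(n, divisor, sigma2=Fraction(1)):
    """Exact bias and MSE of SS/divisor for i.i.d. normal data.

    SS is the sum of squared deviations from the sample mean. Uses
    E[SS] = (n-1)*sigma2 and Var(SS) = 2*(n-1)*sigma2**2, which hold
    under the normality assumption of this sketch.
    """
    scale = Fraction(n - 1, divisor)
    bias = (scale - 1) * sigma2
    variance = Fraction(2 * (n - 1), divisor ** 2) * sigma2 ** 2
    return bias, variance + bias ** 2  # MSE = variance + bias^2

n = 10
results = {d: bias_and_mse(n, d) for d in (n - 1, n, n + 1)}
for d, (bias, mse) in results.items():
    print(f"divisor {d}: bias = {bias}, MSE = {mse}")
# divisor 9: bias = 0, MSE = 2/9
# divisor 10: bias = -1/10, MSE = 19/100
# divisor 11: bias = -2/11, MSE = 2/11
```

Under these assumptions the divisor n − 1 removes the bias, while n + 1 gives the smallest MSE of the three at the cost of a larger bias.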

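Suppose X₁, ..., X_n are independent and identically distributed (i.i.d.) random variables with expectation μ and variance σ². If the sample mean and the uncorrected sample variance are defined as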

{\overline {X}}={\frac {1}{n}}\sum _{i=1}^{n}X_{i},\qquad S^{2}={\frac {1}{n}}\sum _{i=1}^{n}\left(X_{i}-{\overline {X}}\,\right)^{2},

then S² is a biased estimator of σ², because

{\begin{aligned}\operatorname {E} [S^{2}]&=\operatorname {E} \left[{\frac {1}{n}}\sum _{i=1}^{n}{\big (}X_{i}-{\overline {X}}{\big )}^{2}\right]=\operatorname {E} {\bigg [}{\frac {1}{n}}\sum _{i=1}^{n}{\bigg (}(X_{i}-\mu )-({\overline {X}}-\mu ){\bigg )}^{2}{\bigg ]}\\[8pt]
&=\operatorname {E} {\bigg [}{\frac {1}{n}}\sum _{i=1}^{n}{\bigg (}(X_{i}-\mu )^{2}-2({\overline {X}}-\mu )(X_{i}-\mu )+({\overline {X}}-\mu )^{2}{\bigg )}{\bigg ]}\\[8pt]
&=\operatorname {E} {\bigg [}{\frac {1}{n}}\sum _{i=1}^{n}(X_{i}-\mu )^{2}-{\frac {2}{n}}({\overline {X}}-\mu )\sum _{i=1}^{n}(X_{i}-\mu )+{\frac {1}{n}}({\overline {X}}-\mu )^{2}\sum _{i=1}^{n}1{\bigg ]}\\[8pt]
&=\operatorname {E} {\bigg [}{\frac {1}{n}}\sum _{i=1}^{n}(X_{i}-\mu )^{2}-{\frac {2}{n}}({\overline {X}}-\mu )\sum _{i=1}^{n}(X_{i}-\mu )+({\overline {X}}-\mu )^{2}{\bigg ]}\\[8pt]
&=\operatorname {E} {\bigg [}{\frac {1}{n}}\sum _{i=1}^{n}(X_{i}-\mu )^{2}-{\frac {2}{n}}({\overline {X}}-\mu )\cdot n({\overline {X}}-\mu )+({\overline {X}}-\mu )^{2}{\bigg ]}\\[8pt]
&=\operatorname {E} {\bigg [}{\frac {1}{n}}\sum _{i=1}^{n}(X_{i}-\mu )^{2}-({\overline {X}}-\mu )^{2}{\bigg ]}\\[8pt]
&=\sigma ^{2}-\operatorname {E} {\big [}({\overline {X}}-\mu )^{2}{\big ]}=\left(1-{\frac {1}{n}}\right)\sigma ^{2}<\sigma ^{2}.\end{aligned}}

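In other words, the expected value of the uncorrected sample variance does not equal the population variance σ² unless it is multiplied by a normalization factor. The sample mean, on the other hand, is an unbiased estimator of the population mean μ [1].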

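The reason that S² is biased stems from the fact that the sample mean is an ordinary least squares (OLS) estimator for μ: ${\overline {X}}$ is the number that makes the sum $\sum _{i=1}^{n}(X_{i}-{\overline {X}})^{2}$ as small as possible. That is, when any other number is plugged into this sum, the sum can only increase. In particular, whenever $\mu \neq {\overline {X}}$,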

{\frac {1}{n}}\sum _{i=1}^{n}(X_{i}-{\overline {X}})^{2}<{\frac {1}{n}}\sum _{i=1}^{n}(X_{i}-\mu )^{2},

and therefore

{\begin{aligned}\operatorname {E} [S^{2}]&=\operatorname {E} {\bigg [}{\frac {1}{n}}\sum _{i=1}^{n}(X_{i}-{\overline {X}})^{2}{\bigg ]}<\operatorname {E} {\bigg [}{\frac {1}{n}}\sum _{i=1}^{n}(X_{i}-\mu )^{2}{\bigg ]}=\sigma ^{2}.\end{aligned}}
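The size of the gap in this inequality can be made explicit. As a supplementary worked step, expanding around ${\overline {X}}$ for an arbitrary constant $a$ gives

\sum _{i=1}^{n}(X_{i}-a)^{2}=\sum _{i=1}^{n}(X_{i}-{\overline {X}})^{2}+n({\overline {X}}-a)^{2},

since the cross term $2({\overline {X}}-a)\sum _{i=1}^{n}(X_{i}-{\overline {X}})$ vanishes. The left-hand side is therefore smallest at $a={\overline {X}}$, and for $a=\mu $ the two sums differ by $n({\overline {X}}-\mu )^{2}$, which is strictly positive whenever ${\overline {X}}\neq \mu $.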

Note that the usual definition of the sample variance is

s^{2}={\frac {1}{n-1}}\sum _{i=1}^{n}(X_{i}-{\overline {X}}\,)^{2},

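and this is an unbiased estimator of the population variance. This can be seen from the following formula, which follows from the independence of the $X_{i}$ and gives the expectation of the $({\overline {X}}-\mu )^{2}$ term in the expansion of $\operatorname {E} [S^{2}]$ above: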

\operatorname {E} {\big [}({\overline {X}}-\mu )^{2}{\big ]}={\frac {1}{n}}\sigma ^{2}.

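Substituting this into the expansion of $\operatorname {E} [S^{2}]$ gives $\operatorname {E} [S^{2}]={\frac {n-1}{n}}\sigma ^{2}$, and hence $\operatorname {E} [s^{2}]={\frac {n}{n-1}}\operatorname {E} [S^{2}]=\sigma ^{2}$. The factor n/(n − 1) that converts the biased (uncorrected) estimate of the variance into the unbiased one is known as Bessel's correction.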
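For a small discrete population these expectations can be verified exactly by enumerating every possible sample. The sketch below is only an illustration: the population (uniform on {0, 1, 2}) and the sample size n = 3 are hypothetical choices, and the helper name `sum_sq_dev` is not from the original text.

```python
from fractions import Fraction
from itertools import product

values = [0, 1, 2]  # hypothetical population: uniform on {0, 1, 2}
n = 3               # sample size

# Population mean and variance, computed exactly.
mu = Fraction(sum(values), len(values))
sigma2 = sum((Fraction(v) - mu) ** 2 for v in values) / len(values)

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean."""
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample)

# Every ordered sample of size n is equally likely under i.i.d. draws.
samples = list(product(values, repeat=n))
exp_uncorrected = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
exp_corrected = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print(sigma2, exp_uncorrected, exp_corrected)  # 2/3 4/9 2/3
```

Here the divide-by-n estimator averages to (n − 1)/n times the population variance, while the divide-by-(n − 1) estimator averages to the population variance itself.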

References

[1] Richard Arnold Johnson; Dean W. Wichern. Applied Multivariate Statistical Analysis. Pearson Prentice Hall. 2007 [10 August 2012]. ISBN 978-0-13-187715-3. (Archived from the original on 2016-05-29.)
