2010, Vol. 48, No. 2: 161-168
Authors: WANG Shi-Qi, DENG Tao
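Abstract: Multivariate statistical analysis requires complete samples. We developed a method for recovering the missing data of defective specimens so that such analyses need not discard large numbers of incomplete fossil specimens or drastically reduce the number of variables. The method is based on linear regression theory. It assumes that individuals among specimens of the same type differ only in size and that differences in shape are negligible. Therefore, among specimens of the same type, the known measurements of one specimen can be used to predict the missing measurements of another. When several specimens are available, the prediction of a given missing value of a specimen is the weighted average of the values predicted separately from each of the other specimens, with the weighting coefficients chosen according to how completely each specimen is preserved. Numerical tests on skull and limb-bone specimens of extant Equus show that the method is stable and insensitive to the type and number of specimens and to the number of missing values. Predictions are better for larger specimens and for measurements of larger magnitude than for smaller specimens and smaller measurements. The method differs from the traditional linear regression approach in that it exploits the linear correlation between samples (i.e., specimens), whereas the traditional approach exploits the linear correlation between variables (i.e., measurements). In general, samples are more strongly linearly correlated with one another than variables are. The method is simple and practical and has good prospects for statistical analyses of fossil specimens, especially multivariate analyses.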
Keywords: linear regression, least squares method, fossil specimens, Equus
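The weighted, specimen-based prediction described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code. It assumes the size-only model means that one specimen's measurements are proportional to another's, so each specimen-to-specimen regression is fitted through the origin by least squares on the measurements both specimens preserve. It also assumes, hypothetically, that each donor's weight is the number of measurements it shares with the target, as a simple stand-in for "preservation completeness", since the abstract does not give the weighting formula. The names `scale_factor` and `impute` are illustrative.

```python
from typing import List, Optional, Tuple

# One specimen = one row of measurements; None marks a missing (broken) value.
Specimen = List[Optional[float]]


def scale_factor(target: Specimen, donor: Specimen) -> Optional[Tuple[float, int]]:
    """Fit target ~ k * donor by least squares through the origin.

    ASSUMPTION: the size-only model is read as strict proportionality, so no
    intercept is fitted. Only measurements preserved in both specimens are used.
    Returns (k, number_of_shared_measurements), or None if no fit is possible.
    """
    pairs = [(t, d) for t, d in zip(target, donor) if t is not None and d is not None]
    sxx = sum(d * d for _, d in pairs)
    if not pairs or sxx == 0:
        return None
    sxy = sum(t * d for t, d in pairs)
    return sxy / sxx, len(pairs)


def impute(specimens: List[Specimen], i: int, j: int) -> Optional[float]:
    """Predict missing measurement j of specimen i from all other specimens.

    Each donor that preserves measurement j gives its own prediction
    k * donor[j]; the result is the weighted mean of those predictions.
    HYPOTHETICAL weight: the number of measurements the donor shares with the
    target, used as a simple proxy for how well the donor is preserved.
    """
    target = specimens[i]
    weighted_sum = 0.0
    total_weight = 0.0
    for s, donor in enumerate(specimens):
        if s == i or donor[j] is None:
            continue  # skip the target itself and donors missing the same value
        fit = scale_factor(target, donor)
        if fit is None:
            continue  # no overlapping measurements to estimate the size ratio
        k, n_shared = fit
        weighted_sum += n_shared * k * donor[j]
        total_weight += n_shared
    return weighted_sum / total_weight if total_weight else None


if __name__ == "__main__":
    # Synthetic toy data, not measurements from the study.
    specimens = [
        [10.0, 20.0, 30.0, 40.0],
        [5.0, 10.0, 15.0, 20.0],
        [20.0, 40.0, None, 80.0],
    ]
    print(impute(specimens, 2, 2))  # 60.0
```

In the toy example, the third specimen is exactly twice the first and four times the second on the three measurements they share, so both donors predict 60.0 for its missing value and the weighted mean is 60.0. Because the target's missing value is `None`, it is automatically excluded from every fit.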
RECOVERING THE MISSING DATA OF DEFECTIVE FOSSIL SPECIMENS USING LINEAR REGRESSION METHOD
WANG Shi-Qi DENG Tao
Abstract In multivariate statistical analyses, complete specimens are essentially required. To spare researchers from discarding many defective fossil specimens or greatly reducing the number of variables in their analyses, we developed a method, based on the theory of linear regression, for recovering missing data of defective fossil specimens. With this method, a missing measurement of one specimen can be predicted from other intact or defective specimens of the same type. Numerical tests were carried out on skull and limb bones of extant Equus. The results show that the method has satisfactory stability and is relatively insensitive to the number, preservation quality, and type of the available specimens. Predictive accuracy is best for large specimens and for measurements of large magnitude. Furthermore, unlike traditional linear regression methods, which exploit linear correlations between variables, our method exploits linear correlations between specimens, which are usually the stronger of the two. The method is simple in theory and practice and should be broadly applicable to statistical analyses of fossil specimens, particularly multivariate analyses.
Key words linear regression, least squares method, fossil specimens, Equus
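The contrast the abstract draws between regression across specimens and regression across variables can be written out explicitly. The notation, the through-origin form of the specimen-to-specimen fit, and the weights w_s are assumptions made for illustration; the abstract does not give the exact equations.

```latex
% x_{ij}: measurement j of specimen i; t: target specimen; m: its missing measurement.
% J_{ts}: measurements preserved in both t and s; S_m: specimens that preserve m.
% w_s: weight of donor s (assumed to grow with its preservation completeness).

% Specimen-based (between specimens), assuming a through-origin least-squares fit:
k_{ts} = \frac{\sum_{j \in J_{ts}} x_{tj}\, x_{sj}}{\sum_{j \in J_{ts}} x_{sj}^{2}},
\qquad
\hat{x}_{tm} = \frac{\sum_{s \in S_m,\, s \neq t} w_s\, k_{ts}\, x_{sm}}{\sum_{s \in S_m,\, s \neq t} w_s}

% Variable-based (traditional): regress measurement m on a predictor measurement v
% across the specimens I_{mv} that preserve both; \bar{x}_v, \bar{x}_m are means over I_{mv}.
b = \frac{\sum_{i \in I_{mv}} (x_{iv} - \bar{x}_v)(x_{im} - \bar{x}_m)}{\sum_{i \in I_{mv}} (x_{iv} - \bar{x}_v)^{2}},
\qquad
a = \bar{x}_m - b\,\bar{x}_v,
\qquad
\hat{x}_{tm} = a + b\, x_{tv}
```

Both are least-squares fits on the same specimen-by-measurement data matrix: the first sums over measurements for one pair of specimens (along rows), the second over specimens for one pair of measurements (along columns).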