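R, starting from level zero. There really isn't a decent R textbook in China! I'll make do with Xue Yi's Statistical Modeling and R Software (统计建模与R软件); I can't find anything better. My working environment is still Linux.

Chapter 2 answers:

Ex2.1
x <- c(1, 2, 3)
y <- c(4, 5, 6)
e <- c(1, 1, 1)
z <- 2*x + y + e
z1 <- crossprod(x, y)   # inner product of x and y, or equivalently x %*% y
z2 <- tcrossprod(x, y)  # outer product of x and y, or equivalently x %o% y
z; z1; z2
Key points: basic vector assignment; the concepts of inner and outer product. The inner product is a scalar (returned here as a 1×1 matrix); the outer product is a matrix.

Ex2.2
A <- matrix(1:20, c(4, 5)); A
B <- matrix(1:20, nrow = 4, byrow = TRUE); B
C <- A + B; C   # the matrix product AB does not exist: A and B are both 4×5
E <- A * B; E   # element-wise product
F <- A[1:3, 1:3]; F
H <- matrix(c(1, 2, 4, 5), nrow = 1); H   # H is only an intermediate: an irregular set of subscripts
G <- B[, H]; G   # columns 1, 2, 4 and 5 of B
Key points: how to assign matrices. The default is byrow = FALSE, i.e. the data are filled in column by column. How to extract part of a matrix: one array can be used as the subscript that picks elements out of another.

Ex2.3
x <- c(rep(1, times = 5), rep(2, times = 3), rep(3, times = 4), rep(4, times = 2)); x
# or leave out "times =", as below
x <- c(rep(1, 5), rep(2, 3), rep(3, 4), rep(4, 2)); x
Key point: using rep(). rep(a, b) repeats a b times.

Ex2.4
n <- 5; H <- array(0, dim = c(n, n))
for (i in 1:n) for (j in 1:n) H[i, j] <- 1/(i + j - 1); H
G <- solve(H); G     # inverse of H
ev <- eigen(H); ev   # eigenvalues and eigenvectors of H
Key points: initializing an array; using for loops.
Still unsolved: how do I type a long command (such as a for loop) over several lines and only then run it? Every time I want to start a new line and press Enter, the unfinished command gets executed.

Ex2.5
StudentData <- data.frame(name = c("zhangsan", "lisi", "wangwu", "zhaoliu", "dingyi"), sex = c("F", "M", "F", "M", "F"), age = c(14, 15, 16, […]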
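On the open question in Ex2.4: the R console runs what you typed only once the expression is syntactically complete. While a parenthesis or brace is still open, it shows the continuation prompt "+" and keeps reading. Wrapping the loop bodies in { } therefore lets a command span several lines. Below is a minimal sketch of Ex2.4 written that way. The outer() version and the checks at the end are my own additions, not part of the answer above.

```r
# Ex2.4 again: the 5x5 Hilbert matrix, typed over several lines.
# While a "{" is still open the console shows "+" and waits instead of running.
n <- 5
H <- array(0, dim = c(n, n))
for (i in 1:n) {
  for (j in 1:n) {
    H[i, j] <- 1 / (i + j - 1)
  }
}

# Vectorised alternative: outer() evaluates the formula for every (i, j) pair.
H2 <- outer(1:n, 1:n, function(i, j) 1 / (i + j - 1))

G <- solve(H)    # inverse of H
ev <- eigen(H)   # eigenvalues and eigenvectors of H

print(all.equal(H, H2))             # loop and outer() give the same matrix
print(all.equal(H %*% G, diag(n)))  # H times its inverse is the identity, up to rounding
print(ev$values)                    # all positive: H is positive definite
```

Longer code can also be kept in a script file and run with source(), as is done with data_outline.R later on.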


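[…] and store it in the data frame StudentData_a.
write.csv(StudentData_a, "studentdata.csv")   # write StudentData_a to the working directory as studentdata.csv, which can be opened in Excel
Key points: reading and writing files with read.table(file), write.table(Rdata, file), read.csv(file) and write.csv(Rdata, file). The name of an external file, whether it is read in or written out, must be enclosed in double quotes in the command.

Ex2.7
Fun <- function(n) if (n <= 0) list(fail = "please input an integer above 0!" […]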
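The read/write functions listed in the key points above can be tried out without touching the working directory. This sketch does a small round trip through a temporary CSV file. The data frame is a made-up toy example, not the Ex2.5 table. tempfile() is used only so nothing real gets overwritten.

```r
# Hypothetical toy data (not the Ex2.5 table).
df <- data.frame(name = c("zhangsan", "lisi"),
                 age  = c(14L, 15L),
                 stringsAsFactors = FALSE)
path <- tempfile(fileext = ".csv")   # throwaway file instead of the working directory

# By default write.csv() also writes the row names as an unnamed first column,
# which read.csv() reads back as an extra column called "X".
write.csv(df, path)
with_rn <- read.csv(path, stringsAsFactors = FALSE)
print(names(with_rn))

# row.names = FALSE writes only the data columns, so the round trip is clean.
write.csv(df, path, row.names = FALSE)
back <- read.csv(path, stringsAsFactors = FALSE)
print(back)
```

The same pattern works with read.table()/write.table(). write.csv() is a wrapper around write.table() with sep = ",".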

[…].5 79.5 68.8 75.0 78.8 72.0 68.8 76.5 73.5 72.7 75.0 70.4
78.0 78.8 74.3 64.3 76.5 74.3 74.7 70.4 72.7 76.5 70.4 72.0 75.8 75.8 70.4
76.5 65.0 77.2 73.5 72.7 80.5 72.0 65.0 80.3 71.2 77.6 76.5 68.8 73.5 77.2
80.5 72.0 74.3 69.7 81.2 67.3 81.6 67.3 72.7 84.3 69.7 74.3 71.2 74.3 75.0
72.0 75.4 67.3 81.6 75.0 71.2 71.2 69.7 73.5 70.4 75.0 72.7 67.3 70.3 76.5
73.5 72.0 68.0 73.5 68.0 74.3 72.7 72.7 74.3 70.4

Write a function (program file data_outline.R) that computes the various descriptive statistics of a sample.
data_outline <- function(x) {
  n <- length(x)
  m <- mean(x)
  v <- var(x)
  s <- sd(x)
  me <- median(x)
  cv <- 100 * s / m
  css <- sum((x - m)^2)
  uss <- sum(x^2)
  R <- max(x) - min(x)
  R1 <- quantile(x, 3/4) - quantile(x, 1/4)
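  sm <- s / sqrt(n)
  g1 <- n / ((n - 1) * (n - 2)) * sum((x - m)^3) / s^3
  g2 <- ((n * (n + 1)) / ((n - 1) * (n - 2) * (n - 3)) * sum((x - m)^4) / s^4
         - (3 * (n - 1)^2) / ((n - 2) * (n - 3)))
  data.frame(N = n, Mean = m, Var = v, std_dev = s, Median = me, std_mean = sm,
             CV = cv, CSS = css, USS = uss, R = R, R1 = R1,
             Skewness = g1, Kurtosis = g2, row.names = 1)
}
Start R, then:
source("data_outline.R")                  # load the program into memory
serumdata <- scan("3.1.txt"); serumdata   # read the data into the vector serumdata
data_outline(serumdata)
The result is:
    N   Mean      Var  std_dev Median  std_mean       CV      CSS      USS  R
1 100 73.696 15.41675 3.926417   73.5 0.3926417 5.327857 1526.258 544636.3 20
   R1   Skewness   Kurtosis
1 4.6 0.03854249 0.07051809
Key points: read.table() reads files laid out as tables. The seventh row of the data above is missing several values, so read.table() cannot read it. scan() reads a plain-text file directly. Combined with matrix(), it can also store the data as a matrix:
X <- matrix(scan("3.1.txt", 0), ncol = 10, byrow = TRUE)   # arrange the data above as a 10×10 matrix
scan() can also take data typed directly at the console:
Y <- scan()

Ex3.2
hist(serumdata, freq = FALSE, col = "purple", border = "red", density = 3, angle = 60, main = paste("the histogram of serumdata"), xlab = "age", ylab = "frequency")   # histogram
In hist(), the arguments work as follows:
col is the fill colour (default: no fill).
border is the colour of the bar borders (default: the foreground colour).
density draws hatching lines in the bars (default: none).
angle is the slope of the hatching lines, measured counter-clockwise (default: 45 degrees).
main, xlab and ylab are the title and the x- and y-axis labels.
lines(density(serumdata), col = "blue")   # kernel density estimate
x <- …
lines(x, dnorm(x, mean(serumdata), sd(serumdata)), col = "green")   # normal density curve
plot(ecdf(serumdata), verticals = TRUE, do.points = FALSE)         # empirical distribution function
lines(x, pnorm(x, mean(serumdata), sd(serumdata)), col = "blue")    # normal distribution function
qqnorm(serumdata, col = "purple")   # normal QQ plot
qqline(serumdata, col = "red")      # QQ reference line

Ex3.3
stem(serumdata, scale = 1)   # stem-and-leaf plot; digits after the decimal point are rounded
  The decimal point is at the |
  64 | 300
  66 | 23333
  68 | 00888777
  70 |
  72 | 5
  74 |
  76 |
  78 | 0888555
  80 | 355266
  82 |
  84 | 3
boxplot(serumdata, col = "lightblue", notch = T)   # box plot; notch = T draws a notched box
fivenum(serumdata)   # five-number summary
[1] 64.3 71.2 73.5 75.8 84.3

Ex3.4
shapiro.test(serumdata)   # Shapiro-Wilk normality test
Shapiro-Wilk normality test
data:  serumdata
W = 0.9897, p-value = 0.6437
Conclusion: p > 0.05, so the data can be regarded as coming from a normal population.
ks.test(serumdata, pnorm, mean(serumdata), sd(serumdata))   # Kolmogorov-Smirnov test for normality
One-sample Kolmogorov-Smirnov test
data:  serumdata
D = 0.0701, p-value = 0.7097
alternative hypothesis: two-sided
Warning message:
In ks.test(serumdata, pnorm, mean(serumdata), sd(serumdata)) :
  cannot compute correct p-values with ties
Conclusion: p > 0.05, so the data can be regarded as coming from a normal population. The warning appears because the data contain repeated values (ties). The KS test assumes continuous data, in which ties should not occur.

Ex3.5
y <- …; f <- …
plot(f, y, col = "lightgreen")   # plot() with a factor on the x axis draws box plots
x <- …; y <- …; z <- …
boxplot(x, y, z, names = c(1, 2, 3), col = c(5, 6, 7))   # boxplot() draws box plots
Conclusion: groups 2 and 3 show no obvious difference; group 1 differs clearly from the other two.

Ex3.6
Too much data to type in, so I skipped it. A scatter plot can simply be drawn with plot().

Ex3.7
studata <- …
data.frame(studata)   # convert to a data frame
   V1      V2 V3 V4   V5    V6
1   1   alice  f 13 56.5  84.0
2   2   becka  f 13 65.3  98.0
3   3    gail  f 14 64.3  90.0
4   4   karen  f 12 56.3  77.0
5   5   kathy  f 12 59.8  84.5
6   6    mary  f 15 66.5 112.0
7   7   sandy  f 11 51.3  50.5
8   8  sharon  f 15 62.5 112.5
9   9   tammy  f 14 62.8 102.5
10 10  alfred  m 14 69.0 112.5
11 11    duke  m 14 63.5 102.5
12 12   guido  m 15 67.0 133.0
13 13   james  m 12 57.3  83.0
14 14 jeffery  m 13 62.5  84.0
15 15    john  m 12 59.0  99.5
16 16  philip  m 16 72.0 150.0
17 17  robert  m 12 64.8 128.0
18 18  thomas  m 11 57.5  85.0
19 19 william  m 15 66.5 112.0
names(studata) <- …
attach(studata)   # attach the data frame so its columns can be referred to by name
plot(weight ~ height, col = "red")                   # scatter plot of weight against height
coplot(weight ~ height | sex, col = "blue")          # weight vs height, by sex
coplot(weight ~ height | age, col = "blue")          # weight vs height, by age
coplot(weight ~ height | age + sex, col = "blue")    # weight vs height, by age and sex

Ex3.8
x <- …; y <- …; f <- …; z <- …
contour(x, y, z, levels = c(0, 1, 2, 3, 4, 5, 10, 15, 20, 30, 40, 50, 60, 80, 100), col = "blue")   # 2-D contour plot
persp(x, y, z, theta = 120, phi = 0, expand = 0.7, col = "lightblue")   # 3-D perspective (mesh) surface

Ex3.9
attach(studata)
cor.test(height, weight)   # Pearson correlation test
Pearson's product-moment correlation
data:  height and weight
t = 7.5549, df = 17, p-value = 7.887e-07
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.7044314 0.9523101
sample estimates:
      cor
0.8777852
So height and weight are correlated.

Ex4.2
For the exponential distribution, the maximum-likelihood estimate of λ is n/ΣX_i, i.e. 1/x̄.
x <- …; lamda <- …; x <- …
mean(x)
[1] 1
The mean is 1.

Ex4.4
obj <- function(x) { f <- … }
x0 <- …
nlm(obj, x0)
$minimum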
[1] 48.98425
$estimate
[1] 11.4127791 -0.8968052
$gradient
[1]  1.411401e-08 -1.493206e-07
$code
[1] 1
$iterations
[1] 16

Ex4.5
x <- …
t.test(x)   # one-sample t interval for a normal mean
One Sample t-test
data:  x
t = 35.947, df = 9, p-value = 4.938e-11
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 63.1585 71.6415
sample estimates:
mean of x
     67.4
The point estimate of the mean pulse rate is 67.4; the 95% interval estimate is [63.1585, 71.6415].
t.test(x, alternative = "less", mu = 72)   # one-sided interval for a normal mean
One Sample t-test
data:  x
t = -2.4534, df = 9, p-value = 0.01828
alternative hypothesis: true mean is less than 72
95 percent confidence interval:
     -Inf 70.83705
sample estimates:
mean of x
     67.4
The p-value is below 0.05, so we reject the null hypothesis: the mean pulse rate is lower than that of ordinary people.
Key points: how to use t.test(). This example is a single sample; both two-sided and one-sided tests are possible.

Ex4.6
x <- …; y <- …
t.test(x, y, var.equal = TRUE)
Two Sample t-test
data:  x and y
t = 4.6287, df = 18, p-value = 0.0002087
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  7.53626 20.06374
sample estimates:
mean of x mean of y
    140.6     126.8
The 95% confidence interval for the difference in means is [7.53626, 20.06374].
Key points: t.test() can estimate the difference between the means of two normal samples. This example assumes the two samples have equal variances.
PS: shouldn't this exercise use a paired t test?

Ex4.7
x <- …; y <- …
t.test(x, y, var.equal = TRUE)
Two Sample t-test
data:  x and y
t = 1.198, df = 7, p-value = 0.2699
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.… 0.…
sample estimates:
mean of x mean of y
  0.14125   0.13920
The 95% interval estimate for the difference in means is [-0.…, 0.…].

Ex4.8 (continuing Ex4.6)
var.test(x, y)
F test to compare two variances
data:  x and y
F = 0.2353, num df = 9, denom df = 9, p-value = 0.04229
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.05845276 0.94743902
sample estimates:
ratio of variances
         0.2353305
Key points: var.test() estimates the ratio of two sample variances. Based on this result (p < 0.05), the variances can be considered unequal. So in Ex4.6 the difference in means should be estimated with the unequal-variance version:
t.test(x, y)
Welch Two Sample t-test
data:  x and y
t = 4.6287, df = 13.014, p-value = 0.0004712
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  7.359713 20.240287
sample estimates:
mean of x mean of y
    140.6     126.8
The 95% confidence interval for the difference in means is [7.359713, 20.240287].
Key points: t.test(x, y, var.equal = TRUE) estimates the difference in means of two normal samples with equal variances. t.test(x, y) (Welch's test) does the same for unequal variances.

Ex4.9
x <- …; n <- …; tmp <- …
mean(x)
[1] 1.904762
mean(x) - tmp; mean(x) + tmp
[1] 1.494041
[1] 2.315483
The mean number of calls is 1.90; the 0.95 confidence interval is [1.49, 2.32].

Ex4.10
x <- …
t.test(x, alternative = "greater")
One Sample t-test
data:  x
t = 23.9693, df = 9, p-value = 9.148e-10
alternative hypothesis: true mean is greater than 0
95 percent confidence interval:
 920.8443      Inf
sample estimates:
mean of x
    997.1
The one-sided 95% lower confidence limit for the mean bulb lifetime is 920.8443.
Key point: t.test() can give one-sided confidence intervals.

Statistical Modeling and R Software, Chapter 5 exercise answers (hypothesis testing)

Ex5.1
x <- …
t.test(x, mu = 225)
One Sample t-test
data:  x
t = -3.4783, df = 19, p-value = 0.002516
alternative hypothesis: true mean is not equal to 225
95 percent confidence interval:
 172.3827 211.9173
sample estimates:
mean of x
   192.15
Null hypothesis: the platelet count of painters does not differ from that of normal adult men.
Alternative hypothesis: the platelet count of painters differs from that of normal adult men.
The p-value is below 0.05, so we reject the null hypothesis and conclude that painters' platelet counts differ from those of normal adult men.
The test above is two-sided. A one-sided test can also be used, with the alternative hypothesis that the painters' platelet count is lower than that of normal adult men:
t.test(x, mu = 225, alternative = "less")
One Sample t-test
data:  x
t = -3.4783, df = 19, p-value = 0.001258
alternative hypothesis: true mean is less than 225
95 percent confidence interval:
     -Inf 208.4806
sample estimates:
mean of x
   192.15
This leads to the same conclusion: the painters' platelet count is lower than that of normal adult men.

Ex5.2
x <- c(1067, 919, 1196, 785, 1126, 936, 918, 1156, 920, 948)
pnorm(1000, mean(x), sd(x))
[1] 0.5087941
So P(X ≤ 1000) ≈ 0.509, and the probability of exceeding 1000 is 1 - 0.5087941 ≈ 0.491.

Ex5.3
A <- …; B <- …
t.test(A, B, paired = TRUE)
Paired t-test
data:  A and B
t = -0.6513, df = 7, p-value = 0.5357
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -15.62889   8.87889
sample estimates:
mean of the differences
                 -3.375
The p-value is above 0.05, so we accept the null hypothesis: the two treatments do not differ.

Ex5.4
(1) Shapiro-Wilk W test for normality:
x <- …; y <- …
shapiro.test(x)
Shapiro-Wilk normality test
data:  x
W = 0.9699, p-value = 0.7527
shapiro.test(y)
Shapiro-Wilk normality test
data:  y
W = 0.971, p-value = 0.7754
KS test:
ks.test(x, pnorm, mean(x), sd(x))
One-sample Kolmogorov-Smirnov test
data:  x
D = 0.1065, p-value = 0.977
alternative hypothesis: two-sided
Warning message:
In ks.test(x, pnorm, mean(x), sd(x)) :
  cannot compute correct p-values with ties
ks.test(y, pnorm, mean(y), sd(y))
One-sample Kolmogorov-Smirnov test
data:  y
D = 0.1197, p-value = 0.9368
alternative hypothesis: two-sided
Warning message:
In ks.test(y, pnorm, mean(y), sd(y)) :
  cannot compute correct p-values with ties
Pearson goodness-of-fit test, using x as an example:
sort(x)
 [1] -5.6 -1.6 -1.4 -0.7 -0.5  0.4  0.7  1.7  2.0  2.5  2.5  2.8  3.0  3.5  4.0
[16]  4.5  4.6  5.8  6.0  7.1
x1 <- …
p <- …
p
[1] 0.04894712 0.24990009 0.6288 0.90075856 0.98828138
p <- …
chisq.test(x1, p = p)
Chi-squared test for given probabilities
data:  x1
X-squared = 0.5639, df = 4, p-value = 0.967
Warning message:
In chisq.test(x1, p = p) : Chi-squared approximation may be incorrect
The p-value is 0.967, so we accept the null hypothesis: x is consistent with a normal distribution.
(2) t test under the equal-variance model:
t.test(x, y, var.equal = TRUE)
Two Sample t-test
data:  x and y
t = -0.6419, df = 38, p-value = 0.5248
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.326179  1.206179
sample estimates:
mean of x mean of y
    2.065     2.625
t test under the unequal-variance model:
t.test(x, y)
Welch Two Sample t-test
data:  x and y
t = -0.6419, df = 36.086, p-value = 0.525
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.32926  1.20926
sample estimates:
mean of x mean of y
    2.065     2.625
Paired t test:
t.test(x, y, paired = TRUE)
Paired t-test
data:  x and y
t = -0.6464, df = 19, p-value = 0.5257
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.373146  1.253146
sample estimates:
mean of the differences
                  -0.56
All three tests show no difference between the means of the two groups.
(3) Test of variances:
var.test(x, y)
F test to compare two variances
data:  x and y
F = 1.5984, num df = 19, denom df = 19,
p-value = 0.3153
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.6326505 4.0381795
sample estimates:
ratio of variances
          1.598361
We accept the null hypothesis: the two groups have equal variances.

Ex5.5
a <- …; b <- …
ks.test(a, pnorm, mean(a), sd(a))
One-sample Kolmogorov-Smirnov test
data:  a
D = 0.1464, p-value = 0.9266
alternative hypothesis: two-sided
ks.test(b, pnorm, mean(b), sd(b))
One-sample Kolmogorov-Smirnov test
data:  b
D = 0.2222, p-value = 0.707
alternative hypothesis: two-sided
Warning message:
In ks.test(b, pnorm, mean(b), sd(b)) :
  cannot compute correct p-values with ties
Both a and b can be regarded as normally distributed.
Test for homogeneity of variances:
var.test(a, b)
F test to compare two variances
data:  a and b
F = 1.9646, num df = 11, denom df = 9, p-value = 0.3200
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.5021943 7.0488630
sample estimates:
ratio of variances
          1.964622
The variances of a and b can be considered equal, so we use the equal-variance t test:
t.test(a, b, var.equal = TRUE)
Two Sample t-test
data:  a and b
t = -8.8148, df = 20, p-value = 2.524e-08
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -48.24975 -29.78358
sample estimates:
mean of x mean of y
 125.5833  164.6000
The two can be considered different.

Ex5.6
Hypothesis test for a binomial population:
binom.test(57, 400, p = 0.147)
Exact binomial test
data:  57 and 400
number of successes = 57, number of trials = 400, p-value = 0.8876
alternative hypothesis: true probability of success is not equal to 0.147
95 percent confidence interval:
 0.1097477 0.1806511
sample estimates:
probability of success
                0.1425
The p-value is above 0.05, so we accept the null hypothesis: the survey results support the city's claim about its elderly population.

Ex5.7
Hypothesis test for a binomial population:
binom.test(178, 328, p = 0.5, alternative = "greater")
Exact binomial test
data:  178 and 328
number of successes = 178, number of trials = 328, p-value = 0.06794
alternative hypothesis: true probability of success is greater than 0.5
95 percent confidence interval:
 0.4957616 1.0000000
sample estimates:
probability of success
             0.5426829
We cannot conclude that this treatment increases the proportion of hens.

Ex5.8
Using Pearson's chi-squared test to check whether the data follow a specified distribution:
chisq.test(c(315, 101, 108, 32), p = c(9, 3, 3, 1)/16)
Chi-squared test for given probabilities
data:

