Fundamentals of Statistical Programming with R, Chapter 7: Significance Tests

투정이 2009. 3. 25. 17:25

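# Fundamentals of Statistical Programming with R

# Chapter 7: Significance Tests

# 7.1 Statistical hypothesis testing

# 7.2 One-sample t-test

# 7.2.1 Small sample (n <= 30), population variance (sigma^2) unknown

# Example 7.1: Is the mean body weight of female monkeys 8.5 kg?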

x = c(8.3, 9.5, 9.6, 8.75, 8.4, 9.1, 8.15, 8.8)

summary(x)

t.test(x, mu = 8.5) # t-test of H0: the population mean is 8.5
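The statistic reported by `t.test` can be reproduced by hand, which makes the role of the sample standard deviation and of the n - 1 degrees of freedom explicit. The following is a minimal sketch using the data of Example 7.1; the names `mu0`, `t_stat`, and `p_value` are introduced here for illustration only.

```r
# Data from Example 7.1
x <- c(8.3, 9.5, 9.6, 8.75, 8.4, 9.1, 8.15, 8.8)
mu0 <- 8.5            # hypothesized mean (H0: mu = mu0); illustrative name
n <- length(x)

# t statistic: (sample mean - mu0) / estimated standard error of the mean
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

c(t = t_stat, df = n - 1, p.value = p_value)
```

The values should agree with the `t`, `df`, and `p-value` fields printed by `t.test(x, mu = 8.5)`.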

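# 7.2.2 Population variance (sigma^2) known

# Example 7.2: Is the mean body weight of female monkeys 8.5 kg, given that the population variance is 1.0?

x = c(8.3, 9.5, 9.6, 8.75, 8.4, 9.1, 8.15, 8.8) # data
sigma2 = 1 # population variance (named sigma2 to avoid reusing the name of base R's var() function)

z = (mean(x) - 8.5) / sqrt(sigma2 / length(x)) # test statistic; the denominator is the standard error sigma/sqrt(n)

pz = pnorm(abs(z), 0, 1) # P(Z <= |z|) under the standard normal distribution

pvalue = (1 - pz) * 2 # doubled because the test is two-sided

pvalue # greater than 0.05, so the null hypothesis (mu = 8.5) is not rejected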

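install.packages('UsingR', dependencies=TRUE) # run once in an interactive session; a CRAN mirror must be selected
library(UsingR)

simple.z.test(x, 1, conf.level = 0.95) # z-based counterpart of t.test for a known sigma; the second argument is the population standard deviation (here sqrt(1) = 1) and the result is a confidence interval for the mean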
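The z-test of Example 7.2 can also be wrapped in a small function so that the statistic, the two-sided p-value, and the confidence interval come from one place and no extra package is needed. This is a sketch only: `z_test_known_var` and its argument names are hypothetical and are not part of base R or UsingR.

```r
# One-sample z-test for a known population variance.
# Illustrative helper; the function name and arguments are assumptions.
z_test_known_var <- function(x, mu0, sigma2, conf.level = 0.95) {
  n <- length(x)
  se <- sqrt(sigma2 / n)                      # standard error of the mean
  z <- (mean(x) - mu0) / se                   # test statistic
  p_value <- 2 * pnorm(-abs(z))               # two-sided p-value, valid for either sign of z
  z_crit <- qnorm(1 - (1 - conf.level) / 2)   # critical value for the interval
  list(z = z,
       p.value = p_value,
       conf.int = mean(x) + c(-1, 1) * z_crit * se)
}

x <- c(8.3, 9.5, 9.6, 8.75, 8.4, 9.1, 8.15, 8.8)
res <- z_test_known_var(x, mu0 = 8.5, sigma2 = 1)
res
```

Using `pnorm(-abs(z))` rather than `1 - pnorm(z)` keeps the two-sided p-value correct when the sample mean falls below the hypothesized value.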

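# 7.3 Two-sample t-test

# 7.3.1 Univariate small samples (n <= 30), equal population variances sigma1^2 = sigma2^2 = sigma^2, with sigma^2 unknown

# 7.3.2 Univariate small samples (n <= 30), population variances unknown and sigma1^2 != sigma2^2

# Example 7.3: Is there a significant difference between the clotting times of hemostatic agents A and B?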
x1 = c(1.1, 2.3, 4.3, 2.2, 5.3)
x1
x2 = c(2.3, 4.3, 3.5)
x2

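# 1) Assuming the two groups have equal variances
t.test(x1, x2, var.equal = TRUE, alternative = 'two.sided') # pooled-variance t-test

# 2) Assuming the two groups have unequal variances
t.test(x1, x2, var.equal = FALSE, conf.level = 0.95) # Welch t-test

# 3) One-sided test 1, assuming equal variances (H1: mean of x1 > mean of x2)
t.test(x1, x2, var.equal = TRUE, alt = 'greater')

# 4) One-sided test 2, assuming equal variances (H1: mean of x1 < mean of x2)
t.test(x1, x2, var.equal = TRUE, alt = 'less')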
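The difference between the equal-variance and unequal-variance calls comes down to how the standard error and the degrees of freedom are computed. The sketch below recomputes both by hand for the data of Example 7.3; all variable names other than `x1` and `x2` are introduced here for illustration.

```r
x1 <- c(1.1, 2.3, 4.3, 2.2, 5.3)
x2 <- c(2.3, 4.3, 3.5)
n1 <- length(x1)
n2 <- length(x2)

# Pooled-variance t statistic (assumes equal population variances)
sp2 <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)
t_pooled <- (mean(x1) - mean(x2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
df_pooled <- n1 + n2 - 2

# Welch t statistic with Satterthwaite degrees of freedom (variances may differ)
v1 <- var(x1) / n1
v2 <- var(x2) / n2
t_welch <- (mean(x1) - mean(x2)) / sqrt(v1 + v2)
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# p-values for the pooled test under the three alternatives
p_two     <- 2 * pt(-abs(t_pooled), df_pooled)
p_less    <- pt(t_pooled, df_pooled)                      # alternative = 'less'
p_greater <- pt(t_pooled, df_pooled, lower.tail = FALSE)  # alternative = 'greater'

c(t_pooled = t_pooled, df_pooled = df_pooled, t_welch = t_welch, df_welch = df_welch)
```

The two one-sided p-values always sum to 1, so only one of them can be small for a given sample.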

# Example 7.4: Does the nitrogen content of pumpkin leaves differ by culture method?
dd = read.table('table_7_4.txt', header = TRUE)

# 1) Assuming equal variances
t.test(x ~ method, var.equal = TRUE, data = dd)

# 2) Assuming unequal variances
t.test(dd$x ~ dd$method, var.equal = FALSE)
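The file `table_7_4.txt` is not reproduced in this post. The formula interface `x ~ method` only needs a data frame in long format with a numeric column `x` and a two-level grouping column `method`. The sketch below shows that layout with made-up numbers; the values, `dd_demo`, `dd_in`, and the temporary file are assumptions for illustration and are not the book's data.

```r
# Hypothetical values for illustration only; NOT the contents of table_7_4.txt
dd_demo <- data.frame(
  x      = c(25.1, 27.3, 26.8, 28.0, 24.9, 17.2, 15.8, 16.9, 16.1),
  method = c(1, 1, 1, 1, 1, 2, 2, 2, 2)
)

# Write the data and read it back in the layout read.table(..., header = TRUE) expects:
# a header line with the column names, then one observation per row
tmp <- tempfile(fileext = ".txt")
write.table(dd_demo, tmp, row.names = FALSE, quote = FALSE)
dd_in <- read.table(tmp, header = TRUE)

# 'method' is read as a number, but the formula interface treats it as a grouping variable
res <- t.test(x ~ method, var.equal = TRUE, data = dd_in)
res$parameter   # degrees of freedom: n1 + n2 - 2
```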

# 7.4 Two-sample F-test for the ratio of variances

# Example 7.5: Are the variances of hemostatic agents A and B from Example 7.3 equal?
x1 = c(1.1, 2.3, 4.3, 2.2, 5.3)
x1
x2 = c(2.3, 4.3, 3.5)
x2

var.test(x1, x2)
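The F statistic is simply the ratio of the two sample variances, referred to an F distribution with n1 - 1 and n2 - 1 degrees of freedom. A minimal sketch of the two-sided test for the data of Example 7.5 follows; `f_stat`, `df1`, `df2`, `p_lower`, and `p_value` are illustrative names.

```r
x1 <- c(1.1, 2.3, 4.3, 2.2, 5.3)
x2 <- c(2.3, 4.3, 3.5)

f_stat <- var(x1) / var(x2)      # ratio of sample variances
df1 <- length(x1) - 1            # numerator degrees of freedom
df2 <- length(x2) - 1            # denominator degrees of freedom

# Two-sided p-value: twice the smaller of the two tail probabilities
p_lower <- pf(f_stat, df1, df2)
p_value <- 2 * min(p_lower, 1 - p_lower)

c(F = f_stat, num.df = df1, denom.df = df2, p.value = p_value)
```

With only five and three observations the test has very little power, so a large p-value here is weak evidence that the variances are actually equal.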

# 7.5 t-test for paired samples
pre = c(77, 56, 64, 60, 58, 72, 67, 78, 67, 79)
post = c(99, 80, 78, 65, 59, 67, 65, 85, 74, 80)

t.test(post, pre, paired = TRUE)
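A paired t-test is a one-sample t-test applied to the within-pair differences. The short sketch below checks this equivalence on the `pre` and `post` data; `d`, `paired_res`, and `diff_res` are names introduced for illustration.

```r
pre  <- c(77, 56, 64, 60, 58, 72, 67, 78, 67, 79)
post <- c(99, 80, 78, 65, 59, 67, 65, 85, 74, 80)

d <- post - pre                                   # within-pair differences

paired_res <- t.test(post, pre, paired = TRUE)    # paired test on the two vectors
diff_res   <- t.test(d, mu = 0)                   # one-sample test on the differences

c(paired = unname(paired_res$statistic),
  one_sample = unname(diff_res$statistic),
  mean_diff = mean(d))
```

Because the test works on differences, the order of the arguments only changes the sign of the statistic and of the confidence interval, not the p-value.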

# 7.6 Test for a single proportion

# Example 7.7: 110 of 150 mosquitoes died; is the insecticide 85% effective?

# 1) Two-sided alternative hypothesis
prop.test(x = 110, n = 150, p = 0.85, alt = 'two.sided')

# 2) One-sided alternative hypothesis
prop.test(x = 110, n = 150, p = 0.85, alt = 'less')
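`prop.test` reports a chi-squared statistic with a continuity correction by default. The sketch below recomputes that statistic and both p-values for Example 7.7 so the link between the chi-squared and normal forms of the test is visible; `p0`, `yates`, `chi2`, `z`, `p_two`, and `p_less` are illustrative names.

```r
x <- 110     # number of mosquitoes that died
n <- 150     # number of mosquitoes tested
p0 <- 0.85   # hypothesized proportion under H0

# Continuity correction of 0.5, capped so it never exceeds |x - n * p0|
yates <- min(0.5, abs(x - n * p0))

# Chi-squared statistic with 1 degree of freedom
chi2 <- (abs(x - n * p0) - yates)^2 / (n * p0 * (1 - p0))
p_two <- pchisq(chi2, df = 1, lower.tail = FALSE)   # two-sided p-value

# Signed square root gives the equivalent z statistic for one-sided tests
z <- sign(x / n - p0) * sqrt(chi2)
p_less <- pnorm(z)                                  # alternative = 'less'

c(X.squared = chi2, p.two.sided = p_two, p.less = p_less)
```

Since the observed proportion is below 0.85, the one-sided 'less' p-value is half the two-sided one.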

# 7.7 Test for two proportions

# Example 7.8: Do the candidate's support rates differ between two cities?

phat = c(100/300, 170/400)
n = c(300, 400)
prop.test(n * phat, n, alt = 'two.sided')

prop.test(c(100,170), c(300,400), alt = 'two.sided')
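For two groups, the statistic from `prop.test` is the continuity-corrected chi-squared statistic of the 2 x 2 table of supporters and non-supporters, with expected counts taken from the pooled proportion. A sketch of that calculation for Example 7.8 follows; apart from `n`, all names are introduced here for illustration.

```r
x <- c(100, 170)    # supporters in each city
n <- c(300, 400)    # sample size in each city

p_pool <- sum(x) / sum(n)                        # pooled support rate under H0
observed <- cbind(x, n - x)                      # supporters, non-supporters
expected <- cbind(n * p_pool, n * (1 - p_pool))  # expected counts under H0

# Continuity correction, capped at the smallest absolute deviation
yates <- min(0.5, abs(observed - expected))

chi2 <- sum((abs(observed - expected) - yates)^2 / expected)
p_value <- pchisq(chi2, df = 1, lower.tail = FALSE)

c(X.squared = chi2, p.value = p_value)
```

Passing counts directly, as in `prop.test(c(100, 170), c(300, 400))`, avoids the rounding risk of reconstructing counts from `n * phat`.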

Program output

R version 2.8.1 (2008-12-22)
Copyright (C) 2008 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
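> # Fundamentals of Statistical Programming with R
>
> # Chapter 7: Significance Tests
>
> # 7.1 Statistical hypothesis testing
>
> # 7.2 One-sample t-test
>
> # 7.2.1 Small sample (n <= 30), population variance (sigma^2) unknown
>
> # Example 7.1: Is the mean body weight of female monkeys 8.5 kg?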
>
> x = c(8.3, 9.5, 9.6, 8.75, 8.4, 9.1, 8.15, 8.8)
>
> summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
8.150 8.375 8.775 8.825 9.200 9.600
>
> t.test(x, mu = 8.5) # t-test of H0: the population mean is 8.5

One Sample t-test

data: x
t = 1.6986, df = 7, p-value = 0.1332
alternative hypothesis: true mean is not equal to 8.5
95 percent confidence interval:
8.372577 9.277423
sample estimates:
mean of x
8.825

>
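> # 7.2.2 Population variance (sigma^2) known
>
> # Example 7.2: Is the mean body weight of female monkeys 8.5 kg, given that the population variance is 1.0?
>
> x = c(8.3, 9.5, 9.6, 8.75, 8.4, 9.1, 8.15, 8.8) # data
> sigma2 = 1 # population variance (named sigma2 to avoid reusing the name of base R's var() function)
>
> z = (mean(x) - 8.5) / sqrt(sigma2 / length(x)) # test statistic; the denominator is the standard error sigma/sqrt(n)
>
> pz = pnorm(abs(z), 0, 1) # P(Z <= |z|) under the standard normal distribution
>
> pvalue = (1 - pz) * 2 # doubled because the test is two-sided
>
> pvalue # greater than 0.05, so the null hypothesis (mu = 8.5) is not rejected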
[1] 0.3579707
>
> install.packages('UsingR', dependencies=TRUE)
--- Please select a CRAN mirror for use in this session ---
Error in contrib.url(repos, type) :
trying to use CRAN without setting a mirror
> library(UsingR)
>
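> simple.z.test(x, 1, conf.level = 0.95) # z-based counterpart of t.test for a known sigma; the second argument is the population standard deviation (here sqrt(1) = 1) and the result is a confidence interval for the mean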
[1] 8.132048 9.517952
>
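> # 7.3 Two-sample t-test
>
> # 7.3.1 Univariate small samples (n <= 30), equal population variances sigma1^2 = sigma2^2 = sigma^2, with sigma^2 unknown
>
> # 7.3.2 Univariate small samples (n <= 30), population variances unknown and sigma1^2 != sigma2^2
>
> # Example 7.3: Is there a significant difference between the clotting times of hemostatic agents A and B?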
> x1 = c(1.1, 2.3, 4.3, 2.2, 5.3)
> x1
[1] 1.1 2.3 4.3 2.2 5.3
> x2 = c(2.3, 4.3, 3.5)
> x2
[1] 2.3 4.3 3.5
>
> # 1) Assuming the two groups have equal variances
> t.test(x1, x2, var.equal = TRUE, alternative = 'two.sided') # t-test

Two Sample t-test

data: x1 and x2
t = -0.2956, df = 6, p-value = 0.7775
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.030714 2.377381
sample estimates:
mean of x mean of y
3.040000 3.366667

>
> # 2) Assuming the two groups have unequal variances
> t.test(x1, x2, var.equal = FALSE, conf.level = 0.95) # t-test

Welch Two Sample t-test

data: x1 and x2
t = -0.34, df = 5.972, p-value = 0.7455
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.680674 2.027341
sample estimates:
mean of x mean of y
3.040000 3.366667

>
> # 3) One-sided test 1, assuming equal variances (H1: mean of x1 > mean of x2)
> t.test(x1, x2, var.equal = TRUE, alt = 'greater')

Two Sample t-test

data: x1 and x2
t = -0.2956, df = 6, p-value = 0.6113
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
-2.474048 Inf
sample estimates:
mean of x mean of y
3.040000 3.366667

>
> # 4) One-sided test 2, assuming equal variances (H1: mean of x1 < mean of x2)
> t.test(x1, x2, var.equal = TRUE, alt = 'less')

Two Sample t-test

data: x1 and x2
t = -0.2956, df = 6, p-value = 0.3887
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
-Inf 1.820714
sample estimates:
mean of x mean of y
3.040000 3.366667

>
> # Example 7.4: Does the nitrogen content of pumpkin leaves differ by culture method?
> dd = read.table('table_7_4.txt', header = TRUE)
>
> # 1) Assuming equal variances
> t.test( x ~ method, var.equal = TRUE, data = dd)

Two Sample t-test

data: x by method
t = 4.0763, df = 7, p-value = 0.004712
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
4.369086 16.440914
sample estimates:
mean in group 1 mean in group 2
26.780 16.375

>
> # 2) Assuming unequal variances
> t.test( dd$x ~ dd$method, var.equal = FALSE)

Welch Two Sample t-test

data: dd$x by dd$method
t = 4.596, df = 4.23, p-value = 0.00881
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
4.251762 16.558238
sample estimates:
mean in group 1 mean in group 2
26.780 16.375

>
> # 7.4 Two-sample F-test for the ratio of variances
>
> # Example 7.5: Are the variances of hemostatic agents A and B from Example 7.3 equal?
> x1 = c(1.1, 2.3, 4.3, 2.2, 5.3)
> x1
[1] 1.1 2.3 4.3 2.2 5.3
> x2 = c(2.3, 4.3, 3.5)
> x2
[1] 2.3 4.3 3.5
>
> var.test(x1, x2)

F test to compare two variances

data: x1 and x2
F = 2.8895, num df = 4, denom df = 2, p-value = 0.5465
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.07362013 30.77032496
sample estimates:
ratio of variances
2.889474

>
> # 7.5 t-test for paired samples
> pre = c(77, 56, 64, 60, 58, 72, 67, 78, 67, 79)
> post = c(99, 80, 78, 65, 59, 67, 65, 85, 74, 80)
>
> t.test(post, pre, paired = TRUE)

Paired t-test

data: post and pre
t = 2.3906, df = 9, p-value = 0.04052
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.3974552 14.4025448
sample estimates:
mean of the differences
7.4

>
> # 7.6 Test for a single proportion
>
> # Example 7.7: 110 of 150 mosquitoes died; is the insecticide 85% effective?
>
> # 1) Two-sided alternative hypothesis
> prop.test(x = 110, n = 150, p = 0.85, alt = 'two.sided')

1-sample proportions test with continuity correction

data: 110 out of 150, null probability 0.85
X-squared = 15.1111, df = 1, p-value = 0.0001014
alternative hypothesis: true p is not equal to 0.85
95 percent confidence interval:
0.6538678 0.8006060
sample estimates:
p
0.7333333

>
> # 2) One-sided alternative hypothesis
> prop.test(x = 110, n = 150, p = 0.85, alt = 'less')

1-sample proportions test with continuity correction

data: 110 out of 150, null probability 0.85
X-squared = 15.1111, df = 1, p-value = 5.068e-05
alternative hypothesis: true p is less than 0.85
95 percent confidence interval:
0.000000 0.791249
sample estimates:
p
0.7333333

>
> # 7.7 Test for two proportions
>
> # Example 7.8: Do the candidate's support rates differ between two cities?
>
> phat = c(100/300, 170/400)
> n = c(300, 400)
> prop.test(n * phat, n, alt = 'two.sided')

2-sample test for equality of proportions with continuity
correction

data: n * phat out of n
X-squared = 5.6988, df = 1, p-value = 0.01698
alternative hypothesis: two.sided
95 percent confidence interval:
-0.16664176 -0.01669158
sample estimates:
prop 1 prop 2
0.3333333 0.4250000

>
> prop.test(c(100,170), c(300,400), alt = 'two.sided')

2-sample test for equality of proportions with continuity
correction

data: c(100, 170) out of c(300, 400)
X-squared = 5.6988, df = 1, p-value = 0.01698
alternative hypothesis: two.sided
95 percent confidence interval:
-0.16664176 -0.01669158
sample estimates:
prop 1 prop 2
0.3333333 0.4250000

>
>
