다형질 모형의 차원 줄이기(Reduce the Dimension of Multivariate Models)

투정이 2020. 9. 29. 16:41

2020. 9. 29. 16:41

다형질 모형을 분석하는 것은 매우 큰 행렬을 다루는 것이다. 예를 1000마리를 유전평가한다고 하였을 때 single trait model의 경우 MME의 원소 수는 1000 * 1000으로 백만 개의 원소이나, 2 trait model의 경우 2000 * 2000으로 원소 수가 4백만 개로 늘어난다. n traits가 되면 원소 수는 n의 제곱이 된다. 그래서 다형질 모형을 자료 변환을 통하여 단형질 모형으로 분석할 방법을 고안하게 되었다.

여기서는 Canonical transformation을 통해서 다형질 모형을 단형질 모형으로 분석하는 방법을 소개한다. canonical transformaion matrix을 만들어 원래 자료에서 변환 자료를 만들고, 각각의 변환 자료를 단형질 모형으로 분석한다. 이후 단형질 모형의 육종가를 다시 canonical transformation matrix의 역행렬을 이용하여 변환하면 다형질 모형의 육종가와 동일하게 된다.

단 이 방법은 각 형질의 모형이 동일한 경우(same model) 그리고 결측치가 없는 경우(no missing value)에 사용할 수 있어 실제적으로 사용하기에 상당한 제한이 있다.

Linear Models for the Prediction of Animal Breeding Values 3rd Edition, Raphael A. Mrode.

Example 6.1 p97.

Example 5.1 p72에 나오는 다형질 모형을 canonical transformation matrix 이용하여 단형질 모형으로 분석하고 다시 단형질 분석의 육종가를 변환하는 과정

자료 mt01_data.txt

4 1 4.5 6.8
5 2 2.9 5.0
6 2 3.9 6.8
7 1 3.5 6.0
8 1 5.0 7.5

혈통 mt01_pedi.txt

프로그램

# Reduce the dimension of multivariate models
# canonical transformation 
# n traits model to single trait model

# MASS packages
if (!("MASS" %in% installed.packages())) {
    install.packages("MASS", dependencies = TRUE)
}
library(MASS)

# plyr packages
if (!("plyr" %in% installed.packages())) {
    install.packages("plyr", dependencies = TRUE)
}
library(plyr)

# genetic variance - covariance matrix
g = matrix(c(20, 18, 18, 40), nrow = 2)
g

# residual matrix
r = matrix(c(40, 11, 11, 30), nrow = 2)
r

# eigenvalues and eigenvectors
e1 = eigen(r)
e1

# diagonal matirx of eigenvalues
b = diag(e1$values)
b

# eigenvectors
u = e1$vectors
u

# calculate p
p = u %*% sqrt(ginv(b)) %*% t(u)
p

# pgp
pgp = p %*% g %*% t(p)
pgp

# eigenvalues and eigenvectors
e2 = eigen(pgp)
e2

# eigenvectors
l = e2$vectors
l

# canonical transformation matrix
q = t(l) %*% p
q

# new genetic variance
w = q %*% g %*% t(q)
w

# set working directory 
setwd(".") 

# print working directory 
getwd()

# read data : animal, dam, sex, weaning_weight
data = read.table("mt01_data.txt", header = FALSE, sep = "", col.names = c("animal", "sex", "wwg", "pwg"), stringsAsFactors = FALSE)

# print
data

# number of data
ndata = nrow(data)
ndata

# data2
data2 = cbind(data, wwg2 = rep(0, 5), pwg2 = rep(0, 5))
data2

q %*% as.numeric(data2[1, 3:4])

# data transformation
for (i in 1 : ndata) {
    data2[i, 5:6] = q %*% as.numeric(data2[i, 3:4])
}

data2

# wwg2 data
wwg2 = data2[, c('animal','sex','wwg2')]
wwg2

#pwg2
pwg2 = data2[, c('animal','sex','pwg2')]
pwg2

# output wwg2
write.table(wwg2, 'wwg2_data.txt', sep = " ", row.names = FALSE, col.names = FALSE, quote = FALSE)

# output pwg2
write.table(pwg2, 'pwg2_data.txt', sep = " ", row.names = FALSE, col.names = FALSE, quote = FALSE)

#########################################################
# breeding values estimation using renumf90 and blupf90 #
#########################################################

# read solutions
sol_wwg2 = read.table("solutions_wwg2", skip = 1, sep = "", col.names = c("trait", "effect", "level", "solution", "se"), stringsAsFactors = FALSE)
sol_wwg2

# read solutions
sol_pwg2 = read.table("solutions_pwg2", skip = 1, sep = "", col.names = c("trait", "effect", "level", "solution", "se"), stringsAsFactors = FALSE)
sol_pwg2

# breeding values of wwg2
bv_wwg2 = sol_wwg2[sol_wwg2$effect == 2, c(3, 4)]
bv_wwg2

# breeding values of wwg2
bv_pwg2 = sol_pwg2[sol_pwg2$effect == 2, 4]
bv_pwg2

# breeding values
bv_01 = data.frame(animal = bv_wwg2$level, wwg2 = bv_wwg2$solution, pwg2 = bv_pwg2)
bv_01

# read renumbered pedigree 
pedi = read.table("renadd02.ped", sep = "", stringsAsFactors = FALSE) 
pedi

aid_renum = pedi[,c(1,10)]
colnames(aid_renum) = c("animal", "orig_aid")
aid_renum

# join original animal id
bv_02 = join(x = bv_01, y = aid_renum, by = "animal", type = "left")
bv_02

# sort by original id
bv_02 = bv_02[order(bv_02$orig_aid),]
bv_02

# number of animals in pedigree
npedi = nrow(bv_02)
npedi

# add wwg and pwg
bv_03 = cbind(bv_02, wwg = rep(0, npedi), pwg = rep(0, npedi))
bv_03

# inverse of q
iq = ginv(q)
iq

for (i in 1 : npedi) {
    bv_03[i, 5:6] = iq %*% as.numeric(bv_03[i, 2:3])
}
bv_03

실행 결과

> # Reduce the dimension of multivariate models
> # canonical transformation 
> # n traits model to single trait model
> 
> # MASS packages
> if (!("MASS" %in% installed.packages())) {
+     install.packages("MASS", dependencies = TRUE)
+ }
> library(MASS)
> 
> # plyr packages
> if (!("plyr" %in% installed.packages())) {
+     install.packages("plyr", dependencies = TRUE)
+ }
> library(plyr)
> 
> # genetic variance - covariance matrix
> g = matrix(c(20, 18, 18, 40), nrow = 2)
> g
     [,1] [,2]
[1,]   20   18
[2,]   18   40
> 
> # residual matrix
> r = matrix(c(40, 11, 11, 30), nrow = 2)
> r
     [,1] [,2]
[1,]   40   11
[2,]   11   30
> 
> # eigenvalues and eigenvectors
> e1 = eigen(r)
> e1
eigen() decomposition
$values
[1] 47.08305 22.91695

$vectors
           [,1]       [,2]
[1,] -0.8407743  0.5413857
[2,] -0.5413857 -0.8407743

> 
> # diagonal matirx of eigenvalues
> b = diag(e1$values)
> b
         [,1]     [,2]
[1,] 47.08305  0.00000
[2,]  0.00000 22.91695
> 
> # eigenvectors
> u = e1$vectors
> u
           [,1]       [,2]
[1,] -0.8407743  0.5413857
[2,] -0.5413857 -0.8407743
> 
> # calculate p
> p = u %*% sqrt(ginv(b)) %*% t(u)
> p
            [,1]        [,2]
[1,]  0.16424710 -0.02874736
[2,] -0.02874736  0.19038107
> 
> # pgp
> pgp = p %*% g %*% t(p)
> pgp
          [,1]      [,2]
[1,] 0.4026185 0.2643755
[2,] 0.2643755 1.2692999
> 
> # eigenvalues and eigenvectors
> e2 = eigen(pgp)
> e2
eigen() decomposition
$values
[1] 1.3435798 0.3283387

$vectors
          [,1]       [,2]
[1,] 0.2704897 -0.9627229
[2,] 0.9627229  0.2704897

> 
> # eigenvectors
> l = e2$vectors
> l
          [,1]       [,2]
[1,] 0.2704897 -0.9627229
[2,] 0.9627229  0.2704897
> 
> # canonical transformation matrix
> q = t(l) %*% p
> q
            [,1]       [,2]
[1,]  0.01675141 0.17550834
[2,] -0.16590031 0.07917187
> 
> # new genetic variance
> w = q %*% g %*% t(q)
> w
              [,1]          [,2]
[1,]  1.343580e+00 -1.110223e-16
[2,] -4.163336e-17  3.283387e-01
> 
> # set working directory 
> setwd(".") 
> 
> # print working directory 
> getwd()
[1] "D:/temp/07_multi2single"
> 
> # read data : animal, dam, sex, weaning_weight
> data = read.table("mt01_data.txt", header = FALSE, sep = "", col.names = c("animal", "sex", "wwg", "pwg"), stringsAsFactors = FALSE)
> 
> # print
> data
  animal sex wwg pwg
1      4   1 4.5 6.8
2      5   2 2.9 5.0
3      6   2 3.9 6.8
4      7   1 3.5 6.0
5      8   1 5.0 7.5
> 
> # number of data
> ndata = nrow(data)
> ndata
[1] 5
> 
> # data2
> data2 = cbind(data, wwg2 = rep(0, 5), pwg2 = rep(0, 5))
> data2
  animal sex wwg pwg wwg2 pwg2
1      4   1 4.5 6.8    0    0
2      5   2 2.9 5.0    0    0
3      6   2 3.9 6.8    0    0
4      7   1 3.5 6.0    0    0
5      8   1 5.0 7.5    0    0
> 
> q %*% as.numeric(data2[1, 3:4])
           [,1]
[1,]  1.2688381
[2,] -0.2081827
> 
> # data transformation
> for (i in 1 : ndata) {
+     data2[i, 5:6] = q %*% as.numeric(data2[i, 3:4])
+ }
> 
> data2
  animal sex wwg pwg      wwg2        pwg2
1      4   1 4.5 6.8 1.2688381 -0.20818267
2      5   2 2.9 5.0 0.9261208 -0.08525154
3      6   2 3.9 6.8 1.2587872 -0.10864249
4      7   1 3.5 6.0 1.1116800 -0.10561986
5      8   1 5.0 7.5 1.4000696 -0.23571251
> 
> # wwg2 data
> wwg2 = data2[, c('animal','sex','wwg2')]
> wwg2
  animal sex      wwg2
1      4   1 1.2688381
2      5   2 0.9261208
3      6   2 1.2587872
4      7   1 1.1116800
5      8   1 1.4000696
> 
> #pwg2
> pwg2 = data2[, c('animal','sex','pwg2')]
> pwg2
  animal sex        pwg2
1      4   1 -0.20818267
2      5   2 -0.08525154
3      6   2 -0.10864249
4      7   1 -0.10561986
5      8   1 -0.23571251
> 
> # output wwg2
> write.table(wwg2, 'wwg2_data.txt', sep = " ", row.names = FALSE, col.names = FALSE, quote = FALSE)
> 
> # output pwg2
> write.table(pwg2, 'pwg2_data.txt', sep = " ", row.names = FALSE, col.names = FALSE, quote = FALSE)
> 
> #########################################################
> # breeding values estimation using renumf90 and blupf90 #
> #########################################################
> 
> # read solutions
> sol_wwg2 = read.table("solutions_wwg2", skip = 1, sep = "", col.names = c("trait", "effect", "level", "solution", "se"), stringsAsFactors = FALSE)
> sol_wwg2
   trait effect level    solution        se
1      1      1     1  1.26648926 1.0167961
2      1      1     2  1.08895002 1.1351950
3      1      2     1 -0.08935869 1.0261539
4      1      2     2 -0.00239531 0.9989446
5      1      2     3  0.09539732 1.0284425
6      1      2     4  0.07287388 0.9893837
7      1      2     5 -0.08838927 1.0012954
8      1      2     6  0.05159845 1.0980228
9      1      2     7 -0.00159342 1.1410714
10     1      2     8 -0.03120867 1.0653321
> 
> # read solutions
> sol_pwg2 = read.table("solutions_pwg2", skip = 1, sep = "", col.names = c("trait", "effect", "level", "solution", "se"), stringsAsFactors = FALSE)
> sol_pwg2
   trait effect level    solution        se
1      1      1     1 -0.18510837 0.7112092
2      1      1     2 -0.09805258 0.8369194
3      1      2     1  0.01452058 0.5482942
4      1      2     2  0.00069546 0.5417348
5      1      2     3 -0.00480559 0.5484205
6      1      2     4 -0.00940595 0.5391831
7      1      2     5  0.00701672 0.5418626
8      1      2     6 -0.00290040 0.5606925
9      1      2     7  0.00195090 0.5697311
10     1      2     8 -0.00048094 0.5543938
> 
> # breeding values of wwg2
> bv_wwg2 = sol_wwg2[sol_wwg2$effect == 2, c(3, 4)]
> bv_wwg2
   level    solution
3      1 -0.08935869
4      2 -0.00239531
5      3  0.09539732
6      4  0.07287388
7      5 -0.08838927
8      6  0.05159845
9      7 -0.00159342
10     8 -0.03120867
> 
> # breeding values of wwg2
> bv_pwg2 = sol_pwg2[sol_pwg2$effect == 2, 4]
> bv_pwg2
[1]  0.01452058  0.00069546 -0.00480559 -0.00940595  0.00701672 -0.00290040  0.00195090 -0.00048094
> 
> # breeding values
> bv_01 = data.frame(animal = bv_wwg2$level, wwg2 = bv_wwg2$solution, pwg2 = bv_pwg2)
> bv_01
  animal        wwg2        pwg2
1      1 -0.08935869  0.01452058
2      2 -0.00239531  0.00069546
3      3  0.09539732 -0.00480559
4      4  0.07287388 -0.00940595
5      5 -0.08838927  0.00701672
6      6  0.05159845 -0.00290040
7      7 -0.00159342  0.00195090
8      8 -0.03120867 -0.00048094
> 
> # read renumbered pedigree 
> pedi = read.table("renadd02.ped", sep = "", stringsAsFactors = FALSE) 
> pedi
  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1  1  2  5  1  0  2  1  0  0   7
2  7  0  0  3  0  0  0  0  2   2
3  2  6  0  2  0  1  1  1  0   4
4  3  6  7  1  0  2  1  0  1   6
5  6  0  0  3  0  0  0  2  0   1
6  4  8  3  1  0  2  1  0  0   8
7  8  0  0  3  0  0  0  2  0   3
8  5  8  7  1  0  2  1  0  1   5
> 
> aid_renum = pedi[,c(1,10)]
> colnames(aid_renum) = c("animal", "orig_aid")
> aid_renum
  animal orig_aid
1      1        7
2      7        2
3      2        4
4      3        6
5      6        1
6      4        8
7      8        3
8      5        5
> 
> # join original animal id
> bv_02 = join(x = bv_01, y = aid_renum, by = "animal", type = "left")
> bv_02
  animal        wwg2        pwg2 orig_aid
1      1 -0.08935869  0.01452058        7
2      2 -0.00239531  0.00069546        4
3      3  0.09539732 -0.00480559        6
4      4  0.07287388 -0.00940595        8
5      5 -0.08838927  0.00701672        5
6      6  0.05159845 -0.00290040        1
7      7 -0.00159342  0.00195090        2
8      8 -0.03120867 -0.00048094        3
> 
> # sort by original id
> bv_02 = bv_02[order(bv_02$orig_aid),]
> bv_02
  animal        wwg2        pwg2 orig_aid
6      6  0.05159845 -0.00290040        1
7      7 -0.00159342  0.00195090        2
8      8 -0.03120867 -0.00048094        3
2      2 -0.00239531  0.00069546        4
5      5 -0.08838927  0.00701672        5
3      3  0.09539732 -0.00480559        6
1      1 -0.08935869  0.01452058        7
4      4  0.07287388 -0.00940595        8
> 
> # number of animals in pedigree
> npedi = nrow(bv_02)
> npedi
[1] 8
> 
> # add wwg and pwg
> bv_03 = cbind(bv_02, wwg = rep(0, npedi), pwg = rep(0, npedi))
> bv_03
  animal        wwg2        pwg2 orig_aid wwg pwg
6      6  0.05159845 -0.00290040        1   0   0
7      7 -0.00159342  0.00195090        2   0   0
8      8 -0.03120867 -0.00048094        3   0   0
2      2 -0.00239531  0.00069546        4   0   0
5      5 -0.08838927  0.00701672        5   0   0
3      3  0.09539732 -0.00480559        6   0   0
1      1 -0.08935869  0.01452058        7   0   0
4      4  0.07287388 -0.00940595        8   0   0
> 
> # inverse of q
> iq = ginv(q)
> iq
         [,1]       [,2]
[1,] 2.600648 -5.7651217
[2,] 5.449516  0.5502527
> 
> for (i in 1 : npedi) {
+     bv_03[i, 5:6] = iq %*% as.numeric(bv_03[i, 2:3])
+ }
> bv_03
  animal        wwg2        pwg2 orig_aid         wwg          pwg
6      6  0.05159845 -0.00290040        1  0.15091058  0.279590613
7      7 -0.00159342  0.00195090        2 -0.01539110 -0.007609879
8      8 -0.03120867 -0.00048094        3 -0.07839010 -0.170336777
2      2 -0.00239531  0.00069546        4 -0.01023877 -0.012670601
5      5 -0.08838927  0.00701672        5 -0.27032165 -0.477817750
3      3  0.09539732 -0.00480559        6  0.27579969  0.517224909
1      1 -0.08935869  0.01452058        7 -0.31610344 -0.478971601
4      4  0.07287388 -0.00940595        8  0.24374578  0.391951708
>

중간에 eigenvector가 책의 결과와 다르다. 그러나 eigenvector는 유일한 것이 아니다. 그냥 무시하고 진행하고 결과를 비교해 보니 동일하였다.

저작자표시

'Animal Breeding > R for Genetic Evaluation' 카테고리의 다른 글

Fixed Effect Model for SNP Effects, unweighted analysis (0)	2020.12.16
임의 회귀 모형의 해를 이용한 개체 육종가 그래프 및 305일령 육종가 계산 (0)	2020.11.26
R을 이용한 일반적인 모형의 육종가 구하기 프로그래밍 (0)	2020.08.25
R을 이용한 다형질 개체 모형(unequal design matrices)의 육종가 구하기 프로그래밍 (0)	2020.07.29
R을 이용한 다형질 개체 모형(same model and missing record)의 육종가 구하기 프로그래밍 (0)	2020.07.26

Let's Study Animal Breeding

다형질 모형의 차원 줄이기(Reduce the Dimension of Multivariate Models)

'Animal Breeding > R for Genetic Evaluation' 카테고리의 다른 글

+ Recent posts

티스토리툴바