Let's Study Animal Breeding

전체 글

The derivative of a determinant(행렬의 행렬식의 도함수, 미분) 2020.05.21 1
R을 이용한 단형질 개체 모형(Single Trait Animal Model)의 육종가 구하기 프로그래밍 2020.05.18
Maximum Likelihood Estimation(MLE) 2020.04.24 1
R을 이용한 혈통 파일 처리 2020.04.10
SNP Effect를 이용한 DGV(or GEBV) 추정 2020.03.05

The derivative of a determinant(행렬의 행렬식의 도함수, 미분)

투정이 2020. 5. 21. 20:12

2020. 5. 21. 20:12

The derivative of a determinant

행렬의 행렬식의 도함수

행렬의 행렬식의 미분

로그 행렬식의 미분, 도함수

저작자표시

'ETC' 카테고리의 다른 글

pgAdmin 4를 이용한 Backup과 Restore (0)	2020.07.27
PostgreSQL psql을 이용한 role, database, table 생성 (0)	2020.07.27
Maximum Likelihood Estimation(MLE) (1)	2020.04.24
회귀분석 - Piecewise regression (0)	2020.03.01
회귀분석 - 가변수 (0)	2020.03.01

R을 이용한 단형질 개체 모형(Single Trait Animal Model)의 육종가 구하기 프로그래밍

투정이 2020. 5. 18. 16:41

2020. 5. 18. 16:41

R을 이용하여 단형질 개체 모형의 육종가, 신뢰도, 정확도, 육종가 분할 및 DYD(daughter yield deviation)을 구하는 프로그래밍을 한다. BLUPF90, ASREML, MIXBLUP, WOMBAT 등과 같은 전문적인 유전 능력 평가 프로그램을 이용할 경우 육종가(정확도 포함)를 구할 수 있으나, 그 과정을 이해하려면 R로 해야만 한다. 그리고 유전 능력 평가 프로그램도 육종가 분할이나 DYD를 구해 주진 않는다. 자료가 클 경우 본 프로그램을 이용하여 육종가를 구하는 것은 불가능한다. 그러나 육종가를 분할할 때 또는 DYD를 구해야 할 경우, 또는 알고리듬을 이해해야 하는 경우 유용할 것으로 생각한다.

혈통 자료 - stam_pedi.txt

animal, sire, dam
1, 0, 0 
2, 0, 0 
3, 0, 0 
4, 1, 0 
5, 3, 2 
6, 1, 2 
7, 4, 5 
8, 3, 6

표현형 자료 - stam_data.txt

animal, sex, gain
4, 1, 4.5
5, 2, 2.9
6, 2, 3.9
7, 1, 3.5
8, 1, 5

프로그램


# Single Trait Animal Model - Date : 2020-06-01

# Linear Models for the Prediction of Animal Breeding Values, 3rd Edtion

# Raphael Mrode

# Example 3.1 p37

# install.packages('MASS', dependencies = TRUE)
library(MASS)

# functions

# make design matrix 
desgn <- function(v) {
    if (is.numeric(v)) {
        va = v
        mrow = length(va)
        mcol = max(va)
    }
    if (is.character(v)) {
        vf = factor(v)
        # 각 수준의 인덱스 값을 저장
        va = as.numeric(vf)
        mrow = length(va)
        mcol = length(levels(vf))
    }
    
    # 0으로 채워진 X 준비
    X = matrix(data = c(0), nrow = mrow, ncol = mcol)
    
    for (i in 1:mrow) {
        ic = va[i]
        X[i, ic] = 1
    }
    return(X)
}

# function to make inverse of numerator relationship matrix
ainv = function(pedi) {
    n = nrow(pedi)
    Ainv = matrix(c(0), nrow = n, ncol = n)
    
    for (i in 1:n) {
        animal = pedi[i, 1]
        sire = pedi[i, 2]
        dam = pedi[i, 3]
        
        if (sire == 0 & dam == 0) {
            # both parents unknown
            alpha = 1
            Ainv[animal, animal] = alpha + Ainv[animal, animal]
        } else if (sire != 0 & dam == 0) {
            # sire known
            alpha = 4/3
            Ainv[animal, animal] = alpha + Ainv[animal, animal]
            Ainv[animal, sire] = -alpha/2 + Ainv[animal, sire]
            Ainv[sire, animal] = -alpha/2 + Ainv[sire, animal]
            Ainv[sire, sire] = alpha/4 + Ainv[sire, sire]
        } else if (sire == 0 & dam != 0) {
            # dam known
            alpha = 4/3
            Ainv[animal, animal] = alpha + Ainv[animal, animal]
            Ainv[animal, dam] = -alpha/2 + Ainv[animal, dam]
            Ainv[dam, animal] = -alpha/2 + Ainv[dam, animal]
            Ainv[dam, dam] = alpha/4 + Ainv[dam, dam]
        } else {
            # both parents known
            alpha = 2
            Ainv[animal, animal] = alpha + Ainv[animal, animal]
            Ainv[animal, sire] = -alpha/2 + Ainv[animal, sire]
            Ainv[sire, animal] = -alpha/2 + Ainv[sire, animal]
            Ainv[animal, dam] = -alpha/2 + Ainv[animal, dam]
            Ainv[dam, animal] = -alpha/2 + Ainv[dam, animal]
            Ainv[sire, sire] = alpha/4 + Ainv[sire, sire]
            Ainv[sire, dam] = alpha/4 + Ainv[sire, dam]
            Ainv[dam, sire] = alpha/4 + Ainv[dam, sire]
            Ainv[dam, dam] = alpha/4 + Ainv[dam, dam]
        }
    }
    return(Ainv)
}

# set working directory 
setwd("D:/temp/stam") 

# print working directory 
getwd()


# prdigree and data

# read pedigree : animal sire dam
pedi = read.table("stam_pedi.txt", header = TRUE, sep = ",", stringsAsFactors = FALSE)

# print
pedi

# read data : animal, sex, pre-weaning gain
data = read.table("stam_data.txt", header = TRUE, sep = ",", stringsAsFactors = FALSE)

# print
data

# variance component and ratio
sigma_a = 20
sigma_e = 40
alpha = sigma_e/sigma_a

# print
sigma_a
sigma_e
alpha

# design matrix

# design matrix of fixed effect
X = desgn(data[, 2])

# print
X

# number of levels of fixed effect
no_lvl_f = ncol(X)

# print
no_lvl_f

# desing matrix of animal effect
Z = desgn(data[, 1])

# print
Z

# number of animals
no_animals = ncol(Z)

# print
no_animals


# observation
y = data[, 3]

# print
y

# inverse matrix of NRM
ai = ainv(pedi)

# print
ai

# LHS, RHS

# LHS construction
LHS = rbind(
        cbind(t(X) %*% X, t(X) %*% Z), 
        cbind(t(Z) %*% X, t(Z) %*% Z + ai * alpha))

# print
LHS

# RHS contruction
RHS = rbind(t(X) %*% y, t(Z) %*% y)

# print
RHS

# Solutions

# generalized inverse of LHS
gi_LHS = ginv(LHS)

# print
gi_LHS

# solution
sol = gi_LHS %*% RHS

# print
sol

# solutions for fixed effects and animal effects
sol_fixed = sol[1 : no_lvl_f]
sol_animal = sol[(no_lvl_f + 1) : (no_lvl_f + no_animals)]

# 확인
sol_fixed
sol_animal

# reliability(r2), accuracy(r), standard error of prediction(SEP)

# diagonal elements of the generalized inverse of LHS for animal equation
d_ani = diag(gi_LHS[(no_lvl_f + 1) : (no_lvl_f + no_animals), (no_lvl_f + 1) : (no_lvl_f + no_animals)])

# print
d_ani

# reliability
rel = 1 - d_ani * alpha

# print
rel

# accuracy
acc = sqrt(rel)

# print
acc

# standard error of prediction(SEP)
sep = sqrt(d_ani * sigma_e)

# 확인
sep

# partitioning of breeding values

# yield deviation
yd1 = ginv(t(Z) %*% Z) %*% t(Z) %*% (y - X %*% sol_fixed)

# print
yd1

# numerator of n2
a2 = diag(t(Z) %*% Z)

# print
a2


# Parents average, progeny contribution

# parents avearge
pa1 = rep(0, no_animals)

# progeny contribution numerator
pc0 = rep(0, no_animals)

# numerator of n3, denominator of progeny contribution
a3 = rep(0, no_animals)

# numerator of n1
a1 = rep(0, no_animals)

# looping pedi
for (i in 1 : no_animals) {

    sire = pedi[i, 2]
    dam = pedi[i, 3]
    
    if (sire == 0 & dam == 0) {
        # both parents unknown
        # PA
        a1[i] = 1 * alpha
        
    } else if (sire != 0 & dam == 0) {
        # 개체의 부만 알고 있을 경우
        
        # PA 
        a1[i] = 4/3 * alpha
        pa1[i] = sol_animal[sire]/2
        
        # PC for sire
        a3[sire] = a3[sire] + 0.5 * alpha * (2/3)
        pc0[sire] = pc0[sire] + 0.5 * alpha * (2/3) * (2 * sol_animal[i])
        
    } else if (sire == 0 & dam != 0) {
        # dam known
        
        # PA 
        a1[i] = 4/3 * alpha
        pa1[i] = sol_animal[dam]/2
        
        # PC for dam
        a3[dam] = a3[dam] + 0.5 * alpha * (2/3)
        pc0[dam] = pc0[dam] + 0.5 * alpha * (2/3) * (2 * sol_animal[i])
        
    } else {
        # both parents known
        
        # PA 
        a1[i] = 2 * alpha
        pa1[i] = (sol_animal[sire] + sol_animal[dam])/2
        
        # PC for sire
        a3[sire] = a3[sire] + 0.5 * alpha
        pc0[sire] = pc0[sire] + 0.5 * alpha * (2 * sol_animal[i] - sol_animal[dam])

        # PC for dam
        a3[dam] = a3[dam] + 0.5 * alpha
        pc0[dam] = pc0[dam] + 0.5 * alpha * (2 * sol_animal[i] - sol_animal[sire])

    }
}

# print
a1
pa1
a3
pc0

# denominator of n1, n2, n3, diagonal of animals in LHS
n_nu = a1 + a2 + a3

# print
n_nu

# parents average fraction of breeding values
pa = pa1 * (a1 / n_nu)

# print
pa

# yield deviation fraction of breeding values
yd = yd1 * (a2 / n_nu)

# print
yd

# progeny contribution
pc1 = pc0 / a3
pc1[is.nan(pc1) == TRUE] = 0
pc1

# progeny contribution fraction of breeding values
pc =  pc1 * (a3 / n_nu)

# print
pc

# Progeny(Daughter) Yield Deviation(PYD, DYD)

# n2 of progeny
n2prog = a2 / (a1 + a2)

# print
n2prog

# numerator of dyd : summation of u of progeny * n2 of progeny * (2 * YD - bv of mate)
dyd_n = rep(0, no_animals)
# denominator of dyd : summation of u of progeny * n2 of progeny
dyd_d = rep(0, no_animals)

# looping pedi
for (i in 1 : no_animals) {
  
    sire = pedi[i, 2]
    dam = pedi[i, 3]
  
    if (sire == 0 & dam == 0) {
        # both parents unknown

    } else if (sire != 0 & dam == 0) {
        # 개체의 부만 알고 있을 경우
    
        # dyd_n
        dyd_n[sire] = dyd_n[sire] + n2prog[i] * 2 / 3 * 2 * yd1[i]
        # dyd_d 
        dyd_d[sire] = dyd_d[sire] + n2prog[i] * 2 / 3

    } else if (sire == 0 & dam != 0) {
        # dam known
    
        # dyd_n
        dyd_n[dam] = dyd_n[dam] + n2prog[i] * 2 / 3 * 2 * yd1[i]
        # dyd_d
        dyd_d[dam] = dyd_d[dam] + n2prog[i] * 2 / 3
    
    } else {
        # both parents known
    
        # dyd_n
        dyd_n[sire] = dyd_n[sire] + n2prog[i] * (2 * yd1[i] - sol_animal[dam])
        dyd_n[dam] = dyd_n[dam] + n2prog[i] * (2 * yd1[i] - sol_animal[sire])
    
        # dyd_d
        dyd_d[sire] = dyd_d[sire] + n2prog[i]
        dyd_d[dam] = dyd_d[dam] + n2prog[i]
    
    }
}

# print
dyd_n
dyd_d

# dyd
dyd = dyd_n / dyd_d
dyd[is.nan(dyd) == TRUE] = 0

# print
dyd

# breeding values and fractions
stam_result = data.frame(animal = pedi[,1], animal_bv = sol_animal, rel = rel, acc = acc, sep = sep, pa = pa, yd = yd, pc = pc, sum_of_fr = pa + yd + pc, dyd = dyd)

# print
stam_result

# 파일로 출력, 분리자는 ",", 따옴표 없고 
output_filename = gsub("[ -]", "", paste("stam_result_", Sys.Date(), ".csv")) 
write.table(stam_result, output_filename, sep=",", row.names = FALSE, quote = FALSE)

실행 결과

> # Single Trait Animal Model - Date : 2020-06-01
> 
> # Linear Models for the Prediction of Animal Breeding Values, 3rd Edtion
> 
> # Raphael Mrode
> 
> # Example 3.1 p37
> 
> # install.packages('MASS', dependencies = TRUE)
> library(MASS)
> 
> # functions
> 
> # make design matrix 
> desgn <- function(v) {
+     if (is.numeric(v)) {
+         va = v
+         mrow = length(va)
+         mcol = max(va)
+     }
+     if (is.character(v)) {
+         vf = factor(v)
+         # 각 수준의 인덱스 값을 저장
+         va = as.numeric(vf)
+         mrow = length(va)
+         mcol = length(levels(vf))
+     }
+     
+     # 0으로 채워진 X 준비
+     X = matrix(data = c(0), nrow = mrow, ncol = mcol)
+     
+     for (i in 1:mrow) {
+         ic = va[i]
+         X[i, ic] = 1
+     }
+     return(X)
+ }
> 
> # function to make inverse of numerator relationship matrix
> ainv = function(pedi) {
+     n = nrow(pedi)
+     Ainv = matrix(c(0), nrow = n, ncol = n)
+     
+     for (i in 1:n) {
+         animal = pedi[i, 1]
+         sire = pedi[i, 2]
+         dam = pedi[i, 3]
+         
+         if (sire == 0 & dam == 0) {
+             # both parents unknown
+             alpha = 1
+             Ainv[animal, animal] = alpha + Ainv[animal, animal]
+         } else if (sire != 0 & dam == 0) {
+             # sire known
+             alpha = 4/3
+             Ainv[animal, animal] = alpha + Ainv[animal, animal]
+             Ainv[animal, sire] = -alpha/2 + Ainv[animal, sire]
+             Ainv[sire, animal] = -alpha/2 + Ainv[sire, animal]
+             Ainv[sire, sire] = alpha/4 + Ainv[sire, sire]
+         } else if (sire == 0 & dam != 0) {
+             # dam known
+             alpha = 4/3
+             Ainv[animal, animal] = alpha + Ainv[animal, animal]
+             Ainv[animal, dam] = -alpha/2 + Ainv[animal, dam]
+             Ainv[dam, animal] = -alpha/2 + Ainv[dam, animal]
+             Ainv[dam, dam] = alpha/4 + Ainv[dam, dam]
+         } else {
+             # both parents known
+             alpha = 2
+             Ainv[animal, animal] = alpha + Ainv[animal, animal]
+             Ainv[animal, sire] = -alpha/2 + Ainv[animal, sire]
+             Ainv[sire, animal] = -alpha/2 + Ainv[sire, animal]
+             Ainv[animal, dam] = -alpha/2 + Ainv[animal, dam]
+             Ainv[dam, animal] = -alpha/2 + Ainv[dam, animal]
+             Ainv[sire, sire] = alpha/4 + Ainv[sire, sire]
+             Ainv[sire, dam] = alpha/4 + Ainv[sire, dam]
+             Ainv[dam, sire] = alpha/4 + Ainv[dam, sire]
+             Ainv[dam, dam] = alpha/4 + Ainv[dam, dam]
+         }
+     }
+     return(Ainv)
+ }
> 
> # set working directory 
> setwd("D:/temp/stam") 
> 
> # print working directory 
> getwd()
[1] "D:/temp/stam"
> 
> 
> # prdigree and data
> 
> # read pedigree : animal sire dam
> pedi = read.table("stam_pedi.txt", header = TRUE, sep = ",", stringsAsFactors = FALSE)
> 
> # print
> pedi
  animal sire dam
1      1    0   0
2      2    0   0
3      3    0   0
4      4    1   0
5      5    3   2
6      6    1   2
7      7    4   5
8      8    3   6
> 
> # read data : animal, sex, pre-weaning gain
> data = read.table("stam_data.txt", header = TRUE, sep = ",", stringsAsFactors = FALSE)
> 
> # print
> data
  animal sex gain
1      4   1  4.5
2      5   2  2.9
3      6   2  3.9
4      7   1  3.5
5      8   1  5.0
> 
> # variance component and ratio
> sigma_a = 20
> sigma_e = 40
> alpha = sigma_e/sigma_a
> 
> # print
> sigma_a
[1] 20
> sigma_e
[1] 40
> alpha
[1] 2
> 
> # design matrix
> 
> # design matrix of fixed effect
> X = desgn(data[, 2])
> 
> # print
> X
     [,1] [,2]
[1,]    1    0
[2,]    0    1
[3,]    0    1
[4,]    1    0
[5,]    1    0
> 
> # number of levels of fixed effect
> no_lvl_f = ncol(X)
> 
> # print
> no_lvl_f
[1] 2
> 
> # desing matrix of animal effect
> Z = desgn(data[, 1])
> 
> # print
> Z
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    0    0    0    1    0    0    0    0
[2,]    0    0    0    0    1    0    0    0
[3,]    0    0    0    0    0    1    0    0
[4,]    0    0    0    0    0    0    1    0
[5,]    0    0    0    0    0    0    0    1
> 
> # number of animals
> no_animals = ncol(Z)
> 
> # print
> no_animals
[1] 8
> 
> 
> # observation
> y = data[, 3]
> 
> # print
> y
[1] 4.5 2.9 3.9 3.5 5.0
> 
> # inverse matrix of NRM
> ai = ainv(pedi)
> 
> # print
> ai
           [,1] [,2] [,3]       [,4] [,5] [,6] [,7] [,8]
[1,]  1.8333333  0.5  0.0 -0.6666667  0.0 -1.0    0    0
[2,]  0.5000000  2.0  0.5  0.0000000 -1.0 -1.0    0    0
[3,]  0.0000000  0.5  2.0  0.0000000 -1.0  0.5    0   -1
[4,] -0.6666667  0.0  0.0  1.8333333  0.5  0.0   -1    0
[5,]  0.0000000 -1.0 -1.0  0.5000000  2.5  0.0   -1    0
[6,] -1.0000000 -1.0  0.5  0.0000000  0.0  2.5    0   -1
[7,]  0.0000000  0.0  0.0 -1.0000000 -1.0  0.0    2    0
[8,]  0.0000000  0.0 -1.0  0.0000000  0.0 -1.0    0    2
> 
> # LHS, RHS
> 
> # LHS construction
> LHS = rbind(
+         cbind(t(X) %*% X, t(X) %*% Z), 
+         cbind(t(Z) %*% X, t(Z) %*% Z + ai * alpha))
> 
> # print
> LHS
      [,1] [,2]      [,3] [,4] [,5]      [,6] [,7] [,8] [,9] [,10]
 [1,]    3    0  0.000000    0    0  1.000000    0    0    1     1
 [2,]    0    2  0.000000    0    0  0.000000    1    1    0     0
 [3,]    0    0  3.666667    1    0 -1.333333    0   -2    0     0
 [4,]    0    0  1.000000    4    1  0.000000   -2   -2    0     0
 [5,]    0    0  0.000000    1    4  0.000000   -2    1    0    -2
 [6,]    1    0 -1.333333    0    0  4.666667    1    0   -2     0
 [7,]    0    1  0.000000   -2   -2  1.000000    6    0   -2     0
 [8,]    0    1 -2.000000   -2    1  0.000000    0    6    0    -2
 [9,]    1    0  0.000000    0    0 -2.000000   -2    0    5     0
[10,]    1    0  0.000000    0   -2  0.000000    0   -2    0     5
> 
> # RHS contruction
> RHS = rbind(t(X) %*% y, t(Z) %*% y)
> 
> # print
> RHS
      [,1]
 [1,] 13.0
 [2,]  6.8
 [3,]  0.0
 [4,]  0.0
 [5,]  0.0
 [6,]  4.5
 [7,]  2.9
 [8,]  3.9
 [9,]  3.5
[10,]  5.0
> 
> # Solutions
> 
> # generalized inverse of LHS
> gi_LHS = ginv(LHS)
> 
> # print
> gi_LHS
             [,1]        [,2]         [,3]         [,4]        [,5]        [,6]        [,7]        [,8]       [,9]
 [1,]  0.59556097  0.15730213 -0.164119413 -0.083624565 -0.13059083 -0.26455749 -0.14827804 -0.16632621 -0.2842464
 [2,]  0.15730213  0.80245865 -0.132863260 -0.241250738 -0.11196840 -0.08730803 -0.29891465 -0.30600266 -0.1859495
 [3,] -0.16411941 -0.13286326  0.471094211  0.006928037  0.03264668  0.21954371  0.04495225  0.22077427  0.1386223
 [4,] -0.08362457 -0.24125074  0.006928037  0.492095721 -0.01030797  0.02039033  0.23734577  0.24515571  0.1198194
 [5,] -0.13059083 -0.11196840  0.032646682 -0.010307967  0.45645878  0.04812709  0.20132326  0.02261354  0.1258983
 [6,] -0.26455749 -0.08730803  0.219543709  0.020390333  0.04812709  0.42768015  0.04704420  0.12757186  0.2428012
 [7,] -0.14827804 -0.29891465  0.044952254  0.237345770  0.20132326  0.04704420  0.42810675  0.16972255  0.2197160
 [8,] -0.16632621 -0.30600266  0.220774267  0.245155707  0.02261354  0.12757186  0.16972255  0.44228277  0.1521830
 [9,] -0.28424641 -0.18594950  0.138622268  0.119819354  0.12589831  0.24280124  0.21971599  0.15218301  0.4418562
[10,] -0.23787901 -0.19864885  0.134192262  0.110664009  0.21774710  0.12319108  0.17807393  0.21922376  0.1680818
           [,10]
 [1,] -0.2378790
 [2,] -0.1986488
 [3,]  0.1341923
 [4,]  0.1106640
 [5,]  0.2177471
 [6,]  0.1231911
 [7,]  0.1780739
 [8,]  0.2192238
 [9,]  0.1680818
[10,]  0.4223641
> 
> # solution
> sol = gi_LHS %*% RHS
> 
> # print
> sol
              [,1]
 [1,]  4.358502330
 [2,]  3.404430006
 [3,]  0.098444576
 [4,] -0.018770099
 [5,] -0.041084203
 [6,] -0.008663123
 [7,] -0.185732099
 [8,]  0.176872088
 [9,] -0.249458555
[10,]  0.182614688
> 
> # solutions for fixed effects and animal effects
> sol_fixed = sol[1 : no_lvl_f]
> sol_animal = sol[(no_lvl_f + 1) : (no_lvl_f + no_animals)]
> 
> # 확인
> sol_fixed
[1] 4.358502 3.404430
> sol_animal
[1]  0.098444576 -0.018770099 -0.041084203 -0.008663123 -0.185732099  0.176872088 -0.249458555  0.182614688
> 
> # reliability(r2), accuracy(r), standard error of prediction(SEP)
> 
> # diagonal elements of the generalized inverse of LHS for animal equation
> d_ani = diag(gi_LHS[(no_lvl_f + 1) : (no_lvl_f + no_animals), (no_lvl_f + 1) : (no_lvl_f + no_animals)])
> 
> # print
> d_ani
[1] 0.4710942 0.4920957 0.4564588 0.4276802 0.4281067 0.4422828 0.4418562 0.4223641
> 
> # reliability
> rel = 1 - d_ani * alpha
> 
> # print
> rel
[1] 0.05781158 0.01580856 0.08708243 0.14463969 0.14378651 0.11543447 0.11628766 0.15527171
> 
> # accuracy
> acc = sqrt(rel)
> 
> # print
> acc
[1] 0.2404404 0.1257321 0.2950973 0.3803153 0.3791919 0.3397565 0.3410098 0.3940453
> 
> # standard error of prediction(SEP)
> sep = sqrt(d_ani * sigma_e)
> 
> # 확인
> sep
[1] 4.340941 4.436646 4.272979 4.136086 4.138148 4.206104 4.204075 4.110300
> 
> # partitioning of breeding values
> 
> # yield deviation
> yd1 = ginv(t(Z) %*% Z) %*% t(Z) %*% (y - X %*% sol_fixed)
> 
> # print
> yd1
           [,1]
[1,]  0.0000000
[2,]  0.0000000
[3,]  0.0000000
[4,]  0.1414977
[5,] -0.5044300
[6,]  0.4955700
[7,] -0.8585023
[8,]  0.6414977
> 
> # numerator of n2
> a2 = diag(t(Z) %*% Z)
> 
> # print
> a2
[1] 0 0 0 1 1 1 1 1
> 
> 
> # Parents average, progeny contribution
> 
> # parents avearge
> pa1 = rep(0, no_animals)
> 
> # progeny contribution numerator
> pc0 = rep(0, no_animals)
> 
> # numerator of n3, denominator of progeny contribution
> a3 = rep(0, no_animals)
> 
> # numerator of n1
> a1 = rep(0, no_animals)
> 
> # looping pedi
> for (i in 1 : no_animals) {
+ 
+     sire = pedi[i, 2]
+     dam = pedi[i, 3]
+     
+     if (sire == 0 & dam == 0) {
+         # both parents unknown
+         # PA
+         a1[i] = 1 * alpha
+         
+     } else if (sire != 0 & dam == 0) {
+         # 개체의 부만 알고 있을 경우
+         
+         # PA 
+         a1[i] = 4/3 * alpha
+         pa1[i] = sol_animal[sire]/2
+         
+         # PC for sire
+         a3[sire] = a3[sire] + 0.5 * alpha * (2/3)
+         pc0[sire] = pc0[sire] + 0.5 * alpha * (2/3) * (2 * sol_animal[i])
+         
+     } else if (sire == 0 & dam != 0) {
+         # dam known
+         
+         # PA 
+         a1[i] = 4/3 * alpha
+         pa1[i] = sol_animal[dam]/2
+         
+         # PC for dam
+         a3[dam] = a3[dam] + 0.5 * alpha * (2/3)
+         pc0[dam] = pc0[dam] + 0.5 * alpha * (2/3) * (2 * sol_animal[i])
+         
+     } else {
+         # both parents known
+         
+         # PA 
+         a1[i] = 2 * alpha
+         pa1[i] = (sol_animal[sire] + sol_animal[dam])/2
+         
+         # PC for sire
+         a3[sire] = a3[sire] + 0.5 * alpha
+         pc0[sire] = pc0[sire] + 0.5 * alpha * (2 * sol_animal[i] - sol_animal[dam])
+ 
+         # PC for dam
+         a3[dam] = a3[dam] + 0.5 * alpha
+         pc0[dam] = pc0[dam] + 0.5 * alpha * (2 * sol_animal[i] - sol_animal[sire])
+ 
+     }
+ }
> 
> # print
> a1
[1] 2.000000 2.000000 2.000000 2.666667 4.000000 4.000000 4.000000 4.000000
> pa1
[1]  0.00000000  0.00000000  0.00000000  0.04922229 -0.02992715  0.03983724 -0.09719761  0.06789394
> a3
[1] 1.666667 2.000000 2.000000 1.000000 1.000000 1.000000 0.000000 0.000000
> pc0
[1]  0.3609634 -0.0750804 -0.1643368 -0.3131850 -0.4902540  0.4063136  0.0000000  0.0000000
> 
> # denominator of n1, n2, n3, diagonal of animals in LHS
> n_nu = a1 + a2 + a3
> 
> # print
> n_nu
[1] 3.666667 4.000000 4.000000 4.666667 6.000000 6.000000 5.000000 5.000000
> 
> # parents average fraction of breeding values
> pa = pa1 * (a1 / n_nu)
> 
> # print
> pa
[1]  0.00000000  0.00000000  0.00000000  0.02812702 -0.01995143  0.02655816 -0.07775809  0.05431515
> 
> # yield deviation fraction of breeding values
> yd = yd1 * (a2 / n_nu)
> 
> # print
> yd
            [,1]
[1,]  0.00000000
[2,]  0.00000000
[3,]  0.00000000
[4,]  0.03032093
[5,] -0.08407167
[6,]  0.08259500
[7,] -0.17170047
[8,]  0.12829953
> 
> # progeny contribution
> pc1 = pc0 / a3
> pc1[is.nan(pc1) == TRUE] = 0
> pc1
[1]  0.21657807 -0.03754020 -0.08216841 -0.31318501 -0.49025399  0.40631358  0.00000000  0.00000000
> 
> # progeny contribution fraction of breeding values
> pc =  pc1 * (a3 / n_nu)
> 
> # print
> pc
[1]  0.09844458 -0.01877010 -0.04108420 -0.06711107 -0.08170900  0.06771893  0.00000000  0.00000000
> 
> # Progeny(Daughter) Yield Deviation(PYD, DYD)
> 
> # n2 of progeny
> n2prog = a2 / (a1 + a2)
> 
> # print
> n2prog
[1] 0.0000000 0.0000000 0.0000000 0.2727273 0.2000000 0.2000000 0.2000000 0.2000000
> 
> # numerator of dyd : summation of u of progeny * n2 of progeny * (2 * YD - bv of mate)
> dyd_n = rep(0, no_animals)
> # denominator of dyd : summation of u of progeny * n2 of progeny
> dyd_d = rep(0, no_animals)
> 
> # looping pedi
> for (i in 1 : no_animals) {
+   
+     sire = pedi[i, 2]
+     dam = pedi[i, 3]
+   
+     if (sire == 0 & dam == 0) {
+         # both parents unknown
+ 
+     } else if (sire != 0 & dam == 0) {
+         # 개체의 부만 알고 있을 경우
+     
+         # dyd_n
+         dyd_n[sire] = dyd_n[sire] + n2prog[i] * 2 / 3 * 2 * yd1[i]
+         # dyd_d 
+         dyd_d[sire] = dyd_d[sire] + n2prog[i] * 2 / 3
+ 
+     } else if (sire == 0 & dam != 0) {
+         # dam known
+     
+         # dyd_n
+         dyd_n[dam] = dyd_n[dam] + n2prog[i] * 2 / 3 * 2 * yd1[i]
+         # dyd_d
+         dyd_d[dam] = dyd_d[dam] + n2prog[i] * 2 / 3
+     
+     } else {
+         # both parents known
+     
+         # dyd_n
+         dyd_n[sire] = dyd_n[sire] + n2prog[i] * (2 * yd1[i] - sol_animal[dam])
+         dyd_n[dam] = dyd_n[dam] + n2prog[i] * (2 * yd1[i] - sol_animal[sire])
+     
+         # dyd_d
+         dyd_d[sire] = dyd_d[sire] + n2prog[i]
+         dyd_d[dam] = dyd_d[dam] + n2prog[i]
+     
+     }
+ }
> 
> # print
> dyd_n
[1]  0.25343572 -0.01501608  0.02320667 -0.30625451 -0.34166831  0.26481591  0.00000000  0.00000000
> dyd_d
[1] 0.3818182 0.4000000 0.4000000 0.2000000 0.2000000 0.2000000 0.0000000 0.0000000
> 
> # dyd
> dyd = dyd_n / dyd_d
> dyd[is.nan(dyd) == TRUE] = 0
> 
> # print
> dyd
[1]  0.66376021 -0.03754020  0.05801667 -1.53127256 -1.70834154  1.32407954  0.00000000  0.00000000
> 
> # breeding values and fractions
> stam_result = data.frame(animal = pedi[,1], animal_bv = sol_animal, rel = rel, acc = acc, sep = sep, pa = pa, yd = yd, pc = pc, sum_of_fr = pa + yd + pc, dyd = dyd)
> 
> # print
> stam_result
  animal    animal_bv        rel       acc      sep          pa          yd          pc    sum_of_fr         dyd
1      1  0.098444576 0.05781158 0.2404404 4.340941  0.00000000  0.00000000  0.09844458  0.098444576  0.66376021
2      2 -0.018770099 0.01580856 0.1257321 4.436646  0.00000000  0.00000000 -0.01877010 -0.018770099 -0.03754020
3      3 -0.041084203 0.08708243 0.2950973 4.272979  0.00000000  0.00000000 -0.04108420 -0.041084203  0.05801667
4      4 -0.008663123 0.14463969 0.3803153 4.136086  0.02812702  0.03032093 -0.06711107 -0.008663123 -1.53127256
5      5 -0.185732099 0.14378651 0.3791919 4.138148 -0.01995143 -0.08407167 -0.08170900 -0.185732099 -1.70834154
6      6  0.176872088 0.11543447 0.3397565 4.206104  0.02655816  0.08259500  0.06771893  0.176872088  1.32407954
7      7 -0.249458555 0.11628766 0.3410098 4.204075 -0.07775809 -0.17170047  0.00000000 -0.249458555  0.00000000
8      8  0.182614688 0.15527171 0.3940453 4.110300  0.05431515  0.12829953  0.00000000  0.182614688  0.00000000
> 
> # 파일로 출력, 분리자는 ",", 따옴표 없고 
> output_filename = gsub("[ -]", "", paste("stam_result_", Sys.Date(), ".csv")) 
> write.table(stam_result, output_filename, sep=",", row.names = FALSE, quote = FALSE)
>

결과 파일 - stam_result_20200601.csv

animal,animal_bv,rel,acc,sep,pa,yd,pc,sum_of_fr,dyd
1,0.0984445757038791,0.0578115770821025,0.24044038155456,4.34094096462483,0,0,0.0984445757038788,0.0984445757038788,0.66376020776493
2,-0.0187700991008759,0.0158085581151151,0.125732088645322,4.43664612491212,0,0,-0.0187700991008722,-0.0187700991008722,-0.037540198201747
3,-0.0410842029270846,0.087082430924723,0.295097324496043,4.27297921613311,0,0,-0.0410842029270835,-0.0410842029270835,0.0580166699481492
4,-0.00866312266194402,0.144639692852924,0.38031525456248,4.13608584811069,0.0281270216296797,0.0303209293167937,-0.0671110736084169,-0.00866312266194338,-1.53127256021528
5,-0.18573209949465,0.143786506530158,0.379191912532635,4.13814812076572,-0.0199514340093202,-0.0840716676511125,-0.0817089978342196,-0.185732099494652,-1.70834153704798
6,0.176872087681303,0.11543446872744,0.33975648445238,4.20610397225879,0.0265581588676677,0.0825949990155542,0.0677189297980794,0.176872087681301,1.32407954321716
7,-0.249458554833631,0.116287655050207,0.341009757998517,4.20407503489125,-0.0777580888626375,-0.171700465970993,0,-0.24945855483363,0
8,0.182614687930696,0.155271707028943,0.39404531088308,4.11029997195109,0.0543151539016872,0.128299534029007,0,0.182614687930695,0

저작자표시

'Animal Breeding > R for Genetic Evaluation' 카테고리의 다른 글

R을 이용한 반복 모형(Repeatability Model)의 육종가 구하기 프로그래밍 (0)	2020.06.14
R을 이용하여 반복 모형(repeatability model)에 대한 blupf90 분석 결과 정리하기 (0)	2020.06.13
R을 이용하여 단형질 개체 모형(single trait animal model)에 대한 blupf90 분석 결과 정리하기 (0)	2020.06.03
R을 이용한 혈통 파일 처리 (0)	2020.04.10
R로 단형질 임의회귀모형의 해 구하기 (1)	2011.06.09

Maximum Likelihood Estimation(MLE)

투정이 2020. 4. 24. 16:08

2020. 4. 24. 16:08

Likelihood는 각 사건이 일어날 확률을 모두 곱한 것이다.

예를 들어 사람의 체중을 측정하였을 때 사람의 체중은 정규분포를 따르므로 평균과 분산이 알려져 있는 집단에서 60kg이 나올 확률은 정규 분포의 probability density function에 60kg을 넣으면 된다.

체중을 60, 65, 70 이렇게 관측하였을 때 MLE로 평균과 분산을 추정한다는 것은 60의 확률, 65의 확률, 70의 확률을 구해서 곱한다는 것이다. 이 확률에는 우리가 추정해야 할 미지의 평균과 분산이 포함되어 있을 것이다.

이렇게 각각의 확률을 구해서 곱한 것을 Likelihood라고 한다. Maximum Likelihood Estimation이라는 것은 어떤 평균과 분산을 취할 때 이 Likelihood가 최대(또는 최소)가 될 것인가라는 문제를 푸는 것이다. 즉 우리가 평균과 분산은 모르지만 어떤 집단에서 체중을 관측하였다. 이러한 관측을 할 수 있는 확률(이게 Likelihood)이 최대가 되게 하는 평균과 분산을 찾는 것이 maximun likelihood estimation이다.

MLE를 하기 위해서는 그러므로 분포 즉 확률 밀도 함수(probability density function)를 알고 있어야 한다. 많은 관측치가 정규 분포하는 집단에서 관측되므로 정규 분포하는 집단에서 관측치를 구하였을 경우 평균과 분산에 대하여 MLE하는 것을 설명한다.

각각의 확률을 곱할 경우 매우 다루기 힘든 형태가 되는데, 어차피 최대, 최소와 관련된 문제는 미분을 해서 0로 놓는데 로그를 한 후에 미분을 해서 0으로 놓아도 같은 결과가 나온다. 그래서 각각의 확률을 곱하고(likelihood), 로그를 먼저 한 후(log)에 미분(maximum)을 한다.

결론적으로 MLE로 구한 평균이나 산술평균이나 같고, 분산도 마찬가지이다. 하지만 log-likelihood는 다양한 분야에서 사용되므로 알아 두는게 좋다.

다변량 정규분포의 Maximum Likelihood Estimation

위에서 체중에 대해서 얘기하였는데, 체중과 키를 동시에 다룬다면 다변량 정규 분포를 다루는 것이다.

우선 다변량 정규분포 함수(probability density function)가 왜 그렇게 생겼는지 알아 보고, 다변량 정규분포 함수의 Likelihood를 구하고, 평균과 분산에 대한 Maximum Likelihood Estimation을 할 것이다.

다변량 정규분포 함수를 유도(derivation of multivariate normal distribution)해 보자. 다변량 표준 정규 분포를 유도하고, 선형 변환을 이용하여 다변량 정규분포 함수를 유도한다.

다변량 정규 분포의 likelihood를 구하고, 로그를 취하고, 최대화하는 mean과 variance를 구해 보자.

행렬을 미분할 때 필요한 몇 가지 공식과 합성 함수의 미분법도 같이 써 주었다.

분산 성분 추정(Estimation of variance component)을 위해서 사용하는 REML(Restricted or Residual Maximum Likelihood)를 이해하기 위한 기초자료로 Maximum Likelihood Estimation을 정리하였다.

저작자표시

'ETC' 카테고리의 다른 글

PostgreSQL psql을 이용한 role, database, table 생성 (0)	2020.07.27
The derivative of a determinant(행렬의 행렬식의 도함수, 미분) (1)	2020.05.21
회귀분석 - Piecewise regression (0)	2020.03.01
회귀분석 - 가변수 (0)	2020.03.01
회귀분석 - 일반화 회귀분석 (0)	2020.03.01

R을 이용한 혈통 파일 처리

투정이 2020. 4. 10. 16:32

2020. 4. 10. 16:32

R을 이용한 혈통 파일 처리

혈통 파일을 처리한다는 것은 혈통 파일의 오류를 찾아내고, 혈연 계수 행렬(Numerator Relationship Matrix, NRM)을 구하거나, NRM의 역행렬을 구하거나, 근교계수 등을 구하는 것을 말한다.

1) 혈통 파일의 처리는 전체 혈통에서 필요한 혈통을 추출하는데서 시작한다. 물론 전체 혈통 파일을 처리할 수도 있으나 관심이 가는 개체의 조상을 추적하면 다루어야 할 개체수를 줄일 수 있다.(Trace) 2) 다음으로 이렇게 추출한 개체들을 조상에서 자손까지 정렬하여야 한다. 생일이 있으면 쉽게 할 수 있으나 생일을 신뢰하기 어렵거나 생일이 아예 없는 경우도 있다. 생일이 없는 경우 선조의 세대를 개체의 세대보다 높게 매기고 세대를 정렬한다.(Stack) 3) 다음으로 계산의 편리를 위하여 개체의 이름을 일련번호로 바꾼다.(Renumber) 4) 이렇게 일련번호로 된 혈통을 가지고 NRM, NRM의 역행렬, 근교계수 등을 구한다.

자세한 이론에 대해서는 농촌진흥청에서 나온 “동물육종을 위한 혈통분석 알고리듬과 프로그래밍”을 참고하기 바란다.

http://lib.rda.go.kr/search/mediaView.do?mets_no=000000010979

■ 전체혈통

animal,sire,dam
a1,0,0
a2,0,0
a3,a1,a2
a6,a5,a2
a9,0,0
a10,0,0
a11,0,0
a14,0,0
a12,a11,a10
a15,a13,a12
a16,a15,a14
a4,a1,0
a5,a4,a3
a8,a7,a4
a7,a5,a6
a13,a9,a6

■ Key animals

key
a3
a4
a5
a6

■ Trace Pedigree

- R 프로그램

# trace pedigree

# set working directory
setwd("D:/temp")

# print working directory
getwd()

# read all pedigree
pedi_01 <- read.table("pedi_all.txt", header = TRUE, sep = ",", stringsAsFactors = FALSE)

# print pedigree
dim(pedi_01)
head(pedi_01)
tail(pedi_01)

# number of animal
no_of_animals = nrow(pedi_01)
no_of_animals

# add flag column
pedi_02 = cbind(pedi_01, flag = rep(0, no_of_animals))

# print pedigree
dim(pedi_02)
head(pedi_02)
tail(pedi_02)

# read key animals
key_animals <- read.table("key_animals.txt", header = TRUE, sep = ",", stringsAsFactors = FALSE)

# print key animal
dim(key_animals)
head(key_animals)
tail(key_animals)

# no_of_keys
no_of_keys = nrow(key_animals)
no_of_keys

# process each animal of key animals
for (i in 1 : no_of_keys) {
    # key 파일의 개체를 temp에 넣기
    temp = data.frame(animal = c(key[i, "key"]), stringsAsFactors = FALSE)
    
    # temp를 읽어 내려가면서 처리하기
    # j가 temp 끝까지 처리 
    j = 1
    while (j <= nrow(temp)) {
        # 개체의 위치 저장
        index = which(pedi_02$animal == temp[j, "animal"])
        
        # key에 있는 개체를 pedigree에서 찾았다면
        if (length(index) != 0) {
            # 개체를 아직 처리하지 않았다면
            if (pedi_02[index, "flag"] == 0) {
                # peid_02에 개체를 처리했다고 표시(flag)
                pedi_02[index, "flag"] = 1
                
                # 개체의 아비 찾기
                sire = pedi_02[index, "sire"]
                
                # 아비가 있는 건가 아닌가
                if (sire != 0) {
                    # temp에 sire 추가
                    temp = rbind(temp, c(sire))
                }
                
                # 개체의 어미 찾기
                dam = pedi_02[index, "dam"]
                
                # 어미가 있는 건가 아닌가
                if (dam != 0) {
                    # temp에 dam 추가
                    temp = rbind(temp, c(dam))
                }
            }
        }
        # key에 있는 개체를 pedigree에서 찾지 못했다면
        else{
            print(paste("개체 ", temp[j, "animal"], "을 혈통에서 찾지 못함"))
        }
        
        j = j + 1
    }
    #print(temp)
}

# pedi_02의 flag가 1인 것만 추출 
pedi_03 = pedi_02[which(pedi_02$flag == 1),]

# print pedigree
dim(pedi_03)
head(pedi_03)
tail(pedi_03)

# animal, sire, dam만 추출
pedi_04 = pedi_03[,c("animal", "sire", "dam")]

# print pedigree
dim(pedi_04)
head(pedi_04)
tail(pedi_04)

# 파일로 출력, 분리자는 ",", 따옴표 없고 
output_filename = gsub("[ -]", "", paste("pedi_", Sys.Date(), ".trc"))
write.table(pedi_04, output_filename, sep=",", row.names = FALSE, quote = FALSE)

- 실행 결과

> # trace pedigree
> 
> # set working directory
> setwd("D:/temp")
> 
> # print working directory
> getwd()
[1] "D:/temp"
> 
> # read all pedigree
> pedi_01 <- read.table("pedi_all.txt", header = TRUE, sep = ",", stringsAsFactors = FALSE)
> 
> # print pedigree
> dim(pedi_01)
[1] 16  3
> head(pedi_01)
  animal sire dam
1     a1    0   0
2     a2    0   0
3     a3   a1  a2
4     a6   a5  a2
5     a9    0   0
6    a10    0   0
> tail(pedi_01)
   animal sire dam
11    a16  a15 a14
12     a4   a1   0
13     a5   a4  a3
14     a8   a7  a4
15     a7   a5  a6
16    a13   a9  a6
> 
> # number of animal
> no_of_animals = nrow(pedi_01)
> no_of_animals
[1] 16
> 
> # add flag column
> pedi_02 = cbind(pedi_01, flag = rep(0, no_of_animals))
> 
> # print pedigree
> dim(pedi_02)
[1] 16  4
> head(pedi_02)
  animal sire dam flag
1     a1    0   0    0
2     a2    0   0    0
3     a3   a1  a2    0
4     a6   a5  a2    0
5     a9    0   0    0
6    a10    0   0    0
> tail(pedi_02)
   animal sire dam flag
11    a16  a15 a14    0
12     a4   a1   0    0
13     a5   a4  a3    0
14     a8   a7  a4    0
15     a7   a5  a6    0
16    a13   a9  a6    0
> 
> # read key animals
> key_animals <- read.table("key_animals.txt", header = TRUE, sep = ",", stringsAsFactors = FALSE)
> 
> # print key animal
> dim(key_animals)
[1] 4 1
> head(key_animals)
  key
1  a3
2  a4
3  a5
4  a6
> tail(key_animals)
  key
1  a3
2  a4
3  a5
4  a6
> 
> # no_of_keys
> no_of_keys = nrow(key_animals)
> no_of_keys
[1] 4
> 
> # process each animal of key animals
> for (i in 1 : no_of_keys) {
+     # key 파일의 개체를 temp에 넣기
+     temp = data.frame(animal = c(key_animals[i, "key"]), stringsAsFactors = FALSE)
+     
+     # temp를 읽어 내려가면서 처리하기
+     # j가 temp 끝까지 처리 
+     j = 1
+     while (j <= nrow(temp)) {
+         # 개체의 위치 저장
+         index = which(pedi_02$animal == temp[j, "animal"])
+         
+         # key에 있는 개체를 pedigree에서 찾았다면
+         if (length(index) != 0) {
+             # 개체를 아직 처리하지 않았다면
+             if (pedi_02[index, "flag"] == 0) {
+                 # peid_02에 개체를 처리했다고 표시(flag)
+                 pedi_02[index, "flag"] = 1
+                 
+                 # 개체의 아비 찾기
+                 sire = pedi_02[index, "sire"]
+                 
+                 # 아비가 있는 건가 아닌가
+                 if (sire != 0) {
+                     # temp에 sire 추가
+                     temp = rbind(temp, c(sire))
+                 }
+                 
+                 # 개체의 어미 찾기
+                 dam = pedi_02[index, "dam"]
+                 
+                 # 어미가 있는 건가 아닌가
+                 if (dam != 0) {
+                     # temp에 dam 추가
+                     temp = rbind(temp, c(dam))
+                 }
+             }
+         }
+         # key에 있는 개체를 pedigree에서 찾지 못했다면
+         else{
+             print(paste("개체 ", temp[j, "animal"], "을 혈통에서 찾지 못함"))
+         }
+         
+         j = j + 1
+     }
+     #print(temp)
+ }
> 
> # pedi_02의 flag가 1인 것만 추출 
> pedi_03 = pedi_02[which(pedi_02$flag == 1),]
> 
> # print pedigree
> dim(pedi_03)
[1] 6 4
> head(pedi_03)
   animal sire dam flag
1      a1    0   0    1
2      a2    0   0    1
3      a3   a1  a2    1
4      a6   a5  a2    1
12     a4   a1   0    1
13     a5   a4  a3    1
> tail(pedi_03)
   animal sire dam flag
1      a1    0   0    1
2      a2    0   0    1
3      a3   a1  a2    1
4      a6   a5  a2    1
12     a4   a1   0    1
13     a5   a4  a3    1
> 
> # animal, sire, dam만 추출
> pedi_04 = pedi_03[,c("animal", "sire", "dam")]
> 
> # print pedigree
> dim(pedi_04)
[1] 6 3
> head(pedi_04)
   animal sire dam
1      a1    0   0
2      a2    0   0
3      a3   a1  a2
4      a6   a5  a2
12     a4   a1   0
13     a5   a4  a3
> tail(pedi_04)
   animal sire dam
1      a1    0   0
2      a2    0   0
3      a3   a1  a2
4      a6   a5  a2
12     a4   a1   0
13     a5   a4  a3
> 
> # 파일로 출력, 분리자는 ",", 따옴표 없고 
> output_filename = gsub("[ -]", "", paste("pedi_", Sys.Date(), ".trc"))
> write.table(pedi_04, output_filename, sep=",", row.names = FALSE, quote = FALSE)

- 결과 파일(pedi_20200410.trc)

animal,sire,dam
a1,0,0
a2,0,0
a3,a1,a2
a6,a5,a2
a4,a1,0
a5,a4,a3

■ Stack Pedigree

- R 프로그램

# stack pedigree and check errors

# set working directory
setwd("D:/temp")

# print working directory
getwd()

# 혈통 자료 읽기
pedi_01 <- read.table("pedi_20200410.trc", header = TRUE, sep = ",", stringsAsFactors = FALSE)

# print pedigree
dim(pedi_01)
head(pedi_01)
tail(pedi_01)

# no_of_animals
no_of_animals = nrow(pedi_01)
no_of_animals

# copy pedigree
pedi_02 = pedi_01

# 아비 어미가 모두 개체에 나오게 하자
for (i in 1 : no_of_animals) {
    
    # 아비가 0이 아니고, 개체에 등장하지 않으면
    if (pedi_01[i, 2] != 0 && length(which(pedi_02$animal == pedi_01[i, 2])) == 0) {
        # add sire to animal
        pedi_02 = rbind(pedi_02, c(pedi_01[i, 2], 0, 0))
    }
    
    # 어미가 0이 아니고, 개체에 등장하지 않으면
    if (pedi_01[i, 3] != 0 && length(which(pedi_02$animal == pedi_01[i, 3])) == 0) {
        # add dam to animal
        pedi_02 = rbind(pedi_02, c(pedi_01[i, 3], 0, 0))
    }
}

# print pedigree
dim(pedi_02)
tail(pedi_02)

# 중복된 행 제거 
pedi_03 = unique(pedi_02)

# number of animals in pedi_02 and pedi_03
nrow(pedi_02)
no_of_animals = nrow(pedi_03)
no_of_animals

# dup, sex, gen 컬럼 추가하기
pedi_04 = cbind(pedi_03, dup = rep(0, no_of_animals), 
                sex = rep(0, no_of_animals), 
                asd = rep(0, no_of_animals), 
                gen = rep(1, no_of_animals))

# print pedigree
dim(pedi_04)
head(pedi_04)
tail(pedi_04)

################################
# check duplication error
################################

# sorting
pedi_04 = pedi_04[order(pedi_04$animal), ]
head(pedi_04)
tail(pedi_04)

for (i in 2 : no_of_animals) {
    
    # 이전 개체와 현재 개체가 같은 경우
    if (pedi_04[i - 1, "animal"] == pedi_04[i, "animal"]){
        
        # 아비가 다르거나 어미가 다르다면 
        if (pedi_04[i - 1, "sire"] != pedi_04[i, "sire"] 
            || pedi_04[i - 1, "dam"] != pedi_04[i, "dam"]){
            pedi_04[i - 1, "dup"] = 1
            pedi_04[i    , "dup"] = 1
            
            print(paste("개체 ", pedi_04[i, "animal"] , "가 두 번 등장, 혈통이 다름"))
        }
    }
}

# print animals with duplication error
pedi_04[which(pedi_04$dup == 1), ]

##########################
# check sex error
##########################
for (i in 1 : no_of_animals) {
    # sire와 dam을 저장
    sire = pedi_04[i, "sire"]
    dam  = pedi_04[i, "dam"]
    
    # sire가 있다면
    if (sire != 0) {
        # sire가 개체의 어디에 나오는가
        index = which(pedi_04$animal == sire)
        
        # sire의 sex가 결정되지 않았다면 수로 표시
        if (pedi_04[index, "sex"] == 0) {
            pedi_04[index, "sex"] = 2
        }
        # sire의 sex가 암으로 되어 있다면 error표시
        else if (pedi_04[index, "sex"] == 1) {
            pedi_04[index, "sex"] = 3
        }
    }
    
    # dma이 있다면
    if (dam != 0) {
        # dam이 개체의 어디에 나오는가
        index = which(pedi_04$animal == dam)
        
        # dam의 sex가 결정되지 않았다면 암으로 표시
        if (pedi_04[index, "sex"] == 0) {
            pedi_04[index, "sex"] = 1
        }
        # dam의 sex가 수로 되어 있다면 error 표시
        else if (pedi_04[index, "sex"] == 2) {
            pedi_04[index, "sex"] = 3
        }
    }
}

# print animals with sex error
pedi_04[which(pedi_04$sex == 3), ]


########################################
# check animal = sire = dam error
########################################
# animal = sire인 개체의 위치 저장
index = which(pedi_04$animal == pedi_04$sire)

# animal = sire인 개체가 있다면 error 추가 
if (length(index) != 0) {
    pedi_04[index, "asd"] = pedi_04[index, "asd"] + 1
}

# animal = dam인 개체의 위치 저장
index = which(pedi_04$animal == pedi_04$dam)

# animal = dam인 개체가 있다면 error 추가 
if (length(index) != 0) {
    pedi_04[index, "asd"] = pedi_04[index, "asd"] + 2
}

# sire와 dam이 0이 아니고, sire = dam인 개체의 위치 저장 
index = which(pedi_04$sire == pedi_04$dam & pedi_04$sire != 0 & pedi_04$dam != 0)

# sire = dam이면 error 추가 
if (length(index) != 0) {
    pedi_04[index, "asd"] = pedi_04[index, "asd"] + 4
}

# print animals with a = s = d error
pedi_04[which(pedi_04$asd != 0), ]

#########################
# check cycling error
#########################
for (i in 1 : 50) {
    for (j in 1 : no_of_animals) {
        sire = pedi_04[j, "sire"]
        dam  = pedi_04[j, "dam"]
        
        # 아비가 0이 아니고, 아비의 세대수가 개체의 세대수보다 같거나 작다면
        index = which(pedi_04$animal == sire)
        if ((sire != 0) && (pedi_04[index, "gen"] <= pedi_04[j, "gen"])) {
            pedi_04[index, "gen"] = pedi_04[j, "gen"] + 1
        }
        
        # 어미가 0이 아니고, 어미의 세대수가 개체의 세대수보다 같거나 작다면
        index = which(pedi_04$animal == dam)
        if ((dam != 0) && (pedi_04[index, "gen"] <= pedi_04[j, "gen"])) {
            pedi_04[index, "gen"] = pedi_04[j, "gen"] + 1
        }
    }
}

# sort by generation
pedi_04 = pedi_04[order(-pedi_04$gen),]

# print pedigree : if generation is very large, there may be an error
head(pedi_04, 10)
tail(pedi_04, 10)

# 파일로 출력, 분리자는 ",", 따옴표 없고 
output_filename = gsub("[ -]", "", paste("pedi_", Sys.Date(), ".stk"))
write.table(pedi_04, output_filename, sep=",", row.names = FALSE, quote = FALSE)

- 실행 결과

> # set working directory
> setwd("D:/temp")
> # print working directory
> getwd()
[1] "D:/temp"
> # 혈통 자료 읽기
> pedi_01 <- read.table("pedi_20200410.trc", header = TRUE, sep = ",", stringsAsFactors = FALSE)
> # print pedigree
> dim(pedi_01)
[1] 6 3
> head(pedi_01)
  animal sire dam
1     a1    0   0
2     a2    0   0
3     a3   a1  a2
4     a6   a5  a2
5     a4   a1   0
6     a5   a4  a3
> tail(pedi_01)
  animal sire dam
1     a1    0   0
2     a2    0   0
3     a3   a1  a2
4     a6   a5  a2
5     a4   a1   0
6     a5   a4  a3
> # no_of_animals
> no_of_animals = nrow(pedi_01)
> no_of_animals
[1] 6
> # copy pedigree
> pedi_02 = pedi_01
> # 아비 어미가 모두 개체에 나오게 하자
> for (i in 1 : no_of_animals) {
+     
+     # 아비가 0이 아니고, 개체에 등장하지 않으면
+     if (pedi_01[i, 2] != 0 && length(which(pedi_02$animal == pedi_01[i, 2])) == 0) {
+         # add sire to animal
+         pedi_02 = rbind(pedi_02, c(pedi_01[i, 2], 0, 0))
+     }
+     
+     # 어미가 0이 아니고, 개체에 등장하지 않으면
+     if (pedi_01[i, 3] != 0 && length(which(pedi_02$animal == pedi_01[i, 3])) == 0) {
+         # add dam to animal
+         pedi_02 = rbind(pedi_02, c(pedi_01[i, 3], 0, 0))
+     }
+ }
> # print pedigree
> dim(pedi_02)
[1] 6 3
> tail(pedi_02)
  animal sire dam
1     a1    0   0
2     a2    0   0
3     a3   a1  a2
4     a6   a5  a2
5     a4   a1   0
6     a5   a4  a3
> # 중복된 행 제거 
> pedi_03 = unique(pedi_02)
> # number of animals in pedi_02 and pedi_03
> nrow(pedi_02)
[1] 6
> no_of_animals = nrow(pedi_03)
> no_of_animals
[1] 6
> # dup, sex, gen 컬럼 추가하기
> pedi_04 = cbind(pedi_03, dup = rep(0, no_of_animals), 
+                 sex = rep(0, no_of_animals), 
+                 asd = rep(0, no_of_animals), 
+                 gen = rep(1, no_of_animals))
> # print pedigree
> dim(pedi_04)
[1] 6 7
> head(pedi_04)
  animal sire dam dup sex asd gen
1     a1    0   0   0   0   0   1
2     a2    0   0   0   0   0   1
3     a3   a1  a2   0   0   0   1
4     a6   a5  a2   0   0   0   1
5     a4   a1   0   0   0   0   1
6     a5   a4  a3   0   0   0   1
> tail(pedi_04)
  animal sire dam dup sex asd gen
1     a1    0   0   0   0   0   1
2     a2    0   0   0   0   0   1
3     a3   a1  a2   0   0   0   1
4     a6   a5  a2   0   0   0   1
5     a4   a1   0   0   0   0   1
6     a5   a4  a3   0   0   0   1
> # sorting
> pedi_04 = pedi_04[order(pedi_04$animal), ]
> head(pedi_04)
  animal sire dam dup sex asd gen
1     a1    0   0   0   0   0   1
2     a2    0   0   0   0   0   1
3     a3   a1  a2   0   0   0   1
5     a4   a1   0   0   0   0   1
6     a5   a4  a3   0   0   0   1
4     a6   a5  a2   0   0   0   1
> tail(pedi_04)
  animal sire dam dup sex asd gen
1     a1    0   0   0   0   0   1
2     a2    0   0   0   0   0   1
3     a3   a1  a2   0   0   0   1
5     a4   a1   0   0   0   0   1
6     a5   a4  a3   0   0   0   1
4     a6   a5  a2   0   0   0   1
> for (i in 2 : no_of_animals) {
+     
+     # 이전 개체와 현재 개체가 같은 경우
+     if (pedi_04[i - 1, "animal"] == pedi_04[i, "animal"]){
+         
+         # 아비가 다르거나 어미가 다르다면 
+         if (pedi_04[i - 1, "sire"] != pedi_04[i, "sire"] 
+             || pedi_04[i - 1, "dam"] != pedi_04[i, "dam"]){
+             pedi_04[i - 1, "dup"] = 1
+             pedi_04[i    , "dup"] = 1
+             
+             print(paste("개체 ", pedi_04[i, "animal"] , "가 두 번 등장, 혈통이 다름"))
+         }
+     }
+ }
> # print animals with duplication error
> pedi_04[which(pedi_04$dup == 1), ]
[1] animal sire   dam    dup    sex    asd    gen   
<0 행> <또는 row.names의 길이가 0입니다>
> ##########################
> # check sex error
> ##########################
> for (i in 1 : no_of_animals) {
+     # sire와 dam을 저장
+     sire = pedi_04[i, "sire"]
+     dam  = pedi_04[i, "dam"]
+     
+     # sire가 있다면
+     if (sire != 0) {
+         # sire가 개체의 어디에 나오는가
+         index = which(pedi_04$animal == sire)
+         
+         # sire의 sex가 결정되지 않았다면 수로 표시
+         if (pedi_04[index, "sex"] == 0) {
+             pedi_04[index, "sex"] = 2
+         }
+         # sire의 sex가 암으로 되어 있다면 error표시
+         else if (pedi_04[index, "sex"] == 1) {
+             pedi_04[index, "sex"] = 3
+         }
+     }
+     
+     # dma이 있다면
+     if (dam != 0) {
+         # dam이 개체의 어디에 나오는가
+         index = which(pedi_04$animal == dam)
+         
+         # dam의 sex가 결정되지 않았다면 암으로 표시
+         if (pedi_04[index, "sex"] == 0) {
+             pedi_04[index, "sex"] = 1
+         }
+         # dam의 sex가 수로 되어 있다면 error 표시
+         else if (pedi_04[index, "sex"] == 2) {
+             pedi_04[index, "sex"] = 3
+         }
+     }
+ }
> # print animals with sex error
> pedi_04[which(pedi_04$sex == 3), ]
[1] animal sire   dam    dup    sex    asd    gen   
<0 행> <또는 row.names의 길이가 0입니다>
> ########################################
> # check animal = sire = dam error
> ########################################
> # animal = sire인 개체의 위치 저장
> index = which(pedi_04$animal == pedi_04$sire)
> # animal = sire인 개체가 있다면 error 추가 
> if (length(index) != 0) {
+     pedi_04[index, "asd"] = pedi_04[index, "asd"] + 1
+ }
> # animal = dam인 개체의 위치 저장
> index = which(pedi_04$animal == pedi_04$dam)
> # animal = dam인 개체가 있다면 error 추가 
> if (length(index) != 0) {
+     pedi_04[index, "asd"] = pedi_04[index, "asd"] + 2
+ }
> # sire와 dam이 0이 아니고, sire = dam인 개체의 위치 저장 
> index = which(pedi_04$sire == pedi_04$dam & pedi_04$sire != 0 & pedi_04$dam != 0)
> # sire = dam이면 error 추가 
> if (length(index) != 0) {
+     pedi_04[index, "asd"] = pedi_04[index, "asd"] + 4
+ }
> # print animals with a = s = d error
> pedi_04[which(pedi_04$asd != 0), ]
[1] animal sire   dam    dup    sex    asd    gen   
<0 행> <또는 row.names의 길이가 0입니다>
> #########################
> # check cycling error
> #########################
> for (i in 1 : 50) {
+     for (j in 1 : no_of_animals) {
+         sire = pedi_04[j, "sire"]
+         dam  = pedi_04[j, "dam"]
+         
+         # 아비가 0이 아니고, 아비의 세대수가 개체의 세대수보다 같거나 작다면
+         index = which(pedi_04$animal == sire)
+         if ((sire != 0) && (pedi_04[index, "gen"] <= pedi_04[j, "gen"])) {
+             pedi_04[index, "gen"] = pedi_04[j, "gen"] + 1
+         }
+         
+         # 어미가 0이 아니고, 어미의 세대수가 개체의 세대수보다 같거나 작다면
+         index = which(pedi_04$animal == dam)
+         if ((dam != 0) && (pedi_04[index, "gen"] <= pedi_04[j, "gen"])) {
+             pedi_04[index, "gen"] = pedi_04[j, "gen"] + 1
+         }
+     }
+ }
> # sort by generation
> pedi_04 = pedi_04[order(-pedi_04$gen),]
> # print pedigree : if generation is very large, there may be an error
> head(pedi_04, 10)
  animal sire dam dup sex asd gen
1     a1    0   0   0   2   0   4
2     a2    0   0   0   1   0   4
3     a3   a1  a2   0   1   0   3
5     a4   a1   0   0   2   0   3
6     a5   a4  a3   0   2   0   2
4     a6   a5  a2   0   0   0   1
> tail(pedi_04, 10)
  animal sire dam dup sex asd gen
1     a1    0   0   0   2   0   4
2     a2    0   0   0   1   0   4
3     a3   a1  a2   0   1   0   3
5     a4   a1   0   0   2   0   3
6     a5   a4  a3   0   2   0   2
4     a6   a5  a2   0   0   0   1
> # 파일로 출력, 분리자는 ",", 따옴표 없고 
> output_filename = gsub("[ -]", "", paste("pedi_", Sys.Date(), ".stk"))
> write.table(pedi_04, output_filename, sep=",", row.names = FALSE, quote = FALSE)

- 결과 파일(pedi_20200410.stk)

animal,sire,dam,dup,sex,asd,gen
a1,0,0,0,2,0,4
a2,0,0,0,1,0,4
a3,a1,a2,0,1,0,3
a4,a1,0,0,2,0,3
a5,a4,a3,0,2,0,2
a6,a5,a2,0,0,0,1

■ Renumber Pedigree

- R 프로그램

# renumber pedigree

# set working directory
setwd("D:/temp")

# print working directory
getwd()

# 혈통 자료 읽기
pedi_01 <- read.table("pedi_20200410.stk", header = TRUE, sep = ",", stringsAsFactors = FALSE)

# print pedigree
dim(pedi_01)
head(pedi_01)
tail(pedi_01)

# animal 기준 넘버링
animal_renum = as.data.frame(cbind(animal = pedi_01[, "animal"], arenum = seq(1, nrow(pedi_01))), stringsAsFactors = FALSE)

# 부모가 0인 경우 추가
animal_renum = rbind(animal_renum, c(0, 0))

# print animal renumber
dim(animal_renum)
head(animal_renum)
tail(animal_renum)

# 조인하기 위한 패키지 실행
#install.packages("plyr")
library(plyr)

# 개체에 번호 붙이기
pedi_02 = join(x = pedi_01, y = animal_renum, by = "animal", type = "left")

# print pedigree
head(pedi_02)
tail(pedi_02)

# animal에서 sire로 이름 변경
sire_renum = animal_renum
colnames(sire_renum) = c("sire", "srenum")
head(sire_renum)
tail(sire_renum)

# sire에 번호 붙이기 
pedi_03 = join(x = pedi_02, y = sire_renum, by = "sire", type = "left")

# print pedigree
head(pedi_03)
tail(pedi_03)

# sire에서 dam으로 이름 변경 
dam_renum = animal_renum
colnames(dam_renum) = c("dam", "drenum")
head(dam_renum)

# dam에 번호 붙이기 
pedi_04 = join(x = pedi_03, y = dam_renum, by = "dam", type = "left")

# print pedigree
head(pedi_04)
tail(pedi_04)

# select arenum, srenum, drenum
pedi_05 = pedi_04[, c("arenum", "srenum", "drenum")]

head(pedi_05)

# renumbered pedigree 파일로 출력, 분리자는 ",", 따옴표 없고 
output_filename = gsub("[ -]", "", paste("pedi_", Sys.Date(), ".ren"))
write.table(pedi_05, output_filename, sep=",", row.names = FALSE, quote = FALSE)

# animal number 파일로 출력, 분리자는 ",", 따옴표 없고 
arn_animal = pedi_04[,c("arenum", "animal", "sire", "dam")]
output_filename = gsub("[ -]", "", paste("pedi_", Sys.Date(), ".arn"))
write.table(arn_animal, output_filename, sep=",", row.names = FALSE, quote = FALSE)

- 실행 결과

> # set working directory
> setwd("D:/temp")
> # print working directory
> getwd()
[1] "D:/temp"
> # 혈통 자료 읽기
> pedi_01 <- read.table("pedi_20200410.stk", header = TRUE, sep = ",", stringsAsFactors = FALSE)
> # print pedigree
> dim(pedi_01)
[1] 6 7
> head(pedi_01)
  animal sire dam dup sex asd gen
1     a1    0   0   0   2   0   4
2     a2    0   0   0   1   0   4
3     a3   a1  a2   0   1   0   3
4     a4   a1   0   0   2   0   3
5     a5   a4  a3   0   2   0   2
6     a6   a5  a2   0   0   0   1
> tail(pedi_01)
  animal sire dam dup sex asd gen
1     a1    0   0   0   2   0   4
2     a2    0   0   0   1   0   4
3     a3   a1  a2   0   1   0   3
4     a4   a1   0   0   2   0   3
5     a5   a4  a3   0   2   0   2
6     a6   a5  a2   0   0   0   1
> # animal 기준 넘버링
> animal_renum = as.data.frame(cbind(animal = pedi_01[, "animal"], arenum = seq(1, nrow(pedi_01))), stringsAsFactors = FALSE)
> # 부모가 0인 경우 추가
> animal_renum = rbind(animal_renum, c(0, 0))
> # print animal renumber
> dim(animal_renum)
[1] 7 2
> head(animal_renum)
  animal arenum
1     a1      1
2     a2      2
3     a3      3
4     a4      4
5     a5      5
6     a6      6
> tail(animal_renum)
  animal arenum
2     a2      2
3     a3      3
4     a4      4
5     a5      5
6     a6      6
7      0      0
> # 조인하기 위한 패키지 실행
> #install.packages("plyr")
> library(plyr)
> # 개체에 번호 붙이기
> pedi_02 = join(x = pedi_01, y = animal_renum, by = "animal", type = "left")
> # print pedigree
> head(pedi_02)
  animal sire dam dup sex asd gen arenum
1     a1    0   0   0   2   0   4      1
2     a2    0   0   0   1   0   4      2
3     a3   a1  a2   0   1   0   3      3
4     a4   a1   0   0   2   0   3      4
5     a5   a4  a3   0   2   0   2      5
6     a6   a5  a2   0   0   0   1      6
> tail(pedi_02)
  animal sire dam dup sex asd gen arenum
1     a1    0   0   0   2   0   4      1
2     a2    0   0   0   1   0   4      2
3     a3   a1  a2   0   1   0   3      3
4     a4   a1   0   0   2   0   3      4
5     a5   a4  a3   0   2   0   2      5
6     a6   a5  a2   0   0   0   1      6
> # animal에서 sire로 이름 변경
> sire_renum = animal_renum
> colnames(sire_renum) = c("sire", "srenum")
> head(sire_renum)
  sire srenum
1   a1      1
2   a2      2
3   a3      3
4   a4      4
5   a5      5
6   a6      6
> tail(sire_renum)
  sire srenum
2   a2      2
3   a3      3
4   a4      4
5   a5      5
6   a6      6
7    0      0
> # sire에 번호 붙이기 
> pedi_03 = join(x = pedi_02, y = sire_renum, by = "sire", type = "left")
> # print pedigree
> head(pedi_03)
  animal sire dam dup sex asd gen arenum srenum
1     a1    0   0   0   2   0   4      1      0
2     a2    0   0   0   1   0   4      2      0
3     a3   a1  a2   0   1   0   3      3      1
4     a4   a1   0   0   2   0   3      4      1
5     a5   a4  a3   0   2   0   2      5      4
6     a6   a5  a2   0   0   0   1      6      5
> tail(pedi_03)
  animal sire dam dup sex asd gen arenum srenum
1     a1    0   0   0   2   0   4      1      0
2     a2    0   0   0   1   0   4      2      0
3     a3   a1  a2   0   1   0   3      3      1
4     a4   a1   0   0   2   0   3      4      1
5     a5   a4  a3   0   2   0   2      5      4
6     a6   a5  a2   0   0   0   1      6      5
> # sire에서 dam으로 이름 변경 
> dam_renum = animal_renum
> colnames(dam_renum) = c("dam", "drenum")
> head(dam_renum)
  dam drenum
1  a1      1
2  a2      2
3  a3      3
4  a4      4
5  a5      5
6  a6      6
> # dam에 번호 붙이기 
> pedi_04 = join(x = pedi_03, y = dam_renum, by = "dam", type = "left")
> # print pedigree
> head(pedi_04)
  animal sire dam dup sex asd gen arenum srenum drenum
1     a1    0   0   0   2   0   4      1      0      0
2     a2    0   0   0   1   0   4      2      0      0
3     a3   a1  a2   0   1   0   3      3      1      2
4     a4   a1   0   0   2   0   3      4      1      0
5     a5   a4  a3   0   2   0   2      5      4      3
6     a6   a5  a2   0   0   0   1      6      5      2
> tail(pedi_04)
  animal sire dam dup sex asd gen arenum srenum drenum
1     a1    0   0   0   2   0   4      1      0      0
2     a2    0   0   0   1   0   4      2      0      0
3     a3   a1  a2   0   1   0   3      3      1      2
4     a4   a1   0   0   2   0   3      4      1      0
5     a5   a4  a3   0   2   0   2      5      4      3
6     a6   a5  a2   0   0   0   1      6      5      2
> # select arenum, srenum, drenum
> pedi_05 = pedi_04[, c("arenum", "srenum", "drenum")]
> head(pedi_05)
  arenum srenum drenum
1      1      0      0
2      2      0      0
3      3      1      2
4      4      1      0
5      5      4      3
6      6      5      2
> # renumbered pedigree 파일로 출력, 분리자는 ",", 따옴표 없고 
> output_filename = gsub("[ -]", "", paste("pedi_", Sys.Date(), ".ren"))
> write.table(pedi_05, output_filename, sep=",", row.names = FALSE, quote = FALSE)
> # animal number 파일로 출력, 분리자는 ",", 따옴표 없고 
> arn_animal = pedi_04[,c("arenum", "animal", "sire", "dam")]
> output_filename = gsub("[ -]", "", paste("pedi_", Sys.Date(), ".arn"))
> write.table(arn_animal, output_filename, sep=",", row.names = FALSE, quote = FALSE)

- 결과 파일

<pedi_20200410.ren>
arenum,srenum,drenum
1,0,0
2,0,0
3,1,2
4,1,0
5,4,3
6,5,2

<pedi_20200410.arn>
arenum,animal,sire,dam
1,a1,0,0
2,a2,0,0
3,a3,a1,a2
4,a4,a1,0
5,a5,a4,a3
6,a6,a5,a2

■ Numerator Relationship Matrix 만들기

- R 프로그램

# make numerator relationship matrix

# set working directory
setwd("D:/temp")

# print working directory
getwd()

# 혈통 자료 읽기
pedi_01 <- read.table("pedi_20200410.ren", header = TRUE, sep = ",", stringsAsFactors = FALSE)

# 입력한 혈통 파일 확인
dim(pedi_01)
head(pedi_01)
tail(pedi_01)

# number of animal
no_of_animals = nrow(pedi_01)
no_of_animals

# empty NRM 생성
A = matrix(rep(0, no_of_animals * no_of_animals), nrow = no_of_animals)

# empty NRM 확인
A

# 개체를 하나씩 읽으면서 NRM을 채워 나가기
for (i in 1:no_of_animals) {
    animal = pedi_01[i, 1]
    sire = pedi_01[i, 2]
    dam = pedi_01[i, 3]
    
    if (sire == 0 & dam == 0) {
        # 부모를 모를 경우
        A[animal, animal] = 1
    } else if (sire != 0 & dam == 0) {
        # 부만 알고 있을 경우
        for (j in 1:animal - 1) {
            A[j, animal] = 0.5 * (A[j, sire])
            A[animal, j] = A[j, animal]
        }
        A[animal, animal] = 1
    } else if (sire == 0 & dam != 0) {
        # 모만 알고 있을 경우
        for (j in 1:animal - 1) {
            A[j, animal] = 0.5 * (A[j, dam])
            A[animal, j] = A[j, animal]
        }
        A[animal, animal] = 1
    } else {
        # 부와 모를 알고 있을 경우
        for (j in 1:animal - 1) {
            A[j, animal] = 0.5 * (A[j, sire] + A[j, dam])
            A[animal, j] = A[j, animal]
        }
        A[animal, animal] = 1 + 0.5 * A[sire, dam]
    }
}

# NRM 출력
A

# install 'MASS' packages
#install.packages("MASS")

# activate package
library("MASS")

# inverse of A
ginv(A)

- 실행 결과

> # set working directory
> setwd("D:/temp")
> # print working directory
> getwd()
[1] "D:/temp"
> # 혈통 자료 읽기
> pedi_01 <- read.table("pedi_20200410.ren", header = TRUE, sep = ",", stringsAsFactors = FALSE)
> # 입력한 혈통 파일 확인
> dim(pedi_01)
[1] 6 3
> head(pedi_01)
  arenum srenum drenum
1      1      0      0
2      2      0      0
3      3      1      2
4      4      1      0
5      5      4      3
6      6      5      2
> tail(pedi_01)
  arenum srenum drenum
1      1      0      0
2      2      0      0
3      3      1      2
4      4      1      0
5      5      4      3
6      6      5      2
> # number of animal
> no_of_animals = nrow(pedi_01)
> no_of_animals
[1] 6
> # empty NRM 생성
> A = matrix(rep(0, no_of_animals * no_of_animals), nrow = no_of_animals)
> # empty NRM 확인
> A
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    0    0    0    0    0    0
[2,]    0    0    0    0    0    0
[3,]    0    0    0    0    0    0
[4,]    0    0    0    0    0    0
[5,]    0    0    0    0    0    0
[6,]    0    0    0    0    0    0
> # 개체를 하나씩 읽으면서 NRM을 채워 나가기
> for (i in 1:no_of_animals) {
+     animal = pedi_01[i, 1]
+     sire = pedi_01[i, 2]
+     dam = pedi_01[i, 3]
+     
+     if (sire == 0 & dam == 0) {
+         # 부모를 모를 경우
+         A[animal, animal] = 1
+     } else if (sire != 0 & dam == 0) {
+         # 부만 알고 있을 경우
+         for (j in 1:animal - 1) {
+             A[j, animal] = 0.5 * (A[j, sire])
+             A[animal, j] = A[j, animal]
+         }
+         A[animal, animal] = 1
+     } else if (sire == 0 & dam != 0) {
+         # 모만 알고 있을 경우
+         for (j in 1:animal - 1) {
+             A[j, animal] = 0.5 * (A[j, dam])
+             A[animal, j] = A[j, animal]
+         }
+         A[animal, animal] = 1
+     } else {
+         # 부와 모를 알고 있을 경우
+         for (j in 1:animal - 1) {
+             A[j, animal] = 0.5 * (A[j, sire] + A[j, dam])
+             A[animal, j] = A[j, animal]
+         }
+         A[animal, animal] = 1 + 0.5 * A[sire, dam]
+     }
+ }
> # NRM 출력
> A
     [,1]  [,2]   [,3]   [,4]   [,5]   [,6]
[1,] 1.00 0.000 0.5000 0.5000 0.5000 0.2500
[2,] 0.00 1.000 0.5000 0.0000 0.2500 0.6250
[3,] 0.50 0.500 1.0000 0.2500 0.6250 0.5625
[4,] 0.50 0.000 0.2500 1.0000 0.6250 0.3125
[5,] 0.50 0.250 0.6250 0.6250 1.1250 0.6875
[6,] 0.25 0.625 0.5625 0.3125 0.6875 1.1250
> # activate package
> library("MASS")
> # inverse of A
> ginv(A)
              [,1]          [,2]          [,3]          [,4]          [,5]          [,6]
[1,]  1.833333e+00  5.000000e-01 -1.000000e+00 -6.666667e-01 -1.887379e-15  6.106227e-16
[2,]  5.000000e-01  2.033333e+00 -1.000000e+00  2.220446e-16  5.333333e-01 -1.066667e+00
[3,] -1.000000e+00 -1.000000e+00  2.500000e+00  5.000000e-01 -1.000000e+00 -8.881784e-16
[4,] -6.666667e-01 -4.440892e-16  5.000000e-01  1.833333e+00 -1.000000e+00 -7.771561e-16
[5,] -4.440892e-16  5.333333e-01 -1.000000e+00 -1.000000e+00  2.533333e+00 -1.066667e+00
[6,] -4.996004e-16 -1.066667e+00 -1.110223e-16 -1.110223e-16 -1.066667e+00  2.133333e+00

■ 근친교배를 무시한 NRM의 역행렬 만들기

- R 프로그램

# make inverse of numerator relationship matrix ignoring inbreeding

# set working directory
setwd("D:/temp")

# print working directory
getwd()

# 혈통 자료 읽기
pedi_01 <- read.table("pedi_20200410.ren", header = TRUE, sep = ",", stringsAsFactors = FALSE)

# 입력한 혈통 파일 확인
dim(pedi_01)
head(pedi_01)
tail(pedi_01)

# number of animal
no_of_animals = nrow(pedi_01)
no_of_animals

# 혈연계수의 역행렬(A-1) 할당
Ainv = matrix(rep(0, no_of_animals * no_of_animals), nrow = no_of_animals)
Ainv

# 혈연계수의 역행렬(A-1) 계산

for (i in 1 : no_of_animals) {
    animal = pedi_01[i, 1]
    sire = pedi_01[i, 2]
    dam = pedi_01[i, 3]
    
    if (sire == 0 & dam == 0) {
        # 개체의 부모를 둘다 모를 경우
        alpha = 1
        Ainv[animal, animal] = alpha + Ainv[animal, animal]
    } else if (sire != 0 & dam == 0) {
        # 개체의 부만 알고 있을 경우
        alpha = 4/3
        Ainv[animal, animal] = alpha + Ainv[animal, animal]
        Ainv[animal, sire] = -alpha/2 + Ainv[animal, sire]
        Ainv[sire, animal] = -alpha/2 + Ainv[sire, animal]
        Ainv[sire, sire] = alpha/4 + Ainv[sire, sire]
    } else if (sire == 0 & dam != 0) {
        # 개체의 모만 알고 있을 경우
        alpha = 4/3
        Ainv[animal, animal] = alpha + Ainv[animal, animal]
        Ainv[animal, dam] = -alpha/2 + Ainv[animal, dam]
        Ainv[dam, animal] = -alpha/2 + Ainv[dam, animal]
        Ainv[dam, dam] = alpha/4 + Ainv[dam, dam]
    } else {
        # 개체의 부모를 모두 알고 있을 경우
        alpha = 2
        Ainv[animal, animal] = alpha + Ainv[animal, animal]
        Ainv[animal, sire] = -alpha/2 + Ainv[animal, sire]
        Ainv[sire, animal] = -alpha/2 + Ainv[sire, animal]
        Ainv[animal, dam] = -alpha/2 + Ainv[animal, dam]
        Ainv[dam, animal] = -alpha/2 + Ainv[dam, animal]
        Ainv[sire, sire] = alpha/4 + Ainv[sire, sire]
        Ainv[sire, dam] = alpha/4 + Ainv[sire, dam]
        Ainv[dam, sire] = alpha/4 + Ainv[dam, sire]
        Ainv[dam, dam] = alpha/4 + Ainv[dam, dam]
    }
    
    # 개체별 처리과정 출력 cat('Animal', i, '\n') print(Ainv)
}

# print A - Inverse
Ainv

- 실행 결과

> # set working directory
> setwd("D:/temp")
> # print working directory
> getwd()
[1] "D:/temp"
> # 혈통 자료 읽기
> pedi_01 <- read.table("pedi_20200410.ren", header = TRUE, sep = ",", stringsAsFactors = FALSE)
> # 입력한 혈통 파일 확인
> dim(pedi_01)
[1] 6 3
> head(pedi_01)
  arenum srenum drenum
1      1      0      0
2      2      0      0
3      3      1      2
4      4      1      0
5      5      4      3
6      6      5      2
> tail(pedi_01)
  arenum srenum drenum
1      1      0      0
2      2      0      0
3      3      1      2
4      4      1      0
5      5      4      3
6      6      5      2
> # number of animal
> no_of_animals = nrow(pedi_01)
> no_of_animals
[1] 6
> # 혈연계수의 역행렬(A-1) 할당
> Ainv = matrix(rep(0, no_of_animals * no_of_animals), nrow = no_of_animals)
> Ainv
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    0    0    0    0    0    0
[2,]    0    0    0    0    0    0
[3,]    0    0    0    0    0    0
[4,]    0    0    0    0    0    0
[5,]    0    0    0    0    0    0
[6,]    0    0    0    0    0    0
> for (i in 1 : no_of_animals) {
+     animal = pedi_01[i, 1]
+     sire = pedi_01[i, 2]
+     dam = pedi_01[i, 3]
+     
+     if (sire == 0 & dam == 0) {
+         # 개체의 부모를 둘다 모를 경우
+         alpha = 1
+         Ainv[animal, animal] = alpha + Ainv[animal, animal]
+     } else if (sire != 0 & dam == 0) {
+         # 개체의 부만 알고 있을 경우
+         alpha = 4/3
+         Ainv[animal, animal] = alpha + Ainv[animal, animal]
+         Ainv[animal, sire] = -alpha/2 + Ainv[animal, sire]
+         Ainv[sire, animal] = -alpha/2 + Ainv[sire, animal]
+         Ainv[sire, sire] = alpha/4 + Ainv[sire, sire]
+     } else if (sire == 0 & dam != 0) {
+         # 개체의 모만 알고 있을 경우
+         alpha = 4/3
+         Ainv[animal, animal] = alpha + Ainv[animal, animal]
+         Ainv[animal, dam] = -alpha/2 + Ainv[animal, dam]
+         Ainv[dam, animal] = -alpha/2 + Ainv[dam, animal]
+         Ainv[dam, dam] = alpha/4 + Ainv[dam, dam]
+     } else {
+         # 개체의 부모를 모두 알고 있을 경우
+         alpha = 2
+         Ainv[animal, animal] = alpha + Ainv[animal, animal]
+         Ainv[animal, sire] = -alpha/2 + Ainv[animal, sire]
+         Ainv[sire, animal] = -alpha/2 + Ainv[sire, animal]
+         Ainv[animal, dam] = -alpha/2 + Ainv[animal, dam]
+         Ainv[dam, animal] = -alpha/2 + Ainv[dam, animal]
+         Ainv[sire, sire] = alpha/4 + Ainv[sire, sire]
+         Ainv[sire, dam] = alpha/4 + Ainv[sire, dam]
+         Ainv[dam, sire] = alpha/4 + Ainv[dam, sire]
+         Ainv[dam, dam] = alpha/4 + Ainv[dam, dam]
+     }
+     
+     # 개체별 처리과정 출력 cat('Animal', i, '\n') print(Ainv)
+ }
> # print A - Inverse
> Ainv
           [,1] [,2] [,3]       [,4] [,5] [,6]
[1,]  1.8333333  0.5 -1.0 -0.6666667  0.0    0
[2,]  0.5000000  2.0 -1.0  0.0000000  0.5   -1
[3,] -1.0000000 -1.0  2.5  0.5000000 -1.0    0
[4,] -0.6666667  0.0  0.5  1.8333333 -1.0    0
[5,]  0.0000000  0.5 -1.0 -1.0000000  2.5   -1
[6,]  0.0000000 -1.0  0.0  0.0000000 -1.0    2

■ Inbreeding Coefficient 계산하기

- R 프로그램

# calculate inbreeding coefficients using Meuwissen and Luo algorithm

# set working directory
setwd("D:/temp")

# print working directory
getwd()

# 혈통 자료 읽기
pedi_01 <- read.table("pedi_20200410.ren", header = TRUE, sep = ",", stringsAsFactors = FALSE)

# 입력한 혈통 파일 확인
dim(pedi_01)
head(pedi_01)
tail(pedi_01)

# number of animal
no_of_animals = nrow(pedi_01)
no_of_animals

# 근교계수를 저장할 컬럼 추가
pedi_02 = cbind(pedi_01, inb = rep(0, no_of_animals))
dim(pedi_02)
head(pedi_02)
tail(pedi_02)

# Dii를 구하는 함수
getDii = function(animal) {
    sire = pedi_02[animal, 2]
    dam = pedi_02[animal, 3]
    
    if (sire == 0 & dam == 0) {
        return(1)
    } else if (sire != 0 & dam == 0) {
        return(0.75 - 0.25 * pedi_02[sire, 4])
    } else if (sire == 0 & dam != 0) {
        return(0.75 - 0.25 * pedi_02[dam, 4])
    } else if (sire != 0 & dam != 0) {
        return(0.5 - 0.25 * (pedi_02[sire, 4] + pedi_02[dam, 4]))
    }
}

# 1열 즉 개체에 대하여 정렬
pedi_02 = pedi_02[order(pedi_02[, 1]), ]
head(pedi_02)
tail(pedi_02)

# 한 개체씩 근교계수 계산
for (i in 1 : no_of_animals) {
    # id - t array
    idt = data.frame(id = pedi_02[i, 1], t = 1)
    
    # current row position of array
    cur_pos = 1
    # add position of sire or dam
    add_pos = 2
    
    # idt를 모두 처리할 때까지
    while (cur_pos < add_pos) {
        animal = idt[cur_pos, 1]
        sire = pedi_02[animal, 2]
        dam = pedi_02[animal, 3]
        
        if (sire > 0) {
            # sire가 있다면
            idt = rbind(idt, c(sire, idt[cur_pos, 2] * 0.5))
            add_pos = add_pos + 1
        }
        
        if (dam > 0) {
            # dam이 있다면
            idt = rbind(idt, c(dam, idt[cur_pos, 2] * 0.5))
            add_pos = add_pos + 1
        }
        cur_pos = cur_pos + 1
    }
    
    # sorting
    idt = idt[order(idt$id), ]
    
    # 개체의 근교계수 계산
    p_id = idt[1, 1]
    t = idt[1, 2]
    f = -1
    
    if (nrow(idt) > 1) {
        for (j in 2:nrow(idt)) {
            if (p_id != idt[j, 1]) {
                f = f + t * t * getDii(p_id)
                p_id = idt[j, 1]
                t = idt[j, 2]
            } else {
                t = t + idt[j, 2]
            }
        }
    }
    f = f + t * t * getDii(p_id)
    pedi_02[i, 4] = f
}

# print inbreeding coefficient
pedi_02

# 혈통 자료 읽기
ori_pedi <- read.table("pedi_20200409.arn", header = TRUE, sep = ",", stringsAsFactors = FALSE)

# 입력한 파일 확인
dim(ori_pedi)
head(ori_pedi)
tail(ori_pedi)

# 조인하기 위한 패키지 실행
#install.packages("plyr")
library(plyr)

# add orignal pedigree
pedi_03 = join(x = pedi_02, y = ori_pedi, by = "arenum", type = "left")

# renumbered pedigree, inbreeding coefficient, original pedigree, sep = ",", no quote
output_filename = gsub("[ -]", "", paste("pedi_", Sys.Date(), ".inb"))
write.table(pedi_03, output_filename, sep=",", row.names = FALSE, quote = FALSE)

- 실행 결과

> # set working directory
> setwd("D:/temp")
> # print working directory
> getwd()
[1] "D:/temp"
> # 혈통 자료 읽기
> pedi_01 <- read.table("pedi_20200410.ren", header = TRUE, sep = ",", stringsAsFactors = FALSE)
> # 입력한 혈통 파일 확인
> dim(pedi_01)
[1] 6 3
> head(pedi_01)
  arenum srenum drenum
1      1      0      0
2      2      0      0
3      3      1      2
4      4      1      0
5      5      4      3
6      6      5      2
> tail(pedi_01)
  arenum srenum drenum
1      1      0      0
2      2      0      0
3      3      1      2
4      4      1      0
5      5      4      3
6      6      5      2
> # number of animal
> no_of_animals = nrow(pedi_01)
> no_of_animals
[1] 6
> # 근교계수를 저장할 컬럼 추가
> pedi_02 = cbind(pedi_01, inb = rep(0, no_of_animals))
> dim(pedi_02)
[1] 6 4
> head(pedi_02)
  arenum srenum drenum inb
1      1      0      0   0
2      2      0      0   0
3      3      1      2   0
4      4      1      0   0
5      5      4      3   0
6      6      5      2   0
> tail(pedi_02)
  arenum srenum drenum inb
1      1      0      0   0
2      2      0      0   0
3      3      1      2   0
4      4      1      0   0
5      5      4      3   0
6      6      5      2   0
> # Dii를 구하는 함수
> getDii = function(animal) {
+     sire = pedi_02[animal, 2]
+     dam = pedi_02[animal, 3]
+     
+     if (sire == 0 & dam == 0) {
+         return(1)
+     } else if (sire != 0 & dam == 0) {
+         return(0.75 - 0.25 * pedi_02[sire, 4])
+     } else if (sire == 0 & dam != 0) {
+         return(0.75 - 0.25 * pedi_02[dam, 4])
+     } else if (sire != 0 & dam != 0) {
+         return(0.5 - 0.25 * (pedi_02[sire, 4] + pedi_02[dam, 4]))
+     }
+ }
> # 1열 즉 개체에 대하여 정렬
> pedi_02 = pedi_02[order(pedi_02[, 1]), ]
> head(pedi_02)
  arenum srenum drenum inb
1      1      0      0   0
2      2      0      0   0
3      3      1      2   0
4      4      1      0   0
5      5      4      3   0
6      6      5      2   0
> tail(pedi_02)
  arenum srenum drenum inb
1      1      0      0   0
2      2      0      0   0
3      3      1      2   0
4      4      1      0   0
5      5      4      3   0
6      6      5      2   0
> # 한 개체씩 근교계수 계산
> for (i in 1 : no_of_animals) {
+     # id - t array
+     idt = data.frame(id = pedi_02[i, 1], t = 1)
+     
+     # current row position of array
+     cur_pos = 1
+     # add position of sire or dam
+     add_pos = 2
+     
+     # idt를 모두 처리할 때까지
+     while (cur_pos < add_pos) {
+         animal = idt[cur_pos, 1]
+         sire = pedi_02[animal, 2]
+         dam = pedi_02[animal, 3]
+         
+         if (sire > 0) {
+             # sire가 있다면
+             idt = rbind(idt, c(sire, idt[cur_pos, 2] * 0.5))
+             add_pos = add_pos + 1
+         }
+         
+         if (dam > 0) {
+             # dam이 있다면
+             idt = rbind(idt, c(dam, idt[cur_pos, 2] * 0.5))
+             add_pos = add_pos + 1
+         }
+         cur_pos = cur_pos + 1
+     }
+     
+     # sorting
+     idt = idt[order(idt$id), ]
+     
+     # 개체의 근교계수 계산
+     p_id = idt[1, 1]
+     t = idt[1, 2]
+     f = -1
+     
+     if (nrow(idt) > 1) {
+         for (j in 2:nrow(idt)) {
+             if (p_id != idt[j, 1]) {
+                 f = f + t * t * getDii(p_id)
+                 p_id = idt[j, 1]
+                 t = idt[j, 2]
+             } else {
+                 t = t + idt[j, 2]
+             }
+         }
+     }
+     f = f + t * t * getDii(p_id)
+     pedi_02[i, 4] = f
+ }
> # print inbreeding coefficient
> pedi_02
  arenum srenum drenum   inb
1      1      0      0 0.000
2      2      0      0 0.000
3      3      1      2 0.000
4      4      1      0 0.000
5      5      4      3 0.125
6      6      5      2 0.125
> # 혈통 자료 읽기
> ori_pedi <- read.table("pedi_20200409.arn", header = TRUE, sep = ",", stringsAsFactors = FALSE)
> # 입력한 파일 확인
> dim(ori_pedi)
[1] 6 4
> head(ori_pedi)
  arenum animal sire dam
1      1     a1    0   0
2      2     a2    0   0
3      3     a3   a1  a2
4      4     a4   a1   0
5      5     a5   a4  a3
6      6     a6   a5  a2
> tail(ori_pedi)
  arenum animal sire dam
1      1     a1    0   0
2      2     a2    0   0
3      3     a3   a1  a2
4      4     a4   a1   0
5      5     a5   a4  a3
6      6     a6   a5  a2
> # 조인하기 위한 패키지 실행
> #install.packages("plyr")
> library(plyr)
> # add orignal pedigree
> pedi_03 = join(x = pedi_02, y = ori_pedi, by = "arenum", type = "left")
> # renumbered pedigree, inbreeding coefficient, original pedigree, sep = ",", no quote
> output_filename = gsub("[ -]", "", paste("pedi_", Sys.Date(), ".inb"))
> write.table(pedi_03, output_filename, sep=",", row.names = FALSE, quote = FALSE)

- 결과 파일(pedi_20200410.inb)

arenum,srenum,drenum,inb,animal,sire,dam
1,0,0,0,a1,0,0
2,0,0,0,a2,0,0
3,1,2,0,a3,a1,a2
4,1,0,0,a4,a1,0
5,4,3,0.125,a5,a4,a3
6,5,2,0.125,a6,a5,a2

■ TDT’ 행렬 만들기

- R 프로그램

# make T, D matrix and then make A

# set working directory
setwd("D:/temp")

# print working directory
getwd()

# 혈통 자료 읽기
pedi_01 <- read.table("pedi_20200410.inb", header = TRUE, sep = ",", stringsAsFactors = FALSE)

# 입력한 혈통 파일 확인
dim(pedi_01)
head(pedi_01)
tail(pedi_01)

# number of animal
no_of_animals = nrow(pedi_01)
no_of_animals

# empty T 생성
T = matrix(rep(0, no_of_animals * no_of_animals), nrow = no_of_animals)

# empty T 확인
T

# 개체를 하나씩 읽으면서 NRM을 채워 나가기
for (i in 1:no_of_animals) {
    animal = pedi_01[i, 1]
    sire = pedi_01[i, 2]
    dam = pedi_01[i, 3]
    
    if (sire == 0 & dam == 0) {
        # 부모를 모를 경우
        T[animal, animal] = 1
    } else if (sire != 0 & dam == 0) {
        # 부만 알고 있을 경우
        for (j in 1:animal - 1) {
            T[animal, j] = 0.5 * (T[sire, j])
        }
        T[animal, animal] = 1
    } else if (sire == 0 & dam != 0) {
        # 모만 알고 있을 경우
        for (j in 1:animal - 1) {
            T[animal, j] = 0.5 * (T[dam, j])
        }
        T[animal, animal] = 1
    } else {
        # 부와 모를 알고 있을 경우
        for (j in 1:animal - 1) {
            T[animal, j] = 0.5 * (T[sire, j] + T[dam, j])
        }
        T[animal, animal] = 1
    }
}

# T 출력
T

# Assum that we know inbreeding coefficients
InbCo = pedi_01[, c("inb")]
InbCo

# empty D
D = matrix(rep(0, no_of_animals * no_of_animals), nrow = no_of_animals)
D

for (i in 1:6) {
    animal = pedi_01[i, 1]
    sire = pedi_01[i, 2]
    dam = pedi_01[i, 3]
    
    if (sire == 0 & dam == 0) {
        # 부모를 모를 경우
        D[animal, animal] = 1
    } else if (sire != 0 & dam == 0) {
        # 부만 알고 있을 경우
        D[animal, animal] = 0.75 - 0.25 * InbCo[sire]
    } else if (sire == 0 & dam != 0) {
        # 모만 알고 있을 경우
        D[animal, animal] = 0.75 - 0.25 * InbCo[dam]
    } else {
        # 부와 모를 알고 있을 경우
        D[animal, animal] = 0.5 - 0.25 * InbCo[sire] - 0.25 * InbCo[dam]
    }
}

# print D
D

# print A
T %*% D %*% t(T)

- 실행 결과

> # set working directory
> setwd("D:/temp")
> # print working directory
> getwd()
[1] "D:/temp"
> # 혈통 자료 읽기
> pedi_01 <- read.table("pedi_20200410.inb", header = TRUE, sep = ",", stringsAsFactors = FALSE)
> # 입력한 혈통 파일 확인
> dim(pedi_01)
[1] 6 7
> head(pedi_01)
  arenum srenum drenum   inb animal sire dam
1      1      0      0 0.000     a1    0   0
2      2      0      0 0.000     a2    0   0
3      3      1      2 0.000     a3   a1  a2
4      4      1      0 0.000     a4   a1   0
5      5      4      3 0.125     a5   a4  a3
6      6      5      2 0.125     a6   a5  a2
> tail(pedi_01)
  arenum srenum drenum   inb animal sire dam
1      1      0      0 0.000     a1    0   0
2      2      0      0 0.000     a2    0   0
3      3      1      2 0.000     a3   a1  a2
4      4      1      0 0.000     a4   a1   0
5      5      4      3 0.125     a5   a4  a3
6      6      5      2 0.125     a6   a5  a2
> # number of animal
> no_of_animals = nrow(pedi_01)
> no_of_animals
[1] 6
> # empty T 생성
> T = matrix(rep(0, no_of_animals * no_of_animals), nrow = no_of_animals)
> # empty T 확인
> T
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    0    0    0    0    0    0
[2,]    0    0    0    0    0    0
[3,]    0    0    0    0    0    0
[4,]    0    0    0    0    0    0
[5,]    0    0    0    0    0    0
[6,]    0    0    0    0    0    0
> # 개체를 하나씩 읽으면서 NRM을 채워 나가기
> for (i in 1:no_of_animals) {
+     animal = pedi_01[i, 1]
+     sire = pedi_01[i, 2]
+     dam = pedi_01[i, 3]
+     
+     if (sire == 0 & dam == 0) {
+         # 부모를 모를 경우
+         T[animal, animal] = 1
+     } else if (sire != 0 & dam == 0) {
+         # 부만 알고 있을 경우
+         for (j in 1:animal - 1) {
+             T[animal, j] = 0.5 * (T[sire, j])
+         }
+         T[animal, animal] = 1
+     } else if (sire == 0 & dam != 0) {
+         # 모만 알고 있을 경우
+         for (j in 1:animal - 1) {
+             T[animal, j] = 0.5 * (T[dam, j])
+         }
+         T[animal, animal] = 1
+     } else {
+         # 부와 모를 알고 있을 경우
+         for (j in 1:animal - 1) {
+             T[animal, j] = 0.5 * (T[sire, j] + T[dam, j])
+         }
+         T[animal, animal] = 1
+     }
+ }
> # T 출력
> T
     [,1]  [,2] [,3] [,4] [,5] [,6]
[1,] 1.00 0.000 0.00 0.00  0.0    0
[2,] 0.00 1.000 0.00 0.00  0.0    0
[3,] 0.50 0.500 1.00 0.00  0.0    0
[4,] 0.50 0.000 0.00 1.00  0.0    0
[5,] 0.50 0.250 0.50 0.50  1.0    0
[6,] 0.25 0.625 0.25 0.25  0.5    1
> # Assum that we know inbreeding coefficients
> InbCo = pedi_01[, c("inb")]
> InbCo
[1] 0.000 0.000 0.000 0.000 0.125 0.125
> # empty D
> D = matrix(rep(0, no_of_animals * no_of_animals), nrow = no_of_animals)
> D
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    0    0    0    0    0    0
[2,]    0    0    0    0    0    0
[3,]    0    0    0    0    0    0
[4,]    0    0    0    0    0    0
[5,]    0    0    0    0    0    0
[6,]    0    0    0    0    0    0
> for (i in 1:6) {
+     animal = pedi_01[i, 1]
+     sire = pedi_01[i, 2]
+     dam = pedi_01[i, 3]
+     
+     if (sire == 0 & dam == 0) {
+         # 부모를 모를 경우
+         D[animal, animal] = 1
+     } else if (sire != 0 & dam == 0) {
+         # 부만 알고 있을 경우
+         D[animal, animal] = 0.75 - 0.25 * InbCo[sire]
+     } else if (sire == 0 & dam != 0) {
+         # 모만 알고 있을 경우
+         D[animal, animal] = 0.75 - 0.25 * InbCo[dam]
+     } else {
+         # 부와 모를 알고 있을 경우
+         D[animal, animal] = 0.5 - 0.25 * InbCo[sire] - 0.25 * InbCo[dam]
+     }
+ }
> # print D
> D
     [,1] [,2] [,3] [,4] [,5]    [,6]
[1,]    1    0  0.0 0.00  0.0 0.00000
[2,]    0    1  0.0 0.00  0.0 0.00000
[3,]    0    0  0.5 0.00  0.0 0.00000
[4,]    0    0  0.0 0.75  0.0 0.00000
[5,]    0    0  0.0 0.00  0.5 0.00000
[6,]    0    0  0.0 0.00  0.0 0.46875
> # print A
> T %*% D %*% t(T)
     [,1]  [,2]   [,3]   [,4]   [,5]   [,6]
[1,] 1.00 0.000 0.5000 0.5000 0.5000 0.2500
[2,] 0.00 1.000 0.5000 0.0000 0.2500 0.6250
[3,] 0.50 0.500 1.0000 0.2500 0.6250 0.5625
[4,] 0.50 0.000 0.2500 1.0000 0.6250 0.3125
[5,] 0.50 0.250 0.6250 0.6250 1.1250 0.6875
[6,] 0.25 0.625 0.5625 0.3125 0.6875 1.1250
>

■ 근친교배를 고려한 NRM의 역행렬 만들기

- R 프로그램

# make inverse of the numerator relationship matrix accounting for inbreedig

# set working directory
setwd("D:/temp")

# print working directory
getwd()

# 혈통 자료 읽기
pedi_01 <- read.table("pedi_20200410.inb", header = TRUE, sep = ",", stringsAsFactors = FALSE)

# 입력한 혈통 파일 확인
dim(pedi_01)
head(pedi_01)
tail(pedi_01)

# number of animal
no_of_animals = nrow(pedi_01)

# 혈연계수의 역행렬(A-1) 할당
Ainv = matrix(rep(0, no_of_animals * no_of_animals), nrow = no_of_animals)

# inbreeding coefficients
InbCo = pedi_01[, c("inb")]
InbCo

# 혈연계수의 역행렬(A-1) 계산

for (i in 1:no_of_animals) {
    animal = pedi_01[i, 1]
    sire = pedi_01[i, 2]
    dam = pedi_01[i, 3]
    
    if (sire == 0 & dam == 0) {
        # 개체의 부모를 둘다 모를 경우
        alpha = 1
        Ainv[animal, animal] = alpha + Ainv[animal, animal]
    } else if (sire != 0 & dam == 0) {
        # 개체의 부만 알고 있을 경우
        alpha = 1/(0.75 - 0.25 * InbCo[sire])
        Ainv[animal, animal] = alpha + Ainv[animal, animal]
        Ainv[animal, sire] = -alpha/2 + Ainv[animal, sire]
        Ainv[sire, animal] = -alpha/2 + Ainv[sire, animal]
        Ainv[sire, sire] = alpha/4 + Ainv[sire, sire]
    } else if (sire == 0 & dam != 0) {
        # 개체의 모만 알고 있을 경우
        alpha = 1/(0.75 - 0.25 * InbCo[dam])
        Ainv[animal, animal] = alpha + Ainv[animal, animal]
        Ainv[animal, dam] = -alpha/2 + Ainv[animal, dam]
        Ainv[dam, animal] = -alpha/2 + Ainv[dam, animal]
        Ainv[dam, dam] = alpha/4 + Ainv[dam, dam]
    } else {
        # 개체의 부모를 모두 알고 있을 경우
        alpha = 1/(0.5 - 0.25 * InbCo[sire] - 0.25 * InbCo[dam])
        Ainv[animal, animal] = alpha + Ainv[animal, animal]
        Ainv[animal, sire] = -alpha/2 + Ainv[animal, sire]
        Ainv[sire, animal] = -alpha/2 + Ainv[sire, animal]
        Ainv[animal, dam] = -alpha/2 + Ainv[animal, dam]
        Ainv[dam, animal] = -alpha/2 + Ainv[dam, animal]
        Ainv[sire, sire] = alpha/4 + Ainv[sire, sire]
        Ainv[sire, dam] = alpha/4 + Ainv[sire, dam]
        Ainv[dam, sire] = alpha/4 + Ainv[dam, sire]
        Ainv[dam, dam] = alpha/4 + Ainv[dam, dam]
    }
    
    # 개체별 처리과정 출력 cat('Animal', i, '\n') print(Ainv)
}

# print A Inverse
Ainv

- 실행 결과

> # set working directory
> setwd("D:/temp")
> # print working directory
> getwd()
[1] "D:/temp"
> # 혈통 자료 읽기
> pedi_01 <- read.table("pedi_20200410.inb", header = TRUE, sep = ",", stringsAsFactors = FALSE)
> # 입력한 혈통 파일 확인
> dim(pedi_01)
[1] 6 7
> head(pedi_01)
  arenum srenum drenum   inb animal sire dam
1      1      0      0 0.000     a1    0   0
2      2      0      0 0.000     a2    0   0
3      3      1      2 0.000     a3   a1  a2
4      4      1      0 0.000     a4   a1   0
5      5      4      3 0.125     a5   a4  a3
6      6      5      2 0.125     a6   a5  a2
> tail(pedi_01)
  arenum srenum drenum   inb animal sire dam
1      1      0      0 0.000     a1    0   0
2      2      0      0 0.000     a2    0   0
3      3      1      2 0.000     a3   a1  a2
4      4      1      0 0.000     a4   a1   0
5      5      4      3 0.125     a5   a4  a3
6      6      5      2 0.125     a6   a5  a2
> # number of animal
> no_of_animals = nrow(pedi_01)
> # 혈연계수의 역행렬(A-1) 할당
> Ainv = matrix(rep(0, no_of_animals * no_of_animals), nrow = no_of_animals)
> # inbreeding coefficients
> InbCo = pedi_01[, c("inb")]
> InbCo
[1] 0.000 0.000 0.000 0.000 0.125 0.125
> for (i in 1:no_of_animals) {
+     animal = pedi_01[i, 1]
+     sire = pedi_01[i, 2]
+     dam = pedi_01[i, 3]
+     
+     if (sire == 0 & dam == 0) {
+         # 개체의 부모를 둘다 모를 경우
+         alpha = 1
+         Ainv[animal, animal] = alpha + Ainv[animal, animal]
+     } else if (sire != 0 & dam == 0) {
+         # 개체의 부만 알고 있을 경우
+         alpha = 1/(0.75 - 0.25 * InbCo[sire])
+         Ainv[animal, animal] = alpha + Ainv[animal, animal]
+         Ainv[animal, sire] = -alpha/2 + Ainv[animal, sire]
+         Ainv[sire, animal] = -alpha/2 + Ainv[sire, animal]
+         Ainv[sire, sire] = alpha/4 + Ainv[sire, sire]
+     } else if (sire == 0 & dam != 0) {
+         # 개체의 모만 알고 있을 경우
+         alpha = 1/(0.75 - 0.25 * InbCo[dam])
+         Ainv[animal, animal] = alpha + Ainv[animal, animal]
+         Ainv[animal, dam] = -alpha/2 + Ainv[animal, dam]
+         Ainv[dam, animal] = -alpha/2 + Ainv[dam, animal]
+         Ainv[dam, dam] = alpha/4 + Ainv[dam, dam]
+     } else {
+         # 개체의 부모를 모두 알고 있을 경우
+         alpha = 1/(0.5 - 0.25 * InbCo[sire] - 0.25 * InbCo[dam])
+         Ainv[animal, animal] = alpha + Ainv[animal, animal]
+         Ainv[animal, sire] = -alpha/2 + Ainv[animal, sire]
+         Ainv[sire, animal] = -alpha/2 + Ainv[sire, animal]
+         Ainv[animal, dam] = -alpha/2 + Ainv[animal, dam]
+         Ainv[dam, animal] = -alpha/2 + Ainv[dam, animal]
+         Ainv[sire, sire] = alpha/4 + Ainv[sire, sire]
+         Ainv[sire, dam] = alpha/4 + Ainv[sire, dam]
+         Ainv[dam, sire] = alpha/4 + Ainv[dam, sire]
+         Ainv[dam, dam] = alpha/4 + Ainv[dam, dam]
+     }
+     
+     # 개체별 처리과정 출력 cat('Animal', i, '\n') print(Ainv)
+ }
> # print A Inverse
> Ainv
           [,1]       [,2] [,3]       [,4]       [,5]      [,6]
[1,]  1.8333333  0.5000000 -1.0 -0.6666667  0.0000000  0.000000
[2,]  0.5000000  2.0333333 -1.0  0.0000000  0.5333333 -1.066667
[3,] -1.0000000 -1.0000000  2.5  0.5000000 -1.0000000  0.000000
[4,] -0.6666667  0.0000000  0.5  1.8333333 -1.0000000  0.000000
[5,]  0.0000000  0.5333333 -1.0 -1.0000000  2.5333333 -1.066667
[6,]  0.0000000 -1.0666667  0.0  0.0000000 -1.0666667  2.133333

저작자표시

'Animal Breeding > R for Genetic Evaluation' 카테고리의 다른 글

R을 이용한 반복 모형(Repeatability Model)의 육종가 구하기 프로그래밍 (0)	2020.06.14
R을 이용하여 반복 모형(repeatability model)에 대한 blupf90 분석 결과 정리하기 (0)	2020.06.13
R을 이용하여 단형질 개체 모형(single trait animal model)에 대한 blupf90 분석 결과 정리하기 (0)	2020.06.03
R을 이용한 단형질 개체 모형(Single Trait Animal Model)의 육종가 구하기 프로그래밍 (0)	2020.05.18
R로 단형질 임의회귀모형의 해 구하기 (1)	2011.06.09

SNP Effect를 이용한 DGV(or GEBV) 추정

투정이 2020. 3. 5. 17:12

2020. 3. 5. 17:12

single step GBLUP의 장점은 SNP genotype이 있는 개체와 없는 개체를 동시에 분석할 수 있다는 것이다. 그러나 유전평가를 한 직후에 몇 마리의 genotype이 확보되고 그 개체의 DGV를 계산할 필요성이 생겼다면 다시 전체 유전평가를 하기란 쉬운 일이 아니다. 그래서 ssGBLUP 후에 PostGSF90을 이용하여 SNP effect를 추정하였고, 여기서는 이들 SNP effect를 이용하여 새로이 genotype을 확보한 개체의 DGV를 추정할 것이다. 여기서는 PREDF90 프로그램을 이용한다.

준비할 것은 PostGSF90이 출력한 snp_pred 파일과 새로운 개체의 genotype 파일이다. 그런데 새로운 개체의 genotype 파일을 준비하는데 몇 가지 과정을 거쳐야 한다. 새로운 개체가 몇 마리이든 기존 개체와 함께 imputation을 해야 한다. imputation 과정은 이미 https://bhpark.tistory.com/198?category=866943 으로 설명한 바 있다. 여기서는 함께 imputation한 개체 중 DGV를 추정할 개체만 뽑아내야 한다.

imputation된 결과 파일을 50k_for_gs.txt 이라 하자.

추출할 개체의 개체 번호 파일을 작성하고 cow_list.txt 이라하자.

전체 파일에서 일부 개체를 추출하여 저장하는 프로그램은 다음과 같다. sort, join은 유닉스 명령어이다. 자세한 사항은 https://bhpark.tistory.com/58 을 참고한다.

sort -k 1,1 50k_for_gs.txt > 50k_for_gs_02.txt

sort -k 1,1 cow_list.txt > cow_list_02.txt

join 50k_for_gs_02.txt cow_list_02.txt > selected_cows_for_predf90.txt

pause

추출한 개체의 유전자형이다.

여기서 주의해야 할 점은 유전평가할 때와 동일한 imputation을 해야 한다는 점이다. 예를 들어 유전평가할 때는 BovineSNP50v3로 했는데 DGV를 추정할 때는 HanwooSNP50v1으로 imputation하면 안 된다는 점이다. 그리고 imputation에 동원된 개체들의 genotype에 따라서 최종 imputation된 SNP가 달라질 수 있다. 예를 들어 유전평가할 때는 특정 SNP가 모두 missing(no call)이어서 imputation 결과에 특정 SNP가 빠졌는데, 후에 DGV를 추정하기 위하여 한 imputation에서는 해당 SNP의 genotype이 있어 imputation 에 포함될 수 있다. 그래서 유전평가할 때 imputation 된 SNP 수와 DGV 추정할 때 imputation 된 SNP 수를 비교하여야 하고, 다르다면 원인을 파악해야 한다.

다음으로 고려해야 할 것이 imputation 된 SNP의 수와 QC 후에 남아 있는 (그리고 effect를 추정한) SNP의 개수가 다르다는 점이다. 그래서 imputation된 SNP 중에서 effect를 추정한 SNP만 추출해야 한다. SNP를 추출하는 프로그램을 만들어야 하는데, 매번 effect를 가지고 있는 SNP가 달라지므로, SNP를 추출하는 프로그램을 만드는 프로그램을 작성해야 한다. 그건 SNP Map 파일에서 시작한다.

QC를 하고 난 SNP Map 파일의 이름은 snp_map_for_gs.txt_clean이다.

- 1열은 일련번호, 2열은 염색체 번호, 3열을 위치, 4열을 이름, 5열은 원래 일련번호

- 5열의 원래 일련번호를 이용해서 SNP 추출

SNP를 추출하는 프로그램을 작성하는 프로그램을 작성하는데 파일 이름을make_select_snp_awk_program.awk 로 한다.

BEGIN {
printf "{print $1, "
}
{
printf "substr($2,%d,1) ", $5
}
END {
printf "}\n"
}

## 2023.3.20.

QC한 후 출력하는 map file 형식이 바뀌었다.

header가 들어갔고, 1열이 사라졌다. 그래서 다음과 같이 AWK 프로그램이 바뀌어야 한다.

BEGIN {
    printf "{print $1, "
}
{
    if (NR >= 2){
        printf "substr($2,%d,1) ", $4
    }
}
END {
    printf "}\n"
}

NR >= 2라는 문장을 통해서 2행부터 뭔가 실행하는 것으로 한다. 그리고 $4 4열을 printf 문의 %d에 넣는다.

결과적으로 생긴 SNP를 추출하는 생성된 AWK 프로그램은 다음과 같다.

{print $1, substr($2,1,1) substr($2,3,1) substr($2,5,1) substr($2,8,1) 
substr($2,9,1) substr($2,11,1) substr($2,12,1) substr($2,14,1) 
substr($2,15,1) substr($2,17,1) substr($2,18,1) substr($2,20,1) 
substr($2,21,1) substr($2,22,1) substr($2,23,1) substr($2,24,1) 
substr($2,25,1) substr($2,26,1) substr($2,27,1) substr($2,29,1) 
...
}

2023.3.20. ##

실행은 다음과 같이 한다.

awk -f make_select_snp_awk_program.awk snp_map_for_gs.txt_clean > select_snp_for_predf90.awk

실행화면이다.

실행결과 만들어진 select_snp.awk 프로그램은 다음과 같다.

{print $1, substr($2,1,1) substr($2,2,1) substr($2,4,1) substr($2,6,1) substr($2,7,1) substr($2,11,1) substr($2,12,1) substr($2,13,1) substr($2,15,1) substr($2,16,1) substr($2,18,1) substr($2,20,1) substr($2,21,1) substr($2,22,1) substr($2,23,1) substr($2,24,1) substr($2,25,1)
...
}

1열은 그대로 출력하고, 2열의 1번째 문자, 2열의 2번째 문자, 2열의 4번째 문자 등을 출력한다.

SNP를 추출하는 프로그램을 다음과 같이 실행한다.

awk -f select_snp_for_predf90.awk selected_cows_for_predf90.txt > selected_cows_snps_for_predf90.txt

SNP를 추출한 결과는 다음과 같다.

한 행의 길이가 42577 = 개체번호(15) + 공백(1) + SNP(46561) 임을 알 수 있다.

입력 파일 준비가 완료되었으므로 DGV를 계산하자.

준비한 파일은 다음과 같다.

- selected_cows_snps_for_predf90.txt

- snp_pred : postgsf90을 실행한 결과로 나온 파일

임의 효과(형질과 관련 효과의 수)에 대한 정보, 42561개 SNP의 gene frequency, solutions of SNP effects가 포함되어 있다. 여기서는 1개 형질만 유전평가 했으므로 1개 형질에 대한 SNP 효과가 있다.

프로그램 실행은 다음과 같다.

predf90 --snpfile selected_cows_snps_for_predf90.txt | tee predf90_01.log

실행 결과로 만들어진 파일은 snp_predictions이다.

1열은 개체번호, 2열은 call rate(여기서는 imputation 후이므로 모두 1.0), 형질의 DGV이다.

같은 개체를 ssGBLUP으로 GEBV를 구하고, PREDF90으로 DGV를 구했을 경우 값이 차이가 나는데, 일정한 값이 차이가 나므로 사용에 문제는 없을 것으로 보인다. 순위 상관은 98 ~ 99% 정도 되는 것으로 나온다. 일정한 차이 이외에 나타나는 약간의 차이는 GEBV를 구할 때 GRM과 NRM을 섞지만, PREDF90에서는 오로지 genotype만 이용하기 때문인 것으로 보인다. 이러한 차이가 있으나 한 농장의 어린 가축들의 DGV를 구하고 사용하는데는 문제가 없을 것으로 보인다.

EBV(estimated breeding value), GEBV(genomic estimated breeding value), DGV(direct genomic value)와 같은 용어들이 사용되고 있는데 EBV는 혈통(pedigree)과 검정자료(phenotype)를 이용한 추정치, GEBV는 혈통, 검정자료와 유전자자료(SNP genotype)를 이용한 추정치, DGV는 SNP effect를 이용하여 SNP genotype으로부터 구한 추정치를 의미한다. 물론 사용자들마다 이것을 혼용해 사용하기도 한다.

저작자표시

'Animal Breeding > Genomic Selection' 카테고리의 다른 글

서로 다른 Chip 버전을 이용한 친자감정 및 부모 찾기 (1)	2023.04.18
H 행렬, H 행렬의 역행렬 유도 (0)	2021.01.19
PostGSF90을 이용한 GWAS (0)	2020.03.05
SNP Marker Effect 추정 (0)	2020.03.02
blupf90을 이용하여 유전체 육종가 구하기 (0)	2020.02.13

PREV 이전 1 ···6 7 8 9 10 11 12 ···50 NEXT 다음

Let's Study Animal Breeding

전체 글

The derivative of a determinant(행렬의 행렬식의 도함수, 미분)

'ETC' 카테고리의 다른 글

R을 이용한 단형질 개체 모형(Single Trait Animal Model)의 육종가 구하기 프로그래밍

'Animal Breeding > R for Genetic Evaluation' 카테고리의 다른 글

Maximum Likelihood Estimation(MLE)

'ETC' 카테고리의 다른 글

R을 이용한 혈통 파일 처리

'Animal Breeding > R for Genetic Evaluation' 카테고리의 다른 글

SNP Effect를 이용한 DGV(or GEBV) 추정

'Animal Breeding > Genomic Selection' 카테고리의 다른 글

+ Recent posts

티스토리툴바