blupf90으로 single-step GBLUP Model 풀기

투정이 2021. 1. 5. 17:06

2021. 1. 5. 17:06

# Single-step GBLUP Model(ssGBLUP)

# Linear Models for the Prediction of Animal Breeding Values, 3rd Edition.

# Raphael Mrode

# Example 11.6 p192

간단한 설명은 아래 포스팅 참조

2021/01/05 - [Animal Breeding/R for Genetic Evaluation] - Single-step GBLUP(ssGBLUP)

Single-step GBLUP(ssGBLUP)

# Single-step GBLUP Model(ssGBLUP) # Linear Models for the Prediction of Animal Breeding Values, 3rd Edition. # Raphael Mrode # Example 11.6 p192 single step GBLUP(ssGBLUP)은 유전체 자..

bhpark.tistory.com

Data(ssgblup_data.txt)

13 0 0 1 558 9 0.001792115
14 0 0 1 722 13.4 0.001385042
15 13 4 1 300 12.7 0.003333333
16 15 2 1 73 15.4 0.01369863
17 15 5 1 52 5.9 0.019230769
18 14 6 1 87 7.7 0.011494253
19 14 9 1 64 10.2 0.015625
20 14 9 1 103 4.8 0.009708738
21 1 3 1 13 7.6 0.076923077
22 14 8 1 125 8.8 0.008

animal, sire, dam, general mean, edc, dyd fat, 1/edc

Pedigree(ssgblup_pedi.txt)

SNP genotype(ssgblup_snpgenotype.txt)

18 11010202210000000000000000000000000000000000000000
19 00110202200000000000000000000000000000000000000000
20 01100102200000000000000000000000000000000000000000
21 20000122120000000000000000000000000000000000000000
22 00011202000000000000000000000000000000000000000000
23 01100102210000000000000000000000000000000000000000
24 10001102000000000000000000000000000000000000000000
25 00011202100000000000000000000000000000000000000000
26 10110201000000000000000000000000000000000000000000

Renumf90 파라미터 파일(renumf90_ssgblup.par)

# Parameter file for program renf90; it is translated to parameter
# file for BLUPF90 family programs.
DATAFILE
ssgblup_data.txt
TRAITS
6
FIELDS_PASSED TO OUTPUT
 
WEIGHT(S)
 7
RESIDUAL_VARIANCE
245
EFFECT
4 cross alpha
EFFECT
1 cross alpha
RANDOM
animal
FILE
ssgblup_pedi.txt
FILE_POS
1 2 3
SNP_FILE
ssgblup_snpgenotype.txt
PED_DEPTH
0
(CO)VARIANCES
35.241
OPTION no_quality_control
OPTION AlphaBeta 0.95 0.05
OPTION tunedG 0
OPTION thrStopCorAG 0.10
OPTION solv_method FSPAK

Renumf90 실행

renumf90 renumf90_ssgblup.par | tee renumf90_ssgblup.log

Renumf90 실행 화면

Renumf90 로그

 RENUMF90 version 1.145
 renumf90_ssgblup.par
 datafile:ssgblup_data.txt
 traits:           6
 R
   245.0    

 Processing effect  1 of type cross     
 item_kind=alpha     

 Processing effect  2 of type cross     
 item_kind=alpha     
 pedigree file name  "ssgblup_pedi.txt"
 positions of animal, sire, dam, alternate dam, yob, and group     1     2     3     0     0     0     0
 SNP file name  "ssgblup_snpgenotype.txt"
 all pedigrees to be included
 Reading (CO)VARIANCES:           1 x           1

 Maximum size of character fields: 20

 Maximum size of record (max_string_readline): 800

 Maximum number of fields for input file (max_field_readline): 100

 Pedigree search method (ped_search): convention

 Order of pedigree animals (animal_order): default

 Order of UPG (upg_order): default

 Missing observation code (missing): 0

 hash tables for effects set up
 first 3 lines of the data file (up to 70 characters)
    13 0 0 1 558 9 0.001792115
    14 0 0 1 722 13.4 0.001385042
    15 13 4 1 300 12.7 0.003333333
 read           10  records
 table with            1  elements sorted
 added count
 Effect group            1  of column            1  with            1  levels
 table expanded from        10000  to        10000  records
 added count
 Effect group            2  of column            1  with           10  levels
 wrote statistics in file "renf90.tables"

 Basic statistics for input data  (missing value code is '0')
 Pos  Min         Max         Mean        SD                 N
   6    4.8000      15.400      9.5500      3.3890          10

 random effect with SNPs  2
 type: animal    
 file: ssgblup_snpgenotype.txt
 read SNPs           9  records
 Effect group            2  of column            1  with           14  levels

 random effect   2
 type:animal    
 opened output pedigree file "renadd02.ped"
 read           26  pedigree records
 loaded           12  parent(s) in round            0

 Pedigree checks
 
 Number of animals with records                  =           10
 Number of animals with genotypes                =            9
 Number of animals with records or genotypes     =           14
 Number of animals with genotypes and no records =            4
 Number of parents without records or genotypes  =           12
 Total number of animals                         =           26

 Wrote cross reference IDs for SNP file "ssgblup_snpgenotype.txt_XrefID"

 Wrote parameter file "renf90.par"
 Wrote renumbered data "renf90.dat" 10 records

Renumf90 실행 결과 생성된 파일

renadd02.ped

14 4 26 1 0 12 0 0 0 26
1 0 0 3 0 0 1 1 0 13
2 15 17 1 0 12 1 0 0 21
19 0 0 3 0 0 0 0 1 5
3 4 23 1 0 12 1 0 0 19
4 0 0 3 0 0 1 8 0 14
5 4 22 1 0 12 1 0 0 22
17 0 0 3 0 0 0 0 1 3
22 0 0 3 0 0 0 0 1 8
6 1 18 1 0 2 1 2 0 15
11 4 25 1 0 12 0 0 0 23
24 0 0 3 0 0 0 0 1 10
15 0 0 3 0 0 0 1 0 1
20 0 0 3 0 0 0 0 1 6
7 6 16 1 0 2 1 0 0 16
12 4 24 1 0 12 0 0 0 24
25 0 0 3 0 0 0 0 1 11
18 0 0 3 0 0 0 0 1 4
8 6 19 1 0 2 1 0 0 17
23 0 0 3 0 0 0 0 2 9
13 4 21 1 0 12 0 0 0 25
26 0 0 3 0 0 0 0 1 12
9 4 23 1 0 12 1 0 0 20
16 0 0 3 0 0 0 0 1 2
10 4 20 1 0 12 1 0 0 18
21 0 0 3 0 0 0 0 1 7

Cross Reference ID(ssgblup_snpgenotype.txt_XrefID) - genotype을 가지고 있는 개체들의 renumbered ID

renf90.dat

 9 0.001792115 1 1
 13.4 0.001385042 1 4
 12.7 0.003333333 1 6
 15.4 0.01369863 1 7
 5.9 0.019230769 1 8
 7.7 0.011494253 1 10
 10.2 0.015625 1 3
 4.8 0.009708738 1 9
 7.6 0.076923077 1 2
 8.8 0.008 1 5

renf90.par

# BLUPF90 parameter file created by RENUMF90
DATAFILE
 renf90.dat
NUMBER_OF_TRAITS
           1
NUMBER_OF_EFFECTS
           2
OBSERVATION(S)
    1
WEIGHT(S)
           2
EFFECTS: POSITIONS_IN_DATAFILE NUMBER_OF_LEVELS TYPE_OF_EFFECT[EFFECT NESTED]
 3         1 cross 
 4        26 cross 
RANDOM_RESIDUAL VALUES
   245.00    
 RANDOM_GROUP
     2
 RANDOM_TYPE
 add_animal   
 FILE
renadd02.ped                                                                    
(CO)VARIANCES
   35.241    
OPTION SNP_file ssgblup_snpgenotype.txt
OPTION no_quality_control
OPTION AlphaBeta 0.95 0.05
OPTION tunedG 0
OPTION thrStopCorAG 0.10
OPTION solv_method FSPAK

본 예제의 genotype은 제대로 된 genotype이 아니다. 그래서 다음 프로그램이 정상적으로 동작하지 않는다. 그래서 no_quality_control, thrStopCorAG 등의 옵션을 넣어서 강제로 실행을 하는 것이다. 실제 분석에서는 세심한 주의를 기울여서 옵션을 주고 문제점이 있나 없나 확인을 해야 한다. 그렇지 않으면 유전체 자료를 이용하는 것이 독이 될 수도 있다.

중간 과정을 점검하기 위하여 H 역행렬, A 역행렬, GimA22i 등등이 필요하면 위의 "OPTION solv_method FSPAK"을 지우고 다음을 추가하여 pregsf90을 실행한다.

OPTION saveAscii
OPTION saveHinv
OPTION saveAinv
OPTION saveHinvOrig
OPTION saveAinvOrig
OPTION saveDiagGOrig
OPTION saveGOrig
OPTION saveA22Orig
OPTION saveGimA22iOrig

여기서는 실행하지 않는다.

실제에서는 pregsf90을 실행하여 유전체 자료의 문제점이 있는지 없는지 반드시 확인하여야 한다. pregsf90은 문제점을 찾아내고 수정하기 위한 많은 옵션을 제공하는데 반드시 의미하는 바가 무엇인지 확인하고 진행하기를 바란다. 그리고 pregsf90, blupf90은 문제가 있어도 대충 덮고 실행을 계속하게 한다. 그러나 그렇게 하지 말기를 바란다. 문제가 발견되면 문제를 일으킨 유전체 자료가 무엇인지 확인하고 분석비용이 아깝긴 하지만 넣는 것이 바람직한 것인지 아니면 버리는 것이 나은 것인지 판단을 하고 진행해야 한다. 보통 버리는 것이 정신 건강상 좋다. 유전체 자료를 넣는다고 무조건 정확도가 올라가서 개량량이 증가하는 것은 아니다. 양질의 유전체 자료를 넣을 때만 그렇게 된다. 당연한 얘기가 실제에선 무시되고 마구 분석하는 현실이 안타깝기만 하다. 모든 것이 그렇듯이 유전체 자료를 이용한 유전체 선발도 마법의 도구가 아니다.

blupf90 실행하기 - renumf90이 만들어낸 파라미터 파일을 이용하여 blupf90 실행

다음 명령어로 실행

blupf90 renf90.par | tee blupf90_ssgblup.log

실행 화면

실행 로그

renf90.par
     BLUPF90 ver. 1.68

 Parameter file:             renf90.par
 Data file:                  renf90.dat
 Number of Traits             1
 Number of Effects            2
 Position of Observations      1
 Position of Weight (1)        2
 Value of Missing Trait/Observation           0

EFFECTS
 #  type                position (2)        levels   [positions for nested]
    1  cross-classified       3         1
    2  cross-classified       4        26

 Residual (co)variance Matrix
  245.00    

 Random Effect(s)    2
 Type of Random Effect:      additive animal
 Pedigree File:              renadd02.ped                                                                                                                                                                                                                                              
 trait   effect    (CO)VARIANCES
  1       2     35.24    

 REMARKS
  (1) Weight position 0 means no weights utilized
  (2) Effect positions of 0 for some effects and traits means that such
      effects are missing for specified traits
 

 * The limited number of OpenMP threads = 4

 * solving method (default=PCG):FSPAK               
 

 Options read from parameter file for genomic
 * SNP format: BLUPF90 standard (text)
 * SNP file: ssgblup_snpgenotype.txt
 * SNP Xref file: ssgblup_snpgenotype.txt_XrefID
 * No Quality Control Checks !!!!! (default .false.):  T
 * Set the threshold to Stop the analysis if low cor(A22,G) (default 0.3):  0.1
 * Create a tuned G (default = 2): 0
 * AlphaBeta defaults alpha=0.95, beta=0.05) :  0.95  0.05
 Data record length =            4
 # equations =           27
 G
  35.241    
 read           10  records in   0.9531250      s,                      21 
  nonzeroes
  read           26  additive pedigrees
 
 *--------------------------------------------------------------*
 *                 Genomic Library: Dist Version 1.281          *
 *                                                              *
 *             Optimized OpenMP Version -  4 threads            *
 *                                                              *
 *  Modified relationship matrix (H) created for effect:   2    *
 *--------------------------------------------------------------*
 
 Read 26 animals from pedigree file: "D:\users\bhpark\2021\job\공부하기\01_Linear Models for the Prediction of Animal Breeding Values_3rd_Mrode\26_ssGBLUP_blupf90\renadd02.ped"
 Number of Genotyped Animals: 9

 Creating A22 
    Extracting subset of: 19 pedigrees from: 26 elapsed time:     0.0000
    Calculating A22 Matrix by Colleau OpenMP...elapsed time: .0000
    Numbers of threads=1 4

 Reading SNP file
    Column position in file for the first marker: 4
    Format to read SNP file: (3x,400000i1)                                     
    Number of SNPs: 50
    Format: integer genotypes (0 to 5) to double-precision array
    Number of Genotyped animals: 9
    Reading SNP file elapsed time: .00
 
 Statistics of alleles frequencies in the current population
    N:             50
    Mean:       0.074
    Min:        0.000
    Max:        0.944
    Var:        0.038
 
 
 Quality Control - Monomorphic SNPs Exist - NOT REMOVED: 40

 Genotypes missings (%):  0.000

 Calculating G Matrix 
    Dgemm MKL #threads=     1    4 Elapsed omp_get_time:     0.0000
 
 Scale by Sum(2pq). Average:   3.19135802469136     

 Detecting samples with similar genotypes
    elapsed time=     0.0
 
 Blend G as alpha*G + beta*A22: (alpha,beta)     0.950     0.050

 Frequency - Diagonal of G
    N:           9
    Mean:        1.057
    Min:         0.565
    Max:         2.648
    Range:       0.104
    Class:     20
 
  #Class       Class   Count
       1  0.5645           1
       2  0.6687           1
       3  0.7729           1
       4  0.8771           4
       5  0.9813           1
       6   1.085           0
       7   1.190           0
       8   1.294           0
       9   1.398           0
      10   1.502           0
      11   1.606           0
      12   1.711           0
      13   1.815           0
      14   1.919           0
      15   2.023           0
      16   2.127           0
      17   2.232           0
      18   2.336           0
      19   2.440           0
      20   2.544           1
      21   2.648           0
 

 Check for diagonal of genomic relationship matrix

 ** High Diagonal of genotype 4  2.65 Not Removed
 ** Low Diagonal of genotype 8  0.56 Not Removed

 Check for diagonal of genomic relationship matrix, genotypes not removed: 2

 ------------------------------
  Final Pedigree-Based Matrix 
 ------------------------------
 
 Statistic of Rel. Matrix A22
                            N      Mean       Min       Max       Var
     Diagonal               9     1.000     1.000     1.000     0.000
     Off-diagonal          72     0.201     0.000     0.500     0.013
 

 ----------------------
  Final Genomic Matrix 
 ----------------------
 
 Statistic of Genomic Matrix
                            N      Mean       Min       Max       Var
     Diagonal               9     1.057     0.565     2.648     0.377
     Off-diagonal          72    -0.116    -0.742     0.725     0.142
 

 Correlation of Genomic Inbreeding and Pedigree Inbreeding
     Variance of Y is Zero !!
     Correlation:     0.0000
 
 Diagonal elements
    Estimating Regression Coefficients G = b0 11' + b1 A + e
    Regression coefficients b0 b1 =        NaN       NaN

    Correlation diagonal elements G & A       NaN
 
 All elements - Diagonal / Off-Diagonal
    Estimating Regression Coefficients G = b0 11' + b1 A + e
    Regression coefficients b0 b1 =     -0.395     1.413

    Correlation all elements G & A     0.708
 
 Off-Diagonal
    Using 56 elements from A22 >= .02000

    Estimating Regression Coefficients G = b0 11' + b1 A + e
    Regression coefficients b0 b1 =     -0.483     1.647

    Correlation Off-Diagonal elements G & A     0.212
 
 ***********************************************************************
 * CORRELATION FOR OFF-DIAGONALS G & A22 IS LOW THAN  0.50  !!!!!  *
 * MISIDENTIFIED GENOMIC SAMPLES OR POOR QUALITY GENOMIC DATA *
 ***********************************************************************

 Creating A22-inverse 
    Inverse LAPACK MKL dpotrf/i  #threads=    1    4 Elapsed omp_get_time:     0.0000

 ----------------------
  Final A22 Inv Matrix 
 ----------------------
 
 Statistic of Inv. Rel. Matrix A22
                            N      Mean       Min       Max       Var
     Diagonal               9     1.233     1.000     1.429     0.017
     Off-diagonal          72    -0.101    -0.571     0.000     0.009
 
 
 Creating G-inverse 
    Inverse LAPACK MKL dpotrf/i  #threads=    1    4 Elapsed omp_get_time:     0.0000

 --------------------------
  Final Genomic Inv Matrix 
 --------------------------
 
 Statistic of Inv. Genomic Matrix
                            N      Mean       Min       Max       Var
     Diagonal               9     5.227     1.593    11.227     9.161
     Off-diagonal          72     0.308    -5.482     3.166     3.142
 

 Check for diagonal of Inverse Genomic - Inverse of pedigree relationship matrix


 ------------------------------
  Final G Inv - A22 Inv Matrix 
 ------------------------------
 
 Statistic of Inv. Genomic - A22 Matrix
                            N      Mean       Min       Max       Var
     Diagonal               9     3.994     0.593    10.021     8.878
     Off-diagonal          72     0.408    -5.355     3.293     3.069
 

*--------------------------------------------------*
* Setup Genomic Done !!!, elapsed time:      0.049 *
*--------------------------------------------------*

 finished peds in    1.156250      s,                     108  nonzeroes
 solutions stored in file: "solutions"

blupf90 실행결과 생기는 파일

sum2pq

   3.19135802469136

solutions

trait/effect level  solution
   1   1         1          8.39032676
   1   2         1          0.00291425
   1   2         2         -0.02384446
   1   2         3          0.00746559
   1   2         4          0.00433541
   1   2         5          0.00839707
   1   2         6          0.00559388
   1   2         7          0.01313654
   1   2         8         -0.00236465
   1   2         9          0.00100295
   1   2        10         -0.00244449
   1   2        11         -0.00298365
   1   2        12         -0.00082301
   1   2        13          0.00793245
   1   2        14          0.00483534
   1   2        15         -0.01192223
   1   2        16          0.00689306
   1   2        17         -0.01192223
   1   2        18          0.00275784
   1   2        19         -0.00344106
   1   2        20         -0.00307480
   1   2        21          0.00384316
   1   2        22          0.00415291
   1   2        23          0.00206656
   1   2        24         -0.00199381
   1   2        25         -0.00343423
   1   2        26          0.00177842

original ID 로 정렬한 개체의 gebv

rid	oid	sol
15	1	-0.01192
16	2	0.00689
17	3	-0.01192
18	4	0.00276
19	5	-0.00344
20	6	-0.00307
21	7	0.00384
22	8	0.00415
23	9	0.00207
24	10	-0.00199
25	11	-0.00343
26	12	0.00178
1	13	0.00291
4	14	0.00434
6	15	0.00559
7	16	0.01314
8	17	-0.00236
10	18	-0.00244
3	19	0.00747
9	20	0.00100
2	21	-0.02384
5	22	0.00840
11	23	-0.00298
12	24	-0.00082
13	25	0.00793
14	26	0.00484

해가 책과도 다르고 blupf90 팀의 Yutaka Masuda의 해와도 다르다. 위의 참고 포스팅하고는 같다. 이유는 참고 포스팅을 보기 바란다.

저작자표시

'Animal Breeding > BLUPF90' 카테고리의 다른 글

inbupgf90을 이용한 근교계수 계산과 교배계획 세우기 (0)	2023.05.25
blupf90으로 social interaction model 풀기 (1)	2021.04.21
blupf90으로 GBLUP Model with residual polygenic effects 풀기 (0)	2020.12.29
blupf90으로 gblup model 풀기 (0)	2020.12.29
blupf90으로 SNP-BLUP Model(with polygenic effects unweighted analysis) 풀기 (0)	2020.12.24

Let's Study Animal Breeding

blupf90으로 single-step GBLUP Model 풀기

'Animal Breeding > BLUPF90' 카테고리의 다른 글

+ Recent posts

티스토리툴바