# Single-step GBLUP Model(ssGBLUP)
# Linear Models for the Prediction of Animal Breeding Values, 3rd Edition.
# Raphael Mrode
# Example 11.6 p192
간단한 설명은 아래 포스팅 참조
2021/01/05 - [Animal Breeding/R for Genetic Evaluation] - Single-step GBLUP(ssGBLUP)
Data(ssgblup_data.txt)
13 0 0 1 558 9 0.001792115
14 0 0 1 722 13.4 0.001385042
15 13 4 1 300 12.7 0.003333333
16 15 2 1 73 15.4 0.01369863
17 15 5 1 52 5.9 0.019230769
18 14 6 1 87 7.7 0.011494253
19 14 9 1 64 10.2 0.015625
20 14 9 1 103 4.8 0.009708738
21 1 3 1 13 7.6 0.076923077
22 14 8 1 125 8.8 0.008
animal, sire, dam, general mean, edc, dyd fat, 1/edc
Pedigree(ssgblup_pedi.txt)
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
10 0 0
11 0 0
12 0 0
13 0 0
14 0 0
15 13 4
16 15 2
17 15 5
18 14 6
19 14 9
20 14 9
21 1 3
22 14 8
23 14 11
24 14 10
25 14 7
26 14 12
SNP genotype(ssgblup_snpgenotype.txt)
18 11010202210000000000000000000000000000000000000000
19 00110202200000000000000000000000000000000000000000
20 01100102200000000000000000000000000000000000000000
21 20000122120000000000000000000000000000000000000000
22 00011202000000000000000000000000000000000000000000
23 01100102210000000000000000000000000000000000000000
24 10001102000000000000000000000000000000000000000000
25 00011202100000000000000000000000000000000000000000
26 10110201000000000000000000000000000000000000000000
Renumf90 파라미터 파일(renumf90_ssgblup.par)
# Parameter file for program renf90; it is translated to parameter
# file for BLUPF90 family programs.
DATAFILE
ssgblup_data.txt
TRAITS
6
FIELDS_PASSED TO OUTPUT
WEIGHT(S)
7
RESIDUAL_VARIANCE
245
EFFECT
4 cross alpha
EFFECT
1 cross alpha
RANDOM
animal
FILE
ssgblup_pedi.txt
FILE_POS
1 2 3
SNP_FILE
ssgblup_snpgenotype.txt
PED_DEPTH
0
(CO)VARIANCES
35.241
OPTION no_quality_control
OPTION AlphaBeta 0.95 0.05
OPTION tunedG 0
OPTION thrStopCorAG 0.10
OPTION solv_method FSPAK
Renumf90 실행
renumf90 renumf90_ssgblup.par | tee renumf90_ssgblup.log
Renumf90 실행 화면
Renumf90 로그
RENUMF90 version 1.145
renumf90_ssgblup.par
datafile:ssgblup_data.txt
traits: 6
R
245.0
Processing effect 1 of type cross
item_kind=alpha
Processing effect 2 of type cross
item_kind=alpha
pedigree file name "ssgblup_pedi.txt"
positions of animal, sire, dam, alternate dam, yob, and group 1 2 3 0 0 0 0
SNP file name "ssgblup_snpgenotype.txt"
all pedigrees to be included
Reading (CO)VARIANCES: 1 x 1
Maximum size of character fields: 20
Maximum size of record (max_string_readline): 800
Maximum number of fields for input file (max_field_readline): 100
Pedigree search method (ped_search): convention
Order of pedigree animals (animal_order): default
Order of UPG (upg_order): default
Missing observation code (missing): 0
hash tables for effects set up
first 3 lines of the data file (up to 70 characters)
13 0 0 1 558 9 0.001792115
14 0 0 1 722 13.4 0.001385042
15 13 4 1 300 12.7 0.003333333
read 10 records
table with 1 elements sorted
added count
Effect group 1 of column 1 with 1 levels
table expanded from 10000 to 10000 records
added count
Effect group 2 of column 1 with 10 levels
wrote statistics in file "renf90.tables"
Basic statistics for input data (missing value code is '0')
Pos Min Max Mean SD N
6 4.8000 15.400 9.5500 3.3890 10
random effect with SNPs 2
type: animal
file: ssgblup_snpgenotype.txt
read SNPs 9 records
Effect group 2 of column 1 with 14 levels
random effect 2
type:animal
opened output pedigree file "renadd02.ped"
read 26 pedigree records
loaded 12 parent(s) in round 0
Pedigree checks
Number of animals with records = 10
Number of animals with genotypes = 9
Number of animals with records or genotypes = 14
Number of animals with genotypes and no records = 4
Number of parents without records or genotypes = 12
Total number of animals = 26
Wrote cross reference IDs for SNP file "ssgblup_snpgenotype.txt_XrefID"
Wrote parameter file "renf90.par"
Wrote renumbered data "renf90.dat" 10 records
Renumf90 실행 결과 생성된 파일
renadd02.ped
14 4 26 1 0 12 0 0 0 26
1 0 0 3 0 0 1 1 0 13
2 15 17 1 0 12 1 0 0 21
19 0 0 3 0 0 0 0 1 5
3 4 23 1 0 12 1 0 0 19
4 0 0 3 0 0 1 8 0 14
5 4 22 1 0 12 1 0 0 22
17 0 0 3 0 0 0 0 1 3
22 0 0 3 0 0 0 0 1 8
6 1 18 1 0 2 1 2 0 15
11 4 25 1 0 12 0 0 0 23
24 0 0 3 0 0 0 0 1 10
15 0 0 3 0 0 0 1 0 1
20 0 0 3 0 0 0 0 1 6
7 6 16 1 0 2 1 0 0 16
12 4 24 1 0 12 0 0 0 24
25 0 0 3 0 0 0 0 1 11
18 0 0 3 0 0 0 0 1 4
8 6 19 1 0 2 1 0 0 17
23 0 0 3 0 0 0 0 2 9
13 4 21 1 0 12 0 0 0 25
26 0 0 3 0 0 0 0 1 12
9 4 23 1 0 12 1 0 0 20
16 0 0 3 0 0 0 0 1 2
10 4 20 1 0 12 1 0 0 18
21 0 0 3 0 0 0 0 1 7
Cross Reference ID(ssgblup_snpgenotype.txt_XrefID) - genotype을 가지고 있는 개체들의 renumbered ID
10 18
3 19
9 20
2 21
5 22
11 23
12 24
13 25
14 26
renf90.dat
9 0.001792115 1 1
13.4 0.001385042 1 4
12.7 0.003333333 1 6
15.4 0.01369863 1 7
5.9 0.019230769 1 8
7.7 0.011494253 1 10
10.2 0.015625 1 3
4.8 0.009708738 1 9
7.6 0.076923077 1 2
8.8 0.008 1 5
renf90.par
# BLUPF90 parameter file created by RENUMF90
DATAFILE
renf90.dat
NUMBER_OF_TRAITS
1
NUMBER_OF_EFFECTS
2
OBSERVATION(S)
1
WEIGHT(S)
2
EFFECTS: POSITIONS_IN_DATAFILE NUMBER_OF_LEVELS TYPE_OF_EFFECT[EFFECT NESTED]
3 1 cross
4 26 cross
RANDOM_RESIDUAL VALUES
245.00
RANDOM_GROUP
2
RANDOM_TYPE
add_animal
FILE
renadd02.ped
(CO)VARIANCES
35.241
OPTION SNP_file ssgblup_snpgenotype.txt
OPTION no_quality_control
OPTION AlphaBeta 0.95 0.05
OPTION tunedG 0
OPTION thrStopCorAG 0.10
OPTION solv_method FSPAK
본 예제의 genotype은 제대로 된 genotype이 아니다. 그래서 다음 프로그램이 정상적으로 동작하지 않는다. 그래서 no_quality_control, thrStopCorAG 등의 옵션을 넣어서 강제로 실행을 하는 것이다. 실제 분석에서는 세심한 주의를 기울여서 옵션을 주고 문제점이 있나 없나 확인을 해야 한다. 그렇지 않으면 유전체 자료를 이용하는 것이 독이 될 수도 있다.
중간 과정을 점검하기 위하여 H 역행렬, A 역행렬, GimA22i 등등이 필요하면 위의 "OPTION solv_method FSPAK"을 지우고 다음을 추가하여 pregsf90을 실행한다.
OPTION saveAscii
OPTION saveHinv
OPTION saveAinv
OPTION saveHinvOrig
OPTION saveAinvOrig
OPTION saveDiagGOrig
OPTION saveGOrig
OPTION saveA22Orig
OPTION saveGimA22iOrig
여기서는 실행하지 않는다.
실제에서는 pregsf90을 실행하여 유전체 자료의 문제점이 있는지 없는지 반드시 확인하여야 한다. pregsf90은 문제점을 찾아내고 수정하기 위한 많은 옵션을 제공하는데 반드시 의미하는 바가 무엇인지 확인하고 진행하기를 바란다. 그리고 pregsf90, blupf90은 문제가 있어도 대충 덮고 실행을 계속하게 한다. 그러나 그렇게 하지 말기를 바란다. 문제가 발견되면 문제를 일으킨 유전체 자료가 무엇인지 확인하고 분석비용이 아깝긴 하지만 넣는 것이 바람직한 것인지 아니면 버리는 것이 나은 것인지 판단을 하고 진행해야 한다. 보통 버리는 것이 정신 건강상 좋다. 유전체 자료를 넣는다고 무조건 정확도가 올라가서 개량량이 증가하는 것은 아니다. 양질의 유전체 자료를 넣을 때만 그렇게 된다. 당연한 얘기가 실제에선 무시되고 마구 분석하는 현실이 안타깝기만 하다. 모든 것이 그렇듯이 유전체 자료를 이용한 유전체 선발도 마법의 도구가 아니다.
blupf90 실행하기 - renumf90이 만들어낸 파라미터 파일을 이용하여 blupf90 실행
다음 명령어로 실행
blupf90 renf90.par | tee blupf90_ssgblup.log
실행 화면
실행 로그
renf90.par
BLUPF90 ver. 1.68
Parameter file: renf90.par
Data file: renf90.dat
Number of Traits 1
Number of Effects 2
Position of Observations 1
Position of Weight (1) 2
Value of Missing Trait/Observation 0
EFFECTS
# type position (2) levels [positions for nested]
1 cross-classified 3 1
2 cross-classified 4 26
Residual (co)variance Matrix
245.00
Random Effect(s) 2
Type of Random Effect: additive animal
Pedigree File: renadd02.ped
trait effect (CO)VARIANCES
1 2 35.24
REMARKS
(1) Weight position 0 means no weights utilized
(2) Effect positions of 0 for some effects and traits means that such
effects are missing for specified traits
* The limited number of OpenMP threads = 4
* solving method (default=PCG):FSPAK
Options read from parameter file for genomic
* SNP format: BLUPF90 standard (text)
* SNP file: ssgblup_snpgenotype.txt
* SNP Xref file: ssgblup_snpgenotype.txt_XrefID
* No Quality Control Checks !!!!! (default .false.): T
* Set the threshold to Stop the analysis if low cor(A22,G) (default 0.3): 0.1
* Create a tuned G (default = 2): 0
* AlphaBeta defaults alpha=0.95, beta=0.05) : 0.95 0.05
Data record length = 4
# equations = 27
G
35.241
read 10 records in 0.9531250 s, 21
nonzeroes
read 26 additive pedigrees
*--------------------------------------------------------------*
* Genomic Library: Dist Version 1.281 *
* *
* Optimized OpenMP Version - 4 threads *
* *
* Modified relationship matrix (H) created for effect: 2 *
*--------------------------------------------------------------*
Read 26 animals from pedigree file: "D:\users\bhpark\2021\job\공부하기\01_Linear Models for the Prediction of Animal Breeding Values_3rd_Mrode\26_ssGBLUP_blupf90\renadd02.ped"
Number of Genotyped Animals: 9
Creating A22
Extracting subset of: 19 pedigrees from: 26 elapsed time: 0.0000
Calculating A22 Matrix by Colleau OpenMP...elapsed time: .0000
Numbers of threads=1 4
Reading SNP file
Column position in file for the first marker: 4
Format to read SNP file: (3x,400000i1)
Number of SNPs: 50
Format: integer genotypes (0 to 5) to double-precision array
Number of Genotyped animals: 9
Reading SNP file elapsed time: .00
Statistics of alleles frequencies in the current population
N: 50
Mean: 0.074
Min: 0.000
Max: 0.944
Var: 0.038
Quality Control - Monomorphic SNPs Exist - NOT REMOVED: 40
Genotypes missings (%): 0.000
Calculating G Matrix
Dgemm MKL #threads= 1 4 Elapsed omp_get_time: 0.0000
Scale by Sum(2pq). Average: 3.19135802469136
Detecting samples with similar genotypes
elapsed time= 0.0
Blend G as alpha*G + beta*A22: (alpha,beta) 0.950 0.050
Frequency - Diagonal of G
N: 9
Mean: 1.057
Min: 0.565
Max: 2.648
Range: 0.104
Class: 20
#Class Class Count
1 0.5645 1
2 0.6687 1
3 0.7729 1
4 0.8771 4
5 0.9813 1
6 1.085 0
7 1.190 0
8 1.294 0
9 1.398 0
10 1.502 0
11 1.606 0
12 1.711 0
13 1.815 0
14 1.919 0
15 2.023 0
16 2.127 0
17 2.232 0
18 2.336 0
19 2.440 0
20 2.544 1
21 2.648 0
Check for diagonal of genomic relationship matrix
** High Diagonal of genotype 4 2.65 Not Removed
** Low Diagonal of genotype 8 0.56 Not Removed
Check for diagonal of genomic relationship matrix, genotypes not removed: 2
------------------------------
Final Pedigree-Based Matrix
------------------------------
Statistic of Rel. Matrix A22
N Mean Min Max Var
Diagonal 9 1.000 1.000 1.000 0.000
Off-diagonal 72 0.201 0.000 0.500 0.013
----------------------
Final Genomic Matrix
----------------------
Statistic of Genomic Matrix
N Mean Min Max Var
Diagonal 9 1.057 0.565 2.648 0.377
Off-diagonal 72 -0.116 -0.742 0.725 0.142
Correlation of Genomic Inbreeding and Pedigree Inbreeding
Variance of Y is Zero !!
Correlation: 0.0000
Diagonal elements
Estimating Regression Coefficients G = b0 11' + b1 A + e
Regression coefficients b0 b1 = NaN NaN
Correlation diagonal elements G & A NaN
All elements - Diagonal / Off-Diagonal
Estimating Regression Coefficients G = b0 11' + b1 A + e
Regression coefficients b0 b1 = -0.395 1.413
Correlation all elements G & A 0.708
Off-Diagonal
Using 56 elements from A22 >= .02000
Estimating Regression Coefficients G = b0 11' + b1 A + e
Regression coefficients b0 b1 = -0.483 1.647
Correlation Off-Diagonal elements G & A 0.212
***********************************************************************
* CORRELATION FOR OFF-DIAGONALS G & A22 IS LOW THAN 0.50 !!!!! *
* MISIDENTIFIED GENOMIC SAMPLES OR POOR QUALITY GENOMIC DATA *
***********************************************************************
Creating A22-inverse
Inverse LAPACK MKL dpotrf/i #threads= 1 4 Elapsed omp_get_time: 0.0000
----------------------
Final A22 Inv Matrix
----------------------
Statistic of Inv. Rel. Matrix A22
N Mean Min Max Var
Diagonal 9 1.233 1.000 1.429 0.017
Off-diagonal 72 -0.101 -0.571 0.000 0.009
Creating G-inverse
Inverse LAPACK MKL dpotrf/i #threads= 1 4 Elapsed omp_get_time: 0.0000
--------------------------
Final Genomic Inv Matrix
--------------------------
Statistic of Inv. Genomic Matrix
N Mean Min Max Var
Diagonal 9 5.227 1.593 11.227 9.161
Off-diagonal 72 0.308 -5.482 3.166 3.142
Check for diagonal of Inverse Genomic - Inverse of pedigree relationship matrix
------------------------------
Final G Inv - A22 Inv Matrix
------------------------------
Statistic of Inv. Genomic - A22 Matrix
N Mean Min Max Var
Diagonal 9 3.994 0.593 10.021 8.878
Off-diagonal 72 0.408 -5.355 3.293 3.069
*--------------------------------------------------*
* Setup Genomic Done !!!, elapsed time: 0.049 *
*--------------------------------------------------*
finished peds in 1.156250 s, 108 nonzeroes
solutions stored in file: "solutions"
blupf90 실행결과 생기는 파일
sum2pq
3.19135802469136
solutions
trait/effect level solution
1 1 1 8.39032676
1 2 1 0.00291425
1 2 2 -0.02384446
1 2 3 0.00746559
1 2 4 0.00433541
1 2 5 0.00839707
1 2 6 0.00559388
1 2 7 0.01313654
1 2 8 -0.00236465
1 2 9 0.00100295
1 2 10 -0.00244449
1 2 11 -0.00298365
1 2 12 -0.00082301
1 2 13 0.00793245
1 2 14 0.00483534
1 2 15 -0.01192223
1 2 16 0.00689306
1 2 17 -0.01192223
1 2 18 0.00275784
1 2 19 -0.00344106
1 2 20 -0.00307480
1 2 21 0.00384316
1 2 22 0.00415291
1 2 23 0.00206656
1 2 24 -0.00199381
1 2 25 -0.00343423
1 2 26 0.00177842
original ID 로 정렬한 개체의 gebv
rid oid sol
15 1 -0.01192
16 2 0.00689
17 3 -0.01192
18 4 0.00276
19 5 -0.00344
20 6 -0.00307
21 7 0.00384
22 8 0.00415
23 9 0.00207
24 10 -0.00199
25 11 -0.00343
26 12 0.00178
1 13 0.00291
4 14 0.00434
6 15 0.00559
7 16 0.01314
8 17 -0.00236
10 18 -0.00244
3 19 0.00747
9 20 0.00100
2 21 -0.02384
5 22 0.00840
11 23 -0.00298
12 24 -0.00082
13 25 0.00793
14 26 0.00484
해가 책과도 다르고 blupf90 팀의 Yutaka Masuda의 해와도 다르다. 위의 참고 포스팅하고는 같다. 이유는 참고 포스팅을 보기 바란다.
'Animal Breeding > BLUPF90' 카테고리의 다른 글
inbupgf90을 이용한 근교계수 계산과 교배계획 세우기 (0) | 2023.05.25 |
---|---|
blupf90으로 social interaction model 풀기 (1) | 2021.04.21 |
blupf90으로 GBLUP Model with residual polygenic effects 풀기 (0) | 2020.12.29 |
blupf90으로 gblup model 풀기 (0) | 2020.12.29 |
blupf90으로 SNP-BLUP Model(with polygenic effects unweighted analysis) 풀기 (0) | 2020.12.24 |