Each phenotype row shows its top 3 associated latent factors with variance components and w-values.
Reference: Lazarev et al. bioRxiv 2024
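As a rough illustration of how a "top 3 latent factors" row could be derived, the sketch below ranks the latent factors for one phenotype by a per-factor score. It is not the browser's code: the function name `top_k_latents`, the toy values, and the choice to rank by variance component are all assumptions made for this example, since the page does not state the ranking criterion.

```python
def top_k_latents(scores, k=3):
    """Return the indices of the k largest scores, largest first.

    `scores` is a hypothetical list with one value per latent factor
    (assumed here to be that phenotype's variance components).
    """
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy values for a single phenotype across four latent factors (illustrative only).
phenotype_scores = [0.10, 0.40, 0.05, 0.30]
print(top_k_latents(phenotype_scores))
```

Ties keep their original order because Python's sort is stable.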
Select a phenotype of interest to visualize variance components.
Select a trait to view its variant associations. Variants are colored according to which of the trait's top 3 latent factors they load onto.
Enrichment score: En = (g_sig/g_total) / (n_sig/n_total)
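The enrichment score is a ratio of two fractions, so a small worked example may help. The sketch below implements the formula exactly as written. The page does not define the four symbols, so the reading used here is an assumption: g_sig / g_total is taken as the fraction of significant items in the group of interest, and n_sig / n_total as the same fraction in the background set. The function name and the counts are hypothetical.

```python
def enrichment_score(g_sig, g_total, n_sig, n_total):
    """Compute En = (g_sig / g_total) / (n_sig / n_total).

    Assumed meaning (not stated on the page): the first ratio is the
    significant fraction in the group of interest, the second is the
    significant fraction in the background.
    """
    if g_total <= 0 or n_total <= 0:
        raise ValueError("g_total and n_total must be positive")
    if n_sig <= 0:
        raise ValueError("n_sig must be positive, otherwise the background fraction is zero")
    return (g_sig / g_total) / (n_sig / n_total)


# Hypothetical counts, for illustration only.
score = enrichment_score(g_sig=10, g_total=50, n_sig=100, n_total=1000)
print(score)
```

Whatever the exact definitions, a value above 1 means the first fraction exceeds the second, and a value below 1 means the opposite.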
This browser uses the "all" dataset from Tanigawa et al. (Nature Communications 2019), which includes comprehensive phenotypic data from the UK Biobank.
To create your own GUIDE model, visit the GUIDE GitHub repository for the relevant code. For detailed methodology and theoretical background, refer to our paper: Lazarev et al. bioRxiv 2024.
The latent factor characterizations displayed in this browser were generated using Claude Opus 4.5 with extended thinking enabled and a minimized temperature parameter to ensure reproducibility. The model analyzed the top associated phenotypes, genetic variants, and biological pathways for each latent factor to produce interpretable summaries of the underlying biological mechanisms.
Read the attached paper. I am also attaching the GUIDE model for the "all" dataset with L=100 latent factors. Specifically, I'm attaching:
- the GUIDE X -> L (SNP to latent) and L -> T (latent to trait) variance components matrices with the top 100 values per latent (phenotype_var_comp_top100_per_latent.csv, variant_var_comp_top100_per_latent.csv)
- the GUIDE X -> L (SNP to latent) and L -> T (latent to trait) contribution scores matrices with the top 100 values per latent (variant_contribution_top100_per_latent.csv, phenotype_contribution_top100_per_latent.csv)
- the GUIDE X -> L (SNP to latent) and L -> T (latent to trait) -log10(w-values) matrices computed using the variance components with the top 100 values per latent (variant_logw_mat_XL_top100_per_latent.csv, phenotype_logw_mat_LT_top100_per_latent.csv)
- the trait labels (phenotype_labels.txt)
- the genetic variant (SNP) labels (variant_labels.txt)
- dictionary from SNP to nearest gene (snp_gene_annotation.json)

Using this information, for every one of the 100 latent factors write a short, informative label that encapsulates and best characterizes its biological/mechanistic/pathophysiological role. Also, provide a written summary describing its most likely role given its most significant variance components, contribution scores, and w-values relative to both traits and genetic variants.

(The contribution scores, unlike the variance components, does not include the scaling information, and should be understood as representing the degree to which a given genetic variant or trait maps onto a given latent factor based on the optimal basis found by GUIDE, as opposed to the specific linear combination, using the scaling information, that is particular to the given dataset.)

Write this as an expert, world-class geneticist. The tone should be professional, the language precise and clear, and the analysis thorough and rigorous, with references to standard databases and peer-reviewed research articles.
All claims should be supported by the information attached and/or by the references described in the previous sentence.