GUIDE Browser - Phenotype Table

Each phenotype row shows its top 3 associated latent factors with variance components and w-values.
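Below is a minimal sketch of how one table row could be assembled. It assumes the latent factors are ranked by variance component (not by w-value), and that each phenotype's variance components and w-values are available as parallel lists indexed by latent factor. The function name and the numbers are hypothetical and are not taken from the GUIDE code.

```python
from typing import List, Tuple


def top_latents(
    var_comp: List[float], w_values: List[float], k: int = 3
) -> List[Tuple[int, float, float]]:
    """Return (latent index, variance component, w-value) for the k largest variance components."""
    # Sort latent indices by variance component, largest first (assumed ranking criterion)
    order = sorted(range(len(var_comp)), key=lambda j: var_comp[j], reverse=True)
    return [(j, var_comp[j], w_values[j]) for j in order[:k]]


# Hypothetical values for one phenotype across five latent factors
var_comp = [0.02, 0.41, 0.07, 0.30, 0.20]
w_values = [0.60, 1e-12, 0.25, 3e-8, 4e-5]
print(top_latents(var_comp, w_values))
```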

Reference: Lazarev et al. bioRxiv 2024

Phenotype Association Table

GUIDE Browser - Bar Plots

Select a phenotype of interest to visualize its variance components across latent factors.

Manhattan Plots

Select a trait to view its variant associations. Variants are colored by which of the trait's top 3 latent factors they load onto:

  • Red: Loads onto 1st latent (En = enrichment score)
  • Orange: Loads onto 2nd latent
  • Green: Loads onto 3rd latent
  • Gray: Loads onto other latents
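The coloring rule can be sketched as follows. This sketch assumes a variant "loads onto" the single latent factor where its contribution score is largest. The browser may use a different criterion, such as a w-value threshold. Function names and inputs are hypothetical.

```python
from typing import List

# Colors for the trait's 1st, 2nd and 3rd latent factors; all other latents are gray
TOP3_COLORS = ["red", "orange", "green"]


def primary_latent(contributions: List[float]) -> int:
    """Index of the latent factor with the largest contribution score (assumed loading rule)."""
    return max(range(len(contributions)), key=lambda j: contributions[j])


def variant_color(contributions: List[float], trait_top3: List[int]) -> str:
    """Color a variant by the rank of its primary latent among the trait's top 3 latents."""
    latent = primary_latent(contributions)
    if latent in trait_top3[:3]:
        return TOP3_COLORS[trait_top3.index(latent)]
    return "gray"


# Hypothetical trait whose top 3 latent factors are 7, 2 and 5
trait_top3 = [7, 2, 5]
variant = [0.0] * 10
variant[2] = 0.8  # this variant's largest contribution is to latent 2
print(variant_color(variant, trait_top3))
```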

Enrichment score: En = (g_sig/g_total) / (n_sig/n_total)
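The counts in this formula are not defined here. The minimal sketch below assumes the following definitions:

- g_total is the number of plotted variants that load onto a given latent factor.
- g_sig is the subset of those variants passing a significance threshold.
- n_total is the number of all plotted variants for the trait.
- n_sig is the number of all variants passing the same threshold.

The genome-wide threshold of 5e-8 is also an assumption. Under these definitions, an En above 1 means the latent's variants are over-represented among significant hits.

```python
from typing import Sequence

ALPHA = 5e-8  # assumed significance threshold (conventional genome-wide level)


def enrichment_score(
    p_values: Sequence[float], in_group: Sequence[bool], alpha: float = ALPHA
) -> float:
    """En = (g_sig / g_total) / (n_sig / n_total) under the assumed definitions above."""
    n_total = len(p_values)
    n_sig = sum(1 for p in p_values if p < alpha)
    g_total = sum(1 for g in in_group if g)
    g_sig = sum(1 for p, g in zip(p_values, in_group) if g and p < alpha)
    if g_total == 0 or n_sig == 0:
        return float("nan")  # undefined when the group is empty or nothing is significant
    return (g_sig / g_total) / (n_sig / n_total)


# Hypothetical p-values for six variants; the 1st, 2nd and 6th load onto the latent
p_values = [1e-9, 1e-10, 0.3, 0.02, 4e-8, 0.5]
in_group = [True, True, False, False, False, True]
print(enrichment_score(p_values, in_group))  # (2/3) / (3/6), about 1.33
```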

GUIDE Browser - LLM Characterization

About this Dataset and LLM Characterization

Dataset

This browser uses the "all" dataset from Tanigawa et al. (Nature Communications 2019), which contains genetic association summary statistics for a broad range of UK Biobank phenotypes.

Creating Your Own Model

To create your own GUIDE model, visit the GUIDE GitHub repository for the relevant code. For detailed methodology and theoretical background, refer to our paper: Lazarev et al. bioRxiv 2024.

LLM Characterization Details

The latent factor characterizations displayed in this browser were generated using Claude Opus 4.5 with extended thinking enabled and the temperature parameter set to its minimum to improve reproducibility. For each latent factor, the model analyzed the top associated phenotypes, genetic variants, and biological pathways to produce interpretable summaries of the underlying biological mechanisms.

Full LLM Prompt
Read the attached paper. I am also attaching the GUIDE model for the "all" dataset with L=100 latent factors. Specifically, I'm attaching:
- the GUIDE X -> L (SNP to latent) and L ->T (latent to trait) variance components matrices with the top 100 values per latent (phenotype_var_comp_top100_per_latent.csv, variant_var_comp_top100_per_latent.csv)
- the GUIDE X -> L (SNP to latent) and L ->T (latent to trait) contribution scores matrices with the top 100 values per latent (variant_contribution_top100_per_latent.csv, phenotype_contribution_top100_per_latent.csv)
- the GUIDE X -> L (SNP to latent) and L ->T (latent to trait) -log10(w-values) matrices computed using the variance components with the top 100 values per latent (variant_logw_mat_XL_top100_per_latent.csv, phenotype_logw_mat_LT_top100_per_latent.csv)
- the trait labels (phenotype_labels.txt)
- the genetic variant (SNP) labels (variant_labels.txt)
- dictionary from SNP to nearest gene(snp_gene_annotation.json)
Using this information, for every one of the 100 latent factors write a short, informative label that encapsulates and best characterizes its biological/mechanistic/pathophysiological role. Also, provide a written summary describing its most likely role given its most significant variance components, contribution scores, and w-values relative to both traits and genetic variants. (The contribution scores, unlike the variance components, does not include the scaling information, and should be understood as representing the degree to which a given genetic variant or trait maps onto a given latent factor based on the optimal basis found by GUIDE, as opposed to the specific linear combination, using the scaling information, that is particular to the given dataset.)
Write this as an expert, world-class geneticist. The tone should be professional, the language precise and clear, and the analysis thorough and rigorous, with references to standard databases and peer-reviewed research articles. All claims should be supported by the information attached and/or by the references described in the previous sentence.
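The prompt's input files keep only the top 100 values per latent factor. Below is a minimal sketch of that filtering. It assumes each full matrix is held in memory with one row per variant or trait and one column per latent factor, and that w-values lie in (0, 1]. The function and data layout are hypothetical, not code from the GUIDE repository, and CSV reading and writing are omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple


def top_k_per_latent(
    matrix: Sequence[Sequence[float]],
    labels: Sequence[str],
    k: int = 100,
    neg_log10: bool = False,
) -> Dict[int, List[Tuple[str, float]]]:
    """Keep the k largest values in each latent column.

    matrix[i][j] holds the value for item i (a variant or trait) and latent factor j.
    With neg_log10=True, raw w-values are first mapped to -log10(w), so the most
    significant entries rank highest.
    """
    n_latents = len(matrix[0])
    result: Dict[int, List[Tuple[str, float]]] = {}
    for j in range(n_latents):
        column = [(labels[i], row[j]) for i, row in enumerate(matrix)]
        if neg_log10:
            # Assumes every w-value is strictly positive
            column = [(name, -math.log10(value)) for name, value in column]
        column.sort(key=lambda item: item[1], reverse=True)
        result[j] = column[:k]
    return result


# Hypothetical w-values for three variants across two latent factors
w = [[0.5, 1e-3], [1e-6, 0.2], [0.01, 0.9]]
top = top_k_per_latent(w, ["rs1", "rs2", "rs3"], k=2, neg_log10=True)
print(top[0])
```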