Fig. 1 Flow chart outlining the steps involved in the database design: literature search, data extraction, data annotation and data harmonization.
Fig. 2 (A) Pie chart showing the distribution of the 652 breast cancer-associated loci across the genome. (B) Pie chart showing the distribution of variants that either predispose to breast cancer (Disease; OR > 1) or confer protection against breast cancer (Protective; OR < 1).
Fig. 3 Chromosomal ideogram illustrating the distribution of the 652 breast cancer-associated loci across the chromosomes. The ideogram was constructed using the PhenoGram software tool, with each dot representing one gene or variant.
Fig. 4 (A) Scatter plot of the number of breast cancer-associated loci on each chromosome relative to the length of that chromosome. Chromosome lengths were retrieved from Ensembl (Chromosome Statistics). (B) Scatter plot of the number of breast cancer-associated loci on each chromosome relative to the total number of genes on that chromosome. The total number of genes for each chromosome was calculated from Ensembl (Chromosome Statistics) by adding the numbers of coding genes, non-coding genes and pseudogenes. The thick continuous line is the trendline for the number of breast cancer-associated loci on each chromosome compared with its length (A) or its total number of genes (B). (A & B) The thin dotted line is a hypothetical trendline illustrating a perfect positive correlation.
Fig. 5 Flow chart outlining the criteria used to annotate and collate the breast cancer genes that contain rare monogenic variants. Of the 459 breast cancer genes, manual curation identified 39 that contain disease-causing monogenic variants.
Fig. 6 Protein network analysis of the 459 breast cancer genes revealed a major cluster enriched for DNA repair pathways. Breast cancer genes containing rare monogenic variants (red dots) were mainly located within this cluster. The protein-protein interaction network was constructed using the STRING database and graphically adjusted in Cytoscape.