Bioinformatics


Software Tool for Short Sequence Repeat Analysis

Patterns of short-sequence repeats (SSRs) at specific loci are widely used to distinguish microbial strains. In collaboration with Prof. Kelly Brayton (Allen School for Global Animal Health, WSU), we developed RepeatAnalyzer, a software tool for managing and analyzing SSR data on bacterial strains infecting cattle. The initial release included an Anaplasma marginale dataset and supported storing, searching, and identifying (i) SSR motifs, (ii) strains containing those motifs, and (iii) the publications reporting them. The prototype was used to track A. marginale strains by research groups in South Africa, Asia, and the U.S. Although our first focus was A. marginale, the software was designed to work for any species whose strains are defined by SSRs.


Sequence Similarity Network Model and Applications

Sequence similarity networks are useful for classifying and characterizing biologically important proteins and other structures. Threshold-based approaches to similarity network construction using exact distance measures are prohibitively slow to compute and rely on the difficult task of selecting an appropriate threshold, while similarity networks based on approximate distance calculations compromise useful structural information. 

We introduced the Directed Weighted All Nearest Neighbors (DiWANN) network model, which overcomes both of these drawbacks. DiWANN creates sparse but information-rich similarity graphs: each sequence is a node, and each node has a directed edge only to its closest sequence, or to all of its closest sequences in the case of ties. The weight on each edge is the edit distance between the two sequences. The model is accompanied by an efficient pruning and bounding-based algorithm for constructing the network. Because the pruning skips only distance computations that cannot change the result, researchers can map relationships across tens of thousands of genomes without the noise or computational burden of dense networks, revealing clusters and outliers that dense graphs often obscure.
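The construction rule described above can be illustrated with a minimal Python sketch. This is a brute-force version that computes every pairwise edit distance; it is not the pruning and bounding-based algorithm from the paper, and the function names (`edit_distance`, `nearest_neighbor_graph`) and the toy sequences are assumptions made for illustration only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def nearest_neighbor_graph(seqs):
    """Map each sequence index to its directed, weighted out-edges.

    Each node points only to its closest sequence(s); ties are all kept.
    Brute force: O(n^2) distance computations, with no pruning or bounding.
    """
    edges = {}
    for i, s in enumerate(seqs):
        dists = [(j, edit_distance(s, t)) for j, t in enumerate(seqs) if j != i]
        if not dists:  # a single sequence has no neighbors
            edges[i] = []
            continue
        best = min(d for _, d in dists)
        edges[i] = [(j, d) for j, d in dists if d == best]
    return edges


# Toy input (hypothetical sequences, not from any dataset).
toy = ["AAAA", "AAAT", "AAAC"]
graph = nearest_neighbor_graph(toy)
```

In this toy input every sequence is one edit away from the other two, so each node keeps both tied edges. On real data most nodes have a single out-edge, which is what keeps the graph sparse.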

Building on DiWANN's concise representation, we examined various applications, including analysis of cancer's driver-mutation landscape through two complementary graph views. A reduced-data DiWANN sequence-similarity network links tumors with highly similar driver-gene profiles, highlighting cancer types whose samples cluster tightly and may therefore respond to common therapies. In parallel, a bipartite tumor-to-gene network (and its one-mode gene projection) shows which driver genes frequently co-occur and which remain exclusive to particular cancers. Together, these network perspectives uncover both pan-cancer and cancer-specific driver mutations, offering insight into tumor heterogeneity and potential therapeutic targets.
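As a rough illustration of the one-mode gene projection mentioned above, the sketch below treats the bipartite network as a mapping from each tumor to its set of mutated driver genes and counts how many tumors each gene pair shares. The function name `gene_projection`, the tumor identifiers, and the placeholder gene names are all hypothetical; the published analysis may weight or filter edges differently.

```python
from collections import Counter
from itertools import combinations


def gene_projection(tumor_genes):
    """Project a tumor-to-gene bipartite graph onto its gene side.

    The weight of edge (g1, g2) is the number of tumors in which both
    genes are mutated. Gene pairs are stored in sorted order.
    """
    weights = Counter()
    for genes in tumor_genes.values():
        for g1, g2 in combinations(sorted(set(genes)), 2):
            weights[(g1, g2)] += 1
    return weights


# Toy bipartite network with placeholder names (not real data).
toy_network = {
    "T1": {"GENE_A", "GENE_B"},
    "T2": {"GENE_A", "GENE_B", "GENE_C"},
    "T3": {"GENE_C"},
}
projection = gene_projection(toy_network)
```

Heavily weighted pairs in such a projection correspond to genes that frequently co-occur. A gene mutated only in tumors that carry no other driver gene, or only in tumors of one cancer type, contributes few or no cross-type edges.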


Papers

  • H. Catanese, K. Brayton and A.H. Gebremedhin. A Nearest-Neighbors Network Model for Sequence Data Reveals New Insight into Genotype Distribution of a Pathogen, BMC Bioinformatics (2018) 19:475. https://doi.org/10.1186/s12859-018-2453-2.
  • S. Patil, S. Roberts, A. Gebremedhin.
    Network Analysis of Driver Genes in Human Cancers, Frontiers in Bioinformatics 4 (2024).
  • S. Patil, H. Catanese, K. Brayton, E. Lofgren, A. H. Gebremedhin.
    Sequence Similarity Network Analysis Provides Insight into the Temporal and Geographical Distribution of Mutations in SARS-CoV-2 Spike Protein, Viruses 14, 1672 (2022).
  • H.N. Catanese, K.A. Brayton and A.H. Gebremedhin, RepeatAnalyzer: a tool for analysing and managing short-sequence repeat data, BMC Genomics 2016 17:422. DOI: 10.1186/s12864-016-2686-2
  • Z.T.H. Khumalo, H.N. Catanese, N. Leisching, P. Hove, N.E. Collins, M.E. Chaisi, A.H. Gebremedhin, M.C. Oosthuizen and K.A. Brayton, Characterization of Anaplasma marginale subspecies centrale using msp1aS genotyping reveals wildlife reservoir, Journal of Clinical Microbiology, 2016 54:10, 2503-2512
  • P. Hove, M.E. Chaisi, K.A. Brayton, H. Ganesan, H.N. Catanese, M.S. Mtshali, A.M. Mutshembele, M.C. Oosthuizen, and N.E. Collins, Co-infections with multiple genotypes of Anaplasma marginale in cattle indicate pathogen diversity, Parasites & Vectors (2018) 11:5
  • E. Khaledian, A. H. Gebremedhin, K. Brayton, S. Broschat.
    A Network Science Approach for Determining the Ancestral Phylum of Bacteria, ACM-BCB 2018.