Research
Abstracts
Title: Fast Algorithms for the Maximum Clique Problem on Massive Graphs with Applications to Overlapping Community Detection
Authors: B. Pattabiraman, M.M.A. Patwary, A.H. Gebremedhin, W.K. Liao, A. Choudhary
Status: Internet Mathematics, Vol 11, No 4-5, pp 421-448, 2015.
Abstract: The maximum clique problem is a well-known NP-hard problem with applications in data mining, network analysis, information retrieval and many other areas related to the World Wide Web. There exist several algorithms for the problem with acceptable runtimes for certain classes of graphs, but many of them are infeasible for massive graphs. We present a new exact algorithm that employs novel pruning techniques and is able to find maximum cliques in very large, sparse graphs quickly. Extensive experiments on different kinds of synthetic and real-world graphs show that our new algorithm can be orders of magnitude faster than existing algorithms. We also present a heuristic that runs orders of magnitude faster than the exact algorithm while providing optimal or near-optimal solutions. We illustrate a simple application of the algorithms in developing methods for detection of overlapping communities in networks.
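The heuristic companion to the exact algorithm is the easier half to illustrate. Below is a minimal greedy clique heuristic of the general kind the abstract alludes to (my own sketch; the paper's actual pruning rules and heuristic are more sophisticated):

```python
# Illustrative sketch (not the paper's exact algorithm): a greedy clique
# heuristic that repeatedly picks the candidate with the most neighbours
# inside the remaining candidate set.
def greedy_clique(adj):
    """adj: dict mapping vertex -> set of neighbours."""
    best = []
    for start in adj:                      # try each vertex as a seed
        clique = [start]
        candidates = set(adj[start])
        while candidates:
            v = max(candidates, key=lambda u: len(adj[u] & candidates))
            clique.append(v)
            candidates &= adj[v]           # keep only common neighbours
        if len(clique) > len(best):
            best = clique
    return best

# A 4-clique {0,1,2,3} with a pendant vertex 4
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
adj = {v: set() for v in range(5)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)
print(sorted(greedy_clique(adj)))  # → [0, 1, 2, 3]
```

Because every vertex is tried as a seed, this toy finds the optimum on small graphs; on massive graphs the pruning techniques the abstract describes are what make the search tractable.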
Title: Fast Maximum Clique Algorithms for Large Graphs
Authors: R.A. Rossi, D.F. Gleich, A.H. Gebremedhin, M.M.A. Patwary
Status: Proceedings of WWW2014.
Abstract: We propose a fast, parallel maximum clique algorithm for large sparse graphs that is designed to exploit characteristics of social and information networks. Despite clique’s status as an NP-hard problem with poor approximation guarantees, our method exhibits nearly linear runtime scaling over real-world networks ranging from 1000 to 100 million nodes. In a test on a social network with 1.8 billion edges, the algorithm finds the largest clique in about 20 minutes. Key to the efficiency of our algorithm are an initial heuristic procedure that finds a large clique quickly and a parallelized branch-and-bound strategy with aggressive pruning and ordering techniques. We use the algorithm to compute the largest temporal strong components of temporal contact networks.
Title: Fast Algorithms for the Maximum Clique Problem on Massive Sparse Graphs
Authors: B. Pattabiraman, M.M.A. Patwary, A.H. Gebremedhin, W.K. Liao, A. Choudhary
Status: WAW 2013: 10th Workshop on Algorithms and Models for the Web Graph, LNCS 8305, pp 156-169, 2013.
Abstract: The maximum clique problem is a well-known NP-hard problem with applications in data mining, network analysis, information retrieval and many other areas related to the World Wide Web. There exist several algorithms for the problem with acceptable runtimes for certain classes of graphs, but many of them are infeasible for massive graphs. We present a new exact algorithm that employs novel pruning techniques and is able to quickly find maximum cliques in large sparse graphs. Extensive experiments on different kinds of synthetic and real-world graphs show that our new algorithm can be orders of magnitude faster than existing algorithms. We also present a heuristic that runs orders of magnitude faster than the exact algorithm while providing optimal or near-optimal solutions.
Title: Parallel Maximum Clique Algorithms with Applications to Network Analysis
Authors: R.A. Rossi, D.F. Gleich, A.H. Gebremedhin
Status: SIAM Journal on Scientific Computing, Vol 37, Issue 5, pages C589-C618, 2015.
Abstract: We present a fast, parallel maximum clique algorithm for large sparse graphs that is designed to exploit characteristics of social and information networks. The method exhibits a roughly linear runtime scaling over real-world networks ranging from a thousand to a hundred million nodes. In a test on a social network with 1.8 billion edges, the algorithm finds the largest clique in about 20 minutes. At its heart the algorithm employs a branch-and-bound strategy with novel and aggressive pruning techniques. The pruning techniques include the combined use of core numbers of vertices along with a good initial heuristic solution to remove the vast majority of the search space. In addition, the exploration of the search tree is parallelized. During the search, processes immediately communicate changes to upper and lower bounds on the size of the maximum clique. This occasionally results in a super-linear speedup because tasks with large search spaces can be pruned by other processes. We demonstrate the impact of the algorithm on applications using two different network analysis problems: computation of temporal strong components in dynamic networks and determination of a compression-friendly ordering of nodes of massive networks.
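The core-number pruning mentioned above can be sketched in a few lines (a serial toy using simple peeling; the paper's parallel implementation is far more elaborate):

```python
# Sketch of the core-number pruning idea (my own minimal version): a vertex
# inside a k-clique must have core number >= k-1, so once a heuristic clique
# of size lb is known, every vertex with core number < lb-1 can be discarded.
def core_numbers(adj):
    deg = {v: len(adj[v]) for v in adj}
    core, remaining = {}, set(adj)
    while remaining:                      # O(n^2) peeling; real codes use bucket queues
        v = min(remaining, key=lambda u: deg[u])
        core[v] = deg[v]
        remaining.discard(v)
        for u in adj[v]:
            if u in remaining and deg[u] > deg[v]:
                deg[u] -= 1
    return core

def prune_by_core(adj, lb):
    core = core_numbers(adj)
    keep = {v for v in adj if core[v] >= lb - 1}
    return {v: adj[v] & keep for v in keep}

# Triangle {0,1,2} plus a pendant vertex 3; a heuristic clique of size 3
# lets us discard vertex 3 (core number 1 < 2).
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(sorted(prune_by_core(adj, lb=3)))  # → [0, 1, 2]
```

The better the initial heuristic clique, the larger the bound lb and the more of the graph this single pass removes before branch-and-bound begins.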
Title: ScaWL: Scaling k-WL (Weisfeiler-Leman) Algorithms in Memory and Performance on Shared and Distributed-Memory Systems
Authors: C. Soss, A. Rajam, J. Layne, E. Serra, M. Halappanavar, A. Gebremedhin
Status: ACM Transactions on Architecture and Code Optimization (TACO), accepted 2024.
Abstract: ScaWL introduces a storage-conscious reformulation of the k-dimensional Weisfeiler-Leman (k-WL) color-refinement procedure together with a hybrid shared/distributed-memory execution engine. By compressing intermediate color maps and pruning redundant refinement steps, ScaWL lowers peak memory demand by up to 40 × compared with state-of-the-art baselines while preserving exact k-WL expressiveness. Experiments on real and synthetic graphs with up to 1 billion edges show near-linear strong scaling to 1,024 CPU cores and order-of-magnitude speed-ups over prior distributed k-WL implementations, enabling practical use of high-order WL features in graph kernels, isomorphism testing, and GNN pre-training.
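For intuition, here is plain 1-WL (k = 1) colour refinement, the procedure that k-WL generalises; ScaWL's actual contributions (colour-map compression, pruning, parallel k > 1 execution) are not reproduced in this toy:

```python
# Minimal 1-WL colour refinement: repeatedly recolour each vertex by the
# multiset of its neighbours' colours until the partition stabilises.
def wl_refine(adj, rounds=10):
    color = {v: 0 for v in adj}            # start with a uniform colouring
    for _ in range(rounds):
        sig = {v: (color[v], tuple(sorted(color[u] for u in adj[v])))
               for v in adj}
        # compress each (own colour, neighbour-colour multiset) signature
        palette = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        new = {v: palette[sig[v]] for v in adj}
        if new == color:                   # stable partition reached
            break
        color = new
    return color

# A path 0-1-2: the endpoints are indistinguishable, the middle is not.
adj = {0: {1}, 1: {0, 2}, 2: {1}}
c = wl_refine(adj)
print(c[0] == c[2], c[0] != c[1])  # → True True
```

The intermediate `sig` dictionary is exactly the state whose memory footprint explodes for k-WL on large graphs, which is the cost ScaWL's storage-conscious reformulation targets.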
Title: Direction-Optimizing Label Propagation Framework for Structure Detection in Graphs: Design, Implementation, and Experimental Analysis
Authors: X. Liu, M. Halappanavar, K. Barker, A. Lumsdaine, A. H. Gebremedhin
Status: ACM Journal of Experimental Algorithmics 27 (1.12), 1–31 (2022).
Abstract: This article unifies three core structure-detection problems—weakly connected components, biconnected components, and community detection—under a single “direction-optimizing” label-propagation (DOLP) paradigm. The framework dynamically chooses push or pull traversal per super-step, reducing redundant edge visits in sparse regions while saturating bandwidth on dense subgraphs. A detailed performance model explains when the direction switch pays off, and an open-source C++/MPI implementation achieves up to 2.3 × speed-ups over traditional label propagation on real-world graphs with up to 38 billion edges, all while cutting DRAM traffic by 45 %.
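The direction-switching idea can be illustrated with a toy serial label-propagation kernel for connected components (the threshold and structure here are my assumptions, not the framework's tuned policy):

```python
# Toy serial rendition of the direction switch (threshold is illustrative):
# each super-step propagates minimum labels, pushing from the active
# frontier when it is small and pulling at every vertex otherwise.
def dolp_components(adj, switch_ratio=0.7):
    label = {v: v for v in adj}
    frontier = set(adj)
    while frontier:
        nxt = set()
        if len(frontier) < switch_ratio * len(adj):
            for v in frontier:                     # push: scatter from frontier
                for u in adj[v]:
                    if label[v] < label[u]:
                        label[u] = label[v]
                        nxt.add(u)
        else:
            for v in adj:                          # pull: every vertex gathers
                m = min((label[u] for u in adj[v]), default=label[v])
                if m < label[v]:
                    label[v] = m
                    nxt.add(v)
        frontier = nxt
    return label

# Two components: {0, 1, 2} and {3, 4}; labels converge to component minima.
adj = {0: {1}, 1: {0, 2}, 2: {1}, 3: {4}, 4: {3}}
lab = dolp_components(adj)
print(lab)  # → {0: 0, 1: 0, 2: 0, 3: 3, 4: 3}
```

The performance model in the article answers precisely the question this toy hard-codes: at what frontier size the push direction becomes cheaper than the pull direction.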
Title: Direction-Optimizing Label Propagation and Its Application to Community Detection
Authors: X. Liu, M. Halappanavar, K. Barker, A. Lumsdaine, A. H. Gebremedhin
Status: Proceedings of Computing Frontiers (CF 2020)
Abstract: Extending the DOLP philosophy to modularity-based community detection, this work couples direction-aware traversal with a modularity-gain tie-breaker to accelerate convergence on scale-free networks. A mixed OpenMP/CUDA prototype shows 6–10 × speed-ups over Louvain and Infomap on billion-edge social graphs, while delivering comparable modularity scores. The study isolates load imbalance and label oscillation as the principal bottlenecks in GPU implementations and proposes heuristics that keep them under one iteration per level.
Title: Distributed Direction-Optimizing Label Propagation for Community Detection
Authors: X. Liu, J. Firos, M. Zalewski, M. Halappanavar, K. Barker, A. Lumsdaine, A. H. Gebremedhin
Status: IEEE High Performance Extreme Computing Conference (HPEC 2019) – Graph Challenge Innovation Award.
Abstract: This paper generalizes direction-optimizing label propagation to distributed-memory clusters, introducing ghost-vertex caching and asynchronous halo exchanges that overlap communication with computation. Evaluated on a 1.5-trillion-edge synthetic scale graph, the algorithm scales to 8,192 Summit nodes with 83 % parallel efficiency and halves time-to-solution versus the best prior Graph Challenge entry. The innovation demonstrates that near-interactive community detection on web-scale data is feasible on leadership-class systems.
Title: Multi-metric Graph Query Performance Prediction
Authors: K. Sasani, M. H. Namaki, Y. Wu, A. H. Gebremedhin
Status: Proceedings of DASFAA 2018 (LNCS 10827), pp. 289–306
Abstract: The study proposes a machine-learning framework that predicts graph query cost along three axes—runtime, memory footprint, and answer quality—using structural statistics extracted from both the query and the data graph. By formulating the task as a multi-label regression problem and training gradient-boosted decision trees on 120,000 query/data pairs, the method attains R² > 0.9 for reachability queries and R² > 0.85 for pattern-matching queries across eight real networks. Leveraging the predictions, a workload-aware optimizer reorders mixed query batches and cuts overall execution time by 54 % on average.
Title: Network Similarity Prediction in Time-Evolving Graphs: A Machine-Learning Approach
Authors: K. Sasani, M. H. Namaki, A. H. Gebremedhin
Status: Workshop on the Intersection of Graph Algorithms and Machine Learning (GraML 2018), at IPDPSW 2018.
Abstract: Temporal graphs often demand fast estimation of how similar two graph snapshots will become in the near future. This work learns to forecast similarity scores (Δ-SimRank and graph-edit distance) by combining structural motifs, temporal centrality trends, and spectral signatures as features for a recurrent gradient-boosting model. On five evolving social and communication networks, the predictor anticipates similarity one step ahead with a mean absolute error 38 % lower than dynamic-embedding baselines, enabling early detection of sudden structural shifts such as event-driven bursts and community mergers.
Title: Performance Prediction for Graph Queries
Authors: M.H. Namaki, K. Sasani, Y. Wu and A.H. Gebremedhin
Status: ACM SIGMOD International Conference on Management of Data Workshop on Network Data Analytics (NDA 2017)
Abstract: Query performance prediction has shown benefits to query optimization and resource allocation for relational databases. Emerging applications are leading to search scenarios where workloads with heterogeneous, structure-less analytical queries are processed over large-scale graph and network data. This calls for effective models to predict the performance of graph analytical queries, which are often more involved than their relational counterparts. In this paper, we study and evaluate predictive techniques for graph query performance prediction. We make several contributions. (1) We propose a general learning framework that makes use of practical and computationally efficient statistics from query scenarios and employs regression models. (2) We instantiate the framework with two routinely issued query classes, namely, reachability and graph pattern matching, that exhibit different query complexity. We develop modeling and learning algorithms for both query classes. (3) We show that our prediction models readily apply to resource-bounded querying, by providing a learning-based workload optimization strategy. Given a query workload and a time bound, the models select queries to be processed with a maximized query profit and a total cost within the bound. Using real-world graphs, we experimentally demonstrate the efficacy of our framework in terms of accuracy and the effectiveness of workload optimization.
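The framework's pipeline, contributions (1) and (3) in miniature, can be caricatured as fitting a cheap regressor from a query statistic to runtime and then using the predictions for time-bounded workload selection (the feature, numbers, and model below are invented for illustration):

```python
# Toy instance of the prediction-then-optimization idea: fit a least-squares
# line from one cheap query statistic (pattern size) to observed runtime,
# then admit queries greedily while the predicted total fits a time budget.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

# training pairs: (pattern size, measured runtime in ms)
sizes, runtimes = [2, 3, 4, 5], [10.0, 15.0, 20.0, 25.0]
slope, intercept = fit_line(sizes, runtimes)
predict = lambda s: slope * s + intercept

# resource-bounded querying: admit queries while predicted time fits budget
budget, total, admitted = 40.0, 0.0, []
for q, size in [("q1", 2), ("q2", 5), ("q3", 3)]:
    t = predict(size)
    if total + t <= budget:
        total += t
        admitted.append(q)
print(admitted)  # → ['q1', 'q2']
```

The paper's models use regression over richer query and data-graph statistics and maximize query profit rather than this simple greedy admission, but the shape of the workflow is the same.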
Title: Cast to Vote: A Socio-technical Network Analysis of an Election Smartphone Application
Authors: J. Stachofsky, A. H. Gebremedhin, R. Crossler
Status: Digital Government: Research and Practice 3 (1), Article 3, pp. 1-17 (2022).
Abstract: This article builds a socio-technical dataset and associated network and provides exploratory analysis of a GitHub repository for a smartphone application used by the Elizabeth Warren campaign during the 2020 Iowa Democratic Caucuses. It provides insights into the types of technology used by campaigns for tracking primary delegates and the social actors that have potential influence over those technologies. Results suggest that certain nodes in the network have more influence over the election technology and that security risks can manifest through the unique interaction between technical and social network components as supply-chain attacks. The work sheds light on implications of how election technology is audited and designed—especially for potential public-facing voting technology—and it proposes a metric for evaluating socio-technical vulnerability.
Title: High-Order Line Graphs of Non-Uniform Hypergraphs: Algorithms, Applications, and Experimental Analysis
Authors: X. T. Liu, J. Firoz, S. Aksoy, I. Amburg, A. Lumsdaine, C. Joslyn, B. Praggastis, A. H. Gebremedhin
Status: Proc. IEEE International Parallel and Distributed Processing Symposium (IPDPS 2022), pp. 784-794.
Abstract: Hypergraphs offer flexible and robust data representations for many applications, but methods that work directly on hypergraphs are often prohibitively expensive. Existing practice expands a hypergraph into a graph—via clique- or line-graph constructions—and then runs standard graph analytics, incurring massive space and time costs. We present efficient parallel algorithms that accelerate and shrink the memory footprint of higher-order graph expansions, focusing on hyperedge-based s-line graphs while remaining applicable to higher-order clique expansions. This is the first framework to enable hypergraph spectral analysis of large datasets on a single shared-memory machine. Our algorithms allow analyses previously out of reach, delivering orders-of-magnitude speed-ups over sparse-matrix multiplication baselines and achieving 2–31 × speed-ups over the best prior heuristic for s-line construction.
Title: NWHy: A Framework for Hypergraph Analytics — Representations, Data Structures, and Algorithms
Authors: X. T. Liu, J. Firoz, A. H. Gebremedhin, A. Lumsdaine
Status: IEEE IPDPS Workshops 2022, pp. 275-284.
Abstract: NWHypergraph (NWHy) is a high-performance C++ framework for both exact and approximate hypergraph analytics. It supports multiple hypergraph representations—including bipartite, clique-expansion, s-line, and a novel “adjoin-graph” form—and provides parallel data structures and algorithms for each. The library demonstrates near-state-of-the-art scalability across representations and introduces two queue-based algorithms for s-line graph construction that showcase work-queue design principles. These queue-based kernels match the performance of non-queue methods on bipartite graphs while extending naturally to more irregular structures, making NWHy a versatile foundation for large-scale hypergraph analysis.
Title: Parallel Algorithms for Efficient Computation of High-Order Line Graphs of Hypergraphs
Authors: X. T. Liu, J. Firoz, A. Lumsdaine, C. Joslyn, S. Aksoy, B. Praggastis, A. H. Gebremedhin
Status: Proc. IEEE HiPC 2021, pp. 312-321 (Best-Paper Nominee).
Abstract: We investigate mathematical models for multi-way (non-dyadic) interactions represented as hypergraphs, focusing on the computation of s-overlap line graphs that capture shared-vertex strength between hyperedges. Treating s-overlap as a kernel, we develop parallel algorithms with heuristics that bypass redundant work and exploit data locality. The resulting implementations build line graphs orders of magnitude faster than naïve pairwise approaches and more than 10 × faster than SpGEMM-with-filtration baselines, especially for large s. These gains enable downstream graph-analytics pipelines to leverage highly tuned graph algorithms on line-graph projections to analyse hypergraph connectivity in previously intractable datasets.
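The s-overlap kernel itself is simple to state; the naïve quadratic construction below is essentially the pairwise baseline the paper's parallel algorithms are designed to beat:

```python
# Minimal reference construction of an s-line graph: hyperedges become
# nodes, connected whenever they share at least s vertices. This does the
# redundant pairwise work that the paper's heuristics bypass.
from itertools import combinations

def s_line_graph(hyperedges, s):
    nodes = list(hyperedges)
    return [(a, b) for a, b in combinations(nodes, 2)
            if len(hyperedges[a] & hyperedges[b]) >= s]

H = {"e1": {1, 2, 3}, "e2": {2, 3, 4}, "e3": {4, 5}}
print(s_line_graph(H, s=1))  # → [('e1', 'e2'), ('e2', 'e3')]
print(s_line_graph(H, s=2))  # → [('e1', 'e2')]
```

Raising s keeps only strongly overlapping hyperedge pairs, which is why large s both sparsifies the projection and, as the abstract notes, magnifies the speed-ups of the specialized algorithms.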
Title: Exploring MPI Communication Models for Graph Applications Using Graph Matching as a Case Study
Authors: S. Ghosh, M. Halappanavar, A. Kalyanaraman, A. Khan, A. H. Gebremedhin
Status: Proc. IEEE International Parallel and Distributed Processing Symposium (IPDPS 2019), pp. 761-770.
Abstract: Traditional distributed-memory graph codes rely on MPI point-to-point Send-Recv, but newer MPI paradigms—Remote Memory Access (RMA) and topology-aware neighborhood collectives—promise better support for the irregular traffic patterns of graph analytics. Using distributed half-approximate maximum-weight graph matching as a case study, we compare implementations based on MPI-3 RMA and neighborhood collectives against a non-blocking Send-Recv baseline. On thousands of NERSC Cori cores, the RMA and neighborhood-collective versions achieve up to 6 × runtime speed-ups while also cutting energy and memory costs, demonstrating that modern MPI communication models can substantially improve large-scale irregular graph workloads.
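The matching kernel used as the case study has a classic serial half-approximation, sketched below; the paper's contribution lies in the distributed MPI communication layer, not in this kernel:

```python
# Greedy matching by decreasing weight: a textbook 1/2-approximation to
# maximum-weight matching, shown serially for intuition.
def half_approx_matching(weighted_edges):
    matched, matching = set(), []
    for u, v, w in sorted(weighted_edges, key=lambda e: -e[2]):
        if u not in matched and v not in matched:
            matching.append((u, v, w))
            matched.update((u, v))
    return matching

# Square graph where greedy picks the heavy edge first
edges = [(0, 1, 5.0), (1, 2, 6.0), (2, 3, 5.0), (3, 0, 1.0)]
m = half_approx_matching(edges)
print(m)  # → [(1, 2, 6.0), (3, 0, 1.0)]
```

Here greedy achieves weight 7 against the optimum 10 (edges (0,1) and (2,3)), within the 1/2 guarantee; distributed variants replace the global sort with local comparisons, and it is the resulting irregular message traffic that the MPI models in the paper compete on.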
Title: Lucid Dreaming for Experience Replay: Refreshing Past States with the Current Policy
Authors: Y. Du, G. Warnell, A. Gebremedhin, P. Stone, M. Taylor
Status: Neural Computing and Applications (2021)
Abstract: Experience replay (ER) improves the data efficiency of off-policy reinforcement-learning (RL) algorithms by allowing an agent to store and reuse its past experiences in a replay buffer. While many techniques have been proposed to enhance ER by biasing how experiences are sampled from the buffer, thus far they have not considered strategies for refreshing experiences inside the buffer. In this work, we introduce Lucid Dreaming for Experience Replay (LiDER), a conceptually new framework that allows replay experiences to be refreshed by leveraging the agent’s current policy. LiDER consists of three steps: first, LiDER moves an agent back to a past state; second, from that state, LiDER lets the agent execute a sequence of actions by following its current policy, as if the agent were “dreaming” about the past and trying out different behaviors to encounter new experiences in the dream; third, LiDER stores and reuses the new experience if it turned out better than what the agent previously experienced, i.e., to refresh its memories. LiDER is designed to be easily incorporated into off-policy, multi-worker RL algorithms that use ER; we present a case study of applying LiDER to an actor-critic-based algorithm. Results show LiDER consistently improves performance over the baseline in six Atari 2600 games.
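The three steps can be caricatured without any RL machinery (every name and stand-in below is mine, not the paper's implementation):

```python
# Environment-free caricature of LiDER's refresh loop over a replay buffer
# of (state, action sequence, return) tuples.
def lider_refresh(buffer, reset, rollout, policy):
    for i, (state, actions, ret) in enumerate(buffer):
        reset(state)                                   # step 1: return to a past state
        new_actions, new_ret = rollout(state, policy)  # step 2: "dream" with current policy
        if new_ret > ret:                              # step 3: refresh only on improvement
            buffer[i] = (state, new_actions, new_ret)
    return buffer

def rollout(state, policy):
    # toy "environment": the return is just the sum of the chosen actions
    seq = [policy(state) for _ in range(3)]
    return seq, float(sum(seq))

reset = lambda s: None            # no real simulator to reset in this toy
policy = lambda s: 2              # current policy beats the old data for s0 only
buffer = [("s0", [1, 1, 1], 3.0), ("s1", [3, 3, 3], 9.0)]
print(lider_refresh(buffer, reset, rollout, policy))
# → [('s0', [2, 2, 2], 6.0), ('s1', [3, 3, 3], 9.0)]
```

The keep-only-if-better rule in step 3 is what distinguishes refreshing from ordinary re-collection: stale experiences that the current policy cannot improve on are left untouched.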
Title: Algorithmic Accountability in Small Data: Sample-Size-Induced Bias Within Classification Metrics
Authors: J. Briscoe, G. Kepler, D. Deford, A. Gebremedhin
Status: AISTATS 2025 (Proceedings of the 28th International Conference on Artificial Intelligence and Statistics)
Abstract: Evaluating machine learning models is crucial not only for determining their technical accuracy but also for assessing their potential societal implications. While the potential for low-sample-size bias in algorithms is well known, we demonstrate the significance of sample-size bias induced by combinatorics in classification metrics. This revelation challenges the efficacy of these metrics in assessing bias with high resolution, especially when comparing groups of disparate sizes, which frequently arise in social applications. We provide analyses of the bias that appears in several commonly applied metrics and propose a model-agnostic assessment and correction technique. Additionally, we analyze counts of undefined cases in metric calculations, which can lead to misleading evaluations if improperly handled. This work illuminates the previously unrecognized challenge of combinatorics and probability in standard evaluation practices and thereby advances approaches for fair and trustworthy classification.
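The combinatorial granularity at issue is easy to see concretely (my example, not the paper's analysis): with n positive-class samples, recall can take only n + 1 distinct values.

```python
# With n positive-class samples, recall = TP/n is restricted to n+1 values,
# so a small group's metric cannot resolve fine differences.
from fractions import Fraction

def achievable_recalls(n_positives):
    return [Fraction(tp, n_positives) for tp in range(n_positives + 1)]

small, large = achievable_recalls(4), achievable_recalls(100)
print(len(small), len(large))                    # → 5 101
# adjacent achievable values differ by 1/n, the metric's resolution:
print(small[1] - small[0], large[1] - large[0])  # → 1/4 1/100
```

Comparing a group whose recall moves in steps of 1/4 against one whose recall moves in steps of 1/100 builds a size-dependent bias into the comparison itself, which is the effect the paper quantifies and corrects.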
Title: Facets of Disparate Impact: Evaluating Legally Consistent Bias in Machine Learning
Authors: J. Briscoe, A. Gebremedhin
Status: CIKM ’24 (ACM International Conference on Information and Knowledge Management)
Abstract: Leveraging current legal standards, we define bias through the lens of marginal benefits and objective testing with the novel metric “Objective Fairness Index”. This index combines the contextual nuances of objective testing with metric stability, providing a legally consistent and reliable measure. Utilizing the Objective Fairness Index, we provide fresh insights into sensitive machine learning applications, such as COMPAS (recidivism prediction), highlighting the metric’s practical and theoretical significance. The Objective Fairness Index allows one to differentiate between discriminatory tests and systemic disparities.
Title: Reducing Sample Selection Bias in Clinical Data through Generation of Multi-Objective Synthetic Data
Authors: J. Briscoe, C. DeSmet, K. Wuestney, A. Gebremedhin, R. Fritz, D. J. Cook
Status: ICBES 2024 (11th International Conference on Biomedical Engineering and Systems)
Abstract: In the era of data-driven healthcare, identifying, quantifying, and mitigating bias in machine learning is of paramount importance. The impact of fair machine learning is particularly significant when predictions are applied in a clinical setting, where biased predictions can lead to unequal healthcare outcomes. In this paper, we consider the area of biomedical informatics, examine existing bias metrics, and introduce a new metric to analyze bias in a smart home dataset. We investigate bias that may occur along sensitive attributes and examine its impact on the machine learning task of activity recognition from the collected data. In a novel approach to bias mitigation, we introduce a multi-objective generative adversarial network that creates synthetic data to mitigate sample bias by enhancing data diversity. We validate these methods using data collected for older adults living in smart homes who are managing multiple chronic health conditions, highlighting the potential of our approach to improve health predictions and outcomes.
Title: Exploring Geriatric Clinical Data and Mitigating Bias with Multi-Objective Synthetic Data Generation for Equitable Health Predictions
Authors: J. Briscoe, C. DeSmet, K. Wuestney, A. Gebremedhin, R. Fritz, D. J. Cook
Status: Journal of Biomedical Engineering and Biosciences (2024)
Abstract: In the data-centric landscape of modern healthcare, addressing bias in machine-learning models is crucial for ensuring equitable health outcomes. When applied in clinical settings, biased predictions can exacerbate disparities in healthcare. This paper focuses on the domain of biomedical informatics and the challenge of mitigating bias in smart-home datasets used for health monitoring. We assess existing bias metrics and a new metric, the Objective Fairness Index (OFI), to quantify bias related to sensitive attributes. To address these biases, we propose a novel method using a multi-objective generative adversarial network (GAN) that generates diverse synthetic data to improve data representation. This approach, validated on data from older adults managing chronic health conditions, demonstrates the potential to enhance both prediction accuracy and fairness in health outcomes.
Title: An Adaptive Machine Learning Framework for Behind-the-Meter Load/PV Disaggregation
Authors: R. Saeedi, K. S. Sajan, K. Davies, A. Srivastava, A. H. Gebremedhin
Status: IEEE Transactions on Industrial Informatics, 17 (10): 7060–7069 (2021)
Abstract: A significant amount of distributed photovoltaic (PV) generation is invisible to distribution-system operators because it is behind the meter on customer premises and not directly monitored by the utility. The resulting unknown, varying negative demand introduces uncertainty that impacts system reliability and operational cost. This paper proposes an adaptive machine-learning framework to disaggregate PV generation and load from net measurements. Measurements from smart transformers, smart meters, other sensors, and weather stations are transformed so that (a) temporal dependencies are captured effectively and (b) dimensionality is reduced without sacrificing accuracy. Linear regression, decision-tree, random-forest, and multilayer-perceptron models are trained on the transformed data. Several split scenarios—including 90/10, one-month-out, one-season-out, and panel-independent—demonstrate that the framework can estimate PV generation with high accuracy; random forest yields the best overall performance.
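A stripped-down caricature of the disaggregation setting (toy numbers and a single feature; the paper uses far richer features and stronger models such as random forests):

```python
# Learn PV output from an exogenous signal (here, irradiance), then recover
# load from the net reading: net = load - PV, so load = net + estimated PV.
def fit_pv_model(irradiance, pv):
    # PV output is roughly proportional to irradiance; fit the ratio by
    # least squares through the origin
    k = sum(p * i for p, i in zip(pv, irradiance)) / sum(i * i for i in irradiance)
    return lambda i: k * i

model = fit_pv_model([0.0, 0.5, 1.0], [0.0, 2.0, 4.0])  # training history
net, irr = 1.0, 0.8           # new net-meter reading and irradiance
pv_est = model(irr)
load_est = net + pv_est       # PV appears as negative demand in the net reading
print(round(pv_est, 2), round(load_est, 2))  # → 3.2 4.2
```

The point of the example is the structure of the problem: only the net value is metered, so any disaggregation must lean on side information (weather, time of day) exactly as the framework's transformed features do.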
Title: Adversarial Creation of a Smart Home Testbed for Novelty Detection
Authors: J. Briscoe, A. Gebremedhin, L. Holder, D. Cook
Status: AAAI Spring Symposium Series 2022
Abstract: This paper introduces SHGAN, an adversarially trained system for creating smart home testbeds for novelty detection. We design a unique structure to model the complexities of smart home data and generate an arbitrary amount of realistic sensor readings. We validate the approach on data collected from real-world smart homes and discuss methods for using this testbed to detect and handle novel smart home automated tasks.
Title: Denoising Diffusion Implicit Models for Generating Cyber Defense Network Traffic
Authors: J. Halvorsen, Y. Yan, A. Gebremedhin
Status: IEEE International Conference on Communications (ICC), 2025
Abstract: This paper introduces Denoising Diffusion Implicit Models (DDIMs) to generate synthetic network traffic tailored for cyber defense applications. The approach aims to address the scarcity of high-quality, labeled datasets in cybersecurity by producing realistic traffic patterns that can be used to train and evaluate intrusion detection systems. The methodology leverages the strengths of diffusion models in capturing complex data distributions, ensuring the generated traffic maintains statistical properties akin to real-world scenarios.
Title: Applying Generative Machine Learning to Intrusion Detection: A Systematic Mapping Study and Review
Authors: J. Halvorsen, C. Izurieta, H. Cai, A. Gebremedhin
Status: ACM Computing Surveys, Volume 56, Issue 10, Article No. 257, 2024
Abstract: Intrusion Detection Systems (IDSs) are vital for modern cyber defense, alerting users to cyber-attacks. Machine learning enhances IDS capabilities but faces challenges like limited quality training data and high false-positive rates. Generative Machine Learning Models (GMLMs) offer solutions by generating synthetic data to improve training. This article presents a comprehensive exploration of GMLMs in intrusion detection, including a systematic mapping study of existing research and a detailed review providing insights and future research directions.
Title: Generative Machine Learning for Cyber Security
Authors: J. Halvorsen, A. Gebremedhin
Status: Military Cyber Affairs, Vol. 7, Iss. 1, Article 4, 2024
Abstract: The future of cybersecurity stands to benefit from continued research on generative models. Automated approaches to cybersecurity based on machine learning are necessary to combat emerging cyber threats. However, current machine learning tools face challenges like data availability and high false-positive rates. Generative models can address these issues by creating high-quality synthetic data for training and testing. Some generative architectures are multipurpose and, when used for tasks like intrusion detection, can outperform existing classifier models. This paper discusses the potential of generative models in enhancing cybersecurity.
Title: A Critical Review of Cybersecurity Education in the United States
Authors: J. Crabb, C. Hundhausen, A. Gebremedhin
Status: ACM Technical Symposium on Computer Science Education (SIGCSE), 2024
Abstract: This work examines the state-of-the-art of cybersecurity education in the United States by analyzing data from two sources: Programs of Study at Centers of Academic Excellence in Cybersecurity (CAE-C) designated by the NSA and peer-reviewed research in cybersecurity education over the past decade. The study identifies trends, gaps, and proposes improvements to align curricula with industry needs, ensuring the graduation of work-ready cybersecurity specialists.
Title: Cybersecurity Education: Insights From A Novel Cybersecurity Summer Workshop
Authors: J. Crabb, C. Izurieta, B. Van Wie, O. Adesope, A. Gebremedhin
Status: IEEE Security & Privacy, Vol. 22, pp. 89–98, Nov–Dec 2024
Abstract: This article shares experiences and lessons learned from a multifaceted two-week cybersecurity summer workshop designed to enhance cybersecurity education. The workshop included seminars, mentored research, certificate programs, internships, and support for student club activities. The goal was to provide a comprehensive educational experience to prepare students for careers in cybersecurity.
Title: Cybersecurity Education and Research: Experiences in Training the Next Generation of Cyber Professionals
Authors: J. Crabb, A. Gebremedhin
Status: CYBER Magazine, MCPA, May 2024
Abstract: Cybersecurity continues to grow in importance as a national issue, impacting privacy, public safety, and national defense. This article discusses the need to educate and train the U.S. cyber workforce to the highest standards in both civilian and military sectors. It highlights government initiatives like the National Initiative for Cybersecurity Education (NICE) and the Centers of Academic Excellence in Cybersecurity (CAE-C), aiming to standardize cybersecurity education and training across the country.
Title: CPSyNet: A Tool for Generating Customized Cyber-Power Synthetic Network for Distribution System with Distributed Energy Resources
Authors: L. Wang, J. Halvorsen, S. Pannala, A. Srivastava, A. Gebremedhin, N. Schulz
Status: IET Smart Grid, Vol. 5, No. 6, pp. 463–477, 2022
Abstract: The integration of distributed energy resources and advancements in information technology have transformed traditional power distribution systems into active cyber-physical systems. However, existing publicly available distribution test feeders are limited and lack features like cyber models and customization. This paper introduces CPSyNet, a tool developed to generate customizable cyber-physical synthetic distribution test feeders. CPSyNet creates three-phase unbalanced test feeders based on user requirements, considering both cyber and physical aspects of the network for comprehensive analysis.
Title: A nearest-neighbors network model for sequence data reveals new insight into genotype distribution of a pathogen
Authors: H.N. Catanese, K.A. Brayton, A.H. Gebremedhin
Status: BMC Bioinformatics 2018.
Abstract: Background. Sequence similarity networks are useful for classifying and characterizing biologically important proteins. Threshold-based approaches to similarity network construction using exact distance measures are prohibitively slow to compute and rely on the difficult task of selecting an appropriate threshold, while similarity networks based on approximate distance calculations compromise useful structural information.
Results. We present an alternative network representation for a set of sequence data that overcomes these drawbacks. In our model, called the Directed Weighted All Nearest Neighbors (DiWANN) network, each sequence is represented by a node and is connected via a directed edge to only the closest sequence, or sequences in the case of ties, in the dataset. Our contributions span several aspects. Specifically, we: (i) Apply an all nearest neighbors network model to protein sequence data from three different applications and examine the structural properties of the networks; (ii) Compare the model against threshold-based networks to validate their semantic equivalence, and demonstrate the relative advantages the model offers; (iii) Demonstrate the model’s resilience to missing sequences; and (iv) Develop an efficient algorithm for constructing a DiWANN network from a set of sequences. We find that the DiWANN network representation attains similar semantic properties to threshold-based graphs, while avoiding weaknesses of both high and low threshold graphs. Additionally, we find that approximate distance networks, using BLAST bitscores in place of exact edit distances, can cause significant loss of structural information. We show that the proposed DiWANN network construction algorithm provides a fourfold speedup over a standard threshold-based approach to network construction. We also identify a relationship between the centrality of a sequence in a similarity network of an Anaplasma marginale short sequence repeat dataset and how broadly that sequence is dispersed geographically.
Conclusion. We demonstrate that using approximate distance measures to rapidly construct similarity networks may lead to significant deficiencies in the structure of that network in terms of centrality and clustering analyses. We present a new network representation that maintains the structural semantics of threshold-based networks while increasing connectedness, and an algorithm for constructing the network using exact distance measures in a fraction of the time it would take to build a threshold-based equivalent.
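The core DiWANN construction can be sketched in a few lines (an illustrative brute-force reimplementation, not the authors' optimized algorithm, which avoids computing all pairwise distances): each sequence receives a directed, weighted edge to its closest other sequence(s) under exact edit distance, with ties producing multiple out-edges.

```python
def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance, rolling single row.
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,       # deletion
                                     dp[j - 1] + 1,   # insertion
                                     prev + (ca != cb))  # substitution/match
    return dp[-1]

def diwann(sequences):
    # Directed Weighted All Nearest Neighbors: each sequence points to its
    # closest other sequence(s); ties yield multiple out-edges.
    edges = {}
    for s in sequences:
        dists = {t: edit_distance(s, t) for t in sequences if t != s}
        d_min = min(dists.values())
        edges[s] = {t: d for t, d in dists.items() if d == d_min}
    return edges
```

For example, `diwann(["AAA", "AAT", "GGG"])` links "AAA" and "AAT" to each other at distance 1, while "GGG" (equidistant from both) gets two out-edges at distance 3.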
Title: Network Analysis of Driver Genes in Human Cancers
Authors: S. Patil, S. Roberts, A. Gebremedhin
Status: Frontiers in Bioinformatics, 4 (2024)
Abstract:
Identifying driver genes is critical to understanding cancer progression and enabling targeted therapies. Driver genes are those that directly contribute to cancer development by conferring growth advantage to cells through mutations. We perform a network-based analysis of driver genes across multiple cancer types using data from The Cancer Genome Atlas (TCGA). Gene co-occurrence and mutual exclusivity relationships are extracted and represented as undirected graphs, with nodes denoting driver genes and edges denoting statistically significant associations. Using structural network measures such as degree centrality, clustering coefficient, and community structure, we identify key differences and commonalities across cancers. We observe that certain genes act as hubs in multiple cancers, suggesting their potential as pan-cancer therapeutic targets. Community detection reveals modules enriched in functionally similar or co-mutated genes. Our analysis demonstrates the utility of network science in uncovering complex patterns in cancer genomics and provides insight into the topological roles of driver genes.
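The structural measures named above can be computed directly from an adjacency representation; a minimal sketch follows (the gene names are placeholders, and in practice a library such as NetworkX would be used):

```python
def degree_centrality(adj):
    # Fraction of the other nodes each node is connected to.
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def clustering_coefficient(adj, v):
    # Fraction of pairs of v's neighbors that are themselves connected.
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2 * links / (k * (k - 1))

# Toy undirected graph of mutation associations (illustrative only).
adj = {"TP53": {"KRAS", "PIK3CA", "EGFR"},
       "KRAS": {"TP53", "PIK3CA"},
       "PIK3CA": {"TP53", "KRAS"},
       "EGFR": {"TP53"}}
```

Here "TP53" plays the hub role: it has maximal degree centrality, while its clustering coefficient of 1/3 reflects that only one of its three neighbor pairs is connected.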
Title: Sequence Similarity Network Analysis Provides Insight into the Temporal and Geographical Distribution of Mutations in SARS-CoV-2 Spike Protein
Authors: S. Patil, H.N. Catanese, K.A. Brayton, E. Lofgren, A.H. Gebremedhin
Status: Viruses, 14, 1672 (2022)
Abstract:
Severe acute respiratory syndrome-related coronavirus (SARS-CoV-2), which still infects hundreds of thousands of people globally each day despite various countermeasures, has been mutating rapidly. Mutations in the spike (S) protein seem to play a vital role in viral stability, transmission, and adaptability. Therefore, to control the spread of the virus, it is important to gain insight into the evolution and transmission of the S protein. This study deals with the temporal and geographical distribution of mutant S proteins from sequences gathered across the US over a period of 19 months in 2020 and 2021. The S protein sequences are studied using two approaches: (i) multiple sequence alignment is used to identify prominent mutations and highly mutable regions and (ii) sequence similarity networks are subsequently employed to gain further insight and study mutation profiles of concerning variants across the defined time periods and states. Additionally, we tracked the variants using visualizations on geographical maps. The visualizations produced using the Directed Weighted All Nearest Neighbors (DiWANN) networks and maps provided insights into the transmission of the virus that reflect well the statistics reported for the time periods studied. We found that the networks created using DiWANN are superior to commonly used approximate distance networks created using BLAST bitscores. The study offers a richer computational approach to analyze the transmission profile of the prominent S protein mutations in SARS-CoV-2 and can be extended to other proteins and viruses.
Title: RepeatAnalyzer: a tool for analysing and managing short-sequence repeat data
Authors: H.N. Catanese, K.A. Brayton, A.H. Gebremedhin
Status: BMC Genomics 2016 17:422.
Abstract: Background. Short-sequence repeats (SSRs) occur in both prokaryotic and eukaryotic DNA, inter- and intragenically, and may be exact or inexact copies. When heterogeneous SSRs are present in a given locus, we can take advantage of the pattern of different repeats to genotype strains based on the SSRs. Cataloguing and tracking these repeats can be difficult as diverse groups of researchers are involved in the identification of the repeats. Additionally, the task is error-prone when done manually.
Results. We developed RepeatAnalyzer, a new software tool capable of tracking, managing, analysing and cataloguing SSRs and genotypes using Anaplasma marginale as a model species. RepeatAnalyzer’s analysis capability includes novel metrics for measuring regional genetic diversity (corresponding to variety and regularity of SSR occurrence). As a part of its visualization capabilities, RepeatAnalyzer produces high quality maps of the geographic distribution of genotypes or SSRs over a region of interest. RepeatAnalyzer’s repeat identification functionality was validated for all SSRs and genotypes reported in 21 publications, using 380 A. marginale isolates gathered from the five publications within that list that provided access to their isolates. The tool produced accurate genotyping results in every case. In addition, it uncovered a number of errors in the published literature: 11 cases where SSRs were misreported, 5 cases where two different SSRs had been given the same name, and 16 cases where two or more names had been given to a single SSR. The analysis and visualization functionalities of the tool are demonstrated using several examples.
Conclusions. RepeatAnalyzer is a robust software tool that can be used for storing, managing, and analysing short-sequence repeats for the purpose of strain identification. The tool can be used for any set of SSRs regardless of species. When applied to A. marginale, our test case, we show that genotype lengths for a given region follow a normal distribution, while SSR frequencies follow a power-law-like distribution. Further, we find that over 90% of repeats are 28 to 29 amino acids long, which is in agreement with conventional wisdom. Lastly, our analysis reveals that the most common edit distance is five or six, which is counter-intuitive since we expected that result to be closer to one, resulting from the simplest change from one repeat to another.
Title: Characterization of Anaplasma marginale subspecies centrale using msp1aS genotyping reveals wildlife reservoir
Authors: Z.T.H. Khumalo, H.N. Catanese, N. Leisching, P. Hove, N.E. Collins, M.E. Chaisi, A.H. Gebremedhin, M.C. Oosthuizen, and K.A. Brayton
Status: Journal of Clinical Microbiology, 2016 54:10, 2503-2512.
Abstract: Bovine anaplasmosis caused by the intraerythrocytic rickettsial pathogen Anaplasma marginale is endemic in South Africa. Anaplasma marginale subspecies centrale (A. centrale) also infects cattle, however, it causes a milder form of anaplasmosis and is used as a live vaccine against A. marginale. There has been less interest in the epidemiology of A. centrale, and, as a result, there are few reports detecting natural infections of this organism. When detected in cattle, it is often assumed that it is due to vaccination, and in most cases it is reported as co-infection with A. marginale without characterization of the strain. In this study a total of 380 blood samples from wild ruminant species and cattle collected from Biobanks, National Parks, and other regions of South Africa were used in duplex real-time PCR assays to simultaneously detect A. marginale and A. centrale. PCR results indicated high occurrence of A. centrale infections ranging from 25-100% in National Parks. Samples positive for A. centrale were further characterized using the msp1aS gene, a homolog of msp1a of A. marginale which contains repeats at the 5′ end that are useful for genotyping strains. A total of 47 Msp1aS repeats were identified which corresponded to 32 A. centrale genotypes detected in cattle, buffalo and wildebeest. RepeatAnalyzer was used to examine strain diversity. Our results demonstrate a diversity of A. centrale strains from cattle and wildlife hosts from South Africa and indicate the utility of msp1aS as a genotypic marker for A. centrale strain diversity.
Title: Co-infections with multiple genotypes of Anaplasma marginale in cattle indicate pathogen diversity
Authors: P. Hove, M.E. Chaisi, K.A. Brayton, H. Ganesan, H.N. Catanese, M.S. Mtshali, A.M. Mutshembele, M.C. Oosthuizen, and N.E. Collins.
Status: Journal of Clinical Microbiology, 2016 54:10, 2503-2512.
Abstract: Background. Only a few studies have examined the presence of Anaplasma marginale and Anaplasma centrale in South Africa, and no studies have comprehensively examined these species across the whole country. To undertake this country-wide study we adapted a duplex quantitative PCR assay for use in South Africa, but found that one of the genes on which the assay was based was variable. Therefore, we sequenced a variety of field samples and tested the assay on the variants detected. We used the assay to screen 517 cattle samples sourced from all nine provinces of South Africa, and subsequently examined A. marginale positive samples for msp1 genotype to gauge strain diversity.
Results. Although the A. marginale msp1β gene is variable, the qPCR functions at an acceptable efficiency. The A. centrale groEL gene was not variable within the qPCR assay region. Of the cattle samples screened using the assay, 57% and 17% were found to be positive for A. marginale and A. centrale, respectively. Approximately 15% of the cattle were co-infected. Msp1α genotyping revealed 36 novel repeat sequences. Together with data from previous studies, we analyzed the Msp1a repeats from South Africa, where a total of 99 repeats have been described that can be attributed to 190 msp1α genotypes. While 22% of these repeats are also found in other countries, only two South African genotypes are also found in other countries; otherwise the genotypes are unique to South Africa.
Conclusions. Anaplasma marginale was prevalent in the Western Cape, KwaZulu-Natal and Mpumalanga, and absent in the Northern Cape. Anaplasma centrale was prevalent in the Western Cape and KwaZulu-Natal and absent in the Northern Cape and Eastern Cape. None of the cattle in the study were known to be vaccinated with A. centrale, so finding positive cattle indicates that this organism appears to be naturally circulating in cattle. A diverse population of A. marginale strains is found in South Africa, with some msp1α genotypes widely distributed across the country, and others appearing only once in one province. This diversity should be taken into account in future vaccine development studies.
Title: Bring Your Own Location Data: Use of Google Smartphone Location History Data for Environmental Health Research
Authors: P. Hystad, O. Amram, O. Oje, A. Larkin, K. Boakye, A. Avery, A. Gebremedhin, G. Duncan
Status: Environmental Health Perspectives, 130(11), CID 117005, 2022.
Abstract: Background: Environmental exposures are commonly estimated using spatial methods, with most epidemiological studies relying on home addresses. Passively collected smartphone location data, like Google Location History (GLH) data, may present an opportunity to integrate existing long‑term time–activity data. Objectives: We aimed to evaluate the potential use of GLH data for capturing long‑term retrospective time–activity data for environmental health research. Methods: We included 378 individuals who participated in previous GPS studies within the Washington State Twin Registry. GLH data consists of location information that has been routinely collected since 2010 when location sharing was enabled within Android operating systems or Google apps. We created instructions for participants to download their GLH data and provide it through secure data transfer. We summarized the GLH data provided, compared it to available GPS data, and conducted an exposure assessment for nitrogen dioxide (NO₂) air pollution.
Results: Of 378 individuals contacted, we received GLH data from 61 individuals (16.1%) and 53 (14.0%) indicated interest but did not have historical GLH data available. The provided GLH data spanned 2010–2021 and included 34 million locations, capturing 66,677 participant days. The median number of days with GLH data per participant was 752, capturing 442 unique locations. When we compared GLH data to 2‑week GPS data (~1.8 million points), 95% of GPS time‑activity points were within 100 m of GLH locations. We observed important differences between NO₂ exposures assigned at home locations compared with GLH locations, highlighting the importance of GLH data in environmental exposure assessment.
Conclusion: Collecting GLH data is a feasible and cost‑effective method for capturing retrospective time–activity patterns for large populations. Cohort studies should consider adding GLH data collection to capture historical time–activity patterns of participants using a “bring‑your‑own‑location‑data” approach. Privacy concerns must be carefully managed when using GLH data.
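Comparisons such as "95% of GPS points within 100 m of GLH locations" reduce to great-circle distance checks between coordinate pairs; a sketch using the haversine formula (the abstract does not describe the authors' exact processing pipeline):

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    # Great-circle distance in meters between two (lat, lon) points.
    p1, p2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

def within_radius(point, candidates, radius_m=100):
    # True if any candidate location lies within radius_m of point.
    return any(haversine_m(*point, *c) <= radius_m for c in candidates)
```

A GPS fix would then count as "captured" by the GLH trace if `within_radius(gps_point, glh_locations)` holds.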
Title: Smartphone Google Location History: A Novel Approach to Outdoor Physical Activity Research
Authors: O. Amram, O. Oje, A. Larkin, K. Boakye, A. Avery, A. Gebremedhin, B. Williams, G. Duncan, P. Hystad
Status: Journal of Physical Activity and Health, 22(3):364–372 (Dec 11 2024; print Mar 1 2025)
Abstract: Background: Outdoor physical activity (PA) is an important component of overall health; however, it is difficult to measure. Passively collected smartphone location data like Google Location History (GLH) present an opportunity to address this issue. Objectives: To evaluate the use of GLH data for measuring outdoor PA. Methods: We collected GLH data for 357 individuals from the Washington State Twin Registry. We first summarized GLH measurements relevant to outdoor PA. Next, we compared accelerometer measurements to GLH-classified PA for a subset of 25 participants who completed 2 weeks of global positioning system and accelerometer monitoring. Finally, we examined the association between GLH-measured walking and obesity.
Results: Participants provided a mean (SD) of 52 (18.8) months of GLH time-activity data, which included a mean (SD) of 2,421 (1,632) trips per participant. GLH measurements were classified as 79,994 walking trips (11.6% of all trips), 564,558 vehicle trips (81.8%), 11,974 cycling trips (1.7%), and 890 running trips (0.1%). Sixty-two percent of these trips had location accuracy > 80%. In the accelerometry validation, GLH walking trips had a mean vector magnitude of 3,150 counts/min versus 489 for vehicle trips. In adjusted cross-sectional analyses, both walking minutes and walking trips per month were inversely associated with obesity (OR = 0.78, 95% CI 0.60–0.96, and OR = 0.91, 95% CI 0.82–0.98, respectively).
Conclusion: GLH data provide a novel method for measuring long-term, retrospective outdoor PA, offering new opportunities for physical activity research.
Title: HierGP: Hierarchical Grid Partitioning for Scalable Geospatial Data Analytics
Authors: O. Oje, T. Stirewalt, O. Amram, P. Hystad, S. Amiri, A. Gebremedhin
Status: ACM Transactions on Spatial Algorithms and Systems, Accepted September 2024.
Abstract: Application domains such as environmental health science, climate science, and geosciences—where the relationship between humans and the environment is studied—are constantly evolving and require innovative approaches in geospatial data analysis. Recent technological advancements have led to the proliferation of high-granularity geospatial data, enabling such domains but posing major challenges in managing vast datasets that have high spatiotemporal similarities. We introduce the Hierarchical Grid Partitioning (HierGP) framework to address this issue, and demonstrate that HierGP outperforms existing approaches in runtime, memory footprint, and scalability.
Results: HierGP significantly accelerates geospatial data processing by reducing memory usage and improving computation speed over state-of-the-art methods, while maintaining or improving analytic accuracy.
Conclusion: HierGP provides a scalable and efficient foundation for processing high-resolution spatial-temporal datasets, offering researchers a robust tool for environmental and geospatial analyses.
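The abstract does not spell out HierGP's partitioning scheme; as a generic illustration of hierarchical grid partitioning (a hypothetical scheme, not the authors' design), coordinates can be quantized at successively finer resolutions, so that spatiotemporally similar points collapse into shared coarse cells:

```python
def grid_cell(lat, lon, level):
    # Quantize a coordinate to a square cell; each level halves the cell edge.
    size = 1.0 / (2 ** level)  # cell edge length in degrees
    return (level, int(lat // size), int(lon // size))

def hierarchical_cells(lat, lon, max_level=4):
    # Cell path for a point, from coarse (level 0) to fine (max_level).
    return [grid_cell(lat, lon, lv) for lv in range(max_level + 1)]
```

Nearby points share a prefix of their cell paths, so deduplication or aggregation can stop at the coarsest level that distinguishes them.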
Title: Use of Individual Google Location History Data to Identify Consumer Encounters with Food Outlets
Authors: O. Oje, O. Amram, P. Hystad, A. Gebremedhin, P. Monsivais
Status: International Journal of Health Geographics, 24(1), 2025.
Abstract: Addressing key behavioral risk factors for chronic diseases, such as diet, requires innovative methods to objectively measure dietary patterns and their upstream determinants, notably the food environment. Although GIS techniques have pushed the boundaries by mapping food outlet availability, they often simplify food access dynamics to the vicinity of home addresses, possibly misclassifying neighborhood effects. Leveraging Google Location History Timeline (GLH) data offers a novel approach to assess long-term patterns of food outlet utilization at an individual level, providing insights into the relationship between food environment interactions, diet quality, and health outcomes.
Results: We identified 156,405 specific food outlet visits for the 357 study participants. Sixty percent of visits were to full-service restaurants, 15% to limited-service restaurants, and 16% to supermarkets. The mean number of visits per person per month to any food outlet was 12.795. Only 8%, 10%, and 11% of visits to full-service restaurants, limited-service restaurants, and supermarkets, respectively, occurred within 1 km of residential locations.
Conclusion: GLH data presents a novel method to assess individual-level food utilization behaviors. Future research should explore the utility of GLH data to assess food environment interactions and their relationship with health outcomes.
Title: MPC-Based Decentralized Voltage Control in Power Distribution Systems with EV and PV Coordination
Authors: L. Wang, A. Dubey, A.H. Gebremedhin, A. Srivastava, N. Schulz
Status: IEEE Transactions on Smart Grid, 13(4): 2908–2919, 2022
Abstract:
High penetrations of distributed photovoltaics and electric vehicles introduce severe voltage regulation challenges. We present a decentralized model-predictive control algorithm that clusters network nodes based on flexibility metrics, then coordinates PV inverter reactive power, PV curtailment, and EV charging/discharging within each cluster. Applied to the IEEE-123 node feeder under varied penetration scenarios, the method achieves satisfactory voltage profiles with significantly reduced computational burden compared to centralized MPC.
Title: An Adaptive Machine Learning Framework for Behind-the-Meter Load/PV Disaggregation
Authors: R. Saeedi, K.S. Sajan, K. Davies, A. Srivastava, A.H. Gebremedhin
Status: IEEE Transactions on Industrial Informatics, 17(10): 7060–7069, 2021
Abstract:
A significant amount of distributed photovoltaic (PV) generation is invisible to distribution system operators because it sits behind the meter on customer premises and is not directly monitored by the utility. This generation effectively adds an unknown, varying negative demand to the system, which causes additional uncertainty in determining the total load. The uncertainty directly impacts system reliability, cold load pickup, load behavior modeling, and hence cost of operation. This paper proposes an adaptive machine learning framework to disaggregate PV generation and load from the net measurement. The framework estimates PV generation and load based on measurements collected from smart transformers, smart meters, other sensors, and weather stations. The framework's core idea is to transform the data such that: a) the machine learning model can effectively utilize the time dependency of measurements, and b) the measurements are mapped into a lower-dimensional space to reduce complexity while maintaining accuracy. The transformed measurements are then used to train the machine learning models for load/PV disaggregation. The machine learning models investigated include linear regression (LR), decision tree (DT), random forest (RF), and multilayer perceptron (MLP). Several test/training split scenarios, including 90%-10% Split, One-Month-Out, One-Season-Out, and Panel-Independent Split, are presented to provide a thorough evaluation of the proposed framework. Results show that the framework can estimate PV generation with high accuracy using low-complexity methods, with random forest providing superior performance among the ML models investigated.
Title: Reinforcement Learning for Battery Energy Storage Dispatch Augmented with Model-Based Optimizer
Authors: G. Krishnamoorthy, A. Dubey, A.H. Gebremedhin
Status: IEEE SmartGridComm 2021 (IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids)
Abstract:
Reinforcement learning has been found useful in solving optimal power flow (OPF) problems in electric power distribution systems. However, the use of largely model-free reinforcement learning algorithms that completely ignore the physics-based modeling of the power grid compromises the optimizer performance and poses scalability challenges. This paper proposes a novel approach to synergistically combine the physics-based models with learning-based algorithms using imitation learning to solve distribution-level OPF problems. Specifically, we propose imitation learning based improvements in deep reinforcement learning (DRL) methods to solve the OPF problem for a specific case of battery storage dispatch in the power distribution systems. The proposed imitation learning algorithm uses the approximate optimal solutions obtained from a linearized model-based OPF solver to provide a good initial policy for the DRL algorithms while improving the training efficiency. The effectiveness of the proposed approach is demonstrated using IEEE 34-bus and 123-bus distribution feeders with numerous distribution-level battery storage systems.
Title: An Open-Source Environment for Reinforcement Learning in Power Distribution Systems
Authors: G. Krishnamoorthy, A. Dubey, A.H. Gebremedhin
Status: IEEE PES General Meeting 2022
Abstract:
We introduce an open-source platform to interface the OpenAI Gym environment (an open-source toolkit for developing and comparing reinforcement learning algorithms) with OpenDSS (an open-source distribution system simulator). The proposed OpenDSS-RL wrapper enables the seamless application of reinforcement learning (RL) algorithms to power distribution systems using a standardized environment. Specifically, the environment allows the user to train and test any state-of-the-art RL algorithm on any of the distribution system models available in OpenDSS. We demonstrate voltage regulation applications for the IEEE 13-bus, 34-bus, and 123-bus test systems using the proposed OpenDSS-RL wrapper. The proposed RL controllers for voltage regulation are trained using four state-of-the-art RL algorithms available from Stable Baselines. The tool is available at https://github.com/WSU-DS/OpenDSS-RL-Wrapper.
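The Gym interface such a wrapper exposes can be illustrated with a toy environment (a sketch of the standard `reset`/`step` contract, not the actual OpenDSS-RL-Wrapper API): the agent nudges a bus voltage toward the 1.0 p.u. target, standing in for a wrapper that would run an OpenDSS power-flow solve at each step.

```python
import random

class ToyVoltageEnv:
    """Minimal Gym-style environment illustrating the reset/step contract.
    A real wrapper would replace the toy dynamics with a simulator call."""

    def __init__(self, target=1.0, horizon=20):
        self.target, self.horizon = target, horizon

    def reset(self):
        self.t = 0
        self.voltage = random.uniform(0.9, 1.1)  # initial bus voltage (p.u.)
        return [self.voltage]  # observation

    def step(self, action):
        # action: continuous voltage adjustment (e.g. a tap/var setpoint change).
        self.voltage += action
        self.t += 1
        reward = -abs(self.voltage - self.target)  # penalize deviation from target
        done = self.t >= self.horizon
        return [self.voltage], reward, done, {}
```

Any RL library that accepts this interface can then be trained against the environment without knowing anything about the underlying grid model.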
Title: A Signal-level Transfer Learning Framework for Autonomous Reconfiguration of Wearable Systems
Authors: R. Saeedi and A.H. Gebremedhin
Status: IEEE Transactions on Mobile Computing, Vol 19, No 3, 513-527, 2020
Abstract: Machine learning algorithms, which form the core intelligence of wearables, traditionally deduce a computational model from a set of training data to detect events of interest. However, in the dynamic environment in which wearables operate, the accuracy of a computational model drops whenever changes in configuration or context of the system occur. In this paper, using transfer learning as an organizing principle, we propose a novel design framework to enable autonomous reconfiguration of wearable systems. More specifically, we focus on the cases where the specifications of sensor(s) or the subject vary compared to what is available in the training data. We develop two new algorithms for data mapping (the mapping is between the training data and the data for the current operating setting). The first data mapping algorithm combines effective methods for finding signal similarity with network-based clustering, while the second algorithm is based on finding signal motifs.
Title: Synthetic Sensor Data Generation for Health Applications: A Supervised Deep Learning Approach
Authors: S. Norgaard, R. Saeedi, K. Sasani, and A.H. Gebremedhin
Status: IEEE Engineering in Medicine and Biology Society Conference (EMBC 2018)
Abstract: Recent advancements in mobile devices, data analysis, and wearable sensors enable in-place health monitoring. Supervised machine learning algorithms, the core intelligence of these systems, learn from labeled training data. However, labeling vast amounts of data is time-consuming and expensive. Moreover, sensor data often contains personal information that a user may not be comfortable sharing. Therefore, there is a strong need to develop methods for generating realistic labeled sensor data. In this paper, we propose a supervised generative adversarial network architecture that learns from feedback from both a discriminator and a classifier in order to create synthetic sensor data. We demonstrate the effectiveness of the architecture on a publicly available human activity dataset.
Title: Personalized Human Activity Recognition using Wearables: A Manifold Learning-based Knowledge Transfer
Authors: R. Saeedi, K. Sasani, S. Norgaard and A.H. Gebremedhin
Status: IEEE Engineering in Medicine and Biology Society Conference (EMBC 2018)
Abstract: Human activity recognition (HAR) is an important component in health-care systems. For example, it can enable context-aware applications such as elderly care and patient monitoring. Relying on a set of training data, supervised machine learning algorithms form the core intelligence of most existing HAR systems. Meanwhile, the accuracy of an HAR model highly depends on the similarity between the training and the operating context. Therefore, there is a need for developing machine learning algorithms that can easily adapt to the operating context at hand. In this paper, we propose a cross-subject transfer learning algorithm that links source and target subjects by constructing manifolds from feature-level representation of the source subject(s). Our algorithm assigns labels to the unlabeled data in the current context using the manifold learned from the source subject(s).
Title: A Closed-loop Deep Learning Architecture for Robust Activity Recognition using Wearable Sensors
Authors: R. Saeedi, S. Norgaard, and A.H. Gebremedhin
Status: IEEE Big Data 2017
Abstract: Human activity recognition (HAR) plays a central role in health-care, fitness and sport applications because of its potential to enable context-aware human monitoring. With the increase in popularity of wearable devices, we are witnessing a large influx in availability of human activity data. For effective analysis and interpretation of these heterogeneous and high-volume streaming data, we need powerful algorithms. In particular, there is a strong need for developing algorithms for robust classification of human activity data that specifically address challenges associated with dynamic environments (e.g. different users, signal heterogeneity). We use the term robust here in two orthogonal senses: 1) leveraging related data in such a way that knowledge is transferred to a new context; and 2) actively re-configuring machine learning algorithms such that they can be applied in a new context. In this paper, we propose an architecture that combines an active learning approach with a novel deep network. Our deep neural network exploits both Convolutional and Long Short-Term Memory (LSTM) layers in order to learn hierarchical representations of features and capture time dependencies from raw data. The active learning process allows us to choose the best instances for fine-tuning the deep network to the new setting in which the system operates (i.e. a new subject). We demonstrate the efficacy of the architecture using real human activity data. We show that the accuracy of activity recognition reaches over 90% by annotating less than 20% of unlabeled data.
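The active learning step selects the most informative unlabeled instances for fine-tuning the network; a common selection criterion (shown here as an illustration — the paper's exact query strategy may differ) is predictive entropy, where the instances the classifier is least certain about are queried first.

```python
from math import log

def entropy(probs):
    # Shannon entropy of a predicted class-probability distribution.
    return -sum(p * log(p) for p in probs if p > 0)

def select_for_labeling(predictions, budget):
    # predictions: list of (instance_id, class-probability list).
    # Return the `budget` instances the model is least certain about.
    ranked = sorted(predictions, key=lambda x: entropy(x[1]), reverse=True)
    return [inst for inst, _ in ranked[:budget]]
```

With a labeling budget of, say, 20% of the unlabeled pool, only the top-entropy instances are sent for annotation and used to fine-tune the model.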
Title: Co-MEAL: Cost-Optimal Multi-Expert Active Learning Architecture for Mobile Health Monitoring
Authors: R. Saeedi, K. Sasani and A.H. Gebremedhin
Status: ACM-BCB17
Abstract: Mobile health monitoring plays a central role in a variety of health-care applications. Using mobile technology, health-care providers can access clinical information and communicate with subjects in real-time. Due to the sensitive nature of health-care applications, these systems need to process physiological signals highly accurately. However, as mobile devices are employed in dynamic environments, the accuracy of a machine learning model drops whenever a change in configuration of the system occurs. Therefore, data mining and machine learning techniques that specifically address challenges associated with dynamic environments (e.g. different users, signal heterogeneity) are needed. In this paper, using active learning as an organizing principle, we propose a cost-optimal multiple-expert architecture to adapt a machine learning model (e.g. classifier) developed in a given context to a new context or configuration. More specifically, in our architecture, a system's machine learning model learns from experts available to the system (e.g. another mobile device, a human annotator) while minimizing the cost of data labeling. Our architecture also exploits collaboration between experts to enrich their knowledge, which in turn decreases both cost and uncertainty of data labeling in future steps. We demonstrate the efficacy of the architecture using a publicly available dataset on human activity. We show that the accuracy of activity recognition reaches over 85% by labeling only 15% of unlabeled data. At the same time, the number of queries to the human expert is reduced by up to 82%.
Title: Transfer Learning Algorithms for Autonomous Configuration of Wearable Systems
Authors: R. Saeedi, H. Ghasemzadeh and A.H. Gebremedhin
Status: IEEE BigData 2016.
Abstract: Wearables have emerged as a revolutionary technology in many application domains including healthcare and fitness. Machine learning algorithms, which form the core intelligence of wearables, traditionally deduce a computational model from a set of training examples to detect events of interest (e.g. activity type). However, in the dynamic environments in which wearables typically operate, the accuracy of a computational model drops whenever changes in configuration of the system occur. Therefore, there is a need to develop systems that can adapt to the new configuration autonomously. In this paper, using transfer learning as an organizing principle, we develop several algorithms for data mapping. The data mapping algorithms employ effective signal similarity methods and are used to adapt the system to the new configuration. We demonstrate the efficacy of the data mapping algorithms using a publicly available dataset on human activity recognition.
Title: Algorithms for Balanced Graph Colorings with Applications in Parallel Computing
Authors: H. Lu, M. Halappanavar, D. Chavarría-Miranda, A.H. Gebremedhin, A. Panyala and A. Kalyanaraman
Status: IEEE Transactions on Parallel and Distributed Systems 28(5), 1240–1256, 2017.
Abstract: Graph coloring, in a generic sense, is used to identify subsets of independent tasks in parallel scientific computing applications. Traditional coloring heuristics aim to reduce the number of colors used as that number also corresponds to the number of parallel steps in the application. However, if the color classes produced have a skew in their sizes, utilization of hardware resources becomes inefficient, especially for the smaller color classes. Equitable coloring is a theoretical formulation of coloring that guarantees a perfect balance among color classes, and its practical relaxation is referred to here as balanced coloring. In this paper, we consider balanced coloring models in the context of parallel computing. The goal is to achieve a balanced coloring of an input graph without increasing the number of colors that an algorithm oblivious to balance would have used. We propose and study multiple heuristics that aim to achieve such a balanced coloring for two variants of the coloring problem, distance-1 coloring (the standard coloring problem) and partial distance-2 coloring (defined on a bipartite graph). We present parallelization approaches for multi-core and manycore architectures and cross-evaluate their effectiveness with respect to the quality of balance achieved and performance. Furthermore, we study the impact of the proposed balanced coloring heuristics on a concrete application, viz. parallel community detection, which is an example of an irregular application. In addition, we propose several extensions to our basic balancing schemes and evaluate their balancing efficacy and performance characteristics. The thorough treatment of balanced coloring presented in this paper from algorithms to application is expected to serve as a valuable resource to parallel application developers who seek to improve parallel performance of their applications using coloring.
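As a rough illustration of the balancing idea described above (a minimal sequential sketch, not the paper's parallel heuristics; function and variable names are illustrative), a greedy coloring can be biased toward balance by assigning each vertex the least-populated permissible color class rather than simply the first feasible color:

```python
from collections import defaultdict

def balanced_greedy_coloring(adj):
    """Greedy distance-1 coloring that, among the colors permissible for a
    vertex, picks the currently least-populated color class.
    adj: dict mapping each vertex to an iterable of its neighbors."""
    color = {}                      # vertex -> assigned color index
    class_size = defaultdict(int)   # color index -> current class size
    num_colors = 0
    for v in adj:                   # any vertex ordering works here
        forbidden = {color[u] for u in adj[v] if u in color}
        allowed = [c for c in range(num_colors) if c not in forbidden]
        if allowed:
            # balancing step: choose the smallest permissible color class
            c = min(allowed, key=lambda c: class_size[c])
        else:
            c = num_colors          # open a new color class
            num_colors += 1
        color[v] = c
        class_size[c] += 1
    return color
```

An algorithm oblivious to balance would instead take the smallest feasible color index, which tends to overload the low-numbered classes; the `min`-by-class-size choice spreads vertices across classes without opening new colors.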
Title: Parallelization of Bin Packing on Multicore Systems
Authors: S. Ghosh and A.H. Gebremedhin
Status: 2016 IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC’16).
Abstract: We study effective parallelization of approximation algorithms for the one-dimensional bin packing problem on a multicore platform. Bin packing is a classic combinatorial optimization problem that aims to pack a given sequence of items into a minimum number of equal-sized bins. The problem potentially serves as a model for a wide variety of applications. Examples include: packing data into chunks in a memory hierarchy in a given system to increase application performance, loading vehicles subject to weight limitations, and packing TV commercials into station breaks. Bin packing has long served as a proving ground for the analysis of approximation algorithms and played a crucial role in the development of much of the theory of approximation algorithms. Its parallelization, however, has received comparatively much less attention. In this work, we develop multiple parallel versions of an effective approximation algorithm (First Fit Decreasing) for the problem and investigate the trade-off between solution quality and execution time. We use OpenMP and Cilk Plus as mechanisms for achieving the parallelization. The new parallel algorithms obtain a speedup of more than 10x (on 32 cores) for moderate to large input sequences without sacrificing much on the quality of solution produced by the sequential algorithm; in particular, we see only about a 3 to 30% increase in the number of bins compared to the sequential version. In turn, the solution obtained by the sequential First Fit Decreasing algorithm is provably almost optimal (the approximation ratio is less than 1.3).
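The sequential baseline the paper parallelizes can be sketched as follows (a plain Python rendering of the classic First Fit Decreasing heuristic, not the paper's parallel code):

```python
def first_fit_decreasing(items, capacity):
    """Sequential First Fit Decreasing: sort items by non-increasing size,
    then place each item into the first open bin with enough room,
    opening a new bin only when none fits.
    Returns the list of bins, each a list of packed item sizes."""
    bins = []       # contents of each bin
    free = []       # remaining capacity of each bin
    for item in sorted(items, reverse=True):
        for i, room in enumerate(free):
            if item <= room:
                bins[i].append(item)
                free[i] -= item
                break
        else:
            bins.append([item])             # open a new bin
            free.append(capacity - item)
    return bins
```

Sorting first is what gives the heuristic its strong approximation guarantee; a parallel variant must decide how to preserve (or approximate) this first-fit placement order across threads, which is exactly the quality/runtime trade-off the paper studies.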
Title: One-sided Interface for Matrix Operations using MPI-3 RMA: A Case Study with Elemental
Authors: S. Ghosh, J.R. Hammond, A.J. Pena, P. Balaji, A.H. Gebremedhin and B. Chapman
Status: Proceedings of ICPP 2016.
Abstract: A one-sided programming model separates communication from synchronization, and is the driving principle behind partitioned global address space (PGAS) libraries such as Global Arrays (GA) and SHMEM. PGAS models expose a rich set of functionality that a developer needs in order to implement mathematical algorithms that require frequent multidimensional array accesses. However, use of existing PGAS libraries in application codes often requires significant development effort in order to fully exploit these programming models. On the other hand, a vast majority of scientific codes use MPI either directly or indirectly via third-party scientific computation libraries, and need features to support application-specific communication requirements (e.g., asynchronous update of distributed sparse matrices, commonly arising in machine learning workloads). For such codes it is often impractical to completely shift programming models in favor of special one-sided communication middleware. Instead, an elegant and productive solution is to exploit the one-sided functionality already offered by MPI-3 RMA (Remote Memory Access). We designed a general one-sided interface using the MPI-3 passive RMA model for remote matrix operations in the linear algebra library Elemental; we call the interface we designed RMAInterface. Elemental is an open source library for distributed-memory dense and sparse linear algebra and optimization.
Title: Balanced Coloring for Parallel Computing Applications
Authors: H. Lu, M. Halappanavar, D. Chavarría-Miranda, A.H. Gebremedhin and A. Kalyanaraman
Status: Proceedings of IPDPS 2015.
Abstract: Graph coloring is used to identify subsets of independent tasks in parallel scientific computing applications. Traditional coloring heuristics aim to reduce the number of colors used as that number also corresponds to the number of parallel steps in the application. However, if the color classes produced have a skew in their sizes, utilization of hardware resources becomes inefficient, especially for the smaller color classes. Equitable coloring is a theoretical formulation of coloring that guarantees a perfect balance among color classes, and its practical relaxation is referred to as balanced coloring. In this paper, we revisit the problem of balanced coloring in the context of parallel computing. The goal is to achieve a balanced coloring of an input graph without increasing the number of colors that an algorithm oblivious to balance would have used. We propose and study multiple heuristics that aim to achieve such a balanced coloring, present parallelization approaches for multi-core and manycore architectures, and cross-evaluate their effectiveness with respect to the quality of balance achieved and performance. Furthermore, we study the impact of the proposed balanced coloring heuristics on a concrete application – viz. parallel community detection, which is an example of an irregular application. The thorough treatment of balanced coloring presented in this paper from algorithms to application is expected to serve as a valuable resource to parallel application developers who seek to improve parallel performance of their applications using coloring.
Title: Parallel Maximum Clique Algorithms with Applications to Network Analysis
Authors: R.A. Rossi, D.F. Gleich, A.H. Gebremedhin
Status: SIAM Journal on Scientific Computing, Vol 37, Issue 5, pages C589-C618, 2015.
Abstract: We present a fast, parallel maximum clique algorithm for large sparse graphs that is designed to exploit characteristics of social and information networks. The method exhibits a roughly linear runtime scaling over real-world networks ranging from a thousand to a hundred million nodes. In a test on a social network with 1.8 billion edges, the algorithm finds the largest clique in about 20 minutes. At its heart the algorithm employs a branch-and-bound strategy with novel and aggressive pruning techniques. The pruning techniques include the combined use of core numbers of vertices along with a good initial heuristic solution to remove the vast majority of the search space. In addition, the exploration of the search tree is parallelized. During the search, processes immediately communicate changes to upper and lower bounds on the size of the maximum clique. This occasionally results in a super-linear speedup because tasks with large search spaces can be pruned by other processes. We demonstrate the impact of the algorithm on applications using two different network analysis problems: computation of temporal strong components in dynamic networks and determination of a compress-friendly ordering of nodes of massive networks.
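A toy rendering of the core-number pruning idea may help: a clique of size k can only contain vertices of core number at least k-1, so any vertex whose core number cannot beat the incumbent clique is skipped. This is a greatly simplified sequential sketch, not the paper's parallel algorithm, and the helper names are illustrative:

```python
def core_numbers(adj):
    """Peel minimum-degree vertices; core[v] + 1 upper-bounds the size
    of any clique containing v. adj: dict vertex -> set of neighbors."""
    deg = {v: len(adj[v]) for v in adj}
    core, removed, k = {}, set(), 0
    while len(removed) < len(adj):
        v = min((u for u in adj if u not in removed), key=lambda u: deg[u])
        k = max(k, deg[v])
        core[v] = k
        removed.add(v)
        for u in adj[v]:
            if u not in removed:
                deg[u] -= 1
    return core

def max_clique(adj):
    """Branch and bound with two prunes: skip vertices whose core number
    rules out improving on the incumbent, and cut any branch where
    |clique| + |candidates| cannot beat the incumbent."""
    core = core_numbers(adj)
    best = []

    def expand(clique, cand):
        nonlocal best
        if not cand and len(clique) > len(best):
            best = clique[:]
        for v in list(cand):
            if len(clique) + len(cand) <= len(best):
                return                      # bound prune
            cand.remove(v)
            expand(clique + [v], {u for u in cand if u in adj[v]})

    for v in adj:
        if core[v] + 1 > len(best):         # core-number prune
            expand([v], {u for u in adj[v] if core[u] + 1 > len(best)})
    return best
```

On sparse real-world graphs the initial heuristic clique is often already near-maximum, so the core-number test discards almost every vertex before any branching happens, which is the source of the near-linear behavior reported above.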
Title: Fast Maximum Clique Algorithms for Large Graphs
Authors: R.A. Rossi, D.F. Gleich, A.H. Gebremedhin, M.M.A. Patwary
Status: Proceedings of WWW2014.
Abstract: We propose a fast, parallel maximum clique algorithm for large sparse graphs that is designed to exploit characteristics of social and information networks. Despite clique’s status as an NP-hard problem with poor approximation guarantees, our method exhibits nearly linear runtime scaling over real-world networks ranging from 1000 to 100 million nodes. In a test on a social network with 1.8 billion edges, the algorithm finds the largest clique in about 20 minutes. Key to the efficiency of our algorithm are an initial heuristic procedure that finds a large clique quickly and a parallelized branch and bound strategy with aggressive pruning and ordering techniques. We use the algorithm to compute the largest temporal strong components of temporal contact networks.
Title: Graph Coloring Algorithms for Multi-core and Massively Multithreaded Architectures
Authors: U. Catalyurek, J. Feo, A.H. Gebremedhin, M. Halappanavar, A. Pothen
Status: Parallel Computing 38 (2012), 576-594.
Abstract: We explore the interplay between architectures and algorithm design in the context of shared-memory platforms and a specific graph problem of central importance in scientific and high-performance computing, distance-1 graph coloring. We introduce two different kinds of multithreaded heuristic algorithms for the stated, NP-hard, problem. The first algorithm relies on speculation and iteration, and is suitable for any shared-memory system. The second algorithm uses dataflow principles, and is targeted at the non-conventional, massively multithreaded Cray XMT system. We study the performance of the algorithms on three different platforms—Cray XMT, Sun Niagara 2, and Intel Nehalem—representing varying degrees of multithreading capabilities. As testbed, we use synthetically generated massive graphs carefully designed to cover a wide spectrum of input types. The results show that the algorithms have scalable runtime performance and use nearly the same number of colors as the underlying serial algorithm, which in turn is effective in practice. The study provides insight into the design of high performance algorithms for irregular problems on many-core architectures.
Title: New Multithreaded Ordering and Coloring Algorithms for Multicore Architectures
Authors: M.M.A. Patwary, A.H. Gebremedhin and A. Pothen
Status: In E. Jeannot, R. Namyst and J. Roman, editors, Euro-Par 2011, volume 6853 of Lecture Notes in Computer Science, pages 250 — 262. Springer, 2011.
Abstract: We present new multithreaded vertex ordering and distance-k graph coloring algorithms that are well-suited for the emerging and rapidly growing multicore platforms. The vertex ordering techniques rely on various notions of “degree”, are known to be effective in reducing the number of colors used by a greedy coloring algorithm, and are generic enough to be applicable to contexts other than coloring. We employ approximate degree computation in the ordering algorithms and speculation and iteration in the coloring algorithms as our primary remedies for breaking sequentiality and achieving effective parallelization. The algorithms have been implemented using OpenMP, and experiments run on Intel Nehalem and other multi-core machines using a set of carefully designed synthetic graphs and real-world graphs attest that the algorithms provide scalable runtime performance. The number of colors the algorithms use is nearly the same as in the serial case, which in turn is often very close to optimal.
Title: Distributed-memory Parallel Algorithms for Matching and Coloring
Authors: U. Catalyurek, F. Dobrian, A.H. Gebremedhin, M. Halappanavar, A. Pothen
Status: Proceedings of IEEE International Parallel and Distributed Processing Symposium, Workshops and PhD Forums (IPDPSW), Workshop on Parallel Computing and Optimization (PCO’11), pages 1966–1975, 2011
Abstract: We discuss the design and implementation of new highly-scalable distributed-memory parallel algorithms for two prototypical graph problems, edge-weighted matching and distance-1 vertex coloring. Graph algorithms in general have low concurrency, poor data locality, and high ratio of data access to computation costs, making it challenging to achieve scalability on massively parallel machines. We overcome this challenge by employing a variety of techniques, including speculation and iteration, optimized communication, and randomization. We present preliminary results on weak and strong scalability studies conducted on an IBM Blue Gene/P machine employing up to tens of thousands of processors. The results show that the algorithms hold strong potential for computing at petascale.
Title: Distributed-memory Parallel Algorithms for Distance-2 Coloring and Related Problems in Derivative Computation
Authors: D. Bozdag, U. Catalyurek, A. Gebremedhin, F. Manne, E. Boman and F. Ozguner
Status: SIAM Journal on Scientific Computing, Vol 32, Issue 4, pp 2418–2446, 2010.
Abstract: The distance-2 graph coloring problem aims at partitioning the vertex set of a graph into the fewest sets consisting of vertices pairwise at distance greater than two from each other. Its applications include derivative computation in numerical optimization and channel assignment in radio networks. We present efficient, distributed-memory, parallel heuristic algorithms for this NP-hard problem as well as for two related problems used in the computation of Jacobians and Hessians. Parallel speedup is achieved through graph partitioning, speculative (iterative) coloring, and a BSP-like organization of parallel computation. Results from experiments conducted on a PC cluster employing up to 96 processors and using large-size real-world as well as synthetically generated test graphs show that the algorithms are scalable. In terms of quality of solution, the algorithms perform remarkably well—the number of colors used by the parallel algorithms was observed to be very close to the number used by the sequential counterparts, which in turn are quite often near optimal. Moreover, the experimental results show that the parallel distance-2 coloring algorithm compares favorably with the alternative approach of solving the distance-2 coloring problem on a graph $G$ by first constructing the square graph $G^2$ and then applying a parallel distance-1 coloring algorithm on $G^2$. Implementations of the algorithms are made available via the Zoltan load-balancing library.
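For reference, the sequential greedy heuristic that the parallel algorithms build on can be sketched as follows (a simplified sequential illustration, not the paper's distributed-memory code):

```python
def greedy_distance2_coloring(adj):
    """Greedy distance-2 coloring: any two vertices at distance <= 2
    receive distinct colors. Each vertex takes the smallest color not
    used by its neighbors or its neighbors' neighbors.
    adj: dict mapping each vertex to an iterable of its neighbors."""
    color = {}
    for v in adj:
        forbidden = set()
        for u in adj[v]:
            if u in color:
                forbidden.add(color[u])     # distance-1 neighbor
            for w in adj[u]:
                if w != v and w in color:
                    forbidden.add(color[w]) # distance-2 neighbor
        c = 0
        while c in forbidden:
            c += 1
        color[v] = c
    return color
```

The two-hop `forbidden` set is what makes distributed parallelization harder than for distance-1 coloring: a vertex's permissible colors depend on vertices two hops away, which may live on a different partition, motivating the speculative coloring and conflict-resolution rounds described in the abstract.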
Title: A framework for Scalable Greedy Coloring on Distributed Memory Parallel Computers
Authors: D. Bozdag, A. Gebremedhin, F. Manne, E. Boman and U. Catalyurek
Status: Journal of Parallel and Distributed Computing Vol 68, No 4, pp 515–535, 2008.
Abstract: We present a scalable framework for parallelizing greedy graph coloring algorithms on distributed-memory computers. The framework unifies several existing algorithms and blends a variety of techniques for creating or facilitating concurrency. The latter techniques include exploiting features of the initial data distribution, the use of speculative coloring and randomization, and a BSP-style organization of computation and communication. We experimentally study the performance of several specialized algorithms designed using the framework and implemented using MPI. The experiments are conducted on two different platforms and the test cases include large-size synthetic graphs as well as real graphs drawn from various application areas. Computational results show that implementations that yield good speedup while at the same time using about the same number of colors as a sequential greedy algorithm can be achieved by setting parameters of the framework in accordance with the size and structure of the graph being colored. Our implementation is freely available as part of the Zoltan parallel data management and load-balancing library.
Title: A Parallel Distance-2 Graph Coloring Algorithm for Distributed Memory Computers
Authors: D. Bozdag, U. Catalyurek, A.H. Gebremedhin, F. Manne, E. G. Boman and F. Ozguner
Status: Lecture Notes in Computer Science, vol 3726, 2005, pages 796-806, Springer. Proc. of HPCC 2005, Sept 21-25, 2005, Sorrento, Italy.
Abstract: The distance-2 graph coloring problem aims at partitioning the vertex set of a graph into the fewest sets consisting of vertices pairwise at distance greater than two from each other. Application examples include numerical optimization and channel assignment. We present the first distributed-memory heuristic algorithm for this NP-hard problem. Parallel speedup is achieved through graph partitioning, speculative (iterative) coloring, and a BSP-like organization of computation. Experimental results show that the algorithm is scalable, and compares favorably with an alternative approach—solving the problem on a graph G by first constructing the square graph G^2 and then applying a parallel distance-1 coloring algorithm on G^2 .
Title: A Scalable Parallel Graph Coloring Algorithm for Distributed Memory Computers
Authors: E.G. Boman, D. Bozdag, U. Catalyurek, A.H. Gebremedhin and F. Manne
Status: Lecture Notes in Computer Science, vol 3648, 2005, pages 241-251, Springer. Proc. of EuroPar 2005, 30 Aug - 2 Sept, 2005, Lisboa, Portugal.
Abstract: In large-scale parallel applications a graph coloring is often carried out to schedule computational tasks. In this paper, we describe a new distributed-memory algorithm for doing the coloring itself in parallel. The algorithm operates in an iterative fashion; in each round vertices are speculatively colored based on limited information, and then a set of incorrectly colored vertices, to be recolored in the next round, is identified. Parallel speedup is achieved in part by reducing the frequency of communication among processors. Experimental results on a PC cluster using up to 16 processors show that the algorithm is scalable.
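The speculate-then-resolve round structure described above can be illustrated sequentially (a simplified single-process simulation of the rounds; in the real algorithm, each round's speculative phase runs concurrently across processors):

```python
def speculative_coloring(adj):
    """Round-based speculative coloring. In each round, every pending
    vertex is colored against a stale snapshot of the coloring (as if
    colored concurrently); then conflicts (equal-colored neighbors that
    were colored in the same round) are detected, and the higher-id
    endpoint of each conflicting edge is scheduled for recoloring.
    adj: dict mapping each vertex to a set of neighbors (ids comparable)."""
    color = {}
    pending = set(adj)
    while pending:
        # speculative phase: all pending vertices use the same stale view
        snapshot = dict(color)
        for v in pending:
            forbidden = {snapshot[u] for u in adj[v] if u in snapshot}
            c = 0
            while c in forbidden:
                c += 1
            color[v] = c
        # conflict-detection phase: keep the lower-id endpoint's color
        conflicts = set()
        for v in pending:
            for u in adj[v]:
                if u < v and color.get(u) == color[v]:
                    conflicts.add(v)
        pending = conflicts
    return color
```

The lowest-id vertex in each round can never be scheduled for recoloring, so the pending set shrinks every round and the loop terminates; reducing how often these rounds (and hence inter-processor communication) occur is the speedup lever the paper describes.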
Title: Speeding up Parallel Graph Coloring
Authors: A.H. Gebremedhin, F.Manne and T. Woods
Status: Lecture Notes in Computer Science, vol 3732, pp 1079-1088, 2005, Springer. Proc. of Para 2004, June 20 – 23, 2004, Lyngby, Denmark.
Abstract: This paper presents new efficient parallel algorithms for finding approximate solutions to graph coloring problems. We consider an existing shared memory parallel graph coloring algorithm and suggest several enhancements, both in terms of ordering the vertices so as to minimize cache misses and in performing vertex-to-processor assignments based on graph partitioning instead of random allocation. We report experimental results that demonstrate the performance of our algorithms on an IBM Regatta supercomputer when up to 12 processors are used. Our implementations use OpenMP for parallelization and Metis for graph partitioning. The experiments show that we get up to a 70% reduction in runtime compared to the previous algorithm.
Title: Graph Coloring on Coarse Grained Multicomputers
Authors: A. Gebremedhin, I. Guérin-Lassous, J. Gustedt and J.A. Telle
Status: Discrete Applied Mathematics, Vol 131, No 1, pp 179–198, 2003
Abstract: We present an efficient and scalable Coarse Grained Multicomputer (CGM) coloring algorithm that colors a graph G with at most D+1 colors, where D is the maximum degree in G. The algorithm is given in two variants: a randomized one and a deterministic one. We show that on a p-processor CGM model the proposed algorithms require a parallel time of O(|G|/p) and a total work and overall communication cost of O(|G|). These bounds correspond to the average case for the randomized version and to the worst case for the deterministic variant.
Title: Parallel Distance-k Coloring Algorithms for Numerical Optimization
Authors: A.H. Gebremedhin, F. Manne and A. Pothen
Status: In B. Monien and R. Feldmann (Eds.): EuroPar 2002, Lecture Notes in Computer Science 2400, pp. 912-921, Springer-Verlag 2002.
Abstract: Matrix partitioning problems that arise in the efficient estimation of sparse Jacobians and Hessians can be modeled using variants of graph coloring problems. In a previous work, we argued that distance-2 coloring and distance-3/2 coloring [we now call this star coloring] are robust and flexible formulations of the respective matrix estimation problems. The problem size in large-scale optimization contexts makes the matrix estimation phase an expensive part of the entire computation, both in terms of execution time and memory space. Hence, there is a need for both shared- and distributed-memory parallel algorithms for the stated graph coloring problems. In the current work, we present the first practical shared address space parallel algorithms for these problems. The main idea in our algorithms is to randomly partition the vertex set equally among the available processors, let each processor speculatively color its vertices using information about already colored vertices, detect eventual conflicts in parallel, and finally re-color conflicting vertices sequentially. Randomization is also used in the coloring phases to further reduce conflicts. Our PRAM analysis shows that the algorithms should give almost linear speedup for sparse graphs that are large relative to the number of processors. Experimental results from our OpenMP implementations on a Cray Origin2000 using various large graphs show that the algorithms indeed yield reasonable speedup for modest numbers of processors.
Title: Scalable Parallel Graph Coloring Algorithms
Authors: A. Gebremedhin and F. Manne
Status: Concurrency: Practice and Experience, Vol 12, pp 1131-1146, 2000.
Abstract: Finding a good graph coloring quickly is often a crucial phase in the development of efficient, parallel algorithms for many scientific and engineering applications. In this paper we consider the problem of solving the graph coloring problem itself in parallel. We present a simple and fast parallel graph coloring heuristic that is well suited for shared memory programming and yields an almost linear speedup on the PRAM model. We also present a second heuristic that improves on the number of colors used. The heuristics have been implemented using OpenMP. Experiments conducted on an SGI Cray Origin 2000 supercomputer using very large graphs from finite element methods and eigenvalue computations validate the theoretical run-time analysis.
Title: Graph Coloring on a Coarse Grained Multiprocessor
Authors: A.H. Gebremedhin, I.G. Lassous, J. Gustedt and J.A. Telle
Status: In U. Brandes and D. Wagner (Eds.): WG 2000, Lecture Notes in Computer Science 1928, pp. 184-195, 2000, Springer-Verlag.
Abstract: We present the first efficient parallel coloring algorithm for the Coarse Grain Multicomputer model. The algorithm uses at most D+1 colors, where D is the maximum degree in the graph.
Title: Parallel Graph Coloring Algorithms using OpenMP
Authors: A.H. Gebremedhin and F. Manne
Status: In Proc. of EWOMP’99, First European Workshop on OpenMP, Sept.30 – Oct. 1, 1999, Lund, Sweden.
Abstract: In this paper we make this successful approach feasible for a larger variety of architectures by extending it to the more general setting of coarse grained multiprocessors (CGM), see Dehne et al. (1996). This model of parallel computation makes an abstraction of the interconnection network between the processors of a parallel machine (or network) and tries to capture the efficiency of a parallel algorithm using only a few parameters. Several experiments show that the CGM model is of practical relevance: implementations of algorithms formulated in the CGM model in general turn out to be feasible, portable, predictable and efficient, see Guérin Lassous et al. (2000).
Title: Capitalizing on Live Variables: New Algorithms for Efficient Hessian Computation via Automatic Differentiation
Authors: M. Wang, A.H. Gebremedhin and A. Pothen
Status: Mathematical Programming Computation, pp 1–43, 2016.
Abstract: We revisit an algorithm (called Edge Pushing (EP)) for computing Hessians using Automatic Differentiation (AD) recently proposed by Gower and Mello (2012). Here we give a new, simpler derivation for the EP algorithm based on the notion of live variables from data-flow analysis in compilers and redesign the algorithm with close attention to general applicability and performance. We call this algorithm LIVARH and develop an extension of LIVARH that incorporates preaccumulation to further reduce execution time; the resulting algorithm is called LIVARHACC. We engineer robust implementations for both algorithms LIVARH and LIVARHACC within ADOL-C, a widely-used operator overloading based AD software tool. Rigorous complexity analyses for the algorithms are provided, and the performance of the algorithms is evaluated using a mesh optimization application and several kinds of synthetic functions as testbeds. The results show that the new algorithms outperform state-of-the-art sparse methods (based on sparsity pattern detection, coloring, compressed matrix evaluation, and recovery) in some cases by orders of magnitude. We have made our implementation available online as open-source software and it will be included in a future release of ADOL-C.
Title: ColPack: Software for Graph Coloring and Related Problems in Scientific Computing
Authors: A.H. Gebremedhin, D. Nguyen, M.M.A. Patwary and A. Pothen
Status: ACM Transactions on Mathematical Software, Vol 40, No 1, pp 1-31, 2013.
Abstract: We present a suite of fast and effective algorithms, encapsulated in a software package called ColPack, for a variety of graph coloring and related problems. Many of the coloring problems model partitioning needs arising in compression-based computation of Jacobian and Hessian matrices using automatic differentiation. Several of the coloring problems also find important applications in many areas outside derivative computation, including frequency assignment in wireless networks, scheduling, facility location, and concurrency discovery and data movement operations in parallel and distributed computing. The presentation in this article includes a high-level description of the various coloring algorithms within a common design framework, a detailed treatment of the theory and efficient implementation of known as well as new vertex ordering techniques upon which the coloring algorithms rely, a discussion of the package’s software design, and an illustration of its usage. The article also includes an extensive experimental study of the major algorithms in the package using real-world as well as synthetically generated graphs.
Title: Exploiting Sparsity in Automatic Differentiation on Multicore Architectures
Authors: B. Letschert, K. Kulshreshtha, A. Walther, D. Nguyen, A.H. Gebremedhin and A. Pothen
Status: In S. Forth et al. (Eds.), Recent Advances in Algorithmic Differentiation, Lecture Notes in Computational Science and Engineering 87, DOI 10.1007/978-3-642-30023-3_14, 2012, Springer.
Abstract: We discuss the design, implementation and performance of algorithms suitable for the efficient computation of sparse Jacobian and Hessian matrices using Automatic Differentiation via operator overloading on multicore architectures. The procedure for exploiting sparsity (for runtime and memory efficiency) in serial computation involves a number of steps. Using nonlinear optimization problems as test cases, we show that the algorithms involved in the various steps can be adapted to multithreaded computations.
Title: Implementation of Partial Separability in a Source to Source Transformation AD Tool
Authors: S.H.K. Narayanan, B. Norris, P. Hovland and A. Gebremedhin
Status: In S. Forth et al. (Eds.), Recent Advances in Algorithmic Differentiation, Lecture Notes in Computational Science and Engineering 87, DOI 10.1007/978-3-642-30023-3_31, 2012, Springer.
Abstract: A significant number of large optimization problems exhibit structure known as partial separability, for example, least squares problems, where elemental functions are gathered into groups that are then squared. The sparsity of the Jacobian of a partially separable function can be exploited by computing the smaller Jacobians of the elemental functions and then assembling them into the full Jacobian. We implemented partial separability support in ADIC2 by using pragmas to identify partially separable function values, applying source transformations to subdivide the elemental gradient computations, and using the ColPack coloring toolkit to compress the sparse elemental Jacobians. We present experimental results for an elastic-plastic torsion optimization problem from the MINPACK-2 test suite.
Title: Sparse Jacobian Computation using ADIC2 and ColPack
Authors: S.H.K. Narayanan, B. Norris, P. Hovland, D. Nguyen and A. Gebremedhin
Status: Procedia Computer Science, 4:2115-2123, 2011. Proceedings of the International Conference on Computational Science, ICCS 2011.
Abstract: Many scientific applications benefit from the accurate and efficient computation of derivatives. Automatically generating these derivative computations from an application’s source code offers a competitive alternative to other approaches, such as less accurate numerical approximations or labor-intensive analytical implementations. ADIC2 is a source transformation tool for generating code for computing the derivatives (e.g., Jacobian or Hessian) of a function given the C or C++ implementation of that function. Often the Jacobian or Hessian is sparse and presents the opportunity to greatly reduce storage and computational requirements in the automatically generated derivative computation. ColPack is a tool that compresses structurally independent columns of the Jacobian and Hessian matrices through graph coloring approaches. In this paper, we describe the integration of ColPack coloring capabilities into ADIC2, enabling accurate and efficient sparse Jacobian computations. We present performance results for a case study of a simulated moving bed chromatography application. Overall, the computation of the Jacobian by integrating ADIC2 and ColPack is approximately 40% faster than a comparable implementation that integrates ADOL-C and ColPack when the Jacobian is computed multiple times.
Title: Distributed-memory Parallel Algorithms for Distance-2 Coloring and Related Problems in Derivative Computation
Authors: D. Bozdag, U. Catalyurek, A. Gebremedhin, F. Manne, E. Boman and F. Ozguner
Status: SIAM Journal on Scientific Computing, Vol 32, Issue 4, pp 2418–2446, 2010.
Abstract: The distance-2 graph coloring problem aims at partitioning the vertex set of a graph into the fewest sets consisting of vertices pairwise at distance greater than two from each other. Its applications include derivative computation in numerical optimization and channel assignment in radio networks. We present efficient, distributed-memory, parallel heuristic algorithms for this NP-hard problem as well as for two related problems used in the computation of Jacobians and Hessians. Parallel speedup is achieved through graph partitioning, speculative (iterative) coloring, and a BSP-like organization of parallel computation. Results from experiments conducted on a PC cluster employing up to 96 processors and using large-size real-world as well as synthetically generated test graphs show that the algorithms are scalable. In terms of quality of solution, the algorithms perform remarkably well: the number of colors used by the parallel algorithms was observed to be very close to the number used by the sequential counterparts, which in turn are quite often near optimal. Moreover, the experimental results show that the parallel distance-2 coloring algorithm compares favorably with the alternative approach of solving the distance-2 coloring problem on a graph G by first constructing the square graph G^2 and then applying a parallel distance-1 coloring algorithm on G^2. Implementations of the algorithms are made available via the Zoltan load-balancing library.
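The serial greedy baseline on which such parallel heuristics are built can be sketched in a few lines (a simplified sequential illustration, not the paper's distributed-memory implementation): color vertices one at a time, forbidding the colors of all vertices within two hops.

```python
def distance2_coloring(adj):
    """Greedy distance-2 coloring.
    adj: dict mapping vertex -> set of neighbors.
    Returns dict mapping vertex -> color (nonnegative int)."""
    color = {}
    for v in adj:
        forbidden = set()
        for u in adj[v]:                 # distance-1 neighbors
            if u in color:
                forbidden.add(color[u])
            for w in adj[u]:             # distance-2 neighbors
                if w != v and w in color:
                    forbidden.add(color[w])
        c = 0                            # smallest color not forbidden
        while c in forbidden:
            c += 1
        color[v] = c
    return color

# Path graph 0-1-2-3: any two vertices within two hops get distinct colors.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
color = distance2_coloring(adj)
```

The parallel algorithms in the paper partition the graph, color speculatively, and then detect and resolve conflicts in rounds; the greedy rule above is what each round applies locally.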
Title: Efficient Computation of Sparse Hessians Using Coloring and Automatic Differentiation
Authors: A. Gebremedhin, A. Pothen, A. Tarafdar, and A. Walther
Status: INFORMS Journal on Computing, Vol 21, No 2, pp 209-223, 2009.
Abstract: The computation of a sparse Hessian matrix H using automatic differentiation (AD) can be made efficient using the following four-step procedure.
- Determine the sparsity structure of H.
- Obtain a seed matrix S that defines a column partition of H using a specialized coloring on the adjacency graph of H.
- Compute the compressed Hessian matrix B=HS using AD.
- Recover the numerical values of the entries of H from B.
The coloring variant used in the second step depends on whether the recovery in the fourth step is direct or indirect: a direct method uses star coloring and an indirect method uses acyclic coloring. In earlier work, we designed and implemented effective heuristic algorithms for these two NP-hard coloring problems. We have since integrated part of the developed software with the AD tool ADOL-C, which has recently acquired a sparsity detection capability. In this paper, we provide a detailed description and analysis of the recovery algorithms, and experimentally demonstrate the efficacy of the coloring techniques in the overall process of computing the Hessian of a given function, using ADOL-C as an example of an AD tool. We also present new analytical results on star and acyclic coloring of chordal graphs. The experimental results show that sparsity exploitation via coloring yields enormous savings in runtime and makes the computation of Hessians of very large size feasible. The results also show that evaluating a Hessian via an indirect method is often faster than a direct evaluation. This speedup is achieved without compromising numerical accuracy.
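The four-step compression-and-recovery pipeline can be made concrete on a toy example (our own NumPy sketch, using a tridiagonal pattern whose columns can be grouped by a simple modular rule rather than a real star coloring):

```python
import numpy as np

# Toy sparse symmetric "Hessian": 6x6 tridiagonal.
n = 6
H = np.zeros((n, n))
for i in range(n):
    H[i, i] = 2.0
    if i + 1 < n:
        H[i, i + 1] = H[i + 1, i] = -1.0

# Steps 1-2: for a tridiagonal pattern, columns whose indices agree
# mod 3 never share a row nonzero, so 3 color groups suffice.
colors = [i % 3 for i in range(n)]
ncolors = 3
S = np.zeros((n, ncolors))          # seed matrix from the column partition
for j, c in enumerate(colors):
    S[j, c] = 1.0

# Step 3: compressed Hessian B = H*S. In AD this costs ncolors
# Hessian-vector products instead of n.
B = H @ S

# Step 4: direct recovery -- each nonzero H[i, j] is read off
# unambiguously from B[i, colors[j]].
H_rec = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if H[i, j] != 0.0:
            H_rec[i, j] = B[i, colors[j]]
```

Real Hessians need the star (direct) or acyclic (indirect) colorings described in the paper to guarantee this recovery; the sketch only shows why compressing n columns into a few groups pays off.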
Title: Exploiting Sparsity in Jacobian Computation via Coloring and Automatic Differentiation: A Case Study in a Simulated Moving Bed Process
Authors: A. Gebremedhin, A. Pothen and A. Walther
Status: In C. Bischof et al. (Eds): Proc. of AD2008, Lecture Notes in Computational Science and Engineering 64, pp 339-349, 2008, Springer.
Abstract: Using a model from a chromatographic separation process in chemical engineering, we demonstrate that large, sparse Jacobians of fairly complex structure can be computed accurately and efficiently by using automatic differentiation (AD) in combination with a four-step procedure involving matrix compression and de-compression. For the detection of the sparsity pattern (step 1), we employ a new operator-overloading-based implementation of a technique that relies on propagation of index domains. To obtain the seed matrix to be used for compression (step 2), we use a distance-2 coloring of the bipartite graph representation of the Jacobian. The compressed Jacobian is computed using the vector forward mode of AD (step 3). A simple routine is used to directly recover the entries of the Jacobian from the compressed representation (step 4). Experimental results using ADOL-C show that the runtime of each of these steps is in complete agreement with theoretical analysis, and the total runtime is found to be only about a hundred times the time needed for evaluating the function itself. The alternative approach of computing the Jacobian without exploiting sparsity is infeasible.
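The sparsity-detection technique of step 1, propagation of index domains, can be sketched with a tiny operator-overloading tracer (a hypothetical illustration, far simpler than the ADOL-C implementation): every value carries the set of independent variables it depends on.

```python
class Tracer:
    """Value paired with its index domain: the set of inputs it depends on."""
    def __init__(self, val, deps):
        self.val = val
        self.deps = frozenset(deps)
    def __add__(self, other):
        return Tracer(self.val + other.val, self.deps | other.deps)
    def __mul__(self, other):
        return Tracer(self.val * other.val, self.deps | other.deps)

def jacobian_pattern(f, n):
    """Run f on tracers seeded with singleton index domains; the output
    domains are the nonzero column indices of each Jacobian row."""
    xs = [Tracer(1.0, {j}) for j in range(n)]
    ys = f(xs)
    return [sorted(y.deps) for y in ys]

# Hypothetical function with a banded Jacobian pattern.
pattern = jacobian_pattern(
    lambda x: [x[0] * x[1], x[1] + x[2], x[2] * x[3]], 4)
```

Once the pattern is known, step 2 colors the column vertices of the Jacobian's bipartite graph so that columns sharing a row nonzero get different colors, yielding the seed matrix for the compressed evaluation.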
Title: New Acyclic and Star Coloring Algorithms with Applications to Hessian Computation
Authors: A. Gebremedhin, A. Tarafdar, F. Manne, and A. Pothen
Status: SIAM Journal on Scientific Computing, Vol 29, No 3, pp 1042–1072, 2007.
Abstract: Acyclic and star coloring problems are specialized vertex coloring problems that arise in the efficient computation of Hessians using automatic differentiation or finite differencing, when both sparsity and symmetry are exploited. We present an algorithmic paradigm for finding heuristic solutions for these two NP-hard problems. The underlying common technique is the exploitation of the structure of two-colored induced subgraphs. For a graph G on n vertices and m edges, the time complexity of our star coloring algorithm is O(n d_2), where d_k, a generalization of vertex degree, denotes the average number of distinct paths of length at most k edges starting at a vertex in G. The time complexity of our acyclic coloring algorithm is larger by a multiplicative factor involving the inverse of Ackermann's function. The space complexity of both algorithms is O(m). To the best of our knowledge, our work presents the first practical algorithm for the acyclic coloring problem. For the star coloring problem, our algorithm uses fewer colors and is considerably faster than a previously known O(n d_3) time algorithm. Computational results from experiments on various large-size test graphs demonstrate that the algorithms are fast and produce highly effective solutions. The use of these algorithms in Hessian computation is expected to reduce overall runtime drastically.
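The defining condition on two-colored induced subgraphs can be made concrete with a small checker (our own illustration, not from the paper): a proper coloring is a star coloring exactly when every path on four vertices receives at least three colors, so each two-colored induced subgraph is a union of stars.

```python
def is_star_coloring(adj, color):
    """adj: dict vertex -> set of neighbors; color: dict vertex -> color."""
    # A star coloring is first a proper (distance-1) coloring...
    for a in adj:
        for b in adj[a]:
            if color[a] == color[b]:
                return False
    # ...in which no path on four vertices is colored with only two colors.
    for a in adj:
        for b in adj[a]:
            for c in adj[b]:
                if c == a:
                    continue
                for d in adj[c]:
                    if d in (a, b):
                        continue
                    if len({color[a], color[b], color[c], color[d]}) <= 2:
                        return False
    return True

# On the path 0-1-2-3, alternating two colors is proper but not a star
# coloring; introducing a third color on the path repairs it.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
bad = is_star_coloring(path, {0: 0, 1: 1, 2: 0, 3: 1})    # False
good = is_star_coloring(path, {0: 0, 1: 1, 2: 2, 3: 0})   # True
```

The paper's heuristics construct such colorings directly by examining two-colored induced subgraphs as they color, rather than checking the condition after the fact as this sketch does.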
Title: What Color Is Your Jacobian? Graph Coloring for Computing Derivatives
Authors: A. Gebremedhin, F. Manne, and A. Pothen
Status: SIAM Review, Vol 47, No 4, pp 629–705, 2005.
Abstract: Graph coloring has been employed since the 1980s to efficiently compute sparse Jacobian and Hessian matrices using either finite differences or automatic differentiation. Several coloring problems occur in this context, depending on whether the matrix is a Jacobian or a Hessian, and the specifics of the computational techniques employed. We consider eight variant vertex coloring problems here. This article begins with a gentle introduction to the problem of computing a sparse Jacobian, followed by an overview of the historical development of the research area. Then we present a unifying framework for the graph models of the variant matrix estimation problems. The framework is based upon the viewpoint that a partition of a matrix into structurally orthogonal groups of columns corresponds to a distance-2 coloring of an appropriate graph representation. The unified framework helps integrate earlier work and leads to fresh insights; enables the design of more efficient algorithms for many problems; leads to new algorithms for others; and eases the task of building graph models for new problems. We report computational results on two of the coloring problems to support our claims. Most of the methods for these problems treat a column or a row of a matrix as an atomic entity, and partition the columns or rows (or both). A brief review of methods that do not fit these criteria is provided. We also discuss results in discrete mathematics and theoretical computer science that intersect with the topics considered here.