This section provides access to publications associated with my research projects. A list of paper titles organized by subject follows. To see the abstract (and citation) for a paper, click on its title.

Most abstracts include a document icon. Clicking that icon retrieves the PostScript, PDF, or Word form of the paper for viewing and optional printing.

Books & Book Chapters
> Switching to Renewable Energy Is Prohibitively Expensive, chapter in Global Warming (Introducing Issues with Opposing Viewpoints), Gale Cengage Learning, Lauri S. Scherer (Ed.), Greenhaven Press, 2012.

> The Energy Gap: How to Solve the World Energy Crisis, Preserve the Environment & Save Civilization, Doug L. Hoffman and Allen Simmons, The Resilient Earth Press, July 2010.


> Performance Modeling of Enterprise Grid Systems, chapter in Data Engineering: Mining, Information, and Intelligence, International Series in Operations Research & Management Science, Vol. 132, Chan, Yupo; Talburt, John; Talley, Terry M. (Eds.), Springer, 2009.

> The Resilient Earth: Science, Global Warming, and the Future of Humanity, Doug L. Hoffman and Allen Simmons, Booksurge Publishing, October 2008.

Published Papers
> An Empirical Study on Forecasting using Decomposed Arrival Data of an Enterprise Computing System, April, 2012.

> A Forecasting Capability Study of Empirical Mode Decomposition for the Arrival Time of a Parallel Batch System, April, 2010.

> Modeling and Simulation of HPC Systems Through Job Scheduling Analysis, April, 2010.

> Fairshare Scheduling – A Case Study, March, 2010.

> Application of Empirical Mode Decomposition to the Arrival Time Characterization of a Parallel Batch System Using System Logs, September, 2009.

> Capacity Planning of a Commodity Cluster in an Academic Environment: A Case Study, April, 2008.

> A Case Study on Grid Performance Modeling, November, 2006.

> Initial Starting Point Analysis for K-Means Clustering: A Case Study, March, 2006.

> Adaptive Automatic Grid Reconfiguration Using Workload Phase Identification, December, 2005.

> Comparison of Protein Structures by Transformation into Dihedral Angle Sequences, August, 1996.

> BioSCAN: A Dynamically Reconfigurable Systolic Array for Biosequence Analysis, June, 1996.

> BioSCAN: A Network Sharable Computational Resource for Searching Biosequence Databases, March, 1996.

> Rapid Protein Structure Classification using One-dimensional Structure Profiles on the BioSCAN Parallel Computer, October, 1995.

> Pseudotorsional OCCO backbone angle as a single descriptor of protein secondary structure, May, 1995.

> A Scalable Systolic Multiprocessor System for Analysis of Biological Sequences, March, 1993.

Technical Notes
> Design of the BioSCAN Server Software, April, 1993.

> A Comparison of the BioSCAN Algorithm on Multiple Architectures, May, 1993.

> A Computer Architecture for Fast Approximate Pattern Matching, April, 1993.

> UnCvL: The University of North Carolina C Vector Library, May, 1993.


Switching to Renewable Energy Is Prohibitively Expensive, D. L. Hoffman, in Global Warming (Introducing Issues with Opposing Viewpoints), Gale Cengage Learning, Lauri S. Scherer (Ed.), Greenhaven Press, 2012.

ABSTRACT:

Green advocates and climate change alarmists alike insist that the world shift to using only non-polluting, renewable energy sources, and the sooner the better. What is seldom mentioned is the enormous cost of retooling the world's energy infrastructure to use intermittent, unreliable wind and solar energy.


An Empirical Study on Forecasting using Decomposed Arrival Data of an Enterprise Computing System, Linh Ngo, Amy Apon, and Doug Hoffman, In Proceedings of the 9th International Conference on Information Technology: New Generations, April, 2012.

ABSTRACT:

Building on earlier work on the forecasting potential of empirical mode decomposition (EMD), this research combines several well-known forecasting techniques with EMD to investigate the tradeoffs of EMD's decomposition (sifting) step for forecasting the arrival workload of an enterprise cluster. Results show that EMD helps to improve forecasting accuracy. Parallelization is used to perform an extensive investigation across the full range of data. Future work will increase the statistical confidence in the level of improvement possible when EMD is used as a decomposition method for forecasting.
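
For readers curious how the decompose-then-forecast idea fits together, here is a minimal sketch. It is not the code used in the paper: it assumes the third-party PyEMD package for the sifting step and substitutes a simple seasonal-naive forecaster for each intrinsic mode function (IMF).

```python
# Illustrative sketch of a decompose-then-forecast workflow (not the paper's code).
# Assumes the PyEMD package (pip install EMD-signal) for the sifting step and a
# seasonal-naive forecaster per component; both choices are placeholders.
import numpy as np
from PyEMD import EMD  # assumption: third-party EMD implementation

def seasonal_naive_forecast(series, horizon, period=24):
    """Repeat the last observed period as the forecast (simple stand-in model)."""
    last_cycle = series[-period:]
    reps = int(np.ceil(horizon / period))
    return np.tile(last_cycle, reps)[:horizon]

def emd_forecast(arrivals, horizon=24):
    """Decompose an hourly arrival-count series into IMFs, forecast each one
    separately, and sum the component forecasts."""
    imfs = EMD()(np.asarray(arrivals, dtype=float))  # sifting step
    component_forecasts = [seasonal_naive_forecast(imf, horizon) for imf in imfs]
    return np.sum(component_forecasts, axis=0)

# Example: a synthetic arrival series with a daily cycle plus noise.
hours = np.arange(24 * 14)
arrivals = 50 + 20 * np.sin(2 * np.pi * hours / 24) + np.random.poisson(5, hours.size)
print(emd_forecast(arrivals, horizon=24))
```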


The Energy Gap: How to Solve the World Energy Crisis, Preserve the Environment & Save Civilization, Doug L. Hoffman and Allen Simmons, The Resilient Earth Press, July 2010.

ABSTRACT:

Humans have a trait that distinguishes us from all other species: the ability to use fire. We turn on a switch and light comes into our homes. With the turn of a key, vehicles take us where we want to go. We adjust a thermostat in our homes to make us warm or cool. These are everyday events we hardly think about. It took centuries of vision, science and engineering to achieve this comfort-point in our long evolutionary journey. Today, an average person lives better than kings lived several centuries ago. As we revealed the facts behind global warming in our last book, The Resilient Earth, we take the same tack in our latest work, The Energy Gap. In its pages, we present the hard science and engineering that will close a looming energy gap for our country and the world. There is also a warning. If we choose the political route, the activist route, the human race will slide backwards for the first time since the Industrial Revolution. If we choose the correct path, as revealed in The Energy Gap, our species will continue its forward march towards a brighter future for all on Earth.


A Forecasting Capability Study of Empirical Mode Decomposition for the Arrival Time of a Parallel Batch System, Linh Ngo, Amy Apon, and Doug Hoffman, In Proceedings of the 7th International Conference on Information Technology: New Generations, April, 2010.

ABSTRACT:

This paper demonstrates the feasibility and potential of applying empirical mode decomposition (EMD) to forecast the arrival time behaviors in a parallel batch system. An analysis of the workload records shows the existence of daily and weekly patterns within the workload. Results show that the intrinsic mode functions (IMF), products of the sifting/decomposition process of EMD, produce a better prediction than the original arrival histogram when used in a simple weight-matching prediction technique. Promising applications include the implementation of an EMD/neural network combination.


Modeling and Simulation of HPC Systems Through Job Scheduling Analysis, W. B. Hurst, S. Ramaswamy, R. B. Lenin and D. Hoffman, In Proceedings of ALAR 2010 Conference on Applied Research in Information Technology, April, 2010.

ABSTRACT:

Simulation is a key component in research on High Performance Cluster (HPC) systems. This paper presents a comparative analysis of performance characteristics observed in the operation of an “active” HPC system and a “simulated” HPC system.


Fairshare Scheduling – A Case Study, Hung Bui, Wesley Emeneker, Amy Apon, Doug Hoffman and Larry Dowdy, In Proceedings of the 11th LCI International Conference on High-Performance Clustered Computing, March, 2010.

ABSTRACT:

Scheduling and resource management are important in optimizing multiprocessor cluster resource allocation. Resources must be multiplexed to service requests of varied importance, and the policy chosen to manage this multiplexing can have an enormous impact on throughput and response time. Fairshare scheduling is a way to manage application performance by dynamically allocating shares of system resources among competing users. The primary objective of this paper is to present an in-depth case study of fairshare scheduling. In this case study, an in-depth sensitivity analysis of the various tunable parameters in fairshare scheduling techniques is provided. The starting points for the study are scheduler log files collected from two production systems, one a production industry cluster and the second a university cluster. The approach to the case study is in two parts. First, using well-known techniques in the field, workload models for the two different environments are built and analyzed. Second, after the models are developed, they are presented to a fairshare scheduler under what-if scenarios. The experimental results are examined to evaluate the performance of fairshare scheduling.
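
To make the fairshare idea above concrete, here is a minimal sketch of a share-based priority calculation. The decay factor, target shares, and priority formula are generic placeholders rather than the parameters of the production schedulers studied in the paper.

```python
# Minimal sketch of a fairshare priority calculation (generic, not the paper's scheduler).
from dataclasses import dataclass

@dataclass
class UserShare:
    target: float               # fraction of the machine this user is entitled to
    decayed_usage: float = 0.0  # exponentially decayed CPU-hours consumed

def record_usage(user: UserShare, cpu_hours: float, decay: float = 0.9) -> None:
    """Fold new usage into the decayed history; older usage counts for less."""
    user.decayed_usage = decay * user.decayed_usage + cpu_hours

def fairshare_priority(user: UserShare, total_usage: float) -> float:
    """Higher priority for users who have consumed less than their target share."""
    actual = user.decayed_usage / total_usage if total_usage > 0 else 0.0
    return user.target - actual  # positive => under-served, schedule sooner

# Example: two users with equal targets but unequal recent usage.
alice, bob = UserShare(target=0.5), UserShare(target=0.5)
record_usage(alice, 80.0)
record_usage(bob, 20.0)
total = alice.decayed_usage + bob.decayed_usage
print(fairshare_priority(alice, total), fairshare_priority(bob, total))
```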


Performance Modeling of Enterprise Grid Systems, D. L. Hoffman, A. Apon, L. Dowdy, B. Lu, et al., in Data Engineering: Mining, Information, and Intelligence, International Series in Operations Research & Management Science, Vol. 132, Chan, Yupo; Talburt, John; Talley, Terry M. (Eds.), Springer, 2009.

ABSTRACT:

Modeling has long been recognized as an invaluable tool for predicting the performance behavior of computer systems. Modeling software, both commercial and open source, is widely used as a guide for the development of new systems and the upgrading of existing ones. Unfortunately, no set of comprehensive tools exists for modeling complex distributed computing environments such as the ones found in emerging grid deployments. This chapter addresses concepts, methodologies, and tools that are useful when designing, implementing, and tuning performance in grid and cluster environments.


Application of Empirical Mode Decomposition to the Arrival Time Characterization of a Parallel Batch System Using System Logs, Linh Ngo, Baochuan Lu, Hung Bui, Amy Apon, Nathan Hamm, Larry Dowdy, Doug Hoffman and Denny Brewer, In Proceedings of the 2009 International Conference on Modeling, Simulation, and Visualization Methods, July, 2009.

ABSTRACT:

Traditionally, workload models of large-scale production computer clusters are created from system logs for the purpose of analyzing and predicting the performance of these systems. Such logs are often large, complex, and unwieldy. For conciseness, the system log can be approximated by finding a hyperexponential distribution that captures the workload dynamics as closely as possible. Using this technique, the workload model is able to match closely the global statistical measurements of the original system log. However, using a hyperexponential distribution to synthetically regenerate job arrival times in a simulation model does not capture the realistic randomness of bursts of arrivals in the original log. In this paper, a new workload modeling method based on Empirical Mode Decomposition (EMD) is described. EMD provides a compromise between the full complexity of the original log data and a simple hyperexponential representation. Likewise, the EMD approach provides a compromise between the accuracy associated with the log data and the coarse approximation using the hyperexponential representation. The tradeoff of using an EMD approach can be effective in certain performance modeling studies.
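
The hyperexponential approximation mentioned above can be illustrated with a few lines of code: interarrival times are drawn from a mixture of exponential distributions, which can match aggregate statistics such as the mean and variance while, as noted, smoothing away the bursts seen in real logs. The branch probabilities and rates below are invented example values, not fitted parameters.

```python
# Sketch: synthetic job interarrival times from a two-branch hyperexponential
# distribution. The probabilities and rates are illustrative, not fitted values.
import numpy as np

def hyperexponential_interarrivals(n, probs, rates, rng=None):
    """Draw n interarrival times from a mixture of exponential distributions."""
    rng = rng or np.random.default_rng()
    branch = rng.choice(len(probs), size=n, p=probs)          # pick a branch per job
    return rng.exponential(1.0 / np.asarray(rates)[branch])   # sample its exponential

# Example: mostly short gaps (rate 2 jobs/min) with occasional long lulls (rate 0.05).
gaps = hyperexponential_interarrivals(10_000, probs=[0.9, 0.1], rates=[2.0, 0.05])
arrival_times = np.cumsum(gaps)
print(f"mean gap: {gaps.mean():.2f} min, cv^2: {(gaps.std() / gaps.mean())**2:.2f}")
```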


The Resilient Earth: Science, Global Warming, and the Future of Humanity, Doug L. Hoffman and Allen Simmons, Booksurge Publishing, October 2008.

ABSTRACT:

A million years after the birth of our sun, the violent explosion of a nearby supernova nearly ended life on Earth before it began. Over the next four and a half billion years, forces of nature shaped our planet and the life it harbored. Barely surviving the traumatic birth of the Moon, buffeted by supernovae, and bombarded by asteroids, the resilient Earth endured. And despite planet-freezing ice ages, devastating mass extinctions, and ever changing climate, life not only survived, it thrived. Today, we are told all life on Earth is threatened by a new peril--human-caused global warming. The Resilient Earth presents the science behind global warming for a general audience, separating fact from fiction and truth from exaggeration.


Capacity Planning of a Commodity Cluster in an Academic Environment: A Case Study, Baochuan Lu, Linh Ngo, Hung Bui, Amy Apon, Nathan Hamm, Larry Dowdy, Doug Hoffman and Denny Brewer, 9th LCI International Conference on High-Performance Clustered Computing, April 2008.

ABSTRACT:

In this paper, the design of a simulation model for evaluating two alternative supercomputer configurations in an academic environment is presented. The workload is analyzed and modeled, and its effect on the relative performance of both systems is studied. The Integrated Capacity Planning Environment (ICPE) toolkit, developed for commodity cluster capacity planning, is successfully applied to the target environment. The ICPE is a tool for workload modeling, simulation modeling, and what-if analysis. A new characterization strategy is applied to the workload to more accurately model commodity cluster workloads. Through “what-if” analysis, the sensitivity of the baseline system performance to workload change, and also the relative performance of the two proposed alternative systems, are compared and evaluated. This case study demonstrates the usefulness of the methodology and the applicability of the tools in gauging system capacity and making design decisions.


A Case Study on Grid Performance Modeling, B. Lu, A. Apon, L. Dowdy, F. Robinson, D. Hoffman, and D. Brewer, International Conference on Parallel and Distributed Computing Systems, November 13, 2006.

ABSTRACT:

The purpose of this case study is to develop a performance model for an enterprise grid for performance management and capacity planning. The target environment includes grid applications such as health-care and financial services, where the data is located primarily within the resources of a worldwide corporation. The approach is to build a discrete event simulation model for a representative work-flow grid. Five work-flow classes, found using a customized k-means clustering algorithm, characterize the workload of the grid. Analyzing the gap between the simulation and measurement data validates the model. The case study demonstrates that the simulation model can be used to predict the grid system performance given a workload forecast. The model is also used to evaluate alternative scheduling strategies. The simulation model is flexible and easily incorporates several system details.
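
As a rough illustration of what a discrete event simulation of such a work-flow grid looks like, the sketch below uses the SimPy library to model jobs from a few classes competing for a fixed pool of grid nodes. The class mix, arrival rate, service times, and node count are all invented for illustration and are unrelated to the measured workload in the paper.

```python
# Toy discrete-event model of jobs competing for grid nodes (illustrative only;
# SimPy is assumed as the simulation engine, and all parameters are invented).
import random
import simpy

NODES, SIM_TIME = 8, 1_000
JOB_CLASSES = {"short": (0.6, 5.0), "medium": (0.3, 20.0), "long": (0.1, 60.0)}
response_times = []

def job(env, grid, service_time):
    arrived = env.now
    with grid.request() as req:          # wait for a free node
        yield req
        yield env.timeout(service_time)  # hold the node for the service time
    response_times.append(env.now - arrived)

def workload(env, grid):
    specs = list(JOB_CLASSES.values())
    weights = [p for p, _ in specs]
    while True:
        yield env.timeout(random.expovariate(1.0))   # Poisson arrivals
        _, mean_service = random.choices(specs, weights)[0]
        env.process(job(env, grid, random.expovariate(1.0 / mean_service)))

env = simpy.Environment()
grid = simpy.Resource(env, capacity=NODES)
env.process(workload(env, grid))
env.run(until=SIM_TIME)
print(f"jobs finished: {len(response_times)}, "
      f"mean response: {sum(response_times) / len(response_times):.1f}")
```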


Initial Starting Point Analysis for K-Means Clustering: A Case Study, F. Robinson, A. Apon, D. Brewer, L. Dowdy, D. Hoffman, B. Lu, Proceedings of ALAR 2006 Conference on Applied Research in Information Technology, March, 2006.

ABSTRACT:

Workload characterization is an important part of systems performance modeling. Clustering is a method used to find classes of jobs within workloads. K-means is one of the most popular clustering algorithms. Initial starting point values are needed as input parameters when performing k-means clustering. This paper shows that the results of running the k-means algorithm on the same workload will vary depending on the values chosen as initial starting points. Fourteen methods of composing initial starting point values are compared in a case study. The results indicate that a synthetic method, scrambled midpoints, is an effective starting point method for k-means clustering.
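
The sensitivity to initial starting points can be demonstrated with a small experiment like the one sketched below, which runs k-means on synthetic job features under two different initializations. The "scrambled midpoints" initializer shown is only one plausible reading of the idea, not the paper's exact definition.

```python
# Sketch showing that k-means results depend on the initial centers (the paper's
# central observation). The "scrambled midpoints"-style initializer below is a
# loose interpretation, not the paper's exact method.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
jobs = np.vstack([rng.normal(loc, 0.5, size=(200, 2))        # synthetic job features
                  for loc in ([0, 0], [4, 0], [2, 3], [6, 3], [4, 6])])

def scrambled_midpoint_init(data, k, rng):
    """Per dimension, take the midpoints of k equal-width intervals, then shuffle
    the midpoints independently in each dimension (one reading of the idea)."""
    lo, hi = data.min(axis=0), data.max(axis=0)
    fractions = (np.arange(k) + 0.5) / k               # k midpoint positions in [0, 1]
    centers = lo + (hi - lo) * fractions[:, None]      # shape (k, n_features)
    for d in range(data.shape[1]):
        rng.shuffle(centers[:, d])                     # scramble per dimension
    return centers

for name, init in [("random points", "random"),
                   ("scrambled midpoints", scrambled_midpoint_init(jobs, 5, rng))]:
    km = KMeans(n_clusters=5, init=init, n_init=1, random_state=0).fit(jobs)
    print(f"{name:20s} inertia = {km.inertia_:.1f}")
```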


Adaptive Automatic Grid Reconfiguration Using Workload Phase Identification, B. Lu, M. Tinker, A. Apon, D. Hoffman, and L. Dowdy, Proceedings of EScience 2005, December, 2005.

ABSTRACT:

The purpose of this study is to develop an adaptive model of a very large scale data processing and storage environment. The target environment includes grid applications such as health-care and finance in which the data may be located primarily within the resources of a worldwide corporation. The approach is to use phase identification techniques that can detect over-utilized grid resources, and then to make dynamic decisions to reassign additional resources to that portion of the application processing. Two phase identification techniques are proposed, a variation technique and a real-time threshold-based technique. The techniques are validated with a simulation model and a case study using measured data from a production grid environment. The case study demonstrates that phase identification techniques can be used as the intelligent component of a reactive mechanism for a grid to adapt to changing environmental conditions by dynamic automatic reconfiguration. Results show that threshold-based phase identification techniques combined with dynamic resource allocation capabilities are effective in alleviating performance hot spots and improving response time in a large-scale data grid.
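
A threshold-based phase detector of the general kind described above can be sketched in a few lines; the moving-average window, threshold, and utilization trace below are placeholder choices, not the settings used in the study.

```python
# Sketch of a simple threshold-based phase detector over a resource-utilization
# trace (illustrative; window size and threshold are arbitrary placeholder values).
import numpy as np

def detect_hot_phases(utilization, window=12, threshold=0.85):
    """Return (start, end) index pairs where the moving-average utilization stays
    above the threshold, i.e. candidate intervals for assigning extra resources."""
    util = np.asarray(utilization, dtype=float)
    smoothed = np.convolve(util, np.ones(window) / window, mode="same")
    hot = smoothed > threshold
    phases, start = [], None
    for i, flag in enumerate(hot):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            phases.append((start, i))
            start = None
    if start is not None:
        phases.append((start, len(hot)))
    return phases

# Example: a utilization trace with a sustained hot spot in the middle.
trace = np.concatenate([np.full(50, 0.5), np.full(30, 0.95), np.full(50, 0.4)])
print(detect_hot_phases(trace))
```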


Comparison of Protein Structures by Transformation into Dihedral Angle Sequences, D. L. Hoffman, PhD dissertation, University of North Carolina at Chapel Hill, 1996.

ABSTRACT:

Proteins are large complex organic molecules that are essential to the existence of life. Decades of study have revealed that proteins having different sequences of amino acids can possess very similar three-dimensional structures. To date, protein structure comparison methods have been accurate but costly in terms of computer time. This dissertation presents a new method for comparing protein structures using dihedral transformations. Atomic XYZ coordinates are transformed into a sequence of dihedral angles, which is then transformed into a sequence of dihedral sectors. Alignment of two sequences of dihedral sectors reveals similarities between the original protein structures. Experiments have shown that this method detects structural similarities between sequences with less than 20% amino acid sequence identity, finding structural similarities that would not have been detected using amino acid alignment techniques. Comparisons can be performed in seconds that had previously taken minutes or hours.
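
The first step of the transformation described above, turning atomic XYZ coordinates into dihedral angles, can be illustrated with a standard four-atom torsion calculation. This is a generic numerical sketch, not code from the dissertation.

```python
# Sketch: computing the dihedral (torsion) angle defined by four atoms from their
# XYZ coordinates -- the first step of the coordinates-to-angle-sequence transform.
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b0, b1, b2 = p0 - p1, p2 - p1, p3 - p2
    b1 /= np.linalg.norm(b1)
    # Project b0 and b2 onto the plane perpendicular to b1.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.degrees(np.arctan2(y, x))

# Example: four points of a backbone fragment (made-up coordinates).
atoms = [np.array(p, dtype=float) for p in
         [(0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, -0.7, 0.7)]]
print(f"{dihedral(*atoms):.1f} degrees")
```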


BioSCAN: A Dynamically Reconfigurable Systolic Array for Biosequence Analysis, Raj K. Singh, W. D. Dettloff, V. L. Chi, D. L. Hoffman, S. G. Tell, C. T. White, S. F. Altschul, and B. W. Erickson, Proc. of CERCS96, National Science Foundation, Arlington, VA, June 22-24, 1996.

ABSTRACT:

We describe the design, implementation, and deployment via the Internet of BioSCAN, an application-specific computer system for the rapid determination of statistically significant alignments of biopolymer (DNA, RNA, protein) sequences. BioSCAN continues to outperform other systems designed to perform this basic task of molecular biology, which continues to grow in magnitude and importance. The BioSCAN system is hosted by a general-purpose workstation containing a special-purpose hardware engine that accelerates the core algorithm for comparing two biosequences. Careful partitioning of the computational tasks between hardware and software provides not only high performance but also programmability. The BioSCAN system can compare a sequence of up to 12,992 characters with an arbitrarily large database containing arbitrarily long sequences at a rate of 2 million database characters per second. This rate is nearly 1,000 times greater than the rate achieved by a state-of-the-art workstation using software alone. This network-sharable computational resource is accessible interactively via the World Wide Web using Mosaic, Netscape or other client software.


BioSCAN: A Network-Sharable Computational Resource for Searching Biosequence Databases, Raj K. Singh, D. L. Hoffman, S. G. Tell, and C. T. White, Computer Applications in the Biosciences, Vol. 12, No. 3, 1996, pp. 191-196.

ABSTRACT:

We describe a network sharable, interactive computational tool for rapid and sensitive search and analysis of biomolecular sequence databases such as GenBank, GenPept, Protein Identification Resource, and SWISS-PROT. The resource is accessible via the World Wide Web using popular client software such as Mosaic and Netscape. The client software is freely available on a number of computing platforms including Macintosh, IBM-PC, and Unix workstations.


Rapid Protein Structure Classification using One-dimensional Structure Profiles on the BioSCAN Parallel Computer, D. L. Hoffman, S. Laiter, Raj K. Singh, I. I. Vaisman, and A. Tropsha, Computer Applications in the Biosciences, Vol. 11, No. 6, 1995, pp. 675-679.

ABSTRACT:

Rapid growth of the protein structure database in recent years requires an effective approach for objective comparison and classification of deposited protein structures. We describe a novel method for structure comparison and classification based on the alignment of one-dimensional structure profiles. These profiles are obtained by calculating the OCCO pseudodihedral angles (formed by O-C-C-O atoms of carbonyl groups of consecutive amino acid residues) from protein three-dimensional coordinates. These angle measurements are then converted into a 24-letter alphabet, and the protein structures are represented by sequences of letters from this alphabet. The BioSCAN parallel computer, designed for primary sequence alignment, is used to rapidly align and classify these one-dimensional structure profiles. We have developed and implemented a weighted scoring matrix to identify structural classes based on commonly found structural motifs. The results of our experiments are in good agreement with traditional protein structure classification schemes. One-dimensional structure profiles significantly improve the efficiency of structure comparison and classification.


Pseudotorsional OCCO backbone angle as a single descriptor of protein secondary structure, Sergei Laiter, Doug L. Hoffman, Raj K. Singh, Iosif I. Vaisman, and Alexander Tropsha, Protein Science, Volume 4, Issue 8, 1995, pp. 1633-1643.

ABSTRACT:

Protein secondary structure is conventionally identified using characteristic ranges of two backbone torsional angles φ and ψ. We suggest that the secondary structure can be adequately characterized by a single descriptor, the Oi-1Ci-1CiOi (where i is the residue number) pseudotorsional backbone angle. A set of 102 structurally distinct protein chains from the Protein Data Bank was used to evaluate the adequacy of this descriptor. We find that a specific range of OCCO angles corresponds to each major secondary structure. The complete range of OCCO angles (-180° to 179°) was broken into 18 consecutive subranges of 20° each, and each subrange was assigned a letter. Thus, the OCCO profiles for each protein in the database were translated into a sequence of letters. The Needleman-Wunsch primary sequence alignment algorithm was then used for secondary/tertiary structure comparison and alignment. Preliminary results indicate that this new approach has a significant potential for rapid identification of fold families in the Protein Data Bank.
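
The discretization rule described above (18 consecutive 20° subranges, one letter each) translates directly into a small helper like the one below. The particular A-R letter assignment is arbitrary and does not reproduce the paper's actual alphabet.

```python
# Sketch: mapping an OCCO pseudotorsional angle in [-180, 180) degrees onto one of
# 18 letters, one per 20-degree subrange. The specific letter assignment is arbitrary.
import string

def occo_letter(angle_degrees: float) -> str:
    """Return the letter for the 20-degree bin containing the angle."""
    wrapped = (angle_degrees + 180.0) % 360.0      # shift to [0, 360)
    bin_index = int(wrapped // 20.0)               # 0..17
    return string.ascii_uppercase[bin_index]       # 'A'..'R'

def profile_to_letters(angles) -> str:
    """Translate a whole OCCO angle profile into a letter string for alignment."""
    return "".join(occo_letter(a) for a in angles)

# Example: a short synthetic OCCO profile.
print(profile_to_letters([-178.0, -45.0, 0.0, 62.5, 179.0]))
```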


A Scalable Systolic Multiprocessor for Analysis of Biological Sequences, Raj K. Singh, S. G. Tell, C. T. White, D. L. Hoffman, V. L. Chi, and B. W. Erickson, Proc. of the Symposium on Integrated Systems, Seattle, WA, March 3-5, 1993, MIT Press, Cambridge, MA, pp. 168-182.

ABSTRACT:

The design and implementation of an application-specific, fault-tolerant, and scalable multiprocessor system called BioSCAN (Biological Sequence Comparative Analysis Node) are described. Discussed are system partitioning and integration, functional decomposition between hardware and software, the algorithm and its implementation in VLSI, the early results of using the system, and comparison with other hardware and software solutions for biological sequence analysis.


Design of the BioSCAN Server Software, D. L. Hoffman, Department of Computer Science, University of North Carolina, Chapel Hill, NC. TR93-049, 1993.

ABSTRACT:

This paper is an exploration of the design goals for the Biological Sequence Comparative Analysis Node (BioSCAN) network server software and of the impact that these goals had on the overall structure and implementation of that software. The primary audience for this paper consists of computer scientists and computational biologists involved in developing similar server software. Biologists who are users of the BioSCAN computational node and have a desire for deeper understanding of how the server functions will also find this paper useful. It is assumed that the reader is familiar with UNIX and with basic networking concepts. The peculiarities of implementing a network server for a batch resource will be identified and the solutions chosen by the BioSCAN design team explained. Research for the BioSCAN project, including the design of the server software, was supported in part by NSF grant MIP-9024585.


A Comparison of the BioSCAN Algorithm on Multiple Architectures, D. L. Hoffman, Department of Computer Science, University of North Carolina, Chapel Hill, NC. TR93-050, 1993.

ABSTRACT:

This paper compares the performance characteristics of the BioSCAN biological sequence matching algorithm on several different computer architectures. The architectures examined are a conventional RISC general-purpose uniprocessor, a vector-oriented "supercomputer", and a Single Instruction, Multiple Data (SIMD) massively parallel computer. These architectures are represented by the following hardware platforms: a Sun 490 RISC, a Convex 240, and a MasPar MP-1. The performance of these three platforms is compared with that of the custom-built BioSCAN hardware.


A Computer Architecture for Fast Approximate Pattern Matching, R. E. Faith and D. L. Hoffman, Department of Computer Science, University of North Carolina, Chapel Hill, NC. TR93-051, 1993.

ABSTRACT:


UnCvL: The University of North Carolina C Vector Library, R. E. Faith, D. L. Hoffman, and D. G. Stahl, Department of Computer Science, University of North Carolina, Chapel Hill, NC. TR93-063, 1993.

ABSTRACT:


Copyright © 1999 - 2010, Doug L. Hoffman, all rights reserved

Questions or comments about this site?
Contact hoffman@bogus.org