Matthew Hatem, Scott Kiesel, and Wheeler Ruml
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015
There are two major paradigms for linear-space heuristic search: iterative deepening (IDA*) and recursive best-first search (RBFS). While the node regeneration overhead of IDA* is easily characterized in terms of the heuristic branching factor, the overhead of RBFS depends on how widely the promising nodes are separated in the search tree, and is harder to anticipate. In this paper, we present two simple techniques for improving the performance of RBFS while maintaining its advantages over IDA*. While these techniques work well in practice, they do not provide any theoretical bounds on the amount of regeneration overhead. To this end, we introduce RBFS_CR, the first method for provably bounding the regeneration overhead of RBFS. We show empirically that this bound improves its performance in several domains, for both optimal and suboptimal search, and also yields a better linear-space anytime heuristic search. RBFS_CR is the first linear-space best-first search robust enough to solve a variety of domains with varying operator costs.
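For context, the sketch below shows plain IDA*, the baseline whose regeneration behavior the abstract contrasts with RBFS, on a toy 4-connected grid (a hypothetical example domain, not one of the paper's benchmarks). Each iteration re-runs a depth-first search from the root with a larger f-value threshold, so all nodes below the new threshold are regenerated; it is this overhead that RBFS_CR provably bounds.

    // Minimal IDA* sketch on a toy 4-connected grid with unit move costs.
    public class IdaStarGrid {
        static final int W = 8, H = 8;           // board size
        static final int GX = 7, GY = 7;         // goal cell
        static int nextBound;                    // smallest f seen above the bound

        // Admissible Manhattan-distance heuristic.
        static int h(int x, int y) { return Math.abs(GX - x) + Math.abs(GY - y); }

        // Depth-first search limited by an f = g + h threshold.
        // Returns the solution cost, or -1 if none exists within the bound.
        static int dfs(int x, int y, int g, int bound, int px, int py) {
            int f = g + h(x, y);
            if (f > bound) { nextBound = Math.min(nextBound, f); return -1; }
            if (x == GX && y == GY) return g;
            int[][] moves = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
            for (int[] m : moves) {
                int nx = x + m[0], ny = y + m[1];
                if (nx < 0 || ny < 0 || nx >= W || ny >= H) continue;
                if (nx == px && ny == py) continue;   // don't step back to the parent
                int r = dfs(nx, ny, g + 1, bound, x, y);
                if (r >= 0) return r;
            }
            return -1;
        }

        public static void main(String[] args) {
            int bound = h(0, 0);
            while (true) {
                nextBound = Integer.MAX_VALUE;
                int cost = dfs(0, 0, 0, bound, -1, -1);
                if (cost >= 0) { System.out.println("solution cost " + cost); return; }
                bound = nextBound;   // next iteration regenerates the whole tree
            }
        }
    }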
Matthew Hatem and Wheeler Ruml
Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014
It is commonly appreciated that solving search problems optimally can take too long. Bounded suboptimal search algorithms trade increased solution cost in a principled way for reduced solving time. Explicit Estimation Search (EES) is a recent state-of-the-art algorithm for bounded suboptimal search. Although it tends to expand fewer nodes than alternative algorithms, its per-node expansion overhead is much higher, sometimes causing it to take longer overall. In this paper, we present simplified variants of EES (SEES) and of an earlier algorithm, A*ε (SA*ε), that use different implementations of the same motivating ideas to significantly reduce search overhead and implementation complexity. In an empirical evaluation, we find that SEES matches or outperforms classic bounded suboptimal search algorithms, such as WA*, on all domains tested. We also confirm that, while SEES and SA*ε expand roughly the same number of nodes as their progenitors, they solve problems significantly faster and are much easier to implement. This work widens the applicability of state-of-the-art bounded suboptimal search by making it easier to deploy.
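Since WA* serves as the classic baseline above, here is a minimal weighted A* sketch, again on a hypothetical toy grid rather than the paper's benchmarks. Sorting the open list by f'(n) = g(n) + w * h(n) with an admissible h guarantees a solution costing at most w times optimal, the same kind of bound that EES and SEES enforce while also exploiting distance estimates.

    import java.util.*;

    // Minimal weighted A* sketch on a toy grid; w is the suboptimality bound.
    public class WeightedAStarGrid {
        static final int W = 8, H = 8, GX = 7, GY = 7;
        static int h(int x, int y) { return Math.abs(GX - x) + Math.abs(GY - y); }

        record Node(int x, int y, int g, double f) {}

        static int search(double w) {
            PriorityQueue<Node> open =
                new PriorityQueue<>(Comparator.comparingDouble(Node::f));
            int[][] bestG = new int[W][H];
            for (int[] row : bestG) Arrays.fill(row, Integer.MAX_VALUE);
            open.add(new Node(0, 0, 0, w * h(0, 0)));
            bestG[0][0] = 0;
            while (!open.isEmpty()) {
                Node n = open.poll();
                if (n.g() > bestG[n.x()][n.y()]) continue;  // stale queue entry
                if (n.x() == GX && n.y() == GY) return n.g();
                for (int[] m : new int[][]{{1, 0}, {-1, 0}, {0, 1}, {0, -1}}) {
                    int nx = n.x() + m[0], ny = n.y() + m[1];
                    if (nx < 0 || ny < 0 || nx >= W || ny >= H) continue;
                    int g2 = n.g() + 1;
                    if (g2 < bestG[nx][ny]) {        // found a cheaper path
                        bestG[nx][ny] = g2;
                        open.add(new Node(nx, ny, g2, g2 + w * h(nx, ny)));
                    }
                }
            }
            return -1;  // no solution
        }

        public static void main(String[] args) {
            System.out.println("cost " + search(1.5));  // within 1.5x of optimal
        }
    }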
Matthew Hatem and Wheeler Ruml
Proceedings of the Seventh Annual Symposium on Combinatorial Search, 2014
Bounded suboptimal search algorithms are usually faster than optimal ones, but they can still run out of memory on large problems. This paper makes three contributions. First, we show how the solution length estimates used by Iterative Deepening EES, the current state-of-the-art linear-space bounded suboptimal search algorithm, can improve unbounded-space suboptimal search. Second, we convert one of these improved algorithms into a linear-space variant called Iterative Deepening A*ε, resulting in a new state of the art in linear-space bounded suboptimal search. Third, we show how Recursive Best-First Search can be used to create additional linear-space variants with more stable performance. Taken together, these results significantly expand our armamentarium of bounded suboptimal search algorithms.
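The distance-to-go idea at the heart of these algorithms can be illustrated with the classic A*ε selection rule (a simplified sketch with a hypothetical Node type, not the paper's new algorithms): restrict attention to the focal list of nodes whose f is within a factor w of the lowest f on open, and among those expand the node that appears closest to a goal.

    import java.util.*;

    public class FocalSelection {
        // Hypothetical node summary: f = g + h (cost estimate) and
        // d = estimated number of actions remaining to a goal (distance-to-go).
        record Node(String name, double f, double d) {}

        // A*ε-style rule: among nodes whose f is within a factor w of the best
        // f, pick the one that looks closest to a goal. When h is admissible,
        // A*ε's solutions cost at most w times optimal.
        static Node selectFromFocal(List<Node> open, double w) {
            double fmin = open.stream().mapToDouble(Node::f).min().orElseThrow();
            return open.stream()
                       .filter(n -> n.f() <= w * fmin)
                       .min(Comparator.comparingDouble(Node::d))
                       .orElseThrow();
        }

        public static void main(String[] args) {
            List<Node> open = List.of(
                new Node("a", 10.0, 8.0),   // cheapest f, but far from a goal
                new Node("b", 14.0, 2.0),   // slightly costlier, nearly done
                new Node("c", 22.0, 1.0));  // outside the focal list for w = 1.5
            System.out.println(selectFromFocal(open, 1.5).name());  // prints b
        }
    }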
Matthew Hatem, Roni Stern, and Wheeler Ruml
Proceedings of the Sixth Annual Symposium on Combinatorial Search, 2013
It is commonly appreciated that solving search problems optimally can overrun time and memory constraints. Bounded suboptimal search algorithms trade increased solution cost for reduced solving time and memory consumption. However, even suboptimal search can overrun memory on large problems. The conventional approach to this problem is to combine a weighted admissible heuristic with an optimal linear-space algorithm, resulting in algorithms such as Weighted IDA* (wIDA*). However, wIDA* does not exploit distance-to-go estimates or inadmissible heuristics, which have recently been shown to be helpful for suboptimal search. In this paper, we present a linear-space analogue of Explicit Estimation Search (EES), a recent algorithm specifically designed for bounded suboptimal search. We call our method Iterative Deepening EES (IDEES). In an empirical evaluation, we show that IDEES dramatically outperforms wIDA* on domains with non-uniform edge costs and can scale to problems that are out of reach for the original EES.
Matthew Hatem and Wheeler Ruml
Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, 2013
Multiple sequence alignment (MSA) is a central problem in computational biology. It is well known that MSA can be formulated as a shortest path problem and solved using heuristic search, but the memory requirement of A* makes it impractical for all but the smallest problems. Partial Expansion A* (PEA*) reduces the space complexity of A* by generating only the most promising successor nodes. However, even PEA* exhausts available memory on many problems. Another alternative is Iterative Deepening Dynamic Programming, which uses an uninformed search order but stores only the nodes along the search frontier. However, it too cannot scale to the largest problems. In this paper, we propose storing nodes on cheap and plentiful secondary storage. We present a new general-purpose algorithm, Parallel External PEA* (PE2A*), that combines PEA* with Delayed Duplicate Detection to take advantage of external memory and multiple processors to solve large MSA problems. In our experiments, PE2A* is the first algorithm capable of solving the entire Reference Set 1 of the standard BAliBASE benchmark using a biologically accurate cost function. This work suggests that external best-first search can effectively use heuristic information to surpass methods that rely on uninformed search orders.
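The partial-expansion idea is easy to sketch. In the toy version below (a hypothetical grid domain, not MSA), each open node carries a stored value F; on expansion, only successors with f <= F + C are added to the open list, and the node is re-queued with the smallest deferred successor f so that its remaining, costlier children can be generated later if they are ever needed.

    import java.util.*;

    // Toy Partial Expansion A* (PEA*) sketch; C is the extra-expansion margin.
    public class PartialExpansionGrid {
        static final int W = 8, H = 8, GX = 7, GY = 7, C = 0;
        static int h(int x, int y) { return Math.abs(GX - x) + Math.abs(GY - y); }
        static long key(int x, int y) { return x * 100L + y; }

        record Node(int x, int y, int g, int F) {}

        public static void main(String[] args) {
            PriorityQueue<Node> open =
                new PriorityQueue<>(Comparator.comparingInt(Node::F));
            Map<Long, Integer> bestG = new HashMap<>();
            open.add(new Node(0, 0, 0, h(0, 0)));
            bestG.put(key(0, 0), 0);
            while (!open.isEmpty()) {
                Node n = open.poll();
                if (n.g() > bestG.get(key(n.x(), n.y()))) continue;  // stale entry
                if (n.x() == GX && n.y() == GY) {
                    System.out.println("cost " + n.g());
                    return;
                }
                int deferred = Integer.MAX_VALUE;  // cheapest successor left unstored
                for (int[] m : new int[][]{{1, 0}, {-1, 0}, {0, 1}, {0, -1}}) {
                    int nx = n.x() + m[0], ny = n.y() + m[1];
                    if (nx < 0 || ny < 0 || nx >= W || ny >= H) continue;
                    int g2 = n.g() + 1, f2 = g2 + h(nx, ny);
                    if (f2 > n.F() + C) {            // too costly: defer, don't store
                        deferred = Math.min(deferred, f2);
                        continue;
                    }
                    if (g2 < bestG.getOrDefault(key(nx, ny), Integer.MAX_VALUE)) {
                        bestG.put(key(nx, ny), g2);
                        open.add(new Node(nx, ny, g2, f2));
                    }
                }
                if (deferred != Integer.MAX_VALUE)   // re-queue for deferred children
                    open.add(new Node(n.x(), n.y(), n.g(), deferred));
            }
        }
    }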
Chris Biemann, Bonaventura Coppola, Michael R. Glass, Alfio Gliozzo, Matthew Hatem, and Martin Riedl
Proceedings of the EMNLP-13 Workshop on Graph-based Algorithms for Natural Language Processing (TextGraphs-8), 2013
We introduce an interactive visualization component for the JoBimText project. JoBimText is an open source platform for large-scale distributional semantics based on graph representations. First we describe the underlying technology for computing a distributional thesaurus on words using bipartite graphs of words and context features, and contextualizing the list of semantically similar words towards a given sentential context using graph-based ranking. Then we demonstrate the capabilities of this contextualized text expansion technology in an interactive visualization. The visualization can be used as a semantic parser providing contextualized expansions of words in text as well as disambiguation to word senses induced by graph clustering, and is provided as an open source tool.
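To make the bipartite word/context representation concrete, the toy sketch below scores word similarity by the number of shared context features. This is a drastic simplification of JoBimText's significance-weighted ranking, and the tiny word/context data is invented for illustration.

    import java.util.*;

    public class ToyThesaurus {
        public static void main(String[] args) {
            // Bipartite graph: each word is linked to the context features it
            // occurs with (invented data; JoBimText extracts these at scale).
            Map<String, Set<String>> contexts = Map.of(
                "coffee", Set.of("drink_obj", "hot_amod", "cup_of"),
                "tea",    Set.of("drink_obj", "hot_amod", "cup_of", "green_amod"),
                "car",    Set.of("drive_obj", "fast_amod"));
            // Rank candidate neighbors of one word by shared-feature count.
            String target = "coffee";
            contexts.keySet().stream()
                .filter(w -> !w.equals(target))
                .sorted(Comparator.comparingInt(
                    (String w) -> overlap(contexts.get(target), contexts.get(w)))
                    .reversed())
                .forEach(w -> System.out.println(
                    w + " " + overlap(contexts.get(target), contexts.get(w))));
        }

        static int overlap(Set<String> a, Set<String> b) {
            Set<String> shared = new HashSet<>(a);
            shared.retainAll(b);
            return shared.size();
        }
    }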
Ethan Burns, Matthew Hatem, Michael J. Leighton, and Wheeler Ruml
Proceedings of the Fifth Annual Symposium on Combinatorial Search, 2012
Published papers rarely disclose implementation details. In this paper we show how such details can account for speedups of up to a factor of 28 for different implementations of the same algorithm. We perform an in-depth analysis of the most popular benchmark in heuristic search: the 15-puzzle. We study implementation choices in C++ for both IDA* and A* using the Manhattan distance heuristic. Results suggest that several optimizations deemed critical in folklore provide only small improvements while seemingly innocuous choices can play a large role. These results are important for ensuring that the correct conclusions are drawn from empirical comparisons.
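As one concrete example of the implementation choices in question, here is a straightforward version of the Manhattan distance heuristic for the 15-puzzle. One of the folklore optimizations the paper examines is computing such a heuristic incrementally: a single move changes only the moved tile's term, so the value can be updated in constant time instead of recomputing the full sum at every node.

    // Straightforward Manhattan-distance heuristic for the 15-puzzle.
    // tiles[pos] holds the tile at board position pos; 0 is the blank,
    // and tile t belongs at position t in the goal configuration.
    public class Manhattan {
        static int manhattan(int[] tiles) {
            int h = 0;
            for (int pos = 0; pos < 16; pos++) {
                int t = tiles[pos];
                if (t == 0) continue;                 // the blank is ignored
                h += Math.abs(pos / 4 - t / 4)        // row distance
                   + Math.abs(pos % 4 - t % 4);       // column distance
            }
            return h;
        }

        public static void main(String[] args) {
            int[] state = {1, 0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
            System.out.println(manhattan(state));     // prints 1
        }
    }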
Matthew Hatem, Ethan Burns, and Wheeler Ruml
Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011
The memory requirements of basic best-first heuristic search algorithms like A* make them infeasible for solving large problems. External disk storage is cheap and plentiful compared to the cost of internal RAM. Unfortunately, state-of-the-art external memory search algorithms either rely on brute-force search techniques, such as breadth-first search, or require all node values to fall in a narrow range of integers, and thus perform poorly on real-world domains with real-valued costs. We present a new general-purpose algorithm, PEDAL, that uses external memory and parallelism to perform a best-first heuristic search capable of solving large problems with real costs. We show theoretically that PEDAL is I/O efficient and empirically that it performs better than internal IDA* on the standard unit-cost 15-puzzle benchmark and gives far superior performance on problems with real costs.
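One standard ingredient in external-memory searches of this kind is hash-based delayed duplicate detection: generated nodes are streamed sequentially to disk buckets by hash, and duplicates are removed later, one bucket at a time, in RAM. The toy sketch below illustrates only that ingredient with invented string states; it is not PEDAL itself.

    import java.io.*;
    import java.nio.file.*;
    import java.util.*;

    // Toy sketch of hash-based delayed duplicate detection.
    public class DelayedDup {
        public static void main(String[] args) throws IOException {
            int nBuckets = 4;
            Path dir = Files.createTempDirectory("ddd");
            // Phase 1: stream generated states to bucket files by hash,
            // with no duplicate checks (sequential writes only, no random I/O).
            List<BufferedWriter> out = new ArrayList<>();
            for (int b = 0; b < nBuckets; b++)
                out.add(Files.newBufferedWriter(dir.resolve("bucket" + b)));
            String[] generated = {"s1", "s2", "s1", "s3", "s2", "s4"};
            for (String s : generated) {
                BufferedWriter w = out.get(Math.floorMod(s.hashCode(), nBuckets));
                w.write(s);
                w.newLine();
            }
            for (BufferedWriter w : out) w.close();
            // Phase 2: each bucket fits in RAM, so duplicates are removed with
            // an ordinary in-memory hash set, one bucket at a time.
            for (int b = 0; b < nBuckets; b++) {
                Set<String> unique =
                    new LinkedHashSet<>(Files.readAllLines(dir.resolve("bucket" + b)));
                System.out.println("bucket " + b + ": " + unique);
            }
        }
    }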
Matthew Hatem, Ethan Burns, and Wheeler Ruml
IBM developerWorks, 2013
Solving problems by searching through a space of possible solutions is a fundamental technique in artificial intelligence called state space search. Heuristic search is a form of state space search that exploits knowledge about a problem to find solutions more efficiently, and it has enjoyed much success in a variety of domains. In this article, we introduce you to the field of heuristic search and present an implementation of A*, the most widely used heuristic search algorithm, in the Java programming language. Heuristic search algorithms place heavy demands on computing resources and memory. We also show how we improved our Java implementation by avoiding expensive garbage collection and exploiting a high-performance alternative to the Java Collections Framework (JCF).
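To give a flavor of the kind of JCF alternative mentioned above: java.util.PriorityQueue stores boxed objects, so a binary heap over primitive keys and payloads avoids that per-element allocation entirely. The sketch below is a minimal example of the idea, not the article's actual data structure.

    // Minimal int-keyed binary min-heap: parallel primitive arrays instead of
    // the boxed entries that java.util.PriorityQueue would allocate.
    public class IntHeap {
        private int[] keys = new int[16];   // priorities (e.g., f values)
        private int[] vals = new int[16];   // payloads (e.g., state indices)
        private int size;

        public void push(int key, int val) {
            if (size == keys.length) {       // grow both arrays together
                keys = java.util.Arrays.copyOf(keys, size * 2);
                vals = java.util.Arrays.copyOf(vals, size * 2);
            }
            int i = size++;
            while (i > 0) {                  // sift up
                int p = (i - 1) / 2;
                if (keys[p] <= key) break;
                keys[i] = keys[p]; vals[i] = vals[p];
                i = p;
            }
            keys[i] = key; vals[i] = val;
        }

        // Removes and returns the payload with the smallest key.
        // Assumes the heap is non-empty.
        public int popVal() {
            int top = vals[0];
            int key = keys[--size], val = vals[size];  // last element re-sited
            int i = 0;
            while (true) {                   // sift down
                int c = 2 * i + 1;
                if (c >= size) break;
                if (c + 1 < size && keys[c + 1] < keys[c]) c++;
                if (keys[c] >= key) break;
                keys[i] = keys[c]; vals[i] = vals[c];
                i = c;
            }
            keys[i] = key; vals[i] = val;
            return top;
        }

        public boolean isEmpty() { return size == 0; }

        public static void main(String[] args) {
            IntHeap h = new IntHeap();
            h.push(5, 50); h.push(1, 10); h.push(3, 30);
            while (!h.isEmpty())
                System.out.println(h.popVal());  // prints 10, 30, 50
        }
    }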