Г. В. Царева
COMPUTER RESEARCH

INTRODUCTORY TEXT

The IBM Computer Science team explores foundational issues that confront the computing industry today. Because theory cuts across every aspect of computer science, the team interacts with a large number of other research teams. The IBM research center has ongoing projects in the following areas:

Algorithms: How can you get computers to solve problems efficiently?
Complexity: What computational resources (time, storage, etc.) do problems inherently require?
Databases: What models and algorithms are useful in helping computers store and retrieve information efficiently?
Web: What is the overall structure of the World Wide Web, and how can algorithms take advantage of it?

IBM's focus on algorithms falls under the following broad categories:

(1) Massive data set algorithms: From a practical point of view, designing algorithms that work on massive data sets is unavoidable. These algorithms have to work with very limited main memory. Three computational models have been proposed in this setting: the data stream model, where the algorithm can make one or a few passes over the data; the sampling model, where the algorithm is allowed random access; and the sublinear model, where the running time of the algorithm has to be sublinear in its input size. (A one-pass streaming sketch appears after this overview.)

(2) Approximation algorithms: To overcome the curse of NP-hardness, developing approximation algorithms becomes important. The primal-dual and semi-definite programming methods have been of immense help in this regard. These tools have yielded remarkably powerful algorithms for many important combinatorial optimization problems. (A small approximation example is sketched below.)

(3) Aggregation algorithms: Developing algorithms for improving the performance of large-scale databases and information retrieval systems is a challenging task. Recent interests include aggregation algorithms with performance guarantees and static pruning methods for reducing the index size in IR systems. (An aggregation sketch also follows the overview.)

(4) Other algorithms: Other interests of the IBM research team include lattice algorithms, online algorithms, and property testing algorithms.

Complexity theory is a field that concerns itself with the intrinsic computational difficulty of problems. It focuses on the effect of limiting natural computational resources, e.g. time and space, on the class of problems that can be solved. The emphasis is on generality, and this is usually achieved by defining appropriate models of computation. The classical Turing machine captures the notion of general computation. In dealing with specific classes of problems, it has been fruitful to also consider other models such as circuits, probabilistic Turing machines, random access machines, interactive proof systems, and sampling/data stream models. The group at IBM Almaden has focused on a variety of problems in this area. Recent work has addressed the complexity of lattice problems, zero-knowledge protocols, fault-tolerant computing, property testing, derandomization, communication complexity, sampling, and data stream algorithms.

There is a long history of research at IBM on the theory of databases, going back to the early days of the relational model, which was invented, developed, and studied extensively there. The group's main database focus now, both in theory and in more practical projects, is on less traditional ways of storing data, such as multimedia databases and emerging databases and data service technologies for the Internet.
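To make the data stream model concrete, here is a minimal sketch (not IBM's code) of reservoir sampling, a standard one-pass technique: it maintains a uniform random sample of k items from a stream of unknown length using only O(k) memory. The function name and parameters are illustrative.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")

def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Keep a uniform random sample of k items from a stream, in one pass
    and O(k) memory (classic Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)      # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)       # choose a slot uniformly from 0..i
            if j < k:
                reservoir[j] = item     # item survives with probability k/(i+1)
    return reservoir

# Simulate a stream far too large to store, then sample five items from it.
print(reservoir_sample((x * x for x in range(1_000_000)), k=5, seed=42))
```

The point of the sketch is the resource profile typical of the data stream model: a single pass over the input and memory that does not grow with the length of the stream.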
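The primal-dual and semi-definite programming methods themselves are too involved for a short example, but the basic idea of an approximation algorithm can be shown with the classic matching-based 2-approximation for minimum vertex cover, which can also be derived via the primal-dual method. This is a generic textbook sketch, not IBM's algorithm.

```python
from typing import Iterable, Set, Tuple

def vertex_cover_2approx(edges: Iterable[Tuple[int, int]]) -> Set[int]:
    """Greedy maximal-matching 2-approximation for minimum vertex cover:
    whenever an edge is still uncovered, take both of its endpoints."""
    cover: Set[int] = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            # Uncovered edge: add both endpoints (the chosen edges form a matching).
            cover.add(u)
            cover.add(v)
    return cover

# Small example graph: a path 1-2-3-4 plus the edge 2-4.
print(vertex_cover_2approx([(1, 2), (2, 3), (3, 4), (2, 4)]))  # e.g. {1, 2, 3, 4}
```

The guarantee follows because the chosen edges are disjoint: any vertex cover, including the optimal one, must contain at least one endpoint of each of them, so the returned cover is at most twice the optimum.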
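An example of an aggregation algorithm with a performance guarantee is the threshold algorithm for combining ranked lists from several sources. The sketch below is a simplified, illustrative version rather than the published algorithm or its implementation: it assumes each source returns objects sorted by descending score, supports random-access lookup of any object's score, and that the aggregation function (min, sum, ...) is monotone.

```python
from typing import Callable, Dict, List, Tuple

Ranked = List[Tuple[str, float]]   # (object, score) pairs, sorted by descending score

def threshold_algorithm(lists: List[Ranked],
                        aggregate: Callable[[List[float]], float],
                        k: int) -> List[Tuple[str, float]]:
    """Simplified threshold-algorithm sketch: find the top-k objects under a
    monotone aggregate of per-source scores, usually without reading the
    lists to the end. Assumes all lists have the same length."""
    index = [dict(lst) for lst in lists]        # random-access score lookups
    seen: Dict[str, float] = {}
    top: List[Tuple[str, float]] = []
    for depth in range(len(lists[0])):
        # Sorted access: read the next row of every list.
        row_scores = []
        for lst in lists:
            obj, score = lst[depth]
            row_scores.append(score)
            if obj not in seen:
                # Random access: the object's score in every source (0.0 if absent).
                seen[obj] = aggregate([d.get(obj, 0.0) for d in index])
        # No unseen object can score above this threshold (monotonicity of `aggregate`).
        threshold = aggregate(row_scores)
        top = sorted(seen.items(), key=lambda kv: kv[1], reverse=True)[:k]
        if len(top) == k and top[-1][1] >= threshold:
            return top                          # early stop with a correctness guarantee
    return top

# Combine two fuzzy rankings ("how red?" and "how round?") with min.
colour = [("a", 0.9), ("b", 0.8), ("c", 0.3)]
shape  = [("b", 0.9), ("c", 0.7), ("a", 0.2)]
print(threshold_algorithm([colour, shape], aggregate=min, k=1))   # [('b', 0.8)]
```

The performance guarantee comes from the early-stopping test: once the k-th best aggregate score seen so far is at least the threshold of the current rows, no deeper object can displace it, so the scan can stop.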
In the area of multimedia databases, IBM investigated aggregation algorithms for combining fuzzy information from multiple sources.

Algorithmic tools for searching and mining the web are becoming increasingly sophisticated and vital. In this context, algorithms that use and exploit structural information about the web outperform generic methods in both efficiency and reliability. The IBM research center provides a characterization of the web showing that it emerges as the outcome of a number of essentially independent stochastic processes that evolve at various scales. A striking consequence of this scale invariance is that the structure of the web is "fractal": cohesive sub-regions display the same characteristics as the web at large. An understanding of this underlying fractal nature is therefore applicable to designing data services across multiple domains and scales. Potential applications of this line of research include optimized algorithm design for web-scale data analyses.

IBM focuses on two kinds of Web-related research: the first is understanding and modeling the seemingly chaotic Web, and the second is developing algorithms that exploit this understanding. Several macro-level and micro-level connectivity properties of the Web were studied, and the bow-tie model was proposed to abstract these connectivity properties. A simple stochastic model was constructed to explain the experimentally observed phenomena; such models can be used to predict the evolution of the Web. (A toy version of such a growth model is sketched below.)
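To give a feel for what a simple stochastic model of the Web can look like, here is a toy growth process in the spirit of link-copying models: each new page either copies a link of a randomly chosen earlier page or links to a uniformly random earlier page. This is an illustrative sketch with made-up parameters, not the model studied at IBM, but even this toy version produces the heavy-tailed in-degree distributions seen in real Web crawls.

```python
import random
from collections import Counter
from typing import Dict, List

def copying_model(n: int, out_degree: int = 5, copy_prob: float = 0.7,
                  seed: int = 0) -> Dict[int, List[int]]:
    """Toy 'copying'-style web-graph generator. Each new page arrives with
    out_degree links; each link either copies a link of a random earlier
    page (with probability copy_prob) or points to a uniform earlier page."""
    rng = random.Random(seed)
    links: Dict[int, List[int]] = {0: []}        # page 0 starts with no links
    for page in range(1, n):
        prototype = rng.randrange(page)          # earlier page whose links may be copied
        out: List[int] = []
        for _ in range(out_degree):
            if links[prototype] and rng.random() < copy_prob:
                out.append(rng.choice(links[prototype]))   # copy one of its links
            else:
                out.append(rng.randrange(page))            # uniform random earlier page
        links[page] = out
    return links

graph = copying_model(n=5000)
in_degree = Counter(t for targets in graph.values() for t in targets)
# A handful of pages attract a disproportionate share of links:
# the in-degree distribution is heavy-tailed, as in real Web crawls.
print(sorted(in_degree.values(), reverse=True)[:10])
```

Copying a link is implicitly preferential: pages that already have many incoming links are more likely to be copied again, which is one simple mechanism behind the skewed link distributions observed experimentally.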