An important feature of database technology of the nineties is the use of distributed computation for speeding up the execution of complex queries. Today, the use of parallelism is tested in several experimental database architectures and a few commercial systems for conventional select-project-join queries. In particular, hash-based fragmentation is used to distribute data to disks under the control of different processors, in multi-processor architectures without shared memory, in order to perform selections and joins in parallel. With the development of new (logic) query languages and deductive databases, the new dimension of recursion has been added to query processing. Transitive closure queries, such as bill-of-material, allow important database problems to be solved by the database system itself; and more general logic programming queries allow us to study queries not considered before. Although recursive queries are very complex, their regular structure makes them particularly suited for parallel execution. Well-considered use of parallelism can give a high efficiency gain when processing recursive queries. In this paper, we give an overview of approaches to parallel execution of recursive queries as they have been presented in recent literature. After showing that the most typical Datalog queries have exactly the same expressive power as the transitive closure of simple algebraic expressions, we focus on describing algebraic approaches to recursion. To give a good overview of the problems that are inherent to parallel computation, we introduce a graphical formalism to describe parallel execution. This formalism enables us to clearly show the behaviour of parallel execution strategies. We first review algorithms developed in the framework of algebraic transitive closures that operate on entire relations; then we introduce fragmentation, distinguishing between hash-based and semantic fragmentation.
AN OVERVIEW OF PARALLEL STRATEGIES FOR TRANSITIVE CLOSURE ON ALGEBRAIC MACHINES
CACACE F;
1991-01-01
Abstract
An important feature of database technology of the nineties is the use of distributed computation for speeding up the execution of complex queries. Today, the use of parallelism is tested in several experimental database architectures and a few commercial systems for conventional select-project-join queries. In particular, hash-based fragmentation is used to distribute data to disks under the control of different processors, in multi-processor architectures without shared memory, in order to perform selections and joins in parallel. With the development of new (logic) query languages and deductive databases, the new dimension of recursion has been added to query processing. Transitive closure queries, such as bill-of-material, allow important database problems to be solved by the database system itself; and more general logic programming queries allow us to study queries not considered before. Although recursive queries are very complex, their regular structure makes them particularly suited for parallel execution. Well-considered use of parallelism can give a high efficiency gain when processing recursive queries. In this paper, we give an overview of approaches to parallel execution of recursive queries as they have been presented in recent literature. After showing that the most typical Datalog queries have exactly the same expressive power as the transitive closure of simple algebraic expressions, we focus on describing algebraic approaches to recursion. To give a good overview of the problems that are inherent to parallel computation, we introduce a graphical formalism to describe parallel execution. This formalism enables us to clearly show the behaviour of parallel execution strategies. We first review algorithms developed in the framework of algebraic transitive closures that operate on entire relations; then we introduce fragmentation, distinguishing between hash-based and semantic fragmentation.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.