Collective communication operations: experimental results vs. theory

Iannello G
1998-01-01

Abstract

Collective communication operations (CCOs) are one of the most powerful tools for parallel processing on distributed memory architectures. From the theoretical viewpoint there has been a major effort in the design of optimal algorithms for these operations, especially for massively parallel processors (MPPs). However, in spite of the increasing availability of MPPs, only a few limited experimental checks of the different theories exist, so assessing their real value is not easy. The aim of the present paper is to address such issues for the most common CCOs, considering practical algorithms that can be included in a generic communication library. The main result is a new algorithm for building a quasi-optimal broadcast tree that is much simpler than, and as efficient as, previously available algorithms. To investigate the advantages and drawbacks of the proposed algorithms, a large set of experimental data has been collected on an IBM SP2 parallel system. The data demonstrate the efficiency of our approach in a number of interesting cases. Finally, all the experimental results have been related to the model used in designing the algorithms.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12610/2572