Introduction
pH is a parallel language obtained by extending Haskell. This document must be read in conjunction with the Haskell report, since only the extensions are described here.
Abstract
The semantics of kernel pH are defined in the form of a parallel, normalizing interpreter. A description of I-structure and M-structure operators is also given within the same framework. The semantics of barriers in pH are presented by translation into the kernel language without barriers. The framework presented is also suitable for multithreaded compilation of pH.
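To make the I-structure and M-structure operations concrete, here is a minimal sketch in ordinary concurrent Haskell, using MVar as the underlying cell. The names iNew, iFetch and iStore are illustrative only, not pH's actual primitives.

    import Control.Concurrent.MVar

    -- An I-structure cell: write-once, reads block until written.
    newtype IVar a = IVar (MVar a)

    iNew :: IO (IVar a)
    iNew = IVar <$> newEmptyMVar

    -- A read blocks until the cell holds a value, then leaves it in place.
    iFetch :: IVar a -> IO a
    iFetch (IVar m) = readMVar m

    -- A second write is an error: I-structures are single-assignment.
    iStore :: IVar a -> a -> IO ()
    iStore (IVar m) x = do
      ok <- tryPutMVar m x
      if ok then return () else error "iStore: multiple writes to I-structure"

    -- M-structure take/put map directly onto MVar operations: take empties
    -- the cell (blocking if it is already empty), put refills it.
    mTake :: MVar a -> IO a
    mTake = takeMVar

    mPut :: MVar a -> a -> IO ()
    mPut = putMVar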
Abstract
Many of the details that a programmer must manage when programming in a procedural language are handled by the implementation in a functional language. In a parallel functional language, we would expect the assignment of processes to processors and the scheduling of processes to be handled by the implementation. The partitioning of a program into processes may be handled by the implementation as well.
We will assume that a parallel functional program may create processes dynamically, and that processes may synchronize in an arbitrary manner. It does not matter whether processes are defined in the source program or generated by the implementation.
On parallel systems where each processor has some local memory, it is common practice not to move processes once they have started to execute. We will show that if each process must be executed on a single processor, then no fully automatic scheduling strategy can ensure good performance.
We will also show that if all processes are sequential (i.e. do not contain internal parallelism), and if a process may be moved whenever it resumes execution following synchronization with another process, then good performance can be assured, at least in theory.
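As a concrete illustration of dynamic process creation with implementation-managed scheduling, here is a sketch in GHC Haskell using par and pseq from the parallel package; the sparks it creates are lightweight work units whose placement and scheduling are left entirely to the runtime, which is the setting the argument above concerns (the paper itself is language-neutral).

    import Control.Parallel (par, pseq)

    -- Sparks the first recursive call as potential parallel work; the
    -- runtime decides where and whether each spark actually runs.
    pfib :: Int -> Integer
    pfib n
      | n < 2     = fromIntegral n
      | otherwise = x `par` (y `pseq` x + y)
      where
        x = pfib (n - 1)
        y = pfib (n - 2)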
Selected references
- Richard E. Ladner and Michael J. Fischer. Parallel prefix computation. Journal of the ACM, 27(4):831-838, October 1980.
- Jayadev Misra. Powerlist: A structure for parallel recursion. ACM Transactions on Programming Languages and Systems, 16(6):1737-1767, November 1994.
Selected papers that cite this one
- Klaus Achatz and Wolfram Schulte. Massive parallelization of divide-and-conquer algorithms over powerlists. Science of Computer Programming, 26(1-3):59-78, May 1996.
Contents summary:
Introduction to functional programming using Miranda; Clean (version 0.8); Underlying model of computation (lambda-calculus, term rewriting systems, graph rewriting systems); Type systems; Strictness analysis; Implementation techniques using Clean as intermediate language; Abstract machines; Parallel Graph Rewriting; Concurrency Aspects of Clean; Code generation for both sequential and parallel architectures.
Abstract
Communication lifting is a program transformation that can be applied to a synchronous process network to restructure the network. This restructuring in theory improves sequential and parallel performance. The transformation has been formally specified and proved correct and it has been implemented as an automatic program transformation tool. This tool has been applied to a small set of programs consisting of synchronous process networks. For these networks communication lifting generates parallel programs that do not require locking. Measurements indicate performance gains in practice both with sequential and parallel evaluation. Communication lifting is a worthwhile optimisation to be included in a compiler for a lazy functional language.
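For intuition, a synchronous process network in a lazy functional language can be written as a set of streams connected by per-tick operators. The toy network below (a running-average filter over an input stream) is only an illustration of the kind of program the transformation targets, not of the transformation itself.

    -- Two 'processes' connected by streams: a one-tick delay node and a
    -- combining node, each consuming and producing one value per tick.
    avgFilter :: [Double] -> [Double]
    avgFilter xs = zipWith (\a b -> (a + b) / 2) xs (delay xs)
      where
        delay = (0 :)   -- one-tick delay, initialised with 0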
Abstract
Computations described using Bird's constructive algebra of lists are nicely amenable to parallel implementation. Indeed, by using higher-order functions, ordered list traversals such as foldl and foldr can be expressed as unordered reductions. Based on this observation, a set of optimizations has been developed for list traversals in the parallel Haskell (pH) compiler. These optimizations are inspired by, and partially subsume, earlier work on the optimization of sequential list traversal.
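The key observation can be sketched in a few lines of Haskell: map each list element to an endofunction on the accumulator and combine with function composition, which is associative, so the reduction tree may be rebalanced and evaluated in any order, including in parallel. The sketch below illustrates the idea only; it is not the pH compiler's actual optimization.

    -- foldl re-expressed as a reduction with an associative operator:
    -- flip (.) is associative with identity id, so the foldr here can be
    -- replaced by any tree-shaped (parallel) reduction.
    foldlAsReduce :: (b -> a -> b) -> b -> [a] -> b
    foldlAsReduce f z xs = foldr (flip (.)) id (map step xs) z
      where step x acc = f acc x

For instance, foldlAsReduce (-) 10 [1,2,3] yields 4, agreeing with foldl (-) 10 [1,2,3].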
Abstract
Traditionally, a vectorising compiler matches the iterative constructs of a program against a set of predefined templates. If a loop contains no dependency cycles then a map template can be used; other simple dependencies can often be expressed in terms of fold or scan templates. This paper addresses the template-matching problem within the context of functional programming. A small collection of program identities is used to specify vectorisable for-loops. By incorporating these program identities within a monad, all well-typed for-loops in which the body of the loop is expressed using the vectorisation monad can be vectorised. This technique enables template matching to be eliminated from a vectorising compiler, and the safety of vectorisation to be proved by a type inference mechanism.
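As a reminder of what the three templates compute, here are the corresponding dependency patterns in plain Haskell; this illustrates the classification only, not the paper's vectorisation monad.

    squares :: [Int] -> [Int]
    squares = map (^ 2)        -- no loop-carried dependency: map template

    total :: [Int] -> Int
    total = foldl (+) 0        -- accumulator dependency: fold template

    running :: [Int] -> [Int]
    running = scanl (+) 0      -- running accumulator: scan template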
Abstract
R. Milner, J. Parrow and D. Walker
We present the pi-calculus, a calculus of communicating systems in which one can naturally express processes which have changing structure. Not only may the component agents of a system be arbitrarily linked, but a communication between neighbours may carry information which changes that linkage. The calculus is an extension of the process algebra CCS, following work by Engberg and Nielsen who added mobility to CCS while preserving its algebraic properties. The pi-calculus gains simplicity by removing all distinction between variables and constants; communication links are identified by names, and computation is represented purely as the communication of names across links.
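For concreteness, the calculus's central rule is communication, shown here in conventional notation (the paper's syntax may differ in detail): an output x̄⟨y⟩.P sends the name y on channel x to an input x(z).Q, which continues with y substituted for z:

    x̄⟨y⟩.P | x(z).Q  →  P | Q{y/z}

Because y is itself a channel name, the receiver acquires a new communication link; this is exactly the mobility the calculus is designed to express.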
After an illustrated description of how the pi-calculus generalises conventional process algebras in treating mobility, several examples exploiting mobility are given in some detail. The important examples are the encoding into the pi-calculus of higher-order functions (the lambda-calculus and combinatory algebra), the transmission of processes as values, and the representation of data structures as processes.
The paper continues by presenting the algebraic theory of strong bisimilarity and strong equivalence, including a new notion of equivalence indexed by distinctions -- i.e. assumptions of inequality among names. These theories are based upon a semantics in terms of a labelled transition system and a notion of strong bisimulation, both of which are expounded in detail in a companion paper. We also report briefly on work-in-progress based upon the corresponding notion of weak bisimulation, in which internal actions cannot be observed.
Abstract
This is the second of two papers in which we present the pi-calculus, a calculus of mobile processes. We provide a detailed presentation of some of the theory of the calculus developed to date, and in particular we establish most of the results stated in the companion paper.
Abstract
We present an operational semantics for call-by-need reduction in terms of an encoding of the lambda calculus in Milner's pi-calculus. Correctness of the encoding is proved with respect to the call-by-need lambda-calculus of Ariola et al. The compactness and intuitiveness of the process calculus presentation makes it interesting as an alternative method for defining what is meant by 'call-by-need'. With respect to applications, the interest of this work lies in the use of pi-calculus as an abstract yet realistic target language. The practical value of the encoding is demonstrated with an outline for a parallel code generator.
Compared to the work presented in TR CS/93/14, this report focusses on a single reduction strategy, call-by-need, and strengthens the theoretical foundations of the encoding. The section on implementation is new and has prompted some small changes to the encodings (e.g., we now use only the asynchronous subset of the monadic pi-calculus).
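Independently of the pi-calculus encoding, the sharing behaviour that defines call-by-need can be sketched directly in Haskell with explicit, memoising thunks; mkThunk below is an illustrative name, not part of the paper.

    import Data.IORef

    -- A suspension that runs its computation at most once and caches the
    -- result: call-by-need = call-by-name + sharing.
    mkThunk :: IO a -> IO (IO a)
    mkThunk compute = do
      cache <- newIORef Nothing
      return $ do
        cached <- readIORef cache
        case cached of
          Just v  -> return v            -- already forced: reuse result
          Nothing -> do
            v <- compute                 -- first force: run the suspension
            writeIORef cache (Just v)
            return v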
Abstract
GUM is a portable, parallel implementation of the Haskell functional language which has been publicly released with version 0.26 of the Glasgow Haskell Compiler (GHC). GUM is message-based, and portability is facilitated by using the PVM communications harness available on most multiprocessors, including shared-memory and distributed-memory machines. For example, GUM is available by FTP for a Sun SPARCserver multiprocessor and for a network of Sun SPARC workstations.
High message-latency in distributed machines is ameliorated by sending messages asynchronously, and by sending large packets of related data in each message. Initial performance figures demonstrate absolute speedups relative to the best sequential compiler technology. To improve the performance of a parallel Haskell program GUM provides tools for monitoring and visualising the behaviour of threads and of PEs during execution.