Performance Analysis of Algorithms on Shared Memory


Performance analysis of algorithms on shared memory, message passing and hybrid models for stand-alone and clustered SMPs



Parallel computing is a form of computation in which many instructions of a program are run simultaneously. This is achieved by splitting a program into independent parts so that each processor executes its part at the same time as the other processors. It can be done on a single computer with multiple processors, on a number of individual computers connected by a network, or on a combination of the two.
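The idea of splitting a program into independent parts can be sketched in a few lines. The example below is only an illustration in Python's standard library, not part of the original study: the function names `split_into_chunks` and `parallel_sum` are hypothetical, and because of Python's global interpreter lock the threads show the decomposition pattern rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, parts):
    """Split data into `parts` contiguous, nearly equal slices."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_sum(data, workers=4):
    """Sum each independent chunk in its own worker, then combine."""
    chunks = split_into_chunks(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums)
```

Each chunk can be processed without looking at any other chunk, which is what makes the parts independent; only the final combination step needs all partial results.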

Parallel computing has grown outside of the high-performance computing community due to the introduction of multi-core and multi-processor computers at a reasonable price for the average consumer.

Recent desktop and high-performance processors provide multiple hardware threads, technically realized by hardware multithreading and by multiple processor cores on a single chip. Programmers will be faced with hundreds of hardware threads per processor chip. As exploitable instruction-level parallelism in applications is limited, and the processor clock frequency cannot be increased any further because of power consumption and heat problems, exploiting thread-level parallelism becomes unavoidable if further improvement in processor performance is required, and there is no doubt that our requirements and expectations of machine performance will increase further. This means that parallel programming will concern a majority of application and system programmers in the foreseeable future, even in the desktop and embedded domain.

A model of parallel computation consists of a parallel programming model and a corresponding cost model.

A parallel programming model describes an abstract parallel machine by its basic operations (such as arithmetic operations, spawning of tasks, reading from and writing to shared memory, or sending and receiving messages), their effects on the state of the computation, the constraints of when and where these can be applied, and how they can be composed. In particular, a parallel programming model also contains, at least for shared memory programming models, a memory model that describes how and when memory accesses can become visible to the different parts of a parallel computer. The memory model is sometimes given implicitly.

A parallel cost model associates a cost, which usually describes parallel execution time and resource occupation, with each basic operation, and describes how to predict the accumulated cost of composed operations up to entire parallel programs. A parallel programming model is often associated with one or several parallel programming languages or libraries that realize the model. Parallel algorithms are usually formulated in terms of a particular parallel programming model.
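To make the notion of a cost model concrete, the following is a minimal sketch under stated assumptions. The per-operation costs are invented numbers in arbitrary time units, and the composition rules (costs add up in sequence, the slowest branch dominates in parallel) are one simple convention chosen for illustration, not the cost model of any particular machine or library.

```python
# Hypothetical per-operation costs in arbitrary time units (assumed values,
# not measurements of any real machine).
OPERATION_COST = {
    "arithmetic": 1,
    "shared_read": 4,
    "shared_write": 4,
    "send": 50,
    "receive": 50,
}


def cost_of(operations):
    """Cost of basic operations executed one after another by one processor."""
    return sum(OPERATION_COST[op] for op in operations)


def sequential(*costs):
    """Phases that run one after another: their costs add up."""
    return sum(costs)


def parallel(*costs):
    """Branches that run at the same time: the slowest branch dominates."""
    return max(costs)


# Two workers each read, compute and write back at the same time,
# then a single result is sent on to another processor.
worker = cost_of(["shared_read", "arithmetic", "shared_write"])
program = sequential(parallel(worker, worker), cost_of(["send"]))
# worker == 9 and program == 59 with the assumed costs above
```

Even in this toy form, the model shows why a message-passing step can dominate the predicted time when communication is much more expensive than arithmetic.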

OpenMP (Open Multi-Processing), the Message Passing Interface (MPI) and hybrid OpenMP/MPI are the parallel programming approaches considered here. In the message passing model, communication between processes is done by interchanging messages. OpenMP is an API that supports multi-platform shared memory multiprocessing programming in C, C++ and Fortran on most processor architectures and operating systems, including Solaris, Linux, AIX, HP-UX, Mac OS X and Windows platforms.

MPI is a model for distributed memory systems, where communication cannot be achieved by sharing variables. It is the de facto standard for programming distributed memory systems, as it provides a simple communication API and eases the task of developing portable parallel applications.
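The message passing style can be illustrated without MPI itself. The sketch below is an assumption-laden stand-in: Python threads play the role of processes, `queue.Queue` objects play the role of communication channels, and the names `worker` and `message_passing_sum` are hypothetical. It does not use the MPI API; the point is only that workers exchange data through explicit send and receive steps rather than through shared variables.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """Receive a chunk as a message, compute on it, send the result back."""
    chunk = inbox.get()               # blocking receive
    outbox.put((rank, sum(chunk)))    # send the partial result to the root


def message_passing_sum(data, workers=3):
    """Root distributes private copies of the data and collects partial sums."""
    inboxes = [queue.Queue() for _ in range(workers)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(r, inboxes[r], results))
        for r in range(workers)
    ]
    for t in threads:
        t.start()
    # Scatter: worker r gets its own copy of every workers-th element.
    for r in range(workers):
        inboxes[r].put(data[r::workers])
    # Gather: results may arrive in any order, so only the values are summed.
    total = 0
    for _ in range(workers):
        _rank, partial = results.get()
        total += partial
    for t in threads:
        t.join()
    return total
```

Because each worker only ever touches the copy it received, the same structure carries over to separate processes on separate machines, which is the situation MPI is designed for.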

Hybrid OpenMP+MPI facilitates cooperative shared memory programming across clustered SMP nodes. MPI provides communication among various SMP nodes whereas OpenMP manages the workload on each SMP node. MPI and OpenMP are used in tandem to manage the overall concurrency of the application.
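The two-level structure of the hybrid approach can be sketched as follows. This is a conceptual illustration only, written with Python's standard library rather than OpenMP or MPI: queues stand in for messages between nodes, a thread pool stands in for the threads working inside one node, and `node_process` and `hybrid_sum` are hypothetical names.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def node_process(node_id, inbox, outbox, threads_per_node):
    """One 'process' per node: receive a block, share it among local threads."""
    block = inbox.get()
    # Intra-node level: each local thread sums a strided slice of the block.
    slices = [block[t::threads_per_node] for t in range(threads_per_node)]
    with ThreadPoolExecutor(max_workers=threads_per_node) as pool:
        local_total = sum(pool.map(sum, slices))
    # Inter-node level: only the node's single result is sent as a message.
    outbox.put((node_id, local_total))


def hybrid_sum(data, nodes=2, threads_per_node=2):
    """Distribute blocks to nodes by message, combine the per-node results."""
    inboxes = [queue.Queue() for _ in range(nodes)]
    results = queue.Queue()
    processes = [
        threading.Thread(
            target=node_process,
            args=(n, inboxes[n], results, threads_per_node),
        )
        for n in range(nodes)
    ]
    for p in processes:
        p.start()
    for n in range(nodes):
        inboxes[n].put(data[n::nodes])
    total = sum(results.get()[1] for _ in range(nodes))
    for p in processes:
        p.join()
    return total
```

Only one message per node crosses the "network" in this sketch, while the work inside a node is shared without any explicit communication, which mirrors the division of labor between MPI and OpenMP described above.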


Individual processors are not capable of solving the most significant computational problems because of the inherent complexity of those problems. This led to the idea of putting multiple processors to work on a single program, which is the motivation for parallel computing.

Parallel computing is the use of a parallel computer to reduce the time needed to solve a single computational problem. A parallel computer is a multiple-processor computer system supporting parallel programming. Two categories of parallel computers are multi-computers and centralized multiprocessors. A multi-computer is a parallel computer constructed out of multiple computers and an interconnection network, where the processors on different computers interact by passing messages to each other. A centralized multiprocessor (also called a symmetric multiprocessor or SMP) is one where all the CPUs share access to a single global memory.


Applications were traditionally designed to run on a single system. But individual systems are not capable of solving significant problems efficiently because of the inherent complexity of those problems.

A further limitation is that an application written this way cannot harness the capacity of a multi-core processor. Hence the applications must be multithreaded.


Hybrid parallel programming combines distributed memory parallelization on the node interconnect with shared memory parallelization inside each node. The challenges and the potential of the dominant programming models on hierarchically structured hardware are described: pure MPI (Message Passing Interface), pure OpenMP (with distributed shared memory extensions) and hybrid MPI+OpenMP in several flavors. We identify a few cases where the hybrid programming model can indeed be the superior solution because of memory consumption, improved load balance or reduced communication needs.

Introducing OpenMP into MPI applications makes more efficient use of the shared memory on SMP nodes, thus mitigating the need for explicit intra-node communication. Introducing MPI and OpenMP together during the design and coding of a new application can help maximize efficiency, scaling and performance.

Recently, the hybrid model has begun to attract more attention, for at least two reasons. The first is that it is relatively easy to pick a language/library instantiation of the hybrid model: OpenMP plus MPI. While there may be other approaches, they remain research and development projects, whereas OpenMP compilers and MPI libraries are now solid commercial products, with implementations from multiple vendors.

The second reason is that scalable parallel computers now appear to encourage this model. The fastest machines now virtually all consist of multi-core nodes connected by a high speed network. The idea of using OpenMP threads to exploit the multiple cores per node (with one multithreaded process per node) while using MPI to communicate among the nodes appears obvious. Yet one can also use an “MPI everywhere” approach on these architectures, and the data on which approach is better is confusing and inconclusive.


This work concerns the multithreading of applications on a clustered system using the hybrid methodology. The objective is to increase the performance of applications on clusters.

  • Network intrusion detection, cryptography and multiparty computation are some of the core users of parallel computing techniques.
  • Embedded systems increasingly rely on distributed control algorithms.
  • A modern automobile consists of tens of processors communicating to perform complex tasks for optimizing handling and performance.
  • Conventional structured peer-to-peer networks impose overlay networks and utilize algorithms directly from parallel computing.
