a) Scalability over machine size
This indicates how well the performance of a scalable parallel computer
will improve with additional processors. The resources increased are most
frequently processors, but they could also be memory capacity and I/O
capability. There is also a maximum number of processors a system can
accommodate, which imposes an upper bound on scalability over machine
size.
b) Scalability over problem size
This indicates how well the system can handle larger problems with
larger data sizes and workloads. Apart from the machine size, it also
depends on the memory capacity and the communication capability of the
machine.
c) Resource scalability
This refers to gaining higher performance or functionality by increasing
the machine size (i.e. the number of processors), investing in more storage
(cache, main memory, disks), improving the software, etc. Within this
dimension, three categories have to be considered. Machine size scalability
indicates how well the performance will improve with additional processors.
Scaling up in resources means gaining higher performance by investing in
more memory, bigger off-chip caches, bigger disks, and so on. Finally,
software scalability indicates how the performance of a system can be
improved by a newer version of the OS with more functionality, a better
compiler with more efficient optimizations, more efficient mathematical
and engineering libraries, more efficient and easy-to-use application
software, and a more user-friendly programming environment.
d) Generation scalability
This refers to the capability of a system to scale up its performance
using next-generation components, such as a faster processor, a faster
memory, a newer version of the operating system, a more powerful compiler,
etc., while the rest of the system remains usable and requires as little
modification as possible.
e) Heterogeneity scalability
This property refers to how well a system can scale up by integrating
hardware and software components supplied by different designers or
vendors. This calls for using components with a standard, open architecture
and interface. In the software area, this is called portability.
Problem 1.2
For the PRAM model, the machine size n can be arbitrarily large. The
basic time step is called a cycle, and within each cycle each processor
executes exactly one instruction, which can be any random-access-machine
instruction or even a null instruction. All processors are tightly
synchronized at each cycle, and the synchronization overhead is assumed
to be zero. Communication between processors is done through accessing
shared variables (i.e. shared memory), with the communication overhead
ignored. The parallelism overhead is also ignored, so the only overhead
accounted for in a PRAM is the load imbalance overhead.
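As a minimal sketch (not from the text) of how an algorithm is costed
under these assumptions: summing n values with n processors takes only
ceil(log2 n) cycles, because in every cycle each active processor performs
one addition on shared variables and synchronization costs nothing. The
small C program below simulates that lockstep reduction and counts the
cycles; the array contents and names are purely illustrative.

    /* Sketch: parallel sum of n = 8 values under the PRAM cost model.
     * In every cycle each active "processor" i executes one addition on
     * shared memory; communication and synchronization cost nothing, so
     * the reduction finishes in log2(8) = 3 cycles. */
    #include <stdio.h>

    int main(void) {
        int a[8] = {3, 1, 4, 1, 5, 9, 2, 6};   /* the shared memory */
        int n = 8, cycles = 0;

        for (int stride = 1; stride < n; stride *= 2) {
            /* one PRAM cycle: processors 0, 2*stride, 4*stride, ...
             * each add one neighbouring partial sum */
            for (int i = 0; i + stride < n; i += 2 * stride)
                a[i] += a[i + stride];
            cycles++;                    /* all processors advance in lockstep */
        }
        printf("sum = %d after %d cycles\n", a[0], cycles);  /* 31 after 3 cycles */
        return 0;
    }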
Problem 1.3
a) Parallel vector processors (PVP)
1. PVP systems contain a small number of powerful
custom-designed vector processors to manipulate vector data.
2. PVP systems normally do not use caches; instead, they
use a large number of vector registers and an instruction buffer.
b) Symmetric multiprocessor (SMP)
1. Every processor has equal access to the shared
memory, the I/O devices, and the operating system services.
2. The processors are connected to the shared memory through a
high-speed snoopy bus. The limitation is that SMPs use a centralized
shared memory and a bus or crossbar system interconnect, both of which
are difficult to scale once built.
c) Massively parallel processor (MPP)
1. A very large-scale computer system with commodity
processing nodes interconnected by a high-speed, low-latency network.
Memories are physically distributed among the nodes.
2. Typically only a few host nodes run a complete OS,
while the compute nodes run a microkernel.
d) Cluster of workstations (COW)
1. Each node is itself a complete workstation, possibly
minus some peripherals (e.g. monitor, keyboard, mouse).
2. There is a complete OS residing on each node.
The OS of a COW is the same workstation UNIX, plus an add-on software
layer to support a single system image, availability, parallelism,
communication, and load balancing.
e) Distributed shared memory (DSM) machine
1. The memory of a DSM machine is
physically distributed among different nodes, but system hardware and
software create the illusion of a single address space for application
users.
2. Special hardware and software extensions
are used to maintain memory consistency and coherence.
Problem 1.7
a) For multiprocessors, the replicated unit is the processor, while in a
multicomputer the replicated unit is the whole computer. Multiprocessors
are tightly coupled, with a high degree of resource sharing through a
high-speed backplane or motherboard. Multicomputers, by contrast, consist
of multiple computers, often called nodes, interconnected by a
message-passing network. They are loosely coupled, with only a low degree
of resource sharing through a commodity network. Each node is an
autonomous computer consisting of a processor, local memory, and sometimes
attached disks or I/O peripherals.
In a multiprocessor, the processors share the memory, disks, network,
and I/O devices, and they interact through shared variables. In a
multicomputer, each node has its own disk and memory (and possibly I/O);
the only shared resource is the network, and interaction is done through
message passing.
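To make the contrast concrete, here is a minimal sketch (not from the
text) of the shared-variable style of interaction, using POSIX threads;
on a multicomputer the same exchange would instead be an explicit message
over the network, e.g. an MPI_Send/MPI_Recv pair. The variable and
function names are illustrative only.

    /* Sketch: interaction through a shared variable, as on a
     * multiprocessor.  The producer thread simply stores into memory
     * that the main thread can read; a lock orders the accesses. */
    #include <pthread.h>
    #include <stdio.h>

    static int shared_value;                    /* lives in the shared memory */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *producer(void *arg) {
        (void)arg;
        pthread_mutex_lock(&lock);
        shared_value = 42;                      /* the "send" is just a store */
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void) {
        pthread_t t;
        pthread_create(&t, NULL, producer, NULL);
        pthread_join(&t, NULL);                 /* ensures the store is visible */
        printf("read %d from shared memory\n", shared_value);
        return 0;
    }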
b)
* UMA stands for uniform memory access. All memory locations
are an equal distance from any processor, and all memory accesses take
roughly the same amount of time.
* NUMA stands for non-uniform memory access. It does not support constant-time
read and write operations. In most NUMA architectures, memory is organized
hierarchically, so that some portions can be read and written more quickly
by some processors than by others (see the sketch after this list).
* COMA stands for cache-only memory architecture. All local memories
are structured as caches. Such a cache has much larger capacity than the
level-2 cache or the remote cache of a node. COMA is the only architecture
that provides hardware support for replicating the same cache block in
multiple local memories.
* DSM stands for distributed shared memory. The memories are physically
distributed among the processors but logically shared, so that a
programmer sees a single logically shared memory. Some machines also have
a hierarchical memory organization: one processor's local memory may be
another's remote memory, and besides the local memories the processors
have access to a global memory.
* NORMA stands for no-remote-memory access. The node memories have
separate address spaces, and a node cannot directly access remote memory;
the only way to access remote data is by passing messages.
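As a concrete illustration of the NUMA point above (a sketch assuming a
Linux system with the default first-touch page placement, not something
stated in the text): pages are placed on the memory of the node whose
thread first touches them, so later accesses that keep the same thread
layout are mostly local, while a different layout would make many of them
remote and therefore slower.

    /* Minimal NUMA first-touch sketch (assumes the Linux default
     * placement policy; compile with -fopenmp).  The parallel
     * initialization is the "first touch" that places each page on the
     * node of the thread that touches it. */
    #include <stdlib.h>

    #define N (1L << 24)

    int main(void) {
        double *a = malloc(N * sizeof *a);
        if (!a) return 1;

        #pragma omp parallel for schedule(static)
        for (long i = 0; i < N; i++)
            a[i] = 0.0;             /* first touch: page placed near this thread */

        #pragma omp parallel for schedule(static)
        for (long i = 0; i < N; i++)
            a[i] += 1.0;            /* same static schedule -> mostly local (fast) */

        free(a);
        return 0;
    }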
c) Clusters differ from a conventional network of autonomous computers
in that all cluster nodes must be able to work collectively as a single,
integrated computing resource, in addition to filling the conventional
role of serving interactive users individually. A cluster realizes this
single-resource concept through a number of single-system-image (SSI)
techniques. The SSI layer makes the nodes collaborate to give the user
the illusion of one single huge computer, so that the system is easier
to use, manage, and maintain. Furthermore, availability is enhanced by
redundancy features that eliminate single points of failure.
d) Clusters have the following features:
* A cluster is more scalable than an SMP, and its scalability is
multi-dimensional. SMPs scale only in processors, while clusters scale
in many components, including processors, memory, disks, and even I/O
devices. Being loosely coupled, clusters can scale to hundreds of nodes,
while it is extremely difficult to build an SMP of more than a few tens
of processors. In an SMP, the shared memory (and the memory bus) is a
bottleneck, while in a cluster there is no such memory bottleneck: a
cluster can provide much higher aggregate memory bandwidth and reduced
memory latency. The local disks of a cluster also aggregate to a large
disk space, which can easily surpass that of a centralized RAID disk.
The enhanced processing, storage, and I/O capability enables a cluster
to solve large-scale problems using well-developed parallel software
packages.
* A cluster has multiple memories, local disks, and processors. When
one component fails, the others can still be used to keep the cluster
going. In contrast, when the shared memory of an SMP machine fails, the
entire system is brought down. A cluster also has multiple OS images,
each residing on a separate node; when one system image crashes, the
other nodes still work. An SMP has only a single OS image residing in
the shared memory, and the failure of this image crashes the entire
system.