a) Scalability over machine size
This indicates how well the performance of a scalable parallel computer
will improve with additional processors. The resources increased are most
frequently processors, but they could also be memory capacity and I/O
capability. There is also a maximum number of processors a system can
accommodate, which imposes an upper bound on scalability over machine
size.
b) Scalability over problem size
This indicates how well the system can handle larger problems with
larger data sizes and workloads. Apart from the machine size, it also
depends on the memory capacity and the communication capability of the
machine.
c) Resource scalability
This refers to gaining higher performance or functionality by increasing
the machine size (i.e. the number of processors), investing in more storage
(cache, main memory, disks), improving the software, etc. Within this
dimension, three categories have to be considered. Machine size scalability
indicates how well the performance will improve with additional processors.
Scaling up in resources means gaining higher performance by investing in
more memory, bigger off-chip caches, bigger disks, and so on. Finally,
software scalability indicates how the performance of a system can be
improved by a newer version of the OS with more functionality, a better
compiler with more efficient optimizations, more efficient mathematical
and engineering libraries, more efficient and easy-to-use application
software, and a more user-friendly programming environment.
d) Generation scalability
This refers to the capability of a system to scale up its performance
using next-generation components, such as a faster processor, a faster
memory, a newer version of the operating system, a more powerful compiler,
etc., while the rest of the system remains usable and requires as little
modification as possible.
e) Heterogeneity scalability
This property refers to how well a system can scale up by integrating
hardware and software components supplied by different designers or
vendors. This calls for using components with a standard, open architecture
and interface. In the software area, this is called portability.
Problem 1.2
For the PRAM model, the machine size n can be arbitrarily large. The
basic time step is called a cycle, and within each cycle each processor
executes exactly one instruction, which can be any random-access-machine
instruction or even a null instruction. All processors are tightly
synchronized at each cycle, and the synchronization overhead is assumed
to be zero. Communication between processors is done through accessing
shared variables (i.e. shared memory), with the communication overhead
ignored. The parallelism overhead is also ignored, so the only overhead
accounted for in a PRAM is the load imbalance overhead.
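As a minimal sketch (not from the text) of how an algorithm is costed
under these assumptions: summing n values with n processors takes only
ceil(log2 n) cycles, because in every cycle each active processor performs
one addition on shared variables and synchronization costs nothing. The
small C program below simulates that lockstep reduction and counts the
cycles; the array contents and names are purely illustrative.

    /* Sketch: parallel sum of n = 8 values under the PRAM cost model.
     * In every cycle each active "processor" i executes one addition on
     * shared memory; communication and synchronization cost nothing, so
     * the reduction finishes in log2(8) = 3 cycles. */
    #include <stdio.h>

    int main(void) {
        int a[8] = {3, 1, 4, 1, 5, 9, 2, 6};   /* the shared memory */
        int n = 8, cycles = 0;

        for (int stride = 1; stride < n; stride *= 2) {
            /* one PRAM cycle: processors 0, 2*stride, 4*stride, ...
             * each add one neighbouring partial sum */
            for (int i = 0; i + stride < n; i += 2 * stride)
                a[i] += a[i + stride];
            cycles++;                    /* all processors advance in lockstep */
        }
        printf("sum = %d after %d cycles\n", a[0], cycles);  /* 31 after 3 cycles */
        return 0;
    }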
Problem 1.3
a) Parallel vector processors (PVP)
1. PVP systems contain a small number of powerful
custom-designed vector processors to manipulate vector data.
2. PVP systems normally do not use caches; instead, they
use a large number of vector registers and an instruction buffer.
b) Symmetric multiprocessor (SMP)
1. Every processor has equal access to the shared
memory, the I/O devices, and the operating system services.
2. The processors are connected to the shared memory through a
high-speed snoopy bus. The limitation is that SMPs use a centralized
shared memory and a bus or crossbar system interconnect, both of which
are difficult to scale once built.
c) Massively parallel processor (MPP)
1. A very large-scale computer system with commodity
processing nodes interconnected by a high-speed, low-latency network.
Memories are physically distributed among the nodes.
2. Typically only a few host nodes run a complete OS,
while the compute nodes run a microkernel.
d) Cluster of workstations (COW)
1. Each node is itself a complete workstation, possibly
minus some peripherals (e.g. monitor, keyboard, mouse).
2. There is a complete OS residing on each node.
The OS of a COW is the same workstation UNIX, plus an add-on software
layer to support a single system image, availability, parallelism,
communication, and load balancing.
e) Distributed shared memory (DSM) machine
1. The memory of a DSM machine is
physically distributed among different nodes, but system hardware and
software create the illusion of a single address space for application
users.
2. Special hardware and software extensions
are used to maintain memory consistency and coherence.
Problem 1.7
a) For multiprocessors, the replicated unit is the processor, while in a
multicomputer the replicated unit is the whole computer. Multiprocessors
are tightly coupled, with a high degree of resource sharing through a
high-speed backplane or motherboard. Multicomputers, by contrast, consist
of multiple computers, often called nodes, interconnected by a
message-passing network. They are loosely coupled, with only a low degree
of resource sharing through a commodity network. Each node is an
autonomous computer consisting of a processor, local memory, and sometimes
attached disks or I/O peripherals.
In a multiprocessor, the processors share the memory, disks, network,
and I/O devices, and they interact through shared variables. In a
multicomputer, each node has its own disk and memory (and possibly I/O);
the only shared resource is the network, and interaction is done through
message passing.
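To make the contrast concrete, here is a minimal sketch (not from the
text) of the shared-variable style of interaction, using POSIX threads;
on a multicomputer the same exchange would instead be an explicit message
over the network, e.g. an MPI_Send/MPI_Recv pair. The variable and
function names are illustrative only.

    /* Sketch: interaction through a shared variable, as on a
     * multiprocessor.  The producer thread simply stores into memory
     * that the main thread can read; a lock orders the accesses. */
    #include <pthread.h>
    #include <stdio.h>

    static int shared_value;                    /* lives in the shared memory */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *producer(void *arg) {
        (void)arg;
        pthread_mutex_lock(&lock);
        shared_value = 42;                      /* the "send" is just a store */
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void) {
        pthread_t t;
        pthread_create(&t, NULL, producer, NULL);
        pthread_join(&t, NULL);                 /* ensures the store is visible */
        printf("read %d from shared memory\n", shared_value);
        return 0;
    }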
b)
* UMA stands for uniform memory access. All memory locations
are an equal distance from any processor, and all memory accesses take
roughly the same amount of time.
* NUMA stands for non-uniform memory access. It does not support constant-time
read and write operations. In most NUMA architectures, memory is organized
hierarchically, so that some portions can be read and written more quickly
by some processors than by others (see the sketch after this list).
* COMA stands for cache-only memory architecture. All local memories
are structured as caches. Such a cache has much larger capacity than the
level-2 cache or the remote cache of a node. COMA is the only architecture
that provides hardware support for replicating the same cache block in
multiple local memories.
* DSM stands for distributed shared memory. The memories are physically
distributed among the processors but logically shared, so that a
programmer sees a single logically shared memory. Some machines also have
a hierarchical memory organization: one processor's local memory may be
another's remote memory, and besides the local memories the processors
have access to a global memory.
* NORMA stands for no-remote-memory access. The node memories have
separate address spaces, and a node cannot directly access remote memory;
the only way to access remote data is by passing messages.
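As a concrete illustration of the NUMA point above (a sketch assuming a
Linux system with the default first-touch page placement, not something
stated in the text): pages are placed on the memory of the node whose
thread first touches them, so later accesses that keep the same thread
layout are mostly local, while a different layout would make many of them
remote and therefore slower.

    /* Minimal NUMA first-touch sketch (assumes the Linux default
     * placement policy; compile with -fopenmp).  The parallel
     * initialization is the "first touch" that places each page on the
     * node of the thread that touches it. */
    #include <stdlib.h>

    #define N (1L << 24)

    int main(void) {
        double *a = malloc(N * sizeof *a);
        if (!a) return 1;

        #pragma omp parallel for schedule(static)
        for (long i = 0; i < N; i++)
            a[i] = 0.0;             /* first touch: page placed near this thread */

        #pragma omp parallel for schedule(static)
        for (long i = 0; i < N; i++)
            a[i] += 1.0;            /* same static schedule -> mostly local (fast) */

        free(a);
        return 0;
    }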
c) Clusters differ from a conventional network of autonomous computers
in that all cluster nodes must be able to work collectively as a single,
integrated computing resource, in addition to filling the conventional
role of serving interactive users individually. A cluster realizes this
single-resource concept through a number of single-system-image (SSI)
techniques. The SSI layer makes the nodes collaborate to give the user
the illusion of one single huge computer, so that the system is easier
to use, manage, and maintain. Furthermore, availability is enhanced by
redundancy features that eliminate single points of failure.
d) Clusters have the following features:
* A cluster is more scalable than an SMP, and its scalability is
multi-dimensional. SMPs scale only in processors, while clusters scale
in many components, including processors, memory, disks, and even I/O
devices. Being loosely coupled, clusters can scale to hundreds of nodes,
while it is extremely difficult to build an SMP of more than a few tens
of processors. In an SMP, the shared memory (and the memory bus) is a
bottleneck, while in a cluster there is no such memory bottleneck: a
cluster can provide much higher aggregate memory bandwidth and reduced
memory latency. The local disks of a cluster also aggregate to a large
disk space, which can easily surpass that of a centralized RAID disk.
The enhanced processing, storage, and I/O capability enables a cluster
to solve large-scale problems using well-developed parallel software
packages.
* A cluster has multiple memories, local disks, and processors. When
one component fails, the others can still be used to keep the cluster
going. In contrast, when the shared memory of an SMP machine fails, the
entire system is brought down. A cluster also has multiple OS images,
each residing on a separate node; when one system image crashes, the
other nodes still work. An SMP has only a single OS image residing in
the shared memory, and the failure of this image crashes the entire
system.