高性能计算/并行计算基础问题

课程作业问题。求助中文答案
1.What are the main features/components in parallel computer architectures? Briefly describe the Shenwei Taihu Light in terms of these features/components.
2.Can we use the same approach (for instance apply striped partitioning of the matrix) to parallelize the Jacobi iteration and the Gauss-Seidel iteration? Explain your answers.
3.Explain why the use of standard MPI_SEND and MPI_RECV can cause a deadlock in a parallel execution. Give an example of such a deadlock. Will the use of shared-memory to exchange data sometimes also leads to a deadlock? Explain your answer.
4.What do we understand under the class of so-called “hierarchical methods or algorithms”? What do these methods have in common?
Compare the parallelization of the Multi-Grid method and the Barnes-Hut algorithm, what makes the parallelization of the Barnes-Hut algorithm more complicated?
Can you think of a problem in your (future) application which might benefit from a hierarchical approach? (in the discussion about the last question, you don’t have to give a solution already, just some ideas or thoughts are sufficient).
5.Consider the following MPI program.
(a) Is this a correct program? What should be modified to make it more efficient?
(b) If you compile this program, it will most likely execute without problem, however, if the send and receive of 1 value is replaced by 10^6 numbers (the program is not calculating the sum anymore), what will then happen with the communication?
/* C program: Sum processor numbers by passing them around in a ring. /
#include "mpi.h"
#include
/ Set up communication tags (these can be anything) /
#define to_right 201
#define to_left 102
void main (int argc, char *argv[]) {
int ierror, value, new_value, procnum, numprocs;
int right, left, other, sum, i;
MPI_Status recv_status;
/ Initialize MPI /
MPI_Init(&argc, &argv);
/ Find out this processor number /
MPI_Comm_rank(MPI_COMM_WORLD, &procnum);
/ Find out the number of processors /
MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
/ Compute rank (number) of processors to the left and right /
right = procnum + 1;
if (right == numprocs) right = 0;
left = procnum - 1;
if (left == -1) left = numprocs-1;
sum = 0;
value = procnum;
for( i = 0; i < numprocs; i++) {
/ Send to the right /
MPI_Send(&value, 1, MPI_INT, right, to_right,
MPI_COMM_WORLD);
/ Receive from the left /
MPI_Recv(&new_value, 1, MPI_INT, left, to_right,
MPI_COMM_WORLD, &recv_status);
/ Sum the new value /
sum = sum + new_value;
/ Update the value to be passed /
value = new_value;
/ Print out the partial sums at each step /
printf ("PE %d:\t Partial sum = %d\n", procnum, sum);
}
/ Print out the final result /
if (procnum == 0) {
printf ("Sum of all processor numbers = %d\n", sum);
}
/ Shut down MPI */
MPI_Finalize();
return;
}
图片说明

问题太多了，回答第一题吧。如果你采纳了本答案，可以回答更多。
What are the main features/components in parallel computer architectures? Briefly describe the Shenwei Taihu Light in terms of these features/components.
并行计算架构主要分为纯cpu，cpu和gpu混合两种。神威太湖的相关情况可以看公开的报道，据说用的是国产的众核cpu。众核cpu介于cpu和gpu之间。