Bringing Computing Algorithms to Life Bruce R. Maxim Mathematics and Statistics The Universiy of Michigan-Dearborn Gregory F. Bachelis Mathematics Wayne State University David A. James Mathematics and Statistics The University of Michigan-Dearborn Quentin F. Stout Electrical Engineering and Computer Science The University of Michigan-Ann Arbor Recent advances in the development of parallel computers have spurred new interest in parallel algorithms among computer scientists. Algorithms written for conventional computers call for their steps to be executed one at a time. We refer to these as serial algorithms. Parallel, or concurrent algorithms, call for execution of several of their steps at the same time. The ideas underlying many parallel algorithms are quite accessible to secondary school students. Many students are already aware of the notion of parallel operations as they relate to automobile assembly or old-time barn raising. Consequently, parallel algorithms provide a source of interesting material for secondary school students in their study of algorithms. This article describes activities for introducing the concepts of parallel processing in the secondary school classroom. In doing any algorithmic work with students, it is a good idea to have them trace through the steps of an algorithm using test data, rather than to have them merely read descriptions of the algorithm. The hardest part about tracing parallel algorithms is that often a lot is going on at the same time, and it is difficult to observe the overall progress of the algorithm towards the completion of its goal. Our solution to this dilemma is to have the students assume the roles of processors in a hypothetical parallel computer and to act out the algorithm. Using this approach brings parallel algorithms to life. The students not only gain a greater appreciation of what constitutes an algorithm, but they also experience firsthand the increased efficiency to be gained when several people cooperate with one another to achieve an objective. Here we are talking about the type of cooperative activity in which everyone is doing a similar task. This is a very important concept in parallel computing, since the processors which are linked together are usually identical to each other. In this article, we concentrate on the problem of sorting a list of numbers, although a number of other problems are equally accessible. The activities described below take about one class period and have been well received by several secondary school mathematics and computing classes. Prior to introducing the activities discussed below, the basic notions algorithms should be reviewed with the students. This might be done by having the students develop algorithms for tasks like adding two numbers together or getting dressed in the morning. The importance of efficiency should be approached by focusing the students' attention on the speed with which an algorithm accomplishes its objective. The students should also be given a chance to discuss how their algorithm for getting dressed might change if it were possible to put on several pieces of clothing simultaneously. Initially, the students are shown a deck of 16 numbered cards arranged in random order and asked to think of ways of sorting them in ascending order. During the discussion, the teacher should help the students focus on identifying the subtasks which make up the process of sorting. At some point, the teacher should suggest using the selection sort algorithm, described below, to put the deck in increasing order. Selection Sort 1. Find the largest card in the deck. 2. Place this card face up on a pile of largest cards. 3. Repeat steps 1 and 2 for remaining cards until only one card remains in the deck. 4. Place the last card on the pile of largest cards. To actually implement the algorithm described above, the class should first consider the task of finding the largest card in the deck. An algorithm for this appears below. The basic operation involved in finding the largest card is that of comparing two cards, and a reasonable measure of the algorithm's efficiency is to count the number of comparison steps required. Two student volunteers will be required, one to act as the processor and one to count the comparisons by making tally marks on the chalk board. The student/processor is then handed the deck of 16 cards. He or she begins by picking up two cards, returning the larger card to the original deck and discarding the smaller (beginning a loser's pile), while the student/recorder makes a single tally mark. When the algorithm ends the student/recorder will have made 15 tally marks. These marks should be labeled (on the board) with the heading "Largest Card". Largest Card 1. Pick up two cards from the deck. 2. Place the smaller on a pile of lesser cards. 3. Return larger card to deck. 4. Repeat steps 1 to 3 until one card remains in the deck. Having now determined the largest card, the students should return to the task of using selection sort to put the cards in increasing order. The largest card discovered above is placed face up on the table. The largest card algorithm is then carried out using the 15 cards discarded during its first execution. The student/recorder continues making tally marks for each comparison. After the student/processor has located the largest card among the 15 remaining cards (the second-largest card overall) and placed it on the table on top of the (already found) largest card, the student/recorder will have made 14 tally marks. These new marks should be labeled "Next Card". The class should be asked to predict the number of steps required to locate the largest card among the 14 remaining cards and then be given the opportunity to observe that it does indeed take 13 comparison steps. After placing this card on top of the two cards already on the table the class should be asked to consider what is happening to the stack of "largest" cards growing on the table. They need to be certain that if the process is continued, this stack of cards will eventually contain all 16 cards arranged in ascending order. Once the students agree the algorithm will correctly sort the deck of cards, they should be asked to calculate the number of comparisons required to do so. This can be done by completing the table begun by the student/recorder and then summing the entries as shown below. 15+14+13+12+11+10+9+8+7+6+5+4+3+2+1 = 120 comparisons The students should now be asked to consider whether using more processors for parts of the selection sort algorithm would reduce the time required to sort the deck. They should be led to the suggestion of splitting the deck in two and then using two student/processors, acting concurrently, each performing the selection sort algorithm, described above, on his or her half of the deck. This would result in two sorted decks, each having eight cards. Some technique for combining these decks of cards into a single sorted deck would then need to be devised. This "divide-and-conquer" technique is a standard device in parallel algorithms. At this juncture, some points about parallel processing should be considered by the students. The first is the importance of the two processors working simultaneously and synchronously. This can be guaranteed by instructing the processors to wait for a verbal signal from the teacher before comparing each pair of cards. The class must also agree to count the two-card comparisons being made simultaneously by the two processors as a single tally count. The tally count will therefore represent the number of units of time (or concurrent comparison steps) required to complete the algorithm. Now the two processor portion of the revised selection sort algorithm can be run. Two student/processors and one recorder should be designated. The student/processors are each given 8 cards and told to wait for a signal prior to comparing each pair of cards and discarding the smaller. After receiving 7 such signals, each student/processor will have found the largest card among his or her deck of eight, and the recorder will have made 7 tally marks on the board and labeled them "Largest Card". The student/processors then repeat this process with their remaining 7 cards, 6 cards, etc., until each has completed the sorting algorithm for his or her individual deck of cards. The tally marks should be summed with the result shown below. 7+6+5+4+3+2+1 = 28 comparison steps Now is the time to consider how to combine the two decks. The teacher should suggest using the merge algorithm, as applied to two sorted lists, if the students don't suggest this on their own. The merge algorithm appears below. Merging Two Decks 1. Take one card from each deck. 2. Place smaller card face down on a pile of smaller cards. 3. Pick up a card from the appropriate deck with the empty hand. 4. Repeat steps 1 to 3 until one deck is exhausted. 5. Place remaining deck face down on deck of smaller cards. As before, a tally is made of the number of comparisons required to accomplish this operation. When the merge is completed the recorder will have made at most 15 additional tally marks on the board. So using two processors, this selection-merge sorting algorithm requires at most 28 + 15 = 43 comparison steps. The students should be asked to think of conditions under which the merge operation requires less than the maximum of 15 comparison steps. The students should be given a chance to discuss what they have observed about sorting using one processor versus sorting using two. If invited, they will no doubt suggest that using four or eight processors might be one way to further reduce the number of comparison steps required. In the four processor algorithm, each processor initially sorts 4 cards, which takes 6 comparisons. Then processors 1 and 2 merge their sorted decks of 4 cards, while processors 3 and 4 merge theirs. These merges require at most 7 comparison steps. The resulting 8 card decks are then merged, requiring at most 15 additional comparison steps. The total number of steps is at most 6 + 7 + 15 = 28 comparison steps. Finally, the 8 processor algorithm allows the initial sort to be completed with one "round" involving a single comparison step, followed by three rounds of merge operations requiring 3, 7, and 15 steps, respectively. So this algorithm requires a total of at most 26 comparison steps. We might call this version of the algorithm "8-processor mergesort" since, as applied to 16 cards, it is all "merge", with the "selection" portion having been reduced to a single comparison step. It is important to point out to the students that in the 8- processor mergesort the number of processors that are working is cut in half after each round. The students might like to consider whether a further reduction in the number of comparison steps could be achieved if the work load were distributed more evenly among the processors. The idea would be to involve all processors during each round of the algorithm. Another way of doing selection sort in parallel is to arrange the processors into a tree as shown below. This particular tree has 15 processors. Initially the 16 cards are distributed to the processors in Row 4, so that each student has two cards. Processor Configuration for Parallel Selection Sort on a Tree Row 4 * / \ Row 3 * * / \ / \ Row 2 * * * * / \ / \ / \ / \ Row 1 * * * * * * * * There is a great deal of activity going on in this particular algorithm. Each row of processors is doing something different. We have found it helpful to Xerox the instructions and have each student/processor look only at his or her instructions. It is also important to make sure that each student/processor knows where his or her children and parent are located, before attempting to run the algorithm. The instructions for each row appear below. Basically, what happens is that the smaller card held by each student/processor will be passed to his or her parent, who repeats these actions. It can be shown that the number of parallel computation steps for this algorithm is 34. Row 1 Instructions Round 1: Give your smaller card to your parent. You may get the card back. Round 2, 4, etc. (Even numbered rounds): Bye Round 3, 5, etc. (Odd numbered rounds): If parent faces you and you still have cards left then Give parent your smaller card (or only card). You may get the card back. If you run out of cards then Do nothing after this round. Row 2 Instructions Round 1: Face your children. Receive one card from each. Return larger card to child who sent it. Round 2: Give your card to your parent. You may get the card back. Round 3, 5, etc. (Odd numbered rounds): If you have no card then Face your children. Receive a card from each, if they still have any. If both children send cards then Return the larger to child who sent it. If neither sends you a card then Do nothing after this round. Round 4, 6, etc. (Even numbered rounds): If parent faces you and you still have a card left then Give parent your smaller (or only) card. You may get the card back. Row 3 Instructions Round 1: Bye Round 2: Face your children. Receive one card from each. Return larger card to child who sent it. Round 3, 5, etc. (Odd numbered rounds): Give your card to your parent. You may get the card back. Round 4, 6, etc. (Even numbered rounds): If you have no card then Face your children. Receive a card from each, if they still have any. If both chilren send cards then Return the larger to child who sent it. If neither sends you a card then Do nothing after this round. Row 4 Insructions Round 1, 2: Bye. Round 3, 5, etc. (Odd numbered rounds): Face your children. Receive a card from each, if they still have any. If both children send cards then Return the larger to child who sent it. If neither sends you a card then Sort is finished. Round 4, 6, etc. (Even numbered rounds): Place your card face down on top of the pile in front of you. (Pile is created during round 4). The tree version of selection sort, like the selection-merge sorts discussed earlier, still suffers from the problem of having a serial "bottleneck". It is possible to implement an odd-even compare-exchange sorting algorithm using 16 students which achieves perfect "load-balancing". The students are arranged so that 8 face the front of the room and 8 face the back of the room, as shown below. This algorithm is a parallel version of bubblesort; it can be shown that at most 16 (concurrent) comparison steps are required to sort 16 cards. Unlike serial bubblesort, this parallel bubblesort algorithm is very efficient. Parallel Bubble Sort Processor Arrangement front * * * * * * * * * back * * * * * * * * * The algorithm for utilizing this arrangement of processors appears below. The cards are initially distributed one per processor. Note, that each processor is kept busy during every round of the algorithm. We would suggest that the instructions for each group of processors be displayed on the front and back walls of the classroom and that students be given a chance to rehearse their actions before running the complete algorithm. It may be the case that the deck is sorted using fewer than 16 comparisons and the students should be encouraged to discuss how this might occur. Parallel Bubble Sort Algorithm ODD Rounds: IF facing front THEN Pass card right and across Receive one card back IF facing back THEN Receive card from right Compare two cards Pass back smaller card EVEN Rounds: IF facing front THEN Pass card left and across Receive one card back IF facing back THEN Receive card from left Compare two cards Pass back smaller card We hope that you will try some of these activities with your students. If you do, we would like to hear of your experiences. You and your students are encouraged to introduce modifications to the activities we have described. If you are interested in pursuing other activities with your students involving serial and parallel algorithms (such as adding large numbers, finding primes, or additional sorting algorithms), or if would like more detailed lesson plans for the activities described above, please contact one of us at our respective universities.