Help for the Hashing Animation Tool

Introduction | Sample Instructions | Additional Display Information | Algorithms | Operating Requirements | Suggestions
 
  Introduction:  
 

The purpose of this applet is to demonstrate, and allow comparisons between, five hashing algorithms:

After the user enters specifications for the hash table and numerical input, the applet will animate the storage or retrieval of the data. As the demo runs, the user may display the pseudocode for the algorithm. In this mode, lines of code are highlighted in synchronization with the animation to demonstrate the algorithm logic. The user may alternatively elect to display statistics that are continuously updated as the algorithm executes.

Tabular setup and statistics summaries are provided for up to five runs. These allow the user to compare the performance of different algorithms or the same algorithm with different settings. In addition, the applet will also display plots of average time and average probes as a function of table load.

 

 
  Sample Instructions:  
 
  1. When the applet loads in the browser it displays an animated title screen and a toolbar along the bottom. To display the Setup page for the demo, select an algorithm from the drop-down menu or click the Setup button if you wish to use Linear Probing.

  2. The Setup page shows the current settings for the selected algorithm. If you wish to change a setting, click the Change button for the appropriate section, enter the new setting(s) and click the OK button. Instructions for each input are provided in the Notes display. When you are finished, click the Execute button on the toolbar.

  3. The Execution page displays the hash table, hash function, execution progress and the statistics table. To view the algorithm pseudocode instead of the statistics, click the Source Code button. Click the Statistics button if you wish to toggle back.

  4. To begin the demo click either the Step or Run button. If the Statistics table is showing, the Step button will execute one table search or insertion. If the pseudocode is displayed, it will execute the action corresponding to the highlighted line of code. Click Step again for the next action or click the Run button to animate the demo to completion. To adjust the speed of the animation or disable the sound, click the Options button and set the controls in the pop-up dialog box. Clicking the Pause button while a demo is running halts the execution and allows you to switch back to Step mode if desired. Clicking Abort resets the execution display back to its start state. After executing one or more demos, click the Results button.

  5. The Results page displays tabular summaries of up to five completed runs. Here you can compare the performance of different algorithms or settings. To delete a run, click on any of its summary fields to select it and click the Delete button. Clicking the Undelete button restores any deleted summaries one at a time if there is room to display them. Next, click the Plot button.

  6. The Plot page allows you to plot Average Time and Average Probes as functions of Average Table Load Factor for storage and as functions of numbers searched/total numbers for retrieval. To plot a run, click on its summary row with the mouse. The summary text will change color and will match its corresponding data on the graph. This page allows three possible views of the two plots: two small plots, one large time plot or one large probes plot. Click either the Time, Probes or Both button on the toolbar to display the desired view.

You are not restricted to performing the steps in the order outlined above. Any button or control that is enabled may be used at any time.

 

 
  Additional Display Information:  
 

This section provides additional information about each of the applet displays.

 
Setup Page

Each algorithm is setup individually. When the Setup page is selected it will display the default settings for the current algorithm or the settings entered the last time the Setup page for that algorithm was modified.

The range limits on the individual settings are interrelated and some of them are strictly for display purposes. For example, more one-slot buckets will fit on the screen than three-slot buckets. Therefore, if you increase the slots per bucket from one to three the applet may automatically reduce the number of buckets to ensure the table will fit on the screen. When this happens, the Notes area will display a message indicating that an adjustment was made and a notification sound will play (if sound is enabled). A similar adjustment will also occur if you make changes that reduce the capacity of the table to below that of the current data size setting.

There are two options that are only available for Retrieve demos. Both are found in the Execution mode section of the page. The first allows you to specify the percentage of searches for data that will be successful. The second is a quick load option. For Retrieve demos, you may watch or step through the loading of the table prior to running the actual data retrieval. To skip this portion of the demo, select the quick load checkbox. The table will then be loaded with data prior to being displayed.

 

Execution Page

Table Display

The hash table uses different colors to indicate various states and actions. An empty bucket slot is gray. A slot that contains a value is purple. Links to other buckets are cyan. A link displaying a diagonal line indicates that the value of that pointer is null. Nodes of linked-list chains are show in orange.

When a bucket or node is searched, either to find an empty slot or to find a certain value, the border of the bucket or node is highlighted in yellow. A successful search for a value is indicated by displaying the slot or node containing the value in green.

Due to display limitations, nodes of linked-list chains over a certain length will not be visible. In these cases, the last visible node will have a dashed link arrow to indicate there are additional nodes in the chain. This arrow will be shown in yellow to indicate when these additional nodes are being searched and it will be shown in green when a search is successful.

The text above of table indicates whether a store or retrieve demo is selected. It also indicates the current search status of the algorithm or whether the demo has finished executing.

 

Hash Function Display

The upper right portion of the Execution page indicates the current hash function. In all cases division hashing is used and the equation will be:

   home address = value mod (number of buckets)

This display is updated as each value is stored or retrieved.

 

Progress Display

The progress display is immediately below the hash function. This display indicates how many values out of the data size specified during setup remain to be processed. The progress display is the only portion of the Execution page updated during a quick load.

 

Statistics Display

The following statistics are continuously updated as the demo executes:

  • Time - the minutes, seconds, and centiseconds that have elapsed since the demo began. This data is obtained from your computer system's clock and is not adjusted to account for animation speed or delays introduced by using the Step and Pause buttons.
  • Average Probes - the average number of bucket and node searches per value that the algorithm attempted to store or retrieve.
  • Load Factor - the number of values in the table divided by the table capacity. The table capacity equals: buckets x slots + size of separate overflow.
  • Searches - if a value was successfully stored or retrieved from the table it counts as a successful search. When the algorithm finishes the sum of the successful and unsuccessful searches will equal the data size.

Pseudocode Display

This portion of the page shows the pseudocode for the current algorithm. As the demo runs the lines of code are highlighted in synchronization with the animation to demonstrate the algorithm logic. See the Algorithms section for more information about the individual collision resolution algorithms.

 

Results Page

Any demo that is allowed to run to completion creates a summary listing that will be displayed on the Results page. Please note that the results displayed for a retrieve demo do not include any of the probes, searches, etc. that occurred while the table was being loaded.

The Results Page displays summaries for up to five runs. If additional demos are run, the summary displayed in column one will be lost, the remaining summaries shifted down and the new summary added to column five.

You may delete any of the displayed summaries to make room on the page for new runs. To do this click any of the summary data for the runs you wish to remove and click the Delete button. Clicking the Undelete button restores deleted summaries one at a time if there is room to display them. Up to five deleted summaries may be restored.

 

Plot Page

The Plot page allows you to plot Average Time and Average Probes for up to five demos. Average Time refers to the average number of seconds that elapsed for each value the algorithm attempted to store or retrieve. Similarly, Average Probes is the average number of bucket and node searches per value that the algorithm attempted to store or retrieve. The interpretation of the x-axis depends on if the algorithm was storing or retrieving data. For storage runs, the plots are functions of Average Table Load Factor. Retrieve demos, however, plot data as functions of the current number of values searched over the total number to search (the data size).

 

Options Dialog

The Options dialog is a pop-up window that is displayed when the Options button is clicked on the toolbar. This dialog allows you to enable or disable sound and to adjust the algorithm execution speed.

Sounds are played under the following circumstances: to indicate an input error on the Setup page, to indicate the number of buckets or data size was automatically reduced, and to indicate successful and unsuccessful searches.

The animation delay is controlled with a slider and sets the number of milliseconds the algorithm pauses after each action. Setting the delay to zero results in maximum execution speed. The execution will not appear instantaneous however, since the applet still requires a certain amount of time to refresh the elements displayed on the screen.

 

  Algorithms:  
 

This section provides additional information about the five collision resolution algorithms the applet demonstrates. Recall that all of the algorithms use the same division hash function to determine the home address. The collision resolution algorithms dictate the actions taken when a value cannot be stored in its home address because it is already full. Similarly, the approach taken when searching for data depends on the algorithm used to store the data initially.

 

 

Linear Probing

If a collision occurs while trying to store a value at its home address, this algorithm moves forward through the table searching each bucket one by one until an empty slot is found. If the bottom of the table is reached, the search wraps around back to the beginning of the table and continues until either an empty slot is found or the search reaches its starting point, in which case the table is full. For the applet demo, the data size cannot exceed the table capacity; so all storage attempts will be successful.

The retrieve algorithm also performs a wrap-around search starting at the home address. The search is considered unsuccessful if the algorithm encounters an empty slot or wraps around to the home address before finding the desired value.

 

Quadratic Probing

This algorithm is similar to linear probing except a different formula is used to calculate which address is searched next. This formula is:

address = home address + i*i    where "i" is a zero-based search counter.

If the value of an address exceeds the address of the bottom of the table, the search wraps around from address 0.

There are variations of this algorithm that, with certain restrictions on the table and input, guarantee each bucket will be searched. The applet's implementation does not impose these restrictions. Instead, the maximum number of searches it will perform for an empty bucket is equal to the number of buckets in the table. When this condition is reached the storage is considered unsuccessful. Retrieval is similar except that a search is also considered unsuccessful if an empty slot is encountered prior to finding the value searched for.

 

Bucket Chaining

Storage using the bucket chaining algorithm requires two passes through the input data set. On the first pass, the home address for a value is determined and the data is inserted in the table if there is an available slot at this address. If there is a collision however, the data is stored in a temporary data structure until the second pass.

The second pass works as follows:
For each value that wasn't stored on the first pass, the algorithm checks the link of the value's home address. If the link is null, a wrap around linear probing search is done for a completely empty bucket. If an empty bucket is found, the value is inserted and the link of the home address is set to the address where the value was inserted. If the link at the home address is not null, the algorithm searches for an empty slot in the bucket pointed to by the link. If there is an empty slot the value is inserted. If this bucket is also full and has a link that is not null, the next bucket in the chain is searched. If the chain ends with a full bucket, the algorithm performs a wrap around linear probing search for a completely empty bucket. If an empty bucket is found, the value is inserted and the link of the last bucket in the chain is set to the address where the value was inserted. Any wrap around search that returns to its origin is considered unsuccessful.

The retrieve algorithm starts at the home address and searches each bucket in the chain. The search is unsuccessful if an empty slot or a full bucket with a null link is encountered prior to finding the value searched for.

 

Linked List Chaining

This algorithm always stores a data value at its home address. Data is inserted in the bucket slots until the bucket is full. Subsequent values hashed to this address are added as new nodes in the link list pointed to by the bucket. For the algorithm demonstrated by the applet, new nodes are added to the front of the list.

The retrieve algorithm first searches the slots of the bucket at the home address. If the value is not present, the algorithm searches each node of the linked list one by one. The search is unsuccessful if the end of the chain is reached prior to finding the value searched for.

 

Chaining with Separate Overflow

This algorithm requires two hash tables. The hash function is based on the number of buckets in the primary hash table. All home addresses calculated by the hash function refer to the primary table. Data is inserted in the overflow table only if the home address in the primary table is full. The overflow table used in the applet demo is limited to one slot per bucket.

If a collision occurs while trying to store a value at its home address, the algorithm checks the link at this bucket. If the link is null, the algorithm performs a linear probing search of the overflow table for an empty bucket starting at the first bucket in the table. If an empty bucket is found, the value is inserted and the link of the home address is set to the address in the overflow table where the value was inserted.

If the link at the home address is not null, the algorithm searches the associated chain in the overflow table until the end of the chain is reached. At this point the algorithm performs a wrap around linear probing search for an empty bucket. If an empty bucket is found, the value is inserted and the link of the last bucket in the chain pointed to this bucket's address. Any wrap around search that returns to its origin is considered unsuccessful.

The retrieve algorithm starts at the home address and searches each bucket in the chain. The search is unsuccessful if a full bucket with a null link is encountered prior to finding the value searched for.

  Operating Requirements:  
 

The Hashing Animation Tool is implemented as a Java applet. In order to view the applet a Java enabled web browser that supports Java 1.1, such as Internet Explorer 4+ or Netscape Navigator 4+, is required. In addition, due to the dimensions of the applet a monitor resolution of at least 1027 x 768 is recommended. Finally, the file size for the applet is fairly large. Users with dial-up connections may need to wait a minute or so for the applet to finish loading.

 

 
  Suggestions for running the applet:  
 

When running Storage demos, the first insertions into the table are not very interesting since there will probably be few collisions. Adjust the animation delay to its minimum value, click Run and let the demo execute until you begin to see collisions. After that, either click Pause and switch to Step mode or increase the animation delay so that you can observe how the algorithm works.

As stated in the Execution Page section, the time statistic is obtained from your computer system's clock and is not adjusted to account for animation speed or delays introduced by using the Step and Pause buttons. If you wish to compare the execution times for different algorithms only use the Run button and use the same animation delay setting for each demo.

Finally, when you are ready to browse away to a new web page abort the demo if it is running. If you don't and are running Netscape, you may have to wait a few extra seconds before the browser will let you leave the page.

 

 
  Introduction | Sample Instructions | Additional Display Information | Algorithms | Operating Requirements | Suggestions