C Tutorial (23) : Sorting

Sorting is the computer term given to ordering lists of values.

Not only must you be able to find data in arrays, but you often need to arrange array data in a certain order.

Your programs don’t always hold array data in the order you want to see that data. For example, students don’t enroll based on alphabetical last name, even though most colleges print lists of students that way. Therefore, after collecting student data, the school’s computer programs must somehow arrange that data in last name order for reports.

This tutorial explains the easiest of computer sorting techniques, called the bubble sort.

If you want to alphabetize a list of letters or names, or put a list of sales values into ascending order (ascending means from low to high, and descending means from high to low), you should use a sorting routine. Of course, the list of values that you sort will be stored in an array because array values are so easily rearranged by their subscripts.

Think about how you’d put a deck of cards in order if you threw the cards up in the air and let them fall. You would pick them up, one by one, looking at how the current card fit in with the others in your hand. Often you would rearrange some cards that you already held. The same type of process is used for sorting an array; often you have to rearrange values that are in the array.

The bubble sort isn’t extremely efficient compared to other sorts, but it’s the easiest to understand. The name bubble sort comes from the nature of the sort. During a sort, the lower values “float” up the list each time a pass is made through the data. Figure below shows the process of sorting five numbers using a bubble sort.

 

c_sorting_1

During each pass, the lower values “float” to the top of the array.

The next program sorts a list of 10 numbers. The numbers are randomly generated using rand(). The bubble sort routine is little more than a nested for loop. The inner loop walks through the list, swapping any pair of values that is out of order down the list. The outer loop causes the inner loop to run several times (one time for each item in the list).

/* This program generates 10 random numbers and then sorts them */

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

main()
{
int ctr, inner, outer, didSwap, temp; 
int nums[10];

time_t t;

// If you don't include this statement, your program will always
// generate the same 10 random numbers
srand(time(&t));

// The first step is to fill the array with random numbers
// (from 1 to 100)
for (ctr = 0; ctr < 10; ctr++)
{
  nums[ctr] = (rand() % 99) + 1;
}

// Now list the array as it currently is before sorting 
puts("\nHere is the list before the sort:");
for (ctr = 0; ctr < 10; ctr++)
{
  printf("%d\n", nums[ctr]);
}

// Sort the array
for (outer = 0; outer < 9; outer++)
{
  didSwap = 0; //Becomes 1 (true) if list is not yet ordered 
  for (inner = outer; inner < 10; inner++)
  {
    if (nums[inner] < nums[outer])
    {
      temp = nums[inner];
      nums[inner] = nums[outer];
      nums[outer] = temp; 
      didSwap = 1;
    }
  }

  if (didSwap == 0)
  {
    break;
  }
}

// Now list the array as it currently is after sorting 
puts("\nHere is the list after the sort:");
for (ctr = 0; ctr < 10; ctr++)
{
  printf("%d\n", nums[ctr]);
}

return(0);
}

The output from this sorting program is as follows:

Here is the list before the sort 64 17 1 34 9 5 58 5 6 70

Here is the list after the sort 1 5 5 6 9 17 34 58 64 70

Note : Your output might be different than that shown in the preceding example because rand() produces different results between compilers.

Here is the swapping of the variables inside the inner loop:

temp = nums[inner]; 
nums[inner] = nums[outer]; 
nums[outer] = temp;

If a swap needs to take place the program must swap nums[inner] with nums[outer].

You might wonder why an extra variable, temp, was needed to swap two variables’ values. A natural (and incorrect) tendency when swapping two variables might be this:

nums[inner] = nums[outer]; /* Does NOT swap the */ 
nums[outer] = nums[inner]; /* two values */

The first assignment wipes out the value of nums[inner] so that the second assignment has nothing to assign. Therefore, a third variable is required to swap any two variables.

Tip : If you wanted to sort the list in descending order, you would only have to change the less-than sign (<) to a greater-than sign (>) right before the swapping code.

If you wanted to alphabetize a list of characters, you could do so by testing and swapping character array values, just as you’ve done here.

An added bonus that is common to many improved bubble sort routines is the testing to see whether a swap took place during any iteration of the inner loop. If no swap took place, the outer loop finishes early (via a break statement). Therefore, if the loop is sorted to begin with, or if only a few passes are needed to sort the list, the outer loop doesn’t have to finish all its planned repetitions.

Faster Searches

Sometimes sorting data speeds your data searching. In the last tutorial, you saw a program that searched a customer ID array for a matching user’s value. The customer ID values were not stored in any order.

The possibility arose that the user’s entered customer ID might not have been found. Perhaps the user entered the customer ID incorrectly, or the customer ID was not stored in the array. Every element in the entire customer ID array had to be searched before the programmer could realize that the customer ID was not going to be found.

However, if the arrays were sorted in customer ID order before the search began, the program would not always have to look at each array element before deciding that a match couldn’t be made. If the customer ID array were sorted and the user’s customer ID were passed when looking through a search, the program would know right then that a match would not be found. Consider the following list of unsorted customer IDs:

313
532
178
902
422
562

Suppose the program had to look for the customer ID 413. With an unsorted array, a program would have to match the ID 413 to each element in the array.

If the arrays contained hundreds or thousands of values instead of only six, the computer would take longer to realize unmatched searches because each search would require that each element be looked at. However, if the values were always sorted, a program would not always have to scan through the entire list before realizing that a match would not be found. Here is the same list of values sorted numerically, from low to high customer IDs:

178
313
422
532
562
902

A sorted list makes the search faster. Now if you search for customer ID 413, your program can stop searching after looking at only three array values. 422 is the third element, and because 422 is greater than 413, your program can stop searching. It can stop searching because 422 comes after 413.

Note : In extreme cases, searching a sorted array is not necessarily faster than sorting using an unsorted array. For instance, if you were searching within the previous list for customer ID 998, your program would have to search all six values before realizing that 998 was not in the list.

We can avoid this by adding a check if value is between min and max.

The following program is a combination of the customer ID searching program shown in the previous tutorial and the sorting routine shown in this tutorial. Keep in mind that having only 10 array values makes this program seem like overkill, but if there were tens of thousands of customers, the code would not be different.

/* This program searches a sorted list of customer IDs in order to
   get credit totals */

#include <stdio.h>

main()
{
int ctr; // Loop counter
int idSearch; // Customer to look for (the key) 
int found = 0; // 1 (true) if customer is found

/* Defines the 10 elements in each of the parallel arrays */ 
int custID[10] = {313, 453, 502, 101, 892,
                  475, 792, 912, 343, 633};

float custBal[10] = { 0.00, 45.43, 71.23, 301.56, 9.08, 192.41, 
                      389.00, 229.67, 18.31, 59.54};

int tempID, inner, outer, didSwap, i; // For sorting 
float tempBal;

// First, sort the arrays by customer ID */ 
for (outer = 0; outer < 9; outer++)
{
  didSwap = 0; // Becomes 1 (true) if list is not yet ordered 
  for (inner = outer; inner < 10; inner++)
  {
  if (custID[inner] < custID[outer])
  {
    tempID = custID[inner]; // Must switch both arrays 
    tempBal = custBal[inner]; // or they are no longer linked 

    custID[inner] = custID[outer]; 
    custBal[inner] = custBal[outer]; 
    custID[outer] = tempID; 
    custBal[outer] = tempBal;

    didSwap = 1; // True because a swap took place
  }
}

if (didSwap == 0)
 {
  break;
 }
}

/* Interact with the user looking to find a balance */ 
printf("\n\n*** Customer Balance Lookup ***\n");
printf("What is the customer number? "); 
scanf(" %d", &idSearch);

/* We can add check here : min <= idSearch <= max
   So that for wrong values program will break here itself */

// Now, look for the ID in the array 
for (ctr=0; ctr<10; ctr++)   
{   
  if (idSearch == custID[ctr]) // Do they match?   
  {   
   found = 1; //Yes, match flag is set to TRUE 
   break; //No need to keep looping 
  }   
  if (custID[ctr] > idSearch) // No need to keep searching
  {
   break;
  }
}
// Once the loop has completed, the ID was either found (found = 1) or not
if (found)
{
  if (custBal[ctr] > 100)
  {
   printf("\n** That customer's balance is $%.2f.\n", custBal[ctr]);
   printf("No additional credit.\n");
  }
  else // Balance is less than $100.00
  {
   printf("\n**The customer's credit is good!\n");
  }
}
else // The ID was not found
{
  printf("** You have entered an incorrect customer ID."); 
  printf("\n ID %3d was not found in the list.\n", idSearch);
}

return(0);
}

Before seeing this program, you mastered both array searching and array sorting. This program simply puts the two procedures together. About the only additional job this program does is keep the two parallel arrays in synch during the search.

An early search termination could take place because of the following:

  if (custID[ctr] > idSearch) // No need to keep searching
  {
   break;
  }

When there are several thousand array elements, such an if saves a lot of processing time.

Keeping arrays sorted is not always easy or efficient. For instance, you don’t want your program sorting a large array every time you add, change, or delete a value from the array. After storing several thousand values in an array, sorting the array after adding each value takes too much time, even for fast computers. Advanced ways of manipulating arrays ensure that you always insert items in sorted order.

In some scenarios we may not be allowed to change (sort) array, so we have to use sequential search in those cases.

Searching < Prev                                                                                  Next > Pointer

Advertisements

2 comments

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s