This is release 2 of a (hopefully) plug-in replacement for the standard C
library qsort() routine. I have now done very extensive testing, and I'm
confident there are no bugs which prevent it from sorting, although there
could certainly still be less than optimal algorithms around. The files are:

   Makefile     - a makefile for VC++ which should also work with Unix if
                  you comment out the VC++ stuff and uncomment the Unix
                  stuff.

   README       - This file.

   myqsort.c    - The actual qsort code.

   tabulate.pl  - Perl script to generate summary tables from the detailed
                  test program output.

   testq.c      - The test program (all the rest of the files are also
   tascend.c      part of the test program).
   tascend.h
   tbase.c
   tdescend.c
   tdescend.h
   tqsort.h
   trandom.c
   trandom.h
   trandup.c
   trandup.h
   tscramble.c
   tscramble.h

Various compile flags you might want to turn on in the Makefile:

   -DDEBUG - define to turn on mega assertion testing (set breakpoint on the
             break_here() routine to stop when an assertion fails).  This
             will totally throw off the compare count stats in the test
             program, so ignore them when built with -DDEBUG.

Alloca stuff (if you don't want to fool with it, just don't define
HAVE_alloca, and it will use malloc instead):

   -DHAVE_alloca - define if you have the alloca() routine available

   -DGET_alloca_FROM_ALLOCA_H - define if you need to #include <alloca.h>

   -DGET_alloca_FROM_MALLOC_H - define if you need to #include <malloc.h>

   may also need to add something like -Dalloca=_alloca or
   -Dalloca=__builtin_alloca depending on how it works for your
   compiler.

Stuff that happened since first release:

Did some coverage tests - determined that I have indeed hit every case in
the compare code at least once (so I'm a lot more confident it works
correctly now).

Wrote an insane test program with various different kinds of test data
generators and implemented a few variations on the qsort algorithm based on
a trick that the netbsd qsort does (all compiled with ifdef code in
myqsort.c).

The test data generators are:

   ascending
      Just unique element in already sorted ascending order.

   ascending with scrambled regions
      Take an ascending array, then scramble the elements within regions of
      that array (deliberately contrived to try and make the netbsd trick
      fail badly :-).

   descending
      Unique elements in reverse sorted order.

   random
      Completely random elements.

   random with duplicates
      Random elements, but with a lot of duplicated values scattered through
      the array.

The algorithm variations are:

   system
      The standard system library qsort() routine.

   plain
      The original myqsort.c with no added tricks, but a slightly improved
      insertion sort since the initial release.

   alg1
      The netbsd trick - if we detect (almost) no swaps during a qsort
      partitioning, then switch to insertion sort for the new partitions
      (guessing that the data might be ordered if we didn't have to
      swap). This is implemented by setting the max insertion sort partition
      size to the size of the current partition.

   alg2
      Not as gullible as alg1 - instead of switching to a insertion sort for
      the whole partition, just double the size of partitions we will use a
      qsort on (this doubling keeps on going as new partitions are created
      and they also don't swap any elements).

   alg3
      More drastic than alg2, but still less than alg1. In this one we
      square the size of the partition instead of doubling it.

   ins
      Pure insertion sort (mainly for comparison to others). I commented
      this out in the test program when I found it was taking near infinite
      amounts of time on the bigger sized sets.

The overall compare call count results I get with VC++ 5.0 on Windows NT:

                   alg2 = 33969658
                  plain = 44947981
                   alg3 = 132411902
                   alg1 = 134877092
                 system = 195751388

If you look at the detailed reports, alg1 really only gets beat up in the
"ascending with scrambled regions" data set, so it looks like my attempt to
contrive a test which breaks it was successful. The results for alg2 shows
that it isn't too much worse than alg1, and also still works well with the
contrived test data, so I'm kind of leaning towards alg2 as the final
version.

I also ran the test program on some systems at work, and they all show
alg2 as the winner:

A Concurrent (formerly Harris) NightHawk 88k system running CX/UX 7.1:

                   alg2 = 34336665
                  plain = 45314857
                   alg3 = 132623557
                   alg1 = 135063009
                 system = 217374159

A Concurrent NightHawk PowerPC system running PowerMAX OS 4.1:

                   alg2 = 34336665
                  plain = 45314857
                 system = 46417461
                   alg3 = 132623557
                   alg1 = 135063009

A Sun sparcstation running Solaris (forget which version) with gcc:

                   alg2 = 34336665
                  plain = 45314857
                   alg3 = 132623557
                   alg1 = 135063009
                 system = 217374159

So algorithm 2 seems to be the best general purpose sort for any kind of
data which might get thrown at it on at least a few different systems.  It
isn't as fast as the netbsd trick when the trick works well, but it also
isn't as gullible when the trick turns out not to work, and it is
competitive, even when the trick does work.
