Introduction
Genetic improvement in a set of traits occurs when dogs selected to become parents of the next
 generation are superior in performance compared to the population from which they were
 selected.
The amount of genetic change to expect from one generation of selection is governed
by both the heritability of the trait(s) being improved and by the degree to which average
performance of newly selected parents exceeds the average performance of the population
from which they were selected (Falconer and Mackay, 1996).
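This relationship is commonly written as the breeder's equation (Falconer and Mackay, 1996). The symbols below are standard textbook notation rather than notation used elsewhere in this paper: R is the expected response per generation, h^2 is the heritability of the trait, and S is the selection differential.

```latex
R = h^{2} S, \qquad S = \bar{x}_{\text{selected parents}} - \bar{x}_{\text{population}}
```

As a purely hypothetical illustration, a trait with h^2 = 0.3 and selected parents averaging 1.0 score point above the population would be expected to improve by about 0.3 score points in one generation.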
Beginning in 1980, these principles of genetic selection were implemented at The Seeing Eye in the
 successful production of several thousand German Shepherd Dogs, Labrador Retrievers,
 Golden Retrievers and some Labrador by Golden Retriever crossbred dogs. The methodical
 approach to choosing replacement breeders and deciding how those breeders were mated
 resulted in improved hip quality and an improved ability to be trained to guide blind people.
 The purpose of this paper is to describe the breeder selection process used by The Seeing Eye
 to obtain genetic improvement in both hip quality and trainability. It will also document the
 amount of genetic improvement actually realized for these two traits in each of the three pure
 breeds.
Materials and Methods
An organized breeding plan must address four key components:
- Define goals to be obtained
- Decide whether to use pure breeding or crossbreeding as the primary method for producing new offspring
- Define and implement a record keeping system with accurate phenotypic measurements capable of supporting the breeding plan
- Define and implement objective criteria for selecting replacement breeders
 
The Seeing Eye addressed each of these four components in 1980 and began implementing the
 current breeding plan in that year. The objectives were clear: dramatically improve hip quality
 while simultaneously improving the ability of dogs to be trained for work as guides. Purebred
 production was chosen as the primary method for producing new offspring. A record keeping
 system was developed using computer equipment that pre-dated introduction of the IBM
 personal computer by two years. Over the intervening years, three additional record keeping
 systems were developed, with data collected by each preceding system being imported into its
 successor. The current record keeping system contains information on over 18,000 dogs, some
 of which were born in the 1970’s. Many of those early dogs define the foundation generation in
 pedigrees of puppies currently being born, now up to nine generations later.
Phenotypic measurements for hip quality included both an extended view and a distraction view
radiograph. Each extended view radiograph was scored by the same veterinary radiologist using
a 9-point scale (Leighton, 1997), where 9 is the most desirable.
To aid in summarizing the change in hip quality over time,
extended view hip scores of 1-5 were classified as dysplastic, while scores greater than 5 were classified as
normal. All distraction view radiographs were submitted to PennHip (Smith, ???) for scoring. Because
 the PennHip procedure was developed in the mid- to late-1980’s and perfected into the 1990’s,
 hip quality of few dogs from the earlier years was assessed by this technique. Beginning in the
 early-1990’s, hip quality of all dogs was assessed by both an extended view hip score and a
 PennHip score. Radiographic films for both techniques were taken when dogs returned to The
 Seeing Eye to begin training. Most dogs were at least 14 months old, but some were as old as
 18-20 months of age.
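The dysplastic/normal classification of extended view scores can be sketched as follows. This is a minimal illustration of the rule stated above (scores 1-5 dysplastic, scores greater than 5 normal); the function names are hypothetical and are not taken from The Seeing Eye's record keeping system.

```python
def classify_extended_view(score: int) -> str:
    """Classify a 9-point extended view hip score (9 = most desirable)."""
    if not 1 <= score <= 9:
        raise ValueError(f"score must be between 1 and 9, got {score}")
    # Rule from the text: 1-5 is dysplastic, greater than 5 is normal.
    return "dysplastic" if score <= 5 else "normal"


def percent_dysplastic(scores: list[int]) -> float:
    """Percentage of dogs in a group classified as dysplastic."""
    if not scores:
        raise ValueError("at least one score is required")
    affected = sum(1 for s in scores if classify_extended_view(s) == "dysplastic")
    return 100.0 * affected / len(scores)
```

Applied to all dogs in a generation class, `percent_dysplastic` gives the kind of summary figure used to track change in hip quality across generations.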
The trainability score is the phenotypic measure assessing a dog’s ability to be trained to guide
 blind people. Methodology for the trainability score was developed by The Seeing Eye in the
 early 1980’s. Since then, only four different people have assigned the score. At its most
 elementary level, the trainability score is a comparison rating that ranks dogs into one of 9
 classes, with 9 being the most trainable dogs and 1 being the least trainable. Each dog is given
 a score that reflects its ability to be trained as a guide relative to all other dogs of that breed
 scored in the last 1-3 months. As time passes, the quality of dogs improves. The best-to-worst
 rank scale remains a successful measure because scores are always assigned relative to current
 dogs, so the definition of each score changes over time. This change in score definition over time is accounted for in the statistical
 model by including a term for contemporary groups. A contemporary group is simply defined as
 all dogs of a given breed passing through the system in the same calendar quarter.
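A contemporary group label of this kind could be derived as in the sketch below. It assumes that the relevant date is the date on which the dog was evaluated and that quarters are the usual calendar quarters (January-March, and so on); the function name and the tuple layout are illustrative choices only.

```python
from datetime import date


def contemporary_group(breed: str, evaluated_on: date) -> tuple[str, int, int]:
    """Return (breed, year, calendar quarter) for one evaluated dog."""
    # Months 1-3 map to quarter 1, 4-6 to quarter 2, and so on.
    quarter = (evaluated_on.month - 1) // 3 + 1
    return (breed, evaluated_on.year, quarter)
```

Dogs sharing the same tuple belong to the same contemporary group, so their scores are compared only with each other when the group term is fitted.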
Estimated breeding values (EBV’s) (Mrode, 2005) provided a means to use the phenotypic
 measures to make objective selection decisions among available candidates. EBV’s are
 statistical calculations utilizing all of the data on the individual and its relatives to provide an
 estimate of how well each dog would transmit desired traits to its offspring compared to other
 dogs in the population. Breeders are selected as the dogs with the best EBV’s. EBV’s were
 calculated nightly by the Seeing Eye record-keeping system, since EBV’s change as new
 phenotypic data are entered into the database each day.
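EBVs of this kind are typically obtained from a mixed linear "animal model" (Mrode, 2005). The form below is the generic textbook model, shown only as a sketch; it is not necessarily the exact model fitted by The Seeing Eye.

```latex
\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{a} + \mathbf{e},
\qquad
\mathbf{a} \sim N(\mathbf{0}, \mathbf{A}\sigma_a^{2}),
\qquad
\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^{2}),
\qquad
h^{2} = \frac{\sigma_a^{2}}{\sigma_a^{2} + \sigma_e^{2}}
```

Here y holds the phenotypic records, b the fixed effects (for example, contemporary group), a the breeding values being estimated, and A the additive relationship matrix built from the pedigree. Because A links every dog to its relatives, a new record on one dog can shift the EBVs of related dogs, which is why EBVs change as new data are entered.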
EBV’s for the PennHip score and the trainability score were combined into one overall index
 value, with trainability being weighted with twice the importance of hip quality. This single
 number formed the basis for deciding which litters were most likely to contain superior breeding
 candidates. Based only on their pedigree information, litters of 9-month-old dogs with the
 highest overall index were marked as possible breeder candidates. When dogs returned to The
 Seeing Eye from their puppy development homes at 14 months of age, they were thoroughly
 scrutinized, both from a health perspective and for trainability. Any dogs marked as possible
 breeder candidates that were found to have health problems such as megaesophagus, elbow dysplasia, or
 inherited ophthalmic conditions were eliminated as breeder candidates. Remaining breeder
 candidates completed a one-month-long compressed training regimen, wherein they
 demonstrated their ability and willingness to be trained for work as guides.
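The index and the pedigree-based ranking of litters can be sketched as follows. Several details are assumptions made for illustration: both EBVs are taken to be on a comparable standardized scale and oriented so that larger values are better (for a laxity measure, this would mean reversing the sign), and a litter's pedigree-based index is taken to be the average of the sire and dam indexes. The 2:1 weighting is from the text; all names are hypothetical.

```python
def overall_index(trainability_ebv: float, hip_ebv: float,
                  w_train: float = 2.0, w_hip: float = 1.0) -> float:
    """Combine two EBVs, weighting trainability twice as heavily as hip quality."""
    # Assumes both EBVs are standardized and that higher is better.
    return w_train * trainability_ebv + w_hip * hip_ebv


def litter_pedigree_index(sire_index: float, dam_index: float) -> float:
    """Pedigree-only expectation for a litter: the parent average (assumption)."""
    return (sire_index + dam_index) / 2.0


def rank_litters(litters: dict[str, tuple[float, float]]) -> list[str]:
    """Order litter ids from highest to lowest pedigree index.

    `litters` maps a litter id to (sire_index, dam_index).
    """
    scored = {lid: litter_pedigree_index(s, d) for lid, (s, d) in litters.items()}
    return sorted(scored, key=scored.get, reverse=True)
```

Litters at the top of the returned list would be the ones marked as possible sources of breeder candidates, subject to the health and training screens described above.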
To track genetic change over time, a generation coefficient was calculated for each litter by
 adding 1 to the average generation coefficients of the sire and dam (Pattie, 1965). Foundation
 animals for which parents were unknown were assigned a generation coefficient of zero. A
 mating between two zero generation parents produced first generation offspring, while a mating
 between a first generation sire and a zero generation dam, for example, produced offspring with
 a generation coefficient of 1.5. To summarize the data, generation coefficients were classified
 by rounding them to the nearest whole number.
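The generation coefficient calculation can be illustrated with a short sketch. The formula (1 plus the parental average, with foundation animals at zero) follows the description above; the tie-breaking rule for values that fall exactly halfway between two classes is not stated in the text, so rounding halves upward is an assumption here, as are the function names.

```python
import math

FOUNDATION_COEFFICIENT = 0.0  # animals with unknown parents


def litter_generation_coefficient(sire_gc: float, dam_gc: float) -> float:
    """Generation coefficient of a litter: 1 + average of the parents' coefficients."""
    return 1.0 + (sire_gc + dam_gc) / 2.0


def generation_class(gc: float) -> int:
    """Round a generation coefficient to the nearest whole number.

    Halves are rounded upward (an assumption; the source does not say).
    """
    return math.floor(gc + 0.5)
```

For example, `litter_generation_coefficient(1.0, 0.0)` reproduces the coefficient of 1.5 given above for a first generation sire mated to a zero generation dam.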
All breeders were housed in a separate breeding center, where they remained for their
 approximately 2-year breeding career. Males were kept in breeding until they completed 8
 matings or until their replacement was found. Females were kept in breeding until 48 months of
 age or until their replacement was identified. In recent years, the goal was to produce about 600
 puppies per year, from which approximately 55% were eventually chosen for breeding or trained
 for work as guides. Many dogs not chosen for use by The Seeing Eye were chosen for work by
 police agencies or by U.S. Government agencies. The remaining unused dogs became pets.
Results
By breed, descriptive statistics are shown in Table 1 for two measures of hip quality and the
 trainability score for all dogs born into generation 1 or later after fiscal year 1980.
As assessed by the extended view hip score, genetic change in hip quality is shown in Table 2
 for each breed. Based on this criterion, the proportion of dogs affected by hip dysplasia dropped below 5% by the
 6th generation in Labrador Retrievers and by the 7th generation in German Shepherd Dogs and
 Golden Retrievers. Generation class means are significantly different (P<0.01) for all three
 breeds.
 
Hip quality assessed by the PennHip score is summarized across generations in Table 3. For all
 three breeds, sex of the dog and generation class explained a significant (P<0.001) part of the
 variation observed in PennHip score.
The trainability score is a comparison ranking of dogs into 1 of 9 score groups with dogs with
 high scores being superior in their ability to be trained for work as guides compared with lower
 scored dogs. The criterion for assignment of the score can change from one calendar quarter to
 the next one, but it is constant in this scoring protocol that superior dogs are ranked higher than
 dogs with lesser trainability. Even though mean value of the trainability score has changed only
 slightly over multiple generations of selection, genetic quality of the dogs has steadily improved
 because the superior dogs with respect to trainability are kept for breeding.
 The genetic model used for calculating trainability score estimated breeding values included
 pedigree information and a term for contemporary groups, which were the calendar quarters
 across time when dogs were evaluated. To view the trend in genetic change of the trainability
 score, trainability EBV’s across generations of selection are summarized by generation class in
 Table 4. They are expressed in genetic standard deviation units to make it easier to interpret the
 results.
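One way to produce such a summary is sketched below. It assumes that expressing an EBV in genetic standard deviation units means dividing it by the additive genetic standard deviation of the trait, and that each dog has already been assigned a whole-number generation class; the exact scaling used for Table 4 is not described, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_ebv_by_generation(records, genetic_sd: float) -> dict[int, float]:
    """Average EBV per generation class, in genetic standard deviation units.

    `records` is an iterable of (generation_class, ebv) pairs.
    """
    if genetic_sd <= 0:
        raise ValueError("genetic_sd must be positive")
    grouped = defaultdict(list)
    for generation, ebv in records:
        # Divide by the genetic SD so that breeds and traits are comparable.
        grouped[generation].append(ebv / genetic_sd)
    return {g: mean(values) for g, values in sorted(grouped.items())}
```

On this scale, a difference of 2.0 between two generation classes corresponds to a change of two genetic standard deviations.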

In 6 generations of selection in German Shepherds and 8 in Labrador Retrievers, the trainability
 score improved by more than two genetic standard deviations. In Golden Retrievers, essentially
 no genetic improvement was realized over 7 generations, which might be a reflection of the
 smaller population size for this breed. It is well known (Falconer and Mackay, 1996) that chance
 plays a much larger role than selection in producing genetic change as population size
 decreases. Over the years, there was also an influx of numerous Golden Retrievers into the
 population, some of which came from non-guide dog sources.
Discussion
For phenotypic measurements to be useful as a selection tool, they must differentiate among
 individual breeding candidates. By the 6th generation of selection using extended view hip
 scores, hip quality had improved to the point in both German Shepherd Dogs and Labrador
 Retrievers that almost all dogs received a score of 8. From about the 7th generation onward in
 these two breeds, few dogs were lost from either training or field service due to hip dysplasia, so
 selection had worked to produce higher-quality hips.
 When hip quality was assessed in more recent generations using the PennHip score, however,
 it was clear that a wide range in laxity still exists among dogs receiving an extended view hip
 score of 8. Clearly, there remains a need to continue placing some selection pressure on hip
 quality using the PennHip score, if for no other reason than to guard against allowing hip quality
 to begin degenerating. This degeneration, if it occurred, would result from choosing dogs for
 breeding based on a high extended view score only, which would be equivalent to choosing
 dogs at random with respect to laxity as measured by PennHip.
Phenotypic measures need to be validated to show that they actually measure the trait
 intended. Validation involves verifying that the measure is predictive and demonstrating that it is
 heritable. A novel scoring system, the trainability score, was developed by The Seeing Eye
 because no measure of trainability was available in 1980 that could be used to
 genetically improve the ability of dogs to be trained for work as guides. This system has
 continued to be effective in improving trainability because the definition of score quality was
 relative to the contemporary group of dogs being assessed. This allowed the scale to remain
 effective in defining best to worst and supported The Seeing Eye’s goal to make genetic
 change over time. In contrast, the usefulness of the fixed 9-point extended view hip quality scale
 degenerated over time because as quality improved, almost all dogs had the same high score.
 Another key factor in the ability of the trainability score to work over decades of selection was
 consistency by the person doing the scoring, because only four highly-skilled, very senior dog
 trainers have assigned the scores.
Genetic improvement in any heritable trait can be directed with sound methods of selective
 breeding. For this to work, the process must include accurate phenotypic measures and a
 consistent application of selection methods over time. Use of these methods by The Seeing Eye
 has yielded a steady supply of high-quality purpose-bred dogs.
References
- Falconer DS and Mackay TFC. 1996. Introduction to Quantitative Genetics, 4th Ed. Addison Wesley Longman Ltd, Essex, England.
- Leighton EA. 1997. Genetics of canine hip dysplasia. J Am Vet Med Assoc 210: 1474-1479.
- Mrode RA. 2005. Linear models for the prediction of animal breeding values, 2nd Ed. CABI Publishing, Cambridge, MA 02139 USA.
- Pattie WA. 1965. Selection for weaning weight in Merino sheep, 1: Direct response to selection. Aust J Exp Ag 5: 353-360.
- Smith GK, Biery DN and Gregor TP. New concepts of coxofemoral joint stability and development of a clinical stress-radiographic method for quantitating hip joint laxity in the dog. JAVMA 196: 59-70.

