Hopefully you have DNA profiles of every breed of dog in the US, at least. My friend paid a place to do DNA testing on her dog and I'm very skeptical of the results. The dog has very strong characteristics, instincts, and markings of a certain breed but the test said he was a mishmash of about 8-10 breeds -- none of which was what he looked and acted like. So I wonder how many little-known or rare breeds don't have DNA test markers for correct identification.
Forums > How comprehensive is your canine DNA database?
Al, you're right: the reference set will be the biggest limitation in any breed analysis we might do, as described in our FAQ.
To elaborate, let me start by reiterating that we are not a breed test. That is not our purpose, and it is not our goal. Ancestry information will be a likely by-product of our analyses and we will share any such information that we find on your dog - but we are not offering breed testing as a product.
But if you are looking for a breed test, there are a few metrics you should be interested in to determine how well breeds can be assigned:
1. How is the variation in your dog's DNA captured
Note that virtually no method of analyzing a DNA sample tells us everything about that DNA. There is always a trade-off between how much data we can get and the cost and complexity of the analysis. Whole genome sequencing at a depth of 30x can get pretty close to getting all the information in a DNA sample. But whole genome sequencing is orders of magnitude more expensive than most other approaches. No currently available breed test is doing whole genome sequencing.
All commercially available breed tests that I am aware of use genotyping. This is a much cheaper, easier, and more efficient analysis that samples a number of positions within the DNA that are known to vary among dogs. To put this in context, there are about 18 million positions in the dog genome that are known to differ between dogs. Until recently, genotyping tools for dogs only checked ~170 thousand of those positions. That's less than 1% of the known variation. To be fair, if those 170 thousand spots are well chosen, good inferences can be made about many more because of linkage, as I recently described in the IAABC journal. But even under the best of conditions, those genotyping tools are only getting a portion of the variation. Very recently a new genotyping tool for dogs was made available that samples ~650 thousand different positions. This is better than the 170 thousand tool, but still has limitations, especially for mixed breed dogs.
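To make the arithmetic above concrete, here is a small Python sketch. It uses only the rounded figures quoted in this post (about 18 million known variable positions, and genotyping tools of roughly 170 thousand and 650 thousand positions); the function and variable names are made up for illustration.

```python
# Rounded figures quoted in the post above; not exact counts.
KNOWN_VARIABLE_POSITIONS = 18_000_000
ARRAY_SIZES = {"older tool (~170k)": 170_000, "newer tool (~650k)": 650_000}


def percent_directly_sampled(positions_on_array, known_positions=KNOWN_VARIABLE_POSITIONS):
    """Percent of known variable positions that a genotyping tool reads directly."""
    return 100.0 * positions_on_array / known_positions


for name, size in ARRAY_SIZES.items():
    print(f"{name}: {percent_directly_sampled(size):.2f}% of known variable positions")
```

This works out to a bit under 1% for the older tool and a bit under 4% for the newer one. It counts only the positions measured directly, so it understates what can be inferred through linkage.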
Here at Darwin's Dogs we are not sure that new genotyping tool will be good enough for our study of behavioral variation. So we are experimenting with a new approach called low coverage sequencing. This works a lot like the whole genome sequencing mentioned at the start, but rather than getting high confidence measures of nearly every bit of the DNA, we get just some measure of almost every bit of the DNA. By comparing across a large number of dogs, we can make good inferences about which measures are reliable and which may be false positives or misses.
So here's a summary of relevant analysis techniques. Full sequencing gets high quality measures of nearly every spot in the DNA but it is prohibitively expensive - no breed test uses this technique. Genotyping gets high quality measures of a small subset of the DNA - this is what most breed tests use. Low coverage sequencing gets moderate quality measures of nearly every spot in the DNA.
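One way to see the trade-off in this summary is a back-of-the-envelope coverage estimate. The sketch below assumes sequencing reads land on the genome at random, so the number of reads covering any one position is roughly Poisson-distributed with a mean equal to the average depth. That is a textbook simplification, not a description of our actual pipeline, and the 1x depth and the cohort of 1000 dogs are made-up illustrative values.

```python
import math


def fraction_covered(mean_depth):
    """Expected fraction of positions hit by at least one read in a single dog,
    assuming read counts per position are Poisson with the given mean depth."""
    return 1.0 - math.exp(-mean_depth)


def expected_dogs_covering(mean_depth, n_dogs):
    """Expected number of dogs (out of n_dogs) with at least one read at a given position."""
    return n_dogs * fraction_covered(mean_depth)


for depth in (30.0, 1.0):  # 30x is from the post; 1x is an illustrative assumption
    print(f"{depth}x: {fraction_covered(depth):.3f} of positions covered in one dog")

# Hypothetical cohort size, chosen only for illustration.
dogs = expected_dogs_covering(1.0, 1000)
print(f"1x across 1000 dogs: about {dogs:.0f} dogs cover any given position")
```

Under these assumptions a single low-depth sample has gaps and thin spots, but any given position is still seen in many dogs across a large cohort. That is why comparing across dogs helps sort reliable measures from false positives and misses.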
2. How extensive is the reference data set
This is the question you are asking. To do proper breed assignment, one does need an extensive database of dogs from as many breeds as possible. Here at Darwin's Dogs we do have access to some published and publicly available genomes of purebred dogs, but it's really not enough. But we also have something else: we currently have 12,000 dogs, a fair portion of which are registered purebreds. I just did a quick count of how many distinct breeds our registered purebred dogs represent, and it looks to be about 400.
3. What algorithm is used to assign breed ancestry to various parts of your dog's DNA
This may be the most "mysterious" part of the process for two reasons. First, most commercially available breed tests consider this to be a proprietary secret recipe. They will not likely share their method. In contrast, we will absolutely share all the details of any ancestry mapping we do once it is up and running. But the second reason this is a bit "mysterious" is that the algorithms used are far from easy reading. Anyone with an advanced math degree would likely find them digestible, but for most of us they're pretty tricky. Most approaches to breed assignment work much like ancestry reconstruction in humans. A Google Scholar search for these topics will give lots of examples.
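To give a flavor of the general idea without the heavy math, here is a deliberately oversimplified Python sketch. It is not our method and not any commercial test's method (I don't know what those are). It just scores a dog's genotypes against allele frequencies from each reference breed and picks the best match. The breed names, marker frequencies, and the example dogs are all made up, and it assumes markers are independent and that each breed mates at random, neither of which holds well in real dogs.

```python
import math

# Hypothetical allele frequencies at four markers for two made-up reference breeds.
# Each value is the frequency of the "alternate" allele among that breed's reference dogs.
REFERENCE_FREQUENCIES = {
    "breed_A": [0.90, 0.10, 0.80, 0.20],
    "breed_B": [0.20, 0.70, 0.30, 0.90],
}


def genotype_probability(alt_count, alt_freq):
    """Probability of carrying 0, 1, or 2 alternate alleles at one marker,
    assuming random mating within the breed (Hardy-Weinberg proportions)."""
    ref_freq = 1.0 - alt_freq
    if alt_count == 0:
        return ref_freq ** 2
    if alt_count == 1:
        return 2.0 * alt_freq * ref_freq
    if alt_count == 2:
        return alt_freq ** 2
    raise ValueError("alt_count must be 0, 1, or 2")


def log_likelihood(genotypes, alt_freqs):
    """Sum of per-marker log probabilities, treating markers as independent."""
    return sum(math.log(genotype_probability(g, f)) for g, f in zip(genotypes, alt_freqs))


def best_matching_breed(genotypes, reference=REFERENCE_FREQUENCIES):
    """Return the reference breed under which the genotypes are most likely."""
    return max(reference, key=lambda breed: log_likelihood(genotypes, reference[breed]))


dog = [2, 0, 2, 0]  # made-up genotype: alternate-allele counts at the four markers
print(best_matching_breed(dog))
```

Even this toy version shows why the reference set matters so much: a dog from a breed that is missing from the reference data still gets matched to whichever listed breed happens to fit best. A real analysis also has to handle mixed ancestry by working on stretches of DNA rather than the whole dog at once, which this sketch does not attempt.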
So, overall, how will our ancestry reconstruction compare to breed tests? First and most importantly, this is not our goal and we don't promise any results - so if you really want a breed test, we don't compare: send your dog's sample to one of the commercial services. But for the depth of information we get out of your dog's DNA, we will be getting much more. For the number of reference purebreds used, commercial tests that have been in business for a while probably have quite a few more than we do - but we are growing quickly. For the algorithm used for the analysis ... I can't compare as I don't know what the breed tests use.
OK, but specifically you might want to acquire the DNA markers for the Plott hound, a relatively unknown breed just recently recognized by the AKC. They have quite a few characteristics distinctive to their breed, both appearance and mannerisms.
Jesse, thanks for your explanation. I had my mixed breed dog's DNA tested. I was skeptical of the accuracy after I received the results. Your explanation helped explain why the results seemed so off. Pat
I have been thinking about this, and the bit about "indemnify and hold harmless", and would like to suggest that, when and if you give breed results, you phrase it carefully. Apparently, there has been at least one case where someone did a breed test on a registered purebred, and it came back stating that one of the grandparents was not that breed, even though the breeder swore up and down that they were. It is possible that the breeder was lying, but it is also possible that one of the grandparents just had gene-positions-that-aren't-as-common in that breed. They were forced to declare that dog, and all of the dogs of that line, as non-purebred, and that caused them a lot of problems since many of the dogs had been bought for showing and breeding. When I did the Wisdom panel, they clearly stated that there could be issues with the breed identifications if there were large differences between the field lines and show lines, or between American and European populations of the breed. So, I would suggest that, instead of saying something like "your dog is 25% German Shepherd, 50% Chihuahua, and 25% Lab" you go with wording like "X% of your dog's genetic markers are similar to those found in breed..." Because, as I understand the system and statistics, it is statistically possible that there is a purebred dog out there that has none of the common breed markers, and just as statistically possible that there is some random mutt that has all of a breed's common markers. Statistics being what they are, it's a very very very rare possibility, but still, not impossible. So, just to cover yourself, and be clear on what the information shows, I would phrase the results very carefully.
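To put a rough number on "very very very rare", here is a small Python sketch. It assumes the markers are inherited independently and that each "common" marker allele has the same frequency within the breed. Both are simplifications, and the frequency of 0.8 and the count of 20 markers are made-up values, not measurements from any breed.

```python
def prob_no_common_markers(allele_freq, n_markers):
    """Chance that a dog drawn from the breed carries zero copies of the common
    allele at every one of n_markers independent markers (two copies per marker)."""
    return (1.0 - allele_freq) ** (2 * n_markers)


# Made-up illustrative values, not measurements from any breed.
print(prob_no_common_markers(allele_freq=0.8, n_markers=20))
```

The result is tiny but not zero, and it grows quickly if the markers are less common within the breed or if fewer markers are checked.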