DNA Barcoding in Forensic Vertebrate Species Identification

Animal identification is essential in a large number of forensic cases, including bushmeat harvest, unregulated trade in protected species or their derivatives, introduction of exotic species without a proper permit and food fraud. The analysis of morphological traits has been the most traditional method used for species identification and taxonomy. However, when morphological identification is compromised, genetic identification can be used to associate sequences from unknown samples with a sequence from a reference sample. Based on a standard region of 650 base pairs of the subunit I of the cytochrome c oxidase mitochondrial gene (COI) and using a validated reference database, the DNA Barcoding system for cataloging and identifying animal species has been proposed. In order to test the utility of DNA Barcoding in forensic vertebrate species identification, COI sequences from previously identified samples from human and a variety of domestic and wild specimens of Brazilian mammals, birds, fishes and reptiles were compared against the Barcode of Life Database (BOLD). BOLD provided a correct species-level identification for 12 out of the 20 queried sequences (60%) and presented the correct species as the best match for 17 out of the 18 samples morphologically identified to this level (94%). Cases where BOLD did not deliver a species-level identification were associated with the controversial taxonomic status of some species, the possible occurrence of biological events such as hybridization and the lack of representation of some groups in the database. The results showed that DNA Barcoding is already effective for species identification in many cases and that, despite some limitations, the tool must be improved and become widespread in forensic casework.


Introduction
Animal identification is essential in a large number of forensic cases, including bushmeat harvest, the unregulated trade in protected species or their derivatives and the introduction of exotic species without a proper permit, activities that threaten the survival of natural populations and damage ecosystems 1,2,3. Animal identification is also important in food fraud investigations, since there are serious economic and public health implications when illegal or poor quality items are sold as more expensive or legal ones 4,5. Without unambiguous species identification, many of these illegal activities cannot be characterized, which prevents the prosecution and eventual punishment of the offenders.
The analysis of morphological traits has been the most traditional method used for species identification and taxonomy 6,7. Characteristics such as size, shape and structures present in the bodies of animals can be used to identify the species in most cases. However, morphology-based identification requires certain conditions, including the existence of distinctive features inherent to a single species, the presence of such features in the specimen being examined and sometimes very specific taxonomic knowledge, which makes the identification of cryptic species, immature individuals and animal parts, products or fluids extremely difficult in many situations 6,8. When morphological identification is not possible, genetic identification can be used to associate unknown samples with a reference sample by comparing sequences of mitochondrial genes that differ between species 9. The use of mitochondrial DNA for species identification has some advantages over nuclear DNA. These include increased sensitivity of detection, resulting from the high copy number per cell, which is useful when the DNA is degraded or the amount available is low, and high mutation rates, meaning that even closely related species can usually be discriminated based on their nucleotide sequence differences 10. Furthermore, the use of universal primers for amplification of the same region across different taxa simplifies the methodology and enables identification without any prior knowledge about the material being analyzed 7,11.
One of the most commonly used mitochondrial genes for species identification is the subunit I of cytochrome c oxidase (COI). Based on a standard region of 650 base pairs of this gene, a universal system for cataloging and identifying animal species, named DNA Barcoding, has been proposed 12,13. A previous study using 964 pairs of chordate species showed a mean COI divergence of 9.6%, enabling COI to discriminate among closely related species in most cases 14. Created as part of the proposed identification system, the Barcode of Life Database (BOLD) is an international, publicly available reference database to which sequences from multiple voucher specimens that meet certain quality criteria can be uploaded 15. For identification purposes BOLD uses a mixed search routine, combining similarity methods with distance tree construction. After aligning the translated query sequence to a consensus model of the COI protein, a linear search of the reference library selects the 99 best hits and uses them to reconstruct a Neighbor-Joining tree 8,16. According to Ratnasingham et al. (2007), BOLD delivers a species identification with a probability of placement if there is less than 1% sequence divergence between the query sequence and a reference sequence. When a species-level match cannot be made, the query sequence is assigned to a genus if the sequence divergence is less than 3%.
Although DNA Barcoding was developed with scientific objectives, to supplement the knowledge of taxonomists and to provide an innovative tool allowing non-experts to make identifications, it soon became clear that it could be used for forensic and regulatory purposes. In fact, Dawnay et al. (2007) conducted a validation study and concluded that the COI gene enables accurate species identification in forensic casework where adequate sequence data exist and, more recently, the United States Food and Drug Administration (FDA) approved the use of a validated DNA barcoding protocol for the identification of seafood products 17. In the last few years some studies have addressed the utility of DNA barcoding as a tool to investigate wildlife crimes 18,19,20 and mislabeled food 4,5,21.
This paper aims to help evaluate the utility of DNA Barcoding in forensic animal identification. To accomplish that, tissue or other previously identified samples from human and a variety of domestic and wild specimens of Brazilian mammals, birds, fishes and reptiles were selected, a fragment of the COI gene was amplified from each using universal primers, and the fragments were sequenced. Sequences were compared against BOLD and the identification results are discussed.
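The divergence thresholds described above can be illustrated with a minimal sketch. It assumes an uncorrected pairwise distance (p-distance) between two already aligned sequences; the function names `p_distance` and `assign_rank` are hypothetical, and the actual BOLD engine combines a similarity search with tree construction, so this only illustrates the 1% and 3% cut-offs and is not a reimplementation of BOLD.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Assumption: sequences are pre-aligned and of equal length; sites with
    gaps or ambiguous bases in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def assign_rank(divergence: float) -> str:
    """Apply the thresholds quoted in the text: <1% species, <3% genus."""
    if divergence < 0.01:
        return "species"
    if divergence < 0.03:
        return "genus"
    return "no match"


# Five differences over a 650 bp barcode is about 0.77% divergence.
print(assign_rank(5 / 650))   # species
# Thirteen differences over 650 bp is 2% divergence.
print(assign_rank(13 / 650))  # genus
```

On a 650 bp fragment the 1% threshold corresponds to fewer than seven differing sites, which is why a handful of sequencing errors or a few sites of intraspecific variation can change the outcome.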

Material and methods
Animal tissue fragments and other biological materials are kept in the collection of the Brazilian Federal Police DNA Laboratory to be used as casework references. The material deposited in the collection has multiple sources, including seizures and specimens from scientific projects conducted by the Laboratory. A total of 20 tissue or membrane samples from human and a variety of domesticated and wild mammals, birds, fishes and reptiles were selected for this study. Except for two specimens, all the samples were obtained from individuals morphologically identified to the species level by the author or, in cases where the morphological identification was more difficult, by a specialist in the group.
Although they preserved morphological characters that allowed the recognition of the group, specimens from the genera Mazama and Dasypus were too damaged to be safely identified beyond the genus level (Table 01). EDTA, 2% SDS, pH 7.5) and 20 mg/mL Proteinase K, DNA was extracted from samples using standard phenol-chloroform procedures and purified with Amicon® Ultra (Millipore), following an adapted protocol previously described 22. Fragments of approximately 650 bp from the 5' region of the COI gene were amplified using FishF1 and FishR1 primers 23 for fish and some bird samples (C. japonica, R. toco and A. aestiva) or LCO1490 and HCO2198 primers 24 for the other samples. PCRs were performed in 25 µL reactions containing 2 U of AmpliTaq Gold® (Life Technologies), 1.5 mM MgCl2, 0.2 mM dNTPs, 0.4 mM of each primer and 1 µL DNA (DNA not quantified). The cycling parameters employed were 11 min at

Results
DNA extraction from the 20 samples was successful and good quality 650 bp sequences were obtained from all of them. Analysis of the nucleotide sequences showed no signs of heteroplasmy, and their translation into amino acid sequences did not reveal putative stop codons or other evidence of pseudogenes. The sequences obtained here were uploaded to GenBank (Table 02).
All 20 sequences were matched to reference sequences with less than 2% sequence divergence. BOLD provided a species-level identification for 12 out of the 20 queried sequences (60%). Five queried sequences resulted in more than one species as probable candidates, and for three sequences the best matched species fell outside the 1% species identification threshold, so BOLD did not associate them with a candidate species. However, even without providing a species-level identification in eight cases, BOLD searches presented the correct species as the best match for 17 out of the 18 samples morphologically identified to this level (94%). Sequences from Mazama sp. and Dasypus sp., which could not be checked at the species level due to the lack of previous information, were associated with the correct genus. In one case the best matched species did not correspond to the sample identity, but BOLD presented a species of the same genus (genus Cichla) (Table 02).
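The stop-codon check mentioned above can be sketched as follows. This is a minimal illustration, not the procedure run in MEGA 5: it assumes the vertebrate mitochondrial genetic code, in which TAA, TAG, AGA and AGG are stop codons and TGA encodes tryptophan, and it assumes the reading frame of the fragment is known. The function name `internal_stop_positions` is hypothetical.

```python
# Stop codons in the vertebrate mitochondrial genetic code.
# TGA is not a stop in this code (it encodes tryptophan).
VERT_MITO_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def internal_stop_positions(seq: str, frame: int = 0) -> list[int]:
    """Return the codon indices (0-based) that are stop codons.

    Assumption: `frame` (0, 1 or 2) is the offset of the first complete
    codon in the fragment. Because the barcode region lies inside the COI
    coding sequence, any stop codon found in the correct frame suggests a
    pseudogene or a sequencing error.
    """
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    return [idx for idx, codon in enumerate(codons) if codon in VERT_MITO_STOPS]


# TGA is read as tryptophan, so no stop is reported here.
print(internal_stop_positions("ATGTGAGGC"))  # []
# AGA is a stop codon in the vertebrate mitochondrial code.
print(internal_stop_positions("ATGAGAGGC"))  # [1]
```

Using the standard nuclear code instead would wrongly flag TGA codons and miss AGA and AGG, so the choice of translation table matters for this check.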
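The counts reported above can be cross-checked with a few lines of arithmetic. The variable names are illustrative only; the numbers are those given in the text.

```python
total_queries = 20
species_level_ids = 12      # BOLD delivered a species-level identification
multiple_candidates = 5     # more than one candidate species
outside_threshold = 3       # best match outside the 1% threshold

# The three outcome classes account for every queried sequence.
assert species_level_ids + multiple_candidates + outside_threshold == total_queries

identified_to_species = 18  # samples morphologically identified to species
correct_best_match = 17     # correct species presented as the best match


def percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


print(percent(species_level_ids, total_queries))           # 60
print(percent(correct_best_match, identified_to_species))  # 94
```

The denominators differ because the two specimens identified only to genus (Mazama sp. and Dasypus sp.) cannot be scored for species-level correctness.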

Discussion
The methodology described here to extract DNA from distinct sources and amplify the targeted segment of the COI gene proved to be simple and robust, resulting in good quality sequences from all samples. Only two sets of primers were needed to amplify the same DNA region of individuals from distinct taxa. Although the property of amplifying across a wide taxonomic range may complicate correct species identification where samples are mixed or contaminated 9, the use of universal primers is essential in forensic casework, since there is often no previous indication of the nature of the specimen under analysis.
The effectiveness of BOLD for species identification depends basically on two factors. First, the COI sequence divergence must allow the differentiation between species and, second, the reference database must represent the diversity of the targeted group. Although there are some exceptions, previous studies have shown that COI is useful to discriminate most species of different animal groups, including birds 26,27, fishes 25,28,29,30, mammals 31,32 and reptiles 33. The major limitation of DNA Barcoding has always been considered to be the lack of an authenticated and widely representative reference database. At the time this paper was written, BOLD held barcodes from more than 290,000 chordate specimens, representing 29,000 species, including more than 15,000 fishes, 5,600 birds, 2,800 mammals and 2,200 reptiles (http://www.boldsystems.org). Although many species and individuals still need to be included in order to capture most of the possible patterns of genetic variation, the use of BOLD in this study allowed the unambiguous species-level identification of 60% of the samples and presented the correct species as the best match in 94% of the cases, showing that the database is already effective in many situations.
The sequence from Amazona aestiva was matched to the correct species by BOLD. However, BOLD did not provide a species-level identification and also presented A. ochrocephala as a possible candidate species. In fact, previous phylogenetic studies based on mitochondrial genes showed great genetic similarity between A. aestiva and groups of the A. ochrocephala complex 34,35,36. The two sequences from Caiman species also resulted in more than one species or subspecies within the BOLD identification threshold, with the correct species presenting the best similarity value. However, the sequence from C. yacare was matched to both C. crocodilus and C. yacare with 100% similarity. Based on 13 external morphological characters, C. yacare was considered a full species, sufficiently differentiated from the other subspecies of C. crocodilus 37. However, a study with mitochondrial and nuclear genes showed that the phylogenetic relationships of C. yacare and C. crocodilus were unclear, as the two species share mitochondrial and nuclear haplotypes 38. These results reflect the controversial taxonomic status of Amazona and Caiman, and further studies with more extensive sampling must be conducted in order to assess the COI variability within both genera and verify the value of DNA Barcoding to differentiate species of these groups.
The Gallus gallus sequence was associated with the correct species through a high number of matches with 100% sequence similarity, which leaves little doubt that the identification was correct. However, since two sequences of G. sonneratii also presented perfect matches, that species was listed by BOLD as a candidate species as well. The sequences of G. sonneratii that presented high similarity values with G. gallus came from a study based on the whole mitochondrial DNA and segments of the nuclear genome, which showed genetic evidence of hybridization between the two species 39. A similar result was found for Oreochromis niloticus, with O. mossambicus and O. aureus being presented as potential candidate species with very high similarity values. Recent or incomplete processes of speciation allow species of the genus Oreochromis to hybridize easily, and successful strains for aquaculture purposes were produced by hybridizing the three species listed here as potential candidate species 40. The presence of hybrids appears to be the explanation for the results obtained in this study.
The sequences from Mazama and Dasypus presented high values of sequence similarity but were not identified to the species level. These cases usually result from the lack of representation of the true species in the database, from a high level of intraspecific variation, or from both. Since the samples came from individuals identified only to the genus level, it is impossible to know whether the best matched species presented by BOLD corresponded to the true species and, therefore, the reason why a species-level identification was not provided remains unclear. Similarly, BOLD searches did not deliver a species-level identification for the queried Cichla sequence. However, in this case it was possible to verify that the best matched species, C. temensis, did not correspond to the true species, C. orinocensis. This probably reflects the fact that the genus Cichla is extremely underrepresented in the database. Although 15 morphologically distinct species of the genus Cichla have been described 41, other studies based on molecular data showed that some of them could be considered synonymous, resulting in only eight distinct species 42,43. Five of these eight species, including C. orinocensis, are not currently represented in BOLD. These data, together with evidence of high rates of hybridization and introgression among species of the genus 42, raise doubts about the correct identification of the species based only on sequences of a single gene such as COI. It is important to mention that sometimes the identification of the genus or another higher taxonomic group is enough to characterize the illegal activity and, therefore, DNA Barcoding may still be helpful in such cases.
The results of this study showed that DNA Barcoding was effective in identifying most of the vertebrate samples analyzed. However, the tool has limitations and complicating factors, including the taxonomic uncertainty of some species, the inability to separate some species based only on COI sequences, the occurrence of biological events such as hybridization and introgression between some closely related species and the lack of representation of some groups in the reference database. Thus, it is important that the identification of forensic samples through DNA Barcoding always be accompanied by an extensive study of the biology and taxonomic relationships of the identified species, as well as an evaluation of the representation of the group in the database, in order to check whether the identification is conclusive.
The use of DNA Barcoding for animal species identification in forensic casework is relatively new and very limited in Brazil. In many situations the use of genetic tools is the only alternative to characterize crimes and avoid impunity and, therefore, their use must become widespread. In this scenario, cooperation between law enforcement agencies and research institutions is extremely important to determine priority groups and to improve the use of the technique.

94 °C, followed by 35 cycles of 94 °C for 30 s, 54 °C (FishF1/FishR1) or 50 °C (LCO1490/HCO2198) for 30 s and 72 °C for 1 min. Amplification products were purified using 1 µL Exo-SAP-IT® (USB) and sequenced in both directions using Big Dye Terminator kit v1.1 (Life Technologies). The extension products were treated with 1 U SAP and purified by ethanol precipitation. Capillary electrophoresis was performed in an ABI 3130 genetic analyzer (Life Technologies). Sequences were assembled and had their quality assessed with SeqScape v2.6 software (Life Technologies). MEGA 5 software 25 was used to align and translate the consensus sequences. Sequences were searched in BOLD (http://www.boldsystems.org).

Table 01. Vertebrate species and samples used in the study.

Table 02. BOLD identification engine results for the query sequences (BOLD best matched species is the reference sequence in the database with the highest value of sequence similarity from the 99 best-hit list. BOLD ID represents the cases where BOLD delivered a species-level identification).