Why Name Species?
Simply by naming objects, we open the door to a variety of useful processes, from merely communicating information about them, through comparing them and recording them, to more esoteric concepts such as classifying them.
Living organisms are just one type of object that we often need to name, although the correspondence between object and name is less well defined than for many other everyday items. The concept of a species (broadly defined as organisms which interbreed with one another) provides a useful criterion for distinguishing types of organism, and is generally applicable to everything from bacteria and fungi to plants and animals, both living and extinct.
Two approaches are mainly used for naming species:
- the (somewhat haphazard) collection of common names that has evolved gradually as part of language and culture.
- the artificial (binomial) system of scientific names first published in 1753 by Carl von Linné (better known by his Latinised name, Linnaeus).
Difficulties in adapting these systems for computerised record-keeping have led to a third, less widely used system, based on identification codes or abbreviations for naming species.
Which way is best?
The advantages and disadvantages of the different naming systems make each appropriate for different groups of users:
Common names are a natural extension of our everyday language: these are the first, and often the only, names encountered by many people. In contrast, the rigour of scientific names more closely matches the requirements of research applications. Identification codes often fulfil a more specific role in collective recording schemes, where accurate communication of information is of particular importance.
Biological data and Computers
Computers (and in particular databases) have dramatically expanded the scope for storing and using different forms of information. However, database management software also imposes very specific requirements on the form of the data being stored, the most important of which is uniqueness: each item in a list must be associated with a unique name or code. This can lead to a number of difficulties when dealing with species names.
These days, databases of animals, plants or fungi are generally indexed using unique codes for species. This approach makes it relatively easy to accommodate the constant changes in species' names (nomenclature). However, other idiosyncrasies associated with cataloguing living species cause additional problems.
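Indexing by unique codes can be sketched in a few lines. This is a minimal Python illustration, not any particular database product: the code format ("SP0001"), the site name and the in-memory "species table" are all invented for the example. Observation records refer only to the code, so a change of Latin name (here the transfer of Fox and Cubs from Hieracium aurantiacum to Pilosella aurantiaca) touches a single row, while the uniqueness requirement is enforced when species are added.

```python
from dataclasses import dataclass, field


@dataclass
class Species:
    code: str              # unique key (invented format); never reused or changed
    latin_name: str        # may change as nomenclature is revised
    common_names: list[str] = field(default_factory=list)


species: dict[str, Species] = {}       # hypothetical species table, indexed by code
records: list[tuple[str, str]] = []    # observation records: (species code, site)


def add_species(sp: Species) -> None:
    """Insert a species, enforcing the database's uniqueness requirement."""
    if sp.code in species:
        raise ValueError(f"duplicate species code: {sp.code}")
    species[sp.code] = sp


def rename(code: str, new_latin_name: str) -> None:
    """Apply a name change: one row is updated, records keep their code."""
    species[code].latin_name = new_latin_name


# "SP0001" and "Site A" are placeholders, not real identifiers.
add_species(Species("SP0001", "Hieracium aurantiacum", ["Fox and Cubs"]))
records.append(("SP0001", "Site A"))
rename("SP0001", "Pilosella aurantiaca")
print(species[records[0][0]].latin_name)  # Pilosella aurantiaca
```

Because nothing outside the species table stores the Latin name, the existing record resolves to the new name without being edited.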
Latin names, being artificially created by biologists to describe unique species, seem immune to most problems. However, unexpected difficulties can still occur when, for example, something considered to be a single species turns out to be two or more very similar species. This occurred recently with the pipistrelle bat, which is now recognised as two distinct species:
- Common Pipistrelle (Pipistrellus pipistrellus)
- Soprano Pipistrelle (Pipistrellus pygmaeus)
Such cryptic species are turning up much more frequently as new techniques such as DNA profiling are used to explore relationships between species.
In contrast, many common names present such complexities for use with databases (see below) that they have been relegated to a very secondary, or even non-existent, role. As a result, many recording initiatives have, almost at a stroke, become distanced from the field naturalists who provide the core observational data.
Why isn't storing common names on computers straightforward?
Common names differ in a number of ways from the names and labels which databases are used to handling:
Common names are not unique: unrelated species often share similar or identical names, for example Magpie (bird, moth), Emperor (moth, dragonfly) and Brimstone (butterfly, moth).
Additionally, some names are applied generically to a number of closely related species, for example Whirligig beetle (several species), Eyebright (several related species) and Sea Squirt (a number of species).
- It is difficult to create unique entries for each species in a database.
Multiple common names occur - often with no 'correct' form.
Alternative names may be associated with different parts of the country, or all forms may be used indiscriminately.
- Arranging the data for efficient searching is difficult.
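Together, shared names and multiple names make the link between common names and species a many-to-many relation. This minimal Python sketch assumes invented species codes (SP001 to SP005, not real recording-scheme codes) and inverts a code-to-names table into a name-to-codes index; the Latin names in the comments are for reference only. A lookup on a shared name returns several candidates that still have to be disambiguated, and the index only works for exact spellings.

```python
from collections import defaultdict

# Codes are invented placeholders; Latin names are given for reference only.
names_by_code = {
    "SP001": ["Magpie"],                        # Pica pica (bird)
    "SP002": ["Magpie", "Magpie Moth"],         # Abraxas grossulariata
    "SP003": ["Brimstone"],                     # Gonepteryx rhamni (butterfly)
    "SP004": ["Brimstone", "Brimstone Moth"],   # Opisthograptis luteolata
    "SP005": ["St John's Wort", "Perforate St John's-wort"],  # Hypericum perforatum
}


def build_name_index(by_code: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert code -> names into name -> set of codes (many-to-many)."""
    index: dict[str, set[str]] = defaultdict(set)
    for code, names in by_code.items():
        for name in names:
            index[name].add(code)
    return dict(index)


name_index = build_name_index(names_by_code)
print(sorted(name_index["Magpie"]))          # ['SP001', 'SP002']: ambiguous
print(sorted(name_index["Brimstone Moth"]))  # ['SP004']: unambiguous
print(names_by_code["SP005"])                # two names for one species
```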
Names do not follow standard rules of punctuation - or indeed any rules at all.
For example, consider: Blackhead Salmon; Black-headed Gull; Jupiter's Distaff; Solomon's-seal; Lily of the Valley; Star-of-Bethlehem; Fox and Cubs; Hen-and-chickens Houseleek.
- Standard text-processing techniques cannot be applied reliably.
Many common names are subject to slight variations in spelling or punctuation in different reference books.
e.g. Hypericum perforatum: Saint John's Wort; St. John's Wort; St John's Wort.
- Flexible database search methods are required, increasing query complexity.
These characteristics cause a number of problems for conventional database systems, which commonly use hash tables (based on exact matching) to resolve relational references and to perform fast searches. In many cases it would be necessary to include every possible variant of a common name explicitly in the database to ensure a match. The alternative, applying flexible pattern-matching algorithms to every search, is too slow to be viable.
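The exact-matching problem is easy to demonstrate. In this minimal Python sketch a dict (Python's built-in hash table) stands in for a database hash index: the single stored spelling is found, but trivial variants of the same name are not, and the only remedy within exact matching is to store every variant as an extra key.

```python
# A Python dict is a hash table: a key is found only if it matches exactly.
lookup = {"St John's Wort": "Hypericum perforatum"}

queries = ["St John's Wort", "St. John's Wort", "Saint John's Wort", "St John's-wort"]
results = {q: lookup.get(q) for q in queries}
for q, latin in results.items():
    print(repr(q), "->", latin)   # only the first query finds a match

# Brute-force workaround: store every known variant as an extra key.
for variant in queries[1:]:
    lookup[variant] = "Hypericum perforatum"
print(all(lookup.get(q) == "Hypericum perforatum" for q in queries))  # True
```

The workaround only covers variants someone has thought to enter, and the table grows with every new spelling.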
What Solutions are available?
One possible solution is to "stabilise" (i.e. standardise) common names. This approach was applied to wild flower names by Rayner in 1926 (Standard Catalogue of English Names of our Wild Flowers) and again by the Botanical Society of the British Isles in 1974 (English Names of Wild Flowers). In 2000 the Fungus Conservation Forum employed a similar approach for fungus names, creating a further 400 new names in the process.
Whilst stabilisation may encourage use of common names through rationalisation and increasing coverage, it is not without drawbacks. The loss of alternatives (a single name is generally preferred) and enforcement of systematic patterns (eg. generic terms, hyphenation) unavoidably degrades the cultural value of names. Proponents may also face difficulties in persuading people to switch abruptly to using the new names, particularly if they have been using closely similar variants for many years.
Various software solutions have been employed including: increasingly complex database structures, parallel (coupled) databases for Latin and common names, and development of bespoke database systems for handling biological data.
The UK Species project was originally devised by Ecologica (UK) Ltd in October 2003 to explore some alternative methods for storing and retrieving biological data. The aim of the project is:
"to create an accessible online resource of information for UK Species
of use to individual naturalists and conservation organisations."
(accessible = searchable by common or scientific name).
We identified three main objectives for the project:
- A database management system (DBMS) capable of efficiently storing biological data;
- Search algorithms capable of accommodating peculiarities in common names; and
- An extensive list of common names for UK species.
The UK Species site was implemented using a proprietary MIrel database developed by Ecologica coupled to a flexible search interface.
MIrel databases offer a number of advantages for handling biological information, and are particularly efficient at storing and searching lists of alternative names (synonyms) and taxonomic relationships.
Database searches are carried out using a novel hashing algorithm, developed by analysing the characteristics of common names. In particular, this simplifies common-name queries by resolving variable hyphenation, apostrophes and other punctuation.
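MIrel is proprietary and its internals are not described here, so the following is only a generic illustration of the two kinds of relationship mentioned, not a description of how MIrel stores them. This minimal Python sketch assumes a simple parent-pointer table with invented codes (T1 to T5): synonyms resolve to a taxon code rather than to another name, and taxonomic relationships are parent links that can be walked upwards (lineage) or downwards (children).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Taxon:
    code: str              # unique key (invented placeholder values below)
    name: str
    rank: str
    parent: Optional[str]  # code of the parent taxon; None at the top


taxa = {t.code: t for t in [
    Taxon("T1", "Chiroptera", "order", None),
    Taxon("T2", "Vespertilionidae", "family", "T1"),
    Taxon("T3", "Pipistrellus", "genus", "T2"),
    Taxon("T4", "Pipistrellus pipistrellus", "species", "T3"),
    Taxon("T5", "Pipistrellus pygmaeus", "species", "T3"),
]}

# Alternative names point at a taxon code, not at another name.
synonyms = {"Common Pipistrelle": "T4", "Soprano Pipistrelle": "T5"}


def lineage(code: str) -> list[str]:
    """Follow parent links from a taxon up to the top of the hierarchy."""
    chain = []
    current: Optional[str] = code
    while current is not None:
        taxon = taxa[current]
        chain.append(taxon.name)
        current = taxon.parent
    return chain


def children(code: str) -> list[str]:
    """List the names of taxa whose parent is the given code."""
    return sorted(t.name for t in taxa.values() if t.parent == code)


print(lineage(synonyms["Soprano Pipistrelle"]))
# ['Pipistrellus pygmaeus', 'Pipistrellus', 'Vespertilionidae', 'Chiroptera']
print(children("T3"))
# ['Pipistrellus pipistrellus', 'Pipistrellus pygmaeus']
```

With this layout, a species split such as the pipistrelles amounts to adding one child row under the genus and re-pointing the affected synonyms.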
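The hashing algorithm used on the site is not published, so the following is only an illustration of the general idea: reduce each name to a normalised key before hashing, so that punctuation variants collide on purpose while lookups remain fast exact matches. In this minimal Python sketch, the function `name_key` and its rules (ignore case, accents and apostrophes; treat any punctuation as a word break; equate "Saint" with "St"; join words without separators) are assumptions chosen for the example.

```python
import re
import unicodedata


def name_key(name: str) -> str:
    """Map spelling/punctuation variants of a common name to one key (illustrative)."""
    # Strip accents, e.g. 'é' -> 'e'.
    s = unicodedata.normalize("NFKD", name)
    s = "".join(ch for ch in s if not unicodedata.combining(ch))
    # Ignore case and apostrophes (straight or typographic).
    s = s.lower().replace("'", "").replace("\u2019", "")
    # Treat hyphens, full stops, spaces and other punctuation as word breaks.
    words = [w for w in re.split(r"[^a-z0-9]+", s) if w]
    # Treat 'Saint' and 'St' as equivalent.
    words = ["st" if w == "saint" else w for w in words]
    # Join without separators so 'Black-headed', 'Black headed' and 'Blackheaded' agree.
    return "".join(words)


index = {
    name_key("St John's Wort"): "Hypericum perforatum",
    name_key("Black-headed Gull"): "Chroicocephalus ridibundus",
}

for query in ["Saint John's Wort", "St. John's-wort", "Blackheaded gull"]:
    print(query, "->", index.get(name_key(query)))
```

Joining words without separators resolves variable hyphenation, but it can also merge genuinely different names that differ only in spacing; any real scheme has to weigh that trade-off against the names actually in the list.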
Finally, we compiled the UK Species Inventory, a database of more than 50,000 species, listing common and Latin names together with taxonomic information, protection and status data. The database contains more than 19,000 common names, making it one of the largest online compilations in the UK. The online version (the UK Species website) lists slightly more than 13,000 of the more widely recognised UK species, together with associated information.
The web interface allows the database to be searched by Latin or common name, using exact or fuzzy-matching methods. The fuzzy-matching search uses pattern-recognition algorithms to match names that are mis-spelt (including omitted and transposed letters), and is particularly useful for searching Latin names.
The website became operational in January 2004 and was officially launched in April 2004.
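The pattern-recognition algorithm behind the site's fuzzy search is not named. One well-known measure that counts exactly the errors described (omitted letters, transposed letters, plus insertions and substitutions) is the optimal string alignment distance, a restricted form of Damerau-Levenshtein distance; this minimal Python sketch uses it as a stand-in, with a hypothetical `fuzzy_search` helper and an assumed threshold of two edits.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: the number of single-letter
    insertions, deletions (omissions), substitutions and adjacent
    transpositions needed to turn a into b."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j          # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # omitted letter
                          d[i][j - 1] + 1,         # extra letter
                          d[i - 1][j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[m][n]


def fuzzy_search(query: str, names: list[str], max_distance: int = 2) -> list[str]:
    """Return names within max_distance edits of the query, closest first."""
    q = query.lower()
    scored = sorted((osa_distance(q, name.lower()), name) for name in names)
    return [name for dist, name in scored if dist <= max_distance]


latin_names = ["Pipistrellus pipistrellus", "Pipistrellus pygmaeus", "Hypericum perforatum"]
print(fuzzy_search("Hypericum perfortaum", latin_names))  # transposed letters
print(fuzzy_search("Pipistrelus pygmaeus", latin_names))  # omitted letter
```

This sketch compares the query against every name, so its cost grows with the size of the list times the product of the name lengths; that linear scan is the kind of overhead that makes flexible matching costly on large databases compared with a single hash lookup.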
Rayner (1926). Standard Catalogue of English Names of our Wild Flowers.
Dony, J. G., Perring, F. and Rob, C. M. (1974). English Names of Wild Flowers. Butterworth Group, London.
The Fungus Conservation Forum. Recommended English Names for Fungi. Plantlife website.
We are extremely grateful to the National Biodiversity Network and Biological Records Centre for providing detailed species information, and to the following organisations for allowing use of species identification codes: