Wikispecies:Microformat

This page discusses the draft 'Species' microformat and how it might be applied to Wikispecies to benefit its editors and users.

What is a microformat?

A microformat (sometimes abbreviated μF or uF) is a way of adding simple semantic meaning to human-readable content which is otherwise, from a machine's point of view, just plain text. Microformats allow data items such as events, contact details or locations on HTML (or XHTML) web pages to be detected by software, and the information in them to be extracted, indexed, searched for, saved or cross-referenced, so that it can be reused or combined.

More technically, they are items of semantic markup, using just standard (X)HTML with a set of common class names. They are open and freely available for anyone to use.

For example, the text:

the birds roosted at 52.48,-1.89 and left the next morning

contains a pair of numbers which may be understood, from their context, to be a set of geographic coordinates. By wrapping them in spans (or other HTML elements) with specific class names (in this case, part of the geo microformat specification):

 the birds roosted at 
 <span class="geo">
   <span class="latitude">52.48</span>, 
   <span class="longitude">-1.89</span>
 </span> 
 and left the next morning
 

machines can be told exactly what each value represents, and can then index it, look it up on a map, export it to a GPS device, convert it to RDF, and so on. Here is the live example:

the birds roosted at 52.48, -1.89 and left the next morning

which can now be downloaded, for example, as a Keyhole Markup Language (KML) file using this link:

http://suda.co.uk/projects/microformats/geo/get-geo.php?type=kml&uri=http://species.wikimedia.org/wiki/Wikispecies:Microformat
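
As a rough illustration of what such extraction involves, the following minimal sketch pulls the coordinates out of the marked-up sentence using only Python's standard library. The class name GeoParser is made up for this example, and the sketch assumes each geo element holds exactly one latitude and one longitude as plain decimal text; it is not a full implementation of the geo specification.

```python
from html.parser import HTMLParser


class GeoParser(HTMLParser):
    """Collect (latitude, longitude) pairs from geo microformat markup.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.points = []      # completed (lat, lon) pairs
        self._field = None    # "latitude" or "longitude" while inside that element
        self._current = {}    # values gathered for the geo element being read

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "geo" in classes:
            self._current = {}
        for name in ("latitude", "longitude"):
            if name in classes:
                self._field = name

    def handle_data(self, data):
        text = data.strip()
        if not self._field or not text:
            return
        self._current[self._field] = float(text)
        self._field = None
        if len(self._current) == 2:
            self.points.append(
                (self._current["latitude"], self._current["longitude"])
            )
            self._current = {}


html = """the birds roosted at
<span class="geo">
  <span class="latitude">52.48</span>,
  <span class="longitude">-1.89</span>
</span>
and left the next morning"""

parser = GeoParser()
parser.feed(html)
print(parser.points)  # [(52.48, -1.89)]
```

Once the values are available as numbers, converting them to KML, RDF or a map link is a matter of formatting.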

Other microformats allow the encoding and extraction of events, contact information, social relationships, and so on. More are being developed, including one for marking up taxonomic information.

Support for microformats will be built into version 3 of the Firefox browser, already in beta testing.

What is the 'Species' microformat?

The 'Species' microformat is currently in draft (as a so-called "straw man"), and may change. It comprises an HTML class ("biota") for the whole microformat, plus a class for each taxonomic rank ("superphylum", "family", "genus"), a class for binomials ("binomial") and classes for related terms such as "vernacular", "hybrid", "cultivar", etc.
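
To make the class names above more concrete, here is a minimal sketch that reads a fragment marked up with them and collects the values found inside each "biota" wrapper. The sample markup, the BiotaParser name and the nesting of "genus" inside "binomial" are assumptions made for illustration; the draft may prescribe something different, and the class names themselves may change.

```python
from html.parser import HTMLParser

# Class names quoted from the draft description; subject to change.
SPECIES_CLASSES = {
    "superphylum", "family", "genus", "binomial",
    "vernacular", "hybrid", "cultivar",
}


class BiotaParser(HTMLParser):
    """Collect one dict of {class name: text} per class="biota" element."""

    VOID = {"br", "hr", "img", "wbr"}  # elements that never have an end tag

    def __init__(self):
        super().__init__()
        self.records = []    # finished records
        self._record = None  # record being filled, or None outside "biota"
        self._depth = 0      # depth of open elements inside the wrapper
        self._fields = []    # (depth, class name) for each open field element

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self._record is None:
            if "biota" in classes:
                self._record, self._depth = {}, 1
            return
        self._depth += 1
        for name in classes & SPECIES_CLASSES:
            self._fields.append((self._depth, name))
            self._record[name] = ""

    def handle_data(self, data):
        # Text belongs to every field element that is currently open.
        for _, name in self._fields:
            self._record[name] += data

    def handle_endtag(self, tag):
        if tag in self.VOID or self._record is None:
            return
        self._fields = [f for f in self._fields if f[0] != self._depth]
        self._depth -= 1
        if self._depth == 0:
            cleaned = {k: " ".join(v.split()) for k, v in self._record.items()}
            self.records.append(cleaned)
            self._record = None


# Hypothetical sample markup, not taken from the draft itself.
sample = (
    '<p>We saw a <span class="biota">'
    '<span class="vernacular">Barn Owl</span> '
    '(<span class="binomial"><span class="genus">Tyto</span> alba</span>)'
    '</span> at dusk.</p>'
)

parser = BiotaParser()
parser.feed(sample)
print(parser.records)
```

For the sample above, the single record contains the vernacular name "Barn Owl", the binomial "Tyto alba" and the genus "Tyto".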

How can I contribute?

The final naming and use of these classes are still being debated, and your contributions will be welcome on this article's talk page, the relevant page on the microformats wiki, or the microformats mailing list for new proposals.

How might the 'Species' microformat be used?

Imagine viewing a web page with a reference to a species, and being able to use a browser add-on to be taken directly to information about that species on Wikispecies, or, say, Wikipedia, Google Images, or another site of your choosing, such as an academic database.

Your software would automatically know to search site A if the scientific name referred to a moth, site B for a bird, and site C for a plant - and you could set your preferences as to which sites those were to be, and in which order two or more were to be searched (e.g. for moths, try UK Moths first, if not found try The Global Lepidoptera Names Index).
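
One way such preferences might be represented is as an ordered list of sites per group of organisms, tried in turn until one has a result. The sketch below is only an illustration: the PREFERENCES table, the lookup function and the example.org URL templates are all hypothetical placeholders (they are not the real search addresses of the sites named), and the is_found callback stands in for whatever check the software would actually perform.

```python
from urllib.parse import quote

# Hypothetical preference table: group -> ordered list of (site name, URL template).
# The example.org addresses are placeholders, not real search endpoints.
PREFERENCES = {
    "moth": [
        ("UK Moths", "https://example.org/ukmoths?q={name}"),
        ("The Global Lepidoptera Names Index", "https://example.org/lepindex?q={name}"),
    ],
    "bird": [
        ("Wikispecies", "https://example.org/wikispecies?q={name}"),
    ],
}


def lookup(name, group, is_found):
    """Return (site, url) for the first preferred site that has the name.

    is_found is a caller-supplied function taking a URL and returning
    True if that site has an entry; returns None if no site matches.
    """
    for site, template in PREFERENCES.get(group, []):
        url = template.format(name=quote(name))
        if is_found(url):
            return site, url
    return None


# Pretend that only the second moth site knows the species.
result = lookup("Biston betularia", "moth", lambda url: "lepindex" in url)
print(result)
```

Keeping the order in the data rather than in the code means a user could change which sites are consulted, and in what sequence, without touching the lookup logic.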

Or suppose someone writes a long, chronologically ordered web page about all the birds, insects, mammals and plants they saw on a wildlife safari, with lots of prose description of the places where they saw them and the people they were with, but you want to extract a list of species, sorted into alphabetical order within taxonomic class (birds first, then insects, then...) or in taxonomic order.
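
Once the names have been extracted, producing such a list is straightforward. The following minimal sketch assumes the extraction step has already yielded (class, binomial) pairs in page order; the CLASS_ORDER list, the sort_sightings function and the sample sightings are invented for illustration.

```python
# Preferred order of taxonomic classes (birds first, then insects, ...).
# Classes not listed here are placed at the end.
CLASS_ORDER = ["Aves", "Insecta", "Mammalia"]


def sort_sightings(sightings):
    """Remove duplicates, then sort by class order and alphabetically within class."""
    rank = {name: index for index, name in enumerate(CLASS_ORDER)}
    unique = set(sightings)
    return sorted(unique, key=lambda s: (rank.get(s[0], len(rank)), s[1]))


# Hypothetical sightings in the order they appear on the page.
page_order = [
    ("Mammalia", "Panthera leo"),
    ("Aves", "Struthio camelus"),
    ("Insecta", "Danaus chrysippus"),
    ("Aves", "Ardea cinerea"),
    ("Mammalia", "Panthera leo"),  # mentioned twice in the prose
]

for taxon_class, binomial in sort_sightings(page_order):
    print(taxon_class, binomial)
```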

Those are just two of the things a "species" microformat might do for you.

Your software, or a search engine, would be able to differentiate between a page discussing HMS Beagle, the ship, and one discussing a Beagle dog; or between birds that fly and "birds" as a slang term for women.

Language

Another benefit would be that user agents could be instructed to treat text marked up in this way as not being in the base language of the document or element in which it occurs: pronunciation should be as for Latin; the text should not be translated (e.g. where a component word happens also to be a valid word in that language, such as the genus Colon, Circus cyaneus, Hesperia comma, or anything with major or minor on an English-language page); and it should not be spell-checked, or should be spell-checked only with a specialised dictionary (a need identified in this 2003 ietf-languages discussion of language values for taxonomic names).

Checklists

A further benefit of the species microformat would be the enrichment of species checklists, which are commonly found on the web. Broadly speaking, a species checklist is a list of taxa, usually for a particular group of similar organisms such as birds or vascular plants, found within a particular geographical area (a country, a region, a county, or a specific site, large or small). A typical example is the Checklist of Beetles of the British Isles which, as the name suggests, lists beetles known to be found within the British Isles. A Google search for "species checklist" will reveal many other such examples. Species checklists are presented in a broadly consistent manner but usually cannot be parsed and used by computers, because there is no common standard for marking them up in HTML. The species microformat would provide that common standard. A fully microformat-enabled checklist would be parsable by computers, and would thus give developers a means to aggregate and otherwise make use of this invaluable content beyond the current, rather limited, use of simple online viewing.

A specific example of checklist use might be in enabling biological recording software to parse and aggregate checklists in order to include them in its own species dictionaries. Typically there are waits of many months or even years while humans collate checklist changes manually for inclusion in recording software; automated checklist parsing and aggregation would make this process much faster and more efficient.

Can I see the proposed microformat in use?

The draft proposal has been implemented on Wikipedia, using the Taxobox template. It can be discovered using the Operator Extension for Firefox.

Installation instructions for Operator

After installing Operator and restarting Firefox, install the species user script (overwriting the previous version, if any) and restart Firefox again. Go into Operator's "Options", then "Data Formats", and add "Species"; then restart one last time. You should then be offered a range of options each time you visit a Wikipedia page with a Taxobox, including searching for that species (or other rank) on Wikispecies or other sites. For example, try Barn Owl or Mustelidae.

How can the microformat be added to Wikispecies?

This would require two things:

  1. An outer wrapper with class="biota"
  2. Wrapping each entry in an appropriate class

The former might be as simple as an opening <div class="biota">; but the question is: how to close it?

To achieve the latter:

Superregnum: [[Eukaryota]] <br />

(for example) would become

Superregnum: <span class="superkingdom">[[Eukaryota]]</span> <br />

These would seem to be jobs for a bot.
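
The core text transformation such a bot would perform might look like the minimal sketch below. It is an assumption-laden illustration, not an existing bot: only the "Superregnum" to "superkingdom" pair comes from the example above, while the other entries in RANK_CLASSES, the regular expression and the wrap_ranks function are hypothetical. It handles only lines of the exact form "Label: [[Name]]" and leaves everything else untouched, so italicised or otherwise decorated links would need extra handling.

```python
import re

# Latin rank labels -> draft class names. Only the first pair is taken from
# the example in the text; the rest are illustrative assumptions.
RANK_CLASSES = {
    "Superregnum": "superkingdom",
    "Regnum": "kingdom",
    "Familia": "family",
    "Genus": "genus",
}

# A rank label at the start of a line, followed by a plain wiki link.
RANK_LINE = re.compile(
    r"^(?P<label>\w+): (?P<link>\[\[[^\]]+\]\])", re.MULTILINE
)


def wrap_ranks(wikitext):
    """Wrap the link on each recognised rank line in a span with the rank's class."""

    def repl(match):
        css_class = RANK_CLASSES.get(match.group("label"))
        if css_class is None:
            return match.group(0)  # unknown label: leave the line as it is
        return '{}: <span class="{}">{}</span>'.format(
            match.group("label"), css_class, match.group("link")
        )

    return RANK_LINE.sub(repl, wikitext)


before = "Superregnum: [[Eukaryota]] <br />"
after = wrap_ranks(before)
print(after)
# Superregnum: <span class="superkingdom">[[Eukaryota]]</span> <br />
```

Because an already wrapped line no longer matches the pattern, running the function a second time leaves the text unchanged, which matters for a bot that may revisit pages.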

Also, the template {{VN}} could be modified to output a wrapping class="vernacular".

Example

A proof-of-concept example, with "subfamily" but no other ranks, has been added to Glaucidium sanchezi and to its subfamily, Surniinae.

Other microformats

Other microformats which might be used on Wikispecies include hCard for people (authorities) and the proposed citation microformat for references.