Intro edit

This page documents my work using some SQL on the PLANTS checklist (from http://plants.usda.gov/dl_all.html ) to find some species not yet on Wikispecies.

  I have a concern that USDA's PLANTS database uses the Cronquist(sp?) classification system for vascular plants, although some said they preferred APG II here. Since I'm more of a database programmer than a taxonomist, I figure if the name is completely missing here and it is listed as "valid" on ITIS, I'll give it to MonoBot to create. If anyone wants me to stop, or just be more sophisticated in my choices, let me know.

Process edit

After downloading the complete PLANTS checklist[1] (as of 2008-03-23), I imported it into an Oracle database* (90,610 records total). Then I saved a list of the 542 families in the PLANTS checklist to my sandbox (see "Distinct Families" SQL below). Any families that were "redlinks" (in other words, did not exist on Wikispecies at the time), I drilled down on, into the corresponding species (see Osmunda SQL below).

*The table structure is:

create table PLANTS (
S	varchar2(7),
SYN	varchar2(7),
SCI	varchar2(155),
COMMON	varchar2(45),
FAM	varchar2(30)
);

Distinct Families edit

select '#[['|| FAM || ']]', count(*)
from P_ORIG
group by '#[['|| FAM || ']]'
order by 1;

See the list returned.

Osmunda SQL edit

SELECT distinct '* [['|| substr(SCI,1,instr(SCI,' ',1,2)-1) || ']]'
FROM p_orig
WHERE sci LIKE (SELECT max(substr(SCI,1,instr(SCI,' ')-1) || '%')
          FROM p_orig
          WHERE fam = 'Anemiaceae')
ORDER BY 1

Returns these valid species