Wikispecies:Bots/Requests for approval/MerlIwBot

  1. Operator: w:de:User:Merlissimo
  2. Automatic or Manually Assisted: automatic
  3. Programming Language(s): Java
  4. Function Summary: interwiki
  5. Edit period(s) (e.g. Continuous, daily, one time run): Continuous
  6. Edit rate requested: 4-6 edits/min
  7. Already has a bot flag (Y/N): Y: alswiki, arwiki, bnwiki, bswiki, cswiki, dewiki, dvwiki, enwiki, eswiki, fiu_vrowiki, fiwiki, frwiki, idwiki, itwiki, jawiki, kshwiki, ltwiki, mrwiki, mtwiki, ndswiki, nlwiki, nnwiki, plwiki, ptwiki, rowiki, ruwiki, simplewiki, slwiki, trwiki, ukwiki, wuuwiki, commons; global bot (flag on 259 of 272 Wikipedia interwiki projects)
  8. Function Details: The bot will add/modify/delete interwikis in all namespaces. I have seen that Template:Commons category has a fixed position at the end of articles, so it is possible to add commonscat automatically in the article namespace as well, not only in the category namespace. In a scan of this wiki I found 7926 interwikis on 7040 pages linking to nonexistent pages, so there is a lot of work to do. Merlissimo (talk) 23:58, 19 April 2011 (UTC)

Question: How will your bot determine when to add/delete/modify the interwikis? The names are not one-to-one. Quite often, there will be a single article on the English Wikipedia that will correspond to more than one taxon page here. For example, what would your bot do in regard to links for en:Calypso (orchid), which corresponds to the Wikispecies page Calypso but also covers information about the species Calypso bulbosa, which is a separate page here on Wikispecies but not a separate page on Wikipedia? The additional problem is that some Wikipedias will have separate pages for a genus and species, even if there is a single species in the genus, but other Wikipedias will have a single page in these situations. And some Wikipedias are not consistent about this. So, how does your bot determine which links are "correct"? I ask because I've seen bots repeatedly decimate the links for certain taxa on Wikipedia, and the owners have never been willing to concede that the problem exists. I am therefore skeptical that a bot can successfully do what you claim your bot will do. --EncycloPetey (talk) 05:51, 3 May 2011 (UTC)
My bot does not guess interwiki connections between articles from matching titles (this is done for Wiktionary only). The bot first builds an interwiki graph of already existing connections. These connections can be existing language links from this project, or links to Wikispecies made on Wikipedia projects through a template (e.g. en:Template:Wikispecies on enwiki). My bot then uses graph algorithms such as strongly connected components to find missing links. These new links are added to articles, and links to missing (e.g. deleted) articles are removed.
Your example does not contain any existing links between enwiki and wikispecies, so my bot cannot add any links. The perfect linking scenario for your example between wikispecies and enwiki would be:
  1. add {{wikispecies|Calypso}} to en:Calypso (orchid)
  2. add language link [[en:Calypso (orchid)]] to Calypso
  3. add __STATICREDIRECT__ to en:Calypso bulbosa, so that this redirect is treated as an independent page with its own interwikis, which are not mixed with those of en:Calypso (orchid)
  4. add [[en:Calypso bulbosa]] as a language link to Calypso bulbosa
Merlissimo (talk) 11:04, 3 May 2011 (UTC)
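To illustrate the link bookkeeping described in this comment, here is a minimal Java sketch. It is not MerlIwBot's actual code: the class and method names are hypothetical, pages are written as "wiki:title" strings, and it assumes the group of pages that belong together has already been identified. The page names "xx:Calypso" and "yy:Deleted page" are made up for the example.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch only; names and data layout are assumptions. */
final class InterwikiGroupSketch {

    /**
     * For every page in an already identified group, list the group
     * members that the page does not link to yet.
     */
    static Map<String, Set<String>> missingLinks(Set<String> group,
                                                 Map<String, Set<String>> links) {
        Map<String, Set<String>> missing = new LinkedHashMap<>();
        for (String page : new TreeSet<>(group)) {
            Set<String> wanted = new TreeSet<>(group);
            wanted.remove(page);                                   // never link a page to itself
            wanted.removeAll(links.getOrDefault(page, Set.of()));  // drop links that already exist
            if (!wanted.isEmpty()) {
                missing.put(page, wanted);
            }
        }
        return missing;
    }

    /** For every page, list its links whose target is not an existing page. */
    static Map<String, Set<String>> deadLinks(Map<String, Set<String>> links,
                                              Set<String> existingPages) {
        Map<String, Set<String>> dead = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> entry : links.entrySet()) {
            Set<String> gone = new TreeSet<>(entry.getValue());
            gone.removeAll(existingPages);
            if (!gone.isEmpty()) {
                dead.put(entry.getKey(), gone);
            }
        }
        return dead;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> links = new HashMap<>();
        links.put("species:Calypso", Set.of("en:Calypso (orchid)", "yy:Deleted page"));
        links.put("en:Calypso (orchid)", Set.of("species:Calypso", "xx:Calypso"));
        links.put("xx:Calypso", Set.of("en:Calypso (orchid)"));
        Set<String> group = Set.of("species:Calypso", "en:Calypso (orchid)", "xx:Calypso");

        System.out.println("to add:    " + missingLinks(group, links));
        System.out.println("to remove: " + deadLinks(links, group));
    }
}
```

The sketch treats all connections as plain links; a real bot would also have to distinguish language links from template-based links such as {{wikispecies|...}}.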
The taxoboxes of fiwiki and huwiki contain a single wikispecies parameter, which seems to be very reliable. Perhaps I can use this as a hint in the future. Merlissimo (talk) 18:07, 4 May 2011 (UTC)
The same is true of plwiki's taxoboxes: [1] and [2]. Both have a "wikispecies" parameter. Ark (talk page) 19:40, 4 May 2011 (UTC)

I am willing to give it a try and approve it for a trial of 7 days. Open2universe | Talk 13:17, 4 May 2011 (UTC)

OK, I'll start my bot using a write delay of 333 seconds. If you find any mistakes or have any ideas for improving the bot, just tell me. Because I wrote the framework entirely myself, I can easily adjust its behavior. Merlissimo (talk) 14:53, 4 May 2011 (UTC)
I have stopped the bot because I think it would otherwise only be spamming RecentChanges. With an edit delay of 5 minutes the bot could do 800 unflagged edits within this week. I think enough test edits have been done for you to make your decision. There were 18792 pending edits for specieswiki in its queue when I stopped the bot. Merlissimo (talk) 12:44, 5 May 2011 (UTC)
See what your bot does to Equisetales; I'm curious because there are a number of incorrect links there currently. --EncycloPetey (talk) 01:48, 5 May 2011 (UTC)
At the moment my bot wouldn't do anything on this page, because the bot itself cannot decide which interwiki group would be correct. I am mainly using Tarjan's algorithm. If the bot reads the data from infoboxes as discussed above, links from Wikipedia to Wikispecies are also added to the link graph. This could improve the result. But the bot won't change anything if it is not sure. Please note that the bot reads the data from the MediaWiki API, which returns a unique language set [3].
I ran the bot in debug mode with Equisetales as the starting page, so nothing was changed but readable output was created. The bot tries to group different pages (the (Gxx) tags). Merlissimo (talk) 04:53, 5 May 2011 (UTC)
You can read the log for Equisetales by visiting this old version of this page (190 kB). Merlissimo (talk) 19:55, 11 May 2011 (UTC)
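For readers unfamiliar with Tarjan's algorithm, the following Java sketch shows how strongly connected components could be computed on an interwiki link graph, and how a group might be flagged as ambiguous. It is an illustration under assumptions, not the bot's real implementation: the class name, the "wiki:title" page format, and the rule "two pages from the same wiki in one group means the bot is not sure" are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of Tarjan's strongly connected components algorithm. */
final class TarjanInterwikiSketch {
    private final Map<String, Set<String>> graph;
    private final Map<String, Integer> index = new HashMap<>();
    private final Map<String, Integer> lowLink = new HashMap<>();
    private final Deque<String> stack = new ArrayDeque<>();
    private final Set<String> onStack = new HashSet<>();
    private final List<Set<String>> components = new ArrayList<>();
    private int counter = 0;

    private TarjanInterwikiSketch(Map<String, Set<String>> graph) {
        this.graph = graph;
    }

    /** Returns the strongly connected components of the link graph. */
    static List<Set<String>> components(Map<String, Set<String>> graph) {
        TarjanInterwikiSketch tarjan = new TarjanInterwikiSketch(graph);
        for (String page : new TreeSet<>(graph.keySet())) {
            if (!tarjan.index.containsKey(page)) {
                tarjan.visit(page);
            }
        }
        return tarjan.components;
    }

    // Recursive depth-first visit; a very large graph would need an iterative version.
    private void visit(String v) {
        index.put(v, counter);
        lowLink.put(v, counter);
        counter++;
        stack.push(v);
        onStack.add(v);
        for (String w : graph.getOrDefault(v, Set.of())) {
            if (!index.containsKey(w)) {
                visit(w);
                lowLink.put(v, Math.min(lowLink.get(v), lowLink.get(w)));
            } else if (onStack.contains(w)) {
                lowLink.put(v, Math.min(lowLink.get(v), index.get(w)));
            }
        }
        if (lowLink.get(v).equals(index.get(v))) {
            // v is the root of a component: pop the stack down to v.
            Set<String> component = new TreeSet<>();
            String w;
            do {
                w = stack.pop();
                onStack.remove(w);
                component.add(w);
            } while (!w.equals(v));
            components.add(component);
        }
    }

    /** Assumed rule: a group holding two pages of the same wiki is ambiguous. */
    static boolean isAmbiguous(Set<String> component) {
        Set<String> wikis = new HashSet<>();
        for (String page : component) {
            int colon = page.indexOf(':');
            String wiki = colon < 0 ? "" : page.substring(0, colon);
            if (!wikis.add(wiki)) {
                return true;
            }
        }
        return false;
    }
}
```

Under this assumed rule, a bot would leave every ambiguous group untouched and only complete the links inside groups that contain at most one page per wiki.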

I have done some checking of the bot's results and they seem fine. Unless there is more discussion, I will mark it as a bot later today. Open2universe | Talk 13:05, 11 May 2011 (UTC)

Approved. Open2universe | Talk 01:38, 12 May 2011 (UTC)