Tuesday, August 29, 2006

Adding contents to Wikipedia using a bot

Well, this question comes up over and over again, so I would like to describe here how to do it. The approach works for both Wikipedia and Wiktionary.

I have done this quite often on the Italian Wiktionary and on the Neapolitan Wikipedia (and some other projects).

For the upload I use the pywikipediabot, in particular pagefromfile.py. This bot was mainly created to upload pages to Wiktionary, but it turned out to be a great tool for Wikipedia as well.

You need a .txt file saved in UTF-8 encoding. The bot takes the first text it finds between ''' and ''' as the page name and creates that page. If the page already exists, it is skipped.

Now the question I got is what a typical entry looks like. Here is an example:

{{-start-}}
'''Rome''' is the capital of Italy.
{{-stop-}}

This means the bot would create the page Rome and add the contents "Rome is the capital of Italy." to the page.
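To make the rule concrete, here is a minimal Python sketch of how such a file can be split into pages. This is not the actual code of pagefromfile.py, just an illustration of the rule described above; the function name split_entries is my own invention.

```python
import re

START, STOP = "{{-start-}}", "{{-stop-}}"

def split_entries(text):
    """Yield (title, body) for every start/stop block found in text.

    Illustration only: the title is the first text between ''' and '''.
    """
    block_re = re.compile(re.escape(START) + r"(.*?)" + re.escape(STOP), re.DOTALL)
    title_re = re.compile(r"'''(.*?)'''")
    for match in block_re.finditer(text):
        body = match.group(1).strip()
        title = title_re.search(body)
        if title:  # in this sketch, blocks without a bold title are skipped
            yield title.group(1), body

sample = """{{-start-}}
'''Rome''' is the capital of Italy.
{{-stop-}}
"""
entries = list(split_entries(sample))
print(entries)
```

Note that the bold markup stays part of the page text; only the page name is read from it.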

If the first text between ''' and ''' is not the page name, you can use a workaround: put the page name between ''' and ''' inside a comment at the top of the entry, so that the bot finds it first:

{{-start-}}

'''Statistical data''' about Rome: ....
{{-stop-}}

In this case the bot creates the template Rome, which contains the statistical data.

Just add everything you want to see on the wikipage you want to create between start and stop.

Now one thing you are probably wondering about is how to do this for a huge number of cities or other data. Well: use mail merge in OpenOffice.org Writer or Microsoft Office. Create the layout for a typical page, insert the fields of your database, and simply let it merge. Then copy and paste the whole contents of the resulting file into a .txt file in a plain text editor and save it in UTF-8 encoding. You can also try to create the UTF-8 text file directly from Word or OpenOffice.org, but we noted that on some systems this creates problems. So just try it out.
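If you prefer a script to mail merge, the same merge can be sketched in a few lines of Python. This is only a rough example under my own assumptions: the column names name and description, the sample rows, and the helper names make_entry and build_upload_text are all made up for illustration, so adapt them to your own data.

```python
import csv
import io

def make_entry(row):
    """Wrap one data row in the start/stop markers the bot looks for."""
    # The column names "name" and "description" are hypothetical.
    text = "'''%s''' is %s." % (row["name"], row["description"])
    return "{{-start-}}\n" + text + "\n{{-stop-}}\n"

def build_upload_text(csv_text):
    """Turn CSV text with a header line into one upload file body."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return "".join(make_entry(row) for row in reader)

# Sample data; in practice this would be read from your own database export.
csv_data = (
    "name,description\n"
    "Rome,the capital of Italy\n"
    "Naples,a city in southern Italy\n"
)
upload_text = build_upload_text(csv_data)

# Save with UTF-8 encoding, as the bot expects.
with open("nap.txt", "w", encoding="utf-8") as handle:
    handle.write(upload_text)
```

The resulting nap.txt contains one start/stop block per row and can be uploaded like any hand-made file.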

Then copy the file into your pywikipediabot folder and run the bot on it.

To run the bot I use the following command for the file nap.txt:
pagefromfile.py -start:{{-start-}} -end:{{-stop-}} -file:nap.txt -utf

Of course, you must first log in using login.py.

I hope this helps those who want to know how to do things. If you have further questions: well, just ask :-) I'll answer asap.