Monday, April 16, 2007

Talking Neapolitan with someone who is not Neapolitan

The other day, after a year and a half of working together, I had my first conversation with Bèrto ëd Sèra, who you might know from the Piedmontese Wikipedia. Among other languages he also spoke Neapolitan with me, and he does this really well. He has his own accent, so one understands that it is a learnt language for him and not his mother tongue. Strangely enough, I had a problem myself. I speak Neapolitan most of the time with my husband, neighbours and people here in the city, and it is normal for me to simply do so. The strange thing is: hearing him talk with that accent, I had the same reaction people had when talking with me at the beginning, when I came here. I had a kind of mental block - even though I wanted to, no word in Neapolitan would come out of my mouth. I had to speak either Italian or English. I believe that is the reaction many people here have ... only when you are considered part of the place where you live do they talk "their language" with you.

This also shows me another thing: our less-resourced linguistic entities need help - a lot of help. They need to become languages like all the others that people learn, languages where people (me included) will not have that mental block that stops you from speaking them ...

Thursday, April 05, 2007

A Wikipedia that definitely needs help +++

These days some people (very few, really) are arguing against the policies that the Language Committee set for the creation of new Wikipedias.

There is one Wikipedia that really needs help and in some way I don't like to take it as an example, but it shows so clearly why working first in the Incubator and on Betawiki makes so much sense.

I am talking about the Wikipedia in Tarantino dialect. For one thing, the language code used for the project cannot be correct from an ISO 639 point of view, so sooner or later we should deal with that as well. The Wikipedia was created at the end of September 2006, and I was one of the first people to sign up, starting to create the Babel templates there. These templates indicate whether somebody knows a language and at which level (the level, by the way, is quite a subjective indication, apart from "native" and "professional" - these two are the only objective ones).

Back to that Wikipedia. Since its creation (let's take 1 October 2006 as the starting point) the UI has only been poorly localized. Part of it was localized from English to Italian and not to Tarantino. Quite a bunch of changes of that kind can be seen on 10 March 2007 - the following are just some of them:

# (diff) (hist) . . N MediaWiki:Mycontris‎; 14:15 . . (+17) . . Davide21 (Talk | contribs) (New page: I miei contributi)
# (diff) (hist) . . N MediaWiki:Cite article link‎; 14:14 . . (+16) . . Davide21 (Talk | contribs) (New page: Cita questa voce)
# (diff) (hist) . . N MediaWiki:Sitesupport‎; 14:12 . . (+9) . . Davide21 (Talk | contribs) (New page: Donazioni)
# (diff) (hist) . . N MediaWiki:Whatlinkshere‎; 14:12 . . (+11) . . Davide21 (Talk | contribs) (New page: Puntano qui)
# (diff) (hist) . . N MediaWiki:Upload‎; 14:11 . . (+14) . . Davide21 (Talk | contribs) (New page: Carica un file)

In any case I would like to thank Davide21, who is doing work on the roa-tara wikipedia, because by translating into Italian he at least makes sure that more people in Italy can understand it. But it does not help the Tarantino dialect - it is counterproductive, since that way it looks as if Tarantino were simply Italian.

The Wikipedia today has 40 articles, three admins and one bureaucrat. Some of these admins were elected during the last few days. Since the beginning not even 1000 edits have been made, and most of them did not create content. The stub template is at least in Tarantino. You are wondering how I can say that? Well: I can read it - apart from some words specific to that region, it is enough to know Neapolitan and Italian to understand what is written there. Many words correspond to Neapolitan written with Italian spelling.

On the page about the admin elections it is mentioned that User:Beren85 is the only person who really knows Tarantino. That is problematic, since he cannot do everything on his own. The others already help to clean up spam (and small wikipedias get loads of it ...), and that matters.

Now I had a look at the 40 pages, and this is what came out of it:

Most pages do not have contents.

Pages with contents in Tarantino:
  • http://roa-tara.wikipedia.org/wiki/Pozna%C5%84
  • http://roa-tara.wikipedia.org/wiki/Toru%C5%84
  • http://roa-tara.wikipedia.org/wiki/P%C3%A0ggina_Principale
  • http://roa-tara.wikipedia.org/wiki/%C3%80v%C3%AB_Marije
  • http://roa-tara.wikipedia.org/wiki/Kur%C3%B3w
  • http://roa-tara.wikipedia.org/wiki/%C5%81%C3%B3d%C5%BA

Most of the pages in Tarantino dialect, except the main page, have just one sentence, and that one is almost the same everywhere:
  • Kurów éte 'na cettà pulacche.
Substitute the city name with another one and you get the contents of the other pages.

The longest bit of text is on the Ave Maria page, that is the prayer itself.

Pages with contents in Italian:
  • http://roa-tara.wikipedia.org/wiki/Aiuto:Canale_IRC
  • http://roa-tara.wikipedia.org/wiki/Pagine_delle_prove/sandbox
  • http://roa-tara.wikipedia.org/wiki/Ultime_modifiche

Pages with contents in English:
  • http://roa-tara.wikipedia.org/wiki/Anemone
  • http://roa-tara.wikipedia.org/wiki/Homo_erectus_lantianensis
  • http://roa-tara.wikipedia.org/wiki/Homo_neanderthalensis
  • http://roa-tara.wikipedia.org/wiki/Nuphar
  • http://roa-tara.wikipedia.org/wiki/Portale:Ch%C3%ACmeche (really half English, half Spanish titles)

The rest of the 40 content pages have just a picture with the scientific Latin name under it.


This really means: they need help. Under the new policies this wikipedia would still be on the Incubator, and that would probably be the best place for it, since even a single person can work there step by step and has the time to attract other community members who can help him.

What I am wondering about is: where have all the Tarantino-speaking people gone who supported the project? See: this would have been noticed on the Incubator - now the wikipedia is left on its own and has to struggle with all sorts of problems.

Now to those of you who are against the new policies: this wikipedia was created according to the old way of doing things - you support the old way ... so what I expect from you now is that you help this project. Don't tell me "we are just volunteers": you volunteer to criticise the way things are done, so you also need to stand by your word and show us that your way of doing things works better than ours. We, the Language Committee, are also all volunteers in several projects, and we too have a private life, a family and work to do.

We know that working in the Language Committee, caring about language codes (co-operating with ISO-639-6 in another project), considering how to make sure a project can succeed etc. will get us loads of criticism - and we know we are only at the beginning.

Once we have got the Wikipedia creation stuff right (and that has taken hours and days of work) we will need to care about the other projects like Wiktionary, Wikibooks, Wikisource etc., and we will need loads of input from them - this means: we need to understand where the problems are and then try to find ways to avoid them.

Thanks for taking the time to read all this.

Localising Mediawiki Software

Now this is an answer to a question that came up on the Mediawiki i18n list - how to localize Mediawiki. The best way to do it is to do it in one place. For now there is Betawiki that helps, one day we will hopefully have that feature integrated in Incubator. Nevertheless it takes quite a lot of time when you need to translate the messages one by one: open a page, write some word, save.

The time needed shrinks a lot when you work with the help of a CAT tool like OmegaT.

For now working with OmegaT is possible, but again it needs Nikerabbit's help: he has to extract the messages, then only one person can work on them at a time, and then the messages can be uploaded again with Nikerabbit's help. It is not the right way at this time, since he already has to deal with many things on Betawiki, so no, I don't want him to work more than necessary.

There is a feature on the way called WikiRead-WikiWrite (for OmegaT). This feature should already have been there, but due to health problems of the programmer it is not. Now we have asked another programmer if he has time to do it, since otherwise we would lose the funding and also the tool. It is a situation we would never have wanted to happen, but that's life - there is always that unpredictable part in it.

WikiRead-WikiWrite for OmegaT is supposed to enable the CAT tool to access a wiki page and load it into OmegaT, so that by translating you create a translation memory. This translation memory is relevant for translating similar texts and for terminology research. We need consistency of terminology within Mediawiki at some stage - so: yes, it makes sense to use it.
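To make the idea of a translation memory a bit more concrete, here is a minimal sketch in Python. It is only an illustration and not how OmegaT stores its memories: the class `TranslationMemory`, its methods and the English source strings are assumptions of mine, while the Italian targets are taken from the recent-changes list above. It keeps source/target pairs and, when there is no exact match, proposes the closest stored segment.

```python
import difflib


class TranslationMemory:
    """Toy translation memory (hypothetical, not OmegaT's format)."""

    def __init__(self):
        self.segments = {}  # source segment -> translated segment

    def add(self, source, target):
        self.segments[source] = target

    def lookup(self, source, cutoff=0.6):
        """Return (matched_source, target) for the best match, or None."""
        if source in self.segments:
            return source, self.segments[source]
        # Fall back to the most similar stored segment (fuzzy match).
        close = difflib.get_close_matches(
            source, list(self.segments), n=1, cutoff=cutoff
        )
        if close:
            return close[0], self.segments[close[0]]
        return None


tm = TranslationMemory()
# English sources are assumed; the targets appear in the list above.
tm.add("Upload file", "Carica un file")
tm.add("My contributions", "I miei contributi")

exact = tm.lookup("Upload file")
fuzzy = tm.lookup("Upload files")   # similar text re-uses the stored pair
missing = tm.lookup("zzz")          # nothing similar stored
```

The fuzzy lookup is the part that helps with "similar texts": a translator gets a suggestion to adapt instead of starting from nothing, which is also what keeps terminology consistent.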

Once this is possible, OmegaT can read all the pages to be translated at once, and one translates offline and stores the result to the target. Translating offline, without having to open and save the pages one by one, takes only a third of the time it currently takes online. Mind you: if you need to translate only one or two messages it makes sense to do it directly on Betawiki; if you have to deal with a whole series of messages then OmegaT will help you do a better job.

In a second stage we do imagine a wikidata application where the translation memories are stored. It is not so different from what we have now on OmegaWiki. What must be different is:

  • no limitation on the length of the entry on syntrans level
  • the definition field becomes a notes field and may be empty
  • license information must be properly stored according to the collection
  • possibility to store and retrieve translation memories

This means that any open source software can have its repository there and it also means that many strings you find in multiple open source applications can be re-used quite easily requiring less work in localization.
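As a rough sketch of what such a shared repository could look like, here is a small Python model. Everything in it is hypothetical: the names `Entry`, `Collection`, `Repository`, the example application "OtherApp" and the license strings are placeholders, and this is not the OmegaWiki or wikidata schema. It only mirrors the points listed above: entries without a length limit, a notes field that may be empty, and license information kept per collection.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    target: str      # translated string, no length limit imposed
    note: str = ""   # notes field; may stay empty


@dataclass
class Collection:
    name: str        # e.g. the application the strings belong to
    license: str     # license information stored per collection
    entries: dict = field(default_factory=dict)  # source string -> Entry


class Repository:
    """Hypothetical shared store of localisation strings."""

    def __init__(self):
        self.collections = {}

    def store(self, collection, license, source, target, note=""):
        coll = self.collections.setdefault(
            collection, Collection(collection, license)
        )
        coll.entries[source] = Entry(target, note)

    def find(self, source):
        """Return (collection, license, target) wherever the string occurs."""
        return [
            (c.name, c.license, c.entries[source].target)
            for c in self.collections.values()
            if source in c.entries
        ]


repo = Repository()
# Collection names and licenses below are placeholders for illustration.
repo.store("MediaWiki", "GPL", "Upload file", "Carica un file")
repo.store("OtherApp", "MIT", "Upload file", "Carica un file", note="menu item")
hits = repo.find("Upload file")
```

Looking up a string across all collections is the re-use step: a localiser of one application sees that the same string already has a translation elsewhere, together with the license under which it may be taken over.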

Of course having OmegaT with a direct connection to download the translation memory from there would be even better - but that needs additional programming within OmegaT. I don't have a clue how much work this is ... well, whoever knows how to code will be able to tell us, I suppose.

I know we are quite a step away from that scenario even if WikiRead-WikiWrite hopefully will be there soon ...

Well, if you want to give me a special birthday present and make a huge improvement to localization efforts: consider working on that bit.

I will probably need to re-edit this post, since I am not sure I really considered everything (I suppose not ... it's a period that often requires me to interrupt what I am doing).