Saturday, December 08, 2007

When Neapolitans become Indians and Pulcinella dances his raindance ...

You are wondering about that title, right? So let me tell you the story. In our group for the Neapolitan language we have people from many places in the world. Not all of them speak Italian, most speak English, many speak or are learning Neapolitan ... what a mixture, right? Well ... discussions are always a bit particular, since we have to use the language that most of us can understand and write - otherwise, would it make sense to discuss among just a handful of people? Uhmmm ... Wikipedians who care about NPOV will understand that the more people are involved, the better it is for any kind of project and discussion.

There was this member of the group ... he complained about us writing in English, talked about the birth (?!?) of a new language called Naplish (never heard of it ... could it be some indigenous Neapolitan dialect???) and said that there are only two natives in the group????? Wait a moment ... besides the two we all know who live in Naples, he lives there too ... ehmmm ... I'd say that makes at least three, right? And what about the many of us who live in Neapolitan-speaking regions??????????

And that's not all ... first we are told that we should not write in English, but then the one who writes in English is actually the one complaining ... uhmmm ... does he perhaps believe we don't understand him when he writes in his very own version of Neapolitan??? Yes, he has his very own version that does not follow the actual grammar and spelling rules ... but since we know how the language is spoken ... well: we can understand what he writes when we apply Italian spelling rules to Neapolitan pronunciation (what a mess ... right? ... well, that's Neapolitan ...).

He talks about two major difficulties ... let me quote from that mail:

"On one side the complexity of a language counting infinite variations on a
rather vast territory. Very often one may find deep lexical, phonetic and
syntactic discrepancies at very short distances.
On the other side our intellectuals and institutions live isolated in their
ivory towers, out of touch with the 'indians confined in their cultural reserve' where the language is still actively used."

Wow ... are there languages that are not complex? Without variations??? As far as I know, only dead languages are without variations (but they are still complex) ... so what? Or am I completely ... ehmmm ... no ... ?!? I mean, if you take someone speaking Italian who lives in one region and someone who lives in another region ... they both speak with variations, even grammatical variations ... even different words for the same thing, but it is all considered Italian ... so why should this be different for Neapolitan, which has much older roots and therefore more influences from outside?

"Our intellectuals" ... uhm ... who are these? I mean, in his opinion the few people who write Neapolitan are intellectuals? So I would be one of them? Gosh guys ... I'm an intellectual and I did not even know it ... that is hilarious ... And then: they live isolated in ivory towers? Wow ... I'd like one ... that would perhaps solve our space problems ... and again: we would then be out of touch with the Indians (that is, native Neapolitans) who live in cultural reserves (cities, small towns etc.?) where the language is actively used?!?

Wow ... I am married to an Indian then ... and I live in the middle of a reserve ... hardly anybody here speaks Italian (well, they know how to speak it, but not with people who live here - Italian is for strangers, and even though I am German I am not considered a stranger anymore ... they speak Neapolitan with me ...), or did I then become an Indian myself? (Indians, the real ones, please don't be upset about me using this - you have a great culture, please be proud of it and help your language and culture survive!)

Uhmmm, and Pulcinella? Imagine him dancing a rain dance in his Neapolitan reserve ... well, hopefully rain will come and wash some negative thoughts away ... and Pulcinella will make people laugh like he has done for centuries ... I have to write to a theatre company ... that's really something for them.

Quoting another piece:
"To aggravate the situation, these days you may find media and techniques which may tend to transform a language into an industrial product. The best
translation software will never be able to translate the nostalgic despair of
Santa Chiara or the magic wits of A rumba ré scugnizze."

First of all don't read the last sentence - there are 4 errors in 4 words ... if you want to know why, subscribe to the Napulitano newsgroup and read the explanations.

Then again: what I don't understand is that people seem to believe that machine translation is used and then the text is left as it is ... that would be plain stupid. Well, they probably haven't done their homework on how professionals work and believe translators are just better typists using Babel Fish ... Instead, machine translation is a help so that you don't have to type in all that stuff - and machine translation is not suitable for every kind of text, certainly not for poems and songs ... oh ... I forgot that our dear writer believes that Neapolitan is used only in literature and music ... but then again, when you go into a computer shop here in the city and ask for what you want using the proper word and pronunciation, they often don't understand you: you need to know the Neapolitan one :-) (which very often is based on the English spelling with Neapolitan pronunciation ... a bit like anywhere).

Next quote:
"Considering this situation, we should publish easy, basic texts with the support of available technology, thus showing the fluency of an idiom still in use and evolution."

Wow ... he is a member of the group where I posted many bits and pieces from nap.wikipedia, linking to it ... and I sent in all the short and easy articles we published on Positanonews with "easy reading" links to OmegaWiki, where people can find each word explained and translated into other languages ... or are my mails invisible???? (Please note that due to lack of time the last article was not tagged that way.) Well over 4800 reads of one article show me that my mails seem to be somewhat visible ... or maybe there is some kind of magic wand for certain people that does not let certain information through and filters it out??? Or is it Pulcinella playing one of his tricks ... you see, Pulcinella loves it when people struggle and fight (he is a bit naughty sometimes and helps things go in a certain funny way ;-)

Next quote (with some omissions):
"As a little start, let me follow the example of ... (anonymizing dots), adapted to my native expression:"

And then the Divine Comedy by Dante (!!!!!!!), which is really one of the easiest texts of Italian (or rather, not really Italian) literature:

Nel mezzo del cammin di nostra vita
mi ritrovai per una selva oscura
ché la diritta via era smarrita. ....

Propie a mità ra vita mia
Ie me truvaie mmieze a na buscaglia scura,
roppe ch'eva perze a via maesta. ...."
(some attention to ortography is needed, sorry)

Half-way through my life
I found myself in a dark forest
after missing the right way. ....."

I mean, calling a text by Dante (born in 1265, died in 1321) a simple text ... well ... I suppose Dante would be really upset ... I hope he will not appear in somebody's dreams to tease and prick him ... that could hurt ... well, he could use Pulcinella for that ... really ... and knowing how Pulcinella normally behaves :-D ...

Well, let's say I prefer my Indians and Pulcinella dancing his rain dance to using Dante to show how simple a language is ...

Friday, December 07, 2007

Translating Wikipedia articles (2)

As I said yesterday, I am coming back to this topic today.

Apertium is already used in some projects, one of which is the Occitan Wikipedia. For those who are not familiar with wikis: there you can compare the version that has not been proofread with the proofread one, and that is something you will see by clicking here.

What you see on the left-hand side is the text as it was after the machine translation, and on the right-hand side the proofread version of the text. The changes are highlighted in green on the left and in blue on the right. Some parts of the text were not changed at all.
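The wiki diff is meant to be read by eye. To put a rough number on how much the proofreader had to change, one could compare the two versions word by word. What follows is a minimal sketch in Python using the standard library's difflib, assuming both versions are available as plain text; it is not how MediaWiki computes its diff, and the sample sentences are invented, not taken from the Occitan Wikipedia.

```python
from difflib import SequenceMatcher


def post_edit_changes(mt_output, post_edited):
    """List (operation, machine-translated words, proofread words) for every changed span."""
    mt_words = mt_output.split()
    pe_words = post_edited.split()
    matcher = SequenceMatcher(a=mt_words, b=pe_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # "replace", "delete" or "insert"
            changes.append((tag, " ".join(mt_words[i1:i2]), " ".join(pe_words[j1:j2])))
    return changes


def unchanged_share(mt_output, post_edited):
    """Fraction of machine-translated words that the proofreader kept as they were."""
    mt_words = mt_output.split()
    if not mt_words:
        return 1.0
    matcher = SequenceMatcher(a=mt_words, b=post_edited.split(), autojunk=False)
    kept = sum(block.size for block in matcher.get_matching_blocks())
    return kept / len(mt_words)


# Invented example: one wrong verb form fixed during proofreading.
mt = "the cat sit on the mat"
pe = "the cat sits on the mat"
print(post_edit_changes(mt, pe))  # [('replace', 'sit', 'sits')]
print(unchanged_share(mt, pe))    # 5 of 6 words kept, about 0.83
```

Splitting on whitespace is a simplification: punctuation stays attached to the words, so a moved comma counts as a changed word.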

The work on the glossary and the grammar rules (well, I am not using the specific terminology here, to keep things understandable for everyone) has been going on for approximately one year now.

At a certain stage the problems arise more from missing vocabulary than from the rules. Of course these translations will probably never be 100% perfect, but their quality depends very much on us adding terminology and classifying it.

If you compare the result above with what you would see for Spanish-Catalan, the latter, which has been under development for years, is much better.

You can find further reading about co-operation between Wikipedia and Apertium on the Apertium Wiki.

The language pairs currently available are:

  • Spanish←→Catalan
  • Spanish←→Galician
  • Spanish←→Portuguese (pt and pt_BR)
  • Catalan←→English
  • Catalan←→French
  • Catalan←→Occitan (oc and oc@aran)
  • Romanian→Spanish
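The list above mixes pairs that work in both directions (←→) with one that works in one direction only (→). As a small illustration of that difference, here is a Python sketch that expands the list into individual translation directions. The two-letter ISO 639-1 codes are an assumption made here for readability, the variants pt_BR and oc@aran are only noted in comments, and the list reflects this post as of December 2007, not whatever Apertium offers later.

```python
# Language pairs as listed in this post (December 2007).
# "<->" = both directions, "->" = one direction only.
# Two-letter ISO 639-1 codes are used here for illustration only.
PAIRS = [
    ("es", "ca", "<->"),
    ("es", "gl", "<->"),
    ("es", "pt", "<->"),  # the post lists pt and pt_BR
    ("ca", "en", "<->"),
    ("ca", "fr", "<->"),
    ("ca", "oc", "<->"),  # the post lists oc and oc@aran
    ("ro", "es", "->"),
]


def expand_directions(pairs):
    """Turn the pair list into a set of (source, target) directions."""
    directions = set()
    for source, target, arrow in pairs:
        directions.add((source, target))
        if arrow == "<->":
            directions.add((target, source))
    return directions


SUPPORTED = expand_directions(PAIRS)


def is_supported(source, target):
    return (source, target) in SUPPORTED


print(is_supported("ro", "es"))  # True
print(is_supported("es", "ro"))  # False: Romanian→Spanish only
print(len(SUPPORTED))            # 13 directions from 7 pairs
```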

Many other language pairs are under development. Of course, you may start on any language combination that is comfortable for you. Please keep in mind: the more similar two languages are, the easier it is to write the rules and the sooner the translation engine will produce good translations.

If you want to start working on wordlists, please write to me at: s.cretella (at) and tell me which language pair you are interested in. You can also reach me on Skype at: sabinecretella

I will upload a wordlist to Google Docs and give you access. Please let me know if you have difficulties working online (that is, if you work with a dial-up connection).

The Apertium Chat is on Freenode.

One more thing: I just received criticism claiming that machine translation would flatten the language. Well, any translated text, in particular a literary translation, is post-edited by a second person. A translation is never published directly, since during translation - and you can be the best translator in the world - there are always some bits and pieces that sound a little strange or that do not really carry the scene over into the other culture. And please allow me to introduce here the concept of cultural localization, which was coined by Dr. Martin Benjamin, a member of the advisory board of Vox Humanitatis, and which will be explained in a future post. The concept of cultural localization then immediately became part of the association's scope.

And since I am adding notes here: please remember that the Fundraiser of the Wikimedia Foundation is still running and that you can help by donating and telling others that the fundraiser is on. For more information and to donate please click here.

Thursday, December 06, 2007

Translating Wikipedia articles ...

... into less-resourced languages. Well, the time has come to start thinking about how to create content faster for the many small Wikipedias. As you all know, often we have just a handful of people creating, translating and then adapting articles. Well ... by combining various Open Source and Open Content projects we can now take a further step towards fast content creation, but that does not mean: stub upload. This is a completely different way of doing things.

Apertium is a machine translation tool that works really well with similar languages. About a year ago I had a text translated from Spanish to Catalan by Apertium through the online interface and asked some people from the Catalan Wikipedia to have a look at it. They told me that of course it was not perfect, but that it would be easy to proofread and much faster than actually translating it. In March I made a similar test during a master's course in translation studies in Pisa. I asked one of the students, who was bilingual in Spanish and Catalan, to have a look at the machine translation of a general text. The grammar was almost perfect, and so was the terminology. There were just 5 corrections in a little more than half an A4 page.

Now what does this mean for us? If we have a bilingual wordlist for two similar languages under a free license, we can pass it on to the Apertium people. From there we are a step closer to getting machine translation for that specific language combination under way.

A note in between for the Apertium people who might read this: please don't mind me not using the specific terminology to describe what needs to be done. It could become too techy.

So the next step is to identify what kind of word a term is and how it needs to be handled. For example, a verb needs to be declared as such, and then it needs a tag that indicates which conjugation scheme applies. This needs doing for all word types, that is verbs, nouns, adjectives etc. After that, grammar rules need to be considered. Step by step the level of correctness will improve, and the time invested now in completing the wordlists (which will be available as Google Docs spreadsheets) and in adding all the additional information will save a lot of time later. That is: right now it will take longer, but once the engine has "learnt" how to deal with the terminology and grammar of a specific language combination, creating content will become much faster. This will help the small projects, since the few editors can concentrate on proofreading and adapting, and it will result in faster content growth of quite high quality.
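The tagging step described above can be pictured as a check on each row of the shared spreadsheet. Below is a minimal Python sketch, assuming the sheet is exported as CSV with the hypothetical columns source, target, pos and paradigm; the tag values ("noun", "verb") and the rule that only verbs need a paradigm are illustrative simplifications, not Apertium's tag set. Apertium itself keeps this information in its own XML dictionary files, and converting a spreadsheet to that format is a separate step not shown here.

```python
import csv
import io

# Hypothetical layout of a shared wordlist exported as CSV.
REQUIRED_COLUMNS = ("source", "target", "pos")
# Word types that must name an inflection scheme; here only verbs,
# which need to say which conjugation scheme applies.
NEEDS_PARADIGM = {"verb"}


def check_wordlist(csv_text):
    """Return (line number, problem) pairs for incomplete rows."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Line 1 is the header; this numbering assumes no multi-line cells.
    for line_no, row in enumerate(reader, start=2):
        for column in REQUIRED_COLUMNS:
            if not (row.get(column) or "").strip():
                problems.append((line_no, "missing " + column))
        pos = (row.get("pos") or "").strip()
        if pos in NEEDS_PARADIGM and not (row.get("paradigm") or "").strip():
            problems.append((line_no, "verb without conjugation paradigm"))
    return problems


# Spanish-Catalan sample rows; the last two are deliberately incomplete.
sample = """source,target,pos,paradigm
casa,casa,noun,
cantar,cantar,verb,
perro,gos,,
"""
for line_no, problem in check_wordlist(sample):
    print(line_no, problem)
# 3 verb without conjugation paradigm
# 4 missing pos
```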

This project for less-resourced languages will be one of the first led by Vox Humanitatis. Should you be interested in helping with the wordlists, please let us know which language combination you would like to work on (for now starting from English, and step by step from other languages, since most of the terminology is already available in English). We will give you access to the online document. If you need to work offline, please let us know. You can contact me by e-mail: s.cretella (at)

I just received a list of the supported language combinations, as well as an example for Catalan-Occitan and some notes on evaluating machine translation in co-operation with a Wikipedia community. This means I have quite a lot more to tell you. I'll post that info tomorrow, otherwise this post would become too long.

Please also note that the documents will be released under a CC-BY license, so they can be integrated into any Wiktionary.

Naples' airport ... a very particular advertisement

When I was waiting to board my flight to Barcelona on Sunday, 2nd December, not having a book with me, I took some photos around, and one of them is special: the Italian mineral water producer Ferrarelle is running an advertising campaign where each line of the ad is written in a different language, in this case using "e 'o tiempo". It was the first time I saw Neapolitan put on the same level as all the other European languages.

Thank you Ferrarelle!

(And yes, I don't mind giving a commercial company relevance if they do something like that).

Wednesday, December 05, 2007

Local languages applied - Catalan

During the last three days I was in Barcelona at the European Forum on Science Journalism, but more about that in the following days. Now I want to talk about a language that has made its comeback into everyday life and is doing really well.

I was in Barcelona quite a long time ago, just to change planes on my way to Malaga. Back then, as I remember, the signs at the airport were in Spanish and English. Today, when you come out of the airport, you see them in Catalan, English and Spanish. Now you will say: so what's so special, right? Well, I already knew that Catalan is now "official", but knowing it is one thing and experiencing it is another. Imagine the Naples airport with signs in Neapolitan, English and Italian, or the Turin airport with Piedmontese, English and Italian ... it gives a very particular feeling to see that. In Catalonia people are very proud of their language and culture. When you talk to them they will tell you that it is important to use it for everything: at home, in business and of course at school or university.

I was not sure what to do, go by bus to the city centre or take a taxi, but considering that Sunday around noon was probably the only time I could have a short walk in the centre, I chose the bus. It turned out to be the right decision ...

When I reached the centre I saw something that looked a bit like a market and of course I had to go there and see what was on. It was an exhibition of goods with labels in Catalan, and there was an information stand. So I looked around a bit to see what they had. I started to talk with some young people who could not really understand English, so they called a man who spoke French. Well, my French is far from perfect, but we managed to talk, and so I found out he was the husband of the president of ADEC (Associació en Defensa de l'Etiquetatge en Català) and that they are actively promoting Catalan and, with it, typical Catalan products.

They had publicity material and I took some of it with me. I also got the association's contact details - that is: I will need to write to them, and I should do that in Catalan - so if there is someone out there who can help, it would be great.

Besides all the other particular things, and also the similarities I found between the Amalfi Coast's traditions and typical things in Barcelona, one thing immediately came to mind: if this is Catalonia, and it is so distinct in how things are done, in architecture etc., what might the other regions of Spain be like? I mean: I had that feeling of wanting to know more.

Now what does that tell us? I was looking at Barcelona with the eyes of a tourist, and seeing all these particularities with all their particular names brought me to the conclusion that besides Catalonia there are many other regions of Spain to be explored. This was local languages applied - unification through diversification. By underlining its differences, Catalonia helps the other regions of Spain to become more relevant as well. Tourists will come back over and over again, because they will feel that there is really a lot to discover.

Now imagine how that would be when a tourist comes to Italy. Having different languages that distinguish local products from national ones will help tourists discover Italy - if the same name is written on every product, you need to read further to understand where things come from, and the local product and its particularities lose much of their marketing force. Instead of just buying one bottle of wine from Italy to take back home, they will buy various bottles of wine: from Sicily, Tuscany, Campania, Puglia, Veneto, Piedmont (just to name some). That is: it would increase demand, and therefore the economy would have better prospects.

In Catalonia they went so far as to have Catalan in universities too, used for research, and in schools. Kids grow up with Catalan and Spanish on the same level and immediately also start learning English. This means these kids will be used to thinking in various ways and will be able to connect easily to other cultures. They will be better communicators. And this in turn will lead to economic advantages for the region.

Of course we cannot apply everything at once, but step by step reconsidering what local culture really is about, we can follow this really great example Catalonia is giving us.

The same is valid for the whole world - for all regions, all people, any culture.

And yes: sooner or later I'll be back in Barcelona and discover it more deeply. During the next days I'll (time permitting) tell you more about the Museum, the conference, people I met ...