Thursday, November 23, 2006

Open Content :-)

Well, some days ago Gerard told me that somebody on IRC said to him that WiktionaryZ, since it is not a Wikimedia Foundation project, is not Open Content ... well that is completely wrong.

The fact is that we would very much like to see WiktionaryZ in the Wikimedia Foundation. The fact is also that we were told by the Foundation that they do not have the technical means to fulfil the requirements for WiktionaryZ (among others 24/7 uptime and adequate backup procedures). Hosting WiktionaryZ will cost quite a lot in terms of financing ... well: therefore we need partners. But we do not only have financial partners - we also have many partners who contribute content.

Our license is the same as the one the Wikimedia Foundation uses, the GFDL, and in addition the CC-BY license, to allow for easier sharing of data in the form of dumps - be they bilingual dictionaries, multilingual dictionaries on specific themes or the whole contents.

So: there are two very clear points - or better, three. Yes, we would like to be within the Wikimedia Foundation if that were technically possible. Unfortunately the technical possibilities to be hosted on the Wikimedia Foundation's servers are lacking, and therefore: yes, we want to co-operate closely. And we are indeed Open Content, since we use adequate licensing. Whoever says the contrary is simply wrong.

Thank you!

Wednesday, November 01, 2006

OLPC ... who would buy the laptop at which price?

Gerard just passed me this link: http://hardware.slashdot.org/article.pl?sid=06/11/01/0239245 and yes, that post interests me a lot, because we are working on a multilingual dictionary for the OLPC laptop.

So someone promised to pay USD 300 if 100,000 other people promised to do the same. Well, that was a very difficult goal, because without the right surroundings, without really being able to use that laptop, whoever promised this probably did so out of an ideological mindset.

Considering my situation (having two kids of 4 1/2 years who already play and surf with mom's old PC) and that of many other people - and considering the prices you pay for an e-book reader of a certain kind - I am sure that a high turnover, wide diffusion and therefore refinancing of the project can be achieved quite well.

Consider kids in the States and in Europe whose parents have a PC and WiFi. For these kids having such a laptop would be an advantage. Consider schools that have computers on which kids can only work about 2 hours a week ... what if, instead of buying many single PCs, they bought a server that keeps data (e-books, dictionaries, exercises etc.) available, so that kids can load it from there, read, play games, communicate through e-mail and instant messaging with their friends, teachers and parents, do their exercises (and if needed also additional ones) etc.? That would make sense. Why?

Well, kids nowadays read less, and reading on their laptop could be perceived as fun. Books could become somewhat interactive .... instead of having to buy all the classical books, kids can download them from the school's server or the internet ... and so at a certain stage the laptop would pay for itself. Please consider also that such a laptop can break and that, if it is used at school, it could then be necessary to repair it or buy a new one.

Therefore it is all about marketing and having people adopt the system for everyday use. That means: if you ask for a price of approx. 100 EUR, which corresponds to 120-130 USD, you will find a much wider potential customer group than at 300 USD, which is definitely too much, since again the poorer families would be excluded - and here I am talking about Europe and the United States and not the really "poor" countries. Many families today cannot afford to pay for an adequate education for their kids - getting one OLPC laptop and having servers in schools would help, even here in Europe and in the USA, to make sure that kids get a real chance in life.

Of course, to pay for 1 laptop for a poor country you need to sell 5 laptops ... but: consider all the classes in the whole world.

Distribution can go through the schools, but I would not be surprised if even supermarket chains such as Lidl and Aldi in Germany turned out to be a good option - maybe adding an external USB CD reader/writer (sold separately) - so kids could have some kind of storage device. This year we are too late - but consider Christmas next year: the parents buy the laptop, the grandparents some USB device, and aunts etc. CDs and similar things.

I excluded some more possibilities, because they will not be possible in all cities ... but with a little imagination for sure you will understand how much potential is behind this project.

All this and much more is possible - it is "only" a matter of evangelizing the project properly. Be sure I will do my part - please also do yours.

How can you do your part? Inform yourself about the project. If you are a developer: think about fun software for kids that helps them to learn (please always keep in mind that any software needs to be localised). You are a writer of kids' stories? Well: write them :-). You are a designer? Help to create some characters which can be used for telling stories, everyday stories, so that culture can be exchanged among kids. You know more than one language? Then you can join the OLPC Children's dictionary project or consider being part of the localisation teams that will grow step by step in future.

You have plenty of money? Of course the OLPC project would be glad about donations - oh, you don't want to just donate it and prefer to contribute it to a specific project? Well, contact people - also me: I can get you in touch with the right ones :-)

And of course: if you have ideas: tell us.

Sunday, October 15, 2006

Priorities ...

There are moments when one has to choose among several priorities. Now this is one of these moments ... I had to decide whether to let others decide how I spend my time or whether to decide it myself ... well, whoever knows me also knows that it is definitely me who decides on my own time.

Some weeks ago the usual trolls came along to the nap.wikipedia - yes, I call them trolls, because they do not bring facts, but only complain. They complain about the statement that people in Abruzzo (and please be aware that I do not say the whole of Abruzzo) speak Neapolitan (or better, a variety of it). They claim that Abruzzese is a dialect of Italian, quoting Ethnologue ... well, some will say: it could be ... but no, it cannot be, because if Abruzzese were a dialect of Italian it would have developed after Italian had developed. This means it would have developed after the unification of Italy in the second half of the XIX century. Well yes, Ethnologue says this, but Ethnologue did not make a study of Neapolitan like Rohlfs did. It is commonly accepted that Neapolitan is not only the language spoken in Naples: according to linguistic research, and also to ISO, Neapolitan (code: nap) is spoken in many variations in quite a huge area. The fact that parts of Abruzzo are part of that macro language is not negotiable, even if I can understand that people who are not linguists and care just about regional issues have a problem with that.

The two or three people were offered our co-operation to get their own language code if they provided the necessary prerequisites. Instead they go on discussing, even changing Wikipedia entries to read the way they want (this can be seen by comparing the IPs and the edit times on the various wikis). This is problematic in itself, since without citing a source they are doing original research or, even worse, vandalism. They do not provide any proof other than Ethnologue, but that one cannot be correct, because I believe (well, know) that the Abruzzese language developed well before the unification of Italy and therefore cannot be a dialect of Italian.

Now the last weeks went by in useless discussions - arriving at a point where people insult individual contributors of nap.wikipedia directly. And no, we cannot answer them: they are anonymous. What I think about people who are not man enough to sign with their own name should be clear enough. What are they worrying about? If they are so sure about what they are saying and they can prove it by providing the prerequisites to be called a language of their own and not only a variety of the nap macro language, why don't they show up and tell us who they are? What would happen to them? I for my part always sign what I write with my name, and you will always know what I think even if sometimes it might not be so nice.

Well, at this stage the time for a decision has come: the language commission is dealing with the matter and it will take some time to come to a conclusion - no, I am not going to express myself there. They will get a copy of this post. This means that for as long as it takes to reach a decision - either on allowing Wikipedia to include original research, or on the subcom offering these people, as already offered more than once, the possibility to provide the prerequisites for a language code of their own and to co-operate - I will concentrate on projects that definitely need my attention. I am not going to let myself be ruled by some unknown people telling me what to do and what not to do.

I am sure they will have their chance to really prove they have their own language - but again: it does not depend on nap.wikipedia nor on the language subcom to prove that - it depends on them and on how serious they really are about what they want - it means they will have to work for their language and do good for it, and I really hope that, should they be right, they will do it ... otherwise another language would die out sooner or later (if it is considered to be a language of its own).

What does this mean? I am going to work on the OLPC Children's Dictionary on WiktionaryZ and on some other projects.

Also words & more needs some more attention - it will soon be part of a Stichting (a Dutch foundation). Well yes, it will not be a separate Italian foundation (ONLUS) - it is going to be integrated in an organisation that among other activities will care about translations.

Well, you who are reading this, if you want to co-operate on the OLPC Children's Dictionary project helping us to edit and improve 10 entries by adding translations of definitions + translations of words or by proofreading them and only adding them if they are missing, please write a short mail to kids@wiktionaryz.org stating into which language you would like to translate.

Thank you so much for your help and patience - and see you soon :-)

Tuesday, October 10, 2006

Why should we create many stubs on small wikipedias?

This is a crossposting - the following text originally comes from an e-mail to Afrophonewikis, a Yahoo group where you can talk and learn about the wikipedias in various African languages.

Let's consider: which stubs are those that invite most people to contribute? Well ... that will be different from one wikipedia to the other, but talking about wikipedias that need a low hurdle in order to allow people to participate, we should consider the following things:
*generally people know something about their home town and neighbouring towns - so stubs on these towns + the capitals + countries can invite people to add information - if not: they would just be there waiting for someone to take up this project
*the almanac: people know the birthdays + dates when people died - presidents, authors, painters etc. - that is why you want to have the calendar pages immediately. Then take people there and have them add the dates .... this is where they have to write small texts, create a wiki link and automatically learn how to work on a wiki
*sport: in many countries of the world football (soccer) is one of the favourite sports ... stubs about the players and teams will invite people to contribute

What we, from the small wikipedias, should try to understand: our first steps are not about immediately getting great articles - they are about getting people to start doing things - about teaching people how to work on a wiki. The calendar pages, for example, were the first articles I uploaded to nap wikipedia - almost immediately - and not only to educate people: one year ago I had more problems with Neapolitan, so writing short entries was one way for me to learn. Today there are hardly any corrections to my entries - so it also serves to teach people the language.

Don't take en.wikipedia and de.wikipedia as examples - they don't have many of the difficulties we have to cope with ... think about what people can easily do and give them the possibility to do it - create projects like: add the dates of all presidents - after that create another project - tell people about it in a newsletter, not only on a mailing list. Create a first page that changes often so that people want to look at it - and send the first page out to people. Print the first page and put it on notice boards etc.

People will only start to contribute if they find you ... if they don't know where you are and what you are doing, how can they help? Go outside the Wikipedia crowd, connect to people you would not even expect to read Wikipedia. Pass them info - if there are questions about a theme somewhere: answer the question on wikipedia and send people the link ... there are so many ways of doing things. Create a page of "please help to translate these articles" and pass it on to universities and people who could help ... even by working offline and sending the translations to you. Then attribute them and give them a copy of what they translated as pdf.

That is why I disagree with "we first need to have good articles and a community to create stubs". Why? There were two of us working when I did this on nap - yes, I was criticised by some wikipedians, but that doesn't matter: the nap.wikipedia is now read on an almost daily basis by many people in the world. You want people to help and work: make it as easy as possible for them - don't expect them to start articles themselves - for newbies that is a very high hurdle to take.

Even in our very computerised world what is mostly needed is human interaction. People want to be taken by the hand ... maybe they need it even more than many years ago - and only we, the ones who know the projects, can do that - and yes: it takes time.

Sunday, October 08, 2006

Encyclopaedias ... how long do their entries need to be???

This is one of these questions. When it comes to wikipedia many people immediately think about long articles about each topic. Considering the multitude of encyclopaedias you can find in a book shop one thing is obvious: entries can range from one sentence to full pages.

Now people often complain that the wikipedias in regional languages have many stubs and just a few long articles - well: I would compare these to the kind of encyclopaedias that give you basic information on many topics in just one or two volumes. I would not compare them to the big ones like Britannica, Brockhaus etc.

Of course over time you will find single articles that get longer, while others will remain short for a long time, but ... well ... does it really matter? I'd say no - because in my opinion Wikipedia is not in competition with anyone - people put us into a competitive position, but we are not in one, because the way each language version is built and works is so unique.

The fact is: one can only write about what he/she knows, or translate ... well, being a professional translator I really avoid translating in my free time as well if I don't have to. Not having much time to contribute, I normally take care of the almanac + some stubs related to the almanac ... much more is not possible (see the main page of the Neapolitan wikipedia for the historical events I normally work on). Another thing I of course care about is information on the city I live in, and that in various languages.

Every one of us editors has something particular they care about ... and I believe that is great - and it is exactly what wikipedia is about: collecting the knowledge of everybody and putting it into one huge encyclopaedia.

Monday, September 04, 2006

Internet Explorer ... experiences after a long time not using it

Now I am here in Frankfurt ... dealing with an Internet terminal in the hotel that has Internet Explorer on it ... I am so much used to Firefox now that IE seems to be some kind of dinosaur ... or worse ... well it is terrible ... I am missing my firefox!!!!!

Thursday, August 31, 2006

Creating contents for many Wikipedias

The basis for this is a project about mass content creation on meta and Wikidata. Mass content creation is an idea of user Millosh and yes, he is right about that - it is how I already did certain stuff for the Neapolitan wikipedia.

There is so much easy-to-create content out there that Wikipedias could easily share, and even if we do not have Wikidata implemented in the wikipedias, we can use the data in databases to create stubs by using mail merge (in OpenOffice.org or Word) and upload them with the bot (see my other post of today).

This means: if we now start to add all names of:
continents
countries
cities
rivers
mountains
monuments
places
yes, even streets, because there are some that have translations
lakes
seas
animals
plants
names of people (also these are translated)
etc.

And then we start to translate them. At the same time people care to add statistical data to a table that is exactly about this (if we cannot do this with a separate wikidata installation ... anyway we do not need relational information for now ... just information).

How many articles (stubs) can be created in this way and how many people can work on it?

We also should not forget about film and book titles, the Greek and Roman gods (I suppose other parts of the world will have other material on such things).

It is really a huge project, but it is feasible ... there are many of us who have similar goals.

Where to start: well we need the infoboxes translated into as many languages as possible - and we then need the place names etc. translated. This must be combined with a datasheet.

Example:
Castua: http://it.wikipedia.org/wiki/Castua
We have the box on the right side with all the statistical/basic information - all that can be translated into many languages. The first sentence in the stub will simply be the definition from WiktionaryZ.
So most of it, like stato (state), regione (region) etc., can be translated within WZ and in that way we can populate the templates used on all wikis. As for the "not seen" part of the template, I would use either a lingua franca (English) or simply the same names that are visible.

There's not much to it - we need the lists to start off with. If we use pagefromfile.py to upload the ready pages, existing ones with the same name will be skipped and written to a logfile - these are then the only ones someone has to look after manually.

If sooner or later we get a pure wikidata application that takes the translations from WZ and combines them with the rest of the data: that would be great ... since we would then no longer need to correct the entries by hand when there are corrections.
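
The skip-and-log behaviour can be pictured with a minimal Python sketch. It does not call pywikipediabot at all: the function name plan_upload, the sample pages and the set of "existing" titles are assumptions made up for illustration, and the real bot checks the wiki itself.

```python
def plan_upload(pages, existing_titles):
    """Split (title, body) pairs into pages to create and titles to skip."""
    to_create, skipped = [], []
    for title, body in pages:
        if title in existing_titles:
            skipped.append(title)  # would end up in the logfile
        else:
            to_create.append((title, body))
    return to_create, skipped


# Hypothetical input: two ready pages, one of which already exists on the wiki.
pages = [
    ("Rome", "'''Rome''' is the capital of Italy."),
    ("Castua", "'''Castua''' (stub text goes here)"),
]
existing = {"Rome"}

to_create, skipped = plan_upload(pages, existing)
log_text = "\n".join(skipped)  # one skipped title per line

print("create:", [title for title, _ in to_create])
print("check manually:", log_text)
```

Only the titles in the log need a human look afterwards; everything else is created as a new stub.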

Using the Geoboxes we already have a good way to compare lists ... but does it make sense to do it that way right now? Or does it make sense to prepare now all possible translations to be ready once we can have wikidata for geographical entries?

Hmmm ... I was interrupted quite often while writing this blog ... and I don't have the time to re-read now. So sorry if things seem to be a bit mixed up.

Tuesday, August 29, 2006

Adding contents to wikipedia using a bot

Well, this question comes up over and over again and I would like to describe here how to do this - and this is valid for Wikipedia and Wiktionary.

Now I did this quite often on the Italian wiktionary and on the Neapolitan wikipedia (and some other projects).

For the upload I use the pywikipediabot - and in particular pagefromfile.py. This bot was mainly created to upload pages to Wiktionary, but then it turned out to be a great tool for wikipedia as well.

You need a .txt file saved in UTF-8 encoding. The bot takes the first text between ''' and ''' on each page as the page name and will of course create that page. If the page already exists it will be skipped.

Now the question I got is what a typical entry would look like. Here is an example:

{{-start-}}
'''Rome''' is the capital of Italy.
{{-stop-}}

This means the bot would create the page Rome and add the contents "Rome is the capital of Italy." to the page.

If the first text between ''' and ''' is not the page name, you can use a workaround with a comment:

{{-start-}}
<!--'''Template:Rome'''-->
'''Statistical data''' about Rome: ....
{{-stop-}}

In this case the template Rome is created, and it contains the statistical data.

Just add everything you want to see on the wikipage you want to create between start and stop.
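
To check a file before uploading it, the format can be read back with a few lines of Python. This is only a sketch of the behaviour described above, assuming the {{-start-}} / {{-stop-}} markers from this post; the regular expressions and the function name split_pages are my own and are not taken from pagefromfile.py.

```python
import re

# Markers as used in this post; the patterns are an illustration only.
PAGE_RE = re.compile(r"\{\{-start-\}\}(.*?)\{\{-stop-\}\}", re.DOTALL)
TITLE_RE = re.compile(r"'''(.*?)'''")


def split_pages(text):
    """Return a list of (title, body) pairs found between the markers."""
    pages = []
    for match in PAGE_RE.finditer(text):
        body = match.group(1).strip()
        title = TITLE_RE.search(body)
        if title is None:
            continue  # no ''' ... ''' text, so no page name can be derived
        pages.append((title.group(1), body))
    return pages


sample = """{{-start-}}
'''Rome''' is the capital of Italy.
{{-stop-}}
{{-start-}}
'''Naples''' is a city in Italy.
{{-stop-}}"""

for title, body in split_pages(sample):
    print(title, "->", body)
```

If a title printed here is not the one you expect, the block needs the comment workaround shown above.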

Now one thing you are probably wondering about is how to do this for a huge number of cities or other data. Well: use mail merge in OpenOffice.org Writer or Microsoft Office, create the layout for a typical template page, then enter the fields of the database you have and simply have it merge. Copy and paste the whole contents of the resulting file into a .txt file (Editor) and save it with UTF-8 encoding. You can also try to create the UTF-8 encoded text file directly with Word or OpenOffice.org, but we noted that on some systems this creates problems. So just try it out.

Then copy the file into your pywikipediabot folder and run the bot on it.

To run the bot I use the following command for the file nap.txt:
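
If you prefer a script to an office suite, the same merge can be sketched in Python. This is not what I used for the uploads described here; the stub sentence, the column names name and country and the helper names are assumptions chosen for the example.

```python
import csv
import io
from string import Template

# Hypothetical stub layout; the fields "name" and "country" are assumptions.
STUB = Template("{{-start-}}\n'''$name''' is a city in $country.\n{{-stop-}}\n")

# Stand-in for a database export in CSV form.
CSV_DATA = "name,country\nRome,Italy\nNaples,Italy\n"


def merge(csv_text):
    """Fill the stub layout once per data row, like a mail merge."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return "".join(STUB.substitute(row) for row in rows)


def write_upload_file(path, csv_text):
    """Save the merged text as UTF-8, the encoding the bot expects."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(merge(csv_text))


print(merge(CSV_DATA))
```

Calling write_upload_file("nap.txt", CSV_DATA) would produce a file in the format shown above, already saved as UTF-8, so the copy-and-paste step through the Editor is not needed.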
pagefromfile.py -start:{{-start-}} -end:{{-stop-}} -file:nap.txt -utf

Of course first you must login using login.py.

I hope this helps those who want to know how to do things. If you have further questions: well, just ask :-) I'll answer asap.

Tuesday, August 22, 2006

Lost in translation ????? (Episode 1)

This is called lost in translation, because sometimes we have to translate really funny (???) stuff and I would in some way like to collect these examples of source texts:
  • When MMS cannot be sent out due to setting of GPRS not complete, system will pop up a window to notify User that GPRS are not complete.
  • The E-mail may contain virus or other elements that will be harmful to your PC or cellular phone, if you do not certain the sender’s identity, please do not open the accessories.
  • If have input the Name, press OK key.
  • It is mainly used to provide digital mobile phone and other wireless terminal devices with wireless communication and information service for.

Saturday, August 19, 2006

What is a TMX file?

This is a question I received quite frequently during the last days and therefore I believe it makes sense to write a blog about it.

TMX stands for Translation Memory eXchange. It is a standard format used by many CAT tools (CAT = Computer Assisted Translation). CAT tools are mainly used by translators, but lately, when I was talking with Connel and other members in the Wiktionary chat, he suggested them for language study - and yes, it makes sense to use them there as well. TMX files would then be even more relevant. Students translate texts of different levels and, after having the translations corrected by the teacher or professor, they exchange them with others. When searching for a word in a specific sentence they can do a concordance search in the Translation Memory and so they will see how that specific word was used in other sentences.

As for translators, Translation Memories are helpful in two ways: one, for concordance search, and two, for repetitive texts and updates of manuals they have already translated before. Imagine you translate the manual of a TV set; then, one year later, a new model of that TV set is produced and you get the follow-up translation. By using your translation memory of the year before you will find many sentences that are already there - maybe they need to be adapted a bit to make reading more fluent, sometimes you do not even need to do that (well, you have to check, of course). This helps to assure quality.
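
To give an idea of what such a file looks like and how a concordance search works, here is a minimal Python sketch. The tiny English/German memory is invented for the example (it follows the general TMX layout of tu, tuv and seg elements), and the function name concordance is my own; real CAT tools do much more, such as fuzzy matching.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented three-sentence memory for a TV-set manual.
TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Switch on the TV set.</seg></tuv>
      <tuv xml:lang="de"><seg>Schalten Sie das Fernsehgerät ein.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Connect the TV set to the antenna.</seg></tuv>
      <tuv xml:lang="de"><seg>Schließen Sie das Fernsehgerät an die Antenne an.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Insert the batteries into the remote control.</seg></tuv>
      <tuv xml:lang="de"><seg>Legen Sie die Batterien in die Fernbedienung ein.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def concordance(tmx_text, term, lang="en"):
    """Return the translation units whose `lang` segment contains `term`."""
    hits = []
    for tu in ET.fromstring(tmx_text).iter("tu"):
        segments = {
            tuv.get(XML_LANG): tuv.findtext("seg", default="")
            for tuv in tu.iter("tuv")
        }
        if term.lower() in segments.get(lang, "").lower():
            hits.append(segments)
    return hits


for unit in concordance(TMX, "TV set"):
    print(unit["en"], "=>", unit["de"])
```

Searching for "TV set" shows every stored sentence that contains the term together with its translation, which is exactly what helps with the follow-up manual a year later.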


Thursday, August 17, 2006

Articles ...

Quite a bunch of news this time, copied from words & more - there you can also access the links to read the complete articles. The tag in front of each item gives the language of the article.
  • [es] Thanks to this system, doctors at this hospital can communicate in real time with patients who speak English, Arabic, Chelja, German and French. - The simultaneous telephone translation service, launched by the Carlos Haya hospital complex in Málaga at the end of 2004, has already handled 585 translations in English, Arabic, Chelja (a language spoken mainly in North Africa), German and French.
  • [es] Madrid's Municipal Police 'speak' languages - Since this summer the local officers have had a pioneering translation system for assisting foreign tourists in their own language.
  • [it] The Church: when Matteo Ricci translated Confucius into Latin - The story of the Jesuit who wanted to globalise cultures: in 1594 he completed what was, for its time, a true cultural feat
  • [en] AAA Translation Selects Shafer Communications as its Public Relations Agency of Record - As AAA Translation expands to meet growing needs of the global marketplace, Shafer Communications will create comprehensive public relations program to support client's growth.
  • [en] Road sign leaves Welsh-speakers bewildered - Welsh-speaking cyclists have been left baffled - and possibly concerned for their health - after a bizarre translation mix-up.
  • [en] Dollar Rent A Car Launches License Translation Service in Japan - Dollar Rent A Car, a subsidiary of Dollar Thrifty Automotive Group, Inc. (NYSE: DTG) today announced a new driver's license translation service for its Japanese customers traveling to the United States.
  • [en] Language Weaver to Demonstrate Integration of Automated Translation into Homeland Security Support Applications at Intelink Conference - Language Weaver, a leading software company developing enterprise software for the automated translation of human languages, today announced it will demonstrate multiple applications where automated language translation has been integrated with communications programs that help the homeland security and U.S. defense efforts.
  • [en] Ukrainian PM confirms his stance on Russian language issue - Ukraine's PM Viktor Yanukovich told journalists in Sochi today, that the Russian language would be granted status of the second national language in Ukraine as soon as the coalition secures a majority in the Supreme Rada.
  • [en] LanguageScape.com Helps Companies and Individuals Bridge Language Barriers and Expand Their Global Reach - BOSTON, Aug. 16 /PRNewswire/ -- EditAvenue Incorporated today announced the launch of http://www.LanguageScape.com, an online marketplace for translation services, to help both companies and individuals translate documents into any language.
  • [en] Verbalplanet.com Launches the World’s First Global Online Language Tuition Marketplace - United Kingdom (PRWEB) August 16, 2006 -- Verbalplanet.com is a global marketplace for online language tuition services, enabling language tutors to sell their services online and interact with language learners across the globe.
  • [en] Sorenson opens new sign-language interpreting centers across U.S. - Sorenson Communications has opened 17 new video relay service interpreting centers for deaf and hard-of-hearing individuals throughout the United States, the company said Tuesday.
  • [en] Monterey's Language Line shrinks Q2 loss to $2.9M - Language Line Holdings Inc. on Monday reported a second quarter loss of $2.9 million, about 12 percent lower than its loss of $3.3 million in the year-ago period.
  • [en] Language no issue in Chinese venues - Much has changed in China since President Richard M. Nixon's historic visit in 1972. As you walk down a street and see McDonald's, KFC, Starbucks and Victoria's Secret, you may think you are in New York, Los Angeles or Chicago rather than Beijing, Shanghai or Guangzhou.
  • [en] Doctors Look To VoIP To Bridge Language Barriers - A creative use of voice and video over IP is helping three California hospitals overcome increasingly common language barriers between doctors and patients.
  • [en] Views sought on boosting Gaelic - Scotland's first ever National Plan for Gaelic has gone out for public consultation.
  • [en] Wiradjuri Language resource launch - Parkes Shire library has acquired a number of books and CDs which form a Wiradjuri Language resource. The collection consists of a Wiradjuri Dictionary and kits on learning Wiradjuri and Wiradjuri language songs for children of all ages.
  • [de] More efficiency through clear and friendly language - The "Handbuch Bürgerkommunikation" (handbook of citizen communication) owes its existence to the project "Verständliche Verwaltung" (comprehensible administration), which was undertaken by the Arnsberg city administration.
  • [de] Duden editors give in to Google - Language versus trademark protection: the Duden editorial team has changed the definition of the verb "googeln".
  • [de] Speech recognition: ever more precise, ever more efficient - (pd) Computer programs that convert speech into text keep getting better. Above all hospitals, doctors and lawyers use software solutions for digital speech recognition.
  • [de] Foreign language skills widen your circle of friends - Mastering foreign languages is helpful - Alex the mixed-breed dog explains why...
  • [de] A study shows the language trends for the coming year - The slogans of German advertising are becoming shorter, simpler, more German and more imperative. Media observers from Slogans.de and Trendbüro Hamburg compare the features word choice, part of speech, number of words, word frequency, word usage, sentence structure, sentence type, punctuation and language used.
  • [de] Six-way search for Firefox - The "Feuerfuchs" (Firefox) is increasingly regarded as a cult browser. The popularity of the free program also shows in the large number of plugins that are now available on the web. The Definero toolbar provides six useful search routines at once.
  • [fr] When translation passes bladders off as cyclists - Because of a translation error, cyclists approaching a very busy roundabout in Wales are warned by a sign of a "bladder irritation" instead of a notice advising them to get off their bikes.
  • [en] Translators Selling on eBay - Lately I’ve seen a number of translators selling their services on eBay (Germany). Personally I think that eBay is not the best place to sell ...

Tuesday, August 15, 2006

What to do with all those links ....?

That was the question when I received a link yesterday ... now I frequently get interesting links and would like to put them somewhere - these are not always news - it can be a funny website, an e-book, simply an interesting website, a word on WiktionaryZ, an article on Wikipedia - there are so many possibilities ... you might have the same problem ... well: for that purpose I opened the section Linksoup on words & more. There I simply paste the links with two or three words that should give an idea about the link. You can do the same btw - but please log in - in that way it is easier to understand who does what and whether an IP is a spam IP or not. In a second stage, when the software on words & more is upgraded, I will install Semantic MediaWiki and tag them. From that moment on the "soup" will be easier to search. At a certain stage, when the links become too many, a different scheme will be needed ... but for now ... I feel it is a good solution to make sure links do not get lost.

Monday, August 14, 2006

Articles ...

Links to articles added on wordsandmore.org

  • Image:De_.png Alles in Butter oder kommt doch das blaue Wunder? (roughly: Is everything fine, or is a nasty surprise coming after all?) - "No, you are on the wrong track ("auf einem Holzweg") there, Lisa!" ... The "Holzweg" (logging path) from which the idiom derives can be found in the Middle Ages.
  • Image:De_.png Was war. Was wird. (What was. What will be.) - The nice thing about journalism is that there is always something new to learn, that new words always have to be spelled out. Just take the "Degetoisierung" of the German TV-watching public, which follows hot on the heels of the "Musikantenstadlerisierung".
  • Image:Usgb.png Welcome to linguafranca.com - According to Kaled Fattal: “People say the Net works, but it only works for those communities whose native language is Latin-based.

MediaWiki software and "small" wikiprojects

This is a post I am sending right now to the wikitech-l ... I am posting it here as well, because many people will not read it there - only a few are subscribed. There is a discussion about wysiwyg and wiki software - now the discussion went into the direction of specific needs for specific languages and/or keyboards. Since I myself face some of these issues daily it only makes sense to talk about it.

Well: let's make a practical point: people on the nap.wikipedia are driven away because they have to use workarounds for '' - that is ' in whatever combination - we now use '' for '' to have a unique way to identify words and word combinations if some day we should need to use replace.py. Many now create non-standard articles using the accent of à or á, that is ` or ´, inserting spaces where they are not needed etc. (all sorts of strange solutions to avoid seeing everything in italics afterwards) before the &# thingie we needed to use to get things right. It is an annoyance to continuously have to use workarounds - now I am quite "wikiphile" I'd say, being able to install my own wikis + extensions and sometimes create quite complicated templates ... imagine how someone feels who has no clue at all about wikis - someone who would like to start a first article and then, clicking on save, gets weird stuff and therefore does not come back to edit again, because editing is "too difficult". I also have plenty of colleagues who maybe would like to help, but simply don't want to lose the time to learn wiki, because donating a translation is already a lot considering how much they would earn if they did translation jobs instead.

Next thingie - the | sign - you don't have it on the German keyboard - and it gets even worse if you have a laptop keyboard like I have - there is no way to reproduce it easily with alt+ like I could do it on a normal keyboard.

How do I work on the nap.wikipedia? Well, there are two ways: either I first write in OOo or Word, then substitute all '' with the numeric code using search and replace, and I avoid creating wiki-links, because I am simply very annoyed (better I remain with nice words...) at having to copy and paste it here and there. Or I write the article on the wiki and then substitute the parts needed - both ways require a lot more time.
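
The search-and-replace step described above could be scripted roughly like this. It is only a minimal sketch: the mail just says "the numeric code", so the exact reference `&#39;` is an assumption, and the function names and the sample string are made up for illustration.

```python
# Assumption: "the numeric code" is the HTML numeric character reference
# for the ASCII apostrophe. The mail does not spell it out.
APOSTROPHE_REF = "&#39;"


def escape_apostrophes(text: str) -> str:
    """Replace every ASCII apostrophe so that the wiki never sees two
    apostrophes in a row (which MediaWiki reads as italics markup)."""
    return text.replace("'", APOSTROPHE_REF)


def unescape_apostrophes(text: str) -> str:
    """Reverse the replacement, e.g. to edit the text offline again."""
    return text.replace(APOSTROPHE_REF, "'")


if __name__ == "__main__":
    sample = "a''b 'c"  # placeholder text, not real Neapolitan
    escaped = escape_apostrophes(sample)
    print(escaped)
    print(unescape_apostrophes(escaped) == sample)
```

Note that this blindly replaces every apostrophe, so it would also destroy intentional italics or bold markup; text that mixes both would need a smarter rule.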

At least the {{ [[ are on the keyboard - so normal wiki-links are not much of an issue - for Wiktionary that works fine - but not for Wikipedia, where you have declined forms of a word.

I find wikis great - but they are not suitable for the biggest part of potential contributors - maybe for 5% of them. See: I already said this last time I wrote about these issues here ... if MediaWiki was not developed by English speaking people, but maybe by people speaking some kind of "strange" language - if it was not developed for the English speaking market only - it would be different, more conscious of such issues, and we would have more contributors in the regional editions - no, please don't object - when people do things most of them only think about the English Wikipedia - the other projects might exist, but are considered to be some kind of fun-stuff ... instead many of the small Wikipedias are very serious projects - much more serious than any one of you can imagine, they face many more problems than you could ever imagine - a good software approach would help many of us to grow a better community and to be able to create more articles instead of having to re-read and adapt every article to our standards (' is some kind of standard for nap now) and people, who already have difficulties in writing local languages, yes we have a literacy rate of approx 2 to 4%, would not have to concentrate on multiple issues at a time, they could concentrate on the text they are writing ... that would be great ... that would be really a step ahead ... that would mean "think about the users" not "for the majority it's fine ... so who cares ..."

I am sorry that I have to write this ... it should not be necessary.

I'll post this mail also on my blog in order to have it accessible to more people.

Thank you for taking the time to read ... well I hope one day I will be able to say "thank you for caring about the small communities".

Best, Sabine

Saturday, August 12, 2006

OmegaT, WiktionaryZ, Betawiki ... some questions that need an answer ...

In the Wiktionary IRC the following questions were asked by Connel: "... considers omegat.org. Is the intent for it to just auto-upload stuff to WZ? to/from ZW? Or betawiki, or both betawiki and WZ? Or is betawiki just for WikiMedia total localization?"

That is a lot ... so let me go step by step.

The intent of OmegaT is not to auto-upload stuff to WiktionaryZ or download it from there. Nor is it only there for Betawiki and WiktionaryZ, even if it will probably be used for both sooner or later. OmegaT is a CAT-Tool that helps translators to do their work.

What does this mean: imagine you use for all of your translations a tool that creates a Translation Memory, a file containing the translations you did segmented into sentences, combining source and target sentence. Then you do further translations and let the CAT-Tool access these already translated files. Now if your translation is of a subject you already translated chances are high that most terminology needed is already in there and you can even see in which context it was used. So with OmegaT you do a search on your project and the available translation memories to see if and how a term was already translated. This can help a lot.

Now consider a manual - of a machine, a computer, whatever. These manuals need updates once a new version of that machine or computer is produced. Normally companies then just update the description, and parts of it remain the same as before (simply because the functionality of these parts is still the same). When you then translate, you will find these unchanged parts in your translation memory and, depending on how you set your options, OmegaT proposes the 100% match or overwrites the translation part of your project with the already existing translations. In this way you can save loads of time.
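
To make the idea of a translation memory a bit more tangible, here is a toy sketch. It is not how OmegaT actually stores or matches segments: the class, the similarity measure (Python's `difflib`), the 0.75 threshold and the sample sentences are all assumptions chosen for illustration.

```python
import difflib


class TranslationMemory:
    """Toy translation memory: maps source sentences to target sentences."""

    def __init__(self):
        self.segments = {}

    def add(self, source, target):
        """Store one translated segment (source sentence -> target sentence)."""
        self.segments[source] = target

    def lookup(self, source, threshold=0.75):
        """Return (target, score) for the best match, or (None, 0.0).

        An identical source sentence is a 100% match (score 1.0);
        otherwise the most similar stored sentence is proposed if its
        similarity reaches the threshold.
        """
        if source in self.segments:
            return self.segments[source], 1.0
        best_target, best_score = None, 0.0
        for known, target in self.segments.items():
            score = difflib.SequenceMatcher(None, source, known).ratio()
            if score > best_score:
                best_target, best_score = target, score
        if best_score >= threshold:
            return best_target, best_score
        return None, 0.0


if __name__ == "__main__":
    tm = TranslationMemory()
    # Made-up sample segment from an imaginary manual.
    tm.add("Press the red button.", "Premere il pulsante rosso.")
    print(tm.lookup("Press the red button."))    # unchanged sentence
    print(tm.lookup("Press the green button."))  # slightly changed sentence
```

For an updated manual, the unchanged sentences come back as 100% matches, while slightly changed ones come back as proposals that the translator still has to adapt.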

With the right parser the MediaWiki UI could also be translated in such a way. Now we will always have people who translate things manually online and who will not use a CAT tool. This means that OmegaT should be able to access the single pages containing the messages on Betawiki, so that you translate them on your computer and store them to the page in the correct language version. This is feasible.

Another use will be: creation of contents for small wikipedias. Once we get our wiki read/wiki write option within OmegaT it is possible to start a translation of an article, let's say from the English wikipedia, and translate it to any language, let's say the Neapolitan wikipedia. This means you tell OmegaT which page to get on en.wikipedia and which page to write on nap.wikipedia. The same is valid for any African language. The advantage of this is: if there is no online-connection people can work offline on translations.

The translation memories out of these translations should be stored somewhere (WiktionaryZ is already enabled to upload translation memories) in order to allow others to access and use them, so that their own translations become faster and of higher quality. Another aspect of doing things this way is: the proofreading of a translation is easier since you see the source text above the translation for each sentence. This eases the job a lot and the quality of the translated article rises.

Now to WiktionaryZ and OmegaT: OmegaT for now has quite a simple glossary function - you create a tab separated text file and put it into your glossary directory. While you translate OmegaT shows you the translation proposals for the words that are present in that sentence and in the glossary. Now imagine what that means if you connect the glossary function to WiktionaryZ: the whole repository of data at your fingertips - of course: considering the mass of data that is online in WiktionaryZ it becomes very important to attribute domains to terminology. Often a word can be translated in 20 ways or even more into another language ... well, it does not make sense if you are doing a translation about medical equipment that you get proposals from another domain, let's say machinery - the possibilities from other domains should only be proposed (showing that other domain) when there is no entry for medical equipment.
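
The domain idea can be sketched as follows. The post only speaks of a tab separated text file for the glossary; the third column holding a domain, the function names and the sample terms are assumptions for illustration, and the sketch only handles single-word terms.

```python
import csv
import io
import re


def load_glossary(tsv_text):
    """Parse tab-separated lines: source term, target term, domain.
    The domain column is an assumption, not part of the format described."""
    entries = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) >= 3:
            entries.append((row[0].strip(), row[1].strip(), row[2].strip()))
    return entries


def propose(sentence, entries, domain):
    """Return {source term: [(target, domain), ...]} for terms in the sentence.

    Entries from the requested domain win; entries from other domains are
    only proposed (with their domain shown) when the requested one has none.
    """
    words = set(re.findall(r"\w+", sentence.lower()))
    candidates = {}
    for source, target, entry_domain in entries:
        if source.lower() in words:
            candidates.setdefault(source, []).append((target, entry_domain))
    result = {}
    for source, found in candidates.items():
        preferred = [item for item in found if item[1] == domain]
        result[source] = preferred or found
    return result


# Made-up sample glossary, only to show the fallback behaviour.
SAMPLE = (
    "valve\tvalvola\tmachinery\n"
    "valve\tvalvola cardiaca\tmedical equipment\n"
    "pump\tpompa\tmachinery\n"
)

if __name__ == "__main__":
    glossary = load_glossary(SAMPLE)
    print(propose("Check the valve and the pump.", glossary, "medical equipment"))
```

Here "valve" only gets the medical proposal, while "pump" falls back to the machinery entry because nothing exists for it in the requested domain.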

At this stage we don't have this domain structure for terminology on WiktionaryZ and therefore the data, once we have loads of it online, cannot be used - it would just create a huge mess and would be very time consuming. So one of the things we really need asap is a domain structure that we can connect the single terms to - the sooner we have it the better .... otherwise we will have loads of double and triple work, or WiktionaryZ could become completely useless for use within OmegaT and as such it would not be of any advantage for translators. Not even for scientists really ... imagine a biologist searching for terminology and getting whatever result ... also those of machinery or whatever other domain.

Back to the use within OmegaT:

The next step is then: what if the searched term is not in WiktionaryZ ... I already noted that during my last translation - for now it is too time consuming to add terms to WiktionaryZ and also Wiktionary when you wish to do that while you are translating - but: it would make so much sense. So what is planned in the reference implementation for a translation glossary is that when working with OmegaT you get the possibility to add such a term directly from there. You simply tell OmegaT to add it to WiktionaryZ with your user ID and you can attribute all the necessary domains etc. without problems as well as tag the term as "definition needs to be added". What happens in that way is that WiktionaryZ will get quite a bunch of very specific terminology over time.

Another use is OmegaT for language lessons - Connel, from en.wiktionary thought about it and he is right: OmegaT could be used for language learning as well ... what if we have a huge sentence repository and people start to translate texts to study that language - they do not need a paper dictionary - OmegaT would help them to see the use of a word in various sentences and they would get the terminology proposals like the translators. When being back at school or university (or maybe also online with a language teacher) they can understand their errors, update WiktionaryZ and the online sentence repository.

For exams teachers would have a mass of proposals and they could determine which glossary group shall be included in the exams ... that is to be thought about ... it was not considered up to now even if there are already thoughts on how to use WiktionaryZ for language learning.

Did I miss something? Hmmm ... not sure. Well if you have questions: just ask :-)

Friday, August 11, 2006

Piedmontese - Venetian - Ukrainian in WiktionaryZ now

Yesterday evening three more languages were added to WiktionaryZ. Now it is also possible to add terminology in Piedmontese, Venetian and Ukrainian.

I hope people who read this will pass on the message, also to the relevant beer parlours.

Have fun! :-)

Thursday, August 10, 2006

Creating Open Contents against payment

It was approximately a year ago that the first paid-for translated article appeared on Wikipedia. The idea then was to create a translation service that works on that basis - I also put up a basic website that was never finished (because the time was not right). Even then there were voices against such a way of earning money, but since it was only an article about a city things calmed down.

Some days ago there was a report about mywikibiz.com going around and people reacted quite irritated ... a person creating articles for companies on Wikipedia and even being paid for it? Many would say: that is impossible ... where is NPOV going ... well: this user already added some articles and they were not deleted, because they were OK. Now, knowing he does it against payment, does that make the article any worse or better? No, the content remains the same. The difference is: this person made a job out of his hobby and it seems as if he is good.

So where's the real problem? That he earns money because he needs to live from something? Well ... anyone of us does that ... in many different ways ...

The real problem is that people are not yet used to this thought ... someone earning money by working on Open Content ... but don't we have software developers that are paid for developing Open Source software and we happily use it because it does not cost anything? Well, do we really expect that people maybe work the whole day on free projects and live from nothing? Or should every one of us really use only the free time to do this?

The thing is: nobody will ever be able to stop such an initiative. If you really want to get your article there, you get it there. Isn't it better to know who wrote it and why it was written? Isn't it better to co-operate and make sure things go the right way? I would not be surprised if there are already many people being paid to add certain kinds of articles and we just don't know about it. Now that Greg's work is publicly known (no, I don't know him - I only wrote him an e-mail telling him about wikitranslations) people react scandalised ... they refuse to understand: creating Open Content against payment will be a job of the future ...

Now some of you are worrying about NPOV on Wikipedia - why? There are all the other editors that will, like always, check the article and edit it if necessary. Once an article is published under GFDL on Wikipedia it can be edited and changed.

Do also consider one thing: having an article on Wikipedia about a person or a business is not necessarily positive. All facts that are known can be added. It could well be that companies will then read about problems they may have had some years ago and that by now everybody forgot - in an encyclopaedic NPOV article all notes on history, whether positive or negative, can and will be mentioned.

Consider also that not all companies can be included in Wikipedia. There are guidelines to follow. A small company next door is normally not to be inserted into Wikipedia. Most companies do not meet the Wikipedia guidelines for the inclusion of companies. Therefore, some time ago, Yellowikis was created. There you will have space for any kind of business to be inserted - it is a GFDL directory that anyone can edit. And it is getting more and more known. Once it is on a good level, being present on Yellowikis, which is known as a business directory anybody can edit, will mean just as much as being present on Wikipedia - the only difference will be: companies that wrote history due to their inventions or due to their international high level presence like Siemens, Ferrari, Nokia, just to name some, will have entries in Wikipedia and Yellowikis.


This blog was written from scratch with various interruptions - it may well be that I am going to add or change some parts.

African languages - how are they connected to what I do?

Well I was just writing my first message to the AfrophoneWikis discussion group - and of course people in that group will wonder why I joined it ... well: I will tell you and them on my blog, because maybe you have or know people who have similar goals - and if so: please contact us.

Well as you can see from the various blogs I am involved in languages ... WiktionaryZ ... Wikipedia ... and other projects. I very much care about regional languages and how to make their life easier, make them known, connect people etc.

In WiktionaryZ we will have many languages for which there is actually no Wikipedia, and for many it will be the first repository on the Internet. In Africa there are many languages that need attention, otherwise these languages together with the culture of the people who speak them would die. According to UNESCO each week one language dies.

Another thing is: our small Wikipedias - may it be Venetian, Piedmontese, Sicilian, Lombard, Neapolitan, Akan, Ripuarian, Asturian, Maltese, Samogitian etc. (just to name some without giving any preference) - all face very similar problems. There are not as many speakers available as for English, German, French, Italian and the other big ones - so often only a handful of people work on them. There are ways to co-operate and make contents available for all of us. This is why I am in the African Wiki group - this is why I want to communicate with people: to give more value to all our contributions - to create projects that help all Wikipedias, also the big ones, to have better data available and reach higher quality, to do certain tasks only once and have the results available for all.

If we start to talk and get such things on the way then one of our goals is partly reached ... why partly? Well: there is loads of contents to be added to these projects and that will take time.

On which kind of data could wikipedias co-operate:
  • Geographical data
  • Basic data of species
  • Basic data of people
  • Basic data of events
  • ... and much, much more ...

Finding an extraordinary blog ...

Today Martin Benjamin sent a mail to the newly founded group for Wikipedias in African languages. Well, yes, Ethan Zuckerman mainly writes about African languages, but the points he makes are valid for all small Wikipedias around. I would very much like to see a co-operation start. We can get things on the way ... many small drops of water form an ocean ... let the small Wikipedias become our ocean. Well, read his blog post about Your language or mine and you will understand.

OmegaT 1.6 RC 10 comes complete with Java (testo anche in italiano)

The other evening Henry Pijffers created the packages for Windows and Linux that can be used "out of the box" without the need to install Java.

Just download the Windows or Linux bundles by clicking on the link.

It is a huge step forward for all those users who don't want to bother with having Java installed or who have trouble checking which Java version they have and possibly updating it.

If you have questions or need help, please contact me through my talk page, write to the OmegaT user group or just come into the OmegaT IRC-Channel.

And now: have fun with OmegaT :-)

*****

L'altra sera Henry Pijffers ha creato i pacchetti per Windows e Linux che possono essere utilizzati "out of the box" senza dover installare Java.

Puoi scaricare il pacchetto Windows o Linux cliccando sul link.

È un grande passo avanti per tutti quelli che non hanno voglia di occuparsi dell'installazione o dell'attualizzazione di Java o che hanno problemi di farlo.

Se avete domande, per piacere contattatemi tramite la mia me pagina di discussione, scrivete al gruppo di utenti OmegaT (anche in italiano) o venite nello chat di OmegaT.

E ora: buon divertimento con OmegaT :-)

Monday, August 07, 2006

Ubuntu ... yes it works :-)

Well, yesterday was another Ubuntu day ... consider that I did not know how to install software on Linux - well I have some knowledge of DOS, but that is different even if you can imagine where to look and what more or less needs to be done.

Well: my problem was and still is my router - it will be replaced asap. As for the rest, things work smoothly - anyway, before changing to Ubuntu you should try out the live CD - if that one works you can expect the rest to work as well.

Ubuntu has a great Italian community - you can find them on IRC, in their discussion list and in their forum. They really helped me a lot and I would like to say particular thanks to MartinderKiller (no, he's not German, he's Italian) and Jacopo, who patiently took me through the hurdles.

Well and then: special thanks to Celestianpower (see the link to his blog on the right side) - talking with him he gave me a link on how to easily install Skype ... well it was not that easy for me (due to the router problems), but since he gave me the link I decided to go ahead yesterday - and like always: when there is a problem I normally cannot stop until it is solved.

So if you ask me if it makes sense to switch: yes, in particular because you can work with Linux and Windows side by side on your computer and so you have plenty of time to learn all you need - there's no reason to worry since at the beginning you will still mainly use Windows until you are accustomed to Ubuntu (which in the end is very similar to Win - you only have more "direct communication" with your computer).

Friday, August 04, 2006

Collected articles in several languages (DE, EN, FR, IT) on words & more

On words & more I collect all sorts of language and translation related articles. Today I added particularly many articles and that is why I am copying that part down here. Please go through http://wordsandmore.org to have the functioning links to read the complete articles that interest you. It would simply take too much time to create them also in the blog.

There you also find the link to the archive. I hope you enjoy :-)

  • Image:Es_.png Pekín busca acabar con el 'Chinglish' para las Olimpiadas (Beijing seeks to end 'Chinglish' in time for the Olympics) - BEIJING (Reuters) - Beijing authorities hope to eradicate "Chinglish" from the Chinese capital's bilingual signs by the 2008 Olympic Games, state media reported on Friday.
  • Image:It_.png Livedictionary traduce "in diretta" pagine Web (Livedictionary translates web pages "live") - Eloquents presents a new version of Livedictionary, a dictionary and vocabulary for Safari that translates and explains, live, every term found on a web page.
  • Image:Usgb.png Lionbridge profit soars on Bowne acquisition - Lionbridge Technologies Inc., which provides translation services for companies selling software and other products overseas, said net income for the second quarter jumped year over year from $1.38 million to $3 million, helped largely by its acquisition of Bowne Global Solutions.
  • Image:Usgb.png Association for Machine Translation in the Americas Opens Its Conference Doors to Public to Showcase the Wonders of Automated Translation - STROUDSBURG, Pa.--(BUSINESS WIRE)--Aug. 1, 2006--At its seventh biennial conference, AMTA 2006 to be held at the Marriott in Cambridge, Massachusetts, The Association for Machine Translation in the Americas will open its doors to the public for a free showcase of applications on Thursday, Aug. 10 from noon to 4:00 pm.
  • Image:Usgb.png Koreans go to other Asian countries for language training - JUST when the Philippines has been recognized as one of the leading English training centers in Asia, Koreans nowadays are eyeing other Asian countries to train them on other international languages beside English.
  • Image:Usgb.png Nigeria: Don Decries Apathy to Local Language Studies - Prof of Yoruba Language, Olanrewaju Folorunso, has expressed concern over the apathy of students to the study of the country's indigenous languages.
  • Image:Usgb.png Language Weaver Expands Its Reach in Educational Market with Sale of Automated Translation Software to Educational Testing Service - Non-profit educational advancement company uses translation to simplify instruction for English language learners.
  • Image:Usgb.png Three nations to promote Malay language - Jakarta: Malaysia, Indonesia and Brunei will intensify collaboration in promoting the Malay language internationally.
  • Image:Usgb.png IM language is not spoiling English, say Canadian researchers - TORONTO: Are you one of those conformists who believe the IM culture is spoiling Queen's English?
  • Image:Usgb.png Watch your language - TODAY is the start of National Language Month, whose theme is "national languages." As everybody knows, there are eight other Philippine languages (they used to be called "dialects") and our Constitution, original language English, mandates two official languages, English and Filipino (which used to be called Tagalog). Is it any wonder that we do not seem to understand one another?
  • Image:De_.png Instant Messaging verdirbt die Sprache nicht (Instant messaging does not spoil language) - The abbreviations young people use for messages via mobile phone or PC have little influence on their language.
  • Image:De_.png Von seriös bis locker: der Schreibstil in E-Mails (From formal to casual: writing style in e-mails) - In e-mails too, an appropriate writing style should be maintained depending on the addressee. "E-mails are still a written form of communication and not a spoken one like chat, for example," says linguist Annette Trabold of the Institut für Deutsche Sprache in Mannheim.
  • Image:De_.png In welcher Sprache? (In which language?) - In some Zurich kindergartens, High German is becoming the standard language. That even applies to break time.
  • Image:De_.png Man spricht kein Deutsch in Brüssel (German is not spoken in Brussels) - The German language has a weak position in the capital of Europe, including at the Swiss EU mission. Under the German EU presidency in the first half of 2007 this is supposed to change.
  • Image:Usgb.png Wikimania 2006 hits Cambridge - In just five years the humble wiki, a Web page that can be added to, excised from, and otherwise edited by pretty much anyone with an Internet connection, has fundamentally changed the way humans learn and communicate.
  • Image:De_.png Zwiebelfisch: Als ich noch der Klasse Sprecher war (roughly: When I was still the class's speaker) - Why is a bee's sting not called a "Bienestich"? The German language always keeps a few letters ready to fill the joints between words. Some, however, do without these linking elements ("Fugenzeichen") and prefer to use "Fuge-Zeichen".
  • Image:De_.png Sieg des Deppenapostrophs (Victory of the "idiot's apostrophe") - Somehow everything used to be better: many wrote "Ulli's Imbiss", and some knew that it really ought to be "Ullis Imbiss". But to the annoyance of language purists, the new Duden allows both forms.
  • Image:De_.png Sprachen lernen soll Spaß machen (Learning languages should be fun) - New software on the market with new learning methods - Using Czech as an example - The good old vocabulary notebook and the mindless cramming of foreign grammar are a thing of the past.
  • Image:De_.png TU Chemnitz verbessert sprechendes Online-Wörterbuch (TU Chemnitz improves talking online dictionary) - Chemnitz (dpa) - Chemnitz University of Technology has upgraded its free German and English online dictionary.
  • Image:De_.png woerterbuch.info mit 950.000 Übersetzungen und Synonymen (woerterbuch.info with 950,000 translations and synonyms) - Hamburg (pts/01.08.2006/10:00) - The free online dictionary http://www.woerterbuch.info has passed the mark of 950,000 German-English translations and synonyms.
  • Image:De_.png Wie viel Englisch verträgt eine Pressemitteilung? (How much English can a press release take?) - That is what we wanted to find out from German journalists who cover IT and technology topics.
  • Image:De_.png Ein Schatzhaus aus Wörtern (A treasure house of words) - Philologists have been working on a Latin lexicon for 112 years - Words have a life too. And their biographer is Hugo Beikircher. On his desk in the Residenz stand cardboard boxes, plain and grey.
  • Image:De_.png Tragbares Sprachgenie (Portable language genius) - With a footprint of about 15 x 8 centimetres and a weight of 312 grams, the "Partner EGm800" language computer from Ectaco fits into any travel bag.
  • Image:Fr_.png Systran: en hausse malgré la chute des bénéfices. (Systran: up despite the fall in profits.) - (Cercle Finance) - Systran, a group specialising in publishing machine translation software, saw its profitability deteriorate sharply in the first half of 2006 as a result of the heavy investments made ahead of the release of the upcoming version 6 of its software.
  • Image:Fr_.png Opposition à la traduction "au plus bas prix" (Opposition to "lowest price" translation) - The rigid application of the lowest-price procurement policy will contribute to "commodifying" translation within the federal government. That, at least, is what the Conseil des traducteurs, terminologues et interprètes du Canada (CTTIC) fears; it is worried about the future of freelancers and small translation firms.