Endangered Languages


As I promised last week, I’ve managed to find a copy of the SBS World News report in which I appeared, that mentions and demonstrates the mobile phone dictionary – thanks to Jeremy who recorded it – and so I’ve put it up here.

Just bear in mind that I had no idea that I was going to be interviewed, which is why I’m unshaven and wearing – ahem – a Transformers T-shirt (Decepticons, no less).

I suppose this destroys for good any semblance of internet anonymity that I had feigned.

<UPDATE>
As Michael noticed, I think the large video file was causing some strife for the company that generously hosts this site, Affernet, so I’ve YouTubed it instead.
</UPDATE>

When collecting field recordings, always, always begin each audio file with a little blurb mentioning the date, the location, who’s present, and what language is being researched. It’ll cost you about 10 seconds of each recording and you’ll sound like a bit of a tool repeating yourself, but you’ll save yourself hours of work years later when you (finally) get around to archiving your recordings and you need to find all this information from other sources, like airline booking confirmation emails.

Oh, and transcribe your recordings while they’re fresh in your head, lest you find yourself devoting countless hours of unpaid work to do so when you have a brazillion¹ other things to do.


  1. I’m alluding to a George W. Bush joke here:
    One of the president’s advisers rushes into the Oval Office and tells the president that there’s been a terrorist attack in Rio and that 2 Brazilians have been killed.
    “Oh my God!” screams the president, to the astonishment of the adviser, who didn’t think the death of a mere 2 people would have fazed the president so much. “How many are in a brazillion?”

If you’re in Australia, tune in to SBS World News either tomorrow or Sunday night [I just got a call from them; they’ve bumped it back to the weekend] at 6:30pm. I have a feeling that there’ll be an interesting report on indigenous languages in Australia, and the use of modern technology (such as electronic dictionaries and mobile phones) in their revitalisation.

Or such was the impression I got when I gave them the interview.

A few weeks ago I mentioned that a bunch of us at Sydney Uni had submitted an abstract for a conference presentation of the Kaurna electronic dictionary.

Just recently, we received the news that our abstract has been accepted. So, if you’re planning on coming along to Australex ’08 at the Victoria University of Wellington in November and you’d like to see the public unveiling of our Kirrkirr and mobile phone dictionaries, then by all means look out for us – by which I mean me.

As it’s been about a month since my last post, it’s probably about time I posted something at least to ensure that this site doesn’t get referred to as a ‘dead blog’. To make matters worse, not only have I not been posting, I’ve also been neglecting my reciprocal blogger duties of reading other people’s work, which I hope is a good indicator of how busy I’ve been. Reading through the myriad of blogs in my feed reader is normally one of my most favoured activities.

So what is my excuse then?

The same old story really – work. But this time the various jobs are a little different. Besides my regular duties as audio engineer at Paradisec and my unrelenting duties as tutor of first-year linguistics, I have been preparing a grant application with a colleague to continue our work developing electronic dictionaries of minority languages, including dictionaries available as Java applications on your mobile phone¹.

We have also been preparing several papers, conference talks, seminars and so on to detail our project and our process of producing visually-rich multimedia electronic dictionaries from basic wordlists. There are a couple of conferences later in the year that this sort of thing would be perfect for, but we also plan to get a paper sent off to some prestigious lexicography journal somewhere.

As a teaser, here’s an abstract that we sent off to one such conference earlier this month:

Kaurna is the indigenous Australian language of Adelaide and the Adelaide Plains. It has not been actively used since 1929, when the last native speaker died. More recently, efforts have been undertaken to restore Kaurna to a state of community use. One recent project involved the creation of an electronic Kaurna dictionary carried out by a team at the University of Sydney during the first half of 2008. As this was a community-driven project, it had certain requirements, such as the need to archivally preserve the two main documentary sources of Kaurna: a book published in 1840, and a hand-written manuscript from 1857.

In an effort to maximise flexibility, portability and transparency, the Kaurna dictionary project opted for an XML-formatted master dictionary that could then be converted to other formats, such as an HTML web-page, or even a printed dictionary. The current means of presentation is through Kirrkirr, a multimedia-rich dictionary visualisation tool.

In this project we also developed software for presenting the dictionary on mobile phones. Mobile phones are almost ubiquitous today and most modern mobile phones have the memory capacity and features necessary for storing and presenting the dictionary content. They therefore present an excellent opportunity for learners of minority languages to have access to a dictionary. The mobile phone dictionary software is currently in its early stages, but we hope to improve it with further work and make it available to people compiling electronic dictionaries for other languages.
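To give a feel for what "an XML master converted to other formats" can look like in practice, here is a minimal sketch in Python. It is only an illustration: the element names (`entry`, `headword`, `gloss`) and the placeholder entries are invented for the example and are not the project's actual schema or data.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical master dictionary; element names and entries are
# placeholders for illustration, not the real Kaurna data.
MASTER_XML = """
<dictionary>
  <entry><headword>word-a</headword><gloss>first gloss</gloss></entry>
  <entry><headword>word-b</headword><gloss>second &amp; third gloss</gloss></entry>
</dictionary>
"""


def dictionary_to_html(xml_text: str) -> str:
    """Convert the XML master into a simple HTML definition list."""
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root.iter("entry"):
        headword = entry.findtext("headword", default="").strip()
        gloss = entry.findtext("gloss", default="").strip()
        # Escape the text so characters like & and < stay valid in HTML.
        rows.append(f"  <dt>{escape(headword)}</dt><dd>{escape(gloss)}</dd>")
    return "<dl>\n" + "\n".join(rows) + "\n</dl>"


if __name__ == "__main__":
    print(dictionary_to_html(MASTER_XML))
```

The point of keeping a single master file is that a second function could walk the same entries and emit a different format, such as print-ready text, without touching the source data.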

I’ll let you know how it all goes.


  1. You can read all about this project, which began with Kaurna, at a post of mine here, and at James’ post here. James’ post also includes example software for download, in case you want to try any of this out.

To continue the saga of the stolen wordlists (see my own posts on this here and here, or Peter Austin’s posts here and here for background) I’ve decided that if you can’t beat ’em, join ’em.

It is with that in mind that I give you (over the fold) the Murrinh-Patha crossword puzzle, my own creative work, using Philip M. Parker’s online dictionary of the Murrinh-Patha language.

A few posts back, I wrote about a book that David Nash had found on Amazon.com, which appeared to be a bi-directional crossword-puzzle book between English and Wageman [sic¹]. It seemed as though these books, and a few others on Amazon on Wageman, contained the very same wordlist collected by a previous researcher and published under copyright at AIATSIS.

This is by no means an isolated incident. Parker has wordlists for around 600 languages stored online, and could potentially create crossword books, dictionaries and thesauri for each of them. See also Peter Austin’s post at Transient Languages and Cultures regarding a similar thing having happened to the Kamilaroi/Gamilaraay dictionary.

Instead of letting this issue slide into the obscurity of my Mabitjbaran, or Archives, I bought a copy of each, English to Wageman and Wageman to English, and have made contact with the ‘author’, Philip M. Parker, to solicit his explanation of what appears to be a blatant violation of copyright restrictions.

First things first, though. The books actually appear to be a pretty good educational resource, assuming that the school in Pine Creek is up to the point of recommencing its Wagiman language programs, which I’ve only ever seen fleeting evidence of having taken place². The books comprise probably hundreds of automatically generated crosswords with the solution words in alphabetical order at the bottom. In spite of the copyright restrictions imposed by the books’ supposed author, I’ve scanned a page of one of these books, which you can view here.

I’ve also done a little more background research on the author of these books, Philip M. Parker, and as it turns out, he’s not at all involved with dictionary compiling, language work or language education. In actual fact, he’s a professor of marketing and a generic entrepreneur at the Singapore campus of an international private business and marketing college based in France, called INSEAD. He even has a biography page on Wikipedia, which is interesting to this topic, as it goes into detail about his book publishing career. Apparently he’s quite famous in the marketing and entrepreneurial world.

His fame derives from the fact that he has developed a process that automatically produces and prints books on demand, with little or no interactive work. Each book that gets printed costs him an estimated 12 pence Sterling. So good is his software apparently that he has authored 85,764 books on sale at Amazon.com.

Parker estimates that it costs him about 12p to write a book, with, perhaps, not much difference in quality from what a competent wordsmith or an MBA might produce.

Nothing but the title need actually exist until somebody orders a copy. At that point, a computer assembles the book’s content and prints up a single copy.

Not much difference in quality from what a competent³ wordsmith might produce? If you check a random selection of some of these books, you’d be forgiven for not seeing what sort of quality he’s referring to:

The 2007-2012 Outlook for Tufted Washable Scatter Rugs, Bathmats, and Sets That Measure 6-Feet by 9-Feet or Smaller in India

Riveting. And that costs US$495.00, in case you were wondering.

What Parker does is harvest data, irrespective of what sort of data it is, and churn out books with it. It doesn’t matter if no one’s interested in the statistical prognostications for the Indian mid-sized bathmat industry, because each book is printed if and only if someone actually orders it; a copy may never actually exist. But considering there are libraries around the world that will buy a copy of each and every publication under the sun, Parker is probably earning a lot of money.

As I mentioned at the start, I’ve made contact with Parker and courteously attempted to solicit some information, such as which wordlist he used, and whether there were any copyright protections on that data. This is the response I got back:

Thank you for your concern; there are no copyright violations. Please feel free to copy my puzzles for your teaching⁴.

p.s. translations of words, themselves, cannot hold copyright, only the format in which they are presented (translations of single words are public knowledge; translations of creative works are not). I will later be doing anagrams, poems, rhyming sections, etc.. java-based web games (free to use), etc.

I felt a little confused by this response; I’m not very knowledgeable about copyright law and would have expected that someone’s research and work would be protected under copyright. At the same time though, I’m sure that Parker has done his legal research and knows full well what he can and cannot do. Peter Austin has a legal advantage over me in this respect; his Gamilaraay dictionary included some reconstructions:

It is not possible to copyright common knowledge such as words and meanings. Unfortunately for Parker, some of the quoted forms, like muRumuRu on page 11 are creative works since they are reconstitutions which I have posited on the basis of 19th century published and unpublished amateur recordings (as explained in the preface of my dictionaries — note that the orthographic R is not a Gamilaraay sound but a cover term for where I could not determine whether the source represented a flap rr or a continuant r). Now that is copying of creative work without attribution, in my view.

It may turn out to be a little more difficult to demonstrate some ‘creative work’ with the Wagiman dictionary, and we may just have to accept that legally, this sort of blatant plagiarism will be allowed to continue.

Let my warning be this: If you find a book written by Philip M. Parker that looks interesting, avoid it; you can probably find the content online for free.


  1. We spell it Wagiman these days. Wageman was the spelling adopted by earlier researchers, Ethnologue and AIATSIS. Phonetically speaking, I couldn’t judge either way. For ease of fact-checking, I’ll retain the spelling used in the books.
  2. Perhaps Wamut could help me out here.
  3. Notice also that he implies here that he is an incompetent wordsmith.
  4. I take my blog to be ‘teaching’, thereby indemnifying myself against the apparent copyright violation of my publishing of a scan of one of his crosswords.

Over the weekend, David Nash drew my attention to a book that he found on Amazon, that purported to contain bilingual crossword puzzles in English and Wageman¹.

I was a bit perplexed by this, since, well, Wagiman doesn’t have much in the way of practical applications such as second-language learning, that is, of course, beyond the community of Wagiman people. It should be noted at this point though, that this book is not being marketed towards the small community of non-Wagiman speaking Wagiman people, but to a North American audience.

The book is published by a mob called Webster’s Online Dictionary, who I take to have no connection whatsoever to Merriam-Webster, given the look of their respective websites. Theirs appears to contain wordlists of hundreds and hundreds of languages, many of them minority languages, and it seems some of them have been converted to print, albeit in the bizarre form of bidirectional crossword puzzle books.

Here is the product description, as supplied by Amazon, and likely supplied by Philip M. Parker, the person behind Webster’s Online Dictionary:

Webster’s Crossword Puzzles are edited for three audiences. The first audience consists of students who are actively building their vocabularies in either Wageman or English in order to take foreign service, translation certification, Advanced Placement® (AP®) or similar examinations. By enjoying crossword puzzles, the reader can enrich their vocabulary in anticipation of an examination in either Wageman or English.

A translation certificate, Advanced Placement certificate, in Wagiman?  Really?

The second includes Wageman-speaking students enrolled in an English Language Program (ELP), an English as a Foreign Language (EFL) program, an English as a Second Language Program (ESL), or in a TOEFL® or TOEIC® preparation program. The third audience includes English-speaking students enrolled in bilingual education programs or Wageman speakers enrolled in English speaking schools.

EFL, ESL, TOEFL or TOEIC programs being run anywhere near Wagiman country? Really?

However, I can see in this book a benefit for some eventual teaching of Wagiman language in the local school, to help increase literacy in Wagiman, but unfortunately, the book uses an outdated orthography and may actually undermine increased Wagiman literacy efforts.

I wouldn’t want to financially support someone who – it appears – has taken a wordlist published in the public domain² and has created something proprietary, like a book, with the goal of profit in mind, but I think I might still have to have a Wagiman-English crossword puzzle book on my shelf, just for the fun of it.


  1. Wageman was one of the variant spellings. Others include Wakiman (Cook, Austin) and Wogeman (Tryon).
  2. I find it ironic, furthermore, that while the original wordlist was a public domain web-publication, Webster’s Online Dictionary prohibits automatic harvesting of any of their data. I doubt that they copy-pasted each and every entry from the wordlist.

Not long ago, I received a call from a friend in Kybrook Farm. She informed me that an old lady, one of the last remaining Wagiman speakers, had died a little while earlier.

I’ve never experienced the death of a language informant before. I can only describe it like this: it feels exceedingly bitter to know that, in addition to the pain of losing a friend, such a death represents another irreversible step towards the loss of one of the world’s unique languages.

Of all the speakers, she was the best to work with. She was a warm and friendly woman who really enjoyed a laugh and would gladly speak to me for hours, selflessly helping me learn her language.

She will be missed by many, myself included.

Mamak, ngal-marttiwa.

Long-term readers of this blog would probably know that I occasionally like to mess around with Google Earth and to try out new things to do with languages and so forth. It began with an exercise in mapping some known and established place names in the Sydney Metropolitan Area, mostly concentrated in and around the Harbour, and then it moved on to a small project of mine to map the region of the Northern Territory with which Wagiman is traditionally associated¹.

Another project I began, and finished, a while ago, was to take the divided segments of the AIATSIS map of Australia’s Indigenous languages, and overlay them as images onto Google’s Earth. When I say ‘finished’, what I mean is, I’d posted it to the Google Earth community as a downloadable file, but I didn’t know that I’d screwed it up and made the images too transparent to see the language boundaries clearly.

Just the other day though, Jungurra expressed some interest in using it for the Australian Languages course that he’ll be teaching from next week, which prompted me to go and fix it up and make all the images fully opaque. So now, the whole thing can be made transparent so that the images don’t necessarily block the satellite images beneath. The new file can be found here.
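For anyone curious how the transparency of an image overlay is controlled in a Google Earth file, here is a minimal sketch in Python that writes a KML `GroundOverlay`. KML colours are written as `aabbggrr` in hexadecimal, so the first byte sets the opacity. The image name and bounding box below are hypothetical placeholders, not the values used in the actual file.

```python
from xml.sax.saxutils import escape


def ground_overlay_kml(name, image_href, north, south, east, west, opacity=1.0):
    """Return a KML document containing one image overlay.

    opacity runs from 0.0 (invisible) to 1.0 (fully opaque).
    """
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be between 0.0 and 1.0")
    # KML colours are aabbggrr; only the alpha byte changes here,
    # the ffffff part leaves the image colours untouched.
    alpha = round(opacity * 255)
    color = f"{alpha:02x}ffffff"
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <GroundOverlay>
    <name>{escape(name)}</name>
    <color>{color}</color>
    <Icon><href>{escape(image_href)}</href></Icon>
    <LatLonBox>
      <north>{north}</north>
      <south>{south}</south>
      <east>{east}</east>
      <west>{west}</west>
    </LatLonBox>
  </GroundOverlay>
</kml>
"""


if __name__ == "__main__":
    # Hypothetical map segment and bounding box, for illustration only.
    print(ground_overlay_kml("Example segment", "segment.png",
                             north=-10.0, south=-20.0, east=140.0, west=125.0))
```

Saving the images as fully opaque and leaving the fading to the viewer, as described above, corresponds to writing an opacity of 1.0 here and adjusting the transparency slider in Google Earth afterwards.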

Preparing this made me realise just how much of a problem the curvature of the Earth actually is. The further south you get, the more the images have to be contorted into place, and therefore the larger the discrepancy in location at some points. Some of the maps are displaced by anything up to about a hundred kilometres.
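A rough way to see why things get worse further south: on a globe, a degree of longitude covers less ground the further you are from the equator. The sketch below assumes a spherical Earth and uses approximate latitudes for the north, middle and south of the continent. It illustrates the general effect only, not the actual projection of the AIATSIS map, which is not stated here.

```python
import math

# Approximate length of one degree of longitude at the equator, in km.
KM_PER_DEGREE_AT_EQUATOR = 111.32


def km_per_degree_longitude(latitude_deg: float) -> float:
    """Ground distance covered by one degree of longitude at a given latitude."""
    return KM_PER_DEGREE_AT_EQUATOR * math.cos(math.radians(latitude_deg))


if __name__ == "__main__":
    # Latitudes are rough, illustrative values only.
    for label, lat in [("about 12°S", -12.0), ("about 34°S", -34.0), ("about 43°S", -43.0)]:
        print(f"{label}: {km_per_degree_longitude(lat):.1f} km per degree of longitude")
```

An image drawn on a flat map and pinned to a latitude/longitude box therefore has to be stretched by different amounts at its northern and southern edges, which is one plausible contributor to the mismatches described above.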

I don’t know how receptive AIATSIS are to this sort of new-fangled technology, but I think it’s something that they, even in collaboration with Google, should think about, and eventually produce a Google Maps or Google Earth package of files that show languages and language boundaries. I envisage a situation where the language names and boundaries are treated as place names and borders like any others, and not as images that become blurred the further in you zoom.

At the end of the day, this is a bit of fun, but perhaps there are practical applications for widely used tools like Google Earth that linguists, and others, can put to (more) good educational use.

~

<update>
Here’s a screenshot, which I wasn’t able to do earlier. This is with the opacity of the AIATSIS map overlaid images turned quite far down; otherwise, you’d just be looking at the overlay, and it wouldn’t be very interesting. You can also see here how imperfect the fitting together of the original segments is, as there’s quite a lot of overlap, and boundaries that don’t quite match. But you know, I did the best I could. Click on the image for the larger size.

screenshot

You can even see Wagiman in the middle there.
</update>


¹As opposed to ‘where Wagiman is spoken’, for clear sociolinguistic reasons.
