Languages


I read in this morning’s Herald that a school in Victoria has been trialling the use of iPods for facilitating school work. iTouches¹ are being used to research and submit assignments, to download music and for students to communicate with their teachers over email. The results so far suggest that students are much more likely to interact with school work over the medium of an iPod than through more traditional methods, and are more likely to use the iPods than laptops.

This story ties in with the work James and I have done over the past year, which will continue throughout this year, into the use of mobile phones for the maintenance of endangered languages. It also overlaps with the government’s ‘education revolution’ promise of the last election, in which each student receives a laptop.

So far the government’s plan has been marred by cost blowouts - although I’m almost certain this is due to the ‘Government letterhead’ effect² - and concerns about the long-term technical support of the computers. The iTouch wins hands down on both counts, as it’s much cheaper - about 300 bucks as opposed to a grand at least - and it can be easily supported by Apple’s existing technical support infrastructure, especially if the iTouches come with the extended warranty.

Another issue raised here is the future of personal technology - though this is getting considerably geeky of me. I’ve long thought that there is a steadily increasing overlap between personal portable computers and mobile phones. More and more, mobile phones are internet enabled (although costly, as you have to go through your telco), support more data, can run programs, and generally operate like mini-computers. My prediction has been that mobile phones will get bigger and more functional, and laptops will get smaller and more portable, until they meet in the middle with personal PDA-style touchscreen computers with phones in them. Obviously such things have already been created, like Blackberries, iPods and, until recently, Palm Pilots, but the market is only beginning to catch on.

In addition to mobile phone applications for dictionaries of endangered languages, we think we can probably make downloadable programs for other devices, like iPods, and mobile phones that run Android (Google’s open-source and free answer to Apple’s iPhone). And we don’t just mean dictionary viewing programs, but dictionary creation tools as well.

Imagine, for instance, if students of outback schools were equipped with iTouches pre-loaded with bilingual Kriol-English learning programs and pre-configured with a Kriol language pack, so that the iTouch’s menus and options started out in Kriol until such time as their English literacy reached the point where they could switch it over to English.


  1. I’ve written right to the end of this post and realised that I’ve said ‘iTouch’ way too many times. I should point out right from the start that the device may as well be any of this new breed of mobile phone - though preferably something developed by the Open Handset Alliance and running Android. But for ease, I’m just going to refer to ‘iPod’ and ‘iTouch’ all the way through.
  2. The Government letterhead effect is when a private contractor increases their prices exponentially when they receive a quote request with a government letterhead. Remember the guys that wrote ‘No War’ on the Sydney Opera House in red paint? It cost $100,000 to clean.

    As if.

A couple of months ago, I received a phone call from a journalist from the Herald, who’d seen my appearance on SBS World News, and was interested in writing an article about the mobile phone dictionary project.

A few things have happened between then and now, including conferences, holidays and a didjeridu performance by Nicole Kidman on German TV that seems to have absorbed all local interest in indigenous affairs for a few days¹, but on Friday morning, two articles appeared in the front page section of the Herald, based in part on an interview I gave a little while back.

The main article is about Phil Parker, the marketing guru who’s recently delisted his ‘books’ on Australian languages (including dictionaries, thesauruses and crossword puzzle books) after his dubious publications hit the virtual shelves, and after a small but vociferous group of linguists complained. The other article is about this mobile phone dictionary project that James and I are getting more and more involved in, and (very quickly) how this sort of project can prevent the theft of data in the first place.

I feel that the article on Philip Parker makes me look like a bit of a whinger. Here’s the operative quote:

Aidan Wilson, a Sydney University linguist who wrote an honours thesis on the Wagiman language spoken north-west of Katherine, said Professor Parker had used the wrong spelling on the cover of his publication Webster’s English To Wageman Crossword Puzzles: Level 1.

Yes; it’s true that Parker had the wrong spelling, but it’s clearly not the reason I’m annoyed at the publication of these books. I’m more annoyed that the entirety of information within them is publicly available at locations that properly explain the data, the language, and cite sources, while these dictionaries, thesauruses and crossword puzzle books omit all of this information. In short, they are lossy² versions of dictionaries already freely available.

The article also makes it sound like we, speakers from indigenous communities and linguists working with them, have hindered the publication of useful educational resources due to our collective sensitivities. It doesn’t help the situation that Parker probably had his heart in the right place in wanting to further disseminate information relating to critically endangered languages.

A dyslexic, he collects lists of words and publishes dictionaries, thesauruses and crossword puzzles at a loss, he says, in the interests of education. His work has been heralded as a way to create paper resources for resource-starved Third World students.

That’s all well and good, but perfectly good materials already exist - those that the linguists have produced and made freely available in full consultation with the language community. It surely isn’t helpful to convert these into forms in which the information is distilled and compressed such that it no longer conforms to even the minimum standard required for the most basic dictionary. All information apart from the name of the language, the headword and a single gloss has been omitted. That truly is lossy. To give you an idea of what I mean, here’s an entry from the Online Wagiman Dictionary:

ngal-gawu-mang

nominal

1. grandmother (mother’s mother)

Ga-ngotjje-ji-n ngal-gawu-mang-gu. Ga-ngotjje-ji-n gahan warren yerdeng-nga ya-nggi, ngal-gawu-mang warle-na. ‘He is scared of his grandmother. That kid ran away and hid because his grandmother growled him.’ (LM)

2. grandchild (from a woman to her daughter’s children)

see also gawu, ngal-gawu.

You can see that there are no fewer than 6 tiers of information here: a headword, part of speech, glosses divided into multiple senses, illustrative sentences, their glosses and, importantly, the speaker responsible for that illustrative sentence, as well as related words. Parker’s dictionary merely has this:

ngal-gawu-mang
grandmother
grandchild

I don’t think anyone could reasonably argue that the latter is more useful than the former, or even that it is good for it to be around in addition to the original. I would even go so far as to say that its existence in this form is potentially harmful, and that the harm outweighs any possible benefits of it as an educational resource.
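To make the loss concrete, here is a minimal Python sketch of the two versions of the entry. The class and field names (`Entry`, `Sense`, `flatten` and so on) are my own invention for illustration; they are not the format used by the Online Wagiman Dictionary or by Parker’s software. It simply models the tiers listed above and shows what survives once everything but the headword and a bare gloss per sense is thrown away.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sense:
    gloss: str
    note: str = ""         # clarification of the gloss
    example: str = ""      # illustrative sentence(s) in the language
    translation: str = ""  # English gloss of the example
    speaker: str = ""      # who supplied the example


@dataclass
class Entry:
    headword: str
    part_of_speech: str
    senses: List[Sense] = field(default_factory=list)
    see_also: List[str] = field(default_factory=list)


def flatten(entry: Entry) -> List[str]:
    """Keep only the headword and one bare gloss per sense (the lossy form)."""
    return [entry.headword] + [sense.gloss for sense in entry.senses]


# The entry quoted above, typed into the hypothetical structure.
entry = Entry(
    headword="ngal-gawu-mang",
    part_of_speech="nominal",
    senses=[
        Sense(
            gloss="grandmother",
            note="mother's mother",
            example=("Ga-ngotjje-ji-n ngal-gawu-mang-gu. Ga-ngotjje-ji-n gahan "
                     "warren yerdeng-nga ya-nggi, ngal-gawu-mang warle-na."),
            translation=("He is scared of his grandmother. That kid ran away "
                         "and hid because his grandmother growled him."),
            speaker="LM",
        ),
        Sense(gloss="grandchild",
              note="from a woman to her daughter's children"),
    ],
    see_also=["gawu", "ngal-gawu"],
)

print("\n".join(flatten(entry)))
```

Running it prints the same three bare lines as the cut-down version. There is no function that goes the other way: the part of speech, the examples, the speaker and the cross-references cannot be rebuilt from the flattened output.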

There is another issue that stems from this that deserves attention. Suppose you found one of these dictionaries for a language you’ve never heard of. Let’s say it has some pretty extraordinary stuff in it and you’d like to know more, or even go to the sources and do some fact checking. How do you go about doing it? There are no citations given anywhere, no examples have made it through the distillation process and no speakers are referenced. We’re in a different situation as we know the original is a good quality publication due to Stephen Wilson’s work, and can pretty much trust that the ‘distilled’ version will more or less be correct. But if Parker gave the same treatment to a highly dubious dictionary, Urban Dictionary, let’s say, then the output would look just as authoritative as something derived from a reputable source in the first place. This clearly makes it very difficult for readers of dictionaries to make informed decisions about the quality of what they’ve got.

I should reiterate that I think Parker had the best of intentions; to further disseminate information about as many languages as possible, something I naturally admire as a linguist. Yet he fails to recognise that lexicography is not easy work; it can’t be done just with a data-harvester, a spreadsheet and a bunch of automatically generated Amazon.com comments and reviews. It takes linguists and lexicographers years to compile the information and resources necessary to create dictionaries. Producing very low-quality dictionaries, thesauruses and crossword puzzle books of some 600 worldwide languages does nothing but undermine their efforts.


  1. And that’s a whole nother post in its own right.
  2. To borrow an audio term.

I’ve been back in Sydney for almost a week now, having been in Melbourne before that to attend the University of Melbourne Linguistics and Applied Linguistics Postgraduates Conference, where I presented the Kaurna Electronic Dictionary¹ to a sell-out crowd. It was the final leg of an epic, two-part whirlwind tour that began in Wellington almost two weeks ago.


  1. For some background on the dictionary, see these posts (definitely not automatically generated):
    Mobile Phone Dictionaries
    Ceased to Be
    Conferences, Seminars and Dictionaries
    More Good News
    One down, one to go

I didn’t get a chance to post this yesterday as I was too busy after the conference having dinner and ‘sampling’ New Zealand’s finest Monteith’s beers¹, but I think the presentation was mostly a success.

I probably should have refined it a little more on Thursday night instead of heading to the pub and, yes, sampling more of New Zealand’s finest Monteith’s beers, because I think it was a little rushed and felt a bit underbaked, but aside from that I got the feeling that the reception was good. I didn’t leave any time for questions unfortunately, and there were two more talks after mine in the session, meaning people probably let it slip into their subconscious. Nonetheless, there has been some positive feedback.

The four plenary talks were all brilliant. Sarah Ogilvie took a historical look at the impact of James Murray, the first editor of the Oxford English Dictionary, and his understated willingness to be as inclusive of borrowed words as he could, despite some later revisionists’ assertions that he was too stubborn about including foreign words. Bruce Moore, on the other hand, carpeted the Oxford’s more recent publications for sloppy antipodean citations. He showed that many of the multiple citations for obscure Australian and New Zealand words, such as Old Thing for a dish of salted beef and unleavened bread, all derive from a single source, a wordlist of Australian words published in 1941 by Sidney Baker, yet the OED has listed them as separate pieces of evidence.

More relevant to my talk, though, were two other talks yesterday on electronic dictionary systems. One was by Dave Moskowitz, who developed the Freelex dictionary creation software for the adult monolingual Māori dictionary², mostly because he didn’t want to do it all himself. Freelex, as its name might suggest, is free (as in both beer and speech) and open source, and it runs on a MySQL backend. The other talk was by Gilles-Maurice de Schryver, who developed TshwaneLex, a commercial product that does a similar job, but which runs on a proprietary XML-based format at its backend.

Both of those are at hugely more advanced stages of development than our humble XML-based multiple format dictionary project. Even so, the demonstrations of the Kirrkirr Kaurna dictionary and the mobile phone dictionary, which I was able to run on the projector screen in an emulator, were absorbed by the audience with a great deal of interest, especially the idea that mobile phones are just the obvious choice for housing dictionaries in some parts of the world. Such a system, for instance, would be perfect for Southern Africa, which has a similar internet situation to Northern Australia.

Among our many Monteith’s last night, we had a long discussion about some aspects of theoretical lexicography³, such as what purpose dictionaries are meant to serve. Several of the talks referred to dictionary users being put off by things such as labels, parts of speech, scientific names and so on. These talks mentioned ‘training’ the users in how to get the most out of the dictionary. But another point of view, not necessarily my own, that was put forward last night was that it may be better to instead rebuild the dictionary so that it’s what the user wants and needs, rather than to persevere with a non-user-friendly dictionary that tries to shoehorn the audience into it.

For instance, Julie Baillie gave a talk directly after mine, in which she presented Oxford’s new beginner’s wordlist, which uses corpus techniques to find the words most used by younger children who are just beginning to read and write. The inspiration for her research, which culminated in the production of the Oxford Wordlist, was that children in primary school classes were learning to read and write using wordlists created in the 60s and 70s in Europe. They naturally involved concepts foreign to Australian and New Zealand kids and were for the most part useless for the kids to learn to read and write with. She compiled the wordlist by the frequency of words as they appeared in small narratives written by children in target age groups, so it better reflects those children’s worldviews. So, she has rebuilt the dictionary to suit the needs of the user, rather than force the user to conform their needs to the functions of the dictionary.

Brilliant.
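For the curious, the frequency idea behind a wordlist like that can be sketched in a few lines of Python. This is only my own rough illustration, not the method actually used for the Oxford Wordlist: the function name `build_wordlist`, the tokenising rule and the sample sentences are all made up.

```python
import re
from collections import Counter


def build_wordlist(narratives, top_n=10):
    """Return the top_n most frequent words across a list of short texts."""
    counts = Counter()
    for text in narratives:
        # Crude tokeniser: lowercase runs of letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return [word for word, _ in counts.most_common(top_n)]


# Invented sample narratives, standing in for children's writing.
samples = [
    "My dog went to the beach",
    "We went to the beach and I saw a dog",
    "The dog ran on the beach",
]

print(build_wordlist(samples, top_n=3))
```

Feed it narratives written by the target age group and the words that float to the top are the ones those kids actually use, which is the whole point.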

Anyway, that’s one conference down, one to go. I’m off to Melbourne next week for the Unimelb postgrad conference, and perhaps also to discuss the possibility of doing a PhD there beginning in 2010.


  1. These Monteith’s Brewery beers are fantastic, mostly. Unless you like cider you can give the Summer Ale a miss, and the Radler Ale is pretty much like a shandy. By far the best is Original Ale, whose closest Australian analogue would have to be Squire’s Amber Ale. Following closely behind is the Pilsener.

    You can tell that I’ve been busy in research this week.

  2. Which reminds me, I really want to find a copy of a good Māori dictionary before I leave
  3. Far out, I am the King of the Nerds

I’m sneakily writing this during afternoon tea of the first day of Australex on the lectern’s computer, which has an unrestricted internet connection, because I just heard a great New Zealandism that I thought I’d share.

The talk was by Tony Deverson from the University of Canterbury, talking about creating a dictionary of New Zealandisms, and one of those that he brought up was to turn to custard, which is basically equivalent to Australian English to go pear-shaped. That, however, is not the New Zealandism that I want to share. When he was trying to gauge from the audience the wider use of the term, specifically whether it was used in Australia, he referred to Australia as The West Island.

In other news, I present tomorrow, so I’ll post something afterwards about how it unfolds. This will be my first time presenting anything, ever! And now someone needs to set up for their presentation, so I’d better go!

Further to presenting the Kaurna electronic dictionaries at Australex next week, we’ve been invited to give a talk at the University of Melbourne Linguistics & Applied Linguistics Postgraduate Conference 2008, held November 21-22. It’s a great excuse for me to finally visit Melbourne for the first time in… about 13 years.

Then, this morning, we received confirmation that our abstract has been accepted for the 1st International Conference on Language Documentation and Conservation in Honolulu, Hawai’i in March next year! By which time we should both be well and truly stuck into our next phase of the project, being generously supported by a grant from the Hoffman foundation, which you can read about here.

Unfortunately for me, March next year is during the teaching period meaning I won’t be able to attend. But hopefully James will be free then and will present our project to a wider audience.

Last night’s Foreign Correspondent featured a short report about the Aramaic language of the village of Malula, Syria.

The story goes that Aramaic was the language of Jesus and was spoken in a fairly large region of the Middle East, until the 7th and 8th centuries when Arabic spread with Islam. Aramaic speakers - both Christian and Muslim - were apparently persecuted by Arabic-speaking Muslims and anyone who dared speak Aramaic would have their tongue cut out.

As a result, Aramaic was soon restricted to Malula, and survives today with a community of about 5,000 people, split down the middle into Christian and Muslim, but who live in complete harmony with each other. Even the head of the local Coptic church reckons that the Muslims speak better, more traditional Aramaic than the Christians do.

Aramaic is of course the language made famous recently by Mel Gibson’s The Passion of the Christ, except that according to a Malula shepherd, a Muslim who has seen the film a dozen times, the Aramaic is ‘broken’ and they apparently speak too slowly¹.

The video of the report is up already, so if you have a spare ten minutes and are interested in this language, which sounds fantastic by the way, take a look.


  1. In keeping with the theme of accurate depictions of languages in films, I wonder if anyone knows whether the Mayan language in another of Gibson’s epics, that monstrosity Apocalypto, is at all accurate. I doubt it to be honest, as the film isn’t even consistent as to their location. At one point they’re in Guiana, at another they’re in the Yucatán, and later on they’re in Brazil.

As I promised last week, I’ve managed to find a copy of the SBS World News report in which I appeared, that mentions and demonstrates the mobile phone dictionary - thanks to Jeremy who recorded it - and so I’ve put it up here.

Just bear in mind that I had no idea that I was going to be interviewed, which is why I’m unshaven and wearing - ahem - a Transformers T-shirt (Decepticons, no less).

I suppose this destroys for good any semblance of internet anonymity that I had feigned.

<UPDATE>
As Michael noticed, I think the large video file was causing some strife for the company that generously hosts this site, Affernet, so I’ve YouTubed it instead.
</UPDATE>

This morning’s post at Language Log on code switching reminded me that I intended to write about an instance of code switching by a friend of mine that I was fortunate enough to witness.

This friend is South African and her first language is Afrikaans, although she has been speaking Australian English for long enough that she only occasionally appears to have a twang of an accent. She still speaks Afrikaans with members of her family, who all live in Australia and speak often.

I was doing a favour for said friend which basically entailed my sitting in the passenger seat as she drove from her house to the RTA in her “Smart” Car. I put ‘smart’ in double quotes for good reasons, which shall become clear below.

While we were en route, we had the misfortune of running over a nail, which caused one of the rear tyres to deflate, which we noticed only after it was too late; the tubeless tyre was shredded, and would need replacing. My friend’s driving test was minutes away.

“No worries,” I said, “I’ll put the spare on.”

“There’s no spare,” was my friend’s reply. “Smart Cars don’t come with a spare.” Ironically.

With no other option, we limped into a service station and asked whether they could fit a new tyre. The reply was that Smart Cars, being so terribly smart, use a slightly different sized tyre than any other car; their wheel size is absolutely unique and, owing to the minority of the Smart Car market in Australia, tyre suppliers don’t generally keep them in stock. I hope you can see now why I put ‘smart’ in quotes.

To cut a long and largely irrelevant story short, my friend had no way of taking the test that day, so we set off back to her house. On the way, she phoned her brother to tell him, in Afrikaans, what had happened. Now, my Afrikaans is about as good as my Walmajarri, so I won’t try and transcribe it here, but when she related to her brother the cost of a new tyre, she did so while code switching into English.

As her Australian English accent is so good, she’d normally have no issue saying “a hundred and eighty five dollars” as [əˈhʌndɹədnˌeɪɾifaɪv.ˈdɔləz], except that it came out as a particularly stereotypical Afrikaner [əˈhʌndɹətəˌnaɪtifɒf.ˈdɔləs]¹.

This is interesting to me because I’ve barely done any psycholinguistics or bilingualism in my undergrad studies, so I enjoy it when I come across cool little bits of evidence that allow me to make broad generalisations about the mind and the language faculty, such as the following.

This implies to me that my friend, and bilingual speakers in general, have an L1 bit² of their brain and an L2 bit. Each bit contains a lexicon; the vocabulary of each language, and each bit contains a phonology. So, what happens during code switching? From this I’d take a naive guess that code switching is the act of moving out of the Lx bit, and taking a word from the lexicon of Ly (in this case she moved out of her L1 to select a word from her L2’s lexicon).

A question emerges here; where does a word’s phonetic representation come from? I would have previously thought (again, naively) that the mental lexicon contains the phonetic representation, much as a dictionary entry contains an IPA transcription. But here, the borrowed words are fed into the phonology of the borrowing language, so the words don’t bring their phonology with them.

My broad and uneducated conclusion, then, is that within one’s L1 or L2 are contained separate modules of language: a lexicon, a phonology, a syntax and all the rest of it, and when speaking using L1, you use all the modules in that language, diverging from them as little as possible. So code switching allows words to go between L1 and L2 or vice versa, but the phonology being used is still that of the language you’re speaking.

If you were fluent in two languages and code switched from one to the other, I believe it would take a conscious effort to use those borrowed words with their ‘normal’ pronunciation, by which I mean, the pronunciation they usually take in the language they belong to. Conversely, if you’re a learner of a language, you haven’t yet formed a distinct and independent L2, so the pronunciation of the new language is all a conscious act, in which case, when code switching back to your L1, you’d still use your L1 phonology.

The more I think about this, the more it appears to be commonsense, so I apologise if, for instance, you’re an expert in bilingualism and either a) you’re wondering why anyone would bother writing a thousand words on something so natural, or b) I’m completely wrong.


  1. My apologies if you can’t read IPA; just trust me that the way she said it was almost what I’d expect of a satire.
  2. For want of a better term. I realise that there’s no single bit, but I’m talking abstractly.

When collecting field recordings, always, always begin each audio file with a little blurb mentioning the date, the location, who’s present, and what language is being researched. It’ll cost you about 10 seconds of each recording and you’ll sound like a bit of a tool repeating yourself, but you’ll save yourself hours of work years later when you (finally) get around to archiving your recordings and you need to find all this information from other sources, like airline booking confirmation emails.

Oh, and transcribe your recordings while they’re fresh in your head, lest you find yourself devoting countless hours of unpaid work to do so when you have a brazillion¹ other things to do.


  1. I’m alluding to a George W. Bush joke here:
    One of the president’s advisers rushes into the oval office and tells the president that there’s been a terrorist attack in Rio and that 2 Brazilians have been killed.
    “Oh my God!” screams the president, to the astonishment of the adviser, who didn’t think the death of a mere 2 people would have fazed the president so much. “How many are in a brazillion?”
