Languages


For the past couple of weeks I’ve been working my way through several hours of Wagiman recordings from my recent fieldtrip, all the while remarking on how excellent they are. It’s a combination of a good recording device, a Roland Edirol R-4; a great microphone with a proven track record in the field, a Røde NT4¹; and experience in microphone placement and input gain control². I’m finding the best tokens of all the words I recorded for eventual insertion into the electronic versions of the Wagiman dictionary, including a Kirrkirr instance and a mobile phone dictionary.

Splitting the recordings into some 1500 individual sound files is a time-consuming occupation, and unfortunately, as it’s the only one of my many jobs that isn’t actually paying me anything, higher priority tasks often win out.
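For anyone facing the same chore, here is a minimal sketch of how the cutting could be scripted. It is not the process I actually used: it assumes the recordings are plain WAV files and that you already have a list of start and end times for each word (the labels and times below are hypothetical). It only uses Python's standard `wave` module.

```python
import os
import wave


def split_wav(source_path, segments, out_dir="."):
    """Cut one WAV file into several, one per (label, start_sec, end_sec).

    Returns the list of paths written. The labels and times are supplied
    by the caller; nothing here detects word boundaries automatically.
    """
    written = []
    with wave.open(source_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        for label, start, end in segments:
            # Jump to the first frame of the segment and read its frames.
            src.setpos(int(start * rate))
            frames = src.readframes(int((end - start) * rate))
            out_path = os.path.join(out_dir, f"{label}.wav")
            with wave.open(out_path, "wb") as dst:
                # Same channels, sample width and rate as the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            written.append(out_path)
    return written


# Hypothetical usage:
# split_wav("session01.wav", [("word_a", 12.4, 13.1), ("word_b", 15.0, 15.9)], "tokens")
```

The time-consuming part, deciding where each token starts and ends, still has to be done by ear; the script only saves the exporting.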

Eventually though, we’ll have a Wagiman electronic dictionary ready for distribution, and a down-sampled version of the same ready for installation on mobile phones. So keep posted!

[Cross-posted at pfed.info]

  1. Both of which were loaned from PARADISEC.
  2. Gain control was really key in the end, as it was raining most of the time, which would cause low-level hiss if the gain were set too high. Luckily my speaker didn’t mind talking directly and loudly into the microphone, so I was able to keep the gain right down to stop too much ambient noise getting in.

Well, my time in the Territory has come to an end, almost. I’m sitting in Darwin airport waiting for my flight. Not a lot to do in Darwin, so I pretty much came straight here after getting dinner in town. Luckily, I stumbled upon an ethernet port that was obviously for one of those airport internet kiosks – the ones that charge 2 bucks per 8 minutes – that the airport has evidently neglected to disable, meaning I have free broadband internet for the first time in a month!

I’ve got plenty of time to make use of it too; my flight isn’t for another 4 hours¹. I intended to studiously listen to my recordings and split them into individual sound files, one per word, for eventual insertion into the Wagiman Electronic Dictionary, but catching up on old email correspondence, reading old xkcd comics and Language Log posts and downloading the latest Herald cryptic crossword file have sadly taken priority.

My work up here slowed down a little lately, owing to a bunch of meetings in the community this week, and the fact that my informant and I have been getting a little tired of covering the same territory. I actually got caught short this week and didn’t get to finish off the checking of the dictionary content, but I’ll be able to do some final checks the next time I’m up here, probably in the middle of the year².

As far as the dictionary goes, it’s progressing nicely. I’ve been able to make some additions, and get rid of some words that were always dubious. The more recent ethnobiology research from Glenn Wightman will need to be integrated at some stage, but I can do that from Sydney. The software for mobile phone dictionaries is also going steadily, and you can read all, or mostly, about that at pfed.info, the website we’ve created for this project. Demo dictionaries can be downloaded or tested online at pfed.info/wksite, although it’s all still in its infancy.

The reaction to the mobile phone dictionary that I’ve been showing off up here has pretty much been universally positive. Everyone I’ve shown it to has been interested in it, even the adults in the community, although the teenagers took a particular liking to it. Not only does this stand to reason, but it bodes well for what we’re actually trying to achieve with this project: increased access to a dictionary of one’s language in a format that’s easy to use. I haven’t wasted any time in showing it to the linguists up here and they too have shown interest, so much so, in fact, that we’ve gone on to wunderkam³ dictionaries of a further two languages: Dalabon and Bilinarra.

We have a couple of other ideas up our collective sleeve that would potentially aid in the wider use of electronic dictionaries of minority languages, but I don’t want to give anything away just yet⁴.

  1. Actually it’s only 3 by now, such is the time it takes me to write a post these days.
  2. So that I can escape the bitterest of Sydney’s winter, as well as having inadvertently escaped the worst of summer this time around.
  3. This is a backformation from Wunderkammer, the name that James came up with to cover the mobile phone dictionary software. So, what else does a Wunderkammer do if it doesn’t wunderkam? My intended meaning for this word is ‘to convert a dictionary into a mobile phone-ready format’. I felt I needed a new word, since a default ‘do’ would imply that we had a hand in producing the content, which would clearly detract from the hard work of the researchers, language workers and speakers.
  4. More accurately, I don’t want to promise anything that real-world constraints, such as computational impossibility or pecuniary limitations, would prevent me from being able to deliver, but ‘not spoiling the show’ sounds much better.

I reckon I chose about the best time to come to the Northern Territory, given that this weekend in Sydney is meant to be swelteringly hot, 44 degrees odd, while up here it peaks at about 30 degrees before bucketing down with rain in the afternoon.

The work is also going relatively well, given the constraints of working in the rain, and with informants who are increasingly old and decreasingly mobile. I’ve been working with one speaker on clearing up a number of words that have been left out of the dictionary so far due to a lack of data, and we managed to get about half of them back in.

Everyone who has now seen the mobile phone dictionary has been interested in it, most of all the younger adults, who predictably use their phones more than anyone. There has also been some interest in the mobile phone and Kirrkirr dictionaries from the Northern Territory Education Department, a representative of whom saw a demonstration of the software yesterday. This would mean, provided we can get permission from the various people involved, that we’ll be producing a Kirrkirr instance and mobile phone dictionary for Dalabon, a Gunwinyguan language from southern Arnhem Land.

The other main task I have over the next few weeks is to sit down with my speakers, when they can, and systematically go through the list of headwords in the dictionary, and produce clear, audible recordings of each for insertion into the dictionaries.

I can now confirm that I’ll be back in the territory in a little over a week’s time. It’s my first time back there in over 18 months, and it’ll be my first experience of a Northern Territory wet season, so I can’t wait.

The reason I’m going is to do some work for the electronic dictionary of Wagiman that James and I are producing, including a mobile phone version, using generously donated funds from the Hoffman Foundation. I’ll just be going over the revisions that need to be made to the current dictionary, recording sounds and possibly taking photos for inclusion in the dictionary, and discussing with the community how they’d like it to work.

For one thing, there are plenty of words that I know the older speakers don’t particularly want the younger kids to know about, so I’m guessing they’ll want such words ‘hidden’ from the kids’ version of the dictionary. However, as James pointed out to me, the first words younger kids look up in dictionaries are swear words and taboo body parts, and having them there to gawk over gives the kids a way to relate to the dictionary matter.

Also, we’ve decided that it’s about time to set up a website and blog for the project, except we haven’t yet got around to installing the WordPress software. The site will contain information relating to the project, new releases of software, instructions on how to convert Toolbox databases into other formats, and extensive documentation of the whole process.
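To give a flavour of what such a conversion involves, here is a rough sketch, not the project's actual converter. It assumes the database is a plain text file in Toolbox's backslash-coded format and that it uses the common MDF markers `\lx` (headword), `\ps` (part of speech) and `\ge` (English gloss); a real database may well use different markers. The function names are made up for the example.

```python
def parse_toolbox(text, record_marker="lx"):
    """Parse backslash-coded Toolbox text into a list of records.

    Each record is a list of (marker, value) pairs, so repeated fields
    such as multiple glosses keep their original order.
    """
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            # A non-empty line without a marker continues the previous field.
            if line and records:
                marker, value = records[-1][-1]
                records[-1][-1] = (marker, (value + " " + line).strip())
            continue
        marker, _, value = line[1:].partition(" ")
        if marker == record_marker:
            records.append([])  # a new headword starts a new record
        if records:  # fields before the first record (file header) are skipped
            records[-1].append((marker, value.strip()))
    return records


def group_fields(record):
    """Collect a record's values under each marker, preserving order."""
    grouped = {}
    for marker, value in record:
        grouped.setdefault(marker, []).append(value)
    return grouped


# A tiny hand-written sample; the header line and markers are assumptions.
SAMPLE = r"""\_sh v3.0  400  MDF
\lx ngal-gawu-mang
\ps nominal
\ge grandmother
\ge grandchild
"""

entries = [group_fields(r) for r in parse_toolbox(SAMPLE)]
```

From a structure like `entries`, writing out XML, JSON or whatever a phone application needs is the easy part; the fiddly bit is coping with the quirks of each individual database.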

<update>
The PFED website and blog are now up and running!
</update>

I read in this morning’s Herald that a school in Victoria has been trialing the use of iPods for facilitating school work. iTouches¹ are being used to research and submit assignments, to download music and for students to communicate with their teachers over email. The results so far suggest that students are much more likely to interact with school work over the medium of an iPod than more traditional methods, and are more likely to use the iPods than laptops.

This story ties in with the work James and I have been doing over the past year, which will continue throughout this year, into the use of mobile phones for the maintenance of endangered languages. It also overlaps with the government’s ‘education revolution’ promise of the last election, in which each student receives a laptop.

So far the government’s plan has been marred by cost blowouts – although I’m almost certain this is due to the ‘Government letterhead’ effect² – and concerns about the long-term technical support of the computers. The iTouch wins hands down on both counts, as they’re much cheaper – about 300 bucks as opposed to a grand at least – and they can be easily supported by Apple’s existing technical support infrastructure, especially if the iTouches come with the extended warranty.

Another issue raised here is the future of personal technology – though this is getting considerably geeky of me. I’ve long thought that there was an ever-increasing overlap between personal portable computers and mobile phones. More and more, mobile phones are internet enabled (although costly, as you have to go through your telco), support more data, can run programs, and generally operate like mini-computers. My prediction has been that mobile phones will get bigger and more functional, and laptops will get smaller and more portable, until they meet in the middle with personal PDA-style touchscreen computers with phones in them. Obviously such things have already been created, like Blackberries, iPods and, until recently, Palm Pilots, but the market is only beginning to catch on.

In addition to mobile phone applications for dictionaries of endangered languages, we think we can probably make downloadable programs for other devices, like iPods, and mobile phones that run Android (Google’s open-source and free answer to Apple’s iPhone). And we don’t just mean dictionary viewing programs, but dictionary creation tools as well.

Imagine, for instance, if students of outback schools were equipped with iTouches pre-loaded with bilingual Kriol-English learning programs, and were pre-configured with a Kriol language pack, so that the iTouch’s menus and options started out in Kriol, until such a time as their English literacy reaches the point where they can switch it over to operate it in English.

  1. I’ve written right to the end of this post and realised that I’ve said ‘iTouch’ way too many times. I should point out right from the start that the device may as well be any of this new breed of mobile phone – though preferably something developed by the Open Handset Alliance and running Android. But for ease, I’m just going to refer to ‘iPod’ and ‘iTouch’ all the way through.
  2. The Government letterhead effect is when a private contractor increases their prices exponentially when they receive a quote request with a government letterhead. Remember the guys that wrote ‘No War’ on the Sydney Opera House in red paint? It cost $100,000 to clean.

    As if.

A couple of months ago, I received a phone call from a journalist from the Herald, who’d seen my appearance on SBS World News, and was interested in writing an article about the mobile phone dictionary project.

A few things have happened between then and now, including conferences, holidays and a didjeridu performance by Nicole Kidman on German TV that seems to have absorbed all local interest in indigenous affairs for a few days¹, but on Friday morning, two articles appeared in the front page section of the Herald, based in part on an interview I gave a little while back.

The main article is about Phil Parker, the marketing guru who’s recently delisted his ‘books’ on Australian languages (including dictionaries, thesauruses and crossword puzzle books) after his dubious publications hit the virtual shelves, and after a small but vociferous group of linguists complained. The other article is about this mobile phone dictionary project that James and I are getting more and more involved in, and (very quickly) how this sort of project can prevent the theft of data in the first place.

I feel that the article on Philip Parker makes me look like a bit of a whinger. Here’s the operative quote:

Aidan Wilson, a Sydney University linguist who wrote an honours thesis on the Wagiman language spoken north-west of Katherine, said Professor Parker had used the wrong spelling on the cover of his publication Webster’s English To Wageman Crossword Puzzles: Level 1.

Yes, it’s true that Parker had the wrong spelling, but it’s clearly not the reason I’m annoyed at the publication of these books. I’m more annoyed that the entirety of information within them is publicly available at locations that properly explain the data, the language, and cite sources, while these dictionaries, thesauruses and crossword puzzle books omit all of this information. In short, they are lossy² versions of dictionaries already freely available.

The article also makes it sound like we, members of indigenous communities and the linguists working with them, have hindered the publication of useful educational resources due to our collective sensitivities. It doesn’t help the situation that Parker probably had his heart in the right place in wanting to further disseminate information relating to critically endangered languages.

A dyslexic, he collects lists of words and publishes dictionaries, thesauruses and crossword puzzles at a loss, he says, in the interests of education. His work has been heralded as a way to create paper resources for resource-starved Third World students.

That’s all well and good, but perfectly good materials already exist – those that the linguists have produced and made freely available in full consultation with the language community. It surely isn’t helpful to convert these into forms in which the information is distilled and compressed such that it no longer conforms to even the minimum standard required for the most basic dictionary. All information apart from the name of the language, the headword and a single gloss has been omitted. That truly is lossy. To give you an idea of what I mean, here’s an entry from the Online Wagiman Dictionary:

ngal-gawu-mang

nominal

1. grandmother (mother’s mother)

Ga-ngotjje-ji-n ngal-gawu-mang-gu. Ga-ngotjje-ji-n gahan warren yerdeng-nga ya-nggi, ngal-gawu-mang warle-na. ‘He is scared of his grandmother. That kid ran away and hid because his grandmother growled him.’ (LM)

2. grandchild (from a woman to her daughter’s children)

see also gawu, ngal-gawu.

You can see that there are no fewer than 6 tiers of information here: a headword, part of speech, glosses divided into multiple senses, illustrative sentences, their glosses and, importantly, the speaker responsible for that illustrative sentence, as well as related words. Parker’s dictionary merely has this:

ngal-gawu-mang
grandmother
grandchild

I don’t think anyone could reasonably argue that the latter is more useful than the former, or even that it is good for it to be around in addition to the original. I would even go as far as to say that its existence in this form is potentially harmful and outweighs any possible benefits of it as an educational resource.
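For the programmers among you, the difference between the two entries can be sketched as a data structure. This is only an illustration: the class and field names are my own invention for this post, not the format of the Online Wagiman Dictionary or of Parker's books, and `strip_entry` simply mimics the kind of reduction described above.

```python
from dataclasses import dataclass, field


@dataclass
class Example:
    vernacular: str
    translation: str
    speaker: str  # who provided the sentence


@dataclass
class Sense:
    gloss: str
    examples: list = field(default_factory=list)


@dataclass
class Entry:
    headword: str
    part_of_speech: str
    senses: list
    see_also: list = field(default_factory=list)


def strip_entry(entry):
    """Reduce an entry to its headword and bare glosses, discarding the rest."""
    # Drop any parenthesised explanation that follows the gloss.
    return [entry.headword] + [s.gloss.split(" (")[0] for s in entry.senses]


ENTRY = Entry(
    headword="ngal-gawu-mang",
    part_of_speech="nominal",
    senses=[
        Sense(
            gloss="grandmother (mother’s mother)",
            examples=[
                Example(
                    vernacular="Ga-ngotjje-ji-n ngal-gawu-mang-gu.",
                    translation="He is scared of his grandmother.",
                    speaker="LM",
                )
            ],
        ),
        Sense(gloss="grandchild (from a woman to her daughter’s children)"),
    ],
    see_also=["gawu", "ngal-gawu"],
)

LOSSY = strip_entry(ENTRY)
```

Going from `ENTRY` to `LOSSY` is a one-liner; going the other way is impossible, because the part of speech, the examples, the speaker and the cross-references are simply gone.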

There is another issue that stems from this that deserves attention. Suppose you found one of these dictionaries for a language you’ve never heard of. Let’s say it has some pretty extraordinary stuff in it and you’d like to know more, or even go to the sources and do some fact checking. How do you go about doing it? There are no citations given anywhere, no examples have made it through the distillation process and no speakers are referenced. We’re in a different situation as we know the original is a good quality publication due to Stephen Wilson’s work, and can pretty much trust that the ‘distilled’ version will more or less be correct. But if Parker gave the same treatment to a highly dubious dictionary, Urban Dictionary, let’s say, then the output looks just as authoritative as something that derived from a reputable source in the first place. This clearly makes it very difficult for readers of dictionaries to make informed decisions about the quality of what they’ve got.

I should reiterate that I think Parker had the best of intentions; to further disseminate information about as many languages as possible, something I naturally admire as a linguist. Yet he fails to recognise that lexicography is not easy work; it can’t be done just with a data-harvester, a spreadsheet and a bunch of automatically generated Amazon.com comments and reviews. It takes linguists and lexicographers years to compile the information and resources necessary to create dictionaries. Producing very low-quality dictionaries, thesauruses and crossword puzzle books of some 600 worldwide languages does nothing but undermine their efforts.

  1. And that’s a whole nother post in its own right.
  2. To borrow an audio term.

I’ve been back in Sydney for almost a week now, having been in Melbourne before that to attend the University of Melbourne Linguistics and Applied Linguistics Postgraduates Conference, where I presented the Kaurna Electronic Dictionary¹ to a sell-out crowd. It was the final leg of an epic, two-part whirlwind tour that began in Wellington almost two weeks ago.

  1. For some background on the dictionary, see these posts (definitely not automatically generated):
    Mobile Phone Dictionaries
    Ceased to Be
    Conferences, Seminars and Dictionaries
    More Good News
    One down, one to go

I didn’t get a chance to post this yesterday as I was too busy after the conference having dinner and ‘sampling’ New Zealand’s finest Monteith’s beers¹, but I think the presentation was mostly a success.

I probably should have refined it a little more on Thursday night instead of heading to the pub and, yes, sampling more of New Zealand’s finest Monteith’s beers, because I think it was a little rushed and felt a bit underbaked, but aside from that I got the feeling that the reception was good. I didn’t leave any time for questions unfortunately, and there were two more talks after mine in the session, meaning people probably let it slip into their subconscious. Nonetheless, there has been some positive feedback.

The four plenary talks were all brilliant. Sarah Ogilvie took a historical look at the impact of James Murray, the first editor of the Oxford English Dictionary, and his understated willingness to be as inclusive of borrowed words as he could, despite some later revisionists’ assertions that he was too stubborn about including foreign words. Bruce Moore, on the other hand, carpeted the Oxford’s more recent publications for sloppy antipodean citations, showing that many of the multiple citations for obscure Australian and New Zealand words, such as Old Thing for a dish of salted beef and unleavened bread, all derived from a single source, a wordlist of Australian words published in 1941 by Sidney Baker, yet the OED has listed them as separate pieces of evidence.

More relevant to my talk though, were two other talks yesterday on electronic dictionary systems. One was by Dave Moskowitz who developed the Freelex dictionary creation software for the adult monolingual Māori dictionary², mostly because he didn’t want to do it all himself. Freelex, as its name might suggest, is free (as in both beer and speech) and open source, and it runs on a MySQL backend. The other talk was by Gilles-Maurice de Schryver who developed TshwaneLex, a commercial product that does a similar job, but which runs on a proprietary format at its backend, based on XML.

Both of those are at hugely more advanced stages of development than our humble XML-based multiple format dictionary project. Even so, the demonstrations of the Kirrkirr Kaurna dictionary and the mobile phone dictionary, which I was able to run on the projector screen in an emulator, were received by the audience with a great deal of interest, especially the idea that mobile phones are just the obvious choice for housing dictionaries in some parts of the world. Such a system, for instance, would be perfect for Southern Africa, which has a similar internet situation to Northern Australia.

Among our many Monteith’s last night, we had a long discussion about some aspects of theoretical lexicography³ such as what purpose dictionaries are meant to serve. Several of the talks referred to dictionary users being put off by things such as labels, parts of speech, scientific names and so on. These talks mentioned ‘training’ the users how to get the most out of that dictionary. But another point of view, not necessarily my own, that was put forward last night was that it may be better to instead rebuild the dictionary so that it’s what the user wants and needs, rather than to persevere with a non-user-friendly dictionary that tries to shoehorn the audience into it.

For instance, Julie Baillie gave a talk directly after mine, in which she presented Oxford’s new beginner’s wordlist, which uses corpus techniques to find the words most used by younger children, who are just beginning to read and write. The inspiration for her research, which culminated in the production of the Oxford Wordlist, was that children in primary school classes were learning to read and write using wordlists created in the 60s and 70s in Europe. They naturally involved concepts foreign to Australian and New Zealand kids and were for the most part useless for the kids to learn to read and write with. She compiled the wordlist by the frequency of words as they appeared in small narratives written by children in the target age groups, so that it better reflects those children’s worldviews. So, she has rebuilt the dictionary to suit the needs of the user, rather than force the user to conform their needs to the functions of the dictionary.

Brilliant.
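The basic idea of a frequency-based wordlist is simple enough to sketch in a few lines. To be clear, this is my own toy illustration and not the method behind the Oxford Wordlist, which I only know from the talk; the sample ‘narratives’ are invented and the tokenisation is deliberately crude.

```python
import re
from collections import Counter


def frequency_wordlist(texts, top_n=10):
    """Return the top_n most frequent words across a collection of texts."""
    counts = Counter()
    for text in texts:
        # Lowercase and keep runs of letters and apostrophes as words.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return [word for word, _ in counts.most_common(top_n)]


# Invented stand-ins for children's short narratives.
NARRATIVES = [
    "My dog ran to the beach",
    "The dog ate my lunch",
    "The beach was hot",
]

WORDLIST = frequency_wordlist(NARRATIVES, top_n=4)
```

A real project would need a far larger corpus, and decisions about spelling variants and inflected forms, but the principle of letting the children's own writing decide what goes in the list is the same.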

Anyway, that’s one conference down, one to go. I’m off to Melbourne next week for the Unimelb postgrad conference, and perhaps also to discuss the possibility of doing a PhD there beginning in 2010.

  1. These Monteith’s Brewery beers are fantastic, mostly. Unless you like cider you can give the Summer Ale a miss, and the Radler is pretty much like a shandy. By far the best is Original Ale, whose closest Australian analogue would have to be Squire’s Amber Ale. Following closely behind is the Pilsener.

    You can tell that I’ve been busy in research this week.

  2. Which reminds me, I really want to find a copy of a good Māori dictionary before I leave.
  3. Far out, I am the King of the Nerds.

I’m sneakily writing this during afternoon tea of the first day of Australex on the lectern’s computer, which has an unrestricted internet connection, because I just heard a great New Zealandism that I thought I’d share.

The talk was by Tony Deverson from the University of Canterbury, talking about creating a dictionary of New Zealandisms, and one of those that he brought up was to turn to custard, which is basically equivalent to Australian English to go pear-shaped. That, however, is not the New Zealandism that I want to share. When he was trying to gauge from the audience the wider use of the term, specifically whether it was used in Australia, he referred to Australia as The West Island.

In other news, I present tomorrow, so I’ll post something afterwards about how it unfolds. This will be my first time presenting anything, ever! And now someone needs to set up for their presentation, so I’d better go!

Further to presenting the Kaurna electronic dictionaries at Australex next week, we’ve been invited to give a talk at the University of Melbourne Linguistics & Applied Linguistics Postgraduate Conference 2008, held November 21-22. It’s a great excuse for me to finally visit Melbourne for the first time in… about 13 years.

Then, this morning, we received confirmation that our abstract has been accepted for the 1st International Conference on Language Documentation and Conservation in Honolulu, Hawai’i in March next year! By that time we should both be well and truly stuck into the next phase of the project, which is being generously supported by a grant from the Hoffman Foundation, which you can read about here.

Unfortunately for me, March next year is during the teaching period, meaning I won’t be able to attend. But hopefully James will be free then and will present our project to a wider audience.
