THE PHRASE

The Oxford English Corpus

WHAT IT IS

An electronic databank of English writing and speech. It holds words, phrases and sentences drawn from blogs, chat rooms, newspapers, magazines and fiction. Lexicographers act as lab scientists, acquiring text to feed into the databank, then studying language patterns and uncovering new words. The resulting corpus is used as a resource when new editions of the Oxford English Dictionary are put together.

HOW IT WORKS

Software sorts through the vast amount of text put into the databank, tagging new words, new meanings and new phrases and sorting them by part of speech, use, author, and so on. Lexicographers study the material once it is sorted. People who want to submit a new phrase can e-mail dictionaries@oup.com. "We're always happy to hear from people who want to have their text in," says Erin McKean, editor in chief of American Dictionaries at Oxford University Press. "It's like volunteering your body for science, only you don't have to die."
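For readers who want a picture of that sorting step, here is a toy sketch in Python. It is not Oxford's software, whose internals the article does not describe. The mini-lexicon KNOWN_WORDS, the sample documents and the function names are all hypothetical, made up for illustration. The idea: split each text into words, file the words the lexicon knows under their part of speech, and set aside anything unknown, along with its author, for a lexicographer to look at.

```python
import re
from collections import defaultdict

# Hypothetical mini-lexicon (an assumption for this sketch): word -> part of speech.
KNOWN_WORDS = {
    "the": "determiner",
    "team": "noun",
    "fans": "noun",
    "game": "noun",
    "before": "preposition",
    "cheer": "verb",
}


def tokenize(text):
    """Lowercase the text and split it into words, keeping hyphenated forms whole."""
    return re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())


def sort_tokens(documents):
    """Group known words by part of speech; collect unknown words with their authors."""
    by_pos = defaultdict(list)
    candidates = defaultdict(list)
    for doc in documents:
        for token in tokenize(doc["text"]):
            pos = KNOWN_WORDS.get(token)
            if pos is None:
                # Not in the lexicon: a possible new word, flagged for human review.
                candidates[token].append(doc["author"])
            else:
                by_pos[pos].append(token)
    return dict(by_pos), dict(candidates)


# Made-up sample texts, not taken from the corpus.
documents = [
    {"author": "blogger_a", "text": "The fans pre-game before the game."},
    {"author": "reporter_b", "text": "Fans cheer the team."},
]

by_pos, candidates = sort_tokens(documents)
print(by_pos)
print(candidates)
```

In this sketch "pre-game" lands in the candidates pile because the toy lexicon has never seen it. A real tagger would also use the surrounding words to decide the part of speech, which a simple lookup cannot do.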

IN THE NEWS

The contents of the corpus hit 1 billion words on Wednesday. Not 1 billion different words, McKean explains. "It has a billion words total," she says. "The word 'the' is 50 million of that billion. Those are 50 million different contexts." Think of it like this: There's an ocean filled with fish, and each fish is a different species. "You don't just study one of each kind of fish," she says. "You want to know how they interact with each other. We're looking at the whole ecosystem of words."
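By McKean's figures, "the" makes up about 5 percent of the corpus (50 million out of 1 billion). The difference between total words and different words, and the notion of a word's contexts, can be sketched in a few lines of Python. The sample sentence and the function names below are assumptions made up for illustration, not anything drawn from the corpus or from Oxford's tools.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())


def contexts(tokens, target, window=2):
    """Return each occurrence of target with up to `window` words on either side."""
    found = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            found.append(" ".join(left + [tok] + right))
    return found


# Made-up sample text standing in for the corpus.
sample = "The cat sat on the mat. The dog sat on the cat."
tokens = tokenize(sample)
counts = Counter(tokens)

total_words = len(tokens)      # every occurrence counts
distinct_words = len(counts)   # each different word counts once
the_contexts = contexts(tokens, "the")

print(total_words, distinct_words, counts["the"])
for line in the_contexts:
    print(line)
```

The sample has 12 words in total but only 6 different ones, and "the" turns up four times, each time with different neighbors. Scaled up, that is the sense in which a billion-word corpus holds 50 million contexts for a single word.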

WHAT IT'S NOT

Merriam-Webster. To keep its dictionaries updated, that publisher uses what's called a citation file, entering individual words, sentences or passages into a computer system, according to its Web site. Oxford does things differently. "We take a whole book, a whole article," says McKean.

NEW WORD OF THE DAY

"Pre-game," as a verb. It's a slang term, meaning to drink a lot of alcohol before an event where no alcohol will be served. The folks at Oxford are aware of it, but McKean says, "I don't see any example so far in the corpus because it's too new."

IT'S BEEN SAID

"The real importance of the corpus is not so much to find new words, but to make sure that we know what's going on in the language as a whole," says McKean. "Obviously, we'll find new words as we pour more text in. But it's kinda like just having a couple of hot days doesn't tell you the weather is changing. You have to be taking lots and lots of measurements everywhere."

MORE TO KNOW

-- Launched in January 2000, the Oxford English Corpus is the world's largest-funded language research project, costing $90,000 to $107,000 per year.

-- The corpus has helped identify how the spellings of common phrases have changed in everyday usage -- like "fazed by" becoming "phased by," or "free rein" becoming "free reign."

-- The corpus collects evidence from all the places on the globe where English is spoken.

LEARN MORE ONLINE

http://www.askoxford.com/oec/

___

Megan Scott is an asap reporter based in New York.

___

Want to comment? Sound off at soundoffasap@ap.org.

©2006 The Associated Press. All rights reserved. This material may not be published, broadcast, rewritten or redistributed.