On Concordancing. The Advanced Reader's Collocation Searcher (ARCS)

On Concordancing

What is interesting about the questions Barry and Geoff have raised with respect to concordances is that they intersect two fields--corpus linguistics and cognitive psychology. Corpus linguistics deals with written texts and oral discourses and cognitive psychology deals with what Walter Kintsch calls "situation models." These situation models are the mental representations that listeners/readers construct as they listen to or read the discourse or text. Misunderstanding occurs when the reader constructs a situation model that is not well-related to the text that the reader is trying to process.
    I find the concept of situation model useful in ESL teaching because we can figure out what is unhelpful in a student's situation model for certain kinds of texts (say, mathematical word problems) and try to teach the student strategies that permit him or her to construct better schemas to deal with that kind of text.
    About Geoff's "vacuum cleaner": When somebody decides to create a corpus, he or she makes principled decisions about what will go into it. In a new paperback, Biber, Conrad & Reppen have a nice little section on how one makes such decisions. A corpus is a sampling of the language, and what a linguist samples depends on the questions she wants to ask. Register variation is crucial, and a linguist can sample many registers or can focus upon just certain ones. For instance, one of my grad students in her thesis focussed on the science register for children. The corpus consisted of the chapters on magnetism and electricity in science textbooks published by three different publishers on two grade levels, Grades 3 and 4--a total of six chapters. One result of the analysis was that she found lots of passive clauses. From cognitive psychology, however, we know that 10-year-old children have only a limited understanding of certain types of passives. So the situation model (mental representation of the text) that they build up as they read these texts cannot be accurate, given what we know about the children's psycholinguistic development. Unfortunately, American and Canadian publishers pay absolutely no attention to syntax or discourse structure in the science and social studies textbooks they publish.
    Another of my students investigated the ESL textbook that is used at the University of Havana to teach academic English to biology students and compared her findings to Biber's findings about science writing. The happy outcome of her analysis was that indeed the U of H ESL textbook for biology majors contained the same proportions of syntactic forms & morphological forms that standard science writing contains, with just a few exceptions, and she was able to explain why these exceptions might occur in biology writing.
    I personally wish we did more with corpora in ESL.
    I'm enjoying this conversation because it seems to be bringing forth rather deep reflections from people on linguistics and psychology.

Gloria Sampson (1999)


In my remarks about concordancing I was trying to make a couple of points.

1. Since concordancing is limited to the analysis of text, and since the language is abstracted from the conditions of use, it cannot reveal the discourse functions of textual forms.
2. Concordancing tells us a lot about text that is new and revealing, but we must not be blinded by it. Although corpus analysis provides a detailed profile of what people do with the language, it does not tell us everything about what people know. Chomsky, Quirk, Greenbaum and many others argue that we need to describe language not just in terms of the performed (as Sinclair, Biber, Willis, Lewis and even Johns, at times, suggest) but in terms of the possible. The implication of Sinclair and Biber's argument is that what is not part of the corpus is not part of competence, and this is surely far too narrow a view, which seems to hark back to the behaviourist approach. Greenbaum says "We cannot expect that a corpus, however large, will always display an adequate number of examples.... We cannot know that our sampling is sufficiently large or sufficiently representative to be confident that the absence or rarity of a feature is significant." (Greenbaum, 1988) Significant, that is, of what users know as opposed to what they do. Widdowson points out that in discourse analysis there is increasing recognition of the importance of participant rather than observer perspective. To the extent that those engaged in discourse analysis define observable data in terms of participant experience and recognise the psycho-sociological influences behind the observable behaviour, they too see the actual language as evidence for realities beyond it.
The problem is how do we get at this linguistic cognition, without having to depend on the unreliable and unrepresentative intuitions of the analyst? While the description of utterances is based on empirical observation, it is obviously far more difficult to describe linguistic cognition, since one is forced to rely on introspection. Conceptual elicitation was one stab at it. Many years ago (75???) Rosch devised a questionnaire to elicit from subjects the word which first sprang to mind as an example of a particular category. The results of this conceptual elicitation showed that subjects consistently chose the same hyponym for a particular category: given the superordinate "bird", "robin" was elicited, the word "vegetable" consistently elicited "pea", and so on. The results did not coincide with frequency profiles, and are evidence of a "mental lexicon" that concordancers cannot reach. In summary, the description of language that emerges from concordance-based text analysis has its limitations, and there are good reasons for adopting a wider view. The most important reason is that the wider view allows us to recommend richer criteria on which to base pedagogical practice.
3. Even with its limitations, concordancing is fascinating, and I agree with Tim Johns that it is a great tool in learners' hands. By asking learners to investigate the language with the help of a concordancer and suitable corpora, we encourage them to discover things about the language, one "thing" being that the "traditional" division between lexis and syntax is in fact a very fuzzy line.
4. It's interesting to see Gloria say that people "create" corpora, and of course it's true that they do, since selection is inevitable. But I think that once a corpus gets beyond 300 million words, includes a decent spoken element, and is well-tagged (all of which is true of the British Corpus) then the users are free to ask a pretty wide range of questions!

Geoff Jordan (1999)
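
For readers who have not used one, the core of a concordancer can be sketched in a few lines. What follows is only a minimal keyword-in-context (KWIC) illustration in Python, not the ARCS tool or any program mentioned in the discussion above. The function name `kwic`, the `width` parameter, and the three-sentence sample text are all invented for the example.

```python
import re


def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of `keyword`.

    Hypothetical helper for illustration only: `width` is the number of
    characters of context shown on each side of the keyword.
    """
    lines = []
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context and left-align the right context
        # so that every keyword lines up in the same column.
        lines.append(f"{left:>{width}} [{match.group()}] {right:<{width}}")
    return lines


# Invented sample text, loosely in the style of a children's science chapter.
sample = (
    "The magnet was placed near the nail. The nail was attracted by the "
    "magnet. A magnet attracts iron."
)

for line in kwic(sample, "magnet"):
    print(line)
```

Lining the keyword up in a single column is what lets a learner scan down the page and notice recurring neighbours, such as the passive "was attracted by" in the sample. Note that the sketch works purely on the text, so it shares the limitation raised in point 1: it shows nothing of the conditions of use.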