Full-text computer-assisted research systems have become standard tools for searching large quantities of legal documents. Yet there remain questions as to the relative effectiveness of full-text searching. Mr. Dabney reviews recent research into these questions and discusses the implications of the results for computer-assisted legal research systems. He concludes that the performance of currently available systems could be improved.
[']If men learn this, it will implant forgetfulness in their souls; they will cease to exercise memory because they rely on that which is written, calling things to remembrance no longer from within themselves, but by means of external marks. What you have discovered is a recipe not for memory, but for reminder. And it is no true wisdom that you offer your disciples, but only its semblance, for by telling them of many things without teaching them you will make them seem to know much, while for the most part they know nothing, and as men filled, not with wisdom, but with the conceit of wisdom, they will be a burden to their fellows.' 1

Securities Regulation
II. State Regulation (Blue Sky Laws)
(C) Offenses and Prosecutions
325. Criminal Prosecutions
327. -- Evidence in general.
Traditional legal research procedures are rapidly proving inadequate to permit access to vast, continually expanding reservoirs of information. Based largely in the hierarchical organization of subject matter, manual research tools are effective only so long as the lawyer can easily tune in on the mental frequency of the person who indexed the information the lawyer seeks. While this system has previously been sufficient to meet most of lawyers' research needs, it has grown too cumbersome, too expensive and too rigid to accommodate practically and efficiently either the continuous influx of routine material or such new precedent as lawyers and judges are now formulating in evolving areas of law.13
boy |
child |
youth |
infant |
minor |
juvenile |
ten-year-old |
young man |
son |
brother |
ward |
student |
pupil |
victim |
witness |
plaintiff |
defendant |
appellant |
petitioner |
patient |
girl |
daughter |
ten-and-a-half-year-old |
Table 1

59,104 = total number of documents
6,027 = number of documents containing "school"
685 = number of documents containing "dog"
75 = number of documents containing "sniff!"31
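
The counts above can be turned into a rough estimate of how many documents an AND search should retrieve. The following is a minimal sketch, not a method taken from the article: it assumes that the search terms occur independently of one another, an assumption that note 28 itself describes as false for many legal research problems, so the results are only a baseline for comparison. The function name `expected_and_matches` is hypothetical.

```python
# Document counts reported in the example above.
TOTAL_DOCS = 59_104
DOC_FREQ = {"school": 6_027, "dog": 685, "sniff!": 75}


def expected_and_matches(terms, doc_freq=DOC_FREQ, total=TOTAL_DOCS):
    """Expected number of documents containing ALL of the given terms.

    ASSUMPTION (illustrative only): each term appears in a document
    independently of the others, so the fractions simply multiply.
    """
    expected = float(total)
    for term in terms:
        expected *= doc_freq[term] / total
    return expected


if __name__ == "__main__":
    for query in (["school"], ["school", "dog"], ["school", "dog", "sniff!"]):
        print(" AND ".join(query), round(expected_and_matches(query), 2))
```

Under the independence assumption, each added element shrinks the expected output by that term's share of the collection. Comparing such a baseline with the number of documents a real search returns gives a rough sense of how strongly the terms are associated in the collection.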

| Considered Relevant | Recall | Confidence (95%) | Precision |
|---|---|---|---|
| 1. V + S + M | 19.99% | | 78.97% |
| 2. V + S | 25.30% | | 56.58% |
| 3. V only | 48.24% | | 18.22% |
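
The relationship between the rows of this table can be illustrated with a short calculation. The sketch below is an assumption-laden illustration rather than the procedure used in the study: recall is taken as the fraction of relevant documents that were retrieved, precision as the fraction of retrieved documents that are relevant, and the judgments and retrieved set are invented. The grade labels "V", "S", and "M" follow the table; "N" is a hypothetical label for documents judged not relevant.

```python
def recall_precision(retrieved, judgments, relevant_grades):
    """Return (recall, precision) for one search.

    retrieved       -- set of document ids returned by the search
    judgments       -- dict mapping every document id to a grade string
    relevant_grades -- set of grades counted as relevant in this calculation
    """
    relevant = {doc for doc, grade in judgments.items() if grade in relevant_grades}
    hits = retrieved & relevant
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical judgments for a ten-document collection (not data from the study).
JUDGMENTS = {
    1: "V", 2: "V", 3: "S", 4: "S", 5: "M",
    6: "M", 7: "N", 8: "N", 9: "N", 10: "N",
}
RETRIEVED = {1, 3, 5, 7}

if __name__ == "__main__":
    for grades in ({"V", "S", "M"}, {"V", "S"}, {"V"}):
        r, p = recall_precision(RETRIEVED, JUDGMENTS, grades)
        print(sorted(grades), f"recall={r:.2f}", f"precision={p:.2f}")
```

The same retrieved set yields different figures depending on which grades count as relevant, which is why the table reports a separate row for each definition of relevance.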
Here, appellant argues that the securities sold were exempt from the registration requirement by virtue of A.R.S. §§ 44-1843(8) and 44-1843(10). We note that A.R.S. § 44-2033 places the burden of proving the existence of an exemption upon the party raising the defense. [citing authority]51
In the West digests, this point of law is represented this way:
SECURITIES REGULATION
II. State Regulation (Blue Sky Laws)
(C) Offenses and Prosecutions
325. Criminal Prosecutions
327. -- Evidence in general.
Burden of proving existence of exemption from securities registration requirements is upon party raising the defense. A.R.S. §§ 44-1843, subds. 8, 10, 44-2033.
But the representation of this headnote on WESTLAW is this:
349k327
SECURITIES REGULATION
Burden of proving the existence of exemption from securities registration requirements is upon party raising the defense. A.R.S. §§ 44-1843, subds. 8, 10, 44-2033.
[T]hat's the strange thing about writing, which makes it truly analogous to painting. The painter's products stand before us as though they were alive, but if you question them, they maintain a most majestic silence. It is the same with written words; they seem to talk to you as though they were intelligent, but if you ask them anything about what they say,...they go on telling you just the same thing forever. And once a thing is put in writing, the composition, whatever it may be, drifts all over the place, getting into the hands not only of those who understand it, but equally of those who have no business with it; it doesn't know how to address the right people, and not address the wrong.60
* © Daniel P. Dabney, 1986. This is an edited version of a paper presented at the 78th Annual Meeting of the American Association of Law Libraries, New York, New York, July 9, 1985. It is one of the winning articles in the 1985 Call for Papers competition addressed to newer law librarians.
** Reference Librarian, University of Texas Tarlton Law Library, Austin, Texas. The author would like to thank M.E. Maron and David Blair for their kindness in supplying an advance copy of their article and supplemental material, and Robert C. Betting and Roy M. Mersky for their support and encouragement. Editor's note: Responses to this article from representatives of Mead Data Central and West Publishing Company will be included in the next issue of Law Library Journal.
1. PLATO, PHAEDRUS 275a-b.
2. Blair & Maron, An Evaluation of Retrieval Effectiveness for a Full-Text Document-Retrieval System, 28 Com. A.C.M. 289 (1985) (publication of the Association for Computing Machinery).
3. See, e.g., D. SOERGEL, INDEXING LANGUAGES AND THESAURI: CONSTRUCTION AND MAINTENANCE, 45-50 (1974).
4. P. ENYINGI, M. LEMBKE & R. MITTAN, CATALOGING LEGAL LITERATURE: A MANUAL ON AACR2 AND LIBRARY OF CONGRESS SUBJECT HEADINGS FOR LEGAL MATERIALS 329 (1984).
5. See J. JACOBSTEIN & R. MERSKY, FUNDAMENTALS OF LEGAL RESEARCH 66 (3d ed. 1985).
6. AMERICAN LIBRARY ASSOCIATION, ANGLO-AMERICAN CATALOGING RULES 322 (2d ed. 1978).
7. P. ENYINGI, M. LEMBKE & R. MITTAN, supra note 4, at 358-59.
8. Another effect of subject authority control in indexing may be an influence on the substantive development of the subject of the collection. For example, some of the terms that might be used as subject headings have connotations that implicitly comment on the subject matter so indexed. Consider, for example, that generations of lawyers and judges have found law relating to employment relations under the heading "Master and Servant." This subject heading no doubt seemed reasonable to the legal community of the turn of the century when the heading was incorporated into the West key number system. A different segment of the society of that period might have found it reasonable to put such material under the heading "Toiler and Leech," and colored future perception of the topic in a different way. "Toiler and Leech" seems outrageous to us; "Master and Servant" seems merely archaic, but this is to a large extent the effect of familiarity. For an artful demonstration of differing levels of perceived prejudice in language, see D. HOFSTADTER, A Person Paper on Purity in Language, in METAMAGICAL THEMAS 159 (1985). Many indexing systems make some effort to eliminate bias in their subject headings. For example, between the Eighth and Ninth Decennial Digests, the topic "Bastards" was redesignated "Illegitimate Persons" and has since been changed to "Children-out-of-Wedlock." The precoordination of subject headings in a thesaurus also may affect the development of the literature by making it appear that certain ideas go together and others do not.
9. The greatest objection to the older version of the Index to Legal Periodicals was that it was based on an inadequate thesaurus, one that contained too few subject headings to represent the topics covered by its collection. See, e.g., Report of the Subcommittee on the Index to Legal Periodicals, in PROCEEDINGS OF THE 1976 ANNUAL MEETING OF THE ASSOCIATION OF AMERICAN LAW SCHOOLS, Pt. 1, § 1, at 30, 33-34.
10. State v. Baumann, 125 Ariz. 404, 610 P.2d 38 (1980).
11. In the case used for the preceding illustrations, for example, the DWI is unhelpful. Securities Regulation key number 327 is not posted under "securities regulation ... exemptions," "evidence ... burden of proof ... registration," or "criminal law" in the DWI.
12. Some systems that do not have any mechanism for allowing the user to specify combinations of index terms in a query still claim to be postcoordinated if they permit the assignment of multiple index terms. Such systems might better be described as "uncoordinated."
13. Legal Research and the Computer (1975) (early promotional material from LEXIS).
14. For a general introduction to both the terminology and the substance of information retrieval, see A. FOSKETT, THE SUBJECT APPROACH TO INFORMATION (4th ed. 1982).
15. 28 U.S.C.A. 119-367 (1984).
16. Eldridge, An Appraisal of a Case Law Retrieval Project, in PROCEEDINGS OF THE COMPUTERS AND THE LAW CONFERENCE 1968 at 36, 41 (D. Johnston ed.).
17. Blair & Maron, supra note 2, at 293.
18. Some experts recommend that users of CALR systems use searches broad enough to achieve about 50 percent precision to achieve a satisfactory level of recall. Sprowl, WESTLAW vs LEXIS: Computer Assisted Legal Research Comes of Age, 15 PROGRAM 132, 135 (1981).
19. This is not to say that there is no indexing of any kind in a full-text data base. Documents added to a full-text system are posted to an inverted file (a "concordance") that serves as the index of the system and greatly facilitates its operation. In addition to the file inversion, which is mechanical, most full-text systems identify the inverted file postings as being from a particular part of the document. For example, data bases containing case law typically have "field" or "segment" indicators for the name of the case, the name of the authoring judge, the date of the decision, and so forth. This is not an entirely mechanical process and so can be considered a use of human indexing. However it is viewed, it is one of the most useful features of the system.
20. Even the fastest computers currently available cannot make a linear scan of a body of text as large as the National Reporter System. Full-text search requests are processed using an inverted file of the words appearing in the collection. This distinction is ordinarily invisible to the user of the system, but it does much to explain why the systems are designed as they are.
21. For a review of research in this area, see Waltz, The State of the Art in Natural-Language Understanding, in STRATEGIES FOR NATURAL LANGUAGE PROCESSING (1982).
22. Some work has been done on document retrieval systems that avoid this problem by translating the information contained in the collection into a form that a computer can process, a knowledge representation language. See, e.g., C. HAFNER, AN INFORMATION RETRIEVAL SYSTEM BASED ON A COMPUTER MODEL OF LEGAL KNOWLEDGE (1981).
23. This taxonomy of errors, and some of the examples cited to illustrate it, are taken from J. JACOBSTEIN & R. MERSKY, supra note 5, at 435-37.
24. The selection of elements for a search is discussed further in section VI of this paper.
25. LEXIS and WESTLAW both provide limited assistance to the searcher by having the computer automatically search for words closely related to search terms (such as regular plurals). An expansion of this capability to include less obvious synonyms is seen by some as being a primary means for improving the performance of full-text systems. See Bing, Third Generation Text Retrieval Systems, 1 J. L. & INFORMATION SCI. 183, 191-93 (1983).
26. 393 U.S. 503 (1969).
27. J. JACOBSTEIN & R. MERSKY, supra note 5, at 438-41.
28. Here it is assumed that the various values of Eir and Eif are independent of each other. The independence assumption is false for many (if not most) legal research problems, but the purpose of this discussion is not to provide a practical method for calculating recall and fallout, but rather to show that they vary together. A second assumption is that the connectors used to join the elements are simple "ANDs" rather than more sophisticated proximity connectors. It is not good search practice to use AND connectors, see J. JACOBSTEIN & R. MERSKY, id., but allowing for the effects of proximity connectors would add an unenlightening layer of complexity to this example.
29. It has been hypothesized that the distribution of words in natural language is roughly in proportion to the terms of the harmonic series, that is, that the Nth most common word in the language occurs 1/N as often as the most common word. See G. ZIPF, HUMAN BEHAVIOR AND THE PRINCIPLE OF LEAST EFFORT: AN INTRODUCTION TO HUMAN ECOLOGY (1949). If this is correct, the number of occurrences of common words in a data base increases much more quickly than the size of the lexicon. The most common words have virtually no value as search terms because they are so common, and full-text systems protect themselves against fruitless searching by making many common words part of an unsearchable "stop list." Even after the elimination of the stop list words, however, large full-text systems contain many words that are too common for productive searching.
30. J. JACOBSTEIN & R. MERSKY, supra note 5, at 438-41.
31. All of the figures for this example (including related searches in the New Mexico and federal data bases) were obtained from the LEXIS system from searches run in late April of 1984.
32. The figure used for the size of the GENFED-CASES data base (600,000) is an estimate based upon limited knowledge of the size of the similar data base in WESTLAW. See infra note 40. The author was unable to make LEXIS count the total number of cases in this file.
33. Relevance here was determined by the author. All cases that seemed fairly analogous were considered relevant, not just those sniffing dog cases that were on "all fours" with the hypothetical facts.
34. Manual research was limited to an examination of all of the cases cited in Annot. 31 A.L.R. Fed. 931 (1977) and its October 1984 pocket part. The topic of this annotation, "Use of Trained Dog to Detect Narcotics or Drugs as Unreasonable Search in Violation of Fourth Amendment," might be expected to cover all cases directly on point, but not all other relevant cases. This bias accounts for the concentration of unfound relevant cases in the GENFED-CASES "dog" and "sniff!" search.
35. Blair & Maron, supra note 2.
36. The study included a smaller test in which the requesting attorneys did the actual searching. Id. at 294-95.
37. Id. at 298.
38. Swanson, Searching Natural Language Text by Computer, 132 SCIENCE 1099 (1960).
39. Salton, A New Comparison between Conventional Indexing (MEDLARS) and Automatic Text Processing (SMART), 23 J. AM. SOC'Y FOR INFORMATION SCI. 75 (1972); Salton, Automatic Text Analysis, 168 SCIENCE 335 (1970).
40. To get some indication of the size of the WESTLAW data bases, the author ran the search "BANC COURT MEMORANDUM TRIAL CASE LAW JJ JUDGE JUSTICE PER" in ALLSTATES on April 19, 1985. The search returned 1,066,550 cases. A similar query in ALLFEDS was aborted by the system after some 431,000 cases had been found.
41. For a critical appraisal of several of the seminal document retrieval experiments, see Swanson, Information Retrieval as a Trial-and-Error Process, 47 LIBR. Q. 128 (1977).
42. C. CLEVERDON, J. MILLS & M. KEEN, FACTORS DETERMINING THE PERFORMANCE OF INDEXING SYSTEMS (1966) (referred to in the literature as Cranfield II).
43. Swanson, Some Unexplained Aspects of the Cranfield Tests of Indexing Performance Factors, 41 LIBR. Q. 223 (1971).
44. An acronym for Storage and Information Retrieval System/Thesaurus Linguistic System.
45. The state of New Mexico implemented a STAIRS-based legal research system containing New Mexico Statutes Annotated and recent New Mexico appellate decisions. The author made use of this system in 1979 and 1980 in his capacity as a judicial clerk and found the operation of the system functionally equivalent to that of WESTLAW.
46. See Swanson, supra note 38.
47. For this and the following point, the discussion is based in part on the author's correspondence with David Blair, who supplied information not in the published account of his experiment.
48. Blair & Maron, supra note 2, at 295-96.
49. See, e.g., J. JACOBSTEIN & R. MERSKY, supra note 5, at 438-41.
50. Coco, Full-Text vs. Full-Text Plus Editorial Additions: Comparative Retrieval Effectiveness of the LEXIS and WESTLAW Systems, LEGAL REFERENCE SERVICES Q., Summer 1984, at 27.
51. 125 Ariz. at 412, 610 P.2d at 46 (1980).
52. The full analysis of this number is SECURITIES REGULATION; II. State Regulation (Blue Sky Laws); (A) In General; 277 Renewal, modification, revocation, or suspension. Though the headings do not so indicate, it seems to deal with the licensing of securities dealers.
53. J. BOSWELL, THE LIFE OF SAMUEL JOHNSON, LL.D. 252 (London 1791).
54. In June of 1985, Mead announced that it would add "star paging" to its federal case law data bases. Mead planned to embed bracketed page numbers in the text of its cases so that users could tell from the LEXIS display the exact page on which the corresponding material can be found in paper copy. West has sued to prevent Mead from implementing star paging with respect to West's copyrighted publications and, at the time of this writing, has been awarded a preliminary injunction. West Publishing Company v. Mead Data Central, 616 F. Supp. 1571 (D. Minn. 1985).
55. The benefits of using "terms" ranking appear in this example. All ten of the target cases appear in the first fifty cases retrieved by terms ranking, but "age" ranking buries most of the target cases out of the first 100 cases, with the oldest case ranked as low as 324th.
56. The ease and speed of this technique are attested by the fact that about half of the 400 first-year students trained in the use of CALR systems at the University of Texas Law School in the spring semester of 1985 were able to complete this problem in their first half-hour at a WESTLAW terminal. The students were told to start by finding the first target case in the Missouri cases data base (the Missouri target case is ranked near the top of the output for virtually any plausible search). A reference librarian helped in matters of terminal operation and, to a lesser extent, query formulation.
57. The annotations used are Offsetting Unemployment Benefits Received against Award for Backpay in Employment Discrimination Actions, 66 A.L.R. FED. 880 (1984) and Waiver of Right to Trial by Jury as Affecting Right to Trial by Jury on Subsequent Trial of Same Case in Federal Court, 66 A.L.R. FED. 859 (1984). The issue covered by the latter annotation has been noted as being particularly ill-suited to CALR. J. JACOBSTEIN & R. MERSKY, supra note 5, at 436. This is not a fair comparison between manual research techniques and CALR. The time and effort that goes into the creation of an ALR annotation is much greater than anyone might be expected to give a CALR search. For a checklist that shows the depth of research that goes into an ALR annotation, see the last leaf of What is the Difference between Owning Lawbooks and Owning a "System" of Legal Research?, a promotional booklet distributed by the Lawyers Cooperative Publishing Company.
58. See, e.g., Golden Eagle Distributing Corp. v. Burroughs Corp., 103 F.R.D. 124 (N.D. Cal. 1984), in which sanctions were ordered against a law firm for failing to cite pertinent subsequent authority in an argument.
59. It is ironic that some of these suggested improvements are similar to features of the WESTLAW system that have been abandoned. WESTLAW used to have an on-line listing of all searchable words contained in its data bases, together with figures showing the frequency of appearance of each. WESTLAW also used to promote the use of a nonboolean search logic that accepted lists of search terms and ranked the output according to the number and frequency of the occurrence of the search terms. Finally, WESTLAW also used to depend entirely upon human abstracting (in the form of headnotes) for retrieval. An examination of the reasons for the changes in WESTLAW would be instructive.
60. PLATO, PHAEDRUS 275d-e.