Article 14491 of sci.lang:
From: fass@cs.sfu.ca (Dan Fass)
Subject: Re: Basic English List
Organization: Faculty of Applied Science, Simon Fraser University
Date: Thu, 27 Oct 1994 19:41:56 GMT

> Could anyone direct me to or send me a list of the 850 Basic English
> words ... created by C.K. Ogden.

Here's the list; Mark Robert Thorson posted it on sci.lang a while back
(in 1989).

- Dan Fass
Natural Language Laboratory, Simon Fraser University

From: mmm@cup.portal.com (Mark Robert Thorson)
Newsgroups: sci.lang
Subject: Re: Artificial Language References?
Date: 2 Sep 89 06:05:09 GMT

There are five main parts to the Basic English word list:
operators, things (general), things (pictured), qualities (general),
qualities (opposite)

The 100 operators are:

come get give go keep let make put seem take be do have say see send
may will about across after against among at before between by down
from in off on over through to under up with as for of till than a the
all any every no other some such that this I he you who and because
but or if though while how when where why again ever far forward here
near now out still there then together well almost enough even little
much not only quite so very tomorrow yesterday north south east west
please yes

The 400 general things are:

account act addition adjustment advertisement agreement air amount
amusement animal answer apparatus approval argument art attack attempt
attention attraction authority back balance base behaviour belief
birth bit bite blood blow body brass bread breath brother building
burn burst business butter canvas care cause chalk chance change cloth
coal colour comfort committee company comparison competition condition
connection control cook copper copy cork cotton cough country cover
crack credit crime crush cry current curve damage danger daughter day
death debt decision degree design desire destruction detail
development digestion direction discovery discussion disease disgust
distance distribution division doubt
drink driving dust earth edge education effect end error event example
exchange existence expansion experience expert fact fall family father
fear feeling fiction field fight fire flame flight flower fold food
force form friend front fruit glass gold government grain grass grip
group growth guide harbour harmony hate hearing heat help history hole
hope hour humour ice idea impulse increase industry ink insect
instrument insurance interest invention iron jelly join journey judge
jump kick kiss knowledge land language laugh law lead learning leather
letter level lift light limit linen liquid list look loss love machine
man manager mark market mass meal measure meat meeting memory metal
middle milk mind mine minute mist money month morning mother motion
mountain move music name nation need news night noise note number
observation offer oil operation opinion order organization ornament
owner page pain paint paper part paste payment peace person place
plant play pleasure poison point polish porter position powder power
price print process produce profit property prose protest pull
punishment purpose push quality question rain range rate ray reaction
reading reason record regret relation religion representative request
respect rest reward rhythm rice river road roll room rub rule run salt
sand scale science sea seat secretary selection self sense servant sex
shade shake shame shock side sign silk silver sister size sky sleep
slip slope smash smell smile smoke sneeze snow soap society son song
sort sound soup space stage start statement steam steel step stitch
stone stop story stretch structure substance sugar suggestion summer
support surprise swim system talk taste tax teaching tendency test
theory thing thought thunder time tin top touch trade transport trick
trouble turn twist unit use value verse vessel view voice walk war
wash waste water wave wax way weather week weight wind wine winter
woman wood wool word work wound writing year

The 200 picturable things are:

angle ant apple arch arm army baby bag ball band basin basket bath bed
bee bell berry bird blade board boat bone book boot bottle box boy
brain brake branch brick bridge brush bucket bulb button cake camera
card carriage cart cat chain cheese chest chin church circle clock
cloud coat collar comb cord cow cup curtain cushion dog door drain
drawer dress drop ear egg engine eye face farm feather finger fish
flag floor fly foot fork fowl frame garden girl glove goat gun hair
hammer hand hat head heart hook horn horse hospital house island jewel
kettle key knee knife knot leaf leg library line lip lock map match
monkey moon mouth muscle nail neck needle nerve net nose nut office
orange oven parcel pen pencil picture pig pin pipe plane plate plough
pocket pot potato prison pump rail rat receipt ring rod roof root sail
school scissors screw seed sheep shelf ship shirt shoe skin skirt
snake sock spade sponge spoon spring square stamp star station stem
stick stocking stomach store street sun table tail thread throat thumb
ticket toe tongue tooth town train tray tree trousers umbrella wall
watch wheel whip whistle window wing wire worm

The 100 qualities are:

able acid angry automatic beautiful black boiling bright broken brown
cheap chemical chief clean clear common complex conscious cut deep
dependent early elastic electric equal fat fertile first fixed flat
free frequent full general good great grey hanging happy hard healthy
high hollow important kind like living long male married material
medical military natural necessary new normal open parallel past
physical political poor possible present private probable quick quiet
ready red regular responsible right round same second separate serious
sharp smooth sticky stiff straight strong sudden sweet tall thick
tight tired true violent waiting warm wet wide wise yellow young

The 50 opposites are:

awake bad bent bitter blue certain cold complete cruel dark dead dear
delicate different dirty dry false feeble female foolish
future green ill last late left loose loud low mixed narrow old
opposite public rough sad safe secret short shut simple slow small
soft solid special strange thin white wrong

The rules are:

plurals as usual with '-s' or '-es'
nouns with '-er', such as plant -> planter
nouns with '-ing' such as plant -> planting
adjectives with '-ed' such as plant -> planted
adverbs with '-ly' such as quick -> quickly
negation with 'un-' such as common -> uncommon
compound words such as wastebasket and chalkboard

When you have the thought = "Would it be good I be learning the
ESPERANTO language?"

Please at the same time have the thought = "Why not BASIC ENGLISH--THE
INTERNATIONAL SECOND LANGUAGE?"

BASIC has a much smaller number of words and rules. 850 words.
BASIC is clear and simple.


Article 76556 of alt.usage.english:
From: J.Wexler@ed.ac.uk (John Wexler)
Subject: Re: Basic English word list
Date: 26 Jan 1996 17:43:13 GMT
Organization: The University of Edinburgh

jajones@sable.ox.ac.uk (Jonathan Jones) and Keith C. Ivey
<kcivey@cpcug.org> have pointed out some potential problems with Basic
English. I'm not qualified to comment on them. But I thought people
might be interested in a sample:

"In another minute we were face to face, I and this delicate thing out
of the Future. He came straight up to me, laughing into my eyes. What
took my attention first was that he gave no sign of fear. Then,
turning to the two others who were at his back, he said something to
them in a strange and very sweet and liquid tongue.

"There were others coming, and in a short time a little group of
possibly eight or ten of these surprisingly beautiful beings were
about me. One of them said something to me. It came into my head,
strangely enough, that my voice would seem very rough and deep to
them. So I made signs to him, shaking my head, pointing to my ears,
and then shaking my head again. He took a step forward, seemed to be
in doubt for a little, and then gave my hand a touch.
Then there came other soft little feelers touching my back and arms.
They were seeing if I was truly a living being. There was no cause for
fear in any of this. In fact there was something in these sweet little
beings which gave one a sense of peace - they did everything so
smoothly and quietly, and were as simple and natural as boys and
girls. And then they seemed so feeble and delicate that I was certain
I was strong enough to get the better of all twelve of them if
necessary."


Article 25871 of sci.lang:
From: Annette Padfield <Annette@vannin.demon.co.uk>
Subject: 100 core words, as requested [was: Average vocabulary size?]
Date: Wed, 08 Nov 95 00:34:26 GMT
Organization: Home

Here is the list of 100 words of the Basic Core Vocabulary, as set out
by Morris Swadesh, from "Archaeology and Language" by Colin Renfrew:

I, you, we, this, that, who, what, not, all, many, one, two, big,
long, small, woman, man, person, fish, bird, dog, louse, tree, seed,
leaf, root, bark, skin, flesh, blood, bone, grease, egg, horn, tail,
feather, hair, head, ear, eye, nose, mouth, tooth, tongue, claw, foot,
knee, hand, belly, neck, breasts, heart, liver, drink, eat, bite, see,
hear, know, sleep, die, kill, swim, fly, walk, come, lie, sit, stand,
give, say, sun, moon, star, water, rain, stone, sand, earth, cloud,
smoke, fire, ash, burn, path, mountain, red, green, yellow, white,
black, night, hot, cold, full, new, good, round, dry, name.

I wonder how he decided which words to put in and which to leave out?
Apparently his original list was 200 words, but he thinned it out.

--
Annette  |\ |/ |/ |\/| /|\ /|\ |\/| , |\/ |\ |\/| |/ | |\/| |\ |\/|
Padfield |\ /| /| | | | | | | ' |/\ |\ |/\| |/ | | | | |/\|


Article 1979 of sci.lang:
From: gburnage@natcorp.ox.ac.uk (Gavin Burnage)
Subject: Re: Online English Word Frequency list?
Date: 3 Nov 92 12:12:12 GMT
Organization: BNC, Oxford University, UK

In article <1992Oct29.170240.28464@msuinfo.cl.msu.edu>
cook@cpsin2.cps.msu.edu (Thomas E Cook) writes:
>Can anyone tell me if there is a good English word frequency list
>available online?
>
>Thanks.
>
>--Tom Cook (cook@cps.msu.edu)

CELEX, the Dutch national Centre for Lexical Information based at the
University of Nijmegen, provides frequency information for British
English derived from the COBUILD corpus of the University of
Birmingham.

Raw figures for the frequency of all the types in the corpus are the
most basic frequency information available. More usefully, figures are
also available for the frequency of each wordform (flectional form)
and for the frequency of each lemma (headword, or family of
inflectional forms). These figures were arrived at by disambiguating
samples of the corpus by hand, then estimating frequencies for the
whole corpus, and calculating the statistical accuracy of those
estimates.

Such figures are available for the whole corpus (17.9 million), the
written part of the corpus (16.6 million words), and the spoken part
of the corpus (1.3 million), expressed simply as the number of
occurrences in each (sub) corpus. They are also available scaled down
to what they would have been in a 1 million word corpus (useful for
checking written frequency against spoken frequency, and for checking
this corpus against others, notably Brown and LOB), and in logarithmic
form.

Until recently, CELEX existed to provide researchers at home and
abroad with such information. Some charges were made to fund the
continued existence and development of the centre. The current
situation is not clear, however, and in the near future CELEX data may
not be available to external users. The best thing to do is write to
Richard Piepenbrock at CELEX who can help you with more information
and news about current usage of CELEX data:

RICHARD@CELEX.KUN.NL

Regards,

Gavin.
====
Gavin Burnage                          gburnage@natcorp.ox.ac.uk
British National Corpus                gburnage@vax.ox.ac.uk
Oxford University Computing Services
13 Banbury Road                        0865-273280 (work)
OXFORD OX2 6NN                         0865-273275 (fax)


Article 5540 of alt.usage.english:
From: gtoal@pizzabox.demon.co.uk (Graham Toal)
Subject: Re: Need Help finding common words
Keywords: common english words
Organization: Cuddlehogs Anonymous
Date: Sun, 17 Jan 1993 07:20:00 GMT

In article <C0zF1H.Hpp@boi.hp.com> msloane@boi.hp.com (Mike Sloane)
writes:
:Can anyone tell me how to find the most commonly used words of the english
:language? I need the top 500 to 1000 most prevalent words. I could also use
:the same stats for Spanish, French, and German.

Fetch the MRCD from black.ox.ac.uk and use the program provided to
output pairs of <word, freq>, then sort +n

G


Article 5826 of sci.lang:
From: bew@cix.compulink.co.uk (Brian Wilkins)
Subject: Word frequency effects
Date: Sat, 31 Jul 1993 13:26:38 +0000

According to _Sprachstatistik_, 1968, edited by P M Alexejew,
W M Kalinin & R G Piotrowski, the 20 most frequently occurring words
in written English are:

the of to in and a for was is that on at he with by be it an as his

The 20 most common words in the London-Lund corpus of spoken
conversation are:

the and I to of a you that in it is yes was this but on well he have for

>>>MATRIX version 1.23a


Article 35600 of alt.usage.english:
From: misrael@scripps.edu (Mark Israel)
Subject: Re: Most commonly used English words
Date: 8 Dec 1994 06:23:15 GMT
Organization: The Scripps Research Institute, La Jolla, California, USA

In article <3c4s5o$jni@carina.unm.edu>, jdlucero@unm.edu (Joyce Diane
Lucero) writes:
> I am looking for a list of the most commonly used words in the
> U.S. English language. Does anyone know where I might find such
> a list? Thanks.

Peter Kauffner sent me some info on this for the FAQ:

----
According to the _Guinness Book of Records_, the commonest word in
written English is "the," followed by: of, and, to, a, in, that, is,
I, it, for, as. The commonest word in spoken English is "I."

----
Forget the "commonest words" list from _Guinness_ I sent you last
time. _Guinness_ hasn't updated the entry in over twenty years and
I've found a much better source. It's _Frequency Analysis of English
Vocabulary and Grammar: Based on the LOB Corpus_ (1989) by Stig
Johansson and Knut Hofland. The LOB Corpus is a large database of
English-language materials published in 1961. Here are the top
eighteen words and their frequencies:

 1. the   68315
 2. of    35716
 3. and   27856
 4. to    26760
 5. a     22744
 6. in    21108
 7. that  11188
 8. is    10978
 9. was   10499
10. it    10010
11. for    9299
12. he     8776
13. as     7337
14. with   7197
15. be     7186
16. on     7027
17. I      6696
18. his    6266

The book goes on to give the top 50 words in this manner. I can e-mail
the rest to you in another week or so if you are interested. One thing
I found surprising about this list is that the frequency given for
"the" is so much higher than that for any other word. Yet in the KJV
Bible, the most common word is not "the," but "and," according to
_Guinness_.

----
Here is a list of the top 300 words in order of frequency and in
groups of 100. My source is _The American Heritage Word Frequency
Book_ by John B.
Carroll, Peter Davies, and Barry Richman (Houghton Mifflin, 1971,
ISBN 0-395-13570-2):

the of and a to in is you that it he for was on are as with his they
at be this from I have or by one had not but what all were when we
there can an your which their said if do will each about how up out
them then she many some so these would other into has more her two
like him see time could no make than first been its who now people my
made over did down only way find use may water long little very after
words called just where most know

get through back much before go good new write our used me man too any
day same right look think also around another came come work three
word must because does part even place well such here take why things
help put years different away again off went old number great tell men
say small every found still between name should Mr home big give air
line set own under read last never us left end along while might next
sound below saw something thought both few those always looked show
large often together asked house don't world going want

school important until 1 form food keep children feet land side
without boy once animals life enough took sometimes four head above
kind began almost live page got earth need far hand high year mother
light parts country father let night following 2 picture being study
second eyes soon times story boys since white days ever paper hard
near sentence better best across during today others however sure
means knew it's try told young miles sun ways thing whole hear example
heard several change answer room sea against top turned 3 learn point
city play toward five using himself usually

Peter Kauffner
Minneapolis, Minnesota
kauffner@mermaid.micro.umn.edu


Article 56562 of alt.usage.english:
From: Lexik@highlands.com (Barnhart)
Subject: Re: List of most frequently used English words????
Date: 19 Jul 1995 15:16:28 GMT
Organization: The Highlands Chain

Word frequency counts are useful but generally based on too small a
corpus. The following have been useful at times in my work as a
dictionary editor:

Thorndike, Edward L. and Irving Lorge, The Teacher's Word Book of
30,000 Words. New York: Teachers College, Columbia University, 1944

Thorndike, Edward L., The Teaching of English Suffixes. New York:
Teachers College, Columbia University, 1941

West, Michael, A General Service List of English Words. London:
Longmans, 1953

Kucera, Henry and W. Nelson Francis, Computational Analysis of
Present-Day American English. Providence, Rhode Island: Brown Univ.
Press, 1967

Carroll, John B. et al., Word Frequency Book. Boston: Houghton Mifflin
Company, 1971

The Thorndike/Lorge work is based upon 13,000,000 words of text.
Kucera/Francis is based upon 1,000,000 words of text and Carroll et
al. upon 5,000,000 words of text. They are each useful as far as the
limitations of each study are concerned. Thorndike is a bit dated;
however, Kucera/Francis generally confirms the frequencies in the core
vocabulary items. Carroll et al. includes information about grade
level.

I hope this is helpful.

Barnhart@Highlands.com


Article 33343 of sci.lang:
From: hexis@netcom.com (James C. Harrison)
Subject: Re: Most Commonly Used Words
Organization: NETCOM On-line Communication Services (408 261-4700 guest)
Date: Mon, 8 Apr 1996 03:27:01 GMT

Have I got the book for you, assuming Dover has kept it in print:

An English, French, German, Spanish Word Frequency Dictionary: A
correlation of the first six thousand words in four single-language
frequency lists, compiled by Helen S. Eaton 1967.

When I bought my copy, it cost $3.00...

hexis


Article 103579 of alt.usage.english:
From: lee@sq.com (Liam R. E.
  Quin)
Subject: Re: English word frequency
Organization: SoftQuad Inc., Toronto, Canada
Date: Sun, 11 Aug 1996 02:52:46 GMT

Clive Young (clive.young@umist.ac.uk) wrote:
> I'm looking for frequency-based English vocabulary lists.
> What are the 1000/2000/3000 most commonly used English words?
> Are such lists available on the Web somewhere?

hughett@galton.psycha.upenn.edu (Paul Hughett) wrote:
> I don't know of any such lists on-line but Nation (1990) cites several
> such lists in printed form and evaluates their relative utility. She
> also describes some of the hazards of taking such lists too literally.
> Nation, I. S. P., ``Teaching and Learning Vocabulary'',
> Newbury House Publishers, 1990

You might also like to see Susan Armstrong (Ed.), ``Using Large
Corpora'', MIT Press 1993, which discusses some of the issues relating
to generating and using such lists and other word-related information
from sources of written or transcribed texts.

Tom Collins <tcollins@inforamp.net> wrote:
> For some years I have been using the Collins COBUILD series of
> dictionaries, which give you information on word frequency.
> These dictionaries are based on a 200 million word computer
> database and focus on the most commonly-used words - in all their
> various forms, which is why I find it so useful. The latest
> version of the dictionary uses a 1-5 rating method to show
> relative frequency.

I don't want to say anything against COBUILD, which by all accounts is
excellent work, but I should say that if you're going to go as far as
the 3,000 most common words, you really do have to choose your domain
very carefully. Decide on whether you want `bland, unrecognisable
English', `Literary English', `Newspaper English', `Technical English'
and so forth. Decide whether you want American English reflected in
your corpus, and if so to what extent. What about computer terms?

In the King James Bible, `Shalt' is a fairly common word, but although
it was probably generally common in 17th C.
English, it isn't common today.

With a 200 million word corpus, you'll probably start to find a number
of idioms that are common in spoken English just starting to be
statistically significant; compare `strong man' which is a
statistically significant collocation (t=2.0, but only 6 occurrences)
in the 47 million word AP 1991 corpus [Armstrong, op.cit., Table 8
p.19, and text on p.18].

The best thing to do may be to measure the word frequencies in the
actual data in which one is most interested. Text manipulation
programs such as WordCruncher and TACT, and programming tools such as
Unix shell, awk and perl are often used for this sort of work.

There's a simple shell script given by Doug McIlroy of AT&T Bell Labs
that looks something like (from memory)

    tr -cs '[a-zA-Z]' '\012' < input |  # words one per line
    tr '[A-Z]' '[a-z]' |                # convert to lower case
    sort |                              # collate repetitions together
    uniq -c |                           # count multiple occurrences
    sort -nr |                          # sort by frequency
    sed 400q > output                   # take the most common 400

I'll leave the proper handling of words such as can't and o'clock to
the interested reader :-)

This approach does not attempt any morphological analysis or stemming.
Another approach would be to use Porter's Algorithm (see any book on
information retrieval, e.g. Salton or Frakes et al.) to bring word
forms more or less together (but without any textual analysis or
etymological exactitude).

Finally, note that word lists like this are generally not very useful
for spelling checkers (which is what many people want them for). Study
the publicly available ispell program instead... although short lists
are usually _much_ better for spelling checkers than long ones! E.g.
the Shorter Oxford Wordlist (don't ask me for it, do a web search,
it's out there) contains every word used by Milton... including the
ones he used by mistake, as misspellings...
:-)

Commercial spelling checkers usually do have fairly large vocabularies
-- typically in excess of 10,000 words -- but a lot of that is proper
nouns, such as American cities, names of competing firms -- e.g.
`Interleaf' :-) -- common first names, and often a great quantity of
domain-specific terms, such as names of illnesses (Influenza,
Percival, Priscilla), computing terms, and animals. Such as Fruitbats
and Meerkats these days :-)

Lee

--
Liam Quin, SoftQuad Inc    | lq-text freely available Unix text retrieval
lee@sq.com +1 416 239 4801 | FAQs: Metafont fonts, OPEN LOOK UI, OpenWindows
SGML: http://www.sq.com/   |`Consider yourself... one of the family...
The barefoot programmer    | consider yourself... At Home!' [the Artful Dodger]


Article 5182 of sci.lang:
From: Ron Hardin <rhhardin@mindspring.com>
Subject: Re: On artificial languages
Date: Fri, 03 Aug 2001 20:39:20 -0400
Organization: MindSpring Enterprises

There's also Basic English. Wayne Booth writes in _Now, Don't Try to
Reason with Me_

``[when teaching an exchange student program in English] On the
morning of July 8, I found in my pile of English suggestions [from the
State Department] a copy of _Basic English Self-taught_ and carbon
copies - to this day I don't know precisely where they came from - of
three supplementary lists, an Economics List, a Business List, and a
Poetry List. I am a little ashamed to confess that I ignored both the
book and the supplementary lists for several days.
I was - let me be frank - skeptical about the value of Basic English,
perhaps because of an earlier brush with Esperanto.''

I am pleased to reproduce the _Poetic Annex to Basic English_ below as
Booth gives it:

angel, arrow
beast, blind, bow, breast, bride, brow, bud
calm, child, cross, crown, curse
dawn, delight, dew, dove, dream
eagle, eternal, evening, evil
fair, faith, fate, feast, flock, flow, fountain, fox
gentle, glad, glory, God, grace, grape, grief, guest
hawk, heaven, hell, hill, holy, honey, honour
image, ivory
joy
lamb, lark, life, lion, lord
meadow, melody, mercy
noble
passion, perfume, pity, pool, praise, prayer, pride, priest, purple
rapture, raven, robe, rock, rose, rush
search, shining, shower, sorrow, soul, spear, spirit, storm, strength, sword
their, tower, travel
valley, veil, vine, violet, virtue, vision
wandering, wealth, weariness, weeping, wisdom, wolf, wonder

--
Ron Hardin
rhhardin@mindspring.com

On the internet, nobody knows you're a jerk.
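
The shell pipeline quoted from memory in Liam Quin's post above (split
into words, fold case, count, sort by frequency, keep the top entries)
could be sketched in Python roughly as follows. This is a minimal
sketch and not anyone's original script: the function name top_words
and the regular expression are assumptions made for illustration. The
pattern keeps internal apostrophes so that words such as can't and
o'clock survive as single tokens, which is only one possible answer to
the exercise the post leaves to the reader. Like the shell version, it
attempts no stemming or morphological analysis.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of ASCII letters that may contain
# internal apostrophes (can't, o'clock). Leading or trailing quote
# marks are not treated as part of the word.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def top_words(text, n=400):
    """Return the n most frequent words in text as (word, count) pairs."""
    words = WORD_RE.findall(text.lower())  # one token per word, lower case
    return Counter(words).most_common(n)   # count, then sort by frequency
```

Calling top_words on the contents of a file and printing each pair
gives a listing comparable to the `uniq -c | sort -nr` output. As the
post stresses, the result is only as representative as the text that
is fed in.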