Article 14491 of sci.lang:
From: fass@cs.sfu.ca (Dan Fass)
Subject: Re: Basic English List
Organization: Faculty of Applied Science, Simon Fraser University
Date: Thu, 27 Oct 1994 19:41:56 GMT
> Could anyone direct me to or send me a list of the 850 Basic English
> words ... created by C.K. Ogden.
Here's the list; Mark Robert Thorson posted it on sci.lang a while back
(in 1989).
- Dan Fass
Natural Language Laboratory, Simon Fraser University
From: mmm@cup.portal.com (Mark Robert Thorson)
Newsgroups: sci.lang
Subject: Re: Artificial Language References?
Date: 2 Sep 89 06:05:09 GMT
There are five main parts to the Basic English word list: operators, things
(general), things (pictured), qualities (general), qualities (opposite).
The 100 operators are:
come get give go keep let make put seem take be do have say see send may will
about across after against among at before between by down from in off on over
through to under up with as for of till than a the all any every no other some
such that this I he you who and because but or if though while how when where
why again ever far forward hear near now out still there then together well
almost enough even little much not only quite so very tomorrow yesterday north
south east west please yes
The 400 general things are:
account act addition adjustment advertisement agreement air amount amusement
animal answer apparatus approval argument art attack attempt attention
attraction authority back balance base behaviour belief birth bit bite blood
blow body brass bread breath brother building burn burst business butter canvas
care cause chalk chance change cloth coal colour comfort committee company
comparison competition condition connection control cook copper copy cork
cotton cough country cover crack credit crime crush cry current curve damage
danger daughter day death debt decision degree design desire destruction detail
development digestion direction discovery discussion disease disgust distance
distribution division doubt drink driving dust earth edge education effect end
error event example exchange existence expansion experience expert fact fall
family father fear feeling fiction field fight fire flame flight flower fold
food force form friend front fruit glass gold government grain grass grip group
growth guide harbour harmony hate hearing heat help history hole hope hour
humour ice idea impulse increase industry ink insect instrument insurance
interest invention iron jelly join journey judge jump kick kiss knowledge land
language laugh law lead learning leather letter level lift light limit linen
liquid list look loss love machine man manager mark market mass meal measure
meat meeting memory metal middle milk mind mine minute mist money month morning
mother motion mountain move music name nation need news night noise note number
observation offer oil operation opinion order organization ornament owner page
pain paint paper part paste payment peace person place plant play pleasure
poison point polish porter position powder power price print process produce
profit property prose protest pull punishment purpose push quality question
rain range rate ray reaction reading reason record regret relation religion
representative request respect rest reward rhythm rice river road roll room rub
rule run salt sand scale science sea seat secretary selection self sense
servant sex shade shake shame shock side sign silk silver sister size sky
sleep slip slope smash smell smile smoke sneeze snow soap society son song
sort sound soup space stage start statement steam steel step stitch stone stop
story stretch structure substance sugar suggestion summer support surprise
swim system talk taste tax teaching tendency test theory thing thought thunder
time tin top touch trade transport trick trouble turn twist unit use value
verse vessel view voice walk war wash waste water wave wax way weather week
weight wind wine winter woman wood wool word work wound writing year
The 200 picturable things are:
angle ant apple arch arm army baby bag ball band basin basket bath bed bee bell
berry bird blade board boat bone book boot bottle box boy brain brake branch
brick bridge brush bucket bulb button cake camera card carriage cart cat chain
cheese chest chin church circle clock cloud coat collar comb cord cow cup
curtain cushion dog door drain drawer dress drop ear egg engine eye face farm
feather finger fish flag floor fly foot fork fowl frame garden girl glove goat
gun hair hammer hand hat head heart hook horn horse hospital house island jewel
kettle key knee knife knot leaf leg library line lip lock map match monkey moon
mouth muscle nail neck needle nerve net nose nut office orange oven parcel pen
pencil picture pig pin pipe plane plate plough pocket pot potato prison pump
rail rat receipt ring rod roof root sail school scissors screw seed sheep shelf
ship shirt shoe skin skirt snake sock spade sponge spoon spring square stamp
star station stem stick stocking stomach store street sun table tail thread
throat thumb ticket toe tongue tooth town train tray tree trousers umbrella
wall watch wheel whip whistle window wing wire worm
The 100 qualities are:
able acid angry automatic beautiful black boiling bright broken brown cheap
chemical chief clean clear common complex conscious cut deep dependent early
elastic electric equal fat fertile first fixed flat free frequent full general
good great grey hanging happy hard healthy high hollow important kind like
living long male married material medical military natural necessary new normal
open parallel past physical political poor possible present private probable
quick quiet ready red regular responsible right round same second separate
serious sharp smooth sticky stiff straight strong sudden sweet tall thick tight
tired true violent waiting warm wet wide wise yellow young
The 50 opposites are:
awake bad bent bitter blue certain cold complete cruel dark dead dear delicate
different dirty dry false feeble female foolish future green ill last late left
loose loud low mixed narrow old opposite public rough sad safe secret short
shut simple slow small soft solid special strange thin white wrong
The rules are:
plurals as usual with '-s' or '-es'
nouns with '-er', such as plant -> planter
nouns with '-ing' such as plant -> planting
adjectives with '-ed' such as plant -> planted
adverbs with '-ly' such as quick -> quickly
negation with 'un-' such as common -> uncommon
compound words such as wastebasket and chalkboard
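The affix rules above can be sketched as a few shell functions. This is only a naive illustration: the function names (plural, thing_forms, quality_forms) are made up here, the condition for choosing '-es' is an assumption, and no spelling changes (doubled consonants, dropped 'e', 'y' to 'i') are attempted.

```shell
#!/bin/sh
# Naive sketch of the Basic English affix rules: plain concatenation only.
# The '-es' condition below is an assumption, not part of the posted rules.

plural() {
    case "$1" in
        *s|*x|*z|*ch|*sh) printf '%ses\n' "$1" ;;   # assumed '-es' endings
        *)                printf '%ss\n' "$1" ;;
    esac
}

# Forms derived from a "thing": plural, '-er', '-ing', '-ed'.
thing_forms() {
    plural "$1"
    printf '%ser\n%sing\n%sed\n' "$1" "$1" "$1"
}

# Forms derived from a "quality": '-ly' adverb and 'un-' negation.
quality_forms() {
    printf '%sly\nun%s\n' "$1" "$1"
}

thing_forms plant       # plants, planter, planting, planted (one per line)
quality_forms common    # commonly, uncommon (one per line)
```

Not every word in the list takes every affix sensibly, so the output is a list of candidates, not of accepted Basic English forms.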
When you have the thought = "Would it be good I be learning the ESPERANTO
language?"
Please at the same time have the thought = "Why not BASIC ENGLISH--THE
INTERNATIONAL SECOND LANGUAGE?"
BASIC has a much smaller number of words and rules. 850 words. BASIC is clear
and simple.
Article 76556 of alt.usage.english:
From: J.Wexler@ed.ac.uk (John Wexler)
Subject: Re: Basic English word list
Date: 26 Jan 1996 17:43:13 GMT
Organization: The University of Edinburgh
jajones@sable.ox.ac.uk (Jonathan Jones)
and Keith C. Ivey
have pointed out some potential problems with Basic English. I'm not
qualified to comment on them. But I thought people might be interested
in a sample:
"In another minute we were face to face, I and this delicate thing out of
the Future. He came straight up to me, laughing into my eyes. What
took my attention first was that he gave no sign of fear. Then, turning
to the two others who were at his back, he said something to them in a
strange and very sweet and liquid tongue.
"There were others coming, and in a short time a little group of possibly
eight or ten of these surprisingly beautiful beings were about me. One
of them said something to me. It came into my head, strangely enough,
that my voice would seem very rough and deep to them. So I made signs
to him, shaking my head, pointing to my ears, and then shaking my head
again. He took a step forward, seemed to be in doubt for a little, and
then gave my hand a touch. Then there came other soft little feelers
touching my back and arms. They were seeing if I was truly a living
being. There was no cause for fear in any of this. In fact there was
something in these sweet little beings which gave one a sense of peace -
they did everything so smoothly and quietly, and were as simple and
natural as boys and girls. And then they seemed so feeble and delicate
that I was certain I was strong enough to get the better of all twelve
of them if necessary."
Article 25871 of sci.lang:
From: Annette Padfield
Subject: 100 core words, as requested [was: Average vocabulary size?]
Date: Wed, 08 Nov 95 00:34:26 GMT
Organization: Home
Here is the list of 100 words of the Basic Core Vocabulary, as set out by
Morris Swadesh, from "Archaeology and Language" by Colin Renfrew:
I, you, we, this, that, who, what, not, all, many, one, two, big, long,
small, woman, man, person, fish, bird, dog, louse, tree, seed, leaf, root,
bark, skin, flesh, blood, bone, grease, egg, horn, tail, feather, hair,
head, ear, eye, nose, mouth, tooth, tongue, claw, foot, knee, hand, belly,
neck, breasts, heart, liver, drink, eat, bite, see, hear, know, sleep, die,
kill, swim, fly, walk, come, lie, sit, stand, give, say, sun, moon, star,
water, rain, stone, sand, earth, cloud, smoke, fire, ash, burn, path,
mountain, red, green, yellow, white, black, night, hot, cold, full, new,
good, round, dry, name.
I wonder how he decided which words to put in and which to leave out?
Apparently his original list was 200 words, but he thinned it out.
--
Annette |\ |/ |/ |\/| /|\ /|\ |\/| , |\/ |\ |\/| |/ | |\/| |\ |\/|
Padfield |\ /| /| | | | | | | ' |/\ |\ |/\| |/ | | | | |/\|
Article 1979 of sci.lang:
From: gburnage@natcorp.ox.ac.uk (Gavin Burnage)
Subject: Re: Online English Word Frequency list?
Date: 3 Nov 92 12:12:12 GMT
Organization: BNC, Oxford University, UK
In article <1992Oct29.170240.28464@msuinfo.cl.msu.edu> cook@cpsin2.cps.msu.edu (Thomas E Cook) writes:
>Can anyone tell me if there is a good English word frequency list
>available online?
>
>Thanks.
>
>--Tom Cook (cook@cps.msu.edu)
CELEX, the Dutch national Centre for Lexical Information based at the
University of Nijmegen, provides frequency information for British
English derived from the COBUILD corpus of the University of Birmingham.
Raw figures for the frequency of all the types in the corpus are the
most basic frequency information available. More usefully, figures are
also available for the frequency of each wordform (flectional form)
and for the frequency of each lemma (headword, or family of
inflectional forms). These figures were arrived at by disambiguating
samples of the corpus by hand, then estimating frequencies for the
whole corpus, and calculating the statistical accuracy of those
estimates.
Such figures are available for the whole corpus (17.9 million), the
written part of the corpus (16.6 million words), and the spoken
part of the corpus (1.3 million), expressed simply as the number of
occurrences in each (sub) corpus. They are also available scaled down
to what they would have been in a 1 million word corpus (useful
for checking written frequency against spoken frequency, and for
checking this corpus against others, notably Brown and LOB), and
in logarithmic form.
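As a rough illustration of the scaling described above (not CELEX's own procedure), a raw count can be converted to a per-million figure and to a logarithm with awk. The function names and the word counts are hypothetical, and base 10 is an assumption, since the base of the logarithmic form is not stated here; only the corpus sizes come from the figures quoted above.

```shell
#!/bin/sh
# Sketch: scale a raw count to occurrences per million words.
# usage: per_million COUNT CORPUS_SIZE
per_million() {
    awk -v n="$1" -v size="$2" 'BEGIN { printf "%.2f\n", n * 1000000 / size }'
}

# Sketch: logarithm of a frequency; base 10 is an assumption.
# usage: log10_freq VALUE
log10_freq() {
    awk -v x="$1" 'BEGIN { printf "%.2f\n", log(x) / log(10) }'
}

# A hypothetical word seen 166 times in the written part (16.6 million words)
# and 13 times in the spoken part (1.3 million words) is equally frequent
# in both once the counts are scaled:
per_million 166 16600000    # 10.00
per_million 13 1300000      # 10.00
log10_freq 100              # 2.00
```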
Until recently, CELEX existed to provide researchers at home and abroad
with such information. Some charges were made to fund the continued
existence and development of the centre. The current situation is not
clear, however, and in the near future CELEX data may not be
available to external users. The best thing to do is write to
Richard Piepenbrock at CELEX who can help you with more information
and news about current usage of CELEX data: RICHARD@CELEX.KUN.NL
Regards,
Gavin.
====
Gavin Burnage gburnage@natcorp.ox.ac.uk
British National Corpus gburnage@vax.ox.ac.uk
Oxford University Computing Services
13 Banbury Road 0865-273280 (work)
OXFORD OX2 6NN 0865-273275 (fax)
Article 5540 of alt.usage.english:
From: gtoal@pizzabox.demon.co.uk (Graham Toal)
Subject: Re: Need Help finding common words
Keywords: common english words
Organization: Cuddlehogs Anonymous
Date: Sun, 17 Jan 1993 07:20:00 GMT
In article msloane@boi.hp.com (Mike Sloane) writes:
:Can anyone tell me how to find the most commonly used words of the english
:language? I need the top 500 to 1000 most prevalent words. I could also use
:the same stats for Spanish, French, and German.
Fetch the MRCD from black.ox.ac.uk and use the program provided to output
pairs of , then sort +n
G
Article 5826 of sci.lang:
From: bew@cix.compulink.co.uk (Brian Wilkins)
Subject: Word frequency effects
Date: Sat, 31 Jul 1993 13:26:38 +0000
According to _Sprachstatistik_, 1968, edited by P M Alexejew, W M Kalinin & R
G Piotrowski, the 20 most frequently occurring words in written English are:
the of to in and a for was is that on at he with by be it an as his
The 20 most common words in the London-Lund corpus of spoken conversation
are:
the and I to of a you that in it is yes was this but on well he have for
>>>MATRIX version 1.23a
Article 35600 of alt.usage.english:
From: misrael@scripps.edu (Mark Israel)
Subject: Re: Most commonly used English words
Date: 8 Dec 1994 06:23:15 GMT
Organization: The Scripps Research Institute, La Jolla, California, USA
In article <3c4s5o$jni@carina.unm.edu>, jdlucero@unm.edu (Joyce Diane Lucero) writes:
> I am looking for a list of the most commonly used words in the
> U.S. English language. Does anyone know where I might find such
> a list? Thanks.
Peter Kauffner sent me some info on this for the FAQ:
----
According to the _Guinness Book of Records_, the commonest word in written
English is "the," followed by: of, and, to, a, in, that, is, I, it, for,
as. The commonest word in spoken English is "I."
----
Forget the "commonest words" list from _Guinness_ I sent you last time.
_Guinness_ hasn't updated the entry in over twenty years and I've found a
much better source. It's _Frequency Analysis of English Vocabulary and
Grammar: Based on the LOB Corpus_ (1989) by Stig Johansson and Knut
Hofland. The LOB Corpus is a large database of English-language materials
published in 1961. Here are the top eighteen words and their frequencies:
1. the 68315
2. of 35716
3. and 27856
4. to 26760
5. a 22744
6. in 21108
7. that 11188
8. is 10978
9. was 10499
10. it 10010
11. for 9299
12. he 8776
13. as 7337
14. with 7197
15. be 7186
16. on 7027
17. I 6696
18. his 6266
The book goes on to give the top 50 words in this manner. I can e-mail the
rest to you in another week or so if you are interested. One thing I found
surprising about this list is that the frequency given for "the" is so
much higher than that for any other word. Yet in the KJV Bible, the most
common word is not "the," but "and," according to _Guinness_.
----
Here is a list of the top 300 words in order of frequency and in groups of
100. My source is _The American Heritage Word Frequency Book_ by John B.
Carroll, Peter Davies, and Barry Richman (Houghton Mifflin, 1971, ISBN
0-395-13570-2):
the of and a to in is you that it he for was on are as with his they at be
this from I have or by one had not but what all were when we there can an
your which their said if do will each about how up out them then she many
some so these would other into has more her two like him see time could no
make than first been its who now people my made over did down only way
find use may water long little very after words called just where most know
get through back much before go good new write our used me man too any day
same right look think also around another came come work three word must
because does part even place well such here take why things help put years
different away again off went old number great tell men say small every
found still between name should Mr home big give air line set own under
read last never us left end along while might next sound below saw
something thought both few those always looked show large often together
asked house don't world going want
school important until 1 form food keep children feet land side without
boy once animals life enough took sometimes four head above kind began
almost live page got earth need far hand high year mother light parts
country father let night following 2 picture being study second eyes soon
times story boys since white days ever paper hard near sentence better
best across during today others however sure means knew it's try told
young miles sun ways thing whole hear example heard several change answer
room sea against top turned 3 learn point city play toward five using
himself usually
Peter Kauffner
Minneapolis, Minnesota kauffner@mermaid.micro.umn.edu
Article 56562 of alt.usage.english:
From: Lexik@highlands.com (Barnhart)
Subject: Re: List of most frequently used English words????
Date: 19 Jul 1995 15:16:28 GMT
Organization: The Highlands Chain
Word frequency counts are useful but generally based on too small a corpus.
The following have been useful at times in my work as a dictionary editor:
Thorndike, Edward L. and Irving Lorge, The Teacher's Word Book of 30,000
Words. New York: Teachers College, Columbia University, 1944
Thorndike, Edward L., The Teaching of English Suffixes. New York: Teachers
College, Columbia University, 1941
West, Michael, A General Service List of English Words. London: Longmans,
1953
Kucera, Henry and W. Nelson Francis, Computational Analysis of Present-Day
American English. Providence, Rhode Island: Brown Univ. Press, 1967
Carroll, John B. et al., Word Frequency Book. Boston: Houghton Mifflin
Company, 1971
The Thorndike/Lorge work is based upon 13,000,000 words of text.
Kucera/Francis is based upon 1,000,000 words of text and Carroll et al. upon
5,000,000 words of text. Each is useful within the limitations of its
study. Thorndike is a bit dated; however, Kucera/Francis
generally confirms the frequencies in the core vocabulary items. Carroll et
al. includes information about grade level.
I hope this is helpful.
Barnhart@Highlands.com
Article 33343 of sci.lang:
From: hexis@netcom.com (James C. Harrison)
Subject: Re: Most Commonly Used Words
Organization: NETCOM On-line Communication Services (408 261-4700 guest)
Date: Mon, 8 Apr 1996 03:27:01 GMT
Have I got the book for you, assuming Dover has kept it in print: An
English, French, German, Spanish Word Frequency Dictionary: A correlation
of the first six thousand words in four single-language frequency lists,
compiled by Helen S. Eaton 1967. When I bought my copy, it cost $3.00...
hexis
Article 103579 of alt.usage.english:
From: lee@sq.com (Liam R. E. Quin)
Subject: Re: English word frequency
Organization: SoftQuad Inc., Toronto, Canada
Date: Sun, 11 Aug 1996 02:52:46 GMT
Clive Young (clive.young@umist.ac.uk) wrote:
> I'm looking for frequency-based English vocabulary lists.
> What are the 1000/2000/3000 most commonly used English words?
> Are such lists available on the Web somewhere?
hughett@galton.psycha.upenn.edu (Paul Hughett) wrote:
> I don't know of any such lists on-line but Nation (1990) cites several
> such lists in printed form and evaluates their relative utility. She
> also describes some of the hazards of taking such lists too literally.
> Nation, I. S. P., ``Teaching and Learning Vocabulary'',
> Newbury House Publishers, 1990
You might also like to see
Susan Armstrong (Ed.), ``Using Large Corpora'', MIT Press 1993
which discusses some of the issues relating to generating and using such
lists and other word-related information from sources of written or
transcribed texts.
Tom Collins wrote:
> For some years I have been using the Collins COBUILD series of
> dictionaries, which give you information on word frequency.
> These dictionaries are based on a 200 million word computer
> database and focus on the most commonly-used words in all their
> various forms, which is why I find it so useful. The latest
> version of the dictionary uses a 1-5 rating method to show
> relative frequency.
I don't want to say anything against COBUILD, which by all accounts is
excellent work, but I should say that if you're going to go as far as
the 3,000 most common words, you really do have to choose your domain
very carefully. Decide on whether you want `bland, unrecognisable English',
`Literary English', `Newspaper English', `Technical English' and so forth.
Decide whether you want American English reflected in your corpus, and if
so to what extent. What about computer terms?
In the King James Bible, `Shalt' is a fairly common word, but although it
was probably generally common in 17th C. English, it isn't common today.
With a 200 million word corpus, you'll probably start to find a number
of idioms that are common in spoken English just starting to be statistically
significant; compare `strong man' which is a statistically significant
collocation (t=2.0, but only 6 occurrences) in the 47 million word AP 1991
corpus [Armstrong, op.cit., Table 8 p.19, and text on p.18].
The best thing to do may be to measure the word frequencies in the actual
data in which one is most interested.
Text manipulation programs such as WordCruncher and TACT, and programming
tools such as Unix shell, awk and perl are often used for this sort of work.
There's a simple shell script given by Doug McIlroy of AT&T Bell Labs that
looks something like (from memory)
    tr -cs 'A-Za-z' '\012' < input |    # words one per line
    tr 'A-Z' 'a-z' |                    # convert to lower case
    sort |                              # collate repetitions together
    uniq -c |                           # count multiple occurrences
    sort -nr |                          # sort by frequency
    sed 400q > output                   # take the most common 400
I'll leave the proper handling of words such as can't and o'clock to the
interested reader :-)
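For what it's worth, one possible way to handle words like can't and o'clock is to treat the apostrophe as a word character and then trim apostrophes that were only acting as quotation marks. This is a sketch under that assumption, not part of the original script; the function name word_freq is made up for illustration.

```shell
#!/bin/sh
# Sketch: word counts that keep word-internal apostrophes (can't, o'clock).
# Reads text on stdin and prints "count word" lines, most frequent first.
word_freq() {
    tr -cs "A-Za-z'" '\n' |          # letters and apostrophes form words
    tr 'A-Z' 'a-z' |                 # convert to lower case
    sed "s/^'*//; s/'*\$//" |        # trim leading/trailing apostrophes
    grep -v '^$' |                   # drop lines left empty by the trimming
    sort |                           # collate repetitions together
    uniq -c |                        # count multiple occurrences
    sort -nr                         # sort by frequency
}

printf "It's nine o'clock; I can't stay.\n" | word_freq
```

The trimming step also cuts the apostrophe off forms such as 'tis, and it leaves possessives like dog's as separate words, so it is a compromise, not a full solution.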
This approach does not attempt any morphological analysis or stemming.
Another approach would be to use Porter's Algorithm (see any book on
information retrieval, e.g. Salton or Frakes et al.) to bring word forms
more or less together (but without any textual analysis or etymological
exactitude).
Finally, note that word lists like this are generally not very useful for
spelling checkers (which is what many people want them for). Study the
publicly available ispell program instead... although short lists are
usually _much_ better for spelling checkers than long ones! E.g. the
Shorter Oxford Wordlist (don't ask me for it, do a web search, it's out
there) contains every word used by Milton... including the ones he used by
mistake, as misspellings... :-)
Commercial spelling checkers usually do have fairly large vocabularies --
typically in excess of 10,000 words -- but a lot of that is proper nouns,
such as American cities, names of competing firms -- e.g. `Interleaf' :-) --
common first names, and often a great quantity of domain-specific terms,
such as names of illnesses (Influenza, Percival, Priscilla), computing
terms, and animals. Such as Fruitbats and Meerkats these days :-)
Lee
--
Liam Quin, SoftQuad Inc | lq-text freely available Unix text retrieval
lee@sq.com +1 416 239 4801 | FAQs: Metafont fonts, OPEN LOOK UI, OpenWindows
SGML: http://www.sq.com/ |`Consider yourself... one of the family...
The barefoot programmer | consider yourself... At Home!' [the Artful Dodger]
Article 5182 of sci.lang:
From: Ron Hardin
Subject: Re: On artificial languages
Date: Fri, 03 Aug 2001 20:39:20 -0400
Organization: MindSpring Enterprises
There's also Basic English.
Wayne Booth writes in _Now, Don't Try to Reason with Me_
``[when teaching an exchange student program in English] On the
morning of July 8, I found in my pile of English suggestions
[from the State Department] a copy of _Basic English Self-taught_
and carbon copies - to this day I don't know precisely where
they came from - of three supplementary lists, an Economics List,
a Business List, and a Poetry List. I am a little ashamed to confess
that I ignored both the book and the supplementary lists for
several days. I was - let me be frank - skeptical about the value of
Basic English, perhaps because of an earlier brush with Esperanto.''
I am pleased to reproduce the _Poetic Annex to Basic English_ below
as Booth gives it:
angel, arrow
beast, blind, bow, breast, bride, brow, bud
calm, child, cross, crown, curse
dawn, delight, dew, dove, dream
eagle, eternal, evening, evil
fair, faith, fate, feast, flock, flow, fountain, fox
gentle, glad, glory, God, grace, grape, grief, guest
hawk, heaven, hell, hill, holy, honey, honour
image, ivory
joy
lamb, lark, life, lion, lord
meadow, melody, mercy
noble
passion, perfume, pity, pool, praise, prayer, pride, priest, purple
rapture, raven, robe, rock, rose, rush
search, shining, shower, sorrow, soul, spear, spirit, storm, strength, sword
their, tower, travel
valley, veil, vine, violet, virtue, vision
wandering, wealth, weariness, weeping, wisdom, wolf, wonder
--
Ron Hardin
rhhardin@mindspring.com
On the internet, nobody knows you're a jerk.