Chapter Fourteen: Inductive Generalization

David Carl Wilson

Part Six: Evaluating Inductive Logic

There is nothing in which an untrained mind shows itself more hopelessly incapable, than in drawing the proper conclusions from its own experience.

—John Stuart Mill, Inaugural Address at St. Andrews

There’s nothing like instances to grow hair on a bald-headed argument.

—Mark Twain

TOPICS

Correct Form for Inductive Generalization
The Total Evidence Condition (1): Sample Size
The Total Evidence Condition (2): Random Selection
Evaluating the Truth of Premises about Sampling
Complex Arguments

A certain raja, according to a story told by the Buddha, took all the blind men of Savatthi to show them an elephant. As each one felt the elephant, the raja said, “Tell me, what sort of thing is an elephant?” Those who had been presented with the head answered, “Sire, an elephant is like a pot.” Those who had felt the ear replied, “An elephant is like a winnowing basket.” Those who had been presented with a tusk said it was a plowshare. Those who knew only the trunk said it was a plow; others said the body was a granary; the foot, a pillar; the back, a mortar; the tail, a pestle; the tuft of the tail, a brush. Then they began to quarrel, shouting, “Yes it is!” “No, it is not!” “An elephant is not that!” “Yes, it’s like that!” and so on, until they came to blows over the matter.

In one important way we are all like the blind men examining the elephant: there is much that we wish to understand but do not directly experience. Whether we are tasting a spoonful of soup to see if the pot has enough salt or reading about the polling of registered voters to learn who the electorate prefers for president, we habitually draw general conclusions from a few observations—that is, we habitually reason by inductive generalization.

Many writers actually mean inductive generalization when they write about induction—which helps explain why some have dubiously defined induction itself as inference that moves from the particular to the general. This particular-to-general feature highlights the most fundamental difference between inductive generalization and frequency arguments; frequency arguments, recall, move from the general to the particular.

14.1 Correct Form for Inductive Generalization

If an inductive generalization is to be logically successful, it—like all other inductive arguments—must satisfy both the correct form condition and the total evidence condition. This is typically the correct form for inductive generalizations:^[1]

n of sampled F are G (where n is any frequency, including 0 and 1).
∴n (+ or – m) of F are G.

Both the premise and the conclusion are frequency statements of the sort described in the preceding chapter. Note that in this form of argument, the premise and conclusion differ in only two ways—sampled is in the premise but not in the conclusion, and the margin of error (+ or – m) is in the conclusion but not in the premise.

Guideline. Structure an inductive generalization, when it would be loyal to do so, so that the conclusion drops the term sampled and adds a margin of error.

14.1.1 The Logical Constant Sampled

The term sampled appears in the premise but disappears in the conclusion. This is what makes this form of argument a generalization—the premise is strictly about those individuals in the population that have been sampled, while the conclusion is generally about the population as a whole. We will treat sampled as a logical constant, like if–then, or, and not. Stylistic variants include visited, seen, observed, tested, polled, and experienced. When you can see that an argument is an inductive generalization, translate all of these stylistic variants to sampled.

Guideline. In the premise of an inductive generalization, translate stylistic variations into the logical constant sampled.

Exercises Chapter 14, set (a)

Identify the term that is being used as a stylistic variant for sampled in each of these sentences, then paraphrase each sentence so that it displays correct form for the premise of an inductive generalization.

Sample exercise. Of the 100 people I asked, 53 said they are better off now than they were four years ago.

Sample answer. Asked is the stylistic variant. 53 percent of the sampled people say they are better off now than they were four years ago.

I’ve never had a piece of pie at the Country Kitchen that I didn’t think was delicious.
All of the websites I visited had site maps on the bottom of the landing page.
There has never been a documented case of a human attacked by a healthy wolf.
We began our study by randomly selecting 1,000 students enrolled in the college and interviewing them. It turned out that 820 of them said “yes” to the question, “Does it annoy you to be asked questions as part of a randomly selected sample?”
More than ten percent of America’s long-term coal miners who were x-rayed had black lung disease.
Only 5 of the 25 cars we saw driving in the car-pool lane today had more than one occupant.

Exercises Chapter 14, set (b)

List five stylistic variants for sampled that have not yet been introduced in the text, and for each one make up an ordinary-language frequency statement (not necessarily in standard format) that uses it.

Sample answer: Rode on. All of the Metro buses I rode on had a bumpy ride.

14.1.2 The Margin of Error

The second difference between premise and conclusion is the + or – m of the conclusion, which represents the margin of error. Consider the following inductive generalization:

Fifty percent of the sampled voters favor Jones.
∴Fifty percent (+/- 3 percent) of the voters favor Jones.

The 3 percent margin of error simply means that between 47 percent and 53 percent—inclusive—of the voters favor Jones. Professionals term 47 percent to 53 percent the confidence interval. That is, the conclusion would mean the same thing if it were stated in this way:

∴The percentage of voters who favor Jones is somewhere in the range from 47 percent to 53 percent.

Notice that the margin of error strengthens the argument enormously. Without the margin of error, the conclusion would have been the much more precise 50 percent of the voters favor Jones. And this would have been false if the actual frequency of voters favoring Jones had turned out to be .53 or even .501. By including the margin of error in the conclusion, the conclusion turns out to be true with either result.
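The effect of the margin of error on the truth of the conclusion can be checked with a little arithmetic. The sketch below is only an illustration; the function names interval and conclusion_is_true are hypothetical, and all figures are in percentage points.

```python
def interval(estimate, margin):
    """Return the (low, high) range for estimate +/- margin, kept within 0 to 100."""
    low = max(0, estimate - margin)
    high = min(100, estimate + margin)
    return low, high


def conclusion_is_true(actual, estimate, margin):
    """True when the actual population frequency falls inside the stated range."""
    low, high = interval(estimate, margin)
    return low <= actual <= high


# The Jones poll: 50 percent in the sample, 3 percent margin of error.
print(interval(50, 3))
# An actual frequency of 53 or 50.1 makes the conclusion true with the margin...
print(conclusion_is_true(53, 50, 3), conclusion_is_true(50.1, 50, 3))
# ...and false when the margin of error is 0.
print(conclusion_is_true(53, 50, 0), conclusion_is_true(50.1, 50, 0))
```

With a margin of 0 the conclusion is true only if the actual frequency is exactly 50 percent, which is why the unhedged conclusion is so much easier to falsify.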

The margin of error is sometimes expressed more colloquially. When the premise in a casually expressed argument is, for example, Half of the sampled F are G, one way of including a margin of error in the conclusion is to say About half of F are G. Or when the premise is, for example, All sampled F are G, a margin of error is being incorporated when the conclusion is expressed as Almost all F are G. As you can see, about half and almost all are much more likely to generate a true conclusion than half and all.

Why, then, shouldn’t an arguer include the largest possible margin of error in every inductive generalization? For the simple reason that we need some degree of precision in the answers to most of the questions that inductive generalizations answer. A pollster would certainly be much more likely to have a true conclusion with the following argument:

Fifty percent of the sampled voters favor Jones.
∴Fifty percent (+ or – 50 percent) of the voters favor Jones.

As long as anywhere from 0 percent to 100 percent of the voters turn out to be in favor of Jones, the pollster’s results are accurate. But the pollster would also quickly be unemployed; we don’t need professionals to tell us that between none and all of the voters favor a particular candidate. Even narrow margins of error can sometimes render an inductive generalization useless. One recent poll of citizens of Quebec concluded that 49.5 percent (+/- 3 percent) favored secession from Canada while 50.5 percent (+/- 3 percent) opposed it. The overlap produced by the 3 percent margin of error leaves us with an inconclusive result;^[2] the margin would have to be reduced to less than one-half of a percent for this particular poll to be useful. The necessity of including even a 3 percent margin of error, in this case, renders the results useless.
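The overlap in the Quebec poll can be made explicit. The following is a minimal sketch, with the hypothetical function name intervals_overlap, that tests whether two results with the same margin of error share any value; figures are in percentage points.

```python
def intervals_overlap(a_estimate, b_estimate, margin):
    """True when a +/- margin and b +/- margin have at least one value in common."""
    a_low, a_high = a_estimate - margin, a_estimate + margin
    b_low, b_high = b_estimate - margin, b_estimate + margin
    return a_low <= b_high and b_low <= a_high


# 49.5 percent in favor versus 50.5 percent opposed.
print(intervals_overlap(49.5, 50.5, 3))    # margin of 3 percent
print(intervals_overlap(49.5, 50.5, 0.5))  # the ranges still touch at 50 percent
print(intervals_overlap(49.5, 50.5, 0.4))  # less than one-half of a percent
```

Only the last call reports no overlap, which matches the point that the margin must fall below one-half of a percent before this poll can separate the two sides.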

This also applies to everyday life. Suppose one of my children is scared of monsters in the night. I turn on the light, check a few places in the room, find no monsters in the sampled places, and, remembering to include a margin of error, I say to my child, “I have concluded that almost no places in the room have monsters.” This would clearly not be satisfactory, for the situation requires a conclusion with much greater precision—namely, the very precise “No places in the room have monsters.”

It can be charitable to include a margin of error in your paraphrase, but only when loyalty allows it. If someone argues, “I’ve never witnessed a single rainy day in southern California, so I conclude that it absolutely never rains in southern California,” it would be disloyal to paraphrase the conclusion with a margin of error, even though it would make it more likely to be true:

No sampled days in southern California are rainy.
∴Almost no days in southern California are rainy.

On the other hand, if someone says, “Half of my students tell me that they are planning to pursue an advanced degree, and I take that as a reliable indicator of the plans of students throughout the country,” then it seems proper to provide this charitable paraphrase:

Half of the sampled American college students are planning to pursue an advanced degree.
∴About half of the American college students are planning to pursue an advanced degree.

Many inductive generalizations, for better or for worse, are like the rainy day argument above—they cannot be loyally paraphrased with any margin of error in the conclusion. This does not mean that such an argument fails to satisfy the correct form condition; rather, it simply means that it should be understood as including a margin of error of 0 percent.

Guideline. When the principle of loyalty allows, paraphrase inductive generalizations so as to include a non-zero margin of error in the conclusion.

Correct Form for Inductive Generalization

n of sampled F are G. (Where n is any frequency, including 0 and 1.)
∴n (+ or – m) of F are G.

Exercises Chapter 14, set (c)

Provide a conclusion, in correct form for inductive generalization, for each of the premises in set (a) of the exercises for this chapter. Include a non-zero margin of error; don’t worry for now about whether the margin of error is too large or too small. (And don’t forget to drop the term sampled.)

Sample exercise. 53 percent of the sampled people are better off now than they were four years ago.

Sample answer. 53 percent (+ or – 10 percent) of the people are better off now than they were four years ago.

14.2 The Total Evidence Condition (1): Sample Size

As we have established, if an inductive argument is to be logical it is not enough that it satisfies the correct form condition. Correct inductive form makes the argument a candidate for logical success, but it can tell you nothing about how inductively strong the argument is. This is where the total evidence condition makes its entrance. Once we see that the conclusion fits the premises, we must then see how well it fits the total available evidence.

For an inductive generalization, when considering the total evidence condition the central question to ask is this: Is the sample representative of the population? Does the part of the elephant touched by the blind man feel like the rest of the elephant? Does the tasted spoonful of soup taste like the rest of the pot? Do the polled voters accurately reflect the views of the entire electorate? There is only one way to be completely sure—namely by sampling the remainder of the population. But this is generally not practicable; if you taste the rest of the soup, there’s none left for the dinner guests.

There are two things to look at when assessing whether the sample accurately represents the population: the size of the sample—it must be large enough—and the randomness of the selection process—every member of the population must have an equal chance to be included in the sample. Inductive generalizations that fail in one or both of these areas are sometimes said to commit the fallacy of hasty generalization. It is worth mentioning this fallacy, however, only because it reminds us of how easy it is to be satisfied with a sample that is not representative. The fallacy tells us nothing about the specific way in which the argument fails; for that reason, it is best to avoid the term and focus your evaluation on specific failures in measuring up to standards for size of sample and randomness of selection.

Total Evidence Condition for Inductive Generalizations

What makes the sample representative:

The sample must be large enough.
The sample must be randomly selected.

14.2.1 The Sample Must Be Large Enough

The first total-evidence question to ask is this: Is the sample large enough? No single size is right for every sample. Sometimes a sample of one should be enough. How many spoonfuls do you have to taste to decide if there is enough salt in the soup? But in other cases, 1,000 might be closer to the right number. Most market research and public opinion firms seem to interview roughly that number of people. And sometimes the really ambitious researchers go for gigantic samples (though, as we will see, this is almost always unnecessary). Dr. Alfred Kinsey, for example, who published the enormously influential volumes Sexual Behavior in the Human Male and Sexual Behavior in the Human Female in the mid-20th century, was convinced that he needed to collect 100,000 histories to have a representative sample of the population.

Guideline. In considering whether an inductive generalization has satisfied the total evidence condition, first ask, Is the sample large enough?

14.2.2 When a Sample of One Is Enough

We will proceed from here via a few simple rules of thumb; these tips will give you all that you need for most practical purposes to evaluate most inductive generalizations. (If you are thirsty for more, you may wish to read a book or take a course in statistics.) The first rule of thumb is this: for most inductive generalizations, you need a sample of either one or 1,000.

The way to decide whether it should be one or 1,000 is to ask the question, Is this an all-or-none property? With some properties, it is fairly clear that either all or none of the entire population has it. Saltiness of soup—assuming the pot has been stirred—is a good example; before you take a taste, it is reasonable to believe that if the taste is too salty, then the entire pot is too salty; but if it is not salty enough, then the entire pot is not salty enough. The properties too salty and not salty enough are in this case all-or-none properties.

We can look around us and discover many more everyday examples of all-or-none properties. Are you curious about what the morning edition of the Chicago Tribune reports about the snow conditions on the slopes of Vail? When you buy a copy and see that it reports excellent snow conditions, it is reasonable for you to conclude that this is what is reported by all the papers in that entire edition. That is, it is reasonable for you to reason as follows:

All sampled copies of the morning edition of the Chicago Tribune report excellent snow conditions at Vail.
∴All copies of the morning edition of the Chicago Tribune report excellent snow conditions at Vail.

Reporting excellent snow conditions at Vail is likely to be an all-or-none sort of property for copies of the same edition of a newspaper; so a sample size of one is sufficient. You could buy a thousand copies from newsstands and newspaper boxes throughout Chicago and check them, just to be sure the sample is large enough to support your conclusion. Doing so would strengthen your argument a bit, since it would help rule out the remote possibility that the first copy was the result of some bizarre error or trick. But that possibility is typically so remote that the added strength of the 999 extra copies would be negligible.^[3]

Calvin Coolidge, it is said, was once visiting a farm with some friends. When they came to a flock of sheep, one of the friends said, “I see these sheep have just been shorn.” Coolidge, famous for his caution, replied, “Looks like it from this side.” Coolidge was reluctant to generalize from the visible part—the sampled part—of each sheep to the whole sheep. But he didn’t really have to be so cautious. Given what most of us know, it is reasonable to believe that whether a sheep is shorn is an all-or-none sort of property; thus, even if we have sampled only one part of the sheep, if that part is shorn we can generalize to the whole sheep.

Inductive generalizations, however, are often criticized quite legitimately for relying on samples of one, or samples larger than one that are nevertheless too small. You would not, for example, interview merely one voter to find out which presidential candidate is preferred by the electorate, since favors candidate A is not typically an all-or-none property; we expect to find variety in the population with respect to this property. (The story is very different if you wish to find out which presidential candidate is favored by the electoral college of a single state; since they all vote the same way, depending on which candidate won the plurality of the state’s votes, you can generalize to them all if you know the vote of one.) Likewise, you should not—and if you are careful, you would not—make a decision about, say, someone’s honesty based on a single interchange with that person. Someone’s honest behavior in your first brief conversation may or may not be representative of that person’s behavior in general. A sample much larger than a single meeting is necessary for a logical argument about general behavior—this is one of the reasons we typically date before marriage.

If the inductive generalization is conducted scientifically—by a public opinion poll, say, or an experiment on rats or human subjects—then typically you will find that the property is not all-or-none. If the scientists had believed it to be an all-or-none property, they would not have gone to the trouble and expense required to construct a large sample so carefully.

Guideline. If the property is likely to be all-or-none, then a sample of one is typically enough. It is almost certainly not an all-or-none property if there has been an effort to scientifically construct the sample.

Exercises Chapter 14, set (d)

For each of the simple arguments below, clarify it in standard format, identify the relevant property, and state whether it is probably, for this population, an all-or-none property.

Sample exercise. One camper to another, taking a thermometer out of a pot of water boiling over the fire: “See, the thermometer shows 99 degrees Centigrade. So that’s the temperature at which water boils at an altitude of 5,000 feet.”

Sample answer.

All sampled water at an altitude of 5,000 feet boils at 99 degrees Centigrade.
∴All water at an altitude of 5,000 feet boils at 99 degrees Centigrade.

Property is boils at 99 degrees Centigrade. It is probably an all-or-none property, thus a sample of one should be enough.

One 7-Eleven shopper to another, holding up a can of coffee: “I can see from this can that 13-ounce cans of Folger’s coffee cost $3.99 here.”
That driver almost ran me off the road. It’s obvious that people who live in this city are terrible drivers.
You have trouble doing long division? Then you’re not very intelligent, are you? (Hint—make the population opportunities to show intelligence.)
How do I know that any copy of her brand new book that you pick up will be long? I just read all 750 pages.
I’ve known two other people from Syracuse, and they were both of Norwegian descent. So I guess most people from Syracuse are Norwegian.
That ant bit me and left an angry red welt on my leg. I’m not going near the rest of them.

14.2.3 When a Sample of 1,000 Is Enough

When the one-or-1,000 rule of thumb is applied and the property is not all-or-none, you can assume for most purposes that a sample of 1,000 is sufficient for a logically strong argument—assuming the sample is randomly selected. This is well illustrated by public opinion polls and marketing surveys, which almost always have samples of roughly that size. But there is nothing magical about the number 1,000; its sufficiency depends on several things—most notably the margin of error. Whether a random sample of 1,000 is big enough depends on whether the margin of error is at least 3 percent. If it is impossible to collect a sample of 1,000, then the arguer must settle for a larger margin of error or for a logically weaker argument.

Let us look at this in a more general way. We have already seen two things that can increase the logical strength of an inductive generalization. The larger the margin of error, the stronger the logic of the argument. And the larger the sample size—assuming it is randomly selected—the stronger the logic of the argument (though, as we have already seen, increases in sample size after a certain point are only marginally helpful). This suggests another rule of thumb: if the margin of error increases appropriately as the sample size decreases, the logical strength of the argument remains steady. The bigger the margin of error, the smaller the necessary sample size. The reverse is likewise true: the smaller the margin of error, the larger the necessary sample size.

Statisticians can establish for any sample size (assuming the sample is randomly selected) the margin of error that can be confidently assumed. Here are some useful points along the continuum:

The Margin of Error per Sample Size

Sample size	Margin of error
10	+/- 30 percent
100	+/- 10 percent
500	+/- 4 percent
1,000	+/- 3 percent
2,000	+/- 2 percent

This means that a voter opinion survey of 1,000 people can provide the basis for a strong inductive generalization so long as the conclusion allows for a margin of error of at least 3 percent. Suppose this is the premise:

1. Fifty percent of the sampled voters favor Jones.

A random sample of 1,000 is large enough to support this conclusion:

∴Fifty percent (+/- 3 percent) of the voters favor Jones.

But if the random sample were only 100, the logic of the induction would be equally strong only if the argument concluded that from 40 percent to 60 percent favored Jones. If the random sample were 10, then the conclusion would have to be that from 20 percent to 80 percent favored Jones. If, however, it were as large as 2,000, then the conclusion could be narrowed to the assertion that 48 percent to 52 percent favored Jones.
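The chapter does not say how statisticians arrive at the figures in the table above. One common textbook approximation, assumed here rather than taken from this text, is 1.96 times the square root of p(1 - p)/n, evaluated at p = 0.5 (the worst case). The sketch below uses the hypothetical names Z_95 and margin_of_error to show that this formula gives values close to the table's rounded figures.

```python
import math

Z_95 = 1.96  # assumed multiplier for a .95 level under the normal approximation


def margin_of_error(sample_size, proportion=0.5, z=Z_95):
    """Approximate margin of error (as a fraction) for a random sample."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Sample sizes from the table, with the margin shown in percentage points.
for n in (10, 100, 500, 1000, 2000):
    print(n, round(100 * margin_of_error(n), 1))
```

The computed margins come out at roughly 31, 10, 4, 3, and 2 percentage points, so the table's entries can be read as rounded versions of this approximation.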

Some arguments do not need a high level of precision. Suppose I am interested in providing venture capital to fund a specialty candy store in the local shopping mall, and I determine that at least 10 percent of the shoppers will have to buy something in the store if it is to succeed. I randomly (really randomly) interview 10 shoppers and find that 6 of them would have bought candy from my store. From this I can conclude that 60 percent (+/- 30 percent) of the shoppers would buy candy from the store (I take this directly from the preceding table of sample sizes), that is, anywhere from 30 percent to 90 percent. This is well above my cutoff point of 10 percent, so greater precision is not necessary. Again, in general, the less precision needed in the conclusion, the smaller the sample that is needed.
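The candy store reasoning can be laid out as a small calculation. This is a sketch only; the dictionary copies the margins from the preceding table, and the names MARGIN_BY_SAMPLE_SIZE and precise_enough are hypothetical.

```python
# Margins of error from the chapter's table, in percentage points.
MARGIN_BY_SAMPLE_SIZE = {10: 30, 100: 10, 500: 4, 1000: 3, 2000: 2}


def precise_enough(hits, sample_size, cutoff_percent):
    """Return (low, high, ok): the range for the estimate and whether
    even its low end meets the cutoff."""
    estimate = 100 * hits / sample_size
    margin = MARGIN_BY_SAMPLE_SIZE[sample_size]
    low, high = estimate - margin, estimate + margin
    return low, high, low >= cutoff_percent


# 6 of 10 randomly interviewed shoppers would buy; the store needs at least 10 percent.
print(precise_enough(6, 10, 10))
```

Because even the low end of the range (30 percent) is above the 10 percent cutoff, the small sample is adequate for this decision. With a cutoff of 40 percent the same sample would leave the question open.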

Guideline. For properties that are not all-or-none, if the margin of error increases appropriately as the sample size decreases, then the logical strength of the argument remains steady.

Rules of Thumb for Judging Sample Size When the Sample Is Randomly Selected

One is enough when the property is all-or-none.
1,000 are enough when the property is not all-or-none and the margin of error is at least 3 percent.

Exercises Chapter 14, set (e)

For each sample described, write a conclusion with an appropriate margin of error.

Sample exercise. A random sample of 500 pairs of socks put into clothes dryers showed that one-fourth of the pairs lost one member by the end of the cycle.

Sample answer. Twenty-five percent (+/- 4 percent) of pairs of socks put into clothes dryers lose a member by the end of the cycle.

In a random sample of 10 owners of Subarus in the most recent model year, 5 of them were “extremely pleased” with their car.
Four percent of 2,000 randomly sampled American homeowners said they preferred renting.
In a random sample of 100 days in Atlanta, on 7 of them unhealthful levels of ozone were in the air.
One-tenth of a random sample of 1,000 mosquitoes captured in a Florida swamp were carrying the virus that causes encephalitis.
In a random sample of 1,000 Texas adults, 483 believe the state sport should be rodeo.
In a random sample of 500 television episodes from 50 years of television history, one-third of them depicted at least one murder.
Eighteen percent of the 500 streetlights sampled at random in Manhattan were out of order.
Of 2,000 randomly sampled American Express cardholders, 1,609 were pleased with their customer service.

14.2.4 Population Size

Although the size of the sample is very important, the size of the population has very little to do with the logical strength of the argument. This may initially strike you as contrary to common sense. But note that common sense does not tell you that you must take a bigger taste if you have a bigger pot of soup, assuming it is properly stirred.

If the population is very small, population size can matter. Suppose you have 1,000 trees in your apple orchard and you want to sample them to learn how many trees are diseased. Being diseased is not likely to be an all-or-none property in an apple orchard, so our one-or-1,000 rule tells us to sample 1,000 trees. Suppose you do that and find that 160 of them are diseased. You no longer need to generalize; you have sampled the entire population, you have found 16 percent to be diseased, and no inference from sample to population is necessary. Small populations matter because they make inductive generalizations unnecessary.

Suppose, however, that owing to the cost and difficulty of testing the trees, you cannot randomly check more than 500 of them; you do so and find that 71 of the sampled trees are diseased. At this point the best thing you can do is refer to the preceding sample size table. If you want a logically strong argument, you must conclude that 14.2 percent, plus or minus 4 percent, are diseased—the same as if your sample of 500 had been from an orchard 10 times larger. We find the same phenomenon in polling practices. If there are 150 voters in our hamlet, we can interview every voter and avoid the generalization. If there are 15 million voters, which is roughly the case in Canada, pollsters require a random sample of 1,000 to conclude that half the voters, with a margin of error of 3 percent, favor Jones. And if there are 150 million voters, which is roughly the case in America, they still require a random sample of 1,000 to conclude that half the voters, with a margin of error of 3 percent, favor Jones.

Why does this work? If 16 percent of the trees are diseased, then for each randomly selected tree there is a 16 percent chance, or a .16 probability, that it is diseased. This is true whether we are talking about 16 percent of 1,000 or 16 percent of 10,000. And if exactly half of the voters favor Jones, then—whether we are talking about half of 10 million or half of 100 million—there is a 50 percent chance, or a .50 probability, that each randomly selected voter favors Jones. It is this property of each member of the sample that governs the behavior of the sample as a whole.
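A small simulation can illustrate the point. The sketch below is an illustration under stated assumptions, not a statistical proof: it draws repeated random samples of 500 trees from orchards of two different sizes (10,000 and 1,000,000, both chosen for illustration), each with 16 percent diseased, and reports how often the sample frequency lands within 4 percentage points of 16 percent. The function name share_within_margin and the fixed seed are hypothetical choices.

```python
import random


def share_within_margin(population_size, diseased_rate=0.16, sample_size=500,
                        margin=0.04, trials=200, seed=0):
    """Fraction of random samples whose diseased frequency is within the margin."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    diseased_count = round(population_size * diseased_rate)
    hits = 0
    for _ in range(trials):
        # Trees are numbered 0..population_size-1; the first diseased_count are diseased.
        sample = rng.sample(range(population_size), sample_size)
        diseased_in_sample = sum(1 for tree in sample if tree < diseased_count)
        if abs(diseased_in_sample / sample_size - diseased_rate) <= margin:
            hits += 1
    return hits / trials


for size in (10_000, 1_000_000):
    print(size, share_within_margin(size))
```

In both cases the great majority of samples fall inside the margin, even though one orchard is a hundred times larger than the other.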

For practical reasons it may be more difficult to get a random sample when the population is far larger. Ten thousand trees or 100 million voters may be spread over a huge geographical area, making it impossible to give every member of the population an equal chance of being included in the sample. Thus, sample size or margin of error may be strategically increased to offset these practical difficulties (as we will see in the next section). But these adjustments are directly due to lack of randomness, and only indirectly due to population size.

Guideline. When the population is large, variation in population size has no bearing on the size of the random sample that is needed, although it may have a bearing on how easy it is to get a random sample.

14.2.5 Logical Strength and Confidence Level

Exactly how logically strong is the inductive generalization that begins 50 percent of the sampled voters favor Jones, assuming that it is based on a random sample of 1,000 and a margin of error of 3 percent? How much support does the premise provide the conclusion? This can be answered quite precisely: the probability of the conclusion, based on this premise and the relevant background evidence, is .95.

Suppose the population of voters in the Jones poll numbers 10 million, and suppose that exactly 5 million—that is, 50 percent—favor Jones. Statisticians tell us that if we took 20 different random samples of 1,000 from that population of 10 million, 19 of those 20 times the number in the sample that favored Jones would be in the range of 47 percent to 53 percent (that is, 50 percent +/- 3 percent). Since the true conclusion, namely, 50 percent (+/– 3 percent) of the voters favor Jones, would occur 19 out of the 20 times, or in 95 percent of the cases, its probability of success is .95. We may have gotten it wrong this time, but that would mean that this is the 1 time in 20 it would happen.

If, on the other hand, we took 20 different random samples of only 10 from that population, 19 out of 20 times the number in the sample that favored Jones would be in the range of 20 percent to 80 percent. To get the same .95 probability of success with such a small sample, the conclusion must be 50 percent (+/– 30 percent) of the voters favor Jones.
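The 19-out-of-20 figure can be checked by direct counting. The sketch below assumes that each randomly selected voter independently has a .50 chance of favoring Jones, which is a reasonable simplification when the population is very large. It then adds up the exact probability that the sample frequency falls in a given range. The function name chance_sample_in_range is hypothetical.

```python
from math import comb


def chance_sample_in_range(sample_size, low_percent, high_percent):
    """Probability that a sample from a 50/50 population has a frequency
    between low_percent and high_percent, inclusive."""
    total = 2 ** sample_size  # number of equally likely favor/oppose patterns
    favorable = sum(
        comb(sample_size, k)
        for k in range(sample_size + 1)
        if low_percent * sample_size <= 100 * k <= high_percent * sample_size
    )
    return favorable / total


print(chance_sample_in_range(1000, 47, 53))  # sample of 1,000, +/- 3 percent
print(chance_sample_in_range(10, 20, 80))    # sample of 10, +/- 30 percent
```

Under this assumption the first probability comes out close to .95, and the second is a little higher, which is consistent with the table's margins being rounded.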

Professional researchers would typically term this a confidence level of .95. Confidence level, however, is just another expression for the probability of the conclusion, given the truth of the premise and given the relevant background information. That is, it is just another expression for logical strength. (It is not the level of confidence that you do have, but the level of confidence that you rationally ought to have.) Professional researchers tend to aim for arguments with a .95 probability, and we will typically refer to arguments that achieve this level of probability as very strong.

No rule says that when the probability is .95 you should believe the conclusion. We are only talking about the argument’s logic. There must also be a very high probability that the premises are true before you accept the argument as sound. Nor does any rule say that, when you do accept such an argument as sound and do believe the conclusion, you should act on it with confidence. If the argument’s conclusion has to do with whether a rope bridge over a treacherous waterfall is able to support you, you may quite reasonably turn around and go home unless you can be given much better than a 19 out of 20 chance of survival. But if the conclusion has to do with whether a black speck on the table is a fly or an imperfection in the surface, you may quite reasonably attempt to brush it away even if the confidence level is considerably lower than .95.

The vocabulary of logical strength, probability, and confidence level can also be applied to the other sorts of arguments we have covered. In deductively valid arguments, for example, the conclusion would be true every time you considered the premises; thus, they confer a probability of 1.00 on their conclusions—that is, they have a confidence level of 1.00. And consider frequency arguments, such as this one:

Sixty-seven percent of the marbles in the clay pot are red.
The marble I’ve just taken in my hand is a marble in the clay pot.
∴The marble I’ve just taken in my hand is red.

Assuming I have no relevant background evidence except for the frequency expressed in the first premise, we can say that the conclusion will be true 67 percent of the times that I take a marble in my hand. Thus, the conclusion, given those premises and that background evidence, has a probability of .67—and the argument has a confidence level of .67.
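The claim that the conclusion will be true 67 percent of the time can be pictured as a simulation. This is a minimal sketch, assuming each marble in the pot is equally likely to end up in my hand; the function name and the number of draws are arbitrary choices made for illustration.

```python
import random

def share_true(red_share, draws, seed=0):
    """Simulate taking one marble from the pot many times and return the
    share of draws on which 'the marble in my hand is red' is true."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    hits = sum(rng.random() < red_share for _ in range(draws))
    return hits / draws

observed = share_true(0.67, 100_000)
print(round(observed, 3))   # should come out near 0.67
```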

Guideline. Judge as very strong the logic of any inductive generalization that renders its conclusion .95 probable.

Exercises Chapter 14, set (f)

Create a brief argument of the sort described and with the degree of logical success described. Explain.

Sample exercise. Inductive generalization, no support at all.

Sample answer. All of the Italian restaurants I have visited have served pasta, so it follows that only Italian restaurants serve pasta. (No support, because it does not satisfy the correct form condition.)

Inductive generalization, .95 probability (very strong).
Frequency argument, .55 probability (very weak).
Singular affirming the antecedent, 1.00 probability (valid).
Frequency argument, .50 probability (no support).
Singular denying the antecedent, .50 probability (no support—note that a probability below .50 would support the falsity of the conclusion).

14.3 The Total Evidence Condition (2): Random Selection

14.3.1 Random Selection

To review, we are considering how to evaluate the logic of inductive generalizations. We are assuming that the correct form condition is satisfied and we are focusing on the total evidence condition. For the total evidence condition to be satisfied, recall, the key question is whether the sample accurately represents the population. This can be divided into two questions: whether the sample is large enough, and whether the sample has been randomly collected. We now turn to the second question.

To say that sample selection is random, for the practical purposes of this text, is to say that every member of the population has had an equal opportunity to be included in the sample, so that exactly the relevant variations of the population might be proportionately represented. This is an important definition, for it differs from the way we ordinarily use the term. There would be nothing unusual about my saying, “I randomly interviewed 30 people at the bus station to find out what people in the city think of rapid transit.” This, however, is not the sort of randomness that we are looking for in evaluating inductive generalizations. In this relaxed use of the term, random simply means indiscriminate, or without any special principle of selection. But notice that not everyone in the city had an equal opportunity to be included in the sample—only those who happened to be at the bus station. This means that relevant variations of the population have almost certainly been omitted from the sample; for example, people who never ride the bus, and so are excluded from the sample, probably tend to have views on this subject that differ from those who ride it. In short, the randomness that we are looking for is not indiscriminate randomness; it requires carefully considered principles of selection.
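The bus station case can be put in numbers. The sketch below is hypothetical throughout: the shares of riders and non-riders, and the rates at which each group favors rapid transit, are invented solely to show how far an indiscriminate sample can drift from the population.

```python
def true_share(groups):
    """Share of the whole population with the property.
    groups: list of (share_of_population, share_with_property)."""
    return sum(size * rate for size, rate in groups)

# Invented city: 20% ride the bus and 80% of them favor rapid transit;
# 80% never ride and only 30% of them favor it.
riders = (0.20, 0.80)
non_riders = (0.80, 0.30)

city = true_share([riders, non_riders])   # what a truly random sample estimates
bus_station = riders[1]                   # a bus-station sample sees riders only
print(round(city, 2), bus_station)        # 0.4 0.8
```

On these made-up figures the bus station sample would report support at twice its real level, no matter how many people were interviewed there.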

An ideal way to get a perfectly random sample would be to list all the members of the population, run the names through a computerized randomizing program (or shake them thoroughly in a giant hat, or put each name on a surface of a huge many-sided fair die), and sample the first 1,000 that are selected. But this is almost never something that works in real life. It would be prohibitively expensive to do this if, say, you were generalizing about voter preferences across the entire American population. And it would simply make no sense if you were, say, generalizing about pollution throughout an entire river. (How would you list all the potential beakers of water that make up the river?)
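For a population that can be listed, the ideal procedure is easy to sketch. The example assumes a complete list of members is available, which is exactly what is usually missing in real life; the list of voters here is a hypothetical stand-in.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k distinct members so that every member has an equal chance."""
    rng = random.Random(seed)
    return rng.sample(population, k)

# Hypothetical list standing in for the full list of population members.
voters = [f"voter_{i}" for i in range(100_000)]
sample = simple_random_sample(voters, 1_000, seed=42)
print(len(sample), len(set(sample)))   # 1000 1000 (no member is drawn twice)
```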

Professionals usually find it simpler to achieve randomness by a technique called stratification. They make an informed judgment regarding which subpopulations are likely to differ from the larger population in the frequency with which they exhibit the property in question. They divide the population proportionately into these smaller populations, or strata, and sample at random from each stratum. Suppose, for example, the population is registered voters in the state of North Carolina and the property is prefers the Republican candidate in the North Carolina gubernatorial election. Voter preference is likely to vary according to factors such as party affiliation, ethnicity, economic status, and gender. So the pollsters must ensure that they have randomly selected, for example, Republicans, African-Americans, welfare recipients, and women in sufficient numbers so that their share of the sample matches their share of the population of North Carolina’s registered voters. Voter preference is not likely to vary, however, according to astrological sign, so there is no need to be sure that a Scorpio stratum is included in the sample.
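The proportional part of stratification can be sketched as follows. This is a simplified illustration, not a description of how any polling organization actually works: it stratifies on a single invented factor (party registration), the counts are made up, and the helper names are hypothetical. Real polls stratify on several factors at once.

```python
import random

def proportional_allocation(stratum_sizes, sample_size):
    """Split sample_size across strata in proportion to each stratum's size,
    using largest-remainder rounding so the parts add up exactly."""
    total = sum(stratum_sizes.values())
    exact = {name: sample_size * size / total for name, size in stratum_sizes.items()}
    alloc = {name: int(x) for name, x in exact.items()}
    leftover = sample_size - sum(alloc.values())
    # Give the remaining places to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda n: exact[n] - alloc[n], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc

def stratified_sample(strata, sample_size, seed=None):
    """strata maps a stratum name to the list of its members."""
    rng = random.Random(seed)
    sizes = {name: len(members) for name, members in strata.items()}
    alloc = proportional_allocation(sizes, sample_size)
    # Sample at random within each stratum.
    return {name: rng.sample(strata[name], alloc[name]) for name in strata}

# Invented strata; these counts are not real registration figures.
strata = {
    "Republican": [f"R{i}" for i in range(3_000)],
    "Democrat": [f"D{i}" for i in range(3_500)],
    "Unaffiliated": [f"U{i}" for i in range(3_500)],
}
picked = stratified_sample(strata, 1_000, seed=1)
print({name: len(members) for name, members in picked.items()})
# {'Republican': 300, 'Democrat': 350, 'Unaffiliated': 350}
```

Each stratum's share of the sample matches its share of the population, which is the point of the technique.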

Guideline. Do not judge an inductive generalization to be logically strong unless its sample is randomly selected—that is, unless the sample includes the relevant variations in the appropriate frequency. Remember that not all variations are relevant.

Exercises Chapter 14, set (g)

For each statement in set (e), list (i) the population, (ii) the property, (iii) two relevant variations in the population, and (iv) one irrelevant variation.

Sample exercise. A random sample of 500 pairs of socks put into clothes dryers showed that one-fourth of the pairs lost one member by the end of the cycle.

Sample answer. Population: pairs of socks put into clothes dryers. Property: lost one member by the end of the cycle. Relevant variations: size of load, time of cycle. Irrelevant variation: brand of socks.

14.3.2 Random Mistakes

Our purpose in this textbook is not to design samples but to evaluate arguments. This section will help you in detecting ways in which a sample might fail to be randomly selected and thereby contribute to an unsound argument.

Sometimes you can see that a relevant variation has been omitted without knowing the exact sampling process that was used. If you knew that 75 percent of those in the sample were men, and the question was whether Americans thought that women were treated equally in the workforce, then you would know there was a problem with the sample; attitudes on this vary with gender, so the genders must be equally represented. If, on the other hand, the question were whether baseball fans favored the designated hitter rule, you probably would not know whether there was a problem with the sample. It may well be that 75 percent of all baseball fans are men, in which case they would turn up with this frequency in a random sample.

Often you simply have no details about the sample, in which case your approval of the argument’s logic may depend on whether you trust the person or organization that collected it. The Chapters 8 and 9 guidelines for appeals to authority are directly pertinent here. Was the research done by a credible organization? Is there no sign of sponsorship by a business that has an interest in a certain outcome? Is the prior probability of the outcome reasonably high? Yes answers to all of these questions count in favor of the argument.

There are, however, a few tips that can reliably tell you when a sample is not randomly selected. Grab sampling, for example, is the process of including in your sample whatever members of the population happen to come your way. This is the method used in the bus station case; it is easy to do, but it rarely provides a representative sample. In The De-Valuing of America, William Bennett recounts the use of such a technique by a department chair at a prestigious university, who remarked the day after the 1980 presidential election: “I voted for Carter. Most of my colleagues voted for Carter. And a few voted for Anderson. But Reagan got elected. Who the hell voted for Reagan?”

The following Los Angeles Times story includes an obviously flawed grab sample:

The Water Quality Control Board is considering imposing fines of $10,000 against the City of Los Angeles for each major discharge of raw sewage. But Harry Sizemore, assistant director of the city Bureau of Sanitation, insists that the water in the ocean does not cause disease. “I swim there,” he said. “And several members of our bureau are avid surfers who use the area. None of us has ever caught any diseases from it.”

This argument has several defects besides its dependence on a flawed sampling procedure. For example, there is some reason to distrust the reports of this particular group—and thus, reason to doubt the truth of the premise. Further, the sample is a very small one. And we would prefer an analysis of a random sample of the water itself rather than a random sample of those who have been in the water. But the relevant point here is that Sizemore has not provided us with a random sample of those who have been in the water. It is a grab sample, made up of whomever Sizemore happened to talk to at the office, and thus there is no reason to think that it is representative.

Snowball sampling, a close relative of grab sampling, is the process of adding new members to the sample on the basis of their close relationship with those already included (thus gathering members in the same way that a snowball gathers snow as it rolls along). I have already mentioned the highly publicized studies on sexual behavior conducted by Alfred Kinsey in the 1940s and 1950s. Kinsey frequently selected new interview subjects by asking his interview subjects to refer him to their friends and acquaintances. Given that he had a special interest in talking to those whose sexual practices were not considered mainstream, and given that friends and acquaintances of those who were not in the sexual mainstream were themselves somewhat likely to be out of the mainstream, this snowball sampling produced significant distortions in his sample. True, Kinsey collected an enormous sample. But, due to his snowball technique, the magnified sample size magnified the distortion.

Self-selected sampling is probably the most common, and most insidious, error. This occurs when members of the population decide for themselves whether to be included in the sample. Before we stray too far from Alfred Kinsey, note this Psychology Today review of a similar but more recent study:

Love, Sex, and Aging is a report of a survey of 4,246 Americans aged 50 and older—the largest sample of older persons about whom detailed sexual data exist. It is composed entirely of volunteers who responded to an ad in Consumer Reports. The authors of the book say, “We are confident that many or most of our findings apply to a very broad segment of Americans over 50,” and present their findings in that spirit. Item: two-thirds of the women and four-fifths of the men 70 or older are still sexually active. Grandma, Grandpa, you couldn’t! You don’t!

Let’s begin by treating the argument a bit more fully. For simplicity, let’s clarify only the argument about men:

Eighty percent of the sampled men aged 70 or older are still sexually active.
∴About 80 percent of men aged 70 or older are still sexually active.

The frequency is 80 percent, the population is men aged 70 or older, and the property is still sexually active. I’ve charitably included an informal margin of error (about 80 percent) in the conclusion, which seems warranted by the imprecise way the authors express their conclusion (“most of our findings apply to a very broad segment of Americans”). I will take the premise to be probably true, since I have no reason to doubt the truthfulness of the authors and no compelling reason to doubt the word of those who submitted the survey (though it is possible that those who submitted the surveys either overstated or understated the extent of their sexual activities).

This brings us to an evaluation of the argument’s logic. It clearly satisfies the correct form condition, so we can move on to the total evidence condition. Is the sample large enough? It is hard to say, since the excerpt merely states that 4,246 people over the age of 50 responded to the survey; but the argument we are considering is based only on the surveys submitted by men over the age of 70. Let’s suppose there are a few hundred in this category; if so, the sample is probably large enough to support the vague “about 80 percent” of the conclusion.

But is the sample randomly selected? Certainly not. As the passage states, the sample is made up of those who voluntarily responded to a survey in Consumer Reports. This filters out all of those who read Consumer Reports but are not interested enough in sex to be interested in filling out a survey on the topic. It also filters out a large group of elderly people who ignore Consumer Reports because they can’t afford most of the items described in the magazine. These people are also unable to afford top medical care and for that reason they are probably less healthy and less interested in sex. In short, the sample is self-selected and thus grossly unrepresentative.

For that reason alone, the logic of the argument is very weak. There is no problem with the argument’s premise or with its conversational relevance, but because of its weak logic it is clearly unsound.
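A small calculation shows how self-selection alone could produce a figure like 80 percent. Every number below is invented for illustration; nothing here is taken from the survey itself. The sketch simply assumes that men who are sexually active are much more likely to return a survey about sex than men who are not.

```python
def responding_share(groups):
    """Share of respondents with the property when groups respond at
    different rates. groups: list of
    (population_share, response_rate, share_with_property)."""
    responders = sum(size * resp for size, resp, _ in groups)
    with_property = sum(size * resp * rate for size, resp, rate in groups)
    return with_property / responders

# Invented figures: 40% of the population is active and 12% of them respond;
# 60% is inactive and only 2% of them respond.
active = (0.40, 0.12, 1.0)
inactive = (0.60, 0.02, 0.0)

sampled = responding_share([active, inactive])
print(round(sampled, 2))   # 0.8, although only 40% of this made-up population is active
```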

Finally, dirty sampling is the contamination of the sample—usually unintentional—by the sampling process itself. If you are examining your newly laundered shirts with muddy hands, your sample shirts will be muddy. Even if you have made no other sample-selection mistakes, this sample cannot support the general conclusion that all your newly laundered shirts are muddy. This is a failure of randomness, since in a randomly selected sample, exactly the relevant variations of the population are proportionately represented. Introducing mud is introducing a relevant variation that is not in the population.

Dirty sampling does not necessarily introduce dirt, but it does introduce a change in the sample that makes the sample relevantly different from the population. Suppose you are a somewhat absent-minded naturalist and wish to learn more about the eyesight of a tiny species of shrew that is nearing extinction. You use a strong light to see their eyes better, and find that all shrews in your sample have extremely small pupils relative to the size of their eyes. Your sampling procedure, of course, is dirty, since in mammals strong light typically causes the pupils to contract. The sampling process cannot be considered random, and the premise can provide no support to the conclusion.

Guideline. Be alert for ways in which an argument may fail to include a relevant variation in its sample. Typically, arguments that depend on grab sampling, snowball sampling, self-selected sampling, or dirty sampling do not have randomly selected samples and are thus logically very weak (and thus unsound).

Exercises Chapter 14, set (h)

For each of these passages, clarify the inductive generalization and then answer, with a brief explanation, the two total evidence questions.

Sample exercise. “The people, it seems, have declared California Republican Ronald Reagan the winner of the Reagan–Carter debate. Nearly 700,000 people paid 50 cents each to take part in an instant ABC News telephone survey following the presidential debate, and by a 2-to-1 margin they said Ronald Reagan had gained more from the encounter than Georgia Democrat Carter. ABC said that of the callers who reached one of the two special 900-prefix numbers during the 100 minutes following Tuesday night’s debate, 469,412 people or 67 percent dialed the number designated for Reagan and 227,017 or 33 percent dialed the one assigned to Carter. The network said an especially heavy volume of calls was recorded from ‘Western states’ but had no more precise breakdown immediately.”—from the Associated Press

Sample answer.

Sixty-seven percent of the sampled Americans considered Reagan the winner of the debate.
∴About 67 percent of Americans considered Reagan the winner of the debate.

The sample is easily big enough (by 700 times). But it is not randomly selected. It was self-selected, with more Democrats (who would have favored Carter) filtered out because they are not as able to afford the 50 cents and with more non-Westerners (who would have been less likely to favor the Californian Reagan) filtered out because they were in a later time zone and had gone to bed.

21 of 30 students in an English 101 course at the local community college expressed doubt that the degree they were working toward would actually get them a good job. From this it seems reasonable to conclude that the majority of the students at the school don’t have much faith in the practical value of their education.
Only 25 percent of 1,000 residents of Manhattan polled at a free concert in Central Park said they would support privatizing the park and instituting a mandatory fee for entrance. The sample would seem to reflect the attitude of New Yorkers in general.
You are in charge of quality control for a pharmaceutical company, and part of your job is to run a laboratory that collects random samples of your company’s drugs each month and examines them carefully for purity. One month your lab obtains a startling result: 60 percent of the sampled drugs are impure. You alert the company president (and, of course, the public relations officer) that over half of that month’s product is tainted. (Meanwhile, one of your lab technicians inspects the beakers used for pre-examination sample storage and discovers that due to a change in laboratory cleaning protocol this month, a microscopic chemical residue is left on the beakers after cleaning. Minute amounts of this residue have commingled with many of the drugs, causing the impurity.)
In 1936, in the midst of the Great Depression, the Literary Digest randomly selected 10 million names from phone books across the country and mailed them sample ballots for the upcoming presidential election between Republican Alf Landon and Democrat Franklin Delano Roosevelt. About 2 million of the ballots were returned and, based on the results of that sample, the magazine predicted confidently that Landon would win by a clear majority. (Postscript: Roosevelt won with 60 percent of the popular vote, and the Literary Digest, having lost all credibility, ceased publication soon after.)
An elderly woman overheard speaking to her friend: “Recently I drove through a small ‘art-colony’ village in Pennsylvania, which is normally frequented by tourists. I got the shock of my life when I saw about 75 young people all dressed exactly alike—in blue denim! I wondered if there had been a prison break, or an invasion of the Union Army. What is it with our young people? They have about as much individuality as connected sausage links. They all look alike. Same dress, same jeans, same long straight hair—it’s hard to tell one from the other.”
Most of the kids in this remote, rural high school in Grants, New Mexico, have only television to provide them with their images of big cities. Paul Sanchez confesses that he hates what he has seen of New York on television. As part of a class assignment, he writes: “New York seems like a corrupt place. Crime seems to rule. I am not a person who is easily intimidated but TV did it.”—TV Guide
Americans support the idea of letting children attend public schools of their choice. The public favored by a margin of 62 percent to 33 percent allowing students and parents to choose which public schools in their community the students attend. Officials said the Gallup-Phi Delta Kappa poll is the most comprehensive survey of American attitudes on educational issues since the series began in 1969. This year, Gallup interviewers asked a selected sample of 1,500 American adults 80 questions. The margin of error was 3 percentage points.—Associated Press
I have a master’s degree in mathematics and was well thought of by my professors. I am working as a computer programmer, and my coworkers, supervisors, and users admire my abilities. I scored in the upper 2 percentile on college entrance tests, usually in the upper percentile for mathematics and biology. However, I would probably score poorly on the Kaufmans’ test because I have a poor short-term memory. It sometimes takes me several months to learn my telephone number and address when I move. I find it hard to believe there is a strong correlation between short-term memory and the ability to think logically.—Letter to the editor, Science News

Four Ways Samples Can Fail to Be Randomly Selected

Grab sampling
Snowball sampling
Self-selected sampling
Dirty sampling

14.4 Evaluating the Truth of Premises about Sampling

Everything we said in Chapter 9 about the truth of premises applies to the premise of an inductive generalization. Of all the points covered there, the most important for present purposes is the point about dependence on authority. Usually, whether you accept the premise of an inductive generalization is a matter of whether you believe the sampler. Did the person really sample that population and find that property with that frequency? Make this decision in the same way you make any other decision about whether to rely on an authority.

14.4.1 Misunderstood Samples

In misunderstood samples, the method used for collecting information about the sample is not entirely reliable. This results in a misunderstanding of the sample’s properties, rendering the premise false. A. C. Nielsen, who established the Nielsen ratings system for television shows, began his career doing market research for retailers. One of his early accounts was Procter & Gamble, for whom he did a survey on soap. He carefully constructed his sample, did his survey, and returned with results that were drastically at odds with the Procter & Gamble sales data. The main discrepancy was that sales of Lux bar soap were lagging badly, even though huge numbers of those surveyed said they used Lux regularly. Nielsen was perplexed until he realized that Lux had the image of a soap for the well-to-do; people wanted to impress the interviewer, and thus said they used Lux whether they did or not. His sample was representative—it was large enough and it was randomly selected. The problem was with the premise, which stated that the sampled consumers used Lux with a certain frequency. They did not; the premise was false. The lesson for Nielsen was to find a more reliable way of determining what people really think.

In an old William Wellman movie called Magic Town, James Stewart plays a public opinion pollster who doesn’t have the resources to compete with the major organizations like Gallup and Harris. He happens upon a small town that perfectly reflects the variations found in the American public in general. He regularly solicits their opinions by disguising the interviews as casual conversation and produces astonishingly accurate results. But his love interest, a journalist played by Jane Wyman, finds out about his technique. Choosing truth over love, she writes a widely distributed feature article about the town. With the appearance of the article, one town councilman snorts, “In one week’s time I wouldn’t give the wart off my nose for anybody’s opinion in this town.” And he is right. Soon the townspeople are setting up booths for the dispensing of their opinions and affecting pompous airs. Aware of their importance, they take themselves too seriously, and the next poll is a disaster. Their views haven’t ceased to be representative; rather, they are now expressing views that they think they ought to have instead of their real views.

There are things that can be done to encourage a misunderstanding of the sample. In Chapter 4 we saw the power of slanted language; two questions might have the same cognitive content, but, cloaked in very different language, might generate very different reactions. When Jerry Falwell was the leader of the ultraconservative Moral Majority, he once took out a full-page advertisement asking readers to return their answers to several questions. One of the questions was this:

Are you willing to trust the survival of America to a nuclear freeze agreement with Russia, a nation that rejects on-site inspection of military facilities to ensure compliance?

It is very hard to say “yes” to the question. But if 90 percent of the respondents said “no” and Falwell reported, say, that 90 percent of the sampled Americans oppose a nuclear freeze agreement with Russia, it is likely that the premise would be false. Even if 90 percent said they opposed it, many would not have been expressing their true views.

At the same time, creative researchers often find ways of overcoming obstacles to understanding the sample. One study by a market research firm asked people to name their favorite magazine, knowing that they were likely to cite magazines that might impress the interviewer, such as Harper’s or the New Yorker. The surveyors, out of gratitude for the interview, then offered each person a free copy of any magazine of the person’s choice. The frequency with which they chose People and TV Guide was much higher than the frequency with which they admitted it was their favorite. You can imagine which data the researchers used as the basis of their report.

Because people’s attitudes are easily hidden, they are easy to misunderstand. Misunderstanding of samples is not necessarily limited, however, to people’s attitudes. In principle, any sample can be misunderstood; but the more hidden the property, the greater the opportunity for misunderstanding. I am less likely to misunderstand if I am sampling the weather in my backyard or the number of autos on the freeway. But I may start to slip if I am sampling the weather in China or the number of microparticles in auto emissions on the freeway.

There is no special reason to think that a misunderstood sample is also a dirty sample. As you scrutinize the auto emissions through your microscope, dirt on the lens may lead you to misunderstand the sample and so to offer a false premise about it. But it is not a dirty sample—and thus not unrepresentative—until the dirt falls from the lens and into the microparticles.

Guideline. Be especially alert for ways in which the sample may have been misunderstood, thus producing a false premise. The more hidden the property, the greater the opportunity for misunderstanding.

Exercises Chapter 14, set (i)

For each of these passages, clarify the inductive generalization and then evaluate the truth of the premise, with a special view to whether and how the sample has been misunderstood.

Sample exercise. One study showed that, based on their own report, 80 percent of the population is above average in intelligence.

Sample answer.

Eighty percent of the sampled population is above average in intelligence.
∴About 80 percent of the population is above average in intelligence.

The premise is probably false. (Not certainly, since we are not told the sampling process, and it is possible that the sampling was not random, but was done, say, at a reunion of college graduates.) If people are asked if they are above average in intelligence, they will usually say they are (and probably believe that they are) even if they are not. So there is no reason to accept the premise.

Across the board, reputable polls in 2016 estimated Trump’s level of support to be around 40 percent. Yet when the votes came in, he received over 46 percent of the votes. (Consider the “shy Trumper” view that many voters knew that supporting Trump was socially undesirable and thus did not admit it to pollsters.)
According to her, the whole world is rosy. You’d expect her to think that, since she’s always looking at it through rose-colored glasses.
A study by the fitness club showed that 95 percent of its customers looked better after two months in its program. Subjects were asked to decide whether the customers looked better before or after based on “before” and “after” photographs supplied by the fitness club. (Scrutiny of the photographs indicates that in the “after” pictures the lighting was better and the customer had on more makeup, better clothes, and a bigger smile.)
Two University of Texas at Austin sociologists, David A. Snow and Cynthia L. Phillips, tested 1,125 students to see whether they were primarily concerned with themselves or society—with “impulse” or “institution,” as the researchers put it. Eighty percent saw themselves guided by their own “feeling, thought, and experience.” Only 20 percent saw themselves guided by “institutionalized roles and statuses.”—Psychology Today

14.5 Complex Arguments

Complex arguments, as we have seen, are nothing more than chains of simple arguments. If you can clarify and evaluate simple ones, you can do the same for complex ones. There is one fairly common sort of chain, however, that includes an inductive generalization and is worth considering here.

Sometimes, especially in informal arguments, we move from a statement about a sampled portion of a population to a conclusion about another member of the same population. I might argue, for example, “Every Japanese car I’ve ever owned has been well-built, so that Toyota is probably well-built.” Some would create a special category for such an argument; some logicians, for example, term it a singular predictive inference. Others might quite naturally take it to be an argument from analogy, in which that Toyota is argued to be analogous to every Japanese car I’ve ever owned. (See the next chapter for more detail on arguments from analogy.) But, as we saw briefly in Chapter 11, it is probably most useful to clarify it as a complex argument, made up of an inductive generalization followed by a singular categorical argument (or, in related cases, followed by a frequency argument). The clarification, then, would look something like this:

1. All sampled Japanese cars are well-built.
∴ 2. [All Japanese cars are well-built.]
3. [That Toyota is a Japanese car.]
∴ C. That Toyota is well-built.

The inference to 2 is an inductive generalization, while the inference from 2 and 3 to C is a singular categorical argument.

Using reasoning of this sort, the FBI makes detailed profiles of criminals, interpreting evidence left at the scene in the light of their extensive records of similar crimes. In one sensational case a white female murder victim, naked and mutilated, was found in the Bronx. Agents at the FBI concluded that the killer was white, because in the overwhelming majority of mutilation murders, the killer is the same race as his victim. They further concluded that the murderer was in his mid-20s to early 30s, because the crime scene demonstrated a kind of methodical organization and such organization made an impulsive teenager or someone in his early 20s an unlikely suspect. An older man would likely have been jailed already, as the urge to commit brutal sex murders tends to surface at an early age, and the chances that a person could commit a number of such murders over a span of years without being captured would be slim. In this way, the FBI put together a detailed portrait of the killer and quickly found and convicted him.

This reasoning includes the sampling of hundreds of cases of mutilation murders; it also includes the application of that experience to a specific case. The following clarification captures one of the many similar complex arguments contained in the passage:

1. Almost all sampled mutilation murderers are the same race as their victims.
∴ 2. Almost all mutilation murderers are the same race as their victims.
3. The Bronx murderer is a mutilation murderer.
∴ C. The Bronx murderer is the same race as his victim.

It can now be evaluated in two parts—the first as an inductive generalization, the second as a frequency argument.

Guideline. When an argument moves from a sample to a specific instance, clarify and evaluate it as an inductive generalization followed by a singular categorical argument or frequency argument.

Exercises Chapter 14, set (j)

For each of these passages, clarify and evaluate the complex argument.

Sample exercise. “‘My name is McGlue, sir—William McGlue. I am a brother of the late Alexander McGlue. I picked up your paper this morning, and perceived in it an outrageous insult to my deceased relative, and I have come around to demand, sir, WHAT YOU MEAN by the following infamous language: “The death-angel smote Alexander McGlue, and gave him protracted repose; he wore a checked shirt and a number nine shoe, and he had a pink wart on his nose. No doubt he is happier dwelling in space over there on the evergreen shore. His friends are informed that his funeral takes place precisely at quarter-past-four.”

“This is simply diabolical. My late brother had no wart on his nose, sir. He had upon his nose neither a pink wart nor a green wart, nor a cream-colored wart, nor a wart of any other color. It is a slander! It is a gratuitous insult to my family, and I distinctly want you to say what do you mean by such conduct?

“‘. . . How could I know,’ murmured Mr. Slimmer, ‘. . . that the corpse hadn’t a pink wart? I used to know a man named McGlue and he had one, and I thought all McGlues had. This comes of irregularities in families.’”—Max Adeler, “The Obituary Poet”

Sample answer.

1. All sampled McGlues have pink warts on their noses.
∴ 2. All McGlues have pink warts on their noses.
3. Alexander McGlue is a McGlue.
∴ C. Alexander McGlue has a pink wart on his nose.

EVALUATION OF ARGUMENT TO 2

TRUTH

Premise 1 is probably true (in the story—given the silly nature of the story); no special reason to doubt Slimmer’s report.

LOGIC

Extremely weak. Satisfies correct form condition for inductive generalization. But a sample of one is insufficient, since having a pink wart on one’s nose is not an all-or-none property for families.

SOUNDNESS

Unsound due to weak logic.

EVALUATION OF ARGUMENT TO C

TRUTH

Premise 2 is certainly false; not only is it not supported by the argument provided, but the story gives evidence that Alexander is a counterexample.

Premise 3 is probably true—given context of story—no reason to doubt it.

LOGIC

Valid singular categorical argument.

SOUNDNESS

Unsound due to falsity of premise 2.

A recent survey of 500 owners of golden retrievers indicated that 95 percent of them considered their dog to be well behaved with children. I think I’ll get this golden retriever, then—since it should be good with my kids.
Two educational psychologists at Temple University analyzed the instructor evaluation ratings done by 5,878 students at Temple, matching the ratings with the grades that the students had predicted for themselves. They found that in most courses, evaluations seem to be based on a variety of factors that generally outweigh the matter of just getting a good grade—that teachers can’t significantly affect their scores by leading students to believe that they will get good grades. So, your professor in this class shouldn’t expect assurances of good grades to inflate your instructor evaluation.

Exercises Chapter 14, set (k)

Clarify and evaluate the following arguments.

Most teenage girls now aspire to professional occupations, such as doctor or lawyer, according to a report by Helen Farmer, a psychologist at the University of Illinois. Farmer queried 1,234 9th and 12th graders from nine Illinois schools. By way of contrast, less than half of the boys had similar aspirations.—Associated Press.
“I’m grateful that CBS still carries the ‘Bugs Bunny/Road Runner’ show, that collection of Warner Bros. animated classics. The only catch is that occasional bits of cartoon ‘violence’ have been trimmed away by the network. This strikes me as unnecessary and downright silly. After all, I watched these cartoons without cuts when I was a kid, and I turned out fine. Or at least OK.”—TV Guide
“I watched ‘PBS NewsHour’ one night last week and I watched the ‘ABC Evening News’ an hour later. With about 28.5 minutes more than the 21 actually delivered by ABC, PBS did an inferior job. I know that ABC’s commercials reap a good deal more cash from the network’s news operation in a month than the alms givers contribute to PBS stations in a year. But I also know that thorough reportage and editing costs no more than sloppy work. The network product is much better, and that’s not what the beggars are claiming.” (Take PBS newscasts as the population.)—George Higgins, Wall Street Journal
Yankelovich Clancy Shulman, a market research company based in Westport, Conn., asked 2,500 consumers whether they agreed or disagreed with the statement: “I feel somewhat guilty when buying non-American made products generally.” The figure was 51 percent, with a margin of error of 2 percentage points. “Something in the back of Americans’ heads is saying that they do or ought to feel guilty,” said Susan Hayward, senior vice president of Yankelovich.—Washington Post
I confess in advance that I saw only a few gusts-worth of “The Winds of War.” Not the least amazing thing about the series is that so many had so many evenings free to give it. It is absolutely true that I am, metaphorically speaking, judging the roll by the caraway seed, but caraway seeds aren’t nothing. It seemed to me on brief acquaintance that the acting, to put it in a kindly way, was serviceable rather than inspired.—Charles Champlin, Los Angeles Times

14.6 Summary of Chapter Fourteen

Inductive generalizations are typically represented as arguments with a single premise, in which both the premise and the conclusion are frequency statements. When an argument satisfies the correct form condition for an inductive generalization, the premise states that a sampled portion of a population has a certain property with a certain frequency, while the conclusion says that the entire population has the same property with the same frequency. Thus, these arguments generalize from a sample to a whole. In addition, the conclusion typically allows for a margin of error; this makes the argument logically stronger by making it more probable that the conclusion is true. Large margins of error, though logically helpful, can undermine the practical value of the argument.

The total evidence condition is usually a matter of whether the sample is representative of the population as a whole. Testing for representativeness requires asking two questions. The first question is whether the sample is large enough. As a rough-and-ready rule of thumb, samples should be made up of either one or 1,000 members of the population, regardless of the size of the population itself. A sample of one is enough if the property in question is an all-or-none sort of property. Otherwise, a random sample of 1,000 is typically enough, assuming a margin of error of 3 percent is satisfactory; a larger random sample is required for a smaller margin of error, while a smaller sample requires a larger margin of error. Samples set up in this way can result in arguments with very strong inductive logic; their confidence level is .95, simply meaning that the premises support the conclusion with a .95 level of probability.
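The relationship between sample size and margin of error described above can be checked with a little arithmetic. The following is a minimal sketch, not a formula taken from this chapter: it assumes the standard normal approximation used by pollsters, a .95 confidence level (critical value 1.96), and the worst-case frequency of 50 percent. The function names and the population sizes are illustrative assumptions.

```python
import math

Z_95 = 1.96  # standard normal critical value for a .95 confidence level (assumed)


def margin_of_error(sample_size, proportion=0.5, z=Z_95):
    """Approximate margin of error for a simple random sample.

    Uses the normal approximation; proportion=0.5 is the worst case,
    so the result is a cautious (widest) estimate.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


def margin_with_population(sample_size, population_size, proportion=0.5, z=Z_95):
    """The same estimate with the finite population correction applied."""
    correction = math.sqrt((population_size - sample_size) / (population_size - 1))
    return margin_of_error(sample_size, proportion, z) * correction


# Smaller samples need larger margins of error; larger samples allow smaller ones.
for n in (100, 500, 1000, 2500):
    print(f"sample of {n:>5}: margin of error about {margin_of_error(n):.1%}")

# For large populations, population size barely changes the answer.
for population in (1_000_000, 100_000_000):
    moe = margin_with_population(1000, population)
    print(f"population of {population:>11,}: margin of error about {moe:.2%}")
```

Under these assumptions a random sample of 1,000 gives a margin of error of roughly 3 percent whether the population is one million or one hundred million, which is why the rule of thumb does not mention population size.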

The second question is whether the sample is randomly selected. For the practical purposes of this text, this means that every member of the population has had an equal opportunity to be included in the sample, so that exactly the relevant variations of the population might be proportionately represented. If there is no obvious problem with the sample and it is the result of research by a reputable organization, then that may be enough to support the judgment that the sample is randomly selected. There can be many easy-to-detect flaws with samples, however, including grab sampling, snowball sampling, self-selected sampling, and dirty sampling.
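A small simulation can make the difference between random selection and self-selected sampling concrete. This is only a sketch under invented assumptions: the population of 100,000, the 30 percent frequency, and the rule that people who hold the opinion are three times as likely to volunteer are all hypothetical numbers chosen for illustration.

```python
import random

rng = random.Random(14)  # fixed seed so the illustration is repeatable

# Hypothetical population: 100,000 people, 30 percent of whom hold some opinion.
population = [i < 30_000 for i in range(100_000)]


def frequency(sample):
    """Fraction of the sample that has the property."""
    return sum(sample) / len(sample)


# Random selection: every member has an equal opportunity to be included.
random_sample = rng.sample(population, 1000)

# Self-selected sampling: members decide for themselves whether to take part.
# Assumption for illustration: opinion holders volunteer three times as often.
volunteers = [
    holds for holds in population
    if rng.random() < (0.03 if holds else 0.01)
]

print(f"population frequency:   {frequency(population):.1%}")
print(f"random sample:          {frequency(random_sample):.1%}")
print(f"self-selected sample:   {frequency(volunteers):.1%}")
```

The self-selected sample here is larger than the random one, yet it badly overstates the frequency, because the way its members were gathered is itself connected to the property being measured. Sample size cannot repair a failure of randomness.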

Inductive generalizations that are logically strong may nevertheless have false premises. A special problem for such arguments is the misunderstanding of samples; the more hidden the property, the easier it is to misunderstand the sample.

Samples are sometimes used as the basis for conclusions about unsampled single members of the population, without any specific mention of an intermediate general subconclusion. These arguments are best taken as complex enthymemes, made up of an inductive generalization followed by a singular categorical argument or frequency argument.

14.7 Guidelines for Chapter Fourteen

Structure an inductive generalization, when it would be loyal to do so, so that the conclusion drops the term sampled and adds a margin of error.
In the premise of an inductive generalization, translate stylistic variations into the logical constant sampled.
When the principle of loyalty allows, paraphrase inductive generalizations so as to include a non-zero margin of error in the conclusion.
In considering whether an inductive generalization has satisfied the total evidence condition, first ask, Is the sample large enough?
If the property is likely to be all-or-none, then a sample of one is typically enough. It is almost certainly not an all-or-none property if there has been an effort to scientifically construct the sample.
For properties that are not all-or-none, if the margin of error increases appropriately as the sample size decreases, then the logical strength of the argument remains steady.
When the population is large, variation in population size has no bearing on the size of the random sample that is needed, although it may have a bearing on how easy it is to get a random sample.
Judge as very strong the logic of any inductive generalization that renders its conclusion .95 probable.
Do not judge an inductive generalization to be strong unless its sample is randomly selected—that is, unless the sample includes the relevant variations in the appropriate frequency. Remember that not all variations are relevant.
Be alert for ways in which an argument may fail to include a relevant variation in its sample. Typically, arguments that depend on grab sampling, snowball sampling, self-selected sampling, or dirty sampling do not have randomly selected samples and are thus logically very weak (and, thus, unsound).
Be especially alert for ways in which the sample might have been misunderstood, thus producing a false premise. The more hidden the property, the greater the opportunity for misunderstanding.
When an argument moves from a sample to a specific instance, clarify and evaluate it as an inductive generalization followed by a singular categorical argument or frequency argument.

14.8 Glossary for Chapter Fourteen

Confidence level—the logical strength of the argument; the frequency with which the conclusion would be true if the premise(s) were true.

Dirty sampling—the contamination—usually unintentional—of a sample by the sampling process itself. This is a failure of randomness. In a randomly selected sample, exactly the relevant variations of the population are proportionately represented. Introducing contamination is introducing a relevant variation that is not in the population.

Fallacy of hasty generalization—the mistake of arguing from a sample that is not representative—that is not large enough or randomly selected. It is normally more illuminating if you avoid this term and focus your evaluation on the more specific mistakes made by the argument.

Grab sampling—the process of including in your sample whatever members of the population happen to come your way. This is a failure of randomness.

Inductive generalization—argument that draws general conclusions about an entire population from samples taken of members of the population. Form is:

n of sampled F are G. (Where n is any frequency, including 0 and 1.)
∴n (+ or – m) of F are G.

Margin of error—in the conclusion to an inductive generalization, the range of frequencies within which the property is stated to occur. Also called the confidence interval.

Misunderstood sample—when the method used for collecting information about the sample is not entirely reliable, the sample’s properties are misunderstood and the premise is rendered false. The more hidden the property (people’s attitudes, for example, are easily hidden), the more likely the misunderstanding.

Random selection—the process of selecting a sample such that every member of the population has had an equal opportunity to be included, so that exactly the relevant variations of the population might be proportionately represented.

Self-selected sampling—when members of the population decide for themselves whether to be included in the sample. This is a failure of randomness.

Snowball sampling—the process of adding new members to the sample on the basis of their close relationship with those already included (thus gathering members in the same way that a snowball gathers snow as it rolls along). This is a failure of randomness.

Stratification—the construction of a random sample, for practical purposes, by identifying groups within the population that tend to be relatively uniform and including strata, or groups, of the sample in numbers that proportionally represent their membership in the entire population.
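Stratification as defined above can be sketched in a few lines. The group names and sizes below are hypothetical, and the sketch assumes the simplest scheme, in which each group's quota is its proportional share of the sample and members are then drawn at random within the group.

```python
import random


def stratified_sample(strata, sample_size, rng):
    """Draw from each stratum in proportion to its share of the population."""
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional quota; with awkward sizes, rounding may leave the
        # quotas a member or two away from the intended sample size.
        quota = round(sample_size * len(members) / total)
        sample[name] = rng.sample(members, quota)
    return sample


rng = random.Random(14)  # fixed seed so the illustration is repeatable

# Hypothetical population split into three relatively uniform groups.
strata = {
    "urban": [f"urban-{i}" for i in range(6000)],
    "suburban": [f"suburban-{i}" for i in range(3000)],
    "rural": [f"rural-{i}" for i in range(1000)],
}

sample = stratified_sample(strata, 1000, rng)
for name, members in sample.items():
    print(name, len(members))
```

With these made-up sizes the quotas come to 600, 300, and 100, so each group appears in the sample in the same proportion as in the population.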

This is not the only form, just the most common. There are also, for example, comparative inductive generalizations, which may be clarified as follows: 1. Sampled F has H n more (or less) than sampled G. ∴ C. F has H n (+ or – m) more (or less) than G.
As statisticians would put it, the overlap in the margins of error means that the difference in the two results is not statistically significant.
A property that is normally an all-or-none property is, nevertheless, not necessarily such a property. If you have reason to think that the soup has not been stirred, or that this copy of the newspaper is a dummy, then a sample of one is not sufficient.

License

A Guide to Good Reasoning: Cultivating Intellectual Virtues Copyright © 2020 by David Carl Wilson is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.
