The FRQ is a great way to prep for the AP exam! Review FRQ practice writing samples from Unit 5 and corresponding feedback from Fiveable teacher Jerry Kosoff.
A researcher in Yellowstone National Park observed the "Old Faithful" geyser for several weeks. For each eruption of the geyser, the duration from start to end, in seconds, was recorded. The histogram below summarizes the results from 421 observations. The mean of the distribution is 210 seconds, with a standard deviation of 68 seconds.
a. Describe the sampling distribution of sample mean eruption length for random samples of 40 eruptions from the researcher's observations.
b. What is the probability that the sample mean eruption length for a random sample of 40 eruptions is 200 seconds or less?
a) The sampling distribution of sample mean eruption length for random samples of 40 eruptions has a mean of 210 seconds, and it is bimodal with peaks at 100-125 seconds and 250-275 seconds. The shape of the sampling distribution of sample mean eruption length seems to be roughly symmetrical, and the range of the sampling distribution of sample mean eruption length for random samples of 40 eruptions is no more than 250 seconds.
b) x = mean geyser eruption duration for a random sample of 40 eruptions
Conditions: Random - stated that there were random samples of 40 eruptions, 10% Rule for Independence - satisfied since there are at least 421(10) = 4210 observations of geyser eruptions, Normal/Large Sample - satisfied since n = 40 >= 30; therefore, the sampling distribution of sample mean eruption duration is approximately normal.
P(x<200) = P(z<-0.14) = .4443 <- from Table A using z-score of -0.14
z = (200-210)/68 = -0.14
[pretend i drew a picture of a normal distribution here with 210 as median, 200 slightly to left of it, and everything shaded below -0.14]
The probability that the sample mean eruption duration for a random sample of 40 eruptions is 200 seconds or less is 0.4443.
In part (a), you appear to misunderstand what you're being asked to describe. You describe the distribution provided by the histogram. However, the histogram is really providing the distribution of the "population" in this scenario; we are being asked to describe what it would look like if we took repeated samples of 40 eruptions from the graph shown and created a new graph of x-bars. Since 40 > 30, the shape of the original distribution (the one you described) doesn't matter; the Central Limit Theorem applies and we can describe the resulting sampling distribution as approximately normal, with a mean of 210 seconds and a standard error of 68/sqrt(40) seconds [using formulas from our formula sheet].
The misconception in part (a) then extends to part (b) - you use the "original" standard deviation when we should instead use the standard error of 68/sqrt(40) = 10.752 seconds. This impacts the z-score you would get and ultimately your associated probability. In previous rubrics, you would still get partial credit for calculating the probability that you did, because you did all of your work correctly given the mistake you made.
Another small thing: you had the right idea checking the 10% condition, but used the wrong numbers. We should be comparing 40 to 421 (and since 40(10) = 400 < 421, the condition is still met). I am unsure whether that would be penalized on a typical rubric.
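The correction above can be checked numerically. The following is a minimal Python sketch, not part of the original feedback; it uses the standard library's `statistics.NormalDist`, and the variable names are illustrative only. It compares the probability obtained with the population standard deviation (the mistake) against the one obtained with the standard error, and checks the 10% condition with the right numbers.

```python
from math import sqrt
from statistics import NormalDist

# Values given in the problem statement
mu = 210      # population mean eruption length (seconds)
sigma = 68    # population standard deviation (seconds)
n = 40        # sample size
N = 421       # number of observations the samples are drawn from

# 10% condition: 10 times the sample size should not exceed the population
ten_percent_ok = 10 * n <= N   # 400 <= 421

# Standard error of the sample mean
se = sigma / sqrt(n)

# Mistaken approach: z-score computed with the population SD
z_wrong = (200 - mu) / sigma
p_wrong = NormalDist().cdf(z_wrong)

# Correct approach: z-score computed with the standard error
z_right = (200 - mu) / se
p_right = NormalDist().cdf(z_right)

print(f"10% condition met: {ten_percent_ok}")
print(f"standard error = {se:.3f}")
print(f"using sigma: z = {z_wrong:.2f}, P = {p_wrong:.4f}")
print(f"using SE:    z = {z_right:.2f}, P = {p_right:.4f}")
```

Software values can differ from Table A in the third or fourth decimal place, because the table requires rounding z to two decimal places first.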
a) The population is approximately normal, the value of n (40) is >= 30 by the Central Limit Theorem, and the sample shows no strong skew or outliers. The center of the sample mean is 210 seconds. The variability is 10.752 seconds because 68/sqrt(40) = 10.752.
b) P(x<=200) = ?
Use the z = (x-bar - mu)/(standard deviation/sqrt(n)) equation
(200-210)/10.752 = -0.93
Using Table A, a z-score of -0.93 is a p-value of 0.1762.
The probability that the sample mean eruption length for a random sample of 40 eruptions is 200 seconds or less is 0.1762.
In part (a), you correctly describe the shape, center, and spread of the sampling distribution, citing the Central Limit Theorem as the reason for the distribution being "approximately normal." What you should be careful with is that you start by saying "the population is approximately normal", when it's the sampling distribution that is approximately normal. Unfortunately, that would sometimes be enough to lower your score by a level (from fully correct to partially correct); you've used the wrong statistical term.
In part (b), you do a good job of communicating the probability you are asked to find, then carry out calculations correctly and answer in context. Nice job!
a.) The sampling distribution is approximately normal because according to the Central Limit Theorem, if the sampling size (40) is greater than 30, the shape is approximately normal. The mean of the sampling distribution is 210 seconds. The standard deviation of the sampling distribution is 68/sqrt(40)=10.75.
b.) P(x<200)= P(z<-0.93)= .1762
z= P(200-210)/10.75 (from part a)=-.93
There is a 17.62% chance that the sample mean eruption length for a random sample of 40 eruptions is 200 seconds or less.
Good on both parts! In part (a), you give correct descriptions for shape, center, and spread, and correctly invoke the Central Limit Theorem since n = 40 > 30. In part (b), you calculate the correct probability. Small note on notation: P(X < 200) should be P(x-bar < 200). I know that we can't format "x-bar" on here, but you were asked about a sample mean, so we need to use the appropriate symbol. That actually could be enough to bump you down a scoring level, so watch your symbols/notation carefully.
A. The sample is approximately normal due to the Central Limit Theorem (40 is greater than 30). There seem to be no outliers. For the center, the mean of the distribution is 210. And for the spread, the standard deviation is 10.75 (68/square root of 40).
B. Require Assumptions:
Sampling: There is a random sample of 40 eruptions.
Normally Distributed: 40 is greater than 30 therefore it meets the Central Limit Theorem so we can assume approximately normal.
Independence: 10(40) is less than all geyser eruptions.
The mean is 210. Standard deviation is 10.75 ( 68/ square root of 40). I then proceeded to find the z-score:
200-210/10.75=-.9302.
The probability statement is P(z is less than or equal to -.9302).
Then using my calculator I did normalcdf(-1000,-.9302,0,1) and found the p-value which is .1761.
To conclude, there's a 17.61% chance that a random sample of 40 eruptions is 200 seconds or less.
Also I would have added a sketch to show the distribution
Well done!
You've correctly invoked the CLT in part (a) to justify your shape being approximately normal, while giving correct measures of center and spread. Be careful - your first two words are "the sample" instead of "the sampling distribution" - there's a big difference in those two things. There are no issues in part (b) - nice job!
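To make the difference between a population, a single sample, and a sampling distribution concrete, here is a small simulation sketch in Python. The population below is synthetic: the actual eruption data are not reproduced on this page, so the two-cluster mixture and its parameters are assumptions chosen only to produce a bimodal shape. Sampling is done with replacement so that the sigma/sqrt(n) formula applies directly.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(5)  # fixed seed so the run is repeatable

# Hypothetical bimodal "population" of 421 durations (NOT the real data):
# one cluster of short eruptions and one cluster of long eruptions.
population = (
    [random.gauss(115, 15) for _ in range(140)]
    + [random.gauss(260, 25) for _ in range(281)]
)
pop_mean = mean(population)
pop_sd = pstdev(population)

n = 40              # size of each sample
num_samples = 5000  # number of repeated samples

# Sampling distribution: one x-bar per random sample of 40 eruptions.
sample_means = [
    mean(random.choices(population, k=n)) for _ in range(num_samples)
]

center = mean(sample_means)          # should be close to pop_mean
spread = pstdev(sample_means)        # should be close to pop_sd / sqrt(n)
predicted_spread = pop_sd / sqrt(n)

print(f"population:   mean = {pop_mean:.1f}, sd = {pop_sd:.1f}")
print(f"x-bar values: mean = {center:.1f}, sd = {spread:.2f}")
print(f"predicted sd of x-bar = {predicted_spread:.2f}")
```

A histogram of `sample_means` would look roughly bell-shaped even though `population` has two peaks. Any one sample of 40 is just a list of 40 durations; the sampling distribution is the collection of x-bar values across many such samples.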
A. The sampling distribution of sample mean eruption length for random samples of 40 eruptions would be approximately normal due to the Central Limit Theorem - because the sample size is greater than 30, the sampling distribution will be approximately normal.
B. The probability that the sample mean eruption length for a random sample of 40 eruptions is 200 seconds or less is about 44.04%. The Z-score would be about -0.15.
For part (a), you correctly identify the shape as "approximately normal" due to the CLT (and give the correct reason, n = 40 > 30). However, a description of a distribution (of any type) should include measures of center and spread to go with shape. (Many teachers use "S.O.C.S." or "C.U.S.S." as acronyms to help students remember - Shape/Outliers/Center/Spread or Center/Unusual Features/Shape/Spread). In this case, you did not mention the mean of the sampling distribution (which would still be 210) or the standard error (which would be 68/sqrt(40) = 10.75). This then impacted your probability calculation in part (b). You correctly used z-scores and did a correct calculation for what your z-score was, but would only earn partial credit for not calculating the standard error in part (a).
a) The sampling distribution of sample mean eruption length for random samples of 40 eruptions from the researcher's observations is approximately normal (random samples of 40 eruptions > 30; Central Limit Theorem). The distribution has a mean (center) of 210 sec and a standard deviation (spread) of 10.75 sec (68/sqrt 40).
b) The probability that the sample mean eruption length for a random sample of 40 eruptions is 200 seconds or less is 0.176.
Conditions: Random: Random sample of 40 eruptions was taken.
Independent: Random sample of 40 eruptions is less than 10% of the population; 400<421
Normal: 40 samples > 30 ; Central Limit Theorem is satisfied
Calculator: normalcdf[ Lower:0, Upper: 200, u: 210, st. dev. : 10.75] = 0.1761
Perfect all around!
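The calculator command above uses 0 as the lower bound, while other responses on this page use -1e99 or -9999. The Python sketch below illustrates why that choice makes no practical difference here. The helper `normalcdf` is a hypothetical stand-in written to mirror the calculator's argument order; it is built on the standard library's `statistics.NormalDist` and is not the calculator's actual implementation.

```python
from math import sqrt
from statistics import NormalDist

def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under a normal curve between lower and upper.

    Hypothetical helper that imitates the calculator's argument order.
    """
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

se = 68 / sqrt(40)  # standard error of the sample mean

p_lower_zero = normalcdf(0, 200, 210, se)      # lower bound of 0 seconds
p_lower_huge = normalcdf(-1e99, 200, 210, se)  # stand-in for negative infinity

# How many standard errors below the mean is a lower bound of 0?
z_at_zero = (0 - 210) / se

print(f"lower = 0:     {p_lower_zero:.6f}")
print(f"lower = -1e99: {p_lower_huge:.6f}")
print(f"0 seconds is {abs(z_at_zero):.1f} standard errors below the mean")
```

Because 0 lies more than 19 standard errors below 210, the area to its left is negligible, so either lower bound gives about 0.1762.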
a) The distribution eruption length is approximately normal as n>30 with a range of 225 seconds, a mean of 210 seconds, and a standard deviation of ~10.7517 seconds.
Do I need to mention outliers and range here?
b) According to the central limit theorem, a sample of n>=30 so our sample of 40 tells us this is approximately normal. Also, 40*10 is less than the population of 421 and we are told the sample is random.
normCdf(lower=-1e99,upper=200,μ=210,σ=10.751744)=0.176164
Part (a) has everything needed to describe a distribution (center, shape, and spread are mentioned, so no need for outliers/range), though you should show where the 10.75 seconds calculation came from. In part (b), you've done all appropriate calculations.
a. The sampling distribution of 40 random samples of eruption would be approximately normal. The distribution of 40 random samples would be centered at the mean of 210. The shape of the distribution would be bell-shaped and approximately symmetrical. The sampling distribution would be spread with a standard deviation of 68/sqrt(40) = 10.752. The sampling distribution would not have any unusual features or gaps.
b. Assumptions:
-We have a random sample of geyser eruptions.
-Population of eruptions is at least 400.
-Since the sample size is large enough (n>30) due to CLT, the sampling distribution is approximately normal
-Sigma_d is known
Calculations
p(x_bar ≤ 200) = normalcdf(-1E99, 200, 210, 10.752) = 0.1762
Conclusion
The probability that the sample mean eruption length for a random sample of 40 eruptions is 200 seconds or less is 0.1762.
Solid work! The only possible issue: in part (a), you mention the shape as "approximately normal", but don't give the reason for that until part (b) (since n = 40 > 30, the CLT applies). On some rubrics, we'd be able to give you retroactive credit for part (a) based on that description in part (b), but it's always safe to show that the CLT applies whenever you're citing a sampling distribution of x-bar being approximately normal.
a. The sampling distribution of sample mean eruption length for random samples from the researchers observations is approximately normal 40>30 so CLT applies, it is centered at mu=210 and the samples were chosen randomly and the spread is 68/sqrt40=10.75 and the sample of 40(10)=400 which is less than the 421 total observations.
b. 200-210/10.75=.93
p=.1762
The probability that the sample mean eruption length for a random sample of 40 eruptions is 200 seconds or less is .1762.
Perfect! You mention center, shape, and spread in part (a) - correctly applying the CLT - and do appropriate calculations in part (b).
a. The sampling distribution of the sample mean eruption length for random samples of 40 eruptions from the researcher's observations can be described as approximately normal (by the Central Limit Theorem, as n = 40 > 30 and is thus sufficiently large) with a mean of 210 seconds and a standard deviation of 68/sqrt(40) = 10.7517 seconds (N(210, 10.7517)).
b. Let us define the continuous random variable X as N(210, 10.7517) (from the description of the sampling distribution of the sample mean eruption length in part a)
The probability that the sample mean eruption length for a random sample of 40 eruptions is 200 seconds or less is: P(X<200) = normcdf(lowerbound = -infinity, upperbound = 200, mu = 210, sigma = 10.7517) = 0.176164.
Perfect! You've correctly justified with the CLT and then done appropriate calculations in part (b).
a) The sampling distribution of sample mean eruption length for random samples of 40 eruptions has a mean of 210 seconds and a standard deviation of 10.75 (68/sqrt40). It is normally distributed as the central limit theorem is applicable because n is greater than or equal to 30 (n=40). Because the sampling distribution is normally distributed, its shape can be described as symmetrical and the distribution shows no evidence of skews or outliers.
b) Conditions: There is a random sample of 40 eruptions. The sample is normally distributed because n>30 which meets the central limit theorem.
P(X <= 40)=?
z=200-210/(68/sqrt40)= -.93
P(z<= 200): normalcdf(-9999,-.93,0,1) = .176
The probability that the sample mean eruption length of a random sample of 40 eruptions is 200s or less is .176
Nice work. It's a small thing, but it matters: when using the CLT we have to say approximately normal. t-distributions are never perfectly normal until we hit infinity as a sample size (which is of course impossible). It would result in partial credit on an otherwise perfect response.
a. Since the sample is large (n=40 above 30), we can approximate our sample distribution with a normal curve. Since the data is from a random sample, the population mean is the sample mean (210). We can assume there are more than 10(40)=400 eruptions (10% rule). We have the sample standard deviation equal to population standard deviation divided by the square root of n. We have the sample standard deviation of 68/sqrt(40)=10.752.
b. We let x be the mean duration of a random sample of 40 eruptions. We want to find P(x≤200), which is equivalent to P(z≤(200-210)/10.752=-0.93). We use normal cdf with a lower bound of -∞, a higher bound of -0.93, a mean of 0, and a standard deviation of 1, and we get 0.176. The probability that the sample mean eruption length for a random sample of 40 eruptions is 200 seconds or less is 0.176.
Correct on both parts. While you don't say "Central Limit Theorem" in part (a), you justify the shape (approximately normal) based on the sample size, so that will count.
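The responses on this page report 0.1761, 0.1762, or 0.176164 depending on the method used. The short Python sketch below is an illustration rather than part of the original feedback; it suggests that these differences come from rounding the standard error or the z-score before finding the probability. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Unrounded z-score for a sample mean of 200 seconds
z_exact = (200 - 210) / (68 / sqrt(40))

# Table A approach: z rounded to two decimal places
z_table = round(z_exact, 2)

# Standard error rounded to 10.75 first, z then kept to four decimals
z_rounded_se = round((200 - 210) / 10.75, 4)

p_exact = std_normal.cdf(z_exact)
p_table = std_normal.cdf(z_table)
p_rounded_se = std_normal.cdf(z_rounded_se)

for label, z, p in [
    ("z to two decimals", z_table, p_table),
    ("SE rounded to 10.75", z_rounded_se, p_rounded_se),
    ("no rounding", z_exact, p_exact),
]:
    print(f"{label:20s} z = {z:.6f}  P = {p:.6f}")
```

All three values agree to three decimal places, so reporting 0.176 is consistent with every method shown.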