Estimating the difference between two population proportions with a confidence interval
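Case Study:
Proportion of projects that pass on the first inspection, before and after public access to the records was permitted.
No access (population 1): sample size = 500, sample proportion = 0.67
Access permitted (population 2): sample size = 100, sample proportion = 0.80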
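For reference, the solution below applies the standard large-sample formulas for two independent proportions. Written out by hand with the case-study numbers (intermediate values rounded, so the last digit is approximate), they give:

```latex
\begin{aligned}
\hat p_1 - \hat p_2 &= 0.67 - 0.80 = -0.13 \\[4pt]
\text{large-sample check:}\quad & \hat p_i \pm 3\sqrt{\tfrac{\hat p_i(1-\hat p_i)}{n_i}} \subset [0,\,1] \\
\text{sample 1:}\quad & 0.67 \pm 3\sqrt{\tfrac{0.67\cdot 0.33}{500}} \approx 0.67 \pm 0.063 = [0.607,\ 0.733] \\
\text{sample 2:}\quad & 0.80 \pm 3\sqrt{\tfrac{0.80\cdot 0.20}{100}} = 0.80 \pm 0.12 = [0.68,\ 0.92] \\[4pt]
SE &= \sqrt{\tfrac{0.67\cdot 0.33}{500} + \tfrac{0.80\cdot 0.20}{100}} = \sqrt{0.0004422 + 0.0016} \approx 0.0452 \\
z_{\alpha/2} &= z_{0.05} \approx 1.645 \qquad (90\%\ \text{confidence}) \\
(\hat p_1 - \hat p_2) \pm z_{\alpha/2}\,SE &\approx -0.13 \pm 0.074 = [-0.204,\ -0.056] \approx [-0.20,\ -0.06]
\end{aligned}
```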
Solution:
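#Input:
library(dplyr)          # provides between(), used in STEP 2
sample_one_size = 500   # no access
sample_one_prop = 0.67
sample_two_size = 100   # access permitted
sample_two_prop = 0.80
alfa = 0.90             # confidence level (90%), not the significance level alpha = 0.10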
#STEP 1: The point estimate of p1 - p2:
#the proportion with no access (sample one) minus the proportion with access permitted (sample two)
point_estimate = sample_one_prop - sample_two_prop
#STEP 2: check that the samples are sufficiently large: p ± 3*sqrt(p*(1 - p)/n) must lie within [0, 1]
#2.1 sample one
Error = 3*sqrt((sample_one_prop*(1- sample_one_prop))/sample_one_size)
c(between(sample_one_prop - Error, 0, 1),between(sample_one_prop + Error, 0, 1) )
#[1] TRUE TRUE: the sample 1 size is big enough!
#2.2 sample two
Error = 3*sqrt((sample_two_prop*(1- sample_two_prop))/sample_two_size)
c(between(sample_two_prop - Error,0, 1),between(sample_two_prop + Error, 0, 1))
#[1] TRUE TRUE: the sample 2 size is big enough!
# STEP 3: Confidence Interval:
# critical value z_(alpha/2); about 1.645 for a 90% confidence level
z_alfa = qnorm(1 - (1 - alfa)/2, mean = 0, sd = 1)
# standard error of p1 - p2
std_error = sqrt(
  (sample_one_prop*(1 - sample_one_prop))/sample_one_size +
  (sample_two_prop*(1 - sample_two_prop))/sample_two_size
)
# lower and upper bounds: point estimate minus / plus the margin of error
round(c(point_estimate - z_alfa*std_error,
        point_estimate + z_alfa*std_error), 2)
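The same interval can be cross-checked against base R. The sketch below wraps STEP 3 in a small helper, `two_prop_ci()`, which is a hypothetical name introduced here for illustration and not part of the original solution. It then compares the result with `prop.test()`. That function expects counts rather than proportions, so the counts 335 and 80 are back-computed from the stated proportions, assuming those proportions are exact. `correct = FALSE` disables the continuity correction, which should make `prop.test()` return the same large-sample (Wald) interval.

```r
# Hypothetical helper (not from the original post): large-sample interval for p1 - p2
two_prop_ci <- function(p1, n1, p2, n2, conf_level = 0.90) {
  z  <- qnorm(1 - (1 - conf_level) / 2)                # critical value, about 1.645 for 90%
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # standard error of p1 - p2
  (p1 - p2) + c(-1, 1) * z * se                        # c(lower, upper)
}

ci_manual <- two_prop_ci(0.67, 500, 0.80, 100)

# Cross-check with base R: counts are 0.67 * 500 = 335 and 0.80 * 100 = 80;
# correct = FALSE turns off the continuity correction.
ci_base <- prop.test(x = c(335, 80), n = c(500, 100),
                     conf.level = 0.90, correct = FALSE)$conf.int

round(ci_manual, 2)              # [1] -0.20 -0.06
round(as.numeric(ci_base), 2)    # [1] -0.20 -0.06
```

With the default `correct = TRUE`, `prop.test()` widens the interval slightly, so it would no longer match the hand computation exactly.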
Conclusion:
The 90% confidence interval is [−0.20, −0.06]. We are 90% confident that the difference in the population proportions, p1 − p2, lies in the interval [−0.20, −0.06], in the sense that in repeated sampling 90% of all intervals constructed from the sample data in this manner will contain p1 − p2. Taking into account the labeling of the two populations (population 1: no access; population 2: access permitted), this means that we are 90% confident that the proportion of projects that pass on the first inspection is between 6 and 20 percentage points higher after public access to the records than before.
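The "repeated sampling" interpretation can be illustrated with a small simulation. This is a sketch under an explicit assumption: the true population proportions are set equal to the observed sample proportions (0.67 and 0.80) purely for illustration, since the real values are unknown. Each replicate draws two new samples, builds the interval exactly as in STEP 3, and records whether it contains the true p1 − p2. Because the large-sample interval is only approximate, the observed coverage should be close to, but not exactly, 90%.

```r
# Coverage simulation; true_p1 and true_p2 are ASSUMED values for illustration only
set.seed(1)
true_p1    <- 0.67   # assumed population proportion, no access
true_p2    <- 0.80   # assumed population proportion, access permitted
n1         <- 500
n2         <- 100
conf_level <- 0.90
z          <- qnorm(1 - (1 - conf_level) / 2)
n_reps     <- 10000

covered <- replicate(n_reps, {
  p1_hat <- rbinom(1, n1, true_p1) / n1   # simulated sample proportion, population 1
  p2_hat <- rbinom(1, n2, true_p2) / n2   # simulated sample proportion, population 2
  se     <- sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
  lower  <- (p1_hat - p2_hat) - z * se
  upper  <- (p1_hat - p2_hat) + z * se
  lower <= true_p1 - true_p2 && true_p1 - true_p2 <= upper  # TRUE if interval covers p1 - p2
})

mean(covered)   # fraction of intervals that contain p1 - p2; expected to be near 0.90
```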