Perzentilentrennung

Karikatur von negativ und positiv schiefen Verteilungen.

Angenommen, Sie möchten, dass die Stichprobe für einige Perzentile ausreicht, um nicht zu hoch zu liegen. Hier ist der Code:

quantile.CI <- function(n, q, alpha=0.1) {
  #
  # Search over a small range of upper and lower order statistics for the 
  # closest coverage to 1-alpha (but not less than it, if possible).
  #
  u <- qbinom(1-alpha/2, n, q) + (-2:2) + 1
  l <- qbinom(alpha/2, n, q) + (-2:2)
  u[u > n] <- Inf
  l[l < 0] <- -Inf
  coverage <- outer(l, u, function(a,b) pbinom(b-1,n,q) - pbinom(a-1,n,q))
  if (max(coverage) < 1-alpha) i <- which(coverage==max(coverage)) else
    i <- which(coverage == min(coverage[coverage >= 1-alpha]))
  i <- i[1]
  #
  # Return the order statistics and the actual coverage.
  #
  u <- rep(u, each=5)[i]
  l <- rep(l, 5)[i]
  return(list(Interval=c(l,u), Coverage=coverage[i]))
}


###bootstrap alternative method############

quantile.CI.via.bootstrap <- function(x, p, alpha = 0.1) {
  ## Purpose:
  ##   Calculate a two-sided confidence interval with confidence level of (1 - alpha) for
  ##   a quantile, based on the (computing intensive) bootstrap resampling method. 
  ##
  ## Arguments:
  ##   - x: a vector of values, representing a data sample. 
  ##   - p: probability cutpoint for the quantile (between 0 and 1).
  ##   - alpha: type I error level (default to 0.1 so a 90% CI is calculated)
  ##
  ## Return:
  ##   - CI: the lower and upper limits of the two-sided CI. 
  
  q <- quantile(x, probs = p)         
  message("Quantile Point Estimate = ", q, " (Probability Cutpoint = ", p, ")\n")
  
  ## Bootstrap resampling with 2000 replications
  library(boot)
  set.seed(1)
  b <- boot(x, function(x, i) quantile(x[i], probs = p), R = 2000)
  
  boot.ci(b, conf = 1 - alpha, type = c("norm", "basic", "perc", "bca"))
  
}

Die erste Methode implementiert die Methode von Hahn & Meeker 1991 und die zweite ist lediglich ein nicht-parametrischer Bootstrap.

n.s<-seq(50,200,5)

q.90<-0.90 #los streng
q.99<-0.99
ergebnisse.90<-matrix(NA,2,length(n.s))
ergebnisse.99<-matrix(NA,2,length(n.s))

for (i in 1:length(n.s)){
  
  lu.90 <- quantile.CI(n.s[i], q.90)$Interval
  results.90[1,i]<-lu.90[1]
  results.90[2,i]<-lu.90[2]
  
  lu.99 <- quantile.CI(n.s[i], q.99)$Intervall
  results.99[1,i]<-lu.99[1]
  results.99[2,i]<-lu.99[2]
}


plot(results.90[2,],ylab="Geschätzte CI-Grenzwertreihenfolge",xlab="n",type="l",col="blue",axes=FALSE,pch=19)
axis(2)
axis(1, at=seq_along(results.90[1,]),labels=as.character(n.s), las=2)
box()
lines(results.99[1,],col="lila",type="l",pch=18)

abline(v = which(n.s==95), col="schwarz", lwd=3, lty=2)
# Hinzufügen einer Legende
legend(1, 175, legend=c("Untergrenze 99.", "Obergrenze 90."),
       col=c("lila", "blau"), lty=1, cex=0.8)

CI-Grenzwertdiagramm mit unterem und oberem Grenzwert

Wie wir sehen können, ist eine Stichprobe von n=95 nach der Bootstrap-Methode ausreichend.

Bootstrap-Ergebnisse:

n.s<-seq(50,200,5)
q.95<-0.90
q.99<-0.99
ergebnisse.95<-matrix(NA,2,length(n.s))
ergebnisse.99<-matrix(NA,2,länge(n.s))


for (i in 1:length(n.s)){

  x <- rnorm(n.s[i], mean = which.mean.max, sd = which.sd.max) 1TP5Hier der größte CV
  lu.95 <-quantile.CI.via.bootstrap(x, 0.90, 0.1)$bca
  results.95[1,i]<-lu.95[4]
  results.95[2,i]<-lu.95[5]
  
  lu.99 <- quantile.CI.via.bootstrap(x, 0.99, 0.1)$bca
  results.99[1,i]<-lu.99[4]
  results.99[2,i]<-lu.99[5]
}



plot(results.95[2,],ylab="Geschätzte CI-Grenze",xlab="n",type="l",col="blue",axes=FALSE,pch=19,ylim=c(300,400))
axis(2)
axis(1, at=seq_along(results.95[1,]),labels=as.character(n.s), las=2)
box()
lines(results.99[1,],col="lila",type="l",pch=18)

abline(v = which(n.s==100), col="schwarz", lwd=3, lty=2)
# Hinzufügen einer Legende
legend(1,5 , legend=c("Untergrenze 99.", "Obergrenze 90."),
       col=c("lila", "blau"), lty=1, cex=0.8)

Liniendiagramm mit geschätzten Konfidenzintervallgrenzen.

Gemäß den Boostrap-Ergebnissen sind 100 ausreichend. Beachten Sie, dass wir hier BCA-Schätzungen in einer Bootrstrap-Stichprobe verwenden, die eine robuste Version darstellt.

Nicht, dass beide Methoden verteilungsfrei, nicht parametrisch sind!

Du kannst sie auch für andere Perzentile verwenden. Viel Spaß!

Schreibe einen Kommentar Antworten abbrechen