使用蒙特卡洛方法对几何分布进行序贯概率比检验 (SPRT)

问题遇到的现象和发生背景

在 2 个复合假设的情况下，用计算机实施广义顺序测试，
使用蒙特卡洛方法估计其错误类型 I 和 II 的概率，以及条件预期样本量。

实现：几何分布，$\theta$ 是它唯一的参数。

分析错误概率和“假设接近度”和“大小”的预期样本大小的依赖关系。
将 $\alpha$ 和 $\beta$ 用于您选择的阈值计算。

我的解答思路和尝试过的方法，不写自己思路的，回答率下降 60%

我找到的方法代码如下，但是仅仅只是实现几何分布的蒙特卡洛方法，不知道该怎样实现使用蒙特卡洛方法对几何分布进行序贯概率比检验 (SPRT)。

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(222)

# 把计算得到的函数写成一个函数
def distribution_z(z, p, max_k=200):
    import math
    j = int(math.floor(z))
    A = 0
    for m in range(1, j + 1):
        A += (1 - p) ** (m - 1)
    A *= p

    B = 0
    for k in range(j + 1, max_k + 1):
        a = (1 - p) ** (k - 1)
        a /= k
        B += a
    B *= z * p

    return A + B


def pdf_z(z, p, max_k=200):
    import math
    j = int(math.floor(z))
    B = 0
    for k in range(j + 1, max_k + 1):
        a = (1 - p) ** (k - 1)
        a /= k
        B += a
    return B * p


p = 0.1
# 选取数据点，点越多越精确
dataPoints = 10000

Unit = np.random.rand(dataPoints)
Geom = np.random.geometric(p, dataPoints)
distri_of_Monte = Geom * Unit

# 概率密度函数 PDF
plt.hist(distri_of_Monte, bins=40, range=(0, 40))
points_of_z = np.arange(0, 41, 0.01)
pdf_of_z = np.array([pdf_z(zi, p) for zi in points_of_z]) * dataPoints
plt.plot(points_of_z, pdf_of_z)
# print(pdf_of_z)
plt.show()

hist, bin_edges = np.histogram(distri_of_Monte, bins=40, range=(0, 40))

# 概率分布函数 CDF
hist_list = np.cumsum(hist) / dataPoints

plt.plot(bin_edges[1:], hist_list)

points_of_z = np.arange(1, 41, 0.1)
distri_of_z = [distribution_z(zi, p) for zi in points_of_z]

plt.plot(points_of_z, distri_of_z)

plt.show()


import sprt as sprt
import numpy as np

# Null value
h0 = 0.5
# Alternative value
h1 = 0.55
# Type I error rate = 0.05
alpha = 0.05
# Type II error rate = 0.2
beta = 0.2
# Values
values = np.random.binomial(1, 0.55, 100)
test = sprt.SPRTBinomial(h0 = h0, h1 = h1, alpha = alpha, beta = beta, values = values)

test.plot()

# Plot the data and boundary but without fill the color
test.plot(fill = False)

我想要达到的结果，如果你需要快速回答，请尝试 “付费悬赏”

使用蒙特卡洛方法，对几何分布进行序贯概率比检验 (SPRT)，同时满足题目的要求。使用Python编写代码。

可以先导入所需的库：

import numpy as np
import matplotlib.pyplot as plt

然后可以定义一个函数来实现序贯概率比检验 (SPRT) 算法。这个函数接受一组观察数据和两个阈值 $\alpha$ 和 $\beta$ 作为参数，并返回是否应该拒绝假设。

def sprt(data, alpha, beta):
    n = len(data)
    p_hat = data.mean()
    z = p_hat / (1 - p_hat)
    a = (1 - beta) / alpha
    b = beta / (1 - alpha)
    for i in range(n):
        z *= data[i] / (1 - data[i])
        if z < a:
            return "Reject H0"
        elif z > b:
            return "Reject H1"
    return "Fail to reject"

然后可以使用蒙特卡罗方法来模拟数据生成过程。假设已经实现了一个函数来计算几何分布的概率密度函数（pdf）和累积分布函数（cdf），这里就不再赘述。

接下来，可以定义一个函数来生成符合几何分布的观察数据。这个函数接受几何分布的唯一参数 $\theta$ 和要生成的数据点数量 $n$ 作为参数，并返回一组符合几何分布的观察数据。

def generate_data(theta, n):
    p = 1 - theta
    data = np.random.geometric(p, n)
    return data

然后可以使用这个函数来生成大量的符合几何分布的观察数据，并使用 SPRT 算法对这些数据进行检验。可以记录每次检验的结果，并统计错误类型 I 和 II 的次数。

# 设置参数
theta = 0.5
alpha = 0.05
beta = 0.1
n_trials = 10000

# 记录错误类型 I 和 II 的次数
error_I = 0
error_II = 0

# 运行多次模拟
for i in range(n_trials):
    # 生成符合几何分布的观察数据
    data = generate_data(theta, n)
    # 进行序贯概率比检验
    result = sprt(data, alpha, beta)
    if result == "Reject H0":
        error_I += 1
    elif result == "Reject H1":
        error_II += 1

# 计算错误概率
error_I_prob = error_I / n_trials
error_II_prob = error_II / n_trials

可以使用计算出的错误概率来估计条件预期样本大小。可以通过计算错误概率与假设接近度之间的依赖关系来估计条件预期样本大小。这里假设已经定义了一个函数 expected_sample_size 来计算条件预期样本大小。

# 计算条件预期样本大小
n_0 = expected_sample_size(error_I_prob, alpha)
n_1 = expected_sample_size(error_II_prob, beta)

望采纳。

我使用蒙特卡洛方法来模拟一个随机变量，然后使用序贯概率比检验来判断随机变量是否满足几何分布


import numpy as np

def monte_carlo_sprt(n, alpha, beta, true_prob):
# 初始化检验统计量
test_statistic = 0


# 在随机变量的取值范围内进行n次模拟
for i in range(n):
    # 随机生成一个随机变量
    x = np.random.geometric(true_prob)
    # 根据随机变量的取值更新检验统计量
    test_statistic += np.log(true_prob / (1 - true_prob))

# 判断检验统计量是否超过阈值
if test_statistic > np.log(beta / (1 - alpha)):
    # 检验统计量超过阈值，拒绝假设
    return False
elif test_statistic < np.log(alpha / (1 - beta)):
    # 检验统计量小于阈值，接受假设
    return True
else:
    # 检验统计量在阈值之间，无法决定是否接受或拒绝假设，继续进行模拟
    return monte_carlo_sprt(n, alpha, beta, true_prob)
#测试
result = monte_carlo_sprt(1000, 0.1, 0.1, 0.5)
print(result)

Python：利用蒙特卡洛方法模拟验证概率分布

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(222)

# 把计算得到的函数写成一个函数
def distribution_z(z, p, max_k = 200):
    import math
    j = int(math.floor(z))
    A = 0
    for m in range(1, j + 1):
        A += (1 - p) ** (m - 1)
    A *= p
    
    B = 0
    for k in range(j + 1, max_k + 1):
        a = (1 - p) ** (k - 1)
        a /= k
        B += a
    B *= z * p

    return A + B

def pdf_z(z, p, max_k = 200):
    import math
    j = int(math.floor(z))
    B = 0
    for k in range(j + 1, max_k + 1):
        a = (1 - p) ** (k - 1)
        a /= k
        B += a
    return B * p

p = 0.1
# 选取数据点，点越多越精确
dataPoints = 20000

Unit = np.random.rand(dataPoints)
Geom = np.random.geometric(p, dataPoints)
distri_of_Monte = Geom * Unit

# 概率密度函数 PDF
plt.hist(distri_of_Monte, bins = 40, range = (0, 40))
points_of_z = np.arange(0, 41, 0.01)
pdf_of_z = np.array([pdf_z(zi, p) for zi in points_of_z]) * dataPoints
plt.plot(points_of_z, pdf_of_z)
# print(pdf_of_z)
plt.show()

hist, bin_edges = np.histogram(distri_of_Monte, bins = 40, range = (0, 40))

# 概率分布函数 CDF
hist_list = np.cumsum(hist) / dataPoints

plt.plot(bin_edges[1:], hist_list)

points_of_z = np.arange(1, 41, 0.1)
distri_of_z = [distribution_z(zi, p) for zi in points_of_z]

plt.plot(points_of_z, distri_of_z)

plt.show()

https://blog.csdn.net/weixin_39679367/article/details/107908264

参考实例【Python：利用蒙特卡洛方法模拟验证概率分布】，链接：https://blog.csdn.net/weixin_39679367/article/details/107908264