HyperOpt tuning of an SVM hangs after introducing KernelPCA

Recently, while using HyperOpt (TPE) to automatically tune the hyperparameters of an SVM combined with KernelPCA on a classification problem, I ran into a strange issue. Without KernelPCA, HyperOpt finishes the optimization normally and returns an SVM hyperparameter combination. To push the prediction accuracy higher I introduced KernelPCA; since I can only sweep the KernelPCA hyperparameters with nested loops over a local grid, the HyperOpt optimization now hangs partway through.
The data (data.csv) is as follows:
A B C D E F G H class
-22.995 -38.182 -24.354 3.813 -21.133 10.802 1912.78 4.86742 1
-5.635 -18.802 -11.348 3.795 -11.206 10.175 2035.19 4.72152 1
-21.548 -35.589 -20.497 3.993 -16.5 11.526 1803.625 5.09828 1
12.278 0.052 5.899 5.963 -0.063 11.055 2183.333 5.87648 1
-21.548 -35.589 -20.497 3.993 -16.5 11.526 1803.625 5.09828 1
-27.465 -43.373 -27.319 3.018 -22.074 11.442 1899.175 4.3895 1
-21.548 -35.589 -20.497 3.993 -16.5 11.526 1803.625 5.09828 1
1.631 -8.885 -6.204 1.221 -4.578 8.051 1808.165 3.21737 1
-18.461 -31.592 -16.981 4.742 -13.64 11.526 1976.875 4.90688 1
-1.661 -12.736 -2.871 6.223 -7.018 10.328 2255.094 5.33437 1
0.353 -10.375 -1.021 6.373 -6.113 9.98 2282.927 5.37054 1
-45.12 -61.295 -34.823 7.752 -34.975 11.526 1594.375 7.27831 1
-39.774 -55.174 -30.373 6.884 -29.577 11.526 1604.625 6.90383 1
-11.521 -24.988 -15.302 3.079 -11.974 13.371 1693.336 5.06397 1
-5.09 -22.93 -9.87 5.802 -8.591 13.381 2589.9 4.55556 1
-27.593 -42.869 -23.19 6.231 -22.78 13.381 1710.3 6.60749 1
-27.657 -41.411 -23.233 4.743 -18.782 12.976 1546.167 6.33068 1
-10.842 -23.614 -11.626 4.356 -10.509 13.381 1714.5 5.48275 1
-12.195 -27.21 -14.823 3.953 -11.866 13.381 2021.9 4.58346 1
-13.37 -26.964 -16.235 3.293 -12.615 13.381 1675.1 5.24691 1
4.143 -11.236 -4.868 3.623 -5.908 11.843 2229.66 4.51397 1
-13.37 -26.964 -16.235 3.293 -12.615 13.381 1675.1 5.24691 1
-12.706 -24.39 -12.77 3.76 -9.459 10.46 1748.906 5.32911 1
4.514 -9.074 -3.87 3.558 -5.012 12.708 2249.735 4.28981 1
-5.113 -19.83 -11.512 3.613 -10.805 13.145 2176.611 4.40658 1
-13.371 -28.963 -17.588 3.64 -15.399 13.332 2111.184 4.48833 1
5.971 -7.128 -1.941 3.104 -0.519 12.236 1930.205 5.69197 1
3.703 -9.023 -2.999 4.007 -5.017 11.376 2092.579 4.87867 1
4.338 -8.154 -2.759 3.676 -4.611 10.889 2079.833 4.67176 1
-2.439 -15.682 -8.425 3.532 -8.613 11.395 2032.069 4.61383 1
7.325 -8.219 -4.807 2.317 -4.671 12.708 2452.794 3.42375 1
0.851 -15.422 -10.478 2.153 -8.267 13.145 2368.389 3.33009 1
-10.333 -27.189 -18.494 1.883 -13.325 13.381 2224.9 3.16374 1
-0.975 -16.748 -7.096 5.189 -7.92 13.145 2321.5 4.90418 1
-8.625 -24.759 -12.75 4.643 -10.358 13.381 2182.7 4.75534 1
-7.924 -22.434 -11.258 4.996 -11.497 12.966 2030.688 5.41516 1
-11.956 -28.14 -17.955 1.695 -12.603 13.381 2145.3 3.0621 1
-13.37 -26.964 -16.235 3.293 -12.615 13.381 1675.1 5.24691 1
-10.016 -14.874 -11.42 2.113 -8.266 5.667 1077.583 4.39651 1
-6.155 -18.64 -12.715 2.367 -10.346 12.966 1633.938 4.51672 1
-14.756 -30.325 -16.767 4.574 -14.396 13.381 1992.9 5.18586 1
-13.37 -26.964 -16.235 3.293 -12.615 13.381 1675.1 5.24691 1
-10.91 -22.975 -14.287 2.121 -9.991 12.201 1707.327 3.69634 1
-12.604 -24.96 -15.426 2.28 -10.753 12.377 1699.48 3.84642 1
-16.489 -29.479 -17.95 2.633 -12.45 12.728 1679.863 4.16638 1
-25.808 -40.312 -23.154 3.663 -17.783 11.716 1581.965 5.34299 1
-8.45 -21.687 -13.661 2.703 -10.845 13.308 1722.436 4.72458 1
4.286 -8.203 -2.797 3.656 -4.368 11.297 2148.455 4.6242 1
4.286 -8.203 -2.797 3.656 -4.368 11.297 2148.455 4.6242 1
-5.838 -21.025 -12.527 2.902 -10.615 12.966 2234.313 4.1369 1
18.334 3.874 6.031 3.386 2.645 13.381 2524.2 4.01454 1
-5.838 -21.025 -12.527 2.902 -10.615 12.966 2234.313 4.1369 1
-13.37 -26.964 -16.235 3.293 -12.615 13.381 1675.1 5.24691 1
-33.047 -48.857 -26.854 6.879 -27.179 13.381 1637.7 6.98481 1
-3.302 -16.549 -6.654 4.553 -6.118 14.897 1911.25 5.09867 1
-12.974 -25.283 -13.393 3.783 -10.449 11.809 1747.493 5.32007 1
2.824 -10.659 -1.988 5.878 -5.881 13.238 2110.928 5.7399 1
-4.797 -19.133 -8.187 5.617 -10.199 13.649 2057.409 5.65456 1
-15.274 -31.128 -19.52 1.892 -14.553 13.385 2140.077 3.41314 1
2.534 -12.998 -6.302 3.245 -6.317 14.5 2398.56 4.1116 1
4.904 -9.815 -3.731 3.594 -4.844 13.463 2319.13 4.50443 1
-3.644 -17.495 -10.154 3.98 -10.381 13.718 2056.75 4.75016 1
-15.826 -29.315 -16.779 3.274 -13.024 13.921 1675.333 5.19967 1
-16.695 -30.409 -17.563 3.29 -13.426 14.222 1675.558 5.15366 1
-17.543 -31.454 -18.289 3.301 -13.8 14.432 1675.774 5.10883 1
-18.361 -32.449 -18.959 3.309 -14.149 14.585 1675.981 5.06514 1
-19.143 -33.389 -19.578 3.314 -14.473 14.697 1676.182 5.02253 1
-17.224 -31.132 -18.475 3.438 -14.299 14.784 1594.192 5.49811 1
2.534 -12.998 -6.302 3.245 -6.317 14.5 2398.56 4.1116 1
-14.441 -28.189 -17.113 3.157 -13.37 14.865 1620.62 5.2489 1
2.911 -7.385 -5.327 1.164 -4.427 10.328 1727.863 3.07247 2
5.452 -5.989 -3.694 0.406 -4.1 11.408 1767.15 0.84781 2
-4.221 -15.998 -10.616 1.985 -8.744 10.779 1768.82 3.94012 2
-0.912 -12.106 -8.14 1.406 -7.078 10.389 1809.8 3.20968 2
-2.561 -14.091 -9.442 1.717 -7.942 10.602 1786.62 3.6225 2
-1.341 -12.211 -8.643 1.858 -6.919 10.836 1693.651 3.97935 2
-1.401 -12.8 -8.71 1.571 -7.438 10.336 1782.435 3.44636 2
-1.02 -12.276 -8.298 1.521 -6.971 10.294 1787.415 3.40831 2
-0.311 -11.381 -7.695 1.44 -6.285 9.978 1794.055 3.35305 2
-2.396 -13.569 -9.567 2.026 -7.701 10.641 1681.85 4.14157 2
-1.341 -12.211 -8.643 1.858 -6.919 10.836 1693.651 3.97935 2
2.792 -8.342 -1.604 2.916 -0.187 12.946 1316.24 5.29995 2
4.062 -7.309 -4.785 0.673 -4.721 12.05 1844.557 1.9994 2
2.548 -9.354 -6.251 1.011 -5.735 12.402 1826.994 2.66169 2
0.292 -12.062 -8.084 1.431 -7.001 12.733 1803.922 3.30216 2
-1.169 -13.725 -9.149 1.674 -7.736 12.887 1789.851 3.61999 2
-2.826 -15.569 -10.276 1.931 -8.513 13.026 1774.316 3.92695 2
-4.647 -17.566 -11.437 2.196 -9.314 13.145 1757.5 4.21869 2
-0.543 -13.018 -8.702 1.572 -7.427 12.825 1795.826 3.49029 2
3.516 -8.084 -5.348 0.803 -5.111 12.198 1837.89 2.27767 2
3.516 -8.084 -5.348 0.803 -5.111 12.198 1837.89 2.27767 2
-0.543 -13.018 -8.702 1.572 -7.427 12.825 1795.826 3.49029 2
2.172 -9.814 -6.706 1.21 -5.835 12.073 1800.897 3.05873 2
-2.685 -15.534 -10.567 1.955 -8.916 12.602 1752.136 3.94665 2
-0.543 -13.018 -8.702 1.572 -7.427 12.825 1795.826 3.49029 2
-0.543 -13.018 -8.702 1.572 -7.427 12.825 1795.826 3.49029 2
3.296 -6.072 -3.672 0.964 -3.975 8.351 1837.573 2.55494 2
1.597 -10.414 -7.212 1.368 -6.157 12.294 1763.206 3.2745 2
1.451 -10.691 -7.314 1.327 -6.342 12.513 1783.75 3.18596 2
1.283 -10.887 -7.355 1.287 -6.456 12.585 1802.132 3.10421 2
1.597 -10.414 -7.212 1.368 -6.157 12.294 1763.206 3.2745 2
1.451 -10.691 -7.314 1.327 -6.342 12.513 1783.75 3.18596 2
1.283 -10.887 -7.355 1.287 -6.456 12.585 1802.132 3.10421 2
0.501 -11.82 -7.926 1.395 -6.892 12.708 1805.971 3.25206 2
-0.255 -11.697 -7.239 1.678 -5.945 11.768 1761.6 3.60685 2
3.516 -8.084 -5.348 0.803 -5.111 12.198 1837.89 2.27767 2
-0.543 -13.018 -8.702 1.572 -7.427 12.825 1795.826 3.49029 2
-0.543 -13.018 -8.702 1.572 -7.427 12.825 1795.826 3.49029 2
-0.519 -12.914 -8.818 1.702 -7.339 12.648 1751.461 3.71087 2
-0.543 -13.018 -8.702 1.572 -7.427 12.825 1795.826 3.49029 2
1.184 -11.059 -7.477 1.264 -6.661 12.224 1816.675 3.07586 2
1.403 -10.98 -7.607 1.281 -6.813 12.224 1792.225 3.09084 2
-33.047 -48.857 -26.854 6.879 -27.179 13.381 1637.7 6.98481 2
-2.685 -15.534 -10.567 1.955 -8.916 12.602 1752.136 3.94665 2
-2.685 -15.534 -10.567 1.955 -8.916 12.602 1752.136 3.94665 2
6.204 -3.77 -2.564 0.543 -2.417 10.21 1752.153 1.88331 2
3.491 -7.773 -5.571 1.102 -4.786 11.36 1720.311 2.93997 2
-0.543 -13.018 -8.702 1.572 -7.427 12.825 1795.826 3.49029 2
0.622 -11.711 -7.952 1.447 -6.831 12.694 1785.531 3.34982 2
3.516 -8.084 -5.348 0.803 -5.111 12.198 1837.89 2.27767 2
-0.543 -13.018 -8.702 1.572 -7.427 12.825 1795.826 3.49029 2
-0.786 -13.126 -7.169 2.714 -7.638 11.287 1891.899 3.79712 2
-2.395 -15.209 -10.264 1.929 -8.582 12.858 1755.955 3.94665 2
3.516 -8.084 -5.348 0.803 -5.111 12.198 1837.89 2.27767 2
0.501 -11.82 -7.926 1.395 -6.892 12.708 1805.971 3.25206 2
-0.543 -13.018 -8.702 1.572 -7.427 12.825 1795.826 3.49029 2
-0.543 -13.018 -8.702 1.572 -7.427 12.825 1795.826 3.49029 2
-0.44 -12.492 -8.733 1.405 -7.68 12.721 1660.825 3.2928 2
-0.543 -13.018 -8.702 1.572 -7.427 12.825 1795.826 3.49029 2
-0.543 -13.018 -8.702 1.572 -7.427 12.825 1795.826 3.49029 2
-0.543 -13.018 -8.702 1.572 -7.427 12.825 1795.826 3.49029 2
-8.446 -19.918 -12.071 2.36 -9.297 11.833 1671.743 4.39555 2
-10.361 -22.292 -13.642 2.461 -10.208 12.24 1672.167 4.36317 2
-12.189 -24.515 -15.049 2.547 -11.034 12.513 1672.568 4.33068 2
-13.908 -26.577 -16.31 2.62 -11.782 12.703 1672.947 4.29821 2
-15.506 -28.474 -17.438 2.681 -12.457 12.835 1673.308 4.26585 2
-0.543 -13.018 -8.702 1.572 -7.427 12.825 1795.826 3.49029 2
-0.543 -13.018 -8.702 1.572 -7.427 12.825 1795.826 3.49029 2
-0.543 -13.018 -8.702 1.572 -7.427 12.825 1795.826 3.49029 2
0.501 -11.82 -7.926 1.395 -6.892 12.708 1805.971 3.25206 2
0.501 -11.82 -7.926 1.395 -6.892 12.708 1805.971 3.25206 2
-0.543 -13.018 -8.702 1.572 -7.427 12.825 1795.826 3.49029 2
7.02 -3.391 -1.844 1.303 -1.229 12.257 1705.292 3.14617 2
3.437 -7.467 -4.305 1.825 -3.169 12.602 1677.227 3.8167 2
8.863 -1.22 -0.462 1.008 -0.139 12.005 1720.135 2.69028 2
5.205 -5.473 -3.122 1.575 -2.237 12.451 1691 3.51209 2
3.437 -7.467 -4.305 1.825 -3.169 12.602 1677.227 3.8167 2
0.089 -11.187 -6.41 2.266 -4.828 12.812 1651.132 4.30018 2
3.516 -8.084 -5.348 0.803 -5.111 12.198 1837.89 2.27767 2
-0.543 -13.018 -8.702 1.572 -7.427 12.825 1795.826 3.49029 2
-2.62 -15.342 -10.14 1.9 -8.419 13.011 1776.227 3.89131 2
0.501 -11.82 -7.926 1.395 -6.892 12.708 1805.971 3.25206 2
-0.543 -13.018 -8.702 1.572 -7.427 12.825 1795.826 3.49029 2
4.316 -6.501 -4.27 0.697 -3.897 12.233 1832.938 2.20804 2
3.917 -7.029 -4.638 0.781 -4.14 12.323 1828.325 2.37781 2
-5.942 -19.394 -10.461 3.608 -12.702 13.671 1815.912 4.99515 2
-27.822 -39.79 -22.869 4.391 -17.104 12.453 1362.95 6.27071 2
-2.302 -15.067 -10.027 1.699 -8.501 14.628 1723.688 3.6795 2
4.383 -6.862 -4.316 0.549 -4.434 13.6 1808.481 1.61094 2
3.351 -8.336 -5.415 0.738 -5.312 13.921 1775.559 2.0547 2
-2.302 -15.067 -10.027 1.699 -8.501 14.628 1723.688 3.6795 2
-2.302 -15.067 -10.027 1.699 -8.501 14.628 1723.688 3.6795 2
0.918 -11.026 -6.667 1.716 -5.498 13.915 1724.94 3.71524 2
12.292 2.272 1.962 0.638 2.017 13.921 1743.794 2.08072 2
8.37 -2.321 -0.879 1.246 -0.21 14.432 1713.217 3.14617 2
-5.942 -19.394 -10.461 3.608 -12.702 13.671 1815.912 4.99515 2
0.024 -12.437 -8.319 1.345 -7.322 14.432 1743.783 3.18053 2
0.024 -12.437 -8.319 1.345 -7.322 14.432 1743.783 3.18053 2
-1.703 -14.398 -9.604 1.612 -8.209 14.585 1728.778 3.56283 2
-1.703 -14.398 -9.604 1.612 -8.209 14.585 1728.778 3.56283 2
-1.703 -14.398 -9.604 1.612 -8.209 14.585 1728.778 3.56283 2
-0.361 -12.421 -7.384 1.685 -7.482 13.037 1823.31 3.46628 2
-3.416 -16.299 -10.788 1.856 -9.026 14.697 1714.318 3.88015 2
0.574 -10.643 -5.953 1.714 -6.459 11.855 1847.005 3.50357 2
0.152 -11.684 -6.665 1.795 -7.261 12.701 1839.983 3.55103 2
-2.879 -15.741 -9.355 2.444 -9.922 13.193 1795.813 4.18728 2
-9.396 -23.402 -13.885 3.86 -14.788 13.179 1749.332 5.29178 2
-10.567 -24.499 -15.153 3.857 -15.197 12.264 1704.552 5.38669 2
-9.214 -23.14 -14.405 3.741 -14.802 12.109 1721.66 5.26023 2
-1.703 -14.398 -9.604 1.612 -8.209 14.585 1728.778 3.56283 2
0.5 -11.354 -6.933 1.374 -6.718 12.808 1824.275 3.14411 2
-0.402 -12.464 -7.401 1.685 -7.499 13.039 1825.13 3.46631 2
-0.402 -12.464 -7.401 1.685 -7.499 13.039 1825.13 3.46631 2
3.319 -8.379 -5.447 0.745 -5.334 13.928 1775.222 2.0706 2
2.481 -9.481 -6.255 0.915 -5.894 14.1 1766.633 2.43675 2
1.59 -10.588 -7.046 1.08 -6.441 14.242 1758.044 2.74668 2
-1.295 -13.94 -9.31 1.551 -8.006 14.553 1732.277 3.47911 2
-10.742 -24.316 -12.021 4.544 -14.445 14.312 1829.991 6.22497 2
0.867 -10.723 -6.207 1.023 -5.944 14.193 1763.607 2.94019 2
0.632 -11.747 -7.328 1.353 -7.615 14.125 1837.063 2.93652 2
-2.98 -15.578 -9.389 2.431 -8.773 13.065 1747.04 4.28105 2
-2.98 -15.578 -9.389 2.431 -8.773 13.065 1747.04 4.28105 2
-2.98 -15.578 -9.389 2.431 -8.773 13.065 1747.04 4.28105 2

The KernelPCA + HyperOpt + SVM nested-loop code is as follows:

from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC
from sklearn.metrics import f1_score, accuracy_score
from sklearn.decomposition import KernelPCA
from sklearn.preprocessing import StandardScaler
from hyperopt import fmin, tpe, hp, Trials
import pandas as pd

import warnings
warnings.filterwarnings("ignore")

# Features are columns A-H; the label is the 'class' column.
Alloydata = pd.read_csv('data.csv')
X = Alloydata.drop('class', axis=1)
y = Alloydata['class']

scaler = StandardScaler()

for i in range(9, 13):           # KPCA n_components: 9 .. 12
    for j in range(1, 101):      # KPCA gamma: 1e-6 .. 1e-4
        for k in range(1, 31):   # KPCA coef0: 0.02 .. 0.60
            valuei = i
            valuej = j * 0.000001
            valuek = k * 0.02
            # Note: with kernel='rbf', KernelPCA ignores degree and coef0.
            KPCA = KernelPCA(n_components=valuei, kernel='rbf', gamma=valuej,
                             degree=1, coef0=valuek)
            # Transform into a separate variable so the raw feature matrix X
            # is not overwritten and re-transformed on the next iteration.
            X_kpca = KPCA.fit_transform(X)
            X_kpca = scaler.fit_transform(X_kpca)
            X_train, X_test, y_train, y_test = train_test_split(
                X_kpca, y, test_size=0.3, random_state=1)

            def fn_SVC(params):
                # 5-fold CV accuracy; negated because HyperOpt minimizes the objective.
                acc = cross_val_score(SVC(**params), X_train, y_train,
                                      scoring='accuracy', cv=5, n_jobs=-1).mean()
                return -acc

            space_SVC = {
                'C': hp.uniform('C', 5, 500),
                'gamma': hp.uniform('gamma', 0.00001, 0.9),
                'kernel': hp.choice('kernel', ['poly']),
                'degree': hp.choice('degree', [1])
            }
            trials = Trials()
            best = fmin(fn_SVC, space_SVC, algo=tpe.suggest, max_evals=100, trials=trials)

            # Refit with the best SVC hyperparameters and evaluate on the held-out split.
            SVC_temp = SVC(C=best['C'], gamma=best['gamma'], kernel='poly', degree=1)
            SVC_temp.fit(X_train, y_train)
            test = SVC_temp.predict(X_test)
            # The first "Accuracy" below is the training-set score, the second the test-set accuracy.
            print(
                    'Accuracy: %.3f' % SVC_temp.score(X_train, y_train)
                    + ';  F1 score(weighted): %.3f' % f1_score(y_test, test, average='weighted')
                    + ';  Accuracy: %.3f' % accuracy_score(y_test, test)
                    + ';  C: %.5f' % best['C']
                    + ';  gamma: %.6f' % best['gamma']
                    + ';  n_components: %.0f' % valuei
                    + ';  KPCA_gamma: %.6f' % valuej
                    + ';  coef0: %.3f' % valuek
                )

Each loop iteration that completes normally produces one result line, for example:

100%|██████████| 100/100 [00:01<00:00, 60.57trial/s, best loss: -0.8666666666666666]
Accuracy: 0.881;  F1 score(weighted): 0.897;  Accuracy: 0.897;  C: 310.60313;  gamma: 0.511483;  n_components: 9;  KPCA_gamma: 0.000420;  coef0: 0.240
100%|██████████| 100/100 [00:01<00:00, 66.18trial/s, best loss: -0.8666666666666666]
Accuracy: 0.889;  F1 score(weighted): 0.897;  Accuracy: 0.897;  C: 216.24260;  gamma: 0.108702;  n_components: 9;  KPCA_gamma: 0.000420;  coef0: 0.260

However, before the nested sweep finishes, the run gets stuck partway through, for example:

 95%|█████████▌| 95/100 [00:20<00:00, 43.84trial/s, best loss: -0.8666666666666666]

So far I have tried the following:
1. Keeping the nested loops and rerunning the same loop ranges several times to see whether the hang always corresponds to the same parameter values. It turns out that with the same KPCA parameter values, HyperOpt hangs at different positions, i.e. the hang is not tied to one particular parameter combination.
2. On top of that, switching to several different KPCA parameter ranges; the hang positions are again inconsistent, i.e. the hang appears at random.
3. Removing the nested loops from the code above and keeping only the following part:

KPCA = KernelPCA(n_components=9, kernel='rbf', gamma=0.00042, degree=1, coef0=0.3)
X = KPCA.fit_transform(X)

X = scaler.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

def fn_SVC(params):
    acc = cross_val_score(SVC(**params), X_train, y_train, scoring='accuracy', cv=5, n_jobs=-1).mean()
    return -acc

space_SVC = {
    'C': hp.uniform('C', 5, 500),
    'gamma': hp.uniform('gamma', 0.00001, 0.9),
    'kernel': hp.choice('kernel', ['poly']),
    'degree': hp.choice('degree', [1])
}
trials = Trials()
best = fmin(fn_SVC, space_SVC, algo=tpe.suggest, max_evals=100, trials=trials)

SVC_temp = SVC(C=best['C'], gamma=best['gamma'], kernel='poly', degree=1)
SVC_temp.fit(X_train, y_train)
print(SVC_temp.score(X_train, y_train))
test = SVC_temp.predict(X_test)
print('F1 score(weighted): %.3f' % f1_score(y_test, test, average='weighted')
      + ';  Accuracy: %.3f' % accuracy_score(y_test, test)
      + ';  C: %.3f' % best['C']
      + ';  gamma: %.6f' % best['gamma'])

In other words, the KPCA parameter values are given directly, and in that case HyperOpt completes the optimization normally. That is the strange part: with the same parameters a single run finishes, but the nested loops do not.
4. I also tried running the code from step 3 with several different sets of KPCA parameters, and sometimes it hangs as well. This suggests that whether the optimization can finish is still influenced by the KPCA parameters.
To sum up: how can I get the KPCA + HyperOpt + SVM hyperparameter optimization for this classification problem to run to completion under the nested-loop sweep? Even better would be tuning the KPCA hyperparameters with HyperOpt or Bayesian optimization as well, instead of a local grid sweep, since the sweep is far too inefficient. A rough sketch of what I have in mind follows.
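This is only an untested sketch of the idea (the names space_joint and fn_joint are placeholders I made up, and I do not know whether it avoids the hang): HyperOpt would search the KernelPCA and SVC hyperparameters jointly, with both steps wrapped in a single sklearn Pipeline so every trial refits KPCA on the raw features.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import KernelPCA
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score, train_test_split
from hyperopt import fmin, tpe, hp, Trials
import pandas as pd

Alloydata = pd.read_csv('data.csv')
X_raw = Alloydata.drop('class', axis=1)
y = Alloydata['class']
X_train, X_test, y_train, y_test = train_test_split(X_raw, y, test_size=0.3, random_state=1)

# Joint search space: KPCA hyperparameters together with the SVC hyperparameters.
space_joint = {
    'kpca_n_components': hp.quniform('kpca_n_components', 9, 12, 1),
    'kpca_gamma': hp.uniform('kpca_gamma', 1e-6, 1e-4),
    'svc_C': hp.uniform('svc_C', 5, 500),
    'svc_gamma': hp.uniform('svc_gamma', 0.00001, 0.9),
}

def fn_joint(params):
    # Each trial builds a fresh KPCA -> scaler -> SVC pipeline from the sampled parameters.
    model = Pipeline([
        ('kpca', KernelPCA(n_components=int(params['kpca_n_components']),
                           kernel='rbf', gamma=params['kpca_gamma'])),
        ('scale', StandardScaler()),
        ('svc', SVC(C=params['svc_C'], gamma=params['svc_gamma'],
                    kernel='poly', degree=1)),
    ])
    # n_jobs=1 simply to keep this sketch single-process.
    acc = cross_val_score(model, X_train, y_train, scoring='accuracy', cv=5, n_jobs=1).mean()
    return -acc

trials = Trials()
best = fmin(fn_joint, space_joint, algo=tpe.suggest, max_evals=200, trials=trials)
print(best)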
Any pointers would be much appreciated!