
Machine Learning: Bayesian Classifiers (Part 2), Implementing a Gaussian Naive Bayes Classifier


1 Implementing the Gaussian Naive Bayes Classifier

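  • Naive Bayes implementations that do not call sklearn are hard to find online. The few that exist are the multinomial or Bernoulli variants used for text classification. So I wrote a Gaussian NB classifier that can be packaged and reused directly. Compared with the real library source it lacks many attributes and methods, so feel free to add your own. The code follows (with detailed comments):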
import numpy as np


class NaiveBayes():
    """Gaussian naive Bayes classifier"""

    def __init__(self):

        self._X_train = None
        self._y_train = None
        self._classes = None
        self._priorlist = None
        self._meanmat = None
        self._varmat = None

    def fit(self, X_train, y_train):

        self._X_train = np.asarray(X_train, dtype=float)
        self._y_train = np.asarray(y_train)
        self._classes = np.unique(self._y_train)                           # all distinct class labels, sorted
        priorlist, means, variances = [], [], []
        for c in self._classes:
            # prior probability, per-feature mean and per-feature variance of each class
            X_index_c = self._X_train[np.where(self._y_train == c)]        # samples of class c, as a matrix
            priorlist.append(X_index_c.shape[0] / self._X_train.shape[0])  # class prior P(c)
            means.append(np.mean(X_index_c, axis=0))                       # mean of each feature within class c
            variances.append(np.var(X_index_c, axis=0))                    # variance of each feature within class c
        self._priorlist = np.array(priorlist)
        self._meanmat = np.vstack(means)       # shape (n_classes, n_features), one row per class; any feature count works
        self._varmat = np.vstack(variances)
        return self

    def predict(self, X_test):

        eps = 1e-10                                                        # keeps zero-variance features from dividing by 0
        var = self._varmat + eps
        classof_X_test = []                                                # predicted class of each test sample
        for x_sample in np.asarray(X_test, dtype=float):
            # Log of the Gaussian density for every (class, feature) pair; x_sample broadcasts
            # against the (n_classes, n_features) mean matrix. Working in log space avoids
            # exp() underflowing to 0, which would turn log(0) into -inf.
            log_pdf = -(x_sample - self._meanmat) ** 2 / (2 * var) - 0.5 * np.log(2 * np.pi * var)
            list_log = np.sum(log_pdf, axis=1)                             # sum of log class-conditional probabilities per class
            prior_class_x = list_log + np.log(self._priorlist)             # add the log class prior
            prior_class_x_index = np.argmax(prior_class_x)                 # index of the largest log posterior
            classof_X_test.append(self._classes[prior_class_x_index].item())  # class label as a plain Python value
        return classof_X_test

    def score(self, X_test, y_test):

        y_pred = np.asarray(self.predict(X_test))                          # predict once, not once per sample
        accuracy = np.mean(y_pred == np.asarray(y_test))
        return "accuracy: {:.10%}".format(accuracy)
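For reference, `predict` evaluates the standard Gaussian naive Bayes decision rule in log space. Here $P(c)$ is the entry of `_priorlist`, $\mu_{c,j}$ and $\sigma^2_{c,j}$ are the entries of `_meanmat` and `_varmat` (`np.var` gives the maximum-likelihood variance, `ddof=0`), and the small `eps` added to the variance in the code is omitted:

$$\hat{y} = \arg\max_{c} \Big[ \log P(c) + \sum_{j=1}^{d} \log \mathcal{N}\big(x_j;\, \mu_{c,j}, \sigma^2_{c,j}\big) \Big], \qquad P(c) = \frac{|D_c|}{|D|}$$

$$\log \mathcal{N}\big(x_j;\, \mu_{c,j}, \sigma^2_{c,j}\big) = -\frac{(x_j - \mu_{c,j})^2}{2\sigma^2_{c,j}} - \frac{1}{2}\log\big(2\pi\sigma^2_{c,j}\big)$$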
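  • Testing the hand-written Gaussian NB classifier on the Iris dataset gives results close to sklearn's classifier, with accuracy mostly between 93% and 96%. The spread comes from repeating a random 80/20 split, which amounts to running the hold-out method several times. A more reliable estimate would use cross-validation and several evaluation metrics. That is not implemented here.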
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
# Load the dataset and split it 80/20 into training and test sets
iris = datasets.load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

nb = NaiveBayes()
nb.fit(X_train, y_train)
print(nb.predict(X_test))
print(nb.score(X_test, y_test))
# Output of one run (train_test_split shuffles randomly, so results vary between runs):
[0, 2, 1, 1, 1, 2, 1, 0, 2, 0, 1, 1, 1, 0, 2, 2, 2, 2, 0, 1, 1, 0, 2, 2, 2, 0, 1, 0, 1, 0]
accuracy: 96.6666666667%
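The test above uses a single random split, and the post leaves cross-validation unimplemented. Below is a minimal numpy-only sketch of k-fold cross-validation. To stay self-contained, it re-implements the same Gaussian NB computation as two small functions instead of calling the `NaiveBayes` class. It also uses synthetic two-class Gaussian data in place of iris. The helper names `gnb_fit`, `gnb_predict` and `k_fold_accuracy`, the fold count and the data are all assumptions for illustration.

```python
import numpy as np


def gnb_fit(X, y):
    """Hypothetical helper: per-class priors, feature means and variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.vstack([X[y == c].mean(axis=0) for c in classes])
    variances = np.vstack([X[y == c].var(axis=0) for c in classes])
    return classes, priors, means, variances


def gnb_predict(model, X, eps=1e-10):
    """Hypothetical helper: argmax of log prior + summed log Gaussian densities."""
    classes, priors, means, variances = model
    var = variances + eps
    # (n_samples, n_classes, n_features) log densities via broadcasting
    log_pdf = -(X[:, None, :] - means) ** 2 / (2 * var) - 0.5 * np.log(2 * np.pi * var)
    log_post = log_pdf.sum(axis=2) + np.log(priors)
    return classes[np.argmax(log_post, axis=1)]


def k_fold_accuracy(X, y, k=5, seed=0):
    """Hypothetical helper: mean and std of accuracy over k shuffled folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = gnb_fit(X[train_idx], y[train_idx])
        scores.append(np.mean(gnb_predict(model, X[test_idx]) == y[test_idx]))
    return np.mean(scores), np.std(scores)


# Synthetic stand-in for iris (an assumption): two well-separated 4-D Gaussian classes
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 4)), rng.normal(3.0, 1.0, size=(100, 4))])
y = np.repeat([0, 1], 100)
mean_acc, std_acc = k_fold_accuracy(X, y, k=5)
print("5-fold accuracy: {:.2%} +/- {:.2%}".format(mean_acc, std_acc))
```

To use the post's class instead, fit `NaiveBayes` on the training folds inside the loop and compare its `predict` output with the held-out fold's labels. Each sample is then tested exactly once, which removes most of the run-to-run variation seen with a single 80/20 split.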

2 Other Notes

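  • Naive Bayes rests on the attribute conditional independence assumption, which rarely holds in practice. This motivated semi-naive Bayes classifiers. Their basic idea is to take some dependencies among attributes into account, so they avoid computing the full joint probability without ignoring strong attribute dependencies entirely. The most common strategy is the one-dependent estimator (ODE), which assumes each attribute depends on at most one other attribute besides the class. Methods in this family include SPODE, TAN and AODE.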
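To make the one-dependent idea concrete, here is a minimal AODE sketch for discrete, integer-coded features. AODE tries every attribute in turn as the "super-parent" on which all other attributes depend (one SPODE per parent) and sums the resulting scores. The Laplace-smoothed estimates assumed here follow the form commonly given in textbooks: $P(c, x_i) \approx (|D_{c,x_i}| + 1)/(|D| + N N_i)$ and $P(x_j \mid c, x_i) \approx (|D_{c,x_i,x_j}| + 1)/(|D_{c,x_i}| + N_j)$, where $N$ is the number of classes and $N_i$ the number of values of attribute $i$. The function name `aode_predict`, the `m_min` threshold and the toy data are illustrative assumptions, not part of the original post.

```python
import numpy as np


def aode_predict(X_train, y_train, x, m_min=1):
    """Classify one sample x with AODE (hypothetical helper name).

    Every feature i whose value x[i] occurs at least m_min times in the training
    set acts in turn as the super-parent; the SPODE scores are summed over parents.
    """
    X_train = np.asarray(X_train)
    y_train = np.asarray(y_train)
    classes = np.unique(y_train)
    n, d = X_train.shape
    n_values = [len(np.unique(X_train[:, j])) for j in range(d)]  # N_j: distinct values of feature j
    scores = np.zeros(len(classes))
    for i in range(d):                                # feature i as super-parent
        if np.sum(X_train[:, i] == x[i]) < m_min:
            continue                                  # too few samples to trust P(c, x_i)
        for k, c in enumerate(classes):
            mask_ci = (y_train == c) & (X_train[:, i] == x[i])
            n_ci = mask_ci.sum()
            # Laplace-smoothed joint probability P(c, x_i)
            p = (n_ci + 1) / (n + len(classes) * n_values[i])
            for j in range(d):
                # Laplace-smoothed conditional P(x_j | c, x_i)
                n_cij = np.sum(mask_ci & (X_train[:, j] == x[j]))
                p *= (n_cij + 1) / (n_ci + n_values[j])
            scores[k] += p
    return classes[np.argmax(scores)]


# Made-up toy data: the label equals feature 0, feature 1 is noise
X_toy = np.array([[0, 0], [0, 1], [1, 0], [1, 1]] * 3)
y_toy = X_toy[:, 0].copy()
print(aode_predict(X_toy, y_toy, np.array([1, 0])))  # 1
print(aode_predict(X_toy, y_toy, np.array([0, 1])))  # 0
```

Summing over all super-parents avoids having to pick a single parent, as SPODE does, or learn a dependency tree, as TAN does. The trade-off is that AODE stores counts for every (class, parent value, child value) combination.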
  • np.unique(): returns a new array of the distinct elements of the input, sorted in ascending order.
y = np.array([1, 2, 9, 1, 2, 3])
classes = np.unique(y)                     # new array of the distinct elements of y, sorted
print(classes)                             # [1 2 3 9]
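`fit` uses `np.unique` to find the classes and then counts each class separately to get the priors. `np.unique` can also return the counts directly through its `return_counts` argument, which gives the same priors in one call. The short sketch below reuses the array from the example; computing priors this way is an illustration, not what the post's code does.

```python
import numpy as np

y = np.array([1, 2, 9, 1, 2, 3])
classes, counts = np.unique(y, return_counts=True)
priors = counts / counts.sum()   # class prior P(c) = |D_c| / |D|
print(classes)   # [1 2 3 9]
print(counts)    # [2 2 1 1]
print(priors)
```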
  • np.where(): has two calling forms
"""
1. np.where(condition, x, y)
Element-wise: where condition is True, take the value from x; otherwise take it from y.
"""
a = np.array([[9, 7, 3], [4, 5, 2], [6, 3, 8]])
b = np.where(a > 5, 1, 0)               # elements of a greater than 5 become 1, all others 0
print(b)
Output:
[[1 1 0]
 [0 0 0]
 [1 0 1]]
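In the three-argument form, `x` and `y` need not be scalars. They can be arrays that broadcast against the condition, so each element is picked from one or the other. The sketch below keeps elements greater than 5 and zeroes the rest. It also shows a variance floor of the kind a Gaussian NB needs for zero-variance features; the `var` values and the `1e-9` floor are made-up illustrations.

```python
import numpy as np

a = np.array([[9, 7, 3], [4, 5, 2], [6, 3, 8]])
kept = np.where(a > 5, a, 0)             # keep elements > 5, replace the rest with 0
print(kept)
# [[9 7 0]
#  [0 0 0]
#  [6 0 8]]

# Hypothetical per-feature variances: replace zeros with a small floor
var = np.array([0.25, 0.0, 1.5])
safe_var = np.where(var > 0, var, 1e-9)
```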
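"""
2. np.where(condition)
With only the condition (no x and y), it returns the indices of the elements that satisfy the condition (equivalent to numpy.nonzero).
The indices are returned as a tuple with one array per dimension of the input; the k-th array holds the k-th coordinate of every matching element.
"""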
c = np.array([[9, 7, 3], [4, 5, 2], [6, 3, 8]])
d = np.where(c > 5)                                                # condition: element greater than 5
print(d)
Output (a tuple):
(array([0, 0, 2, 2], dtype=int64), array([0, 1, 0, 2], dtype=int64))
This means the elements at indices (0, 0), (0, 1), (2, 0) and (2, 2) satisfy the condition.

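a = np.array([1, 3, 6, 9, 0])
b = np.where(a > 5)
print(b)
Output: (array([2, 3], dtype=int64),), meaning the elements at indices 2 and 3 satisfy the condition. Note the trailing comma: the result is a tuple with one index array per dimension of a, so for a 1-D array it is a one-element tuple. When a has two or more dimensions, the tuple holds as many arrays as a has dimensions.
The result can be used directly as an array index:
x = np.array([[1, 5, 8, 1], [2, 4, 6, 8], [3, 6, 7, 9], [6, 8, 3, 1]])
print(x[b])   # rows 2 and 3 of x: [[3 6 7 9] [6 8 3 1]]
This is equivalent to x[[2, 3]]. By contrast, x[2, 3] returns the single element 9, and x[[2], [3]] returns the array [9].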
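This indexing is what `fit` relies on when it selects a class's samples with `self._X_train[np.where(self._y_train == c)]`. For a 1-D label array, indexing with the boolean mask directly selects the same rows, and `np.nonzero` returns the same tuple as `np.where`. The small sketch below checks both; its feature values are made-up toy numbers.

```python
import numpy as np

X = np.array([[5.1, 3.5], [4.9, 3.0], [6.3, 3.3], [5.8, 2.7]])  # made-up toy features
y = np.array([0, 0, 2, 2])

rows_where = X[np.where(y == 2)]    # index with the tuple returned by np.where
rows_mask = X[y == 2]               # index with the boolean mask directly
print(np.array_equal(rows_where, rows_mask))  # True

idx_where = np.where(y == 2)        # (array([2, 3]),)
idx_nonzero = np.nonzero(y == 2)    # same tuple of index arrays
```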

