In the previous section, we saw how to compute the score $e^{S_i}$ of a single label path. One problem remains: how do we compute the total score over all paths?

$$P_{total} = P_1 + P_2 + \dots + P_N = e^{S_1} + e^{S_2} + \dots + e^{S_N}$$
The simplest way to compute the total score is to enumerate every possible path and add up the individual path scores. That would give the right answer, but it is extremely inefficient: the number of paths grows exponentially with sentence length, so training would take a very, very long time.
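To make the inefficiency concrete, here is a brute-force sketch (the function and variable names are my own, not from the post). It enumerates all $|labels|^{n}$ paths, so it is only usable on toy inputs:

```python
import itertools
import math

def total_score_brute_force(emissions, transitions):
    """Sum exp(path score) over every possible label path.

    emissions[i][j]  : score of word i taking label j
    transitions[i][j]: score of moving from label i to label j
    Runs in O(num_labels ** num_words) time -- exponential.
    """
    n_words = len(emissions)
    n_labels = len(emissions[0])
    total = 0.0
    for path in itertools.product(range(n_labels), repeat=n_words):
        # emission part of the path score
        s = sum(emissions[i][path[i]] for i in range(n_words))
        # transition part of the path score
        s += sum(transitions[path[i]][path[i + 1]] for i in range(n_words - 1))
        total += math.exp(s)
    return total  # P_total = e^{S_1} + ... + e^{S_N}
```

With 2 labels and 3 words this already sums over 8 paths; the dynamic-programming method below computes the same quantity without the enumeration.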
Before reading on, I strongly recommend getting out pen and paper and working through the example step by step. That is the surest way to understand the details of the algorithm, after which you can implement it in your favorite programming language.
Step 1: Recap the CRF loss function
In Section 2.3 (the previous post), the CRF loss function was defined as:

$$LossFunction = \frac{P_{RealPath}}{P_1 + P_2 + \dots + P_N}$$

To make it easier to compute, we rewrite it as a log loss:

$$LogLossFunction = \log\frac{P_{RealPath}}{P_1 + P_2 + \dots + P_N}$$

Because training minimizes the loss, we multiply the expression by $-1$:

$$\begin{aligned}
LogLossFunction &= -\log\frac{P_{RealPath}}{P_1 + P_2 + \dots + P_N}\\
&= -\log\frac{e^{S_{RealPath}}}{e^{S_1} + e^{S_2} + \dots + e^{S_N}}\\
&= -\left(\log(e^{S_{RealPath}}) - \log(e^{S_1} + e^{S_2} + \dots + e^{S_N})\right)\\
&= -\left(S_{RealPath} - \log(e^{S_1} + e^{S_2} + \dots + e^{S_N})\right)\\
&= -\left(\sum_{i=1}^{N} x_{i y_i} + \sum_{i=1}^{N-1} t_{y_i y_{i+1}} - \log(e^{S_1} + e^{S_2} + \dots + e^{S_N})\right)
\end{aligned}$$

From the earlier sections we already know how to compute the real path score; what we still need is an efficient way to compute $\log(e^{S_1} + e^{S_2} + \dots + e^{S_N})$.
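The last line of the derivation is what gets implemented in practice: the loss is $\log\sum_i e^{S_i} - S_{RealPath}$. A minimal sketch, assuming the path scores have already been computed (function names are my own):

```python
import math

def log_sum_exp(scores):
    """Numerically stable log(sum(exp(s) for s in scores))."""
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))

def crf_loss(real_path_score, all_path_scores):
    """-log(e^{S_real} / sum_i e^{S_i}) = logsumexp(S) - S_real."""
    return log_sum_exp(all_path_scores) - real_path_score
```

Subtracting the maximum before exponentiating avoids overflow; the rest of this post is about computing `log_sum_exp` over all paths without materializing `all_path_scores`.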
Step 2: Recap the emission and transition scores
Suppose we are training the model on a sentence of length 3:

$$\mathbf{x} = [w_0, w_1, w_2]$$

and the training data has only two labels:

$$LabelSet = \{l_1, l_2\}$$

The emission scores we obtain from the BiLSTM layer are:

|  | $l_1$ | $l_2$ |
|---|---|---|
| $w_0$ | $x_{01}$ | $x_{02}$ |
| $w_1$ | $x_{11}$ | $x_{12}$ |
| $w_2$ | $x_{21}$ | $x_{22}$ |

where $x_{ij}$ is the score of word $w_i$ being labeled $l_j$.
The transition scores of the CRF layer are:

|  | $l_1$ | $l_2$ |
|---|---|---|
| $l_1$ | $t_{11}$ | $t_{12}$ |
| $l_2$ | $t_{21}$ | $t_{22}$ |

where $t_{ij}$ is the score of transitioning from label $i$ to label $j$.
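The two tables map directly onto arrays. A sketch with made-up numbers (in a real model the emission scores come from the BiLSTM and the transition scores are learned CRF parameters):

```python
import numpy as np

# Emission scores: rows = words (w0, w1, w2), columns = labels (l1, l2).
# emissions[i, j] is the score of word w_i taking label l_j.
emissions = np.array([[0.2, 0.8],
                      [0.6, 0.4],
                      [0.9, 0.1]])

# Transition scores: transitions[i, j] is the score of moving
# from label i to label j.
transitions = np.array([[0.3, 0.1],
                        [0.5, 0.2]])

print(emissions[1, 0])  # x_11: score of w1 labeled l1 -> 0.6
```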
Step 3: Do the computation (get your pen and paper ready again)
Reminder: our goal is to compute $\log(e^{S_1} + e^{S_2} + \dots + e^{S_N})$.

The computation is an accumulation process, in the spirit of dynamic programming (even if you do not know dynamic programming you can follow along, since I will explain the example step by step, but I still strongly recommend studying it). First we compute the total score of all partial paths ending at $w_0$; we use that result to compute the total for $w_0 \to w_1$; and we use that in turn to compute the total for $w_0 \to w_1 \to w_2$, which is the final score we need.
In the steps below you will see two variables, obs and previous: previous stores the final result of the preceding step, and obs holds the information of the current word.

$w_0$:

$$obs = [x_{01}, x_{02}], \quad previous = None$$

If the sentence had only the single word $w_0$, there would be no result from a preceding step, so previous is None. All we can observe is the first word: its information is $obs = [x_{01}, x_{02}]$, where $x_{01}$ and $x_{02}$ are the emission scores introduced above. The total score of all paths ending at $w_0$ is:

$$TotalScore(w_0) = \log(e^{x_{01}} + e^{x_{02}})$$

$w_0 \to w_1$:
$$obs = [x_{11}, x_{12}], \quad previous = [x_{01}, x_{02}]$$

1) Expand previous to:

$$previous = \begin{pmatrix} x_{01} & x_{01} \\ x_{02} & x_{02} \end{pmatrix}$$

2) Expand obs to:

$$obs = \begin{pmatrix} x_{11} & x_{12} \\ x_{11} & x_{12} \end{pmatrix}$$

The reason for these expansions is that matrix operations make the score computation far more efficient.

3) Add previous, obs, and the transition scores:

$$scores = \begin{pmatrix} x_{01} & x_{01} \\ x_{02} & x_{02} \end{pmatrix} + \begin{pmatrix} x_{11} & x_{12} \\ x_{11} & x_{12} \end{pmatrix} + \begin{pmatrix} t_{11} & t_{12} \\ t_{21} & t_{22} \end{pmatrix} = \begin{pmatrix} x_{01}+x_{11}+t_{11} & x_{01}+x_{12}+t_{12} \\ x_{02}+x_{11}+t_{21} & x_{02}+x_{12}+t_{22} \end{pmatrix}$$

Then update previous:

$$previous = [\log(e^{x_{01}+x_{11}+t_{11}} + e^{x_{02}+x_{11}+t_{21}}),\ \log(e^{x_{01}+x_{12}+t_{12}} + e^{x_{02}+x_{12}+t_{22}})]$$

This completes the second iteration. There are four paths from $w_0 \to w_1$ ($l_1 \to l_1$, $l_1 \to l_2$, $l_2 \to l_1$, $l_2 \to l_2$), and their total score is:

$$\begin{aligned}
TotalScore(w_0 \to w_1) &= \log(e^{previous[0]} + e^{previous[1]})\\
&= \log\left(e^{\log(e^{x_{01}+x_{11}+t_{11}} + e^{x_{02}+x_{11}+t_{21}})} + e^{\log(e^{x_{01}+x_{12}+t_{12}} + e^{x_{02}+x_{12}+t_{22}})}\right)\\
&= \log(e^{x_{01}+x_{11}+t_{11}} + e^{x_{02}+x_{11}+t_{21}} + e^{x_{01}+x_{12}+t_{12}} + e^{x_{02}+x_{12}+t_{22}})
\end{aligned}$$

This is exactly our target, $\log(e^{S_1} + e^{S_2} + \dots + e^{S_N})$, restricted to the paths seen so far.
From the last equation you can see that each exponent is the full score of one of the four paths.

$w_0 \to w_1 \to w_2$:
If you have made it this far, you have essentially learned the algorithm: this iteration repeats exactly the same operations as the previous one.

$$obs = [x_{21}, x_{22}]$$
$$previous = [\log(e^{x_{01}+x_{11}+t_{11}} + e^{x_{02}+x_{11}+t_{21}}),\ \log(e^{x_{01}+x_{12}+t_{12}} + e^{x_{02}+x_{12}+t_{22}})]$$

1) Expand previous to:

$$previous = \begin{pmatrix} \log(e^{x_{01}+x_{11}+t_{11}} + e^{x_{02}+x_{11}+t_{21}}) & \log(e^{x_{01}+x_{11}+t_{11}} + e^{x_{02}+x_{11}+t_{21}}) \\ \log(e^{x_{01}+x_{12}+t_{12}} + e^{x_{02}+x_{12}+t_{22}}) & \log(e^{x_{01}+x_{12}+t_{12}} + e^{x_{02}+x_{12}+t_{22}}) \end{pmatrix}$$

2) Expand obs to:

$$obs = \begin{pmatrix} x_{21} & x_{22} \\ x_{21} & x_{22} \end{pmatrix}$$

3) Add previous, obs, and the transition scores:

$$scores = \begin{pmatrix} \log(e^{x_{01}+x_{11}+t_{11}} + e^{x_{02}+x_{11}+t_{21}}) + x_{21} + t_{11} & \log(e^{x_{01}+x_{11}+t_{11}} + e^{x_{02}+x_{11}+t_{21}}) + x_{22} + t_{12} \\ \log(e^{x_{01}+x_{12}+t_{12}} + e^{x_{02}+x_{12}+t_{22}}) + x_{21} + t_{21} & \log(e^{x_{01}+x_{12}+t_{12}} + e^{x_{02}+x_{12}+t_{22}}) + x_{22} + t_{22} \end{pmatrix}$$

Then update previous by taking the log-sum-exp of each column (all the ways of arriving at each label):

$$\begin{aligned}
previous &= \left[\log\!\left(e^{\log(e^{x_{01}+x_{11}+t_{11}} + e^{x_{02}+x_{11}+t_{21}}) + x_{21} + t_{11}} + e^{\log(e^{x_{01}+x_{12}+t_{12}} + e^{x_{02}+x_{12}+t_{22}}) + x_{21} + t_{21}}\right),\right.\\
&\qquad \left.\log\!\left(e^{\log(e^{x_{01}+x_{11}+t_{11}} + e^{x_{02}+x_{11}+t_{21}}) + x_{22} + t_{12}} + e^{\log(e^{x_{01}+x_{12}+t_{12}} + e^{x_{02}+x_{12}+t_{22}}) + x_{22} + t_{22}}\right)\right]\\
&= \left[\log\!\left((e^{x_{01}+x_{11}+t_{11}} + e^{x_{02}+x_{11}+t_{21}})e^{x_{21}+t_{11}} + (e^{x_{01}+x_{12}+t_{12}} + e^{x_{02}+x_{12}+t_{22}})e^{x_{21}+t_{21}}\right),\right.\\
&\qquad \left.\log\!\left((e^{x_{01}+x_{11}+t_{11}} + e^{x_{02}+x_{11}+t_{21}})e^{x_{22}+t_{12}} + (e^{x_{01}+x_{12}+t_{12}} + e^{x_{02}+x_{12}+t_{22}})e^{x_{22}+t_{22}}\right)\right]
\end{aligned}$$

As before, we use the newly updated previous to compute the total score:

$$\begin{aligned}
TotalScore(w_0 \to w_1 \to w_2) &= \log(e^{previous[0]} + e^{previous[1]})\\
&= \log(e^{x_{01}+x_{11}+t_{11}+x_{21}+t_{11}} + e^{x_{02}+x_{11}+t_{21}+x_{21}+t_{11}}\\
&\quad + e^{x_{01}+x_{12}+t_{12}+x_{21}+t_{21}} + e^{x_{02}+x_{12}+t_{22}+x_{21}+t_{21}}\\
&\quad + e^{x_{01}+x_{11}+t_{11}+x_{22}+t_{12}} + e^{x_{02}+x_{11}+t_{21}+x_{22}+t_{12}}\\
&\quad + e^{x_{01}+x_{12}+t_{12}+x_{22}+t_{22}} + e^{x_{02}+x_{12}+t_{22}+x_{22}+t_{22}})
\end{aligned}$$

Hooray!
We have reached our goal, $\log(e^{S_1} + e^{S_2} + \dots + e^{S_N})$: the sentence has 3 words and there are 2 labels, so there are $2^3 = 8$ possible paths, and the final expression contains exactly 8 exponentials, one per path. Although the procedure looks complicated on paper, implementing it in code is straightforward, because computers are good at repetitive work.
The next post will explain how to predict the labels of a sentence.
Reprinted from: http://ggjti.baihongyu.com/