# Introduction

P(飛行 | fly) + P(搭機 | fly) + P(蒼蠅 | fly) = 1

(1)
$$h(w) = - p_w \log(p_w)$$

(2)
\begin{align} H(W) = - \sum_{w \in W} p_w \log(p_w) \end{align}
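As a minimal sketch of equation (2), the following Python computes the entropy of a distribution over the three candidate translations of "fly". The two probability vectors are hypothetical values chosen for illustration, and the natural logarithm is assumed (the text does not fix a base).

```python
import math

def entropy(probs):
    """Shannon entropy H(W) = -sum_w p_w * log(p_w), in nats."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Terms with p_w == 0 are skipped: p * log(p) tends to 0 as p -> 0.
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical distributions over the three translations of "fly".
uniform = [1 / 3, 1 / 3, 1 / 3]
skewed = [0.8, 0.1, 0.1]

h_uniform = entropy(uniform)
h_skewed = entropy(skewed)
print(h_uniform, h_skewed)
```

With no further constraints, the uniform distribution has the largest entropy among all distributions over three outcomes, which is why it is the natural starting point when nothing else is known.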

# From Observed Samples to a Probability Model

(3)
\begin{align} \tilde{p}(x,y) = \frac{n(x, y)}{N} \end{align}
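Equation (3) can be sketched as a simple counting procedure. The example below assumes, purely for illustration, that each sample (x, y) is a pair of adjacent words in a toy token sequence; the function name `empirical_joint` and the data are hypothetical.

```python
from collections import Counter

def empirical_joint(samples):
    """Return p~(x, y) = n(x, y) / N for a list of (x, y) samples."""
    counts = Counter(samples)   # n(x, y)
    total = len(samples)        # N
    return {pair: n / total for pair, n in counts.items()}

# Hypothetical toy data: adjacent word pairs (x, y) from a short sequence.
tokens = ["the", "cat", "sat", "on", "the", "cat"]
samples = list(zip(tokens, tokens[1:]))

p_tilde = empirical_joint(samples)
print(p_tilde)
```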

$$f(x,y) = \begin{cases} 1 & \text{if } y \text{ follows } x \\ 0 & \text{otherwise} \end{cases}$$


(4)
\begin{align} \tilde{p}(f) \equiv \sum_{x,y} \tilde{p}(x,y) f(x,y) \end{align}

(5)
\begin{align} p(f) \equiv \sum_{x,y} \tilde{p}(x) p(y|x) f(x,y) \end{align}

(6)
\begin{align} p(f) = \tilde{p}(f) \end{align}

(7)
\begin{align} \sum_{x,y} \tilde{p}(x,y) f(x,y) = \sum_{x,y} \tilde{p}(x) p(y|x) f(x,y) \end{align}
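The constraint in equation (7) can be checked numerically. The sketch below uses a hypothetical empirical distribution and a hypothetical indicator feature (neither comes from the text) and compares the empirical expectation of equation (4) with the model expectation of equation (5) for two candidate conditional models.

```python
import math
from collections import defaultdict

# Hypothetical empirical joint distribution p~(x, y).
p_tilde_xy = {
    ("x1", "y1"): 0.4, ("x1", "y2"): 0.1,
    ("x2", "y1"): 0.2, ("x2", "y2"): 0.3,
}

def f(x, y):
    """Hypothetical indicator feature: fires whenever y is 'y1'."""
    return 1.0 if y == "y1" else 0.0

def empirical_expectation(p_xy, feature):
    """Equation (4): sum over (x, y) of p~(x, y) * f(x, y)."""
    return sum(p * feature(x, y) for (x, y), p in p_xy.items())

def marginal_x(p_xy):
    """p~(x) obtained by summing p~(x, y) over y."""
    p_x = defaultdict(float)
    for (x, _), p in p_xy.items():
        p_x[x] += p
    return dict(p_x)

def model_expectation(p_x, p_y_given_x, feature):
    """Equation (5): sum over (x, y) of p~(x) * p(y|x) * f(x, y)."""
    return sum(p_x[x] * p * feature(x, y) for (x, y), p in p_y_given_x.items())

p_x = marginal_x(p_tilde_xy)
# Candidate model A: the empirical conditional p~(x, y) / p~(x).
empirical_model = {(x, y): p / p_x[x] for (x, y), p in p_tilde_xy.items()}
# Candidate model B: uniform over the two values of y.
uniform_model = {(x, y): 0.5 for (x, y) in p_tilde_xy}

target = empirical_expectation(p_tilde_xy, f)
satisfies = math.isclose(model_expectation(p_x, empirical_model, f), target)
violates = not math.isclose(model_expectation(p_x, uniform_model, f), target)
print(target, satisfies, violates)
```

Model A satisfies the constraint by construction, while the uniform model B gives an expectation of 0.5 and so is excluded by equation (7).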

When $\tilde{p}(x,y)$ (for example, from a bilingual aligned corpus),

# Entropy-Based Distance Formula

(8)
$$d(X,Y) = H(X,Y) - I(X;Y) = H(X|Y) + H(Y|X) = 2 H(X,Y) - H(X) - H(Y)$$
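The three forms in equation (8) can be verified numerically from a joint distribution. The sketch below uses the standard identities $I(X;Y) = H(X) + H(Y) - H(X,Y)$, $H(X|Y) = H(X,Y) - H(Y)$ and $H(Y|X) = H(X,Y) - H(X)$; the function names and the two example distributions are hypothetical, and natural logarithms are assumed.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; zero-probability terms are skipped."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_distance_forms(joint):
    """Evaluate the three expressions for d(X, Y) from {(x, y): p(x, y)}."""
    p_x, p_y = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    h_xy = entropy(joint.values())
    h_x = entropy(p_x.values())
    h_y = entropy(p_y.values())
    mutual_info = h_x + h_y - h_xy            # I(X;Y)
    return (
        h_xy - mutual_info,                   # H(X,Y) - I(X;Y)
        (h_xy - h_y) + (h_xy - h_x),          # H(X|Y) + H(Y|X)
        2 * h_xy - h_x - h_y,                 # 2H(X,Y) - H(X) - H(Y)
    )

# Hypothetical examples: independent fair bits, and two identical bits.
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
identical = {(0, 0): 0.5, (1, 1): 0.5}

d_independent = entropy_distance_forms(independent)
d_identical = entropy_distance_forms(identical)
print(d_independent, d_identical)
```

For independent variables the distance equals $H(X) + H(Y)$, and for identical variables it is zero, which matches the intuition that $d(X,Y)$ measures the information the two variables do not share.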

# Objective

(9)
\begin{eqnarray} && \arg\max \quad { d(X,Y) } \\ & \rightarrow & \arg\max \quad { H(X|Y) + H(Y|X)} \\ & \rightarrow & \arg\max \quad { 2 H(X,Y) - H(X) - H(Y) } \end{eqnarray}

# References

1. Maximum Entropy Modeling — http://homepages.inf.ed.ac.uk/lzhang10/maxent.html

# Software

1. YASMET (Yet Another Small MaxEnt Toolkit): training of conditional maximum entropy models. Believe it or not, this implementation is written in only 132 lines of C++ code and still has feature selection and Gaussian smoothing. You need GCC 2.9x to compile the source.
2. maxent.sf.net: a great Java maxent implementation with the GIS training algorithm. Part of the OpenNLP project.
3. Amis: a maximum entropy estimator for feature forests, with GIS, IIS and L-BFGS algorithms.
4. maxent: another maximum entropy modeling package with Ruby binding, GIS, Gaussian prior smoothing and XML data format.
5. Predictive Modeling Toolkit
6. Robert Malouf's Maximum Entropy Parameter Estimation software, now available as the Toolkit for Advanced Discriminative Modeling on sourceforge.net, has GIS, IIS, L-BFGS and gradient descent training methods and parallel computation ability through PETSc. You may want to read his paper first.
7. MEGA Model Optimization Package: a more recent ME implementation by Hal Daumé III. The software features CG and LM-BFGS optimization and is written in OCaml. Although I no longer use OCaml, I'd say it is a great language and worth learning.
8. Text Modeller: a Python implementation of a joint maximum entropy model (a.k.a. whole-sentence language model) with sampling-based training. It now seems to be part of SciPy.
9. Stanford Classifier: another open-source implementation of the maximum entropy model in Java, suitable for NLP tagging and parsing tasks.
10. NLTK includes a maxent classifier written entirely in Python. IIS and GIS training methods available. Suitable for text categorization and related NLP tasks.
11. Another small maxent package in C++ with a BSD-like license, written by Dekang Lin.
12. SharpEntropy: a C# port of the Java maxent package (http://maxent.sf.net) mentioned above.
13. Maxent software for species habitat modeling by Robert E. Schapire et al. Registration needed for downloading.
14. A maxent implementation in C++ with Python binding, GIS, L-BFGS and Gaussian prior smoothing.