Natural Language Processing -- Software Tools

Program implementations by 陳鍾誠

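  1. Implementation: Chinese-English bilingual translation program — a simple machine translator (dictionary included), implemented in Java.
  2. Implementation: Translation Wizard — a bookmarklet that uses JavaScript to call Google Translate.
  3. Implementation: Web crawler — a crawler (or spider) that fetches web pages from across the Internet for use in full-text retrieval, implemented in Java.
  4. Implementation: Retrieval system — a full-text index and search system, implemented in Java.
  5. Implementation: Chinese Eliza — can your program chat with people? This article builds a chatbot in Java, modeled on Eliza.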
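The Eliza-style chatbot in the last item can be illustrated with a minimal rule-matching sketch. This is not the Java program listed above. The rules, the `respond` and `reflect` functions, and the fallback reply are all hypothetical, chosen only to show the pattern-then-template idea that Eliza-like programs are generally built on.

```python
import re

# Hypothetical rules: (pattern, reply template). Not taken from the original program.
RULES = [
    (re.compile(r"\bI need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bbecause\b", re.IGNORECASE), "Is that the real reason?"),
]

# Pronoun swaps so the echoed fragment reads from the bot's point of view.
REFLECTIONS = {"my": "your", "me": "you", "i": "you", "your": "my", "you": "me"}

# Reply used when no rule matches (an assumption of this sketch).
FALLBACK = "Please tell me more."


def reflect(fragment: str) -> str:
    """Swap first- and second-person words in the captured fragment."""
    words = fragment.split()
    return " ".join(REFLECTIONS.get(w.lower(), w) for w in words)


def respond(sentence: str) -> str:
    """Return the reply of the first rule whose pattern matches the sentence."""
    text = sentence.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            captured = [reflect(g) for g in match.groups()]
            return template.format(*captured)
    return FALLBACK
```

Calling `respond("I need my coffee.")` runs the first rule and echoes the reflected fragment back as a question. A Chinese version would need different patterns and a different reflection table, since the word-splitting used here assumes space-separated text.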

Open source

  1. Google Translate — Google's translation tool web page; a JavaScript API is available.
  2. Moses — a statistical machine translation system (uses Giza++ for word alignment).
  3. Giza++ — (F.J.Och) a statistical word alignment system.
  4. mkcls — (F.J.Och) training of word classes.
  5. YASMET — (F.J.Och) training of conditional maximum entropy models.

Tools by category

The list below comes from the natural language tool index site "Free/open-source machine translation software".

Rule-based systems

  1. Apertium — a free/open-source rule-based machine translation platform.
  2. Matxin — a free/open-source rule-based machine translation system for Basque.
  3. OpenLogos — a free/open-source version of the historical Logos machine translation system.
  4. Anusaaraka — English-Hindi machine translation system.

Statistical machine translation systems

Decoders

  1. Moses — a statistical machine translation system.
  2. Marie — an n-gram-based statistical machine translation decoder.
  3. Joshua — an open-source decoder for statistical translation models based on synchronous context-free grammars.
  4. Phramer — an open-source statistical phrase-based machine translation decoder.

Training translation models

  1. Giza++ — a tool to train translation models for statistical machine translation (see also the related mkcls tool to train word classes).
  2. Thot — a toolkit to train phrase-based models for statistical machine translation.

Language models

  1. IRSTLM — free/open-source language modelling tool to be used with Moses instead of SRILM, which is not free.
  2. RandLM — space-efficient n-gram-based language models built using randomized representations (Bloom filters, etc.).
  3. Kenneth Heafield's software — for the fast filtering of ARPA format language models to multiple vocabularies.

Scoring

  1. Kenneth Heafield's scripts — make it easy to score machine translation output using NIST's BLEU and NIST metrics, TER, and METEOR.

Other software

  1. RIA — a tool for automatic induction of transfer rules for Transfer-Based Statistical Machine Translation using dependency structures.
  2. Chaski — Distributed phrase-based machine translation training tool based on Hadoop.

Example-based machine translation systems

  1. Cunei — machine translation platform, an example-based machine translation system.
  2. CMU Example-Based Machine Translation System.
  3. Tilburg University Phrase-based memory-based machine translation system.
  4. DCU OpenMaTrEx — marker-driven example-based machine translation system (partially released before as Marclator).

Multi-engine machine translation / system combination

  1. Kenneth Heafield's multi-engine machine translation system.

Aligners and translation models

  1. Giza++ — training of statistical translation models.
  2. Anymalign — a multilingual sub-sentential aligner.
  3. Ventsislav Zhechev's Sub-tree aligner — which can be used for the automatic generation of parallel treebanks.

Web services around machine translation

  1. Tradubi — an open-source Ajax-based web application for social translation built upon Apertium (may be tested online).

Distributed machine translation

  1. ScaleMT — a free/open-source framework for building scalable machine translation web services (no release yet; browse at the Apertium Subversion repository).

Other useful tools

… that may be used to build machine translation systems

  1. Freeling — a free/open-source suite of language analyzers.
  2. Bitextor — an automatic bitext harvester.
  3. Foma — a finite-state machine toolkit and library.
  4. HFST — Helsinki Finite State Technology for natural-language morphologies.
  5. VISL — CG-3, the constraint grammar parser at the Visual Interactive Syntax Learning project of Syddansk Universitet: browse Subversion repository, source snapshots.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License