反傳遞代理人網路 -- Back-Propagation Agent Network


Paper download: AgentNet.doc

Chung-Chen Chen

Abstract

The back-propagation neural network has successfully solved many problems in computer science. However, it is not a good model for symbolic problems such as natural language processing (NLP), because a neuron cannot represent a symbol intuitively. In this paper, we propose an agent network based on the back-propagation algorithm that resolves not only the problem of weight adjustment but also the problem of symbolic computation.

Keywords: Agent, Neural Network, Back-Propagation, Natural Language

1. Introduction

The agent has recently become a fashionable idea in computer science, yet it lacks a formal definition. Informally speaking, an agent is an object that perceives its environment through sensors and acts upon that environment through effectors [1]. Several questions arise from this informal definition, concerning the terms "environment", "sensor", and "effector". In this paper, we propose a learning model for a network of agents and define the "environment", "sensor", and "effector" in this model. The learning model is similar to the back-propagation neural network (BPNN), so we call it the back-propagation agent network (BPAN).

The differences between BPAN and BPNN are listed as follows.
(1) All neurons in a BPNN are replaced with agents in a BPAN.
(2) In a BPNN, all inputs and outputs of neurons are numbers; in a BPAN, symbolic inputs and outputs are allowed for each agent.
(3) In a BPNN, the weights of the links are adjusted to approximate the target function; in a BPAN, each agent adjusts its hypothesis to approximate the target function.

2. Background

Although BPAN is similar to BPNN, the idea comes from the LMS (least mean squares) learning approach invented by Widrow and Hoff in 1960 [2]. The LMS approach is a classic method for machine learning and is categorized as a reinforcement learning approach in several books [1][3]. The goal of the LMS approach is to learn a set of weights for a utility function so as to minimize the error E between the training values (Vtrain) and the values predicted by the hypothesis (Vpredict). However, for many problems, such as machine control and chess playing, the program does not get a response from the environment at every step, so no training value is available to it. Widrow and Hoff proposed a smart approach: use the average utility of the successor states as the expected value (Vexpect) when no response is available at the current time. The following formulas show the function Vexpect and the error function.

Vexpect(e) = ( Σ_{s ∈ successor(e)} Vpredict(s) ) / |successor(e)|
E = Σ_e ( Vexpect(e) − Vpredict(e) )²   ------- (1)

Widrow and Hoff proposed an algorithm, shown in Figure 1, to learn weights that minimize this error function.

Algorithm LMS-Learning
Input: a set of training examples {(e1, v1), …, (em, vm)}
Output: a set of weights {w1, …, wn} to approximate the target function f

Initialize the weights randomly.
For each training example (e, v)
    Use the current weights to calculate Vpredict(e)
    For each weight wi, update it as
        wi ← wi + η (Vexpect(e) − Vpredict(e)) xi
End
Figure 1: LMS Learning Algorithm
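
As a concrete illustration of Figure 1, the following Python sketch applies the LMS update rule to a linear value function. The feature extractor, successor function, learning rate, and training data are hypothetical placeholders introduced here for illustration; they are not specified in the original algorithm.

import random

def lms_learning(states, features, successors, num_features, eta=0.1, epochs=50):
    # LMS learning sketch (Figure 1): adjust the weights of a linear
    # value function so that Vpredict(e) approaches Vexpect(e).
    # `features` and `successors` are hypothetical caller-supplied functions.
    w = [random.uniform(-0.5, 0.5) for _ in range(num_features)]  # random init

    def v_predict(e):
        # Linear hypothesis: Vpredict(e) = sum_i w_i * x_i(e)
        return sum(wi * xi for wi, xi in zip(w, features(e)))

    for _ in range(epochs):
        for e in states:
            # Expected value: average predicted value of the successor states,
            # used when no training value is available from the environment.
            succ = successors(e)
            v_expect = sum(v_predict(s) for s in succ) / len(succ)
            # LMS update: w_i <- w_i + eta * (Vexpect(e) - Vpredict(e)) * x_i
            error = v_expect - v_predict(e)
            x = features(e)
            for i in range(num_features):
                w[i] += eta * error * x[i]
    return w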
Several applications have adopted the LMS approach successfully, such as game playing and robot control. For game playing, Arthur Samuel built a successful checkers program by modifying the LMS approach [4][5]. The modified approach is now called temporal difference learning (TD-learning) [6][7], presented with the Q-function [][] in several books [1][3]. A more recent application designed by Tesauro, the TD-Gammon system [8], uses TD-learning and achieves better results than his previously designed Neurogammon system [9], which was based on a neural network. For robot control, Michie and Chambers designed a system called BOXES to solve the cart-pole problem [10].

Algorithm Back-Propagation
Input: N: a multi-layer network
       E: a set of training examples {(e1, v1), …, (em, vm)}
       η: the learning rate
Output: a set of weights {w1, …, wn} to approximate the target function f

For each training example (e, v)
    Propagate the input forward through the network:
    1. Input the instance e to the network and compute the output o(u) for every neuron u in the network
    Propagate the error backward through the network:
    2. For each network output unit k, calculate its error term δk
           δk ← ok (1 − ok)(tk − ok)
    3. For each hidden unit h, calculate its error term δh
           δh ← oh (1 − oh) Σk wkh δk
    4. For each weight wji, update it as wji ← wji + Δwji, where Δwji = η δj xji
End
Figure 2: Back-Propagation Algorithm
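
To make the update rules concrete, here is a minimal Python sketch of Figure 2 for a network with one hidden layer. The sigmoid activation, two-layer topology, absence of bias weights, and random initialization are simplifying assumptions made for illustration, not details fixed by the paper.

import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def backprop(examples, n_in, n_hidden, n_out, eta=0.3, epochs=1000):
    # Back-propagation sketch (Figure 2) for a two-layer feed-forward network.
    # examples: list of (x, t) pairs, where x is the input vector and t the target vector.
    # w_h[h][i]: weight from input i to hidden unit h; w_o[k][h]: hidden h to output k.
    w_h = [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    w_o = [[random.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_out)]

    for _ in range(epochs):
        for x, t in examples:
            # 1. Propagate the input forward and compute the output of every unit.
            o_h = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_h]
            o_o = [sigmoid(sum(w * oh for w, oh in zip(ws, o_h))) for ws in w_o]
            # 2. Error term for each output unit k: d_k = o_k (1 - o_k)(t_k - o_k).
            d_o = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o_o, t)]
            # 3. Error term for each hidden unit h: d_h = o_h (1 - o_h) * sum_k w_kh d_k.
            d_h = [oh * (1 - oh) * sum(w_o[k][h] * d_o[k] for k in range(n_out))
                   for h, oh in enumerate(o_h)]
            # 4. Weight update: w_ji <- w_ji + eta * d_j * x_ji.
            for k in range(n_out):
                for h in range(n_hidden):
                    w_o[k][h] += eta * d_o[k] * o_h[h]
            for h in range(n_hidden):
                for i in range(n_in):
                    w_h[h][i] += eta * d_h[h] * x[i]
    return w_h, w_o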

If we read the algorithms in Figure 1 and Figure 2 carefully, we find that both of them can be abstracted into the algorithm shown in Figure 3.

For each example e
    For each node n in the graph, in order
        Calculate the predicted output of n
    For each node n in reverse order
        Calculate the expected input of n
        Adjust the weights so that the predicted output approaches the expected input
Figure 3: Abstract Back-Propagation Algorithm

3. The Model of Back-Propagation Agent Network

For each example e
    For each agent a in order
        a.predict()
    For each agent a in reverse order
        a.expect()
        a.adjust()
Figure 4: Back-Propagation Algorithm for Agent Network

A BPAN is a network with shared variables. Agents in the network "sense" the values of the variables, adjust their hypotheses, and modify the values of the variables as their "effect". The shared variables in the network may be categorized into two groups: prediction variables and expectation variables. We also define a set of functions for each agent in this model, namely predict(), expect(), and adjust(), as used in Figure 4.
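
A minimal sketch of this model in Python, assuming a concrete interface that the paper does not prescribe, might look as follows. The names Agent, predict(), expect(), and adjust() follow Figure 4, while the dictionaries of shared prediction and expectation variables are our own illustrative choice.

class Agent:
    # One node of a back-propagation agent network (BPAN).
    # An agent "senses" shared variables and writes back to them as its "effect".
    def __init__(self, in_vars, out_var):
        self.in_vars = in_vars    # names of prediction variables this agent reads
        self.out_var = out_var    # name of the prediction variable this agent writes
        self.hypothesis = None    # symbolic or numeric hypothesis, agent-specific

    def predict(self, prediction):
        # Forward pass: read input predictions, write this agent's own prediction.
        raise NotImplementedError

    def expect(self, expectation):
        # Backward pass: read the expectation on the output variable and
        # write expectations for the input variables.
        raise NotImplementedError

    def adjust(self, prediction, expectation):
        # Move the hypothesis so that the prediction approaches the expectation.
        raise NotImplementedError

def bpan_train(agents, examples):
    # Back-propagation loop for an agent network (Figure 4).
    # Each example maps the network's input variables to (possibly symbolic) values.
    for example in examples:
        prediction = dict(example)   # shared prediction variables
        expectation = {}             # shared expectation variables
        for a in agents:             # forward, in topological order
            a.predict(prediction)
        for a in reversed(agents):   # backward, in reverse order
            a.expect(expectation)
            a.adjust(prediction, expectation)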

4. Examples for Back-Propagation Agent Network

4.1 Reinforcement Learning

5. Conclusion and Future Work

References

1. Russell, S. and Norvig, P. (1995). Artificial Intelligence: A Modern Approach. Englewood Cliffs, NJ: Prentice-Hall.
2. Widrow, B. and Hoff, M.E. (1960). Adaptive switching circuits. In 1960 IRE WESCON Convention Record, pages 96-104, New York.
3. Mitchell, T.M. (1997). Machine Learning. McGraw-Hill International Editions.
4. Samuel, A.L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3(3):210-229.
5. Samuel, A.L. (1967). Some studies in machine learning using the game of checkers II - Recent progress. IBM Journal of Research and Development, 11(6):601-617.
6. TD-Learning
7. Watkins, C.J. (1989). Models of Delayed Reinforcement Learning. PhD thesis, Psychology Department, Cambridge University, Cambridge, United Kingdom.
8. Tesauro, G. (1992). Practical issues in temporal difference learning. Machine Learning, 8(3-4):257-277.
9. Tesauro, G. and Sejnowski, T.J. (1989). A parallel network that learns to play backgammon. Artificial Intelligence, 39(3):357-390.
10. Michie, D. and Chambers, R.A. (1968). BOXES: An experiment in adaptive control. In Dale, E. and Michie, D., editors, Machine Intelligence 2, pages 125-133. Elsevier/North-Holland, Amsterdam, London, New York.
11. Minsky, M. and Papert, S. (1969). Perceptrons. Cambridge, MA: MIT Press.
12. Newell, A., Shaw, J.C., and Simon, H.A. (1958). Chess-playing programs and the problem of complexity. IBM Journal of Research and Development, 4(2):320-335. Reprinted in Feigenbaum and Feldman (1963).
13. Genesereth, M.R. and Ketchpel, S.P. (1994). Agent-based software engineering.

