A partially observable Markov decision process (POMDP) is a stochastic model drawing on decision theory and probability theory. Models of this family are used, among other things, in artificial intelligence to control complex systems such as intelligent agents.

This model is derived from Markov decision processes (MDPs). The difference is that, in a POMDP, the uncertainty is twofold. Not only is the effect of the actions taken uncertain, but the agent also has only indirect clues about the state it is in, and therefore about which decision to make. These clues are called observations, and in this sense POMDPs are a particular kind of hidden Markov model (HMM), extended with probabilistic actions.

Formal definition

A POMDP is a tuple $\{S, A, T, R, \Omega, O\}$, where:

  • $S = \{s_0, \cdots, s_{|S|-1}\}$ is the finite discrete set of possible states of the system to be controlled (these are the hidden states of the process).

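  • $A = \{a_0, \cdots, a_{|A|-1}\}$ is the finite discrete set of actions that can be taken to control the system.
  • $T$ is the transition function, which gives the probability that the system moves from one state to another when a given action is taken.
  • $R$ is the reward function, which assigns a real-valued reward.
  • $\Omega$ is a set of symbols that can be observed.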
  • $O: S \times \Omega \to [0, 1]$ is the observation function, which associates with each state $s \in S$ the probability $p(o \mid s) = O(s, o)$ of observing a given symbol $o \in \Omega$.
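
As an illustration, the tuple can be written down as a small data structure. The following Python sketch is only one possible encoding: the class name, the dictionary layout, the two-state example and the choice of a reward that depends on the state alone are all assumptions made for this illustration, not part of the definition above.

```python
from dataclasses import dataclass


@dataclass
class POMDP:
    """Container for the tuple {S, A, T, R, Omega, O} (hypothetical layout)."""
    states: list        # S, the hidden states
    actions: list       # A, the control actions
    T: dict             # T[(s, a)][s2] = probability of reaching s2 from s under a
    R: dict             # R[s] = reward in state s (assumed to depend on the state only)
    observations: list  # Omega, the observable symbols
    O: dict             # O[s][o] = p(o | s)

    def check(self):
        """Return True if every row of T and O sums to 1."""
        rows = [self.T[(s, a)] for s in self.states for a in self.actions]
        rows += [self.O[s] for s in self.states]
        return all(abs(sum(row.values()) - 1.0) < 1e-9 for row in rows)


# Hypothetical two-state system, used only to show the layout.
model = POMDP(
    states=["left", "right"],
    actions=["stay", "move"],
    T={
        ("left", "stay"): {"left": 0.9, "right": 0.1},
        ("left", "move"): {"left": 0.2, "right": 0.8},
        ("right", "stay"): {"left": 0.1, "right": 0.9},
        ("right", "move"): {"left": 0.8, "right": 0.2},
    },
    R={"left": 0.0, "right": 1.0},
    observations=["dark", "bright"],
    O={
        "left": {"dark": 0.7, "bright": 0.3},
        "right": {"dark": 0.4, "bright": 0.6},
    },
)

is_valid = model.check()
```

Storing the probabilities in nested dictionaries keeps the example readable; for larger models, arrays indexed by state and action would be a more usual choice.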

Note: There are variants in which the rewards can depend on the actions or on the observations. The observations can also depend on the actions carried out.

Approaches

There are two main types of approach to solving a POMDP problem.

  • One can try to determine as reliably as possible the state the system is in, by keeping up to date a probability distribution over the states, called the belief state.
  • One can work directly on the observations in $\Omega$ without considering the hidden state. This raises difficulties, because similar observations can be obtained in different states (for example, with only a local view of the junctions in a maze, one may well encounter two different T-shaped junctions that look identical). One possible way to distinguish such observations is to keep a memory of the observations encountered in the past (note that in this case the Markov property is lost).
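
The first approach, maintaining a belief state, is commonly done with a Bayesian update after each action and observation. The sketch below assumes that the observation is emitted by the state reached after the action, so that the new belief is proportional to O(s', o) multiplied by the sum over s of T(s, a, s') b(s). The function name, the dictionary layout and the two-state example are hypothetical and serve only as an illustration.

```python
def update_belief(belief, action, observation, T, O):
    """Return the new belief after taking `action` and seeing `observation`.

    belief: dict mapping each state s to b(s)
    T:      dict with T[(s, a)][s2] = probability of reaching s2 from s under a
    O:      dict with O[s2][o] = p(o | s2)
    """
    states = list(belief)
    unnormalized = {}
    for s2 in states:
        # Prediction step: probability of being in s2 after the action.
        predicted = sum(T[(s, action)].get(s2, 0.0) * belief[s] for s in states)
        # Correction step: weight by the likelihood of the observation in s2.
        unnormalized[s2] = O[s2].get(observation, 0.0) * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {s2: p / total for s2, p in unnormalized.items()}


# Hypothetical two-state example.
T = {
    ("left", "stay"): {"left": 0.9, "right": 0.1},
    ("left", "move"): {"left": 0.2, "right": 0.8},
    ("right", "stay"): {"left": 0.1, "right": 0.9},
    ("right", "move"): {"left": 0.8, "right": 0.2},
}
O = {
    "left": {"dark": 0.7, "bright": 0.3},
    "right": {"dark": 0.4, "bright": 0.6},
}

belief = {"left": 0.5, "right": 0.5}
new_belief = update_belief(belief, "move", "bright", T, O)
```

Starting from a uniform belief, observing "bright" shifts probability mass towards "right", the state in which that symbol is more likely. The belief summarizes the whole history of actions and observations, which is why this approach keeps the Markov property that the memory-based approach gives up.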

Related articles

  • Markov decision process (MDP), from which POMDPs derive their decision-making aspect,

  • Hidden Markov model, from which POMDPs derive their partial observability aspect,
  • Stochastic process

References

  1. Kaelbling L.P., Littman M.L., Cassandra A.R., "Planning and Acting in Partially Observable Stochastic Domains", Artificial Intelligence, vol. 101, no. 1-2, pp. 99-134, 1998.
  2. McCallum A.K., "Reinforcement learning with selective perception and hidden state", PhD thesis, University of Rochester, Computer Science Dept., 1996.

External links

  • Tony's POMDP Page, a resource page by Anthony R. Cassandra

  • POMDP information page, a resource page by Michael L. Littman
