Reasoning by case

See also: CBR

To solve the problems of the daily life, we call upon our experiment naturally. We remind ourselves the similar situations already met. Then let us compare we them with the current location to build a new solution which, in its turn, will be added to our experiment.

The reasoning by case or Box Based Reasoning (CBR) , copy this human behavior. It solves the problems by finding similar cases in its base of knowledge and by adapting them to the case considered. This technology appeared there are about fifteen years but initial work on the subject goes back however to the experiments of Schank and Abelson in 1977 at the university of Yale. It however remains ignored still enough compared to other technologies belonging to the field of the cognitive Sciences as the dated mining. It differs from the latter by its approach. Indeed, here, one uses only indirectly the data to find the close cases, from which one will generate a solution.

Stages of the process

A system CBR has a base of case. Each case has a description and a solution. To use this information, an engine is also present. This one will find the cases similar to the problem arising. After analysis, the engine provides an adapted solution which must be validated. Finally the engine adds the problem and its solution in the base of case.

This diagram presents well the principal stages in the process of a system of reasoning per case. From these stages three main issues emerge:

  • the representation of the cases
  • the research of the cases
  • the creation of the function of adaptation

To develop a system of reasoning per case worthy of this name, it is thus necessary to find a solution effective with each one of these problems. The revision and the training are two other problems which rise from the three first.

Representation of the cases

The representation of the cases takes an important place in the realization of a system CBR. Indeed this representation will determine the effectiveness and the speed of the research of the cases in the base. It is thus necessary to choose information to be stored in each case and to find in which form.

Structure of a case

A case is described by many characteristics representing various types of information:

  • the description of the problem
  • the solution and the stages which carried out
  • to it the result of the evaluation
  • the explanation of the failures

All the CBR inevitably do not use each type of information. Of course, the description of the problem and the solution brought are essential elements. Certain characteristics (most discriminating) will be used as index at the time of the search and the addition for case. The indices must be sufficiently concrete and abstracts at the same time so that they relate to a maximum of case and that they are reusable in the future reasoning. They must also make it possible to deduce the cases quickly.

Generally one regards the cases as a list of couples attribute-value. Each couple corresponding to a characteristic. The attributes are typified, here for example the types used in ReMind:

  • Standard traditional: text, entirety, reality, Boolean, date.

  • Standard symbol: it makes it possible to enumerate a list of symbols which will be stored in a tree. The root of the tree will contain the most general symbol and the sheets the most specific symbols.

  • Standard case: it makes it possible to refer cases which are under parts of the case considered.

  • Standard formula: the value of this attribute is the result of the calculation of a formula.

  • Standard list: this type is a list of objects using the preceding types.

Organization of the memory

Then it is necessary to build a model of organization and indexing to connect the cases between them. This model must have certain qualities. First of all it is necessary that the addition ensures accessibility the old cases. The search for similar cases must preserve a constant complexity as the base of case fills. A system of CBR not being interesting that for an important base of case, it is obviously necessary to consider a solution making it possible to find the similar cases quickly. Generally one uses the indexing for this reason.

There exists in many ways to order the cases, we will study the whole of the existing models quickly. We will be delayed on two models in particular:

  • the model with dynamic storage
  • the model containing categories

Simple model

Let us start first of all with the most simplistic model: the linear organization. Of course, this organization is not used to manage the whole of the memory of the cases. However it can be implicitly combined with other more complex models on the level the small ones under - whole of case.

It is possible to organize the memory in the form of a Decision tree: each node corresponds to a question about one of the indices and the wire correspond to the various answers. To be the most effective possible tree must put the questions in the good order and be the least deep possible. This tree must be built dynamically. The best method to build it is to use the Data mining.

Another model consists in building the memory in the form of a hierarchy of prototypes. A prototype makes it possible to describe conditions on characteristics of the cases. All the cases checking these conditions are associated with this prototype. The prototypes are organized in a hierarchy of heritage. One can thus specify general prototypes of which inherit the more specific prototypes. By combining the decision trees with this hierarchy of prototypes, one obtains an interesting structure. The “final” prototypes then do not store any more their cases in a list but in a decision tree. The hierarchy of prototype represents knowledge a priori system and the dynamically generated decision trees allow a rather flexible structure.

Model with dynamic storage

The model with dynamic storage was introduced by Robert Schank and Janet Kolodner. In this model, the cases are stored in a hierarchical structure called generalized episode. One also speaks about MOP for Memory Organization Packets. The various cases having similar properties are gathered in a more general structure, a generalized episode. They contain three objects types:

  • standards: characteristics common to each case indexed under the generalized episode.

  • indices: elements discriminating the cases contained in the generalized episode. An index has two fields: its name and its value. It can point towards another episode or simply towards a case.
  • cases: the knowledge of the system. One thus reaches it via index.

The diagram gives an idea of the model to dynamic storage. It has a structure close to a tree. One finds well the three types of stated objects, with the difference close a distinction is made between the indices and the values. One can also notice that it is possible to reach certain cases in various manners. This model is thus redundant.

The research of the similar cases is carried out starting from the node root. One in common will seek the generalized episode having the most characteristics with the current problem. Then one traverses the indices, which represent the characteristics absent from the standard of the generalized episode on which one works. The couple index-value selected is that which is most similar with the problem. From this one, either one arrives at another generalized episode, in this case, one starts again the process, or one obtains a case similar to the problem arising.

The procedure of addition of new cases functions in a way close to research to similar cases. Indeed graph traverses it is identical. When one found the episode generalized having the most standards in common with the case running, one carries out the addition. For that, it is necessary to generate a couple index-value distinguishing the new case with the others wire from the generalized episode. If there exists already a case having the same couple, one creates a new generalized episode containing these two cases.

One thus obtains a network discriminating using the indices which make it possible to find the cases. The generalized episodes are mainly structures of indexing. The standards make it possible to represent a general knowledge of the subjacent cases whereas the couples index-value define specificities.

However, this process of indexing can lead to an exponential growth of the number of index compared to the number of cases. One thus associates generally some limiting in the choice of the indices even if that involves a fall of performances.

Model containing categories

This model is an alternative to the preceding model. Here, a case is also called example. The guiding idea is that reality should be defined in an extensive way by cases. The characteristics generally described by a name and a value, have a level of importance function of the adhesion of a case to a category.

In this model, the base of case is a network of categories and case. The indices are bonds which can be of three kinds:

  • Of recall: connecting a characteristic to a category or a case.

  • Of example: connecting a category to the cases with which it is associated.
  • Of difference: connecting two cases differing only from one restricted number of characteristics.

The diagram below illustrates the various types of bonds available. However it represents only one category. It thus should be added that the examples can belong to several categories.

The research of the similar cases consists in finding the category which has the characteristics closest to the new problem. When it is found, the prototypic cases are turned over.

Similar search for case

Before the research of the similar cases, it is necessary to study the problem arising. It is necessary to identify its characteristics but also its context if that is possible. So certain information is missing, it is possible to neglect certain characteristics or to question the user. It is during this stage, preamble with research strictly speaking, that system CBR must try to determine and correct the disturbed or incoherent data. For that, one can call upon tools of Datacleaning. It is also possible to try to deduce from the characteristics from others using a model of knowledge. All these operations generally require the approval of the user before passing at the following stage.

Research breaks up into two phases: filtering and selection. During these two stages, one calls upon static and dynamic indices. There exist various ways of determining which characteristics will be selected as index. One can use:

  • All the characteristics

  • Certain characteristics
  • determining characteristics in the past
  • most discriminating characteristics

It is possible to choose the indices at the time of the realization of system CBR, one speaks then about static indices. When they are selected automatically or by the intermediaries of a man-machine interface, they are qualified the dynamic ones. Finally certain CBR give an importance to the various indices.

Filtering

The stage of filtering consists in reducing as a preliminary the number of cases used in research. This stage can be jumped to pass directly to the selection. There exist various algorithms but those are often related to a type of representation of the cases. For example for the representation in MOP, i.e. the representation with dynamic storage, filtering will consist in reducing the whole of case to a MOP close to the problem. One will reach it while descending the indices successively.

Selection

From the whole of cases obtained at the time of the stage of filtering, one will build a new whole of similar cases. For that, one can use the Algorithme of the closest neighbors (Nearest Neighbor) or others Heuristique S which will enable us to measure the similarity between the problem arising and the cases candidates. In fact, one will compare the new case with the others only via the indices. Starting from the similarity on each index, one will obtain the total similarity.

It is necessary thus that each index has a function together measuring the similarity between two values of sound of research. This can pose problem if the cases consist of complex types symbolic systems. But this problem is not specific to the CBR, it is characteristic of research in Analogie. There thus exists often already a method to calculate the similarity for each type of data. Generally the function of calculation turns over a value belonging to an interval.

It is possible to enrich this method of the closest neighbors by the use of heuristic by selection. So that the selection is most optimal, it is not necessary to discover the cases most similar to the problem but rather those which are most useful to its resolution. The heuristic ones must select:

  • the cases which solve part of the goals of the problem

  • the cases which share the most important characteristics
  • the most specific cases
  • the most used cases
  • the most recent cases
  • the cases easiest to adapt

Re-use of case and adaptation

In simple systems CBR, when a similar case was found, one directly re-uses the solution which he proposes for the current problem. In this type of systems, one considers that the similarities are sufficient and that one can neglect the differences between the found case and the problem.

This way of proceeding is nevertheless not very satisfactory. It is rare that one finds a case identical to the problem, it is then often necessary to adapt the preexistent solutions. The adaptation thus consists in building a new solution starting from the problem running and of the found similar cases. This phase stresses the differences between the found cases and the problem and on the useful information to be transferred to the new solution.

There exist two types of adaptation:

  • the transformationnelle adaptation
  • the derivative adaptation

Transformationnelle adaptation

The transformationnelle adaptation consists in directly re-using the solutions of the last cases. This type of adaptation does not take into account the way in which the solutions of the similar cases were generated. One uses laws of adaptation to transform the old solutions. These laws are dependant on the scope of application of system CBR.

Derivative adaptation

One can use this type of adaptation when one lays out for each case stored in the base of the stages of the driving reasoning with the solutions. The derivative adaptation consists in applying the same reasoning to the new problem. During the construction of the new solution, one will privilege the ways taken by the old solutions selected and will avoid the unfruitful ways. However the new case is different, of new under-objectives will be continued.

Other possible adaptations

There exist other types of possible adaptations. One can for example call upon cases of adaptation. That amounts in general considering a system CBR dedicated to the adaptation. It would be specialized towards no field in particular and would contain rather abstract cases of adaptation.

Another approach is to classify the cases in a hierarchy. This approach makes it possible cases to be re-used with a level of possible abstraction highest, which makes them easily applicable to a new situation. To adapt under parts of a solution, the system will refer to the context of the general solution.

Revision

After its generation by the system, the solution of the problem is tested. This stage is generally external with the CBR. According to the field, one can call upon a software of simulation or an expert. Let us not forget that the duration of an evaluation can be very long, in particular in the medical field for the test of treatments. If this evaluation is conclusive, one will retain this new experiment. It is the phase of training which we will study thereafter. However if the solution is not satisfactory, it is necessary to repair it or at least explain the reasons of the failure. It is the phase of revision.

The phase of revision thus initially will try to determine the reasons of the failure. For that, one can try to explain why certain goals were not reached. Collected information will enrich a memory by failure used at the time of the phase of adaptation. Thus during the next generations of solutions, the system will not repeat its errors.

When the system determined the reasons of the failure of the solution, it is possible to try to repair it. This stage of repair can be seen like another function of adaptation. The only difference is that in repair, one works starting from a solution incorrect but adapted to the problem instead of unsuited correct solutions. One will be based on the explanations of the failure to carry out the modifications.

Training

It is the last stage of the cycle of the CBR. During this phase, the new case and its validated solution will be added to the base of case. It is thus necessary to determine which information must be safeguarded and in which form, and how to index this new case in the base.

If the case were solved without the assistance of the preexistent cases, for example using knowledge of an expert, it undoubtedly should be added in the base. On the other hand, if the solution were generated starting from old cases, the procedure is more complex. Indeed it will then not be inevitably necessary to add the new case directly. One can for example generalize the former case, origin of the new solution. In another manner, this new case can be integrated into a category or a generalized episode.

With regard to information to be safeguarded, it is obvious that one must safeguard the characteristics and the solution of the problem. In certain fields, one can all the same make the dead end on the characteristics which are easily deductible or without interest for the problem. It is also possible to record the explanations of the reasoning having led to the solution. This will enable us to use the derivative adaptation as we saw previously. It is also possible to safeguard the failures as we saw in the preceding paragraph. One can add to the base the cases of failures or the reasoning incorrect.

It is then a question of deciding which type of index the system will use to find this case. The majority of the existing software employ the totality of the characteristics. Other methods will traverse the base to find the characteristics most discriminating with the case to be added.

Finally it is necessary to integrate the new case in the base. During this phase, one will modify the indexing of the existing cases so that the system more easily detects the similarities at the time of the research of the similar cases. Practically one will increase the weight of a driving index to a case which made it possible to reach a useful case in the construction of the solution. Contrary, the system will decrease the weight of a driving index to a case which leads to a failure. In the final analysis, certain characteristics are privileged.

It can be interesting at the end of the training to test the system by resting the problem to him which it has just dealt with. Thus one can see whether the system behaves as it is awaited.

Technologies used

Mining dated

The use of the dated mining appears advantageous in the indexing of the data. The indexing is a very important point for the CBR. It is thanks to it that the calculation of similarity is carried out. The indices refer to various criteria and the datamining will make it possible to indicate the most discriminating criteria and thus most representative for construction of the indices.

Moreover, the calculation of the distance between the cases, carried out by the function of similarity intervenes logically between the new case and all the old cases of the base. Thanks to the datamining, classes will be made up for thus directly excluding the cases which do not have any relationship with the current case. The calculation of the distances thanks to the function of similarity will be carried out thus only inside one same class, which will reduce the execution time considerably.

Datacleaning

The Datacleaning also makes it possible to increase the performances of a system CBR. It will make so that the data recorded in the CBR are right and that there is no inaccurate case which could involve completely false reasoning. Nevertheless, it can happen that a CBR has in its base of the contradictory cases but he does not have therefore not being able any more to draw the conclusions, of the contradictory experiments existing in the real life.

Fuzzy logic

The fuzzy Logique is used in the fields where it is difficult to classify objects in a unit or another. To use this technology brings many advantages to systems CBR:

  • a better management of the inaccuracy: with traditional logic, an inaccuracy can result in ruining a reasoning, it on the other hand will be well tolerated if fuzzy logic is used.

  • translation of the linguistic quantifiers in numerical informations usable by the system: if the description of case is textual, there are strong chances so that characteristics are qualified by quantifiers such as “very”, “a little”, “approximately”, etc They will be easily translated by a value between 0 and 1.

  • the management of the continuous and real values facilitated: the indexing and the search for case will be easier to implement.

  • the introduction of the concepts of confidence and relevance: for each stored information, or each characteristic of the new case, one will use the degree of confidence which one has in information and the relevance. For example, if information comes from a not very reliable source, confidence will be weak. In another manner, the relevance will represent the importance of information for the problem.

Certain systems CBR use fuzzy logic as of the storage of the cases. They manage degrees of relevance and confidence for each case. But the principal use of fuzzy logic is made on the level of research of similar cases. There too, the systems use the degrees of confidence and relevance to calculate the similarities. Finally it is also possible to optimize the adaptation using this technique.

Projects containing reasoning by case

  • CHIEF: This software proposes to carry out receipts of kitchen by using the technique of the CBR. The user indicates to the program the food of which it lays out and CHIEF seeks to work out a receipt starting from these ingredients. With this intention, it operates like all it CBR: it has a very large database of " cas" valid receipts and seeks to create, by imitation, a new receipt containing the selected ingredients. CHIEF with the characteristic to take account of the failures and to safeguard them in order to avoid reproducing them. When the constraints imposed by the user are too strong and that no solution was found, CHIEF gives an explanation indicating the reasons of the failure.

  • SWALE : Project SWALE seeks to provide explanations to abnormal situations. It brings in particular explanations on the causes of died of the animals or the men. The program for example will compare the unexpected death of a very known racehorse in full force of the age with died with a cyclist due to an excessive consumption of doping products.

  • TO PERSUADE: TO PERSUADE is management tools of conflicts based on the reasoning by case. It functions on the principle of negotiation/mediation. It is able to provide solutions documented for the solution to problem of group. TO PERSUADE made in kind be able to build a mutually agreed payment between the various actors of the argument.

Random links:Ulysses Simpson Grant | List publications by editors - Dargaud - M | Boulevard (album) | Mohamed Samir | Guy de Kérimel | Alexandre_R._Todd,_baron_Todd