next up previous
Next: Optimization Parameters Up: The Genetic Algorithm Previous: Chromosome generation


Fitness measure

The number output by the BWER algorithm is a real number in the range [0,1], i.e. it is a confidence estimate with 1.0 signaling that the algorithm is extremely confident that the candidate region is a BWER and 0.0 that it is confident that it is not a BWER. Pragmatically, we drop all candidates which receive confidence estimates of less than 0.3 as having weak endorsements5. The truthed cases do not have such a fine grain of detail - the BWERs identified are either ``strong'' or ``marginal''. Our philosophy in devising a fitness measure for the chromosome is that it should be rewarded more for correctly identifying a strong BWER with a 1.0 estimate than it should for identifying it with 0.4 estimate. Similarly, it should be punished less for creating a false alarm with a 0.4 estimate than it should for creating one with a 1.0 estimate.

To that end, the fitness of a chromosome is computed from the set of all detections and the set of all truthed cases. For each truth-BWER, we compute the ``validity'' of the match of each detection. Let $d$ be the distance6 between the detection of interest and the truth BWER. Let $D$ be the maximum distance within which we can accept that the truth and the BWER correspond to the same feature. If $c$ is the confidence estimate of the detection and $a$ the confidence estimate of the truth-BWER (assigned as 0.5 for ``marginal'' and 1.0 for ``strong'), then validity, $v$, of the match between a detection and the truth-BWER is given by:

\begin{displaymath}
v = \begin{array}{cr}
c*(1-d/D) &~~if~~ (d \leq D) \\
0 &~~if~~ (d > D) \\
\end{array}\end{displaymath} (6)

For each truth-BWER, we find the most valid detection, $v_{max}$. The BWER has been missed to the extent $miss$ given by:
\begin{displaymath}
miss = \begin{array}{cr}
a - v_{max} &~~if~~ ( v_{max} \leq a ) \\
0 &~~if~~ ( v_{max} > a ) \\
\end{array}\end{displaymath} (7)

and been correctly detected to the extent $hit$ given by:
\begin{displaymath}
hit = \begin{array}{cr}
v_{max} &~~if~~ ( v_{max} \leq a ) \\
a &~~if~~ ( v_{max} > a ) \\
\end{array}\end{displaymath} (8)

Unless a truth-BWER is detected with identical scores at the perfect spot or missed completely, each truth-BWER is missed to a certain degree and detected to a certain degree. This gradation of performance measurement provides enough inducement for the GA to pick better and better chromosomes through selecting $x_1$ and $x_2$ values for each of the single-feature fuzzy rules.

Similarly, we cycle through the list of detections to find the false alarms and correct null detections. For every detection, we compute the validity, $v$ of each truth-BWER given by:

\begin{displaymath}
v = \begin{array}{cr}
a*(1-d/D) &~~if~~ (d \leq D) \\
0 &~~if~~ (d > D) \\
\end{array}\end{displaymath} (9)

and find the most valid truth-BWER, $v_{max}$. Each detection is a false alarm, $fa$, to the extent:
\begin{displaymath}
fa = \begin{array}{cr}
c - v_{max} &~~if~~ ( c \geq v_{max} ) \\
0 &~~if~~ ( c < v_{max} ) \\
\end{array}\end{displaymath} (10)

The number of null events correctly ignored is calculated by subtracting the false alarm total from the total number of candidates after the $hit$, $miss$ and $fa$ of the complete set of truth-BWERs and detections have been computed.

The numbers we have identified as $hit$, $miss$ and $fa$ are really just constructs - they do not represent the real-world skill measure. In weather detection algorithms, a detection can either be a hit or a false alarm. It cannot be both. So, we redo the computation, this time thresholding the BWER detections. Detections with strong endorsements, defined as those with confidences greater than 0.75, are retained. These detections are either hits or false alarms depending on whether there is a truth-BWER within a distance $D$ of the detection. Similarly, truth-BWERs which have not been matched within a distance $D$ by a detection are counted as misses. Let the hits, misses and false alarms obtained by this either-or logic be given by $hit_r$, $miss_r$ and $fa_r$, with the subscript denoting that these numbers denote the real measure of skill.

Then, the fitness of any chromosome is given in terms of its success in processing the test cases by:

\begin{displaymath}
fitness = \frac{0.7*hit}{hit + miss + fa} +
\frac{0.3*hit_r}{hit_r + miss_r + fa_r}
\end{displaymath} (11)

i.e. we provide the GA with a graduated measure of skill to allow it to mutate toward better solutions but also provide, albeit with a lower weight, the real measure of skill. Of course, the measure being used here is a weighted average of two Critical Success Index (CSI) scores Donaldson et al. (1975).

Because these fitness values, in spite of the graduated measure of skill we provide, tend to lie very close together for randomly chosen chromosomes, we scale the fitness of a chromosome based on the raw fitness values of the other chromosomes in the generation using sigma truncation followed by linear scaling Goldberg (1989). This scaled fitness is used for probabilistic selection.

The fitness measure that we have used could be used directly to score the resulting algorithm. Therein lies an important advantage of genetic algorithms. The entire analysis is carried out in the space of the original problem. We have not had to deal with gradients or any attribute that the run-time algorithm doesn't deal with. In most other search and optimization methods, it is necessary to compute (or approximate) such attributes that are not part of the run-time algorithm. This is particularly useful because it is not easy to describe the BWER algorithm as a closed-form function, so as to be able to take partial derivatives.


next up previous
Next: Optimization Parameters Up: The Genetic Algorithm Previous: Chromosome generation
Lakshman : lakshman@nssl.noaa.gov