To that end, the fitness of a chromosome is computed from the set of all detections and the set of all truthed cases. For each truth-BWER, we compute the ``validity'' of the match of each detection. Let $d$ be the distance$^{6}$ between the detection of interest and the truth-BWER. Let $D$ be the maximum distance within which we can accept that the detection and the truth-BWER correspond to the same feature. If $c_d$ is the confidence estimate of the detection and $c_t$ the confidence estimate of the truth-BWER (assigned as 0.5 for ``marginal'' and 1.0 for ``strong''), then the validity, $v$, of the match between a detection and the truth-BWER is given by:
| (6) |
| (7) |
| (8) |
Similarly, we cycle through the list of detections to find the false alarms and correct null detections. For every detection, we compute the validity, $v$, of each truth-BWER, given by:
| (9) |
| (10) |
The graduated numbers identified above are really just constructs; they do not represent the real-world skill measure. In weather detection algorithms, a detection can be either a hit or a false alarm. It cannot be both. So, we redo the computation, this time thresholding the BWER detections. Detections with strong endorsements, defined as those with confidences greater than 0.75, are retained. These detections are either hits or false alarms depending on whether there is a truth-BWER within the maximum matching distance of the detection. Similarly, truth-BWERs which have not been matched within that distance by a detection are counted as misses. Let the hits, misses and false alarms obtained by this either-or logic be given by $H_r$, $M_r$ and $F_r$, with the subscript indicating that these numbers are the real measure of skill.
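The either-or scoring described above can be sketched in a few lines of Python. This is only an illustration, not the code used in this work: the function name `score_either_or`, the tuple layout of detections and truths, the use of Euclidean distance, and the treatment of a distance exactly equal to the maximum as a match are all assumptions.

```python
import math

CONF_THRESHOLD = 0.75  # "strong endorsement" cutoff quoted in the text


def score_either_or(detections, truths, max_dist):
    """Count hits, misses and false alarms with either-or logic.

    detections: list of (x, y, confidence) tuples (assumed layout)
    truths:     list of (x, y) tuples for the truth-BWERs (assumed layout)
    max_dist:   maximum distance at which a detection and a truth match
    """
    # Retain only detections with confidence strictly greater than 0.75.
    strong = [(x, y) for x, y, conf in detections if conf > CONF_THRESHOLD]

    def near(p, q):
        # Assumption: plain Euclidean distance, boundary counted as a match.
        return math.hypot(p[0] - q[0], p[1] - q[1]) <= max_dist

    # A retained detection is a hit if any truth lies within max_dist,
    # otherwise it is a false alarm; it is never both.
    hits = sum(1 for d in strong if any(near(d, t) for t in truths))
    false_alarms = len(strong) - hits

    # A truth with no retained detection within max_dist is a miss.
    misses = sum(1 for t in truths if not any(near(d, t) for d in strong))
    return hits, misses, false_alarms
```

With two retained detections and two truths, for example, `score_either_or([(0, 0, 0.9), (10, 10, 0.8), (0.5, 0, 0.6)], [(1, 0), (50, 50)], 2.0)` counts one hit, one miss and one false alarm; the third detection is discarded because its confidence is below the threshold.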
Then, the fitness of any chromosome is given in terms of its success in processing the test cases by:
Because these fitness values, in spite of the graduated measure of skill we provide, tend to lie very close together for randomly chosen chromosomes, we scale the fitness of each chromosome relative to the raw fitness values of the other chromosomes in the generation, using sigma truncation followed by linear scaling (Goldberg 1989). This scaled fitness is used for probabilistic selection.
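A minimal sketch of this scaling step is given below, in the usual textbook form of sigma truncation and linear scaling. It is not the implementation used here. The function names, the truncation constant `c = 2.0`, the scaling multiple `c_mult = 2.0`, and the use of roulette-wheel sampling for the probabilistic selection are assumptions; the values actually used are not stated in the text.

```python
import random
import statistics


def sigma_truncate(raw, c=2.0):
    """Subtract (mean - c * sigma) from each raw fitness, clipping at zero."""
    mean = statistics.fmean(raw)
    sigma = statistics.pstdev(raw)
    return [max(0.0, f - (mean - c * sigma)) for f in raw]


def linear_scale(fit, c_mult=2.0):
    """Map f -> a * f + b so that the average fitness is preserved.

    Normally the best chromosome is scaled to c_mult times the average.
    If that would push the worst chromosome below zero, the worst is
    mapped to zero instead.
    """
    avg, fmax, fmin = statistics.fmean(fit), max(fit), min(fit)
    if fmax == fmin:
        # Degenerate generation: every chromosome gets the same weight.
        return [1.0] * len(fit)
    if fmin > (c_mult * avg - fmax) / (c_mult - 1.0):
        delta = fmax - avg
        a = (c_mult - 1.0) * avg / delta
        b = avg * (fmax - c_mult * avg) / delta
    else:
        delta = avg - fmin
        a = avg / delta
        b = -fmin * avg / delta
    # Clamp to guard against tiny negative values from rounding.
    return [max(0.0, a * f + b) for f in fit]


def select_parents(population, raw_fitness, k, rng=random):
    """Roulette-wheel selection on the scaled fitness (assumed scheme)."""
    scaled = linear_scale(sigma_truncate(raw_fitness))
    return rng.choices(population, weights=scaled, k=k)
```

Raw values such as 0.50, 0.51, 0.52 and 0.60 differ by at most 20 percent, so selection on them directly would be nearly uniform. After scaling, the average is unchanged while the best chromosome carries twice the average weight, which gives the selection step something to work with.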
The fitness measure that we have used could also be used directly to score the resulting algorithm. Therein lies an important advantage of genetic algorithms: the entire analysis is carried out in the space of the original problem. We have not had to deal with gradients or any other attribute that the run-time algorithm doesn't deal with. In most other search and optimization methods, it is necessary to compute (or approximate) such attributes even though they are not part of the run-time algorithm. This is particularly useful here because the BWER algorithm is not easily described as a closed-form function whose partial derivatives could be taken.