It is important to understand that downbursts are different in many ways from events such as tornadoes and mesocyclones, and therefore different criteria from those used to evaluate the Mesocyclone Detection Algorithm (MDA) and Tornado Detection Algorithm (TDA) must be used to evaluate the DDPDA. By their nature, the precursor signals preceding a short-lived downburst event may only be observed by the DDPDA for one volume scan. Tornadoes, in contrast, are events that may have an observable signature for several volume scans and are therefore more suited to the “time window” scoring methods that have been used to evaluate the performance of the MDA and TDA. It is therefore inappropriate to use “time window” scoring on the DDPDA, as this scoring method cannot provide an accurate portrayal of DDPDA performance.
This can be demonstrated with a simple example. Assume a six-volume-scan (approximately 30-minute) time window encompassing the downburst event, as employed in the evaluation of other operational algorithms. Since downburst precursors may be observable by radar for only one volume scan, the Probability of Detection (POD) may be as low as 0.167 (one detection in six scans) even if all precursors were correctly identified and every downburst event was correctly predicted. This is a very low POD, but under these assumptions it would be a “perfect” score for the algorithm, since it is the maximum that can be achieved. In contrast, a long-lived macroburst event could have long-lived precursors and multiple DDPDA predictions. If DDPDA predictions were issued for all six volume scans in the time window for one of these long-lived events, then the algorithm could miss 5 other downburst events entirely and still have a “perfect” POD of 0.167! Clearly, the performance of the DDPDA differs greatly between the two cases, even though the POD is 0.167 in each. The nature of the downburst event prevents the time-window scoring method from being an effective evaluation tool for the DDPDA. Therefore, there is no rational basis to evaluate the DDPDA using this method.
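The arithmetic in this example can be checked with a short sketch. The function below is a hypothetical simplification, not the scoring code used for the MDA or TDA: it assumes every volume scan inside an event's time window counts as one detection opportunity, and that POD is the fraction of those opportunities with a DDPDA prediction. The event lists are illustrative, not data from the DDPDA database.

```python
def time_window_pod(detections_per_event, window_scans=6):
    """POD under an assumed time-window scheme.

    detections_per_event: number of volume scans (0..window_scans) in each
    event's window on which a prediction was issued.
    """
    hits = sum(detections_per_event)
    opportunities = window_scans * len(detections_per_event)
    return hits / opportunities


def events_caught(detections_per_event):
    """Number of events with at least one prediction in their window."""
    return sum(1 for d in detections_per_event if d > 0)


# Case 1 (hypothetical): six short-lived events, each predicted on the
# single volume scan where its precursor was observable.
short_lived = [1, 1, 1, 1, 1, 1]

# Case 2 (hypothetical): one long-lived event predicted on all six scans,
# five other events missed entirely.
one_long_lived = [6, 0, 0, 0, 0, 0]

pod_short = time_window_pod(short_lived)
pod_long = time_window_pod(one_long_lived)

print(round(pod_short, 3), events_caught(short_lived))
print(round(pod_long, 3), events_caught(one_long_lived))
```

Both cases yield the same POD under this scheme, although the first anticipates all six events and the second only one.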
The scoring methodology
used to evaluate the DDPDA is meant to simulate how a forecaster might
use the algorithm if they were to issue a Severe Thunderstorm Warning
each time the DDPDA issued a downburst prediction. This approach does not
take into account how a forecaster might interpret the DDPDA output, or
whether they would use the output at all, but it is a simple comparison that
is meant to be intuitive to interpret. A brief overview of the evaluation
criteria follows:

The final statistics for the DDPDA development in 1999 are presented
below. A “dependent” prediction equation was developed based on 33
randomly selected downburst events and 442 corresponding “null” events
from the same days. This equation was tested on the 18 remaining downburst
events and 222 “null” cases. The equations developed on the dependent
data set are of the form CTD = D0,
and the coefficients and variables are given in table 6.1.
Table 6.2 shows the DDPDA’s
performance at the end of December 1999. The first row shows the idealized
skill scores for the dependent data set, which is the data set on which the
prediction equations were developed and which includes events across the United
States. The second row shows the skill scores for the independent test on
18 severe downburst events (240 total cells) that were not included in
the development of the prediction equations. The third row shows
the skill scores for the prediction equation run on the entire database
(dependent plus independent data).
Table 6.2: DDPDA performance
for prediction equations developed in December 1999.
Finally, earlier versions of the prediction equations were re-evaluated
on the current database and with current scoring techniques (described
above) in order to determine improvements over the past several years.
Heidke’s skill statistic (HSS) was calculated for each previous version of the
algorithm, as well as for the current version. These scores are based
on the full data set that was used to develop and test the latest version
of the DDPDA. For some earlier years, especially 1998, the assumption
of independence may not be valid, which may artificially inflate the HSS
for those years. Figure 6.3 and table 6.3 both show a history of
DDPDA prediction equations. Note that many of the earlier equations
were developed based on limited data sets. The equation developed
for the Tucson WDSS test is not included in this comparison, as no testing
on independent data was conducted for that case. Intermediate results
from October 1999 that were presented at the 1999 OSF/NSSL User’s Group
Meeting are also listed.
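For readers unfamiliar with the statistic, the sketch below computes Heidke’s skill statistic from a 2x2 contingency table using the standard textbook formula. The function name and the counts are hypothetical; they are not the values behind table 6.3, and this is not the evaluation code used for the DDPDA.

```python
def heidke_skill_score(hits, false_alarms, misses, correct_nulls):
    """Standard 2x2 Heidke skill score.

    HSS = 2(ad - bc) / [(a + c)(c + d) + (a + b)(b + d)]
    with a = hits, b = false alarms, c = misses, d = correct nulls.
    1 is a perfect forecast; 0 is no skill relative to chance.
    """
    a, b, c, d = hits, false_alarms, misses, correct_nulls
    denominator = (a + c) * (c + d) + (a + b) * (b + d)
    if denominator == 0:
        raise ValueError("HSS is undefined for this contingency table")
    return 2.0 * (a * d - b * c) / denominator


# Hypothetical counts, for illustration only.
perfect = heidke_skill_score(hits=10, false_alarms=0, misses=0, correct_nulls=90)
no_skill = heidke_skill_score(hits=1, false_alarms=9, misses=9, correct_nulls=81)

print(perfect, no_skill)
```

Because the HSS measures skill relative to chance, it is less sensitive than POD alone to the large number of “null” cells in a data set.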
Task: Algorithm scoring will be done both using current SCIT performance and using idealized SCIT performance (manual tracking).
None of the variables used to develop downburst prediction equations during 1999 were dependent on correct time associations by SCIT. For instance, Vertically Integrated Liquid (VIL) is not contingent on a correct SCIT time association. Other variables, such as the rate of change of VIL with time, do depend on a correct time association by the SCIT algorithm. However, these rate-of-change variables were not part of the prediction equations developed in 1999, making the comparison between current SCIT tracking and idealized (manual) tracking irrelevant for these equations. These variables may be examined in a future analysis.
Task: NSSL will investigate the use of probabilistic output by the DDPDA, given a sufficient number of cases.
Probabilistic output is available as part of the discriminant analysis procedure that was used to develop downburst prediction equations, as posterior probabilities (used for event classification) can be obtained for each event. A reliability diagram was constructed in order to evaluate the usefulness of these probabilities. Figure 6.4 shows the reliability of posterior probabilities for a discriminant analysis on the full data set. Downburst prediction probabilities were calculated based on downburst predictions occurring 3 volume scans before a severe wind event. Although the probabilities produced by the analysis show some reliability when the probability of severe wind is less than 0.3, the reliability decreases rapidly at higher probabilities along the x-axis.
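A reliability diagram of this kind can be built by binning the forecast probabilities and comparing each bin's mean forecast with the observed event frequency. The sketch below shows one common way to do this. The bin count, function name, and sample values are assumptions for illustration, and this is not necessarily the procedure used to produce Figure 6.4.

```python
def reliability_table(probabilities, outcomes, n_bins=10):
    """Return (mean forecast probability, observed frequency, count)
    for each non-empty probability bin.

    probabilities: forecast probabilities in [0, 1]
    outcomes: 1 if the event occurred, 0 otherwise
    """
    if len(probabilities) != len(outcomes):
        raise ValueError("probabilities and outcomes must have equal length")
    prob_sums = [0.0] * n_bins
    event_counts = [0] * n_bins
    counts = [0] * n_bins
    for p, occurred in zip(probabilities, outcomes):
        # A probability of exactly 1.0 falls in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        prob_sums[idx] += p
        event_counts[idx] += occurred
        counts[idx] += 1
    return [
        (prob_sums[i] / counts[i], event_counts[i] / counts[i], counts[i])
        for i in range(n_bins)
        if counts[i] > 0
    ]


# Hypothetical posterior probabilities and outcomes, for illustration only.
sample_probs = [0.15, 0.15, 0.15, 0.15, 0.85, 0.85, 0.85, 0.85]
sample_outcomes = [0, 0, 0, 1, 1, 1, 1, 0]

table = reliability_table(sample_probs, sample_outcomes)
for mean_p, observed, n in table:
    print(f"forecast {mean_p:.2f}  observed {observed:.2f}  n={n}")
```

For a perfectly reliable forecast the observed frequency equals the mean forecast probability in every bin, so the points fall on the diagonal of the diagram.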
Additional time originally intended to further examine probabilistic output was instead used to prepare for a requested additional end-of-year algorithm review, conducted December 19, 1999.
Figure 6.4: Reliability diagram for posterior probabilities from the
DDPDA discriminant function.