Final Report for Task 6: DDPDA

Task: “NSSL will inspect at least an additional 30 severe high-reflectivity downbursts to be used in prediction equation development and testing/scoring of the algorithm.”

a) Overview

The first part of 1999 was primarily spent on modifications to the ground truth software.  This scoring software was partially modified to automatically take into account the cells that were identified by an enhanced SCIT algorithm, and allow them to be filtered by population density and range from the radar.  The scoring methodology was slightly altered such that a greater number of cells could be confidently identified and tracked by increasing the lowest SCIT reflectivity threshold to 40 dBZ. Each cell was then examined to ensure correct identification and tracking, and any errors were corrected.  This allowed cases to be more quickly assimilated into the database, with a larger total number of cells in the sample. The new method also eliminated most user bias, user errors and tediousness in tracking non-event cells.
During the year, a total of 51 high-reflectivity downburst events that occurred within 80 km of a WSR-88D were either assessed for the first time or re-evaluated based on the modified scoring methodology.  In addition, 664 cells that did not produce severe downburst events were also analyzed, for a total of 715 cells in the database.  A total of 35 severe downburst events were added during 1999.  The other 16 cases in the database were from data sets that had been analyzed in previous years.  A number of events that had been part of the database in previous years were dropped from the database, as they were more than 80 km from a WSR-88D site.  The specifics for the included cases are documented in Task 12.

b) Tucson WDSS Proof-of-Concept test

Two new sets of downburst prediction equations were developed based on these data.  The first set, developed in July, was based on Arizona data only, and was implemented in the Tucson WDSS Proof-of-Concept test.  Although no quantitative evaluation of the DDPDA was possible due to poor ground truth information, the algorithm was well received by forecasters and was used in warning operations.  Seven out of eight forecasters who participated in a survey at the end of the Proof-of-Concept test thought that the DDPDA was a valuable aid in the warning decision process, and listed the DDPDA time-height trend output among the top five displays in the WDSS.
It was not possible to quantitatively evaluate the performance of the DDPDA in Tucson because adequate ground truth information was lacking.  There were several reasons for this:

 
We suggest that limitations such as these be considered when selecting future WDSS proof-of-concept test sites.  NSSL personnel conducted damage surveys of the events that were reachable via the road network, as reported in Deliverable D9.1.3 (Tucson Proof-of-Concept Test Final Report), and ensured that all available spotters were contacted during warning operations.
Figure 6.1: Population density of southeastern Arizona.  The 80-km range ring shows the effective predictive range of the DDPDA.  Major roads are highlighted in blue and purple.

c) Prediction equations and DDPDA scoring

A second set of prediction equations was developed and evaluated in November.  These equations were developed on a subset of 33 downburst events from across the nation, and independently tested on 18 downburst events.  Due to the nature of these events, the “time window” scoring methodology that has been used to evaluate other algorithms (such as TDA and MDA) is scientifically inappropriate for evaluating DDPDA performance, although it may be appropriate for other algorithms.

It is important to understand that downbursts are different in many ways from events such as tornadoes and mesocyclones, and therefore different criteria from those used to evaluate the Mesocyclone Detection Algorithm (MDA) and Tornado Detection Algorithm (TDA) must be used to evaluate the DDPDA.  By their nature, the precursor signals preceding a short-lived downburst event may only be observed by the DDPDA for one volume scan.  Tornadoes, in contrast, are events that may have an observable signature for several volume scans and are therefore more suited to the “time window” scoring methods that have been used to evaluate the performance of the MDA and TDA.  It is therefore inappropriate to use “time window” scoring on the DDPDA, as this scoring method cannot provide an accurate portrayal of DDPDA performance.

Figure 6.2: Terrain elevation within 230 km of the KEMX WSR-88D.  Areas of beam blockage for the 0.5-degree elevation angle are shown in black.  Range rings are spaced at 50 km.  The elevation of the KEMX radar is 1616 m.

This can be demonstrated with a simple example.  Assume a six-volume-scan (approximately 30-minute) time window encompassing the downburst event, as employed in the evaluation of other operational algorithms.  Since downburst precursors may be observable by radar for only one volume scan, the Probability of Detection (POD) may be as low as 1/6 = 0.167 even if all precursors were correctly identified and every downburst event was correctly predicted.  This is a very low POD, but under these assumptions it would be a “perfect” score for the algorithm, since it is the maximum that can be achieved.  In contrast, a long-lived macroburst event could have long-lived precursors and multiple DDPDA predictions.  If DDPDA predictions were issued for all six volume scans in the time window of one such long-lived event, the algorithm could miss five other downburst events entirely and still have a “perfect” POD of 6/36 = 0.167!  Clearly, the performance of the DDPDA is very different in the two cases, even though the POD is 0.167 in each.  The nature of the downburst event prevents the time-window scoring method from being an effective evaluation tool for the DDPDA.  Therefore, there is no rational basis to evaluate the DDPDA using this method.
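The arithmetic in this example can be checked with a short sketch.  It is only an illustration under stated assumptions: the function name and the two hypothetical cases are not from the report, and the event-level fraction printed at the end is a simple contrast, not the DDPDA scoring methodology itself.

```python
def time_window_pod(predicted_scans_per_event, window_scans=6):
    """POD when every volume scan in each event's time window is scored.

    predicted_scans_per_event: for each downburst event, the number of
    volume scans in its window on which a prediction was issued.
    (Hypothetical helper, written only for this illustration.)
    """
    hits = sum(predicted_scans_per_event)
    possible = window_scans * len(predicted_scans_per_event)
    return hits / possible


def fraction_of_events_predicted(predicted_scans_per_event):
    """Fraction of events with at least one prediction (illustrative contrast)."""
    predicted = sum(1 for n in predicted_scans_per_event if n > 0)
    return predicted / len(predicted_scans_per_event)


# Case A: six short-lived events, each predicted on its single precursor scan.
case_a = [1, 1, 1, 1, 1, 1]
# Case B: one long-lived event predicted on all six scans, five events missed.
case_b = [6, 0, 0, 0, 0, 0]

pod_a = time_window_pod(case_a)
pod_b = time_window_pod(case_b)
frac_a = fraction_of_events_predicted(case_a)
frac_b = fraction_of_events_predicted(case_b)

print(f"Case A: time-window POD = {pod_a:.3f}, events predicted = {frac_a:.3f}")
print(f"Case B: time-window POD = {pod_b:.3f}, events predicted = {frac_b:.3f}")
```

Both cases give the same time-window POD, while the fraction of events that received any prediction differs sharply, which is the distinction the time-window method cannot express.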

The scoring methodology used to evaluate the DDPDA is meant to simulate how a forecaster might use the algorithm, if s/he were to issue a Severe Thunderstorm Warning each time the DDPDA issued a downburst prediction.  This does not take into account how a forecaster might interpret the DDPDA output, and whether or not they would use the output, but is a simple comparison that is meant to be intuitive to interpret.  A brief overview of the evaluation criteria follows:
 

D     VIL     Carea2   dBZ8km   D1km     Cdpth   C95%     Cmean    SHI    dBZmax   Carea3   vol     D0
C0    -3.45   -73.33   0.146    832.0    1.633   1199.5   1257.4   0.16   9.909    -11.85   0.142   -239.9
C1    -3.39   -51.25   0.194    1018.8   1.236   800.4    1468.8   0.17   10.013   -14.30   0.147   -253.3

Table 6.1: Coefficients for the final DDPDA downburst prediction equation developed in 1999.  The variables are as follows: VIL (Vertically Integrated Liquid), Carea2 (maximum area of convergence exceeding 0.005 s^-1), dBZ8km (maximum reflectivity above 8 km ARL), D1km (maximum divergence below 1 km ARL), Cdpth (depth of convergence exceeding 10 m s^-1), C95% (95th percentile of convergence from 1-6 km ARL), Cmean (mean convergence), SHI (Severe Hail Index from HDA), dBZmax (maximum reflectivity), Carea3 (maximum area of convergence exceeding 0.001 s^-1), vol (cell volume).


The final statistics for the DDPDA development in 1999 are presented below.  A “dependent” prediction equation was developed based on 33 randomly selected downburst events and 442 corresponding “null” events from the same days. This equation was tested on the 18 remaining downburst events and 222 “null” cases.  The equations developed on the dependent data set are of the form CTD = D0, and the coefficients and variables are given in table 6.1.

Table 6.2 shows the DDPDA’s performance at the end of December 1999.  The first row shows the idealized skill scores for the dependent data set, which is the data set on which the prediction equations were developed and which includes events from across the United States.  The second row shows the skill scores for the independent test on 18 severe downburst events (240 total cells) that were not included in the development of the prediction equations.  The third row shows the skill scores for the prediction equation run on the entire database (dependent plus independent data).
 
 
 
                            H     M     FA    CN     POD    FAR    CSI    HSS    Lead   #Cells
Dependent                   21    12    29    413    0.64   0.58   0.34   0.52   5.0    475
Independent                 11    7     22    200    0.61   0.67   0.28   0.40   5.0    240
Full, based on dependent    32    19    51    613    0.63   0.61   0.31   0.47   5.0    715

H = Hits; M = Misses; FA = False Alarms; CN = Correct Nulls; POD = Probability of Detection; FAR = False Alarm Ratio; CSI = Critical Success Index; HSS = Heidke’s Skill Statistic; Lead = Lead Time, in minutes.

Table 6.2: DDPDA performance for prediction equations developed in December 1999.
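
As a cross-check, the POD, FAR, and CSI columns of Table 6.2 can be recomputed from the hit, miss, and false alarm counts.  The sketch below assumes the standard contingency-table definitions of these three scores; the function name is hypothetical.  HSS is left out because the report does not state the exact formulation used.

```python
def skill_scores(hits, misses, false_alarms):
    """Return (POD, FAR, CSI) using the standard contingency-table definitions."""
    pod = hits / (hits + misses)                  # Probability of Detection
    far = false_alarms / (hits + false_alarms)    # False Alarm Ratio
    csi = hits / (hits + misses + false_alarms)   # Critical Success Index
    return pod, far, csi


# (H, M, FA) counts copied from Table 6.2.
table_6_2 = {
    "Dependent": (21, 12, 29),
    "Independent": (11, 7, 22),
    "Full": (32, 19, 51),
}

scores = {name: skill_scores(*counts) for name, counts in table_6_2.items()}

for name, (pod, far, csi) in scores.items():
    print(f"{name:12s} POD={pod:.2f}  FAR={far:.2f}  CSI={csi:.2f}")
```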
 
Figure 6.3: Heidke’s Skill Statistic for DDPDA, 1994-1999

Finally, earlier versions of the prediction equations were re-evaluated on the current database and with current scoring techniques (described above) in order to determine improvements over the past several years. Heidke’s skill statistic was calculated for each previous version of the algorithm, as well as the current version.  These scores are based on the full data set that was used to develop and test the latest version of the DDPDA.  For some earlier years, especially 1998, the assumption of independence may not be valid, which may artificially inflate the HSS for those years.  Figure 6.3 and table 6.3 both show a history of DDPDA prediction equations.  Note that many of the earlier equations were developed based on limited data sets.  The equation developed for the Tucson WDSS test is not included in this comparison, as no testing on independent data was conducted for that case.  Intermediate results from October 1999 that were presented at the 1999 OSF/NSSL User’s Group Meeting are also listed.
 
Year          Predictions based on:    Data set (events at < 80 km)    Heidke’s Skill Statistic
1994          Linear least squares     5                               0.07
1996          Fuzzy logic              10                              0.00
1997          Fuzzy logic              18                              0.04
1998          Discriminant Analysis    23                              0.22
1999 (Oct)    Discriminant Analysis    51                              0.2
1999 (Dec)    Discriminant Analysis    51                              0.4

Table 6.3: Historical DDPDA prediction methods, events, and performance.

Task: Algorithm scoring will be done both using current SCIT performance and using idealized SCIT performance (manual tracking).

None of the variables used to develop downburst prediction equations during 1999 were dependent on correct time associations by SCIT.  For instance, Vertically Integrated Liquid (VIL) is not contingent on a correct SCIT time association.  Other variables, such as the rate of change of VIL with time, do depend on a correct time association by the SCIT algorithm.  However, these rate-of-change variables were not part of the prediction equations developed in 1999, making the comparison between current SCIT tracking and idealized (manual) tracking irrelevant for these equations.  These variables may be examined in a future analysis.

Task: NSSL will investigate the use of probabilistic output by the DDPDA, given a sufficient number of cases.

Probabilistic output is available as part of the discriminant analysis procedure that was used to develop downburst prediction equations, as posterior probabilities (used for event classification) can be obtained for each event.  A reliability diagram was constructed in order to evaluate the usefulness of these probabilities.  Figure 6.4 shows the reliability of posterior probabilities for a discriminant analysis on the full data set.  Downburst prediction probabilities were calculated based on downburst predictions occurring 3 volume scans before a severe wind event.  Although the probabilities produced by the analysis show some reliability when the probability of severe wind is less than 0.3, the reliability decreases rapidly at higher forecast probabilities.
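
For readers unfamiliar with reliability diagrams, the following sketch shows one common way to compute the plotted points: forecast probabilities are grouped into bins, and the mean forecast probability in each bin is compared with the observed event frequency.  The function name, the bin count, and the sample values are hypothetical and are not DDPDA data; the report does not describe the binning that was actually used.

```python
def reliability_curve(probs, outcomes, n_bins=10):
    """Return (mean forecast probability, observed frequency, count) per non-empty bin.

    probs:    forecast probabilities in [0, 1]
    outcomes: 1 if the event occurred, 0 otherwise
    """
    prob_sums = [0.0] * n_bins
    event_counts = [0] * n_bins
    counts = [0] * n_bins
    for p, o in zip(probs, outcomes):
        b = min(int(p * n_bins), n_bins - 1)  # a probability of 1.0 goes in the last bin
        prob_sums[b] += p
        event_counts[b] += o
        counts[b] += 1
    return [
        (prob_sums[b] / counts[b], event_counts[b] / counts[b], counts[b])
        for b in range(n_bins)
        if counts[b] > 0
    ]


# Hypothetical posterior probabilities and outcomes, for illustration only.
sample_probs = [0.05, 0.08, 0.15, 0.25, 0.28, 0.62, 0.68, 0.91]
sample_outcomes = [0, 0, 0, 0, 1, 0, 1, 0]

curve = reliability_curve(sample_probs, sample_outcomes, n_bins=5)
for mean_prob, obs_freq, n in curve:
    print(f"forecast={mean_prob:.2f}  observed={obs_freq:.2f}  n={n}")
```

A perfectly reliable forecast would place every point on the diagonal, where the observed frequency equals the mean forecast probability.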

Additional time originally intended to further examine probabilistic output was instead used to prepare for a requested additional end-of-year algorithm review, conducted December 19, 1999.


Figure 6.4: Reliability diagram for posterior probabilities from the DDPDA discriminant function.