Re: Best Performing Models
Posted by Chris in Tampa on 7/26/2013, 8:30 am
I just made an update to how the best performing models map works, and I thought I would post some other info about it too. Dorian had very little model data for the first few runs. The previous method of calculating the best performing models did not have a large sample of model runs in this circumstance during the first five days of a storm, which made it easier for a model to guess right in the first few model runs.

The new method averages up to 2 cases of error data at 12 hours after model data is first available, up to 3 cases at 18 hours, and up to 4 cases at 24 hours or more.

It uses a smaller error interval than before for the first 7.5 days, until there are 120 hours (5 days) worth of track data to look at, while we gradually phase out the oldest model runs. The idea is to use more model runs, and to more quickly rely less on the very first runs, in the best performing models calculation.

An example...

At 180 hours (7.5 days) since model data was first released, this is how we calculate the best performing models:

60 hours after model data is first released...
If a model has a 120 hour forecast position, we compare that forecast with the actual storm position at 180 hours since model data was released, which is 120 hours later.
We get the distance in nautical miles between those two points, and that is the single run model error data for the 120 hour error interval for the model run that was released 60 hours after model data was first released. That is one case of error data.
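
The distance step could look something like this. This is just a minimal sketch in Python, assuming a haversine great-circle distance on a spherical Earth; the site's script may compute the distance differently, and the function name and radius constant are mine:

from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles

def single_run_error_nm(fcst_lat, fcst_lon, actual_lat, actual_lon):
    """Great-circle distance in nautical miles between a model's
    forecast position and the actual storm position, using the
    haversine formula (an assumption about the method)."""
    lat1, lon1, lat2, lon2 = map(
        radians, (fcst_lat, fcst_lon, actual_lat, actual_lon))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_NM * asin(sqrt(a))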

We do that for the previous three runs as well, for a total of up to four cases. If the model did not have a 120 hour forecast position that we can use, then there could be fewer cases, and often will be. If a model only comes out every 12 hours, then there will be up to two cases.

A model's 120 hour forecast 54 hours after model data is first released compared to the storm position 120 hours later at 174 hours.
A model's 120 hour forecast 48 hours after model data is first released compared to the storm position 120 hours later at 168 hours.
A model's 120 hour forecast 42 hours after model data is first released compared to the storm position 120 hours later at 162 hours.

We add those up, divide by the number of cases there were, and get the model's average error for a 120 hour forecast over a 24 hour period. That occurs for every model, and then they are ranked.
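
In code, the averaging and ranking could be as simple as the following sketch. The data structure and the example model IDs are hypothetical; only the average-then-rank logic comes from the description above:

def rank_models(cases_by_model):
    """Average each model's available error cases and rank models
    from smallest to largest mean error.

    cases_by_model maps a model ID to a list of single-run errors
    in nautical miles (a hypothetical structure for illustration).
    """
    averages = {
        model: sum(errors) / len(errors)
        for model, errors in cases_by_model.items()
        if errors  # a model with no usable cases is not considered
    }
    return sorted(averages.items(), key=lambda item: item[1])

# Example: rank_models({"OFCL": [35.0, 42.5, 50.1], "GFSO": [60.2, 55.0]})
# -> [('OFCL', 42.53...), ('GFSO', 57.6)]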

That is the easy explanation of how it works from 7.5 days on.

At the end of this page I cover how it works before 7.5 days since model data was released. An outline looks like this, which is about as simple as it is configured in the script:

0 and 6 hour... BAMM and LBAR displayed.

Then:

Hours since model data was first released, error interval in hours

12, 6
18, 6
24, 6
30, 12
36, 12
42, 18
48, 18
54, 24
60, 30
66, 36
72, 36
78, 42
84, 48
90, 54
96, 54
102, 60
108, 66
114, 72
120, 72
126, 78
132, 84
138, 90
144, 90
150, 96
156, 102
162, 108
168, 108
174, 114
180 or higher, 120
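
Transcribed into code, that outline might be stored as a simple lookup. This is a sketch; the names and the dictionary representation are mine, not the script's actual internals:

# Error interval (in hours) for each number of hours since model data
# was first released, transcribed from the outline above.
ERROR_INTERVALS = {
    12: 6,   18: 6,   24: 6,
    30: 12,  36: 12,
    42: 18,  48: 18,
    54: 24,  60: 30,  66: 36,  72: 36,
    78: 42,  84: 48,  90: 54,  96: 54,
    102: 60, 108: 66, 114: 72, 120: 72,
    126: 78, 132: 84, 138: 90, 144: 90,
    150: 96, 156: 102, 162: 108, 168: 108,
    174: 114,
}

def error_interval(hours_since_release):
    """Return the error interval in hours for a given number of hours
    since model data was first released, or None for hours 0 and 6
    (when only BAMM and LBAR are displayed)."""
    if hours_since_release < 12:
        return None
    if hours_since_release >= 180:
        return 120  # from 7.5 days on, always 120 hours (5 days)
    return ERROR_INTERVALS[hours_since_release]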



-------------



When I last posted about the best performing model map, I don't think I mentioned that some models are excluded, so I thought I would go ahead and cover that.

You can view what models are excluded here:
http://www.hurricanecity.com/models/data.cgi?page=bestperformingmodels

Basically, the GFS ensemble members are excluded. Because there are so many of them, they often appeared in the calculation. The way I understand ensembles to work, each member tweaks something: what if this were true, then this would happen; what if we tweaked this other thing, then that would happen. If all the ensemble members are bunched together, that indicates a greater certainty: even if various conditions change, the storm will likely track in a more predictable way. If some of the ensemble members go to Mexico and the rest spread out all the way to Maine, then we can all be glad we don't work at the NHC and have to make a forecast. That means that if something changes, maybe even something small, the track could change drastically.

Having one of the ensemble members be right in the short term does not tell us too much, as they probably will not differ too significantly, and by their sheer number some would get into the top 5 best performing models. We do allow the GFS Ensemble Mean (AEMN), however, which is an average of the members. The other reason the GFS ensemble members are excluded is that other ensemble members are not there; the CMC ensemble members are not in the model file that the NHC releases, for example. Plus, the GFS influences a lot of other models that are released, so things are already very GFS heavy.

If you want to go really in-depth into which model is performing best, there are other text products you can view, though some are complex to explain. It depends on whether you want to look at how a model is performing recently or over the life of the storm.

Average error for Dorian over the life of the storm (Table | Graphically)

Although I should note that, graphically, you can see the SPC3 model is way off, and in linking to that page I should explain why, as it is one of those models that needs further explanation. I added this about that model to the Google Map:

"Note by our site: This model is a statistical intensity consensus. The track displayed is sometimes very wrong, perhaps due to when some parts of the averaged members are unavailable. (or perhaps another reason, we do not know) Please keep that in mind when choosing whether to display track data from this model rather than intensity data."

It is actually a great intensity model. Right now, it is the best performing intensity model. As for the track, well, it is often way, way off, and I'm not talking slightly. Sometimes it might go backwards. It might be off by a thousand miles for an easy track forecast.

There are also other features for model error, but they get more complex to explain. There is a page here that tries to explain it. My site has a highly experimental feature that tries to explain each square when you hover your mouse over it. (It seems to work every time I use it, but it also makes the HTML of the page very large, so until I work on it more in the future it is not on Jim's site.) For average error over the life of the storm, you can view the average error heat map, with extra info when you hover your mouse over a square, here. (Same page as the chart I linked to before, only with popups when you hover your mouse over a square.)



-------------------------------------------------------------
-------------------------------------------------------------


Below I include an example of how it works before 7.5 days since model data was released. It is long, but I include it in case anyone really, really wants to know.

For comparison, you can see how it used to work here.


-------------------------------------------------------------
-------------------------------------------------------------



0 hours since model data was released...
 BAMM and LBAR are displayed


6 hours since model data was released...
 BAMM and LBAR are displayed


-------------
Change occurs...
We can start displaying model error data.
-------------


12 hours since model data was released...
 6 hour model error interval used, averaged over 2 cases.
   For example:
     6 hour single run model error data (from hour 0, when model data was first released, to hour 6, when the second model run was released) averaged with
     6 hour single run model error data (from hour 6, when model data was first released, to hour 12, when the third model run was released).
     If there is no model error data available for 1 of those cases, then there is just 1 case.
     Some individual models may have 2 cases, some may have 1 case and some will not be considered at all if there is no model error data available for the particular model.


-------------
Change occurs...
Add a third case.
-------------


18 hours since model data was released...
 6 hour model error interval used, averaged over 3 cases.
   For example:
     6 hour single run model error data (from hour 0 to 6) averaged with
     6 hour single run model error data (from hour 6 to 12) averaged with
     6 hour single run model error data (from hour 12 to 18).
     If there is no model error data available for 1 of those cases, then there are just 2 cases.
     If there is no model error data available for 2 of those cases, then there is just 1 case.
     Some individual models may have 3 cases, some may have 2 cases, some may have 1 case and some will not be considered at all if there is no model error data available for the particular model.


-------------
Change occurs...
Add a fourth case.
For all calculations from this point on, there will always be up to 4 cases of model error data averaged for a model. For models that come out every 12 hours, there will only ever be up to two cases.
-------------


24 hours (1 day) since model data was released...
 6 hour model error interval used, averaged over 4 cases.
   For example:
     6 hour single run model error data (from hour 0 to 6) averaged with
     6 hour single run model error data (from hour 6 to 12) averaged with
     6 hour single run model error data (from hour 12 to 18) averaged with
     6 hour single run model error data (from hour 18 to 24).
     If there is no model error data available for 1 of those cases, then there are just 3 cases.
     If there is no model error data available for 2 of those cases, then there are just 2 cases.
     If there is no model error data available for 3 of those cases, then there is just 1 case.
     Some individual models may have 4 cases, some may have 3 cases, some may have 2 cases, some may have 1 case and some will not be considered at all if there is no model error data available for the particular model.


-------------
Change occurs...
Increase model error hour to 12 hours.
-------------


30 hours since model data was released...
 12 hour model error interval used, averaged over 4 cases.
   For example:
     12 hour single run model error data (from hour 0 to 12) averaged with
     12 hour single run model error data (from hour 6 to 18) averaged with
     12 hour single run model error data (from hour 12 to 24) averaged with
     12 hour single run model error data (from hour 18 to 30).


36 hours since model data was released...
 12 hour model error interval used, averaged over 4 cases.
   For example:
     12 hour single run model error data (from hour 6 to 18) averaged with
     12 hour single run model error data (from hour 12 to 24) averaged with
     12 hour single run model error data (from hour 18 to 30) averaged with
     12 hour single run model error data (from hour 24 to 36).


-------------
Change occurs...
Increase model error hour to 18 hours.
-------------


42 hours since model data was released...
 18 hour model error interval used, averaged over 4 cases.
   For example:
     18 hour single run model error data (from hour 6 to 24) averaged with
     18 hour single run model error data (from hour 12 to 30) averaged with
     18 hour single run model error data (from hour 18 to 36) averaged with
     18 hour single run model error data (from hour 24 to 42).


48 hours (2 days) since model data was released...
 18 hour model error interval used, averaged over 4 cases.
   For example:
     18 hour single run model error data (from hour 12 to 30) averaged with
     18 hour single run model error data (from hour 18 to 36) averaged with
     18 hour single run model error data (from hour 24 to 42) averaged with
     18 hour single run model error data (from hour 30 to 48).


-------------
Change occurs...
Increase the model error hour by 6 hours until we have used the third model run (hour 12, which is 12 hours after model data was first released) four times. We have already used it once above.
-------------


54 hours since model data was released...
 24 hour model error interval used, averaged over 4 cases.
   For example:
     24 hour single run model error data (from hour 12 to 36) averaged with
     24 hour single run model error data (from hour 18 to 42) averaged with
     24 hour single run model error data (from hour 24 to 48) averaged with
     24 hour single run model error data (from hour 30 to 54).


60 hours since model data was released...
 30 hour model error interval used, averaged over 4 cases.
   Another example:
     30 hour single run model error data (from hour 12 to 42) averaged with
     30 hour single run model error data (from hour 18 to 48) averaged with
     30 hour single run model error data (from hour 24 to 54) averaged with
     30 hour single run model error data (from hour 30 to 60).


66 hours since model data was released...
 36 hour model error interval used, averaged over 4 cases.
   Another example:
     36 hour single run model error data (from hour 12 to 48) averaged with
     36 hour single run model error data (from hour 18 to 54) averaged with
     36 hour single run model error data (from hour 24 to 60) averaged with
     36 hour single run model error data (from hour 30 to 66).


-------------
Change occurs...
We have used the third model run (hour 12, which is 12 hours after model data was first released) four times. We don't want to keep using the same run constantly as we increase the model error hour, so we do not increase it here. This allows us to use another run, the one from 36 hours after model data was first released.
-------------


72 hours (3 days) since model data was released...
 36 hour model error interval used, averaged over 4 cases.
   Example:
     36 hour single run model error data (from hour 18 to 54) averaged with
     36 hour single run model error data (from hour 24 to 60) averaged with
     36 hour single run model error data (from hour 30 to 66) averaged with
     36 hour single run model error data (from hour 36 to 72).


-------------
Change occurs every 24 hours...
We do what we just did over the past 24 hours: keep increasing the model error hour by six hours until 24 hours have passed, then drop the oldest model run we are using in the calculation and, at that time, do not increase the model error hour. (The general pattern is sketched in code below.)
-------------
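
All of the per-stage windows above and below follow one pattern: at H hours since model data was first released, with error interval I, the up-to-four cases start at hours H - I - 18 through H - I in 6-hour steps. Here is a sketch of that generalization; it is my own reading of the worked examples, not the script's actual code:

def case_windows(hours_since_release, interval, max_cases=4):
    """Return the (start_hour, end_hour) pairs of the cases averaged
    at a given stage. Each window spans 'interval' hours, and the
    windows end 6 hours apart, newest ending at hours_since_release."""
    windows = []
    for k in range(max_cases):
        start = hours_since_release - interval - 6 * k
        if start < 0:
            break  # no model run exists before hour 0
        windows.append((start, start + interval))
    return list(reversed(windows))  # oldest case first

# These reproduce the worked examples in this post:
# case_windows(30, 12)   -> [(0, 12), (6, 18), (12, 24), (18, 30)]
# case_windows(72, 36)   -> [(18, 54), (24, 60), (30, 66), (36, 72)]
# case_windows(180, 120) -> [(42, 162), (48, 168), (54, 174), (60, 180)]
# And only two cases exist at hour 12: case_windows(12, 6) -> [(0, 6), (6, 12)]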


78 hours since model data was released...
 42 hour model error interval used, averaged over 4 cases.
   Example:
     42 hour single run model error data (from hour 18 to 60) averaged with
     42 hour single run model error data (from hour 24 to 66) averaged with
     42 hour single run model error data (from hour 30 to 72) averaged with
     42 hour single run model error data (from hour 36 to 78).


84 hours since model data was released...
 48 hour model error interval used, averaged over 4 cases.
   Another example:
     48 hour single run model error data (from hour 18 to 66) averaged with
     48 hour single run model error data (from hour 24 to 72) averaged with
     48 hour single run model error data (from hour 30 to 78) averaged with
     48 hour single run model error data (from hour 36 to 84).


90 hours since model data was released...
 54 hour model error interval used, averaged over 4 cases.
   Another example:
     54 hour single run model error data (from hour 18 to 72) averaged with
     54 hour single run model error data (from hour 24 to 78) averaged with
     54 hour single run model error data (from hour 30 to 84) averaged with
     54 hour single run model error data (from hour 36 to 90).


-------------
Change at 24 hour interval again...
Drop the oldest model run we are using in the calculation and do not increase the model error hour.
-------------


96 hours (4 days) since model data was released...
 54 hour model error interval used, averaged over 4 cases.


102 hours since model data was released...
 60 hour model error interval used, averaged over 4 cases.


108 hours since model data was released...
 66 hour model error interval used, averaged over 4 cases.


114 hours since model data was released...
 72 hour model error interval used, averaged over 4 cases.


-------------
Change at 24 hour interval again...
Drop the oldest model run we are using in the calculation and do not increase the model error hour.
-------------


120 hours (5 days) since model data was released...
 72 hour model error interval used, averaged over 4 cases.


126 hours since model data was released...
 78 hour model error interval used, averaged over 4 cases.


132 hours since model data was released...
 84 hour model error interval used, averaged over 4 cases.


138 hours since model data was released...
 90 hour model error interval used, averaged over 4 cases.


-------------
Change at 24 hour interval again...
Drop the oldest model run we are using in the calculation and do not increase the model error hour.
-------------


144 hours (6 days) since model data was released...
 90 hour model error interval used, averaged over 4 cases.


150 hours since model data was released...
 96 hour model error interval used, averaged over 4 cases.


156 hours since model data was released...
 102 hour model error interval used, averaged over 4 cases.


162 hours since model data was released...
 108 hour model error interval used, averaged over 4 cases.


-------------
Change at 24 hour interval again...
Drop the oldest model run we are using in the calculation and do not increase the model error hour.
-------------


168 hours (7 days) since model data was released...
 108 hour model error interval used, averaged over 4 cases.


174 hours (7.25 days) since model data was released...
 114 hour model error interval used, averaged over 4 cases.


-------------
Final Change!
We now always use 120 hour (5 day) model error data.
-------------


180 hours (7.5 days) since model data was released...
 120 hour (5 day) model error data displayed, averaged over 4 cases.
   Example:
     120 hour single run model error data (from hour 42 to 162) averaged with
     120 hour single run model error data (from hour 48 to 168) averaged with
     120 hour single run model error data (from hour 54 to 174) averaged with
     120 hour single run model error data (from hour 60 to 180).


186 hours (7.75 days) since model data was released...
 120 hour (5 day) model error data displayed, averaged over 4 cases.
   Another example:
     120 hour single run model error data (from hour 48 to 168) averaged with
     120 hour single run model error data (from hour 54 to 174) averaged with
     120 hour single run model error data (from hour 60 to 180) averaged with
     120 hour single run model error data (from hour 66 to 186).


192 hours (8 days) since model data was released...
 120 hour (5 day) model error data displayed, averaged over 4 cases.
   Yet another example:
     120 hour single run model error data (from hour 54 to 174) averaged with
     120 hour single run model error data (from hour 60 to 180) averaged with
     120 hour single run model error data (from hour 66 to 186) averaged with
     120 hour single run model error data (from hour 72 to 192).


198 hours (8.25 days) since model data was released...
 120 hour (5 day) model error data displayed, averaged over 4 cases.
   And a final example:
     120 hour single run model error data (from hour 60 to 180) averaged with
     120 hour single run model error data (from hour 66 to 186) averaged with
     120 hour single run model error data (from hour 72 to 192) averaged with
     120 hour single run model error data (from hour 78 to 198).