I assume 數據 means factors, not data points. (E.g. "days between last gallop & next race" is a factor; a particular horse having "6 days" is a data point for that factor.)
5 years of data = ~3500 races, or 40,000 horse instances, or fewer than 2000 on average per trainer. So your data-point-to-factor ratio is less than 20:1 on average. With a ratio this low, your system is almost guaranteed to overfit. It may therefore look very good on paper on past races, but it cannot really predict future races.
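To see why a low data-point-to-factor ratio is dangerous, here is a rough sketch in Python. It is only an illustration, not anyone's actual system: every "factor" and every "result" is a coin flip, so no factor has any real predictive power. The function name `simulate` and all the numbers (200 factors, 100 past data points, 10,000 future data points) are made up for the demo, and the ratio is deliberately far worse than 20:1 to make the effect obvious. It simply picks the factor that matches past results best, which is a simplified stand-in for fitting a model.

```python
import random

def simulate(n_factors, n_past, n_future=10_000, seed=0):
    """Pick the noise factor that best matches past results, then score it on future ones."""
    rng = random.Random(seed)

    def make_rows(n):
        # Each row is (outcome bit, list of factor bits). Everything is a
        # coin flip, so by construction no factor can truly predict the outcome.
        return [(rng.getrandbits(1), [rng.getrandbits(1) for _ in range(n_factors)])
                for _ in range(n)]

    def accuracy(rows, j):
        # Fraction of rows where factor j happens to equal the outcome.
        return sum(1 for outcome, factors in rows if factors[j] == outcome) / len(rows)

    past, future = make_rows(n_past), make_rows(n_future)
    # "Building the system": keep whichever factor looked best on past data.
    best = max(range(n_factors), key=lambda j: accuracy(past, j))
    return accuracy(past, best), accuracy(future, best)

# Many factors, few data points: the winner looks good on paper only.
past_acc, future_acc = simulate(n_factors=200, n_past=100)
print(f"many factors, best on past races:   {past_acc:.0%}")
print(f"many factors, same one on future:   {future_acc:.0%}")

# Few factors, many data points: past and future scores agree (both near 50%).
past_few, future_few = simulate(n_factors=2, n_past=5000)
print(f"few factors, best on past races:    {past_few:.0%}")
print(f"few factors, same one on future:    {future_few:.0%}")
```

With many factors and few data points, the best factor scores clearly above 50% on the past data purely by luck, then falls back to about a coin flip on new data. With few factors and many data points, the past score is already an honest estimate of the future score.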
To build a good system, you need as few factors as possible and as many data points as possible. Does that make sense to you?
Author: 贏馬人 Time: 27-2-2010 04:50 AM