godzilla
With some time on my hands and having read some atrocious misuse of statistics lately on these boards, I unfortunately found the motivation to waste an afternoon looking at stats of players for our test team.
Firstly, I'm going to make some very obvious statements about the use of statistics, the only reason being they seem to be under-understood, or wilfully ignored.
1) Statistics don't tell the whole picture, even when used properly. BUT the exceptions are exceptions, not the rule. For example, although Graeme Hick and Mark Ramprakash were monsters in domestic cricket, they could not replicate those achievements in the international arena. Conversely, Sangakkara has an appreciably higher Test average than his first-class batting average. Of course this can happen, but there's a very small number of these examples. By and large, statistics tend to be good indicators of the quality of a player.
2) Statistics are heavily misused. The most egregious error is sample size. For example, let's take batting average. Quite often on PP someone quotes an average taken over five or ten or fifteen games. This is meaningless. The reason is that players exhibit ebbs and flows in form, which might span a season, a ground, or a particular opponent. Consequently, an average over a small number of games risks representing a purple patch or a drop in form rather than the underlying skill of the player. To get around this, the larger the number of data points, the better.
That raises the question: how many data points? There is no easy answer. As a rough rule of thumb, I looked at a bunch of Pakistani players this afternoon: I took a list of the top 30 RATED bowlers and batsmen in FC cricket in Pakistan, and then added a bunch of other names of current interest. (I'm happy to add more if anyone is particularly interested.)
For the rated players, I used an old PCB site which appears to still receive updated data (http://www.pcboard.com.pk/Features/). The site also explains its rating methodology: roughly speaking, it attempts to rank players by averages weighted by recent form, opposition and ground, with a higher rating given to players who perform against stiffer competition, and less credit for performance against weak teams.
Looking at the average number of games (not innings) played by the bowlers and the batsmen in the list, the figures were 68 and 107 respectively. That makes intuitive sense, since one would expect that for the best bowlers youth, speed and strength are roughly as important as experience, whereas for batting, experience is more important than youth. The average number of games, though, gives some indication of how much data we can reasonably look at to gain some confidence in the predictive capability of statistics. So a decent sample size would appear to be in the 70-100 range for FC games played in this case.
(To play around further with sample sizes and confidence intervals (and to read about definitions), the following site is as good as any: http://www.surveysystem.com/sscalc.htm#one )
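To put a rough number on the sample size point, here is a minimal sketch in Python. It is not from the spreadsheet: the spread of individual scores (ASSUMED_SD) is a made-up figure, the helper name margin_of_error is mine, and it uses the textbook normal approximation for a mean, which is only a crude fit for batting scores (they are heavily skewed, and not outs complicate things). It works in innings rather than games. Treat it as an illustration of how slowly the uncertainty shrinks, nothing more.

```python
import math


def margin_of_error(sd: float, n: int, z: float = 1.96) -> float:
    """Approximate half-width of a 95% confidence interval for the mean of n scores.

    sd is the standard deviation of individual scores; z = 1.96 is the usual
    normal-approximation multiplier for 95% confidence.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return z * sd / math.sqrt(n)


# Hypothetical spread of individual innings scores (runs); NOT taken from the PCB data.
ASSUMED_SD = 40.0

for innings in (10, 25, 100, 150):
    moe = margin_of_error(ASSUMED_SD, innings)
    print(f"{innings:>3} innings: average pinned down to roughly +/- {moe:.1f} runs")
```

Because the margin shrinks with the square root of the sample, you need four times the innings to halve the uncertainty, which is why an average over a handful of games says very little.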
3) Bearing in mind 1) and 2) above, it's possible to put together a bunch of players from the domestic scene whose performances make a compelling argument for selection. That doesn't mean that they should definitely be in the side, but if the numbers are outstanding, it means that there has to be a very good reason to keep them out. In addition, if the numbers are outstanding over a large enough sample size, there is a very strong argument to give them a long run in the side to make sure we know whether they will turn out to be Hicks and Ramprakashes or not. A couple of games here and there is a total waste of time.
The results were surprising to me, but may not be for the more astute of you out there. It's been repeated ad nauseam how good Sadaf Hussain's stats have been, for example, but the picture is far more vivid when you compare those statistics to his peers.
I'm not sure how best to attach a spreadsheet here on PP, so I'll attempt to paste a few screenshots below; I don't know if they will be visible.
The lists of players used are:
Bowlers - PCB top 30 rated PLUS Amir, IK, Asif, Rahat, Wahab, Junaid.
Batters - PCB top 30 rated PLUS Hafeez, Shan, Younis, Sarfraz, Rizwan, Asad, Shehzad, Haris, Sohaib, Sami Aslam, Babar.
Conclusions.
a) The current squad is statistically, significantly sub-optimal. It will come as no surprise that there are players who should not be there. Most obvious: Shan, who is third from bottom on the average list, but whose father is on the board, coincidentally. Hafeez scans poorly too, although to be fair to him, he has had a long career, so his average includes his earlier, poorer stats. If one were to look at his last three or four years, his numbers would be much better. But that's another discussion. And Wahab too scans awfully: second worst on average, second worst on strike rate and ninth worst on economy.
b) There is either gross and unforgivable incompetence and corruption at the level of the board, or there is a side to the story that we are missing in the case of some of the more outstanding players.
c) There will be some pushback that some of the better-screening players have been given chances on the international stage and have failed. Given that statistics, when used properly, include a significant number of data points, the first part of that debate ought to be about whether the players concerned were given enough chances. It is patently idiotic, for example, to argue that Alam, who has scored close to 10,000 runs over 126 games, has been found out over two games, rather than assuming that he was in poor form or handicapped by the menace of the chop because of the historically idiotic PCB selection policy.
d) The noteworthy players, based on metrics combined with a large number of data points, are:
Bowling -
Sadaf is hands and feet above everyone else over his 59 games, both on average and strike rate. It is absolutely criminal that he has not been given a strong run in the national side. If speed is an issue, there are any number of the top ten rated bowlers in the world who are not express (Anderson, Broad, Hazlewood). Both that and the rubbish that Wahab and Johnson throw down prove that speed is not the ultimate criterion.
Samiullah is next best on average but is 35 years old. Mir Hamza comes next, but having only played 37 games, arguably still needs to prove himself. Very surprising for me is to see Hammad Azam feature at number four on average and as the 4th best pacer for economy over 64 games. I've noticed that he has been toward the top of the bowling tables in the past few years too. I should add here that I have spoken first hand to both international and domestic players who have seen both Sadaf and Hammad bowl, and most of them were of the opinion that the two are flattered by their statistics on account of the types of balls the PCB uses. They don't rate either Sadaf or Hammad, but the numbers say different.
Also of note is that Ehsan Adil comes up next and screens well on average and SR but, again, only over 36 games. Amir is below all of them (35 games), closely followed by Imran Khan, who is third in the list on SR (73 games). Rahat, Junaid and Zia fall lower than mid-table, surprisingly followed by Asif on average, with Gul and Wahab at the bottom of the list.
Most of the list is pacers, so I'm not sure how well we can compare slow bowlers, but Yasir Shah doesn't come out well on any metric. I guess we will see whether he is our Sangakkara and just performs much better on the international stage, or whether he will revert to type and fall away.
The range of the list is 18-30 on averages, 2.7-4 on economy and 37-53 on strike rate, so there is significant performance difference between the best and worst.
Batting -
Of the fit batsmen (i.e. not Haris) who have played a significant number of games (i.e. over 100), Alam is hands, feet and streets above everyone else; he averages 57 over 125 games versus Younis (51) and Misbah (51) (see point iv below). If we adjust for not outs and treat them as outs, he still tops the table, followed by Younis and Misbah. The next closest rival with more than 100 games is Imran Farhat on the adjusted averages.
SR data is not available for the full list, but of the players for whom it is available, it's interesting to note that Alam is fifth on the list, which is headed by Sharjeel and Shahzaib, both of whom average in the mid 30s and so are not comparable.
Going back to a straight average, the next interesting high performer is Salahuddin (age 27) on 82 games averaging 45, and ukmal is just below him. The only difference between Rizwan and Sarfraz is that Sarfraz has performed the same over 118 games versus Rizwan's 55, so I think it's fair that Sarfraz gets the nod.
Way below the top end of Alam, Haris (only 57 games though), Misbah and Younis, it's interesting to see the rest of the usual names languish in the high 30s to low 40s on average, even below our two keepers: Shehzad, Babar Azam (stats show he is not the second coming as so many here think), Asad Shafiq (showing that so-called technique doesn't seem to matter all that much when it comes to generating runs, which is what wins matches), CLH, Azhar.
The senior Test squad members Shan and Aslam (exceptional but very early-stage List A stats, but because of Haroon Rashid he was selected for Tests instead of ODIs) are towards the bottom of the table, with Hafeez not much better.
The range of averages on the list is 29-57, with only four players averaging over 50, 2 between 45 and 50, 13 between 40 and 45, and 22 below 40 - which just goes to show how far above everyone else the guys at the top are, and how there can be no justification whatsoever for not playing all of them (except injury or retirement).
Summary:
Looking at the tables, Alam and Sadaf are so much better than everyone else that there really needs to be some very vigorous questioning as to why they have been left out. Hammad is another who stands out from the rest of the pack, with Ehsan Adil pretty close too. Amongst the lists, first-team players Wahab and Shan rank amongst the very worst of the top 35 or so domestic performers. They should not play Test cricket for Pakistan again unless they massively improve.
Note:
(i) I'm not familiar enough with the players to know which of the batters in the list are openers/one down rather than middle order, which would make a difference to the analysis of course.
(ii) For the bowlers, I've included average, economy, SR and wickets. The economy rate to my mind is a rough proxy for accuracy, and so is worth looking at. SR, similarly. Wickets is a proxy for number of games: again, the more games and wickets, the more dependable the data for predicting the underlying quality of the player.
(iii) For the batsmen, I've included average and number of games, which are by far the most important stats for FC/Tests. I've added SR where available, just because a high strike rate with a high average would, I think, indicate exceptional talent. I have not included 100s and 50s because in my view they are vastly inferior in importance to average and variance or standard deviation (if I had the data to calculate it). I've added not outs, and the adjusted average is the batting average calculated by counting all not-out scores as dismissals. The %age difference is the difference between the two averages. The reason for including these is to answer the inevitable questions about averages being inflated by not-out scores.
(iv) Although these are all FC stats, used so as to compare the players on a like-for-like basis, that's not strictly true, since for the international players FC stats include TEST innings, which would be against far stronger opposition. So there is a decent argument that the international players should be treated leniently in this analysis, although I would suspect that the difference won't be huge.
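For anyone who wants to check the columns described in the notes above, here is a minimal sketch in Python of how they are usually calculated. The function names and the sample figures in the usage lines are mine and purely hypothetical, not rows from the spreadsheet. I've also assumed the %age difference is taken relative to the conventional average and that overs are six balls; the spreadsheet itself may differ.

```python
def batting_average(runs: int, innings: int, not_outs: int) -> float:
    """Conventional batting average: runs divided by dismissals."""
    dismissals = innings - not_outs
    if dismissals <= 0:
        raise ValueError("batting average is undefined without a dismissal")
    return runs / dismissals


def adjusted_average(runs: int, innings: int) -> float:
    """Adjusted average: every innings counts as a dismissal, not out or otherwise."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return runs / innings


def pct_difference(conventional: float, adjusted: float) -> float:
    """Assumption: percentage by which the adjusted average falls below the conventional one."""
    return (conventional - adjusted) / conventional * 100


def bowling_average(runs_conceded: int, wickets: int) -> float:
    """Runs conceded per wicket taken."""
    return runs_conceded / wickets


def economy_rate(runs_conceded: int, balls: int) -> float:
    """Runs conceded per six-ball over."""
    return runs_conceded * 6 / balls


def bowling_strike_rate(balls: int, wickets: int) -> float:
    """Balls bowled per wicket taken."""
    return balls / wickets


# Hypothetical batsman: 5,000 runs in 120 innings with 20 not outs.
conventional = batting_average(5000, 120, 20)
adjusted = adjusted_average(5000, 120)
print(f"average {conventional:.2f}, adjusted {adjusted:.2f}, "
      f"difference {pct_difference(conventional, adjusted):.1f}%")

# Hypothetical bowler: 100 wickets for 2,000 runs off 4,000 balls.
print(f"average {bowling_average(2000, 100):.2f}, "
      f"economy {economy_rate(2000, 4000):.2f}, "
      f"SR {bowling_strike_rate(4000, 100):.1f}")
```

The adjusted average can only ever be lower than or equal to the conventional one, so the size of the gap shows how much a player's figure leans on not outs.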
Batsman list:
Batsmen sorted by average:
Bowlers list:
Bowlers sorted by Average:
Bowlers sorted by SR: