BBO Discussion Forums: Rating Players - BBO Discussion Forums


Rating Players Basic theory

#61 hrothgar

  • Group: Advanced Members
  • Posts: 15,495
  • Joined: 2003-February-13
  • Gender:Male
  • Location:Natick, MA
  • Interests:Travel
    Cooking
    Brewing
    Hiking

Posted 2010-October-29, 17:53

In an amusing development, I'm attending a lecture tomorrow on Bayesian methods by Mark Glickman (he of the famed Glicko rating system).
Alderaan delenda est
0

#62 dlbalt

  • Group: Members
  • Posts: 23
  • Joined: 2007-January-10

Posted 2011-March-12, 13:55

Competitive Chess has had an effective and efficient rating system for decades. It's worth a look for anyone who seriously wants to generate a rating system for Bridge.

If my memory is working, this is how it worked 25 years ago:

1) Ratings run from a low of about 1200 (beginning players) to roughly 2800 (National Masters). Above that, a player's FIDE (International) rating is more important, although it is calculated in a similar fashion.

2) Each recorded game contributes to a player's rating, except as noted below. Typically, a tournament player will play 3 or 4 games per day in a local tournament.

3) A player's rating increases by 16 points for each win and decreases by 16 points for each loss, plus or minus 4% of the difference in the players' ratings, up to a difference of 400 points. So, if you beat a player who is rated 400 points lower than you are, the effect on your rating is (16 - (.04 * 400)) = 0. Similarly, if you lose to a player who is rated 400 points higher than you, you don't lose any rating points. If you beat a player who is rated 200 points higher, the effect is (16 + (.04 * 200)) = 24.

4) As a corollary to (3), players never lose rating points as a result of winning a game, and never gain points by losing a game. A player can never lose more than 32 points or gain more than 32 points as a result of a single game.

5) Ratings are provisional until a player has some number of rated games - 24 I think.
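
A minimal sketch of the rule in points 3 and 4, assuming the 16-point base, the 4% slope and the 400-point cap exactly as recalled above. The function name and parameters are made up for illustration, and draws are left out because the post does not say how they are handled.

```python
def rating_change(own, opponent, won, base=16.0, slope=0.04, cap=400.0):
    """Rating change for one game under the linear rule in points 3 and 4.

    own, opponent: ratings before the game.
    won: True for a win, False for a loss (draws are not covered here).
    """
    # Positive when the opponent is rated higher; clamped to +/- cap.
    diff = max(-cap, min(cap, opponent - own))
    # A win starts at +base, a loss at -base; the rating gap shifts both.
    return (base if won else -base) + slope * diff


if __name__ == "__main__":
    print(rating_change(1800, 1400, won=True))   # beating a player 400 lower
    print(rating_change(1400, 1800, won=False))  # losing to a player 400 higher
    print(rating_change(1600, 1800, won=True))   # beating a player 200 higher
```

With the cap in place, the change for a single game always stays between -32 and +32, which is the corollary stated in point 4.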

There are some grumblings about the chess rating system (of course, since there are grumblings about everything), but it has worked well. Bridge players who want a rating system should take a look at it.
0

#63 dlbalt

  • Group: Members
  • Posts: 23
  • Joined: 2007-January-10

Posted 2011-March-12, 14:07

matmat, on 2010-July-26, 02:26, said:


Excluding pairs that play and practice together (THE VAST VAST VAST VAST VAST VAST VAST VAST (that's a lot of VASTs) majority of who DO NOT CHEAT) is ridiculous. it randomizes the field, causes bad and high variance bridge and rewards good luck rather than good actions.



How do you know that the (VAST *8) majority of established partnerships do not cheat?

I wouldn't say that they DO cheat, because I have no evidence. Apparently you have some evidence that they don't cheat. What is it?
0

#64 TimG

  • Group: Advanced Members
  • Posts: 3,972
  • Joined: 2004-July-25
  • Gender:Male
  • Location:Maine, USA

Posted 2011-March-12, 21:25

dlbalt, on 2011-March-12, 14:07, said:

I wouldn't say that they DO cheat, because I have no evidence. Apparently you have some evidence that they don't cheat. What is it?

People are inherently good?
0

#65 matmat

  • ded
  • Group: Advanced Members
  • Posts: 3,459
  • Joined: 2005-August-11
  • Gender:Not Telling

Posted 2011-March-12, 21:36

dlbalt, on 2011-March-12, 13:55, said:

Bridge players who want a rating system should take a look at it.


I don't care much for the chess rating system. Last time I played, my p led the pawn of black, later crossed the rook with the king and sacrificed the queen to the opponents' knight. Before I could do anything my p already blew the board and my rating went down... and the other pair were NOOOOBS.
1

#66 matmat

  • ded
  • Group: Advanced Members
  • Posts: 3,459
  • Joined: 2005-August-11
  • Gender:Not Telling

Posted 2011-March-12, 21:38

dlbalt, on 2011-March-12, 14:07, said:

How do you know that the (VAST *8) majority of established partnerships do not cheat?

I wouldn't say that they DO cheat, because I have no evidence. Apparently you have some evidence that they don't cheat. What is it?



I am not sure how you get through life thinking that unless you have some evidence to the contrary people are doing you wrong?

Or perhaps all the partnerships you have been involved in have cheated?
0

#67 mycroft

  • Secretary Bird
  • Group: Advanced Members
  • Posts: 7,450
  • Joined: 2003-July-12
  • Gender:Male
  • Location:Calgary, D18; Chapala, D16

Posted 2011-March-14, 11:11

The beautiful chess rating, go rating, golf handicap...all work really well for individual events.

How good is LeBron? Really, really good, we all know that. But how good is LeBron when paired with a high school centre? Or with Random Ball Hog (I'm sure we all can think of at least one example). How good is LeBron, against competition at his level, playing pickup street against a professional team allowed to practise together? And how does that affect his rankings?

I'm a decent flight A player in the fields I play in. Not great, decent. With a team of people at my level, we were one exhausted cardplay mistake away from qualifying for day 2 of the Sat-Sun Swiss in Reno last year. But I was playing with my regular partner, my teammates played together frequently; if I had to play pickup - even if my partner was measurably "better" than the one I had - we wouldn't have done as well, pretty much guaranteed. So, what does that do to rating?

The other thing that ratings-that-can-drop do is to make people not *want* to play with new people, or lesser players, because it might affect their rating in ways they can't help. So, welcome to even *more* cliques and stratifying of the game. I haven't *proven*, but it's pretty obvious, that there is no way to set a single-person rating that can't be obviously gamed (either way, really - if I want a *bad* rating, so that I can get into an event where I can clean up, I can arrange that, without "dumping").

Really, the only sane thing one can measure is *partnership* ratings; and that's going to leave many players, who just don't play "regular" partnerships, out in the cold. It certainly won't help the reason "everybody" wants a rating system - "how good is he? Am I willing to play with him?" And even those would probably have to have separate "MPs/IMPs/BAM" ratings, the same way FIDE has separate regular/speed ratings - it's a different game.

Attendance points suck for rating. But really, very few things are *unambiguously* better - just less worse.
When I go to sea, don't fear for me, Fear For The Storm -- Birdie and the Swansong (tSCoSI)
1

#68 slior

  • Group: Members
  • Posts: 22
  • Joined: 2006-May-31

Posted 2011-April-26, 15:22

dlbalt, on 2011-March-12, 13:55, said:

Competitive Chess has had an effective and efficient rating system for decades. It's worth a look for anyone who seriously wants to generate a rating system for Bridge.



I think that developing a rating system for bridge requires considerable research. Apologies for the technical language that follows, but you cannot really discuss the technical merits of a rating system without some mathematical language. I don't have anything to say about social and business questions such as whether BBO should have a rating system, whether some games should be rated and others unrated and so on.

The post above gave the numerical calculations involved in running a rating system, but omitted the heart of the system -- the mathematical model that leads to these calculations. The basic model for chess, proposed by Arpad Elo, is the following:

1. We assume that every player has a current "average ability level" (this is the number that the rating system will be trying to measure). In any individual game, the player will play at an ability level which is randomly distributed around his "average" level, and the winner of the game is the player who happened to display a higher level during that game.
2. The key assumption (to be discussed below): the distribution of abilities around the average is normal (Gaussian), with a standard deviation of 200 rating points.
3. Say we have two players with true abilities r_1, r_2. We can calculate the probability that player 1 will beat player 2 (that's the probability that a random sample from N(r_1,200) is larger than a random sample from N(r_2,200)). This is the "expected result of the game". Now assume that r_1 and r_2 are only the ratings assigned to the players rather than their true abilities. We can still compare the calculated probability to the actual result of the game (player 1 can score 0, 0.5, or 1 point). If player 1 did better than expected, we infer that we underestimated r_1 and overestimated r_2, and make a small adjustment to the ratings accordingly (dlbalt's post above explains how this is done). The converse applies if player 1 did worse than expected.
4. Mathematical fact: If assumption 2 is valid, then over a long series of games the ratings of all players will converge to their true ability levels.
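
Steps 2 and 3 can be sketched as follows. The difference of two independent samples from N(r_1,200) and N(r_2,200) is itself normal with mean r_1 - r_2 and standard deviation 200*sqrt(2), so the win probability is a normal CDF. The update step size `k` is an assumed value chosen for illustration, not something fixed by the model, and the function names are hypothetical.

```python
import math

SIGMA = 200.0  # per-game standard deviation from assumption 2


def expected_score(r1, r2, sigma=SIGMA):
    """P(sample from N(r1, sigma) exceeds sample from N(r2, sigma))."""
    # The difference of the two samples has standard deviation sigma * sqrt(2).
    z = (r1 - r2) / (sigma * math.sqrt(2.0))
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def update(r1, r2, score1, k=32.0):
    """Shift both ratings by k * (actual - expected) for player 1.

    score1 is 0, 0.5 or 1; k is an assumed step size.
    """
    delta = k * (score1 - expected_score(r1, r2))
    return r1 + delta, r2 - delta


if __name__ == "__main__":
    print(expected_score(1700, 1500))
    print(update(1500, 1500, 1.0))
```

Because the two adjustments are equal and opposite, the sum of the two ratings is unchanged by any single game.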

The main weakness of this model is the fixed standard deviation of 200 points. If all players of a given level had the same standard deviation, this would simply serve to fix the value of "1 rating point". In practice they don't quite. Another weakness is that an input to the rating update process is a measure of the "rating uncertainty" (if we are very confident that r_1 is close to player 1's ability, we want to make only a small change after the game; if we are unsure, we want to make a large change), which affects the speed of convergence of the ratings to the true abilities. The original Elo system had a fixed value (later versions give players a higher uncertainty during their first games). Glickman's Glicko system improves on this in two ways. First, it modifies assumption 2 by assigning a different variance to each player (which the system will also try to measure). Second, it improves step 3 by assigning to every player a "ratings deviation" which more directly measures our uncertainty about his rating. The first change is fundamental (it makes the model closer to reality) while the second is designed to improve the convergence properties.
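
To make the "ratings deviation" idea concrete, here is a sketch of the rating-period update from Glickman's published description of the original Glicko system. It covers only the ratings-deviation part (not a per-player performance variance), it omits the step that inflates the deviation between rating periods, and the function names are illustrative. Treat it as a sketch to be checked against that description rather than a reference implementation.

```python
import math

Q = math.log(10) / 400.0


def g(rd):
    """Discount factor: results against uncertain opponents count for less."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q ** 2 * rd ** 2 / math.pi ** 2)


def expected(r, r_opp, rd_opp):
    """Expected score against an opponent with rating r_opp and deviation rd_opp."""
    return 1.0 / (1.0 + 10 ** (-g(rd_opp) * (r - r_opp) / 400.0))


def glicko_update(r, rd, results):
    """Update one player's (rating, deviation) from a list of games.

    results: list of (opponent_rating, opponent_rd, score), score in {0, 0.5, 1}.
    """
    info = 0.0   # information gained from the games (this is 1 / d^2 before scaling)
    drift = 0.0  # weighted sum of (actual - expected)
    for r_opp, rd_opp, score in results:
        e = expected(r, r_opp, rd_opp)
        w = g(rd_opp)
        info += w * w * e * (1.0 - e)
        drift += w * (score - e)
    denom = 1.0 / rd ** 2 + Q ** 2 * info
    # A large rd makes denom small, so the rating moves further per game.
    return r + (Q / denom) * drift, math.sqrt(1.0 / denom)


if __name__ == "__main__":
    games = [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)]
    print(glicko_update(1500, 200, games))
```

The size of the adjustment depends on the player's own deviation, and the deviation itself shrinks as games are added, which is the behaviour described in the paragraph above.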

How does one test the system? Take a set of players and a large collection of games. Use the first part of the sample to calculate ratings as described above, using enough games for the ratings to converge. Then examine the remaining games to see whether the statistical model really predicts the results (i.e. whether the probability of winning is really given by the larger-of-two-Gaussian-samples model, and whether our model for the variance [either 200 points or per-player] agrees with reality). In chess this works to some extent -- see for example the papers by Mark Glickman.
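
One way to run that check is a calibration table on the held-out games: group them by the predicted win probability and compare the average prediction with the average actual score in each group. The sketch below assumes the predictions and outcomes have already been computed; the function name, the input lists and the bin count are hypothetical.

```python
def calibration_table(predictions, outcomes, n_bins=10):
    """Compare predicted win probabilities with observed scores on held-out games.

    predictions: predicted probability that player 1 wins, one per game.
    outcomes: actual score of player 1 (0, 0.5 or 1), one per game.
    Returns rows of (bin_low, bin_high, games, mean_prediction, mean_outcome).
    """
    bins = [[0, 0.0, 0.0] for _ in range(n_bins)]  # count, sum of predictions, sum of outcomes
    for p, s in zip(predictions, outcomes):
        i = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        bins[i][0] += 1
        bins[i][1] += p
        bins[i][2] += s
    rows = []
    for i, (count, sum_p, sum_s) in enumerate(bins):
        if count:  # skip empty bins
            rows.append((i / n_bins, (i + 1) / n_bins, count, sum_p / count, sum_s / count))
    return rows


if __name__ == "__main__":
    for row in calibration_table([0.1, 0.15, 0.9, 0.95], [0, 0, 1, 1], n_bins=2):
        print(row)
```

If the model is right, the last two columns should agree within sampling error in every bin; a systematic gap in the extreme bins would suggest the Gaussian assumption or the variance is off.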

What about adapting this to bridge? Here are some thoughts:

1. Formally, one could simply look at each hand played separately (say looking at the points margin). Given enough time (hands), the ratings may converge. I think the general perception is that convergence here would be too slow, and that it's better to compare what players did with the same cards rather than compare NS to EW directly. Also, different models are probably needed for team games (only one other table) and duplicate games (many other tables).

2. On the other hand, bridge has more detailed scoring information than just "win/draw/lose", which should help. It is not a-priori clear which component of the score best predicts future success.

This much is for rating pairs. But what we really want is rating individuals, and this raises the last question:

3. Can players be assigned individual ratings which effectively predict the performance of pairs? The naive answer is that established partnerships perform much better than pick-up partnerships with the same ability, but I don't know of research into how large this effect actually is and (more importantly) to what extent it limits the accuracy of individual ratings in predicting results. To get a genuine answer we have to create a statistical rating model and then test it against actual data.

Side thought: in trying to extract individual ability from pair data, it would be helpful to incorporate the identity of the declarer into the model (that is, probably the ability of declarer affects the result of the hand more than the ability of dummy), but since the identity of declarer is not simply a function of the cards I don't see an obvious way to do this.

In summary: before we discuss whether BBO should have a rating system, it may be worth doing some statistical research and actually *validate* a rating system for bridge.
0

#69 hrothgar

  • Group: Advanced Members
  • Posts: 15,495
  • Joined: 2003-February-13
  • Gender:Male
  • Location:Natick, MA
  • Interests:Travel
    Cooking
    Brewing
    Hiking

Posted 2011-April-26, 17:24

slior, on 2011-April-26, 15:22, said:

In summary: before we discuss whether BBO should have a rating system, it may be worth doing some statistical research and actually *validate* a rating system for bridge.


Developing an accurate rating system for a given population of players isn't particularly hard.

Convincing half the players that they are below average is much more daunting, as is the mind-numbing tedium of trying to explain to numerically illiterate yahoos why this algorithm says that you suck...

Figure out how to deal with this and I'll invest the time/effort to develop an accurate rating system.
Alderaan delenda est
1

#70 deannz

  • Group: Members
  • Posts: 9
  • Joined: 2010-October-20

Posted 2011-April-26, 18:56

Based on the Chess/American Baseball system ELO

http://otagobridgecl....nz/ratings.php

D./
0

#71 slior

  • Group: Members
  • Posts: 22
  • Joined: 2006-May-31

Posted 2011-April-27, 01:21

deannz, on 2011-April-26, 18:56, said:

Based on the Chess/American Baseball system ELO

http://otagobridgecl....nz/ratings.php

D./


If this works we're in good shape. How well is the distribution of hand results in this club approximated by the statistical distribution from the underlying model given the players' ratings?
0

#72 mgoetze

  • Group: Advanced Members
  • Posts: 4,942
  • Joined: 2005-January-28
  • Gender:Male
  • Location:Cologne, Germany
  • Interests:Sleeping, Eating

Posted 2011-May-02, 11:46

Quote

Pair ratings are calculated as the average of the individual ratings. Adjustments to the ratings are applied equally to both members of the pair.


So.... convincing.
"One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision"
    -- Bertrand Russell
0

#73 hrothgar

  • Group: Advanced Members
  • Posts: 15,495
  • Joined: 2003-February-13
  • Gender:Male
  • Location:Natick, MA
  • Interests:Travel
    Cooking
    Brewing
    Hiking

Posted 2011-May-02, 12:25

mgoetze, on 2011-May-02, 11:46, said:

So.... convincing.


For what it's worth, I had the chance to discuss this topic (bridge ratings) with Glickman after a talk he gave a few months back.

I posited (and Glickman concurred) that the best way to approach the problem was to focus on developing an accurate rating system for pairs.
Once you have an accurate system for rating pairs, you can then try to decompose accurate ratings for individuals out of a set of pair ratings.
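
A toy sketch of what such a decomposition could look like, assuming (purely for illustration) that a pair's rating is the average of its two members' ratings. It fits the individual values by least squares, updating one player at a time; the function name and the input format are hypothetical, and nothing here says this is the model Glickman had in mind.

```python
def decompose(pair_ratings, sweeps=200):
    """Estimate individual ratings from pair ratings.

    pair_ratings: dict mapping (player_a, player_b) to that pair's rating.
    Assumes pair rating = average of the two individual ratings.
    """
    players = sorted({name for pair in pair_ratings for name in pair})
    start = sum(pair_ratings.values()) / len(pair_ratings)
    estimate = {name: start for name in players}
    for _ in range(sweeps):
        for name in players:
            # Each partnership implies a value for this player: 2 * pair - partner.
            implied = []
            for (a, b), rating in pair_ratings.items():
                if name == a:
                    implied.append(2.0 * rating - estimate[b])
                elif name == b:
                    implied.append(2.0 * rating - estimate[a])
            # The least-squares choice for one player is the mean of the implied values.
            estimate[name] = sum(implied) / len(implied)
    return estimate


if __name__ == "__main__":
    pairs = {("A", "B"): 1500.0, ("B", "C"): 1300.0, ("A", "C"): 1400.0}
    print(decompose(pairs))
```

The answer is only pinned down when players change partners enough. With a single fixed partnership, any split of the pair's rating between the two members fits equally well, which is one reason individual ratings are harder than pair ratings.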

It's entirely possible that Glickman doesn't believe any such thing and thought that agreeing with me was the best way to get me to go away.
Alderaan delenda est
0

#74 babalu1997

  • Duchess of Malaprop
  • Group: Full Members
  • Posts: 721
  • Joined: 2006-March-09
  • Gender:Not Telling
  • Interests:i am not interested

Posted 2011-May-02, 18:50

hrothgar, on 2011-May-02, 12:25, said:

For what it's worth, I had the chance to discuss this topic (bridge ratings) with Glickman after a talk he gave a few months back.

I posited (and Glickman concurred) that the best way to approach the problem was to focus on developing an accurate rating system for pairs.
Once you have an accurate system for rating pairs, you can then try to decompose accurate ratings for individuals out of a set of pair ratings.

It's entirely possible that Glickman doesn't believe any such thing and thought that agreeing with me was the best way to get me to go away.


one of my partners always says this about rating pairs

but says also that no bridge organization will consider such a thing for pecuniary reasons

Free, on 2011-May-10, 03:57, said:

Babalu just wanted a shoulder to cry on, is that too much to ask for?
0

#75 lamford

  • Group: Advanced Members
  • Posts: 6,446
  • Joined: 2007-October-15

Posted 2011-October-23, 18:51

The ELO system is highly regarded in International Football:

http://www.eloratings.net/system.html

and "about ratings" gives details. The important thing is to have a high weight constant in the early stages reducing as the players get more experience. A percentage in any event can be converted into a rating difference, and compared with the player's expected result.
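
A sketch of both ideas, assuming the logistic expected-score curve that many Elo implementations use (expected score = 1 / (1 + 10^(-d/400))). The conversion simply inverts that curve. The weight schedule, its default values and both function names are made up for illustration and are not taken from the linked site.

```python
import math


def rating_difference(percentage):
    """Rating difference implied by a percentage score under a logistic curve."""
    p = percentage / 100.0
    if not 0.0 < p < 1.0:
        raise ValueError("percentage must be strictly between 0 and 100")
    # Inverse of expected = 1 / (1 + 10 ** (-d / 400)).
    return 400.0 * math.log10(p / (1.0 - p))


def weight_constant(events_played, k_new=40.0, k_established=16.0, settle_after=30):
    """Weight that starts high and tapers linearly as experience grows (assumed schedule)."""
    progress = min(events_played, settle_after) / settle_after
    return k_new + (k_established - k_new) * progress


if __name__ == "__main__":
    print(rating_difference(60.0))
    print(weight_constant(0), weight_constant(15), weight_constant(100))
```

The implied difference can then be compared with the difference between the player's rating and the average rating of the field, and the gap scaled by the weight constant to get the adjustment.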
I prefer to give the lawmakers credit for stating things for a reason - barmar
0

#76 Lurpoa

  • Group: Full Members
  • Posts: 324
  • Joined: 2010-November-04
  • Gender:Male
  • Location:Cogitatio 40
  • Interests:SEF
    BBOAdvanced2/1
    2/1 LC
    Benjamized Acol
    Joris Acol
    Fantunes
    George's K Squeeze

Posted 2012-April-26, 11:59

Jlall, on 2009-November-20, 15:58, said:

On OKB almost no one played unrated games. Generally if they did they were not very good. If you are a good player who wants to play unrated games it might be quite hard to find a good game. Even if you are not a good player it will limit your options greatly since almost everyone will be playing rated.

Of course BBO is now much bigger than OKB ever was, and OKB charged membership fee which kinda biases the results of how many play rated vs unrated a lot (if you're willing to pay 100 bucks a year, you'll probably want to play rated), so maybe this wouldn't be the case on BBO.





655321 ???
Really.
Take care
:)
I will not follow...:)




Bob Herreman
0

#77 JLOGIC

  • 2011 Poster of The Year winner
  • Group: Advanced Members
  • Posts: 6,002
  • Joined: 2010-July-08
  • Gender:Male

Posted 2012-April-27, 22:40

Lurpoa, I do not understand what you wrote, can you clarify?
0

#78 32519

  • Insane 2-Diamond Bidder
  • Group: Advanced Members
  • Posts: 1,471
  • Joined: 2010-December-22
  • Gender:Male
  • Location:Mpumalanga, South Africa
  • Interests:Books, bridge, philately

Posted 2012-April-28, 01:08

Here is another suggestion:

On the “Hand Records” screen, the following options appear –
• Interval to retrieve: days / weeks / months. Choosing “months” only allows the last month. Add another one “last 6 months.”
• Similarly for the “Show summaries every” option, add “every 6 months.”

Now get the programmers to extract the 6 month summaries into a sub-file and sort them from the highest average to the lowest average. The top X% get automatically graded expert, the next Y% get automatically graded advanced, the next Z% get graded intermediate etc.
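
The sort-and-grade step might look like the sketch below. The post leaves X, Y and Z open, so the 5/15/40 split, the catch-all label "other" and the function name are placeholders for illustration only.

```python
def grade_players(averages, tiers=(("expert", 5), ("advanced", 15), ("intermediate", 40))):
    """Assign a grade to each player from their 6-month average.

    averages: dict mapping player name to average result.
    tiers: (label, percent of the field) from the top down; placeholder values.
    """
    ranked = sorted(averages, key=averages.get, reverse=True)
    total = len(ranked)
    grades = {}
    for position, name in enumerate(ranked):
        # Percentage of the field ranked strictly above this player.
        percent_above = 100.0 * position / total
        label = "other"  # catch-all for everyone below the listed tiers
        cumulative = 0
        for tier, share in tiers:
            cumulative += share
            if percent_above < cumulative:
                label = tier
                break
        grades[name] = label
    return grades


if __name__ == "__main__":
    sample = {"p%02d" % i: 60.0 - i for i in range(20)}
    print(grade_players(sample))
```

Because the grades are relative, a fixed share of players lands in each tier no matter how strong or weak the whole field is.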

Remove the self-rating option altogether from each player's profile. Instead, replace it programmatically with the rating as calculated above. One can decide how often the rating gets recalculated and replaced in each player's BBO profile, e.g. daily / weekly / monthly. My guess is weekly should be fine (24h00 on Sundays, USA (BBO headquarters) time).

The only way you can progress from say, advanced to expert is to up your game. You can try and bullshit the system by playing with a lot of weaker players to get a higher average. However, as soon as you start playing against real experts you will be exposed, your average will plummet and you will drop back into a lower category where you probably belong anyway.

Other things to consider: a) New players may need to be excluded from the calculation until they have played enough hands; b) similarly for players who haven’t played in a long time (can consider maintaining their last average).
0

#79 jjbrr

  • Group: Advanced Members
  • Posts: 3,525
  • Joined: 2009-March-30
  • Gender:Male

Posted 2012-April-28, 19:24

lol no
OK
bed
0

#80 jdgalt

  • Group: Full Members
  • Posts: 87
  • Joined: 2007-July-09
  • Gender:Male
  • Location:northern California
  • Interests:Also a board game player (I'm "jdgalt" on BoardGameGeek, too).

Posted 2012-May-09, 19:49

I like the idea of a rating system, but I see several problems that it would need to overcome. Just offhand:

(1) Suppose two very unequal partners pair up. Assume ratings something like ACBL masterpoints: Alice with 1200 and Bob with 50 partner up against Charlie and Doug with 500 each. If Alice and Bob win, does it mean that only Bob gains rating points since Alice was better than their opponents? Or do we count them both as their average of 625 points, so that neither gains anything?

(2) How to deal with a pair that used to be much better than they are now. I like the way WBF masterpoints decay over time; something like that might be called for.

(3) How to deal with players who avoid joining the league so that their points aren't counted. I have run up against some very good players in this category at clubs. In the US you could do the same thing by joining one of ACBL/ABA and playing at the other.

(4) For that matter, a person could have multiple 'nyms on BBO. I'm sure this is against the rules but I'm not at all sure it can be caught. Even if there are 5 BBO login names on the same PC, maybe they're a family or housemates.
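
To put numbers on question (1), the sketch below works through both options for the Alice/Bob example. It borrows the linear rule quoted earlier in the thread (16 points plus or minus 4% of the rating gap, capped at 400) and applies it to the masterpoint-like figures as if they were ratings, which is an assumption made only so the arithmetic can be shown. The function and variable names are made up.

```python
def linear_change(own, opposing, won, base=16.0, slope=0.04, cap=400.0):
    """Rating change under the linear rule quoted earlier in the thread."""
    diff = max(-cap, min(cap, opposing - own))
    return (base if won else -base) + slope * diff


alice, bob = 1200.0, 50.0
charlie, doug = 500.0, 500.0
opposing_average = (charlie + doug) / 2.0

# Option A: each winner is compared individually with the opposing pair's average.
individual = {
    "Alice": linear_change(alice, opposing_average, won=True),
    "Bob": linear_change(bob, opposing_average, won=True),
}

# Option B: the pair is treated as its average and both members get the same change.
pair_average = (alice + bob) / 2.0
shared = linear_change(pair_average, opposing_average, won=True)

if __name__ == "__main__":
    print(individual)
    print(pair_average, shared)
```

Under these assumptions, option A gives Alice nothing and Bob the maximum of 32, while option B treats them as a 625 pair and gives both about 11. Neither is obviously right, which is the problem being raised.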

My feeling is that rating systems are a good idea for clubs (though (1) through (3), at least, need to be dealt with) but online games should not award points of any kind, unless they're the online forum's own private points, because it's impossible to police adequately. Only noticing cheaters if they consistently get "too good" results will catch only the stupidly greedy.
0
