I agree. No anonymous ratings. It muddies the transparency of the ratings.
That said, I do also think that the algorithm is a good idea for the reasons posted above.
Getting to this discussion a bit late, but disallowing anonymous ratings is no guarantee that you know the true identity of the person giving the ratings under a certain username (I should think not!)
You all realize that if somebody wants to maliciously “sabotage” ratings they can create sockpuppet accounts? And vice versa, people can create sockpuppet accounts to give ratings of 100 to their own teas. Never mind all that. I am a cynic.
The blog entries were very interesting. I’m not sure I get the Bayesian average, but I’d suggest it would be interesting to have a formula for the average that takes into account, for each person’s rating, the average rating that person tends to give. Say two people whose average ratings are in the 90s rate a tea (tea A) 95, and those are the only two ratings that tea has. Another tea (tea B) is rated 95 by two people whose average ratings are in the 70s. I think tea B should have the higher Steepster rating.
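To make that idea concrete, here is a minimal sketch (all names and numbers are hypothetical, and it isn’t anyone’s actual formula): shift each rating by how far that rater’s overall average sits from the site-wide average, then average the shifted ratings.

```python
def adjusted_rating(ratings, rater_means, global_mean=80.0):
    """ratings: list of (rater, score); rater_means: rater -> that rater's average score.
    Hypothetical mean-offset correction: a score from a 'generous' rater is
    discounted, a score from a 'tough' rater is boosted."""
    adjusted = [score - (rater_means[rater] - global_mean)
                for rater, score in ratings]
    return sum(adjusted) / len(adjusted)

# Tea A: two 95s from raters whose averages are in the 90s.
tea_a = adjusted_rating([("p1", 95), ("p2", 95)], {"p1": 92, "p2": 94})
# Tea B: two 95s from raters whose averages are in the 70s.
tea_b = adjusted_rating([("p3", 95), ("p4", 95)], {"p3": 72, "p4": 74})
# tea_b comes out higher, since its 95s came from tougher raters.
```

A real system would also need to clamp the result back into the 0–100 range, since the correction can push a score past 100.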
I think the current formula is fine. These guys are technology geeks and know what makes the best statistical sense in terms of rating.
I believe it did happen, as you said: some people create sockpuppet accounts to rate. But I think the webmasters can find them out if it’s brought to their attention. This, I think, has also happened before, and it largely discouraged the use of fake accounts for malicious rating. Also, “real” drinkers actively participating in rating can largely “dilute” the effect of malicious ratings.
I respectfully disagree that they “know what makes the best statistical sense”… they may have an opinion, but not everyone agrees w/ it. Not everyone is a tech geek. There’s nothing wrong w/ being one, but not everyone is one.
I agree with you completely about the significant variation in average ratings, Cteresa, and have thought this many times. If a person is considering a tea’s overall rating in a prospective purchase, its relevance is strongly impacted by this second-tier issue, as well as the algorithm. I often end up eyeballing the raters’ rating distributions. It’d be great if Jason could figure out a way to work that mean into the formula.
My opinion is that one can be a tech (programming, or electronics, or whatever) geek without being a statistics geek, or a maths geek in general. Theoretically everybody who studied hard science or engineering or computer science (in my native language, part of engineering) had some training in statistics and probability. But often the more you know of statistics, the harder it is to pick the “best” method to evaluate something :p
What the? I just found out about the algorithm. I looked at a tea that I rated yesterday and noticed I was the only one that reviewed it. I gave it a 94 and the overall score is in the 70s! And I’m the only one that rated it? Why would it do that? Weird.
Interesting. There are really two issues here (as pointed out earlier):
1. The potential for sabotage ratings
2. The need to understand how the ratings are calculated
Oh dear. Measurement theory, here we come. First there is the question of what indeed are we measuring, and then, how do we combine those measurements into a useful summary?
I’m actually pretty impressed with the way Steepster has set things up, and the visual bar with happy faces is similar to the visual analog approach that has been used for ratings of pain.
The Bayesian approach for the means is brilliant… it also, to a certain degree, counteracts sabotage by taking into account a wider range of information than just the ratings for that one tea (and thus the potential sabotage victim).
One could also present confidence intervals around the ratings, but those may be a little hard to explain.
Nonetheless, I’d agree that anonymous ratings may increase the risk of sabotage. In my Pollyanna world, no one would do that!
Such is my life – talking about math on a tea site.
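Since talking math on a tea site is apparently fair game: the confidence-interval idea above can be sketched in a few lines. This is just an illustration (the ratings are made up), using the standard normal approximation for a 95% interval around a tea's mean rating.

```python
import math

def rating_ci(scores, z=1.96):
    """95% confidence interval for the mean rating (normal approximation).
    Illustrative only; the scores below are hypothetical."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)  # sample variance
    half = z * math.sqrt(var / n)
    return mean - half, mean + half

low, high = rating_ci([90, 85, 95, 88, 92])
# The fewer (or noisier) the ratings, the wider the interval gets,
# which is exactly what makes it hard to explain on a product page.
```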
For those of you not familiar with our ratings system, here are 2 blog posts about some of the details behind it:
Sure, there are probably always going to be things we could do to make it better. But this was what we thought would be the most effective at the time. We thought it was important to not only consider individual ratings, but how “supported” an individual tea was so that a single rating of 100 wouldn’t make a tea score higher than a tea that had several 99’s.
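For readers who haven’t seen one, a common form of the Bayesian average looks like the sketch below. This is the general technique, not necessarily Steepster’s exact formula; the prior weight `C` and prior mean `m` are hypothetical values. The raw mean is pulled toward a site-wide prior, and the pull weakens as a tea accumulates ratings, so one lone 100 can’t outrank several 99s.

```python
def bayesian_average(scores, C=10, m=75.0):
    """C: weight of the prior, in 'pseudo-ratings'; m: prior (site-wide) mean.
    Hypothetical parameters, illustrating the general Bayesian-average form."""
    n = len(scores)
    return (C * m + sum(scores)) / (C + n)

# A single rating of 100 barely moves the score off the prior...
one_hundred = bayesian_average([100])      # ~77.3
# ...while eight 99s pull it much higher.
several_99s = bayesian_average([99] * 8)   # ~85.7
```

This also shows why a tea with a single rating of 94 can display an overall score in the 70s: with only one rating, the prior dominates.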
Regarding the transparency of ratings: we would definitely consider making ratings more transparent in the future. Given the way the site had developed, ratings without tasting notes were something we wanted to offer, but we hadn’t fully considered how that might leave the system open to potential gaming.
In instances where this happens we do address the situation, though some cases are just an effect of the ratings system (as we believe to be the case with this tea). But Ginko also brings up a good point: to a certain extent, people will always be able to game the system with fake accounts (as is the case with any online ratings system). We do our best to eliminate fake accounts, but they happen. What we’d really like to get into is a reputation system, which could factor into ratings and help you understand which people are trusted reviewers in the community.
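One simple way a reputation system could factor into ratings is a weighted mean where trusted reviewers count more. This is only a sketch of the idea floated above, with made-up weights, not a description of anything Steepster has built.

```python
def reputation_weighted(ratings):
    """ratings: list of (score, reputation_weight) pairs.
    Hypothetical weights: established reviewers near 1.0,
    new or suspect accounts much lower."""
    total_w = sum(w for _, w in ratings)
    return sum(s * w for s, w in ratings) / total_w

# Two trusted reviewers' 90s largely drown out a suspected
# sockpuppet's 100, so the blended score stays near 90.
score = reputation_weighted([(90, 1.0), (90, 1.0), (100, 0.1)])
```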
Just wanted to explain things a little…
I just want to underscore that I read those blog posts prior to commenting, and am really impressed. Kudos to you. The trusted advisor concept is a good one, and you could also, at a future point, factor characteristics of the raters into the algorithm, as that is another source of systematic measurement variance (e.g., one rater may tend high on all teas while another tends low, yet the relative positions of the teas are consistent). At some point I’d love to play with your data and come up with a validation of the algorithm… but that would be project 345, so it would be a few years down the road!
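That systematic-variance point can be sketched with standard scores: express each rating relative to that rater’s own distribution, then map it back onto a common scale. Everything below is hypothetical and distinct from the simpler mean-offset idea discussed earlier, since it also accounts for how spread out each rater’s scores are.

```python
from statistics import mean, stdev

def normalize(score, rater_history, target_mean=80.0, target_sd=10.0):
    """Convert a score to a z-score within the rater's own history,
    then rescale to a common (hypothetical) target distribution."""
    mu, sd = mean(rater_history), stdev(rater_history)
    z = (score - mu) / sd
    return target_mean + z * target_sd

# A "high" rater's 95 and a "low" rater's 75 land in the same place
# once each is expressed relative to that rater's own distribution.
high_rater = normalize(95, [90, 92, 94, 96, 98])
low_rater = normalize(75, [70, 72, 74, 76, 78])
```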