Tom Hull: A Note on Grades

A Note on Grades

I'll rewrite this in the near future. Meanwhile, see my answer to Jeff Golick's question on grades.

Background Notes

I was curious about what I had written about grading/rating in the past, so I searched through the notebook. An egrep for 'ratings|grades' kicked out 907 lines. The following are selections from the notebooks (with occasional annotation):

2005-April-05: Methodological note on Jazz Consumer Guide, including a fairly long discussion of grades:

My system puts a ridiculous amount of emphasis on grades. This is wrong in that it suggests that there is a measurable standard against which the records are evaluated. Of course, there is no such standard. The closest simple grading system I can come up with would be to measure two factors: how expertly do you fulfill my expectations for a type of music, and how surprising is the result. In other words, competency and invention. But two such factors are incommensurable, often even contradictory. Quantify the two and multiply them and the answer is bound to be nonsense. Yet that's more or less what grading does. Still, I do it. I see two advantages in it: one is that it helps in managing quantities of data; the other is that it makes my writing more economical. With the grade at the end you know whether I like the record or not, and approximately how much -- no need to tune adjectives. And the data is large: I get about 400 jazz albums a year, and the grades map those 400 into a context provided by over 10,000 grades in my album database.

The grading system I use is roughly based on what Robert Christgau has done in his Consumer Guides, but I'll give you my definitions here. First, a B record is a good one: competent, skilled, pleasing, unremarkable. I could play B records all day long and never complain, but presumably I'd wind up wondering why I bothered. I've mostly tuned my ears to not notice B records. Anything below B has somehow managed to annoy or offend me. I rarely go very far down the grade list, and don't claim much precision there -- once a record dips below the line of tolerance I lose interest in it. In general, a C+ record is probably a competent piece of hackwork, while a C- record is likely to be a much less competent atrocity. Lower grades usually indicate pain, as opposed to mere annoyance.

A B+ is a consistently enjoyable album or one with remarkable features that I may not fully appreciate or value. I've found many of my favorite albums in Christgau's B+; you will likely find treasures in mine. In practice, the upper third are records I enjoy a lot; the bottom third include records that I admire more than I like, but they all have much to recommend. It's just that the A- records have more -- sometimes much more. Higher grades are rare -- in the database they are usually records that have stood the test of time, that exemplify a unique artistic vision, but sometimes they just make me deliriously happy from beginning to end. I'd like to think that A and A+ records are universal -- that even someone who doesn't think they like avant-garde jazz, for instance, could really get into records like Dave Holland's Conference for the Birds or Amalgam's Prayer for Peace.

Based on what I get, I'd have to say that the distribution for current jazz records is a bit above normal -- the mean record is somewhere in the low B+ range. My own results skew this way mostly because I seek out good records while bypassing not so good ones, but if I did get everything, and managed miraculously to grade it all, the mean should drop into the mid B range, but the distribution wouldn't be normal -- it would be skewed high, more B+ than B-/C+, maybe more A- than C/C-. Most of the reasons for this are systemic -- they apply to any kind of music, where good musicians (however you define that, and there's a wide range of opinion) simply get more opportunities to record than bad musicians, where good records get promoted more than bad ones, etc. But I will mention two reasons that are relatively specific to jazz. One is that it's a relatively homogeneous form of music -- mostly instrumental, mostly out of a specific historical tradition, with common conventions. The other is that jazz is relatively untouched by commercial pressures -- and things that go with money, like production budgets. Proof of these points can be gleaned by looking at the exceptions: vocal jazz grades much more variably than instrumental, while the most commercial jazz variants skew quite a bit down. (The mean for "smooth jazz," in my rather limited experience, is close to the B-/C+ border, and I'm rather open-minded on the subject.)

2005-October-16: About when I started to break B+ grades down with 1, 2, or 3 stars. Robert Christgau introduced the 3-star ratings in Christgau's Consumer Guide: Albums of the 90s. He used stars to sort out rough levels of Honorable Mentions, without explicitly tying the system to his previous B+ grades. He explains in his Key to Icons. His intention was to only review B+ or better records, and to only look for possible A- or better -- although he didn't fully escape having to review the occasional Dud (or Turkey, a distinction without a difference).

What follows is one week's worth of notes on listening for the next Jazz Consumer Guide. These were written on my first play in this cycle, although I've played some of these records before. Grades in brackets are tentative: I figure I need at least one more play before I can be certain of my rating. The B+ grades also have 1-3 stars. In practice, the 3-star records will probably turn up as Honorable Mentions, and the 1-star records won't. All B+ records are good records, consistently enjoyable or sporadically brilliant, but probably not both. The ratings help me sort out what to write about. The notes are just stops on the way to writing the reviews. Some words may survive, especially if I come up with a catchy phrase, but at this stage I don't worry much about phrasing and conciseness. I'm putting this into the blog partly to document how I work, and partly because I feel guilty that most of these records won't eventually show up in a published Jazz CG.

2009-November-04: This was a comment on the third revision of Robert Christgau's User's Guide to the Consumer Guide, which was originally written in 2006, when Christgau resurrected his Consumer Guide at MSN Music. It provided an opportunity to contrast our grading approaches.

Christgau's current Consumer Guide review count is 14,595 (including 2 CGs I haven't gotten around to loading into the database yet). He's been doing this longer than I have, and more consistently (especially in the 1980s and 1990s, when I mostly worked on other things), but his rate is lower -- more like 500 per year. He explains it this way: "But my second biggest gift is that I know what I think. I don't write about something till I'm pretty sure how much I like it, and I'm skilled at recognizing when that is." We both spend roughly the same amount of time listening -- he pegs it as 12-18 hours a day, which is an average day here too, but one that is hard to add to. The 3-5 plays he cites for albums he writes up for Consumer Guide is also about what I average, although sometimes I'll bag one earlier. The big difference is that he samples a lot of stuff that he doesn't grade, whereas I jot down something for virtually everything I play -- and he's gotten more disciplined at that over time, whereas I've gotten sloppier. But then I've never been so convinced of my grades. They've always struck me as probabilities that become more significant the more I play something, but I play most things only a few times -- the median is either 2 or 1, which is to say that the 2-play point is very probably somewhere between the 40th and 60th percentiles: at least 40% of my grades are on records that I have played 2 or more times, maybe as many as 60%. If I had the data (and I can't even contemplate starting now) it would be interesting to qualify each grade with a play count.

On the other hand, my uncertainty is to some extent a trait of personality. When I do go back and replay records -- which, e.g., happens when I need to craft CG reviews of records I've already graded -- I almost always get the same results: some may inch up or down a notch, but very few. Presumably one-play grades of B+(*) or B, which are more or less polite dismissals of things I can't use, are more volatile, but even if 25% are off by a notch and 5% by two notches it's hard to justify the extra time. Christgau does the same sort of triage, and does it faster -- he wouldn't, as I'm doing now, play through to the end of a string quartet album that's not even bad.

Another subject of the User Guide is range and prejudice. A lot of the things on Christgau's pet peeve list are things that I rather like -- art-rock, bluegrass, fusion, techno, salsa, soul jazz, swing -- although they do tend to be hit and miss (and I've never had much luck shopping for salsa). Metal isn't a prejudice so much as one of a bunch of genres with a rather low likelihood of interest, but the same can be said about lots of genres -- new age, experimental rock, nu soul, and pop jazz are no more promising. Irish folk and gospel are more like prejudices, and ones that we share, most likely for similar reasons. Classical, too: that's a "genre" that elicits physical revulsion -- a genuine case of prejudice that I doubt I'll ever overcome.

You can get a rough idea of the distribution of my taste and expertise by looking at the database table. About half of the total number of records rated are jazz (7,846 of 16,019, or 48.9%), loosely considered. That's grown like cancer since I started Jazz CG in 2003 -- the most obvious proof is the 1098 records by jazz artists who hadn't recorded before 2000. Still, there's very little there that I'm not well versed on -- weak spots are Latin jazz, old crooners, and pop jazz, but those cases are still relative. I also consider myself pretty expert on blues (674), country (949), and rock and roll through the 1960s (1150) and for that matter the 1970s (1207). I'd also claim a fair sampling of hip-hop (476) and reggae (294), and at least a serious amateur interest in African (410). Folk (254) isn't a very clear category, as some leans country and some singer-songwriter rock. Everything else is rather patchy, including Latin (271 or 422 if you count the previously counted Latin jazz). I've imagined trying to write The Gringo's Guide to Latin American Music, but I honestly doubt that I've heard more than 20% of the records I should hear to write any such book. I haven't broken rock down into white and black, but the breakdown is probably representative. The electronica list (254) doesn't actually go very deep. A lot of this could be better categorized if I had a better scheme, but I've never come up with a good one.

Still, most of the people I know who have prejudices can't stand either country or rap, some don't like jazz, and a few are narrowly focused on things like metal or techno. (I know of people who like classical music, but can't think of any exclusivists, even though they were legend when I was growing up.)

2011-December-03: This is part of a longer piece on EOY lists and metacritic aggregates, which has nothing much to do with my grades. However, this does:

Actually, the line that cut closer to home was Sheffield's: "I really don't give a giraffe's nads what anyone, even myself, thinks of a new album after one listen, or half a listen, or a third of a listen." That seems like qualified data to me -- not what you'd get with multiple careful listens, but snap reactions often prove right, and I'd rather have more data than less -- at least that's why I'm (usually) willing to commit myself after a single play. (I almost never stop early, although I was sorely tempted by a Bill Orcutt solo guitar album last night.)

2016-September-16: This was where I started to collect my jazz reviews into LibreWriter files. My plan was to produce two record guides: a comprehensive and potentially publishable Recorded Jazz in the Early 21st Century, and whatever scraps I managed to collect from the 20th Century. This occasioned a revamping of the grading system, as I thought a numeric scale would be easier to manage and understand than letter grades.

I'm also considering making a fairly substantial change to the grading system. I thought it might be better to convert the letter grades (with their 3-star subdivision of B+) into a numeric scale (1-10). My first attempt at a conversion was: 10 = A+, 9 = A, 8 = A-, 7 = B+(***), 6 = B+(**), 5 = B+(*), 4 = B, 3 = B- or C+, 2 = C or C-, 1 = any D, 0 = any E.
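As a rough illustration, that first-attempt conversion can be written as a lookup table. This is only a sketch: the names FIRST_ATTEMPT and to_numeric are hypothetical, and the choice to reject an unstarred B+ (rather than guess a star level) is an assumption, not something the notebook specifies.

```python
# First-attempt letter-to-number conversion, copied from the paragraph above.
# FIRST_ATTEMPT and to_numeric are hypothetical names used only for this sketch.
FIRST_ATTEMPT = {
    "A+": 10, "A": 9, "A-": 8,
    "B+(***)": 7, "B+(**)": 6, "B+(*)": 5,
    "B": 4, "B-": 3, "C+": 3, "C": 2, "C-": 2,
}


def to_numeric(grade: str) -> int:
    """Map a letter grade to the 0-10 scale; any D becomes 1, any E becomes 0."""
    grade = grade.strip()
    if grade in FIRST_ATTEMPT:
        return FIRST_ATTEMPT[grade]
    if grade.startswith("D"):
        return 1
    if grade.startswith("E"):
        return 0
    # Assumption: an unstarred "B+" has no defined slot, so it is rejected
    # rather than silently assigned to one of the three star levels.
    raise ValueError(f"unrecognized grade: {grade!r}")


if __name__ == "__main__":
    for g in ["A+", "A-", "B+(**)", "B", "C+", "D+"]:
        print(g, "->", to_numeric(g))
```

A plain table keeps the transcription mechanical, which is the point of the exercise: no record has to be re-ranked to get a number.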

Two problems there, one at the top of the scale, the other near the bottom. The former goes back to when I initially applied my letter grade scale to my records list: A and A+ made sense only for records that had stood the test of time and many plays. However, after JCG started my working methodology changed so that I almost never managed the several dozen plays those older records had enjoyed. I basically stopped using those grades. For instance, the one and only A+ I've given to a jazz record released this century was James Carter's Chasin' the Gypsy, and that was released in 2000. (I'm pretty sure my most recent A+ was Lily Allen's It's Not Me, It's You in 2009, although it didn't get promoted until several years later.)

Actually, there's not much A+ jazz earlier either: I count 41 albums, one each (or more in parens, but some are redundant) for: Louis Armstrong (5), Ornette Coleman, John Coltrane, Miles Davis (2), Duke Ellington (9), Ella Fitzgerald (3), Coleman Hawkins (2), Billie Holiday (2), Fletcher Henderson, Johnny Hodges (2), Louis Jordan, Charles Mingus, Thelonious Monk, Art Pepper, Don Pullen, Sonny Rollins (3), Roswell Rudd, Jimmy Rushing, Pharoah Sanders, Horace Silver, Frank Sinatra, Art Tatum. That's out of 14032 jazz albums rated, so 1/334 (0.2%). That's, well, even I have to admit that's pretty picky -- rarefied even -- especially if the concept is to grade on some sort of curve.

There are a good deal more A records, ten times as many (419, or 2.9% of the total), but they too are concentrated among older artists. From 2000 onward, I've given out 65 A grades (counting Carter's A+), an average of 4 per year (exactly, not counting 2016, which so far has 1). I don't have an easy way of counting the sample size there, but it's at least 5000 and probably closer to 7000 so we're looking at a number that will round off (probably up) to 1%. Seems to me like I could combine A and A+ at 10 and still have no more than 1% at that level -- less than 100 records covering two decades.

The other problem is at the bottom. Keeping the three subdivisions of B+, which I think is well justified by my recent practice, pegging A- at 8 pushes B down to 4, and forces me to combine lower grades. This is less important, but intuitively it seems to me that B should be 5, and that the distinction between B- and C+ is meaningful (not that the difference between 4 and 3, or 3 and 2, is really going to sway any of your buying decisions). Below that matters less, not least because I put so little effort into discerning qualitative distinctions between records I actively dislike.

In recent years my impression has been that each of the three B+ levels was fairly evenly distributed (possibly with a slight bulge in the middle, at **), with A- and B tapered off, and sub-B grades rare -- partly because I don't seek out records I'm unlikely to like, and partly because many of their publicists have given up on me. But I've never counted until now. I did three counts, first on the entire rated database (27526 albums), then on the jazz subset (14032), and finally on the post-2000 jazz subset (undercounted a bit at 8268), which breaks down thus: A+ 1 (0.01%), A 63 (0.76%), A- 883 (10.7%), B+(***) 1445 (19.0%), B+(**) 2122 (27.7%), B+(*) 1730 (22.6%), B 1064 (12.9%), B- 364 (4.4%), C+ 81 (0.97%), C 30 (0.36%), C- 15 (0.18%), D+ 2 (0.02%), D 2 (0.02%), plus 455 additional B+ albums (divided proportionately for the percentages; the overall B+ percentage is 69.56%). This actually looks rather like a pretty normal distribution, left-shifted by various factors biased in favor of selecting better records (ones I bought, sought out, or that savvy promoters sent my way) in an idiom that I broadly respect and enjoy. Or it may just be that the left-shift is to be expected, just because the skillset jazz demands is so exceptional.
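The proportional division of the unstarred B+ albums can be sketched in a few lines of Python. The counts are copied from the paragraph above; the names COUNTS, STARRED, and grade_shares are hypothetical. Note that the listed tallies add up to 8257 rather than 8268, so the denominator is left as a parameter, and the shares computed this way may differ slightly from the quoted ones.

```python
# Post-2000 jazz counts as listed above. All names here are hypothetical.
COUNTS = {
    "A+": 1, "A": 63, "A-": 883,
    "B+(***)": 1445, "B+(**)": 2122, "B+(*)": 1730,
    "B": 1064, "B-": 364, "C+": 81, "C": 30, "C-": 15, "D+": 2, "D": 2,
}
UNSORTED_B_PLUS = 455  # B+ albums recorded without a star level
STARRED = ("B+(***)", "B+(**)", "B+(*)")


def grade_shares(counts, unsorted_b_plus, total=None):
    """Return percent share per grade, spreading the unstarred B+ albums
    across the three star levels in proportion to their existing counts."""
    starred_total = sum(counts[g] for g in STARRED)
    adjusted = dict(counts)
    for g in STARRED:
        adjusted[g] += unsorted_b_plus * counts[g] / starred_total
    if total is None:
        # Default denominator: everything actually tallied.
        total = sum(adjusted.values())
    return {g: 100 * n / total for g, n in adjusted.items()}


if __name__ == "__main__":
    shares = grade_shares(COUNTS, UNSORTED_B_PLUS, total=8268)
    for grade, pct in shares.items():
        print(f"{grade:8s} {pct:6.2f}%")
```

With total=8268 the three starred levels together come out near 69.6%, in line with the overall B+ figure quoted above.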

Taking all this into account, a few days back I proposed to shift my grade scale a bit leftward, combining A/A+ at 10 (still just the top 1% of rated albums), moving A- to 9 (10%, so the top decile), the B+ tiers to 8-7-6 (all records that will repay your interest), B to 5, B- to 4, C+ to 3, C or C- to 2, all D to 1. Of course, the latter ranks will be underrepresented. The only real reason for flagging a bad album is to warn consumers who might otherwise be tempted, but most bad records never tempt anyone -- they come from people you don't know or care about, and quickly vanish without a trace.

So I wrote my proposal up and sent it around to various critics, most of whom didn't like it. For example, Robert Christgau wrote back: "I definitely think everything shd be a notch down, with perhaps a somewhat lenient view of what constitutes an A plus than in my system." So I should shift some A records to 10, leave the rest at 9, peg A- at 8, and let everything else fall accordingly, combining various lower grades I rarely use anyway. Splitting out more bins on the left would provide a more even distribution, but keeping 9 and 10 reserved for less than 1% also suggests a fetish for perfection that hardly anything can achieve. I'm not sure that's either useful or achievable.

A couple others mentioned the Spin guide as a familiar model, with the implication that A- should be pegged at 8 (or maybe split between 7-8). However, my copy defines 10 as "an unimpeachable masterpiece or a flawed album of crucial historical importance" and 7-9 as "well worth buying, sure to provide you with sustained pleasure," and they even have kind words for 4-6 if you're "deeply interested in the artist or genre." I'd be curious to see a histogram of those grades: how does the distribution line up with my own data? My mapping would put A- through B+(**) into the 7-9 range, as various degrees of records I recommend (indeed, that I store separately from recent jazz graded lower), while the 4-6 range gets B- to B+(*) -- the latter are records that I respect and sometimes even admire but don't much feel like playing again (those usually go to the basement, but thus far I haven't discarded any).

Of course, if one started from scratch, one could devise an elegant distribution curve (say 4-7-10-13-16-16-13-10-7-4, or 2-5-9-14-20-20-14-9-5-2) and sort everything accordingly. But that assumes you can rank everything before slicing it into tranches, something that based on no small experience I find impossible. But more importantly for me, I need some way to mechanically transcribe the letter grades I have into numerical grades. So while I might get a more pleasing curve if I could move the upper half of my A- records from 8 to 9 and the upper third of my B+(***) albums from 7 to 8 and slide some slice starting at B+(*) down a notch, it would be hell for me to try to figure out how to split my existing levels. (It's going to be bad enough just to divvy up the unsorted B+ records.)
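For what it's worth, the slicing itself is the easy part once a full ranking exists. The sketch below assumes the records are already ranked best-first (the very step described above as impossible) and that the curve's numbers are percentages running from score 10 down to score 1. The function name assign_by_curve is hypothetical.

```python
from itertools import accumulate

# One of the curves mentioned above, read as percent of records per score,
# listed from score 10 down to score 1 (an assumption; the curve is symmetric).
CURVE = [4, 7, 10, 13, 16, 16, 13, 10, 7, 4]


def assign_by_curve(ranked, curve=CURVE):
    """Given items ranked best-first, return (item, score) pairs whose score
    counts follow the percentage curve as closely as rounding allows."""
    if sum(curve) != 100:
        raise ValueError("curve percentages must sum to 100")
    n = len(ranked)
    # Cumulative cut points, expressed as positions in the ranked list.
    cuts = [round(n * c / 100) for c in accumulate(curve)]
    result = []
    tier = 0
    for i, item in enumerate(ranked):
        # Advance to the tranche that contains position i.
        while i >= cuts[tier]:
            tier += 1
        result.append((item, len(curve) - tier))
    return result


if __name__ == "__main__":
    albums = [f"album-{k:03d}" for k in range(1, 101)]  # placeholder ranking
    scored = assign_by_curve(albums)
    print(scored[0], scored[-1])
```

The hard input is the ranked list, not the arithmetic, which is why a mechanical letter-to-number table is the more practical route for an existing database.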

In the Introduction to my ratings database, I wrote:

I've been accumulating records since the mid-1970s, and have sporadically written about popular music since then. . . . The database evolved from simple lists just to keep track of stuff -- originally records that I had listened to, then it grew to include records that other people think are worth listening to. . . . The grades probably say more about me than about the music.

The ratings are letter grades, similar to Robert Christgau's Consumer Guides.