Open Music Database
The following is a set of notes that begin to define a project
that I call Open Music Database. Its purpose is to create
a free discographical database of recorded music. The project
needs a good deal of design work to get under way.
There are other music databases online (including mine), but they all
have significant limitations. By far the largest such database is the
All-Music Guide; it is wonderful, but has a lot of problems.
I think that a project like this could eventually draw a fair
amount of industry support. One of my hopes here is to get
reviewers involved, and eventually build up a large metadata
log of writings, comments, and ratings.
- All software must meet the standard of the
Open Source Definition.
The preferred software license is
- The data must be unencumbered. The data must be freely downloadable
and reusable, without restriction.
- The data should be partitionable: it should be possible for users to
download a coherent database that covers a well-defined subset of the
total data. (For example, a database of reggae artists and records.)
- Users should be allowed to add their own data and layered software
without restriction. (For example, a record store should be able to
add inventory and price data, and interfaces to that data.)
- It should be possible to attach arbitrary metadata to database
records. These may include: text/quotes, graphics, sound, video.
- Metadata is generally considered outside of the database itself.
In particular, downloading the database need not deliver metadata;
rather, downloading metadata should be able to attach itself to
- Sound and/or video should not be a priority. However, we should
make it as easy as possible for industry to contribute freely
- Licensing for metadata is TBD, but probably wide open. We should
probably allow other parties to sell compatible metadata.
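As a rough illustration of the "partitionable" requirement above, here is a minimal sketch of extracting a coherent subset by style. The `Record` and `Artist` classes, their fields, and `partition_by_style` are all hypothetical names invented for this example, and the sample rows are only illustrative. The point is that a partition should carry along every artist its records refer to, so the subset has no dangling references.

```python
from dataclasses import dataclass, field


@dataclass
class Artist:
    artist_id: int
    name: str


@dataclass
class Record:
    record_id: int
    title: str
    artist_id: int
    styles: set = field(default_factory=set)


def partition_by_style(records, artists, style):
    """Return records tagged with `style` plus every artist they reference."""
    kept_records = [r for r in records if style in r.styles]
    needed_ids = {r.artist_id for r in kept_records}
    kept_artists = [a for a in artists if a.artist_id in needed_ids]
    return kept_records, kept_artists


# Illustrative sample data only.
artists = [Artist(1, "The Wailers"), Artist(2, "Sidney Bechet")]
records = [
    Record(10, "Catch a Fire", 1, {"reggae"}),
    Record(11, "Summertime", 2, {"jazz"}),
]

reggae_records, reggae_artists = partition_by_style(records, artists, "reggae")
```

A real partition would also have to follow references to persons, labels, and songs; the same "collect the referenced ids, then filter" step would apply to each.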
- General information service: web interface to whole database. This
database can be mirrored. Different copies of the database can be
set up with different user preferences (e.g., language).
- Specialized information service: a partition of the database for
a special interest group (e.g., style/category, label).
- Framework for a record store catalog, auction or trading service.
This could include a record store kiosk.
- Framework for a record review/rating service.
- Framework for a personal inventory.
- There should be a uniform framework for subjective ratings of
artists and recordings. There may be multiple rating categories
(e.g., sound quality).
- Ratings are to be provided by approved, registered reviewers.
Each reviewer must answer a questionnaire that provides some
background information on the reviewer, along with data whereby
a user can correlate his or her interests with a set of reviewers.
- Each database site can establish its own set of reviewers.
Queries / Reports
- By person/artist:
- By title:
- By label:
- By style/category:
- By instrument/language:
- By date:
- By ratings:
- By song:
- Compound queries: It should be possible for the user to construct
logically complex queries. This may be done through a metalanguage
(like SQL, but probably not SQL) and/or GUI. The metalanguage
should have popular language bindings (e.g., Perl).
- Reports (e.g., what shows up in your web browser as a result of a
query) should also be scriptable. I.e., you should be able to ask
for certain information to be formatted in certain ways.
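To make the compound-query and scriptable-report ideas above more concrete, here is a minimal sketch using composable predicates in Python. It is not a proposal for the actual metalanguage; the helper names (`field_equals`, `field_between`, `all_of`, `any_of`, `negate`, `run_query`, `report`) and the row layout are assumptions made up for this example.

```python
def field_equals(name, value):
    """Predicate: the row's field equals `value`."""
    return lambda row: row.get(name) == value


def field_between(name, low, high):
    """Predicate: the row's field lies in the inclusive range [low, high]."""
    return lambda row: row.get(name) is not None and low <= row[name] <= high


def all_of(*preds):
    return lambda row: all(p(row) for p in preds)


def any_of(*preds):
    return lambda row: any(p(row) for p in preds)


def negate(pred):
    return lambda row: not pred(row)


def run_query(rows, pred):
    return [row for row in rows if pred(row)]


def report(rows, template):
    """Format each row with a user-supplied template string."""
    return [template.format(**row) for row in rows]


# Placeholder rows, not real catalog data.
rows = [
    {"title": "Record A", "style": "jazz", "year": 1957, "rating": 8.5},
    {"title": "Record B", "style": "blues", "year": 1962, "rating": 9.0},
    {"title": "Record C", "style": "jazz", "year": 1955, "rating": 3.0},
    {"title": "Record D", "style": "reggae", "year": 1973, "rating": 9.5},
    {"title": "Record E", "style": "blues", "year": 1951, "rating": 7.0},
]

# (jazz OR blues) AND recorded in the 1950s AND NOT rated below 5.
query = all_of(
    any_of(field_equals("style", "jazz"), field_equals("style", "blues")),
    field_between("year", 1950, 1959),
    negate(field_between("rating", 0, 4.9)),
)

hits = run_query(rows, query)
lines = report(hits, "{title} ({year}): {rating}")
```

The same predicate tree could be built by a GUI or parsed from a textual query language; the template string stands in for whatever report scripting is eventually chosen.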
Integrity and Accountability
- Updates to the database and source code are restricted to known
contributors. Direct access may be limited to moderators, who
can screen input from contributors. (I don't know whether there
will be a need to limit access by area.)
- There should also be a reviewer role. Reviewers can specify
areas of interest/expertise, and will be notified of all changes
in those areas (proposed and actual). Notification should be
- Each change to the database should be logged. The log may include
information on the authority for the entry, so that we can judge
correctness and resolve controversies.
- There should be a maintained list of open questions and uncertainties.
- We may want a means to lock information that is considered certain
(e.g., that Louis Armstrong was born in 1901, not 1900).
- Applicability to other domains (of which Movies, Books, and
Objets d'Art are the most tempting). These may be good projects,
and may ultimately share a good deal of software, but should not
be considered requirements.
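Returning to the integrity requirements (logged changes, a recorded authority for each entry, and locking of settled facts), the following is a minimal sketch of how those three ideas could fit together. `LoggedStore`, `Change`, and `LockedFieldError` are hypothetical names, the contributor and source strings are placeholders, and an in-memory dictionary stands in for the real database.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class Change:
    key: tuple            # (record id, field name)
    old_value: Any
    new_value: Any
    contributor: str
    authority: str        # the source cited for the new value
    timestamp: datetime


class LockedFieldError(Exception):
    """Raised when someone tries to change a locked field."""


@dataclass
class LoggedStore:
    values: dict = field(default_factory=dict)
    locked: set = field(default_factory=set)
    log: list = field(default_factory=list)

    def update(self, record_id, field_name, new_value, contributor, authority):
        key = (record_id, field_name)
        if key in self.locked:
            raise LockedFieldError(f"{record_id}.{field_name} is locked")
        # Log first, so the old value is captured before it is overwritten.
        self.log.append(Change(key, self.values.get(key), new_value,
                               contributor, authority,
                               datetime.now(timezone.utc)))
        self.values[key] = new_value

    def lock(self, record_id, field_name):
        self.locked.add((record_id, field_name))


store = LoggedStore()
store.update("person:armstrong", "birth_year", 1900,
             "contributor-a", "placeholder source A")
store.update("person:armstrong", "birth_year", 1901,
             "contributor-b", "placeholder source B")
store.lock("person:armstrong", "birth_year")

try:
    store.update("person:armstrong", "birth_year", 1900,
                 "contributor-c", "placeholder source C")
    rejected = False
except LockedFieldError:
    rejected = True
```

Because every change keeps the old value and the cited authority, a controversy can be reviewed from the log alone, and reviewer notification could be driven by the same log entries.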
I am not a database expert, and suggesting a representation for this
domain (which is large, complex, and ill-behaved) is beyond my skills.
The following are notes on the types of information that need to
be represented. How to do this is TBD.
One record per person, in any way referenced by database. Not to
be confused with Artists, below.
- Unique identifier.
- Names: Given (birth) name, other legal names, nicknames, aliases,
pseudonyms. Any of these could be search targets (may also need
common alternate spellings, e.g.: Grappelli, Grappelly). There
should be a header name; i.e., the name that appears at the
head of a person listing.
- Association list: membership in Artists.
- Birth/death date/place. Note that dates and places may be more or
less precise (e.g., June 6 1914, 1914, 1912-1915, 1910s; US, US:NY,
US:NY:Buffalo). We may also want to denote uncertainty.
- Instrument list (including vocals): would be nice to quantify somehow,
so we don't pick up spurious hits.
- Language list: for vocalists, languages of material recorded in.
We may want to enforce some minimal threshold; e.g., does Debbie
Harry qualify for French?
- Ratings, notes, metadata.
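One possible way to handle the varying precision of dates and places noted above is to store every date as an earliest/latest pair, and every place as a colon-separated path. This is only a sketch under those assumptions; `FuzzyDate` and `place_within` are hypothetical names, not a settled representation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FuzzyDate:
    earliest: date
    latest: date
    uncertain: bool = False   # set when even the range itself is in doubt

    @classmethod
    def from_day(cls, year, month, day, uncertain=False):
        exact = date(year, month, day)
        return cls(exact, exact, uncertain)

    @classmethod
    def from_year(cls, year, uncertain=False):
        return cls(date(year, 1, 1), date(year, 12, 31), uncertain)

    @classmethod
    def from_year_range(cls, first, last, uncertain=False):
        return cls(date(first, 1, 1), date(last, 12, 31), uncertain)

    def overlaps(self, other):
        """True if the two date ranges share at least one day."""
        return self.earliest <= other.latest and other.earliest <= self.latest


def place_within(place, region):
    """True if `place` equals `region` or is nested inside it."""
    return place == region or place.startswith(region + ":")


# The examples from the notes, expressed as ranges.
exact_day = FuzzyDate.from_day(1914, 6, 6)
one_year = FuzzyDate.from_year(1914)
few_years = FuzzyDate.from_year_range(1912, 1915)
decade = FuzzyDate.from_year_range(1910, 1919)   # "1910s"
```

With this layout a query such as "born in the 1910s in New York State" reduces to `decade.overlaps(birth_date)` and `place_within(birth_place, "US:NY")`, whatever precision each entry was recorded with.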
I'm using Artist to refer to any name under which a record is released,
which may be a person, an alias, or a group.
- Unique identifier.
- Name membership. Groups with membership that changes over time may
have one entry per name set (e.g., Rolling Stones #1 with Brian Jones;
#2 with Mick Taylor; #3 with Ron Wood); these need to be grouped.
- Active dates: minimally begin/end (or still active).
- Ratings, notes, metadata.
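The Person/Artist distinction and the "one entry per name set" idea can be sketched as follows. The class and field names (`Person`, `Lineup`, `group_id`) are assumptions for illustration; the membership lists are abbreviated and the years are approximate, so treat the data as an example rather than as reference information.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    person_id: int
    header_name: str


@dataclass(frozen=True)
class Lineup:
    artist_id: int               # unique per name set
    group_id: int                # shared by all lineups under one name
    name: str
    member_ids: frozenset
    active_from: int
    active_to: Optional[int] = None   # None means still active


def lineups_for_group(lineups, group_id):
    """All entries for one group, in chronological order."""
    found = [l for l in lineups if l.group_id == group_id]
    return sorted(found, key=lambda l: l.active_from)


def lineups_for_person(lineups, person_id):
    """The association list: every lineup a person belonged to."""
    return [l for l in lineups if person_id in l.member_ids]


jagger, richards = Person(1, "Mick Jagger"), Person(2, "Keith Richards")
jones, taylor, wood = (Person(3, "Brian Jones"), Person(4, "Mick Taylor"),
                       Person(5, "Ron Wood"))

# Abbreviated membership and approximate years, for illustration only.
lineups = [
    Lineup(103, 100, "Rolling Stones #3", frozenset({1, 2, 5}), 1975),
    Lineup(101, 100, "Rolling Stones #1", frozenset({1, 2, 3}), 1962, 1969),
    Lineup(102, 100, "Rolling Stones #2", frozenset({1, 2, 4}), 1969, 1974),
]

stones = lineups_for_group(lineups, 100)
```

The shared `group_id` is what lets ratings and searches treat the three entries as one group while still recording who played on which records.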
Records / Releases
A record is a set of releases, where each release has substantially the
same contents under substantially (or exactly?) the same title. The main
reason for the grouping is that things like ratings get diluted without
the grouping. (Although ratings could be summed for several related
records; e.g., "Greatest Hits", "The Best of ...", "The Very Best of ...",
"The Ultimate ...", etc.)
Also, while most information is common among releases, some will vary.
Mostly common information:
- Dates recorded: minimally start-end; may want individual session
dates. Note that in many (most?) cases date recorded is only known
to be before first date released.
- Dates for initial release.
- Style / category information.
- Previously released: if so, where is original?
- Related records, including how much intersection?
- Detailed personnel list.
- List of songs.
- Ratings, notes, metadata.
Information that tends to vary by release:
- ID number.
- Release date.
- Media: LP, CD, etc. (I'd be happy to just do CD only, at least to
- In print flag.
- Number manufactured: estimated or actual?
- UPC code.
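A minimal sketch of the record/release grouping described above: ratings attach to the record, so they are not diluted across reissues, while per-release details such as media and the in-print flag stay on each release. All class and field names are hypothetical, and the sample record is a placeholder.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Optional


@dataclass
class Release:
    release_id: str
    release_year: int
    media: str                 # e.g. "LP", "CD"
    in_print: bool
    upc: Optional[str] = None


@dataclass
class Record:
    record_id: int
    title: str
    releases: list = field(default_factory=list)
    ratings: list = field(default_factory=list)

    def add_rating(self, value):
        """Ratings are stored once per record, whichever release was heard."""
        self.ratings.append(value)

    def average_rating(self):
        return mean(self.ratings) if self.ratings else None

    def in_print_releases(self):
        return [r for r in self.releases if r.in_print]


# Placeholder data: one record with an original LP and a later CD reissue.
album = Record(1, "Example Album")
album.releases.append(Release("LP-0001", 1965, "LP", in_print=False))
album.releases.append(Release("CD-0001", 1990, "CD", in_print=True))

album.add_rating(8.0)   # from a reviewer who heard the LP
album.add_rating(9.0)   # from a reviewer who heard the CD
```

Summing ratings across related records (the "Greatest Hits" case) would need one more level of grouping above `Record`, which this sketch leaves out.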
- Company or other grouping: e.g., Polygram for hundreds of labels.
- Contact information (address, phone number, URL).
- Distribution information: who distributes in which markets?
- Dates active.
- Notes, metadata.
Styles / Categories
Style or genre classification is necessary for targeting queries and
evaluating reviewers. The definition of such styles / genres is often
very subjective, and sorting it out will be a lot of trouble. Also,
note that hierarchical models (like AMG) break down with fusion
In addition to conventional styles / genres, we may want to provide
other categories that relate to conceptual classification. For example,
I once proposed a "quark scheme" of classification: up, down, strange,
charm, top, bottom. Categories may vary by style / genre.
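One way around the limits of a strict hierarchy, offered only as a sketch, is to let a style have more than one parent, so that a hybrid style is found under each of its sources. The style names, the `STYLE_PARENTS` table, and the helper functions below are illustrative assumptions, not a proposed taxonomy.

```python
# Each style maps to the set of its parent styles (possibly more than one).
STYLE_PARENTS = {
    "jazz": set(),
    "rock": set(),
    "bebop": {"jazz"},
    "fusion": {"jazz", "rock"},
}


def ancestors(style, parents):
    """Collect every style reachable by following parent links upward."""
    seen = set()
    stack = [style]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def matches_style(record_styles, wanted, parents):
    """True if any of the record's styles is `wanted` or descends from it."""
    return any(s == wanted or wanted in ancestors(s, parents)
               for s in record_styles)


fusion_under_jazz = matches_style({"fusion"}, "jazz", STYLE_PARENTS)
fusion_under_rock = matches_style({"fusion"}, "rock", STYLE_PARENTS)
bebop_under_rock = matches_style({"bebop"}, "rock", STYLE_PARENTS)
```

A record can also simply carry several style tags at once; the two approaches combine without conflict.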
- Title: there may be variations in title listings.
- Associations: derivations, answer songs.
- Date composed.
- Links to recordings/artists.
- Ratings: identify classic/quintessential performance(s). (Example,
"Summertime", IMHO: Sidney Bechet, The Ravens, Billy Stewart,
Janis Joplin, Zoot Sims.)
- Value: should be a uniform numeric scale; e.g., 0 .. 10 or -5 .. +5;
possibly with 0.1 optional precision.
- Credentials: what qualifies the reviewer to review, including
- Rating histograms: how does reviewer apportion ratings?
- Contact information: needed for maintenance, but public view
should be optional.
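The uniform rating scale and the per-reviewer histogram can be sketched together. This assumes the 0 .. 10 scale with 0.1 precision (one of the two options mentioned above) and whole-number histogram buckets; the function names and the sample ratings are hypothetical.

```python
import math
from collections import Counter

MIN_RATING, MAX_RATING = 0, 10


def normalize_rating(value):
    """Check the range and round to the optional 0.1 precision."""
    if not MIN_RATING <= value <= MAX_RATING:
        raise ValueError(f"rating {value} is outside {MIN_RATING}..{MAX_RATING}")
    return round(float(value), 1)


def rating_histogram(values):
    """Count a reviewer's ratings per whole-number bucket, zeros included."""
    counts = Counter(math.floor(normalize_rating(v)) for v in values)
    return {bucket: counts.get(bucket, 0)
            for bucket in range(MIN_RATING, MAX_RATING + 1)}


# Placeholder ratings from one hypothetical reviewer.
reviewer_ratings = [7.5, 8.0, 8.4, 3.0, 10]
histogram = rating_histogram(reviewer_ratings)
```

Comparing histograms shows how each reviewer apportions ratings, for example whether an 8 from one reviewer is routine or exceptional, which is useful before averaging ratings across reviewers.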