Exploring my music collection: foobar and folders

My music collection

Listening to music is one of my hobbies and I have a large collection of CDs.  (Huh, CDs, what are those again?)  All my CDs have been meticulously ripped to my computer as lossless files and then converted to high-bitrate MP3s.  I have ripped my music as lossless files out of habit, even during the heyday of peer-to-peer file sharing and before online music stores (e.g. iTunes Store) became popular, so that I can archive my collection.  Of course I use music streaming services such as Spotify today, but more for music discovery than for listening to my favourite albums.

I wanted to know ‘more’ about my music and make my listening experience fun and personal, so I started to use foobar2000 and was attracted by the variety of its components.  Over time, I have tagged my music files with all sorts of information, such as beats-per-minute (BPM) and ‘beat intensity’.  Through all my tagging, I have been able to produce playlists for my collection and search my collection more efficiently.

Without further ado, let’s see a glimpse of my music collection.

Image of my music collection (foobar2000).
My music collection in action (currently playing The Killers).

I have nearly 400 releases (albums, extended plays, singles, etc.) in my collection and that amounts to almost 19 days of music.

As that glimpse shows, I like to have a nice visual element to my music library and the listening experience.  I have album art for everything in my collection and other images such as the disc of an album (where available).  If anything, I think vinyl has had a resurgence in the last 10 years because of the large artwork (eye candy) and the more physical nature of the medium, compared to CDs and digital downloads.

There are so many questions that I think could be answered by writing programs for my collection.  Streaming services already do this at scale with their own algorithms and big data systems.  Here are some examples that I could explore:

  • Are there artists in my collection that tend to have songs with strong rhythms?  (Perhaps more than I might expect?)
    • I would expect genres such as house, funk, ska and electronic dance music to have songs with prominent rhythms.  In contrast, jazz numbers and ballads would not have prominent beats.
  • What songs could go into my party playlists?  e.g. BPM range of 100 to 140 and songs with a strong rhythm seem ideal.  How much music would I have left when filtering the playlist to a certain decade, e.g. 2000s?
  • Could I categorise my collection based on the tags and metadata, e.g. via K-means clustering?
  • What are the most common colours across the album artwork in my collection?  Is the black-and-white palette dominated by post-punk albums or certain artists (e.g. Joy Division, the xx)?
  • What are the longest and shortest songs in my collection, by: genre, artist, year, etc.?
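As a rough illustration of the party-playlist question, the filter could look something like the sketch below.  The Track fields, the 0-to-1 ‘beat intensity’ scale and the 0.7 threshold are assumptions made for the example; they are not the actual tag names or values in the library.

```python
from dataclasses import dataclass


@dataclass
class Track:
    # Hypothetical fields; the real tags in the library may be named differently.
    title: str
    artist: str
    year: int
    bpm: float
    beat_intensity: float  # assumed to be on a 0-to-1 scale


def party_candidates(tracks, bpm_min=100, bpm_max=140,
                     min_intensity=0.7, decade=None):
    """Return tracks in the BPM range with a strong rhythm.

    If decade is given (e.g. 2000), keep only tracks from that decade.
    """
    picked = []
    for track in tracks:
        if not (bpm_min <= track.bpm <= bpm_max):
            continue
        if track.beat_intensity < min_intensity:
            continue
        if decade is not None and not (decade <= track.year < decade + 10):
            continue
        picked.append(track)
    return picked
```

Calling party_candidates(tracks, decade=2000) and comparing its length with party_candidates(tracks) would show how much music is left after restricting the playlist to the 2000s.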

Making it easier to maintain and search my collection

My library is organised into a directory with this folder structure:

[album artist]\[year] – [album]

It works well for finding files quickly, but it is very difficult to ‘zoom out’ or take a ‘macro’ view of my collection.  This is where programming can help: automating tedious tasks.  For example, if I want to batch rename a selection of files, then a program could do that quickly and efficiently.

In my album folders, I can have a lot of ‘non-music’ files:

  • folder.jpg: the album art image
  • artist.jpg: image of the artist
  • disc.png: image of CD or record, with transparency
  • LOG file that has information on my music rip, e.g. accurate rip statistics, errors with the rip

As I have two separate libraries stored on different hard drives—my lossless and MP3 libraries—it can be hard to keep track of everything.  For example, I might have added album artwork to my MP3 library but not to my lossless library.
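One way to spot that kind of mismatch is to compare the image files found in each library, folder by folder.  The sketch below assumes both libraries share the same folder structure; the function names are made up for the example.

```python
from pathlib import Path

IMAGE_NAMES = ("folder.jpg", "artist.jpg", "disc.png")


def album_images(library_root):
    """Map each folder (relative to the library root) to the images it holds."""
    root = Path(library_root)
    found = {}
    for name in IMAGE_NAMES:
        # rglob() searches the root and every folder below it.
        for path in root.rglob(name):
            rel_folder = path.parent.relative_to(root)
            found.setdefault(rel_folder, set()).add(name)
    return found


def missing_images(source_root, target_root):
    """List (folder, image name) pairs present in source but not in target."""
    source = album_images(source_root)
    target = album_images(target_root)
    missing = []
    for folder, names in sorted(source.items()):
        for name in sorted(names - target.get(folder, set())):
            missing.append((folder, name))
    return missing
```

Running missing_images(mp3_root, lossless_root) and then swapping the arguments would show the gaps in both directions.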

I have produced my own program to list directory paths that have image files, as one way to make it easier to maintain my music library.

These are some questions I wanted to answer through my program:

  • How many images are in my music library’s directory?
  • How much hard drive storage is being used for the images?

With about 100 lines of code, I made a program that works a treat for my collection.

Music library images program output: start

Here is a snippet of the code for the main function:

import os
import time


def list_images_in_lib():
    # Start time
    start_time = time.time()
    # 1: check album art
    tot_f, tot_f_size = (0, 0)
    print("\n", '*' * 20, "folder.jpg", '*' * 20, "\n")
    # os.walk() yields (dirpath, dirnames, filenames) for every directory,
    # so 'albums' below holds the file names found inside 'root'.
    for root, albumArtists, albums in os.walk(os.getcwd()):
        for file in albums:
            if file.endswith("folder.jpg"):
                path = os.path.join(root, file)
                print(path)
                tot_f += 1
                # File size in megabytes (1 MB = 1,000,000 bytes)
                tot_f_size += os.path.getsize(path) / 1000000
    print("\n", "Number of 'folder' images in library:", tot_f, "(%f MB)" % tot_f_size)
    # 2: check artist art
    tot_a, tot_a_size = (0, 0)
    # ...LARGE SNIP...
    # (the snipped code also defines tot_d and tot_d_size)
    # ................
    print("\n", "Number of 'disc' images in library:", tot_d, "(%f MB)" % tot_d_size)
    # Calculate amount of time needed to process all the images:
    end_time = time.time()
    elapsed_time = end_time - start_time
    tot_images = tot_f + tot_a + tot_d
    tot_images_size = tot_f_size + tot_a_size + tot_d_size
    print("\n", "Time to process all", tot_images, "images:",
          round(elapsed_time, 6), "seconds",
          "(%f MB)" % tot_images_size)

The list_images_in_lib function uses Python’s os.walk() and loops to get all the information needed for my files.  Fortunately, my music library is structured in a way that can make use of the os.walk() tuple results neatly.

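The working directory here is ‘C:\…\MP3-library’.  For every directory it visits, os.walk() yields a tuple of the directory path, the names of its subdirectories and the names of its files, descending one level at a time.  Let’s give some intuition on how it reaches an album through these levels: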

  1. C:\Users\Mark\Music\MP3-library\
  2. Air\
  3. {2016} – Twentyears\

Reaching the album folder is not enough, as the item of interest is a file (e.g. folder.jpg, disc.png) inside it.  The third element of the os.walk() tuple (named albums in my code) holds the file names of the current folder, so another loop is needed to process them: ‘for file in albums: […]’.  Within that loop, accumulator patterns and print statements produce the output of interest: e.g. the stats on file size and the file paths.
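To make the tuple more concrete, here is a small sketch that records what os.walk() yields at each level.  The helper name describe_walk is made up for the example.

```python
import os


def describe_walk(top):
    """Return one (relative path, subdirectories, files) record per directory."""
    records = []
    for dirpath, dirnames, filenames in os.walk(top):
        # Sorting dirnames in place makes the traversal order predictable.
        dirnames.sort()
        records.append(
            (os.path.relpath(dirpath, top), list(dirnames), sorted(filenames))
        )
    return records
```

For a library holding a single album, this gives three records: the library root (listing the artist folder), the artist folder (listing the album folder) and the album folder (listing its files, such as folder.jpg).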

Music library images program output: end

It takes about 6 seconds to traverse my music library and return all the output.  Perhaps there could be more clever use of string concatenation and a way to write the code without a nested loop, but for my purposes the code is working well and I would not benefit much from speed gains, even with a larger collection.
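If speed ever did matter, one option would be to walk the library once and tally all three image types in the same pass.  The snipped sections are not shown, so this sketch assumes each image type currently gets its own pass over the library.  It also matches file names exactly rather than with endswith(), which is a small change in behaviour.

```python
import os
from collections import defaultdict

IMAGE_NAMES = ("folder.jpg", "artist.jpg", "disc.png")


def summarise_images(library_root):
    """Walk the library once; return {image name: (count, total size in MB)}."""
    counts = defaultdict(int)
    sizes = defaultdict(float)
    for root, _dirs, files in os.walk(library_root):
        for name in files:
            if name in IMAGE_NAMES:
                counts[name] += 1
                # 1 MB is taken as 1,000,000 bytes, as in the original code.
                sizes[name] += os.path.getsize(os.path.join(root, name)) / 1_000_000
    return {name: (counts[name], sizes[name]) for name in IMAGE_NAMES}
```

The loop is still nested, but the directory tree is only traversed one time, and the totals for each image type come back in a single dictionary.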

I was surprised that my average image size was anywhere near 1 MB.

There is other functionality that I have included in my program but have not discussed in this post:

  • Check if an external hard drive is connected (e.g. my backup hard drive).
  • Choose music directory (local or external) to use for the program.
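Those two features are not shown in this post, but the idea can be sketched in a few lines: a drive that is not connected simply has no directory at its path.  The function name and the example paths in the comments are hypothetical.

```python
import os


def choose_library(candidates):
    """Return the first candidate directory that currently exists, else None."""
    for path in candidates:
        # isdir() is False when the drive holding the path is not connected.
        if os.path.isdir(path):
            return path
    return None


# Hypothetical usage, with the external drive tried first:
# library = choose_library([r"E:\Music\MP3-library", r"C:\Music\MP3-library"])
```

Listing the external drive first means the program falls back to the local library only when the backup drive is unplugged.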

Tools that could be used to explore my music collection further

The foobar2000 Text Tools component can be used to gather the tags (metadata) for my music and eventually have all the music organised in a spreadsheet format.  A spreadsheet with all my songs opens up immense potential for statistical analysis and even machine learning (e.g. classification of songs).
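Once the tags are in a spreadsheet, even the standard library is enough for simple questions such as the longest song per artist.  This sketch assumes the tags have been saved as a CSV file with columns named artist, title and length_seconds; those column names are invented for the example and depend on how the export is set up.

```python
import csv


def longest_track_per_artist(csv_file):
    """Return {artist: (title, seconds)} for each artist's longest track."""
    longest = {}
    for row in csv.DictReader(csv_file):
        seconds = float(row["length_seconds"])
        artist = row["artist"]
        if artist not in longest or seconds > longest[artist][1]:
            longest[artist] = (row["title"], seconds)
    return longest


# Hypothetical usage:
# with open("tags.csv", newline="", encoding="utf-8") as f:
#     print(longest_track_per_artist(f))
```

Swapping the comparison to ‘seconds < longest[artist][1]’ would give the shortest track instead, and grouping by a genre or year column works the same way.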

Now that I have all my album folders as string output (when removing ‘folder.jpg’), there is also potential to make more programs for library organisation, e.g. batch renaming.
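As a minimal sketch of that idea, the album folders can be recovered by splitting off the file name, and a batch rename can be previewed before anything is touched on disk.  Both function names are made up for the example, and plan_renames only builds a list of proposed names; it does not rename anything.

```python
import os


def album_folders(image_paths, image_name="folder.jpg"):
    """Strip the trailing image file name, leaving the album folder paths."""
    folders = []
    for path in image_paths:
        folder, filename = os.path.split(path)
        if filename == image_name:
            folders.append(folder)
    return folders


def plan_renames(folders, old, new):
    """Return (current, proposed) pairs for folders whose name contains old."""
    plan = []
    for folder in folders:
        parent, leaf = os.path.split(folder)
        if old in leaf:
            plan.append((folder, os.path.join(parent, leaf.replace(old, new))))
    return plan
```

After checking the plan by eye, each pair could be passed to os.rename(current, proposed) to carry out the rename.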


Where should I go next?  It would be great to see how else people have analysed their music collections.