Monday, February 17, 2014

Sox spectrogram log frequency axis and upper/lower frequency limits

Linear spectrogram
As a result of some of the work I did last week on rendering MP3 files to scrolling spectrum waterfall plots, I noticed that most of the interesting detail was in frequencies under 1kHz.  One obvious way to see the detail at lower frequencies, yet still keep the higher frequencies is to plot on a log frequency scale.

Unfortunately this is currently not supported in SoX, so I modified the spectrogram module to add this feature. This I noticed that plotting charts from 1Hz to the nyquist frequency meant that a lot of screen space was wasted in the lower 1 - 50Hz range which is not the most interesting for most audio files. So I also added a switch to set the lower and upper frequencies of the chart (this actually took considerably more work than the plotting to the log scale!).

Log axis spectrogram. Lower frequency details are far more visible.
The modified spectrogram.c file is available here [1]. Visit the SoX site [2], download the source code, replace src/spectrogram.c with this file and compile. This has been tested with the latest code as of 14 Feb 2014.

Summary of changes:
  • Implement -L which plots the spectrogram on a log10 frequency axis. By default from 1Hz to nyquist frequency.
  • Implement -R <low_freq>:<high_freq> : restrict spectrogram frequencies. I also updated the linear scale code to honor this switch.


Example:

sox mymusic.mp3 -n spectrogram -L -R 50:8k

Known problems and observations:
  • At lower frequencies (1 - 100Hz) each spectrum bin in q->dBfs[] array occupies several pixel rows: so it looks blocky. A lesser problem: at the top some frequency bins will be ignored because there will be more than one bin for each pixel row. I could fix the blockyness by getting a more detailed spectrum.
  • If lower and upper frequencies are in the same decade then there are no y axis tick labels. Hopefully will have a fix for that soon.
  • When compiling I'm getting this warning when using log10f() and powf() : "warning: passing argument 1 of 'log10f' as 'float' rather than 'double' due to prototype". According to the man pages for those function they should accept float args! I see some discussion on this relating to compiler switches: but I don't want to go changing anything there.
  • I created two new functions to parse the -R frequency range switch to cut down on code duplication.
    parse_range (const char *s, int *a, int *b)
    parse_num_with_suffix (const char *s, int *a)
    I'm not familiar with the sox code base, so I'm not sure if similar functions already exist (I couldn't find any). I'm also concerned I'm making a temporary change to a string in parse_range when the type is const char *.  I need to brush up on my C :)


Footnotes:

[1] https://github.com/jdesbonnet/joe-desbonnet-blog/tree/master/projects/sox-log-spectrogram

[2] http://sox.sourceforge.net/


3 comments:

Anonymous said...

Great article! I followed your instructions. Got the source, replaced your file, compiled and installed. but nothing changed.

Is it because I had SoX already installed (by apt-get) and didn't install it. I thought Could this be why it's not working?

Joe Desbonnet said...

Sorry for the delay in replying: yes, if you already have it installed, you'll probably need to have an explicit path to the new binary when invoking sox (I can't remember exactly where the binary is
but your command line might like something like this

/home/jbloggs/my-modified-sox/bin/sox ...sox args...

lifeh2o said...

Log graph is stretched out on bottom. So the frequencies are not scaled in log, but instead it is just the image stretched like log scale, right?