Instantaneous Power of Phase-Aligned Signal Segment

Finding periodicity and period length in a signal is essential for some
pitch shifting algorithms. It is often very easy to identify periods by
eye in a signal plot. The example below has an obvious repetitive
character. But how can a computer routine determine the pattern length?
The signal has several zero-crossings per period, and several peaks as
well.

signal with four equally strong harmonics |

Phases of the harmonics in the above test signal were randomly
selected, which makes it hard to identify a period start. This may seem
unrealistic, but I've found that acoustic sounds do not always have
their phases aligned either. Anyhow, a process to align phases is often
used in pitch detection: autocorrelation. Below, the autocorrelation
function of the example signal is plotted, together with spikes indicating
the autocorrelation peak heights and locations:

unbiased autocorrelation of signal with four equally strong harmonics |

With the spikes being so different in height, it is now easy to identify the period length as the interval between two peaks of considerable height.

Short-time autocorrelation has some similarity with convolution. A
segment of the signal is taken, then time-reversed, and used as a
convolution kernel on that same segment. As a result, phases are set to
zero, and the original amplitudes are squared. Autocorrelation output
has length [input length * 2 - 1].

Just like how autoconvolution of a block results in a triangle,
autocorrelation functions have a triangular-ish shape, which is usually
undone by an inverse scaling function (unbiasing). Autocorrelation is
most efficiently done via the frequency domain, as the inverse FFT of the
power spectrum.
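
As a rough illustration of the steps just described, here is a minimal NumPy sketch of an unbiased autocorrelation computed via the power spectrum. The function name and the choice to return only the non-negative lags are my own assumptions for the example, not part of the original patch.

```python
import numpy as np

def unbiased_autocorrelation(x):
    """Autocorrelation via the power spectrum, with the triangular bias divided out."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Zero-pad to 2n samples (at least 2n - 1 is needed) so the circular
    # result has room to grow and does not fold over.
    nfft = 2 * n
    spectrum = np.fft.rfft(x, nfft)
    power = spectrum.real**2 + spectrum.imag**2
    # Inverse FFT of the power spectrum; keep the non-negative lags 0 .. n-1.
    acf = np.fft.irfft(power, nfft)[:n]
    # Lag k is a sum of only n - k products; dividing by that count
    # undoes the triangular shape (unbiasing).
    return acf / (n - np.arange(n))
```

The largest lags are averages over very few products, so in practice one would probably only trust the first half or so of the result.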

It is not uncommon for acoustic sounds to exhibit a harmonic recipe with a weak fundamental. Typically, this happens with low-register tones in vocals and instruments for which the resonator is too small to amplify the low notes. Below is an example test signal featuring three harmonics with amplitude ratios 0.2, 1.0 and 0.2. Phases are randomised again, and it is hard to imagine how a computer could tell the period length here, even though the signal lobes are still somewhat different for the eye.

signal with weak fundamental |
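
A test signal of this kind could be generated along the following lines. This is only a sketch: the function name, the period of 64 samples, the segment length and the random seed are all arbitrary choices of mine, picked so that the harmonics fit the frame exactly.

```python
import numpy as np

def harmonic_test_signal(amplitudes, period, n_samples, rng):
    """Sum of harmonics of one fundamental, each with a random start phase."""
    n = np.arange(n_samples)
    signal = np.zeros(n_samples)
    for h, amp in enumerate(amplitudes, start=1):
        # Harmonic number h, amplitude amp, phase drawn uniformly from [0, 2*pi).
        phase = rng.uniform(0.0, 2.0 * np.pi)
        signal += amp * np.cos(2.0 * np.pi * h * n / period + phase)
    return signal

rng = np.random.default_rng(1)
# Weak fundamental: amplitude ratios 0.2, 1.0 and 0.2.
weak_fundamental = harmonic_test_signal([0.2, 1.0, 0.2], period=64, n_samples=512, rng=rng)
# Four equally strong harmonics.
four_harmonics = harmonic_test_signal([1.0, 1.0, 1.0, 1.0], period=64, n_samples=512, rng=rng)
```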

Indeed, the autocorrelation function of the above signal is not very
conclusive. Sure, the peaks are still a bit different in height, but
natural sound level fluctuations and noise would swamp such
differences. This is a likely cause of errors in autocorrelation
analysis. The dreadful octave errors!

autocorrelation of signal with weak fundamental |

Using the amplitude spectrum instead of the power spectrum, a signal segment can be made with the original amplitudes of the components and all phases set to zero. Here is the PASS function (Phase-Aligned Signal Segment) corresponding to the signal with weak fundamental:

PASS of signal with weak fundamental |

The PASS function's periodicity peaks are more pronounced than in
the autocorrelation. From PASS, it is possible to construct another
function which really stands out in the missing-fundamental case:
IPPASS, Instantaneous Power of Phase-Aligned Signal Segment.

IPPASS of signal with weak fundamental |

For comparison, here is the test signal once more from which the above IPPASS was derived:

signal with weak fundamental |

The IPPASS function is derived from the two phases of an analytic
signal, zero-phase and quadrature phase. If a test signal is made of
cosine components plus sine components of the same weight, we already
have an analytic signal. When the signal is connected to an x-y
oscilloscope, you can see the amplitude curve which is followed by the
signal. The amplitude is a radius on the complex plane. A pure sinewave
describes a circle, but a harmonic recipe shows periodic amplitude
modulations. It is funny that we do not hear the amplitude modulations
as such. Only when a false note is played do we start to perceive explicit
amplitude modulations: beats or roughness.

The x-y plots below show a few examples, with their harmonic recipes indicated as fractions again. The second plot has the weak-fundamental recipe from the earlier test signal. In each case, the amplitude plot describes several (sub)cycles, but there is only one large amplitude peak per full period.

1.0 + 1.0 |
0.2 + 1.0 + 0.2 |
1.0 + 1.0 + 1.0 + 1.0 |

The IPPASS function uses the square of the amplitude, the instantaneous
power. The range of the power function is expanded, making it easier to
distinguish small peaks from large peaks.
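
To put a number on that range expansion, the sketch below builds one period of a zero-phase analytic signal for the 0.2 + 1.0 + 0.2 recipe and compares the largest-to-smallest ratio of amplitude and of power. The helper name and the resolution of 360 samples per period are assumptions made for the example.

```python
import numpy as np

def analytic_recipe(amplitudes, samples_per_period=360):
    """One period of a zero-phase analytic signal.

    Harmonic h with amplitude a contributes a * exp(i * h * theta),
    i.e. a cosine in the real part and a sine of equal weight in the imaginary part.
    """
    theta = 2.0 * np.pi * np.arange(samples_per_period) / samples_per_period
    return sum(a * np.exp(1j * h * theta) for h, a in enumerate(amplitudes, start=1))

z = analytic_recipe([0.2, 1.0, 0.2])
amplitude = np.abs(z)      # radius on the complex plane
power = amplitude ** 2     # instantaneous power

amp_ratio = amplitude.max() / amplitude.min()
pow_ratio = power.max() / power.min()
```

For this recipe the amplitude works out to |1 + 0.4 cos(theta)|, so it swings between 0.6 and 1.4, while the power swings between 0.36 and 1.96.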

For some harmonic recipes, the instantaneous amplitude curve
describes several equal traces per period. The case plotted below has
two odd harmonics, a clarinet-like recipe. For such cases, the IPPASS
function does not help in tracking the period, but instead it is
confusing. And in the case of a pure sinusoid, IPPASS shows a flat
line. Fortunately, such signals show unambiguous peaks in the
autocorrelation or in the phase-aligned signal segment. PASS and IPPASS
are sort of complementary in this sense.

1.0 + 0.0 + 0.8 |
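
The ambiguity can be checked numerically by counting how many peaks of the instantaneous power reach the global maximum within one period. This is a sketch with a hypothetical helper name and an arbitrary tolerance, using idealised zero-phase recipes rather than measured signals.

```python
import numpy as np

def count_top_peaks(amplitudes, samples_per_period=360, tolerance=1e-9):
    """Count local maxima of the instantaneous power that reach the global maximum."""
    theta = 2.0 * np.pi * np.arange(samples_per_period) / samples_per_period
    z = sum(a * np.exp(1j * h * theta) for h, a in enumerate(amplitudes, start=1))
    power = np.abs(z) ** 2
    # A sample is a peak if it exceeds both circular neighbours by more than
    # the tolerance, so rounding noise on a flat curve is not counted.
    is_peak = ((power - np.roll(power, 1) > tolerance)
               & (power - np.roll(power, -1) > tolerance))
    is_top = power > power.max() - tolerance
    return int(np.sum(is_peak & is_top))

weak_fundamental_peaks = count_top_peaks([0.2, 1.0, 0.2])   # one main peak per period
odd_harmonics_peaks = count_top_peaks([1.0, 0.0, 0.8])      # two equal peaks per period
pure_sine_peaks = count_top_peaks([1.0])                    # flat line, no peak at all
```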

IPPASS is the instantaneous power of a phase-aligned signal segment.
Alignment of the phases is required to concentrate as much signal
energy as possible in a main periodical peak, analogous to the purpose
of autocorrelation. That aspect is best described as a frequency-domain
characteristic, the amplitude spectrum of a signal segment:

where k are the frequency bin indexes |

The PASS function is taken as the zero-phase signal resulting from
the inverse Fourier transform of the amplitude spectrum:

where n are the PASS sample indexes |

Likewise, a quadrature phase version of the phase-aligned segment is created as the inverse Fourier transform of the amplitude spectrum multiplied by -i:
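
Written out, the three quantities defined so far might look as follows. The symbol names (x for the N input samples, X for its Fourier transform, A for the amplitude spectrum, QPASS for the quadrature phase segment) are my assumptions, the spectrum is taken one-sided as a real FFT delivers it, and constant scale factors are left out.

```latex
\begin{align*}
A[k] &= \bigl| X[k] \bigr|
      = \left| \sum_{m=0}^{N-1} x[m]\, e^{-i 2\pi k m / N} \right|,
      \qquad k = 0, \dots, N/2 \\
\mathrm{PASS}[n] &= \operatorname{Re} \sum_{k=0}^{N/2} A[k]\, e^{\,i 2\pi k n / N}
      = \sum_{k=0}^{N/2} A[k] \cos(2\pi k n / N) \\
\mathrm{QPASS}[n] &= \operatorname{Re} \sum_{k=0}^{N/2} \bigl(-i\, A[k]\bigr)\, e^{\,i 2\pi k n / N}
      = \sum_{k=0}^{N/2} A[k] \sin(2\pi k n / N)
\end{align*}
```

In this form the multiplication by -i simply turns every cosine into a sine of the same amplitude.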

Together the zero-phase and quadrature phase signal segments form an
analytic signal segment, of which the instantaneous power can be
computed by squaring the samples of both phases and summing them pointwise:
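
The whole chain, from signal segment to instantaneous power, can be sketched in a few lines of NumPy. This is not the Pd patch itself: the function name is hypothetical, and I am assuming a real FFT on the one-sided spectrum, where multiplying by -i turns the cosines into sines.

```python
import numpy as np

def pass_and_ippass(segment):
    """Return (PASS, quadrature PASS, IPPASS) for a real signal segment."""
    segment = np.asarray(segment, dtype=float)
    n = len(segment)
    # Amplitude spectrum: magnitudes of the one-sided FFT, phases discarded.
    amplitude = np.abs(np.fft.rfft(segment))
    # Zero-phase segment: all components become cosines starting at sample 0.
    pass_segment = np.fft.irfft(amplitude, n)
    # Quadrature phase segment: multiplying by -i makes them sines instead.
    quadrature = np.fft.irfft(-1j * amplitude, n)
    # Instantaneous power: square both phases and sum them pointwise.
    ippass = pass_segment ** 2 + quadrature ** 2
    return pass_segment, quadrature, ippass
```

For a segment whose harmonics fit the frame exactly, the first large IPPASS peak after sample 0 sits at the period length. Segments that do not fit the frame need the windowing precautions discussed further on.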

So far, all seems simple and straightforward, but in practice it is
not at all so easy to produce a neat phase-aligned signal segment. The
example test signals above had frequencies harmonizing with the FFT
size to make a nice demo. But a sinusoid not harmonizing with the FFT
size has a lot of real and imaginary coefficients even if the signal
segment is at phase zero with respect to the FFT frame. Illustrations of
this can be found on the page 'FFT output'.
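
The effect is easy to reproduce. The sketch below counts how many FFT bins carry a noticeable magnitude for a zero-phase cosine, once with a whole number of cycles in the frame and once with half a cycle extra. The frame size of 256, the relative threshold and the function name are arbitrary choices for the illustration.

```python
import numpy as np

def significant_bins(cycles_per_frame, n=256, threshold=1e-3):
    """Count FFT bins above a fraction of the largest magnitude, for a zero-phase cosine."""
    t = np.arange(n)
    x = np.cos(2.0 * np.pi * cycles_per_frame * t / n)
    magnitude = np.abs(np.fft.rfft(x))
    return int(np.sum(magnitude > threshold * magnitude.max()))

fitting = significant_bins(8.0)       # whole number of cycles: a single bin
not_fitting = significant_bins(8.5)   # half a cycle extra: energy in every bin
```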

Making all FFT coefficients real positive creates an output signal
with cosines only, but this is not quite the zero-phase version of the
input signal which I had naively hoped for. Here is an illustrative
example:

zero-phase input signal |

inverse FFT of amplitude spectrum of above signal |

Wait, is it not the case that phase changes in the spectrum make
signal tails fold over, like with circular convolution? For this reason
autocorrelation is done with zero-padding so the output gets space to
grow.

Let's have a look at the phase-aligned version of a rectangular
window with zero-padding. Bah, it looks like an onion!

IFFT of magnitude spectrum of rectangular zero-padded window |
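
The onion is quick to reproduce for anyone who wants to look at it. A minimal sketch, assuming a rectangular window of 256 samples padded with an equal number of zeros (both sizes are arbitrary):

```python
import numpy as np

n = 256
window = np.zeros(2 * n)
window[:n] = 1.0                      # rectangular window followed by zero-padding

magnitude = np.abs(np.fft.rfft(window))
onion = np.fft.irfft(magnitude, 2 * n)

# Index 0 is the centre of the zero-phase result; shift it to the middle for plotting.
onion_centered = np.fft.fftshift(onion)
```

The result is symmetric around sample 0 and has its maximum there, but it is neither the rectangle that went in nor the triangle that autocorrelation would give.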

There's really no way to factor that onion contour out of an
arbitrary 'phase-aligned' signal segment. The input signal must be
Hann-windowed, that's for sure. Now a new problem arises. Normally,
windowing is undone at the output by overlapping segments. But the
phase-aligned output segments cannot be integrated to form a continuous
signal stream, since they start at phase zero every time, regardless of
the input phase! That's why I'm consistently speaking of 'signal
segment' all the time. There is nothing to overlap here. The segments
must be 'unwindowed' by division, brrr...

That's a pity. This nice IPPASS function must be calculated with
the help of such an ugly thing as division by the window. I've been
puzzling for days to find a better solution, but this might be one of
those impossible missions. I'm now using something in between a Hann and
Hamming window to get a compromise, avoiding too small numbers in the
denominator. It needs refinement.
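
Such a compromise window could be sketched as a raised cosine with a coefficient between the Hann and Hamming values. The value 0.52 and the function name are assumptions picked for illustration, not the numbers used in the patch, and the sketch only shows the denominator, not how the division is applied to the phase-aligned segment.

```python
import numpy as np

def raised_cosine_window(n, a0):
    """Generalized cosine window: a0 = 0.5 gives Hann, a0 = 0.54 gives Hamming."""
    return a0 - (1.0 - a0) * np.cos(2.0 * np.pi * np.arange(n) / n)

hann = raised_cosine_window(512, a0=0.5)
compromise = raised_cosine_window(512, a0=0.52)   # arbitrary in-between value

# The smallest window value is 2*a0 - 1: exactly zero for Hann,
# so dividing by it is impossible at the segment edge.
smallest_hann = hann.min()
smallest_compromise = compromise.min()
```

With a0 = 0.52 the smallest divisor is 0.04, so edge samples are still boosted 25 times, which gives an idea of why the compromise needs refinement.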

Update: a Gaussian window seems to work better.

Anyway, I can now use the functions on real world input. Here is a
plot showing autocorrelation and PASS/IPPASS as derived from an
acoustic input signal. It is my voice saying a noisy 'uuuu'. Before
analysis, the input signal was steeply low-pass filtered with cutoff at
1 kHz. Here, autocorrelation is on the brink of failure, while PASS and
notably IPPASS have no problem indicating the correct period length.

Of course I produced that sound on purpose to demonstrate the
qualities of IPPASS. But frankly, there are also cases where IPPASS is
good for nothing. Like here, where the (filtered) input signal
approaches a pure sinewave:

I have no clue if the function which I've now baptized 'IPPASS' is
already in use for any purpose. In particular, I hope it is not taken
hostage in the patent war on pitch-tracking. In my view, math functions
should be free for all to use.

If you want to see the IPPASS function in action, download
IPPASS05.pd, a demo patch for Pd-extended:

IPPASS05.pd.zip,
8 KB, patch for Pd-extended |