Differences between standard median filters and Bagshaw's de-step filter
In the following, I present some differences between a standard
median filter and the de-step filter that Bagshaw developed for his
PhD thesis. (I assume that you already know how a median filter works;
if not, please read a tutorial first.) Bagshaw focused on the
detection of the fundamental frequency (F0, pitch), and the two
filters are compared for this task. All examples are therefore given
for fundamental frequencies, but in general the statements below
apply to other tasks too.
Outline of the problem
Imagine you want to compute the pitch (fundamental frequency, F0) of
a voice using a pitch tracking algorithm.
A common error of pitch tracking software is that it does not compute
the fundamental pitch itself but a value one octave above or below it.
So if you compare a computed pitch contour with the reference, you
will find regions where the difference between the computed
fundamental frequency and the reference value is rather small, and
other regions where the computed fundamental frequency is doubled or
halved with respect to the reference contour.
This phenomenon is called an octave error.
If an octave error occurs, we can be sure of two things:
- the computed pitch is wrong
- the reference pitch is half or double the computed pitch
This means that if we find an octave error, we can correct it by
halving or doubling the computed pitch. The problem is to locate the
octave errors among the other types of error.
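If a reference contour were available, the halving/doubling correction
would be straightforward. The Python sketch below assumes such a
reference (which a real pitch tracker does not have) and uses a
hypothetical helper `correct_against_reference` with an assumed
relative tolerance of 20 % around the exact ratios 2 and 0.5;
reference values of 0 are assumed to mark unvoiced frames. It
illustrates only the correction step, not how to locate octave errors
without a reference.

```python
def correct_against_reference(computed, reference, tolerance=0.2):
    """Halve or double computed F0 values that lie about one octave away
    from a known reference value (hypothetical helper, reference assumed known).

    tolerance: assumed relative margin around the exact ratios 2.0 and 0.5.
    Reference values <= 0 are assumed to mark unvoiced frames and are skipped.
    """
    corrected = []
    for f_comp, f_ref in zip(computed, reference):
        if f_ref <= 0:
            corrected.append(f_comp)        # no reference value: leave as is
            continue
        ratio = f_comp / f_ref
        if abs(ratio - 2.0) <= 2.0 * tolerance:
            corrected.append(f_comp / 2.0)  # about one octave too high: halve
        elif abs(ratio - 0.5) <= 0.5 * tolerance:
            corrected.append(f_comp * 2.0)  # about one octave too low: double
        else:
            corrected.append(f_comp)        # not an octave error: keep
    return corrected
```

For example, with computed values 100, 210, 49, 120 Hz and reference
values 101, 104, 100, 100 Hz, the sketch returns 100, 105.0, 98.0,
120: the second value is halved, the third doubled, and 120 Hz is left
alone because a ratio of 1.2 is not an octave error.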
Bagshaw's de-step filter algorithm
Bagshaw designed the de-step filter especially to eliminate octave
errors. He uses a simple criterion for octave error detection:
if two neighbouring computed pitch values differ by more than a
certain threshold (e.g. 75 %), then it is likely that an octave error
has occurred.
Bagshaw uses 75 % as the threshold because, firstly, a difference this
large is a strong indication of an octave error, and, secondly, it
compensates for minor errors caused by the pitch tracking software:
an exact octave jump is a difference of 100 %, so a lower threshold
still catches octave jumps that the tracker has slightly distorted.
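The detection criterion can be sketched as a small Python function.
The name `is_octave_jump` is hypothetical, and the check is a
symmetric reading of the criterion (an assumption): a jump is flagged
when the larger of two neighbouring values exceeds the smaller one by
more than the threshold, so both a doubling and a halving are caught.

```python
def is_octave_jump(previous_f0, current_f0, threshold=0.75):
    """Return +1 for a suspected upward octave jump, -1 for a downward one,
    and 0 otherwise (hypothetical helper).

    Symmetric reading (assumption): the larger of the two neighbouring
    values must exceed the smaller one by more than `threshold` (0.75 = 75 %).
    """
    if current_f0 > previous_f0 * (1.0 + threshold):
        return 1    # current value much higher: likely an octave too high
    if previous_f0 > current_f0 * (1.0 + threshold):
        return -1   # current value much lower: likely an octave too low
    return 0        # ordinary pitch movement
```

With these definitions, 100 Hz to 200 Hz returns 1, 200 Hz to 100 Hz
returns -1, and 100 Hz to 150 Hz returns 0. A doubling that the tracker
has distorted to 190 Hz (+90 %) is still flagged, which is the margin
the 75 % threshold provides.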
Bagshaw assumes that we do not know whether the first computed F0
value is an octave error or not. Consequently, his algorithm uses sets
and does not rely on the initial F0 value.
He puts all F0 values into sets. If an octave error occurs, he changes
to another set and proceeds. After all F0 values have been assigned to
sets, the biggest set is identified. He assumes that the majority
cannot be wrong and decides that the F0 values in this set represent
the true fundamental pitch. All F0 values in the other sets represent
octave errors and need correction. The correction factor, a power of
two, can simply be derived from the index of the set.
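The majority vote and the correction factor described above can be
sketched as follows. This assumes that the set index of every F0 value
is already known and that moving one set up corresponds to one octave
up; the function name `correction_factors` is hypothetical. The text
does not say how ties between equally large sets are resolved; here
`Counter.most_common` simply returns the set that was encountered
first.

```python
from collections import Counter


def correction_factors(set_indices):
    """Return the factor each F0 value must be multiplied by.

    set_indices: set index of every F0 value in one voiced region.
    The largest set is taken as the true pitch set, and each value is
    scaled by 2 ** (true_index - set_index).
    """
    # Majority vote: the most frequent set index (first one seen on a tie).
    true_index = Counter(set_indices).most_common(1)[0][0]
    return [2.0 ** (true_index - k) for k in set_indices]
```

For set indices 0, 0, 1, 1, 0, 0, -1, set 0 is the largest, so the
factors are 1.0, 1.0, 0.5, 0.5, 1.0, 1.0, 2.0: values one set up are
halved and the value one set down is doubled.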
Here is an outline of the algorithm:
set_index = 0
true_pitch_set_index = 0
octave_error_threshold = 0.75   /* 75 % */
put the first F0 value into Set(set_index)
FOR all other F0 values of a voiced region
    /* preceding_F0_value is the uncomputed-corrected value just before current_F0_value */
    /* change set if an octave error occurs */
    IF (current_F0_value > preceding_F0_value * (1 + octave_error_threshold))
        /* current value is too high, change set */
        set_index = set_index + 1
    ELSE IF (preceding_F0_value > current_F0_value * (1 + octave_error_threshold))
        /* current value is too low, change set */
        set_index = set_index - 1
    END IF
    put current_F0_value into Set(set_index)
END FOR
/* the set with the most items is assumed to hold the true F0 values */
true_pitch_set_index = index of the set with the most items
FOR all sets
    set_index = index of this set
    multiply every F0 value in this set by 2^(true_pitch_set_index - set_index)
END FOR
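A runnable Python sketch of this outline follows. It stores only the
set index of each value instead of building explicit sets, which is
equivalent for the correction step. The function name `destep_filter`
is hypothetical, the default threshold of 0.75 follows the text, and
the downward check is a symmetric reading of the criterion (an
assumption). It is an illustration, not Bagshaw's original code.

```python
from collections import Counter


def destep_filter(f0_values, threshold=0.75):
    """Sketch of the de-step filter for one voiced region.

    Each value gets a set index that moves up or down by one whenever two
    neighbouring (uncorrected) values differ by more than `threshold`.
    Values outside the largest set are scaled by 2 ** (true_index - set_index).
    """
    if not f0_values:
        return []
    set_indices = [0]                      # the first value goes into Set(0)
    for previous, current in zip(f0_values, f0_values[1:]):
        index = set_indices[-1]
        if current > previous * (1.0 + threshold):
            index += 1                     # jump up: current value too high
        elif previous > current * (1.0 + threshold):
            index -= 1                     # jump down: current value too low
        set_indices.append(index)
    # The set with the most members is assumed to hold the true F0 values.
    true_index = Counter(set_indices).most_common(1)[0][0]
    return [f0 * 2.0 ** (true_index - k)
            for f0, k in zip(f0_values, set_indices)]
```

On the contour 100, 102, 205, 208, 104, 103 Hz the sketch returns
100.0, 102.0, 102.5, 104.0, 104.0, 103.0: the two values in the
minority set are halved. A slow glide such as 100, 120, 150, 180,
210 Hz passes through unchanged because no single step exceeds 75 %,
and an octave error on the very first value (50, 100, 102, 104 Hz) is
also corrected to 100.0, because the majority decides, not the initial
value.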
You can see that the de-step filter allows changes in the F0 contour,
but only as long as the change between neighbouring values stays below
the octave error threshold. The algorithm prohibits octave jumps.
Differences from the median filter
In the section above, I illustrated the behavior of the
de-step filter. In this section, some differences from a median filter
are briefly sketched.
- Median filters have a fixed size, whereas the de-step filter
examines a whole voiced region. So a median filter works locally,
whereas the de-step filter works globally.
- Median filters always smooth; the de-step filter does nothing if
no octave errors occur. If octave errors do occur, the
de-step filter corrects only the octave errors and leaves neighbouring
values unchanged.
- The de-step filter does not permit jumps greater than the octave error
threshold, whereas a median filter allows them if the region after the
jump is long enough relative to the window size. Whether this is a
drawback of the median filter or of the de-step filter depends on your
application. In (spontaneously) spoken language, an octave jump
within a single voiced region is unlikely, so a pitch
tracking algorithm benefits from this property of the de-step filter.
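To make the first two points concrete, here is a plain sliding-window
median filter in Python. The window width of 3 and the edge handling
(repeating the first and last value) are assumptions for illustration,
and `median_filter` is a hypothetical name.

```python
import statistics


def median_filter(values, width=3):
    """Sliding-window median over `width` samples (width assumed odd).

    Edges are padded by repeating the first and last value.
    """
    if not values:
        return []
    half = width // 2
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [statistics.median(padded[i:i + width]) for i in range(len(values))]


# Contour without octave errors: the median filter still changes values.
smooth = median_filter([100, 110, 130, 112, 114])        # -> [100, 110, 112, 114, 114]
# Two-sample octave error: a width-3 window cannot remove it.
octave = median_filter([100, 102, 205, 208, 104, 103])   # -> [100, 102, 205, 205, 104, 103]
```

The first contour contains no octave error, yet the median filter
replaces 130 by 112. In the second, the two-sample octave error
(205, 208) survives because it fills more than half of the window; a
width-5 window would remove this particular error, but it would also
smooth the rest of the contour more strongly.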
The de-step filter performs only octave error correction. This is a
kind of smoothing of the fundamental pitch contour, but it differs
from the smoothing a median filter performs. In general, the
fundamental pitch contour obtained from the de-step filter should
still be smoothed, for example with a median filter.
So the de-step filter can be regarded as an additional processing
stage before the final smoothing of the fundamental pitch contour.
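The two-stage processing suggested above can be sketched in Python as
follows. Both stages are simplified sketches under assumptions
(threshold 0.75 with a symmetric downward check, median width 3, edges
padded by repetition), and `smooth_contour` is a hypothetical name.
The order matters: a median filter on its own keeps an octave error
that lasts longer than half its window, so the de-step stage runs
first.

```python
import statistics
from collections import Counter


def destep_filter(f0_values, threshold=0.75):
    """De-step stage (sketch): fold octave jumps back onto the majority set."""
    if not f0_values:
        return []
    set_indices = [0]
    for previous, current in zip(f0_values, f0_values[1:]):
        index = set_indices[-1]
        if current > previous * (1.0 + threshold):
            index += 1                     # octave jump up
        elif previous > current * (1.0 + threshold):
            index -= 1                     # octave jump down
        set_indices.append(index)
    true_index = Counter(set_indices).most_common(1)[0][0]
    return [f0 * 2.0 ** (true_index - k) for f0, k in zip(f0_values, set_indices)]


def median_filter(values, width=3):
    """Smoothing stage: sliding median with edge replication (width assumed odd)."""
    if not values:
        return []
    half = width // 2
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [statistics.median(padded[i:i + width]) for i in range(len(values))]


def smooth_contour(f0_values, threshold=0.75, width=3):
    """Hypothetical two-stage pipeline: octave correction first, then smoothing."""
    return median_filter(destep_filter(f0_values, threshold), width)
```

On the contour 100, 102, 205, 208, 104, 130, 106 Hz, the de-step stage
folds 205 and 208 down to 102.5 and 104.0 but leaves the 130 Hz spike
alone (it is not an octave jump); the median stage then removes the
spike, giving 100.0, 102.0, 102.5, 104.0, 104.0, 106.0, 106.0.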