Sauvola thresholding
Sauvola thresholding is a local thresholding method for segmenting text, primarily meant as a pre-processing step for optical character recognition.
Roughly, the approach is to compute, for every pixel, a background approximation from the neighboring pixels. This background is then subtracted from the image, and the threshold is placed at the zero crossing of the difference. This often works better than a single global threshold.
References: the paper from Sauvola, and a good explanation.
The method to compute the background at a specific pixel is as follows.
- First pick a neighborhood size, specified here as r. The neighborhood is then the (2r+1)x(2r+1) box centered at the pixel, which is roughly a radius of r. A box is used rather than a circle for numerical efficiency.
- Compute the mean and standard deviation (std) over this neighborhood. Because the standard deviation depends on the scale of the pixel values (8 bit, 16 bit, etc.), it is divided by a scaling factor R. The paper suggests R = 128, but that assumes an 8 bit image with values between 0 and 255.
- The background is then constructed using these numbers along with yet another coefficient k. Look at the paper for guidance.
background = mean × (1 + k × ((std / R) - 1))
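The steps above can be sketched in Python with NumPy. This is a minimal illustration, not the implementation behind this action: the function names, the edge-replicating border handling, and the default values for r and k are all assumptions chosen for the example. The box statistics are computed with an integral image, which is one common way to get the numerical efficiency mentioned above.

```python
import numpy as np


def box_mean(values, r):
    """Mean over the (2r+1)x(2r+1) box centered at each pixel."""
    size = 2 * r + 1
    # Assumption: replicate edge pixels so that border boxes are full size.
    padded = np.pad(values.astype(np.float64), r, mode="edge")
    # Integral image with a leading row and column of zeros.
    integral = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    integral[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    # Sum of every size x size window from four corner lookups.
    box_sum = (integral[size:, size:] - integral[:-size, size:]
               - integral[size:, :-size] + integral[:-size, :-size])
    return box_sum / (size * size)


def sauvola_background(image, r=15, k=0.2, R=128.0):
    """Estimate background = mean * (1 + k * (std / R - 1)) per pixel.

    r, k and R defaults are illustrative only; R = 128 assumes 8 bit data.
    """
    image = image.astype(np.float64)
    mean = box_mean(image, r)
    # Var[x] = E[x^2] - E[x]^2; clip tiny negatives caused by rounding.
    variance = box_mean(image * image, r) - mean * mean
    std = np.sqrt(np.clip(variance, 0.0, None))
    return mean * (1.0 + k * (std / R - 1.0))
```

In a flat region the standard deviation is zero, so the background reduces to mean × (1 - k). Where the local contrast is high, std / R grows and the background estimate moves up toward the local mean.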
Example
Create a contrived example of a dark circle on a lighter background.
This is not going to work well with a single threshold value.
The reason is that, even though your eye can pick out the circle pretty easily, it relies on local contrast rather than absolute intensity.
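A contrived image of this kind can be sketched as follows. The details are assumptions made for illustration, not the image used in this example: the 200x200 size, the left-to-right brightness ramp standing in for uneven illumination, and the helper names are all hypothetical. The second function tries every integer threshold from 0 to 255 and reports the smallest fraction of misclassified pixels.

```python
import numpy as np


def make_test_image(size=200, radius=40, contrast=40.0):
    """Dark circle on a lighter background with uneven illumination."""
    y, x = np.mgrid[0:size, 0:size]
    # Assumption: background brightness ramps from 100 (left) to 220 (right).
    background = 100.0 + 120.0 * x / (size - 1)
    circle = (x - size / 2) ** 2 + (y - size / 2) ** 2 <= radius ** 2
    # The circle is darker than its local surroundings by a fixed contrast.
    image = background - contrast * circle
    return image, circle


def best_global_threshold_error(image, truth):
    """Smallest misclassified fraction over all integer thresholds 0..255."""
    errors = [np.mean((image < t) != truth) for t in range(256)]
    return min(errors)


image, circle = make_test_image()
# Some circle pixels are brighter than some background pixels,
# so no single threshold can separate the two.
print(image[circle].max() > image[~circle].min())
print(best_global_threshold_error(image, circle))
```

The circle is always darker than its immediate surroundings, but its bright side is lighter than the dark side of the background, so every global threshold misclassifies some pixels.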
To tweak the parameters, it is useful to switch what the computational action returns. By default it returns just a polygonal representation of the zero threshold of the difference between the input intensity and the estimated background. You can get this as a binary mask as well. The Image option returns the image minus the background, and Everything returns all three of the above saved in a variable group.
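To make the relationship between these outputs concrete, here is a rough sketch that derives them from an image and a background estimate. It is not the action's own code: the function name and dictionary keys are hypothetical, treating pixels darker than the background as foreground is an assumption, and the polygonal representation is replaced by a simpler stand-in, the set of foreground pixels that touch the background.

```python
import numpy as np


def threshold_outputs(image, background):
    """Return the difference image, a binary mask, and boundary pixels."""
    # Image minus background; its zero level is the threshold.
    difference = image.astype(np.float64) - background
    # Assumption: pixels darker than the background are foreground.
    mask = difference < 0
    # A foreground pixel is interior if all four neighbors are foreground.
    padded = np.pad(mask, 1, mode="edge")
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    # Stand-in for the polygon: foreground pixels on the region's edge.
    boundary = mask & ~interior
    return {"image": difference, "mask": mask, "boundary": boundary}
```

Looking at the difference image while changing r, k, and R shows how far each region sits from the zero level, which is usually more informative than the final outline alone.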
The following figure shows the image and the zero value.