At Oak City Labs, we enjoy solving all kinds of problems. Our projects span subject areas from IoT to mining social media data to integrating video capture hardware. One of my favorite recent projects involves computer vision and real-time video analysis of data from a medical device.

Our client, Altaravision, “has developed the most portable, high-definition endoscopic imaging system on the market today”, called NDŌʜᴅ. A Fiberoptic Endoscopic Evaluation of Swallowing (FEES) system like this allows a medical professional to observe and record a patient swallowing food. The NDŌʜᴅ system is portable and uses an application running on a MacBook to display the endoscope feed in real time and record the swallowing test to a video file.

After the test is completed, the video is reviewed to evaluate the efficiency of swallowing. Ideally, the patient will swallow all of the food, but a range of conditions can leave the patient unable to swallow all the material. Particles that aren’t swallowed may be aspirated and cause pneumonia. When reviewing the test footage, the test administrator has traditionally had to carefully estimate the amount of residual material left after swallowing. Not only is this extremely time-consuming, but it also introduces human error and compromises the reproducibility of results.

Oak City Labs has been working with Altaravision to tackle this problem. How can we remove the tedium from the FEES test and make the results available faster and with better consistency? As with all our automation projects, we’d like a computer to handle the boring, repetitive parts of the process. Using computer vision techniques, we’d like the NDŌʜᴅ application to process each frame of the FEES test footage, categorize pixels by color, and produce a single numerical value representing the residual food material left in the throat after swallowing. We should give the user this feedback in real time as the test is being performed.

The NDŌʜᴅ application runs on macOS, so we can leverage Core Image (CI) as the basis for our computer vision solution. CI provides an assortment of image processing filters, but the real power lies in the ability to write custom filters. A pair of these custom filters will solve the core of our problem.

Our first task is to remove the very dark and the very bright portions of the image. We ignore the dark portions because we can’t see them well enough to classify their color, and the very bright portions are overlit by the camera, so we can’t really see their color either. Our first custom filter looks at each pixel in the image and evaluates its position in color space relative to the line running from absolute black to absolute white. Anything close enough to this grey line should be ignored, so we set it to be transparent. In testing, it turned out to be difficult to pick a single colorspace distance threshold that worked well at both the light end and the dark end, so we use a different value at each end of the grey spectrum and linearly interpolate between the two.
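To make the approach concrete, here’s a minimal sketch of what a filter like this can look like as a Core Image color kernel. The class name, kernel source, and threshold values below are illustrative assumptions, not Altaravision’s actual code:

```swift
import CoreImage

// Sketch of a "remove near-grey" custom filter. Names and threshold
// values are hypothetical; the real NDŌʜᴅ filter may differ.
class GreyRemovalFilter: CIFilter {
    var inputImage: CIImage?
    var darkThreshold: CGFloat = 0.25   // assumed cutoff at the dark end
    var lightThreshold: CGFloat = 0.10  // assumed cutoff at the light end

    private static let kernel = CIColorKernel(source: """
        kernel vec4 removeGrey(__sample s, float darkT, float lightT) {
            vec3 c = s.rgb;
            // The nearest point on the black-to-white line is vec3(luma).
            float luma = (c.r + c.g + c.b) / 3.0;
            float dist = distance(c, vec3(luma));
            // Linearly interpolate the cutoff between the dark and light ends.
            float cutoff = mix(darkT, lightT, luma);
            // Pixels near the grey line (too dark or overlit) become transparent.
            if (dist < cutoff) { return vec4(0.0); }
            return s;
        }
        """)!

    override var outputImage: CIImage? {
        guard let input = inputImage else { return nil }
        return GreyRemovalFilter.kernel.apply(
            extent: input.extent,
            arguments: [input, darkThreshold, lightThreshold])
    }
}
```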

[Image: original throat frame]
[Image: throat frame after the transparency filter]

The top image is the original image data. The lower image is the same frame after the bright and dark areas have been removed. In particular, the dark area deeper down the throat at the bottom center has been filtered out, as has the bright reflection of the camera light in the top right corner.

Now that we have an image with only the interesting colors remaining, we can classify each pixel by color. In a FEES test, the food is dyed blue or green to help distinguish it from the throat. We need our second-pass filter to separate the reddish pixels from the blueish and greenish ones. In our second custom CI filter, we examine each pixel and classify it as red, green, or blue by measuring its colorspace distance from the absolute red, green, and blue corners of the color cube. We then convert each pixel to its nearest absolute color.
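This second pass is even simpler to sketch. Again, a hedged illustration rather than the shipping filter: each opaque pixel snaps to the nearest corner of the RGB cube.

```swift
import CoreImage

// Sketch of the classification pass: snap each remaining pixel to the
// nearest of absolute red, green, or blue. Names are hypothetical.
let classifyKernel = CIColorKernel(source: """
    kernel vec4 classify(__sample s) {
        // Pixels rejected by the first pass are fully transparent; keep them.
        if (s.a == 0.0) { return vec4(0.0); }
        vec3 c = s.rgb;
        float dr = distance(c, vec3(1.0, 0.0, 0.0)); // to the red corner
        float dg = distance(c, vec3(0.0, 1.0, 0.0)); // to the green corner
        float db = distance(c, vec3(0.0, 0.0, 1.0)); // to the blue corner
        if (dr <= dg && dr <= db) { return vec4(1.0, 0.0, 0.0, 1.0); }
        if (dg <= db)             { return vec4(0.0, 1.0, 0.0, 1.0); }
        return vec4(0.0, 0.0, 1.0, 1.0);
    }
    """)!

func classify(_ image: CIImage) -> CIImage? {
    classifyKernel.apply(extent: image.extent, arguments: [image])
}
```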

[Image: original throat frame]
[Image: throat frame after the color classification filter]

The top image is the original image. The bottom image is the fully processed image, sorted into red and green (there are no blue pixels in this example). Note how the green areas visually match up with the residual material in the original image.

Finally, our image has been fully processed: transparent pixels are ignored and every remaining pixel is absolute red, green, or blue. Now we use vImage from Apple’s very powerful Accelerate framework to build a histogram of color values. From this histogram, we can easily compute our residual percentage as the sum of the green and blue pixel counts divided by the total number of non-transparent pixels (red + green + blue). This residual value is our single numerical representation of swallowing efficiency for this frame of data.
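For illustration, here’s one way the histogram step might look with vImage, assuming the processed frame has already been rendered into an 8-bit, 4-channel buffer. The helper name and channel order are assumptions:

```swift
import Accelerate

// Sketch of computing the residual percentage from a processed frame in a
// vImage_Buffer. The R, G, B, A channel order assumed here must match
// whatever pixel format the frame was actually rendered with.
func residualPercentage(of buffer: inout vImage_Buffer) -> Double {
    var red   = [vImagePixelCount](repeating: 0, count: 256)
    var green = [vImagePixelCount](repeating: 0, count: 256)
    var blue  = [vImagePixelCount](repeating: 0, count: 256)
    var alpha = [vImagePixelCount](repeating: 0, count: 256)

    red.withUnsafeMutableBufferPointer { r in
        green.withUnsafeMutableBufferPointer { g in
            blue.withUnsafeMutableBufferPointer { b in
                alpha.withUnsafeMutableBufferPointer { a in
                    // vImage expects an array of four pointers, one per channel.
                    var histograms: [UnsafeMutablePointer<vImagePixelCount>?] =
                        [r.baseAddress, g.baseAddress, b.baseAddress, a.baseAddress]
                    histograms.withUnsafeMutableBufferPointer { h in
                        _ = vImageHistogramCalculation_ARGB8888(
                            &buffer, h.baseAddress!, vImage_Flags(kvImageNoFlags))
                    }
                }
            }
        }
    }

    // Every opaque pixel is pure red, green, or blue, so each color's pixel
    // count is simply that channel's bin at full intensity (255).
    let r = Double(red[255]), g = Double(green[255]), b = Double(blue[255])
    let total = r + g + b
    return total > 0 ? (g + b) / total : 0
}
```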

Throughout this process, we’ve been very careful to use high-performance, highly optimized tools to ensure our solution can run in real time. The Core Image framework, including our custom filters, takes advantage of graphics hardware to run very, very quickly. Likewise, vImage is heavily optimized for image operations. We also use a little bit of the Metal API to display our CI images on screen, which is very speedy as well. While we’re enhancing NDŌʜᴅ on macOS, these tools are just as fast on iOS.
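As a final hedged sketch, drawing a CIImage through Metal typically looks something like this, assuming an MTKView-based pipeline (names here are illustrative, not our exact rendering code):

```swift
import CoreImage
import MetalKit

// Sketch of rendering a CIImage into an MTKView via Metal. The view must
// have framebufferOnly = false so Core Image can write to its drawable.
let device = MTLCreateSystemDefaultDevice()!
let commandQueue = device.makeCommandQueue()!
let ciContext = CIContext(mtlDevice: device)

func draw(_ image: CIImage, in view: MTKView) {
    guard let drawable = view.currentDrawable,
          let commandBuffer = commandQueue.makeCommandBuffer() else { return }
    // Encode the Core Image render into the same command buffer, then present.
    ciContext.render(image,
                     to: drawable.texture,
                     commandBuffer: commandBuffer,
                     bounds: image.extent,
                     colorSpace: CGColorSpaceCreateDeviceRGB())
    commandBuffer.present(drawable)
    commandBuffer.commit()
}
```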

At Oak City Labs, we love challenging problems. Working with real-time video processing for a medical imaging device has been particularly fun. As Altaravision continues to push NDŌʜᴅ forward, we look forward to discovering new challenges and innovating new solutions.