Monday, November 12, 2007

018 - Multimodal Interaction

Summary:

Adler et al. describe a system in which users draw sketches and interact verbally with the computer: the user can simply talk and explain what they are drawing, and the computer asks questions when it is unsure. The paper describes a Wizard of Oz user study with an "experimenter" sitting in and pretending to be the computer, asking the user questions to clarify the user's intent. He had his own screen to view the drawing, but avoided eye contact with the user. The study showed that users tend to over-clarify their answers to simple questions, and that they pause their speech to finish their drawing.

Discussion:

Really some very interesting results. I can't help but think of the old starship computer on Star Trek when I think of talking and pressing buttons. Sadly, these results are still inconclusive in my mind, as the study WAS a Wizard of Oz setup and NOT a real computer. There is a HUGE transition between a person sitting across the table and a computer talking to you, both in the programming required and in the attitude of the user. All in all, it's a very interesting idea, but a bit obscure. The paper was all about a user study, and further testing needs to be done before anything solid can be claimed.

017 - Three Main Concerns in Sketch Recognition

Summary:

In this paper Mahoney and Fromherz describe what they call the three main concerns of sketch recognition and then present a system that loosely covers them. The three concerns are dealing with human inaccuracy in sketches, being interactive, and being able to add new features later. The system described deals with "stick figure finding". It searches through the sketch, checking every single line against different labels. For example, a body has 5 attached lines: a head, 2 arms, and 2 legs. Using an algorithm, the system goes through each line, checks whether it could be a body, and eventually finds the optimal labeling. This allows context lines to be excluded from the figure and separated out.
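To make the idea concrete, here is a toy brute-force version of that line-labeling search. This is my own minimal sketch, not Mahoney and Fromherz's actual constraint-based algorithm: it just hunts for a segment with five others attached at its endpoints and calls that the body, leaving unattached context lines out.

```python
def endpoints_touch(seg_a, seg_b, tol=1e-6):
    """True if any endpoint of seg_a coincides with an endpoint of seg_b."""
    return any(abs(p[0] - q[0]) < tol and abs(p[1] - q[1]) < tol
               for p in seg_a for q in seg_b)

def find_stick_figure(segments):
    """Brute-force labeling: try each segment as the candidate 'body' and
    check that exactly five other segments attach to it (a head, 2 arms,
    2 legs).  Returns (body, attached) or None.  Context lines that touch
    nothing are naturally excluded."""
    for i, body in enumerate(segments):
        attached = [s for j, s in enumerate(segments)
                    if j != i and endpoints_touch(body, s)]
        if len(attached) == 5:
            return body, attached
    return None
```

Even this toy version checks every segment against every other, which hints at why the full labeling search could get slow on busy sketches.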

Discussion:

Two words: NO RESULTS. This algorithm sounds TERRIBLY slow, even though the algorithm itself is presented in another paper. The paper also seems totally misnamed, because past the Introduction it has very little to do with the three "main concerns" of sketch recognition. The algorithm behind the system may be interesting, but the paper itself is really just filler and icing on top of that algorithm.

016 - Constellations

Summary:

The paper dealt with a system that combines strokes spatially. It takes in a group of single strokes and calculates their general placement in the drawing area. Using this spatial data, it can identify the placement of strokes relative to one another and generalize the data into sets of known shapes. An example was a drawn face. It was required to have two eyes (a right and a left), two pupils, a stroke for the head's shape, a nose, and a mouth. After those were drawn, anything could be added, such as eyelashes, an eyepatch, ears, or hair, and the shape would still be recognized as a face. It accomplishes this by labeling strokes and searching through them with different search bounds, which speeds things up.
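The required-parts-plus-extras behavior can be sketched roughly like this. The part labels, coordinates, and spatial constraints below are my own illustrative assumptions, not the paper's actual matcher: it searches over assignments of stroke centroids to required face parts and simply ignores any leftover strokes (hair, an eyepatch, and so on).

```python
from itertools import permutations

REQUIRED = ["left_eye", "right_eye", "nose", "mouth"]

def consistent(assign):
    """Relative-placement constraints between labeled centroids
    (y grows downward, as in screen coordinates)."""
    return (assign["left_eye"][0] < assign["right_eye"][0] and
            assign["left_eye"][1] < assign["nose"][1] and
            assign["right_eye"][1] < assign["nose"][1] and
            assign["nose"][1] < assign["mouth"][1])

def find_face(centroids):
    """Try every assignment of stroke centroids to the required parts;
    extra strokes are left unassigned, so adding hair or an eyepatch
    doesn't break recognition."""
    if len(centroids) < len(REQUIRED):
        return None
    for combo in permutations(centroids, len(REQUIRED)):
        assign = dict(zip(REQUIRED, combo))
        if consistent(assign):
            return assign
    return None
```

The permutation search here is exactly the kind of thing the paper's search bounds would prune; without them, the cost blows up with the number of strokes.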

Discussion:

Biggest problem with this paper: no results. Honestly, I don't blame them, because the results have to be pretty low. I like the concept of the system, but the loose style of recognition really doesn't sit right with me. It's almost like we're taking a step back in recognition and saying "are there 5 strokes there? OK, there are 10 strokes, so it must be a face". It would be interesting to find some way to extend the design to multiple domains, such as recognizing a face OR a boat, because right now the idea seems really uninspiring.

015 - Oltmans' PhD Thesis

Summary:

Oltmans takes the position that sketch data within features has too much noise and shouldn't be relied upon as heavily. The thesis argues that visual techniques should be used instead of raw stroke data to get more "human" recognition. The system described uses a "bullseye" technique that places a circle roughly every 5 pixels along the stroke, with smaller circles concentric to the larger circle in dartboard fashion. The bullseye is rotated freely, and the stroke direction of all points falling within it is collected into "bins". The bins are turned into frequency vectors, which are then compared to a pre-defined codebook. The system then moves along the entire stroke with fixed windows, measures the ink of the line against all the known data, and clusters shapes that are close together. It then splits the clusters into smaller clusters to isolate individual shapes in a trickle-down fashion. Evaluation of the system showed around 90% accuracy in most areas.
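A rough sketch of the bullseye idea as I understand it follows. The ring radii, bin counts, and the nearest-neighbor matching step are illustrative guesses on my part, not Oltmans' actual parameters: each nearby stroke point is binned by which concentric ring it falls in and by its local direction, giving a normalized frequency vector that gets matched against a codebook.

```python
import math

def bullseye_descriptor(points, center, radii=(5, 10, 20), n_dir_bins=4):
    """Toy bullseye descriptor: bin each point (x, y, direction-in-radians)
    by concentric ring and by direction, then normalize into frequencies.
    Radii and bin counts are placeholders, not the thesis's values."""
    hist = [0.0] * (len(radii) * n_dir_bins)
    for (x, y, direction) in points:
        r = math.hypot(x - center[0], y - center[1])
        ring = next((i for i, rad in enumerate(radii) if r <= rad), None)
        if ring is None:
            continue  # point falls outside the largest circle
        d_bin = int((direction % math.pi) / (math.pi / n_dir_bins)) % n_dir_bins
        hist[ring * n_dir_bins + d_bin] += 1
    total = sum(hist) or 1.0
    return [h / total for h in hist]  # normalized frequency vector

def nearest_codeword(vec, codebook):
    """Match a descriptor to the closest codebook entry by squared
    Euclidean distance."""
    return min(codebook, key=lambda cw: sum((a - b) ** 2
                                            for a, b in zip(vec, codebook[cw])))
```

Because the histogram is normalized, comparing a window of ink against the codebook doesn't depend on how many sample points the window happens to contain, which fits the "visual, not data-heavy" spirit of the thesis.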

Discussion:

I think this is an interesting outlook. While I don't believe throwing away all the data we have from the strokes is necessary, I do like the fact that you can get such high recognition rates without all the corner-finding algorithms and such that so many use. I'm a strong believer in editable thresholds, though, and I think some things should be open to change, such as the 5-pixel spacing of the circles. Other than that, the idea is definitely something to consider for future work.