Summary:
Adler et al. describe a system where users draw sketches and interact verbally with the computer. The user can simply talk and explain what they are drawing, and the computer will ask questions when unsure. The paper describes a Wizard-of-Oz-style user study with an experimenter sitting in and pretending to be the computer. The experimenter asked the user questions to clarify the user's intent; he had his own screen to view the drawing, but avoided eye contact with the user. The study showed that users tend to over-clarify their answers to simple questions, and that they pause their speech while finishing a drawing.
Discussion:
Really some very interesting results. I can't help but think of the old starship computer on Star Trek when I think of talking and pressing buttons. Sadly, these results are still inconclusive in my mind, as the study WAS a Wizard of Oz setup and NOT a real computer. There is a HUGE gap between a person sitting across the table and a computer talking to you, both in the programming required and in the attitude of the user. All in all, it's a very interesting idea, but a bit obscure. The paper was all about a user study, and further testing needs to be done before anything solid can be claimed.
Monday, November 12, 2007
017 - Three Main Concerns in Sketch Recognition
Summary:
In this paper Mahoney and Fromherz describe what they call the three main concerns of sketch recognition and then show a system that loosely covers them. The three concerns are dealing with human inaccuracy in sketches, being interactive, and being able to add new features later. The system described deals with "stick figure finding". It searches through the sketch, checking every line against different labels. For example, a body has 5 attached lines - a head, 2 arms, and 2 legs. Using an algorithm, the system goes through each line, checks whether it could be a body, and eventually finds the optimal solution. This allows context lines to be separated out rather than included in the figure.
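To make the idea concrete, here's a tiny hypothetical Python sketch of the line-labeling search - the endpoint attachment test, the tolerance, and the "at least 5 attached limbs" rule are my own simplifications, not the paper's actual algorithm:

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def attached(line, other, tol=5.0):
    """A line 'attaches' if one of its endpoints lies near an endpoint of the other."""
    return any(dist(a, b) <= tol for a in line for b in other)

def find_bodies(lines, required=5):
    """Return every line that could be labeled 'body': at least `required`
    distinct other lines attach to it. Context lines that attach to nothing
    are simply left out of the figure."""
    bodies = []
    for body in lines:
        limbs = [l for l in lines if l is not body and attached(l, body)]
        if len(limbs) >= required:
            bodies.append((body, limbs))
    return bodies
```

Given a vertical body segment with a head, two arms, and two legs touching its endpoints plus one stray context line, only the body line passes the test and the stray line is excluded.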
Discussion:
Two words: NO RESULTS. This algorithm sounds TERRIBLY slow, even though the algorithm is presented in another paper. It seems this paper is totally misnamed, because it has very little to do with the three "main concerns" of sketch recognition past the Introduction. The algorithm behind the system may be interesting, but the paper itself is really just a bunch of filler and icing on top of that algorithm.
016 - Constellations
Summary:
The paper dealt with a system that combines strokes spatially. It takes in a group of single strokes and calculates their general placement in the drawing area. Using this spatial data, it can identify relationships between strokes and generalize the data into sets of known shapes. An example was a drawn face. It was required to have two eyes (a right and a left), two pupils, a stroke for the head's shape, a nose, and a mouth. After those were drawn, anything could be added, such as eyelashes, an eyepatch, ears, or hair, and the shape would still be recognized as a face. It accomplishes this by labeling strokes and searching through them with different search bounds, which speeds things up.
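The "required parts plus anything extra" idea can be sketched in a few lines of Python - the part names and the single left-of relation below are my own illustrative choices, not the paper's actual constellation model:

```python
# A face needs all of these labeled parts; extra strokes are simply ignored.
REQUIRED_FACE_PARTS = {"left_eye", "right_eye", "left_pupil",
                       "right_pupil", "head", "nose", "mouth"}

def centroid(stroke):
    """Average point of a stroke, used as its rough spatial position."""
    xs = [p[0] for p in stroke]
    ys = [p[1] for p in stroke]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def is_face(labeled_strokes):
    """labeled_strokes: iterable of (label, stroke) pairs.
    Recognized as a face when every required part is present and one
    simple spatial relation holds; eyelashes, eyepatches, etc. don't hurt."""
    parts = {}
    for label, stroke in labeled_strokes:
        parts.setdefault(label, stroke)
    if not REQUIRED_FACE_PARTS <= parts.keys():
        return False
    # one toy spatial relation: the left eye must sit left of the right eye
    return centroid(parts["left_eye"])[0] < centroid(parts["right_eye"])[0]
```

A drawing with all seven parts plus an eyepatch is still a face; drop the mouth and recognition fails.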
Discussion:
Biggest problem with this paper - no results. Honestly, I don't blame them, because the results have to be pretty low. I like the concept of the system, but the loose style of recognition doesn't really sit well with me. It's almost like we're taking a step back in recognition and saying "are there 5 strokes there? OK, there are 10 strokes, so it must be a face". It would be interesting to find some way to extend the design to multiple domains, such as recognizing a face OR a boat, because right now the idea seems really uninspiring.
015 - Oltmans' PhD Thesis
Summary:
Oltmans takes the position that sketch data within features has too much noise and shouldn't be relied upon as heavily. The thesis supports the idea that visual techniques should be used instead to get more "human" recognition. The system described uses a "bullseye" technique that places a circle roughly every 5 pixels along the stroke, with smaller circles concentric to the larger circle in dartboard fashion. The bullseye is rotated freely and bins the stroke direction of all points that fall within it. The bins are turned into frequency vectors, which are then compared to a pre-defined codebook. The system then moves along the entire stroke with set windows, matches the ink in each window against the known data, and clusters all shapes that are close together. It then splits the clusters into smaller clusters to isolate individual shapes in a trickle-down fashion. Evaluation showed the system was around 90% accurate in most areas.
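Here's roughly what one bullseye descriptor might look like in Python - a minimal sketch under my own assumptions (the ring radii, 8 direction bins, and Euclidean codebook matching are placeholders, not the thesis's actual parameters):

```python
import math

def bullseye_descriptor(points, center, radii=(10, 20, 40), n_dir_bins=8):
    """For every stroke segment whose start falls inside the largest circle,
    bin it by (which ring it lands in) x (its local direction), then
    normalize the counts into a frequency vector."""
    hist = [0.0] * (len(radii) * n_dir_bins)
    for i in range(len(points) - 1):
        (x0, y0), (x1, y1) = points[i], points[i + 1]
        r = math.hypot(x0 - center[0], y0 - center[1])
        ring = next((k for k, rad in enumerate(radii) if r <= rad), None)
        if ring is None:
            continue  # outside the bullseye entirely
        theta = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)
        d = min(int(theta / (2 * math.pi / n_dir_bins)), n_dir_bins - 1)
        hist[ring * n_dir_bins + d] += 1
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def nearest_codeword(vec, codebook):
    """Match the frequency vector against a (hypothetical, pre-learned)
    codebook by Euclidean distance; return the closest codeword's index."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: d2(vec, codebook[i]))
```

A short horizontal stroke centered on the bullseye puts all its mass in the innermost ring's "rightward" bin, so the vector is trivially matched against the codebook.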
Discussion:
I think this is an interesting outlook. While I don't believe throwing away all the data we have from the strokes is necessary, I do like the fact that you can get such high recognition without all the corner-finding algorithms and such that so many systems use. I'm a strong believer in editable thresholds, though, and I think some values might be subject to change, such as the 5-pixel radius of the circles. Other than that, the idea is definitely something to consider for future work.
Monday, October 29, 2007
014 - Ambiguous Intentions
Summary:
The paper describes NAPKIN, a system which allows users to draw sketchy shapes and then define them in different domains. An example would be to draw a box with four boxes around it, denoting a table with four chairs. The system would then beautify and label the object. The objects are ranked by how they are recognized and by how many domains they are recognized in. The system stores all this data and saves it for later, in case a new napkin wants to use information from a previous napkin. This allows for multiple domains and ambiguous gestures, such as the square above serving as a table in one sketch and as a bar in a bar graph in another. The symbols are usually related to each other, such as the chairs to the table, because they are smaller and nearby. Different constraints would be made for different domains.
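The ranking of ambiguous interpretations described above could be sketched like this - a hypothetical illustration where the dict fields, domain names, and the "more domains first, then higher confidence" tie-break are all my own, not the paper's:

```python
def rank_interpretations(candidates):
    """candidates: list of dicts like
       {"label": "table", "confidence": 0.9, "domains": {"furniture"}}.
    Interpretations valid in more domains rank first; within the same
    domain count, higher recognizer confidence wins."""
    return sorted(candidates,
                  key=lambda c: (len(c["domains"]), c["confidence"]),
                  reverse=True)
```

So a square recognized as both a furniture "table" and a chart "bar" would outrank a single-domain reading, keeping every interpretation around for later disambiguation.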
Discussion:
Honestly, I think this system has limited uses. While it seems fine and dandy to have all these attachment features and ambiguous data, when people are ACTUALLY sketching something they usually don't intend anyone else to use their sketches. Since everyone has their own sketching style, labeling things would only hinder someone who is sketching for their own personal use. Why bother labeling something as a table if the sketcher KNOWS it's a table? Also, I believe multiple domains should be handled separately, because the more domains a recognizer takes on, the less likely it is to label objects in the correct domain. Besides, it's just as easy to write two systems for two domains as it is to write one system for two domains - if you've written the code well, that is.
013 - Herot
Summary:
The paper apparently was a little-known breakthrough in the field. While Sezgin and Rubine seemed to come up with some great ideas, this paper proves they weren't the first to think of everything. The paper revolves around some early sketch recognition and covers a broad scope. It explains that sketch recognition is best applied at the beginning of the design process - a.k.a. the part where everyone's still "sketching". It handles corner detection by introducing stroke speed, an idea Sezgin would use decades later. It further discusses latching and the user domains that other papers have covered. It says the system functioned well with some users and poorly with others, calling the results "interesting".
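The speed idea in miniature - a hedged reconstruction in Python, not Herot's actual code; the 0.4 mean-speed threshold is an assumed tunable value:

```python
import math

def corner_candidates(points, times, threshold=0.4):
    """Pen speed drops near corners, so flag every point whose incoming
    segment speed falls below `threshold` times the mean speed."""
    speeds = []
    for i in range(1, len(points)):
        d = math.hypot(points[i][0] - points[i - 1][0],
                       points[i][1] - points[i - 1][1])
        dt = times[i] - times[i - 1] or 1e-9  # guard against zero dt
        speeds.append(d / dt)
    mean = sum(speeds) / len(speeds)
    # speeds[i] belongs to segment (i, i+1); flag that segment's endpoint
    return [i + 1 for i, s in enumerate(speeds) if s < threshold * mean]
```

On a stroke that moves fast, slows to a crawl mid-way, then speeds up again, only the slow points come back as corner candidates.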
Discussion:
Obviously, the paper's age makes it a bit dated technology-wise, but the ideas are all there and still valid. I, along with the rest of the class, found it interesting that this paper even existed, because two decades later all kinds of papers flare up stating the exact same thing, although to be fair most of them did have SOME new ideas as well. This paper seems to have been ahead of its time, perhaps due to the limited technology available back then. This "diamond in the rough" is sadly outdated now by more in-depth research, and to be honest it's a little disheartening that such an innovative paper can go so unnoticed.
012 - Learning of Shape Descriptions
Summary:
The paper mostly concerns constraints on different features. The authors state that users pay more attention to some features and less to others. It introduces a score for different constraints and allows for different settings on those constraints. The best example is parallel lines - when parallel lines are close in proximity, people recognize them as parallel very quickly. If the lines are far apart, they become less likely to be recognized as parallel, and with other gestures drawn in between the lines, human recognition drops drastically. In essence, their system was modeled as closely as possible on real human perception, not on the ideal recognition a computer can make.
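A hedged take on that perception-weighted scoring in Python - the angle tolerance, midpoint-distance gap measure, and exponential falloff are my own placeholder choices, in the spirit of the editable constraint settings the paper argues for:

```python
import math

def _angle(line):
    (x0, y0), (x1, y1) = line
    return math.atan2(y1 - y0, x1 - x0) % math.pi  # undirected angle

def _midpoint(line):
    (x0, y0), (x1, y1) = line
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)

def parallel_score(l1, l2, angle_tol=math.radians(10), falloff=50.0):
    """Score in [0, 1]: how 'parallel' two lines look, discounted by the
    distance between them to mimic the human drop-off described above."""
    diff = abs(_angle(l1) - _angle(l2))
    diff = min(diff, math.pi - diff)       # handle wrap-around at pi
    if diff > angle_tol:
        return 0.0
    geometric = 1.0 - diff / angle_tol     # 1.0 = exactly parallel
    (mx1, my1), (mx2, my2) = _midpoint(l1), _midpoint(l2)
    gap = math.hypot(mx2 - mx1, my2 - my1)
    return geometric * math.exp(-gap / falloff)  # far lines count less
```

Two nearby horizontal lines score high, the same pair pushed far apart scores near zero, and a perpendicular line scores exactly zero.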
Discussion:
I like the paper quite a bit, though I think it's somewhat unnecessary. I think the whole point of recognizers should be to get as close to human recognition as possible. In this day and age, we've proven that computers can compute faster than any human being on all but a few kinds of problems. I read that the human brain has around one terabyte of memory, which you can buy in a single hard drive in today's market. In that sense, we shouldn't shoot for perfect recognition in these sketches, because sketches are human-made and thus not perfect. Again, I think the paper just stated what everyone who builds recognizers should already be thinking - make it as human as possible.