Wednesday, December 12, 2007

22 - What are intelligence? And why?

Summary:

Davis begins this rather informative paper by explaining the title (the intentionally plural verb in "What are intelligence?") and how there are many different views of what intelligence is. He explains different views of reasoning, including mathematical, psychological, biological, statistical, and economic. He then goes on to explain different views of how and why intelligence evolved, drawing on various theories about early humans' hunting skills, socializing skills, and so on. The next section discusses less "human" variations on intelligence, describing animals that have proven to be much more intelligent than most of their kind (or at least than a human perception of their kind).

Discussion:

While this paper has little to do with sketch recognition, it is a very interesting read. Davis provides a great deal of information while keeping a neutral stance. When questioned in an interview, he specifically tried not to single any idea out as the best, worst, or even most in need of further consideration.

What I like about this paper is that it explains that if humans ever expect to develop a "strong AI" machine, we cannot continue to focus only on things like rational, statistical logic. Intelligence is a product of evolution (shown, for example, by the asymmetry of the human brain), and a slow and steady process at that. If scientists intend to create a human-like computer brain, they are going to have to cover thousands upon thousands of years' worth of history, let alone what little we know of the time before history was written.

Personally, I have never believed that a true "strong AI" unit will ever be produced. We will have computers that think a thousand times faster and more efficiently than humans, that can make decisions in practically any situation, and that are perhaps even superior to humans in almost every way, but they will never be able to emulate the random quirks that exist in every human. What I was intrigued and somewhat baffled by, however, is Davis's inclusion of "biology" in the reasoning list. I think this will be the defining step in AI, and the major hurdle, the seemingly unscalable wall, that separates "weak AI" from real "strong AI": the small flickers of random and illogical thought that occur in humans, the unexplainable phenomena that separate humans and machines at present.

21 - SketchREAD

Summary:

Alvarado's SketchREAD is a system based on Bayesian networks that cuts down on the time-consuming task of checking every line against every template match. The system uses both a top-down and a bottom-up approach. The top-down pass starts from unfinished shapes and looks for the missing pieces of possible template matches, repeating continuously if necessary. The bottom-up pass is similar to LADDER's approach, where each stroke is recognized as a primitive and the interpretation builds upon itself. The system also "prunes" interpretations generated by the top-down approach, finally selecting the best fit after removing all the unlikely ones. Results show that SketchREAD improves upon baseline numbers, dramatically on family trees and significantly on circuit diagrams as well.
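The interplay described above can be caricatured in a few lines. In this toy sketch, the template definitions, the completeness score, and the pruning threshold are all invented for illustration; the real SketchREAD builds dynamically constructed Bayesian networks rather than doing this kind of part counting:

```python
from collections import Counter

# Hypothetical shape templates (part -> required count); invented for this example.
TEMPLATES = {
    "resistor":  Counter({"zigzag": 1, "line": 2}),
    "capacitor": Counter({"line": 4}),
}

def bottom_up(primitives, templates):
    """Seed a hypothesis for every template that any observed primitive could belong to."""
    hyps = []
    for name, needed in templates.items():
        found = Counter(p for p in primitives if p in needed)
        if found:
            hyps.append({"shape": name, "found": found, "needed": needed})
    return hyps

def top_down(hyp):
    """Ask which parts a partial hypothesis still expects (guides re-examining the ink)."""
    return hyp["needed"] - hyp["found"]

def prune(hyps, threshold=0.6):
    """Discard hypotheses whose completeness score falls below the threshold."""
    def score(h):
        return sum((h["found"] & h["needed"]).values()) / sum(h["needed"].values())
    return [h for h in hyps if score(h) >= threshold]

# A zigzag plus two lines fully supports "resistor" but only half of "capacitor".
hyps = bottom_up(["zigzag", "line", "line"], TEMPLATES)
kept = prune(hyps)
```

Here the "capacitor" reading survives bottom-up seeding but is pruned, while a partial resistor hypothesis can report, top-down, exactly which line it is still missing.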

Discussion:

I wish I could test this system out. It seems like a very nice alternative to LADDER and is definitely in the running. I believe the different methods used within SketchREAD deserve a closer look, as combining top-down and bottom-up recognition techniques seems to be a great asset. I suspect integrating these techniques into existing systems would almost always improve recognition, but without knowing all the domains that exist, that isn't certain. Still, I'd like to see more domains tested with these innovative ideas.

20 - $1 Recognizer

Summary:

Wobbrock et al. explain their new recognizer, which can be implemented with little mathematical background. It is given in four easy-to-follow steps: resampling, rotation, scaling, and classification. The system begins by resampling the gesture to N points (N is chosen by the system designer), then rotates it so that the angle between the gesture's centroid and its start point is zero. The gesture is then scaled to fit a specified bounding box and translated to the origin. Finally, the candidate is compared against the list of stored templates, and the template with the smallest average point-to-point distance wins. Results are very high (around 98%) with simple shapes and a decent number of stored templates.
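The four steps can be sketched compactly. Below is a rough Python rendition of the pipeline as summarized above; the point count, bounding-box size, and the shapes in the usage example are illustrative, and the actual recognizer additionally fine-tunes the rotation with a golden-section search, which is omitted here:

```python
import math

def resample(points, n=64):
    """Resample a stroke to n points evenly spaced along its path length."""
    path_len = sum(math.dist(points[i - 1], points[i]) for i in range(1, len(points)))
    interval = path_len / (n - 1)
    pts, new_pts, d, i = list(points), [points[0]], 0.0, 1
    while i < len(pts):
        seg = math.dist(pts[i - 1], pts[i])
        if d + seg >= interval:
            t = (interval - d) / seg
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            new_pts.append(q)
            pts.insert(i, q)      # keep measuring from the interpolated point
            d = 0.0
        else:
            d += seg
        i += 1
    while len(new_pts) < n:       # guard against floating-point shortfall
        new_pts.append(points[-1])
    return new_pts[:n]

def centroid(points):
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))

def rotate_to_zero(points):
    """Rotate so the centroid-to-first-point 'indicative angle' becomes zero."""
    cx, cy = centroid(points)
    theta = math.atan2(points[0][1] - cy, points[0][0] - cx)
    c, s = math.cos(-theta), math.sin(-theta)
    return [((x - cx) * c - (y - cy) * s + cx,
             (x - cx) * s + (y - cy) * c + cy) for x, y in points]

def scale_and_translate(points, size=250.0):
    """Scale to a reference square, then move the centroid to the origin."""
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    w = w if w > 1e-6 else 1.0    # guard degenerate, near-1D gestures
    h = h if h > 1e-6 else 1.0
    scaled = [(x * size / w, y * size / h) for x, y in points]
    cx, cy = centroid(scaled)
    return [(x - cx, y - cy) for x, y in scaled]

def normalize(points):
    return scale_and_translate(rotate_to_zero(resample(points)))

def path_distance(a, b):
    """Average distance between corresponding points."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def recognize(stroke, templates):
    """Return the name of the stored template nearest the candidate stroke."""
    candidate = normalize(stroke)
    return min(templates, key=lambda name: path_distance(candidate, templates[name]))
```

With two stored templates, say a "vee" and a "square", a wobbly vee comes back as "vee" because its average point distance to the vee template is far smaller.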

Discussion:

While this recognizer may seem to do little to further the field, I believe it is necessary to the advancement into the next evolution of sketch recognition. The real benefit of this system is not in how much it advances the field, but in how it draws new people into the field. It is meant as a "first step" into sketch recognition, a role that, before now, belonged to Rubine's feature system. As a beginner in the field myself, I see this paper as a much prettier and more "fun" way of entering sketch recognition, as Rubine's method is much more intimidating than this system. I believe this system will be useful in drawing fresh ideas from less mathematically inclined designers.

19 - Multiscale Models of Temporal Patterns

Summary:

Sezgin et al. write about how temporal patterns in a user's drawing process can be used to aid recognition. The paper discusses the equations that can be used to find such patterns, as well as how Hidden Markov Models can be used in conjunction with these equations to rank possible interpretations. Using both gives a decently accurate picture of what the user intends to draw, except for the "transistor" symbol in a circuit diagram. Trouble there is usually caused by segmentation, when a wire is drawn after the main part of the transistor but before the "rebounded" part of it. Sezgin et al. compensate for that by looking further down the drawing queue for similar examples, essentially moving the wire after the "rebounded" part. Results are quite impressive, usually in the 80-90% range of correct recognition.
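To make the HMM side concrete, here is a generic log-space Viterbi decode over a toy two-state model. The states, observation symbols, and every probability below are invented for illustration and are not Sezgin et al.'s actual model; the point is only how a hidden-state sequence (what is being drawn) is recovered from an observed stroke sequence:

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for an observation list."""
    # V[t][s] = (best log-probability of reaching state s at time t, best path)
    V = [{s: (math.log(start_p[s]) + math.log(emit_p[s][obs[0]]), [s])
          for s in states}]
    for o in obs[1:]:
        row = {}
        for s in states:
            prob, path = max(
                (V[-1][prev][0] + math.log(trans_p[prev][s]) + math.log(emit_p[s][o]),
                 V[-1][prev][1])
                for prev in states)
            row[s] = (prob, path + [s])
        V.append(row)
    prob, path = max(V[-1].values())
    return path

# Hypothetical model: is the user currently drawing a wire or a transistor?
states = ("wire", "transistor")
start_p = {"wire": 0.5, "transistor": 0.5}
trans_p = {"wire":       {"wire": 0.6, "transistor": 0.4},
           "transistor": {"wire": 0.3, "transistor": 0.7}}
emit_p = {"wire":       {"short-line": 0.9, "curve": 0.1},
          "transistor": {"short-line": 0.3, "curve": 0.7}}

obs = ["short-line", "curve", "curve", "short-line"]
decoded = viterbi(obs, states, start_p, trans_p, emit_p)
```

The decode labels the middle curves as transistor strokes and the flanking short lines as wires, which is the flavor of inference the paper layers its segmentation fixes on top of.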

Discussion:

While I believe there is a future in this way of thinking, I'm not sure about the exact direction. The paper relies heavily on time data, which is not a TERRIBLE idea but can lead to a dependent relationship between the computer and specific users. The idea that people draw things in a specific order is great in theory, but sketches are usually used in a design process, and when designers design something "new" they don't always think linearly. For example, an architect usually draws the outlying structure of a house before putting in the interior walls, to make sure everything fits. But when he gets free rein to build something from scratch, rooms will be continuously added to an existing building, changing the outer wall again and again. I don't think these problems would make the system completely useless, but more work would be needed to keep a domain-independent status.

Monday, November 12, 2007

018 - Multimodal Interaction

Summary:

Adler et al. describe a system in which users draw sketches and interact verbally with the computer. The user can simply talk and explain what they are drawing, or the computer will ask questions when unsure. The paper describes a Wizard-of-Oz user study with an "experimenter" sitting in and pretending to be the computer. The experimenter asked questions to clarify the user's intent; he had his own screen to view the drawing but avoided eye contact with the user. The study showed that users tend to over-clarify their answers to simple questions, and that they pause their speech while finishing a drawing.

Discussion:

Really some very interesting results. I can't help but think of the old starship computer on Star Trek when I think of talking and pressing buttons. Sadly, these results are still inconclusive in my mind, as the study WAS a Wizard of Oz and NOT a real computer. There is a HUGE transition between a person sitting across the table and a computer talking to you, both with the programming and with the attitude of the user. All in all, it's a very interesting idea, but a bit obscure. The paper was all about a user study, but further testing needs to be done for anything solid.

017 - Three Main Concerns in Sketch Recognition

Summary:

In this paper Mahoney and Fromherz describe what they call the three main concerns of sketch recognition and then show a system that loosely addresses them. The three concerns are dealing with human inaccuracy in sketches, being interactive, and being extensible with new features later. The system described deals with "stick figure finding." It searches through the sketch, checking every single line against different labels. For example, a body has 5 attached lines: a head, 2 arms, and 2 legs. Using an algorithm, the system goes through each line, checks whether it could be a body, and eventually finds the optimal labeling. This allows context lines to be excluded from the figure and separated out.
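A drastically simplified caricature of that line-checking idea is below. The endpoint-sharing attachment test and the "exactly 5 attachments means torso" rule are my own inventions for the example; the actual algorithm, presented in the authors' other paper, performs a proper optimization over labelings rather than this brute-force scan:

```python
import math

def attached(a, b, tol=1.0):
    """Two segments 'attach' if any endpoint of one is near an endpoint of the other."""
    return any(math.dist(p, q) <= tol for p in a for q in b)

def find_body(segments, tol=1.0):
    """Brute-force labeling: return the index of the segment with exactly five
    attachments (head, two arms, two legs); context lines attach to nothing."""
    body = None
    for i, seg in enumerate(segments):
        count = sum(1 for j, other in enumerate(segments)
                    if j != i and attached(seg, other, tol))
        if count == 5:
            body = i
    return body

# A stick figure plus one unrelated context line; segments are endpoint pairs.
segments = [
    ((0, 0), (0, 10)),      # torso
    ((0, 10), (0, 13)),     # head
    ((0, 10), (-4, 6)),     # left arm
    ((0, 10), (4, 6)),      # right arm
    ((0, 0), (-3, -6)),     # left leg
    ((0, 0), (3, -6)),      # right leg
    ((20, 20), (25, 25)),   # context line, correctly excluded
]
```

Only the torso accumulates five attachments, so the context line drops out of the figure exactly as the paper's example describes, though even this tiny scan is quadratic in the number of lines, which hints at why the full search sounds slow.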

Discussion:

Two words: NO RESULTS. This algorithm sounds TERRIBLY slow, even though it is presented in another paper. The paper also seems misnamed, because it has very little to do with the three "main concerns" of sketch recognition past the introduction. The algorithm behind the system may be interesting, but the paper itself is really just filler and icing on top of that algorithm.

016 - Constellations

Summary:

The paper deals with a system that combines strokes spatially. It takes in a group of single strokes and calculates their general placement in the drawing area. Using this spatial data, it can identify each stroke's position relative to the others and generalize the data into sets of known shapes. An example was a drawn face: it was required to have two eyes (a right and a left), two pupils, a stroke for the head's outline, a nose, and a mouth. After those were drawn, anything could be added, such as eyelashes, an eyepatch, ears, or hair, and the shape would still be recognized as a face. It accomplishes this by labeling strokes and searching through them with different search bounds, which speeds things up.
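The face example boils down to qualitative spatial constraints over labeled parts, with extra strokes simply ignored. The toy below assumes the stroke labels have already been guessed, which skips the search over labelings that the real system performs; all part names and relations here are invented for illustration:

```python
def is_face(strokes):
    """strokes: dict mapping a part label to its center (x, y); y grows downward.
    Checks a minimal face 'constellation'; unlisted strokes are ignored."""
    required = {"head", "left_eye", "right_eye", "nose", "mouth"}
    if not required <= set(strokes):
        return False
    le, re = strokes["left_eye"], strokes["right_eye"]
    nose, mouth = strokes["nose"], strokes["mouth"]
    return (le[0] < re[0]                    # left eye left of right eye
            and max(le[1], re[1]) < nose[1]  # both eyes above the nose
            and nose[1] < mouth[1])          # nose above the mouth

# An eyepatch stroke is extra and does not break recognition.
face = {"head": (5, 5), "left_eye": (3, 3), "right_eye": (7, 3),
        "nose": (5, 5), "mouth": (5, 7), "eyepatch": (3, 3)}
```

Moving the mouth above the eyes violates a relation and the same stroke set stops reading as a face, which is all the spatial-relation layer buys over mere stroke counting.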

Discussion:

Biggest problem with this paper: no results. Honestly, I don't blame them, because the numbers have to be pretty low. I like the concept of the system, but the loose style of recognition really doesn't sit well with me. It almost feels like taking a step backward in recognition and saying "are there at least 5 strokes there? OK, there are 10 strokes, so it must be a face." It would be interesting to find some way to integrate the design across multiple domains, such as recognizing a face OR a boat, because right now the idea seems really uninspiring.