Project Abstract:
We consider the problem of mapping words to the visual aspects
they describe. For example, we would like to determine that the
word red refers to a range of values for light intensity, or color,
while round describes the shape of an object. It
is also possible for several words to refer to the same aspect of
visual representations, but differ in the range of values accepted.
Like the word red, yellow also refers to the color of an object, but
it takes on a different set of allowable values for that feature. Our
goal is to learn such associations automatically, using Expectation
Maximization, from annotated image data in which the alignment between
words and image segments is unknown. The general framework
laid down in this project may be easily extended to binary predicates
such as larger and on.
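
The sketch below illustrates one way such an EM loop could be organized; it is only an illustrative outline under stated assumptions, not this project's implementation. It assumes each word is modeled as a diagonal Gaussian over segment feature vectors (e.g., mean color values), that each annotated image contributes the feature vectors of its segments, and that the names em_word_alignment, data, and n_iters are hypothetical.

import numpy as np

def em_word_alignment(data, dim, n_iters=20, seed=0):
    # data: dict mapping word -> list of arrays; each array has shape
    # (n_segments, dim) and holds the feature vectors of the segments in
    # one image whose caption contains the word. Alignment is unknown.
    rng = np.random.default_rng(seed)
    words = sorted(data)
    mu = {w: rng.normal(size=dim) for w in words}   # per-word Gaussian mean
    var = {w: np.ones(dim) for w in words}          # per-word variance

    for _ in range(n_iters):
        stats = {w: [np.zeros(dim), np.zeros(dim), 0.0] for w in words}
        # E-step: soft-align each word to the segments of each image it annotates
        for w in words:
            for segs in data[w]:
                ll = -0.5 * np.sum((segs - mu[w]) ** 2 / var[w]
                                   + np.log(2 * np.pi * var[w]), axis=1)
                resp = np.exp(ll - ll.max())
                resp /= resp.sum()                  # responsibilities over segments
                stats[w][0] += resp @ segs
                stats[w][1] += resp @ (segs ** 2)
                stats[w][2] += 1.0
        # M-step: re-estimate each word's Gaussian from its weighted segments
        for w in words:
            s1, s2, n = stats[w]
            if n > 0:
                mu[w] = s1 / n
                var[w] = np.maximum(s2 / n - mu[w] ** 2, 1e-6)
    return mu, var

In this toy formulation, a word such as red would end up with a mean concentrated in the red region of color space, while a word such as yellow would share the same feature dimensions but converge to a different range of values.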