Waiting with Josť, a vision-based mobile robot
We consider the problem of unsupervised
classification of temporal sequences of facial expressions in video.
This problem arises in the design of an adaptive visual
agent, which must be capable of identifying appropriate classes
of visual events without supervision to effectively complete its tasks.
We present a multilevel dynamic Bayesian network that learns
the high-level dynamics of facial expressions simultaneously
with models of the expressions themselves.
We show how the parameters of the model can be learned in a scalable
and efficient way. We present preliminary results using
real video data and a class of simulated dynamic event models.
The results show that our model correctly classifies the input
data comparably to a standard event classification approach, while
also learning the high-level model parameters.