Human communication in natural contexts involves the dynamic coordination of contextual cues, paralinguistic information, and both literal and figurative language use. In the present study, we constructed a paradigm with four types of video clips: literal and metaphorical expressions accompanied by congruent or incongruent gestures. Participants were instructed to classify the gesture accompanying each expression as congruent or incongruent by pressing one of two keys while electrophysiological activity was recorded. We compared behavioral measures and event-related potential (ERP) differences time-locked to the gesture stroke onset. Accuracy data showed that incongruent metaphorical expressions were more difficult to classify. Reaction times were modulated by incongruent gestures, by metaphorical expressions, and by a gesture-expression interaction. No behavioral differences were found between literal and metaphorical expressions when the gesture was congruent. Metaphorical expressions elicited greater negativity in the N400-like and LPC-like (late positive complex) components. The N400-like modulation showed a greater difference between the congruent and incongruent categories over the left anterior region for metaphorical than for literal expressions. Importantly, the literal congruent and metaphorical congruent categories did not differ. Accuracy, reaction times, and ERPs provide convergent support for a greater contextual sensitivity of metaphorical expressions.