Research at Cisco

Video Search

Cisco is not currently accepting proposals for this RFP.



As more video content is generated, there is a growing need for better methods of video searching.

Full Description:

Commonly deployed methods of characterizing video content to allow searching, and the search techniques themselves, tend to eschew direct processing of the video. Instead, they depend on applying already proven analysis techniques to the text (e.g. closed captions) and/or audio associated with the video. This results in both poor recall and poor precision (in information retrieval terms) when the visual content is not well represented in the tagging, transcript, or audio. Success in using the audio as the primary source of content characterization is also limited by inaccuracies in speech recognition (for languages that have such support) and the inability to use the audio in the case of languages that do not.

With the opportunity for everyone to become a content provider through their own YouTube channel and other social media, mining video for search-relevant information is increasingly important.

We speculate that directly analyzing the video, in combination with the ancillary text and audio when available, has the potential to substantially improve the utility of video search. There are many valuable use cases, such as:

  • Show me videos from soccer matches in which a player used a hand to stop a shot on the goal line.
  • Show me videos in which a red van travels from north to south, pursued by a police car (a surveillance application).
  • Identify locations so that location-based search can index into video content (e.g., "Where in Manhattan did Woody Allen meet Mia Farrow?").

However, searching video for features such as those in the examples above (e.g., objects, gestures, etc.) and combining them to extract the semantics of an action or event is a hard research problem. Relating the extracted features to user search terms while producing reasonable recall and precision is similarly hard.

As microblog-like services add support for video (e.g., imagine a video tweet), there will be huge expectations for real-time video search that works. Users may settle for 'approximate' search results, which can then be refined with other techniques (e.g., audio analysis) to filter for more accurate final results. An interesting related topic is how to extract an abstract from a video clip that captures the essence of the full clip in a short duration.
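The approximate-then-refine idea above can be sketched as a two-stage pipeline: a fast, coarse retrieval over visual features, followed by re-ranking with ancillary signals such as a transcript. This is a minimal illustration only; the toy index, embeddings, transcripts, and scoring functions are hypothetical stand-ins for real feature extractors and search infrastructure.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical index: video id -> (visual embedding, transcript text).
INDEX = {
    "v1": ([0.9, 0.1, 0.0], "goalkeeper saves a shot on the line"),
    "v2": ([0.8, 0.2, 0.1], "player handles the ball near the goal"),
    "v3": ([0.1, 0.9, 0.3], "van drives down the highway"),
}

def approximate_search(query_vec, k=2):
    """Stage 1: fast, approximate retrieval by visual similarity alone."""
    ranked = sorted(INDEX, key=lambda v: cosine(query_vec, INDEX[v][0]),
                    reverse=True)
    return ranked[:k]

def refine(candidates, query_terms):
    """Stage 2: re-rank candidates by transcript overlap with the query."""
    def text_score(v):
        return len(set(INDEX[v][1].split()) & set(query_terms))
    return sorted(candidates, key=text_score, reverse=True)

# Coarse visual retrieval, then filtering with the ancillary text.
candidates = approximate_search([1.0, 0.0, 0.0])
results = refine(candidates, ["hand", "goal", "ball"])
```

In a real system the first stage would run against an approximate nearest-neighbor index over learned video embeddings, and the second stage could fold in audio, captions, and metadata; the two-stage split is what keeps the search real-time while still improving final precision.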

Broad research areas include (but are not limited to):

  • Semantics of video
  • Fast/real-time video search
  • Approximate video search
  • Abstract of a video
  • Feature extraction/activity recognition in real time
  • Content-based similarity search across videos

Constraints and other information:

IPR will stay with the University. Cisco expects customary scholarly dissemination of results, and hopes that promising results would be made available to the community without limiting licenses, royalties, or other encumbrances.

Proposal submission:

Please use the link below to submit a proposal for research responding to this RFP. After a preliminary review, we may ask you to revise and resubmit your proposal.


RFPs may be withdrawn as research proposals are funded or as interest in the specific topic is satisfied, so researchers should plan to submit their proposals as soon as possible. The deadline for submissions is the Friday of the first week of each calendar quarter (January, April, July, and October). Funding decisions will be made and communicated within 90 days of the quarterly submission deadline, and funds are expected to be used within 12 months of the funding decision. Please plan your requests accordingly.

Questions? Contact: