Re: Something altogether different?
> So where we do understand how the vector model
> works for text analysis, do we understand how to apply
> it to a *text* that includes video and audio as integral
> parts of the *text* and can we combine these into a
> higher level space vector term

Providing metadata for rich types is an area that has seen some interesting work. Besides the Dublin Core ViDe initiative, I came across several relevant papers while researching this recently:

1. "Facilitating Video Access by Visualizing Automatic Analysis"
http://www.fxpal.com/publications/FXPAL-PR-99-045.pdf

"Metadata for video materials can be derived from the analysis of the audio and video streams. For audio, we identify features such as silence, applause, and speaker identity. For video, we find features such as shot boundaries, presentation slides, and close-ups of human faces."

2. Yahoo has recently taken the RSS approach. Video RSS provides textual metadata such as height, width, bitrate, and running time:
http://www.webservicessummit.com/Channels/WebServicesSummitAudioVideo.rss

3. SQL implementations such as DB2 UDB support content-based querying over rich types. DB2 has an Image Extender and an Audio Extender with corresponding types (DB2IMAGE, DB2AUDIO). The Audio Extender analyses the content and stores values such as whether it is 16-bit audio, samples per second, playing time, the number of clock ticks per quarter note, and so on. The Image Extender stores information that enables you to supply an image and search for matches based on color and texture (contrast, directionality, etc.). IBM's CueVideo software uses speech recognition technology to generate text from the audio tracks of videos -- which could then be fed into an engine that uses the vector space model and textual similarity matching described in my previous message:
http://www.almaden.ibm.com/projects/data/CueVideo.pdf

4. This paper discusses the analysis of digital music using similarity matrices:
"Media Segmentation using Self-Similarity Decomposition"
http://www.fxpal.com/people/cooper/Papers/SPIE02.pdf

"We assume only that the audio or music exhibits instances of similar segments, possibly separated by other segments. For example, a common popular song structure is ABABCAB, where A is a verse segment, B is the chorus, and C is the bridge or 'middle eight.' We would hope to be able to group the segments of this song into three clusters corresponding to the three different parts. Once this is done, the song could be summarized by presenting only the novel segments. In this example, the sequence ABC is a significantly shorter summary containing essentially all the information in the song."

And from section 3.1, "Clustering via similarity matrix decomposition": "To cluster the segments, we factor a segment-indexed similarity matrix to find repeated or substantially similar groups of segments."
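To make the CueVideo idea concrete -- feeding recognized speech into a vector-space engine -- here is a minimal sketch of bag-of-words cosine similarity. It assumes plain whitespace tokenization and raw term counts; a real engine would use tf-idf weighting and a proper tokenizer, and the function name is mine, not CueVideo's:

```python
import math
from collections import Counter

def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine similarity of two documents as bag-of-words term vectors."""
    a, b = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    # Dot product over the terms the two vectors share.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Two hypothetical speech-recognition transcripts from video audio tracks.
t1 = "shot boundary detection in video streams"
t2 = "detection of shot boundary events in video"
print(cosine_similarity(t1, t2))  # high overlap, close to 1
```

Identical transcripts score 1.0, disjoint vocabularies score 0.0, and partially overlapping ones fall in between, which is exactly the ranking a retrieval engine needs.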
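As a rough illustration of the similarity-matrix idea in that last paper: compute pairwise similarity between per-frame feature vectors, and repeated material (a recurring chorus, say) shows up as high-valued off-diagonal entries. This is my own toy sketch, not the paper's implementation -- the three-number "feature vectors" stand in for real spectral features:

```python
import math

def self_similarity_matrix(frames):
    """Pairwise cosine similarity between per-frame feature vectors."""
    def cos(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        nu = math.sqrt(sum(x * x for x in u))
        nv = math.sqrt(sum(x * x for x in v))
        return dot / (nu * nv) if nu and nv else 0.0
    return [[cos(u, v) for v in frames] for u in frames]

# Toy "song" with structure A B A B: each letter is a stand-in
# feature vector for one segment of audio.
A = [1.0, 0.0, 0.2]
B = [0.1, 1.0, 0.0]
S = self_similarity_matrix([A, B, A, B])
# S[0][2] compares the two A segments (near 1.0);
# S[0][1] compares A with B (much lower).
print(S[0][2], S[0][1])
```

Clustering then amounts to grouping rows of this matrix, so the ABAB sequence collapses to two clusters -- the "novel segments" the paper uses for summarization.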