XML-DEV Mailing List Archive

Re: Something altogether different?


> So where we do understand how the vector model
> works for text analysis, do we understand how to apply
> it to a *text* that includes video and audio as integral
> parts of the *text* and can we combine these into a
> higher level space vector term

Providing metadata for rich types is an area that has seen some interesting work.
Besides the Dublin Core ViDe initiative, I came across the following papers and
approaches when researching this recently:

1. "Facilitating Video Access by Visualizing Automatic Analysis"
http://www.fxpal.com/publications/FXPAL-PR-99-045.pdf

"Metadata for video materials can be derived from the analysis of the audio and
video streams. For audio, we identify features such as silence, applause, and
speaker identity. For video, we find features such as shot boundaries,
presentation slides, and close-ups of human faces."

2. Yahoo has recently taken the RSS approach. Video RSS describes each item with
textual metadata such as height, width, bitrate and running time:
http://www.webservicessummit.com/Channels/WebServicesSummitAudioVideo.rss
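
To make the RSS approach concrete, here is a minimal Python sketch that pulls that
kind of metadata out of a feed. The sample feed, the namespace URI and the
media:content element with its attribute names are my assumptions for
illustration, not taken from the feed above, so check them against the feed's own
declarations.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the media extension; verify against the real feed.
MEDIA_NS = "http://search.yahoo.com/mrss/"

# Hypothetical feed fragment, invented for this example.
SAMPLE = """<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>Example channel</title>
    <item>
      <title>Example talk</title>
      <media:content url="http://example.org/talk.mov" type="video/quicktime"
                     height="240" width="320" bitrate="300" duration="185"/>
    </item>
  </channel>
</rss>"""


def media_metadata(rss_text):
    """Return one dict per media:content element, with numeric fields as floats."""
    root = ET.fromstring(rss_text)
    records = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        for content in item.iter("{%s}content" % MEDIA_NS):
            record = {"title": title, "url": content.get("url")}
            for key in ("height", "width", "bitrate", "duration"):
                value = content.get(key)
                # A missing attribute stays None rather than being guessed.
                record[key] = float(value) if value is not None else None
            records.append(record)
    return records


if __name__ == "__main__":
    for record in media_metadata(SAMPLE):
        print(record)
```

Once the values are plain numbers they can be indexed and queried like any other
field, which is the attraction of the approach.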

3. SQL implementations such as DB2 UDB support content-based querying over rich
types. DB2 has an Image Extender and Audio Extender with corresponding types
(DB2IMAGE, DB2AUDIO). The Audio Extender analyses the content and stores values
such as whether it's 16-bit audio, samples per second, playing time, the number
of clock ticks per quarter note and so on. The Image Extender stores information
that enables you to provide an image and search for matches based on color and
texture (contrast, directionality, etc.).
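
As a rough illustration of matching images by color, here is a small Python
sketch using quantized color histograms and histogram intersection. This is a
generic technique I am substituting for illustration; it is not the Image
Extender's API or algorithm, and the function names and pixel data are made up.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized histogram over quantized (r, g, b) values in the range 0..255."""
    counts = [0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each channel value to a bin index in 0..bins_per_channel-1.
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        counts[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1
    total = len(pixels)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: 1.0 for identical color distributions, 0.0 for disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))


if __name__ == "__main__":
    # Hypothetical tiny "images", each given as a list of RGB pixels.
    red = [(250, 10, 10)] * 4
    mostly_red = [(240, 20, 5)] * 3 + [(10, 10, 250)]
    blue = [(5, 5, 240)] * 4

    query = color_histogram(red)
    print(histogram_intersection(query, color_histogram(mostly_red)))
    print(histogram_intersection(query, color_histogram(blue)))
```

Texture features such as contrast and directionality would need their own
descriptors; the sketch covers color only.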

IBM's CueVideo software uses speech recognition technology to generate text from
the audio tracks of videos -- which could then be fed into an engine that uses
the vector space model and textual similarity matching described in my previous
message:
http://www.almaden.ibm.com/projects/data/CueVideo.pdf
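
A minimal Python sketch of that last step, assuming the speech recognizer has
already produced plain-text transcripts. The clip names and transcript text are
invented, and I use raw term counts with cosine similarity; a real engine would
likely add weighting such as tf-idf, stemming and stop-word removal.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words vector: lowercase term -> raw count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(v1, v2):
    """Cosine of the angle between two term vectors; 0.0 if either is empty."""
    dot = sum(count * v2[term] for term, count in v1.items())
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


def rank(query, transcripts):
    """Return (score, clip name) pairs, best match first."""
    query_vector = term_vector(query)
    scored = [
        (cosine_similarity(query_vector, term_vector(text)), name)
        for name, text in transcripts.items()
    ]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Hypothetical transcripts standing in for speech recognition output.
    transcripts = {
        "clip_a": "the vector space model represents each document as a vector of term weights",
        "clip_b": "welcome to the quarterly sales meeting and thank you for joining",
    }
    for score, name in rank("vector space model", transcripts):
        print(name, round(score, 3))
```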

4. This paper discusses analysis of digital music using similarity matrices:
"Media Segmentation using Self-Similarity Decomposition"
http://www.fxpal.com/people/cooper/Papers/SPIE02.pdf

"We assume only that the audio or music exhibits instances of similar segments,
possibly separated by other segments. For example, a common popular song
structure is ABABCAB, where A is a verse segment, B is the chorus, and C is the
bridge or "middle eight." We would hope to be able to group the segments of this
song into three clusters corresponding to the three different parts. Once this is
done, the song could be summarized by presenting only the novel segments. In this
example, the sequence ABC is a significantly shorter summary containing
essentially all the information in the song."


"3.1. Clustering via similarity matrix decomposition
To cluster the segments, we factor a segment-indexed similarity matrix to find
repeated or substantially similar groups of segments."
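
To show the idea on the ABABCAB example, here is a small Python sketch that
builds a segment-indexed similarity matrix and groups similar segments. Note
that it uses simple greedy thresholding rather than the matrix factorization the
paper describes, and the per-segment feature vectors and the 0.9 threshold are
invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_matrix(features):
    """Segment-indexed matrix: entry [i][j] compares segment i with segment j."""
    return [[cosine(u, v) for v in features] for u in features]


def group_segments(sim, threshold=0.9):
    """Greedy grouping: a segment joins the first cluster whose first member it resembles."""
    labels = []
    representatives = []  # index of the first segment seen in each cluster
    for i in range(len(sim)):
        for cluster, rep in enumerate(representatives):
            if sim[i][rep] >= threshold:
                labels.append(cluster)
                break
        else:
            representatives.append(i)
            labels.append(len(representatives) - 1)
    return labels, representatives


if __name__ == "__main__":
    # Hypothetical features for seven segments of a song shaped like ABABCAB.
    features = [
        [0.9, 0.1, 0.0],  # verse
        [0.1, 0.9, 0.1],  # chorus
        [0.8, 0.2, 0.0],  # verse
        [0.0, 1.0, 0.1],  # chorus
        [0.1, 0.0, 0.9],  # bridge
        [1.0, 0.1, 0.1],  # verse
        [0.1, 0.8, 0.0],  # chorus
    ]
    labels, reps = group_segments(similarity_matrix(features))
    structure = "".join(chr(ord("A") + label) for label in labels)
    summary = "".join(chr(ord("A") + labels[r]) for r in reps)
    print(structure, summary)
```

The summary keeps the first segment of each group, which corresponds to
presenting only the novel segments.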

