Abstract

In the audio research field, the majority of machine learning systems focus on recognizing a limited number of sound events. However, when a machine interacts with real data, it must be able to handle much more varied and complex situations. To tackle this problem, annotators use natural language, which allows any sound information to be summarized. Automated Audio Captioning (AAC) was introduced recently to develop systems capable of automatically producing a description of any type of sound in text form. This task concerns all kinds of sound events such as environmental, urban, domestic sounds, sound effects, music or speech. This type of system could be used by people who are deaf or hard of hearing, ...

automated audio captioning deep learning end-to-end multimodal modelling sound events

Information

Author
Labbé, Étienne
Institution
IRIT
Supervisors
Publication Year
2024
Upload Date
July 16, 2024
