Automated audio captioning with deep learning methods (2024)
Abstract / truncated to 115 words
In the audio research field, the majority of machine learning systems focus on recognizing a limited number of sound events. However, when a machine interacts with real data, it must be able to handle much more varied and complex situations. To tackle this problem, annotators use natural language, which allows any sound information to be summarized. Automated Audio Captioning (AAC) was introduced recently to develop systems capable of automatically producing a description of any type of sound in text form. This task concerns all kinds of sound events, such as environmental, urban, and domestic sounds, sound effects, music, or speech. This type of system could be used by people who are deaf or hard of hearing, ...
automated audio captioning – deep learning – end-to-end multimodal modelling – sound events
Information
- Author: Labbé, Étienne
- Institution: IRIT
- Supervisors
- Publication Year: 2024
- Upload Date: July 16, 2024