Automated audio captioning with deep learning methods

Abstract (truncated)

In the audio research field, the majority of machine learning systems focus on recognizing a limited number of sound events. However, when a machine interacts with real data, it must be able to handle much more varied and complex situations. To tackle this problem, annotators use natural language, which allows any sound information to be summarized. Automated Audio Captioning (AAC) was introduced recently to develop systems capable of automatically producing a description of any type of sound in text form. This task concerns all kinds of sound events such as environmental, urban, domestic sounds, sound effects, music or speech. This type of system could be used by people who are deaf or hard of hearing, ...

Keywords

automated audio captioning – deep learning – end-to-end multimodal modelling – sound events

Information

Author

Labbé, Étienne

Institution

IRIT

Supervisors

Publication Year

2024

Upload Date

July 16, 2024

