Commit fb11759

fix typos
1 parent 7cb99a0 commit fb11759

2 files changed (+4, -4 lines)

README.md

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@

 ## Annoucement

-- [2024-11] 🔈🔊 The `lmms-eval/v0.3.0` has been upgraded to support audio evaluations for audio models like Qwen2-Audio and Gemini_Audio across tasks such as AIRBench, Clotho-AQA, LibriSpeech, and more. Please refer to the [blog](https://github.com/EvolvingLMMs-Lab/lmms-eval/blob/main/docs/lmms-eval-0.3.md) for more details!
+- [2024-11] 🔈🔊 The `lmms-eval/v0.3.0` has been upgraded to support audio evaluations for audio models like Qwen2-Audio and Gemini_Audio across tasks such as AIR-Bench, Clotho-AQA, LibriSpeech, and more. Please refer to the [blog](https://github.com/EvolvingLMMs-Lab/lmms-eval/blob/main/docs/lmms-eval-0.3.md) for more details!

 - [2024-07] 🎉🎉 We have released the [technical report](https://arxiv.org/abs/2407.12772) and [LiveBench](https://huggingface.co/spaces/lmms-lab/LiveBench)!

docs/lmms-eval-0.3.md

Lines changed: 3 additions & 3 deletions
@@ -109,7 +109,7 @@ This upgrade includes multiple benchmarks for audio understanding and instructio

 | **Dataset** | **Year** | **Task Name in lmms-eval** | **Split** | **Task Format** | **Evaluation Metric** | **Number of QAs** | **Feature** |
 | --- | --- | --- | --- | --- | --- | --- | --- |
-| **AIRBench** | 2024 | air_bench_chat \| air_bench_foundation | chat, foundation | AIF | GPT-4 Eval (chat) \| Accuracy (foundation) | 2k (chat) \| 19k (foundation) | 1. Comprhensive tasks and audio types |
+| **AIR-Bench** | 2024 | air_bench_chat \| air_bench_foundation | chat, foundation | AIF | GPT-4 Eval (chat) \| Accuracy (foundation) | 2k (chat) \| 19k (foundation) | 1. Comprhensive tasks and audio types |
 | **Alpaca Audio** | 2024 | alpaca_audio | test | AIF | GPT-4 Eval | 100 | 1. Synthetic voice |
 | **Clotho-AQA** | 2022 | clotho_aqa | test \| val | AIF | Accuracy | test_v2 (2.06k), test \| val (1.44k \| 1.05k) | 1. Audio Question Answering<br> 2. Single word answer<br> 3. Text based question |
 | **Common_voice** | 2023 | common_voice_15 | test | ASR | WER(↓) (align with Qwen-audio) | en (16.4k) \| fr (16.1k) \| zh (10.6k) | 1. Real people voice<br> 2. Captioning |
@@ -130,11 +130,11 @@ AIF refers to Audio Instruction Following, and ASR refers to Audio Speech Recogn

 | | | **Metric** | **Qwen2-Audio-Instruct (lmms-eval)** | **Qwen2-Audio (lmms-eval)** |
 | --- | --- | --- | --- | --- |
-| **AIRBench-Chat** | Speech | GPT-Eval | 7.16 | |
+| **AIR-Bench-Chat** | Speech | GPT-Eval | 7.16 | |
 | | Sound | | 6.14 | |
 | | Music | | 6.66 | |
 | | Mixed | | 5.75 | |
-| **AIRBench-Foundation** | Speech | Acc | 62.89 | |
+| **AIR-Bench-Foundation** | Speech | Acc | 62.89 | |
 | | Sound | | 55.42 | |
 | | Music | | 56.77 | |
 | **Alpaca** | test | GPT-Eval | 51.8 | |
