fix typos: AIRBench -> AIR-Bench in README and docs

KairuiHu · KairuiHu · commit fb117599ebbd · 2024-11-27T20:19:02.000+08:00
diff --git a/README.md b/README.md
@@ -20,7 +20,7 @@
 
 ## Annoucement
 
-- [2024-11] 🔈🔊 The `lmms-eval/v0.3.0` has been upgraded to support audio evaluations for audio models like Qwen2-Audio and Gemini_Audio across tasks such as AIRBench, Clotho-AQA, LibriSpeech, and more. Please refer to the [blog](https://github.com/EvolvingLMMs-Lab/lmms-eval/blob/main/docs/lmms-eval-0.3.md) for more details!
+- [2024-11] 🔈🔊 The `lmms-eval/v0.3.0` has been upgraded to support audio evaluations for audio models like Qwen2-Audio and Gemini_Audio across tasks such as AIR-Bench, Clotho-AQA, LibriSpeech, and more. Please refer to the [blog](https://github.com/EvolvingLMMs-Lab/lmms-eval/blob/main/docs/lmms-eval-0.3.md) for more details!
 
 - [2024-07] 🎉🎉 We have released the [technical report](https://arxiv.org/abs/2407.12772) and [LiveBench](https://huggingface.co/spaces/lmms-lab/LiveBench)! 
 
diff --git a/docs/lmms-eval-0.3.md b/docs/lmms-eval-0.3.md
@@ -109,7 +109,7 @@ This upgrade includes multiple benchmarks for audio understanding and instructio
 
 | **Dataset** | **Year** | **Task Name in lmms-eval** | **Split** | **Task Format** | **Evaluation Metric** | **Number of QAs** | **Feature** |
 | --- | --- | --- | --- | --- | --- | --- | --- |
-| **AIRBench** | 2024 | air_bench_chat \| air_bench_foundation | chat, foundation | AIF | GPT-4 Eval (chat) \| Accuracy (foundation) | 2k (chat) \| 19k (foundation) | 1. Comprhensive tasks and audio types |
+| **AIR-Bench** | 2024 | air_bench_chat \| air_bench_foundation | chat, foundation | AIF | GPT-4 Eval (chat) \| Accuracy (foundation) | 2k (chat) \| 19k (foundation) | 1. Comprhensive tasks and audio types |
 | **Alpaca Audio** | 2024 | alpaca_audio | test | AIF | GPT-4 Eval | 100 | 1. Synthetic voice |
 | **Clotho-AQA** | 2022 | clotho_aqa | test \| val | AIF | Accuracy | test_v2 (2.06k), test \| val (1.44k \| 1.05k) | 1. Audio Question Answering<br> 2. Single word answer<br> 3. Text based question |
 | **Common_voice** | 2023 | common_voice_15 | test | ASR | WER(↓) (align with Qwen-audio) | en (16.4k) \| fr (16.1k) \| zh (10.6k) | 1. Real people voice<br> 2. Captioning |
@@ -130,11 +130,11 @@ AIF refers to Audio Instruction Following, and ASR refers to Audio Speech Recogn
 
 |  |  | **Metric** | **Qwen2-Audio-Instruct (lmms-eval)** | **Qwen2-Audio (lmms-eval)** |
 | --- | --- | --- | --- | --- |
-| **AIRBench-Chat** | Speech | GPT-Eval  | 7.16 |  |
+| **AIR-Bench-Chat** | Speech | GPT-Eval  | 7.16 |  |
 |  | Sound |  | 6.14 |  |
 |  | Music |  | 6.66 |  |
 |  | Mixed |  | 5.75 |  |
-| **AIRBench-Foundation** | Speech | Acc | 62.89 |  |
+| **AIR-Bench-Foundation** | Speech | Acc | 62.89 |  |
 |  | Sound |  | 55.42 |  |
 |  | Music |  | 56.77 |  |
 | **Alpaca** | test | GPT-Eval | 51.8 |  |