Commit 62cc689

NimrodShabtay and Nimrod-Shabtay authored
Add LiveXiv benchmark [ICLR 2025] (#572)
* initial commit
* lint

Co-authored-by: Nimrod-Shabtay <[email protected]>
1 parent 5b7266d commit 62cc689

18 files changed (+254, -0 lines)
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+group: livexiv_tqa
+task:
+  - livexiv_tqa_v1
+  - livexiv_tqa_v2
+  - livexiv_tqa_v3
+  - livexiv_tqa_v4
+  - livexiv_tqa_v5
+  - livexiv_tqa_v6
+
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+dataset_path: LiveXiv/LiveXiv
+dataset_kwargs:
+  token: True
+test_split: test
+dataset_name: TQA-2024-09-21
+output_type: generate_until
+doc_to_visual: !function utils.livexiv_doc_to_visual
+doc_to_text: !function utils.livexiv_doc_to_text
+doc_to_target: "answer"
+generation_kwargs:
+  until:
+    - "ASSISTANT:"
+  image_aspect_ratio: original
+process_results: !function utils.livexiv_process_result
+process_results_use_image: true
+metric_list:
+  - metric: livexiv_tqa
+    aggregation: !function utils.livexiv_aggregation_result
+    higher_is_better: true
+metadata:
+  - version: 0.0
+
+lmms_eval_specific_kwargs:
+  default:
+    pre_prompt: ""
+    post_prompt: ""
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+task: "livexiv_tqa_v1"
+dataset_name: "TQA-2024-09-21"
+include: livexiv_tqa_template_yaml
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+task: "livexiv_tqa_v2"
+dataset_name: "TQA-2024-10-26"
+include: livexiv_tqa_template_yaml
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+task: "livexiv_tqa_v3"
+dataset_name: "v3-TQA"
+include: livexiv_tqa_template_yaml
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+task: "livexiv_tqa_v4"
+dataset_name: "v4-TQA"
+include: livexiv_tqa_template_yaml
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+task: "livexiv_tqa_v5"
+dataset_name: "v5-TQA"
+include: livexiv_tqa_template_yaml
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+task: "livexiv_tqa_v6"
+dataset_name: "v6-TQA"
+include: livexiv_tqa_template_yaml
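Review note: each per-version file sets only `task` and `dataset_name` and pulls everything else from the template through `include`. The sketch below illustrates the override behavior this relies on, assuming the loader reads the template first and then lets the including file's keys win (a shallow merge). `resolve_include` is a hypothetical helper written for illustration, not a function from the evaluation framework, and the dicts are hand-copied subsets of the YAML above.

```python
from typing import Any, Dict


def resolve_include(template: Dict[str, Any], child: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical shallow merge: template values first, child values override."""
    merged = dict(template)
    # The `include` key only points at the template, so it is dropped here.
    merged.update({key: value for key, value in child.items() if key != "include"})
    return merged


# Subset of the template config shown above.
template = {
    "dataset_path": "LiveXiv/LiveXiv",
    "test_split": "test",
    "dataset_name": "TQA-2024-09-21",
    "output_type": "generate_until",
}

# The v2 task file shown above.
v2 = {
    "task": "livexiv_tqa_v2",
    "dataset_name": "TQA-2024-10-26",
    "include": "livexiv_tqa_template_yaml",
}

resolved = resolve_include(template, v2)
print(resolved["task"], resolved["dataset_name"])
```

Under that assumption, the `dataset_name` in the template acts only as a default: every version file replaces it with its own snapshot name.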
Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
+import json
+import re
+
+
+def extract_answer(text):
+    match = re.findall(r"(?<!^)[A-Z]", text)
+    if match:
+        return match[0]
+    return None
+
+
+def livexiv_doc_to_visual(doc):
+    return [doc["image"].convert("RGB")]
+
+
+def livexiv_doc_to_text(doc, model_specific_kwargs=None):
+    question = doc["question"]
+    question += "\n" + f"A. {doc['option_a']}\n"
+    question += f"B. {doc['option_b']}\n"
+    question += f"C. {doc['option_c']}\n"
+    question += f"D. {doc['option_d']}"
+    return f"{question}\nAnswer with the option's letter from the given choices directly."
+
+
+def livexiv_process_result(doc, result):
+    pred = result[0].strip()
+    if len(pred) > 1:
+        if "answer" in pred.lower():
+            pred = extract_answer(pred)
+        else:
+            pred = pred[0]
+    answer = doc["gt"]
+
+    return {f"livexiv_tqa": {"pred": pred, "answer": answer}}
+
+
+def livexiv_aggregation_result(results):
+    total_count = 0
+    total_correct = 0
+    for result in results:
+        try:
+            if result["pred"].lower().strip() == result["answer"].lower().strip():
+                total_correct += 1
+        except Exception as e:
+            print(e)
+
+        total_count += 1
+    return total_correct / total_count
+
+
+def livexiv_aggregation_result_all(results):
+    score = livexiv_aggregation_result(results)
+    stored_results = []
+    for result in results:
+        stored_results.append({"question_id": result["question_id"], "prediction": result["pred"]})
+    with open("./livexiv_tqa_submission.json", "w") as f:
+        json.dump(stored_results, f, indent=4)
+    print("Storing files for LiveXiv-TQA submission ...")
+
+    return score
+
+
+def livexiv_doc_to_text_mc(doc):
+    question = doc["question"]
+    return f"{question} Answer :"
+
+
+def livexiv_doc_to_choice(doc):
+    return [doc["option_a"], doc["option_b"], doc["option_c"], doc["option_d"]]
+
+
+def livexiv_doc_to_mc_target(doc):
+    answer2choice = {"A": "option_a", "B": "option_b", "C": "option_c", "D": "option_d"}
+    return doc[answer2choice[doc["answer"]]]
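Review note: in the code above, `extract_answer` can return `None`, and `livexiv_aggregation_result` relies on its `try`/`except` to absorb the resulting `AttributeError`; an empty result list would also divide by zero. The sketch below shows one possible way to make both cases explicit. It is an illustrative variant, not the committed implementation: `extract_choice_letter`, `score_prediction`, and `accuracy` are hypothetical names, and the regex is a different heuristic from the one in the commit (it looks for a standalone uppercase letter A-D).

```python
import re
from typing import Dict, List, Optional


def extract_choice_letter(text: str) -> Optional[str]:
    """Hypothetical helper: return the first standalone option letter A-D, or None.

    Caveat: a leading English article "A" would also be read as option A.
    """
    match = re.search(r"\b([A-D])\b", text.strip())
    return match.group(1) if match else None


def score_prediction(raw_output: str, gold: str) -> Dict[str, Optional[str]]:
    """Pair the parsed prediction (possibly None) with the gold letter."""
    return {"pred": extract_choice_letter(raw_output), "answer": gold}


def accuracy(results: List[Dict[str, Optional[str]]]) -> float:
    """Fraction of exact letter matches; unparsed predictions count as wrong."""
    if not results:
        return 0.0  # assumption: report 0.0 for an empty run instead of raising
    correct = sum(
        1
        for r in results
        if r["pred"] is not None
        and r["pred"].strip().lower() == r["answer"].strip().lower()
    )
    return correct / len(results)


# Made-up model outputs paired with made-up gold letters, for illustration only.
samples = [("B", "B"), ("The answer is C", "C"), ("Answer: D", "A"), ("no idea", "B")]
results = [score_prediction(output, gold) for output, gold in samples]
print(accuracy(results))  # 0.5
```

Treating an unparsed output as an incorrect answer keeps the denominator equal to the number of questions, which matches how the committed aggregation counts every result.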
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+group: livexiv_vqa
+task:
+  - livexiv_vqa_v1
+  - livexiv_vqa_v2
+  - livexiv_vqa_v3
+  - livexiv_vqa_v4
+  - livexiv_vqa_v5
+  - livexiv_vqa_v6
+
