
Commit 5d49a5a

update readme
1 parent 669efd3 commit 5d49a5a

7 files changed (+1998, -1 lines)

README.md

Lines changed: 2 additions & 1 deletion
@@ -12,7 +12,8 @@
 **OmniParser** is a comprehensive method for parsing user interface screenshots into structured and easy-to-understand elements, which significantly enhances the ability of GPT-4V to generate actions that can be accurately grounded in the corresponding regions of the interface.
 
 ## News
-- [2024/11/26] We release an updated version, OmniParser V1.5, which features 1) more fine-grained/small icon detection and 2) prediction of whether each screen element is interactable. Examples are in demo.ipynb.
+- [2025/1] V2 is coming. We achieve a new state-of-the-art result (39.5%) on [ScreenSpot Pro](https://github.com/likaixin2000/ScreenSpot-Pro-GUI-Grounding/tree/main) with OmniParser V2 (to be released soon)! Read more details [here](https://github.com/microsoft/OmniParser/tree/master/docs).
+- [2024/11] We release an updated version, OmniParser V1.5, which features 1) more fine-grained/small icon detection and 2) prediction of whether each screen element is interactable. Examples are in demo.ipynb.
 - [2024/10] OmniParser was the #1 trending model on the Hugging Face model hub (starting 10/29/2024).
 - [2024/10] Feel free to check out our demo on [Hugging Face Space](https://huggingface.co/spaces/microsoft/OmniParser)! (Stay tuned for OmniParser + Claude Computer Use.)
 - [2024/10] Both the Interactive Region Detection model and the Icon Functional Description model are released! [Hugging Face models](https://huggingface.co/microsoft/OmniParser)

docs/Evaluation.md

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+# Eval setup for ScreenSpot Pro
+We adapt the eval code from the official ScreenSpot Pro (SS Pro) repo: https://github.com/likaixin2000/ScreenSpot-Pro-GUI-Grounding/tree/main. This folder contains the inference script and results on this benchmark. OmniParser v2 is still under legal review before release; once the review is done, we will update the file so that it can load the v2 model.
+1. eval/ss_pro_gpt4o_omniv2.py: contains the prompt we use; it can be used as a drop-in replacement for this [file](https://github.com/likaixin2000/ScreenSpot-Pro-GUI-Grounding/blob/main/models/gpt4x.py) in the original SS Pro repo.
+2. eval/logs_sspro_omniv2.json: contains the inference results on SS Pro using GPT-4o + OmniParser v2.
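The drop-in replacement described in Evaluation.md amounts to swapping one file in a local ScreenSpot Pro checkout. A minimal Python sketch of that swap, assuming both repositories are cloned locally; the function name `install_omniparser_prompt` and the `.bak` backup are hypothetical conventions of this sketch, not part of either repo:

```python
import shutil
from pathlib import Path


def install_omniparser_prompt(omniparser_repo: Path, sspro_repo: Path) -> Path:
    """Copy eval/ss_pro_gpt4o_omniv2.py over models/gpt4x.py in a ScreenSpot Pro checkout.

    Hypothetical helper: the original gpt4x.py is kept as gpt4x.py.bak
    so the swap can be undone. Returns the path of the replaced file.
    """
    src = omniparser_repo / "eval" / "ss_pro_gpt4o_omniv2.py"
    dst = sspro_repo / "models" / "gpt4x.py"
    if not src.is_file():
        raise FileNotFoundError(src)
    if dst.exists():
        # Back up the original model wrapper before overwriting it.
        shutil.copy2(dst, dst.with_name(dst.name + ".bak"))
    shutil.copy2(src, dst)
    return dst
```

This only places the prompt script; per the note above, loading the v2 model itself depends on the pending release.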

eval/logs_sspro_omniv2.json

Lines changed: 1581 additions & 0 deletions

eval/ss_pro_gpt4o_omniv2.py

Lines changed: 411 additions & 0 deletions

imgs/teams.png

1.67 MB

imgs/windows.png

1.13 MB

imgs/windows_vm.png

1.92 MB
