REP: Ray History Server #62

andrewsykim · 2025-11-21T16:47:16Z

reps/2025-11-21-ray-history-server/2025-11-21-ray-history-server.md

MengjinYan

Thanks for coming up with the REP!

MengjinYan · 2025-11-21T19:39:25Z

reps/2025-11-21-ray-history-server/2025-11-21-ray-history-server.md

+
+For Beta:
+* A user can specify a top-level API in RayCluster to enable the history server.
+* A local Ray dashboard can use the history server as an API backend to view the state of a terminated Ray cluster.


There seems to be a very similar point in alpha. So just curious, what's the difference between the point in alpha and beta?

They're the same. I added it here to indicate that using a local Ray dashboard should still be supported in Beta. Let me know if you think otherwise.

MengjinYan · 2025-11-21T19:40:33Z

reps/2025-11-21-ray-history-server/2025-11-21-ray-history-server.md

+For Beta:
+* A user can specify a top-level API in RayCluster to enable the history server.
+* A local Ray dashboard can use the history server as an API backend to view the state of a terminated Ray cluster.
+* A remote Ray dashboard running on Kubernetes (managed by KubeRay) can be used to view the state of a terminated Ray cluster.


I think between beta and GA, based on the experience working with the existing online dashboard, we might need to adjust the dashboard for it to better show the history information of a cluster.

I'm assuming the dashboard changes are actually needed in alpha. I think @KunWuLuan's fork of the dashboard has changes that need to somehow be incorporated into the upstream dashboard to unblock even Alpha-level support. Let me know if you think otherwise.

Yes, in the alpha version the dashboard changes are needed, and the dashboard cannot be used independently. We will not try to merge the alpha dashboard changes back upstream, because the beta version will not need any dashboard changes.

Got it. Apart from the dashboard-serving perspective, my point is more about the actual content of the dashboard. In GA, we might want to adjust the dashboard so that it better showcases the history information, and remove the components/fields that are not applicable. The details can be discussed once we have more experience running the dashboards.

MengjinYan · 2025-11-21T19:41:54Z

reps/2025-11-21-ray-history-server/2025-11-21-ray-history-server.md

+## (Optional) Follow-on Work
+
+We will start with a naive approach to event processing on the history server. However, we may need to explore
+more optimal strategies if processing events introduces significant latency overhead or memory usage.


Wondering if we should link the original design doc from @KunWuLuan?

Added a reference to the doc in this section.

reps/2025-11-21-ray-history-server/2025-11-21-ray-history-server.md

MengjinYan · 2025-11-21T19:47:02Z

cc: @alanwguo for awareness

Signed-off-by: Andrew Sy Kim <[email protected]>

KunWuLuan · 2025-11-26T01:44:36Z

Hi, will /api/jobs/{job_id} be supported in v1.7? We have not discussed how to rebuild these pages, and I am not sure we can complete this before the v1.7 release.

Future-Outlier · 2025-12-03T15:24:00Z

reps/2025-11-21-ray-history-server/2025-11-21-ray-history-server.md

+is responsible for grouping the events.
+
+All events will initially be partitioned by Job ID. Specifically, task events associated with the same Job ID will be stored in the same directory.
+* Node-level events will be stored in: cluster_name_cluster_uid/session_id/node_events/<nodeName>-<time>


Should this be nodeName or nodeID?
cc @KunWuLuan
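
To make the layout in the hunk above concrete, here is a minimal sketch, not the collector's actual implementation. The function names, the `job_id` event field, and the timestamp format for `<time>` are all assumptions made for illustration. It also assumes `cluster_name_cluster_uid` means the cluster name and UID joined by an underscore. Whether the last path component should carry a node name or a node ID is exactly the open question here, so the parameter is named `node_name` only to mirror the REP text.

```python
from collections import defaultdict
from datetime import datetime, timezone


def node_events_path(cluster_name, cluster_uid, session_id, node_name, when):
    """Build a node-level event path following the layout quoted in the REP."""
    # Assumption: <time> is rendered as a compact UTC timestamp; the REP does
    # not specify the format.
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{cluster_name}_{cluster_uid}/{session_id}/node_events/{node_name}-{stamp}"


def group_by_job(events):
    """Partition task events by Job ID so each group can share one directory."""
    # Assumption: each event is a dict with a hypothetical "job_id" key.
    groups = defaultdict(list)
    for event in events:
        groups[event["job_id"]].append(event)
    return dict(groups)
```

If node names can be reused across restarts while node IDs cannot, swapping the `node_name` argument for a node ID would be the only change needed in this sketch.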

MengjinYan · 2025-12-04T22:36:43Z

Hi, will /api/jobs/{job_id} be supported in v1.7? We have not discussed how to rebuild these pages, and I am not sure we can complete this before the v1.7 release.

I'm not sure about the release version but I think if it is part of the dashboard, we should support it.

andrewsykim force-pushed the ray-history-server branch from 1940bb0 to 0cf9789 Compare November 21, 2025 16:49

andrewsykim commented Nov 21, 2025

MengjinYan reviewed Nov 21, 2025

andrewsykim force-pushed the ray-history-server branch 2 times, most recently from 2224da1 to 92a6859 Compare November 21, 2025 21:35

andrewsykim changed the title ~~Add initial enhancement proposal for Ray History Server~~ REP: Ray History Server Nov 22, 2025

Add initial enhancement proposal for Ray History Server

fac40c8

Signed-off-by: Andrew Sy Kim <[email protected]>

andrewsykim force-pushed the ray-history-server branch from 92a6859 to fac40c8 Compare November 24, 2025 18:48

Future-Outlier reviewed Dec 3, 2025

Future-Outlier mentioned this pull request Dec 3, 2025

add the implementation of historyserver collector ray-project/kuberay#4241

Open

4 tasks

edoakes approved these changes Dec 4, 2025

edoakes merged commit 1ed84fd into ray-project:main Dec 4, 2025
1 check passed

5 participants