@andrewsykim (Member) commented Nov 21, 2025

@MengjinYan (Contributor) left a comment:

Thanks for coming up with the REP!


For Beta:
* A user can specify a top-level API in RayCluster to enable the history server.
* A local Ray dashboard can use the history server as an API backend to view the state of a terminated Ray cluster.

Contributor:

There seems to be a very similar point in alpha. So just curious, what's the difference between the point in alpha and beta?

Member Author:

They're the same. I added it here to indicate that using a local Ray dashboard should still be supported in Beta. Let me know if you think otherwise.

For Beta:
* A user can specify a top-level API in RayCluster to enable the history server.
* A local Ray dashboard can use the history server as an API backend to view the state of a terminated Ray cluster.
* A remote Ray dashboard running on Kubernetes (managed by KubeRay) can be used to view the state of a terminated Ray cluster.

Contributor:

I think between beta and GA, based on the experience working with the existing online dashboard, we might need to adjust the dashboard for it to better show the history information of a cluster.

Member Author:

I'm assuming the dashboard changes are actually needed in alpha. I think @KunWuLuan's fork of the dashboard has changes that need to somehow be incorporated in the upstream dashboard to unblock even Alpha-level support. Let me know if you think otherwise.

Yes, in the alpha version the dashboard changes are needed and the dashboard cannot be used independently. We will not try to merge the alpha dashboard changes back upstream, because in the beta version there will be no dashboard changes.

@MengjinYan (Contributor) commented Dec 4, 2025:

Got it. Apart from the dashboard serving perspective, my point is more about the actual content of the dashboard. Basically, in GA we might want to adjust the dashboard so that it better showcases the historical information, and remove the components/fields that are not applicable. The details can be discussed once we have more experience running the dashboards.

## (Optional) Follow-on Work

We will start with a naive approach to event processing on the history server. However, we may need to explore
more optimal strategies if processing events introduces significant latency overhead or memory usage.

Contributor:

Wondering if we should link the original design doc from @KunWuLuan?

Member Author:

Added a reference to the doc in this section.

@MengjinYan (Contributor) commented:

cc: @alanwguo for awareness

@andrewsykim force-pushed the ray-history-server branch 2 times, most recently from 2224da1 to 92a6859, on November 21, 2025 21:35
@andrewsykim changed the title from "Add initial enhancement proposal for Ray History Server" to "REP: Ray History Server" on Nov 22, 2025

@KunWuLuan commented Nov 26, 2025:

Hi, will /api/jobs/{job_id} be supported in v1.7? We have not discussed how to rebuild these pages, and I am not sure we can complete this before the v1.7 release.

is responsible for grouping the events.

All events will initially be partitioned by Job ID. Specifically, task events associated with the same Job ID will be stored in the same directory.
* Node-level events will be stored in: cluster_name_cluster_uid/session_id/node_events/<nodeName>-<time>
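
For illustration only, the layout quoted above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the history server's implementation: the helper names `node_events_path` and `partition_by_job_id`, the `job_id` event field, the underscore joining cluster name and UID, and the timestamp format are all hypothetical, since the quoted text fixes none of them. The last path component follows the `<nodeName>-<time>` form as written.

```python
from collections import defaultdict
from datetime import datetime, timezone


def node_events_path(cluster_name, cluster_uid, session_id, node_name, ts):
    """Build a node-level event path (hypothetical helper, not a Ray API)."""
    # Assumption: the time suffix is a compact UTC timestamp; the source
    # only says "<time>" and does not specify a format.
    stamp = ts.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    # Assumption: "cluster_name_cluster_uid" means name and UID joined by "_".
    return f"{cluster_name}_{cluster_uid}/{session_id}/node_events/{node_name}-{stamp}"


def partition_by_job_id(events):
    """Group events so that events sharing a Job ID end up in the same bucket."""
    buckets = defaultdict(list)
    for event in events:
        # Assumption: each event is a dict carrying a "job_id" key.
        buckets[event["job_id"]].append(event)
    return dict(buckets)
```

Each bucket returned by `partition_by_job_id` would then map to one directory, matching the statement that task events with the same Job ID are stored together.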

Member:

Should this be nodeName or nodeID?
cc @KunWuLuan

@edoakes merged commit 1ed84fd into ray-project:main on Dec 4, 2025
1 check passed

@MengjinYan (Contributor) commented:

> Hi, will /api/jobs/{job_id} be supported in v1.7? We have not discussed how to rebuild these pages, and I am not sure we can complete this before the v1.7 release.

I'm not sure about the release version but I think if it is part of the dashboard, we should support it.
