Skip to content

Conversation

@KunWuLuan
Copy link
Contributor

@KunWuLuan KunWuLuan commented Dec 1, 2025

add the implementation of historyserver collector

Why are these changes needed?

Related issue number

#3966

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

update go.work go.mod

Signed-off-by: KunWuLuan <[email protected]>
@Future-Outlier Future-Outlier self-assigned this Dec 1, 2025

// 分类事件
for _, event := range eventsToFlush {
hourKey := event.Timestamp.Truncate(time.Hour).Format("2006-01-02-15")
Copy link
Member

@Future-Outlier Future-Outlier Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we be "2006-01-02-15" or "2006010215"?

Since this is different from the design doc
ray-project/enhancements#62
Attached_image

"github.com/sirupsen/logrus"
)

const runtimeClassConfigPath = "/var/collector-config/data"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like it's not being used?

Comment on lines +215 to +221
for i := 0; i < 12; i++ {
rp, err := os.Readlink(session_latest_path)
if err != nil {
logrus.Errorf("read session_latest file error %v", err)
time.Sleep(time.Second * 5)
continue
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we are doing the retry logic here. Could we extract the maxAttempts into constant or something we can configure, and maybe use backoff here?

And curious why we set the max retry to 12 times here? I think it may be too many times for retry, can we use 3?

Comment on lines +31 to +56
const (
RAY_SESSIONDIR_LOGDIR_NAME = "logs"
RAY_SESSIONDIR_METADIR_NAME = "meta"
)

const (
OssMetaFile_BasicInfo = "ack__basicinfo"

OssMetaFile_NodeSummaryKey = "restful__nodes_view_summary"
OssMetaFile_Node_Prefix = "restful__nodes_"
OssMetaFile_JOBTASK_DETAIL_Prefix = "restful__api__v0__tasks_detail_job_id_"
OssMetaFile_JOBTASK_SUMMARIZE_BY_FUNC_NAME_Prefix = "restful__api__v0__tasks_summarize_by_func_name_job_id_"
OssMetaFile_JOBTASK_SUMMARIZE_BY_LINEAGE_Prefix = "restful__api__v0__tasks_summarize_by_lineage_job_id_"
OssMetaFile_JOBDATASETS_Prefix = "restful__api__data__datasets_job_id_"
OssMetaFile_NodeLogs_Prefix = "restful__api__v0__logs_node_id_"
OssMetaFile_ClusterStatus = "restful__api__cluster_status"
OssMetaFile_LOGICAL_ACTORS = "restful__logical__actors"
OssMetaFile_ALLTASKS_DETAIL = "restful__api__v0__tasks_detail"
OssMetaFile_Events = "restful__events"
OssMetaFile_PlacementGroups = "restful__api__v0__placement_groups_detail"

OssMetaFile_ClusterSessionName = "static__api__cluster_session_name"

OssMetaFile_Jobs = "restful__api__jobs"
OssMetaFile_Applications = "restful__api__serve__applications"
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In kuberay we usually put constant in a separate file. For example ray-operator put the constant in ray-operator/controllers/ray/utils/constant.go
Could we move those into a constant.go file?

}

func GetSessionDir() (string, error) {
session_latest_path := "/tmp/ray/session_latest"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we also move this to constant?

Comment on lines +9 to +10
// StorageWriter is the interface for storage writer.
type StorageWritter interface {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can StorageWritter be StorageWriter?

Comment on lines +22 to +23
PushInterval time.Duration
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is PushInterval be used?

Role string
RayClusterName string
RayClusterID string
LogBatching int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this be used?

ARG TARGETARCH

FROM --platform=$BUILDPLATFORM golang:1.25.1 as builder
ENV GOPROXY=https://goproxy.cn,direct
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be the official proxy? https://proxy.golang.org/

ref: https://proxy.golang.org/


RUN apt-get update && apt-get upgrade -y && rm -rf /var/cache/apt/ && apt-get install -y ca-certificates

COPY --from=builder /historyserver/output/bin/historyserver /usr/local/bin/historyserver
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The historyserver is not ready in this pr. Should it be removed?

Comment on lines +32 to +33
COPY --from=builder /historyserver/dashboard/v2.51.0/client/build /dashboard/v2.51.0/client/build
COPY --from=builder /historyserver/dashboard/homepage /dashboard/homepage No newline at end of file
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these COPY consider BUILD_RAYSERVER_DASHBOARD? Or, it might be failed on building.

rm -rf $(OUT_DIR)

.PHONY: build
build: buildcollector buildhistoryserver
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the buildhistoryserver be removed because it is not ready in this pr? And so does the historyserver related content in this file.

Comment on lines +32 to +40
flag.StringVar(&role, "role", "Worker", "")
flag.StringVar(&runtimeClassName, "runtime-class-name", "", "")
flag.StringVar(&rayClusterName, "ray-cluster-name", "", "")
flag.StringVar(&rayClusterId, "ray-cluster-id", "default", "")
flag.StringVar(&rayRootDir, "ray-root-dir", "", "")
flag.IntVar(&logBatching, "log-batching", 1000, "")
flag.IntVar(&eventsPort, "events-port", 8080, "")
flag.StringVar(&runtimeClassConfigPath, "runtime-class-config-path", "", "") //"/var/collector-config/data"
flag.DurationVar(&pushInterval, "push-interval", time.Minute, "")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better to have some usage description.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants