-
Notifications
You must be signed in to change notification settings - Fork 663
add the implementation of historyserver collector #4241
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
add the implementation of historyserver collector #4241
Conversation
update go.work go.mod Signed-off-by: KunWuLuan <[email protected]>
Signed-off-by: KunWuLuan <[email protected]>
|
|
||
| // 分类事件 | ||
| for _, event := range eventsToFlush { | ||
| hourKey := event.Timestamp.Truncate(time.Hour).Format("2006-01-02-15") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we be "2006-01-02-15" or "2006010215"?
Since this is different from the design doc
ray-project/enhancements#62

| "github.com/sirupsen/logrus" | ||
| ) | ||
|
|
||
| const runtimeClassConfigPath = "/var/collector-config/data" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like it's not being used?
| for i := 0; i < 12; i++ { | ||
| rp, err := os.Readlink(session_latest_path) | ||
| if err != nil { | ||
| logrus.Errorf("read session_latest file error %v", err) | ||
| time.Sleep(time.Second * 5) | ||
| continue | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we are doing the retry logic here. Could we extract the maxAttempts into constant or something we can configure, and maybe use backoff here?
And curious why we set the max retry to 12 times here? I think it may be too many times for retry, can we use 3?
| const ( | ||
| RAY_SESSIONDIR_LOGDIR_NAME = "logs" | ||
| RAY_SESSIONDIR_METADIR_NAME = "meta" | ||
| ) | ||
|
|
||
| const ( | ||
| OssMetaFile_BasicInfo = "ack__basicinfo" | ||
|
|
||
| OssMetaFile_NodeSummaryKey = "restful__nodes_view_summary" | ||
| OssMetaFile_Node_Prefix = "restful__nodes_" | ||
| OssMetaFile_JOBTASK_DETAIL_Prefix = "restful__api__v0__tasks_detail_job_id_" | ||
| OssMetaFile_JOBTASK_SUMMARIZE_BY_FUNC_NAME_Prefix = "restful__api__v0__tasks_summarize_by_func_name_job_id_" | ||
| OssMetaFile_JOBTASK_SUMMARIZE_BY_LINEAGE_Prefix = "restful__api__v0__tasks_summarize_by_lineage_job_id_" | ||
| OssMetaFile_JOBDATASETS_Prefix = "restful__api__data__datasets_job_id_" | ||
| OssMetaFile_NodeLogs_Prefix = "restful__api__v0__logs_node_id_" | ||
| OssMetaFile_ClusterStatus = "restful__api__cluster_status" | ||
| OssMetaFile_LOGICAL_ACTORS = "restful__logical__actors" | ||
| OssMetaFile_ALLTASKS_DETAIL = "restful__api__v0__tasks_detail" | ||
| OssMetaFile_Events = "restful__events" | ||
| OssMetaFile_PlacementGroups = "restful__api__v0__placement_groups_detail" | ||
|
|
||
| OssMetaFile_ClusterSessionName = "static__api__cluster_session_name" | ||
|
|
||
| OssMetaFile_Jobs = "restful__api__jobs" | ||
| OssMetaFile_Applications = "restful__api__serve__applications" | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In kuberay we usually put constant in a separate file. For example ray-operator put the constant in ray-operator/controllers/ray/utils/constant.go
Could we move those into a constant.go file?
| } | ||
|
|
||
| func GetSessionDir() (string, error) { | ||
| session_latest_path := "/tmp/ray/session_latest" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we also move this to constant?
| // StorageWriter is the interface for storage writer. | ||
| type StorageWritter interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can StorageWritter be StorageWriter?
| PushInterval time.Duration | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is PushInterval be used?
| Role string | ||
| RayClusterName string | ||
| RayClusterID string | ||
| LogBatching int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this be used?
| ARG TARGETARCH | ||
|
|
||
| FROM --platform=$BUILDPLATFORM golang:1.25.1 as builder | ||
| ENV GOPROXY=https://goproxy.cn,direct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be the official proxy? https://proxy.golang.org/
|
|
||
| RUN apt-get update && apt-get upgrade -y && rm -rf /var/cache/apt/ && apt-get install -y ca-certificates | ||
|
|
||
| COPY --from=builder /historyserver/output/bin/historyserver /usr/local/bin/historyserver |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The historyserver is not ready in this pr. Should it be removed?
| COPY --from=builder /historyserver/dashboard/v2.51.0/client/build /dashboard/v2.51.0/client/build | ||
| COPY --from=builder /historyserver/dashboard/homepage /dashboard/homepage No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these COPY consider BUILD_RAYSERVER_DASHBOARD? Or, it might be failed on building.
| rm -rf $(OUT_DIR) | ||
|
|
||
| .PHONY: build | ||
| build: buildcollector buildhistoryserver |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should the buildhistoryserver be removed because it is not ready in this pr? And so does the historyserver related content in this file.
| flag.StringVar(&role, "role", "Worker", "") | ||
| flag.StringVar(&runtimeClassName, "runtime-class-name", "", "") | ||
| flag.StringVar(&rayClusterName, "ray-cluster-name", "", "") | ||
| flag.StringVar(&rayClusterId, "ray-cluster-id", "default", "") | ||
| flag.StringVar(&rayRootDir, "ray-root-dir", "", "") | ||
| flag.IntVar(&logBatching, "log-batching", 1000, "") | ||
| flag.IntVar(&eventsPort, "events-port", 8080, "") | ||
| flag.StringVar(&runtimeClassConfigPath, "runtime-class-config-path", "", "") //"/var/collector-config/data" | ||
| flag.DurationVar(&pushInterval, "push-interval", time.Minute, "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be better to have some usage description.
add the implementation of historyserver collector
Why are these changes needed?
Related issue number
#3966
Checks