cli: new CLI experience #17824
Conversation
The PR should be ready for review now!

I recommend using the …
@bandoti yes, that can be a good suggestion. I thought about that, but I think it can be a bit tricky when logging errors, as we may still be using the … However, what I'm thinking is that because llama-server has now become the engine behind llama-cli, it could be cleaner to draw a line between:

…

The benefit could be that … In addition to …

Tagging @ggerganov if you are OK with this.
@ngxson I like the idea of separation. Though I'm wondering if adding a log source instead would make sense, so filtering may be done per-component. That might be a bit more complex and reach into other aspects of the logging code, but it would allow other consumers of the server abstraction to log as well. It could also allow backends to log based on source. If we do go with console, I would consider maybe …
The main concern about doing this is that printing text to stdout/stderr in the hot generation loop can affect the performance. It is better to do the printing asynchronously. You can do that by creating a separate …
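For context, a minimal sketch of the kind of asynchronous printing being suggested, assuming a plain queue drained by a dedicated writer thread (the async_printer class and all names here are illustrative, not code from this PR):

```cpp
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

// Illustrative only: the hot loop calls enqueue() (just a lock + push), while a
// dedicated thread performs the actual stdout writes.
class async_printer {
public:
    async_printer() : worker([this] { run(); }) {}

    ~async_printer() {
        {
            std::lock_guard<std::mutex> lock(mtx);
            done = true;
        }
        cv.notify_one();
        worker.join(); // the worker drains any remaining text before exiting
    }

    void enqueue(std::string text) {
        {
            std::lock_guard<std::mutex> lock(mtx);
            pending.push(std::move(text));
        }
        cv.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mtx);
        for (;;) {
            cv.wait(lock, [this] { return done || !pending.empty(); });
            while (!pending.empty()) {
                std::string text = std::move(pending.front());
                pending.pop();
                lock.unlock();
                fputs(text.c_str(), stdout); // slow I/O happens off the hot loop
                fflush(stdout);
                lock.lock();
            }
            if (done) {
                return;
            }
        }
    }

    std::mutex              mtx;
    std::condition_variable cv;
    std::queue<std::string> pending;
    bool                    done = false;
    std::thread             worker; // declared last: the thread uses the members above
};

int main() {
    async_printer printer;
    printer.enqueue("hello ");
    printer.enqueue("world\n");
    // destructor drains the queue and joins the writer thread
}
```

The hot loop only pays for a mutex lock and a queue push; the blocking fputs/fflush calls happen on the writer thread.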
Forgot to mention that the generation is now running on server_context's thread, so I think printing directly to the console from the CLI main thread could be OK now (?) The main motivation of my proposal was to redirect the server's internal logs somewhere else, so in case of debugging, they won't clog up the console output.
Good point - I had missed that. As long as the thread that prints does not make calls to …
tools/cli/cli.cpp
Outdated
    console::log("%s", diff.reasoning_content_delta.c_str());
    console::flush();
}
fflush(stdout);
I think this flush is not needed.
// save choice to use color for later
// (note for later: this is a slightly awkward choice)
console::init(params.simple_io, params.use_color);
atexit([]() { console::cleanup(); });
Better to make the console a singleton and avoid using atexit(). If you want, leave a TODO for now.
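As a rough illustration of that suggestion, a minimal sketch of a console singleton whose destructor takes over the cleanup (the class layout is hypothetical and does not match the PR's namespace-based console API):

```cpp
// Illustrative only: an RAII singleton whose destructor restores the terminal
// state at program exit, replacing the explicit atexit() registration.
class console {
public:
    static console & instance() {
        static console inst; // constructed on first use, destroyed automatically at exit
        return inst;
    }

    void init(bool simple_io, bool use_color) {
        this->simple_io = simple_io;
        this->use_color = use_color;
        // ... put the terminal into the desired mode here ...
    }

    console(const console &)             = delete;
    console & operator=(const console &) = delete;

private:
    console() = default;
    ~console() {
        // ... restore the terminal state here (what console::cleanup() does today) ...
    }

    bool simple_io = false;
    bool use_color = false;
};

// usage: console::instance().init(params.simple_io, params.use_color);
```

The function-local static is destroyed automatically during normal program termination, which removes the need to register a handler with atexit() at the call site.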
I implemented … This system won't allow the console output to be logged to a file, but I'm not sure if that's useful anymore. The more detailed server log can always be obtained using a combination of …
void log(const char * fmt, ...);
void error(const char * fmt, ...);
With variadic argument formatting, it's always useful to have the format attribute in order to generate compile-time warnings. See, for example:
Lines 15 to 23 in a7a3fbe
#ifndef __GNUC__
#    define LOG_ATTRIBUTE_FORMAT(...)
#elif defined(__MINGW32__) && !defined(__clang__)
#    define LOG_ATTRIBUTE_FORMAT(...) __attribute__((format(gnu_printf, __VA_ARGS__)))
#else
#    define LOG_ATTRIBUTE_FORMAT(...) __attribute__((format(printf, __VA_ARGS__)))
#endif
I use LLAMA_COMMON_ATTRIBUTE_FORMAT from common.h instead, so that I don't need to include log.h inside console.h: e135b41
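For illustration, a header sketch of how the declarations might look with such an attribute applied, assuming the format string is parameter 1 and the variadic arguments start at parameter 2:

```cpp
#include "common.h" // provides LLAMA_COMMON_ATTRIBUTE_FORMAT (per the comment above)

namespace console {
    // With the attribute in place, GCC/Clang check the format string against the
    // arguments, so e.g. console::log("%d", "oops") triggers a compile-time warning.
    LLAMA_COMMON_ATTRIBUTE_FORMAT(1, 2)
    void log(const char * fmt, ...);

    LLAMA_COMMON_ATTRIBUTE_FORMAT(1, 2)
    void error(const char * fmt, ...);
}
```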
I would like to have overloads like this for simplicity:
inline void log(const char * data) {
    log("%s", data);
}
inline void log(const std::string & data) {
    log("%s", data.c_str());
}
There were too many places using LOG("%s", foo.c_str()); when we could simply write console::log(foo);. This might need variadic templates to prevent ambiguity. I also ran into this scenario in the reasoning PR #16603.
I think it's fine to leave it like this because it guarantees that our version is always compatible with C-only printf() without any conversions.
There are not many places where we have this pattern, so it's probably fine for now. We can find a way to improve it in the future.
Besides, there is also a risk of having LOG(str). If someone accidentally calls it with str being a const char *, the code will still compile even though it's dangerous.
The current LOG("%s", str) does not completely prevent this case, but it leaves a bit more visibility for reviewers.
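To make the hazard concrete, a small standalone illustration (hypothetical example, not code from the PR) using plain printf, which has the same printf-style contract:

```cpp
#include <cstdio>

int main() {
    // A string that merely *contains* a conversion specifier.
    const char * str = "progress: 100%s complete";

    // printf(str);       // dangerous: "%s" makes printf read a non-existent argument (UB)
    printf("%s\n", str);  // safe: str is treated as data, not as a format string
    return 0;
}
```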
> Besides, there is also a risk of having LOG(str). If someone accidentally calls it with str being a const char *, the code will still compile even though it's dangerous.
This is precisely the scenario the overload approach fixes. No one can inadvertently call LOG(str) because this overload explicitly prevents it:
inline void log(const char * data) {
    log("%s", data);
}
Reviewers should be familiar with this new type anyhow, so if there's a call to console::log("my log statement"); it will be clear that it uses the proper overload, which will never be a bug at runtime.
However, we shouldn't be using the C API anyway; we ought to use C++ idioms (outside of ggml/libllama), so we ought to set a precedent here. For example, we could also add an iostream input and then pass a stringstream to log.
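A possible shape for that idea (purely hypothetical, not part of this PR) is an overload that accepts a stream, so callers can build messages with << syntax:

```cpp
#include <cstdarg>
#include <cstdio>
#include <sstream>

namespace console {
    // Minimal stand-in for the printf-style log(), just to make the sketch self-contained.
    inline void log(const char * fmt, ...) {
        va_list args;
        va_start(args, fmt);
        vfprintf(stdout, fmt, args);
        va_end(args);
    }

    // Hypothetical overload: accept a message built with stream syntax.
    inline void log(const std::ostringstream & oss) {
        log("%s", oss.str().c_str());
    }
}

int main() {
    std::ostringstream oss;
    oss << "decoded " << 128 << " tokens in " << 42.5 << " ms\n";
    console::log(oss); // unambiguous: std::ostringstream does not convert to const char *
    return 0;
}
```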
In any case, we have multiple log macros across the code base, so I would prefer to have everything expose the same form of API.
Having an overload for std::string seems fine as you explain, but I'd prefer a dedicated PR to implement the same pattern everywhere in the code base.
So for now, I'll keep this PR as-is, but you can push a follow-up to improve the logging system.
Yeah sounds good. There are a lot of changes going in this PR. Don't want to cause any friction!
I will do a follow-up PR 😊
enum display_type {
    DISPLAY_TYPE_RESET = 0,
    DISPLAY_TYPE_INFO,
    DISPLAY_TYPE_PROMPT,
    DISPLAY_TYPE_REASONING,
    DISPLAY_TYPE_USER_INPUT,
    DISPLAY_TYPE_ERROR
};
mtmd-cli.cpp and completion.cpp need to be updated to use the new enum.
Ref: #17618
Fix: #11202
We are moving to a new CLI experience, with the main code built on top of llama-server. This brings many additional features into llama-cli, making the experience feel mostly like a smaller version of the web UI:

… (… llama-cli doesn't support)

TODO:
- … reasoning_content when possible, depends on server: delegate result_state creation to server_task #17835
- … console::readline #17828
- … console::readline #17829
- … --image and --audio arguments
- … llama-completion for legacy features

Features planned for next versions:
…

TODO for console system:
…