
Conversation

@ngxson (Collaborator) commented Dec 6, 2025

Ref: #17618

Fix: #11202

We are moving to a new CLI experience, with the main code built on top of llama-server. This brings many additional features into llama-cli, making the experience feel mostly like a smaller version of the web UI:

  • Multimodal support
  • Regenerate last message
  • Speculative decoding
  • Full Jinja support (including some edge cases that the old llama-cli doesn't support)

TODO:

Features planned for next versions:

  • Allow hiding/showing timings and reasoning content (saving user preferences to disk and reusing them later)
  • Allow exporting/importing conversations
  • Support raw completion
  • Support remote media URLs (downloaded from the internet)
  • Show progress (as a percentage) if prompt processing takes too long

TODO for console system:

  • Auto-completion for commands and file paths
  • Support "temporary display", for example clear loading messages when it's done

The github-actions bot added the script, testing, examples, devops, and server labels on Dec 6, 2025
@ngxson marked this pull request as ready for review on December 9, 2025, 14:56
@ngxson requested a review from CISC as a code owner on December 9, 2025, 14:56
@ngxson (Collaborator, Author) commented Dec 9, 2025

The PR should be ready for review now!

@bandoti (Collaborator) commented Dec 9, 2025

I recommend using the console::write abstraction instead of LOG. The reasons are:

  1. When logging is disabled the output will still work. There is no reason the app should fail when logging is disabled.
  2. It is cleaner to write to the output using template functions and specializations, for example writing a std::string directly (see the sketch below).
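
For illustration, a minimal sketch of what such a console::write abstraction could look like; the names and signatures here are hypothetical, not the actual llama.cpp API:

    #include <iostream>
    #include <string>

    namespace console {
        // primary template: anything std::ostream knows how to print
        template <typename T>
        void write(const T & value) {
            std::cout << value;
        }

        // specialization for std::string; a natural place to hook in
        // per-type behavior (colors, escaping, ...) if ever needed
        template <>
        inline void write<std::string>(const std::string & value) {
            std::cout << value;
        }
    }

    int main() {
        std::string msg = "hello from console::write\n";
        console::write(msg);   // no .c_str() needed
        console::write(42);    // works for any streamable type
        console::write('\n');
    }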

@ngxson (Collaborator, Author) commented Dec 9, 2025

@bandoti yes, that's a good suggestion. I thought about it, but I think it can be a bit tricky when logging errors, as we may still be using LOG_ERR.

However, since llama-server has now become the engine behind llama-cli, it could be cleaner to draw a line between:

  • llama-server exclusively uses log.h
  • llama-cli exclusively uses console.h

The benefit would be that --log-file then only logs server-related lines to the file, while the main CLI still functions as normal.

In addition to write, console.cpp will also have write_err to log errors with the correct color.
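
A rough sketch of the proposed split (the function bodies and the ANSI color handling are assumptions, not the merged code): llama-cli would print only through console::write / console::write_err, while llama-server keeps using the LOG_* macros from log.h.

    #include <cstdio>
    #include <string>

    namespace console {
        // regular CLI output goes straight to stdout
        inline void write(const std::string & text) {
            fputs(text.c_str(), stdout);
        }

        // errors go to stderr, in red when the terminal supports ANSI escapes
        inline void write_err(const std::string & text) {
            fprintf(stderr, "\033[31m%s\033[0m", text.c_str());
        }
    }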

Tagging @ggerganov to check if you are OK with this.

@bandoti (Collaborator) commented Dec 9, 2025

@ngxson I like the idea of separation. Though, I'm wondering if adding a log source instead would make sense, so filtering could be done per component. That might be a bit more complex and reach into other aspects of the logging code, but it would allow other consumers of the server abstraction to log as well, and it could also let backends log based on their source.

If we do go with console, I would consider console::log_<level>, using lowercase for <level> to match the LOG_* macros.

@ggerganov (Member) commented Dec 10, 2025

llama-cli exclusively uses console.h

The main concern about doing this is that printing text to stdout/stderr in the hot generation loop can affect performance. It is better to do the printing asynchronously.

You can do that by creating a separate common_log instance for the console and using it for all prints done by the console. The disadvantage would be that the generation output (i.e. from the console) would not be coordinated with the internal server logs, so in some cases it might be difficult to debug, i.e. the debug logs can appear after the generated tokens to which they correspond. Not sure if this is a big issue; just pointing it out, it could be ok.
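
For illustration, a minimal sketch of the "separate common_log instance" idea, assuming the common_log_init / common_log_add / common_log_free helpers from common/log.h keep their current shape; treat the exact names and the use of GGML_LOG_LEVEL_NONE as assumptions.

    #include "log.h" // common_log_init, common_log_add, common_log_free

    // dedicated instance: console output is queued on the logger's worker
    // thread instead of blocking the caller in the hot generation loop
    static common_log * g_console_log = nullptr;

    void console_output_init() {
        g_console_log = common_log_init(); // separate from common_log_main()
    }

    void console_output(const char * text) {
        // GGML_LOG_LEVEL_NONE keeps the line free of level prefixes
        common_log_add(g_console_log, GGML_LOG_LEVEL_NONE, "%s", text);
    }

    void console_output_free() {
        common_log_free(g_console_log);
        g_console_log = nullptr;
    }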

@ngxson (Collaborator, Author) commented Dec 10, 2025

Forgot to mention that the generation is now running on server_context's thread, so I think printing directly to the console from the CLI main thread could be OK now(?)

The main motivation of my proposal was to redirect the server's internal logs somewhere else, so that when debugging, they don't clog up the console output.

@ggerganov (Member)

Forgot to mention that the generation is now running on server_context's thread, so I think printing directly to console from CLI main thread could be ok now (?)

Good point - I had missed that. As long as the thread that prints does not make calls to libllama, it should be good.

console::log("%s", diff.reasoning_content_delta.c_str());
console::flush();
}
fflush(stdout);
Member:

I think this flush is not needed.

// save choice to use color for later
// (note for later: this is a slightly awkward choice)
console::init(params.simple_io, params.use_color);
atexit([]() { console::cleanup(); });
Member:

Better make the console a singleton and avoid using atexit(). If you want, leave a TODO for now.
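
A minimal sketch of the singleton idea (illustrative only, not the PR's implementation): a Meyers singleton whose destructor restores the terminal state, so no atexit() hook is needed.

    namespace console {
        class state {
          public:
            static state & instance() {
                static state s; // constructed on first use, destroyed at program exit
                return s;
            }
            void init(bool simple_io, bool use_color) {
                (void) simple_io;
                (void) use_color;
                // set up the terminal (what console::init does today)
            }
          private:
            state() = default;
            ~state() {
                // restore the terminal (what console::cleanup does today)
            }
            state(const state &) = delete;
            state & operator=(const state &) = delete;
        };
    }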

@ngxson (Collaborator, Author) commented Dec 10, 2025

I implemented console::log and console::error to make the syntax a bit JavaScript-like. They log directly to stdout using vprintf. The CLI is now more or less another "web UI" from server_context's perspective.

This system won't allow the console output to be logged to a file, but I'm not sure that's useful anymore. The more detailed server log can always be obtained using a combination of -v and --log-file.
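
The approximate shape of that console::log / console::error pair, as described above (a sketch, not the exact code from the PR): both are thin wrappers over vprintf / vfprintf.

    #include <cstdarg>
    #include <cstdio>

    namespace console {
        void log(const char * fmt, ...) {
            va_list args;
            va_start(args, fmt);
            vprintf(fmt, args); // plain output to stdout
            va_end(args);
        }

        void error(const char * fmt, ...) {
            va_list args;
            va_start(args, fmt);
            vfprintf(stderr, fmt, args); // errors to stderr; color handling omitted here
            va_end(args);
        }
    }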

Comment on lines 27 to 28
void log(const char * fmt, ...);
void error(const char * fmt, ...);
Member:

With variadic argument formatting, it's always useful to have the format attribute in order to generate compile-time warnings. See for example:

llama.cpp/common/log.h

Lines 15 to 23 in a7a3fbe

#ifndef __GNUC__
# define LOG_ATTRIBUTE_FORMAT(...)
#elif defined(__MINGW32__) && !defined(__clang__)
# define LOG_ATTRIBUTE_FORMAT(...) __attribute__((format(gnu_printf, __VA_ARGS__)))
#else
# define LOG_ATTRIBUTE_FORMAT(...) __attribute__((format(printf, __VA_ARGS__)))
#endif

Collaborator Author:

I used LLAMA_COMMON_ATTRIBUTE_FORMAT from common.h instead, so that I don't need to include log.h inside console.h: e135b41
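
A sketch of how the attribute ends up attached to the declarations (illustrative; the actual console.h from that commit may differ slightly). LLAMA_COMMON_ATTRIBUTE_FORMAT comes from common.h; a fallback definition is included here only so the snippet stands alone.

    #ifndef LLAMA_COMMON_ATTRIBUTE_FORMAT
    #   ifdef __GNUC__
    #       define LLAMA_COMMON_ATTRIBUTE_FORMAT(...) __attribute__((format(printf, __VA_ARGS__)))
    #   else
    #       define LLAMA_COMMON_ATTRIBUTE_FORMAT(...)
    #   endif
    #endif

    namespace console {
        // argument 1 is the format string, the variadic arguments start at position 2
        LLAMA_COMMON_ATTRIBUTE_FORMAT(1, 2)
        void log(const char * fmt, ...);

        LLAMA_COMMON_ATTRIBUTE_FORMAT(1, 2)
        void error(const char * fmt, ...);
    }

With this in place, a mismatched call such as console::log("%d", "text") produces a -Wformat warning on GCC/Clang (with -Wall or -Wformat enabled).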

@bandoti (Collaborator) commented Dec 10, 2025

I would like to have overloads like this for simplicity:

    inline void log(const char * data) {
        log("%s", data);
    }

    inline void log(const std::string & data) {
        log("%s", data.c_str());
    }

There were too many places using LOG("%s", foo.c_str()); when we could simply write console::log(foo);. This might need variadic templates to prevent ambiguity (see the sketch below). I also ran into this scenario in the reasoning PR #16603.
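
One way the ambiguity concern could be handled (illustrative, not code from this PR): keep dedicated overloads for plain strings and route the printf-style path through a variadic template, so it is only selected when format arguments are actually present.

    #include <cstdio>
    #include <string>

    namespace console {
        // plain text is printed verbatim, with no format-string interpretation
        inline void log(const char * text)        { fputs(text, stdout); }
        inline void log(const std::string & text) { fputs(text.c_str(), stdout); }

        // printf-style variant; chosen only when extra arguments are supplied
        template <typename... Args>
        void log(const char * fmt, Args... args) {
            printf(fmt, args...);
        }
    }

    int main() {
        std::string name = "llama-cli";
        console::log(name);                  // std::string overload
        console::log("plain C string\n");    // const char * overload (non-template wins)
        console::log("formatted: %d\n", 42); // variadic template
    }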

Collaborator Author:

I think it's fine to leave it like this because it guarantees that our version is always compatible with C-style printf() without any conversions.

There are not many places where we have this pattern, so it's probably fine for now. We can find a way to improve it in the future.

Collaborator Author:

Besides, there is also a risk with LOG(str). If someone accidentally calls it with str being a const char *, the code will still compile even though it's dangerous.

The current LOG("%s", str) pattern does not completely prevent this case, but it gives reviewers a bit more visibility.
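
A tiny illustration of the hazard being described (using printf directly; the same applies to any printf-style macro): if a plain string is passed as the format argument, any '%' conversions in it are interpreted rather than printed.

    #include <cstdio>

    int main() {
        const char * text = "progress: 100%s done"; // e.g. text we did not author
        // printf(text);         // undefined behavior: %s reads a missing argument
        printf("%s\n", text);    // safe: the text is treated as data, not as a format
    }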

Collaborator:

Besides, there is also a risk with LOG(str). If someone accidentally calls it with str being a const char *, the code will still compile even though it's dangerous.

This is precisely the scenario the overload approach fixes. No one can inadvertently hit the format-string path by calling log(str), because this overload handles that case explicitly:

    inline void log(const char * data) {
        log("%s", data);
    }

Reviewers should be familiar with this new type anyhow, so if there's a call to console::log("my log statement");, it will be known to use the proper overload, which can never turn into a bug at runtime.

@bandoti (Collaborator) commented Dec 10, 2025

However, we shouldn't be using the C API anyway; we ought to use C++ idioms (outside of ggml/libllama), so we should set a precedent here. For example, we could add an iostream input too and then pass a stringstream to log.
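
A small sketch of the iostream-flavoured idea (hypothetical, not part of this PR): build the message with a std::ostringstream and hand its buffer to the string overload.

    #include <iostream>
    #include <sstream>
    #include <string>

    namespace console {
        // stand-in for the string overload discussed above
        inline void log(const std::string & text) { std::cout << text; }

        // stream-based input: compose with iostream, then log the result
        inline void log(const std::ostringstream & oss) { log(oss.str()); }
    }

    int main() {
        std::ostringstream oss;
        oss << "decoded " << 128 << " tokens at " << 42.5 << " t/s\n";
        console::log(oss);
    }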

Collaborator Author:

In any case, we have multiple log macros across the code base, so I would prefer to have everything expose the same form of API.

Having an overload for std::string seems fine, as you explain, but I would prefer a dedicated PR to implement the same pattern everywhere in the code base.

So for now, I'll keep this PR as-is, but you can push a follow-up to improve the logging system.

Collaborator:

Yeah sounds good. There are a lot of changes going in this PR. Don't want to cause any friction!

I will do a follow-up PR 😊

Comment on lines +7 to +14
enum display_type {
DISPLAY_TYPE_RESET = 0,
DISPLAY_TYPE_INFO,
DISPLAY_TYPE_PROMPT,
DISPLAY_TYPE_REASONING,
DISPLAY_TYPE_USER_INPUT,
DISPLAY_TYPE_ERROR
};
@ggerganov (Member) commented Dec 10, 2025

mtmd-cli.cpp and completion.cpp need to be updated to use the new enum.
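
For context, a self-contained illustration of how a call site would pick a display type with the new enumerators; set_display here is a stand-in stub, and the real call sites in mtmd-cli.cpp and completion.cpp will look different.

    #include <cstdio>

    enum display_type {
        DISPLAY_TYPE_RESET = 0,
        DISPLAY_TYPE_INFO,
        DISPLAY_TYPE_PROMPT,
        DISPLAY_TYPE_REASONING,
        DISPLAY_TYPE_USER_INPUT,
        DISPLAY_TYPE_ERROR
    };

    // stand-in for the real setter, which would emit the matching terminal colors
    static void set_display(display_type type) {
        (void) type;
    }

    int main() {
        set_display(DISPLAY_TYPE_ERROR);
        fputs("error: failed to load media\n", stderr);
        set_display(DISPLAY_TYPE_RESET);
    }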


Labels

devops (improvements to build systems and github actions), examples, script (Script related), server, testing (Everything test related)

Development

Successfully merging this pull request may close these issues.

Feature Request: Better chat UX for llama-cli

4 participants