Skip to content

Conversation

@psilabs-dev
Copy link
Contributor

Issue #1405 . Adds a lock to log rotation, and moves gzip target to prevent new writes from happening.

@psilabs-dev psilabs-dev marked this pull request as draft November 14, 2025 06:31
@psilabs-dev psilabs-dev marked this pull request as ready for review November 17, 2025 05:08
@psilabs-dev psilabs-dev marked this pull request as draft November 17, 2025 06:17
@psilabs-dev psilabs-dev force-pushed the bugfix/lograce branch 2 times, most recently from 6e8342b to b6204f1 Compare November 19, 2025 04:39
@psilabs-dev psilabs-dev marked this pull request as ready for review November 19, 2025 05:22
@psilabs-dev
Copy link
Contributor Author

This is done. Went ahead to just implement RotatingLog, among other things. A temporary solution (given it's only been a week), but should fix the current problem of log-baased data integrity issues and a certain class of random, logging-induced 500 server errors, while maintaining existing performance.

RotatingLog inherits from Mojo::Log, and does a couple extra things on top:

  • handles are created eagerly during init and frequently refreshed
  • rotation checks and rotation are done at both init and append time
  • rotation is guarded by flock
  • supports env configuration of logfiles to retain + max bytes size of logfiles (dev purposes)
  • Throughout all my later integration tests (based on python UUID stress testing + log reading), RotatingLog on LRR has never lost log events thanks to flock and rename/unlink. But I'll stay on the safe side and say that up to 0.1% of logs may be lost due to rotation.
    • In case you're curious: the current implementation of logger will lose around 0.01-0.05% of logs due to rotation races (provided the logger doesn't die so that you can collect the data)

Additionally, loggers are now cached in-memory per-process, which noticably helped with permission failures (probably due to file races). And if cache fails, we have the following redundancy to escape a die:

  • no cache -> create RotatingLog -> retry 3 times -> fallback to Mojo::Log

Current and beta integration tests passed, including windows/docker logging api stress tests on both concurrent logging pressure and append-time log rotation. I won't include them here.

QAs bc decisions are made:

  • "Why do we need a class?" to override Mojo::Log append and new, and get better low-level synergy.
  • "Why check only once every 1k times during append?" to reduce syscalls and redis network connections, and bc "exactly 1MB" isn't a rotation goal.
  • "Why refresh handle?" bc if a file is removed the handle might still point to it and the logs emitted would just disappear
  • "Why use win32api?" bc without it, we can only do copy-truncate which is lossy. If we try the same as unix (copy/delete) using only perl file handlers, we'll run into windows perms errors (see below response).
  • "Do we really need Win32API::File?" Probably yes; unless you're familiar with windows API (which I'm not really).
  • "Why not use Redis?" bc it's not really a good idea to rely on a separate service (and network) for an internal locking system, despite how easy it would be. But you'll have to consider what happens to rotation if redis goes down.

Some things I should probably note:

  • A lock file will be created in the temp directory for purposes of log rotation exclusive lock control. This is going to be used by flock.
  • most of my testing is done on API-side python calling + unix-based perl unit testing, not Windows. My observation from gh actions is that even with prefork, mojo only seems to run single-process on windows where a race would never happen.
  • On Windows, handles are refreshed on every write. Maybe this is a performance issue maybe not.

Dogfooding this now, lmk if there are any regressions in performance, or Windows-related. I'm curious how the unit tests go, I can only hope they don't fail.

rename( $prev, $next ) or die "error: rename failed: ($prev,$next)";
}
my $logpath = get_logdir . "/$logfile.log";
my $cache_key = "$logfile|$pgname";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be cached per log file. This causes issues with the counter not actually counting correctly as it counts per "script" and it breaks rotation on Windows because each logger instance has an open handle to the lock file such that flock exclusive access will never work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pressure tests pass for me so no complaints here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With that change it uses the same pgname for everything.
image
image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to mojo log's context feature after a bit of TDD, hopefully it works now (possible refactor notwithstanding).

@psilabs-dev psilabs-dev force-pushed the bugfix/lograce branch 2 times, most recently from 46a0eb4 to 4a615a3 Compare November 28, 2025 07:06
@psilabs-dev psilabs-dev marked this pull request as draft November 28, 2025 07:10
Copy link
Owner

@Difegue Difegue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to actually give this a spin but I have a few codestyle suggestions to start with.

I think this is the first time we're actually doing real perl oop with extension classes and all that jazz!

@psilabs-dev
Copy link
Contributor Author

Thanks, yeah the code is kinda messy atm, will work around eventually to get it into a presentable way!

I'm aiming for the unit tests to test against the interface provided by get_logger, so that it's (somewhat) implementation-free. Luckily, logging is a feature for which unit testing works wonders towards development.

There's a lingering issue (not prod-side) of RotatingLog causing my LSP (https://github.com/richterger/Perl-LanguageServer) to act up and create lock/log files in not-so-random places. Not sure what's going on there but once that's out I will undraft

@psilabs-dev psilabs-dev marked this pull request as ready for review November 28, 2025 21:17
@psilabs-dev

This comment was marked as resolved.

psilabs-dev added a commit to psilabs-dev/LANraragi that referenced this pull request Dec 1, 2025
Implement log rotating logger with RotatingLog

Co-authored-by: Difegue <[email protected]>
psilabs-dev added a commit to psilabs-dev/LANraragi that referenced this pull request Dec 1, 2025
Implement log rotating logger with RotatingLog

Co-authored-by: Difegue <[email protected]>
Implement log rotating logger with RotatingLog

Co-authored-by: Difegue <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants