Description
YARP provides precise location information for any number of things in its AST, which is great! But each of those locations on a 64-bit platform costs 16 bytes, which means every YARP node (yp_node_t) on those platforms is at least 24 bytes, which is rather large.
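To make the arithmetic concrete, here is a minimal sketch of where those sizes come from. The type names (`example_location_t`, `example_node_t`) and field layout are hypothetical stand-ins chosen for illustration and are not copied from YARP's headers; the only assumption is that a location is a pair of pointers and that a node header embeds one location next to a small type/flags field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a pointer-pair location (not YARP's definition). */
typedef struct {
    const char *start; /* first byte covered by the location */
    const char *end;   /* one past the last byte covered */
} example_location_t;

/* Hypothetical stand-in for a node header that embeds one location. */
typedef struct {
    uint16_t type;
    uint16_t flags;
    example_location_t location;
} example_node_t;

/* Two pointers: 16 bytes when pointers are 8 bytes wide. */
static_assert(sizeof(example_location_t) == 2 * sizeof(void *),
              "a location is exactly two pointers");

/* Size of one location in bytes on the current platform. */
size_t example_location_size(void) {
    return sizeof(example_location_t);
}

/* Size of one node header in bytes, including alignment padding. */
size_t example_node_size(void) {
    return sizeof(example_node_t);
}
```

On a typical 64-bit ABI the 4 bytes of type/flags are padded up to the pointer alignment, which is how a 16-byte location turns into a node header of at least 24 bytes.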
For some concrete numbers, nearly half (~47%) of the memory size as measured by YARP::Debug.memsize on a subset of Stripe's codebase is consumed by locations. YARP::Debug.memsize doesn't measure this as-is; the location counts were obtained from the branch below (each count the branch reports must be multiplied by 16 to obtain the size in bytes), so I might have done something wrong:
main...froydnj:froydnj-location-counting
Using 32-bit offsets (well, probably 31-bit offsets so there's still some way to mark locations as "invalid" or "unavailable") would halve each location from 16 bytes to 8. Since locations account for ~47% of memory, that would immediately shrink the memory required by ~25%. I haven't put together a proof-of-concept branch with that change, but my belief is that there would be a performance gain as well, though probably not of the same magnitude. One can also imagine more complicated schemes where nodes hold a 32-bit key into some side table, which would shrink things still more, but one step at a time.
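As a rough illustration of the offset idea, the sketch below stores a location as two 32-bit offsets from the start of the source buffer and reserves a sentinel value for "invalid". Every name here (`compact_location_t`, `COMPACT_OFFSET_INVALID`, the helper functions) is hypothetical and exists only for this example; it is not a proposal for the actual YARP API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Real offsets are limited to 31 bits, so UINT32_MAX can never be a real
 * offset and is free to mark a location as invalid or unavailable. */
#define COMPACT_OFFSET_MAX ((uint32_t) INT32_MAX)
#define COMPACT_OFFSET_INVALID UINT32_MAX

/* Hypothetical compact location: byte offsets from the start of the source. */
typedef struct {
    uint32_t start;
    uint32_t end;
} compact_location_t;

static_assert(sizeof(compact_location_t) == 8,
              "two 32-bit offsets should occupy 8 bytes");

/* Returns the sentinel location. */
compact_location_t compact_location_invalid(void) {
    compact_location_t loc = { COMPACT_OFFSET_INVALID, COMPACT_OFFSET_INVALID };
    return loc;
}

/* True when neither offset is the sentinel. */
bool compact_location_valid(compact_location_t loc) {
    return loc.start != COMPACT_OFFSET_INVALID && loc.end != COMPACT_OFFSET_INVALID;
}

/* Converts a pointer pair into offsets relative to `source`. Returns the
 * invalid location if a pointer is missing, the range is malformed, or the
 * end offset does not fit in 31 bits. */
compact_location_t compact_location_make(const char *source, const char *start, const char *end) {
    if (source == NULL || start == NULL || end == NULL) return compact_location_invalid();
    if (start < source || end < start) return compact_location_invalid();

    size_t end_offset = (size_t) (end - source);
    if (end_offset > COMPACT_OFFSET_MAX) return compact_location_invalid();

    compact_location_t loc = { (uint32_t) (start - source), (uint32_t) end_offset };
    return loc;
}

/* Recovers the start pointer; the caller must supply the same source buffer. */
const char *compact_location_start(const char *source, compact_location_t loc) {
    return source + loc.start;
}
```

The trade-off in this sketch is that recovering a pointer now needs the source buffer to be passed alongside the location, whereas a pointer pair is self-contained.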
This change would limit the size of Ruby source files that YARP could consume: a file would have to be smaller than 4 GiB with 32-bit offsets, or 2 GiB with 31-bit offsets. But other Ruby-consuming tools/implementations (Sorbet and, I think, TruffleRuby; cc @eregon) already maintain 32-bit offsets internally. I believe Clang and Go also only store 32-bit offsets for their locations. If Clang and Go can get away with it, I think YARP can too.