consider using 32-bit offsets for start and end in yp_location_t #1566

@froydnj

YARP provides precise location information for a great many things in its AST, which is great! But on a 64-bit platform each of those locations costs 16 bytes (a start pointer and an end pointer at 8 bytes each), which means every YARP node (yp_node_t) on those platforms is at least 24 bytes, which is rather large.
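To make the sizes concrete, here is a minimal sketch of the two layouts. The struct names and the 16-bit `type`/`flags` header are stand-ins I made up for illustration, not YARP's actual definitions, and the byte counts in the comments assume a typical 64-bit (LP64) platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not YARP's real definitions. */

/* Pointer-based location: two pointers, 16 bytes on a 64-bit platform. */
typedef struct {
    const uint8_t *start;
    const uint8_t *end;
} ptr_location_t;

/* A node header followed by a pointer-based location. The 4-byte header is
 * padded to pointer alignment, so this is 24 bytes on a typical LP64 target. */
typedef struct {
    uint16_t type;
    uint16_t flags;
    ptr_location_t location;
} ptr_node_t;

/* Offset-based location: two 32-bit offsets from the start of the source. */
typedef struct {
    uint32_t start;
    uint32_t end;
} offset_location_t;

/* Same header with an offset-based location: 12 bytes, no padding needed. */
typedef struct {
    uint16_t type;
    uint16_t flags;
    offset_location_t location;
} offset_node_t;

/* These hold regardless of pointer width. */
static_assert(sizeof(offset_location_t) == 8, "two 32-bit offsets");
static_assert(sizeof(ptr_location_t) == 2 * sizeof(void *), "two pointers");
```

With these stand-ins, the base node drops from 24 bytes to 12 on a 64-bit platform, and every additional location stored in a node drops from 16 bytes to 8.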

For some concrete numbers, nearly half (~47%) of the memory measured by YARP::Debug.memsize on a subset of Stripe's codebase is consumed by locations. YARP::Debug.memsize doesn't report this breakdown as-is; the location counts were obtained from this branch (note that whatever the branch counts needs to be multiplied by 16 to obtain the size in bytes):

main...froydnj:froydnj-location-counting

so I might have done something wrong.

Using 32-bit offsets (well, probably 31-bit offsets, so there's still some way to mark locations as "invalid" or "unavailable") would immediately shrink the memory required by ~25%, since it halves the ~47% share that locations take up. I haven't put together a proof-of-concept branch with that change, but my belief is that there would be a performance gain as well, though probably not of the same magnitude. One can also imagine more complicated schemes where nodes hold a 32-bit key into some side table, which would shrink things still more, but one step at a time.
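As a rough sketch of what the 31-bit idea could look like: offsets live in the low 31 bits, and a set top bit means "invalid" or "unavailable". All of the names below (`offset_location_t`, `offset_from_pointer`, and so on) are hypothetical and not part of YARP; the sketch also assumes every pointer passed in points into the same source buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; none of this is YARP's actual API. */

/* Offsets use the low 31 bits; a set top bit marks "invalid/unavailable". */
#define OFFSET_INVALID_BIT UINT32_C(0x80000000)
#define OFFSET_MAX         UINT32_C(0x7FFFFFFF)

typedef struct {
    uint32_t start;
    uint32_t end;
} offset_location_t;

/* Convert a pointer into the source buffer to a 31-bit offset. Returns a
 * value with the invalid bit set if the pointer is missing, precedes the
 * buffer, or lies too far into it to be represented. */
static inline uint32_t
offset_from_pointer(const uint8_t *source, const uint8_t *ptr) {
    if (source == NULL || ptr == NULL || ptr < source) return OFFSET_INVALID_BIT;
    size_t distance = (size_t) (ptr - source);
    if (distance > OFFSET_MAX) return OFFSET_INVALID_BIT;
    return (uint32_t) distance;
}

/* Build a compact location from a pair of pointers into the source. */
static inline offset_location_t
location_from_pointers(const uint8_t *source, const uint8_t *start, const uint8_t *end) {
    offset_location_t loc = {
        offset_from_pointer(source, start),
        offset_from_pointer(source, end)
    };
    return loc;
}

/* A location is valid only if neither offset has the invalid bit set. */
static inline bool
location_is_valid(offset_location_t loc) {
    return ((loc.start | loc.end) & OFFSET_INVALID_BIT) == 0;
}

/* Recover the start pointer, or NULL for an invalid location. */
static inline const uint8_t *
location_start(const uint8_t *source, offset_location_t loc) {
    return location_is_valid(loc) ? source + loc.start : NULL;
}
```

The cost of this layout is that anything consuming a location needs the source buffer's base pointer on hand to get back to a pointer, and source files larger than `OFFSET_MAX` bytes can't be represented.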

This change would limit the size of Ruby source files that YARP could consume (31-bit offsets top out at 2 GiB). But other Ruby-consuming tools/implementations (Sorbet and, I think, TruffleRuby; cc @eregon) already maintain 32-bit offsets internally. I believe Clang and Go also only store 32-bit offsets for their locations. If Clang and Go can get away with it, I think YARP can too.
