In Adding opt_respond_to to the Ruby VM: part 1, inspired by recent JSON gem optimizations, I set up my goal: I want to add a new bytecode instruction to the Ruby VM that optimizes respond_to? calls. I took this Ruby code:

if $stdout.respond_to?(:write)
  puts "Did you know you can write to $stdout?"
end

And identified what bytecode instructions matter most (I think):

== disasm: #<ISeq:<compiled>@<compiled>:1 (1,0)-(3,3)>
# ...
0002 putobject                 :write
0004 opt_send_without_block    <calldata!mid:respond_to?, argc:1, ARGS_SIMPLE>

This seems pretty low-level! But it’s still very high-level in terms of what I need to actually do. I know what instructions matter, but how can I change them?

A little help from Git

Thankfully, @byroot gave me some helpful direction in the form of a recent PR: Étienne Barrié merged a change that adds optimized instructions for frozen literal Hash and Array. It adds special handling for frozen, empty arrays and hashes so that calls like [].freeze in your code won’t result in any additional object allocations. Cool! A neat enhancement, and kind of perfect for me to analyze.
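
A quick way to poke at this from Ruby (just a sketch, assuming a build new enough to include that PR, i.e. Ruby 3.4+): disassemble a frozen empty literal and look for a dedicated instruction, something like opt_ary_freeze, instead of a plain send of :freeze.

# On a Ruby that includes the frozen-literal PR, I'd expect a dedicated
# instruction (something like opt_ary_freeze / opt_hash_freeze) in the output.
# On older Rubies this should fall back to a plain opt_send_without_block of :freeze.
puts RubyVM::InstructionSequence.compile("[].freeze").disasm
puts RubyVM::InstructionSequence.compile("{}.freeze").disasm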

I think their use case is simpler than what I’d need for opt_respond_to, but looking at that PR I can see that an important part of adding a new bytecode instruction lives in compile.c. I’ll eventually need to add some new logic there, but what steps does CRuby take to get to compile.c?

Starting from main

Knowing a bit about the CRuby source, and how C programs start, I know there is a main function that kicks everything off. In CRuby, it’s helpfully in main.c, at the root of the project:

int
main(int argc, char **argv)
{
    //...
    return rb_main(argc, argv);
}

static int
rb_main(int argc, char **argv)
{
    RUBY_INIT_STACK;
    ruby_init();
    return ruby_run_node(ruby_options(argc, argv));
}

I’m going to guess that I don’t need RUBY_INIT_STACK or ruby_init for adding a new instruction. I took a peek at them, and they seem to be about setting up the runtime and creating the data structures the Ruby virtual machine needs. Past that, there are only two function calls: ruby_options and ruby_run_node. ruby_options sounds like it would just gather the options needed for the program, so maybe we need to go into ruby_run_node?

int
ruby_run_node(void *n)
{
    rb_execution_context_t *ec = GET_EC();
    int status;
    if (!ruby_executable_node(n, &status)) {
        rb_ec_cleanup(ec, (NIL_P(ec->errinfo) ? TAG_NONE : TAG_RAISE));
        return status;
    }
    return rb_ec_cleanup(ec, rb_ec_exec_node(ec, n));
}

Maybe? It looks like if the “node” n isn’t executable, it fails. I’ll concentrate instead on the success path on the last line. rb_ec_exec_node will run first, followed by rb_ec_cleanup.

📝 All this ec_* stuff seems to stand for execution_context, which presumably is the state of the runtime at any given point?

📝 I’m not Mr. C programmer, so I didn’t know what void *n meant. Looking it up, it’s a way of specifying a generic pointer that can point to any data type.

Let’s start by checking rb_ec_exec_node:

static int
rb_ec_exec_node(rb_execution_context_t *ec, void *n)
{
    volatile int state;
    rb_iseq_t *iseq = (rb_iseq_t *)n;
    if (!n) return 0;

    EC_PUSH_TAG(ec);
    if ((state = EC_EXEC_TAG()) == TAG_NONE) {
        rb_iseq_eval_main(iseq);
    }
    EC_POP_TAG();
    return state;
}

Hmmmm. This is the first place where I don’t like what I’m seeing. My primary concern is that void *n is getting cast to an rb_iseq_t pointer. The class we used to compile our Ruby sample in part 1 - RubyVM::InstructionSequence - is defined in a C file called iseq.c. So in CRuby, iseq stands for “InstructionSequence”. If we already have an iseq, I think that means our code has already been compiled and we’ve gone too far.
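
To convince myself of that compile-then-run split, here’s a small Ruby-level sketch using RubyVM::InstructionSequence (the Ruby wrapper around rb_iseq_t): compiling and evaluating are separate steps, and rb_iseq_eval_main is handed something that’s already compiled.

# Compile first (this is where bytecode instructions get generated)...
iseq = RubyVM::InstructionSequence.compile(<<~RUBY)
  if $stdout.respond_to?(:write)
    puts "Did you know you can write to $stdout?"
  end
RUBY

puts iseq.disasm # the instructions we looked at in part 1
iseq.eval        # ...then run the already-compiled instruction sequence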

Stepping back to ruby_options

ruby_run_node doesn’t do much aside from calling rb_ec_exec_node. So if ruby_run_node and rb_ec_exec_node are not the right functions… that only leaves ruby_options. Not what I would expect, but let’s check:

void *
ruby_options(int argc, char **argv)
{
    rb_execution_context_t *ec = GET_EC();
    enum ruby_tag_type state;
    void *volatile iseq = 0;

    EC_PUSH_TAG(ec);
    if ((state = EC_EXEC_TAG()) == TAG_NONE) {
        iseq = ruby_process_options(argc, argv);
    }
    else {
        rb_ec_clear_current_thread_trace_func(ec);
        int exitcode = error_handle(ec, ec->errinfo, state);
        ec->errinfo = Qnil; /* just been handled */
        iseq = (void *)INT2FIX(exitcode);
    }
    EC_POP_TAG();
    return iseq;
}

There’s a lot going on in here, but I’m drawn to the iseq = ruby_process_options(argc, argv) line. Let’s dig into ruby_process_options. This is a big one:

void *
ruby_process_options(int argc, char **argv)
{
    ruby_cmdline_options_t opt;
    VALUE iseq;
    const char *script_name = (argc > 0 && argv[0]) ? argv[0] : ruby_engine;

    if (!origarg.argv || origarg.argc <= 0) {
        origarg.argc = argc;
        origarg.argv = argv;
    }
    set_progname(external_str_new_cstr(script_name));  /* for the time being */
    rb_argv0 = rb_str_new4(rb_progname);
    rb_vm_register_global_object(rb_argv0);

#ifndef HAVE_SETPROCTITLE
    ruby_init_setproctitle(argc, argv);
#endif

    iseq = process_options(argc, argv, cmdline_options_init(&opt));

    //...

    return (void*)(struct RData*)iseq;
}

Most of this function seems to be VM setup. But I think we’re getting closer with iseq = process_options(...).

Checking process_options… whoa, this is a ~350-line function! It’s a bit much to paste in here in full, but scanning the code, I think we’re on the right track. There are all sorts of option initializations here:

static VALUE
process_options(int argc, char **argv, ruby_cmdline_options_t *opt)
{
    //...
    if (FEATURE_SET_P(opt->features, yjit)) {
        bool rb_yjit_option_disable(void);
        opt->yjit = !rb_yjit_option_disable(); // set opt->yjit for Init_ruby_description() and calling rb_yjit_init()
    }
    //...
    ruby_mn_threads_params();
    Init_ruby_description(opt);
    //...
    ruby_gc_set_params();
    ruby_init_loadpath();
    //...
}

Among many other things, it sets up options for YJIT, M:N threads, the Ruby description string, garbage collection params, and the load path. And that’s just scratching the surface of this function. Then, around 240 lines in, I see a very promising if statement:

static VALUE
process_options(int argc, char **argv, ruby_cmdline_options_t *opt)
{
    //...
    struct {
        rb_ast_t *ast;
        pm_parse_result_t prism;
    } result = {0};
    // ... ~240 lines of option handling
    if (!rb_ruby_prism_p()) {
        ast_value = process_script(opt);
        if (!(result.ast = rb_ruby_ast_data_get(ast_value))) return Qfalse;
    }
    else {
        prism_script(opt, &result.prism);
    }
    //...

The beginning of the function sets up a struct that contains either a rb_ast_t or a pm_parse_result_t. Prism is the new default Ruby parser as of Ruby 3.4, so we’re getting close. rb_ast_t must be the AST type produced by the prior CRuby parser.

From a naming perspective, I would never have guessed that ruby_options is the place that parses our Ruby code. In principle I guess this is all preamble to actually running the program, so it kind of relates.

I won’t dig into prism_script, since it just creates our Abstract Syntax Tree (AST), which I expect the compiler will use later:

typedef struct {
    //...
    /** The resulting scope node that will hold the generated AST. */
    pm_scope_node_t node;
    //...
} pm_parse_result_t;
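
As an aside, if you want to see what Prism produces without going through any of this C code, the prism gem exposes the same parser from plain Ruby. This is just a sketch for illustration; process_options uses the C entry points instead:

require "prism"

# Parse the snippet from part 1 and print the resulting AST.
result = Prism.parse(<<~RUBY)
  if $stdout.respond_to?(:write)
    puts "Did you know you can write to $stdout?"
  end
RUBY

pp result.value # the root ProgramNode of the AST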

Ok, here we go! I think we’ve got it with this next section! The pm_scope_node_t node (which should be set by prism_script) is used to create our rb_iseq_t *iseq inside of pm_iseq_new_main!

// ~320 lines into the function
pm_parse_result_t *pm = &result.prism;
int error_state;
iseq = pm_iseq_new_main(&pm->node, opt->script_name, path, parent, optimize, &error_state);

We now have the entrypoint into creating our InstructionSequence (our iseq, or rb_iseq_t). I wanted to start digging into the actual compiler, but I think I’ll stop here for today.

Now that we know the entrypoint into the compiler, we can start figuring out what code might need to change to add a new bytecode instruction. Next up, I’m hoping we can find the specific area that needs that change. See you then! 👋🏼