In The Ruby Syntax Holy Grail: adding opt_respond_to to the Ruby VM, part 3, I found what I referred to as the “Holy Grail” of Ruby syntax. I’m way overstating it, but it’s a readable, sequential way of viewing how a large portion of the Ruby syntax is compiled. Here’s a snippet of it as a reminder:
// prism_compile.c
static void
pm_compile_node(rb_iseq_t *iseq, const pm_node_t *node, LINK_ANCHOR *const ret, bool popped, pm_scope_node_t *scope_node)
{
const pm_parser_t *parser = scope_node->parser;
//...
switch (PM_NODE_TYPE(node)) {
//...
case PM_ARRAY_NODE: {
// [foo, bar, baz]
// ^^^^^^^^^^^^^^^
const pm_array_node_t *cast = (const pm_array_node_t *) node;
pm_compile_array_node(iseq, (const pm_node_t *) cast, &cast->elements, &location, ret, popped, scope_node);
return;
}
//...
case PM_MODULE_NODE: {
// module Foo; end
//...
}
//...
}
The file that code lives in, prism_compile.c, is enormous. pm_compile_node itself is 1800+ lines, and the overall file is 11 thousand lines. It’s daunting to say the least, but there are some obvious directions I can ignore: I’m trying to optimize a method call to respond_to?, so I can sidestep a majority of the Ruby syntax.
Still, where do I go, specifically?
Sage wisdom
Helpfully, I got two identical sets of direction based on part 3. One from Kevin Newton, creator of Prism:
https://x.com/kddnewton/status/1872280281409105925?s=46
And one from byroot, who inspired this whole series:
https://bsky.app/profile/byroot.bsky.social/post/3le6xypzykc2x
I don’t want to jump to conclusions, but I think I need to look at the peephole optimizer 😆.
And exactly what is a “peephole optimizer”? Kevin described the process as “specialization comes after compilation”. From Wikipedia:
Peephole optimization is an optimization technique performed on a small set of compiler-generated instructions, known as a peephole or window, that involves replacing the instructions with a logically equivalent set that has better performance.
https://en.wikipedia.org/wiki/Peephole_optimization
This seems to fit my goal pretty well. I want to replace the current opt_send_without_block instruction with a specialized opt_respond_to instruction, optimized for respond_to? method calls.
Finding the optimizer
So where are peephole optimizations happening in CRuby today? In Étienne’s PR, he added optimization code to a function called… iseq_peephole_optimize. A little on the nose, don’t you think? Kevin’s comment also mentioned iseq_peephole_optimize, so it seems like the winner.
I want to make the link between iseq_peephole_optimize and where we left off at pm_compile_node. Let’s dig into some code!
Disassembling an existing optimization
I’m going to use Étienne’s frozen array optimization to get to the optimizer and see how it relates. If you want to follow along, start with the setup instructions from part 3.
His optimization only applies to array and hash literals being frozen. So we’ll write a teensy Ruby program to demonstrate, and put it in test.rb at the root of our CRuby project:
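The listing for test.rb is not reproduced here. Judging from the disassembly in this section (`./test.rb:3 (3,0)-(3,12)`, a `freeze` on an empty array, then a call to `pp`), a file along these lines should reproduce it. The two comment lines are an assumption, included only so the statement lands on line 3:

```ruby
# test.rb
# A frozen empty array literal, the case the optimization targets.
pp [].freeze
```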
The best way to run test.rb here is to use make. It will not only run the file, but also make sure things like C files get recompiled as necessary when you make changes. Let’s run our file, but dump the instructions it would generate for the Ruby VM:
RUNOPT0=--dump=insns make runruby
RUNOPT0 lets us add an option to the ruby call, so it’s effectively ruby --dump=insns test.rb. Here are the instructions we see. We can confirm that we are getting the optimized opt_ary_freeze instruction from Étienne’s PR:
== disasm: #<ISeq:<main>./test.rb:3 (3,0)-(3,12)>
0000 putself ( 3)[Li]
0001 opt_ary_freeze [], <calldata!mid:freeze, argc:0, ARGS_SIMPLE>
0004 opt_send_without_block <calldata!mid:pp, argc:1, FCALL|ARGS_SIMPLE>
0006 leave
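If you'd rather not go through make for a quick look, CRuby exposes the same disassembly from inside Ruby through `RubyVM::InstructionSequence`. This is a small sketch, not part of the workflow above. Whether `opt_ary_freeze` shows up depends on the Ruby you run it with, so the last line reports what it found rather than assuming it:

```ruby
# Compile a snippet in-process and print its YARV instructions.
source = "pp [].freeze"
disasm = RubyVM::InstructionSequence.compile(source).disasm
puts disasm

# True only on a Ruby build that includes the frozen-array optimization.
puts "opt_ary_freeze present: #{disasm.include?('opt_ary_freeze')}"
```

On a Ruby without the optimization, I'd expect a `newarray` followed by a regular send of `freeze` instead.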
You never know what code is truly doing until you run it. So far, I’ve just been reading and navigating the CRuby source. iseq_peephole_optimize lives in compile.c, so let’s set a breakpoint and take a look 🕵🏼‍♂️.
Using the debugger
We can debug C code in CRuby almost as easily as we can use a debugger/binding.pry.
For macOS, you can use lldb, and for Docker/Linux, you can use gdb. I’m going to do everything in lldb to start, but I’ll show some equivalent commands for gdb after.
Let’s start by looking at the peephole optimization code for [].freeze, inside of iseq_peephole_optimize. I’ll add comments above each line to explain what I think it’s doing:
// compile.c
static int
iseq_peephole_optimize(rb_iseq_t *iseq, LINK_ELEMENT *list, const int do_tailcallopt)
{
// ...
// if the instruction is a `newarray` of zero length
3469: if (IS_INSN_ID(iobj, newarray) && iobj->operands[0] == INT2FIX(0)) {
// grab the next element after the current instruction
3470: LINK_ELEMENT *next = iobj->link.next;
// if `next` is an instruction, and the instruction is `send`
3471: if (IS_INSN(next) && (IS_INSN_ID(next, send))) {
3472: const struct rb_callinfo *ci = (struct rb_callinfo *)OPERAND_AT(next, 0);
3473: const rb_iseq_t *blockiseq = (rb_iseq_t *)OPERAND_AT(next, 1);
3474:
// if the callinfo is "simple", with zero arguments,
// and there isn't a block provided(?), and the method id (mid) is `freeze`
// which is represented by `idFreeze`
3475: if (vm_ci_simple(ci) && vm_ci_argc(ci) == 0 && blockiseq == NULL && vm_ci_mid(ci) == idFreeze) {
// change the instruction to `opt_ary_freeze`
3476: iobj->insn_id = BIN(opt_ary_freeze);
// remove the `send` instruction, we don't need it anymore
3481: ELEM_REMOVE(next);
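To make the shape of that check easier to see, here is a toy model of it in Ruby. It is only an illustration: instructions are plain arrays rather than CRuby's INSN structs, and `toy_peephole` is a made-up name, not anything in the CRuby source. The idea is the same though: look at a window of two instructions, and when an empty `newarray` is immediately followed by a zero-argument `send` of `freeze`, swap in one specialized instruction and drop the `send`.

```ruby
# Toy model only: instructions are plain arrays, not CRuby's INSN structs.
def toy_peephole(instructions)
  optimized = []
  i = 0
  while i < instructions.length
    current = instructions[i]
    following = instructions[i + 1]

    # Window of two: an empty `newarray` immediately followed by a
    # zero-argument `send` of :freeze collapses into one instruction.
    if current == [:newarray, 0] && following == [:send, :freeze, 0]
      optimized << [:opt_ary_freeze]
      i += 2 # skip the `send`, it is no longer needed
    else
      optimized << current
      i += 1
    end
  end
  optimized
end

before = [[:putself], [:newarray, 0], [:send, :freeze, 0], [:send, :pp, 1], [:leave]]
after = toy_peephole(before)
pp after
```

The real function edits a linked list in place instead of building a new array, but the pattern match on neighbouring instructions is the part worth noticing.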
Now I’ll use lldb to see where this code runs in relation to our Prism compilation. In CRuby, to debug you run make lldb-ruby instead of make runruby. You’ll see some setup code run, and then you’ll be left at a prompt, prefixed by (lldb):
> make lldb-ruby
lldb -o 'command script import -r ../misc/lldb_cruby.py' ruby -- ../test.rb
(lldb) target create "ruby"
Current executable set to '/Users/johncamara/Projects/ruby/build/ruby' (arm64).
(lldb) settings set -- target.run-args "../test.rb"
(lldb) command script import -r ../misc/lldb_cruby.py
lldb scripts for ruby has been installed.
(lldb)
At this point, we haven’t actually run anything. We can now set our breakpoint, then run the program. I’ll add a breakpoint right after all the if statements have succeeded:
(lldb) break set --file compile.c --line 3476
Breakpoint 1: where = ruby`iseq_peephole_optimize + 2276 at compile.c:3476:17
With our breakpoint set, we call run to run the program. You’ll see something like the following. It ran the program until it hit our breakpoint, right after identifying a frozen array literal:
(lldb) run
Process 50923 launched: '/ruby/build/ruby' (arm64)
Process 50923 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: ruby`iseq_peephole_optimize(...) at compile.c:3476:17
3473 const rb_iseq_t *blockiseq = (rb_iseq_t *)OPERAND_AT(next, 1);
3474
3475 if (vm_ci_simple(ci) && vm_ci_argc(ci) == 0 && blockiseq == NULL && vm_ci_mid(ci) == idFreeze) {
-> 3476 iobj->insn_id = BIN(opt_ary_freeze);
3477 iobj->operand_size = 2;
3478 iobj->operands = compile_data_calloc2(iseq, iobj->operand_size, sizeof(VALUE));
3479 iobj->operands[0] = rb_cArray_empty_frozen;
I want to see where we are in relation to all our Prism compilation code. We can use bt to get the backtrace:
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
* frame #0: ruby`iseq_peephole_optimize(...) at compile.c:3476:29
frame #1: ruby`iseq_optimize(...) at compile.c:4352:17
frame #2: ruby`iseq_setup_insn(...) at compile.c:1619:5
frame #3: ruby`pm_iseq_compile_node(...) at prism_compile.c:10139:5
frame #4: ruby`pm_iseq_new_with_opt_try(...) at iseq.c:1029:5
frame #5: ruby`rb_protect(...) at eval.c:1033:18
frame #6: ruby`pm_iseq_new_with_opt(...) at iseq.c:1082:5
frame #7: ruby`pm_new_child_iseq(...) at prism_compile.c:1271:27
frame #8: ruby`pm_compile_node(...) at prism_compile.c:9458:40
frame #9: ruby`pm_compile_node(...) at prism_compile.c:9911:17
frame #10: ruby`pm_compile_scope_node(...) at prism_compile.c:6598:13
frame #11: ruby`pm_compile_node(...) at prism_compile.c:9784:9
frame #12: ruby`pm_iseq_compile_node(...) at prism_compile.c:10122:9
frame #13: ruby`pm_iseq_new_with_opt_try(...) at iseq.c:1029:5
frame #14: ruby`rb_protect(...) at eval.c:1033:18
frame #15: ruby`pm_iseq_new_with_opt(...) at iseq.c:1082:5
frame #16: ruby`pm_iseq_new_top(...) at iseq.c:906:12
frame #17: ruby`load_iseq_eval(...) at load.c:756:24
frame #18: ruby`require_internal(...) at load.c:1296:21
frame #19: ruby`rb_require_string_internal(...) at load.c:1402:22
frame #20: ruby`rb_require_string(...) at load.c:1388:12
frame #21: ruby`rb_f_require(...) at load.c:1029:12
frame #22: ruby`ractor_safe_call_cfunc_1(...) at vm_insnhelper.c:3624:12
frame #23: ruby`vm_call_cfunc_with_frame_(...) at vm_insnhelper.c:3801:11
frame #24: ruby`vm_call_cfunc_with_frame(...) at vm_insnhelper.c:3847:12
frame #25: ruby`vm_call_cfunc_other(...) at vm_insnhelper.c:3873:16
frame #26: ruby`vm_call_cfunc(...) at vm_insnhelper.c:3955:12
frame #27: ruby`vm_call_method_each_type(...) at vm_insnhelper.c:4779:16
frame #28: ruby`vm_call_method(...) at vm_insnhelper.c:4916:20
frame #29: ruby`vm_call_general(...) at vm_insnhelper.c:4949:12
frame #30: ruby`vm_sendish(...) at vm_insnhelper.c:5968:15
frame #31: ruby`vm_exec_core(...) at insns.def:898:11
frame #32: ruby`rb_vm_exec(...) at vm.c:2595:22
frame #33: ruby`rb_iseq_eval(...) at vm.c:2850:11
frame #34: ruby`rb_load_with_builtin_functions(...) at builtin.c:54:5
frame #35: ruby`Init_builtin_features at builtin.c:74:5
frame #36: ruby`ruby_init_prelude at ruby.c:1750:5
frame #37: ruby`ruby_opt_init(...) at ruby.c:1811:5
frame #38: ruby`prism_script(...) at ruby.c:2215:13
frame #39: ruby`process_options(...) at ruby.c:2538:9
frame #40: ruby`ruby_process_options(...) at ruby.c:3169:12
frame #41: ruby`ruby_options(...) at eval.c:117:16
frame #42: ruby`rb_main(...) at main.c:43:26
frame #43: ruby`main(...) at main.c:68:12
Whoa. That thing is huge! This is not the backtrace I was expecting! Seems like I missed a codepath in my earlier explorations. I got it right, up until prism_script:
- main, which calls rb_main
- which calls ruby_options, then ruby_process_options, then process_options
- which calls prism_script
- The next call I expected was pm_iseq_new_main, but instead we head into ruby_opt_init
- which calls ruby_init_prelude, then Init_builtin_features
This path seems to go through some gem preloading logic, which is why we see the rb_require calls:
void
Init_builtin_features(void)
{
rb_load_with_builtin_functions("gem_prelude", NULL);
}
By default CRuby loads gem_prelude, which lives in ruby/gem_prelude.rb. Here’s that file, shortened for brevity:
require 'rubygems'
require 'error_highlight'
require 'did_you_mean'
require 'syntax_suggest/core_ext'
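One way to see the effect of that prelude from plain Ruby is to look at `$LOADED_FEATURES` on the very first line of a script. This is a rough sketch, and the regular expression is just my guess at a useful filter. If Ruby was started with gems disabled, the list may well be empty:

```ruby
# Features that were already required before this script's first line ran.
preloaded = $LOADED_FEATURES.grep(/rubygems|error_highlight|did_you_mean|syntax_suggest/)
puts "#{preloaded.size} prelude-related files already loaded"
puts preloaded.first(5)
```

Every file in that list had to be compiled before test.rb was, which lines up with the breakpoint firing long before our own code is reached.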
Compiling on-the-fly
There’s something I’ve learned here that seems obvious in hindsight, but I hadn’t considered. Ruby will only compile what is actually loaded, and only at the point it gets loaded. If I never load a particular piece of code, it never gets compiled. Or if I defer loading it until later, it does not get compiled until later.
We can actually demonstrate this by deferring a require:
sleep 10
require "net/http"
If we run this using make lldb-ruby, we can see the delayed compilation in action:
(lldb) break set --file ruby.c --line 2616
(lldb) run
// hits our prism compile code
(lldb) next
(lldb) break set --file compile.c --line 3476
(lldb) continue
// waits 10 seconds, then compiles the contents of "net/http"
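The same behaviour can be observed without a debugger. Ruby's `TracePoint` has a `:script_compiled` event that fires when new code is compiled through `eval`, `load`, or `require`. The sketch below writes a throwaway file (the `late_load` name and its contents are made up for the example) and checks whether it has been compiled before and after the `require`:

```ruby
require "tempfile"

# Record the basename of every script the VM compiles while the trace is on.
compiled = []
trace = TracePoint.new(:script_compiled) do |tp|
  compiled << File.basename(tp.instruction_sequence.path)
end

# A throwaway file standing in for a late `require`.
file = Tempfile.new(["late_load", ".rb"])
file.write("LATE_LOADED = [].freeze\n")
file.close
name = File.basename(file.path)

trace.enable
seen_before_require = compiled.include?(name)
require file.path
seen_after_require = compiled.include?(name)
trace.disable

puts "compiled before require? #{seen_before_require}"
puts "compiled after require?  #{seen_after_require}"
```

The file exists on disk the whole time, but nothing compiles it until the `require` line runs.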
Getting to our test.rb file
I’d rather see just my code in test.rb get compiled, so I’m going to set a breakpoint directly on pm_iseq_new_main, which for me is in ruby.c on line 2616:
(lldb) break set --file ruby.c --line 2616
(lldb) run
Process 32534 launched: '/ruby/build/ruby' (arm64)
Process 32534 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: ruby`process_options(...) at ruby.c:2616:38
2613 if (!result.ast) {
2614 pm_parse_result_t *pm = &result.prism;
2615 int error_state;
-> 2616 iseq = pm_iseq_new_main(&pm->node, opt->script_name, path, parent, optimize, &error_state);
2617
2618 pm_parse_result_free(pm);
2619
Now when we run the backtrace I am seeing what I expected, because we’ve skipped the gem_prelude compilation. This is the exact flow I walked through in part 2:
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
* frame #0: ruby`process_options(...) at ruby.c:2616:38
frame #1: ruby`ruby_process_options(...) at ruby.c:3169:12
frame #2: ruby`ruby_options(...) at eval.c:117:16
frame #3: ruby`rb_main(...) at main.c:43:26
frame #4: ruby`main(...) at main.c:68:12
From here, we can set our iseq_peephole_optimize breakpoint and see only our specific code get compiled. Since we’re already in the running program, we call continue to keep executing:
(lldb) break set --file compile.c --line 3476
Breakpoint 2: where = ruby`iseq_peephole_optimize + 2276 at compile.c:3476:17
(lldb) continue
Process 55336 resuming
Process 55336 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
frame #0: ruby`iseq_peephole_optimize() at compile.c:3476:17
3473 const rb_iseq_t *blockiseq = (rb_iseq_t *)OPERAND_AT(next, 1);
3474
3475 if (vm_ci_simple(ci) && vm_ci_argc(ci) == 0 && blockiseq == NULL && vm_ci_mid(ci) == idFreeze) {
-> 3476 iobj->insn_id = BIN(opt_ary_freeze);
3477 iobj->operand_size = 2;
3478 iobj->operands = compile_data_calloc2(iseq, iobj->operand_size, sizeof(VALUE));
3479 iobj->operands[0] = rb_cArray_empty_frozen;
If we call bt from here to get the backtrace, we finally see the connection between prism_compile.c and compile.c. pm_iseq_compile_node calls iseq_setup_insn, which runs the optimization logic. In the previous post, I saw iseq_setup_insn, but I didn’t know what it meant or what it did. Now we know. This is what Kevin Newton referred to earlier: specialization comes after compilation. Prism compiles the node in the standard way, then the peephole optimization layer (the specialization) is applied after:
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
* frame #0: ruby`iseq_peephole_optimize(...) at compile.c:3476:17
frame #1: ruby`iseq_optimize(...) at compile.c:4352:17
frame #2: ruby`iseq_setup_insn(...) at compile.c:1619:5
frame #3: ruby`pm_iseq_compile_node(...) at prism_compile.c:10139:5
frame #4: ruby`pm_iseq_new_with_opt_try(...) at iseq.c:1029:5
frame #5: ruby`rb_protect(...) at eval.c:1033:18
frame #6: ruby`pm_iseq_new_with_opt(...) at iseq.c:1082:5
frame #7: ruby`pm_iseq_new_main(...) at iseq.c:930:12
frame #8: ruby`process_options(...) at ruby.c:2616:20
frame #9: ruby`ruby_process_options(...) at ruby.c:3169:12
frame #10: ruby`ruby_options(...) at eval.c:117:16
frame #11: ruby`rb_main(...) at main.c:43:26
frame #12: ruby`main(...) at main.c:68:12
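Since the peephole pass is a separate step that runs after the instructions are generated, it can also be switched off from Ruby. `RubyVM::InstructionSequence.compile` accepts a compile options hash as its fifth argument, and `peephole_optimization` is one of the documented keys. This sketch compiles the same source both ways. I'm not asserting what each dump contains, because that depends on the Ruby version you run it with:

```ruby
source = "pp [].freeze"

# Default options: the peephole pass runs after the instructions are generated.
with_pass = RubyVM::InstructionSequence.compile(source).disasm

# Same source, with the peephole pass switched off through the compile options.
without_pass = RubyVM::InstructionSequence.compile(
  source, nil, nil, 1, { peephole_optimization: false }
).disasm

puts "--- default options ---"
puts with_pass
puts "--- peephole_optimization: false ---"
puts without_pass
```

On a build that includes the frozen-array optimization, I'd expect the first dump to show `opt_ary_freeze` and the second to fall back to a `newarray` followed by a send of `freeze`.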
From here, we can inspect and see the current instruction using expr:
(lldb) expr *(iobj)
(INSN) $4 = {
link = {
type = ISEQ_ELEMENT_INSN
next = 0x000000011f6568d0
prev = 0x000000011f656850
}
insn_id = YARVINSN_newarray
operand_size = 1
sc_state = 0
operands = 0x000000011f640118
insn_info = (line_no = 1, node_id = 3, events = 0)
}
We see that iobj contains a link to a subsequent instruction, as well as an insn_id and some other metadata. The instruction is currently YARVINSN_newarray. If we run next, that should run iobj->insn_id = BIN(opt_ary_freeze);, and our instruction should change:
(lldb) next
(lldb) expr *(iobj)
(INSN) $5 = {
//...
insn_id = YARVINSN_opt_ary_freeze
//...
}
It does! The instruction was changed from newarray to opt_ary_freeze! The optimization is at least partially complete (I’m not sure if more is involved, yet).
Making one small step towards opt_respond_to
This is already the longest and densest post in the series. But I’d love to make some actual progress towards a new instruction. Let’s pattern match on respond_to? in the peephole optimizer.
Here is our sample program:
puts "Did you know you can write to $stdout?" if $stdout.respond_to?(:write)
Running it with RUNOPT0=--dump=insns make runruby, we get the following instructions:
== disasm: #<ISeq:<main>./test.rb:1 (1,0)-(1,76)>
0000 getglobal :$stdout ( 1)[Li]
0002 putobject :write
0004 opt_send_without_block <calldata!mid:respond_to?, argc:1, ARGS_SIMPLE>
0006 branchunless 14
0008 putself
0009 putchilledstring "Did you know you can write to $stdout?"
0011 opt_send_without_block <calldata!mid:puts, argc:1, FCALL|ARGS_SIMPLE>
0013 leave
0014 putnil
0015 leave
I want to match on this line:
0004 opt_send_without_block <calldata!mid:respond_to?, argc:1, ARGS_SIMPLE>
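Before touching any C, the characteristics I care about can be picked out of the disassembly text from Ruby. This is only a sketch that greps the dump for the call data shown above, `mid:respond_to?` with `argc:1`. It says nothing about how the optimizer itself should match:

```ruby
program = <<~'RUBY'
  puts "Did you know you can write to $stdout?" if $stdout.respond_to?(:write)
RUBY

disasm = RubyVM::InstructionSequence.compile(program).disasm

# Keep only instruction lines whose call data names respond_to? with one argument.
candidates = disasm.lines.select { |line| line.match?(/mid:respond_to\?, argc:1/) }
puts candidates
```

Those two properties, the method id and the argument count, are what the C check needs to test for through the callinfo.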
Here’s my attempt. I’m going to copy what the newarray freeze optimization is doing, and just try changing a few things to match my example. Right underneath the code we’ve been debugging for newarray, I’m adding this:
// If the instruction is a `send` (shown as `0004 opt_send_without_block` in the dump above)
if (IS_INSN_ID(iobj, send)) {
// Pull the same info the `newarray` optimization does
const struct rb_callinfo *ci = (struct rb_callinfo *)OPERAND_AT(iobj, 0);
const rb_iseq_t *blockiseq = (rb_iseq_t *)OPERAND_AT(iobj, 1);
// <calldata!mid:respond_to?, argc:1, ARGS_SIMPLE>
// 1. We have ARGS_SIMPLE, which is probably what `vm_ci_simple(ci)` checks for
// 2. We have argc:1, which should match `vm_ci_argc(ci) == 1`
// 3. We send without a block, hence blockiseq == NULL
// 4. The method id (mid) from `vm_ci_mid(ci)` matches `idRespond_to`. I searched for names
// similar to `idFreeze`, swapping `Freeze` for `Respond`, and found `idRespond_to`
if (vm_ci_simple(ci) && vm_ci_argc(ci) == 1 && blockiseq == NULL && vm_ci_mid(ci) == idRespond_to) {
int i = 0;
}
}
Now I’ll follow the same debugging as before, but I’ll add a breakpoint in compile.c where I added my new code. Specifically, I’m setting a breakpoint at the int i = 0; so I am inside the if statement:
(lldb) break set --file ruby.c --line 2616
Breakpoint 1: where = ruby`process_options + 4068 at ruby.c:2616:38
(lldb) run
(lldb) break set --file compile.c --line 3491
Breakpoint 2: where = ruby`iseq_peephole_optimize + 2536 at compile.c:3491:17
(lldb) continue
Process 61925 resuming
Process 61925 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
frame #0: ruby`iseq_peephole_optimize(...) at compile.c:3491:17
3488 const rb_iseq_t *blockiseq = (rb_iseq_t *)OPERAND_AT(iobj, 1);
3489
3490 if (vm_ci_simple(ci) && vm_ci_argc(ci) == 1 && blockiseq == NULL && vm_ci_mid(ci) == idRespond_to) {
-> 3491 int i = 0;
3492 }
3493 }
3494
I think it worked! It pattern matched on the characteristics of the respond_to? call, and hit the breakpoint set on int i = 0;. It’s a tiny step, but it’s a first step in the direction of adding the optimization.
Using gdb
For anyone wanting to do the same work using gdb, it’s pretty similar. Let’s start off by creating a breakpoints.gdb file in the root of your project. This will set you up with your initial breakpoint, similar to how we ran lldb and set the breakpoint before calling run:
When you run make gdb-ruby, you can use the same backtrace command, bt:
> make gdb-ruby
Thread 1 "ruby" hit Breakpoint 4, process_options (...) at ../ruby.c:2616
2616 iseq = pm_iseq_new_main(&pm->node, opt->script_name, path, parent, optimize, &error_state);
(gdb) bt
#0 process_options (...) at ../ruby.c:2616
#1 in ruby_process_options (...) at ../ruby.c:3169
#2 in ruby_options (...) at ../eval.c:117
#3 in rb_main (...) at ../main.c:43
#4 in main (...) at ../main.c:68
(gdb)
From here, you can set your next breakpoint so that you can see the compilation solely for the newarray instruction from our test.rb program:
(gdb) break compile.c:3476
Breakpoint 5 at 0xaaaabaa22f14: file ../compile.c, line 3476
(gdb) continue
Continuing.
Thread 1 "ruby" hit Breakpoint 5, iseq_peephole_optimize (...) at ../compile.c:3476
3476 iobj->insn_id = BIN(opt_ary_freeze);
Similar to the lldb command expr, we can inspect the contents of locals using p or print in gdb:
(gdb) p *(iobj)
$2 = {link = {type = ISEQ_ELEMENT_INSN, next = 0xaaaace797ef0, prev = 0xaaaace797e70}, insn_id = YARVINSN_newarray,
operand_size = 1, sc_state = 0, operands = 0xaaaace796ac8, insn_info = {line_no = 1, node_id = 3, events = 0}}
Finishing up
Ok, this went pretty long. Good on you for sticking in there with me! We’ve found the optimizer, and we’ve pattern matched our way to a respond_to? call. Next, we need to add the new instruction definition and try to actually replace the send with our new instruction. See you next time! 👋🏼