@byroot has been posting a series on optimizations he and others have made to the json gem, and it’s been 🔥🔥🔥. Enjoyable and informative, I highly recommend reading what he’s posted so far.
In his second post, he mentions the possibility of improving performance by adding an additional method cache. This would involve compiling respond_to? calls in a special way:
It actually wouldn’t be too hard to add such a cache, we’d need to modify the Ruby compiler to compile respond_to? calls into a specialized opt_respond_to instruction that does have two caches instead of one. The first cache would be used to look up respond_to? on the object to make sure it wasn’t redefined, and the second one to look up the method we’re interested in. Or perhaps even 3 caches, as you also need to check if the object has a respond_to_missing? method defined in some cases.
That’s an idea I remember discussing in the past with some fellow committers, but I can’t quite remember if there was a reason we didn’t do it yet.
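To make the quoted idea a bit more concrete, here is a rough Ruby-level picture of the three lookups it describes. This is only a sketch under my own assumptions: the real instruction would live in the VM's C code and use inline caches rather than repeating these lookups, and sketch_respond_to? and Ghost are hypothetical names I made up for illustration.

```ruby
# Hypothetical helper: mimics, in plain Ruby, the checks the quote describes.
# It is not how the VM would implement it; there are no caches here.
def sketch_respond_to?(obj, name)
  # Lookup 1: make sure respond_to? itself has not been redefined.
  # If it has, fall back to an ordinary method call.
  unless obj.method(:respond_to?).owner == Kernel
    return obj.respond_to?(name)
  end

  # Lookup 2: the method we are actually interested in (public only).
  return true if obj.public_methods.include?(name.to_sym)

  # Lookup 3: respond_to_missing?, which may answer for dynamic methods.
  obj.send(:respond_to_missing?, name.to_sym, false) ? true : false
end

# Hypothetical class that answers for a method it never defines.
class Ghost
  def respond_to_missing?(name, include_private = false)
    name == :boo || super
  end
end

p sketch_respond_to?($stdout, :write)
p sketch_respond_to?(Ghost.new, :boo)
p sketch_respond_to?(Ghost.new, :nope)
```

The point of a dedicated instruction would be to cache each of these lookups at the call site, so the common case skips the full respond_to? call.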
Inspired by his comment, I’m going to add a new bytecode instruction, opt_respond_to, to the Ruby VM, for fun. I don’t know how to add a new bytecode instruction (yet). I don’t know if one would get accepted by the Ruby team. I don’t know if adding it will actually provide a meaningful enhancement to performance. But let’s give it a try, shall we?
Understanding the requirements
I know I want to add a new Ruby Virtual Machine bytecode instruction called opt_respond_to. What does code using respond_to? look like today, after being compiled? Here’s some code to evaluate:
if $stdout.respond_to?(:write)
  puts "Did you know you can write to $stdout?"
end
We can compile it using RubyVM::InstructionSequence. We compile the code, then we disassemble it to see the actual Ruby bytecode:
puts RubyVM::InstructionSequence.compile(DATA.read).disassemble
__END__
if $stdout.respond_to?(:write)
  puts "Did you know you can write to $stdout?"
end
📝 The __END__ format is just a convenient way of supplying some text to your program. Here we put all the Ruby code we want to compile after __END__ and it will be available to our program as an IO object called DATA. Thanks to Drew Bragg for the tip!
Compiling using InstructionSequence gives us:
== disasm: #<ISeq:<compiled>@<compiled>:1 (1,0)-(3,3)>
0000 getglobal :$stdout ( 1)[Li]
0002 putobject :write
0004 opt_send_without_block <calldata!mid:respond_to?, argc:1, ARGS_SIMPLE>
0006 branchunless 14
0008 putself ( 2)[Li]
0009 putstring "Did you know you can write to $stdout?"
0011 opt_send_without_block <calldata!mid:puts, argc:1, FCALL|ARGS_SIMPLE>
0013 leave
0014 putnil
0015 leave
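When the output gets longer, it can help to filter the disassembly down to the lines that carry call metadata. A minimal sketch, assuming CRuby (RubyVM is not available on other implementations) and compiling from a string rather than DATA so it can be pasted anywhere; source and call_lines are just names I picked:

```ruby
# Single-quoted heredoc so nothing inside is interpolated.
source = <<~'RUBY'
  if $stdout.respond_to?(:write)
    puts "Did you know you can write to $stdout?"
  end
RUBY

iseq = RubyVM::InstructionSequence.compile(source)

# Keep only the disassembly lines that mention calldata.
call_lines = iseq.disasm.lines.grep(/calldata/)
puts call_lines
```

This should leave just the two method calls, respond_to? and puts, which makes it easier to compare before and after once a new instruction exists.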
I’m going to guess at the parts I think matter most: putobject, opt_send_without_block and branchunless.
- putobject pushes the symbol :write onto the VM stack
- opt_send_without_block is given <calldata!mid:respond_to?, argc:1, ARGS_SIMPLE>. I’m guessing calldata is a format for specifying metadata about what is being called. We’ve got the method name, respond_to?, how many args are being used, argc:1, and that the arguments are “simple”, ARGS_SIMPLE. mid stands for… “method id”?
- branchunless wouldn’t be specifically related to creating a new instruction, but I just think it’s informative for how the respond_to? result is used. I believe it means “if the last result is falsy (false or nil), jump to instruction 14”. 14 in this case is the putnil near the bottom of the bytecode. Each instruction seems to be prefixed with a zero-padded decimal offset that identifies the location of the instruction. putobject is located at 0002, opt_send_without_block is located at 0004, branchunless is located at 0006 and putnil is located at 0014
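One way to check the calldata guesses is to ask the instruction sequence for its array form with to_a, which exposes each instruction's operands as plain Ruby objects. This is a sketch based on what I understand CRuby 3.x to emit: the :mid and :orig_argc hash keys are an implementation detail rather than a documented contract, so treat them as an assumption.

```ruby
iseq = RubyVM::InstructionSequence.compile("$stdout.respond_to?(:write)")

# The last element of to_a is the instruction list: a mix of line numbers,
# event/label symbols, and [name, *operands] arrays. Keep only the arrays.
instructions = iseq.to_a.last.grep(Array)

# Collect every Hash operand and pick the one describing respond_to?.
calldata = instructions.flat_map { |insn| insn.drop(1) }
                       .grep(Hash)
                       .find { |operand| operand[:mid] == :respond_to? }

p calldata
```

If the assumption holds, the hash carries the method name as a symbol under :mid and the argument count under :orig_argc, which lines up with the mid and argc fields in the disassembly.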
I think the only thing I will be changing is taking calls to respond_to?, and making that an opt_respond_to bytecode instead of opt_send_without_block. We’ll see!
There are actually a few variations for respond_to? that I wasn’t aware of. The interface takes a symbol or a string as the first parameter, and then a boolean for whether to include private and protected methods:
$stdout.respond_to?("write")
$stdout.respond_to?("write", true)
$stdout.respond_to?(:write, true)
Let’s see the bytecode for these variations:
== disasm: #<ISeq:<compiled>@<compiled>:1 (1,0)-(3,33)>
0000 getglobal :$stdout ( 1)[Li]
0002 putstring "write"
0004 opt_send_without_block <calldata!mid:respond_to?, argc:1, ARGS_SIMPLE>
0006 pop
0007 getglobal :$stdout ( 2)[Li]
0009 putstring "write"
0011 putobject true
0013 opt_send_without_block <calldata!mid:respond_to?, argc:2, ARGS_SIMPLE>
0015 pop
0016 getglobal :$stdout ( 3)[Li]
0018 putobject :write
0020 putobject true
0022 opt_send_without_block <calldata!mid:respond_to?, argc:2, ARGS_SIMPLE>
0024 leave
It doesn’t make a huge difference. The string version is nearly identical, except we see putstring "write" instead of putobject :write. The boolean versions add an additional putobject which pushes the boolean onto the stack. Then the opt_send_without_block call has argc:2 instead of argc:1. Good to understand, but functionally the same.
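The bytecode looks the same, but the second argument does change the answer at runtime. A small illustration, using a made-up Vault class (hypothetical, purely for the example):

```ruby
# Hypothetical class with one public and one private method.
class Vault
  def unlock; end

  private

  def combination; end
end

vault = Vault.new

p vault.respond_to?(:unlock)             # symbol name, public method
p vault.respond_to?("unlock")            # string name works too
p vault.respond_to?(:combination)        # private methods are skipped by default
p vault.respond_to?(:combination, true)  # included when the second argument is true
```

The first two calls and the last one return true, while the third returns false, so any specialized instruction has to account for the optional second argument as well.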
I’m going to keep these explorations shorter, and break them into parts, so I’m going to stop here. So far we’ve:
- Identified that I want to create an opt_respond_to instruction for the Ruby VM
- Compiled a simple respond_to? example, and examined what the current bytecode looks like
- Identified what I think are the relevant instructions that need to be converted to opt_respond_to
Next up, I’m going to walk through the path CRuby takes from starting the program, to compiling our code, starting with main.c. Here we go!