ggex-bench is finished and open source

This post was originally published on the VaporLens Patreon.

Hey folks!

Quick update: ggex-bench is finally finished and open source on GitHub:

I won't re-explain the whole benchmark again, since I've already written about it a couple of times. The important bit is that the code, gold graphs, saved outputs, and result reports are now public.

So if you were curious about the actual setup behind the previous posts, you can now poke around the repo directly instead of just reading my summaries.

Small warning: this may still be a bit messy

One important caveat: this repo came out of a private experiment repo.

I copied a bunch of stuff over from that private repo, removed things that should not be public, cleaned up the docs, renamed some scripts, checked the committed files, and generally tried my best to make it understandable for someone who is not me.

But it may still be messy in places.

There might be awkward script names, assumptions that made sense in my local setup, docs that skip over something obvious to me but not to anyone else, or experimental files that could be organized better.

I tried to make it nice, but this is still a research benchmark that escaped from an experiments folder, not a polished framework.

So if you try it and find anything weird, confusing, broken, or badly explained, please report an issue on GitHub. Even small stuff is useful: unclear setup instructions, missing environment details, confusing result files, broken commands, bad naming, anything like that.

Why open source it now

Mostly because the benchmark has done what I needed it to do for this stage.

It helped me pick a better schema direction, showed where current models tend to fail, and gave me enough confidence to move from tiny experiments toward a larger knowledge graph extraction run for VaporLens.

At this point, keeping it private is not very useful. The results are more useful if people can inspect the data, see the evaluation scripts, disagree with the setup, run their own models, or point out where the benchmark itself is weak.

And honestly, I also wanted the work to exist somewhere public before I move too far ahead with the next stage.

What's next

The next step is still the bigger gaming knowledge graph work.

ggex-bench helped answer the first question well enough for me to move on from benchmarking and start testing the extraction pipeline at a larger scale.

That is the part I am moving toward now.

So, yeah - ggex-bench is finally public. If you are curious, have a look, and if you spot anything broken or confusing, please open an issue.

More soon!

Cheers, Tim
