gitaskhub

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

Stars · 10,604
Language · Python
License · MIT
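The sketch below illustrates how byte-level BPE training and encoding generally work. It is not this repository's code. The function names (`get_pair_counts`, `merge`, `train`, `encode`, `decode`) are hypothetical. The sketch assumes UTF-8 input and assumes ties between equally frequent pairs are broken by first occurrence. Training starts from the 256 raw byte values and repeatedly replaces the most frequent adjacent pair with a new token id. Encoding then replays those merges, earliest-learned first.

```python
from collections import Counter

# Hypothetical minimal byte-level BPE sketch, not the repository's actual API.


def get_pair_counts(ids: list[int]) -> Counter:
    """Count how often each adjacent pair of token ids occurs."""
    return Counter(zip(ids, ids[1:]))


def merge(ids: list[int], pair: tuple[int, int], new_id: int) -> list[int]:
    """Replace every non-overlapping occurrence of `pair` with `new_id`, left to right."""
    out = []
    i = 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train(text: str, vocab_size: int) -> dict[tuple[int, int], int]:
    """Learn merges until the vocabulary reaches `vocab_size` (assumed >= 256)."""
    ids = list(text.encode("utf-8"))  # start from raw bytes: ids 0..255
    merges: dict[tuple[int, int], int] = {}
    for new_id in range(256, vocab_size):
        counts = get_pair_counts(ids)
        if not counts:
            break  # fewer than two tokens left, nothing to merge
        # Most frequent pair; max() keeps the first-seen pair on ties.
        pair = max(counts, key=counts.get)
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges


def encode(text: str, merges: dict[tuple[int, int], int]) -> list[int]:
    """Tokenize `text` by applying learned merges, lowest (earliest) id first."""
    ids = list(text.encode("utf-8"))
    while len(ids) >= 2:
        counts = get_pair_counts(ids)
        pair = min(counts, key=lambda p: merges.get(p, float("inf")))
        if pair not in merges:
            break  # no remaining pair has a learned merge
        ids = merge(ids, pair, merges[pair])
    return ids


def decode(ids: list[int], merges: dict[tuple[int, int], int]) -> str:
    """Map ids back to bytes, then decode the bytes as UTF-8."""
    vocab = {i: bytes([i]) for i in range(256)}
    # Dicts keep insertion order, so both children exist before each merged id.
    for (a, b), new_id in merges.items():
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")
```

For example, `train("aaabdaaabac", 259)` learns three merges. `encode` then compresses the 11 input bytes to 5 ids, and `decode` turns them back into the original string. Production tokenizers usually add steps this sketch omits, such as regex pre-splitting of the text and special tokens.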