Introducing TiktokenViewer 🫣 - A childish way to imitate TikTokenizer

AI/ML
LLMs
Projects
A childish attempt to reimplement TikTokenizer in pure Python using OpenAI’s tiktoken, Streamlit, and Streamlit extras including st-annotated-text and st-keyup
Published: April 22, 2024

Yeah! Yet another weekend project update 🤓

After (finally!) watching Andrej Karpathy’s “Let’s build the GPT Tokenizer” video, in which he also introduced the TikTokenizer Vercel app for visualizing not only how a BPE-based tokenizer works but also how its output varies with the underlying encoding scheme / model, I couldn’t stop myself from re-implementing the token visualizer app 🤓👨🏻‍💻☕️🤗.
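For anyone new to BPE, here is a minimal pure-Python sketch of the core training loop walked through in the video: start from raw UTF-8 bytes, repeatedly find the most frequent adjacent pair of ids, and replace it with a fresh id. This is a toy illustration only, not how tiktoken works internally (tiktoken ships pretrained merge tables and pre-splits text with a regex). The names `get_pair_counts`, `merge`, and `train_bpe` are hypothetical.

```python
from collections import Counter


def get_pair_counts(ids: list[int]) -> Counter:
    """Count how often each adjacent pair of token ids occurs."""
    return Counter(zip(ids, ids[1:]))


def merge(ids: list[int], pair: tuple[int, int], new_id: int) -> list[int]:
    """Replace non-overlapping occurrences of `pair`, scanning left to right."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text: str, num_merges: int) -> tuple[list[int], dict[tuple[int, int], int]]:
    """Learn up to `num_merges` merges on `text`; ids 0-255 are the raw bytes."""
    ids = list(text.encode("utf-8"))
    merges: dict[tuple[int, int], int] = {}
    for step in range(num_merges):
        counts = get_pair_counts(ids)
        if not counts:
            break  # fewer than two ids left, nothing to merge
        # Most frequent pair; max() keeps the first maximum, so ties go to
        # the pair that appeared earliest in the sequence.
        pair = max(counts, key=counts.get)
        new_id = 256 + step
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges


if __name__ == "__main__":
    ids, merges = train_bpe("aaabdaaabac", num_merges=3)
    print(ids)  # [258, 100, 258, 97, 99]
    print(merges)
```

Eleven bytes shrink to five ids after three merges. Real tokenizers do the same thing with tens of thousands of merges learned on a huge corpus, which is why different encodings split the same text differently.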

Implementing this app in 🐍 was a really fun and rewarding journey. Hands down, he is one of the best educators.

Using OpenAI’s tiktoken, Streamlit, and a few other Streamlit components, I was able to mimic the original app fairly well (I just couldn’t replicate the hover-over effect yet 👨🏻‍💻).
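The heart of a token visualizer is mapping each token id back to the text it covers and giving it a color. Below is a minimal sketch of that step, assuming the token bytes come from tiktoken (for example `[enc.decode_single_token_bytes(t) for t in enc.encode(text)]` with `enc = tiktoken.get_encoding("cl100k_base")`) and that the resulting `(text, label, background)` tuples are passed to st-annotated-text’s `annotated_text`. The helper `to_annotations` and the `PALETTE` colors are hypothetical, not taken from the app’s source.

```python
from itertools import cycle

# Hypothetical pastel palette; any list of CSS colors would work.
PALETTE = ["#ffd6a5", "#caffbf", "#9bf6ff", "#bdb2ff", "#ffc6ff"]


def to_annotations(
    token_ids: list[int], token_bytes: list[bytes]
) -> list[tuple[str, str, str]]:
    """Pair each token with its decoded text and a cycling background color.

    Returns (text, label, background) tuples, using the token id as the label.
    A token holding only part of a multi-byte UTF-8 character (e.g. half an
    emoji) cannot be decoded alone, so errors="replace" shows it as U+FFFD.
    """
    if len(token_ids) != len(token_bytes):
        raise ValueError("token_ids and token_bytes must have the same length")
    colors = cycle(PALETTE)
    return [
        (raw.decode("utf-8", errors="replace"), str(tid), next(colors))
        for tid, raw in zip(token_ids, token_bytes)
    ]


if __name__ == "__main__":
    # Hand-made stand-in for tiktoken output; these ids are made up.
    demo = to_annotations([1, 2, 3], [b"Hello", b" world", b"\xf0\x9f"])
    for segment in demo:
        print(segment)
```

In a Streamlit page, one could then render the segments with `annotated_text(*to_annotations(ids, pieces))` and display `len(ids)` as the token count. Decoding per token (rather than decoding the whole string and guessing boundaries) keeps each colored span aligned with exactly one token id.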

I have hosted the TikTokenViewer app on Streamlit Cloud, and the source code is available on GitHub.

Y’all can access the app here: https://tiktokenviewer.streamlit.app

Go ahead and try it. Have fun! I’ll soon put together a detailed blog post on tokenization, and especially on how I built this app. You’ll be able to read all about it in my Techno Adventure Substack.

Please do check it out and subscribe (there is a free plan!) to get all the posts delivered right into your inbox.

Subscribe to Techno Adventure Newsletter

I also publish a newsletter where I share my techno adventures at the intersection of Telecom, AI/ML, SW Engineering and Distributed Systems. If you’d like to get my posts delivered directly to your inbox whenever I publish, consider subscribing to my Substack.

I pinky promise 🤙🏻. I won’t sell your emails!