Tips for Tidy Jupyter Notebooks
[Figure: A screenshot of a Jupyter notebook showing the outline on the left side, and the code and text on the right]
Jupyter notebooks are versatile and widely used in data and AI work. For those who haven't used them, they let you run code interactively with notes side by side, in a clean and organized interface. However, like any code or program, they take some discipline to maintain. Because they are interactive and you can iterate so quickly, they can get unruly fast. Below are my tips for keeping your notebooks organized so that you and your team stay sane. Your future self and team will love you for it ❤️.
Double-check the order of the cells
Notebooks are meant to run top to bottom, but during rapid iteration the cell order can get jumbled. It's easy to insert a cell that depends on code defined further down. Before sharing, arrange the cells in logical order, then restart the kernel and run everything from the top to confirm the notebook runs cleanly. This simple step prevents confusion and frustration for anyone who uses the notebook later.
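One rough way to spot jumbled cells without running anything is to look at the execution counts that Jupyter stores with each code cell. The sketch below assumes the notebook has already been loaded as a dictionary (an .ipynb file is JSON with a "cells" list); the function name and the example notebook are made up for illustration.

```python
def out_of_order_cells(notebook: dict) -> list:
    """Return indexes of code cells that were run before a cell placed above them."""
    flagged = []
    highest_seen = 0
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        count = cell.get("execution_count")
        if count is None:
            # The cell was never run, so there is nothing to compare.
            continue
        if count < highest_seen:
            flagged.append(index)
        else:
            highest_seen = count
    return flagged


# Hypothetical notebook, shaped like the JSON inside an .ipynb file.
example = {
    "cells": [
        {"cell_type": "code", "execution_count": 1, "source": "import pandas as pd"},
        {"cell_type": "code", "execution_count": 3, "source": "df.head()"},
        {"cell_type": "code", "execution_count": 2, "source": "df = pd.read_csv('data.csv')"},
    ]
}
print(out_of_order_cells(example))  # [2]
```

For a real file you could load the dictionary with json.load before calling the function. This is only a hint: a notebook can have tidy execution counts and still fail, so restarting the kernel and running all cells remains the more reliable check.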
Include Title & Summary at the top of the notebook
There's nothing worse than opening someone's notebook and having to dig through all the cells just to figure out what it does. Write a quick summary at the top explaining:
- If it's an analysis notebook: What were you looking for? What did you find?
- If it's part of a data pipeline: What's its job? Any special settings needed?
- Any gotchas someone should know before running it? Does it write to a database?
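If you want to nudge yourself (or your team) into writing that summary, a tiny check can help. This is a minimal sketch, assuming the notebook is loaded as a dictionary from its .ipynb JSON and that "has a title" simply means the first cell is markdown starting with a level-one header. The function name and sample notebooks are hypothetical.

```python
def has_title_cell(notebook: dict) -> bool:
    """Return True if the first cell is markdown and opens with a '# ' header."""
    cells = notebook.get("cells", [])
    if not cells or cells[0].get("cell_type") != "markdown":
        return False
    source = cells[0].get("source", "")
    # In .ipynb files, "source" may be a single string or a list of lines.
    text = "".join(source) if isinstance(source, list) else source
    stripped = text.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    return first_line.startswith("# ")


# Hypothetical notebooks for illustration.
with_summary = {
    "cells": [
        {
            "cell_type": "markdown",
            "source": [
                "# Customer churn analysis\n",
                "\n",
                "Goal: find which plans churn the most.\n",
                "Gotcha: the last cell writes results to a database.",
            ],
        },
        {"cell_type": "code", "execution_count": 1, "source": "import pandas as pd"},
    ]
}
without_summary = {
    "cells": [
        {"cell_type": "code", "execution_count": 1, "source": "import pandas as pd"},
    ]
}

print(has_title_cell(with_summary))     # True
print(has_title_cell(without_summary))  # False
```

It only checks that a title exists, not that the summary is any good. That part is still on you.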
Use Markdown as logical separators
Notebooks have markdown cells, which let you separate your notebook into sections. When you use markdown header syntax, the Jupyter interface builds an outline in the side panel that helps you jump to a specific section easily.
Headers also let you collapse entire sections. I find this helpful when I'm trying to focus on one particular task. It also comes in handy when cleaning up the notebook. If you need to delete an entire section, you can collapse it at the header and delete all the cells underneath in one go rather than deleting individual cells.
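To get a feel for what the outline picks up, here is a small sketch that pulls the markdown headers out of a notebook and prints them as an indented list. It assumes the notebook is loaded as a dictionary from its .ipynb JSON. The function name and sample notebook are made up, and this is a simplification of what the Jupyter interface does (for instance, it would also pick up a line starting with # inside a fenced code block in a markdown cell).

```python
import re

# One to six '#' characters, at least one space, then the header text.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def notebook_outline(notebook: dict) -> list:
    """Return (level, title) pairs for every markdown header in the notebook."""
    outline = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        # "source" may be a single string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        for line in text.splitlines():
            match = HEADING.match(line)
            if match:
                outline.append((len(match.group(1)), match.group(2)))
    return outline


# Hypothetical notebook for illustration.
example = {
    "cells": [
        {"cell_type": "markdown", "source": "# Sales analysis"},
        {"cell_type": "code", "execution_count": 1, "source": "# load libraries\nimport pandas as pd"},
        {"cell_type": "markdown", "source": "## Load data\nRead the raw export."},
        {"cell_type": "markdown", "source": ["## Clean data\n", "### Drop duplicates"]},
    ]
}

for level, title in notebook_outline(example):
    print("  " * (level - 1) + title)
```

If the printed outline reads like a sensible table of contents, your sections are probably in good shape.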
Keep the Scope Narrow
Think of notebooks like classes in object-oriented programming: each should have a single, well-defined purpose.
Here's a real example: let's say you're working on a new ML model. Don't try to cram your data exploration and model training into one massive notebook. Split it up! One notebook for understanding your data, another for building that model.
This isn't just about being neat - it's practical too. If your notebook gets too long, cloud tools like Databricks won't let you clone it (check out their sizing limits). If you're hitting that limit, it might be time to break things up a bit.
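If you want an early warning before a notebook balloons, you can measure it. The sketch below counts cells and estimates the serialized size of a notebook loaded as a dictionary from its .ipynb JSON. The names are hypothetical, and MAX_BYTES is a placeholder budget I picked for illustration, not Databricks' actual limit, so check their documentation for the real numbers.

```python
import json

# Placeholder budget for illustration only; not any platform's real limit.
MAX_BYTES = 1_000_000


def notebook_size(notebook: dict) -> dict:
    """Return cell counts and the approximate serialized size in bytes."""
    cells = notebook.get("cells", [])
    serialized = json.dumps(notebook).encode("utf-8")
    return {
        "cells": len(cells),
        "code_cells": sum(1 for cell in cells if cell.get("cell_type") == "code"),
        "bytes": len(serialized),
    }


def is_too_large(notebook: dict, max_bytes: int = MAX_BYTES) -> bool:
    """Return True if the serialized notebook exceeds the chosen budget."""
    return notebook_size(notebook)["bytes"] > max_bytes


# Hypothetical notebooks for illustration.
small = {
    "cells": [
        {"cell_type": "markdown", "source": "# Explore data"},
        {"cell_type": "code", "execution_count": 1, "source": "print('hi')", "outputs": []},
    ]
}
bloated = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 1,
            "source": "df",
            # A huge stored output, which is often what makes notebooks heavy.
            "outputs": [{"output_type": "stream", "name": "stdout", "text": "x" * 2_000_000}],
        }
    ]
}

print(notebook_size(small)["cells"])  # 2
print(is_too_large(small))            # False
print(is_too_large(bloated))          # True
```

The byte count is approximate, since the file on disk is formatted differently from json.dumps output. Stored outputs are often the biggest contributor, so clearing them can buy some room, but a notebook that keeps growing is usually a sign that it's doing too many jobs.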
Finally, Employ the Boy Scout Rule
Before you wrap up your work, take a quick pass through your notebook and clean up any dead code or cells you don't need anymore. It makes things so much clearer for teammates who might want to use or modify it later. You might come back to it yourself days, weeks, or months later. You'll be grateful you spent a few extra minutes tidying up!
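As a starting point for that cleanup pass, here is a minimal sketch that flags code cells that are empty or contain nothing but comments. It assumes a Python notebook loaded as a dictionary from its .ipynb JSON, and the function name and sample notebook are made up. Treat the result as a list of candidates to review, not a list to delete blindly, since a comment-only cell is sometimes there on purpose.

```python
def dead_code_cells(notebook: dict) -> list:
    """Return indexes of code cells that are empty or only contain '#' comments."""
    flagged = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # "source" may be a single string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        # all() is True for an empty list, so empty cells are flagged too.
        if all(line.startswith("#") for line in lines):
            flagged.append(index)
    return flagged


# Hypothetical notebook for illustration.
example = {
    "cells": [
        {"cell_type": "code", "execution_count": 1, "source": "df = load_data()"},
        {"cell_type": "code", "execution_count": None, "source": ""},
        {"cell_type": "code", "execution_count": 2, "source": "# old_model.fit(X)\n# print(score)"},
        {"cell_type": "markdown", "source": ""},
        {"cell_type": "code", "execution_count": 3, "source": "# show a quick summary\ndf.describe()"},
    ]
}
print(dead_code_cells(example))  # [1, 2]
```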