I don’t understand the love of notebooks. Notebooks are useful for only one thing: live coding tutorials. I have not seen a single use case outside live coding tutorials where a Jupyter notebook was effective. I have used notebooks, but only for that use case. You effectively gave me the motivation to write an article about the pros and cons of notebooks, and there are lots of cons… As a lover of notebooks, please explain! I want to understand. I’m outlining my concerns below.
Why should a data scientist care so much about notebooks? I have been a data scientist for 3 years now and I have never seen the need for one. When you receive data from whatever source, you need to understand it and then design an experiment. Once you have an idea and your hypotheses are laid down, you work with the data engineers and machine learning engineers to make it happen (in a small team, these are all the same person). This shouldn’t be done in a notebook: if you write your code in a notebook, you won’t be able to collaborate with the rest of the team using tools like Airflow, Git, Jenkins, TensorFlow with GPU, AWS, etc. (In some cases people have written plugins or packages that make this possible, a bit like the ramps and elevators we build for people in wheelchairs…) Not knowing about Docker becomes a huge handicap in that situation… And honestly, it’s not THAT hard. Data scientists are very smart people; they can pick it up. Framing everything around the notebook because you can’t work by writing your programs in files is ridiculous. In the end, data scientists work with data, and data is stored, generated and manipulated with computers, so data scientists cannot abstract away the need to know how to work as a developer. Trying to do so handicaps them.
I’m really curious to understand why data scientists love notebooks so much. I really hate them, *except* for live coding tutorials.