Week 13 - The Bazaar

Lessons from The Cathedral and the Bazaar Revisited

In our recent class discussions about Eric Raymond’s influential essay The Cathedral and the Bazaar, we explored the contrasting development models it describes: the traditional, centralized “cathedral” approach versus the decentralized, open “bazaar” approach. Each student shared their favorite lesson from the essay, and the diverse perspectives offered new insights.

A couple of quotes that really resonated with me were “Good programmers know what to write. Great ones know what to rewrite (and reuse),” and “Every good work of software starts by scratching a developer’s personal itch.” The first highlights the importance of leveraging existing code and not reinventing the wheel - wisdom that the open source world embraces wholeheartedly. The second quote reminded me about how I have a running list of problems, annoyances, or interesting ideas that catch my attention in daily life. When contemplating potential projects, referring back to that personal “itch” list has generated much more engaging and meaningful concepts than just trying to think of ideas from scratch.

The quote “Plan to throw one away; you will, anyhow” sparked interesting debate, with classmates interpreting it various ways. Some saw it as advice to build a disposable first iteration to work out the core concept before investing in a more robust solution. Others viewed it more as accepting that most code will inevitably be discarded or rewritten from the ground up. While the intent is ambiguous, the quote paradoxically reinforces the value of the iterative bazaar approach - build something quickly, release it, gather feedback, then revise or rebuild as needed based on real-world experience.

For those interested in reading more, my earlier post dives deeper into the key lessons from The Cathedral and the Bazaar.

Pandas, Polars, Narwhals, and the Data Frame Ecosystem

Pandas has long been a staple for working with tabular data in Python, providing powerful data structures (like the DataFrame) and data analysis tools. However, its dependency on NumPy can lead to performance bottlenecks for large datasets. This gap has given rise to alternatives like polars, a DataFrame library written in Rust that boasts impressive speeds, and many more.

Taking a different track, the narwhals project aims to provide a unified API that works across multiple DataFrame backends such as pandas, polars, Modin, cuDF, etc. By decoupling the API from the underlying engine, narwhals enables data frame-agnostic code that can leverage the best tool for each use case.

This approach exemplifies the open collaboration and remixing etched into the bazaar philosophy. The ability to mix and match components from different projects is a quintessential benefit of the bazaar model that narwhals embodies. Moreover, providing a common API that transcends any single DataFrame implementation promotes the type of user co-development that Raymond supported, as it allows the community to collectively build upon a shared interface.

Pandas Challenges and Progress

Our journey contributing to pandas has presented its share of challenges. My development environment hit a snag midway through, but a full rebuild got things working again. Furthermore, finding approachable issues to tackle, especially code issues, also proved difficult initially, though documenting some smaller fixes helped me gain familiarity with the codebase.

Slowly but surely, our group has found our stride. We’ve collectively opened and merged multiple pull requests, including modifications to pandas’ continuous integration checks and solutions to errors like SA01 for certain methods. With each contribution, we’re becoming more adept at navigating the project’s workflows and codebases.

That said, we’ve hit some speed bumps when multiple teammates submit interrelated pull requests, leading to merge conflicts that slow the integration process. We’ve also uncovered instances where pandas’ online documentation seems to reference outdated or incorrect source code, and we’ve raised issues regarding this.

Ultimately, these experiences have underscored the importance of clear communication, judicious issue triaging, and an open, collaborative spirit – lessons that resonate strongly with the ethos of the bazaar.

Written before or on April 21, 2024