Modern Python Project GitHub Template
Published:
TL;DR: I created a freely available template to facilitate setting up “modern” Python projects.
Is academic programming still needed?
Artificial intelligence (AI) applications such as large language models (LLMs) have taken the world by storm. Besides generating and editing text, they are increasingly used in coding, both as assistants and for generating computer code based on user instructions in natural language. These capabilities, together with the current widespread free accessibility of LLMs, have led to various statements announcing the “End of Human Coding”, with “AI Agents to Replace Software Engineers”. This also raises the question of whether programming should even remain part of life science curricula.
Let me make a controversial statement: writing code is not difficult – as we see with LLMs, even machines can do it quite well. What is much harder is to write original, well-crafted, and maintainable code that concisely and reproducibly addresses its underlying tasks – in short, following good (research) software engineering practices. This includes building sustainable code bases and implementing application programming interfaces supporting future changes. Additionally, it requires knowledge of the appropriate data structures, dissemination mechanisms, and FAIR (findable, accessible, interoperable, reusable) research software principles. Most importantly, it is about taking responsibility for the integrity and performance of software – something a machine such as an LLM will never be able to do.
The importance of “programming literacy”
I firmly believe that programming should not only be kept part of life science curricula, but it should actually be extended and taught much more broadly (saying this as a pharmacist/self-taught bioinformatician)! However, we need to rethink how we teach programming skills to stay “competitive” relative to LLMs.
We need to move away from producing single-use scripts for trivial tasks such as nucleotide sequence parsing (for which there are excellent libraries and which LLMs excel at) and towards a more holistic understanding of programming and data science. Such “programming literacy” needs to include a solid understanding of software craftsmanship, the importance of standardized metadata and ontologies, and the ethos of open science: how to use the right tools to describe answers to research questions in a FAIR way. This will empower students to interact with code in a knowledgeable way, instead of being helpless recipients of code produced by LLMs.
Just as the invention of the pocket calculator did not replace mathematicians, LLMs will not replace scientific programming (but they might be able to help with the “boring” stuff).
Introducing the “modern_python” repository template
I am not yet able to directly influence the curricula of life science students, but there are other ways! One small contribution is to show students how to properly set up Python projects in a “modern” way, moving away from “scripting” and towards software engineering.
Therefore, I have created a compact GitHub template repository providing an easily customizable basis for writing sustainable code following good coding practices. It is most appropriate for people who have a beginner-level understanding of Python and can use Git/GitHub to clone their repo and pull and push changes.
modern_python includes:
- A directory structure appropriate for package installation and publishing via e.g. PyPI.
- Metadata files implementing FAIR practices (Readme, License, Code of Conduct, Changelog, Citation file).
- Pre-commit setup enforcing linting and unit testing.
- Minimal CI/CD (continuous integration and delivery) using GitHub Actions.
- A pre-formatted logger.
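To give an idea of what a “pre-formatted logger” can mean in practice, here is a minimal sketch built on Python's standard `logging` module. It is an illustration only and not necessarily the code shipped in the template: the names `get_logger`, `LOG_FORMAT`, and `DATE_FORMAT`, as well as the chosen message layout, are assumptions made for this example.

```python
import logging
import sys

# Hypothetical format strings, chosen for this example only.
LOG_FORMAT = "%(asctime)s | %(levelname)-8s | %(name)s | %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def get_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Return a logger that writes uniformly formatted messages to stderr."""
    logger = logging.getLogger(name)
    logger.setLevel(level)

    # Attach the handler only once, so repeated calls do not duplicate output.
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(logging.Formatter(LOG_FORMAT, datefmt=DATE_FORMAT))
        logger.addHandler(handler)

    # Keep messages from also being emitted by the root logger.
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = get_logger("my_package")  # "my_package" is a placeholder name
    log.info("Analysis started")
    log.warning("Input file contains ambiguous nucleotides")
```

Every module of a project can then call `get_logger(__name__)` and get consistently formatted, timestamped messages, instead of scattering `print` statements through the code.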
While there are many Python-based templates available on GitHub, this template specifically addresses life science research software, with a focus on following FAIR principles.
modern_python is available on GitHub and Zenodo and released to the public domain under the Unlicense.
Future directions
This template is only a small first step in creating templates and materials to educate researchers on “programming literacy”. I will try to keep it up to date with new developments in the FAIR Open Science community and Python ecosystem (e.g. the recent popularization of the uv package manager or the ruff formatter/linter) while still keeping it concise.
Do you see anything obvious missing? Feel free to fork the project and open a pull request! I would be happy to work on this together – improving the programming skills of life science students one small step at a time.
PS: This blog post and the described template were created without the use of AI.