Planet Python
Last update: November 19, 2024 01:43 PM UTC
November 19, 2024
Mike Driscoll
How to Debug Your Textual Application
Textual is a great Python package for creating a lightweight, powerful, text-based user interface. That means you can create a GUI in your terminal with Python without learning curses! But what happens when you encounter some problems that require debugging your application? A TUI takes over your terminal, which means you cannot see anything from Python’s `print()` statement.
Wait, what about your IDE? Can that help? Actually, no. When you run a TUI, you need a fully functional terminal to interact with it. PyCharm doesn’t work well with Textual. WingIDE doesn’t even have a terminal emulator. Visual Studio Code also doesn’t work out of the box, although you may be able to make it work with a custom JSON or YAML file. But what do you do if you can’t figure that out?
That is the crux of the problem and what you will learn about in this tutorial: How to debug Textual applications!
Getting Started
To get the most out of this tutorial, make sure you have installed Textual’s development tools by using the following command:
python -m pip install textual-dev --upgrade
Once you have the latest version of `textual-dev` installed, you may continue!
Debugging with Developer Mode
When you want to debug a Textual application, you need to open two terminal windows. On Microsoft Windows, you can open two PowerShell or two Command Prompt windows. In the first terminal, run this command:
textual console
The Textual console will listen for any Textual application running in developer mode. But first, you need some kind of application to test with. Open up your favorite Python IDE and create a new file called `hello_textual.py`. Then enter the following code into it:
from textual.app import App, ComposeResult
from textual.widgets import Button


class WelcomeButton(App):
    def compose(self) -> ComposeResult:
        yield Button("Exit")

    def on_button_pressed(self) -> None:
        self.mount(Button("Other"))


if __name__ == "__main__":
    app = WelcomeButton()
    app.run()
To run a Textual application, use the other terminal you opened earlier, the one that isn’t running the Textual console. Then run this command:
textual run --dev hello_textual.py
You will see the following in your terminal:
If you switch over to the other terminal, you will see a lot of output that looks something like this:
Now, if you want to test that you are reaching a part of your code in Textual, you can add a `print()` call to your `on_button_pressed()` method. You can also use `self.log.info()`, which you can read about in the Textual documentation.
Let’s update your code to include some logging:
from textual.app import App, ComposeResult
from textual.widgets import Button


class WelcomeButton(App):
    def compose(self) -> ComposeResult:
        yield Button("Exit")
        print("The compose() method was called!")

    def on_button_pressed(self) -> None:
        self.log.info("You pressed a button")
        self.mount(Button("Other"))


if __name__ == "__main__":
    app = WelcomeButton()
    app.run()
Now, when you run this code, you can check your Textual Console for output. The `print()` statement should be in the Console without you doing anything other than running the code. You must click the button to get the log statement in the Console.
Here is what the log output will look like in the Console:
And here is an example of what you get when you `print()` to the Console:
There’s not much difference here, eh? Either way, you get the information you need and if you need to print out Python objects, this can be a handy debugging tool.
If you find the output in the Console to be too verbose, you can use `-x` or `--exclude` to exclude log groups. Here’s an example:
textual console -x SYSTEM -x EVENT -x DEBUG -x INFO
In this version of the Textual Console, you are suppressing SYSTEM, EVENT, DEBUG, and INFO messages.
Launch your code from earlier and you will see that the output in your Console is greatly reduced:
Now, let’s learn how to use notification as a debugging tool.
Debugging with Notification
If you like using `print()` statements, then you will love that Textual’s `App` class provides a `notify()` method. You can call it anywhere in your application using `self.app.notify()`, along with a message. If you are in your `App` class, you can reduce the call to simply `self.notify()`.
Let’s take the example from earlier and update it to use the notify method instead:
from textual.app import App, ComposeResult
from textual.widgets import Button


class WelcomeButton(App):
    def compose(self) -> ComposeResult:
        yield Button("Exit")

    def on_button_pressed(self) -> None:
        self.mount(Button("Other"))
        self.notify("You pressed the button!")


if __name__ == "__main__":
    app = WelcomeButton()
    app.run()
The `notify()` method takes the following parameters:

- `message` – The message you want to display in the notification
- `title` – An optional title to add to the message
- `severity` – The message’s severity, which translates to a different color for the notification. You may use “information”, “error” or “warning”
- `timeout` – The timeout in seconds for how long to show the message
Try editing the notification to use more of these features. For example, you could update the code above to use this instead:
self.notify("You pressed the button!", title="Info Message", severity="error")
Textual’s `App` class also provides a `bell()` method you can call to play the system bell. You could add this to really get the user’s attention, assuming they have the system bell enabled on their computer.
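For example, here is a minimal sketch that rings the bell alongside a notification (the app and button labels here are made up for illustration):

from textual.app import App, ComposeResult
from textual.widgets import Button


class BellApp(App):
    def compose(self) -> ComposeResult:
        yield Button("Ring")

    def on_button_pressed(self) -> None:
        self.bell()  # plays the terminal bell, if the user has it enabled
        self.notify("Ding!", title="Bell", timeout=3)


if __name__ == "__main__":
    BellApp().run()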
Wrapping Up
Debugging your TUI application successfully is a skill. You need to know how to find errors, and Textual’s dev mode makes this easier. While it would be great if a Python IDE had a fully functional terminal built into it, that is a very niche need. So it’s great that Textual included the tooling you need to figure out your code.
Give these tips a try, and you’ll soon be able to debug your Textual applications easily!
The post How to Debug Your Textual Application appeared first on Mouse Vs Python.
November 19, 2024 01:09 PM UTC
Ned Batchelder
Loop targets
I posted a Python tidbit about how for loops can assign to other things than simple variables, and many people were surprised or even concerned:
params = {
"query": QUERY,
"page_size": 100,
}
# Get page=0, page=1, page=2, ...
for params["page"] in itertools.count():
data = requests.get(SEARCH_URL, params).json()
if not data["results"]:
break
...
This code makes successive GET requests to a URL, with a params dict as the data payload. Each request uses the same data, except the “page” item is 0, then 1, 2, and so on. It has the same effect as if we had written it:
for page_num in itertools.count():
params["page"] = page_num
data = requests.get(SEARCH_URL, params).json()
One reply asked if there was a new params dict in each iteration. No, loops in Python do not create a scope, and never make new variables. The loop target is assigned to exactly as if it were an assignment statement.
As a Python Discord helper once described it,
While loops are “if” on repeat. For loops are assignment on repeat.
A loop like `for <ANYTHING> in <ITER>:` will take successive values from `<ITER>` and do an assignment exactly as this statement would: `<ANYTHING> = <VAL>`. If the assignment statement is ok, then the for loop is ok.
We’re used to seeing for loops that do more than a simple assignment:
for i, thing in enumerate(things):
...
for x, y, z in zip(xs, ys, zs):
...
These work because Python can assign to a number of variables at once:
i, thing = 0, "hello"
x, y, z = 1, 2, 3
Assigning to a dict key (or an attribute, or a property setter, and so on) in a for loop is an example of Python having a few independent mechanisms that combine in uniform ways. We aren’t used to seeing exotic combinations, but you can reason through how they would behave, and you would be right.
You can assign to a dict key in an assignment statement, so you can assign to it in a for loop. You might decide it’s too unusual to use, but it is possible and it works.
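For instance, here is a small sketch where the loop target is an object attribute rather than a dict key:

class Point:
    pass


p = Point()

# Each iteration performs the assignment p.x = <next value>.
for p.x in range(3):
    print(p.x)  # prints 0, then 1, then 2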
November 19, 2024 10:40 AM UTC
November 18, 2024
PyCharm
JetBrains AI Assistant 2024.3 is here! A highlight of this release is the flexibility to choose your preferred chat model. Select between Google Gemini, OpenAI, or local models to tailor interactions for a more customized experience.
This update also brings advanced code completion for all major programming languages, improved context management, and the ability to generate inline prompts directly within the editor.
More control over your chat experience: Choose between Gemini, OpenAI, and local models
You can now select your preferred AI chat model, choosing from cloud model providers like Google Gemini and OpenAI, or connect to local models. This expanded selection allows you to customize the AI chat’s responses to your specific workflow, offering a more adaptable and personalized experience.
Google’s Gemini models now available
The lineup of LLMs used by JetBrains AI now includes Gemini 1.5 Pro 002 and Flash 002. These models are designed to deliver advanced reasoning capabilities and optimized performance for a wide range of tasks. The Pro version excels in complex applications, while Flash is tailored for high-volume, low-latency scenarios. Now, AI Assistant users can leverage the power of Gemini models alongside our in-house Mellum and OpenAI options.
Local model support via Ollama
In addition to cloud-based models, you can now connect the AI chat to local models available through Ollama. This is particularly useful for users who need more control over their AI models, offering enhanced privacy, flexibility, and the ability to run models on local hardware.
To add an Ollama model to the chat, you need to enable Ollama support in AI Assistant’s settings and configure the connection to your Ollama instance.
Improved context management
In this update, we’ve made context handling in AI Assistant more transparent and intuitive. A revamped UI lets you view and manage every element included as context, providing full visibility and control. The open file and any selected code within it are now automatically added to the context, and you can easily add or remove files as needed, customizing the context to fit your workflow. Additionally, you can attach project-wide instructions to guide AI Assistant’s responses throughout your codebase.
Cloud code completion with broader language support
JetBrains has released its own large language model (LLM), Mellum, specifically designed to enhance cloud-based code completion for developers. This new model, specialized for coding tasks, has expanded support for several new languages, including JavaScript, TypeScript, HTML, C#, C, C++, Go, PHP, and Ruby. Now, the code completion experience is unified across JetBrains IDEs, offering syntax highlighting for suggested code, the flexibility to accept suggestions token by token or line by line, and overall reduced latency.
Local code completion enhancements: Multi-line support for Python and contextual improvements
Local code completion has significantly improved, now offering multi-line suggestions for Python. Additionally, optimizations have been made across other programming languages. For Kotlin, retrieval-augmented generation (RAG) enables the model to pull information from multiple project files, ensuring the most relevant suggestions. The support for JavaScript, TypeScript, and CSS has also seen enhancements to their existing RAG functionality. Furthermore, local code completion has been introduced for HTML.
These improvements mean that suggestions appear faster across all languages, creating a more seamless coding experience. Best of all, local code completion is included for free in your IDE, allowing you to start utilizing these powerful features immediately.
Streamlined in-editor experience with inline AI prompts
The new inline AI prompt feature in AI Assistant introduces a direct way to enter your prompts right in the editor. Just start typing your request in natural language and the AI Assistant will recognize it and generate a suggestion. Inline AI prompts are context-aware, automatically including related files and symbols for more accurate code generation. This feature supports Java, Kotlin, Scala, Groovy, JavaScript, TypeScript, Python, JSON, YAML, PHP, Ruby, and Go file formats, and is available to all AI Assistant users.
We also improved the visibility of changes applied. There is now a purple mark in the gutter next to lines changed by AI Assistant, so you can easily see what has been updated.
Make multiple file-wide updates easily
AI Assistant now offers file-wide code generation, enabling streamlined edits across an entire file. This functionality allows for modifications across multiple code sections, including adding necessary imports, updating references, and defining missing declarations. Currently available for Java and Kotlin, it is triggered by the Generate Code action when no specific selection is made in the editor, offering a seamless experience for broad, file-wide adjustments.
Get instant answers about IDE features and settings in AI Chat
Say goodbye to searching through settings or documentation! With the new /docs command, you can now access documentation-based answers directly in the AI chat. Simply ask AI Assistant about a feature, and it will provide interactive step-by-step guidance.
AI-powered quick-fix for faster error resolution
When a JetBrains IDE inspection flags a problem – whether it’s a syntax error, missing import, or something else – it suggests a quick-fix directly within the editor. With the latest update, Fix with AI takes this a step further. This new capability uses AI context awareness to suggest fixes that are more precise and applicable to your specific coding context, making it faster and easier to resolve coding problems without any manual input.
Explore AI Assistant and share your feedback
Explore these updates and let AI Assistant streamline your development workflow even further. As always, we look forward to hearing your feedback. You can also tell us about your experience via the Share your feedback link in the AI Assistant tool window or by submitting feature requests or bug reports in YouTrack.
Happy developing!
November 18, 2024 07:41 PM UTC
Python Morsels
Python's pathlib module
Python's `pathlib` module is the tool to use for working with file paths. See the `pathlib` quick reference tables and examples below.
Table of contents
- A pathlib cheat sheet
- The `open` function accepts `Path` objects
- Why use a `pathlib.Path` instead of a string?
- The basics: constructing paths with `pathlib`
- Joining paths
- Current working directory
- Absolute paths
- Splitting up paths with `pathlib`
- Listing files in a directory
- Reading and writing a whole file
- Many common operations are even easier
- No need to worry about normalizing paths
- Built-in cross-platform compatibility
- A pathlib conversion cheat sheet
- What about things `pathlib` can't do?
- Should strings ever represent file paths?
- Use `pathlib` for readable cross-platform code
A pathlib cheat sheet
Below is a cheat sheet table of common `pathlib.Path` operations.
The variables used in the table are defined here:
>>> from pathlib import Path
>>> path = Path("/home/trey/proj/readme.md")
>>> relative = Path("readme.md")
>>> base = Path("/home/trey/proj")
>>> new = Path("/home/trey/proj/sub")
>>> target = path.with_suffix(".txt") # .md -> .txt
>>> pattern = "*.md"
>>> name = "sub/f.txt"
| Path-related task | pathlib approach | Example |
|---|---|---|
| Read all file contents | `path.read_text()` | `'Line 1\nLine 2\n'` |
| Write file contents | `path.write_text('new')` | Writes `new` to file |
| Get absolute file path | `relative.resolve()` | `Path('/home/trey/proj/readme.md')` |
| Get the filename | `path.name` | `'readme.md'` |
| Get parent directory | `path.parent` | `Path('/home/trey/proj')` |
| Get file extension | `path.suffix` | `'.md'` |
| Get suffix-free name | `path.stem` | `'readme'` |
| Ancestor-relative path | `path.relative_to(base)` | `Path('readme.md')` |
| Verify path is a file | `path.is_file()` | `True` |
| Make new directory | `new.mkdir()` | Makes new directory |
| Get current directory | `Path.cwd()` | `Path('/home/trey/proj')` |
| Get home directory | `Path.home()` | `Path('/home/trey')` |
| Get all ancestor paths | `path.parents` | `[Path('/home/trey/proj'), ...]` |
| List files/directories | `base.iterdir()` | `[Path('/home/trey/proj/readme.md'), ...]` |
| Find files by pattern | `base.glob(pattern)` | `[Path('/home/trey/proj/readme.md')]` |
| Find files recursively | `base.rglob(pattern)` | `[Path('/home/trey/proj/readme.md')]` |
| Join path parts | `base / name` | `Path('/home/trey/proj/sub/f.txt')` |
| Get file size (bytes) | `path.stat().st_size` | `14` |
| Walk the file tree | `base.walk()` | Iterable of `(path, subdirs, files)` |
| Rename file to new path | `path.rename(target)` | `Path` object for new path |
| Remove file | `path.unlink()` | Removes the file |
Note that `iterdir`, `glob`, `rglob`, and `walk` all return iterators. The examples above show lists for convenience.
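Here's a quick, runnable illustration of a few rows from the table above (the file and directory names are just placeholders):

from pathlib import Path

path = Path("proj") / "readme.md"    # join path parts with /
path.parent.mkdir(exist_ok=True)     # make the proj directory if needed
path.write_text("Line 1\nLine 2\n")  # write the whole file at once

print(path.name)    # readme.md
print(path.suffix)  # .md
print(path.read_text() == "Line 1\nLine 2\n")  # True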
The `open` function accepts `Path` objects
What does Python's `open` function …
Read the full article: https://www.pythonmorsels.com/pathlib-module/
November 18, 2024 05:00 PM UTC
Real Python
Interacting With Python
There are multiple ways of interacting with Python, and each can be useful for different scenarios. You can quickly explore functionality in Python’s interactive mode using the built-in Read-Eval-Print Loop (REPL), or you can write larger applications to a script file using an editor or Integrated Development Environment (IDE).
In this tutorial, you’ll learn how to:
- Use Python interactively by typing code directly into the interpreter
- Execute code contained in a script file from the command line
- Work within a Python Integrated Development Environment (IDE)
- Assess additional options, such as the Jupyter Notebook and online interpreters
Before working through this tutorial, make sure that you have a functioning Python installation at hand. Once you’re set up with that, it’s time to write some Python code!
Get Your Code: Click here to get the free sample code that you’ll use to learn about interacting with Python.
Take the Quiz: Test your knowledge with our interactive “Interacting With Python” quiz. You’ll receive a score upon completion to help you track your learning progress:
Interactive Quiz
Interacting With Python: In this quiz, you'll test your understanding of the different ways of interacting with Python. By working through this quiz, you'll revisit key concepts related to Python interaction in interactive mode using the REPL, through Python script files, and within IDEs and code editors.
Hello, World!
There’s a long-standing custom in computer programming that the first code written in a newly installed language is a short program that displays the text `Hello, World!` to the console.
In Python, running a “Hello, World!” program only takes a single line of code:
print("Hello, World!")
Here, `print()` will display the text Hello, World! in quotes to your screen. In this tutorial, you’ll explore several ways to execute this code.
Running Python in Interactive Mode
The quickest way to start interacting with Python is in a Read-Eval-Print Loop (REPL) environment. This means starting up the interpreter and typing commands to it directly.
When you interact with Python in this way, the interpreter will:
- Read the command you enter
- Evaluate and execute the command
- Print the output (if any) to the console
- Loop back and repeat the process
The interactive session continues like this until you instruct the interpreter to stop. Using Python in this interactive mode is a great way to test short snippets of Python code and get more familiar with the language.
To start a REPL session, open your computer’s terminal application and type `python` (or `python3` on some systems). If you’re unfamiliar with this application, then you can use your operating system’s search function to find it.
After pressing Enter, you should see a response from the Python interpreter similar to the one below:
Python 3.13.0 (main, Oct 14 2024, 10:34:31) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
If you’re not seeing the `>>>` prompt, then you’re not talking to the Python interpreter. This could be because Python is either not installed or not in the path of your terminal window session.
Note: If you need additional help to get to this point, then you can check out the How to Install Python on Your System: A Guide tutorial.
If you’re seeing the prompt, then you’re off and running! With these next steps, you’ll execute the statement that displays `"Hello, World!"` to the console:
- Ensure that Python displays the `>>>` prompt, and that you position your cursor after it.
- Type the command `print("Hello, World!")` exactly as shown.
- Press the Enter key.
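If you typed the command correctly, the interpreter will execute it and echo the output on the next line:

>>> print("Hello, World!")
Hello, World!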
Read the full article at https://realpython.com/interacting-with-python/ »
[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
November 18, 2024 02:00 PM UTC
Go Deh
There's the easy way...
Best seen on a larger than landscape phone
Someone blogged about a particular problem:
The solution they blogged used a sort, which meant it could not be O(n) in time, but the problem looked good, so I gave it some thought.
Sets! Set membership tests are O(1) in Python, and sets are good for looking things up.
What if, when looking at the input numbers one at a time, you also looked for other ints in the input that would extend the int you have to form a longer range? Keep track of the longest range so far, and if you remove ints from the pool as they form ranges, then when the pool is empty, you should know the longest range.
I added the printout of the longest range too.
My code
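(The author's actual code was embedded in the original post and isn't reproduced here; the following is a minimal sketch of the set-based approach described above, with function and variable names of my own choosing.)

def longest_range(ints):
    """Return the longest run of consecutive ints as a range, in O(n) time."""
    pool = set(ints)  # O(1) membership tests
    best = range(0)   # empty range to start
    while pool:
        start = end = pool.pop()
        # Grow the run downwards and upwards, consuming ints from the pool.
        while start - 1 in pool:
            start -= 1
            pool.remove(start)
        while end + 1 in pool:
            end += 1
            pool.remove(end)
        if end - start + 1 > len(best):
            best = range(start, end + 1)
    return best

print(longest_range([100, 4, 200, 1, 3, 2]))  # range(1, 5), i.e. 1..4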
Sample output
Another Algorithm
What if you kept and extended ranges until you amassed all ranges, then chose the longest? I need to keep the hash lookup; dict key lookup should also be O(1). What to look up? Look up ints that would extend a range!
If you have an existing (integer) range, say 1..3 inclusive of end points, then finding 0 would extend the range to 0..3, or finding one more than the range maximum, 4, would extend the original range to 1..4.
So if you have ranges, then they could be extended by finding rangemin - 1 or rangemax + 1. I call them extends.
If you do find that the next int from the input is also an extends value, then you need to find the range that it extends (by lookup) so you can modify that range. Use a dict to map extends to their range; checking if an int is in the extends dict keys should also take O(1) time.
I took that sketch of an algorithm and started to code. It took two evenings to finally get something that worked, and I had to work out several details that were trying. The main problem was: what about coalescing ranges? If you have ranges 1..2 and 4..5, what happens when you see a 3? The result is the single range 1..5. It took particular test cases and extensive debugging to work out that the extends2range mapping should map to potentially more than one range, and that you need to combine ranges if two of them are present for any extend value being hit.
So for 1..2 the extends being looked for are 0 and 3. For 4..5 the extends being looked for are 3, again, and 6. The extends2ranges data structure for just this should look like:
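(The author's embedded snippet isn't shown here; this is a plausible reconstruction from the description above, where `range(1, 3)` models the inclusive range 1..2.)

extends2ranges = {
    0: [range(1, 3)],               # finding 0 would extend 1..2 downwards
    3: [range(1, 3), range(4, 6)],  # finding 3 extends both, coalescing them into 1..5
    6: [range(4, 6)],               # finding 6 would extend 4..5 upwards
}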
The Code #2
Its Output
This second algorithm gives correct results but is harder to develop and explain. It's a testament to my stubbornness as I thought there was a solution there, and debugging was me flexing my skills to keep them honed.
END.
November 18, 2024 10:11 AM UTC
Python Software Foundation
Help power Python and join in the PSF year-end fundraiser & membership drive!
To build the future of Python and sustain the thriving community that its users deserve, we need your help. By backing the PSF, you’re investing in Python’s growth and health, and your contributions directly impact the language's future. Is your community, work, or hobby powered by Python? Join this year’s drive and power Python’s future with us by donating or becoming a Supporting Member today.
There are three ways to join in:
Your donations:
- Keep Python thriving
- Support CPython and PyPI progress
- Increase security across the Python ecosystem
- Bring the global Python community together
- Make our community more diverse and robust every year
Highlights from 2024:
- A record-making PyCon US - We produced the 21st PyCon US, in Pittsburgh, US, and online, and it was a huge success! For the first time post-2020, PyCon US 2024 sold out with over 2,500 in-person attendees.
- Advances in our Grants Program - 2024 has been a year of change and reflection for the Grants Program, starting with the addition of Marie Nordin to the grants administration team who has supported the PSF in launching several new grants initiatives. We set up Grants Program Office Hours, published a Grants Program Transparency Report for 2022 and 2024, invested in a third-party retrospective, launched a major refresh of all areas of our Grants program and updated our Grants Workgroup Charter. With more changes to come, we are thrilled to share that we awarded a record-breaking amount of grant funds in 2024!
- Empowering the Python community through Fiscal Sponsorship - We are proud to continue supporting our 20 fiscal sponsoree organizations with their initiatives and events all year round. The PSF provides 501(c)(3) tax-exempt status to fiscal sponsorees such as PyLadies and Pallets, and provides back office support so they can focus on their missions. Consider donating to your favorite PSF Fiscal Sponsoree and check out our Fiscal Sponsorees page to learn more about what each of these awesome organizations is all about!
- Connecting directly through Office Hours - The current PSF Board has decided to invest more in connecting and serving the global Python community by establishing a forum to have regular conversations. The board members of the PSF with the support of PSF staff are now holding monthly PSF Board Office Hours on the PSF Discord. The Office Hours are sessions where folks from the community can share with us how we can help your regional community, express perspectives, and provide feedback for the PSF.
- Paying more engineers to work directly on Python, PyPI, and security - We welcomed Petr Viktorin, Deputy Developer in Residence (DiR), and Serhiy Storchaka, Supporting DiR. It’s been exciting to begin to realize the full vision of the DiR program, with special thanks to Bloomberg for making it possible for us to bring Petr on board. The DiR team is taking an active role in shaping the development of the language, and with three people on the team each DiR can now also spend a percentage of their time on feature work aligned with their interests.
- Continuing to enhance Python’s security through Developers-in-Residence - Seth Larson, PSF Security Developer in Residence (DiR) had a busy year thanks to continued support from Alpha-Omega. Seth worked on a variety of projects including the creation of SBOMs for Source and Windows CPython artifacts, implementing build reproducibility for CPython source artifacts, and auditing and migrating Sigstore, to name just a few. Check out Seth's blog to keep up to date with his work. Mike Fiedler, PyPI Safety & Security Engineer, also worked on a variety of projects such as two-factor authentication for all users on PyPI, an audit of PyPI, made significant progress on malware response and reporting, collaborated on the PSF’s submission for the Cybersecurity and Infrastructure Security Agency (CISA)’s Request for Information (RFI), and more! Thanks to AWS and Georgetown for making Mike’s PyPI security accomplishments possible. Stay up to date with Mike's work on the PyPI blog.
- New PSF Staff dedicated to critical infrastructure - We established the PyPI Support Specialist role, filled by Maria Ashna. Over the past 23 years, PyPI has seen essentially exponential growth in traffic and users, relying for the most part on volunteers to support it. The load far outstripped volunteer and prior staff capacity, so we are very excited to have Maria on board. We also filled our Infrastructure Engineer role, welcoming Jacob Coffee to the team, to ensure PSF-maintained systems and services are running smoothly.
We appreciate you and we’re so excited to see where we can go together in the year to come!
November 18, 2024 09:56 AM UTC
Python Bytes
#410 Entering the Django core
Topics covered in this episode:

- Thoughts on Django’s Core (https://buttondown.com/carlton/archive/thoughts-on-djangos-core/?featured_on=pythonbytes)
- futurepool (https://pypi.org/project/futurepool/?featured_on=pythonbytes)
- Don't return named tuples in new APIs (https://snarky.ca/dont-use-named-tuples-in-new-apis/?featured_on=pythonbytes)
- Ziglang: Migrating from AWS to Self-Hosting (https://ziglang.org/news/migrate-to-self-hosting/?featured_on=pythonbytes)
- Extras
- Joke

Watch on YouTube: https://www.youtube.com/watch?v=j-q31u9G3Ds

About the show

Sponsored by us! Support our work through:

- Our courses at Talk Python Training (https://training.talkpython.fm/?featured_on=pythonbytes)
- The Complete pytest Course (https://courses.pythontest.com/p/the-complete-pytest-course?featured_on=pythonbytes)
- Patreon Supporters (https://www.patreon.com/pythonbytes)

Connect with the hosts

- Michael: @mkennedy@fosstodon.org / @mkennedy.codes
- Brian: @brianokken@fosstodon.org / @brianokken.bsky.social
- Show: @pythonbytes@fosstodon.org / @pythonbytes.bsky.social

Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Monday at 10am PT. Older video versions available there too.

Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form, add your name and email to our friends of the show list (https://pythonbytes.fm/friends-of-the-show); we'll never share it.

Brian #1: Thoughts on Django’s Core

- Carlton Gibson
- Great discussion on:
  - Django and core vs. plugins
  - Sustainability with limited people
  - Keeping core small
  - The release cycle
  - Embracing plugins vs. endorsing plugins

Michael #2: futurepool

- via Pat Decker
- Takes the concept of multiprocessing Pool to the async/await world.
- Create a pool then delegate the work:

    async with FuturePool(2) as fp:
        result = await fp.map(async_pool_fn, range(10))

- I would LOVE to see something like this in a broader background asyncio worker pool concept.
- But that concept doesn’t exist in asyncio in Python and that’s a failing of the framework IMO.

Brian #3: Don't return named tuples in new APIs

- Brett Cannon
- First off, I’m grateful for any post that talks about APIs where the API is a module, class, or package API and not a Web/REST API. The term API existed long before the internet.
- “e.g., get_mouse_position() very likely has a two-item tuple of X and Y coordinates of the screen”
- “it actually makes your API more complex for both you and your users to use. For you, it doubles the data access API surface for your return type as you have to now support index-based and attribute-based data access forever (or until you choose to break your users and change your return type so it doesn't support both approaches)”
- “… you probably don't want people doing with your return type, like slicing, iterating over all the items …”
- Alternatives: class, dataclass, dictionary, TypedDict, SimpleNamespace
- “My key point in all of this is to prefer readability and ergonomics over brevity in your code. That means avoiding named tuples except where you are expanding to tweaking an existing API where the named tuple improves over the plain tuple that's already being used.”

Michael #4: Ziglang: Migrating from AWS to Self-Hosting

- The Rust Foundation, for example, reports that they spent $404,400 on infrastructure costs in 2023.
- Zig lang has decided to use a single big cloud machine + mirrors.

Extras

Brian:

- Changing the Python Test community
  - Was started to answer questions for Test & Code listeners years ago.
  - Primarily pytest questions
  - Used to be Slack. Then moved to Podia forum.
  - Now I’m trying to work out a Discord solution that is both sustainable and usable.

Michael:

- PWang Bsky essay (https://bsky.app/profile/wang.social/post/3lb346uyzdc2r?featured_on=pythonbytes)
- Building A Business From Python Expertise - Michael Kennedy on Work Item Podcast (https://theworkitem.com/blog/building-a-business-from-python-expertise-michael-kennedy/?featured_on=pythonbytes)
- Subscribe to package releases: just put .atom on the end of their releases URL, for example:
  - github.com/mikeckennedy/jinja_partials/releases ← add .atom for RSS
- pytest-bdd 8.0.0 (https://pypi.org/project/pytest-bdd/8.0.0/#data) was just released via Jamie Thomson
  - The big feature (in Jamie’s opinion) is the addition of data tables: https://github.com/pytest-dev/pytest-bdd/blob/master/CHANGES.rst#800---2024-11-14

Joke: Breaking: JavaScript Developer Commits to Framework for Record-Breaking 3 Weeks (https://devhumor.com/media/breaking-javascript-developer-commits-to-framework-for-record-breaking-3-weeks?featured_on=pythonbytes)
November 18, 2024 08:00 AM UTC
James Bennett
Introducing DjangoVer
Version numbering is hard, and there are lots of popular schemes out there for how to do it. Today I want to talk about a system I’ve settled on for my own Django-related packages, and which I’m calling “DjangoVer”, because it ties the version number of a Django-related package to the latest Django version that package supports.
But one quick note to start with: this is not really “introducing” the idea of DjangoVer, because I know I’ve used the name a few times already in other places. I’m also not the person who invented this, and I don’t know for certain who did — I’ve seen several packages which appear to follow some form of DjangoVer and took inspiration from them in defining my own take on it.
Django’s version scheme: an overview
The basic idea of DjangoVer is that the version number of a Django-related package should tell you which version of Django you can use it with. Which probably doesn’t help much if you don’t know how Django releases are numbered, so let’s start there. In brief:
- Django issues a “feature release” — one which introduces new features — roughly once every eight months. The current feature release series of Django is 5.1.
- Django issues “bugfix releases” — which fix bugs in one or more feature releases — roughly once each month. As I write this, the latest bugfix release for the 5.1 feature release series is 5.1.3 (along with Django 5.0.9 for the 5.0 feature release series, and Django 4.2.16 for the 4.2 feature release series).
- The version number scheme is `MAJOR.FEATURE.BUGFIX`, where `MAJOR`, `FEATURE`, and `BUGFIX` are integers.
- The `FEATURE` component starts at `0`, then increments to `1`, then to `2`, then `MAJOR` is incremented and `FEATURE` goes back to `0`. `BUGFIX` starts at `0` with each new feature release, and increments for the bugfix releases for that feature release.
- Every feature release whose `FEATURE` component is `2` is a long-term support (“LTS”) release.
This has been in effect since Django 2.0 was released, and the feature releases have been: 2.0, 2.1, 2.2 (LTS); 3.0, 3.1, 3.2 (LTS); 4.0, 4.1, 4.2 (LTS); 5.0, 5.1. Django 5.2 (LTS) is expected in April 2025, and then eight months later (if nothing is changed) will come Django 6.0.
I’ll talk more about SemVer in a bit, but it’s worth being crystal clear that Django does not follow Semantic Versioning, and the `MAJOR` number is not a signal about API compatibility. Instead, API compatibility runs LTS-to-LTS, with a simple principle: if your code runs on a Django LTS release and raises no deprecation warnings, it will run unmodified on the next LTS release. So, for example, if you have an application that runs without deprecation warnings on Django 4.2 LTS, it will run unmodified on Django 5.2 LTS (though at that point it might begin raising new deprecation warnings, and you’d need to clear them before it would be safe to upgrade any further).
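One practical way to lean on that principle (a sketch, not an official Django recipe; the exact invocation varies by project) is to run your test suite with deprecation warnings escalated to errors:

import warnings

# Fail loudly on any deprecation so code stays clean before the next LTS.
# Equivalently, from the command line:
#     python -W error::DeprecationWarning manage.py test
warnings.simplefilter("error", DeprecationWarning)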
DjangoVer, defined
In DjangoVer, a Django-related package has a version number of the form `DJANGO_MAJOR.DJANGO_FEATURE.PACKAGE_VERSION`, where `DJANGO_MAJOR` and `DJANGO_FEATURE` indicate the most recent feature release series of Django supported by the package, and `PACKAGE_VERSION` begins at zero and increments by one with each release of the package supporting that feature release of Django.
Since the version number only indicates the newest Django feature release supported, a package using DjangoVer should also use Python package classifiers to indicate the full range of its Django support (such as `Framework :: Django :: 5.1` to indicate support for Django 5.1 — see examples on PyPI).
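For illustration, that metadata might look roughly like this in a setuptools-style `setup()` call (the package name is a placeholder, and your build tooling may differ):

from setuptools import setup

setup(
    name="django-example",  # hypothetical package name
    version="5.1.0",        # DjangoVer: newest supported Django feature release is 5.1
    classifiers=[
        "Framework :: Django",
        "Framework :: Django :: 4.2",
        "Framework :: Django :: 5.0",
        "Framework :: Django :: 5.1",
    ],
)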
But while Django takes care to maintain compatibility from one LTS to the next, I do not think DjangoVer packages need to do that; they can use the simpler approach of issuing deprecation warnings for two releases, and then making the breaking change. One of the stated reasons for Django’s LTS-to-LTS compatibility policy is to help third-party packages have an easier time supporting Django releases that people are actually likely to use; otherwise, Django itself generally just follows the “deprecate for two releases, then remove it” pattern. No matter what compatibility policy is chosen, however, it should be documented clearly, since DjangoVer explicitly does not attempt to provide any information about API stability/compatibility in the version number.
That’s a bit wordy, so let’s try an example:
- If you started a new Django-related package today, you’d (hopefully) support the most recent Django feature release, which is 5.1. So the DjangoVer version of your package should be `5.1.0`.
- As long as Django 5.1 is the newest Django feature release you support, you’d increment the third digit of the version number. As you add features or fix bugs you’d release `5.1.1`, `5.1.2`, etc.
- When Django 5.2 comes out next year, you’d (hopefully) add support for it. When you do, you’d set your package’s version number to `5.2.0`. This would be followed by `5.2.1`, `5.2.2`, etc., and then eight months later by `6.0.0` to support Django 6.0.
- If version `5.1.0` of your package supports Django 5.1, 5.0, and 4.2 (the feature releases receiving upstream support from Django at the time of the 5.1 release), it should indicate that by including the `Framework :: Django`, `Framework :: Django :: 4.2`, `Framework :: Django :: 5.0`, and `Framework :: Django :: 5.1` classifiers in its package metadata.
Why another version system?
Some of you probably didn’t even read this far before rushing to instantly post the XKCD “Standards” comic as a reply. Thank you in advance for letting the rest of us know we don’t need to bother listening to or engaging with you. For everyone else: here’s why I think in this case adding yet another “standard” is actually a good idea.
The elephant in the room here is Semantic Versioning (“SemVer”). Others have written about some of the problems with SemVer, but I’ll add my own two cents here: “compatibility” is far too complex and nebulous a concept to be usefully encoded in a simple value like a version number. And if you want my really cynical take, the actual point of SemVer in practice is to protect developers of software from users, by providing endless loopholes and ways to say “sure, this change broke your code, but that doesn’t count as a breaking change”. It’ll turn out that the developer had a different interpretation of the documentation than you did, or that the API contract was “underspecified” and now has been “clarified”, or they’ll just throw their hands up, yell “Hyrum’s Law” and say they can’t possibly be expected to preserve that behavior.
A lot of this is rooted in the belief that changes, and especially breaking changes, are inherently bad and shameful, and that if you introduce them you’re a bad developer who should be ashamed. Which is, frankly, bullshit. Useful software almost always evolves and changes over time, and it’s unrealistic to expect it not to. I wrote about this a few years back in the context of the Python 2/3 transition:
Though there is one thing I think gets overlooked a lot: usually, the anti-Python-3 argument is presented as the desire of a particular company, or project, or person, to stand still and buck the trend of the world to be ever-changing.
But really they’re asking for the inverse of that. Rather than being a fixed point in a constantly-changing world, what they really seem to want is to be the only ones still moving in a world that has become static around them. If only the Python team would stop fiddling with the language! If only the maintainers of popular frameworks would stop evolving their APIs! Then we could finally stop worrying about our dependencies and get on with our real work! Of course, it’s logically impossible for each one of those entities to be the sole mover in a static world, but pointing that out doesn’t always go well.
But that’s a rant for another day and another full post all its own. For now it’s enough to just say I don’t believe SemVer can ever deliver on what it promises. So where does that leave us?
Well, if the version number can’t tell you whether it’s safe to upgrade from one version to another, perhaps it can still tell you something useful. And for me, when I’m evaluating a piece of third-party software for possible use, one of the most important things I want to know is: is someone actually maintaining this? There are lots of potential signals to look for, but some version schemes — like CalVer — can encode this into the version number. Want to know if the software’s maintained? With CalVer you can guess a package’s maintenance status, with pretty good accuracy, from a glance at the version number.
Over the course of this year I’ve been transitioning all my personal non-Django packages to CalVer for precisely this reason. Compatibility, again, is something I think can’t possibly be encoded into a version number, but “someone’s keeping an eye on this” can be. Even if I’m not adding features to something, Python itself does a new version every year and I’ll push a new release to explicitly mark compatibility (as I did recently for the release of Python 3.13). That’ll bump the version number and let anyone who takes a quick glance at it know I’m still there and paying attention to the package.
For packages meant to be used with Django, though, the version number can usefully encode another piece of information: not just “is someone maintaining this”, but “can I use this with my Django installation”. And that is what DjangoVer is about: telling you at a glance the maintenance and Django compatibility status of a package.
DjangoVer in practice
All of my own personal Django-related packages are now using DjangoVer, and say so in their documentation. If I start any new Django-related projects they’ll do the same thing.
A quick scroll through PyPI turns up other packages doing something that looks similar; django-cockroachdb and django-snowflake, for example, versioned their Django 5.1 packages as “5.1”, and explicitly say in their READMEs to install a package version corresponding to the Django version you use (they also have a maintainer in common, who I suspect of having been an early inventor of what I’m now calling “DjangoVer”).
If you maintain a Django-related package, I’d encourage you to at least think about adopting some form of DjangoVer, too. I won’t say it’s the best, period, because something better could always come along, but in terms of information that can be usefully encoded into the version number, I think DjangoVer is the best option I’ve seen for Django-related packages.
November 18, 2024 02:04 AM UTC
Armin Ronacher
Playground Wisdom: Threads Beat Async/Await
It's been a few years since I wrote about my challenges with async/await-based systems and how they just seem to not support back pressure well. A few years later, I do not think that this problem has subsided much, but my thinking and understanding have perhaps evolved a bit. I'm now convinced that async/await is, in fact, a bad abstraction for most languages, and we should be aiming for something better instead and that I believe to be thread.
In this post, I'm also going to rehash many arguments from very clever people that came before me. Nothing here is new; I just hope to bring it to a new group of readers. In particular, you should really consider these highly influential pieces:
- Bob Nystrom's What Color is Your Function post, which makes a very strong case that having two types of functions, which are only compatible in one direction, causes problems.
- Ron Pressler's Please stop polluting our imperative languages with pure concepts which I think is probably the single most important talk on that topic.
- Nathaniel J. Smith's Notes on structured concurrency, or: Go statement considered harmful which does a really good job laying out the motivation for structured concurrency.
Your Child Loves Actor Frameworks
As programmers, we are so used to how things work that we make some implicit assumptions that really cloud our ability to think freely. Let me present you with a piece of code that demonstrates this:
def move_mouse():
while mouse.x < 200:
mouse.x += 5
sleep(10)
def move_cat():
while cat.x < 200:
cat.x += 10
sleep(10)
move_mouse()
move_cat()
Read that code and then answer this question: do the mouse and cat move at the same time, or one after another? I guarantee you that 10 out of 10 programmers will correctly state that they move one after another. It makes sense because we know Python and the concept of threads, scheduling and whatnot. But if you speak to a group of children familiar with Scratch, they are likely to conclude that mouse and cat move simultaneously.
The reason is that if you are exposed to programming via Scratch you are exposed to a primitive form of actor programming. The cat and the mouse are both actors. In fact, the UI makes this pretty damn clear, just that the actors are called “sprites”. You attach logic to a sprite on the screen and all these pieces of logic run at the same time. Mind-blowing. You can even send messages from sprite to sprite.
The reason I want you to think about this for a moment is that I think this is rather profound. Scratch is a very, very simple system and it's intended to teach programming to young kids. Yet the model it promotes is an actor system! If you were to foray into programming via a traditional book on Python, C# or some other language, it's quite likely that you will only learn about threads at the very end. Not just that, it will likely make it sound really complex and scary. Worse, you will probably only learn about actor patterns in some advanced book that will bombard you with all the complexities of large scale applications.
There is something else though you should keep in mind: Scratch will not talk about threads, it will not talk about monads, it will not talk about async/await, it will not talk about schedulers. As far as you are concerned as a programmer, it's an imperative (though colorful and visual) language with some basic “syntax” support for message passing. Concurrency comes natural. A child can program it. It's not something to be afraid of.
Imperative Programming Is Not Inferior
The second thing I want you to take away is that imperative languages are not inferior to functional ones.
While probably most of us are using imperative programming languages to solve problems, I think we all have been exposed to the notion that it's inferior and not particularly pure. There is this world of functional programming, with monads and other things. This world has these nice things involving composition, logic and maths and fancy looking theorems. If you program in that, you're almost transcending to a higher plane and looking down on the folks who are stitching together if statements and for loops, making side effects everywhere, and doing highly inappropriate things with IO.
Okay, maybe it's not quite as bad, but I don't think I'm completely wrong with those vibes. And look, I get it. I feel happy chaining together lambdas in Rust and JavaScript. But we should also be aware that these constructs are, in many languages, bolted on. Go, for instance, gets away without most of this, and that does not make it an inferior language!
So what you should keep in mind here is that there are different paradigms, and mentally you should try to stop thinking for a moment that functional programming has all its stuff figured out, and imperative programming does not.
Instead, I want to talk about how functional languages and imperative languages are dealing with “waiting”.
The first thing I want to come back to is the example from above. Both of the functions (for the cat and the mouse) can be seen as separate threads of execution. When the code calls sleep(10) there's clearly an expectation by the programmer that the computer will temporarily pause the execution and continue later. I don't want to bore you with monads, so as my “functional” programming language I will use JavaScript and promises. I think that's an abstraction that most readers will be sufficiently familiar with:
function moveMouseBlocking() {
while (mouse.x < 200) {
mouse.x += 5;
sleep(10); // a blocking sleep
}
}
function moveMouseAsync() {
return new Promise((resolve) => {
function iterate() {
if (mouse.x < 200) {
mouse.x += 5;
sleep(10).then(iterate); // non blocking sleep
} else {
resolve();
}
}
iterate();
});
}
You can immediately see a challenge here: it's very hard to translate the blocking example into a non-blocking one, because all of a sudden we need to find a way to express our loop (or really any control flow). We need to manually decompose it into a form of recursive function calls, and we need the help of a scheduler and executor here to do the waiting.
This style obviously eventually became annoying enough to deal with that async/await was introduced to mostly restore the sanity of the old code. So it now can look more like this:
async function moveMouseAsync() {
while (mouse.x < 200) {
mouse.x += 5;
await sleep(10);
}
}
Behind the scenes though, nothing has really changed, and in particular, when you call that function, you just get an object that encompasses the “composition of the computation”. That object is a promise which will eventually hold the resulting value. In fact, in some languages like C#, the compiler will really just transpile this into chained function calls. With the promise in hand, you can await the result, or register a callback with then which gets invoked if this thing ever runs to completion.
For a programmer, I think async/await is clearly understood as some sort of neat abstraction — an abstraction over promises and callbacks. However, strictly speaking, it's just worse than where we started out, because in terms of expressiveness we have lost an important affordance: we cannot freely suspend.
In the original blocking code, when we invoked sleep we suspended for 10 milliseconds implicitly; we cannot do the same with the async call. Here we have to “await” the sleep operation. This is the crucial aspect of why we're having these “colored functions”. Only an async function can call another async function, as you cannot await in a sync function.
Halting Problems
The above example shows another problem that async/await causes: what if we never resolve? A normal function call eventually returns, the stack unwinds, and we're ready to receive the result. In an async world, someone has to call resolve at the very end. What if that is never called? Now in theory, that does not seem all that different from someone calling sleep() with a large number to suspend for a very long time, or waiting on a pipe that never gets data sent into. But it is different! In one case, we keep the call stack and everything that relates to it alive; in another case, we just have a promise and are waiting for independent garbage collection with everything already unwound.
Contract-wise, there is absolutely nothing that says one has to call resolve. As we know from theory, the halting problem is undecidable, so it's actually impossible to know if someone will call resolve or not.
That sounds pedantic, but it's very important because promises/futures and async/await are making something strictly worse than not having them. Let's consider a JavaScript promise to be the most canonical example of what this looks like. A promise is created by an anonymous function, that is invoked to eventually call resolve. Take this example:
let neverSettle = new Promise((resolve) => {
// this function ends, but we never called resolve
});
Let me clarify first that this is not a JavaScript specific problem, but it's nice to show it this way. This is a completely legal thing! It's a promise, that never resolves. That is not a bug! The anonymous function in the promise itself will return, the stack will unwind, and we are left with a “pending” promise that will eventually get garbage collected. That is a bit of a problem because since it will never resolve, you can also never await it.
Think of the following example, which demonstrates this problem a bit. In practice you might want to reduce how many things can work at once, so let's imagine a system that can handle up to 10 things that run concurrently. So we might want to use a semaphore to give out 10 tokens so up to 10 things can run at once; otherwise, it applies back pressure. So the code looks like this:
const semaphore = new Semaphore(10);
async function execute(f) {
let token = await semaphore.acquire();
try {
await f();
} finally {
await semaphore.release(token);
}
}
But now we have a problem. What if the function passed to the execute function returns neverSettle? Well, clearly we will never release the semaphore token. This is strictly worse compared to blocking functions! The closest equivalent would be a stupid function that calls a very long running sleep. But it's different! In one case, we keep the call stack and everything that relates to it alive; in the other case, we just have a promise that will eventually get garbage collected, and we will never see it again. In the promise case, we have effectively decided that the stack is not useful.
There are ways to fix this, like making promise finalization available so we can get informed if a promise gets garbage collected etc. However I want to point out that as per contract, what this promise is doing is completely acceptable and we have just caused a new problem, one that we did not have before.
And if you think Python does not have that problem, it does too. Just await Future() and you will be waiting until the heat death of the universe (or, really, until you shut down your interpreter).
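To make that concrete, here is a minimal asyncio sketch of both problems at once; the names execute and never_settle are illustrative, not from any real codebase:

import asyncio

semaphore = asyncio.Semaphore(10)

async def execute(f):
    async with semaphore:   # take one of the 10 tokens
        await f()           # if f never settles, the token is never returned

async def never_settle():
    await asyncio.Future()  # a future that nobody will ever resolve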
The promise that sits there unresolved has no call stack. And that problem comes back in other ways, even when you use promises correctly. Because decomposed functions call each other through the scheduler, you now need extra affordances to stitch these async calls back together into full call stacks. This all creates problems that did not exist before. Call stacks are really, really important: they help with debugging and are also crucial for profiling.
Blocking is an Abstraction
Okay, so we know there is at least some challenge with the promise model. What other abstractions are there? I will make the argument that a function being able to “suspend” a thread of execution is a bloody great capability and abstraction. Think about it for a moment: no matter where I am, I can say I need to wait for something, and continue later where I left off. This is particularly crucial for applying back pressure if you decide you need it later. The biggest footgun in Python asyncio remains that write is non-blocking. That function will stay problematic forever, and you need to follow up with await s.drain() to avoid buffer bloat.
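As a minimal sketch of that follow-up pattern, assuming an already-connected asyncio stream:

import asyncio

async def send(writer: asyncio.StreamWriter, payload: bytes) -> None:
    writer.write(payload)  # never blocks: it only appends to an internal buffer
    await writer.drain()   # suspends until the buffer drains, applying back pressure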
In particular, it's an important abstraction because in the real world we are constantly faced with things that are in fact not async, and some of the things we think won't block will in fact block. Python's designers, for instance, did not think write should be able to block. I want to give you a colorful example of this. Why is the following code blocking, and which part blocks?
import brotli  # third-party compression library

def decode_object(idx):
    # indexes and buffer are assumed module-level globals
    header = indexes[idx]
    object_buf = buffer[header.start:header.start + header.size]
    return brotli.decompress(object_buf)
It's a bit of a trick question, but only a bit. The reason it's blocking is that memory access itself can block! You might not think of it this way, but there are many reasons why merely touching a memory region can take time. The most obvious one is memory-mapped files: if you touch a page that hasn't been loaded yet, the operating system has to shovel it into memory before returning to you. There is no “await touching this memory” expression, because if there were, we would have to await everywhere. That might sound petty, but blocking memory reads were at the source of a series of incidents at Sentry [1].
The trade-off that async/await makes today is the idea that not everything needs to block or suspend. Reality, however, has shown me that many more things really do want to suspend, and if a random memory access is a case for suspending, then is the abstraction worth anything?
So maybe allowing any function call to block and suspend really was the right abstraction to begin with.
But then we need to talk about spawning threads next, because a single thread is not worth much. The one affordance that the async/await system gives you, which you don't otherwise have, is telling two things to run concurrently. You get that by starting the async operation and deferring the await until later. This is where I have to concede that async/await has something going for it: it moves the reality of concurrent execution right into the language. The reason concurrency comes so naturally to a Scratch programmer is that it's right there, and async/await serves a very similar purpose.
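In asyncio terms, that affordance looks roughly like this: start two operations first, and defer the awaiting until later.

import asyncio

async def fetch(name):
    await asyncio.sleep(0.1)  # stand-in for real I/O
    return name

async def main():
    t1 = asyncio.create_task(fetch("a"))  # starts immediately
    t2 = asyncio.create_task(fetch("b"))  # runs concurrently with t1
    print(await t1, await t2)             # the awaiting happens here

asyncio.run(main())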
In a traditional imperative language based on threads, the act of spawning a thread is usually hidden behind an (often convoluted) standard library function. More annoyingly, threads very much feel bolted on and inadequate for even the most basic of operations. Not only do we want to spawn threads, we also want to join on them and send values (including errors!) across thread boundaries. We want to wait for a task to be done, or for keyboard input, or for messages being passed, and so on.
Classic Threading
So let's focus on threads for a second. As said before, what we are looking for is the ability for any function to yield or suspend. That's exactly what threads allow us to do!
When I talk about “threads” here, I'm not necessarily referring to a specific kind of thread implementation. Think of the promise example from above for a moment: we had the concept of “sleeping”, but we never said how that is implemented. There is clearly some underlying scheduler that enables it, but how that takes place is outside the scope of the language. Threads can be like that. They could be real OS threads, or they could be virtual, implemented with fibers or coroutines. At the end of the day, we don't necessarily have to care about it as developers if the language gets it right.
The reason this matters is that when I talk about “suspending” or “continuing somewhere else,” the thought of coroutines and fibers immediately comes to mind. That's because many languages that support them give you those capabilities. But it's good to step back for a second and think about the general affordances we want, not how they are implemented.
We need a way to say: run this concurrently, but don't wait for it to return; we want to wait later (or never!). In other words, the equivalent of calling an async function without awaiting it: scheduling a function call. And that is, in essence, just what spawning a thread is. If we think about Scratch: one of the reasons concurrency comes naturally there is that it's really well integrated, a core affordance of the language. There is a real programming language that works very much the same way: Go, with its goroutines. There is syntax for it!
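In Python, the threaded version of “schedule a function call, don't wait for it” is a short sketch:

import threading

def work(n):
    print("working on", n)

t = threading.Thread(target=work, args=(42,))
t.start()  # runs concurrently; we are not waiting here
# ... do other things ...
t.join()   # wait later -- or never, if we don't care about the result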
So now we can spawn, and that thing runs. But we still have more problems to solve: synchronization, waiting, message passing, and all that jazz. Even Scratch has answers to those! So clearly there is something else missing to make this work. And what does that spawn call even return?
A Detour: What is Async Even
There is an irony in async/await, and that irony is that it exists in multiple languages, looks completely the same on the surface, yet works completely differently under the hood. Not only that, the origin stories of async/await in different languages are not even the same.
I mentioned earlier that code that can arbitrarily block is an abstraction of sorts. For many applications, that abstraction really only makes sense if the CPU time spent while you're blocked can be used in other useful ways: on the one hand because the computer would be pretty bored if it only did things in sequence, on the other because we might need things to run in parallel. At times, as programmers, we need two things to make progress simultaneously before we can continue. Enter creating more threads. But if threads are so great, why all the talk about coroutines and promises that underpins so much of async/await in different languages?
I think this is the point where the story becomes confusing quickly. For instance, JavaScript has entirely different challenges from Python, C#, or Rust. Yet somehow all those languages ended up with a form of async/await.
Let's start with JavaScript. JavaScript is a single-threaded language where a function scope cannot yield. There is no affordance in the language to do that, and threads do not exist. So before async/await, the best you could do was various forms of callback hell. The first iteration of improving that experience was adding promises; async/await only became sugar for them afterward. JavaScript did not have much choice here: promises were the only thing that could be accomplished without language changes, and async/await is something that can be implemented as a transpilation step. So really: there are no threads in JavaScript. But an interesting thing happens here: JavaScript, at the language level, has the concept of concurrency. If you call setTimeout, you tell the runtime to schedule a function to be called later. This is crucial! In particular, it also means that a promise, once created, is scheduled automatically. Even if you forget about it, it will run!
Python, on the other hand, had a completely different origin story. In the days before async/await, Python already had threads — real, operating-system-level threads. What it did not have was the ability for multiple of those threads to run in parallel. The reason for this is, of course, the GIL (Global Interpreter Lock). However, that “just” prevents scaling beyond one core, so let's ignore it for a second. Because Python had threads, people also experimented early on with implementing virtual threads in Python. Back in the day (and to some extent today), the cost of an OS-level thread was pretty high, so virtual threads were seen as a fast way to spawn more of these concurrent things. There were two ways in which Python got virtual threads. One was the Stackless Python project, an alternative implementation of Python (really, a large set of patches for CPython) that implemented what's called a “stackless VM” (basically a VM that does not maintain a C stack). In short, what that enabled was what Stackless called “tasklets”: functions that could be suspended and resumed. Stackless did not have a bright future, because its stackless nature meant that you could not have interleaved Python -> C -> Python calls and still suspend with them on the stack.
There was a second attempt in Python called “greenlet”, which implemented coroutines in a custom extension module. It is pretty gnarly in its implementation, but it does allow for cooperative multitasking. However, like Stackless, it did not win out. Instead, what actually happened is that the generator system Python had had for years was gradually upgraded into a coroutine system with syntax support, and the async system was built on top of that.
One of the consequences of this is that suspending from a coroutine requires syntax support. This means you cannot implement a function like sleep that, when called, simply yields to a scheduler; you need to await it (or, in earlier times, use yield from). So we ended up with async/await because of how coroutines work in Python under the hood. The motivation was that knowing when something suspends was seen as a positive.
One interesting consequence of the Python coroutine model is that, at least at the coroutine level, it can transcend OS-level threads: I could create a coroutine on one thread, ship it off to another, and continue it there. In practice that does not work, because once a coroutine is hooked up with the IO system, it can no longer travel to an event loop on another thread. But you can already see that Python fundamentally does something quite different from JavaScript. Coroutines can travel between threads, at least in theory; there are threads; there is syntax to yield. A coroutine in Python also starts out not running, unlike in JavaScript where it's effectively always scheduled. This is partly because the scheduler in Python can be swapped out, and there are competing, incompatible implementations.
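A small sketch of that difference in asyncio terms:

import asyncio

async def greet():
    print("running")

async def main():
    coro = greet()                    # nothing runs yet: the coroutine starts cold
    task = asyncio.create_task(coro)  # only now is it scheduled
    await task

asyncio.run(main())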
Lastly, let's talk about C#. Here the origin story is once again entirely different. C# has real threads. Not only does it have real threads, it also has per-object locks and absolutely no problem with multiple threads running in parallel. But that does not mean it has no other issues. The reality is that threads alone are just not enough. You often need to synchronize and communicate between threads, and sometimes you just need to wait, for instance for user input, while still getting other work done. So over time .NET introduced “tasks”, an abstraction over async operations. They are part of the .NET threading system; you write your code inside a task, and you can suspend from tasks with syntax. .NET runs the task on the current thread, and if you do some blocking, you stay blocked. This is quite different from JavaScript, where no new “thread” is created but the execution is parked in the scheduler. The reason it works this way in .NET is that part of the motivation was to allow UI-triggered code to access the main UI thread without blocking it. The consequence, again, is that if you block for real, you have screwed something up. That is also why, at least at one point, what C# did was splice functions into chained closures whenever it hit an await: it decomposes one logical piece of code into many separate functions.
I really don't want to go into Rust, but Rust's async system is probably the weirdest of them all because it's polling-based. In short: unless you actively “wait” for a task to complete, it will not make progress. So the purpose of a scheduler there is to make sure a task actually can make progress. Why did Rust end up with async/await? Primarily because they wanted something that works without a runtime and a scheduler, and within the limitations of the borrow checker and the memory model.
Of all those languages, I think the argument for async/await is strongest for Rust and JavaScript: Rust because it's a systems language and wanted a design that works with a limited runtime; JavaScript because the language does not have real threads, so the only alternative to async/await is callbacks. For C#, the argument seems much weaker. Even the problem of forcing code to run on the UI thread could be solved with a scheduling policy for virtual threads. The worst offender in my mind is Python. async/await has produced a really complex system where the language now has coroutines and real threads, different synchronization primitives for each, and async tasks that end up pinned to one OS thread. The language even has different futures in the standard library for threads and for async tasks!
The reason I wanted you to understand all this is that all these different languages share the same syntax, yet what you can do with it is completely different. What they all have in common is that async functions can only be called by async functions (or the scheduler).
What Async Isn't
Over the years I have heard a lot of arguments about why, for instance, Python ended up with async/await, and some of the arguments presented don't hold up to scrutiny from my perspective. One argument I have heard repeatedly is that if you control when you suspend, you don't need to deal with locking or synchronization. While there is some truth to that (you don't suspend at random points), you still end up having to lock. There is still concurrency, so you still need to protect all your stuff. This is especially frustrating in Python, because not only do you have colored functions, you also have colored locks: there are locks for threads and locks for async code, and they are different.
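A minimal sketch of those colored locks in Python:

import asyncio
import threading

thread_lock = threading.Lock()  # for threads; acquiring blocks the OS thread
async_lock = asyncio.Lock()     # for coroutines; acquiring must be awaited

async def update_state():
    async with async_lock:  # a threading.Lock here would block the whole event loop
        ...                 # still necessary: the concurrency did not go away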
There is a very good reason I showed the semaphore example above: semaphores are real in async programming. They are very often needed to protect a system from taking on too much work. In fact, one of the core challenges many async/await-based programs suffer from is bloating buffers, because there is no way to exert back pressure (I once again point you to my post on that). Why can't they? Because unless an API is async, it is forced to buffer or fail. What it cannot do is block.
Async also does not magically solve the GIL issues in Python. It does not make real threads appear in JavaScript, and it does not help when random code starts blocking (and remember, even memory access can block), or when you very slowly calculate a large Fibonacci number.
Threads are the Answer, Not Coroutines
I already alluded to this a few times above, but when we think about being able to “suspend” at an arbitrary point, we as programmers often immediately think of coroutines. For good reason: coroutines are amazing, they are fun, and every programming language should have them!
Coroutines are an important building block, and if any future language designer is looking at this post: you should put them in.
But coroutines should be very lightweight, and they can be abused in ways that make it very hard to follow what's going on. Lua, for instance, gives you coroutines, but it does not give you the necessary structure to do something with them easily. You will end up building your own scheduler, your own threading system, etc.
So what we really want is where we started out with: threads! Good old threads!
The irony in all of this is that the language I think actually got this right is modern Java. Project Loom in Java has coroutines and all the bells and whistles under the hood, but what it exposes to the developer is good old threads. There are virtual threads, which are mounted on carrier OS threads, and these virtual threads can travel from carrier to carrier. If you issue a blocking call on a virtual thread, it yields to the scheduler.
Now, I happen to think that threads alone are not good enough! Threads require synchronization, communication primitives, and so on. Scratch has message passing! So there is more that needs to be built to make them work well.
I want to follow up in another blog post about what is needed to make threads easier to work with. Because what async/await clearly did innovate is bringing some of these core capabilities closer to the user of the language, and modern async/await code often reads more easily than traditional threaded code.
Structured Concurrency and Channels
Lastly, I do want to say something nice about async/await and celebrate the innovations it has brought about. I believe this language feature single-handedly drove some crucial innovation in concurrent programming by making it widely accessible. In particular, it moved many developers from a basic “one thread per request” model to breaking tasks down into smaller chunks, even in languages like Python. For me, the biggest credit here goes to Trio, which introduced the concept of structured concurrency via its nursery. That concept eventually found a home even in asyncio, with the TaskGroup API, and is finding its way into Java.
I recommend reading Nathaniel J. Smith's Notes on structured concurrency, or: Go statement considered harmful for a much better introduction. But if you are unfamiliar with it, here is my attempt at explaining it:
- There is a clear start and end of work: every thread or task has a clear beginning and end, which makes it easier to follow what each thread is doing. All threads spawned in the context of a thread are known to that thread. Think of it like creating a small team to work on a task: they start together, finish together, and then report back.
- Threads don't outlive their parent: if for whatever reason the parent is done before its child threads, it automatically waits for them before returning.
- Errors propagate and cause cancellations: if something goes wrong in one thread, the error is passed back to the parent. More importantly, it also automatically causes the other child threads to be cancelled. Cancellation is a core part of the system!
I believe that structured concurrency needs to become a thing in a threaded world. Threads must know their parents and children. Threads also need convenient ways to pass their success values back. Lastly, context should flow from thread to thread implicitly through context locals.
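For reference, here is what the asyncio TaskGroup API mentioned above looks like; a minimal sketch, requiring Python 3.11+:

import asyncio

async def job(n):
    await asyncio.sleep(0.1)
    return n * 2

async def main():
    async with asyncio.TaskGroup() as tg:  # children cannot outlive this block
        t1 = tg.create_task(job(1))
        t2 = tg.create_task(job(2))        # an error in one cancels the other
    print(t1.result(), t2.result())

asyncio.run(main())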
The second part is that async/await made it much more apparent that tasks and threads need to talk to each other. In particular, the concepts of channels and of selecting on channels became more prevalent. This is an essential building block that I think can be improved upon further. As food for thought: if you have structured concurrency, each thread's return value can in principle be represented as a buffered channel attached to the thread, holding up to a single value (the successful return value or an error) that you can select on.
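As a sketch of that food for thought in asyncio terms: each task behaves like a one-slot channel holding its result, and asyncio.wait() is the select:

import asyncio

async def main():
    t1 = asyncio.create_task(asyncio.sleep(0.1, result="fast"))
    t2 = asyncio.create_task(asyncio.sleep(0.5, result="slow"))
    done, pending = await asyncio.wait({t1, t2}, return_when=asyncio.FIRST_COMPLETED)
    print([t.result() for t in done])  # prints ['fast']

asyncio.run(main())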
Today, although no language has perfected this model, thanks to many years of experimentation, the solution seems clearer than ever, with structured concurrency at its core.
Conclusion
I hope I was able to demonstrate to you that async/await has been a mixed bag. It brought some relief from callback hell, but it also saddled us with colored functions and new back-pressure challenges, and it introduced entirely new problems, such as promises that can sit around forever without resolving. It has also taken away a lot of the utility that call stacks bring, particularly for debugging and profiling. These aren't minor hiccups; they're real obstacles that get in the way of the straightforward, intuitive concurrency we should be aiming for.
If we take a step back, it seems pretty clear to me that we have veered off course by adopting async/await in languages that have real threads. Innovations like Java's Project Loom feel like the right fit here. Virtual threads can yield when they need to, switch contexts when blocked, and even work with message-passing systems that make concurrency feel natural. If we free ourselves from the idea that the functional promise system has figured out all the problems, we can look at threads properly again.
At the same time, however, async/await has moved concurrent programming to the forefront and has resulted in real innovation. Making concurrency a core feature of the language (even with syntax!) is a good thing. Maybe the increased adoption, and people struggling with it, is what made structured concurrency a real thing in the Python async/await world.
Future language design should rethink concurrency once more: instead of adopting async/await, new languages should model themselves on Java's Project Loom, but with more user-friendly primitives. And like Scratch, they should give programmers really good APIs that make concurrency natural. I don't think actor frameworks are the right fit, but a combination of structured concurrency, channels, and syntax support for spawning, joining, and selecting will go a long way. Watch this space for a future blog post about some things I found to work better than others.
[1] Sentry works with large debug information files such as PDB or DWARF. These files can be gigabytes in size, and we memory-map terabytes of preprocessed files during processing. That memory-mapped files can block is hardly a surprise, but what we learned in the process is that, thanks to containerization and memory limits, you can easily navigate yourself into a situation where you spend much more time on page faults than expected and the system crawls to a halt.
November 18, 2024 12:00 AM UTC
November 17, 2024
Django Weblog
2025 DSF Board Election Results
The 2025 DSF Board Election has closed, and the following candidates have been elected:
- Abigail Gbadago
- Jeff Triplett
- Paolo Melchiorre
- Tom Carrick
They will each serve a two-year term.
Directors elected for the 2024 DSF Board (Jacob, Sarah, and Thibaud) are continuing, with one year left to serve on the board.
Therefore, the combined 2025 DSF Board of Directors are:
- Jacob Kaplan-Moss
- Sarah Abderemane
- Thibaud Colas
- Abigail Gbadago*
- Jeff Triplett*
- Paolo Melchiorre*
- Tom Carrick*

* Elected to a two (2) year term
Congratulations to our winners, and a huge thank you to our departing board members Çağıl Uluşahin Sonmez, Chaim Kirby, Kátia Yoshime Nakamura, and Katie McLaughlin.
Thank you again to everyone who nominated themselves. Even if you were not successful, you gave our community the chance to make their voices heard in who they wanted to represent them.
November 17, 2024 11:56 PM UTC
Paolo Melchiorre
Thoughts on my election as a DSF board member
My thoughts on my election as a member of the Django Software Foundation (DSF) board of directors.
November 17, 2024 11:00 PM UTC
Real Python
Using the Python zip() Function for Parallel Iteration
Python’s zip() function combines elements from multiple iterables. Calling zip() generates an iterator that yields tuples, each containing elements from the input iterables. This function is essential for tasks like parallel iteration and dictionary creation, offering an efficient way to handle multiple sequences in Python programming.
By the end of this tutorial, you’ll understand that:
- zip() in Python aggregates elements from multiple iterables into tuples, facilitating parallel iteration.
- dict(zip()) creates dictionaries by pairing keys and values from two sequences (see the short example after this list).
- zip() is lazy in Python, meaning it returns an iterator instead of a list.
- There’s no unzip() function in Python, but the same zip() function can reverse the process using the unpacking operator *.
- Alternatives to zip() include itertools.zip_longest() for handling iterables of unequal lengths.
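For instance, here's a quick look at the dict(zip()) pattern from the list above:

>>> fields = ["name", "last_name", "age"]
>>> values = ["John", "Doe", 45]
>>> dict(zip(fields, values))
{'name': 'John', 'last_name': 'Doe', 'age': 45}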
In this tutorial, you’ll explore how to use zip() for parallel iteration. You’ll also learn how to handle iterables of unequal lengths and discover the convenience of using zip() with dictionaries. Whether you’re working with lists, tuples, or other data structures, understanding zip() will enhance your coding skills and streamline your Python projects.
Free Bonus: 5 Thoughts On Python Mastery, a free course for Python developers that shows you the roadmap and the mindset you’ll need to take your Python skills to the next level.
Understanding the Python zip() Function
zip() is available in the built-in namespace. If you use dir() to inspect __builtins__, then you’ll see zip() at the end of the list:
>>> dir(__builtins__)
['ArithmeticError', 'AssertionError', 'AttributeError', ..., 'zip']
You can see that 'zip' is the last entry in the list of available objects.
According to the official documentation, Python’s zip() function behaves as follows:
Returns an iterator of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. The iterator stops when the shortest input iterable is exhausted. With a single iterable argument, it returns an iterator of 1-tuples. With no arguments, it returns an empty iterator. (Source)
You’ll unpack this definition throughout the rest of the tutorial. As you work through the code examples, you’ll see that Python zip operations work just like the physical zipper on a bag or pair of jeans. Interlocking pairs of teeth on both sides of the zipper are pulled together to close an opening. In fact, this visual analogy is perfect for understanding zip(), since the function was named after physical zippers!
Using zip() in Python
The signature of Python’s zip() function is zip(*iterables, strict=False). You’ll learn more about strict later. The function takes in iterables as arguments and returns an iterator. This iterator generates a series of tuples containing elements from each iterable. zip() can accept any type of iterable, such as files, lists, tuples, dictionaries, sets, and so on.
Passing n Arguments
If you use zip() with n arguments, then the function will return an iterator that generates tuples of length n. To see this in action, take a look at the following code block:
>>> numbers = [1, 2, 3]
>>> letters = ["a", "b", "c"]
>>> zipped = zip(numbers, letters)
>>> zipped # Holds an iterator object
<zip object at 0x7fa4831153c8>
>>> type(zipped)
<class 'zip'>
>>> list(zipped)
[(1, 'a'), (2, 'b'), (3, 'c')]
Here, you use zip(numbers, letters) to create an iterator that produces tuples of the form (x, y). In this case, the x values are taken from numbers and the y values are taken from letters. Notice how the Python zip() function returns an iterator. To retrieve the final list object, you need to use list() to consume the iterator.
If you’re working with sequences like lists, tuples, or strings, then your iterables are guaranteed to be evaluated from left to right. This means that the resulting list of tuples will take the form [(numbers[0], letters[0]), (numbers[1], letters[1]), ..., (numbers[n], letters[n])]. However, for other types of iterables (like sets), you might see some weird results:
>>> s1 = {2, 3, 1}
>>> s2 = {"b", "a", "c"}
>>> list(zip(s1, s2))
[(1, 'a'), (2, 'c'), (3, 'b')]
In this example, s1 and s2 are set objects, which don’t keep their elements in any particular order. This means that the tuples returned by zip() will have elements that are paired up randomly. If you’re going to use the Python zip() function with unordered iterables like sets, then this is something to keep in mind.
Passing No Arguments
You can call zip() with no arguments as well. In this case, you’ll simply get an empty iterator:
>>> zipped = zip()
>>> zipped
<zip object at 0x7f196294a488>
>>> list(zipped)
[]
Here, you call zip() with no arguments, so your zipped variable holds an empty iterator. If you consume the iterator with list(), then you’ll see an empty list as well.
Read the full article at https://realpython.com/python-zip-function/ »
November 17, 2024 02:00 PM UTC
Test and Code
223: Writing Stuff Down is a Super Power
Taking notes well can help you listen better, remember things, show respect, be more accountable, and free up mind space to solve problems.
This episode discusses
- the benefits of writing things down
- preparing for a meeting
- taking notes in meetings
- reviewing notes for action items, todo items, things to follow up on, etc.
- taking notes to allow for better focus
- writing well structured emails
- writing blog posts and books
Learn pytest
- pytest is the number one test framework for Python.
- Learn the basics super fast with Hello, pytest!
- Then later you can become a pytest expert with The Complete pytest Course
- Both courses are at courses.pythontest.com
November 17, 2024 01:55 AM UTC
November 16, 2024
Real Python
Using the len() Function in Python
The len() function in Python is a powerful and efficient tool used to determine the number of items in objects, such as sequences or collections. You can use len() with various data types, including strings, lists, dictionaries, and third-party types like NumPy arrays and pandas DataFrames. Understanding how len() works with different data types helps you write more efficient and concise Python code.
Using len() in Python is straightforward for built-in types, but you can extend it to your custom classes by implementing the .__len__() method (see the short example after the list below). This allows you to customize what length means for your objects. For example, with pandas DataFrames, len() returns the number of rows. Mastering len() not only enhances your grasp of Python’s data structures but also empowers you to craft more robust and adaptable programs.
By the end of this tutorial, you’ll understand that:
- The len() function in Python returns the number of items in an object, such as strings, lists, or dictionaries.
- To get the length of a string in Python, you use len() with the string as an argument, like len("example").
- To find the length of a list in Python, you pass the list to len(), like len([1, 2, 3]).
- The len() function operates in constant time, O(1), as it accesses a length attribute in most cases.
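As a quick sketch of the custom-class point from the introduction, here's a hypothetical Queue class that supports len() by implementing .__len__():

>>> class Queue:
...     def __init__(self, items=()):
...         self._items = list(items)
...     def __len__(self):
...         return len(self._items)
...
>>> len(Queue(["a", "b", "c"]))
3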
In this tutorial, you’ll learn when to use the len() Python function and how to use it effectively. You’ll discover which built-in data types are valid arguments for len() and which ones you can’t use. You’ll also learn how to use len() with third-party types like ndarray in NumPy and DataFrame in pandas, and with your own classes.
Free Bonus: Click here to get a Python Cheat Sheet and learn the basics of Python 3, like working with data types, dictionaries, lists, and Python functions.
Getting Started With Python’s len()
The function len() is one of Python’s built-in functions. It returns the length of an object. For example, it can return the number of items in a list. You can use the function with many different data types. However, not all data types are valid arguments for len().
You can start by looking at the help for this function:
>>> help(len)
Help on built-in function len in module builtins:
len(obj, /)
Return the number of items in a container.
The function takes an object as an argument and returns the length of that object. The documentation for len() goes a bit further:
Return the length (the number of items) of an object. The argument may be a sequence (such as a string, bytes, tuple, list, or range) or a collection (such as a dictionary, set, or frozen set). (Source)
When you use built-in data types and many third-party types with len(), the function doesn’t need to iterate through the data structure. The length of a container object is stored as an attribute of the object. The value of this attribute is modified each time items are added to or removed from the data structure, and len() returns the value of the length attribute. This ensures that len() works efficiently.
In the following sections, you’ll learn about how to use len() with sequences and collections. You’ll also learn about some data types that you cannot use as arguments for the len() Python function.
Using len() With Built-in Sequences
A sequence is a container with ordered items. Lists, tuples, and strings are three of the basic built-in sequences in Python. You can find the length of a sequence by calling len():
>>> greeting = "Good Day!"
>>> len(greeting)
9
>>> office_days = ["Tuesday", "Thursday", "Friday"]
>>> len(office_days)
3
>>> london_coordinates = (51.50722, -0.1275)
>>> len(london_coordinates)
2
When finding the length of the string greeting, the list office_days, and the tuple london_coordinates, you use len() in the same manner. All three data types are valid arguments for len().
The function len() always returns an integer as it’s counting the number of items in the object that you pass to it. The function returns 0 if the argument is an empty sequence:
>>> len("")
0
>>> len([])
0
>>> len(())
0
In the examples above, you find the length of an empty string, an empty list, and an empty tuple. The function returns 0 in each case.
A range object is also a sequence that you can create using range(). A range object doesn’t store all the values but generates them when they’re needed. However, you can still find the length of a range object using len():
>>> len(range(1, 20, 2))
10
This range of numbers includes the integers from 1 to 19 with increments of 2. The length of a range object can be determined from the start, stop, and step values.
In this section, you’ve used the len() Python function with strings, lists, tuples, and range objects. However, you can also use the function with any other built-in sequence.
Read the full article at https://realpython.com/len-python-function/ »
November 16, 2024 02:00 PM UTC
November 15, 2024
Real Python
The Real Python Podcast – Episode #228: Maintaining the Foundations of Python & Cautionary Tales
How do you build a sustainable open-source project and community? What lessons can be learned from Python's history and the current mess that the WordPress community is going through? This week on the show, we speak with Paul Everitt from JetBrains about navigating open-source funding and the start of the Python Software Foundation.
November 15, 2024 12:00 PM UTC
Talk Python to Me
#485: Secure coding for Python with SheHacksPurple
What do developers need to know about AppSec and building secure software? We have Tanya Janca (AKA SheHacksPurple) on the show to tell us all about it. We talk about what developers should expect from threat modeling events as well as concrete tips for securing your apps and services.

Episode sponsors: Posit (https://talkpython.fm/posit), Bluehost (https://talkpython.fm/bluehost), Talk Python Courses (https://talkpython.fm/training)

Links from the show:
- Tanya on X: https://x.com/shehackspurple
- She Hacks Purple website: https://shehackspurple.ca/
- White House recommends memory safe languages: https://www.whitehouse.gov/oncd/briefing-room/2024/02/26/press-release-technical-report/
- Python Developer Survey Results: https://lp.jetbrains.com/python-developers-survey-2023/
- Bandit: https://github.com/PyCQA/bandit
- Semgrep Academy: https://academy.semgrep.dev/
- Watch this episode on YouTube: https://www.youtube.com/watch?v=ocnFl_Nt-ic
- Episode transcripts: https://talkpython.fm/episodes/transcript/485/secure-coding-for-python-with-shehackspurple

Stay in touch with us:
- Subscribe to us on YouTube: https://talkpython.fm/youtube
- Follow Talk Python on Mastodon: https://fosstodon.org/web/@talkpython
- Follow Michael on Mastodon: https://fosstodon.org/web/@mkennedy
November 15, 2024 08:00 AM UTC
Matt Layman
Heroku To DigitalOcean - Building SaaS #206
In this episode, I began a migration of my JourneyInbox app from Heroku to DigitalOcean. The first step to this move, since I’m going to use Kamal, is to put the app into a Docker image. We got the whole app into the Docker image, then cleaned up local development and the CI system after making changes that broke those configurations.
November 15, 2024 12:00 AM UTC
November 14, 2024
Python Morsels
Inspecting objects in Python
I rely on 4 functions for inspecting Python objects: type, help, dir, and vars.
Inspecting an object's structure and data
The scenario is, we're either in the Python REPL or we've used the built-in breakpoint function to drop into the Python debugger within our code. So we're within some sort of interactive Python environment. For example, we might be running this file, which we've put a breakpoint call in to drop into a Python debugger:
from argparse import ArgumentParser
from collections import Counter
from pathlib import Path
import re
def count_letters(text):
return Counter(
char
for char in text.casefold()
if char.isalpha()
)
def main():
parser = ArgumentParser()
parser.add_argument("file", type=Path)
args = parser.parse_args()
letter_counts = count_letters(args.file.read_text())
breakpoint()
for letter, count in letter_counts.most_common():
print(count, letter.upper())
if __name__ == "__main__":
main()
And we've used the PDB interact command to start a Python REPL:
~ $ python3 letter_counter.py frankenstein.txt
> /home/trey/letter_counter.py(18)main()
-> breakpoint()
(Pdb) interact
*pdb interact start*
>>>
We have a letter_counts variable that refers to some sort of object.
We want to know what this object is all about.
What questions could we ask of this object?
Well, to start with, we could simply refer to the object, and then hit Enter:
>>> letter_counts
Counter({'e': 46043, 't': 30365, 'a': 26743, 'o': 25225, 'i': 24613, 'n': 24367, 's': 21155, 'r': 20818, 'h': 19725, 'd': 16863, 'l': 12739, 'm': 10604, 'u': 10407, 'c': 9243, 'f': 8731, 'y': 7914, 'w': 7638, 'p': 6121, 'g': 5974, 'b': 5026, 'v': 3833, 'k': 1755, 'x': 677, 'j': 504, 'q': 324, 'z': 243})
We've typed the name of a variable that points to an object, and now we see the programmer-readable representation for that object.
How to see an object's class
Often, the string representation tells …
Read the full article: https://www.pythonmorsels.com/inspecting-python-objects/
November 14, 2024 09:49 PM UTC
Django Weblog
Django’s technical governance challenges, and opportunities
As of October 29th, two of four members of the Django Software Foundation Steering Council have resigned from their role, with their intentions being to trigger an election of the Steering Council earlier than otherwise scheduled, per our established governance processes.
To our departing members, Simon and Adam, thank you for your contributions to Django and its governance ❤️. The framework and our community owes a lot to your dedication, and we’re confident our community will join us in celebrating your past contributions – and look forward to learning about your future endeavors in the Django ecosystem. And thanks to the remaining members, James and Andrew, for their service over the years.
Our governance challenges
Governance in open source is hard, and community-driven open source even more so. We’re proud that Django’s original two Benevolent Dictators For Life (BDFLs) both retired from the role and turned things over to community governance, ten years ago now. The BDFL model can provide excellent technical governance, but it also has its flaws. So the mantle of technical governance passed to the Core Developers, and the Technical Board (since renamed the Steering Council) was introduced.
However, time has revealed flaws in the Steering Council’s governance model and operations. The Steering Council was able to provide decision-making – tiebreaking when the developer community couldn’t reach consensus – but didn’t provide more forward-looking leadership or vision. Disagreements over how – or if – the Steering Council should approach this part of leadership led us to the current situation, with no functioning technical governance as of a few weeks ago. Even before those recent events, those flaws were a common source of frustration for our contributors, and a source of concern for Django users who (rightly or not) might have expectations of Django’s direction – such as the publication of a “roadmap” for Django development.
The Django Software Foundation Board of Directors is and was aware of those issues, and recently made attempts to have the Steering Council rectify them, in coordination with other established community members. The DSF Board has tried to be hands-off when it comes to technical leadership, but in retrospect we should have been getting involved sooner, or more decisively. The lack of technical leadership is an existential threat to Django – a slow moving one, but a threat nonetheless. It’s our responsibility to address this threat.
Where we’re heading
We now need new Steering Council members. But we also need governance reform. There’s a lot about the Steering Council that is good and might only need minimal changes. However, the overall question of the Steering Council’s remit, and how it approaches technical leadership for the Django community, needs to be resolved.
We’re going to hold early elections of the Steering Council, as soon as we’ve completed the ongoing 2025 DSF Board elections. Those elections will follow existing processes, and we will want a Steering Council who strives to meet the group’s intended goals:
- To safeguard big decisions that affect Django projects at a fundamental level.
- To help shepherd the project’s future direction.
We expect the new Steering Council will take on those known challenges, resolve those questions of technical leadership, and update Django’s technical governance. They will have the full support of the Board of Directors to address this threat to Django’s future. And the Board will also be more decisive in intervening, should similar issues keep arising.
How you can help
We need contributors willing to take on those challenges and help our community come out ahead. It’s a big role, impactful but demanding. And there are strict, often annoying eligibility rules for the Steering Council.
To help you help us, we’ve set up a form: Django 6.x Steering Council elections - Expression of interest.
If you’re interested in stepping up to shepherd Django’s technical direction, fill in our expression of interest form. We’ll let you know whether or not you meet those eligibility rules, and take the guesswork out of the process. You get to focus on your motivation for taking on this kind of high-purpose, high-reward governance role.
And once the elections start and we get to candidate registrations, we’ll be able to reuse details submitted here (if you want to) so the process is smoother for everyone.
Django 6.x Steering Council elections - Expression of interest
How everyone can help
Those elections will be crucial for the future of Django, and will be decided thanks to the vote of our Django Software Foundation Individual Members. If you know people who contribute to the DSF’s mission but aren’t Individual Members already -- use our form to nominate them as Individual Members, so they’re eligible to vote. If you’re that person, do nominate yourself. We consider all contributions towards our mission: advancing and promoting Django, protecting the framework’s long-term viability, and advancing the state of the art in web development.
Any questions? Comment on our forum thread, Discussion thread for “Django’s technical governance challenges, and opportunities” blog post, or reach out via email to foundation@djangoproject.com.
November 14, 2024 05:00 PM UTC
PyCharm
Inline AI Prompting, Coding Assistance for the dataclass_transform Decorator (PEP 681), and More in PyCharm 2024.3!
Code smarter, optimize performance, and stay focused on what matters most with the latest updates in PyCharm 2024.3. From enhanced support for AI Assistant and Jupyter notebooks to new features like no-code data filtering, there’s so much to explore.
Learn about all the updates on our What’s New page, download the latest version from our website, or update your current version through our free Toolbox App.
Key features of PyCharm 2024.3
AI Assistant
Inline AI prompting
Get help with code, generate documentation, or write tests by prompting AI directly in PyCharm’s editor. Just type your request on a new line and hit Enter.
Edits made by AI are marked in purple in the gutter, so changes are easy to spot. Need a fresh suggestion? Press Tab, Ctrl+/ ( ⌘/ on macOS), or manually edit the purple input text yourself. This feature is available for Python, JavaScript, TypeScript, JSON, YAML, and Jupyter notebooks.
For a personalized AI chat experience, you can now also choose from Google Gemini, OpenAI, or your own local models. Moreover, enhanced context management now lets you control what AI Assistant takes into consideration. The brand-new UI auto-includes open files and selected code and comes with options to add or remove files and attach project-wide instructions to guide responses across your codebase.
Ability to convert for loops into list comprehensions
Refactor your code faster with AI Assistant, which can now help you change massive for loops into list comprehensions. This feature works for all for loops, including nested and while loops.
Local multiline AI code completion PyCharm Professional
PyCharm Professional now provides local multiline AI code completion suggestions based on the proprietary JetBrains ML model used for Full Line Code Completion. Note that we don’t use your data to train the model.
Local multiline code completion typically generates 2–4 lines of code in scenarios where it can predict the next sequence of logical steps, such as within loops, when handling conditions, or when completing common code patterns and boilerplate sections.
Coding assistance for the dataclass_transform decorator (PEP 681)
PyCharm now supports intelligent coding assistance for custom data classes created with libraries using the dataclass_transform decorator. Enjoy the same support as for standard data classes, including attribute code completion and type inference for constructor signatures.
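For context, declaring such a decorator looks roughly like this; a hypothetical sketch of PEP 681 usage, not PyCharm’s own code (dataclass_transform lives in typing on Python 3.11+, or in typing_extensions on earlier versions):

from typing import dataclass_transform

@dataclass_transform()
def model(cls):
    # A real library would synthesize __init__ and friends here; the
    # decorator only tells type checkers and IDEs to expect that behavior.
    return cls

@model
class Point:
    x: int
    y: int

# Tools supporting PEP 681 now offer completion for Point(x=..., y=...)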
Jupyter Notebook PyCharm Professional
Auto-installation for multiple packages
PyCharm 2024.3 makes it easier to install packages that are imported in your code. A new quick-fix is available for bulk auto-installations, allowing you to download and install several packages in one click.
Ability to open Jupyter table outputs in the Data View window
View Jupyter table outputs in the Data View tool window to access powerful features like heatmaps, formatting, slicing, and AI functions for enhanced dataframe analysis. Just click on the Open in Data View icon to get started.
No-code data filtering
Effortlessly filter data in the Data View tool window or within dataframes without writing any code. Just click the Filter icon in the upper-right corner, choose your filter options and see results in the same window. This functionality works with all supported Python frameworks, including pandas, Polars, NumPy, PyTorch, TensorFlow, and Hugging Face Datasets.
Debug port specification PyCharm Professional
PyCharm now allows you to specify a single debugger port for all communications, simplifying debugging in restricted environments like Docker or WSL. After you set the port in the debugger settings, the debugger runs as a server and all communication between it and the IDE flows through the specified port.
Visit our What’s New page or check out the full release notes for more features and additional details about the features mentioned here. Please report any bugs on our issue tracker so we can address them promptly.
Connect with us on X (formerly Twitter) to share your thoughts on PyCharm 2024.3. We look forward to hearing from you!
November 14, 2024 01:42 PM UTC
Real Python
Quiz: Namespaces and Scope in Python
In this quiz, you’ll test your understanding of Python Namespaces and Scope.
You’ll revisit how Python organizes symbolic names and objects in namespaces, when Python creates a new namespace, how namespaces are implemented, and how variable scope determines symbolic name visibility.
November 14, 2024 12:00 PM UTC
PyPy
Guest Post: Final Encoding in RPython Interpreters
Introduction
This post started as a quick note summarizing a recent experiment I carried out upon a small RPython interpreter by rewriting it in an uncommon style. It is written for folks who have already written some RPython and want to take a deeper look at interpreter architecture.
Some experiments are about finding solutions to problems. This experiment is about taking a solution which is already well-understood and applying it in the context of RPython to find a new approach. As we will see, there is no real change in functionality or the number of clauses in the interpreter; it's more like a comparison between endo- and exoskeletons, a different arrangement of equivalent bones and plates.
Overview
An RPython interpreter for a programming language generally does three or four things, in order:
- Read and parse input programs
- Encode concrete syntax as abstract syntax
- Optionally, optimize or reduce the abstract syntax
- Evaluate the abstract syntax: read input data, compute, print output data, etc.
Today we'll look at abstract syntax. Most programming languages admit a concrete parse tree which is readily abstracted to provide an abstract syntax tree (AST). The AST is usually encoded with the initial style of encoding. An initial encoding can be transformed into any other encoding for the same AST, looks like a hierarchy of classes, and is implemented as a static structure on the heap.
In contrast, there is also a final encoding. A final encoding can be transformed into by any other encoding, looks like an interface for the actions of the interpreter, and is implemented as an unwinding structure on the stack. From the RPython perspective, Python builtin modules like os or sys are final encodings for features of the operating system; the underlying implementation is different when translated or untranslated, but the interface used to access those features does not change.
In RPython, an initial encoding is built from a hierarchy of classes. Each class represents a type of tree node, corresponding to a parser production in the concrete parse tree. Each class instance therefore represents an individual tree node. The fields of a class, particularly those filled during .__init__(), store pre-computed properties of each node; methods can be used to compute node properties on demand. This seems like an obvious and simple approach; what other approaches could there be? We need an example.
Final Encoding of Brainfuck
We will consider Brainfuck, a simple Turing-complete programming language. An example Brainfuck program might be:
[-]
This program is built from a loop and a decrement, and sets a cell to zero. In an initial encoding which follows the algebraic semantics of Brainfuck, the program could be expressed by applying class constructors to build a structure on the heap:
Loop(Plus(-1))
A final encoding is similar, except that class constructors are replaced by methods, the structure is built on the stack, and we are parameterized over the choice of class:
lambda cls: cls.loop(cls.plus(-1))
In ordinary Python, transforming between these would be trivial, mostly a matter of passing around the appropriate class. Indeed, initial and final encodings are equivalent; we'll return to that fact later. However, in RPython, all of the types must line up, and classes must be determined before translation. We'll need to monomorphize our final encodings using some RPython tricks later on. Before that, let's see what an actual Brainfuck interface looks like, so that we can cover all of the difficulties with final encoding.
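To illustrate that equivalence in ordinary Python (not RPython), here is a tiny sketch with a hypothetical Initial class standing in for the class hierarchy:

class Initial(object):
    # Constructors of the initial encoding, as plain static methods
    @staticmethod
    def plus(i): return ("plus", i)
    @staticmethod
    def loop(body): return ("loop", body)

final_program = lambda cls: cls.loop(cls.plus(-1))
print(final_program(Initial))  # ('loop', ('plus', -1))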
Before we embark, please keep in mind that local code doesn't know what cls is. There's no type-safe way to inspect an arbitrary semantic domain. In the initial-encoded version, we can ask isinstance(bf, Loop) to see whether an AST node is a loop, but there simply isn't an equivalent for final-encoded ASTs. So, there is an implicit challenge to think about: how do we evaluate a program in an arbitrary semantic domain? For bonus points, how do we optimize a program without inspecting the types of its AST nodes?
What follows is a dissection of this module at the given revision. Readers may find it satisfying to read the entire interpreter top to bottom first; it is less than 300 lines.
Core Functionality
Final encoding is given as methods on an interface. These five methods correspond precisely to the summands of the algebra of Brainfuck.
class BF(object):
    # Other methods elided
    def plus(self, i): pass
    def right(self, i): pass
    def input(self): pass
    def output(self): pass
    def loop(self, bfs): pass
Note that the .loop() method takes another program as an argument. Initial-encoded ASTs have other initial-encoded ASTs as fields on class instances; final-encoded ASTs have other final-encoded ASTs as parameters to interface methods. RPython infers all of the types, so the reader has to know that i is usually an integer while bfs is a sequence of Brainfuck operations.
We're using a class to implement this functionality. Later, we'll treat it as a mixin, rather than a superclass, to avoid typing problems.
Monoid
In order to optimize input programs, we'll need to represent the underlying monoid of Brainfuck programs. To do this, we add the signature for a monoid:
class BF(object):
    # Other methods elided
    def unit(self): pass
    def join(self, l, r): pass
This is technically a unital magma, since RPython doesn't support algebraic laws, but we will enforce the algebraic laws later on during optimization. We also want to make use of the folklore that free monoids are lists, allowing callers to pass a list of actions which we'll reduce with recursion:
class BF(object):
    # Other methods elided
    def joinList(self, bfs):
        if not bfs:
            return self.unit()
        elif len(bfs) == 1:
            return bfs[0]
        elif len(bfs) == 2:
            return self.join(bfs[0], bfs[1])
        else:
            i = len(bfs) >> 1
            return self.join(self.joinList(bfs[:i]),
                             self.joinList(bfs[i:]))
.joinList() is a little bulky to implement, but Wirth's principle applies: the interpreter is shorter with it than without it.
Idioms
Finally, our interface includes a few high-level idioms, like the zero program shown earlier, which are defined in terms of low-level behaviors. In an initial encoding, these could be defined as module-level functions; here, we define them on the mixin class BF.
class BF(object):
    # Other methods elided
    def zero(self):
        return self.loop(self.plus(-1))
    def move(self, i):
        return self.scalemove(i, 1)
    def move2(self, i, j):
        return self.scalemove2(i, 1, j, 1)
    def scalemove(self, i, s):
        return self.loop(self.joinList([
            self.plus(-1), self.right(i), self.plus(s), self.right(-i)]))
    def scalemove2(self, i, s, j, t):
        return self.loop(self.joinList([
            self.plus(-1), self.right(i), self.plus(s), self.right(j - i),
            self.plus(t), self.right(-j)]))
Interface-oriented Architecture
Applying Interfaces
Now, we hack at RPython's object model until everything translates. First, consider the task of pretty-printing. For Brainfuck, we'll simply regurgitate the input program as a Python string:
class AsStr(object):
    import_from_mixin(BF)
    def unit(self): return ""
    def join(self, l, r): return l + r
    def plus(self, i): return '+' * i if i > 0 else '-' * -i
    def right(self, i): return '>' * i if i > 0 else '<' * -i
    def loop(self, bfs): return '[' + bfs + ']'
    def input(self): return ','
    def output(self): return '.'
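A quick sanity check of this domain (the assertions are my own illustration, not part of the interpreter):

d = AsStr()
assert d.loop(d.plus(-1)) == "[-]"        # the zero program round-trips
assert d.join(d.plus(2), d.right(-1)) == "++<"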
Via rlib.objectmodel.import_from_mixin, no stressing with covariance of return types is required. Instead, we shift from a Java-esque view of classes and objects to an OCaml-ish view of prebuilt classes and constructors. AsStr is monomorphic, and any caller of it will have to create their own covariance somehow. For example, here are the first few lines of the parsing function:
@specialize.argtype(1)
def parse(s, domain):
    ops = [domain.unit()]
    # Parser elided to preserve the reader's attention
By invoking rlib.objectmodel.specialize.argtype, we make copies of the parsing function, up to one per call site, based on our choice of semantic domain. Oleg calls these "symantics", but I prefer "domain" in code. Also, note how the parsing stack starts with the unit of the monoid, which corresponds to the empty input string; the parser will repeatedly use the monoidal join to build up a parsed expression without inspecting it. Here's a small taste of that:
while i < len(s):
    char = s[i]
    if char == '+':
        ops[-1] = domain.join(ops[-1], domain.plus(1))
    elif char == '-':
        ops[-1] = domain.join(ops[-1], domain.plus(-1))
    # and so on
The reader may feel justifiably mystified; what breaks if we don't add these magic annotations? Well, the translator will throw UnionError because the low-level types don't match. RPython only wants to make one copy of functions like parse() in its low-level representation, and each copy of parse() will be compiled to monomorphic machine code. In this interpreter, in order to support parsing to an optimized string and also parsing to an evaluator, we need two copies of parse(). It is okay to not fully understand this at first.
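Conceptually, the annotation behaves as if we had written one monomorphic copy of parse() per domain by hand. This is only an illustration of the effect, not what the translator literally emits:

# Hypothetical hand-monomorphized copies:
def parse_str(s, domain):   # `domain` is always an AsStr
    ops = [domain.unit()]
    # ...

def parse_ops(s, domain):   # `domain` is always an AsOps
    ops = [domain.unit()]
    # ...
# Each copy has a single concrete low-level type for `domain`,
# so no UnionError arises.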
Composing Interfaces
Earlier, we noted that an interpreter can optionally optimize input programs after parsing. To support this, we'll precompose a peephole optimizer onto an arbitrary domain. We could also postcompose with a parser instead, but that sounds more difficult. Here are the relevant parts:
def makePeephole(cls):
    domain = cls()
    def stripDomain(bfs):
        return domain.joinList([t[0] for t in bfs])
    class Peephole(object):
        import_from_mixin(BF)
        def unit(self):
            return []
        def join(self, l, r):
            return l + r
        # Actual definition elided... for now...
    return Peephole, stripDomain
Don't worry about the actual optimization yet. What's important here is the pattern of initialization of semantic domains. makePeephole is an SML-style functor on semantic domains: given a final encoding of Brainfuck, it produces another final encoding of Brainfuck which incorporates optimizations. The helper stripDomain is a finalizer which performs the extraction from the optimizer's domain to the underlying cls that was passed in at translation time. For example, let's optimize pretty-printing:
AsStr, finishStr = makePeephole(AsStr)
Now, it only takes one line to parse and print an optimized AST without ever building it on the heap. To be pedantic, fragments of the output string will be heap-allocated, but the AST's node structure will only ever be stack-allocated. Further, to be shallow, the parser is written to prevent malicious input from causing a stack overflow, and this forces it to maintain a heap-allocated RPython list of intermediate operations inside loops.
print finishStr(parse(text, AsStr()))
Performance
But is it fast? Yes. It's faster than the prior version, which was initial-encoded, and also faster than Andrew Brown's classic version (part 1, part 2). Since Brown's interpreter does not perform much optimization, we will focus on how final encoding can outperform initial encoding.
JIT
First, why is it faster than the same interpreter with initial encoding? Well, it still has initial encoding from the JIT's perspective! There is an Op class with a hierarchy of subclasses implementing individual behaviors. A sincere tagless-final student, or those who remember Stop Writing Classes (PyCon US, 2012), will recognize that the following classes could be plain functions, and should think of the classes as a concession to RPython's lack of support for lambdas with closures rather than an initial encoding. We aren't ever going to directly typecheck any Op, but the JIT will generate typechecking guards anyway, so we effectively get a fully-promoted AST inlined into each JIT trace. First, some simple behaviors:
class Op(object):
    _immutable_ = True

class _Input(Op):
    _immutable_ = True
    def runOn(self, tape, position):
        tape[position] = ord(os.read(0, 1)[0])
        return position
Input = _Input()

class _Output(Op):
    _immutable_ = True
    def runOn(self, tape, position):
        os.write(1, chr(tape[position]))
        return position
Output = _Output()

class Add(Op):
    _immutable_ = True
    _immutable_fields_ = "imm",
    def __init__(self, imm):
        self.imm = imm
    def runOn(self, tape, position):
        tape[position] += self.imm
        return position
The JIT does technically have less information than before; it no longer knows that a sequence of immutable operations is immutable enough to be worth unrolling, but a bit of rlib.jit.unroll_safe fixes that:
class Seq(Op):
    _immutable_ = True
    _immutable_fields_ = "ops[*]",
    def __init__(self, ops):
        self.ops = ops
    @unroll_safe
    def runOn(self, tape, position):
        for op in self.ops:
            position = op.runOn(tape, position)
        return position
Finally, the JIT entry point is at the head of each loop, just like with prior interpreters. Since Brainfuck doesn't support mid-loop jumps, there's no penalty for only allowing merge points at the head of the loop.
class Loop(Op):
    _immutable_ = True
    _immutable_fields_ = "op",
    def __init__(self, op):
        self.op = op
    def runOn(self, tape, position):
        op = self.op
        while tape[position]:
            jitdriver.jit_merge_point(op=op, position=position, tape=tape)
            position = op.runOn(tape, position)
        return position
That's the end of the implicit challenge. There's no secret to it; just evaluate the AST. Here's part of the semantic domain for evaluation, as well as the "functor" to optimize it. AsOps.join() contains the only isinstance() calls in the entire interpreter! This is acceptable because Seq is effectively a type wrapper for an RPython list, so that a list of operations is also an operation; its list is initial-encoded and available for inspection.
class AsOps(object):
    import_from_mixin(BF)
    def unit(self):
        return Shift(0)
    def join(self, l, r):
        if isinstance(l, Seq) and isinstance(r, Seq):
            return Seq(l.ops + r.ops)
        elif isinstance(l, Seq):
            return Seq(l.ops + [r])
        elif isinstance(r, Seq):
            return Seq([l] + r.ops)
        return Seq([l, r])
    # Other methods elided!

AsOps, finishOps = makePeephole(AsOps)
And finally here is the actual top-level code to evaluate the input program. As before, once everything is composed, the actual invocation only takes one line.
tape = bytearray("\x00" * cells)
finishOps(parse(text, AsOps())).runOn(tape, 0)
Peephole Optimization
Our peephole optimizer is an abstract interpreter with one instruction of lookahead/rewrite buffer. It implements the aforementioned algebraic laws of the Brainfuck monoid. It also implements idiom recognition for loops. First, the abstract interpreter. The abstract domain has six elements:
class AbstractDomain(object):
    pass

meh, aLoop, aZero, theIdentity, anAdd, aRight = [
    AbstractDomain() for _ in range(6)]
We'll also tag everything with an integer, so that anAdd or aRight can be exact annotations. This is the actual Peephole.join() method:
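For orientation, each element of the optimizer's list is a triple (bf, ad, imm): a concrete action in the underlying domain, its abstract tag, and its integer annotation. Some illustrative values:

(domain.plus(3), anAdd, 3)         # three '+' fused together
(domain.right(-2), aRight, -2)     # two '<' fused together
(domain.unit(), theIdentity, 0)    # a cancelled-out run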
def join(self, l, r):
    if not l:
        return r
    rv = l[:]
    bfHead, adHead, immHead = rv.pop()
    for bf, ad, imm in r:
        if ad is theIdentity:
            continue
        elif adHead is aLoop and ad is aLoop:
            continue
        elif adHead is theIdentity:
            bfHead, adHead, immHead = bf, ad, imm
        elif adHead is anAdd and ad is aZero:
            bfHead, adHead, immHead = bf, ad, imm
        elif adHead is anAdd and ad is anAdd:
            immHead += imm
            if immHead:
                bfHead = domain.plus(immHead)
            elif rv:
                bfHead, adHead, immHead = rv.pop()
            else:
                bfHead = domain.unit()
                adHead = theIdentity
        elif adHead is aRight and ad is aRight:
            immHead += imm
            if immHead:
                bfHead = domain.right(immHead)
            elif rv:
                bfHead, adHead, immHead = rv.pop()
            else:
                bfHead = domain.unit()
                adHead = theIdentity
        else:
            rv.append((bfHead, adHead, immHead))
            bfHead, adHead, immHead = bf, ad, imm
    rv.append((bfHead, adHead, immHead))
    return rv
If this were to get much longer, then implementing a DSL would be worth it, but this is a short-enough method to inline. The abstract interpretation is assumed by induction for the left-hand side of the join, save for the final instruction, which is loaded into a rewrite register. Each instruction on the right-hand side is inspected exactly once. The logic for anAdd followed by anAdd is exactly the same as for aRight followed by aRight because they both have underlying Abelian groups given by the integers.
The rewrite register is carefully pushed onto and popped off from the left-hand side in order to cancel out theIdentity, which itself is merely a unifier for anAdd or aRight of 0.
Note that we generate a lot of garbage. For example, parsing a string of n '+' characters will cause the peephole optimizer to allocate n instances of the underlying domain.plus() action, from domain.plus(1) up to domain.plus(n). An older initial-encoded version of this interpreter used hash consing to avoid ever building an op more than once, even loops. It appears more efficient to generate lots of immutable garbage than to repeatedly hash inputs and search mutable hash tables, at least for optimizing Brainfuck incrementally during parsing.
Finally, let's look at idiom recognition. RPython lists are initial-encoded, so we can dispatch based on the length of the list, and then inspect the abstract domains of each action.
def isConstAdd(bf, i):
    return bf[1] is anAdd and bf[2] == i

def oppositeShifts(bf1, bf2):
    return bf1[1] is bf2[1] is aRight and bf1[2] == -bf2[2]

def oppositeShifts2(bf1, bf2, bf3):
    return (bf1[1] is bf2[1] is bf3[1] is aRight and
            bf1[2] + bf2[2] + bf3[2] == 0)

def loop(self, bfs):
    if len(bfs) == 1:
        bf, ad, imm = bfs[0]
        if ad is anAdd and imm in (1, -1):
            return [(domain.zero(), aZero, 0)]
    elif len(bfs) == 4:
        if (isConstAdd(bfs[0], -1) and
                bfs[2][1] is anAdd and
                oppositeShifts(bfs[1], bfs[3])):
            return [(domain.scalemove(bfs[1][2], bfs[2][2]), aLoop, 0)]
        if (isConstAdd(bfs[3], -1) and
                bfs[1][1] is anAdd and
                oppositeShifts(bfs[0], bfs[2])):
            return [(domain.scalemove(bfs[0][2], bfs[1][2]), aLoop, 0)]
    elif len(bfs) == 6:
        if (isConstAdd(bfs[0], -1) and
                bfs[2][1] is bfs[4][1] is anAdd and
                oppositeShifts2(bfs[1], bfs[3], bfs[5])):
            return [(domain.scalemove2(bfs[1][2], bfs[2][2],
                                       bfs[1][2] + bfs[3][2],
                                       bfs[4][2]), aLoop, 0)]
        if (isConstAdd(bfs[5], -1) and
                bfs[1][1] is bfs[3][1] is anAdd and
                oppositeShifts2(bfs[0], bfs[2], bfs[4])):
            return [(domain.scalemove2(bfs[0][2], bfs[1][2],
                                       bfs[0][2] + bfs[2][2],
                                       bfs[3][2]), aLoop, 0)]
    return [(domain.loop(stripDomain(bfs)), aLoop, 0)]
This ends the bonus question. How do we optimize an unknown semantic domain? We must maintain an abstract context which describes elements of the domain. In initial encoding, we ask an AST about itself. In final encoding, we already know everything relevant about the AST.
The careful reader will see that I didn't really answer that opening question in the JIT section. Because the JIT still ranges over the same operations as before, it can't really be slower; but why is it now faster? Because the optimizer is now slightly better in a few edge cases. It performs the same optimizations as before, but the rigor of abstract interpretation causes it to emit slightly better operations to the JIT backend.
Concretely, improving the optimizer can shorten pretty-printed programs. The Busy Beaver Gauge measures the length of programs which search for solutions to mathematical problems. After implementing and debugging the final-encoded interpreter, I found that two of my entries on the Busy Beaver Gauge for Brainfuck had become shorter by about 2%. (Most other entries are already hand-optimized according to the standard algebra and have no optimization opportunities.)
Discussion
Given that initial and final encodings are equivalent, and noting that RPython's toolchain is written to prefer initial encodings, what did we actually gain? Did we gain anything?
One obvious downside to final encoding in RPython is interpreter size. The example interpreter shown here is a rewrite of an initial-encoded interpreter which can be seen here for comparison. Final encoding adds about 20% more code in this case.
Final encoding is not necessarily more code than initial encoding, though. All AST encodings in interpreters are subject to the Expression Problem, which states that there is generally a quadratic amount of code required to implement multiple behaviors for an AST with multiple types of nodes; specifically, n behaviors for m types of nodes require n × m methods. Initial encodings improve the cost of adding new types of nodes; final encodings improve the cost of adding new behaviors. Final encoding may tend to win in large codebases for mature languages, where the language does not change often but new behaviors are added frequently and maintained for long periods.
Optimizations in final encoding require a bit of planning. The abstract-interpretation approach is solid but relies upon the monoid and its algebraic laws. In the worst case, an entire class hierarchy could be required to encode the abstraction.
It is remarkable to find a 2% improvement in residual program size merely by reimplementing an optimizer as an abstract interpreter respecting the algebraic laws. This could be the most important lesson for compiler engineers, if it happens to generalize.
Final encoding was popularized via the tagless-final movement in OCaml and Scala, including famously in a series of tutorials by Kiselyov et al. A "tag", in this jargon, is a runtime identifier for an object's type or class; a tagless encoding effectively doesn't allow isinstance() at all. In the above presentation, tags could be hacked in, but were not materially relevant to most steps. Tags were required for the final evaluation step, though, and the tagless-final insight is that certain type systems can express type-safe evaluation without those tags. We won't go further in this direction because tags also communicate valuable information to the JIT.
Summarizing Table
Initial Encoding | Final Encoding
---|---
hierarchy of classes | signature of interfaces
class constructors | method calls
built on the heap | built on the stack
traversals allocate stack | traversals allocate heap
tags are available with isinstance() | tags are only available through hacks
cost of adding a new AST node: one class | cost of adding a new AST node: one method on every other class
cost of adding a new behavior: one method on every other class | cost of adding a new behavior: one class
Credits
Thanks to folks in #pypy on Libera Chat: arigato for the idea, larstiq for pushing me to write it up, and cfbolz and mattip for reviewing and finding mistakes. The original IRC discussion leading to this blog post is available here.
This interpreter is part of the rpypkgs suite, a Nix flake for RPython interpreters. Readers with Nix installed can run this interpreter directly from the flake:
$ nix-prefetch-url https://github.com/MG-K/pypy-tutorial-ko/raw/refs/heads/master/mandel.b
$ nix run github:rpypkgs/rpypkgs#bf -- /nix/store/ngnphbap9ncvz41d0fkvdh61n7j2bg21-mandel.b
November 14, 2024 08:42 AM UTC
Python Bytes
#409 We've moved to Hetzner write-up
Topics covered in this episode:
- terminal-tree
- posting: The API client that lives in your terminal
- Extra, extra, extra
- UV does everything or enough that I'm not sure what else it needs to do
- Extras
- Joke

Watch on YouTube

About the show

Sponsored by:
- ScoutAPM - Django Application Performance Monitoring
- Codeium - Free AI Code Completion & Chat

Connect with the hosts:
- Michael: @mkennedy@fosstodon.org
- Brian: @brianokken@fosstodon.org
- Show: @pythonbytes@fosstodon.org

Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Monday at 10am PT. Older video versions available there too.

Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form, add your name and email to our friends of the show list; we'll never share it.

Michael #1: terminal-tree
- An experimental filesystem navigator for the terminal, built with Textual
- Tested in macOS only at this point. Chances are very high it works on Linux. Slightly lower chance (but non-zero) that it works on Windows.
  - Can confirm it works on Linux

Brian #2: posting: The API client that lives in your terminal
- Also uses Textual
- From Darren Burns
- Interesting that the installation instructions recommend using uv: uv tool install --python 3.12 posting
- Very cool. Great docs. Beautiful. Keyboard-centric, but also usable with a mouse.
- "Fly through your API workflow with an approachable yet powerful keyboard-centric interface. Run it locally or over SSH on remote machines and containers. Save your requests in a readable and version-control friendly format."
- Able to save multiple environments
- Great colors
- Allows scripting to run Python code before and after requests to prepare headers, set variables, etc.

Michael #3: Extra, extra, extra
- spaCy course swag give-away, enter for free
- New essay: Opposite of Cloud Native is?
- News: We've moved to Hetzner
- New package: Introducing chameleon-flask package
- New release: Listmonk Python client
- TIOBE Update
- PEP 750 – Template Strings
- Canary email
- Left Omnivore for Pocket, left Pocket for …, landed on Instapaper
  - Supports direct import from Omnivore and Pocket
  - Though Hoarder is compelling
- Trying out Zen Browser
  - Wasn't a fan of Arc (especially now) but the news turned me on to Zen

Brian #4: UV does everything or enough that I'm not sure what else it needs to do
- Jeff Triplett
- "UV feels like one of those old infomercials where it solves everything, which is where we have landed in the Python world."
- "My favorite feature is that UV can now bootstrap a project to run on a machine that does not previously have Python installed, along with installing any packages your application might require."
- Partial list (see Jeff's post for his complete list):
  - uv pip install replaces pip install
  - uv venv replaces python -m venv
  - uv run, uv tool run, and uv tool install replace pipx
  - uv build - Build your Python package for PyPI
  - uv publish - Upload your Python package to PyPI, replacing twine and flit publish

Extras

Brian:
- Coverage.py originally was just one file
- Trying out BlueSky: brianokken.bsky.social
  - Not because of Taylor Swift, but nice.
  - There are a lot of Python people there.

Joke: How programmers sleep
November 14, 2024 08:00 AM UTC
Stefan Scherfke
Publishing to PyPI with a Trusted Publisher from GitLab CI/CD
PyPA’s Trusted Publishers let you upload Python packages directly from your CI pipeline to PyPI, and you don’t need any long-lived secrets like API tokens. This makes uploading Python packages not only easier than ever but also more secure.
In this article, we’ll look at what Trusted Publishers are and how they’re more secure than using API tokens or a username/password combo. We’ll also learn how to set up our GitLab CI/CD pipeline to:
- continuously test the release process with the TestPyPI on every push to main,
- automatically perform PyPI releases on every Git tag, and
- additionally secure the process with GitLab (deployment) environments.
The official documentation explains most of this, but it doesn’t go into much depth regarding GitLab pipelines and leaves a few details unexplained.
Why should I want to use this?
API tokens aren’t inherently insecure, but they do have a few drawbacks:
- If they are passed as environment variables, there’s a chance they’ll leak (think of a debug env | sort command in your pipeline).
- If you don’t watch out, bad co-maintainers can steal the token and do mischief with it.
- You have to manually renew the token from time to time, which can be annoying in the long run.
Trusted Publishers can avoid these problems or, at the very least, reduce their risk:
- You don’t have to manually renew any long-lived tokens.
- All tokens are short-lived. Even if they leak, they can’t be misused for long.
After we’ve learned how Trusted Publishers and protected GitLab environments work, we will take another look at security considerations.
How do Trusted Publishers work?
The basic idea of Trusted Publishers is quite simple:
- In PyPI’s project settings, you add a Trusted Publisher and configure it with the GitLab URL of your project.
- PyPI will then only accept package uploads if the uploader can prove that the upload comes from a CI pipeline of that project.
The technical process behind this is based on the OpenID Connect (OIDC) standard.
Essentially, the process works like this:
- In your CI pipeline, you request an ID token for PyPI.
- GitLab injects the short-lived token into your pipeline as a (masked) environment variable. It is cryptographically signed by GitLab and contains, among other things, your project’s path with namespace.
- You use this token to authenticate with PyPI and request another token for the actual package upload.
- This API token can now be used just like “normal” project-scoped API tokens.
The Trusted Publishers documentation explains this in more detail.
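For illustration, here is the exchange as a small Python sketch (the CI job shown later in this article does the same with curl; the mint-token URL and the shape of the JSON response are taken from that job):

import json
import os
from urllib.request import Request, urlopen

# The short-lived, GitLab-signed ID token, injected into the job
# environment (the variable name matches the one used later on).
id_token = os.environ["PYPI_ID_TOKEN"]

# Trade the ID token for a short-lived PyPI API token.
req = Request(
    "https://pypi.org/_/oidc/mint-token",
    data=json.dumps({"token": id_token}).encode(),
)
with urlopen(req) as resp:
    publish_token = json.load(resp)["token"]
# `publish_token` can now be used like a project-scoped API token.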
One problem remains, though: An ID token can be requested in any pipeline job and in any branch. Malicious contributors could sneak in a pipeline job and make a corrupted release.
This is where environments come in.
Environments
GitLab environments represent your deployed code in your infrastructure. Think of your code running in a container in your production or testing Kubernetes cluster; or your Python package living on PyPI. :-)
The most important feature of environments in this context is access control: You can protect environments, restricting deployments to them. For protected environments, you can define users or roles that are allowed to perform deployments and that must approve deployments. For example, you could restrict deployments (uploads to PyPI) to all maintainers of your project, but only after you yourself have approved each release.
Note
Protected environments are a premium feature.
Non-profit open source projects/organizations can apply for a free ultimate subscription.
It seems that very old projects also have this feature enabled. Otherwise I can’t explain why I have it for Typed Settings but not for my other projects…
To use an environment in your CI/CD pipeline, you need to add it to a job in the .gitlab-ci.yml. If we also store the name of the environment in the PyPI deployment settings, only uploads from that environment will be allowed, i.e. only uploads that have been authorized by selected people.
Only maintainers can deploy to the release environment, and only after Stefan approved it.

Security Considerations
The last two sections have already hinted at this: GitLab environments are only truly secure if you can protect them.
Let’s take a step back and consider what threats we’re trying to protect against, so that we’ll then be able to choose the right approach:
- Random people doing a merge request for your project.
- Contributors with the developer role committing directly into your project.
- Co-maintainers with more permissions than a developer.
- A Jia Tan whom you trust even more than the other maintainers.
What can we do about it?
- Code in other people’s forks doesn’t have access to your project’s CI variables, nor can it request OIDC ID tokens in your project’s name. But you need to carefully review each MR!
- Contributors with only developer permissions can still request ID tokens. If you cannot use protected environments, using an API token stored in a protected CI/CD variable is a more secure approach. You should also protect your main branch and all tags (using the * pattern), so that developers only have access to feature branches. You’ll find this under Settings → Repository → Protected branches/tags.
- Protected CI/CD variables do not protect you from malicious maintainers, though. Even if you only allow yourself to create tags, other maintainers still have access to protected variables. Protected environments with only a selected set of approvers are the most secure approach.
- If a very trusted co-maintainer becomes malicious, there’s very little you can do. Carefully review all commits and read the audit logs (Secure → Audit Events).
So that means for you:
- If you are the only maintainer of a small open source project, just use a Trusted Publisher with (unprotected) environments.
- If you belong to a larger project with multiple maintainers, consider applying for GitLab for Open Source and use a Trusted Publisher with a protected environment.
- If there are multiple contributors and you don’t have access to protected environments, use an API token stored in a protected CI/CD variable and try to grant only developer permissions to contributors.
See also
Please also read about the security model and considerations in the PyPA docs.
Putting it all together
Configuring your GitLab project to use a trusted publisher involves three main steps:
- Update your project’s publishing settings on PyPI and TestPyPI.
- Update the CI/CD settings for your GitLab project.
- Update your project’s pyproject.toml and .gitlab-ci.yml.
PyPI Settings
Tell PyPI to trust your GitLab CI pipelines.
- Log in to PyPI and go to your account’s Publishing settings. Here, you can manage and add trusted publishers for your project.
- Add a new trusted publisher for GitLab as shown in the screenshot below. Enter your project’s namespace (your GitLab username or the name of your organization), the project name, and the filename of your CI definition (usually .gitlab-ci.yml). Use release as the environment name!
- Repeat the same steps for the TestPyPI, but use release-test as the environment name.
GitLab CI/CD Settings
You need to create two environments and protect the one for production releases.
Open your project in GitLab, then go to Operate → Environments and click Create an environment to create the production environment:
- Title: release
- Description: PyPI releases (or whatever you want)
- External URL: https://pypi.org/project/{your-project}/ (the URL is displayed in a few places in GitLab and helps you to quickly navigate to your project on PyPI.)

Click Save.
Click New environment (in the top right corner) to create the test environment:
- Title: release-test
- Description: TestPyPI releases (or whatever you want)
- External URL: https://test.pypi.org/project/{your-project}/

Click Save.
If protected environments are available (see the note above), navigate to Settings → CI/CD and open the Protected environments section. Click the Protect an environment button.
- Select environment: release
- Allowed to deploy: Choose a role or user, e.g. Maintainers.
- Approvers: Choose a role or user, e.g. yourself.
Restrict who can deploy into the release environment (and thus, upload to PyPI).
Changes in Project Files
In order to be able to upload each commit to the TestPyPI, we need a different version for each build. To achieve this, we can use hatch-vcs, setuptools_scm, or similar.
In the following example, we are going to use hatchling with hatch-vcs as the build backend and uv for everything else.
We configure the build backend in our pyproject.toml as follows:

[build-system]
requires = ["hatchling", "hatch-vcs"]
build-backend = "hatchling.build"

[tool.hatch.version]
source = "vcs"
raw-options = { local_scheme = "no-local-version" }  # TestPyPI lacks support for this

[project]
dynamic = ["version"]
Hint
Versions with a local component cannot be uploaded to (Test)PyPI, so we must disable this feature.
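To see what is being disabled, here is a small illustration using the packaging library (the version string is a made-up example):

from packaging.version import Version

v = Version("24.6.0.dev3+g1a2b3c4")
print(v.public)  # 24.6.0.dev3
print(v.local)   # g1a2b3c4  <- the part that "no-local-version" drops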
Now let’s open our project’s .gitlab-ci.yml, which we’ll edit during the next steps.

Hint

The snippets in the next steps only show fragments of the .gitlab-ci.yml. I’ll post the complete file at the end of the article.

We need at least a build and a deploy stage:

stages:
  - 'build'
  # - 'test'
  # - ...
  - 'deploy'
stage:stages: - 'build' # - 'test' # - ... - 'deploy'
Python build tools usually put their artifacts (binary wheels and source distributions) into
dist/
. This directory needs to be added to your pipeline artifacts, so that these files are available in later pipeline jobs:build: stage: 'build' script: - 'uv build --out-dir=dist' artifacts: paths: - 'dist/'
For our use-case, we need two release jobs: One that uploads to the TestPyPI on each push (
release-test
) and one that uploads to the PyPI in tag pipelines (release
).Since both jobs are nearly the same, we’ll also define an “abstract base job”
.release-base
which the other two extend.Hint
To improve readability and avoid issues with escaping, we’ll use YAML multiline strings. The >- operator joins the following lines without a line break and strips additional whitespace. See yaml-multiline.info for details.
.release-base:
  # Abstract base job for "release" jobs.
  # Extending jobs must define the following variables:
  # - PYPI_OIDC_AUD: Audience for the ID token that GitLab
  #   issues to the pipeline job
  # - PYPI_OIDC_URL: PyPI endpoint for retrieving a publish
  #   token with GitLab's ID token
  # - UV_PUBLISH_URL: PyPI endpoint for the actual upload
  stage: 'deploy'
  id_tokens:
    PYPI_ID_TOKEN:
      aud: '$PYPI_OIDC_AUD'
  script:
    # Use the GitLab ID token to retrieve an API token from PyPI
    - >-
      resp="$(curl -X POST "${PYPI_OIDC_URL}"
      -d "{\"token\":\"${PYPI_ID_TOKEN}\"}")"
    # Parse the response and extract the token
    - >-
      publish_token="$(python -c
      "import json; print(json.loads('${resp}')['token'])")"
    # Upload the files from "dist/"
    - 'uv publish --token "$publish_token"'
    # Print the link to PyPI so we can quickly go there to verify the result:
    - 'version="$(uv run --with hatch-vcs hatchling version)"'
    - 'echo -e "\033[34;1mPackage on PyPI:\033[0m ${CI_ENVIRONMENT_URL}${version}/"'
Now we can add the release-test job. It extends .release-base, defines variables for the base job, and rules for when the job should run:

release-test:
  extends: '.release-base'
  rules:
    # Only run if it's a pipeline for the default branch or a tag:
    - if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH || $CI_COMMIT_TAG'
  environment:
    name: 'release-test'
    url: 'https://test.pypi.org/project/typed-settings/'
  variables:
    PYPI_OIDC_AUD: 'testpypi'
    PYPI_OIDC_URL: 'https://test.pypi.org/_/oidc/mint-token'
    UV_PUBLISH_URL: 'https://test.pypi.org/legacy/'

The release job looks very similar, but the variables have different values and the job only runs in tag pipelines:

release:
  extends: '.release-base'
  rules:
    # Only run in tag pipelines:
    - if: '$CI_COMMIT_TAG'
  environment:
    name: 'release'
    url: 'https://pypi.org/project/typed-settings/'
  variables:
    PYPI_OIDC_AUD: 'pypi'
    PYPI_OIDC_URL: 'https://pypi.org/_/oidc/mint-token'
    UV_PUBLISH_URL: 'https://upload.pypi.org/legacy/'
The output of the release job will look like this. There’s also a link that takes you directly to the release on PyPI.
That’s it. You should now be able to automatically create PyPI releases directly from your GitLab CI/CD pipeline. 🎉
A successful GitLab CI/CD pipeline for Typed Settings’ v24.6.0 release.

If you run into any problems, you can
- check if the settings on PyPI match your GitLab project,
- read the Trusted Publishers docs,
- read the GitLab CI/CD YAML syntax reference,
- read the docs for GitLab environments and GitLab OIDC authentication.
You can leave comments over at Mastodon or Bluesky.
And, as promised, here is the complete (but still minimal) .gitlab-ci.yml from the snippets above. If you want to see a real-world example, you can take a look at Typed Settings’ pipeline definition.
# .gitlab-ci.yml
stages:
  - 'build'
  # - 'test'
  # - ...
  - 'deploy'

build:
  stage: 'build'
  script:
    - 'uv build --out-dir=dist'
  artifacts:
    paths:
      - 'dist/'

.release-base:
  # Abstract base job for "release" jobs.
  # Extending jobs must define the following variables:
  # - PYPI_OIDC_AUD: Audience for the ID token that GitLab issues to the pipeline job
  # - PYPI_OIDC_URL: PyPI endpoint for retrieving a publish token with GitLab's ID token
  # - UV_PUBLISH_URL: PyPI endpoint for the actual upload
  stage: 'deploy'
  id_tokens:
    PYPI_ID_TOKEN:
      aud: '$PYPI_OIDC_AUD'
  script:
    - >-
      resp="$(curl -X POST "${PYPI_OIDC_URL}"
      -d "{\"token\":\"${PYPI_ID_TOKEN}\"}")"
    - >-
      publish_token="$(python -c
      "import json; print(json.loads('${resp}')['token'])")"
    - 'uv publish --token "$publish_token"'
    - 'version="$(uv run --with hatch-vcs hatchling version)"'
    - 'echo -e "\033[34;1mPackage on PyPI:\033[0m ${CI_ENVIRONMENT_URL}${version}/"'

release-test:
  extends: '.release-base'
  rules:
    - if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH || $CI_COMMIT_TAG'
  environment:
    name: 'release-test'
    url: 'https://test.pypi.org/project/typed-settings/'
  variables:
    PYPI_OIDC_AUD: 'testpypi'
    PYPI_OIDC_URL: 'https://test.pypi.org/_/oidc/mint-token'
    UV_PUBLISH_URL: 'https://test.pypi.org/legacy/'

release:
  extends: '.release-base'
  rules:
    - if: '$CI_COMMIT_TAG'
  environment:
    name: 'release'
    url: 'https://pypi.org/project/typed-settings/'
  variables:
    PYPI_OIDC_AUD: 'pypi'
    PYPI_OIDC_URL: 'https://pypi.org/_/oidc/mint-token'
    UV_PUBLISH_URL: 'https://upload.pypi.org/legacy/'