How To Publish A Package On PyPI

Okay, we're going to get started with the second talk of this last session of PyCon AU 2018. Next up we have Mark Smith with "Publishing Perfect Python Packages on PyPI". Let's give him a warm welcome.

Thank you, everybody. I apologize for the last-minute change of title, but I suddenly realised there was a real opportunity for alliteration here and I couldn't turn it down. Originally the title was "How to Publish a Package on PyPI", which is much less interesting. I'm about to run through building and publishing a package from scratch, but first I'll talk about myself for a moment.

This is me. I am judy2k on Twitter, GitHub and pretty much everything else. My real name is Mark Smith and I'm a developer advocate at Nexmo; we're a sponsor of the conference. I stole this slide from my colleague Aaron. He complained about that before, so I thought I'd give him some credit this time. We offer web-based APIs that allow web developers to write code that sends text messages, makes phone calls and sends messages via different instant messaging platforms. If that sounds interesting, we are around for the rest of the conference; come and talk to me or Aaron, who's wearing a hoodie the same color as mine. And that's enough about me. I've got a lot to get through in a really short space of time, so I'm going to talk really quickly.

Whoa, this is actually falling apart. Can somebody help me with that? No, I think it's okay. That's new, they just slid forward. Okay, I'll keep an eye on it.

So, in March 2016 a developer removed a library called left-pad from npm, the Node.js equivalent of PyPI: a big web service containing packages. Removing it broke lots and lots of libraries that depended on it. left-pad had been downloaded 2,486,696 times in the month before it was removed, so it was very popular. It was just one function, eleven lines of code, that padded a string to a certain length by adding characters to the start of the string.
Lots of people thought that the existence of this library made the JavaScript community look kind of stupid. Why would anyone publish a library consisting of just 11 lines of code? Why would anyone use it? Now, the fact that it could be removed like this and cause these huge problems in the JavaScript ecosystem is a problem, but I think in general the existence of this library actually made the JavaScript community, and especially npm, look amazing. If you disagree, you can fight me on Twitter.

left-pad was not a problem; left-pad was a solution. Because the JavaScript standard library doesn't contain a solution for adding white space to the start of a line, the developer of left-pad solved the problem himself and published the solution for other people to use. And because it's really easy to share code on npm, people do, even with really small libraries: the type of thing that in other languages you would normally find on Stack Overflow or in a gist. But copy and paste is not how you should share code. I think we all agree on that.

I think people in the Python community are kind of afraid of setup.py files. Over the years the docs and best practice have been a bit tricky to put together. Opinions have changed over time and there are differing opinions online, so it's difficult to work out how to do things properly, or even whether you are doing things properly. But they are getting better all the time, and hopefully this talk will help in some small way. Every time you publish code on a shared repository you make the Python ecosystem stronger and you make somebody's life easier, and these are all good things. Where would we be if Django, NumPy or pandas had never been published?

So let's make a package. The idea of this talk is that you're working on a project, you've written some code, and you realize it's not that specific to the project, so you want to share it with other people. So let's start with some code that could be useful to other people.
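The slide with the code is not captured in this transcript. As a minimal sketch of what such a module might contain, assume a single function named say_hello that takes an optional name. The function name and the use of an f-string are mentioned later in the talk; the exact body and greeting text are guesses.

```python
# src/helloworld.py
# Hypothetical reconstruction: the original slide is not in the transcript.


def say_hello(name=None):
    """Return a greeting, optionally addressed to `name`."""
    if name is None:
        return "Hello, World!"
    # f-strings need Python 3.6 or later, which matters for the
    # classifiers discussed later in the talk.
    return f"Hello, {name}!"
```

Once the package is installed, this would be used as `from helloworld import say_hello` followed by `say_hello("Everybody")`.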
I'm sure everyone in this room has had to write this code at one time or another. Wouldn't it be better if you could just pip install it and call it?

The first thing to do is extract the code. Here we've extracted it into a file called helloworld.py, and we've put it in a src directory; I'll explain that in a minute. The next step to make this a publishable package is to write a setup.py file in our helloworld directory that contains all this stuff. This is about the simplest setup.py file you can write.

We're using setuptools, which is a third-party dependency. It never used to be recommended unless you needed the extensions that it provided, but these days setuptools is a dependency that pip has, so you already have it available. It's more powerful than distutils, so we're going to use it. So we import setup, and although we're calling a function, really all those arguments passed to setup are just metadata describing our library.

The name of our library is helloworld. This is what you will pip install, and this is how it will be looked up in the PyPI database. We've picked a really low version number because we're probably going to iterate several times as the package solidifies in the early stages. I've written a description here which isn't very useful; you would write a one-line description so people can understand very quickly what the library does. Then there's py_modules. In this case it contains the same thing as name, and it often will, but this is what you import, not what you pip install. It's essentially the .py file that's going to be copied into the package and uploaded to PyPI. And then this last line is needed because we have a src directory, which I mentioned on the previous slide. It's kind of a magic copy-paste thing; just do that.

So let's test it. We're already at the stage where we
can build our package. We call the setup.py file with this bdist_wheel command, which is short for "build me a wheel binary distribution file". Wheel is the file format that is currently the standard for pip. We get lots of output from this, and I've cut lots of it out. The important line is the one I've highlighted, which says that it's copied our source code into the lib folder, so that's a good sign.

Now our project looks like this. It's created an egg-info folder, which I kind of hate, but we'll just ignore that for the rest of the talk. We have a build folder, into which it has essentially copied our source code, so that's good. And then it zipped it up into this .whl wheel file, which is what we wanted. So we know that it builds and we didn't get any errors, but we don't know whether everything we need is in there.

So let's install it. We install the package we just created into our own virtualenv using this pip install command, but you may be wondering what this "-e ." at the end is. The -e flag means you're installing it as editable code. That means it's not going to copy the code in the src folder into the virtualenv's library folder; instead it links the src folder onto the Python path, so that when you import helloworld you are importing the file that you're editing. Everything you edit in src is the thing that's imported in your current virtualenv. And the dot at the end just means you're installing the current directory, so it runs the setup.py in the directory that you're currently in, which is our helloworld distribution. The end result is that your helloworld library is now importable in your virtualenv.

So let's test that. We can run Python, we can import helloworld, we can import say_hello, and we can call
it a couple of times. This is going to get a bit tiresome if we have to do it every time to make sure our setup file is still correct, so we'll sort that out in a minute.

In theory we can upload this to PyPI already, and you might want to do that at this early stage to make sure that you reserve the name of your package, so somebody else doesn't take it, and to make sure that your basics are set up. But I would say that before we publish to PyPI there are three things we need to do. We've got documentation and testing, but first a little bit of housekeeping.

You need a .gitignore file. As you've seen, we've just built the project and it's created this dist folder, the build folder and so on. Those are generated files, so you don't want to add them to git. There's a great site called gitignore.io: you just type Python in the text box, hit create, and it will spit out a bunch of text covering common artifacts of Python projects, which you can paste into a .gitignore file.

Then we need a license. choosealicense.com is a great human-readable site for understanding what powers you're granting your users, so I recommend going through that. Eventually it'll give you the text of the license you're looking at, and you paste that into a LICENSE.txt file.

And finally there's some metadata that you should really add to your setup file. I don't know if you noticed the f-string in that function; f-strings were only introduced in Python 3.6. So here are some classifiers that say specifically that it's a Python 3 library, and more specifically that it's Python 3.6 and 3.7 compatible. Over time, as new releases of Python come out, you'll want to test on those and update this, so that people understand it's compatible with the version of Python they're using. I've also put in the fact that we're using the GPLv2+ license on this
particular package, and I've said it's OS-independent because we're not doing anything native to any one platform.

So now we come to documentation. First you need to make a big choice up front: you need to pick the file format you're going to use for your documentation. There are other options, but generally it comes down to reStructuredText or Markdown, and they have various trade-offs. reStructuredText is slightly more powerful and is very well known in the Python community, but not so much outside it, and you can use Sphinx to generate a static documentation site from it. Markdown is simpler and other language communities understand it better, and you can use MkDocs, which is basically the equivalent of Sphinx but for Markdown files. Both of those are supported by Read the Docs, so if you're planning to publish your documentation there it doesn't matter which of these two choices you make. I'm going to choose Markdown, just because it's quick and easy.

So we can write a README file, with .md to say it's a Markdown file. You should give it a title, which is the name of the project, and a one-line description of what the project is for. You should have installation instructions in there; this is just very simple. And you should have some basic code examples to show how the library might be used. In some projects this may be all you need; in other projects you will want to use Sphinx or MkDocs and publish to Read the Docs.

Now that we have a README, we've got a long description of the project, and that's a handy thing to publish to PyPI. When you look at packages on PyPI you'll notice that they usually have essentially a copy of the README from their GitHub page, so that's exactly what we're going to do. Because the setup file is Python code, we can open the README file, read it into a variable, and then we can provide it to our setup function as
the long_description argument. And this last argument here is a nice thing: PyPI now supports Markdown, but we have to tell it up front, so we tell it that the content type is text/markdown. It supports plain text, reST or Markdown these days.

So now we want to test, and I would recommend using pytest, because pytest is awesome. But this now means we have a development dependency: other contributors will need to install pytest. So we need something like a requirements.txt file or a Pipfile. I'm getting a lot of usage out of pipenv these days, so we're going to use a Pipfile, because it may be new to some members of the audience and it's kind of new and fun.

So we need a Pipfile. We don't need to write this ourselves; once we've got pipenv installed it will essentially maintain this file for us, mostly. The first thing to do is to tell our development dependency file that we want to install the current module. This is kind of a copy of our "pip install -e ." command, but now it's pipenv, which means it will also store this information in the Pipfile. Secondly, we install our development dependency, pytest, saying that we want something compatible with 3.7 and up. And then we run pipenv shell. pipenv also manages your virtualenv for you, so shell will drop you into the virtual environment it created with the other two commands.

So this creates a Pipfile. I think these are pretty straightforward, except for this first highlighted line, which is how the "-e ." command is stored. It's got a magic string at the start, which I don't understand. It's got path equals dot, which makes sense, and it's got editable equals true, which kind of speaks for itself. Then we've got the dev packages that we depend upon: we've got pytest, and it says greater than or equal to 3.7. This means that when other pipenv users download our project, they can just run pipenv install and it will install
this stuff for them, which is kind of cool.

But the really cool thing about Pipfiles and pipenv is that it also creates these lock files. As well as storing this information in the Pipfile, it also installs the packages. It goes off to PyPI and asks what versions of pytest it has that are greater than or equal to 3.7; it found 3.7.2 and installed it for us, and it stored that exact version in the lock file. You commit this file to git, and then when other people install your project using pipenv they get exactly the version that you're using. When there's a new release that's compatible with your specifier, you can run an upgrade and it will update your lock file and install the newer version of pytest. So it can keep you up to date, but you would only commit that after you've run your tests, which keeps you stable.

So now we have two dependency lists. Well, we don't, because this setup file doesn't have any dependencies, but if we did, it's for production dependencies like Flask, click, NumPy or pandas, and the versions should be as relaxed as possible, because you don't want to force your users to use exactly the same version as you. If your library can be compatible across multiple versions of its dependencies, it should be; then your users get to use their favorite version of pandas and you're not locking them into a specific one.

Pipfiles are slightly different, and different again from requirements files, where you should be as specific as possible. You can use the same kind of version specifiers in your Pipfile, because it locks in the versions in the lock file, as we just explained. But this is for development requirements: the things other developers need to work on your code or run your tests, like pytest, mock and coverage.
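As a rough sketch of the relaxed-versus-pinned distinction described above, assume a hypothetical library that depends on click and pandas. The package names come from the talk; the version bounds are invented for illustration, and the helper function is not part of any tool. Only the production list would go in setup.py; development tools such as pytest stay in the Pipfile.

```python
# Hypothetical fragment of a setup.py; all version bounds are illustrative.

# Production dependencies: keep the specifiers as relaxed as the code
# allows, so users can choose their own compatible versions.
INSTALL_REQUIRES = [
    "click>=6.0",
    "pandas>=0.20",
]

# What to avoid in a library: exact pins force every user onto the same
# versions as the author. Exact versions belong in the lock file instead.
PINNED = [
    "click==6.7",
    "pandas==0.23.4",
]


def is_relaxed(requirement):
    """Return True if the requirement string does not pin an exact version."""
    return "==" not in requirement


# In setup.py the relaxed list would be passed to setuptools as:
#     setup(..., install_requires=INSTALL_REQUIRES)
```

The setup call is left as a comment so the fragment can be read and run on its own without triggering a build.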
We have pytest installed, so we can write some tests. Anybody who's worked with pytest before will know how easy it is to write simple tests, and then it scales up nicely using fixtures and things like that. Here are some tests that just check that the output of our function is what we expect. We can run them using the pytest command, and it prints out some stuff saying 2 tests passed in 0.02 seconds. That's all good.

So now this is what our project looks like. There's still not a huge amount there, but it's gradually building up, and we can now publish nicely documented binary distributions. So that's good.

The other type of distribution, which sometimes gets overlooked, is the source distribution. For whatever reason, other people may not be able to use git to pull things down, or may not be able to access PyPI, and they may need a tarball of all of your code that they can audit and then put into their own repository; it's the way companies like Google and Facebook work. So we run this sdist command to create a tarball containing all our source code. Really we want it to contain everything, all the files that we normally distribute via git.

When we run sdist, as a side effect, it tells us there's some extra metadata it would like: a URL and some author information. So we open up our setup.py file and add that in. For the URL I've used my GitHub page; if you had a documentation page, that might be another option. It's really down to personal choice.

Then we can test the tarball by listing the files inside it. When we look at it, it's done a pretty good job: we've got our setup.py file and we've got a src directory with all the stuff in there. But we're missing a few things. We're missing our license file, we're missing our Pipfiles, and we're missing our tests. So that's not really a very
good source distribution. The way you fix this is by writing a MANIFEST.in file. I'm not going to show you what this looks like, but it's a file containing rules that match certain file names and file types and add them to the list of things that go into your source distribution. These files are slightly fiddly to write, they're no fun, and they're easy to get wrong. So instead I recommend a tool called check-manifest, which is just amazing. It tries to create manifest rules that match what's in git, that is, what you've committed, which matches the basic idea of what should go in a manifest. You install it like this, you run check-manifest --create, and it will write those rules for you. It's a great way to get started with a manifest file. Then you add that manifest to git, and when you run your sdist command again and list the contents of your tarball, you get things like the Pipfile and the Pipfile.lock. Just make sure that you have committed the files to git before you run check-manifest --create, because that's the point of check-manifest.

So now we're really at the point where we can publish, so let's get this thing on PyPI. Here we run our setup command, telling it that we want to create the wheel and the source distribution, and afterwards we can check that those have appeared in the dist folder. You can see we've got a wheel file and a .tar.gz tarball.

Then we want to push those to PyPI, and it's really simple. You go to PyPI and create an account; it will prompt you to create a username and a password. Then we want to use twine to upload. setuptools can upload your distributions itself, but you shouldn't use that command, for various reasons, one of which is that until relatively recently it was insecure. twine has been secure from the start. It also separates building your
distributions from uploading your distributions, so you can build your tarball and check it, rather than building and uploading it and hoping that you've got it all right. So it's brilliant, and really easy to use. Once you've installed it, you run twine upload and give it a list of files that you want to upload. We're using a wildcard on our dist folder, because we just want to upload everything that's in there. It prints out a bunch of stuff telling you it's uploaded, and then you go to PyPI. They have a box there that lists recently updated packages, and hopefully you should be in that box if you're quick enough on the browser.

And here's our library. I've renamed it slightly, because obviously somebody has already uploaded a package called helloworld to PyPI, so I've added my username to the end. It tells you how to install it, there's a nicely formatted Markdown README in the page, and down the bottom, where you can't see, there's a link to our GitHub repo. It's pretty much what you expect to see with a published package.

So those are really the bare essentials for publishing a package. We've got some tests and documentation, and we've published the package so that people can use it. I would recommend doing a few other things, but maybe publish first before moving on to the other stuff. It's better to make stuff available than to perfect it; if you're constantly trying to perfect it before releasing it, you never release it.

One of the things I'd like to fix next is that we should really test against different versions of Python, and there's a tool called tox, if people haven't heard of it before, that does exactly that. A tox configuration can be really simple, and this one is fine for the current project. It's really three lines of configuration. The first says we want to test under Python 3.6 and 3.7. Then we say our dependency is pytest, for running the tests. And then we say that to actually run the tests, it should run the pytest command that was installed by the previous line. What tox does is create a virtual environment for each version, 3.6 and 3.7 in this case, install your package into that environment, and then run the command that you told it to run. Assuming that exits with a zero exit code, everything's fine.

Running it looks a bit like this. We just run the tox command, and you can see that it's running under Python 3.6 and then under the alpha version of Python 3.7. Further down, assuming everything goes okay, we get this nice summary saying "commands succeeded", "commands succeeded", and if all the different virtualenvs pass you get this nice congratulations line with a little smiley.

One thing I wanted to highlight, which I mentioned earlier, is why we have a src directory. When tox runs your tests, your current working directory is that top-level directory, the one we're working in and the one that contains our setup.py. When you run Python commands, your current working directory is on your Python path; it's the first thing on your Python path. So if Python is looking for a file called helloworld.py, it will load the one in the top directory; it won't load the one that was installed by your setup.py. You can end up in situations where your setup configuration is wrong, say it's not copying the helloworld file into your virtualenv, but your tests still all pass. This is why we have a src directory: to move our code out of the importable path unless it's specifically installed. Again, this is why we need to run "pip install -e ." to install it once. It just stops us from accidentally importing the wrong copy of helloworld.

So tox allows us to create these contained environments on our machine and run tests inside them, and that's great, but developer machines are messy
things. We have things in our path that maybe we're not using all the time, and we have environment variables set that may affect the running of your tests. Really what you want is a container or a virtual machine to run these tests for you. There are nice services like Travis that will do this for you if you've got an open source library; all you need is a Travis configuration file that you push to GitHub, and this is all free.

This is what a basic Travis configuration looks like. We say it's a Python project, and we say we want to run it under Python 3.6 and the development version of Python 3.7. Then we tell it how to install our project, which in this case is just pip install tox; it will automatically install from our setup.py as part of the whole kickoff before the tests. And then it will run the script that you asked it to. In this case we asked it to run tox in verbose mode, in the environment py. This is something I only learned recently: if you ask tox to run the py environment, it will pick up whatever Python is available, so we're using the Python that Travis installed for us. It creates a virtualenv from that and then runs your tests inside it.

So every time we commit now, Travis will pick it up, create a couple of virtual machines and run our tests, and then send us a message on GitHub to say whether our tests passed or failed, which is kind of cool. It makes working with other developers easier and it makes contributing easier, because contributors can see, without you even saying anything, that their pull request doesn't pass the tests, and that allows them to fix it before you ever need to go and have a look at it. Which is good.

For extra credit, there's other stuff you can do. We should add badges to the top of the README to give people more confidence in the project. A simple badge to add is often code coverage, using one of these other services like Coveralls and Codecov,
and they all integrate really nicely with GitHub. We can add quality metrics, where a service essentially runs linting on the code and tells you about possible mistakes you've made, formatting errors and things like that. bumpversion is a really nice tool for managing your version numbers, and it will automatically tag your commits to tell GitHub that you're making a release. You should test on different platforms. You can write more documentation; you can always write more documentation.

This is really a lot of work. I don't know if you noticed, but I've run through all this in half an hour; if you're going through it in detail it takes quite a lot longer. This is maybe a day or two's work to really refine all this stuff. So what I recommend is that you don't do any of this. This is the biggest bait-and-switch I've ever done in a talk. At least now you know why you do all these things and how they all go together, but it's repetitive, it's error-prone and it's kind of boring. You know what's really good at fixing problems that are repetitive, boring and error-prone? Tools, on computers.

In this case we've got code that writes code. There's this awesome project you may have heard of called cookiecutter, which runs through a template and builds the files and folders for you. It can ask you a few simple questions to customize the things that it's making, and there are templates for a whole bunch of things, like Ansible configuration, but among other things creating new Python projects, which I think is what it was really designed for in the beginning.

So let's pretend for a moment that I didn't spend twenty-five minutes telling you how to build a project from scratch, and instead we use cookiecutter. The first thing to do is install cookiecutter, and then we run cookiecutter and give it the path to a specific template. In this case we haven't even
downloaded the template up front. We're pointing it at a template called cookiecutter-pylibrary, which is maintained by a guy I vaguely know called Ionel. I'd love to pronounce his surname, but it's got accents in it that I've never seen before, so I'm not even going to attempt that. He's also written a bunch of blog posts on good packaging practices and easy mistakes you can make when you're packaging, so those will be up on the repo for this talk, probably later on today or tomorrow.

First, it asks you lots and lots of questions. That's because, whereas I've given you a single pass through building a package, this template can work with pytest, it can work with nose, it can work with unittest, and it wants to know which of these you want to use, along with a bunch of other services it can integrate with. So it asks you lots of questions, but it's still quicker than writing those config files by hand. Once it's generated this folder full of files, you copy in your own code and tests. You'll need to make some minor tweaks to the configuration it has generated for you, to specify the versions of Python you want to test against and things like that, and then you're done. The end result looks a bit like this.

When I last ran through this process it took me five minutes. It did take a little bit of manual configuration, but it took about five minutes from beginning to end, and I could have pushed it to PyPI if I'd wanted to. So I could have cut this entire talk down to two slides. In fact, really, this is a lightning talk that I just spent 30 minutes giving; I could have done that if I'd wanted, instead of wasting all your time. But hopefully this gives you a good overview of good packaging practice, encourages you to publish your own packages on PyPI, and maybe helps you understand what's going on under the hood. These slides, a bunch of supporting material and a bunch of references will be up on GitHub
tomorrow, probably. I recommend you follow me on Twitter, but that's because I like what I publish on Twitter. Thank you very much.

We have a little gift for you. Thank you so much for speaking. He's not going to be taking questions at this time, but you could probably talk to him out in the hall.

Absolutely, I'm around.

And the next talk here is going to be at 2:50, so in about 10 minutes.
