Proposal: Simpler source file headers

Stein Magnus Jodal

16 Oct 2008

15:19

Hi,

I've written up a proposal on what source file headers should contain at [1]. Comments, changes, questions welcome from everyone. Approval of the proposal welcome from Vidar and/or Morten.

[1] http://metanav.uninett.no/devel:source_file_headers

-- Stein Magnus Jodal UNINETT

Vidar Faltinsen

18 Oct

11:25

Stein Magnus Jodal wrote:

...

Hi,

I've written up a proposal on what source file headers should contain at [1]. Comments, changes, questions welcome from everyone. Approval of the proposal welcome from Vidar and/or Morten.

[1] http://metanav.uninett.no/devel:source_file_headers

Very good initiative! The details on this matter I leave to Morten to decide!

- Vidar

Morten Brekkevold

20 Oct

15:27

On Thu, 16 Oct 2008 15:19:12 +0200 Stein Magnus Jodal stein.magnus.jodal@uninett.no wrote:

...

Hi,

I've written up a proposal on what source file headers should contain at [1]. Comments, changes, questions welcome from everyone. Approval of the proposal welcome from Vidar and/or Morten.

[1] http://metanav.uninett.no/devel:source_file_headers

Good initiative, I have some comments. *Verbosity alert*

You propose to reduce the comment part of the headers, but you do not address the redundancy of this data in Python meta-variables.

I like the idea of having this data machine-readable, but I'm not certain whether the usage of these meta-variables in Python is standardized somehow. I know that at least the __author__ variable is parsed and displayed in a section of its own by pydoc, for instance. Does anyone know of any attempt to standardize or make a common convention out of these variables?

The boilerplate header text you propose to remove comes directly from the suggestions in the GPL howto [1]. The boilerplate text boils (!) down to the following parts:

* Copyright notices.
* "This file is part of software X."
* "software X (and thus this file) can be redistributed under the terms of the GPL"
* Disclaimer of warranty
* Where to find the text of the license

Releasing our work under the GPL does, AFAIK, in no way require us to use this boilerplate text as is or at all. I was initially leaning towards supporting your idea of reducing the boilerplate, but having read the boilerplate and the howto over again, I'm not so sure.

I now think we should change the boilerplate somewhat to explicitly state that the software can be redistributed under the terms of the GPL _v2_, which was the original intention, but that we should still include a boilerplate in all files with any meaningful/significant piece of code. I.e. a mostly empty __init__.py file doesn't need any boilerplate, but once you start tossing some real code into it, the boilerplate should be added.

As for dropping author information from source code, I'm against it. Files more often than not appear in non-VCS contexts, and this piece of (dis)credit to the authors should be part of the file contents. You don't remove your name from your master's thesis just because that piece of information is available in the VCS you stored your thesis in, do you?

With regards to SVN keywords, most of these actually come from the good (kidding!) old CVS days. These were brought unchanged into SVN, and keyword substitution was enabled in SVN for most of these files. This functionality is also available in Mercurial (although as a plugin, methinks), but so far we haven't enabled it.

I've always thought of the $Id$ keyword as a useful tool for debugging problems where people have been mixing and matching several branches or trunk revisions onto their development servers. In practice, however, I've barely used it for anything, so I wouldn't miss it if we removed it. The other keywords in use are just variants of the Id keyword, which are there to fulfill the same purpose. I don't think I've added the keyword to any files I've created in a long time.
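As an illustration of how leftover keywords could be located before removing them, here is a minimal sketch. It is not part of the proposal: the function name `find_keywords` and the keyword list are assumptions, since the thread only names $Id$ explicitly and says the others are variants of it.

```python
import re

# Matches both the unexpanded form ($Id$) and the expanded form
# ($Id: foo.py 42 2008-10-16 jodal $). The keyword list is an assumption;
# only $Id$ is named in the thread.
KEYWORD_RE = re.compile(r"\$(Id|Revision|Date|Author|HeadURL)(:[^$\n]*)?\$")


def find_keywords(text):
    """Return (line_number, keyword) pairs for every keyword found in text."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in KEYWORD_RE.finditer(line):
            hits.append((number, match.group(1)))
    return hits
```

Running this over a source tree would give a list of files and lines to clean up; a plain dollar sign such as '$5' is not reported because it is not followed by a known keyword name.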

As a side issue, the docstring from your proposed Python header example says "A line describing this module.". We're not attempting to define docstring policy for modules here, so it should suffice to say "Module docstring". We tell people to use PEP-8 [2] as our Python code style guide. This PEP also references PEP-257 [3], which documents docstring conventions; I also think we should adopt this PEP along with PEP-8.

[1] http://www.gnu.org/licenses/gpl-howto.html
[2] http://www.python.org/dev/peps/pep-0008/
[3] http://www.python.org/dev/peps/pep-0257/

-- mvh Morten Brekkevold UNINETT

Stein Magnus Jodal

16:16

On Mon, 2008-10-20 at 15:27 +0200, Morten Brekkevold wrote:

...

On Thu, 16 Oct 2008 15:19:12 +0200 Stein Magnus Jodal stein.magnus.jodal@uninett.no wrote:

...
I've written up a proposal on what source file headers should contain at [1]. Comments, changes, questions welcome from everyone. Approval of the proposal welcome from Vidar and/or Morten.

[1] http://metanav.uninett.no/devel:source_file_headers

Good initiative, I have some comments. *Verbosity alert*

Alert needed ;-)

...

You propose to reduce the comment part of the headers, but you do not address the redundancy of this data in Python meta-variables.

Yes, I do. I propose to use the comments, but as long as we use either the variables or the comments, and not both, I'm in.

"Variables: Readable to a human, just like the comments, but also easily extracted programmatically. Only the __version__ variable is mentioned in PEP 8. May be preferable to comments, but we do not extract them in any way, so let's just keep to the dead simple comments."
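To make "easily extracted programmatically" concrete, here is a minimal sketch of how such variables could be read without importing the module. It is an illustration only, not something NAV does: the function name `module_metadata` is hypothetical, and the code assumes a modern Python 3 (3.8 or later, for `ast.Constant`) rather than the Python of 2008.

```python
import ast


def module_metadata(source):
    """Collect top-level __dunder__ = "string" assignments without importing."""
    found = {}
    for node in ast.parse(source).body:
        if not isinstance(node, ast.Assign):
            continue
        value = node.value
        # Only plain string literals are considered metadata here.
        if not (isinstance(value, ast.Constant) and isinstance(value.value, str)):
            continue
        for target in node.targets:
            if isinstance(target, ast.Name):
                name = target.id
                if name.startswith("__") and name.endswith("__"):
                    found[name] = value.value.strip()
    return found
```

Parsing instead of importing avoids executing the module's code, which matters if the file has side effects or missing dependencies.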

...

I like the idea of having this data machine-readable, but I'm not certain whether the usage of these meta-variables in Python is standardized somehow. I know that at least the __author__ variable is parsed and displayed in a section of its own by pydoc, for instance. Does anyone know of any attempt to standardize or make a common convention out of these variables?

As far as I know, there is no standard here, except PEP 8 stating that you should use __version__ "If you have to have Subversion, CVS, or RCS crud in your source file".

*snip boiled down boilerplate*

...

Releasing our work under the GPL does, AFAIK, in no way require us to use this boilerplate text as is or at all. I was initially leaning towards supporting your idea of reducing the boilerplate, but having read the boilerplate and the howto over again, I'm not so sure.

I now think we should change the boilerplate somewhat to explicitly state that the software can be redistributed under the terms of the GPL _v2_, which was the original intention, but that we should still include a boilerplate in all files with any meaningful/significant piece of code. I.e. a mostly empty __init__.py file doesn't need any boilerplate, but once you start tossing some real code into it, the boilerplate should be added.

May I ask why you believe this is needed? I've read the GPLv2 license a couple of times, but I am in no way tempted to do a third read at the moment.

My problem with the boilerplate is simply its size.

...

As for dropping author information from source code, I'm against it. Files more often than not appear in non-VCS contexts, and this piece of (dis)credit to the authors should be part of the file contents. You don't remove your name from your master's thesis just because that piece of information is available in the VCS you stored your thesis in, do you?

I don't believe that's a fair comparison.

Anyway, how useful is it to be able to blame me for my code in three years' time or when debugging a NAV installation? Things like that only serve to discourage old-timers from hanging around in the IRC channel (yes, this is a #ref).

The useful point in time for blaming someone for their code is at or right after the time of commit. That is when the committer still has the piece of code fresh in mind and may actually have the possibility to improve it.

*snip CVS/SVN keyword paragraphs*

...

As a side issue, the docstring from your proposed Python header example says "A line describing this module.". We're not attempting to define docstring policy for modules here, so it should suffice to say "Module docstring". We tell people to use PEP-8 [2] as our Python code style guide. This PEP also references PEP-257 [3], which documents docstring conventions; I also think we should adopt this PEP along with PEP-8.

IMHO, "A line describing this module." and "Module docstring" say about the same thing. The point here is the placement of the docstring relative to the other elements.

Including PEP-257 in our set of conventions is of course supported.

-- Stein Magnus Jodal UNINETT

Morten Brekkevold

23 Oct

15:05

On Mon, 20 Oct 2008 14:16:42 +0000 Stein Magnus Jodal stein.magnus.jodal@uninett.no wrote:

...

As far as I know, there is no standard here, except PEP 8 stating that you should use __version__ "If you have to have Subversion, CVS, or RCS crud in your source file".

We've discussed various aspects of the issues in this thread at today's status meeting. While there seem to be no officially documented conventions for metavariable usage in Python files, the __copyright__, __author__ and __license__ variables seem pretty prevalent (just ask Google).

I've come up with this suggestion for Python headers:

---[snip]---
#! /usr/bin/env python
# -*- coding: utf-8 -*-

"""Module docstring, using PEP-257 conventions."""

__copyright__ = """
Copyright (C) 2002-2004 Norwegian University of Science and Technology
Copyright (C) 2007-2008 UNINETT AS
"""
__license__ = "GPLv2 http://www.gnu.org/licenses/old-licenses/gpl-2.0.txt"
---[snip]---

An alternate suggestion for the value of __license__ is:

__license__ = """This file is part of Network Administration Visualized (NAV).

NAV is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License version 2 as published by the Free Software Foundation.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with NAV. If not, see http://www.gnu.org/licenses/. """

As for "wasting" as little monitor space as possible, as you put it, it's time you upgrade to a real text editor (tm) which can handle more than one monitor page of text :->

I would like to keep the "this file is part of NAV"-part for situations where someone takes a file out of its NAV context.

The snippets above should be easy to adapt into comments in files written in other programming languages.
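As a rough sketch of that adaptation, the helper below turns a plain-text notice into line comments for a language with a different comment prefix. The function name `as_comment` and the shortened `NOTICE` text are illustrative assumptions, not part of NAV.

```python
def as_comment(text, prefix="#"):
    """Turn plain text into a block of line comments using the given prefix."""
    lines = []
    for line in text.strip().splitlines():
        # rstrip() keeps blank lines as a bare prefix without trailing space.
        lines.append((prefix + " " + line).rstrip())
    return "\n".join(lines)


# Shortened example notice; a real header would carry the full license text.
NOTICE = """
Copyright (C) 2007-2008 UNINETT AS

This file is part of Network Administration Visualized (NAV).
"""

print(as_comment(NOTICE, prefix="//"))  # e.g. for Java or C++
print(as_comment(NOTICE, prefix="--"))  # e.g. for SQL
```

Languages that only have block comments would need a different wrapper, which this sketch does not cover.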

...

...
piece of code. I.e. a mostly empty __init__.py file doesn't need any boilerplate, but once you start tossing some real code into it, the boilerplate should be added.

May I ask why you believe this is needed? I've read the GPLv2 license a couple of times, but I am in no way tempted to do a third read at the moment.

My problem with the boilerplate is simply its size.

I think it is needed because not everyone is as intimately acquainted with what a GPL license is or means as you and I are. Also, the disclaimer is a legal precaution, which is why the FSF suggests it in the first place.

We can't expect everyone to look up or recall the entire text of the GPL when they see it mentioned in a file, so we repeat one important sentiment, which has nothing to do with actual redistribution of the software itself: You're on your own.

...

Anyway, how useful is it to be able to blame me for my code in three years' time or when debugging a NAV installation? Things like that only serve to discourage old-timers from hanging around in the IRC channel (yes, this is a #ref).

The useful point in time for blaming someone for their code is at or right after the time of commit. That is when the committer still has the piece of code fresh in mind and may actually have the possibility to improve it.

You seem to be overly preoccupied with blame here. Should I interpret this as though you don't think any of the code you've written is worthy of praise?

I can't see that you've provided any useful or convincing arguments against listing authors in source code. Thomas did, however, present one argument during today's meeting, which actually changed my mind.

Quoting from the Subversion Hacker's guide, http://subversion.tigris.org/hacking.html#other-conventions :

«We have a tradition of not marking files with the names of individual authors (i.e., we don't put lines like "Author: foo" or "@author foo" in a special position at the top of a source file). This is to discourage territoriality — even when a file has only one author, we want to make sure others feel free to make changes. People might be unnecessarily hesitant if someone appears to have staked a personal claim to the file.»

According to Thomas, the Subversion team maintains a file that lists all contributors, much like we do in the about.html file.

We also discussed the old "dilemma": How much do I need to contribute before I can/should add myself to the list of Authors? This dilemma, of course, disappears when Authors are dropped from source files.

I've come to the conclusion that we shouldn't require any sort of Author tagging in our files (though we never did write down such a policy, it remained by convention).

But two questions remain:

* Should we start removing Authors-lines willy-nilly right now?
* Should we refuse to accept new code which is tagged with Authors?

...

Including PEP-257 in our set of conventions is of course supported.

It's on my TODO-list for updating the HACKING file (and moving it to the wiki).

-- mvh Morten Brekkevold UNINETT

Jørgen Abrahamsen

5 Jan

10:54

On Thu, Oct 23, 2008 at 03:05:25PM +0200, Morten Brekkevold wrote:

*snip whole discussion*

...

But two questions remain:

Should we start removing Authors-lines willy-nilly right now?

Should we refuse to accept new code which is tagged with Authors?

I think we should start removing Author-lines right away either as you go or in chunks - whatever rocks your boat. And also refuse code tagged with authors. This should be stated in the HACKING file.

To sum up what's been decided so far:

- Remove the legacy $Id$ keyword
- Simplify the source file header. (I support Morten's suggestion with a disclaimer included in the version tag.)

Have I missed something?

Hopefully we can close this discussion and start with the migration soon.

-- Jørgen Abrahamsen

Morten Brekkevold

22 Jan

10:09

On Mon, 5 Jan 2009 10:54:56 +0100 Jørgen Abrahamsen jorgen.abrahamsen@uninett.no wrote:

...

I think we should start removing Author-lines right away either as you go or in chunks - whatever rocks your boat. And also refuse code tagged with authors. This should be stated in the HACKING file.

To sum up what's been decided so far:

Remove the legacy $Id$ keyword

Simplify the source file header. (I support Morten's suggestion with a disclaimer included in the version tag.)

Have I missed something?

Hopefully we can close this discussion and start with the migration soon.

In light of the IRC discussion last week, and the risk of turning this into a bikeshed (as someone commented), I'm going to ab^H^Huse my authority and cut through with the following decisions.

We're tossing out the metavariables, as they seem to be non-standard, won't be read by machine AFAIK, and they clutter up the code. The following is the new template for Python files:

--[snip]--
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright (C) 2002-2004 Norwegian University of Science and Technology
# Copyright (C) 2007-2009 UNINETT AS
#
# This file is part of Network Administration Visualized (NAV).
#
# NAV is free software: you can redistribute it and/or modify it under
# the terms of the GNU General Public License version 2 as published by
# the Free Software Foundation.
#
# This program is distributed in the hope that it will be useful, but WITHOUT ANY
# WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
# PARTICULAR PURPOSE. See the GNU General Public License for more details.
# You should have received a copy of the GNU General Public License along with
# NAV. If not, see http://www.gnu.org/licenses/.
#
"""Module docstring, using PEP-257 conventions."""
--[snip]--

The following rules apply:

1. Modules that are not to be executed as programs shall not contain the shebang on the first line.

2. The copyright notices shall (of course) be correct, the above are only examples. For those working for UNINETT, the UNINETT copyright year must be updated (if necessary) when less than trivial changes are made to a file.

3. No individual authors shall be listed in the comments or in metavariables. The only exception to this is source code modules that have been copied from third parties (assuming compatible licenses) and contain author names in addition to copyright statements (i.e. don't remove stuff if you copied a whole file).

4. Header changes to existing Python files must be committed as separate changesets - do not mix other kinds of changes into these changesets, but do update Python file headers in batch where applicable. Use the log message "Cleanup of Python file headers, according to new policy."

5. Don't spend time on changing headers on stable branches, only on feature branches and the default branch. If you spot a Python file with a non-compliant header, fix it immediately (remember: separate commit!). Change all the files you are currently maintaining.
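Rules 1 to 3 above lend themselves to a mechanical check. The following is a minimal sketch under stated assumptions, not an official NAV tool: the function name `check_header`, the `executable` flag, and the exact shape of author lines ("# Author:" or "# Authors:" comments and `__author__` assignments) are guesses for illustration. It scans the whole file rather than only the header, and it does not handle the third-party exception in rule 3.

```python
import re

# A comment line such as "# Copyright (C) 2007-2009 UNINETT AS".
COPYRIGHT_RE = re.compile(r"^#\s*Copyright \(C\) \d{4}(-\d{4})? .+", re.MULTILINE)
# Assumed shapes of author tagging: "# Author(s): ..." or "__author__ = ...".
AUTHOR_RE = re.compile(
    r"^\s*(#\s*Authors?\s*:|__authors?__\s*=)", re.MULTILINE | re.IGNORECASE
)
PART_OF_NAV = "This file is part of Network Administration Visualized (NAV)."


def check_header(source, executable=False):
    """Return a list of problems found; an empty list means no problems."""
    problems = []
    if source.startswith("#!") and not executable:
        problems.append("shebang present in a non-executable module")
    if not COPYRIGHT_RE.search(source):
        problems.append("missing copyright notice")
    if PART_OF_NAV not in source:
        problems.append("missing 'This file is part of NAV' statement")
    if AUTHOR_RE.search(source):
        problems.append("individual author listed")
    return problems
```

A check like this cannot tell whether the copyright years are correct (rule 2); that still needs a human.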

-- mvh Morten Brekkevold UNINETT


nav-dev@lister.sikt.no
