Drupal Planet Posts

Drupal Code Builder unit testing: bringing in the heavy stuff

I started adding unit tests to Drupal Code Builder about 3 years ago, and I’ve been gradually expanding on them ever since. It’s made refactoring the code a pleasant rather than stressful experience.

However, while all generator types are covered, the level of detail the tests go into isn’t that deep. Back when I wrote the tests, they mostly needed to check for generated hook implementations, and so quick and dirty regexes did the job: basically, match 'mymodule_form_alter' in the generated code. I gradually extended those to cover things like class declarations and methods, but that approach is very much creaking at the seams.

So it’s time to switch to something more powerful, and more suited to the task.

I’ve already removed my frankly hideous attempt at verifying that generated code is correctly-formed PHP, replacing it with a call to PHP’s own code linter. My own code was running the generated PHP code files through eval() (yes, I know!) to check nothing crashed, which was quick and worked, but only up to a point: tests couldn’t create the same function twice, as eval()ing code that contains a function declaration brings it into the global namespace, and it didn’t work at all for classes, since while tests are being run, the parent classes in Drupal core or contrib aren't present.
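
The linting step is nothing fancy: it just shells out to php -l, which parses the code without executing it. A minimal sketch of the idea (the method name and assertion style here are my own, not necessarily what's in the test base class):

protected function assertWellFormedPHP($code) {
  // php -l wants a file, so write the generated code out to a temporary one.
  $file = tempnam(sys_get_temp_dir(), 'dcb');
  file_put_contents($file, $code);

  // The -l flag only parses the file; nothing is executed, so duplicate
  // function names and missing parent classes are no longer a problem.
  exec('php -l ' . escapeshellarg($file) . ' 2>&1', $output, $status);
  unlink($file);

  $this->assertEquals(0, $status, implode("\n", $output));
}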

It's already proved worthwhile, as once I'd converted the tests, I found an error in the generated code: a stray quote mark in base field definitions for a content entity, which my approach wasn't picking up, and never would have.

The second phase is going to be to use PHPCS and Drupal Coder to check that generated code follows Drupal Coding Standards. I'm currently getting that to work in my testing base class, although it might be a while before I push it, as I suspect it's going to complain about quite a few nitpicks in the generated code that I'll then have to spend some time fixing.
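
The check itself boils down to running the generated files through something like this (assuming Coder's standards are registered with your PHPCS install; the file path is a placeholder):

$ phpcs --standard=Drupal,DrupalPractice generated/mymodule.module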

The third phase (this is a 3-phase programme, all the best ones are) is going to be to look into using PHP-Parser to check for the presence of functions and methods in the code, rather than my regex-based approach. This will allow much more thorough checking of the generated code, covering things such as method order, service injection, and more.
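
To give a flavour of what that looks like, here's a sketch against PHP-Parser's API (treat the details as indicative), which collects the names of all declared functions for a test to then assert against:

use PhpParser\Node\Stmt\Function_;
use PhpParser\NodeFinder;
use PhpParser\ParserFactory;

$parser = (new ParserFactory())->create(ParserFactory::PREFER_PHP7);
$ast = $parser->parse($generated_code);

// Find all function declarations, so a test can assert that, say,
// mymodule_form_alter() really is defined, rather than regexing for it.
$finder = new NodeFinder();
$function_names = array_map(function (Function_ $function) {
  return (string) $function->name;
}, $finder->findInstanceOf($ast, Function_::class));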

After that, it'll be back to some more refactoring and code clean-up, and then some more code generators! I have a few ideas of what else Drupal Code Builder could generate, but more ideas are welcome in the issue queue on github.

Brief update on Drupal Code Builder

I've completely revamped the Drush commands for Drupal Code Builder:

  • First, they're now in their own project on Github
  • Second, I've completely rewritten them for Drush 9, and they're now fully interactive.
  • Third, they are now geared towards adding to existing modules, rather than generating a module as a single shot. That approach made sense in the days of Drupal 6 when it was just hook implementations, but I increasingly find I want to add a plugin, a service, or a form to a module I've already started.

The downside is that installing these is rather tricky at the moment due to some current limitations in Drush 9 beta; see details in the project README, which has full instructions for workarounds.

Now that's out of the way, I'm pushing on with some new generators for the Drupal Code Builder library itself. On my list are:

  • plugin types (as in the plugin manager service, base class and interface, and declaration for Plugins module)
  • entity type
  • entity type handlers
  • your suggestions in the issue queue...

And of course more unit tests and refactoring of the codebase.

Dorgflow: a tool to automate your drupal.org patch workflow

I’ve written previously about git workflow for working on drupal.org patches, and about how we don’t necessarily need to move to a github-style system on drupal.org, we just maybe need better tools for our existing workflow. It’s true that much of it is repetitive, but then repetitive tasks are ripe for automation. In the two years since I released Dorgpatch, a shell script that handles the making of patches for drupal.org issues, I’ve been thinking of how much more of the drupal.org patch workflow could be automated.

Now, I have released a new script, Dorgflow, and the answer is just about everything. The only thing that Dorgflow doesn’t automate is uploading the patch to drupal.org (and that’s because drupal.org’s REST API is read-only). Oh, and writing the code to actually fix bugs or create new features. You still have to do that yourself, along with your cup of coffee.

So assuming you’ve made your own hot beverage of choice, how does Dorgflow work?

Simply! To start with, you need to have an up to date git clone of the project you want to work on, be it Drupal core or a contrib project.

To start work on an issue, just do:

$ dorgflow https://www.drupal.org/node/123456

You can copy and paste the URL from your browser. It doesn’t matter if it has an anchor link on the end, so if you followed a link from your issue tracker and it has ‘#new’ at the end, or clicked down to a comment and it has ‘#comment-1234’ that’s all fine.

The first thing this command does is make a new git branch for you, using the issue number and the issue's title. It then also downloads and applies all the patch files from the issue node, and makes a commit for each one. Your local git now shows you the history of the work on the issue. (Note though that if a patch no longer applies against the main branch, then it’s skipped, and if a patch has been set to not be displayed on the issue’s file list, then it’s skipped too.)

Let’s see how this works with an actual issue. Today I wanted to review the patch on an issue for Token module. The issue URL is https://www.drupal.org/node/2782605. So I did:

$ dorgflow https://www.drupal.org/node/2782605

That got me a git history like this:

  * 6d07524 (2782605-Move-list-of-available-tokens-from-Help-to-Reports) Patch from Drupal.org. Comment: 35; URL: https://www.drupal.org/node/2782605#comment-11934790; file: token-move-list-of-available-tokens-2782605-34.patch; fid 5784728. Automatic commit by dorgflow.
  * 6f8f6e0 Patch from Drupal.org. Comment: 15; URL: https://www.drupal.org/node/2782605#comment-11666939; file: 2782605-13.patch; fid 5710235. Automatic commit by dorgflow.
  * a3b68cc (8.x-1.x) Issue #2833328 by Berdir: Handle bubbleable metadata for block title token replacements
  * [older commits…]

What we can see here is:

  • Git is now on a feature branch, called ‘2782605-Move-list-of-available-tokens-from-Help-to-Reports’. The first part is the issue number, and the rest is from the title of the issue node on drupal.org.
  • Two patches were found on the issue, and a commit was made for each one. Each patch’s commit message gives the comment index where the patch was posted, the URL to the comment, the patch filename, and the patch file entity ID (these last two are less interesting, but are used by Dorgflow when you update a feature branch with newer patches from an issue).

The commit for patch 35 will obviously only show the difference between it and patch 15: effectively, an interdiff. To see what the patch actually contains, take a diff against the main branch, 8.x-1.x.
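
That is:

$ git diff 8.x-1.x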

(As an aside, the trick to applying a patch that’s against 8.x-1.x to a feature branch that already has a commit for a patch is that there is a way to check out files from any git commit while still keeping git’s HEAD on the current branch. So the patch applies, because the files look like 8.x-1.x, but when you make a commit, you’re on the feature branch. Details are on this Stack Overflow question.)
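
In command form, the core of the trick is:

$ git checkout 8.x-1.x .

This checks out the project's files as they are on 8.x-1.x, while HEAD stays on the feature branch, ready for the patch to be applied and committed.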

At this point, the feature branch is ready for work. You can make as many commits as you want. (You can rename the branch if you like, provided the ‘2782605-’ part stays at the beginning.) To make your own patch with your work, just run the Dorgflow script without any argument:

$ dorgflow

The script detects the current branch, and from that, the issue number, and then fetches the issue node from drupal.org to get the number of the next comment to use in the patch filename. All you now have to do is upload the patch, and post a comment explaining your changes.

Alternatively, if you’re a maintainer for the project, and the latest patch is ready to be committed, you can do the following to put git into a state where the patch is applied to the main development branch:

$ dorgflow commit

At that point, you just need to obtain the git commit command from the issue node. (Remember the Drupal standard git commit message format, and to check the attribution for the work on the issue is correct!)

What if you’ve previously reviewed a patch, and now there’s a new one? Dorgflow can download new patches with this command:

$ dorgflow update

This compares your feature branch to the issue node’s patches, and any patches you don’t yet have get new commits.

If you’ve made commits for your own work as well, then effectively there’s a fork in play, as your development in your commits and the other person’s patch are divergent lines of development. Appropriately, Dorgflow creates a separate branch. Your commits are moved onto this branch, while the feature branch is rewound to the last patch that was already there, and then has the new patches applied to it, so that it now reflects work on the issue. It’s then up to you to do a git merge of these two branches in order to combine the two lines of development back into one.

Dorgflow is still being developed. There are a few ideas for further features in the issue queue on github (not to mention a couple of bugs for some of the various possible cases the update command can encounter). I’m also pondering whether it’s worth the effort to convert the script to use Symfony Console; feel free to chime in with any opinions on the issue for that.

There are tests too, as it’s pretty important that a script that does things to your git repository does what it’s supposed to (though the only command that does anything destructive is ‘dorgflow cleanup’, which of course asks for confirmation). Having now written this, I’m obviously embarking upon cleaning it up and to some extent rewriting it, though I do have the excuse that the early weeks of working on this were the days after the late nights awake with my newborn daughter, and so the early versions of the code were written in a haze of sleep deprivation. If you’d like to submit a pull request, please do check in with me first on an issue to ensure it’s not going to clash with something I’m partway through changing.

Finally, if you find this as useful as I do (this was definitely an itch I’ve been wanting to scratch for a long time, as well as being a prime case of condiment-passing), please tell other Drupal developers about it. Let’s all spend less time downloading, applying, and rolling patches, and more time writing Drupal code!

Changing the type of a node

There’s an old saying that no information architecture survives contact with the user. Or something like that. You’ll carefully design and build your content types and taxonomies, and then find that the users are not actually using what you’ve built in quite the way it was intended.

And so there comes a point where you need to grit your teeth, change the structure of the site’s content, and convert existing content.

Back on Drupal 7, I wrote a plugin for Migrate which handled migrations within a single Drupal site, so for example from nodes to a custom entity type, or from one node type to another. (The patch works, though I never found the time to polish it sufficiently to be committed.)

On Drupal 8, without the time to learn the new version of Migrate, I recently had to cobble something together quickly.

Fortunately, this was just changing the type of some nodes, and where all the fields were identical on both source and destination node types. Anything more complex would definitely require Migrate.

First, I created the new node type, and cloned all its fields from the old type to the new type. Here I took the time to update some of the Field Tools module’s functionality to Drupal 8, as it pays off to have a single form to clone fields rather than have to add them to the new node type one by one.

Field Tools also copies display settings where form and view modes match (in other words, if the source bundle has a ‘teaser’ display mode configured, and the destination also has a ‘teaser’ display mode that’s enabled for custom settings, then all of the settings for the fields being cloned are copied over, with field groups too).

With all the new configuration in place, it was now time to get down to the content. This was plain and simple a hack, but one that worked fine for the case in question. Here’s how it went…

We basically want to change the bundle of a bunch of nodes. (Remember, the ‘bundle’ is the generic name for a node type. Node types are bundles, as taxonomy vocabularies are bundles.) The data for a single node is spread over lots of tables, and most of these have the bundle in them.

On Drupal 8 these tables are:

  • the entity base table
  • the entity data table
  • the entity revision data table
  • each field data table
  • each field data revision table

(It’s not entirely clear to me what the separation between base table and data table is for. It looks like it might be that base table is fields that don’t change for revisions, and data table is for fields that do. But then the language is on the base table, and that can be changed, and the created timestamp is on the data table, and while you can change that, I wouldn’t have thought that’s something that has past values kept. Answers on a postcard.)

So we’re basically going to hack the bundle column in a bunch of tables. We start by getting the names of these tables from the entity type storage:

$storage = \Drupal::service('entity_type.manager')->getStorage('node');

// Get the names of the base tables.
$base_table_names = [];
$base_table_names[] = $storage->getBaseTable();
$base_table_names[] = $storage->getDataTable();
// (Note that revision base tables don't have the bundle.)

For field tables, we need to ask the table mapping handler:

$table_mapping = \Drupal::service('entity_type.manager')->getStorage('node')
  ->getTableMapping();

// Get the names of the field tables for fields on the service node type.
$field_table_names = [];
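// ($source_bundle_fields is assumed to hold the definitions of the fields
// on the source node type, e.g. as loaded from the entity field manager;
// setting it up is omitted here.)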
foreach ($source_bundle_fields as $field) {
  $field_table = $table_mapping->getFieldTableName($field->getName());
  $field_table_names[] = $field_table;

  $field_storage_definition = $field->getFieldStorageDefinition();
  $field_revision_table = $table_mapping
    ->getDedicatedRevisionTableName($field_storage_definition);

  // Field revision tables DO have the bundle!
  $field_table_names[] = $field_revision_table;
}

(Note the inconsistency in which tables have a bundle field and which don’t! For that matter, surely it’s redundant in all field tables? Does it improve the indexing perhaps?)

Then, get the IDs of the nodes to update. Fortunately, in this case there were only a few, and it wasn’t necessary to write a batched hook_update_N().

// Get the node IDs to update.
$query = \Drupal::service('entity.query')->get('node');
// Your conditions here!
// In our case, page nodes with a certain field populated.
$query->condition('type', 'page');
$query->exists('field_in_question');
$service_nids = $query->execute();

And now, loop over the lists of tables names and hack away!

// Base tables have 'nid' and 'type' columns.
foreach ($base_table_names as $table_name) {
  \Drupal\Core\Database\Database::getConnection('default')
    ->update($table_name)
    ->fields(['type' => 'service'])
    ->condition('nid', $service_nids, 'IN')
    ->execute();
}
// Field tables have 'entity_id' and 'bundle' columns.
foreach ($field_table_names as $table_name) {
  \Drupal\Core\Database\Database::getConnection('default')
    ->update($table_name)
    ->fields(['bundle' => 'service'])
    ->condition('entity_id', $service_nids, 'IN')
    ->execute();
}

Node-specific tables use ‘nid’ and ‘type’ as their column names, because those are the base field names declared in the entity type class, whereas Field API tables use the generic ‘entity_id’ and ‘bundle’. The mapping between these two is declared in the entity type annotation’s entity_keys property.
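
For reference, that's this part of the node entity type's annotation (abbreviated to just the two keys that matter here):

 *   entity_keys = {
 *     "id" = "nid",
 *     "bundle" = "type",
 *   },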

This worked perfectly. The update system takes care of clearing caches, so entity caches will be fine. Other systems may need a nudge; for instance, Search API won’t notice the changed nodes and its indexes will literally need turning off and on again.

Though I do hope that the next time I have to do something like this, the amount of data justifies getting stuck into using Migrate!

What goes on in Drupal Code Builder?

Drupal Code Builder is the new library which powers Module Builder. I recently split Module Builder up, so Drupal Code Builder (DCB) is the engine for generating Drupal code, while what remains in the Module Builder module is just the UI.

DCB is an extensible framework, so if you want to have DCB create scaffold code for a particular Drupal component or system, you can.

DCB's API is documented in the README. It's based on the idea of tasks: for example, list the hooks and plugin types that DCB has parsed from the site code, analyze the site code to update that list, or generate code for a module. There are Task classes, and you call public methods on these to do something.

The generators

Broadly, there are three things you want to do with DCB: collect and analyze data about a Drupal codebase to learn about hooks and plugin types, report on that data, and actually generate some code.

The Generate task class is where the work of creating code begins. The other task classes are all pretty simple, or at least self-contained, but the Generate task is accompanied by a large number of classes in the DrupalCodeBuilder\Generate namespace. You can see from the file names that these represent all the different components that make up generated code.

Furthermore, as well as all inheriting from BaseGenerator, there are hierarchies which can probably be deduced from the names alone, where more specialized generators inherit from generic ones. For example, we have:

  • File
    • PHPFile
      • ModuleCodeFile
      • PHPClassFile
        • Plugin
        • Service
      • API (this one's for your mymodule.api.php file)
    • YMLFile
    • Readme

and also:

  • PHPFunction
    • HookImplementation
      • HookMenu
      • HookPermission

However, these hierarchies are only about code re-use. In terms of PHP code, HookImplementation is only related to ModuleCodeFile by the common BaseGenerator base class. As the process of code generation takes place, there will be a tree of components that represents components containing each other, but it's important to remember that class inheritance doesn't come into it.

Also, while the generators in the hierarchies above clearly represent some tangible part of the code we're going to generate, some are more abstract, such as Module and Hooks. These aren't abstract in the OO sense, as they will get instantiated, but I think of them as abstract in the sense that they're not concrete and are responsible for code across different files. (Suggestions for a better word to describe them please!)

The process of generating code starts with a call to the Generate task's generateComponent() method. The host UI application (such as Module Builder module, or the Drush command) passes it an array of data that looks something like this:

[
  'base' => 'module',
  'root_name' => 'mymodule',
  'readable_name' => 'My module',
  'hooks' => [
    'form_alter' => TRUE,
    'install' => TRUE,
  ],
  'plugins' => [
    0 => [
      'plugin_type' => 'block',
      'plugin_name' => 'my_plugin',
      'injected_services' => [
        'current_user',
      ],
    ],
  ],
  'settings_form' => TRUE,
  'readme' => TRUE,
]

(How you get the specification for this array as a list of properties and their expected format is a detailed topic of its own, which will be covered later. For now, we're jumping in at the point where code is generated.)
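
For orientation, the call into DCB from a host UI looks roughly like this (the Factory usage is paraphrased from the README, so treat it as indicative rather than exact):

$task = \DrupalCodeBuilder\Factory::getTask('Generate', 'module');
$files = $task->generateComponent($component_data);

The returned $files array of file names and contents is what the rest of this post builds up to.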

Assembling components

The first job for the Generate task class is to turn this array of data into a list of generator classes for the necessary components.

This list is built up in a cascade, where each component gets to request further components, and those get to request components too, and so on, until we reach components that don't request anything. We start with the root component that was initially requested, Module, let that request components, and then repeat the process.

This is best illustrated with the AdminSettingsForm generator. This implements the requiredComponents() method (sketched in code below this list) to request:

  • a permission
  • a router item (on Drupal 7 that's a menu item, but in DCB we refer to these as router items whatever the core Drupal version)
  • a form
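
A sketch of the shape of that method (the array keys and values here are invented for illustration; DCB's actual format differs):

class AdminSettingsForm extends Form {

  public function requiredComponents() {
    return [
      'administer_permission' => [
        'component_type' => 'Permission',
        'permission' => 'administer mymodule',
      ],
      'admin_settings_route' => [
        'component_type' => 'RouterItem',
        'path' => '/admin/config/system/mymodule',
      ],
      'form' => [
        'component_type' => 'Form',
      ],
    ];
  }

}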

In turn, the Permission generator requests a permissions YAML file. You'll see that there are two Permission generators, each with a version suffix. The Permission7 generator requests a hook_permission() hook, which in turn requests a .module file. The Permission8 generator is somewhat simpler, and just requests a YMLFile component.

Meanwhile, the router item requests a routing.yml file on D8, and a hook_menu() on D7.

These two parts of the cascade end when we reach the various file generators: ModuleCodeFile and YMLFile don't request anything. The process that gathers all these generators works iteratively: every iteration it calls requiredComponents() on all the components the previous iteration gave it, and it only stops once an iteration produces no new components. It's safe to request the same component multiple times; in the D7 version of our example, both our hook_menu() and hook_permission() will request a ModuleCodeFile component that represents the .module file. The cascade system knows to either combine these two requests into one component, or ignore the second if it's identical to what's already been requested.
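
In outline, the gathering loop looks something like this (names invented; the real thing lives in the Generate task):

// Start with the root component, then keep asking each batch of new
// arrivals what they require, until an iteration turns up nothing new.
$components = ['module' => $root_component];
$current_batch = $components;
do {
  $new_components = [];
  foreach ($current_batch as $component) {
    foreach ($component->requiredComponents() as $id => $spec) {
      // A repeated request for the same component is merged or ignored.
      if (!isset($components[$id])) {
        $new_components[$id] = instantiate_component($id, $spec);
      }
    }
  }
  $components += $new_components;
  $current_batch = $new_components;
} while ($new_components);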

We now have a list of about a dozen or so components, each of which is an instantiated Generator object. Some represent files, some represent functions, and some like Hooks represent a more vague concept of the module 'having some hooks'. There's also the Module generator which started the whole process, whose requiredComponents() did most of the work of interpreting the given array of data.

Assembling a tree of components

The second part of the process is to assemble this flat list of components into a tree. This is where the notion of which component contains others does come into play. This is a different concept from requested components: a component can request something that it won't end up containing, as we saw with the AdminSettingsForm, which requests a permission.

The Generate task calls the containingComponent() method on each component, and this is used to assemble an array of parentage data. There's nothing fancy or recursive going on here; the tree is just an array whose keys are the identifiers of components, and whose values are arrays of the child component identifiers.

This tree now represents a structure of components where child items will produce code to be included in their parents. One part of this structure could be represented like this:

  • module
    • routing.yml
      • router item
    • permission.yml
      • permission
    • .install
      • hook_install()
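
In array form, that fragment of the tree is simply (identifiers simplified):

$tree = [
  'module' => ['routing.yml', 'permission.yml', '.install'],
  'routing.yml' => ['router item'],
  'permission.yml' => ['permission'],
  '.install' => ['hook_install()'],
];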

Some components, such as the Hooks component, are no longer around now: their job was to be a sort of broker for other components in the requesting phase, and they're no longer involved. The root component, Module, is the root of the tree. All the files we'll be outputting are its immediate children. (This is not a file hierarchy, folders are not represented here.)

Assembling file contents

We now have everything we need to start actually generating some code. This is done in a way that's very similar to Drupal's Render API: we recurse into the tree, asking each component to return some content both from itself and its children.

So for example, the router items contribute some lines to the routing.yml file, which then turns them into YAML. The .install component, which is an instance of ModuleCodeFile, produces a @file docblock, and then gets the docblock, function declaration, and function body from the hook_install component, and glues them all together.

Finally, each file component (the immediate children of the module component in the tree) gets to say what its file name and path should be.

So the Generate task has an array of data about files, where each item has a file name, file path, and file contents. This is returned to the caller to be output to the user, or written to the filesystem. Module Builder presents the files in a form, and allows the files to be written. The Drush command outputs them to terminal and optionally writes them too.

Extending it with new components

The best way to add new things for DCB to generate is to inherit from existing basic classes. If these don't provide enough flexibility, there's always a case to be made for giving them more configurable options: for example, the AdminSettingsForm class inherits from Form, but neither of those does very much for the actual generated form class, as the work for that is mostly done by the PHPClass class.

The roadmap for DCB at the moment consists of the following:

  • Generalize the injected services functionality that’s already in Plugins, so generated Form classes and Services can have them too.
  • Add Forms as a basic component that you can request to generate. (It’s currently there only as a base for the AdminSettingsForm generator.)

And as ever, keep adding tests, keep refactoring and improving the code. But I'm always interested in hearing new ideas (or you know, better yet, patches) in the issue queue.

The Lazy Maintainer's Handbook, Part 1: Frequent Releases

Every now and then I think about writing a series of posts called 'The Lazy Maintainer's Handbook', covering various aspects of how to maintain a project (or several) on drupal.org without it being a huge burden on your time (which a lot of people, and companies, think is the case). However, I never get much past the pondering stage. Since it's safe to say that I'm never going to manage to come up with the right order to write these in, I'm just going to start in the middle. Here goes.

Like with all software, bugs are a problem in the world of Drupal. In Drupal contrib, like with any software that has releases, we can classify bugs into three types:

  • bugs which are reported, but have no fix,
  • bugs which have a fix, but the patch hasn't been committed,
  • bugs which have been fixed, but are not part of a release yet.

The first and second types can involve a fair amount of work, and I will cover in a future post how much an LM can or should do about them.

The third type though is where the LM can really shine: all that needs to be done here is to make a release. What could be simpler? And let's be clear, making a release is a very quick job. I can do the whole thing in about a minute (though I've not timed it, yet).

For one thing, if you're still writing the release notes by hand, then stop: Git Release Notes for Drush does that for you.

Here's my process:

  • $ git tag: this lists all the existing tags, so I can see what the next release number should be.
  • $ git tag 7.x-1.2: this creates the tag for the new release.
  • At this point, it's a good idea to check this in a graphical git client, to check for stupid mistakes like making the tag on a local development branch, or the wrong major branch. (I've done that at least once.)
  • $ git push origin 7.x-1.2: this creates the tag on the remote repository.
  • $ drush release-notes: this creates the release notes, using all the commit messages between the previous tag and the one you just made. (Another reason to use the standard format for commit messages: it will turn the #12345 issue numbers into tags for d.org to then render as links.)
  • Select the output from the command and copy it.
  • Go to your project's page on d.org and click the 'Add new release' link.
  • Select the new tag.
  • Paste the release notes and save the node.

I speed this up even more by having a bash alias for 'drush release-notes | pbcopy', which on OS X puts the text output by the Drush command onto the clipboard, so I can skip the selecting and copying step.
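
That is, something like this in your shell profile (the alias name is just my own choice):

alias drn='drush release-notes | pbcopy'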

It's quick, right? Fun, even! Why don't maintainers do it more often? The reasons I can think of are:

  • the current branch HEAD (and thus -dev release) is unstable and badly broken
  • maintainers are worried about releasing too frequently and making site builders update all the time
  • maintainers have fallen prey to the 'just one more fix' syndrome, and are waiting for another issue (or issues) to be resolved.

Let's address these, shall we?

The branch is unstable

This can happen when you're still on alpha releases, and something's caused you to take a new direction in development. This is a tough case: there's no going back, and you're stuck going forward. The only thing I would advise here is to look at the git history since the last release and see if there's a commit between then and now that could be tagged as the next alpha: for instance if the first few commits after the alpha were simple bug fixes. To try to prevent this problem, I recommend making a release immediately before you take a new direction in development, and if it's a very large rewrite, starting a new major branch, even if that means abandoning the 1.x branch at the alpha stage.

If a major rewrite happens and you're on beta releases or stable releases, then you're doing it wrong: a major rewrite should be cause to start a new major branch.

The last release was recent, and users may dislike frequent releases

Inspecting and testing new releases takes time. More importantly, perhaps, it has quite a high cognitive cost, as for each module you update, you need to review the release notes to look for any changes that might affect your site's functionality, and any parts of your site's codebase that make use of that module's code. It's also a bit stressful, because if something does break, it could be in a part of the site you don't think to check, so the first you'll hear of it is when your client or project manager calls in a panic two days later.

Understandably then, a lot of developers and site builders put off module updates, or don't bother with them until there's a security release.

This may be a time saving, but I don't believe it's an effort saving. Suppose you're on release 1.0, and releases 1.1 and then 1.2 come along. You can either upgrade to each one when it comes out, or wait for 1.2 and then upgrade straight to that.

Doing two upgrades seems like more work than doing one, because by waiting you only have to check the site once.

But I would counter that it's less overall cognitive load, because each single release has fewer changes. If releases are frequent, and include only up to a dozen or so commits, then it's easier to scan down the list of issues in the release notes, and maybe see that they're all very minor bug fixes, or clearly only affect functionality that your site doesn't use.

Ultimately, I don't think postponing upgrades pays off, because eventually, you'll have to bite the bullet and upgrade past several version numbers. Worse, you may have your hand forced when a security update comes along, and you'll be in the situation of having to read the release notes for all the versions you skipped and assessing them, while your site is on fire.

So I think small, frequent releases are actually a good thing, even from the point of view of existing users.

From the point of view of new users, they're a great thing: new users get better code, with fixed bugs. And that applies to existing users as well of course.

(A future episode in this series will cover an idea I've had for ages, of a metric for when you should do a release: after so many commits or weeks have passed.)

You're waiting for just one more fix

Don't. It might never come. In my experience, it probably won't. You'll think to yourself, 'just one more week and someone will review this patch', or 'I'll write a patch for this in the next week'. That week becomes two, and a month, and a year, and even more. I've seen comments on issues called 'Plan for a 1.0 release' where the maintainer says 'I'll make a release in the next two weeks' and that comment is over a year old. I saw one of those comments recently that was two years old. It was mine.

Fight the urge to wait for more fixes. Yes, you want your module to be good, even maybe perfect. But tell yourself: if you release now, you're still making it better. So here's what you need to do:

  • If you're still not on a stable release, release another alpha or beta. The real purpose of unstable releases should be to get people to test your code. You really want them to be testing something recent, not code that's six months old.
  • If you're on stable releases, just release another one. Those issues you were waiting for can go in the next release (or the one after…)

Hopefully that should assuage concerns regarding one's responsibility as a maintainer.

But as a lazy maintainer, what's the benefit to you?

  • Your recent fixes are out there and in use. A bug isn't truly eradicated until a release is made that includes the fix.
  • Fewer duplicate issues filed. With fewer bugs that are fixed in dev but not in a release, there's less chance of people encountering them.
  • More users, because the age of the most recent release is a metric people use when evaluating a module.
  • Users are using a more recent version of the code, which means newer bugs are more likely to get caught. (Because seriously, how many people are actually trying out the dev release, unless they're forced to by unreleased fixes?)

Get the code out. Dev code serves nobody. Releases are what matters.

Module Builder announces split, due to functionality differences

For Drupal 8, Module Builder is undergoing some big changes. It still builds hooks, a README file, an api.php file, permissions, an admin settings form, but now also builds plugins, services, routing items, and its ability to scan your codebase to learn about hooks invented by any of your modules is now extended to plugin types too. And it's actually been available for Drupal 8 for quite some time, but up till now only as the Drush plugin version.

I've now released the D8 version of the module, so you can use an admin UI in Drupal which lets you select the components you want in a form. Unlike Drupal 7 though, the options you enter for your module to generate are stored in a config entity, so you can generate code and then go back and tweak the settings and generate it again, as often as you like.

The big change isn't any of these though. The big change is that Module Builder is being split up.

For a very long time, the Module Builder codebase has been three things in one. Back when I added Drush support (in 2009, according to the git log), it made sense to gradually refactor the code into three parts: the Drupal module UI, the Drush commands, and the common code that does the actual work of generating code based on some parameters (such as which hooks you want, the module name, etc).

That core code has undergone a lot of changes. It's gone from just working with hooks, to a framework that's extensible with new component types. So for example, it's possible to request simply 'an admin form', and the generating code knows to produce the code for the form, the admin permission, and the router item. That's one component that in fact produces form functions, hook_permission(), and hook_menu() on Drupal 7, and a form class, permissions.yml, and routing.yml on Drupal 8. That's because Module Builder also works on multiple versions of Drupal (the code to produce Drupal 5 code is even still in there; if you have cause to try it, let me know if it still works!).

Having this multiple-version code within a Drupal module that's only for a single version is a source of problems and confusion. The 7.x-2.x version of the project contains a module that's only for Drupal 7, but also the core code and the Drush plugin that both work on all versions. It also increases maintenance work, if we want the older versions to keep receiving improvements to the generating code.

Hence the split. Module Builder is being divided into three parts:

  • The core code of Module Builder has been moved to a separate library, which is called Drupal Code Builder to distinguish it.
  • The Module Builder project is from now on just a Drupal module, which requires the Drupal Code Builder library.
  • The Drush plugin will be moving too, and will also require the Drupal Code Builder library.

So to summarize, the situation is now as follows:

  • To build modules in a Drupal UI, on Drupal 8, you need:
    • the Module Builder module for Drupal 8, which requires the Drupal Code Builder library (see the README for how to install it).
  • To build modules with Drush, on any version, you need:
    • Module Builder 7.x-2.x, installed as a Drush command plugin (again, see the README). But note this will shortly be changing when the Drush command moves out of the d.org project too.
  • To build modules in a Drupal UI, on Drupal 7, you need:
    • Module Builder 7.x-2.x. I will probably release a 7.x-3.x at some point which requires the Drupal Code Builder library, so that the Drupal 7.x UI gets new features that are released in the library.

I'll be writing a post soon about how Drupal Code Builder works, so if you're interested in making Drupal Code Builder make something new, look out for that.

Importing wysiwyg image files in body text with Migrate

Migrate module makes the assumption that for each item in your source, you're importing one item into Drupal, and that's baked in pretty deep.

So how do you handle source items where there is body text containing multiple inline images, which should be imported to file entities? For each single source item, ultimately imported as a node, you also have a variable number of images: a one-to-many source to destination relationship.

One way is to simply import the image files at the same time as the nodes whose body text they are in. In the prepareRow() method of your migration class, you effectively do a mini-import, analysing the text, fetching the file data, saving the file locally, and then getting a file ID that you can use to replace into the body text. But doing that, you don't benefit from any of Migrate's helper classes for importing files, nor can you roll these back without writing further code: this isn't a Migration, capital M, it's just an import.

The better way is to write a second migration, to run before your node migration. It may seem like extra work to have to write a second migration class, but it pays off, and besides, since they both draw from the same source data, a lot of your code is already written. Copy-paste the class definition and the constructor as far as the field mappings. The code you would have put in prepareRow() to analyse the text goes somewhere else, but we're getting ahead of ourselves.

And it turns out that Migrate does allow for your single source items to yield multiple destination items: tucked away in the module's source classes and not mentioned in the examples (which do cover a great deal, and are probably the most extensive examples of any contrib module, it must be said), there are at least two places (that I've found) where the one-to-one correspondence can be skirted around.

The one that I used is only available when you use the MigrateSourceList source type, and furthermore when your list source is a set of files. As it happened, the source data I was working with on my project came in JSON files, one file per node to import, and with a body text field which contained references to image files which also needed to be imported.

MigrateSourceList is a source type that separates out the two concepts of listing your source items and processing each one. So unlike, say, the CSV source where the list and the item processing are both provided by the one class, with MigrateSourceList, you can say something like 'my list source is a directory listing, and each item is a JSON file', or 'my list source is a JSON file, and each item is an HTML file'. The MigrateSourceList class delegates the two jobs to further classes, which allows you to mix and match them. Your migration specifies them in the constructor:

    $this->source = new MigrateSourceList(new MigrateListFiles($list_dirs, $base_dir),
      new MigrateItemFile($base_dir), $fields);

There's one more component in the system, and this is the crucial piece that allows us to have more than one destination item per source item: the MigrateContentParser class. This allows each of the files that MigrateListFiles lists to return multiple items to MigrateSourceList.

The only implementation of this in Migrate is MigrateSimpleContentParser, which doesn't do much, so you'll want to subclass it for your particular case. It's fairly simple (there's a sketch of a subclass after this list):

  • setContent() — perform any processing on the content of the file. In my case, I needed to run it through drupal_json_decode() and grab the body text field, since the whole file was JSON representing a node.
  • getChunkCount() — process the file content to return a count of how many items for migration are contained in it.
  • getChunkIDs() — similar to getChunkCount(), but return IDs. You will probably end up using a common helper method for this and getChunkCount(), as they do the same sort of work. The IDs you return can be anything you like; in my case they were GUIDs. They are appended to the ID for the file to form the overall source ID.
  • getChunk() — given a chunk ID (one of the ones you provided in getChunkIDs()), return the actual data for that item. Again, you may want to use a common helper. In my case, here I merely returned the ID itself, since the images to migrate were on a remote server and accessed by their GUID.
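
Here's a rough sketch of the shape of such a subclass, based on my case; the JSON structure and the GUID pattern are invented for illustration:

class CustomJSONBodyImagesParser extends MigrateSimpleContentParser {

  protected $guids = array();

  public function setContent($content) {
    parent::setContent($content);
    // The whole file is JSON representing a node: decode it and grab the
    // body text.
    $node_data = drupal_json_decode($content);
    $body = $node_data['body'];
    // Collect the GUIDs of the inline images referenced in the body text.
    preg_match_all('/data-image-guid="([0-9a-f\-]+)"/', $body, $matches);
    $this->guids = $matches[1];
  }

  public function getChunkCount() {
    return count($this->guids);
  }

  public function getChunkIDs() {
    return $this->guids;
  }

  public function getChunk($chunk_id) {
    // The images live on a remote server and are fetched by GUID, so the
    // ID itself is all the item class needs to return.
    return $chunk_id;
  }

}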

I submitted a patch to make it a bit easier to deal with the case where files might contain either one or many chunks (or even none): by default, a file providing only one chunk doesn't get to return a chunk ID, which wasn't working for my case. The patch (committed but not yet in a release) adds an option to override this so you always know the ID of the chunk: in my case, the GUID for the image found inside the body text, which was always needed by the image migration code.

At the other end, I needed a custom subclass of MigrateItem to deal with the data returned by getChunk(). This just needs to implement getItem(), and it can pretty much return anything you like: this is the same source data item that your migration class gets to work with.

So to recap, as there are quite a few classes flying around helping one another here: MigrateListFiles uses a custom MigrateContentParser implementation to extract items from the source data files, possibly getting more than one item from a single file; and then MigrateSourceList uses MigrateListFiles along with a custom MigrateItem implementation.

The setup code in my migration's constructor then looks like this:

    $parser = new CustomJSONBodyImagesParser();
    $list = new MigrateListFiles(
      // $list_dirs
      array($source_folder),
      // $base_dir
      $source_folder,
      // $file_mask
      '//',
      // $options
      array(),
      // MigrateContentParser $parser
      $parser
    );
    $item = new CustomImageGUIDItem();
    $this->source = new MigrateSourceList($list, $item, $fields);

The end result worked great, and the ability to rollback turned out to be very useful, when there turned out to be bad data here and there that needed to be cleaned up or skipped. But that's what always happens with migrations in my experience!

A script for making patches

I have a standard format for patchnames: 1234-99.project.brief-description.patch, where 1234 is the issue number and 99 is the (expected) comment number. However, it involves two copy-pastes: one for the issue number, taken from my browser, and one for the project name, taken from my command line prompt.

Some automation of this is clearly possible, especially as I usually name my git branches 1234-brief-description. More automation is less typing, and so in true XKCD condiment-passing style, I've now written that script, which you can find on github as dorgpatch. (The hardest part was thinking of a good name, and as you can see, in the end I gave up.)

Out of the components of the patch name, the issue number and description can be deduced from the current git branch, and the project from the current folder. For the comment number, a bit more work is needed: but drupal.org now has a public API, so a simple REST request to that gives us data about the issue node including the comment count.

So far, so good: we can generate the filename for a new patch. But really, the script should take care of doing the diff too. That's actually the trickiest part: figuring out which branch to diff against. It requires a bit of git branch wizardry to look at the branches that the current branch forks off from, and some regular expression matching to find one that looks like a Drupal development branch (i.e., 8.x-4.x, or 8.0.x). It's probably not perfect; I don't know if I accounted for a possibility such as 8.x-4.x branching off a 7.x-3.x which then has no further commits and so is also reachable from the feature branch.

The other thing this script can do is create a tests-only patch. These are useful, and generally advisable on drupal.org issues, to demonstrate that the test not only checks for the correct behaviour, but also fails for the problem that's being fixed. The script assumes that you have two branches: the one you're on, 1234-brief-description, and also one called 1234-tests, which contains only commits that change tests.

The git workflow to get to that point would be:

  1. Create the branch 1234-brief-description
  2. Make commits to fix the bug
  3. Create a branch 1234-tests
  4. Make commits to tests (I assume most people are like me, and write the tests after the fix)
  5. Move the string of commits that are only tests so they fork off at the same point as the feature branch: git rebase --onto 8.x-4.x 1234-brief-description 1234-tests
  6. Go back to 1234-brief-description and do: git merge 1234-tests, so the feature branch includes the tests.
  7. If you need to do further work on the tests, you can repeat with a temporary branch that you rebase onto the tip of 1234-tests. (Or you can cherry-pick the commits. Or do cherry-pick with git rev-list, which is a trick I discovered today.)

Next step will be having the script make an interdiff file, which is a task I find particularly fiddly.

A git-based patch workflow for drupal.org (with interdiffs for free!)

There's been a lot of discussion about how we need github-like features on d.org. Will we get them? There are definitely many improvements in the pipeline to the way our issue queues work. Whether we actually need to replicate github is another debate (and my take on it is that I don't think we do).

In the meantime, I think that it's possible to have a good collaborative workflow with what we have right now on drupal.org, with just the issue queue and patches, and git local branches. Here's what I've gradually refined over the years. It's fast, it helps you keep track of things, and it makes the most of git's strengths.

A word on local branches

Git's killer feature, in my opinion, is local branches. Local branches allow you to keep work on different issues separate, and they allow you to experiment and backtrack. To get the most out of git, you should be making small, frequent commits.

Whenever I do a presentation on git, I ask for a show of hands of who's ever had to bounce on CMD-Z in their text editor because they broke something that was working five minutes ago. Commit often, and never have that problem again: my rule of thumb is to commit any time that your work has reached a state where if subsequent changes broke it, you'd be dismayed to lose it.

Starting work on an issue

My first step when I'm working on an issue is obviously:

  git pull

This gets the current branch (e.g. 7.x, 7.x-2.x) up to date. Then it's a good idea to reload your site and check it's all okay. If you've not worked on core or the contrib project in question in a while, then you might need to run update.php, in case new commits have added updates.

Now start a new local branch for the issue:

  git checkout -b 123456-foobar-is-broken

I like to prefix my branch name with the issue number, so I can always find the issue for a branch, and find my work in progress for an issue. A description after that is nice, and as git has bash autocompletion for branch names, it doesn't get in the way. Using the issue number also means that it's easy to see later on which branches I can delete to unclutter my local git checkout: if the issue has been fixed, the branch can be deleted!

So now I can go ahead and start making commits. Because a local branch is private to me, I can feel free to commit code that's a total mess. So something like:

  dpm($some_variable_I_needed_to_examine);
  /*
  // Commented-out earlier approach that didn't quite work right.
  $foo += $bar;
  */
  // Badly-formatted code that will need to be cleaned up.
  if($badly-formatted_code) { $arg++; }

That last bit illustrates an important point: commit code before cleaning up. I've lost count of the number of times that I've got it working, and cleaned up, and then broken it because I've accidentally removed an important line that was lost among the cruft. So as soon as code is working, I make a commit, usually whose message is something like 'TOUCH NOTHING IT WORKS!'. Then, start cleaning up: remove the commented-out bits, the false starts, the stray code that doesn't do anything, in small commits of course. (This is where you find it actually does, and breaks everything: but that doesn't matter, because you can just revert to a previous commit, or even use git bisect.)

Keeping up to date

Core (or the module you're working on) doesn't stay still. By the time you're ready to make a patch, it's likely that there'll be new commits on the main development branch (with core it's almost certain). And before you're ready, there may be commits that affect your ongoing work in some way: API changes, bug fixes that you no longer need to work around, and so on.

Once you've made sure there's no work currently uncommitted (either use git stash, or just commit it!), do:

git fetch
git rebase BRANCH

where BRANCH is the main development branch that is being committed to on drupal.org, such as 8.0.x or 7.x-2.x.

(This is arguably one case where a local branch is easier to work with than a github-style forked repository.)

There's lots to read about rebasing elsewhere on the web, and some will say that rebasing is a terrible thing. It's not, when used correctly. It can cause merge conflicts, it's true. But here's another place where small, regular commits help you: small commits mean small conflicts, that shouldn't be too hard to resolve.

Making a patch

At some point, I'll have code I'm happy with (and I'll have made a bunch of commits whose log messages are 'clean-up' and 'formatting'), and I want to make a patch to post to the issue:

  git diff 7.x-1.x > 123456.PROJECT.foobar-is-broken.patch

Again, I use the issue number in the name of the patch. Tastes differ on this. I like the issue number to come first. This means it's easy to use autocomplete, and all patches are grouped together in my file manager and the sidebar of my text editor.

Reviewing and improving on a patch

Now suppose Alice comes along, reviews my patch, and wants to improve it. She should make her own local branch:

  git checkout -b 123456-foobar-is-broken

and download and apply my patch:

  wget PATCHURL
  patch -p1 < 123456.PROJECT.foobar-is-broken.patch

(Though I would hope she has a bash alias for 'patch -p1' like I do. The other thing to say about the above is that while wget is working at downloading the patch, there's usually enough time to double-click the name of the patch in its progress output and copy it to the clipboard so you don't have to type it at all.)

And finally commit it to her branch. I would suggest she uses a commit message that describes it thus:

  git commit -m "joachim's patch at comment #1"

(Though again, I would hope she uses a GUI for git, as it makes this sort of thing much easier.)

Alice can now make further commits in her local branch, and when she's happy with her work, make a patch the same way I did. She can also make an interdiff very easily, by doing a git diff against the commit that represents my patch.
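
For example, if the commit for my patch is abc1234 (a made-up hash), the interdiff is just:

  git diff abc1234 > interdiff.1-2.txt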

Incorporating other people's changes to ongoing work

All simple so far. But now suppose I want to fix something else (patches can often bounce around like this, as it's great to have someone else to spot your mistakes and to take turns with). My branch looks like it did at my patch. Alice's patch is against the main branch (for the purposes of this example, 7.x-1.x).

What I want is a new commit on the tip of my local branch that says 'Alice's changes from comment #2'. What I need is for git to believe it's on my local branch, but for the project files to look like the 7.x-1.x branch. With git, there's nearly always a way:

  git checkout 7.x-1.x .

Note the dot at the end. This is the filename parameter to the checkout command, which tells git that rather than switch branches, you want to check out just the given file(s) while staying on your current branch. And since the filename is a dot, we're doing that for the entire project. The branch remains unchanged, but all the files from 7.x-1.x are checked out.

I can now apply Alice's patch:

  wget PATCHURL
  patch -p1 < 123456.2.PROJECT.foobar-is-broken.patch

(Alice has put the comment ID after the issue ID in the patch filename.)

When I make a commit, the new commit goes on the tip of my local branch. The commit diff won't look like Alice's patch: it'll look like the difference between my patch and Alice's patch: effectively, an interdiff. I now make a commit for Alice's patch:

  git commit -m "Alice's patch at comment #2"

I can make more changes, then do a diff as before, post a patch, and work on the issue advances to another iteration.

Here's an example of my local branch for an issue on Migrate I've been working on recently. You can see where I made a bunch of commits to clean up the documentation to get ready to make a patch. Following that is a commit for the patch the module maintainer posted in response to mine. And following that are a few further tweaks that I made on top of the maintainer's patch, which I then of course posted as another patch.

[Screenshot: a git GUI showing the tip of a local branch, with a commit for a patch from another user.]

(Notice how in a local branch, I don't feel the need to type terribly accurately for my commit messages, or indeed be all that clear.)

Improving on our tools

Where next? I'm pretty happy with this workflow as it stands, though I think there's plenty of scope for making it easier with some git or bash aliases. In particular, applying Alice's patch is a little tricky. (Though the stumbling block there is that you need to know the name of the main development branch. Maybe pass the script the comment URL, and let it ask d.org what the branch of that issue is?)

Beyond that, I wonder if any changes can be made to the way git works on d.org. A sandbox per issue would replace the passing around of patch files: you'd still have your local branch, and merge in and push instead of posting a patch. But would we have one single branch for the issue's development, which then runs the risk of commit clashes, or start a new branch each time someone wants to share something, which adds complexity to merging? And finally, sandboxes with public branches mean that rebasing against the main project's development can't be done (or at least, not without everyone knowing how to handle the consequences). The alternative would be merging in, which isn't perfect either.

The key thing, for me, is to preserve (and improve) the way that so often on d.org, issues are not worked on by just one person. They're a ball that we take turns pushing forward (snowball, Sisyphean rock, take your pick depending on the issue!). That's our real strength as a community, and whatever changes we make to our toolset have to be made with the goal of supporting that.
