Drupal Planet Posts

The Lazy Maintainer's Handbook, Part 1: Frequent Releases

By joachim, Thu, 31/03/2016 - 00:59

Every now and then I think about writing a series of posts called 'The Lazy Maintainer's Handbook', covering various aspects of how to maintain a project (or several) on drupal.org without it being a huge burden on your time (which a lot of people, and companies, think is the case). However, I never get much past the pondering stage. Since it's safe to say that I'm never going to manage to come up with the right order to write these in, I'm just going to start in the middle. Here goes.

Like with all software, bugs are a problem in the world of Drupal. In Drupal contrib, like with any software that has releases, we can classify bugs into three types:

bugs which are reported, but have no fix,
bugs which have a fix, but the patch hasn't been committed,
bugs which have been fixed, but are not part of a release yet.

The first and second types can involve a fair amount of work, and I will cover in another post in the future how much an LM can or should do about them.

The third type though is where the LM can really shine: all that needs to be done here is to make a release. What could be simpler? And let's be clear, making a release is a very quick job. I can do the whole thing in about a minute (though I've not timed it, yet).

For one thing, if you're still writing the release notes by hand, then stop: Git Release Notes for Drush does that for you.

Here's my process:

$ git tag
This lists all the existing tags, so I can see what the next release number should be.
$ git tag 7.x-1.2
This creates the tag for the new release.
At this point, it's a good idea to check this in a graphical git client, to check for stupid mistakes like making the tag on a local development branch, or the wrong major branch. (I've done that at least once.)
$ git push origin 7.x-1.2
This creates the tag on the remote repository.
$ drush release-notes
This creates the release notes, using all the commit messages between the previous tag and the tag you just made. (Another reason to use the standard format for commit messages: it will turn the #12345 issue numbers into tags for d.org to then render as links.)
Select the output from the command and copy it.
Go to your project's page on d.org and click the 'Add new release' link.
Select the new tag.
Paste the release notes and save the node.

I speed this up even more by having a bash alias for 'drush release-notes | pbcopy', which on OS X puts the text output by the Drush command onto the clipboard, so I can skip the selecting and copying step.
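As a rough sketch of how the first and last steps could be scripted: the helper names next_tag and make_release below are made up for illustration (they are not part of git or Drush), and the sketch assumes plain numeric patch versions such as 7.x-1.2.

```shell
#!/usr/bin/env bash
# Sketch only: next_tag and make_release are hypothetical helper names.

# Print the next patch-level tag for a prefix such as "7.x-1.",
# reading existing tag names on stdin (for example from `git tag`).
next_tag() {
  local prefix="$1" max=-1 tag n
  while IFS= read -r tag; do
    case "$tag" in
      "$prefix"*)
        n="${tag#"$prefix"}"
        # Skip anything that is not a plain number, e.g. 7.x-1.2-beta1.
        case "$n" in ''|*[!0-9]*) continue ;; esac
        n=$((10#$n))
        if [ "$n" -gt "$max" ]; then max="$n"; fi
        ;;
    esac
  done
  printf '%s%s\n' "$prefix" "$((max + 1))"
}

# Tag, push, and put the release notes on the clipboard.
# pbcopy is OS X only; this function is defined here but not called.
make_release() {
  local tag="$1"
  git tag "$tag" && git push origin "$tag" && drush release-notes | pbcopy
}
```

With these defined, something like `git tag | next_tag 7.x-1.` would suggest the next tag name. The graphical sanity check before pushing is still worth doing by hand.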

It's quick, right? Fun, even! Why don't maintainers do it more often? The reasons I can think of are:

the current branch HEAD (and thus -dev release) is unstable and badly broken
maintainers are worried about making releases too frequent and making site builders update all the time
maintainers have fallen prey to the 'just one more fix' syndrome, and are waiting for another issue (or issues) to be resolved.

Let's address these, shall we?

The branch is unstable

This can happen when you're still on alpha releases, and something's caused you to take a new direction in development. This is a tough case: there's no going back, and you're stuck going forward. The only thing I would advise here is to look at the git history since the last release and see if there's a commit between then and now that could be tagged as the next alpha: for instance if the first few commits after the alpha were simple bug fixes. To try to prevent this problem, I recommend making a release immediately before you take a new direction in development, and if it's a very large rewrite, starting a new major branch, even if that means abandoning the 1.x branch at the alpha stage.

If a major rewrite happens and you're on beta releases or stable releases, then you're doing it wrong: a major rewrite should be cause to start a new major branch.

The last release was recent, and users may dislike frequent releases

Inspecting and testing new releases takes time. More importantly, perhaps, it has quite a high cognitive cost, as for each module you update, you need to review the release notes to look for any changes that might affect your site's functionality, and any parts of your site's codebase that make use of that module's code. It's also a bit stressful, because if something does break, it could be in a part of the site you don't think to check, so the first you'll hear of it is when your client or project manager calls in a panic two days later.

Understandably then, a lot of developers and site builders put off module updates, or don't bother with them until there's a security release.

This may be a time saving, but I don't believe it's an effort saving. Suppose you're on release 1.0 and releases 1.1 and then 1.2 come along. You can either upgrade to each one when it comes out, or you wait for 1.2 and then upgrade straight to that.

Doing a single upgrade seems like less work, because you're only having to check the site once.

But I would counter that it's less overall cognitive load, because each single release has fewer changes. If releases are frequent, and include only up to a dozen or so commits, then it's easier to scan down the list of issues in the release notes, and maybe see that they're all very minor bug fixes, or clearly only affect functionality that your site doesn't use.

Ultimately, I don't think postponing upgrades pays off, because eventually, you'll have to bite the bullet and upgrade past several version numbers. Worse, you may have your hand forced when a security update comes along, and you'll be in the situation of having to read the release notes for all the versions you skipped and assessing them, while your site is on fire.

So I think small, frequent releases are actually a good thing, even from the point of view of existing users.

From the point of view of new users, they're a great thing: new users get better code, with fixed bugs. And that applies to existing users as well of course.

(A future episode in this series will cover an idea I've had for ages, of a metric for when you should do a release: after so many commits or weeks have passed.)

You're waiting for just one more fix

Don't. It might never come. In my experience, it probably won't. You'll think to yourself, 'just one more week and someone will review this patch', or 'I'll write a patch for this in the next week'. That week becomes two, and a month, and a year, and even more. I've seen comments on issues called 'Plan for a 1.0 release' where the maintainer says 'I'll make a release in the next two weeks' and that comment is over a year old. I saw one of those comments recently that was two years old. It was mine.

Fight the urge to wait for more fixes. Yes, you want your module to be good, even maybe perfect. But tell yourself: if you release now, you're still making it better. So here's what you need to do:

If you're still not on a stable release, release another alpha or beta. The real purpose of unstable releases should be to get people to test your code. You really want them to be testing something recent, not code that's six months old.
If you're on stable releases, just release another one. Those issues you were waiting for can go in the next release (or the one after…)

Hopefully that should assuage concerns regarding one's responsibility as a maintainer.

But as a lazy maintainer, what's the benefit to you?

Your recent fixes are out there and in use. A bug isn't truly eradicated until a release is made that includes the fix.
Fewer duplicate issues filed. With fewer bugs that are fixed in dev but not in a release, there's less chance of people encountering them.
More users, because the age of the most recent release is a metric people use when evaluating a module.
Users are using a more recent version of the code, which means newer bugs are more likely to get caught. (Because seriously, how many people are actually trying out the dev release, unless they're forced to by unreleased fixes?)

Get the code out. Dev code serves nobody. Releases are what matters.

Module Builder announces split, due to functionality differences

By joachim, Sun, 28/02/2016 - 23:28

For Drupal 8, Module Builder is undergoing some big changes. It still builds hooks, a README file, an api.php file, permissions, and an admin settings form, but it now also builds plugins, services, and routing items. Its ability to scan your codebase to learn about hooks invented by any of your modules now extends to plugin types too. And it's actually been available for Drupal 8 for quite some time, but up till now only as the Drush plugin version.

I've now released the D8 version of the module, so you can use an admin UI in Drupal which lets you select the components you want in a form. Unlike Drupal 7 though, the options you enter for your module to generate are stored in a config entity, so you can generate code and then go back and tweak the settings and generate it again, as often as you like.

The big change isn't any of these though. The big change is that Module Builder is being split up.

For a very long time, the Module Builder codebase has been three things in one. Back when I added Drush support (in 2009, according to the git log), it made sense to gradually refactor the code into three parts: the Drupal module UI, the Drush commands, and the common code that does the actual work of generating code based on some parameters (such as which hooks you want, the module name, etc).

That core code has undergone a lot of changes. It's gone from just working with hooks, to a framework that's extensible with new component types. So for example, it's possible to request simply 'an admin form', and the generating code knows to produce the code for the form, the admin permission, and the router item. So that's one component that in fact produces form functions, hook_permission(), and hook_menu() on Drupal 7, and a form class, permissions.yml, and routing.yml on Drupal 8. That's because Module Builder also works on multiple versions of Drupal (the code to produce Drupal 5 code is even still in there; if you have cause to try it, let me know if it still works!).

Having this multiple-version code within a Drupal module that's only for a single version is a source of problems and confusion. The 7.x-2.x version of the project contains a module that's only for Drupal 7, but also the core code and the Drush plugin that both work on all versions. It also increases maintenance work, if we want the older versions to keep receiving improvements to the generating code.

Hence the split. Module Builder is being divided into three parts:

The core code of Module Builder has been moved to a separate library, which is called Drupal Code Builder to distinguish it.
The Module Builder project is from now on just a Drupal module, which requires the Drupal Code Builder library.
The Drush plugin will be moving too, and will also require the Drupal Code Builder library.

So to summarize, the situation is now as follows:

To build modules in a Drupal UI, on Drupal 8, you need:
- Module Builder 8.x-3.x (see the README for instructions)
- Drupal Code Builder library
To build modules with Drush, on any version, you need:
- Module Builder 7.x-2.x, installed as a Drush command plugin (again, see the README). But note this will shortly be changing when the Drush command moves out of the d.org project too.
To build modules in a Drupal UI, on Drupal 7, you need:
- Module Builder 7.x-2.x. I will probably release a 7.x-3.x at some point which requires the Drupal Code Builder library, so that the Drupal 7.x UI gets new features that are released in the library.

I'll be writing a post soon about how Drupal Code Builder works, so if you're interested in making Drupal Code Builder make something new, look out for that.

Importing wysiwyg image files in body text with Migrate

By joachim, Tue, 05/01/2016 - 21:38

Migrate module makes the assumption that for each item in your source, you're importing one item into Drupal, and that's baked in pretty deep.

So how do you handle source items where there is body text containing multiple inline images, which should be imported to file entities? For each single source item, ultimately imported as a node, you also have a variable number of images: a one-to-many source to destination relationship.

One way is to simply import the image files at the same time as the nodes whose body text they are in. In the prepareRow() method of your migration class, you effectively do a mini-import, analysing the text, fetching the file data, saving the file locally, and then getting a file ID that you can use to replace into the body text. But doing that, you don't benefit from any of Migrate's helper classes for importing files, nor can you roll these back without writing further code: this isn't a Migration, capital M, it's just an import.

The better way is to write a second migration, to run before your node migration. It may seem like extra work to have to write a second migration class, but it pays off, and besides, since they both draw from the same source data, a lot of your code is already written. Copy-paste the class definition and the constructor as far as the field mappings. The code you would have put in prepareRow() to analyse the text goes somewhere else, but we're getting ahead of ourselves.

And it turns out that Migrate does allow for your single source items to yield multiple destination items. Tucked away in the module's source classes, and not mentioned in the examples (which, it must be said, cover a great deal and are probably the most extensive examples of any contrib module), there are at least two places that I've found where the one-to-one correspondence can be skirted around.

The one that I used is only available when you use the MigrateSourceList source type, and furthermore when your list source is a set of files. As it happened, the source data I was working with on my project came in JSON files, one file per node to import, and with a body text field which contained references to image files which also needed to be imported.

MigrateSourceList is a source type that separates out the two concepts of listing your source items and processing each one. So unlike, say, the CSV source where the list and the item processing are both provided by the one class, with MigrateSourceList, you can say something like 'my list source is a directory listing, and each item is a JSON file', or 'my list source is a JSON file, and each item is an HTML file'. The MigrateSourceList class delegates the two jobs to further classes, which allows you to mix and match them. Your migration specifies them in the constructor:

    $this->source = new MigrateSourceList(new MigrateListFiles($list_dirs, $base_dir),
      new MigrateItemFile($base_dir), $fields);

There's one more component in the system, and this is the crucial piece that allows us to have more than one destination item per source item: it's the MigrateContentParser class. This allows each of the files that MigrateListFiles finds to return multiple items to MigrateSourceList.

The only implementation of this in Migrate is MigrateSimpleContentParser, which doesn't do much, so you'll want to subclass this for your particular case. It's fairly simple:

setContent() — perform any processing on the content of the file. In my case, I needed to run it through drupal_json_decode() and grab the body text field, since the whole file was JSON representing a node.
getChunkCount() — process the file content to return a count of how many items for migration are contained in it.
getChunkIDs() — similar to getChunkCount(), but return IDs. You will probably end up using a common helper method for this and getChunkCount(), as they do the same sort of work. The IDs you return can be anything you like; in my case they were GUIDs. They are appended to the ID for the file to form the overall source ID.
getChunk() — given a chunk ID (one of the ones you provided in getChunkIDs()), return the actual data for that item. Again, you may want to use a common helper. In my case, here I merely returned the ID itself, since the images to migrate were on a remote server and accessed by their GUID.

I submitted a patch to make it a bit easier to deal with the case where files might contain either one or many chunks (or even none): by default, a file providing only one chunk doesn't get to return a chunk ID, which wasn't working for my case. The patch (committed but not yet in a release) adds an option to override this so you always know the ID of the chunk: in my case, the GUID for the image found inside the body text, which was always needed by the image migration code.

At the other end, I needed a custom subclass of MigrateItem to deal with the data returned by getChunk(). This just needs to implement getItem(), and it can pretty much return anything you like: this is the same source data item that your migration class gets to work with.

So to recap, as there's quite a few classes flying around helping one another here, we have MigrateListFiles which uses a custom MigrateContentParser implementation to extract items from the source data files, with possibly more than one item from a single file, and then MigrateSourceList uses MigrateListFiles along with a custom MigrateItem implementation.

The setup code in my migration's constructor then looks like this:

    $parser = new CustomJSONBodyImagesParser();
    $list = new MigrateListFiles(
      // $list_dirs
      array($source_folder),
      // $base_dir
      $source_folder,
      // $file_mask
      '//',
      // $options
      array(),
      // MigrateContentParser $parser
      $parser
    );
    $item = new CustomImageGUIDItem();
    $this->source = new MigrateSourceList($list, $item, $fields);

The end result worked great, and the ability to roll back proved very useful when there turned out to be bad data here and there that needed to be cleaned up or skipped. But that's what always happens with migrations in my experience!

A script for making patches

By joachim, Fri, 27/03/2015 - 13:46

I have a standard format for patch names: 1234-99.project.brief-description.patch, where 1234 is the issue number and 99 is the (expected) comment number. However, it involves two copy-pastes: one for the issue number, taken from my browser, and one for the project name, taken from my command line prompt.

Some automation of this is clearly possible, especially as I usually name my git branches 1234-brief-description. More automation is less typing, and so in true XKCD condiment-passing style, I've now written that script, which you can find on github as dorgpatch. (The hardest part was thinking of a good name, and as you can see, in the end I gave up.)

Out of the components of the patch name, the issue number and description can be deduced from the current git branch, and the project from the current folder. For the comment number, a bit more work is needed: but drupal.org now has a public API, so a simple REST request to that gives us data about the issue node including the comment count.
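To make the naming part concrete, here is a minimal sketch. It is not the actual dorgpatch code: the function name patch_name is hypothetical, and the comment number is simply passed in as an argument rather than fetched from the drupal.org API.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from dorgpatch.
# Usage: patch_name BRANCH PROJECT COMMENT_NUMBER
patch_name() {
  local branch="$1" project="$2" comment="$3"
  local issue="${branch%%-*}"        # text before the first hyphen
  local description="${branch#*-}"   # text after the first hyphen
  printf '%s-%s.%s.%s.patch\n' "$issue" "$comment" "$project" "$description"
}

# In a real checkout the inputs might come from git and the folder name:
#   patch_name "$(git rev-parse --abbrev-ref HEAD)" "$(basename "$PWD")" 99
```

This relies on the branch being named in the 1234-brief-description style; a branch without a leading issue number would produce a nonsense filename.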

So far, so good: we can generate the filename for a new patch. But really, the script should take care of doing the diff too. That's actually the trickiest part: figuring out which branch to diff against. It requires a bit of git branch wizardry to look at the branches that the current branch forks off from, and some regular expression matching to find one that looks like a Drupal development branch (i.e., 8.x-4.x, or 8.0.x). It's probably not perfect; I don't know if I accounted for a possibility such as 8.x-4.x branching off a 7.x-3.x which then has no further commits and so is also reachable from the feature branch.
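The regular expression side of that can be sketched roughly as below. The function name filter_dev_branches and the exact pattern are my assumptions for illustration, and the sketch does nothing about the ambiguous case just described, where more than one matching branch is reachable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read branch names on stdin and print only those
# that look like Drupal development branches (8.x-4.x, 8.0.x, 7.x).
filter_dev_branches() {
  grep -E '^([0-9]+\.x-[0-9]+\.x|[0-9]+\.[0-9]+\.x|[0-9]+\.x)$'
}

# In a real checkout, the candidates could be the branches whose tips are
# reachable from the current branch:
#   git branch --merged HEAD --format='%(refname:short)' | filter_dev_branches
```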

The other thing this script can do is create a tests-only patch. These are useful, and generally advisable on drupal.org issues, to demonstrate that the test not only checks for the correct behaviour, but also fails for the problem that's being fixed. The script assumes that you have two branches: the one you're on, 1234-brief-description, and also one called 1234-tests, which contains only commits that change tests.

The git workflow to get to that point would be:

Create the branch 1234-brief-description
Make commits to fix the bug
Create a branch 1234-tests
Make commits to tests (I assume most people are like me, and write the tests after the fix)
Move the string of commits that are only tests so they fork off at the same point as the feature branch: git rebase --onto 8.x-4.x 1234-brief-description 1234-tests
Go back to 1234-brief-description and do: git merge 1234-tests, so the feature branch includes the tests.
If you need to do further work on the tests, you can repeat with a temporary branch that you rebase onto the tip of 1234-tests. (Or you can cherry-pick the commits. Or do cherry-pick with git rev-list, which is a trick I discovered today.)
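The first six steps can be tried out safely in a throwaway repository. This is only a sketch: the file names and commit messages are invented, and 8.x-4.x stands in for whatever the real development branch is.

```shell
#!/usr/bin/env bash
set -e
# Throwaway repository so the sketch can be run anywhere.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"
git checkout -q -b 8.x-4.x
echo "base" > module.php
git add module.php
git commit -q -m "Initial commit"

# Feature branch with the fix.
git checkout -q -b 1234-brief-description
echo "fix" >> module.php
git commit -q -am "Fix the bug"

# Tests branch, started from the feature branch.
git checkout -q -b 1234-tests
echo "test" > module.test
git add module.test
git commit -q -m "Add test"

# Move the test commits so they fork off 8.x-4.x directly.
git rebase -q --onto 8.x-4.x 1234-brief-description 1234-tests

# Bring the tests back into the feature branch.
git checkout -q 1234-brief-description
git merge -q --no-edit 1234-tests

# Files touched by the tests-only patch and by the full patch.
tests_only_files="$(git diff --name-only 8.x-4.x 1234-tests)"
feature_files="$(git diff --name-only 8.x-4.x 1234-brief-description)"
```

After the rebase, 1234-tests differs from 8.x-4.x only by the test file, which is what a tests-only patch needs, while the feature branch carries both the fix and the test.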

Next step will be having the script make an interdiff file, which is a task I find particularly fiddly.

A git-based patch workflow for drupal.org (with interdiffs for free!)

By joachim, Thu, 27/11/2014 - 08:39

There's been a lot of discussion about how we need github-like features on d.org. Will we get them? There are definitely many improvements in the pipeline to the way our issue queues work. Whether we actually need to replicate github is another debate (and my take on it is that I don't think we do).

In the meantime, I think that it's possible to have a good collaborative workflow with what we have right now on drupal.org, with just the issue queue and patches, and git local branches. Here's what I've gradually refined over the years. It's fast, it helps you keep track of things, and it makes the most of git's strengths.

A word on local branches

Git's killer feature, in my opinion, is local branches. Local branches allow you to keep work on different issues separate, and they allow you to experiment and backtrack. To get the most out of git, you should be making small, frequent commits.

Whenever I do a presentation on git, I ask for a show of hands of who's ever had to bounce on CMD-Z in their text editor because they broke something that was working five minutes ago. Commit often, and never have that problem again: my rule of thumb is to commit any time that your work has reached a state where if subsequent changes broke it, you'd be dismayed to lose it.

Starting work on an issue

My first step when I'm working on an issue is obviously:

git pull

This gets the current branch (e.g. 7.x, 7.x-2.x) up to date. Then it's a good idea to reload your site and check it's all okay. If you've not worked on core or the contrib project in question in a while, then you might need to run update.php, in case new commits have added updates.

Now start a new local branch for the issue:

git checkout -b 123456-foobar-is-broken

I like to prefix my branch name with the issue number, so I can always find the issue for a branch, and find my work in progress for an issue. A description after that is nice, and as git has bash autocompletion for branch names, it doesn't get in the way. Using the issue number also means that it's easy to see later on which branches I can delete to unclutter my local git checkout: if the issue has been fixed, the branch can be deleted!

So now I can go ahead and start making commits. Because a local branch is private to me, I can feel free to commit code that's a total mess. So something like:

  dpm($some_variable_I_needed_to_examine);
  /*
  // Commented-out earlier approach that didn't quite work right.
  $foo += $bar;
  */
  // Badly-formatted code that will need to be cleaned up.
  if($badly-formatted_code) { $arg++; }

That last bit illustrates an important point: commit code before cleaning up. I've lost count of the number of times that I've got it working, and cleaned up, and then broken it because I've accidentally removed an important line that was lost among the cruft. So as soon as code is working, I make a commit, usually with a message like 'TOUCH NOTHING IT WORKS!'. Then, start cleaning up: remove the commented-out bits, the false starts, the stray code that doesn't do anything, in small commits of course. (This is where you find it actually does, and breaks everything: but that doesn't matter, because you can just revert to a previous commit, or even use git bisect.)

Keeping up to date

Core (or the module you're working on) doesn't stay still. By the time you're ready to make a patch, it's likely that there'll be new commits on the main development branch (with core it's almost certain). And before you're ready, there may be commits that affect your ongoing work in some way: API changes, bug fixes that you no longer need to work around, and so on.

Once you've made sure there's no work currently uncommitted (either use git stash, or just commit it!), do:

git fetch
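git rebase origin/BRANCH

where BRANCH is the main development branch that is being committed to on drupal.org, such as 8.0.x or 7.x-2.x, and origin is the drupal.org remote. (git fetch updates origin/BRANCH rather than your local copy of BRANCH, which is why the rebase is against origin/BRANCH.)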

(This is arguably one case where a local branch is easier to work with than a github-style forked repository.)

There's lots to read about rebasing elsewhere on the web, and some will say that rebasing is a terrible thing. It's not, when used correctly. It can cause merge conflicts, it's true. But here's another place where small, regular commits help you: small commits mean small conflicts, that shouldn't be too hard to resolve.

Making a patch

At some point, I'll have code I'm happy with (and I'll have made a bunch of commits whose log messages are 'clean-up' and 'formatting'), and I want to make a patch to post to the issue:

git diff 7.x-1.x > 123456.PROJECT.foobar-is-broken.patch

Again, I use the issue number in the name of the patch. Tastes differ on this. I like the issue number to come first. This means it's easy to use autocomplete, and all patches are grouped together in my file manager and the sidebar of my text editor.

Reviewing and improving on a patch

Now suppose Alice comes along, reviews my patch, and wants to improve it. She should make her own local branch:

git checkout -b 123456-foobar-is-broken

and download and apply my patch:

  wget PATCHURL
  patch -p1 < 123456.PROJECT.foobar-is-broken.patch

(Though I would hope she has a bash alias for 'patch -p1' like I do. The other thing to say about the above is that while wget is working at downloading the patch, there's usually enough time to double-click the name of the patch in its progress output and copy it to the clipboard so you don't have to type it at all.)

And finally commit it to her branch. I would suggest she uses a commit message that describes it thus:

git commit -m "joachim's patch at comment #1"

(Though again, I would hope she uses a GUI for git, as it makes this sort of thing much easier.)

Alice can now make further commits in her local branch, and when she's happy with her work, make a patch the same way I did. She can also make an interdiff very easily, by doing a git diff against the commit that represents my patch.
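Here's a sketch of Alice's side in a throwaway repository. The file contents are invented, a printf stands in for downloading and applying the patch, and the second commit message is made up.

```shell
#!/usr/bin/env bash
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.email "alice@example.com"
git config user.name "Alice"
git checkout -q -b 7.x-1.x
printf 'one\ntwo\n' > foobar.module
git add foobar.module
git commit -q -m "Initial commit"

# Alice's local branch, with the downloaded patch committed as-is.
git checkout -q -b 123456-foobar-is-broken
printf 'one\ntwo\nthree\n' > foobar.module   # stands in for wget + patch -p1
git commit -q -am "joachim's patch at comment #1"

# Alice's own improvement on top.
printf 'one\ntwo\nthree\nfour\n' > foobar.module
git commit -q -am "Handle the fourth case"

# Find the commit that represents the original patch and diff against it.
patch_commit="$(git log -1 --format=%H --fixed-strings --grep="patch at comment #1")"
git diff "$patch_commit" > interdiff.txt

# The full patch is still a diff against the main branch.
git diff 7.x-1.x > 123456.2.PROJECT.foobar-is-broken.patch
```

The interdiff only contains what Alice changed after applying the first patch, while the new patch contains everything. Finding the commit by its message is one reason to keep a predictable wording for those 'patch at comment' commits.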

Incorporating other people's changes to ongoing work

All simple so far. But now suppose I want to fix something else (patches can often bounce around like this, as it's great to have someone else to spot your mistakes and to take turns with). My branch looks like it did at my patch. Alice's patch is against the main branch (for the purposes of this example, 7.x-1.x).

What I want is a new commit on the tip of my local branch that says 'Alice's changes from comment #2'. What I need is for git to believe it's on my local branch, but for the project files to look like the 7.x-1.x branch. With git, there's nearly always a way:

git checkout 7.x-1.x .

Note the dot at the end. This is the filename parameter to the checkout command, which tells git that rather than switch branches, you want to check out just the given file(s) while staying on your current branch. Because the filename is a dot, we're doing that for the entire project. The branch remains unchanged, but all the files from 7.x-1.x are checked out.

I can now apply Alice's patch:

  wget PATCHURL
  patch -p1 < 123456.2.PROJECT.foobar-is-broken.patch

(Alice has put the comment ID after the issue ID in the patch filename.)

When I make a commit, the new commit goes on the tip of my local branch. The commit diff won't look like Alice's patch: it'll look like the difference between my patch and Alice's patch: effectively, an interdiff. I now make a commit for Alice's patch:

git commit -m "Alice's patch at comment #2"

I can make more changes, then do a diff as before, post a patch, and work on the issue advances to another iteration.
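The whole round trip can be checked in a throwaway repository. This is a sketch under a few assumptions: the file contents are invented, Alice's patch is generated locally instead of downloaded, and git apply stands in for patch -p1. On the command line an explicit git add -A is needed before committing, which a GUI would normally take care of.

```shell
#!/usr/bin/env bash
set -e
work="$(mktemp -d)"
git init -q "$work/repo"
cd "$work/repo"
git config user.email "dev@example.com"
git config user.name "Dev"
git checkout -q -b 7.x-1.x
printf 'one\ntwo\n' > foobar.module
git add foobar.module
git commit -q -m "Initial commit"

# Stand-in for Alice's posted patch: a diff against 7.x-1.x.
printf 'one\ntwo\nthree\nfour\n' > foobar.module
git diff > "$work/alice.patch"
git checkout -q -- foobar.module

# My local branch, as it was when I posted my patch.
git checkout -q -b 123456-foobar-is-broken
printf 'one\ntwo\nthree\n' > foobar.module
git commit -q -am "My fix"

# Make the files look like 7.x-1.x while staying on my branch.
git checkout -q 7.x-1.x .
git apply "$work/alice.patch"

# Stage everything and commit on the tip of my branch.
git add -A
git commit -q -m "Alice's patch at comment #2"

interdiff="$(git show --format= HEAD)"
current_branch="$(git rev-parse --abbrev-ref HEAD)"
```

The new commit's diff shows only the line Alice added on top of my patch. One caveat: checking out paths this way does not delete files that exist on my branch but not on 7.x-1.x, so if my patch added new files and Alice's patch adds them too, they would need removing before her patch is applied.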

Here's an example of my local branch for an issue on Migrate I've been working on recently. You can see where I made a bunch of commits to clean up the documentation to get ready to make a patch. Following that is a commit for the patch the module maintainer posted in response to mine. And following that are a few further tweaks that I made on top of the maintainer's patch, which I then of course posted as another patch.

[Screenshot: a git GUI showing the tip of a local branch, with a commit for a patch from another user.]

(Notice how in a local branch, I don't feel the need to type terribly accurately for my commit messages, or indeed be all that clear.)

Improving on our tools

Where next? I'm pretty happy with this workflow as it stands, though I think there's plenty of scope for making it easier with some git or bash aliases. In particular, applying Alice's patch is a little tricky. (Though the stumbling block there is that you need to know the name of the main development branch. Maybe pass the script the comment URL, and let it ask d.org what the branch of that issue is?)

Beyond that, I wonder if any changes can be made to the way git works on d.org. A sandbox per issue would replace the passing around of patch files: you'd still have your local branch, and merge in and push instead of posting a patch. But would we have one single branch for the issue's development, which then runs the risk of commit clashes, or start a new branch each time someone wants to share something, which adds complexity to merging? And finally, sandboxes with public branches mean that rebasing against the main project's development can't be done (or at least, not without everyone knowing how to handle the consequences). The alternative would be merging in, which isn't perfect either.

The key thing, for me, is to preserve (and improve) the way that so often on d.org, issues are not worked on by just one person. They're a ball that we take turns pushing forward (snowball, Sisyphean rock, take your pick depending on the issue!). That's our real strength as a community, and whatever changes we make to our toolset have to be made with the goal of supporting that.

Building Fast and Flexible Application UIs with Entity Operations

By joachim, Tue, 18/11/2014 - 13:38

Now I've finished the Big Monster Project of Doom that I've been on the last two years, I can talk more about some of the code that I wrote for it. I can also say what it was: it was a web application for activists to canvass the public for a certain recent national referendum (I'll let you guess which one).

One of the major modules I wrote was Entity Operations module. What began as a means to avoid repeating the same code each time I needed a new entity type soon became the workhorse for the whole application UI.

The initial idea was this: if you want a custom entity type, and you want a UI for adding, editing, and deleting entities (much like with nodes), then you have to build this all yourself: hook_menu() items, various entity callbacks, form builders (and validation and submit handlers) for the entity form and the delete confirmation form. (The Model module demonstrates this well.)

That's a lot of boilerplate code, where the only difference is the entity type's name, the base path where the entity UI sits, and the entity form builder itself (but even that can be generalized, as will be seen).

Faced with this and a project on which I knew from the start I was going to need a good handful of custom entities (for use with Microsoft Dynamics CRM, accessed with another custom module of mine, Remote Entity API), I undertook to build a framework that would take away all the repetition.

An Entity UI is thus built by declaring:

A base path (for nodes, this would be 'node'; we'll ignore the fact that in core, this path itself is a listing of content).
A list of subpaths to form the tabs, and the operation handler class for each one
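As a rough illustration of that declaration (written in Python rather than Drupal's PHP, and not the module's actual code), a base path plus a map of subpaths to handler classes is enough to derive the tab paths and dispatch a request. The names EntityUI, OperationHandler, ViewHandler and EditHandler are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Type


class OperationHandler:
    """Hypothetical base class: each handler produces the output for one tab."""

    def render(self, entity_id: int) -> str:
        raise NotImplementedError


class ViewHandler(OperationHandler):
    def render(self, entity_id: int) -> str:
        return f"viewing entity {entity_id}"


class EditHandler(OperationHandler):
    def render(self, entity_id: int) -> str:
        return f"edit form for entity {entity_id}"


@dataclass
class EntityUI:
    """An entity UI is just a base path and a map of subpath -> handler class."""

    base_path: str
    operations: Dict[str, Type[OperationHandler]] = field(default_factory=dict)

    def routes(self) -> Dict[str, Type[OperationHandler]]:
        # One path pattern per tab; '%' stands in for the entity ID.
        return {
            f"{self.base_path}/%/{subpath}": handler
            for subpath, handler in self.operations.items()
        }

    def dispatch(self, path: str) -> str:
        # Split 'campaign/42/edit' into its three parts and run the handler.
        base, entity_id, subpath = path.split("/")
        if base != self.base_path:
            raise KeyError(path)
        return self.operations[subpath]().render(int(entity_id))


ui = EntityUI("campaign", {"view": ViewHandler, "edit": EditHandler})
print(sorted(ui.routes()))
print(ui.dispatch("campaign/42/edit"))
```

Adding a new tab in this sketch means adding one more entry to the map, which is the property the declaration-based approach is after.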

With this in hand, why stop at just the entity view and edit tabs? The operation handlers can output static content or forms: they can output anything. One of the most powerful enhancements I made to this early on was to write an operations handler that outputs a view. It's the same idea as the EVA module.

So for the referendum canvassing application, I had a custom Campaign entity, that functioned as an Organic Group, and had as UI tabs several different views of members, views of contacts in the Campaign's geographic area, views of Campaign group content (such as tasks and contact lists), and so on.

This approach proved very flexible and quick. The group content entities were themselves also built with this, so that, for example, Contact List entities had operations for a user to book the entity, input data, and release it when done working on it. These were built with custom operation handlers specific to the Contact List entity, subclassing the generic form operation handler.

An unexpected bonus to all this was how easy it was to expose form-based operations to Views Bulk Operations and Services (as 'targeted actions' on the entity). This allowed the booking and release operations to work in bulk on views, and also to be done via a mobile app over Services.

A final piece of icing on the cake was the addition of alternative operation handlers for entity forms that provide just a generic bare bones form that invokes Field API to attach field widgets. With these, the amount of code needed to build a custom entity type is reduced to just three functions:

hook_entity_info(), to declare the entity type to Drupal core
hook_entity_operations_info(), to declare the operations that make up the UI
callback_entity_access(), which controls the access to the operations

The module has a few further tricks up its sleeve. If you're using user permissions for your entities, there's a helper function to use in your hook_permission(), which creates permissions out of all the operations (so: 'edit foobar entities', 'book foobar entities', 'discombobulate foobar entities' and so on). The entity URI callback that Drupal core requires you to have can be taken care of by a helper callback which uses the entity's base path definition. There's a form builder that lets you easily embed form-based operations into the entity build, so that you can put the sort of operations that are single buttons ('publish', 'book', etc) on the entity rather than in a tab. And finally, the links to operation tabs can be added to a view as fields, allowing a list of entities with links to view, edit, book, discombobulate, and so on.
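The permissions helper can be pictured with a small sketch. This is Python for illustration only; the function name operation_permissions and the shape of the returned data are assumptions, not the module's real helper.

```python
def operation_permissions(entity_label, operations):
    """Build one permission per operation, e.g. 'edit foobar entities'.

    Hypothetical helper: the real module's function name and return
    structure are not shown in the post.
    """
    return {
        f"{operation} {entity_label} entities": {
            "title": f"{operation.capitalize()} {entity_label} entities",
        }
        for operation in operations
    }


permissions = operation_permissions("foobar", ["edit", "book", "discombobulate"])
for name in permissions:
    print(name)
```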

So what started as a way to simplify and remove repetitive code became a system for building a whole entity-based UI, which ended up powering the whole of the application.

Graphing relationships between entity types

By joachim, Sun, 31/08/2014 - 21:00

Another thing that was developed as a result of my big Commerce project (see my previous blog post for the run-down of the various modules this contributed back to Drupal) was a bit of code for generating a graph that represents the relationships between entity types.

For a site with a lot of entityreference fields it's a good idea to draw diagrams before you get started, to figure out how everything ties together. But it's also nice to have a diagram that's based on what you've built, so you can compare it, and refer back to it (not to mention that it's a lot easier to read than my handwriting).

The code for this never got released; I tried various graph engines that work with Graph API, but none of them produced quite what I was hoping for. It just sat in my local copy of Field Tools for the last couple of years (I didn't even make a git branch for it, that shows how rough it was!). Then yesterday I came across the Sigma.js graph library, and that inspired me to dig out the code and finish it off.

To give the complete picture, I've added support for the relationships that are formed between entity types by their schema fields: things like the uid property on a node. These are easily picked out of hook_schema()'s foreign keys array.
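To make the idea concrete, here is a minimal sketch in Python (not the Field Tools code) of turning foreign key definitions into graph edges. The nested structure mirrors the 'foreign keys' array of hook_schema(), with a 'table' and a 'columns' map; the function name schema_edges and the base_tables lookup are assumptions for the example.

```python
def schema_edges(schemas, base_tables):
    """Return (source entity type, target entity type, column) tuples.

    schemas: table name -> schema definition with an optional 'foreign keys' entry.
    base_tables: table name -> entity type stored in that table (assumed lookup).
    """
    edges = []
    for table, definition in schemas.items():
        source = base_tables.get(table)
        if source is None:
            continue  # Not the base table of any entity type.
        for key in definition.get("foreign keys", {}).values():
            target = base_tables.get(key["table"])
            if target is None:
                continue  # Points at a table that is not an entity base table.
            for column in key["columns"]:
                edges.append((source, target, column))
    return edges


schemas = {
    "node": {
        "foreign keys": {
            "node_author": {"table": "users", "columns": {"uid": "uid"}},
        },
    },
    "users": {},
}
base_tables = {"node": "node", "users": "user"}
print(schema_edges(schemas, base_tables))
```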

In the end, I found Sigma.js wasn't the right fit: it looks very pretty, but it expects you to dictate the position of the nodes in the canvas, which for a generated graph doesn't really work. There is a plugin for it that allows the graph to be force-directed, but that was starting to be too fiddly. Instead, I found Springy, which, while maybe not quite as versatile, automatically lays out the graph nodes out of the box. It didn't take too long to write a library module for using Springy with Graph API.

Here's the result:

Graph showing relationships between entity types on a development Drupal site

Because this uses Graph API, it'll work with any graph engine, not just Springy. So I'll be interested to see what people who are more familiar with graphing can make it do. To get something that looks like the above for your site, it's simple: install the 7.x-1.x-dev release of Field Tools, install Graph API, install the Springy module, and follow the instructions in the README of that last module for installing the Springy JavaScript library.

The next stage of development for this tool is figuring out a nice way of showing entity bundles. After all, entityreference fields are on specific bundles, and may point to only certain bundles. However, they sometimes point to all bundles of an entity type. And meanwhile, schema properties are always on all bundles and point to all bundles. How do we represent that without the graph turning into a total mess? I'm pondering adding a form that lets you pick which entity types should be shown as separate bundles, but it's starting to get complicated. If you have any thoughts on this, or other ways to improve this feature, please share them with me in the Field Tools issue queue!

Getting Module Builder ready for Drupal 8

By joachim, Tue, 19/08/2014 - 08:12

I've just made a commit to Module Builder that adds unit tests. This is a big deal, because having these frees me up to start making the big changes that are needed for supporting Drupal 8's new structures: routes, plugins, forms, and so on.

The biggest challenge is going to be the interface. Currently, you give Module Builder just a module name and a list of hook names, and it does the necessary. On the command line it's nice and simple:

drush mb mymodule install schema node_insert form_alter views_data_alter

The first parameter is the module name, and everything that follows is a hook name. Now we add to the mix requests such as a form called MyModuleCakeToppingForm, or an entity type plugin, or a route bake_my_cake and its page controller. How to elegantly specify all that over the command line, without making it horribly unwieldy and impossible to remember how to use?

It's also going to be an interesting exercise in reading my own documentation and seeing how much sense it makes after something like 7 months away from the code.

From what I recall, Module Builder uses a hierarchy of component generators to build your module. Taking our example above, the first thing that happens is that the Module generator class kicks in. 'So, you want a module, do you?' it asks, 'You'll need some of these.' And it begins to assemble a list of further generators, for the components it needs: an info file, and the hooks generator. The hooks generator does the actual job of examining your list of requested hooks, and decides based on that that you need three code files: a .module, a .install, and a .views.inc. So by now we have a tree of generators like this:

- Module
-- Info file
-- Hooks
--- Code file: .module
--- Code file: .install
--- Code file: .views.inc

This is not a class hierarchy; this is a tree of objects where each generator has a list of the generators beneath it, and is responsible for collecting data from them. Once we have the tree, we iteratively have each generator assemble the data it wants to contribute, starting with the Module generator at the top.
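The tree above can be sketched in a few lines. This is Python for illustration, not Module Builder's PHP; the class names follow the post, but the rule that maps hook names to files is a simplified assumption.

```python
class Generator:
    """A node in the generator tree; it holds the generators beneath it."""

    def __init__(self, name):
        self.name = name
        self.children = []

    def outline(self, depth=1):
        # One line per generator, with dashes marking the depth in the tree.
        lines = ["-" * depth + " " + self.name]
        for child in self.children:
            lines.extend(child.outline(depth + 1))
        return lines


class Hooks(Generator):
    # Assumed, simplified mapping; any hook not listed goes in the .module file.
    FILE_FOR_HOOK = {
        "install": ".install",
        "schema": ".install",
        "views_data_alter": ".views.inc",
    }
    FILE_ORDER = [".module", ".install", ".views.inc"]

    def __init__(self, hooks):
        super().__init__("Hooks")
        needed = {self.FILE_FOR_HOOK.get(hook, ".module") for hook in hooks}
        for suffix in self.FILE_ORDER:
            if suffix in needed:
                self.children.append(Generator("Code file: " + suffix))


class Module(Generator):
    def __init__(self, hooks):
        super().__init__("Module")
        # The module generator decides which further generators it needs.
        self.children = [Generator("Info file"), Hooks(hooks)]


module = Module(["install", "schema", "node_insert", "form_alter", "views_data_alter"])
print("\n".join(module.outline()))
```

Each object only knows about the generators directly beneath it, which is what distinguishes this object tree from a class hierarchy.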

The original plan when I wrote this system was to make the smallest granularity be a file. The leaves of the generator tree would assemble the text for their file's contents, and the Module generator would collect the files up and return them to the caller for output (either in the UI, or to write them directly).

However, while the original intention of this system was that it could be generalised to base components other than modules (so profiles and themes, which are both supported to some extent but lack the UI, see above!), it's also proven to be extendable downwards to smaller components, and to be worthwhile to do so.

Enter the Form generator. Once we have a generic Function generator (and its child class the HookImplementation), we can create a Form generator. Given a form machine name, 'foo_form', it simply knows to add three copies of the Function generator: 'foo_form', 'foo_form_validate', 'foo_form_submit', along with the correct parameters and some boilerplate code.

And we can specialize this further: the AdminSettingsForm simply extends the Form generator, and adds a menu item component, which itself ensures hook_menu() is requested.
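A minimal sketch of components requesting other components, again in Python rather than Module Builder's PHP. The class names echo the post; the requests() and expand() names, and the form name 'foo_settings', are made up for the example.

```python
class Component:
    def __init__(self, name):
        self.name = name

    def requests(self):
        """Further components this one needs; none by default."""
        return []


class Function(Component):
    pass


class Hook(Component):
    pass


class MenuItem(Component):
    def requests(self):
        # A menu item ensures that hook_menu() is requested.
        return [Hook("hook_menu")]


class Form(Component):
    def requests(self):
        # A form is three functions: builder, validate handler, submit handler.
        return [
            Function(self.name),
            Function(self.name + "_validate"),
            Function(self.name + "_submit"),
        ]


class AdminSettingsForm(Form):
    def requests(self):
        # Everything a form needs, plus a menu item to reach it.
        return super().requests() + [MenuItem(self.name)]


def expand(component):
    """Flatten a component and everything it requests, depth first."""
    result = [(type(component).__name__, component.name)]
    for requested in component.requests():
        result.extend(expand(requested))
    return result


for kind, name in expand(AdminSettingsForm("foo_settings")):
    print(kind, name)
```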

At this point it starts to get a bit complicated, as we have components that request other components that are in totally different parts of the component tree. I think that's the point I had reached when I realized I needed tests, so that I can refactor and clean up the messy bits of this, and enhance and extend it, without breaking what's already there.

So that's the current state of Module Builder: not yet ready for Drupal 8, but has lots of potential. At this point, I'd really welcome input on the Drush interface, as that's the big quandary. And any input on new Drupal 8 component generators would be great too; there are a few open issues in the queue. And finally, Module Builder is a complex beast; should anyone looking at the code find it baffling and impenetrable, do please file a documentation issue to highlight the problem and request clarification.

Using Human Queue Worker to process comments

By joachim, Sat, 09/08/2014 - 09:44

Some time ago, I released Human Queue Worker, a module that takes the concept of the Drupal Queue system, but where the processing of the items is done by human users rather than an automated process. I say 'takes the concept'; it in fact uses the Drupal Queue to create and claim queue items, but instead of declaring your queue with hook_cron_queue_info(), you declare it to Human Queue Worker as a queue that humans will be working on.

This was written for my current project and for a fairly specific need, and I didn't imagine many sites would be using it. However, it has an obvious and popular application: approving comments. I always figured it would be nice if someone wrote a little module to define a comment processing human queue.

Well, that someone is me, and the time is now. You see, I'm an idiot: when I set up this new blog site of mine, I totally forgot to set up a CAPTCHA, and then when I added Mollom, I didn't set it up properly. So this site has a few hundred spammy comments that I need to delete.

The problem is that comment management takes time. Unless there is some magical area of the core UI I've completely missed, I can either visit each node and delete them one by one, or use the comment admin form. There, I can mass-delete the ones with obvious spammy titles, but all the others will still need individual inspection.

The Human Queue UI simplifies this hugely. There's just one page for the queue. When you go to that page, you're presented with an item to process. In the case of comment approval, that's the comment itself, plus the parent node and parent comment to give you some context. To process the comment, click one of two buttons: 'Publish' or 'Delete'. The comment is dealt with, and the form reloads, with a brand new comment for you to process. Which means that the only clicking you do is the action buttons: Publish; Delete; Publish; Delete. (Though with the amount of spam on my site, it's probably Delete; Delete; Delete, like the Cybermen.)

I've not timed it, but I reckon I can probably go at quite a rate. And that's with just one of me: the core Queue system guarantees that only one worker can claim an item at any one time, and that applies to human workers too. So if another user were to work the queue too, by going to the same page, they would be getting shown different comments to work on, and we'd work through the comments at twice the rate.
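The claiming behaviour can be sketched as follows. This is a toy in-memory queue in Python, loosely modelled on the create, claim and delete steps of Drupal's queue; the ClaimQueue class and its method names are hypothetical, not the Drupal API.

```python
import time


class ClaimQueue:
    """Toy queue where each item can be held by only one worker at a time."""

    def __init__(self):
        self._items = {}  # item id -> [data, time at which the current claim expires]
        self._next_id = 1

    def create_item(self, data):
        item_id = self._next_id
        self._next_id += 1
        self._items[item_id] = [data, 0]  # Expiry of 0 means unclaimed.
        return item_id

    def claim_item(self, lease_time=3600, now=None):
        """Hand out the first item whose claim has expired, or None."""
        now = time.time() if now is None else now
        for item_id, record in self._items.items():
            if record[1] <= now:
                record[1] = now + lease_time
                return item_id, record[0]
        return None

    def delete_item(self, item_id):
        # Called once the worker has dealt with the item.
        self._items.pop(item_id, None)


queue = ClaimQueue()
queue.create_item("Nice post!")
queue.create_item("Buy cheap watches")

first = queue.claim_item(now=100)
second = queue.claim_item(now=100)
print(first, second)             # Two workers get two different comments.
print(queue.claim_item(now=100)) # Nothing is left to claim.
```

Because a claim is a lease with an expiry time, an item abandoned by one worker becomes available again to the next one instead of being stuck.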

Now I just need to find a compliant friend and make them into my worker drone. If you're interested, please don't post a comment!

The ripple effect

By joachim, Fri, 11/07/2014 - 19:21

Hello! I'm back. I've not made any blog posts in over a year and a half due to the site where my blog was before, drupaler.co.uk, closing down. And while it took me some time to get round to writing a Migrate script to import my posts from the old site's database, it was actually getting round to setting up this new domain that took the longest.

So what have I been doing all this time? Especially as I still don't have a single Drupal 7 site out there to my name? Well, these days I work on a humongous web application which has kept me busy for the last 18 months; it's a large Drupal site (we hit a million of one of its several custom entity types recently), but to the general public it's just a login page. I may talk more about the development challenges in future posts.

Prior to that, I was building what would have been one of the earliest big Drupal Commerce sites to launch... except that very shortly before launch in October 2012, the whole project got canned.

That was obviously rather demoralizing, as it had been a year in the building. However, it's interesting to look back and see just how much a large-scale project can contribute back to the Drupal community. The following is a list of the contrib modules that were created as part of the development of the project. Many of them are pretty simple; many of them don't get that much use. But they get some, and so it's interesting to see how much code a project can share if development is steered towards reusability, and also how a single project, even one that never sees the light of day, can have effects that ripple out and benefit many others.

Reference field option limit

This was built to improve the UI for selecting terms from large taxonomies, where one vocabulary groups another. For example, you could have a huge vocabulary of cities, and a smaller one of countries. Each city term has a term reference to the country it belongs to (the glee of the early days of Drupal 7: all the fields on all the things!), and the entity that you want to tag (in my case a product) has term reference fields to both the country and the city. The way this module works is that the city terms shown in one widget are filtered each time you change the selected country term in the other field's widget. So you select 'France' and the city field widget updates with AJAX to show only cities in France.

It sounds simple to explain (I hope!), but involves a fair amount of work under the hood in the form alteration to pull it off. It's one of my most popular modules, and has received a fair number of patches from other users.
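Stripped of the form alteration and AJAX, the core of the filtering is small. The following Python sketch is an illustration only, with a hypothetical limit_options function and made-up data; it is not the module's code.

```python
def limit_options(city_countries, selected_country):
    """Keep only the cities whose country reference matches the selection."""
    return [
        city
        for city, country in city_countries.items()
        if country == selected_country
    ]


# Each city term references the country term it belongs to.
city_countries = {"Paris": "France", "Lyon": "France", "Berlin": "Germany"}
print(limit_options(city_countries, "France"))
```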

Devel Contrib and Field Tools

These two are developer modules. Devel Contrib arose from my constantly needing to dsm() the data from info hooks such as hook_views_data() and hook_entity_property_info(). When the thing you're developing keeps going wrong, one of the first things to do is poke around to check you've properly declared everything to the APIs you're using. I started out just having this as a couple of menu items declared in a custom module, but pretty soon I added more, and I wanted it on other development sites, so the obvious thing to do was clean it up and release it.

Field Tools was something I wrote because the Commerce site in question had dozens of different product types and corresponding node types, with lots of common fields, and I really didn't want to spend hours clicking through the field admin UI to set them up. This may seem like a case of condiment-passing, but I'm sure it's saved me hours of tedium: create the fields once, and then quickly clone them to any other entity types and bundles.

Field Instance Cardinality

A very small tweak module, this lets you override a particular instance of a multi-valued field and set it to be single-valued only. It's a hook_form_alter() hack made reusable by adding some field admin settings, and a good example of how site customization can often be done as modules rather than alter hacks.

Field Value Link Formatter

Very simple module: your taxonomy term fields (or other entity references) should link to a view with an entity ID argument. I guess the more traditional way of doing this is with a lot of path aliases for your taxonomy terms. I think we may also have used this to create Profile-style lists of related entities.

Commerce Shipping Weight Tariff

This was created to deal with the complex business rules we had for shipping costs. Written very near to our planned launch date, it's an example of what I call the 'skimp on the admin UI' contrib module: hardcode the settings, but do so in a way that's cleanly separated from the rest of the module, so that later on an admin UI can be added, perhaps contributed as a patch. Though I think that's yet to materialize for this one.

Views Grouped Table

This was created to help keep track of how the product entities and product display nodes all related to one another.

Views Dependent Filters

Allows the presence of exposed filters on a view to be controlled by values in another exposed filter. This was intended to handle filtering Views of products, though we later switched to using Views with Solr.

Taxonomy add previous

A little bit of UX sugar for when you're adding lots of taxonomy terms that are very similar. In our case, this was terms representing sizes. On creating a new term, this takes you straight back to the form for adding another term, and prefills the field values from the one you just added.

Flag Expire

Allows flags to have either an expiry date, or an expiry period. This was going to be used to feature products, or mark them as new, or discounted. (For new products, you could just automatically mark all products that were created in the last x days, say. But as I recall, the client didn't want ALL new products marked, just selected ones. The way clients do.)

As well as new contrib modules, the project resulted in work on existing contrib modules, in particular Flag, Data, Views Hacks, and Commerce Delivery.
