
A git-based patch workflow for drupal.org (with interdiffs for free!)

There's been a lot of discussion about how we need github-like features on d.org. Will we get them? There are definitely many improvements in the pipeline for the way our issue queues work. Whether we actually need to replicate github is another debate (my take is that we don't).

In the meantime, I think that it's possible to have a good collaborative workflow with what we have right now on drupal.org, with just the issue queue and patches, and git local branches. Here's what I've gradually refined over the years. It's fast, it helps you keep track of things, and it makes the most of git's strengths.

A word on local branches

Git's killer feature, in my opinion, is local branches. Local branches allow you to keep work on different issues separate, and they allow you to experiment and backtrack. To get the most out of git, you should be making small, frequent commits.

Whenever I do a presentation on git, I ask for a show of hands of who's ever had to bounce on CMD-Z in their text editor because they broke something that was working five minutes ago. Commit often, and never have that problem again: my rule of thumb is to commit any time that your work has reached a state where if subsequent changes broke it, you'd be dismayed to lose it.

Starting work on an issue

My first step when I'm working on an issue is obviously:

  git pull

This gets the current branch (e.g. 7.x, 7.x-2.x) up to date. Then it's a good idea to reload your site and check it's all okay. If you've not worked on core or the contrib project in question in a while, then you might need to run update.php, in case new commits have added updates.

Now start a new local branch for the issue:

  git checkout -b 123456-foobar-is-broken

I like to prefix my branch name with the issue number, so I can always find the issue for a branch, and find my work in progress for an issue. A description after that is nice, and as git has bash autocompletion for branch names, it doesn't get in the way. Using the issue number also means that it's easy to see later on which branches I can delete to unclutter my local git checkout: if the issue has been fixed, the branch can be deleted!
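Here is a minimal sketch of that housekeeping in a throwaway repository. The issue numbers and branch names are made up for illustration, and so is the assumption that issue 123456 has since been fixed.

```shell
#!/bin/sh
# Sketch: issue-number-prefixed branches are easy to find and prune.
# Issue numbers and branch names below are hypothetical.
set -e
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git symbolic-ref HEAD refs/heads/7.x-1.x
git config user.name "Example Dev"
git config user.email "dev@example.com"
git commit -q --allow-empty -m "Initial commit"

# One local branch per issue, named ISSUE-description.
git branch 123456-foobar-is-broken
git branch 234567-bizbax-needs-docs

# Find the work in progress for a given issue number.
git branch --list '123456-*'

# Issue 123456 has been fixed, so its branch can go. -D forces the
# deletion: the fix usually lands upstream as a fresh commit made from
# the patch, so git won't consider this branch merged.
git branch -D 123456-foobar-is-broken
```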

So now I can go ahead and start making commits. Because a local branch is private to me, I'm free to commit code that's a total mess, something like:

  dpm($some_variable_I_needed_to_examine);
  /*
  // Commented-out earlier approach that didn't quite work right.
  $foo += $bar;
  */
  // Badly-formatted code that will need to be cleaned up.
  if($badly_formatted_code) { $arg++; }

That last bit illustrates an important point: commit code before cleaning up. I've lost count of the number of times I've got something working, cleaned it up, and then broken it because I accidentally removed an important line that was lost among the cruft. So as soon as code is working, I make a commit, usually with a message like 'TOUCH NOTHING IT WORKS!'. Then I start cleaning up: removing the commented-out bits, the false starts, and the stray code that doesn't do anything, in small commits of course. (This is where you find that some of that stray code actually did do something, and everything breaks: but that doesn't matter, because you can just go back to a previous commit, or even use git bisect.)
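Here is a minimal sketch of the commit-before-cleanup habit in a throwaway repository. The file, its contents and the bad clean-up are hypothetical. git revert is one way back: it records a new commit that undoes the bad one. Checking out an earlier commit's version of the file would also work.

```shell
#!/bin/sh
# Sketch: commit the working state before cleaning up, so a clean-up
# that breaks things can be undone. File names and contents are made up.
set -e
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git symbolic-ref HEAD refs/heads/7.x-1.x
git config user.name "Example Dev"
git config user.email "dev@example.com"
printf 'base\n' > foobar.module
git add foobar.module
git commit -q -m "Initial commit"

git checkout -q -b 123456-foobar-is-broken
# Messy but working code: commit it straight away.
printf 'base\nimportant_line();\nstray_debug();\n' > foobar.module
git commit -q -a -m "TOUCH NOTHING IT WORKS!"

# A clean-up commit that accidentally removes the important line too.
printf 'base\n' > foobar.module
git commit -q -a -m "clean-up"

# Undo the bad clean-up with a new commit that reverses it...
git revert --no-edit HEAD
# ...then redo the clean-up properly.
printf 'base\nimportant_line();\n' > foobar.module
git commit -q -a -m "clean-up, take two"
```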

Keeping up to date

Core (or the module you're working on) doesn't stay still. By the time you're ready to make a patch, it's likely that there'll be new commits on the main development branch (with core it's almost certain). And before you're ready, there may be commits that affect your ongoing work in some way: API changes, bug fixes that you no longer need to work around, and so on.

Once you've made sure there's no work currently uncommitted (either use git stash, or just commit it!), do:

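  git fetch
  git rebase origin/BRANCH

where BRANCH is the main development branch that is being committed to on drupal.org, such as 8.0.x, 7.x-2.x, and so on. This assumes the remote is called origin, which is the default for a clone. git fetch updates the remote-tracking branch origin/BRANCH rather than your local copy of BRANCH, so that is what to rebase onto.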

(This is arguably one case where a local branch is easier to work with than a github-style forked repository.)

There's lots to read about rebasing elsewhere on the web, and some will say that rebasing is a terrible thing. It's not, when used correctly. It can cause merge conflicts, it's true. But here's another place where small, regular commits help you: small commits mean small conflicts, which shouldn't be too hard to resolve.
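Here is a minimal sketch of the fetch-and-rebase step. A local bare repository stands in for drupal.org, with a remote named origin (the default for a clone), and all names are hypothetical. It rebases onto origin/7.x-1.x because git fetch updates that remote-tracking branch, not your local 7.x-1.x.

```shell
#!/bin/sh
# Sketch: rebase an issue branch onto freshly fetched upstream work.
# A local bare repository stands in for drupal.org; directory, file,
# and branch names are hypothetical.
set -e
WORK=$(mktemp -d)
git init -q --bare "$WORK/upstream.git"
git -C "$WORK/upstream.git" symbolic-ref HEAD refs/heads/7.x-1.x

# The maintainer's repository, which pushes to "drupal.org".
git init -q "$WORK/maintainer"
cd "$WORK/maintainer"
git symbolic-ref HEAD refs/heads/7.x-1.x
git config user.name "Maintainer"
git config user.email "maintainer@example.com"
printf 'v1\n' > core.inc
git add core.inc
git commit -q -m "Initial commit"
git remote add origin "$WORK/upstream.git"
git push -q origin 7.x-1.x

# My clone, with a local branch for the issue.
git clone -q "$WORK/upstream.git" "$WORK/mine"
cd "$WORK/mine"
git config user.name "Contributor"
git config user.email "contributor@example.com"
git checkout -q -b 123456-foobar-is-broken
printf 'fix\n' > foobar.inc
git add foobar.inc
git commit -q -m "Fix the foobar."

# Meanwhile, a new commit lands upstream.
cd "$WORK/maintainer"
printf 'v2\n' > core.inc
git commit -q -a -m "API change"
git push -q origin 7.x-1.x

# Back on my branch: fetch, then rebase onto the remote-tracking branch.
cd "$WORK/mine"
git fetch -q
git rebase -q origin/7.x-1.x
```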

Making a patch

At some point, I'll have code I'm happy with (and I'll have made a bunch of commits whose log messages are 'clean-up' and 'formatting'), and I want to make a patch to post to the issue:

  git diff 7.x-1.x > 123456.PROJECT.foobar-is-broken.patch

Again, I use the issue number in the name of the patch. Tastes differ on this. I like the issue number to come first. This means it's easy to use autocomplete, and all patches are grouped together in my file manager and the sidebar of my text editor.

Reviewing and improving on a patch

Now suppose Alice comes along, reviews my patch, and wants to improve it. She should make her own local branch:

  git checkout -b 123456-foobar-is-broken

and download and apply my patch:

  wget PATCHURL
  patch -p1 < 123456.PROJECT.foobar-is-broken.patch

(Though I would hope she has a bash alias for 'patch -p1' like I do. The other thing to say about the above is that while wget is working at downloading the patch, there's usually enough time to double-click the name of the patch in its progress output and copy it to the clipboard so you don't have to type it at all.)

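And finally she commits it to her branch. Because patch only changes the files on disk, she needs -a to stage them (any new files the patch creates need a git add first). I would suggest a commit message that says where the patch came from:

  git commit -a -m "joachim's patch at comment #1"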

(Though again, I would hope she uses a GUI for git, as it makes this sort of thing much easier.)

Alice can now make further commits in her local branch, and when she's happy with her work, make a patch the same way I did. She can also make an interdiff very easily, by doing a git diff against the commit that represents my patch.
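Here is a minimal sketch of Alice's side, assuming the patch has already been downloaded. git apply stands in for patch -p1 (both accept a plain git diff), and the file contents and interdiff filename are made up. Recording the SHA of the commit that holds joachim's patch is what makes the interdiff a one-liner.

```shell
#!/bin/sh
# Sketch: apply a patch on a local branch, commit it, keep working,
# then roll both a patch and an interdiff. Names are hypothetical.
set -e
REPO=$(mktemp -d)
PATCHES=$(mktemp -d)
cd "$REPO"
git init -q
git symbolic-ref HEAD refs/heads/7.x-1.x
git config user.name "Alice"
git config user.email "alice@example.com"
printf 'one\ntwo\n' > foobar.module
git add foobar.module
git commit -q -m "Initial commit"

# Stand-in for joachim's patch from comment #1, kept outside the repo.
printf 'one\ntwo fixed\n' > foobar.module
git diff > "$PATCHES/123456.PROJECT.foobar-is-broken.patch"
git checkout -q -- foobar.module

# Alice's branch: apply and commit joachim's patch, noting its SHA.
git checkout -q -b 123456-foobar-is-broken
git apply "$PATCHES/123456.PROJECT.foobar-is-broken.patch"
git commit -q -a -m "joachim's patch at comment #1"
JOACHIM=$(git rev-parse HEAD)

# Alice's own improvements.
printf 'one improved\ntwo fixed\n' > foobar.module
git commit -q -a -m "Improve one."

# The patch is against the main branch; the interdiff is against the
# commit holding joachim's patch.
git diff 7.x-1.x > "$PATCHES/123456.2.PROJECT.foobar-is-broken.patch"
git diff "$JOACHIM" > "$PATCHES/interdiff-1-2.txt"
```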

Incorporating other people's changes to ongoing work

All simple so far. But now suppose I want to fix something else (patches often bounce around like this, as it's great to have someone else spot your mistakes, and to take turns). My branch is still as it was when I made my patch, while Alice's patch is against the main branch (for the purposes of this example, 7.x-1.x).

What I want is a new commit on the tip of my local branch that says 'Alice's changes from comment #2'. What I need is for git to believe it's on my local branch, but for the project files to look like the 7.x-1.x branch. With git, there's nearly always a way:

  git checkout 7.x-1.x .

Note the dot at the end. This is the path parameter to the checkout command. It tells git that rather than switching branches, you want to check out just the given file(s) while staying on your current branch. Since the path is a dot, we're doing that for the entire project. The branch remains unchanged, but every file is replaced by its 7.x-1.x version, in the index as well as on disk. Files that exist only on my branch, such as new files my patch added, are left in place.

I can now apply Alice's patch:

  wget PATCHURL
  patch -p1 < 123456.2.PROJECT.foobar-is-broken.patch

(Alice has put the comment ID after the issue ID in the patch filename.)

When I make a commit, the new commit goes on the tip of my local branch. The commit diff won't look like Alice's patch. Instead it'll show the difference between my patch and Alice's patch, which is effectively an interdiff. I now make a commit for Alice's patch, using -a so that the changes patch made to the files on disk are staged too:

  git commit -a -m "Alice's patch at comment #2"

I can make more changes, then do a diff as before, post a patch, and work on the issue advances to another iteration.
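Here is a minimal sketch of the checkout-dot trick in a throwaway repository, with git apply standing in for patch -p1 and made-up file contents. The last commit's diff is the interdiff between the two patches.

```shell
#!/bin/sh
# Sketch: commit another contributor's patch on top of my branch so the
# commit records an interdiff. Names and contents are hypothetical.
set -e
REPO=$(mktemp -d)
PATCHES=$(mktemp -d)
cd "$REPO"
git init -q
git symbolic-ref HEAD refs/heads/7.x-1.x
git config user.name "joachim"
git config user.email "joachim@example.com"
printf 'one\ntwo\nthree\n' > foobar.module
git add foobar.module
git commit -q -m "Initial commit"

# Alice's patch from comment #2, made against 7.x-1.x.
printf 'one\ntwo fixed\nthree improved\n' > foobar.module
git diff > "$PATCHES/123456.2.PROJECT.foobar-is-broken.patch"
git checkout -q -- foobar.module

# My branch, with my patch from comment #1 already committed.
git checkout -q -b 123456-foobar-is-broken
printf 'one\ntwo fixed\nthree\n' > foobar.module
git commit -q -a -m "my patch at comment #1"

# Make the files look like 7.x-1.x while HEAD stays on my branch.
git checkout -q 7.x-1.x .
git apply "$PATCHES/123456.2.PROJECT.foobar-is-broken.patch"
# -a stages modified and deleted files; any new files the patch
# created would need a separate git add.
git commit -q -a -m "Alice's patch at comment #2"
```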

Here's an example of my local branch for an issue on Migrate I've been working on recently. You can see where I made a bunch of commits to clean up the documentation to get ready to make a patch. Following that is a commit for the patch the module maintainer posted in response to mine. And following that are a few further tweaks that I made on top of the maintainer's patch, which I then of course posted as another patch.

A screenshot of a git GUI showing the tip of a local branch, with a commit for a patch from another user.

(Notice how in a local branch, I don't feel the need to type terribly accurately for my commit messages, or indeed be all that clear.)

Improving on our tools

Where next? I'm pretty happy with this workflow as it stands, though I think there's plenty of scope for making it easier with some git or bash aliases. In particular, the steps for applying Alice's patch would make a good script. (The stumbling block there is that the script needs to know the name of the main development branch. Maybe pass it the comment URL, and let it ask d.org what the branch of that issue is?)

Beyond that, I wonder if any changes can be made to the way git works on d.org. A sandbox per issue would replace the passing around of patch files: you'd still have your local branch, but you'd merge in and push instead of posting a patch. But would we have one single branch for the issue's development, which runs the risk of commit clashes, or start a new branch each time someone wants to share something, which adds complexity to merging? And finally, sandboxes with public branches mean that rebasing against the main project's development can't be done (or at least, not without everyone knowing how to handle the consequences). The alternative would be merging in, which isn't perfect either.

The key thing, for me, is to preserve (and improve) the way that so often on d.org, issues are not worked on by just one person. They're a ball that we take turns pushing forward (snowball, Sisyphean rock, take your pick depending on the issue!). That's our real strength as a community, and whatever changes we make to our toolset have to be made with the goal of supporting that.

Corralling permissions into a grid

I've just released Permissions Grid. It does what the name suggests: it presents related permissions in a grid, rather than the usual long list.

How are permissions structured into a grid? Well, only the ones that form natural groups are included. Every set of permissions of the form 'create foo, edit foo, delete foo, create bar, edit bar, delete bar' is turned into a matrix of checkboxes, with the verbs 'create, edit, delete' along the top and the objects 'foo, bar' down the side. When modules such as node, taxonomy, and commerce define related permissions for nodes, vocabularies, and products respectively, that gives you something like this:

This gives an easy to grasp overview of what a role can do with different objects on the site: which node types can this role create? which can they edit, or delete? which product types can they edit? which vocabularies can they create terms in?

If this sounds and looks vaguely familiar, that's probably because this module has an ancestor: my Drupal 6 module Node Permissions Grid, which I wrote back when a site's content types started to become too numerous to easily make sense of. That operated only on node types. Like a great many contrib modules being ported to Drupal 7, it's had to 'drop the node' and generalize. But in fact nothing restricts Permissions Grid to entities: all it cares about is permissions.

Structured permissions are declared to the module in an info hook, and each module may declare multiple sets of permissions. This allows for the fact that some modules add further vocabulary-related permissions which do not have the same pattern, and that commerce has entity permissions in both singular and plural form.

Are there any groups of permissions I've missed, whether in core or contrib? Post a feature request, or better still, take a look at the hook implementations already there and file a patch.

On rules versus hooks, or, abstraction shock

I need to add a bit of business logic to my Commerce site: a boolean field on product nodes marks that the corresponding products can't be delivered outside the UK.

And I know the way to do this in Commerce is to create a rule: react to the cart completing checkout, iterate over line items, check the corresponding products, and block the completion if the field in question is set.

Rules is great: with Rules, site builders can change site functionality and cause it to react to events. When non-techy people ask if my job involves designing websites, to put them right I say, 'I make websites go "bing!"'; and now, site builders can make them go 'bing!' too.

But I have a confession: I'm reluctant to use Rules. It's partly that I find the UI confusing, and that testing rules feels time-consuming. But deeper than that, I think I just feel too far removed from the actual thing I'm trying to make.

And that makes me wonder: am I becoming a Drupal dinosaur?

Because I can imagine that when Views first came along, developers who were used to writing their own queries and formatting the results themselves looked at the Views UI and thought, 'I don't feel in control of my lists of stuff any more'.

Or take the days before CCK, when developers wrote exactly the form elements they needed in the node form and saved the values to the database themselves. I still sometimes speak to non-Drupal developers who want to be able to dump data into the node table directly (or pull it out). I have to tell them they can't, because the data that actually makes up a node is spread out over the node table, the node_revision table, and then a multitude of field tables. Their feeling of disconnection at not being able to get their hands on 'the node' as a solid lump of database stuff must surely be akin to what I feel with Rules. I'm going to call this feeling 'abstraction shock'.

I want to write a hook. I want to write the code for it, for it to feel like a solid thing. I know that my rule can (indeed, should) be exported to code, but I want code that I can read and see exactly what it says, rather than code that Rules will consume and understand. And most of all, I want to be able to put in debug statements to understand what I'm getting as I write it, and after I've written it when it's going wrong or when the site functionality has to change.

If that makes me a dinosaur, save me a seat next to the brontosaurus.

Git tricks: repatching for an issue branch

My workflow for making patches is to use a feature branch for a single issue. Whether you're a contributor or a maintainer it lets you advance the fixing of the problem in small increments, and safely experiment knowing you can roll back.

But where it goes wrong is when your patch is superseded by a newer one in the issue queue, and you want to work on it some more. How do you update your branch for the ongoing work? As ever, with git there's a way.

Let's start with the basics first: you're making a feature branch to work on an issue. I tend to follow the naming pattern '123456-fix-all-the-bugs', but for this example I'll call it 'issue'.

// Make a new branch and switch to it.
$ git checkout -b issue
// Make lots of commits.
// Ready to make a patch against master:
$ git diff master > 123456.project.issue.patch

(Note that you can also make your patch show all your commits one by one, which can sometimes help make it clear what you're changing, but that's for another day.)

You've now got a patch which you're uploading to the issue queue, and your tree looks something like this:

* [issue] Last commit, ready to roll a patch!
* Fixed the foobar.
* Added a bizbax.
/
* [master]

Now someone else comes along to the issue queue, reviews your patch, and posts a new patch of their own. You in turn look at patch 2, and while it's an improvement, you think it needs still more work.

The problem is how to apply the patch to your repository. It won't apply to the tip of the issue branch. If you check out master and apply it there instead, the result has no connection to your issue branch. You can of course just discard your original issue branch, and create a branch issue2 for patch 2.

Or you can do this:

// Start on the issue branch.
// Stash any work in progress!
$ git stash
// Check out just the *files* of master, while keeping the HEAD pointer on
// the issue branch.
$ git checkout master -- .
// This puts the files from master into both the working tree and the
// index, while HEAD stays on the tip of the issue branch. In simpler terms,
// the reverse of patch 1 will appear staged (as git believes that your
// files *ought* to look like patch 1, but they actually look like master).
// We want the index clean, so unstage everything:
$ git reset HEAD .
// Any files that patch 1 added (and master lacks) are still on disk:
// delete them first if patch 2 adds them too, or the patch won't apply.
// Now apply the new patch.
$ patch -p1 < patch-2.patch
// Now commit this as patch 2. The -a stages the changed files; git add
// any new files the patch creates first.
$ git commit -a -m "Applied patch 2 from Ada Lovelace."
// Remember to stash pop when you're done!

Because the working tree files (that is, the actual files on your system) look like the master branch, the patch applies cleanly. But because git still believes it's on the tip of the issue branch, the commit you make goes on the tip of that branch. The diff it records is effectively the interdiff between your patch-1 and the other contributor's patch-2. Your tree looks like this:

* [issue] Applied patch 2 from Ada Lovelace.
* Last commit, ready to roll a patch!
* Fixed the foobar.
* Added a bizbax.
/
* [master]

Result: you can now do more work on this branch, and make more commits, and when you're ready, diff against master to make patch-3, ready to upload to the issue queue.

Git tricks: being on the wrong branch

I often find that I'm in the middle of one thing when I have to do another. It might be hotfixes for a client, a minor bug that blocks my current work, or components I need to add to a feature before I can add custom functionality.

The best way is to stash your current work, check out the master branch, commit, then go back. If you're working on a feature branch (and you should be), rebase that afterwards so you have access to the new work there. So that's:

$ git stash
$ git checkout master
// do commits
$ git checkout feature
$ git rebase master
$ git stash pop

But that's not always feasible. Sometimes I'm sloppy, and I've already made code changes before stashing. And lately, I've got one instance of Party module with a feature branch that's made database changes, but I don't want that to hold up ongoing commits (and I'm too lazy to set up a new local site!).

If your fix is just one commit, you can make it on the feature branch, then cherry-pick it to the master branch like this:

// make your commit and note its SHA
$ git stash
$ git checkout master
$ git cherry-pick COMMIT
$ git checkout feature
$ git rebase master
$ git stash pop

The rebase should be smart enough to figure out the same commit exists on both branches, and will silently drop it from the feature branch.
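Here is a minimal sketch of the cherry-pick route, including a stash of uncommitted work. The branch names and files are hypothetical. The final rebase notices that the hotfix is already on master (same patch contents) and leaves it out of feature.

```shell
#!/bin/sh
# Sketch: a fix made on the feature branch is cherry-picked to master,
# and the later rebase drops the duplicate from feature.
set -e
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
printf 'base\n' > README
git add README
git commit -q -m "Initial commit"

git checkout -q -b feature
printf 'feature work\n' > feature.inc
git add feature.inc
git commit -q -m "Feature work."

# The hotfix, made on the wrong branch; note its SHA.
printf 'hotfix\n' > hotfix.inc
git add hotfix.inc
git commit -q -m "Hotfix."
COMMIT=$(git rev-parse HEAD)

# Some uncommitted work in progress on feature.
printf 'wip\n' >> feature.inc

git stash -q
git checkout -q master
git cherry-pick "$COMMIT"
git checkout -q feature
git rebase -q master
git stash pop -q
```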

Alternatively, if you want to do a chunk of main branch work, make a temporary branch on the tip of feature, which you can then move to the master branch when you're done:

$ git checkout -b moveme
// make as many commits as you like
// Now we take everything that's between the tips of feature and moveme, and move it to the tip of master
$ git rebase --onto master feature moveme
// Now merge moveme into master: this'll just fast-forward master.
$ git checkout master
$ git merge moveme
// moveme can be deleted now
$ git branch -d moveme
// Now rebase the feature branch
$ git checkout feature
$ git rebase master
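Here is a minimal sketch of the temporary-branch route in a throwaway repository, with hypothetical file names. --ff-only on the merge makes the expected fast-forward explicit: if a fast-forward isn't possible, the merge fails rather than creating a merge commit.

```shell
#!/bin/sh
# Sketch: do main-branch work on a throwaway branch off feature, then
# move it onto master with rebase --onto. Names are hypothetical.
set -e
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
printf 'base\n' > README
git add README
git commit -q -m "Initial commit"

git checkout -q -b feature
printf 'schema change\n' > feature.install
git add feature.install
git commit -q -m "Feature: database changes."

# Main-branch work, started from the tip of feature.
git checkout -q -b moveme
printf 'fix one\n' > fix1.inc
git add fix1.inc
git commit -q -m "Fix one."
printf 'fix two\n' > fix2.inc
git add fix2.inc
git commit -q -m "Fix two."

# Replay the commits in feature..moveme onto master.
git rebase -q --onto master feature moveme
git checkout -q master
git merge -q --ff-only moveme
git branch -q -d moveme
git checkout -q feature
git rebase -q master
```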

As I've become more familiar with git, I've found that temporary, throw-away branches can be useful in a variety of situations. Another one is making a backup branch prior to potentially messy rebases: just create a branch 'backup' where you are to be sure that no matter what happens with the rebase, your current chain of commits will be preserved. If there are conflicts, you can diff against it to check no code was lost. And when you're happy with the result of the rebase, just delete it. Branches being cheap, and local, opens up a whole new set of uses for them.
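Here is a minimal sketch of the backup-branch safety net. The branch and file names are hypothetical. The backup has to be deleted with -D, since its old commits are not part of the rebased branch and git therefore considers it unmerged.

```shell
#!/bin/sh
# Sketch: a throwaway backup branch as a safety net around a rebase.
set -e
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"
printf 'base\n' > README
git add README
git commit -q -m "Initial commit"

git checkout -q -b feature
printf 'feature code\n' > feature.inc
git add feature.inc
git commit -q -m "Feature work."

# Meanwhile, new work lands on master.
git checkout -q master
printf 'master code\n' > master.inc
git add master.inc
git commit -q -m "Master work."

# Back on feature: mark where we are, then rebase.
git checkout -q feature
git branch backup
git rebase -q master

# The only difference from the backup should be master's new work.
git diff --stat backup HEAD
if [ -z "$(git diff backup HEAD -- feature.inc)" ]; then
  # Happy with the result, so the backup can go.
  git branch -q -D backup
fi
```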

Get out, git!

There are lots of good reasons to have your server's codebase be an actual git checkout. But there's one potential flaw: your entire repository's history ends up in your webroot inside a .git folder.

You can block access to it in your .htaccess, but that's hacking core (until this patch lands at least).

There is, however, an alternative method that lets you keep the entirety of git's repository folder outside the webroot.

Here's how to convert an existing repository to this format:

  1. Move the .git folder to another location, renaming it in the process so it's no longer hidden. The convention is to leave it with a .git ending though, so for example, 'mysite.git'. I put these inside a folder called 'git' in the user's home folder, for instance.

    $ mv .git ~/git/mysite.git

  2. In its original place in your webroot, create a new file called '.git'. Into this file place a single line thus:

    gitdir: /absolute/path/to/your/mysite.git

    This needs to be an absolute path; relative ones confuse git when you go into subfolders. Using '~' to start at the user's home folder doesn't seem to work either.

  3. Finally, we need to tell the config file where the work folder is. This step isn't completely necessary, but it allows you to invoke the git command while standing in subfolders of your webroot, which is too handy a thing to lose.

    Standing either in the webroot or in the git folder, do:

    $ git config core.worktree "/absolute/path/to/your/webroot"

    You can also edit the git config file by hand to set this, which allows you to also add a comment explaining the manoeuvre for future reference.

That's all there is to it. You now have a working git repository whose repository folder is completely inaccessible from the outside world.
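Here is a minimal sketch of the three conversion steps in throwaway directories. WEBROOT and GITDIRS are hypothetical stand-ins for your webroot and ~/git. Both paths are made absolute (via pwd -P), as the gitdir line and core.worktree need.

```shell
#!/bin/sh
# Sketch: move a webroot's repository out of the webroot.
# WEBROOT and GITDIRS are hypothetical, absolute paths.
set -e
BASE=$(cd "$(mktemp -d)" && pwd -P)
WEBROOT="$BASE/webroot"
GITDIRS="$BASE/git"
mkdir -p "$WEBROOT/sites/all" "$GITDIRS"
cd "$WEBROOT"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
printf '<?php\n' > index.php
printf 'x\n' > sites/all/README.txt
git add .
git commit -q -m "Initial commit"

# 1. Move the repository out of the webroot.
mv .git "$GITDIRS/mysite.git"
# 2. Leave a pointer file in its place, with an absolute path.
printf 'gitdir: %s\n' "$GITDIRS/mysite.git" > .git
# 3. Record the work tree in the repository's config.
git config core.worktree "$WEBROOT"

# git now works from a subfolder of the webroot.
cd "$WEBROOT/sites/all"
git status --short
```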

For creating a new repo, you can use the following finger-twister:

$ git --git-dir=/path/to/repo.git --work-tree=. init && echo "gitdir: /path/to/repo.git" > .git

There are more tips in this question on StackOverflow. And for a hands-on tutorial, come to my session on git at DrupalCamp Scotland, taking place later this month in Edinburgh.

Moving a git local branch from one local to another

You're maybe at one of the many Drupal Co-worker Friday events that are taking place around the world today. You've packed up your laptop and your lunchbox, and you're looking forward to a day out of the house with some human contact.

But yesterday you were halfway through a big piece of work on your project. And you were using a git local branch, of course. Why? Because it keeps your work isolated from the main development branch, allowing other work to continue independently. And because while your commits are only local, you're free to reorder them, edit the log messages, fix up mistakes as if they never happened, and so on. In fact, if you so choose, merging your local branch in with the --squash option makes it look like you made all of the work in a single, perfect commit. Wow!

But that branch is on your desktop machine, and today you're on the laptop. How to get it over from one to the other? You don't want to push it to the remote, because then that means you can't rework it, and it doesn't really belong there anyway. You could make one big patch of your branch against the development branch, but then on your laptop you won't have the dozen or so commits you made yesterday (and you work with small, simple commits, because it makes it easier to roll back should you need to, and to see what's been changed). Furthermore, you've a bunch of changes that aren't yet committed because they're work in progress. You need those too, but not mixed in with what's already committed and reasonably stable.

The solution is the git format-patch command. At first try this is a rather weird one, which fills your folder up with cryptic mailbox-format files (which as far as I can tell are nigh on useless in the modern world of webmail). But it has an option which turns it into something very useful indeed: --stdout. So much so that I've added it as an alias in the [alias] section of my global git config:

  fp  = format-patch --stdout

So standing in your repository and doing

  git fp the-dev-branch > my-local-branch.patch

will give you one single file that comprises multiple patches, one for each commit. Copy that to your laptop by your favourite means, and over there do:

  git checkout -b my-local-branch
  git am my-local-branch.patch

And all your commits from your desktop machine are reproduced on your laptop.

But what about your work in progress? I was coming to that. Before you run format-patch, commit all of it as one commit, perhaps with a log message to remind you that it's the work in progress.

Then after you've done 'git am', all we have to do is kill off that final tip commit, while keeping the changes it contains in the filesystem. The command for that is git reset with the --mixed option:

  git reset --mixed HEAD^

All your work in progress changes are now 'unstaged changes', and your laptop's copy of the repository is in exactly the same state as your desktop machine.

What about going back the other way? Well I'll figure that one out tonight, I expect ;)

PS. The return trip is easily done thus:

Once you've applied the patch on the new machine and reset away the work-in-progress commit, add a tag: git tag mytag. That marks the last commit both machines share. Then at the end of the day, do the original process in reverse, but take your patches from the tag: git fp mytag > homeward.patch

And on your home machine, remember to kill the 'work in progress' commit before you apply the patch, with 'git reset --hard HEAD^'. That's a hard reset this time, since the changes in that commit are in the new commits you're bringing back.
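Here is a minimal sketch of the whole round trip. Two local clones stand in for the desktop and the laptop, the fp alias is written out as format-patch --stdout, and names and contents are hypothetical. One assumption to flag: the tag is created on the laptop after the work-in-progress commit has been reset away, so it marks the last commit both machines share. A tag made before git am would pull the desktop's own commits into homeward.patch, and they would fail to apply back home.

```shell
#!/bin/sh
# Sketch: desktop -> laptop -> desktop with format-patch and am.
set -e
BASE=$(mktemp -d)

git init -q "$BASE/desktop"
cd "$BASE/desktop"
git symbolic-ref HEAD refs/heads/the-dev-branch
git config user.name "Example Dev"
git config user.email "dev@example.com"
printf 'base\n' > app.inc
git add app.inc
git commit -q -m "Initial commit"

git clone -q "$BASE/desktop" "$BASE/laptop"
git -C "$BASE/laptop" config user.name "Example Dev"
git -C "$BASE/laptop" config user.email "dev@example.com"

# Desktop: a local branch with two commits, plus a WIP commit.
git checkout -q -b my-local-branch
printf 'one\n' > one.inc
git add one.inc
git commit -q -m "One."
printf 'two\n' > two.inc
git add two.inc
git commit -q -m "Two."
printf 'wip\n' >> app.inc
git commit -q -a -m "WORK IN PROGRESS"
git format-patch --stdout the-dev-branch > "$BASE/my-local-branch.patch"

# Laptop: recreate the branch, turn the WIP commit back into unstaged
# changes, and tag the last commit both machines share.
cd "$BASE/laptop"
git checkout -q -b my-local-branch
git am -q "$BASE/my-local-branch.patch"
git reset -q --mixed HEAD^
git tag mytag
# A day's work, including committing the WIP.
git commit -q -a -m "Finish the WIP."
printf 'three\n' > three.inc
git add three.inc
git commit -q -m "Three."
git format-patch --stdout mytag > "$BASE/homeward.patch"

# Desktop: drop the WIP commit, then bring the day's commits home.
cd "$BASE/desktop"
git reset -q --hard HEAD^
git am -q "$BASE/homeward.patch"
```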

Hope your Drupal Co-worker Friday was as productive as mine!

A concept for limiting taxonomy terms by common fields

There are several modules that provide taxonomy term widgets that are more efficient at drilling down into large and complex vocabularies of terms, but I've not yet found something that just limits terms in some way.

The use case is somewhat like this: I have products that are classified by sport and by team. And teams can be rugby teams or football teams, and so on. Having one vocabulary per type of team feels somewhat weak, but putting them all in one vocabulary means the user has to wade through a lot of irrelevant terms.

So I've been thinking of ways to limit the terms in the field widget. I don't think this is something I'm going to develop (for reasons that will become clear), but it's just a wacky idea[*] I'm putting out there.

I've discarded my initial thought of using a hierarchy of terms, because that would result in terms that are essentially empty: a 'Football teams' parent term is not really a team. It would also mean needing to prevent its selection in the widget, and its appearance in filter forms.

What I've landed on as a concept instead seems wacky but I think makes a lot of sense: now that we can have fields on anything, we can match them. What I mean by this is that we compare the value in the 'sport' field on the product with the value in the identical sport field applied to the terms. (Yes, taxonomy terms on taxonomy terms. It was bound to happen eventually, right?)

In terms of IA, this is actually quite simple and logical: add the sport term reference field, which is already on products, to the team vocabulary as well. The client will be filling it in on the team terms; I have no idea which teams belong to which sport.

In terms of the work the UI has to do, it's not that complex either once it's laid out. It goes like this: when the value in the sport field changes, make an ajax call on the team field. In the callback, load all the terms. For each term, look at its sport value. Does that match the current value of the sport field in the product form? If so, let the term into the new options array. If not, discard it.

It's simple, but I've hit a problem: we have two really good JavaScript systems in Drupal 7 that work very well individually, but as far as I can tell live in completely different universes. The states system is great for telling one form element to show, hide, or change its value depending on the value of another element. The ajax system is great for telling one form element to react to a change in itself by updating something via ajax. What I don't see how to do is get the term reference widget element to watch for changes in other elements (like the states system does), and then update itself with ajax (using the ajax system ideally).

So that's where I've hit the buffers: my JavaScript skills are pretty minimal, and so that's where I leave this latest wacky idea for now. I've got proof-of-concept code of a term reference widget updating itself via ajax based on form values, but the missing piece is getting it to react to changes in another field, whose form elements can't be altered in PHP (because the term reference widget is being changed in hook_field_widget_form_alter(), which only sees that form element). I'd be grateful for any hints; I actually think this model is not as wacky as it first seems and could have a lot of applications.

[*] Back at DrupalCon London I chanced upon a conversation webchick and stella were having about Coder review automation on Drupal.org, and prefixed a suggestion with, 'I've got a wacky idea...', to which webchick said, 'When are your ideas not wacky?' I don't know whether it was just an off the cuff quip, or whether I really do come up with a lot of crazy stuff. I guess I can live with that reputation ;)

Dynamically changing Views table joins

I've recently had cause to make Views join to tables in peculiar ways. Here are some notes on the peculiar things I did with the views_join class to accomplish that.

First of all, I'll briefly recap how we define a table to Views. Each item in the $data array returned by hook_views_data() represents all the information about a table. Each key in that item is a field on the table (well, or pseudofield). The exception is the 'table' key, which holds the basic data about our table, like this:

$data['my_table'] = array(
  'table' => array(
    // This defines how the table joins back to different bases.
    'join' => array(
      // How to join back to the base table 'crm_party'.
      'crm_party' => array(
        'left_field' => 'pid',
        'field' => 'pid',
      ),
    ),
  ),
);
// Now we can add field definitions on this table.

That's the simplest case. It says, 'to join back to {crm_party}, join on the column 'pid' on both tables'. (Note I will say 'column' when I am speaking of the database, and 'field' for Views, though that can mean both a field that you add to the view, and a field on the table that provides filters, sorts, or arguments.)

So adding a field on this table will cause Views to add this join clause to the query:

... JOIN my_table ON crm_party.pid = my_table.pid

We can easily join where the columns have different names, by giving different values for 'field' and 'left_field' in the table definition.

If the join requires conditions, that's where the 'extra' clause comes in, like this:

// How to join back to the base table 'crm_party'.
'crm_party' => array(
  'left_field' => 'pid',
  'field' => 'pid',
  'extra' => 'foo = 42',
),

This now gives us:

... JOIN my_table ON crm_party.pid = my_table.pid AND foo = 42

The 'extra' can also take an array, in which case each item is an array containing field, operator, and value. (If it seems a bit like the Database API, but not quite, that's because all this was introduced in Views 2 on Drupal 6).

// How to join back to the base table 'crm_party'.
'crm_party' => array(
  'left_field' => 'pid',
  'field' => 'pid',
  'extra' => array(
    // The 'extra' array is numeric, hence has no keys. This always looks odd to me!
    array(
      'field' => 'foo',
      'value' => 42,
      'numeric' => TRUE,
    ),
  ),
),

So far, this is all covered in the Advanced Help documentation contained within Views. But for our relationship handler from CRM Parties to attached entities, we needed a condition on the join that depends on values selected in the UI. So the 'extra', defined in hook_views_data(), won't do, as it's not changeable. Or is it?

When a relationship handler is adding itself to the query, the query hasn't been fully built yet. Instead, Views holds a views_plugin_query_default object, which will eventually be used to make a Database API SelectQuery. This means we can actually reach into its table queue and change the join definition for any table to the left of us, like this:

// Our relationship handler's query method:
function query() {
  // Call our parent query method to set up all our tables and joins.
  parent::query();

  if ($this->options['main']) {
    // This is a little weird.
    // We don't add an 'extra' (i.e. a further join condition) on our
    // relationship join, but rather on the join that got us here from
    // the {crm_party} table.
    // This means reaching into the query object's table queue and fiddling
    // with the join object.
    // Setting a join handler for the join definition is not useful, as that
    // would have no knowledge of the user option set in this relationship
    // handler.
    // @todo: It might however be cleaner to set one anyway and give it
    // a method to add the extra rather than hack the object directly...
    $table = $this->table;
    $base_join = $this->query->table_queue[$table]['join'];
    $base_join->extra = array(array('field' => 'main', 'value' => TRUE));
  }
}
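The trick relies on how PHP (5 and later) handles objects: reading the join out of the table queue copies the object handle, not the object, so $base_join and the queue entry are the same join, and setting its extra property changes the join that will later be built. A minimal stand-alone sketch of that behaviour, using hypothetical stand-in classes rather than the real Views ones:

```php
<?php
// Hypothetical stand-ins for views_join and views_plugin_query_default,
// cut down to what matters here.
class ExampleJoin {
  public $table;
  public $extra = NULL;

  public function __construct($table) {
    $this->table = $table;
  }
}

class ExampleQuery {
  // Like the Views table queue: keyed by table alias, each entry holds the
  // join object that will be used when the query is finally built.
  public $table_queue = array();

  public function add_table($table) {
    $this->table_queue[$table] = array('join' => new ExampleJoin($table));
  }
}

$query = new ExampleQuery();
$query->add_table('my_table');

// Reading the join out of the array copies the object handle, not the
// object, so $base_join and the queue entry are the same join...
$base_join = $query->table_queue['my_table']['join'];
$base_join->extra = array(array('field' => 'main', 'value' => TRUE));

// ...and the new condition is visible through the table queue.
print $query->table_queue['my_table']['join']->extra[0]['field'] . "\n";
// Prints: main
```

Had the join been stored as a plain array rather than an object, the assignment would have made a copy, and the change would have had to be written back into the queue explicitly.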

When I wrote those comments in the code last week, I'd found that using a custom join handler isn't useful, because it has no knowledge of the relationship handler's data. However, this week I found myself working on a different case where I did need to find a way to do just that.

This week's problem was how to filter out Drupal Commerce products that are in the current cart, or more generally in any order (and the current cart's order ID can be supplied with a default argument plugin).

It seems a reasonable enough thing to ask of Views, but it's actually pretty complex, as what's in a cart or order is not products but line items, each of which refers to a product with a reference field.

After several failed attempts, I managed to write a query that produces the correct result, but it requires joining to a subquery which itself has the order ID within it (see the issue for gory details).

The first hurdle is easy to overcome: there's nothing wrong with telling Views about a table that doesn't exist. This is often done for aliased tables, but in fact the table can be totally fictional, provided we also supply our own join handler that knows what to do to the query. The join can produce anything, as long as the SELECT fields we also add make sense. Hence it's fine to do this in hook_views_data():

// Fake table for the 'product is in order' argument, made from a subquery.
$data['commerce_product_commerce_line_item'] = array(
  'table' => array(
    'group' => 'Commerce Product',
    'join' => array(
      // Join to the commerce_product base.
      'commerce_product' => array(
        'left_field' => 'entity_id',
        'field' => 'line_item_id',
        'handler' => 'views_join_commerce_product_line_item',
      ),
    ),
  ),
);

Our custom join class now has to add the subquery to the view, but it also needs the argument value to do this.

The way I worked around this was to override the ensure_my_table() method in the argument handler. Normally, this calls $this->query->ensure_table(), which then creates the join, but ensure_table() can also be given a join object to use. The overridden version of ensure_my_table() creates the join object itself, and sets the argument value on it:

function ensure_my_table() {
  // Pre-empt views_plugin_query_default::ensure_table() by setting our join up now.
  // Argh, hack this in for now. This may mean relationships using this break?
  $relationship = 'commerce_product';

  // Get a join object for our table.
  // This is of class views_join_commerce_product_line_item, which takes
  // care of joining to a subquery rather than a table.
  $join = $this->query->get_join_data($this->table, $this->query->relationships[$relationship]['base']);

  // We add the argument value to the join handler as it needs to use it
  // within its subquery.
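  $join->argument = $this->argument;

  // Pass our prepared join to ensure_table(), which then uses it instead of
  // creating one from the table definition.
  $this->table_alias = $this->query->ensure_table($this->table, $relationship, $join);
  return $this->table_alias;
}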

This means that in the build_join() method of our custom join handler, views_join_commerce_product_line_item, we can rely on having the argument value that the view has received:

function build_join($select_query, $table, $view_query) {
  // (snip...) build a SelectQuery object for the subquery
  // Set the condition based on the argument value.
  $subquery->condition('cli.order_id', $this->argument);
  // (snip...)
  // Add the join on the subquery.
  $select_query->addJoin($this->type, $subquery, $table['alias'], $condition);
}

The views_join class's build_join() method is where Views' own representation of the query is translated into a Database API SelectQuery object. Here we build up our own query (we don't need Views-safe aliases, as it's a completely internal, non-correlated subquery), and pass it to addJoin() in place of a table name, so that the join is made against the subquery.
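Stripped of Views and the database layer, the pattern is: the handler that knows the value creates the join object itself, hands the value to it, and only then lets it be used, so that the value is already there when the join is built. The sketch below shows this with hypothetical stand-in classes and produces SQL text plus placeholder arguments rather than a SelectQuery. It reuses the column names from the hook_views_data() definition above, but the subquery itself is an illustrative assumption, not the query from the issue.

```php
<?php
// Hypothetical stand-in for views_join_commerce_product_line_item. It
// returns SQL text plus placeholder arguments, not a SelectQuery.
class ExampleSubqueryJoin {
  public $left_table;
  public $left_field;
  public $field;
  // Set by the argument handler before the join is built.
  public $argument = NULL;

  public function __construct(array $definition, $left_table) {
    $this->left_table = $left_table;
    $this->left_field = $definition['left_field'];
    $this->field = $definition['field'];
  }

  public function build_join($alias) {
    if ($this->argument === NULL) {
      throw new LogicException('The argument must be set before the join is built.');
    }
    // Illustrative subquery only; the real one is more involved. The order
    // ID is passed as a placeholder argument rather than pasted into SQL.
    $subquery = "SELECT cli.{$this->field} FROM {commerce_line_item} cli WHERE cli.order_id = :order_id";
    return array(
      'sql' => "LEFT JOIN ($subquery) $alias ON {$this->left_table}.{$this->left_field} = $alias.{$this->field}",
      'args' => array(':order_id' => $this->argument),
    );
  }
}

// Hypothetical stand-in for the argument handler. prepare_join() plays the
// part of the ensure_my_table() override: it creates the join itself and
// gives it the argument value before anything builds it.
class ExampleArgumentHandler {
  public $argument;

  public function __construct($argument) {
    $this->argument = $argument;
  }

  public function prepare_join(array $definition, $left_table) {
    $join = new ExampleSubqueryJoin($definition, $left_table);
    $join->argument = $this->argument;
    return $join;
  }
}

// Column names as in the hook_views_data() definition; 17 is a made-up order ID.
$definition = array('left_field' => 'entity_id', 'field' => 'line_item_id');
$handler = new ExampleArgumentHandler(17);
$join = $handler->prepare_join($definition, 'commerce_product');
$built = $join->build_join('line_items');
```

A join created without going through the handler has no argument, and build_join() refuses to run, which is exactly the situation the ensure_my_table() override avoids.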

It remains only for the argument handler's query() method to add the conditions for its field, using the alias and field names we gave for the subquery.

In conclusion, the data structure Views understands may appear to be fixed and declared up front, but with a little tweaking, the way tables are joined in a Views query can be affected by both site configuration and user input.

It's Amazing What You Find: Crusty Bits of the Menu System

I've been poking in the innards of the menu system the last few days. This is due to yet another client wanting to do something that goes completely against the grain of Drupal.

In this case, it's the way Drupal only shows you menu links you have access to. This seems to me perfectly reasonable, and good usability: why show you something you can't use? On the other hand, there is a long-standing feature in Comment module that shows anonymous users a 'Login or register to post comments' link on nodes, so 'incitements to action' (or whatever the social media buzzword is) do exist in Drupal.

So the challenge was to show a link that the anonymous user can't use, send them to log in, and then take them back to the link they wanted in the first place. The last part is just some hook_form_alter() work with form redirection (though it did let me discover that everything drupal_get_form() is passed is available to the alter hooks, just in a funny place). Access to the menu item is handled by intercepting hook_menu_link_alter() to save a twin of the menu item, triggered by a checkbox added to the menu edit form. Even registering the path is easy: a custom menu access callback takes as its access arguments the original item's access callback plus the original item's access arguments, and negates whatever the original would return. The part I'm (so far) stuck on is getting hook_menu_alter() to know what the admin user has done in hook_menu_link_alter(): the next job will probably involve a truly ugly query that uses LIKE with % wildcards to grab menu items based on their serialized options array.
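The negating access callback could look something like the sketch below. The function name example_menu_login_access(), its argument layout (original callback first, then the original access arguments) and the stand-in example_user_access() are all hypothetical, chosen so the sketch runs outside Drupal; in Drupal 6 a menu item's 'access arguments' are passed to its 'access callback' as ordinary function arguments, and an access setting can also be a fixed TRUE or FALSE.

```php
<?php
// Hypothetical access callback for the 'log in to use this' twin of a menu
// item. It receives the original item's access callback followed by the
// original access arguments, and grants access only where the original
// item would deny it.
function example_menu_login_access() {
  $args = func_get_args();
  $original_callback = array_shift($args);
  // A fixed TRUE/FALSE (possibly stored as '1'/'0') needs no function call.
  if (is_bool($original_callback) || is_numeric($original_callback)) {
    return !$original_callback;
  }
  return !call_user_func_array($original_callback, $args);
}

// Hypothetical stand-in for user_access(), as seen by an anonymous user
// who may view content but not create it.
function example_user_access($permission) {
  return in_array($permission, array('access content'));
}

// The anonymous user is denied the original item, so sees the twin.
$shown = example_menu_login_access('example_user_access', 'create story content');
```

Because the twin reuses the original item's own access check rather than duplicating it, exactly one of the two links is visible to any given user.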

But that's not what I came here to tell you about today.

In my prodding around of the menu system, I found that menu router items have a 'block_callback' property, with its own database field and everything. Now get this: only one item in the entire {menu_router} table on D6 has this filled, and to boot, it has absolutely no effect. (It's the admin theme page, by the way.) This property is something to do with the way that the root admin page is made out of things that are sort of blocks, but not quite. If the property exists on the router item, then the callback is called to add (!!) to the content. (The admin theme page of course plays no part in the root admin page.)

So here we had a completely useless database field, probably left over from Drupal 5 or even earlier. In a standard Drupal 7 install, the field is entirely unused. I made a dummy patch to get the issue queue testbot to confirm that this particular piece of code is never reached, and the superfluous field has now been removed. It's amazing what you find!

By the way, if anyone fancies some bikeshedding, I still don't have a name better than 'menu_login' for this module. To your paintbrushes!
