Debugging and logging AJAX requests in tests in Docksal

The hardest thing I find with tests is understanding errors. Every time I think I've got debugging output sorted, I find a new layer where it doesn't work, I've got nothing, and I'm in the dark with something that's crashing and I don't know why.

The first layer is simple: errors in your test code itself. For example, make a typo in your tests/src/Functional/MyTest.php and PHPUnit crashes and you see the error in the terminal.

But when it's site code that's crashing, you're dealing with a system that is being driven by code, and therefore one you can't see into. And that's a major obstacle to figuring out a problem.

The HTML output that Drupal's Functional and Functional Javascript tests produce is a huge help: every time your test code makes a request to the test site, an HTML file is written to the test files directory. If your site crashes when your test makes a request, you'll see the error and the backtrace there.

However, there's no such output when, in a Functional Javascript test, you cause an AJAX request. And while you can create a screenshot of what the page looks like after the request, or another HTML file of the page (see https://www.drupal.org/project/drupal/issues/3090498 for instructions on how; the issue is about making that automatic, but I have no idea how that might be possible), you can't see the actual error, because AJAX requests that fail just sit there doing nothing. There's nothing useful to see in the browser.

So we need to see logs. When a real site has an AJAX crash, with a human-controlled web browser making the request, you can go and look in the logs. With a test site, the log table is zapped when the test completes.

Fortunately, Drupal 8's pluggable logging means there are other ways of getting hold of them, more permanent ways.

I first tried the log_stdout module. This outputs log entries to STDOUT. If you're running on Docksal, as I am, you have an extra layer to get through to see that. You can monitor the cli container with fin logs -f cli, and with that module, add a | ag WATCHDOG to filter.
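
Put together, the monitoring command looks like this (assuming Docksal's standard cli service, and ag installed on the host):

fin logs -f cli | ag WATCHDOG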

However, I wasn't seeing backtraces in this output, and I gave up trying to figure out why.

So I tried the filelog module instead, which, as the name implies, writes the log to a simple text file. This needs a little bit more work, as by default it writes to 'public://logs'. That means each run of the test gets its own log file, which is perhaps what you want, but for my own uses I wanted a single log file I could tail -f in a terminal window for continual monitoring.

A quick bit of config setting in the test's setUp() does the trick:

$this->config('filelog.settings')
  ->set('location', '/var/www/docroot/sites/simpletest/logs')
  ->save();
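
Then a tail -f of the log file in a separate terminal window gives that continual monitoring (the filename here assumes filelog's default setting, so adjust it to match yours):

tail -f /var/www/docroot/sites/simpletest/logs/drupal.log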

And I think that's me at last sorted.

Multi-site search using Feeds and SearchAPI

[This is an old post that I wrote for System Seed's blog and meant to put on my own too but it fell off my radar until now. It's also about Drupal 7, but the general principle still applies.]

Handling clients with more than one site involves lots of decisions. And yet, it can sometimes seem like ultimately all that doesn't matter a hill of beans to the end-user, the site visitor. They won't care whether you use Domain module, multi-site, separate sites with common codebase, and so on. Because most people don't notice what's in their URL bar. They want ease of login, and ease of navigation. That translates into things such as the single sign-on that drupal.org uses, and common menus and headers, and also site search: they don’t care that it’s actually sites search, plural, they just want to find stuff.

For the University of North Carolina, who have a network of sites running on a range of different platforms, a unified search system was a key way of giving visitors the experience of a cohesive whole. The hub site, an existing Drupal 7 installation, needed to provide search results from across the whole family of sites.

This presented a few challenges. Naturally, I turned to Apache Solr. Hitherto, I've always considered Solr to be some sort of black magic, from the way in which it requires its own separate server (http not good enough for you?) to the mysteries of its configuration (both Drupal modules that integrate with it require you to dump a bunch of configuration files into your Solr installation). But Solr excels at what it sets out to do, and the Drupal modules around it are now mature enough that things just work out of the box. Even better, Search API module allows you to plug in a different search back-end, so you can develop locally using Drupal's own database as your search provider, with the intention of plugging it all into Solr when you deploy to servers.

One possible setup would have been to have the various sites each send their data into Solr directly. However, with the Pantheon platform this didn't look to be possible: in order to achieve close integration between Drupal and Solr, Pantheon locks down your Solr instance.

That left talking to Solr via Drupal.

Search API lets you define different datasources for your search data, and comes with one for each entity type on your site. In a datasource handler class, you can define how the datasource gets a list of IDs of things to index, and how it gets the content. So writing a custom datasource was one possibility.

Enter the next problem: the external sites that needed to be indexed only exposed their content to us in one format: RSS. In theory, you could have a Search API datasource which pulls in data from an RSS feed. But then you need to write a SearchAPI datasource class which knows how to parse RSS and extract the fields from it.

That sounded like reinventing Feeds, so I turned to that to see what I could do with it. Feeds normally saves data into Drupal entities, but maybe (I thought) there was a way to have the data be passed into SearchAPI for indexing, by writing a custom Feeds plugin?

However, this revealed a funny problem of the sort that you don’t consider the existence of until you stumble on it: Feeds works on cron runs, pulling in data from a remote source and saving it into Drupal somehow. But SearchAPI also works on cron runs, pulling data in, usually entities. How do you get two processes to communicate when they both want to be the active participant?

With time pressing, I took the simple option: define a custom entity type for Feeds to put its data into, and SearchAPI to read its data from. (I could have just used a node type, but then there would have been an ongoing burden of needing to ensure that type was excluded from any kind of interaction with nodes.)

Essentially, this custom entity type acted like a bucket: Feeds dumps data in, SearchAPI picks data out. As solutions go, not the most massively elegant, at first glance. But if you think about it, if I had gone down the route of SearchAPI fetching from RSS directly, then re-indexing would have been a really lengthy process, and could have had consequences for the performance of the sites whose content was being slurped up. A sensible approach would then have been to implement some sort of caching on our server, either of the RSS feeds as files, or the processed RSS data. And suddenly our custom entity bucket system doesn’t look so inelegant after all: it’s basically a cache that both Feeds and SearchAPI can talk to easily.

There were a few pitfalls. With Search API, our search index needed to work on two entity types (nodes and the custom bucket entities), and while Search API on Drupal 7 allows this, its multiple entity type datasource handler had a few issues to iron out or learn to live with. The good news though is that the Drupal 8 version of Search API has the concept of multi-entity type search indexes at its core, rather than as a side feature: every index can handle multiple entity types, and there’s no such thing as a datasource for a single entity type.

With Feeds, I found that not all the configuration is exportable to Features for easy deployment. Everything about parsing the RSS feed into entities can be exported, except the actual URL, which is a separate piece of setup and not exportable. So I had to add a hook_update_N() to take care of setting that up.

The end result though was a site search that seamlessly returns results from multiple sites, allowing users to work with a network of disparate sites built on different technologies as if they were all the same thing. Which is what they were probably thinking they were all along anyway.

Controlling multiple sites with Drush 9

Drush 9 has removed dynamic site aliases. Site aliases are hardcoded in YAML files rather than declared in PHP. Sadly, that means that many tricks you could do with the declaration of the site aliases are no longer available.

The only grouping possible is based on the YAML filename. So for example, with the Acquia Cloud Site Factory site aliases generated by the 'blt recipes:aliases:init:acquia' command, you can run a command on the same site across different environments.

But what you can't do is run a command on all the sites in one environment.

One use case for this is checking whether a module is enabled on any sites, so you know that it's safe to remove it from the codebase.

Currently, this is quite a laborious process, as 'drush pm-list' needs to be run for each site separately.
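
In other words, something like the following, repeated for every site in the environment (the site alias names here are invented for illustration):

drush @mysite-one.01live pm-list | ag some_module
drush @mysite-two.01live pm-list | ag some_module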

With environment aliases, this would be a one liner:

drush @hypothetical-env-alias pm-list | ag some_module

('ag' is the very useful silver searcher unix command, which is almost the same as the also excellent 'ack' but faster, and both are much better than grep.)

While site aliases are fixed, they can be altered with Drush hooks. I considered whether these might allow aliases to be declared dynamically, or whether a command option could do the job. There's an example of altering aliases with a hook in the Drush code.

In the meantime, a much simpler solution is to use xargs, which I have recently found is extremely useful in all sorts of situations. Because this allows you to run one command multiple times with a set of parameters, all you need to do is pass it a list of site aliases. Fortunately, the 'drush sa' command has lots of formatting options, and one of them gives us just what we need, a list of aliases with one on each line:

drush sa --format=list 

That gives us all the aliases, and we probably don't want that. So here's where ag first comes into play, as we can filter the list, for example, to only run on live sites (I'm using my ACSF aliases here as an example):

drush sa --format=list | ag 01live

Now we have a filtered list of aliases, and we can feed that into xargs:

drush sa --format=list | ag 01live | xargs -I % drush % pm-list

Normally, xargs puts the input parameter at the end of its command, but here we want it inserted just after the 'drush' command. The -I parameter allows us to specify a placeholder where the input parameter goes, so:

xargs -I % drush % pm-list

says that we want the site alias to go where the '%' is, and means that xargs will run:

drush SITE-ALIAS pm-list

with each value it receives, in this case, each site alias.

Another thing we will do with xargs is set the -t parameter, which outputs each actual command it executes on STDERR. That acts as a heading in the output, so we can clearly see which site is outputting what.

Finally, we can use ag a second time to filter the module list down to just the module we want to find out about:

drush sa --format=list | ag live | xargs -t -I % drush % pml | ag some_module 

The nice thing about the -t parameter is that as it's STDERR, it's not affected by the final pipe to ag for filtering output. So the output will consist of the drush command for the site, followed by the filtered output.

And hey presto.

In conclusion: dynamic site aliases in Drush were nice, but the maintainers removed them (as far as I can gather) because they were a mess to implement, and removing them vastly simplified things. Doing the equivalent with xargs took a bit of figuring out, but once you know how to do it, it's actually a much more powerful way to work with multiple sites at once.

A git-based patch workflow for drupal.org (with interdiffs for free!)

There's been a lot of discussion about how we need github-like features on d.org. Will we get them? There are definitely many improvements in the pipeline to the way our issue queues work. Whether we actually need to replicate github is another debate (and my take on it is that I don't think we do).

In the meantime, I think that it's possible to have a good collaborative workflow with what we have right now on drupal.org, with just the issue queue and patches, and git local branches. Here's what I've gradually refined over the years. It's fast, it helps you keep track of things, and it makes the most of git's strengths.

A word on local branches

Git's killer feature, in my opinion, is local branches. Local branches allow you to keep work on different issues separate, and they allow you to experiment and backtrack. To get the most out of git, you should be making small, frequent commits.

Whenever I do a presentation on git, I ask for a show of hands of who's ever had to bounce on CMD-Z in their text editor because they broke something that was working five minutes ago. Commit often, and never have that problem again: my rule of thumb is to commit any time that your work has reached a state where if subsequent changes broke it, you'd be dismayed to lose it.

Starting work on an issue

My first step when I'm working on an issue is obviously:

  git pull

This gets the current branch (e.g. 7.x, 7.x-2.x) up to date. Then it's a good idea to reload your site and check it's all okay. If you've not worked on core or the contrib project in question in a while, then you might need to run update.php, in case new commits have added updates.

Now start a new local branch for the issue:

  git checkout -b 123456-foobar-is-broken

I like to prefix my branch name with the issue number, so I can always find the issue for a branch, and find my work in progress for an issue. A description after that is nice, and as git has bash autocompletion for branch names, it doesn't get in the way. Using the issue number also means that it's easy to see later on which branches I can delete to unclutter my local git checkout: if the issue has been fixed, the branch can be deleted!

So now I can go ahead and start making commits. Because a local branch is private to me, I can feel free to commit code that's a total mess. So something like:

  dpm($some_variable_I_needed_to_examine);
  /*
  // Commented-out earlier approach that didn't quite work right.
  $foo += $bar;
  */
  // Badly-formatted code that will need to be cleaned up.
  if($badly-formatted_code) { $arg++; }

That last bit illustrates an important point: commit code before cleaning up. I've lost count of the number of times that I've got it working, and cleaned up, and then broken it because I've accidentally removed an important line that was lost among the cruft. So as soon as code is working, I make a commit, usually whose message is something like 'TOUCH NOTHING IT WORKS!'. Then, start cleaning up: remove the commented-out bits, the false starts, the stray code that doesn't do anything, in small commits of course. (This is where you find it actually does, and breaks everything: but that doesn't matter, because you can just revert to a previous commit, or even use git bisect.)
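
(For the bisect option, here's a rough sketch, where GOOD-SHA stands in for the last commit you know was working:)

  git bisect start
  git bisect bad
  git bisect good GOOD-SHA

Git then checks out commits for you to test in turn until it identifies the one that broke things, and git bisect reset puts you back where you started.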

Keeping up to date

Core (or the module you're working on) doesn't stay still. By the time you're ready to make a patch, it's likely that there'll be new commits on the main development branch (with core it's almost certain). And before you're ready, there may be commits that affect your ongoing work in some way: API changes, bug fixes that you no longer need to work around, and so on.

Once you've made sure there's no work currently uncommitted (either use git stash, or just commit it!), do:

git fetch
git rebase BRANCH

where BRANCH is the main development branch that is being committed to on drupal.org, such as 8.0.x, 7.x-2.x, and so on.

(This is arguably one case where a local branch is easier to work with than a github-style forked repository.)

There's lots to read about rebasing elsewhere on the web, and some will say that rebasing is a terrible thing. It's not, when used correctly. It can cause merge conflicts, it's true. But here's another place where small, regular commits help you: small commits mean small conflicts, that shouldn't be too hard to resolve.

Making a patch

At some point, I'll have code I'm happy with (and I'll have made a bunch of commits whose log messages are 'clean-up' and 'formatting'), and I want to make a patch to post to the issue:

  git diff 7.x-1.x > 123456.PROJECT.foobar-is-broken.patch

Again, I use the issue number in the name of the patch. Tastes differ on this. I like the issue number to come first. This means it's easy to use autocomplete, and all patches are grouped together in my file manager and the sidebar of my text editor.

Reviewing and improving on a patch

Now suppose Alice comes along, reviews my patch, and wants to improve it. She should make her own local branch:

  git checkout -b 123456-foobar-is-broken

and download and apply my patch:

  wget PATCHURL
  patch -p1 < 123456.PROJECT.foobar-is-broken.patch

(Though I would hope she has a bash alias for 'patch -p1' like I do. The other thing to say about the above is that while wget is working at downloading the patch, there's usually enough time to double-click the name of the patch in its progress output and copy it to the clipboard so you don't have to type it at all.)
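
(Such an alias is a one-liner in your ~/.bashrc or equivalent; the name 'ap' here is just an example:)

  alias ap='patch -p1 <'

With that, applying the patch is simply ap 123456.PROJECT.foobar-is-broken.patch.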

And finally commit it to her branch. I would suggest she uses a commit message that describes it thus:

  git commit -m "joachim's patch at comment #1"

(Though again, I would hope she uses a GUI for git, as it makes this sort of thing much easier.)

Alice can now make further commits in her local branch, and when she's happy with her work, make a patch the same way I did. She can also make an interdiff very easily, by doing a git diff against the commit that represents my patch.
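
(Something like the following, where COMMIT-FOR-MY-PATCH stands for the commit she made for my patch, and the filename is just a convention:)

  git diff COMMIT-FOR-MY-PATCH > interdiff-1-2.txt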

Incorporating other people's changes to ongoing work

All simple so far. But now suppose I want to fix something else (patches can often bounce around like this, as it's great to have someone else to spot your mistakes and to take turns with). My branch looks like it did at my patch. Alice's patch is against the main branch (for the purposes of this example, 7.x-1.x).

What I want is a new commit on the tip of my local branch that says 'Alice's changes from comment #2'. What I need is for git to believe it's on my local branch, but for the project files to look like the 7.x-1.x branch. With git, there's nearly always a way:

  git checkout 7.x-1.x .

Note the dot at the end. This is the filename parameter to the checkout command, which tells git that rather than switch branches, you want to checkout just the given file(s) while staying on your current branch. And that the filename is a dot means we're doing that for the entire project. The branch remains unchanged, but all the files from 7.x-1.x are checked out.

I can now apply Alice's patch:

  wget PATCHURL
  patch -p1 < 123456.2.PROJECT.foobar-is-broken.patch

(Alice has put the comment ID after the issue ID in the patch filename.)

When I make a commit, the new commit goes on the tip of my local branch. The commit diff won't look like Alice's patch: it'll look like the difference between my patch and Alice's patch: effectively, an interdiff. I now make a commit for Alice's patch:

  git commit -m "Alice's patch at comment #2"

I can make more changes, then do a diff as before, post a patch, and work on the issue advances to another iteration.

Here's an example of my local branch for an issue on Migrate I've been working on recently. You can see where I made a bunch of commits to clean up the documentation to get ready to make a patch. Following that is a commit for the patch the module maintainer posted in response to mine. And following that are a few further tweaks that I made on top of the maintainer's patch, which I then of course posted as another patch.

[Screenshot: a git GUI showing the tip of a local branch, with a commit for a patch from another user.]

(Notice how in a local branch, I don't feel the need to type terribly accurately for my commit messages, or indeed be all that clear.)

Improving on our tools

Where next? I'm pretty happy with this workflow as it stands, though I think there's plenty of scope for making it easier with some git or bash aliases. In particular, applying Alice's patch is a little tricky. (Though the stumbling block there is that you need to know the name of the main development branch. Maybe pass the script the comment URL, and let it ask d.org what the branch of that issue is?)

Beyond that, I wonder if any changes can be made to the way git works on d.org. A sandbox per issue would replace the passing around of patch files: you'd still have your local branch, and merge in and push instead of posting a patch. But would we have one single branch for the issue's development, which then runs the risk of commit clashes, or start a new branch each time someone wants to share something, which adds complexity to merging? And finally, sandboxes with public branches mean that rebasing against the main project's development can't be done (or at least, not without everyone knowing how to handle the consequences). The alternative would be merging in, which isn't perfect either.

The key thing, for me, is to preserve (and improve) the way that so often on d.org, issues are not worked on by just one person. They're a ball that we take turns pushing forward (snowball, Sisyphean rock, take your pick depending on the issue!). That's our real strength as a community, and whatever changes we make to our toolset have to be made with the goal of supporting that.

Corralling permissions into a grid

I've just released Permissions Grid. It does what the name suggests: it presents related permissions in a grid, rather than the usual long list.

How are permissions structured into a grid? Well, only the ones that form natural groups are included: every set of permissions of the form 'create foo, edit foo, delete foo, create bar, edit bar, delete bar' is turned into a matrix of checkboxes with the verbs 'create, edit, delete' along the top, and the objects 'foo, bar' down the side. When modules such as node, taxonomy, and commerce define related permissions for nodes, vocabularies, and products respectively, that gives you something like this:

This gives an easy to grasp overview of what a role can do with different objects on the site: which node types can this role create? which can they edit, or delete? which product types can they edit? which vocabularies can they create terms in?

If this sounds and looks vaguely familiar, that's probably because this module has an ancestor: my Drupal 6 Node Permissions Grid module, which I wrote back when a site's content types started to become too numerous to easily make sense of. That operated only on node types, and like a great many contrib modules porting to Drupal 7, it's had to 'drop the node' and generalize. But in fact nothing restricts Permissions Grid to entities: all it cares about is permissions.

Structured permissions are declared to the module in an info hook, and each module may declare multiple sets of permissions. This allows for the fact that some modules add further vocabulary-related permissions which do not have the same pattern, and that commerce has entity permissions in both singular and plural form.

Are there any groups of permissions I've missed, whether in core or contrib? Post a feature request, or better still, take a look at the hook implementations already there and file a patch.

On rules versus hooks, or, abstraction shock

I need to add a bit of business logic to my Commerce site: a boolean field on product nodes marks that the corresponding products can't be delivered outside the UK.

And I know the way to do this in Commerce is to create a rule: react to the cart completing checkout, iterate over line items, check the corresponding products, and block the completion if the field in question is set.

Rules is great: with Rules, site builders can change site functionality and cause it to react to events. When non-techy people ask if my job involves designing websites, to put them right I say, 'I make websites go "bing!"'; and now, site builders can make them go 'bing!' too.

But I have a confession: I'm reluctant about using Rules. It's partly that I find the UI confusing, and it feels time-consuming to test them, but deeper than that I think it's just that I feel too far removed from the actual thing I'm trying to make.

And that makes me wonder: am I becoming a Drupal dinosaur?

Because I can imagine when Views first came along, developers who were used to writing their own query and formatting the result themselves, looking at the Views UI and thinking 'I don't feel in control of my lists of stuff any more'.

Or before CCK, developers wrote exactly the form elements they needed in the node form and saved it themselves in the database. I still sometimes speak to non-Drupal developers who want to be able to dump data into the node table directly (or pull it out) and when I tell them they can't, because the data that actually makes a node is spread out over the node table, the node_revision table, and then a multitude of field tables. And their feeling of disconnection at not being able to get their hands on 'the node' as a solid lump of database stuff must surely be akin to what I feel with Rules. And I'm going to call this feeling 'abstraction shock'.

I want to write a hook. I want to write the code for it, for it to feel like a solid thing. I know that my rule can (indeed, should) be exported to code, but I want code that I can read and see exactly what it says, rather than code that Rules will consume and understand. And most of all, I want to be able to put in debug statements to understand what I'm getting as I write it, and after I've written it when it's going wrong or when the site functionality has to change.

If that makes me a dinosaur, save me a seat next to the brontosaurus.

Git tricks: repatching for an issue branch

My workflow for making patches is to use a feature branch for a single issue. Whether you're a contributor or a maintainer, it lets you advance the fixing of the problem in small increments, and safely experiment, knowing you can roll back.

But where it goes wrong is when your patch is superseded by a newer one in the issue queue, and you want to work on it some more. How do you update your branch for the ongoing work? As ever, with git there's a way.

Let's start with the basics first: you're making a feature branch to work on an issue. I tend to follow the naming pattern '123456-fix-all-the-bugs', but for this example I'll call it 'issue'.

// Make a new branch and switch to it.
$ git co -b issue
// Make lots of commits.
// Ready to make a patch:
$ git diff > 123456.project.issue.patch

(Note that you can also make your patch show all your commits one by one, which can sometimes help make it clear what you're changing, but that's for another day.)

You've now got a patch which you're uploading to the issue queue, and your tree looks something like this:

* [issue] Last commit, ready to roll a patch!
* Fixed the foobar.
* Added a bizbax.
/
* [master]

Now someone else comes along to the issue queue, reviews your patch, and posts a new patch of their own. You in turn look at patch 2, and while it's an improvement, you think it needs still more work.

The problem is how to apply the patch to your repository. It won't apply on the tip of the issue branch, and if you check out master and apply it there, the resulting commit isn't connected to your issue branch and the work you've done on it. You could of course just discard your original issue branch, and create a branch issue2 off master for patch 2.

Or you can do this:

// Start on the issue branch.
// Stash any work in progress!
$ git stash
// Checkout just the *files* of master, while keeping the HEAD pointer on the
// issue branch.
$ git checkout master -- .
// This puts the files from master into the working tree, but keeps the index
// on the issue branch. In simpler terms, the reverse of patch 1 will appear
// staged (as git believes that your files *ought* to look like patch 1, but
// actually look like master).
// We want the index clean, so unstage everything:
$ git reset HEAD .
// Now apply the new patch.
$ patch -p1 < patch-2.patch
// Now commit this as patch 2.
// Remember to stash pop when you're done!

Because the working tree files (that is, the actual files on your system) look like the master branch, the patch applies cleanly. But because git still believes it's on the tip of the issue branch, the commit you make goes on the tip of that branch, and the diff it records is effectively the interdiff between your patch-1 and the other contributor's patch-2. Your tree looks like this:

* [issue] Applied patch 2 from Ada Lovelace.
* Last commit, ready to roll a patch!
* Fixed the foobar.
* Added a bizbax.
/
* [master]

Result: you can now do more work on this branch, and make more commits, and when you're ready, diff against master to make patch-3, ready to upload to the issue queue.

Git tricks: being on the wrong branch

I often find that I'm in the middle of one thing when I have to do another. Whether it's hotfixes for a client, or just finding a minor bug that blocks my current work, or needing to add components to a feature before I can add custom functionality.

The best way is to stash your current work, checkout the master branch, commit, then go back. If you're working on a feature branch (and you should be), then rebase that afterwards so you have access to the new work there. So that's:

$ git stash
$ git checkout master
// do commits
$ git checkout feature
$ git rebase master
$ git stash pop

But that's not always feasible. Sometimes I'm sloppy, and I've already made code changes before stashing. And lately, I've got one instance of Party module that's got a feature branch that's made database changes, but I don't want to hold that up ongoing commits (and I'm too lazy to set up a new local site!).

If your fix is just one commit, you can make it on the feature branch, then cherry-pick it to the master branch like this:

// make your commit and note its SHA
$ git stash
$ git checkout master
$ git cherry-pick COMMIT
$ git checkout feature
$ git rebase master
// and restore any work in progress you stashed
$ git stash pop

The rebase should be smart enough to figure out the same commit exists on both branches, and will silently drop it from the feature branch.

Alternatively, if you want to do a chunk of main branch work, make a temporary branch on the tip of feature, which you can then move to the master branch when you're done:

$ git checkout -b moveme
// make as many commits as you like
// Now we take everything that's between the tips of feature and moveme, and move it to the tip of master
$ git rebase --onto master feature moveme
// Now merge moveme into master: this'll just fast-forward master.
$ git checkout master
$ git merge moveme
// moveme can be deleted now
$ git branch -d moveme
// Now rebase the feature branch
$ git checkout feature
$ git rebase master

As I've become more familiar with git, I've found that temporary, throw-away branches can be useful in a variety of situations. Another one is making a backup branch prior to potentially messy rebases: just create a branch 'backup' where you are to be sure that no matter what happens with the rebase, your current chain of commits will be preserved. If there are conflicts, you can diff against it to check no code was lost. And when you're happy with the result of the rebase, just delete it. Branches being cheap, and local, opens up a whole new set of uses for them.
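
A minimal sketch of that backup trick (the branch names are just examples):

// Mark the current tip before a risky rebase.
$ git branch backup
$ git rebase master
// If anything looks off, compare against the backup...
$ git diff backup
// ...and once you're happy with the result, throw it away.
$ git branch -D backup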

Get out, git!

There are lots of good reasons to have your server's codebase be an actual git checkout. But there's one potential flaw: your entire repository's history ends up in your webroot inside a .git folder.

You can block access to it in your .htaccess, but that's hacking core (until this patch lands at least).

There is however an alternative method that lets you keep git's repository data (the .git directory itself) outside the webroot completely.

Here's how to convert an existing repository to this format:

  1. Move the .git folder to another location, renaming it in the process so it's no longer hidden. The convention is to leave it with a .git ending though, so for example, 'mysite.git'. I put these inside a folder called 'git' in the user's home folder, for instance.

    $ mv .git ~/git/mysite.git

  2. In its original place in your webroot, create a new file called '.git'. Into this file place a single line thus:

    gitdir: /absolute/path/to/your/mysite.git

    This needs to be an absolute path; relative ones confuse git when you go into subfolders. Using '~' to start at the user's home folder doesn't seem to work either.

  3. Finally, we need to tell git, in its config file, where the working tree is. This step isn't completely necessary, but it allows you to invoke the git command while standing in subfolders of your webroot, which is too handy a thing to lose.

    Standing either in the webroot or in the git folder, do:

    $ git config core.worktree "/absolute/path/to/your/webroot"

    You can also edit the git config file by hand to set this, which allows you to also add a comment explaining the manoeuvre for future reference.

That's all there is to it. You now have a working git repository whose working folder is completely inaccessible from the outside world.

For creating a new repo, you can use the following finger-twister:

$ git --git-dir=/path/to/repo.git --work-tree=. init && echo "gitdir: /path/to/repo.git" > .git

There are more tips in this question on StackOverflow. And for a hands-on tutorial, come to my session on git at DrupalCamp Scotland, taking place later this month in Edinburgh.

Moving a git local branch from one machine to another

You're maybe at one of the many Drupal Co-worker Friday events that are taking place around the world today. You've packed up your laptop and your lunchbox, and you're looking forward to a day out of the house with some human contact.

But yesterday you were halfway through a big piece of work on your project. And you were using a git local branch, of course. Why? Because it keeps your work isolated off the main development branch, allowing other work to continue independently. And because while your commits are only local, you're free to reorder them, edit the log messages, fix up mistakes as if they never happened, and so on. In fact, if you so choose, merging your local branch in with the --squash option makes it look like you did all of the work in a single, perfect commit. Wow!

But that branch is on your desktop machine, and today you're on the laptop. How to get it over from one to the other? You don't want to push it to the remote, because then that means you can't rework it, and it doesn't really belong there anyway. You could make one big patch of your branch against the development branch, but then on your laptop you won't have the dozen or so commits you made yesterday (and you work with small, simple commits, because it makes it easier to roll back should you need to, and to see what's been changed). Furthermore, you've a bunch of changes that aren't yet committed because they're work in progress. You need those too, but not mixed in with what's already committed and reasonably stable.

The solution is the git format-patch command. At first try this is a rather weird one, which fills your folder up with cryptic mailbox files (which as far as I can tell are nigh on useless in the modern world of webmail). But it has an option which turns it into something very useful indeed: --stdout. So much so that I've added it to my git global config thus:

  fp  = format-patch --stdout

So standing in your repository and doing

  git fp the-dev-branch > my-local-branch.patch

will give you one single file that comprises multiple patches, one for each commit. Copy that to your laptop by your favourite means, and over there do:

  git co -b my-local-branch
  git am  my-local-branch.patch

And all your commits from your desktop machine are reproduced on your laptop.

But what about your work in progress? I was coming to that. Before you begin, make one commit of all of them, perhaps with a log message to remind you it's the work in progress.
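
Something like:

  git add -A
  git commit -m "WIP - not finished"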

Then after you've done 'git am', all we have to do is kill off that final tip commit, while keeping the changes it contains in the filesystem. The command for that is git reset with the --mixed option:

  git reset --mixed HEAD^

All your work in progress changes are now 'unstaged changes', and your laptop's copy of the repository is in exactly the same state as your desktop machine.

What about going back the other way? Well I'll figure that one out tonight, I expect ;)

PS. The return trip is easily done thus:

When you create the branch on the new machine, but before you apply the patch, add a tag: git tag mytag. Then at the end of the day, do the original process in reverse but take your diff from the tag: git fp mytag > homeward.patch

And on your home machine, remember to kill the 'work in progress' commit before you apply the patch, with 'git reset --hard HEAD^'. That's a hard reset this time, since the changes in that commit are in the new commits you're bringing back.

Hope your Drupal Coworker Friday was as productive as mine!
