Skip to main content
Joachim's blog

Main navigation

  • Home
  • About
  • Hire me

Breadcrumb

  1. Home

Drupal Planet Posts

A better way to work on Drupal core

By joachim, Fri, 16/04/2021 - 13:32

Often when things seem really complicated, I think it's because I must be doing it wrong.

Working on Drupal core since dependencies were removed from git has seemed really fiddly. For a long time I thought I must have missed something, that there was some undocumented technique I wasn't aware of.

But I've asked various people who work on core a lot more than I do, and they've confirmed that what I've been doing is pretty much the way that they do it:

  1. Get a git clone of Drupal core.
  2. Run composer install on it.
  3. Write code!
  4. Make a patch (well, a merge request now!)

That all sounds simple, right? But wait! If you're working on core, you're going to want Devel module for its useful debugging and inspection tools, right? And Drush, for quick cache clearing. And probably Admin Toolbar so going around the UI is quicker.

But you have to install all those with Composer. And doing that dirties the composer.lock file that's part of Drupal core's git clone.

It's fairly simple to keep changes to that file out of your merge request or patch, but pretty soon, you're going to do a git pull on core that's going to include changes to the composer.lock file, because core will have updates to dependencies.

And that's where it all starts to go wrong, because the git pull will fail because of conflicts in the composer.lock file and in other Composer files, and conflicts in that file are really painful to resolve.

So maintaining an ongoing Drupal install that uses a Drupal core git clone quickly becomes a mess. As far as I know, most core developers frequently reset the whole thing and reinstall from scratch.

The problem is caused by using the git clone of Drupal core as the Composer project, so that Drupal core's composer.json is being used as the project composer.json. But there's a better way...

Using Composer with git clones

Composer has a way of using a git clone for a package in a project:

  1. Create your own git clone of the package
  2. Declare that git clone as a Composer package repository in the project's composer.json
  3. Install the package

The result is that Composer creates a symlink from your git clone into the project, and doesn't touch the git clone. You need to be fairly lax in the version requirement you give for the package, so that Composer doesn't object to the git clone being on a feature branch later on.

This, as far as I know, is the standard way for working on a Composer package that you need to operate in the context of a project. It works for library packages and Composer plugins.

For Drupal core...? Well, it works, but as you might have guessed it's a little more complicated.

Drupal has opinions about where it expects to be located in a project, and furthermore, has a scaffolding system which writes files into the webroot when you install it. All of that gets a bit confused if you put Drupal core out of the way and symlink it in.

But with a few symlinks, and one sneaky patch to a scaffold file, it works. It's all quite fiddly and so I've made...

The Drupal Core development project

This is a Composer project template, available at https://github.com/joachim-n/drupal-core-development-project. It handles all the necessary tweaks to get Drupal to work when symlinked into a project: install it as a Composer project.

There are still a few limitations: those are detailed in the README too.

Of course, I've now fallen down the rabbithole of doing more work towards making Drupal completely agnostic of its location, rather than the core issues I wanted to work on in the first place.

Please try it, report any problems, and happy coding on Drupal core!

Tags

  • Composer
  • Drupal core

Different ways to make entity bundles

By joachim, Tue, 27/10/2020 - 11:52

A lot of the time, a custom content entity type only needs a single bundle: all the entities of this type have the same structure. But there are times where they need to vary, typically to have different fields, while still being the same type. This is where entity bundles come in.

Bundles are to a custom entity type what node types are to the node entity type. They could be called sub-types instead (and there is a long-running core issue to rename them to this; I'd say go help but I'm not sure what can be done to move it forward), but the name 'bundle' has stuck because initially the concept was just about different 'bundles of fields', and then the name ended up being applied across the whole entity system.

History lesson over; how do you define bundles on your entity type? Well there are several ways, and of course they each have their use cases.

And of course, Module Builder can help generate your code, whichever of these methods you use.

Quick and simple: hardcode

The simplest way to do anything is to hardcode it. If you have a fixed number of bundles that aren't going to change over time, or at least so rarely that requiring a code deployment to change them is acceptable, then you can simply define the bundles in code. The way to do this is with a hook (though there's a core issue -- which I filed -- to allow these to be defined in a method on the entity class).

Here's how you'd define your hardcoded bundles:

/**
 * Implements hook_entity_bundle_info().
 */
function mymodule_entity_bundle_info() {
  $bundles['my_entity_type'] = [
    'bundle_alpha_' => [
      'label' => t('Alpha'),
      'description' => t('Represents an alpha entity.')
    ],
    'bundle_beta_' => [
      'label' => t('Beta'),
      'description' => t('Represents a beta entity.')
    ],
  ];

  return $bundles;
}

The machine names of the bundles (which are used as the bundle field values, and in field names, admin paths, and so on) are the keys of the array. The labels are used in UI messages and page titles. The descriptions are used on the 'Add entity' page's list of bundles.

Classic: bundle entity type

The way Drupal core does bundles is with a bundle entity type. In addition to the content entity type you want to have bundles, there is also a config entity type, called the 'bundle entity type'. Each single entity of the bundle entity type defines a bundle of the content entity type. So for example, a single node type entity called 'page' defines the 'page' node type; a single taxonomy vocabulary entity called 'tags' defines the 'tags' taxonomy term type.

This is great if you want extensibility, and you want the bundles to be configurable by site admins in the UI rather than developers. The downside is that it's a lot of extra code, as there's a whole second entity type to define.

Very little glue is required between the two entity types, though. They basically each need to reference the other in their entity type annotations:

The content entity type needs:

 *   bundle_entity_type = "my_entity_type_bundle",

and the bundle entity needs:

 *   bundle_of = "my_entity_type",

and the bundle entity class should inherit from \Drupal\Core\Config\Entity\ConfigEntityBundleBase.

Per-bundle functionality: plugins as bundles

This third method needs more than just Drupal core: it's a technique provided by Entity module.

Here, you define a plugin type (an annotation plugin type, rather than YAML), and each plugin of that type corresponds to a bundle. This means you need a whole class for each bundle, which seems like a lot of code compared to the hook technique, but there are cases where that's what you want.

First, because Entity module's framework for this allows each plugin class to define different fields for each bundle. These so-called bundle fields are installed in the same way as entity base fields, but are only on one bundle. This gives you the diversification of per-bundle fields that you get with config fields, but with the definition of the fields in your code where it's easier to maintain.

Second, because in your plugin class you can code different behaviours for different bundles of your entity type. Suppose you want the entity label to be slightly different. No problem, in your entity class simply hand over to the bundle:

class PluginAlpha {

  public function label() {
    $bundle_plugin = \Drupal::service('plugin.manager.my_plugin_type')
    return $bundle_plugin->label($this);
  }
  
}

Add a label() method to the plugin classes, and you can specialise the behaviour for each bundle. If you want to have behaviour that's grouped across more than one plugin, one way to do it is to add properties to your plugin type's annotation, and then implement the functionality in the plugin base class with a conditional on the value in the plugin's definition.

/**
 * @MyPluginType(
 *   id = "plugin_alpha",
 *   label = @Translation("Alpha"),
 *   label_handling = "combolulate"
class MyPluginBase {

  public function label() {
    switch ($this->getPluginDefinition()['floopiness']) {
      case 'combolulate':
        // Return the combolulated label. 
    }
  }

}

For a plugin type to be used as entity bundles, the plugins need to implement \Drupal\entity\BundlePlugin\BundlePluginInterface, and your entity type needs to declare the plugin:

 *   bundle_plugin_type = "my_plugin_type",

Here, the string 'my_plugin_type' is the part of the plugin manager's service name that comes after the 'plugin.manager' prefix. (The plugin system unfortunately doesn't have a standard concept of a plugin type's name, but this is informally what is used in various places such as Entity module and Commerce.)

The bundle plugin technique is well-suited to entities that are used as 'machinery' in some way, rather than content. For example, in Commerce License, each bundle of the license entity is a plugin which provides the behaviour that's specific to that type of license.

One thing to note on this technique: if you're writing kernel tests, you'll need to manually install the bundle plugins for them to exist in the test. See the tests in Entity module for an example.

These three techniques for bundles each have their advantages, so it's not unusual to find all three in use in a single project codebase. It's a question of which is best for a particular entity type.

Tags

  • Drupal core
  • Entity API

Using lazy builders with Twig templates

By joachim, Thu, 04/06/2020 - 14:04

Lazy builders were introduced in Drupal 8's render system to solve the problem of pieces of your page that have a low cacheability affecting the cacheability of the whole page. Typical uses of this technique are with entire render arrays which are handed over to a lazy builder.

However, in retrofitting an existing site to make use of caching more effectively, I found that we have a large amount of content that would be cacheable were it not for a single string. Specifically, we have a DIV of the details for a product, but also in that DIV there is the serial number for the user's specific instance of that product.

The whole of this product DIV was output by a Twig template. I could have ripped this up and remade it as a big render array, putting the lazy builder in for the serial number, but this seemed like a lot of work, and would also make our front-end developers unhappy.

It turned out there was another way. The render system deals with placeholders for lazy builders automatically, so you typically will just do:

$build['uncacheable_bit'] = [
  '#lazy_builder' => [
    'my_service:myLazyBuilder',
    [],
  ],
];

but as I learned from a blog post by Borisson, you can make the placeholders yourself put them anywhere in the content and attach the lazy builders to the build array. This is what the render system does when it handles a '#lazy_builder' builder render array item.

One application of this is that you can pass the placeholder as a parameter to a twig template:

// Your placeholder can be anything, but using a hash like the render system
// does prevents accidental replacements.
$placeholder = Crypt::hashBase64('some_data');

$build['my_content'] = [
  '#theme' => 'my_template',
  '#var_1' => $var_1,
  // This is just a regular parameter as far as the Twig template cares, 
  // output with {{lazy_builder_var}} as normal.
  '#lazy_builder_var' => $placeholder,
];

$build['#attached']['placeholders'][$placeholder] = [
  '#lazy_builder' => [
    // The lazy builder callback and parameters.
    'my_service:myLazyBuilder',
    [],
  ],
];

With this approach, the Twig template just needed some minor changes. It doesn't care about the lazy builder; all it sees is the placeholder string which it treats like any other text. The render system finds the placeholder in the '#attached' attribute, and handles the replacement when the rendered template is retrieved from the render cache.

The result is that we're able to cache much more in the render cache, and improve site performance.

Now I just have to figure out the cache tags...

Tags

  • Twig
  • caching
  • performance

Base fields versus config fields, and how to handle the latter in tests

By joachim, Thu, 19/03/2020 - 14:07

All fields are equal in Drupal 8! Back in Drupal 7 we had two different systems for data on entities, which worked fairly differently: entity properties that were defined as database fields and controlled by hardcoded form elements, and user-created fields on entities that had widgets and formatters. But now in Drupal 8, every value on an entity is a field, with the same configuration in the UI, access to the widgets and formatters, and whose data is accessed in the same way.

Is this the end of the story? No, not quite. For site builders, everything is unified, and that's great. For code that consumes data from entities, everything is unified, and that's great too. But for the developer actually working with those fields, whether a field is a base field or a config field still makes a big difference.

Config fields are easier...

Config fields are manipulated via a UI, and that makes them simple to work with.

Setup

Config fields win easily here: a few clicks in the UI to create the field, a few options to choose in dropdowns for the widget and formatter options and you're set. Export to config, commit: done!

Base fields are more fiddly here. You need to be familiar with the Entity system, and understand (or copy-pasta!) the base field definitions. Then, knowing the right widget and formatter to use and how to set their options requires close reading of the relevant plugin classes.

Update

Config fields win again: change in the UI, export to config, deploy!

For base fields, you once again need to know what you're doing with the code, and you now need a hook_update_N() to make the change to the database (now that entity updates are longer done automatically).

Removal

This is maybe a minor point, but while removing a config field is again just a UI click and a config export, removing a base field will again require a hook_update_N() to tell the Entity API to update the database tables.

...but base fields are more robust

Given the above, why bother with base fields? Well, if your fields just sit there holding data, you can stop reading. But if your fields are involved in your custom code, as a developer, this should give you an unpleasant feeling: your code in modules/custom/mymodule is dependent on configuration that lives in the site's config export.

Robustness

The fact that config fields are so easy to change in the UI is now a mark against them. Another developer could change a field without a full understanding of the code that works with it. The code might expect a value to always be there, but the field is now non-required. The code might expect certain options, but the field now has an extra one.

Instead of this brittle dependency, it feels much safer to have base fields defined closer to the code that makes use of them: in the same module.

Tests

The solution to the dependency problem is obviously tests, which would pick up any change that breaks the code's expectations of how the fields behave. But now we hit a big problem with config fields: your test site doesn't have those fields!

My first attempt at solving this problem was to use the Configuration development module. This allows you to export config from the site to a module's /config/install folder. A test that installs that module then gets those config items imported.

It's a quick and simple approach: when a test crashes because of a missing field, find it in the config folder, add it to the right module's .info.yml file, do drush cde MODULE and commit the changes and the newly created files.

This approach also works for all other sorts of config your test might need: taxonomy vocabularies, node types, roles, and more!

But now you have an additional maintenance burden because your site config is now in three places: the site itself, the config export, and now also in module config. And if developers who change config forget to also export to module config, your test is now no longer testing how the site actually works, and so will either fail, or worse, won't be actually covering what you expect any more.

Best of both worlds: import from config

To summarize: we want the convenience of config fields, but we want them to be close to our code and testable. If they're testable, we can maybe stand to forego the closeness.

My best solution to this so far is simple: allow tests to import configuration direct from the site config sync folder.

Here's the helper method I've been using:

/**
 * Imports config from a real local site's config folder.
 *
 * TODO: this currently only supports config entities.
 *
 * @param string $site_key
 *   The site key, that is, the subfolder in /config in which to find the
 *   config files.
 * @param string $config_name
 *   The config name. This is the same as the YML filename without the
 *   extension. Note that this is not checked for dependencies.
 *
 * @throws \Exception
 *   Throws an exception if the config can't be imported.
 */
protected function importSiteConfig(string $site_key, string $config_name) {
  $storage = $this->container->get('config.storage.sync');

  // Code cribbed from Config Devel module's ConfigImporterExporter.
  $config_filename = "../config/{$site_key}/{$config_name}.yml";

  if (!file_exists($config_filename)) {
    throw new \Exception("Unable to find config file $config_filename to import from.");
  }

  $contents = @file_get_contents($config_filename);

  $data = $storage->decode($contents);
  if (!$data) {
    throw new \Exception("Failed to import config $config_name from site $site_key.");
  }

  // This assumes we only ever import entities from site config for tests,
  // which so far is the case.
  $entity_type_id = $this->container->get('config.manager')->getEntityTypeIdByName($config_name);

  if (empty($entity_type_id)) {
    throw new \Exception("Non-entity config import not yet supported!");
  }

  $entity = $this->container->get('entity_type.manager')->getStorage($entity_type_id)->create($data);
  $entity->save();
}

Use it in your test's setUp() or test methods like this:

  // Import configuration from the default site config files.
  $this->importSiteConfig('default', 'field.storage.node.field_my_field');
  $this->importSiteConfig('default', 'field.field.node.article.field_my_field');

As you can see from the documentation, this so far only handles config that is entities. I've not yet had a use for importing config that's not an entity (and just about all config items are entities except for the ones that are a collection of a module's settings). And it doesn't check for dependencies: you'll need to import them in the right order (field storage before field) and ensure the modules that provide the various things are enabled in the test.

I should mention for the sake of completeness that there's another sort of field, sort of: bundle fields. These are in code like base fields, but limited to a particular bundle. They also have a variety of problems as the system that support them is somewhat incomplete.

Finally, it occurs to me that another way to bridge the gap would be to allow editing base fields in the UI, and then export that back to code consisting of the BaseFieldDefinition calls. But hang on... haven't I just reinvented Drupal 7-era Features?

UPDATE

It turned out this was crashing when importing fields with value options. This code (and that in Config Devel module, which is where I cribbed it from) wasn't properly using the Config API. Here's updated code, which incidentally, also now can handle non-entity config:

  /**
   * Imports config from a real local site's config folder.
   *
   * Needs config module to be enabled.
   *
   * @param string $site_key
   *   The site key, that is, the subfolder in /config in which to find the
   *   config files.
   * @param string $config_name
   *   The config name. This is the same as the YML filename without the
   *   extension.
   *
   * @throws \Exception
   *   Throws an exception if the config can't be imported.
   */
  protected function importSiteConfig(string $site_key, string $config_name) {
    $storage = $this->container->get('config.storage.sync');

    // Code cribbed from Config Devel module's ConfigImporterExporter.
    $config_filename = "../config/{$site_key}/{$config_name}.yml";

    if (!file_exists($config_filename)) {
      throw new \Exception("Unable to find config file $config_filename to import from.");
    }

    $contents = @file_get_contents($config_filename);

    $data = $storage->decode($contents);
    if (!$data) {
      throw new \Exception("Failed to import config $config_name from site $site_key.");
    }

    unset($data['uuid']);

    // We have to partially mock the source storage, because otherwise
    // SystemConfigSubscriber::onConfigImporterValidateSiteUUID() will complain
    // because the source config doesn't contain a system.site config. (For
    // general information, the single-import UI in core doesn't hit this
    // problem because it TOTALLY cheats and pretends its source is the full
    // site config! Though who knows how it manages not to trip up when the site
    // config folder is empty!)
    $source_storage = $this->getMockBuilder(StorageReplaceDataWrapper::class)
      ->setConstructorArgs([$this->container->get('config.storage')])
      ->setMethods(['exists'])
      ->getMock();
    // Satisfy SystemConfigSubscriber that we have a system.site config.
    $source_storage->expects($this->any())
      ->method('exists')
      ->willReturn(TRUE);

    $source_storage->replaceData($config_name, $data);

    // Similarly mock the storage comparer.
    $storage_comparer = $this->getMockBuilder(StorageComparer::class)
      ->setConstructorArgs([$source_storage, $this->container->get('config.storage')])
      ->setMethods(['validateSiteUuid'])
      ->getMock();
    // Satisfy SystemConfigSubscriber that system.site config is valid.
    $storage_comparer->expects($this->any())
      ->method('validateSiteUuid')
      ->willReturn(TRUE);

    $storage_comparer->createChangelist();

    $config_importer = new ConfigImporter(
      $storage_comparer->createChangelist(),
      $this->container->get('event_dispatcher'),
      $this->container->get('config.manager'),
      $this->container->get('lock'),
      $this->container->get('config.typed'),
      $this->container->get('module_handler'),
      $this->container->get('module_installer'),
      $this->container->get('theme_handler'),
      $this->container->get('string_translation'),
      $this->container->get('extension.list.module')
    );

    $config_importer->import();
  }

}

UPDATE 2

I've found an occasional problem with site config items that have third-party settings from modules that aren't relevant to the test. Examples include Menu UI settings in node types, and contrib translation settings in fields. Rather than enable extra modules in the test, these settings can be hacked out of the data after we decode it from the YAML. Here's an updated version of the code.


use Drupal\Core\Config\ConfigImporter;
use Drupal\Core\Config\StorageComparer;
use Drupal\config\StorageReplaceDataWrapper;

  /**
   * Imports config from a real local site's config folder.
   *
   * @param string $site_key
   *   The site key, that is, the subfolder in /config in which to find the
   *   config files.
   * @param string $config_name
   *   The config name. This is the same as the YML filename without the
   *   extension.
   * @param array $third_party_filter
   *   (optional) Keys in the config item's third party settings that are to be
   *   removed prior to import. This prevents unwanted dependencies on modules
   *   that are not relevant to a test. Defaults to an empty array; no keys
   *   removed.
   *
   * @throws \Exception
   *   Throws an exception if the config can't be imported.
   */
  protected function importSiteConfig(string $site_key, string $config_name, array $third_party_filter = []) {
    $storage = $this->container->get('config.storage.sync');

    // Code cribbed from Config Devel module's ConfigImporterExporter.
    $config_filename = "../config/{$site_key}/{$config_name}.yml";

    if (!file_exists($config_filename)) {
      throw new \Exception("Unable to find config file $config_filename to import from.");
    }

    $contents = @file_get_contents($config_filename);

    $data = $storage->decode($contents);
    if (!$data) {
      throw new \Exception("Failed to import config $config_name from site $site_key.");
    }

    unset($data['uuid']);

    // Remove third-party settings.
    foreach ($third_party_filter as $settings_key) {
      unset($data['third_party_settings'][$settings_key]);
    }

    // We have to partially mock the source storage, because otherwise
    // SystemConfigSubscriber::onConfigImporterValidateSiteUUID() will complain
    // because the source config doesn't contain a system.site config. (For
    // general information, the single-import UI in core doesn't hit this
    // problem because it TOTALLY cheats and pretends its source is the full
    // site config! Though who knows how it manages not to trip up when the site
    // config folder is empty!)
    $source_storage = $this->getMockBuilder(StorageReplaceDataWrapper::class)
      ->setConstructorArgs([$this->container->get('config.storage')])
      ->setMethods(['exists'])
      ->getMock();
    // Satisfy SystemConfigSubscriber that we have a system.site config.
    $source_storage->expects($this->any())
      ->method('exists')
      ->willReturn(TRUE);

    $source_storage->replaceData($config_name, $data);

    // Similarly mock the storage comparer.
    $storage_comparer = $this->getMockBuilder(StorageComparer::class)
      ->setConstructorArgs([$source_storage, $this->container->get('config.storage')])
      ->setMethods(['validateSiteUuid'])
      ->getMock();
    // Satisfy SystemConfigSubscriber that system.site config is valid.
    $storage_comparer->expects($this->any())
      ->method('validateSiteUuid')
      ->willReturn(TRUE);

    $storage_comparer->createChangelist();

    $config_importer = new ConfigImporter(
      $storage_comparer->createChangelist(),
      $this->container->get('event_dispatcher'),
      $this->container->get('config.manager'),
      $this->container->get('lock'),
      $this->container->get('config.typed'),
      $this->container->get('module_handler'),
      $this->container->get('module_installer'),
      $this->container->get('theme_handler'),
      $this->container->get('string_translation'),
      $this->container->get('extension.list.module')
    );

    $config_importer->import();
  }

Tags

  • tests
  • Field API

Debugging and Logging AJAX requests tests in Docksal

By joachim, Wed, 06/11/2019 - 14:30

The hardest thing I find with tests is understanding errors. Every time I think I've got debugging output sorted, I find a new layer where it doesn't work, I've got nothing, and I'm in the dark with something that's crashing and I don't know why.

The first layer is simple: errors in your test code itself. For example, make a typo in your tests/src/Functional/MyTest.php and PHPUnit crashes and you see the error in the terminal.

But when it's site code that's crashing, you're dealing with a system that is being driven by code, and therefore, you can't see it. And that's a major obstacle to figuring out a problem.

The HTML output that Drupal's Functional and Functional Javascript tests produce is a huge help: every time your test code makes a request to the test site, an HTML file is written to the test files directory. If your site crashes when your test makes a request, you'll see the error and the backtrace there.

However, there's no such output when in a Functional Javascript test you cause an AJAX request. And while you can create a screenshot of what the page looks like after the request, our another HTML file of the page (see https://www.drupal.org/project/drupal/issues/3090498 for instructions how; the issue is about how to make that automatic but I have no idea how that might be possible), you can't see the actual error, because AJAX requests that fail just sit there doing nothing. There's nothing useful to see in the browser.

So we need to see logs. When a real site has an AJAX crash, with a human being-controlled web browser making the request, you can go and look in the logs. With a test site, the log table is zapped when the test is completed.

Fortunately, Drupal 8's pluggable logging means there are other ways of getting hold of them, more permanent ways.

I first tried log_stdout module. This outputs log errors to STDOUT. If you're running on Docksal, as I am, you have an extra layer to get though to see that. You can monitor the cli container with fin logs -f cli, and with that module add a | ag WATCHDOG to filter.

However, I wasn't seeing backtrace in this output, and I gave up figuring out why.

So I tried filelog module instead, which as the name implies, writes log to a simple text file. This needs a little bit more work, as by default it writes to 'public://logs'. This means that each run of the test gets its own log file, which is perhaps what you want, but for my own uses I wanted a single log file I could tail -f in a terminal window and have continual monitoring.

A quick bit of config setting in the test's setUp() does the trick:

$this->config('filelog.settings')
  ->set('location', '/var/www/docroot/sites/simpletest/logs')
  ->save();

And I think that's me at last sorted.

Tags

  • debugging
  • tests
  • drupal planet

Multi-site search using Feeds and SearchAPI

By joachim, Wed, 30/10/2019 - 13:08

[This is an old post that I wrote for System Seed's blog and meant to put on my own too but it fell off my radar until now. It's also about Drupal 7, but the general principle still applies.]

Handling clients with more than one site involves lots of decisions. And yet, it can sometimes seem like ultimately all that doesn't matter a hill of beans to the end-user, the site visitor. They won't care whether you use Domain module, multi-site, separate sites with common codebase, and so on. Because most people don't notice what's in their URL bar. They want ease of login, and ease of navigation. That translates into things such as the single sign-on that drupal.org uses, and common menus and headers, and also site search: they don’t care that it’s actually sites search, plural, they just want to find stuff.

For the University of North Carolina, who have a network of sites running on a range of different platforms, a unified search system was a key way of giving visitors the experience of a cohesive whole. The hub site, an existing Drupal 7 installation, needed to provide search results from across the whole family of sites.

This presented a few challenges. Naturally, I turned to Apache Solr. Hitherto, I've always considered Solr to be some sort of black magic, from the way in which it requires its own separate server (http not good enough for you?) to the mysteries of its configuration (both Drupal modules that integrate with it require you to dump a bunch of configuration files into your Solr installation). But Solr excels at what it sets out to do, and the Drupal modules around it are now mature enough that things just work out of the box. Even better, Search API module allows you to plug in a different search back-end, so you can develop locally using Drupal's own database as your search provider, with the intention of plugging it all into Solr when you deploy to servers.

One possible setup would have been to have the various sites each send their data into Solr directly. However, with the Pantheon platform this didn't look to be possible: in order to achieve close integration between Drupal and Solr, Pantheon locks down your Solr instance.

That left talking to Solr via Drupal.

Search API lets you define different datasources for your search data, and comes with one for each entity type on your site. In a datasource handler class, you can define how the datasource gets a list of IDs of things to index, and how it gets the content. So writing a custom datasource was one possibility.

Enter the next problem: the external sites that needed to be indexed only exposed their content to us in one format: RSS. In theory, you could have a Search API datasource which pulls in data from an RSS feed. But then you need to write a SearchAPI datasource class which knows how to parse RSS and extract the fields from it.

That sounded like reinventing Feeds, so I turned to that to see what I could do with it. Feeds normally saves data into Drupal entities, but maybe (I thought) there was a way to have the data be passed into SearchAPI for indexing, by writing a custom Feeds plugin?

However, this revealed a funny problem of the sort that you don’t consider the existence of until you stumble on it: Feeds works on cron runs, pulling in data from a remote source and saving it into Drupal somehow. But SearchAPI also works on cron runs, pulling data in, usually entities. How do you get two processes to communicate when they both want to be the active participant?

With time pressing, I took the simple option: define a custom entity type for Feeds to put its data into, and SearchAPI to read its data from. (I could have just used a node type, but then there would have been an ongoing burden of needing to ensure that type was excluded from any kind of interaction with nodes.)

Essentially, this custom entity type acted like a bucket: Feeds dumps data in, SearchAPI picks data out. As solutions go, not the most massively elegant, at first glance. But if you think about it, if I had gone down the route of SearchAPI fetching from RSS directly, then re-indexing would have been a really lengthy process, and could have had consequences for the performance of the sites whose content was being slurped up. A sensible approach would then have been to implement some sort of caching on our server, either of the RSS feeds as files, or the processed RSS data. And suddenly our custom entity bucket system doesn’t look so inelegant after all: it’s basically a cache that both Feeds and SearchAPI can talk to easily.

There were a few pitalls. With Search API, our search index needed to work on two entity types (nodes and the custom bucket entities), and while Search API on Drupal 7 allows this, its multiple entity type datasource handler had a few issues to iron out or learn to live with. The good news though is that the Drupal 8 version of Search API has the concept of multi-entity type search indexes at its core, rather than as a side feature: every index can handle multiple entity types, and there’s no such thing as a datasource for a single entity type.

With Feeds, I found that not all the configuration is exportable to Features for easy deployment. Everything about parsing the RSS feed into entities can be exported, except the actual URL, which is a separate piece of setup and not exportable. So I had to add a hook_updateN() to take care of setting that up.

The end result though was a site search that seamlessly returns results from multiple sites, allowing users to work with a network of disparate sites built on different technologies as if they were all the same thing. Which is what they were probably thinking they were all along anyway.

Tags

  • drupal planet
  • search
  • Feeds

Controlling multiple sites with Drush 9

By joachim, Sat, 16/03/2019 - 20:44

Drush 9 has removed dynamic site aliases. Site aliases are hardcoded in YAML files rather than declared in PHP. Sadly, that means that many tricks you could do with the declaration of the site aliases are no longer available.

The only grouping possible is based on the YAML filename. So for example, with the Acquia Cloud Site Factory site aliases generated by the 'blt recipes:aliases:init:acquia' command, you can run a command on the same site across different environments.

But what you can't do is run a command on all the sites in one environment.

One use case for this is checking whether a module is enabled on any sites, so you know that it's safe to remove it from the codebase.

Currently, this is quite a laborious process, as 'drush pm-list' needs to be run for each site.

With environment aliases, this would be a one liner:

drush @hypothetical-env-alias pm-list | ag some_module

('ag' is the very useful silver searcher unix command, which is almost the same as the also excellent 'ack' but faster, and both are much better than grep.)

While site aliases are fixed, they can be altered with Drush hooks. I considered that these might allow something to dynamically declare aliases, or a command option. There's an example of altering aliases with a hook in the Drush code.

In the meantime, a much simpler solution is to use xargs, which I have recently found is extremely useful in all sorts of situations. Because this allows you to run one command multiple times with a set of parameters, all you need to do is pass it a list of site aliases. Fortunately, the 'drush sa' command has lots of formatting options, and one of them gives us just what we need, a list of aliases with one on each line:

drush sa --format=list 

That gives us all the aliases, and we probably don't want that. So here's where ag first comes in to play, as we can filter the list, for example, to only run on live sites (I'm using my ACSF aliases here as an example):

drush sa --format=list| ag 01live

Now we have a filtered list of aliases, and we can feed that into xargs:

drush sa --format=list| ag 01live | xargs -I % drush % pm-list

Normally, xargs puts the input parameter at the end of its command, but here we want it inserted just after the 'drush' command. The -I parameter allows us to specify a placeholder where the input parameter goes, so:

xargs -I % drush % pm-list

says that we want the site name to go where the '%' is, and means that that xargs will run:

drush SITE-ALIAS pm-list

with each value it receives, in this case, each site alias.

Another thing we will do with xargs is set the -t parameter, which outputs each actual command it executes on STDERR. That acts as a heading in the output, so we can clearly see which site is outputting what.

Finally, we can use ag a second time to filter the module list down to just the module we want to find out about:

drush sa --format=list | ag live | xargs -t -I % drush % pml | ag some_module 

The nice thing about the -t parameter is that as it's STDERR, it's not affected by the final pipe to ag for filtering output. So the output will consist of the drush command for the site, followed by the filtered output.

And hey presto.

In conclusion: dynamic site aliases in Drush were nice, but the maintainers removed them (as far as I can gather) because they were a mess to implement, and removing them vastly simplified things. Doing the equivalent with xargs took a bit of figuring out, but once you know how to do it, it's actually a much more powerful way to work with multiple sites at once.

Tags

  • Drush
  • drupal planet

Unnatural file changes with git

By joachim, Sun, 27/01/2019 - 07:19

When you check out a branch or commit with git, two things happen: git changes the files in the repository folder, and changes the file that tells it what is currently checked out. In git terms, it changes what HEAD points to. In practical terms, it updates the .git/HEAD text file to contain a different reference.

But these two operations can be done separate from one another, so that while the files correspond to one commit, git thinks that the HEAD commit is another one.

This of course puts your git repository in an unstable, even unnatural state. But there are useful reasons for doing this, and one of these comes up in the operation of dorgflow, my tool for working with git and drupal.org patches.

Dorgflow creates a local branch for a drupal.org issue, and creates a commit for each patch, so you end up with a nice sequential representation of the work that's been done so far.

But every patch that's uploaded to an issue is a patch against the master branch, so we can't just apply the patches one by one and make commits: the first patch will apply, and all the subsequent ones will fail.

What we need is for the files be in the same state as the master branch, so that the patch applies, but when we make a commit, it needs to be a new commit on the feature branch:

 * [Feature branch] Patch 1 <- Make patch 2 commit on top of this commit.
/
* [master branch] Latest master commit. <- Apply patch 2 to these files.

In git terms, we want the current files (the working tree, as git calls it) to be on the master branch, while git's HEAD is on the feature branch.

For my first attempt at getting this to work, I did the following:

  1. Put git on the feature branch.
  2. Check out the master branch's files, with git checkout master -- .

When you use git checkout command with a file or files specified, it doesn't move HEAD, but instead changes just those files to the given commit or branch. If you give '.' as the file to check out, then it does that for all of the repository's files. In effect, you get the files to look like the given commit, in our case the master branch.

This is what we want, but it has one crucial flaw. Suppose patch 1 added a new file, foo.php. When we check out master's files, foo.php is not changed, because it's not on master. There's nothing on master to overwrite it. The git checkout master -- . command doesn't actually say 'make all the files look like master', it says 'check out all the files from master'. foo.php isn't on master, and so it's simply left alone.

Suppose now that patch 2 also adds the foo.php file, which it most likely will, since in most cases, a newer patch incorporates all the work of the previous one.

Applying patch 2 to the files in their current state will fail, because patch 2 tries to create the file foo.php, but it's already there.

So this approach is no good. I was stuck with this problem with dorgflow for months, until I had a brainwave: instead of staying on the feature branch and changing the files to look like master, why not check out the master branch, but tell git that the HEAD is at the feature branch?

That solves the problem of the foo.php file: when you check out master, the foo.php that's at the patch 1 commit vanishes, because it doesn't exist on master. So applying patch 2 will be fine.

The only remaining question was how to you tell git it's on a different branch without doing a checkout of files? Turns out this is a simple plumbing command: git symbolic-ref HEAD refs/heads/BRANCH. This is just telling git to change the reference that HEAD points to, and that's how git stores the fact that a particular branch is the current one.

This was a simple change to make in dorgflow (it was actually more work to update the tests so the mocks had the correct method call expectations!).

This means that dorgflow accordingly now handles applying a sequence of issue patches that add files now works in the latest release.

Tags

  • git

Getting more than you bargained for: removing a Drupal module with Composer

By joachim, Wed, 09/01/2019 - 14:19

It's no secret that I find Composer a very troublesome piece of software to work with.

I have issues with Composer on two fronts. First, its output is extremely user-unfriendly, such as the long lists of impenetrable statements about dependencies that it produces when it tells you why it can't make a change you request. Second, many Composer commands have unwanted side-effects, and these work against the practice that changes to your codebase should be as simple as possible for the sake of developer sanity, testing, and user acceptance.

I recently discovered that removing packages is one such task where Composer has ideas of its own. A command such as remove drupal/foo will take it on itself to also update some apparently unrelated packages, meaning that you either have to manage the deployment of these updates as part of your uninstallation of a module, or roll up your sleeves and hack into the mess Composer has made of your codebase.

Guess which option I went for.

Step 1: Remove the module you actually want to remove

Let's suppose we want to remove the Drupal module 'foo' from the codebase because we're no longer using it:

$ composer remove drupal/foo

This will have two side effects, one of which you might want, and one of which you definitely don't.

Side effect 1: dependent packages are removed

This is fine, in theory. You probably don't need the modules that are dependencies of foo. Except... Composer knows about dependencies declared in composer.json, which for Drupal modules might be different from the dependencies declared in module info.yml files (if maintainers haven't been careful to ensure they match). UPDATE: I've been informed in comments that drupal.org's packaging process ensures these are kept in sync. So that's one less thing to worry about!

Furthermore, Composer doesn't know about Drupal configuration dependencies. You could have the situation where you installed module Foo, which had a dependency on Bar, so you installed that too. But then you found Bar was quite useful in itself, and you've created content and configuration on your site that depends on Bar. Ideally, at that point, you should have declared Bar explicitly in your project's root composer.json, but most likely, you haven't.

So at this point, you should go through Composer's output of what it's removed, and check your site doesn't have any of the Drupal modules enabled.

I recommend taking the list of Drupal modules that Composer has just told you it's removed in addition to the requested one, and checking its status on your live site:

$ drush pml | ag MODULE

If you find that any modules are still enabled, then revert the changes you've just made with the remove command, and declare the modules in your root composer.json, copying the declaration from the composer.json file of the module you are removing. Then start step 1 again.

Side effect 2: unrelated packages are updated

This is undesirable basically because any package update is something that has to be evaluated and tested before it's deployed. Having that happen as part of a package removal turns what should be a straight-forward task into something complex and unpredictable. It's forcing the developer to handle two operations that should be separate as one.

(It turns out that the maintainers of Composer don't even consider this to be a problem, and as I have unfortunately come to expect, the issue on github is a fine example of bad maintainership (for the nadir, see the issue on the use of JSON as a format for the main composer file) -- dismissing the problems that users explain they have, claiming the problems are by design, and so on.)

So to revert this, you need to pick apart the changes Composer has made, and reverse some of them.

Before you go any further, commit everything that Composer changed with the remove command. In my preferred method of operation, that means all the files, including the modules folder and the vendor folder. I know that Composer recommends you don't do that, but frankly I think trusting Composer not to damage your codebase on a whim is folly: you need to be able to back out of any mess it may make.

Step 2: Repair composer.lock

The composer.lock file is the record of how the packages currently are, so to undo some of the changes Composer made, we undo some of the changes made to this file, then get Composer to update based on the lock.

First, restore version of composer.lock to how it was before you started:

$ git checkout HEAD^ composer.lock

Unstage it. I prefer a GUI for git staging and unstaging operations, but on the command line it's:

$ git reset composer.lock

Your composer lock file now looks as it did before you started.

Use either git add -p or your favourite git GUI to pick out the right bits. Understanding which bits are the 'right bits' takes a bit of mental gymnastics: overall, we want to keep the changes in the last commit that removed packages completely, but we want to discard the changes that upgrade packages.

But here we've got a reverted diff. So in terms of what we have here, we want to discard changes that re-add a package, and stage and commit the changes that downgrade packages.

When you're done staging you should have:

  • the change to the content hash should be unstaged.
  • chunks that are a whole package should be unstaged
  • chunks that change version should be staged (be sure to get all the bits that relate to a package)

Then commit what is staged, and discard the rest.

Then do a git diff of composer.lock against your starting point: you should see only complete package removals.

Step 3: Restore packages with unrelated changes

Finally, do:

$ composer update --lock

This will restore the packages that Composer updated against your will in step 1 to their original state.

If you are committing Composer-managed packages to your repository, commit them now.

As a final sanity check, do a git diff against your starting point, like this:

$ git diff --name-status master

You should see mostly deleted files. To verify there's nothing that shouldn't be there in the changed files, do:

$ git diff --name-status master | ag '^[^D]'

You should see only composer.json, composer.lock, and the autoloader's files.

PS. If I am wrong and there IS a way to get Compose to remove a package without side-effects, please tell me.

I feel I have exhausted all the options of the remove command:

  • --no-update only changes composer.json, and makes no changes to package files at all. I'm not sure what the point of this is.
  • --no-update-with-dependencies only removes the one package, and doesn't remove any dependencies that are not required anywhere else. This leaves you having to pick through composer.json files yourself and remove dependencies individually, and completely obviates the purpose of a package manager!

Why is something as simple as a package removal turned into a complex operation by Composer? Honestly, I'm baffled. I've tried reasoning with the maintainers, and it's a brick wall.

PPS. Since writing this post, I’ve made Composer Manifest a small Composer plugin which makes it easier to see what Composer has decided to change behind your back. Every time you do a Composer update, install, or remove, it writes a YAML file that lists all the installed packages with their versions. Committing that to your repository means you have an easy way to see exactly what's been changed and when.

Tags

  • Composer

Drupal Code Builder: the analytical, adaptive code generator

By joachim, Sun, 09/09/2018 - 21:26

It's now just over 10 years since webchick handed maintainership of the Module Builder project over to me. I can still remember nervously approaching her at the end of a day at DrupalCon 2008, outside the building where people were chatting, and asking her if she'd had time to look at the patches I'd posted for it. I was at my first DrupalCon; she'd just been named as the core co-maintainer for the work about to start on Drupal 7. No, she hadn't, she said, and she instantly came to the conclusion that she'd never have the time to look at them, and so got her laptop out of her rucksack and there and then edited the project node to make me the maintainer.

Given the longevity of the project, I am often surprised when I talk to people that they don't know what its crucial advantage is over the other Drupal code generating tools that now also exist. Drupal Code Builder, as it's now called, is fundamentally different from Drupal Console's Generator command, and the Drupal Code Generator library that’s included in Drush 9.

I feel that perhaps this requires a buzzword, to make it more noticeable and memorable.

So here is one: Drupal Code Builder is an analytical code generator.

I'm going to add a second one: the Drupal Code Builder Drush commands and Module Builder (which still exists, though is now just module that provides a Drupal-based UI for the Drupal Code Builder library) are adaptive.

What do I mean by those?

When you first install Drupal Code Builder, it has no information about Drupal hooks, or plugin types, or services. There is no data in the codebase on how to generate a hook_form_alter() implementation, or a Block plugin, or how to inject the entity_type.manager service.

Before you can generate code, you need to run a command or submit an admin form which will analyse your entire codebase, and store the resulting data on hooks, plugin types, services, and other Drupal structures (the list of what is analysed keeps growing.).

The consequence of this is huge: Drupal Code Builder is not limited to core hooks, or to the hooks that it's been programmed for. Drupal Code Builder knows all the hooks in your codebase: from core, from contrib module, even from any of your own custom code that invents hooks.

Now repeat that statement for plugin types, for services, and so on.

And that analysis process can, and should, be repeated when you download a new module, because then any Drupal structures (hooks, plugin types, etc) that are defined in new modules gets added to the list of what Drupal Code Builder can generate.

This means that you can use Drupal Code Builder to do something like this:

  1. Generate a new plugin type. This creates a plugin manager service class and declaration, an annotation class that defines the plugin annotation, and an interface and base class for the plugins.
  2. Re-run DCB's analysis.
  3. Generate a plugin of the type you've just made.

When I say that this technique of analysis makes Drupal Coder Builder more powerful than other code generators, I'm not even blowing my own trumpet: the module I was handed in 2008 did the same. Back then in the Drupal 6 era is needed to connect to drupal.org's CVS repository to download the api.php files that document a module's hooks (which is why the Drush command for updating the code analysis prior to Drush 9 is called mb-download). For Drupal 7, the api.php files were moved into the core Drupal codebase, so from that point the process would scan the site's codebase directly. For Drupal 8, I added lots more code analysis, of plugin types and services, but still, the original idea remains: Drupal Code Builder knows how to detect Drupal structures, so it can know more structures than can be hardcoded.

This analysis can get pretty complex. While for hooks, it's just a case of running a regex on api.php files, for detecting information about plugin types, all sorts of techniques are used, such as searching the class parentage of all the plugins of a type to try to guess whether there is a plugin base class. It’s not always perfect, and it could always use more refinement, but it’s a more powerful approach than hardcoding a necessarily finite number of cases. (If only plugin types were declared in some way, or if the documentation for them were systematic in the way hook documentation is, this would all be so much simpler, and much more importantly, it would be more accurate!)

My second buzzword, that UIs for Drupal Code Builder are adaptive, describes the way that neither of the Drush command nor the Drupal module need to know what Drupal Code Builder can build. They merely know the API for describing properties, and how to present them to the user, to then pass the input values to Drupal Code Builder to get the generated code.

This is analogous to the way that Form API doesn’t know anything about a particular form, just about the different kinds of form elements.

This isn’t as exciting or indeed relevant to the end-user, but it does mean that development on Drupal Code Builder can move quickly, as adding new things to generate doesn’t require coordinated releases of software packages for the UI. In fact, I think nearly every new release of Drupal Code Builder has featured some sort of new code generating functionality, while keeping the API the same.

For example, this used to be the UI for adding a route on Drupal and Drush:

Screenshot showing Drush UI Screenshot showing Module Builder UI

A later release of Drupal Code Builder turned the UI into this, without the Drush command or Module Builder needing their code to be updated:

Screenshot showing Drush UI Screenshot showing Module Builder UI

Similarly, an even later version of Drupal Code Builder added code analysis to detect all top-level admin menu items, and so the admin settings form generation now lets you pick where to place the form in the menu.

It would be nice to think that a couple of buzzwords could gain Drupal Code Builder more attention and more users, but I fear that Drupal Console’s generator has got rather more traction in Drupal 8, despite its huge limitation. It’s disappointing that Drush has now added a third code generator to the mix, to even further dilute the ecosystem, and that it’s just as limited by hardcoding.

So where do we go from here?

Well, people get attached to UIs, but they don’t care much about what powers them, especially in this day and age of Composer, where adding dependencies no longer creates an imposition on the end-user.

So I suggest the following: the code analysis portion of Drupal Code Builder could be extracted to a new package. It doesn’t depend on anything on the code generation side, so that should be fairly simple. It provides an API that supports analysis being run in batches, which could maybe do with being spruced up, but it’s good enough for a 1.0.0 version for now.

Then, any code generating system would be able to use it. Both Console and Drush could replace their hardcoded lists of hooks and plugins with analytical data, while keeping the commands they expose to the end-user unchanged.

I’ll be at DrupalEurope in Darmstadt, where I’ll be running a BoF to discuss where we go next with code generation on Drupal.

Tags

  • Drupal Code Builder

Pagination

  • First page
  • Previous page
  • Page 1
  • Current page 2
  • Page 3
  • Page 4
  • Page 5
  • Page 6
  • Page 7
  • Next page
  • Last page
Joachim&#039;s blog

Frequent tags

  • Drupal Code Builder (9)
  • git (7)
  • module builder (7)
  • 6.x (5)
  • drupal commerce (4)
  • Composer (3)
  • Drush (3)
  • contributing code (3)
  • development (3)
  • Entity API (3)
  • tests (3)
  • Field API (3)
  • patching (3)
  • Rector (3)
  • wtf (2)
  • maintaining projects (2)
  • code style (2)
  • contrib module (2)
  • drupal.org (2)
  • debugging (2)
  • multisite (2)
  • deprecation (2)
  • issue queue (2)
  • Drupal core (2)
  • core (2)
  • modules (2)
  • roadmap (2)
  • 7.x (2)
  • developer tools (2)
Powered by Drupal