





import BaseDriver from '@appium/base-driver'

class MyNewDriver extends BaseDriver {
}

This empty driver doesn't do anything, but you can wrap it in a Node.js module, add a few Appium-related fields to the module's manifest (package.json), and then install it using appium driver install.
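As a rough illustration of what those manifest fields might look like, the sketch below writes the relevant part of a package.json as a JavaScript object and checks that the Appium-specific keys are present. The key names under `appium` (`driverName`, `automationName`, `platformNames`, `mainClass`) reflect the extension manifest format as we understand it, so check them against the Appium documentation; the package name and all values are hypothetical.

```javascript
// Sketch of a driver module's manifest, written as a JS object for illustration.
// All values (package name, driver name, platform) are hypothetical.
const manifest = {
  name: 'appium-my-new-driver',
  version: '0.1.0',
  main: './index.js',
  appium: {
    driverName: 'my-new-driver',   // short name used with the `appium driver` CLI
    automationName: 'MyNew',       // automation name that test sessions ask for
    platformNames: ['MyPlatform'], // platform names this driver accepts
    mainClass: 'MyNewDriver',      // name of the driver class exported from `main`
  },
};

// Returns the Appium-related keys that are missing from a manifest object.
function missingAppiumFields(pkg) {
  const required = ['driverName', 'automationName', 'platformNames', 'mainClass'];
  const appium = pkg.appium ?? {};
  return required.filter((key) => !(key in appium));
}

console.log(JSON.stringify(manifest, null, 2));
console.log('missing fields:', missingAppiumFields(manifest));
```

With a manifest along these lines, a local checkout of the module could be installed with something like `appium driver install --source=local /path/to/module`.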


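Suppose I want to do something with this empty driver; first I have to decide which WebDriver command I want to implement. For our example, let's implement the Navigate To WebDriver command. Leave aside for the moment what I want the driver to do when this command is executed. To tell Appium that the driver can handle the command, all we have to do is define a method like this in our driver class:[1]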

async setUrl(url) {
    // do whatever we want here
}
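To make the link between a protocol command and a method name more concrete, here is a toy sketch of the idea. It is not Appium's actual code: `ROUTES` is a hypothetical miniature of a route table, `dispatch` is a made-up helper, and the driver class is a plain stand-in that does not extend BaseDriver, so the snippet runs on its own.

```javascript
// Hypothetical miniature of a protocol-to-method mapping (not Appium's real table).
const ROUTES = {
  '/session/:sessionId/url': {
    GET: {command: 'getUrl'},
    POST: {command: 'setUrl', payloadParams: ['url']},
  },
};

// Stand-in driver: it only defines the one method it wants to support.
class MyNewDriver {
  async setUrl(url) {
    this.lastUrl = url; // do whatever we want here
  }
}

// Look up the route, pull the named params out of the body, and call the method.
async function dispatch(driver, httpMethod, route, body) {
  const spec = ROUTES[route]?.[httpMethod];
  if (!spec || typeof driver[spec.command] !== 'function') {
    throw new Error(`command not implemented: ${httpMethod} ${route}`);
  }
  const args = (spec.payloadParams ?? []).map((name) => body[name]);
  return await driver[spec.command](...args);
}

const driver = new MyNewDriver();
dispatch(driver, 'POST', '/session/:sessionId/url', {url: 'https://appium.io'})
  .then(() => console.log('navigated to', driver.lastUrl));
```

The point of the sketch is that the driver never sees HTTP details: defining a method with the expected name is enough for a command to be considered handled, and a command with no matching method is simply reported as not implemented.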


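How we actually implement the command is up to us and depends on the platform(s) the driver supports. For example:[2]

  • Browsers: execute some JavaScript to set window.location.href
  • iOS apps: launch an app using a deep link
  • Android apps: launch an app using a deep link
  • React apps: load a specific route
  • Unity: go to a named scene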
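A couple of the cases above can be sketched side by side. Both classes below are hypothetical stand-ins rather than real Appium drivers, and `executeScript` and `runAdb` are made-up helpers injected by the caller; the only real-world detail assumed is that Android's activity manager can open a URL with an `android.intent.action.VIEW` intent.

```javascript
// Two hypothetical drivers that accept the same command but do different things.
class BrowserLikeDriver {
  constructor(executeScript) {
    this.executeScript = executeScript; // made-up helper: runs JS in the page
  }

  async setUrl(url) {
    // Browser: run some JavaScript that sets window.location.href
    await this.executeScript('window.location.href = arguments[0];', [url]);
  }
}

class AndroidLikeDriver {
  constructor(runAdb) {
    this.runAdb = runAdb; // made-up helper: runs adb with the given arguments
  }

  async setUrl(url) {
    // Android app: open the URL as a deep link via the activity manager
    await this.runAdb(['shell', 'am', 'start', '-a', 'android.intent.action.VIEW', '-d', url]);
  }
}

// Record the calls instead of touching a real browser or device.
const calls = [];
const record = (kind) => async (...args) => {
  calls.push([kind, ...args]);
};

const drivers = [new BrowserLikeDriver(record('js')), new AndroidLikeDriver(record('adb'))];
Promise.all(drivers.map((d) => d.setUrl('myapp://settings')))
  .then(() => console.log(calls));
```

Both classes expose the same `setUrl` method, which is all that matters at the protocol level; what happens inside is entirely platform-specific.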




But typically what driver authors want to do is provide automation behaviours for a given platform (or platforms) that are semantically very similar to the WebDriver spec implementations for browsers. When you want to find an element, you should get a reference to a UI element. When you want to click or tap that element, the resulting behaviour should be the same as if a person were to click or tap on the element. And so on.

So the real challenge for driver authors is not how to work with the WebDriver protocol (because BaseDriver encapsulates all that for you), but how to make the actual automation happen on the target platform. Every driver relies on its own set of underlying technologies here. As mentioned in the Overview, the iOS driver uses an Apple technology called XCUITest. These underlying automation technologies usually have proprietary or idiosyncratic APIs of their own. Writing a driver becomes the task of mapping the WebDriver protocol to this underlying API, or sometimes to a set of different underlying APIs. For example, the UiAutomator2 driver relies not only on the UiAutomator2 technology from Google, but also on functions only available through ADB, as well as functions only available via the Android SDK inside a helper app. Tying it all together into a single, usable WebDriver interface is the incredibly useful (but incredibly challenging) art of driver development!


In practice, this often results in a pretty complex architecture. Let's take iOS as an example again. The XCUITest framework (the one used by the Appium driver) expects code that calls it to be written in Objective-C or Swift. Furthermore, XCUITest code can only be run in a special mode triggered by Xcode (or, directly or indirectly, by the Xcode command line tools). In other words, there's no straightforward way to go from a Node.js function implementation (like setUrl() above) to XCUITest API calls.

What the XCUITest driver authors have done instead is split the driver into two parts: one part written in Node.js (the part which is incorporated into Appium and which initially handles the WebDriver commands), and the other part written in Objective-C (the part which actually gets run on an iOS device and makes XCUITest API calls). This makes interfacing with XCUITest possible, but introduces the new problem of coordination between the two parts.

The driver authors could have chosen any of a number of very different strategies to model the communication between the Node.js side and the Objective-C side, but at the end of the day decided to use ... the WebDriver protocol! That's right, the Objective-C side of the XCUITest driver is itself a WebDriver implementation, called WebDriverAgent.[3]

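Although WebDriverAgent is a WebDriver server in its own right, pointing a client straight at it is usually not convenient, for a few reasons:

  • The Appium XCUITest driver builds and manages WebDriverAgent for you; doing this yourself can be a pain and involves the use of Xcode.
  • The XCUITest driver does a lot more than WebDriverAgent can do on its own, for example working with simulators or devices, installing apps, and the like.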

The moral of the story is that driver architectures can become quite complicated and multilayered, due to the nature of the problem we're trying to solve. It also means that, if you run into a problem with a particular test, it can sometimes be difficult to tell where in this chain of technologies something has gone wrong. Taking the XCUITest world again, we have something like the following set of technologies all in play at the same time:

  • Your test code (in its programming language) - owned by you
  • The Appium client library - owned by Appium
  • The Selenium client library - owned by Selenium
  • The network (local or Internet)
  • The Appium server - owned by Appium
  • The Appium XCUITest driver - owned by Appium
  • WebDriverAgent - owned by Appium
  • Xcode - owned by Apple
  • XCUITest - owned by Apple
  • iOS itself - owned by Apple
  • macOS (where Xcode and iOS simulators run) - owned by Apple

It's a pretty deep stack!


There's one other important architectural aspect of drivers to understand. It can be exemplified again by the XCUITest driver. Recall that we just discussed how the two "halves" of the XCUITest driver both speak the WebDriver protocol: the Node.js half plugs right into Appium's WebDriver server, and the Objective-C half (WebDriverAgent) is its own WebDriver implementation.

This opens up the possibility of Appium taking a shortcut in certain cases. Let's imagine that the XCUITest driver needs to implement the Click Element command. The internal code of this implementation would look something like taking the appropriate parameters and constructing an HTTP request to the WebDriverAgent server. In this case, we're basically just reconstructing the client's original call to the Appium server![4] So there's really no need to even write a function implementing the Click Element command. Instead, the XCUITest driver can just let Appium know that this command should be proxied directly to some other WebDriver server.
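To see why a hand-written implementation would be redundant, here is a minimal sketch of what it might amount to. `TwoPartDriver`, `buildClickRequest`, and the injected `send` function are all hypothetical, as are the address and IDs; the only thing taken from the WebDriver protocol is the shape of the Click Element endpoint.

```javascript
// Build the WebDriver request for Click Element against a downstream server.
function buildClickRequest(baseUrl, sessionId, elementId) {
  return {
    method: 'POST',
    url: `${baseUrl}/session/${sessionId}/element/${elementId}/click`,
    body: JSON.stringify({}), // Click Element takes no parameters
  };
}

// Hypothetical Node.js half of a two-part driver. `send` stands in for a real
// HTTP client and is expected to resolve to the parsed response body.
class TwoPartDriver {
  constructor(downstreamUrl, downstreamSessionId, send) {
    this.downstreamUrl = downstreamUrl;
    this.downstreamSessionId = downstreamSessionId;
    this.send = send;
  }

  // Hand-written command: all it does is re-issue the client's original call.
  async click(elementId) {
    const req = buildClickRequest(this.downstreamUrl, this.downstreamSessionId, elementId);
    const res = await this.send(req);
    return res.value;
  }
}

// Fake transport that logs the request instead of opening a connection.
const fakeSend = async (req) => {
  console.log(req.method, req.url);
  return {value: null};
};

new TwoPartDriver('http://127.0.0.1:8100', 'downstream-session', fakeSend).click('element-1');
```

Apart from the host and the session ID, the request built here has the same shape as the one the client originally sent, which is exactly the redundancy that proxying removes.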

If you're not familiar with the concept of "proxying," in this case it just means that the XCUITest driver will not be involved at all in handling the command. Instead it will merely be repackaged and forwarded to WebDriverAgent at the protocol level, and WebDriverAgent's response will likewise be passed back directly to the client, without any XCUITest driver code seeing it or modifying it.
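In code, the essence of proxying might be sketched as follows. This is an illustration of the concept only, not how Appium implements it: `rewriteForDownstream`, `proxyCommand`, and the injected `forward` function are hypothetical names, and the only transformation shown is swapping the session ID, since the two servers each generate their own.

```javascript
// Swap whatever session ID the client used for the downstream server's one.
function rewriteForDownstream(incomingPath, downstreamSessionId) {
  return incomingPath.replace(/^\/session\/[^/]+/, `/session/${downstreamSessionId}`);
}

// Forward a request at the protocol level. No driver command method runs here;
// the downstream response is handed straight back to the caller.
async function proxyCommand(req, downstream, forward) {
  const path = rewriteForDownstream(req.path, downstream.sessionId);
  return await forward({
    method: req.method,
    url: downstream.baseUrl + path,
    body: req.body,
  });
}

// Fake transport that echoes the forwarded URL instead of using the network.
const fakeForward = async (req) => ({status: 200, body: {value: null}, forwardedTo: req.url});

proxyCommand(
  {method: 'POST', path: '/session/appium-session/element/element-1/click', body: {}},
  {baseUrl: 'http://127.0.0.1:8100', sessionId: 'downstream-session'},
  fakeForward
).then((res) => console.log(res.forwardedTo));
```

Note that nothing in `proxyCommand` knows what a click is; it would forward any command in the same way, which is why no per-command function needs to be written.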

This architectural pattern provides a nice bonus for driver authors who choose to deal with the WebDriver protocol everywhere, rather than constructing bespoke protocols. It also means that Appium can create wrapper drivers for any other existing WebDriver implementation very easily. If you look at the Appium Safari driver code, for example, you'll see that it implements basically no standard commands, because all of these are proxied directly to an underlying SafariDriver process.

It's important to understand that this proxying business is sometimes happening under the hood, because if you're ever diving into some open source driver code trying to figure out where a command is implemented, you might be surprised to find no implementation at all in the Node.js driver code itself! In that case, you'll need to figure out where commands are being proxied to so you can look there for the appropriate implementation.
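When reading driver code, it can help to picture the decision as a small predicate like the one below. The method names `canProxy`, `proxyActive`, and `getProxyAvoidList` are modeled on hooks that, to our knowledge, BaseDriver exposes for this purpose, but the class here is a plain stand-in, `shouldProxy` is a hypothetical helper, and the avoid-list entry is just an example.

```javascript
// Stand-in for a wrapper-style driver; it implements no standard commands itself.
class WrapperLikeDriver {
  canProxy() {
    return true; // this driver is able to proxy to a downstream server
  }

  proxyActive() {
    return true; // proxying is currently switched on
  }

  // [HTTP method, path pattern] pairs the driver wants to handle itself.
  getProxyAvoidList() {
    return [['POST', /^\/session\/[^/]+\/appium/]];
  }
}

// Hypothetical helper: decide whether a request bypasses the driver's own code.
function shouldProxy(driver, method, path) {
  if (!driver.canProxy() || !driver.proxyActive()) {
    return false;
  }
  const avoided = driver
    .getProxyAvoidList()
    .some(([avoidMethod, pattern]) => avoidMethod === method && pattern.test(path));
  return !avoided;
}

const wrapper = new WrapperLikeDriver();
console.log(shouldProxy(wrapper, 'POST', '/session/abc/element/e1/click'));
console.log(shouldProxy(wrapper, 'POST', '/session/abc/appium/device/lock'));
```

If a command seems to have no implementation in the Node.js code, looking for methods with names like these is one way to find out where it is being sent.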

OK, that's enough for this very detailed introduction to drivers!

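  1. You might notice that setUrl doesn't look anything like Navigate To, so how did we know to use it rather than some other random string? Appium's mapping from WebDriver protocol commands to method names is defined in a special file called routes.js within the @appium/base-driver package. So if you're writing a driver, this is where you would go to figure out which method names to use and which parameters to expect. Or you could look at the source for any of the main Appium drivers!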

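  2. Of course, we want to keep the semantics as similar as possible, but in the world of iOS, for example, launching an app via a deep link (a URL with an app-specific scheme) is about as close as we can get to navigating to a web URL.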

  3. You could in theory, therefore, point your WebDriver client straight to WebDriverAgent and bypass Appium entirely. This is usually not convenient, however, for the reasons listed in the main text.

  4. It's not exactly the same call, because the Appium server and the WebDriverAgent server will generate different session IDs, but these differences will be handled transparently.