Building Drivers
Appium wants to make it easy for anyone to develop their own automation drivers as part of the Appium ecosystem. This guide will explain what's involved and how you can accomplish various driver development tasks using the tools Appium provides. This guide assumes you (1) are a competent user of Appium, (2) are a competent Node.js developer, and (3) have read and understood the Driver Intro.
If that describes you, great! This guide will get you started.
Before you create your driver¶
Before you get to work implementing your driver, it's important to have a few things sorted out. For example, you need to know what your driver will do. Which platform is it trying to expose WebDriver automation for?
Appium doesn't magically give you the power to automate any platform. All it does is give you a set of convenient tools for implementing the WebDriver Protocol. So if you want to create, for example, a driver for a new app platform, you'll need to know how to automate apps on that platform without Appium.
This usually means that you need to be very familiar with app development for a given platform. And it usually means that you will rely on tools or SDKs provided by the platform vendor.
Basically, if you can't answer the question "how would I launch, remotely trigger behaviours, and read state from an app on this platform?" then you're not quite ready to write an Appium driver. Make sure you do the research to feel comfortable that there is a path forward. Once there is, coding it up and making it available as an Appium driver should be the easy part!
Other drivers to reference¶
One of the greatest things about building an Appium driver is that there are already a number of open source Appium drivers which you can look at for reference. There is a fake-driver sample driver which does basically nothing other than showcase some of the things described in this guide.
And of course, all of Appium's official drivers are open source and available in repositories at the project's GitHub organization. So if you ever find yourself asking, "how does a driver do X?", read the code for these drivers! Also don't be afraid to ask questions of the Appium developers if you get stuck; we're always happy to help make sure the driver development experience is a good one!
Basic requirements for Appium drivers¶
These are the things your driver must do (or be), if you want it to be a valid Appium driver.
Node.js package with Appium extension metadata¶
All Appium drivers are fundamentally Node.js packages, and therefore must have a valid package.json. Your driver is not limited to Node.js, but it must provide an adapter written in Node.js so it can be loaded by Appium.
Your package.json must include appium as a peerDependency. The requirements for the dependency versions should be as loose as possible (unless you happen to know your driver will only work with certain versions of Appium). For Appium 2, for example, this would look something like ^2.0.0, declaring that your driver works with any version of Appium that starts with 2.x.
Your package.json must contain an appium field, like this (we call this the 'Appium extension metadata'):
{
  ...,
  "appium": {
    "driverName": "fake",
    "automationName": "Fake",
    "platformNames": [
      "Fake"
    ],
    "mainClass": "FakeDriver"
  },
  ...
}
The required subfields are:
- driverName: this should be a short name for your driver.
- automationName: this should be the string users will use for their appium:automationName capability to tell Appium to use your driver.
- platformNames: this is an array of one or more platform names considered valid for your driver. When a user sends in the platformName capability to start a session, it must be included in this list for your driver to handle the session. Known platform name strings include: iOS, tvOS, macOS, Windows, Android.
- mainClass: this is a named export (in CommonJS style) from your main field. It must be a class which extends Appium's BaseDriver (see below).
Extend Appium's BaseDriver class¶
Ultimately, your driver is much easier to write because most of the hard work of implementing the WebDriver protocol and handling certain common logic is taken care of already by Appium. This is all packaged as a class which Appium exports for you to use, called BaseDriver. It is exported from appium/driver, so you can use one of these styles to import it and create your own class that extends it:
import {BaseDriver} from 'appium/driver';
// or: const {BaseDriver} = require('appium/driver');
export class MyDriver extends BaseDriver {
}
Make your driver available¶
That's basically it! With a Node.js package exporting a driver class and with correct Appium extension metadata, you've got yourself an Appium driver! Now it doesn't do anything, but you can load it up in Appium, start and stop sessions with it, etc...
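To make it available to users, you could publish it via npm. When you do so, your driver will be installable via the Appium CLI, using a command of the form appium driver install --source=npm <package-name>.
It's a good idea to test your driver first, of course. One way to see how it works within Appium is to install it locally first, using appium driver install --source=local /path/to/your/driver.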
Developing your driver¶
How you develop your driver is up to you. It is convenient, however, to run it from within Appium without having to do lots of publishing and installing. The most straightforward way to do this is to include the most recent version of Appium as a devDependency, and then also your own driver, like this:
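A sketch of what the relevant part of package.json might look like. It assumes your package is named your-driver (a placeholder) and that a file:. specifier is used to point the dependency at the package's own directory; the Appium version range is only an example.

```json
{
  "devDependencies": {
    "appium": "^2.0.0",
    "your-driver": "file:."
  }
}
```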
Now, you can run Appium locally (npm exec appium or npx appium), and because your driver is listed as a dependency alongside it, it will be automatically "installed" and available. You can design your e2e tests this way, or if you're writing them in Node.js, you can simply import Appium's start server methods to handle starting and stopping the Appium server in Node. (TODO: reference an implementation of this in one of the open source drivers when ready).
Another way to do local development with an existing Appium server install is to simply install your driver locally with appium driver install --source=local, pointing it at your driver's directory.
Refreshing your driver during development¶
When the Appium server starts, it loads your driver into memory. Changes to your driver code will not take effect until the next time the Appium server starts. Simply starting a new session is not sufficient to cause your driver's code to be reloaded.
However, you can set the APPIUM_RELOAD_EXTENSIONS environment variable to 1 to request that Appium clear its module cache and reload extensions whenever a new session is requested. This may obviate the need to restart the server when you make code changes to your driver.
Standard driver implementation ideas¶
These are things you will probably find yourself wanting to do when creating a driver.
Set up state in a constructor¶
If you define your own constructor, you'll need to call super to make sure all the standard state is set up correctly:
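A minimal sketch of such a constructor. So that it runs on its own, it uses a tiny stand-in for BaseDriver (an assumption for illustration only); in a real driver you would import BaseDriver from appium/driver instead. The startupArgs and appState fields are hypothetical names.

```javascript
// Stand-in so this sketch runs by itself. In a real driver, use:
//   const {BaseDriver} = require('appium/driver');
class BaseDriver {
  constructor(args = {}) {
    // The real class sets up far more state than this
    this.startupArgs = args;
  }
}

class MyDriver extends BaseDriver {
  constructor(args) {
    // Always call super first so the standard driver state is initialised
    super(args);
    // Then set up whatever state your own driver needs (hypothetical field)
    this.appState = {launched: false};
  }
}

const driver = new MyDriver({port: 4723});
```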
The args parameter here is the object containing all the CLI args used to start the Appium server.
Define and validate accepted capabilities¶
You can define your own capabilities and basic validation for them. Users will always be able to send in capabilities that you don't define, but if they send in capabilities you have explicitly defined, then Appium will validate that they are of the correct type (and will check for the presence of required capabilities).
If you want to turn capability validation off entirely, set this.shouldValidateCaps to false in your constructor.
To give Appium your validation constraints, set this.desiredCapConstraints to a validation object in your constructor. Validation objects can be somewhat complex. Here's an example from the UiAutomator2 driver:
{
  app: {
    presence: true,
    isString: true
  },
  automationName: {
    isString: true
  },
  browserName: {
    isString: true
  },
  launchTimeout: {
    isNumber: true
  },
}
Start a session and read capabilities¶
Appium's BaseDriver already implements the createSession command, so you don't have to. However, it's very common to need to perform your own startup actions (launching an app, running some platform code, or doing different things based on capabilities you have defined for your driver). So you'll probably end up overriding createSession. You can do so by defining the method in your driver:
async createSession(jwpCaps, reqCaps, w3cCaps, otherDriverData) {
  // super.createSession is async, so await it before destructuring the result
  const [sessionId, caps] = await super.createSession(w3cCaps);
  // do your own stuff here
  return [sessionId, caps];
}
For legacy reasons, your function will receive old-style JSON Wire Protocol desired and required caps as the first two arguments. Given that the old protocol isn't supported anymore and clients have all been updated, you can rely solely on the w3cCaps parameter. (For a discussion about what otherDriverData is about, see the section below on concurrent drivers).
You'll want to make sure to call super.createSession in order to get the session ID as well as the processed capabilities (note that capabilities are also set on this.caps; modifying caps locally here would have no effect other than changing what the user sees in the create session response).
So that's it! You can fill out the middle section with whatever startup logic your driver requires.
End a session¶
If your driver requires any cleanup or shutdown logic, it's best to do it as part of overriding the implementation of deleteSession:
async deleteSession() {
  // do your own cleanup here
  // don't forget to call super!
  await super.deleteSession();
}
It's very important not to throw any errors here if possible so that all parts of session cleanup can succeed!
Access capabilities and CLI args¶
You'll often want to read parameters the user has set for the session, whether as CLI args or as capabilities. The easiest way to do this is to access this.opts, which is a merge of all options, from the CLI or from capabilities. So for example to access the appium:app capability, you could simply get the value of this.opts.app.
If you care about knowing whether something was sent in as a CLI arg or a capability, you can access the this.cliArgs and this.caps objects explicitly.
In all cases, the appium: capability prefix will have been stripped away by the time you are accessing values here, for convenience.
Implement WebDriver classic commands¶
You handle WebDriver commands by implementing functions in your driver class. Each member of the WebDriver Protocol, plus the various Appium extensions, has a corresponding function that you implement if you want to support that command in your driver. The best way to see which commands Appium supports and which method you need to implement for each command is to look at Appium's routes.js. Each route object in this file tells you the command name as well as the parameters you'd expect to receive for that command.
Let's take this block for example:
'/session/:sessionId/url': {
  GET: {command: 'getUrl'},
  POST: {command: 'setUrl', payloadParams: {required: ['url']}},
}
Here we see that the route /session/:sessionId/url is mapped to two commands, one for a GET request and one for a POST request. If we want to allow our driver to change the "url" (or whatever that might mean for our driver), we can therefore implement the setUrl command, knowing it will take the url parameter:
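A minimal sketch of the two handlers for this route. The empty BaseDriver class is a stand-in so the example runs by itself (a real driver extends the class from appium/driver), and storing the URL in this.currentUrl is purely hypothetical behaviour.

```javascript
// Stand-in base class; a real driver extends BaseDriver from 'appium/driver'.
class BaseDriver {}

class MyDriver extends BaseDriver {
  // Handles POST /session/:sessionId/url. The required payload param (url)
  // arrives as the first argument.
  async setUrl(url) {
    // Hypothetical behaviour: just remember the last URL we were asked to open
    this.currentUrl = url;
  }

  // Handles GET /session/:sessionId/url
  async getUrl() {
    return this.currentUrl;
  }
}
```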
A few notes:
- all command methods should be async functions or otherwise return a Promise
- you don't need to worry about protocol encoding/decoding. You will get JS objects as params, and can return JSON-serializable objects in response. Appium will take care of wrapping it up in the WebDriver protocol response format, turning it into JSON, etc...
- all session-based commands receive the sessionId parameter as the last parameter
- all element-based commands receive the elementId parameter as the second-to-last parameter
- if your driver doesn't implement a command, users can still try to access the command, and will get a 501 Not Yet Implemented response error.
Implement WebDriver BiDi commands¶
WebDriver BiDi is a newer version of the WebDriver spec which is implemented over Websockets instead of HTTP. As an Appium driver author you can take advantage of Appium's BiDi support without having to know anything about the BiDi protocol or Websockets. Implementing handlers for BiDi commands works just the same as implementing handlers for WebDriver classic commands (described in the previous section). You simply define a method on your driver of the appropriate name, and it will be called when the BiDi command is requested by the client. To see which specific names you should use for BiDi commands, have a look at bidi-commands.js.
Currently, you also need to define a doesSupportBidi field on your driver instances, and ensure it is set to true. Appium will not turn on its Websocket servers for your driver and set up any handlers unless your driver says that it supports BiDi in this way.
Implement element finding¶
Element finding is a special command implementation case. You don't actually want to override findElement or findElements, even though those are what are listed in routes.js. Appium does a lot of work for you if instead you implement this function:
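A sketch of what an implementation of findElOrEls could look like. The method name and parameters follow the description in this section; everything else is an assumption for illustration: the empty BaseDriver stand-in, the queryPlatform helper, and the elementCache map are hypothetical, and the element identifier constant is defined locally so the example runs without Appium installed.

```javascript
// In a real driver these come from Appium:
//   const {BaseDriver} = require('appium/driver');
//   const {util} = require('appium/support');
//   const {W3C_WEB_ELEMENT_IDENTIFIER} = util;
// Stand-ins are defined here so the sketch runs on its own.
const W3C_WEB_ELEMENT_IDENTIFIER = 'element-6066-11e4-a52e-4f735466cecf';
class BaseDriver {}

class MyDriver extends BaseDriver {
  constructor() {
    super();
    this.elementCache = new Map(); // element ID -> platform element
    this.nextElementId = 1;
  }

  // Hypothetical platform lookup; a real driver would query the app under test
  async queryPlatform(strategy, selector) {
    return [{strategy, selector, index: 0}, {strategy, selector, index: 1}];
  }

  async findElOrEls(strategy, selector, mult, context) {
    const found = await this.queryPlatform(strategy, selector);
    // Cache each platform element and hand back a W3C element object for it
    const elements = found.map((platformEl) => {
      const id = String(this.nextElementId++);
      this.elementCache.set(id, platformEl);
      return {[W3C_WEB_ELEMENT_IDENTIFIER]: id};
    });
    // mult === true: return every match; otherwise only the first one.
    // (A real driver would throw a "no such element" error on zero matches.)
    return mult ? elements : elements[0];
  }
}
```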
Here's what gets passed in:
- strategy - a string, the locator strategy being used
- selector - a string, the selector
- mult - boolean, whether the user has requested one element or all elements matching the selector
- context - (optional) if defined, will be a W3C Element (i.e., a JS object with the W3C element identifier as the key and the element ID as the value)
And you need to return one of the following:
- a single W3C element (an object as described above)
- an array of W3C elements
Note that you can import that W3C web element identifier from appium/support, where it is available as util.W3C_WEB_ELEMENT_IDENTIFIER.
What you do with elements is up to you! Usually you end up keeping a cache map of IDs to actual element "objects" or whatever the equivalent is for your platform.
Define valid locator strategies¶
Your driver might only support a subset of the standard WebDriver locator strategies, or it might add its own custom locator strategies. To tell Appium which strategies are considered valid for your driver, create an array of strategies and assign it to this.locatorStrategies:
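A minimal sketch, using an empty stand-in for BaseDriver so that it runs by itself. The strategy name custom-strategy is made up for illustration.

```javascript
// Stand-in base class; a real driver extends BaseDriver from 'appium/driver'.
class BaseDriver {}

class MyDriver extends BaseDriver {
  constructor(args) {
    super(args);
    // 'xpath' is a standard strategy; 'custom-strategy' is a made-up name
    this.locatorStrategies = ['xpath', 'custom-strategy'];
  }
}

const strategyDriver = new MyDriver();
```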
Appium will throw an error if the user attempts to use any strategies other than the allowed ones, which enables you to keep your element finding code clean and deal with only the strategies you know about.
By default, the list of valid strategies is empty, so if your driver isn't simply proxying to another WebDriver endpoint, you'll need to define some. The protocol-standard locator strategies are defined in the WebDriver specification.
Throw WebDriver-specific errors¶
The WebDriver spec defines a set of error codes to accompany command responses if an error occurred. Appium has created error classes for each of these codes, so you can throw the appropriate error from inside a command, and it will do the right thing in terms of the protocol response to the user. To get access to these error classes, import them from appium/driver:
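A sketch of throwing one of these errors from a command. To keep it runnable without Appium, the errors object below is a stand-in containing a single class; in a real driver you would use the errors export from appium/driver. The elementCache map is a hypothetical piece of driver state.

```javascript
// Stand-in for Appium's error classes so the sketch runs by itself.
// In a real driver: const {errors} = require('appium/driver');
const errors = {
  NoSuchElementError: class NoSuchElementError extends Error {},
};

class MyDriver {
  constructor() {
    this.elementCache = new Map(); // hypothetical: element ID -> element
  }

  async getText(elementId) {
    const el = this.elementCache.get(elementId);
    if (!el) {
      // Appium turns this into the matching WebDriver error response
      throw new errors.NoSuchElementError();
    }
    return el.text;
  }
}
```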
Log messages to the Appium log¶
You can always use console.log, of course, but Appium provides a nice logger for you as this.log (it has .info, .debug, .log, .warn, .error methods on it for differing log levels). If you want to create an Appium logger outside of a driver context (say in a script or helper file), you can always construct your own too, by importing logger from appium/support and calling logger.getLogger with the name to use as the log prefix (for example, logger.getLogger('MyDriver')).
Further possibilities for Appium drivers¶
These are things your driver can do to take advantage of extra driver features or do its job more conveniently.
Add a schema for custom command line arguments¶
You can add custom CLI args if you want your driver to receive data from the command line when the Appium server is started (for example, ports that a server administrator should set that should not be passed in as capabilities).
To define CLI arguments (or configuration properties) for the Appium server, your extension must provide a schema. In the appium property of your extension's package.json, add a schema property. This will either a) be a schema itself, or b) be a path to a schema file.
The rules for these schemas:
- Schemas must conform to JSON Schema Draft-07.
- If the schema property is a path to a schema file, the file must be in JSON or JS (CommonJS) format.
- Custom $id values are unsupported. To use $ref, provide a value relative to the schema root, e.g., /properties/foo.
- Known values of the format keyword are likely supported, but various other keywords may be unsupported. If you find a keyword that is unsupported which you need to use, please ask for support or send a PR!
- The schema must be of type object ({"type": "object"}), containing the arguments in a properties keyword. Nested properties are unsupported.
Example:
{
  "type": "object",
  "properties": {
    "test-web-server-port": {
      "type": "integer",
      "minimum": 1,
      "maximum": 65535,
      "description": "The port to use for the test web server"
    },
    "test-web-server-host": {
      "type": "string",
      "description": "The host to use for the test web server",
      "default": "sillyhost"
    }
  }
}
The above schema defines two properties which can be set via CLI argument or configuration file. If this extension is a driver and its name is "horace", the CLI args would be --driver-horace-test-web-server-port and --driver-horace-test-web-server-host, respectively.
Alternatively, a user could provide a configuration file containing:
{
  "server": {
    "driver": {
      "horace": {
        "test-web-server-port": 1234,
        "test-web-server-host": "localhorse"
      }
    }
  }
}
Add driver scripts¶
Sometimes you might want users of your driver to be able to run scripts outside the context of a session (for example, to run a script that pre-builds aspects of your driver). To support this, you can add a map of script names and JS files to the scripts field within your Appium extension metadata. So let's say you've created a script named driver-prebuild.js that lives in a scripts directory in your project. Then you could add a scripts field like this:
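A sketch of the relevant part of package.json, assuming the script path is given relative to the package root and that the script is exposed under the name prebuild. The other required metadata fields are omitted here for brevity.

```json
{
  "appium": {
    "driverName": "mydriver",
    "scripts": {
      "prebuild": "./scripts/driver-prebuild.js"
    }
  }
}
```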
Now, assuming your driver is named mydriver, users of your driver can run appium driver run mydriver prebuild, and your script will execute.
Proxy commands to another WebDriver implementation¶
A very common design architecture for Appium drivers is to have some kind of platform-specific WebDriver implementation that the Appium driver interfaces with. For example, the Appium UiAutomator2 driver interfaces with a special (Java-based) server running on the Android device. In webview mode, it also interfaces with Chromedriver.
If you find yourself in this situation, it is extremely easy to tell Appium that your driver is just going to be proxying WebDriver commands straight to another endpoint.
First, let Appium know that your driver can proxy by implementing the canProxy method:
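A minimal sketch, with an empty stand-in for BaseDriver so that it runs by itself:

```javascript
// Stand-in base class; a real driver extends BaseDriver from 'appium/driver'.
class BaseDriver {}

class MyDriver extends BaseDriver {
  canProxy() {
    // This driver is able to forward commands to another WebDriver endpoint
    return true;
  }
}
```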
Next, tell Appium which WebDriver routes it should not attempt to proxy (there often end up being certain routes that you don't want to forward on):
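A sketch of such a list, assuming the method Appium calls for it is named getProxyAvoidList. The BaseDriver stand-in and the shouldAvoidProxy helper are hypothetical; the helper only illustrates how a pair of method and regular expression can be matched against a request, and is not Appium's own logic.

```javascript
// Stand-in base class; a real driver extends BaseDriver from 'appium/driver'.
class BaseDriver {}

class MyDriver extends BaseDriver {
  getProxyAvoidList() {
    return [
      // [HTTP method, regular expression tested against the route]
      ['POST', new RegExp('^/session/[^/]+/appium')],
    ];
  }
}

// Illustration only: check a request against the avoid list
function shouldAvoidProxy(avoidList, method, url) {
  return avoidList.some(([m, re]) => m === method && re.test(url));
}
```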
The proxy avoidance list should be an array of arrays, where each inner array has an HTTP method as its first member, and a regular expression as its second. If the regular expression matches the route, then the route will not be proxied and instead will be handled by your driver. In this example, we are avoiding proxying all POST routes that have the appium prefix.
Next, we have to set up the proxying itself. The way to do this is to use a special class from Appium called JWProxy. (The name means "JSON Wire Proxy" and is related to a legacy implementation of the protocol). You'll want to create a JWProxy object using the details required to connect to the remote server:
// import {JWProxy} from 'appium/driver';
const proxy = new JWProxy({
  server: 'remote.server',
  port: 1234,
  base: '/',
});
this.proxyReqRes = proxy.proxyReqRes.bind(proxy);
this.proxyCommand = proxy.command.bind(proxy);
Here we are creating a proxy object and assigning some of its methods to this under the names proxyReqRes and proxyCommand. This is required for Appium to use the proxy, so don't forget this step! The JWProxy has a variety of other options which you can check out in the source code, as well. (TODO: publish options as API docs and link here).
Finally, we need a way to tell Appium when the proxy is active. For your driver it might always be active, or it might only be active when in a certain context. You can define the logic as an implementation of proxyActive:
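A minimal sketch, with an empty stand-in for BaseDriver so that it runs by itself. The currentContext field and the context names are hypothetical; a driver that always proxies could simply return true.

```javascript
// Stand-in base class; a real driver extends BaseDriver from 'appium/driver'.
class BaseDriver {}

class MyDriver extends BaseDriver {
  constructor() {
    super();
    this.currentContext = 'NATIVE_APP'; // hypothetical driver state
  }

  proxyActive() {
    // Hypothetical rule: only proxy while a webview context is selected
    return this.currentContext !== 'NATIVE_APP';
  }
}
```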
With those pieces in play, you won't have to reimplement anything that's already implemented by the remote endpoint you're proxying to. Appium will take care of all the proxying for you.
Proxy BiDi commands to another BiDi implementation¶
All of the above about proxying WebDriver commands is conceptually also valid for proxying BiDi commands specifically. In order to enable BiDi proxying, you need to:
- Set the doesSupportBidi field on your driver instances to true.
- Implement get bidiProxyUrl on your driver. This should return a Websocket URL which is the address of the upstream socket you want BiDi commands to be proxied to.
The intended pattern here is for you to start a session on the upstream implementation, check whether it has an active BiDi socket in the returned capabilities (e.g., the webSocketUrl capability), and then to set an internal field to that value, so that it can be returned by get bidiProxyUrl. Once all this is in place, Appium will proxy BiDi commands from the client straight to the upstream connection.
Extend the existing protocol with new commands¶
You may find that the existing commands don't cut it for your driver. If you want to expose behaviours that don't map to any of the existing commands, you can create new commands in one of two ways:
- Extending the WebDriver protocol and creating client-side plugins to access the extensions
- Overloading the Execute Script command by defining Execute Methods
If you want to follow the first path, you can direct Appium to recognize new methods and add them to its set of allowed HTTP routes and command names. You do this by assigning the newMethodMap static variable in your driver class to an object of the same form as Appium's routes.js object. For example, here is the newMethodMap for the FakeDriver example driver:
static newMethodMap = {
  '/session/:sessionId/fakedriver': {
    GET: {command: 'getFakeThing'},
    POST: {command: 'setFakeThing', payloadParams: {required: ['thing']}},
  },
  '/session/:sessionId/fakedriverargs': {
    GET: {command: 'getFakeDriverArgs'},
  },
};
In this example we're adding a few new routes and a total of 3 new commands. For more examples of how to define commands in this way, it's best to have a look through routes.js. Now all you need to do is implement the command handlers in the same way you'd implement any other Appium command.
The downside of this way of adding new commands is that people using the standard Appium clients won't have nice client-side functions designed to target these endpoints. So you would need to create and release client-side plugins for each language you want to support (directions or examples can be found at the relevant client docs).
An alternative to this way of doing things is to overload a command which all WebDriver clients have access to already: Execute Script. Appium provides a convenient tool for making this easy. Let's say you are building a driver for a stereo system called soundz, and you wanted to create a command for playing a song by name. You could expose this to your users in such a way that they call something like:
// webdriverio example. Calling webdriverio's `executeScript` command is what triggers Appium's
// Execute Script command handler
driver.executeScript('soundz: playSong', [{song: 'Stairway to Heaven', artist: 'Led Zeppelin'}]);
Then in your driver code you can define the static property executeMethodMap as a mapping of script names to methods on your driver. It has the same basic form as newMethodMap, described above. Once executeMethodMap is defined, you'll also need to implement the Execute Script command handler, which according to Appium's routes mapping is called execute. The implementation can call a single helper function, this.executeMethod, which takes care of looking at the script and arguments the user sent in and routing it to the correct custom handler you've defined. Here's an example:
static executeMethodMap = {
  // script name (as sent by the client) -> handler method and its parameters
  'soundz: playSong': {
    command: 'soundzPlaySong',
    params: {required: ['song', 'artist'], optional: []},
  },
};

async soundzPlaySong(song, artist) {
  // play the song based on song and artist details
}

async execute(script, args) {
  return await this.executeMethod(script, args);
}
A couple notes about this system:
1. The arguments array sent via the call to Execute Script must contain only zero or one element(s). The first item in the list is considered to be the parameters object for your method. These parameters will be parsed, validated, and then applied to your overload method in the order specified in executeMethodMap (the order specified in the required parameters list, followed by the optional parameters list). I.e., this framework assumes only a single actual argument sent in via Execute Script (and this argument should be an object with keys/values representing the parameters your execute method expects).
2. Appium does not automatically implement execute (the Execute Script handler) for you. You may wish, for example, to only call the executeMethod helper function when you're not in proxy mode!
3. The executeMethod helper will reject with an error if a script name doesn't match one of the script names defined as a command in executeMethodMap, or if there are missing parameters.
Build Appium Doctor checks¶
Your users can run appium driver doctor <driverName> to run installation and health checks. Visit the Building Doctor Checks guide for more information on this capability.
Implement handling of Appium settings¶
Appium users can send parameters to your driver via CLI args as well as via capabilities. But these cannot change during the course of a test, and sometimes users want to adjust parameters mid-test. Appium has a Settings API for this purpose.
To support settings in your own driver, first of all define this.settings to be an instance of the appropriate class (DeviceSettings, which is exported from appium/driver) in your constructor.
Now, you can read user settings any time simply by calling this.settings.getSettings(). This will return a JS object where the settings names are keys and have their corresponding values.
If you want to assign some default settings, or run some code on your end whenever settings are updated, you can do both of these things as well.
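// import {DeviceSettings} from 'appium/driver';
constructor(args) {
  // a subclass constructor must call super before touching `this`
  super(args);
  const defaults = {setting1: 'value1'};
  this.settings = new DeviceSettings(defaults, this.onSettingsUpdate.bind(this));
}

async onSettingsUpdate(key, value) {
  // do anything you want here with key and value
}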
Emit BiDi events¶
With the WebDriver BiDi protocol, clients can subscribe to arbitrary events which can be sent asynchronously to the client over the BiDi socket connection. As an Appium driver author you don't need to worry about event subscription. If you want to emit an event with a certain method name and payload, it's as easy as using the built-in event emitter with the bidiEvent event.
As an example, let's say our driver wants to periodically emit CPU load information. We could define an event called system.cpu, and a payload that looks like {load: 0.97} to signify 97% CPU usage. Whenever we want, our driver can simply call the following code (assuming we have the current load in this.currentCpuLoad):
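A sketch of emitting such an event. It assumes the built-in emitter is exposed on the driver as this.eventEmitter and that the event payload carries method and params fields; a plain Node.js EventEmitter stands in for it here so the example runs without Appium. The emitCpuLoad method name is hypothetical.

```javascript
const {EventEmitter} = require('node:events');

// Stand-in driver: Appium's BaseDriver provides the event emitter for you.
class MyDriver {
  constructor() {
    this.eventEmitter = new EventEmitter();
    this.currentCpuLoad = 0.97;
  }

  emitCpuLoad() {
    this.eventEmitter.emit('bidiEvent', {
      method: 'system.cpu',
      params: {load: this.currentCpuLoad},
    });
  }
}

// Illustration only: Appium does the listening and forwarding in a real server
const cpuDriver = new MyDriver();
const received = [];
cpuDriver.eventEmitter.on('bidiEvent', (event) => received.push(event));
cpuDriver.emitCpuLoad();
```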
Now, if the client has subscribed to the system.cpu event, it will be notified with the load whenever the driver emits it.
Make itself aware of resources other concurrent drivers are using¶
Let's say your driver uses up some system resources, like ports. There are a few ways to make sure that multiple simultaneous sessions don't use the same resources:
- Have your users specify resource IDs via capabilities (appium:driverPort etc)
- Just always use free resources (find a new random port for each session)
- Have each driver express what resources it is using, then examine currently-used resources from other drivers when a new session begins.
To support this third strategy, you can implement get driverData in your driver to return what sorts of resources your driver is currently using, for example:
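A minimal sketch of such a getter, with an empty stand-in for BaseDriver so that it runs by itself. The evenPortsInUse key and the port numbers are made up; the shape of the returned object is up to you.

```javascript
// Stand-in base class; a real driver extends BaseDriver from 'appium/driver'.
class BaseDriver {}

class MyDriver extends BaseDriver {
  constructor() {
    super();
    this.portsInUse = [8000, 8002, 8004]; // hypothetical driver state
  }

  get driverData() {
    // Report which resources this session currently holds
    return {evenPortsInUse: this.portsInUse};
  }
}
```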
Now, when a new session is started on your driver, the driverData response from any other simultaneously running drivers (of the same type) will also be included, as the last parameter of the createSession method:
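A sketch of reading that parameter to choose a free port. The BaseDriver stand-in returns a fixed session ID so the example runs by itself, and the evenPortsInUse key, the starting port, and the port-picking rule are all hypothetical.

```javascript
// Stand-in base class; a real driver extends BaseDriver from 'appium/driver'.
class BaseDriver {
  async createSession(jwpCaps, reqCaps, w3cCaps, driverData) {
    return ['fake-session-id', w3cCaps];
  }
}

class MyDriver extends BaseDriver {
  async createSession(jwpCaps, reqCaps, w3cCaps, driverData = []) {
    const result = await super.createSession(jwpCaps, reqCaps, w3cCaps, driverData);
    // driverData is an array with one entry per other running session
    const taken = new Set(driverData.flatMap((data) => data.evenPortsInUse ?? []));
    // Hypothetical rule: use the first even port from 8000 up that nobody holds
    let port = 8000;
    while (taken.has(port)) {
      port += 2;
    }
    this.port = port;
    return result;
  }
}
```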
You can dig into this driverData array to see what resources other drivers are using to help determine which ones you want to use for this particular session.
Warning
Be careful here, since driverData is only passed between sessions of a single running Appium server. There's nothing to stop a user from running multiple Appium servers and requesting your driver simultaneously on each of them. In this case, you won't be able to ensure independence of resources via driverData, so you might consider using file-based locking mechanisms or something similar.
Warning
It's also important to note you will only receive driverData for other instances of your driver. So unrelated drivers also running may still be using some system resources. In general Appium doesn't provide any features for ensuring unrelated drivers don't interfere with one another, so it's up to the drivers to allow users to specify resource locations or addresses to avoid clashes.
Log events to the Appium event timeline¶
Appium has an Event Timing API which allows users to get timestamps for certain server-side events (like commands, startup milestones, etc...) and display them on a timeline. The feature basically exists to allow introspection of timing for internal events to help with debugging or running analysis on Appium driver internals. You can add your own events to the event log by calling this.logEvent(name) from within your driver.
Simply provide a name for the event and it will be added at the current time, and made accessible as part of the event log for users.
Hide behaviour behind security flags¶
Appium has a feature-flag based security model that allows driver authors to hide certain features behind security flags. What this means is that if you have a feature you deem insecure and want to require server admins to opt in to it, you can require that they enable the feature by adding it to the --allow-insecure list or turning off server security entirely.
To support the check within your own driver, you can call this.isFeatureEnabled(featureName) to determine whether a feature of the given name has been enabled. Or, if you want to simply short-circuit and throw an error if the feature isn't enabled, you can call this.assertFeatureEnabled(featureName).
Use a temp dir for files¶
If you want to use a temporary directory for files your driver creates that are not important to keep around between computer or server restarts, you can simply read from this.opts.tmpDir. This reads the temporary directory location from @appium/support, potentially overridden by a CLI flag. I.e., it's safer than writing to your own temporary directory because the location here plays nicely with possible user configuration. this.opts.tmpDir is a string, the path to the dir.
Deal with unexpected shutdowns or crashes¶
Your driver might run into a situation where it can't continue operating normally. For example, it might detect that some external service has crashed and nothing will work anymore. In this case, it can call this.startUnexpectedShutdown(err) with an error object including any details, and Appium will attempt to gracefully handle any remaining requests before shutting down the session.
If you want to perform some of your own cleanup logic when you encounter this condition, you can either do so immediately before calling this.startUnexpectedShutdown, or you can attach a handler to the unexpected shutdown event by calling this.onUnexpectedShutdown(handler) and run your cleanup logic "out of band" so to speak. Here, handler should be a function which receives an error object (representing the reason for the unexpected shutdown).