Understanding GraphQL

November 27th, 2019

This post was originally published on the WeMatch tech blog.

If you are a web developer, you have probably heard of GraphQL, or you may already know it well.

GraphQL is a query language for your API, and a server-side runtime for executing queries by using a type system you define for your data.

Facebook developed GraphQL internally in 2012, and it went on to power billions of requests a day. It was open-sourced in 2015, and now, in 2019, it has become a strong alternative to REST. Besides Facebook, many large tech companies, such as Netflix, Airbnb, and PayPal, have adopted GraphQL and use it in production.

Where does WeMatch come into this? During a project kick-off meeting last summer, I suggested using GraphQL for our new API server. To my surprise, the suggestion was greenlit, and WeMatch officially hopped on the hype train. This post covers what we learned from adopting GraphQL.

Here are some interesting GraphQL characteristics:

The client fetches only what it needs
Multiple resources can be requested in a single query
A single endpoint serves all requests
The API does not need versioning

Imagine a REST endpoint, GET /people/1, that returns data like this:

{
  "id": 1,
  "name": "Luke Skywalker",
  "height": "172",
  "mass": "77",
  "hair_color": "blond",
  "skin_color": "fair",
  "eye_color": "blue",
  "birth_year": "19BBY",
  "gender": "male"
}

Now suppose several clients make requests to this endpoint. Client A needs all of this information, whereas Client B only needs each person's name and gender. Because of how this API is structured, Client B over-fetches data on every request. That may not matter when the payload is this small, but it adds up when clients need larger data: what happens when Client B needs GET /people, which would return every field for every person? This is where GraphQL comes in handy. Client B can instead send a query like the one below to /graphql.

query {
  people {
    name
    gender
  }
}

The query would return data in this format:

{
  "data": {
    "people": [
      {
        "name": "Luke Skywalker",
        "gender": "male"
      },
      {
        "name": "Leia Organa",
        "gender": "female"
      }
    ]
  }
}

As you can see, the backend does not have to create custom endpoints for each client's data needs. GraphQL keeps the payload lean by letting each client request exactly the fields it wants.
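To make the effect of a selection set concrete, here is a minimal sketch in plain JavaScript. It is not how a real GraphQL server executes queries: `people` is hypothetical in-memory data and `selectFields` is a hypothetical helper that keeps only the requested fields, which is the observable result of Client B's query.

```javascript
// Hypothetical in-memory data standing in for the /people resource.
const people = [
  { id: 1, name: "Luke Skywalker", height: "172", mass: "77", gender: "male" },
  { id: 5, name: "Leia Organa", height: "150", mass: "49", gender: "female" },
];

// Hypothetical helper: keep only the fields listed in the selection set,
// the way a GraphQL response contains only the fields a query asks for.
function selectFields(record, fields) {
  const result = {};
  for (const field of fields) {
    if (field in record) {
      result[field] = record[field];
    }
  }
  return result;
}

// Client B's query selects `name` and `gender` only.
const response = {
  data: { people: people.map((person) => selectFields(person, ["name", "gender"])) },
};

console.log(JSON.stringify(response));
```

Fields such as `height` and `mass` never leave the server, so the payload grows with what the client selects rather than with the size of the resource.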

Let's think of a different scenario. Imagine a client needs data on two different resources: people and starships. In the case of REST, the client would have to make two separate requests: GET /people?name=Luke+Skywalker and GET /starships?name=Death+Star. With GraphQL, the client can make a single request to /graphql to achieve the same result.

query {
  people(name: "Luke Skywalker") {
    name
    gender
  }
  starships(name: "Death Star") {
    name
    model
    manufacturer
  }
}
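Behind that single request, a GraphQL server calls one resolver per top-level field and merges the results under `data`. The sketch below imitates that with a hypothetical resolver map over in-memory data. To stay dependency-free, it takes the query above as a pre-parsed plain object rather than GraphQL text; parsing and validation are what a real server library would do for you.

```javascript
// Hypothetical in-memory data for two resources.
const db = {
  people: [
    { name: "Luke Skywalker", gender: "male", height: "172" },
    { name: "Leia Organa", gender: "female", height: "150" },
  ],
  starships: [
    {
      name: "Death Star",
      model: "DS-1 Orbital Battle Station",
      manufacturer: "Imperial Department of Military Research, Sienar Fleet Systems",
    },
  ],
};

// One resolver per top-level field; each filters its resource by `name`.
const resolvers = {
  people: ({ name }) => db.people.filter((person) => person.name === name),
  starships: ({ name }) => db.starships.filter((ship) => ship.name === name),
};

// Pre-parsed form of the query: root field -> arguments and selected fields.
const query = {
  people: { args: { name: "Luke Skywalker" }, fields: ["name", "gender"] },
  starships: { args: { name: "Death Star" }, fields: ["name", "model", "manufacturer"] },
};

// Resolve every root field in one pass and keep only the selected fields.
function execute(query) {
  const data = {};
  for (const [rootField, { args, fields }] of Object.entries(query)) {
    const records = resolvers[rootField](args);
    data[rootField] = records.map((record) =>
      Object.fromEntries(fields.map((field) => [field, record[field]]))
    );
  }
  return { data };
}

console.log(JSON.stringify(execute(query), null, 2));
```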

By now you may have noticed that every query goes to the same endpoint, /graphql. With REST, a backend developer has to design and name many endpoints. Some developers prefer query parameters and others do not. Even if a team has strict rules for API design and naming conventions, not every client will be satisfied with an endpoint, and a frontend developer may ask for it to be renamed. GraphQL sidesteps this issue, because clients describe the data they want in the query rather than in the URL.
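Over HTTP, that single endpoint commonly receives a POST request whose JSON body carries the query text and, optionally, its variables. The sketch below only builds such a request; the URL `https://api.example.com/graphql` and the helper `buildGraphQLRequest` are hypothetical, and the `$name: String` argument type is an assumption about the schema.

```javascript
// Hypothetical helper: every operation goes to the same URL; only the body changes.
function buildGraphQLRequest(query, variables = {}) {
  return {
    url: "https://api.example.com/graphql", // hypothetical endpoint
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ query, variables }),
    },
  };
}

const peopleRequest = buildGraphQLRequest(
  "query ($name: String) { people(name: $name) { name gender } }",
  { name: "Luke Skywalker" }
);
const starshipRequest = buildGraphQLRequest(
  'query { starships(name: "Death Star") { name model } }'
);

// Same URL and method for both; only the query text differs.
console.log(peopleRequest.url === starshipRequest.url); // true
// To send it (browser or Node 18+): fetch(peopleRequest.url, peopleRequest.options)
```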

Versioning is often necessary for REST because APIs keep evolving, especially when they have many clients. Each client may depend on a specific version of the API, and a change made to add a new feature can turn out to be a breaking change for other clients. A backend developer can version an API with a custom request header, the Accept header, or the URL, as in the URL-based examples below.

https://v1.api.example.com/users
https://v2.api.example.com/users
https://v3.api.example.com/users

https://auth.example.com/v1/token
https://auth.example.com/v2/token
https://auth.example.com/v3/token
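These approaches differ only in where the version number lives. As a rough sketch, a REST server might read it as follows. The `Api-Version` header name, the `application/vnd.example.v2+json` media type, the default of version 1, and the helper `resolveApiVersion` are all hypothetical conventions chosen for illustration, not a standard.

```javascript
// Hypothetical helper: find the API version in the URL or the request headers.
// Header names are expected in lowercase, as Node.js delivers them.
function resolveApiVersion(urlString, headers = {}) {
  const url = new URL(urlString);

  // Subdomain style: https://v2.api.example.com/users
  const subdomain = url.hostname.match(/^v(\d+)\./);
  if (subdomain) return Number(subdomain[1]);

  // Path style: https://auth.example.com/v2/token
  const pathSegment = url.pathname.match(/\/v(\d+)(\/|$)/);
  if (pathSegment) return Number(pathSegment[1]);

  // Custom header style (hypothetical name): Api-Version: 2
  if (headers["api-version"]) return Number(headers["api-version"]);

  // Accept header style (hypothetical vendor media type)
  const accept = (headers["accept"] || "").match(/application\/vnd\.example\.v(\d+)\+json/);
  if (accept) return Number(accept[1]);

  return 1; // assumed default when no version is given
}

console.log(resolveApiVersion("https://v2.api.example.com/users")); // 2
console.log(resolveApiVersion("https://auth.example.com/v3/token")); // 3
console.log(resolveApiVersion("https://api.example.com/users", { "api-version": "2" })); // 2
```

Whichever style is chosen, every new version means another set of routes or branches that the backend has to keep alive until old clients migrate.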

GraphQL, on the other hand, makes it possible to avoid versioning: because each client explicitly selects the fields it wants, new fields can be added without affecting existing queries. An outdated field can be marked as deprecated in the schema instead of being removed. Clients that still query it keep getting data rather than errors, while tools that read the schema flag the field as deprecated.

GraphQL takes a strong opinion on avoiding versioning by providing the tools for the continuous evolution of a GraphQL schema.
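Deprecation is metadata on the schema rather than a new version of the API. The sketch below loosely mirrors the `deprecationReason` property that graphql-js accepts on a field definition, but the `personType` object, the `heightCm` field, and the `deprecationWarnings` helper are hypothetical. In practice the deprecated field is still resolved, and the warnings usually come from tooling that reads the schema through introspection.

```javascript
// Hypothetical type definition: a field may carry a deprecationReason.
const personType = {
  name: "Person",
  fields: {
    name: {},
    gender: {},
    height: { deprecationReason: "Use heightCm instead." },
    heightCm: {},
  },
};

// Hypothetical helper: list warnings for deprecated fields a query selects.
// The fields themselves would still be returned; the client is only warned.
function deprecationWarnings(type, selectedFields) {
  return selectedFields
    .filter((field) => type.fields[field] && type.fields[field].deprecationReason)
    .map((field) => `${type.name}.${field} is deprecated: ${type.fields[field].deprecationReason}`);
}

// Logs one warning, for Person.height.
console.log(deprecationWarnings(personType, ["name", "height"]));
```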

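Nothing is a silver bullet, and GraphQL comes with caveats too. Because a single GraphQL endpoint serves many resources and each query can be different, caching is difficult: HTTP caches that key on the URL do not help much when most requests are POSTs to /graphql. Libraries such as DataLoader help on the server side by batching and caching data loads within a request, although they are not a complete solution.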
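DataLoader's core idea is to collect the keys requested while the current code runs, fetch them with one batch call, and memoize each result for the rest of the request. The class below is a stripped-down sketch of that idea, not the DataLoader library itself (which handles scheduling, options, and cache control more carefully); `BatchLoader` and `fetchPeopleByIds` are hypothetical names.

```javascript
// Minimal batching loader in the spirit of DataLoader (simplified sketch).
class BatchLoader {
  constructor(batchFn) {
    this.batchFn = batchFn; // (keys) => Promise resolving to values in the same order as keys
    this.cache = new Map(); // key -> Promise, memoized for the loader's lifetime (one request)
    this.queue = []; // keys waiting for the next batch, with their promise callbacks
  }

  load(key) {
    if (this.cache.has(key)) return this.cache.get(key);
    const promise = new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // The first key in an empty queue schedules one dispatch after the current
      // synchronous code finishes, so loads made in the meantime join the batch.
      if (this.queue.length === 1) Promise.resolve().then(() => this.dispatch());
    });
    this.cache.set(key, promise);
    return promise;
  }

  async dispatch() {
    const batch = this.queue;
    this.queue = [];
    try {
      const values = await this.batchFn(batch.map((item) => item.key));
      batch.forEach((item, i) => item.resolve(values[i]));
    } catch (error) {
      // Drop failed lookups from the cache so a later load can retry them.
      batch.forEach((item) => {
        this.cache.delete(item.key);
        item.reject(error);
      });
    }
  }
}

// Hypothetical batch function: one lookup for many ids, counted for the demo.
let batchCalls = 0;
async function fetchPeopleByIds(ids) {
  batchCalls += 1;
  return ids.map((id) => ({ id, name: `Person ${id}` }));
}

const personLoader = new BatchLoader(fetchPeopleByIds);

// Three loads, two distinct keys: one batch call for ids 1 and 2.
Promise.all([personLoader.load(1), personLoader.load(2), personLoader.load(1)]).then((results) => {
  console.log(results.map((person) => person.name), "batch calls:", batchCalls);
});
```

A new loader is typically created per incoming request, so the memoization never leaks data between users; it reduces duplicate database work but does nothing for HTTP-level caching.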

Another issue is query performance. Letting a client request any number of resources is both a pro and a con. Imagine a client querying 10 resources at once, each containing thousands of records, and then multiply that by the number of clients sending similar queries. GraphQL gives clients more power and flexibility, but a handful of expensive queries can slow a server down or even crash it.
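A common mitigation is to reject queries that are too deep or too large before executing them. Real servers apply such checks to the parsed query document, for example as custom validation rules in graphql-js; the sketch below instead uses a hypothetical nested-object form of a query and hypothetical limits (`MAX_DEPTH`, `MAX_FIELDS`) just to show the idea.

```javascript
// Hypothetical pre-parsed query: keys are fields, nested objects are sub-selections.
const expensiveQuery = {
  people: {
    name: true,
    starships: {
      name: true,
      pilots: {
        name: true,
        starships: { name: true },
      },
    },
  },
};

// Walk the selection tree and return its maximum depth and total field count.
function measure(selection, depth = 1) {
  let maxDepth = depth;
  let fieldCount = 0;
  for (const value of Object.values(selection)) {
    fieldCount += 1;
    if (typeof value === "object") {
      const child = measure(value, depth + 1);
      maxDepth = Math.max(maxDepth, child.maxDepth);
      fieldCount += child.fieldCount;
    }
  }
  return { maxDepth, fieldCount };
}

// Hypothetical limits; sensible values depend on the schema and the hardware.
const MAX_DEPTH = 3;
const MAX_FIELDS = 50;

function checkQuery(selection) {
  const { maxDepth, fieldCount } = measure(selection);
  if (maxDepth > MAX_DEPTH) return { ok: false, reason: `depth ${maxDepth} exceeds ${MAX_DEPTH}` };
  if (fieldCount > MAX_FIELDS) return { ok: false, reason: `${fieldCount} fields exceed ${MAX_FIELDS}` };
  return { ok: true };
}

console.log(checkQuery(expensiveQuery));
```

Depth matters most with cyclic relations such as people, starships, and pilots: each extra level can multiply the number of records the server has to load.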

Lastly, GraphQL handles errors in an unusual way. GraphQL servers commonly return a "200 OK" status code even for bad queries and report the problem in an errors array in the response body. Below is an example of an error returned while processing a query.

# 200 OK
{
  "errors": [
    {
      "message": "Cannot query \"email\" on type \"UserNode\".",
      "locations": [
        {
          "line": 4,
          "column": 21
        }
      ]
    }
  ]
}
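Because the status code alone does not tell a client whether a query succeeded, the client has to look inside the body. Here is a minimal sketch of that check; `parseGraphQLResponse` is a hypothetical helper, and it relies only on the response shape from the GraphQL specification, where failures are listed in a top-level `errors` array that may appear alongside `data`.

```javascript
// Hypothetical helper: decide success from the body, not only from the HTTP status.
function parseGraphQLResponse(status, body) {
  if (status < 200 || status >= 300) {
    return { ok: false, messages: [`HTTP ${status}`], data: null };
  }
  const errors = Array.isArray(body.errors) ? body.errors : [];
  return {
    // Partial results are possible: `data` can be present even when `errors` is not empty.
    ok: errors.length === 0,
    messages: errors.map((error) => error.message),
    data: "data" in body ? body.data : null,
  };
}

const result = parseGraphQLResponse(200, {
  errors: [
    { message: 'Cannot query "email" on type "UserNode".', locations: [{ line: 4, column: 21 }] },
  ],
});

console.log(result.ok); // false
console.log(result.messages[0]); // Cannot query "email" on type "UserNode".
```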

This response, combined with the lack of a useful status code, is not very user friendly. To improve on it, many members of the GraphQL community recommend adding an error field to the schema and returning a normal payload with that field filled in:

// Inside a resolver: describe the failure in the payload instead of throwing
return {
  error: {
    id: '1',
    type: 'tokenExpiry',
    title: 'Expired Access Token',
    message: 'Access token has expired. Renew token',
  },
};
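Putting both sides together, a resolver returns either the normal payload or an `error` object like the one above, and the client branches on it. This sketch assumes a hypothetical `refreshSession` resolver, a hypothetical `session` field for the success case, and a token object with an `expiresAt` timestamp in milliseconds; only the shape of `error` comes from the example above.

```javascript
// Hypothetical resolver: expected failures are returned as data, not thrown.
function refreshSession(token, now = Date.now()) {
  if (token.expiresAt <= now) {
    return {
      error: {
        id: "1",
        type: "tokenExpiry",
        title: "Expired Access Token",
        message: "Access token has expired. Renew token",
      },
    };
  }
  return { session: { userId: token.userId, expiresAt: token.expiresAt } };
}

// Client side: a regular 200 response whose payload says what went wrong.
function handleRefresh(payload) {
  if (payload.error) {
    // `type` gives clients a stable value to branch on, unlike the human-readable message.
    return payload.error.type === "tokenExpiry" ? "renew-token" : "show-error";
  }
  return "continue";
}

console.log(handleRefresh(refreshSession({ userId: 7, expiresAt: 0 }))); // renew-token
console.log(handleRefresh(refreshSession({ userId: 7, expiresAt: Date.now() + 60000 }))); // continue
```

Because the error is part of the schema, clients can discover its fields through the same tooling they use for regular data.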

All in all, adopting a completely new architecture is harder and slower than taking the usual route, and it can be a costly bet that does not pay off. GraphQL is still relatively new compared to the tried-and-trusted REST, and its drawbacks may not justify changing your service's architecture. While I cannot deny the growing pains, it has been a positive change for WeMatch :)

Thanks for reading!