|
min. read

Postgres and Yjs CRDT Collaborative Text Editing, Using PowerSync

Conrad Hofmeyr

One of the strengths of CRDTs is that they are very well-suited to collaborative text editing. A popular example of this is Yjs, a CRDT implementation that has been integrated with a variety of rich text editors.

We’ve previously mentioned that CRDTs can be implemented on top of PowerSync (a Postgres<>SQLite bi-directional sync layer) for exactly such use cases. 

So we’ve been planning to show how Yjs can be used with PowerSync to store the Yjs CRDT data structure in Postgres, and keep it in sync between clients in real-time. Some members of the PowerSync community also asked about this recently (thank you to Andric in particular!).

In response to this interest, today we’re excited to release an initial Yjs + PowerSync demo which uses the Tiptap rich text editor. We built the demo as a React app with the PowerSync Web SDK, and used Supabase as our Postgres backend for simplicity.

Storing CRDT Data in Postgres

CRDT data structures can be stored in SQL databases as binary/blob data. In Postgres, we use the BYTEA column type.

Depending on your data storage, it may be more efficient to have the CRDT data structure be split up to some extent. For example, it’s typically not very efficient to update a 1MB blob in-place in a database row. 

Therefore, for our demo we elected to store a separate database row for each update to the Yjs document. Since CRDTs are commutative, a Yjs document can be constructed from scratch by applying all the individual updates in any order, so this storage approach works well, and it also makes it easy to sync incremental updates between users.

Accordingly, here is the simplified Postgres schema that we used for our demo app:

To prevent an ever-growing list of updates, updates may be merged periodically. This reduces the storage overhead — both in number of updates, and in update size.

PowerSync Sync Rules

PowerSync cannot currently sync BYTEA columns directly from Postgres, so we need to convert the BYTEA column to Base64 in the PowerSync Sync Rules configuration:

SELECT id, document_id, base64(update_data) as update_b64 FROM document_updates

Client-Side Yjs + PowerSync Implementation

Yjs has a concept of Providers, and more specifically Connection Providers (which handle communication between clients) and Persistence Providers (which handle data storage). Unless there are providers already available that suit your needs, implementing and maintaining providers can potentially require a non-trivial engineering effort for a developer wanting to use Yjs.

When PowerSync is used to store Yjs documents and sync document updates between users, it basically combines the Connection Provider and Persistence Provider in one: PowerSync stores all the document data in Postgres on the server-side and in SQLite on the client-side, and syncs it automatically between users (keeping the local SQLite database up to date on each client). This reduces Yjs implementation effort quite a bit.

On the client side, a Yjs document is instantiated in memory and populated by applying document updates that are queried from the local SQLite database. 

We can use PowerSync live queries to continuously watch the local database for document updates, and then apply those to the in-memory Yjs document, as seen in PowerSyncYjsProvider.ts: (therefore, this query takes care of both the initial loading of the document from the local database, as well as receiving real-time updates from other users)

onst seenDocUpdates = new Set<string>();

const updates = db.watch('SELECT * FROM document_updates WHERE document_id = ?', [documentId])

for await (let results of updates) {
  // New data detected in the database
  for (let update of results.rows!._array) {
    // Ignore any updates we've already seen
    if (!this.seenDocUpdates.has(update.id)) {
      this.seenDocUpdates.add(update.id);      
      // apply the update from the database to the doc
      const origin = this;
      Y.applyUpdateV2(doc, b64ToUint8Array(update.update_b64), origin);    
    }
  }
}

A few notes on the above:

  • Yjs uses Uint8Array to represent binary data in JavaScript, so we convert the Base64 data from the database to Uint8Array.
  • Yjs allows a custom ‘origin’ property to be associated with document updates. We use this to keep track of which updates were loaded from the local database vs. updates that originated from the Tiptap text editor as the user makes edits to the document.

The remaining part of the client-side implementation is to watch for new edits to the Yjs document (made by the user via the Tiptap text editor) and then to persist those document updates to the local SQLite database, from where PowerSync will sync them to Postgres and to other clients: (here, we are doing the inverse conversion of Uint8Array to Base64)

doc.on('updateV2', async (update: Uint8Array, origin: any) => {
    if (origin === this) {
      // update originated from the database / PowerSync - ignore
      return;
    }
    // update originated from elsewhere - save to the database
    const docUpdateId = uuidv4();
    this.seenDocUpdates.add(docUpdateId);
    await this.db.execute('INSERT INTO document_updates(id, document_id, update_b64) VALUES(?, ?, ?)', [
      docUpdateId,
      this.documentId,
      Uint8ArrayTob64(update)
    ]);
  }
);

Writing to Postgres

As you may know, PowerSync allows the developer to define their own backend connector — i.e. the developer controls how writes are made to the Postgres database. 

In the SupabaseConnector.ts in our demo app, you will see that for efficiency reasons we batch together the insertion of document updates into the database, using an ‘insert_document_updates’ Postgres function defined in database.sql.

Merging Updates

The more edits are made to a document, the longer the Yjs CRDT update history becomes. We mentioned above that updates can be merged to compact the history. We’ve included a very basic Supabase edge function to show how updates can be merged using Yjs. 

Running the Demo

For a short demo video and for full instructions on how to run the demo, refer to the repo README.

Roadmap

This is currently a fairly simple demo with various opportunities for improvement. We have listed some specific planned improvements under the roadmap section of the README. Feel free to open a PR if there’s anything you’d like to contribute!

Subscribe to receive updates

Thank you! Your submission has been received.
Oops! Something went wrong while submitting the form. Please try again.