Field-level encryption in MongoDB community server, using Node JS and Mongoose

Author Tejas Agrawal
Published July 11, 2021
Collections Engineering

Field Level Encryption (FLE)

Simply put, it’s a kind of encryption where we encrypt specific columns or fields in the database, instead of encrypting the whole table or document.

Unlike Encryption at rest, FLE does not encrypt the whole database. 

With encryption at rest alone, anyone with sufficient access privileges to the database still sees the data in plaintext. These people could be

  • DBA
  • A third-party provider which hosts the MongoDB cluster
  • A third-party data analytics firm that has access to data that includes private, personal, or confidential information

FLE mitigates this risk because the sensitive fields are stored in the DB in encrypted form, so they stay unreadable to anyone who does not hold the encryption key.

Automatic FLE in MongoDB

Automatic FLE is available only in enterprise servers with version 4.2 or later.

How does this work?

MongoDB Enterprise provides a helper process called `mongocryptd`, which the driver talks to alongside the DB.

This process is used to automate the encryption and decryption process.

mongocryptd parses the JSON schema defined for the collection and marks the fields that need to be encrypted. It does not perform the encryption itself: the driver fetches the encryption keys through the provided KMS and encrypts or decrypts the marked fields.

This saves the overhead of handling encryption at the application level.
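For illustration, a JSON schema that drives automatic FLE might look like the sketch below. The `encrypt` keyword and the two algorithm names come from MongoDB's automatic encryption rules; the field names mirror the user schema used later in this post, and `DATA_KEY_ID` is a placeholder rather than a real data key.

```javascript
// Sketch of a JSON schema for automatic FLE (Enterprise only).
// DATA_KEY_ID is a placeholder; a real schema needs the UUID of a data key
// that was created in the key vault collection.
const DATA_KEY_ID = '<data-key-uuid>';

const usersEncryptionSchema = {
    bsonType: 'object',
    encryptMetadata: { keyId: [DATA_KEY_ID] },
    properties: {
        email: {
            encrypt: {
                bsonType: 'string',
                // Deterministic: equal plaintexts give equal ciphertexts,
                // so the field can still be matched by equality
                algorithm: 'AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic',
            },
        },
        phone: {
            encrypt: {
                bsonType: 'string',
                // Random: ciphertext differs on every write, so the field
                // cannot be used in a query filter
                algorithm: 'AEAD_AES_256_CBC_HMAC_SHA_512-Random',
            },
        },
    },
};
```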

Image taken from MongoDB Documentation

How to handle FLE in community server with Mongoose

When using FLE in the community server, we need to handle encryption and decryption at the application level.
For this, we need to pick a standard, secure algorithm and wrap it in an encrypt and a decrypt function (the `./cipher` module required in the examples below).
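The post does not show the `./cipher` module, so the following is only a minimal sketch of what it could look like, using Node's built-in `crypto` module with AES-256-CBC. The `FLE_SECRET` environment variable, the key derivation, and the `iv:ciphertext` hex layout are assumptions made for this example. The IV is derived from the plaintext with an HMAC so that the same input always produces the same ciphertext, which is what lets an encrypted field be matched by equality in a query.

```javascript
// cipher.js (hypothetical sketch, not the author's actual module)
const crypto = require('crypto');

// Assumption: the secret comes from the environment. In production the keys
// would come from a secret manager, never from a hard-coded fallback.
const SECRET = process.env.FLE_SECRET || 'dev-only-secret';
const ENC_KEY = crypto.scryptSync(SECRET, 'fle-enc', 32); // AES-256 key
const IV_KEY = crypto.scryptSync(SECRET, 'fle-iv', 32);   // HMAC key for IVs

// Stored format: 32 hex chars of IV, a colon, then the hex ciphertext
const STORED_FORMAT = /^[0-9a-f]{32}:[0-9a-f]+$/;

function encrypt(plaintext) {
    if (plaintext === null || plaintext === undefined) return plaintext;
    const text = String(plaintext);
    // Deterministic IV: the first 16 bytes of HMAC-SHA256(plaintext)
    const iv = crypto.createHmac('sha256', IV_KEY).update(text).digest().subarray(0, 16);
    const cipher = crypto.createCipheriv('aes-256-cbc', ENC_KEY, iv);
    const encrypted = Buffer.concat([cipher.update(text, 'utf8'), cipher.final()]);
    return `${iv.toString('hex')}:${encrypted.toString('hex')}`;
}

function decrypt(stored) {
    // Values that are not in the stored format are returned unchanged
    if (typeof stored !== 'string' || !STORED_FORMAT.test(stored)) return stored;
    const [ivHex, dataHex] = stored.split(':');
    const decipher = crypto.createDecipheriv('aes-256-cbc', ENC_KEY, Buffer.from(ivHex, 'hex'));
    const decrypted = Buffer.concat([decipher.update(Buffer.from(dataHex, 'hex')), decipher.final()]);
    return decrypted.toString('utf8');
}

module.exports = { encrypt, decrypt };
```

The trade-off of a deterministic scheme is that equal plaintexts are visible as equal ciphertexts. A random IV per write would hide that, but then the field could no longer be used in a query filter.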

Defining getters and setters on fields

Mongoose getters and setters allow you to execute custom logic when getting or setting a property on a Mongoose document. Getters let you transform data in MongoDB into a more user-friendly form, and setters let you transform user data before it gets to MongoDB.

const mongoose = require('mongoose');
const Schema = mongoose.Schema;
const { encrypt, decrypt } = require('./cipher');

const userSchema = new Schema(
    {
        name: String,
        phone: { type: String, set: encrypt, get: decrypt },
        email: { type: String, set: encrypt, get: decrypt },
    },

    {
        versionKey: false,
    }
);

const User = mongoose.model('users', userSchema, 'users');
module.exports = User;

We also need to add some options to the schema, which tell Mongoose to apply the getters and setters every time we run a query or serialize a document.

const mongoose = require('mongoose');
const Schema = mongoose.Schema;
const { encrypt, decrypt } = require('./cipher');

const userSchema = new Schema(
    {
        name: String,
        phone: { type: String, set: encrypt, get: decrypt },
        email: { type: String, set: encrypt, get: decrypt },
    },

    {
        versionKey: false,

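        // Following options will enable us to use getters and setters on almost all queries
        toObject: { getters: true, setters: true },
        toJSON: { getters: true, setters: true },
        // Note: Mongoose 5 and later run setters on queries by default and no
        // longer use this option; it only matters on Mongoose 4.x
        runSettersOnQuery: true,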
    }
);

const User = mongoose.model('users', userSchema, 'users');
module.exports = User;

Writing a document

const user = new User({
    name: 'Test User',
    email: 'sample@example.com',
    phone: '9999999999',
});
user.save();

The encrypted values of phone and email are stored in the DB.

Fetching the Document

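User.findOne({ email: "sample@example.com" });

Notice that we did not have to search for the email by its encrypted value. The setter also runs on the query filter, which is taken care of by the runSettersOnQuery parameter passed in the schema. This only works if encrypt returns the same ciphertext every time for the same input.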

Problem with this approach

  • As you might have guessed, data is encrypted or decrypted only when it goes through the getters and setters of the Mongoose model.
  • This does not happen in 2 cases
    • find queries with lean
    • aggregation queries
  • In both cases, the JSON data is returned directly from MongoDB without being converted into a Mongoose document, and hence the getter function is not executed.

lean()

User.findOne({ email: "sample@example.com" }).lean();

The data is not decrypted when lean is used.

aggregate()

User.aggregate([
    {
        $match: {
            email: 'sample@example.com',
        },
    },
    {
        $project: {
            name: 1,
            phone: 1,
        },
    },
]);

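The result of the above query is empty. Aggregation pipelines bypass the Mongoose setters, so $match compares the plaintext email against the encrypted value stored in the DB and finds nothing.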

Solution 

lean()

We need to make sure that the getter function defined in the schema is called every time we use lean.

The npm package mongoose-lean-getters can be used to achieve this.

The package is registered as a plugin on the schema, and the option getters: true needs to be passed to lean to invoke it.

The plugin is used as follows:

const mongoose = require('mongoose');
const Schema = mongoose.Schema;
const { encrypt, decrypt } = require('./cipher');

// Adding the package
const mongooseLeanGetter = require('mongoose-lean-getters');


const userSchema = new Schema(
    {
        name: String,
        phone: { type: String, set: encrypt, get: decrypt },
        email: { type: String, set: encrypt, get: decrypt },
    },

    {
        versionKey: false,
        toObject: { getters: true, setters: true },
        toJSON: { getters: true, setters: true },
        runSettersOnQuery: true,
    }
);

// Using the package
userSchema.plugin(mongooseLeanGetter);

const User = mongoose.model('users', userSchema, 'users');
module.exports = User;

A query to the collection will look like the following

User.find({ email: "sample@example.com" }).lean({ 
    getters: true 
});

After using mongoose-lean-getters, the data is decrypted.

aggregate()

For aggregation queries, we need to manually encrypt or decrypt the data at 2 points

  • Entry point
  • After getting the result

Entry Point

If the aggregation pipeline has a filter stage that matches on an encrypted field, we need to encrypt the value first and then search with the encrypted value. For the email, this looks as follows

User.aggregate([
    {
        $match: {
            email: 'f373a715d2b545f3f78422f64293539:0431e1ceb0c9373a9e30313c55306c6e2cdf32f5bc1b00b686c468f50fdd2a81',
        },
    },
    {
        $project: {
            name: 1,
            phone: 1,
        },
    },
]);
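Rather than pasting a ciphertext into the pipeline by hand, the match value can be encrypted at the point where the pipeline is built. The helper below is a sketch of that idea; `buildUserByEmailPipeline` is a hypothetical name, and `encrypt` is passed in as a parameter (in the application it would be the function exported by `./cipher`).

```javascript
// Builds the same pipeline as above, encrypting the email before matching.
// Hypothetical helper; encrypt must be deterministic for $match to find anything.
function buildUserByEmailPipeline(email, encrypt) {
    return [
        { $match: { email: encrypt(email) } },
        { $project: { name: 1, phone: 1 } },
    ];
}

// Usage in the application (assumes the User model and ./cipher from this post):
// const { encrypt } = require('./cipher');
// User.aggregate(buildUserByEmailPipeline('sample@example.com', encrypt));
```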

After getting the result

We need to manually decrypt the data returned by the above query. Since no getter runs on aggregation results, phone comes back still encrypted.
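A sketch of this manual decryption step for flat results is shown below. `decryptResults` is a hypothetical helper, not part of Mongoose. The caller lists the result fields that hold encrypted values, which also covers fields that were projected under a different name.

```javascript
// Decrypts the listed top-level fields in each aggregation result.
// decrypt is passed in; in the application it would come from ./cipher.
function decryptResults(docs, encryptedFields, decrypt) {
    return docs.map((doc) => {
        const out = { ...doc }; // shallow copy, the input is left untouched
        for (const field of encryptedFields) {
            if (typeof out[field] === 'string') {
                out[field] = decrypt(out[field]);
            }
        }
        return out;
    });
}

// Usage in the application (assumes the User model and pipeline from this post):
// const rows = await User.aggregate(pipeline);
// const users = decryptResults(rows, ['phone'], decrypt);
```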

Caveats

  • Aggregation queries need to be handled separately at each instance.
  • The reasons are
    • When we use aggregation, we might project a field under a name other than the one defined in the schema, so the encrypted fields cannot be identified from the schema alone
    • If there is a deeply nested array in the result, we need to recursively traverse it and check every field that needs to be decrypted. This traversal causes a significant performance hit
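The recursive traversal described above could be sketched as follows. `decryptDeep` is a hypothetical helper; it visits every element of every nested array and object, which is where the extra cost comes from.

```javascript
// Recursively walks a result and decrypts every string stored under one of
// the given field names. encryptedFields is a Set of result field names.
function decryptDeep(value, encryptedFields, decrypt) {
    if (Array.isArray(value)) {
        return value.map((item) => decryptDeep(item, encryptedFields, decrypt));
    }
    if (value === null || typeof value !== 'object') return value;

    // Only descend into plain objects; Date, Buffer, ObjectId etc. are kept as-is
    const proto = Object.getPrototypeOf(value);
    if (proto !== Object.prototype && proto !== null) return value;

    const out = {};
    for (const [key, child] of Object.entries(value)) {
        out[key] = encryptedFields.has(key) && typeof child === 'string'
            ? decrypt(child)
            : decryptDeep(child, encryptedFields, decrypt);
    }
    return out;
}
```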

Benchmarking

  • Process
    • Query –
      User.findOne({ email: "sample@example.com" })
      .lean()
    • Used npm package autocannon
    • Created API to perform the query
    • Hit the API from 5 nodes, 100 requests per node
  • Git repo for this can be found here
[Figure: autocannon results without encryption]
[Figure: autocannon results with encryption, using the mongoose-lean-getters plugin]
  • Performance hit at
    • 99th percentile = ~7%
    • Avg = ~12%

Conclusion

  • Automatic FLE in MongoDB is only available in Enterprise Server with version 4.2 or higher
  • In the community server of MongoDB, FLE needs to be implemented at the application level
    When using Node JS and the Mongoose ODM, this can be achieved by using
    • Mongoose getters and setters
    • mongoose-lean-getters
  • Aggregation queries need to be handled separately at every instance
  • There is an expected performance hit when we introduce FLE in the application
