Encrypting Data At Rest

Encryption Levels

Version 2.0 to 2.3. Copyright © 2019 TigerGraph. All Rights Reserved.

The TigerGraph graph data store uses a proprietary encoding scheme which both compresses the data and obscures it from anyone who does not know the encoding/decoding scheme. In addition, the TigerGraph system supports integration with industry-standard methods for encrypting data when it is stored on disk ("data at rest").

Data-at-rest encryption can be applied at many different levels. A user can choose to use one or more levels.

Kernel-Level Encryption

File system encryption employs advanced encryption algorithms, and some tools let the user select from a menu of algorithms. Encryption can run in either kernel mode or user mode. Running in kernel mode requires superuser permission.

Since Linux 2.6, the device mapper has provided a generic way to create virtual layers of block devices, including transparent encryption of block devices using the kernel crypto API.

In Ubuntu, full-disk encryption is an option during the OS installation process. For other Linux distributions, the disk can be encrypted with dm-crypt.

Another commonly used utility is eCryptfs, which is licensed under the GPL and is built into the kernels of some distributions, such as Ubuntu.
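On a running system, one quick way to check whether any block device is already mapped through dm-crypt is to filter lsblk output for the "crypt" device type. The sketch below is illustrative only; the function name list_crypt_mappings is hypothetical, and it reads its input from stdin so that it can be tried without any encrypted device present.

```shell
# Hypothetical helper: read "NAME TYPE" pairs (the format printed by
# `lsblk -rno NAME,TYPE`) on stdin and print the names whose type is "crypt",
# which is how lsblk labels dm-crypt mappings.
list_crypt_mappings() {
  awk '$2 == "crypt" { print $1 }'
}

# Typical use on a live system:
#   lsblk -rno NAME,TYPE | list_crypt_mappings
```

An empty result simply means that no dm-crypt mapping is currently open; it says nothing about eCryptfs or FUSE-based encryption.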

User-Level Encryption

If root privilege is not available, a workaround is to use FUSE (Filesystem in Userspace) to create a user-level filesystem running on top of the host operating system. While the performance may not be as good as running in kernel mode, there are more options available for customization and tuning.

Example 1: Kernel-mode file system encryption with dm-crypt

In this example, we use dm-crypt to provide kernel-mode file system encryption. The dm-crypt utility is widely available and offers a choice of encryption algorithms. It can also be set to encrypt various units of storage: full disks, partitions, logical volumes, or files.

The basic idea of this solution is to create a file, map an encrypted file system to it, and mount it as a storage directory for TigerGraph, granting read/write permission only to authorized users.

Prerequisites

Before you start, you will need a Linux machine on which

  • you have root permission,

  • the TigerGraph system has not yet been installed,

  • and you have sufficient disk space for the TigerGraph data you wish to encrypt. This may be on your local disk or on a separate disk you have mounted.
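The disk-space prerequisite can be checked before any work starts. The following is a minimal sketch, assuming a POSIX df is available; the function name has_free_space and the example size are hypothetical and not part of the TigerGraph tooling.

```shell
# Hypothetical preflight helper: succeed if the file system that holds DIR has
# at least REQUIRED_KB kilobytes available.
has_free_space() {
  local dir=$1 required_kb=$2 avail_kb
  # -P forces one output line per file system, -k reports 1 KB blocks;
  # the fourth column of the second line is the available space.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ "$avail_kb" -ge "$required_kb" ]
}

# Example: require roughly 30 GB under /home before creating the encrypted file.
#   has_free_space /home 31457280 || echo "not enough free space" >&2
```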

Instructions

  • Install cryptsetup (cryptsetup is included with Ubuntu, but other OS users may need to install it with yum).

  • Install the TigerGraph system.

  • Grant sudo privilege to the TigerGraph OS user.

  • Stop all TigerGraph services with the following commands:

        gadmin stop -y
        gadmin stop admin -y

  • Acting as the tigergraph OS user, run the following export commands to set variables. Replace the placeholders enclosed in angle brackets <...> with the values of your choice:

  • Create a file for TigerGraph data storage.

  • Change the permission of the file so that only the owner of the file (that is, only the tigergraph user who created the file in the previous step) will be able to access it:

  • Associate a loopback device with the file:

  • Encrypt storage in the device. cryptsetup will use the Linux device mapper to create, in this case, $encrypted_file_path. Initialize the volume and, when prompted, enter the password that you assigned to $encryption_password:

If you are automating the process with a script running in a root TTY session, you may use the following command:

  • Open the partition and create a mapping to $encrypted_file_path:

If you are automating the process with a script running in a root TTY session, you may use the following command:

  • Clear the password from bash variables and bash history.

  • Create a file system and verify its status:

  • Mount the new file system to /mnt/secretfs:

  • Change the permission to 700 so that only $db_user has access to the file system:

  • Move the original TigerGraph files to the encrypted filesystem and make a symbolic link. If you wish to encrypt only the TigerGraph data store (called gstore), use the following commands:
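As an illustration of this move-and-link step, here is a minimal sketch. It assumes that the TigerGraph services are stopped and that the encrypted file system is already mounted at /mnt/secretfs; the function name relocate_with_symlink is hypothetical.

```shell
# Hypothetical helper: move directory SRC into TARGET_DIR and leave a symbolic
# link at the old location so that existing paths keep working.
relocate_with_symlink() {
  local src=$1 target_dir=$2 name
  name=$(basename "$src")
  # Refuse to act on anything that is not a real directory (for example, a
  # path that was already relocated and is now a symlink).
  if [ ! -d "$src" ] || [ -L "$src" ]; then
    echo "not a plain directory: $src" >&2
    return 1
  fi
  if [ -e "$target_dir/$name" ]; then
    echo "already exists: $target_dir/$name" >&2
    return 1
  fi
  mv "$src" "$target_dir/$name" && ln -s "$target_dir/$name" "$src"
}

# Example, using the variable set earlier in these instructions:
#   relocate_with_symlink "$tigergraph_root/gstore" /mnt/secretfs
```

Because mv copies and then deletes when the source and target are on different file systems, the move can take a while for a large data store.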

There are other TigerGraph files which you might also consider to be sensitive and wish to encrypt. These include the dictionary, kafka data files, and log files. You could selectively identify files to protect or you could encrypt the entire TigerGraph folder. In this case, simply move $tigergraph_root instead of $tigergraph_root/gstore.

TigerGraph data is now stored in an encrypted filesystem. It is decrypted automatically when the tigergraph user (and only this user) accesses it.
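For reference, the volume-preparation steps from the Instructions list (creating the file through setting permissions on the mount point) might look roughly like the sketch below. It is not the original command listing. The variable names db_user, encrypted_file_path, and encryption_password come from the text; the loop device /dev/loop0, the mapping name secretfs, the 10G size, the ext4 file system, and the chown step are assumptions. By default the script only prints the commands (DRY_RUN=1), so it can be reviewed safely before anything is executed.

```shell
# Print commands by default; set DRY_RUN=0 to execute them for real.
DRY_RUN=${DRY_RUN:-1}

# Variables named in the text; the default values are placeholders.
db_user=${db_user:-tigergraph}
encrypted_file_path=${encrypted_file_path:-/home/tigergraph/secretfs.img}

# Assumptions that the text does not specify.
loop_dev=${loop_dev:-/dev/loop0}
mapping=${mapping:-secretfs}
volume_size=${volume_size:-10G}
mount_point=${mount_point:-/mnt/secretfs}

run() {
  if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

setup_encrypted_volume() {
  run truncate -s "$volume_size" "$encrypted_file_path" &&  # sparse backing file
  run chmod 600 "$encrypted_file_path" &&                   # owner-only access
  run sudo losetup "$loop_dev" "$encrypted_file_path" &&    # attach a loop device
  run sudo cryptsetup -y luksFormat "$loop_dev" &&          # prompts for the password
  run sudo cryptsetup luksOpen "$loop_dev" "$mapping" &&    # creates /dev/mapper/$mapping
  run unset encryption_password &&                          # clear the password variable
  run history -c &&                                         # clear shell history
  run sudo mkfs.ext4 "/dev/mapper/$mapping" &&
  run sudo cryptsetup status "$mapping" &&                  # verify the mapping
  run sudo mkdir -p "$mount_point" &&
  run sudo mount "/dev/mapper/$mapping" "$mount_point" &&
  run sudo chown "$db_user" "$mount_point" &&               # assumption: hand the mount to $db_user
  run sudo chmod 700 "$mount_point"
}

# Non-interactive variant of the two cryptsetup calls (an assumption, for scripts):
#   printf '%s' "$encryption_password" | sudo cryptsetup -q luksFormat --key-file - "$loop_dev"
#   printf '%s' "$encryption_password" | sudo cryptsetup luksOpen --key-file - "$loop_dev" "$mapping"

setup_encrypted_volume
```

Running the block as-is prints one "+ command" line per step. Check that /dev/loop0 is actually free (for example with losetup -f) before switching to DRY_RUN=0.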

To deploy this encryption solution automatically, you may:

  1. Chain all the steps into a bash script.

  2. Remove every "sudo", since the script will be running as root.

  3. Run the script as the root user after TigerGraph installation.
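Instead of maintaining a second copy of the commands with every "sudo" removed, one alternative is to compute the prefix at run time. The sketch below is only a suggestion under that assumption; the function name sudo_prefix is hypothetical.

```shell
# Hypothetical helper: print "sudo" for a non-root UID and nothing for root.
# The UID defaults to that of the current user.
sudo_prefix() {
  local uid=${1:-$(id -u)}
  if [ "$uid" -eq 0 ]; then echo ""; else echo "sudo"; fi
}

# Leave $SUDO unquoted when using it: it must expand to nothing when the
# script runs as root.
SUDO=$(sudo_prefix)
# Example:
#   $SUDO mount /dev/mapper/secretfs /mnt/secretfs
```

The same script can then be run interactively by the tigergraph user or unattended by root.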

Performance Evaluation

Encryption is usually CPU-bound rather than I/O-bound. If CPU usage remains below 100%, encryption should not cause much performance slowdown. A performance test using both small and large queries supports this prediction: for small (~1 sec) and large (~100 sec) queries, there is a ~5% slowdown due to filesystem encryption.

We used the TPC-H dataset with scale factor 10 (http://www.tpc.org/tpch/). The data size is 23GB after loading into TigerGraph.

The write test (data loading) was done by running a loading job and then killing the GPE with SIGTERM (to exit gracefully) to ensure that all kafka data is consumed.

The read test (GSE cold start) measures the time from "gadmin start gse" until "online" appears in "gadmin status gse".
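A cold-start measurement of this kind can be scripted by polling the status command. The following is a sketch of one possible approach, not the harness used for the figures above; the function name time_until_online and the timeout value are hypothetical, and the timer starts when the function is called.

```shell
# Hypothetical helper: run the given status command once per second until its
# output contains "online", then print the elapsed whole seconds.
# Returns 1 if TIMEOUT seconds pass first.
time_until_online() {
  local timeout=$1; shift
  local start now
  start=$(date +%s)
  while :; do
    if "$@" 2>/dev/null | grep -q "online"; then
      now=$(date +%s)
      echo $(( now - start ))
      return 0
    fi
    now=$(date +%s)
    if [ $(( now - start )) -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
  done
}

# Example on a TigerGraph machine, using the commands named in the text:
#   gadmin start gse &
#   time_until_online 600 gadmin status gse
```

The result has one-second resolution, which is adequate for comparing cold starts with and without encryption when the difference is several seconds or more.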

Example 2: Encrypting Data on Amazon EC2

Major cloud service providers often provide their own methodologies for encrypting data at rest. For Amazon EC2, we recommend that users start by reading the AWS Security Blog post "How to Protect Data at Rest with Amazon EC2 Instance Store Encryption".

In this section, we provide a simple example of configuring file system encryption for a TigerGraph system running on Amazon EC2. The steps are based on those given in "How to Protect Data at Rest with Amazon EC2 Instance Store Encryption", with some additions and modifications.

The basic idea of this solution is to create a file, map an encrypted file system to it, and mount it as a storage directory for TigerGraph with permission only to authorized users.

Prerequisites

Make sure you have installed and configured AWS CLI with keys locally.

Create an S3 Bucket

Configure IAM roles and permission for the S3 bucket
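The role only needs to read the encrypted password object from the bucket. A minimal sketch of such a policy is shown below, generated by a small shell function so that the bucket and object names can be substituted; the function name s3_read_policy and the example object name are hypothetical.

```shell
# Hypothetical generator: print an IAM policy document that allows reading a
# single object from an S3 bucket.
s3_read_policy() {
  local bucket=$1 object_key=$2
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": ["arn:aws:s3:::${bucket}/${object_key}"]
    }
  ]
}
EOF
}

# Example (the policy name is a placeholder of your choice):
#   s3_read_policy <your-bucket-name> LuksInternalStorageKey > policy.json
#   aws iam put-role-policy --role-name <your-role-name> \
#       --policy-name <your-policy-name> --policy-document file://policy.json
```

Restricting the resource to one object, rather than the whole bucket, keeps the role's permissions as narrow as the bootstrap step requires.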

Create a KMS Key (optional)

If you don't have a KMS key, create one first:

  1. From the IAM console, choose Encryption keys from the navigation pane.

  2. Select Create Key, and type in <your-key-alias>.

  3. In Step 4: Define Key Usage Permissions, select <your-role-name>.

  4. The role now has permission to use the key.

Encrypt a secret password with KMS and store it in the S3 bucket
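One way to do this with the AWS CLI is sketched below. It assumes that the CLI has a default region configured and that the password sits in a local file; the function name encrypt_and_upload_secret and the AWS variable (which lets the command be swapped out for a dry run) are hypothetical.

```shell
AWS=${AWS:-aws}   # overridable so the function can be exercised without AWS access

# Encrypt the contents of SECRET_FILE with the KMS key alias KEY_ALIAS and
# upload the ciphertext to s3://BUCKET/OBJECT_KEY.
encrypt_and_upload_secret() {
  local secret_file=$1 key_alias=$2 bucket=$3 object_key=$4 blob rc
  blob=$(mktemp) || return 1
  # fileb:// passes the file as raw bytes; CiphertextBlob is returned
  # base64-encoded, so decode it before storing.
  "$AWS" kms encrypt --key-id "alias/$key_alias" \
      --plaintext "fileb://$secret_file" \
      --query CiphertextBlob --output text | base64 --decode > "$blob"
  if [ ! -s "$blob" ]; then
    echo "encryption produced no output" >&2
    rm -f "$blob"
    return 1
  fi
  "$AWS" s3 cp "$blob" "s3://$bucket/$object_key" --sse aws:kms
  rc=$?
  rm -f "$blob"
  return $rc
}

# Example:
#   encrypt_and_upload_secret secret.txt <your-key-alias> <your-bucket-name> LuksInternalStorageKey
```

Delete the plaintext password file once the upload succeeds, since the point of this step is that only the KMS-encrypted copy persists.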

Configure EC2 with role and launch configurations

In this section, you launch a new EC2 instance with the new IAM role and a bootstrap script that executes the steps to encrypt the file system.

  1. In the EC2 console, launch a new instance (see this tutorial for more details). Choose Amazon Linux AMI 2017.09.1 (HVM), SSD Volume Type. (If you are NOT using the Amazon Linux AMI, a script that installs python, pip, and the AWS CLI needs to be added at the beginning.)

  2. In Step 3: Configure Instance Details:

    1. In IAM role, choose <your-role-name>.

    2. In User Data, paste the following code block after replacing the placeholders with your values and appending the TigerGraph installation script:
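The original code block is not reproduced here. As a rough sketch of what such a bootstrap script could contain, the function below prints a user-data script with the placeholders filled in. It is modeled on the approach in the AWS blog post, and everything specific in it is an assumption: the function name render_user_data, the file locations under /root, the loop device /dev/loop0, the ext4 file system, and the final ownership step.

```shell
# Hypothetical generator: print a user-data script with placeholders filled in.
render_user_data() {
  local bucket=$1 object_key=$2 region=$3 size_gb=$4
  cat <<EOF
#!/bin/bash
# Create a sparse file to host the file system and attach it to a loop device.
truncate -s ${size_gb}G /root/secretfs
chmod 600 /root/secretfs
losetup /dev/loop0 /root/secretfs
# Fetch the KMS-encrypted password from S3 and decrypt it.
aws s3 cp "s3://${bucket}/${object_key}" /root/luks_key.enc
luks_key=\$(aws --region ${region} kms decrypt --ciphertext-blob fileb:///root/luks_key.enc --output text --query Plaintext | base64 --decode)
# Format and open the LUKS volume, then discard the password.
printf '%s' "\$luks_key" | cryptsetup -q luksFormat --key-file - /dev/loop0
printf '%s' "\$luks_key" | cryptsetup luksOpen --key-file - /dev/loop0 secretfs
unset luks_key
rm -f /root/luks_key.enc
# Create and mount the file system.
mkfs.ext4 /dev/mapper/secretfs
mkdir -p /mnt/secretfs
mount /dev/mapper/secretfs /mnt/secretfs
# Append the TigerGraph installation script here. Once it has created the
# tigergraph OS user, restrict the mount point to that user:
#   chown tigergraph /mnt/secretfs && chmod 700 /mnt/secretfs
EOF
}

# Example:
#   render_user_data <your-bucket-name> LuksInternalStorageKey us-east-1 20 > user-data.sh
```

Generating the script locally makes it easy to review the substituted values before pasting the result into the User Data field.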

It may take a few minutes for the script to complete after system launch.

You should then be able to launch one or more EC2 machines, each with an encrypted folder under /mnt/secretfs that only the OS user tigergraph can access.

Performance

Encryption is usually CPU-bound rather than I/O-bound. If CPU usage is below 100%, TigerGraph tests show no significant performance degradation.
