Configuring Apache Superset 3 in a production environment

October 11, 2023
superset

Apache Superset is an excellent Business Intelligence tool: it is open source, feature-rich (a wide range of charts, connectors for many database management systems, and more), and it holds its own against well-known commercial alternatives such as Tableau, Power BI, and BusinessObjects.

Unfortunately, excellent does not mean flawless, particularly when it comes to documentation, which can feel rather sparse.

To help fill that gap, this post walks through how to quickly set up a Superset instance suited to production.

Prerequisites

Installation

Cloning the superset repo

Start by cloning the repository directly on your production server (yes, this is not common):

cd $HOME
git clone https://github.com/apache/superset.git
cd superset

Then select the desired version via its tag.

git checkout tags/3.0.0
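Before editing anything, it can be worth confirming that the working tree really sits on the tag you expect. The helper below is a minimal sketch: `on_tag` is a hypothetical name, and it relies only on `git tag --points-at`, which lists every tag attached to the current commit.

```shell
#!/bin/sh
# Succeed only if the repository in the current directory is checked out
# at a commit carrying the tag given as $1.
# `on_tag` is an illustrative helper, not part of Superset or git.
on_tag() {
    git tag --points-at HEAD 2>/dev/null | grep -qxF -- "$1"
}
```

For example, `on_tag 3.0.0 && echo "on 3.0.0"` run from the superset directory prints the message only when the checkout matches.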

Editing configuration files

Now comes the step of customizing certain configuration files. Since you are on a git tag, you won’t be able to commit your modifications as is, so you have the choice of:

It’s up to you.

The docker/.env-non-dev file (read: non-dev means prod) lets you define a set of environment variables that will be used by the Docker containers we will start later.

Add the following:

# We don't want demo data in production
SUPERSET_LOAD_EXAMPLES=no
# A random string used to sign session cookies (generate your own, do not reuse this one)
SUPERSET_SECRET_KEY=4Sido8BkIjs54Vz2XyVD5GJIvANVIAT399dRESjdmr4vm92n
# To prevent XSS attacks (among other things)
TALISMAN_ENABLED=yes
# Number of workers: the higher the value, the fewer intermittent chart refresh failures you will have in your dashboards (adjust according to your server's power).
SERVER_WORKER_AMOUNT=64
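The value of SUPERSET_SECRET_KEY should be unique to your installation rather than copied from a tutorial. One way to produce such a value is sketched below; `generate_secret_key` is a hypothetical helper name, and the fallback branch assumes `head` and `base64` are available on your server.

```shell
#!/bin/sh
# Print a fresh random value suitable for SUPERSET_SECRET_KEY.
# Prefers openssl; falls back to /dev/urandom piped through base64
# (an assumption about which tools exist on your server).
generate_secret_key() {
    if command -v openssl >/dev/null 2>&1; then
        openssl rand -base64 42
    else
        head -c 42 /dev/urandom | base64
    fi
}
```

Running `echo "SUPERSET_SECRET_KEY=$(generate_secret_key)"` prints a line you can paste into docker/.env-non-dev in place of the sample value.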

Also, make some adjustments in docker/pythonpath_dev/superset_config.py to enable alerting and the template engine (necessary for creating datasets with dynamic filtering):

FEATURE_FLAGS = {"ALERT_REPORTS": True, "ENABLE_TEMPLATE_PROCESSING": True}

Disable telemetry by replacing the following line in docker-compose-non-dev.yml:

x-superset-image: &superset-image apachesuperset.docker.scarf.sh/apache/superset:${TAG:-latest-dev}

with

x-superset-image: &superset-image apache/superset:${TAG:-latest-dev}

And switch to the latest stable version of PostgreSQL by replacing:

     image: postgres:14

with

     image: postgres:16
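If you prefer to script these two replacements rather than edit the file by hand, something along the following lines should work. It is only a sketch: `patch_compose` is a hypothetical helper, and it assumes the two lines appear in the file exactly as quoted above.

```shell
#!/bin/sh
# Apply both edits to the compose file whose path is passed as $1:
#   1. pull the image from Docker Hub instead of the scarf.sh gateway
#   2. switch the database image from postgres:14 to postgres:16
# `patch_compose` is an illustrative helper, not part of Superset.
patch_compose() {
    file="$1"
    tmp="$(mktemp)"
    # Write to a temporary file first, then copy the content back so the
    # original file keeps its permissions.
    sed -e 's|apachesuperset\.docker\.scarf\.sh/apache/superset|apache/superset|' \
        -e 's|image: postgres:14$|image: postgres:16|' \
        "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

After running `patch_compose docker-compose-non-dev.yml`, a quick `git diff` shows exactly which lines changed.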

Startup

Instantiate and start the docker containers:

docker compose -f docker-compose-non-dev.yml up -d

Superset is accessible on your production server via http://127.0.0.1:8088.
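The containers can take a little while to become ready on first start. A small polling loop against Superset's /health endpoint is one way to wait for that; the sketch below assumes `curl` is installed, and `wait_for_superset` along with its default attempt count and delay are illustrative choices.

```shell
#!/bin/sh
# Poll a URL until it answers successfully or the attempts run out.
# Arguments: URL, number of attempts (default 30), delay in seconds (default 2).
# `wait_for_superset` is an illustrative helper, not part of Superset.
wait_for_superset() {
    url="$1"
    tries="${2:-30}"
    delay="${3:-2}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -f makes curl fail on HTTP errors, -sS keeps it quiet except for errors
        if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For instance, `wait_for_superset http://127.0.0.1:8088/health && echo "Superset is up"` returns as soon as the application responds.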

Reverse proxy

Configure a reverse proxy to secure the connection, for example with Caddy.

Edit /etc/caddy/Caddyfile:

bi.myawesomecompany.com {
        reverse_proxy http://127.0.0.1:8088
}

Then restart Caddy:

sudo service caddy restart

First Login

Log in at https://bi.myawesomecompany.com with the username / password: admin / admin

Login page

Change this default password immediately.

Backup and Restoration

When editing the docker-compose-non-dev.yml configuration file, you may have noticed that a PostgreSQL database is instantiated.

Backing up and restoring Superset therefore only involves this database.

You can perform a hot backup with a simple command:

docker exec superset_db pg_dump -U superset superset | xz > backup.sql.xz
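A pipeline like this reports only the exit status of its last command, so a failed dump can still leave behind a valid but empty archive. The sketch below guards against that case; `compress_and_verify` is a hypothetical helper that reads the dump on standard input, and the `.partial` suffix is an arbitrary choice.

```shell
#!/bin/sh
# Compress SQL read from stdin into the file named by $1, then verify it.
# The archive is written under a temporary name first, so a failed run
# never overwrites a previous good backup.
# `compress_and_verify` is an illustrative helper, not part of Superset.
compress_and_verify() {
    out="$1"
    tmp="$out.partial"
    # Keep the archive only if xz succeeded, the archive passes its
    # integrity test, and it decompresses to at least one byte.
    if xz > "$tmp" && xz -t "$tmp" \
        && [ "$(xz -dc "$tmp" | head -c 1 | wc -c)" -gt 0 ]; then
        mv "$tmp" "$out"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

It would be used by piping the dump into it, for example `docker exec superset_db pg_dump -U superset superset | compress_and_verify backup.sql.xz`.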

To restore, first start only the PostgreSQL container, so that no Superset instance is connected to the database while it is being restored:

docker compose -f docker-compose-non-dev.yml down
docker compose -f docker-compose-non-dev.yml up -d db
docker exec superset_db dropdb -U superset superset
docker exec superset_db createdb -U superset superset
xz -dc backup.sql.xz | docker exec -i superset_db psql -U superset -d superset
docker compose -f docker-compose-non-dev.yml up -d