Configuration

The application reads most of its settings from environment variables. A small number of settings also live in database tables.

Environment variables

These settings are defined by the ichnaea.conf.AppComponent component.

Configuration summary:

Setting                     Parser                             Required?
LOCAL_DEV_ENV               bool                               No
TESTING                     bool                               No
LOGGING_LEVEL               ichnaea.conf.logging_level_parser  No
ASSET_BUCKET                str                                No
ASSET_URL                   str                                No
DB_READONLY_URI             str                                Yes
DB_READWRITE_URI            str                                Yes
SENTRY_DSN                  str                                No
SENTRY_ENVIRONMENT          str                                No
STATSD_HOST                 str                                No
STATSD_PORT                 int                                No
REDIS_URI                   str                                Yes
CELERY_WORKER_CONCURRENCY   int                                Yes
MAPBOX_TOKEN                str                                No
GEOIP_PATH                  str                                No
SECRET_KEY                  str                                No

Configuration options:

LOCAL_DEV_ENV
    Parser: bool
    Default: "false"
    Required: No

    Whether we are (True) or are not (False) in a local dev environment. There are some things that get configured one way in a developer's environment and another way in a server environment.

TESTING
    Parser: bool
    Default: "false"
    Required: No

    Whether or not we are running tests.

LOGGING_LEVEL
    Parser: ichnaea.conf.logging_level_parser
    Default: "INFO"
    Required: No

    Logging level to use. One of CRITICAL, ERROR, WARNING, INFO, or DEBUG.

ASSET_BUCKET
    Parser: str
    Default: ""
    Required: No

    Name of the AWS S3 bucket used to store map tile image assets and export downloads.

ASSET_URL
    Parser: str
    Default: ""
    Required: No

    URL for map tile image assets and export downloads.

DB_READONLY_URI
    Parser: str
    Required: Yes

    URI for the read-only database: mysql+pymysql://USER:PASSWORD@HOST:PORT/NAME

DB_READWRITE_URI
    Parser: str
    Required: Yes

    URI for the read-write database: mysql+pymysql://USER:PASSWORD@HOST:PORT/NAME

SENTRY_DSN
    Parser: str
    Default: ""
    Required: No

    Sentry DSN; leave blank to disable Sentry error reporting.

SENTRY_ENVIRONMENT
    Parser: str
    Default: ""
    Required: No

    Sentry environment.

STATSD_HOST
    Parser: str
    Default: ""
    Required: No

    StatsD host; leave blank to disable StatsD.

STATSD_PORT
    Parser: int
    Default: "8125"
    Required: No

    StatsD port.

REDIS_URI
    Parser: str
    Required: Yes

    URI for Redis: redis://HOST:PORT/DB

CELERY_WORKER_CONCURRENCY
    Parser: int
    Required: Yes

    The number of concurrent Celery worker processes executing tasks.

MAPBOX_TOKEN
    Parser: str
    Default: ""
    Required: No

    Mapbox API key; if you do not provide this, parts of the site showing maps will be disabled.

GEOIP_PATH
    Parser: str
    Default: "/home/docs/checkouts/readthedocs.org/user_builds/ichnaea/checkouts/latest/ichnaea/tests/data/GeoIP2-City-Test.mmdb"
    Required: No

    Absolute path to the mmdb file used for GeoIP lookups.

SECRET_KEY
    Parser: str
    Default: "default for development, change in production"
    Required: No

    A unique passphrase used for cryptographic signing.
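
For reference, a minimal environment for a local setup might look like the following. All host names, credentials, and the database name are placeholders, not project defaults:

# Required settings
DB_READONLY_URI=mysql+pymysql://ro_user:password@localhost:3306/location
DB_READWRITE_URI=mysql+pymysql://rw_user:password@localhost:3306/location
REDIS_URI=redis://localhost:6379/0
CELERY_WORKER_CONCURRENCY=4

# Optional settings; blank values disable the integration
SENTRY_DSN=
STATSD_HOST=
MAPBOX_TOKEN=
GEOIP_PATH=/path/to/GeoLite2-City.mmdb
SECRET_KEY=change-me-in-production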

Alembic requires an additional item in the environment:

# URI for user with ddl access
SQLALCHEMY_URL=mysql+pymysql://USER:PASSWORD@HOST:PORT/DBNAME
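
With that variable set, migrations are applied with the standard alembic command line; this is a sketch, and the exact invocation may differ in your deployment:

# Apply all pending schema migrations (credentials are placeholders)
SQLALCHEMY_URL=mysql+pymysql://admin:password@localhost:3306/location alembic upgrade head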

The webapp uses gunicorn, which has its own configuration:

# Port for gunicorn to listen on
GUNICORN_PORT=${GUNICORN_PORT:-"8000"}

# Number of gunicorn workers to spin off--should be one per cpu
GUNICORN_WORKERS=${GUNICORN_WORKERS:-"1"}

# Gunicorn worker class--use our gevent worker
GUNICORN_WORKER_CLASS=${GUNICORN_WORKER_CLASS:-"ichnaea.webapp.worker.LocationGeventWorker"}

# Number of simultaneous greenlets per worker
GUNICORN_WORKER_CONNECTIONS=${GUNICORN_WORKER_CONNECTIONS:-"4"}

# Number of requests to handle before retiring worker
GUNICORN_MAX_REQUESTS=${GUNICORN_MAX_REQUESTS:-"10000"}

# Jitter to add/subtract from number of requests to prevent stampede
# of retiring
GUNICORN_MAX_REQUESTS_JITTER=${GUNICORN_MAX_REQUESTS_JITTER:-"1000"}

# Timeout for handling a request
GUNICORN_TIMEOUT=${GUNICORN_TIMEOUT:-"60"}

# Python log level for gunicorn logging output: debug, info, warning,
# error, critical
GUNICORN_LOGLEVEL=${GUNICORN_LOGLEVEL:-"info"}

Database

A MySQL-compatible database is used for storing configuration and application data.

The webapp service requires a read-only connection.

The celery worker service requires a read-write connection.

Both accounts can be restricted to DML (data-manipulation) permissions only, as neither needs DDL (data-definition) rights.

DDL changes are done using the alembic database migration system.
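
Since neither service needs DDL rights, the two database accounts can be locked down to DML only. A sketch in MySQL, with hypothetical user and database names:

-- Read-only account for the webapp service
GRANT SELECT ON location.* TO 'ro_user'@'%';

-- Read-write, DML-only account for the celery worker service
GRANT SELECT, INSERT, UPDATE, DELETE ON location.* TO 'rw_user'@'%';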

GeoIP

The web and worker roles need access to a MaxMind GeoIP City database in version 2 format. Both GeoLite and commercial databases will work.

Redis

The Redis cache is used as a:

  • classic cache by the web role

  • backend to store rate-limiting counters

  • custom queuing backend and the Celery worker queuing backend

Sentry

All roles and command line scripts use an optional Sentry server to log application exception data. Set SENTRY_DSN to a Sentry DSN to enable error reporting, or leave it as '' to disable it.

StatsD

All roles and command line scripts use an optional StatsD service to log application-specific metrics. The StatsD service needs to support metric tags.

The project uses a lot of metrics as further detailed in the metrics documentation.

All metrics are prefixed with a location namespace.

Map tile and download assets

The application can optionally generate image tiles for a data map and public export files available via the downloads section of the website.

These assets are stored in a static file repository (Amazon S3) and made available via an HTTPS frontend (Amazon CloudFront).

Set ASSET_BUCKET and ASSET_URL accordingly.
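
For example, with a hypothetical bucket and CDN frontend:

# Placeholder values; use your own bucket name and CDN domain
ASSET_BUCKET=my-ichnaea-assets
ASSET_URL=https://assets.example.com/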

To access the ASSET_BUCKET, authorized AWS credentials are needed inside the Docker image. See the Boto3 credentials documentation for details.

The development environment defaults to serving map tiles from the web server, and not serving public export files for download.

Mapbox

The web site content uses Mapbox to display a world map. In order to do this, it requires a Mapbox API token. Without a token, the map is not displayed.

You can create an account on their site: https://www.mapbox.com

After you have an account, you can create an API token at: https://account.mapbox.com

Set the MAPBOX_TOKEN configuration value to your API token.

Configuration in the database

API Keys

The project requires API keys to access the locate APIs.

API keys can be any string of up to 40 characters, though random UUID4s in hex representation are commonly used, for example 329694ac-a337-4856-af30-66162bc8187a.
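
As a sketch, adding a key comes down to inserting a row into the api_key table; the valid_key column name is an assumption here, so verify it against your schema:

INSERT INTO api_key
(`valid_key`) VALUES ("329694ac-a337-4856-af30-66162bc8187a");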

Fallback

You can also enable a fallback location provider on a per API key basis. This allows you to send queries from this API key to an external service if Ichnaea can’t provide a good-enough result.

In order to configure this fallback mode, you need to set the fallback_* columns. For example:

fallback_name: mozilla
fallback_schema: ichnaea/v1
fallback_url: https://location.services.mozilla.com/v1/geolocate?key=some_key
fallback_ratelimit: 10
fallback_ratelimit_interval: 60
fallback_cache_expire: 86400

The name can be shared between multiple API keys and acts as a partition key for the cache and rate limit tracking.

The schema can be one of NULL, ichnaea/v1, combain/v1, googlemaps/v1 or unwiredlabs/v1.

NULL and ichnaea/v1 are currently synonymous. Setting the schema to one of those means the external service uses the same API as the geolocate v1 API used in Ichnaea.

If you set the url to one of the unwiredlabs endpoints, add your API token as an anchor fragment to the end of it, so instead of:

https://us1.unwiredlabs.com/v2/process.php

you would instead use:

https://us1.unwiredlabs.com/v2/process.php#my_secret_token

The code will read the token from here and put it into the request body.

Note that external services will have different terms regarding caching, data collection, and rate limiting.

If the external service allows caching their responses on an intermediate service, the cache_expire setting can be used to specify the number of seconds the responses should be cached. This can avoid repeated calls to the external service for the same queries.

The rate limit settings specify how many requests may be sent to the external service per time interval: fallback_ratelimit requests every fallback_ratelimit_interval seconds. In the example above, that is 10 requests per 60 seconds.
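
Since the fallback_* columns live on the API key row, the example above corresponds to an update along these lines (the api_key table and valid_key column are assumptions, and the key is the example one from above):

UPDATE api_key
SET fallback_name = "mozilla",
    fallback_schema = "ichnaea/v1",
    fallback_url = "https://location.services.mozilla.com/v1/geolocate?key=some_key",
    fallback_ratelimit = 10,
    fallback_ratelimit_interval = 60,
    fallback_cache_expire = 86400
WHERE valid_key = "329694ac-a337-4856-af30-66162bc8187a";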

Export Configuration

Ichnaea supports exporting position data that it gets via the APIs to different export targets. This configuration lives in the export_config database table.

Currently three different kinds of backends are supported:

  • s3: Amazon S3 buckets

  • internal: Ichnaea's internal data processing pipeline, which creates/updates position data using new position information

  • geosubmit: submitting position information to an HTTP POST endpoint in geosubmit v2 format

The type of the target is determined by the schema column of each entry.

All export targets can be configured with a batch setting that determines how many reports have to be available before data is submitted to the backend.

All export targets also have a skip_keys setting, a set of API keys; data submitted using one of these API keys will not be exported to the target.

There can be multiple instances of the bucket and HTTP POST export targets in export_config, but only one instance of the internal export.

Here’s the SQL for setting up an “internal” export target:

INSERT INTO export_config
(`name`, `batch`, `schema`) VALUES ("internal test", 1, "internal");

For a production setup you want to set the batch column to something like 100 or 1000 for better efficiency. For initial testing it's easier to set it to 1, so any incoming data is processed immediately.

S3 Bucket Export (s3)

The schema column must be set to s3.

The S3 bucket export target combines reports into a gzipped JSON file and uploads it to the specified bucket url, for example:

s3://amazon_s3_bucket_name/directory/{source}{api_key}/{year}/{month}/{day}

The url can contain any level of additional static directories under the bucket root. The {source}, {api_key}, {year}, {month} and {day} parts are dynamically replaced by the source of the report (e.g. gnss), the api_key used to upload the data, and the date when the backup took place. The files use a random UUID4 as the filename.

An example filename might be:

/directory/test/2015/07/15/554d8d3c-5b28-48bb-9aa8-196543235cf2.json.gz
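
Configuring an s3 target follows the same pattern as the internal one shown earlier; storing the template in a url column is an assumption based on the descriptions above:

INSERT INTO export_config
(`name`, `batch`, `schema`, `url`) VALUES
("s3 backup", 100, "s3",
 "s3://amazon_s3_bucket_name/directory/{source}{api_key}/{year}/{month}/{day}");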

Internal Export (internal)

The schema column must be set to internal.

The internal export target forwards the incoming data into the internal data pipeline.

HTTP Export (geosubmit)

The schema column must be set to geosubmit.

The HTTP export target buffers incoming data into batches of batch size and then submits them, using the geosubmit v2 API (/v2/geosubmit), to the specified url endpoint.

If the project is taking in data from a partner in a data exchange, the skip_keys setting can be used to prevent data being round tripped and sent back to the same partner that it came from.
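
A geosubmit target is configured the same way; the url column is again an assumption, and skip_keys can additionally be set to exclude the partner's own API keys:

INSERT INTO export_config
(`name`, `batch`, `schema`, `url`) VALUES
("partner export", 100, "geosubmit", "https://partner.example.com/v2/geosubmit?key=some_key");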