Configuration¶

The application takes a number of different settings and reads them from environment variables. There are also a small number of settings inside database tables.

Environment variables
- Database
- GeoIP
- Redis
- Sentry
- StatsD
- Map tile and download assets
- Mapbox
Configuration in the database
- API Keys
  - Fallback
- Export Configuration

Environment variables ¶

component ichnaea.conf.AppComponent¶

Configuration summary:

Setting	Parser	Required?
`LOCAL_DEV_ENV`	bool
`TESTING`	bool
`LOGGING_LEVEL`	ichnaea.conf.logging_level_parser
`ASSET_BUCKET`	str
`ASSET_URL`	str
`DB_READONLY_URI`	str	Yes
`DB_READWRITE_URI`	str	Yes
`SENTRY_DSN`	str
`SENTRY_ENVIRONMENT`	str
`STATSD_HOST`	str
`STATSD_PORT`	int
`REDIS_URI`	str	Yes
`CELERY_WORKER_CONCURRENCY`	int	Yes
`MAPBOX_TOKEN`	str
`GEOIP_PATH`	str
`SECRET_KEY`	str

Configuration options:

LOCAL_DEV_ENV¶

Parser: bool
Default: “false”
Required: No

Whether we are (True) or are not (False) in a local dev environment. There are some things that get configured one way in a developer’s environment and another way in a server environment.

TESTING¶

Parser: bool
Default: “false”
Required: No

Whether or not we are running tests.

LOGGING_LEVEL¶

Parser: ichnaea.conf.logging_level_parser
Default: “INFO”
Required: No

Logging level to use. One of CRITICAL, ERROR, WARNING, INFO, or DEBUG.

ASSET_BUCKET¶

Parser: str
Default: “”
Required: No

name of AWS S3 bucket to store map tile image assets and export downloads

ASSET_URL¶

Parser: str
Default: “”
Required: No

url for map tile image assets and export downloads

DB_READONLY_URI¶

Parser: str
Required: Yes

uri for the readonly database; mysql+pymysql://USER:PASSWORD@HOST:PORT/NAME

DB_READWRITE_URI¶

Parser: str
Required: Yes

uri for the read-write database; mysql+pymysql://USER:PASSWORD@HOST:PORT/NAME

SENTRY_DSN¶

Parser: str
Default: “”
Required: No

Sentry DSN; leave blank to disable Sentry error reporting

SENTRY_ENVIRONMENT¶

Parser: str
Default: “”
Required: No

Sentry environment

STATSD_HOST¶

Parser: str
Default: “”
Required: No

StatsD host; blank to disable StatsD

STATSD_PORT¶

Parser: int
Default: “8125”
Required: No

StatsD port

REDIS_URI¶

Parser: str
Required: Yes

uri for Redis; redis://HOST:PORT/DB

CELERY_WORKER_CONCURRENCY¶

Parser: int
Required: Yes

the number of concurrent Celery worker processes executing tasks

MAPBOX_TOKEN¶

Parser: str
Default: “”
Required: No

Mapbox API key; if you do not provide this, then parts of the site showing maps will be disabled

GEOIP_PATH¶

Parser: str
Default: “/home/docs/checkouts/readthedocs.org/user_builds/ichnaea/checkouts/latest/ichnaea/tests/data/GeoIP2-City-Test.mmdb”
Required: No

absolute path to mmdb file for GeoIP lookups

SECRET_KEY¶

Parser: str
Default: “default for development, change in production”
Required: No

a unique passphrase used for cryptographic signing
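
The required, optional and default behaviour listed above can be sketched in plain Python. This is a minimal illustration covering a subset of the settings, not the project's actual loader in ichnaea.conf. The helper names `parse_bool` and `load_settings` are hypothetical, and the set of strings accepted as true is an assumption.

```python
import os

# Strings treated as true; this set is an assumption for illustration.
_TRUE_VALUES = {"1", "true", "t", "yes", "y", "on"}


def parse_bool(value):
    """Parse an environment string into a bool (hypothetical helper)."""
    return value.strip().lower() in _TRUE_VALUES


# (name, parser, default); a default of None marks the setting as required.
SETTINGS = [
    ("LOCAL_DEV_ENV", parse_bool, "false"),
    ("TESTING", parse_bool, "false"),
    ("STATSD_HOST", str, ""),
    ("STATSD_PORT", int, "8125"),
    ("DB_READONLY_URI", str, None),
    ("DB_READWRITE_URI", str, None),
    ("REDIS_URI", str, None),
    ("CELERY_WORKER_CONCURRENCY", int, None),
]


def load_settings(environ=None):
    """Return a dict of parsed settings, raising if a required one is missing."""
    environ = os.environ if environ is None else environ
    parsed = {}
    missing = []
    for name, parser, default in SETTINGS:
        raw = environ.get(name, default)
        if raw is None:
            missing.append(name)
            continue
        parsed[name] = parser(raw)
    if missing:
        raise KeyError("missing required settings: " + ", ".join(missing))
    return parsed
```

Calling `load_settings()` with no argument reads from the process environment. Passing a plain dict instead makes the behaviour easy to check in isolation.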

Alembic requires an additional item in the environment:

# URI for user with ddl access
SQLALCHEMY_URL=mysql+pymysql://USER:PASSWORD@HOST:PORT/DBNAME

The webapp uses gunicorn which also has configuration.

# Port for gunicorn to listen on
GUNICORN_PORT=${GUNICORN_PORT:-"8000"}

# Number of gunicorn workers to spin off--should be one per cpu
GUNICORN_WORKERS=${GUNICORN_WORKERS:-"1"}

# Gunicorn worker class--use our gevent worker
GUNICORN_WORKER_CLASS=${GUNICORN_WORKER_CLASS:-"ichnaea.webapp.worker.LocationGeventWorker"}

# Number of simultaneous greenlets per worker
GUNICORN_WORKER_CONNECTIONS=${GUNICORN_WORKER_CONNECTIONS:-"4"}

# Number of requests to handle before retiring worker
GUNICORN_MAX_REQUESTS=${GUNICORN_MAX_REQUESTS:-"10000"}

# Jitter to add/subtract from number of requests to prevent stampede
# of retiring
GUNICORN_MAX_REQUESTS_JITTER=${GUNICORN_MAX_REQUESTS_JITTER:-"1000"}

# Timeout for handling a request
GUNICORN_TIMEOUT=${GUNICORN_TIMEOUT:-"60"}

# Python log level for gunicorn logging output: debug, info, warning,
# error, critical
GUNICORN_LOGLEVEL=${GUNICORN_LOGLEVEL:-"info"}
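
The shell form `${VAR:-default}` used above falls back to the default when the variable is unset or empty. As a rough sketch of the same resolution in Python (the function name `resolve_gunicorn_env` is hypothetical and not part of the project):

```python
import os

# Defaults copied from the shell snippet above.
GUNICORN_DEFAULTS = {
    "GUNICORN_PORT": "8000",
    "GUNICORN_WORKERS": "1",
    "GUNICORN_WORKER_CLASS": "ichnaea.webapp.worker.LocationGeventWorker",
    "GUNICORN_WORKER_CONNECTIONS": "4",
    "GUNICORN_MAX_REQUESTS": "10000",
    "GUNICORN_MAX_REQUESTS_JITTER": "1000",
    "GUNICORN_TIMEOUT": "60",
    "GUNICORN_LOGLEVEL": "info",
}


def resolve_gunicorn_env(environ=None):
    """Mimic ${VAR:-default}: use the default when VAR is unset or empty."""
    environ = os.environ if environ is None else environ
    return {
        name: environ.get(name) or default
        for name, default in GUNICORN_DEFAULTS.items()
    }
```

Note that an empty value such as `GUNICORN_PORT=` resolves to the default, which matches the `:-` form rather than the plain `-` form of shell parameter expansion.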

Database ¶

A MySQL-compatible database is used for storing configuration and application data.

The webapp service requires a read-only connection.

The celery worker service requires a read-write connection.

Both of them can be restricted to only DML (data-manipulation) permissions, as neither needs DDL (data-definition) rights.

DDL changes are done using the alembic database migration system.
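
As an illustration of the mysql+pymysql://USER:PASSWORD@HOST:PORT/NAME format, the following sketch assembles separate URIs for a read-only and a read-write account. The account names, host and database name are made up, and `build_db_uri` is a hypothetical helper. Percent-encoding the credentials is a precaution for passwords that contain characters such as `@` or `/`.

```python
from urllib.parse import quote_plus, urlsplit


def build_db_uri(user, password, host, port, name):
    """Assemble a mysql+pymysql URI, percent-encoding the credentials."""
    return "mysql+pymysql://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, name
    )


# Hypothetical accounts: one for DB_READONLY_URI, one for DB_READWRITE_URI.
readonly_uri = build_db_uri(
    "location_ro", "p@ss/word", "db.example.com", 3306, "location"
)
readwrite_uri = build_db_uri(
    "location_rw", "s3cret", "db.example.com", 3306, "location"
)

# Splitting the result again shows the encoded password did not confuse
# the host and port parsing.
parts = urlsplit(readonly_uri)
print(parts.scheme, parts.hostname, parts.port, parts.path)
```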

GeoIP ¶

The web and worker roles need access to a MaxMind GeoIP City database in version 2 format. Both GeoLite and commercial databases will work.

Redis ¶

Redis is used as a:

- classic cache by the web role
- backend to store rate-limiting counters
- custom queuing backend and a worker queuing backend
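
To make the redis://HOST:PORT/DB format of REDIS_URI concrete, here is a small sketch that splits such a URI into its parts. The helper `parse_redis_uri` is hypothetical. Falling back to port 6379 and database 0 when they are omitted is an assumption based on the usual Redis defaults.

```python
from urllib.parse import urlsplit


def parse_redis_uri(uri):
    """Split redis://HOST:PORT/DB into its parts (hypothetical helper)."""
    parts = urlsplit(uri)
    if parts.scheme != "redis":
        raise ValueError("expected a redis:// URI, got %r" % uri)
    db = parts.path.lstrip("/")
    return {
        "host": parts.hostname,
        "port": parts.port or 6379,  # assumed default when no port is given
        "db": int(db) if db else 0,  # assumed default when no DB is given
    }
```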

Sentry ¶

All roles and command line scripts use an optional Sentry server to log application exception data. Set SENTRY_DSN to a Sentry DSN to enable Sentry, or leave it blank to disable it.

StatsD ¶

All roles and command line scripts use an optional StatsD service to log application specific metrics. The StatsD service needs to support metric tags.

The project uses a lot of metrics as further detailed in the metrics documentation.

All metrics are prefixed with a location namespace.
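
For readers unfamiliar with tagged StatsD metrics, the sketch below formats a single datagram with the `location` prefix. It assumes the DogStatsD-style tag syntax (`|#key:value,...`); the tag format your StatsD service expects may differ. The metric name and tags are made up for illustration, and `format_metric` is a hypothetical helper rather than project code.

```python
def format_metric(name, value, metric_type="c", tags=None, prefix="location"):
    """Format one StatsD datagram with DogStatsD-style tags (an assumption)."""
    line = "{}.{}:{}|{}".format(prefix, name, value, metric_type)
    if tags:
        # Sort tags so the output is deterministic.
        tag_text = ",".join(
            "{}:{}".format(key, val) for key, val in sorted(tags.items())
        )
        line += "|#" + tag_text
    return line


# Hypothetical counter with two tags.
print(format_metric("request", 1, tags={"path": "v1.geolocate", "status": 200}))
```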

Map tile and download assets ¶

The application can optionally generate image tiles for a data map and public export files available via the downloads section of the website.

These assets are stored in a static file repository (Amazon S3) and made available via an HTTPS frontend (Amazon CloudFront).

Set ASSET_BUCKET and ASSET_URL accordingly.

To access the ASSET_BUCKET, authorized AWS credentials are needed inside the Docker image. See the Boto3 credentials documentation for details.

The development environment defaults to serving map tiles from the web server, and not serving public export files for download.

Mapbox ¶

The web site content uses Mapbox to display a world map. In order to do this, it requires a Mapbox API token. Without a token, the map is not displayed.

You can create an account on their site: https://www.mapbox.com

After you have an account, you can create an API token at: https://account.mapbox.com

Set the MAPBOX_TOKEN configuration value to your API token.

Configuration in the database ¶

API Keys ¶

The project requires API keys to access the locate APIs.

API keys can be any string of up to 40 characters, though random UUID4s in hex representation are commonly used, for example 329694ac-a337-4856-af30-66162bc8187a.
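
A key in the style shown above can be generated with Python's standard `uuid` module. This is only a sketch: the function names are hypothetical, and the validity check covers just the 40-character limit stated above plus an assumed non-empty rule.

```python
import uuid

MAX_API_KEY_LENGTH = 40  # limit stated above


def new_api_key():
    """Return a random UUID4 string suitable for use as an API key."""
    key = str(uuid.uuid4())  # 36 characters including hyphens
    assert len(key) <= MAX_API_KEY_LENGTH
    return key


def is_valid_api_key(key):
    """Check the length rule only; rejecting empty keys is an assumption."""
    return isinstance(key, str) and 0 < len(key) <= MAX_API_KEY_LENGTH
```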

Fallback ¶

You can also enable a fallback location provider on a per API key basis. This allows you to send queries from this API key to an external service if Ichnaea can’t provide a good-enough result.

In order to configure this fallback mode, you need to set the fallback_* columns. For example:

fallback_name: mozilla
fallback_schema: ichnaea/v1
fallback_url: https://location.services.mozilla.com/v1/geolocate?key=some_key
fallback_ratelimit: 10
fallback_ratelimit_interval: 60
fallback_cache_expire: 86400

The name can be shared between multiple API keys and acts as a partition key for the cache and rate limit tracking.

The schema can be one of NULL, ichnaea/v1, combain/v1, googlemaps/v1 or unwiredlabs/v1.

NULL and ichnaea/v1 are currently synonymous. Setting the schema to one of those means the external service uses the same API as the geolocate v1 API used in Ichnaea.

If you set the url to one of the unwiredlabs endpoints, add your API token as an anchor fragment to the end of it, so instead of:

https://us1.unwiredlabs.com/v2/process.php

you would instead use:

https://us1.unwiredlabs.com/v2/process.php#my_secret_token

The code will read the token from here and put it into the request body.
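
The fragment handling described above can be sketched with `urllib.parse`. This is an illustration of the idea, not the project's own code, and `split_token` is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit


def split_token(url):
    """Return (url_without_fragment, token) for a fallback url."""
    parts = urlsplit(url)
    token = parts.fragment or None
    clean_url = urlunsplit(parts._replace(fragment=""))
    return clean_url, token


url, token = split_token(
    "https://us1.unwiredlabs.com/v2/process.php#my_secret_token"
)
print(url, token)
```

Keeping the token in the fragment means it is never sent as part of the request URL, since fragments are not transmitted to the server.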

Note that external services will have different terms regarding caching, data collection, and rate limiting.

If the external service allows its responses to be cached on an intermediate service, the fallback_cache_expire setting can be used to specify the number of seconds the responses should be cached. This can avoid repeated calls to the external service for the same queries.

The two rate limit settings together define how many requests are allowed to be sent to the external service: fallback_ratelimit is the number of requests and fallback_ratelimit_interval is the time interval in seconds. In the above example, that is 10 requests per 60 seconds.
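
To illustrate the "number per time interval" idea, here is a minimal in-memory fixed-window limiter keyed by the fallback name. It is a sketch under stated assumptions: the class name is hypothetical, the fixed-window strategy is an assumption, and a real deployment would keep the counters in Redis rather than in a Python dict.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` calls per `interval` seconds for each name."""

    def __init__(self, limit, interval, clock=time.time):
        self.limit = limit
        self.interval = interval
        self.clock = clock
        self.counters = {}  # (name, window index) -> count; never evicted here

    def allow(self, name):
        """Count one request for `name` and report whether it is within the limit."""
        window = int(self.clock() // self.interval)
        key = (name, window)
        count = self.counters.get(key, 0) + 1
        self.counters[key] = count
        return count <= self.limit


# Matches the example above: 10 requests per 60 seconds.
limiter = FixedWindowLimiter(limit=10, interval=60)
if limiter.allow("mozilla"):
    pass  # the request may be forwarded to the external service
```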

Export Configuration ¶

Ichnaea supports exporting position data that it gets via the APIs to different export targets. This configuration lives in the export_config database table.

Currently three different kinds of backends are supported:

- s3: Amazon S3 buckets
- internal: Ichnaea’s internal data processing pipeline which creates/updates position data using new position information
- geosubmit: submitting position information to an HTTP POST endpoint in geosubmit v2 format

The type of the target is determined by the schema column of each entry.

All export targets can be configured with a batch setting that determines how many reports have to be available before data is submitted to the backend.

All exports have an additional skip_keys setting as a set of API keys. Data submitted using one of these API keys will not be exported to the target.

There can be multiple instances of the bucket and HTTP POST export targets in export_config, but only one instance of the internal export.
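
The interplay of the batch and skip_keys settings can be sketched as follows. This is a simplified in-memory model, not the project's queue implementation. The function name and the `(api_key, report)` tuple layout are assumptions made for the example.

```python
def drain_batches(queue, batch, skip_keys=frozenset()):
    """Yield full batches from `queue`, leaving any remainder queued.

    `queue` is a list of (api_key, report) tuples and is modified in place.
    """
    # Drop reports submitted with an API key listed in skip_keys.
    queue[:] = [(key, report) for key, report in queue if key not in skip_keys]
    # Only submit once at least `batch` reports are available.
    while len(queue) >= batch:
        chunk, queue[:] = queue[:batch], queue[batch:]
        yield [report for _, report in chunk]


pending = [("a", 1), ("partner", 2), ("a", 3), ("b", 4)]
for reports in drain_batches(pending, batch=2, skip_keys={"partner"}):
    print(reports)
```

With a batch size of 2, one report stays in `pending` until another arrives, which is why a batch of 1 is convenient for initial testing.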

Here’s the SQL for setting up an “internal” export target:

INSERT INTO export_config
(`name`, `batch`, `schema`) VALUES ("internal test", 1, "internal");

For a production setup, set the batch column to something like 100 or 1000 for better efficiency. For initial testing it’s easier to set it to 1 so that any incoming data is processed immediately.

S3 Bucket Export (s3)¶

The schema column must be set to s3.

The S3 bucket export target combines reports into a gzipped JSON file and uploads them to the specified bucket url, for example:

s3://amazon_s3_bucket_name/directory/{source}{api_key}/{year}/{month}/{day}

The url can contain any level of additional static directories under the bucket root. The {source}, {api_key}, {year}, {month} and {day} parts will be dynamically replaced by the source of the report (e.g. gnss), the api_key used to upload the data, and the date when the backup took place. The files use a random UUID4 as the filename.

An example filename might be:

/directory/test/2015/07/15/554d8d3c-5b28-48bb-9aa8-196543235cf2.json.gz
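
The placeholder substitution can be sketched with `str.format`. The template is copied verbatim from the example url above, the function name is hypothetical, and the zero-padding of month and day is inferred from the example filename rather than confirmed by the project code.

```python
import datetime
import uuid
from urllib.parse import urlsplit

TEMPLATE = (
    "s3://amazon_s3_bucket_name/directory/"
    "{source}{api_key}/{year}/{month}/{day}"
)


def export_location(template, source, api_key, when):
    """Return (bucket, object key) for one export file."""
    url = template.format(
        source=source,
        api_key=api_key,
        year=when.year,
        month="%02d" % when.month,  # zero-padded, as in the example filename
        day="%02d" % when.day,
    )
    parts = urlsplit(url)
    filename = "%s.json.gz" % uuid.uuid4()  # random UUID4 filename
    return parts.netloc, parts.path.rstrip("/") + "/" + filename


bucket, key = export_location(TEMPLATE, "", "test", datetime.date(2015, 7, 15))
print(bucket, key)
```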

Internal Export (internal)¶

The schema column must be set to internal.

The internal export target forwards the incoming data into the internal data pipeline.

HTTP Export (geosubmit)¶

The schema column must be set to geosubmit.

The HTTP export target buffers incoming data into batches of the configured batch size and then submits them to the specified url endpoint using the Geosubmit Version 2 API (/v2/geosubmit).

If the project is taking in data from a partner in a data exchange, the skip_keys setting can be used to prevent data being round tripped and sent back to the same partner that it came from.