Configuration¶
The application reads most of its settings from environment variables. A small number of settings also live in database tables.
Environment variables¶
- component ichnaea.conf.AppComponent¶
Configuration summary:

Setting                     Parser                              Required?
LOCAL_DEV_ENV               bool                                No
TESTING                     bool                                No
LOGGING_LEVEL               ichnaea.conf.logging_level_parser   No
ASSET_BUCKET                str                                 No
ASSET_URL                   str                                 No
DB_READONLY_URI             str                                 Yes
DB_READWRITE_URI            str                                 Yes
SENTRY_DSN                  str                                 No
SENTRY_ENVIRONMENT          str                                 No
STATSD_HOST                 str                                 No
STATSD_PORT                 int                                 No
REDIS_URI                   str                                 Yes
CELERY_WORKER_CONCURRENCY   int                                 Yes
MAPBOX_TOKEN                str                                 No
GEOIP_PATH                  str                                 No
SECRET_KEY                  str                                 No
Configuration options:
- LOCAL_DEV_ENV¶
  Parser: bool. Default: "false". Required: No.
  Whether we are (True) or are not (False) in a local dev environment. Some things are configured one way in a developer's environment and another way in a server environment.

- TESTING¶
  Parser: bool. Default: "false". Required: No.
  Whether or not we are running tests.

- LOGGING_LEVEL¶
  Parser: ichnaea.conf.logging_level_parser. Default: "INFO". Required: No.
  Logging level to use. One of CRITICAL, ERROR, WARNING, INFO, or DEBUG.

- ASSET_BUCKET¶
  Parser: str. Default: "". Required: No.
  Name of the AWS S3 bucket used to store map tile image assets and export downloads.

- ASSET_URL¶
  Parser: str. Default: "". Required: No.
  URL for map tile image assets and export downloads.

- DB_READONLY_URI¶
  Parser: str. Required: Yes.
  URI for the read-only database: mysql+pymysql://USER:PASSWORD@HOST:PORT/NAME

- DB_READWRITE_URI¶
  Parser: str. Required: Yes.
  URI for the read-write database: mysql+pymysql://USER:PASSWORD@HOST:PORT/NAME

- SENTRY_DSN¶
  Parser: str. Default: "". Required: No.
  Sentry DSN; leave blank to disable Sentry error reporting.

- SENTRY_ENVIRONMENT¶
  Parser: str. Default: "". Required: No.
  Sentry environment.

- STATSD_HOST¶
  Parser: str. Default: "". Required: No.
  StatsD host; leave blank to disable StatsD.

- STATSD_PORT¶
  Parser: int. Default: "8125". Required: No.
  StatsD port.

- REDIS_URI¶
  Parser: str. Required: Yes.
  URI for Redis: redis://HOST:PORT/DB

- CELERY_WORKER_CONCURRENCY¶
  Parser: int. Required: Yes.
  The number of concurrent Celery worker processes executing tasks.

- MAPBOX_TOKEN¶
  Parser: str. Default: "". Required: No.
  Mapbox API key; if you do not provide one, the parts of the site that show maps are disabled.

- GEOIP_PATH¶
  Parser: str. Default: "/home/docs/checkouts/readthedocs.org/user_builds/ichnaea/checkouts/latest/ichnaea/tests/data/GeoIP2-City-Test.mmdb". Required: No.
  Absolute path to the mmdb file used for GeoIP lookups.

- SECRET_KEY¶
  Parser: str. Default: "default for development, change in production". Required: No.
  A unique passphrase used for cryptographic signing.
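As an illustration of how these variables might be consumed, the sketch below reads a handful of the settings above with their documented defaults. This is not the actual ichnaea.conf code; parse_bool and read_config are hypothetical names.

```python
import os


def parse_bool(value):
    # Accept common truthy spellings; the exact accepted set is an assumption.
    return value.strip().lower() in ("1", "true", "yes", "on")


def read_config(environ=os.environ):
    """Read a few of the settings described above, with their documented defaults."""
    required = ("DB_READONLY_URI", "DB_READWRITE_URI", "REDIS_URI",
                "CELERY_WORKER_CONCURRENCY")
    missing = [name for name in required if name not in environ]
    if missing:
        # Required settings have no defaults; fail loudly if they are absent.
        raise ValueError("required settings missing: %s" % ", ".join(missing))
    return {
        "LOCAL_DEV_ENV": parse_bool(environ.get("LOCAL_DEV_ENV", "false")),
        "LOGGING_LEVEL": environ.get("LOGGING_LEVEL", "INFO"),
        "STATSD_PORT": int(environ.get("STATSD_PORT", "8125")),
        "REDIS_URI": environ["REDIS_URI"],
        "CELERY_WORKER_CONCURRENCY": int(environ["CELERY_WORKER_CONCURRENCY"]),
    }
```

Passing a plain dict instead of os.environ makes the defaults easy to exercise in tests.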
Alembic requires an additional item in the environment:
# URI for user with ddl access
SQLALCHEMY_URL=mysql+pymysql://USER:PASSWORD@HOST:PORT/DBNAME
The webapp uses gunicorn, which has its own configuration:
# Port for gunicorn to listen on
GUNICORN_PORT=${GUNICORN_PORT:-"8000"}
# Number of gunicorn workers to spin off--should be one per cpu
GUNICORN_WORKERS=${GUNICORN_WORKERS:-"1"}
# Gunicorn worker class--use our gevent worker
GUNICORN_WORKER_CLASS=${GUNICORN_WORKER_CLASS:-"ichnaea.webapp.worker.LocationGeventWorker"}
# Number of simultaneous greenlets per worker
GUNICORN_WORKER_CONNECTIONS=${GUNICORN_WORKER_CONNECTIONS:-"4"}
# Number of requests to handle before retiring worker
GUNICORN_MAX_REQUESTS=${GUNICORN_MAX_REQUESTS:-"10000"}
# Jitter to add/subtract from number of requests to prevent stampede
# of retiring
GUNICORN_MAX_REQUESTS_JITTER=${GUNICORN_MAX_REQUESTS_JITTER:-"1000"}
# Timeout for handling a request
GUNICORN_TIMEOUT=${GUNICORN_TIMEOUT:-"60"}
# Python log level for gunicorn logging output: debug, info, warning,
# error, critical
GUNICORN_LOGLEVEL=${GUNICORN_LOGLEVEL:-"info"}
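A gunicorn config file could consume these variables as sketched below. The setting names on the left are real gunicorn settings, but the mapping itself is an assumption, not the project's actual config file.

```python
# gunicorn_conf.py -- a sketch of mapping the environment variables above
# onto gunicorn settings; the mapping is an assumption.
import os

bind = "0.0.0.0:" + os.environ.get("GUNICORN_PORT", "8000")
workers = int(os.environ.get("GUNICORN_WORKERS", "1"))
worker_class = os.environ.get(
    "GUNICORN_WORKER_CLASS", "ichnaea.webapp.worker.LocationGeventWorker")
worker_connections = int(os.environ.get("GUNICORN_WORKER_CONNECTIONS", "4"))
max_requests = int(os.environ.get("GUNICORN_MAX_REQUESTS", "10000"))
max_requests_jitter = int(os.environ.get("GUNICORN_MAX_REQUESTS_JITTER", "1000"))
timeout = int(os.environ.get("GUNICORN_TIMEOUT", "60"))
loglevel = os.environ.get("GUNICORN_LOGLEVEL", "info")
```

Gunicorn picks such a file up via `gunicorn -c gunicorn_conf.py app:application`.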
Database¶
A MySQL-compatible database is used for storing configuration and application data.

The webapp service requires a read-only connection. The celery worker service requires a read-write connection. Both can be restricted to DML (data manipulation) permissions, as neither needs DDL (data definition) rights.

DDL changes are done using the alembic database migration system.
GeoIP¶
The web and worker roles need access to a MaxMind GeoIP City database in version 2 format. Both the GeoLite and commercial databases will work.
Redis¶
The Redis cache is used as:

- a classic cache by the web role
- a backend to store rate-limiting counters
- a custom queuing backend and the worker queuing backend
Sentry¶
All roles and command line scripts use an optional Sentry server to log application exception data. Set SENTRY_DSN to a Sentry DSN to enable Sentry, or leave it blank ('') to disable it.
StatsD¶
All roles and command line scripts use an optional StatsD service to log application specific metrics. The StatsD service needs to support metric tags.
The project uses a lot of metrics as further detailed in the metrics documentation.
All metrics are prefixed with a location namespace.
Map tile and download assets¶
The application can optionally generate image tiles for a data map and public export files available via the downloads section of the website.
These assets are stored in a static file repository (Amazon S3) and made available via an HTTPS frontend (Amazon CloudFront). Set ASSET_BUCKET and ASSET_URL accordingly.

To access the ASSET_BUCKET, authorized AWS credentials are needed inside the Docker image. See the Boto3 credentials documentation for details.
The development environment defaults to serving map tiles from the web server, and not serving public export files for download.
Mapbox¶
The web site content uses Mapbox to display a world map. In order to do this, it requires a Mapbox API token. Without a token, the map is not displayed.
You can create an account on their site: https://www.mapbox.com
After you have an account, you can create an API token at: https://account.mapbox.com
Set the MAPBOX_TOKEN configuration value to your API token.
Configuration in the database¶
API Keys¶
The project requires API keys to access the locate APIs.
API keys can be any string of up to 40 characters, though random UUID4s in hex representation are commonly used, for example 329694ac-a337-4856-af30-66162bc8187a.
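Generating a key in that commonly used UUID4 form only takes the standard library; this is an illustration, not a required format.

```python
import uuid

# Any string up to 40 characters is a valid API key; a random UUID4 is common.
api_key = str(uuid.uuid4())
print(api_key)       # e.g. 329694ac-a337-4856-af30-66162bc8187a
print(len(api_key))  # 36 characters: 32 hex digits plus 4 dashes
```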
Fallback¶
You can also enable a fallback location provider on a per API key basis. This allows you to send queries from this API key to an external service if Ichnaea can’t provide a good-enough result.
In order to configure this fallback mode, you need to set the fallback_* columns. For example:
fallback_name: mozilla
fallback_schema: ichnaea/v1
fallback_url: https://location.services.mozilla.com/v1/geolocate?key=some_key
fallback_ratelimit: 10
fallback_ratelimit_interval: 60
fallback_cache_expire: 86400
The name can be shared between multiple API keys and acts as a partition key for the cache and rate limit tracking.
The schema can be one of NULL, ichnaea/v1, combain/v1, googlemaps/v1 or unwiredlabs/v1.

NULL and ichnaea/v1 are currently synonymous. Setting the schema to one of those means the external service uses the same API as the geolocate v1 API used in Ichnaea.
If you set the url to one of the unwiredlabs endpoints, add your API token as an anchor fragment to the end of it, so instead of:
https://us1.unwiredlabs.com/v2/process.php
you would instead use:
https://us1.unwiredlabs.com/v2/process.php#my_secret_token
The code will read the token from here and put it into the request body.
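The pattern can be illustrated with the standard library; this is a sketch of the idea, not Ichnaea's actual code.

```python
from urllib.parse import urlsplit, urlunsplit

fallback_url = "https://us1.unwiredlabs.com/v2/process.php#my_secret_token"

parts = urlsplit(fallback_url)
token = parts.fragment  # the anchor fragment: "my_secret_token"
# The request itself goes to the URL without the anchor.
endpoint = urlunsplit(parts._replace(fragment=""))

# The token then travels in the request body, not in the URL.
body = {"token": token}
```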
Note that external services will have different terms regarding caching, data collection, and rate limiting.
If the external service allows caching their responses on an intermediate service, the fallback_cache_expire setting can be used to specify the number of seconds the responses should be cached. This avoids repeated calls to the external service for the same queries.
The rate-limit settings specify how many requests may be sent to the external service per time interval: in the example above, 10 requests per 60 seconds.
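A fixed-window counter captures the idea. This in-memory version is a stand-in for the Redis counters the project actually uses; class and method names are hypothetical.

```python
import time


class FixedWindowRateLimit:
    """Allow `limit` requests per `interval` seconds, keyed by fallback_name."""

    def __init__(self, limit, interval):
        self.limit = limit
        self.interval = interval
        self.counters = {}  # (name, window) -> request count

    def allow(self, name, now=None):
        now = time.time() if now is None else now
        window = int(now // self.interval)  # which interval we are in
        key = (name, window)
        count = self.counters.get(key, 0)
        if count >= self.limit:
            return False  # over the per-interval limit
        self.counters[key] = count + 1
        return True


# 10 requests per 60 seconds, as in the example configuration above.
limiter = FixedWindowRateLimit(limit=10, interval=60)
allowed = [limiter.allow("mozilla", now=0) for _ in range(11)]
# The first 10 calls within the window pass; the 11th is rejected.
```

Because the counters are keyed by name, multiple API keys sharing one fallback_name also share the limit, matching the partition-key behavior described above.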
Export Configuration¶
Ichnaea supports exporting position data that it receives via the APIs to different export targets. This configuration lives in the export_config database table.
Currently three different kinds of backends are supported:

- s3: Amazon S3 buckets
- internal: Ichnaea's internal data processing pipeline, which creates/updates position data using new position information
- geosubmit: submitting position information to an HTTP POST endpoint in geosubmit v2 format
The type of the target is determined by the schema column of each entry.

All export targets can be configured with a batch setting that determines how many reports have to be available before data is submitted to the backend.

All exports have an additional skip_keys setting as a set of API keys. Data submitted using one of these API keys will not be exported to the target.

There can be multiple instances of the bucket and HTTP POST export targets in export_config, but only one instance of the internal export.
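The batch and skip_keys behavior can be sketched as a simple buffer. The names here are hypothetical; the real pipeline queues reports in Redis rather than in memory.

```python
class BatchingExporter:
    """Buffer reports and flush to a backend once `batch` are available."""

    def __init__(self, batch, skip_keys=()):
        self.batch = batch
        self.skip_keys = set(skip_keys)
        self.buffer = []
        self.flushed = []  # batches handed to the backend

    def add(self, report, api_key):
        if api_key in self.skip_keys:
            return  # data from these API keys is never exported
        self.buffer.append(report)
        if len(self.buffer) >= self.batch:
            self.flushed.append(self.buffer)  # submit a full batch
            self.buffer = []


exporter = BatchingExporter(batch=2, skip_keys={"partner_key"})
exporter.add({"pos": 1}, api_key="test")
exporter.add({"pos": 2}, api_key="partner_key")  # skipped via skip_keys
exporter.add({"pos": 3}, api_key="test")         # second buffered report: flush
```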
Here’s the SQL for setting up an “internal” export target:
INSERT INTO export_config
(`name`, `batch`, `schema`) VALUES ("internal test", 1, "internal");
For a production setup you want to set the batch column to something like 100 or 1000 for efficiency. For initial testing it's easier to set it to 1 so you immediately process any incoming data.
S3 Bucket Export (s3)¶
The schema column must be set to s3.
The S3 bucket export target combines reports into a gzipped JSON file and uploads them to the specified bucket url, for example:

s3://amazon_s3_bucket_name/directory/{source}{api_key}/{year}/{month}/{day}

The url can contain any level of additional static directories under the bucket root. The {source}, {api_key}, {year}, {month} and {day} parts are dynamically replaced by the source of the report (e.g. gnss), the API key used to upload the data, and the date when the backup took place.
The files use a random UUID4 as the filename.
An example filename might be:
/directory/test/2015/07/15/554d8d3c-5b28-48bb-9aa8-196543235cf2.json.gz
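Filling the template with str.format illustrates the substitution. This is a sketch: the empty source value is an assumption based on the sample filename above, which shows only the API key ("test"), and the exact formatting Ichnaea applies is not guaranteed to match.

```python
import uuid

url_template = "s3://bucket_name/directory/{source}{api_key}/{year}/{month}/{day}"

# Reproduce the shape of the example path above (empty source is an assumption).
prefix = url_template.format(
    source="", api_key="test", year="2015", month="07", day="15")
# Each file gets a random UUID4 filename with a .json.gz suffix.
filename = "%s/%s.json.gz" % (prefix, uuid.uuid4())
```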
Internal Export (internal)¶
The schema column must be set to internal.
The internal export target forwards the incoming data into the internal data pipeline.
HTTP Export (geosubmit)¶
The schema column must be set to geosubmit.
The HTTP export target buffers incoming data into batches of batch size and then submits them using the Geosubmit Version 2: /v2/geosubmit API to the specified url endpoint.
If the project is taking in data from a partner in a data exchange, the skip_keys setting can be used to prevent data being round-tripped and sent back to the same partner it came from.