Configuration

As part of deploying the application, you need to create an application configuration file, commonly called location.ini, and insert a couple of rows into various database tables.

Configuration File

As explained in the deployment documentation, the processes find this configuration file via the ICHNAEA_CFG environment variable. The variable should contain an absolute path, for example /etc/location.ini.
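For example, in a shell you can export the variable before starting any of the processes, using the example path from above:

export ICHNAEA_CFG=/etc/location.ini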

The configuration file is an ini-style file and contains a number of different sections.

Required Sections

Cache

The cache section contains a cache_url pointing to a Redis server.

The cache is used as a classic cache by the webapp code, as a backend to store rate-limiting counters, and as both a custom and a celery queuing backend.

[cache]
cache_url = redis://localhost:6379/0

Database

The database section contains settings for accessing the MySQL database.

The web application only requires and uses the read-only connection, while the asynchronous celery workers only use the read-write connection.

Both of them can be restricted to only DML (data-manipulation) permissions, as neither needs DDL (data-definition) rights.

DDL changes are only done via the alembic database migration system, which has a separate alembic.ini configuration file.

[database]
rw_url = mysql+pymysql://rw_user:password@localhost/location
ro_url = mysql+pymysql://ro_user:password@localhost/location
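As a rough sketch of the DML-only restriction described above, the grants for the two accounts could look like the following. The ro_user, rw_user and location names are taken from the example URLs; adjust the host, user and database names to your setup.

-- read-only account used by the web application
GRANT SELECT ON location.* TO 'ro_user'@'localhost';
-- read-write account used by the celery workers; DML only, no DDL
GRANT SELECT, INSERT, UPDATE, DELETE ON location.* TO 'rw_user'@'localhost';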

GeoIP

The geoip section contains settings related to the MaxMind GeoIP database.

The db_path setting needs to point to a MaxMind GeoIP City database in version 2 format. Both GeoLite and commercial databases will work.

[geoip]
db_path = /path/to/GeoIP2-City.mmdb

Optional Sections

Assets

The assets section contains settings for a static file repository (Amazon S3) and a public DNS name used to access those files via HTTPS (Amazon CloudFront).

These are used to store and serve both the image tiles generated for the data map and the public export files available via the downloads section of the website.

[assets]
bucket = amazon_s3_bucket_name
url = https://some_distribution_id.cloudfront.net

Sentry

The sentry section contains settings related to a Sentry server.

The dsn setting needs to contain a valid project DSN.

[sentry]
dsn = https://public_key:secret_key@localhost/project_id

StatsD

The statsd section contains settings related to a StatsD service. The project emits a large number of metrics, as further detailed in the metrics documentation.

The host and port settings determine how to connect to the service via UDP.

Since a single StatsD service usually supports multiple different projects, the metric_prefix setting can be used to prefix all metrics emitted by this project with a unique name.

The tag_support setting can be either true or false and declares whether or not the StatsD service supports metric tags. Datadog is an example of a service that supports tags. If tag_support is false, the tags will be emitted as part of the standard metric name.

[statsd]
host = localhost
port = 8125
metric_prefix = location
tag_support = true

For initial testing it can be useful to capture the StatsD metrics without running an actual StatsD daemon. To do so you can use the command nc -lku localhost 8125 to run a UDP service that prints all incoming data to the console.

Web

The web section contains settings related to the non-API website content.

By default, the web functionality is limited to the public HTTP API. If the enabled setting is set to true, the website content pages are also made available.

The map_id_base and map_id_labels settings specify Mapbox map ids for a base map and a map containing only labels. The map_token specifies a Mapbox access token.

[web]
enabled = true
map_id_base = example_base.map-123
map_id_labels = example_labels.map-234
map_token = pk.example_public_access_token

Database Configuration

API Keys

The project requires API keys to access the locate APIs. You need to add API keys manually to the database by direct SQL inserts.

API keys can be any string of up to 40 characters, though random UUID4s in hex representation are commonly used, for example 329694ac-a337-4856-af30-66162bc8187a.

To start off, you can add a simple literal test API key:

INSERT INTO api_key
(`valid_key`, `allow_locate`) VALUES ("test", 1);
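For a more production-like key you can insert a UUID4-style value instead, for example the one mentioned above; the columns are the same as in the test insert:

INSERT INTO api_key
(`valid_key`, `allow_locate`) VALUES ("329694ac-a337-4856-af30-66162bc8187a", 1);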

Export Configuration

The project supports exporting all data that it gets via the submit-style APIs to different backends. This configuration lives in the export_config database table.

Currently three different kinds of backends are supported:

  • Amazon S3 buckets
  • The project's own internal data processing pipeline
  • An HTTPS POST endpoint accepting the geosubmit v2 format

The type of the target is determined by the schema column of each entry.

All export targets can be configured with a batch setting that determines how many reports have to be available before data is submitted to the backend.

All export targets also have a skip_keys setting, which is a set of API keys. Data submitted using one of these API keys will not be exported to the target.

There can be multiple instances of the bucket and HTTPS POST export targets, but only one instance of the internal export.

In the simplest case, you insert one row to send data to the internal data pipeline via:

INSERT INTO export_config
(`name`, `batch`, `schema`) VALUES ("internal", 1, "internal");

For a production setup you want to set the batch column to something like 100 or 1000 for efficiency. For initial testing it's easier to set it to 1, so any incoming data is processed immediately.
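Assuming the internal row from the insert above, switching it to a production batch size is a single update:

UPDATE export_config SET batch = 100 WHERE name = "internal";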

Bucket Export

The Amazon S3 bucket export combines reports into a gzipped JSON file and uploads it to the specified bucket url, for example:

s3://amazon_s3_bucket_name/directory/{source}{api_key}/{year}/{month}/{day}

The schema column must be set to s3.

The url can contain any level of additional static directories under the bucket root. The {source}, {api_key}, {year}, {month} and {day} parts will be dynamically replaced by the source of the report (e.g. gnss), the api_key used to upload the data, and the date when the backup took place. The files use a random UUID4 as the filename.

An example filename might be:

/directory/test/2015/07/15/554d8d3c-5b28-48bb-9aa8-196543235cf2.json.gz
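Putting this together, a bucket export row might look like the following sketch. The name value is arbitrary, and the bucket url is assumed to be stored in a url column:

INSERT INTO export_config
(`name`, `batch`, `schema`, `url`) VALUES
("backup", 100, "s3",
 "s3://amazon_s3_bucket_name/directory/{source}{api_key}/{year}/{month}/{day}");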

Internal Export

The internal export forwards the incoming data into the internal data pipeline.

The schema column must be set to internal.

HTTPS Export

The HTTPS export buffers incoming data into batches of the configured batch size and then submits them using the Geosubmit Version 2 API to the specified url endpoint, for example:

https://localhost/some/api/url?key=export

The schema column must be set to geosubmit.

If the project is taking in data from a partner in a data exchange, the skip_keys setting can be used to prevent data from being round-tripped and sent back to the same partner it came from.
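As a sketch, a corresponding HTTPS export row could be inserted like this, again assuming the endpoint is stored in a url column and the name value is arbitrary:

INSERT INTO export_config
(`name`, `batch`, `schema`, `url`) VALUES
("partner", 100, "geosubmit",
 "https://localhost/some/api/url?key=export");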