PASTAplus/auth

EDI Identity and Access Manager (IAM)

Authentication and authorization service for the PASTA+ Data Repository environment.

  • EDI services support signing in via LDAP and via selected third-party identity providers (IdPs) using OAuth2 / OpenID Connect (OIDC)
  • LDAP accounts are managed by EDI and provide membership in the vetted group
  • All users that sign in (via LDAP or OAuth2) become members of the authenticated group
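
The group rules above can be summarized in a few lines of Python. This is an illustrative sketch only: the function name and the IdP name strings (`"ldap"`, `"google"`) are assumptions made for the example and are not taken from the IAM codebase.

```python
# Illustrative sketch: the function name and IdP name strings are assumptions,
# not part of the IAM codebase.

def groups_for_sign_in(idp_name: str) -> set[str]:
    """Return the built-in groups a user joins after signing in via idp_name."""
    # Every successful sign-in, via LDAP or OAuth2, grants the authenticated group.
    groups = {"authenticated"}
    # Only EDI-managed LDAP accounts are also members of the vetted group.
    if idp_name.lower() == "ldap":
        groups.add("vetted")
    return groups


if __name__ == "__main__":
    print(sorted(groups_for_sign_in("ldap")))
    print(sorted(groups_for_sign_in("google")))
```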

API

IAM provides a REST API for managing user profiles, identities, and access control rules (ACRs) for data packages and other resources in the EDI Data Repository. The API is designed to be used by client applications to create and manage user profiles, resources and access control rules.

Command-line interface (CLI)

IAM includes a set of command-line tools for managing authentication, authorization, and database operations.

Database administration and maintenance

  • db_manager.py: Utility for managing the database schema as required by the IAM web service.
  • run_psql.py: Run the psql command-line utility using settings from config.py.
  • sync_search_tables.py: The search tables are used by the UI to provide fast searches in resources. The tables are normally kept in sync by database triggers installed by db_manager.py. If a bug or other system issue causes the search tables to become unsynchronized, this utility can be used to synchronize the tables after the issue has been fixed.
  • mk_database_fixture.py: Creates the database fixture used for the unit tests.

Managing authentication tokens and API keys

  • api_key_to_token.py: Converts an API key to an EDI authentication token. Facilitates token-based authentication by exchanging long-lived API keys for session tokens.
  • view_edi_token.py: Decodes and displays the contents of an EDI token. Shows token metadata, claims, and expiration information for debugging.
  • view_pasta_token.py: Decodes and displays the contents of a PASTA token. Similar to view_edi_token.py but for PASTA authentication tokens.
  • gen_private_key_pair.sh: Generates public/private key pairs used for signing tokens issued by the service.

Managing resources and access control rules

  • is_authorized.py: Checks if an EDI token provides access to a specific resource by querying the /auth/v1/authorized endpoint with a token, resource key, and permission level (read/write/changePermission).
  • resource.py: Manages resources in the authorization system. Provides commands to create, update, delete, and view resource trees and their associated permissions.
  • rule.py: Manages authorization rules. Configures and maintains the rules that determine access control policies for resources.
  • add_eml_docs.py: Adds EML (Ecological Metadata Language) documents to the system. Used for importing metadata documents for testing or initial data population.
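
As a rough sketch of the kind of request that is_authorized.py issues, the snippet below builds a URL for the /auth/v1/authorized endpoint and validates the permission level. The query parameter names (`resource_key`, `permission`), the function name, and the example host are hypothetical. How the token is transmitted is not described here, so it is left out of the sketch.

```python
from urllib.parse import urlencode

# Permission levels named in the description of is_authorized.py.
PERMISSION_LEVELS = ("read", "write", "changePermission")


def build_authorized_url(base_url: str, resource_key: str, permission: str) -> str:
    """Build a URL for the /auth/v1/authorized endpoint.

    The query parameter names used here are hypothetical and may differ
    from the ones the real endpoint expects.
    """
    if permission not in PERMISSION_LEVELS:
        raise ValueError(f"Unknown permission level: {permission!r}")
    query = urlencode({"resource_key": resource_key, "permission": permission})
    return f"{base_url.rstrip('/')}/auth/v1/authorized?{query}"


if __name__ == "__main__":
    # The host and resource key below are placeholders.
    print(build_authorized_url("https://auth.example.org", "knb-lter-xyz.1.1", "read"))
```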

Creating test data and validating configurations

  • add_test_objects.py: Populates the database with test objects for development and testing. Creates sample data structures to validate authentication and authorization.
  • add_ui_test_resource_tree.py: Generates resource tree structures for UI testing. Creates a hierarchy of test resources with various permission levels to verify the user interface displays and updates resources and access control rules correctly.
  • check_config_template.py: Validates live config.py against config.py.template, ensuring that the live configuration contains all expected settings, and that the template has been updated to stay in sync.
  • gen_edi_id.py: Generates unique EDI identifiers. Creates properly formatted identifiers for profiles and groups.
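
The two-way comparison performed by check_config_template.py can be illustrated as follows. This is a minimal sketch, not the actual implementation. It assumes that settings are upper-case names assigned in the Python source, and the function names are hypothetical.

```python
import ast


def setting_names(source: str) -> set[str]:
    """Collect upper-case names assigned anywhere in a Python source string."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            targets = node.targets
        elif isinstance(node, ast.AnnAssign):
            targets = [node.target]
        else:
            continue
        for target in targets:
            if isinstance(target, ast.Name) and target.id.isupper():
                names.add(target.id)
    return names


def compare_settings(live_source: str, template_source: str) -> tuple[set[str], set[str]]:
    """Return (settings missing from the live config, settings missing from the template)."""
    live = setting_names(live_source)
    template = setting_names(template_source)
    return template - live, live - template
```

A non-empty first set means the live config.py lacks settings that the template expects. A non-empty second set means the template has fallen behind the live configuration.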

Supported Identity Providers (IdPs)

EDI LDAP (Lightweight Directory Access Protocol)

  • LDAP accounts are managed by EDI and provide membership in the vetted group, which provides elevated privileges for users publishing packages on EDI

Configuration

Google

  • Google's OAuth2 service is part of Google Cloud and accessed via Google Cloud Console

Configuration

Notes

ORCID

Configuration

GitHub

Configuration

Microsoft

  • Microsoft's OAuth2 service is part of Microsoft Entra ID.

Configuration

Notes

  • To edit the Redirect URIs, select Redirect URIs under Essentials
  • The EDI app is configured to support accounts in any organizational directory (any Microsoft Entra ID tenant or multitenant), and personal Microsoft accounts (e.g., Skype, Xbox)
  • We do not currently use the Logout URI
  • Select the tokens you would like to be issued by the authorization endpoint:
    • Access tokens (used for implicit flows): Y
    • ID tokens (used for implicit and hybrid flows): Y
    • Live SDK support: N
    • Allow public client flows: N

redirect_uri

The redirect_uri in OAuth2 is always a URL provided by the client. After successful sign-in, the IdP redirects to this URL, appending the user's security context as query parameters.

To prevent spoofing, the redirect_uri must exactly match a registered value at the IdP. Multiple redirect_uris can be registered to support different instances of the same OAuth2 application. For Auth, the redirect_uri follows this format:

https://<HOST><:PORT>/auth/callback/<IDP_NAME>
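
The format above, and the exact-match rule applied by the IdP, can be sketched in Python. The function names are hypothetical, and the lower-case IdP name `github` is an assumption about how `<IDP_NAME>` is spelled in practice.

```python
from typing import Optional


def build_redirect_uri(host: str, idp_name: str, port: Optional[int] = None) -> str:
    """Format a redirect_uri as https://<HOST><:PORT>/auth/callback/<IDP_NAME>."""
    port_part = f":{port}" if port is not None else ""
    return f"https://{host}{port_part}/auth/callback/{idp_name}"


def is_registered(redirect_uri: str, registered_uris: set[str]) -> bool:
    """Mimic the IdP check: the redirect_uri must match a registered value exactly."""
    return redirect_uri in registered_uris


if __name__ == "__main__":
    # Local development instance on port 5443.
    print(build_redirect_uri("localhost", "github", 5443))
```

Because the comparison is an exact string match, differences such as a trailing slash or a missing port are enough for the IdP to reject the request.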

Since we currently have public production, staging and development instances of Auth, and also run Auth locally under port 5443 for development, the following redirect_uris must be preconfigured at each IdP.

GitHub

Google

Microsoft

ORCID

Note: ORCID does not support localhost in the redirect_uri, so we use 127.0.0.1 instead. However, this conflicts with the requirement for localhost by other IdPs, so it can only be used for testing ORCID in development. To test ORCID in development, also set 127.0.0.1 in Config.SERVICE_BASE_URL.


## Conda

### Managing the Conda environment in a production environment

Start and stop the auth service as root:

```shell
# systemctl start auth.service
# systemctl stop auth.service
```

Remove and rebuild the auth venv:

conda env remove --name auth
conda env create --file environment-min.yml

Update the auth venv in place:

conda env update --file environment-min.yml --prune

Activate and deactivate the auth venv:

conda activate auth
conda deactivate

Managing the Conda environment in a development environment

Update the environment.yml:

conda env export --no-builds > environment.yml

Update Conda itself:

conda update --name base conda

Update all packages in environment:

conda update --all

Create or update the requirements.txt file (for use by GitHub Dependabot, and for pip based manual installs):

pip list --format freeze > requirements.txt

Server: Procedure for updating the Conda environment and all dependencies

conda deactivate
conda update -n base -c defaults conda
conda update --all
conda env remove --yes --name auth
conda env create --file environment-min.yml
conda activate auth

Dev: Procedure for updating the Conda environment and all dependencies

Follow the full "Server" procedure above, then update the environment.yml and requirements.txt files:

conda env export --no-builds > environment.yml
pip list --format freeze > requirements.txt

If Conda base won't update to the latest version, try:

conda install conda==<version>

or

conda update -n base -c defaults conda --repodata-fn=repodata.json

Strategy for dealing with Google emails historically used as identifiers

This procedure describes how we handle the IdP UID (stored in Profile.idp_uid) so that we can migrate away from using Google emails as identifiers and toward Google's OAuth2 UID as the unique user identifier, while still allowing existing users to log in with their Google accounts.

  • Below, the "API UID" refers to the unique user identifier string provided by the client when creating a skeleton profile through the API.

  • When a new profile is created through the API:

    • If the API UID already exists in the Profile.idp_uid field:
      • The existing profile may be a skeleton or a full profile
      • Return the existing profile
    • If not:
      • If the API UID is found in the Profile.email field:
        • We only used emails as identifiers for Google IdP users, so this is a Google IdP profile
        • Return the existing profile
      • If not:
        • Create a new profile with the API UID in the Profile.idp_uid field
        • Return the new profile
  • When someone logs in with an IdP other than Google:

    • Create a profile if one doesn't exist, and then log in the user
  • When someone logs in with Google as their IdP:

    • If a profile exists with the Google IdP UID in the Profile.idp_uid field:
      • Log the user in as normal.
    • If not:
      • If the Google IdP email matches a Profile.idp_uid:
        • Set the new Google IdP UID in Profile.idp_uid
        • Set all other fields
        • Log the user in as normal
      • If not:
        • Create a new profile using the Google IdP UID
        • Set all other fields
        • Log the user into the new profile
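
The decision tree above can be sketched with an in-memory store. This is a simplified illustration under stated assumptions, not the service's database code: the `Profile` dataclass, the `ProfileStore` class, and its method names are hypothetical, and "set all other fields" is represented by updating only the email.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Profile:
    idp_uid: str
    email: Optional[str] = None


class ProfileStore:
    """Hypothetical in-memory stand-in for the profile table."""

    def __init__(self) -> None:
        self.profiles: list[Profile] = []

    def _find(self, field: str, value: str) -> Optional[Profile]:
        return next((p for p in self.profiles if getattr(p, field) == value), None)

    def get_or_create_via_api(self, api_uid: str) -> Profile:
        # 1. The API UID already exists as an IdP UID (skeleton or full profile).
        profile = self._find("idp_uid", api_uid)
        if profile is None:
            # 2. Only Google profiles used emails as identifiers.
            profile = self._find("email", api_uid)
        if profile is None:
            # 3. Otherwise create a new profile keyed by the API UID.
            profile = Profile(idp_uid=api_uid)
            self.profiles.append(profile)
        return profile

    def log_in_with_google(self, google_uid: str, google_email: str) -> Profile:
        profile = self._find("idp_uid", google_uid)
        if profile is not None:
            return profile  # Already migrated: log in as normal.
        # Legacy case: the email was stored as the IdP UID.
        profile = self._find("idp_uid", google_email)
        if profile is not None:
            profile.idp_uid = google_uid
        else:
            profile = Profile(idp_uid=google_uid)
            self.profiles.append(profile)
        profile.email = google_email  # Stands in for "set all other fields".
        return profile
```

Logging in twice with the same Google account migrates a legacy profile on the first login and finds it by its Google UID on the second, so no duplicate profile is created.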

Setting up a trusted CA and SSL certificate for local development

To avoid browser warnings about untrusted certificates, we create a self-signed CA certificate and use it to sign a certificate for the local development server.

Browsers typically maintain their own trust store instead of relying on the system CA store, so the CA certificate must also be added to the browser's trust store. For Chrome, go to chrome://settings/certificates and import the CA certificate in the Authorities tab.

The following commands create the CA and server certificates and install them in the system CA store. You will be prompted for a new password for the CA key, and for the same password again when signing the local certificate. There's no need to remember the password after that, unless you plan on signing more certificates with the same CA:

openssl genpkey -algorithm RSA -out ca.key -aes256
openssl req -x509 -new -nodes -key ca.key -sha256 -days 3650 -out ca.crt -subj "/CN=My Local CA"
openssl genpkey -algorithm RSA -out localhost.key
openssl req -new -key localhost.key -out localhost.csr -subj "/CN=localhost"

cat > localhost.ext <<EOF
authorityKeyIdentifier=keyid,issuer
basicConstraints=CA:FALSE
keyUsage = digitalSignature, nonRepudiation, keyEncipherment, dataEncipherment
extendedKeyUsage = serverAuth
subjectAltName = @alt_names

[alt_names]
DNS.1 = localhost
EOF

openssl x509 -req -in localhost.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out localhost.crt -days 3650 -sha256 -extfile localhost.ext

sudo cp localhost.crt /etc/ssl/certs/
sudo cp localhost.key /etc/ssl/private/
sudo cp ca.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates

Chromium DevTools integration

Chromium DevTools workspace folders allow developers to work with local files directly in the browser. Integration can be set up in DevTools by adding the project root as a workspace folder. This can also be automated by serving a well-known directory containing a configuration file that DevTools recognizes. As seen from the browser, the directory structure should look like this:

/.well-known/appspecific/com.chrome.devtools.json

Example com.chrome.devtools.json:

{
  "workspace": {
    "root": "/home/user/projects/auth",
    "uuid": "53b029bb-c989-4dca-969b-835fecec3717"
  }
}
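
A file like the example above could be generated with a short Python helper. This is a sketch only: the function name and the `served_root` parameter (the directory that the development server exposes at `/`) are assumptions, and how IAM actually serves the file is not shown.

```python
import json
import uuid
from pathlib import Path


def write_devtools_config(served_root: Path, project_root: str) -> Path:
    """Write com.chrome.devtools.json under served_root/.well-known/appspecific/."""
    path = served_root / ".well-known" / "appspecific" / "com.chrome.devtools.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    config = {
        "workspace": {
            "root": project_root,  # Absolute path to the project on disk.
            "uuid": str(uuid.uuid4()),  # Random (version 4) UUID.
        }
    }
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return path
```

In practice you would likely generate the UUID once and keep it, rather than rewriting the file with a new UUID on every run.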

For details, see:

https://chromium.googlesource.com/devtools/devtools-frontend/+/main/docs/ecosystem/automatic_workspace_folders.md

Preparing the database on first install

sudo -u postgres createuser --pwprompt auth
sudo -u postgres createdb -O auth auth

Then update config.py with the password you set for the auth Postgres user.

Export Postgres DB to another server

Export:

sudo -su postgres pg_dump -U auth -h localhost auth > /tmp/auth-dump.sql

Import:

sudo -su postgres
psql -U auth -h localhost -c 'drop database if exists auth;'
psql -U auth -h localhost -c 'create database auth;'
psql -U auth -h localhost -c 'alter database auth owner to auth;'
psql -U auth -h localhost auth < /tmp/auth-dump.sql

Server settings

  • Ensure that the total number of Postgres connections that IAM can open is lower than the max_connections setting in Postgres, leaving some headroom for psql and other tools that may need to connect to the database for maintenance and debugging.
  • Too many connections will show as errors like psycopg2.OperationalError: FATAL: too many connections in the logs.
  • The default max_connections in Postgres is 100. It can be checked with SHOW max_connections; in psql, and updated in postgresql.conf if needed.
  • The max number of connections created by IAM is (pool_size + max_overflow) * workers
  • The number of workers is set in the /etc/systemd/system/auth.service file, and defaults to 2. The pool size and max overflow are set in config.py and default to 10 and 20, respectively, which means that each worker can create up to 30 connections, for a total of 60 connections at max load.
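
The arithmetic above can be checked with a small Python helper. The function names and the `headroom` parameter (connections reserved for psql and other tools) are hypothetical. The default values come from the settings listed above.

```python
def max_iam_connections(pool_size: int, max_overflow: int, workers: int) -> int:
    """Maximum connections IAM can open: (pool_size + max_overflow) * workers."""
    return (pool_size + max_overflow) * workers


def has_headroom(
    max_connections: int,
    pool_size: int,
    max_overflow: int,
    workers: int,
    headroom: int = 10,
) -> bool:
    """True if IAM at max load still leaves `headroom` connections free."""
    used = max_iam_connections(pool_size, max_overflow, workers)
    return used + headroom <= max_connections


if __name__ == "__main__":
    # Defaults from above: pool_size=10, max_overflow=20, workers=2.
    print(max_iam_connections(10, 20, 2))
    # Compare against the Postgres default of max_connections = 100.
    print(has_headroom(100, 10, 20, 2))
```

With the defaults this gives 60 connections, which fits within the Postgres default of 100. Doubling the workers to 4 would allow up to 120 connections and exceed it.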
