Configuration

May 2024 - ongoing

This page captures customizations we implemented after deployment to meet specific needs of the SEFSC. It will be built out over time as we continue to experiment and learn.

Connect to NODD

The NOAA Open Data Dissemination (NODD) Program hosts publicly available environmental data in the cloud for easy access by all end users. As of 2025, datasets on NODD are sorted by NOAA Line and/or Program Office and by type of data. The SEFSC NODD data bucket is a Google Cloud Platform (GCP) storage bucket and will be used to host underwater video data free of charge.[1]

One important advantage of cloud computing is that large volumes of data can be accessed from anywhere by any VM. This allows us to store our training data in NODD and process it using our internal VIAME-Web installation. To do this, we linked VIAME to our NODD bucket so that the two resources could communicate.
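As a quick check that the bucket is reachable from a given VM, a public Google Cloud Storage bucket can be listed anonymously through the Cloud Storage JSON API. The sketch below is a minimal illustration and not part of the SEFSC setup itself: the function names are hypothetical, and it assumes the bucket permits anonymous listing.

```python
import json
import urllib.parse
import urllib.request

# Base URL of the Cloud Storage JSON API.
GCS_API = "https://storage.googleapis.com/storage/v1"


def build_list_url(bucket, prefix="", max_results=10):
    """Build a JSON API URL that lists objects under a path prefix."""
    query = urllib.parse.urlencode({"prefix": prefix, "maxResults": max_results})
    return f"{GCS_API}/b/{urllib.parse.quote(bucket, safe='')}/o?{query}"


def list_objects(bucket, prefix="", max_results=10):
    """Return object names under the prefix.

    Needs network access and assumes the bucket allows anonymous listing.
    """
    with urllib.request.urlopen(build_list_url(bucket, prefix, max_results)) as resp:
        payload = json.load(resp)
    # "items" is absent from the response when nothing matches the prefix.
    return [item["name"] for item in payload.get("items", [])]
```

For example, `list_objects("nmfs_odp_sefsc", "PEMD/VIDEO_DATA/GOM_REEF_FISH/")` would request the first few object names under the reef fish video path described in the steps below.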

  1. Boot the virtual machines and connect via SSH.

  2. Log in to Girder, VIAME’s data management platform.

  3. Click “Admin console” in the menu on the left, then select “Assetstores”.

    Note

    You must have an administrator account to see and access this menu.

  4. Select “Create new Amazon S3 assetstore” from the bottom and enter the requested information.

    Note

    This menu name is misleading but is used to add any cloud storage bucket, regardless of cloud service provider. It does not necessarily need to be an Amazon S3 bucket.

    • Assetstore name: The name you want to use to refer to the assetstore. It can be anything.
    • S3 bucket name: The name of the bucket to be linked. This comes from the bucket itself and was assigned when the bucket was created. For example, the SEFSC NODD bucket is named nmfs_odp_sefsc.
    • Path prefix (optional): Use this to connect only selected data within the bucket being linked. “Prefix” refers to the directory path in which the desired data reside. For example, in the SEFSC NODD bucket, reef fish videos are stored in PEMD/VIDEO_DATA/GOM_REEF_FISH. This is the path prefix.
    • Access key ID: Used for private buckets. Not needed for public data buckets.
    • Secret access key: Used for private buckets. Not needed for public data buckets.
    • Service: Specifies the cloud service provider of the desired bucket. For GCP, this is https://storage.googleapis.com.
    • Region: The geographic region of the bucket being linked.

    If you only intend to read data in from the bucket (most common), be sure to select “Read only” at the bottom. Click “Create” when done.

  5. Click “Import data” on the newly added assetstore and enter the requested information.

    • Import path (key or directory): Where in the bucket the data are located. For the SEFSC NODD bucket, reef fish videos are stored in PEMD/VIDEO_DATA/GOM_REEF_FISH.
    • Destination type: “Folder”, “User”, or “Collection” as appropriate.
    • Destination ID (optional): Where the data should go.

    Click “Begin import” when done. This may take some time, depending on how much data are being imported.

When finished, the data should be visible within VIAME.
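The assetstore and import steps can in principle be scripted against Girder's REST API instead of the web interface. The sketch below only assembles the request parameters for `POST /assetstore` and `POST /assetstore/{id}/import`. The helper names, the example assetstore name, and the placeholder values are hypothetical, and the parameter names are assumptions that should be checked against the API documentation served by your own Girder installation before use.

```python
# Numeric code Girder uses for S3-type assetstores.
# Assumption: verify this value against your Girder version.
S3_ASSETSTORE_TYPE = 2


def assetstore_params(name, bucket, prefix="", region="",
                      service="https://storage.googleapis.com", read_only=True):
    """Fields mirroring the "Create new Amazon S3 assetstore" form."""
    return {
        "name": name,
        "type": S3_ASSETSTORE_TYPE,
        "bucket": bucket,
        "prefix": prefix,
        "accessKeyId": "",  # left empty for public data buckets
        "secret": "",       # left empty for public data buckets
        "service": service,
        "region": region,
        "readOnly": "true" if read_only else "false",
    }


def import_params(import_path, destination_type, destination_id):
    """Fields mirroring the "Import data" form."""
    allowed = ("folder", "user", "collection")
    if destination_type not in allowed:
        raise ValueError(f"destination_type must be one of {allowed}")
    return {
        "importPath": import_path,
        "destinationType": destination_type,
        "destinationId": destination_id,
    }


# Sending the requests, assuming the girder-client package is installed
# and an administrator API key is available (placeholders in <>):
#
#   from girder_client import GirderClient
#   gc = GirderClient(apiUrl="https://<your-viame-host>/api/v1")
#   gc.authenticate(apiKey="<admin API key>")
#   store = gc.post("assetstore", parameters=assetstore_params(
#       "sefsc-nodd", "nmfs_odp_sefsc", prefix="PEMD/VIDEO_DATA/GOM_REEF_FISH"))
#   gc.post(f"assetstore/{store['_id']}/import", parameters=import_params(
#       "PEMD/VIDEO_DATA/GOM_REEF_FISH", "folder", "<destination folder ID>"))
```

Keeping parameter construction separate from the network calls makes it possible to review exactly what would be sent before touching a production server.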

Footnotes

  1. This remains a work in progress. Check back periodically for new data.