3 Data Sources Overview
3.1 Canadian Outline
3.1.1 Source and Description
The dataset, titled Canada Outline, was generated to provide a simplified polygon of Canada’s national boundaries. It was produced from the GADM Level 1 administrative boundary data for Canada, with the geometry simplified for efficient large-scale spatial analysis (Global Administrative Areas, 2023). The resulting GeoPackage file represents Canada and its provinces and territories with reduced geometric complexity, which makes it suitable for mapping and spatial analysis tasks that require simplified boundaries.
3.1.2 Processing Steps
The processed data was generated using the `prc_canada_outline` script. The following steps were undertaken:
- Data Acquisition:
  - GADM Level 1 administrative boundary data for Canada was retrieved using the `geodata::gadm` function.
- Geometry Simplification:
  - The polygon geometry was simplified to reduce data complexity using the `terra::simplifyGeom` function with a tolerance of 0.1.
- Output Generation:
  - The simplified boundary dataset was saved as `can_1_simplified.gpkg` in GeoPackage format.
3.1.3 Processed Data Structure
The structure of the simplified dataset is as follows:
Column_Name | Description |
---|---|
gid_1 | Unique identifier for the first administrative level. |
gid_0 | Country code for Canada. |
country | Name of the country (Canada). |
name_1 | Name of the first administrative level (Province/Territory). |
varname_1 | Alternative or variant names for the region. |
nl_name_1 | Native language names for the region. |
type_1 | Type of administrative unit (Province/Territory). |
engtype_1 | English type name for the administrative unit. |
cc_1 | Internal code for the administrative region. |
hasc_1 | Hierarchical administrative subdivision code. |
iso_1 | ISO code for the administrative region. |
geom | Simplified polygon geometry for the region. |
3.2 Canadian Freshwater Fish Species
3.2.1 Source and Description
The dataset, titled FishBase Freshwater Species Checklist, was harvested from FishBase (Froese and Pauly, 2024). It provides information on freshwater fish species found across Canada, including essential taxonomic and ecological details such as species name, vernacular name, family, order, and occurrence status. Additional tables retrieved from FishBase provide species status (e.g., threat categories from the IUCN Red List of Threatened Species) and indicate whether each species is classified as a game or commercial species. The source data consists of HTML tables retrieved through web scraping and transformed into a structured format for analysis.
3.2.2 Processing Steps
The harvested raw data was processed using the `prc_freshwater_fish_canada` script. The following steps were performed:
- Data Cleaning:
  - Column names in all input files were standardized using the `janitor::clean_names` function.
  - Unnecessary columns, such as `name_in_country`, were removed.
- Field Selection and Renaming:
  - The `fish_base_name` column was renamed to `vernacular` for clarity.
- Integration:
  - Species occurrence data was joined with the additional tables for status, game species, and commercial species, using the `species` column as the key.
- Data Transformation:
  - A unique `species_id` was generated for each species record.
- Data Export:
  - The fully integrated and processed dataset was saved as `freshwater_fish_species_canada.csv`.
3.2.3 Processed Data Structure
The processed dataset retains the following structure:
Column_Name | Description |
---|---|
species_id | Unique identifier for each fish species entry. |
species | Scientific name of the fish species. |
vernacular | Common name of the fish species. |
order | Taxonomic order to which the species belongs. |
family | Taxonomic family to which the species belongs. |
occurrence | Occurrence status of the species in Canada (e.g., native). |
threat_category | Threat category assigned to the species, if applicable (e.g., Vulnerable (VU), Endangered (EN)). |
game | Indicator of whether the species is classified as a game species (1 = yes, NA = no). |
commercial | Indicator of whether the species is classified as a commercial species (1 = yes, NA = no). |
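The cleaning, renaming, integration, and ID steps of 3.2.2 can be sketched in Python as follows. This is a minimal illustration, not the R script: it assumes each input table has already been read into a list of dictionaries keyed by cleaned column names, and both the helper name `integrate_species` and the sequential integer form of `species_id` are hypothetical. Species absent from the game or commercial tables receive `None`, mirroring the `NA = no` convention in the table above.

```python
def integrate_species(occurrence, status, game, commercial):
    """Left-join status, game, and commercial flags onto occurrence rows by `species`.

    Hypothetical helper illustrating the steps in 3.2.2; not the project's R code.
    """
    # Lookup structures keyed by scientific name.
    threat_by_species = {row["species"]: row.get("threat_category") for row in status}
    game_species = {row["species"] for row in game}
    commercial_species = {row["species"] for row in commercial}

    integrated = []
    # Assumption: species_id is a sequential integer starting at 1.
    for species_id, row in enumerate(occurrence, start=1):
        # Drop name_in_country and rename fish_base_name -> vernacular.
        record = {
            ("vernacular" if key == "fish_base_name" else key): value
            for key, value in row.items()
            if key != "name_in_country"
        }
        species = record["species"]
        integrated.append({
            "species_id": species_id,
            **record,
            "threat_category": threat_by_species.get(species),
            # 1 = yes; None stands in for R's NA (= no).
            "game": 1 if species in game_species else None,
            "commercial": 1 if species in commercial_species else None,
        })
    return integrated
```

Dictionary and set lookups keep the join linear in the number of rows; in R the same result would typically come from successive `dplyr::left_join()` calls on `species`.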
3.3 Canadian Hydrological Features
3.3.1 Source and Description
The dataset, titled Atlas of Canada National Scale Data 1:1,000,000 - Waterbodies & Rivers, was harvested from Natural Resources Canada (Canada, 2022b, 2022a). This GIS dataset includes spatial and tabular data representing Canada’s hydrological features, organized as separate waterbody and river datasets compiled for national-scale mapping at 1:1,000,000. The processed dataset provides lake polygons and river lines, together with a representative point for each lake and river segment, for further spatial analyses.
3.3.2 Processing Steps
The harvested raw data was processed using the `prc_atlas_of_canada_hydrology` script. The following steps were performed:
- Unzipping: The compressed GDB files were extracted to a temporary directory for processing.
- Lake Processing:
  - Loaded the “AC_1M_Waterbodies” layer using the `sf::st_read` function.
  - Calculated the area and perimeter of each lake using `sf::st_area` and `lwgeom::st_perimeter_lwgeom`.
  - Filtered the lakes to retain only those with an area greater than 5 km².
  - Generated a unique `waterbody_id` for each lake and set its type (`wb_type`) to `lake`.
  - Retained the relevant columns: `waterbody_id`, `wb_type`, `name`, `name_fr`, `area`, and `perimeter`.
  - Transformed the data to CRS 4326 and exported it as `lakes_polygons.gpkg`.
  - Generated the centroid of each lake and exported the point layer as `lakes_points.gpkg`.
- River Processing:
  - Loaded the “AC_1M_Rivers_dense” layer using the `sf::st_read` function.
  - Calculated the length of each river segment using `sf::st_length`.
  - Filtered the river segments to retain only those longer than 10 km.
  - Generated a unique `waterbody_id` for each river segment and set its type (`wb_type`) to `river`.
  - Retained the relevant columns: `waterbody_id`, `wb_type`, `name`, `name_fr`, and `length`.
  - Transformed the data to CRS 4326 and exported it as `rivers_lines.gpkg`.
  - Sampled a point at the midpoint of each river segment, cast the result to points, and exported the layer as `rivers_points.gpkg`.
- Cleanup: Temporary files created during processing were deleted.
3.3.3 Processed Data Structure
The structure of the processed lake polygon dataset is as follows:
Column_Name | Description |
---|---|
waterbody_id | Unique identifier for each lake feature. |
wb_type | Type of waterbody, always ‘lake’ for this dataset. |
name | Name of the lake in English. |
name_fr | Name of the lake in French. |
area | Surface area of the lake in square kilometers. |
perimeter | Perimeter of the lake in kilometers. |
The structure of the processed lake point dataset is as follows:
Column_Name | Description |
---|---|
waterbody_id | Unique identifier for each lake feature. |
geom | Centroid geometry of the lake in geographic coordinates (CRS 4326). |
The structure of the processed river line dataset is as follows:
Column_Name | Description |
---|---|
waterbody_id | Unique identifier for each river segment. |
wb_type | Type of waterbody, always ‘river’ for this dataset. |
name | Name of the river in English. |
name_fr | Name of the river in French. |
length | Length of the river segment in kilometers. |
The structure of the processed river point dataset is as follows:
Column_Name | Description |
---|---|
waterbody_id | Unique identifier for each river segment. |
geom | Sampled point geometry at the mid-point of the river. |
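The size filters, ID assignment, and midpoint sampling of 3.3.2 can be sketched in Python as below. This is an illustration under stated assumptions, not the R script: the ID format (e.g. `lake_000001`) is hypothetical, areas and lengths are assumed to already be in km² and km, and `line_midpoint` measures planar distance, so it is only meaningful on projected coordinates. In sf, one common way to take a midpoint is `sf::st_line_sample(x, sample = 0.5)` followed by `sf::st_cast("POINT")`, although the script's exact calls are not documented here.

```python
import math


def select_waterbodies(features, measure, threshold, wb_type):
    """Keep features whose `measure` exceeds `threshold`, then tag them with an ID and type."""
    kept = [f for f in features if f[measure] > threshold]
    # Hypothetical ID format; the script's actual waterbody_id scheme may differ.
    return [
        {"waterbody_id": f"{wb_type}_{i:06d}", "wb_type": wb_type, **f}
        for i, f in enumerate(kept, start=1)
    ]


def line_midpoint(coords):
    """Return the point halfway along a polyline, measured by planar length."""
    segments = list(zip(coords, coords[1:]))
    lengths = [math.dist(a, b) for a, b in segments]
    remaining = sum(lengths) / 2
    for (a, b), seg_len in zip(segments, lengths):
        if remaining <= seg_len:
            # Interpolate within the segment that contains the halfway mark.
            t = remaining / seg_len if seg_len else 0.0
            return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        remaining -= seg_len
    # Degenerate input (single vertex) or floating-point leftovers.
    return tuple(coords[-1])
```

For example, `select_waterbodies(lakes, "area", 5, "lake")` reproduces the lake filter and `select_waterbodies(rivers, "length", 10, "river")` the river filter.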
3.4 Freshwater Fish Species Occurrences
3.4.1 Source and Description
The dataset, titled Global Freshwater Fish Species Occurrences Database, was retrieved from Figshare (Tedesco et al., 2017a). This comprehensive dataset compiles freshwater fish species occurrences aggregated at the drainage basin level. The data is associated with the data paper by Tedesco et al. (2017b) and is designed for analyzing global freshwater biodiversity patterns.
The dataset includes both spatial data (drainage basins) and tabular data (species occurrences). The drainage basin data covers 3,119 polygons worldwide, while the occurrence table provides over 110,000 records of species distributions across basins. The data has been validated using established taxonomic references such as FishBase.
3.4.2 Processing Steps
The harvested data was processed using the `prc_freshwater_fish_occurrences` script. The following steps were performed:
- Data Extraction:
  - The compressed dataset (`freshwater_fish_occurrences.zip`) was unzipped into a temporary directory.
- Drainage Basin Data Processing:
  - The shapefile `Basin042017_3119.shp` was loaded using the `sf::st_read` function.
  - Column names were cleaned and renamed to remove unnecessary prefixes using `janitor::clean_names` and `dplyr::rename_with`.
  - The cleaned data was exported as `bassins.gpkg` in GeoPackage format.
- Species Occurrence Data Processing:
  - The `Occurrence_Table.csv` file was loaded using the `vroom::vroom` function.
  - Column names were cleaned and renamed in the same way as for the drainage basin data.
  - Periods (`.`) in the `species_name_in_source` and `fishbase_valid_species_name` columns were replaced with spaces.
  - The processed data was exported as `occurrences.csv`.
3.4.3 Processed Data Structure
The structure of the processed drainage basin dataset (`bassins.gpkg`) is as follows:
Column_Name | Description |
---|---|
basin_name | Name of the drainage basin. |
country | Country where the basin is located. |
ecoregion | Ecoregion associated with the basin. |
endorheic | Indicator of whether the basin is endorheic (closed basin). |
out_longit | Longitude of the basin outlet. |
out_latit | Latitude of the basin outlet. |
med_longit | Median longitude of the basin. |
med_latit | Median latitude of the basin. |
surf_area | Surface area of the basin in square kilometers. |
geometry | Polygon geometry of the basin. |
The structure of the processed species occurrence dataset (`occurrences.csv`) is as follows:
Column_Name | Description |
---|---|
basin_name | Name of the drainage basin. |
species_name_in_source | Species name as recorded in the source. |
native_exotic_status | Status of the species in the basin (native or exotic). |
tsn_itis_code | Taxonomic Serial Number (TSN) from ITIS. |
fishbase_species_code | Species code from FishBase. |
fishbase_valid_species_name | Valid species name from FishBase. |
occurrence_status | Occurrence status of the species in the basin (valid or invalid). |
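The species-name fix in 3.4.2 amounts to a character substitution on two columns. A minimal Python sketch, assuming each occurrence row is a dictionary keyed by the cleaned column names (the helper name is hypothetical):

```python
SPECIES_COLUMNS = ("species_name_in_source", "fishbase_valid_species_name")


def restore_species_spaces(record, columns=SPECIES_COLUMNS):
    """Return a copy of `record` with periods replaced by spaces in species-name columns."""
    fixed = dict(record)  # leave the input row untouched
    for column in columns:
        value = fixed.get(column)
        if isinstance(value, str):  # skip missing values
            fixed[column] = value.replace(".", " ")
    return fixed
```

A blanket substitution like this also affects any legitimate periods (for example an abbreviation such as `sp.`), so it is worth checking the output for such cases.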
3.5 GBIF Species Occurrences
3.5.1 Source and Description
The dataset, titled GBIF Occurrence Data, was retrieved from the Global Biodiversity Information Facility (GBIF) (Global Biodiversity Information Facility, 2025). It compiles species occurrence records for the specified taxonomic groups and geographic region. The retrieved dataset includes spatial and tabular data for more than 850,000 species occurrence records across Canada, from 2000 onward.
Note that the processing steps described below show how the data is downloaded and processed programmatically. However, because of how GBIF manages data download requests, we elected to store the downloaded data in our secure Google Cloud Storage and to retrieve it programmatically from there rather than directly from GBIF.
3.5.2 Processing Steps
The harvested data was processed using the `dwn_gbif` and `prc_gbif` scripts. The following steps were performed:
- Data Download:
  - A query for the specified taxonomic groups and geographic boundaries was submitted to the GBIF API using the `rgbif` package.
- Data Extraction:
  - The compressed ZIP file was extracted into a temporary directory.
  - The occurrence data was read from `occurrence.txt` using the `vroom::vroom` function.
- Data Cleaning and Transformation:
  - Selected the relevant columns: `species`, `year`, `month`, `day`, `eventDate`, `decimalLatitude`, `decimalLongitude`, and `lifeStage`.
  - Converted the cleaned data to a spatial data frame using the `sf::st_as_sf` function with geographic coordinates (CRS 4326).
- Export:
  - The cleaned and spatially enabled dataset was exported as `species_occurrences_gbif.gpkg` in GeoPackage format.
3.5.3 Processed Data Structure
The structure of the processed dataset (`species_occurrences_gbif.gpkg`) is as follows:
Column_Name | Description |
---|---|
species | Scientific name of the species. |
year | Year of the recorded occurrence. |
month | Month of the recorded occurrence. |
day | Day of the recorded occurrence. |
eventDate | Full date and time of the recorded occurrence. |
lifeStage | Life stage of the species at the time of the occurrence (if available). |
geometry | Point geometry of the occurrence in geographic coordinates (CRS 4326). |
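The extraction and column-selection steps of 3.5.2 can be sketched with Python's standard `csv` module. This is an illustration, not the `prc_gbif` script. It assumes `occurrence.txt` is tab-delimited with a header row, and it reads it with `csv.QUOTE_NONE` on the assumption that quote characters in GBIF text fields should be kept literally (worth verifying against your own download). Skipping records without coordinates is also an assumption; the source does not say how such records are handled.

```python
import csv

KEEP = ["species", "year", "month", "day", "eventDate",
        "decimalLatitude", "decimalLongitude", "lifeStage"]


def read_gbif_occurrences(handle):
    """Yield GBIF records reduced to KEEP, with a (longitude, latitude) point geometry.

    `handle` is any iterable of text lines, e.g. an open occurrence.txt file.
    """
    # Tab-delimited; QUOTE_NONE keeps stray quote characters literal.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        record = {column: row.get(column, "") for column in KEEP}
        lat = record.pop("decimalLatitude")
        lon = record.pop("decimalLongitude")
        if not lat or not lon:
            continue  # assumption: records without coordinates are skipped
        # x = longitude, y = latitude, the order used for CRS 4326 points in sf.
        record["geometry"] = (float(lon), float(lat))
        yield record
```

Because it is a generator, the function streams the file row by row, which matters for a download of this size.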
3.6 National Hydro Network
3.6.1 Source and Description
The dataset, titled National Hydro Network (NHN) GeoBase Series, was retrieved from Natural Resources Canada (Canada, 2022c). This dataset provides a comprehensive geometric description and a set of basic attributes describing Canada’s inland surface waters. The data includes lakes, reservoirs, rivers, canals, drainage networks, and associated features.
3.6.2 Processing Steps
The NHN data was processed using the `prc_national_hydro_network` script. The following steps were performed:
- Data Extraction:
  - The compressed GeoPackage dataset (`rhn_nhn_decoupage.gpkg.zip`) was unzipped into a temporary directory.
- Watershed Data Processing:
  - The GeoPackage file (`rhn_nhn_decoupage.gpkg`) was read using the `sf::st_read` function.
  - All geometries were cast to polygons using `sf::st_cast("GEOMETRYCOLLECTION")` followed by `sf::st_collection_extract("POLYGON")`.
  - The geometries were simplified using `sf::st_simplify(dTolerance = 100)` to reduce complexity while preserving their overall shape.
  - A unique `watershed_id` was generated for each feature using the `dplyr::mutate` function.
  - Only the `watershed_id` and geometry columns were retained.
- Export:
  - The processed watershed data was exported as `watersheds.gpkg` in GeoPackage format.
3.6.3 Processed Data Structure
The structure of the processed dataset (`watersheds.gpkg`) is as follows:
Column_Name | Description |
---|---|
watershed_id | Unique identifier for each watershed feature. |
geom | Polygon geometry of the watershed in geographic coordinates (CRS 4326). |
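To show what a distance tolerance such as `dTolerance = 100` does, here is a plain-Python sketch of the Douglas-Peucker algorithm on a single open line. As we understand it, `sf::st_simplify()` delegates to GEOS for projected data, and GEOS simplification is based on Douglas-Peucker; for geographic coordinates with s2 enabled, sf uses s2 instead. The sketch therefore illustrates the algorithm, not the exact code path the script runs. The tolerance is in the same units as the coordinates.

```python
import math


def _perpendicular_distance(pt, start, end):
    """Distance from `pt` to the infinite line through `start` and `end`."""
    (x, y), (x1, y1), (x2, y2) = pt, start, end
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:  # degenerate segment: fall back to point distance
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)


def douglas_peucker(points, tolerance):
    """Drop vertices that lie within `tolerance` of the simplified line."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    # Find the interior vertex farthest from the start-end chord.
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = _perpendicular_distance(points[i], start, end)
        if d > max_dist:
            max_dist, index = d, i
    if max_dist > tolerance:
        # Keep that vertex and simplify each half recursively.
        left = douglas_peucker(points[: index + 1], tolerance)
        right = douglas_peucker(points[index:], tolerance)
        return left[:-1] + right  # avoid duplicating the split vertex
    return [start, end]
```

Endpoints are always kept, so a larger tolerance removes more interior vertices but never moves the line's ends. Polygon rings need extra care (GEOS handles them internally), and very long lines may hit Python's recursion limit.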
3.7 Ontario Freshwater Fishes Life History Database
The Ontario Freshwater Fishes Life History Database (Eakins, 2024) provides comprehensive life history information for 161 freshwater fish species in Ontario, Canada. The dataset includes 43 characteristics per species, covering taxonomic, ecological, reproductive, and habitat-related information. It also provides a bibliography of references used to compile this data.
3.7.1 Source and Accessibility
- Source: Ontario Freshwater Fishes Life History Database
- Accessibility: Open Government Licence - Canada
- Data Type: HTML (web-scraped)
- Coverage: Freshwater systems in Ontario, Canada
- Geographic Coverage: Ontario (bounding box: -95.1539, 41.6770, -74.3435, 56.8595)
- Temporal Coverage: 2024
- Processing Script: `dwn_ontario_freshwater_fishes_life_history.R`
- Output Files: `ontario_fishes_characteristics.csv`, `ontario_fishes_references.csv`
3.7.2 Processing Steps
Data was harvested from the website by scraping the individual detail pages of 161 species, using the `rvest` package in R to extract structured data for each species. The main steps were:
- Characteristics Extraction: Scraped 43 life history attributes, such as family, common name, habitat preference, spawning season, thermal regime, and abundance.
- References Compilation: Extracted all bibliographic references associated with each species.
- Data Cleaning: Cleaned the raw data and pivoted it into a tidy format using `tidyr::pivot_wider()` and `janitor::clean_names()`.
- Export: Saved the processed data into two CSV files:
  - `ontario_fishes_characteristics.csv`: a structured table of life history characteristics.
  - `ontario_fishes_references.csv`: a table of bibliographic references for each species.
3.7.3 Processed Data Structure
The characteristics table (`ontario_fishes_characteristics.csv`) includes the following key variables:
Field | Description |
---|---|
species_name | Scientific name of the species. |
family | Taxonomic family, including common name. |
species | Species name, including genus. |
taxonomic_authority | Author of the taxonomic classification. |
common_name_s | Common name in English. |
french_name | Common name in French. |
ontario_origin | Origin status in Ontario (e.g., native, introduced). |
general_abundance | General abundance in freshwater systems. |
thermal_regime | Thermal regime preference (coldwater, coolwater, warmwater). |
habitat_preference | Preferred habitats for the species. |
spawning_season | Season of spawning activity. |
fecundity | Number of eggs produced by the species. |
adult_length_cm | Length of adult individuals in centimeters. |
maximum_length_cm | Maximum recorded length of the species in centimeters. |
lifespan_yrs | Expected lifespan in years. |

The structure of the references table (`ontario_fishes_references.csv`) is as follows:
Field | Description |
---|---|
species | Scientific name of the species associated with the reference. |
reference | Reference citation for the data. |
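The pivot step in 3.7.2 turns scraped attribute/value pairs into one row per species. The Python sketch below illustrates this under the assumption that scraping yields `(species, attribute label, value)` triples. `clean_name` is a rough approximation of `janitor::clean_names()` (it does not transliterate accents or split camelCase, for instance), and the example labels in the test are hypothetical.

```python
import re


def clean_name(label):
    """Rough stand-in for janitor::clean_names(): lower-case snake_case."""
    return re.sub(r"[^0-9a-z]+", "_", label.strip().lower()).strip("_")


def pivot_wider(records):
    """Turn (species, attribute, value) triples into one dict per species."""
    wide = {}
    for species, attribute, value in records:
        # First time a species is seen, start its row; dicts keep insertion order.
        row = wide.setdefault(species, {"species_name": species})
        row[clean_name(attribute)] = value
    return list(wide.values())
```

With this rule, a label such as `Common Name(s)` becomes `common_name_s`, which matches the field naming in the characteristics table.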
3.8 FishPass Database
The FishPass Database (Benoit et al., 2023) provides a comprehensive collection of biological attributes that influence fish movement and passage, with a particular focus on species of the Laurentian Great Lakes. It is intended to support the design of selective fish passage systems, which aim to maintain connectivity while limiting the spread of invasive species. It includes 21 biological attributes covering the phenology, morphology, physiology, and behavioral characteristics of 220 fish species. Data coverage varies across species and attributes, revealing knowledge gaps for behavioral traits and for potential invasive species.
3.8.1 Source and Accessibility
- Source: Dryad Repository
- DOI: 10.5061/dryad.fqz612jwj
- Accessibility: CC0 Public Domain Dedication
- Data Type: Tabular CSV files
- Coverage: Laurentian Great Lakes, North America
- Processing Script: `prc_fishpass.R`
- Output Files: `fishpass_behaviour.csv`, `fishpass_morphology.csv`, `fishpass_phenology.csv`, `fishpass_physiology.csv`
3.8.2 Processing Steps
- Data Cleaning:
  - Column names were standardized using `janitor::clean_names()`.
  - Missing or malformed data were handled appropriately.
- Data Structuring:
  - Each CSV file was parsed into tidy data tables, retaining only relevant fields.
  - Attributes such as migratory status, body morphology, spawning season, and vertical stationing were included.
- Export:
  - Cleaned data tables were saved into separate CSV files for each attribute dimension, enabling modular analysis.
3.8.3 Processed Data Structure
The cleaned dataset is divided into four tables, each containing data for 220 species. The structure of the behaviour table (`fishpass_behaviour.csv`) is as follows:
Field | Description |
---|---|
order | Taxonomic order of the species. |
family | Taxonomic family of the species. |
genus | Genus of the species. |
scientific_name | Scientific name of the species. |
common_name | Common name of the species. |
vertical_station | Vertical stationing behavior (e.g., demersal, pelagic). |
schooling_behaviour | Tendency for schooling behavior (e.g., schooling, non-schooling). |
reference_vs | Reference for vertical station data. |
reference_sb | Reference for schooling behavior data. |

The structure of the morphology table (`fishpass_morphology.csv`) is as follows:
Field | Description |
---|---|
order | Taxonomic order of the species. |
family | Taxonomic family of the species. |
genus | Genus of the species. |
scientific_name | Scientific name of the species. |
common_name | Common name of the species. |
maximum_total_length_cm | Maximum recorded total length (cm). |
body_shape | Overall body shape (e.g., fusiform, elongated). |
aspect_ratio | Aspect ratio of the fins. |
eye_size_percent_hl | Eye size as a percentage of head length. |
reference_mtl | Reference for maximum total length data. |

The structure of the phenology table (`fishpass_phenology.csv`) is as follows:
Field | Description |
---|---|
order | Taxonomic order of the species. |
family | Taxonomic family of the species. |
genus | Genus of the species. |
scientific_name | Scientific name of the species. |
common_name | Common name of the species. |
migratory_status | Whether the species is migratory or non-migratory. |
spatial_scale_of_movement | Scale of spatial movement (e.g., diadromous, potamodromous). |
spawning_frequency | Frequency of spawning (e.g., iteroparous). |
spring_spawner | Indicates if the species spawns in spring (binary). |
reference_sf | Reference for spawning frequency data. |

The structure of the physiology table (`fishpass_physiology.csv`) is as follows:
Field | Description |
---|---|
order | Taxonomic order of the species. |
family | Taxonomic family of the species. |
genus | Genus of the species. |
scientific_name | Scientific name of the species. |
common_name | Common name of the species. |
climbing_ability | Ability to climb barriers (binary). |
hearing_specialization | Hearing specialization (binary). |
trophic_level | Trophic level of the species. |
presence_of_ampullary_electroreceptors | Presence of ampullary electroreceptors (binary). |
reference_ca | Reference for climbing ability data. |
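The "retain only relevant fields, one file per attribute dimension" steps of 3.8.2 can be sketched in Python as follows. The column lists are copied from the four tables above. Their assignment to the behaviour, morphology, phenology, and physiology files is inferred from the order of the tables and the output file names, so treat that mapping as an assumption. The helper names are hypothetical.

```python
import csv

TAXONOMY = ["order", "family", "genus", "scientific_name", "common_name"]

# Attribute columns per output file (mapping to file names is an assumption).
DIMENSIONS = {
    "behaviour": ["vertical_station", "schooling_behaviour", "reference_vs", "reference_sb"],
    "morphology": ["maximum_total_length_cm", "body_shape", "aspect_ratio",
                   "eye_size_percent_hl", "reference_mtl"],
    "phenology": ["migratory_status", "spatial_scale_of_movement", "spawning_frequency",
                  "spring_spawner", "reference_sf"],
    "physiology": ["climbing_ability", "hearing_specialization", "trophic_level",
                   "presence_of_ampullary_electroreceptors", "reference_ca"],
}


def select_fields(rows, dimension):
    """Keep only the taxonomy columns plus the columns of one attribute dimension."""
    columns = TAXONOMY + DIMENSIONS[dimension]
    # Missing columns become empty strings so every output row has the same shape.
    return [{column: row.get(column, "") for column in columns} for row in rows]


def write_dimension(rows, dimension, handle):
    """Write one dimension table as CSV to an open text handle (e.g. fishpass_<dimension>.csv)."""
    writer = csv.DictWriter(handle, fieldnames=TAXONOMY + DIMENSIONS[dimension])
    writer.writeheader()
    writer.writerows(select_fields(rows, dimension))
```

Repeating the taxonomy columns in every file lets each table be analyzed on its own, which matches the modular design described above.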
3.9 North American Freshwater Migratory Fish Database (NAFMFD)
The North American Freshwater Migratory Fish Database (NAFMFD) (Dean et al., 2021, 2022) synthesizes comprehensive data on the migratory behavior of freshwater fishes across Canada, the United States, and Mexico. It includes information for 1,241 species spanning 79 families and 322 genera, characterizing migratory status, patterns, and behaviors.
3.9.1 Source and Accessibility
- Source: U.S. Geological Survey ScienceBase-Catalog
- DOI: 10.5066/P9WDLLP0
- Accessibility: Public Domain
- Data Type: Excel Spreadsheet and Metadata (XML)
- Coverage: North America
- Processing Script: `prc_north_american_freshwater_migratory_fish_database.R`
- Output Files: `north_american_freshwater_migratory_fish_database.csv`
3.9.2 Processing Steps
The data was harvested from the ScienceBase-Catalog as two files: an Excel spreadsheet containing species information and a metadata XML file. The processing involved the following steps:
- Data Extraction:
  - The Excel file was read using the `readxl` package.
  - Column names were standardized using `janitor::clean_names()` for consistency.
- Data Cleaning:
  - Removed unnecessary whitespace and formatted data fields for analysis.
- Export:
  - The processed data was saved as a CSV file for ease of use in further analyses.
3.9.3 Processed Data Structure
The final processed dataset contains 28 fields and 2,198 rows. Key variables include species taxonomy, migratory behavior, and references for data sources.
Field | Description |
---|---|
itis_family | Integrated Taxonomic Information System (ITIS) identifier for the family. |
family_name | Taxonomic family name of the species. |
scientific_name | Scientific name of the species. |
common_name | Common name of the species. |
migratory | Indicator of whether the species is migratory (1 = yes, 0 = no). |
anadromous | Indicator of anadromous migratory behavior (1 = yes, 0 = no). |
catadromous | Indicator of catadromous migratory behavior (1 = yes, 0 = no). |
potamodromous | Indicator of potamodromous migratory behavior (1 = yes, 0 = no). |
diadromous | Indicator of diadromous migratory behavior (1 = yes, 0 = no). |
suspected_migrant | Indicator of suspected migratory status (1 = yes, 0 = no). |
non_migratory | Indicator of non-migratory status (1 = yes, 0 = no). |
reference | Reference for data source and assignment. |
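The whitespace cleanup in 3.9.2 and the 1/0 indicator columns above can be illustrated with a short Python sketch. It assumes each spreadsheet row has been read into a dictionary. Collapsing internal whitespace (similar to `stringr::str_squish()`) and treating blank cells as missing are assumptions about what "removed unnecessary whitespace" covers. `migration_types` is a hypothetical usage helper for reading the indicator columns.

```python
MIGRATION_FLAGS = ("anadromous", "catadromous", "potamodromous", "diadromous")


def squish(value):
    """Trim leading/trailing whitespace and collapse internal runs to one space."""
    if not isinstance(value, str):
        return value  # numbers and missing values pass through unchanged
    squished = " ".join(value.split())
    return squished or None  # assumption: blank cells become missing


def tidy_record(record):
    """Apply `squish` to every field of a spreadsheet row."""
    return {key: squish(value) for key, value in record.items()}


def migration_types(record):
    """List the migratory behaviours flagged with 1 (numeric or text) in a record."""
    return [flag for flag in MIGRATION_FLAGS if record.get(flag) in (1, "1")]
```

The membership test `in (1, "1")` accepts the flag whether the spreadsheet reader returns it as a number (including `1.0`) or as text.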
3.10 Roberge et al. (2002)
The Roberge et al. (2002) dataset summarizes associations between stream habitat characteristics and life history stages for 86 species and 13 additional subspecies/forms of freshwater fishes in British Columbia and Yukon. The dataset focuses on stream habitat requirements across four life stages: spawning, young-of-the-year, juvenile, and adult. It highlights significant knowledge gaps for species groups such as green sturgeon, minnows, smelts, ciscos, suckers, sculpins, lampreys, and sticklebacks, emphasizing the need for further research on the stream habitat requirements of these taxa.
3.10.1 Source and Accessibility
- Source: Fisheries and Oceans Canada, ScienceBase GCS
- Accessibility: Institutional License
- Data Type: Tabular Text File (`.txt`), PDF Report (`.pdf`)
- Coverage: British Columbia and Yukon, Canada
- Processing Script: `prc_roberge_2002.R`
- Output File: `roberge.csv`
3.10.2 Processing Steps
The data was extracted manually from the report (`Fs97-4-2611E.pdf`) to create the raw data file (`Roberge2002.txt`). The programmatic processing steps were:
- Data Extraction:
  - Imported the tabular text file with `read.delim()`.
  - Filled missing values in the habitat and life stage fields using `tidyr::fill()`.
- Data Restructuring:
  - Split the data by species group.
  - Renamed columns to meaningful variable names (e.g., `LifeStage`, `Value`).
  - Pivoted the columns for the different life stages (e.g., spawning, juvenile) into a long format using `tidyr::pivot_longer()`.
- Data Cleaning:
  - Replaced malformed characters and standardized names using `stringr` functions.
  - Removed rows with missing or invalid values.
- Export:
  - Saved the processed dataset as a CSV file (`roberge.csv`) for further analysis.
3.10.3 Processed Data Structure
The processed dataset contains 5,319 rows and six fields, summarizing habitat characteristics across different life stages for freshwater fish species.
Field | Description |
---|---|
name | Common name of the species. |
scientificname | Scientific name of the species. |
MigrationStrategy | Migration strategy (e.g., anadromous, potamodromous). |
characteristics | Stream habitat characteristics (e.g., depth, flow type). |
LifeStage | Life stage (e.g., Spawning, Juvenile). |
Value | Habitat value associated with the life stage. |
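The fill-down and wide-to-long steps of 3.10.2 can be sketched in Python as follows. This is a minimal illustration, not the R script. The life-stage column names in the test (`Spawning`, `Juvenile`) are hypothetical. Unlike `tidyr::pivot_longer()`, which keeps missing values by default, this `pivot_longer` drops empty values immediately, standing in for the later filtering step.

```python
def fill_down(rows, fields):
    """Carry the last non-empty value of each field downwards (like tidyr::fill)."""
    last_seen = {}
    filled = []
    for row in rows:
        row = dict(row)  # do not mutate the caller's rows
        for field in fields:
            if row.get(field) in (None, ""):
                row[field] = last_seen.get(field)
            else:
                last_seen[field] = row[field]
        filled.append(row)
    return filled


def pivot_longer(rows, stage_columns, names_to="LifeStage", values_to="Value"):
    """Turn one column per life stage into (LifeStage, Value) rows, dropping empty values."""
    long_rows = []
    for row in rows:
        # Columns that are not life stages identify the observation.
        id_part = {k: v for k, v in row.items() if k not in stage_columns}
        for stage in stage_columns:
            value = row.get(stage)
            if value not in (None, ""):
                long_rows.append({**id_part, names_to: stage, values_to: value})
    return long_rows
```

Filling before pivoting matters: in the source table a habitat characteristic is often written only once for a block of rows, and without the fill those rows would lose their label in the long format.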
3.11 Dahlke et al. (2020)
The Dahlke et al. (2020) dataset (Dahlke et al., 2020b) compiles experimental and imputed thermal tolerance data, thermal safety margins, and thermal responsiveness for various fish species and life stages. It supports research on fish survival under warming conditions and size-dependent oxygen supply constraints (Dahlke et al., 2020a).
3.11.1 Source and Accessibility
- Source: PANGAEA - Data Publisher for Earth & Environmental Science
- DOI: 10.1594/PANGAEA.917796
- Accessibility: CC BY 4.0
- Data Type: Excel Spreadsheets
- Coverage: Global
- Processing Script: `prc_dahlke_2020.R`
- Output Files: `experimental_imputed_tolerance.csv`, `thermal_safety_margins.csv`, `thermal_responsiveness.csv`, `thermal_tolerance.csv`
3.11.2 Processing Steps
The dataset was harvested and processed from four Excel files hosted on the PANGAEA repository. Each file covers a specific thermal aspect, and the processing involved the following steps:
- Loading Data:
  - Read data from the Excel files (`experimental_and_imputed_tolerance_data.xlsx`, `thermal_safety_margins.xlsx`, `thermal_responsiveness.xlsx`, and `thermal_tolerance.xlsx`) using `readxl`.
- Cleaning and Restructuring:
  - Standardized column names using `janitor::clean_names()`.
  - Pivoted wide-format sheets into long-format tables for easier integration.
  - Filled missing values and cleaned malformed characters.
- Data Integration:
  - Combined multiple sheets within a single Excel file to create unified datasets.
  - Merged species-specific attributes across life stages and realms.
- Exporting Processed Data:
  - Saved the cleaned datasets as CSV files using `vroom::vroom_write()` for further analysis.
3.11.3 Processed Data Structure
Field | Description |
---|---|
life_stage | Life stage of the species (e.g., Spawners, Juvenile). |
species | Species name following FishBase taxonomy. |
tmin_c | Minimum temperature (°C) tolerance. |
tmax_c | Maximum temperature (°C) tolerance. |
tmid_c | Midpoint temperature (°C) between Tmin and Tmax. |
trange_c | Range of temperature tolerance (°C). |
reference | Citation for the data source. |
Field | Description |
---|---|
species | Species name following FishBase taxonomy. |
lifestage | Life stage associated with the thermal response (e.g., Embryos, Juveniles). |
response | Type of response (e.g., Development Rate, Survival). |
trange_c | Range of temperature (°C) associated with the response. |
tmid_c | Midpoint temperature (°C) for the response. |
reference | Citation for the data source. |
Field | Description |
---|---|
species | Species name following FishBase taxonomy. |
realm | Habitat realm of the species (e.g., Freshwater, Marine). |
depth_spawners | Spawning depth range in meters. |
depths_embryos | Embryo depth range in meters. |
spawning_season | Seasonality of spawning (e.g., MJJ for May–July). |
Field | Description |
---|---|
thermal_tolerance | Thermal tolerance parameter (e.g., Tmax, Tmin). |
species_fish_base | Species name following FishBase taxonomy. |
latitude | Geographic latitude of observation. |
lifestage | Life stage of the species (e.g., Spawner, Embryo). |
tmax_c | Maximum temperature (°C) tolerance. |
tmin_c | Minimum temperature (°C) tolerance. |
trange_c | Range of temperature tolerance (°C). |
realm | Habitat realm of the species (e.g., Marine, Freshwater). |
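The column descriptions above imply two simple relations, \(T_{mid} = (T_{min} + T_{max})/2\) and \(T_{range} = T_{max} - T_{min}\). The source does not state that the values were derived this way, so the Python sketch below treats these relations as assumptions and uses them only as a consistency check on processed rows. The tolerance of 0.05 °C, which allows for rounding, is also an assumption.

```python
def thermal_summary(tmin_c, tmax_c):
    """Midpoint and width of a thermal tolerance window, in °C."""
    if tmax_c < tmin_c:
        raise ValueError("tmax_c must not be lower than tmin_c")
    return {"tmid_c": (tmin_c + tmax_c) / 2, "trange_c": tmax_c - tmin_c}


def check_row(row, tol=0.05):
    """True if the stored tmid_c/trange_c agree with tmin_c/tmax_c within `tol` °C."""
    expected = thermal_summary(row["tmin_c"], row["tmax_c"])
    return all(abs(row[key] - expected[key]) <= tol for key in expected)
```

Rows that fail the check are worth inspecting before analysis. They may reflect imputation, rounding, or a different definition in the source.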
3.12 FishBase
The FishBase dataset (Froese and Pauly, 2024) provides a comprehensive database of global fish species, covering taxonomic, ecological, and biological attributes. Using the `rfishbase` R package (Boettiger et al., 2012), this dataset was retrieved via API queries and includes species information, habitat details, growth patterns, diet, reproduction, and distribution data.
3.12.1 Source and Accessibility
- Source: FishBase API via `rfishbase`
- URL: https://www.fishbase.org
- Accessibility: FishBase Terms of Use
- Data Type: API Queries
- Coverage: Global
- Processing Script: `download_fishbase.R`
- Output Files: `species.csv`, `fecundity.csv`, `reproduc.csv`, `eggdev.csv`, `larvdyn.csv`, `fooditems.csv`, `ecology.csv`, `swimming.csv`, `spawning_traits.csv`, `spawning_phenology.csv`, `larvae_traits.csv`, `larvae_phenology.csv`, `larvaepresence_phenology.csv`
3.12.2 Data Retrieval
FishBase provides more than 200 tables, all of which were screened for relevance to the current project. The tables retained are listed below:
Table Name | Description |
---|---|
ecology | Includes ecological traits such as habitat preference, migratory behavior, and feeding mode. |
eggdev | Details on egg development stages and environmental influences. |
fecundity | Contains information on fish fecundity, including egg production and reproductive output. |
fooditems | Provides detailed information on specific food items consumed by fish. |
larvae | Contains larval traits and early life stage characteristics. |
larvaepresence | Information on the presence of larvae in different locations. |
larvdyn | Provides data on larval dynamics, including movement and growth. |
reproduc | Provides reproductive characteristics, including spawning type and parental care. |
spawning | Details spawning behavior, including seasonality, habitat, and locations. |
species | Main table containing taxonomic and biological information on fish species. |
swimming | Provides information on fish swimming capabilities, including speed and behavior. |
3.12.3 Processing Steps
The data were harvested through API queries to FishBase using `rfishbase`. The main processing steps included:
- Fetching Data:
  - Extracted tables covering species, fecundity, reproduction, egg development, larval dynamics, food items, ecology, and swimming behavior.
  - Queried FishBase for spawning, larval phenology, and larval presence phenology.
- Data Cleaning & Standardization:
  - Standardized column names with `janitor::clean_names()`.
  - Filtered species based on the reference list of freshwater fish species in Canada.
  - Removed redundant columns and transformed categorical variables for consistency.
- Data Structuring:
  - Merged multiple sources of reproductive data to create a unified `reproduc.csv` file.
  - Created long-format data for spawning and larval phenology to facilitate time-series analyses.
  - Consolidated habitat data for ecological profiling.
- Exporting Processed Data:
  - Saved all cleaned tables as CSV files for further analyses.
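The name cleaning and species filtering steps above can be sketched in base R. This is a minimal illustration, not the code in `download_fishbase.R`: `clean_names_basic()` is a hypothetical, simplified stand-in for `janitor::clean_names()` (which handles many more edge cases), the toy table and the `canada_ref` vector are invented, and the rename of `spec_code` to `species_id` is an assumption based on the field names used in the processed tables.

```r
# Hypothetical, simplified stand-in for janitor::clean_names():
# converts CamelCase and space-separated names to snake_case.
clean_names_basic <- function(df) {
  nm <- names(df)
  nm <- gsub("([a-z0-9])([A-Z])", "\\1_\\2", nm)  # SpecCode -> Spec_Code
  nm <- gsub("[^A-Za-z0-9]+", "_", nm)            # spaces/punctuation -> "_"
  nm <- gsub("^_+|_+$", "", nm)                   # trim stray underscores
  names(df) <- tolower(nm)
  df
}

# Toy FishBase-style table (values are illustrative only).
raw <- data.frame(
  SpecCode     = c(101, 102, 103),
  DietTroph    = c(3.1, 4.2, 2.8),
  `Food Troph` = c(3.0, NA, 2.9),
  check.names  = FALSE
)

# Hypothetical reference list of species codes for Canadian freshwater fishes.
canada_ref <- c(101, 103)

fb <- clean_names_basic(raw)
fb <- fb[fb$spec_code %in% canada_ref, ]              # keep reference species only
names(fb)[names(fb) == "spec_code"] <- "species_id"   # assumed rename
print(fb)
```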
3.12.4 Individual Tables
3.12.4.1 Ecology (`ecology`)
3.12.4.1.1 Explanation of Data Content
The FishBase Ecology table provides information on species habitat preferences, depth range, trophic levels, and associations with different aquatic environments.
- Trophic Levels (`diet_troph`, `food_troph`):
  - `diet_troph`: The trophic level of the species estimated from diet composition studies.
  - `food_troph`: The trophic level of the species estimated from records of individual food items.
- Habitat Zones:
  - Marine and Coastal Zones: Includes `neritic`, `supra_littoral_zone`, `saltmarshes`, `littoral_zone`, `tide_pools`, `intertidal`, `sub_littoral`, `oceanic`, `epipelagic`, `mesopelagic`, `bathypelagic`, `abyssopelagic`, and `hadopelagic`.
  - Freshwater and Transitional Habitats: Includes `estuaries`, `mangroves`, `marshes_swamps`, `stream`, `lakes`, and `caves`.
- Substrate Preferences:
  - Recorded substrate types include `soft_bottom`, `sand`, `coarse`, `fine`, `level`, `sloping`, `silt`, `mud`, `ooze`, `detritus`, `organic`, `hard_bottom`, `rocky`, `rubble`, `gravel`, `vegetation`, and `driftwood`.
These fields allow researchers to analyze species distributions across environments, their interactions within ecosystems, and their adaptability to changing conditions.
3.12.4.1.2 Processing
The `ecology.csv` file was generated using the following processing steps:
- Selection of Relevant Fields:
  - Chose species-related ecological fields, including habitat zones, trophic levels, and substrate preferences.
- Data Transformation:
  - Converted missing values (`NA`) to `0` where appropriate for habitat presence/absence.
  - Replaced `-1` values with `1` to indicate presence in binary habitat fields.
  - Applied `dplyr::mutate()` to ensure correct encoding of categorical variables.
- Filtering & Summarization:
  - Removed rows in which all habitat presence indicators were `0`, as these carry no habitat information.
  - Used `dplyr::group_by(spec_code)` and `dplyr::summarise()` to compute mean trophic levels and aggregate habitat presence across multiple observations.
- Exporting Cleaned Data:
  - The final dataset was written to `ecology.csv` for downstream analyses.
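A minimal base-R sketch of the recoding and per-species summary described above, for readers who want to follow the logic without `dplyr`. Assumptions: the toy records and the two habitat columns are illustrative, and habitat presence is aggregated as "present if any record reports it" (the text states only that presence was aggregated, not how).

```r
# Toy ecology records; FishBase flags presence as -1 (values illustrative).
eco <- data.frame(
  spec_code  = c(1, 1, 2, 3),
  diet_troph = c(3.2, 3.4, NA, 2.0),
  food_troph = c(3.0, NA, 4.1, 2.2),
  lakes      = c(-1, NA, -1, NA),
  stream     = c(NA, -1, 0, NA)
)
habitat_cols <- c("lakes", "stream")

# Recode habitat flags: NA -> 0 (absence), -1 -> 1 (presence).
for (col in habitat_cols) {
  x <- eco[[col]]
  x[is.na(x)] <- 0
  x[x == -1] <- 1
  eco[[col]] <- x
}

# Drop rows where every habitat indicator is 0.
eco <- eco[rowSums(eco[habitat_cols]) > 0, ]

# Mean that returns NA (not NaN) when a species has no values.
mean_or_na <- function(x) if (all(is.na(x))) NA_real_ else mean(x, na.rm = TRUE)

# One row per species: mean trophic levels; presence if any record shows it.
ecology <- do.call(rbind, lapply(split(eco, eco$spec_code), function(d) {
  data.frame(
    species_id = d$spec_code[1],
    diet_troph = mean_or_na(d$diet_troph),
    food_troph = mean_or_na(d$food_troph),
    lakes      = max(d$lakes),
    stream     = max(d$stream),
    fb_table   = "ecology"
  )
}))
print(ecology)
```

Taking the maximum of 0/1 flags is a simple "any" rule; a stricter rule (for example, requiring presence in most records) would need a different summary.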
3.12.4.1.3 Processed Data Structure
Field | Description |
---|---|
species_id | Unique species identifier |
diet_troph | Trophic level based on diet |
food_troph | Trophic level based on food intake |
neritic | Presence in neritic zones (binary) |
supra_littoral_zone | Presence in supra-littoral zones (binary) |
saltmarshes | Presence in saltmarshes (binary) |
littoral_zone | Presence in littoral zones (binary) |
tide_pools | Presence in tide pools (binary) |
intertidal | Presence in intertidal zones (binary) |
sub_littoral | Presence in sub-littoral zones (binary) |
caves | Presence in caves (binary) |
oceanic | Presence in oceanic zones (binary) |
epipelagic | Presence in epipelagic zones (binary) |
mesopelagic | Presence in mesopelagic zones (binary) |
bathypelagic | Presence in bathypelagic zones (binary) |
abyssopelagic | Presence in abyssopelagic zones (binary) |
hadopelagic | Presence in hadopelagic zones (binary) |
estuaries | Presence in estuaries (binary) |
mangroves | Presence in mangroves (binary) |
marshes_swamps | Presence in marshes and swamps (binary) |
cave_anchialine | Presence in anchialine caves (binary) |
stream | Presence in streams (binary) |
lakes | Presence in lakes (binary) |
cave | Presence in caves (binary) |
cave2 | Additional cave habitat information (binary) |
soft_bottom | Preference for soft bottom substrates (binary) |
sand | Preference for sand substrates (binary) |
coarse | Preference for coarse substrates (binary) |
fine | Preference for fine substrates (binary) |
level | Preference for level substrates (binary) |
sloping | Preference for sloping substrates (binary) |
silt | Preference for silt substrates (binary) |
mud | Preference for mud substrates (binary) |
ooze | Preference for ooze substrates (binary) |
detritus | Presence of detritus (binary) |
organic | Presence of organic matter (binary) |
hard_bottom | Preference for hard bottom substrates (binary) |
rocky | Preference for rocky habitats (binary) |
rubble | Preference for rubble habitats (binary) |
gravel | Preference for gravel habitats (binary) |
vegetation | Presence in vegetated habitats (binary) |
driftwood | Presence near driftwood (binary) |
fb_table | FishBase source table indicator |
3.12.4.2 Egg Development (`eggdev`)
3.12.4.2.1 Explanation of Data Content
The FishBase Egg Development table provides details on the environmental conditions under which fish eggs develop.
- Temperature Requirements (`temperature`):
  - Represents the mean recorded temperature at which eggs develop.
  - Understanding thermal tolerance at the egg stage is essential for studying species adaptations to different climatic conditions.
- Salinity Preferences (`freshwater`, `brackish`):
  - Freshwater (`freshwater`): Indicates whether a species’ eggs develop in freshwater conditions.
  - Brackish (`brackish`): Indicates whether eggs can develop in brackish environments.
  - These values help identify species that can tolerate a range of salinities, which is useful for habitat conservation and management.
3.12.4.2.2 Processing
The `eggdev.csv` file was generated using the following processing steps:
- Selection of Relevant Fields:
  - Extracted species-related egg development fields, including temperature and salinity preferences.
- Data Transformation:
  - Computed the mean temperature per species using `dplyr::group_by()` and `summarize()`.
  - Converted missing values (`NA`) to `0` where appropriate for salinity preferences.
  - Restructured the salinity data using `tidyr::pivot_wider()` to create binary presence/absence indicators for freshwater and brackish environments.
- Filtering & Summarization:
  - Removed species with no recorded temperature or salinity values.
  - Aggregated multiple records per species to avoid redundancy.
- Exporting Cleaned Data:
  - The final dataset was written to `eggdev.csv` for further analyses.
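The egg-development steps can be illustrated with a short base-R sketch, offered as an alternative to the `dplyr`/`tidyr` pipeline rather than a copy of it. It assumes, for illustration only, that raw salinity is stored in a single categorical column named `salinity`; the actual FishBase field layout may differ. The loop over salinity levels plays the role of `tidyr::pivot_wider()`, producing one 0/1 column per level.

```r
# Toy egg-development records (values and the `salinity` column are illustrative).
egg <- data.frame(
  spec_code   = c(10, 10, 11, 12),
  temperature = c(8, 10, 15, NA),
  salinity    = c("freshwater", "brackish", "freshwater", NA)
)

mean_or_na <- function(x) if (all(is.na(x))) NA_real_ else mean(x, na.rm = TRUE)

# One row per species with its mean egg-development temperature.
ids <- sort(unique(egg$spec_code))
eggdev <- data.frame(
  species_id  = ids,
  temperature = sapply(ids, function(id) mean_or_na(egg$temperature[egg$spec_code == id]))
)

# Widen salinity into binary indicators; a missing salinity counts as 0.
for (lvl in c("freshwater", "brackish")) {
  eggdev[[lvl]] <- sapply(ids, function(id) {
    as.integer(any(egg$salinity[egg$spec_code == id] == lvl, na.rm = TRUE))
  })
}

# Drop species with neither a temperature nor a salinity record.
no_data <- is.na(eggdev$temperature) & eggdev$freshwater == 0 & eggdev$brackish == 0
eggdev <- eggdev[!no_data, ]
eggdev$fb_table <- "eggdev"
print(eggdev)
```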
3.12.4.2.3 Processed Data Structure
Field | Description |
---|---|
species_id | Unique species identifier |
temperature | Mean recorded temperature during egg development |
freshwater | Presence in freshwater environments (binary) |
brackish | Presence in brackish environments (binary) |
fb_table | FishBase source table indicator |
3.12.4.3 Fecundity (`fecundity`)
3.12.4.3.1 Explanation of Data Content
The Fecundity Table in FishBase provides information on the reproductive capacity of fish species, specifically focusing on the number of eggs produced per spawning event.
- Fecundity Estimates (`fecundity_min`, `fecundity_max`):
  - Represent the estimated range of eggs produced per spawning event.
  - `fecundity_min`: The lowest recorded number of eggs produced.
  - `fecundity_max`: The highest recorded number of eggs produced.
  - These values help assess species’ reproductive potential and variability in spawning output.
- Fecundity Factors:
  - Although such covariates are not included in this dataset, fecundity is often linked to factors such as body size, reproductive strategy, and environmental conditions.
3.12.4.3.2 Processing
The `fecundity.csv` file was generated using the following processing steps:
- Selection of Relevant Fields:
  - Extracted species-related fecundity fields, including minimum and maximum fecundity estimates.
- Data Transformation:
  - Computed the mean fecundity values for species with multiple records using `dplyr::group_by()` and `summarize()`.
  - Ensured missing values (`NA`) were excluded from the mean calculations.
- Filtering & Summarization:
  - Removed species with no available fecundity data to retain only informative records.
- Exporting Cleaned Data:
  - The final dataset was written to `fecundity.csv` for further analyses.
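The fecundity summary reduces to a per-species mean that ignores missing values. The base-R sketch below uses invented toy values and a hypothetical helper, `mean_or_na()`, which returns `NA` rather than `NaN` for a species whose records are all missing, so that such species can be dropped cleanly afterwards.

```r
fec <- data.frame(
  spec_code     = c(5, 5, 6, 7),
  fecundity_min = c(1000, 3000, NA, NA),
  fecundity_max = c(5000, NA, 20000, NA)
)

# Mean excluding NA; NA (not NaN) when nothing was recorded.
mean_or_na <- function(x) if (all(is.na(x))) NA_real_ else mean(x, na.rm = TRUE)

ids <- sort(unique(fec$spec_code))
fecundity <- data.frame(
  species_id    = ids,
  fecundity_min = sapply(ids, function(id) mean_or_na(fec$fecundity_min[fec$spec_code == id])),
  fecundity_max = sapply(ids, function(id) mean_or_na(fec$fecundity_max[fec$spec_code == id])),
  fb_table      = "fecundity"
)

# Keep only species with at least one fecundity estimate.
fecundity <- fecundity[!(is.na(fecundity$fecundity_min) & is.na(fecundity$fecundity_max)), ]
print(fecundity)
```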
3.12.4.3.3 Processed Data Structure
Field | Description |
---|---|
species_id | Unique species identifier |
fecundity_min | Mean minimum recorded fecundity |
fecundity_max | Mean maximum recorded fecundity |
fb_table | FishBase source table indicator |
3.12.4.4 Food Items (`fooditems`)
3.12.4.4.1 Explanation of Data Content
The FishBase Food Items table describes the dietary composition of fish species, documenting the types of prey consumed and the life stages at which they are eaten. These data support analyses of feeding ecology, trophic interactions, and the role of fish species in food webs.
- Primary, Secondary, and Tertiary Food Items (`food_i`, `food_ii`, `food_iii`):
  - `food_i`: Broad food category such as detritus, nekton, or plankton.
  - `food_ii`: More specific classification within `food_i` (e.g., finfish under nekton, cephalopods under mollusks).
  - `food_iii`: The most specific food category, detailing the exact prey type (e.g., squids/cuttlefish, bony fish, carcasses).
- Prey and Predator Life Stages (`prey_stage`, `predator_stage`):
  - Specifies the developmental stages of both prey and predator (e.g., larval, juvenile, adult).
  - Indicates the life stage at which a fish species consumes specific types of prey.
3.12.4.4.2 Processing
The `fooditems.csv` file was generated using the following processing steps:
- Selection of Relevant Fields:
  - Extracted dietary data related to fish feeding habits, including primary (`food_i`), secondary (`food_ii`), and tertiary (`food_iii`) food items.
  - Included details on prey and predator life stages (`prey_stage`, `predator_stage`).
  - Retained geospatial fields (`country`, `longitude`, `latitude`) where available.
- Data Transformation:
  - Standardized food category names for consistency.
  - Ensured missing values (`NA`) were preserved where information was unavailable.
- Geospatial Information Handling:
  - Included location-based data where available to associate feeding habits with geographic distribution.
  - Maintained missing values for species with unreported feeding locations.
- Exporting Cleaned Data:
  - The final dataset was written to `fooditems.csv` for further analyses.
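The text does not specify how food category names were standardized. One plausible approach, sketched here under that assumption, is to trim and collapse whitespace, lower-case the labels, and turn empty strings into `NA` so that genuinely missing information stays missing. The helper `standardize_category()` and the toy records are hypothetical.

```r
# Hypothetical helper: normalise free-text category labels.
standardize_category <- function(x) {
  x <- trimws(x)                 # strip leading/trailing spaces
  x <- gsub("\\s+", " ", x)      # collapse internal runs of whitespace
  x[which(x == "")] <- NA        # empty strings become missing values
  tolower(x)
}

food <- data.frame(
  spec_code = c(80, 80, 81),
  food_i    = c(" Nekton", "nekton", "Zoobenthos"),
  food_iii  = c("bony  fish", "", NA)
)
cat_cols <- c("food_i", "food_iii")
food[cat_cols] <- lapply(food[cat_cols], standardize_category)
print(food)
```

Lower-casing makes labels comparable across records; if the original FishBase capitalization is wanted in outputs, a lookup table mapping cleaned keys back to display labels would be an alternative.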
3.12.4.4.3 Processed Data Structure
Field | Description |
---|---|
species_id | Unique species identifier |
country | Country where the food record was observed |
longitude | Longitude of the food observation |
latitude | Latitude of the food observation |
food_i | Primary food category (e.g., detritus, nekton, plankton) |
food_ii | Secondary food category (e.g., cephalopods, finfish) |
food_iii | Tertiary food category (e.g., squids/cuttlefish, bony fish) |
prey_stage | Life stage of the prey (e.g., larvae, juvenile, adult) |
predator_stage | Life stage of the predator (e.g., larvae, juvenile, adult) |
fb_table | FishBase source table indicator |
3.12.4.5 Larvae (`larvae`)
3.12.4.5.1 Explanation of Data Content
The FishBase Larvae table contains detailed information about the early life stages of fish species, including their developmental traits and seasonal presence.
- Larval Traits (`larvae_traits`):
  - Place of Development (`placeof_development`): Describes where larvae develop (e.g., planktonic, in a closed nest, on the substrate).
  - Larval Duration (`larval_duration_min`, `larval_duration_max`, `larval_duration_mod`): Duration of the larval stage in days, with minimum, maximum, and modal estimates.
  - Shape of Yolk Sac (`shapeofyolksac`): Describes the yolk sac morphology, which can influence larval survival.
  - Body Form (`body_form`): Indicates whether larvae are fusiform, elongated, or have another body shape.
- Larval Phenology (`larvae_phenology`):
  - Locality (`locality`): Provides the geographic region where larval presence was recorded.
  - Monthly Presence (`jan`–`dec`): Indicates the months in which larvae were observed (`1` = present, `0` = absent).
  - These data support the tracking of seasonal spawning patterns and larval dispersal dynamics.
3.12.4.5.2 Processing
The `larvae_traits.csv` and `larvae_phenology.csv` files were generated using the following processing steps:
- Selection of Relevant Fields:
  - For `larvae_traits.csv`: Selected species-related developmental traits, including duration, yolk sac shape, and body form.
  - For `larvae_phenology.csv`: Selected locality and monthly larval presence indicators.
- Data Transformation:
  - Standardized categorical variables such as `body_form` to ensure consistency (e.g., converting “eel-like” to “elongated”).
  - Converted missing values (`NA`) to `0` for monthly larval presence data.
  - Used `dplyr::mutate()` to standardize habitat descriptors.
- Filtering & Summarization:
  - Removed entries where no larval traits were recorded.
  - Ensured only species with confirmed larval presence were retained in `larvae_phenology.csv`.
- Exporting Cleaned Data:
  - The final datasets were written to `larvae_traits.csv` and `larvae_phenology.csv` for further analyses.
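A base-R sketch of the `body_form` recoding and the removal of trait-less records. The lookup table is hypothetical except for the "eel-like" to "elongated" mapping, which is the example given above; values not in the table are passed through (lower-cased) rather than discarded, and the toy records are invented.

```r
# Hypothetical lookup; only "eel-like" -> "elongated" comes from the text.
body_form_map <- c("eel-like" = "elongated", "elongate" = "elongated")

recode_body_form <- function(x, map) {
  key <- tolower(trimws(x))
  out <- unname(map[key])          # NA where the key is not in the map
  ifelse(is.na(out), key, out)     # unmapped values are kept as-is
}

traits <- data.frame(
  spec_code           = c(90, 91, 92),
  placeof_development = c("planktonic", NA, NA),
  larval_duration_min = c(20, 35, NA),
  body_form           = c("Eel-like", "fusiform", NA)
)

traits$body_form <- recode_body_form(traits$body_form, body_form_map)

# Remove entries with no larval trait recorded at all.
trait_cols <- c("placeof_development", "larval_duration_min", "body_form")
traits <- traits[rowSums(!is.na(traits[trait_cols])) > 0, ]
print(traits)
```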
3.12.4.5.3 Processed Data Structure
3.12.4.5.3.1 Larval Traits (`larvae_traits.csv`)
Field | Description |
---|---|
species_id | Unique species identifier |
placeof_development | Location of larval development (e.g., planktonic) |
larval_duration_min | Minimum larval duration (days) |
larval_duration_max | Maximum larval duration (days) |
larval_duration_mod | Modal larval duration (days) |
shapeofyolksac | Shape of yolk sac |
body_form | Larval body form (e.g., fusiform, elongated) |
fb_table | FishBase source table indicator |
3.12.4.5.3.2 Larval Phenology (`larvae_phenology.csv`)
Field | Description |
---|---|
species_id | Unique species identifier |
locality | Geographic location of larval observations |
jan-dec | Monthly indicators of larval presence (1 = present, 0 = absent) |
fb_table | FishBase source table indicator |
3.12.4.6 Larval Presence (`larvaepresence`)
3.12.4.6.1 Explanation of Data Content
The Larval Presence Table in FishBase provides information on the geographic and seasonal occurrence of fish larvae.
- Monthly Presence (`jan`–`dec`):
  - Indicates the months in which larvae were observed at a given location.
  - Values are binary (`1` = present, `0` = absent), providing insight into seasonal larval distributions.
3.12.4.6.2 Processing
The `larvaepresence_phenology.csv` file was generated using the following processing steps:
- Selection of Relevant Fields:
  - Retained species identification, geographic location, and monthly larval presence indicators.
- Data Transformation:
  - Converted missing values (`NA`) to `0`, indicating no recorded presence.
  - Replaced occurrences of `111` with `1` to ensure correct binary encoding.
  - Used `dplyr::mutate()` to standardize monthly presence values.
- Filtering & Summarization:
  - Removed species without any recorded larval presence.
  - Ensured that at least one month contained larval presence data before inclusion in the final dataset.
- Exporting Cleaned Data:
  - The final dataset was written to `larvaepresence_phenology.csv` for further analyses.
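The monthly recoding and filtering can be expressed compactly in base R. This sketch assumes the twelve month columns are named `jan` to `dec` (as in the processed table) and uses made-up records; the `111` to `1` replacement follows the step described above.

```r
months <- tolower(month.abb)  # "jan", "feb", ..., "dec"

# Toy larval-presence records: all months missing, then a few filled in.
lp <- data.frame(spec_code = c(20, 21, 22))
for (m in months) lp[[m]] <- NA_real_
lp$may <- c(111, NA, NA)
lp$jun <- c(111, 1, NA)

# NA -> 0 (no recorded presence); 111 -> 1 (presence).
for (m in months) {
  x <- lp[[m]]
  x[is.na(x)] <- 0
  x[x == 111] <- 1
  lp[[m]] <- x
}

# Keep records with presence in at least one month.
lp <- lp[rowSums(lp[months]) > 0, ]
lp$fb_table <- "larvaepresence"
print(lp)
```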
3.12.4.6.3 Processed Data Structure
Field | Description |
---|---|
species_id | Unique species identifier |
country | Country where the larval presence was observed |
longitude | Longitude of the observation site |
latitude | Latitude of the observation site |
jan-dec | Monthly indicators of larval presence (1 = present, 0 = absent) |
fb_table | FishBase source table indicator |
3.12.4.7 Larval Dynamics (`larvdyn`)
3.12.4.7.1 Explanation of Data Content
The FishBase Larval Dynamics table provides data on the environmental conditions affecting fish larvae, including ecosystem type, water temperature, and larval duration.
- Ecosystem (`ecosystem`):
  - Describes the habitat type where larval development occurs (e.g., Shelf, Freshwater, Marine).
  - Helps in assessing species adaptation to different environmental conditions.
- Temperature (`temperature`):
  - Represents the recorded water temperature (°C) where larvae develop.
  - Critical for studying species-specific thermal tolerances and climate change impacts.
- Larval Duration (`duration`):
  - Indicates the estimated number of days larvae remain in the larval stage.
  - Longer durations suggest extended planktonic phases, which can affect dispersal and survival.
3.12.4.7.2 Processing
The `larvdyn.csv` file was generated using the following processing steps:
- Selection of Relevant Fields:
  - Extracted ecosystem type, water temperature, and larval duration for each species.
- Data Transformation:
  - Ensured categorical values for `ecosystem` were standardized.
  - Retained missing values (`NA`) where temperature or duration was unreported.
  - Applied `dplyr::mutate()` to tag records with the FishBase table name.
- Filtering & Summarization:
  - Removed species without any recorded larval dynamics data.
  - Retained unique records per species to prevent duplication.
- Exporting Cleaned Data:
  - The final dataset was written to `larvdyn.csv` for further analyses.
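A minimal base-R sketch of the larval-dynamics clean-up: harmonize `ecosystem` labels, drop species with no larval data, remove exact duplicates, and tag the source table. The capitalization rule in the hypothetical helper `standardize_label()` is an assumption (the text says only that ecosystem values were standardized), and the records are illustrative.

```r
# Hypothetical rule: trim, lower-case, then capitalise the first letter.
standardize_label <- function(x) {
  x <- tolower(trimws(x))
  out <- paste0(toupper(substr(x, 1, 1)), substr(x, 2, nchar(x)))
  out[is.na(x)] <- NA            # paste0() would otherwise turn NA into "NANA"
  out
}

ld <- data.frame(
  spec_code   = c(30, 30, 30, 31, 32),
  ecosystem   = c("shelf", "Shelf ", "shelf", "freshwater", NA),
  temperature = c(12, 12, 14, NA, NA),
  duration    = c(40, 40, 35, 20, NA)
)
ld$ecosystem <- standardize_label(ld$ecosystem)

# Drop records with no ecosystem, temperature, or duration information.
value_cols <- c("ecosystem", "temperature", "duration")
ld <- ld[rowSums(!is.na(ld[value_cols])) > 0, ]

# Keep distinct records only, then tag the source table.
ld <- unique(ld)
ld$fb_table <- "larvdyn"
print(ld)
```

Standardizing labels before calling `unique()` matters: without it, "shelf" and "Shelf " would survive as two distinct records.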
3.12.4.7.3 Processed Data Structure
Field | Description |
---|---|
species_id | Unique species identifier |
ecosystem | Ecosystem where larval development occurs |
temperature | Water temperature (°C) at larval development |
duration | Duration of the larval stage (days) |
fb_table | FishBase source table indicator |
3.12.4.8 Reproduction (`reproduc`)
3.12.4.8.1 Explanation of Data Content
The FishBase Reproduction table provides information on the reproductive strategies of fish species, including fertilization method, spawning patterns, and parental care.
- Reproductive Mode (`repro_mode`):
  - Describes how sexes are structured in a species.
  - Common values include dioecism (separate sexes), hermaphroditism, and unisexual populations.
- Fertilization Type (`fertilization`):
  - Indicates whether fertilization occurs externally (in water) or internally (within the body).
- Mating System (`mating_system`):
  - Provides details on the mating strategy, although this field is often unreported.
- Spawning Pattern (`spawning`):
  - Describes the seasonal and geographic variation in spawning behavior.
  - Some species spawn throughout the year, while others have distinct reproductive seasons.
- Batch Spawning (`batch_spawner`):
  - `0` = Does not spawn in batches.
  - `1` = Produces eggs in multiple spawning events.
- Reproductive Guild (`rep_guild1`, `rep_guild2`):
  - Classifies species based on their spawning and parental care behaviors.
  - Primary guild (`rep_guild1`): Nonguarders, guarders, bearers.
  - Secondary guild (`rep_guild2`): More specific categorization, such as substratum egg scatterers or brood hiders.
- Parental Care (`parental_care`):
  - Indicates whether a species provides care for its eggs or offspring.
  - Values include none, guarding, mouthbrooding, or live-bearing behaviors.
3.12.4.8.2 Processing
The `reproduc.csv` file was generated using the following processing steps:
- Selection of Relevant Fields:
  - Extracted key reproductive attributes such as reproductive mode, fertilization type, and parental care.
- Data Transformation:
  - Standardized categorical variables (`repro_mode`, `fertilization`, `rep_guild1`, `rep_guild2`, `parental_care`) to ensure consistency.
  - Ensured missing values (`NA`) remained where information was unreported.
- Filtering & Summarization:
  - Removed species without any recorded reproductive data.
- Exporting Cleaned Data:
  - The final dataset was written to `reproduc.csv` for further analyses.
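The standardization and filtering of the reproductive fields could look like the base-R sketch below. It assumes, for illustration, that "standardized" means trimming whitespace and lower-casing labels; the records are invented, and only three of the categorical columns are shown.

```r
repro <- data.frame(
  spec_code     = c(40, 41, 42),
  repro_mode    = c("Dioecism", " dioecism", NA),
  fertilization = c("External", "external", NA),
  parental_care = c("None", "none ", NA)
)
cat_cols <- c("repro_mode", "fertilization", "parental_care")

# Harmonise spelling and case; NA stays NA (unreported information).
repro[cat_cols] <- lapply(repro[cat_cols], function(x) tolower(trimws(x)))

# Remove species with no reproductive information at all.
repro <- repro[rowSums(!is.na(repro[cat_cols])) > 0, ]
repro$fb_table <- "reproduc"
print(repro)
```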
3.12.4.8.3 Processed Data Structure
Field | Description |
---|---|
species_id | Unique species identifier |
repro_mode | Reproductive mode (e.g., dioecism, hermaphroditism) |
fertilization | Fertilization type (external or internal) |
mating_system | Mating strategy |
spawning | Spawning pattern (e.g., seasonal, year-round) |
batch_spawner | Indicator of batch spawning (1 = Yes, 0 = No) |
rep_guild1 | Primary reproductive guild classification |
rep_guild2 | Secondary reproductive guild classification |
parental_care | Type of parental care (e.g., none, mouthbrooding) |
fb_table | FishBase source table indicator |
3.12.4.9 Spawning (`spawning`)
3.12.4.9.1 Explanation of Data Content
The FishBase Spawning table provides detailed information on the reproductive seasonality and environmental requirements of fish species.
- Spawning Traits (`spawning_traits`):
  - Temperature Requirements (`temp_low`, `temp_high`): Indicates the range of temperatures (°C) at which spawning occurs.
  - Fecundity Estimates (`fecundity_min`, `fecundity_max`): Represents the estimated range of eggs produced per spawning event.
  - Spawning Cycles (`spawning_cycles`): Number of spawning events per year.
  - Spawning Habitats (`coastal`, `lacustrine`, `riverine`, `estuarine`): Indicates the type of environment where spawning takes place.
- Spawning Phenology (`spawning_phenology`):
  - Geographic Information (`country`, `longitude`, `latitude`): Location details of where spawning was observed.
  - Monthly Spawning Presence (`jan`–`dec`): Binary indicators (`1` = active spawning, `0` = no activity) showing seasonal trends in spawning behavior.
3.12.4.9.2 Processing
The `spawning_traits.csv` and `spawning_phenology.csv` files were generated using the following processing steps:
- Selection of Relevant Fields:
  - Extracted environmental and reproductive attributes for `spawning_traits.csv`.
  - Retained geographic and temporal indicators for `spawning_phenology.csv`.
- Data Transformation:
  - Computed mean temperature and fecundity values per species using `dplyr::group_by()` and `summarize()`.
  - Converted missing values (`NA`) to `0` for monthly spawning indicators.
  - Standardized habitat types using `tidyr::pivot_wider()` to create binary presence/absence indicators.
- Filtering & Summarization:
  - Retained only species with at least one recorded spawning event.
  - Removed records where all monthly spawning indicators were `0`.
- Exporting Cleaned Data:
  - The final datasets were written to `spawning_traits.csv` and `spawning_phenology.csv` for further analyses.
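The per-species spawning summary can be sketched in base R, with `table()` standing in for `tidyr::pivot_wider()` to produce one 0/1 column per habitat. The long `habitat` column in the toy input is an assumption about the raw layout, and all values are illustrative. Fixing the factor levels ensures that a habitat column (here `estuarine`) is created even when no record mentions it.

```r
sp <- data.frame(
  spec_code = c(50, 50, 51, 51, 51),
  habitat   = c("lacustrine", "riverine", "riverine", "riverine", "coastal"),
  temp_low  = c(4, 6, 8, NA, 10),
  temp_high = c(10, 12, 14, NA, 16)
)
hab_levels <- c("coastal", "lacustrine", "riverine", "estuarine")

mean_or_na <- function(x) if (all(is.na(x))) NA_real_ else mean(x, na.rm = TRUE)

# Counts per species x habitat, collapsed to presence (1) / absence (0).
species_f <- factor(sp$spec_code)
tab <- table(species_f, factor(sp$habitat, levels = hab_levels))
ind <- matrix(as.integer(tab > 0), nrow = nrow(tab), dimnames = dimnames(tab))

# Species order is shared by the table rows and the tapply() results.
spawning_traits <- data.frame(
  species_id = as.numeric(rownames(ind)),
  temp_low   = as.numeric(tapply(sp$temp_low, species_f, mean_or_na)),
  temp_high  = as.numeric(tapply(sp$temp_high, species_f, mean_or_na)),
  ind,
  fb_table   = "spawning",
  row.names  = NULL
)
print(spawning_traits)
```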
3.12.4.9.3 Processed Data Structure
3.12.4.9.3.1 Spawning Traits (`spawning_traits.csv`)
Field | Description |
---|---|
species_id | Unique species identifier |
temp_low | Minimum temperature (°C) at which spawning occurs |
temp_high | Maximum temperature (°C) at which spawning occurs |
fecundity_min | Minimum estimated fecundity |
fecundity_max | Maximum estimated fecundity |
spawning_cycles | Number of spawning events per year |
coastal | Presence in coastal spawning habitats (binary) |
lacustrine | Presence in lake spawning habitats (binary) |
riverine | Presence in river spawning habitats (binary) |
estuarine | Presence in estuarine spawning habitats (binary) |
fb_table | FishBase source table indicator |
3.12.4.9.3.2 Spawning Phenology (`spawning_phenology.csv`)
Field | Description |
---|---|
species_id | Unique species identifier |
country | Country where spawning presence was observed |
longitude | Longitude of the observation site |
latitude | Latitude of the observation site |
jan-dec | Monthly indicators of spawning presence (1 = present, 0 = absent) |
fb_table | FishBase source table indicator |
3.12.4.10 Species (`species`)
3.12.4.10.1 Explanation of Data Content
The FishBase Species table provides fundamental taxonomic and biological information for fish species.
- Taxonomic Information:
  - Scientific Name (`scientific`): The genus and species combination used for taxonomic classification.
  - Preferred Image (`pic_preferred_name`): The filename of the representative species image.
- Habitat and Distribution:
  - Freshwater (`fresh`), Brackish (`brack`), Saltwater (`saltwater`): Binary indicators (`1` = present, `0` = absent) representing habitat preferences.
  - Pelagic or Demersal (`demers_pelag`): Classification indicating whether the species is bottom-dwelling (demersal) or lives in the water column (pelagic).
  - Air Breathing (`air_breathing`): Indicates whether the species is capable of air breathing.
  - Anadromy and Catadromy (`ana_cat`): Classifies migratory species that move between freshwater and marine environments.
- Life History Traits:
  - Longevity in the Wild (`longevity_wild`): Maximum recorded lifespan (years).
  - Maximum Length (`length`): Maximum recorded length (cm) for the species.
  - Common Length (`common_length`): The most frequently observed length (cm) for the species.
  - Maximum Weight (`weight`): Maximum recorded weight (grams) for the species.
- Economic Importance:
  - Importance (`importance`): Classifies species as commercial, minor commercial, subsistence, or game fish.
  - Game Fish (`game_fish`): Binary indicator (`1` = game fish, `0` = not targeted for sport fishing).
3.12.4.10.2 Processing
The `species.csv` file was generated using the following processing steps:
- Selection of Relevant Fields:
  - Retained taxonomic, habitat, and biological attributes essential for ecological and fisheries research.
- Data Transformation:
  - Standardized categorical variables (`body_shape_i`, `demers_pelag`, `ana_cat`, `importance`) for consistency.
  - Converted missing values (`NA`) to `0` for binary habitat fields (`fresh`, `brack`, `saltwater`).
- Filtering & Summarization:
  - Removed species lacking taxonomic or ecological data to maintain dataset integrity.
  - Mapped species to FishBase images where available.
- Exporting Cleaned Data:
  - The final dataset was written to `species.csv` for further analyses.
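The binary-habitat recoding and the removal of incomplete records can be sketched in base R as follows. Genus and species names are placeholders, the values are invented, and treating a missing `scientific` name as "lacking taxonomic data" is an assumption about the filter's exact criterion.

```r
spp <- data.frame(
  spec_code  = c(60, 61, 62),
  scientific = c("Genus alpha", "Genus beta", NA),  # placeholder names
  fresh      = c(1, 1, NA),
  brack      = c(NA, 1, NA),
  saltwater  = c(NA, NA, NA)
)
binary_cols <- c("fresh", "brack", "saltwater")

# Missing habitat flags are treated as absence (0).
spp[binary_cols] <- lapply(spp[binary_cols], function(x) {
  x[is.na(x)] <- 0
  as.integer(x)
})

# Drop species without a scientific name.
spp <- spp[!is.na(spp$scientific), ]
spp$fb_table <- "species"
print(spp)
```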
3.12.4.10.3 Processed Data Structure
Field | Description |
---|---|
species_id | Unique species identifier |
scientific | Genus and species combination |
pic_preferred_name | Filename of the representative species image |
body_shape_i | Body shape classification (e.g., fusiform) |
fresh | Presence in freshwater habitats (binary) |
brack | Presence in brackish water habitats (binary) |
saltwater | Presence in saltwater habitats (binary) |
demers_pelag | Classification as demersal or pelagic |
air_breathing | Indicator of air-breathing ability (binary) |
ana_cat | Migration category (e.g., anadromous, catadromous) |
longevity_wild | Maximum recorded lifespan (years) |
length | Maximum recorded length (cm) |
common_length | Most frequently observed length (cm) |
weight | Maximum recorded weight (grams) |
importance | Economic importance (e.g., commercial, game fish) |
game_fish | Indicator of game fish status (binary) |
fb_table | FishBase source table indicator |
3.12.4.11 Swimming (`swimming`)
3.12.4.11.1 Explanation of Data Content
The FishBase Swimming table provides information on the swimming strategies of fish species.
- Swimming Type (`adult_type`):
  - Describes the primary movement mechanism used by adult fish.
  - Common values include:
    - Movements of body and/or caudal fin (e.g., undulatory propulsion).
    - Movements of median and/or paired fins (e.g., fin propulsion without significant body movement).
- Swimming Mode (`adult_mode`):
  - Classifies species based on their swimming kinematics.
  - Common categories:
    - Anguilliform (e.g., eels; body waves propagate along the entire body length).
    - Carangiform (e.g., jacks; undulation largely confined to the posterior third of the body).
    - Subcarangiform (e.g., salmon; undulation concentrated in the posterior half of the body, so more body flexion than carangiform).
    - Thunniform (e.g., tuna; efficient, high-speed swimming with minimal body movement).
    - Labriform (e.g., wrasses; use of pectoral fins for propulsion).
3.12.4.11.2 Processing
The `swimming.csv` file was generated using the following processing steps:
- Selection of Relevant Fields:
  - Extracted adult swimming type and mode to retain core locomotive attributes.
- Data Transformation:
  - Standardized categorical variables (`adult_type`, `adult_mode`) for consistency.
  - Ensured missing values (`NA`) remained where data were unreported.
- Filtering & Summarization:
  - Removed species without recorded swimming data.
- Exporting Cleaned Data:
  - The final dataset was written to `swimming.csv` for further analyses.
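Because `adult_mode` is a controlled vocabulary, a quick check for unexpected labels after standardization can catch typos. The base-R sketch below lower-cases and trims the two categorical fields, drops species with no swimming data, and lists any `adult_mode` values outside the five modes described above. Those five modes are not an exhaustive list, so unexpected values are reported rather than removed; the lower-casing rule is an assumption and the records are illustrative.

```r
swim <- data.frame(
  spec_code  = c(70, 71, 72, 73),
  adult_type = c("Movements of body and/or caudal fin",
                 "movements of body and/or caudal fin ",
                 NA,
                 "Movements of median and/or paired fins"),
  adult_mode = c("Subcarangiform", "carangiform", NA, "Balistiform")
)
cat_cols <- c("adult_type", "adult_mode")

# Harmonise case and whitespace; unreported values stay NA.
swim[cat_cols] <- lapply(swim[cat_cols], function(x) tolower(trimws(x)))

# Remove species with no recorded swimming data.
swim <- swim[rowSums(!is.na(swim[cat_cols])) > 0, ]

# Modes named in the documentation; anything else is flagged for review.
known_modes <- c("anguilliform", "carangiform", "subcarangiform",
                 "thunniform", "labriform")
unexpected <- unique(swim$adult_mode[!is.na(swim$adult_mode) &
                                     !(swim$adult_mode %in% known_modes)])
print(unexpected)
```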
3.12.4.11.3 Processed Data Structure
Field | Description |
---|---|
species_id | Unique species identifier |
adult_type | Primary movement mechanism |
adult_mode | Swimming kinematics classification |
fb_table | FishBase source table indicator |