`data.processing` ¶

Data Processing Module¶

This module handles all input/output operations for the AFCCP modeling pipeline.

It supports importing problem instance data (AFSCs, cadets, preferences, value functions, etc.) and exporting solutions and diagnostics to CSV and Excel formats for analysis. It also initializes the directory structure for versioned data instances.

Key Capabilities¶

Organizes input/output folders and paths for a data instance
Imports cleaned data required for AFCCP optimization
Exports results and supporting data for evaluation and visualization

Primary Functions¶

Initialization
- initialize_file_information: Sets up import/export folder paths for a given data name and version
Import Functions
- import_afscs_data: Loads AFSCs and related structural data
- import_cadets_data: Loads cadet records and attributes
- import_afsc_cadet_matrices_data: Loads cadet preference matrices and qualification matrices
- import_value_parameters_data: Loads objective weights and breakpoints for value functions
- import_solutions_data: Imports previously solved cadet-to-AFSC assignments
- import_additional_data: Loads auxiliary data like base assignments and course info
Export Functions
- export_afscs_data: Saves AFSC-related data to CSV
- export_cadets_data: Saves cadet-related data to CSV
- export_afsc_cadet_matrices_data: Saves preference and qualification matrices to CSV
- export_value_parameters_data: Saves value function breakpoints and weights to CSV
- export_solutions_data: Saves one or more solutions in compact CSV format
- export_additional_data: Saves supporting data (base preferences, utility matrices, courses)
- export_solution_results_excel: Writes detailed solution metrics and diagnostics to an Excel workbook

Notes¶

The export functions take an Instance object that contains parameters, value structures, solution data, and export path information. The import functions take the import_filepaths dictionary and, in most cases, a parameters dictionary that they update. The module is used during both the data preparation and results analysis stages of the AFCCP workflow.

`initialize_file_information(data_name: str, data_version: str)` ¶

Initialize filepaths for an AFCCP data instance.

This function constructs and returns import/export file path dictionaries for a given AFCCP data instance identified by data_name and data_version. It ensures the required directory structure exists under instances/ and dynamically builds file paths for reading and writing instance-specific data and results.

It is primarily used to manage file I/O consistently across different data experiments or scenario versions within the AFCCP modeling system.

Parameters¶

data_name : str The name of the data instance (e.g., "2025", "Baseline", "TestRun01"). This defines the subdirectory under instances/ where the data is stored.
data_version : str The version label for the run (e.g., "Default", "V1"). Used to separate multiple experimental runs under the same data instance name, enabling controlled versioning of model input and output files.

Returns¶

Tuple[Dict[str, str], Dict[str, str]] A tuple of two dictionaries:
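- import_paths: maps the "Model Input" and "Analysis & Results" folders, plus each input file label found on disk (e.g., "Cadets", "AFSCs"), to its path. Labels whose CSV file does not exist are omitted.
- export_paths: maps the same two folder keys, plus every known file label (e.g., "Solutions"), to the folder or CSV path it would be written to.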

Directory Behavior¶

Creates base folder instances/<data_name>/ if it doesn't exist.
Creates version-specific folders for "Model Input" and "Analysis & Results":
- e.g., "4. Model Input (V1)", "5. Analysis & Results (V1)"
Also creates subfolders under "Analysis & Results" such as:
- "Data Charts", "Results Charts", "Cadet Board", "Value Functions"
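If no version-specific folder exists, input is read from the default (unversioned) folder while output goes to a newly created versioned folder. Within an import folder, the file without the version suffix is used when the versioned file is missing.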

Examples¶

from afccp.data.processing import initialize_file_information

import_paths, export_paths = initialize_file_information("2025", "V1")
afsc_path = import_paths["AFSCs"]
solution_folder = export_paths["Analysis & Results"]
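
The naming rules described above can be sketched in isolation. This is a minimal illustration that does not touch the file system; the helper names `numbered_folder`, `instance_filename`, and `resolve_import_name` are hypothetical and are not part of afccp.

```python
def numbered_folder(index: int, sub_folder: str, data_version: str = "Default") -> str:
    """Folder name such as '4. Model Input' or '4. Model Input (V1)'."""
    name = f"{index}. {sub_folder}"
    return name if data_version == "Default" else f"{name} ({data_version})"


def instance_filename(data_name: str, file: str, data_version: str = "Default") -> str:
    """CSV name such as '2025 AFSCs.csv' or '2025 AFSCs (V1).csv'."""
    if data_version == "Default":
        return f"{data_name} {file}.csv"
    return f"{data_name} {file} ({data_version}).csv"


def resolve_import_name(data_name, file, data_version, available):
    """Prefer the versioned file, fall back to the unversioned one, else None."""
    versioned = instance_filename(data_name, file, data_version)
    if versioned in available:
        return versioned
    default = instance_filename(data_name, file)
    return default if default in available else None


folder = numbered_folder(5, "Analysis & Results", "V1")
fallback = resolve_import_name("2025", "AFSCs", "V1", ["2025 AFSCs.csv"])
```

A label with no matching file resolves to `None` here, which mirrors how a missing file is simply left out of import_paths.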

Source code in afccp/data/processing.py

def initialize_file_information(data_name: str, data_version: str):
    """
    Initialize filepaths for an AFCCP data instance.

    This function constructs and returns import/export file path dictionaries for a given AFCCP
    data instance identified by `data_name` and `data_version`. It ensures the required directory
    structure exists under `instances/` and dynamically builds file paths for reading and writing
    instance-specific data and results.

    It is primarily used to manage file I/O consistently across different data experiments or
    scenario versions within the AFCCP modeling system.

    Parameters
    ----------
    - data_name : str
      The name of the data instance (e.g., `"2025"`, `"Baseline"`, `"TestRun01"`). This defines the
      subdirectory under `instances/` where the data is stored.

    - data_version : str
      The version label for the run (e.g., `"Default"`, `"V1"`). Used to separate multiple experimental
      runs under the same data instance name, enabling controlled versioning of model input and output files.

    Returns
    -------
    - Tuple[Dict[str, str], Dict[str, str]]
      A tuple of two dictionaries:
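        - `import_paths`: maps the `"Model Input"` and `"Analysis & Results"` folders, plus each input file label found on disk (e.g., `"Cadets"`, `"AFSCs"`), to its path. Labels whose CSV file does not exist are omitted.
        - `export_paths`: maps the same two folder keys, plus every known file label (e.g., `"Solutions"`), to the folder or CSV path it would be written to.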

    Directory Behavior
    ------------------
    - Creates base folder `instances/<data_name>/` if it doesn't exist.
    - Creates version-specific folders for `"Model Input"` and `"Analysis & Results"`:
        - e.g., `"4. Model Input (V1)"`, `"5. Analysis & Results (V1)"`
    - Also creates subfolders under `"Analysis & Results"` such as:
        - `"Data Charts"`, `"Results Charts"`, `"Cadet Board"`, `"Value Functions"`
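    - If no version-specific folder exists, input is read from the default (unversioned) folder while output goes to a
      newly created versioned folder. Within an import folder, the file without the version suffix is used when the
      versioned file is missing.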

    Examples
    --------
    ```python
    from afccp.data.processing import initialize_file_information

    import_paths, export_paths = initialize_file_information("2025", "V1")
    afsc_path = import_paths["AFSCs"]
    solution_folder = export_paths["Analysis & Results"]
    ```
    """

    # If we don't already have the instance folder, we make it now
    instance_path = "instances/" + data_name + "/"
    if data_name not in afccp.globals.instances_available:
        os.mkdir(instance_path)
    instance_folder = np.array(os.listdir(instance_path))

    # Valid files/folders
    sub_folders = ["Original & Supplemental", "Combined Data", "CFMs", "Model Input", "Analysis & Results"]
    sub_folder_files = {"Model Input": ["Cadets", "Cadets Preferences", "Cadets Utility", "Cadets Utility Constraints",
                                        "Cadets Utility (Final)", "AFSCs", "AFSCs Preferences", "AFSCs Utility",
                                        "Value Parameters", "Goal Programming", "ROTC Rated Interest",
                                        "ROTC Rated OM", "USAFA Rated OM", "OTS Rated OM",
                                        "Bases", "Bases Preferences",
                                        "Bases Utility", "Courses", "Cadets Selected", "AFSCs Buckets",
                                        'Castle Input'],
                        "Analysis & Results": ["Solutions", "Base Solutions", "Course Solutions"]}

    # Loop through each sub-folder in the above list and determine the filepaths for the various files
    import_filepaths = {}
    export_filepaths = {}
    for i, sub_folder in enumerate(sub_folders):

        # Sub-Folder with the number: "4. Model Input" for example
        numbered_sub_folder = str(i + 1) + ". " + sub_folder

        # All the sub-folders that have this numbered sub-folder
        if len(instance_folder) != 0:
            indices = np.flatnonzero(np.char.find(instance_folder, numbered_sub_folder) != -1)
            sub_folder_individuals = instance_folder[indices]
        else:
            # Empty string array (not a list) so the string search below still works on a new instance
            sub_folder_individuals = np.array([], dtype=str)

        # If this is the "default version", we already know what the sub-folder has to be
        if data_version == "Default":
            import_sub_folder = numbered_sub_folder
            export_sub_folder = numbered_sub_folder

            # If we don't currently have this sub-folder, we make it (New instance file)
            if numbered_sub_folder not in sub_folder_individuals:
                os.mkdir(instance_path + numbered_sub_folder + "/")

        # If the data version was specified, we have to check if it has the specific folder or not
        else:

            # If the version folder is not there, we import from the default but will export to the data version folder
            version_indices = np.flatnonzero(np.char.find(sub_folder_individuals, data_version) != -1)
            if len(version_indices) == 0:
                import_sub_folder = numbered_sub_folder
                export_sub_folder = numbered_sub_folder + " (" + data_version + ")"

                # We will only ever export specific version data to these sub-folders
                if sub_folder in ["Model Input", "Analysis & Results"]:
                    os.mkdir(instance_path + export_sub_folder + "/")

            # We already have the version folder
            else:
                import_sub_folder = sub_folder_individuals[version_indices[0]]
                export_sub_folder = sub_folder_individuals[version_indices[0]]

        # If this is one of the sub-folders we can import/export to/from
        if sub_folder in sub_folder_files:

            # Get sub folder paths
            import_sub_folder_path = instance_path + import_sub_folder + "/"
            export_sub_folder_path = instance_path + export_sub_folder + "/"

            # Add generic file-paths for this sub-folder
            export_filepaths[sub_folder] = export_sub_folder_path
            import_filepaths[sub_folder] = import_sub_folder_path

            # Loop through each file listed above in the "sub_folder_files" for this sub-folder
            sub_folder_files_available = os.listdir(import_sub_folder_path)
            for file in sub_folder_files[sub_folder]:

                # Create the name of the file
                if data_version == "Default":
                    filename = data_name + " " + file + ".csv"
                else:
                    filename = data_name + " " + file + " (" + data_version + ").csv"

                # Get the path that we would export this file to
                export_filepaths[file] = export_sub_folder_path + filename

                # If we already have this file in the "import path", we add it to that filepath dictionary
                if filename in sub_folder_files_available:
                    import_filepaths[file] = import_sub_folder_path + filename
                elif data_version != "Default" and data_name + " " + file + ".csv" in sub_folder_files_available:
                    import_filepaths[file] = import_sub_folder_path + data_name + " " + file + ".csv"

    # If we don't have one of the Analysis & Results "sub-sub folders", we make it
    for sub_sub_folder in ["Data Charts", "Results Charts", "Cadet Board", 'Value Functions']:
        if sub_sub_folder not in os.listdir(export_filepaths["Analysis & Results"]):
            os.mkdir(export_filepaths["Analysis & Results"] + sub_sub_folder + "/")

    # Return the information
    return import_filepaths, export_filepaths

`import_afscs_data(import_filepaths: dict, parameters: dict) -> dict` ¶

Imports AFSC-level model input data from CSV and populates the instance parameter dictionary.

This function reads the "AFSCs" input file (provided via import_filepaths) and updates the supplied parameters dictionary with structured information for each Air Force Specialty Code (AFSC). These inputs are essential for AFCCP modeling and include AFSC quotas, groupings, tiered degree requirements, and other structural attributes.

The function handles type conversion, fills missing entries, and appends a special unmatched AFSC ("*") for use in optimization logic.

Parameters¶

import_filepaths : dict A dictionary of import paths keyed by label (e.g., "AFSCs"). Must contain the key "AFSCs" pointing to the location of the AFSCs input CSV file.
parameters : dict A dictionary of instance-wide input parameters. This will be updated in-place with the AFSC-specific parameter values.

Returns¶

dict The updated parameters dictionary, now containing keys such as:
"afscs": List of AFSC names (plus an unmatched AFSC "*")
"acc_grp": Accession group categories
"afscs_stem": STEM-designation indicator for each AFSC
"quota_d", "quota_e", "quota_min", "quota_max": Target and constraint bounds
"pgl": Projected graduation levels for each AFSC
"Deg Tiers": Tiered degree qualification matrix (if present)

Notes¶

All values are loaded as NumPy arrays to facilitate vectorized modeling.
The unmatched AFSC "*" is appended to "afscs" for modeling unmatched cadets.
NaN entries and string "nan" values in the CSV are sanitized to empty strings before processing.
Degree tiers are only added if the "Deg Tier 1" column is present in the CSV.
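
The sanitizing and unmatched-AFSC steps in the notes above can be sketched with the standard library. The real function works on NumPy arrays; this sketch uses plain lists, and the column names `AFSC` and `Accessions Group` as well as the helper `load_afsc_columns` are assumptions made for illustration only.

```python
import csv
import io

# Hypothetical two-row AFSCs file; only "Deg Tier 1" is a column name taken from the notes
SAMPLE = "AFSC,Accessions Group,Deg Tier 1\n15A,NRL,nan\n17X,NRL,\n"


def load_afsc_columns(text):
    rows = list(csv.DictReader(io.StringIO(text)))

    # Treat missing cells and the literal string "nan" as empty strings
    clean = [
        {k: ("" if v is None or v.strip().lower() == "nan" else v) for k, v in row.items()}
        for row in rows
    ]

    # Real AFSCs first, then the unmatched AFSC "*" at index M
    result = {"afscs": [row["AFSC"] for row in clean] + ["*"], "M": len(clean)}

    # Degree tiers are optional and keyed off the "Deg Tier 1" column
    if clean and "Deg Tier 1" in clean[0]:
        tier_cols = [c for c in clean[0] if c.startswith("Deg Tier ")]
        result["Deg Tiers"] = [[row[c] for c in tier_cols] for row in clean]
    return result


p = load_afsc_columns(SAMPLE)
```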

`import_cadets_data(import_filepaths, parameters)` ¶

Imports Cadet-level model input data from CSV and populates the instance parameter dictionary.

This function reads the "Cadets" CSV file specified in import_filepaths and extracts relevant demographic, qualification, preference, training, and weighting information for each cadet. It populates the provided parameters dictionary with this structured data, including derived quantities like total cadet count, preference matrix dimensions, and accession source types (SOCs).

Parameters¶

import_filepaths : dict Dictionary of import paths keyed by label. Must contain the key "Cadets" pointing to the cadet input CSV path.
parameters : dict Dictionary of instance-level parameters. This dictionary will be updated in-place with cadet-related entries.

Returns¶

dict The updated parameters dictionary, now containing cadet-specific fields such as:

"cadets", "merit", "assigned", "asc1"/"asc2", "cip1"/"cip2" (basic identifiers)
"usafa", "soc", "minority", "race", "ethnicity" (demographic data)
"must_match" (AFSCs that must be assigned)
"c_preferences": Preference matrix (N x P) where N = cadets, P = preference slots
"c_utilities": Utility matrix (N x U), where U = min(P, 10)
"SOCs": List of accession sources present in this instance (e.g., ["usafa", "rotc"])
"training_start", "training_preferences", "training_threshold" (training pipeline values)
"weight_afsc", "weight_base", "weight_course" (objective weights)

Notes¶

NaN or string 'nan' entries in the CSV are automatically sanitized.
Extra care is taken to remove BOM characters (e.g., "ï»¿") in CSV headers.
Preferences are detected using any column starting with "Pref_", and corresponding utilities from "Util_1" onward.
SOCs must be one of "USAFA", "ROTC", or "OTS"; any other value will raise an error.
This function calls gather_degree_tier_qual_matrix() to supplement qualification mappings.
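
As a rough illustration of the header cleanup, preference-column detection, and SOC validation described in these notes, the sketch below parses a small in-memory CSV. The column names `Cadet` and `SOC` and the helpers `clean_header` and `read_cadets` are assumptions; the actual implementation may differ.

```python
import csv
import io

VALID_SOCS = {"USAFA", "ROTC", "OTS"}
BOM_DEBRIS = ("\ufeff", "ï»¿")  # a real BOM and its mis-decoded form


def clean_header(name):
    for debris in BOM_DEBRIS:
        name = name.replace(debris, "")
    return name.strip()


def read_cadets(text):
    reader = csv.reader(io.StringIO(text))
    header = [clean_header(h) for h in next(reader)]
    rows = [dict(zip(header, row)) for row in reader]

    # Preference slots are whatever "Pref_" columns the file happens to contain
    pref_cols = [h for h in header if h.startswith("Pref_")]

    socs = []
    for row in rows:
        soc = row["SOC"].upper()
        if soc not in VALID_SOCS:
            raise ValueError(f"Unknown SOC '{row['SOC']}' for cadet {row['Cadet']}")
        if soc.lower() not in socs:
            socs.append(soc.lower())

    return {
        "cadets": [row["Cadet"] for row in rows],
        "c_preferences": [[row[c] for c in pref_cols] for row in rows],
        "P": len(pref_cols),
        "num_util": min(len(pref_cols), 10),  # U = min(P, 10)
        "SOCs": socs,
    }


sample = "ï»¿Cadet,SOC,Pref_1,Pref_2,Util_1,Util_2\n0,USAFA,15A,17X,1.0,0.8\n1,ROTC,17X,15A,1.0,0.5\n"
cadet_params = read_cadets(sample)
```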

`import_afsc_cadet_matrices_data(import_filepaths, parameters)` ¶

Imports optional AFSC-cadet interaction matrices and updates the instance parameter dictionary accordingly.

This function augments the core cadet and AFSC input data by importing preference matrices, utility matrices, and supplemental rated-selection files (if available). The imported data enables more advanced modeling of cadet-AFSC interactions including two-sided preferences and selection boards.

Parameters¶

import_filepaths : dict Dictionary of filepaths keyed by dataset name. Recognized keys include:
- "Cadets Utility"
- "Cadets Preferences"
- "AFSCs Utility"
- "AFSCs Preferences"
- "Cadets Utility (Final)"
- "Cadets Selected"
- "AFSCs Buckets"
- "ROTC Rated Interest", "ROTC Rated OM", "USAFA Rated OM", "OTS Rated OM"
parameters : dict Dictionary of model instance parameters. Must contain:
- "afscs": array of AFSC names
- "N": number of cadets
- "M": number of AFSCs
- "num_util": number of utility entries per cadet
- "P": number of preferences

Returns¶

dict Updated parameter dictionary with new keys (if data was provided), including:

"utility": cadet utility matrix (N x M)
"c_pref_matrix": cadet preference rankings (N x M, integer-valued, 1 = top choice)
"afsc_utility": AFSC utility matrix (N x M; one row per cadet, one column per AFSC)
"a_pref_matrix": AFSC preference rankings (N x M)
"cadet_utility": final cadet utility matrix (N x M)
"c_selected_matrix": matrix of cadets selected by AFSCs (N x M)
"a_bucket_matrix": bucketing of cadets by AFSC for visualization or selection (N x M)
"rr_interest_matrix", "rr_om_matrix", "rr_om_cadets": ROTC board data
"ur_om_matrix", "ur_om_cadets": USAFA board data
"or_om_matrix", "or_om_cadets": OTS board data
"usafa_cadets": indices of USAFA cadets in the instance

Raises¶

ValueError If neither "Cadets Utility" nor "c_utilities" is provided in the inputs, since cadet utility data is required.

Notes¶

This function assumes the "Cadets" and "AFSCs" CSVs have already been processed.
If raw preferences/utilities are not explicitly imported, they are reconstructed from c_preferences and c_utilities.
Preference ranks use integers where 1 is most preferred (not 0).
Utility matrices are aligned by the order of p["afscs"] and not by the file column order alone.
AFSC utility and preference data are optional but support two-sided matching or board processes.
Rated OM/Interest files enable specialty-specific board logic for each SOC.
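
The reconstruction of the N x M matrices from the per-cadet preference and utility columns can be sketched without NumPy. The function `build_matrices` is a hypothetical stand-in that uses nested lists rather than arrays; `afscs` here holds only the M real AFSCs, in model order.

```python
def build_matrices(afscs, c_preferences, c_utilities):
    index = {afsc: j for j, afsc in enumerate(afscs)}
    n, m = len(c_preferences), len(afscs)
    pref = [[0] * m for _ in range(n)]    # 0 means "not ranked"
    util = [[0.0] * m for _ in range(n)]

    for i, choices in enumerate(c_preferences):
        for slot, afsc in enumerate(choices):
            j = index.get(afsc)
            if j is None:
                continue  # blank slot or an AFSC that is not in this instance
            pref[i][j] = slot + 1  # 1 is the first choice, not 0
            if slot < len(c_utilities[i]):
                util[i][j] = c_utilities[i][slot]
    return pref, util


afscs = ["15A", "17X", "62E"]
c_preferences = [["17X", "15A", ""], ["62E", "", ""]]
c_utilities = [[1.0, 0.6], [0.9, 0.0]]
c_pref_matrix, utility = build_matrices(afscs, c_preferences, c_utilities)
```

Because the columns are placed by looking up each AFSC name, the result follows the order of `afscs` regardless of how the preference columns were ordered in the file.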

Source code in afccp/data/processing.py

def import_afsc_cadet_matrices_data(import_filepaths, parameters):
    """
    Imports optional AFSC-cadet interaction matrices and updates the instance parameter dictionary accordingly.

    This function augments the core cadet and AFSC input data by importing preference matrices, utility matrices,
    and supplemental rated-selection files (if available). The imported data enables more advanced modeling of
    cadet-AFSC interactions including two-sided preferences and selection boards.

    Parameters
    ----------
    - import_filepaths : dict
        Dictionary of filepaths keyed by dataset name. Recognized keys include:

        - `"Cadets Utility"`
        - `"Cadets Preferences"`
        - `"AFSCs Utility"`
        - `"AFSCs Preferences"`
        - `"Cadets Utility (Final)"`
        - `"Cadets Selected"`
        - `"AFSCs Buckets"`
        - `"ROTC Rated Interest"`, `"ROTC Rated OM"`, `"USAFA Rated OM"`, `"OTS Rated OM"`

    - parameters : dict
        Dictionary of model instance parameters. Must contain:

        - `"afscs"`: array of AFSC names
        - `"N"`: number of cadets
        - `"M"`: number of AFSCs
        - `"num_util"`: number of utility entries per cadet
        - `"P"`: number of preferences

    Returns
    -------
    dict
    Updated parameter dictionary with new keys (if data was provided), including:

    - `"utility"`: cadet utility matrix (N x M)
    - `"c_pref_matrix"`: cadet preference rankings (N x M, integer-valued, 1 = top choice)
    - `"afsc_utility"`: AFSC utility matrix (N x M; one row per cadet, one column per AFSC)
    - `"a_pref_matrix"`: AFSC preference rankings (N x M)
    - `"cadet_utility"`: final cadet utility matrix (N x M)
    - `"c_selected_matrix"`: matrix of cadets selected by AFSCs (N x M)
    - `"a_bucket_matrix"`: bucketing of cadets by AFSC for visualization or selection (N x M)
    - `"rr_interest_matrix"`, `"rr_om_matrix"`, `"rr_om_cadets"`: ROTC board data
    - `"ur_om_matrix"`, `"ur_om_cadets"`: USAFA board data
    - `"or_om_matrix"`, `"or_om_cadets"`: OTS board data
    - `"usafa_cadets"`: indices of USAFA cadets in the instance

    Raises
    ------
    ValueError
      If neither `"Cadets Utility"` nor `"c_utilities"` is provided in the inputs, since cadet utility data is required.

    Notes
    -----
    - This function assumes the `"Cadets"` and `"AFSCs"` CSVs have already been processed.
    - If raw preferences/utilities are not explicitly imported, they are reconstructed from `c_preferences` and `c_utilities`.
    - Preference ranks use integers where `1` is most preferred (not `0`).
    - Utility matrices are aligned by the order of `p["afscs"]` and not by the file column order alone.
    - AFSC utility and preference data are optional but support two-sided matching or board processes.
    - Rated OM/Interest files enable specialty-specific board logic for each SOC.
    """

    # Shorthand
    p = parameters

    # Loop through the potential additional dataframes and import them if we have them
    datasets = {}
    for dataset in ["Cadets Utility", "Cadets Preferences", "AFSCs Utility", "AFSCs Preferences",
                    "ROTC Rated Interest", "ROTC Rated OM", "USAFA Rated OM", "OTS Rated OM", 'Cadets Utility (Final)',
                    "Cadets Selected", "AFSCs Buckets"]:

        # If we have the dataset, import it
        if dataset in import_filepaths:
            datasets[dataset] = afccp.globals.import_csv_data(import_filepaths[dataset])

    # First and last AFSC (for collecting matrices from dataframes)
    afsc_1, afsc_M = p["afscs"][0], p["afscs"][p["M"] - 1]

    # Load in extra dataframes
    for dataset, param in {'Cadets Selected': 'c_selected_matrix', 'AFSCs Buckets': 'a_bucket_matrix'}.items():
        if dataset in datasets:
            p[param] = np.array(datasets[dataset].loc[:, afsc_1: afsc_M])

    # Determine how we incorporate the original cadets' utility matrix
    if "Cadets Utility" in datasets:  # Load in the matrix directly
        p["utility"] = np.array(datasets["Cadets Utility"].loc[:, afsc_1: afsc_M])
    elif "c_utilities" in p:  # Create the matrix using the columns

        # Create utility matrix (numpy array NxM) from the utility/preference column information
        p["utility"] = np.zeros([p["N"], p["M"]])
        for i in range(p["N"]):
            for util in range(p['num_util']):
                j = np.where(p["c_preferences"][i, util] == p["afscs"])[0]
                if len(j) != 0:
                    p['utility'][i, j[0]] = p["c_utilities"][i, util]
    else:
        raise ValueError("Error. No cadet utility data provided which is required.")

    # Cadets "Real" Utility (after aggregating it with their ordinal rankings)
    if 'Cadets Utility (Final)' in datasets:  # Load in the cadet utility matrix
        p['cadet_utility'] = np.array(datasets["Cadets Utility (Final)"].loc[:, afsc_1: afsc_M])

    # Determine how we incorporate the cadets' preferences dataframe
    if "Cadets Preferences" in datasets:  # Load in the preferences dataframe directly
        p["c_pref_matrix"] = np.array(datasets["Cadets Preferences"].loc[:, afsc_1: afsc_M])
    elif "c_preferences" in p:  # Create the preferences dataframe using the columns

        # Create cadet preferences dataframe (numpy array NxM) from the preference column information
        p["c_pref_matrix"] = np.zeros([p["N"], p["M"]]).astype(int)
        for i in range(p["N"]):
            for util in range(p['P']):
                j = np.where(p["c_preferences"][i, util] == p["afscs"])[0]
                if len(j) != 0:
                    p['c_pref_matrix'][i, j[0]] = util + 1  # 1 is first choice (NOT 0)

    # AFSC preferences and utilities are not required initial data elements (Depending on how we solve, they may be)
    if "AFSCs Utility" in datasets:  # Load in the AFSC utility matrix
        p["afsc_utility"] = np.array(datasets["AFSCs Utility"].loc[:, afsc_1: afsc_M])
    if "AFSCs Preferences" in datasets:  # Load in the AFSC preferences dataframe
        p["a_pref_matrix"] = np.array(datasets["AFSCs Preferences"].loc[:, afsc_1: afsc_M])

    # All USAFA Cadets
    p['usafa_cadets'] = np.where(p['usafa'])[0]

    # Rated dataframes
    if "ROTC Rated Interest" in datasets:
        r_afscs = list(datasets['ROTC Rated Interest'].columns[1:])
        p['rr_interest_matrix'] = np.array(datasets['ROTC Rated Interest'].loc[:, r_afscs[0]:r_afscs[len(r_afscs) - 1]])
    if "ROTC Rated OM" in datasets:
        r_afscs = list(datasets['ROTC Rated OM'].columns[1:])
        p['rr_om_matrix'] = np.array(datasets['ROTC Rated OM'].loc[:, r_afscs[0]:r_afscs[len(r_afscs) - 1]])
        p['rr_om_cadets'] = np.array(datasets['ROTC Rated OM']['Cadet'])
    if "USAFA Rated OM" in datasets:
        r_afscs = list(datasets['USAFA Rated OM'].columns[1:])
        p['ur_om_matrix'] = np.array(datasets['USAFA Rated OM'].loc[:, r_afscs[0]:r_afscs[len(r_afscs) - 1]])
        p['ur_om_cadets'] = np.array(datasets['USAFA Rated OM']['Cadet'])
    if "OTS Rated OM" in datasets:
        r_afscs = list(datasets['OTS Rated OM'].columns[1:])
        p['or_om_matrix'] = np.array(datasets['OTS Rated OM'].loc[:, r_afscs[0]:r_afscs[len(r_afscs) - 1]])
        p['or_om_cadets'] = np.array(datasets['OTS Rated OM']['Cadet'])

    # Return dictionary of parameters
    return p

`import_value_parameters_data(import_filepaths, parameters, num_breakpoints=24)` ¶

Imports and constructs value parameter sets for the model based on CSV definitions and analyst-defined breakpoints.

This function reads and compiles all information associated with value-based decision modeling, including objective functions, breakpoints, weights, and value constraints for both cadets and AFSCs. It supports multiple sets of value parameters, each potentially with different assumptions or constraints.

Parameters¶

import_filepaths : dict Dictionary containing paths to relevant files.
Required keys:
- "Model Input": folder path containing individual VP set CSVs.
- "Value Parameters": CSV listing metadata about each VP set.
Optional keys:
- "Cadets Utility Constraints": file containing minimum cadet utility constraints by VP set.
parameters : dict Instance parameter dictionary already populated with "M", "N", and AFSC/cadet-level arrays.
num_breakpoints : int, optional (default=24) Number of breakpoints to discretize each value function unless exact breakpoints are provided.

Returns¶

dict A dictionary keyed by VP set names. Each value is a dictionary of value parameters containing:

Objective definitions and weights
Breakpoints (a) and values (f^hat)
Minimum utility constraints
AFSC objective indexed sets (K^A)
cadet_weight, afsc_weight, and associated metadata
VP set weights and local weights for combination logic

Raises¶

FileNotFoundError If required files are missing from the provided import paths.
ValueError If the "Value Parameters" file is missing or empty.

Notes¶

All AFSC objectives are indexed across O objectives per AFSC.
The function automatically parses objective weight strings and reconstructs value functions if needed.
For each VP set listed in the "Value Parameters" file, the function expects a matching CSV file named "VP <VP Name>.csv" (e.g., "VP Baseline.csv").
Supports optional use of "Cadets Utility Constraints" to impose per-cadet minimums.
If num_breakpoints is None, raw a/f^hat arrays are expected instead of constructing from strings.
Value functions are compressed after loading to remove redundant zero segments for performance.
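
To make the role of the breakpoints (a) and values (f^hat) concrete, here is a generic sketch of evaluating a value function from those two arrays. It assumes linear interpolation between adjacent breakpoints and clamping outside the range, which the notes above do not state explicitly; `piecewise_value` is a hypothetical helper, not an afccp function.

```python
from bisect import bisect_right


def piecewise_value(x, a, f_hat):
    """Evaluate a piecewise-linear function given sorted breakpoints a and values f_hat."""
    if len(a) != len(f_hat) or len(a) < 2:
        raise ValueError("a and f_hat must have the same length (at least 2)")
    if x <= a[0]:
        return f_hat[0]
    if x >= a[-1]:
        return f_hat[-1]

    # Segment k satisfies a[k] <= x < a[k + 1]
    k = bisect_right(a, x) - 1
    t = (x - a[k]) / (a[k + 1] - a[k])
    return f_hat[k] + t * (f_hat[k + 1] - f_hat[k])


# A value function that peaks when the measure equals 0.5
a = [0.0, 0.5, 1.0]
f_hat = [0.0, 1.0, 0.0]
value_at_quarter = piecewise_value(0.25, a, f_hat)
```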

`import_solutions_data(import_filepaths, parameters)` ¶

Imports and assembles cadet assignment solutions from a saved output file (<data_name> Solutions.csv). Optionally, more files are read in for Base/Training Course solutions.

This function reads solution files containing AFSC, base, and course assignments for each cadet, converts string labels to indexed arrays, and returns a structured dictionary containing all available solution configurations.

Parameters¶

import_filepaths : dict Dictionary containing filepaths to solution files. Expected keys:
- "Solutions": required CSV file with AFSC assignments (one column per solution).
- "Base Solutions": optional CSV with base assignments (same column names as above).
- "Course Solutions": optional CSV with course assignments (same column names as above).
parameters : dict Dictionary of instance parameters. Must contain:
- 'afscs': array of valid AFSC names.
- 'bases': array of valid base names (if "Base Solutions" is provided).
- 'courses': list of valid course arrays for each AFSC (if "Course Solutions" is provided).
- 'S': sentinel index value for unmatched bases.

Returns¶

dict Dictionary mapping solution names to their data. Each solution entry contains:

'name': name of the solution (from CSV column header)
'afsc_array': array of assigned AFSC strings
'j_array': array of assigned AFSC indices (matching parameters['afscs'])
'base_array' (optional): array of base names (if base data is present)
'b_array' (optional): array of base indices (or sentinel S if unmatched)
'course_array' (optional): array of course names (if course data is present)
'c_array' (optional): array of (j, c) tuples representing AFSC/course index pairs

Raises¶

FileNotFoundError If the required "Solutions" file is not present in import_filepaths.

Notes¶

If a course assignment is not found within any AFSC’s course list, a fallback value of (0, 0) is added and a warning is printed.
Assumes that all solution files share the same cadet ordering and column headers for consistent mapping.
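
The conversion from string labels to index arrays can be sketched as follows. The helpers `afsc_indices` and `base_indices` are hypothetical; the sketch assumes the sentinel S equals the number of bases, and it raises KeyError on an unknown AFSC label, which is a choice made for this example rather than documented behavior.

```python
def afsc_indices(afsc_array, afscs):
    """Map assigned AFSC names to their positions in the instance's AFSC list."""
    index = {name: j for j, name in enumerate(afscs)}
    return [index[name] for name in afsc_array]


def base_indices(base_array, bases):
    """Map assigned base names to indices; anything unmatched gets the sentinel S."""
    S = len(bases)
    index = {name: b for b, name in enumerate(bases)}
    return [index.get(name, S) for name in base_array]


afscs = ["15A", "17X", "*"]          # "*" is the unmatched AFSC
bases = ["Eglin", "Hill"]
j_array = afsc_indices(["17X", "*", "15A"], afscs)
b_array = base_indices(["Hill", "", "Eglin"], bases)
```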

`import_additional_data(import_filepaths, parameters)` ¶

Imports supplemental data files (if present) and updates the instance parameters dictionary.

This function loads optional model extensions including base and course assignments, preference matrices, and CASTLE-specific AFSC data. These components are not required for basic operation but enhance downstream modeling functionality (e.g., course scheduling, base optimization, CASTLE implementation).

Parameters¶

import_filepaths : dict Dictionary containing filepaths to additional optional data files. Expected keys include:
- "Bases", "Bases Preferences", "Bases Utility"
- "Courses", "Castle Input"
parameters : dict Dictionary of core model parameters. This dictionary will be updated with any new fields derived from imported files.

Returns¶

dict Updated parameter dictionary with the following optional fields added if available:

'bases': Array of base names
'S': Number of bases
'base_min' / 'base_max': Base assignment bounds by AFSC
'b_pref_matrix': Cadet base preference matrix
'base_utility': Cadet base utility matrix
'courses': Dict of course options by AFSC
'course_start', 'course_min', 'course_max': Dicts with course metadata by AFSC
'castle_afscs_arr', 'afpc_afscs_arr': Raw CASTLE vs AFPC AFSC labels
'castle_afscs': Mapping of CASTLE AFSCs → AFPC AFSCs
'J^CASTLE': CASTLE AFSCs mapped to indices in the AFPC AFSC array
'ots_counts': OTS accession counts for CASTLE AFSCs
'optimal_policy': Policy toggle per CASTLE AFSC
'castle_q': Dictionary of breakpoint-based value functions:
- 'a', 'f^hat': Breakpoints and values
- 'r': Number of breakpoints
- 'L': Breakpoint indices

Notes¶

Breakpoint information from "Castle Input" is stored under castle_q.
Course and base matrices are assumed to be properly aligned with the cadet and AFSC indices already in memory.
All newly imported data is optional and loaded only if the corresponding files are provided.
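
One way to picture the relationship between 'castle_afscs_arr', 'afpc_afscs_arr', 'castle_afscs', and 'J^CASTLE' is the sketch below. It assumes the two raw arrays are parallel and that one CASTLE label may cover several AFPC AFSCs; both points are assumptions, as are the sample labels and the helper `build_castle_mapping`.

```python
def build_castle_mapping(castle_afscs_arr, afpc_afscs_arr, afscs):
    # Group AFPC AFSC names under their CASTLE label (assumed one-to-many)
    castle_afscs = {}
    for castle, afpc in zip(castle_afscs_arr, afpc_afscs_arr):
        castle_afscs.setdefault(castle, []).append(afpc)

    # Translate the grouped names into indices of the instance's AFSC array
    position = {name: j for j, name in enumerate(afscs)}
    j_castle = {
        castle: [position[name] for name in names if name in position]
        for castle, names in castle_afscs.items()
    }
    return castle_afscs, j_castle


# Hypothetical labels for illustration only
castle_afscs, j_castle = build_castle_mapping(
    ["21X", "21X", "17X"], ["21A", "21R", "17X"], ["17X", "21A", "21R", "*"]
)
```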

`export_afscs_data(instance)` ¶

Exports AFSC-level data from the given Instance object to a CSV file.

This function collects Air Force Specialty Code (AFSC) parameters stored in the instance, organizes them into a structured dataframe, and writes the result to disk at the location specified by instance.export_paths["AFSCs"].

Parameters¶

instance : Instance A fully initialized Instance object with a populated parameters dictionary and export_paths mapping. The instance must include AFSC data such as quotas, eligibility, preference counts, and any derived degree tier breakdowns.

Returns¶

None The function writes the output to disk and does not return a value.

Notes¶

The function dynamically detects and exports the following fields if present:
- Core AFSC descriptors: name, accession group, STEM tag, base assignment
- Quota targets: Desired, Estimated, Min, Max, PGL, commissioning source quotas
- Course counts (T) and bubble caps (max_bubbles)
- Eligibility counts per commissioning source
- Degree tier distributions and tier counts (if "Deg Tiers" and "I^D" are available)
- Cadet preference counts per AFSC (if "Choice Count" is present)
Only the first p["M"] AFSCs are included in the output. Any padding elements (e.g., "*") are excluded.
The output file is named "AFSCs.csv" and stored in the directory determined by instance.export_paths.
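
The truncation to the first p["M"] AFSCs can be pictured with the minimal sketch below. It uses the standard-library csv module and an in-memory buffer instead of the dataframe and file path the function works with, and the key names `afscs` and `quota_d` and the column headers are assumptions for illustration.

```python
import csv
import io

# Hypothetical parameters dictionary; the trailing "*" is a padding element.
p = {
    "afscs": ["17X", "21R", "62E", "*"],
    "M": 3,
    "quota_d": [10, 25, 40, 0],  # assumed key name for the "Desired" quota
}

# Iterate over the first M entries only, so the padding element is never reached.
rows = [
    {"AFSC": p["afscs"][j], "Desired": p["quota_d"][j]}
    for j in range(p["M"])
]

buffer = io.StringIO()  # stands in for the file at instance.export_paths["AFSCs"]
writer = csv.DictWriter(buffer, fieldnames=["AFSC", "Desired"], lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
afscs_csv = buffer.getvalue()
```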

`export_cadets_data(instance)` ¶

Exports cadet-level data from the given Instance object to a CSV file.

This function builds the "Cadets" dataframe from internal model parameters stored in the instance, capturing individual cadet characteristics, preferences, and qualification data (if available). The output is saved to disk at the location specified by instance.export_paths["Cadets"].

Parameters¶

instance : Instance A fully initialized Instance object containing model parameters (parameters) and a configured export path for the "Cadets" CSV file.

Returns¶

None The function writes the cadet-level data to disk and does not return a value.

Notes¶

The following cadet-level attributes will be included if present in parameters:
- Basic profile: Cadet ID, gender, race, ethnicity, accession group, STEM tag, ASC codes
- Assignment metadata: must-match flags, base/course preferences, assigned AFSC
- Merit metrics: raw merit, real merit
- Training data: start date, preference rankings, course/base weights and thresholds
- Utility and preference columns: if present, the full c_utilities and c_preferences matrices will be exported
- Qualification data: if a qual matrix is present, columns are added for each AFSC (e.g., qual_17X, qual_21R)
The preference and utility columns are labeled as Pref_1, Pref_2, ..., Util_1, Util_2, etc.
The output is saved as "Cadets.csv" under the directory given by instance.export_paths.
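
The column labeling described above might look like the following sketch. The helper `cadet_column_labels` is hypothetical, and the ordering of the column groups is an assumption; only the label patterns (Pref_1, Util_1, qual_17X) come from the notes.

```python
def cadet_column_labels(num_choices, afscs, has_qual=True):
    """Build preference, utility, and qualification column labels (hypothetical helper)."""
    pref_cols = [f"Pref_{i}" for i in range(1, num_choices + 1)]
    util_cols = [f"Util_{i}" for i in range(1, num_choices + 1)]
    # Qualification columns are only added when a qual matrix is available.
    qual_cols = [f"qual_{afsc}" for afsc in afscs] if has_qual else []
    return pref_cols + util_cols + qual_cols


labels = cadet_column_labels(2, ["17X", "21R"])
labels_without_qual = cadet_column_labels(2, ["17X", "21R"], has_qual=False)
```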

`export_afsc_cadet_matrices_data(instance)` ¶

Exports cadet-AFSC utility and preference matrices from the given Instance object to CSV files.

This function checks for the presence of known matrix-style parameters in the model (e.g., utility values, preference rankings, interest scores) and exports them to disk. Each matrix is stored as a CSV with cadets as rows and AFSCs as columns (or vice versa), depending on the context.

Parameters¶

instance : Instance A fully initialized Instance object containing model parameters (parameters), value parameters (value_parameters), and export paths.

Returns¶

None The function writes one or more matrix-style datasets to disk if they exist.

Notes¶

The following parameters will be exported if present:
- utility: Cadet utilities over all AFSCs → "Cadets Utility"
- c_pref_matrix: Cadet preferences over all AFSCs → "Cadets Preferences"
- afsc_utility: AFSC utilities over all cadets → "AFSCs Utility"
- a_pref_matrix: AFSC preferences over all cadets → "AFSCs Preferences"
- rr_interest_matrix: ROTC-rated interest scores → "ROTC Rated Interest"
- rr_om_matrix: ROTC OM values → "ROTC Rated OM"
- ur_om_matrix: USAFA OM values → "USAFA Rated OM"
- or_om_matrix: OTS OM values → "OTS Rated OM"
- cadet_utility: Finalized cadet utility values → "Cadets Utility (Final)"
- c_selected_matrix: Final cadet selection matrix → "Cadets Selected"
- a_bucket_matrix: AFSC bucket matrix → "AFSCs Buckets"
Each exported dataframe will have a "Cadet" column followed by one column per AFSC in the relevant set. The set of AFSCs may vary depending on whether the data is specific to a commissioning source (SOC).
Datasets related to specific SOCs (e.g., "ROTC Rated OM") use filtered cadet subsets and AFSCs determined by determine_soc_rated_afscs().

`export_value_parameters_data(instance)` ¶

Export value parameter datasets and related information to CSV files for analysis.

This function extracts and exports optimization value parameter sets, global utility matrices (if available), and cadet-specific constraints. It supports multiple value parameter configurations by exporting separate files per set. This facilitates analysis, debugging, or visualization of value-based multi-objective optimization.

Parameters¶

instance : Instance A configured instance of the CadetCareerProblem class, including:
- vp_dict: Dictionary of value parameter sets
- value_parameters: Active value parameter configuration
- parameters: General instance parameters (e.g., AFSCs, cadets)
- export_paths: File paths for saving exports

Returns¶

None The function writes multiple CSV files to disk for each available dataset.

This command will generate:
- A separate CSV file for each set of value parameters (e.g., weights, targets, value functions)
- An overall summary CSV file of value parameter metadata
- A cadet-level constraints CSV file (min values per cadet per VP set)
- A global utility matrix CSV (if global_utility is present in a VP set)

Notes¶

Value function breakpoints a and values f^hat are stored as comma-separated strings for readability.
Objective weights are scaled to a 0–100 range and normalized per AFSC.
The output files use naming conventions like:
- {data_name} {vp_name}.csv
- {data_name} {vp_name} Global Utility.csv
These files are versioned if the instance's data version is not "Default".
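
A minimal sketch of two conventions from these notes: storing breakpoints as comma-separated strings and composing the output file names. The helper names, the ", " separator, and the rounding precision are assumptions; how the data version is folded into the path is not shown because the notes do not specify it.

```python
def breakpoints_to_string(values, precision=4):
    """Serialize breakpoints or values as one comma-separated string."""
    return ", ".join(str(round(v, precision)) for v in values)


def string_to_breakpoints(text):
    """Parse a comma-separated string back into a list of floats."""
    return [float(token) for token in text.split(",") if token.strip()]


def vp_filename(data_name, vp_name, global_utility=False):
    """Compose a file name following the '{data_name} {vp_name}' convention."""
    suffix = " Global Utility" if global_utility else ""
    return f"{data_name} {vp_name}{suffix}.csv"


a = [0.0, 0.25, 0.5, 1.0]
a_text = breakpoints_to_string(a)
a_back = string_to_breakpoints(a_text)
main_file = vp_filename("2023", "VP")
utility_file = vp_filename("2023", "VP", global_utility=True)
```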

`export_solutions_data(instance)` ¶

Export cadet-to-AFSC solution assignments to CSV files.

This function exports all available cadet solution assignments (including AFSC, base, and course solutions) to CSV files for downstream analysis, visualization, or comparison. Each solution is saved as a column in the exported file, enabling side-by-side comparison of multiple optimization outcomes.

Parameters¶

instance : Instance The problem instance containing:
- parameters – model parameter dictionary
- solutions – dictionary of cadet-to-AFSC assignments by solution name
- export_paths – dictionary of destination paths for saving outputs

Returns¶

None The function writes 1 to 3 CSV files to disk, depending on available solution components.

This exports:
- Solutions.csv: Main cadet-to-AFSC assignment matrix
- Base Solutions.csv: Optional cadet-to-base assignments, if present
- Course Solutions.csv: Optional cadet-to-course assignments, if present

Notes¶

Each file contains cadets in the first column and one or more solution columns following.
Solution names (keys from instance.solutions) define the column headers.
The function safely skips missing data (e.g., base or course assignments are only exported if they exist).
The function is used primarily to track scenario-based solution outputs from multi-run experiments.
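
The side-by-side layout can be sketched as follows. This assumes each solution has already been reduced to a list of AFSC names in cadet order; the cadet IDs, solution names, and the "Cadet" header are made up for illustration, and an in-memory buffer stands in for Solutions.csv.

```python
import csv
import io

cadets = [101, 102, 103]
# Hypothetical solutions: one list of assigned AFSCs per solution name.
solutions = {
    "Baseline": ["17X", "21R", "62E"],
    "Alternate": ["21R", "21R", "17X"],
}

# Solution names become the column headers after the cadet column.
header = ["Cadet"] + list(solutions)
rows = [
    [cadet] + [assignments[i] for assignments in solutions.values()]
    for i, cadet in enumerate(cadets)
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)
solutions_csv = buffer.getvalue()
```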

`export_additional_data(instance)` ¶

Export additional configuration and metadata to CSV files.

This function exports all supplementary datasets associated with the problem instance, including base assignments, base preferences, utility scores, training course data, and CASTLE-AFSC mappings. These datasets are derived from the instance.parameters dictionary and written to disk using paths from instance.export_paths.

Parameters¶

instance : Instance The problem instance containing:
- parameters – dictionary of model inputs and outputs
- export_paths – dictionary of file paths for each dataset

Returns¶

None Outputs are written to CSV files; no value is returned.

This generates the following (if applicable):
- Bases.csv: Min/max cadet assignments per AFSC at each base
- Bases Preferences.csv: Cadet preferences over bases
- Bases Utility.csv: Cadet utility scores for each base
- Courses.csv: Course-level details per AFSC (min/max/start)
- Castle Input.csv: CASTLE-to-AFPC AFSC mappings with optional value curves

Notes¶

The export is conditional: datasets are only written if their associated parameters exist in instance.parameters.
CASTLE-related data (castle_q, castle_afscs_arr, etc.) must be present to trigger Castle Input.csv export.
Base utility and preference matrices are assumed to be cadet-by-base numpy arrays.
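
The conditional behavior can be pictured with the sketch below, which decides which datasets would be written from the keys present in the parameters dictionary. The helper `datasets_to_export` and the exact set of keys that triggers each file are assumptions for illustration; the function's real checks may differ.

```python
def datasets_to_export(parameters):
    """Return the dataset names whose required keys are all present (hypothetical helper)."""
    required_keys = {
        "Bases": ["bases", "base_min", "base_max"],
        "Bases Preferences": ["b_pref_matrix"],
        "Bases Utility": ["base_utility"],
        "Courses": ["courses", "course_start", "course_min", "course_max"],
        "Castle Input": ["castle_q", "castle_afscs_arr"],
    }
    return [
        name
        for name, keys in required_keys.items()
        if all(key in parameters for key in keys)
    ]


# Only the base preference and utility matrices are available in this example.
available = datasets_to_export({"b_pref_matrix": [[1, 2]], "base_utility": [[0.9, 0.4]]})
```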

`export_solution_results(instance, filepath)` ¶

Export a comprehensive Excel workbook of solution results.

This function generates an Excel file containing detailed outputs from a solved cadet-AFSC assignment instance, including objective values, cadet assignments, constraint violations, and other performance metrics. The resulting Excel workbook supports deep post-solution analysis and includes conditional formatting for visual clarity.

Parameters¶

instance : Instance An object representing the solved assignment problem. Must contain:
- parameters (dict): Problem data
- value_parameters (dict): Objective metadata and weights
- solution (dict): Final solution output (e.g., assignments, utilities, choice rankings)
- mdl_p (dict): Metadata including formatting options

filepath : str Full path where the Excel file will be saved (e.g., "output/solution_results.xlsx")

Returns¶

None Writes an .xlsx file to disk containing multiple sheets of structured solution data.

Excel Output Includes¶

Main: High-level metrics, objective value, choice counts, and performance indicators
Objective Measures: AFSC scores for each weighted objective
Constraint Fails: Constraint violations by AFSC
Objective Values: Weighted performance per AFSC with visual scoring heatmaps
Solution: Per-cadet assignment breakdown with preferences, utilities, base/course matches
X, V, Q (optional): Assignment matrices for AFSCs, bases, and training courses
Lambda, Y (optional): Value function parameters per AFSC and objective
Castle Metrics (if applicable): Metrics for CASTLE-mode AFSCs
Blocking Pairs (if present): Cadet-AFSC blocking violations

Notes¶

Conditional formatting is applied to highlight preference rankings, merit scores, and match quality.
The function handles presence or absence of optional components (e.g., base matching, training courses).
Top 10 cadet choices and utilities are shown in the Solution tab for deeper preference analysis.

`draw_frame_border_outside(workbook, worksheet, first_row, first_col, rows_count, cols_count, color='#0000FF', width=2)` ¶

Draws an outer border around a rectangular cell range using conditional formatting.

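- Applies a frame to the specified region starting at (first_row, first_col) with size (rows_count x cols_count)
- Border color and width are customizable
- Uses 0-based row and column indices; first_row and first_col values below 1 are raised to 1, because the border is drawn on the cells adjacent to the region, and negative counts are replaced by their absolute values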

Parameters¶

workbook : xlsxwriter.Workbook
worksheet : xlsxwriter.Worksheet
first_row : int
first_col : int
rows_count : int
cols_count : int
color : str, default '#0000FF'
width : int, default 2
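
Because the frame is drawn on the cells adjacent to the region rather than on the region itself, it can help to see which cells are touched. The sketch below is a hypothetical pure-Python helper (`frame_neighbor_cells` is not part of the module) that mirrors the index arithmetic of the function without needing xlsxwriter.

```python
def frame_neighbor_cells(first_row, first_col, rows_count, cols_count):
    """Return the adjacent cells that carry the frame, keyed by the border side they receive."""
    # Same clamping as the function: the start is at least 1, counts are non-negative.
    first_row = max(first_row, 1)
    first_col = max(first_col, 1)
    rows_count, cols_count = abs(rows_count), abs(cols_count)
    rows = range(first_row, first_row + rows_count)
    cols = range(first_col, first_col + cols_count)
    return {
        "bottom": [(first_row - 1, c) for c in cols],          # row above the region
        "top": [(first_row + rows_count, c) for c in cols],    # row below the region
        "right": [(r, first_col - 1) for r in rows],           # column left of the region
        "left": [(r, first_col + cols_count) for r in rows],   # column right of the region
    }


# A 2-row by 3-column region whose top-left cell is (1, 1).
cells = frame_neighbor_cells(1, 1, 2, 3)
```

For example, the top edge of the frame is produced by giving the cells in row 0 a bottom border, which is why a region starting in row 0 or column 0 cannot be framed without shifting it.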

Source code in afccp/data/processing.py

def draw_frame_border_outside(workbook, worksheet, first_row, first_col, rows_count, cols_count,
                              color='#0000FF', width=2):
    """
    Draws an outer border around a rectangular cell range using conditional formatting.

    - Applies a frame to the specified region starting at (first_row, first_col) with size (rows_count x cols_count)
    - Border color and width are customizable
    - Assumes 0-based indexing and adjusts if row/column values are less than 1

    Parameters
    ----------
    - workbook : xlsxwriter.Workbook
    - worksheet : xlsxwriter.Worksheet
    - first_row : int
    - first_col : int
    - rows_count : int
    - cols_count : int
    - color : str, default '#0000FF'
    - width : int, default 2
    """

    # clamp the start position to at least 1 and make the counts non-negative
    if first_row <= 0:
        first_row = 1
    if first_col <= 0:
        first_col = 1
    cols_count = abs(cols_count)
    rows_count = abs(rows_count)

    # top left corner
    worksheet.conditional_format(first_row - 1, first_col,
                                 first_row - 1, first_col,
                                 {'type': 'formula', 'criteria': 'True',
                                  'format': workbook.add_format({'bottom': width, 'border_color': color})})
    worksheet.conditional_format(first_row, first_col - 1,
                                 first_row, first_col - 1,
                                 {'type': 'formula', 'criteria': 'True',
                                  'format': workbook.add_format({'right': width, 'border_color': color})})
    # top right corner
    worksheet.conditional_format(first_row - 1, first_col + cols_count - 1,
                                 first_row - 1, first_col + cols_count - 1,
                                 {'type': 'formula', 'criteria': 'True',
                                  'format': workbook.add_format({'bottom': width, 'border_color': color})})
    worksheet.conditional_format(first_row, first_col + cols_count,
                                 first_row, first_col + cols_count,
                                 {'type': 'formula', 'criteria': 'True',
                                  'format': workbook.add_format({'left': width, 'border_color': color})})
    # bottom left corner
    worksheet.conditional_format(first_row + rows_count - 1, first_col - 1,
                                 first_row + rows_count - 1, first_col - 1,
                                 {'type': 'formula', 'criteria': 'True',
                                  'format': workbook.add_format({'right': width, 'border_color': color})})
    worksheet.conditional_format(first_row + rows_count, first_col,
                                 first_row + rows_count, first_col,
                                 {'type': 'formula', 'criteria': 'True',
                                  'format': workbook.add_format({'top': width, 'border_color': color})})
    # bottom right corner
    worksheet.conditional_format(first_row + rows_count - 1, first_col + cols_count,
                                 first_row + rows_count - 1, first_col + cols_count,
                                 {'type': 'formula', 'criteria': 'True',
                                  'format': workbook.add_format({'left': width, 'border_color': color})})
    worksheet.conditional_format(first_row + rows_count, first_col + cols_count - 1,
                                 first_row + rows_count, first_col + cols_count - 1,
                                 {'type': 'formula', 'criteria': 'True',
                                  'format': workbook.add_format({'top': width, 'border_color': color})})
    # top
    worksheet.conditional_format(first_row - 1, first_col + 1,
                                 first_row - 1, first_col + cols_count - 2,
                                 {'type': 'formula', 'criteria': 'True',
                                  'format': workbook.add_format({'bottom': width, 'border_color': color})})
    # left
    worksheet.conditional_format(first_row + 1, first_col - 1,
                                 first_row + rows_count - 2, first_col - 1,
                                 {'type': 'formula', 'criteria': 'True',
                                  'format': workbook.add_format({'right': width, 'border_color': color})})
    # bottom
    worksheet.conditional_format(first_row + rows_count, first_col + 1,
                                 first_row + rows_count, first_col + cols_count - 2,
                                 {'type': 'formula', 'criteria': 'True',
                                  'format': workbook.add_format({'top': width, 'border_color': color})})
    # right
    worksheet.conditional_format(first_row + 1, first_col + cols_count,
                                 first_row + rows_count - 2, first_col + cols_count,
                                 {'type': 'formula', 'criteria': 'True',
                                  'format': workbook.add_format({'left': width, 'border_color': color})})
