`data.adjustments` ¶

Data Adjustments Module for AFCCP¶

This module contains utility functions that perform critical adjustments, validations, and transformations on the parameters and vp_dict dictionaries that define each problem instance in the AFCCP model.

The functions here serve as post-processing or pre-processing steps to ensure internal consistency, prepare data for model input, or apply specific business rules (such as OTS must-matches or degree tier qualification logic). They are commonly called after loading data or before solving a model.

Functions:¶

parameter_sanity_check(instance) Performs validation checks on the instance's parameters and value parameters to ensure modeling assumptions are satisfied.
parameter_sets_additions(parameters) Updates derived parameter sets (like I^OTS, I^USAF, J^Rated) based on core problem inputs.
more_parameter_additions(parameters) Adds further derived variables or flags used throughout the AFCCP model such as first-choice indicators.
base_training_parameter_additions(parameters) Adds data structures needed to support Base Training assignments for cadets.
set_ots_must_matches(parameters) Selects a subset of OTS cadets as "must-match" based on their merit and OTS accession targets.
gather_degree_tier_qual_matrix(cadets_df, parameters) Determines the qualification matrix for AFSC eligibility based on degree tier requirements.
convert_instance_to_from_scrubbed(instance, new_letter=None, translation_dict=None, data_name='Unknown') Converts instance AFSC names to "scrubbed" placeholders or restores them back to their original names for anonymized modeling and solution reproducibility.

`parameter_sanity_check(instance)` ¶

Perform a Full Sanity Check on Problem Instance Parameters.

This function performs a comprehensive audit of the problem instance's input data and value parameters to verify that all structures, matrices, and definitions are logically consistent and feasible. This includes checks on cadet eligibility, AFSC quotas, objective constraints, preference list coherence, utility monotonicity, and tiered qualification logic.

The goal is to prevent downstream issues during optimization by catching data errors or logical mismatches in advance. All checks are printed with contextual explanations and will highlight both errors and warnings when inconsistencies are found.

Parameters:¶

instance: CadetCareerProblem class instance An instantiated problem containing:
- parameters: dictionaries and matrices representing cadets, AFSCs, preferences, and utility definitions.
- value_parameters: constraints, objective targets, and value function metadata.

Returns:¶

None: This function prints all identified issues to the console but does not return any values. It raises a ValueError if the instance's value_parameters is None.

Examples:¶

from afccp.data.adjustments import parameter_sanity_check
parameter_sanity_check(instance)

This prints a series of diagnostics like:

"ISSUE: AFSC '15A' quota constraint invalid: 12 (min) > 10 (eligible)"
"WARNING: Cadet 41 has no preferences and is therefore eligible for nothing."
"ISSUE: Value function breakpoints for AFSC '17X' objective 'Tier 2' are misaligned."

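The quota checks described above can be tried on toy data. This is a minimal sketch, not the library's implementation: `check_quotas` is a hypothetical helper, plain Python lists stand in for the NumPy arrays stored in `parameters`, and the message wording is simplified.

```python
def check_quotas(afscs, num_eligible, quota_min, quota_max):
    """Return diagnostic strings for inconsistent AFSC quotas (hypothetical helper)."""
    messages = []
    for j, afsc in enumerate(afscs):
        if num_eligible[j] < quota_min[j]:
            # Fewer eligible cadets than the minimum quota: infeasible
            messages.append(f"ISSUE: AFSC '{afsc}' quota constraint invalid. "
                            f"{quota_min[j]} (min) > {num_eligible[j]} (number of eligible cadets).")
        elif num_eligible[j] == quota_min[j]:
            # Feasible, but every eligible cadet is forced into this AFSC
            messages.append(f"WARNING: AFSC '{afsc}' has a lower quota equal to its "
                            f"number of eligible cadets ({quota_min[j]}).")
        if quota_min[j] > quota_max[j]:
            messages.append(f"ISSUE: AFSC '{afsc}' quota constraint invalid. "
                            f"{quota_min[j]} (min) > {quota_max[j]} (max).")
    return messages


# Toy data: '15A' needs more cadets than are eligible; '17X' is consistent
messages = check_quotas(["15A", "17X"], num_eligible=[10, 30], quota_min=[12, 5], quota_max=[20, 15])
for number, message in enumerate(messages, start=1):
    print(number, message)
```

Returning the messages rather than printing them directly is a choice made for this sketch so the result can be inspected; the function documented here prints as it goes.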
Source code in afccp/data/adjustments.py

def parameter_sanity_check(instance):
    """
    Perform a Full Sanity Check on Problem Instance Parameters.

    This function performs a comprehensive audit of the problem instance's input data and value parameters to verify that all
    structures, matrices, and definitions are logically consistent and feasible. This includes checks on cadet eligibility,
    AFSC quotas, objective constraints, preference list coherence, utility monotonicity, and tiered qualification logic.

    The goal is to prevent downstream issues during optimization by catching data errors or logical mismatches in advance.
    All checks are printed with contextual explanations and will highlight both errors and warnings when inconsistencies are found.

    Parameters:
    --------
    - instance: `CadetCareerProblem` class instance
        An instantiated problem containing:

        - `parameters`: dictionaries and matrices representing cadets, AFSCs, preferences, and utility definitions.
        - `value_parameters`: constraints, objective targets, and value function metadata.

    Returns:
    --------
    - None: This function prints all identified issues to the console but does not return any values.
      It raises a `ValueError` if `value_parameters` is `None`.

    Examples:
    --------
    ```python
    from afccp.data.adjustments import parameter_sanity_check
    parameter_sanity_check(instance)
    ```

    This prints a series of diagnostics like:

    - "ISSUE: AFSC '15A' quota constraint invalid: 12 (min) > 10 (eligible)"
    - "WARNING: Cadet 41 has no preferences and is therefore eligible for nothing."
    - "ISSUE: Value function breakpoints for AFSC '17X' objective 'Tier 2' are misaligned."
    """

    # Shorthand
    p, vp = instance.parameters, instance.value_parameters

    if vp is None:
        raise ValueError("Cannot sanity check parameters without specifying which value_parameters to use.")

    # Initialization
    print("Sanity checking the instance parameters...")
    issue = 0

    # Check constraint type matrix (I discontinued "3"s and "4"s in favor of just doing "1"s and "2"s)
    if 3 in vp['constraint_type'] or 4 in vp['constraint_type']:
        issue += 1
        print(issue, "ISSUE: 'constraint_type' matrix contains 3s and/or 4s instead of 1s and 2s. I discontinued the "
                     "use of the former in favor of the latter so please adjust it.")

    # Loop through each AFSC to check various elements
    for j, afsc in enumerate(p["afscs"][:p["M"]]):

        if p["num_eligible"][j] < p["quota_min"][j]:
            issue += 1
            print(issue, "ISSUE: AFSC '" + afsc + "' quota constraint invalid. " + str(p["quota_min"][j]) +
                  " (min) > " + str(p["num_eligible"][j]) + " (number of eligible cadets).")
        elif p["num_eligible"][j] == p["quota_min"][j]:
            issue += 1
            print(issue, "WARNING: AFSC '" + afsc +
                  "' has a lower quota that is the same as its number of eligible cadets (" +
                  str(p["quota_min"][j]) + "). All eligible cadets for this AFSC will be assigned to it.")

        if p["quota_min"][j] > p["quota_max"][j]:
            issue += 1
            print(issue, "ISSUE: AFSC '" + afsc + "' quota constraint invalid. " + str(p["quota_min"][j]) +
                  " (min) > " + str(p["quota_max"][j]) + " (max).")

        quota_k = np.where(vp["objectives"] == "Combined Quota")[0][0]
        if p["quota_d"][j] != vp["objective_target"][j, quota_k]:
            issue += 1
            print(issue, "ISSUE: AFSC '" + afsc + "' quota desired target of " + str(p["quota_d"][j]) +
                  " from AFSCs Fixed does not match its objective target (" + str(vp["objective_target"][j, quota_k]) +
                  ") in the value parameters.")

        if p["quota_d"][j] < p["quota_min"][j] or p["quota_d"][j] > p["quota_max"][j]:
            issue += 1
            print(issue, "ISSUE: AFSC '" + afsc + "' quota desired target of " + str(p["quota_d"][j]) +
                  " is outside the specified range on the number of cadets (" + str(p["quota_min"][j]) + ", " +
                  str(p["quota_max"][j]) + ").")

        # If we have the AFSC preference lists, we validate certain features
        if 'a_pref_matrix' in p:

            cfm_list = np.where(p['a_pref_matrix'][:, j])[0]  # Cadets on the AFSC preference list

            # Cadets that are both on the CFM preference list and are eligible for the AFSC (qual matrix)
            both_lists = np.intersect1d(cfm_list, p['I^E'][j])  # SHOULD contain the same cadets
            num_cfm, num_qual = len(cfm_list), len(p['I^E'][j])  # SHOULD be the same number of cadets

            # If the numbers aren't equal
            if len(both_lists) != num_qual:
                issue += 1
                cfm_not_qual = [cadet for cadet in cfm_list if cadet not in p['I^E'][j]]
                qual_not_cfm = [cadet for cadet in p['I^E'][j] if cadet not in cfm_list]
                print(issue, "ISSUE: AFSC '" + afsc + "' CFM preference list ('a_pref_matrix') does not match the qual "
                                                      "matrix. \nThere are " + str(num_cfm) +
                      " cadets that are on the preference list (non-zero ranks) but there are "
                      + str(num_qual) + " 'eligible' cadets (qual matrix). There are " + str(len(both_lists)) +
                      " cadets in both sets. \nCFM list but not qual cadets:", cfm_not_qual,
                      "\nQual but not CFM list cadets:", qual_not_cfm)

            # Make sure that all eligibility pairs line up
            if 'c_pref_matrix' in p:

                for i, cadet in enumerate(p['cadets']):

                    on_afsc_list = p['a_pref_matrix'][i, j] > 0
                    on_cadet_list = p['c_pref_matrix'][i, j] > 0

                    if on_cadet_list and not on_afsc_list:
                        issue += 1
                        print(issue, "ISSUE: AFSC '" + afsc + "' is on cadet '" + str(cadet) + "' (index=" +
                              str(i) + ")'s preference list (c_pref_matrix) but the cadet is not on their preference "
                                       "list (a_pref_matrix).")
                    elif on_afsc_list and not on_cadet_list:
                        issue += 1
                        print(issue, "ISSUE: Cadet '" + str(cadet) + "' (index=" + str(i) + ") is on AFSC '" + afsc +
                              "'s preference list (a_pref_matrix) but the AFSC is not on their preference list (c_pref_matrix).")

        # Validate AFOCD tier objectives
        for objective in ["Mandatory", "Desired", "Permitted", "Tier 1", "Tier 2", "Tier 3", "Tier 4"]:

            # Make sure this is a valid objective for this problem instance
            if objective not in vp["objectives"]:
                continue  # goes to the next objective

            # Get index
            k = np.where(vp["objectives"] == objective)[0][0]

            # Check if the AFSC is constraining this objective
            if k not in vp["K^C"][j]:
                continue

            # Make sure there are cadets that are in this degree tier
            if len(p["I^D"][objective][j]) == 0:
                issue += 1
                if "Tier" in objective:
                    print(issue, "ISSUE: AFSC '" + afsc + "' objective '" + objective +
                          "' is empty. No cadets have degrees that fit in this tier for this class year.")
                else:
                    print(issue, "ISSUE: AFSC '" + afsc + "' objective '" + objective +
                          "'-Tier is empty. No cadets have degrees that fit in this tier for this class year.")

            # Make sure objective has valid target
            if vp["objective_target"][j, k] == 0:
                issue += 1
                print(issue, "ISSUE: AFSC '" + afsc + "' objective '" + objective +
                      "'-Tier target cannot be 0 when it has a nonzero weight.")

        # Validate AFOCD Tier objectives
        levels = []
        for t, objective in enumerate(["Tier 1", "Tier 2", "Tier 3", "Tier 4"]):

            # Make sure this is a valid objective for this problem instance
            if objective not in vp["objectives"]:
                continue  # goes to the next objective

            # Get index
            k = np.where(vp["objectives"] == objective)[0][0]

            # Make sure that this is a valid tier for this AFSC
            if k not in vp['K^A'][j]:
                continue  # goes to the next objective

            level = "I" + str(t + 1)
            requirement_dict = {'t_mandatory': 'M', 't_desired': 'D', 't_permitted': 'P'}
            for r_level in requirement_dict:
                if p[r_level][j, t]:
                    level = requirement_dict[r_level] + str(t + 1)
            levels.append(level)

            # Make sure this requirement/qualification level is present with the cadets
            if level not in p['qual'][:, j]:
                issue += 1
                print(issue, "ISSUE: AFSC '" + afsc + "' objective '" + objective +
                      "' expected cadet qualification level is '" + level + "' but this is not in the qual matrix.")

        unique_levels = np.unique(p['qual'][:, j])
        for level in unique_levels:
            if level not in levels and 'E' not in level and 'I' not in level:
                issue += 1
                print(issue, "ISSUE: AFSC '" + afsc + "' qualification level '" + level +
                      "' found within the cadet qual matrix but this is not defined within the value"
                      " parameters.")

        # Make sure all constrained objectives have appropriate constraints
        for k in vp["K^C"][j]:
            objective = vp["objectives"][k]

            # Check constraint type to see if something doesn't check out
            if vp["constraint_type"][j, k] == 1:

                # If the minimum is zero, we know this is an "at MOST" constraint (0 to 0.3, for example)
                if vp['objective_min'][j, k] == 0:
                    issue += 1
                    print(issue, "WARNING: AFSC '" + afsc + "' objective '" + objective +
                          "' has an 'at most' constraint of '" + vp['objective_value_min'][j, k] +
                          "'. The constraint_type is 1, indicating an approximate constraint but this is not recommended. "
                          "Instead, use the constraint_type '2' to indicate an exact constraint since this is the easiest"
                          " way to meet an 'at most' constraint.")

            # Make sure constrained objectives have valid constraint types
            if vp['constraint_type'][j, k] not in [1, 2]:
                issue += 1
                print(issue, "ISSUE: AFSC '" + afsc + "' objective '" + objective +
                      "' is in set of constrained objectives: vp['K^C'][j] but has a constraint_type of '" +
                      str(vp['constraint_type'][j, k]) + "'. This is not a valid active constraint.",
                      "Please update the set of value parameters using 'instance.update_value_parameters()'.")

            # Check valid 'objective_value_min' constraint range
            try:
                lb = float(vp["objective_value_min"][j, k].split(",")[0])
                ub = float(vp["objective_value_min"][j, k].split(",")[1])
                assert lb <= ub
            except Exception:  # malformed range text, a missing bound, or lb > ub
                issue += 1
                print(issue, "ISSUE: AFSC '" + afsc + "' objective '" + objective +
                      "' constraint range (objective_value_min) '" + str(vp["objective_value_min"][j, k]) +
                      "' is invalid. This constraint is currently activated.")

        # Make sure value functions are valid
        for k in vp["K^A"][j]:
            objective = vp["objectives"][k]
            vf_string_start = vp["value_functions"][j, k].split("|")[0]

            # VF String validation
            if vf_string_start not in ["Min Increasing", "Min Decreasing", "Balance", "Quota_Direct",
                                       "Quota_Normal"]:
                issue += 1
                print(issue, "ISSUE: AFSC '" + afsc + "' objective '" + objective + "' value function string '" +
                      vp["value_functions"][j, k] + "' is invalid.")

            # Validate number of breakpoints
            if vp["r"][j, k] == 0:
                issue += 1
                print(issue, "ISSUE: AFSC '" + afsc + "' objective '" + objective +
                      "' does not have any value function breakpoints. 'a':", vp["a"][j][k])
                continue

            # Value function should have same number of x and y coordinates
            if len(vp["a"][j][k]) != len(vp["f^hat"][j][k]):
                issue += 1
                print(issue, "ISSUE: AFSC '" + afsc + "' objective '" + objective +
                      "' value function breakpoint coordinates do not align. 'a' has length of " +
                      str(len(vp["a"][j][k])) + " while 'f^hat' has length of " + str(len(vp["f^hat"][j][k])) + ".")
                continue

            # Ensure that the breakpoint "x" coordinates are always getting bigger
            current_x = -1
            valid_x_bps = True
            for l in vp["L"][j][k]:
                if vp["a"][j][k][l] < current_x:
                    valid_x_bps = False
                    break
                else:
                    current_x = vp["a"][j][k][l]

            if not valid_x_bps:
                issue += 1
                print(issue, "ISSUE: AFSC '" + afsc + "' objective '" + objective +
                      "' value function x coordinates do not continuously increase along x-axis. 'a':", vp["a"][j][k],
                      "'vf_string':", vp["value_functions"][j, k])

        # Check all the objectives to see if the user missed something
        for k, objective in enumerate(vp['objectives']):

            if vp['constraint_type'][j, k] in [1, 2] and k not in vp['K^C'][j]:
                issue += 1
                print(issue, "WARNING: AFSC '" + afsc + "' objective '" + objective +
                      "' has a constraint_type of '" + str(vp['constraint_type'][j, k]) +
                      "' but is not in set of constrained objectives: vp['K^C'][j]. This is a mistake so",
                      "please update the set of value parameters using 'instance.update_value_parameters()'.")

    # Loop through each cadet to check preferences and utility values
    invalid_utility, invalid_cadet_utility = 0, 0
    invalid_utility_cadets, invalid_cadet_utility_cadets = [], []
    for i in p['I']:
        if 'c_preferences' in p and 'c_pref_matrix' in p:
            for choice in range(p['P']):
                afsc = p['c_preferences'][i, choice]
                if afsc in p['afscs']:
                    j = np.where(p['afscs'] == afsc)[0][0]
                    if p['c_pref_matrix'][i, j] != choice + 1:
                        issue += 1
                        print(issue, "ISSUE: Cadet", p['cadets'][i], "has AFSC '" + afsc + "' in position '"
                              + str(choice + 1) + "' in the Cadets.csv file, but it is ranked '" +
                              str(p['c_pref_matrix'][i, j]) + "' from the Cadets Preferences.csv file.")
                        break  # Don't need to check the rest of the cadet's preferences

            # If this cadet does not have any preferences, we skip them (must be an OTS candidate)
            if len(p['cadet_preferences'][i]) == 0:
                issue += 1
                print(issue, f"WARNING: Cadet {i} has no preferences and is therefore eligible for nothing.")
                continue

            # Make sure "utility" array is monotonically decreasing and the "cadet_utility" array is strictly decreasing
            arr_1 = p['utility'][i, p['cadet_preferences'][i]]
            arr_2 = p['cadet_utility'][i, p['cadet_preferences'][i]]
            if not all(arr_1[idx] >= arr_1[idx + 1] for idx in range(len(arr_1) - 1)):
                invalid_utility += 1
                invalid_utility_cadets.append(i)
            if not all(arr_2[idx] > arr_2[idx + 1] for idx in range(len(arr_2) - 1)):
                invalid_cadet_utility += 1
                invalid_cadet_utility_cadets.append(i)

    # Report issues with decreasing cadet utility values
    if invalid_utility > 0:
        issue += 1
        print(issue, "ISSUE: The cadet-reported utility matrix 'utility', located in 'Cadets Utility.csv'\nand in the "
                     "'Util' columns of 'Cadets.csv', does not incorporate monotonically\ndecreasing utility values for "
                     "" + str(invalid_utility) + " cadets. Please adjust.")
        if invalid_utility < 100:
            print('These are the cadets at indices', invalid_utility_cadets)
    if invalid_cadet_utility > 0 and 'last_afsc' not in p:  # There IS indifference with the new method of utilities
        issue += 1
        print(issue, "ISSUE: The constructed cadet utility matrix 'cadet_utility', located in 'Cadets Utility (Final)."
                     "csv',\ndoes not incorporate strictly decreasing utility values for "
                     "" + str(invalid_cadet_utility) + " cadets. Please adjust.")
        if invalid_cadet_utility < 40:
            print('These are the cadets at indices', invalid_cadet_utility_cadets)

    # Loop through each objective to see if there are any null values in the objective target array
    for k, objective in enumerate(vp["objectives"]):
        num_null = pd.isnull(vp["objective_target"][:, k]).sum()
        if num_null > 0:
            issue += 1
            print(issue, "ISSUE: Objective '" + objective + "' contains " +
                  str(num_null) + " null target values ('objective_target').")

    # USSF OM Constraint rules
    if instance.mdl_p['USSF OM'] is True and "USSF" not in p['afscs_acc_grp']:
        issue += 1
        print(issue, "ISSUE: Space Force OM constraint specified in controls (USSF OM = True) but no USSF"
                     " AFSCs found in the instance.")

    # At least one rated preference for rated eligible
    for soc in p['SOCs']:
        if soc in p['Rated Cadets']:
            for i in p['Rated Cadets'][soc]:
                if len(p['Rated Choices'][soc][i]) == 0:
                    issue += 1
                    print(issue,
                          "ISSUE: Cadet '" + str(p['cadets'][i]) + "' is on " + soc.upper() +
                          "'s Rated list (" + soc.upper() + " Rated OM.csv) but is not eligible for any Rated AFSCs. "
                                                            "You need to remove their row from the csv.")

    # Make sure all cadets eligible for at least one rated AFSC are in their SOC's rated OM list
    for soc in p['SOCs']:
        if 'J^Rated' in p:  # Make sure we have rated AFSCs

            # Loop through each cadet from this SOC
            for i in p[soc + '_cadets']:

                # Check if they're eligible for at least one rated AFSC
                if np.sum(p['eligible'][i][p['J^Rated']]) >= 1:

                    # If they're eligible for a Rated AFSC but aren't in the "Rated OM.csv" file, that's a problem
                    if i not in p['Rated Cadets'][soc]:
                        rated_afscs_eligible = p['afscs'][np.intersect1d(p['J^Rated'], p['J^E'][i])]
                        issue += 1
                        print(issue, "ISSUE: Cadet '" + str(p['cadets'][i]) + "' is not on " + soc.upper() +
                              "'s Rated list (" + soc.upper() + " Rated OM.csv), but is on the preference lists for",
                              rated_afscs_eligible, "Please add a row in 'Rated OM.csv' for this cadet reflecting their "
                                                    "OM.")

    # Validate that the "totals" for minimums/maximums work
    if np.sum(p['pgl']) > p['N']:
        issue += 1
        print(issue, "ISSUE: Total sum of PGL targets is", int(np.sum(p['pgl'])),
              " while 'N' is " + str(p['N']) + ". This is infeasible since we don't have enough cadets.")
    if np.sum(p['quota_min']) > p['N']:
        issue += 1
        print(issue, "ISSUE: Total sum of minimum constrained capacities (quota_min) is", int(np.sum(p['quota_min'])),
              " while 'N' is " + str(p['N']) + ". This is infeasible since we don't have enough cadets.")
    if (np.sum(p['quota_max']) < p['N']) and 'ots' not in p['SOCs']:  # OTS candidates can go unmatched
        issue += 1
        print(issue, "ISSUE: Maximum constrained capacities (quota_max) is", int(np.sum(p['quota_max'])),
              " while 'N' is " + str(p['N']) + ". This is infeasible; we don't have enough positions for cadets to fill.")

    # Print statement
    print('Done,', issue, "issues found.")
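
Two of the checks in the listing above are easy to isolate: parsing the `objective_value_min` range text of the form "lb, ub", and confirming that value function breakpoint x-coordinates never decrease. The sketch below uses hypothetical helper names (`valid_constraint_range`, `breakpoints_non_decreasing`) that are not part of afccp, and plain lists in place of the `vp["a"]` arrays.

```python
def valid_constraint_range(range_string):
    """True if the text looks like 'lb, ub' with lb <= ub (hypothetical helper)."""
    try:
        parts = range_string.split(",")
        lower, upper = float(parts[0]), float(parts[1])
    except (AttributeError, IndexError, ValueError):
        return False  # not a string, a missing bound, or non-numeric text
    return lower <= upper


def breakpoints_non_decreasing(x_coordinates):
    """True if every breakpoint x-coordinate is >= the one before it."""
    return all(x_coordinates[idx] <= x_coordinates[idx + 1]
               for idx in range(len(x_coordinates) - 1))


print(valid_constraint_range("0.2, 0.6"))            # well-formed range
print(valid_constraint_range("0.6, 0.2"))            # bounds are reversed
print(breakpoints_non_decreasing([0, 0.3, 0.3, 1]))  # repeated x is allowed
print(breakpoints_non_decreasing([0, 0.5, 0.4, 1]))  # x steps backwards
```

Catching only the specific exceptions that parsing can raise keeps unrelated errors visible, which is the main difference from a blanket `except`.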

`parameter_sets_additions(parameters)` ¶

Add Indexed Sets and Subsets to the Problem Instance Parameters.

This function enhances the problem instance by creating indexed sets and subsets for cadets and AFSCs, along with demographic filters, eligibility matrices, and preference-related metadata, so that the instance is ready for optimization. It also validates eligibility constraints and appends additional calculated data fields.

Parameters¶

parameters : dict The fixed model input parameters for a cadet-AFSC assignment instance, including eligibility matrices, cadet/AFSC attributes, utility matrices, and demographics.

Returns¶

Updated parameter dictionary with:

Indexed cadet and AFSC sets: I, J, J^E, I^E
Eligibility and preference counts: num_eligible, Choice Count
Demographic and qualification subsets: I^D, I^USAFA, I^Male, I^Minority, etc.
Assignment constraints: J^Fixed, J^Reserved
Cadet and AFSC preference mappings
Updated utility matrix with unmatched column

Examples¶

from afccp.data.adjustments import parameter_sets_additions
params = parameter_sets_additions(params)
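
To illustrate how indexed sets such as I^E and J^E relate to an eligibility matrix, here is a small sketch on toy data. It assumes, based on how the sanity check uses them, that `I^E[j]` holds the cadets eligible for AFSC `j` and `J^E[i]` holds the AFSCs that cadet `i` is eligible for. Plain lists stand in for NumPy arrays, and the variable names are illustrative rather than taken from afccp.

```python
# Toy eligibility matrix: rows are cadets (i), columns are AFSCs (j); 1 means eligible
eligible = [
    [1, 0, 1],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 0],
]
num_cadets, num_afscs = len(eligible), len(eligible[0])

# I^E[j]: cadets eligible for AFSC j
cadets_eligible = [[i for i in range(num_cadets) if eligible[i][j]] for j in range(num_afscs)]

# J^E[i]: AFSCs that cadet i is eligible for
afscs_eligible = [[j for j in range(num_afscs) if eligible[i][j]] for i in range(num_cadets)]

# num_eligible[j]: size of I^E[j]
num_eligible = [len(cadets) for cadets in cadets_eligible]

print(cadets_eligible)
print(afscs_eligible)
print(num_eligible)
```

The last cadet ends up with an empty AFSC list, which is the situation the sanity check reports as a cadet who is "eligible for nothing".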

Notes¶

Automatically detects and processes USAFA/ROTC cadet splits based on usafa and soc columns.
Adds extra handling for cadets that are fixed to AFSCs via preassignments in assigned.
Includes support for rated cadets, STEM AFSCs, race/ethnicity filters, and eligibility-based breakouts.

`more_parameter_additions(parameters)` ¶

Add Additional Subsets and Parameter Structures to the Problem Instance.

This function enhances the problem instance by appending numerous structured subsets and derived attributes based on cadet preferences, eligibility, accession groupings, demographics, and more. It enriches the input parameter dictionary in preparation for detailed analysis and optimization.

Parameters¶

parameters : dict The initial problem instance dictionary, containing data on cadets, AFSCs, eligibility, utility matrices, etc.

Returns¶

dict The updated problem instance with additional fields, subsets, and derived variables including:

Cadet and AFSC preferences
Accessions group (Rated, USSF, NRL) AFSC indices
Rated-specific cadet groupings and OM mapping
Simpson index for race/ethnicity
Groupings by SOC (e.g., ROTC, USAFA), gender, and STEM designation
Subsets like I^Must_Match, J^Bottom 2 Choices, etc.

Examples¶

parameters = more_parameter_additions(parameters)

Notes¶

The function performs a large number of conditional operations and appends dozens of new keys to parameters. These are used downstream in optimization and statistical evaluation of AFSC assignment plans.
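
As an illustration of one derived quantity named above, the Simpson index for a set of group labels can be computed as follows. This sketch uses the Gini-Simpson form (one minus the sum of squared group proportions); which variant afccp actually computes is not confirmed here, and `simpson_diversity` is a hypothetical name.

```python
from collections import Counter


def simpson_diversity(labels):
    """Gini-Simpson index: 1 - sum of squared group proportions (hypothetical helper)."""
    if not labels:
        raise ValueError("At least one label is required.")
    counts = Counter(labels)
    total = len(labels)
    return 1 - sum((count / total) ** 2 for count in counts.values())


# Toy data: half the cadets are in group A, a quarter each in B and C
groups = ["A", "A", "B", "C"]
index = simpson_diversity(groups)
print(round(index, 3))
```

A value of 0 means every cadet falls in one group, and the value rises toward 1 as the groups become more numerous and more evenly sized.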

`base_training_parameter_additions(parameters)` ¶

Add Base and Training Parameters to the Problem Instance.

This function extends the parameter dictionary with the data structures required to support base assignments and training course scheduling within the CASTLE Base/Training optimization model. Each cadet is categorized into preference-based "states" depending on their AFSC priorities and base/course interest.

The function also calculates cadet-course availability, utility of wait times, and assignment eligibility across bases and courses. This enables simultaneous modeling of AFSC matches, base assignments, and training timelines.

Parameters¶

parameters : dict The problem instance parameters, including cadet preferences, AFSC eligibility, utility scores, training thresholds, and configuration flags for base/course logic.

Returns¶

dict Updated parameter dictionary with additional sets and matrices such as:

D, Cadet Objectives, J^State, w^A, w^B, w^C, u^S: cadet state structures.
B^A, B^E, B^State: base assignment eligibility mappings.
C^E, I^A, course_days_cadet, course_utility: training availability and utility values.
lo^B, hi^B, lo^C, hi^C: quantity constraints on base/course assignments.

Examples¶

from afccp.data.adjustments import base_training_parameter_additions
parameters = base_training_parameter_additions(parameters)

Notes¶

Cadet states are built using base_threshold and training_threshold, which split cadet preferences into AFSCs only, AFSC + base, and AFSC + base + course states.
Utility from training courses is based on cadet preferences (Early, Late, None) and normalized start dates.
Course utility is scaled from 0 to 1, with utility decreasing/increasing with wait time as appropriate.
This logic assumes all relevant arrays like training_start, course_start, afsc_assign_base, etc., exist and are preloaded in the parameter dictionary.
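
The notes above say that course utility is scaled from 0 to 1 and moves with wait time according to the cadet's preference. One simple way to realize that idea is a linear scale, sketched below. The linear form, the function name `course_utility`, and its arguments are assumptions made for illustration and are not taken from afccp. The "None" preference is deliberately left unhandled because the text does not say how it is scored.

```python
def course_utility(wait_days, max_wait_days, preference):
    """Linear 0-to-1 utility of a course start, given a cadet's timing preference.

    Hypothetical sketch: 'Early' cadets lose utility as the wait grows,
    'Late' cadets gain utility as the wait grows.
    """
    if max_wait_days <= 0:
        raise ValueError("max_wait_days must be positive.")
    wait = min(max(wait_days / max_wait_days, 0.0), 1.0)  # normalized wait, clipped to [0, 1]
    if preference == "Early":
        return 1.0 - wait
    if preference == "Late":
        return wait
    raise ValueError(f"Unsupported preference in this sketch: {preference!r}")


print(course_utility(0, 100, "Early"))
print(course_utility(25, 100, "Late"))
```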

`set_ots_must_matches(parameters)` ¶

Identify OTS Candidates Who Must Be Matched in the Assignment.

This function determines which Officer Training School (OTS) cadets must be assigned (i.e., matched) by identifying the top candidates based on their Order of Merit (OM) scores. It updates the must_match array to indicate mandatory match requirements, and adds a new set I^Must_Match containing the indices of cadets who must be assigned a slot.

If OTS is not a participating source of commissioning (SOC) in the instance, the function resets must_match and returns early without selecting any must-match cadets.

Parameters:¶

parameters (dict): Dictionary of model parameters, including cadet index sets (I, I^OTS), merit scores, and SOC definitions.

Returns:¶

dict: The updated parameters dictionary with the following changes:
- must_match: N-length array with 1 for must-match cadets, 0 for others, and NaN for non-OTS cadets.
- I^Must_Match: Array of the top int(ots_accessions * 0.995) cadet indices in I^OTS, ranked by merit (OM).

Examples:¶

parameters = set_ots_must_matches(parameters)
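
The selection rule can be followed step by step on toy data. The sketch below mirrors the logic of the function (rank by merit, keep OTS candidates, take the top `int(ots_accessions * 0.995)`), but uses plain Python lists instead of NumPy arrays, and the toy values are invented for illustration.

```python
import math

merit = [0.91, 0.40, 0.75, 0.99, 0.62, 0.88]  # order of merit for each cadet index
ots_cadets = [1, 2, 4, 5]                     # stand-in for I^OTS
ots_accessions = 3                            # OTS accession target

# Rank every cadet by merit (best first), then keep only the OTS candidates
sorted_by_merit = sorted(range(len(merit)), key=lambda i: merit[i], reverse=True)
ots_sorted = [i for i in sorted_by_merit if i in ots_cadets]

# int() truncates, so a target of 3 gives int(2.985) = 2 must-match cadets
must_match_cadets = ots_sorted[:int(ots_accessions * 0.995)]

# NaN for non-OTS cadets, 0 for OTS cadets, 1 for the must-match subset
must_match = [math.nan] * len(merit)
for i in ots_cadets:
    must_match[i] = 0
for i in must_match_cadets:
    must_match[i] = 1

print(must_match_cadets)
print(must_match)
```

Note that the 0.995 factor scales the accession target rather than selecting a percentile of the merit distribution, so small targets lose a whole cadet to truncation.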

Source code in afccp/data/adjustments.py

def set_ots_must_matches(parameters):
    """
    Identify OTS Candidates Who Must Be Matched in the Assignment.

    This function determines which Officer Training School (OTS) cadets must be assigned (i.e., matched) by
    identifying the top candidates based on their Order of Merit (OM) scores. It updates the `must_match`
    array to indicate mandatory match requirements, and adds a new set `I^Must_Match` containing the indices
    of cadets who must be assigned a slot.

    If OTS is not a participating source of commissioning (SOC) in the instance, the function resets `must_match`
    and returns early without selecting any must-match cadets.

    Parameters:
    --------
    - parameters (dict): Dictionary of model parameters, including cadet index sets (`I`, `I^OTS`), merit scores,
      and SOC definitions.

    Returns:
    --------
    - dict: The updated parameters dictionary with the following changes:

        - `must_match`: N-length array with `1` for must-match cadets, `0` for others, and `NaN` for non-OTS cadets.
        - `I^Must_Match`: Array of the top `int(ots_accessions * 0.995)` cadet indices in `I^OTS`, ranked by merit (OM).

    Examples:
    --------
    ```python
    parameters = set_ots_must_matches(parameters)
    ```
    """

    # Shorthand
    p = parameters

    # Clear "must matches"
    p['must_match'] = np.array([np.nan for _ in p['I']])
    p['must_match'][p['I^OTS']] = 0

    # No OTS adjustments to be made
    if 'ots' not in p['SOCs']:
        print('OTS not included in this instance!! Nothing to do here.')
        return p

    # Sort OTS candidates by OM and take the top {ots_accessions} people
    sorted_by_merit = np.argsort(p['merit'])[::-1]
    ots_sorted = np.array([i for i in sorted_by_merit if i in p['I^OTS']])
    p['I^Must_Match'] = ots_sorted[:int(p['ots_accessions'] * 0.995)]
    p['must_match'][p['I^Must_Match']] = 1
    return p

`gather_degree_tier_qual_matrix(cadets_df, parameters)` ¶

Construct or Validate Degree Tier Qualification Matrix for Cadets.

This function analyzes the provided cadets_df and parameters to determine if a valid degree qualification matrix (qual) exists. If not, or if the format differs from the expected "Tiers" structure, a new one is generated using CIP codes. It then computes a series of derived binary matrices (e.g., eligible, mandatory, tier 1, etc.) that describe cadet eligibility for each AFSC based on degree requirements.

The degree tier qualification matrix is a critical part of the AFSC assignment model, influencing eligibility filtering, tier-based objective constraints, and value function evaluations.

Parameters:

- cadets_df (pd.DataFrame): The dataframe containing cadet qualification data. Must contain columns like qual_AFSC or CIP fields if generating the qualification matrix.
- parameters (dict): Instance parameter dictionary (p) that includes AFSCs, CIP codes, qualification type, and degree tier expectations. This dictionary will be modified in place.

Returns:

dict: Updated parameters dictionary with the following keys (if applicable):
- "qual": The constructed or validated NxM string matrix of qualification levels.
- "eligible" / "ineligible": Binary matrices indicating AFSC eligibility.
- "mandatory" / "desired" / "permitted": Binary matrices based on tier requirements.
- "tier 1" to "tier 4": Tier-specific binary matrices.
- "exception": Binary matrix marking cadets eligible through exception rules.
- "t_count": Array of number of degree tiers per AFSC.
- "t_proportion": Matrix with expected proportions for each tier per AFSC.
- "t_eq" / "t_geq" / "t_leq": Binary matrices specifying how tier requirements should be interpreted.

Examples:

p = gather_degree_tier_qual_matrix(cadets_df, p)
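
How the derived binary matrices relate to the qualification matrix can be sketched on a toy example. The entry format used here is an assumption made for illustration: each entry is a requirement letter (M, D, P, or I for ineligible) followed by a tier number. Exception handling and the `t_*` arrays are left out, and none of this is the module's actual implementation.

```python
import numpy as np

# Hypothetical 3-cadet x 2-AFSC qualification matrix in the assumed format:
# first character = requirement level, second character = tier number.
qual = np.array([["M1", "I3"],
                 ["D2", "P2"],
                 ["I4", "M1"]])

# Split every entry into its requirement letter and its tier number
level = np.array([[entry[0] for entry in row] for row in qual])
tier = np.array([[int(entry[1]) for entry in row] for row in qual])

derived = {
    "mandatory": (level == "M") * 1,
    "desired": (level == "D") * 1,
    "permitted": (level == "P") * 1,
    "ineligible": (level == "I") * 1,
}
derived["eligible"] = 1 - derived["ineligible"]

# One binary matrix per tier
for t in range(1, 5):
    derived[f"tier {t}"] = (tier == t) * 1

print(derived["eligible"])
```

Each derived matrix has the same N x M shape as `qual`, which is what allows it to be used directly for eligibility filtering.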

`convert_instance_to_from_scrubbed(instance, new_letter=None, translation_dict=None, data_name='Unknown')`

Convert Between Original and Scrubbed AFSC Names Based on PGL Sorting.

This function renames the AFSCs of a problem instance in one of two directions. In the forward direction, it sorts the AFSCs by PGL target in descending order and replaces each real identifier with a generic label (e.g., "X1", "X2", ...), keeping any "_R" or "_U" suffix, so that instances can be anonymized (scrubbed) for publication or experimentation. If a translation dictionary is provided instead, it performs the inverse operation and restores the original AFSC names from their scrubbed versions.

The function updates all relevant AFSC-indexed matrices, arrays, and value parameters in the instance. It also modifies the solution dictionary (instance.solutions) and preference matrices to maintain consistency.

Parameters:

- instance (CadetCareerProblem): The full problem instance containing parameter and solution data.
- new_letter (str, optional): A single letter (e.g., "X") to use as the prefix for scrubbed AFSC names. If provided, performs a forward conversion (real → scrubbed).
- translation_dict (dict, optional): A mapping from real to scrubbed AFSC names. If provided and new_letter is None, performs a reverse conversion (scrubbed → real).
- data_name (str, optional): A custom label to attach to the instance's data_name attribute. In a forward conversion it is overwritten with new_letter.

Returns:

tuple:
- instance (CadetCareerProblem): The updated instance with renamed AFSCs and adjusted internal data.
- translation_dict (dict): The mapping used for conversion (real → scrubbed).

Examples:

# Forward conversion (scrubbing AFSC names)
new_instance, afsc_mapping = convert_instance_to_from_scrubbed(instance, new_letter="X")

# Reverse conversion (restoring AFSC names)
original_instance, _ = convert_instance_to_from_scrubbed(new_instance, translation_dict=afsc_mapping)
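
The forward mapping itself (sort by PGL target, then label by rank) can be sketched in isolation. This is a simplified illustration: `build_scrub_mapping` is a hypothetical helper, and the AFSC names and PGL targets are made-up toy values. It only builds the translation dictionary and does not touch an instance.

```python
import numpy as np

def build_scrub_mapping(afscs, pgl, new_letter="X"):
    """Map each AFSC name to a scrubbed label ranked by descending PGL target."""
    afscs = [str(a) for a in afscs]
    order = np.argsort(pgl)[::-1]  # largest PGL target first

    labels = {}
    for rank, idx in enumerate(order, start=1):
        afsc = afscs[idx]
        label = f"{new_letter}{rank}"

        # Carry over a "_R" or "_U" suffix if the real name has one
        for ext in ("_R", "_U"):
            if ext in afsc:
                label += ext
                break
        labels[afsc] = label

    # Keys follow the original AFSC order, values hold the scrubbed names
    return {afsc: labels[afsc] for afsc in afscs}

afscs = ["17X", "62EXE", "11XX_R", "11XX_U"]
pgl = [50, 120, 30, 80]
mapping = build_scrub_mapping(afscs, pgl)
reverse = {scrubbed: real for real, scrubbed in mapping.items()}
print(mapping)
```

Keeping the dictionary keys in the original AFSC order matters for the reverse conversion, since the source below rebuilds the real AFSC list from `translation_dict.keys()`.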

Source code in afccp/data/adjustments.py

def convert_instance_to_from_scrubbed(instance, new_letter=None, translation_dict=None, data_name='Unknown'):
    """
    Convert Between Original and Scrubbed AFSC Names Based on PGL Sorting.

    This function transforms a problem instance by reordering or restoring AFSC names based on their PGL targets.
    It is used to anonymize (scrub) AFSCs for publication or experimentation by replacing real AFSC identifiers
    with generic labels (e.g., "X1", "X2", ...) while preserving order. If a translation dictionary is provided,
    it performs the inverse operation, restoring original AFSC names from their scrubbed versions.

    The function updates all relevant AFSC-indexed matrices, arrays, and value parameters in the instance.
    It also modifies the solution dictionary (`instance.solutions`) and preference matrices to maintain consistency.

    Parameters:
    --------
    - instance (`CadetCareerProblem`): The full problem instance containing parameter and solution data.
    - new_letter (str, optional): A single letter (e.g., `"X"`) to use as the prefix for scrubbed AFSC names.
      If provided, performs a *forward* conversion (real → scrubbed).
    - translation_dict (dict, optional): A mapping from real to scrubbed AFSC names. If provided and
      `new_letter` is None, performs a *reverse* conversion (scrubbed → real).
    - data_name (str, optional): A custom label to attach to the instance's `data_name` attribute.
      In a forward conversion it is overwritten with `new_letter`.

    Returns:
    --------
    - tuple:
        - `instance` (`CadetCareerProblem`): The updated instance with renamed AFSCs and adjusted internal data.
        - `translation_dict` (dict): The mapping used for conversion (real → scrubbed).

    Examples:
    --------
    ```python
    # Forward conversion (scrubbing AFSC names)
    new_instance, afsc_mapping = convert_instance_to_from_scrubbed(instance, new_letter="X")

    # Reverse conversion (restoring AFSC names)
    original_instance, _ = convert_instance_to_from_scrubbed(new_instance, translation_dict=afsc_mapping)
    ```
    """

    # Load parameters
    p = copy.deepcopy(instance.parameters)

    # Initialize AFSC information
    current_afscs_unsorted = p["afscs"][:p["M"]]
    new_p = copy.deepcopy(p)

    # We're going from original to scrubbed
    if new_letter is not None:
        data_name = new_letter

        # Sort current list of AFSCs by PGL
        t_indices = np.argsort(p["pgl"])[::-1]  # Indices that would sort the list -> used a lot below!
        current_afscs = copy.deepcopy(current_afscs_unsorted[t_indices])

        # Construct new list of AFSCs
        new_p['afscs'] = np.array([' ' * 10 for _ in p['J']])
        for j, afsc in enumerate(current_afscs):
            new_p['afscs'][j] = new_letter + str(j + 1)

            # Adjust new AFSC by adding "_U" or "_R" extension if necessary
            for ext in ["_R", "_U"]:
                if ext in afsc:
                    new_p['afscs'][j] += ext
                    break

        # Translate AFSCs to the new list
        translation_dict = {}
        for afsc in current_afscs_unsorted:
            j = np.where(current_afscs == afsc)[0][0]
            translation_dict[afsc] = new_p['afscs'][j]  # Save this AFSC to the translation dictionary
        new_p["afscs"] = np.hstack((new_p["afscs"], "*"))  # Add "unmatched" AFSC

    # We're going from scrubbed to original
    else:

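        # Translate AFSCs: the dictionary keys give the real names in their original order,
        # so we work out where each real AFSC currently sits in the scrubbed order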
        new_p["afscs"] = np.array(list(translation_dict.keys()))
        new_p["afscs"] = np.hstack((new_p["afscs"], "*"))  # Add "unmatched" AFSC
        flipped_translation_dict = {translation_dict[afsc]: afsc for afsc in translation_dict}
        real_order_scrubbed_afscs = np.array(list(flipped_translation_dict.keys()))
        scrubbed_order_indices = np.array(
            [np.where(real_order_scrubbed_afscs==afsc)[0][0] for afsc in current_afscs_unsorted])
        scrubbed_order_real_afscs = new_p['afscs'][scrubbed_order_indices]
        current_afscs = real_order_scrubbed_afscs

        # Get sorted indices
        t_indices = np.array([np.where(scrubbed_order_real_afscs==afsc)[0][0] for afsc in new_p["afscs"][:p["M"]]])

    # Loop through each key in the parameter dictionary to translate it
    for key in p:

        # If it's a one dimensional array of length M, we translate it accordingly
        if np.shape(p[key]) == (p["M"], ) and "^" not in key:  # Sets/Subsets will be adjusted later
            new_p[key] = p[key][t_indices]

        # If it's a two-dimensional array of shape Mx4, we translate it accordingly
        elif np.shape(p[key]) == (p["M"], 4):
            new_p[key] = p[key][t_indices, :]

        # If it's a two-dimensional array of shape (NxM), we translate it accordingly
        elif np.shape(p[key]) == (p["N"], p["M"]) and key not in ['c_preferences', 'c_utilities']:
            new_p[key] = p[key][:, t_indices]

        # If it's a two-dimensional array of shape (NxM+1), we translate it accordingly (leave unmatched AFSC alone)
        elif np.shape(p[key]) == (p["N"], p["M"] + 1):
            new_p[key] = copy.deepcopy(p[key])
            new_p[key][:, :p['M']] = p[key][:, t_indices]

    # Get assigned AFSC vector
    for i, real_afsc in enumerate(p["assigned"]):
        if real_afsc in current_afscs:
            j = np.where(current_afscs == real_afsc)[0][0]
            new_p["assigned"][i] = new_p["afscs"][j]

    # Set additions, and add to the instance
    instance.parameters = parameter_sets_additions(new_p)

    # Translate value parameters
    if instance.vp_dict is not None:
        new_vp_dict = {}
        for vp_name in instance.vp_dict:
            vp = copy.deepcopy(instance.vp_dict[vp_name])
            new_vp = copy.deepcopy(vp)

            for key in vp:

                # If it's a one dimensional array of length M, we translate it accordingly
                if np.shape(vp[key]) == (p["M"],):
                    new_vp[key] = vp[key][t_indices]

                # If it's a two-dimensional array of shape (NxM), we translate it accordingly
                elif np.shape(vp[key]) == (p["N"], p["M"]):
                    new_vp[key] = vp[key][:, t_indices]

                # If it's a two-dimensional array of shape (MxO), we translate it accordingly
                elif np.shape(vp[key]) == (vp["M"], vp["O"]) and key not in ["a", "f^hat"]:
                    new_vp[key] = vp[key][t_indices, :]

            for j, old_j in enumerate(t_indices):
                for k in vp["K"]:
                    for key in ["a", "f^hat"]:
                        new_vp[key][j][k] = vp[key][old_j][k]

            # Set value parameters to dict
            new_vp_dict[vp_name] = new_vp

        # Set it to the instance
        instance.vp_dict = new_vp_dict

        # Loop through each set of value parameters again
        for vp_name in instance.vp_dict:

            # Set additions
            instance.vp_dict[vp_name] = \
                afccp.data.values.value_parameters_sets_additions(instance.parameters, instance.vp_dict[vp_name])

    else:
        instance.vp_dict = None

    # Translate solutions
    if instance.solutions is not None:
        new_solutions_dict = {}

        # Loop through each solution
        for solution_name in instance.solutions:
            real_solution = copy.deepcopy(instance.solutions[solution_name])
            new_solutions_dict[solution_name] = copy.deepcopy(real_solution)

            # Loop through each assigned AFSC for the cadets
            for i, j in enumerate(real_solution['j_array']):
                if j != p["M"]:
                    real_afsc = p["afscs"][j]
                    j = np.where(current_afscs == real_afsc)[0][0]
                    new_solutions_dict[solution_name]['j_array'][i] = j

        # Save solutions dictionary
        instance.solutions = new_solutions_dict

    else:
        instance.solutions = None

    # Convert "c_preferences" array
    if "c_preferences" in p:
        for i in p["I"]:
            for pref in range(p["P"]):
                real_afsc = p["c_preferences"][i, pref]
                if real_afsc in current_afscs:
                    j = np.where(current_afscs == real_afsc)[0][0]
                    new_p["c_preferences"][i, pref] = new_p["afscs"][j]

    # Instance Attributes
    instance.data_name, instance.data_version = data_name, "Default"
    instance.import_paths, instance.export_paths = None, None

    return instance, translation_dict
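
The core of the translation loop is a column permutation driven by `t_indices`: new column `j` takes old column `t_indices[j]`, and the extra "unmatched" column of an N x (M+1) array stays in place. The toy arrays below are hypothetical. The `np.argsort` inverse used to remap solution indices is a shortcut for the forward case, whereas the source looks each AFSC up by name.

```python
import numpy as np

t_indices = np.array([1, 3, 0, 2])  # new column j takes old column t_indices[j]
M = len(t_indices)

# N x M array: permute every column
utility = np.arange(8).reshape(2, M)
reordered = utility[:, t_indices]

# N x (M + 1) array: permute the first M columns, leave the last one alone
utility_plus = np.hstack([utility, np.full((2, 1), -1)])
reordered_plus = utility_plus.copy()
reordered_plus[:, :M] = utility_plus[:, t_indices]

# Solution vector of AFSC indices, where index M means "unmatched"
j_array = np.array([0, 3, M])
inverse = np.argsort(t_indices)  # inverse[old_j] gives the new position of old_j
new_j_array = np.where(j_array == M, M, inverse[np.minimum(j_array, M - 1)])

print(reordered)
print(reordered_plus)
print(new_j_array)
```

The arrays and the solution vector move in opposite directions: arrays are indexed with `t_indices`, while stored AFSC indices need the inverse permutation.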
