Q&A Call
The Q&A Call for the 2023 Humana-Mays Healthcare Data Analytics Case Competition is scheduled for September 23, 2023. For further questions, please email the team at humanacasecomp@tamu.edu.
Please see the previous year’s Q&A Call materials below to learn more about the competition.
Q&A Slide Deck:
Humana-Mays_Case_Competition_Q&A_09272022
Q&A Answer Log (Including those not answered on the call):
“How recent is this data?”
A: The survey used to collect the self-reported Housing Insecurity indicator was collected in the June/July timeframe of 2022. The lookback period associated with the potential predictor data is 12 months prior to survey response date.
“Can you elaborate more on the 3 fields in the hold-out submission? Especially RANK? Are we just sorting the scores to get the ranking?”
A: ID is the identification variable on the holdout. We need this field returned so that we can match your scored file to our file of record and evaluate the performance and fairness of your resulting model.
SCORE is the probability value assigned by your model. Typically, this is a value between 0 and 1, where the higher the value, the more likely the record is to have housing insecurity issues.
RANK is simply the record number once the scores are sorted from highest probability to lowest probability. If records have the same score value (i.e. a tie), those records should receive the same RANK value.
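As a rough illustration of the ID / SCORE / RANK layout described above, the following sketch sorts scores from highest to lowest and gives tied scores the same rank. The function name `build_submission` and the sample IDs are hypothetical. The answer does not say which rank the record after a tie should receive, so the "1, 1, 3" convention used here is an assumption; a "1, 1, 2" (dense) convention would also satisfy the stated rule.

```python
def build_submission(scores):
    """Turn {ID: probability} into (ID, SCORE, RANK) rows, highest score first.

    Assumption: tied scores share the rank of the first record in the tie,
    and the next distinct score takes its sorted position (1, 1, 3, ...).
    """
    # sorted() is stable, so tied records keep their input order.
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    rows = []
    previous_score = None
    rank = 0
    for position, (record_id, score) in enumerate(ordered, start=1):
        if score != previous_score:
            rank = position          # new distinct score: rank = sorted position
            previous_score = score
        rows.append((record_id, score, rank))
    return rows


# Hypothetical IDs and scores, for illustration only.
example = build_submission({"a": 0.91, "b": 0.40, "c": 0.91, "d": 0.07})
for row in example:
    print(row)
```

Records "a" and "c" tie on the highest score in this example, so both receive RANK 1.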
“The variable “cms_ma_plan_ind” takes the value of 0 for all observations. It is described as a “Binary indicator that a member is on a Medicare Advantage plan.” Is it correct that the data is supposed to include Humana members on a Medicare Advantage plan with Part D, so that all members are expected to have the value of 1 for that variable?”
A: A Medicare Advantage with Part D (or MAPD) plan includes Medicare Part D prescription drug coverage and a Medicare Advantage (or MA) plan does not include drug coverage. All members in this file are MAPD members.
“Is the underlying rate card assumed to be the same/similar? i.e. same treatment/prescription = same cost for all the members in the data.”
A: Associated costs for treatments & prescriptions can be different for members depending on the plan and location of service. As such, the provided data includes individual costs realized for each member.
“What’s the difference between Probable Homeowner and Homeowner for the “cons_homstat” feature?”
A: Homeowner refers to a “verified” homeowner, whereas Probable Homeowner is a household that is “unverified” but scores highly on a homeownership predictive model.
“What’s the difference between Probable Multi-Buyer and Multi-Buyer for the “cons_mobplus” feature?”
A: Single Buyer refers to a member who has 1 “verified” purchase from a mail order vendor. Multi-Buyer is a member who has 2+ “verified” purchases from a mail order vendor. Probable Multi-Buyer is a member who has no verified purchases but scores highly on a mail-order buyer predictive model.
“What does ‘Both’ (i.e. value = 3) mean for the feature cms_orig_reas_entitle_cd? Does it mean all values (0,1,2) or just (1,2)?”
A: Both stands for 1 – Disable and 2 – End Stage Renal Disease (ESRD).
“Based on the slides from the kick-off meeting, there are “outreach point features” in the dataset. Would you please give an example of what that means?”
A: This refers to the features with the prefix of “CNT_CP”. These features refer to the number of interactions Humana has had with that member via different channels (i.e. phone, web, mail) over the course of the last 12 months.
“We have a question about calculation of disparity score. As in the guide, the scoring metric is the true positive rate. Since we just need to upload scores and rankings in our results, I am wondering what threshold you would use to determine whether a score is labeled as positive or negative.”
A: You will only be able to calculate a performance score (i.e. AUC) and a fairness score (i.e. disparity score) on the training file, given that it contains the target variable (housing insecurity indicator). For the holdout submission, we are looking for the probability score, not a 1/0 prediction. Therefore, you will not need to choose a threshold to label records as positive or negative.
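To illustrate why AUC needs only the probability scores and no threshold, here is a minimal sketch that computes it on labeled training data as the share of (positive, negative) record pairs in which the positive record has the higher score, with ties counted as half. The function name `auc` and the sample values are hypothetical, and this pairwise form is for illustration only; on a full training file a library routine such as scikit-learn's `roc_auc_score` would be the practical choice.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count half).

    labels: iterable of 1 (housing insecure) / 0 (not) from the training file.
    scores: model probabilities in the same order.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative record")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and scores, for illustration only.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]
train_auc = auc(labels, scores)
print(train_auc)
```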
“Question about the data dictionary. I saw the file with the dictionary, but I only saw around 8 listings. Is there another file that mentions the other categories in excel? I noticed like 800 different columns with acronym style categorization.”
A: There are 8 long descriptions associated with the categorical features to help you understand what each value means in the data (i.e. the Long Descriptions tab within the data dictionary workbook). The rest of the fields in the file are numeric in nature and do not require a cipher.
“We are curious about whether cci_score and dcsi_score are ordinal, i.e., the higher (or lower) the scores are, the worse this person’s health status is.”
A: Yes. The higher the scores are, the worse this person’s health status is. This applies to both cci_score and dcsi_score; dcsi_score measures diabetes severity specifically.
“We found that ‘cons_stlnindx’ and ‘cons_stlindex’ are both integers in the dataset. Do they both mean category for loan? If not, what are their detailed descriptions?”
A: cons_stlnindx: Student Loan Index. A statistical model predicting whether an individual is likely to have a student loan. The higher the index, the higher the likelihood.
cons_stlindex: Short Term Loan Index. A demographic-based analytical model which predicts the likelihood that someone in the household has applied for a short term loan. The higher the index, the higher the likelihood.
“For both ‘cons_hxmh’ and ‘cons_hxmioc’, as the index numbers get bigger, do they mean people have a worse/better health situation?”
A: cons_hxmh: Health Index – Manage Health. Predicts the likelihood of an individual to proactively manage their health. They are likely to pursue health and wellness goals plus track exercise, vital signs, nutrition, calories, or weight. The higher the index, the higher the likelihood.
cons_hxmioc: Health Index – Manage Illness or Condition. Predicts the likelihood of an individual to self-monitor an illness or health condition. They are likely to manage medication, look up health or nutritional information, post a question online or share personal health history, use an online application that stores personal health history, communicate with a health professional through online options (email chat, app, webcam). The higher the index, the higher the likelihood.
“I’m seeing that there are quite a few data points with estimated age below 65, but with Medicare Reason being “Old Age Survivors Insurance (OASI)”. Is this expected? On top of this, we were wondering what the category “Both” means in the original reason for entry into Medicare, since there are three categories: Old Age Survivors Insurance (OASI), Disable and End Stage Renal Disease (ESRD).”
A: Yes, generally members with “Old Age Survivors Insurance (OASI)” should be 65 and older, so it appears that the synthetic data algorithm has introduced a little fuzziness here.
Both stands for 1 – Disable and 2 – End Stage Renal Disease (ESRD).
“In the fairness guide, it is stated that fairness is based on the TPR for each group of people based on race and gender. How will you treat the racial categories “other” and “unknown”? Am I correct to assume that you will calculate the TPR for the “other” racial category but not for the “unknown” category?”
A: When calculating the Disparity score, we are accounting for all categories, including Unknown, White, Black, Other, Asian, Hispanic, and North American Native.
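For self-checks on the training file, a per-group true positive rate can be sketched as below, with every race category (including Unknown and Other) treated as its own group. The function name `tpr_by_group`, the sample data, and the 0.5 cutoff are all assumptions made for illustration. This log does not state how the official evaluation labels a score as positive or how the group rates are combined into the Disparity score, so the sketch stops at the per-group rates.

```python
from collections import defaultdict


def tpr_by_group(groups, labels, scores, cutoff=0.5):
    """True positive rate per group: flagged positives / actual positives.

    Assumption: a record counts as flagged when score >= cutoff; the 0.5
    default is illustrative and not the competition's rule.
    Groups with no actual positives are left out, since TPR is undefined there.
    """
    actual_positives = defaultdict(int)
    true_positives = defaultdict(int)
    for group, label, score in zip(groups, labels, scores):
        if label == 1:
            actual_positives[group] += 1
            if score >= cutoff:
                true_positives[group] += 1
    return {g: true_positives[g] / actual_positives[g] for g in actual_positives}


# Hypothetical training rows, for illustration only.
groups = ["White", "White", "Black", "Black", "Unknown", "Unknown"]
labels = [1, 1, 1, 0, 1, 1]
scores = [0.7, 0.2, 0.6, 0.9, 0.8, 0.55]
rates = tpr_by_group(groups, labels, scores)
print(rates)
```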
“Please tell me if my understanding of the cnt_cp lag variables is correct. For example, if the score date was 15th September, then cnt_cp_emails_1 will be the count of member interactions from 15th July to 15th August, i.e. 1 month prior to the month of the score date.”
A: Correct.
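The confirmed example (score date 15th September, cnt_cp_emails_1 covering 15th July to 15th August) can be reproduced with a small date calculation. The helper names `shift_months` and `lag_window` are hypothetical, the year 2022 is assumed for the example, and extending the same pattern to lags other than 1 is an assumption that the answer above does not confirm.

```python
from datetime import date


def shift_months(d, months):
    """Move a date back by whole months, keeping the same day of month.

    Simplification: days 29-31 can raise ValueError when the target month
    is shorter; this sketch does not handle that case.
    """
    month_index = d.year * 12 + (d.month - 1) - months
    return date(month_index // 12, month_index % 12 + 1, d.day)


def lag_window(score_date, lag):
    """Return (start, end) of the window for a cnt_cp_*_<lag> feature.

    Assumption: lag k spans from (k + 1) months to k months before the
    score date, matching the confirmed lag-1 example.
    """
    return shift_months(score_date, lag + 1), shift_months(score_date, lag)


# Year 2022 is assumed; the question only gives "15th September".
start, end = lag_window(date(2022, 9, 15), 1)
print(start, end)
```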
“Can you please explain what is meant by each type of member interaction: webstatement, vat, print, livecall and email?”
A: Webstatement: member interacted with Humana via online web activities.
VAT: member interacted with Humana via IVR (interactive voice response) call, basically a computerized call without a live agent.
Print: member interacted with Humana via printed materials, such as mailed documents.
Livecall: member interacted with Humana via live agent call.
Email: member interacted with Humana via emails.