Abstract
The minimal necessary NAACCR variables have been chosen, and the process for preparing them for analysis has been documented, along with the process for supplementing some of them with additional data from the EMR where available. I am ready to proceed to chart review of existing data, acquisition of independent NAACCR data, development of additional variables, and work on Aim 1.
Note: This is not (yet) a manuscript. We are still at the data cleaning/alignment stage and it is far too early to draw conclusions. Rather, this is a regularly updated report that I am sharing with you to keep you in the loop on my work and/or because you are also working on NAACCR, i2b2, Epic, or Sunrise, or because I value your perspective and perhaps my results might be useful to your own work. Only de-identified data has been used to generate these results; any dates or patient_num values you see here are also de-identified (with the size of time intervals preserved). This portion of the study is under Dr. Michalek’s exempt project IRB number HSC20170563N. If you are a researcher who would like a copy of the data, please email me and I will get back to you with further instructions and any additional information needed for our records.
Yellow highlights mark items that I know I need to deal with soon. Verbatim names of files, variables/elements, or values are displayed in a special style, like this. Data element names are in addition linked to a glossary at the end of this document, e.g. Surgical Oncology. This is where any relevant cleaning or transformation steps will be described (in progress). Data elements from NAACCR usually have a NAACCR ID preceding them, e.g. 1780 Quality of Survival. I try to use the term ‘data element’ to describe data in its raw state and ‘variable’ to refer to analysis-ready data that I have already processed. Often one variable incorporates information from multiple data elements. Tables, figures, and sections are also linked from text that references them.
If you have a Word version of this document, to follow a link, please hold down the ‘control’ key and click on it. The most current version of this document can be found online at https://rpubs.com/bokov/kidneycancer and it has a built-in chat session.
A recent study of state death records [1] reports that among US-born Texans of Hispanic ancestry (7.3 million, 27% of the State’s population), annual age-adjusted mortality rates for kidney cancer are 1.5-fold and 1.4-fold those of non-Hispanic whites for males and females respectively. My goal is to determine whether these findings can be replicated at UT Health (Aim 2) and Massachusetts General Hospital (Aim 3). If there is evidence for an ethnic disparity, I will look for possible mediators of this disparity among socioeconomic, lifestyle, and family history variables (Aim 2a). Otherwise the focus will shift to determining which of these same variables are the best predictors of mortality and recurrence.
At the Clinical Informatics Research Division (CIRD) we operate an i2b2 [2] data warehouse containing deidentified data for over 1.3 million patients from the electronic medical record (EMR) systems of the UT Health faculty practice and the University Health System (UHS) county hospital. We use the HERON [3] extract-transform-load (ETL) process to link data from multiple sources, including copies of monthly reports that the Mays Cancer Center sends to the Texas Cancer Registry with detailed information on cancer cases including dates of diagnosis, surgery, and recurrence along with stage and grade at presentation. My first-pass eligibility query returns 2327 patients having one or more of the following in their records: an ICD9 code of 189.0 or any ICD10 code starting with C64; the NAACCR item 0400 Primary Site having a value starting with C64 (Kidney, NOS); or the SEER Primary Site having a value of Kidney and Renal Pelvis.
My second-pass criteria narrow the initial cohort to patients that have NAACCR records, defined as having a non-missing 0390 Date of Diagnosis and one or both of Kidney, NOS or Kidney and Renal Pelvis. As can be seen from table I, only 486 of the patient set met these criteria and 1841 did not. In fact, a total of 673 patients had NAACCR records, but 187 of them had kidney cancer documented only in the EMR, with neither Kidney, NOS nor Kidney and Renal Pelvis in NAACCR. The next time I re-run my i2b2 query I will include all site-of-occurrence information from NAACCR, not just kidney. This will allow me to find out what types of cancer these patients do in fact have. In Appendix 3.2.1-Appendix 3.2.3 I identified additional exclusion criteria which I will implement in the next major revision of this document.
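The two eligibility passes described above can be expressed as simple predicates. The following is only a minimal sketch: the `PatientRecord` class and its field names are hypothetical stand-ins for whatever shape the i2b2 extract actually has, and it is not the query used to produce the counts in this report.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PatientRecord:
    """Hypothetical flattened view of one patient's facts (not the real i2b2 schema)."""
    icd9_codes: set = field(default_factory=set)
    icd10_codes: set = field(default_factory=set)
    naaccr_primary_site: Optional[str] = None        # NAACCR 0400 Primary Site
    seer_site: Optional[str] = None                  # SEER Primary Site
    naaccr_date_of_diagnosis: Optional[str] = None   # NAACCR 0390 Date of Diagnosis


def naaccr_kidney(p):
    """True if NAACCR itself documents kidney cancer via either site element."""
    site_c64 = (p.naaccr_primary_site or "").startswith("C64")
    seer_kidney = p.seer_site == "Kidney and Renal Pelvis"
    return site_c64 or seer_kidney


def first_pass(p):
    """Any EMR or NAACCR evidence of kidney cancer."""
    emr_icd9 = "189.0" in p.icd9_codes
    emr_icd10 = any(code.startswith("C64") for code in p.icd10_codes)
    return emr_icd9 or emr_icd10 or naaccr_kidney(p)


def second_pass(p):
    """First-pass patients with a NAACCR diagnosis date and a NAACCR kidney site."""
    return (first_pass(p)
            and p.naaccr_date_of_diagnosis is not None
            and naaccr_kidney(p))


# A patient with kidney cancer documented only in the EMR passes the first
# filter but not the second.
emr_only = PatientRecord(icd10_codes={"C64.1"})
print(first_pass(emr_only), second_pass(emr_only))
```

Keeping the NAACCR site test in its own function makes it easy to widen later, for example when all NAACCR sites of occurrence are added to the query.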
In sec. 2.1 I summarize the evidence that NAACCR and EMR records are correctly matched with each other. In sec. 2.2 I summarize the minimum set of NAACCR data elements that is sufficient to replicate my analysis in an independent NAACCR data set. In sec. 2.3 I report the extent to which the completeness of NAACCR records can be improved by using EMR records of the same patients. In sec. 3 is a technical demonstration of the data analysis scripts (on a small random sample). In sec. 4 there is a characterization of the full (N=2327) patient cohort. Finally, in sec. 5 I present my plans for overcoming the data issues I found, replicating the analysis on independent data, preparing additional variables, and starting work on Aim 1.
Since this is the first study at our site to make such extensive use of combined EMR and NAACCR data, it is important to first validate the data linkage done by our ETL.
The following data elements exist in both NAACCR and the EMR, respectively: date of birth (0240 Date of Birth and birth_date), marital status (0150 Marital Status at DX and Marital Status), sex (0220 Sex and sex_cd), race (Race (NAACCR 0160-0164) and race_cd), and Hispanic ethnicity (0190 Spanish/Hispanic Origin and Hispanic or Latino). The agreement between NAACCR and the EMR is never going to be 100%, with race, Hispanic ancestry, and marital status expected to be especially variable. Nonetheless, if record linkage is correct, then when patient counts for NAACCR and EMR are tabulated against each other for each of the above variables, most of the values should agree.
I confirmed that this is the case for marital status (table VII), sex (table VIII), race (table IX), and Hispanic ancestry (table X). Furthermore, there are 0 eligible patients lacking a 0240 Date of Birth and only 15 with a mismatch between 0240 Date of Birth and birth_date. Independent evidence for correct linkage is that EMR ICD9/10 codes for primary kidney cancer rarely precede 0390 Date of Diagnosis (fig. 5), EMR surgical history of nephrectomy and ICD9/10 codes for acquired absence of a kidney rarely precede 1200 RX Date--Surgery or 3170 RX Date--Most Defin Surg (fig. 6), and death dates from non-NAACCR sources (Death, i2b2, Deceased per SSA, and Expired) rarely precede 1760 Vital Status (fig. 10).
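The agreement check behind these cross-tabulations can be summarized as one number per variable. Below is a minimal sketch, assuming each patient contributes one (NAACCR value, EMR value) pair with `None` standing for a missing NAACCR record; the function names are illustrative. The counts are those of the sex cross-tabulation reported later in this document, with rows taken to be NAACCR and columns EMR.

```python
from collections import Counter


def agreement(table):
    """Fraction of patients with both values present whose values match.

    `table` maps (naaccr_value, emr_value) pairs to patient counts.
    Pairs with a missing value on either side count as missing, not discrepant.
    """
    both = {k: n for k, n in table.items() if k[0] is not None and k[1] is not None}
    total = sum(both.values())
    if total == 0:
        return float("nan")
    matched = sum(n for (a, b), n in both.items() if a == b)
    return matched / total


# Sex, NAACCR (first element) versus EMR (second element).
sex_table = Counter({
    ("m", "m"): 428, ("m", "f"): 1,
    ("f", "m"): 9, ("f", "f"): 235,
    (None, "m"): 937, (None, "f"): 716, (None, "u"): 1,
})

print(f"{agreement(sex_table):.3f}")  # prints 0.985 (663 of 673 patients)
```

A single agreement fraction hides which categories disagree, so it complements the full tables rather than replacing them.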
The primary outcome variables I need are date of initial diagnosis, date of surgery (if any), date of recurrence (if any), and date of death (if any). The primary predictor variable is whether or not a patient is Hispanic. There are many covariates of interest, but these five values are the scaffolding on which the rest of the analysis will be built.
I found the following NAACCR elements sufficient for deriving all the above analytic variables: 0190 Spanish/Hispanic Origin, 1880 Recurrence Type--1st, 3170 RX Date--Most Defin Surg, 1340 Reason for No Surgery, 0390 Date of Diagnosis, 1200 RX Date--Surgery, 1750 Date of Last Contact, 1760 Vital Status, 1770 Cancer Status, 1860 Recurrence Date--1st, Kidney and Renal Pelvis, and Kidney, NOS. More details about how these were selected can be found in Appendix 3.2. In addition, the following will almost certainly be needed for covariates or mediators: 0220 Sex, 0240 Date of Birth, 0150 Marital Status at DX, 0250 Birthplace, and any field whose name contains Race, Comorbid/Complication, AJCC, or TNM. For crosschecking it will also be useful to have 2850 CS Mets at DX, 0580 Date of 1st Contact, and 0446 Multiplicity Counter. Additional items are likely to be needed as this project evolves, but the elements listed so far should be sufficient to replicate my analysis on de-identified State or National NAACCR data.
EMR records can not only enrich the data with additional elements unavailable in NAACCR alone, but might also make it possible to fill in missing 0390 Date of Diagnosis, 3170 RX Date--Most Defin Surg / 1200 RX Date--Surgery, 1860 Recurrence Date--1st, and 1750 Date of Last Contact values. It may even be possible to reconstruct entire records for the 1841 kidney cancer patients in the EMR lacking NAACCR records. However, this depends on how much the EMR and NAACCR versions of a variable agree when neither is missing.
Data elements representing date of death and Hispanic ethnicity are in sufficient agreement (table X and Appendix 3.2.4) to justify merging information from the EMR and NAACCR. The process for combining them is described in the Death, Hispanic (strict), and Hispanic (broad) sections of Appendix 4 respectively. At this time I cannot merge diagnosis, surgery, or recurrence: where data from both sources is available, EMR dates lag considerably behind NAACCR dates (Appendix 3.2.1-Appendix 3.2.3) and their variability is probably larger than the effect size. The surgery and recurrence lags might be because those actual visits are not yet available in the data warehouse and I am only seeing them as reflected in the patient history at visits long after the fact. The diagnosis lag may be due to the decision to proceed with surgery often being made based on imaging data [4], with definitive pathology results only available after surgery (Appendix 3.2.2). Attempting to merge these elements would bias the data and obscure the actual differences. However, there are several ways forward that I will discuss in sec. 5 below.
EMR data can still be used to flag records for exclusion pending verification by chart review in cases where EMR codes for kidney cancer or secondary tumors precede Diagnosis or Recurrence respectively. This can also apply to nephrectomy EMR codes and Surgery, but I will need to distinguish between the prior nephrectomy being due to cancer versus other indications.
For now I am analyzing the data as if I only have access to NAACCR, except for mortality, where I do it both without (fig. 3) and with (fig. 4) the EMR.
The point of this section is solely to test whether my scripts succeeded in turning the raw data elements into time-to-event (TTE) variables to which Kaplan-Meier curves can be fit without numeric errors or grossly implausible results. All the plots below are from a small random sample of the data: N=127 (82 Hispanic and 45 non-Hispanic white, with 5 unknown excluded). This is further reduced in some cases as described in the figure captions. These sample sizes are not sufficient to detect clinically significant differences and, again, this is not the goal yet. The intent is only to ensure that my software performs correctly while keeping myself blinded to the hold-out data on which the hypothesis testing will ultimately be done.
Furthermore, these survival curves are not yet adjusted for covariates such as age or stage at diagnosis. There are also refinements planned to the exclusion criteria which I discuss below in sec. 5.
In all the plots below, the time is expressed in weeks and + signs denote censored events (the last follow-up of patients for whom the respective outcomes were never observed). The lightly-shaded regions around each line are 95% confidence intervals.
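For readers unfamiliar with how censored observations enter a Kaplan-Meier curve, here is a minimal, dependency-free sketch of the product-limit estimate. It is an illustration only, not the R code used to draw the figures; the function names are hypothetical, confidence intervals are omitted, and the `censor_at` helper assumes that events beyond the follow-up window are simply re-labelled as censored at the window's end.

```python
def censor_at(time, event, max_followup):
    """Administrative censoring: anything after max_followup is treated as censored."""
    if time > max_followup:
        return max_followup, False
    return time, event


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where an event was observed.

    times  : follow-up in weeks
    events : True if the outcome was observed, False if censored
    Patients censored at time t are still counted as at risk at t.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            observed += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Five made-up patients: events at weeks 2, 3, and 5; censored at weeks 3 and 8.
for t, s in kaplan_meier([2, 3, 3, 5, 8], [True, True, False, True, False]):
    print(t, round(s, 3))
```

Censored patients never lower the curve themselves; they only shrink the number at risk for later event times, which is why fewer censored events can improve sensitivity.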
Typically 2-4 weeks elapse between diagnosis and surgery, and providers try not to exceed 4 weeks. Nevertheless, years may sometimes elapse due to factors such as indolent tumors or loss of contact with the patient. About 15% of patients never undergo surgery [4]. Fig. 1 is in agreement with this. It can also be seen in fig. 1 that 34 surgeries seem to happen on the day of diagnosis. This is plausible if NAACCR diagnosis is based on pathology rather than clinical examination, where a positive result is usually coded as a renal mass, not a cancer. In my next data update I intend to also include all ICD9/10 codes for renal mass, at which point I will revisit the question of using EMR data to fill in missing diagnosis dates (see sec. 5).
Figure 1: Number of weeks elapsed from Diagnosis (time 0) to Surgery for 82 Hispanic and 45 non-Hispanic white patients with a 3-year follow-up period (any surgeries occurring more than 3 years post-diagnosis are treated as censored).
Figure 2: Number of weeks elapsed from Surgery (time 0) to Recurrence for 67 Hispanic and 34 non-Hispanic white patients. The numbers are lower than for fig. 1 because patients not undergoing surgery are excluded. Here the follow-up period is six years.
Figure 3: Like fig. 2 except now the outcome is 1760 Vital Status for 67 Hispanic and 34 non-Hispanic white patients. Six-year follow-up.
Figure 4: Like fig. 3 but now supplemented with EMR information to see how much of a difference it makes. For the predictor, Hispanic (broad) is used instead of Hispanic (NAACCR), and for the outcome, Death is used instead of 1760 Vital Status. There were 68 Hispanic and 33 non-Hispanic white patients. There were 10 fewer censored events than in fig. 3, which may improve sensitivity in the actual analysis.
The below variables are subject to change as the data validation and preparation processes evolve.
 | Disease-free | Never disease-free | Recurred | Unknown if recurred or was ever gone | Not in NAACCR |
---|---|---|---|---|---|
n | 160 | 211 | 95 | 20 | 1841 |
Age at Last Contact, combined (mean (sd)) | 54.32 (20.42) | 63.43 (13.76) | 62.51 (15.23) | 55.59 (23.01) | 61.34 (14.18) |
a_hsp_broad (%) | |||||
Hispanic | 106 ( 66.2) | 116 ( 55.0) | 50 ( 52.6) | 8 ( 40.0) | 857 (46.6) |
non-Hispanic white | 47 ( 29.4) | 75 ( 35.5) | 42 ( 44.2) | 10 ( 50.0) | 525 (28.5) |
Other | 3 ( 1.9) | 17 ( 8.1) | 3 ( 3.2) | 1 ( 5.0) | 13 ( 0.7) |
Unknown | 4 ( 2.5) | 3 ( 1.4) | 0 | 1 ( 5.0) | 364 (19.8) |
NA | 0 | 0 | 0 | 0 | 82 ( 4.5) |
a_hsp_naaccr (%) | |||||
Hispanic | 100 ( 62.5) | 114 ( 54.0) | 46 ( 48.4) | 8 ( 40.0) | 86 ( 4.7) |
non-Hispanic white | 50 ( 31.2) | 74 ( 35.1) | 45 ( 47.4) | 10 ( 50.0) | 84 ( 4.6) |
Other | 4 ( 2.5) | 18 ( 8.5) | 2 ( 2.1) | 1 ( 5.0) | 14 ( 0.8) |
Unknown | 6 ( 3.8) | 5 ( 2.4) | 2 ( 2.1) | 1 ( 5.0) | 3 ( 0.2) |
NA | 0 | 0 | 0 | 0 | 1654 (89.8) |
a_hsp_strict (%) | |||||
Hispanic | 62 ( 38.8) | 68 ( 32.2) | 27 ( 28.4) | 6 ( 30.0) | 562 (30.5) |
non-Hispanic white | 29 ( 18.1) | 64 ( 30.3) | 35 ( 36.8) | 9 ( 45.0) | 53 ( 2.9) |
Other | 4 ( 2.5) | 12 ( 5.7) | 2 ( 2.1) | 1 ( 5.0) | 84 ( 4.6) |
Unknown | 65 ( 40.6) | 67 ( 31.8) | 31 ( 32.6) | 4 ( 20.0) | 702 (38.1) |
NA | 0 | 0 | 0 | 0 | 440 (23.9) |
a_tdeath (%) | 8 ( 5.0) | 99 ( 46.9) | 30 ( 31.6) | 3 ( 15.0) | 305 (16.6) |
a_tdiag (%) | 160 (100.0) | 211 (100.0) | 95 (100.0) | 20 (100.0) | 0 |
a_trecur (%) | 0 | 1 ( 0.5) | 83 ( 87.4) | 0 | 41 ( 2.2) |
a_tsurg (%) | 157 ( 98.1) | 113 ( 53.6) | 94 ( 98.9) | 13 ( 65.0) | 113 ( 6.1) |
BMI (mean (sd)) | 31.19 (8.34) | 27.77 (7.26) | 29.32 (7.11) | 29.66 (9.92) | 30.63 (9.31) |
Deceased, EMR (%) | 7 ( 4.4) | 90 ( 42.7) | 22 ( 23.2) | 3 ( 15.0) | 298 (16.2) |
Deceased, Registry (%) | 1 ( 0.6) | 71 ( 33.6) | 18 ( 18.9) | 3 ( 15.0) | 43 ( 2.3) |
Deceased, SSN (%) | 1 ( 0.6) | 12 ( 5.7) | 5 ( 5.3) | 0 | 89 ( 4.8) |
Diabetes, i2b2 (%) | 56 ( 35.0) | 54 ( 25.6) | 27 ( 28.4) | 1 ( 5.0) | 585 (31.8) |
Diabetes, Registry (%) | 31 ( 19.4) | 26 ( 12.3) | 8 ( 8.4) | 0 | 26 ( 1.4) |
Hispanic, i2b2 (%) | 92 ( 57.5) | 96 ( 45.5) | 43 ( 45.3) | 7 ( 35.0) | 746 (40.5) |
Hispanic, Registry (%) | |||||
Non_Hispanic | 54 ( 33.8) | 92 ( 43.6) | 47 ( 49.5) | 11 ( 55.0) | 98 ( 5.3) |
Unknown | 6 ( 3.8) | 5 ( 2.4) | 2 ( 2.1) | 1 ( 5.0) | 3 ( 0.2) |
Hispanic_NOS | 86 ( 53.8) | 96 ( 45.5) | 43 ( 45.3) | 8 ( 40.0) | 67 ( 3.6) |
Mexican | 13 ( 8.1) | 17 ( 8.1) | 1 ( 1.1) | 0 | 17 ( 0.9) |
Spanish_Surname | 0 | 1 ( 0.5) | 1 ( 1.1) | 0 | 2 ( 0.1) |
Cuban | 1 ( 0.6) | 0 | 0 | 0 | 0 |
S_Ctr_America | 0 | 0 | 1 ( 1.1) | 0 | 0 |
NA | 0 | 0 | 0 | 0 | 1654 (89.8) |
Insurance, Registry (%) | |||||
Not Insured | 17 ( 10.6) | 21 ( 10.0) | 7 ( 7.4) | 2 ( 10.0) | 17 ( 0.9) |
Self-Pay | 22 ( 13.8) | 21 ( 10.0) | 15 ( 15.8) | 0 | 14 ( 0.8) |
Insurance NOS | 1 ( 0.6) | 5 ( 2.4) | 0 | 0 | 1 ( 0.1) |
Managed Care HMO / PPO | 56 ( 35.0) | 53 ( 25.1) | 28 ( 29.5) | 10 ( 50.0) | 40 ( 2.2) |
Private Fee-for-Svc | 0 | 1 ( 0.5) | 0 | 0 | 0 |
Medicaid | 10 ( 6.2) | 14 ( 6.6) | 1 ( 1.1) | 0 | 10 ( 0.5) |
Medicaid Mgd. Care Pln. | 14 ( 8.8) | 6 ( 2.8) | 6 ( 6.3) | 3 ( 15.0) | 10 ( 0.5) |
Medicare/Medicaid NOS | 13 ( 8.1) | 30 ( 14.2) | 12 ( 12.6) | 1 ( 5.0) | 36 ( 2.0) |
Medicare w Suppl. NOS | 3 ( 1.9) | 2 ( 0.9) | 2 ( 2.1) | 0 | 6 ( 0.3) |
Medicare Mgd. Care Pln. | 9 ( 5.6) | 16 ( 7.6) | 7 ( 7.4) | 3 ( 15.0) | 13 ( 0.7) |
Medicare w Private Suppl. | 5 ( 3.1) | 22 ( 10.4) | 9 ( 9.5) | 0 | 20 ( 1.1) |
Medicare w Medicaid | 3 ( 1.9) | 5 ( 2.4) | 2 ( 2.1) | 0 | 7 ( 0.4) |
TriCare | 3 ( 1.9) | 1 ( 0.5) | 0 | 0 | 4 ( 0.2) |
VA | 1 ( 0.6) | 7 ( 3.3) | 1 ( 1.1) | 0 | 3 ( 0.2) |
Unknown | 3 ( 1.9) | 7 ( 3.3) | 5 ( 5.3) | 1 ( 5.0) | 6 ( 0.3) |
NA | 0 | 0 | 0 | 0 | 1654 (89.8) |
Kidney Cancer, i2b2 (%) | 152 ( 95.0) | 193 ( 91.5) | 85 ( 89.5) | 17 ( 85.0) | 1729 (93.9) |
Kidney Cancer, Registry (%) | 156 ( 97.5) | 204 ( 96.7) | 87 ( 91.6) | 19 ( 95.0) | 20 ( 1.1) |
Language, i2b2 (%) | |||||
English | 128 ( 80.0) | 173 ( 82.0) | 84 ( 88.4) | 19 ( 95.0) | 1588 (86.3) |
Spanish | 31 ( 19.4) | 29 ( 13.7) | 7 ( 7.4) | 1 ( 5.0) | 213 (11.6) |
Other | 0 | 3 ( 1.4) | 0 | 0 | 4 ( 0.2) |
Unknown | 1 ( 0.6) | 6 ( 2.8) | 4 ( 4.2) | 0 | 36 ( 2.0) |
Marital Status, Registry (%) | |||||
Divorced | 13 ( 8.1) | 16 ( 7.6) | 11 ( 11.6) | 0 | 16 ( 0.9) |
Separated | 8 ( 5.0) | 2 ( 0.9) | 1 ( 1.1) | 2 ( 10.0) | 6 ( 0.3) |
Married | 79 ( 49.4) | 125 ( 59.2) | 56 ( 58.9) | 7 ( 35.0) | 102 ( 5.5) |
Domestic Partner | 0 | 0 | 0 | 0 | 0 |
Single | 39 ( 24.4) | 30 ( 14.2) | 16 ( 16.8) | 9 ( 45.0) | 32 ( 1.7) |
Unknown | 15 ( 9.4) | 24 ( 11.4) | 8 ( 8.4) | 2 ( 10.0) | 17 ( 0.9) |
Widowed | 6 ( 3.8) | 14 ( 6.6) | 3 ( 3.2) | 0 | 14 ( 0.8) |
NA | 0 | 0 | 0 | 0 | 1654 (89.8) |
n_cstatus (%) | |||||
Tumor_Free | 160 (100.0) | 1 ( 0.5) | 7 ( 7.4) | 0 | 58 ( 3.2) |
Tumor | 0 | 210 ( 99.5) | 81 ( 85.3) | 0 | 114 ( 6.2) |
Unknown | 0 | 0 | 7 ( 7.4) | 20 (100.0) | 15 ( 0.8) |
NA | 0 | 0 | 0 | 0 | 1654 (89.8) |
Race, i2b2 (%) | |||||
White | 149 ( 93.1) | 185 ( 87.7) | 87 ( 91.6) | 19 ( 95.0) | 1566 (85.1) |
Black | 3 ( 1.9) | 10 ( 4.7) | 3 ( 3.2) | 1 ( 5.0) | 95 ( 5.2) |
Asian | 3 ( 1.9) | 6 ( 2.8) | 0 | 0 | 13 ( 0.7) |
Pac Islander | 0 | 0 | 0 | 0 | 1 ( 0.1) |
Other | 0 | 3 ( 1.4) | 0 | 0 | 46 ( 2.5) |
Unknown | 5 ( 3.1) | 7 ( 3.3) | 5 ( 5.3) | 0 | 120 ( 6.5) |
Race, Registry (%) | |||||
White | 153 ( 95.6) | 188 ( 89.1) | 91 ( 95.8) | 18 ( 90.0) | 170 ( 9.2) |
Black | 3 ( 1.9) | 10 ( 4.7) | 2 ( 2.1) | 1 ( 5.0) | 11 ( 0.6) |
Asian | 1 ( 0.6) | 3 ( 1.4) | 0 | 0 | 2 ( 0.1) |
Pac Islander | 0 | 1 ( 0.5) | 0 | 0 | 0 |
Other | 0 | 4 ( 1.9) | 0 | 0 | 0 |
Unknown | 3 ( 1.9) | 5 ( 2.4) | 2 ( 2.1) | 1 ( 5.0) | 4 ( 0.2) |
NA | 0 | 0 | 0 | 0 | 1654 (89.8) |
Sex, i2b2 (%) | |||||
m | 100 ( 62.5) | 151 ( 71.6) | 63 ( 66.3) | 13 ( 65.0) | 1047 (56.9) |
f | 60 ( 37.5) | 60 ( 28.4) | 32 ( 33.7) | 7 ( 35.0) | 793 (43.1) |
u | 0 | 0 | 0 | 0 | 1 ( 0.1) |
Sex, Registry (%) | |||||
m | 98 ( 61.3) | 149 ( 70.6) | 63 ( 66.3) | 13 ( 65.0) | 106 ( 5.8) |
f | 62 ( 38.8) | 62 ( 29.4) | 32 ( 33.7) | 7 ( 35.0) | 81 ( 4.4) |
NA | 0 | 0 | 0 | 0 | 1654 (89.8) |
This detailed investigation of the available data elements and development of analysis scripts opens four priority directions: more data, external data, more covariates, and improved pre-processing at the i2b2 end (Aim 1).
More data can be acquired by reclaiming values that are currently inconsistent or missing. There are various ad-hoc consistency checks described in Appendix 3.1, Appendix 3.2.1, and Appendix 3.2.2. I need to gather these checks in one place and systematically run them on every patient to get a total count of records that need manual chart review (Dr. Rodriguez’s protocol) and, for each record, a list of issues to resolve.
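One possible shape for such a consolidated check runner is sketched below. Everything in it is an assumption made for illustration: the three check functions, their names, and the record layout are hypothetical placeholders, not the checks actually described in the appendices.

```python
from datetime import date


def surgery_before_diagnosis(rec):
    """Hypothetical check: surgery dated strictly earlier than diagnosis."""
    d, s = rec.get("diagnosis"), rec.get("surgery")
    return d is not None and s is not None and s < d


def recurrence_before_surgery(rec):
    """Hypothetical check: recurrence dated strictly earlier than surgery."""
    s, r = rec.get("surgery"), rec.get("recurrence")
    return s is not None and r is not None and r < s


def event_after_death(rec):
    """Hypothetical check: any clinical event dated after the death date."""
    death = rec.get("death")
    if death is None:
        return False
    others = (rec.get(k) for k in ("diagnosis", "surgery", "recurrence"))
    return any(d is not None and d > death for d in others)


# Registry of checks: adding a new one is a one-line change.
CHECKS = {
    "surgery_before_diagnosis": surgery_before_diagnosis,
    "recurrence_before_surgery": recurrence_before_surgery,
    "event_after_death": event_after_death,
}


def review_queue(records):
    """Map patient id to the names of failed checks, for patients with any issue."""
    queue = {}
    for pid, rec in records.items():
        issues = [name for name, check in CHECKS.items() if check(rec)]
        if issues:
            queue[pid] = issues
    return queue


records = {
    101: {"diagnosis": date(2012, 3, 1), "surgery": date(2012, 3, 20)},
    102: {"diagnosis": date(2012, 3, 1), "surgery": date(2011, 12, 5)},
}
queue = review_queue(records)
print(len(queue), queue)
```

The length of the returned dictionary is the total count of records needing chart review, and each value is the per-record list of issues to resolve.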
To reclaim missing values I will need to solve the problem of lag and disagreement between the EMR and NAACCR (sec. 2.3). I will meet with the MCC NAACCR registrar and learn where exactly in the EMR and other sources she looks to abstract 1880 Recurrence Type--1st, 3170 RX Date--Most Defin Surg, 1340 Reason for No Surgery, 0390 Date of Diagnosis, 1200 RX Date--Surgery, 1750 Date of Last Contact, 1760 Vital Status, 1770 Cancer Status, 1860 Recurrence Date--1st, Kidney and Renal Pelvis, and Kidney, NOS. I will also meet with personnel experienced in Urology chart review to learn their methods. This may lead to improvements in the CIRD ETL process. I also plan on adding all ICD codes for ‘renal mass’ [4] to my i2b2 query (Appendix 3.2.1). Meanwhile, in response to researcher questions including my own, CIRD staff have identified thousands of NAACCR entries and surgery billing records that got excluded from i2b2 because they are not associated with visits to UT Health clinics. After the next i2b2 refresh we expect an increased number of patients and possibly improved agreement of event dates between EMR and NAACCR.
For external data I will request non-aggregated limited/deidentified records from the Texas Cancer Registry. I will also look at the NCDB dataset obtained by Urology to see if it has the elements listed in sec. 2.2.
In the remainder of Aim 2 and Aim 3 I will need the following additional variables: (NAACCR only) stage and grade; (EMR only) analgesics, smoking and alcohol, family history of cancer or diabetes, lab results, vital signs, Miperamine (as per Dr. Michalek), frequency of lab and image orders, frequency and duration of visits, and participation in adjuvant trials; (both) birthplace, language, and diabetes; and (census data in i2b2) income and education. Each of these will require a workup similar to that reported in sec. 2 and Appendix 3. I can work independently on many of these but I will need guidance from experts in Urology on interpreting the stage and grade data. If genomic data from the Urology biorepository becomes available for these patients in the course of this study it also will become an important variable for Aim 2.
The use of TCR or NCDB data is not a substitute for UT Health and MGH i2b2 data. The registries allow me to test the replicability of high-level findings to State and National populations but they will not have the detailed additional variables I will need to investigate the causes of disparate patient outcomes.
Nor are the R scripts I wrote for this project a substitute for the DataFinisher [5] development planned for Aim 1. On the contrary, the reason I was able to make this much progress in one month is that the data linkage and de-identification was done by the CIRD i2b2 ETL, the data selection was simplified by the i2b2 web client, and an enormous amount of post-processing was done by my DataFinisher app that is integrated into our local i2b2. During the work I present here I found several additional post-processing steps that generalize to other studies and I will integrate those into DataFinisher so that the data it outputs is even more analysis-ready. This will, in turn, simplify the logistics of Aim 3.
1. Pinheiro, P. S. et al. High cancer mortality for US-born Latinos: Evidence from California and Texas. BMC Cancer 17, (2017).
2. Murphy, S. et al. Instrumenting the health care enterprise for discovery research in the genomic era. Genome Research 19, 1675–1681 (2009).
3. Adagarla, B. et al. SEINE: Methods for Electronic Data Capture and Integrated Data Repository Synthesis with Patient Registry Use Cases. (2015).
4. Rodriguez, R. personal communication (2018).
5. Bokov, A., Manuel, L., Cheng, C., Bos, A. & Tirado-Ramos, A. Denormalize and Delimit: How not to Make Data Extraction for Analysis More Complex than Necessary. Procedia Computer Science 80, 1033–1041 (2016).
Need to tabulate the frequencies of various combinations of TNM values
3400 Derived AJCC-7 T, 3410 Derived AJCC-7 N, 3420 Derived AJCC-7 M, 2940 Derived AJCC-6 T, 2960 Derived AJCC-6 N, and 2980 Derived AJCC-6 M are missing if and only if 3402 Derived AJCC-7 T Descript, 3412 Derived AJCC-7 N Descript, 3422 Derived AJCC-7 M Descript, 2950 Derived AJCC-6 T Descript, 2970 Derived AJCC-6 N Descript, and 2990 Derived AJCC-6 M Descript are also missing, respectively. For the tables in this section, the counts are by visit rather than by unique patient, since the question of interest is how often the stages assigned to the same case agree with each other. Each of the tables shows the 20 most common combinations of values.
3430 Derived AJCC-7 Stage Grp | 3000 Derived AJCC-6 Stage Grp | 0970 TNM Clin Stage Group | 0910 TNM Path Stage Group | N |
---|---|---|---|---|
- | - | - | - | 3810 |
IV | IV | 99 | 99 | 65 |
III | III | 99 | 3 | 57 |
- | - | 88 | 88 | 57 |
I | I | 99 | 99 | 56 |
UNK | UNK | 99 | 99 | 55 |
I | I | 99 | 1 | 43 |
IV | IV | 4 | 99 | 42 |
III | III | 99 | 99 | 23 |
- | UNK | 99 | 99 | 23 |
IV | IV | 99 | 4 | 22 |
- | - | 99 | 99 | 17 |
II | II | 99 | 2 | 13 |
II | II | 99 | 99 | 13 |
IV | IV | 4 | 4 | 12 |
- | I | 99 | 1 | 9 |
IV | IV | 99 | 3 | 8 |
I | I | 1 | 99 | 7 |
- | - | 4 | 99 | 6 |
- | I | 99 | 99 | 6 |
3400 Derived AJCC-7 T | 2940 Derived AJCC-6 T | 0940 TNM Clin T | 0880 TNM Path T | N |
---|---|---|---|---|
- | - | - | - | 3824 |
N- | N- | 88 | 88 | 64 |
cX | cX | - | - | 50 |
p3a | p3b | - | 3A | 33 |
p1a | p1a | - | 1A | 30 |
p1b | p1b | - | 1B | 24 |
p3a | p3a | - | 3A | 21 |
c1a | c1a | - | - | 20 |
pX | pX | - | - | 14 |
- | pX | - | - | 13 |
c4 | c4 | - | - | 12 |
p3b | p3b | - | 3B | 10 |
c1 | c1 | - | - | 10 |
p3 | p3 | - | 3 | 9 |
p2a | p2 | - | 2A | 8 |
p3a | p3a | - | 3 | 8 |
p1a | p1a | - | - | 8 |
c1b | c1b | - | - | 6 |
c3a | c3b | - | - | 6 |
cX | cX | X | X | 5 |
3410 Derived AJCC-7 N | 2960 Derived AJCC-6 N | 0950 TNM Clin N | 0890 TNM Path N | N |
---|---|---|---|---|
- | - | - | - | 3825 |
c0 | c0 | - | - | 130 |
N- | N- | 88 | 88 | 64 |
p0 | p0 | - | 0 | 54 |
cX | cX | - | - | 46 |
c0 | c0 | - | X | 44 |
c0 | c0 | - | 0 | 31 |
c1 | c1 | - | - | 29 |
cX | cX | - | X | 25 |
- | c0 | - | - | 21 |
- | cX | - | - | 16 |
c0 | c0 | X | X | 15 |
c0 | c0 | 0 | - | 15 |
- | c0 | - | 0 | 14 |
p1 | p1 | - | 1 | 8 |
c0 | c0 | c0 | - | 8 |
c0 | c0 | c0 | c0 | 7 |
c0 | c0 | - | pX | 7 |
c0 | c0 | 0 | X | 7 |
y0 | y0 | - | 0 | 5 |
3420 Derived AJCC-7 M | 2980 Derived AJCC-6 M | 0960 TNM Clin M | 0900 TNM Path M | N |
---|---|---|---|---|
- | - | - | - | 3827 |
c0 | c0 | - | - | 310 |
c1 | c1 | - | - | 67 |
N- | N- | 88 | 88 | 64 |
- | c0 | - | - | 50 |
c0 | c0 | 0 | - | 36 |
c0 | c0 | c0 | c0 | 24 |
c1 | c1 | 1 | - | 24 |
p1 | p1 | - | - | 13 |
c0 | c0 | c0 | - | 9 |
c0 | cX | - | - | 9 |
c1 | c1 | - | 1 | 8 |
p1 | p1 | - | 1 | 8 |
c0 | c0 | - | c0 | 7 |
- | c0 | - | 0 | 6 |
- | - | c0 | - | 6 |
c1 | c1 | c1 | - | 6 |
- | - | c0 | c0 | 5 |
- | c0 | 0 | - | 5 |
- | - | c1 | - | 5 |
In tables II, III, IV, and V, when both the AJCC-7 and AJCC-6 values are non-missing they agree with each other 92.4%, 77.3%, 94.3%, and 94.7% of the time for stage group, T, N, and M respectively. There are 31.6%, 22.9%, 22.8%, and 22.6% AJCC-7 values missing, but 6.9%, 10.3%, 10.2%, and 10.3% can be filled in from AJCC-6 for stage group, T, N, and M respectively.
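The three quantities reported here (agreement when both editions are present, the share of AJCC-7 values missing, and the share recoverable from AJCC-6) can be computed from paired values as in the sketch below. The function names are hypothetical, `None` stands for the missing marker shown as `-` in the tables, and using all rows as the denominator for the missing and fillable fractions is an assumption, since the denominators behind the percentages above are not stated.

```python
def compare_editions(pairs):
    """Summarize (ajcc7, ajcc6) value pairs, with None marking a missing value."""
    pairs = list(pairs)
    n = len(pairs)
    both = [(a, b) for a, b in pairs if a is not None and b is not None]
    nan = float("nan")
    return {
        # Agreement among rows where neither edition is missing.
        "agreement": sum(a == b for a, b in both) / len(both) if both else nan,
        # Rows lacking an AJCC-7 value, as a fraction of all rows (assumed denominator).
        "missing_ajcc7": sum(a is None for a, _ in pairs) / n if n else nan,
        # Rows lacking AJCC-7 but having AJCC-6, as a fraction of all rows.
        "fillable_from_ajcc6": sum(a is None and b is not None
                                   for a, b in pairs) / n if n else nan,
    }


def coalesce(ajcc7, ajcc6):
    """Prefer the AJCC-7 value and fall back to AJCC-6 only when AJCC-7 is missing."""
    return ajcc7 if ajcc7 is not None else ajcc6


# Four made-up rows in the style of the T table.
toy = [("p3a", "p3b"), ("p1a", "p1a"), (None, "pX"), (None, None)]
print(compare_editions(toy))
print([coalesce(a, b) for a, b in toy])
```

Whether a filled-in AJCC-6 value is interchangeable with an AJCC-7 value is a clinical question; the fallback here only shows the mechanics.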
patient_num | start_date | 3400 Derived AJCC-7 T | 3410 Derived AJCC-7 N | 3420 Derived AJCC-7 M | 3430 Derived AJCC-7 Stage Grp |
---|---|---|---|---|---|
350 | 2014-05-10 | X | 0 | 0 | UNK |
3442 | 2014-09-17 | is | 0 | 0 | 0 |
3442 | 2015-03-01 | 1a | 0 | 0 | I |
9006 | 2009-09-02 | 1b | 0 | 0 | I |
9006 | 2009-11-18 | 1b | 0 | 0 | I |
18576 | 2011-08-03 | 1a | 0 | 0 | I |
18584 | 2011-06-04 | 3a | 0 | 0 | III |
19421 | 2011-05-12 | 1b | 0 | 0 | I |
35354 | 2010-04-02 | 3 | 2NOS | 0 | IIINOS |
35354 | 2010-04-10 | 1a | 0 | 0 | I |
41377 | 2012-01-05 | 3a | 0 | 0 | III |
43065 | 2013-06-06 | 3c | 1 | 1 | IV |
62619 | 2010-04-17 | X | 0 | 0 | UNK |
89902 | 2010-01-17 | 3a | 0 | 0 | III |
93443 | 2012-08-21 | X | 1a | 0 | UNK |
93443 | 2012-09-09 | 1a | 0 | 0 | I |
97742 | 2010-11-02 | 3a | 0 | 1 | IV |
111335 | 2013-01-19 | 1 | 0 | 0 | I |
114314 | 2015-10-27 | 3b | 0 | 0 | III |
117341 | 2011-03-04 | X | X | 0 | UNK |
All the TODO items are now tracked on GitHub as well as linked from their respective yellow-highlighted text throughout the document.
In this section are patient counts for all 2327 patients in the overall set, broken down by various NAACCR variables (rows) and equivalent EMR variables (columns). The bold values are counts of patients for whom NAACCR and EMR are in agreement. Patients in the NA row are the ones with only EMR and no NAACCR records, so they count as missing rather than discrepant.
 | | divorced | legally sepa | married | other | significant | single | unknown | widowed | Sum |
---|---|---|---|---|---|---|---|---|---|---|
Divorced | 0 | 47 | 0 | 2 | 0 | 0 | 5 | 2 | 0 | 56 |
Separated | 0 | 0 | 15 | 3 | 0 | 0 | 1 | 0 | 0 | 19 |
Married | 0 | 5 | 3 | 336 | 0 | 0 | 13 | 5 | 7 | 369 |
Domestic Partner | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Single | 0 | 1 | 2 | 3 | 0 | 0 | 119 | 0 | 1 | 126 |
Unknown | 0 | 3 | 0 | 8 | 0 | 0 | 32 | 22 | 1 | 66 |
Widowed | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 35 | 37 |
NA | 1 | 150 | 35 | 887 | 1 | 2 | 423 | 66 | 89 | 1654 |
Sum | 1 | 206 | 55 | 1240 | 1 | 2 | 594 | 95 | 133 | 2327 |
 | m | f | u | Sum |
---|---|---|---|---|
m | 428 | 1 | 0 | 429 |
f | 9 | 235 | 0 | 244 |
NA | 937 | 716 | 1 | 1654 |
Sum | 1374 | 952 | 1 | 2327 |
 | White | Black | Asian | Pac Islander | Other | Unknown | Sum |
---|---|---|---|---|---|---|---|
White | 591 | 2 | 2 | 0 | 2 | 23 | 620 |
Black | 1 | 26 | 0 | 0 | 0 | 0 | 27 |
Asian | 0 | 0 | 6 | 0 | 0 | 0 | 6 |
Pac Islander | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
Other | 1 | 0 | 2 | 0 | 1 | 0 | 4 |
Unknown | 13 | 1 | 0 | 0 | 0 | 1 | 15 |
NA | 1400 | 83 | 11 | 1 | 46 | 113 | 1654 |
Sum | 2006 | 112 | 22 | 1 | 49 | 137 | 2327 |
 | Non_Hispanic | Hispanic | Sum |
---|---|---|---|
Non_Hispanic | 304 | 15 | 319 |
Hispanic | 56 | 298 | 354 |
NA | 983 | 671 | 1654 |
Sum | 1343 | 984 | 2327 |
 | Non_Hispanic | Hispanic | Sum |
---|---|---|---|
Non_Hispanic | 291 | 11 | 302 |
Unknown | 13 | 4 | 17 |
Hispanic_NOS | 44 | 256 | 300 |
Mexican | 9 | 39 | 48 |
Spanish_Surname | 2 | 2 | 4 |
Cuban | 1 | 0 | 1 |
S_Ctr_America | 0 | 1 | 1 |
NA | 983 | 671 | 1654 |
Sum | 1343 | 984 | 2327 |
Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
---|---|---|---|---|---|
-12 | -6.5 | -3.162 | -3.186 | -0.7064 | 9.999 |
The tables of patients with discrepant birthdates have been removed because they only apply to 15 patients and are mostly empty. They can still be viewed in the 181009 archival version of this document for marital, sex, race, hisp, and surg.
For each of the main event variables Diagnosis, Surgery, Recurrence, and Death / 1760 Vital Status there were multiple candidate data elements in the raw data. If such a family of elements is in good agreement overall, then individual missing dates can be filled in with the earliest non-missing dates from other data elements in that family (except for mortality, where the latest non-missing date would make more sense). But to do this I needed not only to establish qualitative agreement, as I did for demographic variables in sec. 2.1 and Appendix 3.1, but also to determine how often these dates lag or lead each other and by how much. The plots in this section use the y-axis to represent time for patient records arranged along the x-axis. They are arranged in an order that varies from one plot to another, chosen for visual interpretability. Each vertical slice of a plot represents one patient’s history, with different colors representing events as documented by different data elements. The goal is to see the frequency, magnitude, and direction of divergence for several variables at the same time.
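The fill-in rule described above (earliest non-missing date within a family, or latest for mortality) is small enough to state in code. This is a minimal sketch with a hypothetical function name and made-up dates, not the implementation used for this report.

```python
from datetime import date


def combine_dates(candidates, prefer="earliest"):
    """Pick one date from a family of candidate data elements.

    candidates : dates from different data elements, None where missing
    prefer     : "earliest" for diagnosis, surgery, and recurrence;
                 "latest" for mortality
    Returns None when every candidate is missing.
    """
    if prefer not in ("earliest", "latest"):
        raise ValueError("prefer must be 'earliest' or 'latest'")
    present = [d for d in candidates if d is not None]
    if not present:
        return None
    return min(present) if prefer == "earliest" else max(present)


# Made-up example: two surgery-date elements, one of them missing.
print(combine_dates([None, date(2013, 6, 6)]))
# Made-up example: two death-date sources that disagree.
print(combine_dates([date(2015, 1, 2), date(2015, 1, 9)], prefer="latest"))
```

The rule is only safe once the family of elements has been shown to agree, which is what the plots in this section are meant to assess.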
At this time only 0390 Date of Diagnosis
is usable for calculating Diagnosis
. Initially 0580 Date of 1st Contact
was considered as an additional NAACCR source along with the earliest EMR records of 189.0 Malignant neoplasm of kidney, except pelvis
and C64 Malignant neoplasm of kidney, except renal pelvis
. 0443 Date Conclusive DX
is never used by our NAACCR. All other NAACCR data elements containing the word ‘date’ seem to be retired or related to events after initial diagnosis. 0580 Date of 1st Contact
was disqualified because it never precedes 0390 Date of Diagnosis
but often trails behind 1200 RX Date--Surgery
, see fig. 11. I will need to consult with a NAACCR registrar about what 0580 Date of 1st Contact
actually means, but it does not appear to be a first visit or a first diagnosis. As can be seen in fig. 5 and table XIII, the first ICD9 or ICD10 code most often occurs after initial diagnosis, sometimes before it, and most rarely on the same date. Several of the ICD9/10 first observed dates lead or trail the 0390 Date of Diagnosis
by multiple years.
Figure 5: Here is a plot centered on 0390 Date of Diagnosis
(blue horizontal line at 0) with black lines indicating ICD10 codes for primary kidney cancer from the EMR and dashed red lines indicating ICD9 codes. The dashed horizontal blue lines indicate +- 3 months from 0390 Date of Diagnosis
.
 | before | +/- 2 weeks | after | NA | Sum |
---|---|---|---|---|---|
before | 29 | 2 | 15 | 1 | 47 |
+/- 2 weeks | 0 | 38 | 34 | 1 | 73 |
after | 0 | 1 | 316 | 3 | 320 |
NA | 0 | 0 | 7 | 39 | 46 |
Sum | 29 | 41 | 372 | 44 | 486 |
For most patients (291), the first EMR code is recorded within 3 months of first diagnosis as recorded by NAACCR. Of those with a larger time difference, the majority (143) have their first EMR code after first 0390 Date of Diagnosis
. Only 13 patients have ICD9/10 diagnoses that precede their 0390 Date of Diagnosis
by more than 3 months. An additional 54 patients have first EMR diagnoses that precede 0390 Date of Diagnosis
by less than three months. These might need to be eliminated from the sample on the grounds of not being first occurrences of kidney cancer. However, we cannot back-fill missing NAACCR records or NAACCR records lacking a diagnosis date because the two sources disagree too frequently, and the EMR records are currently biased toward later dates.
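A tally like the one discussed above could be produced along the following lines. This is a Python sketch, not the project's R code; `timing_category` is a hypothetical name, the dates are made up, and approximating "3 months" as 90 days is an assumption.

```python
from collections import Counter
from datetime import date
from typing import Optional


def timing_category(diagnosis: Optional[date],
                    first_emr_code: Optional[date],
                    window_days: int = 90) -> str:
    """Classify the first EMR code relative to the NAACCR diagnosis date.
    window_days=90 is an assumed stand-in for a 3-month window."""
    if diagnosis is None or first_emr_code is None:
        return "NA"
    lag = (first_emr_code - diagnosis).days
    if abs(lag) <= window_days:
        return "within window"
    return "before" if lag < 0 else "after"


# Hypothetical (diagnosis, first EMR code) pairs.
pairs = [
    (date(2014, 1, 10), date(2014, 2, 1)),   # 22 days after diagnosis
    (date(2014, 1, 10), date(2016, 5, 3)),   # more than two years after
    (date(2014, 1, 10), date(2012, 7, 4)),   # well before diagnosis
    (date(2014, 1, 10), None),               # no EMR code at all
]
counts = Counter(timing_category(dx, emr) for dx, emr in pairs)
print(dict(counts))
```

Passing `window_days=14` would give the narrower grouping used in the cross-tabulation above.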
To construct the Surgery
analytic variable I considered 1200 RX Date--Surgery
, 1260 Date of Initial RX--SEER
, 1270 Date of 1st Crs RX--CoC
, and 3170 RX Date--Most Defin Surg
from NAACCR as well as earliest occurrences of V45.73 Acquired absence of kidney
, Z90.5 Acquired absence of kidney
, or HX NEPHRECTOMY
from the EMR. In the plots and tables below I show why I decided to use 3170 RX Date--Most Defin Surg
as the surgery date and when that is unavailable, to fall back on 1200 RX Date--Surgery
. The other data elements are not used except to flag potentially incorrect records if they occur earlier than the date of diagnosis.
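The selection rule just described can be written down compactly. The following Python sketch is illustrative only (the real processing lives in the project's R scripts); `surgery_date` and `flag_pre_diagnosis` are hypothetical helper names and the example record is invented.

```python
from datetime import date
from typing import Dict, List, Optional


def surgery_date(most_definitive: Optional[date],
                 rx_surgery: Optional[date]) -> Optional[date]:
    """Use 3170 RX Date--Most Defin Surg when present,
    otherwise fall back on 1200 RX Date--Surgery."""
    return most_definitive if most_definitive is not None else rx_surgery


def flag_pre_diagnosis(diagnosis: Optional[date],
                       other_elements: Dict[str, Optional[date]]) -> List[str]:
    """Names of the elements that are dated earlier than the diagnosis,
    which marks the record as potentially incorrect."""
    if diagnosis is None:
        return []
    return sorted(name for name, d in other_elements.items()
                  if d is not None and d < diagnosis)


# Hypothetical record with a missing 3170 value.
record_surgery = surgery_date(None, date(2013, 5, 20))
flags = flag_pre_diagnosis(
    date(2013, 5, 1),
    {
        "1260 Date of Initial RX--SEER": date(2013, 5, 20),
        "1270 Date of 1st Crs RX--CoC": date(2013, 4, 2),
        "V45.73 Acquired absence of kidney": None,
    },
)
print(record_surgery)  # 2013-05-20
print(flags)           # ['1270 Date of 1st Crs RX--CoC']
```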
Figure 6: Above is a plot of all patients sorted by 1200 RX Date--Surgery
(black line). On the same axis is 3170 RX Date--Most Defin Surg
(red line) which is almost identical to 1200 RX Date--Surgery
except for a small number of cases where it occurs later than 1200 RX Date--Surgery
. It never occurs earlier. The violet lines indicate for each patient the earliest EMR code implying that a surgery had taken place (acquired absence of kidney ICD V/Z codes or surgical history of nephrectomy). The blue horizontal line is 0390 Date of Diagnosis
with the dashed lines representing a 3-month window in both directions.
Figure 7: In the above plot the 1270 Date of 1st Crs RX--CoC
(green) and 1260 Date of Initial RX--SEER
(cyan) events are superimposed on time till 1200 RX Date--Surgery
like in fig. 6 (but violet lines for nephrectomy EMR codes are omitted for readability). The 1270 Date of 1st Crs RX--CoC
and 1260 Date of Initial RX--SEER
variables trend earlier than 1200 RX Date--Surgery
.
In fig. 6 the 5 patients for whom the earliest EMR nephrectomy code occurs before the earliest possible NAACCR record of surgery are highlighted in yellow. Among the remaining 181 patients who have an EMR code for nephrectomy, there are 129 for whom it happens more than 3 months after 1200 RX Date--Surgery
and those lags have a median of 14.3 months. This level of discrepancy disqualifies V45.73 Acquired absence of kidney
, Z90.5 Acquired absence of kidney
, and HX NEPHRECTOMY
from being used to fill in missing NAACCR dates. This may change after the next i2b2 update in which the fix to the “visit-less patient” problem will be implemented (sec. 5).
Figure 8: Above is a plot equivalent to fig. 7 but for patients who do not have a 1340 Reason for No Surgery
code equal to Surgery Performed
. There are many 1270 Date of 1st Crs RX--CoC
and 1260 Date of Initial RX--SEER
events but only a small number of 1200 RX Date--Surgery
(black) and 3170 RX Date--Most Defin Surg
(red). The 1200 RX Date--Surgery
and 3170 RX Date--Most Defin Surg
that do occur track each other perfectly. Together with the NAACCR data dictionary’s description, this suggests that 3170 RX Date--Most Defin Surg
is the correct principal surgery date in close agreement with 1200 RX Date--Surgery
, so perhaps missing 3170 RX Date--Most Defin Surg
values can be filled from 1200 RX Date--Surgery
. However 1270 Date of 1st Crs RX--CoC
and 1260 Date of Initial RX--SEER
seem like non-primary surgeries or other events and cannot be used to fill in missing values.
 | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | NA’s |
---|---|---|---|---|---|---|---|
3170 RX Date--Most Defin Surg | 0 | 0 | 3 | 8.461 | 9.643 | 215.1 | 119 |
1270 Date of 1st Crs RX--CoC | 0 | 0 | 2.929 | 6.431 | 6.964 | 318.3 | 28 |
1260 Date of Initial RX--SEER | 0 | 0 | 3.857 | 8.213 | 8.571 | 270.9 | 198 |
1200 RX Date--Surgery | 0 | 0 | 2.857 | 7.83 | 9 | 215.1 | 109 |
V45.73 Acquired absence of kidney | -361.1 | 8.143 | 31.43 | 69.5 | 82.71 | 957.4 | 261 |
HX NEPHRECTOMY | -91.86 | 10.11 | 37.07 | 77.85 | 93.96 | 758.1 | 318 |
Surgical Oncology | -194.9 | 0.2143 | 4.714 | 23.58 | 46 | 236.6 | 455 |
Z90.5 Acquired absence of kidney | -20.14 | 9.607 | 37.86 | 85.12 | 111.2 | 957.4 | 226 |
1860 Recurrence Date--1st | 0 | 40.04 | 73.71 | 137.2 | 205.3 | 935.9 | 402 |
It makes sense that the Epic EMR lags behind NAACCR. As an outpatient system, it’s probably recording visits after the original surgery, and perhaps we are not yet importing the right elements from Sunrise EMR. In sec. 5 I outline possible remedies to that. For now, V45.73 Acquired absence of kidney
, HX NEPHRECTOMY
, Surgical Oncology
, and Z90.5 Acquired absence of kidney
can still be used to exclude cases as not first-time occurrences if they precede diagnosis. Would I lose a lot of cases to such a criterion?
 | before | same-day | after | NA |
---|---|---|---|---|
3170 RX Date--Most Defin Surg | 0 | 138 | 229 | 119 |
1270 Date of 1st Crs RX--CoC | 0 | 149 | 309 | 28 |
1260 Date of Initial RX--SEER | 0 | 83 | 205 | 198 |
1200 RX Date--Surgery | 0 | 146 | 231 | 109 |
V45.73 Acquired absence of kidney | 3 | 0 | 222 | 261 |
HX NEPHRECTOMY | 3 | 2 | 163 | 318 |
Surgical Oncology | 7 | 1 | 23 | 455 |
Z90.5 Acquired absence of kidney | 1 | 0 | 259 | 226 |
Only a small number of cases would be disqualified. Another important question is the level of agreement between 1340 Reason for No Surgery
and the NAACCR data elements that are candidates for comprising the surgery variable.
 | n_rx3170 = FALSE | n_rx3170 = TRUE | n_rx1270 = FALSE | n_rx1270 = TRUE | n_rx1260 = FALSE | n_rx1260 = TRUE | n_dsurg = FALSE | n_dsurg = TRUE |
---|---|---|---|---|---|---|---|---|
Surgery Performed | 15 | 457 | 13 | 459 | 170 | 302 | 14 | 458 |
Surgery Not First Course | 136 | 10 | 20 | 126 | 82 | 64 | 122 | 24 |
No Surgery, Contra Indicated | 17 | 1 | 3 | 15 | 10 | 8 | 16 | 2 |
No Surgery, Deceased | 4 | 0 | 1 | 3 | 2 | 2 | 4 | 0 |
No Surgery, No Reason Given | 5 | 0 | 2 | 3 | 2 | 3 | 5 | 0 |
No Surgery, Refused | 5 | 3 | 2 | 6 | 4 | 4 | 4 | 4 |
Unknown Whether Surgery Done | 16 | 1 | 11 | 6 | 13 | 4 | 15 | 2 |
Unknown Whether Surgery Recommended or Done | 3 | 0 | 2 | 1 | 2 | 1 | 3 | 0 |
In summary, based on fig. 6 and table XIII, V45.73 Acquired absence of kidney
, HX NEPHRECTOMY
, Surgical Oncology
, and Z90.5 Acquired absence of kidney
can only be used to disqualify patients for having erroneous records or previous history of kidney cancer but cannot fill in missing diagnosis dates. Based on figs. 7, 8, and table XVII, 1270 Date of 1st Crs RX--CoC
and 1260 Date of Initial RX--SEER
are not necessarily always surgery events. This leaves 3170 RX Date--Most Defin Surg
with 0390 Date of Diagnosis
as a fallback. When I meet with the NAACCR registrar I will seek their feedback about this approach, and I will ask them about the most reliable way to identify the first kidney cancer occurrence for a patient if they have several (overlapping?) NAACCR entries. I also need to ask a chart abstraction expert about the best way to find in Epic and in Sunrise the date of a patient’s first nephrectomy.
Candidate data elements for constructing the Recurrence
variable were 1770 Cancer Status
, 1880 Recurrence Type--1st
, and 1860 Recurrence Date--1st
from NAACCR. Our site is on NAACCR v16, not v18, so we do not have 1772 Date of Last Cancer Status
. According to the v16 standard, 1750 Date of Last Contact
should be used instead. From the EMR the candidates were 14 ICD9/10 codes for secondary tumors. In table XVII I reconcile 1770 Cancer Status
and 1880 Recurrence Type--1st
.
 | Tumor_Free | Tumor | Unknown |
---|---|---|---|
Disease-free | 201 | 0 | 0 |
In situ invasive | 0 | 2 | 0 |
In situ original | 0 | 3 | 0 |
Local, insufficient info | 1 | 8 | 0 |
Local invasive | 2 | 15 | 0 |
Regional, insufficient info | 0 | 3 | 1 |
Invasive adjacent tissue only | 0 | 3 | 0 |
Invasive regional lymph nodes only | 0 | 3 | 0 |
Invasive adjacent tissue and regional lymph nodes | 0 | 2 | 0 |
Regional in situ, NOS | 0 | 1 | 0 |
Multiple true for invasive tumor | 0 | 2 | 0 |
Distant, insufficient info | 1 | 16 | 0 |
Distant invasive lung only | 1 | 22 | 1 |
Distant invasive pleura only | 0 | 1 | 0 |
Distant invasive liver only | 0 | 3 | 0 |
Distant invasive bone only | 1 | 7 | 0 |
Distant invasive CNS only | 0 | 5 | 0 |
Distant invasive lymph node only | 0 | 3 | 0 |
Distant invasive single site and local/trocar/regional | 0 | 4 | 0 |
Distant invasive multiple sites | 1 | 4 | 0 |
Never disease-free | 0 | 246 | 0 |
Recurred but no other info | 0 | 2 | 0 |
Unknown if recurred or was ever gone | 0 | 2 | 31 |
1880 Recurrence Type--1st
can be simplified by leaving values of Disease-free
(0), Never disease-free
(70), and Unknown if recurred or was ever gone
(99) as they are; if there were multiple values for the same case and one of those values was 70 then defaulting to Never disease-free
; and recoding all other values as simply Recurred
. I named this analytic variable Recurrence Status
.
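As a rough illustration of this simplification, here is a Python sketch (the project itself uses R). The function name `recurrence_status` is hypothetical. The text above does not say how to resolve multiple values that do not include 70, so the ordering used here (any recurrence code wins, then Disease-free, then Unknown) is an assumption.

```python
from typing import Iterable, Optional

DISEASE_FREE = 0
NEVER_DISEASE_FREE = 70
UNKNOWN = 99


def recurrence_status(codes: Iterable[int]) -> Optional[str]:
    """Collapse one case's 1880 Recurrence Type--1st codes into
    the four-level Recurrence Status."""
    codes = set(codes)
    if not codes:
        return None  # no NAACCR value for this case
    if NEVER_DISEASE_FREE in codes:
        return "Never disease-free"
    # Any code other than 0, 70, or 99 signifies some kind of recurrence.
    if codes - {DISEASE_FREE, NEVER_DISEASE_FREE, UNKNOWN}:
        return "Recurred"
    # Assumption: a known Disease-free value outranks an Unknown one.
    if DISEASE_FREE in codes:
        return "Disease-free"
    return "Unknown if recurred or was ever gone"


print(recurrence_status([0]))       # Disease-free
print(recurrence_status([13, 70]))  # Never disease-free
print(recurrence_status([13]))      # Recurred
print(recurrence_status([99]))      # Unknown if recurred or was ever gone
```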
 | Recur Date=FALSE | Recur Date=TRUE |
---|---|---|
 | 1654 | 0 |
Disease-free | 215 | 0 |
Never disease-free | 281 | 1 |
Recurred | 19 | 124 |
Unknown if recurred or was ever gone | 33 | 0 |
This explains why 1860 Recurrence Date--1st
values are relatively rare in the data: they are specific to actual recurrences, which are not a majority of the cases. This is good from the standpoint of data consistency. Now we need to see to what extent the EMR codes agree with this.
Figure 9: In the above plot, the black line represents months elapsed between surgery and the first occurrence of an EMR code for secondary tumors, if any. The horizontal red line segments indicate individual 1860 Recurrence Date--1st
. The dotted vertical red lines denote Recurred
patients who are missing a 1860 Recurrence Date--1st
. The blue horizontal line is the date of surgery and the dotted horizontal lines above and below it are +- 3 months. Patients whose 1880 Recurrence Type--1st
is Disease-free
are highlighted in green, Never disease-free
in yellow, and Recurred
in red. There are 75 patients with multiple NAACCR records, and all records for these patients have been excluded from this plot.
The green highlights in fig. 9 are mostly where one would expect, but why are there 38 patients on the left side of the plot labeled Disease-free
that have EMR codes for secondary tumors? Also, there are 32 patients with metastatic tumor codes earlier than 1200 RX Date--Surgery
and of those 5 occur more than 3 months prior to 1200 RX Date--Surgery
. Did they present with secondary tumors to begin with but remain disease free after surgery? These are questions to ask the NAACCR registrar. The EMR codes are in better agreement with 1860 Recurrence Date--1st
than the data elements in Appendix 3.2.1 and Appendix 3.2.2 so it might make sense to back-fill the few 1860 Recurrence Date--1st
that are missing, but first I want to make sure I understand how to reliably distinguish on the EMR side genuine recurrences from secondary tumors that existed at presentation. The small number of cases affected either way lowers the priority of this issue. For now I will rely only on 1860 Recurrence Date--1st
in constructing the analytical variable Recurrence
.
Unlike diagnosis (Appendix 3.2.1), surgery (Appendix 3.2.2), and recurrence (Appendix 3.2.3), death dates exhibit good agreement between various sources and can be used to supplement the data available from NAACCR.
Figure 10: Above are plotted times of death (if any) relative to 0390 Date of Diagnosis
(horizontal blue line). The four data sources are Death, i2b2
, Deceased per SSA
, Expired
, and 1760 Vital Status
.
 | Below -30 | -30 to 0 | same | 0 to 30 | Above 30 | Neither missing | Left missing | Right missing | Both missing |
---|---|---|---|---|---|---|---|---|---|
Deceased per SSA | 1 (10.0%) -31.0 | 0 ( 0.0%) | 9 (90.0%) 0.0 | 0 ( 0.0%) | 0 ( 0.0%) | 10 ( 2.1%) 0.0 | 83 (17.1%) | 8 ( 1.6%) | 385 (79.2%) |
Expired | 1 (11.1%) -34.0 | 7 (77.8%) -5.0 | 1 (11.1%) 0.0 | 0 ( 0.0%) | 0 ( 0.0%) | 9 ( 1.9%) -5.0 | 84 (17.3%) | 8 ( 1.6%) | 385 (79.2%) |
Death, i2b2 | 1 ( 1.3%) -31.0 | 0 ( 0.0%) | 73 (96.1%) 0.0 | 2 ( 2.6%) 5.5 | 0 ( 0.0%) | 76 (15.6%) 0.0 | 17 ( 3.5%) | 46 ( 9.5%) | 347 (71.4%) |
Earliest Death | 1 ( 1.1%) -34.0 | 7 ( 7.5%) -5.0 | 85 (91.4%) 0.0 | 0 ( 0.0%) | 0 ( 0.0%) | 93 (19.1%) 0.0 | 0 ( 0.0%) | 47 ( 9.7%) | 346 (71.2%) |
Latest Death | 0 ( 0.0%) | 0 ( 0.0%) | 91 (97.8%) 0.0 | 2 ( 2.2%) 5.5 | 0 ( 0.0%) | 93 (19.1%) 0.0 | 0 ( 0.0%) | 47 ( 9.7%) | 346 (71.2%) |
In table XIX the sum of the Neither missing
and Left missing
is always 93 which is the number of deceased patients according to NAACCR records alone. The Right missing
column is the number of patients whose deceased status is recorded in the external source but not in NAACCR. For the last two rows Right missing
means the total number of deceased patients not recorded in NAACCR but which can be filled in from one or more of the other sources. There are 47 such patients. Finally the last column, Both missing
, is the number of patients presumed to be alive because none of the sources have any evidence for being deceased. The Left missing
column indicates how many patients are reported deceased in NAACCR but not the other source. Though there are some missing for each individual data source, NAACCR is never the only source reporting them deceased: the values in the bottom two rows are both 0.
The left-side columns of table XIX show the prevalence and magnitude of discrepancies in death dates among the 93 patients that NAACCR and at least one other source agree are deceased. At most 10 of these patients have discrepant dates, and for 9 of them the discrepancy is less than one month, with a median difference ranging from -5 to 5.5 days. The small number of discrepancies and the small magnitude of the ones that do occur justify filling in missing NAACCR death dates from the other sources.
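The column logic of table XIX and the resulting fill-in step might look roughly like this in Python (the project's own code is R). The names `compare_death_dates` and `supplemented_death_date` are hypothetical. Two assumptions are made: the difference is computed as the other source minus NAACCR, in days, and the bin edges are closed as written in the comments. The column meanings follow the description in the text above.

```python
from datetime import date
from typing import Iterable, Optional


def compare_death_dates(naaccr: Optional[date], other: Optional[date]) -> str:
    """Assign one patient to a column of the comparison table."""
    if naaccr is None and other is None:
        return "Both missing"    # presumed alive
    if other is None:
        return "Left missing"    # deceased per NAACCR only
    if naaccr is None:
        return "Right missing"   # deceased per the other source only
    diff = (other - naaccr).days  # assumed sign convention
    if diff < -30:
        return "Below -30"
    if diff < 0:
        return "-30 to 0"        # assumed: -30 <= diff < 0
    if diff == 0:
        return "same"
    if diff <= 30:
        return "0 to 30"         # assumed: 0 < diff <= 30
    return "Above 30"


def supplemented_death_date(naaccr: Optional[date],
                            others: Iterable[Optional[date]]) -> Optional[date]:
    """Keep the NAACCR death date when present; otherwise use the latest
    non-missing date from the other sources (None if there is none)."""
    if naaccr is not None:
        return naaccr
    known = [d for d in others if d is not None]
    return max(known) if known else None


print(compare_death_dates(date(2016, 8, 10), date(2016, 8, 5)))            # -30 to 0
print(supplemented_death_date(None, [date(2017, 1, 3), None, date(2017, 1, 4)]))  # 2017-01-04
```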
Despite the overall agreement between 0190 Spanish/Hispanic Origin
and Hispanic or Latino
there needs to be some way to adjudicate the minority of cases where the sources disagree. The following additional data elements can provide relevant information to form a final consensus variable for analysis: language_cd
, Language
, Ethnicity
, race_cd
, and Race (NAACCR 0160-0164)
. First, each of these variables is re-coded to Hispanic
, non-Hispanic
, and Unknown
.
language_cd
and Language
are interpreted as being evidence in favor of Hispanic
ethnicity if the language includes Spanish. English, ASL, and unknown values are all treated as Unknown
ethnicity. However, a language other than the above (e.g. German) is interpreted as evidence for being non-Hispanic
.
0190 Spanish/Hispanic Origin
already has explicit designations of non-Hispanic
and Unknown
and all other values are interpreted as Hispanic
. Hispanic or Latino
is interpreted as Hispanic
if TRUE
and Unknown
if FALSE
(in contrast with most of the other elements, there is no way to distinguish a genuinely FALSE
value of Hispanic or Latino
from a missing one).
Ethnicity
is the whole ethnicity variable from i2b2 OBSERVATION_FACT and surprisingly it sometimes disagrees with Hispanic or Latino
. A value of hispanic
is interpreted directly. The values other
, unknown
, unknown/othe
, i choose not
, and @
are all interpreted as Unknown
and any other value (at our site, arab-amer
and non-hispanic
) is interpreted as non-Hispanic
. Rules are then applied to create unified variables from all these data elements. I have three such variables: Hispanic (NAACCR)
, Hispanic (broad)
, and Hispanic (strict)
.
Hispanic (NAACCR)
only uses information from NAACCR.
Hispanic (broad)
errs on the side of assigning Hispanic
ethnicity if there is any evidence for it at all, then non-Hispanic
, and Unknown
only if there is truly no information from any source about the patient’s ethnicity. In particular, Hispanic
is assigned if any non-missing values of language_cd
, Language
, 0190 Spanish/Hispanic Origin
, Hispanic or Latino
, and Ethnicity
have a value of Hispanic
; Unknown
if all non-missing values of language_cd
, Language
, 0190 Spanish/Hispanic Origin
, Hispanic or Latino
, and Ethnicity
are unanimous for Unknown
; and non-Hispanic
otherwise.
Finally, Hispanic (strict)
only assigns Hispanic
if all non-missing values of 0190 Spanish/Hispanic Origin
, Hispanic or Latino
, and Ethnicity
are unanimous for Hispanic
. non-Hispanic
is assigned if all non-missing values of 0190 Spanish/Hispanic Origin
and Ethnicity
are unanimous for non-Hispanic
(the Hispanic or Latino
element is not used for the reasons explained above) and neither Language
nor language_cd
vote for Hispanic
. If neither of these conditions is met, Unknown
is assigned.
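The broad and strict rules above can be expressed as small voting functions. This is a Python sketch under stated assumptions, not the project's R implementation: all function and constant names are hypothetical, a missing element is represented as `None` (or an absent key), and a re-coded value of Unknown is treated as a non-missing vote. Requiring at least one non-missing value before declaring unanimity in the strict rule is also an assumption.

```python
from typing import Dict, List, Optional, Sequence

H, NH, UNK = "Hispanic", "non-Hispanic", "Unknown"
LANGUAGE = ("language_cd", "Language")
ORIGIN = "0190 Spanish/Hispanic Origin"
FLAG = "Hispanic or Latino"
ETHNICITY = "Ethnicity"


def recode_language(value: Optional[str]) -> Optional[str]:
    """Spanish counts toward Hispanic; English, ASL, and unknown carry no
    information; any other language counts toward non-Hispanic."""
    if value is None:
        return None
    v = value.strip().lower()
    if "spanish" in v:
        return H
    if v in {"english", "asl", "unknown"}:
        return UNK
    return NH


def recode_flag(value: Optional[bool]) -> Optional[str]:
    """TRUE is Hispanic; FALSE cannot be told apart from missing, so Unknown."""
    if value is None:
        return None
    return H if value else UNK


def votes(record: Dict[str, Optional[str]], names: Sequence[str]) -> List[str]:
    """Non-missing re-coded values for the requested elements."""
    return [record[n] for n in names if record.get(n) is not None]


def hispanic_broad(record: Dict[str, Optional[str]]) -> str:
    v = votes(record, LANGUAGE + (ORIGIN, FLAG, ETHNICITY))
    if H in v:
        return H
    if all(x == UNK for x in v):  # also true when nothing is recorded
        return UNK
    return NH


def hispanic_strict(record: Dict[str, Optional[str]]) -> str:
    core = votes(record, (ORIGIN, FLAG, ETHNICITY))
    if core and all(x == H for x in core):
        return H
    registry_emr = votes(record, (ORIGIN, ETHNICITY))
    if (registry_emr and all(x == NH for x in registry_emr)
            and H not in votes(record, LANGUAGE)):
        return NH
    return UNK


# Hypothetical patient: registry and EMR say non-Hispanic, language is Spanish.
patient = {ORIGIN: NH, ETHNICITY: NH, FLAG: recode_flag(False),
           "Language": recode_language("Spanish")}
print(hispanic_broad(patient), hispanic_strict(patient))  # Hispanic Unknown
```

The example shows how a single language vote is enough to flip the broad variable while pushing the strict one to Unknown.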
There is an additional step for patients coded as non-Hispanic
where they are further classified into non-Hispanic white
and Other
. For Hispanic (NAACCR)
this is determined by whether Race (NAACCR 0160-0164)
is White
. For Hispanic (broad)
the criterion is whether at least one of Race (NAACCR 0160-0164)
or race_cd
is White
. For Hispanic (strict)
it’s whether both Race (NAACCR 0160-0164)
and race_cd
are White
.
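The race-based refinement differs only in how the two race sources are combined, which a short Python sketch makes explicit (again illustrative; the project's code is R, and `refine_non_hispanic` is a hypothetical name). Values other than non-Hispanic are assumed to pass through unchanged.

```python
from typing import Optional


def refine_non_hispanic(ethnicity: str,
                        naaccr_race: Optional[str],
                        emr_race: Optional[str],
                        rule: str) -> str:
    """Split non-Hispanic into 'non-Hispanic white' and 'Other'.
    naaccr_race stands for Race (NAACCR 0160-0164), emr_race for race_cd."""
    if ethnicity != "non-Hispanic":
        return ethnicity  # Hispanic and Unknown are left as they are
    naaccr_white = naaccr_race == "White"
    emr_white = emr_race == "White"
    if rule == "NAACCR":
        white = naaccr_white
    elif rule == "broad":
        white = naaccr_white or emr_white
    elif rule == "strict":
        white = naaccr_white and emr_white
    else:
        raise ValueError("rule must be 'NAACCR', 'broad', or 'strict'")
    return "non-Hispanic white" if white else "Other"


# Hypothetical patient whose two race sources disagree.
for rule in ("NAACCR", "broad", "strict"):
    print(rule, refine_non_hispanic("non-Hispanic", "Black", "White", rule))
```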
In the end, Hispanic (NAACCR)
, Hispanic (broad)
, and Hispanic (strict)
all have the same levels, but differ in the proportion of patients assigned to each.
Hispanic (NAACCR) | Hispanic (broad) | Hispanic (strict) | N Patients |
---|---|---|---|
Hispanic | Hispanic | Hispanic | 213 |
Hispanic | Hispanic | Unknown | 141 |
non-Hispanic white | non-Hispanic white | non-Hispanic white | 190 |
non-Hispanic white | non-Hispanic white | Unknown | 59 |
non-Hispanic white | Hispanic | Unknown | 11 |
non-Hispanic white | non-Hispanic white | Other | 3 |
Other | Other | Other | 23 |
Other | Other | Unknown | 13 |
Other | Hispanic | Unknown | 2 |
Other | non-Hispanic white | Other | 1 |
Unknown | Unknown | Unknown | 9 |
Unknown | Hispanic | Unknown | 4 |
Unknown | non-Hispanic white | Unknown | 3 |
Unknown | Other | Unknown | 1 |
- | Hispanic | Hispanic | 512 |
- | non-Hispanic white | - | 440 |
- | Unknown | Unknown | 363 |
- | Hispanic | Unknown | 254 |
- | - | Other | 76 |
- | - | Unknown | 6 |
- | non-Hispanic white | Unknown | 3 |
Of the 673 with NAACCR records (all, not just the 486 meeting the current criteria, see sec. 1) only 22 have differences between Hispanic (NAACCR)
and Hispanic (broad)
but 229 have differences between Hispanic (NAACCR)
and Hispanic (strict)
.
According to Hispanic (NAACCR)
, Hispanic (broad)
, and Hispanic (strict)
respectively, 52.6%, 55.1%, and 31.6% of the NAACCR patients are Hispanic. At 55.1% Hispanic (broad)
comes the closest to the 2016 Census estimates for San Antonio. Also, anecdotal evidence suggests that Hispanic ethnicity is under-reported. This argues for using Hispanic (broad)
when possible, but I will keep Hispanic (strict)
available for sensitivity analysis.
Figure 11: Weird observation: 0580 Date of 1st Contact
(red) is almost always between 1750 Date of Last Contact
(black) and 0390 Date of Diagnosis
(blue), even though we thought diagnosis is usually made on a biopsy sample and would therefore be dated during or after surgery. If first contact is some kind of event after first diagnosis, what is it?
Surgery 1200 RX Date--Surgery
seems to happen frequently both before and after first contact 0580 Date of 1st Contact
.
This section is no longer relevant but is still available for reference in the kidneycancer_181009 snapshot of this document
Here are descriptions of the variables referenced in this document.
patient_num
1880 Recurrence Type--1st
Link: http://datadictionary.naaccr.org/default.aspx?c=10#1880
3170 RX Date--Most Defin Surg; Date of most definitive surgery.
Link: http://datadictionary.naaccr.org/default.aspx?c=10#3170
1340 Reason for No Surgery
Link: http://datadictionary.naaccr.org/default.aspx?c=10#1340
0390 Date of Diagnosis
Link: http://datadictionary.naaccr.org/default.aspx?c=10#390
1200 RX Date--Surgery
Link: http://datadictionary.naaccr.org/default.aspx?c=10#1200
1750 Date of Last Contact; Last Contact
Link: http://datadictionary.naaccr.org/default.aspx?c=10#1750
1760 Vital Status; Vital Status, Registry; This gets individually converted to a TTE variable by data.R
Link: http://datadictionary.naaccr.org/default.aspx?c=10#1760
1770 Cancer Status; Cancer Status, Registry
Link: http://datadictionary.naaccr.org/default.aspx?c=10#1770
1860 Recurrence Date--1st
Link: http://datadictionary.naaccr.org/default.aspx?c=10#1860
Kidney and Renal Pelvis; SEER site
Kidney, NOS; KC, Registry
Surgical Oncology; Visit to Surgical Oncology; Visit to Surgical Oncology (UT Health)
3180 RX Date--Surgical Disch
Link: http://datadictionary.naaccr.org/default.aspx?c=10#3180
0920 TNM Path Descriptor
Link: http://datadictionary.naaccr.org/default.aspx?c=10#920
0980 TNM Clin Descriptor
Link: http://datadictionary.naaccr.org/default.aspx?c=10#980
3430 Derived AJCC-7 Stage Grp
Link: http://datadictionary.naaccr.org/default.aspx?c=10#3430
3422 Derived AJCC-7 M Descript
Link: http://datadictionary.naaccr.org/default.aspx?c=10#3422
3420 Derived AJCC-7 M
Link: http://datadictionary.naaccr.org/default.aspx?c=10#3420
3412 Derived AJCC-7 N Descript
Link: http://datadictionary.naaccr.org/default.aspx?c=10#3412
3410 Derived AJCC-7 N
Link: http://datadictionary.naaccr.org/default.aspx?c=10#3410
3402 Derived AJCC-7 T Descript
Link: http://datadictionary.naaccr.org/default.aspx?c=10#3402
3400 Derived AJCC-7 T
Link: http://datadictionary.naaccr.org/default.aspx?c=10#3400
3000 Derived AJCC-6 Stage Grp
Link: http://datadictionary.naaccr.org/default.aspx?c=10#3000
2990 Derived AJCC-6 M Descript
Link: http://datadictionary.naaccr.org/default.aspx?c=10#2990
2980 Derived AJCC-6 M
Link: http://datadictionary.naaccr.org/default.aspx?c=10#2980
2970 Derived AJCC-6 N Descript
Link: http://datadictionary.naaccr.org/default.aspx?c=10#2970
2960 Derived AJCC-6 N
Link: http://datadictionary.naaccr.org/default.aspx?c=10#2960
2950 Derived AJCC-6 T Descript
Link: http://datadictionary.naaccr.org/default.aspx?c=10#2950
2940 Derived AJCC-6 T
Link: http://datadictionary.naaccr.org/default.aspx?c=10#2940
0940 TNM Clin T
Link: http://datadictionary.naaccr.org/default.aspx?c=10#940
0950 TNM Clin N
Link: http://datadictionary.naaccr.org/default.aspx?c=10#950
0960 TNM Clin M
Link: http://datadictionary.naaccr.org/default.aspx?c=10#960
0970 TNM Clin Stage Group
Link: http://datadictionary.naaccr.org/default.aspx?c=10#970
0910 TNM Path Stage Group
Link: http://datadictionary.naaccr.org/default.aspx?c=10#910
0900 TNM Path M
Link: http://datadictionary.naaccr.org/default.aspx?c=10#900
0890 TNM Path N
Link: http://datadictionary.naaccr.org/default.aspx?c=10#890
0880 TNM Path T
Link: http://datadictionary.naaccr.org/default.aspx?c=10#880
0240 Date of Birth
Link: http://datadictionary.naaccr.org/default.aspx?c=10#240
birth_date
0150 Marital Status at DX; Marital Status, Registry
Link: http://datadictionary.naaccr.org/default.aspx?c=10#150
Marital Status; Marital Status, i2b2
0220 Sex; Sex, Registry
Link: http://datadictionary.naaccr.org/default.aspx?c=10#220
sex_cd; Sex, i2b2
Race (NAACCR 0160-0164); Race, registry; To obtain a combined NAACCR race code for analysis, it is necessary to combine NAACCR variables 0160 Race
- 0164 Race
into one and then recode it to the closest match among White
, Black
, Asian
, Pac Islander
, Other
, and Unknown
race_cd; Race, i2b2
0190 Spanish/Hispanic Origin; Hispanic Origin, Registry
Link: http://datadictionary.naaccr.org/default.aspx?c=10#190
Hispanic or Latino; Hispanic Origin, i2b2
Death, i2b2; Death, i2b2; Death according to the combined i2b2 records from all sources
Deceased per SSA; Death, SSN
Expired; Discharge Disposition
0250 Birthplace
Link: http://datadictionary.naaccr.org/default.aspx?c=10#250
2850 CS Mets at DX
Link: http://datadictionary.naaccr.org/default.aspx?c=10#2850
0580 Date of 1st Contact; Can also be date of clinical (as opposed to path) diagnosis
Link: http://datadictionary.naaccr.org/default.aspx?c=10#580
0446 Multiplicity Counter
Link: http://datadictionary.naaccr.org/default.aspx?c=10#446
Death; Death
Hispanic (strict); Hispanic (strict); Code patients as Hispanic or non-Hispanic only if all available evidence is unanimous, otherwise err on the side of Unknown
Hispanic (broad); Hispanic (broad); Code patients as Hispanic if there is even the slightest evidence they are, otherwise assume they are non-Hispanic, and only if there is really zero evidence either way return Unknown
Diagnosis; Diagnosis
Recurrence; Recurrence; Analytic master variable for time to recurrence. Based on n_drecur
Surgery; Surgery
Hispanic (NAACCR); Hispanic, registry; The n_hisp
variable binned to Hispanic
, non-Hispanic
, and Unknown
Recurrence Status; Recurrence Status; This is the main analytic variable for recurrence. This is based on n_rectype
but with all values that signify recurrence binned together leaving Unknown if recurred or was ever gone
, Never disease-free
, Disease-free
, and Recurred
.
start_date
189.0 Malignant neoplasm of kidney, except pelvis; KC ICD9, i2b2; 189.0 Malignant neoplasm of kidney, except pelvis
C64 Malignant neoplasm of kidney, except renal pelvis; KC ICD10, i2b2; C64 Malignant neoplasm of kidney, except renal pelvis
1260 Date of Initial RX--SEER; Date of initiation of the first course therapy for the tumor being reported, using the SEER definition of first course. See also Date 1st Crs RX CoC [1270].
Link: http://datadictionary.naaccr.org/default.aspx?c=10#1260
1270 Date of 1st Crs RX--CoC; Date of initiation of the first therapy for the cancer being reported, using the CoC definition of first course. The date of first treatment includes the date a decision was made not to treat the patient.
Link: http://datadictionary.naaccr.org/default.aspx?c=10#1270
V45.73 Acquired absence of kidney; V45.73 Acquired absence of kidney
Z90.5 Acquired absence of kidney
HX NEPHRECTOMY; Surgical history
C7B-C7B Secondary neuroendocrine tumors (C7B); C7B-C7B Secondary neuroendocrine tumors (C7B)
C79 Secondary malignant neoplasm of other and unspecified sites; C79 Secondary malignant neoplasm of other and unspecified sites
C79 Secondary malignant neoplasm of other and unspecified sites; C79 Secondary malignant neoplasm of other and unspecified sites
C78 Secondary malignant neoplasm of respiratory and digestive organs; C78 Secondary malignant neoplasm of respiratory and digestive organs
C78 Secondary malignant neoplasm of respiratory and digestive organs; C78 Secondary malignant neoplasm of respiratory and digestive organs
C77 Secondary and unspecified malignant neoplasm of lymph nodes; C77 Secondary and unspecified malignant neoplasm of lymph nodes
C77 Secondary and unspecified malignant neoplasm of lymph nodes; C77 Secondary and unspecified malignant neoplasm of lymph nodes
196 Secondary and unspecified malignant neoplasm of lymph nodes; 196 Secondary and unspecified malignant neoplasm of lymph nodes
196 Secondary and unspecified malignant neoplasm of lymph nodes; 196 Secondary and unspecified malignant neoplasm of lymph nodes
197 Secondary malignant neoplasm of respiratory and digestive systems; 197 Secondary malignant neoplasm of respiratory and digestive systems
197 Secondary malignant neoplasm of respiratory and digestive systems; 197 Secondary malignant neoplasm of respiratory and digestive systems
198 Secondary malignant neoplasm of other specified sites; 198 Secondary malignant neoplasm of other specified sites
198 Secondary malignant neoplasm of other specified sites; 198 Secondary malignant neoplasm of other specified sites
language_cd; Language, i2b2
Language
Ethnicity; EMR demographics
Test section
sequence | time | type | name | hash |
---|---|---|---|---|
0001 | 2018-10-16 17:22:29 | info | sessionInfo | - |
0002 | 2018-10-16 17:22:29 | this_script | exploration.spin.Rmd | 4dff158 |
0003 | 2018-10-16 17:23:10 | rdata | .depdata[ii] = “dictionary.R.rdata” | dbb49fe969d73218eddfdbe85670344e |
0004 | 2018-10-16 17:26:03 | rdata | .depdata[ii] = “data.R.rdata” | b9233974e7a29b4c5d27a1603013438d |
0003.0001 | 2018-10-16 17:22:34 | info | sessionInfo | - |
0003.0002 | 2018-10-16 17:22:34 | this_script | dictionary.R | 4dff158 |
0003.0003 | 2018-10-16 17:22:35 | file | inputdata = “local/in/HSC20170563N_kc_v200.int.csv” | caa0a30bd87cd77659b118986cab73a4 |
0003.0004 | 2018-10-16 17:22:46 | file | inputdata = “local/in/HSC20170563N_kc_v200.int.csv” | caa0a30bd87cd77659b118986cab73a4 |
0003.0005 | 2018-10-16 17:22:46 | file | rawdct = “local/in/meta_HSC20170563N_kc_v200.int.csv” | 77226290495672d030798e64327fe10a |
0003.0006 | 2018-10-16 17:22:46 | file | tpldct = “datadictionary_static.csv” | dc40ce6053d4edc459cb6a240f1cf8c6 |
0003.0007 | 2018-10-16 17:22:49 | info | sessionInfo | - |
0003.0008 | 2018-10-16 17:22:49 | save | save | - |
0004.0001 | 2018-10-16 17:23:15 | info | sessionInfo | - |
0004.0002 | 2018-10-16 17:23:15 | this_script | data.R | 4dff158 |
0004.0003 | 2018-10-16 17:23:26 | rdata | .depdata = “dictionary.R.rdata” | dbb49fe969d73218eddfdbe85670344e |
0004.0004 | 2018-10-16 17:23:26 | file | levels_map_file = “levels_map.csv” | dade16a6df40d86457f024f52781e3b2 |
0004.0005 | 2018-10-16 17:24:07 | seed | project_seed | - |
0004.0006 | 2018-10-16 17:25:35 | info | sessionInfo | - |
0004.0007 | 2018-10-16 17:25:37 | save | save | - |
0004.0003.0001 | 2018-10-16 17:22:34 | info | sessionInfo | - |
0004.0003.0002 | 2018-10-16 17:22:34 | this_script | dictionary.R | 4dff158 |
0004.0003.0003 | 2018-10-16 17:22:35 | file | inputdata = “local/in/HSC20170563N_kc_v200.int.csv” | caa0a30bd87cd77659b118986cab73a4 |
0004.0003.0004 | 2018-10-16 17:22:46 | file | inputdata = “local/in/HSC20170563N_kc_v200.int.csv” | caa0a30bd87cd77659b118986cab73a4 |
0004.0003.0005 | 2018-10-16 17:22:46 | file | rawdct = “local/in/meta_HSC20170563N_kc_v200.int.csv” | 77226290495672d030798e64327fe10a |
0004.0003.0006 | 2018-10-16 17:22:46 | file | tpldct = “datadictionary_static.csv” | dc40ce6053d4edc459cb6a240f1cf8c6 |
0004.0003.0007 | 2018-10-16 17:22:49 | info | sessionInfo | - |
0004.0003.0008 | 2018-10-16 17:22:49 | save | save | - |
UT Health San Antonio