Journal

Yesterday

Struggling with how to correctly separate (in DataFinisher) the functionality of selecting valid rules versus recommending them… and how to incorporate user response in future interactive-mode use.

Today

New plan for DataFinisher spreadsheet-embedded metadata. There will be a nested dict built from the first two rows of the input csv, structured like this:

{ ...
 'birth_date': {
  'dat': '2018-11-29', 
  'outcols': [
   {
    'dat': '2018-11-29', 
    'cname': 'birth_date', 
    'extr': 'as_is', 
    'args': []
   }
  ]
 }, 
 'v073_TNM_Pth_M': {
  'dat': {
   'mxconmod': 1, 
   'quantity_num': null, 
   'mxinsts': 1, 
   'nval_num': null, 
   'cid': 73, 
   'name': '0900 TNM Path M', 
   'colcd': 'v073', 
   'rule': 'UNKNOWN_DATA_ELEMENT', 
   'ddomain': 'NAACCR', 
   'concept_path': '\\i2b2\\naaccr\\S:11 Stage/Prognostic Factors\\0900 TNM Path M\\', 
   'mxfacts': 1, 
   'done': 0, 
   'valueflag_cd': null, 
   'ccd': 8, 
   'colid': 'v073_TNM_Pth_M', 
   'tval_char': null, 
   'mod': null, 
   'confidence_num': null, 
   'units_cd': null, 
   'location_cd': null, 
   'valtype_cd': null
  }, 
  'outcols': [
   {
    'dat': '{\'mxconmod\': 1, \'quantity_num\': null, \'mxinsts\': 1, \'nval_num\': null, \'cid\': 73, \'name\': \'0900 TNM Path M\', \'colcd\': \'v073\', \'rule\': \'UNKNOWN_DATA_ELEMENT\', \'ddomain\': \'NAACCR\', \'concept_path\': \'\\\\i2b2\\\\naaccr\\\\S:11 Stage/Prognostic Factors\\\\0900 TNM Path M\\\\\', \'mxfacts\': 1, \'done\': 0, \'valueflag_cd\': null, \'ccd\': 8, \'colid\': \'v073_TNM_Pth_M\', \'tval_char\': null, \'mod\': null, \'confidence_num\': null, \'units_cd\': null, \'location_cd\': null, \'valtype_cd\': null}', 
    'cname': 'v073_TNM_Pth_M', 
    'extr': 'as_is', 
    'args': []
   }
  ]
 }
 ...
}

These are two columns (of many). The birth_date example is how columns lacking metadata (but not intended to get blown away) would look. The v073_TNM_Pth_M example is how metadata columns will look.

Notice that each column has an outcols list which currently contains the information for one column (a copy of the original). This makes it easier to wrap my head around how to accept algorithmic and user-originated decisions on top of this basic structure: by interpreting the contents of dat the user or the suggestion engine generate additional derived columns to the outcols list of each column. So finally we have a unified messaging process for either user or default suggestions, distinct from the much broader selection of which rules could potentially be valid for a given column.

Written on November 29, 2018