{"id":159,"date":"2019-09-11T11:57:23","date_gmt":"2019-09-11T10:57:23","guid":{"rendered":"https:\/\/informedica.nl\/?p=159"},"modified":"2020-11-15T09:36:55","modified_gmt":"2020-11-15T08:36:55","slug":"using-f-as-an-etl-tool","status":"publish","type":"post","link":"https:\/\/informedica.nl\/?p=159","title":{"rendered":"Using F# as an ETL tool"},"content":{"rendered":"\n<p>The ability to Extract, Transform and Load data into a format that enables data analysis and machine learning is essential to make use of the vast amount of observational data that is now available. F# can be a very efficient tool to achieve these goals.<\/p>\n\n\n<p><!--more--><\/p>\n\n\n<p>The general model to represent the extraction is as follows:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"fsharp\" class=\"language-fsharp\">\/\/\/ A function to summarize a list of values\r\n\/\/\/ as one value\r\ntype Collaps = Value seq -> Value\r\n\r\n\/\/\/ A conversion function for a value\r\ntype Convert = Value -> Value\r\n\r\n\/\/\/ An observation describes which signals\r\n\/\/\/ make up that observation (Sources)\r\n\/\/\/ along with a collapse function that summarizes\r\n\/\/\/ the obtained values to one value.\r\n\/\/\/ Each source value can also be converted to\r\n\/\/\/ a different or adjusted value.\r\ntype Observation =\r\n    { Name : Name \/\/ e.g. arterial blood gas\r\n      Sources : Source list\r\n      Collaps : Collaps }\r\n\r\nand Source =\r\n    { Name : Name\r\n      Id : ParameterId\r\n      Convert : Convert }\r\n\r\n\/\/\/ A list of observations to retrieve\r\ntype Observations = Observation seq<\/code><\/pre>\n\n\n\n<p>Central is the Observation type. 
An observation is anything that can be observed and is described by a name, a list of sources and a function that summarizes the values from those sources into a single value.<\/p>\n\n\n\n<p>For example, the urinary output of a patient can be stored in a number of parameters, such as:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Spontaneous diuresis<\/li><li>Catheter measured diuresis<\/li><li>Diaper weight measured diuresis<\/li><li>etc.<\/li><\/ul>\n\n\n\n<p>With this list of sources, the observed diuresis can be summarized by adding up all the entries in the list.<\/p>\n\n\n\n<p>Another example is temperature, which is measured at different locations and stored using different parameters. A possible collapse function could treat the list of sources as a hierarchically ordered list and return the first entry that is not empty.<\/p>\n\n\n\n<p>For each source a convert function can be used to either:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Convert the value to a different value, for example to match the other possible values in the list of sources, or<\/li><li>Filter out values that cannot be true measurements or observations<\/li><\/ul>\n\n\n\n<p>The end result is a list of observations that describes where to retrieve each observation, how to summarize its sources and how to convert and\/or filter the observed values.<\/p>\n\n\n\n<p>Next a dataset is created that contains the retrieved data.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"fsharp\" class=\"language-fsharp\">\/\/\/ The resulting dataset with columns\r\n\/\/\/ and rows of data. Each row has a\r\n\/\/\/ patient id and a date time.\r\ntype DataSet =\r\n    { Columns : Name seq\r\n      Data : (PatientId * DateTime * DataRow) seq }\r\n\r\nand DataRow = Value seq<\/code><\/pre>\n\n\n\n<p>The Columns field contains the observation names, which serve as column names, and each row is uniquely identified by a patient and a date time. 
The row itself is a list of values that map to the column names. Thus the dataset represents a flat table containing all the patient data.<\/p>\n\n\n\n<p>The final step is to anonymize the dataset by replacing patient IDs with random identifiers and replacing the actual date time with the time elapsed since the first observation for that patient. In the process, the birth date of a patient can be replaced by the age at the time of an observation. This leaves no traceable patient data.<\/p>\n\n\n\n<p>A prototypical system has been created for the MetaVision PDMS from <a href=\"https:\/\/www.imd-soft.com\/products\/intensive-care\">iMDSoft<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The ability to Extract, Transform and Load data into a format that enables data analysis and machine learning is essential to make use of the vast amount of observational data that is now available. F# can be a very efficient tool to achieve these goals.<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[10,9],"tags":[],"class_list":["post-159","post","type-post","status-publish","format-standard","hentry","category-medicine","category-programming"],"_links":{"self":[{"href":"https:\/\/informedica.nl\/index.php?rest_route=\/wp\/v2\/posts\/159","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/informedica.nl\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/informedica.nl\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/informedica.nl\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/informedica.nl\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=159"}],"version-history":[{"count":6,"href":"https:\/\/informedica.nl\/index.php?rest_route=\/wp\/v2\/posts\/159\/revisions"}],"predecessor-version":[{"id":182,"href":"https:\/\/informe
dica.nl\/index.php?rest_route=\/wp\/v2\/posts\/159\/revisions\/182"}],"wp:attachment":[{"href":"https:\/\/informedica.nl\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=159"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/informedica.nl\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=159"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/informedica.nl\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=159"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}