The Architecture of Truth: Data Engineering and Biomarker Harmonization in Precision Medicine


In the Kóre Labs ecosystem, subjectivity is obsolete. The difference between data and a clinical decision lies in semantic interoperability. This technical documentation details the infrastructure needed to eliminate data silos and allow hard science to dictate the optimization protocol.

1. Interoperability Architecture in Health Informatics

Building a master database for the 150 most common clinical biomarkers is not merely a cataloging task; it is a critical ontological engineering challenge for the integrity of modern healthcare systems. From a health informatics engineering standpoint, the fundamental premise of this design is semantic interoperability. In an ecosystem where Electronic Health Records (EHRs) must communicate with Clinical Trial Data Management Systems (CDMS) and Big Data repositories for population research, ambiguity in the definition of an analyte is unacceptable. A numerical value has no clinical meaning unless four dimensions are rigorously defined: the component (analyte), the system (biological matrix), the analytical method, and the standardized unit of measurement.

A thorough analysis of current standards reveals that data fragmentation remains the primary obstacle to precision medicine. For example, the coexistence of gravimetric (mg/dL) and molar (mmol/L) units for glucose, or the variability in creatinine calibration (traceable to IDMS versus traditional Jaffe), creates "data silos" that impede longitudinal analysis.<sup>1</sup> This technical specification establishes a "Single Source of Truth" aligned with FDA regulations for clinical data presentation and HL7 FHIR global standards.

1.1 The LOINC Paradigm as a Backbone

The LOINC (Logical Observation Identifiers Names and Codes) standard provides the taxonomy necessary to disambiguate laboratory observations. Unlike billing codes (CPTs) that describe a service, LOINC describes the scientific observation. For this master database, we have selected codes that maximize interoperability, prioritizing "method-less" codes when the analytical technique does not significantly alter the clinical reference range, but specifying methods when it is critical (as in the case of testosterone by mass spectrometry or D-dimer).

The structure of each entry in our database must validate six semantic axes:

  1. Component: The analyte (e.g., Glucose).

  2. Property: The physical quantity (e.g., MCnc for mass concentration vs. SCnc for substance concentration).

  3. Time: The moment of the capture (e.g., Pt for point in time).

  4. System: The sample (e.g., Ser/Plas).

  5. Scale: The type of result (e.g., Qn for quantitative).

  6. Method: Optional, but critical in specific cases.
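The six-axis check in the list above can be sketched as a small record type. This is a minimal illustration rather than the production schema: the class name `LoincAxes`, its field names, and the helper `missing_axes` are hypothetical, and the only rule encoded is the one stated in the list, namely that Method is the single optional axis.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass(frozen=True)
class LoincAxes:
    """Hypothetical container for the six LOINC axes of one database entry."""
    component: str                 # analyte, e.g. "Glucose"
    prop: str                      # property, e.g. "MCnc" or "SCnc"
    time: str                      # timing, e.g. "Pt"
    system: str                    # specimen, e.g. "Ser/Plas"
    scale: str                     # result type, e.g. "Qn"
    method: Optional[str] = None   # the only optional axis


def missing_axes(entry: LoincAxes) -> List[str]:
    """Return the names of mandatory axes that are empty or blank."""
    mandatory = [f.name for f in fields(entry) if f.name != "method"]
    return [name for name in mandatory if not getattr(entry, name).strip()]


glucose = LoincAxes("Glucose", "MCnc", "Pt", "Ser/Plas", "Qn")
incomplete = LoincAxes("Glucose", "", "Pt", "Ser/Plas", "Qn")
print(missing_axes(glucose))     # no mandatory axis is missing
print(missing_axes(incomplete))  # the property axis is reported
```

An entry would only be accepted into the master database when `missing_axes` returns an empty list.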

1.2 UCUM Computational Syntax and CDISC Validation

To ensure that the data are computable, we reject the use of ambiguous text strings for units. We strictly adopt the Unified Code for Units of Measure (UCUM). For example, "International Units" are coded as [iU], avoiding confusion with arbitrary units of enzyme activity. 4
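A strict unit gate could look like the sketch below. It is deliberately simplistic: the whitelist `ALLOWED_UCUM_UNITS` and the function `is_allowed_unit` are hypothetical names, and the set holds only a handful of UCUM codes used in this document. A real deployment would validate against the full UCUM grammar rather than a hand-maintained list.

```python
# Illustrative subset only (assumption): a production system would check
# units against the complete UCUM grammar, not a short whitelist.
ALLOWED_UCUM_UNITS = frozenset({
    "mg/dL", "mmol/L", "umol/L", "g/dL", "g/L",
    "U/L", "[iU]/L", "10*3/uL", "10*9/L",
})


def is_allowed_unit(unit: str) -> bool:
    """UCUM case-sensitive codes are matched exactly: no trimming, no case folding."""
    return unit in ALLOWED_UCUM_UNITS


for candidate in ("mg/dL", "MG/DL", "IU/L", "[iU]/L"):
    print(candidate, is_allowed_unit(candidate))
```

Exact matching is intentional here: silently normalizing "MG/DL" or "IU/L" would reintroduce the free-text ambiguity that UCUM is meant to remove.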

Simultaneously, to meet FDA regulatory requirements in clinical trials, each biomarker is mapped to CDISC's Study Data Tabulation Model (SDTM). This involves assigning each test a standardized LBTESTCD (Laboratory Test Code) and LBTEST (Test Name), ensuring that a clinical result can move seamlessly from hospital care to a regulatory approval dossier. 4


2. Clinical and Metabolic Chemistry: The Core of Diagnosis

The Comprehensive Metabolic Panel (CMP) represents the largest volume of transactions in global clinical laboratories. Harmonizing these analytes is a priority due to their use in critical decision algorithms, such as dose titration in renal impairment or the management of hyperglycemic crises.

2.1 Carbohydrate Metabolism: Glucose and HbA1c

Glucose measurement presents particular challenges due to the duality of units and the patient's physiological state. In the Latin American (LATAM) context, it is crucial to distinguish between "Fasting Glucose" and "Random Glucose," even though the chemical analyte is identical.

Glucose (Serum/Plasma)

The database prioritizes the generic LOINC code for routine hospital use, which encompasses hexokinase and glucose oxidase methods. Note that whole-blood glucose measured at the point of care (POC) yields values approximately 10–15% lower than plasma glucose, because of the volume displaced by red blood cells, and therefore requires a different LOINC code.<sup>1</sup>

  • Primary LOINC: 2345-7 (Glucose [Mass/volume] in Serum or Plasma).

  • CDISC mapping: LBTESTCD = GLUC , LBTEST = Glucose .

  • Conversion Factor: The relationship between the conventional units (mg/dL) used in the U.S. and Mexico, and the SI units (mmol/L) used in Canada and Europe, is derived from the molecular weight of glucose ($C_6H_{12}O_6$, 180.16 g/mol).

    • Formula: $Value_{mg/dL} \times 0.0555 = Value_{mmol/L}$. 6

  • LATAM context: Common aliases include "Glycemia" and, for fasting samples, "Basal Glucose" (LOINC 1558-6). 8
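The glucose factor can be reproduced from the molecular weight with a few lines of code. This is a minimal sketch; the function and constant names are illustrative and not part of any standard library.

```python
GLUCOSE_MOLAR_MASS = 180.16  # g/mol, as quoted in the text


def mg_dl_to_mmol_l_factor(molar_mass_g_per_mol: float) -> float:
    """Factor for mg/dL -> mmol/L.

    1 mg/dL equals 10 mg/L, and mg/L divided by g/mol gives mmol/L.
    """
    if molar_mass_g_per_mol <= 0:
        raise ValueError("molar mass must be positive")
    return 10.0 / molar_mass_g_per_mol


GLUCOSE_FACTOR = mg_dl_to_mmol_l_factor(GLUCOSE_MOLAR_MASS)


def glucose_mg_dl_to_mmol_l(value_mg_dl: float) -> float:
    return value_mg_dl * GLUCOSE_FACTOR


def glucose_mmol_l_to_mg_dl(value_mmol_l: float) -> float:
    return value_mmol_l / GLUCOSE_FACTOR


print(round(GLUCOSE_FACTOR, 4))                  # the tabulated 0.0555
print(round(glucose_mg_dl_to_mmol_l(100.0), 2))  # 100 mg/dL in mmol/L
```

Storing the molecular weight and deriving the factor, rather than hard-coding the rounded factor, keeps the forward and reverse conversions exact inverses of each other.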

Glycated Hemoglobin (HbA1c)

The standardization of HbA1c has been the subject of a massive global effort between the National Glycohemoglobin Standardization Program (NGSP) and the International Federation of Clinical Chemistry (IFCC). The database must support both units due to the incomplete global transition.

  • LOINC (NGSP - %): 4548-4 (Hemoglobin A1c/Hemoglobin.total in Blood).

  • LOINC (IFCC - mmol/mol): 59261-8 (Hemoglobin A1c/Hemoglobin.total [Moles/mole] in Blood).

  • Conversion Logic: Unlike other analytes, the relationship is not a simple multiplicative factor but a linear master equation with an intercept, derived from comparing reference methods. 9

    $$IFCC_{mmol/mol} = (10.93 \times NGSP_{\%} ) - 23.50$$
    $$NGSP_{\%} = (0.09148 \times IFCC_{mmol/mol}) + 2.152$$
  • Clinical Implication: A value of 7.0% (NGSP) is equivalent to 53 mmol/mol (IFCC). Confusion between these scales can lead to serious therapeutic errors in diabetes management.
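The two master equations translate directly into code. The sketch below uses the coefficients quoted above; the function names are illustrative.

```python
def ngsp_to_ifcc(ngsp_percent: float) -> float:
    """NGSP (%) -> IFCC (mmol/mol), using the master equation quoted above."""
    return 10.93 * ngsp_percent - 23.50


def ifcc_to_ngsp(ifcc_mmol_mol: float) -> float:
    """IFCC (mmol/mol) -> NGSP (%), using the inverse master equation."""
    return 0.09148 * ifcc_mmol_mol + 2.152


print(round(ngsp_to_ifcc(7.0)))      # 7.0 % expressed in mmol/mol
print(round(ifcc_to_ngsp(53.0), 1))  # 53 mmol/mol expressed in %
```

Because of the intercept, a plain "factor" column cannot represent this analyte; the database needs a dedicated conversion rule for HbA1c.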

2.2 Renal Function: The Traceability Challenge

Creatinine

Creatinine is the cornerstone biomarker for estimating glomerular filtration rate (eGFR). The master database should include metadata on whether the assay is traceable to IDMS (Isotope Dilution Mass Spectrometry), as this alters the applicable eGFR equation (MDRD vs. CKD-EPI). 11

  • Primary LOINC: 2160-0 (Creatinine [Mass/volume] in Serum or Plasma).

  • Units and Conversion:

    • Conventional: mg/dL.

    • SI: $\mu mol/L$ (UCUM: umol/L).

    • Factor: $mg/dL \times 88.4 = \mu mol/L$. 6

    • Reasoning: The factor is derived from the molecular weight of creatinine (113.12 g/mol). Since 1 mg/dL equals 10,000 µg/L, the factor is $10000 / 113.12 \approx 88.4$.

Blood Urea Nitrogen (BUN) vs. Urea

There is a significant geographical dichotomy: the US and parts of LATAM report the nitrogen content (BUN), while Europe reports the whole urea molecule.

  • LOINC BUN: 3094-0.

  • LOINC Urea: 3093-2.

  • Critical Conversion:

    • From BUN (mg/dL) to Urea (mmol/L): Multiply by 0.357. 6

    • From BUN (mg/dL) to Urea (mg/dL): Multiply by 2.14 (molecular weight ratio 60/28).

    • Data Strategy: Store as BUN for compatibility with North American standards, but automatically calculate Urea for international interfaces.
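Both BUN factors follow from two molar masses. The sketch below assumes standard values of 60.06 g/mol for urea and 28.01 g/mol for its two nitrogen atoms (the text rounds these to 60 and 28); the constant and function names are illustrative.

```python
UREA_MOLAR_MASS = 60.06      # g/mol, whole urea molecule (assumed standard value)
UREA_NITROGEN_MASS = 28.01   # g/mol, the two nitrogen atoms in urea (assumed)

# 1 mg/dL of nitrogen = 10 mg/L; each mmol of urea carries 28.01 mg of nitrogen.
BUN_MG_DL_TO_UREA_MMOL_L = 10.0 / UREA_NITROGEN_MASS
# Mass ratio of the whole molecule to its nitrogen content.
BUN_MG_DL_TO_UREA_MG_DL = UREA_MOLAR_MASS / UREA_NITROGEN_MASS


def bun_to_urea_mmol_l(bun_mg_dl: float) -> float:
    return bun_mg_dl * BUN_MG_DL_TO_UREA_MMOL_L


def bun_to_urea_mg_dl(bun_mg_dl: float) -> float:
    return bun_mg_dl * BUN_MG_DL_TO_UREA_MG_DL


print(round(BUN_MG_DL_TO_UREA_MMOL_L, 3))  # matches the tabulated 0.357
print(round(BUN_MG_DL_TO_UREA_MG_DL, 2))   # matches the tabulated 2.14
```

Under the storage strategy described above, BUN is the persisted value and both urea representations are computed on read for international interfaces.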

2.3 Electrolytes and Homeostasis

Electrolytes offer an advantage in harmonization: for monovalent ions (Na+, K+, Cl-), the units mEq/L and mmol/L are numerically identical. However, the UCUM syntax must be rigorous.

| Biomarker | Spanish Alias | LOINC | CDISC LBTESTCD | Conv. Factor |
|---|---|---|---|---|
| Sodium (Serum) | Natremia | 2951-2 | SODIUM | 1.0 (mEq/L = mmol/L) <sup>1</sup> |
| Potassium | Kalemia | 2823-3 | K | 1.0 (mEq/L = mmol/L) <sup>1</sup> |
| Chloride | Chloride | 2075-0 | CL | 1.0 (mEq/L = mmol/L) <sup>1</sup> |
| Bicarbonate | Total CO2 | 2028-9 | BICARB | 1.0 (mEq/L = mmol/L) <sup>1</sup> |

Calcium (Total)

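Calcium is divalent, so mEq/L and mmol/L are no longer numerically identical (1 mmol/L = 2 mEq/L). In addition, conventional results are reported in mass units (mg/dL), which requires a conversion based on atomic weight.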

  • LOINC: 17861-6 (Calcium [Mass/volume] in Serum or Plasma).

  • Conversion: $mg/dL \times 0.25 = mmol/L$ . 6

    • Insight: The atomic weight of Calcium is 40.078 g/mol. The exact conversion is $10 / 40.078 = 0.2495$, clinically rounded to 0.25.
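The role of valence can be made explicit in code. The sketch below uses the atomic weight given above and the general relation mEq/L = mmol/L × valence; the function names are illustrative.

```python
CALCIUM_ATOMIC_WEIGHT = 40.078  # g/mol, from the text
CALCIUM_VALENCE = 2             # divalent cation


def calcium_mg_dl_to_mmol_l(value_mg_dl: float) -> float:
    """1 mg/dL = 10 mg/L; dividing by the atomic weight gives mmol/L."""
    return value_mg_dl * 10.0 / CALCIUM_ATOMIC_WEIGHT


def mmol_l_to_meq_l(value_mmol_l: float, valence: int) -> float:
    """Milliequivalents scale with ionic charge."""
    return value_mmol_l * valence


calcium_mmol = calcium_mg_dl_to_mmol_l(10.0)
print(round(calcium_mmol, 3))                                    # exact factor, before rounding to 0.25
print(round(mmol_l_to_meq_l(calcium_mmol, CALCIUM_VALENCE), 2))  # twice the molar value
print(mmol_l_to_meq_l(140.0, 1))                                 # monovalent sodium: unchanged
```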

2.4 Clinical Chemistry Master Table (Excerpt)

This table summarizes the essential parameters for implementation in SQL relational databases or FHIR resources.

| ID | Common Name (EN/ES) | LOINC | Matrix | UCUM Conv. | UCUM SI | Factor (C → SI) | Source |
|---|---|---|---|---|---|---|---|
| 1 | Glucose / Glucose | 2345-7 | Ser/Plas | mg/dL | mmol/L | 0.0555 | 6 |
| 2 | Urea Nitrogen / BUN | 3094-0 | Ser/Plas | mg/dL | mmol/L | 0.357 | 6 |
| 3 | Creatinine / Creatinine | 2160-0 | Ser/Plas | mg/dL | umol/L | 88.4 | 13 |
| 4 | Calcium / Calcium | 17861-6 | Ser/Plas | mg/dL | mmol/L | 0.25 | 6 |
| 5 | Albumin | 1751-7 | Ser/Plas | g/dL | g/L | 10.0 | 6 |
| 6 | Total Protein / Proteins | 2885-2 | Ser/Plas | g/dL | g/L | 10.0 | 6 |
| 7 | ALP / Alkaline Phosphatase | 6768-6 | Ser/Plas | U/L | U/L | 1.0 | 1 |
| 8 | ALT (SGPT) / TGP | 1742-6 | Ser/Plas | U/L | U/L | 1.0 | 11 |
| 9 | AST (SGOT) / TGO | 1920-8 | Ser/Plas | U/L | U/L | 1.0 | 1 |
| 10 | Total Bilirubin / Bilirubin | 1975-2 | Ser/Plas | mg/dL | umol/L | 17.1 | 6 |
| 11 | Uric Acid / Ácido Úrico | 3084-1 | Ser/Plas | mg/dL | umol/L | 59.48 | 14 |
| 12 | Magnesium / Magnesium | 2601-3 | Ser/Plas | mg/dL | mmol/L | 0.4114 | 15 |
| 13 | Phosphate / Phosphorus | 2777-1 | Ser/Plas | mg/dL | mmol/L | 0.323 | 6 |
| 14 | Lactate / Lactato | 2532-0 | Plas | mg/dL | mmol/L | 0.111 | 15 |
| 15 | Amylase / Amylase | 1798-8 | Ser/Plas | U/L | U/L | 1.0 | 1 |
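One possible relational layout for this excerpt is sketched below with Python's built-in `sqlite3` module. The table name `biomarker`, its column names, and the helper `to_si` are hypothetical; only four rows from the table are loaded, and a production schema would add provenance, method, and alias columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE biomarker (
        loinc      TEXT PRIMARY KEY,
        name       TEXT NOT NULL,
        matrix     TEXT NOT NULL,
        ucum_conv  TEXT NOT NULL,
        ucum_si    TEXT NOT NULL,
        factor     REAL NOT NULL CHECK (factor > 0)
    )
    """
)
# Rows copied from the master table excerpt.
conn.executemany(
    "INSERT INTO biomarker VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("2345-7", "Glucose", "Ser/Plas", "mg/dL", "mmol/L", 0.0555),
        ("2160-0", "Creatinine", "Ser/Plas", "mg/dL", "umol/L", 88.4),
        ("17861-6", "Calcium", "Ser/Plas", "mg/dL", "mmol/L", 0.25),
        ("1975-2", "Total Bilirubin", "Ser/Plas", "mg/dL", "umol/L", 17.1),
    ],
)


def to_si(loinc: str, conventional_value: float):
    """Look up the factor by LOINC code and return (SI value, SI unit)."""
    row = conn.execute(
        "SELECT factor, ucum_si FROM biomarker WHERE loinc = ?", (loinc,)
    ).fetchone()
    if row is None:
        raise KeyError(f"unknown LOINC code: {loinc}")
    factor, si_unit = row
    return round(conventional_value * factor, 3), si_unit


print(to_si("2345-7", 100.0))
print(to_si("1975-2", 1.0))
```

Keying the table on the LOINC code means a result can only be converted if its analyte, matrix, and property were unambiguously identified first.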

3. Hematology: Cellular and Morphological Complexity

The Complete Blood Count (CBC), known as "Biometría Hemática" in Mexico <sup>16</sup>, presents unique challenges because it mixes absolute counts, percentages, and calculated parameters. The database architecture must clearly distinguish between automated and manual methods, prioritizing automated methods due to their prevalence (>95%).

3.1 Cell Counts and Units

The LOINC and UCUM conventions for cell counts have evolved. The traditional notation of "thousands per microliter" ($10^3/\mu L$) is numerically identical to the SI unit "Giga per liter" ($10^9/L$), so the conversion factor is 1.0.

  • Leukocytes (WBC): LOINC 6690-2 (Leukocytes [#/volume] in Blood by Automated counting).

    • CDISC mapping: LBTESTCD = WBC.

  • Erythrocytes (RBC): LOINC 789-8 (Erythrocytes [#/volume] in Blood by Automated counting).

    • Unit: $10^6/\mu L$ or $10^{12}/L$ (Tera/L). Factor 1.0.

  • Platelets: LOINC 777-3.

    • Critical Note: In cases of severe thrombocytopenia or EDTA-induced platelet clumping, manual or citrate-tube counts are used. These require alternative LOINC codes so that they do not corrupt the historical series of automated counts.

3.2 The Red Blood Cell Series: Hemoglobin and Hematocrit

  • Hemoglobin: LOINC 718-7.

    • Conversion: $g/dL \times 10 = g/L$. 6 This conversion is standard in Commonwealth countries and European hospital systems.

  • Hematocrit: LOINC 4544-3.

    • Nature: Volume fraction.

    • Conversion: $\% \times 0.01 = L/L$ (decimal fraction).

3.3 Leukocyte Differential: Absolute vs. Relative

A clear trend in health informatics is the prioritization of absolute counts over relative percentages. Clinically, neutropenia is defined by the absolute number of neutrophils, not their percentage. The database should contain both, but flag absolute counts as the preferred input for clinical decision support (CDS) rules.

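| Cell | Absolute LOINC | Relative LOINC (%) | CDISC (Absolute) |
|---|---|---|---|
| Neutrophils | 751-8 <sup>17</sup> | 770-8 | NEUT |
| Lymphocytes | 731-0 <sup>17</sup> | 736-9 | LYM |
| Monocytes | 742-7 <sup>17</sup> | 5905-5 | MONO |
| Eosinophils | 711-2 <sup>17</sup> | 713-8 | EOS |
| Basophils | 704-7 <sup>17</sup> | 706-2 | BASO |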
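The reason absolute counts drive decision rules becomes clear when they are derived from the percentage and the total leukocyte count. In the sketch below, the threshold `ANC_THRESHOLD` of 1.5 ×10^3/µL is an assumed, commonly used cutoff included for illustration only; it is not taken from this specification, and the function names are hypothetical.

```python
ANC_THRESHOLD = 1.5  # x10^3/uL; illustrative cutoff (assumption, not from this spec)


def absolute_count(wbc_10e3_per_ul: float, percent: float) -> float:
    """Absolute count (x10^3/uL) = total WBC (x10^3/uL) * relative percentage / 100."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percentage must be between 0 and 100")
    return wbc_10e3_per_ul * percent / 100.0


def is_neutropenic(wbc_10e3_per_ul: float, neutrophil_percent: float) -> bool:
    return absolute_count(wbc_10e3_per_ul, neutrophil_percent) < ANC_THRESHOLD


# A normal-looking 60 % with a low WBC hides a low absolute count ...
print(is_neutropenic(2.0, 60.0))
# ... while a low-looking 20 % with a high WBC does not.
print(is_neutropenic(12.0, 20.0))
```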

3.4 Coagulation and the Risk of Units

In the coagulation panel, the greatest threat to data integrity lies in the D-dimer. There are two non-interchangeable reporting units: D-dimer Units (DDU) and Fibrinogen Equivalent Units (FEU).

  • The Problem: A result expressed in FEU is approximately twice the same result expressed in DDU ($1\ \text{DDU} \approx 2\ \text{FEU}$). A value of 500 ng/mL FEU (normal cutoff) is equivalent to 250 ng/mL DDU. Confusing these units can lead to false positives or false negatives when excluding pulmonary embolism. 18

  • Database Solution:

    • D-Dimer FEU: LOINC 48066-5 (Fibrin D-dimer FEU [Mass/volume] in Platelet poor plasma).

    • D-Dimer DDU: LOINC 48065-7.

    • A strict validation rule is required to prevent the merging of this data without explicit conversion (Factor 1.7 - 2.0 depending on the kit). 20
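The validation rule can be sketched as a normalizer that refuses to convert DDU results unless a kit-specific factor is supplied explicitly. The function name `to_feu` and its parameters are hypothetical; the accepted factor range is the 1.7 to 2.0 interval quoted above.

```python
from typing import Optional

FEU = "FEU"
DDU = "DDU"


def to_feu(value_ng_ml: float, unit_type: str,
           ddu_to_feu_factor: Optional[float] = None) -> float:
    """Normalize a D-dimer result to ng/mL FEU, never guessing the factor."""
    if unit_type == FEU:
        return value_ng_ml
    if unit_type == DDU:
        if ddu_to_feu_factor is None:
            raise ValueError("DDU result needs an explicit kit-specific factor")
        if not 1.7 <= ddu_to_feu_factor <= 2.0:
            raise ValueError("factor outside the 1.7-2.0 range given in the text")
        return value_ng_ml * ddu_to_feu_factor
    raise ValueError(f"unknown D-dimer unit type: {unit_type!r}")


print(to_feu(500.0, FEU))
print(to_feu(250.0, DDU, ddu_to_feu_factor=2.0))
```

Raising an error instead of applying a default factor forces the interface engine to carry the kit metadata alongside every DDU result.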


4. Endocrinology: Hormones and Functional Axes

Endocrine biomarkers present challenges in analytical sensitivity (e.g., 3rd generation TSH) and extremely small units (picograms). Furthermore, steroid hormones require conversions based on specific molecular weights.

4.1 Thyroid Function (Thyroid Profile)

The thyroid profile is fundamental for metabolic screening. In Mexico and Latin America, profiles that include Total T3 and Free T3 are often requested, unlike Anglo-Saxon guidelines that prioritize TSH and Free T4. 21

  • TSH (Thyroid Stimulating Hormone):

    • LOINC: 11579-0 (Thyrotropin [Units/volume] in Serum or Plasma).

    • Units: $\mu IU/mL$ is numerically equal to $mIU/L$.

  • Free T4 (Free Thyroxine):

    • LOINC: 3024-7.

    • Conversion: $ng/dL \times 12.87 = pmol/L$. 22

    • Insight: This conversion is critical. A value of 1.0 ng/dL is normal, but 1.0 pmol/L is incompatible with life. Range validation per unit is essential.

  • Total T3 (Triiodothyronine):

    • LOINC: 3053-6.

    • Conversion: $ng/dL \times 0.0154 = nmol/L$. 24
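Range validation for Free T4 can be sketched as a per-unit plausibility window. The windows in `PLAUSIBLE_FT4` are hypothetical values chosen for illustration (the pmol/L window is the ng/dL window multiplied by the factor above); they are not clinical reference ranges.

```python
FT4_NG_DL_TO_PMOL_L = 12.87  # factor quoted in the text

# Hypothetical plausibility windows for illustration only (assumption);
# they are NOT clinical reference ranges.
PLAUSIBLE_FT4 = {
    "ng/dL": (0.1, 10.0),
    "pmol/L": (1.3, 130.0),
}


def ft4_ng_dl_to_pmol_l(value_ng_dl: float) -> float:
    return value_ng_dl * FT4_NG_DL_TO_PMOL_L


def is_plausible_ft4(value: float, unit: str) -> bool:
    """Flag values that are implausible for the unit they claim to be in."""
    if unit not in PLAUSIBLE_FT4:
        raise ValueError(f"unsupported Free T4 unit: {unit!r}")
    low, high = PLAUSIBLE_FT4[unit]
    return low <= value <= high


print(is_plausible_ft4(1.0, "ng/dL"))   # plausible
print(is_plausible_ft4(1.0, "pmol/L"))  # likely a mislabeled ng/dL value
print(round(ft4_ng_dl_to_pmol_l(1.0), 2))
```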

4.2 Reproductive Hormones and Steroids

Mass spectrometry (LC-MS/MS) is becoming the gold standard for testosterone and estradiol due to the low accuracy of immunoassays at low concentrations (women/children). The database should include flags for LC-MS methods. 25

| Hormone | LOINC | Conventional Unit | SI Unit | Factor | Molecular Weight |
|---|---|---|---|---|---|
| Total Testosterone | 2986-8 | ng/dL | nmol/L | 0.0347 <sup>13</sup> | 288.4 g/mol <sup>26</sup> |
| Estradiol (E2) | 2243-4 | pg/mL | pmol/L | 3.67 <sup>13</sup> | 272.4 g/mol <sup>27</sup> |
| Progesterone | 2608-8 | ng/mL | nmol/L | 3.18 <sup>13</sup> | 314.5 g/mol <sup>28</sup> |
| FSH | 15067-2 | mIU/mL | IU/L | 1.0 | n/a (protein) |
| LH | 10501-5 | mIU/mL | IU/L | 1.0 | n/a (protein) |

  • Technical Note: Testosterone has a molecular weight of 288.4 g/mol. The factor is derived from: $10 (dL/L) / 288.4 \approx 0.0347$.
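The same derivation reproduces all three steroid factors in the table. In this sketch the only inputs are the molecular weights listed above and the volume scaling of the conventional unit (10 for per-dL units, 1000 for per-mL units); the names are illustrative.

```python
PER_DL_TO_PER_L = 10.0     # 1 dL = 0.1 L
PER_ML_TO_PER_L = 1000.0   # 1 mL = 0.001 L


def molar_factor(molar_mass_g_per_mol: float, volume_scale: float) -> float:
    """Factor from a mass concentration to the molar concentration with the same SI prefix.

    Example: ng/dL -> ng/L (x10), then ng/L divided by g/mol gives nmol/L.
    """
    return volume_scale / molar_mass_g_per_mol


testosterone = molar_factor(288.4, PER_DL_TO_PER_L)  # ng/dL -> nmol/L
estradiol = molar_factor(272.4, PER_ML_TO_PER_L)     # pg/mL -> pmol/L
progesterone = molar_factor(314.5, PER_ML_TO_PER_L)  # ng/mL -> nmol/L

print(round(testosterone, 4), round(estradiol, 2), round(progesterone, 2))
```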

4.3 Vitamins and Immunoglobulins

  • Vitamin D (25-OH): Measures the sum of D2 and D3.

    • LOINC: 62292-8.

    • Conversion: $ng/mL \times 2.496 = nmol/L$. 23

    • Regulation: Although supplement labels use IU (40 IU = 1 mcg), clinical laboratories report mass or molar concentrations. 29

  • Vitamin B12:

    • LOINC: 2132-9.

    • Conversion: $pg/mL \times 0.738 = pmol/L$. 24

  • Immunoglobulins (IgG, IgA, IgM):

    • They are usually reported in mg/dL in the US and g/L in the rest of the world.

    • Conversion: $mg/dL \times 0.01 = g/L$. 15

    • LOINCs: IgG (2465-3), IgA (2458-8), IgM (2472-9). 31


5. Tumor Markers and Cardiology

5.1 Oncology: Standardization of Markers

Tumor markers are complex proteins where the unit "U/mL" depends on the international reference standard (e.g., WHO standards).

| Marker | Clinical Use | LOINC | Unit | Source |
|---|---|---|---|---|
| PSA Total | Prostate | 2857-1 | ng/mL | 32 |
| CEA | Colon/General | 2039-6 | ng/mL | 33 |
| CA-125 | Ovary | 10334-1 | U/mL | 34 |
| CA 19-9 | Pancreas | 24108-3 | U/mL | 33 |
| CA 15-3 | Breast | 6875-9 | U/mL | 33 |
| AFP | Liver/Testis | 1834-1 | ng/mL | 35 |

  • Safety Alert: Alpha-fetoprotein (AFP) has a dual use. As a tumor marker, it is reported in ng/mL. As a prenatal screening tool (for neural tube defects), it is reported in MoM (Multiples of the Median). The database must strictly separate these contexts to avoid misdiagnosing cancer in pregnant women.

5.2 Cardiology: Troponins and Peptides

The transition to High Sensitivity Troponins (hs-cTn) requires a change in LOINC code and units.

  • Troponin I (High Sensitivity): LOINC 89579-7.

  • Unit: ng/L (nanograms per liter) is preferred so that results are reported as whole numbers (e.g., 14 ng/L) instead of error-prone decimals (0.014 ng/mL).

  • Natriuretic Peptides: It is vital to distinguish between BNP (LOINC 30934-4) and NT-proBNP (LOINC 33762-6). They are not interchangeable or directly mathematically convertible due to differences in half-life and renal clearance. 36
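Both rules in this section are simple to encode. The sketch below converts legacy troponin values to ng/L and blocks the mixing of BNP and NT-proBNP in one longitudinal series; the function names are hypothetical, and the LOINC codes are the ones listed above.

```python
def troponin_ng_ml_to_ng_l(value_ng_ml: float) -> int:
    """1 ng/mL = 1000 ng/L; high-sensitivity results are reported as whole numbers."""
    return round(value_ng_ml * 1000.0)


BNP_LOINC = "30934-4"
NT_PROBNP_LOINC = "33762-6"


def assert_same_peptide(series_loinc: str, result_loinc: str) -> None:
    """Refuse to append a result to a longitudinal series of the other peptide."""
    if {series_loinc, result_loinc} == {BNP_LOINC, NT_PROBNP_LOINC}:
        raise ValueError("BNP and NT-proBNP are not interchangeable")


print(troponin_ng_ml_to_ng_l(0.014))
assert_same_peptide(BNP_LOINC, BNP_LOINC)  # same analyte: accepted silently
```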


6. Therapeutic Drug Monitoring (TDM) and Trace Elements

6.1 Drugs with a Narrow Therapeutic Index

Monitoring requires precision about the matrix (whole blood vs. serum) and the sampling time (trough vs. peak).

  • Immunosuppressants (Cyclosporine, Tacrolimus, Sirolimus):

    • Matrix: Whole Blood, since these drugs are sequestered in erythrocytes. Analyzing serum would yield falsely low values (close to zero).

    • LOINC Tacrolimus: 7976-3. 38

    • LOINC Cyclosporine: 3520-4. 39

    • Tacrolimus conversion: $ng/mL \times 1.244 = nmol/L$.

  • Anticonvulsants:

    • Phenytoin: LOINC 3968-5. Range: 10-20 $\mu g/mL$. 40

      • Conversion: $\mu g/mL \times 3.96 = \mu mol/L$.

    • Valproic Acid: LOINC 4086-5. Range: 50-100 $\mu g/mL$.

      • Conversion: $\mu g/mL \times 6.93 = \mu mol/L$.

    • Carbamazepine: LOINC 3432-2.

      • Conversion: $\mu g/mL \times 4.23 = \mu mol/L$. 7
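The matrix requirement for immunosuppressants can be enforced at data entry. In the sketch below, the lookup `WHOLE_BLOOD_ONLY` and the function names are hypothetical; the LOINC codes and the tacrolimus factor are the ones listed above.

```python
WHOLE_BLOOD = "Whole Blood"

# LOINC codes as listed above; the lookup structure itself is illustrative.
WHOLE_BLOOD_ONLY = {
    "7976-3": "Tacrolimus",
    "3520-4": "Cyclosporine",
}

TACROLIMUS_NG_ML_TO_NMOL_L = 1.244  # factor quoted in the text


def validate_matrix(loinc: str, matrix: str) -> None:
    """Reject any specimen other than whole blood for erythrocyte-bound drugs."""
    drug = WHOLE_BLOOD_ONLY.get(loinc)
    if drug is not None and matrix != WHOLE_BLOOD:
        raise ValueError(f"{drug} must be measured in whole blood, got {matrix!r}")


def tacrolimus_to_nmol_l(value_ng_ml: float, matrix: str) -> float:
    validate_matrix("7976-3", matrix)
    return value_ng_ml * TACROLIMUS_NG_ML_TO_NMOL_L


print(round(tacrolimus_to_nmol_l(10.0, WHOLE_BLOOD), 2))
```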

6.2 Trace Elements and Contamination

For metals such as zinc, copper, and lead, the database should include pre-analytical instructions: "Royal blue-top tube, trace-metal free."

  • Lead: LOINC 5732-3 (Blood).

    • Conversion: $\mu g/dL \times 0.0483 = \mu mol/L$. 13

  • Zinc: LOINC 5763-8. Low levels may indicate nutritional deficiency or sepsis (acute phase response). 41


7. CDISC and FDA Regulatory Mapping

For the submission of data in clinical trials (FDA New Drug Applications), internal codes must be mapped to the CDISC Controlled Terminology standard. The LB (Laboratory) domain is the recipient of this data.

7.1 Mapping Structure

  • LBTESTCD: Short code (max 8 characters), e.g., GLUC.

  • LBTEST: Standardized full name, e.g., Glucose.

  • LBLOINC: The LOINC code used.

  • LBSTRESN: Standardized numerical result (usually in SI).

Example of a Master Record:

  • Test: White Blood Cell Count.

  • LOINC: 6690-2.

  • CDISC LBTESTCD: WBC.

  • CDISC LBTEST: White Blood Cells.

  • CDISC LBORRESU (Original Units): 10^3/uL.

  • CDISC LBSTRESU (Standard Units): 10^9/L.

This mapping ensures that data generated in a laboratory in Puebla (Laboratorios Ruiz) are semantically identical to those from a laboratory in Rochester (Mayo Clinic) when integrated into a multicenter study. 4
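A master record like the one above could be assembled as follows. This is a minimal sketch: the function `build_lb_record` and its parameters are hypothetical, the dictionary keys are the SDTM LB variables named in this section plus LBORRES (the original result as text), and the only rule enforced is the 8-character limit on LBTESTCD.

```python
from typing import Dict, Union


def build_lb_record(testcd: str, test: str, loinc: str, value: float,
                    orig_unit: str, factor: float,
                    std_unit: str) -> Dict[str, Union[str, float]]:
    """Assemble one LB-domain row from a result and its conversion factor."""
    if not 1 <= len(testcd) <= 8:
        raise ValueError("LBTESTCD must be 1 to 8 characters long")
    return {
        "LBTESTCD": testcd,
        "LBTEST": test,
        "LBLOINC": loinc,
        "LBORRES": str(value),                   # original result, kept as text
        "LBORRESU": orig_unit,
        "LBSTRESN": round(value * factor, 4),    # standardized numeric result
        "LBSTRESU": std_unit,
    }


wbc = build_lb_record("WBC", "White Blood Cells", "6690-2",
                      7.2, "10^3/uL", 1.0, "10^9/L")
print(wbc)
```

Keeping the original result and unit next to the standardized pair preserves traceability from the regulatory dataset back to the laboratory report.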


8. Implementation: Detailed Master Tables

The consolidated master tables for the main categories are presented below, integrating codes, aliases in Spanish, and validated conversion factors.

8.1 Master Table: Metabolic and Renal Panel

| Common Name | Spanish Alias (LATAM) | LOINC | Matrix | Conventional Unit | SI Unit | Factor (C → SI) | CDISC Code |
|---|---|---|---|---|---|---|---|
| Glucose | Glucose / Blood glucose | 2345-7 | Serum | mg/dL | mmol/L | 0.0555 | GLUC |
| Urea Nitrogen | BUN / Blood Urea Nitrogen | 3094-0 | Serum | mg/dL | mmol/L (Urea) | 0.357 | BUN |
| Creatinine | Creatinine | 2160-0 | Serum | mg/dL | umol/L | 88.4 | CREAT |
| Sodium | Sodium / Natremia | 2951-2 | Serum | mEq/L | mmol/L | 1.0 | SODIUM |
| Potassium | Potassium / Kalemia | 2823-3 | Serum | mEq/L | mmol/L | 1.0 | K |
| Chloride | Chlorine / Chloride | 2075-0 | Serum | mEq/L | mmol/L | 1.0 | CL |
| Calcium | Total Calcium | 17861-6 | Serum | mg/dL | mmol/L | 0.25 | CA |
| Albumin | Albumin | 1751-7 | Serum | g/dL | g/L | 10.0 | ALB |
| Total Protein | Total Proteins | 2885-2 | Serum | g/dL | g/L | 10.0 | PROT |
| Total Bilirubin | Total Bilirubin | 1975-2 | Serum | mg/dL | umol/L | 17.1 | BILI |
| ALP | Alkaline Phosphatase | 6768-6 | Serum | U/L | U/L | 1.0 | ALP |
| ALT | TGP / Alanine Aminot. | 1742-6 | Serum | U/L | U/L | 1.0 | ALT |
| AST | TGO / Aspartate Aminot. | 1920-8 | Serum | U/L | U/L | 1.0 | AST |

8.2 Master Table: Lipids and Cardiac

| Common Name | Spanish Alias | LOINC | Matrix | Conventional Unit | SI Unit | Factor | CDISC Code |
|---|---|---|---|---|---|---|---|
| Cholesterol | Total Cholesterol | 2093-3 | Serum | mg/dL | mmol/L | 0.0259 | CHOL |
| Triglycerides | Triglycerides | 2571-8 | Serum | mg/dL | mmol/L | 0.0113 | TRIG |
| HDL | HDL Cholesterol | 2085-9 | Serum | mg/dL | mmol/L | 0.0259 | HDL |
| LDL (Calc) | LDL Cholesterol | 13457-7 | Serum | mg/dL | mmol/L | 0.0259 | LDL |
| CK-MB | CPK-MB | 13969-1 | Serum | ng/mL | ug/L | 1.0 | CKMB |
| Troponin I | Troponin I (High Sensitivity) | 89579-7 | Serum | ng/L | ng/L | 1.0 | TROPIS |
| BNP | B Natriuretic Peptide | 30934-4 | Serum | pg/mL | ng/L | 1.0 | BNP |
| NT-proBNP | NT-proBNP | 33762-6 | Serum | pg/mL | ng/L | 1.0 | NTBNP |

8.3 Master Table: Hematology and Coagulation

| Common Name | Spanish Alias | LOINC | Matrix | Conventional Unit | SI Unit | Factor | CDISC Code |
|---|---|---|---|---|---|---|---|
| WBC | Leukocytes | 6690-2 | Blood | 10*3/uL | 10*9/L | 1.0 | WBC |
| RBC | Erythrocytes | 789-8 | Blood | 10*6/uL | 10*12/L | 1.0 | RBC |
| Hemoglobin | Hemoglobin | 718-7 | Blood | g/dL | g/L | 10.0 | HGB |
| Hematocrit | Hematocrit | 4544-3 | Blood | % | Fraction | 0.01 | HCT |
| Platelets | Platelets | 777-3 | Blood | 10*3/uL | 10*9/L | 1.0 | PLAT |
| Neutrophils | Neutrophils Abs | 751-8 | Blood | 10*3/uL | 10*9/L | 1.0 | NEUT |
| PT | Prothrombin Time | 5902-2 | Plasma | s | s | 1.0 | PT |
| INR | INR | 6301-6 | Plasma | {ratio} | {ratio} | 1.0 | INR |
| APTT | aPTT / Thromboplastin | 14979-9 | Plasma | s | s | 1.0 | APTT |
| Fibrinogen | Fibrinogen | 3255-7 | Plasma | mg/dL | g/L | 0.01 | FIB |
| D-Dimer FEU | D-dimer (FEU) | 48066-5 | Plasma | ng/mL | mg/L (FEU) | 0.001 | DDIMER |

9. Conclusion and Implementation Recommendations

This technical specification provides the necessary basis for building a laboratory information management system (LIMS) or clinical data repository (CDR) that is semantically robust and meets international standards.

Key Points for Implementation:

  1. Unit Validation: The system must reject data entry if the unit does not match the defined UCUM syntax. "uIU/mL" should not be allowed as free text; it must be coded as u[iU]/mL, which keeps the micro prefix.

  2. D-Dimer Segregation: It is imperative to configure distinct internal test codes for D-Dimer DDU and FEU to prevent errors in clinical interpretation in the emergency department.

  3. Therapeutic Mapping: For drugs such as Cyclosporine and Tacrolimus, the system should force the selection of "Whole Blood" as the matrix, rejecting "Serum" to avoid falsely low reports that could lead to iatrogenic overdose.

  4. Localization: Use the provided Spanish aliases (e.g., "Biometría Hemática") in the user interface, but maintain the LOINC and CDISC codes in the database layer to ensure data portability.

By adhering to this framework, healthcare organizations not only improve the quality of their internal data, but also gain the ability to participate in global research networks and meet the strictest regulatory requirements.