The process to which a normal cell progresses into a tumour cell has been identified as a series of mutations. Multiplying by each cell cycle, the tumour cells begin to exceed the normal cells and spread throughout the organ, eventually to other parts of the body. The complexity of cancer lies in its mutations, there is no one method of diagnosis, nor is there a specific gene damaged, many are impacted, different mechanisms are implicated and as a result, our treatment methodology will need to shift towards a personalized approach. An approach that considers each patient as a unique case.
I recently completed a quick python project that involved organizing The Wisconsin breast cancer data set, diagnostic measurements of 699 patients breast cancer cells, by tumour characteristics and lethality. I was able to identify correlations between variables and in turn determine the variables associated with malignant (lethal cancer) cases.
Results indicated that diagnostic measurements with a high score (indicative of increased mutations) were strongly correlated with a high variability in variables cell shape and size. Furthermore, an increased variability in mitosis (cell division) was also strongly correlated with malignancy in patients. These are general observations we have come to appreciate and understand through research, but it’s interesting to note that patterns seen in older data sets are still valid in the current diagnosis of breast cancer. If we are able to train computers (via supervised learning) with these older data sets to identify patterns and correlations, we can implement these algorithms into our current diagnosis practices. By inputting factors such as tumour size and shape, we can determine the lethality and subsequent form of treatment for each patient.
Figure: Wisconsin Breast Cancer Dataset. Measurements included the indicated variables, organizing results from a score of 1 (low mutations) – 10 (high mutations). Correlations were identified between a cells shape, size and mitosis rate. Cells that contained a high rate of mitosis were correlated with a high malignant score.
Researching previous datasets identifies patterns and trends, algorithms are based on a series of executable steps that takes the user from Point A to B and as such with the presence of historical data we are able to map out predictive algorithms/tests. Using databases allows us to develop algorithms capable of identifying trends and patterns we deem important, outputs designed to guide decision-making processes. With the increased amount of historical data and new data, it becomes imperative to utilize the data already available, organize it, determine correlations and implement it into a useful decision-making algorithm.