Excel Is To Blame for Major Types of Errors In 20% of Scientific Papers on Genes

As we all know, Excel is used to build large charts, apply conditional formatting, share data online, combine data from several sources and much more. But according to a new study, Excel’s default formatting settings are partly responsible for major errors in 20% of scientific papers that discuss genes.

According to the scientists, these errors come from Excel autocorrecting common gene names into dates or numbers. Once such an error creeps in, it is difficult to fix, and there is no way to disable the feature permanently.

Using a specialized program, the scientists scanned 35,175 Excel spreadsheets from 3,597 papers published in 19 different journals between 2005 and 2015. The program identified problems in 704 of these papers.

From this, they concluded that these gene-naming errors are caused by Excel’s default formatting: when used with its default settings, Microsoft Excel is known to convert gene names to dates and floating-point numbers.

Such errors typically occur when scientists enter large amounts of data into Excel, because any entry that looks like a date or a number is silently treated as one. It is much like texting someone while your phone keeps trying to autocorrect a particular word. Most of the time you notice it and fix the word manually before sending. But other times you hit send before correcting it, or without noticing it at all.

Scientists explained, “For example, gene symbols such as SEPT2 (Septin 2) and MARCH1 [Membrane-Associated Ring Finger (C3HC4) 1, E3 Ubiquitin Protein Ligase] are converted by default to ‘2-Sep’ and ‘1-Mar’, respectively.

“Furthermore, RIKEN identifiers were described to be automatically converted to floating-point numbers (i.e. from accession ‘2310009E13’ to ‘2.31E+13’). ... [W]e have uncovered further instances where gene symbols were converted to dates in supplementary data of recently published papers (e.g. ‘SEPT2’ converted to ‘2006/09/02’).”
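To give a feel for what such a scan involves, here is a minimal Python sketch that flags entries in a gene-name column that look like dates or floating-point numbers. It is not the researchers’ actual program: the function names and the three patterns are assumptions, chosen only to match the examples quoted above.

```python
import re

# Month abbreviations as Excel displays them in dates such as "2-Sep".
_MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Assumed patterns, modelled only on the examples quoted in the article.
_PATTERNS = (
    re.compile(rf"^\d{{1,2}}-({_MONTHS})$", re.IGNORECASE),  # e.g. 2-Sep, 1-Mar
    re.compile(r"^\d{4}/\d{2}/\d{2}$"),                      # e.g. 2006/09/02
    re.compile(r"^\d\.\d+E\+\d+$"),                          # e.g. 2.31E+13
)


def looks_converted(cell):
    """Return True if a gene-name cell looks like a date or a float."""
    value = cell.strip()
    return any(pattern.match(value) for pattern in _PATTERNS)


def flag_gene_column(cells):
    """Return the entries of a gene-name column that look converted."""
    return [cell for cell in cells if looks_converted(cell)]
```

Calling flag_gene_column(["SEPT2", "2-Sep", "MARCH1", "1-Mar"]) would return only the two date-like entries. A real screening tool would need to handle more date formats and locales than this sketch does.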

If scientists then try to change the cell’s formatting, Excel does not restore the name of the gene. It simply changes the format of the incorrectly autocorrected date.

Christopher Ingraham explained that if a researcher types “MARCH1”, Excel turns it into “1-MAR”. If the researcher then hits “undo” on the formatting, the cell shows “42430”, because that is how Excel stores the date internally.
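The number 42430 is a day count. A short Python sketch shows the arithmetic, assuming Excel’s default 1900 date system, in which serial numbers for modern dates can be treated as days elapsed since 30 December 1899. The helper names here are hypothetical.

```python
from datetime import date, timedelta

# Assumption: Excel's default "1900" date system. For dates after
# February 1900, a serial number equals the days since 1899-12-30.
EXCEL_EPOCH = date(1899, 12, 30)


def serial_to_date(serial):
    """Convert an Excel serial day number to a calendar date."""
    return EXCEL_EPOCH + timedelta(days=serial)


def date_to_serial(day):
    """Convert a calendar date to its Excel serial day number."""
    return (day - EXCEL_EPOCH).days


print(serial_to_date(42430))             # 2016-03-01
print(date_to_serial(date(2016, 3, 1)))  # 42430
```

Under that assumption, 42430 corresponds to 1 March 2016: Excel read “MARCH1” as the first of March in the then-current year, and the original text is no longer stored in the cell.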

The only way to avoid these errors completely is to remember to set the format of each column in a spreadsheet before typing anything into it.

The scientists hope that this study will raise awareness of these issues in Excel, so that researchers know their work could be undermined at the last moment by a simple software error.