Wednesday, 22 February 2012

New Mammal Species Discoveries Mini-Project

So as part of our credit for phyloinformatics, my friend Peggy and I set ourselves the task of finding ways to visualise the newly described mammal species listed in Global Trends and Biases in New Mammal Species Discoveries by D.M. Reeder, K.M. Helgen & D.E. Wilson (2007).

All the data is in PDF form and there's a lot of it (22 pages covering 341 extant mammal species), so our first priority is to extract all that juicy info and condense it into a more malleable form.
Peggy found a piece of software, 'Able2Extract', which is basically a PDF converter.
We are using the trial version, so in addition to a time limit some functions are restricted.

Here's an example of extracting some data from the PDF.
You simply highlight the data you want to capture, then choose which format you would like it extracted in, e.g. Excel, Word, PowerPoint, HTML or as an image.


For this one I exported it to Excel and it works pretty well.
However, since the paper's authors had to split the species names across two rows to save space, the Excel file does the same.

So now we have to combine the text from separate cells in order to have complete species names.

There seem to be a few different methods of doing this, but they assume that you will be merging the text cells into a new cell, rather than merging into the existing text cell,
i.e. B4 attenboroughi into B3 Zaglossus.
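
One way around the manual merging might be to do it outside Excel with a short script. What follows is only a rough Python sketch, not something we have actually run on the real data. It assumes the name column has been copied out as a simple list of cell values, and that a cell starting with a lowercase letter is always the second half of the name above it (genus capitalised, species epithet lowercase). The function name merge_split_names and the placeholder name in the example are made up for illustration.

```python
def merge_split_names(cells):
    """Join each lowercase continuation cell onto the capitalised cell before it.

    Assumption: a cell starting with a lowercase letter is the species
    epithet belonging to the genus in the previous non-empty cell.
    """
    merged = []
    for cell in cells:
        text = (cell or "").strip()
        if not text:
            continue  # ignore empty cells left over from the extraction
        if merged and text[0].islower():
            # Continuation row: append to the name we are building
            merged[-1] = merged[-1] + " " + text
        else:
            # A capitalised cell starts a new name
            merged.append(text)
    return merged


# "Examplegenus examplespecies" is a placeholder, not a real species
column_b = ["Zaglossus", "attenboroughi", "", "Examplegenus", "examplespecies"]
print(merge_split_names(column_b))
```

The catch is that this only handles the name column. If the other columns in the sheet need to stay lined up with the names, the leftover continuation rows would have to be dropped from them too.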


In a worst-case scenario you would simply go through the species names one by one, merging them manually, but speaking as someone who did a LOT of this for their undergrad I can't bear the thought of an evening's Ctrl+X, Ctrl+V session.

For the moment we will continue to extract the data and do some more background reading.

Also, my project partner has a shinier blog than mine, which she updates regularly, and she will be recording her progress on this project in much the same way as I am:
http://biobubblespeg.blogspot.com/

