I said yes to a request to help someone with some data analysis, for free.
It turned out the data was 17K files totalling 77GB, fairly 'dirty', and spread across a number of different delimited formats: tab-separated, comma-separated, etc.
Bloody mistake on my part. From now on, if I don't know the reliability of a dataset, I am not going to help someone out without looking into it first. I've burnt nearly 20 hours on it already over the last 5-6 weeks.
I didn't really have time to deal with this amount of data, and the requestor has deadlines.
Advice for the reader: don't volunteer without understanding the task ahead of you.
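
For what it's worth, "looking into it first" could have been as cheap as a quick survey script. Here is a minimal sketch of the idea, not what I actually ran: it walks a folder, adds up the file sizes, and guesses each file's delimiter from a small sample using Python's csv.Sniffer. The function names, the candidate delimiter list and the 64KB sample size are all my own assumptions, and the sniffer is only a heuristic, so treat the result as a rough picture.

```python
import csv
from collections import Counter
from pathlib import Path
from typing import Optional

# Assumed set of delimiters to test for; adjust to whatever the data might use.
CANDIDATES = ",\t;|"


def guess_delimiter(sample: str) -> Optional[str]:
    """Return the delimiter csv.Sniffer guesses for the sample, or None."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=CANDIDATES).delimiter
    except csv.Error:
        # The sniffer could not settle on a delimiter for this sample.
        return None


def survey(root, sample_chars=64 * 1024):
    """Count files per guessed delimiter under root and total their size in bytes."""
    counts = Counter()
    total_bytes = 0
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        total_bytes += path.stat().st_size
        # Read only the start of each file; undecodable bytes are replaced, not fatal.
        with path.open("r", encoding="utf-8", errors="replace", newline="") as fh:
            sample = fh.read(sample_chars)
        counts[guess_delimiter(sample)] += 1
    return counts, total_bytes
```

Calling survey("some_folder") (a placeholder path) gives back a tally of files per guessed delimiter plus the total size, which is enough to see the scale and the mix of formats before agreeing to anything.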
no subject
Date: 2012-02-05 07:21 pm (UTC)

no subject
Date: 2012-02-05 07:28 pm (UTC)
I strongly suggested studying SQL and learning how to handle data.

no subject
Date: 2012-02-05 07:30 pm (UTC)

no subject
Date: 2012-02-05 07:44 pm (UTC)

no subject
Date: 2012-02-06 12:14 pm (UTC)