Gah!

Feb. 5th, 2012 11:12 am
vlion
Said yes to a request to help do some data analysis for someone, for free.

It turned out the data was 17K files totaling 77 GB. It was fairly 'dirty', and it came in a number of different delimited formats: tab-separated, comma-separated, etc.

Bloody mistake on my part. From now on, if I don't know how reliable a dataset is, I'm not going to help someone out without looking into it first. I've already burnt nearly 20 hours on it in the last 5-6 weeks.

I didn't really have time to deal with this amount of data, and the requestor has deadlines.

Advice for the reader: don't volunteer without understanding the task ahead of you.

Date: 2012-02-05 07:21 pm (UTC)
From: foxfirefey
Oh lordy. That was very unfair of them! I'm sorry you ended up in such a pickle.

Date: 2012-02-05 07:30 pm (UTC)
From: foxfirefey
SQLite is very handy for these kinds of things, assuming it can scale that much.
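A minimal sketch of the SQLite approach suggested above, using only Python's standard-library `csv` and `sqlite3` modules. It assumes each file is plain text with a header row, that `csv.Sniffer` can usually guess the delimiter from the first chunk (falling back to a comma when it can't), and that files sharing a header layout can go into the same table. The helper names (`sniff_delimiter`, `column_names`, `load_file`), the candidate delimiter set, and the UTF-8-with-replacement decoding are hypothetical choices for illustration, not anything from the post.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical set of delimiters to consider when guessing a file's format.
CANDIDATE_DELIMITERS = ",\t;|"


def sniff_delimiter(path, sample_chars=64 * 1024):
    """Guess a file's delimiter from its first chunk; fall back to comma."""
    with open(path, newline="", encoding="utf-8", errors="replace") as f:
        sample = f.read(sample_chars)
    try:
        return csv.Sniffer().sniff(sample, delimiters=CANDIDATE_DELIMITERS).delimiter
    except csv.Error:
        return ","  # Sniffer gives up on ambiguous or ragged samples


def quote(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def column_names(header):
    """Make header cells usable as column names: fill blanks, de-duplicate."""
    names, seen = [], set()
    for i, raw in enumerate(header):
        name = raw.strip() or f"col_{i}"
        while name.lower() in seen:  # SQLite column names are case-insensitive
            name = f"{name}_{i}"
        seen.add(name.lower())
        names.append(name)
    return names


def load_file(conn, path, table, delimiter=None, batch_size=10_000):
    """Load one delimited file (header row assumed) into `table`; return rows inserted."""
    delim = delimiter or sniff_delimiter(path)
    with open(path, newline="", encoding="utf-8", errors="replace") as f:
        reader = csv.reader(f, delimiter=delim)
        header = next(reader, None)
        if not header:
            return 0
        names = column_names(header)
        width = len(names)
        # Every column is TEXT: type cleanup is deferred to later SQL queries.
        cols = ", ".join(quote(n) + " TEXT" for n in names)
        conn.execute(f"CREATE TABLE IF NOT EXISTS {quote(table)} ({cols})")
        sql = f"INSERT INTO {quote(table)} VALUES ({', '.join('?' * width)})"
        count, batch = 0, []
        for row in reader:
            if not any(cell.strip() for cell in row):
                continue  # skip blank lines
            batch.append((row + [None] * width)[:width])  # pad short rows, cut long ones
            if len(batch) >= batch_size:
                conn.executemany(sql, batch)
                count += len(batch)
                batch = []
        if batch:
            conn.executemany(sql, batch)
            count += len(batch)
    conn.commit()  # one commit per file rather than per row
    return count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        csv_path = Path(d) / "a.csv"
        tsv_path = Path(d) / "b.tsv"
        csv_path.write_text("id,name\n1,alice\n2,bob\n")
        tsv_path.write_text("id\tname\n3\tcarol\n\n4\tdave\n")
        conn = sqlite3.connect(":memory:")
        for p in (csv_path, tsv_path):
            print(p.name, load_file(conn, p, "people"))
        print(conn.execute('SELECT COUNT(*) FROM "people"').fetchone()[0])
```

Passing an explicit `delimiter` skips sniffing, which matters for messy files because `csv.Sniffer` tends to give up when rows have inconsistent field counts. Whether this holds up at 77 GB is exactly the open question in the comment; batching inserts and committing once per file at least avoids paying a transaction per row.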

Date: 2012-02-06 12:14 pm (UTC)
From: exor674
I am actually amazed at SQLite. My dataset is not even close to that large, but I use SQLite for the accounting datastore (what we've fetched, what we've still got to fetch) when I do an edges fetch, and I have been *SO* waiting for SQLite to just fall over. It hasn't.
