Automating Data Entry (SurveyCTO and Regression Tables) /u/bkbk57293 Python Education

I’m working as a research assistant gathering data on randomized experiments in preparation for a meta-analysis. I read a paper, then fill out a survey in SurveyCTO. I enter some basic background information, and then, for each treatment effect in each table of the paper, I answer a series of questions: what the outcome variable is, what its units are, what the statistical significance level of the estimate is, which variables are controlled for in the regression, and so on. Some papers have over 100 treatment effect estimates across various outcomes and time horizons.

I’ve used pdfplumber in the past to write tables automatically to a CSV, and I expect that a first step will be to collect all the treatment effect numbers so that I don’t have to scroll through the paper to find them for each question. The questions are phrased as “What is the treatment effect for outcome-time_horizon?”, so I would then need to match entries in the collection of treatment effects to each question based on “outcome” and “time_horizon.”

I see that Automate the Boring Stuff has a chapter on GUI automation, which I will look at, and I have some experience with Selenium, which I may need to use. But I assume other people have had to do rote data entry in SurveyCTO, so I wanted to ask whether there are standard approaches. The goal is not to automate the entire process, but to have some tools I can layer over the current process, both to speed it up and perhaps to convince my supervisors that more of it can be automated.
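A minimal sketch of the matching step, assuming the pdfplumber output has already been cleaned into a CSV with hypothetical columns `outcome`, `time_horizon`, and `estimate`. It also assumes significance is marked with the common star convention (* p<0.10, ** p<0.05, *** p<0.01); thresholds vary between papers, so check each table's notes. Standard errors, which usually sit in the row below each estimate, are not handled here. The function names are hypothetical, not part of pdfplumber or SurveyCTO.

```python
import csv
import io
import re

# Assumed star convention; verify against each paper's table notes.
STAR_THRESHOLDS = {0: None, 1: 0.10, 2: 0.05, 3: 0.01}

# A number (optionally negative) followed by zero to three stars.
CELL_RE = re.compile(r"^\s*(-?\d*\.?\d+)\s*(\*{0,3})\s*$")
# Hypothetical question wording, taken from the post.
QUESTION_RE = re.compile(r"treatment effect for (.+)\?\s*$", re.IGNORECASE)


def parse_cell(cell):
    """Split a cell like '0.123**' into (0.123, 0.05)."""
    # PDF extraction often yields the Unicode minus sign (U+2212).
    match = CELL_RE.match(cell.replace("\u2212", "-"))
    if match is None:
        raise ValueError(f"unrecognised cell: {cell!r}")
    return float(match.group(1)), STAR_THRESHOLDS[len(match.group(2))]


def load_effects(csv_file):
    """Index rows of the hypothetical CSV by (outcome, time_horizon)."""
    effects = {}
    for row in csv.DictReader(csv_file):
        key = (row["outcome"].strip().lower(), row["time_horizon"].strip().lower())
        effects[key] = parse_cell(row["estimate"])
    return effects


def answer(question, effects):
    """Return (estimate, p threshold) for a question, or None if unmatched."""
    match = QUESTION_RE.search(question)
    if match is None:
        return None
    # Split on the last hyphen so outcomes containing hyphens still work.
    outcome, _, horizon = match.group(1).rpartition("-")
    return effects.get((outcome.strip().lower(), horizon.strip().lower()))


if __name__ == "__main__":
    table = io.StringIO(
        "outcome,time_horizon,estimate\n"
        "consumption,12 months,0.084**\n"
        "consumption,24 months,\u22120.031\n"
        "self-employment,12 months,0.210***\n"
    )
    effects = load_effects(table)
    print(answer("What is the treatment effect for consumption-24 months?", effects))
    # (-0.031, None)
```

Keying the dictionary on a lowercased `(outcome, time_horizon)` tuple makes a missing or misspelled key come back as `None` rather than a wrong number, which is the safer failure mode when the answers are still checked by hand before entry.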

submitted by /u/bkbk57293

r/learnpython

