Apologies if this is in the wrong sub, but since the script is Python-based (specifically as a Jupyter Notebook), I thought it might be a good idea to start here.
I have a script that calls an API asynchronously, roughly 50 times, to create a single dataset stored in a Parquet file. This process repeats continuously and generates about 300 GB of data per month. One file takes roughly 5 minutes to create.
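In case it helps, the core of it looks roughly like this (a simplified sketch with a placeholder endpoint and parameters; the real API and record shape differ):

```python
import asyncio

import aiohttp
import pandas as pd

# Placeholder endpoint -- the real API, auth, and parameters differ.
API_URL = "https://api.example.com/data"


async def fetch(session: aiohttp.ClientSession, page: int) -> dict:
    # Each call fetches one slice of the dataset.
    async with session.get(API_URL, params={"page": page}) as resp:
        resp.raise_for_status()
        return await resp.json()


async def build_dataset(n_calls: int = 50) -> pd.DataFrame:
    # Fire all ~50 requests concurrently and collect the responses.
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(
            *(fetch(session, page) for page in range(n_calls))
        )
    return pd.DataFrame(results)


df = asyncio.run(build_dataset())
df.to_parquet("snapshot.parquet")  # one file per run, repeated continuously
```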
The process seems too intensive to run on a Pi 4; I tried that already. And I don't know enough beyond rote programming to clearly understand the options out there, apart from AWS and the like (which I think would break the bank given the data creation and storage volumes).
I want to move this entire process online but have no idea where to start or how to do it cost-effectively. At the moment, everything runs on my computer.
I may streamline this in the future, such as storing only changed, updated, or newly added records, but I haven't been able to think that far ahead yet.
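What I vaguely have in mind is something like hashing each record and only writing rows I haven't stored before (a rough sketch with made-up names, not something I've actually designed yet):

```python
import hashlib
import json
from pathlib import Path

import pandas as pd

# Hypothetical file that persists the hashes of records already stored.
SEEN_FILE = Path("seen_hashes.txt")


def record_hash(record: dict) -> str:
    # Stable hash of a record's contents, independent of key order.
    # Assumes records are JSON-serializable.
    payload = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def filter_new_records(records: list[dict]) -> pd.DataFrame:
    # Load the hashes of everything stored so far (empty on first run).
    seen = set(SEEN_FILE.read_text().split()) if SEEN_FILE.exists() else set()
    fresh, fresh_hashes = [], []
    for r in records:
        h = record_hash(r)
        if h not in seen:
            fresh.append(r)
            fresh_hashes.append(h)
            seen.add(h)  # also dedupes within this batch
    # Remember the new hashes so the next run skips unchanged rows.
    with SEEN_FILE.open("a") as f:
        f.writelines(h + "\n" for h in fresh_hashes)
    return pd.DataFrame(fresh)
```

If that worked, each run would only write the delta instead of a full snapshot, which should cut the 300 GB/month down considerably.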
Any suggestions would be welcomed.
submitted by /u/Tea_n00b