Need advice on how to split an .html file into smaller .txt files to use later with Azure TTS. /u/zlost666_ Python Education

I'm new to Python. I'm trying to write a script that I will later use with the Azure text-to-speech service, so I need to split a larger text into smaller parts. I have a book in HTML, and for testing I made a small fragment of it with two chapters. I use Beautiful Soup to parse the local HTML file.

A `while` loop iterates over the two chapters. For chapter 1, a `for` loop iterates over the characters in the chapter: the first 100 characters should go into ranobe_1.txt and the rest of the chapter into ranobe_2.txt. Then I want to do the same for chapter 2 and get ranobe_3.txt and ranobe_4.txt, so in this example the HTML should produce four text files.

For now I have hard-coded just two output files, ranobe_1.txt and ranobe_2.txt. I tried to increment the output file number, but haven't figured out how to do it properly yet.

Test fragment (example.html):

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>A Record of a Mortal's Journey to Immortality</title>
</head>
<body>
</section><section>
<title><p>Chapter 1. The Village by the Forest</p></title>
<p>“Second Fool” opened his eyes and stared at the mud and thatch roof over his head.</p>
</section><section>
<title><p>Chapter 2. Green Ox Village</p></title>
<p>Han Li’s home was said to be a small city, but it was actually just a large village called Green Ox Village.</p>
</section><section>
</body>
</html>
```

Current script:

```python
from bs4 import BeautifulSoup

# encoding="utf-8" so curly quotes such as “ ” and ’ are read correctly on every OS
file = open("example.html", encoding="utf-8")
soup = BeautifulSoup(file, 'html.parser')
text = soup.find_all("section")  # one <section> per chapter

chapter_number = 0
while chapter_number < 2:
    text_input = text[chapter_number].text
    # replace non-breaking spaces (assumed '\xa0') and '***' separators with plain spaces
    text_input = text_input.replace('\xa0', ' ').replace('***', ' ')
    count_letter = 0
    for letter in text_input:
        count_letter += 1
        if count_letter <= 100:
            # first 100 characters of the chapter (file number still hard-coded)
            new_file = open(f"downloaded_files/{'ranobe_' + str(1) + '.txt'}", "a", encoding="utf-8")
            new_file.write(letter)
            new_file.close()  # the call parentheses were missing here
        elif count_letter > 100:
            # rest of the chapter (file number still hard-coded)
            new_file = open(f"downloaded_files/{'ranobe_' + str(2) + '.txt'}", "a", encoding="utf-8")
            new_file.write(letter)
            new_file.close()
    chapter_number += 1
file.close()
```
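A minimal sketch of one way to number the output files, assuming the chapter texts have already been extracted into a list of strings: one counter that keeps running across all chapters gives ranobe_1.txt and ranobe_2.txt for chapter 1, then ranobe_3.txt and ranobe_4.txt for chapter 2. The helper names `split_chapter` and `write_numbered_parts` are hypothetical, and the sketch writes each part in one call in `"w"` mode instead of reopening the file in `"a"` mode for every character.

```python
from pathlib import Path


def split_chapter(text: str, head_size: int = 100) -> list[str]:
    """Split one chapter into its first `head_size` characters and the rest."""
    parts = [text[:head_size], text[head_size:]]
    # Drop the empty tail when a chapter is not longer than head_size,
    # so no empty .txt file is created and the numbering stays contiguous.
    return [part for part in parts if part]


def write_numbered_parts(chapters: list[str], out_dir: str = "downloaded_files",
                         prefix: str = "ranobe_", head_size: int = 100) -> list[Path]:
    """Write every part of every chapter to its own numbered .txt file.

    The file counter is NOT reset per chapter: chapter 1 fills ranobe_1.txt
    and ranobe_2.txt, chapter 2 continues with ranobe_3.txt and ranobe_4.txt.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    written = []
    file_number = 1
    for chapter in chapters:
        for part in split_chapter(chapter, head_size):
            target = out_path / f"{prefix}{file_number}.txt"
            # "w" mode via write_text: re-running the script overwrites
            # the file instead of appending a second copy of the text.
            target.write_text(part, encoding="utf-8")
            written.append(target)
            file_number += 1
    return written
```

With the fragment above, the list could come from Beautiful Soup as `[s.get_text() for s in soup.find_all("section") if s.get_text().strip()]`; the `strip()` check skips the empty trailing `<section>` that the fragment opens just before `</body>`. Calling `write_numbered_parts(chapters)` should then give four files in downloaded_files/, since both sample chapters are longer than 100 characters.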

submitted by /u/zlost666_

r/learnpython