User and Password Interface Pt. 2 /u/Take-The-L-Train Python Education

This is a continuation of a post I made earlier. I think I have a better grasp of while loops and if/else statements, but there are still a few kinks in my code.

The code is:

    print('Welcome to the USG Ishimura.')
    user = input('Please enter your user ID: ')
    print('Welcome, ', user)
    while True:
        x = input('Please enter your passcode: ')
        if x == '987413':
            if user == 'Isaac Clarke':
                print('Access granted.')
                break
        else:
            print('Access denied. Please try again.')
        if x == '987319':
            if user == 'Zachary Hammond':
                print('Access granted.')
                break
        else:
            print('Access denied. Please try again.')
        if x == '987665':
            if user == 'Kendra Daniels':
                print('Access granted.')
                break
        else:
            print('Access denied. Please try again.')

Everything’s working the way I want it to, but when I put in Zachary Hammond’s or Kendra Daniels’s user ID and passcode, the terminal prints ‘Access denied. Please try again.’ as well as ‘Access granted.’ Basically, both the if branch and the else branch run on the same attempt. How can I correct this so that only one of the two messages prints?
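An outside note (not part of the original post): as reconstructed above, each else is attached to the passcode check, so every non-matching passcode branch prints ‘Access denied’ before a later branch prints ‘Access granted’. One way to get exactly one message per attempt - a minimal sketch, not from the original thread, reusing the three user/passcode pairs from the post - is to store the credentials in a dict and make a single combined check:

    credentials = {
        'Isaac Clarke': '987413',
        'Zachary Hammond': '987319',
        'Kendra Daniels': '987665',
    }

    print('Welcome to the USG Ishimura.')
    user = input('Please enter your user ID: ')
    print('Welcome,', user)

    while True:
        passcode = input('Please enter your passcode: ')
        if credentials.get(user) == passcode:
            print('Access granted.')
            break
        print('Access denied. Please try again.')

Because only one if/else runs per attempt, exactly one of the two messages prints.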

submitted by /u/Take-The-L-Train

Learning Python best way /u/Flimsy_Gur7966 Python Education

Hi guys! I’m in love with Python. So far I’ve taken P4E and the Python Crash Course on Coursera, and I want to expand my knowledge much further. I saw a few courses on Udemy, “100 Days of Code”, “Python for Data Science and Machine Learning”, and “Python Bootcamp from Zero to Hero”, all three for around $68. I’m more of a guy who learns by typing and messing with the program. Do you think those three Udemy courses would be good, or should I invest somewhere else?

submitted by /u/Flimsy_Gur7966

I have an Intro to Programming Python exam soon. /u/Regaliceratops_888 Python Education

It will be a semester 1 exam and I’m stressing pretty hard about it, although it will be open-note. Any good study guides or resources online to help me prepare?

Are these Quizlet sets useful resources? https://quizlet.com/188060105/intro-to-python-final-exam-flash-cards/

https://quizlet.com/398381641/introduction-to-programming-final-exam-review-python-flash-cards/

submitted by /u/Regaliceratops_888

Why is my code for the 21 game not working? (The main problem is in the try/except statement, as far as I can tell.) /u/Own-Recipe5931 Python Education

    import random

    startup = random.randint(1, 3)
    print("My number is ", startup)
    check = startup + 3
    nexnum = int(input("Enter your number \n- "))

    def check_round():
        while nexnum > check:
            print("Your number is too high\n Please try again")
            nexnum = int(input("-"))

    while nexnum or startup >= 21:
        try:
            nexnum = int(nexnum)
            check_round()
        except:
            print("Please enter a whole number")
            nexnum = int(input("-"))
            continue
        func1 = nexnum + 3
        func2 = nexnum - 1
        startup = random.randint(func2, func1)
        nexnum = int(input("Enter your next number\n-"))
        continue
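An outside reading of the code above (not from the thread): three things stand out. First, `while nexnum or startup >= 21` parses as `nexnum or (startup >= 21)`, so the loop runs whenever nexnum is non-zero. Second, `check_round` assigns to nexnum, which makes nexnum a local variable inside that function, so the first `while nexnum > check` raises UnboundLocalError. Third, `int(input(...))` converts at input time, outside the try block, so the except clause never gets a chance to catch a non-numeric entry. A minimal sketch of one way to restructure the input handling (the game rules here are guessed from context):

    import random

    def read_number(prompt):
        # Re-prompt until the player types a whole number.
        while True:
            try:
                return int(input(prompt))
            except ValueError:
                print("Please enter a whole number")

    startup = random.randint(1, 3)
    print("My number is", startup)

    while startup < 21:
        nexnum = read_number("Enter your number\n- ")
        # Assumed rule: the player may only go 1-3 above my number.
        while not (startup < nexnum <= startup + 3):
            print("Your number is out of range\nPlease try again")
            nexnum = read_number("- ")
        if nexnum >= 21:
            print("You reached 21 - you lose!")
            break
        startup = random.randint(nexnum + 1, nexnum + 3)
        print("My number is", startup)
    else:
        print("I reached 21 - I win!")

Moving the int() conversion inside a helper with its own try/except is what lets the program recover from bad input instead of crashing.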

submitted by /u/Own-Recipe5931

Who here uses the following tech stack: FastAPI, HTMX, TailwindCSS, and SQLite. /u/WynActTroph Python Education

How is your project doing performance-wise? Is this tech stack suitable for a production-ready micro-SaaS product? Where did you deploy your project? What were the biggest hurdles you faced when developing?

submitted by /u/WynActTroph

[Reinforcement Learning] Policy iteration and value iteration taking a while to converge /u/gitgud_x Python Education

Hi, I’m studying reinforcement learning, closely following the textbook by Sutton and Barto (more so the tutorial videos by Mutual Information here).

I’ve tried to implement the simple example (from here) of an agent in a 4×4 gridworld. The agent can move within the grid (up/down/left/right), and the two opposite corner squares are terminal states. Each step accrues a reward of -1, so the goal is to reach a terminal state as quickly as possible. My code that gets this done is below.

    import numpy as np

    # 16 different states arranged in a 4x4 square grid
    # Two opposite corners are terminal states
    # The reward for every transition is always -1 (we want to minimize the number of steps)

    class GridWorld:
        '''
        The states of the grid are enumerated as:
            [[ 0,  1,  2,  3],
             [ 4,  5,  6,  7],
             [ 8,  9, 10, 11],
             [12, 13, 14, 15]]
        The agent has 4 potential actions:
        - 0: up
        - 1: right
        - 2: down
        - 3: left
        unless the action would move the agent off the grid, in which case the agent
        remains in place. The agent receives a reward of -1 at each time step until it
        reaches a terminal state. There are two terminal states in the grid - states 0 and 15.
        '''

        ACTIONS_DICT = {0: 'Up', 1: 'Right', 2: 'Down', 3: 'Left'}

        def __init__(self, size=4):
            self.size = size
            self.n_states = size * size
            self.n_actions = GridWorld.ACTIONS_DICT.__len__()
            # initialize a random policy - choose uniformly at random from the 4 actions in all states
            self.policy = np.ones(shape=[self.n_states, self.n_actions]) / self.n_actions  # self.policy[s, a] = π(a | s)
            self.gamma = 1  # discount factor - no discounting

        def get_state_from_s(self, s: int):
            ''' Given an enumerated state s, return the coordinates of the state in the grid. '''
            return (s // self.size, s % self.size)

        def get_action_from_a(self, a: int):
            ''' Given an enumerated action a, return the name of the action as a string. '''
            return GridWorld.ACTIONS_DICT.get(a)

        def state_transition(self, s: int, a: int) -> tuple[int, int]:
            '''
            Given the current state s and action a, return the next state s' and reward r.
            Samples from p(s', r | s, a).
            '''
            x, y = self.get_state_from_s(s)
            if s == 0 or s == 15:
                return s, 0
            if a == 0:
                x = max(0, x - 1)
            elif a == 1:
                y = min(self.size - 1, y + 1)
            elif a == 2:
                x = min(self.size - 1, x + 1)
            elif a == 3:
                y = max(0, y - 1)
            s_prime = x * self.size + y
            return s_prime, -1

        def environment_model(self, s_new: int, r: int, s: int, a: int):
            # Deterministic environment
            # Returns the value of p(s', r | s, a).
            if self.state_transition(s, a) == (s_new, r):
                return 1
            else:
                return 0

        def policy_evaluation(self, num_sweeps: int = 0) -> np.ndarray:
            '''
            Apply the Bellman equations for V and Q to estimate the value function
            at the current policy self.policy.

            ### Arguments

            #### Optional
            - `num_sweeps` (int, default = 0): number of iterations to run the policy
              evaluation for. If 0, run until convergence.

            ### Returns
            - `np.ndarray`: estimated value function V, where `V[s]` = V(s)
            '''
            # initialize the state value function randomly
            self.V = np.random.random(self.n_states)  # self.V[s] = V(s), the state value function
            self.Q = np.random.random(size=[self.n_states, self.n_actions])  # self.Q[s, a] = Q(s, a), the state-action value function
            self.V[0] = self.V[15] = 0  # set the value of the terminal states to 0
            self.Q[0, :] = self.Q[15, :] = 0  # set the value of the terminal states to 0
            sweep = 0
            V_new = np.zeros(self.n_states)
            Q_new = np.zeros([self.n_states, self.n_actions])
            while True:
                for s in range(self.n_states):
                    if s == 0 or s == 15:
                        pass  # terminal states always have V(s) = 0
                    else:
                        V_new[s] = sum(self.policy[s, a] *
                                       sum(self.environment_model(s_prime, -1, s, a) * (-1 + self.gamma * self.V[s_prime])
                                           for s_prime in range(self.n_states))
                                       for a in range(self.n_actions))
                        for a in range(self.n_actions):
                            Q_new[s, a] = sum(self.environment_model(s_prime, -1, s, a) * (
                                -1 + self.gamma * sum(self.policy[s_prime, a_prime] * self.Q[s_prime, a_prime]
                                                      for a_prime in range(self.n_actions)))
                                for s_prime in range(self.n_states))
                sweep += 1
                if (np.allclose(self.V, V_new) and np.allclose(self.Q, Q_new)) or sweep == num_sweeps:
                    self.V = V_new
                    self.Q = Q_new
                    break
                else:
                    self.V = V_new
                    self.Q = Q_new

        def policy_improvement(self):
            '''
            Update the policy to be greedy with respect to q(s, a).
            The new policy is deterministic rather than stochastic.
            '''
            new_policy = np.zeros_like(self.policy)
            for s in range(self.n_states):
                a_opt = np.argmax(self.Q[s, :])
                new_policy[s, a_opt] = 1
            self.policy = new_policy

        def policy_iteration(self):
            ''' Perform policy iteration to find the optimal policy. '''
            i = 0
            new_policy = self.policy
            while True:
                self.policy_evaluation()  # until convergence
                self.policy_improvement()
                if (self.policy == new_policy).all():
                    break
                else:
                    new_policy = self.policy
                    i += 1
            print(f'Converged after {i} iterations.')

        def value_iteration(self):
            ''' Perform value iteration to find the optimal policy. '''
            i = 0
            new_policy = self.policy
            while True:
                self.policy_evaluation(num_sweeps=1)
                self.policy_improvement()
                if (self.policy == new_policy).all():
                    break
                else:
                    new_policy = self.policy
                    i += 1
            print(f'Converged after {i} iterations.')

Both policy iteration and value iteration converge to the optimal solution. However, it takes a long time.

    grid = GridWorld()
    grid.value_iteration()  # or use: grid.policy_iteration()
    print(grid.policy)

I would think for this very simple situation, it should be a lot faster.

Policy iteration takes about 200 iterations (a few seconds). Value iteration takes about 10,000 iterations (around a minute).

Have I done something to make this code really inefficient? I’m not doing anything fancy here. Thanks for any advice!
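An outside observation (mine, not from the thread): two things likely dominate the runtime. First, policy_evaluation re-initializes self.V and self.Q to random values on every call, so value_iteration(num_sweeps=1) throws away its progress at each outer step instead of refining the previous estimate. Second, environment_model is probed for every (s', s, a) triple inside nested Python loops, costing on the order of |S|²·|A| calls per sweep. A minimal sketch of a faster evaluation, assuming the GridWorld class above: precompute the deterministic transition table once, then vectorize the Bellman backup with NumPy.

    import numpy as np

    def precompute_transitions(grid):
        # next_state[s, a] = the state reached from s by action a (deterministic world)
        next_state = np.zeros((grid.n_states, grid.n_actions), dtype=int)
        for s in range(grid.n_states):
            for a in range(grid.n_actions):
                next_state[s, a], _ = grid.state_transition(s, a)
        return next_state

    def evaluate_policy_fast(grid, next_state, tol=1e-8):
        # Vectorized iterative policy evaluation for grid.policy.
        V = np.zeros(grid.n_states)
        terminal = np.array([0, 15])
        while True:
            # Q[s, a] = -1 + gamma * V[s'] for all (s, a) at once
            Q = -1 + grid.gamma * V[next_state]
            Q[terminal, :] = 0  # terminal states have value 0
            V_new = (grid.policy * Q).sum(axis=1)
            if np.max(np.abs(V_new - V)) < tol:
                return V_new, Q
            V = V_new

Usage would look like: grid = GridWorld(); ns = precompute_transitions(grid); V, Q = evaluate_policy_fast(grid, ns). The greedy improvement step can then be np.argmax(Q, axis=1) over the returned Q, as in policy_improvement.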

submitted by /u/gitgud_x

Why does this happen? /u/Influence-Various Python Education

I am doing exercises for my Programming Fundamentals class, and this code is intended to check whether a string contains any lowercase letters, but something is wrong.

Here is the code:

    def any_lowercase3(s):
        for c in s:
            flag = c.islower()
        return flag

I understand that it is wrong because this will only check the last letter of the string and return that, but I don’t understand why.
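A short illustration (not part of the original post): the assignment inside the loop overwrites flag on every pass, so when the loop ends, flag only holds the result for the final character. For example, any_lowercase3('aB') returns False even though 'a' is lowercase, because the last thing assigned was 'B'.islower(). A version that returns as soon as it finds a lowercase letter avoids this:

    def any_lowercase_fixed(s):
        # Return True at the first lowercase character; False if none is found.
        for c in s:
            if c.islower():
                return True
        return False

    print(any_lowercase_fixed('aB'))  # True
    print(any_lowercase_fixed('AB'))  # False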

This class is my first experience in Python and coding as a whole since the HTML MySpace days, so I am struggling a bit.

submitted by /u/Influence-Various

Moving to previous inputs /u/Less-Ad1160 Python Education

What if I wanted to make an input jump back to an earlier part of my code? Like, you go north (loop/input one), then in loop/input two you go south and end up back at input/loop one. Would there be some way to do that?
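One common pattern (a sketch with invented room names, not from the original post) is to keep the current location in a variable and run one outer loop; "going back" is then just assigning the earlier location again:

    location = 'start'
    while location != 'quit':
        if location == 'start':
            move = input('You are at the start. Go (n)orth or (q)uit: ')
            if move == 'n':
                location = 'north'
            elif move == 'q':
                location = 'quit'
        elif location == 'north':
            move = input('You are up north. Go (s)outh or (q)uit: ')
            if move == 's':
                location = 'start'  # back to the first input
            elif move == 'q':
                location = 'quit'

Functions work too: put each area's input handling in its own function and call them from the loop.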

submitted by /u/Less-Ad1160