
BaseScorer

class BaseScorer(ABC):

score

def score(self, reference_prompt: str, candidate_prompt: str):
Compares a candidate prompt against a reference prompt and returns a tuple of scores, for example (diversity, difficulty, feasibility). Higher scores are better.
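The interface above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `BaseScorer` class is a local stand-in that mirrors the documented signature rather than the library class, and `WordOverlapScorer`, with its two metrics, is a hypothetical toy subclass.

```python
from abc import ABC, abstractmethod
from typing import Tuple


class BaseScorer(ABC):
    """Local stand-in mirroring the documented interface (not the library class)."""

    @abstractmethod
    def score(self, reference_prompt: str, candidate_prompt: str) -> Tuple[float, ...]:
        """Return a tuple of scores; higher is better."""


class WordOverlapScorer(BaseScorer):
    """Hypothetical toy scorer that returns (diversity, length_gain)."""

    def score(self, reference_prompt: str, candidate_prompt: str) -> Tuple[float, float]:
        ref_words = set(reference_prompt.lower().split())
        cand_words = set(candidate_prompt.lower().split())
        union = ref_words | cand_words
        # Diversity: Jaccard distance between the two word sets.
        diversity = 1.0 - len(ref_words & cand_words) / len(union) if union else 0.0
        # Length gain: extra words in the candidate, floored at zero.
        length_gain = float(
            max(0, len(candidate_prompt.split()) - len(reference_prompt.split()))
        )
        return diversity, length_gain


scorer = WordOverlapScorer()
scores = scorer.score(
    "Add two numbers",
    "Add two numbers and then square the result",
)
print(scores)
```

Both metrics follow the "higher is better" convention, so a caller can compare tuples from different candidates without knowing how each score was computed.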

MathScorer

class MathScorer(BaseScorer):

__init__

def __init__(self, agent: Optional[ChatAgent] = None):

score

def score(self, reference_problem: str, new_problem: str):
Evaluates the new math problem relative to the reference math problem.
Parameters:
  • reference_problem (str): The reference math problem.
  • new_problem (str): The new or evolved math problem.
Returns: Dict[str, int]: A dictionary with scores for diversity, difficulty, validity, and solvability.
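One way to consume the returned dictionary is to filter evolved problems against minimum scores. The sketch below does not call `MathScorer`; the key names are assumed to match the four scores listed above, and `passes_thresholds`, the sample values, and the thresholds are hypothetical.

```python
from typing import Dict


def passes_thresholds(scores: Dict[str, int], minimums: Dict[str, int]) -> bool:
    """Return True when every required score meets its minimum.

    A score missing from `scores` is treated as 0.
    """
    return all(scores.get(name, 0) >= floor for name, floor in minimums.items())


# Hypothetical output shaped like the documented return value; values are made up.
example_scores = {"diversity": 4, "difficulty": 3, "validity": 1, "solvability": 1}

# Hypothetical acceptance thresholds chosen for this example only.
minimums = {"validity": 1, "solvability": 1, "difficulty": 2}

keep = passes_thresholds(example_scores, minimums)
print(keep)
```

Treating a missing key as 0 makes the filter conservative: a problem is kept only when every required score is present and high enough.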

GeneralScorer

class GeneralScorer(BaseScorer):

__init__

def __init__(self, agent: Optional[ChatAgent] = None):

score

def score(self, reference_problem: str, new_problem: str):
Evaluates the new problem against the reference problem using structured scoring.
Parameters:
  • reference_problem (str): The original problem.
  • new_problem (str): The evolved or new problem.
Returns: Dict[str, int]: A dictionary with scores for diversity, complexity, and validity.
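When several evolved problems are scored against the same reference, the dictionaries can be used to rank them. The following is a sketch under stated assumptions: it does not call `GeneralScorer`, the key names are assumed to match the three scores listed above, and `rank_candidates` with its ordering rule (valid problems first, then by diversity plus complexity) is a hypothetical policy, not the library's behavior.

```python
from typing import Dict, List, Tuple


def rank_candidates(scored: List[Tuple[str, Dict[str, int]]]) -> List[str]:
    """Order problems best-first: valid ones first, then by diversity + complexity."""

    def sort_key(item: Tuple[str, Dict[str, int]]) -> Tuple[bool, int]:
        _, scores = item
        is_valid = scores.get("validity", 0) > 0
        quality = scores.get("diversity", 0) + scores.get("complexity", 0)
        return (is_valid, quality)

    return [problem for problem, _ in sorted(scored, key=sort_key, reverse=True)]


# Hypothetical (problem, scores) pairs; the values are made up for illustration.
scored = [
    ("A", {"diversity": 2, "complexity": 3, "validity": 1}),
    ("B", {"diversity": 5, "complexity": 5, "validity": 0}),
    ("C", {"diversity": 4, "complexity": 4, "validity": 1}),
]

ranked = rank_candidates(scored)
print(ranked)
```

Putting validity first in the sort key keeps an invalid problem from outranking a valid one, however diverse or complex it is.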