Faithfulness

The faithfulness metric measures the quality of your RAG pipeline's generator by evaluating whether the actual_output factually aligns with the contents of your retrieval_context. deepeval's faithfulness metric is a self-explaining LLM-Eval, meaning it outputs a reason for its metric score.

info

Although similar to the HallucinationMetric, the faithfulness metric in deepeval is more concerned with hallucination in RAG pipelines, rather than the actual LLM itself.

Required Arguments

To use the FaithfulnessMetric, you'll have to provide the following arguments when creating an LLMTestCase:

  • input
  • actual_output
  • retrieval_context

Example

from deepeval import evaluate
from deepeval.metrics import FaithfulnessMetric
from deepeval.test_case import LLMTestCase

# Replace this with the actual output from your LLM application
actual_output = "We offer a 30-day full refund at no extra cost."

# Replace this with the actual retrieved context from your RAG pipeline
retrieval_context = ["All customers are eligible for a 30 day full refund at no extra cost."]

metric = FaithfulnessMetric(
    threshold=0.7,
    model="gpt-4",
    include_reason=True
)
test_case = LLMTestCase(
    input="What if these shoes don't fit?",
    actual_output=actual_output,
    retrieval_context=retrieval_context
)

metric.measure(test_case)
print(metric.score)
print(metric.reason)

# or evaluate test cases in bulk
evaluate([test_case], [metric])

There are three optional parameters when creating a FaithfulnessMetric:

  • [Optional] threshold: a float representing the minimum passing threshold, defaulted to 0.5.
  • [Optional] model: a string specifying which of OpenAI's GPT models to use, OR any one of langchain's chat models of type BaseChatModel. Defaulted to 'gpt-4-1106-preview'. See the sketch after this list for passing a custom langchain model.
  • [Optional] include_reason: a boolean which when set to True, will include a reason for its evaluation score. Defaulted to True.
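
Since model accepts any langchain chat model of type BaseChatModel, you can pass in a model instance instead of a model name string. Below is a minimal sketch assuming langchain's ChatOpenAI (the exact import path and constructor arguments depend on your langchain version):

from langchain.chat_models import ChatOpenAI  # import path may differ in newer langchain versions
from deepeval.metrics import FaithfulnessMetric

# Assumption: any BaseChatModel instance can be passed directly as `model`
custom_model = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

metric = FaithfulnessMetric(
    threshold=0.5,
    model=custom_model,
    include_reason=False
)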

You can also choose to fall back to Ragas' faithfulness metric (which has a similar implementation). Note, however, that it is not capable of generating a reason.

from deepeval.metrics import RAGASFaithfulnessMetric
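
Assuming the Ragas fallback follows the same metric interface as FaithfulnessMetric (this is a sketch, not a guaranteed API), usage would look like:

# Reuse the test_case defined in the example above
ragas_metric = RAGASFaithfulnessMetric(threshold=0.5)
ragas_metric.measure(test_case)
print(ragas_metric.score)  # no reason is available for this metric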