# LangChain Quickstart with OpenAI

Reference: [https://python.langchain.com/docs/get_started/quickstart](https://python.langchain.com/docs/get_started/quickstart)

```{admonition} What you will learn
* Querying OpenAI's ChatGPT
* Defining prompts to be passed to LLMs
* Formatting the answers from LLMs
* Building LangChain pipelines
* Implementing a simple Retrieval Augmented Generation (RAG) approach
    * Loading text from a webpage, segmenting the text into chunks and transforming the text chunks into vectors (embeddings)
    * Querying a vector database (FAISS) and retrieving relevant documents
    * Passing the documents that are relevant for the query as context to OpenAI's ChatGPT
```

## Basic LLM Usage for Question Answering
### Most Basic Approach

```python
#!pip install langchain-openai
```

```python
# Sets OPENAI_API_KEY for the current kernel session only (not permanently). Note: write the key without " "
#%env OPENAI_API_KEY=sk-...rZt
```

```python
import os
import openai
from langchain_openai import ChatOpenAI
```

```python
# Optional: ChatOpenAI also reads OPENAI_API_KEY from the environment by itself
openai.api_key = os.environ["OPENAI_API_KEY"]
#openai.api_key
```
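Indexing `os.environ["OPENAI_API_KEY"]` fails with a bare `KeyError` if the key has not been set. A small defensive helper can give a clearer message before `ChatOpenAI()` is created. This is a sketch; `require_env` is a hypothetical name, not part of LangChain or the `openai` package.

```python
import os


def require_env(name: str) -> str:
    """Return the (stripped) value of environment variable `name`, or fail clearly.

    Hypothetical helper for this notebook; not part of LangChain or openai.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Set it (e.g. with %env in Jupyter) "
            "before creating ChatOpenAI()."
        )
    return value


# Usage in the notebook (assumes the key was set as shown above):
# openai.api_key = require_env("OPENAI_API_KEY")
```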

```python
llm = ChatOpenAI()
```

```python
llm.invoke("how can langsmith help with testing?")
```

AIMessage(content='Langsmith can help with testing in the following ways:\n\n1. Automated testing: Langsmith can be used to write scripts and test cases for automated testing of software applications. This can help in quickly and efficiently testing the functionality of the software.\n\n2. Test data generation: Langsmith can be used to generate test data for different scenarios, allowing testers to validate the behavior of the software under various conditions.\n\n3. Performance testing: Langsmith can be used to write scripts for performance testing of software applications, helping to identify and resolve performance issues.\n\n4. Integration testing: Langsmith can be used to write scripts for testing the integration of different components or systems, ensuring that they work together as expected.\n\n5. Regression testing: Langsmith can be used to automate regression testing, ensuring that new code changes do not introduce any new bugs or issues in the software.\n\nOverall, Langsmith 

### Create a Simple Pipeline
In contrast to the *most basic approach* above, we now add a system prompt, using LangChain's `ChatPromptTemplate` class. Moreover, the `StrOutputParser` class converts the LLM's answer, which is returned as a chat message object (`AIMessage`), into a plain string that is easier to read and print.

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# The system message sets the LLM's role; {input} is filled in when the prompt is invoked
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are world class technical documentation writer."),
    ("user", "{input}")
])
```
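Conceptually, the template is a list of role/text pairs whose placeholders are filled in at call time. The plain-Python sketch below mimics that step; `TEMPLATE` and `format_messages` are hypothetical names, not LangChain code. With the real object, `prompt.format_messages(input=...)` should return the corresponding LangChain message objects.

```python
# Hypothetical stand-in for the ChatPromptTemplate defined above:
# a list of (role, template text) pairs containing {placeholders}.
TEMPLATE = [
    ("system", "You are world class technical documentation writer."),
    ("user", "{input}"),
]


def format_messages(template: list[tuple[str, str]], **variables: str) -> list[tuple[str, str]]:
    """Fill every {placeholder} and return the resulting (role, text) messages."""
    return [(role, text.format(**variables)) for role, text in template]


messages = format_messages(TEMPLATE, input="how can langsmith help with testing?")
for role, text in messages:
    print(f"{role}: {text}")
# system: You are world class technical documentation writer.
# user: how can langsmith help with testing?
```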

```python
output_parser = StrOutputParser()
```

Next, we create a pipeline that consists of the prompt, the LLM and the output parser:

```python
chain = prompt | llm | output_parser
```
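The `|` operator (LangChain Expression Language) feeds the output of each component into the next one: the prompt turns the input dictionary into messages, the LLM turns the messages into an `AIMessage`, and the parser extracts the message text. The offline sketch below imitates only this data flow with a hypothetical `Runnable` class and fake components; LangChain's actual runnables additionally support streaming, batching and async calls.

```python
from dataclasses import dataclass
from typing import Any, Callable


class Runnable:
    """Hypothetical minimal pipeline step: wraps a function and supports `|`."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: "Runnable") -> "Runnable":
        # `a | b` builds a new step that feeds a's output into b.
        return Runnable(lambda value: other.invoke(self.invoke(value)))


@dataclass
class FakeAIMessage:
    """Stand-in for the message object a chat model returns."""
    content: str


# Fake components: no API call is made.
fake_prompt = Runnable(lambda d: [
    ("system", "You are world class technical documentation writer."),
    ("user", d["input"]),
])
fake_llm = Runnable(lambda messages: FakeAIMessage(content=f"Echo: {messages[-1][1]}"))
fake_parser = Runnable(lambda message: message.content)  # like StrOutputParser: message -> str

fake_chain = fake_prompt | fake_llm | fake_parser
print(fake_chain.invoke({"input": "how can langsmith help with testing?"}))
# Echo: how can langsmith help with testing?
```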

### Query

The `invoke()` method is now called on the pipeline object:

```python
print(chain.invoke({"input": "how can langsmith help with testing?"}))
```

Langsmith can help with testing in a variety of ways, including:

1. Test Automation: Langsmith can be used to automate testing processes, such as unit testing, integration testing, and end-to-end testing. By writing test scripts in Langsmith, you can ensure that your code is thoroughly tested and free of bugs.

2. Performance Testing: Langsmith can also be used for performance testing, such as load testing and stress testing. By simulating large numbers of users or heavy traffic on your application, you can identify performance bottlenecks and optimize your code accordingly.

3. Data Generation: Langsmith can be used to generate test data for your application. By creating realistic data sets with Langsmith, you can ensure that your tests are comprehensive and cover a wide range of scenarios.

4. Integration Testing: Langsmith can help with integration testing by simulating interactions between different components of your application. By writing integration tests in Langsmith, you can

## Basic RAG Usage
In the previous subsection, we showed how an LLM can be applied for question answering. Now we apply Retrieval Augmented Generation (RAG) for question answering. A RAG system also integrates an LLM, but in contrast to the basic usage described above, additional context information is passed to the LLM. The LLM's answer then depends not only on the data on which the LLM has been trained, but also on external knowledge from documents provided by the user. This helps when the LLM knows little about a topic: the answers above describe LangSmith as if it were a scripting tool for software testing, whereas the LangSmith documentation loaded below describes it as a platform for LLM application development, monitoring, and testing. The external knowledge is passed to the LLM as context, together with the query. Which parts of it are useful as context depends on the user's query. Therefore, the query is first passed to a vector database, which returns the documents most relevant to the query. These relevant documents are then used as context.

Below, we

1. Collect external documents from the web
2. Segment these documents into chunks
3. Calculate an embedding (a vector) for each chunk
4. Store the chunk-embeddings in a vector DB.

### Collect Documents for External Database

With LangChain's `WebBaseLoader` class, the content of one or several webpages can be downloaded, as shown in the code cell below. To download multiple pages, pass the corresponding URLs as a list.

```python
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://docs.smith.langchain.com/user_guide", encoding="utf-8")

docs = loader.load()
```

USER_AGENT environment variable not set, consider setting it to identify your requests.
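The warning above refers to the `USER_AGENT` environment variable, which identifies the HTTP requests sent by the loader. Setting it, e.g. with `os.environ["USER_AGENT"] = "..."`, presumably has to happen before `langchain_community.document_loaders` is imported to silence the warning. The standard-library sketch below only illustrates the idea of one request per URL in a list, each carrying an identifying User-Agent header; it builds the requests without sending them. The User-Agent value and the helper name `build_requests` are hypothetical.

```python
import os
from urllib.request import Request

# Hypothetical identifier; any string describing your client will do.
os.environ.setdefault("USER_AGENT", "langchain-quickstart-notebook/0.1")


def build_requests(urls: list[str]) -> list[Request]:
    """Create one GET request per URL, each with a User-Agent header (nothing is sent)."""
    headers = {"User-Agent": os.environ["USER_AGENT"]}
    return [Request(url, headers=headers) for url in urls]


reqs = build_requests([
    "https://docs.smith.langchain.com/user_guide",
    "https://python.langchain.com/docs/get_started/quickstart",
])
for req in reqs:
    # urllib normalizes the header name to "User-agent".
    print(req.full_url, req.get_header("User-agent"))
```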


The returned list contains only one document, since we loaded only a single page:

```python
len(docs)
```

1

For each downloaded page we can now access the `page_content` and the page's `metadata` as shown below:

```python
print(docs[0].page_content)
```

LangSmith User Guide | 🦜️🛠️ LangSmith

Skip to main contentGo to API DocsSearchRegionUSEUGo to AppQuick StartUser GuideTracingEvaluationProduction Monitoring & AutomationsPrompt HubProxyPricingSelf-HostingCookbookThis is outdated documentation for 🦜️🛠️ LangSmith, which is no longer actively maintained.For up-to-date documentation, see the latest version.User GuideOn this pageLangSmith User GuideLangSmith is a platform for LLM application development, monitoring, and testing. In this guide, we’ll highlight the breadth of workflows LangSmith supports and how they fit into each stage of the application development lifecycle. We hope this will inform users how to best utilize this powerful platform or give them something to consider if they’re just starting their journey.PrototypingPrototyping LLM applications often involves quick experimentation between prompts, model types, retrieval strategy and other parameters.
The ability to rapidly understand ho

```python
docs[0].metadata
```

{'source': 'https://docs.smith.langchain.com/user_guide',
 'title': 'LangSmith User Guide | 🦜️🛠️ LangSmith',
 'description': 'LangSmith is a platform for LLM application development, monitoring, and testing. In this guide, we’ll highlight the breadth of workflows LangSmith supports and how they fit into each stage of the application development lifecycle. We hope this will inform users how to best utilize this powerful platform or give them something to consider if they’re just starting their journey.',
 'language': 'en'}

### Chunking

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter
# Split into chunks of up to 2000 characters, without overlap between neighbouring chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=0)
chunks = text_splitter.split_documents(docs)
```
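`RecursiveCharacterTextSplitter` first tries to split on coarse separators such as blank lines and falls back to finer ones (single newlines, spaces) only for pieces that are still longer than `chunk_size`; neighbouring small pieces are merged as long as they fit. This is likely why the first chunk displayed further down holds only the page title: it is separated from the long page body by blank lines, and the body is too long to be merged with it. The sketch below is a simplified, hypothetical re-implementation of that idea (`recursive_split` is not LangChain code; it ignores `chunk_overlap`, which is 0 here, and differs in details such as separator handling).

```python
def recursive_split(
    text: str,
    chunk_size: int,
    separators: tuple[str, ...] = ("\n\n", "\n", " "),
) -> list[str]:
    """Simplified, hypothetical version of recursive character splitting.

    Splits on the first separator, merges neighbouring pieces while they fit
    into chunk_size, and recurses with finer separators on pieces that are too long.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No separator left: fall back to hard cuts every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for part in text.split(sep):
        if not part.strip():
            continue  # skip empty pieces, e.g. from runs of blank lines
        if len(part) > chunk_size:
            # Flush what has been merged so far, then split the long piece further.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(part, chunk_size, finer))
            continue
        candidate = part if not current else current + sep + part
        if len(candidate) <= chunk_size:
            current = candidate  # still fits: keep merging
        else:
            chunks.append(current)
            current = part
    if current:
        chunks.append(current)
    return chunks


page = "Title\n\n" + "a" * 30 + "\n\n" + "b b b"
print(recursive_split(page, chunk_size=20))
# ['Title', 'aaaaaaaaaaaaaaaaaaaa', 'aaaaaaaaaa', 'b b b']
```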

Let's have a look at the chunks:

```python
len(chunks)
```

6

```python
chunks[0].page_content
```

'LangSmith User Guide | 🦜️🛠️ LangSmith'

```python
chunks[1].page_content
```

'Skip to main contentGo to API DocsSearchRegionUSEUGo to AppQuick StartUser GuideTracingEvaluationProduction Monitoring & AutomationsPrompt HubProxyPricingSelf-HostingCookbookThis is outdated documentation for 🦜️🛠️ LangSmith, which is no longer actively maintained.For up-to-date documentation, see the latest version.User GuideOn this pageLangSmith User GuideLangSmith is a platform for LLM application development, monitoring, and testing. In this guide, we’ll highlight the breadth of workflows LangSmith supports and how they fit into each stage of the application development lifecycle. We hope this will inform users how to best utilize this powerful platform or give them something to consider if they’re just starting their journey.Prototyping\u200bPrototyping LLM applications often involves quick experimentation between prompts, model types, retrieval strategy and other parameters.\nThe ability to rapidly understand how the model is performing — and debug where it is fai

```python
chunks[2].page_content
```

'We provide native rendering of chat messages, functions, and retrieve documents.Initial Test Set\u200bWhile many developers still ship an initial version of their application based on “vibe checks”, we’ve seen an increasing number of engineering teams start to adopt a more test driven approach. LangSmith allows developers to create datasets, which are collections of inputs and reference outputs, and use these to run tests on their LLM applications.\nThese test cases can be uploaded in bulk, created on the fly, or exported from application traces. LangSmith also makes it easy to run custom evaluations (both LLM and heuristic based) to score test results.Comparison View\u200bWhen prototyping different versions of your applications and making changes, it’s important to see whether or not you’ve regressed with respect to your initial test cases.\nOftentimes, changes in the prompt, retrieval strategy, or model choice can have huge implications in responses produced by your applic

```python
chunks[3].page_content
```

"Every playground run is logged in the system and can be used to create test cases or compare with other runs.Beta Testing\u200bBeta testing allows developers to collect more data on how their LLM applications are performing in real-world scenarios. In this phase, it’s important to develop an understanding for the types of inputs the app is performing well or poorly on and how exactly it’s breaking down in those cases. Both feedback collection and run annotation are critical for this workflow. This will help in curation of test cases that can help track regressions/improvements and development of automatic evaluations.Capturing Feedback\u200bWhen launching your application to an initial set of users, it’s important to gather human feedback on the responses it’s producing. This helps draw attention to the most interesting runs and highlight edge cases that are causing problematic responses. LangSmith allows you to attach feedback scores to logged traces (oftentimes, this is hook

### Embedding the Chunks and Storing Them in a Vector DB

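```python
# Install one of the two FAISS builds: the CPU version, or the GPU version if CUDA is available
#!pip install faiss-cpu
#!pip install faiss-gpu
```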

```python
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
```

```python
from langchain_community.vectorstores import FAISS
vector = FAISS.from_documents(chunks, embeddings)
```

Now the external documents (here only a single webpage) are embedded and stored in the vector database, which can be queried for the chunks that are most relevant to a question. What remains is to connect this database with the LLM.
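`OpenAIEmbeddings` sends each chunk to OpenAI's embedding API and receives a dense vector, and `FAISS.from_documents` stores these vectors together with the chunks in a FAISS index. The offline sketch below only illustrates the shape of this step: it swaps in a toy hashing bag-of-words "embedding" (which, unlike OpenAI's models, captures word overlap rather than meaning) and a plain Python list as the "index". `DIM`, `embed` and `build_index` are hypothetical names.

```python
import hashlib
import math

DIM = 64  # hypothetical size; real OpenAI embeddings have many more dimensions


def embed(text: str) -> list[float]:
    """Toy embedding: count hashed lowercase words in DIM buckets, then L2-normalize.

    Unlike OpenAI's embedding models, this captures no meaning, only word overlap.
    """
    vec = [0.0] * DIM
    for word in text.lower().split():
        bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def build_index(chunks: list[str]) -> list[tuple[list[float], str]]:
    """In-memory 'vector store': one (vector, chunk text) pair per chunk."""
    return [(embed(chunk), chunk) for chunk in chunks]


index = build_index([
    "LangSmith allows developers to create datasets and run tests.",
    "Beta testing collects feedback from real users.",
])
print(len(index), len(index[0][0]))  # 2 64
```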

### Create Prompt
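First, we define a general prompt template and a *document chain*, which consists of the prompt template and the LLM. The document chain inserts ("stuffs") the content of all documents it receives into the `{context}` placeholder of the prompt.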

```python
from langchain.chains.combine_documents import create_stuff_documents_chain

prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}""")

document_chain = create_stuff_documents_chain(llm, prompt)
```

The next code cell only tests the chain defined above, using a dummy text as context. Later, this dummy text is replaced by the chunks that the vector DB returns for our query.

```python
from langchain_core.documents import Document

document_chain.invoke({
    "input": "how can langsmith help with testing?",
    "context": [Document(page_content="langsmith can let you visualize test results")]
})
```

'Langsmith can help with testing by allowing you to visualize test results.'
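To make the *stuff* strategy concrete: the page contents of all documents are joined and inserted into `{context}`, and the complete prompt is then sent to the LLM. The sketch below reproduces only the prompt-building step in plain Python. `Doc` and `stuff_prompt` are hypothetical stand-ins, and the `"\n\n"` separator between documents is an assumption rather than a confirmed LangChain default.

```python
from dataclasses import dataclass, field

TEMPLATE = """Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}"""


@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def stuff_prompt(docs: list[Doc], question: str, separator: str = "\n\n") -> str:
    """'Stuff' all document texts into the {context} slot of a single prompt."""
    context = separator.join(doc.page_content for doc in docs)
    return TEMPLATE.format(context=context, input=question)


print(stuff_prompt([Doc("langsmith can let you visualize test results")],
                   "how can langsmith help with testing?"))
```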

After testing the *document chain*, we now define a *retrieval chain*, which combines a retriever, created from the vector DB, with the document chain defined above. This retrieval chain constitutes the entire RAG system.

```python
from langchain.chains import create_retrieval_chain

retriever = vector.as_retriever()
retrieval_chain = create_retrieval_chain(retriever, document_chain)
```
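Conceptually, invoking the retrieval chain first passes the question to the retriever and then hands the question plus the retrieved documents to the document chain. The real chain returns a dictionary; the query cell further down reads its `answer` key, and as far as we know it also contains the `input` and the retrieved `context`. The plain-Python sketch below shows that data flow with hypothetical fake components, so it runs offline; `retrieval_chain_invoke` is not a LangChain function.

```python
from typing import Callable


def retrieval_chain_invoke(
    retrieve: Callable[[str], list[str]],
    answer: Callable[[str, list[str]], str],
    inputs: dict,
) -> dict:
    """Hypothetical sketch of the retrieval-chain data flow (not LangChain code)."""
    question = inputs["input"]
    context = retrieve(question)        # 1. retriever: question -> relevant chunks
    result = answer(question, context)  # 2. document chain: question + chunks -> answer
    return {"input": question, "context": context, "answer": result}


def fake_retriever(question: str) -> list[str]:
    # Stands in for vector.as_retriever(); always returns the same chunk.
    return ["LangSmith allows developers to create datasets."]


def fake_document_chain(question: str, context: list[str]) -> str:
    # Stands in for document_chain; a real one would call the LLM.
    return f"Based on {len(context)} chunk(s): {context[0]}"


sketch_response = retrieval_chain_invoke(
    fake_retriever, fake_document_chain, {"input": "how can langsmith help with testing?"}
)
print(sketch_response["answer"])
# Based on 1 chunk(s): LangSmith allows developers to create datasets.
```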

Next, we send a query to the RAG system. This means that
1. the query is sent to the vector DB,
2. the vector DB returns the chunks that are most relevant for the query. For this,
    1. the embedding vector of the query is calculated,
    2. the similarity between the query's embedding vector and the embedding vectors of all chunks in the DB is calculated, and
    3. the chunks whose embedding vectors are most similar to the query's embedding vector are returned (by default, the retriever returns the top 4),
3. the returned chunks are passed as context, together with the query, to the LLM, and
4. the LLM answers the query, taking the provided context into account.
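Step 2 of this list can be sketched in plain Python: the example ranks hand-made two-dimensional vectors (hypothetical values) by cosine similarity and keeps the top `k`; `cosine` and `top_k` are illustrative names. Cosine similarity is an assumption here: as far as we know, LangChain's FAISS wrapper ranks by Euclidean distance by default, which yields the same ordering when all vectors, including the query's, have unit length.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def top_k(query_vec: list[float], index: list[tuple[list[float], str]], k: int = 4) -> list[str]:
    """Return the texts of the k stored chunks most similar to query_vec."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]


# Hand-made 2-D vectors (hypothetical values) instead of real embeddings.
toy_index = [
    ([1.0, 0.0], "chunk about datasets and tests"),
    ([0.0, 1.0], "chunk about pricing"),
    ([0.7, 0.7], "chunk about regression tracking"),
]
print(top_k([0.9, 0.1], toy_index, k=2))
# ['chunk about datasets and tests', 'chunk about regression tracking']
```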

```python
response = retrieval_chain.invoke({"input": "how can langsmith help with testing?"})
print(response["answer"])
```

LangSmith allows developers to create datasets, which are collections of inputs and reference outputs, and use these to run tests on their LLM applications. Test cases can be uploaded in bulk, created on the fly, or exported from application traces. LangSmith also makes it easy to run custom evaluations (both LLM and heuristic based) to score test results. Additionally, LangSmith provides a comparison view for test runs to track and diagnose regressions in test scores across multiple revisions of an application. The platform also offers a playground environment for rapid iteration and experimentation, allowing developers to quickly test out different prompts and models.
