# LangChain Quickstart: Llama 3 with Ollama

This is an adaptation of the LangChain Quickstart: [https://python.langchain.com/docs/get_started/quickstart](https://python.langchain.com/docs/get_started/quickstart)

```{admonition} What you will learn
* Exactly the same as in the [previous subsection](01_langchain_quickstart_openAI.ipynb), except that the *Llama 3* LLM and the embedding model now run on the local machine
* How to use [Ollama](https://ollama.com) to run LLM applications locally
```

## Ollama
[Ollama](https://ollama.com) is an open-source project that makes it easy to run LLMs locally. State-of-the-art open LLMs such as Llama 3, Gemma 2, Mistral, or Phi 3 can be run through Ollama.

1. **Installation:** Go to [https://ollama.com](https://ollama.com), download Ollama, and follow the installation instructions.
2. **Download LLM:** After installation, type `ollama run llama3` into your terminal or shell. This downloads *llama3* to your local disk (if it is not already there). Any other LLM provided by Ollama can be downloaded in the same way.
3. **Query/Chat:** The command `ollama run llama3` will start the chatbot. Enter your question.

<figure align="center">
<img width="600" src="https://maucher.home.hdm-stuttgart.de/Pics/ollamaShell.png">
<figcaption><b>Figure:</b> Run and query llama3 through Ollama</figcaption>
</figure>

Type `/?` at the Ollama chat prompt to list the available commands.

In this section, however, we focus not on the command-line usage of Ollama but on how to access it from a Python script or a Jupyter notebook. This is demonstrated below.
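Before turning to LangChain, it can help to see what a direct call to Ollama looks like. Ollama serves a local REST API, by default on port 11434. The sketch below builds a request for its `/api/generate` endpoint using only the standard library. The helper names `build_generate_request` and `ask_ollama` are illustrative and not part of Ollama or LangChain, and the actual call only succeeds if the Ollama server is running and `llama3` has been downloaded.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default host and port


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_ollama(model: str, prompt: str) -> str:
    """Send the request and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read())["response"]


# Example (requires a running Ollama server):
# print(ask_ollama("llama3", "Why is the sky blue?"))
```

LangChain's Ollama integration wraps this kind of call, so the rest of the section does not need to deal with HTTP directly.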

## Basic LLM Usage for Question Answering

See also: [https://ollama.com/blog/llama3](https://ollama.com/blog/llama3)

In [1]:
from langchain_community.chat_models import ChatOllama

In [2]:
llm = ChatOllama(model="llama3")

In [3]:
llm.invoke("how can langsmith help with testing?")



### Create a Simple Pipeline

In [4]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are world class technical documentation writer."),
    ("user", "{input}")
])

In [5]:
output_parser = StrOutputParser()

In [6]:
chain = prompt | llm | output_parser
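The `|` operator chains the three components so that the output of each one becomes the input of the next. The following plain-Python sketch illustrates that idea only. The `Step` class and the placeholder stages are hypothetical stand-ins, not LangChain's actual implementation, and no real model is called.

```python
class Step:
    """A tiny stand-in for a composable pipeline stage (not LangChain's class)."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # `a | b` yields a new step that feeds a's output into b.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)


# Hypothetical stages mirroring prompt -> model -> parser.
fill_prompt = Step(lambda d: f"user: {d['input']}")
fake_llm = Step(lambda text: {"content": text.upper()})  # placeholder, no real model
parse_output = Step(lambda message: message["content"])

toy_chain = fill_prompt | fake_llm | parse_output
print(toy_chain.invoke({"input": "hello"}))  # USER: HELLO
```

As in the real chain, the caller passes a dictionary with an `input` key and receives a plain string back.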

### Query

In [7]:
chain.invoke({"input": "how can langsmith help with testing?"})

"As a Langsmith, I can assist with testing by:\n\n1. **Automating manual testing**: By generating test data and executing it through APIs, I can automate the testing process, reducing manual effort and increasing efficiency.\n2. **Providing test scenarios and cases**: I can help generate test scenarios and cases based on the requirements and specifications, ensuring that the tests are thorough and comprehensive.\n3. **Validating API responses**: I can validate API responses by comparing them with expected results, helping to ensure that APIs are functioning correctly.\n4. **Testing UI interactions**: I can simulate user interactions with a UI, such as clicking buttons or filling out forms, to test how different scenarios affect the application's behavior.\n5. **Detecting bugs and errors**: By analyzing code and generating test cases, I can help identify potential bugs and errors before they become major issues.\n6. **Creating test data sets**: I can generate test data sets that are rel

## Basic RAG Usage
The previous subsection showed how an LLM can be used for question answering. Now we apply Retrieval-Augmented Generation (RAG) to the same task. A RAG system also integrates an LLM, but in contrast to the basic usage above, additional context is passed to the LLM. The LLM's answer then depends not only on the data on which it was trained, but also on external knowledge from documents provided by the user. This external knowledge is passed to the LLM as context, together with the query. Which external knowledge is used as context depends on the user's query: the query is first passed to a vector database, which returns the documents most relevant to it. These documents are then used as context.

Below, we prepare the external knowledge base in four steps:

1. Collect external documents from the web
2. Segment these documents into chunks
3. Calculate an embedding (a vector) for each chunk
4. Store the chunk embeddings in a vector DB
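To make the embed, store, and retrieve idea concrete, here is a minimal standard-library sketch. It assumes a toy "embedding" based on word counts and an in-memory list as the "vector DB". A real system, like the one built in this section, uses a learned embedding model and a dedicated vector store. All names (`embed`, `cosine`, `index`, `search`) are illustrative.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a real system uses a learned model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


chunks = [
    "LangSmith lets you visualize test results",
    "Ollama runs large language models locally",
    "FAISS stores vectors for similarity search",
]
# The "vector DB": each chunk is stored together with its embedding.
index = [(chunk, embed(chunk)) for chunk in chunks]


def search(query: str, k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


print(search("how are test results shown"))
```

The query shares the words "test" and "results" with the first chunk only, so that chunk is returned as the most relevant context.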

### Collect Documents for External Database

In [8]:
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://docs.smith.langchain.com/user_guide", encoding="utf-8")

docs = loader.load()

USER_AGENT environment variable not set, consider setting it to identify your requests.
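The warning above is harmless, but it can be avoided by setting the `USER_AGENT` environment variable. The sketch below assumes that the variable is read when the loader module is imported, so the assignment should run before the `WebBaseLoader` import. The value shown is a hypothetical identifier.

```python
import os

# Hypothetical identifier; choose a string that describes your own application.
os.environ["USER_AGENT"] = "langchain-quickstart-notebook"
```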


In [9]:
len(docs)

1

In [10]:
docs[0]

Document(metadata={'source': 'https://docs.smith.langchain.com/user_guide', 'title': 'LangSmith User Guide | 🦜️🛠️ LangSmith', 'description': 'LangSmith is a platform for LLM application development, monitoring, and testing. In this guide, we’ll highlight the breadth of workflows LangSmith supports and how they fit into each stage of the application development lifecycle. We hope this will inform users how to best utilize this powerful platform or give them something to consider if they’re just starting their journey.', 'language': 'en'}, page_content="\n\n\n\n\nLangSmith User Guide | 🦜️🛠️ LangSmith\n\n\n\n\n\n\n\nSkip to main contentGo to API DocsSearchRegionUSEUGo to AppQuick StartUser GuideTracingEvaluationProduction Monitoring & AutomationsPrompt HubProxyPricingSelf-HostingCookbookThis is outdated documentation for 🦜️🛠️ LangSmith, which is no longer actively maintained.For up-to-date documentation, see the latest version.User GuideOn this pageLangSm

### Chunking, Embedding and Storage in Vector DB

In [11]:
from langchain_community.embeddings import OllamaEmbeddings
ollama_emb = OllamaEmbeddings(model="llama3")

In [12]:
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter


text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)
vector = FAISS.from_documents(documents, ollama_emb)
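The text splitter above is used with its default settings. To show what chunk size and chunk overlap mean, here is a deliberately simplified sketch that cuts a string into fixed-size character windows with overlap. The real splitter additionally tries to break at separators such as paragraph and line boundaries. `split_with_overlap` is an illustrative helper, not a LangChain function.

```python
def split_with_overlap(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once a window has reached the end of the text.
    last_start = max(len(text) - overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]


pieces = split_with_overlap("abcdefghij", chunk_size=4, overlap=1)
print(pieces)  # ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of its predecessor, so text that straddles a chunk boundary still appears intact in at least one chunk when the overlap is large enough.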

In [13]:
vector

<langchain_community.vectorstores.faiss.FAISS at 0x1172f2930>

### Create Prompt

In [14]:
from langchain.chains.combine_documents import create_stuff_documents_chain

prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}""")

document_chain = create_stuff_documents_chain(llm, prompt)

In [15]:
from langchain_core.documents import Document

document_chain.invoke({
    "input": "how can langsmith help with testing?",
    "context": [Document(page_content="langsmith can let you visualize test results")]
})

'According to the context, LangSmith can let you visualize test results.'
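Conceptually, "stuffing" means that the text of all supplied documents is inserted into the `{context}` slot of the prompt before the prompt is sent to the LLM. The sketch below reproduces that formatting step in plain Python. The helper `stuff_documents` and the blank-line separator between documents are assumptions for illustration.

```python
PROMPT_TEMPLATE = """Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}"""


def stuff_documents(texts: list[str], question: str) -> str:
    """Join all document texts and place them in the prompt's context slot."""
    context = "\n\n".join(texts)  # assumed separator between documents
    return PROMPT_TEMPLATE.format(context=context, input=question)


print(stuff_documents(
    ["langsmith can let you visualize test results"],
    "how can langsmith help with testing?",
))
```

Because every document is placed in the prompt, this approach only works as long as the combined text fits into the model's context window.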

In [17]:
from langchain.chains import create_retrieval_chain

retriever = vector.as_retriever()
retrieval_chain = create_retrieval_chain(retriever, document_chain)
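The retrieval chain combines two steps: it first fetches the chunks relevant to the question and then hands them, together with the question, to the document chain. The following sketch shows this control flow with placeholder components. `make_retrieval_chain`, `toy_retrieve`, and `toy_answer` are hypothetical names, and no vector store or LLM is involved.

```python
from typing import Callable


def make_retrieval_chain(
    retrieve: Callable[[str], list[str]],
    answer_from: Callable[[str, list[str]], str],
) -> Callable[[dict], dict]:
    """Compose a retriever and an answering step into one callable."""

    def run(inputs: dict) -> dict:
        context = retrieve(inputs["input"])             # 1. fetch relevant chunks
        answer = answer_from(inputs["input"], context)  # 2. generate with context
        return {**inputs, "context": context, "answer": answer}

    return run


# Placeholder components; a real setup would use the vector store and the LLM.
toy_retrieve = lambda query: ["langsmith can let you visualize test results"]
toy_answer = lambda query, context: f"Based on {len(context)} chunk(s): {context[0]}"

toy_rag = make_retrieval_chain(toy_retrieve, toy_answer)
result = toy_rag({"input": "how can langsmith help with testing?"})
print(result["answer"])
```

Returning the retrieved context alongside the answer makes it possible to check which chunks the answer was based on.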

### Query

In [18]:
response = retrieval_chain.invoke({"input": "how can langsmith help with testing?"})
print(response["answer"])

According to the provided context, LangSmith can help with testing in the following ways:

1. **Debugging**: LangSmith allows developers to create datasets and run tests on their LLM applications. This enables debugging by looking through application traces.
2. **Test cases**: Langsmith makes it easy to run custom evaluations (both LLM and heuristic-based) to score test results.
3. **Comparison view**: The comparison view allows users to track and diagnose regressions in test scores across multiple revisions of their application.
4. **Playground**: The playground environment enables rapid iteration and experimentation, allowing developers to quickly test out different prompts and models.
5. **Beta testing**: LangSmith supports collecting feedback on how the LLM application is performing in real-world scenarios, helping to develop an understanding of where it's succeeding or failing.
6. **Annotating traces**: The platform allows annotators (PMs, engineers, or subject matter experts) to 