Gen-AI Developer Classroom notes 14/Feb/2026

Data Preparation: Refer Here for the changes done. We will be considering only the policy folder. Refer Here for our initial data generation, Refer Here for the generated data, and Refer Here for the synthetic data generation code.

Gen-AI Developer Classroom notes 13/Feb/2026

Synthetic data: Synthetic data is generated data, not real data. It is used because real data is expensive, because of privacy concerns, and for fine-tuning and training ML models. Before LLMs, deep learning models were used to generate synthetic data. Use case: HR Helpdesk RAG. An HR Helpdesk RAG typically answers questions on leave policies, payroll, benefits, travel…
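A minimal sketch of template-based synthetic data generation for an HR helpdesk. The topics, templates, and the `generate_questions` helper are hypothetical and invented for illustration; they are not taken from the course repository. An LLM-based generator would replace the templates with model calls.

```python
import random

# Hypothetical topics and question templates, for illustration only.
TOPICS = {
    "leave": ["annual leave", "sick leave", "parental leave"],
    "payroll": ["salary credit date", "payslip download", "tax deduction"],
    "benefits": ["health insurance", "meal vouchers"],
}
TEMPLATES = [
    "How do I find out about {item}?",
    "What is the policy on {item}?",
    "Who should I contact regarding {item}?",
]


def generate_questions(count: int, seed: int = 0) -> list[dict]:
    """Return `count` synthetic helpdesk questions, reproducible via `seed`."""
    rng = random.Random(seed)
    rows = []
    for _ in range(count):
        category = rng.choice(sorted(TOPICS))
        item = rng.choice(TOPICS[category])
        question = rng.choice(TEMPLATES).format(item=item)
        rows.append({"category": category, "question": question})
    return rows


if __name__ == "__main__":
    for row in generate_questions(3):
        print(row)
```

Fixing the seed keeps the generated set reproducible, which helps when the same synthetic questions are reused to evaluate a RAG across runs.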

Gen-AI Developer Classroom notes 12/Feb/2026

Ensuring only updated docs are indexed: use the following code (sql_record_manager is not defined in this excerpt).

    # Load the updated knowledge-base articles
    directory_loader = DirectoryLoader(
        path="../data/updates/IT_Helpdesk_KB_Articles_v2",
        glob="*.txt",
        loader_cls=TextLoader,
    )
    documents = directory_loader.load()

    # Split into small overlapping chunks
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=100,
        chunk_overlap=20,
    )
    chunks = text_splitter.split_documents(documents)

    embedding = VertexAIEmbeddings(model_name="text-embedding-005")
    vector_store = Chroma(
        collection_name="kb_collection",
        embedding_function=embedding,
        persist_directory="../vectordb/kb_collection_db_sample1",
    )

    # Detect only the changed docs and reindex them
    result = index(
        docs_source=documents,
        record_manager=sql_record_manager,
        vector_store=vector_store,
        cleanup='incremental',
        source_id_key='source',
    )

…
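To make the idea behind incremental cleanup concrete, here is a toy in-memory model, not the LangChain implementation. `TinyIndex` and its method are hypothetical names; the sketch assumes each chunk is identified by a hash of its source and text, and that stale chunks are removed only for sources that appear in the current batch.

```python
import hashlib


class TinyIndex:
    """Toy stand-in for a record manager plus vector store (illustration only)."""

    def __init__(self) -> None:
        # Maps chunk hash -> source id of the chunk currently stored.
        self.records: dict[str, str] = {}

    def index_incremental(self, chunks: list[tuple[str, str]]) -> dict[str, int]:
        """Index (source, text) pairs and return counts of what happened."""
        seen: dict[str, str] = {}
        for source, text in chunks:
            digest = hashlib.sha256(f"{source}\n{text}".encode("utf-8")).hexdigest()
            seen[digest] = source

        # Chunks whose hash is already stored are skipped (no re-embedding).
        added = [h for h in seen if h not in self.records]
        skipped = len(seen) - len(added)

        # Stale chunks: stored for a re-submitted source, but no longer present.
        sources = set(seen.values())
        stale = [h for h, s in self.records.items() if s in sources and h not in seen]

        for h in stale:
            del self.records[h]
        for h in added:
            self.records[h] = seen[h]
        return {"num_added": len(added), "num_skipped": skipped, "num_deleted": len(stale)}


if __name__ == "__main__":
    idx = TinyIndex()
    print(idx.index_incremental([("a.txt", "v1 part1"), ("a.txt", "v1 part2")]))
    print(idx.index_incremental([("a.txt", "v1 part1"), ("a.txt", "v2 part2")]))
```

Sources that are absent from a batch are left untouched in this sketch, so re-submitting only the updated files does not remove the rest of the collection.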

Gen-AI Developer Classroom notes 06/Feb/2026

Handling document updates using Record Manager: Since RecordManager is removed in the latest versions (langchain > 1.2.x), we will discuss alternative solutions in the next session. Refer Here for the updating index notebook. Enterprise RAGs: Organizations mostly adopt models from the cloud (Azure, AWS, GCP), directly from providers (OpenAI, Claude), or on-prem (ollama). Document sources: cloud file shares, Confluence/Wiki pages. Possible options:…

Gen-AI Developer Classroom notes 05/Feb/2026

Handling document updates: In RAG we build an indexing pipeline from document sources, and in most cases the documents get updated. We will look into how to handle document updates, i.e. updating vector databases with the latest documents. Strategies: (1) delete and reindex everything; (2) update and reindex only what has changed. Sample: Let's generate some documents with…
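The second strategy needs a way to tell which documents changed. A minimal sketch, assuming documents are available as `{path: content}` snapshots and that a content hash is an acceptable change signal; `fingerprint` and `plan_update` are hypothetical helper names, not part of any library.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Content hash used to detect whether a document changed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_update(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two {path: content} snapshots and list the work to do."""
    old_fp = {path: fingerprint(text) for path, text in old.items()}
    new_fp = {path: fingerprint(text) for path, text in new.items()}
    return {
        # New or modified documents must be chunked and embedded again.
        "reindex": sorted(p for p in new_fp if old_fp.get(p) != new_fp[p]),
        # Documents that disappeared must be removed from the vector store.
        "delete": sorted(p for p in old_fp if p not in new_fp),
        # Everything else can be left alone.
        "unchanged": sorted(p for p in new_fp if old_fp.get(p) == new_fp[p]),
    }


if __name__ == "__main__":
    before = {"vpn.txt": "v1", "mail.txt": "v1", "old.txt": "x"}
    after = {"vpn.txt": "v2", "mail.txt": "v1", "wifi.txt": "new"}
    print(plan_update(before, after))
```

With the first strategy every path in the new snapshot would be re-embedded; with this plan only the `reindex` list is, which matters when embedding calls are billed per request.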

Python Classroom notes 03/Feb/2026

Goal of these sessions: After this course, you should be able to build CLI applications and backend APIs. Quality: unit tests, security scans, code quality, industry standards. Design: design patterns, architectural patterns (*), system design (*).

Gen-AI Developer Classroom notes 03/Feb/2026

Goal of these sessions: After this course, you should be able to build CLI applications and backend APIs. Quality: unit tests, security scans, code quality, industry standards. Design: design patterns, architectural patterns (*), system design (*).

Gen-AI Developer Classroom notes 29/Jan/2026

Dealing with PDF loading in LangChain: Popular libraries for PDF: pypdf (largely text), pymupdf (text + images), unstructured (elements), pdfplumber, OCR (scanned PDF). We need to write extra code to extract images. pypdf loading for NCERT Panchatantra: Refer Here. Scenario 1: The PDF is full of image illustrations which have text to be extracted, we need…
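One way to think about choosing between these approaches is a per-page routing rule. The sketch below is a hypothetical heuristic, not something from the notes: `choose_extraction`, its labels, and the 50-character threshold are all assumptions, and the inputs are presumed to come from whichever PDF library did a first pass over the page.

```python
def choose_extraction(text_chars: int, image_count: int, min_chars: int = 50) -> str:
    """Pick an extraction route for one PDF page.

    text_chars: number of characters a plain text extractor returned.
    image_count: number of embedded images found on the page.
    min_chars: assumed threshold below which the text layer is treated as empty.
    """
    has_text = text_chars >= min_chars
    if has_text and image_count == 0:
        return "text"  # a text-only loader is enough
    if has_text:
        return "text+images"  # keep the text, extract images separately
    if image_count > 0:
        return "ocr"  # likely a scanned or illustration-heavy page
    return "skip"  # nothing usable on the page


if __name__ == "__main__":
    print(choose_extraction(text_chars=1200, image_count=0))
    print(choose_extraction(text_chars=10, image_count=3))
```

The threshold would need tuning per document set; illustrated books often carry short captions in the text layer while the main text sits inside images.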

Gen-AI Developer Classroom notes 28/Jan/2026

RAG – Dealing with real-world data: When we build RAGs for an enterprise, we might end up with different forms of data: pure text data; multi-modal knowledge; structured data (databases, tables, CSV); semi-structured data; source code; images and visual-only data; audio and video; web and dynamic content; multi-source enterprises. Multi-modal knowledge: We…

Gen-AI Developer Classroom notes 25/Jan/2026

Simple RAG pipeline: We ingest docs into vector databases after chunking and embedding. We make the vector store a retriever; when a question is asked, we get similar docs (chunks). We then pass those chunks into the prompt and to the LLM to generate a response. Prompt templates: ChatPromptTemplate. Refer Here to this notebook. Exercise: convert each…
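The retrieve-then-prompt flow can be sketched without any external service. This is a toy illustration under stated assumptions: the bag-of-words `embed` stands in for a real embedding model, `retrieve` stands in for a vector store retriever, and `PROMPT` is a plain format string rather than a ChatPromptTemplate; all names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (a real pipeline uses a model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


PROMPT = "Answer using only this context:\n{context}\n\nQuestion: {question}"


def build_prompt(question: str, chunks: list[str], k: int = 2) -> str:
    """Fill the prompt template with the retrieved chunks and the question."""
    context = "\n".join(retrieve(question, chunks, k))
    return PROMPT.format(context=context, question=question)


if __name__ == "__main__":
    kb = [
        "Reset your VPN password from the self-service portal.",
        "The cafeteria opens at 9 am.",
        "Printer drivers are on the intranet.",
    ]
    print(build_prompt("How do I reset my VPN password?", kb, k=1))
```

The resulting string is what would be sent to the LLM; swapping `embed` for a real embedding model and `retrieve` for a vector store query keeps the same shape.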