[{"content":"Welcome to my blog section. I write about software, performance, AI, and systems thinking. Explore the latest posts below and come back for more ideas on engineering, tooling, and product thinking.\n","date":"14 June 2026","externalUrl":null,"permalink":"/blogs/","section":"Blogs","summary":"","title":"Blogs","type":"blogs"},{"content":"","date":"14 June 2026","externalUrl":null,"permalink":"/tags/frontend/","section":"Tags","summary":"","title":"Frontend","type":"tags"},{"content":" Software Engineer · Problem Solver · Systems Thinker # I enjoy taking ambiguous, messy problems and turning them into simple, reliable systems. My work usually sits at the intersection of product intent, system design, and real-world constraints. I prefer boring tech at scale, obsess over system boundaries, and optimize for correctness before cleverness. Apart from work, I have been a lifelong supporter of FC Barcelona, and in my free time I like to read fiction, play basketball, or just venture out and meet new people.\nI write about engineering, performance, AI, and systems thinking with the goal of turning complexity into clarity. 
Below are my latest blog posts — explore them to see what I’m thinking about lately.\n","date":"14 June 2026","externalUrl":null,"permalink":"/","section":"Ishan","summary":"","title":"Ishan","type":"page"},{"content":"","date":"14 June 2026","externalUrl":null,"permalink":"/tags/javascript/","section":"Tags","summary":"","title":"Javascript","type":"tags"},{"content":"","date":"14 June 2026","externalUrl":null,"permalink":"/tags/performance/","section":"Tags","summary":"","title":"Performance","type":"tags"},{"content":"","date":"14 June 2026","externalUrl":null,"permalink":"/tags/rust/","section":"Tags","summary":"","title":"Rust","type":"tags"},{"content":"","date":"14 June 2026","externalUrl":null,"permalink":"/tags/","section":"Tags","summary":"","title":"Tags","type":"tags"},{"content":"I\u0026rsquo;ve been writing JavaScript for years.\nI\u0026rsquo;ve created objects, arrays, functions, closures, promises, maps, sets, and enough React components to make Chrome cry.\nYet if you had asked me a few years ago:\n\u0026ldquo;What happens to all that memory once you\u0026rsquo;re done using it?\u0026rdquo;\nMy answer would\u0026rsquo;ve been something along the lines of:\n\u0026ldquo;I don\u0026rsquo;t know. 
The browser figures it out.\u0026rdquo;\nAnd honestly, that\u0026rsquo;s not entirely wrong.\nThe browser does figure it out.\nBut that realization led me down a rabbit hole of understanding Garbage Collection, why languages like JavaScript and Java have it, why languages like C++ and Rust don\u0026rsquo;t, and why memory management is probably one of the biggest philosophical differences between programming languages.\nLet\u0026rsquo;s talk about it.\nThe Hotel Room Problem # Imagine you\u0026rsquo;re staying in a hotel.\nEvery time you need a room, the hotel gives you one.\nDone with it?\nGreat.\nNow someone needs to clean it.\nThere are essentially three ways to solve this problem.\nOption 1: Clean It Yourself # This is the C/C++ approach.\nYou check into a room.\nWhen you\u0026rsquo;re done, you explicitly tell the hotel:\n\u0026ldquo;I\u0026rsquo;m leaving. Please clean this room.\u0026rdquo;\nIf you forget?\nThe room remains occupied forever. That\u0026rsquo;s a memory leak.\nIf you accidentally tell them to clean the room twice?\nChaos. That\u0026rsquo;s a double free.\nThis is effectively what malloc() and free() do in C, and what new and delete do in C++:\nint* number = new int(42); delete number; You allocated memory.\nYou freed memory.\nEverything is your responsibility.\nMaximum control.\nMaximum power.\nMaximum opportunity to shoot yourself in the foot.\nOption 2: Hire A Janitor # This is JavaScript.\nYou don\u0026rsquo;t clean anything.\nYou simply stop using the room.\nAt some point, a janitor walks around the hotel and says:\n\u0026ldquo;Nobody seems to be using this room anymore.\u0026rdquo;\nAnd cleans it.\nThat\u0026rsquo;s Garbage Collection.\nThe language runtime periodically identifies memory that is no longer reachable and reclaims it automatically.\nThe developer never explicitly frees memory.\nOption 3: Hire An Extremely Strict Librarian # This is Rust.\nInstead of a janitor cleaning things later, Rust introduces a librarian who keeps track of exactly who owns every book.\nThe rules are simple:\nEvery piece of data has one owner. 
When the owner goes away, the data is cleaned up. Ownership can be transferred. At any moment, data can have many readers or exactly one writer, never both. If you violate any of these rules:\nThe code doesn\u0026rsquo;t compile.\nNot \u0026ldquo;throws an exception.\u0026rdquo;\nNot \u0026ldquo;fails in production.\u0026rdquo;\nIt simply refuses to build.\nSo What Exactly Is Garbage Collection? # Let\u0026rsquo;s start with a simple example.\nlet user = { name: \u0026#34;Walter White\u0026#34; }; JavaScript allocates memory for that object.\nNow imagine:\nuser = null; Nobody references the object anymore.\nThe object still physically exists in memory, at least for a while.\nBut eventually the Garbage Collector notices:\n\u0026ldquo;Nobody can reach this object anymore.\u0026rdquo;\nAnd removes it.\nThat\u0026rsquo;s the key concept:\nReachability # Modern JavaScript Garbage Collectors care about whether an object is reachable from the application\u0026rsquo;s roots.\nRoots typically include:\nGlobal variables Active function calls Local variables currently on the stack If an object can no longer be reached from any root, it becomes garbage.\nThe Mark And Sweep Algorithm # Most modern JavaScript engines use some variation of Mark and Sweep.\nThe algorithm is surprisingly simple.\nImagine the Garbage Collector as Thanos.\nNot the snapping part.\nThe part where he scans the universe.\nStep 1: Mark # Start from all root objects.\nMark everything reachable.\nwindow.user -\u0026gt; address -\u0026gt; city Everything connected gets marked as alive.\nStep 2: Sweep # Anything that wasn\u0026rsquo;t marked gets deleted.\nGone.\nReduced to atoms.\nThe collector walks through memory and frees everything unreachable.\nWhich is why the algorithm is called:\nMark and Sweep\nSimple name.\nVery expensive job.\nWhy Not Just Count References? 
# You might think:\n\u0026ldquo;Why don\u0026rsquo;t we simply count how many references point to an object?\u0026rdquo;\nPlenty of systems do exactly that; CPython and Swift both rely on reference counting today.\nlet a = {}; let b = a; Reference count = 2.\nRemove one reference.\nCount becomes 1.\nRemove the second.\nCount becomes 0.\nDelete object.\nEasy.\nExcept for one problem.\nThe Spider-Man Problem # Imagine Peter Parker and MJ storing each other\u0026rsquo;s phone numbers.\nlet peter = {}; let mj = {}; peter.friend = mj; mj.friend = peter; Now remove the external references.\npeter = null; mj = null; The two objects still point to each other.\nA simple reference counter sees:\n\u0026ldquo;Each object still has one reference.\u0026rdquo;\nSo neither gets deleted.\nMemory leak.\nMark-and-Sweep solves this beautifully.\nThe collector asks:\n\u0026ldquo;Can I reach either of these from the roots?\u0026rdquo;\nNo.\nDelete both.\nProblem solved.\nThe Hidden Cost Of Garbage Collection # At this point, GC sounds magical.\nAnd honestly, it is.\nBut magic has a price.\nSomeone still has to stop and inspect memory.\nSomeone still has to traverse object graphs.\nSomeone still has to clean things up.\nThis introduces runtime overhead.\nThat\u0026rsquo;s why GC-heavy applications sometimes experience pauses.\nModern engines like V8 have become incredibly sophisticated with generational, incremental, concurrent and parallel collection strategies, but the core idea remains the same.\nThe janitor still needs to work.\nHe\u0026rsquo;s just gotten much faster.\nEnter Rust # Now let\u0026rsquo;s switch universes.\nImagine if, instead of hiring a janitor, we simply never allowed hotel rooms to become ambiguous in the first place.\nThat\u0026rsquo;s Rust.\nRust does not use a Garbage Collector.\nInstead, it uses a system called Ownership.\n{ let name = String::from(\u0026#34;Jesse\u0026#34;); } The moment the scope ends:\n} The memory is automatically released.\nNo Garbage Collector.\nNo runtime scan.\nNo janitor.\nOwnership rules 
determine exactly when cleanup should happen.\nSo Which One Is Better? # Neither.\nThey\u0026rsquo;re solving different problems.\nGarbage Collected Languages # Examples:\nJavaScript Java C# Go (with GC) Pros:\nEasier developer experience Faster iteration Fewer memory management mistakes Cons:\nRuntime overhead Less predictable performance Potential GC pauses Ownership / Manual Memory Languages # Examples:\nRust C++ C Pros:\nGreater control Predictable performance Lower runtime overhead Cons:\nSteeper learning curve More responsibility Easier to introduce bugs (except Rust, which shifts the pain to compile time) Final Thoughts # One of the biggest realizations for me was that memory management isn\u0026rsquo;t just an implementation detail.\nIt\u0026rsquo;s a language philosophy.\nJavaScript says:\n\u0026ldquo;Trust the runtime.\u0026rdquo;\nC++ says:\n\u0026ldquo;Trust the developer.\u0026rdquo;\nRust says:\n\u0026ldquo;Trust the compiler.\u0026rdquo;\nAnd that single decision ends up influencing everything from developer experience to performance characteristics.\nThe next time you create an object in JavaScript, remember:\nSomewhere deep inside V8, a tiny janitor is waiting patiently for you to stop using it.\nAnd somewhere in the Rust ecosystem, a very angry librarian is making sure nobody loses track of a book.\n","date":"14 June 2026","externalUrl":null,"permalink":"/blogs/memory-management-frenzy/","section":"Blogs","summary":"","title":"The Janitor, The Librarian, and The Rustacean: How Languages Manage Memory","type":"blogs"},{"content":"","date":"14 June 2026","externalUrl":null,"permalink":"/tags/ux/","section":"Tags","summary":"","title":"Ux","type":"tags"},{"content":"","date":"14 June 2026","externalUrl":null,"permalink":"/tags/web-vitals/","section":"Tags","summary":"","title":"Web-Vitals","type":"tags"},{"content":"","date":"24 March 
2026","externalUrl":null,"permalink":"/categories/","section":"Categories","summary":"","title":"Categories","type":"categories"},{"content":"","date":"24 March 2026","externalUrl":null,"permalink":"/tags/cli/","section":"Tags","summary":"","title":"Cli","type":"tags"},{"content":"","date":"24 March 2026","externalUrl":null,"permalink":"/categories/projects/","section":"Categories","summary":"","title":"Projects","type":"categories"},{"content":"Here\u0026rsquo;s a collection of my projects and work. These showcase my skills, interests, and contributions across different domains.\n","date":"24 March 2026","externalUrl":null,"permalink":"/projects/","section":"Projects","summary":"","title":"Projects","type":"projects"},{"content":"","date":"24 March 2026","externalUrl":null,"permalink":"/tags/scraping/","section":"Tags","summary":"","title":"Scraping","type":"tags"},{"content":"","date":"24 March 2026","externalUrl":null,"permalink":"/tags/tokio/","section":"Tags","summary":"","title":"Tokio","type":"tags"},{"content":"A small, dependency-minimal Rust web crawler that fetches a seed URL, extracts same-host links from the homepage, and saves HTML responses to disk.\nBuilt for learning and small local crawl tasks — not a production spider.\nTL;DR # What this is: A CLI and library that crawls a seed URL, follows same-host links found on the homepage, and writes each page to disk with structured stdout logging.\nWhat this isn\u0026rsquo;t: A recursive or politeness-aware crawler. 
No robots.txt, rate limiting, or depth control by default.\nRun: cargo run --release -- \u0026quot;https://example.com\u0026quot; --out-dir crawl_out\nWhat it does # Accepts a seed URL (or hostname) as CLI input Fetches the homepage once Extracts \u0026lt;a href=\u0026quot;...\u0026quot;\u0026gt; links on the first page Normalizes each link to an absolute URL Follows only same-host links Fetches each same-host page once Saves each response body in out_dir using a deterministic URL hash filename Logs crawl events to stdout with status and byte counts Project structure # Module Role src/main.rs CLI entrypoint using clap + tokio src/lib.rs Reusable crawler API src/engine.rs Crawl orchestration src/fetch.rs HTTP fetch wrapper with reqwest src/links.rs HTML link extraction src/storage.rs File path generation, save HTML src/url_util.rs URL normalization and same-host checks src/log.rs Logging abstraction (stdout + pluggable) Usage # Build and run from the project root:\ncargo run --release -- \u0026#34;https://example.com\u0026#34; --out-dir crawl_out Short form:\ncargo run --release -- example.com -o crawl_out Defaults:\nout_dir: crawl_out Output # crawl_out/\u0026lt;url_hash\u0026gt;.html — url_hash is derived from the normalized final URL Stdout log events include: seed, response, fetch, save, skip_links, link_skip, fetch_err, save_err Configuration # No configuration file. CLI args only.\nTests # No test files are currently included. The library is unit-test-friendly via Crawler::with_logger and CrawlConfig.\nDependencies # reqwest — HTTP client tokio — async runtime anyhow — error handling clap — CLI url — URL parsing Extending # Add depth control (breadth-first / recursive crawl) Add robots.txt + rate limiting Add concurrency queue and dedupe URL set Add filter rules (patterns, content types) Instrument with structured logging / metrics Notes # The crawler is intentionally simple. 
It does not enforce politeness controls by default.\n","date":"24 March 2026","externalUrl":null,"permalink":"/projects/web-crawler/","section":"Projects","summary":"","title":"Web Crawler","type":"projects"},{"content":"","date":"24 March 2026","externalUrl":null,"permalink":"/tags/web-crawler/","section":"Tags","summary":"","title":"Web-Crawler","type":"tags"},{"content":"Circuit breaker pattern for agent and LLM calls: monitors failures and temporarily disables expensive operations when a threshold is exceeded.\nPackage name: agent_circuit_breaker. MIT licensed, Python 3.9+.\nTL;DR # What this is: A zero-dependency Python library that wraps agent/LLM calls with circuit breaker semantics — decorator, context manager, or explicit call() / call_async().\nWhat this isn\u0026rsquo;t: An LLM client or agent framework. It only guards calls you already make.\nInstall: pip install -e . (from the repo) or add agent_circuit_breaker to your project.\nFeatures # Lightweight — zero dependencies for the core library Sync and async — decorator, context manager, and call() / call_async() Configurable — consecutive or sliding-window failure counting, custom predicate, fallback, excluded exceptions Observable — callbacks for state changes, failures, successes, and open/close/half-open transitions Quick Start # Decorator # from agent_circuit_breaker import circuit_breaker @circuit_breaker(failure_threshold=5, recovery_timeout=60) def call_llm(prompt: str) -\u0026gt; str: # your LLM/agent call return response @circuit_breaker(failure_threshold=3, recovery_timeout=30) async def async_call_llm(prompt: str) -\u0026gt; str: return await some_async_client(prompt) Context manager # from agent_circuit_breaker import CircuitBreaker breaker = CircuitBreaker(failure_threshold=5, recovery_timeout=60) with breaker: result = agent.run(task) # async async with breaker: result = await agent.run_async(task) Class-based # from agent_circuit_breaker import CircuitBreaker, CircuitBreakerOpenError 
breaker = CircuitBreaker(failure_threshold=5, recovery_timeout=60) try: result = breaker.call(agent_function, arg1, arg2) except CircuitBreakerOpenError: result = \u0026#34;Service unavailable\u0026#34; Configuration # Parameter Default Description failure_threshold 5 Number of failures that open the circuit. recovery_timeout 60 Seconds the circuit stays open before a trial (half-open). failure_window None If set, use a sliding time window (seconds) instead of consecutive failures. failure_predicate None Callable (exc) -\u0026gt; bool: only count exception as failure when it returns True. fallback None Callable to run when the circuit is open instead of raising. excluded_exceptions None Tuple of exception types that never count as failures (still re-raised). Exception handling order: first check excluded_exceptions; if the exception is in that tuple, do not count it. Otherwise use failure_predicate if set, else treat as failure.\nStates # CLOSED — Calls allowed; failures are counted. OPEN — Calls blocked; CircuitBreakerOpenError (or fallback) until recovery_timeout has passed. HALF_OPEN — One trial call allowed; success closes the circuit, failure reopens it. 
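The CLOSED, OPEN, and HALF_OPEN transitions above can be sketched in a few lines of plain Python. This is a hypothetical, single-threaded toy, not the agent_circuit_breaker implementation: ToyBreaker and CircuitOpen are illustrative names, it counts consecutive failures only, and it leaves out failure_window, failure_predicate, fallback, excluded_exceptions, and the monitoring callbacks.

```python
import time
from typing import Callable, Optional


class CircuitOpen(Exception):
    # Hypothetical error for this sketch; the library itself raises CircuitBreakerOpenError.
    pass


class ToyBreaker:
    # Minimal consecutive-failure breaker: CLOSED -> OPEN -> HALF_OPEN -> CLOSED or OPEN.
    # Single-threaded only: without a lock, concurrent callers could all pass the HALF_OPEN gate.
    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock
        self.state = 'CLOSED'
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn, *args, **kwargs):
        if self.state == 'OPEN':
            # Block calls until recovery_timeout has elapsed, then allow one trial call.
            if self.clock() - self.opened_at < self.recovery_timeout:
                raise CircuitOpen('circuit is open')
            self.state = 'HALF_OPEN'
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._record_failure()
            raise
        # Any success, including the half-open trial, closes the circuit and resets the count.
        self.state = 'CLOSED'
        self.failures = 0
        return result

    def _record_failure(self):
        self.failures += 1
        # A failed half-open trial reopens immediately; otherwise wait for the threshold.
        if self.state == 'HALF_OPEN' or self.failures >= self.failure_threshold:
            self.state = 'OPEN'
            self.opened_at = self.clock()
```

Injecting the clock keeps the sketch deterministic: a test can pass a lambda that returns a controlled value instead of sleeping for recovery_timeout seconds.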
Monitoring # Set callbacks on the breaker:\non_state_change(old_state, new_state) on_failure() / on_success() on_open() / on_close() / on_half_open() Requirements # Python 3.9+ ","date":"15 March 2026","externalUrl":null,"permalink":"/projects/agentic-circuit-breaker/","section":"Projects","summary":"","title":"Agent Circuit Breaker","type":"projects"},{"content":"","date":"15 March 2026","externalUrl":null,"permalink":"/tags/agents/","section":"Tags","summary":"","title":"Agents","type":"tags"},{"content":"","date":"15 March 2026","externalUrl":null,"permalink":"/tags/circuit-breaker/","section":"Tags","summary":"","title":"Circuit-Breaker","type":"tags"},{"content":"","date":"15 March 2026","externalUrl":null,"permalink":"/tags/llm/","section":"Tags","summary":"","title":"Llm","type":"tags"},{"content":"","date":"15 March 2026","externalUrl":null,"permalink":"/tags/python/","section":"Tags","summary":"","title":"Python","type":"tags"},{"content":"","date":"15 March 2026","externalUrl":null,"permalink":"/tags/resilience/","section":"Tags","summary":"","title":"Resilience","type":"tags"},{"content":"","date":"7 January 2026","externalUrl":null,"permalink":"/tags/distributed-systems/","section":"Tags","summary":"","title":"Distributed-Systems","type":"tags"},{"content":"","date":"7 January 2026","externalUrl":null,"permalink":"/tags/inference/","section":"Tags","summary":"","title":"Inference","type":"tags"},{"content":"A distributed inference framework for large language models that routes requests to workers, manages KV cache lifecycle, handles failures gracefully, and applies backpressure so memory — not compute — is the bottleneck.\nNote: This is the infrastructure layer. LLM integration is not yet implemented. 
The system provides the distributed architecture, routing, and session management, but actual model inference needs to be integrated.\nTL;DR # What this is: A production-ready distributed inference framework for scaling LLM serving across multiple workers with memory-aware admission control, backpressure handling, and automatic failure recovery.\nWhat this isn\u0026rsquo;t: A complete LLM inference solution (model integration pending) or a single-server inference engine.\nTech Stack: TypeScript/Node.js (Coordinator) + Rust (Worker) with Express and Axum.\nKey Features: O(1) admission control, horizontal scaling, backpressure, session management, heartbeat-based health monitoring.\nUse Cases # This framework is designed for:\nScaling LLM inference across multiple GPU workers Memory-constrained environments where KV cache management is critical Production deployments requiring high availability and failure resilience Multi-tenant systems needing session isolation and capacity management Streaming inference with backpressure to handle slow clients gracefully Quick Start # git clone \u0026lt;repository-url\u0026gt; cd inference-engine ./start.sh Then test inference: python test_inference.py \u0026quot;What is the capital of France?\u0026quot; or use the curl/scripts below. See Setup and running for full prerequisites and options.\nSetup and running # Prerequisites # Requirement Purpose Node.js 18+ Coordinator (TypeScript/Node) npm Install coordinator dependencies Rust 1.70+ Worker (Rust) — install from rustup.rs LLVM (Windows only) Worker build needs libclang, llvm-nm, and llvm-objcopy for the llama_cpp_sys crate. Install LLVM (e.g. 17.x) and set LIBCLANG_PATH to the LLVM bin directory (e.g. C:\\Program Files\\LLVM\\bin). Also set NM_PATH to the full path to llvm-nm.exe and OBJCOPY_PATH to the full path to llvm-objcopy.exe in the same directory (e.g. C:\\Program Files\\LLVM\\bin\\llvm-objcopy.exe), or add that directory to PATH. 
start.sh derives NM_PATH from LIBCLANG_PATH if set. 1. Clone and install # git clone \u0026lt;repository-url\u0026gt; cd inference-engine Coordinator (one-time):\ncd coordinator npm install cd .. Worker: No separate install step; start.sh (or cargo build) will compile it.\n2. Download a model (required for inference) # The worker loads a GGUF model file. Default is TinyLlama 1.1B.\nDownload a TinyLlama GGUF from TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF (e.g. tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf). Place it in modelFiles/ in the project root (create the folder if needed). Optional: set MODEL_PATH to the full path to your .gguf file if you use a different path or filename. Use forward slashes when setting in Git Bash (e.g. E:/Projects/inference-engine/modelFiles/my-model.gguf). 3. Run the system # Option A – Start both with one script (recommended):\n./start.sh This will:\nBuild and start the Coordinator on http://localhost:1337 Build and start the Worker on http://localhost:3001 Press Ctrl+C to stop both.\nOption B – Run Coordinator and Worker separately:\nTerminal 1 – Coordinator:\ncd coordinator npm run build npm start Terminal 2 – Worker (from project root):\n# Optional: set model path (use forward slashes on Windows in Git Bash) # export MODEL_PATH=\u0026#34;E:/Projects/inference-engine/modelFiles/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf\u0026#34; cd worker cargo build cargo run 4. 
Test the API # Health checks:\ncurl http://localhost:1337/coordinator/health curl http://localhost:3001/worker/health curl http://localhost:1337/coordinator/health/workers Streaming inference (curl):\ncurl -N -X POST http://localhost:1337/coordinator/infer \\ -H \u0026#34;Content-Type: application/json\u0026#34; \\ -d \u0026#39;{\u0026#34;prompt\u0026#34;:\u0026#34;What is the capital of France?\u0026#34;,\u0026#34;model\u0026#34;:\u0026#34;tinyllama-1.1b\u0026#34;,\u0026#34;max_tokens\u0026#34;:1000}\u0026#39; Python test script:\npython test_inference.py \u0026#34;What is the capital of France?\u0026#34; 1000 Shell test script:\n./test_inference.sh \u0026#34;What is the capital of France?\u0026#34; 1000 5. Environment variables (optional) # Variable Where Description MODEL_PATH Worker Path to GGUF model file. Use forward slashes in Git Bash. Default: .../modelFiles/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf LIBCLANG_PATH Worker build (Windows) LLVM bin directory (for libclang), e.g. C:\\Program Files\\LLVM\\bin. NM_PATH Worker build (Windows) Full path to llvm-nm.exe. Can be derived from LIBCLANG_PATH (see start.sh). OBJCOPY_PATH Worker build (Windows) Full path to llvm-objcopy.exe, e.g. C:\\Program Files\\LLVM\\bin\\llvm-objcopy.exe. PORT Coordinator Coordinator port (default 1337). HOST Coordinator Coordinator host (default 0.0.0.0). WORKER_ID, WORKER_URL, COORDINATOR_URL Worker Override worker identity and URLs if running multiple workers or custom topology. 
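Besides curl and the bundled test scripts, a Python client can consume the streaming endpoint directly. The sketch below is hypothetical: stream_infer and parse_sse_tokens are illustrative names, not part of the repository. It assumes each event arrives as a single data: line carrying the documented token/finished object, and it reuses the default port 1337 and the tinyllama-1.1b model name from the curl example.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def parse_sse_tokens(lines: Iterable[str]) -> Iterator[str]:
    # Assumption: one data: line per event holding {token, finished} as JSON.
    # Blank lines, comment lines, and other SSE fields are skipped.
    for line in lines:
        line = line.strip()
        if not line.startswith('data:'):
            continue
        event = json.loads(line[len('data:'):].strip())
        if event.get('token'):
            yield event['token']
        if event.get('finished'):
            return


def stream_infer(prompt: str, model: str = 'tinyllama-1.1b', max_tokens: int = 1000,
                 base_url: str = 'http://localhost:1337') -> Iterator[str]:
    body = json.dumps({'prompt': prompt, 'model': model, 'max_tokens': max_tokens}).encode('utf-8')
    req = urllib.request.Request(
        base_url + '/coordinator/infer',
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )
    # urlopen raises urllib.error.HTTPError for the documented 400, 502, and 503 responses.
    with urllib.request.urlopen(req) as resp:
        # Iterating the response yields raw byte lines as they arrive.
        yield from parse_sse_tokens(raw.decode('utf-8') for raw in resp)
```

For example, looping over stream_infer('What is the capital of France?') and printing each token with print(tok, end='', flush=True) behaves like the curl -N call. Keeping the parser separate from the HTTP call lets it be tested without a running coordinator.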
Architecture # System Type: Distributed coordinator-worker architecture with stateless scheduling.\nCommunication: HTTP/SSE (Server-Sent Events) for streaming, REST for control plane.\nScaling Model: Horizontal scaling by adding workers; coordinator handles routing and admission control.\nThe system consists of three components, each with a single responsibility:\n┌─────────────────────────────────────────────────────────────────┐ │ CLIENT │ └─────────────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────────────┐ │ COORDINATOR │ │ • Admission control (O(1)) │ │ • Session tracking │ │ • Backpressure + streaming │ └─────────────────────────────────────────────────────────────────┘ │ ┌───────────────┼───────────────┐ ▼ ▼ ▼ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ │ WORKER 1 │ │ WORKER 2 │ │ WORKER N │ │ • KV Cache │ │ • KV Cache │ │ • KV Cache │ │ • Model Weights │ │ • Model Weights │ │ • Model Weights │ │ • Decode Loop │ │ • Decode Loop │ │ • Decode Loop │ └───────────────────┘ └───────────────────┘ └───────────────────┘ Components # Coordinator (TypeScript/Node.js)\nEntry point for all client requests Streams tokens from worker to client Applies backpressure — buffers fill, clients get dropped, not workers Tracks sessions for real-time capacity awareness Never touches model weights or KV cache Scheduler (Pure function)\nSelects which worker handles each request Scores workers by session count (60%) and KV cache usage (40%) Rejects early if system is at capacity (O(1) check) Worker (Rust)\nDesigned to own the model — weights, tokenizer, KV cache (LLM integration pending) Prefill: Tokenize prompt, build initial KV cache (infrastructure ready) Decode: Autoregressive token generation (infrastructure ready) Enforces local limits — max sessions, max KV per session No client awareness — just produces tokens into a bounded channel API Reference # POST /coordinator/infer # Start an 
inference request. Returns streaming tokens via Server-Sent Events.\nRequest:\n{ \u0026#34;prompt\u0026#34;: \u0026#34;string\u0026#34;, \u0026#34;model\u0026#34;: \u0026#34;string\u0026#34;, \u0026#34;max_tokens\u0026#34;: number } Response: text/event-stream\nEach SSE event:\n{ \u0026#34;token\u0026#34;: \u0026#34;string\u0026#34;, \u0026#34;finished\u0026#34;: boolean } Status Codes:\n200 - Success (streaming) 400 - Missing required fields 502 - Worker unreachable or failed 503 - System at capacity See protocol/inference.http.md for complete API documentation.\nConfiguration # Coordinator # Environment variables (optional):\nPORT - Server port (default: 1337) HOST - Server host (default: 0.0.0.0) Worker # Environment variables:\nMODEL_PATH - Path to GGUF model file. Use forward slashes (e.g. E:/path/to/model.gguf) when setting in Git Bash. Default: E:/Projects/inference-engine/modelFiles/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf Supported models: The worker uses the llama_cpp Rust crate (v0.3), which bundles llama.cpp. Default is TinyLlama 1.1B (TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF); use any quant e.g. Q4_K_M.gguf. Other supported architectures include Llama, Gemma 2, Phi, Mistral, etc. Gemma 3 is not yet supported by the bundled llama.cpp.\nWORKER_ID - Unique identifier (default: worker-1) WORKER_URL - Reachable URL for coordinator (default: http://localhost:3001) COORDINATOR_URL - Coordinator base URL (default: http://localhost:1337) System Limits # Per Worker:\n100 max sessions 512 MB max KV per session 8 GB total KV cache System-wide:\n1000 total sessions 64 GB total KV cache Project Structure # inference-engine/ ├── coordinator/ # TypeScript/Node.js coordinator service │ ├── src/ │ │ ├── server.ts # Express server setup │ │ ├── infer.ts # Inference request handling │ │ ├── scheduler.ts # Worker selection logic │ │ ├── health.ts # Health check endpoints │ │ └── ... 
│ └── package.json │ ├── worker/ # Rust worker service │ ├── src/ │ │ ├── main.rs # Entry point │ │ ├── model.rs # Model loading \u0026amp; inference │ │ ├── cache.rs # KV cache management │ │ ├── stream.rs # Token streaming │ │ └── ... │ └── Cargo.toml │ ├── docs/ # Detailed documentation │ ├── ARCHITECTURE.md # System design deep dive │ ├── COORDINATOR.md # Coordinator implementation │ ├── WORKER.md # Worker implementation │ ├── FAILURE_MODES.md # Failure handling strategies │ └── ... │ ├── protocol/ # API specifications │ └── inference.http.md │ ├── start.sh # Quick start script └── README.md Key Features # Distributed Architecture: Framework for scaling inference across multiple workers Memory-Aware Admission Control: O(1) capacity checks prevent overload Backpressure: Slow clients are dropped, not workers Failure Resilience: Automatic retries for prefill failures Session Management: KV cache lifecycle infrastructure with TTL-based cleanup Real-time Health Tracking: Heartbeat-based worker monitoring Streaming Infrastructure: Server-Sent Events with bounded channels for backpressure Keywords: distributed inference, LLM serving, KV cache management, backpressure, admission control, worker scheduling, session management, horizontal scaling, memory-aware load balancing, token streaming, Server-Sent Events, coordinator-worker pattern, failure resilience, health monitoring, heartbeat protocol\nDocumentation # For detailed information, see:\nSystem Overview - High-level design and problem statement Architecture - Deep dive into system design Coordinator - Coordinator implementation details Worker - Worker implementation details Failure Modes - Failure handling strategies Streaming - Token streaming and backpressure KV Cache - KV cache management Scheduler - Worker selection algorithm Observability - Metrics and monitoring Current Status # Project Phase: Infrastructure complete, LLM integration pending.\nThis project provides the infrastructure layer for distributed 
LLM inference:\n✅ Implemented:\nCoordinator with admission control and session tracking Worker framework with health monitoring and heartbeat Scheduler for worker selection Streaming infrastructure with backpressure Session management and KV cache lifecycle (infrastructure) Failure handling and retry logic 🚧 Pending:\nLLM model integration (model loading, tokenization, inference) Actual KV cache implementation tied to a specific model backend Token generation logic Integration Requirements: To complete LLM integration, implement model loading, tokenization, and inference logic in the worker\u0026rsquo;s model.rs module. The infrastructure for session management, streaming, and KV cache lifecycle is ready.\nTroubleshooting # Worker fails to start # Check ports: Ensure port 3001 is not in use Verify Rust installation: rustc --version should show 1.70+ Check build errors: Review cargo build output for dependency issues Coordinator returns 503 \u0026ldquo;System at capacity\u0026rdquo; # Check worker health: curl http://localhost:3001/worker/health Verify worker registration: curl http://localhost:1337/coordinator/health/workers Check system limits: Review session and KV cache limits Ensure worker is running: Worker must be running and sending heartbeats Coordinator can\u0026rsquo;t reach worker # Verify worker URL: Check WORKER_URL environment variable matches actual worker address Check network: Ensure coordinator can reach worker on the specified port Check heartbeat: Worker should be sending heartbeats every 10 seconds Development # Building # Coordinator:\ncd coordinator npm install npm run build Worker:\ncd worker cargo build --release Testing # See individual component documentation for testing instructions.\nContributing # Contributions welcome! 
Please read the architecture documentation before making significant changes.\nFor AI/LLM Parsing # Project Summary: Distributed inference framework for LLM serving with coordinator-worker architecture, memory-aware admission control, and backpressure handling.\nPrimary Technologies: TypeScript, Node.js, Rust, Express, Axum, Server-Sent Events.\nArchitecture Pattern: Coordinator-Worker distributed system with stateless scheduler.\nCore Concepts: KV cache management, session lifecycle, admission control, worker scheduling, backpressure, heartbeat monitoring, failure recovery.\nCurrent State: Infrastructure layer complete; LLM model integration pending.\nRelated Documentation: See docs/ directory for detailed architecture, failure modes, streaming, and component-specific documentation.\n","date":"7 January 2026","externalUrl":null,"permalink":"/projects/inference-engine/","section":"Projects","summary":"","title":"Inference Engine","type":"projects"},{"content":"","date":"7 January 2026","externalUrl":null,"permalink":"/tags/nodejs/","section":"Tags","summary":"","title":"Nodejs","type":"tags"},{"content":"","date":"5 January 2025","externalUrl":null,"permalink":"/tags/data-analytics/","section":"Tags","summary":"","title":"Data-Analytics","type":"tags"},{"content":"","date":"5 January 2025","externalUrl":null,"permalink":"/tags/fastapi/","section":"Tags","summary":"","title":"Fastapi","type":"tags"},{"content":"","date":"5 January 2025","externalUrl":null,"permalink":"/tags/machine-learning/","section":"Tags","summary":"","title":"Machine-Learning","type":"tags"},{"content":"Player similarity analysis built on the FIFA 22 dataset. 
Users define a custom player profile — position and attribute ratings — and the system returns the most comparable real players using cosine similarity.\nIncludes a training notebook (training.ipynb) and a FastAPI app (app.py) backed by players_22.csv.\nTL;DR # What this is: A sports analytics project that lets users build a player profile and find similar FIFA 22 players based on seven comparable attributes.\nWhat this isn\u0026rsquo;t: A production scouting platform. Results are useful but would benefit from additional filters and features.\nStack: Python, pandas, scikit-learn, FastAPI.\nProblem Statement # Let users build a player profile from attributes they care about, then surface similar real players from the dataset. That similarity layer can support further use cases — team formation ideas, recruitment comparisons, or game-like player discovery.\nDataset # Source: Kaggle — FIFA 22 player data Selection: Multiple datasets were evaluated for completeness and fit; FIFA 22 was chosen for a simulation-game-like experience accessible to a general audience Earlier exploration: Data from fbref and Transfermarkt was considered, but scraping added complexity without matching the intended UX Data Cleaning and Preprocessing # Removed missing and inconsistent entries Normalized and standardized numerical values Handled outliers and imbalanced data where needed Mapped detailed position codes into broader roles (attacker, midfielder, defender); goalkeepers are excluded from similarity search Feature Selection # Seven features drive comparison:\nFeature Dataset column Position player_positions Speed pace Passing passing Dribbling dribbling Defense defending Physic physic Shooting shooting These were chosen because they directly influence player comparison and are easy for a general user to fill in.\nEncoding and Similarity # One-hot encoding for position — preserves category uniqueness without implying order between positions.\nMinMax scaling for numerical attributes 
before combining feature vectors.\nCosine similarity to rank players:\nMeasures orientation rather than magnitude Works well with high-dimensional sparse data Reduces skew from differing feature scales API # The FastAPI app loads players_22.csv, builds a similarity matrix at startup, and exposes:\nPOST /find_similar_players/ Request body:\n{ \u0026#34;player_positions\u0026#34;: \u0026#34;midfielder\u0026#34;, \u0026#34;speed\u0026#34;: 75, \u0026#34;physic\u0026#34;: 70, \u0026#34;defense\u0026#34;: 65, \u0026#34;dribbling\u0026#34;: 80, \u0026#34;shooting\u0026#34;: 72, \u0026#34;passing\u0026#34;: 85 } Response:\n{ \u0026#34;similar_players\u0026#34;: [\u0026#34;Player A\u0026#34;, \u0026#34;Player B\u0026#34;, \u0026#34;Player C\u0026#34;] } Returns the top 3 closest matches by cosine similarity.\nProject Files # File Purpose app.py FastAPI service for similarity queries training.ipynb Exploratory analysis and model development players_22.csv FIFA 22 player dataset Conclusion # The project demonstrates cosine similarity for sports analytics — useful for team formation, recruitment comparison, and performance benchmarking.\nFuture work: Add more features and filters on top of similarity search to narrow results when raw matches are not specific enough.\n","date":"5 January 2025","externalUrl":null,"permalink":"/projects/similar-player-finder/","section":"Projects","summary":"","title":"Similar Player Finder","type":"projects"},{"content":"","date":"5 January 2025","externalUrl":null,"permalink":"/tags/sports/","section":"Tags","summary":"","title":"Sports","type":"tags"},{"content":"","date":"4 December 2024","externalUrl":null,"permalink":"/tags/migration/","section":"Tags","summary":"","title":"Migration","type":"tags"},{"content":"Easy Migration is a MongoDB plugin for writing and managing migrations. 
It gives you a clear, structured workflow for database changes — pass a URI, your collections, and a callback that runs per record.\nPublished on npm as mongodbplugin.\nTL;DR # What this is: A Node.js library that iterates over a primary MongoDB collection and runs your migration logic on each document, with access to related collections and a built-in logging helper.\nWhat this isn\u0026rsquo;t: A full migration framework with versioning or rollback (rollback support is planned).\nInstall: npm i mongodbplugin\nFeatures # Intuitive API — simple, documented methods for defining migrations Customizable logic — bring your own per-record callback to fit application needs Error handling — structured error reporting and logging for debugging Built-in logging — writeLog persists migration output to a migrationLogs folder Rollback support — planned Quick Start # npm i mongodbplugin Usage # You need four things:\nYour MongoDB URI A primary collection to read migration data from The collections involved in the migration (primary plus any secondary/related collections) A callback with the logic to apply per record const processMigration = require(\u0026#34;mongodbplugin\u0026#34;); const mongoDB_URI = process.env.DB_URL; const primaryCollection = require(\u0026#34;./models/primaryCollection\u0026#34;); const secondaryCollection = require(\u0026#34;./models/secondaryCollection\u0026#34;); const ternaryCollection = require(\u0026#34;./models/ternaryCollection\u0026#34;); const callbackFn = require(\u0026#34;./migrations/updateNewFieldsInDB\u0026#34;); // callbackFn runs once per record inside a loop processMigration( { uri: mongoDB_URI, options: { // additional MongoDB connection options }, }, primaryCollection, [primaryCollection, secondaryCollection, ternaryCollection], callbackFn ); Your callback receives the current document, the collections you passed in, and a writeLog helper:\n// (data, primaryCollection, secondaryCollection, ternaryCollection, writeLog) function 
migrate(data, primary, secondary, ternary, writeLog) { writeLog(\u0026#34;update\u0026#34;, `Migrating document ${data._id}`); // your migration logic here } Call writeLog(action, logContent) anywhere inside the callback. Logs are saved under migrationLogs/.\nComing Soon # Improved overall performance Improved logging capabilities Rollback support ","date":"4 December 2024","externalUrl":null,"permalink":"/projects/easy-migration/","section":"Projects","summary":"","title":"Mongo Easy Migration","type":"projects"},{"content":"","date":"4 December 2024","externalUrl":null,"permalink":"/tags/mongodb/","section":"Tags","summary":"","title":"Mongodb","type":"tags"},{"content":"","date":"4 December 2024","externalUrl":null,"permalink":"/tags/npm/","section":"Tags","summary":"","title":"Npm","type":"tags"},{"content":" Why should you even bother with UX as an engineer? Isn\u0026rsquo;t that what designers are paid for? # Today, user experience is front and center, and it is one of the biggest factors that define a product\u0026rsquo;s value. A product that is slow to load, with constant layout shifts and unstable elements, hurts the user\u0026rsquo;s overall experience. To address these problems, we have to monitor and optimize certain metrics throughout the product\u0026rsquo;s lifecycle. 
To monitor these metrics, we use Core Web Vitals.\nWhat are Web Vitals # Web Vitals is an initiative by Google to provide unified guidance for quality signals essential to delivering a great user experience on the web.\nGoogle provides several tools, usable directly from your Chromium browser or as external tools, that help identify the metrics that matter most. These are known as the Core Web Vitals.\nThe Core Web Vitals include:\nLCP (Largest Contentful Paint): the time taken to render the largest image or text block visible in the viewport, measured from when the user navigated to the page.\nCLS (Cumulative Layout Shift): the largest burst of unexpected layout shifts that occurs during the entire lifecycle of the page.\nINP (Interaction to Next Paint): INP assesses the page\u0026rsquo;s overall responsiveness to user input (not how the layout adapts to screen sizes) by observing the latency of clicks, taps, key presses and other interactions. The slowest interaction, ignoring outliers on pages with many interactions, defines the value of INP.\nLet\u0026rsquo;s discuss two of these metrics, LCP and CLS, in detail.\nBut how do you calculate these web vitals?🧐 # For a regular web app, you can use Chrome’s Lighthouse tool to calculate web vitals values. For apps that run inside an iframe or another sandboxed environment in the browser, you can use the ‘web-vitals’ JavaScript library instead.\nLargest Contentful Paint (LCP)? # First, we need to understand what LCP is. LCP reports the render time of the largest image or text block visible in the viewport, relative to when the user first navigates to the page. A good LCP score is 2.5 seconds or less. 
To make sure you hit this target for most of your users, measure the 75th percentile of page loads, segmented across mobile and desktop devices.\nElements considered for LCP\n\u0026lt;image\u0026gt; (element inside SVG) \u0026lt;img\u0026gt; \u0026lt;video\u0026gt; url() (CSS background images) block-level elements containing text nodes or other inline-level text elements Optimizing for LCP # While there\u0026rsquo;s no single approach, these fundamental techniques can be useful:\nUsing Lazy Loading: Delays downloading media that is not in the viewport, so the network is free for what the user sees first. Avoid lazy-loading the LCP element itself, since that delays it.\nUtilizing a CDN (Content Delivery Network) for static assets significantly improves LCP by reducing the time it takes to deliver assets. CDNs distribute content across multiple servers, so assets load from a server geographically closer to the user, reducing latency. WebP images also help, since they are usually smaller than equivalent JPEG or PNG files.\nUse Hashes in File Names with Cache-Control By using hashed file names with cache-control headers, you ensure that assets are only re-downloaded when they change. This reduces unnecessary requests and ensures that users always get the most up-to-date resources, improving LCP by minimizing revalidation. Big companies like Shopify use this to deliver assets quickly while making sure only the latest version reaches the user. Example: if you serve a style.css file from your server, name it style\u0026lt;some_hash\u0026gt;.css instead, where the hash changes whenever the file changes. Each update then produces a new file name, so the browser fetches the new file instead of using the previously cached one (typical React/Next.js builds already generate hashed file names for static assets automatically).\nInvestigate and Optimize Network Requests Analyzing the Network panel in your browser\u0026rsquo;s DevTools helps identify slow or redundant requests. 
Reducing or optimizing these requests can decrease the time needed for the browser to render the page\u0026rsquo;s largest content, directly improving LCP.\nRemove Unnecessary Frontend Requests Removing or deferring non-essential frontend requests reduces the overall load on the network, allowing critical resources to load faster. This prioritization enhances LCP by ensuring that the main content is rendered promptly.\nOptimize Database Queries Streamlining database queries can drastically reduce server response times. Efficient queries let the server deliver the necessary data faster, which contributes to a quicker LCP because the page can render its largest element sooner.\nInline Critical CSS and JavaScript Inlining critical CSS and JavaScript within the HTML reduces the number of render-blocking resources. With fewer external requests, the browser can render the page faster, improving LCP.\nRun Background Tasks for Non-Essential Operations Ensuring that the server focuses solely on delivering the web page while offloading other tasks to background processes prevents delays in page rendering. This prioritization ensures that LCP is not held back by non-critical server tasks.\nCumulative Layout Shift (CLS)? # First, we need to understand what CLS is. CLS is a measure of the largest burst of layout shift scores for every unexpected layout shift that occurs during the entire lifecycle of a page.\nA layout shift occurs any time a visible element changes its position from one rendered frame to the next.\nFor a good user experience, sites should strive to have a CLS score of 0.1 or less. 
To ensure you\u0026rsquo;re hitting this target for most of your users, a good threshold to measure is the 75th percentile of page loads, segmented across mobile and desktop devices.\nSimply put, a layout shift happens when elements such as divs jump unexpectedly. To calculate the layout shift score, the browser looks at the viewport size and the movement of unstable elements in the viewport between two rendered frames. The layout shift score is the product of two measures of that movement: the impact fraction and the distance fraction.\nWhat causes these movements in divs👁️👁️:\nDynamic pages take some time to load their content, and the div that receives the content often does not have a fixed size. Skeletons help, but they still cause a layout shift if they are not exactly the same size as the final div. Screen size changes also affect this in different ways.\nOptimizing for CLS # In my experience so far, there is no fixed checklist for CLS. What you need to do is iteratively reduce how much the layout changes between frames.\nFirst-principles techniques you can use:\nUsing a skeleton: make it as close as possible to the final layout. Your design should also give the app\u0026rsquo;s first page a fixed layout, at least within the viewport, so that you can build a much more accurate skeleton for it.\nUsing loaders: use these carefully, as they will cause a layout shift; they only tell the user that something is loading. 
A smart way to use a loader is on secondary pages or states. For example, when clicking a button leads to another page or state that needs an API call before it can render, don\u0026rsquo;t show a skeleton on that secondary page. Instead, show a loader on the button you pressed, keep the loading state on the current page, make the API call, and redirect the user only once the data is ready. Since the final page renders directly with its data, there is no layout shift.\nMake sure no element on the page is unstable or moves with the viewport.\nThe size (aspect ratio) of images and other media should stay constant. If it changes over time, or once the media has fully loaded, it adds to the layout shift, so set explicit width and height (or an aspect ratio) to let the browser reserve the space up front.\nAdditional Tips # Optimizing web vitals is an iterative and heavily context-dependent process. You can also use tools like Mixpanel to learn which pages are visited most often, and optimize your APIs and design systems with that in mind.\nAvoid very frequent deployments of frontend applications. You usually serve your application pages from a CDN, and every deployment invalidates that cache. The new assets then have to come from your server, and this extra trip to the server can increase LCP. 
Reasoning: even with hashed files, after every deployment the first request for the new static files is slower than a cached one, and every user has to download the latest files. A more planned product lifecycle lets you avoid very frequent deployments and save that time.\nTest your pages on a slow network (you can throttle the network directly in Chrome DevTools). This helps you discover key breakages in your app that usually get missed because we develop on fast connections.\nKeep your build artifacts as small as possible; a lighter page always helps.\nConclusion # Keeping these metrics within their thresholds is essential, both for your application’s SEO, so that it ranks well, and for the general user experience.\nIf you develop apps for platforms like the App Store, Play Store, or Shopify App Store, you must be especially careful here, as these platforms consider such metrics not only while reviewing your app for publishing but also when awarding badges and featuring apps.\nThere is a lot more to discuss about these two metrics, but that needs a separate blog of its own, so until then, stay tuned 🤗\n","date":"14 August 2024","externalUrl":null,"permalink":"/blogs/improving-web-vitals/","section":"Blogs","summary":"","title":"Improving UX - Web Vitals Story","type":"blogs"},{"content":"Hi! I\u0026rsquo;m Ishan. 
Literally, when I was born, my parents wondered what the most powerful name for their child could be. They looked for the intersection of four Hindu deities -\u0026gt; the Sun, Lord Shiva, Goddess Parvati \u0026amp; Lord Vishnu, and thought: great, \u0026ldquo;Ishan\u0026rdquo; can mean any of these at any time, so this will be best for our child.\nBorn and raised in Kota, Rajasthan, I completed my schooling at Bakshi\u0026rsquo;s Spring Dales School (2019) and graduated with a degree in Computer Science from RTU (2023).\nI am a die-hard fan of FC Barcelona and the Indian Cricket Team. Whether consciously or not, I’ve always tried to model myself after Lionel Messi, Sachin Tendulkar, and Rahul Dravid, striving to be supremely calm under immense pressure, fiercely consistent, and the best at what I do.\nBeyond tech, I’ve always loved testing my limits in different arenas. I used to write original plays and dramas, and even represented my district in debate competitions and the state of Rajasthan in a National-Level quiz. That competitive drive eventually spilled over into entrepreneurship, where my startup was selected as one of just 12 across India for T-Hub’s exclusive 1:1 mentorship program. When I\u0026rsquo;m not building things, I’m usually reading fiction, learning random trivia, or meeting new people.\nThis is my personal space where I share my thoughts, projects, and experiences.\nCareer # Skailama # Software Engineer -\u0026gt; Engineering Lead | June 2023 - April 2026 I build products end-to-end. Over the past few years I\u0026rsquo;ve worked across high-performance web applications, backend systems, AI-powered automation, and SaaS products. I\u0026rsquo;ve improved Core Web Vitals by 70%, optimized critical systems with Rust, automated internal operations using AI, modernized production platforms, and helped take products from zero to profitability. 
I enjoy solving difficult engineering problems where product, performance, and business outcomes intersect.\nGrayswipe # Co-Founder | October 2021 - August 2024 Built and launched a B2B SaaS marketplace for India\u0026rsquo;s textile industry, owning product architecture and full-stack development from day one. Scaled the platform to 50+ manufacturers representing over ₹50 crore in annual turnover, validating early product–market fit. Led a team of engineers while driving technical strategy, delivery, and product execution. Selected among the top 12 startups (out of 375 nationwide) for the T-Hub RubriX Cohort.\nEarly Experience \u0026amp; Internships # Software Engineer / Freelance Built and shipped a promotional game featured during Flipkart Big Billion Days, integrating directly with Flipkart systems and supporting a large-scale consumer event. Developed blockchain analytics software on the Polygon ecosystem, contributing to the company securing a Polygon fellowship. Implemented distributed authentication using Shamir\u0026rsquo;s Secret Sharing, enabling secure trust distribution and resilient key management.\n","externalUrl":null,"permalink":"/about/","section":"Ishan","summary":"","title":"About Me","type":"page"},{"content":"","externalUrl":null,"permalink":"/authors/","section":"Authors","summary":"","title":"Authors","type":"authors"},{"content":"","externalUrl":null,"permalink":"/series/","section":"Series","summary":"","title":"Series","type":"series"}]