Hi, I’m Jun. I’m a data warehouse developer lost in vibe coding (Claude Code and OpenCode) these days. :)
I'm working for Snowflake, focusing on building the next-generation cloud data warehouse system.
Before that, I built multimodal AI systems at Microsoft, specifically on media AI. I also spent time hacking on Databend, an open-source cloud data warehouse.
I’ve accumulated 12+ years of experience on different systems — just in time for AI to make most of it obsolete.
Oh well. Embrace the AI era. 🤖
Cloud Computing Multimodal Processing FFmpeg AI/ML C/C++ Java JavaScript Python Go Rust Distributed Systems Databases OLAP
Projects
HelloRag
A small, opinionated RAG playground built as a ramp-up project. It wires together document chunking, embedding, and retrieval using FAISS for vector similarity search and Postgres for chunk storage. Not a production system—just a hands-on way to understand how RAG systems actually fit together.
Document chunking with overlap for better context retention
Embedding generation using OpenAI embeddings API
Vector similarity search with FAISS (HNSW index)
Python FAISS HNSW PostgreSQL Vector Search RAG
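To illustrate the overlap idea from the list above, here is a minimal sketch of character-based chunking. It is not the actual HelloRag code: the function name `chunk_text`, the character-based splitting, and the default sizes are all assumptions made for illustration.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks.

    Consecutive chunks share `overlap` characters, so a sentence cut at a
    chunk boundary still appears whole in at least one neighbor.
    (Hypothetical helper; names and defaults are illustrative only.)
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each chunk would then be embedded and stored. A larger overlap keeps more context per chunk at the cost of more chunks to embed and index.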
Snowflake Query Optimizations
I worked on a range of query optimizations for Snowflake, including rule-based plan optimizations, runtime optimizations, and data encoding optimizations. These enhancements improved overall query performance and contributed to better TPC-DS benchmark results. Essentially, I tackled optimizations from both a compiler and a data encoding perspective to make the engine smarter and faster.
Runtime filter derivation and pushdown
Similar sub-expression merging for batching aggregations
Similar CTE extraction/elimination
FSST encoding improvements for better compression and access speed
C++ Java Snowflake Query Optimizer FSST Encoding
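For readers unfamiliar with runtime filters, here is a toy sketch of the general idea: the build side of a join yields a small filter (a key range plus a key set) that is pushed down to the probe-side scan so whole partitions can be skipped. This is not Snowflake's implementation. Every name here (`RuntimeFilter`, `Partition`, `derive_runtime_filter`, `filtered_scan`) is hypothetical, and it assumes an inner equi-join on integer keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeFilter:
    """Filter derived from the join's build side (hypothetical structure)."""
    min_key: int
    max_key: int
    keys: frozenset


@dataclass(frozen=True)
class Partition:
    """A probe-side partition with min/max metadata (hypothetical structure)."""
    min_key: int
    max_key: int
    keys: tuple


def derive_runtime_filter(build_keys):
    """Build the filter once the build side has been fully read."""
    keys = frozenset(build_keys)
    if not keys:
        return None  # empty build side: an inner join cannot match anything
    return RuntimeFilter(min(keys), max(keys), keys)


def filtered_scan(partitions, runtime_filter):
    """Scan the probe side, skipping partitions and rows the filter rules out."""
    matched = []
    if runtime_filter is None:
        return matched
    for part in partitions:
        # Partition-level pruning: skip if the key ranges cannot overlap.
        if part.max_key < runtime_filter.min_key or part.min_key > runtime_filter.max_key:
            continue
        # Row-level filtering: keep only keys present on the build side.
        matched.extend(k for k in part.keys if k in runtime_filter.keys)
    return matched
```

In a real engine the key set would typically be a compact probabilistic structure rather than an exact set, and the pruning would happen inside the storage scan.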
Snowflake Differential Privacy Aggregations
I designed and implemented aggregation constraints at Snowflake to support differential privacy. This feature lets us run analytics while preserving user privacy, and yes, we even filed a patent for it! It’s all about making sure we can get insights without compromising data confidentiality.
Java Snowflake's Aggregation Policy
Databend - Open-Source Data Warehouse
As a hobby contributor to the Databend open-source cloud data warehouse, I added features like bloom filter indexing for micro-partitions, top-k pushdown in aggregations, and a new time interval data type. It was a fun way to dive into an open-source project and improve its performance and functionality.
Bloom filter indexing for micro-partitions to speed up query filtering
Top-k pushdown in aggregation queries for better performance
New time interval data type for enhanced time-based operations
Rust Databend Bloom Filters OLAP
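As a rough illustration of how a bloom filter index can speed up filtering, the sketch below keeps one small filter per partition and uses it to skip partitions that cannot contain a value. It is written in Python for brevity and is not Databend's actual (Rust) implementation. The class, the function `partitions_to_scan`, the bit count, and the SHA-256-based hashing are all assumptions.

```python
import hashlib


class BloomFilter:
    """Tiny bloom filter backed by a Python int used as a bitset (illustrative)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, value: str):
        # Derive several bit positions by salting the hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def partitions_to_scan(index: dict, value: str) -> list:
    """Return the partitions whose filter cannot rule out `value`."""
    return [name for name, bloom in index.items() if bloom.might_contain(value)]
```

A query like `WHERE name = 'alice'` would consult the per-partition filters first and read only the partitions returned. False positives cost an unnecessary read, but a bloom filter never produces false negatives, so no matching partition is skipped.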
Azure Live Video Analytics (Microsoft Media Services)
As the lead of the service's data plane, I oversaw the development of the core infrastructure, which covered the pipeline from ingestion to event detection:
The ingestion component, an RTSP/RTP-based video ingestion service
The video processing service, covering video extraction and transcoding
The AI integration service for video/image analytics
C Azure Media Services RTSP/RTP FFmpeg AI Video Analytics
Microsoft Stream
Led front-end development for the corporate video-sharing service as the website’s core developer.
TypeScript Cloud Service AngularJS
Convoy A/B Testing Parameters Store
Parameter store for A/B testing experiments at Convoy
Cloud Service
GoRtmp – RTMP Library in Go
An RTMP (Real-Time Messaging Protocol) library written in Go. It’s designed to be a lightweight, straightforward implementation of the RTMP protocol, making it easy to integrate into streaming applications.