About Me

Hi, I’m Jun. I’m a data warehouse developer lost in vibe coding (Claude Code and OpenCode) these days. :)

I’m working for Snowflake, focusing on building the next-generation cloud data warehouse system. Before that, I built multimodal AI systems at Microsoft, specifically for media AI. I also spent time hacking on Databend, an open-source cloud data warehouse.

I’ve accumulated 12+ years of experience on different systems — just in time for AI to make most of it obsolete.

Oh well. Embrace the AI era. 🤖

Cloud Computing Multimodal Processing FFmpeg AI/ML C/C++ Java JavaScript Python Go Rust Distributed Systems Databases OLAP

Projects

HelloRag

A small, opinionated RAG playground built as a ramp-up project. It wires together document chunking, embedding, and retrieval using FAISS for vector similarity search and Postgres for chunk storage. Not a production system—just a hands-on way to understand how RAG systems actually fit together.

  • Document chunking with overlap for better context retention
  • Embedding generation using OpenAI embeddings API
  • Vector similarity search with FAISS (HNSW index)
Python FAISS HNSW PostgreSQL Vector Search RAG

Snowflake Query Optimizations

I worked on a range of query optimizations for Snowflake, including rule-based plan, runtime, and data encoding optimizations. These enhancements improved overall query performance and contributed to better TPC-DS benchmark results. Essentially, I tackled optimizations from both a compiler and a data encoding perspective to make the engine smarter and faster.

  • Runtime filter derivation and pushdown
  • Similar sub-expression merging for batching aggregations
  • Similar CTE extraction/elimination
  • FSST encoding improvements for better compression and access speed
C++ Java Snowflake Query Optimizer FSST Encoding

Snowflake Differential Privacy Aggregations

I designed and implemented aggregation constraints at Snowflake to support differential privacy. This feature lets us run analytics while preserving user privacy, and yes, we even filed a patent for it! It’s all about making sure we can get insights without compromising data confidentiality.

Java Snowflake's Aggregation Policy

Databend - Open-Source Data Warehouse

As a hobby contributor to the Databend open-source cloud data warehouse, I added features like bloom filter indexing for micro-partitions, top-k pushdown in aggregations, and a new time interval data type. It was a fun way to dive into an open-source project and improve its performance and functionality.

  • Bloom filter indexing for micro-partitions to speed up query filtering
  • Top-k pushdown in aggregation queries for better performance
  • New time interval data type for enhanced time-based operations
Rust Databend Bloom Filters OLAP

Azure Live Video Analytics (Microsoft Media Services)

As the lead of the service’s data plane, I oversaw development of the core infrastructure, which covered the pipeline from ingestion to event detection.

  • The ingestion component, an RTSP/RTP protocol-based video ingestion service
  • The video processing service, covering video extraction and transcoding
  • The AI integration service for video/image analytics
C Azure Media Services RTSP/RTP FFmpeg AI Video Analytics

Microsoft Stream

Led front-end development for the corporate video-sharing service as the website’s core developer.

TypeScript Cloud Service AngularJS

Convoy A/B Testing Parameters Store

A parameter store for A/B testing experiments at Convoy.

Cloud Service

GoRtmp – RTMP Library in Go

An RTMP (Real-Time Messaging Protocol) library written in Go. It’s designed to be a lightweight, straightforward implementation of the RTMP protocol, making it easy to integrate into streaming applications.

Go RTMP Streaming

FFmpeg Contributions

A few random contributions to FFmpeg.

C FFmpeg

Experience

Snowflake

05/2022 — Present

Senior SDE, DB engine team

Convoy

04/2021 — 05/2022

Senior SDE, Data infra team

Amazon

10/2014 — 04/2015

SDE, Amazon shopping cart team

Displays2Go

01/2013 — 10/2014

SDE, Data infra team

Education

Northeastern University

Master of Computer Engineering, Boston, US

Nanjing University

Bachelor of Electrical Engineering, Nanjing, CN

Contact