Sunil Pai
PhD Defense, Electrical Engineering
Advisor: Olav Solgaard
Co-advisors: David AB Miller, Shanhui Fan
March 28, 2022
based on measurements from experiment
Photonics: "engineering light"
Fiber optics
Our photonic integrated circuit
Integrated photonics:
Si waveguides
made in blender
guides light
(like a wire guides electricity)
Programmable photonics allows us to shape and manipulate light programmatically.
Programmable bulk optics
Programmable chip-scale photonics
Some projectors use these!
Chip-scale light manipulation
3D holography / augmented reality
Also does math (matrix-vector multiply)
8 mm long
Bulk optical computing
Spatial light modulator
miniaturize
Electronoobs, YouTube
Holoeye Photonics
H tree circuit
Photonic binary tree in H-tree config can be used to generate a 2D image
Miller, Attojoule Optoelectronics..., JLT, 2016
Machine learning / AI (this talk)
Cryptography / blockchain
Quantum computing
LIDAR (self-driving cars)
Telecom (LiFi, optical phased arrays)
Imaging and biochemical sensing
Augmented / virtual reality
All of these applications follow this model:
This talk: how can we perform these applications in a scalable manner in the presence of error?
Photonic matrix multiply
Sensing and communications
Machine learning / AI * (this talk)
Cryptography / blockchain *
Quantum computing
Our experimental focus *
LIDAR (self-driving cars)
Telecom (LiFi, optical phased arrays)
Imaging and biochemical sensing
Augmented / virtual reality
How does this tech work?
What do we have in the lab?
Waveguides guide / confine light
Light is a wave
1D
2D
source
Oxide
Silicon
2D waveguide simulation solving Maxwell's equations
(using 2D FDFD)
Field value
Degrees of freedom:
\(\theta(x, t) = k_x x - \omega t\)
Field and power:
Phasor: \(\sqrt{p}e^{i\theta}\)
\(\Psi(y, z)\)
\(\sqrt{p}e^{i\theta}\)
Degrees of freedom:
Phasor field: \(x = \sqrt{p}e^{i\theta}\)
Assumptions:
Coherent, single wavelength (\(\lambda\))
Single mode waveguide
3D
\(x = \sqrt{p}e^{i\theta}\)
monitor
source
Oxide
Silicon
"real part"
50/50 coupler and splitter
Oxide
Silicon
Differential phase \(\theta\)
Interaction
Coherent light interference or coupling depends on differential phase \(\theta\)
Same power for modes \(\bm{x}\) and \(\bm{y}\).
\(\boldsymbol{y}\)
\(\boldsymbol{x}\)
\(\color{darkblue}\begin{bmatrix}y_1 \\ y_2 \end{bmatrix} \color{black} = \frac{1}{\sqrt{2}}\begin{bmatrix}1 & i \\ i & 1 \end{bmatrix}\color{darkred}\begin{bmatrix}x_1 \\ x_2 \end{bmatrix}\)
\(\color{darkred} \begin{bmatrix}x_1 \\ x_2 \end{bmatrix} = \frac{1}{\sqrt{2}}\begin{bmatrix} e^{i\theta} \\ 1 \end{bmatrix}\)
Phasor:
Device operator:
\(\boldsymbol{y}\)
\(\boldsymbol{x}\)
Split ratio: \(\sin^2(\Delta \beta L_{\mathrm{int}})\)
\(L_{\mathrm{int}}\)
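A minimal numerical sketch of the interference described above, assuming the 50/50 device operator and the equal-power input phasor exactly as written on this slide. The function name `coupler_output_powers` is illustrative and not from the talk.

```python
import numpy as np

# 50/50 coupler operator from the slide: (1/sqrt(2)) [[1, i], [i, 1]]
B = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)

def coupler_output_powers(theta):
    """Output powers for two equal-power inputs with differential phase theta."""
    x = np.array([np.exp(1j * theta), 1]) / np.sqrt(2)  # input phasors, total power 1
    y = B @ x
    return np.abs(y) ** 2

for theta in (0.0, np.pi / 2, -np.pi / 2):
    print(f"theta = {theta:+.3f}: powers = {coupler_output_powers(theta)}")
```

With this operator the output powers work out to \((1 + \sin\theta)/2\) and \((1 - \sin\theta)/2\), so the differential phase alone decides which output port the light exits.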
Thermal (resistive heating)
Oxide
Silicon
TiN
−
+
Heat
50 mW
Silicon index increases with heat
\(\sqrt{p} e^{i\theta} \to \sqrt{p} e^{i(\theta + \color{darkgreen}\delta \theta \color{black})}\)
\(\color{darkgreen}\delta \theta \color{black} \propto \Delta T L_{\mathrm{PS}}\)
Apply voltage, achieve phase shift
Phase changes but power stays the same
Set \(\theta, \phi\)
\(\boldsymbol{y} = \left(e^{i\phi}\sin \frac{\theta}{2}, \cos \frac{\theta}{2}\right)\)
Assume \(\bm{x} = (0, 1)\)
(MZI = Mach-Zehnder interferometer)
\(\boldsymbol{y}\)
\(\boldsymbol{x}\)
\(\theta\)
\(\phi\)
\(\bm{y} = U_2 \bm{x}\)
\(2 \times 2\)
\(2\)-vector
\(2\)-vector
An MZI controls where light goes.
Sweep \(\theta, \phi\) to "nullify" top power
\(\boldsymbol{y}\)
\(\boldsymbol{x}\)
\(\theta\)
\(\phi\)
We can deduce \(\boldsymbol{x} = \left(e^{-i\phi}\sin \frac{\theta}{2}, \cos \frac{\theta}{2}\right)\)
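The nullification idea can be sketched numerically. This assumes the analyzer-orientation MZI matrix quoted elsewhere in this talk, and it computes the nulling phases in closed form instead of sweeping them as an experiment would. Which physical port counts as "top" depends on the diagram labeling, so the sketch only checks that one output port is nulled (the second entry of \(\bm{y}\) under this matrix convention). The helper names are illustrative.

```python
import numpy as np

def mzi(theta, phi):
    """Analyzer-orientation MZI operator (matrix convention assumed from the talk)."""
    s, c = np.sin(theta / 2), np.cos(theta / 2)
    return 1j * np.array([[np.exp(1j * phi) * s, c],
                          [np.exp(1j * phi) * c, -s]])

def nulling_settings(x):
    """Phases (theta, phi) that send all power of input x into a single output port."""
    x1, x2 = x
    theta = 2 * np.arctan2(abs(x1), abs(x2))
    phi = np.angle(x2) - np.angle(x1)
    return theta, phi

rng = np.random.default_rng(0)
x = rng.normal(size=2) + 1j * rng.normal(size=2)
x /= np.linalg.norm(x)                      # unknown input, unit power

theta, phi = nulling_settings(x)
y = mzi(theta, phi) @ x

# Read the input back from the phase settings alone (up to a global phase)
x_deduced = np.array([np.exp(-1j * phi) * np.sin(theta / 2), np.cos(theta / 2)])
print(np.abs(y) ** 2, abs(np.vdot(x_deduced, x)))
```

Once one port reads zero power, the settings \(\theta, \phi\) encode the input mode, which is why the procedure needs no model of the device.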
\(\color{darkblue}\begin{bmatrix}y_1 \\ y_2 \end{bmatrix} \color{black} = i \begin{bmatrix}e^{i\phi}\sin \frac{\theta}{2} & \cos \frac{\theta}{2} \\ e^{i\phi}\cos \frac{\theta}{2} & -\sin \frac{\theta}{2} \end{bmatrix}\color{darkred}\begin{bmatrix}x_1 \\ x_2 \end{bmatrix}\)
Any MZI (analyzer orientation) can be represented as performing the following operation:
\(\boldsymbol{y}\)
\(\boldsymbol{x}\)
\(\theta\)
\(\phi\)
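As a consistency check, the MZI operator above can be rebuilt from the building blocks introduced earlier (50/50 coupler and phase shifter). The ordering used here, \(\phi\) shifter, coupler, \(\theta\) shifter, coupler, with both shifters on the first waveguide, is an assumption; under it the product matches the closed form up to a global phase \(e^{i\theta/2}\).

```python
import numpy as np

B = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)   # 50/50 coupler

def phase(angle):
    """Phase shifter acting on the first waveguide only."""
    return np.diag([np.exp(1j * angle), 1])

def mzi_closed_form(theta, phi):
    s, c = np.sin(theta / 2), np.cos(theta / 2)
    return 1j * np.array([[np.exp(1j * phi) * s, c],
                          [np.exp(1j * phi) * c, -s]])

def mzi_from_parts(theta, phi):
    # The rightmost factor acts on the light first (assumed ordering).
    return B @ phase(theta) @ B @ phase(phi)

theta, phi = 0.7, 1.3
U_parts = mzi_from_parts(theta, phi)
U_closed = mzi_closed_form(theta, phi)
print(np.allclose(U_parts, np.exp(1j * theta / 2) * U_closed))
print(np.allclose(U_closed.conj().T @ U_closed, np.eye(2)))  # unitary
```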
Light needs to get from a laser to the chip
This can be achieved using an optical fiber and focusing grating.
Optical I/O
10\({}^\circ\)
to chip
to chip
Fiber outputs Gaussian beam with mode field diameter (MFD) \(\sim 10 \mu \mathrm{m}\)
Building blocks
Unit cell MZI
MZI "mesh"
Interfere modes
Guide light
Control phase
\(\boldsymbol{y}\)
\(\boldsymbol{x}\)
\(\theta\)
\(\phi\)
Setup or measure mode pair
Arbitrarily reshape light
Optical I/O
Energy conservation:
\(\|\bm{y}\| = \|U\bm{x}\| = \|\bm{x}\|\)
\(P = \|\bm{y}\|^2 = \|\bm{x}\|^2\)
Why is this useful?
\(U\) is a unitary matrix
Generate any \(N\)-mode "image"
Analyze any \(N\)-mode "image"
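A short numerical check of the energy-conservation statement. The random unitary is drawn with a QR decomposition purely for illustration; it stands in for whatever \(U\) a mesh implements.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_unitary(n, rng):
    """Random n x n unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, _ = np.linalg.qr(z)
    return q

U = random_unitary(8, rng)
x = rng.normal(size=8) + 1j * rng.normal(size=8)
y = U @ x

print(np.linalg.norm(x) ** 2, np.linalg.norm(y) ** 2)  # same total power P
```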
Binary tree: all devices connected to one "root" MZI node
node
Recursive definition
Example \(N = 8\)
Balanced (depth 3)
Unbalanced (depth 7)
Model free = self-corrects any component errors
\(N - 1\) nullifications
Balanced: \(\log_2 N\) steps
Unbalanced: \(N\) steps
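The counts above can be reproduced with a few lines, assuming a balanced tree halves its outputs at every level and an unbalanced tree is a chain that peels off one output per node. The function names are illustrative.

```python
def num_nodes(n):
    """MZI nodes (and nullifications) in any binary tree with n outputs."""
    return n - 1

def balanced_depth(n):
    """Depth of a balanced binary tree with n outputs: ceil(log2(n))."""
    return (n - 1).bit_length()

def unbalanced_depth(n):
    """Depth of a chain-like (unbalanced) tree with n outputs."""
    return n - 1

for n in (8, 64, 1024):
    print(n, num_nodes(n), balanced_depth(n), unbalanced_depth(n))
```

For \(N = 8\) this gives 7 nodes with depth 3 (balanced) versus depth 7 (unbalanced), matching the example; the unbalanced depth \(N - 1\) is what the step count of order \(N\) refers to.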
"Self-configure" any orthogonal basis set or unitary \(U\) in a universal mesh.
Result:
Mode orthogonality: a mode cannot be a linear combination (sum) of other modes that are orthogonal to it.
Unbalanced tree cascade
Unitary matrix
Cascaded binary trees: cascade analyzers to construct any unitary matrix
Key attributes:
Why is this useful?
In some cases, we can express a problem in terms of a subset of \(M < N\) modes.
Low rank SVD has connections to principal components analysis (PCA) / dimensionality reduction
We can express an arbitrary matrix (not just unitary) using a singular value decomposition (SVD):
Full rank SVD
Low rank SVD
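A minimal sketch of the full-rank and low-rank SVD using NumPy. The matrix is random and only illustrative; in a mesh, the two unitary factors would be implemented by cascaded binary trees and the singular values by attenuation or gain, which this sketch does not model.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 6, 2
A = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))  # arbitrary, non-unitary

# Full-rank SVD: A = U diag(s) V^H with U, V unitary
U, s, Vh = np.linalg.svd(A)
A_full = U @ np.diag(s) @ Vh

def low_rank(A, M):
    """Best rank-M approximation: keep only the M largest singular values."""
    U, s, Vh = np.linalg.svd(A)
    return (U[:, :M] * s[:M]) @ Vh[:M]

A_M = low_rank(A, M)
print(np.allclose(A_full, A), np.linalg.matrix_rank(A_M))
```

The Frobenius-norm error of the rank-\(M\) approximation equals the root sum of squares of the discarded singular values, which is the link to PCA-style dimensionality reduction.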
Key features:
Pai et al. Parallel programming... IEEE JSTQE, 2020
"Breadth-first search" to arrange MZIs into columns
In preparation
My theoretical contribution
Device
Ideal
Linear optical error:
Types of systematic error:
Goal: describe how component errors relate to overall error \(\epsilon^2\).
Unbalanced
Balanced
Architecture dependence
vs
Causes include: calibration error, environmental perturbations (humidity, thermal drift, etc.)
Phase error \(\Delta\) centered at 0:
Device
Ideal
Error function:
\(\mathcal{H}_{\epsilon^2}\) is known as the Hessian.
Hessian: \(\mathcal{H}_{\epsilon^2} = \frac{\partial^2 \langle \epsilon^2 \rangle}{\partial \eta_i \partial\eta_j}\)
Diagonal terms: set the sensitivity to uncorrelated/random errors
Off-diagonal terms: matter mostly for constant (correlated) errors
Set up the foundation for a new Hessian error theory of photonic mesh networks.
Uncorrelated / random errors (e.g. phase error):
Only the diagonal terms \(\mathcal{H}_{\theta\theta}\) contribute.
\(E[\delta_i\delta_j] = 0\) and \(E[\delta_i^2] := \sigma_i^2\). Errors add in quadrature.
Correlated errors (e.g. bandwidth):
Affected by the entire Hessian (\(E[\delta_i\delta_j] = E[\delta_i]E[\delta_j]\) or \(E[\delta_i\delta_j] \neq 0\)).
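The distinction between the two error types can be sketched with a quadratic error model \(\epsilon^2 \approx \bm{\delta}^T \mathcal{H} \bm{\delta}\). Any conventional prefactor in the Taylor expansion is ignored here, and the \(2 \times 2\) Hessian values are made up for illustration.

```python
import numpy as np

def error_uncorrelated(H, sigma):
    """E[delta^T H delta] for zero-mean independent errors: only the diagonal survives."""
    return float(np.sum(np.diag(H) * sigma ** 2))

def error_constant(H, delta):
    """delta^T H delta when every parameter has the same constant error delta."""
    d = np.full(H.shape[0], delta)
    return float(d @ H @ d)

H = np.array([[2.0, 1.0],
              [1.0, 3.0]])            # hypothetical Hessian
sigma = np.array([0.1, 0.2])          # per-parameter standard deviations

print(error_uncorrelated(H, sigma))   # errors add in quadrature: 2*0.01 + 3*0.04
print(error_constant(H, 0.1))         # off-diagonal terms also contribute
```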
Phase perturbation \(\Delta\) centered at 0:
Device
Ideal
Error function
Correlation analysis
Correlation diagram
0th order
1st order
2nd order
Sensitivity \(\mathcal{H}_{\theta\theta} = p_\theta\), power through phase shift \(\theta\)
Perturb one phase shift \(\theta\):
\(\mathcal{H}_{\theta\theta} = \frac{\epsilon^2(\delta\theta)}{\delta \theta^2} \Bigg|_{\delta\theta = 0}\)
Define sensitivity:
For any "feedforward" optical device with I/O \(\bm{x}, \bm{y}\):
Assume average: \(N\) uniform powers
Total power \(P = 1\)
Unbalanced \(Np_\theta\)
Balanced \(Np_\theta\)
Balanced trees are more robust while having the same number of components.
Assume Gaussian random inputs
Power in the waveguide spanning \(N\) outputs in its subtree follows this distribution assuming standard random input.
Relative power in waveguide spanning \(N_1\) outputs in its subtree for \(N_2 = N - N_1\).
\(\mathcal{H}_{\theta\theta} = \frac{\epsilon^2(\delta\theta)}{\delta \theta^2} \Bigg|_{\delta\theta = 0} = p_\theta \)
Error function scales with sum of powers through waveguide segments
Total error:
Sensitivity:
sum over all phase shifts
Balanced trees: \(\propto \log N \sigma^2 \to \log N \sigma^4\)
Unbalanced trees: \(\propto N \sigma^2 \to N \sigma^4\)
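A toy model of this scaling, under loudly stated assumptions: total power \(P = 1\) spread uniformly over the \(N\) outputs, and each node contributing a sensitivity equal to the power entering it (the fraction of outputs in its subtree). The exact placement of phase shifters in the real device is not specified here, so only the \(\log N\) versus \(N\) trend should be read from this sketch.

```python
def balanced_sensitivity(n_total, span=None):
    """Sum of power entering every node of a balanced tree (uniform output powers)."""
    span = n_total if span is None else span
    if span == 1:                      # a leaf is an output waveguide, not a node
        return 0.0
    half = span // 2
    return (span / n_total
            + balanced_sensitivity(n_total, half)
            + balanced_sensitivity(n_total, span - half))

def unbalanced_sensitivity(n_total):
    """Same sum for a chain: node subtrees span n_total, n_total - 1, ..., 2 outputs."""
    return sum(span / n_total for span in range(2, n_total + 1))

for n in (8, 64, 1024):
    print(n, balanced_sensitivity(n), unbalanced_sensitivity(n))
```

In this model the balanced sum is exactly \(\log_2 N\) for power-of-two \(N\), while the unbalanced sum grows roughly as \(N / 2\).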
Note: \(2^{16} = 65536\)
After self-configuration
Tens of thousands of modes feasible!
Balanced trees are affected less by correlated error compared to unbalanced trees.
Unbalanced \(\mathcal{H}_{\epsilon^2}\)
Balanced \(\mathcal{H}_{\epsilon^2}\)
Number of trees (\(M\), rank) reduces the performance gap.
Balanced architectures go from \(\log N \to N\) error scaling as \(M \to N\).
Constant sqrt sensitivity \(\epsilon_{N, M} / \delta\)
Random error sqrt sensitivity \(\epsilon_{N, M} / \sigma\)
Balanced trees
Unbalanced trees
A new sensitivity theory
A new cascade architecture
Balanced cascades outperform unbalanced cascades for small \(M\)
In preparation
My experimental contribution
Need in commercial AI:
Advantages of photonic mesh
Data
Photonic neural network engine
Facial recognition
Self-driving car
Recommendations
Chat-bot
(Photonic neural net = PNN)
Intelligent response
Cost function: \(\mathcal{L}(\bm{y}, \widehat{\bm{y}})\)
Desired labels: \(\bm{y}\)
Probabilities: \(\widehat{\bm{y}}\)
Data
\(\color{darkblue}\bm{y}^{(\ell)} \color{black}= \color{darkgreen}U^{(\ell)}\color{darkred}\bm{x}^{(\ell)}\)
On-chip
\(\color{darkred}\bm{x}^{(\ell + 1)} \color{gray}= f^{(\ell)}(\color{darkblue}\bm{y}^{(\ell)}\color{gray})\)
Off-chip
Use photonics to classify handwritten digits from 0 to 9
MNIST dataset
\(8 \times 8\)
Electro-optic nonlinearity
Williamson et al. JSTQE 2019
In just 200 epochs (dataset passes) we can achieve near-perfect accuracy (98%) on MNIST
Can this training be done in a photonics experiment?
Train / test split: 80 % / 20 %
Update method: "Adam" update
Training method: Batch gradient descent
Model training
\(\mathcal{L}(\boldsymbol{x}) = \mathrm{softmax\ cross\ entropy}[|U_3|U_2|U_1 \boldsymbol{x}|||^2]\)
\(\boldsymbol{x} = (x_1, x_2, p, p, 0), \|\boldsymbol{x}\| = 3 \)
Classification problem: Sklearn 2D point classification
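A sketch of the forward pass in the cost function above: three unitary layers with an absolute-value nonlinearity between them, power detection at the output, and a softmax cross entropy. The random unitaries, the 4-mode size, the example input, and reading every output mode as a class score are assumptions for illustration, not the experimental configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_unitary(n, rng):
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, _ = np.linalg.qr(z)
    return q

def softmax_cross_entropy(scores, label):
    shifted = scores - scores.max()                    # numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[label]

def pnn_loss(unitaries, x, label):
    """Loss = softmax cross entropy of |U3 |U2 |U1 x|||^2."""
    *hidden, last = unitaries
    for U in hidden:
        x = np.abs(U @ x)              # linear in optics, nonlinearity off-chip
    powers = np.abs(last @ x) ** 2     # measured output powers
    return softmax_cross_entropy(powers, label), powers

unitaries = [random_unitary(4, rng) for _ in range(3)]
x = np.array([0.3, -0.8, 1.0, 1.0])    # hypothetical encoded 2D point plus padding
loss, powers = pnn_loss(unitaries, x, label=1)
print(loss, powers)
```

Because every layer is unitary and the absolute value preserves the norm, the detected powers always sum to \(\|\bm{x}\|^2\).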
To program unitaries and inputs, we need to calibrate the chip.
0
\(\pi / 2\)
\(\pi\)
Need a fast photodetector (PD) to measure the phase shifter switching time (WIP)
Likely less than 1 kHz switch time
Evaluate the entire cost function once per parameter, \(D\) params
This is highly inefficient, especially in hybrid PNNs (our use case)
Perturbative gradients = numerical differentiation
\(D\) can be in the millions in modern neural nets
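To make the cost of perturbative gradients concrete, here is a minimal forward-difference sketch that counts cost-function evaluations. The toy cost and the `CountingCost` wrapper are illustrative stand-ins for an experimental measurement.

```python
import numpy as np

class CountingCost:
    """Toy cost that records how many times it has been evaluated."""
    def __init__(self):
        self.calls = 0

    def __call__(self, params):
        self.calls += 1
        return float(np.sum(np.sin(params) ** 2))

def perturbative_gradient(cost, params, step=1e-6):
    """Numerical differentiation: one extra cost evaluation per parameter."""
    base = cost(params)
    grad = np.zeros_like(params)
    for i in range(params.size):
        shifted = params.copy()
        shifted[i] += step
        grad[i] = (cost(shifted) - base) / step
    return grad

cost = CountingCost()
params = np.linspace(0.1, 0.9, 5)       # D = 5 parameters
grad = perturbative_gradient(cost, params)
print(cost.calls)                        # D + 1 evaluations
print(grad, np.sin(2 * params))          # compare with the analytic gradient
```

The evaluation count grows linearly with \(D\), which is what makes this approach impractical when \(D\) reaches the millions.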
To date, no photonic/optical backpropagation (machine learning) has been experimentally demonstrated.
Green bars: measure power going through phase shifter
Advantages over perturbative: modular (for hybrid PNN) and efficient (\(D\) times faster)
Backpropagation = widely used for training (fueled 2010s deep learning boom)
In situ backpropagation incorporates an experimental implementation of inverse design.
Gradient update: \(\frac{\partial \mathcal{L}}{\partial \epsilon} = -\mathcal{R}\left(\boldsymbol{b}_{\mathrm{aj}}^T \hat A^{-1} \frac{\partial \hat A}{\partial \epsilon} \hat A^{-1} \boldsymbol{b}\right)\)
Similarity:
Phase shifts \(\bm{\theta}\) are related to permittivity \(\bm{\epsilon}\) of the inverse design problem.
Cost \(\mathcal{L}\) is a desired mode overlap
Hughes et al., Shanhui Fan. Optica 2018.
Operator \(\hat A = (\nabla \times \nabla \times - k_0^2 \epsilon)\)
Freq-domain equation: \(\bm{b} = \hat A(\omega) \bm{e}\)
input source
field
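The adjoint gradient formula can be checked on a small dense linear system. This is a toy stand-in, not a Maxwell solver: \(\hat A(\epsilon) = A_0 + \epsilon A_1\) with random matrices, and the cost is assumed to be \(\mathcal{L} = \mathcal{R}(\bm{c}^T \bm{e})\), where the vector \(\bm{c}\) plays the role of the adjoint source \(\bm{b}_{\mathrm{aj}}\). All names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6

def cplx(shape):
    return rng.normal(size=shape) + 1j * rng.normal(size=shape)

A0 = cplx((n, n)) + 8 * np.eye(n)   # diagonal shift keeps the system well conditioned
A1 = cplx((n, n))                   # dA/d(eps)
b = cplx(n)                         # input source
c = cplx(n)                         # adjoint source (assumed linear cost)

def cost(eps):
    e = np.linalg.solve(A0 + eps * A1, b)    # field: A e = b
    return np.real(c @ e)

def adjoint_gradient(eps):
    A = A0 + eps * A1
    e = np.linalg.solve(A, b)                # forward solve
    lam = np.linalg.solve(A.T, c)            # adjoint solve: A^T lam = c
    return -np.real(lam @ (A1 @ e))          # -Re(c^T A^-1 (dA/deps) A^-1 b)

eps, h = 0.3, 1e-6
finite_difference = (cost(eps + h) - cost(eps - h)) / (2 * h)
print(adjoint_gradient(eps), finite_difference)
```

One forward and one adjoint solve give the gradient with respect to every parameter that enters \(\hat A\), which is the source of the \(D\)-fold advantage over perturbative gradients.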
NQP lab, Stanford
Data
Backpropagation can be applied to a hybrid multilayer system
Leverage the energy efficiency of photonics in both directions:
"Linear in optics, nonlinear in electronics"
Experiment
Circle dataset (easy)
80-20 train-test split
Adam update (i.e., not SGD)
250 data points total
Autodiff powered by JAX/Haiku
This is the first demonstration of backpropagation in an optical chip to our knowledge
96%/93% model test/train accuracy after training
Experiment
Moons dataset (medium)
80-20 train-test split
Adam update
500 data points total
Autodiff powered by JAX/Haiku
Note: We use the correct (expected) phase instead of the measured phase, which has an order of magnitude more error
We observe excellent agreement between simulated digital training and in situ training despite gradient error.
96%/93% model test/train accuracy after training
We have a functioning experimental prototype of a photonic mesh
This can be used for many applications.
We choose a machine learning application evaluated on standard 2D classification problems.
Inference task: high accuracy (98% on moons dataset)
Training task: First demo of analog (in-situ, on-chip) backpropagation on an optical chip
96%/93% model test/train accuracy on noisy circle dataset
Key changes:
Parallelize over all layers in the network.
In our setup: we "simulate" this step without a backprop unit.
The fractional error in the phase gradient becomes more of an issue as the gradient gets smaller, due to active thermal noise / drift.
Phase error means "distance from optimal value."
Mean square fractional error: \(1- \boldsymbol{g} \cdot \hat{\boldsymbol{g}}\), where \(\boldsymbol{g}\) and \(\hat{\boldsymbol{g}}\) are normalized
\(\mathcal{L}_m = 1 - |\widehat{\bm{u}}_m^T \bm{u}^*_m|^2\), \(\bm{u}_m\) is row \(m\) of \(U\)
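Both error metrics above are simple to compute. The sketch assumes real-valued gradient vectors and unit-norm rows of \(U\); the function names are illustrative.

```python
import numpy as np

def gradient_error(g, g_hat):
    """1 - g . g_hat after normalizing both gradient vectors."""
    g = g / np.linalg.norm(g)
    g_hat = g_hat / np.linalg.norm(g_hat)
    return 1 - g @ g_hat

def row_infidelity(u_hat, u):
    """L_m = 1 - |u_hat^T u^*|^2 for unit-norm rows; insensitive to a global phase."""
    return 1 - abs(u_hat @ u.conj()) ** 2

g = np.array([1.0, 2.0, -0.5])
print(gradient_error(g, 2 * g))                     # same direction: ~0
print(gradient_error(g, np.array([2.0, -1.0, 0])))  # orthogonal: 1

u = np.array([1, 1j, -1, 0.5]) / np.linalg.norm([1, 1j, -1, 0.5])
print(row_infidelity(np.exp(0.4j) * u, u))          # global phase only: ~0
```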
Digital update
Analog update
backprop unit
Digital: scaling, nonlinearities, elementwise ops that are \(O(N)\)*
Analog: Only linear optics \(O(N^2)\)
Do we beat digital with photonic hybrid solution?
Off-chip
On-chip
Note: Most of the energy is in analog-digital conversion
Problem: digital subtraction is costly due to A/D conversion.
We experimentally and theoretically explored programmable linear operations in optics.
Theory: new error theory for universal binary tree circuits
Experiment: first backprop demo
The first demonstration of backpropagation in an optical chip.
New theory of error-tolerant cascaded binary tree optical devices.
In this talk (upcoming papers):
Not in this talk (upcoming and past papers):
Photonic blockchain using photonic meshes
Design/simulation/testing of MEMS phase shifters, couplers
Parallel programming of an arbitrary feedforward photonic network
Google Scholar
Phox framework (work in progress)
Goal of the project: a full stack open source framework for programmable photonics!
Related paper: In preparation.
Olav's glorious bike ride
"DAB Miller" (found in lab)
OSA
Dinner party!
SUPR retreat
Fan lab
Solgaard lab
Halloween
Grad party
Fan group hike
Proof of photonic computational work ensures security for:
Photonic blockchain: any blockchain application that includes photonic hardware as part of the proof of work (PoW) computation.
Photonic PoW
Blockchain
Crypto
Core question: can we build a photonic blockchain technology using an analog device that is prone to systematic error?
Hash error rate: \(1 - (1 - \mathrm{BER})^{256} \approx 256 \cdot \mathrm{BER} \), assuming independent bit errors
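A quick numerical check of the hash error rate approximation for a 256-bit hash, assuming independent bit errors as stated. The example BER values are arbitrary.

```python
import math

def hash_error_rate(ber, bits=256):
    """Exact probability that at least one of `bits` independent bits is wrong."""
    # expm1/log1p keep precision when ber is tiny
    return -math.expm1(bits * math.log1p(-ber))

def hash_error_rate_approx(ber, bits=256):
    """First-order approximation, valid when bits * ber << 1."""
    return bits * ber

for ber in (1e-6, 1e-4, 1e-2):
    print(ber, hash_error_rate(ber), hash_error_rate_approx(ber))
```

The approximation is an upper bound and is accurate only while \(256 \cdot \mathrm{BER} \ll 1\); at a BER of \(10^{-2}\) it already exceeds 1 and is no longer meaningful.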
Key assumptions
Systematic error examples:
Random error examples
Hardware-agnostic error correction:
Same energy, more footprint
Expected improvement:
Systematic error reduced by factor of \(\sqrt{R}\)
Random error stays the same
Note: Bulk of the energy in photonic computing is in analog to digital conversion
Error correction results in less output error and smaller hash error rate
Tick marks are 2 apart
Dispersion analysis:
Improve time-efficiency by parallelizing computation over many wavelengths.
Only works up to some acceptable error.
Error correction
Error correction reduces the output error standard deviation.
This is only for \(N = 4\), but what about \(N > 4\)?
Key conclusion: Sharp boundary between feasible/infeasible
Key finding:
\(\sigma_{\mathrm{out}} \propto NK\sigma\) for phase, coupling
\(\sigma_{\mathrm{out}} \propto N^{3 / 2}K\sigma\) for loss
Why increase \(N\):
Photonic advantage is higher
Why increase \(K\):
Smaller footprint, more output bits