
$ ok-transformer --help
Exploring machine learning engineering and operations.

Fig. 1 Effect of batch normalization on the magnitude of preactivation gradients.
Dependencies
docker 5.0.3
docker-compose 1.25.5
fastapi 0.75.2
keras 2.8.0
matplotlib 3.5.1
mlflow 1.26.1
numpy 1.22.4
optuna 2.10.0
pandas 1.4.2
pipenv 2022.6.7
prefect 2.0b5
scikit-learn 1.0.2
seaborn 0.11.2
tensorflow-datasets 4.5.2
tensorflow-macos 2.8.0
tensorflow-metal 0.4.0
torch 1.13.0
torchvision 0.14.0
uvicorn 0.17.6
xgboost 1.6.0.dev0
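The sketch below is one way to check a local environment against these pins. It assumes that each entry is a PyPI distribution name. `tensorflow-macos` and `tensorflow-metal` are macOS builds, so they will usually be reported as missing on a Linux GPU machine. `PINNED` and `check_versions` are hypothetical names and are not part of the book's code.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical mapping copied from the Dependencies list.
PINNED = {
    "docker": "5.0.3", "docker-compose": "1.25.5", "fastapi": "0.75.2",
    "keras": "2.8.0", "matplotlib": "3.5.1", "mlflow": "1.26.1",
    "numpy": "1.22.4", "optuna": "2.10.0", "pandas": "1.4.2",
    "pipenv": "2022.6.7", "prefect": "2.0b5", "scikit-learn": "1.0.2",
    "seaborn": "0.11.2", "tensorflow-datasets": "4.5.2",
    "tensorflow-macos": "2.8.0", "tensorflow-metal": "0.4.0",
    "torch": "1.13.0", "torchvision": "0.14.0", "uvicorn": "0.17.6",
    "xgboost": "1.6.0.dev0",
}


def check_versions(pinned):
    """Return {name: (expected, installed or None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            # Drop a PEP 440 local label such as "+cu117" before comparing.
            installed = version(name).split("+")[0]
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in sorted(check_versions(PINNED).items()):
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

The comparison is an exact string match after the local label is stripped. A newer patch release therefore also counts as a mismatch, which is deliberate when the goal is to reproduce a result.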
Hardware
GPU 0: Tesla P100-PCIE-16GB (UUID: GPU-543c532b-c511-c675-a565-bf01208405e0)
Model name: Intel(R) Xeon(R) CPU @ 2.00GHz
Socket(s): 1
Core(s) per socket: 1
Thread(s) per core: 2
L3 cache: 38.5 MiB
CPU MHz: 2000.188
MemAvailable: 15212104 kB
Avail: 67G
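The MemAvailable figure is easier to read when converted to GiB. The sketch below assumes the line was copied from Linux's `/proc/meminfo`. That file labels values "kB" but actually reports kibibytes (1024 bytes), so 15212104 kB works out to roughly 14.5 GiB. `mem_available_gib` is a hypothetical helper.

```python
def mem_available_gib(meminfo_text):
    """Return MemAvailable in GiB from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            value, unit = line.split()[1:3]
            # /proc/meminfo writes "kB" but means KiB (1024 bytes).
            if unit != "kB":
                raise ValueError(f"unexpected unit: {unit}")
            return int(value) / 1024**2  # KiB -> GiB
    raise ValueError("MemAvailable not found")


if __name__ == "__main__":
    print(f"{mem_available_gib('MemAvailable: 15212104 kB'):.2f} GiB")  # → 14.51 GiB
```

On a Linux machine, the same helper can be called as `mem_available_gib(open("/proc/meminfo").read())`.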