Richard Wong | a817fe16cc | Feat: increased learning rate for effective large batch size learning | 2024-09-22 22:28:41 +09:00
Richard Wong | aca80720c8 | Feat: t5_jax_simple_parallel implements a working example of fsdp | 2024-09-20 23:42:51 +09:00
Richard Wong | 429e1742ab | Feat: flax pjit example | 2024-09-16 12:19:07 +09:00
Richard Wong | ad5cf7735f | Feat: fsdp demo; Refactor: pulling dataloader code into dataload.py | 2024-09-15 22:41:00 +09:00
Richard Wong | 005a1a5735 | Feat: learn flax | 2024-09-14 14:13:38 +09:00
Richard Wong | d2dd72227f | Feat: introduced efficient train data dtype, jit train step, bfloat16 mat mul | 2024-09-14 02:02:45 +09:00
Richard Wong | edd9c3551f | Feat: implement working prediction | 2024-09-12 22:57:19 +09:00
Richard Wong | f523560141 | Feat: jax implementation of t5 training and prediction | 2024-09-11 08:17:02 +09:00