2 Comments

Really nice post! Love how digestible some of the technically heavy topics are:

- Do you think focusing on optimizing masking could lead to improved outputs?

- If I wanted to create an LLM better than GPT-4, what component would you recommend focusing on optimizing?


1. You raise a good point: is there a better way to "learn" a language than the masking technique? I could imagine significant improvements coming from a better masking scheme, especially since it is already so costly to estimate the average error over all the training data at every step of gradient descent. (There's a rough sketch of what "masking" looks like after point 2.)

2. The transformer. The transformer model is the secret sauce of GPT; it's what enabled it to be so powerful. GPT-1 through GPT-4 have basically been the same architecture at larger and larger scales. Hence, if I had to hazard a guess (as an ML newbie myself), I think the next huge breakthrough will come when the transformer gets replaced by a better architecture. (A toy sketch of the self-attention at the heart of the transformer is below as well.)
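
On point 1, here is a minimal sketch of what "masking" usually means in this context. It is not from the post; the token IDs, the MASK_ID value, and the 15% masking rate are just illustrative placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
tokens = np.array([12, 7, 99, 3, 42, 15])   # made-up token IDs
MASK_ID = 0                                  # made-up [MASK] token ID

# BERT-style masked language modelling: hide ~15% of the tokens and train
# the model to guess the hidden ones from the surrounding context.
mlm_mask = rng.random(tokens.shape) < 0.15
masked_input = np.where(mlm_mask, MASK_ID, tokens)

# GPT-style causal masking: a lower-triangular attention mask, so position i
# can only "see" positions 0..i and must predict the next token.
seq_len = len(tokens)
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))

print(masked_input)
print(causal_mask.astype(int))
```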
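
And on point 2, a toy sketch of the scaled dot-product self-attention that sits at the core of the transformer. The random projection matrices stand in for learned weights, so this is only an illustration of the mechanism, not GPT's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 6, 8
x = rng.standard_normal((seq_len, d_model))      # stand-in token embeddings

# Random projections standing in for the learned query/key/value weights.
W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v

scores = Q @ K.T / np.sqrt(d_model)              # how strongly each token attends to each other token
causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))
scores = np.where(causal, scores, -np.inf)       # block attention to future tokens

# Softmax over each row, then mix the value vectors with those weights.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
output = weights @ V
print(output.shape)                              # (6, 8)
```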
