OOK CW is simply the carrier keyed on and off. The RF waveform from the exciter has rise and fall times that are intentionally about 5 ms. It is therefore amplitude modulated: viewed in the frequency domain there is a carrier, an upper sideband (USB), and a lower sideband (LSB).
A non-linear amplifier will shorten this intentional rise/fall time, and in the frequency domain that widens the signal. How far one runs an amplifier into Class C depends on how wide a CW signal can be tolerated.
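To put rough numbers on the two paragraphs above, here is a minimal Python sketch, not a model of any real rig. It assumes an 8 kHz sample rate, a single 60 ms dit, raised-cosine edges for the 5 ms shaping, and a hypothetical static transfer curve (`overdriven_amp`, with made-up `bias` and `gain`) as a crude stand-in for an amplifier run into Class C. Because the RF output is the carrier multiplied by the keying envelope, each sideband is a copy of the envelope's spectrum, so the sketch only looks at the envelope.

```python
import cmath
import math

FS = 8000.0  # sample rate in Hz (assumed for this sketch)


def shaped_dit(rise_s=0.005, dit_s=0.06, pad_s=0.02):
    """Keying envelope for one dit with raised-cosine edges lasting rise_s
    seconds (0 to 100%), padded with key-up time. rise_s=0 gives hard keying."""
    n_rise, n_on, n_pad = round(rise_s * FS), round(dit_s * FS), round(pad_s * FS)
    ramp = [0.5 - 0.5 * math.cos(math.pi * k / n_rise) for k in range(n_rise)]
    flat = [1.0] * (n_on - 2 * n_rise)
    return [0.0] * n_pad + ramp + flat + ramp[::-1] + [0.0] * n_pad


def overdriven_amp(env, bias=0.3, gain=2.0):
    """Hypothetical envelope transfer curve: no output until the drive exceeds
    a conduction threshold (crude Class-C stand-in), then the output rises
    with `gain` and saturates at 1.0."""
    return [min(1.0, max(0.0, (e - bias) * gain)) for e in env]


def rise_time_10_90(env):
    """10%-90% rise time (seconds) of the leading edge, relative to the peak."""
    peak = max(env)
    i10 = next(i for i, e in enumerate(env) if e >= 0.1 * peak)
    i90 = next(i for i, e in enumerate(env) if e >= 0.9 * peak)
    return (i90 - i10) / FS


def peak_level_db(env, lo_hz, hi_hz, step_hz=2.0):
    """Highest envelope-spectrum level (dB relative to DC) between lo_hz and
    hi_hz away from the carrier, found by direct Fourier sums."""
    dc = sum(env)
    best = -math.inf
    f = lo_hz
    while f <= hi_hz:
        acc = sum(e * cmath.exp(-2j * math.pi * f * n / FS) for n, e in enumerate(env))
        best = max(best, 20 * math.log10(max(abs(acc), 1e-12) / dc))
        f += step_hz
    return best


if __name__ == "__main__":
    hard = shaped_dit(rise_s=0.0)
    drive = shaped_dit()
    out = overdriven_amp(drive)
    for label, env in (("hard keyed", hard), ("shaped drive", drive), ("after amp", out)):
        print(f"{label:12s} rise {rise_time_10_90(env) * 1e3:5.2f} ms, "
              f"peak 400-600 Hz off carrier {peak_level_db(env, 400, 600):6.1f} dB")
```

With these made-up parameters the transfer curve cuts the 10-90% rise time from about 2.9 ms to 1.25 ms, and the envelope level a few hundred hertz from the carrier rises accordingly. A steeper curve (larger `gain` or `bias`) pushes the result closer to hard keying.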
It's a mistake to think that shaping must be done only at low levels. Look in any ARRL handbook from the 1930s, '40s, or '50s: shaped keying done at the last stage was state of the art, through either cathode or grid-block keying, and some designs used a clamp tube to do it at the screen grid. In fact, for a couple of decades there was a whole chapter devoted just to CW transmitter keying and to preventing clicks and chirp.
All modern "clean" QRP transmitters do the same by shaping the collector voltage of the output amplifier. There's a near-universal circuit (I first saw it in W7EL's "Optimized QRP" design, although I've been told it predates Roy) that uses RC networks and a PNP transistor to do this very nicely. Again, this is very intentional shaping done at the PA.
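A rough way to size such a network, as a minimal sketch: assume the PNP stage simply makes the PA collector supply follow the voltage on a single RC (an idealization, not an analysis of W7EL's actual circuit). For a single-pole step response the output reaches 10% at RC ln(10/9) and 90% at RC ln 10, so the 10-90% rise time is RC ln 9, about 2.2 RC. A 5 ms edge therefore needs an RC of roughly 2.3 ms. The resistor and capacitor values below are hypothetical, picked only to illustrate the arithmetic.

```python
import math


def rc_rise_time_10_90(r_ohms: float, c_farads: float) -> float:
    """10%-90% rise time of an idealized single-pole RC step response: RC * ln(9)."""
    return r_ohms * c_farads * math.log(9.0)


def rc_for_rise_time(t_rise_s: float) -> float:
    """Time constant RC that gives the requested 10%-90% rise time."""
    return t_rise_s / math.log(9.0)


if __name__ == "__main__":
    tau = rc_for_rise_time(0.005)
    print(f"RC for a 5 ms 10-90% edge: {tau * 1e3:.2f} ms")  # about 2.28 ms
    # Hypothetical part values, not taken from any published design:
    r, c = 10e3, 0.22e-6
    print(f"10 kohm with 0.22 uF: {rc_rise_time_10_90(r, c) * 1e3:.2f} ms")  # about 4.83 ms
```

The same relation works in reverse to check an existing network: compute RC from the part values and multiply by ln 9.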