It's impossible to know whether the P4 has a worse branch predictor than the K8, because Intel's branch predictor is a trade secret, unlike the K8's. However, with a branch misprediction penalty as long as the P4's, I'd expect the P4 to have a complex branch predictor with better hit rates, most probably even a multilevel overriding one that's tweaked to hell.
In any case, CMP and SMT are orthogonal, so one isn't the answer to the other. You can have both, which would be CMT. HT is a cheap tack-on to an existing design that doesn't cost the P4 much die space, yet it improves performance in most cases. Intel made a tradeoff among size, complexity, and cost, but overall it's not a bad thing.