USD is designed to work best as a cached, discretely sampled format. For the most part so far I've tried to keep this guide practical first, theory second, but I'll relax that rule here.
Cache, don't evaluate
USD's speed is as much about what it ISN'T as what it IS. A rule the developers have stuck to is that it is not an evaluation engine. What does that mean? Well, Houdini, Maya, C4D, they all do some kind of dependency graph cook under the hood. The internals of a .hip/.ma/.c4d file are a bunch of nodes and connections, and for those to be converted to dancing pixels they need to be processed, evaluated, and converted to polygons and motion.
USD on disk, wherever possible, is NOT something to be evaluated. The stuff on disk is geometry, and that geometry can be loaded and thrown to whatever is asking for it super fast, specifically because it doesn't need to be cooked or evaluated.
If USD prefers not to do evaluation, what does this impact? Well, procedural generation of course. Standard tricks in Houdini like scattering trees on a landscape, then changing the landscape at a later date, won't work; you'd need to open the work you did in a DCC, and bake it back to disk again.
But this also includes things like keyframes. A camera that's been animated using spline keyframe interpolation needs that spline to be evaluated, so you need to bake it per frame from your DCC. Character rigs are the same; that's a lot of evaluating of constraints and IK handles and whatever else, and USD won't do that.
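To make the keyframe point concrete, here's a rough sketch of what a DCC does when it bakes a spline for USD. It's plain Python, not the USD API, and the key layout (frame, value, slope per frame) and function names are made up for illustration. The spline maths happens once at export; what lands on disk is just one value per frame.

```python
def hermite(p0, p1, m0, m1, t):
    """Cubic Hermite interpolation between p0 and p1 for t in [0, 1]."""
    t2, t3 = t * t, t * t * t
    return ((2 * t3 - 3 * t2 + 1) * p0 + (t3 - 2 * t2 + t) * m0
            + (-2 * t3 + 3 * t2) * p1 + (t3 - t2) * m1)


def bake_per_frame(key_a, key_b):
    """Evaluate the spline once per integer frame between two keys.

    Each key is a hypothetical (frame, value, slope_per_frame) tuple.
    """
    (f0, v0, s0), (f1, v1, s1) = key_a, key_b
    span = f1 - f0
    samples = {}
    for frame in range(f0, f1 + 1):
        t = (frame - f0) / span
        # Slopes are per frame, so scale them to the normalised 0-1 interval.
        samples[frame] = hermite(v0, v1, s0 * span, s1 * span, t)
    return samples


# Ease-in/ease-out camera move on one axis: flat tangents at both keys.
baked = bake_per_frame((1, 0.0, 0.0), (5, 10.0, 0.0))
for frame, value in baked.items():
    print(frame, value)
```

The resulting frame/value pairs are the kind of thing that ends up as time samples on the camera's transform, and from then on nothing downstream needs to know a spline ever existed.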
Ok, some evaluation
Character skinning is an edge case, brought about by production needs, specifically Pixar's film Coco. That featured a lot of crowds, too many to just treat as straight per-frame caches. To get around this USD offers a very minimal skeletal deformation engine; you give it a joint rig that's been baked per frame (think mocap data) and a skin with weighting information, and the skin can be deformed at render time.
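The classic technique for this kind of deformation is linear blend skinning, and a toy version shows how little evaluation is actually involved. This is a 2D sketch in plain Python, not the UsdSkel API; the transform layout and function names are invented for illustration, and I'm assuming each joint transform is already expressed relative to the bind pose.

```python
import math


def apply(xf, p):
    """Apply a 2D rigid transform (angle_radians, tx, ty) to point p."""
    a, tx, ty = xf
    x, y = p
    return (math.cos(a) * x - math.sin(a) * y + tx,
            math.sin(a) * x + math.cos(a) * y + ty)


def skin_point(p, influences, joint_xfs):
    """Linear blend skinning: weighted average of p moved by each joint."""
    x = y = 0.0
    for joint, weight in influences:
        jx, jy = apply(joint_xfs[joint], p)
        x += weight * jx
        y += weight * jy
    return (x, y)


# One frame of baked joint data: joint 0 stays put, joint 1 lifts by 2 units.
joint_xfs = [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)]
# A skin point influenced equally by both joints.
skinned = skin_point((1.0, 0.0), [(0, 0.5), (1, 0.5)], joint_xfs)
print(skinned)
```

The per-frame data is only the joint transforms; the weights and rest positions are static, which is why this is so much lighter than a vertex cache per crowd member.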
Per frame vertex bakes vs per frame transform bakes
If you had a rigid car model that was animating around, it would be pretty wasteful if you had to do a vertex level bake on every frame. Luckily USD supports per frame bakes on hierarchies too, so if you have the car parented under a prim, animate that, bake that per frame, you've got a much lighter cache.
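Some back-of-envelope arithmetic shows how big the difference is. This sketch only counts raw float values and ignores topology, compression and everything else a real file contains; the point count and frame range are made-up numbers.

```python
def vertex_bake_floats(num_points, num_frames):
    """Every point position (3 floats) stored again on every frame."""
    return num_points * 3 * num_frames


def transform_bake_floats(num_points, num_frames):
    """Points stored once, plus one 4x4 matrix (16 floats) per frame."""
    return num_points * 3 + 16 * num_frames


points, frames = 500_000, 240
heavy = vertex_bake_floats(points, frames)
light = transform_bake_floats(points, frames)
print(heavy, light, round(heavy / light, 1))
```

For a shot of a few hundred frames the transform bake is roughly the cost of a single static frame of geometry.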
Further you could split out animation into different levels of hierarchies, and have each of those baked out. Back when I was doing camera moves in Maya I'd group the camera under a group called 'translate_x', and only translate it. Then I'd parent that under a group called 'rotate_y', and only rotate it. Parent that under a group called 'translate_y', only translate it, etc. With USD I could maintain that whole setup, and have perfect control over the camera.
Linear interpolation and substeps
Again keeping with the idea of simple and fast evaluation, USD only does linear interpolation between samples. This means that for fast moving things, e.g. spinning wheels and whatnot, USD can get into issues where you'll see linear motion blur where you might expect curved blur.
You can export substeps to get around this, and like with substeps in rendering, you generally want to use as few substeps as possible.
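To see what linear interpolation does to a spinning wheel, here's a small sketch in plain Python (no USD involved). It takes a point on the rim of a unit wheel, samples it at the start and end of each substep, and measures how far the interpolated point cuts inside the true circle. The 90 degrees per frame is just an assumed spin rate.

```python
import math


def lerp(a, b, t):
    """Linearly interpolate between two points given as tuples."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def rim_point(angle_deg):
    """Position of a point on the rim of a unit wheel centred at the origin."""
    a = math.radians(angle_deg)
    return (math.cos(a), math.sin(a))


def worst_radius(degrees_per_frame, segments):
    """Smallest distance from the axle seen when lerping between samples."""
    step = degrees_per_frame / segments
    mid = lerp(rim_point(0.0), rim_point(step), 0.5)
    return math.hypot(*mid)


for segments in (1, 2, 4, 8):
    print(segments, round(worst_radius(90.0, segments), 4))
# 1 0.7071
# 2 0.9239
# 4 0.9808
# 8 0.9952
```

With one sample per frame the rim point dips to about 71% of the wheel radius mid-frame, which is the straight-line blur. Each doubling of substeps helps, but with diminishing returns, which is part of why you want as few as you can get away with.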
Sometimes you can get around this by taking advantage of hierarchical motion. Keeping with the wheels example, if you rotate and translate the wheels of a car at high speed under a single transform, you'll get all kinds of artifacts with motion blur. But if you have the wheel under a prim used just for rotation, and that parented under another prim just for translation, you can use fewer overall substeps, and achieve a better looking result.
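Here's a toy comparison of the two setups, again in plain Python rather than USD. It assumes that whatever consumes the data interpolates each transform's own values (an angle on the rotation prim, an offset on the translation prim) rather than the flattened result; how a given renderer actually samples transforms may differ. The 90 degree turn and 2 unit move per frame are made-up numbers.

```python
import math


def lerp(a, b, t):
    return a + (b - a) * t


def rim_world(angle_deg, axle_x):
    """World position of a rim point on a unit wheel whose axle is at (axle_x, 0)."""
    a = math.radians(angle_deg)
    return (axle_x + math.cos(a), math.sin(a))


# Two samples one frame apart: the wheel turns 90 degrees while the car moves 2 units.
start = {"angle": 0.0, "x": 0.0}
end = {"angle": 90.0, "x": 2.0}
t = 0.5  # halfway through the frame

# (a) Flattened: bake world-space positions, then lerp those.
p0 = rim_world(start["angle"], start["x"])
p1 = rim_world(end["angle"], end["x"])
flat = (lerp(p0[0], p1[0], t), lerp(p0[1], p1[1], t))

# (b) Hierarchical: lerp the rotation and the translation separately, then combine.
angle = lerp(start["angle"], end["angle"], t)
axle_x = lerp(start["x"], end["x"], t)
split = rim_world(angle, axle_x)


def dist_from_axle(p):
    """Distance from the interpolated axle position to point p."""
    return math.hypot(p[0] - axle_x, p[1])


flat_radius = dist_from_axle(flat)
split_radius = dist_from_axle(split)
print(round(flat_radius, 4), round(split_radius, 4))
```

In the flattened case the rim point collapses towards the axle mid-frame, while the split version stays exactly on the wheel with no extra substeps at all.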
The best animation is no animation
USD can be selective about what is cached per frame and what isn't. Similar to ensuring SOPs aren't time dependent unless really needed, a USD ROP can author some prims per frame, and others as static.
A practical use case here is a heavy RBD sim. The RBD chunks can be written to disk as static heavy geometry, and put aside. The RBD animation can be saved as transform animation (basically a bunch of points converted to transforms), and the cache can load the RBD chunks in as references, and parent them under their matching lightweight transform. There's an example of this in the main USD page on the wiki.
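As a rough idea of what that layout looks like on disk, this sketch builds the usda text for one RBD piece using plain Python string formatting (no USD library needed). The file name ./rbd_chunks.usd, the /chunks/piece_0 prim path and the "geo" child name are all hypothetical, and I've only animated translation to keep it short; a real sim would carry rotation too.

```python
def piece_usda(name, chunk_file, translate_samples):
    """Return usda text for an animated Xform with a static chunk referenced under it."""
    samples = "\n".join(
        f"        {frame}: ({x}, {y}, {z}),"
        for frame, (x, y, z) in sorted(translate_samples.items())
    )
    return (
        f'def Xform "{name}"\n'
        f"{{\n"
        # Lightweight part: a few numbers per frame.
        f"    double3 xformOp:translate.timeSamples = {{\n"
        f"{samples}\n"
        f"    }}\n"
        f'    uniform token[] xformOpOrder = ["xformOp:translate"]\n'
        f"\n"
        # Heavy part: static geometry pulled in by reference, never re-written per frame.
        f'    def "geo" (\n'
        f"        prepend references = @{chunk_file}@</chunks/{name}>\n"
        f"    )\n"
        f"    {{\n"
        f"    }}\n"
        f"}}\n"
    )


layer = "#usda 1.0\n\n" + piece_usda(
    "piece_0", "./rbd_chunks.usd", {1: (0, 0, 0), 2: (0, -0.1, 0)}
)
print(layer)
```

The animated layer stays tiny no matter how heavy the chunk geometry is, because the geometry lives in its own static file and is only pointed at.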
Combining caches for charfx
Write stuff here about cloth sim and fur caches....