USD is designed to work best as a cached, discretely sampled format. For the most part so far I've tried to keep this guide practical first, theory second, but I'll relax that rule here.
Cache, don't evaluate
USD's speed is as much about what it ISN'T as what it IS. The USD developers often point out that USD is not an evaluation engine. What does that mean?
Most DCCs are reactive: you change an input by moving a slider or editing a model, and the app does some calculations and shows you modified geometry. In other words, the inputs are evaluated and output is created.
USD on disk, wherever possible, is NOT evaluated. Stuff is baked down to caches on save, those caches are quick to load, and quick to layer on top of each other.
This means workflows you'd expect in Houdini don't always port to USD. You can't have trees scattered on a groundplane in USD, and have the trees update if the ground is changed. Cameras can't be constrained to always point at a target. Fur can't be procedurally generated at rendertime (at least not with vanilla USD).
In all these scenarios, you have to do those updates in your DCC, and write out an updated USD cache.
This cache-centric workflow also means keyframes can't really be supported in USD. A camera that's been animated with spline keyframe interpolation needs that spline evaluated, so you have to bake it per frame from your DCC. Character rigs are the same: that's a lot of evaluating constraints and IK handles and whatever else, and USD won't do that.
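To make the "bake it per frame" idea concrete, here's a minimal sketch in plain Python. It doesn't touch the USD API at all; the ease curve, the function names and the frame range are made up for illustration, standing in for whatever spline your DCC would evaluate. The point is that the curve gets evaluated once at export, and only the resulting per-frame values would be written out as time samples.

```python
def ease_in_out(t):
    """Smoothstep curve, a stand-in for a DCC animation spline (assumption)."""
    return t * t * (3.0 - 2.0 * t)


def bake_channel(start_frame, end_frame, start_value, end_value):
    """Evaluate the eased curve on every whole frame.

    Returns {frame: value}, i.e. the discrete samples a cache would store
    instead of the spline itself.
    """
    span = end_frame - start_frame
    samples = {}
    for frame in range(start_frame, end_frame + 1):
        t = (frame - start_frame) / span
        samples[frame] = start_value + (end_value - start_value) * ease_in_out(t)
    return samples


# Hypothetical camera translate-x channel, keyed at frames 1 and 5.
camera_tx = bake_channel(1, 5, 0.0, 10.0)
```

Once baked, the spline is gone: anything reading `camera_tx` only sees five numbers, and any retiming or re-easing has to happen back in the DCC.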
Ok, some evaluation
Character skinning is an edge case, brought about by production needs, specifically Pixar's film Coco. That featured a lot of crowds, too many to treat as straight per-frame caches. To get around this USD offers a very minimal skeletal deformation engine: you give it a joint rig that's been baked per frame (think mocap data) and a skin with weighting information, and the skin can be deformed at rendertime.
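As a rough picture of what "deform the skin from baked joints plus weights" involves, here's a toy 2D sketch of classic linear blend skinning in plain Python. This is an assumption about the general technique, not a copy of what USD does internally, and all the names here are hypothetical. Each joint contributes where it would move the rest point, and the weights blend those results.

```python
import math


def joint_transform(angle_deg, pivot):
    """Return a function that rotates a 2D point around `pivot` by `angle_deg`."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    px, py = pivot

    def apply(point):
        x, y = point[0] - px, point[1] - py
        return (px + c * x - s * y, py + s * x + c * y)

    return apply


def skin_point(rest_point, joint_xforms, weights):
    """Weighted blend of the rest point as moved by each joint."""
    x = y = 0.0
    for xform, weight in zip(joint_xforms, weights):
        jx, jy = xform(rest_point)
        x += weight * jx
        y += weight * jy
    return (x, y)


# One frame of a two-joint "arm": the shoulder is still, the elbow bends 90 degrees.
shoulder = joint_transform(0.0, (0.0, 0.0))
elbow = joint_transform(90.0, (1.0, 0.0))

# A skin point past the elbow, weighted half to each joint.
deformed = skin_point((2.0, 0.0), [shoulder, elbow], [0.5, 0.5])
```

The per-frame data here is just the joint angles; the skin points and weights are stored once, which is why this is so much lighter than caching every crowd character's vertices.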
Per frame vertex bakes vs per frame transform bakes
If you had a rigid car model that was animating around, it would be pretty wasteful to do a vertex level bake on every frame. Luckily USD supports per frame bakes on hierarchies too, so if you parent the car under a prim, animate that prim, and bake its transform per frame, you've got a much lighter cache.
Further you could split out animation into different levels of hierarchies, and have each of those baked out. Back when I was doing camera moves in Maya I'd group the camera under a group called 'translate_x', and only translate it. Then I'd parent that under a group called 'rotate_y', and only rotate it. Parent that under a group called 'translate_y', only translate it, etc. With USD I could maintain that whole setup, and have perfect control over the camera.
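For a feel of how much lighter the transform bake is in the rigid car case, here's some back-of-envelope arithmetic in Python. The point count and frame range are invented, and it assumes three floats per point and a single 4x4 matrix (16 values) per frame; real files will differ with compression and whatever else is stored.

```python
def vertex_bake_values(num_points, num_frames):
    """Every point position (x, y, z) written on every frame."""
    return num_points * 3 * num_frames


def transform_bake_values(num_points, num_frames):
    """Point positions written once, plus one 4x4 matrix per frame."""
    return num_points * 3 + 16 * num_frames


# Hypothetical car: 100,000 points over a 240 frame shot.
heavy = vertex_bake_values(100_000, 240)
light = transform_bake_values(100_000, 240)
```

Under those assumptions the vertex bake stores 72,000,000 values and the transform bake 303,840, so the animated part of the cache is basically free next to the geometry.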
Linear interpolation and substeps
Again keeping with the idea of simple and fast evaluation, USD only does linear interpolation between samples. This means for fast moving things, e.g. spinning wheels and whatnot, USD can get into issues where you'll see linear motion blur where you might expect curved blur.
You can export substeps to get around this, and like with substeps in rendering, you generally want to use as few substeps as possible.
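Here's a small plain Python sketch of why the blur goes straight, and what substeps buy you. It takes a point on the rim of a wheel, samples its position at two times, and linearly interpolates between them the way a baked position cache would be read. The names and numbers are made up; `substeps` here just means how many sample intervals get written per frame.

```python
import math


def rim_point(angle_deg, radius=1.0):
    """Position of a point on the wheel rim at the given rotation."""
    a = math.radians(angle_deg)
    return (radius * math.cos(a), radius * math.sin(a))


def lerp_point(p0, p1, t):
    """Straight-line blend between two sampled positions."""
    return (p0[0] + (p1[0] - p0[0]) * t, p0[1] + (p1[1] - p0[1]) * t)


def worst_radius_error(degrees_per_frame, substeps, radius=1.0):
    """How far inside the true circle the interpolated point falls,
    measured midway between two neighbouring samples."""
    step = degrees_per_frame / substeps
    mid = lerp_point(rim_point(0.0, radius), rim_point(step, radius), 0.5)
    return radius - math.hypot(mid[0], mid[1])


# A wheel turning 90 degrees per frame, with 1 and 4 sample intervals per frame.
coarse = worst_radius_error(90.0, 1)
fine = worst_radius_error(90.0, 4)
```

With one interval per frame the rim point cuts the corner by nearly 30% of the radius; with four it's under 2%. Each extra substep is more data on disk though, so it's a trade.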
Sometimes you can get around this by taking advantage of hierarchical motion. Keeping with the wheels example, if you rotate and translate the wheels of a car at high speed under a single transform, you'll get all kinds of artifacts with motion blur. But if you have the wheel under a prim used just for rotation, and that parented under another prim just for translation, you can use fewer overall substeps and achieve a better looking result.
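A toy comparison in plain Python shows why splitting the motion helps. It assumes the rotate-only prim stores its rotation as an angle value (so the angle itself gets interpolated) and the translate-only prim stores a position; whether your exporter writes things that way is something to check. The function names are hypothetical.

```python
import math


def lerp(a, b, t):
    return a + (b - a) * t


def rim_hierarchical(t, angle0, angle1, x0, x1, radius=1.0):
    """Interpolate the angle and the translation as separate channels,
    then combine them, as a rotate prim under a translate prim would."""
    angle = math.radians(lerp(angle0, angle1, t))
    hub_x = lerp(x0, x1, t)
    return (hub_x + radius * math.cos(angle), radius * math.sin(angle))


def rim_flattened(t, angle0, angle1, x0, x1, radius=1.0):
    """Interpolate the already-combined positions at the two samples,
    as a single baked result would be read."""
    p0 = rim_hierarchical(0.0, angle0, angle1, x0, x1, radius)
    p1 = rim_hierarchical(1.0, angle0, angle1, x0, x1, radius)
    return (lerp(p0[0], p1[0], t), lerp(p0[1], p1[1], t))


# Wheel turns 90 degrees while the car moves 2 units, sampled halfway.
hub_mid = (1.0, 0.0)
split = rim_hierarchical(0.5, 0.0, 90.0, 0.0, 2.0)
baked = rim_flattened(0.5, 0.0, 90.0, 0.0, 2.0)
split_radius = math.hypot(split[0] - hub_mid[0], split[1] - hub_mid[1])
baked_radius = math.hypot(baked[0] - hub_mid[0], baked[1] - hub_mid[1])
```

In the split version the rim point stays exactly one radius from the hub between samples, with no substeps at all. In the flattened version it collapses toward the hub, which is the artifact you'd otherwise be adding substeps to hide.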
The best animation is no animation
USD can be selective about what is cached per frame and what isn't. Similar to ensuring SOPs aren't time dependent unless really needed, a USD ROP can author some prims per frame, and others as static.
A practical use case here is a heavy RBD sim. The RBD chunks can be written to disk as static heavy geometry, and put aside. The RBD animation can be saved as transform animation (basically a bunch of points converted to transforms), and the cache can load the RBD chunks in as references, and parent them under their matching lightweight transform. There's an example of this in the main USD page on the wiki.
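To show the shape of that setup, here's a rough Python sketch that writes out what the lightweight animation layer might look like as .usda text. It only builds a string by hand and doesn't use the USD API; the file name `chunks.usda`, the prim paths and the chunk names are all hypothetical, and it only animates translation to keep things short (a real RBD cache would carry rotation too).

```python
def chunk_prim(name, translate_samples, geo_layer="chunks.usda"):
    """Text for one animated transform with static geometry referenced under it.

    `translate_samples` maps frame -> (x, y, z). `geo_layer` and the
    /chunks/<name> path are placeholder names for the static geometry file.
    """
    lines = [
        f'def Xform "{name}"',
        "{",
        "    double3 xformOp:translate.timeSamples = {",
    ]
    for frame in sorted(translate_samples):
        x, y, z = translate_samples[frame]
        lines.append(f"        {frame}: ({x}, {y}, {z}),")
    lines += [
        "    }",
        '    uniform token[] xformOpOrder = ["xformOp:translate"]',
        "",
        '    def "geo" (',
        f"        prepend references = @{geo_layer}@</chunks/{name}>",
        "    )",
        "    {",
        "    }",
        "}",
    ]
    return "\n".join(lines)


def build_layer(chunks):
    """Join every chunk prim into one layer string with a usda header."""
    body = "\n\n".join(chunk_prim(name, samples) for name, samples in chunks.items())
    return "#usda 1.0\n\n" + body + "\n"


# Two hypothetical chunks falling over two frames.
layer_text = build_layer({
    "chunk_0": {1: (0, 0, 0), 2: (0, -0.5, 0)},
    "chunk_1": {1: (1, 0, 0), 2: (1, -0.25, 0)},
})
```

The heavy chunk geometry lives once in the referenced file and never gets time samples; the only thing that changes per frame is three numbers per chunk.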
Combining caches for charfx
Write stuff here about cloth sim and fur caches....
Volume caches and per frame caches