Partitioning is key to good parallel performance, but it is not the only key. Another is replication, especially of read-only data or, failing that, read-mostly data. In either case, each CPU is (mostly) confined to its own partition and/or replica, and therefore incurs very little communications overhead, which in turn yields good performance and scalability.
RCU uses a trick: it shares read-mostly data among the CPUs of a shared-memory system and lets the CPU caches replicate that data on RCU's behalf. Each CPU thus has fast access to the copy in its local cache, which is one source of RCU's performance advantage.
In short, RCU uses replication of data, but makes the hardware do the work so that the developer does not have to.