I understand why data needs to be aligned (and the effort made to accomplish it, such as padding): it can reduce the number of memory accesses. But that assumes the processor can only fetch from addresses that are multiples of 4 (supposing we are using a 32-bit architecture), and because of that assumption we need to align memory. My question is: why can we only access addresses that are multiples of 4 (efficiency, a hardware restriction, or another reason)? What are the advantages of doing this? Why can't we access all the addresses available?

Memory is constructed from hardware (RAM) that is attached to memory buses. The wider the bus, the fewer cycles are required to fetch data. If memory were 1 byte wide, you'd need 4 cycles to read one 32-bit value. Over time memory architectures have evolved, and depending on the class of processor (embedded, low power, high performance, etc.) and the cache design, memory may be quite wide (say, 256 bits). Given a wide internal bus (between RAM or cache and the registers) that is twice the width of a register, you can fetch a value in one cycle regardless of alignment if you have a barrel shifter in the data path. Barrel shifters are expensive, so not all processors have them;...
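To make the cost of an unaligned read concrete, here is a minimal sketch in Python. It assumes a hypothetical little-endian memory that is exactly 32 bits wide and can only return whole, aligned words. The names `WORD_BYTES`, `make_memory`, and `load32` are invented for illustration and do not model any particular processor.

```python
WORD_BYTES = 4  # assumed bus width: one 32-bit word per access


def make_memory(data: bytes):
    """Pack bytes into little-endian 32-bit words, as a word-wide RAM would hold them."""
    padded = data + bytes(-len(data) % WORD_BYTES)  # zero-pad to a whole word
    return [
        int.from_bytes(padded[i:i + WORD_BYTES], "little")
        for i in range(0, len(padded), WORD_BYTES)
    ]


def load32(memory, addr: int):
    """Read a 32-bit value at byte address `addr`.

    Returns (value, word_accesses). Only whole aligned words can be read
    from `memory`, mimicking a 32-bit bus.
    """
    index, offset = divmod(addr, WORD_BYTES)
    low = memory[index]  # first bus access
    if offset == 0:
        return low, 1  # aligned: a single access is enough
    high = memory[index + 1]  # second access for the bytes that spill over
    shift = 8 * offset
    # Shift-and-merge step: the job a barrel shifter would do in hardware.
    value = ((low >> shift) | (high << (32 - shift))) & 0xFFFFFFFF
    return value, 2


if __name__ == "__main__":
    ram = make_memory(bytes(range(16)))
    for address in (4, 5):
        value, accesses = load32(ram, address)
        print(f"addr {address}: value=0x{value:08X}, accesses={accesses}")
```

In this sketch the read at address 4 needs one word access, while the read at address 5 needs two plus a shift and merge. A bus twice the width of a register would bring both halves in with one access, leaving only the shifting to be done. The sketch does not handle an unaligned read that starts in the last word of memory.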