In this paper, we consider a two-dimensional parabolic
equation with two small parameters. These small parameters cause the
underlying problem to exhibit multiple scales over the whole problem
domain. Using the maximum principle with carefully chosen barrier
functions, we obtain pointwise derivative estimates of arbitrary
order, from which an anisotropic mesh is constructed. This mesh is
much finer inside the small-scale regions (where the boundary
layers are located) than elsewhere (the large-scale regions). A fully
discrete backward difference Galerkin scheme on this mesh, using
conforming rectangular elements of arbitrary order $k$ ($k \geq 1$), is
discussed. Note that standard finite element analysis techniques cannot
be used directly on such highly nonuniform anisotropic meshes
because the quasi-uniformity assumption is violated. Instead, we use
the integral identity superconvergence technique to prove the optimal
uniform convergence rate $O(N^{-(k+1)} + M^{-1})$ in the discrete $L^2$-norm, where $N$ and $M$ are the numbers of partitions in each spatial direction (the same in the $x$- and $y$-directions) and in the time direction, respectively.
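
To illustrate what a layer-adapted mesh of this kind can look like, the following is a minimal one-dimensional sketch of a generic piecewise-uniform (Shishkin-type) mesh. It is not the mesh constructed in the paper, whose transition points follow from the derivative estimates and are not given here. The function name `shishkin_mesh`, the layer parameter `eps`, the constant `sigma`, and the assumption of layers at both ends of $[0, 1]$ are all hypothetical choices made for illustration.

```python
import math


def shishkin_mesh(n, eps, sigma=2.0):
    """Return n + 1 nodes of a piecewise-uniform mesh on [0, 1].

    Assumes (hypothetically) boundary layers of width O(eps) at both
    x = 0 and x = 1. A quarter of the cells is placed in each layer
    region and half in the interior. n must be divisible by 4.
    """
    if n % 4 != 0:
        raise ValueError("n must be divisible by 4")
    # Transition point between fine and coarse regions, capped at 1/4
    # so that the mesh degenerates to a uniform one when eps is large.
    tau = min(0.25, sigma * eps * math.log(n))
    q = n // 4
    # Fine cells in [0, tau), coarse cells in [tau, 1 - tau),
    # fine cells in [1 - tau, 1].
    left = [tau * i / q for i in range(q)]
    middle = [tau + (1.0 - 2.0 * tau) * i / (2 * q) for i in range(2 * q)]
    right = [1.0 - tau * (q - i) / q for i in range(q + 1)]
    return left + middle + right


if __name__ == "__main__":
    nodes = shishkin_mesh(16, 1e-3)
    fine_step = nodes[1] - nodes[0]
    coarse_step = nodes[8] - nodes[7]
    print(f"fine step = {fine_step:.3e}, coarse step = {coarse_step:.3e}")
```

A two-dimensional rectangular mesh with $N$ partitions in each of the $x$- and $y$-directions could then be formed as the tensor product of two such one-dimensional meshes. The ratio of coarse to fine step sizes grows as `eps` shrinks, which is why quasi-uniformity fails on meshes of this type.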