square_approximation
The code is fairly self-explanatory and reasonably well commented.
When the limit is set to 10^9 or more, RAM usage goes above 42 GB, which is not ideal. This could be improved by changing the storage data type, or by some other approach.
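As a rough illustration of the data type idea, and not this repository's actual code, the sketch below estimates how much memory a packed array of numbers would need for different element sizes. It assumes the program stores one number per value up to the limit, which may not match the real implementation. `estimated_bytes` is a hypothetical helper introduced only for this example.

```python
from array import array


def estimated_bytes(limit: int, typecode: str) -> int:
    """Bytes needed to hold `limit` values in a packed array.

    `typecode` is a standard `array` module typecode, e.g. "d" for an
    8-byte double or "f" for a 4-byte float. This is only an estimate of
    the raw element storage; it ignores any other overhead.
    """
    return limit * array(typecode).itemsize


if __name__ == "__main__":
    limit = 10**9  # the limit at which memory usage becomes a problem
    for typecode in ("d", "f"):
        gb = estimated_bytes(limit, typecode) / 1e9
        print(f"typecode {typecode!r}: about {gb:.1f} GB")
```

Under these assumptions, 10^9 values take about 8 GB as 8-byte doubles and about 4 GB as 4-byte floats, so a smaller packed type trades some precision for a much lower footprint.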
The coefficient could also be tuned further to make the approximation more accurate at low numbers. I will include a graph of the errors when using a coefficient of 2.