Jun 29
Ari Najarian Best practices on handling user-generated HTML
Jun 29, 2016; 15:09
Hi all,
This is a fairly common question on LassoTalk, and I've read through the previous threads that address my question. However, I'm hoping to solicit some insight on the best approach to processing HTML input to mitigate script injection attacks, because I still don't have a definitive answer.
In my opinion, regular expressions are a dumpster fire for this job: they will never be able to weed out all the string permutations that can conceal malicious code (event-handler attributes, javascript: URLs, mixed-case or entity-encoded tags, and so on). So this approach doesn't seem feasible to me; it's security-theatre whack-a-mole.
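To make the whack-a-mole point concrete, here is a minimal sketch in Python (not Lasso, just because it is easy to run) of a naive blacklist regex that removes `<script>` blocks, plus three well-known payloads it misses. The pattern and the payload list are illustrative assumptions, not code from any library.

```python
import re

# A naive *blacklist* filter: delete <script>...</script> blocks, case-insensitively.
NAIVE_SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script>", re.IGNORECASE | re.DOTALL)


def naive_strip(html: str) -> str:
    """Remove <script> elements in a single pass (illustrative only; do not use)."""
    return NAIVE_SCRIPT_RE.sub("", html)


payloads = [
    "<script>alert(1)</script>",                   # caught: removed entirely
    "<img src=x onerror=alert(1)>",                # missed: event handler, no <script> tag
    '<a href="javascript:alert(1)">click</a>',     # missed: javascript: URL
    "<scr<script></script>ipt>alert(1)</script>",  # missed: removal reassembles a <script> tag
]

for p in payloads:
    print(f"{p!r:50} -> {naive_strip(p)!r}")
```

The last payload is the classic illustration: deleting the inner match splices the outer fragments back into a working `<script>` tag, so every fix to the pattern invites a new bypass.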
A more effective approach might be XML tree traversal, which would let me apply a whitelist of tags and attributes on a first pass, perhaps combined with regular expressions on a second pass to validate the values of the remaining attributes. This seems like a better approach, but the first rule of programming is "don't", so I'm wondering whether anybody out there has already written this code. I'd be shocked if I were the first, but if I am, I'd be happy to share what I write.
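A minimal sketch of the whitelist idea, again in Python. It swaps in the standard library's event-based `html.parser.HTMLParser` tokenizer in place of a full XML tree, since user-supplied HTML is rarely well-formed XML. The `ALLOWED` table, the `DROP_CONTENT` set and the `href` scheme check are hypothetical policy choices for illustration; a production sanitizer would need a vetted, more complete policy.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical policy: allowed tag -> allowed attribute names.
ALLOWED = {
    "p": set(), "br": set(), "strong": set(), "em": set(),
    "ul": set(), "ol": set(), "li": set(),
    "a": {"href", "title"},
}
DROP_CONTENT = {"script", "style"}   # drop these elements *and* their text
VOID = {"br"}                        # tags that never get a closing tag
SAFE_SCHEMES = ("http://", "https://", "mailto:")


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)  # entities decoded before we see text
        self.out = []
        self.open_tags = []   # stack of whitelisted tags we have emitted
        self.skip_depth = 0   # > 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED:
            return  # drop the tag itself; its text children are still emitted
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue
            # HTMLParser has already decoded entities, so "&#106;avascript:" is caught here.
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in self.open_tags:
            return  # stray or non-whitelisted closing tag
        # Close any tags left open inside this one so the output stays well-nested.
        while self.open_tags:
            t = self.open_tags.pop()
            self.out.append(f"</{t}>")
            if t == tag:
                break

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

    # Comments, doctypes and processing instructions fall through to HTMLParser's
    # default no-op handlers, so they are dropped.


def sanitize(html: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(html)
    parser.close()
    parser.out.extend(f"</{t}>" for t in reversed(parser.open_tags))
    return "".join(parser.out)
```

Because the policy is a whitelist, anything not explicitly listed (`img`, `iframe`, `on*` handlers, `style` attributes, `javascript:` URLs) is dropped by default, which sidesteps the permutation problem. Note that this strict scheme check also rejects relative links.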
But is this even the best approach? Maybe, instead of allowing users to submit HTML at all, I should configure my rich text editor to use a different markup format, like Markdown. That would mitigate the risk of malicious HTML, since whatever the user supplies would be run through a parser that then generates the HTML. A quick search revealed that Jono has already started a Markdown parser at https://github.com/iamjono/markdown , so I wouldn't be violating the first rule of programming. This also enforces a limited subset of HTML tags, which might be more predictable when it's time to render into a template.
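The key property of the Markdown route is that the converter, not the user, decides which tags appear. One caveat: standard Markdown passes inline HTML through unchanged, so the parser has to escape or strip raw HTML for this to help. The sketch below is hypothetical Python supporting only `**bold**`, `*emphasis*`, http(s) links and paragraphs; it is not CommonMark and not Jono's parser. It escapes all input first and only then introduces its own tags.

```python
import html
import re

# Hypothetical mini-subset of Markdown (NOT CommonMark):
# **bold**, *emphasis*, [text](http/https URL) and blank-line paragraphs.
LINK = re.compile(r"\[([^\]]+)\]\((https?://[^\s)*]+)\)")  # no '*' in URLs, so EM can't touch them
BOLD = re.compile(r"\*\*(.+?)\*\*")
EM = re.compile(r"\*(.+?)\*")


def render(text: str) -> str:
    # 1. Escape everything first: any HTML the user typed becomes inert text.
    safe = html.escape(text, quote=True)
    paragraphs = []
    for block in re.split(r"\n\s*\n", safe.strip()):
        if not block:
            continue
        # 2. Only now introduce tags, and only the ones this converter knows about.
        block = LINK.sub(r'<a href="\2">\1</a>', block)
        block = BOLD.sub(r"<strong>\1</strong>", block)
        block = EM.sub(r"<em>\1</em>", block)
        paragraphs.append(f"<p>{block}</p>")
    return "\n".join(paragraphs)
```

A user who types raw HTML just sees it echoed back as text, and a `[x](javascript:...)` link is left as literal text because the link pattern only accepts http and https URLs.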
Is there a fourth approach that's better? Is there a community consensus about which approach is the most sensible? Are there tools, libraries or Lasso tags I don't know about that solve this problem? It seems like Lasso's gigantic standard library unfortunately lacks an HTML sanitization method. I'm basically looking for something like CodeIgniter's security->xss_clean($string) method, but without having to debase myself by using PHP.
Any and all comments, pointers and insight would be appreciated!
Ari.
#############################################################
This message is sent to you because you are subscribed to
the mailing list Lasso Lasso@lists.lassosoft.com
Official list archives available at http://www.lassotalk.com
To unsubscribe, E-mail to: <Lasso-unsubscribe@lists.lassosoft.com>
Send administrative queries to <Lasso-request@lists.lassosoft.com>
Jun 29
Brad Lindsay Re: Best practices on handling user-generated HTML
Jun 29, 2016; 19:57
Jun 30
Marc Vos Re: Best practices on handling user-generated HTML
Jun 30, 2016; 09:17
Jun 30
Jolle Carlestam Re: Best practices on handling user-generated HTML
Jun 30, 2016; 09:45
Jun 30
Jolle Carlestam Re: Best practices on handling user-generated HTML
Jun 30, 2016; 11:35
Jun 30
Jason Huck Re: Best practices on handling user-generated HTML
Jun 30, 2016; 08:12
Jun 30
Bil Corry Re: Best practices on handling user-generated HTML
Jun 30, 2016; 05:51