NetBSD-Users archive


Re: firefox resource hog



On Sun, Jan 08, 2023 at 05:32:52PM +0100, tlaronde%polynum.com@localhost wrote:
> On Sun, Jan 08, 2023 at 09:53:32PM +0530, Mayuresh wrote:
> > On Sun, Jan 08, 2023 at 04:56:54PM +0100, tlaronde%polynum.com@localhost wrote:
> > > For this, I would use curl(1) (I do use it to automate downloading of
> > > pages when there are no captchas).
> > 
> > How I do this is:
> > 
> > 1. For some of the most simple scenarios, cookies ok but no js - curl / wget
> > 
> > 2. A little more complex, where for some reason wget/curl doesn't work,
> > but still not requiring js - python mechanize
> > 
> > 3. Requiring js - firefox, marionette
> > 
> > Within 3:
> > 
> > 3a. Headless when the use case is fully automatable, including some
> > captchas, which I extract in headless mode, render on a terminal and
> > answer interactively from the keyboard.
> > 
> > 3b. Non-headless when e.g. you want to automate only logging in to a
> > portal and do further things manually
> > 
> > While I use all of them, 3b is the one I need the most. For 3a and 3b
> > firefox works best for me.
> 
> I first inspect the traffic, for example with the developer tools in
> Firefox. Often js is only used to verify the arguments and put them in
> canonical form before sending them, calling a page over HTTP or HTTPS
> with GET or POST.
> 
> Then, knowing what the "API" is, I script it with curl...

Sometimes I'm pleasantly surprised how much web scraping one can get
done with just shell & curl ;-)
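
To illustrate the idea of replaying the request a page makes once its
"API" is known, here is a minimal sketch. It uses Python's standard
library rather than shell, purely to keep the example self-contained.
The endpoint URL, the field names and the canonical_form() helper are
all hypothetical; real values would come from whatever the network tab
of the developer tools shows for the site in question.

```python
import urllib.parse
import urllib.request


def canonical_form(fields):
    """Stand-in for the page's js: trim values and drop empty ones.

    Hypothetical; a real site may normalise its arguments differently.
    """
    return {k: v.strip() for k, v in fields.items() if v and v.strip()}


def build_post(url, fields):
    """Build (but do not send) the POST request the page would make."""
    body = urllib.parse.urlencode(canonical_form(fields)).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical endpoint and fields, as read off the developer tools.
    req = build_post(
        "https://example.org/search",
        {"q": "  netbsd  ", "page": "1", "lang": ""},
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("ascii"))
    # Roughly the same request with curl(1):
    #   curl -d 'q=netbsd&page=1' https://example.org/search
    # To actually send it from Python: urllib.request.urlopen(req).read()
```

Building the request separately from sending it makes it easy to
compare the encoded body against what the browser sent before hitting
the server.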

Kind regards,
           Alex.
-- 
"Opportunity is missed by most people because it is dressed in overalls and
 looks like work."                                      -- Thomas A. Edison

