pkgsrc-Users archive


Re: Something better than www/urlget ?



On Wed, 13 Jul 2016, Paul Goyette wrote:

> On Wed, 13 Jul 2016, coypu%SDF.ORG@localhost wrote:
>
>> Hi, you'll be able to continue getting data by using an alternate user
>> agent. Try, e.g., `curl -A 'Lynx'`.
>
> Actually, that doesn't seem to help here. I still get back all the JavaScript code between <script>...</script> tags.
>
> There's a small section following that, between <noscript>...</noscript> tags, but it doesn't contain any useful data.

Further experimentation shows that the response does indeed contain the raw data I am looking for. I just need to run 'curl -A Lynx -s $URL' and parse through a single 256 KB line of text. Yes, that's not a typo: one line of text, 256K characters long! Needless to say, I decided to replace some shell symbol manipulation with some grep-foo and sed-goo. :)
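For anyone wanting to try the same grep/sed approach, here is a minimal sketch. The post does not say what the wanted data looks like, so this assumes, purely hypothetically, that each value sits in a `"price":"..."` pair somewhere inside the page's one long line. The function name `extract_values` and the `price` key are illustrative only. `grep -o` is not in POSIX, but GNU grep and the BSD greps (including NetBSD's) support it.

```shell
#!/bin/sh
# Sketch: pull values out of a page delivered as one enormous line.
# The key "price" and the name extract_values are hypothetical.

# extract_values: read the page on stdin and print each value of the
# hypothetical "price" key on its own line.
extract_values() {
    # grep -o prints only the matching parts, one per output line, so the
    # single 256K-character line becomes a handful of short records.
    grep -o '"price":"[^"]*"' |
        # Strip the key and the surrounding quotes, leaving just the value.
        sed -e 's/^"price":"//' -e 's/"$//'
}

# Only touch the network when a URL is given on the command line.
if [ $# -gt 0 ]; then
    curl -A 'Lynx' -s "$1" | extract_values
fi
```

Splitting on the match itself with `grep -o` means the later `sed` step never has to cope with the whole 256 KB line at once.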

I'm a happy camper again!


+------------------+--------------------------+------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:      |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org |
+------------------+--------------------------+------------------------+

