Author Topic: Example of C source code for getting web page.  (Read 4378 times)

Offline AmigaEd (Topic starter)

  • His Dudeness, El Duderino
  • Hero Member
  • *****
  • Join Date: Jan 2005
  • Posts: 512
Example of C source code for getting web page.
« on: January 21, 2006, 02:57:04 AM »
Hello,
I'm trying to learn C and I am wondering if someone out there might have a very simple example of some C source code that will grab a web page and display it or even just save it as a file.

I've looked at a few programs on aminet, but I can't seem to make sense out of them.

Thank you,
AmigaEd
"Pretty soon they will have numbers tattooed on our foreheads." - Jay Miner 1990

La Familia...
A1K - La Primera Dama -1987
A1K - La Princesa- January 2005
A2K - La Reina - February 2005
A2K - Doomy - March 2005
A500 - El Gran Jugador - April 2005
A1200 - La Hermosa Vista - May 2005
A2KHD - El Duro Grande - May 2005
A600 - Prístino - May 2005
A1200 - El Trueno Grande - July 2005
CDTV - El Misterioso - August 2005
C64 - El Gran Lebows
 

Offline koaftder

  • Hero Member
  • *****
  • Join Date: Apr 2004
  • Posts: 2116
    • http://koft.net
Re: Example of C source code for getting web page.
« Reply #1 on: January 21, 2006, 03:05:10 AM »
This isnt too bad achttp://www.google.com/search?hl=en&q=software+hut&btnG=Google+Searchtually. Learn your tcp library. All you have to do is issue one simple string. something like "GET HTTP 1.0 /"

then the web server simply spits back the result.

so it goes like this:

create socket
set sock addr
set sock port
open socket
send socket ( request_string )
receive result

then write the input buffer to standard output or to a file, whatever
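
The receive step above can be sketched as a loop, since a single recv call is not guaranteed to return the whole page. This is only a minimal sketch assuming a BSD-style sockets API (as on Linux). copy_socket_to_stream is a hypothetical helper name, not part of any library, and sock is assumed to be an already connected socket.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: copy everything the peer sends on sock into out.
   Returns the total number of bytes copied, or -1 on a recv or write error. */
long copy_socket_to_stream(int sock, FILE *out)
{
    char buf[4096];
    long total = 0;

    assert(out != NULL);
    for (;;) {
        ssize_t n = recv(sock, buf, sizeof buf, 0);
        if (n == 0)          /* peer closed the connection: nothing more to read */
            break;
        if (n < 0)           /* recv failed */
            return -1;
        if (fwrite(buf, 1, (size_t)n, out) != (size_t)n)
            return -1;       /* could not write everything to the stream */
        total += n;
    }
    return total;
}
```

Passing stdout as the stream prints the page, while passing a FILE * obtained from fopen saves it to a file.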

obviously the libs are different for every os

Check out the RFC for HTTP for more detail on what you can do with it: http://www.faqs.org/rfcs/rfc2616.html

Sorry I can't help you with the TCP stuff on Amiga, as I have never used the socket library on AmigaOS.


 

Offline koaftder

Re: Example of C source code for getting web page.
« Reply #2 on: January 21, 2006, 03:53:38 AM »
I've written a simple program that does what you are talking about, for Linux. There's still a small problem with it though: the compiler is telling me that it doesn't know the size of socket_detials....

Code: [Select]

#include <sys/types.h>
#include <sys/socket.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>

int main ( void ) {
        int socket_handle ;
        struct sockaddr_in socket_detials ;
        char * input_buffer;
        char * httpget = "GET HTTP 1.1 / \r\r" ;

        input_buffer = malloc(20000);

        socket_handle = socket ( AF_INET, SOCK_STREAM, 0) ;
        socket_detials.sin_family = AF_INET ;
        socket_detials.sin_addr.s_addr=inet_addr("68.90.68.66");
        socket_detials.sin_port = htons(80);

        connect (socket_handle,(struct sockaddr*)&socket_detials, sizeof ( struct sockaddr));
        send ( socket_handle , httpget, strlen(httpget), 0 ) ;
        recv ( socket_handle , input_buffer , 20000, 0 ) ;
        printf ( "%s\n", input_buffer ) ;

        return 0 ;
}


you may want to check out the simple socket library from http://mysite.verizon.net/astronaut/ssl/

It supports a lot of OSes and I seem to remember it saying it supported Amiga... Socket programming with training wheels.
 

Offline AmigaEd (Topic starter)

Re: Example of C source code for getting web page.
« Reply #3 on: January 21, 2006, 03:59:38 AM »
Quote
by koaftder on 2006/1/20 22:05:10
This isnt too bad achttp://www.google.com/search?hl=en&q=software+hut&btnG=Google+Searchtually. Learn your tcp library. All you have to do is issue one simple string. something like "GET HTTP 1.0 /"


Hi koaftder,
The link you posted seems to take me to a bunch of links to software hut on google.

Can you please post the link again or point me to the correct site.

Thank you,
AmigaEd



 

Offline koaftder

Re: Example of C source code for getting web page.
« Reply #4 on: January 21, 2006, 04:04:28 AM »
Sorry ): I must have hit paste when I typed that. The only link I meant to post in that comment was http://www.faqs.org/rfcs/rfc2616.html for the HTTP protocol specification. I was on the hunt for ZIP RAM; I figure if I can't figure out which order they need to be piled up in the sockets, I'll just fill up every socket.
 

Offline Rooster

  • Jr. Member
  • **
  • Join Date: Jun 2005
  • Posts: 85
Re: Example of C source code for getting web page.
« Reply #5 on: January 21, 2006, 04:21:17 AM »
"I've written a simple program that does what you are talking about, for Linux. There's still a small problem with it though: the compiler is telling me that it doesn't know the size of socket_detials...."

Shouldn't that be socket_details?  Maybe that's why it doesn't know the size, incorrect syntax?  Not sure, just guessing...  It looked like a typo, and I ran it through some search engines to check and got 0 hits, though there are relevant TCP texts mentioning what I listed.  Just curious. ;)
 

Offline ChaosLord

  • Hero Member
  • *****
  • Join Date: Nov 2003
  • Posts: 2608
    • http://totalchaoseng.dbv.pl/news.php
Re: Example of C source code for getting web page.
« Reply #6 on: January 21, 2006, 04:32:26 AM »
Quote
koaftder wrote:

you may want to check out the simple socket library from http://mysite.verizon.net/astronaut/ssl/

It supports a lot of OSes and I seem to remember it saying it supported Amiga... Socket programming with training wheels.

The word "Amiga" does not exist on that site.

I cannot find any evidence that it has ever been ported to AmigaOS.

Have you personally used Simple Socket Library?
Is it reliable?
Does it limit you in some way?
Wanna try a wonderfull strategy game with lots of handdrawn anims,
Magic Spells and Monsters, Incredible playability and lastability,
English speech, etc. Total Chaos AGA
 

Offline patrik

Re: Example of C source code for getting web page.
« Reply #7 on: January 21, 2006, 04:34:08 AM »
The request-string should be "GET / HTTP/1.0\r\n\r\n" to get the default page. For another page "GET /somedir/somefile.html HTTP/1.0\r\n\r\n". This will make the HTTP-server respond with a header and then the content.

The easiest way to get the default page is to skip the HTTP-version and make the request-string just "GET /\r\n". This will make the server assume a HTTP/0.9 client and just send the file without any header before it.

When writing a simple client, I would recommend sending "HTTP/1.0" as the version, because then you will get the header and have the possibility to support virtual hosts (which almost all webservers use to share many sites on one and the same IP address), but not have to support chunked transfer mode.

For example this site requires the client to tell what site he is referring to, as the server is using virtual hosts to host several sites. A request to get the default page of amiga.org would look like this:
Code: [Select]
GET / HTTP/1.0\r\n
Host: amiga.org\r\n\r\n
Without supplying the "Host: amiga.org" line, the webserver won't know what site you are asking for and will return some default page - try entering the IP address for amiga.org in a browser and see what happens then.

\r = Carriage Return (CR = 0x0D)
\n = Line Feed (LF = 0x0A)
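
The request strings described above could be built with a small helper like the one below. This is only a sketch: build_get_request is a hypothetical name, and it assumes a C library that provides snprintf.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format a minimal HTTP/1.0 GET request with a Host line.
   Returns the length of the request, or -1 if it does not fit in buf. */
int build_get_request(char *buf, size_t size, const char *host, const char *path)
{
    int n;

    assert(buf != NULL && host != NULL && path != NULL);
    /* request line, Host header, then the empty line that ends the header block */
    n = snprintf(buf, size, "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n", path, host);
    if (n < 0 || (size_t)n >= size)
        return -1;           /* formatting error or output truncated */
    return n;
}
```

Calling it with host "amiga.org" and path "/" produces the two-line request shown in the code box above, and the return value is the byte count to pass to send.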

This page, when working, has some rather good information.

(edit): Removed errors.


/Patrik
 

Offline koaftder

Re: Example of C source code for getting web page.
« Reply #8 on: January 21, 2006, 04:40:49 AM »
ok, finally got my code to work
Code: [Select]

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>   /* struct sockaddr_in, htons */
#include <arpa/inet.h>    /* inet_addr */
#include <string.h>
#include <strings.h>      /* bzero */
#include <stdio.h>
#include <stdlib.h>

int main ( void ) {
        int socket_handle ;
        struct sockaddr_in socket_detials ;
        char * input_buffer;
        char * httpget = "GET HTTP 1.1 / \x0D\x0A\n\x0D\x0A\n" ;

        /* zero-filled and one byte larger than the recv limit, so the text stays NUL-terminated */
        input_buffer = calloc(20001, 1);

        socket_handle = socket ( AF_INET, SOCK_STREAM, 0) ;
        socket_detials.sin_family = AF_INET ;
        socket_detials.sin_addr.s_addr=inet_addr("68.90.68.66");
        socket_detials.sin_port = htons(80);
        bzero ( &(socket_detials.sin_zero), 8 ) ;

        if ( connect (socket_handle,(struct sockaddr*)&socket_detials, sizeof ( struct sockaddr)) == -1 ){
                printf ( "Couldnt connect to server\n" ) ;
                return 1 ;
        }
        /* send and recv return ssize_t, so cast before printing with %d */
        printf ( "Sending %d bytes\n",  (int) send ( socket_handle , httpget, strlen(httpget), 0 ) ) ;
        printf ( "Received %d bytes\n", (int) recv ( socket_handle , input_buffer , 20000, 0 ) ) ;
        printf ( "%s\n", input_buffer ) ;

        return 0 ;
}


and when i run it i get :

Code: [Select]

koft@macdev:~$ ./socket
Sending 21 bytes
Received 658 bytes
HTTP/1.1 400 Bad Request
Date: Sat, 21 Jan 2006 04:37:48 GMT
Server: Apache/1.3.34 (Unix) mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.4.1 FrontPage/5.0.2.2635 mod_ssl/2.8.25 OpenSSL/0.9.7a
Connection: close
Content-Type: text/html; charset=iso-8859-1



400 Bad Request

Bad Request


Your browser sent a request that this server could not understand.


The request line contained invalid characters following the protocol string.





Apache/1.3.34 Server at cpanel1.betterbox.net Port 80



koft@macdev:~$


Tip: when things aren't working, use ethereal to view your traffic. I spent some time watching the program hang because I wasn't sending it the right stuff after the GET (I tried 0d0a0d0a but that didn't do it...). I wasn't even sure it actually sent the packet to the server, or to the right address or port, until I fired up ethereal and saw for sure what was really going on.
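
A hang like that is what one would expect when the request never ends with the empty line (CR LF CR LF) that closes the header block, because the server keeps waiting for more. A quick check before sending might look like this sketch, where request_is_terminated is a hypothetical helper name:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical check: does the request end with CR LF CR LF, the empty line
   that tells the server the header block is complete? Returns 1 or 0. */
int request_is_terminated(const char *request)
{
    size_t len;

    assert(request != NULL);
    len = strlen(request);
    return len >= 4 && memcmp(request + len - 4, "\r\n\r\n", 4) == 0;
}
```

Both terminators tried in the programs above ("\r\r" and "\x0D\x0A\n\x0D\x0A\n") fail this check, while "\r\n\r\n" passes.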
 

Offline patrik

Re: Example of C source code for getting web page.
« Reply #9 on: January 21, 2006, 04:44:34 AM »
@koaftder:

The reason you are getting a "400 Bad Request" response is that you are specifying that your client is an HTTP/1.1 client, which requires you to supply the "Host: something.com" header line. That line is optional in HTTP/1.0 but required for virtual hosts to work, so it is definitely recommended to supply it anyhow.

With a simple client, there is no advantage in telling the server that your client supports HTTP/1.1 instead of HTTP/1.0, only disadvantages, as the server is then allowed to send you dynamic pages as chunks using the so-called "chunked transfer-coding".
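
To give an idea of the extra work chunked transfer-coding means: each chunk is preceded by a line holding its size in hexadecimal (for example `d19`), which the client has to parse before reading the chunk data. A minimal sketch of that one step, with parse_chunk_size as a hypothetical helper name and without handling chunk extensions or trailers:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: parse the hexadecimal size at the start of a chunk-size line.
   Returns the chunk size in bytes, or -1 if the line has no leading hex digits. */
long parse_chunk_size(const char *line)
{
    char *end;
    unsigned long size;

    assert(line != NULL);
    size = strtoul(line, &end, 16);   /* base 16: chunk sizes are sent in hex */
    if (end == line)
        return -1;                    /* no digits were consumed */
    return (long)size;
}
```

A size of 0 marks the last chunk. An HTTP/1.0 client never has to do any of this, which is the point made above.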


/Patrik
 

Offline koaftder

Re: Example of C source code for getting web page.
« Reply #10 on: January 21, 2006, 05:17:21 AM »
OK, so I will just capture what Firefox sends out when I go to amiga.org. Here is the fix  :-)

Code: [Select]

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>   /* struct sockaddr_in, htons */
#include <arpa/inet.h>    /* inet_addr */
#include <string.h>
#include <strings.h>      /* bzero */
#include <stdio.h>
#include <stdlib.h>

int main ( void ) {
        int socket_handle ;
        struct sockaddr_in socket_detials ;
        char * input_buffer;
        /* request captured from Firefox; adjacent string literals are joined by the compiler */
        char * httpget =

          "GET / HTTP/1.1\r\n"
          "Host: www.amiga.org\r\n"
          "User-Agent: Mozilla/5.0 (X11; U; Linux ppc; en-US; rv:1.7.10) Gecko/20050825 Firefox/1.0.6 (Ubuntu package 1.0.6)\r\n"
          "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\n"
          "Accept-Language: en-us,en;q=0.5\r\n"
          "Accept-Encoding: gzip,deflate\r\n"
          "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n"
          "Keep-Alive: 300\r\n"
          "Connection: keep-alive\r\n"
          "Referer: http://www.amiga.org/gallery/index.php?n=896=33\r\n"
          "Cookie: PHPSESSID=442105507b7dca6d4042a641fc132c8f; AO_Session=442105507b7dca6d4042a641fc132c8f\r\n"
          "Cache-Control: max-age=0\r\n"
          "\r\n";

        /* zero-filled and one byte larger than the recv limit, so the text stays NUL-terminated */
        input_buffer = calloc(20001, 1);

        socket_handle = socket ( AF_INET, SOCK_STREAM, 0) ;
        socket_detials.sin_family = AF_INET ;
        socket_detials.sin_addr.s_addr=inet_addr("68.90.68.66");
        socket_detials.sin_port = htons(80);
        bzero ( &(socket_detials.sin_zero), 8 ) ;

        if ( connect (socket_handle,(struct sockaddr*)&socket_detials, sizeof ( struct sockaddr)) == -1 ){
                printf ( "Couldnt connect to server\n" ) ;
                return 1 ;
        }
        /* send and recv return ssize_t, so cast before printing with %d */
        printf ( "Sending %d bytes\n",  (int) send ( socket_handle , httpget, strlen(httpget), 0 ) ) ;
        printf ( "Received %d bytes\n", (int) recv ( socket_handle , input_buffer , 20000, 0 ) ) ;
        printf ( "%s\n", input_buffer ) ;

        return 0 ;
}


and it returns the following now:

Code: [Select]

koft@macdev:~$ ./socket
Sending 612 bytes
Received 1460 bytes
HTTP/1.1 200 OK
Date: Sat, 21 Jan 2006 05:03:13 GMT
Server: Apache/1.3.34 (Unix) mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.4.1 FrontPage/5.0.2.2635 mod_ssl/2.8.25 OpenSSL/0.9.7a
X-Powered-By: PHP/4.4.1
Set-Cookie: PHPSESSID=442105507b7dca6d4042a641fc132c8f; path=/
Expires: Mon, 26 Jul 1997 05:00:00 GMT
Cache-Control: private, no-cache
Pragma: no-cache
Set-Cookie: AO_Session=442105507b7dca6d4042a641fc132c8f; expires=Saturday, 28-Jan-06 05:03:14 GMT; path=/
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=ISO-8859-1

d19







Code: [Select]

koft@macdev:~$


I broke up one of those lines because it was annoying.

Writing network code can be a lot of fun, though it can be a major undertaking if you have to write something more than some hacks for a hobby project.

I have a few books about socket programming for Windows and Linux. They cover a lot of material but don't seem to dig into some details I'd like to fill in without having to experiment for years and write mountains of code. Multithreaded socket programming comes to mind; most books seem to skirt around the subject. If I'm writing a server daemon for something, I'm going to need to handle a lot of simultaneous connections. Do I use blocking or non-blocking sockets? Do I spawn off a thread for each socket, and make it a blocking socket? Should I spawn off a thread for every 10 sockets and do non-blocking I/O? Should I fork the daemon 5 times and load balance the connections across the processes? How do I effectively deal with resource starvation caused by jerks who write scripts that keep opening hundreds of connections and letting them hang? Those are the topics I'd like to see covered in a book: performance strategies and security issues. I really, really don't have time to read all the socket code for Apache.
 

Offline AmigaEd (Topic starter)

Re: Example of C source code for getting web page.
« Reply #11 on: January 21, 2006, 05:39:56 AM »
Quote
by ChaosLord on 2006/1/20 23:32:26
The word "Amiga" does not exist on that site.


I agree. I looked all over that site and could find no reference to "Amiga".

I really need some code that I can compile and run on an Amiga. Linux is just not an option for me right now.

Best Regards,
AmigaEd
 

Offline patrik

Re: Example of C source code for getting web page.
« Reply #12 on: January 21, 2006, 05:45:50 AM »
@koaftder:

Dude, no need to send so much! Check my earlier example for amiga.org, which is the only stuff you need to send, and should send, to make it work with all servers and keep things as easy as possible when coding a client. If you don't advertise that your client supports weird encodings and transfer modes, the server won't use them.

This page is rather good regarding what pitfalls there are when designing high performance server software.


/Patrik
 

Offline patrik

Re: Example of C source code for getting web page.
« Reply #13 on: January 21, 2006, 06:35:26 AM »
@AmigaEd:

If targeting the Amiga, you should take a look at the AmiTCP-SDK, which gives you the necessary headers to work with bsdsocket.library (plus link libraries that can do some miscellaneous stuff for you, but they are not needed). bsdsocket.library is the standard implementation of the BSD sockets API among Amiga TCP/IP stacks.

Using bsdsocket.library is not hard, you need to add the AmiTCP-SDKs in your include-path and include , and . Its like using any other shared library - you can utilize its functions after opening it with exec.library/OpenLibrary(). After that, it is more or less identical to programming the BSD sockets API, as far as the networking is concerned.

There are also some examples with the AmiTCP-SDK, even a small HTTP/GET client. If you are new to C, they might not be too straightforward though.


/Patrik
 

Offline koaftder

Re: Example of C source code for getting web page.
« Reply #14 on: January 21, 2006, 06:36:49 AM »
@patrik

You are right about the extraneous stuff I had put in there. I just wanted to demonstrate the connection and retrieving some stuff; no need to confuse people.

That document you pointed out is a wonderful read.

@AmigaEd

Sorry to point you in the wrong direction. I grabbed the package and sure enough, it doesn't support Amiga ): It's a really nice, easy lib to work with. It supports DOS/Windows/OS2/Unix/Linux/VMS, etc. I guess I just thought it ran on Amiga because the guy who wrote it is a NASA geek and everybody seems to mention how much Amiga was used in that organisation.